Gurudev Lipsync Shootout

23-second clip from Bhakti Sutras 11 (21:55-22:19) · VibeVoice-1.5B German dub · side-by-side lipsync outputs from each model · RTX 5070 Ti
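The clip bounds can be checked with a few lines of Python. This is a sketch, and the helper names are illustrative. The 21:55-22:19 span on the source timeline is 24s, which is about 600 frames at the 25 fps noted for the source video. The lipsync runs below processed 580 to 586 frames, which is closer to the 23.3s of dubbed audio.

```python
def to_seconds(ts: str) -> int:
    """Convert an 'MM:SS' timestamp to whole seconds."""
    minutes, seconds = ts.split(":")
    return int(minutes) * 60 + int(seconds)


def clip_span(start: str, end: str, fps: float = 25.0) -> tuple[int, int]:
    """Return (duration in seconds, frame count at the given fps) for a clip."""
    duration = to_seconds(end) - to_seconds(start)
    return duration, round(duration * fps)


duration, frames = clip_span("21:55", "22:19")
print(duration, frames)  # 24 600
```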

Generated German audio (VibeVoice-1.5B)

Zero-shot voice clone from a 45s reference sample of Gurudev. Generation took 55s for 23.3s of audio (RTF 2.36x, i.e. slower than real time).
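Here RTF is read as generation time divided by the duration of the audio produced, because that convention reproduces the 2.36x figure. Under this convention, values above 1 mean slower than real time. Some toolkits report the inverse ratio, so treat the convention as an assumption.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = processing time / duration of produced audio (>1 is slower than real time)."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


rtf = real_time_factor(55.0, 23.3)  # VibeVoice-1.5B run above
print(f"RTF {rtf:.2f}x")  # RTF 2.36x
```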

Original English source

BS 11, upscaled from 320×240 to 640×480, 25 fps. Gurudev explains the gradations of devotees.

LatentSync 1.6 — 512² SOTA diffusion, max quality

stage2_512 · 20 steps · DeepCache · 11.5 min · 16 GB VRAM. The ollama process had to be stopped to free host RAM.

LatentSync 1.6 — 256² efficient

stage2_efficient · 20 steps · DeepCache · 67s inference · 256² face region. About 10× faster than the 512² variant (67s vs 11.5 min).
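The speedup follows from the two timings. This quick check assumes both figures cover comparable spans. The 512² number is given as a total and the 256² number as inference time, so the ratio is approximate.

```python
def speedup(slow_s: float, fast_s: float) -> float:
    """How many times faster the second run is than the first."""
    return slow_s / fast_s


t_512 = 11.5 * 60  # stage2_512, seconds
t_256 = 67.0       # stage2_efficient inference, seconds
print(f"{speedup(t_512, t_256):.1f}x")  # 10.3x
```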

Wav2Lip GAN baseline

wav2lip_gan.pth · S3FD face detector · 580 frames in ~30s · the 2020 classic.

MuseTalk v1.5 (Tencent, latent inpainting)

v15 unet (3.4 GB) · face_alignment in place of dwpose · landmarks extracted for 586 frames in ~70s, then ~60s of inference. Required 4 patches to run on Windows/Blackwell.
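With face_alignment standing in for dwpose, preprocessing is not a minor cost. This small sketch uses a hypothetical helper, `stage_shares`, to split the ~130s MuseTalk run into its two reported stages. It shows that landmark extraction is slightly more than half of the total.

```python
def stage_shares(stages: dict[str, float]) -> dict[str, float]:
    """Return each stage's fraction (0..1) of the total wall-clock time."""
    total = sum(stages.values())
    if total <= 0:
        raise ValueError("total time must be positive")
    return {name: secs / total for name, secs in stages.items()}


# Approximate MuseTalk v1.5 timings from the notes above, in seconds.
musetalk = {"landmark extraction": 70.0, "inference": 60.0}
frames = 586

for name, share in stage_shares(musetalk).items():
    print(f"{name}: {musetalk[name]:.0f}s ({share:.0%})")
print(f"end to end: ~{frames / sum(musetalk.values()):.1f} frames/s")
```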

VideoReTalking (OpenTalker, 3DMM + GFPGAN)

7-step pipeline, main stages: landmarks → 3DMM → expression stabilization → face enhancement → lip synthesis → compositing. 4 GB of checkpoints. ~6 min on the 5070 Ti. Required 6 patches to run.
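To compare the runs on one axis, divide each wall-clock figure by the 23.3s clip length. This is a rough sketch that uses the approximate timings quoted above. Those timings were not measured the same way: LatentSync 256² is inference only, and MuseTalk is the sum of its two stages. Treat the ranking as indicative.

```python
CLIP_S = 23.3  # length of the dubbed clip, seconds

# Approximate wall-clock times from the notes above, in seconds.
timings = {
    "LatentSync 1.6 (512²)": 11.5 * 60,
    "LatentSync 1.6 (256²)": 67.0,
    "Wav2Lip GAN": 30.0,
    "MuseTalk v1.5": 70.0 + 60.0,
    "VideoReTalking": 6 * 60.0,
}


def rank_by_cost(timings: dict[str, float], clip_s: float) -> list[tuple[str, float]]:
    """Sort models fastest-first, paired with seconds of compute per second of video."""
    costs = [(name, t / clip_s) for name, t in timings.items()]
    return sorted(costs, key=lambda item: item[1])


for name, cost in rank_by_cost(timings, CLIP_S):
    print(f"{name:<24} {cost:5.1f} s per video second")
```

On these numbers, only Wav2Lip comes close to real time. The 512² LatentSync run is the most expensive by a wide margin.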