WAN Self Forcing T2V & VACE I2V ComfyUI (6 GB VRAM) - One Click Windows Installer
A new variant of the WAN Video Generation model, Self Forcing, is now available, enabling fast, high-quality video generation on consumer GPUs. This release includes a one-click installer and a complete workflow for both text-to-video and VACE image-to-video tasks.
What is the Self Forcing Model?
Self Forcing is an autoregressive video diffusion model designed for real-time, streaming video generation. It simulates the inference process during training, using autoregressive rollout with KV caching. This approach eliminates the usual mismatch between training and inference, resulting in smoother, more temporally consistent videos. The model generates high-quality 480p video with an initial latency of about 0.8 seconds, then streams frames at around 10 FPS on a single RTX 4090 GPU. Compared to previous models, Self Forcing offers:
Significantly faster generation (150–400x lower latency than prior models)
Superior or comparable visual quality, with smoother motion and no over-saturation
Real-time, identity-consistent, and motion-smooth video synthesis, especially when using long, detailed prompts
Model Details:
Model size: 1.3B parameters
Output: High-quality 480p videos
Speed: ~10 FPS on RTX 4090, faster on enterprise GPUs
Quality: Matches or exceeds state-of-the-art diffusion models; excels with long, descriptive prompts
System Requirements
NVIDIA RTX 30XX, 40XX, or 50XX series GPU (fp16 and bf16 support required; GTX 10XX/20XX not tested)
CUDA-compatible GPU with at least 6GB VRAM
Windows operating system
Minimum 30GB free storage
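As a quick sanity check before installing, the free-storage requirement above can be verified with a short Python snippet. This is a minimal sketch, not part of the installer: the 30 GB threshold comes from the list above, and the target path is a placeholder to point at your intended install drive.

```python
import shutil


def has_free_space(path: str, required_gb: float = 30.0) -> bool:
    """Return True if the drive containing `path` has at least
    `required_gb` gigabytes free (30 GB is the installer's minimum)."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    # Replace "." with the drive you plan to install on, e.g. r"C:\" on Windows.
    print("Enough space:", has_free_space("."))
```

`shutil.disk_usage` works the same on Windows and Linux, so the check is portable even though the installer itself targets Windows.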
What's Included in This Post
Portable ComfyUI Windows Installer: Pre-configured for Self Forcing WAN Video Generation.
Custom Workflow: Supports both text-to-video and VACE image-to-video generation.
Automatic Node and Model Download: All required custom nodes and models are downloaded and installed automatically.
Preloaded Models
The following models are included and will be automatically downloaded:
Wan2.1-T2V-1.3B-Self-Forcing-DMD-VACE-FP16.safetensors (ComfyUI\models\diffusion_models)
Wan2.1-T2V-1.3B-Self-Forcing-DMD-VACE-FP8_e4m3fn.safetensors (ComfyUI\models\diffusion_models)
umt5-xxl-encoder-Q5_K_M.gguf (ComfyUI\models\clip)
wan_2.1_vae.safetensors (ComfyUI\models\vae)
self_forcing_dmd.pt (ComfyUI\models\diffusion_models)
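If the automatic download appears to have failed, the expected layout above can be checked with a small Python sketch. The file names and subfolders are taken directly from the list; the ComfyUI root path is an assumption you should adjust to your install location.

```python
from pathlib import Path

# Expected model files, relative to the ComfyUI root (from the list above).
EXPECTED_MODELS = [
    r"models\diffusion_models\Wan2.1-T2V-1.3B-Self-Forcing-DMD-VACE-FP16.safetensors",
    r"models\diffusion_models\Wan2.1-T2V-1.3B-Self-Forcing-DMD-VACE-FP8_e4m3fn.safetensors",
    r"models\clip\umt5-xxl-encoder-Q5_K_M.gguf",
    r"models\vae\wan_2.1_vae.safetensors",
    r"models\diffusion_models\self_forcing_dmd.pt",
]


def missing_models(comfyui_root: str) -> list[str]:
    """Return the expected model files that are not present under the root."""
    root = Path(comfyui_root)
    # Normalize the Windows-style separators so the check also works elsewhere.
    return [rel for rel in EXPECTED_MODELS
            if not (root / Path(rel.replace("\\", "/"))).is_file()]


if __name__ == "__main__":
    missing = missing_models(r"C:\ComfyUI")  # adjust to your install path
    if missing:
        print("Missing files:")
        for rel in missing:
            print(" -", rel)
    else:
        print("All preloaded models found.")
```

Re-running the installer (or placing the listed files manually) should resolve any paths the script reports as missing.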
Usage Notes
The model performs best with long, detailed prompts, as it was specifically trained on such data.
Both text-to-video and image-to-video workflows are supported.
Support and More Information
Project GitHub: For technical details, updates, and documentation, visit the project’s GitHub repository.
Community Support: For troubleshooting or to connect with other users, join the Discord server.
Buy On Patreon
While I improve the store, you can purchase these items or sign up for a membership on Patreon - https://www.patreon.com/TheLocalLab.