WAN 2.1 Low VRAM Text To Image ComfyUI Workflow & One Click Installer
Generate high-resolution 1024 × 1024 images in under 30 seconds on just 6 GB VRAM using the WAN 2.1 video generation model.
WAN 2.1 is best known for text-to-video (T2V) and image-to-video (I2V) generation, but in ComfyUI it also works as a text-to-image model. Set the frame count (video length) in the workflow to one, and it renders a single still frame from your text prompt. These stills keep the cinematic, high-fidelity look that distinguishes WAN 2.1 video outputs.
I've developed a one-click installer and a custom workflow that let you run WAN 2.1 diffusion models or quantized GGUF models for fast image generation, even on low-VRAM devices. If you want to run full-precision or fp8 WAN 2.1 models instead, my pre-made WAN 2.1 Runpod template is also available: launch the pod and upload the workflow provided below.
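Setting the frame count to one works because of how WAN 2.1 packs video into latents. The sketch below is illustrative only. It assumes the Wan 2.1 VAE's 4× temporal and 8× spatial compression with 16 latent channels. It also assumes the workflow is exported in ComfyUI's API format and that the frame count lives in the `length` input of an `EmptyHunyuanLatentVideo` node (the empty-latent node used in ComfyUI's native WAN examples). The custom workflow may use a different node, so `latent_class` is left as a parameter.

```python
from __future__ import annotations

import copy

# Assumed Wan 2.1 VAE geometry: 4x temporal and 8x spatial compression,
# 16 latent channels.
TEMPORAL_STRIDE = 4
SPATIAL_STRIDE = 8
LATENT_CHANNELS = 16


def latent_shape(width: int, height: int, frames: int) -> tuple[int, int, int, int]:
    """Return (channels, latent_frames, latent_height, latent_width).

    The first frame gets its own latent frame and each further group of
    four frames shares one, so video lengths are normally 4k + 1.
    """
    latent_frames = (frames - 1) // TEMPORAL_STRIDE + 1
    return (LATENT_CHANNELS, latent_frames,
            height // SPATIAL_STRIDE, width // SPATIAL_STRIDE)


def set_single_frame(workflow: dict,
                     latent_class: str = "EmptyHunyuanLatentVideo") -> dict:
    """Copy an API-format workflow and set every matching latent node's length to 1."""
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        if node.get("class_type") == latent_class:
            node["inputs"]["length"] = 1
    return patched


# An 81-frame 832x480 clip versus a single 1024x1024 still.
print(latent_shape(832, 480, 81))   # (16, 21, 60, 104)
print(latent_shape(1024, 1024, 1))  # (16, 1, 128, 128)
```

With the length set to one, the sampler has a single latent frame to denoise. An 81-frame clip has 21.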
Runpod Template:
https://get.runpod.io/WANVideo-ComfyUI-Template
Preloaded Models in the Installer (Low VRAM):
umt5-xxl-encoder-Q5_K_M.gguf (ComfyUI\models\clip) - https://huggingface.co/city96/umt5-xxl-encoder-gguf/tree/main
wan_2.1_vae.safetensors (ComfyUI\models\vae) - https://huggingface.co/Kijai/WanVideo_comfy/tree/main
self_forcing_dmd.pt (ComfyUI\models\diffusion_models) - https://huggingface.co/gdhe17/Self-Forcing/tree/main/checkpoints
2xLexicaRRDBNet_Sharp.pth upscale model (ComfyUI\models\upscale_models) - https://huggingface.co/Thelocallab/2xLexicaRRDBNet_Sharp/blob/main/2xLexicaRRDBNet_Sharp.pth
Wan21 T2V 14B lightx2v, Wan2.1 T2V 14B FusionX & WAN detailz-wan LoRA models (ComfyUI\models\loras) - https://huggingface.co/Thelocallab/WAN-2.1-loras/tree/main
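If a download fails or a file lands in the wrong folder, the workflow's loader nodes will not find it. The sketch below is not part of the installer. It checks that the files listed above are present, assuming the standard `ComfyUI/models/<subfolder>` layout. It only tests whether the files exist, not whether they downloaded completely. Because the LoRA file names are not listed, it checks only that the `loras` folder is non-empty.

```python
from __future__ import annotations

from pathlib import Path

# Subfolder (under ComfyUI/models) -> file name, copied from the list above.
# The LoRA file names are not listed, so only the loras folder is checked.
EXPECTED_FILES = {
    "clip": "umt5-xxl-encoder-Q5_K_M.gguf",
    "vae": "wan_2.1_vae.safetensors",
    "diffusion_models": "self_forcing_dmd.pt",
    "upscale_models": "2xLexicaRRDBNet_Sharp.pth",
}


def missing_models(comfy_root: Path) -> list[str]:
    """List expected model files that are absent from a ComfyUI install."""
    models = comfy_root / "models"
    missing = [
        f"{sub}/{name}"
        for sub, name in EXPECTED_FILES.items()
        if not (models / sub / name).is_file()
    ]
    loras = models / "loras"
    # Short-circuit: iterdir() only runs when the folder exists.
    if not loras.is_dir() or not any(loras.iterdir()):
        missing.append("loras/ (missing or empty)")
    return missing
```

Pass it the path to the portable install's `ComfyUI` folder. An empty list means every listed file is in place.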
Speed: 1024 × 1024 images in under 30 seconds on an RTX 4050 (6 GB VRAM); even faster on enterprise GPUs
System Requirements:
NVIDIA RTX 30XX, 40XX, or 50XX series GPU (FP16 support required; GTX 10XX and RTX 20XX not tested)
CUDA-compatible GPU with at least 4–6 GB of VRAM
Windows OS
At least 40 GB of free storage
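Before running the installer, you can check the measurable requirements with a short script. This is a rough sketch, not an official checker. It reads total VRAM with `nvidia-smi --query-gpu=memory.total` (reported in MiB) and free disk space with `shutil.disk_usage`. It treats 4 GB as the floor of the stated 4–6 GB range, and "GB" here means GiB. It does not check the GPU generation or FP16 support, so confirm your card against the list above yourself.

```python
from __future__ import annotations

import platform
import shutil
import subprocess

MIN_VRAM_GB = 4.0    # assumed floor: lower end of the stated 4–6 GB range
MIN_FREE_GB = 40.0   # stated free-storage requirement


def query_vram_gb() -> float | None:
    """Total memory of the first NVIDIA GPU in GiB, or None if nvidia-smi is unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        return int(out.splitlines()[0].strip()) / 1024  # nvidia-smi reports MiB
    except (OSError, subprocess.CalledProcessError, IndexError, ValueError):
        return None


def free_disk_gb(path: str = ".") -> float:
    """Free space on the drive containing `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


def check_requirements(vram_gb: float | None, free_gb: float, os_name: str) -> list[str]:
    """Return a list of unmet requirements (empty when everything passes)."""
    problems = []
    if os_name != "Windows":
        problems.append(f"the installer targets Windows, found {os_name}")
    if vram_gb is None:
        problems.append("no NVIDIA GPU detected via nvidia-smi")
    elif vram_gb < MIN_VRAM_GB:
        problems.append(f"{vram_gb:.1f} GB VRAM is below {MIN_VRAM_GB:.0f} GB")
    if free_gb < MIN_FREE_GB:
        problems.append(f"{free_gb:.1f} GB free is below {MIN_FREE_GB:.0f} GB")
    return problems


print(check_requirements(query_vram_gb(), free_disk_gb(), platform.system()))
```

Run it from the drive where you plan to install, because `free_disk_gb()` measures the drive containing the current directory.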
What’s Included:
Portable ComfyUI Windows Installer, pre-configured for WAN 2.1 text-to-image
Custom workflow supporting text-to-image generation
Automatic downloads for all required nodes and models
Usage Notes:
Enter a detailed text prompt describing the image you want. For best results, use an LLM to expand and refine your prompt.
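If you want to script prompts (for example, to queue several LLM-enhanced prompts in a row), ComfyUI's local server accepts workflows on its `/prompt` endpoint. A minimal sketch follows. It assumes the workflow has been exported in API format and that ComfyUI is running at its default address, `http://127.0.0.1:8188`. It also assumes the positive prompt sits in a text-encode node with a `text` input. The node ID `"6"` in the usage line is hypothetical, so look up the real ID in your exported JSON.

```python
from __future__ import annotations

import copy
import json
import urllib.request
import uuid

COMFY_URL = "http://127.0.0.1:8188"  # ComfyUI's default local address


def set_prompt_text(workflow: dict, node_id: str, text: str) -> dict:
    """Copy an API-format workflow and replace the text of one prompt node."""
    patched = copy.deepcopy(workflow)
    patched[node_id]["inputs"]["text"] = text
    return patched


def build_payload(workflow: dict, client_id: str | None = None) -> bytes:
    """Encode the JSON body for the /prompt endpoint."""
    body = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    return json.dumps(body).encode("utf-8")


def queue_prompt(workflow: dict, url: str = COMFY_URL) -> dict:
    """POST the workflow to a running ComfyUI server and return its JSON reply."""
    request = urllib.request.Request(
        f"{url}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Usage: `queue_prompt(set_prompt_text(workflow, "6", "a lighthouse at dusk, 35mm film grain"))`, where `workflow` is the exported JSON loaded with `json.load`. The reply should include a `prompt_id` that identifies the queued job.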
Support and More Information
Community Support: For troubleshooting or to connect with other users, join the Discord server.
Buy On Patreon
While I improve the store, you can purchase these items or sign up for a membership on Patreon - https://www.patreon.com/TheLocalLab.