WAN 2.1 Low VRAM Text To Image ComfyUI Workflow & One Click Installer

Generate high-resolution 1024 × 1024 images in under 30 seconds on just 6 GB VRAM using the WAN 2.1 video generation model.

 

WAN 2.1 in ComfyUI offers advanced text-to-image capabilities, enabling users to create high-quality images directly from text prompts. Although WAN 2.1 is best known for its text-to-video (T2V) and image-to-video (I2V) workflows, you can easily generate single-frame images by setting the frame count to one in the workflow. This method delivers the same cinematic, high-fidelity rendering that distinguishes WAN 2.1 video outputs, resulting in visually impressive still images.
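Setting the frame count to one can be sketched in Python against a workflow exported in ComfyUI's API (JSON) format. This is only an illustration under assumptions: it assumes the frame count is stored in a node input named "length" (as in ComfyUI's EmptyHunyuanLatentVideo node), and the helper name set_single_frame and the example node IDs are hypothetical. The workflow shipped with this product may use different nodes.

```python
import copy


def set_single_frame(workflow: dict, width: int = 1024, height: int = 1024) -> dict:
    """Return a copy of an API-format workflow with video length forced to 1 frame.

    Assumption: the frame count is a node input called "length".
    """
    patched = copy.deepcopy(workflow)  # leave the caller's workflow untouched
    for node in patched.values():
        inputs = node.get("inputs", {})
        if "length" in inputs:
            inputs["length"] = 1  # one frame yields a still image
            if "width" in inputs:
                inputs["width"] = width
            if "height" in inputs:
                inputs["height"] = height
    return patched


# Hypothetical two-node fragment of a video workflow.
example = {
    "5": {
        "class_type": "EmptyHunyuanLatentVideo",
        "inputs": {"width": 832, "height": 480, "length": 81, "batch_size": 1},
    },
    "6": {
        "class_type": "CLIPTextEncode",
        "inputs": {"text": "a lighthouse at dusk", "clip": ["2", 0]},
    },
}

patched = set_single_frame(example)
print(patched["5"]["inputs"])
```

Nodes without a "length" input, such as the prompt encoder, pass through unchanged.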

 

I've developed a one-click installer and a custom workflow that let you run WAN 2.1 diffusion models or quantized GGUF models for fast image generation, even on low-VRAM devices. For users who want to run full-precision or fp8 WAN 2.1 models, my pre-made WAN 2.1 Runpod Template is also available. Simply launch the pod and upload the workflow provided below.

 

Runpod Template:
https://get.runpod.io/WANVideo-ComfyUI-Template

 

Preloaded Models within the Installer (Low VRAM):

  • umt5-xxl-encoder-Q5_K_M.gguf (ComfyUI\models\clip) - https://huggingface.co/city96/umt5-xxl-encoder-gguf/tree/main

  • wan_2.1_vae.safetensors (ComfyUI\models\vae) - https://huggingface.co/Kijai/WanVideo_comfy/tree/main

  • self_forcing_dmd.pt (ComfyUI\models\diffusion_models) - https://huggingface.co/gdhe17/Self-Forcing/tree/main/checkpoints

  • 2xLexicaRRDBNet_Sharp.pth Upscale model (ComfyUI\models\upscale_models) - https://huggingface.co/Thelocallab/2xLexicaRRDBNet_Sharp/blob/main/2xLexicaRRDBNet_Sharp.pth

  • Wan21 T2V 14B lightx2v, Wan2.1 T2V 14B FusionX & WAN detailz-wan LoRA Models (ComfyUI\models\loras) - https://huggingface.co/Thelocallab/WAN-2.1-loras/tree/main

  • Speed: 1024 × 1024 images in under 30 seconds on an RTX 4050 with 6 GB VRAM; even faster on enterprise GPUs
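If a model fails to load, a quick check that the files sit in the folders listed above can help. The following is a minimal sketch, not part of the installer: the function name missing_models is hypothetical, the ComfyUI root path is something you supply, and the LoRA files are omitted because their exact file names are not given here.

```python
from pathlib import Path

# File names and folders taken from the list above (relative to the ComfyUI root).
EXPECTED_MODELS = {
    "models/clip": "umt5-xxl-encoder-Q5_K_M.gguf",
    "models/vae": "wan_2.1_vae.safetensors",
    "models/diffusion_models": "self_forcing_dmd.pt",
    "models/upscale_models": "2xLexicaRRDBNet_Sharp.pth",
}


def missing_models(comfy_root: str) -> list[str]:
    """Return the names of expected model files not found under comfy_root."""
    root = Path(comfy_root)
    return [
        name
        for folder, name in EXPECTED_MODELS.items()
        if not (root / folder / name).is_file()
    ]


if __name__ == "__main__":
    # Assumption: run from the directory that contains the ComfyUI folder.
    for name in missing_models("ComfyUI"):
        print(f"missing: {name}")
```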

 

System Requirements:

  • Nvidia RTX 30XX, 40XX, or 50XX series GPU (FP16 support required; GTX 10XX and RTX 20XX not tested)

  • CUDA-compatible GPU with at least 4 GB VRAM (the quoted speed was measured with 6 GB)

  • Windows OS

  • At least 40 GB free storage
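The requirements above can be checked before installing with a short script. This is a rough sketch under assumptions: the function names are hypothetical, the 4 GB VRAM floor is the lower bound stated above, and the VRAM figure must be supplied by you (for example from Task Manager), since the script does not query the GPU.

```python
import platform
import shutil

MIN_FREE_GB = 40  # free storage listed above
MIN_VRAM_GB = 4   # lower bound listed above


def unmet_requirements(os_name: str, free_gb: float, vram_gb: float) -> list[str]:
    """Return a human-readable list of requirements that are not met."""
    problems = []
    if os_name != "Windows":
        problems.append(f"Windows required, found {os_name}")
    if free_gb < MIN_FREE_GB:
        problems.append(f"need {MIN_FREE_GB} GB free storage, found {free_gb:.1f} GB")
    if vram_gb < MIN_VRAM_GB:
        problems.append(f"need {MIN_VRAM_GB} GB VRAM, found {vram_gb:.1f} GB")
    return problems


def local_os_and_free_gb(path: str = ".") -> tuple[str, float]:
    """Read the OS name and free space (in GB) for the drive holding `path`."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return platform.system(), free_gb


if __name__ == "__main__":
    os_name, free_gb = local_os_and_free_gb()
    # Assumption: 6 GB VRAM entered by hand; replace with your GPU's value.
    for problem in unmet_requirements(os_name, free_gb, vram_gb=6):
        print(problem)
```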

 

What’s Included:

  • Portable ComfyUI Windows Installer, pre-configured for WAN 2.1 text-to-image

  • Custom workflow supporting text-to-image generation

  • Automatic downloads for all required nodes and models

 

Usage Notes:
Enter a detailed text prompt describing the image you want. For best results, use an LLM to expand and refine the prompt first.

Support and More Information:

  • Community Support: For troubleshooting or to connect with other users, join the Discord server.

  • Buy On Patreon

    While I improve the store, you can purchase these items or sign up for a membership on Patreon: https://www.patreon.com/TheLocalLab

Price: $4.00