
WAN 2.2 SVI 2.0 Pro — Generate Long AI Videos in ComfyUI (Low VRAM)

Oct 2025 · 10 min read · AI Video · WAN 2.2 · SVI · ComfyUI · Long-Form

WAN 2.2 produces some of the best AI video quality available — but most setups cap you at around 5 seconds before memory runs out. The SVI (Stable Video Infinity) 2.0 Pro LoRAs change that: by chaining clips together seamlessly, they let you generate long-form video that runs as long as you want. And with quantized GGUF models, this works on cards with as little as 6 GB of VRAM.

How SVI Chaining Works

Instead of generating a 1-minute video in one shot (which would require enormous VRAM), SVI uses a smarter approach:

♾️
Infinite Length
Chain as many 5-second segments as needed — no hard video length limit
💾
6 GB VRAM Minimum
GGUF quantized models enable low-VRAM operation (12 GB+ recommended for comfort)
⚡
4–8 Steps with LightX2V
LightX2V LoRAs cut required steps from ~20 down to 4–8 for faster generation
🔗
Seamless Transitions
SVI LoRAs maintain motion, lighting, and character consistency between segments
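The chaining idea can be sketched in a few lines of Python: each segment is conditioned on the trailing frames of the previous one, so per-segment memory cost stays constant while total length grows. Everything below is illustrative — the function names, frame counts, and "frames" themselves are stand-ins, not real ComfyUI node APIs.

```python
# Illustrative sketch of SVI-style segment chaining (not the real ComfyUI API).
# Each "frame" is just a string label here; in the actual workflow these are
# latents/images passed between node groups.

FRAMES_PER_SEGMENT = 81  # roughly 5 s per segment (assumed typical WAN setting)
CONTEXT_FRAMES = 4       # trailing frames handed to the next segment

def generate_segment(prompt, context_frames, n_frames=FRAMES_PER_SEGMENT):
    """Stand-in for one 5-second WAN 2.2 + SVI generation pass."""
    start = context_frames[-1] if context_frames else "start_image"
    return [f"{prompt}:{start}:{i}" for i in range(n_frames)]

def chain_segments(prompts):
    """Generate one long video as a chain of short, overlapping-context segments."""
    video, context = [], []
    for prompt in prompts:
        segment = generate_segment(prompt, context)
        video.extend(segment)
        context = segment[-CONTEXT_FRAMES:]  # hand the tail frames forward
    return video

video = chain_segments(["walks left", "turns around", "sits down", "waves"])
print(len(video))  # 4 segments x 81 frames = 324
```

The key point the sketch shows: only one segment is ever "in flight," which is why VRAM use doesn't grow with video length.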
One-click installer available on Patreon — handles all file downloads, model placement, and ComfyUI setup automatically.

Step 1 — Install ComfyUI (Portable Windows)

  1. Download the ComfyUI Portable ZIP from the ComfyUI releases page and extract it with 7-Zip.
  2. Navigate into the custom_nodes folder, click the address bar, type cmd, and press Enter.
  3. Run git clone for the ComfyUI Manager repository.
  4. Navigate back to the main ComfyUI folder (the one containing the embedded Python folder) and run the dependency-install command from the written guide, so Manager's requirements are installed into the portable Python environment rather than your system Python.
Written guide with all commands and links is linked in the video description — copy-paste all commands from there to avoid typos.

Step 2 — Download Models

You'll need to gather several files before launching the workflow:

| File | Source | Destination |
| --- | --- | --- |
| WAN 2.2 14B GGUF (high-noise) | QuantStack HuggingFace — Image-to-Video repo → high-noise folder | models/unet/ |
| WAN 2.2 14B GGUF (low-noise) | QuantStack HuggingFace — Image-to-Video repo → low-noise folder | models/unet/ |
| SVI V2 Pro LoRA (high + low noise) | Kijai HuggingFace → loras/stable_video_infinity/v2.0/ | models/loras/ |
| LightX2V LoRA (high + low noise) | Kijai HuggingFace → loras/lightx2v/, or the LightX2V WAN 2.2 HuggingFace repo | models/loras/ |
| UMT5 XXL CLIP model | city96 HuggingFace — umt5 repository | models/clip/ |
| WAN 2.1 VAE | Comfy-Org WAN repackaged → files/VAE folder | models/vae/ |
| Upscale model | Channel HuggingFace repo (link in guide) | models/upscale_models/ |
GGUF quantization level: The Q3_K_M quantization is a good starting point — it runs on 6 GB VRAM cards and delivers solid results. If you have 12 GB+ VRAM, download a larger quant (Q5, Q6, Q8) for better quality.
Important: Make sure you download from the Image-to-Video repository on Quantstack, not the Text-to-Video one. They are separate repositories.
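Once everything is downloaded, a short script can confirm the files landed in the right folders. The folder names come from the table above; the filenames are placeholders — substitute the exact names of the quant and LoRA variants you actually downloaded.

```python
import os

# Destination folders (from the table) mapped to the files expected there.
# Filenames below are PLACEHOLDERS -- replace with your actual downloads.
EXPECTED = {
    "models/unet": ["wan2.2_i2v_high_noise_Q3_K_M.gguf",
                    "wan2.2_i2v_low_noise_Q3_K_M.gguf"],
    "models/loras": ["svi_v2_pro_high_noise.safetensors",
                     "svi_v2_pro_low_noise.safetensors"],
    "models/clip": ["umt5_xxl_encoder.safetensors"],
    "models/vae": ["wan_2.1_vae.safetensors"],
}

def check_models(comfy_root):
    """Return a list of (folder, filename) pairs that are missing."""
    missing = []
    for folder, files in EXPECTED.items():
        for name in files:
            if not os.path.isfile(os.path.join(comfy_root, folder, name)):
                missing.append((folder, name))
    return missing

# Example: report anything missing under your ComfyUI install directory.
for folder, name in check_models("ComfyUI"):
    print(f"missing: {folder}/{name}")
```

Running this before launching the workflow saves a round of red-node debugging inside ComfyUI.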

Step 3 — Set Up the Workflow in ComfyUI

  1. Launch ComfyUI and load the SVI long-video workflow (download link on CivitAI — linked in video description).
  2. If any nodes appear red, go to Manager → Install Missing Nodes. Install each missing package, then restart ComfyUI and refresh your browser.
  3. Check every model loader node and verify each one is set to the file you actually downloaded (use the dropdown arrows on each node to select the correct file).

GGUF vs. Full Diffusion Model

The workflow defaults to the GGUF model loader for low-VRAM operation. If you have a high-end GPU and want to use the full-precision diffusion model instead, there's a fast group bypasser switch in the workflow — enable the diffusion model loader and disable the GGUF option.
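Rough file-size arithmetic shows why the quant choice matters: a GGUF file stores an approximately fixed number of bits per weight, so size scales linearly with the quant level for a 14B-parameter model. The bits-per-weight figures below are approximate averages, not exact values.

```python
# Rough GGUF file-size estimate: parameters x bits-per-weight / 8.
# Bits-per-weight values are APPROXIMATE averages for each quant type.
PARAMS = 14e9  # WAN 2.2 14B

BPW = {
    "Q3_K_M": 3.9,   # the 6 GB-VRAM starting point mentioned above
    "Q5_K_M": 5.7,
    "Q6_K":   6.6,
    "Q8_0":   8.5,
    "FP16":  16.0,   # full-precision baseline, for comparison
}

for quant, bpw in BPW.items():
    gb = PARAMS * bpw / 8 / 1e9
    print(f"{quant:>7}: ~{gb:.1f} GB")
```

This is why Q3_K_M (roughly 7 GB on disk, with layers offloaded as needed) is viable on 6 GB cards, while Q8 or full precision wants the 12 GB+ hardware the guide recommends.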

Step 4 — Generate a Long Video

  1. Load your starting image in the Load Image node.
  2. Set resolution in the Resize Image node — this controls both the input resize and the output video dimensions.
  3. The default workflow generates a 20-second video split into four 5-second segments. Write a separate prompt for each segment describing what happens in that 5-second window.
  4. Check the seed settings — default is "fixed" (same output every run). Change to "randomize" if you want variations.
  5. Click Run. The SVI LoRA automatically passes the final frames of each segment into the next, creating seamless continuity.
Generation speed on RTX 4090: A full 20-second video takes approximately 5–7 minutes. On 6 GB VRAM hardware it's slower but functional.
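The segment math is simple and worth sketching: each segment covers about 5 seconds, so the number of prompts you write determines total length. (The 16 fps frame rate is an assumption based on typical WAN output settings.)

```python
import math

FPS = 16                 # typical WAN output frame rate (assumption)
SECONDS_PER_SEGMENT = 5  # one SVI segment

def segments_needed(target_seconds):
    """How many 5-second segments (and therefore prompts) a target length needs."""
    return math.ceil(target_seconds / SECONDS_PER_SEGMENT)

def total_frames(n_segments):
    """Frames the workflow must generate for a given segment count."""
    return n_segments * SECONDS_PER_SEGMENT * FPS

n = segments_needed(20)
print(n, "segments,", total_frames(n), "frames")  # 4 segments, 320 frames
```

So the default 20-second workflow is exactly the four-segment, four-prompt case; a 60-second video would need twelve segments and twelve prompts.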

Extending Beyond 20 Seconds (Infinite Chaining)

Want more than 20 seconds? Adding more segments is straightforward:

  1. Select all nodes in one of the existing 5-second subsection groups.
  2. Right-click → Clone.
  3. Drag the clone into position and connect the extended image output from the previous segment into the previous image input of the new clone.
  4. Connect the new clone's extended image output to the upscale image connector at the end of the workflow.

Each clone adds 5 seconds. Repeat as many times as you want.

Tips for Best Results

📦 Want to skip the setup?

The Local Lab offers pre-configured AI installer packages so you can get running in minutes, not hours.

Get the Installer →