LTX-2 AI Video — FREE Low VRAM Image to Video with Audio (Wan2GP Tutorial)

Sep 2025 · 10 min read · AI Video · LTX-2 · Wan2GP · Talking Avatars

LTX-2 by Lightricks is a serious step forward in open-source AI video generation. We're talking 4K resolution at 50 frames per second with native synchronized audio — dialogue, ambient noise, and lip-sync — all generated from a single model. And thanks to Wan2GP's low-VRAM optimizations, this runs on consumer hardware.

What Makes LTX-2 Stand Out

🎬
Native Audio Generation
Synthesizes synchronized dialogue and ambient sound — not just pixels
📺
4K @ 50fps
Production-grade resolution and frame rate from an open-source model
🗺️
Depth Map Control
Use depth maps or multiple keyframes to direct camera movement precisely
⚡
3x Faster Generation
NVFP4 and BF16 formats deliver up to 3x speed vs older video models

LTX-2 also offers two distinct generation modes: a Fast mode for quick iteration and a Pro mode for cinematic-quality output. We'll run it through Wan2GP, a local AI video tool optimized for low-VRAM consumer hardware that supports LTX-2, WAN 2.2, and Flux.

One-click installer available: Patreon members can skip the manual setup with a one-click Windows installer for Wan2GP, plus a separate ComfyUI installer with pre-built text-to-video and image-to-video workflows. Links in the video description.

Manual Install — Wan2GP with LTX-2

Requirements before starting:
  • Miniconda — isolates Python environments (search "Miniconda" → anaconda.com)
  • FFmpeg — required for video processing (ffmpeg.org)
  • Git — for cloning the repository (git-scm.com)
  • Python 3.10 — newer versions conflict with required libraries
  1. Create the Conda environment. Open Anaconda Prompt and run:
conda create -n wan2gp python=3.10 -y
conda activate wan2gp
  2. Navigate to your install folder using cd, then clone the repository and enter it:
git clone https://github.com/deepbeats/wan2gp
cd wan2gp
  3. Install base dependencies:
pip install -r requirements.txt
  4. Install PyTorch with CUDA support (Windows NVIDIA GPU — use the specific command from the video description for PyTorch 2.8 with CUDA 12.8).
  5. Install Sage Attention via pre-built wheel. This step is often the trickiest on Windows. Download the pre-built wheel from the linked GitHub repository (releases page) — match it to Python 3.10 + PyTorch 2.8 + CUDA 12.8. Drop the .whl file into your wan2gp folder, then run:
pip install [sage_attention_wheel_filename].whl
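Mismatched wheel tags are the most common failure at this step, so it is worth eyeballing the filename before installing: the cp310 tag must match Python 3.10, and the CUDA/Torch markers must match your install. A minimal sketch of that check; the filename below is an illustrative placeholder, not the real release asset name:

```shell
# Substitute the wheel you actually downloaded from the releases page.
# (Illustrative filename -- the real asset name will differ.)
wheel="sageattention-2.x.x+cu128torch2.8.0-cp310-cp310-win_amd64.whl"

# cp310 in the filename means the wheel was built for Python 3.10.
case "$wheel" in
  *cp310*) echo "Python tag OK: cp310 matches Python 3.10" ;;
  *)       echo "Python tag mismatch -- get the cp310 build" >&2; exit 1 ;;
esac
```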
  6. Install Triton for Windows using the command from the video description (a Windows-compatible alternative to the standard Triton build).
  7. Launch Wan2GP:
python wgp.py

The terminal will output a local URL — paste it into your browser to open the Wan2GP Gradio interface.

First run: Wan2GP automatically downloads LTX-2 model weights on the first generation. This initial wait is long — plan for it and make sure you have enough disk space before starting.
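On Windows you can check free space on the relevant drive in Explorer; on Linux, macOS, or WSL a one-liner from inside the wan2gp folder does the same job (the weights are a large download, so check before kicking off the first generation):

```shell
# Show free space on the filesystem holding the current directory,
# in human-readable units -- run this from your wan2gp folder.
df -h .
```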

Running the Three LTX-2 Workflows

Mode 1
Text to Video
Select "Text prompt only" — write a detailed prompt and let the model generate + add audio automatically
Mode 2
Image to Video
Select "Start video with image" — upload a source image and add the camera control LoRA for natural motion
Mode 3
Talking Avatar
Image to video + upload your own voice clip — generates a lip-synced talking head from a single photo

Text to Video

  1. In the Wan2GP UI, select LTX-2 from the model dropdown.
  2. Set Control video process to either Upload audio or Generate video based on soundtrack and text prompt.
  3. Write a highly detailed prompt — LTX-2 is a production-grade model and responds much better to descriptive, specific instructions (lighting, motion, character details). Running your draft through an LLM to expand it works well.
  4. Set resolution to 720p minimum — quality degrades heavily at lower resolutions.
  5. Click Generate. On an RTX 4090, expect 1.5–4 minutes depending on prompt complexity and frame count.

Image to Video

  1. Select Start video with image at the top of the interface.
  2. Upload your source image.
  3. Critical: Download and add the LTX2-camera-control-static LoRA to your Wan2GP loras/ folder. Without this LoRA, image-to-video generations produce endless camera zooms with little subject motion.
  4. Select the LoRA in the interface, set your strength value, and generate. Image-to-video takes slightly longer than text-to-video but gives much more compositional control.
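Since a missing LoRA file silently falls back to the zoom-heavy default motion, a quick check that the file actually landed in loras/ is cheap insurance. A sketch; the .safetensors filename here is a placeholder, so match it to whatever the download is actually called:

```shell
# Stand-in for the real download -- replace with the actual LoRA file.
mkdir -p loras
touch loras/ltx2-camera-control-static.safetensors

# The check you would run before generating:
if ls loras/ | grep -qi "camera"; then
  echo "camera-control LoRA found"
else
  echo "camera-control LoRA missing -- download it before generating"
fi
```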

Talking Avatar (Lip-Sync)

  1. Keep Start video with image selected and upload your character photo.
  2. Under Control video process, switch to Generate video based on soundtrack and text prompt.
  3. Upload your voice recording in the audio slot that appears.
  4. In your prompt, describe the character's actions. If they're speaking, include the dialogue explicitly.
  5. Match frame count to audio length — if your audio clip is 5 seconds, set frames accordingly so the video doesn't cut off mid-sentence.
  6. Make sure the camera control LoRA is still active — it's essential for talking avatar generations too.
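The frame-count rule from step 5 is simple arithmetic: frames = audio seconds × frame rate. A sketch, assuming LTX-2's native 50 fps and a 5-second clip:

```shell
# frames = audio length (seconds) * frame rate (fps)
audio_seconds=5
fps=50   # LTX-2's native frame rate
frames=$((audio_seconds * fps))
echo "set frame count to at least $frames"
```

Round up rather than down, so the video never cuts off mid-sentence.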

Advanced Settings

Advanced Mode Tab

Post-Processing / Upscaling

Available in the Post-processing tab. If you use upscaling, prefer the spatial upscaler — it produces cleaner results than the temporal upscaler.

Quantized Models for Lower VRAM

The model dropdown includes distilled, GGUF, and FP4 quantized variants. If your GPU is under 12 GB VRAM, try the GGUF version — it significantly reduces memory requirements with a modest quality trade-off.

RunPod alternative: If your local GPU is too limited (for reference, this was tested on an RTX 4050 with 6 GB of VRAM), RunPod provides RTX 4090 access for a few dollars per session to run full-quality generations.

Tips for Best Results

📦 Want to skip the setup?

The Local Lab offers pre-configured AI installer packages so you can get running in minutes, not hours.

Get the Installer →