LTX-2 by Lightricks is a serious step forward in open-source AI video generation. We're talking 4K resolution at 50 frames per second with native synchronized audio — dialogue, ambient noise, and lip-sync — all generated from a single model. And thanks to Wan2GP's low-VRAM optimizations, this runs on consumer hardware.
What Makes LTX-2 Stand Out
LTX-2 offers two distinct generation modes: a Fast mode for quick iteration and a Pro mode for cinematic-quality output. We'll run it through Wan2GP, a local AI video tool optimized for low-VRAM consumer hardware that supports LTX-2, WAN 2.2, and Flux.
Manual Install — Wan2GP with LTX-2
- Miniconda — isolates Python environments (download from anaconda.com)
- FFmpeg — required for video processing (ffmpeg.org)
- Git — for cloning the repository (git-scm.com)
- Python 3.10 — newer versions conflict with required libraries
- Create the Conda environment. Open Anaconda Prompt and run:
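A minimal sketch of this step (the environment name wan2gp is illustrative; pick any name you like):

```shell
# Create an isolated Python 3.10 environment and activate it.
conda create -n wan2gp python=3.10 -y
conda activate wan2gp
```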
- Navigate to your install folder using cd, then clone the repository:
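For example, assuming the Wan2GP repository lives at github.com/deepbeepmeep/Wan2GP (verify the URL against the one linked in the video description):

```shell
# Clone the repository and move into it.
git clone https://github.com/deepbeepmeep/Wan2GP.git
cd Wan2GP
```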
- Install base dependencies:
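With the repository checked out, the base dependencies install from the project's requirements file:

```shell
# Install the Python packages Wan2GP depends on.
pip install -r requirements.txt
```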
- Install PyTorch with CUDA support (Windows NVIDIA GPU — use the specific command from the video description for PyTorch 2.8 with CUDA 12.8).
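The exact command is in the video description; for reference, the standard PyTorch index-URL pattern for a CUDA 12.8 build looks like this (the versions here are assumptions to verify before running):

```shell
# Illustrative only: install a CUDA 12.8 build of PyTorch 2.8 from the
# official PyTorch wheel index. Confirm exact versions first.
pip install torch==2.8.0 torchvision --index-url https://download.pytorch.org/whl/cu128
```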
- Install Sage Attention via pre-built wheel. This step is often the trickiest on Windows. Download the pre-built wheel from the linked GitHub repository (releases page) — match it to Python 3.10 + PyTorch 2.8 + CUDA 12.8. Drop the .whl file into your Wan2GP folder, then run:
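The wheel filename encodes the Python, PyTorch, and CUDA versions it was built for; the name below is a placeholder, so substitute the file you actually downloaded:

```shell
# Placeholder filename: replace <version> with the release you downloaded.
# cp310 = Python 3.10, win_amd64 = 64-bit Windows.
pip install sageattention-<version>-cp310-cp310-win_amd64.whl
```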
- Install Triton for Windows using the command from the video description (a Windows-compatible alternative to the standard Triton build).
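There is a community-maintained Windows build of Triton on PyPI; treating it as the package the video intends is an assumption, so cross-check against the description's command:

```shell
# triton-windows: community Windows port of Triton
# (verify against the command given in the video description).
pip install triton-windows
```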
- Launch Wan2GP:
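Assuming wgp.py is the entry script in the cloned repository (check the repo's README if your checkout differs), launching looks like:

```shell
# Start the Gradio server; it prints a local URL
# (Gradio's default is http://127.0.0.1:7860).
python wgp.py
```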
The terminal will output a local URL — paste it into your browser to open the Wan2GP Gradio interface.
Running the Three LTX-2 Workflows
Text to Video
- In the Wan2GP UI, select LTX-2 from the model dropdown.
- Set Control video process to either Upload audio or Generate video based on soundtrack and text prompt.
- Write a highly detailed prompt — LTX-2 is a production-grade model and responds much better to descriptive, specific instructions (lighting, motion, character details). Running your draft through an LLM to expand it works well.
- Set resolution to 720p minimum — quality degrades heavily at lower resolutions.
- Click Generate. On an RTX 4090, expect 1.5–4 minutes depending on prompt complexity and frame count.
Image to Video
- Select Start video with image at the top of the interface.
- Upload your source image.
- Critical: Download and add the LTX2-camera-control-static LoRA to your Wan2GP loras/ folder. Without this LoRA, image-to-video generations produce endless camera zooms with little subject motion.
- Select the LoRA in the interface, set your strength value, and generate. Image-to-video takes slightly longer than text-to-video but gives much more compositional control.
Talking Avatar (Lip-Sync)
- Keep Start video with image selected and upload your character photo.
- Under Control video process, switch to Generate video based on soundtrack and text prompt.
- Upload your voice recording in the audio slot that appears.
- In your prompt, describe the character's actions. If they're speaking, include the dialogue explicitly.
- Match frame count to audio length — if your audio clip is 5 seconds, set frames accordingly so the video doesn't cut off mid-sentence.
- Make sure the camera control LoRA is still active — it's essential for talking avatar generations too.
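As a sanity check on the frame math above (assuming the 50 fps output mentioned at the top; confirm the frame rate your chosen preset actually uses):

```shell
# Frames needed so the video covers the whole audio clip.
audio_seconds=5   # length of your voice recording
fps=50            # assumed output frame rate
frames=$(( audio_seconds * fps ))
echo "$frames"    # prints 250
```

Round up if your clip length isn't a whole number of seconds, so the video never cuts off mid-sentence.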
Advanced Settings
Advanced Mode Tab
- CFG scale, seed, frame count — accessible in advanced mode. Be careful with frame count — higher = longer video but much longer generation time.
- LoRAs — drop .safetensors files into the loras/ folder, click Refresh, then select and set strength in the UI.
Post-Processing / Upscaling
Available in the Post-processing tab. If you use upscaling, prefer the spatial upscaler — it produces cleaner results than the temporal upscaler.
Quantized Models for Lower VRAM
The model dropdown includes distilled, GGUF, and FP4 quantized variants. If your GPU is under 12 GB VRAM, try the GGUF version — it significantly reduces memory requirements with a modest quality trade-off.
Tips for Best Results
- Detailed prompts are essential. LTX-2 rewards specificity — lighting, movement direction, subject description, atmosphere. Expand a brief idea with an LLM before generating.
- Always add the camera control LoRA for image-to-video. Without it, subjects barely move and the camera zooms endlessly.
- Match frame count to audio clip length for talking avatar workflows.
- 720p is the minimum viable resolution — quality drops sharply below this.
- Use spatial upscaling in post-processing, not temporal, for cleaner output.
📦 Want to skip the setup?
The Local Lab offers pre-configured AI installer packages so you can get up and running in minutes, not hours.
Get the Installer →