AI Voice Cloning in ComfyUI with F5 TTS — Free ElevenLabs Alternative

Dec 2025 · 8 min read · Voice Cloning · F5 TTS · ComfyUI · Local AI

F5 TTS is an open-source text-to-speech system that can clone almost any voice from just 10 seconds of audio. Running it inside ComfyUI's node-based workflow system makes it usable without writing any code, and it runs on GPUs with as little as 6 GB of VRAM. Think of it as a free, local alternative to ElevenLabs voice cloning.

What F5 TTS Can Do

  • 🎙️ 10-Second Voice Clone: Just 10 seconds of clean audio is enough to clone a voice convincingly.
  • 🎭 Tone Variations: Clone the same voice in different emotional tones and speech styles.
  • Fast Generation: Often just seconds per generation, even on low-VRAM hardware.
  • 💾 6 GB VRAM: Runs locally on consumer GPUs; tested on an RTX 4050 with 6 GB VRAM.
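
To check a machine against the 6 GB figure above before installing anything, one option is to ask nvidia-smi for each GPU's total memory. This is a minimal sketch, assuming an NVIDIA GPU with nvidia-smi on the PATH. The helper names and the threshold constant are illustrative and are not part of F5 TTS or ComfyUI.

```python
import subprocess

# Assumption: cards sold as "6 GB" can report slightly under 6144 MiB,
# so the threshold is set a little lower than 6 * 1024.
MIN_VRAM_MIB = 6000


def parse_vram_mib(smi_output: str) -> list[int]:
    """Parse one integer (MiB) per GPU from nvidia-smi's CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_vram_mib() -> list[int]:
    """Ask nvidia-smi for the total memory of each GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_vram_mib(result.stdout)


def meets_minimum(vram_mib: list[int], minimum: int = MIN_VRAM_MIB) -> bool:
    """True if at least one GPU reports the minimum amount of memory."""
    return any(value >= minimum for value in vram_mib)
```

Calling `meets_minimum(query_vram_mib())` gives a quick yes/no. It says nothing about speed, only whether the memory figure quoted in this guide is met.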

Two Workflow Options

Basic Workflow (Free)
  • Record voice via microphone directly
  • Manual transcription required
  • Available from the ComfyUI web viewer node repository
Premium Workflow (⭐ Patreon Enhanced)
  • Upload any audio file (≤15 sec)
  • Auto-transcription via Whisper Small
  • Text file input for long scripts
  • Ollama + Gemini API text generation nodes
  • One-click Windows installer included

Manual Setup — Basic Workflow

This path sets up the basic voice recording workflow from scratch. You'll need a microphone for audio input.

Install ComfyUI (Portable Windows)

  1. Download the ComfyUI Portable ZIP from the ComfyUI releases page. Extract it with 7-Zip.
  2. Navigate into the custom_nodes folder, click the address bar, type cmd, and press Enter.
  3. Run:
     git clone https://github.com/ltdrdata/ComfyUI-Manager
  4. Navigate back to the main ComfyUI folder (the one that contains the python_embeded folder) and run the dependency install command from the written guide linked in the video description.
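
After these steps, a quick check can confirm that the folders ended up where the guide expects them. The sketch below assumes the usual portable layout, with `python_embeded` and `ComfyUI` side by side under the extracted root. The function name is hypothetical, and the folder you pass in depends on where you extracted the ZIP.

```python
from pathlib import Path


def missing_paths(portable_root: str) -> list[str]:
    """Return the expected folders that are absent under the portable root."""
    root = Path(portable_root)
    expected = [
        root / "python_embeded",  # assumed folder name in the portable build
        root / "ComfyUI" / "custom_nodes",
        root / "ComfyUI" / "custom_nodes" / "ComfyUI-Manager",  # created by git clone
    ]
    return [p.relative_to(root).as_posix() for p in expected if not p.is_dir()]
```

An empty list means all three folders were found. Any entry in the list points at the step to repeat.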

Load the Workflow

  1. Launch ComfyUI. Download the basic F5 TTS workflow file (link in video description — also available in the ComfyUI web viewer node repository's workflows folder).
  2. Drag the workflow JSON into ComfyUI. Red nodes will appear — this is normal.
  3. Open Manager → Install Missing Nodes. Install each missing node one by one, then restart ComfyUI.
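  4. After the restart, the workflow is ready. Use the audio record node to record your voice sample, type the transcript of what you said (the basic workflow requires manual transcription), and enter the text you want the cloned voice to speak.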
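
If you want to see which node types a workflow file expects before loading it, the JSON can be inspected directly. This is a rough sketch, assuming the file was saved from the ComfyUI editor and has a top-level "nodes" list whose entries carry a "type" field. The function name is hypothetical.

```python
import json
from collections import Counter


def node_type_counts(workflow_json: str) -> Counter:
    """Count how often each node type appears in an editor-saved workflow."""
    workflow = json.loads(workflow_json)
    # Assumption: editor-saved workflows keep their nodes under "nodes".
    return Counter(node["type"] for node in workflow.get("nodes", []))
```

Reading the file with `Path("workflow.json").read_text(encoding="utf-8")` and passing the result in gives a tally of node types. Types that ComfyUI does not recognise are the ones that show up red.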

Patreon Premium Workflow Setup (One-Click)

Fastest path: the installer handles ComfyUI, all custom nodes, and all required models automatically.
  1. Download the F5TTS_ComfyUI.bat file from the Patreon page and double-click it.
  2. Once installation completes, launch ComfyUI and load the enhanced workflow file.
  3. Some sections of nodes are bypassed (disabled) by default. To enable a section, hold Ctrl, select the nodes in that section, right-click, and choose Bypass, which toggles the bypass off.
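
To see which nodes are currently bypassed without clicking through the graph, the saved workflow file can be scanned. This sketch assumes the editor stores a per-node "mode" field and writes the value 4 for a bypassed node. Treat that number as an assumption and verify it against a file you saved yourself. The function and constant names are hypothetical.

```python
import json

BYPASS_MODE = 4  # assumed value the editor writes for a bypassed node


def bypassed_nodes(workflow_json: str) -> list[tuple[int, str]]:
    """Return (id, type) for every node whose mode marks it as bypassed."""
    workflow = json.loads(workflow_json)
    return [
        (node["id"], node["type"])
        for node in workflow.get("nodes", [])
        if node.get("mode", 0) == BYPASS_MODE  # nodes without a mode count as active
    ]
```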

Using the Enhanced Workflow

The premium workflow has four main sections. The first is the basic recording feature, and the other three add capabilities on top of it:

Section 1 — Microphone Recording

Record your voice sample directly in ComfyUI. Speak clearly and record in a quiet environment for best results.

Section 2 — Audio File Upload

Upload a pre-recorded audio file as your voice source. Keep clips to 15 seconds or less. The Whisper Small model transcribes the clip automatically, so you don't need to type out what was said.

Audio quality matters: The cleaner and clearer your source audio, the better the voice clone. Background noise, music, or multiple speakers will reduce clone quality.
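
The 15-second limit is easy to check before uploading. The sketch below uses Python's standard wave module, so it only handles uncompressed WAV files. Other formats would need a different library. The function names and the constant are illustrative.

```python
import wave

MAX_CLIP_SECONDS = 15.0  # the limit stated in this guide


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file: frame count divided by sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_short_enough(path: str, limit: float = MAX_CLIP_SECONDS) -> bool:
    """True if the clip is no longer than the limit."""
    return wav_duration_seconds(path) <= limit
```

If a clip is too long, trim it in an audio editor to the cleanest stretch of speech rather than the first 15 seconds.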

Section 3 — Text File Input

Instead of typing text directly, upload a .txt file and its content becomes the script your cloned voice will speak. Useful for longer content or pre-prepared scripts.
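
Scripts pasted from documents often carry stray line breaks and double spaces. Whether the text file node needs cleaned input is not stated here, so treat the following as an optional tidy-up step rather than a requirement. The function name is hypothetical.

```python
import re


def clean_script(raw: str) -> str:
    """Collapse whitespace inside paragraphs and keep one blank line between them."""
    paragraphs = re.split(r"\n\s*\n", raw.strip())
    cleaned = [" ".join(paragraph.split()) for paragraph in paragraphs]
    return "\n\n".join(paragraph for paragraph in cleaned if paragraph)
```

Writing the result with `Path("script.txt").write_text(clean_script(raw), encoding="utf-8")` produces a plain UTF-8 file ready to upload.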

Section 4 — AI-Generated Text (Ollama + Gemini)

Generate the script text with an AI model, using either the Ollama or the Gemini API text generation nodes.
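
For a sense of what asking a local Ollama server for text looks like outside ComfyUI, here is a minimal sketch using only the standard library. It is an illustration and not how the workflow's nodes are implemented. It assumes Ollama is running on its default port and that the model you name has already been pulled. The function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(prompt: str, model: str) -> bytes:
    """Encode a non-streaming generate request as JSON bytes."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def extract_text(response_body: bytes) -> str:
    """Pull the generated text out of a non-streaming reply."""
    return json.loads(response_body)["response"].strip()


def generate_script(prompt: str, model: str, url: str = OLLAMA_URL) -> str:
    """Send the prompt to the server and return the generated text."""
    request = urllib.request.Request(
        url,
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_text(reply.read())
```

A call such as `generate_script("Write a two-sentence podcast intro.", model="llama3.2")` returns text you could paste into the workflow. The model name is only an example.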

Running a Generation

  1. Choose your audio input method (record, upload, or use an existing sample).
  2. Type (or load) the text you want the cloned voice to speak.
  3. Click the Queue button. Generation typically completes in just a few seconds.
  4. Listen directly in ComfyUI using the Open Web Viewer button, or find the output file at ComfyUI/output/audio/.
Performance note: Tested and working on an RTX 4050 with 6 GB VRAM and 16 GB RAM. Generation is fast — usually just seconds per clip.
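
After a few generations the output folder fills up, and the most recent clip is usually the one you want. This small sketch picks the newest file by modification time. The folder path comes from step 4 above, and the function name is hypothetical.

```python
from pathlib import Path
from typing import Optional


def newest_output(audio_dir: str) -> Optional[Path]:
    """Return the most recently modified file in the folder, or None if it is empty.

    Raises FileNotFoundError if the folder does not exist.
    """
    files = [p for p in Path(audio_dir).iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```

For example, `newest_output("ComfyUI/output/audio")` run from the folder that contains ComfyUI returns the path of the latest clip.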

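Tips for Best Clone Quality

  • Use about 10 seconds of clean, clear audio, and keep uploaded clips to 15 seconds or less.
  • Record in a quiet environment and speak clearly.
  • Avoid source audio with background noise, music, or multiple speakers.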
