F5 TTS ComfyUI Voice Cloning — Free ElevenLabs Alternative

F5 TTS is an open-source text-to-speech system that can clone almost any voice from about 10 seconds of audio. Running it inside ComfyUI's node-based workflow system makes it usable without writing any code, and it runs on GPUs with as little as 6 GB of VRAM. Think of it as a free, local alternative to ElevenLabs voice cloning.

What F5 TTS Can Do

🎙️ 10-Second Voice Clone: Just 10 seconds of clean audio is enough to clone a voice convincingly.
🎭 Tone Variations: Clone the same voice in different emotional tones and speech styles.
⚡ Fast Generation: Often just seconds per generation, even on low-VRAM hardware.
💾 6 GB VRAM: Runs locally on consumer GPUs; tested on an RTX 4050 with 6 GB VRAM.

Two Workflow Options

Free / Basic

Basic Workflow

Record voice via microphone directly
Manual transcription required
Available from the ComfyUI web viewer node repository

⭐ Patreon Enhanced

Premium Workflow

Upload any audio file (≤15 sec)
Auto-transcription via Whisper Small
Text file input for long scripts
Ollama + Gemini API text generation nodes
One-click Windows installer included

Manual Setup — Basic Workflow

This path sets up the basic voice recording workflow from scratch. You'll need a microphone for audio input.

Install ComfyUI (Portable Windows)

Download the ComfyUI Portable ZIP from the ComfyUI releases page. Extract it with 7-Zip.
Navigate into the custom_nodes folder, click the address bar, type cmd, and press Enter.
Run:

git clone https://github.com/ltdrdata/ComfyUI-Manager

Navigate back to the main ComfyUI folder (the one that contains the python_embeded folder) and run the dependency install command from the written guide linked in the video description.

Load the Workflow

Launch ComfyUI. Download the basic F5 TTS workflow file (link in video description — also available in the ComfyUI web viewer node repository's workflows folder).
Drag the workflow JSON into ComfyUI. Red nodes will appear — this is normal.
Open Manager → Install Missing Nodes. Install each missing node one by one, then restart ComfyUI.
After the restart, the workflow is ready. Use the audio record node to record your voice sample, then enter the text you want the cloned voice to speak.

Patreon Premium Workflow Setup (One-Click)

Fastest path: The Patreon batch installer sets up ComfyUI, all custom nodes, and all required models automatically.

Download and double-click the F5TTS_ComfyUI.bat file from the Patreon page.
Once installation completes, launch ComfyUI and load the enhanced workflow file.
Some sections of nodes are bypassed (disabled) by default. To enable a section, hold Ctrl and select the nodes in that section, right-click, and choose Bypass. Bypass is a toggle, so choosing it on bypassed nodes switches them back on.

Using the Enhanced Workflow

The premium workflow has four main sections. The first is the same microphone recording found in the basic workflow; the other three add capabilities on top of it.

Section 1 — Microphone Recording

Record your voice sample directly in ComfyUI. Speak clearly and record in a quiet environment for best results.

Section 2 — Audio File Upload

Upload a pre-recorded audio file as your voice source. Keep clips to 15 seconds or less. The Whisper Small model transcribes the clip automatically, so you don't need to type out what was said.

Audio quality matters: The cleaner and clearer your source audio, the better the voice clone. Background noise, music, or multiple speakers will reduce clone quality.
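If you want to check a clip's length before uploading it, the following is a minimal sketch using only Python's standard library. It assumes the clip is an uncompressed WAV file (the `wave` module does not read MP3 or FLAC), and the helper names `wav_duration_seconds` and `is_within_limit` are made up for illustration; they are not part of the workflow or any custom node.

```python
import wave

MAX_SECONDS = 15.0  # upload limit stated in this guide


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        # duration = number of audio frames / frames per second
        return wav_file.getnframes() / wav_file.getframerate()


def is_within_limit(path, max_seconds=MAX_SECONDS):
    """True if the clip is short enough to use as a reference sample."""
    return wav_duration_seconds(path) <= max_seconds
```

For example, `is_within_limit("my_voice.wav")` returns False for a 20-second recording, which tells you to trim it before uploading.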

Section 3 — Text File Input

Instead of typing text directly, upload a .txt file and its content becomes the script your cloned voice will speak. Useful for longer content or pre-prepared scripts.
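Scripts written in a text editor often contain hard line breaks and stray blank lines. The sketch below shows one way to tidy a .txt file before loading it. Whether the workflow's text node already normalizes whitespace is not stated in this guide, so treat this as an optional pre-processing step; `load_script` is a hypothetical helper name.

```python
import re
from pathlib import Path


def load_script(path):
    """Read a UTF-8 text file and collapse runs of whitespace into single spaces."""
    text = Path(path).read_text(encoding="utf-8")
    # Newlines, tabs, and repeated spaces all become one space.
    return re.sub(r"\s+", " ", text).strip()
```

You could write the cleaned result back to a new .txt file and upload that file instead of the original.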

Section 4 — AI-Generated Text (Ollama + Gemini)

Generate the script text using an AI model:

Ollama: If Ollama is installed and running locally, the workflow auto-detects your downloaded models in a dropdown. Select a model and write a prompt.
Gemini API: Create an API key at Google AI Studio, open the config file in the Ollama/Gemini custom nodes folder, and paste the key between the quotation marks. Enable the Gemini node in the workflow, select your model, and enter your prompt.
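If the Ollama dropdown stays empty, it can help to confirm what the local Ollama server itself reports. The sketch below queries Ollama's `/api/tags` endpoint, which lists downloaded models, at the default address `localhost:11434`. This is not necessarily how the custom node detects models; it is only an independent check, and the function names are illustrative.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # Ollama's default local address


def model_names(tags_response):
    """Extract model names from a parsed /api/tags response."""
    return [entry["name"] for entry in tags_response.get("models", [])]


def list_local_models(url=OLLAMA_TAGS_URL, timeout=5):
    """Ask a running Ollama server which models are downloaded."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return model_names(json.load(response))
```

Calling `list_local_models()` raises a connection error if Ollama is not running, and returns an empty list if it is running but no models have been pulled yet.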

Running a Generation

Choose your audio input method (record, upload, or use an existing sample).
Type (or load) the text you want the cloned voice to speak.
Click the Queue button. Generation typically completes in just a few seconds.
Listen directly in ComfyUI using the Open Web Viewer button, or find the output file at ComfyUI/output/audio/.
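After several generations the output folder fills up quickly. As a small convenience, the sketch below finds the most recently written audio file in a folder. The set of file extensions is an assumption, since this guide does not say which format the workflow saves, and `newest_audio_file` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed extensions; adjust to whatever your workflow actually saves.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3"}


def newest_audio_file(output_dir):
    """Return the most recently modified audio file in output_dir, or None."""
    candidates = [
        p for p in Path(output_dir).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    ]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

Pointing it at the folder named above, `newest_audio_file("ComfyUI/output/audio")`, returns the path of your latest clip, or None if nothing has been generated yet.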

Performance note: Tested and working on an RTX 4050 with 6 GB VRAM and 16 GB RAM. Generation is fast — usually just seconds per clip.

Tips for Best Clone Quality

Use 10–15 seconds of clean, clear audio: one speaker, no background noise, no music. Short clips in this range clone faster and better than long recordings.
Record multiple tones: the same voice recorded in calm, excited, and conversational tones can be combined for more expressive outputs.
Match speaking pace: include natural speech rhythm in your reference clip, not just flat reading.

