Microsoft VibeVoice TTS - Gradio and ComfyUI One-Click Windows Installers
Run VibeVoice locally with these two one-click Windows installer scripts. VibeVoice is a state-of-the-art voice synthesis model from Microsoft, available in a 1.5B variant for low-VRAM GPUs and a 7B variant for high-VRAM GPUs.
The two installers target different use cases: the ComfyUI installer and the Gradio installer. Both automate dependency installation and setup, but they differ in scope and workflow integration, so you can get started quickly whether you prefer ComfyUI’s modular node-based interface or the simpler Gradio UI.
Key Differences Between the Two Installers
ComfyUI Installer Script
Sets up ComfyUI, a node-based UI for AI workflows, with the VibeVoice and ComfyUI-Manager custom nodes installed.
Checks for Git and 7-Zip (installing them if missing), downloads and extracts the ComfyUI portable release, clones the custom node repositories, and installs the required Python dependency versions into ComfyUI’s embedded Python environment.
Best suited for users who want full flexibility to build their own workflows in ComfyUI’s visual graph interface.
Installs specific package versions and Flash Attention for faster CUDA inference on NVIDIA GPUs within ComfyUI’s embedded Python.
Gradio Installer Script
Sets up a Miniconda environment configured with Python 3.10, installs Git, clones the VibeVoice repository, and installs dependencies including PyTorch with CUDA support.
Creates separate launch batch files for running VibeVoice in the Gradio web UI with either the low-VRAM (1.5B) or the high-VRAM (7B) model.
Ideal for users seeking a quick start with VibeVoice’s easy-to-use Gradio interface for voice synthesis without manually configuring workflows.
Includes an update script that pulls the latest changes from the VibeVoice repository and reinstalls dependencies.
What’s Included
Fully automated environment setup: dependency installation, repository cloning, and model downloads where applicable.
Preconfigured scripts for launching VibeVoice either inside ComfyUI’s node-based interface or through the Gradio UI.
Dependency management tailored to each workflow: embedded Python for ComfyUI vs. isolated Miniconda for Gradio.
Support for CUDA-accelerated Flash Attention on NVIDIA GPUs, which speeds up attention computation and reduces its memory use.
System Requirements
Windows OS
NVIDIA GPU with CUDA support (RTX 30-series or newer preferred)
At least 40 GB of free disk space
Internet connection for dependency downloads and repo cloning
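Before running either installer, a quick pre-flight check of the requirements above can save a failed install. The sketch below is a hypothetical helper, not part of either installer script: it assumes Python is already available, treats the presence of `nvidia-smi` on PATH as a rough signal that an NVIDIA GPU and driver are installed, and measures free space in GiB. It does not test the internet connection.

```python
import platform
import shutil

MIN_FREE_GB = 40  # minimum free disk space from the System Requirements list


def preflight(install_dir: str = ".", min_free_gb: float = MIN_FREE_GB) -> dict:
    """Hypothetical pre-install check; not part of either installer script."""
    # Free space on the drive holding install_dir, converted to GiB.
    free_gb = shutil.disk_usage(install_dir).free / 1024**3
    return {
        # platform.system() reports "Windows" on Windows hosts.
        "windows": platform.system() == "Windows",
        # nvidia-smi ships with the NVIDIA driver, so finding it on PATH is a
        # rough signal (not proof) that an NVIDIA GPU and driver are present.
        "nvidia_smi": shutil.which("nvidia-smi") is not None,
        "free_gb": round(free_gb, 1),
        "enough_disk": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    # Print one line per check, e.g. "enough_disk: True".
    for check, result in preflight().items():
        print(f"{check}: {result}")
```

Run it from the folder where you plan to install, so the disk-space check measures the right drive.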
Usage Notes
Gradio UI Tips:
After launching with one of the start scripts, the Gradio UI lets users choose the number of speakers and select voice samples. To add a new voice, drop a short WAV or MP3 clip into the VibeVoice/demo/voices folder and restart Gradio. The CFG (classifier-free guidance) scale slider controls how closely generation follows the text and the reference voice; values around 1.30 to 1.35 generally work well.
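The voice-folder tip above can be scripted. The sketch below is hypothetical and not part of the installer: it assumes the folder path VibeVoice/demo/voices is relative to the directory the Gradio installer was run from, and that only .wav and .mp3 files are picked up, as the tip states. It copies a clip into the existing folder and lists the clips Gradio should offer after a restart.

```python
import shutil
from pathlib import Path

# Formats named in the Gradio UI tip.
AUDIO_EXTS = {".wav", ".mp3"}
# Assumed location, relative to the folder the Gradio installer was run from.
VOICES_DIR = Path("VibeVoice/demo/voices")


def add_voice_sample(clip: str, voices_dir: Path = VOICES_DIR) -> Path:
    """Copy a WAV/MP3 clip into the voices folder; restart Gradio afterwards."""
    src = Path(clip)
    if src.suffix.lower() not in AUDIO_EXTS:
        raise ValueError(f"expected a .wav or .mp3 file, got {src.name}")
    # Refuse to create the folder: a missing folder usually means VibeVoice
    # is not installed at this path.
    if not voices_dir.is_dir():
        raise FileNotFoundError(f"voices folder not found: {voices_dir}")
    return Path(shutil.copy2(src, voices_dir / src.name))


def list_voice_samples(voices_dir: Path = VOICES_DIR) -> list[str]:
    """Return the audio clips currently in the voices folder, sorted by name."""
    if not voices_dir.is_dir():
        return []
    return sorted(
        p.name for p in voices_dir.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTS
    )
```

For example, `add_voice_sample("my_voice.wav")` followed by `list_voice_samples()` confirms the clip is in place before you restart Gradio.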
ComfyUI Workflow Tips:
Upload a short audio clip (MP3 or WAV), write your text prompt in the VibeVoice node, then adjust the node settings as needed to customize the generated voice. The node also supports multi-speaker dialogue generation, so you can build and fine-tune conversations within ComfyUI’s modular interface.
Support and More Information
Community Support: For troubleshooting or to connect with other users, join the Discord server.
YouTube Video Tutorial - https://youtu.be/3kNI_kc78S8
Buy on Patreon
While I improve the store, you can purchase these items or sign up for a membership on Patreon - https://www.patreon.com/TheLocalLab.