
Microsoft VibeVoice TTS - Gradio and ComfyUI Windows One-Click Installers

Run VibeVoice locally with these two one-click Windows installer scripts. VibeVoice is Microsoft’s open-source text-to-speech model for expressive, long-form, multi-speaker audio, and the installers support both low-VRAM (1.5B model) and high-VRAM (7B model) setups.

 

There are two Windows installer scripts for different use cases: a ComfyUI installer and a Gradio installer. Both automate dependency installation and setup but differ in scope and workflow integration, so you can get started quickly whether you prefer ComfyUI’s modular, node-based interface or the simpler Gradio UI.

 

Key Differences Between the Two Installers

 

ComfyUI Installer Script

  • Sets up ComfyUI, a node-based UI for AI workflows, with the VibeVoice and ComfyUI-Manager custom nodes added.

  • Checks for and installs Git and 7-Zip, downloads the ComfyUI portable release, extracts it, clones the custom node repositories, and installs pinned Python dependencies into ComfyUI’s embedded Python environment.

  • Best suited for users who want full flexibility to build workflows in ComfyUI’s visual graph interface.

  • Installs specific package versions plus Flash Attention, inside ComfyUI’s embedded Python, for faster inference on NVIDIA CUDA GPUs.
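The ComfyUI steps above can be sketched as a dry-run command plan in Python. This is not the installer’s actual code: `comfyui_node_plan` is a hypothetical helper, the folder layout (`python_embeded\python.exe`, `ComfyUI\custom_nodes`) is assumed to follow the usual ComfyUI portable release, and the node repository URL and `requirements.txt` file name are placeholders.

```python
from pathlib import PureWindowsPath


def comfyui_node_plan(portable_root: str, node_repos: dict[str, str]) -> list[list[str]]:
    """Hypothetical helper: build (but do not run) the commands that clone
    custom nodes and install their requirements with the embedded Python.

    Assumes the standard ComfyUI portable layout and that each node
    repository ships a requirements.txt.
    """
    root = PureWindowsPath(portable_root)
    # The portable release spells this folder "python_embeded"
    python = str(root / "python_embeded" / "python.exe")
    custom_nodes = root / "ComfyUI" / "custom_nodes"

    commands: list[list[str]] = []
    for folder, url in node_repos.items():
        target = custom_nodes / folder
        commands.append(["git", "clone", url, str(target)])
        # Install into the embedded interpreter, not a system-wide Python
        commands.append(
            [python, "-m", "pip", "install", "-r", str(target / "requirements.txt")]
        )
    return commands


if __name__ == "__main__":
    plan = comfyui_node_plan(
        r"C:\AI\ComfyUI_windows_portable",
        # Placeholder URL, not the real VibeVoice node repository
        {"VibeVoice-Node": "https://example.com/placeholder/VibeVoice-Node.git"},
    )
    for cmd in plan:
        print(" ".join(cmd))
```

Returning argument lists keeps the sketch free of side effects; a real script could pass each list to `subprocess.run(cmd, check=True)` so that any failing step stops the install.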

 

Gradio Installer Script

  • Sets up a Miniconda environment configured with Python 3.10, installs Git, clones the VibeVoice repository, and installs dependencies including PyTorch with CUDA support.

  • Creates launch batch files that run VibeVoice in the Gradio web UI with either the 1.5B model (low VRAM) or the 7B model (high VRAM).

  • Ideal for users who want a quick start with VibeVoice’s Gradio interface for voice synthesis, without building workflows manually.

  • Includes an update script that pulls the latest changes and reinstalls dependencies.
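The Gradio installer’s described steps can be sketched the same way, as a hypothetical dry-run plan. The helper name, the `cu121` PyTorch wheel index, and the assumption that the cloned repository is pip-installable (`pip install -e`) are illustrative choices, not details taken from the actual script; the repository URL is left as a parameter.

```python
def gradio_setup_plan(
    env_name: str, repo_url: str, repo_dir: str, cuda_tag: str = "cu121"
) -> list[list[str]]:
    """Hypothetical helper: return (without running) commands mirroring the
    Gradio installer's steps: conda env, clone, CUDA PyTorch, repo deps."""
    # Official PyTorch wheel index for a given CUDA build (tag is an assumption)
    index_url = f"https://download.pytorch.org/whl/{cuda_tag}"
    in_env = ["conda", "run", "-n", env_name, "python", "-m", "pip", "install"]
    return [
        # Isolated Miniconda environment pinned to Python 3.10
        ["conda", "create", "-n", env_name, "python=3.10", "-y"],
        ["git", "clone", repo_url, repo_dir],
        # CUDA-enabled PyTorch inside the new environment
        in_env + ["torch", "--index-url", index_url],
        # Install the cloned repository and its declared dependencies
        in_env + ["-e", repo_dir],
    ]
```

Running every `pip` call through `conda run -n <env>` keeps packages inside the isolated environment, which is the point of using Miniconda instead of a system Python.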

 

What’s Included

  • Fully automated environment setup: dependency installs, repository clones, and model downloads where applicable.

  • Preconfigured scripts for launching VibeVoice either within ComfyUI’s node architecture or via Gradio UI.

  • Dependency management tailored to each workflow: embedded Python for ComfyUI vs. isolated Miniconda for Gradio.

  • Support for Flash Attention CUDA extensions on NVIDIA GPUs for faster inference.

 

System Requirements

  • Windows OS

  • NVIDIA GPU with CUDA support (RTX 30-series or newer preferred)

  • At least 40 GB free disk space

  • Internet connection for dependency downloads and repo cloning
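A quick preflight check against these requirements might look like the sketch below. The `preflight` function is hypothetical and not part of either installer; it uses `nvidia-smi` being on PATH only as a rough proxy for an installed NVIDIA driver, and measures free space in GiB.

```python
import shutil

MIN_FREE_GB = 40  # from the System Requirements list


def preflight(install_dir: str = ".", min_free_gb: float = MIN_FREE_GB) -> dict[str, bool]:
    """Hypothetical check: report whether basic prerequisites look satisfied."""
    free_gb = shutil.disk_usage(install_dir).free / 1024**3  # bytes -> GiB
    return {
        "disk_space": free_gb >= min_free_gb,
        "git": shutil.which("git") is not None,
        # 7-Zip's command-line tool is 7z.exe; it is often not on PATH by default
        "7zip": shutil.which("7z") is not None,
        # Installed with the NVIDIA driver; a rough sign a CUDA-capable GPU is set up
        "nvidia_smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```

A missing Git or 7-Zip is not fatal for the ComfyUI installer, which installs them itself; the check just shows what it will need to fetch.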

 

Usage Notes

 

Gradio UI Tips:
Once launched via the start script, the Gradio UI lets you choose the number of speakers and pick or add voice samples (drop short wav/mp3 clips into the VibeVoice/demo/voices folder and restart Gradio). The CFG (classifier-free guidance) slider controls how closely generation follows the text and voice characteristics; values around 1.30 to 1.35 tend to work well.
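To show what the CFG value does, here is a toy sketch of standard classifier-free guidance, assuming VibeVoice’s diffusion head combines predictions in the usual way. It works on plain Python lists; a real model applies this to tensors at each denoising step. `apply_cfg` is a hypothetical name.

```python
def apply_cfg(cond: list[float], uncond: list[float], cfg_scale: float = 1.3) -> list[float]:
    """Standard classifier-free guidance: start from the unconditional
    prediction and move toward (and, for scale > 1, past) the conditional one."""
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


print(apply_cfg([1.0, 0.5], [0.0, 0.5], 1.3))  # prints [1.3, 0.5]
```

A scale of 1.0 returns the conditional prediction unchanged; higher values push harder toward the text and voice conditioning, which is why small changes around 1.3 are noticeable.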

 

ComfyUI Workflow Tips:
Upload a short audio clip (mp3 or wav), write your script in the VibeVoice node, then adjust the settings as needed. The node supports multi-speaker dialogue generation inside ComfyUI’s modular interface for flexible, fine-tuned audio creation.

 

Support and More Information

  • Community Support: For troubleshooting or to connect with other users, join the Discord server.

  • YouTube Video Tutorial - https://youtu.be/3kNI_kc78S8

  • Buy on Patreon

    While I improve the store, you can purchase these items or sign up for a membership on Patreon: https://www.patreon.com/TheLocalLab

Price: $5.00