Text To Speech Dataset Creator Gradio UI
I'm excited to announce the release of my local TTS Dataset Maker—a tool with an intuitive Gradio UI designed to help you build high-quality Parquet TTS datasets for training or fine-tuning text-to-speech models.
What Does the Project Do?
Upload your own clear, high-quality audio files of your target voice.
Preview and play each audio file to verify your selections.
Select which files to include in your dataset.
With a single click, the tool:
Cuts selected audio into 10-second clips.
Transcribes each clip using a local Whisper small model.
Packages the transcriptions and audio into a Parquet dataset, ready for use in TTS model training (e.g., for uploading to Hugging Face).
Instantly analyze your newly created dataset in Renumics Spotlight, which lets you explore both structured and unstructured data directly in your browser.
Browse and review previously created datasets within the same Spotlight interface.
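For readers curious about what the one-click step involves, the cut, transcribe, and package stages can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: the function names, the "audio" and "text" column names, and the WAV-only input are all hypothetical, and `transcribe` is a stand-in for whatever wraps the local Whisper small model.

```python
import io
import wave

CLIP_SECONDS = 10  # clip length stated in the post


def split_wav(path, clip_seconds=CLIP_SECONDS):
    """Yield consecutive clips of a WAV file as complete WAV-encoded bytes."""
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_clip = src.getframerate() * clip_seconds
        while True:
            frames = src.readframes(frames_per_clip)
            if not frames:
                break  # end of file; the last clip may be shorter
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                dst.setparams(params)  # frame count is corrected on close
                dst.writeframes(frames)
            yield buf.getvalue()


def build_records(paths, transcribe):
    """Pair every clip with its transcript.

    `transcribe` is any callable mapping WAV bytes to text (hypothetical
    stand-in for a local Whisper small model).
    """
    records = []
    for path in paths:
        for clip in split_wav(path):
            records.append({"audio": clip, "text": transcribe(clip)})
    return records


def save_parquet(records, out_path):
    """Write the records to a Parquet file (needs pandas and pyarrow)."""
    import pandas as pd  # imported lazily so the rest runs without it

    pd.DataFrame(records).to_parquet(out_path, index=False)
```

Passing the transcriber in as a callable keeps the splitting logic independent of any particular Whisper package, so the same sketch works whether transcription runs on CPU or GPU.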
Key Features
Local Processing: No cloud upload required—your data stays on your machine.
Gradio Interface: Simple, user-friendly controls for every step of the dataset creation process.
Automated Transcription: Integrated Whisper small model for fast, accurate local transcription.
Flexible Dataset Management: Easily manage, preview, and analyze both new and existing datasets.
Parquet Output: Datasets are saved in the efficient Parquet format, compatible with Hugging Face and other TTS training pipelines.
Integrated Analysis: Built-in support for Renumics Spotlight for dataset exploration.
Requirements
Windows PC (installer packages for both CPU and Nvidia GPU setups are included for top-tier Patreon members)
Sufficient disk space for audio files and generated datasets
No prior coding experience required
Installation Instructions
Download the appropriate ZIP file for your system (CPU or Nvidia GPU) from the Patreon post.
Extract the ZIP file to your preferred location.
Run start.bat to launch the TTS Dataset Creator.
Follow the on-screen instructions in the Gradio UI to upload audio, select files, and create your dataset.
Once your dataset is generated, use the integrated Spotlight window to analyze and explore your data.
I've dropped a video on my YouTube channel showing the tool in use, which you can watch here - https://youtu.be/NeWJTd9uIDE.
This tool is available to my highest-tier Patreon members, but you can also purchase the one-click package here - https://www.patreon.com/posts/text-to-speech-130323820.
If you have any questions, just leave a comment in my Discord channel here - https://discord.gg/5hmB4N4JFc.
Buy On Patreon
While I improve the store, you can purchase these items or sign up for a membership on Patreon - https://www.patreon.com/TheLocalLab.