Overview
The GistMag Text-to-Speech API converts text into natural-sounding speech using Google Cloud Text-to-Speech. It also provides Speech-to-Text transcription using the OpenAI Whisper API. The API supports multiple languages and voices, streaming, background music, and high-quality audio transcription.
Features
- Basic TTS: Convert text to speech with a simple API call
- Streaming: Stream audio in real time as it’s generated
- Batch Processing: Process long text in batches with pauses
- With Music: Generate speech with background music
- Speech-to-Text: Transcribe audio files to text
- Change Speed: Adjust playback speed of audio files
- Add Music: Add background music to existing audio
- Voices: Browse and select from Google Cloud voices
- Languages: List all supported languages
Supported Languages
The TTS API supports multiple languages, including:
- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Portuguese (pt)
- Japanese (ja)
- Korean (ko)
- Chinese (zh)
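A client may want to check a language tag locally before sending a request. The following is a minimal sketch, assuming the API accepts the two-letter codes shown above. The helper names `base_language` and `is_listed` are hypothetical and not part of the GistMag API, and because the list above is introduced with "including", the table here is likely incomplete; the Languages feature remains the authoritative source.

```python
# Languages named in this overview. The real set may be larger, so treat
# this table as an illustrative assumption, not the API's full list.
LISTED_LANGUAGES = {
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "de": "German",
    "it": "Italian",
    "pt": "Portuguese",
    "ja": "Japanese",
    "ko": "Korean",
    "zh": "Chinese",
}


def base_language(tag: str) -> str:
    """Reduce a tag such as 'pt-BR' or 'zh_CN' to its lowercase base code."""
    return tag.strip().replace("_", "-").split("-")[0].lower()


def is_listed(tag: str) -> bool:
    """Return True if the tag's base code appears in the overview's list."""
    return base_language(tag) in LISTED_LANGUAGES
```

For example, `is_listed("en-US")` reduces the tag to `en` before the lookup, so regional variants of a listed language pass the check.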
Audio Formats
- Input: Text (plain string)
- Output: WAV (uncompressed) or MP3 (compressed) audio files
- Streaming: MP3 format for efficient streaming
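Since a response may be either WAV or MP3, it can be useful to confirm what was actually received before saving or playing it. The sketch below inspects the leading bytes of the audio data. The function name `detect_audio_format` is hypothetical, and the check is a heuristic based on the general RIFF/WAVE and MPEG audio layouts, not on anything specific to this API.

```python
def detect_audio_format(data: bytes) -> str:
    """Return 'wav', 'mp3', or 'unknown' based on the leading bytes."""
    # WAV files are RIFF containers: "RIFF", a 4-byte size, then "WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 files often begin with an ID3v2 metadata tag.
    if data[:3] == b"ID3":
        return "mp3"
    # Otherwise they start with an MPEG audio frame, whose first 11 bits
    # (the frame sync) are all set. Other MPEG audio streams share this
    # pattern, so treat the result as a hint rather than proof.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```

A caller could use the result to pick a file extension, for instance writing the bytes to `speech.wav` or `speech.mp3` accordingly.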
Engines
Text-to-Speech
The API uses Google Cloud Text-to-Speech for TTS, which provides:
- High-quality, natural-sounding speech
- Multiple neural voices per language
- Control over speaking rate, pitch, and volume
- Fast, scalable generation with Google Cloud infrastructure
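To illustrate how speaking rate, pitch, and volume might be handled on the client side, here is a minimal sketch that clamps the three values before building a request body. Everything specific in it is an assumption: the field names (`text`, `language`, `speaking_rate`, `pitch`, `volume_gain_db`) are hypothetical, and the numeric limits are placeholders loosely modelled on the ranges Google Cloud documents for its audio configuration. Check both against the GistMag endpoint reference before relying on them.

```python
from dataclasses import asdict, dataclass

# Assumed limits, for illustration only. Verify against the API reference.
RATE_RANGE = (0.25, 4.0)            # 1.0 = normal speed
PITCH_RANGE = (-20.0, 20.0)         # semitones, 0.0 = unchanged
VOLUME_GAIN_DB_RANGE = (-96.0, 16.0)  # decibels, 0.0 = unchanged


def clamp(value: float, bounds: tuple) -> float:
    """Limit value to the inclusive range given by bounds."""
    low, high = bounds
    return max(low, min(high, value))


@dataclass
class VoiceSettings:
    speaking_rate: float = 1.0
    pitch: float = 0.0
    volume_gain_db: float = 0.0

    def clamped(self) -> "VoiceSettings":
        """Return a copy with every field forced into its assumed range."""
        return VoiceSettings(
            speaking_rate=clamp(self.speaking_rate, RATE_RANGE),
            pitch=clamp(self.pitch, PITCH_RANGE),
            volume_gain_db=clamp(self.volume_gain_db, VOLUME_GAIN_DB_RANGE),
        )


def build_payload(text: str, language: str, settings: VoiceSettings) -> dict:
    """Build a hypothetical JSON-ready request body for a TTS call."""
    return {"text": text, "language": language, **asdict(settings.clamped())}
```

Clamping on the client is a design choice: it avoids a round trip that would only return a validation error, at the cost of silently adjusting out-of-range input. Raising an exception instead may suit stricter callers.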
Speech-to-Text
The API uses the OpenAI Whisper API for STT, which provides:
- High-accuracy transcription
- Support for many audio formats (MP3, WAV, M4A, FLAC, OGG, etc.)
- Multi-language support with automatic language detection
- Robust handling of various accents and audio qualities