Overview

The GistMag Text-to-Speech API converts text into natural-sounding speech using Google Cloud Text-to-Speech. It also provides Speech-to-Text transcription using the OpenAI Whisper API. The API supports multiple languages and voices, streaming, background music, and high-quality audio transcription.

Features

  • Basic TTS: Convert text to speech with a simple API call
  • Streaming: Stream audio in real-time as it’s generated
  • Batch Processing: Process long text in batches with pauses
  • With Music: Generate speech with background music
  • Speech-to-Text: Transcribe audio files to text
  • Change Speed: Adjust playback speed of audio files
  • Add Music: Add background music to existing audio
  • Voices: Browse and select from Google Cloud voices
  • Languages: List all supported languages

Supported Languages

The TTS API supports multiple languages, including:
  • English (en)
  • Spanish (es)
  • French (fr)
  • German (de)
  • Italian (it)
  • Portuguese (pt)
  • Japanese (ja)
  • Korean (ko)
  • Chinese (zh)

Audio Formats

  • Input: Text (plain string)
  • Output: WAV (uncompressed) or MP3 (compressed) audio files
  • Streaming: MP3 format for efficient streaming
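Because the output can be either WAV or MP3, a client may want to check which one it received before choosing a file extension. The following is a minimal sketch that inspects the leading bytes of the audio data. The function name detect_audio_format is hypothetical and not part of the GistMag API, and the check is a heuristic based on the standard WAV (RIFF) and MP3 (ID3 tag or MPEG frame sync) signatures, not a full parser.

```python
def detect_audio_format(data: bytes) -> str:
    """Guess whether raw audio bytes are WAV or MP3 (hypothetical helper).

    Returns "wav", "mp3", or "unknown". This only looks at the first few
    bytes, so it is a heuristic rather than a validation of the whole file.
    """
    # WAV files are RIFF containers: bytes 0-3 are "RIFF", bytes 8-11 are "WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # Many MP3 files begin with an ID3v2 metadata tag.
    if data[:3] == b"ID3":
        return "mp3"
    # Otherwise an MP3 stream starts with an MPEG audio frame header,
    # whose first 11 bits are all set (0xFF followed by 0b111xxxxx).
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```

A caller could use the result to pick between a .wav and an .mp3 extension when saving a response, falling back to a generic name when the result is "unknown".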

Engines

Text-to-Speech

The API uses Google Cloud Text-to-Speech for TTS, which provides:
  • High-quality, natural-sounding speech
  • Multiple neural voices per language
  • Control over speaking rate, pitch, and volume
  • Fast, scalable generation with Google Cloud infrastructure

Speech-to-Text

The API uses the OpenAI Whisper API for STT, which provides:
  • High-accuracy transcription
  • Support for many audio formats (MP3, WAV, M4A, FLAC, OGG, etc.)
  • Multi-language support with automatic language detection
  • Robust handling of various accents and audio qualities

Quick Start

curl -X POST https://api.gistmag.co.uk/tts \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of the text-to-speech API.",
    "language": "en",
    "api_key": "your_api_key_here"
  }'

The response is an audio file that you can play directly or save to disk (for example, by adding curl's --output option to the command).
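
The same request can be sketched in Python using only the standard library. The endpoint, JSON fields, and Content-Type header are taken from the curl example; the function names build_tts_request and synthesize_to_file, the timeout value, and the assumption that the response body is the raw audio are illustrative and not confirmed by the API documentation.

```python
import json
import urllib.request

# Endpoint taken from the curl example in the Quick Start.
API_URL = "https://api.gistmag.co.uk/tts"


def build_tts_request(text: str, language: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {"text": text, "language": language, "api_key": api_key}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, language: str, api_key: str,
                       out_path: str, timeout: float = 30.0) -> int:
    """Send the request and write the response body to out_path.

    Assumes the response body is the audio file itself. Returns the number
    of bytes written. urlopen raises urllib.error.HTTPError on 4xx/5xx
    status codes, so failures are not silently saved as audio.
    """
    request = build_tts_request(text, language, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)


# Example call (performs a real network request, so it is left commented out):
# synthesize_to_file("Hello, this is a test.", "en", "your_api_key_here", "speech_output")
```

Splitting request construction from sending keeps the payload easy to inspect or log before any network call is made.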