Overview

The GistMag Text-to-Speech API converts text into natural-sounding speech using Google Cloud Text-to-Speech. It also provides Speech-to-Text transcription through the OpenAI Whisper API. For speech output, the API supports multiple languages and voices, streaming, and background music; for transcription, it accepts a range of audio formats and detects the spoken language automatically.

Features

Supported Languages

The TTS API supports multiple languages, including:
  • English (en)
  • Spanish (es)
  • French (fr)
  • German (de)
  • Italian (it)
  • Portuguese (pt)
  • Japanese (ja)
  • Korean (ko)
  • Chinese (zh)
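
As a small client-side convenience, the codes listed above can be kept in a lookup table and checked before a request is sent. This is only a sketch: the names `LISTED_LANGUAGES` and `is_listed_language` are illustrative, and because the list above is introduced with "including", the service may well accept codes that this table does not contain.

```python
# Language codes listed in this document. The list is not stated to be
# exhaustive, so a code missing here is not necessarily unsupported.
LISTED_LANGUAGES = {
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "de": "German",
    "it": "Italian",
    "pt": "Portuguese",
    "ja": "Japanese",
    "ko": "Korean",
    "zh": "Chinese",
}


def is_listed_language(code: str) -> bool:
    """Return True if the code appears in the documented list.

    Comparison ignores case and surrounding whitespace.
    """
    return code.strip().lower() in LISTED_LANGUAGES
```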

Audio Formats

  • Input: Text (plain string)
  • Output: WAV (uncompressed) or MP3 (compressed) audio files
  • Streaming: MP3 format for efficient streaming
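
Because the output may be either WAV or MP3, a client that saves the audio may want to confirm which format it actually received. The sketch below guesses the format from the leading bytes of the payload, using the standard RIFF/WAVE header and the usual MP3 openings (an ID3v2 tag or an MPEG frame sync). The function name `sniff_audio_format` is illustrative and not part of the API, and the check is a heuristic rather than full validation.

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess 'wav', 'mp3', or 'unknown' from the first bytes of an audio payload."""
    # WAV files are RIFF containers: "RIFF", a 4-byte size, then "WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # Many MP3 files begin with an ID3v2 metadata tag.
    if data[:3] == b"ID3":
        return "mp3"
    # Otherwise an MPEG audio frame starts with 11 set sync bits.
    # Heuristic only: some other MPEG-style streams share this prefix.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```

The result can be used to pick a file extension before writing the bytes to disk.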

Engines

Text-to-Speech

The API uses Google Cloud Text-to-Speech for TTS, which provides:
  • High-quality, natural-sounding speech
  • Multiple neural voices per language
  • Control over speaking rate, pitch, and volume
  • Fast, scalable generation on Google Cloud infrastructure

Speech-to-Text

The API uses the OpenAI Whisper API for STT, which provides:
  • High-accuracy transcription
  • Support for many audio formats (MP3, WAV, M4A, FLAC, OGG, etc.)
  • Multi-language support with automatic language detection
  • Robust handling of various accents and audio qualities

Quick Start

curl -X POST https://api.gistmag.co.uk/tts \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of the text-to-speech API.",
    "language": "en",
    "api_key": "your_api_key_here"
  }'

The response is an audio file that you can download or play directly.
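
For callers who prefer a script to curl, the same request can be sketched in Python with only the standard library. The endpoint and JSON fields are taken from the curl example above. The function names `build_tts_request` and `synthesize`, the default output filename, the 30-second timeout, and the assumption that the response body is the raw audio bytes are all illustrative, not documented behaviour.

```python
import json
import urllib.request

TTS_URL = "https://api.gistmag.co.uk/tts"  # endpoint from the curl example


def build_tts_request(text: str, language: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {"text": text, "language": language, "api_key": api_key}
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, language: str, api_key: str,
               out_path: str = "speech_output") -> str:
    """Send the request and write the response body to out_path.

    Assumption: the response body is the audio itself. HTTP errors
    raise urllib.error.HTTPError and are left for the caller to handle.
    """
    request = build_tts_request(text, language, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path


# Example call (performs a real network request):
# synthesize("Hello, this is a test of the text-to-speech API.",
#            "en", "your_api_key_here")
```

Keeping request construction separate from sending makes it possible to inspect or log the payload without contacting the service.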