Streaming Text-to-Speech
Stream audio in real-time as it’s generated
POST
Overview
Stream audio in real time as it’s being generated. This is ideal for live reading applications where you want to start playing audio before the entire text is processed.
Request Body
The text to convert to speech
Language code (e.g., “en”, “es”, “fr”)
Number of characters per chunk. Smaller chunks provide faster initial response but more network overhead.
Your GistMag API key
Example Request
Response
The response is a streaming MP3 audio file. Audio chunks are sent as they’re generated.
Content-Type: audio/mpeg
Credit Cost
1 credit per 1,000 characters, rounded up, with a minimum of 1 credit for any request.
Examples:
- 10 characters = 1 credit (minimum charge)
- 500 characters = 1 credit (minimum charge)
- 1,000 characters = 1 credit
- 2,500 characters = 3 credits (rounded up)
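The pricing rule above can be written as a short calculation. This is a minimal sketch based only on the rule and examples stated here; the function name `credit_cost` is illustrative and not part of the API.

```python
def credit_cost(num_chars: int) -> int:
    """Estimate credits: 1 per 1,000 characters, rounded up, minimum 1."""
    # Integer ceiling division avoids floating-point rounding surprises.
    rounded_up = (num_chars + 999) // 1000
    # Any request costs at least 1 credit.
    return max(1, rounded_up)


if __name__ == "__main__":
    for n in (10, 500, 1000, 2500):
        print(n, "characters ->", credit_cost(n), "credit(s)")
```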
How It Works
- Text Splitting: Text is automatically split into chunks at sentence boundaries (., !, ?) to maintain natural speech flow
- Sequential Processing: Each chunk is processed independently and converted to speech
- Real-time Streaming: Each audio chunk is streamed as MP3 (128k bitrate) as soon as it’s generated
- Low Latency: Client can start playing audio while remaining chunks are still being generated
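To make the text-splitting step concrete, here is a rough sketch of sentence-boundary chunking. The service's actual splitting logic is not documented here, so this is an assumption: it splits after `.`, `!`, or `?` and greedily packs whole sentences into chunks of at most `chunk_size` characters. The function name is hypothetical.

```python
import re


def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Greedily group whole sentences into chunks of at most chunk_size chars.

    Illustrative only: the real service may split differently. A single
    sentence longer than chunk_size is kept whole rather than cut mid-sentence.
    """
    # Split on whitespace that follows a sentence-ending ., !, or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > chunk_size:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("Hello there. How are you? Fine!", 30):
        print(repr(chunk))
```

With a smaller `chunk_size`, the first chunk is shorter and can be synthesized sooner, at the cost of more chunks overall, which matches the trade-off described for the chunk size parameter.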
When to Use Streaming
Use streaming when:
- You want low latency, with audio that starts playing immediately
- You’re building live reading or real-time applications
- Users need to hear audio as quickly as possible
- You’re okay with receiving multiple audio chunks that need to be combined client-side
Don’t use streaming when:
- You need a single complete file for download
- You want pauses between segments (use batch instead)
- You prefer higher quality audio (batch uses 192k vs streaming’s 128k)
Example Usage
Python
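The following is a minimal sketch of a streaming client using only the Python standard library. The endpoint URL, the `Authorization` header, and the JSON field names (`text`, `language`, `chunk_size`) are placeholders assumed for illustration; this page does not confirm them, so substitute the values from your GistMag account and the request body reference above.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the real streaming URL.
API_URL = "https://api.example.com/tts/stream"


def build_request(text, language, chunk_size, api_key):
    """Build the POST request. Field and header names are assumptions."""
    payload = json.dumps(
        {"text": text, "language": language, "chunk_size": chunk_size}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


def copy_stream(response, out, block_size=4096):
    """Write audio bytes to `out` as they arrive; return total bytes written."""
    total = 0
    while True:
        block = response.read(block_size)
        if not block:
            break
        out.write(block)
        total += len(block)
    return total


def stream_to_file(text, path, language="en", chunk_size=200,
                   api_key="YOUR_API_KEY"):
    """Send the text and append the streamed MP3 bytes to a file in order."""
    request = build_request(text, language, chunk_size, api_key)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        return copy_stream(response, out)
```

Calling `stream_to_file("Some long article text.", "speech.mp3")` would write the audio in arrival order. A live player would pass each block to an audio decoder inside `copy_stream` instead of writing it to disk.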
JavaScript
Streaming is ideal for long-form content where you want to start playback immediately rather than waiting for the entire audio to be generated.