Overview
Generate speech that mimics a specific voice by providing a reference audio sample. This allows you to create custom voices for your applications.
Request Body
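This endpoint accepts a multipart/form-data request body with the following fields.
reference_audio (file): Reference audio file containing the voice to clone, ideally 3-10 seconds of clear speech. WAV is recommended; see Reference Audio Requirements.
text (string): The text to convert to speech using the cloned voice.
language (string): Language code, e.g. "en", "es", or "fr".
api_key (string): Your API key, sent as a form field as in the examples below.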
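Each field travels as its own part of a single multipart/form-data body. To make that wire format concrete, here is a minimal sketch that assembles the request with only the Python standard library. The field names match the examples on this page; the helper name, the `audio/wav` part type, and the `reference.wav` default filename are illustrative assumptions, and the function only builds the request without sending it.

```python
import urllib.request
import uuid


def build_voice_clone_request(
    url: str,
    reference_audio: bytes,
    text: str,
    language: str,
    api_key: str,
    filename: str = "reference.wav",
) -> urllib.request.Request:
    """Assemble (but do not send) a multipart/form-data POST.

    Hypothetical helper: field names follow this page's examples; the
    audio/wav part type for the file field is an assumption.
    """
    boundary = uuid.uuid4().hex.encode()
    crlf = b"\r\n"
    parts = []

    # Plain text fields: one part each, with no filename.
    for name, value in (("text", text), ("language", language), ("api_key", api_key)):
        parts.append(
            b"--" + boundary + crlf
            + f'Content-Disposition: form-data; name="{name}"'.encode() + crlf
            + crlf
            + value.encode("utf-8") + crlf
        )

    # File field: carries a filename and its own Content-Type.
    parts.append(
        b"--" + boundary + crlf
        + f'Content-Disposition: form-data; name="reference_audio"; filename="{filename}"'.encode()
        + crlf
        + b"Content-Type: audio/wav" + crlf
        + crlf
        + reference_audio + crlf
    )

    # The closing boundary has two trailing dashes.
    body = b"".join(parts) + b"--" + boundary + b"--" + crlf
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "multipart/form-data; boundary=" + boundary.decode()},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. In practice, `requests` builds an equivalent body from its `files` and `data` arguments, as the Python example below shows.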
Example Request
curl -X POST https://api.gistmag.co.uk/tts/voice-clone \
-F "reference_audio=@reference.wav" \
-F "text=Hello, this is a test using a cloned voice." \
-F "language=en" \
-F "api_key=your_api_key_here" \
--output output.wav
Reference Audio Requirements
- Format: WAV (recommended) or MP3
- Duration: 3-10 seconds
- Quality: Clear speech, minimal background noise
- Content: Should contain natural speech in the target language
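The duration and format requirements above can be checked locally before uploading. The sketch below is a hypothetical pre-flight check using Python's standard-library `wave` module. It assumes an uncompressed PCM WAV file (an MP3 reference would need a different reader), takes the 3-10 second bounds from this page, and does not replace whatever validation the API performs server-side; clarity and background noise cannot be judged this way.

```python
import wave

# Bounds taken from the Reference Audio Requirements on this page.
MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def wav_duration_seconds(path: str) -> float:
    """Return the playback length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_audio(path: str) -> list[str]:
    """Hypothetical client-side check; an empty list means the basic checks passed."""
    try:
        duration = wav_duration_seconds(path)
    except (wave.Error, EOFError) as exc:
        # Not a RIFF/WAVE file, or truncated before the header was complete.
        return [f"not a readable PCM WAV file: {exc}"]
    problems = []
    if duration < MIN_SECONDS:
        problems.append(f"too short: {duration:.2f}s (minimum {MIN_SECONDS}s)")
    elif duration > MAX_SECONDS:
        problems.append(f"too long: {duration:.2f}s (maximum {MAX_SECONDS}s)")
    return problems
```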
Only clone a voice if you have permission to use the reference audio for that purpose, and comply with applicable privacy and copyright laws.
Response
On success, the response body is a WAV audio file containing the generated speech in the cloned voice, sent with these headers:
Content-Type: audio/wav
Content-Disposition: attachment; filename=output.wav
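The examples on this page write the response body straight to `output.wav`, so a failed request could leave a non-audio file behind. A minimal guard, assuming only the headers documented above, is to confirm the media type and the RIFF/WAVE header before saving. The helper name is hypothetical, accepting the `audio/x-wav` and `audio/wave` aliases is a defensive assumption, and the error response format is not documented here.

```python
import os
from email.message import Message

# audio/wav is documented above; the two aliases are a defensive assumption.
WAV_MEDIA_TYPES = {"audio/wav", "audio/x-wav", "audio/wave"}


def wav_filename_from_response(content_type: str, content_disposition: str, body: bytes) -> str:
    """Hypothetical check: validate a voice-clone response, return a safe filename.

    Raises ValueError if the response does not look like WAV audio.
    """
    # Compare the media type only, ignoring any parameters after ';'.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in WAV_MEDIA_TYPES:
        raise ValueError(f"expected audio/wav, got {content_type!r}")
    # A WAV file starts with 'RIFF', a 4-byte size, then the 'WAVE' format tag.
    if len(body) < 12 or body[:4] != b"RIFF" or body[8:12] != b"WAVE":
        raise ValueError("body does not start with a RIFF/WAVE header")
    # email.message parses the filename parameter of Content-Disposition.
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    filename = msg.get_filename()
    # Keep only the last path component so a crafted header cannot escape the directory.
    return os.path.basename(filename) if filename else "output.wav"
```

With `requests`, call it as `wav_filename_from_response(response.headers.get("Content-Type", ""), response.headers.get("Content-Disposition", ""), response.content)` and write the body only if no `ValueError` is raised.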
Example Usage
Python
import requests

with open("reference.wav", "rb") as f:
    files = {"reference_audio": f}
    data = {
        "text": "Hello, this is a test.",
        "language": "en",
        "api_key": "your_api_key_here",
    }
    # requests encodes files + data as multipart/form-data
    response = requests.post(
        "https://api.gistmag.co.uk/tts/voice-clone",
        files=files,
        data=data,
    )

# Raise on HTTP errors instead of saving an error body as audio
response.raise_for_status()

with open("output.wav", "wb") as f:
    f.write(response.content)
JavaScript
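// referenceAudioFile is a File or Blob, e.g. from an <input type="file"> element
const formData = new FormData();
formData.append('reference_audio', referenceAudioFile);
formData.append('text', 'Hello, this is a test.');
formData.append('language', 'en');
formData.append('api_key', 'your_api_key_here');

const response = await fetch('https://api.gistmag.co.uk/tts/voice-clone', {
  method: 'POST',
  body: formData
});
if (!response.ok) {
  throw new Error(`Voice clone request failed with status ${response.status}`);
}

// Wrap the returned WAV in an object URL, e.g. for an <audio> element
const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);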
Streaming with Voice Clone
For streaming audio with voice cloning, use the /tts/stream/voice-clone endpoint.
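Streaming lets you write audio as it arrives instead of buffering the whole file in memory. The sketch below assumes the streaming endpoint accepts the same form fields as /tts/voice-clone and returns the audio as a chunked response body; neither the field list nor the chunk format (a WAV stream versus raw PCM) is specified on this page. The helper name is hypothetical, and it works on any iterable of byte chunks.

```python
from collections.abc import Iterable


def save_audio_stream(chunks: Iterable[bytes], path: str) -> int:
    """Hypothetical helper: write byte chunks to path as they arrive.

    The streaming response's chunk format is not documented here, so the
    bytes are written as-is. Returns the number of bytes written.
    """
    written = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            if not chunk:  # skip empty keep-alive chunks
                continue
            out.write(chunk)
            written += len(chunk)
    return written
```

With `requests`, passing `stream=True` to `requests.post` and handing `response.iter_content(chunk_size=8192)` to `save_audio_stream` writes the audio incrementally; call `response.raise_for_status()` first so an error body is not saved as audio.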
Voice cloning works best with clear, high-quality reference audio: the model derives the voice characteristics from the reference sample, so the quality of that sample directly affects the result.