Audio Processing Models

We provide several popular audio processing models for scenarios such as TTS (Text To Speech) and ASR (Automatic Speech Recognition) / STT (Speech To Text).

Before you start

Before you begin, you need three parameters: SERVICE_ID, API_KEY, and MODEL. You can find them on our dashboard.
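As a minimal sketch of how these three values fit together, the snippet below reads them from environment variables and builds the base URL that the usage examples on this page pass to the client. The environment variable names, the fallback values, and the helper `build_base_url` are assumptions made for illustration; only the URL pattern is taken from the usage examples.

```python
import os

# URL pattern taken from the usage examples on this page.
BASE_URL_TEMPLATE = "https://modelapi.holmesai.xyz/{service_id}/v1"


def build_base_url(service_id: str) -> str:
    """Return the API base URL for a SERVICE_ID (hypothetical helper)."""
    service_id = service_id.strip()
    if not service_id:
        raise ValueError("SERVICE_ID must not be empty")
    return BASE_URL_TEMPLATE.format(service_id=service_id)


# Assumption: the dashboard values are exported under these variable names.
# The fallbacks are placeholders so the sketch runs without any setup.
SERVICE_ID = os.environ.get("SERVICE_ID") or "my-service-id"
API_KEY = os.environ.get("API_KEY") or "my-api-key"
MODEL = os.environ.get("MODEL") or "fish-speech-1.4"

print(build_base_url(SERVICE_ID))
```

Keeping the key in the environment rather than in source code also avoids committing it by accident.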

API reference

TTS

Supported models

  • fish-speech-1.4

/v1/audio/speech

Request Parameters

| Parameter name | Type | Description | Required |
| --- | --- | --- | --- |
| model | String | Model to use. Supported model for this endpoint: fish-speech-1.4 | Yes |
| input | String | The text to generate audio for. Maximum length is 4096. | Yes |
| response_format | String | Audio format: mp3, wav, or pcm. Default: wav | No |
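The constraints in this table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name `build_speech_payload` is hypothetical, and it assumes that the 4096 limit is counted in characters.

```python
import json

MAX_INPUT_LENGTH = 4096                  # from the parameter table
ALLOWED_FORMATS = ("mp3", "wav", "pcm")  # from the parameter table


def build_speech_payload(model: str, text: str, response_format: str = "wav") -> dict:
    """Validate the arguments and return the JSON request body (hypothetical helper)."""
    if not text:
        raise ValueError("input must not be empty")
    # Assumption: the documented maximum length is measured in characters.
    if len(text) > MAX_INPUT_LENGTH:
        raise ValueError(
            f"input is {len(text)} characters; the maximum is {MAX_INPUT_LENGTH}"
        )
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"response_format must be one of {ALLOWED_FORMATS}")
    return {"model": model, "input": text, "response_format": response_format}


payload = build_speech_payload(
    "fish-speech-1.4", "The quick brown fox jumped over the lazy dog."
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary has the same shape as the JSON body sent by the curl example for /v1/audio/speech.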

Response Elements

| Parameter name | Type | Description |
| --- | --- | --- |
| - | - | Audio file content (returned as the response body) |
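Because the response is raw audio rather than JSON, a quick sanity check on the returned bytes can catch an error page that was saved as audio by mistake. The sketch below assumes the default wav format with PCM samples; both helper names are hypothetical, and the sample bytes are generated locally as a stand-in for a real response body.

```python
import io
import wave


def looks_like_wav(data: bytes) -> bool:
    """Return True if the bytes start with a RIFF/WAVE header."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of PCM WAV data using the standard-library wave module."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Stand-in for a real response body: 0.5 s of 16-bit mono silence at 16 kHz.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 8000)
audio_bytes = buffer.getvalue()

print(looks_like_wav(audio_bytes), wav_duration_seconds(audio_bytes))
```

For mp3 or pcm output the header check does not apply, since raw PCM has no header at all.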

ASR

Supported models

  • fish-speech-1.4
  • Whisper-large-v3
  • Whisper-large-v3-turbo

/v1/audio/transcriptions

Request Parameters

| Parameter name | Type | Description | Required |
| --- | --- | --- | --- |
| model | String | Model to use: fish-speech-1.4, Whisper-large-v3, or Whisper-large-v3-turbo | Yes |
| file | File | The audio file to transcribe. It must be in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm | Yes |
| language | String | The language of the audio file, as an ISO-639-1 code (for example, en). | Yes |
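A client can reject obviously invalid arguments before uploading a file. The following is a minimal sketch under stated assumptions: the helper `check_transcription_request` is hypothetical, the format is inferred from the file extension only, and the language check verifies the two-letter shape of an ISO-639-1 code rather than looking it up in the standard.

```python
from pathlib import Path

# Accepted audio formats for the `file` parameter.
SUPPORTED_EXTENSIONS = {
    "flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm",
}


def check_transcription_request(file_name: str, language: str) -> None:
    """Raise ValueError if the arguments are clearly invalid (hypothetical helper)."""
    # Assumption: the file extension reflects the actual audio format.
    extension = Path(file_name).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {extension or '(none)'}")
    # ISO-639-1 codes are exactly two letters, for example "en" or "zh".
    if not (len(language) == 2 and language.isascii() and language.isalpha()):
        raise ValueError(
            f"language must be a two-letter ISO-639-1 code, got {language!r}"
        )


check_transcription_request("input.wav", "en")  # passes silently
```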

Response Elements

| Parameter name | Type | Description |
| --- | --- | --- |
| text | String | The transcribed text |
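When the endpoint is called without a client library, the transcript has to be read out of the response body. The sketch below assumes the body is a JSON object carrying the `text` field listed above; the helper name and the sample body are hypothetical.

```python
import json


def extract_text(response_body: str) -> str:
    """Return the `text` field from a transcription response body."""
    data = json.loads(response_body)
    if not isinstance(data, dict) or "text" not in data:
        raise KeyError("response has no 'text' field")
    return data["text"]


# Hypothetical response body, shaped after the `text` element listed above.
sample = '{"text": "The quick brown fox jumped over the lazy dog."}'
print(extract_text(sample))
```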

Usage

Python

/v1/audio/speech

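from pathlib import Path

import openai

client = openai.OpenAI(
    base_url="https://modelapi.holmesai.xyz/$SERVICE_ID/v1",
    api_key="$API_KEY",
)

# Write the generated audio next to this script.
output_file_path = Path(__file__).parent / "output.wav"

response = client.audio.speech.create(
    model="$MODEL",
    # The OpenAI SDK requires `voice`, which this API does not document;
    # "default" is a placeholder assumption.
    voice="default",
    input="The quick brown fox jumped over the lazy dog.",
    response_format="wav",  # matches the .wav output file
)
response.write_to_file(output_file_path)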
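The `input` parameter is limited to 4096, so longer texts need to be split across several requests. The following is one possible sketch of such a splitter, not something the API provides: the helper `split_text` is hypothetical, it assumes the limit is counted in characters, and it prefers to break at sentence ends.

```python
import re

MAX_INPUT_LENGTH = 4096  # limit documented for the `input` parameter


def split_text(text: str, limit: int = MAX_INPUT_LENGTH) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Split after ., ! or ? followed by whitespace; the whitespace is dropped.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


for chunk in split_text("First sentence. Second sentence! Third one?", limit=20):
    print(repr(chunk))
```

Each chunk can then be sent as its own /v1/audio/speech request; joining the resulting audio files is left to the caller.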

/v1/audio/transcriptions

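import openai

client = openai.OpenAI(
    base_url="https://modelapi.holmesai.xyz/$SERVICE_ID/v1",
    api_key="$API_KEY",
)

# Open the audio in binary mode; the context manager closes it afterwards.
with open("input.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="$MODEL",
        file=audio_file,
        language="en",  # ISO-639-1 code of the spoken language
    )
print(transcript.text)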

curl

/v1/audio/speech

curl -v https://modelapi.holmesai.xyz/$SERVICE_ID/v1/audio/speech \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "$MODEL",
    "input": "The quick brown fox jumped over the lazy dog."
  }' \
  --output output.wav

/v1/audio/transcriptions

curl -v https://modelapi.holmesai.xyz/$SERVICE_ID/v1/audio/transcriptions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@input.wav" \
  -F "metadata={\"model\":\"$MODEL\"}"