XPack

Speech-to-Text AI

@XPack

The Speech-to-Text AI service uses OpenAI Whisper for real-time speech recognition of audio/video files and YouTube videos. It supports multiple formats, including mp3 and mp4. It offers tools to transcribe from a URL or file, add a transcription to the queue, and get a transcription's result and status, with usage instructions for each.

Speech-to-Text AI Service Documentation

1. Service Overview

The Speech-to-Text AI service utilizes OpenAI Whisper for real-time speech recognition of audio/video files and YouTube videos. It supports multiple audio and video formats including mp3, mp4, mpeg, mpga, m4a, wav, and webm. The service can convert audio to text with high precision and reliability across multiple languages.
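The format list above lends itself to a quick client-side check before a file is submitted. The following is a minimal sketch, not part of the service itself: the helper names `media_extension` and `is_supported_format` are hypothetical, and the check only looks at the file extension of a local path or direct file URL. Platform links such as YouTube watch pages carry no extension, so they would need to be handled separately.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Formats listed in the service overview.
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def media_extension(source: str) -> str:
    """Return the lowercase extension of a local path or URL, without the dot."""
    # For URLs, .path excludes the query string and fragment;
    # for plain file paths, .path is the path itself.
    path = urlparse(source).path
    return PurePosixPath(path).suffix.lstrip(".").lower()


def is_supported_format(source: str) -> bool:
    """Check the extension against the documented format list (hypothetical helper)."""
    return media_extension(source) in SUPPORTED_FORMATS
```

For example, `is_supported_format("recording.MP3")` is true, while `is_supported_format("notes.txt")` is false.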

2. Tools

2.1 Transcribe from url or file

  • Function: Transcribes YouTube videos, files at remote URLs, and locally uploaded files.
  • Supported Formats: mp3, mp4, mpeg, mpga, m4a, wav, or webm.

2.2 Transcribe from url

  • Function: Transcribes videos from YouTube, TikTok, Instagram, Facebook, X (Twitter), Vimeo or LinkedIn, as well as files from remote URLs.
  • Supported Formats: mp3, mp4, mpeg, mpga, m4a, wav, or webm.

2.3 Add transcription to queue

  • Function: Adds a request to the queue to transcribe a YouTube video, a file at a remote URL, or a locally uploaded file.
  • Supported Formats: mp3, mp4, mpeg, mpga, m4a, wav, or webm.

2.4 Get transcription result

  • Function: Retrieves the result of a queued transcription. The request ID identifies the transcription. The result includes the transcribed text and its corresponding chunks.

2.5 Get the transcription status

  • Function: Returns the status of a queued transcription. The request ID identifies the transcription.

3. Usage Instructions

3.1 Transcribing from a file or URL

  • For Transcribe from url or file:
    • Provide the appropriate URL or local file path.
    • Select one of the supported formats (mp3, mp4, mpeg, mpga, m4a, wav, or webm).
    • Initiate the transcription process.
  • For Transcribe from url:
    • Enter the URL of the video from YouTube, TikTok, Instagram, Facebook, X (Twitter), Vimeo or LinkedIn.
    • Choose a supported format.
    • Start the transcription.
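Because two tools accept overlapping inputs, a client may want a simple rule for choosing between them. The sketch below shows one possible routing and is an assumption, not documented behavior: `choose_tool` is a hypothetical helper that sends http(s) URLs to Transcribe from url and everything else (treated as a local file path) to Transcribe from url or file. The tool names are copied from section 2; how they are actually exposed over MCP is not stated in this document.

```python
from urllib.parse import urlparse

# Tool names as listed in section 2; the identifiers used over MCP may differ.
TOOL_URL_OR_FILE = "Transcribe from url or file"
TOOL_URL = "Transcribe from url"


def choose_tool(source: str) -> str:
    """Route http(s) URLs to the URL tool and anything else to the file-capable tool."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return TOOL_URL
    # No http(s) scheme: assume a local file path.
    return TOOL_URL_OR_FILE
```

Under this rule, `choose_tool("https://vimeo.com/123")` selects Transcribe from url, and `choose_tool("/tmp/a.mp3")` selects Transcribe from url or file.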

3.2 Adding to the queue

  • Use the Add transcription to queue tool.
  • Supply the relevant YouTube video URL, remote URL, or local file path.
  • Select the supported format.
  • Submit the request to add it to the queue.

3.3 Retrieving the transcription result

  • Enter the request ID in the Get transcription result tool.
  • Wait for the result, which contains the transcribed text and its chunks.

3.4 Checking the transcription status

  • Input the request ID into the Get the transcription status tool.
  • The tool returns the current status of the queued transcription.
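
The queue workflow in sections 3.2 to 3.4 (submit, check status, fetch result) can be sketched as a polling loop. This is a minimal illustration under stated assumptions: the status strings `"completed"` and `"failed"` are hypothetical, since this document does not list the service's actual status values, and `get_status` and `get_result` are caller-supplied functions standing in for the Get the transcription status and Get transcription result tools.

```python
import time
from typing import Callable

# Hypothetical status strings; the service's real values are not given here.
DONE_STATUSES = {"completed"}
FAILED_STATUSES = {"failed"}


def wait_for_transcription(
    request_id: str,
    get_status: Callable[[str], str],
    get_result: Callable[[str], dict],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the status until the job finishes, then fetch and return the result."""
    for _ in range(max_polls):
        status = get_status(request_id)
        if status in DONE_STATUSES:
            return get_result(request_id)
        if status in FAILED_STATUSES:
            raise RuntimeError(f"transcription {request_id} failed")
        # Any other status is treated as "still in progress".
        sleep(poll_seconds)
    raise TimeoutError(
        f"transcription {request_id} still pending after {max_polls} polls"
    )
```

Passing the status and result functions in as parameters keeps the loop independent of any particular client library, and the `sleep` parameter lets the loop be exercised without real delays.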

XPack MCP

{
  "mcpServers": {
    "speech-to-text-ai": {
      "type": "sse",
      "autoApprove": "all",
      "url": "https://mcp.xpack.ai/v1/mcp/speech-to-text-ai?authkey={Your-XPack-Auth-Key}"
    }
  }
}
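
The configuration above contains a `{Your-XPack-Auth-Key}` placeholder that must be replaced with a real key. As a rough sketch, the same structure could be generated in code so the key is filled in at runtime rather than typed into the file. The function names `build_mcp_config` and `render_mcp_config` are hypothetical, and percent-encoding the key is a defensive assumption rather than a documented requirement.

```python
import json
from urllib.parse import quote

# Endpoint taken from the configuration above.
MCP_BASE_URL = "https://mcp.xpack.ai/v1/mcp/speech-to-text-ai"


def build_mcp_config(auth_key: str) -> dict:
    """Return the mcpServers entry with the auth key filled in."""
    if not auth_key:
        raise ValueError("auth_key must not be empty")
    return {
        "mcpServers": {
            "speech-to-text-ai": {
                "type": "sse",
                "autoApprove": "all",
                # Percent-encode the key so unusual characters
                # cannot break the query string.
                "url": f"{MCP_BASE_URL}?authkey={quote(auth_key, safe='')}",
            }
        }
    }


def render_mcp_config(auth_key: str) -> str:
    """Serialize the configuration as indented JSON text."""
    return json.dumps(build_mcp_config(auth_key), indent=2)
```

Calling `render_mcp_config(key)` with a key read from an environment variable or secret store produces JSON in the same shape as the configuration above.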
© 2025 XPack. All rights reserved.