audio-transcriber

Transcribe .wav, .mp4, .mp3, and .flac files to text — or record your own audio — through a CLI, a Python API, an MCP server, and an A2A agent, built on the agent-utilities ecosystem.

Official documentation

This site is the canonical reference for audio-transcriber, maintained alongside every release.

Overview

audio-transcriber wraps OpenAI Whisper behind a typed, deterministic tool surface. It uses the faster-whisper (CTranslate2) backend by default and falls back to openai-whisper. It provides:

  • AudioTranscriber — a Python class that records microphone audio, transcribes local media files, and exports txt / srt / vtt / json results.
  • An MCP server (audio-transcriber-mcp) exposing the transcribe_audio tool for agents and IDE assistants.
  • An A2A agent (audio-transcriber-agent) that drives the MCP tools over the Agent Control Protocol with an optional web interface.
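
To make the srt export format concrete, here is a minimal sketch of how timed segments can be rendered as SubRip cues. The `Segment` class and both helper functions are hypothetical names chosen for illustration; they are not part of the AudioTranscriber API, whose actual result types may differ.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical container; the package's real result type may differ.
    start: float  # cue start, in seconds
    end: float    # cue end, in seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SubRip."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

The vtt format follows the same idea with a `WEBVTT` header line and a period instead of a comma before the milliseconds.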

Transcription runs entirely in process — the Whisper model is loaded locally, so no external transcription service is required.
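
The backend preference described above can be sketched roughly as follows. This is an illustration of the fallback idea under stated assumptions, not the package's actual implementation: `pick_backend` and `transcribe` are hypothetical helpers, while the `faster_whisper.WhisperModel` and `whisper.load_model` calls are the public entry points of those two libraries.

```python
import importlib.util
from typing import Optional


def pick_backend() -> Optional[str]:
    """Return the first installed Whisper backend, preferring faster-whisper."""
    for module_name in ("faster_whisper", "whisper"):
        if importlib.util.find_spec(module_name) is not None:
            return module_name
    return None


def transcribe(path: str, model_size: str = "base") -> str:
    """Transcribe a local media file with whichever backend is available."""
    backend = pick_backend()
    if backend == "faster_whisper":
        from faster_whisper import WhisperModel

        model = WhisperModel(model_size)
        # transcribe() returns a lazy iterator of segments plus run metadata.
        segments, _info = model.transcribe(path)
        return " ".join(segment.text.strip() for segment in segments)
    if backend == "whisper":
        import whisper

        model = whisper.load_model(model_size)
        return model.transcribe(path)["text"].strip()
    raise RuntimeError("Install faster-whisper or openai-whisper first")
```

Both branches load the model weights locally, which is why no network transcription service is involved once the weights are cached.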

Explore the documentation

  • Installation — pip, source, extras, and the prebuilt Docker image.
  • Deployment — run the MCP server and the agent, Docker Compose, Caddy + Technitium.
  • Usage — the MCP tool surface, the AudioTranscriber API, and the CLI.
  • Overview — capability summary and ecosystem role.
  • Concepts — the CONCEPT:AUDIO-* registry.

Quick start

pip install "audio-transcriber[mcp]"
audio-transcriber-mcp            # stdio MCP server (default transport)
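
For orientation, an MCP client invokes a tool by sending a JSON-RPC `tools/call` request, which on the stdio transport is written as a single line of JSON. The sketch below only builds that message. The `file_path` argument name is a hypothetical placeholder, since the real parameters of transcribe_audio are documented under Usage, and a real client would normally use an MCP SDK and complete the `initialize` handshake first.

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "file_path" is an assumed argument name, used here for illustration only.
line = tools_call_request(2, "transcribe_audio", {"file_path": "meeting.mp4"})
```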

Transcribe a file directly from the command line:

audio-transcriber --file ~/Downloads/meeting.mp4 --model base --export
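
The same command can be driven from a script. The following is a minimal sketch that uses only the flags shown above (`--file`, `--model`, `--export`); `build_command` and `transcribe_file` are hypothetical helper names, and what the CLI writes to stdout is not assumed here.

```python
import subprocess
from pathlib import Path


def build_command(media: str, model: str = "base", export: bool = True) -> list[str]:
    """Build the argument list, expanding a leading ~ ourselves."""
    command = [
        "audio-transcriber",
        "--file", str(Path(media).expanduser()),
        "--model", model,
    ]
    if export:
        command.append("--export")
    return command


def transcribe_file(media: str, model: str = "base") -> subprocess.CompletedProcess:
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(
        build_command(media, model),
        check=True,
        capture_output=True,
        text=True,
    )
```

Passing an argument list rather than a shell string avoids quoting problems, and expanding `~` in Python means the path resolves even though no shell is involved.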

See Installation and Deployment for the full matrix (PyPI extras, Docker image, all transports, reverse proxy, DNS).