audio-transcriber
Transcribe .wav, .mp4, .mp3, and .flac files to text — or record your own
audio — through a CLI, a Python API, an MCP server, and an A2A agent, built on
the agent-utilities ecosystem.
Official documentation
This site is the canonical reference for audio-transcriber, maintained
alongside every release.
Overview
audio-transcriber wraps OpenAI Whisper — via the fast
faster-whisper (CTranslate2) backend
with an openai-whisper fallback — behind a typed, deterministic tool surface. It
provides:
- AudioTranscriber — a Python class that records microphone audio, transcribes local media files, and exports txt/srt/vtt/json results.
- An MCP server (audio-transcriber-mcp) exposing the transcribe_audio tool for agents and IDE assistants.
- An A2A agent (audio-transcriber-agent) that drives the MCP tools over the Agent Control Protocol with an optional web interface.
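To make the srt export format concrete, here is a minimal sketch of how timed segments could be rendered as SubRip text. It is an illustration only, not the AudioTranscriber implementation: the `Segment` tuple shape and the helper names `format_srt_timestamp` and `to_srt` are assumptions introduced for this example.

```python
from typing import Iterable, Tuple

# Hypothetical segment shape for this sketch: (start_seconds, end_seconds, text).
Segment = Tuple[float, float, str]


def format_srt_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Join segments into numbered SRT cues separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    return "\n".join(blocks)
```

A vtt export would differ mainly in its `WEBVTT` header and in using a period instead of a comma before the milliseconds.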
Transcription runs entirely in process — the Whisper model is loaded locally, so no external transcription service is required.
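The backend fallback mentioned in the overview can be pictured as a simple preference order over locally installed packages. The sketch below is an assumption about how such a selection might look, not the project's actual code: `module_available` and `pick_backend` are hypothetical helpers. The only library facts it relies on are the import names, `faster_whisper` for faster-whisper and `whisper` for openai-whisper.

```python
import importlib.util
from typing import Callable, Optional


def module_available(name: str) -> bool:
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


def pick_backend(
    is_available: Callable[[str], bool] = module_available,
) -> Optional[str]:
    """Prefer faster-whisper, fall back to openai-whisper, else return None."""
    for module_name in ("faster_whisper", "whisper"):
        if is_available(module_name):
            return module_name
    return None
```

Passing the availability check in as a parameter keeps the preference order testable without installing either backend.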
Explore the documentation
- Installation — pip, source, extras, and the prebuilt Docker image.
- Deployment — run the MCP server and the agent, Docker Compose, Caddy + Technitium.
- Usage — the MCP tool surface, the AudioTranscriber API, and the CLI.
- Overview — capability summary and ecosystem role.
- Concepts — the CONCEPT:AUDIO-* registry.
Quick start
Transcribe a file directly from the command line:
See Installation and Deployment for the full matrix (PyPI extras, Docker image, all transports, reverse proxy, DNS).