audio-transcriber

Transcribe .wav, .mp4, .mp3, and .flac files to text — or record your own audio — through a CLI, a Python API, an MCP server, and an A2A agent, built on the agent-utilities ecosystem.

Official documentation

This site is the canonical reference for audio-transcriber, maintained alongside every release.

Overview

audio-transcriber wraps OpenAI Whisper behind a typed, deterministic tool surface, using the faster-whisper (CTranslate2) backend with openai-whisper as a fallback. It provides:

AudioTranscriber — a Python class that records microphone audio, transcribes local media files, and exports txt / srt / vtt / json results.
An MCP server (audio-transcriber-mcp) exposing the transcribe_audio tool for agents and IDE assistants.
An A2A agent (audio-transcriber-agent) that drives the MCP tools over the Agent Control Protocol and offers an optional web interface.

Transcription runs entirely in process — the Whisper model is loaded locally, so no external transcription service is required.
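Because the model runs in-process, choosing a backend comes down to which package can be imported. The minimal sketch below illustrates the faster-whisper-first, openai-whisper-fallback pattern described in the Overview. It is not audio-transcriber's code, and transcribe_locally and its importer parameter are hypothetical. The sketch relies only on the public entry points of the two backends. faster_whisper.WhisperModel(...).transcribe() returns a (segments, info) pair. whisper.load_model(...).transcribe() returns a dict with a "segments" list.

```python
import importlib
from typing import Callable, List, Tuple

# Hypothetical transcript shape: (start_seconds, end_seconds, text).
Segment = Tuple[float, float, str]


def transcribe_locally(
    path: str,
    model_size: str = "base",
    importer: Callable = importlib.import_module,
) -> List[Segment]:
    """Transcribe in-process, preferring faster-whisper over openai-whisper.

    `importer` is injectable (a hypothetical parameter) so the fallback
    logic can be exercised without either backend installed.
    """
    try:
        faster_whisper = importer("faster_whisper")
    except ImportError:
        faster_whisper = None

    if faster_whisper is not None:
        # CTranslate2-backed model; weights are loaded into this process.
        model = faster_whisper.WhisperModel(model_size)
        segments, _info = model.transcribe(path)  # segments is a lazy generator
        return [(s.start, s.end, s.text.strip()) for s in segments]

    try:
        whisper = importer("whisper")  # the openai-whisper package
    except ImportError as exc:
        raise RuntimeError("install faster-whisper or openai-whisper") from exc

    model = whisper.load_model(model_size)
    result = model.transcribe(path)  # dict with "text" and "segments"
    return [(seg["start"], seg["end"], seg["text"].strip()) for seg in result["segments"]]
```

faster-whisper yields segments lazily, so decoding only happens while the list comprehension iterates. Both branches are normalized to the same tuple shape, so downstream exporters do not need to know which backend ran.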

Explore the documentation

Installation — pip, source, extras, and the prebuilt Docker image.
Deployment — run the MCP server and the agent, Docker Compose, Caddy + Technitium.
Usage — the MCP tool surface, the AudioTranscriber API, and the CLI.
Overview — capability summary and ecosystem role.
Concepts — the CONCEPT:AUDIO-* registry.

Quick start

pip install "audio-transcriber[mcp]"
audio-transcriber-mcp            # stdio MCP server (default transport)
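With the server running on stdio, an MCP client talks to it using newline-delimited JSON-RPC 2.0 messages. MCP-capable IDE assistants handle this automatically. The sketch below only builds the sequence a client would send (initialize, the initialized notification, tools/list, then tools/call) to show what goes over the wire. It does not start or talk to a process. The protocol version string and client name are assumptions. The transcribe_audio argument name (file_path) is a hypothetical placeholder, so read the tool's real input schema from the tools/list response.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_next_id = count(1)


def request(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Build one JSON-RPC 2.0 request, newline-terminated for the stdio transport."""
    msg: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_next_id), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg) + "\n"


def notification(method: str) -> str:
    """Notifications carry no id and expect no response."""
    return json.dumps({"jsonrpc": "2.0", "method": method}) + "\n"


messages = [
    request("initialize", {
        "protocolVersion": "2024-11-05",  # assumed; use a version the server supports
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1"},
    }),
    notification("notifications/initialized"),
    request("tools/list"),
    request("tools/call", {
        "name": "transcribe_audio",
        # Hypothetical argument name: check inputSchema in the tools/list reply.
        "arguments": {"file_path": "/path/to/meeting.mp4"},
    }),
]
```

Each string would be written to the server's stdin in order. Responses come back on stdout, one JSON object per line, and carry the matching id.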

Transcribe a file directly from the command line:

audio-transcriber --file "$HOME/Downloads/meeting.mp4" --model base --export
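To transcribe a whole folder, you can drive the same command from Python. This sketch uses only the --file, --model and --export flags from the example above. find_media, build_command and transcribe_folder are hypothetical helpers, and the suffix set simply mirrors the formats named at the top of this page. Passing arguments as a list bypasses the shell, so no quoting is needed and ~ is expanded explicitly with Path.expanduser.

```python
import subprocess
from pathlib import Path
from typing import List

# Formats listed at the top of this page; adjust if your build supports more.
MEDIA_SUFFIXES = {".wav", ".mp4", ".mp3", ".flac"}


def find_media(folder: str) -> List[Path]:
    """Return supported media files in `folder` (non-recursive), sorted by name."""
    root = Path(folder).expanduser()
    return sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix.lower() in MEDIA_SUFFIXES
    )


def build_command(media: Path, model: str = "base", export: bool = True) -> List[str]:
    """Assemble an argv list; no shell is involved, so no quoting is needed."""
    cmd = ["audio-transcriber", "--file", str(media.expanduser()), "--model", model]
    if export:
        cmd.append("--export")
    return cmd


def transcribe_folder(folder: str, model: str = "base") -> None:
    for media in find_media(folder):
        # check=True raises CalledProcessError at the first file the CLI fails on.
        subprocess.run(build_command(media, model), check=True)
```

For example, transcribe_folder("~/Downloads") runs the CLI once per supported file in that folder.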

See Installation and Deployment for the full matrix (PyPI extras, Docker image, all transports, reverse proxy, DNS).