# Private transcription with whisper.cpp
## Local Whisper Speech-to-Text MCP Server: Offline Audio Transcription

The **Local Whisper MCP Server** brings OpenAI's Whisper speech recognition model directly into Google Antigravity, enabling completely offline, privacy-preserving audio transcription without sending data to external APIs.

### Why Local Whisper MCP?

- **Complete Privacy**: All audio processing happens locally on your machine; no data leaves your system
- **No API Costs**: Eliminate per-minute transcription fees with unlimited local processing
- **Offline Capable**: Transcribe audio files without internet connectivity
- **Multiple Languages**: Support for 99+ languages with automatic language detection
- **High Accuracy**: Leverage state-of-the-art transformer models for professional-grade results

### Key Features

#### 1. Audio File Transcription

```python
# Transcribe audio files
result = await mcp.transcribe_audio(
    file_path="/recordings/meeting.mp3",
    language="en",     # Optional: auto-detect if not specified
    task="transcribe"  # or "translate" for translation to English
)

# Access the transcription with timestamps
for segment in result["segments"]:
    print(f"[{segment['start']:.2f}s] {segment['text']}")
```

#### 2. Real-Time Streaming

```python
# Stream transcription from the microphone
async for text in mcp.stream_transcription(
    device="default",
    chunk_duration=5,
    model="base"
):
    print(f"Heard: {text}")
```

#### 3. Model Selection

```python
# Choose a model size based on the accuracy/speed tradeoff
models = await mcp.list_models()
# Available: tiny, base, small, medium, large, large-v2, large-v3

result = await mcp.transcribe_audio(
    file_path="/audio/podcast.wav",
    model="medium",  # Balance of speed and accuracy
    fp16=True        # Use half-precision for faster GPU inference
)
```

#### 4. Batch Processing

```python
# Process multiple files efficiently
files = ["/audio/ep1.mp3", "/audio/ep2.mp3", "/audio/ep3.mp3"]

results = await mcp.batch_transcribe(
    files=files,
    output_format="srt",  # Generate subtitle files
    word_timestamps=True
)

for file, result in zip(files, results):
    print(f"Transcribed {file}: {len(result['text'])} characters")
```

### Configuration

```json
{
  "mcpServers": {
    "whisper-local": {
      "command": "python",
      "args": ["-m", "mcp_whisper"],
      "env": {
        "WHISPER_MODEL": "base",
        "WHISPER_DEVICE": "cuda",
        "WHISPER_COMPUTE_TYPE": "float16",
        "WHISPER_DOWNLOAD_ROOT": "~/.cache/whisper"
      }
    }
  }
}
```

### Use Cases

**Meeting Transcription**: Automatically transcribe recorded meetings with speaker timestamps, generating searchable meeting notes without cloud dependencies.

**Podcast Production**: Create accurate transcripts and subtitle files for podcast episodes, enabling accessibility and SEO optimization.

**Voice Note Processing**: Convert voice memos and dictations into text documents while maintaining complete privacy for sensitive content.

**Language Learning**: Transcribe foreign-language audio with translation capabilities, helping learners understand spoken content.

The Local Whisper MCP enables private, cost-effective speech-to-text processing directly within your development environment.
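The per-segment `start`/`end` timestamps shown above are also enough to build subtitle files by hand when `output_format="srt"` is not used. A minimal sketch — the helper names are hypothetical, and the segment dicts are assumed to carry `start`, `end`, and `text` keys as in the examples above:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timecode, e.g. 3.5 -> '00:00:03,500'."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    """Render Whisper-style segments as an SRT subtitle document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

Each SRT cue is just an index, a `start --> end` timecode line, and the text, separated by blank lines, so the conversion needs no extra dependencies.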
Alternatively, the server can be launched via `npx`:

```json
{
  "mcpServers": {
    "whisper-local": {
      "command": "npx",
      "args": ["-y", "local-speech-to-text-mcp"]
    }
  }
}
```
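Before pointing a client at the server, it can help to sanity-check the config file's shape. A minimal sketch — `load_server_entry` is a hypothetical helper, and the config path is whatever file your MCP client reads:

```python
import json

def load_server_entry(path: str, name: str) -> dict:
    """Read an MCP client config file and return one server's launch spec."""
    with open(path) as f:
        config = json.load(f)
    server = config["mcpServers"][name]
    # Every launchable entry needs a command plus its argument list.
    if "command" not in server or "args" not in server:
        raise ValueError(f"incomplete entry for {name!r}")
    return server
```

This catches the common paste errors (doubled nesting, a missing `command`) before the client fails with a less helpful launch error.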