# Private transcription with whisper.cpp
## Local Whisper Speech-to-Text MCP Server: Offline Audio Transcription

The **Local Whisper MCP Server** brings OpenAI's Whisper speech recognition model directly into Google Antigravity, enabling completely offline, privacy-preserving audio transcription without sending data to external APIs.

### Why Local Whisper MCP?

- **Complete Privacy**: All audio processing happens locally on your machine; no data leaves your system
- **No API Costs**: Eliminate per-minute transcription fees with unlimited local processing
- **Offline Capable**: Transcribe audio files without internet connectivity
- **Multiple Languages**: Support for 99+ languages with automatic language detection
- **High Accuracy**: Leverage state-of-the-art transformer models for professional-grade results

### Key Features

#### 1. Audio File Transcription

```python
# Transcribe audio files
result = await mcp.transcribe_audio(
    file_path="/recordings/meeting.mp3",
    language="en",     # Optional: auto-detect if not specified
    task="transcribe"  # or "translate" for translation to English
)

# Access the transcription with timestamps
for segment in result["segments"]:
    print(f"[{segment['start']:.2f}s] {segment['text']}")
```

#### 2. Real-Time Streaming

```python
# Stream transcription from the microphone
async for text in mcp.stream_transcription(
    device="default",
    chunk_duration=5,
    model="base"
):
    print(f"Heard: {text}")
```

#### 3. Model Selection

```python
# Choose a model size based on the accuracy/speed tradeoff
models = await mcp.list_models()
# Available: tiny, base, small, medium, large, large-v2, large-v3

result = await mcp.transcribe_audio(
    file_path="/audio/podcast.wav",
    model="medium",  # Balance of speed and accuracy
    fp16=True        # Use half-precision for faster GPU inference
)
```

#### 4. Batch Processing

```python
# Process multiple files efficiently
files = ["/audio/ep1.mp3", "/audio/ep2.mp3", "/audio/ep3.mp3"]

results = await mcp.batch_transcribe(
    files=files,
    output_format="srt",  # Generate subtitle files
    word_timestamps=True
)

for file, result in zip(files, results):
    print(f"Transcribed {file}: {len(result['text'])} characters")
```

### Configuration

```json
{
  "mcpServers": {
    "whisper-local": {
      "command": "python",
      "args": ["-m", "mcp_whisper"],
      "env": {
        "WHISPER_MODEL": "base",
        "WHISPER_DEVICE": "cuda",
        "WHISPER_COMPUTE_TYPE": "float16",
        "WHISPER_DOWNLOAD_ROOT": "~/.cache/whisper"
      }
    }
  }
}
```

### Use Cases

**Meeting Transcription**: Automatically transcribe recorded meetings with speaker timestamps, generating searchable meeting notes without cloud dependencies.

**Podcast Production**: Create accurate transcripts and subtitle files for podcast episodes, enabling accessibility and SEO optimization.

**Voice Note Processing**: Convert voice memos and dictations into text documents while maintaining complete privacy for sensitive content.

**Language Learning**: Transcribe foreign-language audio with translation capabilities, helping learners understand spoken content.

The Local Whisper MCP enables private, cost-effective speech-to-text processing directly within your development environment.
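The per-segment `start`/`end` timestamps shown above are also enough to build subtitle files by hand when `output_format="srt"` is not used. A minimal sketch — the helper names are hypothetical, and the segment dicts are assumed to carry `start`, `end`, and `text` keys as in the examples above:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timecode, e.g. 3.5 -> '00:00:03,500'."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    """Render Whisper-style segments as an SRT subtitle document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

Each SRT cue is just an index, a `start --> end` timecode line, and the text, separated by blank lines, so the conversion needs no extra dependencies.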
Alternatively, the server can be launched via `npx`:

```json
{
  "mcpServers": {
    "whisper-local": {
      "command": "npx",
      "args": ["-y", "local-speech-to-text-mcp"]
    }
  }
}
```
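Before pointing a client at the server, it can help to sanity-check the config file's shape. A minimal sketch — `load_server_entry` is a hypothetical helper, and the config path is whatever file your MCP client reads:

```python
import json

def load_server_entry(path: str, name: str) -> dict:
    """Read an MCP client config file and return one server's launch spec."""
    with open(path) as f:
        config = json.load(f)
    server = config["mcpServers"][name]
    # Every launchable entry needs a command plus its argument list.
    if "command" not in server or "args" not in server:
        raise ValueError(f"incomplete entry for {name!r}")
    return server
```

This catches the common paste errors (doubled nesting, a missing `command`) before the client fails with a less helpful launch error.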