MCP server for PDF text extraction and OCR
## PDF Reader MCP Server: Advanced PDF Processing

The **PDF Reader MCP Server** provides specialized PDF processing capabilities within Google Antigravity, enabling AI-assisted reading, extraction, and analysis of PDF documents with OCR support.

### Why PDF Reader MCP?

- **Text Extraction**: Extract text from both native and scanned PDFs with high accuracy
- **OCR Integration**: Optical character recognition for scanned documents and images
- **Structure Preservation**: Maintain document structure including headings, lists, and tables
- **Form Data Extraction**: Extract data from fillable PDF forms automatically
- **Annotation Access**: Read and process PDF annotations and comments

### Key Features

#### 1. PDF Reading

```python
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Read the PDF report and summarize the key findings from each section"
    }]
)
```

#### 2. OCR Processing

```python
# Process scanned PDFs
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Extract text from this scanned invoice using OCR"
    }]
)
```

#### 3. Form Extraction

```python
# Extract form data
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Extract all form field values from the PDF application"
    }]
)
```

#### 4. Table Parsing

```python
# Parse tables
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Extract all tables from the financial report PDF as structured data"
    }]
)
```

### Configuration

```json
{
  "mcpServers": {
    "pdf-reader": {
      "command": "npx",
      "args": ["-y", "@anthropic/mcp-server-pdf-reader"],
      "env": {
        "PDF_OCR_ENABLED": "true",
        "PDF_OCR_LANGUAGE": "eng"
      }
    }
  }
}
```

### Use Cases

**Invoice Processing**: Extract data from invoices for automated accounting workflows.

**Legal Document Analysis**: Parse legal PDFs and extract relevant clauses and terms.

**Research Papers**: Extract text and citations from academic PDF documents.

**Form Automation**: Automate data entry by extracting form field values.

The PDF Reader MCP Server provides comprehensive PDF processing, enabling intelligent document analysis and data extraction.
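
The configuration entry can also be generated from code, which helps when the OCR settings vary between environments. The following is a minimal sketch, not part of the server itself: the helper name `build_pdf_reader_config` is hypothetical, and the package name and `PDF_OCR_*` variable names are copied from the configuration example in this document rather than checked against the package.

```python
import json


def build_pdf_reader_config(ocr_enabled: bool = True, ocr_language: str = "eng") -> dict:
    """Return an mcpServers entry mirroring this document's configuration example.

    Hypothetical helper: the package name and environment variable names are
    assumptions taken from the example, not from verified package docs.
    """
    return {
        "mcpServers": {
            "pdf-reader": {
                "command": "npx",
                "args": ["-y", "@anthropic/mcp-server-pdf-reader"],
                "env": {
                    # Kept as strings, matching the "true"/"eng" values in the example.
                    "PDF_OCR_ENABLED": "true" if ocr_enabled else "false",
                    "PDF_OCR_LANGUAGE": ocr_language,
                },
            }
        }
    }


if __name__ == "__main__":
    # "deu" assumes the server accepts three-letter codes like the "eng" in the example.
    print(json.dumps(build_pdf_reader_config(ocr_language="deu"), indent=2))
```

Returning a plain dictionary keeps the helper easy to merge into an existing client configuration before it is written to disk.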
{
  "mcpServers": {
    "pdf-reader": {
      "command": "uvx",
      "args": [
        "pdf-reader-mcp"
      ]
    }
  }
}
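
Before handing a configuration like this to a client, a quick structural check can catch common mistakes such as a missing `command` or an accidentally doubled `mcpServers` wrapper. The sketch below is illustrative only: `find_config_problems` is a hypothetical helper, and it assumes the conventional layout for locally launched servers, where each entry under `mcpServers` has a string `command` and an optional list of string `args`.

```python
import json
from typing import List


def find_config_problems(config: dict) -> List[str]:
    """Return human-readable problems found in an mcpServers config dict."""
    servers = config.get("mcpServers")
    if not isinstance(servers, dict):
        return ["top-level 'mcpServers' object is missing"]

    problems = []
    for name, entry in servers.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry is not an object")
            continue
        # A wrapper repeated inside a server entry usually means the
        # config was pasted into itself.
        if "mcpServers" in entry:
            problems.append(f"{name}: nested 'mcpServers' key (config wrapped twice?)")
        if not isinstance(entry.get("command"), str):
            problems.append(f"{name}: 'command' must be a string")
        args = entry.get("args", [])
        if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
            problems.append(f"{name}: 'args' must be a list of strings")
    return problems


if __name__ == "__main__":
    raw = '{"mcpServers": {"pdf-reader": {"command": "uvx", "args": ["pdf-reader-mcp"]}}}'
    print(find_config_problems(json.loads(raw)))  # prints []
```

An empty list means the entry matches the assumed shape. It does not confirm that the `pdf-reader-mcp` package exists or that the server starts.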