AI image recognition server with Anthropic and OpenAI APIs supporting OCR and multiple formats
The MCP Image Recognition Server provides AI-based image recognition and description through the Anthropic and OpenAI vision APIs. It accepts JPEG, PNG, GIF, and WebP images, giving developers a way to add visual analysis to their applications.
The core features of the MCP Image Recognition Server are built around Model Context Protocol (MCP) to ensure seamless integration with AI applications. Key among these is its ability to provide detailed descriptions of images, achieved through APIs from Anthropic and OpenAI Vision. Additionally, it supports multiple image formats, ensuring compatibility with a wide range of inputs. Configurable primary and fallback providers allow for flexibility in backend service selection based on the desired balance between performance and cost.
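As an illustration of the format support described above, the following minimal sketch identifies the four listed formats from their leading bytes. It is not taken from the server's code; the function name `detect_image_format` is hypothetical, and the server may validate inputs differently.

```python
from typing import Optional


def detect_image_format(data: bytes) -> Optional[str]:
    """Return 'jpeg', 'png', 'gif', 'webp', or None for anything else.

    Hypothetical helper: checks only the well-known file signatures,
    not whether the rest of the file is a valid image.
    """
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WebP is a RIFF container: 'RIFF', a 4-byte size field, then 'WEBP'.
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

A check like this could reject unsupported inputs before any request is sent to a vision provider.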
The architecture of the MCP Image Recognition Server is designed to work with AI applications through MCP. The server acts as an intermediary, routing requests from a client to the appropriate vision API provider while abstracting away implementation details. Because it adheres to MCP, it can integrate into existing MCP-based systems and applications.
graph TD
A[AI Application] -->|MCP Client| B[MCP Server]
B --> C["Vision API Provider (Anthropic/OpenAI)"]
C --> D[Detailed Image Description]
style A fill:#e1f5fe
style C fill:#f3e5f5
graph TD
A[Client] --> B["MCP Server (API Gateway)"]
B --> C[Vision API Providers]
C --> D[Detailed Image Descriptions & Metadata]
style A fill:#e1f7ef
style B fill:#c6fff3
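The routing shown in the diagrams, combined with the configurable primary and fallback providers, can be sketched roughly as follows. This is an assumption-laden illustration rather than the server's actual implementation: providers are modeled as plain callables, and `describe_with_fallback`, `failing_provider`, and `stub_provider` are hypothetical names.

```python
from typing import Callable, Dict, Optional

# Assumption: a provider is a callable taking image bytes and returning a
# description. The real server's provider interface is not shown here.
Provider = Callable[[bytes], str]


def describe_with_fallback(
    image: bytes,
    providers: Dict[str, Provider],
    primary: str,
    fallback: Optional[str] = None,
) -> str:
    """Try the primary provider; on any error, try the fallback if one is set."""
    try:
        return providers[primary](image)
    except Exception as primary_error:
        if fallback is None or fallback == primary:
            raise  # nothing else to try, surface the original error
        try:
            return providers[fallback](image)
        except Exception as fallback_error:
            raise RuntimeError(
                f"primary ({primary}) failed: {primary_error}; "
                f"fallback ({fallback}) failed: {fallback_error}"
            ) from fallback_error


# Hypothetical stand-ins for real API calls, for demonstration only.
def failing_provider(image: bytes) -> str:
    raise ConnectionError("simulated outage")


def stub_provider(image: bytes) -> str:
    return f"stub description of a {len(image)}-byte image"


providers = {"anthropic": failing_provider, "openai": stub_provider}
result = describe_with_fallback(b"\x89PNG", providers, "anthropic", "openai")
```

Here the simulated primary fails, so `result` comes from the fallback stub. Without a fallback, the primary's error propagates unchanged to the caller.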
To begin using the MCP Image Recognition Server, follow these steps:
Clone the Repository:
git clone https://github.com/mario-andreschak/mcp-image-recognition.git
cd mcp-image-recognition
Create and Configure Your Environment File:
cp .env.example .env
# Edit .env with your API keys and preferences
Build the Project:
build.bat
The MCP Image Recognition Server excels in several real-world applications where visual data needs thorough analysis:
Automated Product Categorization: E-commerce platforms can use this server to automatically categorize products based on images, enhancing search capabilities and user experience.
Medical Image Analysis: Medical professionals could leverage the server for preliminary image analysis, assisting in diagnosing certain conditions or diseases through visual data.
The server works with MCP clients such as Claude Desktop, Continue, and Cursor. The compatibility matrix below shows which features each client supports:
MCP Client | Resources | Tools | Prompts |
---|---|---|---|
Claude Desktop | ✅ | ✅ | ✅ |
Continue | ✅ | ✅ | ✅ |
Cursor | ❌ | ✅ | ❌ |
The diagram below summarizes the same support levels:
graph TB
classDef success fill:#c9dca5;
classDef warning fill:#e6a03b;
ClaudeDesktop[Claude Desktop] --> Full[Full Support]:::success
Continue[Continue] --> Full
Cursor[Cursor] --> ToolsOnly[Tools Only]:::warning
Ensure your environment is properly configured by setting the following variables in your .env file:
- ANTHROPIC_API_KEY: Your Anthropic API key.
- OPENAI_API_KEY: Your OpenAI API key.
- VISION_PROVIDER: Primary vision provider (anthropic or openai).
- FALLBACK_PROVIDER: Optional fallback provider.
- LOG_LEVEL: Logging level (DEBUG, INFO, WARNING, ERROR).
- ENABLE_OCR: Enable Tesseract OCR text extraction (true or false).
- TESSERACT_CMD: Optional custom path to the Tesseract executable.
- OPENAI_MODEL: OpenAI model (default: gpt-4o-mini). Can use OpenRouter format for other models (e.g., anthropic/claude-3.5-sonnet:beta).
- OPENAI_BASE_URL: Optional custom base URL for the OpenAI API.
- OPENAI_TIMEOUT: Optional custom timeout (in seconds) for the OpenAI API.

{
"mcpServers": {
"[server-name]": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-[name]"],
"env": {
"API_KEY": "your-api-key"
}
}
}
}
graph TD
A[Environment Variables] --> B[.env file]
style A fill:#e1f5fe
style B fill:#f3e5f5
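To make the variables above concrete, here is a minimal sketch of how they might be read and validated in Python. It assumes the .env values have already been loaded into the process environment, and it is not the server's own configuration code. The `Settings` class and `load_settings` function are hypothetical, as are the defaults for LOG_LEVEL and ENABLE_OCR; only the OPENAI_MODEL default comes from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_PROVIDERS = ("anthropic", "openai")


@dataclass(frozen=True)
class Settings:
    vision_provider: str
    fallback_provider: Optional[str]
    enable_ocr: bool
    log_level: str
    openai_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping of environment variables."""
    provider = env.get("VISION_PROVIDER", "").strip().lower()
    if provider not in VALID_PROVIDERS:
        raise ValueError(
            f"VISION_PROVIDER must be one of {VALID_PROVIDERS}, got {provider!r}"
        )

    # An empty or missing FALLBACK_PROVIDER means no fallback is configured.
    fallback = env.get("FALLBACK_PROVIDER", "").strip().lower() or None
    if fallback is not None and fallback not in VALID_PROVIDERS:
        raise ValueError(
            f"FALLBACK_PROVIDER must be one of {VALID_PROVIDERS}, got {fallback!r}"
        )

    return Settings(
        vision_provider=provider,
        fallback_provider=fallback,
        # Assumption: OCR is off unless ENABLE_OCR is exactly "true".
        enable_ocr=env.get("ENABLE_OCR", "false").strip().lower() == "true",
        # Assumption: INFO is used when LOG_LEVEL is not set.
        log_level=env.get("LOG_LEVEL", "INFO").strip().upper(),
        openai_model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests, for example `load_settings({"VISION_PROVIDER": "openai", "ENABLE_OCR": "true"})`.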
Q: Can I use different AI providers with this server?
A: Yes, the MCP Image Recognition Server supports both Anthropic and OpenAI for vision processing.
Q: Is text extraction available via OCR?
A: Yes. Text extraction with Tesseract OCR is optional and is turned on with ENABLE_OCR in your environment configuration.
Q: How do I start the server quickly without manually configuring everything each time?
A: You can use the provided run.bat batch script for easier startup and debugging.
Q: Does this support all image formats?
A: No. The current version supports JPEG, PNG, GIF, and WebP.
Q: How do I set up the server to work with MCP clients like Continue or Cursor?
A: Ensure your .env file includes the necessary API keys and provider settings so that the server can serve requests from these MCP clients.
If you wish to contribute to this project, please follow our guidelines:
Fork & Clone Repository:
git clone https://github.com/mario-andreschak/mcp-image-recognition.git
cd mcp-image-recognition
Create a New Branch and Make Changes:
git checkout -b feature-your-feature
Run Tests for Verification:
run.bat test
Commit Your Changes:
git commit -m "Add new feature"
Push to GitHub:
git push origin feature-your-feature
Create Pull Request from GitHub UI
For more information about Model Context Protocol and its ecosystem, visit the official documentation:
The MCP Image Recognition Server stands as a powerful tool for enhancing AI applications with advanced image recognition capabilities. By adhering to the Model Context Protocol, this server ensures seamless integration with various AI clients and resources, making it an invaluable component in modern application development.
This comprehensive documentation is designed to provide clear guidance on using the MCP Image Recognition Server while emphasizing its role in the broader MCP ecosystem for developers building robust AI solutions.