Webscan Server for web content analysis, including page fetching, link extraction, site crawling, and sitemap generation
The Webscan Server is an MCP server for scanning, analyzing, and extracting web content through the Model Context Protocol (MCP). Its tools fetch pages and convert them to Markdown, extract links along with their text, crawl websites recursively, check for broken links, find URLs that match a pattern, and generate XML sitemaps. Because it speaks MCP, it integrates with AI applications such as Claude Desktop, Continue, and Cursor, which makes it a practical choice for projects that need web content analysis.
This Webscan Server is equipped with several core features that enable comprehensive web content analysis:
The server can fetch a web page and convert it to Markdown. This is useful when you need to analyze or present a page's content in a structured form that developers and analysts can read easily.
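To illustrate what HTML-to-Markdown conversion involves, here is a minimal TypeScript sketch. The function name `htmlToMarkdown` is hypothetical and this is not the server's implementation: it handles only headings, paragraphs, links, and bold/italic text, and it ignores HTML entities and the nesting edge cases a real converter must handle.

```typescript
// Hypothetical, deliberately tiny HTML-to-Markdown converter (not the server's code).
// Handles <h1>-<h6>, <p>, <a href="...">, <strong>/<b>, <em>/<i>; strips other tags.
function htmlToMarkdown(html: string): string {
  let md = html;
  // Headings: <h2>Text</h2> -> "## Text", surrounded by blank lines
  md = md.replace(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi, (_m, level: string, text: string) =>
    `\n\n${"#".repeat(Number(level))} ${text.trim()}\n\n`);
  // Links: <a href="url">text</a> -> [text](url); only double-quoted href is recognized
  md = md.replace(/<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, (_m, href: string, text: string) =>
    `[${text.trim()}](${href})`);
  // Bold and italic
  md = md.replace(/<(strong|b)>([\s\S]*?)<\/\1>/gi, "**$2**");
  md = md.replace(/<(em|i)>([\s\S]*?)<\/\1>/gi, "*$2*");
  // Paragraphs become blank-line-separated blocks
  md = md.replace(/<p[^>]*>([\s\S]*?)<\/p>/gi, "\n\n$1\n\n");
  // Drop any tags that are left
  md = md.replace(/<[^>]+>/g, "");
  // Collapse runs of three or more newlines into one blank line, then trim
  return md.replace(/\n{3,}/g, "\n\n").trim();
}
```

For example, `htmlToMarkdown('<h1>Title</h1><p>See <a href="https://example.com">this</a> page.</p>')` returns `"# Title\n\nSee [this](https://example.com) page."`. A production converter would parse the DOM rather than rely on regular expressions.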
The server can extract every link on a given web page together with its link text. Optional parameters let you filter links by URL or cap the number of results returned, so only relevant data is processed.
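The following TypeScript sketch shows one way link extraction with a URL filter and a result limit could work. The names `extractLinks`, `baseUrl`, and `limit`, and the default of 100 results, are assumptions for illustration; the regex approach only recognizes double-quoted `href` attributes.

```typescript
// Hypothetical sketch: pull <a href> links and their text out of raw HTML,
// resolve them against the page URL, optionally keep only links under a base URL,
// and cap the number of results. Names and the default limit are assumptions.
interface ExtractedLink {
  url: string;
  text: string;
}

function extractLinks(
  html: string,
  pageUrl: string,
  options: { baseUrl?: string; limit?: number } = {},
): ExtractedLink[] {
  const { baseUrl, limit = 100 } = options;
  const links: ExtractedLink[] = [];
  const anchor = /<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi;
  let match: RegExpExecArray | null;
  while ((match = anchor.exec(html)) !== null && links.length < limit) {
    let resolved: string;
    try {
      resolved = new URL(match[1], pageUrl).href; // make relative links absolute
    } catch {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    if (baseUrl !== undefined && !resolved.startsWith(baseUrl)) continue;
    // Strip nested tags from the link text and normalize whitespace
    const text = match[2].replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim();
    links.push({ url: resolved, text });
  }
  return links;
}
```

Relative links are resolved with the standard `URL` constructor before filtering, so a `baseUrl` of `https://example.com/` keeps `/docs` on that host but drops links to other hosts.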
Webscan can crawl a website recursively up to a specified depth. This is useful for audits, SEO work, and sitemap creation, because it explores linked content across the site as far as the depth limit allows.
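A depth-limited crawl is essentially a breadth-first search over links. This TypeScript sketch assumes a hypothetical synchronous `getLinks` callback in place of real HTTP fetching (a real crawler would fetch pages asynchronously), and it only follows links on the starting URL's origin; neither detail is taken from the server's code.

```typescript
// Hypothetical depth-limited breadth-first crawl. `getLinks` stands in for
// "fetch the page and extract its links"; only same-origin URLs are followed.
// Returns a map from each discovered URL to the depth at which it was first seen.
function crawl(
  startUrl: string,
  maxDepth: number,
  getLinks: (url: string) => string[],
): Map<string, number> {
  const start = new URL(startUrl); // normalizes "https://example.com" to "https://example.com/"
  const depthOf = new Map<string, number>([[start.href, 0]]);
  const queue: string[] = [start.href];
  while (queue.length > 0) {
    const url = queue.shift()!;
    const depth = depthOf.get(url)!;
    if (depth >= maxDepth) continue; // pages at the depth limit are recorded but not expanded
    for (const link of getLinks(url)) {
      let next: URL;
      try {
        next = new URL(link, url); // resolve relative links against the current page
      } catch {
        continue; // ignore malformed links
      }
      next.hash = ""; // "#section" fragments point at the same page
      if (next.origin !== start.origin || depthOf.has(next.href)) continue;
      depthOf.set(next.href, depth + 1);
      queue.push(next.href);
    }
  }
  return depthOf;
}
```

Breadth-first order records each page at the shallowest depth from which it is reachable, which is exactly what a maximum crawl depth is meant to bound.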
The server can also check a web page for broken links and find URLs that match a pattern written as a JavaScript-compatible regular expression. These tools help keep a site usable by catching outdated or incorrect references early.
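Both checks reduce to simple filters once the links are known. In this TypeScript sketch the function names are hypothetical, the HTTP statuses are assumed to have been collected already (`null` meaning the request failed outright), and treating any status of 400 or above as broken is an assumption rather than the server's documented rule.

```typescript
// Hypothetical sketch of the two checks.
// 1) Pattern matching: keep URLs that match a JavaScript regular expression.
function findMatchingUrls(urls: string[], pattern: string): string[] {
  const re = new RegExp(pattern); // throws SyntaxError on an invalid pattern; no "g" flag on purpose
  return urls.filter((url) => re.test(url));
}

// 2) Broken-link detection: given the HTTP status observed for each link
//    (or null if the request failed), return the links considered broken.
//    Treating every status >= 400 as broken is an assumption of this sketch.
function findBrokenLinks(statusByUrl: Record<string, number | null>): string[] {
  return Object.keys(statusByUrl).filter((url) => {
    const status = statusByUrl[url];
    return status === null || status >= 400;
  });
}
```

The regular expression is built without the `g` flag on purpose: a global `RegExp` keeps `lastIndex` state between `test` calls, which can make repeated matching skip URLs.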
The server can generate XML sitemaps that help search engines discover and index your website's content. Setting a maximum crawl depth controls how far the server explores from the root URL, trading coverage against performance and resource usage.
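Once a crawl has produced a list of URLs, writing the sitemap is mostly string assembly. This TypeScript sketch follows the sitemaps.org 0.9 format and entity-escapes the five characters the protocol requires; the function names are hypothetical and optional tags such as `<lastmod>` are omitted.

```typescript
// Hypothetical sketch: render already-crawled URLs as a sitemaps.org 0.9 sitemap.
// Only the required <loc> element is emitted for each URL.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSitemap(urls: string[]): string {
  const entries = urls.map((url) => `  <url>\n    <loc>${escapeXml(url)}</loc>\n  </url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```

Under the sitemaps.org protocol a single sitemap file may list at most 50,000 URLs, so large sites need a cap on crawled URLs or a sitemap index.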
The Webscan Server implements the Model Context Protocol (MCP), which lets it connect to AI clients such as Claude Desktop. Following the MCP specification keeps communication between server and client standardized, and the server exposes its features as a set of tools that clients invoke through MCP requests.
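Under MCP, a client runs a tool by sending a JSON-RPC 2.0 `tools/call` request that names the tool and passes its arguments. The TypeScript sketch below builds such a request; the tool name `fetch-page` and its `url` argument are assumptions for illustration, and a client can discover the server's actual tool names with a `tools/list` request.

```typescript
// Sketch of the JSON-RPC 2.0 message an MCP client sends to run a tool.
// The tool name "fetch-page" and its "url" argument are illustrative assumptions.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function makeToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = makeToolCall(1, "fetch-page", { url: "https://example.com" });
console.log(JSON.stringify(request));
// prints {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"fetch-page","arguments":{"url":"https://example.com"}}}
```

In practice an MCP client library builds and sends these messages for you; the sketch only shows their shape.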
To install Webscan for Claude Desktop automatically via Smithery:
```bash
npx -y @smithery/cli install mcp-server-webscan --client claude
```
To install manually, clone the repository and build it:
```bash
git clone <repository-url>
cd mcp-server-webscan
npm install
npm run build
```
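For a manual build, register the server in your MCP client's configuration (for Claude Desktop, `claude_desktop_config.json`), pointing `args` at the built `index.js`:
```json
{
  "mcpServers": {
    "webscan": {
      "command": "node",
      "args": ["path/to/mcp-server-webscan/build/index.js"],
      "env": {
        "NODE_ENV": "development",
        "LOG_LEVEL": "info"
      }
    }
  }
}
```
The `LOG_LEVEL` environment variable sets the server's log level.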
Once the server is configured, you can ask the assistant, for example:

Could you fetch the content from https://example.com and convert it to Markdown?
Support for other MCP clients is planned and can be extended as needed.
```mermaid
graph TD
    A[AI Application] -->|MCP Client| B[MCP Server]
    B --> C[Data Source/Tool]
```
The diagram shows the flow of communication: the AI application reaches the Webscan Server through its MCP client, and the server then accesses the underlying data source or tool.
| MCP Client | Resources | Tools | Prompts | Status |
|---|---|---|---|---|
| Claude Desktop | ✅ | ✅ | ✅ | Full Support |
| Continue | ✅ | ✅ | ✅ | Full Support |
| Cursor | ❌ | ✅ | ❌ | Tools Only |
The Webscan Server is configured through environment variables, and it includes error handling and secure execution practices intended to keep it reliable. To customize the server's behavior, edit the `env` settings in the `mcpServers` section of your MCP client's configuration.
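As an illustration of how a value from the `env` block might be consumed, this TypeScript sketch resolves `LOG_LEVEL` with validation and a fallback. The function name, the accepted level names, and the `info` default are assumptions, not the server's documented behavior; in Node.js you would pass `process.env` as the argument.

```typescript
// Hypothetical sketch: read LOG_LEVEL from an environment map (e.g. process.env).
// The accepted names and the "info" fallback are assumptions for illustration.
const LOG_LEVELS = ["debug", "info", "warn", "error"] as const;
type LogLevel = (typeof LOG_LEVELS)[number];

function resolveLogLevel(env: Record<string, string | undefined>): LogLevel {
  const raw = env["LOG_LEVEL"]?.toLowerCase(); // accept "DEBUG" as well as "debug"
  // Unknown or missing values fall back to "info" instead of failing at startup
  return LOG_LEVELS.find((level) => level === raw) ?? "info";
}
```

Falling back to a default rather than throwing keeps a typo in the client configuration from preventing the server from starting.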
This documentation covers the Webscan Server's capabilities and its integration with MCP clients so that developers can use it for web content analysis in their projects.