Site Crawler MCP - Web Crawling and Data Extraction

12 Extraction Modes

Choose from 12 extraction modes, each tailored to a different use case

🖼️

Images

Extract all images with metadata, file sizes, formats, and alt-text analysis

📊

SEO Metadata

Complete SEO analysis including title tags, meta descriptions, and structured data

🏢

Brand Information

Extract brand assets, logos, company information, and brand guidelines

⚡

Performance Metrics

Analyze loading times, resource sizes, and performance optimization opportunities

🔒

Security Headers

Comprehensive security analysis including SSL, HTTPS, and security headers

⚖️

Legal Compliance

Detect privacy policies, GDPR compliance, terms of service, and legal pages

🏗️

Infrastructure

Technical infrastructure details, hosting information, and technology stack

💼

Career Information

Extract job postings, career pages, and hiring information

📞

Contact Details

Find contact information, addresses, phone numbers, and social media links

👥

Client References

Extract client testimonials, case studies, and business references

🛒

E-commerce Assets

Product images, pricing information, and e-commerce platform analysis

🔍

Content Analysis

Text analysis, keyword density, content structure, and readability metrics

Usage Examples

See how to use Site Crawler MCP in different scenarios

Basic Image Extraction

{
  "modes": ["images"],
  "depth": 2
}

Extract all images, along with their metadata, from pages up to 2 levels deep
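As a rough illustration of what a depth limit means, the sketch below walks a small in-memory link graph breadth-first and stops following links once the limit is reached. It assumes the start page counts as depth 0; the source does not say how Site Crawler MCP counts levels, and `crawl_order` and `SITE` are hypothetical names used only for this example.

```python
from collections import deque


def crawl_order(links, start, max_depth):
    """Return (page, depth) pairs reachable from start within max_depth hops."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        order.append((page, depth))
        if depth == max_depth:
            continue  # depth limit reached: do not follow links from this page
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order


# Hypothetical link graph standing in for real pages.
SITE = {
    "/": ["/about", "/products"],
    "/about": ["/team"],
    "/products": ["/products/a"],
    "/products/a": ["/products/a/specs"],
}

visited = crawl_order(SITE, "/", max_depth=2)
```

With `max_depth=2`, the page `/products/a/specs` sits three hops from the start and is never visited.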

SEO + Images Combined

{
  "modes": ["images", "meta"],
  "depth": 1
}

Get both image assets and SEO metadata in a single crawl session

Complete Site Analysis

{
  "modes": ["images", "meta", "security", "brand"],
  "depth": 3
}

Comprehensive analysis covering images, SEO metadata, security headers, and brand information
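For context, an MCP client sends arguments like these inside a JSON-RPC `tools/call` request. The sketch below builds such a request in Python. The tool name `crawl_site` and the `url` argument are assumptions made for illustration, since the examples above show only `modes` and `depth`; check the server's tool listing for the real names.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Wrap tool arguments in an MCP-style JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    1,
    "crawl_site",  # hypothetical tool name, not confirmed by the source
    {
        "url": "https://example.com",  # assumed argument; not shown in the examples
        "modes": ["images", "meta", "security", "brand"],
        "depth": 3,
    },
)

print(json.dumps(request, indent=2))
```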

Contributors

Meet the team behind Site Crawler MCP

Andaç Güven

Author

Muhammed Kılıç

Contributor

Technical Specifications

Built with modern Python technologies for optimal performance

🐍

Python 3.10+

Modern Python with async/await support for concurrent operations

⚡

Async HTTP

Built with aiohttp for high-performance asynchronous web crawling

🔧

BeautifulSoup4

Advanced HTML parsing and extraction with CSS selectors

🚦

Rate Limiting

Delays of 1-2 seconds between requests, with a maximum of 5 concurrent requests
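One common way to get this behavior with asyncio is a semaphore that caps concurrency plus a randomized sleep before each request. The sketch below is a minimal version under that assumption, not the project's actual implementation. `polite_fetch`, `crawl_all`, and the `fetch` callable are hypothetical names; `fetch` stands in for whatever coroutine performs the HTTP request.

```python
import asyncio
import random

MAX_CONCURRENT = 5        # matches the stated concurrency cap
DELAY_RANGE = (1.0, 2.0)  # matches the stated 1-2 second delay


async def polite_fetch(url, semaphore, fetch, delay_range=DELAY_RANGE):
    """Wait for a free slot, pause for a random delay, then run fetch(url)."""
    async with semaphore:
        await asyncio.sleep(random.uniform(*delay_range))
        return await fetch(url)


async def crawl_all(urls, fetch, max_concurrent=MAX_CONCURRENT,
                    delay_range=DELAY_RANGE):
    """Fetch every URL, never running more than max_concurrent at once."""
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [polite_fetch(u, semaphore, fetch, delay_range) for u in urls]
    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*tasks)
```

Passing `fetch` as a parameter keeps the throttling logic independent of the HTTP library, so it can be exercised with a stub coroutine.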

🔄

Retry Logic

Automatic retry with exponential backoff for failed requests
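Exponential backoff doubles the wait after each failed attempt. The sketch below shows the general pattern with a little random jitter added. The attempt count, base delay, and cap are assumed values, and `retry_with_backoff` is a hypothetical helper rather than the project's own function.

```python
import asyncio
import random


async def retry_with_backoff(operation, max_attempts=4, base_delay=0.5,
                             max_delay=8.0):
    """Await operation(), retrying on failure with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return await operation()
        except Exception:
            # A real crawler would likely catch only network-related errors.
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small jitter so many clients do not retry in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
```

With the defaults above, the waits before the second, third, and fourth attempts are roughly 0.5, 1, and 2 seconds.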

📱

MCP Protocol

Native Model Context Protocol integration for AI assistants

Installation Methods

Multiple installation options to suit your development environment

PyPI Installation (Coming Soon)

pip install site-crawler-mcp

Install from the Python Package Index once the package is published

Source Installation with uv

git clone https://github.com/AndacGuven/site-crawler-mcp.git
cd site-crawler-mcp
uv pip install -e .

Development installation using the modern uv package manager

Virtual Environment Setup

python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -e .

Recommended setup with isolated Python environment