by webscraping-ai
A Model Context Protocol (MCP) server implementation that integrates with WebScraping.AI for web data extraction capabilities.
Run the server directly with `npx`:

```bash
env WEBSCRAPING_AI_API_KEY=your_api_key npx -y webscraping-ai-mcp
```
Or install and run it from source:

```bash
# Clone the repository
git clone https://github.com/webscraping-ai/webscraping-ai-mcp-server.git
cd webscraping-ai-mcp-server

# Install dependencies
npm install

# Run
npm start
```
Note: Requires Cursor version 0.45.6+

The WebScraping.AI MCP server can be configured in Cursor in two ways. One is to register it in a `.cursor/mcp.json` file in your project directory.
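A minimal Node.js sketch that writes such a file, assuming Cursor's usual `mcpServers` layout (one entry per server with `command`, `args`, and `env`). The helper names `buildCursorConfig` and `writeCursorConfig` and the server label `webscraping-ai` are illustrative, not part of this project.

```javascript
const fs = require('fs');
const path = require('path');

// Build the object Cursor reads from .cursor/mcp.json (assumed mcpServers layout).
function buildCursorConfig(apiKey) {
  return {
    mcpServers: {
      // "webscraping-ai" is an arbitrary label for this server entry.
      'webscraping-ai': {
        command: 'npx',
        args: ['-y', 'webscraping-ai-mcp'],
        env: { WEBSCRAPING_AI_API_KEY: apiKey },
      },
    },
  };
}

// Write (or overwrite) <projectDir>/.cursor/mcp.json and return its path.
function writeCursorConfig(projectDir, apiKey) {
  const dir = path.join(projectDir, '.cursor');
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, 'mcp.json');
  fs.writeFileSync(file, JSON.stringify(buildCursorConfig(apiKey), null, 2) + '\n');
  return file;
}
```

For example, `writeCursorConfig(process.cwd(), process.env.WEBSCRAPING_AI_API_KEY)` run from the project root. Note that it overwrites any existing `mcp.json`, including entries for other servers.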
## Configuration
### Environment Variables
#### Required
- `WEBSCRAPING_AI_API_KEY`: Your WebScraping.AI API key
  - Required for all operations
  - Get your API key from [WebScraping.AI](https://webscraping.ai)
#### Optional Configuration
- `WEBSCRAPING_AI_CONCURRENCY_LIMIT`: Maximum number of concurrent requests (default: `5`)
- `WEBSCRAPING_AI_DEFAULT_PROXY_TYPE`: Type of proxy to use (default: `residential`)
- `WEBSCRAPING_AI_DEFAULT_JS_RENDERING`: Enable/disable JavaScript rendering (default: `true`)
- `WEBSCRAPING_AI_DEFAULT_TIMEOUT`: Maximum web page retrieval time in ms (default: `15000`, max: `30000`)
- `WEBSCRAPING_AI_DEFAULT_JS_TIMEOUT`: Maximum JavaScript rendering time in ms (default: `2000`)
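A sketch of how a client-side script might resolve these variables with the documented defaults. `resolveConfig` is a hypothetical helper, not the server's own code; treating any value other than `"false"` as enabling JavaScript rendering, and clamping the timeout to 30000 ms instead of rejecting it, are assumptions.

```javascript
// Hypothetical helper: read the WEBSCRAPING_AI_* variables with the README's defaults.
function resolveConfig(env = process.env) {
  const apiKey = env.WEBSCRAPING_AI_API_KEY;
  if (!apiKey) throw new Error('WEBSCRAPING_AI_API_KEY is required');

  // Parse a positive integer, falling back to the default when unset or empty.
  const int = (name, fallback) => {
    const raw = env[name];
    if (raw === undefined || raw === '') return fallback;
    const n = Number.parseInt(raw, 10);
    if (!Number.isFinite(n) || n <= 0) throw new Error(`${name} must be a positive integer`);
    return n;
  };

  const proxyType = env.WEBSCRAPING_AI_DEFAULT_PROXY_TYPE || 'residential';
  if (!['datacenter', 'residential'].includes(proxyType)) {
    throw new Error('WEBSCRAPING_AI_DEFAULT_PROXY_TYPE must be datacenter or residential');
  }

  return {
    apiKey,
    concurrencyLimit: int('WEBSCRAPING_AI_CONCURRENCY_LIMIT', 5),
    proxyType,
    // Assumption: only the literal string "false" disables JS rendering.
    jsRendering: env.WEBSCRAPING_AI_DEFAULT_JS_RENDERING !== 'false',
    // Clamp to the documented maximum of 30000 ms.
    timeout: Math.min(int('WEBSCRAPING_AI_DEFAULT_TIMEOUT', 15000), 30000),
    jsTimeout: int('WEBSCRAPING_AI_DEFAULT_JS_TIMEOUT', 2000),
  };
}
```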
### Configuration Examples
For standard usage:
```bash
# Required
export WEBSCRAPING_AI_API_KEY=your-api-key
# Optional - customize behavior (default values)
export WEBSCRAPING_AI_CONCURRENCY_LIMIT=5
export WEBSCRAPING_AI_DEFAULT_PROXY_TYPE=residential # datacenter or residential
export WEBSCRAPING_AI_DEFAULT_JS_RENDERING=true
export WEBSCRAPING_AI_DEFAULT_TIMEOUT=15000
export WEBSCRAPING_AI_DEFAULT_JS_TIMEOUT=2000
```
## Available Tools

### `webscraping_ai_question`

Ask questions about web page content.
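A minimal sketch of calling this tool from an already-connected MCP client (for example a `Client` from `@modelcontextprotocol/sdk`). The argument names `url` and `question` are assumptions to verify against `client.listTools()`, and `askQuestion` is a hypothetical helper.

```javascript
// Hypothetical helper: ask a question about a page through a connected MCP client.
// Assumed argument names: `url` and `question`.
async function askQuestion(client, url, question, options = {}) {
  const result = await client.callTool({
    name: 'webscraping_ai_question',
    // `options` may carry any of the common scraping options, e.g. { js: false }.
    arguments: { url, question, ...options },
  });
  // MCP tool results carry a `content` array; keep only the text parts.
  const text = result.content
    .filter((part) => part.type === 'text')
    .map((part) => part.text)
    .join('\n');
  if (result.isError) throw new Error(text || 'webscraping_ai_question failed');
  return text;
}
```

Usage: `const answer = await askQuestion(client, 'https://example.com', 'What is the main topic of this page?');`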
### `webscraping_ai_fields`

Extract structured data from web pages based on instructions.
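A sketch of field extraction, assuming the tool takes a `url` plus a `fields` object that maps each output field name to a plain-language instruction, and that it returns the extracted values as JSON text. Both assumptions should be checked against the tool schema from `client.listTools()`; `extractFields` is hypothetical.

```javascript
// Hypothetical helper: extract named fields from a page.
// Assumptions: arguments { url, fields }, and a JSON object in the text result.
async function extractFields(client, url, fields) {
  const result = await client.callTool({
    name: 'webscraping_ai_fields',
    arguments: { url, fields },
  });
  const text = result.content
    .filter((part) => part.type === 'text')
    .map((part) => part.text)
    .join('');
  if (result.isError) throw new Error(text);
  return JSON.parse(text); // throws if the assumed JSON format does not hold
}
```

Usage: `await extractFields(client, 'https://example.com/product', { title: 'Main product title', price: 'Current product price' })`.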
### `webscraping_ai_html`

Get the full HTML of a web page with JavaScript rendering.
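For pages that build their markup client-side, the `wait_for` and `js` options documented in this README can be combined with this tool. The sketch below assumes the page URL is passed as `url`; `getRenderedHtml` is a hypothetical helper.

```javascript
// Hypothetical helper: fetch rendered HTML, optionally waiting for a CSS selector.
async function getRenderedHtml(client, url, waitFor) {
  const args = { url, js: true };
  if (waitFor) args.wait_for = waitFor; // only send wait_for when a selector is given
  const result = await client.callTool({ name: 'webscraping_ai_html', arguments: args });
  const html = result.content
    .filter((part) => part.type === 'text')
    .map((part) => part.text)
    .join('');
  if (result.isError) throw new Error(html);
  return html;
}
```

Usage: `await getRenderedHtml(client, 'https://example.com', '#app')` waits for an element matching `#app` before the HTML is returned.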
### `webscraping_ai_text`

Extract the visible text content from a web page.
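For static pages, JavaScript rendering can be switched off with the documented `js` option. This sketch also caps `timeout` at the documented 30000 ms maximum rather than sending a larger value. The `url` argument name and the helper `getPageText` are assumptions.

```javascript
// Hypothetical helper: get the visible text of a page.
async function getPageText(client, url, { js = true, timeout } = {}) {
  const args = { url, js };
  // Cap at the documented maximum instead of sending a larger value.
  if (timeout !== undefined) args.timeout = Math.min(timeout, 30000);
  const result = await client.callTool({ name: 'webscraping_ai_text', arguments: args });
  const text = result.content
    .filter((part) => part.type === 'text')
    .map((part) => part.text)
    .join('');
  if (result.isError) throw new Error(text);
  return text;
}
```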
### `webscraping_ai_selected`

Extract content from a specific element using a CSS selector.
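Combined with the documented `device` option, this tool can show how one element differs between desktop and mobile layouts. The sketch assumes the arguments are named `url` and `selector`; `compareAcrossDevices` is hypothetical. Requests run one after another, which keeps the example simple and well within the server's concurrency limit.

```javascript
// Hypothetical helper: fetch the same element once per emulated device.
async function compareAcrossDevices(client, url, selector, devices = ['desktop', 'mobile']) {
  const out = {};
  for (const device of devices) {
    const result = await client.callTool({
      name: 'webscraping_ai_selected',
      arguments: { url, selector, device }, // assumed names: url, selector
    });
    const text = result.content
      .filter((part) => part.type === 'text')
      .map((part) => part.text)
      .join('');
    if (result.isError) throw new Error(`${device}: ${text}`);
    out[device] = text;
  }
  return out;
}
```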
### `webscraping_ai_selected_multiple`

Extract content from multiple elements using CSS selectors.
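A sketch that keys each result by its selector. It assumes the tool takes a `selectors` array and returns a JSON array in the same order; neither detail is confirmed here, so check the tool schema and a real response first. `getSelectedMultiple` is hypothetical.

```javascript
// Hypothetical helper: fetch several elements in one call and key results by selector.
// Assumptions: arguments { url, selectors }, and a JSON array result in selector order.
async function getSelectedMultiple(client, url, selectors) {
  const result = await client.callTool({
    name: 'webscraping_ai_selected_multiple',
    arguments: { url, selectors },
  });
  const text = result.content
    .filter((part) => part.type === 'text')
    .map((part) => part.text)
    .join('');
  if (result.isError) throw new Error(text);
  const values = JSON.parse(text);
  if (!Array.isArray(values)) throw new Error('unexpected response shape');
  return Object.fromEntries(selectors.map((sel, i) => [sel, values[i]]));
}
```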
### `webscraping_ai_account`

Get information about your WebScraping.AI account.
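A sketch of querying account details, assuming the tool takes no arguments. Because the response format is not described here, the hypothetical helper parses JSON when it can and otherwise returns the raw text.

```javascript
// Hypothetical helper: fetch account information (assumed to take no arguments).
async function getAccountInfo(client) {
  const result = await client.callTool({ name: 'webscraping_ai_account', arguments: {} });
  const text = result.content
    .filter((part) => part.type === 'text')
    .map((part) => part.text)
    .join('');
  if (result.isError) throw new Error(text);
  try {
    return JSON.parse(text); // structured response, if the server sends JSON
  } catch {
    return text; // otherwise hand back the text unchanged
  }
}
```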
## Common Options

The following options can be used with all scraping tools:

- `timeout`: Maximum web page retrieval time in ms (15000 by default, maximum is 30000)
- `js`: Execute on-page JavaScript using a headless browser (true by default)
- `js_timeout`: Maximum JavaScript rendering time in ms (2000 by default)
- `wait_for`: CSS selector to wait for before returning the page content
- `proxy`: Type of proxy, `datacenter` or `residential` (`residential` by default)
- `country`: Country of the proxy to use (US by default). Supported countries: us, gb, de, it, fr, ca, es, ru, jp, kr, in
- `custom_proxy`: Your own proxy URL in `http://user:password@host:port` format
- `device`: Type of device emulation. Supported values: desktop, mobile, tablet
- `error_on_404`: Return an error on a 404 HTTP status on the target page (false by default)
- `error_on_redirect`: Return an error on a redirect on the target page (false by default)
- `js_script`: Custom JavaScript code to execute on the target page

## Error Handling

The server reports failed requests back to the MCP client as error responses.
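Some failures can be caught before a request leaves the client. The hypothetical `validateOptions` below checks the common options against the limits documented in this README; it is a client-side convenience, and the server's own validation may be stricter or looser.

```javascript
// Hypothetical client-side check of the common options' documented limits.
const PROXY_TYPES = ['datacenter', 'residential'];
const COUNTRIES = ['us', 'gb', 'de', 'it', 'fr', 'ca', 'es', 'ru', 'jp', 'kr', 'in'];
const DEVICES = ['desktop', 'mobile', 'tablet'];

// Returns a list of problems; an empty list means the options look valid.
function validateOptions(opts = {}) {
  const problems = [];
  const t = opts.timeout;
  if (t !== undefined && !(typeof t === 'number' && t > 0 && t <= 30000)) {
    problems.push('timeout must be between 1 and 30000 ms');
  }
  if (opts.proxy !== undefined && !PROXY_TYPES.includes(opts.proxy)) {
    problems.push(`proxy must be one of: ${PROXY_TYPES.join(', ')}`);
  }
  if (opts.country !== undefined && !COUNTRIES.includes(String(opts.country).toLowerCase())) {
    problems.push(`country must be one of: ${COUNTRIES.join(', ')}`);
  }
  if (opts.device !== undefined && !DEVICES.includes(opts.device)) {
    problems.push(`device must be one of: ${DEVICES.join(', ')}`);
  }
  if (opts.custom_proxy !== undefined) {
    try {
      new URL(opts.custom_proxy); // expected shape: http://user:password@host:port
    } catch {
      problems.push('custom_proxy must be a URL like http://user:password@host:port');
    }
  }
  return problems;
}
```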
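On the client side, a failed tool call normally comes back as an MCP tool result with `isError: true` and a text description in `content`, while protocol-level problems (such as an unknown tool name) typically reject the `callTool` promise instead. A sketch of a wrapper that turns the first case into an exception; `ToolError` and `callToolOrThrow` are hypothetical names.

```javascript
// Hypothetical error type carrying the name of the tool that failed.
class ToolError extends Error {
  constructor(tool, message) {
    super(`${tool}: ${message}`);
    this.name = 'ToolError';
    this.tool = tool;
  }
}

// Call any tool and convert an MCP error result (isError: true) into a ToolError.
async function callToolOrThrow(client, name, args = {}) {
  const result = await client.callTool({ name, arguments: args });
  const text = (result.content || [])
    .filter((part) => part.type === 'text')
    .map((part) => part.text)
    .join('\n');
  if (result.isError) throw new ToolError(name, text || 'unknown error');
  return text;
}
```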
## Integration with LLMs

This server implements the Model Context Protocol, so it works with any MCP-enabled LLM platform. You can configure your LLM client to use these tools for web scraping tasks. For example, with Claude through the Anthropic SDK:
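```javascript
const { Anthropic } = require('@anthropic-ai/sdk');
const { Client } = require('@modelcontextprotocol/sdk/client/index.js');
const { StdioClientTransport } = require('@modelcontextprotocol/sdk/client/stdio.js');

async function main() {
  const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

  // Start the MCP server as a child process and talk to it over stdio.
  // process.env is spread in so the child keeps PATH and can find npx.
  const transport = new StdioClientTransport({
    command: 'npx',
    args: ['-y', 'webscraping-ai-mcp'],
    env: { ...process.env, WEBSCRAPING_AI_API_KEY: 'your-api-key' },
  });
  const client = new Client({ name: 'claude-client', version: '1.0.0' });
  await client.connect(transport);

  // Now you can use Claude with WebScraping.AI tools.
  // MCP describes tools with `inputSchema`; the Messages API expects `input_schema`.
  const { tools } = await client.listTools();
  const response = await anthropic.messages.create({
    model: 'claude-3-5-sonnet-latest', // any current Claude model id
    max_tokens: 1024,
    tools: tools.map((t) => ({ name: t.name, description: t.description, input_schema: t.inputSchema })),
    messages: [{ role: 'user', content: 'What is the main topic of example.com?' }],
  });
  console.log(response.content);

  await client.close();
}

main().catch(console.error);
```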
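The example above stops after Claude's first response. When Claude decides to use a tool, the response contains `tool_use` blocks (each with an `id`, `name`, and `input`) that the client must run and answer with `tool_result` blocks. A sketch of that step, with `runToolCalls` as a hypothetical helper that takes the connected MCP client and a Messages API response:

```javascript
// Hypothetical helper: execute Claude's tool_use blocks against the MCP server
// and build the tool_result blocks for the next user turn.
async function runToolCalls(mcpClient, response) {
  const results = [];
  for (const block of response.content) {
    if (block.type !== 'tool_use') continue; // skip text blocks
    const result = await mcpClient.callTool({ name: block.name, arguments: block.input });
    const text = (result.content || [])
      .filter((part) => part.type === 'text')
      .map((part) => part.text)
      .join('\n');
    results.push({
      type: 'tool_result',
      tool_use_id: block.id, // ties the result to Claude's request
      content: text,
      is_error: Boolean(result.isError),
    });
  }
  return results;
}
```

To continue the conversation, append an assistant turn containing `response.content` and a user turn whose content is the returned array, then call the Messages API again; repeat while `response.stop_reason === 'tool_use'`.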
## Development

```bash
# Clone the repository
git clone https://github.com/webscraping-ai/webscraping-ai-mcp-server.git
cd webscraping-ai-mcp-server

# Install dependencies
npm install

# Run tests
npm test

# Add your .env file
cp .env.example .env

# Start the inspector
npx @modelcontextprotocol/inspector node src/index.js
```
MIT License - see LICENSE file for details