by twardoch
Brosh is an advanced browser screenshot tool designed for developers, QA testers, AI engineers, and content creators. It excels at capturing comprehensive, scrolling screenshots of webpages using Playwright's asynchronous API.
What it does: Brosh automates the process of taking full-page or partial screenshots, including those requiring scrolling. It can capture single images, a series of images, or even animated PNGs (APNGs) of the scrolling process. Beyond pixels, Brosh intelligently extracts visible text content (as Markdown) and, optionally, the underlying HTML structure of captured sections.
Who it's for:
Why it's useful:
Brosh streamlines the process of capturing web content through several key steps:

1. **Configuration** (`__init__.py`, `cli.py`, `api.py`):
   - Capture parameters from the CLI or the Python API are validated, and a `CaptureConfig` object is created (defined in `models.py`).
1. **Browser setup** (`browser.py`, `tool.py`):
   - `BrowserManager` determines the target browser (Chrome, Edge, Safari) based on user input or auto-detection.
   - Playwright launches the browser or connects to a running instance, and the `page` object is configured with the specified viewport dimensions and zoom.
1. **Navigation** (`capture.py`, `tool.py`):
   - The `page` navigates to the target URL.
   - If `from_selector` is provided, the page scrolls to that element.
1. **Scrolling and capture** (`capture.py`):
   - `CaptureManager` calculates scroll positions based on viewport height and `scroll_step`.
   - At each position, the page is scrolled, the viewport is screenshotted, and `DOMProcessor` (from `texthtml.py`) is invoked to extract:
     - the `active_selector` for the visible content,
     - `visible_text` (converted to Markdown),
     - `visible_html` (minified).
   - Each frame's data is stored in a `CaptureFrame` object.
1. **Image processing** (`image.py`, `tool.py`):
   - `ImageProcessor` takes the captured raw image bytes (PNGs from Playwright).
   - If `scale` is specified, images are downsampled.
   - Images are then encoded in the requested format:
     - `PNG` (default): images are optimized using `pyoxipng`.
     - `JPG`: images are converted from PNG to JPG, handling transparency.
     - `APNG`: all captured frames are compiled into an animated PNG.
1. **Output** (`tool.py`):
   - `BrowserScreenshotTool` orchestrates the saving of processed images to disk. Filenames include the domain, a timestamp, the scroll position, and a section identifier (derived from `active_selector` or headers).
1. **MCP mode** (`mcp.py`):
   - When run as an MCP server (`brosh mcp` or `brosh-mcp`), a `FastMCP` server starts.
   - It exposes a `see_webpage` tool that AI agents can call; the tool uses the `capture_webpage_async` API function.
   - Results are returned as an `MCPToolResult` model, potentially including base64-encoded images and/or text/HTML, optimized for AI consumption (e.g., a smaller default image scale, text trimming).

This modular design allows for flexibility and robust error handling at each stage.
Download pre-built binaries from the latest release for your platform:
```bash
# Download the appropriate binary
wget https://github.com/twardoch/brosh/releases/latest/download/brosh-linux-x86_64
# Or for macOS Intel:
wget https://github.com/twardoch/brosh/releases/latest/download/brosh-macos-x86_64
# Or for macOS Apple Silicon:
wget https://github.com/twardoch/brosh/releases/latest/download/brosh-macos-arm64

# Make it executable and install
chmod +x brosh-*
sudo mv brosh-* /usr/local/bin/brosh

# Test the installation
brosh --version
```
```powershell
# Download the Windows binary
Invoke-WebRequest -Uri "https://github.com/twardoch/brosh/releases/latest/download/brosh-windows-x86_64.exe" -OutFile "brosh.exe"

# Move to a directory in your PATH or create a local bin directory
Move-Item brosh.exe C:\Windows\System32\brosh.exe

# Test the installation
brosh --version
```
Note: Binary releases include all dependencies and don't require Python or additional setup. However, you'll still need to install Playwright browsers as shown in section 4.6.
`uv` is a fast Python package manager.

```bash
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Run brosh directly with uvx (no installation needed)
uvx brosh shot "https://example.com"

# Or install globally as a command-line tool
uv tool install brosh

# Install with all optional dependencies (for development, testing, docs)
uv tool install "brosh[all]"
```
```bash
# Basic installation
python -m pip install brosh

# With all optional dependencies
python -m pip install "brosh[all]"
```
`pipx` installs Python applications in isolated environments.

```bash
# Install pipx (if not already installed)
python -m pip install --user pipx
python -m pipx ensurepath

# Install brosh
pipx install brosh
```
For development or to get the latest changes:
```bash
git clone https://github.com/twardoch/brosh.git
cd brosh
python -m pip install -e ".[all]"  # Editable install with all extras
```
After installing the `brosh` package, you need to install the browsers required by Playwright:

```bash
playwright install
# To install specific browsers, e.g., only Chromium:
# playwright install chromium
```
This command downloads the browser binaries (Chromium, Firefox, WebKit) that Playwright will use. Brosh primarily targets Chrome, Edge (Chromium-based), and Safari (WebKit-based).
```bash
# Capture a single webpage (e.g., example.com)
brosh shot "https://example.com"

# Capture a local HTML file
brosh shot "file:///path/to/your/local/file.html"

# For potentially better performance with multiple captures,
# start the browser in debug mode first (recommended for Chrome/Edge)
brosh --app chrome run
# Then, in the same or different terminal:
brosh --app chrome shot "https://example.com"
# When finished:
brosh --app chrome quit

# Create an animated PNG showing the scroll
brosh shot "https://example.com" --output_format apng

# Capture with a custom viewport size (e.g., common desktop)
brosh --width 1920 --height 1080 shot "https://example.com"

# Extract HTML content along with screenshots and output as JSON
brosh shot "https://example.com" --fetch_html --json > page_content.json
```
Brosh uses a `fire`-based CLI (`python-fire`). Global options are set before the command.

Pattern: `brosh [GLOBAL_OPTIONS] COMMAND [COMMAND_OPTIONS]`

Example:

```bash
brosh --width 1280 --height 720 shot "https://example.com" --scroll_step 80
```
See Command Reference for all options.
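When driving the CLI from another Python program, the argument order matters: global options go before the command, command options after it. The sketch below assembles such an argument list for `subprocess.run`. The helper `build_brosh_argv` is hypothetical; it only reuses option names that appear in this README, and it passes booleans as explicit `True`/`False` values, which python-fire parses as booleans.

```python
import shlex


def _options(opts):
    """Turn {"width": 1280} into ["--width", "1280"] (values are stringified)."""
    args = []
    for name, value in (opts or {}).items():
        args += [f"--{name}", str(value)]  # True/False become "True"/"False"
    return args


def build_brosh_argv(command, positional=(), global_opts=None, command_opts=None):
    """Assemble argv as `brosh [GLOBAL_OPTIONS] COMMAND [COMMAND_OPTIONS]` (hypothetical helper)."""
    argv = ["brosh", *_options(global_opts), command]
    argv += [str(p) for p in positional]
    argv += _options(command_opts)
    return argv


if __name__ == "__main__":
    argv = build_brosh_argv(
        "shot", ["https://example.com"], {"width": 1280, "height": 720}, {"scroll_step": 80}
    )
    print(shlex.join(argv))  # pass `argv` to subprocess.run(argv, check=True) to execute it
```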
Brosh offers both synchronous and asynchronous APIs.
```python
import asyncio

from brosh import (
    capture_webpage,
    capture_webpage_async,
    capture_full_page,
    capture_visible_area,
    capture_animation,
)
from brosh.models import ImageFormat  # For specifying image formats


# --- Synchronous API ---
# Best for simple scripts or CLI usage.
# It internally manages an asyncio event loop if needed.
def capture_sync_example():
    print("Running synchronous capture...")
    result = capture_webpage(
        url="https://example.com",
        width=1280,
        height=720,
        scroll_step=100,
        output_format=ImageFormat.JPG,  # Use the enum
        scale=75,  # Scale images to 75%
        fetch_text=True,
    )
    print(f"Captured {len(result)} screenshots synchronously.")
    for path, metadata in result.items():
        print(f"  - Saved to: {path}")
        if metadata.get("text"):
            print(f"    Text preview: {metadata['text'][:80]}...")
    return result


# --- Asynchronous API ---
# Ideal for integration into async applications (e.g., web servers, MCP).
async def capture_async_example():
    print("\nRunning asynchronous capture...")
    result = await capture_webpage_async(
        url="https://docs.python.org/3/",
        fetch_html=True,  # Get HTML content
        max_frames=3,  # Limit to 3 frames
        from_selector="#getting-started",  # Start capturing from this element
        output_format=ImageFormat.PNG,
    )
    print(f"Captured {len(result)} screenshots asynchronously.")
    for path, metadata in result.items():
        print(f"  - Saved to: {path}")
        print(f"    Selector: {metadata.get('selector', 'N/A')}")
        if metadata.get("html"):
            print(f"    HTML preview: {metadata['html'][:100]}...")
    return result


# --- Convenience Functions (use sync API by default) ---
def convenience_functions_example():
    print("\nRunning convenience functions...")

    # Capture entire page in one go (sets height=-1, max_frames=1)
    # Note: This may not work well for infinitely scrolling pages.
    full_page_result = capture_full_page(
        "https://www.python.org/psf/", output_format=ImageFormat.PNG, scale=50
    )
    print(f"Full page capture result: {list(full_page_result.keys())}")

    # Capture only the initial visible viewport (sets max_frames=1)
    visible_area_result = capture_visible_area(
        "https://www.djangoproject.com/", width=800, height=600
    )
    print(f"Visible area capture result: {list(visible_area_result.keys())}")

    # Create an animated PNG of the scrolling process
    animation_result = capture_animation(
        "https://playwright.dev/",
        anim_spf=0.8,  # 0.8 seconds per frame
        max_frames=5,  # Limit animation to 5 frames
    )
    print(f"Animation capture result: {list(animation_result.keys())}")


# --- Running the examples ---
if __name__ == "__main__":
    sync_results = capture_sync_example()

    # asyncio.run() creates a fresh event loop, runs the coroutine, and closes
    # the loop, so it does not depend on any loop the sync call above used.
    async_results = asyncio.run(capture_async_example())

    convenience_functions_example()
    print("\nAll examples complete.")
```
Brosh can run as an MCP (Model Context Protocol) server, allowing AI tools like Claude to request web captures.
```bash
# Start the MCP server using the dedicated command:
brosh-mcp
# Or, using the main brosh command:
brosh mcp
```
This will start a server that listens for requests from MCP clients.
Configuring with Claude Desktop:
Locate your Claude Desktop configuration file:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
- Linux: `~/.config/Claude/claude_desktop_config.json`
Add or modify the `mcpServers` section. Using `uvx` is recommended if `uv` is installed:
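As a hedged sketch, the Python script below performs that edit programmatically: it loads the existing configuration file (if any), adds a `brosh` entry under `mcpServers` using the `command`/`args` shape that Claude Desktop's MCP configuration expects, and writes the result back without touching other servers. The helper name `add_brosh_server`, the server name `brosh`, and the launch command `uvx brosh mcp` (modeled on the `uvx brosh shot ...` example in the installation section) are assumptions, not the project's documented configuration.

```python
import json
from pathlib import Path


def add_brosh_server(config_path: Path, server_name: str = "brosh") -> dict:
    """Add or replace an MCP server entry for brosh, keeping other settings (hypothetical helper)."""
    config = {}
    if config_path.exists() and config_path.read_text(encoding="utf-8").strip():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    servers = config.setdefault("mcpServers", {})
    # Assumption: start the server as `uvx brosh mcp`, mirroring the
    # `uvx brosh shot ...` invocation shown in the installation section.
    servers[server_name] = {"command": "uvx", "args": ["brosh", "mcp"]}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, on macOS you would call `add_brosh_server(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json")` and then restart Claude Desktop so it rereads the file.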
The JSON output (e.g., from `brosh shot ... --json`) maps each saved screenshot path to its metadata:

```json
{
  "/path/to/output/domain-ts-scroll-section.png": {
    "selector": "css_selector_for_main_content_block",
    "text": "Extracted Markdown text from the visible part of the page...",
    "html": "..." // Only if --fetch_html is true
  },
  // ... more entries for other screenshots
}
```
**Metadata Fields:**
- `selector` (str): A CSS selector identifying the most relevant content block visible in that frame (e.g., `main`, `article#content`, `div.product-details`). Defaults to `body` if no more specific selector is found.
- `text` (str): Markdown representation of the text content visible in the frame. Extracted using `html2text`. Included if `fetch_text` is true (default). Can be trimmed if `trim_text` is true (default).
- `html` (str, optional): Minified HTML of the elements fully visible in the frame. Included only if `fetch_html` is true.
For `apng` format, the JSON output will typically contain a single entry for the animation file, with metadata like `{"selector": "animated", "text": "Animation with N frames", "frames": N}`.
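A short sketch of consuming that JSON (for example, the `page_content.json` file produced in the quick-start commands). It relies only on the fields documented above: `selector`, `text`, and, for APNG entries, `frames`. The function name `summarize_capture` and the one-line summary format are illustrative choices.

```python
import json


def summarize_capture(json_text: str, preview_chars: int = 60) -> list[str]:
    """Return one summary line per screenshot entry in brosh's JSON output (hypothetical helper)."""
    data = json.loads(json_text)
    lines = []
    for path, meta in data.items():
        if "frames" in meta:  # APNG entries carry a frame count instead of page text
            lines.append(f"{path}: animation, {meta['frames']} frames")
            continue
        selector = meta.get("selector", "body")  # "body" is the documented fallback
        text = (meta.get("text") or "").replace("\n", " ")
        lines.append(f"{path} [{selector}]: {text[:preview_chars]}")
    return lines


if __name__ == "__main__":
    with open("page_content.json", encoding="utf-8") as fh:
        for line in summarize_capture(fh.read()):
            print(line)
```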
## 9. Technical Architecture Deep Dive
Brosh is built as a modular Python package. Understanding its architecture can help with advanced usage, customization, and contributions.
### 9.1. Core Modules and Flow
The primary logic flows through these key modules in `src/brosh/`:
1. **`__main__.py` & `cli.py`**:
- `__main__.py` is the entry point for the `brosh` CLI command.
- `cli.py` defines `BrowserScreenshotCLI` using `python-fire`. It parses CLI arguments, initializes common settings, and maps commands (`run`, `quit`, `shot`, `mcp`) to their respective methods. These methods often delegate to `api.py`.
1. **`api.py`**:
- This is the public interface for Brosh's functionality. It provides `capture_webpage` (synchronous wrapper) and `capture_webpage_async` (core asynchronous logic).
- It takes all capture parameters, validates them by creating a `CaptureConfig` object (from `models.py`), and then instantiates and uses `BrowserScreenshotTool`.
- Convenience functions like `capture_full_page` are also defined here.
1. **`tool.py`**:
- Contains `BrowserScreenshotTool`, the main orchestrator.
- Its `capture` method (async) manages the entire screenshot process:
- Sets up output directories.
- Determines screen dimensions if not specified.
- Initializes Playwright.
- Uses `BrowserManager` to get a browser `page`.
- Uses `CaptureManager` to get `CaptureFrame` objects.
- Processes these frames (scaling, format conversion via `ImageProcessor`, saving files).
- Cleans up browser resources.
1. **`browser.py`**:
- `BrowserManager` handles browser interactions:
- Detecting available browsers (Chrome, Edge, Safari) and their paths.
- Getting screen dimensions (OS-specific logic, Retina display handling on macOS).
- Launching browsers in debug mode or connecting to existing ones using Playwright's `connect_over_cdp` (for Chromium-based browsers) or `launch`.
- Managing debug ports (default: Chrome 9222, Edge 9223, Safari 9225).
1. **`capture.py`**:
- `CaptureManager` is responsible for the actual screenshotting logic on a given Playwright `page`:
- Navigating to the URL.
- Handling `from_selector` to scroll to a starting point.
- Calculating scroll positions based on `scroll_step` and viewport height.
- Looping through scroll positions:
- Scrolling the page (`page.evaluate("window.scrollTo(...)")`).
- Waiting for dynamic content to settle.
- Taking a screenshot of the current viewport (`page.screenshot()`).
- Using `DOMProcessor` to get visible HTML, text, and active selector.
- Storing this as a `CaptureFrame` dataclass instance.
1. **`texthtml.py`**:
- `DOMProcessor` extracts content from the browser's DOM:
- `extract_visible_content()`: Executes JavaScript in the page to get fully visible elements' HTML, then converts it to Markdown text using `html2text`. Also determines an `active_selector`.
- `get_section_id()`: Executes JavaScript to find a semantic ID for the current view (e.g., from a nearby header).
- `compress_html()`: Minifies HTML by removing comments, excessive whitespace, large data URIs, etc.
1. **`image.py`**:
- `ImageProcessor` performs image manipulations in memory using Pillow and `pyoxipng`:
- `optimize_png_bytes()`: Optimizes PNGs.
- `downsample_png_bytes()`: Resizes images.
- `convert_png_to_jpg_bytes()`: Converts PNG to JPG, handling transparency.
- `create_apng_bytes()`: Assembles multiple PNG frames into an animated PNG.
1. **`models.py`**:
- Defines Enums, dataclasses, and Pydantic models:
- `ImageFormat` (Enum: PNG, JPG, APNG) with properties for MIME type and extension.
- `CaptureConfig` (Dataclass): Holds all settings for a capture job, including validation logic.
- `CaptureFrame` (Dataclass): Represents a single captured frame's data (image bytes, scroll position, text, HTML, etc.).
- `MCPTextContent`, `MCPImageContent`, `MCPToolResult` (Pydantic Models): Define the structure for data exchange in MCP mode.
1. **`mcp.py`**:
- Implements the MCP server using `FastMCP`.
- Defines an async tool function `see_webpage` that mirrors `api.capture_webpage_async`'s signature.
- This function calls Brosh's core capture logic.
- It then formats the results into `MCPToolResult` (defined in `models.py`), handling `fetch_image`, `fetch_image_path`, `fetch_text`, `fetch_html` flags to tailor the output for AI agents.
- Includes logic to apply size limits to MCP responses.
- The `brosh-mcp` script entry point is also here.
### 9.2. Key Architectural Components (Business Domains)
As outlined in `CLAUDE.md`, Brosh's functionality can be grouped into three core domains:
#### 9.2.1. Content Capture Engine (`capture.py`, `texthtml.py`)
- **Semantic Section Detection:**
- `DOMProcessor.get_section_id()` analyzes the DOM for prominent headers (`<h1>`-`<h6>`) or elements with IDs near the top of the viewport. This generates a human-readable identifier used in filenames (e.g., `introduction`, `installation-steps`).
- `DOMProcessor.extract_visible_content()` identifies the most encompassing, fully visible elements to determine the `active_selector` and extract their HTML and text. This helps in associating screenshots with specific content blocks.
- **Viewport Management & Scrolling:**
- `CaptureManager` progressively scrolls the page. The `scroll_step` (percentage of viewport height) allows for overlapping captures if less than 100%, ensuring no content is missed.
- It dynamically determines total page height (`document.documentElement.scrollHeight`).
- Waits after scrolls (`SCROLL_AND_CONTENT_WAIT_SECONDS`) allow for dynamically loading or expanding content to render before capture.
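Section IDs such as `introduction` or `installation-steps` are filename-friendly versions of header text. The sketch below shows one common way to produce such a slug in Python. In Brosh the header lookup itself happens in JavaScript inside `DOMProcessor.get_section_id()`; the name `section_slug`, the ASCII folding, the 40-character limit, and the `section` fallback are assumptions for illustration only.

```python
import re
import unicodedata


def section_slug(header_text: str, fallback: str = "section", max_len: int = 40) -> str:
    """Turn header text into a lowercase, hyphenated ID (hypothetical helper)."""
    # Fold accented characters to ASCII, e.g. "Ü" -> "U".
    ascii_text = unicodedata.normalize("NFKD", header_text).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    slug = slug[:max_len].rstrip("-")
    return slug or fallback
```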
#### 9.2.2. Browser Integration Layer (`browser.py`)
- **Platform-Specific Browser Management:**
- `BrowserManager.get_browser_name()` uses a priority system (Chrome > Edge > Safari on macOS) for auto-detection.
- `BrowserManager.is_browser_available()` and `get_browser_paths()` check for browser installations in common locations across OSes.
- Firefox is explicitly unsupported. Safari is restricted to macOS.
- **Resolution Detection & Debug Mode:**
- `BrowserManager.get_screen_dimensions()` attempts to get logical screen resolution, accounting for Retina displays on macOS (by checking physical resolution from `system_profiler`). Falls back to defaults if detection fails.
- Orchestrates launching browsers with remote debugging enabled (`--remote-debugging-port`) on specific ports (Chrome: 9222, Edge: 9223). This allows connection to the user's actual browser profile. WebKit (Safari) uses a different launch mechanism.
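The detection rules above (priority Chrome > Edge > Safari, Safari only on macOS, Firefox rejected, one debug port per browser) can be sketched as a pure function. This is not `BrowserManager`'s code: the name `pick_browser` and the idea of passing in the set of installed browsers (instead of probing paths) are assumptions that keep the sketch testable. The port numbers are the defaults listed in section 9.1.

```python
import sys

# Default debug ports as listed in this README.
DEBUG_PORTS = {"chrome": 9222, "edge": 9223, "safari": 9225}
PRIORITY = ("chrome", "edge", "safari")


def pick_browser(available, requested="", platform=sys.platform):
    """Return (browser, debug_port) following brosh's documented rules (hypothetical helper)."""
    requested = requested.lower()
    if requested == "firefox":
        raise ValueError("Firefox is not supported")
    if requested and requested not in DEBUG_PORTS:
        raise ValueError(f"Unknown browser: {requested!r}")
    if requested == "safari" and platform != "darwin":
        raise ValueError("Safari is only supported on macOS")
    candidates = [requested] if requested else list(PRIORITY)
    for name in candidates:
        if name == "safari" and platform != "darwin":
            continue  # skip Safari during auto-detection on non-macOS systems
        if name in available:
            return name, DEBUG_PORTS[name]
    raise RuntimeError("No supported browser found")
```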
#### 9.2.3. AI Integration Protocol (`mcp.py`)
- **MCP Tool Integration:**
- The `see_webpage` function acts as the tool interface for AI systems (like Claude) via `FastMCP`.
- It handles requests, invokes Brosh's core capture logic, and formats the output.
- **Visual & Textual Context Extraction:**
- The MCP tool can return:
- Base64-encoded image data (`fetch_image=True`).
- File paths to saved images (`fetch_image_path=True`).
- Extracted Markdown text (`fetch_text=True`).
- Minified HTML (`fetch_html=True`).
- This provides rich, multi-modal context to language models. Default MCP scale is 50% to keep image sizes manageable. Text can be trimmed.
- **HTML/Selector Mapping:**
- The `selector` field in metadata links screenshots and extracted text/HTML to a specific part of the DOM structure.
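To illustrate what "base64-encoded image data plus Markdown text" looks like on the wire, here is a sketch that builds MCP-style content items. The `type`/`data`/`mimeType` and `type`/`text` keys follow the MCP specification's image and text content shapes; Brosh's own `MCPImageContent`/`MCPTextContent` models may name or structure things differently, and the simple character cut used for trimming is an assumption, not Brosh's trimming logic.

```python
import base64
from typing import Optional


def build_mcp_content(image_png: Optional[bytes], text: Optional[str], max_text_chars: int = 2000) -> list:
    """Build MCP-style content items for one captured frame (hypothetical helper)."""
    content = []
    if image_png is not None:
        content.append({
            "type": "image",
            "data": base64.b64encode(image_png).decode("ascii"),  # base64 text, as MCP expects
            "mimeType": "image/png",
        })
    if text:
        if len(text) > max_text_chars:
            text = text[:max_text_chars]  # naive trim; brosh's own trimming may differ
        content.append({"type": "text", "text": text})
    return content
```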
### 9.3. Core Business Rules
1. **Screenshot Organization:**
- Files are stored in a structured way, optionally in domain-based subdirectories (`--subdirs`).
- Filenames include domain, timestamp, scroll percentage, and a semantic section ID, promoting findability.
- Overlap in captures is controlled by `scroll_step` (10-200% of viewport height).
1. **Browser Constraints:**
- Safari usage is limited to macOS.
- Firefox is not supported.
- Specific debug ports are assigned per browser type.
1. **Content Processing:**
- `scroll_step` calculation is based on viewport percentage.
- Section-aware naming for files.
- Waits are implemented to handle dynamic content loading.
- Text is automatically extracted as Markdown; HTML extraction is optional.
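The file-naming rule (domain, timestamp, scroll percentage, section ID) can be sketched as follows. The exact separators, timestamp format, zero padding, `www.` stripping, and the definition of the percentage as `scroll_y / page_height` are all assumptions; only the four components come from the rules above and from the `domain-ts-scroll-section.png` example earlier in this README.

```python
from datetime import datetime
from urllib.parse import urlparse


def build_screenshot_name(url, scroll_y, page_height, section_id, when, ext="png"):
    """Compose `domain-timestamp-scrollpct-section.ext` (hypothetical helper)."""
    host = urlparse(url).hostname or "local"  # file:// URLs have no host
    if host.startswith("www."):
        host = host[4:]
    pct = 0 if page_height <= 0 else min(100, round(100 * scroll_y / page_height))
    stamp = when.strftime("%y%m%d-%H%M%S")
    return f"{host}-{stamp}-{pct:03d}-{section_id}.{ext}"
```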
### 9.4. Data Models (`models.py`)
The use of Pydantic models and dataclasses ensures type safety, validation, and clear data structures throughout the application. `CaptureConfig` centralizes all job parameters, `CaptureFrame` standardizes per-frame data, and MCP models ensure compliant communication with AI tools.
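The following sketch shows the general pattern of an image-format Enum with MIME-type and extension properties, and a dataclass that validates its fields in `__post_init__`. The names `ImageFormatSketch` and `CaptureConfigSketch` are deliberately different from Brosh's `ImageFormat` and `CaptureConfig`: the fields, defaults, accepted URL schemes, and the choice of `image/png` as the MIME type for APNG are assumptions, and only the 10-200 `scroll_step` range comes from this README.

```python
from dataclasses import dataclass
from enum import Enum


class ImageFormatSketch(str, Enum):
    """Illustrative stand-in for an output-format enum (not brosh's definition)."""
    PNG = "png"
    JPG = "jpg"
    APNG = "apng"

    @property
    def mime_type(self) -> str:
        # Assumption: APNG is reported as image/png (APNG files are valid PNGs).
        return "image/jpeg" if self is ImageFormatSketch.JPG else "image/png"

    @property
    def extension(self) -> str:
        return f".{self.value}"


@dataclass
class CaptureConfigSketch:
    """Illustrative capture settings with validation (not brosh's CaptureConfig)."""
    url: str
    width: int = 0  # 0 = detect from the screen (assumption)
    height: int = 0
    scroll_step: int = 100  # percent of viewport height
    output_format: ImageFormatSketch = ImageFormatSketch.PNG

    def __post_init__(self) -> None:
        if not self.url.startswith(("http://", "https://", "file://")):
            raise ValueError(f"Unsupported URL: {self.url!r}")
        if not 10 <= self.scroll_step <= 200:
            raise ValueError("scroll_step must be between 10 and 200")
        # Accept plain strings such as "jpg" coming from CLI-style input.
        self.output_format = ImageFormatSketch(self.output_format)
```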
## 10. Coding and Contributing Guidelines
We welcome contributions to Brosh! Please follow these guidelines to ensure a smooth process.
### 10.1. Development Setup
1. **Clone the repository:**
```bash
git clone https://github.com/twardoch/brosh.git
cd brosh
```
1. **Create a virtual environment and install dependencies**, using `uv` for faster environment setup:
```bash
uv venv  # Create a virtual environment
source .venv/bin/activate  # Or .venv\Scripts\activate on Windows
uv pip install -e ".[all]"
```
Alternatively, using `pip`:
```bash
python -m venv .venv
source .venv/bin/activate  # Or .venv\Scripts\activate on Windows
python -m pip install -e ".[all]"
```
1. **Install the pre-commit hooks:**
```bash
pre-commit install
```
Brosh uses `pytest` for testing.

```bash
# Run all tests
pytest

# Run tests with coverage report
pytest --cov=src/brosh --cov-report=term-missing

# Run tests in parallel (if you have pytest-xdist, included in [all])
pytest -n auto

# Run specific test file or test function
pytest tests/test_api.py
pytest tests/test_cli.py::TestBrowserScreenshotCLI::test_shot_basic
```

Ensure your changes pass all tests and maintain or increase test coverage. Test configuration is in `pyproject.toml` under `[tool.pytest.ini_options]`.
This project uses `Ruff` for linting and formatting, and `mypy` for type checking. Pre-commit hooks should handle most of this automatically.

```bash
ruff format src tests
ruff check --fix --unsafe-fixes src tests
mypy src
```

Configuration for these tools is in `pyproject.toml`.
Adhere to these principles when developing:
- Keep the project files (`README.md`, `CHANGELOG.md`, `TODO.md`) in mind.
- Use modern type hints (built-in `list`, `dict`, and `|` for unions).
- Use `loguru` for logging. Add `verbose` mode logging and `debug-log` where appropriate.
- For CLI scripts, use `fire` (`python-fire`). Standard shebangs and `this_file` comments are good practice.
- Checks (from `AGENT.md`): while `pre-commit` handles much of this, be aware of the typical full check sequence:

```bash
# This is a conceptual guide; pre-commit and hatch scripts automate parts of this.
# uzpy run .  # (If used, uzpy seems to be a project-specific runner)
fd -e py -x autoflake --remove-all-unused-imports -i {}  # Remove unused imports
fd -e py -x pyupgrade --py310-plus {}  # Upgrade syntax
fd -e py -x ruff check --fix --unsafe-fixes {}  # Ruff lint and fix
fd -e py -x ruff format {}  # Ruff format
python -m pytest  # Run tests
```

(Note: `fd` is a command-line tool. If it is not installed, adapt the Ruff/Autoflake commands to scan `src` and `tests`.) Hatch scripts (e.g., `hatch run lint:all`, `hatch run test:test-cov`) simplify running these checks.

- Work on a descriptively named branch (e.g., `feature/new-output-option` or `fix/timeout-issue`).
- Submit pull requests against the `main` branch of `twardoch/brosh`.
- `AGENT.md`: Contains detailed working principles, Python guidelines, and tool usage instructions relevant to AI-assisted development for this project.
- `CLAUDE.md`: Provides an overview of Brosh's architecture, commands, and development notes, particularly useful for understanding the system's design.
- `pyproject.toml`: Defines dependencies, build system, and tool configurations (Ruff, Mypy, Pytest, Hatch).
Error: "Could not find chrome installation" or similar.
Solution:
- Try a different browser: `brosh --app edge shot "https://example.com"`
- Run `playwright install` to ensure Playwright's browser binaries are correctly installed.

Error: "Failed to connect to browser", "TimeoutError", or browser doesn't launch as expected.
Solution:
- Start the browser in debug mode first: `brosh --app chrome run`. Then, in another terminal, run your `shot` command.
- The `--force_run` option with `run` can help.
- On Debian/Ubuntu, install the system libraries the browsers depend on: `sudo apt-get install -y libgbm-dev libnss3 libxss1 libasound2 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdbus-1-3 libdrm2 libexpat1 libgbm1 libgcc1 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxtst6`.

Error: "Screenshot timeout for position X" or page content doesn't load.
Solution:
- Try a different output format: `brosh shot "..." --output_format jpg`.

Error: "Permission denied" when saving screenshots.
Solution:
- Check write permissions for the `output_dir`.
- Use a different output directory: `brosh --output_dir /tmp/screenshots shot "..."`.
Enable verbose logging for detailed troubleshooting:

```bash
brosh --verbose shot "https://example.com"
```

This will print debug messages from `loguru` to stderr.
- When using `uvx` or `pipx` in environments like Git Bash or WSL, ensure paths are correctly resolved.

This project is licensed under the MIT License - see the LICENSE file for details.
`pyproject.toml`.
2 contributors: twardoch (Adam Twardoch, @twardoch) and claude (Claude, @claude).