Browser Automation
Your personal assistant can browse the web. Kraken includes a headless Chromium browser as a built-in tool, enabling web research, data extraction, form filling, and screenshot capture — all from within a conversation.
Architecture
Kraken API ──(Playwright CDP)──> Browserless (Chromium)
The browser runs as a separate Docker container (Browserless), managed by Docker Compose. Kraken connects via the Chrome DevTools Protocol (CDP) using Playwright.
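As a rough illustration of what a Playwright CDP connection to Browserless can look like, here is a minimal sketch using Playwright's Python bindings. It is not Kraken's actual code: the document does not say which Playwright binding Kraken uses, and `validate_ws_endpoint` and `fetch_title` are hypothetical helper names. The default endpoint is the `BROWSER_WS_ENDPOINT` value shown in the Configuration section.

```python
from urllib.parse import urlparse


def validate_ws_endpoint(endpoint: str) -> str:
    """Return the endpoint unchanged if it looks like a ws:// or wss:// URL."""
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("ws", "wss") or not parsed.hostname:
        raise ValueError(f"not a WebSocket endpoint: {endpoint!r}")
    return endpoint


def fetch_title(url: str, endpoint: str = "ws://browserless:3000") -> str:
    """Connect to a remote Chromium over CDP and return the page title.

    Hypothetical helper: requires the `playwright` package and a reachable
    Browserless container, so the import is deferred until call time.
    """
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(validate_ws_endpoint(endpoint))
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()
```

With the Compose stack running, `fetch_title("https://example.com")` would open a page in the Browserless container rather than in a locally installed browser.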
Configuration
# Browser connection
BROWSER_WS_ENDPOINT=ws://browserless:3000 # WebSocket endpoint for CDP
# Security
KRAKEN_BROWSER_BLOCKED_HOSTS= # Additional hosts to block (comma-separated)
The browser container is included in the default docker-compose.yml:
browserless:
  image: ghcr.io/browserless/chromium:latest
  restart: unless-stopped
  environment:
    - MAX_CONCURRENT_SESSIONS=5
    - CONNECTION_TIMEOUT=120000
What the Agent Can Do
When the browser tool is available, the agent can:
| Action | Description |
|---|---|
| Navigate | Go to any public URL |
| Extract text | Get the text content of a page or specific element |
| Screenshot | Capture full-page or element screenshots |
| Click | Click buttons, links, or any element by selector |
| Fill forms | Type into input fields, select dropdowns |
| Wait | Wait for elements to appear before interacting |
| Execute JS | Run JavaScript in the page context |
Example conversation
You: Find the latest release of Node.js and tell me what's new.
Kraken: I'll check the Node.js releases page.
[Agent uses browser: navigates to nodejs.org/en/blog, extracts latest release post]
The latest Node.js release is v22.15.0 (LTS). Key changes include...
The agent decides when to use the browser based on the query. You don't need to explicitly ask it to "use the browser" — if web information is needed, it will browse.
SSRF Protection
Kraken blocks requests to internal/private IP ranges by default. The browser cannot navigate to:
- 127.0.0.0/8 (localhost)
- 10.0.0.0/8 (private class A)
- 172.16.0.0/12 (private class B)
- 192.168.0.0/16 (private class C)
- 169.254.0.0/16 (link-local)
- ::1 (IPv6 loopback)
- fc00::/7 (IPv6 unique local)
This prevents the agent from being used to probe internal services.
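A check against the ranges listed above can be sketched with Python's standard `ipaddress` module. This is an illustration only, not Kraken's implementation; `BLOCKED_NETWORKS` and `is_blocked_ip` are hypothetical names, and the sketch works on literal IP addresses only.

```python
import ipaddress

# The ranges listed in this section, parsed once at import time.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8",     # localhost
        "10.0.0.0/8",      # private class A
        "172.16.0.0/12",   # private class B
        "192.168.0.0/16",  # private class C
        "169.254.0.0/16",  # link-local
        "::1/128",         # IPv6 loopback
        "fc00::/7",        # IPv6 unique local
    )
]


def is_blocked_ip(address: str) -> bool:
    """Return True if the literal IP address falls inside a blocked range."""
    ip = ipaddress.ip_address(address)
    return any(
        ip.version == net.version and ip in net for net in BLOCKED_NETWORKS
    )
```

A real guard would presumably also resolve hostnames and check the resolved addresses, since a public-looking domain can point at a private IP. The sketch leaves that step out.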
Additional blocked hosts
You can block additional domains via KRAKEN_BROWSER_BLOCKED_HOSTS:
KRAKEN_BROWSER_BLOCKED_HOSTS=internal.company.com,admin.local
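To make the comma-separated format concrete, here is a minimal sketch of how such a value could be parsed and applied to a URL. The helper names are hypothetical, and exact hostname matching is an assumption: the document does not say whether Kraken also blocks subdomains of a listed host.

```python
from urllib.parse import urlparse


def parse_blocked_hosts(raw: str) -> frozenset:
    """Split a comma-separated host list, ignoring blanks and letter case."""
    return frozenset(
        part.strip().lower() for part in raw.split(",") if part.strip()
    )


def url_is_blocked(url: str, blocked: frozenset) -> bool:
    """Return True if the URL's hostname exactly matches a blocked host."""
    host = (urlparse(url).hostname or "").lower()
    return host in blocked


blocked = parse_blocked_hosts("internal.company.com,admin.local")
print(url_is_blocked("https://internal.company.com/dashboard", blocked))
print(url_is_blocked("https://example.com/", blocked))
```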
Using Browser via the API
You don't call browser actions directly — the agent uses the browser tool autonomously when needed during chat. However, you can guide it:
# Research task — agent will use browser if needed
response = client.chat(
    "Go to https://news.ycombinator.com and summarize the top 5 stories",
    session_key="research",
)
print(response.content)
# Screenshot request
response = client.chat(
    "Take a screenshot of https://example.com",
    session_key="screenshots",
)
# The screenshot is captured and described in the response
Tool calls in response
When the agent uses the browser, you'll see it reflected in tool_calls:
response = client.chat("Check the weather on weather.gov")
for tool_call in response.tool_calls:
    print(f"Tool: {tool_call.name}")
    print(f"Args: {tool_call.arguments}")
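If you only care about browser activity, you can filter `tool_calls` by name. The sketch below uses stand-in `ToolCall` and `Response` dataclasses so it runs without a Kraken client. The tool name `"browser"` and the shape of `arguments` are assumptions, so check what your responses actually contain.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """Stand-in for the client's tool call object (name and arguments only)."""
    name: str
    arguments: dict = field(default_factory=dict)


@dataclass
class Response:
    """Stand-in for the client's chat response."""
    content: str
    tool_calls: list = field(default_factory=list)


def browser_calls(response, tool_name: str = "browser") -> list:
    """Return only the tool calls whose name matches the assumed browser tool."""
    return [call for call in response.tool_calls if call.name == tool_name]


# Hypothetical response with one browser call and one unrelated call.
response = Response(
    content="Forecast summary...",
    tool_calls=[
        ToolCall("browser", {"action": "navigate", "url": "https://weather.gov"}),
        ToolCall("calculator", {"expression": "1 + 1"}),
    ],
)
for call in browser_calls(response):
    print(call.name, call.arguments)
```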
Limitations
- No persistent browser state: Each browser action starts with a clean session. No cookies or login state persists between conversations.
- Public URLs only: SSRF protection blocks private/internal addresses.
- Timeout: Browser actions time out after 120 seconds (configurable via CONNECTION_TIMEOUT, in milliseconds, on the Browserless container).
- No file downloads: The browser can view pages and extract content but doesn't download files to disk.
- JavaScript-heavy pages: Pages that require complex JS interaction may need explicit wait/click instructions.
Disabling the Browser
If you don't need browser capabilities:
- Remove the browserless service from docker-compose.yml
- Unset BROWSER_WS_ENDPOINT in your .env
The agent will gracefully degrade — it won't attempt browser actions if no endpoint is configured.
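The degradation described above amounts to registering the browser tool only when an endpoint is configured. A minimal sketch of that pattern follows. It is not Kraken's code: `enabled_tools` and the tool names are hypothetical.

```python
import os


def enabled_tools(env=None, other_tools=()) -> list:
    """Return the tool names to expose, adding "browser" only when configured.

    `env` defaults to the process environment; pass a dict to test the logic.
    """
    env = os.environ if env is None else env
    tools = list(other_tools)
    # An unset or blank BROWSER_WS_ENDPOINT means no browser tool at all.
    if env.get("BROWSER_WS_ENDPOINT", "").strip():
        tools.append("browser")
    return tools
```

Because the tool is never registered, the agent has nothing to call, and this avoids a failed connection attempt at request time.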