Architecture
Kraken is a self-contained stack that runs via Docker Compose. You deploy it once on your own hardware and it becomes your personal AI assistant for everything: coding, research, scheduling, browsing, automation. No external services, no third-party memory providers, no vendor lock-in. Every conversation from every connected platform feeds the same brain.
System Overview
Your App (Python SDK / any HTTP client)
                      │
                      ▼
  ┌───────────────────────────────────────┐
  │           Kraken API (Hono)           │
  │     REST + WebSocket + streaming      │
  │     OpenAI-compatible /v1/chat/*      │
  └───────────────────┬───────────────────┘
                      │
      ┌───────────────┴───────┬───────────────────┐
      │                       │                   │
      ▼                       ▼                   ▼
 ┌──────────────────────┐  ┌───────────────┐  ┌───────────────────┐
 │       Worker         │  │ PostgreSQL 17 │  │      Neo4j 5      │
 │      (BullMQ)        │  │  + pgvector   │  │ (knowledge graph) │
 │                      │  │               │  │                   │
 │ Jobs:                │  │ • Sessions    │  │ • Entities        │
 │ • Extract entities   │  │               │  │ • Relationships   │
 │ • Detect communities │  │               │  │ • Communities     │
 │ • Update user model  │  │               │  │                   │
 │ • Reflect on skills  │  │               │  │                   │
 │ • Dream cycle        │  │               │  │                   │
 │ • Execute schedules  │  │               │  │                   │
 └──────────────────────┘  └───────────────┘  └───────────────────┘
                      │
           ┌──────────┼──────────┐
           ▼          ▼          ▼
       ┌─────────┐ ┌──────────┐ ┌───────────┐
       │ Redis 7 │ │ Chromium │ │  Sandbox  │
       │ (queues │ │ (browser │ │  (Docker  │
       │  cache) │ │   CDP)   │ │  isolate) │
       └─────────┘ └──────────┘ └───────────┘
Components
Kraken API
The core HTTP server built on Hono. Handles all client-facing requests:
- Chat: `/v1/chat` (native) and `/v1/chat/completions` (OpenAI-compatible). Assembles context from SOUL.md, user model, memory retrieval, and relevant skills before calling the LLM.
- Memory: Query the knowledge graph, manage entities and relationships, explore communities.
- Identity: Read/write SOUL.md personality, view the auto-maintained user model, manage cross-platform identity links.
- Sessions: CRUD for conversation sessions with stable routing keys.
- Skills: Manage reusable procedure documents that the agent loads by relevance.
- Tools: Registry of available tools with JSON Schema definitions.
- Schedules: Create and manage cron-based automated tasks.
The API calls the LLM via the Vercel AI SDK, which abstracts over OpenAI, Anthropic, and other providers.
Worker
A separate Node.js process that consumes jobs from Redis via BullMQ. Jobs run asynchronously so the API stays responsive:
| Job | Trigger | What it does |
|---|---|---|
| memory-extraction | After every chat | Extracts entities and relationships from the conversation |
| memory-communities | After entities extracted | Re-clusters the knowledge graph using community detection, updates hierarchical summaries |
| memory-user-model | After every chat | Compresses new signals into the persistent user model |
| skill-reflection | After chat (conditional) | Evaluates if the conversation produced a reusable workflow, creates or updates a skill |
| memory-dream | Every 15 min (configurable) | Offline consolidation: reviews recent conversations, suggests skills/tools, strengthens graph connections |
| schedule-execution | Every minute (cron tick) | Executes due scheduled tasks as new sessions |
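The post-chat fan-out can be pictured as a small dispatch function. The job names mirror the table above; the two boolean triggers (`produced_workflow`, `graph_changed`) are illustrative assumptions, not Kraken's real signals.

```python
def jobs_after_chat(produced_workflow: bool, graph_changed: bool) -> list[str]:
    """Decide which background jobs to enqueue after a chat turn (sketch)."""
    jobs = ["memory-extraction", "memory-user-model"]  # run after every chat
    if produced_workflow:
        jobs.append("skill-reflection")   # conditional: a reusable workflow emerged
    if graph_changed:
        jobs.append("memory-communities")  # re-cluster only when entities changed
    return jobs
```

In the real system these names would be BullMQ queue/job identifiers consumed by the worker process.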
PostgreSQL + pgvector
The primary relational store. Holds:
- Sessions: with stable `session_key` for routing, personality overlays, metadata
- Messages: with pgvector embeddings for semantic search, full-text search index on content
- Skills: versioned procedure documents with embeddings for relevance matching
- Tools: registered tool schemas with embeddings
- Identity: SOUL.md content, user model, AGENTS.md
- Schedules: cron expressions, task prompts, run tracking
- Identity Links: cross-platform user mappings
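In production the semantic search runs inside Postgres via pgvector's distance operators, but the ranking itself is plain nearest-neighbor search over embeddings. An in-process sketch of the same idea:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm


def top_k(query_vec: list[float],
          messages: list[tuple[str, list[float]]], k: int = 3):
    """Rank (content, embedding) pairs by similarity to the query vector."""
    return sorted(messages, key=lambda m: cosine(query_vec, m[1]),
                  reverse=True)[:k]


msgs = [("deploy notes", [1.0, 0.0]), ("lunch plans", [0.0, 1.0])]
```

The equivalent pgvector query would use an `ORDER BY embedding <=> $1 LIMIT k` clause against an index instead of scanning in Python.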
Neo4j
The knowledge graph. Stores structured understanding of the world:
- Entities: people, projects, tools, concepts, goals, preferences, all with typed properties
- Relationships: directed edges like `works_on`, `uses`, `prefers`, `relates_to`, `has_goal`, `knows_about`, `depends_on`
- Communities: hierarchical clusters detected via the Leiden algorithm, each with an LLM-generated summary
This is what makes Kraken's memory fundamentally different from flat context stuffing. The graph enables queries like "show me everything related to Project X" or "what are the common themes across all my goals", queries that flat memory systems can't answer.
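A "show me everything related to Project X" question maps naturally onto a variable-length Cypher traversal. The sketch below builds such a query; the `Entity` label and `name` property are illustrative assumptions, not Kraken's actual node schema.

```python
def related_to(entity_name: str, depth: int = 2) -> tuple[str, dict]:
    """Build a Cypher query for 'everything related to X' up to `depth` hops.

    Returns (query, parameters); parameters keep the entity name out of
    the query string, as Neo4j drivers expect.
    """
    query = (
        f"MATCH (e:Entity {{name: $name}})-[*1..{depth}]-(n) "
        "RETURN DISTINCT n"
    )
    return query, {"name": entity_name}
```

A Neo4j driver session would execute this as `session.run(query, parameters)`; the undirected `-[*1..2]-` pattern captures both incoming and outgoing edges around the entity.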
Redis
Dual purpose:
- Job queues: BullMQ uses Redis to manage the background worker pipeline
- Session cache: recently active sessions are cached for fast context assembly
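The session cache behaves like a key-value store with a time-to-live. A toy in-memory stand-in for what Redis provides natively via `EXPIRE`; the 15-minute TTL here is an assumed value:

```python
import time


class SessionCache:
    """In-memory sketch of a TTL cache (Redis handles this with EXPIRE)."""

    def __init__(self, ttl_seconds: float = 900.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, dict]] = {}

    def put(self, session_key: str, context: dict) -> None:
        self._store[session_key] = (time.monotonic(), context)

    def get(self, session_key: str):
        """Return cached context, evicting it lazily if the TTL elapsed."""
        entry = self._store.get(session_key)
        if entry is None:
            return None
        stamp, context = entry
        if time.monotonic() - stamp > self.ttl:
            del self._store[session_key]
            return None
        return context
```

Keying by the stable `session_key` means repeated messages in the same conversation skip most of the context-assembly work.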
Chromium
A headless Chromium instance managed by Browserless. The API connects via CDP (Chrome DevTools Protocol) through Playwright:
- Navigate to URLs, click elements, fill forms
- Take screenshots (full page or element)
- Extract text content or structured data
- All behind SSRF protection: internal/private IPs are blocked
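An SSRF guard of this kind typically resolves the hostname and rejects private, loopback, link-local, and reserved ranges before the browser is allowed to navigate. A hedged sketch; Kraken's actual checks may differ:

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_url(url: str) -> bool:
    """Reject URLs whose host resolves to any non-public address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # unresolvable hosts are treated as unsafe
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

Checking every resolved address (not just the first) matters: a hostname with one public and one private A record would otherwise slip through.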
Sandbox
Docker containers for isolated code execution. When the agent needs to run code:
- A fresh container is spun up from `kraken-sandbox:latest`
- Memory and CPU limits enforced
- Read-only root filesystem
- Workspace files mounted at a known path
- Container destroyed after execution
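Those guarantees map onto standard `docker run` flags. A sketch of the invocation as an argument list; only the image name comes from the docs, while the mount path, resource limits, and network isolation are assumed values:

```python
def sandbox_cmd(workspace: str, script: str) -> list[str]:
    """Assemble a docker run command matching the isolation properties above."""
    return [
        "docker", "run",
        "--rm",                             # container destroyed after execution
        "--memory", "512m", "--cpus", "1",  # resource limits (assumed values)
        "--read-only",                      # read-only root filesystem
        "--network", "none",                # assumption: no network in the sandbox
        "-v", f"{workspace}:/workspace",    # workspace mounted at a known path
        "kraken-sandbox:latest",
        "python", script,
    ]
```

The list form is suitable for `subprocess.run(cmd)` without shell quoting concerns.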
Git & GitHub Integration
The agent can interact with Git repositories and the GitHub API directly:
- Clone any public or private repo (via `KRAKEN_GIT_TOKEN`)
- Analyze code with search, diff, log, and branch inspection
- Modify code, commit changes, and push to feature branches
- Create pull requests via the GitHub REST API with full descriptions
- Read files directly from GitHub repos without cloning (via Contents API)
- All authentication is handled server-side; tokens are never exposed to the LLM
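Reading a file without cloning goes through GitHub's public Contents API, which returns file bodies base64-encoded. A sketch of the server-side pieces; note the token lives only in the request headers, never in LLM context:

```python
import base64
import urllib.request

API = "https://api.github.com/repos/{owner}/{repo}/contents/{path}"


def contents_request(owner: str, repo: str, path: str,
                     token: str) -> urllib.request.Request:
    """Build an authenticated request against GitHub's Contents API."""
    return urllib.request.Request(
        API.format(owner=owner, repo=repo, path=path),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def decode_contents(body: dict) -> str:
    """The Contents API returns the file base64-encoded in 'content'."""
    return base64.b64decode(body["content"]).decode()
```

Sending the request with `urllib.request.urlopen` and passing the parsed JSON body to `decode_contents` yields the file text.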
Data Flow: What Happens When You Send a Message
1. Message arrives via POST /v1/chat
        ↓
2. Resolve or create session (by session_key or session_id)
        ↓
3. Store message in PostgreSQL with vector embedding
        ↓
4. Build system prompt:
   ├── SOUL.md personality
   ├── Current timestamp
   ├── Personality overlay (if active)
   ├── AGENTS.md project context
   ├── User model (preferences, expertise, goals)
   ├── GraphRAG memory retrieval (entities, communities, messages)
   └── Relevant skills (top 3 by embedding similarity)
        ↓
5. Check if context compaction is needed
   └── YES → Pre-flush (persist important facts to graph)
             Summarize old messages
        ↓
6. Call LLM with assembled context + available tools
        ↓
7. Store assistant response with embedding
        ↓
8. Queue background jobs:
   ├── Extract entities & relationships → Neo4j
   ├── Update user model
   ├── Reflect on conversation → create/update skills
   └── Re-cluster communities (if graph changed)
Every conversation strengthens the knowledge graph. The more you use Kraken, the better it understands you.
Context Budget
The system prompt is assembled within a configurable token budget (default: 128,000 tokens):
| Segment | Typical Size | Description |
|---|---|---|
| SOUL.md | ~1,500 tokens | Agent personality (always included) |
| Timestamp | ~50 tokens | Current date/time |
| Personality overlay | ~200 tokens | Session-level behavior mode |
| AGENTS.md | ~500 tokens | Project context |
| User model | ~500 tokens | Preferences, expertise, goals |
| GraphRAG retrieval | ~4,000 tokens | Entities, communities, relevant memories |
| Skills (top 3) | ~1,500 tokens | Matching procedures |
| Conversation history | Remaining budget | Most recent messages |
If the conversation grows too long, the compaction system kicks in, silently persisting important context to the graph before summarizing older messages.
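The arithmetic behind the budget is simple: the fixed segments are reserved first, and conversation history gets whatever remains. Using the typical sizes from the table (the allocator itself is an illustrative sketch):

```python
# Typical segment sizes in tokens, taken from the table above.
SEGMENTS = {
    "soul_md": 1_500,
    "timestamp": 50,
    "personality_overlay": 200,
    "agents_md": 500,
    "user_model": 500,
    "graphrag_retrieval": 4_000,
    "skills_top3": 1_500,
}


def history_budget(total: int = 128_000, segments: dict = SEGMENTS) -> int:
    """Tokens left for conversation history after the fixed segments."""
    return total - sum(segments.values())
```

With the defaults above, roughly 119,750 tokens remain for recent messages; shrinking the total budget squeezes history first, which is exactly when compaction becomes necessary.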
Why This Architecture
Most agent systems are thin wrappers: they stuff your chat history into a context window and call an API. When the window fills up, they truncate. Your context is gone.
Kraken's architecture is designed around a different principle: nothing should be forgotten, but not everything needs to be in context. The knowledge graph holds the complete picture. The context window holds what's relevant right now. Background workers continuously strengthen the graph. The result is an agent that gets smarter over time, not one that resets every session.