Architecture

Kraken is a self-contained stack that runs via Docker Compose. You deploy it once on your own hardware and it becomes your personal AI assistant for everything β€” coding, research, scheduling, browsing, automation. No external services, no third-party memory providers, no vendor lock-in. Every conversation from every connected platform feeds the same brain.

System Overview

Your App (Python SDK / any HTTP client)
               β”‚
               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         Kraken API  (Hono)          β”‚
β”‚     REST + WebSocket + streaming    β”‚
β”‚     OpenAI-compatible /v1/chat/*    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚          β”‚                               β”‚
    β–Ό          β–Ό                               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Worker β”‚ β”‚   PostgreSQL 17     β”‚  β”‚    Neo4j 5       β”‚
β”‚(BullMQ)β”‚ β”‚   + pgvector        β”‚  β”‚ (knowledge graph)β”‚
β”‚        β”‚ β”‚                     β”‚  β”‚                  β”‚
β”‚ Jobs:  β”‚ β”‚ β€’ Sessions          β”‚  β”‚ β€’ Entities       β”‚
β”‚ β€’ Extract entities             β”‚  β”‚ β€’ Relationships  β”‚
β”‚ β€’ Detect communities           β”‚  β”‚ β€’ Communities    β”‚
β”‚ β€’ Update user model            β”‚  β”‚                  β”‚
β”‚ β€’ Reflect on skills            β”‚  β”‚                  β”‚
β”‚ β€’ Dream cycle                  β”‚  β”‚                  β”‚
β”‚ β€’ Execute schedules            β”‚  β”‚                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β–Ό          β–Ό          β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Redis 7β”‚ β”‚Chromiumβ”‚ β”‚ Sandbox  β”‚
β”‚(queues β”‚ β”‚(browserβ”‚ β”‚ (Docker  β”‚
β”‚ cache) β”‚ β”‚ CDP)   β”‚ β”‚ isolate) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Components

Kraken API

The core HTTP server built on Hono. Handles all client-facing requests:

  • Chat β€” /v1/chat (native) and /v1/chat/completions (OpenAI-compatible). Assembles context from SOUL.md, user model, memory retrieval, and relevant skills before calling the LLM.
  • Memory β€” Query the knowledge graph, manage entities and relationships, explore communities.
  • Identity β€” Read/write SOUL.md personality, view the auto-maintained user model, manage cross-platform identity links.
  • Sessions β€” CRUD for conversation sessions with stable routing keys.
  • Skills β€” Manage reusable procedure documents that the agent loads by relevance.
  • Tools β€” Registry of available tools with JSON Schema definitions.
  • Schedules β€” Create and manage cron-based automated tasks.

The API calls the LLM via the Vercel AI SDK, which abstracts over OpenAI, Anthropic, and other providers.

Worker

A separate Node.js process that consumes jobs from Redis via BullMQ. Runs asynchronously so the API stays fast:

Job Trigger What it does
memory-extraction After every chat Extracts entities and relationships from the conversation
memory-communities After entities extracted Re-clusters the knowledge graph using community detection, updates hierarchical summaries
memory-user-model After every chat Compresses new signals into the persistent user model
skill-reflection After chat (conditional) Evaluates if the conversation produced a reusable workflow, creates or updates a skill
memory-dream Every 15 min (configurable) Offline consolidation β€” reviews recent conversations, suggests skills/tools, strengthens graph connections
schedule-execution Every minute (cron tick) Executes due scheduled tasks as new sessions

PostgreSQL + pgvector

The primary relational store. Holds:

  • Sessions β€” with stable session_key for routing, personality overlays, metadata
  • Messages β€” with pgvector embeddings for semantic search, full-text search index on content
  • Skills β€” versioned procedure documents with embeddings for relevance matching
  • Tools β€” registered tool schemas with embeddings
  • Identity β€” SOUL.md content, user model, AGENTS.md
  • Schedules β€” cron expressions, task prompts, run tracking
  • Identity Links β€” cross-platform user mappings

Neo4j

The knowledge graph. Stores structured understanding of the world:

  • Entities β€” people, projects, tools, concepts, goals, preferences β€” with typed properties
  • Relationships β€” directed edges like works_on, uses, prefers, relates_to, has_goal, knows_about, depends_on
  • Communities β€” hierarchical clusters detected via the Leiden algorithm, each with an LLM-generated summary

This is what makes Kraken's memory fundamentally different from flat context stuffing. The graph enables queries like "show me everything related to Project X" or "what are the common themes across all my goals" β€” queries that flat memory systems can't answer.

Redis

Dual purpose:

  1. Job queues β€” BullMQ uses Redis to manage the background worker pipeline
  2. Session cache β€” Recently active sessions are cached for fast context assembly

Chromium

A headless Chromium instance managed by Browserless. The API connects via CDP (Chrome DevTools Protocol) through Playwright:

  • Navigate to URLs, click elements, fill forms
  • Take screenshots (full page or element)
  • Extract text content or structured data
  • All behind SSRF protection β€” internal/private IPs are blocked

Sandbox

Docker containers for isolated code execution. When the agent needs to run code:

  • A fresh container is spun up from kraken-sandbox:latest
  • Memory and CPU limits enforced
  • Read-only root filesystem
  • Workspace files mounted at a known path
  • Container destroyed after execution

Git & GitHub Integration

The agent can interact with Git repositories and the GitHub API directly:

  • Clone any public or private repo (via KRAKEN_GIT_TOKEN)
  • Analyze code with search, diff, log, and branch inspection
  • Modify code, commit changes, and push to feature branches
  • Create pull requests via the GitHub REST API with full descriptions
  • Read files directly from GitHub repos without cloning (via Contents API)
  • All authentication is handled server-side β€” tokens are never exposed to the LLM

Data Flow: What Happens When You Send a Message

1. Message arrives via POST /v1/chat
   ↓
2. Resolve or create session (by session_key or session_id)
   ↓
3. Store message in PostgreSQL with vector embedding
   ↓
4. Build system prompt:
   β”œβ”€β”€ SOUL.md personality
   β”œβ”€β”€ Current timestamp
   β”œβ”€β”€ Personality overlay (if active)
   β”œβ”€β”€ AGENTS.md project context
   β”œβ”€β”€ User model (preferences, expertise, goals)
   β”œβ”€β”€ GraphRAG memory retrieval (entities, communities, messages)
   └── Relevant skills (top 3 by embedding similarity)
   ↓
5. Check if context compaction is needed
   └── YES β†’ Pre-flush (persist important facts to graph)
            β†’ Summarize old messages
   ↓
6. Call LLM with assembled context + available tools
   ↓
7. Store assistant response with embedding
   ↓
8. Queue background jobs:
   β”œβ”€β”€ Extract entities & relationships β†’ Neo4j
   β”œβ”€β”€ Update user model
   β”œβ”€β”€ Reflect on conversation β†’ create/update skills
   └── Re-cluster communities (if graph changed)

Every conversation strengthens the knowledge graph. The more you use Kraken, the better it understands you.


Context Budget

The system prompt is assembled within a configurable token budget (default: 128,000 tokens):

Segment Typical Size Description
SOUL.md ~1,500 tokens Agent personality (always included)
Timestamp ~50 tokens Current date/time
Personality overlay ~200 tokens Session-level behavior mode
AGENTS.md ~500 tokens Project context
User model ~500 tokens Preferences, expertise, goals
GraphRAG retrieval ~4,000 tokens Entities, communities, relevant memories
Skills (top 3) ~1,500 tokens Matching procedures
Conversation history Remaining budget Most recent messages

If the conversation grows too long, the compaction system kicks in β€” silently persisting important context to the graph before summarizing older messages.


Why This Architecture

Most agent systems are thin wrappers: they stuff your chat history into a context window and call an API. When the window fills up, they truncate. Your context is gone.

Kraken's architecture is designed around a different principle: nothing should be forgotten, but not everything needs to be in context. The knowledge graph holds the complete picture. The context window holds what's relevant right now. Background workers continuously strengthen the graph. The result is an agent that gets smarter over time β€” not one that resets every session.