# Memory System
Kraken's memory is what makes it a real personal assistant instead of a disposable chatbot. Everything you tell it — across Discord, Telegram, CLI, scheduled tasks, and direct API calls — feeds the same knowledge graph. While most systems treat memory as a flat list of facts crammed into the context window, Kraken builds a structured knowledge graph that it queries intelligently based on what you're actually asking.
## Memory Tiers
Kraken organizes memory into six tiers, inspired by how human cognition works:
| Tier | Storage | Purpose |
|---|---|---|
| Working Memory | In-context | Current conversation + retrieved context. Limited by the model's context window (max 80% of budget). |
| Entity Memory | Neo4j | Structured knowledge — people, projects, tools, concepts, goals — with typed relationships. |
| Community Memory | Neo4j | Hierarchical clusters of related entities, each with an LLM-generated summary for holistic reasoning. |
| Episodic Memory | PostgreSQL | Every message ever exchanged. Searchable via full-text search and vector similarity. |
| User Model | PostgreSQL | A structured, auto-maintained document capturing your preferences, expertise, communication style, and goals. |
| Skill Memory | PostgreSQL | Learned procedures. The top 3 matching skills are loaded per query via embedding similarity. |
### Why tiers matter
Flat memory systems have one strategy: dump everything into context. This fails in two ways:
- Context overflow — you hit the token limit and start losing information
- Noise — irrelevant memories dilute the signal
Kraken's tiered approach means:
- Entity Memory answers "what do you know about X?"
- Community Memory answers "what are the big themes?"
- Episodic Memory answers "what did we discuss last Tuesday?"
- User Model answers "how does this person prefer to work?"
- Skill Memory answers "have I done something similar before?"
Each tier is queried differently, and only relevant information enters the context window.
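The per-tier routing above can be sketched as a tiny dispatcher. This is an illustrative stand-in, not Kraken's implementation: the `route_tier` name and the keyword heuristics are invented here; the real agent routes by LLM intent analysis.

```python
# Hypothetical sketch of tier routing. Each tier answers a different
# kind of question, so a router picks a tier before any retrieval.
# The real agent uses LLM intent analysis, not keyword matching.

def route_tier(query: str) -> str:
    """Naive keyword router over the six memory tiers."""
    q = query.lower()
    if "prefer" in q or "style" in q:
        return "user_model"      # how does this person like to work?
    if "last" in q or "yesterday" in q or "discussed" in q:
        return "episodic"        # what did we talk about, and when?
    if "theme" in q or "overview" in q or "pattern" in q:
        return "community"       # what are the big themes?
    if "done" in q and "before" in q:
        return "skill"           # have I done something similar?
    return "entity"              # default: specific-entity lookup

print(route_tier("What patterns do you see in my work?"))  # community
```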
## The Knowledge Graph
Every conversation is automatically mined for structured information by background workers.
### Entities
Typed nodes representing things the agent has learned about:

| Type | Examples |
|---|---|
| `person` | "Alice", "Bob from the design team" |
| `project` | "Kraken Agent", "Q4 migration" |
| `tool` | "PostgreSQL", "Figma", "Docker" |
| `concept` | "GraphRAG", "event sourcing", "CQRS" |
| `goal` | "Ship v2 by March", "Learn Rust" |
| `preference` | "Prefers concise answers", "Uses dark mode" |
Each entity has a name, type, optional properties (JSON), and timestamps.
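A minimal sketch of that entity shape, assuming illustrative field names (the actual Neo4j schema may differ):

```python
# Illustrative model of an entity node: name, type, optional JSON
# properties, and timestamps. Field names are assumptions based on
# this page, not Kraken's actual schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Entity:
    name: str          # e.g. "Alice"
    type: str          # person | project | tool | concept | goal | preference
    properties: dict = field(default_factory=dict)  # optional JSON properties
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    updated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


alice = Entity(name="Alice", type="person", properties={"team": "design"})
```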
### Relationships
Directed edges connecting entities:

| Type | Example |
|---|---|
| `works_on` | Alice → Kraken Agent |
| `uses` | Kraken Agent → PostgreSQL |
| `prefers` | Alice → concise answers |
| `relates_to` | GraphRAG → knowledge graphs |
| `has_goal` | Alice → Ship v2 by March |
| `knows_about` | Alice → event sourcing |
| `depends_on` | Q4 migration → PostgreSQL |
### Communities
Groups of related entities detected by the Leiden clustering algorithm. Each community gets an LLM-generated summary that captures the theme of the cluster.
Communities are hierarchical — a top-level community might be "Alice's work," containing sub-communities for each project, each with their own entity clusters.
This enables queries like "give me a high-level overview of everything you know" — the agent reads community summaries instead of scanning thousands of individual entities.
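A toy sketch of how a hierarchical overview could read summaries top-down; the community names, the `overview` helper, and the nesting are all invented for illustration:

```python
# Hierarchical communities, illustrated: a high-level question reads
# a handful of summaries top-down instead of scanning every entity.
communities = {
    "alice-work": {
        "summary": "Alice's work across projects",
        "children": ["kraken-agent", "q4-migration"],
    },
    "kraken-agent": {"summary": "Kraken Agent development", "children": []},
    "q4-migration": {"summary": "Q4 database migration", "children": []},
}


def overview(root: str, depth: int = 0) -> list[str]:
    """Collect summaries down the hierarchy, children after parents."""
    node = communities[root]
    lines = ["  " * depth + node["summary"]]
    for child in node["children"]:
        lines.extend(overview(child, depth + 1))
    return lines


print("\n".join(overview("alice-work")))
```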
## Query Modes
When Kraken retrieves memory for a conversation, it uses one of five query modes:
### auto (default)
The agent analyzes query intent and routes to the best strategy. This is what you should use unless you have a specific reason to choose a mode.
### local
Best for: specific entity questions ("What do you know about Project X?")
- Find the entity matching the query
- Fan out to direct neighbors (1-2 hops)
- Gather properties and relationship context
- Return a focused, entity-centric result
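The fan-out steps above can be sketched against a plain in-memory adjacency list standing in for the Neo4j graph; the entity names and the `local_query` helper are illustrative:

```python
# Local-mode sketch: find the entity, fan out 1-2 hops, and collect
# relationship context. The dict below stands in for the Neo4j graph.
graph = {
    "Project X": [("uses", "PostgreSQL"), ("depends_on", "Q4 migration")],
    "PostgreSQL": [("used_by", "Kraken Agent")],
    "Q4 migration": [],
    "Kraken Agent": [],
}


def local_query(entity: str, hops: int = 2) -> list[tuple[str, str, str]]:
    """Breadth-first fan-out from one entity, up to `hops` hops."""
    frontier, seen, edges = [entity], {entity}, []
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for rel, neighbor in graph.get(node, []):
                edges.append((node, rel, neighbor))
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return edges
```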
### global
Best for: holistic / overview questions ("What patterns do you see in my work?")
- Map the query over all community summaries
- Score each community's relevance
- Reduce the top results into a synthesized answer
- Return broad, thematic insights
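The map/score/reduce pattern can be sketched with naive word overlap in place of Kraken's LLM-based relevance scoring; the community names and summaries here are invented:

```python
# Global-mode sketch: map the query over community summaries, score
# each one, and keep the top results for synthesis. Word overlap is a
# stand-in for the LLM scoring the real agent uses.
summaries = {
    "databases": "PostgreSQL migration work and schema design",
    "writing": "Blog posts and documentation drafts",
    "agents": "Kraken Agent memory and retrieval work",
}


def score(query: str, summary: str) -> int:
    """Count shared words between query and summary (toy relevance)."""
    return len(set(query.lower().split()) & set(summary.lower().split()))


def global_query(query: str, top_k: int = 2) -> list[str]:
    ranked = sorted(summaries, key=lambda c: score(query, summaries[c]), reverse=True)
    return ranked[:top_k]  # top communities are then synthesized into one answer
```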
### drift
Best for: entity + broader context ("Tell me about Project X and how it fits into the bigger picture")
Combines local entity search with community-level context enrichment. Starts at a specific entity but drifts outward to capture the surrounding theme.
### basic
Best for: simple factual recall ("What's Alice's email?")
Pure vector similarity search over stored messages. Fast, no graph traversal. Good for exact-match recall.
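A toy version of that lookup, with tiny hand-made vectors standing in for real embeddings (the stored messages and the `basic_query` helper are invented for illustration):

```python
# Basic-mode sketch: pure cosine similarity over stored message
# embeddings, no graph traversal. Real embeddings come from a model;
# these 3-dimensional vectors are hand-made toys.
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


messages = {
    "Alice's email is alice@example.com": [0.9, 0.1, 0.0],
    "We chose PostgreSQL for storage": [0.1, 0.9, 0.2],
}


def basic_query(query_vec: list[float], top_k: int = 1) -> list[str]:
    ranked = sorted(messages, key=lambda m: cosine(query_vec, messages[m]), reverse=True)
    return ranked[:top_k]
```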
## How Memory Extraction Works
After every conversation, a background worker analyzes the messages:
```
Conversation messages
        ↓
LLM extraction prompt
  "Extract entities and relationships from this conversation"
        ↓
Structured output: entities[] + relationships[]
        ↓
Merge into Neo4j (upsert — update existing, create new)
        ↓
If graph changed → trigger community re-clustering
        ↓
Updated community summaries
```
This runs asynchronously — the user sees no delay. The knowledge graph grows silently in the background.
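As a concrete (hypothetical) example, the extractor's structured output for a short conversation might look like this; the field names are inferred from this page, not Kraken's exact schema:

```python
# Illustrative structured output from the extraction prompt:
# a list of entities plus a list of typed, directed relationships.
extraction = {
    "entities": [
        {"name": "Alice", "type": "person", "properties": {"team": "design"}},
        {"name": "Q4 migration", "type": "project", "properties": {}},
    ],
    "relationships": [
        {"source": "Alice", "type": "works_on", "target": "Q4 migration"},
    ],
}
```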
### Entity merging
When the extractor finds an entity that already exists (by name + type), it merges rather than duplicating. Properties are updated, timestamps refreshed, and new relationships are added alongside existing ones.
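The merge rule can be sketched against an in-memory store keyed by `(name, type)`; in Kraken this is an upsert into Neo4j, so everything below is illustrative:

```python
# Upsert-style merge sketch: same (name, type) means update in place,
# never duplicate. The dict stands in for the Neo4j entity store.
import time

store: dict[tuple[str, str], dict] = {}


def merge_entity(name: str, type_: str, properties: dict) -> dict:
    key = (name, type_)
    entity = store.setdefault(key, {"properties": {}, "created_at": time.time()})
    entity["properties"].update(properties)  # update existing, add new
    entity["updated_at"] = time.time()       # refresh the timestamp
    return entity


merge_entity("Alice", "person", {"team": "design"})
merge_entity("Alice", "person", {"email": "known"})  # merges, no duplicate
```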
### Community detection
After entities are added or relationships change, the Leiden algorithm re-clusters the graph. Communities that changed get fresh LLM-generated summaries. This keeps the holistic view current without reprocessing the entire graph.
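The incremental part, refreshing only communities whose membership changed, can be sketched by diffing old and new entity-to-community assignments (the Leiden step itself is omitted, and all names are illustrative):

```python
# After re-clustering, only communities whose membership changed need
# a fresh LLM-generated summary. Diffing the assignments finds them.
def changed_communities(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Entity -> community maps in; communities needing new summaries out."""
    changed = set()
    for entity in old.keys() | new.keys():
        if old.get(entity) != new.get(entity):
            changed.update(filter(None, {old.get(entity), new.get(entity)}))
    return changed


old = {"Alice": "work", "PostgreSQL": "infra"}
new = {"Alice": "work", "PostgreSQL": "infra", "Figma": "design"}
print(changed_communities(old, new))  # {'design'}
```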
## Context Compaction
When a conversation approaches the token limit (`KRAKEN_COMPACTION_THRESHOLD_TOKENS`, default 80,000), Kraken performs context compaction:
### Step 1: Pre-flush
Before summarizing anything, Kraken runs a silent "pre-flush" pass:
- Analyzes the conversation for important facts, decisions, and context
- Persists them to the knowledge graph as entities and relationships
- This ensures nothing is lost when older messages are summarized
### Step 2: Summarize
Older messages (everything except the most recent `KRAKEN_COMPACTION_KEEP_RECENT` messages, default 10) are summarized into a compact digest by the LLM. The summary replaces the old messages in context.
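The keep-recent-and-summarize step can be sketched as follows, with `summarize()` standing in for the LLM call; the default of 10 comes from this page:

```python
# Compaction sketch: keep the most recent N messages intact and
# replace everything older with a single summary message.
KEEP_RECENT = 10  # KRAKEN_COMPACTION_KEEP_RECENT default


def summarize(messages: list[str]) -> str:
    # An LLM call in practice; a placeholder digest here.
    return f"[summary of {len(messages)} earlier messages]"


def compact(messages: list[str], keep_recent: int = KEEP_RECENT) -> list[str]:
    if len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent


history = [f"msg {i}" for i in range(25)]
print(len(compact(history)))  # 11: one summary + 10 recent messages
```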
### The result
The conversation continues seamlessly. The user notices nothing. But behind the scenes, Kraken has:
- Persisted key facts to the knowledge graph (permanent)
- Compressed the conversation context (for this session)
- Kept recent messages intact (for continuity)
Few agent systems do this; most simply truncate old messages and hope for the best.
## Dream Cycle
Every 15 minutes (configurable via `KRAKEN_DREAM_CRON`), Kraken runs an offline consolidation cycle:
- Reviews recent conversations that haven't been fully processed
- Strengthens connections between entities
- Identifies gaps or inconsistencies in the knowledge graph
- Suggests new skills or tools based on observed patterns
Think of it as the agent "sleeping" — consolidating short-term experiences into long-term knowledge.
## Querying Memory via the API
### Python SDK
```python
# Auto mode — let Kraken choose the best strategy
results = client.memory.query("What do you know about my projects?")

# Specific mode
results = client.memory.query(
    "What are the common themes in my work?",
    mode="global",
)

# With filters
results = client.memory.query(
    "What tools does Alice use?",
    entity_filter=["Alice"],
    limit=10,
)
```
### REST API
```bash
curl -X POST http://localhost:8080/v1/memory/query \
  -H "Authorization: Bearer $KRAKEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What do you know about my projects?",
    "mode": "auto",
    "limit": 20
  }'
```
See the API Reference for complete endpoint documentation.