Memory System

Kraken's memory is what makes it a real personal assistant instead of a disposable chatbot. Everything you tell it — across Discord, Telegram, CLI, scheduled tasks, and direct API calls — feeds the same knowledge graph. While most systems treat memory as a flat list of facts crammed into the context window, Kraken builds a structured knowledge graph that it queries intelligently based on what you're actually asking.

Memory Tiers

Kraken organizes memory into six tiers, inspired by how human cognition works:

Tier	Storage	Purpose
Working Memory	In-context	Current conversation + retrieved context. Limited by the model's context window (max 80% of budget).
Entity Memory	Neo4j	Structured knowledge — people, projects, tools, concepts, goals — with typed relationships.
Community Memory	Neo4j	Hierarchical clusters of related entities, each with an LLM-generated summary for holistic reasoning.
Episodic Memory	PostgreSQL	Every message ever exchanged. Searchable via full-text search and vector similarity.
User Model	PostgreSQL	A structured, auto-maintained document capturing your preferences, expertise, communication style, and goals.
Skill Memory	PostgreSQL	Learned procedures. The top 3 matching skills are loaded per query via embedding similarity.
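Choosing the top 3 skills is a nearest-neighbour lookup over skill embeddings. Here is a minimal sketch. It assumes each skill already carries an embedding vector and that similarity is measured with cosine similarity; this section does not state the exact metric Kraken uses. The `Skill` class, the `top_skills` function, and the tiny hand-written vectors are hypothetical. Real embeddings would come from an embedding model.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class Skill:
    # Hypothetical record: a learned procedure plus its precomputed embedding.
    name: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_skills(query_embedding: list[float], skills: list[Skill], k: int = 3) -> list[Skill]:
    """Return the k skills most similar to the query, best match first."""
    return heapq.nlargest(k, skills, key=lambda s: cosine(query_embedding, s.embedding))


skills = [
    Skill("deploy-docker-app", [0.9, 0.1, 0.0]),
    Skill("write-sql-migration", [0.1, 0.9, 0.1]),
    Skill("summarize-meeting", [0.0, 0.2, 0.9]),
    Skill("debug-docker-network", [0.8, 0.3, 0.1]),
]
selected = top_skills([1.0, 0.2, 0.0], skills)
print([s.name for s in selected])
```

Only the selected skills would be placed in working memory. The rest stay in storage, which is the point of the tiered design.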

Why tiers matter

Flat memory systems have one strategy: dump everything into context. This fails in two ways:

Context overflow — you hit the token limit and start losing information
Noise — irrelevant memories dilute the signal

Kraken's tiered approach means:

Entity Memory answers "what do you know about X?"
Community Memory answers "what are the big themes?"
Episodic Memory answers "what did we discuss last Tuesday?"
User Model answers "how does this person prefer to work?"
Skill Memory answers "have I done something similar before?"

Each tier is queried differently, and only relevant information enters the context window.

The Knowledge Graph

Every conversation is automatically mined for structured information by background workers.

Entities

Typed nodes representing things the agent has learned about:

Type	Examples
`person`	"Alice", "Bob from the design team"
`project`	"Kraken Agent", "Q4 migration"
`tool`	"PostgreSQL", "Figma", "Docker"
`concept`	"GraphRAG", "event sourcing", "CQRS"
`goal`	"Ship v2 by March", "Learn Rust"
`preference`	"Prefers concise answers", "Uses dark mode"

Each entity has a name, type, optional properties (JSON), and timestamps.
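One way to picture an entity record is as a small typed structure. The sketch below is hypothetical: `Entity`, `EntityType`, and the field names mirror the description above rather than Kraken's actual Neo4j schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any


class EntityType(str, Enum):
    # The six entity types from the table above.
    PERSON = "person"
    PROJECT = "project"
    TOOL = "tool"
    CONCEPT = "concept"
    GOAL = "goal"
    PREFERENCE = "preference"


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Entity:
    name: str
    type: EntityType
    properties: dict[str, Any] = field(default_factory=dict)  # optional JSON-style properties
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)

    @property
    def key(self) -> tuple[str, str]:
        """Identity used for deduplication: name + type."""
        return (self.name, self.type.value)


alice = Entity("Alice", EntityType.PERSON, {"team": "design"})
print(alice.key)
```

The `key` property reflects the name + type identity that entity merging relies on.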

Relationships

Directed edges connecting entities:

Type	Example
`works_on`	Alice → Kraken Agent
`uses`	Kraken Agent → PostgreSQL
`prefers`	Alice → concise answers
`relates_to`	GraphRAG → knowledge graphs
`has_goal`	Alice → Ship v2 by March
`knows_about`	Alice → event sourcing
`depends_on`	Q4 migration → PostgreSQL
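Because relationships are directed and typed, a minimal in-memory model is a set of (source, type, target) edges indexed by source. This is an illustrative stand-in for what Kraken stores in Neo4j. `Relationship`, `EdgeIndex`, and `RELATION_TYPES` are hypothetical names, and the allowed types are simply the ones in the table.

```python
from collections import defaultdict
from dataclasses import dataclass

# Relationship types from the table above.
RELATION_TYPES = {
    "works_on", "uses", "prefers", "relates_to",
    "has_goal", "knows_about", "depends_on",
}


@dataclass(frozen=True)
class Relationship:
    source: str  # name of the entity the edge starts from
    type: str
    target: str  # name of the entity the edge points to

    def __post_init__(self) -> None:
        # Reject edge types that are not in the known vocabulary.
        if self.type not in RELATION_TYPES:
            raise ValueError(f"unknown relationship type: {self.type}")


class EdgeIndex:
    """Directed adjacency index keyed by source entity (hypothetical stand-in for Neo4j)."""

    def __init__(self) -> None:
        self._out: dict[str, set[Relationship]] = defaultdict(set)

    def add(self, rel: Relationship) -> None:
        self._out[rel.source].add(rel)

    def outgoing(self, source: str) -> list[Relationship]:
        # Sorted for stable output; .get avoids creating empty entries.
        return sorted(self._out.get(source, ()), key=lambda r: (r.type, r.target))


edges = EdgeIndex()
edges.add(Relationship("Alice", "works_on", "Kraken Agent"))
edges.add(Relationship("Kraken Agent", "uses", "PostgreSQL"))
edges.add(Relationship("Alice", "has_goal", "Ship v2 by March"))
print([(r.type, r.target) for r in edges.outgoing("Alice")])
```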

Communities

Groups of related entities detected by the Leiden clustering algorithm. Each community gets an LLM-generated summary that captures the theme of the cluster.

Communities are hierarchical — a top-level community might be "Alice's work," containing sub-communities for each project, each with their own entity clusters.

This enables queries like "give me a high-level overview of everything you know" — the agent reads community summaries instead of scanning thousands of individual entities.
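To see why community summaries make overviews cheap, consider a hypothetical tree of communities in which each node carries its own summary. An overview walks only the top levels and never reads individual entities. `Community` and `overview` are illustrative names. The summaries are hand-written in place of LLM output, and the Leiden clustering itself is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Community:
    # Hypothetical node in the community hierarchy.
    title: str
    summary: str  # LLM-generated in Kraken; hand-written here
    members: list[str] = field(default_factory=list)  # entity names
    children: list["Community"] = field(default_factory=list)


def overview(root: Community, max_depth: int = 1) -> list[str]:
    """Collect summaries down to max_depth without touching individual entities."""
    lines: list[str] = []

    def walk(node: Community, depth: int) -> None:
        lines.append("  " * depth + f"{node.title}: {node.summary}")
        if depth < max_depth:
            for child in node.children:
                walk(child, depth + 1)

    walk(root, 0)
    return lines


root = Community(
    "Alice's work",
    "Alice's engineering projects.",
    children=[
        Community("Kraken Agent", "The agent project and the tools it uses.",
                  members=["Kraken Agent", "PostgreSQL", "Neo4j"]),
        Community("Q4 migration", "A database migration planned for Q4.",
                  members=["Q4 migration", "PostgreSQL"]),
    ],
)
print("\n".join(overview(root)))
```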

Query Modes

When Kraken retrieves memory for a conversation, it uses one of five query modes:

`auto` (default)

The agent analyzes query intent and routes to the best strategy. This is what you should use unless you have a specific reason to choose a mode.
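This section does not spell out how `auto` mode detects intent, and Kraken may well use the model itself to classify queries. The keyword heuristic below is a hypothetical stand-in that only illustrates the decision: map a query to `local`, `global`, `drift`, or `basic`. The cue lists and the `route` function are assumptions.

```python
import re

# Hypothetical cue lists; a real router would more likely classify intent with a model.
GLOBAL_CUES = ("patterns", "themes", "overview", "everything you know", "big picture", "bigger picture")
BASIC_CUES = ("email", "phone", "address", "when did", "what's")


def route(query: str, known_entities: set[str]) -> str:
    """Pick local, global, drift, or basic for a query (illustrative heuristic only)."""
    q = query.lower()
    mentions_entity = any(re.search(rf"\b{re.escape(e.lower())}\b", q) for e in known_entities)
    wants_overview = any(cue in q for cue in GLOBAL_CUES)

    if mentions_entity and wants_overview:
        return "drift"   # specific entity plus surrounding context
    if wants_overview:
        return "global"  # holistic question answered from community summaries
    if mentions_entity and not any(cue in q for cue in BASIC_CUES):
        return "local"   # entity-centric question
    return "basic"       # simple factual recall via vector search


entities = {"Project X", "Alice"}
print(route("Tell me about Project X and how it fits into the bigger picture", entities))
```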

`local`

Best for: specific entity questions ("What do you know about Project X?")

Find the entity matching the query
Fan out to neighboring entities (1-2 hops)
Gather properties and relationship context
Return a focused, entity-centric result
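The fan-out step is essentially a bounded breadth-first search. This sketch assumes the matching entity has already been found. It represents the graph as a hypothetical adjacency dict of `(relation, neighbor)` pairs and follows outgoing edges only. In Neo4j the same traversal could be written as a variable-length Cypher pattern.

```python
from collections import deque


def local_context(
    graph: dict[str, list[tuple[str, str]]],
    start: str,
    max_hops: int = 2,
) -> list[tuple[str, str, str]]:
    """Breadth-first fan-out from `start`, returning (source, relation, target) triples within max_hops."""
    seen = {start}
    triples: list[tuple[str, str, str]] = []
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for relation, neighbor in graph.get(node, []):
            triples.append((node, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return triples


graph = {
    "Alice": [("works_on", "Kraken Agent"), ("has_goal", "Ship v2 by March")],
    "Kraken Agent": [("uses", "PostgreSQL")],
    "PostgreSQL": [("relates_to", "databases")],
}
for triple in local_context(graph, "Alice"):
    print(triple)
```

With `max_hops=2`, the `PostgreSQL → databases` edge is three hops from Alice and is left out. That is the cap that keeps the result focused.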

`global`

Best for: holistic / overview questions ("What patterns do you see in my work?")

Map the query over all community summaries
Score each community's relevance
Reduce the top results into a synthesized answer
Return broad, thematic insights
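The map-score-reduce flow can be sketched without an LLM by letting word overlap stand in for the relevance judgment. This is a hypothetical illustration. `global_answer`, the scoring rule, and the `top_n` cutoff are assumptions. The reduce step just concatenates summaries, where Kraken would have the model synthesize them.

```python
import re


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def global_answer(query: str, summaries: dict[str, str], top_n: int = 2) -> str:
    """Map: score every community summary. Reduce: join the best ones into one digest."""
    query_words = _words(query)
    # Map step: one relevance score per community (word overlap stands in for an LLM judgment).
    scored = [(len(query_words & _words(summary)), name, summary) for name, summary in summaries.items()]
    # Keep only communities with some relevance, highest score first, ties broken by name.
    relevant = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))[:top_n]
    # Reduce step: Kraken would synthesize these with an LLM; here they are concatenated.
    return "\n".join(f"[{name}] {summary}" for _, name, summary in relevant)


summaries = {
    "Kraken Agent": "Alice's work on the agent project and its PostgreSQL backend.",
    "Q4 migration": "Migration work planned for the fourth quarter.",
    "Hobbies": "Learning Rust on weekends.",
}
print(global_answer("What patterns do you see in my work?", summaries))
```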

`drift`

Best for: entity + broader context ("Tell me about Project X and how it fits into the bigger picture")

Combines local entity search with community-level context enrichment. Starts at a specific entity but drifts outward to capture the surrounding theme.
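`drift` can be pictured as a local lookup followed by a widening step. The widening step pulls in the summaries of communities that touch the entity or its neighbors. Here is a minimal sketch, assuming one hop of local context and communities given as hypothetical `(members, summary)` pairs. `drift_context` is an illustrative name.

```python
def drift_context(
    entity: str,
    edges: dict[str, list[tuple[str, str]]],
    communities: dict[str, tuple[set[str], str]],
) -> dict[str, list[str]]:
    """Start at one entity, then widen to summaries of communities it or its neighbors belong to."""
    # Local part: direct relationships of the entity.
    local = [f"{entity} {rel} {target}" for rel, target in edges.get(entity, [])]
    touched = {entity} | {target for _, target in edges.get(entity, [])}
    # Drift part: community summaries covering any touched entity.
    themes = [
        summary
        for _, (members, summary) in sorted(communities.items())
        if members & touched
    ]
    return {"local": local, "themes": themes}


edges = {"Project X": [("uses", "PostgreSQL"), ("depends_on", "Q4 migration")]}
communities = {
    "Infrastructure": ({"PostgreSQL", "Docker"}, "Shared database and container tooling."),
    "Hobbies": ({"Learn Rust"}, "Personal learning goals."),
}
print(drift_context("Project X", edges, communities))
```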

`basic`

Best for: simple factual recall ("What's Alice's email?")

Pure vector similarity search over stored messages. Fast, no graph traversal. Good for recalling facts that were stated directly in past conversations.

How Memory Extraction Works

After every conversation, a background worker analyzes the messages:

Conversation messages
        ↓
   LLM extraction prompt
   "Extract entities and relationships from this conversation"
        ↓
   Structured output: entities[] + relationships[]
        ↓
   Merge into Neo4j (upsert — update existing, create new)
        ↓
   If graph changed → trigger community re-clustering
        ↓
   Updated community summaries

This runs asynchronously — the user sees no delay. The knowledge graph grows silently in the background.
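This section does not document the extraction step's output format. The sketch below assumes the LLM returns JSON with `entities` and `relationships` arrays. `parse_extraction` is a hypothetical helper that keeps only entities and edges whose types appear in the tables above. As a simplifying choice, it accepts only edges whose endpoints were extracted in the same batch.

```python
import json

# Types taken from the entity and relationship tables above.
ENTITY_TYPES = {"person", "project", "tool", "concept", "goal", "preference"}
RELATION_TYPES = {"works_on", "uses", "prefers", "relates_to", "has_goal", "knows_about", "depends_on"}


def parse_extraction(raw: str) -> tuple[list[dict], list[dict]]:
    """Parse assumed LLM JSON output, keeping only well-typed entities and relationships."""
    data = json.loads(raw)
    entities = [
        e for e in data.get("entities", [])
        if e.get("name") and e.get("type") in ENTITY_TYPES
    ]
    known = {e["name"] for e in entities}
    relationships = [
        r for r in data.get("relationships", [])
        if r.get("type") in RELATION_TYPES
        and r.get("source") in known
        and r.get("target") in known
    ]
    return entities, relationships


raw = json.dumps({
    "entities": [
        {"name": "Alice", "type": "person"},
        {"name": "Figma", "type": "tool"},
        {"name": "Mars", "type": "planet"},  # unknown type: dropped
    ],
    "relationships": [
        {"source": "Alice", "type": "uses", "target": "Figma"},
        {"source": "Alice", "type": "visits", "target": "Mars"},  # unknown type: dropped
    ],
})
entities, relationships = parse_extraction(raw)
print(entities, relationships)
```

A production pipeline would probably also resolve edge endpoints against entities already in the graph instead of discarding them.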

Entity merging

When the extractor finds an entity that already exists (by name + type), it merges rather than duplicating. Properties are updated, timestamps refreshed, and new relationships are added alongside existing ones.
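Merging by name + type is an upsert keyed on that pair. In Neo4j this would likely be done with Cypher's `MERGE` clause; the plain-Python sketch below only illustrates the semantics. `upsert_entity` and `add_relationship` are hypothetical helpers. Their boolean return values model the "graph changed" signal that can trigger re-clustering.

```python
from datetime import datetime, timezone


def upsert_entity(store: dict[tuple[str, str], dict], name: str, type_: str, properties: dict) -> bool:
    """Merge an extracted entity into `store`; return True if its content changed."""
    key = (name, type_)  # identity is name + type
    now = datetime.now(timezone.utc)
    existing = store.get(key)
    if existing is None:
        store[key] = {"properties": dict(properties), "created_at": now, "updated_at": now}
        return True
    # A timestamp refresh alone does not count as a content change.
    changed = any(existing["properties"].get(k) != v for k, v in properties.items())
    existing["properties"].update(properties)  # new values win
    existing["updated_at"] = now              # refreshed on every sighting
    return changed


def add_relationship(edges: set[tuple[str, str, str]], source: str, type_: str, target: str) -> bool:
    """Add an edge alongside existing ones; the set makes an exact repeat a no-op."""
    before = len(edges)
    edges.add((source, type_, target))
    return len(edges) != before


store: dict[tuple[str, str], dict] = {}
print(upsert_entity(store, "Alice", "person", {"team": "design"}))  # new entity
print(upsert_entity(store, "Alice", "person", {"team": "design"}))  # same facts again
```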

Community detection

After entities are added or relationships change, the Leiden algorithm re-clusters the graph. Only communities that changed get fresh LLM-generated summaries, which keeps the holistic view current without regenerating every summary on each update.
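One way to re-summarize only changed communities is to key cached summaries by community membership. After re-clustering, any cluster with an identical member set reuses its summary, and only new or altered clusters cost an LLM call. This is an assumed approach, not a description of Kraken's internals. `refresh_summaries` and the `summarize` callback are hypothetical. A fuller version might also invalidate a summary when a member entity's properties change.

```python
from typing import Callable


def refresh_summaries(
    new_clusters: list[set[str]],
    cache: dict[frozenset[str], str],
    summarize: Callable[[frozenset[str]], str],
) -> tuple[dict[frozenset[str], str], int]:
    """Reuse summaries for clusters whose membership is unchanged; summarize only the rest.

    Returns the new summary map and the number of summarize() calls made.
    """
    fresh: dict[frozenset[str], str] = {}
    calls = 0
    for cluster in new_clusters:
        key = frozenset(cluster)
        if key in cache:
            fresh[key] = cache[key]  # unchanged community: keep the existing summary
        else:
            fresh[key] = summarize(key)  # new or changed community: one LLM call
            calls += 1
    return fresh, calls


old = {
    frozenset({"Alice", "Kraken Agent"}): "Alice builds Kraken.",
    frozenset({"PostgreSQL", "Docker"}): "Infra tooling.",
}
new_clusters = [{"Alice", "Kraken Agent"}, {"PostgreSQL", "Docker", "Neo4j"}]
fresh, calls = refresh_summaries(new_clusters, old, lambda m: "Summary of " + ", ".join(sorted(m)))
print(calls)
```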

Context Compaction

When a conversation approaches the token limit (KRAKEN_COMPACTION_THRESHOLD_TOKENS, default 80,000), Kraken performs context compaction:

Step 1: Pre-flush

Before summarizing anything, Kraken runs a silent "pre-flush" pass:

Analyzes the conversation for important facts, decisions, and context
Persists them to the knowledge graph as entities and relationships
This ensures nothing is lost when older messages are summarized

Step 2: Summarize

Older messages (everything except the most recent KRAKEN_COMPACTION_KEEP_RECENT, default 10) are summarized into a compact digest by the LLM. The summary replaces the old messages in context.
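The two compaction steps can be sketched as a single function. The sketch assumes `KRAKEN_COMPACTION_THRESHOLD_TOKENS` and `KRAKEN_COMPACTION_KEEP_RECENT` are read from environment variables, falling back to the documented defaults. The token counter, the pre-flush, and the summarizer depend on the model, so they are passed in as hypothetical callbacks. `compact` itself is an illustrative name.

```python
import os
from typing import Callable

# Assumed to come from the environment; defaults match the documented values.
COMPACTION_THRESHOLD_TOKENS = int(os.environ.get("KRAKEN_COMPACTION_THRESHOLD_TOKENS", "80000"))
KEEP_RECENT = int(os.environ.get("KRAKEN_COMPACTION_KEEP_RECENT", "10"))


def compact(
    messages: list[str],
    count_tokens: Callable[[str], int],
    pre_flush: Callable[[list[str]], None],
    summarize: Callable[[list[str]], str],
) -> list[str]:
    """Return the context for the next turn, compacting it once it reaches the threshold."""
    total = sum(count_tokens(m) for m in messages)
    if total < COMPACTION_THRESHOLD_TOKENS or len(messages) <= KEEP_RECENT:
        return messages  # under budget, or nothing old enough to summarize
    older, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    pre_flush(messages)        # Step 1: persist key facts before anything is summarized
    digest = summarize(older)  # Step 2: one digest replaces the older messages
    return [digest] + recent


flushed: list[str] = []
msgs = [f"m{i}" for i in range(25)]
out = compact(msgs, lambda m: 5000, flushed.extend, lambda ms: f"summary of {len(ms)} messages")
print(out[0], len(out))
```

Because the pre-flush runs before summarization, facts from the older messages survive in the knowledge graph even though their wording is gone from the context.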

The result

The conversation continues seamlessly. The user notices nothing. But behind the scenes, Kraken has:

Persisted key facts to the knowledge graph (permanent)
Compressed the conversation context (for this session)
Kept recent messages intact (for continuity)

Many systems simply truncate older messages and lose whatever they contained. Kraken saves the important facts before anything is dropped.

Dream Cycle

Every 15 minutes (configurable via KRAKEN_DREAM_CRON), Kraken runs an offline consolidation cycle:

Reviews recent conversations that haven't been fully processed
Strengthens connections between entities
Identifies gaps or inconsistencies in the knowledge graph
Suggests new skills or tools based on observed patterns

Think of it as the agent "sleeping" — consolidating short-term experiences into long-term knowledge.
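If `KRAKEN_DREAM_CRON` accepts standard five-field cron syntax (an assumption), the 15-minute default corresponds to `*/15 * * * *`. The first step of the cycle is finding recent conversations that haven't been fully processed, and it could look like the sketch below. The `Conversation` fields, the one-day lookback, and `pending_for_dream` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Conversation:
    # Hypothetical bookkeeping fields; Kraken's real schema is not documented here.
    id: str
    updated_at: datetime
    processed_at: Optional[datetime] = None


def pending_for_dream(
    conversations: list[Conversation],
    now: datetime,
    lookback: timedelta = timedelta(days=1),
) -> list[str]:
    """IDs of recent conversations whose latest activity is newer than their last consolidation."""
    return [
        c.id
        for c in conversations
        if c.updated_at >= now - lookback
        and (c.processed_at is None or c.processed_at < c.updated_at)
    ]


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
convs = [
    Conversation("c1", now - timedelta(minutes=10)),                                 # never processed
    Conversation("c2", now - timedelta(minutes=60), now - timedelta(minutes=30)),    # up to date
    Conversation("c3", now - timedelta(minutes=20), now - timedelta(minutes=60)),    # new activity since
    Conversation("c4", now - timedelta(days=2)),                                     # outside lookback
]
print(pending_for_dream(convs, now))
```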

Querying Memory via the API

Python SDK

# Assumes `client` is an authenticated Kraken SDK client (setup not shown here)
# Auto mode: let Kraken choose the best strategy
results = client.memory.query("What do you know about my projects?")

# Specific mode
results = client.memory.query(
    "What are the common themes in my work?",
    mode="global",
)

# With filters
results = client.memory.query(
    "What tools does Alice use?",
    entity_filter=["Alice"],
    limit=10,
)

REST API

curl -X POST http://localhost:8080/v1/memory/query \
  -H "Authorization: Bearer $KRAKEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What do you know about my projects?",
    "mode": "auto",
    "limit": 20
  }'

See the API Reference for complete endpoint documentation.