A self-learning Claude Code orchestration layer. Every interaction intercepted, routed, remembered, and improved. Here's the full picture.
Architecture
All components wired through hook-handler.cjs — the central orchestrator that every Claude Code event flows through.
Modules
Eight CJS helpers form the actual running system — no compilation, no npm deps, pure Node.js built-ins.
Runtime Coordinator
Central dispatcher handling all 14 hook types. Wires every sub-system. On session-restore alone, it runs 8 sequential phases to construct the full prompt context Claude receives.
Persistent Memory
Cross-session memory with 4-tier retrieval hierarchy. L0 identity, L1 top-5 scored drawers, L2 namespace recall, L3 Okapi BM25 full-text. Zero AI calls — 100% local and deterministic.
Pattern Intelligence
Jaccard-scored context retrieval from learned patterns. At session-end, consolidates pending-insights.jsonl. Safety-guarded with 10MB file size and 5000 node limits.
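A minimal sketch of that Jaccard scoring between a prompt and a stored pattern (function names are illustrative, not the actual intelligence.cjs API):

```javascript
// Jaccard similarity: |A ∩ B| / |A ∪ B| over lowercase word sets.
// Illustrative only; the real intelligence.cjs tokenizer may differ.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

function jaccard(a, b) {
  const setA = tokenize(a);
  const setB = tokenize(b);
  let shared = 0;
  for (const t of setA) if (setB.has(t)) shared++;
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}
```

Past patterns scoring above a threshold would then be injected as [INTELLIGENCE] context.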
Agent Router
Multi-tier waterfall routing from natural language to optimal agent. Non-dev detection → regex patterns → semantic RouteLayer → keyword fallback. Writes last-route.json for statusline.
Cost Tracking
Full codeburn pipeline port. Parses JSONL sessions, deduplicates subagent chains, calculates API costs with per-model pricing, renders ANSI dashboard with 6 panels. Auto-injected at session-restore.
Safety Layer
Built into hook-handler. Blocks destructive shell patterns before Claude can execute them: rm -rf /, fork bombs, dd zero-fill. Returns {action:"block"} to Claude Code, which prevents execution.
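A sketch of that check, assuming a simple regex blocklist (patterns shown are illustrative, not the exhaustive production list):

```javascript
// Pre-bash safety sketch: first blocklist hit returns {action:"block"}.
const BLOCKED = [
  /\brm\s+-rf\s+\/(?:\s|$)/,          // rm -rf / (root wipe)
  /:\(\)\s*\{\s*:\|:&\s*\}\s*;:/,     // classic bash fork bomb
  /\bdd\s+if=\/dev\/zero\b/,          // dd zero-fill
];

function validateBash(command) {
  for (const pattern of BLOCKED) {
    if (pattern.test(command)) return { action: 'block', pattern: pattern.source };
  }
  return { action: 'allow' };
}
```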
Session State
Manages .monobrain/sessions/current.json lifecycle. restore(), end(), and metric() calls track task counts and session duration. Archives on end to session-{id}.json.
Key-Value Store
Simple flat-file key-value memory store. Backed by .monobrain/data/memory.json. Used for lightweight cross-session state that doesn't need full Memory Palace indexing.
Prompt Engineering
From raw user input to fully-engineered context — every step that fires before Claude sees your message.
On every new conversation, session.restore() loads current.json. Memory Palace injects [MEMORY_PALACE_L0] — the static identity.md containing project name, stack, key packages, git remote, and working style. This is always the first thing Claude reads.
From drawers.jsonl, the top-5 entries by score (within last 30 days) are injected as [MEMORY_PALACE_L1]. Score rises on retrieval frequency — frequently-useful memories bubble up automatically. No manual curation needed.
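A sketch of that selection, assuming each drawers.jsonl entry carries a ts (epoch ms) and a score field; the real schema may differ:

```javascript
// L1 selection sketch: entries from the last 30 days, top-5 by score.
const THIRTY_DAYS = 30 * 24 * 60 * 60 * 1000;

function topDrawers(entries, now = Date.now()) {
  return entries
    .filter(e => now - e.ts <= THIRTY_DAYS)   // recency window
    .sort((a, b) => b.score - a.score)        // highest score first
    .slice(0, 5);                             // top-5 for [MEMORY_PALACE_L1]
}
```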
score bumped on every recall → high-frequency memories promoted to L1
CLAUDE.md + docs/*.md are scanned, chunked, and keyword-indexed. Most relevant excerpts for this session are injected as [KNOWLEDGE_PRELOADED]. Shared agent instructions from .agents/shared_instructions.md are added as [SHARED_INSTRUCTIONS] (1500-char cap).
token-tracker.quickSummary() parses current month's JSONL session files, aggregates today + monthly spend, and injects as [TOKEN_USAGE]. Claude sees actual API spend before starting work. All timestamps handled in UTC to avoid timezone drift.
[TOKEN_USAGE] Today: $17.69 (414 calls) | Month: $2249.47 (39330 calls)
Every UserPromptSubmit triggers the route hook. Intelligence scans auto-memory-store.json for Jaccard-scored relevant past patterns → injected as [INTELLIGENCE]. The router runs its 4-tier waterfall, producing the routing panel Claude and the user see in the terminal.
Pre-task hook scores the task description 0–100 based on word count + keywords. Score <30% → Haiku recommended. Score >70% → Opus recommended. Injected as [TASK_MODEL_RECOMMENDATION]. This informs agent spawning decisions.
[TASK_MODEL_RECOMMENDATION] Use model="haiku" (score: 18/100)
After each task completes, post-task fires. Task content is chunked into 800-char segments (100-char overlap) and stored in drawers.jsonl. Closet terms extracted via regex. KG triple added. Future sessions can recall what was done here.
memory-palace.storeVerbatim(cwd, taskContent, {wing:'tasks', room:'active'})
session-end triggers intelligence.consolidate() (clears pending-insights.jsonl), session.end() archives current.json, and Memory Palace stores a session-end temporal triple in kg.json with a valid_from timestamp.
Memory
Zero AI calls. Entirely deterministic and local. Inspired by MemPalace arXiv architecture.
Static .monobrain/palace/identity.md — project name, stack, key packages, working style, git remote. Injected verbatim as [MEMORY_PALACE_L0] on every session. Never auto-overwritten.
Always injected · ~500 chars
Reads drawers.jsonl, scores entries within the last 30 days, picks top-5 by retrieval score. Auto-injected as [MEMORY_PALACE_L1]. Scores rise with every recall — high-frequency memories stay visible.
On session-restore · top-5 by score
recall(wing, room, limit) retrieves top-scored drawers from a specific namespace. Called explicitly by hook code or agents needing focused context. Wings: tasks, sessions, architecture, debugging, general.
On-demand · namespace-scoped
search(query, wing?, room?, limit?) runs Okapi BM25 (K1=1.5, B=0.75) across all drawers + closet term boost (+0.5 per matching topic). Most expensive but most comprehensive. Supports temporal KG queries via kgQuery().
Explicit call only · full corpus · BM25 + closet boost
BM25 Formula (K1=1.5, B=0.75, closet boost=+0.5)
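A sketch of Okapi BM25 scoring with those parameters; documents here are pre-tokenized arrays, and the function name is illustrative:

```javascript
// Okapi BM25 (K1=1.5, B=0.75):
// score(D,Q) = Σ idf(q) · f(q,D)·(k1+1) / (f(q,D) + k1·(1 − b + b·|D|/avgdl))
function bm25(queryTokens, doc, docs, k1 = 1.5, b = 0.75) {
  const N = docs.length;
  const avgdl = docs.reduce((sum, d) => sum + d.length, 0) / N;
  let score = 0;
  for (const term of queryTokens) {
    const f = doc.filter(t => t === term).length;        // term freq in this doc
    if (f === 0) continue;
    const n = docs.filter(d => d.includes(term)).length; // doc freq across corpus
    const idf = Math.log((N - n + 0.5) / (n + 0.5) + 1); // smoothed IDF
    score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * doc.length / avgdl));
  }
  return score;
}
```

The closet boost would then add +0.5 per matching topic on top of this base score.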
Routing
Every user prompt traverses this waterfall from top to bottom — first match wins.
If prompt matches marketing · sales · product · UI design · blockchain · legal → routes to "extras" with confidence 0.85. Returns top-8 specialist agents immediately. Skips all remaining tiers.
TASK_PATTERNS checked in priority order: implement|create|build → coder (0.8). fix|debug|error → reviewer (0.85). test|spec → tester (0.8). document|explain → researcher (0.75). security|auth → security-architect (0.9).
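The Tier 2 table above can be sketched as an ordered list of regexes where the first match wins (structure assumed; patterns and confidences follow the doc):

```javascript
// TASK_PATTERNS in priority order; first match wins, as described above.
const TASK_PATTERNS = [
  { re: /\b(implement|create|build)\b/i, agent: 'coder',              conf: 0.8 },
  { re: /\b(fix|debug|error)\b/i,        agent: 'reviewer',           conf: 0.85 },
  { re: /\b(test|spec)\b/i,              agent: 'tester',             conf: 0.8 },
  { re: /\b(document|explain)\b/i,       agent: 'researcher',         conf: 0.75 },
  { re: /\b(security|auth)\b/i,          agent: 'security-architect', conf: 0.9 },
];

function patternRoute(prompt) {
  for (const { re, agent, conf } of TASK_PATTERNS) {
    if (re.test(prompt)) return { agent, confidence: conf };
  }
  return null; // fall through to Tier 3 semantic routing
}
```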
TF-IDF weighted keyword matching against agent capability profiles. Computes cosine similarity between prompt token vector and each agent's skill vector. Returns ranked list with similarity scores. Handles nuanced multi-concept prompts.
Simple token overlap between prompt and agent descriptions. Last resort — always produces a result. Default confidence 0.5 with "Default routing" reason. Prevents routing failure for any input.
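A sketch of the Tier 4 fallback, assuming agents carry a name and a description (illustrative shapes, not the router.cjs schema):

```javascript
// Tier 4: raw token overlap between prompt and each agent description.
// Always returns something, with the fixed 0.5 default confidence.
function fallbackRoute(prompt, agents) {
  const promptTokens = new Set(prompt.toLowerCase().split(/\W+/).filter(Boolean));
  let best = agents[0], bestOverlap = -1;
  for (const agent of agents) {
    const overlap = agent.description.toLowerCase().split(/\W+/)
      .filter(t => promptTokens.has(t)).length;
    if (overlap > bestOverlap) { best = agent; bestOverlap = overlap; }
  }
  return { agent: best.name, confidence: 0.5, reason: 'Default routing' };
}
```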
Cost Tracking
Full codeburn port — parses native Claude Code JSONL sessions, deduplicates subagent chains, calculates real API cost.
~/.claude/projects/**/*.jsonl recursive scan, including subagent dirs
→ Read assistant entries, deduplicate by msg.id across subagent chains
→ 13-category classifier: tools first, then keyword patterns
→ multiplier × (in×rate + out×rate + cache reads/writes + webSearch)
→ 6 ANSI panels: Overview, Projects, Models, Daily chart, Tools, MCP
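The cost step of the pipeline, per deduplicated JSONL entry, might look like this (field and rate names are assumptions; real per-model rates live in the tracker's pricing table):

```javascript
// Cost per entry: multiplier × Σ(token counts × per-token rates),
// plus webSearch billed per call at $0.01.
function entryCost(e, rates, multiplier = 1) {
  return multiplier * (
    e.inputTokens  * rates.in +
    e.outputTokens * rates.out +
    e.cacheWrite   * rates.cw +
    e.cacheRead    * rates.cr +
    e.webSearch    * 0.01
  );
}
```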
Cost Formula
cost = multiplier ×
(inputTokens × in_rate
+ outputTokens × out_rate
+ cacheWrite × cw_rate
+ cacheRead × cr_rate
+ webSearch × $0.01)
Fast Mode Multipliers
UTC Timezone Rule
JSONL timestamps are UTC ISO strings. Never use new Date(y,m,d) for date math — it's local time. Always use Date.UTC() or now.toISOString().slice(0,10).
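A quick demonstration of the rule:

```javascript
// toISOString() always reports UTC, so slicing it yields a stable day key
// no matter what timezone the machine runs in.
const ts = '2025-01-15T23:30:00.000Z';  // a JSONL timestamp near UTC midnight
const utcDay = new Date(ts).toISOString().slice(0, 10);

// Safe date math: build dates from UTC components, never new Date(y, m, d).
const d = new Date(ts);
const startOfDay = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
```

In a timezone west of UTC, new Date(2025, 0, 15) would represent a moment on a different UTC day, which is exactly the drift this rule avoids.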
Hook System
Claude Code fires these events at precise lifecycle moments — hook-handler.cjs handles every one.
session-restore
Fires on every conversation start. Runs 8 phases: restore → intelligence → workers → knowledge → instructions → memory palace → tokens → microagent index. Most complex handler.
session-end
Fires on conversation end. Consolidates insights, archives session JSON, stores temporal KG triple for the session boundary.
route (UserPromptSubmit)
Every user message. Intelligence context retrieval → semantic routing → MicroAgent scan → prints routing panel → writes last-route.json.
pre-task
Before task execution. Increments task counter. Scores complexity 0–100 → outputs [TASK_MODEL_RECOMMENDATION] for Haiku/Sonnet/Opus selection.
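A sketch of a scorer with those thresholds. The weighting (word count plus keyword hits) and the keyword list are assumptions; only the <30 / >70 cutoffs come from the doc:

```javascript
// Hypothetical 0–100 complexity scorer: length contributes up to 60 points,
// each "complex" keyword adds 10.
const COMPLEX_KEYWORDS = ['architecture', 'security', 'refactor', 'migrate', 'distributed'];

function scoreTask(description) {
  const words = description.trim().split(/\s+/).filter(Boolean).length;
  let score = Math.min(words * 2, 60);
  for (const kw of COMPLEX_KEYWORDS) {
    if (description.toLowerCase().includes(kw)) score += 10;
  }
  return Math.min(score, 100);
}

function recommendModel(score) {
  if (score < 30) return 'haiku';   // routine work
  if (score > 70) return 'opus';    // heavy reasoning
  return 'sonnet';                  // everything in between
}
```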
post-task
After task completion. Stores task content in Memory Palace (800-char chunks). Extracts closet terms. Saves routing pattern for future intelligence.
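The chunking described above can be sketched as follows (slice size and overlap per the doc; the function name is illustrative):

```javascript
// 800-char chunks with 100-char overlap: each step advances 700 chars,
// so consecutive chunks share their boundary context.
function chunkText(text, size = 800, overlap = 100) {
  const chunks = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```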
pre-edit / post-edit
pre-edit: reads file, suggests specialist agent. post-edit: records edit to pending-insights.jsonl via intelligence.recordEdit().
pre-bash (PreToolUse)
Safety validator. Blocks: rm -rf /, format c:, dd if=/dev/zero, fork bombs. Returns {action:"block"} — Claude Code prevents command execution.
load-agent
Reads .claude/agents/{slug}.md and prints full content — activates an agent's identity and capabilities for the current conversation turn.
notify
Sends system notifications for important events. Used by workers and the routing system to surface alerts without interrupting the main flow.
worker hooks
worker-dispatch, worker-status, worker-list, worker-cancel, worker-detect: manage 12 background intelligence workers (ultralearn, optimize, consolidate, predict, audit, etc.)
intelligence hooks
trajectory-start/step/end, pattern-store/search, attention, learn, stats: inner workings of the intelligence learning loop. Records edit patterns and trajectories.
transfer
Handles cross-session context transfer. Packages the current session's learned state for injection into a new conversation.
Performance
All numbers measured on macOS, Node.js 20, SSD-backed filesystem.
| Operation | Component | Speed | Note |
|---|---|---|---|
| L0 Identity injection | memory-palace.cjs | <1ms | Single file read |
| L1 Top-5 drawers | memory-palace.cjs | 5–15ms | JSONL parse + sort |
| L3 BM25 full search | memory-palace.cjs | 20–80ms | Per 1000 drawers |
| Jaccard context retrieval | intelligence.cjs | <5ms | Per 500 entries |
| 4-tier routing | router.cjs | <3ms | Sync regex waterfall |
| Token quickSummary | token-tracker.cjs | 200–500ms | Per month of sessions |
| Full dashboard render | token-tracker.cjs | 1–3s | All-time JSONL parse |
| Hook timeout guard | hook-handler.cjs | 3s max | runWithTimeout cap |
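The timeout guard in the last row might be shaped like this (the name comes from the table; the signature is an assumption):

```javascript
// Race the hook body against a hard cap so no handler can stall Claude Code.
function runWithTimeout(fn, ms = 3000) {
  return Promise.race([
    Promise.resolve().then(fn),
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error(`hook timed out after ${ms}ms`)), ms)
    ),
  ]);
}
```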
ADR-026 — 3-Tier Model Routing
TIER 1
Agent Booster
WASM · <1ms · $0
Simple transforms — no LLM needed. Var→const, type additions.
TIER 2
Haiku 4.5
~500ms · $0.0002/req
Simple tasks, complexity score <30%. Most routine work.
TIER 3
Sonnet / Opus 4.6
2–5s · $0.003–0.015
Architecture, security, complex reasoning. Score >30%.