Context Assembly Agent

The ContextAssemblyAgent is the core meta-agent — it implements the BEFORE-loop pattern from the agent memory course (Lesson 06). Its job is to assemble all relevant memory context before each LLM call.

Why This Matters

From the course:

"Chicken-and-egg problem: the agent can't decide to check memory it doesn't know exists. You need deterministic retrieval at the start."

The Context Assembly Agent solves this by always running before the LLM, loading all relevant context deterministically.

The BEFORE-Loop Pattern
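The pattern is simple: a fixed set of memory reads runs unconditionally before every LLM call, so no LLM decision is needed to discover memory. A minimal, library-free sketch (the stub retrievers below stand in for the real harness methods):

```python
# BEFORE-loop sketch: every retriever runs unconditionally, in a fixed
# order, before the LLM ever sees the query. The stubs below are
# illustrative stand-ins for the real memharness reads.

def get_persona():
    return "helpful assistant"

def get_history(thread_id):
    return [("user", "Hello!")]

def search_knowledge(query):
    return ["kb entry about deploys"]

def assemble_before_loop(query: str, thread_id: str) -> dict:
    # Deterministic: the LLM never decides whether to check memory.
    return {
        "persona": get_persona(),
        "history": get_history(thread_id),
        "knowledge": search_knowledge(query),
        "query": query,
    }

ctx = assemble_before_loop("How do I deploy?", "user-123")
```

Every section is always present in the result, even when a retriever returns nothing — that invariant is what makes the assembly deterministic.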

Usage

```python
from memharness import MemoryHarness
from memharness.agents import ContextAssemblyAgent

harness = MemoryHarness("sqlite:///memory.db")
await harness.connect()

agent = ContextAssemblyAgent(
    harness=harness,
    max_tokens=4000,
    summarize_threshold=0.8,  # 80% — trigger truncation
)

ctx = await agent.assemble(
    query="How do I deploy to Kubernetes?",
    thread_id="user-123",
    include_tools=True,
)

print(f"Context usage: {ctx.context_usage_percent:.1%}")
print(f"Estimated tokens: {ctx.total_tokens_estimate}")
```
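`total_tokens_estimate` is a rough count. A common heuristic is about four characters per token — an illustrative assumption here, not necessarily how memharness estimates:

```python
# Rough token estimate using the common ~4 chars/token heuristic.
# This is an illustrative assumption, not memharness's actual estimator.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

# Usage percent is estimated tokens over the agent's max_tokens budget.
usage = estimate_tokens("x" * 8000) / 4000
```

With `max_tokens=4000`, an 8000-character context lands at roughly 50% usage under this heuristic.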

AssembledContext

The assemble() method returns an AssembledContext dataclass:

Fields

| Field | Type | Content |
|---|---|---|
| `persona` | `str` | Agent identity/style |
| `conversation_history` | `list[MemoryUnit]` | Recent messages from thread |
| `knowledge` | `str` | Relevant KB entries |
| `workflows` | `str` | Matching workflow patterns |
| `entities` | `str` | Known entities |
| `summaries` | `str` | Compressed summaries |
| `tools` | `str` | Toolbox tree |
| `user_query` | `str` | Current query |
| `total_tokens_estimate` | `int` | Rough token count |
| `context_usage_percent` | `float` | 0.0 to 1.0 |
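The shape of these fields can be sketched as a small dataclass — an illustrative subset, not the real memharness class — with a minimal `to_prompt()` in the sectioned-markdown style the next section describes:

```python
from dataclasses import dataclass

# Illustrative subset of the fields above; the real AssembledContext
# lives in memharness and carries all ten fields.
@dataclass
class AssembledContextSketch:
    persona: str = ""
    knowledge: str = ""
    user_query: str = ""
    total_tokens_estimate: int = 0
    context_usage_percent: float = 0.0

    def to_prompt(self) -> str:
        # Render only the non-empty sections as "## Heading\nbody".
        sections = [
            ("Agent Persona", self.persona),
            ("Relevant Knowledge", self.knowledge),
            ("User Query", self.user_query),
        ]
        return "\n\n".join(
            f"## {title}\n{body}" for title, body in sections if body
        )

ctx = AssembledContextSketch(persona="concise helper", user_query="deploy?")
```

Empty sections are skipped in this sketch, which keeps the assembled prompt compact when a retriever finds nothing.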

Two Output Formats

```python
# 1. Markdown string — for direct LLM prompting
prompt = ctx.to_prompt()
# Returns sectioned markdown:
#   ## Agent Persona
#   ## Conversation History
#   ## Relevant Knowledge
#   ...

# 2. LangChain messages — for any LLM provider
messages = ctx.to_messages()
# Returns list[BaseMessage]:
#   [SystemMessage(content="## Agent Persona\n..."),
#    HumanMessage(content="Hello!"),
#    AIMessage(content="Hi there!"),
#    ...]
```

Context Window Monitor

The agent monitors context usage and warns when approaching limits:

  • below 80% — normal, return all context
  • 80%+ — truncate older conversation history to last 10 messages

```python
ctx = await agent.assemble(query="...", thread_id="...")

if ctx.context_usage_percent > 0.8:
    print("⚠️ Context nearly full — consider running Summarizer Agent")
elif ctx.context_usage_percent > 0.5:
    print("⚡ Context above 50% — monitor closely")
else:
    print("✅ Context healthy")
```
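The 80% truncation rule above can be sketched in isolation. The thresholds come from this page; the helper name is hypothetical:

```python
# Sketch of the truncation rule: once usage crosses the threshold,
# keep only the most recent messages. Thresholds are the ones this
# page documents (0.8, last 10 messages); the function is illustrative.
def truncate_history(history: list, usage: float,
                     threshold: float = 0.8, keep: int = 10) -> list:
    if usage >= threshold:
        return history[-keep:]  # drop older messages, keep the tail
    return history

msgs = [f"msg-{i}" for i in range(25)]
truncated = truncate_history(msgs, usage=0.85)
```

Below the threshold the history passes through untouched, so truncation never fires during normal operation.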

What Gets Loaded

All reads are deterministic — they always happen, regardless of content:

| Memory Type | Method | k |
|---|---|---|
| Persona | `get_active_persona()` | 1 |
| Conversation | `get_conversational(thread_id, limit=20)` | 20 |
| Knowledge | `search_knowledge(query, k=5)` | 5 |
| Workflow | `search_workflow(query, k=3)` | 3 |
| Entity | `search_entity(query, k=5)` | 5 |
| Toolbox | `toolbox_tree("/")` | all |
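The fixed read sequence can be demonstrated with a stub harness whose method names and signatures mirror the table; the stub itself is illustrative, not part of memharness:

```python
import asyncio

# Stub harness mirroring the table's method signatures. It records
# each call so the sketch can show that every read runs, in order.
class StubHarness:
    def __init__(self):
        self.calls = []

    async def get_active_persona(self):
        self.calls.append("persona"); return "persona"

    async def get_conversational(self, thread_id, limit=20):
        self.calls.append("conversation"); return []

    async def search_knowledge(self, query, k=5):
        self.calls.append("knowledge"); return []

    async def search_workflow(self, query, k=3):
        self.calls.append("workflow"); return []

    async def search_entity(self, query, k=5):
        self.calls.append("entity"); return []

    async def toolbox_tree(self, path="/"):
        self.calls.append("toolbox"); return {}

async def load_all(h, query, thread_id):
    # All six reads happen unconditionally, with the table's fixed k.
    await h.get_active_persona()
    await h.get_conversational(thread_id, limit=20)
    await h.search_knowledge(query, k=5)
    await h.search_workflow(query, k=3)
    await h.search_entity(query, k=5)
    await h.toolbox_tree("/")

h = StubHarness()
asyncio.run(load_all(h, "deploy?", "user-123"))
```

No read is conditional on the query's content — only the *results* vary, which is the property the section is naming.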

Integration with the Agent Loop

```python
async def call_agent(query: str, thread_id: str, llm):
    # BEFORE loop: deterministic context assembly
    ctx = await context_agent.assemble(query, thread_id)
    messages = ctx.to_messages()

    # INSIDE loop: LLM reasoning + tool calls
    response = await llm.invoke(messages)

    # AFTER loop: persist artifacts
    await harness.add_conversational(thread_id, "assistant", response.content)
    # (workflow + entity extraction via agents)

    return response.content
```

This is the complete BEFORE/INSIDE/AFTER pattern from Lesson 06 of the agent memory course.