Summary Memory
Summary memory stores compressed summaries of larger memory collections with bidirectional links to source memories, enabling hierarchical compression and efficient context management.
Overview
Summary memory implements a powerful compression strategy for managing long conversations, large document collections, or extensive knowledge bases. Each summary acts as a compressed representation of multiple source memories, with explicit links that allow "expansion" back to the original content when needed.
This memory type excels at:
- Context compression: Reduce token usage by replacing detailed histories with concise summaries
- Hierarchical summarization: Create multi-level summaries (summary of summaries)
- Selective expansion: Retrieve full details only when necessary
- Thread management: Organize summaries by conversation or session threads
The bidirectional linking (summary → sources, sources → summary) enables sophisticated memory management strategies where agents can work with compressed context but expand to details on demand.
When to Use
Use summary memory to:
- Compress long conversations: Summarize older messages to stay within context limits
- Manage document collections: Create high-level overviews of large knowledge bases
- Enable progressive detail: Start with summaries, drill down to sources as needed
- Implement memory consolidation: Periodically compress older memories into summaries
- Support multi-resolution context: Maintain both high-level and detailed views
Summary memory is essential for long-running agents that accumulate extensive conversational or knowledge history.
Storage Strategy
- Backend: VECTOR (semantic search with HNSW indexing)
- Default k: 3 results (summaries are typically broad, so fewer results)
- Embeddings: Yes (enables semantic similarity search across summaries)
- Ordered: No (accessed by relevance, though summaries may contain temporal info)
- Expansion: Supports bidirectional traversal to source memories
Schema
Each summary memory includes:
| Field | Type | Required | Description |
|---|---|---|---|
summary_type | string | Yes | Type of summary: "conversation", "document", "session", "topic" |
source_ids | array | Yes | List of memory IDs that this summary represents |
thread_id | string | No | Thread/session ID if summarizing a conversation |
time_range_start | datetime | No | Start timestamp of summarized content |
time_range_end | datetime | No | End timestamp of summarized content |
source_count | integer | No | Number of source memories summarized |
Additional fields are stored in the metadata dictionary.
API Methods
Adding Summaries
async def add_summary(
summary: str,
source_ids: list[str],
thread_id: str | None = None,
) -> str:
"""
Add a summary that references source memories.
Args:
summary: The summary text
source_ids: List of memory IDs that this summary is derived from
thread_id: Optional thread ID if this summarizes a conversation
Returns:
The ID of the created summary memory
"""
Expanding Summaries
async def expand_summary(summary_id: str) -> list[MemoryUnit]:
"""
Expand a summary to retrieve its source memories.
Args:
summary_id: The ID of the summary to expand
Returns:
List of source MemoryUnit objects that the summary was derived from
Raises:
KeyError: If the summary is not found
"""
Examples
Compressing Conversation History
from memharness import MemoryHarness
harness = MemoryHarness(backend="sqlite:///memory.db")
# Long conversation has accumulated many messages
thread_id = "chat-123"
messages = await harness.get_conversational(thread_id, limit=50)
# Compress the first 30 messages into a summary
old_messages = messages[:30]
message_ids = [msg.id for msg in old_messages]
summary_text = """
User asked about Python async programming, specifically about the Global
Interpreter Lock (GIL) and how async/await works. Discussion covered:
- GIL limitations and when it matters
- asyncio event loop mechanics
- Differences between threading, multiprocessing, and async
- Best practices for I/O-bound vs CPU-bound tasks
User expressed preference for async patterns in web applications.
"""
summary_id = await harness.add_summary(
summary=summary_text,
source_ids=message_ids,
thread_id=thread_id
)
print(f"Created summary {summary_id} covering {len(message_ids)} messages")
Progressive Detail Retrieval
# Start with summaries for efficient context loading
recent_messages = await harness.get_conversational("chat-123", limit=10)
summaries = await harness.search(
query="Python async programming",
memory_type=MemoryType.SUMMARY,
k=2
)
# Build initial context with summaries
context = []
for summary in summaries:
context.append(f"Previous discussion: {summary.content}")
# If agent needs more detail, expand specific summaries
if needs_more_detail:
detailed_messages = await harness.expand_summary(summaries[0].id)
for msg in detailed_messages:
context.append(f"[Detail] {msg.content}")
Hierarchical Summarization
# Create first-level summaries (daily summaries)
day1_messages = await harness.get_conversational("chat-123", limit=100)
day1_msg_ids = [m.id for m in day1_messages]
day1_summary = await harness.add_summary(
summary="Day 1: User onboarding, discussed Python basics, set up dev environment",
source_ids=day1_msg_ids,
thread_id="chat-123"
)
day2_summary = await harness.add_summary(
summary="Day 2: Advanced Python topics, async programming, testing strategies",
source_ids=[...], # Day 2 message IDs
thread_id="chat-123"
)
# Create second-level summary (weekly summary of daily summaries)
weekly_summary = await harness.add_summary(
summary="Week 1: New user learned Python fundamentals through advanced topics",
source_ids=[day1_summary, day2_summary], # Summarizing summaries!
thread_id="chat-123"
)
Automatic Memory Consolidation
async def consolidate_old_memories(thread_id: str, harness: MemoryHarness):
"""Automatically summarize older conversation segments."""
messages = await harness.get_conversational(thread_id, limit=1000)
# Summarize messages older than 1 hour in chunks of 20
from datetime import datetime, timedelta
cutoff = datetime.now() - timedelta(hours=1)
old_messages = [m for m in messages if m.created_at < cutoff]
# Process in chunks of 20
chunk_size = 20
for i in range(0, len(old_messages), chunk_size):
chunk = old_messages[i:i+chunk_size]
chunk_ids = [m.id for m in chunk]
# Generate summary (using LLM or rule-based)
summary_text = await generate_summary(chunk)
await harness.add_summary(
summary=summary_text,
source_ids=chunk_ids,
thread_id=thread_id
)
# Optionally delete source messages to save space
# for msg in chunk:
# await harness.delete(msg.id)
print(f"Consolidated {len(old_messages)} messages into summaries")
Best Practices
-
Include metadata about coverage: Store time ranges, message counts, and topics covered in metadata for quick assessment
-
Balance compression vs. detail: Don't over-summarize - keep enough detail that expansion isn't always necessary
-
Use hierarchical summaries: Create multiple levels (message → daily → weekly) for very long histories
-
Preserve critical information: Ensure important entities, decisions, and facts are retained in summaries
-
Link related summaries: Use metadata to connect related summaries (e.g., same topic across different threads)
-
Clean up orphaned sources: When deleting summarized memories, ensure summaries are updated or also removed
Integration with Other Memory Types
Summary memory complements other memory types:
- Conversational: Primary use case - summarizing chat history to manage context length
- Knowledge: Summarize document collections for efficient retrieval
- Entity: Extract and preserve entity mentions when creating summaries
- Workflow: Summarize completed workflow sequences for learning patterns
- Tool Log: Compress extensive tool execution logs into high-level activity summaries
Performance Notes
- Token efficiency: Summaries dramatically reduce token usage in LLM context windows
- Semantic search: Use embeddings to find relevant summaries without knowing exact content
- Lazy expansion: Only expand summaries when detail is actually needed
- Small default k: Returns 3 summaries by default since they typically cover broad topics
- Async traversal: Both summary creation and expansion are fully async for performance
- Cached source lookups: Frequently expanded summaries benefit from backend caching