Tool Log Memory

Tool log memory provides an ordered audit trail of tool executions with status tracking, enabling debugging, analytics, and learning from tool usage patterns.

Overview

Tool log memory records every tool invocation with complete details about inputs, outputs, execution status, and timing. Unlike the toolbox (which stores tool definitions), tool log captures the actual execution history, creating a chronological record of what tools were used, when, and with what results.

This memory type excels at:

  • Execution tracking: Complete audit trail of all tool invocations
  • Error debugging: Detailed records of failures with error messages
  • Performance monitoring: Track execution duration and success rates
  • Usage analytics: Understand which tools are used most frequently
  • Learning from history: Discover successful tool usage patterns

The ordered nature of tool logs makes them ideal for temporal analysis, debugging sequences of actions, and understanding the chronological flow of agent behavior.

When to Use

Use tool log memory to:

  • Debug tool failures: Review what went wrong with specific tool executions
  • Track execution history: Maintain a complete audit trail for compliance or analysis
  • Monitor performance: Identify slow or unreliable tools
  • Learn patterns: Discover which tool sequences lead to success
  • Provide user transparency: Show users what actions the agent took
  • Analyze tool usage: Generate statistics on tool popularity and reliability

Tool log is essential for production agents where observability and debugging are critical.
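
One way to look for recurring tool sequences is to count adjacent pairs of tool names in a chronologically ordered log. The sketch below works on plain dictionaries that carry a tool name and a status. The `count_tool_pairs` helper and the sample entries are hypothetical and are not part of memharness.

```python
from collections import Counter


def count_tool_pairs(entries: list[dict]) -> Counter:
    """Count consecutive (previous_tool, next_tool) pairs where both calls succeeded.

    `entries` must already be in chronological order (oldest first).
    """
    pairs: Counter = Counter()
    for prev, curr in zip(entries, entries[1:]):
        if prev["status"] == "success" and curr["status"] == "success":
            pairs[(prev["tool_name"], curr["tool_name"])] += 1
    return pairs


# Hypothetical, hand-written log entries for illustration only.
sample_log = [
    {"tool_name": "search", "status": "success"},
    {"tool_name": "fetch", "status": "success"},
    {"tool_name": "search", "status": "success"},
    {"tool_name": "fetch", "status": "error"},
    {"tool_name": "search", "status": "success"},
    {"tool_name": "fetch", "status": "success"},
]

pair_counts = count_tool_pairs(sample_log)
print(pair_counts.most_common(1))
```

Pairs that touch a failed call are skipped, so the counts reflect only transitions between successful executions.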

Storage Strategy

  • Backend: SQL (ordered, sequential access)
  • Default k: 10 results (logs accumulate quickly, so retrieval is capped at a moderate size by default)
  • Embeddings: No (logs are accessed chronologically or by exact match)
  • Ordered: Yes (temporal ordering is fundamental to logs)
  • Thread isolation: Logs can be scoped to conversation threads
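
To make the storage properties above concrete, here is a minimal sketch of an ordered, thread-scoped log table using Python's built-in `sqlite3` module. The table layout and the `append_log` and `recent_logs` helpers are assumptions for illustration; the schema memharness actually uses may differ.

```python
import json
import sqlite3

# Hypothetical schema: memharness's real table layout may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tool_log (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- insertion order doubles as log order
        thread_id TEXT,
        tool_name TEXT NOT NULL,
        status TEXT NOT NULL,
        input TEXT,                            -- JSON-encoded parameters
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
    """
)


def append_log(thread_id: str, tool_name: str, status: str, params: dict) -> int:
    """Insert one entry and return its row ID."""
    cur = conn.execute(
        "INSERT INTO tool_log (thread_id, tool_name, status, input) VALUES (?, ?, ?, ?)",
        (thread_id, tool_name, status, json.dumps(params)),
    )
    conn.commit()
    return cur.lastrowid


def recent_logs(thread_id: str, k: int = 10) -> list[tuple]:
    """Return the k most recent entries for a thread, oldest first."""
    rows = conn.execute(
        "SELECT id, tool_name, status FROM tool_log "
        "WHERE thread_id = ? ORDER BY id DESC LIMIT ?",
        (thread_id, k),
    ).fetchall()
    return rows[::-1]  # Flip back to chronological order


append_log("chat-123", "search", "success", {"q": "weather"})
append_log("chat-456", "search", "error", {"q": "news"})
append_log("chat-123", "fetch", "success", {"url": "https://example.com"})

print(recent_logs("chat-123"))
```

No embedding column is needed here: the monotonically increasing `id` provides ordering, and the `thread_id` filter provides isolation.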

Schema

Each tool log entry includes:

| Field | Type | Required | Description |
|---|---|---|---|
| tool_name | string | Yes | Name of the executed tool |
| status | string | Yes | Execution status: "success", "error", "timeout" |
| input | object | No | Input parameters passed to the tool |
| output | object | No | Output result from the tool |
| duration_ms | integer | No | Execution duration in milliseconds |
| error | string | No | Error message if status is "error" |
| thread_id | string | No | Conversation thread if applicable |

Additional fields are stored in the metadata dictionary.

API Methods

Logging Tool Executions

async def add_tool_log(
    thread_id: str,
    tool_name: str,
    args: dict[str, Any],
    result: str,
    status: str,
) -> str:
    """
    Log a tool execution.

    Args:
        thread_id: The conversation thread ID
        tool_name: Name of the executed tool
        args: Arguments passed to the tool
        result: Result or output from the tool
        status: Execution status ("success", "error", "timeout")

    Returns:
        The ID of the created log entry
    """

async def log_tool_execution(
    tool_name: str,
    input_params: dict[str, Any],
    output_result: dict[str, Any] | None = None,
    success: bool = True,
    duration_ms: int | None = None,
    error: str | None = None,
) -> str:
    """
    Log a tool execution with detailed parameters.

    Args:
        tool_name: Name of the tool executed
        input_params: Input parameters passed to the tool
        output_result: Output result from the tool
        success: Whether the execution was successful
        duration_ms: Duration in milliseconds
        error: Error message if execution failed

    Returns:
        The ID of the created log entry
    """

Retrieving Logs

async def get_tool_log(
    thread_id: str,
    limit: int = 20,
) -> list[MemoryUnit]:
    """
    Retrieve tool execution log for a thread.

    Args:
        thread_id: The conversation thread ID
        limit: Maximum number of log entries to retrieve

    Returns:
        List of tool log MemoryUnit objects, ordered from oldest to newest
    """

async def search_tool_logs(
    query: str,
    k: int = 10,
) -> list[MemoryUnit]:
    """
    Search tool execution logs.

    Args:
        query: Search query (tool name or partial match)
        k: Number of results to return

    Returns:
        List of matching tool log memory units
    """

Examples

Basic Tool Execution Logging

from memharness import MemoryHarness
import time

harness = MemoryHarness(backend="sqlite:///memory.db")

# Define the parameters once so the success and failure logs record the same input
params = {
    "title": "Bug fix",
    "body": "Fixed the authentication issue",
}

# Execute a tool and log it. Run this inside an async function or async REPL;
# github_create_issue stands in for your own tool implementation.
# perf_counter() is monotonic, so the measured duration is unaffected by clock changes.
start_time = time.perf_counter()

try:
    result = await github_create_issue(**params)

    duration_ms = int((time.perf_counter() - start_time) * 1000)

    await harness.log_tool_execution(
        tool_name="github.create_issue",
        input_params=params,
        output_result={"issue_number": result.number},
        success=True,
        duration_ms=duration_ms,
    )

except Exception as e:
    duration_ms = int((time.perf_counter() - start_time) * 1000)

    await harness.log_tool_execution(
        tool_name="github.create_issue",
        input_params=params,
        success=False,
        duration_ms=duration_ms,
        error=str(e),
    )

Debugging Tool Failures

# Retrieve recent tool logs for a conversation thread
logs = await harness.get_tool_log("chat-123", limit=20)

# Find failed executions
failures = [log for log in logs if log.metadata.get("status") == "error"]

for failure in failures:
    print(f"Tool: {failure.metadata['tool_name']}")
    print(f"Error: {failure.metadata.get('error', 'Unknown')}")
    print(f"Input: {failure.metadata.get('input', {})}")
    print(f"Time: {failure.created_at}")
    print("---")

Tool Usage Analytics

async def analyze_tool_usage(harness: MemoryHarness):
    """Generate statistics on tool usage patterns."""

    # Search for all tool logs (adjust limit as needed)
    logs = await harness.search_tool_logs("", k=1000)

    # Compute statistics
    tool_counts = {}
    tool_failures = {}
    tool_durations = {}

    for log in logs:
        tool_name = log.metadata.get("tool_name", "unknown")
        status = log.metadata.get("status", "unknown")
        # duration_ms is optional and may be stored as None
        duration = log.metadata.get("duration_ms") or 0

        # Count executions
        tool_counts[tool_name] = tool_counts.get(tool_name, 0) + 1

        # Count failures (a timeout is an unsuccessful execution too)
        if status in ("error", "timeout"):
            tool_failures[tool_name] = tool_failures.get(tool_name, 0) + 1

        # Track durations
        if tool_name not in tool_durations:
            tool_durations[tool_name] = []
        if duration > 0:
            tool_durations[tool_name].append(duration)

    # Print report
    print("Tool Usage Report")
    print("=" * 50)

    for tool_name in sorted(tool_counts.keys()):
        total = tool_counts[tool_name]
        failures = tool_failures.get(tool_name, 0)
        success_rate = ((total - failures) / total * 100) if total > 0 else 0

        avg_duration = 0
        if tool_name in tool_durations and tool_durations[tool_name]:
            avg_duration = sum(tool_durations[tool_name]) / len(tool_durations[tool_name])

        print(f"{tool_name}:")
        print(f"  Executions: {total}")
        print(f"  Success rate: {success_rate:.1f}%")
        print(f"  Avg duration: {avg_duration:.0f}ms")
        print()

# Usage
await analyze_tool_usage(harness)

Automatic Tool Wrapper

import time
from functools import wraps
from typing import Callable, Any

from memharness import MemoryHarness

def logged_tool(tool_name: str, harness: MemoryHarness) -> Callable:
    """Decorator that automatically logs tool executions."""

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            start_time = time.perf_counter()

            try:
                result = await func(*args, **kwargs)
                duration_ms = int((time.perf_counter() - start_time) * 1000)

                await harness.log_tool_execution(
                    tool_name=tool_name,
                    input_params=kwargs,  # Note: positional arguments are not recorded
                    output_result={"result": str(result)[:500]},  # Truncate long results
                    success=True,
                    duration_ms=duration_ms,
                )

                return result

            except Exception as e:
                duration_ms = int((time.perf_counter() - start_time) * 1000)

                await harness.log_tool_execution(
                    tool_name=tool_name,
                    input_params=kwargs,
                    success=False,
                    duration_ms=duration_ms,
                    error=str(e),
                )

                # Re-raise so callers still see the original failure
                raise

        return wrapper
    return decorator

# Usage (github_api is a placeholder for your own API client)
@logged_tool("github.create_issue", harness)
async def create_github_issue(title: str, body: str):
    # Tool implementation
    return await github_api.create_issue(title=title, body=body)
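
A wrapper that logs only keyword arguments will miss anything passed positionally. One possible refinement, sketched here with the standard library's `inspect` module, is to bind the call's arguments to the function signature so every parameter is recorded by name. `capture_arguments` and the stand-in `create_issue` tool are hypothetical helpers for this example, not memharness APIs.

```python
import inspect
from typing import Any, Callable


def capture_arguments(func: Callable, args: tuple, kwargs: dict[str, Any]) -> dict[str, Any]:
    """Map positional and keyword arguments to parameter names, including defaults."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    return dict(bound.arguments)


async def create_issue(title: str, body: str = "", labels: tuple = ()) -> None:
    """Stand-in tool used only to demonstrate argument capture."""


captured = capture_arguments(create_issue, ("Bug fix",), {"body": "Fixed auth"})
print(captured)
```

Inside a logging decorator, the returned dictionary could be passed as the logged input in place of the raw keyword arguments.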

Best Practices

  1. Log every tool execution: Don't log selectively; capture all invocations for complete observability

  2. Include timing information: Always record duration_ms for performance analysis

  3. Truncate large outputs: Store summaries of large results rather than full content to avoid bloat

  4. Use consistent tool names: Match tool names with those in toolbox memory for cross-referencing

  5. Set reasonable limits: When retrieving logs, use appropriate limits to avoid loading thousands of entries

  6. Implement log rotation: For long-running agents, periodically archive or summarize old logs
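
Log rotation can be as simple as splitting entries at an age cutoff and keeping only an aggregate summary of the expired ones. The sketch below operates on plain dictionaries with a `created_at` timestamp. The `rotate_logs` function and the sample data are assumptions for illustration, and how expired entries are actually deleted or archived depends on your backend.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def rotate_logs(entries: list[dict], now: datetime, max_age: timedelta) -> tuple[list[dict], dict]:
    """Split entries into those to keep and a compact summary of the expired ones."""
    cutoff = now - max_age
    keep = [e for e in entries if e["created_at"] >= cutoff]
    expired = [e for e in entries if e["created_at"] < cutoff]
    summary = {
        "archived_count": len(expired),
        "by_tool": dict(Counter(e["tool_name"] for e in expired)),
        "by_status": dict(Counter(e["status"] for e in expired)),
    }
    return keep, summary


# Hypothetical entries; a fixed "now" keeps the example deterministic.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
entries = [
    {"tool_name": "search", "status": "success", "created_at": now - timedelta(days=20)},
    {"tool_name": "search", "status": "error", "created_at": now - timedelta(days=10)},
    {"tool_name": "fetch", "status": "success", "created_at": now - timedelta(days=1)},
]
keep, summary = rotate_logs(entries, now, max_age=timedelta(days=7))
print(summary)
```

The summary preserves per-tool and per-status counts, so long-term reliability trends survive even after the detailed entries are gone.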

Integration with Other Memory Types

Tool log memory integrates with other memory types:

  • Toolbox: Tool definitions in toolbox map to executions in tool log
  • Skills: Successful tool patterns in logs inform skill development
  • Workflow: Tool logs show actual execution vs. planned workflow steps
  • Conversational: Thread-scoped logs provide context for conversation turns
  • Summary: Compress extensive tool logs into high-level activity summaries

Performance Notes

  • SQL storage: Efficient for chronological queries and exact-match lookups
  • No embeddings: Logs don't use vector search; they are accessed by time, thread, or tool name
  • Ordered retrieval: Logs are returned in chronological order for temporal analysis
  • Moderate default k: Returns 10 logs by default; adjust based on your analysis needs
  • Thread isolation: Scoping logs to threads keeps queries focused and fast
  • Index on tool_name: Backend indexes enable fast filtering by tool name
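
To see why an index on `tool_name` helps, you can ask SQLite how it plans to run a filtered query. This sketch uses an in-memory database with a hypothetical `tool_log` table and index name. It illustrates the general technique and does not describe the indexes memharness creates.

```python
import sqlite3

# Hypothetical table and index names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tool_log (id INTEGER PRIMARY KEY, tool_name TEXT, status TEXT)")
conn.execute("CREATE INDEX idx_tool_log_tool_name ON tool_log (tool_name)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, status FROM tool_log WHERE tool_name = ?",
    ("github.create_issue",),
).fetchall()

# The last column of each plan row is a human-readable description of the step.
plan_details = [row[-1] for row in plan]
uses_index = any("idx_tool_log_tool_name" in detail for detail in plan_details)
print(uses_index)
```

If the plan mentions the index, the lookup is a search by key instead of a scan over every log row, which matters as the log grows.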