Tool Log Memory

Tool log memory provides an ordered audit trail of tool executions with status tracking, enabling debugging, analytics, and learning from tool usage patterns.

Overview

Tool log memory records every tool invocation with complete details about inputs, outputs, execution status, and timing. Unlike the toolbox (which stores tool definitions), the tool log captures the actual execution history: a chronological record of which tools were used, when, and with what results.

This memory type excels at:

Execution tracking: Complete audit trail of all tool invocations
Error debugging: Detailed records of failures with error messages
Performance monitoring: Track execution duration and success rates
Usage analytics: Understand which tools are used most frequently
Learning from history: Discover successful tool usage patterns

The ordered nature of tool logs makes them ideal for temporal analysis, debugging sequences of actions, and understanding the chronological flow of agent behavior.

When to Use

Use tool log memory to:

Debug tool failures: Review what went wrong with specific tool executions
Track execution history: Maintain a complete audit trail for compliance or analysis
Monitor performance: Identify slow or unreliable tools
Learn patterns: Discover which tool sequences lead to success
Provide user transparency: Show users what actions the agent took
Analyze tool usage: Generate statistics on tool popularity and reliability

Tool log memory is essential for production agents, where observability and debugging are critical.

Storage Strategy

Backend: SQL (ordered, sequential access)
Default k: 10 results (logs accumulate quickly, so retrieval defaults to a moderate number)
Embeddings: No (logs are accessed chronologically or by exact match)
Ordered: Yes (temporal ordering is fundamental to logs)
Thread isolation: Logs can be scoped to conversation threads
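
The storage properties above can be illustrated with a small SQLite sketch. The table layout, the `recent_logs` helper, and the choice to keep the newest `k` rows are assumptions made for this example, not the schema or query the library actually uses.

```python
import sqlite3

# Hypothetical table layout for illustration; not the library's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tool_log (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- insertion order doubles as log order
        thread_id TEXT,
        tool_name TEXT NOT NULL,
        status TEXT NOT NULL
    )
    """
)

rows = [
    ("chat-123", "github.create_issue", "success"),
    ("chat-456", "web.search", "error"),
    ("chat-123", "web.search", "timeout"),
]
conn.executemany(
    "INSERT INTO tool_log (thread_id, tool_name, status) VALUES (?, ?, ?)", rows
)


def recent_logs(thread_id: str, k: int = 10) -> list[tuple[str, str]]:
    """Return up to k of the newest entries for one thread, oldest first."""
    cursor = conn.execute(
        """
        SELECT tool_name, status FROM (
            SELECT id, tool_name, status FROM tool_log
            WHERE thread_id = ?
            ORDER BY id DESC
            LIMIT ?
        )
        ORDER BY id ASC
        """,
        (thread_id, k),
    )
    return cursor.fetchall()


thread_logs = recent_logs("chat-123")
print(thread_logs)
```

No embedding column is needed here: rows are selected by exact match on `thread_id` and ordered by the autoincrementing key.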

Schema

Each tool log entry includes:

Field	Type	Required	Description
`tool_name`	string	Yes	Name of the executed tool
`status`	string	Yes	Execution status: "success", "error", "timeout"
`input`	object	No	Input parameters passed to the tool
`output`	object	No	Output result from the tool
`duration_ms`	integer	No	Execution duration in milliseconds
`error`	string	No	Error message if status is "error"
`thread_id`	string	No	Conversation thread if applicable

Additional fields are stored in the metadata dictionary.

API Methods

Logging Tool Executions

async def add_tool_log(
    thread_id: str,
    tool_name: str,
    args: dict[str, Any],
    result: str,
    status: str,
) -> str:
    """
    Log a tool execution.

    Args:
        thread_id: The conversation thread ID
        tool_name: Name of the executed tool
        args: Arguments passed to the tool
        result: Result or output from the tool
        status: Execution status ("success", "error", "timeout")

    Returns:
        The ID of the created log entry
    """

async def log_tool_execution(
    tool_name: str,
    input_params: dict[str, Any],
    output_result: dict[str, Any] | None = None,
    success: bool = True,
    duration_ms: int | None = None,
    error: str | None = None,
) -> str:
    """
    Log a tool execution with detailed parameters.

    Args:
        tool_name: Name of the tool executed
        input_params: Input parameters passed to the tool
        output_result: Output result from the tool
        success: Whether the execution was successful
        duration_ms: Duration in milliseconds
        error: Error message if execution failed

    Returns:
        The ID of the created log entry
    """

Retrieving Logs

async def get_tool_log(
    thread_id: str,
    limit: int = 20,
) -> list[MemoryUnit]:
    """
    Retrieve tool execution log for a thread.

    Args:
        thread_id: The conversation thread ID
        limit: Maximum number of log entries to retrieve

    Returns:
        List of tool log MemoryUnit objects, ordered from oldest to newest
    """

async def search_tool_logs(
    query: str,
    k: int = 10,
) -> list[MemoryUnit]:
    """
    Search tool execution logs.

    Args:
        query: Search query (tool name or partial match)
        k: Number of results to return

    Returns:
        List of matching tool log memory units
    """

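To experiment with these signatures without a database, an in-memory stand-in can be useful, for example in unit tests. The `InMemoryToolLog` class below is hypothetical: it returns plain dicts instead of `MemoryUnit` objects, and it assumes that `limit` keeps the newest entries and that search is a case-insensitive partial match on the tool name.

```python
import asyncio
import uuid
from typing import Any


class InMemoryToolLog:
    """Hypothetical stand-in that mirrors the documented signatures in memory."""

    def __init__(self) -> None:
        self._entries: list[dict[str, Any]] = []  # kept in insertion order

    async def add_tool_log(
        self, thread_id: str, tool_name: str, args: dict[str, Any], result: str, status: str
    ) -> str:
        entry = {
            "id": str(uuid.uuid4()),
            "thread_id": thread_id,
            "tool_name": tool_name,
            "args": args,
            "result": result,
            "status": status,
        }
        self._entries.append(entry)
        return entry["id"]

    async def get_tool_log(self, thread_id: str, limit: int = 20) -> list[dict[str, Any]]:
        # Assumption: `limit` keeps the newest entries, returned oldest first.
        matching = [e for e in self._entries if e["thread_id"] == thread_id]
        return matching[-limit:] if limit > 0 else []

    async def search_tool_logs(self, query: str, k: int = 10) -> list[dict[str, Any]]:
        # Case-insensitive partial match on the tool name; "" matches everything.
        needle = query.lower()
        matching = [e for e in self._entries if needle in e["tool_name"].lower()]
        return matching[:k]


async def demo() -> tuple[list[str], list[str]]:
    log = InMemoryToolLog()
    await log.add_tool_log("chat-123", "github.create_issue", {"title": "Bug"}, "ok", "success")
    await log.add_tool_log("chat-123", "web.search", {"q": "docs"}, "timed out", "timeout")
    await log.add_tool_log("chat-456", "github.close_issue", {"number": 7}, "ok", "success")
    thread = await log.get_tool_log("chat-123", limit=20)
    hits = await log.search_tool_logs("github", k=10)
    return [e["tool_name"] for e in thread], [e["tool_name"] for e in hits]


thread_tools, github_tools = asyncio.run(demo())
print(thread_tools, github_tools)
```
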
Examples

Basic Tool Execution Logging

from memharness import MemoryHarness
import time

harness = MemoryHarness(backend="sqlite:///memory.db")

# Execute a tool and log it
start_time = time.time()

try:
    result = await github_create_issue(
        title="Bug fix",
        body="Fixed the authentication issue"
    )

    duration_ms = int((time.time() - start_time) * 1000)

    await harness.log_tool_execution(
        tool_name="github.create_issue",
        input_params={
            "title": "Bug fix",
            "body": "Fixed the authentication issue"
        },
        output_result={"issue_number": result.number},
        success=True,
        duration_ms=duration_ms
    )

except Exception as e:
    duration_ms = int((time.time() - start_time) * 1000)

    await harness.log_tool_execution(
        tool_name="github.create_issue",
        input_params={
            "title": "Bug fix",
            "body": "Fixed the authentication issue"
        },
        success=False,
        duration_ms=duration_ms,
        error=str(e)
    )
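
The timing logic in the example above is repeated in both branches. One way to factor it out is a small context manager; the `stopwatch` helper below is a hypothetical sketch, and it uses `time.perf_counter()`, a monotonic clock that is generally better suited to measuring intervals than `time.time()`.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch():
    """Yield a dict whose 'duration_ms' is filled in when the block exits."""
    timing = {"duration_ms": None}
    start = time.perf_counter()
    try:
        yield timing
    finally:
        # Runs on success and on failure, so a duration is always recorded.
        timing["duration_ms"] = int((time.perf_counter() - start) * 1000)


with stopwatch() as timing:
    time.sleep(0.01)  # stand-in for the actual tool call

print(f"Tool call took {timing['duration_ms']} ms")
```

The resulting `timing["duration_ms"]` value can then be passed as `duration_ms` when logging either the success or the failure case.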

Debugging Tool Failures

# Retrieve recent tool logs for a conversation thread
logs = await harness.get_tool_log("chat-123", limit=20)

# Find failed executions
failures = [log for log in logs if log.metadata.get("status") in ("error", "timeout")]

for failure in failures:
    print(f"Tool: {failure.metadata['tool_name']}")
    print(f"Error: {failure.metadata.get('error', 'Unknown')}")
    print(f"Input: {failure.metadata.get('input', {})}")
    print(f"Time: {failure.created_at}")
    print("---")
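
When a thread contains many failures, grouping them by tool makes recurring problems easier to spot. The sketch below uses `SimpleNamespace` objects as stand-ins for `MemoryUnit` (only the `metadata` attribute is modelled), and the `group_failures` helper is hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace

# Stand-ins for MemoryUnit objects; only the `metadata` attribute is modelled.
logs = [
    SimpleNamespace(metadata={"tool_name": "web.search", "status": "timeout"}),
    SimpleNamespace(
        metadata={"tool_name": "github.create_issue", "status": "error", "error": "401 Unauthorized"}
    ),
    SimpleNamespace(metadata={"tool_name": "web.search", "status": "success"}),
    SimpleNamespace(
        metadata={"tool_name": "github.create_issue", "status": "error", "error": "401 Unauthorized"}
    ),
]


def group_failures(entries) -> dict[str, list[str]]:
    """Collect failure messages per tool, treating timeouts as failures."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for log in entries:
        status = log.metadata.get("status")
        if status in ("error", "timeout"):
            # Fall back to the status when no error message was recorded.
            message = log.metadata.get("error") or status
            grouped[log.metadata.get("tool_name", "unknown")].append(message)
    return dict(grouped)


failures_by_tool = group_failures(logs)
for tool, messages in failures_by_tool.items():
    print(f"{tool}: {len(messages)} failure(s), latest: {messages[-1]}")
```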

Tool Usage Analytics

async def analyze_tool_usage(harness: MemoryHarness):
    """Generate statistics on tool usage patterns."""

    # Fetch logs in bulk; this assumes an empty query matches every entry (adjust k as needed)
    logs = await harness.search_tool_logs("", k=1000)

    # Compute statistics
    tool_counts = {}
    tool_failures = {}
    tool_durations = {}

    for log in logs:
        tool_name = log.metadata.get("tool_name", "unknown")
        status = log.metadata.get("status", "unknown")
        duration = log.metadata.get("duration_ms") or 0  # duration_ms is optional and may be None

        # Count executions
        tool_counts[tool_name] = tool_counts.get(tool_name, 0) + 1

        # Count failures (timeouts are failed executions too)
        if status in ("error", "timeout"):
            tool_failures[tool_name] = tool_failures.get(tool_name, 0) + 1

        # Track durations
        if tool_name not in tool_durations:
            tool_durations[tool_name] = []
        if duration > 0:
            tool_durations[tool_name].append(duration)

    # Print report
    print("Tool Usage Report")
    print("=" * 50)

    for tool_name in sorted(tool_counts.keys()):
        total = tool_counts[tool_name]
        failures = tool_failures.get(tool_name, 0)
        success_rate = ((total - failures) / total * 100) if total > 0 else 0

        avg_duration = 0
        if tool_name in tool_durations and tool_durations[tool_name]:
            avg_duration = sum(tool_durations[tool_name]) / len(tool_durations[tool_name])

        print(f"{tool_name}:")
        print(f"  Executions: {total}")
        print(f"  Success rate: {success_rate:.1f}%")
        print(f"  Avg duration: {avg_duration:.0f}ms")
        print()

# Usage
await analyze_tool_usage(harness)
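
Average duration can hide slow outliers, so percentiles are often a useful complement when looking for slow tools. The sketch below uses the nearest-rank method on made-up durations; the `percentile` helper and the sample values are illustrative assumptions.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical duration_ms values for one tool, including one slow outlier.
durations_ms = [120, 95, 110, 2300, 105, 98, 101, 99, 130, 97]

mean = sum(durations_ms) / len(durations_ms)
p50 = percentile(durations_ms, 50)
p95 = percentile(durations_ms, 95)
print(f"mean={mean:.1f}ms p50={p50}ms p95={p95}ms")
```

Here a single slow call pulls the mean far above the median, which is the kind of gap that suggests an intermittent performance problem rather than a uniformly slow tool.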

Automatic Tool Wrapper

import time
from functools import wraps
from typing import Callable, Any

from memharness import MemoryHarness

def logged_tool(tool_name: str, harness: MemoryHarness) -> Callable:
    """Decorator that automatically logs tool executions."""

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            start_time = time.time()

            try:
                result = await func(*args, **kwargs)
                duration_ms = int((time.time() - start_time) * 1000)

                await harness.log_tool_execution(
                    tool_name=tool_name,
                    input_params=kwargs,
                    output_result={"result": str(result)[:500]},  # Truncate long results
                    success=True,
                    duration_ms=duration_ms
                )

                return result

            except Exception as e:
                duration_ms = int((time.time() - start_time) * 1000)

                await harness.log_tool_execution(
                    tool_name=tool_name,
                    input_params=kwargs,
                    success=False,
                    duration_ms=duration_ms,
                    error=str(e)
                )

                raise

        return wrapper
    return decorator

# Usage
@logged_tool("github.create_issue", harness)
async def create_github_issue(title: str, body: str):
    # Tool implementation
    return await github_api.create_issue(title=title, body=body)
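
The wrapper above logs only `kwargs`, so arguments passed positionally would not appear in the log. One possible remedy is to bind the call to the function signature with the standard library's `inspect` module. The `bound_arguments` helper and the demo tool below are hypothetical names used for illustration.

```python
import inspect
from typing import Any, Callable


def bound_arguments(func: Callable[..., Any], *args: Any, **kwargs: Any) -> dict[str, Any]:
    """Map positional and keyword arguments to parameter names, including defaults."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    return dict(bound.arguments)


async def create_github_issue(title: str, body: str = "", labels: tuple[str, ...] = ()) -> None:
    """Hypothetical tool used only to demonstrate argument binding."""


params = bound_arguments(create_github_issue, "Bug fix", body="Fixed the authentication issue")
print(params)
```

Inside the decorator, the result of `bound_arguments(func, *args, **kwargs)` could be passed as `input_params` in place of `kwargs`.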

Best Practices

Log every tool execution: Don't log selectively; capture every invocation for complete observability
Include timing information: Always record duration_ms for performance analysis
Truncate large outputs: Store summaries of large results rather than full content to avoid bloat
Use consistent tool names: Match tool names with those in toolbox memory for cross-referencing
Set reasonable limits: When retrieving logs, use appropriate limits to avoid loading thousands of entries
Implement log rotation: For long-running agents, periodically archive or summarize old logs
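
As an illustration of the truncation advice above, a helper can cap the stored output and record that it was shortened. The `truncate_output` function, its 500-character default, and the returned keys are assumptions for this sketch, not library behavior.

```python
import json
from typing import Any


def truncate_output(output: Any, max_chars: int = 500) -> dict[str, Any]:
    """Return a log-friendly dict holding at most max_chars of the serialized output."""
    # default=str keeps serialization from failing on non-JSON types such as datetimes.
    text = json.dumps(output, default=str)
    if len(text) <= max_chars:
        return {"result": text, "truncated": False}
    return {
        "result": text[:max_chars],
        "truncated": True,
        "original_length": len(text),  # lets readers see how much was dropped
    }


small = truncate_output({"issue_number": 42})
large = truncate_output({"body": "x" * 2000}, max_chars=100)
print(small)
print(large["truncated"], large["original_length"])
```

The returned dict can be passed as `output_result` when logging, so large results stay bounded while the log still shows that content was cut.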

Integration with Other Memory Types

Tool log memory integrates with other memory types:

Toolbox: Tool definitions in toolbox map to executions in tool log
Skills: Successful tool patterns in logs inform skill development
Workflow: Tool logs show actual execution vs. planned workflow steps
Conversational: Thread-scoped logs provide context for conversation turns
Summary: Compress extensive tool logs into high-level activity summaries
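
As one example of how logs might inform skill development, consecutive successful tool calls can be counted to surface recurring sequences. The data and the `successful_pairs` helper below are hypothetical; a real analysis would read `tool_name` and `status` from retrieved log entries.

```python
from collections import Counter

# Hypothetical per-thread log: (tool_name, status) in chronological order.
thread_log = [
    ("web.search", "success"),
    ("web.fetch", "success"),
    ("github.create_issue", "error"),
    ("web.search", "success"),
    ("web.fetch", "success"),
    ("github.create_issue", "success"),
]


def successful_pairs(entries: list[tuple[str, str]]) -> Counter:
    """Count adjacent tool pairs in which both executions succeeded."""
    pairs: Counter = Counter()
    for (first, first_status), (second, second_status) in zip(entries, entries[1:]):
        if first_status == "success" and second_status == "success":
            pairs[(first, second)] += 1
    return pairs


pair_counts = successful_pairs(thread_log)
for (first, second), count in pair_counts.most_common():
    print(f"{first} -> {second}: {count}")
```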

Performance Notes

SQL storage: Efficient for chronological queries and exact-match lookups
No embeddings: Logs don't use vector search - accessed by time, thread, or tool name
Ordered retrieval: Logs are returned in chronological order for temporal analysis
Moderate default k: Returns 10 logs by default - adjust based on your analysis needs
Thread isolation: Scoping logs to threads keeps queries focused and fast
Index on tool_name: Backend indexes enable fast filtering by tool name
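
To see what an index on `tool_name` buys in practice, SQLite's `EXPLAIN QUERY PLAN` can confirm whether a filter uses it. The table and index names here are assumptions for the sketch; the actual backend schema may differ.

```python
import sqlite3

# Hypothetical table and index names; the real backend schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tool_log ("
    "id INTEGER PRIMARY KEY, thread_id TEXT, tool_name TEXT, status TEXT)"
)
conn.execute("CREATE INDEX idx_tool_log_tool_name ON tool_log (tool_name)")

# Ask SQLite how it would execute an exact-match filter on tool_name.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, status FROM tool_log WHERE tool_name = ?",
    ("web.search",),
).fetchall()

# The last column of each plan row is the human-readable detail string.
plan_text = " ".join(str(row[-1]) for row in plan_rows)
uses_index = "idx_tool_log_tool_name" in plan_text
print(plan_text)
```

If the index name is missing from the plan, the query is falling back to a full table scan, which becomes slow as logs accumulate.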
