$timemachine_
AI Agent Observability

Time Machine: Debug the past. Fork the future.

Capture agent executions. Replay from any step. Compare diffs visually.

FORK & REPLAY

Time Travel for AI Debugging

Fork any execution at any step. Modify the input. Replay from that point forward. No need to re-run the entire agent from scratch.

Click "Fork from here" on any step in the execution graph
Edit inputs in a real-time validated JSON editor
Watch replay progress with live cost tracking
View execution lineage showing all forked branches
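Conceptually, fork-and-replay reuses the recorded outputs of every step before the fork point and only re-executes from there. A minimal sketch of that idea in TypeScript (the Step shape and replayFrom helper here are illustrative, not the actual Time Machine SDK API):

```typescript
// Illustrative step record; field names are assumptions, not the SDK schema.
interface Step {
  id: number;
  kind: "llm" | "tool" | "decision";
  input: unknown;
  output?: unknown;
}

// Fork at forkIndex: keep the shared prefix as recorded,
// swap in the edited input, and re-execute everything downstream.
function replayFrom(
  steps: Step[],
  forkIndex: number,
  newInput: unknown,
  execute: (step: Step) => unknown
): Step[] {
  return steps.map((step, i) => {
    if (i < forkIndex) return step; // shared prefix: recorded output reused, no re-run
    const input = i === forkIndex ? newInput : step.input;
    const patched = { ...step, input };
    return { ...patched, output: execute(patched) }; // re-executed suffix
  });
}
```

The point of the shared prefix is cost: only the steps from the fork onward spend new tokens.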
execution_7f3a · 5 steps

Original:
1. LLM Call · gpt-4o · 1,234 tok
2. Tool: search_db · 45ms · query users
3. Route Decision · fork point · confidence: 0.72
4. LLM Call · 892 tok · generate response
5. Final Output · $0.03 · sent to user

Forked (steps 1-2 shared):
3. Route Decision · modified · confidence: 0.95
4. LLM Call · 1,102 tok · new prompt
5. New Output · $0.02 · improved result

Replaying from step 3... 0/3 steps
STEP-BY-STEP TRACING

Complete Execution Visibility

Every action your agent takes is captured — inputs, outputs, LLM prompts, tool calls, state snapshots, and cost breakdowns. Nothing hidden.

Full state snapshot at each execution step
Token usage and cost breakdown per step
Supports LLM calls, tool use, decisions, retrievals
Query all execution data via PostgreSQL
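A captured step can be modeled as a record of kind, input, output, and cost metrics, and per-execution totals are then just a fold over those records. A sketch with assumed field names (not the documented schema):

```typescript
// Assumed shape of a captured execution step, for illustration only.
interface TraceStep {
  index: number;
  kind: "llm_call" | "tool_use" | "decision" | "retrieval";
  input: unknown;
  output: unknown;
  tokens?: number;   // LLM steps only
  costUsd?: number;  // LLM steps only
  latencyMs?: number;
}

// Roll per-step metrics up into an execution-level summary.
function summarize(steps: TraceStep[]) {
  return steps.reduce(
    (acc, s) => ({
      tokens: acc.tokens + (s.tokens ?? 0),
      costUsd: +(acc.costUsd + (s.costUsd ?? 0)).toFixed(4),
      latencyMs: acc.latencyMs + (s.latencyMs ?? 0),
    }),
    { tokens: 0, costUsd: 0, latencyMs: 0 }
  );
}
```

Because the data lands in PostgreSQL, the same roll-up can equally be done with a `GROUP BY` over the steps table.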
Execution Steps · completed

1. LLM Call · gpt-4o · 1,234 tok
   Input: Analyze user request...
   Output: The user wants to...
   Cost: $0.02 · Latency: 320ms
2. Tool Use · search_db
   Input: { "query": "..." }
   Output: { "results": [...] }
3. Decision
   Input: context_state
   Output: proceed_with_action
VISUAL DIFF

Compare Models Side-by-Side

Run the same prompt through different models simultaneously. See outputs side-by-side with diff highlighting, metrics comparison, and cost analysis.

Dual-pane comparison: GPT-4o, Claude, Gemini, and more
Diff view highlights added and removed text
Token usage, latency, and cost metrics per model
Save and export comparison reports
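At its core, the diff view is a word-level comparison between two model outputs. A minimal LCS-based sketch of that comparison (illustrative only, not the product's actual diff implementation):

```typescript
// Each word is emitted as kept, removed (left only), or added (right only).
type DiffOp = { op: "same" | "removed" | "added"; word: string };

function wordDiff(a: string, b: string): DiffOp[] {
  const A = a.split(/\s+/), B = b.split(/\s+/);
  const m = A.length, n = B.length;
  // Longest-common-subsequence lengths for every suffix pair.
  const lcs: number[][] = Array.from({ length: m + 1 }, () =>
    new Array(n + 1).fill(0)
  );
  for (let i = m - 1; i >= 0; i--)
    for (let j = n - 1; j >= 0; j--)
      lcs[i][j] =
        A[i] === B[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
  // Walk the table, emitting ops in order.
  const ops: DiffOp[] = [];
  let i = 0, j = 0;
  while (i < m && j < n) {
    if (A[i] === B[j]) { ops.push({ op: "same", word: A[i] }); i++; j++; }
    else if (lcs[i + 1][j] >= lcs[i][j + 1]) { ops.push({ op: "removed", word: A[i] }); i++; }
    else { ops.push({ op: "added", word: B[j] }); j++; }
  }
  while (i < m) ops.push({ op: "removed", word: A[i++] });
  while (j < n) ops.push({ op: "added", word: B[j++] });
  return ops;
}
```

On the example above, "increased" / "15%" come back as removed and "grew" / "18%" as added, which is exactly what the highlighting renders.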
Model Comparison · Diff View: On

GPT-4o · $0.024
"The analysis shows that revenue increased by 15% compared to last quarter."
892 tokens · 1.2s

Claude 3.5 · $0.018
"The analysis shows that revenue grew by 18% compared to last quarter."
756 tokens · 0.9s

Legend: removed / added
REVIEW QUEUE

Feedback Loop That Closes

Human reviewers mark outputs as correct or wrong. Developers get automated debug packages. Replay & Validate confirms fixes actually work.

Pending → Wrong → Resolved workflow
One-click debug package generation for developers
Replay with automatic validation (pass/fail)
Keyboard shortcuts for rapid review (C/W)
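The Pending → Wrong → Resolved workflow is a small state machine: reviewers move an execution out of pending, and only a passing replay moves a wrong one to resolved. A sketch with assumed state and action names (not the actual API):

```typescript
// Assumed states and actions for the review workflow, for illustration.
type ReviewState = "pending" | "correct" | "wrong" | "resolved";
type Action = "mark_correct" | "mark_wrong" | "validate_fix";

const transitions: Record<ReviewState, Partial<Record<Action, ReviewState>>> = {
  pending: { mark_correct: "correct", mark_wrong: "wrong" },
  wrong: { validate_fix: "resolved" }, // replay passes → resolved
  correct: {},  // terminal
  resolved: {}, // terminal
};

function applyAction(state: ReviewState, action: Action): ReviewState {
  const next = transitions[state][action];
  if (!next) throw new Error(`invalid action ${action} in state ${state}`);
  return next;
}
```

Encoding the workflow this way makes illegal moves (e.g. validating an execution that was never marked wrong) fail loudly instead of silently corrupting the queue.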
Review Queue · shortcuts: C = correct, W = wrong

exec_001 · 2m ago
exec_002 · 5m ago

Ready to debug smarter?

Stop guessing why your agents fail. Start seeing exactly what happened, step by step.

Free to start · TypeScript SDK · Self-hosted option