$timemachine_
AI Agent Observability

Time Machine: Debug the past. Fork the future.

Capture agent executions. Replay from any step. Compare diffs visually.

FORK & REPLAY

Time Travel for AI Debugging

Fork any execution at any step. Modify the input. Replay from that point forward. No need to re-run the entire agent from scratch.

Click "Fork from here" on any step in the execution graph
Edit inputs in a real-time validated JSON editor
Watch replay progress with live cost tracking
View execution lineage showing all forked branches
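Conceptually, fork-and-replay reuses the recorded outputs of every step before the fork point and only re-executes from there. A minimal sketch of that idea in TypeScript (the Step shape and replayFrom helper here are illustrative, not the actual Time Machine SDK API):

```typescript
// Illustrative step record; field names are assumptions, not the SDK schema.
interface Step {
  id: number;
  kind: "llm" | "tool" | "decision";
  input: unknown;
  output?: unknown;
}

// Fork at forkIndex: keep the shared prefix as recorded,
// swap in the edited input, and re-execute everything downstream.
function replayFrom(
  steps: Step[],
  forkIndex: number,
  newInput: unknown,
  execute: (step: Step) => unknown
): Step[] {
  return steps.map((step, i) => {
    if (i < forkIndex) return step; // shared prefix: recorded output reused, no re-run
    const input = i === forkIndex ? newInput : step.input;
    const patched = { ...step, input };
    return { ...patched, output: execute(patched) }; // re-executed suffix
  });
}
```

The point of the shared prefix is cost: only the steps from the fork onward spend new tokens.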
execution_7f3a · 5 steps

Original:
1. LLM Call · gpt-4o · 1,234 tok
2. Tool: search_db · 45ms · query users
3. Route Decision · fork point · confidence: 0.72
4. LLM Call · 892 tok · generate response
5. Final Output · $0.03 · sent to user

Forked (steps 1-2 shared):
3. Route Decision · modified · confidence: 0.95
4. LLM Call · 1,102 tok · new prompt
5. New Output · $0.02 · improved result

Replaying from step 3... 0/3 steps
STEP-BY-STEP TRACING

Complete Execution Visibility

Every action your agent takes is captured — inputs, outputs, LLM prompts, tool calls, state snapshots, and cost breakdowns. Nothing hidden.

Full state snapshot at each execution step
Token usage and cost breakdown per step
Supports LLM calls, tool use, decisions, retrievals
Query all execution data via PostgreSQL
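A captured step can be modeled as a record of kind, input, output, and cost metrics, and per-execution totals are then just a fold over those records. A sketch with assumed field names (not the documented schema):

```typescript
// Assumed shape of a captured execution step, for illustration only.
interface TraceStep {
  index: number;
  kind: "llm_call" | "tool_use" | "decision" | "retrieval";
  input: unknown;
  output: unknown;
  tokens?: number;   // LLM steps only
  costUsd?: number;  // LLM steps only
  latencyMs?: number;
}

// Roll per-step metrics up into an execution-level summary.
function summarize(steps: TraceStep[]) {
  return steps.reduce(
    (acc, s) => ({
      tokens: acc.tokens + (s.tokens ?? 0),
      costUsd: +(acc.costUsd + (s.costUsd ?? 0)).toFixed(4),
      latencyMs: acc.latencyMs + (s.latencyMs ?? 0),
    }),
    { tokens: 0, costUsd: 0, latencyMs: 0 }
  );
}
```

Because the data lands in PostgreSQL, the same roll-up can equally be done with a `GROUP BY` over the steps table.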
Execution Steps · completed

1. LLM Call · gpt-4o · 1,234 tok
   Input: Analyze user request...
   Output: The user wants to...
   Cost: $0.02 · Latency: 320ms
2. Tool Use · search_db
   Input: { "query": "..." }
   Output: { "results": [...] }
3. Decision
   Input: context_state
   Output: proceed_with_action
VISUAL DIFF

Compare Models Side-by-Side

Run the same prompt through different models simultaneously. See outputs side-by-side with diff highlighting, metrics comparison, and cost analysis.

Dual-pane comparison: GPT-4o, Claude, Gemini, and more
Diff view highlights added and removed text
Token usage, latency, and cost metrics per model
Save and export comparison reports
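At its core, the diff view is a word-level comparison between two model outputs. A minimal LCS-based sketch of that comparison (illustrative only, not the product's actual diff implementation):

```typescript
// Each word is emitted as kept, removed (left only), or added (right only).
type DiffOp = { op: "same" | "removed" | "added"; word: string };

function wordDiff(a: string, b: string): DiffOp[] {
  const A = a.split(/\s+/), B = b.split(/\s+/);
  const m = A.length, n = B.length;
  // Longest-common-subsequence lengths for every suffix pair.
  const lcs: number[][] = Array.from({ length: m + 1 }, () =>
    new Array(n + 1).fill(0)
  );
  for (let i = m - 1; i >= 0; i--)
    for (let j = n - 1; j >= 0; j--)
      lcs[i][j] =
        A[i] === B[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
  // Walk the table, emitting ops in order.
  const ops: DiffOp[] = [];
  let i = 0, j = 0;
  while (i < m && j < n) {
    if (A[i] === B[j]) { ops.push({ op: "same", word: A[i] }); i++; j++; }
    else if (lcs[i + 1][j] >= lcs[i][j + 1]) { ops.push({ op: "removed", word: A[i] }); i++; }
    else { ops.push({ op: "added", word: B[j] }); j++; }
  }
  while (i < m) ops.push({ op: "removed", word: A[i++] });
  while (j < n) ops.push({ op: "added", word: B[j++] });
  return ops;
}
```

On the example above, "increased" / "15%" come back as removed and "grew" / "18%" as added, which is exactly what the highlighting renders.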
Model Comparison · Diff View: On

GPT-4o · $0.024
"The analysis shows that revenue increased by 15% compared to last quarter."
892 tokens · 1.2s

Claude 3.5 · $0.018
"The analysis shows that revenue grew by 18% compared to last quarter."
756 tokens · 0.9s

Legend: removed / added
REVIEW QUEUE

Feedback Loop That Closes

Human reviewers mark outputs as correct or wrong. Developers get automated debug packages. Replay & Validate confirms fixes actually work.

Pending → Wrong → Resolved workflow
One-click debug package generation for developers
Replay with automatic validation (pass/fail)
Keyboard shortcuts for rapid review (C/W)
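The Pending → Wrong → Resolved workflow is a small state machine: reviewers move an execution out of pending, and only a passing replay moves a wrong one to resolved. A sketch with assumed state and action names (not the actual API):

```typescript
// Assumed states and actions for the review workflow, for illustration.
type ReviewState = "pending" | "correct" | "wrong" | "resolved";
type Action = "mark_correct" | "mark_wrong" | "validate_fix";

const transitions: Record<ReviewState, Partial<Record<Action, ReviewState>>> = {
  pending: { mark_correct: "correct", mark_wrong: "wrong" },
  wrong: { validate_fix: "resolved" }, // replay passes → resolved
  correct: {},  // terminal
  resolved: {}, // terminal
};

function applyAction(state: ReviewState, action: Action): ReviewState {
  const next = transitions[state][action];
  if (!next) throw new Error(`invalid action ${action} in state ${state}`);
  return next;
}
```

Encoding the workflow this way makes illegal moves (e.g. validating an execution that was never marked wrong) fail loudly instead of silently corrupting the queue.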
Review Queue · shortcuts: C = correct, W = wrong

exec_001 · 2m ago
exec_002 · 5m ago

Ready to debug smarter?

Stop guessing why your agents fail. Start seeing exactly what happened, step by step.

Free to start · TypeScript SDK · Self-hosted option