Time Machine
Debug the past. Fork the future.
Capture every agent step. Fork from any point. Replay with one click.
From first install to your first debugged agent — in minutes, not hours.
Install the SDK or connect Claude Code hooks. Every agent step — LLM calls, tool uses, decisions — is automatically recorded.
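The capture flow described above can be sketched as a simple decorator. Note that `record_step` and the in-memory `TRACE` store are illustrative stand-ins, not the actual SDK API:

```python
import functools
import time

TRACE = []  # in-memory stand-in for Time Machine's step store

def record_step(func):
    """Capture one agent step: name, inputs, output, and duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        TRACE.append({
            "step": func.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
            "duration_s": round(time.time() - start, 4),
        })
        return result
    return wrapper

@record_step
def search_tool(query):
    # stand-in for a real tool call
    return {"results": [f"doc about {query}"]}

search_tool("quarterly revenue")
print(TRACE[0]["step"])  # search_tool
```

In practice the SDK would persist each step to the dashboard rather than a local list; the point is that every input and output is recorded at the call boundary, with no changes inside the tool itself.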
Visualize every execution as an interactive timeline. Click any step to inspect inputs, outputs, tokens, and costs. Watch it unfold in real time.
Found the bug? Fork from that exact step. Change one variable, replay the rest. Compare original vs. fixed side-by-side — at a fraction of the cost.
Connect Time Machine to Claude Code with a single command. Every prompt, tool call, and file edit is recorded — no code changes needed.
One command installs hooks into Claude Code
Prompts, tool calls, file edits, errors
See when Claude spawns subagents and what they do
Token counts and costs per session, per step
Ask Claude Code to pull a failed run and inspect the trace. The debugging loop stays where the development loop lives.
Fork any execution at any step. Modify the input. Only the steps after the fork point are re-executed — prior steps are reused instantly. No wasted compute, no waiting for the whole pipeline to run again.
Watch your agent execute like a video. Click anywhere on the timeline to jump to that moment — see exactly which files were read, what edits were made, and why the agent took each action.
Define test suites from real production inputs. Assert on outputs. Gate deployments on passing scores. Every eval run is a replay — powered by the same fork & replay engine.
Automated Pipeline
10 Assertion Types
contains, regex, llm_judge, cost_under, latency_under, json_valid, and more.
CI/CD Quality Gates
Block merges on eval regressions. GitHub Actions integration in 5 minutes.
LLM-as-Judge
Grade subjective output quality with a rubric. Quantify what "good" means.
Cost & Latency Guards
Assert that every run stays under budget and within latency SLAs.
Save from Production
Click any dashboard execution → "Save as eval case". Real data, zero authoring.
Fork & Replay Powered
Each eval run replays your agent via fork — same infra, deterministic results.
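A few of the assertion types named above (contains, regex, json_valid, cost_under, latency_under) can be sketched as plain checks. The tuple format and `check` helper are assumptions for illustration, not the eval suite's real schema:

```python
import json
import re

def check(assertion, output, metrics):
    """Evaluate one assertion against an agent's output and run metrics."""
    kind, expected = assertion
    if kind == "contains":
        return expected in output
    if kind == "regex":
        return re.search(expected, output) is not None
    if kind == "json_valid":
        try:
            json.loads(output)
            return True
        except ValueError:
            return False
    if kind == "cost_under":
        return metrics["cost_usd"] < expected
    if kind == "latency_under":
        return metrics["latency_ms"] < expected
    raise ValueError(f"unknown assertion: {kind}")

output = '{"revenue_growth": "15%"}'
metrics = {"cost_usd": 0.004, "latency_ms": 850}
suite = [("contains", "revenue"), ("regex", r"\d+%"),
         ("json_valid", None), ("cost_under", 0.01), ("latency_under", 2000)]
results = [check(a, output, metrics) for a in suite]
print(all(results))  # True when every gate passes
```

A CI quality gate is then just "fail the build unless `all(results)` holds" across every eval case in the suite.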
Every action your agent takes is captured — inputs, outputs, LLM prompts, tool calls, state snapshots, and cost breakdowns. Nothing hidden.
Gantt chart and trace tree views show timing, dependencies, and bottlenecks across your entire execution. Spot slow steps instantly — find the 200ms tool call hiding behind a 2s LLM call.
Run the same prompt through different models simultaneously. See outputs side-by-side with diff highlighting, metrics comparison, and cost analysis.
Output A: "The analysis shows that revenue increased by 15% compared to last quarter."
Output B: "The analysis shows that revenue grew by 18% compared to last quarter."
Automatically detect when agent outputs change for the same inputs. Pinpoint whether drift comes from data, model, or prompt changes — before your users notice.
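Drift detection of this kind can be approximated with a similarity check over outputs captured for the same input. This standalone sketch uses Python's difflib; the 0.95 threshold is an arbitrary assumption, not the product's default:

```python
import difflib

def detect_drift(baseline, current, threshold=0.95):
    """Flag when the output for the same input diverges from its baseline.
    Similarity below the threshold counts as drift."""
    ratio = difflib.SequenceMatcher(None, baseline, current).ratio()
    # word-level diff of what changed, for the debug report
    changes = [line for line in difflib.ndiff(baseline.split(), current.split())
               if line.startswith(("- ", "+ "))]
    return ratio < threshold, ratio, changes

baseline = "The analysis shows that revenue increased by 15% compared to last quarter."
current = "The analysis shows that revenue grew by 18% compared to last quarter."
drifted, score, changes = detect_drift(baseline, current)
print(drifted, changes)
```

Attributing the drift to data, model, or prompt changes then means re-running the comparison while holding two of the three fixed, which is exactly what fork and replay makes cheap.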
Human reviewers mark outputs as correct or wrong. Developers get automated debug packages. Replay & Validate confirms fixes actually work.
Most tools show you what happened. Time Machine lets you change what happened.
Start capturing your agent executions in under 2 minutes. Free to get started.