@timemachine-sdk/sdk
AI Agent Observability SDK. Capture every execution step, fork from any point, replay with modifications, and compare results — with zero impact on your agent's performance.
Platform Features
Fork & Replay
Fork any execution at any step and replay from that point forward. Only the steps after the fork point are re-executed.
- Fork from any step in the execution graph
- Modify inputs, prompts, or tool configurations
- Replay only from the fork point (not from scratch)
- Compare original vs forked execution side-by-side
Step-by-Step Tracking
Every action your agent takes is captured with full context — inputs, outputs, state snapshots, token usage, and costs.
- LLM calls, tool use, decisions, retrievals
- Full state snapshot at each step
- Token usage and cost per step
- Latency tracking and performance metrics
Visual Diff & Model Comparison
Run the same prompt through different models simultaneously. See outputs side-by-side with diff highlighting.
- Dual-pane comparison across 8+ models
- Word-level diff highlighting (added/removed)
- Token, latency, and cost metrics per model
Review Queue
Human-in-the-loop feedback workflow. Reviewers mark outputs as correct or wrong, and developers get debug packages.
- Three-phase workflow: Pending → Wrong → Resolved
- One-click debug package generation
- Batch replay & validate
Data Drift Detection
Detect when agent outputs change for the same inputs over time. Variable analysis pinpoints root causes.
- Auto-detect output drift across executions
- Variable-by-variable root cause analysis
- Visual divergence timeline
Execution Timeline
Interactive Gantt chart visualization. Spot bottlenecks instantly with cascading bars color-coded by step type.
- Cascading Gantt bars by type (LLM, tool, decision)
- Collapsible trace tree with hierarchy
- Zoom, pan, and keyboard navigation
Installation
Requires Node.js 18+ or any modern JavaScript runtime (Bun, Deno).
```bash
npm install @timemachine-sdk/sdk
```
Sub-path exports
Import only what you need to keep your bundle minimal:
```ts
// Core — client, execution, step recorder, types
import { TimeMachine, Execution, StepRecorder } from '@timemachine-sdk/sdk';

// Adapters — LangChain callback handler
import { TimeMachineCallbackHandler, createLangChainHandler } from '@timemachine-sdk/sdk/adapters';

// Utilities — cost calculation, token extraction
import { calculateCost, hasModelPricing, normalizeModelName } from '@timemachine-sdk/sdk/utils';
```
Get Your API Key
Before using the SDK, you need an API key. Follow these steps to get one from the Time Machine dashboard.
1. Sign in to the dashboard

Create an account or log in to the Time Machine dashboard.

2. Create a project

Once logged in, click New Project and give it a name (e.g. "my-agent"). A project groups all your executions together.
3. Copy your API key
Your API key is displayed once when the project is created. It starts with tm_.
The API key is shown only once. Copy it immediately and store it somewhere safe. If you lose it, you'll need to generate a new one from project settings.
4. Set your environment variable
```bash
export TIMEMACHINE_API_KEY=tm_your_key_here
```
Or add it to a .env file in your project root:
```
TIMEMACHINE_API_KEY=tm_your_key_here
```
Quick Start
Don't have an API key yet? Get one here.
1. Initialize the client
```ts
import { TimeMachine } from '@timemachine-sdk/sdk';

const tm = new TimeMachine({
  apiKey: process.env.TIMEMACHINE_API_KEY!,
  // baseUrl defaults to https://api.timemachine.dev
});
```
2. Capture an execution
```ts
const execution = await tm.startExecution({
  name: 'customer-support-agent',
  metadata: { userId: 'user_123', environment: 'production' },
});

// Record an LLM call step
const step = execution.step('llm_call', {
  model: 'gpt-4o',
  prompt: 'Analyze the customer request...',
});

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Help me reset my password' }],
});

await step.complete({
  output: { message: response.choices[0].message.content },
  tokensIn: response.usage?.prompt_tokens,
  tokensOut: response.usage?.completion_tokens,
});

// Mark execution as done
await execution.complete();
```
3. LangChain integration (automatic capture)
```ts
import { TimeMachine } from '@timemachine-sdk/sdk';
import { createLangChainHandler } from '@timemachine-sdk/sdk/adapters';

const tm = new TimeMachine({ apiKey: process.env.TIMEMACHINE_API_KEY! });

const { handler, execution } = await createLangChainHandler(tm, {
  name: 'research-agent',
  metadata: { model: 'gpt-4o' },
});

// Every LLM call, tool use, and decision is captured automatically
const result = await agent.invoke(
  { input: 'Research quantum computing trends' },
  { callbacks: [handler] },
);

await execution.complete();
```
Core Concepts
Execution
An execution represents one complete run of your AI agent — from start to finish. It has a name, optional metadata, and contains a sequence of steps. An execution can be running, completed, or failed.
Step
A step is a single action within an execution. Every LLM call, tool use, decision, or retrieval is a step. Steps capture type, input, output, token counts, cost, latency, tool calls, and optional state snapshots for fork & replay.
Fork & Replay
The killer feature: fork any execution at any step and replay from that point forward with modifications. Only the steps after the fork point are re-executed — prior steps are reused. This lets you debug agent failures without re-running the entire pipeline.
TimeMachine
The main entry point. Create one instance and reuse it across your application.
Constructor
```ts
const tm = new TimeMachine({
  apiKey: 'tm_...',        // required
  baseUrl: 'https://...',  // default: https://api.timemachine.dev
  maxRetries: 3,           // default: 3 (exponential backoff)
  debug: false,            // default: false
});
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| apiKey | string | required | Your API key (format: tm_...) |
| baseUrl | string | https://api.timemachine.dev | API endpoint URL |
| maxRetries | number | 3 | Retries with exponential backoff |
| debug | boolean | false | Log SDK activity to console |
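The table only states that retries use exponential backoff; the exact schedule is not documented. As an illustration, a typical implementation doubles a base delay per attempt. The 500 ms base and the helper names below are assumptions for the sketch, not the SDK's actual internals:

```typescript
// Hypothetical illustration of maxRetries with exponential backoff.
// baseMs = 500 is an assumed value, not taken from the SDK.
function backoffDelayMs(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt; // attempt 0 -> 500, 1 -> 1000, 2 -> 2000
}

async function withRetries<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxRetries) {
        // Wait before the next attempt, doubling each time
        await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
      }
    }
  }
  throw lastError;
}
```

With maxRetries: 3 this means up to four total attempts before the request fails.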
tm.startExecution(options?)
Creates a new execution and returns an Execution instance.
```ts
const execution = await tm.startExecution({
  name: 'my-agent-run',
  metadata: { model: 'gpt-4o', version: '1.2.0' },
});
```
| Parameter | Type | Description |
|---|---|---|
| name | string | Human-readable name for the execution |
| metadata | Record<string, unknown> | Arbitrary key-value data attached to the execution |
Execution
Represents a running execution. Created via tm.startExecution().
Properties
| Property | Type | Description |
|---|---|---|
| id | string | Unique execution ID (read-only) |
| projectId | string | Project ID from the API (read-only) |
execution.step(type, input?)
Creates a new step recorder. The latency timer starts immediately.
```ts
const step = execution.step('llm_call', {
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});
```
execution.complete()
Marks the execution as completed. Flushes any pending batched steps before completing.
```ts
await execution.complete();
```
execution.fail(error)
Marks the execution as failed with error details. Accepts an Error object or a string.
```ts
await execution.fail(new Error('LLM returned invalid JSON'));
// or
await execution.fail('Rate limited by OpenAI');
```
execution.getStatus()
Returns the current status.
```ts
const status = execution.getStatus();
// 'running' | 'completed' | 'failed'
```
StepRecorder
Records a single step. Created via execution.step(). Latency is auto-calculated from creation time.
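Auto-calculated latency is presumably just a timestamp delta between step creation and completion. A minimal sketch of that idea (an illustration, not the SDK's actual implementation):

```typescript
// Sketch: capture a start time at creation, compute the delta on complete.
class LatencyTimer {
  constructor(private readonly startedAt: number = Date.now()) {}

  // Milliseconds elapsed since the timer was created
  elapsedMs(now: number = Date.now()): number {
    return now - this.startedAt;
  }
}
```

This is why latencyMs can be omitted from step.complete(): the recorder already knows when the step started.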
step.complete(options?)
Marks the step as completed with optional output and metrics.
```ts
await step.complete({
  output: { response: 'Here is the answer...' },
  tokensIn: 150,
  tokensOut: 300,
  cost: 0.0045,
  latencyMs: 1200, // auto-calculated if omitted
  toolCalls: [
    { name: 'web_search', input: { query: 'news' }, output: { results: [...] } },
  ],
  stateSnapshot: {
    agentState: { memory: [...], plan: [...] },
  },
});
```
| Parameter | Type | Description |
|---|---|---|
| output | Record<string, unknown> | Output data from this step |
| stateSnapshot | object | Agent state snapshot for fork & replay |
| tokensIn | number | Number of input tokens |
| tokensOut | number | Number of output tokens |
| cost | number | Cost in USD |
| latencyMs | number | Latency in ms (auto-calculated if omitted) |
| toolCalls | ToolCall[] | Tool/function calls made during this step |
| error | StepError | Error details (step completes but with error info) |
step.fail(error)
```ts
await step.fail(new Error('API timeout'));
```
step.getStatus() / step.getIndex()
```ts
step.getStatus(); // 'running' | 'completed' | 'failed'
step.getIndex();  // 0-based index in execution sequence
```
Step Types
Steps are categorized by type for filtering and analysis in the dashboard.
| Type | Description | Typical Use |
|---|---|---|
| llm_call | LLM or chat model invocation | OpenAI, Anthropic, Google API calls |
| tool_use | Tool or function call | Web search, database queries, API calls |
| decision | Agent routing or planning | Agent selecting which tool to use |
| retrieval | RAG or document retrieval | Vector store queries, document fetches |
| human_input | Human-in-the-loop interaction | Approval prompts, user feedback |
| transform | Data transformation | Parsing, formatting, summarization |
| custom | Anything else | Custom logic, business rules |
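The type is just a string label passed to execution.step(). A hypothetical helper (not part of the SDK; the event names on the left are invented for illustration) shows how you might map your framework's events onto this taxonomy:

```typescript
// Only the step-type strings come from the docs above;
// the event names are hypothetical examples.
type StepType =
  | 'llm_call' | 'tool_use' | 'decision' | 'retrieval'
  | 'human_input' | 'transform' | 'custom';

function stepTypeForEvent(event: string): StepType {
  switch (event) {
    case 'chat_completion': return 'llm_call';
    case 'function_call':   return 'tool_use';
    case 'route_selected':  return 'decision';
    case 'vector_query':    return 'retrieval';
    case 'user_approval':   return 'human_input';
    case 'parse_output':    return 'transform';
    default:                return 'custom'; // anything else
  }
}
```

Unrecognized events fall back to custom, mirroring the table's catch-all row.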
Claude Code Integration
Automatically capture every Claude Code session as a traced execution you can inspect, replay, and fork — zero code changes needed.
How it works
Claude Code exposes lifecycle hooks — shell commands that fire on events like session start, tool use, prompt submission, and session end. Time Machine provides a hook bridge that receives these events via stdin and records them as execution steps.
Prerequisites
- Node.js >= 18 or Bun installed
- Claude Code CLI installed (claude command available)
- A running Time Machine instance (local dev or hosted)
- A Time Machine project with an API key (get one here)
Step 1: Install the SDK
```bash
# npm
npm install @timemachine-sdk/sdk

# bun
bun add @timemachine-sdk/sdk

# pnpm
pnpm add @timemachine-sdk/sdk
```
Step 2: Set Environment Variables
The bridge reads two environment variables. Add them to your shell profile (~/.zshrc, ~/.bashrc) or export them before launching Claude Code:
```bash
export TIMEMACHINE_API_KEY="tm_your-api-key-here"
export TIMEMACHINE_BASE_URL="https://app.timemachinesdk.dev" # or http://localhost:3000
```
Reload your shell with source ~/.zshrc. For debugging, set TIMEMACHINE_DEBUG=1 to see bridge logs in stderr.
Step 3: Install Hooks
Option A: Automatic (recommended) — Run the installer from your project directory:
```bash
node --input-type=module -e "
  import { installClaudeCodeHooks } from '@timemachine-sdk/sdk/claude-code-installer';
  const result = await installClaudeCodeHooks({ projectDir: process.cwd(), scope: 'local' });
  console.log(result);
"
```
This creates .claude/hooks/timemachine-bridge.mjs and merges hook entries into .claude/settings.local.json for all 11 lifecycle events.
Option A1: Shell wrapper with .env file (recommended for local use) — If Claude Code doesn't inherit your shell environment, use a wrapper that sources a .env file:
Create .claude/hooks/.env:
```
TIMEMACHINE_API_KEY="tm_your_project_key"
TIMEMACHINE_BASE_URL="https://app.timemachinesdk.dev"
```
Then create .claude/hooks/run-bridge.sh:
```bash
#!/bin/bash
set -a
source "$(dirname "$0")/.env"
set +a
exec node "$(dirname "$0")/timemachine-bridge.mjs" "$@"
```
Make it executable: chmod +x .claude/hooks/run-bridge.sh. Then point all hooks at the wrapper in .claude/settings.local.json.
Option B: Manual installation — Add hook entries to .claude/settings.local.json yourself:
```json
{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "node /absolute/path/to/.claude/hooks/timemachine-bridge.mjs"
          }
        ]
      }
    ]
  }
}
```
Repeat the same structure for all 11 events. Then create the bridge script:
```js
import { runClaudeCodeHookBridge } from '@timemachine-sdk/sdk/claude-code-bridge';

runClaudeCodeHookBridge().catch((error) => {
  console.error('[TimeMachine][ClaudeCodeBridge]', error);
  process.exitCode = 1;
});
```
All 11 Hook Events
| Event | Step Type | What's Recorded |
|---|---|---|
| SessionStart | custom | Session ID, working directory |
| UserPromptSubmit | human_input | The user's prompt text |
| PostToolUse | tool_use | Tool name, success output |
| PostToolUseFailure | tool_use | Tool name, error details |
| Notification | custom | Notification message |
| Stop | custom | Stop reason |
| SubagentStart | custom | Subagent lifecycle start |
| SubagentStop | custom | Subagent lifecycle end |
| PreCompact | custom | Context compaction |
| PermissionRequest | custom | Permission decision |
| SessionEnd | custom | Final status, transcript ingestion |
On SessionEnd, the bridge also parses Claude Code's transcript file (JSONL) to extract assistant messages and file edits that hooks don't capture directly.
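JSONL means one JSON object per line. A minimal sketch of the kind of parsing involved; the role and content field names are assumptions for illustration, not a documented transcript schema:

```typescript
// Parse a JSONL transcript, keeping only assistant message text.
// The { role, content } entry shape is assumed, not documented.
function assistantMessages(jsonl: string): string[] {
  return jsonl
    .split('\n')
    .filter((line) => line.trim().length > 0)        // skip blank lines
    .map((line) => JSON.parse(line) as { role?: string; content?: string })
    .filter((entry) => entry.role === 'assistant' && entry.content)
    .map((entry) => entry.content as string);
}
```

The real bridge also extracts file edits, but the line-by-line JSON.parse pattern is the core of any JSONL reader.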
Step 4: Verify the Setup
```bash
# Check hooks are configured
cat .claude/settings.local.json | python3 -m json.tool | grep -c '"hooks"'

# Check environment variables
echo $TIMEMACHINE_API_KEY   # should start with tm_
echo $TIMEMACHINE_BASE_URL  # should be your server URL

# Test the API key
curl -s -H "Authorization: Bearer $TIMEMACHINE_API_KEY" \
  $TIMEMACHINE_BASE_URL/api/v1/executions | head -c 200

# Run a session — then check your dashboard
claude
```
Architecture
Each hook invocation is a separate short-lived process. The bridge uses a file-based state store at ~/.timemachine/claude-code/ to correlate events across invocations:
File Layout
```
.claude/settings.local.json          # Hook configuration (11 events)
.claude/hooks/.env                   # API key + base URL (gitignored)
.claude/hooks/run-bridge.sh          # Shell wrapper — sources .env, runs node
.claude/hooks/timemachine-bridge.mjs # Entrypoint — imports SDK bridge

~/.timemachine/claude-code/          # Session state (auto-cleaned on SessionEnd)
  <session-id>.json                  # Maps session → executionId
```
Dashboard Features
- Filter by source — Use the “Claude Code” filter on the executions list
- Step timeline — See every prompt, tool call, and response in order
- Inspect step details — Click any step to see full input/output JSON
- Session replay — Scrub through the timeline to replay what happened
- Fork from any step — Right-click a step to fork the execution from that point
Security Best Practices
- Keep .claude/settings.local.json uncommitted
- Store TIMEMACHINE_API_KEY in your shell profile, direnv, or a secret manager
- Add .claude/hooks/.env to your .gitignore
- Never commit a live tm_... key in repo-controlled JSON
- Rotate the key immediately if it was committed or shared in screenshots
Troubleshooting
Hooks aren't firing
Make sure .claude/settings.local.json is valid JSON. Verify the command path is absolute and the file exists. Check that node or bun is in your PATH.
“TIMEMACHINE_API_KEY is required” error
Export the variable in the same shell where you run claude. If using a new tab, make sure it's in your shell profile. As a fallback, use the shell wrapper approach with a .env file.
Execution appears but has no steps
Set TIMEMACHINE_DEBUG=1 to see bridge logs. Test the bridge manually:
```bash
echo '{"session_id":"test","hook_event_name":"SessionStart"}' | node .claude/hooks/timemachine-bridge.mjs
```
Bridge state is stale
If a session ended abnormally, clean up: rm ~/.timemachine/claude-code/*.json
Quick Reference
```bash
# Install SDK
bun add @timemachine-sdk/sdk

# Set env vars
export TIMEMACHINE_API_KEY="tm_..."
export TIMEMACHINE_BASE_URL="https://app.timemachinesdk.dev"

# Install hooks (automatic)
node --input-type=module -e "
  import { installClaudeCodeHooks } from '@timemachine-sdk/sdk/claude-code-installer';
  await installClaudeCodeHooks({ projectDir: process.cwd(), scope: 'local' });
"

# Verify — run a session, check dashboard
claude
```
MCP Server
The @timemachine-sdk/mcp package exposes your project's runs, traces, and steps as MCP tools that Claude Code can call directly — without opening a browser. Inspect failures, walk through traces, and get aggregate stats all within the Claude Code terminal.
The MCP server uses the same v1 API, reads three environment variables, and communicates with Claude Code over stdio (no daemon, no port).
Installation
Add the following to your project's .claude/settings.json (or ~/.claude/settings.json for a global install):
```json
{
  "mcpServers": {
    "timemachine": {
      "command": "npx",
      "args": ["-y", "@timemachine-sdk/mcp"],
      "env": {
        "TIMEMACHINE_API_KEY": "tm_...",
        "TIMEMACHINE_PROJECT_ID": "proj_...",
        "TIMEMACHINE_BASE_URL": "https://app.timemachinesdk.dev"
      }
    }
  }
}
```
Restart Claude Code after saving. The server starts on demand — you'll see a timemachine entry in /mcp.
Available Tools
Six tools are registered. All return structured plain-text so Claude can reason over results directly:
| Tool | Description | Key params |
|---|---|---|
| list_executions | List executions with optional filters | status, runtime, limit (default 20) |
| get_execution | Full execution detail — name, status, cost, tokens, metadata | execution_id |
| get_steps | All steps with type, status, latency, input, output, and error detail | execution_id |
| get_failed_runs | Shortcut: recent failed executions with debug hints | limit (default 10) |
| tail_execution | Poll an in-progress execution until it reaches a terminal state | execution_id |
| get_project_stats | Aggregate stats across your last 100 runs — success rate, avg cost, avg tokens, p95 latency | — |
Example prompts
How it works
Each tool maps directly onto a v1 API call — for example, get_failed_runs issues GET /api/v1/executions?status=failed.

Roadmap: Querying runs is step one. Native replay — inspect a failure, fork from the problem step, re-run with a fix — is in development. See the native replay roadmap.
CLI — tm
@timemachine-sdk/cli gives you a native terminal interface to your Time Machine project. List runs, tail live executions, inspect traces, fork at a failed step, and open the dashboard — all from your shell, without touching a browser.
The CLI is the fastest way to debug a failure: tm failed shows the last crash, tm view <id> walks the full trace, tm fork <id> --replay re-runs from the broken step.
Install
```bash
npm install -g @timemachine-sdk/cli
# or run without installing:
npx @timemachine-sdk/cli --help
```
Configuration
Run tm config set once to store credentials in ~/.timemachine/config.json:
```bash
tm config set --api-key tm_... --project-id proj_...

# Or use environment variables:
export TIMEMACHINE_API_KEY=tm_...
export TIMEMACHINE_PROJECT_ID=proj_...
```
Commands
| Command | Description |
|---|---|
| tm ls | List recent executions in a color-coded table (status, cost, tokens, duration) |
| tm ls --status failed | Filter by status: running, completed, failed, or cancelled |
| tm ls --runtime langchain | Filter by runtime tag |
| tm view <id> | Full trace — all steps with type, latency, cost, LLM output snippet, and tool name |
| tm view <id> --json | Raw JSON output — pipe to jq or save for diffing |
| tm tail [id] | Stream a live execution, printing new steps as they arrive. Omit ID to tail the latest. |
| tm tail --all | Show all existing steps on attach, then stream new ones |
| tm failed | Recent failed executions with one-line debug hints |
| tm fork <id> | Fork an execution at a chosen step (interactive step picker) |
| tm fork <id> --at 3 | Fork at step index 3 directly |
| tm fork <id> --at 3 --replay | Fork and immediately start replay |
| tm stats | Aggregate stats across last 100 runs: success rate, avg cost, avg tokens, p95 latency |
| tm open <id> | Open execution in dashboard (browser) |
| tm setup | Install Claude Code hooks — same as installClaudeCodeHooks() but from the terminal |
| tm config show | Display current config (key redacted) |
Typical debug workflow
```bash
# 1. See what failed
tm failed

# 2. Inspect the trace
tm view exec_abc123

# 3. Fork at the broken step and replay
tm fork exec_abc123 --at 4 --replay

# 4. Watch the replay live
tm tail exec_def456
```
Claude Code integration: Run tm setup to install hooks that automatically capture every Claude Code session as a traced execution — no code changes needed. Then use tm tail to watch your Claude session live.
Eval Platform
Ship agent changes with confidence. Time Machine's eval platform lets you define test suites of real production inputs, assert on outputs, and gate deployments on passing scores — all backed by the same fork & replay infrastructure that powers the dashboard.
Every eval run is a replay: your test case inputs are forked through your live agent, results are scored by assertion, and a 0–1 score rolls up per suite. Wire it into CI/CD and every PR gets an automated quality gate.
Key concepts
Eval Suite
A named collection of test cases — e.g. "Customer Support Quality" or "Reasoning Accuracy".
Test Case
One input (and optional expected output) your agent will be replayed against.
Eval Run
One execution of a full suite — forks each case, runs assertions, returns a 0–1 score.
Assertion
A pass/fail check on the agent's output: contains, regex, llm_judge, cost_under, and more.
Score
Aggregate 0.0–1.0 across all assertions in a run. Set a minimum threshold in CI.
LLM Judge
Ask a language model to grade output quality against a rubric — subjective tests made quantifiable.
Create a suite
Create via the dashboard (Evals → New Suite) or the API:
```ts
import { TimeMachine } from '@timemachine-sdk/sdk';

const tm = new TimeMachine({ apiKey: process.env.TIMEMACHINE_API_KEY! });

const suite = await tm.createEvalSuite({
  name: 'Customer Support Quality',
  description: 'Verify response accuracy, tone, and latency',
  agentEndpoint: 'https://your-api.com/agent',
  tags: ['production', 'support'],
});

// Add test cases
await tm.addEvalCase(suite.id, {
  input: { message: 'How do I reset my password?' },
  expectedOutput: { contains: 'reset link' },
  tags: ['auth'],
});

await tm.addEvalCase(suite.id, {
  input: { message: 'Cancel my subscription' },
  tags: ['billing'],
});
```
Save cases from production
The fastest way to build a test suite: save real executions directly from the dashboard. Click any execution → Save as eval case. The input is captured and linked to the suite of your choice.
Run a suite
Three ways to trigger an eval run:
```bash
# Run a suite and wait for results
tm eval run suite_abc123 --wait

# Run with a minimum pass threshold (CI use-case)
tm eval run suite_abc123 --wait --threshold 0.9

# Check status of a previous run
tm eval status run_def456

# List recent runs
tm eval list suite_abc123
```
Assertions
Assertions are the scoring rules for each test case. A case passes if all its assertions pass; partial passes score proportionally. Assertions are defined per case and evaluated after each eval run.
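Proportional scoring reduces to a pass ratio. A sketch assuming every assertion is weighted equally (the actual weighting is not documented):

```typescript
// Hypothetical score rollup: fraction of passing assertions, 0-1.
// Equal weighting and the empty-case behavior are assumptions.
function caseScore(results: boolean[]): number {
  if (results.length === 0) return 1; // no assertions: trivially passing
  const passed = results.filter(Boolean).length;
  return passed / results.length;
}
```

Under this model, a case with four assertions where two pass contributes 0.5 to the suite's aggregate.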
| Type | Description | Config |
|---|---|---|
| contains | Output string includes a substring (case-insensitive) | { value: "reset link" } |
| not_contains | Output does not include a substring | { value: "error" } |
| regex | Output matches a regular expression | { pattern: "order #\\d+" } |
| llm_judge | LLM grades output against a rubric (0–1 score) | { rubric: "Is the response helpful and accurate?", threshold: 0.8 } |
| json_valid | Output is valid JSON | {} |
| json_path | A JSON path equals an expected value | { path: "$.status", value: "success" } |
| cost_under | Total execution cost is below threshold ($USD) | { maxCost: 0.05 } |
| latency_under | Total execution latency is below threshold (ms) | { maxLatencyMs: 3000 } |
| step_count | Execution has exactly N steps | { count: 5 } |
| custom | Custom JS function evaluated server-side | { fn: "(output) => output.length > 10" } |
Assertion example
```ts
await tm.addEvalCase(suite.id, {
  input: { query: 'Summarise this article in 3 bullet points.' },
  assertions: [
    // Output must contain bullet points
    { type: 'contains', value: '•' },
    // Graded by LLM on conciseness + accuracy
    {
      type: 'llm_judge',
      rubric: 'Does the response contain exactly 3 concise bullet points that accurately summarise the article?',
      threshold: 0.8,
    },
    // Must complete within 5s and under $0.02
    { type: 'latency_under', maxLatencyMs: 5000 },
    { type: 'cost_under', maxCost: 0.02 },
  ],
});
```
LLM Judge: Use sparingly in CI — each judge call adds LLM cost per run. A good pattern is to combine cheap structural assertions (contains, regex) as fast gates and reserve llm_judge for nightly or pre-release runs.
CI/CD Integration
Block merges on eval regressions. Add the eval run as a required status check and every PR automatically gates on your quality threshold.
The flow: PR opened → GitHub Actions workflow runs → CLI triggers your suite via the API → polls for completion → exits non-zero if score is below threshold → PR blocked until green.
GitHub Actions setup
1. Add TIMEMACHINE_API_KEY to your repo Secrets.
2. Add EVAL_SUITE_ID to your repo Variables.
3. Add this workflow:
```yaml
name: Eval Suite

on:
  pull_request:
    branches: [main]
  workflow_dispatch:

jobs:
  evals:
    name: Run eval suite
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run eval suite (threshold 0.9)
        env:
          TIMEMACHINE_API_KEY: ${{ secrets.TIMEMACHINE_API_KEY }}
        run: |
          npx @timemachine-sdk/cli eval run ${{ vars.EVAL_SUITE_ID }} \
            --wait \
            --threshold 0.9

      # Optional: post score as PR comment
      - name: Post eval score
        if: always()
        env:
          TIMEMACHINE_API_KEY: ${{ secrets.TIMEMACHINE_API_KEY }}
          GH_TOKEN: ${{ github.token }}
        run: |
          SCORE=$(npx @timemachine-sdk/cli eval status --latest --format score)
          gh pr comment ${{ github.event.pull_request.number }} \
            --body "**Eval score:** ${SCORE} / 1.0"
```
Recommended thresholds
| Environment | Threshold | Rationale |
|---|---|---|
| Safety-critical (healthcare, finance) | 1.0 | No regressions tolerated |
| Production | 0.9 | Up to 10% failure rate acceptable |
| Staging / pre-release | 0.8 | Catch regressions early without blocking velocity |
| Experimental / nightly | 0.7 | Track trends; don't block iteration |
Webhook-triggered runs
Trigger runs without installing the CLI — useful for serverless environments or non-GitHub CI:
```bash
# Trigger a run via API
RUN=$(curl -s -X POST \
  -H "Authorization: Bearer $TIMEMACHINE_API_KEY" \
  -H "Content-Type: application/json" \
  https://app.timemachinesdk.dev/api/v1/eval/suites/$SUITE_ID/runs)

RUN_ID=$(echo $RUN | jq -r '.id')

# Poll until terminal
while true; do
  STATUS=$(curl -s \
    -H "Authorization: Bearer $TIMEMACHINE_API_KEY" \
    https://app.timemachinesdk.dev/api/v1/eval/runs/$RUN_ID/status)

  STATE=$(echo $STATUS | jq -r '.status')
  SCORE=$(echo $STATUS | jq -r '.score')

  [ "$STATE" = "completed" ] && break
  [ "$STATE" = "failed" ] && exit 1
  sleep 5
done

# Fail if below threshold
awk "BEGIN { exit ($SCORE < 0.9) }" || exit 1
```
Pro tip: Tag your suites by severity — critical, regression, nightly. Run only critical tagged suites on every PR (fast, cheap), regression on merge, and full nightly on a schedule.
LangChain Adapter
Automatically captures all LLM calls, tool invocations, agent decisions, and retrievals — zero manual instrumentation.
createLangChainHandler(tm, options?)
One-liner to create an execution + callback handler. This is the recommended approach.
```ts
import { createLangChainHandler } from '@timemachine-sdk/sdk/adapters';

const { handler, execution } = await createLangChainHandler(tm, {
  name: 'research-agent',
  metadata: { model: 'gpt-4o' },
  debug: false,
  autoCalculateCost: true,
  maxDocumentLength: 500,
});

await agent.invoke(input, { callbacks: [handler] });
await execution.complete();
```
| Option | Type | Default | Description |
|---|---|---|---|
| name | string | — | Execution name |
| metadata | Record<string, unknown> | — | Execution metadata |
| debug | boolean | false | Log captured events to console |
| autoCalculateCost | boolean | true | Auto-calculate cost from token counts |
| maxDocumentLength | number | 500 | Max characters for retrieved documents |
What gets captured automatically
| LangChain Event | Step Type | What's Recorded |
|---|---|---|
| LLM / Chat Model call | llm_call | Model name, messages, tokens, cost, latency |
| Tool invocation | tool_use | Tool name, input, output, latency |
| Agent action | decision | Action type, tool selection, input |
| Agent finish | decision | Final output, return values |
| Retriever call | retrieval | Query, documents (truncated), doc count |
Security: Sensitive parameters (api_key, apiKey, callbacks) are automatically stripped from captured data.
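Stripping sensitive fields typically means a recursive filter over the captured payload. A sketch; the field list comes from the note above, but the implementation itself is assumed, not the adapter's actual code:

```typescript
// Recursively drop sensitive keys from a captured payload.
// Field names match the note above; the logic is illustrative.
const SENSITIVE = new Set(['api_key', 'apiKey', 'callbacks']);

function stripSensitive(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(stripSensitive);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>)
        .filter(([key]) => !SENSITIVE.has(key))   // drop sensitive keys
        .map(([key, v]) => [key, stripSensitive(v)]), // recurse into children
    );
  }
  return value; // primitives pass through unchanged
}
```

Nested occurrences (e.g. an api_key inside a client config object) are removed as well.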
OpenRouter
OpenRouter provides a unified API across 200+ models — Anthropic, OpenAI, Google, DeepSeek, Qwen, Llama, and more — through a single endpoint and API key. Time Machine works natively with OpenRouter with zero extra configuration.
Setup
```ts
import { TimeMachine } from '@timemachine-sdk/sdk';

const tm = new TimeMachine({
  apiKey: process.env.TIMEMACHINE_API_KEY!,
  // No changes needed — configure OpenRouter in your LLM client directly
});

// Use OpenRouter as your LLM provider
import OpenAI from 'openai';

const openrouter = new OpenAI({
  apiKey: process.env.OPENROUTER_API_KEY!,
  baseURL: 'https://openrouter.ai/api/v1',
  defaultHeaders: {
    'HTTP-Referer': 'https://your-app.com', // optional, for rankings
    'X-Title': 'Your App Name',             // optional
  },
});
```
Tracking OpenRouter calls
Capture any model routed through OpenRouter — the model name is passed through transparently.
```ts
const execution = await tm.startExecution({
  name: 'openrouter-agent',
  metadata: { router: 'openrouter' },
});

const step = execution.step('llm_call', {
  model: 'anthropic/claude-opus-4', // OpenRouter model ID
  messages: [{ role: 'user', content: 'Explain quantum entanglement' }],
});

const response = await openrouter.chat.completions.create({
  model: 'anthropic/claude-opus-4',
  messages: [{ role: 'user', content: 'Explain quantum entanglement' }],
});

await step.complete({
  output: { message: response.choices[0].message.content },
  tokensIn: response.usage?.prompt_tokens,
  tokensOut: response.usage?.completion_tokens,
});

await execution.complete();
```
LangChain + OpenRouter
The LangChain adapter works seamlessly — just point your ChatOpenAI instance at OpenRouter.
```ts
import { ChatOpenAI } from '@langchain/openai';
import { createLangChainHandler } from '@timemachine-sdk/sdk/adapters';

const model = new ChatOpenAI({
  modelName: 'google/gemini-2.5-pro', // or any OpenRouter model
  openAIApiKey: process.env.OPENROUTER_API_KEY!,
  configuration: {
    baseURL: 'https://openrouter.ai/api/v1',
  },
});

const { handler, execution } = await createLangChainHandler(tm, {
  name: 'gemini-via-openrouter',
});

// All LLM calls automatically captured
const result = await model.invoke('What is the latest news?', {
  callbacks: [handler],
});

await execution.complete();
```
Why use OpenRouter with Time Machine
| Benefit | Detail |
|---|---|
| Single API key | Access 200+ models — no separate accounts for Anthropic, OpenAI, Google, etc. |
| Model fallback | Configure automatic fallback if a model is unavailable or rate-limited |
| Cost optimization | Route cheap tasks to smaller models, complex ones to frontier models |
| Unified billing | One invoice for all LLM costs across providers |
| Model comparison | Easily A/B test models by swapping the model string — Time Machine captures both |
Tip: OpenRouter model IDs use the format provider/model-name (e.g. deepseek/deepseek-r1, qwen/qwen3-235b-a22b). Time Machine stores the full model string in your execution trace for accurate attribution.
Utilities & Cost Tracking
Built-in pricing for 30+ models. Auto-calculated in the LangChain adapter, or use directly.
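The cost math is plain per-1k-token pricing. As a sanity check, the formula implied by the { inputPer1k, outputPer1k } pricing shape is (a reimplementation for illustration, not the SDK's code):

```typescript
// Per-1k-token cost formula implied by the pricing shape.
function costUsd(
  tokensIn: number,
  tokensOut: number,
  pricing: { inputPer1k: number; outputPer1k: number },
): number {
  return (tokensIn / 1000) * pricing.inputPer1k +
         (tokensOut / 1000) * pricing.outputPer1k;
}

// 1000 input + 500 output tokens at inputPer1k 0.005 / outputPer1k 0.015:
const example = costUsd(1000, 500, { inputPer1k: 0.005, outputPer1k: 0.015 });
// -> 0.0125 USD
```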
```ts
import {
  calculateCost,
  hasModelPricing,
  getModelPricing,
  normalizeModelName,
  configureFallbackPricing,
  extractTokensFromLLMResult,
} from '@timemachine-sdk/sdk/utils';

// Calculate cost for known models
const cost = calculateCost('gpt-4o', 1000, 500);
// => 0.0125 (USD)

// Check model pricing availability
hasModelPricing('gpt-4o');          // true
hasModelPricing('my-custom-model'); // false

// Get pricing details
getModelPricing('gpt-4o');
// => { inputPer1k: 0.005, outputPer1k: 0.015 }

// Normalize model names (strips version suffixes)
normalizeModelName('gpt-4-0125-preview');       // 'gpt-4'
normalizeModelName('claude-3-sonnet-20240229'); // 'claude-3-sonnet'

// Configure fallback pricing for unknown models
configureFallbackPricing({
  inputPer1k: 0.002,
  outputPer1k: 0.006,
  enabled: true,
});

// Extract tokens from LLM results (multi-provider)
const { tokensIn, tokensOut } = extractTokensFromLLMResult(llmResult);
```
Guide: Manual Step Recording
For custom agents or frameworks without a built-in adapter.
```ts
import { TimeMachine } from '@timemachine-sdk/sdk';

const tm = new TimeMachine({ apiKey: process.env.TIMEMACHINE_API_KEY! });

async function runAgent(query: string) {
  const execution = await tm.startExecution({
    name: 'research-agent',
    metadata: { query, timestamp: Date.now() },
  });

  try {
    // Step 1: Plan
    const planStep = execution.step('decision', { action: 'plan', query });
    const plan = await generatePlan(query);
    await planStep.complete({ output: { plan } });

    // Step 2: Search
    const searchStep = execution.step('tool_use', {
      tool: 'web_search',
      query: plan.searchQuery,
    });
    const results = await webSearch(plan.searchQuery);
    await searchStep.complete({
      output: { resultCount: results.length, results },
    });

    // Step 3: Synthesize
    const llmStep = execution.step('llm_call', {
      model: 'gpt-4o',
      context: results,
    });
    const answer = await callLLM(query, results);
    await llmStep.complete({
      output: { answer },
      tokensIn: answer.usage.prompt_tokens,
      tokensOut: answer.usage.completion_tokens,
    });

    await execution.complete();
    return answer;
  } catch (error) {
    await execution.fail(error as Error);
    throw error;
  }
}
```

Guide: Multi-Step Workflows
For agents with sequential or branching logic.
```ts
const execution = await tm.startExecution({ name: 'multi-step-workflow' });

// Step 1: Classify the request
const classifyStep = execution.step('llm_call', { action: 'classify' });
const category = await classifyRequest(userInput);
await classifyStep.complete({ output: { category } });

// Step 2: Route based on classification
const routeStep = execution.step('decision', { category });
const handler = selectHandler(category);
await routeStep.complete({ output: { handler: handler.name } });

// Step 3+: Conditional execution
if (category === 'needs_research') {
  const retrieveStep = execution.step('retrieval', { query: userInput });
  const docs = await vectorStore.similaritySearch(userInput);
  await retrieveStep.complete({ output: { documentCount: docs.length } });

  const answerStep = execution.step('llm_call', { model: 'gpt-4o', context: docs });
  const answer = await generateAnswer(userInput, docs);
  await answerStep.complete({
    output: { answer },
    tokensIn: 2000,
    tokensOut: 500,
  });
}

await execution.complete();
```

Guide: Error Handling
The SDK is fail-open — it never crashes your app. But you should still record failures for debugging.
```ts
const execution = await tm.startExecution({ name: 'agent-run' });

try {
  const step = execution.step('tool_use', { tool: 'database_query' });
  const result = await queryDatabase(sql);
  await step.complete({ output: { rows: result.length } });

  await execution.complete();
} catch (error) {
  // Records the error in Time Machine for debugging
  await execution.fail(error as Error);
  throw error;
}
```

Enable debug mode to see SDK activity in your console:

```ts
const tm = new TimeMachine({
  apiKey: process.env.TIMEMACHINE_API_KEY!,
  debug: true, // Logs all SDK requests and errors
});
```

Guide: Express / Fastify
Wrap your API route handlers with execution tracking.
```ts
import express from 'express';
import { TimeMachine } from '@timemachine-sdk/sdk';

const app = express();
const tm = new TimeMachine({ apiKey: process.env.TIMEMACHINE_API_KEY! });

app.post('/api/chat', async (req, res) => {
  const execution = await tm.startExecution({
    name: 'chat-endpoint',
    metadata: {
      userId: req.body.userId,
      sessionId: req.body.sessionId,
    },
  });

  try {
    const step = execution.step('llm_call', {
      model: 'gpt-4o',
      messages: req.body.messages,
    });

    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: req.body.messages,
    });

    await step.complete({
      output: { message: response.choices[0].message },
      tokensIn: response.usage?.prompt_tokens,
      tokensOut: response.usage?.completion_tokens,
    });

    await execution.complete();
    res.json({ message: response.choices[0].message.content });
  } catch (error) {
    await execution.fail(error as Error);
    res.status(500).json({ error: 'Internal server error' });
  }
});
```

Types Reference
Full TypeScript coverage — no any types.
```ts
// Client configuration
interface TimeMachineConfig {
  apiKey: string;
  baseUrl?: string;    // default: 'https://api.timemachine.dev'
  maxRetries?: number; // default: 3
  debug?: boolean;     // default: false
}

// Execution creation
interface CreateExecutionRequest {
  name?: string;
  metadata?: Record<string, unknown>;
}

// Step completion
interface StepCompleteOptions {
  output?: Record<string, unknown>;
  stateSnapshot?: Omit<StateSnapshot, 'stepId' | 'timestamp'>;
  tokensIn?: number;
  tokensOut?: number;
  cost?: number;
  latencyMs?: number;
  toolCalls?: ToolCall[];
  error?: StepError;
}

// Tool call record
interface ToolCall {
  name: string;
  input: Record<string, unknown>;
  output?: Record<string, unknown>;
}

// Error record
interface StepError {
  message: string;
  stack?: string;
}

// Status types
type ExecutionStatus = 'running' | 'completed' | 'failed';
type StepStatus = 'running' | 'completed' | 'failed';
type StepType =
  | 'llm_call'
  | 'tool_use'
  | 'decision'
  | 'retrieval'
  | 'human_input'
  | 'transform'
  | 'custom';

// Utility types
interface TokenUsage { tokensIn: number; tokensOut: number; }
interface ModelPricing { inputPer1k: number; outputPer1k: number; }
interface FallbackPricingConfig {
  inputPer1k: number;
  outputPer1k: number;
  enabled: boolean;
}
```

Supported Models
Built-in pricing for frontier and open-source models (2025). For unlisted models, use `configureFallbackPricing()`. Prices are in USD per 1,000 tokens, approximate and subject to provider changes.
Anthropic — Claude 4 series
Current flagship family (2025)
| Model | Input ($/1k) | Output ($/1k) | Notes |
|---|---|---|---|
| claude-opus-4-6 | $0.01500 | $0.07500 | Most powerful, coding & reasoning |
| claude-sonnet-4-6 | $0.00300 | $0.01500 | Best balance of speed & quality |
| claude-haiku-4-5 | $0.00080 | $0.00400 | Fastest, lowest cost |
| claude-3.5-sonnet | $0.00300 | $0.01500 | Previous gen, still widely used |
| claude-3.5-haiku | $0.00080 | $0.00400 | Previous gen fast model |
OpenAI — GPT-4.x & o-series
Including reasoning models (2025)
| Model | Input ($/1k) | Output ($/1k) | Notes |
|---|---|---|---|
| gpt-4.5 | $0.07500 | $0.15000 | Multimodal frontier, highest quality |
| gpt-4.1 | $0.00200 | $0.00800 | Efficient, strong coding |
| gpt-4o | $0.00500 | $0.01500 | Omni model, vision + text |
| gpt-4o-mini | $0.00015 | $0.00060 | Fast & cheap for simple tasks |
| o3 | $0.01000 | $0.04000 | Reasoning model, complex problems |
| o4-mini | $0.00110 | $0.00440 | Reasoning, optimized for cost |
| o1 | $0.01500 | $0.06000 | Previous reasoning generation |
Google — Gemini 2.5
Long context, multimodal (2025)
| Model | Input ($/1k) | Output ($/1k) | Notes |
|---|---|---|---|
| gemini-2.5-pro | $0.00125 | $0.01000 | 1M context, top coding & reasoning |
| gemini-2.5-flash | $0.00008 | $0.00030 | Low latency, best value |
| gemini-2.0-flash | $0.00010 | $0.00040 | Previous gen flash |
| gemini-1.5-pro | $0.00125 | $0.00500 | Legacy, 2M context |
DeepSeek
Open-source, extremely cost-efficient
| Model | Input ($/1k) | Output ($/1k) | Notes |
|---|---|---|---|
| deepseek-r1 | $0.00014 | $0.00219 | Reasoning model, matches o1 quality |
| deepseek-v3 | $0.00007 | $0.00110 | MoE, strong general tasks |
| deepseek-r1-zero | $0.00014 | $0.00219 | RL-trained reasoning, no SFT |
Qwen — Alibaba
Strong multilingual & coding
| Model | Input ($/1k) | Output ($/1k) | Notes |
|---|---|---|---|
| qwen3-235b-a22b | $0.00022 | $0.00088 | Flagship MoE, top open model |
| qwen3-32b | $0.00018 | $0.00072 | Dense, strong reasoning |
| qwen2.5-72b | $0.00023 | $0.00069 | Previous gen, widely deployed |
| qwen2.5-coder-32b | $0.00015 | $0.00060 | Best-in-class code generation |
Kimi — Moonshot AI
Long-context specialist (Chinese frontier lab)
| Model | Input ($/1k) | Output ($/1k) | Notes |
|---|---|---|---|
| kimi-k2 | $0.00060 | $0.00250 | Agentic reasoning, 1M context |
| moonshot-v1-128k | $0.01200 | $0.01200 | Ultra long context |
| moonshot-v1-32k | $0.00400 | $0.00400 | Standard context window |
GLM — Zhipu AI
Chinese lab, strong bilingual performance
| Model | Input ($/1k) | Output ($/1k) | Notes |
|---|---|---|---|
| glm-5 | $0.00100 | $0.00300 | Latest flagship, vision + reasoning |
| glm-4-plus | $0.00070 | $0.00140 | Enhanced GLM-4, long context |
| glm-4 | $0.00014 | $0.00014 | Fast, cost-effective baseline |
Meta — Llama 4
Open weights, free to self-host
| Model | Input ($/1k) | Output ($/1k) | Notes |
|---|---|---|---|
| llama-4-maverick | $0.00019 | $0.00085 | 17B MoE, multimodal |
| llama-4-scout | $0.00017 | $0.00017 | 17B MoE, ultra-efficient |
| llama-3.3-70b | $0.00023 | $0.00040 | Previous gen, solid baseline |
Mistral
| Model | Input ($/1k) | Output ($/1k) | Notes |
|---|---|---|---|
| mistral-large-2 | $0.00200 | $0.00600 | Top Mistral model |
| mistral-small-3 | $0.00010 | $0.00030 | Fast, lightweight |
| codestral | $0.00030 | $0.00090 | Code-specialized |
Pricing note: Prices are approximate as of mid-2025 and change frequently. For models not listed, use `configureFallbackPricing()` or pass `cost` directly in `step.complete()`. When using OpenRouter, pass the model string as-is (e.g. `deepseek/deepseek-r1`) — it will be stored in the execution trace for your records.
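The per-1k arithmetic implied by these tables can be sketched as follows. This is an illustration of the assumed formula (tokens ÷ 1,000 × price per 1k, summed for input and output), not the SDK's actual `calculateCost` implementation:

```typescript
// Assumed per-1k-token pricing formula, using the shape of the SDK's
// ModelPricing type. Illustrative only.
interface ModelPricing { inputPer1k: number; outputPer1k: number; }

function costFor(pricing: ModelPricing, tokensIn: number, tokensOut: number): number {
  return (tokensIn / 1000) * pricing.inputPer1k
       + (tokensOut / 1000) * pricing.outputPer1k;
}

// gpt-4o from the table above: $0.005 in, $0.015 out per 1k tokens
costFor({ inputPer1k: 0.005, outputPer1k: 0.015 }, 1000, 500);
// → 0.0125 (USD)
```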
Design Principles
Fail-Open
Never crashes your application. If the API is unreachable, the error is caught and logged internally, and your agent continues uninterrupted.
Zero Overhead
Steps are batched asynchronously (up to 10 per batch, flushed every 500ms) to minimize performance impact.
Framework Agnostic
Manual step recording works with any framework. LangChain adapter provides automatic capture.
Type Safe
Full TypeScript coverage with no any types. Autocomplete and compile-time checks everywhere.
Tree-Shakeable
Sub-path exports (/adapters, /utils) let you import only what you need. Minimal bundle impact.
Production Ready
Exponential backoff retries, automatic batching, sensitive data filtering, and graceful error handling.
Pricing
Core observability (executions, steps, fork & replay) is free for all plans. Eval runs are the primary usage metric — they consume compute to replay your agent against each test case.
Free
- 100 eval runs / month
- 1 eval suite
- 10 test cases
- Manual runs only
- Dashboard access
- Community support
Pro
Popular
- 2,000 eval runs / month
- Unlimited suites
- Unlimited test cases
- CI/CD integration
- LLM-as-judge assertions
- API + CLI access
- Email support
Team
- 10,000 eval runs / month
- Unlimited suites & cases
- 5 seats included
- Scheduled eval runs
- Slack / webhook alerts
- Priority support
- Usage dashboard
Enterprise
- Unlimited eval runs
- Unlimited seats
- SSO / SAML
- Audit logs
- SLA guarantee
- Dedicated support
- Custom integrations
Usage-based add-ons
Extra eval runs
$0.01 per run above plan limit
LLM judge tokens
Provider token cost + 20% margin
Extra seats (Team)
$29 / seat / month
All plans include: Unlimited executions captured, unlimited steps, fork & replay, dashboard access, SDK & API access, Claude Code integration, MCP server, and CLI. Eval runs are the only metered resource.
Troubleshooting
Steps are not appearing in the dashboard
Check your API key. Enable debug mode: `new TimeMachine({ apiKey: "...", debug: true })`. Make sure you call `await execution.complete()` — steps are flushed on completion.
LangChain adapter isn't capturing events
Make sure you pass the handler in the `callbacks` array: `await agent.invoke(input, { callbacks: [handler] })`. Without the `callbacks` option, nothing is captured.
Cost shows as 0
The model may not be in the built-in pricing table. Use `hasModelPricing()` to check. Configure fallback pricing with `configureFallbackPricing()` or pass `cost` directly in `step.complete()`.
TypeScript errors with sub-path imports
Set `moduleResolution` to `"bundler"` or `"node16"` in your `tsconfig.json`. This is required for sub-path exports to resolve correctly.
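For example, a minimal `tsconfig.json` fragment (the `module` value here is just one workable pairing; adjust to your build setup):

```json
{
  "compilerOptions": {
    "module": "ESNext",
    "moduleResolution": "bundler"
  }
}
```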
"Cannot find module @timemachine-sdk/sdk"
Run `npm install @timemachine-sdk/sdk`. Make sure the package is in your `dependencies`, not just `devDependencies` (unless you only need it in dev).