@timemachine-sdk/sdk
AI Agent Observability SDK. Capture every execution step, fork from any point, replay with modifications, and compare results — with zero impact on your agent's performance.
Platform Features
Fork & Replay
Fork any execution at any step and replay from that point forward. Only the steps after the fork point are re-executed.
- Fork from any step in the execution graph
- Modify inputs, prompts, or tool configurations
- Replay only from the fork point (not from scratch)
- Compare original vs forked execution side-by-side
Step-by-Step Tracking
Every action your agent takes is captured with full context — inputs, outputs, state snapshots, token usage, and costs.
- LLM calls, tool use, decisions, retrievals
- Full state snapshot at each step
- Token usage and cost per step
- Latency tracking and performance metrics
Visual Diff & Model Comparison
Run the same prompt through different models simultaneously. See outputs side-by-side with diff highlighting.
- Dual-pane comparison across 8+ models
- Word-level diff highlighting (added/removed)
- Token, latency, and cost metrics per model
Review Queue
Human-in-the-loop feedback workflow. Reviewers mark outputs as correct or wrong, and developers get debug packages.
- Three-phase workflow: Pending → Wrong → Resolved
- One-click debug package generation
- Batch replay & validate
Data Drift Detection
Detect when agent outputs change for the same inputs over time. Variable analysis pinpoints root causes.
- Auto-detect output drift across executions
- Variable-by-variable root cause analysis
- Visual divergence timeline
Execution Timeline
Interactive Gantt chart visualization. Spot bottlenecks instantly with cascading bars color-coded by step type.
- Cascading Gantt bars by type (LLM, tool, decision)
- Collapsible trace tree with hierarchy
- Zoom, pan, and keyboard navigation
Installation
Requires Node.js 18+ or any modern JavaScript runtime (Bun, Deno).
```bash
npm install @timemachine-sdk/sdk
```
Sub-path exports
Import only what you need to keep your bundle minimal:
```ts
// Core — client, execution, step recorder, types
import { TimeMachine, Execution, StepRecorder } from '@timemachine-sdk/sdk';

// Adapters — LangChain callback handler
import { TimeMachineCallbackHandler, createLangChainHandler } from '@timemachine-sdk/sdk/adapters';

// Utilities — cost calculation, token extraction
import { calculateCost, hasModelPricing, normalizeModelName } from '@timemachine-sdk/sdk/utils';
```
Get Your API Key
Before using the SDK, you need an API key. Follow these steps to get one from the Time Machine dashboard.
1. Sign in to the dashboard

Create an account or log in to the Time Machine dashboard.

2. Create a project

Once logged in, click New Project and give it a name (e.g. "my-agent"). A project groups all your executions together.
3. Copy your API key
Your API key is displayed once when the project is created. It starts with tm_.
The API key is shown only once. Copy it immediately and store it somewhere safe. If you lose it, you'll need to generate a new one from project settings.
4. Set your environment variable
```bash
export TIMEMACHINE_API_KEY=tm_your_key_here
```
Or add it to a .env file in your project root:
```
TIMEMACHINE_API_KEY=tm_your_key_here
```
Quick Start
Don't have an API key yet? Get one here.
1. Initialize the client
```ts
import { TimeMachine } from '@timemachine-sdk/sdk';

const tm = new TimeMachine({
  apiKey: process.env.TIMEMACHINE_API_KEY!,
  // baseUrl defaults to https://api.timemachine.dev
});
```
2. Capture an execution
```ts
const execution = await tm.startExecution({
  name: 'customer-support-agent',
  metadata: { userId: 'user_123', environment: 'production' },
});

// Record an LLM call step
const step = execution.step('llm_call', {
  model: 'gpt-4o',
  prompt: 'Analyze the customer request...',
});

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Help me reset my password' }],
});

await step.complete({
  output: { message: response.choices[0].message.content },
  tokensIn: response.usage?.prompt_tokens,
  tokensOut: response.usage?.completion_tokens,
});

// Mark execution as done
await execution.complete();
```
3. LangChain integration (automatic capture)
```ts
import { TimeMachine } from '@timemachine-sdk/sdk';
import { createLangChainHandler } from '@timemachine-sdk/sdk/adapters';

const tm = new TimeMachine({ apiKey: process.env.TIMEMACHINE_API_KEY! });

const { handler, execution } = await createLangChainHandler(tm, {
  name: 'research-agent',
  metadata: { model: 'gpt-4o' },
});

// Every LLM call, tool use, and decision is captured automatically
const result = await agent.invoke(
  { input: 'Research quantum computing trends' },
  { callbacks: [handler] },
);

await execution.complete();
```
Core Concepts
Execution
An execution represents one complete run of your AI agent — from start to finish. It has a name, optional metadata, and contains a sequence of steps. An execution can be running, completed, or failed.
Step
A step is a single action within an execution. Every LLM call, tool use, decision, or retrieval is a step. Steps capture type, input, output, token counts, cost, latency, tool calls, and optional state snapshots for fork & replay.
Fork & Replay
The killer feature: fork any execution at any step and replay from that point forward with modifications. Only the steps after the fork point are re-executed — prior steps are reused. This lets you debug agent failures without re-running the entire pipeline.
TimeMachine
The main entry point. Create one instance and reuse it across your application.
Constructor
```ts
const tm = new TimeMachine({
  apiKey: 'tm_...',        // required
  baseUrl: 'https://...',  // default: https://api.timemachine.dev
  maxRetries: 3,           // default: 3 (exponential backoff)
  debug: false,            // default: false
});
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| apiKey | string | required | Your API key (format: tm_...) |
| baseUrl | string | https://api.timemachine.dev | API endpoint URL |
| maxRetries | number | 3 | Retries with exponential backoff |
| debug | boolean | false | Log SDK activity to console |
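The table only states that retries use exponential backoff; the exact schedule is not documented. As an illustration, a typical implementation doubles a base delay per attempt. The 500 ms base and the helper names below are assumptions for the sketch, not the SDK's actual internals:

```typescript
// Hypothetical illustration of maxRetries with exponential backoff.
// baseMs = 500 is an assumed value, not taken from the SDK.
function backoffDelayMs(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt; // attempt 0 -> 500, 1 -> 1000, 2 -> 2000
}

async function withRetries<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxRetries) {
        // Wait before the next attempt, doubling each time
        await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
      }
    }
  }
  throw lastError;
}
```

With maxRetries: 3 this means up to four total attempts before the request fails.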
tm.startExecution(options?)
Creates a new execution and returns an Execution instance.
```ts
const execution = await tm.startExecution({
  name: 'my-agent-run',
  metadata: { model: 'gpt-4o', version: '1.2.0' },
});
```
| Parameter | Type | Description |
|---|---|---|
| name | string | Human-readable name for the execution |
| metadata | Record<string, unknown> | Arbitrary key-value data attached to the execution |
Execution
Represents a running execution. Created via tm.startExecution().
Properties
| Property | Type | Description |
|---|---|---|
| id | string | Unique execution ID (read-only) |
| projectId | string | Project ID from the API (read-only) |
execution.step(type, input?)
Creates a new step recorder. The latency timer starts immediately.
```ts
const step = execution.step('llm_call', {
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});
```
execution.complete()
Marks the execution as completed. Flushes any pending batched steps before completing.
```ts
await execution.complete();
```
execution.fail(error)
Marks the execution as failed with error details. Accepts an Error object or a string.
```ts
await execution.fail(new Error('LLM returned invalid JSON'));
// or
await execution.fail('Rate limited by OpenAI');
```
execution.getStatus()
Returns the current status.
```ts
const status = execution.getStatus();
// 'running' | 'completed' | 'failed'
```
StepRecorder
Records a single step. Created via execution.step(). Latency is auto-calculated from creation time.
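Auto-calculated latency is presumably just a timestamp delta between step creation and completion. A minimal sketch of that idea (an illustration, not the SDK's actual implementation):

```typescript
// Sketch: capture a start time at creation, compute the delta on complete.
class LatencyTimer {
  constructor(private readonly startedAt: number = Date.now()) {}

  // Milliseconds elapsed since the timer was created
  elapsedMs(now: number = Date.now()): number {
    return now - this.startedAt;
  }
}
```

This is why latencyMs can be omitted from step.complete(): the recorder already knows when the step started.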
step.complete(options?)
Marks the step as completed with optional output and metrics.
```ts
await step.complete({
  output: { response: 'Here is the answer...' },
  tokensIn: 150,
  tokensOut: 300,
  cost: 0.0045,
  latencyMs: 1200, // auto-calculated if omitted
  toolCalls: [
    { name: 'web_search', input: { query: 'news' }, output: { results: [...] } },
  ],
  stateSnapshot: {
    agentState: { memory: [...], plan: [...] },
  },
});
```
| Parameter | Type | Description |
|---|---|---|
| output | Record<string, unknown> | Output data from this step |
| stateSnapshot | object | Agent state snapshot for fork & replay |
| tokensIn | number | Number of input tokens |
| tokensOut | number | Number of output tokens |
| cost | number | Cost in USD |
| latencyMs | number | Latency in ms (auto-calculated if omitted) |
| toolCalls | ToolCall[] | Tool/function calls made during this step |
| error | StepError | Error details (step completes but with error info) |
step.fail(error)
```ts
await step.fail(new Error('API timeout'));
```
step.getStatus() / step.getIndex()
```ts
step.getStatus(); // 'running' | 'completed' | 'failed'
step.getIndex();  // 0-based index in execution sequence
```
Step Types
Steps are categorized by type for filtering and analysis in the dashboard.
| Type | Description | Typical Use |
|---|---|---|
| llm_call | LLM or chat model invocation | OpenAI, Anthropic, Google API calls |
| tool_use | Tool or function call | Web search, database queries, API calls |
| decision | Agent routing or planning | Agent selecting which tool to use |
| retrieval | RAG or document retrieval | Vector store queries, document fetches |
| human_input | Human-in-the-loop interaction | Approval prompts, user feedback |
| transform | Data transformation | Parsing, formatting, summarization |
| custom | Anything else | Custom logic, business rules |
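The type is just a string label passed to execution.step(). A hypothetical helper (not part of the SDK; the event names on the left are invented for illustration) shows how you might map your framework's events onto this taxonomy:

```typescript
// Only the step-type strings come from the docs above;
// the event names are hypothetical examples.
type StepType =
  | 'llm_call' | 'tool_use' | 'decision' | 'retrieval'
  | 'human_input' | 'transform' | 'custom';

function stepTypeForEvent(event: string): StepType {
  switch (event) {
    case 'chat_completion': return 'llm_call';
    case 'function_call':   return 'tool_use';
    case 'route_selected':  return 'decision';
    case 'vector_query':    return 'retrieval';
    case 'user_approval':   return 'human_input';
    case 'parse_output':    return 'transform';
    default:                return 'custom'; // anything else
  }
}
```

Unrecognized events fall back to custom, mirroring the table's catch-all row.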
Claude Code Integration
Automatically capture every Claude Code session as a traced execution you can inspect, replay, and fork — zero code changes needed.
How it works
Claude Code exposes lifecycle hooks — shell commands that fire on events like session start, tool use, prompt submission, and session end. Time Machine provides a hook bridge that receives these events via stdin and records them as execution steps.
Prerequisites
- Node.js >= 18 or Bun installed
- Claude Code CLI installed (claude command available)
- A running Time Machine instance (local dev or hosted)
- A Time Machine project with an API key (get one here)
Step 1: Install the SDK
```bash
# npm
npm install @timemachine-sdk/sdk

# bun
bun add @timemachine-sdk/sdk

# pnpm
pnpm add @timemachine-sdk/sdk
```
Step 2: Set Environment Variables
The bridge reads two environment variables. Add them to your shell profile (~/.zshrc, ~/.bashrc) or export them before launching Claude Code:
```bash
export TIMEMACHINE_API_KEY="tm_your-api-key-here"
export TIMEMACHINE_BASE_URL="https://app.timemachinesdk.dev" # or http://localhost:3000
```
Reload your shell with source ~/.zshrc. For debugging, set TIMEMACHINE_DEBUG=1 to see bridge logs in stderr.
Step 3: Install Hooks
Option A: Automatic (recommended) — Run the installer from your project directory:
```bash
node --input-type=module -e "
  import { installClaudeCodeHooks } from '@timemachine-sdk/sdk/claude-code-installer';
  const result = await installClaudeCodeHooks({ projectDir: process.cwd(), scope: 'local' });
  console.log(result);
"
```
This creates .claude/hooks/timemachine-bridge.mjs and merges hook entries into .claude/settings.local.json for all 11 lifecycle events.
Option A1: Shell wrapper with .env file (recommended for local use) — If Claude Code doesn't inherit your shell environment, use a wrapper that sources a .env file:
Create .claude/hooks/.env:
```
TIMEMACHINE_API_KEY="tm_your_project_key"
TIMEMACHINE_BASE_URL="https://app.timemachinesdk.dev"
```
Then create .claude/hooks/run-bridge.sh:
```bash
#!/bin/bash
set -a
source "$(dirname "$0")/.env"
set +a
exec node "$(dirname "$0")/timemachine-bridge.mjs" "$@"
```
Make it executable: chmod +x .claude/hooks/run-bridge.sh. Then point all hooks at the wrapper in .claude/settings.local.json.
Option B: Manual installation — Add hook entries to .claude/settings.local.json yourself:
```json
{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "node /absolute/path/to/.claude/hooks/timemachine-bridge.mjs"
          }
        ]
      }
    ]
  }
}
```
Repeat the same structure for all 11 events. Then create the bridge script:
```js
import { runClaudeCodeHookBridge } from '@timemachine-sdk/sdk/claude-code-bridge';

runClaudeCodeHookBridge().catch((error) => {
  console.error('[TimeMachine][ClaudeCodeBridge]', error);
  process.exitCode = 1;
});
```
All 11 Hook Events
| Event | Step Type | What's Recorded |
|---|---|---|
| SessionStart | custom | Session ID, working directory |
| UserPromptSubmit | human_input | The user's prompt text |
| PostToolUse | tool_use | Tool name, success output |
| PostToolUseFailure | tool_use | Tool name, error details |
| Notification | custom | Notification message |
| Stop | custom | Stop reason |
| SubagentStart | custom | Subagent lifecycle start |
| SubagentStop | custom | Subagent lifecycle end |
| PreCompact | custom | Context compaction |
| PermissionRequest | custom | Permission decision |
| SessionEnd | custom | Final status, transcript ingestion |
On SessionEnd, the bridge also parses Claude Code's transcript file (JSONL) to extract assistant messages and file edits that hooks don't capture directly.
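JSONL means one JSON object per line. A minimal sketch of the kind of parsing involved; the role and content field names are assumptions for illustration, not a documented transcript schema:

```typescript
// Parse a JSONL transcript, keeping only assistant message text.
// The { role, content } entry shape is assumed, not documented.
function assistantMessages(jsonl: string): string[] {
  return jsonl
    .split('\n')
    .filter((line) => line.trim().length > 0)        // skip blank lines
    .map((line) => JSON.parse(line) as { role?: string; content?: string })
    .filter((entry) => entry.role === 'assistant' && entry.content)
    .map((entry) => entry.content as string);
}
```

The real bridge also extracts file edits, but the line-by-line JSON.parse pattern is the core of any JSONL reader.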
Step 4: Verify the Setup
```bash
# Check hooks are configured
cat .claude/settings.local.json | python3 -m json.tool | grep -c '"hooks"'

# Check environment variables
echo $TIMEMACHINE_API_KEY   # should start with tm_
echo $TIMEMACHINE_BASE_URL  # should be your server URL

# Test the API key
curl -s -H "Authorization: Bearer $TIMEMACHINE_API_KEY" \
  $TIMEMACHINE_BASE_URL/api/v1/executions | head -c 200

# Run a session — then check your dashboard
claude
```
Architecture
Each hook invocation is a separate short-lived process. The bridge uses a file-based state store at ~/.timemachine/claude-code/ to correlate events across invocations:
File Layout
```
.claude/settings.local.json          # Hook configuration (11 events)
.claude/hooks/.env                   # API key + base URL (gitignored)
.claude/hooks/run-bridge.sh          # Shell wrapper — sources .env, runs node
.claude/hooks/timemachine-bridge.mjs # Entrypoint — imports SDK bridge

~/.timemachine/claude-code/          # Session state (auto-cleaned on SessionEnd)
  <session-id>.json                  # Maps session → executionId
```
Dashboard Features
- Filter by source — Use the “Claude Code” filter on the executions list
- Step timeline — See every prompt, tool call, and response in order
- Inspect step details — Click any step to see full input/output JSON
- Session replay — Scrub through the timeline to replay what happened
- Fork from any step — Right-click a step to fork the execution from that point
Security Best Practices
- Keep .claude/settings.local.json uncommitted
- Store TIMEMACHINE_API_KEY in your shell profile, direnv, or a secret manager
- Add .claude/hooks/.env to your .gitignore
- Never commit a live tm_... key in repo-controlled JSON
- Rotate the key immediately if it was committed or shared in screenshots
Troubleshooting
Hooks aren't firing
Make sure .claude/settings.local.json is valid JSON. Verify the command path is absolute and the file exists. Check that node or bun is in your PATH.
“TIMEMACHINE_API_KEY is required” error
Export the variable in the same shell where you run claude. If using a new tab, make sure it's in your shell profile. As a fallback, use the shell wrapper approach with a .env file.
Execution appears but has no steps
Set TIMEMACHINE_DEBUG=1 to see bridge logs. Test the bridge manually:
```bash
echo '{"session_id":"test","hook_event_name":"SessionStart"}' | node .claude/hooks/timemachine-bridge.mjs
```
Bridge state is stale
If a session ended abnormally, clean up: rm ~/.timemachine/claude-code/*.json
Quick Reference
```bash
# Install SDK
bun add @timemachine-sdk/sdk

# Set env vars
export TIMEMACHINE_API_KEY="tm_..."
export TIMEMACHINE_BASE_URL="https://app.timemachinesdk.dev"

# Install hooks (automatic)
node --input-type=module -e "
  import { installClaudeCodeHooks } from '@timemachine-sdk/sdk/claude-code-installer';
  await installClaudeCodeHooks({ projectDir: process.cwd(), scope: 'local' });
"

# Verify — run a session, check dashboard
claude
```
MCP Server
The @timemachine-sdk/mcp package exposes your project's runs, traces, and steps as MCP tools that Claude Code can call directly — without opening a browser. Inspect failures, walk through traces, and get aggregate stats all within the Claude Code terminal.
The MCP server uses the same v1 API, reads three environment variables, and communicates with Claude Code over stdio (no daemon, no port).
Installation
Add the following to your project's .claude/settings.json (or ~/.claude/settings.json for a global install):
```json
{
  "mcpServers": {
    "timemachine": {
      "command": "npx",
      "args": ["-y", "@timemachine-sdk/mcp"],
      "env": {
        "TIMEMACHINE_API_KEY": "tm_...",
        "TIMEMACHINE_PROJECT_ID": "proj_...",
        "TIMEMACHINE_BASE_URL": "https://app.timemachinesdk.dev"
      }
    }
  }
}
```
Restart Claude Code after saving. The server starts on demand — you'll see a timemachine entry in /mcp.
Available Tools
Six tools are registered. All return structured plain-text so Claude can reason over results directly:
| Tool | Description | Key params |
|---|---|---|
| list_executions | List executions with optional filters | status, runtime, limit (default 20) |
| get_execution | Full execution detail — name, status, cost, tokens, metadata | execution_id |
| get_steps | All steps with type, status, latency, input, output, and error detail | execution_id |
| get_failed_runs | Shortcut: recent failed executions with debug hints | limit (default 10) |
| tail_execution | Poll an in-progress execution until it reaches a terminal state | execution_id |
| get_project_stats | Aggregate stats across your last 100 runs — success rate, avg cost, avg tokens, p95 latency | — |
Example prompts
How it works
Each tool maps directly onto a v1 API call — for example, get_failed_runs issues GET /api/v1/executions?status=failed.

Roadmap: Querying runs is step one. Native replay — inspect a failure, fork from the problem step, re-run with a fix — is in development. See the native replay roadmap.
CLI — tm
@timemachine-sdk/cli gives you a native terminal interface to your Time Machine project. List runs, tail live executions, inspect traces, fork at a failed step, and open the dashboard — all from your shell, without touching a browser.
The CLI is the fastest way to debug a failure: tm failed shows the last crash, tm view <id> walks the full trace, tm fork <id> --replay re-runs from the broken step.
Install
```bash
npm install -g @timemachine-sdk/cli
# or run without installing:
npx @timemachine-sdk/cli --help
```
Configuration
Run tm config set once to store credentials in ~/.timemachine/config.json:
```bash
tm config set --api-key tm_... --project-id proj_...

# Or use environment variables:
export TIMEMACHINE_API_KEY=tm_...
export TIMEMACHINE_PROJECT_ID=proj_...
```
Commands
| Command | Description |
|---|---|
| tm ls | List recent executions in a color-coded table (status, cost, tokens, duration) |
| tm ls --status failed | Filter by status: running, completed, failed, or cancelled |
| tm ls --runtime langchain | Filter by runtime tag |
| tm view <id> | Full trace — all steps with type, latency, cost, LLM output snippet, and tool name |
| tm view <id> --json | Raw JSON output — pipe to jq or save for diffing |
| tm tail [id] | Stream a live execution, printing new steps as they arrive. Omit ID to tail the latest. |
| tm tail --all | Show all existing steps on attach, then stream new ones |
| tm failed | Recent failed executions with one-line debug hints |
| tm fork <id> | Fork an execution at a chosen step (interactive step picker) |
| tm fork <id> --at 3 | Fork at step index 3 directly |
| tm fork <id> --at 3 --replay | Fork and immediately start replay |
| tm stats | Aggregate stats across last 100 runs: success rate, avg cost, avg tokens, p95 latency |
| tm open <id> | Open execution in dashboard (browser) |
| tm setup | Install Claude Code hooks — same as installClaudeCodeHooks() but from the terminal |
| tm config show | Display current config (key redacted) |
Typical debug workflow
```bash
# 1. See what failed
tm failed

# 2. Inspect the trace
tm view exec_abc123

# 3. Fork at the broken step and replay
tm fork exec_abc123 --at 4 --replay

# 4. Watch the replay live
tm tail exec_def456
```
Claude Code integration: Run tm setup to install hooks that automatically capture every Claude Code session as a traced execution — no code changes needed. Then use tm tail to watch your Claude session live.
Eval Platform
Ship agent changes with confidence. Time Machine's eval platform lets you define test suites of real production inputs, assert on outputs, and gate deployments on passing scores — all backed by the same fork & replay infrastructure that powers the dashboard.
Every eval run is a replay: your test case inputs are forked through your live agent, results are scored by assertion, and a 0–1 score rolls up per suite. Wire it into CI/CD and every PR gets an automated quality gate.
Key concepts
Eval Suite
A named collection of test cases — e.g. "Customer Support Quality" or "Reasoning Accuracy".
Test Case
One input (and optional expected output) your agent will be replayed against.
Eval Run
One execution of a full suite — forks each case, runs assertions, returns a 0–1 score.
Assertion
A pass/fail check on the agent's output: contains, regex, llm_judge, cost_under, and more.
Score
Aggregate 0.0–1.0 across all assertions in a run. Set a minimum threshold in CI.
LLM Judge
Ask a language model to grade output quality against a rubric — subjective tests made quantifiable.
Create a suite
Create via the dashboard (Evals → New Suite) or the API:
```ts
import { TimeMachine } from '@timemachine-sdk/sdk';

const tm = new TimeMachine({ apiKey: process.env.TIMEMACHINE_API_KEY! });

const suite = await tm.createEvalSuite({
  name: 'Customer Support Quality',
  description: 'Verify response accuracy, tone, and latency',
  agentEndpoint: 'https://your-api.com/agent',
  tags: ['production', 'support'],
});

// Add test cases
await tm.addEvalCase(suite.id, {
  input: { message: 'How do I reset my password?' },
  expectedOutput: { contains: 'reset link' },
  tags: ['auth'],
});

await tm.addEvalCase(suite.id, {
  input: { message: 'Cancel my subscription' },
  tags: ['billing'],
});
```
Save cases from production
The fastest way to build a test suite: save real executions directly from the dashboard. Click any execution → Save as eval case. The input is captured and linked to the suite of your choice.
Run a suite
Three ways to trigger an eval run:
```bash
# Run a suite and wait for results
tm eval run suite_abc123 --wait

# Run with a minimum pass threshold (CI use-case)
tm eval run suite_abc123 --wait --threshold 0.9

# Check status of a previous run
tm eval status run_def456

# List recent runs
tm eval list suite_abc123
```
Assertions
Assertions are the scoring rules for each test case. A case passes if all its assertions pass; partial passes score proportionally. Assertions are defined per case and evaluated after each eval run.
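Proportional scoring reduces to a pass ratio. A sketch assuming every assertion is weighted equally (the actual weighting is not documented):

```typescript
// Hypothetical score rollup: fraction of passing assertions, 0-1.
// Equal weighting and the empty-case behavior are assumptions.
function caseScore(results: boolean[]): number {
  if (results.length === 0) return 1; // no assertions: trivially passing
  const passed = results.filter(Boolean).length;
  return passed / results.length;
}
```

Under this model, a case with four assertions where two pass contributes 0.5 to the suite's aggregate.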
| Type | Description | Config |
|---|---|---|
| contains | Output string includes a substring (case-insensitive) | { value: "reset link" } |
| not_contains | Output does not include a substring | { value: "error" } |
| regex | Output matches a regular expression | { pattern: "order #\\d+" } |
| llm_judge | LLM grades output against a rubric (0–1 score) | { rubric: "Is the response helpful and accurate?", threshold: 0.8 } |
| json_valid | Output is valid JSON | {} |
| json_path | A JSON path equals an expected value | { path: "$.status", value: "success" } |
| cost_under | Total execution cost is below threshold ($USD) | { maxCost: 0.05 } |
| latency_under | Total execution latency is below threshold (ms) | { maxLatencyMs: 3000 } |
| step_count | Execution has exactly N steps | { count: 5 } |
| custom | Custom JS function evaluated server-side | { fn: "(output) => output.length > 10" } |
Assertion example
```ts
await tm.addEvalCase(suite.id, {
  input: { query: 'Summarise this article in 3 bullet points.' },
  assertions: [
    // Output must contain bullet points
    { type: 'contains', value: '•' },
    // Graded by LLM on conciseness + accuracy
    {
      type: 'llm_judge',
      rubric: 'Does the response contain exactly 3 concise bullet points that accurately summarise the article?',
      threshold: 0.8,
    },
    // Must complete within 5s and under $0.02
    { type: 'latency_under', maxLatencyMs: 5000 },
    { type: 'cost_under', maxCost: 0.02 },
  ],
});
```
LLM Judge: Use sparingly in CI — each judge call adds LLM cost per run. A good pattern is to combine cheap structural assertions (contains, regex) as fast gates and reserve llm_judge for nightly or pre-release runs.
CI/CD Integration
Block merges on eval regressions. Add the eval run as a required status check and every PR automatically gates on your quality threshold.
The flow: PR opened → GitHub Actions workflow runs → CLI triggers your suite via the API → polls for completion → exits non-zero if score is below threshold → PR blocked until green.
GitHub Actions setup
1. Add TIMEMACHINE_API_KEY to your repo Secrets.
2. Add EVAL_SUITE_ID to your repo Variables.
3. Add this workflow:
```yaml
name: Eval Suite

on:
  pull_request:
    branches: [main]
  workflow_dispatch:

jobs:
  evals:
    name: Run eval suite
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run eval suite (threshold 0.9)
        env:
          TIMEMACHINE_API_KEY: ${{ secrets.TIMEMACHINE_API_KEY }}
        run: |
          npx @timemachine-sdk/cli eval run ${{ vars.EVAL_SUITE_ID }} \
            --wait \
            --threshold 0.9

      # Optional: post score as PR comment
      - name: Post eval score
        if: always()
        env:
          TIMEMACHINE_API_KEY: ${{ secrets.TIMEMACHINE_API_KEY }}
          GH_TOKEN: ${{ github.token }}
        run: |
          SCORE=$(npx @timemachine-sdk/cli eval status --latest --format score)
          gh pr comment ${{ github.event.pull_request.number }} \
            --body "**Eval score:** ${SCORE} / 1.0"
```
Recommended thresholds
| Environment | Threshold | Rationale |
|---|---|---|
| Safety-critical (healthcare, finance) | 1.0 | No regressions tolerated |
| Production | 0.9 | Up to 10% failure rate acceptable |
| Staging / pre-release | 0.8 | Catch regressions early without blocking velocity |
| Experimental / nightly | 0.7 | Track trends; don't block iteration |
Webhook-triggered runs
Trigger runs without installing the CLI — useful for serverless environments or non-GitHub CI:
```bash
# Trigger a run via API
RUN=$(curl -s -X POST \
  -H "Authorization: Bearer $TIMEMACHINE_API_KEY" \
  -H "Content-Type: application/json" \
  https://app.timemachinesdk.dev/api/v1/eval/suites/$SUITE_ID/runs)

RUN_ID=$(echo $RUN | jq -r '.id')

# Poll until terminal
while true; do
  STATUS=$(curl -s \
    -H "Authorization: Bearer $TIMEMACHINE_API_KEY" \
    https://app.timemachinesdk.dev/api/v1/eval/runs/$RUN_ID/status)

  STATE=$(echo $STATUS | jq -r '.status')
  SCORE=$(echo $STATUS | jq -r '.score')

  [ "$STATE" = "completed" ] && break
  [ "$STATE" = "failed" ] && exit 1
  sleep 5
done

# Fail if below threshold
awk "BEGIN { exit ($SCORE < 0.9) }" || exit 1
```
Pro tip: Tag your suites by severity — critical, regression, nightly. Run only critical tagged suites on every PR (fast, cheap), regression on merge, and full nightly on a schedule.
LangChain Adapter
Automatically captures all LLM calls, tool invocations, agent decisions, and retrievals — zero manual instrumentation.
createLangChainHandler(tm, options?)
One-liner to create an execution + callback handler. This is the recommended approach.
```ts
import { createLangChainHandler } from '@timemachine-sdk/sdk/adapters';

const { handler, execution } = await createLangChainHandler(tm, {
  name: 'research-agent',
  metadata: { model: 'gpt-4o' },
  debug: false,
  autoCalculateCost: true,
  maxDocumentLength: 500,
});

await agent.invoke(input, { callbacks: [handler] });
await execution.complete();
```
| Option | Type | Default | Description |
|---|---|---|---|
| name | string | — | Execution name |
| metadata | Record<string, unknown> | — | Execution metadata |
| debug | boolean | false | Log captured events to console |
| autoCalculateCost | boolean | true | Auto-calculate cost from token counts |
| maxDocumentLength | number | 500 | Max characters for retrieved documents |
What gets captured automatically
| LangChain Event | Step Type | What's Recorded |
|---|---|---|
| LLM / Chat Model call | llm_call | Model name, messages, tokens, cost, latency |
| Tool invocation | tool_use | Tool name, input, output, latency |
| Agent action | decision | Action type, tool selection, input |
| Agent finish | decision | Final output, return values |
| Retriever call | retrieval | Query, documents (truncated), doc count |
Security: Sensitive parameters (api_key, apiKey, callbacks) are automatically stripped from captured data.
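Stripping sensitive fields typically means a recursive filter over the captured payload. A sketch; the field list comes from the note above, but the implementation itself is assumed, not the adapter's actual code:

```typescript
// Recursively drop sensitive keys from a captured payload.
// Field names match the note above; the logic is illustrative.
const SENSITIVE = new Set(['api_key', 'apiKey', 'callbacks']);

function stripSensitive(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(stripSensitive);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>)
        .filter(([key]) => !SENSITIVE.has(key))   // drop sensitive keys
        .map(([key, v]) => [key, stripSensitive(v)]), // recurse into children
    );
  }
  return value; // primitives pass through unchanged
}
```

Nested occurrences (e.g. an api_key inside a client config object) are removed as well.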
OpenRouter
OpenRouter provides a unified API across 200+ models — Anthropic, OpenAI, Google, DeepSeek, Qwen, Llama, and more — through a single endpoint and API key. Time Machine works natively with OpenRouter with zero extra configuration.
Setup
```ts
import { TimeMachine } from '@timemachine-sdk/sdk';

const tm = new TimeMachine({
  apiKey: process.env.TIMEMACHINE_API_KEY!,
  // No changes needed — configure OpenRouter in your LLM client directly
});

// Use OpenRouter as your LLM provider
import OpenAI from 'openai';

const openrouter = new OpenAI({
  apiKey: process.env.OPENROUTER_API_KEY!,
  baseURL: 'https://openrouter.ai/api/v1',
  defaultHeaders: {
    'HTTP-Referer': 'https://your-app.com', // optional, for rankings
    'X-Title': 'Your App Name',             // optional
  },
});
```
Tracking OpenRouter calls
Capture any model routed through OpenRouter — the model name is passed through transparently.
```ts
const execution = await tm.startExecution({
  name: 'openrouter-agent',
  metadata: { router: 'openrouter' },
});

const step = execution.step('llm_call', {
  model: 'anthropic/claude-opus-4', // OpenRouter model ID
  messages: [{ role: 'user', content: 'Explain quantum entanglement' }],
});

const response = await openrouter.chat.completions.create({
  model: 'anthropic/claude-opus-4',
  messages: [{ role: 'user', content: 'Explain quantum entanglement' }],
});

await step.complete({
  output: { message: response.choices[0].message.content },
  tokensIn: response.usage?.prompt_tokens,
  tokensOut: response.usage?.completion_tokens,
});

await execution.complete();
```
LangChain + OpenRouter
The LangChain adapter works seamlessly — just point your ChatOpenAI instance at OpenRouter.
```ts
import { ChatOpenAI } from '@langchain/openai';
import { createLangChainHandler } from '@timemachine-sdk/sdk/adapters';

const model = new ChatOpenAI({
  modelName: 'google/gemini-2.5-pro', // or any OpenRouter model
  openAIApiKey: process.env.OPENROUTER_API_KEY!,
  configuration: {
    baseURL: 'https://openrouter.ai/api/v1',
  },
});

const { handler, execution } = await createLangChainHandler(tm, {
  name: 'gemini-via-openrouter',
});

// All LLM calls automatically captured
const result = await model.invoke('What is the latest news?', {
  callbacks: [handler],
});

await execution.complete();
```
Why use OpenRouter with Time Machine
| Benefit | Detail |
|---|---|
| Single API key | Access 200+ models — no separate accounts for Anthropic, OpenAI, Google, etc. |
| Model fallback | Configure automatic fallback if a model is unavailable or rate-limited |
| Cost optimization | Route cheap tasks to smaller models, complex ones to frontier models |
| Unified billing | One invoice for all LLM costs across providers |
| Model comparison | Easily A/B test models by swapping the model string — Time Machine captures both |
Tip: OpenRouter model IDs use the format provider/model-name (e.g. deepseek/deepseek-r1, qwen/qwen3-235b-a22b). Time Machine stores the full model string in your execution trace for accurate attribution.
Utilities & Cost Tracking
Built-in pricing for 30+ models. Auto-calculated in the LangChain adapter, or use directly.
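The cost math is plain per-1k-token pricing. As a sanity check, the formula implied by the { inputPer1k, outputPer1k } pricing shape is (a reimplementation for illustration, not the SDK's code):

```typescript
// Per-1k-token cost formula implied by the pricing shape.
function costUsd(
  tokensIn: number,
  tokensOut: number,
  pricing: { inputPer1k: number; outputPer1k: number },
): number {
  return (tokensIn / 1000) * pricing.inputPer1k +
         (tokensOut / 1000) * pricing.outputPer1k;
}

// 1000 input + 500 output tokens at inputPer1k 0.005 / outputPer1k 0.015:
const example = costUsd(1000, 500, { inputPer1k: 0.005, outputPer1k: 0.015 });
// -> 0.0125 USD
```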
```ts
import {
  calculateCost,
  hasModelPricing,
  getModelPricing,
  normalizeModelName,
  configureFallbackPricing,
  extractTokensFromLLMResult,
} from '@timemachine-sdk/sdk/utils';

// Calculate cost for known models
const cost = calculateCost('gpt-4o', 1000, 500);
// => 0.0125 (USD)

// Check model pricing availability
hasModelPricing('gpt-4o');          // true
hasModelPricing('my-custom-model'); // false

// Get pricing details
getModelPricing('gpt-4o');
// => { inputPer1k: 0.005, outputPer1k: 0.015 }

// Normalize model names (strips version suffixes)
normalizeModelName('gpt-4-0125-preview');       // 'gpt-4'
normalizeModelName('claude-3-sonnet-20240229'); // 'claude-3-sonnet'

// Configure fallback pricing for unknown models
configureFallbackPricing({
  inputPer1k: 0.002,
  outputPer1k: 0.006,
  enabled: true,
});

// Extract tokens from LLM results (multi-provider)
const { tokensIn, tokensOut } = extractTokensFromLLMResult(llmResult);
```
Guide: Manual Step Recording
For custom agents or frameworks without a built-in adapter.
```ts
import { TimeMachine } from '@timemachine-sdk/sdk';

const tm = new TimeMachine({ apiKey: process.env.TIMEMACHINE_API_KEY! });

async function runAgent(query: string) {
  const execution = await tm.startExecution({
    name: 'research-agent',
    metadata: { query, timestamp: Date.now() },
  });

  try {
    // Step 1: Plan
    const planStep = execution.step('decision', { action: 'plan', query });
    const plan = await generatePlan(query);
    await planStep.complete({ output: { plan } });

    // Step 2: Search
    const searchStep = execution.step('tool_use', {
      tool: 'web_search',
      query: plan.searchQuery,
    });
    const results = await webSearch(plan.searchQuery);
    await searchStep.complete({
      output: { resultCount: results.length, results },
    });

    // Step 3: Synthesize
    const llmStep = execution.step('llm_call', {
      model: 'gpt-4o',
      context: results,
    });
    const answer = await callLLM(query, results);
    await llmStep.complete({
      output: { answer },
      tokensIn: answer.usage.prompt_tokens,
      tokensOut: answer.usage.completion_tokens,
    });

    await execution.complete();
    return answer;
  } catch (error) {
    await execution.fail(error as Error);
    throw error;
  }
}
```

Guide: Multi-Step Workflows
For agents with sequential or branching logic.
```ts
const execution = await tm.startExecution({ name: 'multi-step-workflow' });

// Step 1: Classify the request
const classifyStep = execution.step('llm_call', { action: 'classify' });
const category = await classifyRequest(userInput);
await classifyStep.complete({ output: { category } });

// Step 2: Route based on classification
const routeStep = execution.step('decision', { category });
const handler = selectHandler(category);
await routeStep.complete({ output: { handler: handler.name } });

// Step 3+: Conditional execution
if (category === 'needs_research') {
  const retrieveStep = execution.step('retrieval', { query: userInput });
  const docs = await vectorStore.similaritySearch(userInput);
  await retrieveStep.complete({ output: { documentCount: docs.length } });

  const answerStep = execution.step('llm_call', { model: 'gpt-4o', context: docs });
  const answer = await generateAnswer(userInput, docs);
  await answerStep.complete({
    output: { answer },
    tokensIn: 2000,
    tokensOut: 500,
  });
}

await execution.complete();
```

Guide: Error Handling
The SDK is fail-open — it never crashes your app. But you should still record failures for debugging.
```ts
const execution = await tm.startExecution({ name: 'agent-run' });

try {
  const step = execution.step('tool_use', { tool: 'database_query' });
  const result = await queryDatabase(sql);
  await step.complete({ output: { rows: result.length } });

  await execution.complete();
} catch (error) {
  // Records the error in Time Machine for debugging
  await execution.fail(error as Error);
  throw error;
}
```

Enable debug mode to see SDK activity in your console:

```ts
const tm = new TimeMachine({
  apiKey: process.env.TIMEMACHINE_API_KEY!,
  debug: true, // Logs all SDK requests and errors
});
```

Guide: Express / Fastify
Wrap your API route handlers with execution tracking.
```ts
import express from 'express';
import { TimeMachine } from '@timemachine-sdk/sdk';

const app = express();
const tm = new TimeMachine({ apiKey: process.env.TIMEMACHINE_API_KEY! });

app.post('/api/chat', async (req, res) => {
  const execution = await tm.startExecution({
    name: 'chat-endpoint',
    metadata: {
      userId: req.body.userId,
      sessionId: req.body.sessionId,
    },
  });

  try {
    const step = execution.step('llm_call', {
      model: 'gpt-4o',
      messages: req.body.messages,
    });

    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: req.body.messages,
    });

    await step.complete({
      output: { message: response.choices[0].message },
      tokensIn: response.usage?.prompt_tokens,
      tokensOut: response.usage?.completion_tokens,
    });

    await execution.complete();
    res.json({ message: response.choices[0].message.content });
  } catch (error) {
    await execution.fail(error as Error);
    res.status(500).json({ error: 'Internal server error' });
  }
});
```

Types Reference
Full TypeScript coverage — no any types.
```ts
// Client configuration
interface TimeMachineConfig {
  apiKey: string;
  baseUrl?: string;    // default: 'https://api.timemachine.dev'
  maxRetries?: number; // default: 3
  debug?: boolean;     // default: false
}

// Execution creation
interface CreateExecutionRequest {
  name?: string;
  metadata?: Record<string, unknown>;
}

// Step completion
interface StepCompleteOptions {
  output?: Record<string, unknown>;
  stateSnapshot?: Omit<StateSnapshot, 'stepId' | 'timestamp'>;
  tokensIn?: number;
  tokensOut?: number;
  cost?: number;
  latencyMs?: number;
  toolCalls?: ToolCall[];
  error?: StepError;
}

// Tool call record
interface ToolCall {
  name: string;
  input: Record<string, unknown>;
  output?: Record<string, unknown>;
}

// Error record
interface StepError {
  message: string;
  stack?: string;
}

// Status types
type ExecutionStatus = 'running' | 'completed' | 'failed';
type StepStatus = 'running' | 'completed' | 'failed';
type StepType =
  | 'llm_call'
  | 'tool_use'
  | 'decision'
  | 'retrieval'
  | 'human_input'
  | 'transform'
  | 'custom';

// Utility types
interface TokenUsage { tokensIn: number; tokensOut: number; }
interface ModelPricing { inputPer1k: number; outputPer1k: number; }
interface FallbackPricingConfig {
  inputPer1k: number;
  outputPer1k: number;
  enabled: boolean;
}
```

Supported Models
Built-in pricing for frontier and open-source models (2025). For unlisted models, use `configureFallbackPricing()`. Prices are in USD per 1,000 tokens, approximate and subject to provider changes.
Anthropic — Claude 4 series
Current flagship family (2025)
| Model | Input ($/1k) | Output ($/1k) | Notes |
|---|---|---|---|
| claude-opus-4-6 | $0.01500 | $0.07500 | Most powerful, coding & reasoning |
| claude-sonnet-4-6 | $0.00300 | $0.01500 | Best balance of speed & quality |
| claude-haiku-4-5 | $0.00080 | $0.00400 | Fastest, lowest cost |
| claude-3.5-sonnet | $0.00300 | $0.01500 | Previous gen, still widely used |
| claude-3.5-haiku | $0.00080 | $0.00400 | Previous gen fast model |
OpenAI — GPT-4.x & o-series
Including reasoning models (2025)
| Model | Input ($/1k) | Output ($/1k) | Notes |
|---|---|---|---|
| gpt-4.5 | $0.07500 | $0.15000 | Multimodal frontier, highest quality |
| gpt-4.1 | $0.00200 | $0.00800 | Efficient, strong coding |
| gpt-4o | $0.00500 | $0.01500 | Omni model, vision + text |
| gpt-4o-mini | $0.00015 | $0.00060 | Fast & cheap for simple tasks |
| o3 | $0.01000 | $0.04000 | Reasoning model, complex problems |
| o4-mini | $0.00110 | $0.00440 | Reasoning, optimized for cost |
| o1 | $0.01500 | $0.06000 | Previous reasoning generation |
Google — Gemini 2.5
Long context, multimodal (2025)
| Model | Input ($/1k) | Output ($/1k) | Notes |
|---|---|---|---|
| gemini-2.5-pro | $0.00125 | $0.01000 | 1M context, top coding & reasoning |
| gemini-2.5-flash | $0.00008 | $0.00030 | Low latency, best value |
| gemini-2.0-flash | $0.00010 | $0.00040 | Previous gen flash |
| gemini-1.5-pro | $0.00125 | $0.00500 | Legacy, 2M context |
DeepSeek
Open-source, extremely cost-efficient
| Model | Input ($/1k) | Output ($/1k) | Notes |
|---|---|---|---|
| deepseek-r1 | $0.00014 | $0.00219 | Reasoning model, matches o1 quality |
| deepseek-v3 | $0.00007 | $0.00110 | MoE, strong general tasks |
| deepseek-r1-zero | $0.00014 | $0.00219 | RL-trained reasoning, no SFT |
Qwen — Alibaba
Strong multilingual & coding
| Model | Input ($/1k) | Output ($/1k) | Notes |
|---|---|---|---|
| qwen3-235b-a22b | $0.00022 | $0.00088 | Flagship MoE, top open model |
| qwen3-32b | $0.00018 | $0.00072 | Dense, strong reasoning |
| qwen2.5-72b | $0.00023 | $0.00069 | Previous gen, widely deployed |
| qwen2.5-coder-32b | $0.00015 | $0.00060 | Best-in-class code generation |
Kimi — Moonshot AI
Long-context specialist (Chinese frontier lab)
| Model | Input ($/1k) | Output ($/1k) | Notes |
|---|---|---|---|
| kimi-k2 | $0.00060 | $0.00250 | Agentic reasoning, 1M context |
| moonshot-v1-128k | $0.01200 | $0.01200 | Ultra long context |
| moonshot-v1-32k | $0.00400 | $0.00400 | Standard context window |
GLM — Zhipu AI
Chinese lab, strong bilingual performance
| Model | Input ($/1k) | Output ($/1k) | Notes |
|---|---|---|---|
| glm-5 | $0.00100 | $0.00300 | Latest flagship, vision + reasoning |
| glm-4-plus | $0.00070 | $0.00140 | Enhanced GLM-4, long context |
| glm-4 | $0.00014 | $0.00014 | Fast, cost-effective baseline |
Meta — Llama 4
Open weights, free to self-host
| Model | Input ($/1k) | Output ($/1k) | Notes |
|---|---|---|---|
| llama-4-maverick | $0.00019 | $0.00085 | 17B MoE, multimodal |
| llama-4-scout | $0.00017 | $0.00017 | 17B MoE, ultra-efficient |
| llama-3.3-70b | $0.00023 | $0.00040 | Previous gen, solid baseline |
Mistral
| Model | Input ($/1k) | Output ($/1k) | Notes |
|---|---|---|---|
| mistral-large-2 | $0.00200 | $0.00600 | Top Mistral model |
| mistral-small-3 | $0.00010 | $0.00030 | Fast, lightweight |
| codestral | $0.00030 | $0.00090 | Code-specialized |
Pricing note: Prices are approximate as of mid-2025 and change frequently. For models not listed, use `configureFallbackPricing()` or pass `cost` directly in `step.complete()`. When using OpenRouter, pass the model string as-is (e.g. `deepseek/deepseek-r1`) — it will be stored in the execution trace for your records.
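The per-1k arithmetic implied by these tables can be sketched as follows. This is an illustration of the assumed formula (tokens ÷ 1,000 × price per 1k, summed for input and output), not the SDK's actual `calculateCost` implementation:

```typescript
// Assumed per-1k-token pricing formula, using the shape of the SDK's
// ModelPricing type. Illustrative only.
interface ModelPricing { inputPer1k: number; outputPer1k: number; }

function costFor(pricing: ModelPricing, tokensIn: number, tokensOut: number): number {
  return (tokensIn / 1000) * pricing.inputPer1k
       + (tokensOut / 1000) * pricing.outputPer1k;
}

// gpt-4o from the table above: $0.005 in, $0.015 out per 1k tokens
costFor({ inputPer1k: 0.005, outputPer1k: 0.015 }, 1000, 500);
// → 0.0125 (USD)
```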
Design Principles
Fail-Open
Never crashes your application. If the API is unreachable, the error is caught and logged internally, and your agent continues uninterrupted.
Zero Overhead
Steps are batched asynchronously (up to 10 per batch, flushed every 500ms) to minimize performance impact.
Framework Agnostic
Manual step recording works with any framework. LangChain adapter provides automatic capture.
Type Safe
Full TypeScript coverage with no any types. Autocomplete and compile-time checks everywhere.
Tree-Shakeable
Sub-path exports (/adapters, /utils) let you import only what you need. Minimal bundle impact.
Production Ready
Exponential backoff retries, automatic batching, sensitive data filtering, and graceful error handling.
Pricing
Core observability (executions, steps, fork & replay) is free for all plans. Eval runs are the primary usage metric — they consume compute to replay your agent against each test case.
Free
- 100 eval runs / month
- 1 eval suite
- 10 test cases
- Manual runs only
- Dashboard access
- Community support
Pro
Popular
- 2,000 eval runs / month
- Unlimited suites
- Unlimited test cases
- CI/CD integration
- LLM-as-judge assertions
- API + CLI access
- Email support
Team
- 10,000 eval runs / month
- Unlimited suites & cases
- 5 seats included
- Scheduled eval runs
- Slack / webhook alerts
- Priority support
- Usage dashboard
Enterprise
- Unlimited eval runs
- Unlimited seats
- SSO / SAML
- Audit logs
- SLA guarantee
- Dedicated support
- Custom integrations
Usage-based add-ons
Extra eval runs
$0.01 per run above plan limit
LLM judge tokens
Provider token cost + 20% margin
Extra seats (Team)
$29 / seat / month
All plans include: Unlimited executions captured, unlimited steps, fork & replay, dashboard access, SDK & API access, Claude Code integration, MCP server, and CLI. Eval runs are the only metered resource.
Troubleshooting
Steps are not appearing in the dashboard
Check your API key. Enable debug mode: `new TimeMachine({ apiKey: "...", debug: true })`. Make sure you call `await execution.complete()` — steps are flushed on completion.
LangChain adapter isn't capturing events
Make sure you pass the handler in the `callbacks` array: `await agent.invoke(input, { callbacks: [handler] })`. Without the `callbacks` option, nothing is captured.
Cost shows as 0
The model may not be in the built-in pricing table. Use `hasModelPricing()` to check. Configure fallback pricing with `configureFallbackPricing()` or pass `cost` directly in `step.complete()`.
TypeScript errors with sub-path imports
Set `moduleResolution` to `"bundler"` or `"node16"` in your `tsconfig.json`. This is required for sub-path exports to resolve correctly.
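For example, a minimal `tsconfig.json` fragment (the `module` value here is just one workable pairing; adjust to your build setup):

```json
{
  "compilerOptions": {
    "module": "ESNext",
    "moduleResolution": "bundler"
  }
}
```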
"Cannot find module @timemachine-sdk/sdk"
Run `npm install @timemachine-sdk/sdk`. Make sure the package is in your `dependencies`, not just `devDependencies` (unless you only need it in dev).