Documentation

@timemachine-sdk/sdk

AI Agent Observability SDK. Capture every execution step, fork from any point, replay with modifications, and compare results — with zero impact on your agent's performance.

Platform Features

Fork & Replay

Fork any execution at any step and replay from that point forward. Only the steps after the fork point are re-executed.

  • Fork from any step in the execution graph
  • Modify inputs, prompts, or tool configurations
  • Replay only from the fork point (not from scratch)
  • Compare original vs forked execution side-by-side

Step-by-Step Tracking

Every action your agent takes is captured with full context — inputs, outputs, state snapshots, token usage, and costs.

  • LLM calls, tool use, decisions, retrievals
  • Full state snapshot at each step
  • Token usage and cost per step
  • Latency tracking and performance metrics

Visual Diff & Model Comparison

Run the same prompt through different models simultaneously. See outputs side-by-side with diff highlighting.

  • Dual-pane comparison across 8+ models
  • Word-level diff highlighting (added/removed)
  • Token, latency, and cost metrics per model

Review Queue

Human-in-the-loop feedback workflow. Reviewers mark outputs as correct or wrong, developers get debug packages.

  • Three-phase workflow: Pending → Wrong → Resolved
  • One-click debug package generation
  • Batch replay & validate

Data Drift Detection

Detect when agent outputs change for the same inputs over time. Variable analysis pinpoints root causes.

  • Auto-detect output drift across executions
  • Variable-by-variable root cause analysis
  • Visual divergence timeline

Execution Timeline

Interactive Gantt chart visualization. Spot bottlenecks instantly with cascading bars color-coded by step type.

  • Cascading Gantt bars by type (LLM, tool, decision)
  • Collapsible trace tree with hierarchy
  • Zoom, pan, and keyboard navigation

Installation

Requires Node.js 18+ or any modern JavaScript runtime (Bun, Deno).

terminal
npm install @timemachine-sdk/sdk
# or, with another package manager:
yarn add @timemachine-sdk/sdk
bun add @timemachine-sdk/sdk
pnpm add @timemachine-sdk/sdk

Sub-path exports

Import only what you need to keep your bundle minimal:

imports.ts
// Core — client, execution, step recorder, types
import { TimeMachine, Execution, StepRecorder } from '@timemachine-sdk/sdk';

// Adapters — LangChain callback handler
import {
  TimeMachineCallbackHandler,
  createLangChainHandler,
} from '@timemachine-sdk/sdk/adapters';

// Utilities — cost calculation, token extraction
import {
  calculateCost,
  hasModelPricing,
  normalizeModelName,
} from '@timemachine-sdk/sdk/utils';

Get Your API Key

Before using the SDK, you need an API key. Follow these steps to get one from the Time Machine dashboard.

1. Create an account

Sign up for a free Time Machine account. No credit card required.


2. Create a project

Once logged in, click New Project and give it a name (e.g. "my-agent"). A project groups all your executions together.

3. Copy your API key

Your API key is displayed once when the project is created. It starts with tm_.

The API key is shown only once. Copy it immediately and store it somewhere safe. If you lose it, you'll need to generate a new one from project settings.

4. Set your environment variable

terminal
export TIMEMACHINE_API_KEY=tm_your_key_here

Or add it to a .env file in your project root:

.env
TIMEMACHINE_API_KEY=tm_your_key_here

Quick Start

Don't have an API key yet? See Get Your API Key above.

1. Initialize the client

agent.ts
import { TimeMachine } from '@timemachine-sdk/sdk';
const tm = new TimeMachine({
  apiKey: process.env.TIMEMACHINE_API_KEY!,
  // baseUrl defaults to https://api.timemachine.dev
});

2. Capture an execution

agent.ts
const execution = await tm.startExecution({
  name: 'customer-support-agent',
  metadata: { userId: 'user_123', environment: 'production' },
});

// Record an LLM call step
const step = execution.step('llm_call', {
  model: 'gpt-4o',
  prompt: 'Analyze the customer request...',
});

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Help me reset my password' }],
});

await step.complete({
  output: { message: response.choices[0].message.content },
  tokensIn: response.usage?.prompt_tokens,
  tokensOut: response.usage?.completion_tokens,
});

// Mark execution as done
await execution.complete();

3. LangChain integration (automatic capture)

langchain-agent.ts
import { TimeMachine } from '@timemachine-sdk/sdk';
import { createLangChainHandler } from '@timemachine-sdk/sdk/adapters';

const tm = new TimeMachine({ apiKey: process.env.TIMEMACHINE_API_KEY! });

const { handler, execution } = await createLangChainHandler(tm, {
  name: 'research-agent',
  metadata: { model: 'gpt-4o' },
});

// Every LLM call, tool use, and decision is captured automatically
const result = await agent.invoke(
  { input: 'Research quantum computing trends' },
  { callbacks: [handler] },
);

await execution.complete();

Core Concepts

Execution

An execution represents one complete run of your AI agent — from start to finish. It has a name, optional metadata, and contains a sequence of steps. An execution can be running, completed, or failed.

Step

A step is a single action within an execution. Every LLM call, tool use, decision, or retrieval is a step. Steps capture type, input, output, token counts, cost, latency, tool calls, and optional state snapshots for fork & replay.

Fork & Replay

The killer feature: fork any execution at any step and replay from that point forward with modifications. Only the steps after the fork point are re-executed — prior steps are reused. This lets you debug agent failures without re-running the entire pipeline.
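The mechanics can be sketched in a few lines of TypeScript (a conceptual model for illustration only, not the SDK's actual fork API):

```typescript
// Conceptual model of fork & replay (NOT the SDK API): steps before the
// fork point reuse their cached outputs; steps at or after it re-execute.
type RecordedStep = { name: string; cachedOutput: string };

function replayFrom(
  steps: RecordedStep[],
  forkIndex: number,
  rerun: (step: RecordedStep) => string,
): string[] {
  return steps.map((step, i) =>
    i < forkIndex ? step.cachedOutput : rerun(step),
  );
}

const trace: RecordedStep[] = [
  { name: 'plan', cachedOutput: 'plan-v1' },
  { name: 'search', cachedOutput: 'results-v1' },
  { name: 'answer', cachedOutput: 'answer-v1' },
];

// Fork at index 2: 'plan' and 'search' are reused; only 'answer' re-runs.
const replayed = replayFrom(trace, 2, (s) => `${s.name}-v2`);
// replayed: ['plan-v1', 'results-v1', 'answer-v2']
```

The real SDK additionally lets you modify inputs at the fork point before re-executing, but the reuse-before / re-run-after split is the core idea.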

TimeMachine

The main entry point. Create one instance and reuse it across your application.

Constructor

initialization
const tm = new TimeMachine({
  apiKey: 'tm_...',       // required
  baseUrl: 'https://...', // default: https://api.timemachine.dev
  maxRetries: 3,          // default: 3 (exponential backoff)
  debug: false,           // default: false
});
| Parameter | Type | Default | Description |
|---|---|---|---|
| `apiKey` | `string` | required | Your API key (format: `tm_...`) |
| `baseUrl` | `string` | `https://api.timemachine.dev` | API endpoint URL |
| `maxRetries` | `number` | `3` | Retries with exponential backoff |
| `debug` | `boolean` | `false` | Log SDK activity to console |

tm.startExecution(options?)

Creates a new execution and returns an Execution instance.

const execution = await tm.startExecution({
  name: 'my-agent-run',
  metadata: { model: 'gpt-4o', version: '1.2.0' },
});
| Parameter | Type | Description |
|---|---|---|
| `name` | `string` | Human-readable name for the execution |
| `metadata` | `Record<string, unknown>` | Arbitrary key-value data attached to the execution |

Execution

Represents a running execution. Created via tm.startExecution().

Properties

| Property | Type | Description |
|---|---|---|
| `id` | `string` | Unique execution ID (read-only) |
| `projectId` | `string` | Project ID from the API (read-only) |

execution.step(type, input?)

Creates a new step recorder. The latency timer starts immediately.

const step = execution.step('llm_call', {
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});

execution.complete()

Marks the execution as completed. Flushes any pending batched steps before completing.

await execution.complete();

execution.fail(error)

Marks the execution as failed with error details. Accepts an Error object or a string.

await execution.fail(new Error('LLM returned invalid JSON'));
// or
await execution.fail('Rate limited by OpenAI');

execution.getStatus()

Returns the current status.

const status = execution.getStatus();
// 'running' | 'completed' | 'failed'

StepRecorder

Records a single step. Created via execution.step(). Latency is auto-calculated from creation time.

step.complete(options?)

Marks the step as completed with optional output and metrics.

await step.complete({
  output: { response: 'Here is the answer...' },
  tokensIn: 150,
  tokensOut: 300,
  cost: 0.0045,
  latencyMs: 1200, // auto-calculated if omitted
  toolCalls: [
    { name: 'web_search', input: { query: 'news' }, output: { results: [...] } },
  ],
  stateSnapshot: {
    agentState: { memory: [...], plan: [...] },
  },
});
| Parameter | Type | Description |
|---|---|---|
| `output` | `Record<string, unknown>` | Output data from this step |
| `stateSnapshot` | `object` | Agent state snapshot for fork & replay |
| `tokensIn` | `number` | Number of input tokens |
| `tokensOut` | `number` | Number of output tokens |
| `cost` | `number` | Cost in USD |
| `latencyMs` | `number` | Latency in ms (auto-calculated if omitted) |
| `toolCalls` | `ToolCall[]` | Tool/function calls made during this step |
| `error` | `StepError` | Error details (step completes but with error info) |

step.fail(error)

await step.fail(new Error('API timeout'));

step.getStatus() / step.getIndex()

step.getStatus(); // 'running' | 'completed' | 'failed'
step.getIndex(); // 0-based index in execution sequence

Step Types

Steps are categorized by type for filtering and analysis in the dashboard.

| Type | Description | Typical Use |
|---|---|---|
| `llm_call` | LLM or chat model invocation | OpenAI, Anthropic, Google API calls |
| `tool_use` | Tool or function call | Web search, database queries, API calls |
| `decision` | Agent routing or planning | Agent selecting which tool to use |
| `retrieval` | RAG or document retrieval | Vector store queries, document fetches |
| `human_input` | Human-in-the-loop interaction | Approval prompts, user feedback |
| `transform` | Data transformation | Parsing, formatting, summarization |
| `custom` | Anything else | Custom logic, business rules |
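If you wrap the SDK in your own helpers, the seven types can be mirrored as a string-literal union for compile-time safety. This is a local helper sketch, not an SDK export:

```typescript
// Local helper mirroring the documented step types (not an SDK export).
const STEP_TYPES = [
  'llm_call',
  'tool_use',
  'decision',
  'retrieval',
  'human_input',
  'transform',
  'custom',
] as const;

type StepType = (typeof STEP_TYPES)[number];

// Runtime guard, useful when step types arrive from untyped config or JSON.
function isStepType(value: string): value is StepType {
  return (STEP_TYPES as readonly string[]).includes(value);
}
```

With this in place, `execution.step(type, input)` calls in your own wrappers can take a `StepType` instead of a bare `string`.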

Claude Code Integration

Automatically capture every Claude Code session as a traced execution you can inspect, replay, and fork — zero code changes needed.

How it works

Claude Code exposes lifecycle hooks — shell commands that fire on events like session start, tool use, prompt submission, and session end. Time Machine provides a hook bridge that receives these events via stdin and records them as execution steps.

Claude Code (hook event) → stdin JSON → bridge script → Time Machine API → dashboard

Prerequisites

  • Node.js >= 18 or Bun installed
  • Claude Code CLI installed (claude command available)
  • A running Time Machine instance (local dev or hosted)
  • A Time Machine project with an API key (see Get Your API Key above)

Step 1: Install the SDK

terminal
# npm
npm install @timemachine-sdk/sdk
# bun
bun add @timemachine-sdk/sdk
# pnpm
pnpm add @timemachine-sdk/sdk

Step 2: Set Environment Variables

The bridge reads two environment variables. Add them to your shell profile (~/.zshrc, ~/.bashrc) or export them before launching Claude Code:

~/.zshrc
export TIMEMACHINE_API_KEY="tm_your-api-key-here"
export TIMEMACHINE_BASE_URL="https://app.timemachinesdk.dev" # or http://localhost:3000

Reload your shell with source ~/.zshrc. For debugging, set TIMEMACHINE_DEBUG=1 to see bridge logs in stderr.

Step 3: Install Hooks

Option A: Automatic (recommended) — Run the installer from your project directory:

terminal
node --input-type=module -e "
import { installClaudeCodeHooks } from '@timemachine-sdk/sdk/claude-code-installer';
const result = await installClaudeCodeHooks({
  projectDir: process.cwd(),
  scope: 'local',
});
console.log(result);
"

This creates .claude/hooks/timemachine-bridge.mjs and merges hook entries into .claude/settings.local.json for all 11 lifecycle events.

Option A1: Shell wrapper with .env file (recommended for local use) — If Claude Code doesn't inherit your shell environment, use a wrapper that sources a .env file:

.claude/hooks/.env
TIMEMACHINE_API_KEY="tm_your_project_key"
TIMEMACHINE_BASE_URL="https://app.timemachinesdk.dev"
.claude/hooks/run-bridge.sh
#!/bin/bash
set -a
source "$(dirname "$0")/.env"
set +a
exec node "$(dirname "$0")/timemachine-bridge.mjs" "$@"

Make it executable: chmod +x .claude/hooks/run-bridge.sh. Then point all hooks at the wrapper in .claude/settings.local.json.

Option B: Manual installation — Add hook entries to .claude/settings.local.json yourself:

.claude/settings.local.json
{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "node /absolute/path/to/.claude/hooks/timemachine-bridge.mjs"
          }
        ]
      }
    ]
  }
}

Repeat the same structure for all 11 events. Then create the bridge script:

.claude/hooks/timemachine-bridge.mjs
import { runClaudeCodeHookBridge } from '@timemachine-sdk/sdk/claude-code-bridge';

runClaudeCodeHookBridge().catch((error) => {
  console.error('[TimeMachine][ClaudeCodeBridge]', error);
  process.exitCode = 1;
});

All 11 Hook Events

| Event | Step Type | What's Recorded |
|---|---|---|
| `SessionStart` | `custom` | Session ID, working directory |
| `UserPromptSubmit` | `human_input` | The user's prompt text |
| `PostToolUse` | `tool_use` | Tool name, success output |
| `PostToolUseFailure` | `tool_use` | Tool name, error details |
| `Notification` | `custom` | Notification message |
| `Stop` | `custom` | Stop reason |
| `SubagentStart` | `custom` | Subagent lifecycle start |
| `SubagentStop` | `custom` | Subagent lifecycle end |
| `PreCompact` | `custom` | Context compaction |
| `PermissionRequest` | `custom` | Permission decision |
| `SessionEnd` | `custom` | Final status, transcript ingestion |

On SessionEnd, the bridge also parses Claude Code's transcript file (JSONL) to extract assistant messages and file edits that hooks don't capture directly.
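A JSONL transcript is one JSON object per line. Extracting assistant messages might look roughly like the sketch below; the field names (`type`, `message`) are assumptions about the transcript schema, shown for illustration only:

```typescript
// Sketch of JSONL transcript parsing. The field names (`type`, `message`)
// are assumptions about the transcript schema, not a documented contract.
function extractAssistantTexts(jsonl: string): string[] {
  return jsonl
    .split('\n')
    .filter((line) => line.trim().length > 0) // skip blank lines
    .map((line) => JSON.parse(line))          // one JSON object per line
    .filter((entry) => entry.type === 'assistant')
    .map((entry) => String(entry.message ?? ''));
}

const sample = [
  '{"type":"user","message":"fix the bug"}',
  '{"type":"assistant","message":"Patched the null check."}',
].join('\n');
// extractAssistantTexts(sample) -> ['Patched the null check.']
```

The real bridge also extracts file edits from the transcript; the line-by-line parse-and-filter shape is the same.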

Step 4: Verify the Setup

terminal
# Check hooks are configured
cat .claude/settings.local.json | python3 -m json.tool | grep -c '"hooks"'
# Check environment variables
echo $TIMEMACHINE_API_KEY # should start with tm_
echo $TIMEMACHINE_BASE_URL # should be your server URL
# Test the API key
curl -s -H "Authorization: Bearer $TIMEMACHINE_API_KEY" \
$TIMEMACHINE_BASE_URL/api/v1/executions | head -c 200
# Run a session — then check your dashboard
claude

Architecture

Each hook invocation is a separate short-lived process. The bridge uses a file-based state store at ~/.timemachine/claude-code/ to correlate events across invocations:

1. Claude Code fires hook event → passes JSON to stdin
2. Bridge reads stdin, normalizes the event payload
3. Checks ~/.timemachine/claude-code/<session>.json for existing execution
• No state → creates new execution via API
• State exists → resumes execution
4. Converts event to Time Machine step and records it
5. On SessionEnd: parses transcript, appends derived steps, marks complete
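Because each hook invocation is a fresh process, correlation state has to live on disk. A minimal sketch of the lookup-or-create step in Node (paths and field names are illustrative, not the bridge's actual code):

```typescript
// Minimal sketch of per-session state lookup (paths/fields illustrative).
// The real bridge keeps state under ~/.timemachine/claude-code/.
import { existsSync, mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

type SessionState = { executionId: string };

const STATE_DIR = join(tmpdir(), 'timemachine-state-demo');

function loadOrCreate(sessionId: string, create: () => string): SessionState {
  mkdirSync(STATE_DIR, { recursive: true });
  const file = join(STATE_DIR, `${sessionId}.json`);
  if (existsSync(file)) {
    // A later hook invocation: resume the execution recorded earlier.
    return JSON.parse(readFileSync(file, 'utf8')) as SessionState;
  }
  // First event for this session: create a new execution and persist its ID.
  const state: SessionState = { executionId: create() };
  writeFileSync(file, JSON.stringify(state));
  return state;
}
```

Every subsequent hook process for the same session ID reads the same file and appends steps to the same execution.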

File Layout

project structure
.claude/settings.local.json # Hook configuration (11 events)
.claude/hooks/.env # API key + base URL (gitignored)
.claude/hooks/run-bridge.sh # Shell wrapper — sources .env, runs node
.claude/hooks/timemachine-bridge.mjs # Entrypoint — imports SDK bridge
~/.timemachine/claude-code/ # Session state (auto-cleaned on SessionEnd)
<session-id>.json # Maps session → executionId

Dashboard Features

  • Filter by source — Use the “Claude Code” filter on the executions list
  • Step timeline — See every prompt, tool call, and response in order
  • Inspect step details — Click any step to see full input/output JSON
  • Session replay — Scrub through the timeline to replay what happened
  • Fork from any step — Right-click a step to fork the execution from that point

Security Best Practices

Important
  • Keep .claude/settings.local.json uncommitted
  • Store TIMEMACHINE_API_KEY in your shell profile, direnv, or a secret manager
  • Add .claude/hooks/.env to your .gitignore
  • Never commit a live tm_... key in repo-controlled JSON
  • Rotate the key immediately if it was committed or shared in screenshots

Troubleshooting

Hooks aren't firing

Make sure .claude/settings.local.json is valid JSON. Verify the command path is absolute and the file exists. Check that node or bun is in your PATH.

“TIMEMACHINE_API_KEY is required” error

Export the variable in the same shell where you run claude. If using a new tab, make sure it's in your shell profile. As a fallback, use the shell wrapper approach with a .env file.

Execution appears but has no steps

Set TIMEMACHINE_DEBUG=1 to see bridge logs. Test the bridge manually: echo '{"session_id":"test","hook_event_name":"SessionStart"}' | node .claude/hooks/timemachine-bridge.mjs

Bridge state is stale

If a session ended abnormally, clean up: rm ~/.timemachine/claude-code/*.json

Quick Reference

terminal
# Install SDK
bun add @timemachine-sdk/sdk
# Set env vars
export TIMEMACHINE_API_KEY="tm_..."
export TIMEMACHINE_BASE_URL="https://app.timemachinesdk.dev"
# Install hooks (automatic)
node --input-type=module -e "
import { installClaudeCodeHooks } from '@timemachine-sdk/sdk/claude-code-installer';
await installClaudeCodeHooks({ projectDir: process.cwd(), scope: 'local' });
"
# Verify — run a session, check dashboard
claude

MCP Server

The @timemachine-sdk/mcp package exposes your project's runs, traces, and steps as MCP tools that Claude Code can call directly — without opening a browser. Inspect failures, walk through traces, and get aggregate stats all within the Claude Code terminal.

The MCP server uses the same v1 API, reads three environment variables, and communicates with Claude Code over stdio (no daemon, no port).

Installation

Add the following to your project's .claude/settings.json (or ~/.claude/settings.json for a global install):

.claude/settings.json
{
  "mcpServers": {
    "timemachine": {
      "command": "npx",
      "args": ["-y", "@timemachine-sdk/mcp"],
      "env": {
        "TIMEMACHINE_API_KEY": "tm_...",
        "TIMEMACHINE_PROJECT_ID": "proj_...",
        "TIMEMACHINE_BASE_URL": "https://app.timemachinesdk.dev"
      }
    }
  }
}

Restart Claude Code after saving. The server starts on demand — you'll see a timemachine entry in /mcp.

Available Tools

Six tools are registered. All return structured plain-text so Claude can reason over results directly:

| Tool | Description | Key params |
|---|---|---|
| `list_executions` | List executions with optional filters | `status`, `runtime`, `limit` (default 20) |
| `get_execution` | Full execution detail: name, status, cost, tokens, metadata | `execution_id` |
| `get_steps` | All steps with type, status, latency, input, output, and error detail | `execution_id` |
| `get_failed_runs` | Shortcut: recent failed executions with debug hints | `limit` (default 10) |
| `tail_execution` | Poll an in-progress execution until it reaches a terminal state | `execution_id` |
| `get_project_stats` | Aggregate stats across your last 100 runs: success rate, avg cost, avg tokens, p95 latency | (none) |

Example prompts

Show me my last 5 failed runs.
Get the full trace for execution abc123 and tell me what went wrong.
What was my agent doing in the last Claude Code session?
Give me aggregate stats on my last 100 runs.
Watch execution def456 until it finishes.

How it works

1. You type a prompt → Claude Code reads it
2. Claude decides → calls get_failed_runs
3. MCP server → fetches GET /api/v1/executions?status=failed
4. Server returns → structured text Claude can reason over
5. Claude responds → inline trace analysis + fix suggestions

Roadmap: Querying runs is step one. Native replay — inspect a failure, fork from the problem step, re-run with a fix — is in development. See the native replay roadmap.

CLI — tm

@timemachine-sdk/cli gives you a native terminal interface to your Time Machine project. List runs, tail live executions, inspect traces, fork at a failed step, and open the dashboard — all from your shell, without touching a browser.

The CLI is the fastest way to debug a failure: tm failed shows the last crash, tm view <id> walks the full trace, tm fork <id> --replay re-runs from the broken step.

Install

terminal
npm install -g @timemachine-sdk/cli
# or run without installing:
npx @timemachine-sdk/cli --help

Configuration

Run tm config set once to store credentials in ~/.timemachine/config.json:

terminal
tm config set --api-key tm_... --project-id proj_...
# Or use environment variables:
export TIMEMACHINE_API_KEY=tm_...
export TIMEMACHINE_PROJECT_ID=proj_...

Commands

| Command | Description |
|---|---|
| `tm ls` | List recent executions in a color-coded table (status, cost, tokens, duration) |
| `tm ls --status failed` | Filter by status: running, completed, failed, or cancelled |
| `tm ls --runtime langchain` | Filter by runtime tag |
| `tm view <id>` | Full trace: all steps with type, latency, cost, LLM output snippet, and tool name |
| `tm view <id> --json` | Raw JSON output; pipe to `jq` or save for diffing |
| `tm tail [id]` | Stream a live execution, printing new steps as they arrive; omit the ID to tail the latest |
| `tm tail --all` | Show all existing steps on attach, then stream new ones |
| `tm failed` | Recent failed executions with one-line debug hints |
| `tm fork <id>` | Fork an execution at a chosen step (interactive step picker) |
| `tm fork <id> --at 3` | Fork at step index 3 directly |
| `tm fork <id> --at 3 --replay` | Fork and immediately start replay |
| `tm stats` | Aggregate stats across last 100 runs: success rate, avg cost, avg tokens, p95 latency |
| `tm open <id>` | Open execution in dashboard (browser) |
| `tm setup` | Install Claude Code hooks (same as `installClaudeCodeHooks()` but from the terminal) |
| `tm config show` | Display current config (key redacted) |

Typical debug workflow

terminal
# 1. See what failed
tm failed
# 2. Inspect the trace
tm view exec_abc123
# 3. Fork at the broken step and replay
tm fork exec_abc123 --at 4 --replay
# 4. Watch the replay live
tm tail exec_def456

Claude Code integration: Run tm setup to install hooks that automatically capture every Claude Code session as a traced execution — no code changes needed. Then use tm tail to watch your Claude session live.

Eval Platform

Ship agent changes with confidence. Time Machine's eval platform lets you define test suites of real production inputs, assert on outputs, and gate deployments on passing scores — all backed by the same fork & replay infrastructure that powers the dashboard.

Every eval run is a replay: your test case inputs are forked through your live agent, results are scored by assertion, and a 0–1 score rolls up per suite. Wire it into CI/CD and every PR gets an automated quality gate.

Key concepts

Eval Suite

A named collection of test cases — e.g. "Customer Support Quality" or "Reasoning Accuracy".

Test Case

One input (and optional expected output) your agent will be replayed against.

Eval Run

One execution of a full suite — forks each case, runs assertions, returns a 0–1 score.

Assertion

A pass/fail check on the agent's output: contains, regex, llm_judge, cost_under, and more.

Score

Aggregate 0.0–1.0 across all assertions in a run. Set a minimum threshold in CI.

LLM Judge

Ask a language model to grade output quality against a rubric — subjective tests made quantifiable.

Create a suite

Create via the dashboard (Evals → New Suite) or the API:

eval-suite.ts
import { TimeMachine } from '@timemachine-sdk/sdk';

const tm = new TimeMachine({ apiKey: process.env.TIMEMACHINE_API_KEY! });

const suite = await tm.createEvalSuite({
  name: 'Customer Support Quality',
  description: 'Verify response accuracy, tone, and latency',
  agentEndpoint: 'https://your-api.com/agent',
  tags: ['production', 'support'],
});

// Add test cases
await tm.addEvalCase(suite.id, {
  input: { message: 'How do I reset my password?' },
  expectedOutput: { contains: 'reset link' },
  tags: ['auth'],
});

await tm.addEvalCase(suite.id, {
  input: { message: 'Cancel my subscription' },
  tags: ['billing'],
});

Save cases from production

The fastest way to build a test suite: save real executions directly from the dashboard. Click any execution → Save as eval case. The input is captured and linked to the suite of your choice.

Run a suite

Three ways to trigger an eval run:

  • Dashboard: Navigate to Evals → your suite → Run Now. Results appear live as cases complete.
  • API: POST /api/v1/eval/suites/:id/runs returns a run ID you can poll for status.
  • CLI: tm eval run <suiteId> --wait --threshold 0.9 blocks until done and exits non-zero on failure.
terminal
# Run a suite and wait for results
tm eval run suite_abc123 --wait
# Run with a minimum pass threshold (CI use-case)
tm eval run suite_abc123 --wait --threshold 0.9
# Check status of a previous run
tm eval status run_def456
# List recent runs
tm eval list suite_abc123

Assertions

Assertions are the scoring rules for each test case. A case passes if all of its assertions pass; partial passes score proportionally. Assertions are defined per case and evaluated on every eval run.

| Type | Description | Config |
|---|---|---|
| `contains` | Output string includes a substring (case-insensitive) | `{ value: "reset link" }` |
| `not_contains` | Output does not include a substring | `{ value: "error" }` |
| `regex` | Output matches a regular expression | `{ pattern: "order #\\d+" }` |
| `llm_judge` | LLM grades output against a rubric (0–1 score) | `{ rubric: "Is the response helpful and accurate?", threshold: 0.8 }` |
| `json_valid` | Output is valid JSON | `{}` |
| `json_path` | A JSON path equals an expected value | `{ path: "$.status", value: "success" }` |
| `cost_under` | Total execution cost is below threshold (USD) | `{ maxCost: 0.05 }` |
| `latency_under` | Total execution latency is below threshold (ms) | `{ maxLatencyMs: 3000 }` |
| `step_count` | Execution has exactly N steps | `{ count: 5 }` |
| `custom` | Custom JS function evaluated server-side | `{ fn: "(output) => output.length > 10" }` |
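The proportional scoring described above can be modeled in a few lines. This is one reading of the documented behavior (a case scores the fraction of its assertions that pass, and the run averages across cases), not the server's actual implementation:

```typescript
// Sketch of proportional eval scoring (a reading of the documented
// behavior, not the server implementation).

// A case's score is the fraction of its assertions that passed.
function caseScore(assertionResults: boolean[]): number {
  if (assertionResults.length === 0) return 1; // no assertions: vacuously passing
  const passed = assertionResults.filter(Boolean).length;
  return passed / assertionResults.length;
}

// The run score averages case scores, yielding a 0.0–1.0 aggregate.
function runScore(cases: boolean[][]): number {
  if (cases.length === 0) return 1;
  const total = cases.reduce((sum, c) => sum + caseScore(c), 0);
  return total / cases.length;
}

// Two cases: one fully passing, one passing 1 of 2 assertions.
// runScore([[true, true], [true, false]]) -> 0.75
```

A CI threshold of 0.9 then means the averaged score across all cases must be at least 0.9 for the gate to pass.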

Assertion example

assertions.ts
await tm.addEvalCase(suite.id, {
  input: { query: 'Summarise this article in 3 bullet points.' },
  assertions: [
    // Output must contain bullet points
    { type: 'contains', value: '•' },
    // Graded by LLM on conciseness + accuracy
    {
      type: 'llm_judge',
      rubric: 'Does the response contain exactly 3 concise bullet points that accurately summarise the article?',
      threshold: 0.8,
    },
    // Must complete within 5s and under $0.02
    { type: 'latency_under', maxLatencyMs: 5000 },
    { type: 'cost_under', maxCost: 0.02 },
  ],
});

LLM Judge: Use sparingly in CI — each judge call adds LLM cost per run. A good pattern is to combine cheap structural assertions (contains, regex) as fast gates and reserve llm_judge for nightly or pre-release runs.

CI/CD Integration

Block merges on eval regressions. Add the eval run as a required status check and every PR automatically gates on your quality threshold.

The flow: PR opened → GitHub Actions workflow runs → CLI triggers your suite via the API → polls for completion → exits non-zero if score is below threshold → PR blocked until green.

GitHub Actions setup

1. Add TIMEMACHINE_API_KEY to your repo Secrets.
2. Add EVAL_SUITE_ID to your repo Variables.
3. Add this workflow:

.github/workflows/evals.yml
name: Eval Suite
on:
  pull_request:
    branches: [main]
  workflow_dispatch:
jobs:
  evals:
    name: Run eval suite
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run eval suite (threshold 0.9)
        env:
          TIMEMACHINE_API_KEY: ${{ secrets.TIMEMACHINE_API_KEY }}
        run: |
          npx @timemachine-sdk/cli eval run ${{ vars.EVAL_SUITE_ID }} \
            --wait \
            --threshold 0.9
      # Optional: post score as PR comment
      - name: Post eval score
        if: always()
        env:
          TIMEMACHINE_API_KEY: ${{ secrets.TIMEMACHINE_API_KEY }}
          GH_TOKEN: ${{ github.token }}
        run: |
          SCORE=$(npx @timemachine-sdk/cli eval status --latest --format score)
          gh pr comment ${{ github.event.pull_request.number }} \
            --body "**Eval score:** ${SCORE} / 1.0"

Recommended thresholds

| Environment | Threshold | Rationale |
|---|---|---|
| Safety-critical (healthcare, finance) | 1.0 | No regressions tolerated |
| Production | 0.9 | Up to 10% failure rate acceptable |
| Staging / pre-release | 0.8 | Catch regressions early without blocking velocity |
| Experimental / nightly | 0.7 | Track trends; don't block iteration |

Webhook-triggered runs

Trigger runs without installing the CLI — useful for serverless environments or non-GitHub CI:

trigger.sh
# Trigger a run via API
RUN=$(curl -s -X POST \
  -H "Authorization: Bearer $TIMEMACHINE_API_KEY" \
  -H "Content-Type: application/json" \
  https://app.timemachinesdk.dev/api/v1/eval/suites/$SUITE_ID/runs)
RUN_ID=$(echo $RUN | jq -r '.id')

# Poll until terminal
while true; do
  STATUS=$(curl -s \
    -H "Authorization: Bearer $TIMEMACHINE_API_KEY" \
    https://app.timemachinesdk.dev/api/v1/eval/runs/$RUN_ID/status)
  STATE=$(echo $STATUS | jq -r '.status')
  SCORE=$(echo $STATUS | jq -r '.score')
  [ "$STATE" = "completed" ] && break
  [ "$STATE" = "failed" ] && exit 1
  sleep 5
done

# Fail if below threshold
awk "BEGIN { exit ($SCORE < 0.9) }" || exit 1

Pro tip: Tag your suites by severity — critical, regression, nightly. Run only critical tagged suites on every PR (fast, cheap), regression on merge, and full nightly on a schedule.

LangChain Adapter

Automatically captures all LLM calls, tool invocations, agent decisions, and retrievals — zero manual instrumentation.

createLangChainHandler(tm, options?)

One-liner to create an execution + callback handler. This is the recommended approach.

langchain.ts
import { createLangChainHandler } from '@timemachine-sdk/sdk/adapters';

const { handler, execution } = await createLangChainHandler(tm, {
  name: 'research-agent',
  metadata: { model: 'gpt-4o' },
  debug: false,
  autoCalculateCost: true,
  maxDocumentLength: 500,
});

await agent.invoke(input, { callbacks: [handler] });
await execution.complete();
| Option | Type | Default | Description |
|---|---|---|---|
| `name` | `string` | | Execution name |
| `metadata` | `Record<string, unknown>` | | Execution metadata |
| `debug` | `boolean` | `false` | Log captured events to console |
| `autoCalculateCost` | `boolean` | `true` | Auto-calculate cost from token counts |
| `maxDocumentLength` | `number` | `500` | Max characters for retrieved documents |
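The `maxDocumentLength` option caps how much of each retrieved document is stored with the step. Conceptually it behaves like this sketch (illustrative only, not the adapter's exact code):

```typescript
// Illustrative truncation, mirroring the documented maxDocumentLength
// option: long retrieved documents are clipped before being recorded.
function truncateDocument(text: string, maxLength = 500): string {
  return text.length <= maxLength ? text : text.slice(0, maxLength) + '…';
}
```

Keeping retrieved documents short bounds the payload size of `retrieval` steps without losing which documents were fetched.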

What gets captured automatically

| LangChain Event | Step Type | What's Recorded |
|---|---|---|
| LLM / Chat Model call | `llm_call` | Model name, messages, tokens, cost, latency |
| Tool invocation | `tool_use` | Tool name, input, output, latency |
| Agent action | `decision` | Action type, tool selection, input |
| Agent finish | `decision` | Final output, return values |
| Retriever call | `retrieval` | Query, documents (truncated), doc count |

Security: Sensitive parameters (api_key, apiKey, callbacks) are automatically stripped from captured data.

OpenRouter

OpenRouter provides a unified API across 200+ models — Anthropic, OpenAI, Google, DeepSeek, Qwen, Llama, and more — through a single endpoint and API key. Time Machine works natively with OpenRouter with zero extra configuration.

Setup

openrouter.ts
import { TimeMachine } from '@timemachine-sdk/sdk';
import OpenAI from 'openai';

const tm = new TimeMachine({
  apiKey: process.env.TIMEMACHINE_API_KEY!,
  // No changes needed — configure OpenRouter in your LLM client directly
});

// Use OpenRouter as your LLM provider
const openrouter = new OpenAI({
  apiKey: process.env.OPENROUTER_API_KEY!,
  baseURL: 'https://openrouter.ai/api/v1',
  defaultHeaders: {
    'HTTP-Referer': 'https://your-app.com', // optional, for rankings
    'X-Title': 'Your App Name', // optional
  },
});

Tracking OpenRouter calls

Capture any model routed through OpenRouter — the model name is passed through transparently.

openrouter-execution.ts
const execution = await tm.startExecution({
  name: 'openrouter-agent',
  metadata: { router: 'openrouter' },
});

const step = execution.step('llm_call', {
  model: 'anthropic/claude-opus-4', // OpenRouter model ID
  messages: [{ role: 'user', content: 'Explain quantum entanglement' }],
});

const response = await openrouter.chat.completions.create({
  model: 'anthropic/claude-opus-4',
  messages: [{ role: 'user', content: 'Explain quantum entanglement' }],
});

await step.complete({
  output: { message: response.choices[0].message.content },
  tokensIn: response.usage?.prompt_tokens,
  tokensOut: response.usage?.completion_tokens,
});

await execution.complete();

LangChain + OpenRouter

The LangChain adapter works seamlessly — just point your ChatOpenAI instance at OpenRouter.

openrouter-langchain.ts
import { ChatOpenAI } from '@langchain/openai';
import { createLangChainHandler } from '@timemachine-sdk/sdk/adapters';

const model = new ChatOpenAI({
  modelName: 'google/gemini-2.5-pro', // or any OpenRouter model
  openAIApiKey: process.env.OPENROUTER_API_KEY!,
  configuration: {
    baseURL: 'https://openrouter.ai/api/v1',
  },
});

const { handler, execution } = await createLangChainHandler(tm, {
  name: 'gemini-via-openrouter',
});

// All LLM calls automatically captured
const result = await model.invoke('What is the latest news?', {
  callbacks: [handler],
});

await execution.complete();

Why use OpenRouter with Time Machine

| Benefit | Detail |
|---|---|
| Single API key | Access 200+ models without separate accounts for Anthropic, OpenAI, Google, etc. |
| Model fallback | Configure automatic fallback if a model is unavailable or rate-limited |
| Cost optimization | Route cheap tasks to smaller models, complex ones to frontier models |
| Unified billing | One invoice for all LLM costs across providers |
| Model comparison | Easily A/B test models by swapping the model string; Time Machine captures both |

Tip: OpenRouter model IDs use the format provider/model-name (e.g. deepseek/deepseek-r1, qwen/qwen3-235b-a22b). Time Machine stores the full model string in your execution trace for accurate attribution.
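Because the model ID encodes the provider, you can split it for per-provider grouping in your own cost or usage reports. This is a hypothetical helper, not part of the SDK:

```typescript
// Hypothetical helper: split an OpenRouter model ID ("provider/model-name")
// into its provider and model parts for grouping in reports.
function parseModelId(id: string): { provider: string; model: string } {
  const slash = id.indexOf('/');
  if (slash === -1) {
    // Bare model names (e.g. direct OpenAI IDs) have no provider prefix.
    return { provider: 'unknown', model: id };
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

// parseModelId('deepseek/deepseek-r1')
//   -> { provider: 'deepseek', model: 'deepseek-r1' }
```

Only the first slash is split on, so model names that themselves contain slashes remain intact in the `model` part.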

Utilities & Cost Tracking

Built-in pricing for 30+ models. Costs are calculated automatically in the LangChain adapter, or you can call the utilities directly.

cost-tracking.ts
import {
  calculateCost,
  hasModelPricing,
  getModelPricing,
  normalizeModelName,
  configureFallbackPricing,
  extractTokensFromLLMResult,
} from '@timemachine-sdk/sdk/utils';

// Calculate cost for known models
const cost = calculateCost('gpt-4o', 1000, 500);
// => 0.0125 (USD): 1000/1k * $0.005 + 500/1k * $0.015

// Check model pricing availability
hasModelPricing('gpt-4o'); // true
hasModelPricing('my-custom-model'); // false

// Get pricing details
getModelPricing('gpt-4o');
// => { inputPer1k: 0.005, outputPer1k: 0.015 }

// Normalize model names (strips version suffixes)
normalizeModelName('gpt-4-0125-preview'); // 'gpt-4'
normalizeModelName('claude-3-sonnet-20240229'); // 'claude-3-sonnet'

// Configure fallback pricing for unknown models
configureFallbackPricing({
  inputPer1k: 0.002,
  outputPer1k: 0.006,
  enabled: true,
});

// Extract tokens from LLM results (multi-provider)
const { tokensIn, tokensOut } = extractTokensFromLLMResult(llmResult);
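If you need the same version-suffix stripping outside the SDK, one possible approach is a regex that trims trailing date-like segments. This is an approximation of the behavior shown above, not the SDK's actual implementation:

```typescript
// Approximate re-implementation of version-suffix stripping, for illustration.
// The SDK's actual normalization rules may differ; treat this as a sketch.
function stripVersionSuffix(model: string): string {
  return model
    // Drop trailing date stamps like -20240229 or -0125-preview
    .replace(/-\d{4,8}(-preview)?$/, '')
    // Drop a bare trailing -preview
    .replace(/-preview$/, '');
}
```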

Guide: Manual Step Recording

For custom agents or frameworks without a built-in adapter.

manual-agent.ts
import { TimeMachine } from '@timemachine-sdk/sdk';

const tm = new TimeMachine({ apiKey: process.env.TIMEMACHINE_API_KEY! });

async function runAgent(query: string) {
  const execution = await tm.startExecution({
    name: 'research-agent',
    metadata: { query, timestamp: Date.now() },
  });
  try {
    // Step 1: Plan
    const planStep = execution.step('decision', { action: 'plan', query });
    const plan = await generatePlan(query);
    await planStep.complete({ output: { plan } });

    // Step 2: Search
    const searchStep = execution.step('tool_use', {
      tool: 'web_search',
      query: plan.searchQuery,
    });
    const results = await webSearch(plan.searchQuery);
    await searchStep.complete({
      output: { resultCount: results.length, results },
    });

    // Step 3: Synthesize
    const llmStep = execution.step('llm_call', {
      model: 'gpt-4o',
      context: results,
    });
    const answer = await callLLM(query, results);
    await llmStep.complete({
      output: { answer },
      tokensIn: answer.usage.prompt_tokens,
      tokensOut: answer.usage.completion_tokens,
    });

    await execution.complete();
    return answer;
  } catch (error) {
    await execution.fail(error as Error);
    throw error;
  }
}

Guide: Multi-Step Workflows

For agents with sequential or branching logic.

workflow.ts
const execution = await tm.startExecution({ name: 'multi-step-workflow' });

// Step 1: Classify the request
const classifyStep = execution.step('llm_call', { action: 'classify' });
const category = await classifyRequest(userInput);
await classifyStep.complete({ output: { category } });

// Step 2: Route based on classification
const routeStep = execution.step('decision', { category });
const handler = selectHandler(category);
await routeStep.complete({ output: { handler: handler.name } });

// Step 3+: Conditional execution
if (category === 'needs_research') {
  const retrieveStep = execution.step('retrieval', { query: userInput });
  const docs = await vectorStore.similaritySearch(userInput);
  await retrieveStep.complete({ output: { documentCount: docs.length } });

  const answerStep = execution.step('llm_call', { model: 'gpt-4o', context: docs });
  const answer = await generateAnswer(userInput, docs);
  await answerStep.complete({
    output: { answer },
    tokensIn: 2000,
    tokensOut: 500,
  });
}

await execution.complete();

Guide: Error Handling

The SDK is fail-open — it never crashes your app. But you should still record failures for debugging.

error-handling.ts
// Enable debug mode to see SDK activity in your console
const tm = new TimeMachine({
  apiKey: process.env.TIMEMACHINE_API_KEY!,
  debug: true, // Logs all SDK requests and errors
});

const execution = await tm.startExecution({ name: 'agent-run' });
try {
  const step = execution.step('tool_use', { tool: 'database_query' });
  const result = await queryDatabase(sql);
  await step.complete({ output: { rows: result.length } });
  await execution.complete();
} catch (error) {
  // Record the error in Time Machine for debugging
  await execution.fail(error as Error);
  throw error;
}

Guide: Express / Fastify

Wrap your API route handlers with execution tracking.

server.ts
import express from 'express';
import OpenAI from 'openai';
import { TimeMachine } from '@timemachine-sdk/sdk';

const app = express();
app.use(express.json()); // required to populate req.body

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const tm = new TimeMachine({ apiKey: process.env.TIMEMACHINE_API_KEY! });

app.post('/api/chat', async (req, res) => {
  const execution = await tm.startExecution({
    name: 'chat-endpoint',
    metadata: {
      userId: req.body.userId,
      sessionId: req.body.sessionId,
    },
  });
  try {
    const step = execution.step('llm_call', {
      model: 'gpt-4o',
      messages: req.body.messages,
    });
    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: req.body.messages,
    });
    await step.complete({
      output: { message: response.choices[0].message },
      tokensIn: response.usage?.prompt_tokens,
      tokensOut: response.usage?.completion_tokens,
    });
    await execution.complete();
    res.json({ message: response.choices[0].message.content });
  } catch (error) {
    await execution.fail(error as Error);
    res.status(500).json({ error: 'Internal server error' });
  }
});

Types Reference

Full TypeScript coverage — no any types.

types.ts
// Client configuration
interface TimeMachineConfig {
  apiKey: string;
  baseUrl?: string; // default: 'https://api.timemachine.dev'
  maxRetries?: number; // default: 3
  debug?: boolean; // default: false
}

// Execution creation
interface CreateExecutionRequest {
  name?: string;
  metadata?: Record<string, unknown>;
}

// Step completion
interface StepCompleteOptions {
  output?: Record<string, unknown>;
  stateSnapshot?: Omit<StateSnapshot, 'stepId' | 'timestamp'>;
  tokensIn?: number;
  tokensOut?: number;
  cost?: number;
  latencyMs?: number;
  toolCalls?: ToolCall[];
  error?: StepError;
}

// Tool call record
interface ToolCall {
  name: string;
  input: Record<string, unknown>;
  output?: Record<string, unknown>;
}

// Error record
interface StepError {
  message: string;
  stack?: string;
}

// Status types
type ExecutionStatus = 'running' | 'completed' | 'failed';
type StepStatus = 'running' | 'completed' | 'failed';
type StepType =
  | 'llm_call' | 'tool_use' | 'decision' | 'retrieval'
  | 'human_input' | 'transform' | 'custom';

// Utility types
interface TokenUsage { tokensIn: number; tokensOut: number; }
interface ModelPricing { inputPer1k: number; outputPer1k: number; }
interface FallbackPricingConfig {
  inputPer1k: number;
  outputPer1k: number;
  enabled: boolean;
}

Supported Models

Built-in pricing for frontier and open-source models (2025). For unlisted models use configureFallbackPricing(). Prices in USD per 1,000 tokens — approximate and subject to provider changes.
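As a concrete example of how per-1k pricing turns into a step cost, the arithmetic is: cost = tokensIn/1000 × input rate + tokensOut/1000 × output rate. A small sketch using gpt-4o's listed rates (plain arithmetic, not an SDK call):

```typescript
// Cost = tokensIn/1000 * inputPer1k + tokensOut/1000 * outputPer1k.
// Rates below are gpt-4o's listed prices; always verify against your provider.
function stepCost(
  tokensIn: number,
  tokensOut: number,
  pricing: { inputPer1k: number; outputPer1k: number },
): number {
  return (tokensIn / 1000) * pricing.inputPer1k + (tokensOut / 1000) * pricing.outputPer1k;
}

// 1,000 input + 500 output tokens on gpt-4o:
const cost = stepCost(1000, 500, { inputPer1k: 0.005, outputPer1k: 0.015 });
// ≈ 0.0125 USD
```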

Anthropic — Claude 4 series

Current flagship family (2025)

| Model | Input ($/1k) | Output ($/1k) | Notes |
| --- | --- | --- | --- |
| claude-opus-4-6 | $0.01500 | $0.07500 | Most powerful, coding & reasoning |
| claude-sonnet-4-6 | $0.00300 | $0.01500 | Best balance of speed & quality |
| claude-haiku-4-5 | $0.00080 | $0.00400 | Fastest, lowest cost |
| claude-3.5-sonnet | $0.00300 | $0.01500 | Previous gen, still widely used |
| claude-3.5-haiku | $0.00080 | $0.00400 | Previous gen fast model |

OpenAI — GPT-4.x & o-series

Including reasoning models (2025)

| Model | Input ($/1k) | Output ($/1k) | Notes |
| --- | --- | --- | --- |
| gpt-4.5 | $0.07500 | $0.15000 | Multimodal frontier, highest quality |
| gpt-4.1 | $0.00200 | $0.00800 | Efficient, strong coding |
| gpt-4o | $0.00500 | $0.01500 | Omni model, vision + text |
| gpt-4o-mini | $0.00015 | $0.00060 | Fast & cheap for simple tasks |
| o3 | $0.01000 | $0.04000 | Reasoning model, complex problems |
| o4-mini | $0.00110 | $0.00440 | Reasoning, optimized for cost |
| o1 | $0.01500 | $0.06000 | Previous reasoning generation |

Google — Gemini 2.5

Long context, multimodal (2025)

| Model | Input ($/1k) | Output ($/1k) | Notes |
| --- | --- | --- | --- |
| gemini-2.5-pro | $0.00125 | $0.01000 | 1M context, top coding & reasoning |
| gemini-2.5-flash | $0.00008 | $0.00030 | Low latency, best value |
| gemini-2.0-flash | $0.00010 | $0.00040 | Previous gen flash |
| gemini-1.5-pro | $0.00125 | $0.00500 | Legacy, 2M context |

DeepSeek

Open-source, extremely cost-efficient

| Model | Input ($/1k) | Output ($/1k) | Notes |
| --- | --- | --- | --- |
| deepseek-r1 | $0.00014 | $0.00219 | Reasoning model, matches o1 quality |
| deepseek-v3 | $0.00007 | $0.00110 | MoE, strong general tasks |
| deepseek-r1-zero | $0.00014 | $0.00219 | RL-trained reasoning, no SFT |

Qwen — Alibaba

Strong multilingual & coding

| Model | Input ($/1k) | Output ($/1k) | Notes |
| --- | --- | --- | --- |
| qwen3-235b-a22b | $0.00022 | $0.00088 | Flagship MoE, top open model |
| qwen3-32b | $0.00018 | $0.00072 | Dense, strong reasoning |
| qwen2.5-72b | $0.00023 | $0.00069 | Previous gen, widely deployed |
| qwen2.5-coder-32b | $0.00015 | $0.00060 | Best-in-class code generation |

Kimi — Moonshot AI

Long-context specialist (Chinese frontier lab)

| Model | Input ($/1k) | Output ($/1k) | Notes |
| --- | --- | --- | --- |
| kimi-k2 | $0.00060 | $0.00250 | Agentic reasoning, 1M context |
| moonshot-v1-128k | $0.01200 | $0.01200 | Ultra long context |
| moonshot-v1-32k | $0.00400 | $0.00400 | Standard context window |

GLM — Zhipu AI

Chinese lab, strong bilingual performance

| Model | Input ($/1k) | Output ($/1k) | Notes |
| --- | --- | --- | --- |
| glm-5 | $0.00100 | $0.00300 | Latest flagship, vision + reasoning |
| glm-4-plus | $0.00070 | $0.00140 | Enhanced GLM-4, long context |
| glm-4 | $0.00014 | $0.00014 | Fast, cost-effective baseline |

Meta — Llama 4

Open weights, free to self-host

| Model | Input ($/1k) | Output ($/1k) | Notes |
| --- | --- | --- | --- |
| llama-4-maverick | $0.00019 | $0.00085 | 17B MoE, multimodal |
| llama-4-scout | $0.00017 | $0.00017 | 17B MoE, ultra-efficient |
| llama-3.3-70b | $0.00023 | $0.00040 | Previous gen, solid baseline |

Mistral

| Model | Input ($/1k) | Output ($/1k) | Notes |
| --- | --- | --- | --- |
| mistral-large-2 | $0.00200 | $0.00600 | Top Mistral model |
| mistral-small-3 | $0.00010 | $0.00030 | Fast, lightweight |
| codestral | $0.00030 | $0.00090 | Code-specialized |

Pricing note: Prices are approximate as of mid-2025 and change frequently. For models not listed, use configureFallbackPricing() or pass cost directly in step.complete(). When using OpenRouter, pass the model string as-is (e.g. deepseek/deepseek-r1) — it will be stored in the execution trace for your records.

Design Principles

Fail-Open

Never crashes your application. If the API is unreachable, the SDK swallows the failure (logging it when debug mode is on) and your agent continues uninterrupted.

Zero Overhead

Steps are batched asynchronously (up to 10 per batch, flushed every 500ms) to minimize performance impact.

Framework Agnostic

Manual step recording works with any framework. LangChain adapter provides automatic capture.

Type Safe

Full TypeScript coverage with no any types. Autocomplete and compile-time checks everywhere.

Tree-Shakeable

Sub-path exports (/adapters, /utils) let you import only what you need. Minimal bundle impact.

Production Ready

Exponential backoff retries, automatic batching, sensitive data filtering, and graceful error handling.
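Exponential backoff of the kind mentioned above typically doubles the delay after each failed attempt. A generic sketch of the pattern (not the SDK's actual retry code; names and defaults are illustrative):

```typescript
// Retry an async operation with exponentially growing delays:
// baseMs, 2*baseMs, 4*baseMs, ... between attempts.
// Illustrative pattern only, not the SDK's internals.
async function withBackoff<T>(
  op: () => Promise<T>,
  maxRetries = 3,
  baseMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt === maxRetries) break; // out of retries
      const delay = baseMs * 2 ** attempt; // 100ms, 200ms, 400ms, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```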

Pricing

Core observability (executions, steps, fork & replay) is free for all plans. Eval runs are the primary usage metric — they consume compute to replay your agent against each test case.

Free

$0/mo
  • 100 eval runs / month
  • 1 eval suite
  • 10 test cases
  • Manual runs only
  • Dashboard access
  • Community support
Get started free

Pro

Popular
$49/mo
  • 2,000 eval runs / month
  • Unlimited suites
  • Unlimited test cases
  • CI/CD integration
  • LLM-as-judge assertions
  • API + CLI access
  • Email support
Start Pro trial

Team

$199/mo
  • 10,000 eval runs / month
  • Unlimited suites & cases
  • 5 seats included
  • Scheduled eval runs
  • Slack / webhook alerts
  • Priority support
  • Usage dashboard
Start Team trial

Enterprise

Custom
  • Unlimited eval runs
  • Unlimited seats
  • SSO / SAML
  • Audit logs
  • SLA guarantee
  • Dedicated support
  • Custom integrations
Contact us

Usage-based add-ons

Extra eval runs

$0.01 per run above plan limit

LLM judge tokens

Provider token cost plus a 20% margin, passed through on your invoice

Extra seats (Team)

$29 / seat / month

All plans include: Unlimited executions captured, unlimited steps, fork & replay, dashboard access, SDK & API access, Claude Code integration, MCP server, and CLI. Eval runs are the only metered resource.

Troubleshooting

Steps are not appearing in the dashboard

Check your API key. Enable debug mode: new TimeMachine({ apiKey: "...", debug: true }). Make sure you call await execution.complete() — steps are flushed on completion.

LangChain adapter isn't capturing events

Make sure you pass the handler in the callbacks array: await agent.invoke(input, { callbacks: [handler] }). Without the callbacks option, nothing is captured.

Cost shows as 0

The model may not be in the built-in pricing table. Use hasModelPricing() to check. Configure fallback pricing with configureFallbackPricing() or pass cost directly in step.complete().

TypeScript errors with sub-path imports

Set moduleResolution to "bundler" or "node16" in your tsconfig.json. This is required for sub-path exports to resolve correctly.
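A minimal tsconfig.json fragment showing the relevant setting (adjust the rest to your project; only moduleResolution matters here, and "bundler" requires a compatible module setting such as "ESNext"):

```json
{
  "compilerOptions": {
    "module": "ESNext",
    "moduleResolution": "bundler"
  }
}
```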

"Cannot find module @timemachine-sdk/sdk"

Run npm install @timemachine-sdk/sdk. Make sure the package is in your dependencies, not just devDependencies (unless you only need it in dev).

Back to home
npm·GitHub·@timemachine-sdk/sdk v0.1.0