I Built an OpenTelemetry Extension for Pi Coding Agent

A few days ago I wrote about setting up observability for Claude Code with the Grafana stack. Claude Code ships with OTEL support baked in — set some env vars, data flows. Easy.

Pi is a different story. It’s an open-source coding agent by Mario Zechner that I’ve been running alongside Claude Code. Good extension system, no telemetry. I wanted both agents reporting into the same Grafana dashboards, so I built the missing piece.

What it does

The extension hooks into Pi’s lifecycle and exports OpenTelemetry traces and metrics. Same protocol, same pipeline as Claude Code. If you already have Alloy/Mimir/Tempo running, Pi data shows up in your existing dashboards.

The trace tree looks like this:

session (root span)
└── agent.prompt (per user message)
    └── agent.turn (per LLM call cycle)
        ├── tool.bash / tool.read / tool.edit / tool.write
        ├── llm.request (span event)
        └── model.changed (span event)
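The nesting falls out of the lifecycle hooks: each hook opens a span as a child of the one above it. Here is a conceptual sketch with a stand-in span class — the hook-to-span mapping is from the tree above, but the code is illustrative, not Pi's actual extension API:

```typescript
// Stand-in span type to illustrate the parent/child nesting;
// the real extension creates these through the OpenTelemetry SDK.
class Span {
  children: Span[] = [];
  constructor(public name: string) {}

  startChild(name: string): Span {
    const child = new Span(name);
    this.children.push(child);
    return child;
  }

  render(indent = 0): string {
    return [
      " ".repeat(indent) + this.name,
      ...this.children.map((c) => c.render(indent + 2)),
    ].join("\n");
  }
}

// One session, one user prompt, one LLM turn that runs a tool.
const session = new Span("session");
const prompt = session.startChild("agent.prompt");
const turn = prompt.startChild("agent.turn");
turn.startChild("tool.bash");

console.log(session.render());
```

Rendering the tree reproduces the hierarchy shown above, with each level indented under its parent.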

Metrics:

| Metric | Type | What it tells you | Labels |
|---|---|---|---|
| pi.tokens.input | Counter | Input tokens consumed (includes cache) | llm.model |
| pi.tokens.output | Counter | Output tokens produced | llm.model |
| pi.tool.calls | Counter | Tool invocations | tool.name |
| pi.tool.errors | Counter | Failed tool calls | tool.name |
| pi.tool.duration | Histogram | Tool execution time (ms) | tool.name |
| pi.prompts | Counter | User messages | llm.model |
| pi.turns | Counter | LLM call cycles | llm.model |
| pi.session.duration | Histogram | Session length (s) | llm.model |

Setup

Three steps.

1. Install

pi install npm:pi-otel-telemetry

That’s it. Pi auto-discovers extensions from ~/.pi/agent/extensions/.

2. Configure

Add to your ~/.zshrc or ~/.bashrc:

export OTEL_EXPORTER_OTLP_ENDPOINT="http://your-alloy-host:14318"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_RESOURCE_ATTRIBUTES="team.id=sre,environment=experiment,user.name=maksym"
export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT="http://your-mimir-host:9009/otlp/v1/metrics"

I use a separate metrics endpoint because I send traces through Alloy (forwarded to Tempo) but push metrics directly to Mimir’s OTLP ingestion. This sidesteps the metric renaming headache — Alloy’s Prometheus conversion turns pi_tokens_input into pi_tokens_input_total, and suddenly your dashboard queries don’t match. Pick one path and be consistent.
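To make the mismatch concrete: the same counter answers to different names depending on which path it took (assuming Alloy's default OTLP-to-Prometheus conversion, which appends _total to monotonic counters):

```promql
# Pushed straight to Mimir's OTLP ingestion: the OTEL name survives.
rate(pi_tokens_input[5m])

# Routed through Alloy's Prometheus conversion: counters gain _total.
rate(pi_tokens_input_total[5m])
```

Whichever path you pick, write your dashboard queries against that one and don't mix them.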

3. Verify

Start Pi, run a few prompts, then:

curl -s "http://your-mimir-host:9009/prometheus/api/v1/query?query=pi_prompts"

Data? You’re good.

Quick start with Jaeger

If you don’t have a Grafana stack and just want to see the traces:

docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/jaeger:2 \
  --set receivers.otlp.protocols.http.endpoint=0.0.0.0:4318

OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 pi

open http://localhost:16686

Session spans, prompt spans, tool execution spans — the full tree shows up in Jaeger’s UI.

The dashboard

I built a Grafana dashboard and included the JSON in the repo. Grab pi-otel-telemetry.json and import it (Dashboards → Import → Upload JSON).

What’s on it:

  • Six stat cards up top: prompts, turns, tool calls, errors, input tokens, output tokens
  • Token consumption over time, broken down by model
  • Tool call distribution — which tools get used most, how long they take, where they fail
  • A performance table with avg and p95 latency per tool
  • Session duration percentiles
  • Recent sessions from Tempo with clickable trace IDs

Point the dashboard's Prometheus datasource at Mimir and its Tempo datasource at your Tempo instance, and the panels light up.

Using this with OpenClaw

This is where it got interesting for me. I run OpenClaw as my AI assistant, and it spawns Pi sessions via the ACP runtime. The extension loads automatically when Pi starts — doesn’t matter how it was launched.

Set the OTEL environment variables on the host running OpenClaw and every spawned Pi coding session gets instrumented. You can see how many sessions ran today, which ones burned the most tokens, which tools failed, and whether agents are getting stuck (session duration spikes).

Combine it with Claude Code telemetry and you have one Grafana dashboard covering all your AI coding agent activity. Both agents share the same OTEL pipeline, separated by service.name:

| Agent | Prometheus job label |
|---|---|
| Claude Code | claude-code |
| Pi | pi-coding-agent |

Filter by job to see them individually, or remove the filter for everything.

Things I learned building this

Alloy renames your metrics. OTEL’s service.name resource attribute becomes a Prometheus job label after conversion. Dashboard queries using service_name won’t find anything. Took me longer than I’d like to admit to figure out why my panels were empty.
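In query terms (assuming the metric names from the table above and the service.name-to-job mapping described here), the difference looks like this:

```promql
# Empty panel: service_name is not a label on the stored series.
sum(rate(pi_prompts{service_name="pi-coding-agent"}[5m]))

# Works: the service.name resource attribute became the job label.
sum(rate(pi_prompts{job="pi-coding-agent"}[5m]))
```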

The global trace provider is a singleton. Pi has a /reload command for extensions. But traceProvider.register() sets the global OTEL provider once per process. Reload the extension and the new provider can’t register — spans vanish silently. The fix: call traceProvider.getTracer() on the provider instance instead of trace.getTracer() from the global API.
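The failure mode is easy to reproduce with a toy model of the global registry — this is a conceptual sketch of the first-registration-wins behavior, not the actual OTEL SDK internals:

```typescript
// Toy model of OTEL's global provider registry: the first registration
// wins and later ones are silently ignored, which is why a reloaded
// extension's spans vanish if it resolves tracers through the global API.
interface Tracer {
  provider: string;
}

class TracerProvider {
  constructor(private id: string) {}
  getTracer(): Tracer {
    return { provider: this.id }; // always bound to THIS instance
  }
}

let globalProvider: TracerProvider | undefined;

function register(p: TracerProvider): void {
  // Models traceProvider.register(): a no-op once a provider is set.
  if (!globalProvider) globalProvider = p;
}

function getGlobalTracer(): Tracer {
  // Models trace.getTracer(): always resolves through the singleton.
  return globalProvider!.getTracer();
}

// First load of the extension.
register(new TracerProvider("load-1"));

// /reload creates a fresh provider, but registration silently no-ops.
const reloaded = new TracerProvider("load-2");
register(reloaded);

console.log(getGlobalTracer().provider);    // still "load-1" — stale
console.log(reloaded.getTracer().provider); // "load-2" — the fix
```

Asking the provider instance directly sidesteps the stale singleton, which is exactly why the fix is getTracer() on the instance rather than the global API.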

npm publishing is fiddly. You need a granular access token with “bypass 2FA” enabled. Legacy tokens don’t work. The --otp flag doesn’t work. I spent 20 minutes on this before finding it buried in npm’s docs.

The code

Open source, MIT licensed.

GitHub: mprokopov/pi-otel-telemetry

npm: pi-otel-telemetry

If you’re running Pi and want visibility into what it’s doing, this gets you there in about five minutes. If you’re already monitoring Claude Code with OTEL, adding Pi to the same pipeline is just one pi install away.