I Built an OpenTelemetry Extension for Pi Coding Agent
A few days ago I wrote about setting up observability for Claude Code with the Grafana stack. Claude Code ships with OTEL support baked in — set some env vars, data flows. Easy.
Pi is a different story. It’s an open-source coding agent by Mario Zechner that I’ve been running alongside Claude Code. Good extension system, no telemetry. I wanted both agents reporting into the same Grafana dashboards, so I built the missing piece.
What it does
The extension hooks into Pi’s lifecycle and exports OpenTelemetry traces and metrics. Same protocol, same pipeline as Claude Code. If you already have Alloy/Mimir/Tempo running, Pi data shows up in your existing dashboards.
The trace tree looks like this:
```
session (root span)
└── agent.prompt (per user message)
    └── agent.turn (per LLM call cycle)
        ├── tool.bash / tool.read / tool.edit / tool.write
        ├── llm.request (span event)
        └── model.changed (span event)
```
Metrics:
| Metric | Type | What it tells you | Labels |
|---|---|---|---|
| `pi.tokens.input` | Counter | Input tokens consumed (includes cache) | `llm.model` |
| `pi.tokens.output` | Counter | Output tokens produced | `llm.model` |
| `pi.tool.calls` | Counter | Tool invocations | `tool.name` |
| `pi.tool.errors` | Counter | Failed tool calls | `tool.name` |
| `pi.tool.duration` | Histogram | Tool execution time (ms) | `tool.name` |
| `pi.prompts` | Counter | User messages | `llm.model` |
| `pi.turns` | Counter | LLM call cycles | `llm.model` |
| `pi.session.duration` | Histogram | Session length (s) | `llm.model` |
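Assuming the usual OTLP-to-Prometheus naming (dots become underscores), a few queries along these lines are useful once the data lands — hypothetical examples, adjust to whichever ingestion path you pick:

```promql
# Token burn per model over the last hour
sum by (llm_model) (increase(pi_tokens_input[1h]))

# Tool failure ratio
sum(rate(pi_tool_errors[1h])) / sum(rate(pi_tool_calls[1h]))

# p95 tool latency (ms), per tool
histogram_quantile(0.95, sum by (tool_name, le) (rate(pi_tool_duration_bucket[1h])))
```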
Setup
Three steps.
1. Install
```shell
pi install npm:pi-otel-telemetry
```
That’s it. Pi auto-discovers extensions from ~/.pi/agent/extensions/.
2. Configure
Add to your ~/.zshrc or ~/.bashrc:
```shell
export OTEL_EXPORTER_OTLP_ENDPOINT="http://your-alloy-host:14318"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_RESOURCE_ATTRIBUTES="team.id=sre,environment=experiment,user.name=maksym"
export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT="http://your-mimir-host:9009/otlp/v1/metrics"
```
I use a separate metrics endpoint because I send traces through Alloy (forwarded to Tempo) but push metrics directly to Mimir’s OTLP ingestion. This sidesteps the metric renaming headache — Alloy’s Prometheus conversion turns pi_tokens_input into pi_tokens_input_total, and suddenly your dashboard queries don’t match. Pick one path and be consistent.
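Concretely, the same counter answers to a different name depending on the path it took in — which is why a dashboard built against one path breaks silently on the other:

```promql
# Direct OTLP ingestion into Mimir keeps the OTEL-derived name:
sum(rate(pi_tokens_input[5m]))

# After Alloy's Prometheus conversion, counters gain a _total suffix:
sum(rate(pi_tokens_input_total[5m]))
```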
3. Verify
Start Pi, run a few prompts, then:
```shell
curl -s "http://your-mimir-host:9009/prometheus/api/v1/query?query=pi_prompts"
```
Data? You’re good.
Quick start with Jaeger
If you don’t have a Grafana stack and just want to see the traces:
```shell
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/jaeger:2 \
  --set receivers.otlp.protocols.http.endpoint=0.0.0.0:4318
```

```shell
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 pi
open http://localhost:16686
```
Session spans, prompt spans, tool execution spans — the full tree shows up in Jaeger’s UI.
The dashboard
I built a Grafana dashboard and included the JSON in the repo. Grab pi-otel-telemetry.json and import it (Dashboards → Import → Upload JSON).
What’s on it:
- Six stat cards up top: prompts, turns, tool calls, errors, input tokens, output tokens
- Token consumption over time, broken down by model
- Tool call distribution — which tools get used most, how long they take, where they fail
- A performance table with avg and p95 latency per tool
- Session duration percentiles
- Recent sessions from Tempo with clickable trace IDs
Point the dashboard's Prometheus datasource at Mimir and its Tempo datasource at your Tempo instance, and it works.
Using this with OpenClaw
This is where it got interesting for me. I run OpenClaw as my AI assistant, and it spawns Pi sessions via the ACP runtime. The extension loads automatically when Pi starts — doesn’t matter how it was launched.
Set the OTEL environment variables on the host running OpenClaw and every spawned Pi coding session gets instrumented. You can see how many sessions ran today, which ones burned the most tokens, which tools failed, and whether agents are getting stuck (session duration spikes).
Combine it with Claude Code telemetry and you have one Grafana dashboard covering all your AI coding agent activity. Both agents share the same OTEL pipeline, separated by service.name:
| Agent | Prometheus job label |
|---|---|
| Claude Code | claude-code |
| Pi | pi-coding-agent |
Filter by job to see them individually, or remove the filter for everything.
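For example, scoping a panel to one agent is just a label matcher on `job` — the same query without the matcher aggregates across both:

```promql
# Pi activity only
sum(rate(pi_prompts{job="pi-coding-agent"}[1h]))

# Broken out per agent on a shared panel
sum by (job) (rate(pi_prompts[1h]))
```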
Things I learned building this
Alloy renames your metrics. OTEL’s service.name resource attribute becomes a Prometheus job label after conversion. Dashboard queries using service_name won’t find anything. Took me longer than I’d like to admit to figure out why my panels were empty.
The global trace provider is a singleton. Pi has a /reload command for extensions. But traceProvider.register() sets the global OTEL provider once per process. Reload the extension and the new provider can’t register — spans vanish silently. The fix: call traceProvider.getTracer() on the provider instance instead of trace.getTracer() from the global API.
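A toy model of the failure mode (plain TypeScript, not OTEL's real internals): global registration sticks once per process, so a reloaded extension that registers a fresh provider keeps resolving to the old one through the global API.

```typescript
type Tracer = { name: string };

class TracerProvider {
  constructor(private label: string) {}
  getTracer(name: string): Tracer {
    return { name: `${this.label}:${name}` };
  }
}

let globalProvider: TracerProvider | undefined;

// Mirrors OTEL's behavior: only the first registration wins.
function register(p: TracerProvider): void {
  if (!globalProvider) globalProvider = p;
}

function globalGetTracer(name: string): Tracer {
  return (globalProvider ?? new TracerProvider("noop")).getTracer(name);
}

// First extension load:
const first = new TracerProvider("gen1");
register(first);

// /reload creates a second provider, but registration is a no-op:
const second = new TracerProvider("gen2");
register(second);

// Global lookup still resolves to gen1 — spans from gen2 vanish.
export const viaGlobal = globalGetTracer("pi").name;
// The fix: ask the provider instance directly.
export const viaInstance = second.getTracer("pi").name;
```

This is why the extension calls `getTracer()` on its own provider instance rather than going through the global `trace.getTracer()`.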
npm publishing is fiddly. You need a granular access token with “bypass 2FA” enabled. Legacy tokens don’t work. The --otp flag doesn’t work. I spent 20 minutes on this before finding it buried in npm’s docs.
The code
Open source, MIT licensed.
GitHub: mprokopov/pi-otel-telemetry
npm: pi-otel-telemetry
If you’re running Pi and want visibility into what it’s doing, this gets you there in about five minutes. If you’re already monitoring Claude Code with OTEL, adding Pi to the same pipeline is just one pi install away.