I Built an OpenTelemetry Extension for Pi Coding Agent
A few days ago I wrote about setting up observability for Claude Code with the Grafana stack. Claude Code ships with OTEL support baked in — set some env vars, data flows. Easy.
Pi is a different story. It’s an open-source coding agent by Mario Zechner that I’ve been running alongside Claude Code. Good extension system, no telemetry. I wanted both agents reporting into the same Grafana dashboards, so I built the missing piece.
What it does
The extension hooks into Pi’s lifecycle and exports OpenTelemetry traces and metrics. Same protocol, same pipeline as Claude Code. If you already have Alloy/Mimir/Tempo running, Pi data shows up in your existing dashboards.
The trace tree looks like this:
```
session (root span)
└── agent.prompt (per user message)
    └── agent.turn (per LLM call cycle)
        ├── tool.bash / tool.read / tool.edit / tool.write
        ├── llm.request (span event)
        └── model.changed (span event)
```
Metrics:
| Metric | Type | What it tells you | Labels |
|---|---|---|---|
| `pi.tokens.input` | Counter | Input tokens consumed (includes cache) | `llm.model` |
| `pi.tokens.output` | Counter | Output tokens produced | `llm.model` |
| `pi.tool.calls` | Counter | Tool invocations | `tool.name` |
| `pi.tool.errors` | Counter | Failed tool calls | `tool.name` |
| `pi.tool.duration` | Histogram | Tool execution time (ms) | `tool.name` |
| `pi.prompts` | Counter | User messages | `llm.model` |
| `pi.turns` | Counter | LLM call cycles | `llm.model` |
| `pi.session.duration` | Histogram | Session length (s) | `llm.model` |
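Assuming the usual OTLP-to-Prometheus naming (dots become underscores), a few queries along these lines are useful once the data lands — hypothetical examples, adjust to whichever ingestion path you pick:

```promql
# Token burn per model over the last hour
sum by (llm_model) (increase(pi_tokens_input[1h]))

# Tool failure ratio
sum(rate(pi_tool_errors[1h])) / sum(rate(pi_tool_calls[1h]))

# p95 tool latency (ms), per tool
histogram_quantile(0.95, sum by (tool_name, le) (rate(pi_tool_duration_bucket[1h])))
```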
Setup
Three steps.
1. Install
```shell
pi install npm:pi-otel-telemetry
```
That’s it. Pi auto-discovers extensions from ~/.pi/agent/extensions/.
2. Configure
Add to your ~/.zshrc or ~/.bashrc:
```shell
export OTEL_EXPORTER_OTLP_ENDPOINT="http://your-alloy-host:14318"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_RESOURCE_ATTRIBUTES="team.id=sre,environment=experiment,user.name=maksym"
export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT="http://your-mimir-host:9009/otlp/v1/metrics"
```
I use a separate metrics endpoint because I send traces through Alloy (forwarded to Tempo) but push metrics directly to Mimir’s OTLP ingestion. This sidesteps the metric renaming headache — Alloy’s Prometheus conversion turns pi_tokens_input into pi_tokens_input_total, and suddenly your dashboard queries don’t match. Pick one path and be consistent.
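Concretely, the same counter answers to a different name depending on the path it took in — which is why a dashboard built against one path breaks silently on the other:

```promql
# Direct OTLP ingestion into Mimir keeps the OTEL-derived name:
sum(rate(pi_tokens_input[5m]))

# After Alloy's Prometheus conversion, counters gain a _total suffix:
sum(rate(pi_tokens_input_total[5m]))
```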
3. Verify
Start Pi, run a few prompts, then:
```shell
curl -s "http://your-mimir-host:9009/prometheus/api/v1/query?query=pi_prompts"
```
Data? You’re good.
Quick start with Jaeger
If you don’t have a Grafana stack and just want to see the traces:
```shell
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/jaeger:2 \
  --set receivers.otlp.protocols.http.endpoint=0.0.0.0:4318
```

```shell
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 pi
open http://localhost:16686
```
Session spans, prompt spans, tool execution spans — the full tree shows up in Jaeger’s UI.
The dashboard
I built a Grafana dashboard and included the JSON in the repo. Grab pi-otel-telemetry.json and import it (Dashboards → Import → Upload JSON).
What’s on it:
- Six stat cards up top: prompts, turns, tool calls, errors, input tokens, output tokens
- Token consumption over time, broken down by model
- Tool call distribution — which tools get used most, how long they take, where they fail
- A performance table with avg and p95 latency per tool
- Session duration percentiles
- Recent sessions from Tempo with clickable trace IDs
Point the dashboard's Prometheus datasource at Mimir and its Tempo datasource at your Tempo instance, and it works.
Using this with OpenClaw
This is where it got interesting for me. I run OpenClaw as my AI assistant, and it spawns Pi sessions via the ACP runtime. The extension loads automatically when Pi starts — doesn’t matter how it was launched.
Set the OTEL environment variables on the host running OpenClaw and every spawned Pi coding session gets instrumented. You can see how many sessions ran today, which ones burned the most tokens, which tools failed, and whether agents are getting stuck (session duration spikes).
Combine it with Claude Code telemetry and you have one Grafana dashboard covering all your AI coding agent activity. Both agents share the same OTEL pipeline, separated by service.name:
| Agent | Prometheus job label |
|---|---|
| Claude Code | claude-code |
| Pi | pi-coding-agent |
Filter by job to see them individually, or remove the filter for everything.
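For example, scoping a panel to one agent is just a label matcher on `job` — the same query without the matcher aggregates across both:

```promql
# Pi activity only
sum(rate(pi_prompts{job="pi-coding-agent"}[1h]))

# Broken out per agent on a shared panel
sum by (job) (rate(pi_prompts[1h]))
```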
Things I learned building this
Alloy renames your metrics. OTEL’s service.name resource attribute becomes a Prometheus job label after conversion. Dashboard queries using service_name won’t find anything. Took me longer than I’d like to admit to figure out why my panels were empty.
The global trace provider is a singleton. Pi has a /reload command for extensions. But traceProvider.register() sets the global OTEL provider once per process. Reload the extension and the new provider can’t register — spans vanish silently. The fix: call traceProvider.getTracer() on the provider instance instead of trace.getTracer() from the global API.
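A toy model of the failure mode (plain TypeScript, not OTEL's real internals): global registration sticks once per process, so a reloaded extension that registers a fresh provider keeps resolving to the old one through the global API.

```typescript
type Tracer = { name: string };

class TracerProvider {
  constructor(private label: string) {}
  getTracer(name: string): Tracer {
    return { name: `${this.label}:${name}` };
  }
}

let globalProvider: TracerProvider | undefined;

// Mirrors OTEL's behavior: only the first registration wins.
function register(p: TracerProvider): void {
  if (!globalProvider) globalProvider = p;
}

function globalGetTracer(name: string): Tracer {
  return (globalProvider ?? new TracerProvider("noop")).getTracer(name);
}

// First extension load:
const first = new TracerProvider("gen1");
register(first);

// /reload creates a second provider, but registration is a no-op:
const second = new TracerProvider("gen2");
register(second);

// Global lookup still resolves to gen1 — spans from gen2 vanish.
export const viaGlobal = globalGetTracer("pi").name;
// The fix: ask the provider instance directly.
export const viaInstance = second.getTracer("pi").name;
```

This is why the extension calls `getTracer()` on its own provider instance rather than going through the global `trace.getTracer()`.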
npm publishing is fiddly. You need a granular access token with “bypass 2FA” enabled. Legacy tokens don’t work. The --otp flag doesn’t work. I spent 20 minutes on this before finding it buried in npm’s docs.
The code
Open source, MIT licensed.
GitHub: mprokopov/pi-otel-telemetry
npm: pi-otel-telemetry
If you’re running Pi and want visibility into what it’s doing, this gets you there in about five minutes. If you’re already monitoring Claude Code with OTEL, adding Pi to the same pipeline is just one pi install away.