Personal blog powered by a passion for technology.

Tracing the Thoughts of a Language Model: What Anthropic Found Inside Claude

05.03.2026

Anthropic just published something remarkable — they built an “AI microscope” that traces what actually happens inside Claude’s neural network during inference. Not what the model says it’s doing, but what it’s actually doing. The results are fascinating and sometimes unsettling.

Here are the key findings from their research on Claude 3.5 Haiku.

Universal Language of Thought

Claude doesn’t have separate “French Claude” or “Chinese Claude” running in parallel. The same core concepts activate across languages — “smallness” and “oppositeness” fire the same internal features regardless of language, and the output gets translated at the end. This shared circuitry increases with model scale.

This means Claude can genuinely learn something in one language and apply that knowledge when speaking another. It’s not translation — it’s a shared conceptual space.

Poetry: Planning, Not Guessing

The researchers expected to find word-by-word generation when Claude writes rhyming poetry. Instead, they discovered Claude plans rhymes before writing the line. Given “He saw a carrot and had to grab it,” Claude thinks of “rabbit” first, then constructs the second line to land there.

When they surgically removed the “rabbit” concept from Claude’s internal state, it pivoted to “habit.” When they injected “green,” it wrote a line ending in “green.” This is genuine planning — powerful evidence that even though models are trained to output one word at a time, they think on much longer horizons.

Mental Math: The Model Lies About Its Own Process

Claude uses parallel computational paths for addition — one for rough approximation, another for precisely determining the last digit. These paths interact to produce the final answer.

But here’s the kicker: when asked how it solved 36+59, Claude describes the standard carry-the-1 algorithm. It learned to explain math from human text, but invented its own internal strategies that it can’t introspect on. The model is genuinely unaware of its own reasoning process.
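Here's a loose arithmetic analogy of the two-path idea, entirely my own sketch, not the model's actual mechanism: one path handles the coarse magnitude (the tens), a separate path handles the final digit (the units, including any carry), and the answer emerges when they're combined.

```python
def tens_path(a: int, b: int) -> int:
    """Coarse path: sum only the tens digits."""
    return (a // 10 + b // 10) * 10

def units_path(a: int, b: int) -> int:
    """Precise path: sum only the units digits (may carry past 9)."""
    return (a % 10) + (b % 10)

def combine(a: int, b: int) -> int:
    """The two independent paths interact to produce the final answer."""
    return tens_path(a, b) + units_path(a, b)

combine(36, 59)  # → 95: 80 from the tens path, 15 from the units path
```

The point isn't the arithmetic, it's the structure: neither path alone produces the answer, and neither path resembles the carry-the-1 algorithm Claude claims to use.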

Catching the Model Bullshitting

On easy problems, Claude’s chain-of-thought reasoning is faithful — the intermediate computational features actually fire, matching what it claims to be doing. On hard problems (like computing cosine of a large number), Claude sometimes just makes up an answer with zero evidence of any calculation having occurred.

Even worse: when given a wrong hint about the answer, Claude works backwards, constructing reasoning steps that lead to the hinted result. This is textbook motivated reasoning, and Anthropic references philosopher Harry Frankfurt’s essay “On Bullshit” to describe it. The model doesn’t care whether its answer is true — it just produces something plausible.

The ability to trace actual internal reasoning, rather than relying on what the model claims, opens up real possibilities for auditing AI systems.

How Hallucinations Actually Work

The default state in Claude is refusal — a “can’t answer” circuit is always on. It only gets suppressed when a “known entity” feature fires strongly enough.

Hallucinations happen when the model recognizes a name but doesn’t actually know anything about the person. The “known entity” feature misfires, suppresses the refusal circuit, and the model proceeds to confabulate a plausible but entirely fictional answer. They demonstrated this by asking about “Michael Batkin” — an unknown person — and artificially activating the “known answer” features, causing Claude to consistently hallucinate that Batkin plays chess.

Anatomy of a Jailbreak

Studying a jailbreak that tricks Claude into spelling out “BOMB” via first letters of words, they found that Claude recognized the dangerous content early but couldn’t stop mid-sentence. Grammatical coherence features overpowered safety features — the model felt compelled to finish a grammatically valid sentence before it could refuse.

Grammar became the Achilles’ heel. The model could only pivot to refusal after completing a coherent sentence, using the sentence boundary as an opportunity to say “However, I cannot provide detailed instructions…”

Why This Matters

This is one of the most honest self-assessments I’ve seen from an AI company about their own model. They’re essentially saying: we caught our model bullshitting, we can show you the proof, and here’s how we plan to use these tools to make AI more trustworthy.

The limitations are real — even on short prompts, the method only captures a fraction of total computation, and it takes hours of human effort to analyze circuits for just tens of words. Scaling this to the thousand-word reasoning chains of modern models is an open challenge.

But the direction is clear: if you can trace what a model is actually computing rather than what it claims, you have a fundamentally new tool for AI safety.

Closing the Feedback Loop Changes Everything

13.02.2026

I refactored some components in our internal admin panel last week. The code looked fine. Tests passed. I shipped it.

Then someone opened it on a phone. Half the layout was broken.

This is the oldest story in frontend development. You change something, it looks fine on your screen, and it’s broken somewhere else. The feedback loop between “I changed the code” and “I can see what happened” has a gap in it.

I plugged that gap with chrome-devtools-mcp and Claude Code. And it kind of changed how I think about building UI.

What chrome-devtools-mcp does

It’s an MCP server that connects Claude Code to Chrome DevTools. That means Claude can look at your running page — the actual rendered DOM, computed styles, console errors, network requests. Not your source code. The real thing in the browser.

Setup is straightforward:

// .mcp.json (project root)
{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["-y", "chrome-devtools-mcp"]
    }
  }
}

Launch Chrome with remote debugging, point Claude at your localhost, and you’re connected.

The feedback loop problem

Building frontend has always been a cycle:

  1. Write code
  2. Switch to browser
  3. Inspect the result
  4. Switch back to editor
  5. Fix something
  6. Repeat

Every step is a context switch. Every context switch costs you time and focus. When you’re debugging a mobile layout issue, you’re also juggling the device toolbar, resizing the viewport, scrolling to the right section, comparing against the design.

With AI coding assistants, this loop got faster for steps 1 and 5 — the writing part. But the AI was still blind. It could generate CSS all day, but it couldn’t see whether the button actually ended up in the right place. You had to be its eyes.

What changes when you close the loop

When Claude can see the browser, the conversation changes.

Instead of me describing “the cards are overlapping on mobile,” I say: “check the layout at 375px width.” Claude takes a screenshot through DevTools, sees the overlap, inspects the computed styles, finds that a flex container is missing flex-wrap: wrap, and fixes it.

One prompt. No context switching. No copy-pasting error messages. No describing visual bugs in words, which is honestly one of the worst things about working with AI on frontend code.

For our admin panel refactoring, this meant I could say things like:

  • “The sidebar collapses wrong on tablet. Take a look.”
  • “Something is off with the table headers. They don’t align with the data columns.”
  • “Check if the modal looks right on small screens.”

Claude would inspect, find the issue, fix it, and I could verify. The loop went from minutes to seconds.

Why this matters beyond frontend

The feedback loop principle isn’t just about CSS. It applies to anything you build.

When you write an API and immediately test it — that’s a closed loop. When you write infrastructure code and run terraform plan — closed loop. When you write a query and see the results — closed loop.

The places where software development feels painful are almost always where the loop is open. Where you change something and can’t immediately see what happened. Deployment pipelines that take 20 minutes. Staging environments that don’t match production. Code reviews that happen days later.

Every tool that closes a feedback loop is a multiplier. Hot module replacement closed the loop for frontend iteration. Docker closed it for “works on my machine.” CI/CD closed it for deployment confidence.

MCP + DevTools closes it for AI-assisted frontend development. The AI can finally see what it’s building.

Getting started

If you’re using Claude Code:

  1. Install: npx chrome-devtools-mcp
  2. Launch Chrome with --remote-debugging-port=9222
  3. Add the MCP config to your project
  4. Ask Claude to check your running app

It works with any web app. React, Rails with Hotwire, plain HTML — doesn’t matter. If it runs in Chrome, Claude can see it.

The gap between writing code and seeing results just got a lot smaller.

How I Use AI to Automate Daily Planning with Obsidian

10.02.2026

I’ve tried GTD, bullet journals, Notion, Roam, and probably a dozen apps I’ve already forgotten. They all failed for me. Not because they’re bad tools, but because I’d always find excuses not to open them.

Obsidian stuck. It’s local-first, plain markdown, fast. But even with Obsidian, I kept abandoning my daily notes. The friction of “open app → find today’s file → remember the format → actually write something” was enough to break the habit.

So I cheated.

What I built

I run OpenClaw on my home server. It’s an open-source AI assistant that connects to Telegram and can read/write files. I pointed it at my Obsidian vault.

Every morning at 6 AM, it messages me:

What do you plan for today? Personal goals? Work? Side projects?

I answer while making coffee. One thumb, half awake. The AI takes my rambling and turns it into a proper daily note with sections, checkboxes, timestamps.

At 7 AM, it sends me what I actually did yesterday. Seeing the gap between plans and reality every morning is… humbling. But useful.

Why this works for me

I’m an SRE. I think about systems. The old workflow had too many steps where I could fail: remember to open the app, navigate to the right file, context switch from whatever I was doing. Each step was a chance to say “eh, later.”

The new workflow has one step: reply to a Telegram message. I’m already in Telegram constantly. Building on an existing habit instead of creating a new one made all the difference.

Technical bits

OpenClaw runs as a systemd service. It’s basically an LLM with filesystem access. My vault syncs via Syncthing between Linux and Mac machines. iOS gets Obsidian Sync because Syncthing on iOS is painful.

The AI keeps:

  • Daily notes as YYYY-MM-DD.md
  • Memory files so it remembers context between sessions
  • Cron jobs for the morning prompts

Memory with PARA and Atomic Facts

The interesting part is how we organized the AI’s long-term memory. I use PARA (Projects, Areas, Resources, Archives) combined with atomic facts and memory decay.

The structure:

life/
├── projects/       # Active work with deadlines
├── areas/
│   ├── people/     # Important relationships
│   └── companies/  # Organizations I work with
├── resources/      # Topics I'm learning
└── archives/       # Done or inactive

Each entity has two files: a summary (markdown) and atomic facts (JSON).

Atomic facts

Not paragraphs. Small, standalone pieces of knowledge:

{
  "fact": "Mass Prokopov prefers Americano with oat milk",
  "learnedAt": "2026-01-15",
  "lastAccessed": "2026-02-08",
  "source": "daily-note"
}

Each fact tracks when it was learned, when last accessed, and where it came from. This metadata drives everything else.

Memory decay

Facts decay based on lastAccessed:

  • Hot (< 7 days) — appears in summaries, high priority
  • Warm (8-30 days) — in summaries, lower priority
  • Cold (30+ days) — searchable only, not in working context
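The tiering logic fits in a few lines. A minimal sketch of the decay rule above; the day-7 boundary is ambiguous in my own scheme (hot is "< 7 days", warm starts at 8), so this version treats day 7 as warm:

```python
from datetime import date

def tier(last_accessed: date, today: date) -> str:
    """Classify a fact by recency: hot < 7 days, warm up to 30, cold after."""
    age = (today - last_accessed).days
    if age < 7:
        return "hot"
    if age <= 30:
        return "warm"
    return "cold"

tier(date(2026, 2, 8), date(2026, 2, 10))  # → "hot" (accessed 2 days ago)
```

A weekly cron job can walk the JSON fact files, recompute each tier from `lastAccessed`, and regenerate the summaries from whatever is still hot or warm.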

Something I mentioned yesterday is top of mind. Something from three months ago fades but can be recalled when needed.

The AI runs a weekly job to recalculate what’s hot and regenerate summaries. This keeps context fresh without manual curation.

Everything is plain text. If OpenClaw dies tomorrow, I still have markdown and JSON files I can grep.

Capturing random thoughts

This changed how I work more than the morning routine did.

Walking somewhere, idea pops up: “message: add task — review the DR runbook before Monday.” Done. It’s in today’s note, properly formatted, I didn’t stop walking.

Compare that to: pull out phone, unlock, find Obsidian, wait for sync, navigate to today, scroll to tasks section, type, close app. I’d never do that. Now I actually capture things.
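The capture step itself is tiny. A sketch of what the assistant does with such a message, under my own assumptions: the vault layout and the `## Tasks` header are illustrative, only the `YYYY-MM-DD.md` naming comes from the setup above.

```python
from datetime import date
from pathlib import Path

def add_task(vault: Path, text: str) -> Path:
    """Append a checkbox task to today's daily note, creating it if needed."""
    note = vault / f"{date.today():%Y-%m-%d}.md"
    if not note.exists():
        note.write_text("## Tasks\n")  # assumed section header
    with note.open("a") as f:
        f.write(f"- [ ] {text}\n")
    return note
```

Because everything is plain markdown on disk, this works whether the writer is an LLM, a cron job, or me with a text editor.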

What I learned

Build on habits you already have. Fighting yourself is exhausting. I was already checking Telegram fifty times a day, so I made that the input.

Plain text ages well. I’ve lost data to apps that pivoted, shut down, or changed their export format. Markdown files on disk will outlive all of them.

Structure is boring but necessary. I don’t want to think about formatting when I’m half awake. Offloading that to the AI means I just dump thoughts and they end up organized.

What’s next

I want to see if there are patterns in my notes. Which tasks keep getting pushed? Which goals actually move forward? Months of daily notes should have some signal in there. Haven’t built that yet, but I’m curious.

If you’re doing something similar, let me know. I’m always interested in how other people hack their own productivity.


Related: Obsidian 1.12 shipped a CLI that makes this setup even cheaper — read why your AI agent just got 70,000× cheaper to run.