Show HN: Open-source browser for AI agents

theredsix 118 points 39 comments March 11, 2026
github.com · View on Hacker News

Hi HN, I forked chromium and built agent-browser-protocol (ABP) after noticing that most browser-agent failures aren’t really about the model misunderstanding the page. Instead, the problem is that the model is reasoning from a stale state.

ABP is designed to keep the acting agent synchronized with the browser at every step. After each action (click, type, etc), it freezes JavaScript execution and rendering, then captures the resulting state. It also compiles the notable events that occurred during that action loop, such as navigation, file pickers, permission prompts, alerts, and downloads, and sends that along with a screenshot of the frozen page state back to the agent.

The result is that browser interaction starts to feel more like a multimodal chat loop. The agent takes an action, gets back a fresh visual state and a structured summary of what happened, then decides what to do next from there. That fits much better with how LLMs already work.

A few common browser-use failures ABP helps eliminate:

* A modal appears after the last Playwright screenshot and blocks the input the agent was about to use
* Dynamic filters cause the page to reflow between steps
* An autocomplete dropdown opens and covers the element the agent intended to click
* alert() / confirm() interrupts the flow
* Downloads are triggered, but the agent has no reliable way to know when they’ve completed

As proof, ABP with opus 4.6 as the driver scores 90.5% on the Online Mind2Web benchmark. I think modern LLMs already understand websites, they just need a better tool to interact with them.

Happy to answer questions about the architecture, forking chrome or anything else in the comments below.

Try it out: `claude mcp add browser -- npx -y agent-browser-protocol --mcp` (Codex/OpenCode instructions in the docs)

Demo video: https://www.loom.com/share/387f6349196f417d8b4b16a5452c3369
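
To make the act / freeze / capture loop described in the post concrete, here is a minimal toy sketch in Python. It is not ABP's API: `ToyBrowser`, `StepResult`, the action strings, and the event names are all hypothetical, and the "page" is just a string standing in for a screenshot. It only illustrates the idea that every action returns a frozen state plus the events that fired while the action was settling.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class StepResult:
    """What the agent gets back after one action (hypothetical shape)."""
    screenshot: str     # stand-in for an image of the frozen page
    events: List[str]   # notable events that fired during this action


@dataclass
class ToyBrowser:
    """Toy page model for illustration only; not ABP's real interface."""
    page: str = "[search box]"
    frozen: bool = True
    # Side effects queued by an action: (event name, page after the effect).
    pending: List[Tuple[str, str]] = field(default_factory=list)

    def step(self, action: str) -> StepResult:
        # 1. Unfreeze and apply the action.
        self.frozen = False
        if action == "type lap":
            self.page = "[search box: lap]"
            # Typing triggers an autocomplete dropdown a moment later.
            self.pending.append(
                ("autocomplete_opened", "[search box: lap][dropdown]")
            )
        elif action == "click download":
            self.pending.append(("download_completed", self.page))

        # 2. Let the side effects of this action run, recording each event.
        events: List[str] = []
        while self.pending:
            name, new_page = self.pending.pop(0)
            self.page = new_page
            events.append(name)

        # 3. Freeze again and capture, so the state cannot drift
        #    while the agent decides what to do next.
        self.frozen = True
        return StepResult(screenshot=self.page, events=events)


def run_agent(browser: ToyBrowser, actions: List[str]) -> List[StepResult]:
    """Drive the browser one action at a time, as in a chat loop."""
    return [browser.step(action) for action in actions]


if __name__ == "__main__":
    for result in run_agent(ToyBrowser(), ["type lap", "click download"]):
        print(result.events, result.screenshot)
```

In this toy version the dropdown is already in the state the agent sees after typing, rather than appearing after a screenshot was taken, which is the stale-state failure the post describes.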

Discussion Highlights (11 comments)

theredsix

OP here, happy to answer any questions!

giancarlostoro

Interesting. I wonder if this would help other projects too. One that comes to mind is ArchiveBox: I don't know if they still have the issue I'm thinking of, but its Chrome instances would eventually (as the meme goes) consume all available RAM. If freezing execution could stop that, this could be useful for more than just AI agents.

Retr0id

> As proof, ABP with opus 4.6 as the driver scores 90.5% on the Online Mind2Web benchmark

And what does opus score with "regular" browser harnesses?

gregpr07

Love it! From first principles: this kinda answers the "do we really even need CDP" question I always have in my head while building browser use...

appcustodian2

how do you know when a page is "settled"?

notpublic

From the commit history, it looks like you are using Claude for some of the development. Would love to hear how you are using Claude to go through such a massive code base. btw, impressive project.

robutsume

The freeze-between-steps approach is the right call. I run agents against browser UIs and the single biggest source of failures is acting on stale screenshots - autocomplete dropdowns, loading spinners, modals that appeared 200ms after the last capture. Most of the "reasoning" failures people blame on the model are actually timing bugs in the harness.

Curious about the chromium fork maintenance burden though. Every major chrome release is going to want a rebase. Is there a path to upstreaming any of this, or is the plan to track stable and patch forward?

taskpod

Great to see purpose-built agent tooling. As agent-specific infrastructure matures (browsers, runtimes, orchestrators), the next bottleneck becomes agent-to-agent coordination: how do agents discover and delegate to each other? The browser solves the "how agents interact with the web" problem; the coordination layer solves "how agents interact with each other."

dokdev

Freezing the browser at every step is a very good approach. I am also working on an agent browser. It uses wireframe snapshots instead of screenshots to reduce token cost. https://github.com/agent-browser-io/browser

exabrial

> then freezes JavaScript + virtual time until the next step...

Ironically, I wish this would happen for me browsing the internet too...

seanrrr

> Pause JavaScript + virtual time

Very cool! Sometimes when I try to debug things with the Chrome DevTools MCP, Claude clicks something, too many things happen at once, and it comes to the wrong conclusions about the state of things. It sounds like this should give it a more accurate slice of time / snapshot of things.
