Taming LLMs: Using Executable Oracles to Prevent Bad Code

mad44 32 points 18 comments March 26, 2026
john.regehr.org · View on Hacker News

Discussion Highlights (5 comments)

dktoao

"Our goal should be to give an LLM coding agent zero degrees of freedom" Wouldn't that just be called inventing a new language with all the overhead of the languages we already have? Are we getting to the point where getting LLMs to be productive and also write good code is going to require so much overhead and additional procedures and tools that we might as well write the code ourselves. Hmmm...

RS-232

Has anyone had success using two agents, with one as the creator and one as an adversarial "reviewer"? Is the output usually better or worse?
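[Editor's note: the creator/reviewer pattern RS-232 asks about can be sketched as a simple feedback loop. This is a hypothetical illustration, not from the article; `call_creator` and `call_reviewer` are stand-ins for real LLM API calls.]

```python
# Hypothetical creator/reviewer agent loop.
# call_creator and call_reviewer stand in for real LLM calls.

def call_creator(task, feedback=None):
    # Placeholder: a real system would prompt an LLM with the task
    # and any reviewer feedback from the previous round.
    return "def add(a, b):\n    return a + b\n"

def call_reviewer(task, code):
    # Placeholder adversarial reviewer: returns None to approve,
    # or a critique string that is fed back to the creator.
    if "def add" in code:
        return None
    return "Expected a function named `add`."

def generate_with_review(task, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        code = call_creator(task, feedback)
        feedback = call_reviewer(task, code)
        if feedback is None:
            return code  # reviewer approved
    raise RuntimeError("reviewer never approved the code")

code = generate_with_review("write add(a, b)")
```

The key design choice is that the reviewer's critique loops back into the creator's next attempt, rather than the reviewer silently rewriting the code itself.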

ReptileMan

Now is Haskell's time to shine.

JSR_FDED

> JustHTML was effectively tested into existence using a large, existing test suite.

I love the phrase "tested into existence".

shubhamintech

The oracle problem is tractable when the output is code: you can compile it, run tests, diff the output. For conversational AI it's much harder. We've seen teams use LLM-as-judge as their validation layer and it works until the judge starts missing the same failure modes as the generator.
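[Editor's note: a minimal sketch of the "compile it, run tests" oracle shubhamintech describes. The `slugify` candidate and test cases are invented for illustration; a real harness would sandbox the `exec`.]

```python
# Executable oracle: accept generated code only if it compiles
# and passes concrete test cases, instead of asking an LLM judge.

candidate = """
def slugify(s):
    return s.strip().lower().replace(" ", "-")
"""

def oracle(source, cases):
    ns = {}
    try:
        # Compile and load the candidate into an isolated namespace.
        exec(compile(source, "<candidate>", "exec"), ns)
    except SyntaxError:
        return False
    fn = ns.get("slugify")
    if fn is None:
        return False
    # The test cases are the ground truth, not a model's opinion.
    return all(fn(inp) == expected for inp, expected in cases)

cases = [("Hello World", "hello-world"), ("  Rust  ", "rust")]
print(oracle(candidate, cases))  # → True
```

A judge model can drift along with the generator; a fixed test suite cannot, which is why code is the easy case for validation.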
