Taming LLMs: Using Executable Oracles to Prevent Bad Code
mad44
32 points
18 comments
March 26, 2026
Related Discussions
Found 5 related stories in 48.3ms across 3,471 title embeddings via pgvector HNSW
- LLM Doesn't Write Correct Code. It Writes Plausible Code pretext · 62 pts · March 07, 2026 · 60% similar
- I don't use LLMs for programming ms7892 · 68 pts · March 12, 2026 · 59% similar
- How I write software with LLMs indigodaddy · 69 pts · March 16, 2026 · 57% similar
- LLMs work best when the user defines their acceptance criteria first dnw · 137 pts · March 07, 2026 · 56% similar
- Reliable Software in the LLM Era mempirate · 102 pts · March 12, 2026 · 56% similar
Discussion Highlights (5 comments)
dktoao
> Our goal should be to give an LLM coding agent zero degrees of freedom

Wouldn't that just be called inventing a new language, with all the overhead of the languages we already have? Are we getting to the point where getting LLMs to be productive and also write good code requires so much overhead and so many additional procedures and tools that we might as well write the code ourselves? Hmmm...
RS-232
Has anyone had success using two agents, with one as the creator and the other as an adversarial "reviewer"? Is the output usually better or worse?
ReptileMan
Now is Haskell's time to shine.
JSR_FDED
> JustHTML was effectively tested into existence using a large, existing test suite.

I love the phrase “tested into existence”.
shubhamintech
The oracle problem is tractable when the output is code: you can compile it, run tests, diff the output. For conversational AI it's much harder. We've seen teams use LLM-as-judge as their validation layer and it works until the judge starts missing the same failure modes as the generator.
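The "executable oracle" idea for code can be sketched concretely: accept an LLM-generated candidate only if it compiles and passes an existing test suite. This is a minimal illustration, not anyone's actual pipeline; the function name and the use of a plain subprocess as the test runner are assumptions.

```python
import pathlib
import py_compile
import subprocess
import tempfile


def check_candidate(source: str, test_cmd: list[str]) -> bool:
    """Return True only if the generated source compiles and the tests pass."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "candidate.py"
        path.write_text(source)
        try:
            # Oracle 1: the candidate must at least compile.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError:
            return False
        # Oracle 2: the candidate must pass the existing test suite
        # (here, whatever command the caller supplies, e.g. a pytest run).
        result = subprocess.run(test_cmd + [str(path)], capture_output=True)
        return result.returncode == 0
```

The point of the sketch is that both checks are mechanical and deterministic, which is exactly what an LLM-as-judge layer is not: the generator can be retried in a loop against these oracles without the validator drifting.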