Unverified: What Practitioners Post About OCR, Agents, and Tables
chelm
27 points
27 comments
April 05, 2026
Related Discussions
Found 5 related stories in 54.6ms across 3,663 title embeddings via pgvector HNSW
- Show HN: Online OCR Free – Batch OCR UI for Tesseract, Gemini and OpenRouter naimurhasanrwd · 13 pts · March 03, 2026 · 44% similar
- Optimizing Content for Agents vinhnx · 25 pts · March 14, 2026 · 44% similar
- Don't Trust, Verify lwhsiao · 17 pts · March 28, 2026 · 44% similar
- OpenRCA benchmark – Improving Claude's root cause analysis accuracy by 12 pp behat · 11 pts · March 11, 2026 · 43% similar
- Codex Security: now in research preview ancarda · 31 pts · March 06, 2026 · 43% similar
Discussion Highlights (7 comments)
bonsai_spool
Please write in your own words! I'm not inclined to read something if it consists of what you copied and pasted from Claude.
jgalt212
> The Demo Works. Production Does Not. Truer words have never been spoken. LLMs make mind-blowing demos, but real-world performance is much weaker (though still useful). An example from yesterday: I asked Google / Nano Banana to repaint my house with a few options. It gave a nice write-up on three themes and, for each theme, a nice rendering of a 1/3 vertical slice in one image. Then I asked it to redraw the image entirely in one of the themes. It redrew the image 1/3 in the theme I asked for and 2/3 in a theme I did not ask for. Further prompting did not fix it. At the end of the day this was a useful exercise, and I got some sense of which color scheme would work better for my house, but the level of execution was miles away from the perfection portrayed in demos and by hypester/huckster bloggers and VCs.
quinndupont
Very helpful analysis that confirms everything I’ve encountered. OCR remains a thorny issue. The author talks about professional workflows struggling with tables and such, but I’ve found it challenging to get clean copies of long documents (books). The hybrid workflow (layout then OCR) sounds promising.
ChrisKnott
Is there a SOTA OCR model that prioritises failing in a debuggable way? What I want is an output that records which sections of the image contributed to each word/letter, preferably with per-word confidence levels and user-correctable identification information. I should be able to build a UI to say: no, this section is red-on-green vertically aligned Cyrillic characters; try again.
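Classic Tesseract gets partway there: its TSV output reports a bounding box and a confidence value per recognized word, which pytesseract exposes via `image_to_data(..., output_type=Output.DICT)` as parallel lists. A minimal sketch of flagging regions for a correction UI; the `flag_low_confidence` helper, the threshold, and the sample dict below are illustrative, not part of any named tool:

```python
# Sketch: surfacing per-word boxes and confidences from Tesseract-style
# TSV output so a review UI can highlight exactly which image regions
# need human correction. With pytesseract installed this dict would come
# from: pytesseract.image_to_data(img, output_type=Output.DICT)

def flag_low_confidence(data, threshold=60):
    """Return (word, confidence, bounding_box) for words whose reported
    confidence is below `threshold`. Tesseract uses conf == -1 for
    non-word rows (blocks, lines), so those are skipped."""
    flagged = []
    for i, word in enumerate(data["text"]):
        conf = float(data["conf"][i])
        if word.strip() and 0 <= conf < threshold:
            box = (data["left"][i], data["top"][i],
                   data["width"][i], data["height"][i])
            flagged.append((word, conf, box))
    return flagged

# Hand-made example shaped like pytesseract's Output.DICT:
sample = {
    "text":  ["Hello", "w0rld", ""],
    "conf":  ["96", "41", "-1"],
    "left":  [10, 80, 0], "top":    [5, 5, 0],
    "width": [60, 55, 0], "height": [20, 20, 0],
}
print(flag_low_confidence(sample))  # [('w0rld', 41.0, (80, 5, 55, 20))]
```

This doesn't give the "which pixels produced which letter" attribution the comment asks for, but box plus confidence is enough to drive a click-to-correct overlay.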
bobajeff
It's very surprising to me that the state-of-the-art tools for data entry and digitizing still require a lot of supervision. From the article it's not that surprising that handwritten documents are harder for old-school OCR or AI, since they can be hard even for humans in some cases. But tables and varied layouts seem like low-hanging fruit for vision models.
adam-badar
working with continuous OCR capture across 3 monitors using screenpipe. at 1.2fps you get usable text extraction, but it uses 600MB-2GB of RAM. biggest issue is that OCR can't distinguish directionality: whether someone messages you "let's cancel the meeting" or you type it, the text is identical but the intent isn't
ikidd
Funny enough, I was processing some handwritten tables into Excel with Sonnet. It did way better than I thought it would, I'd say around 95%. I did have it put confidence indexes next to the output per line, and that was pretty useless: they were either really high or really low, and the confidence didn't match the mistakes at all.