Enabling Codex to Analyze Two Decades of Hacker News Data
ronfriedhaber
77 points
30 comments
April 02, 2026
Related Discussions
- Profiling Hacker News users based on their comments simonw · 60 pts · March 22, 2026 · 64% similar
- Show HN: Hackerbrief – Top posts on Hacker News summarized daily p0u4a · 66 pts · March 16, 2026 · 63% similar
- Researchers Deanonymize Reddit and Hacker News Users at Scale hk_flying_gear · 15 pts · March 01, 2026 · 62% similar
- Crow Watch: A Hacker News Alternative medv · 12 pts · March 09, 2026 · 58% similar
- Show HN: Rudel – Claude Code Session Analytics keks0r · 128 pts · March 12, 2026 · 54% similar
Discussion Highlights (12 comments)
mike_hearn
I don't quite understand how Modolap differs from just asking AI to use any other OLAP engine. Both your website and the GitHub README just emphasise that it's idiosyncratic and your personal approach, without explaining what that is or why anyone should care.
throwaway290
HN data is open? Under what conditions is it distributed?
zeroxfe
I've done this kind of thing many times with Codex and SQLite, and it works very well. It's one prompt that looks something like this: "Inspect and understand the downloaded data in directory /path/..., then come up with an SQLite data model for doing detailed analytics, ingest everything into an SQLite db in data.sqlite, and document the model in model.md." Then you can query the database ad hoc pretty easily with Codex prompts (and also generate PDF graphs as needed). I typically use the highest reasoning level for the initial prompt, and as I get deeper into the data, I continuously improve the model, indexes, etc., and just have Codex handle any data migration.
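For readers who want a picture of what such a generated data model might look like, here is a minimal sketch using Python's standard sqlite3 module. The table layout, the column names, the index, and the sample records are all assumptions made for illustration; they are not the schema Codex produced for the commenter, and a real HN dump has more fields.

```python
import sqlite3

# Hypothetical sample records, loosely shaped like HN API items.
SAMPLE_ITEMS = [
    {"id": 1, "type": "story", "by": "alice", "time": 1700000000,
     "text": None, "title": "Show HN: Example"},
    {"id": 2, "type": "comment", "by": "bob", "time": 1700000100,
     "text": "Nice work", "title": None},
    {"id": 3, "type": "comment", "by": "alice", "time": 1700000200,
     "text": "Thanks!", "title": None},
]


def build_db(items, path=":memory:"):
    """Create an assumed single-table model and ingest the given items."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS items (
               id         INTEGER PRIMARY KEY,
               type       TEXT NOT NULL,
               author     TEXT,
               created_at INTEGER,  -- Unix timestamp
               text       TEXT,
               title      TEXT
           )"""
    )
    # An index like this is the kind of refinement one might add later.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_items_type_time "
        "ON items (type, created_at)"
    )
    # Named placeholders are filled from each dict's keys.
    conn.executemany(
        "INSERT OR REPLACE INTO items "
        "VALUES (:id, :type, :by, :time, :text, :title)",
        items,
    )
    conn.commit()
    return conn


def comments_per_author(conn):
    """Example ad hoc query: number of comments per author."""
    return conn.execute(
        "SELECT author, COUNT(*) FROM items "
        "WHERE type = 'comment' GROUP BY author ORDER BY author"
    ).fetchall()
```

Passing a file path such as data.sqlite instead of the in-memory default would persist the database between sessions, which is what makes the follow-up ad hoc queries cheap.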
voidUpdate
When searching for references to Go, what does it actually look for? "Go" is a relatively common word, and I hardly see anyone referring to it as Golang
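The thread does not say how the analysis actually matched language names, so the following is only a sketch of one plausible approach: a case-sensitive, whole-word regular expression. The pattern and the function name are assumptions for illustration. It also shows why the question matters, since a capitalized "Go" at the start of a sentence still counts as a match.

```python
import re

# Assumed heuristic: match "Go" only when capitalized and standing alone
# as a word, plus the unambiguous "Golang"/"golang" spellings.
GO_PATTERN = re.compile(r"\b(?:Go|Golang|golang)\b")


def mentions_go(text):
    """Return True if the text matches the assumed Go-language pattern."""
    return bool(GO_PATTERN.search(text))


if __name__ == "__main__":
    samples = [
        "We rewrote the service in Go",  # intended match
        "I will go home now",            # lowercase verb, not matched
        "Going forward, we use Rust",    # no word boundary after "Go"
        "Go ahead and try it",           # false positive: sentence-initial verb
    ]
    for sample in samples:
        print(mentions_go(sample), "|", sample)
```

A stricter analysis would need surrounding context (for example, nearby words such as "goroutine" or "compiler") to separate the language from the verb, which a single regular expression cannot do reliably.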
xnorswap
5% of all comments mention Claude Code? Am I reading that right?
moralestapia
Do not estimate/plot DAUs/MAUs, it's not a pretty picture :'(.
Brajeshwar
The “Hacker News - Complete Archive” on Hugging Face,[1] recently popped up here. “The data is stored as monthly Parquet files sorted by item ID, making it straightforward to query with DuckDB, load with the datasets library, or process with any tool that reads Parquet.” Out of curiosity, I tinkered with it using Claude to see trends and patterns (I did find a few embarrassing things about me!). 1. https://huggingface.co/datasets/open-index/hacker-news
hakrgrl
That last chart showing the average comment length shows a clear downtrend, especially in recent months. I wonder why that is.
sd9
I'm kind of surprised that Postgres was dominated by MongoDB to quite that degree back in the day. I remember the Mongo fever, but I always thought Postgres held reasonable market share. I guess it was other SQL databases back then; MySQL was still viable.
hsuduebc2
I really love Codex. The price/value comparison to Claude Code is, at least in my opinion, much better.
sam-bee
Nobody who actually codes in that language ever calls it 'Golang'
RockyMcNuts
Could be interesting to chart quality of responses, toxicity/health of conversations, sentiment over time, and the impact of the release of ChatGPT, since AI can now answer many questions that might have been topics of conversation, people can use AI to participate, and people may be reluctant to participate if AI can data mine everything and link it back to them (similar to Stack Overflow).