Microgpt explained interactively
growingswe
237 points
36 comments
March 01, 2026
Related Discussions
Found 5 related stories in 30.9ms across 3,471 title embeddings via pgvector HNSW
- GPT‑5.4 Mini and Nano meetpateltech · 217 pts · March 17, 2026 · 52% similar
- NanoGPT Slowrun: 10x Data Efficiency with Infinite Compute sdpmas · 122 pts · March 19, 2026 · 52% similar
- GPT 5.4 Thinking and Pro twtw99 · 64 pts · March 05, 2026 · 51% similar
- GPT-5.4 meetpateltech · 156 pts · March 05, 2026 · 50% similar
- GPT-5.4 mudkipdev · 739 pts · March 05, 2026 · 50% similar
Discussion Highlights (11 comments)
politelemon
> By the end of training, the model produces names like "kamon", "karai", "anna", and "anton". None of them are copies from the dataset. Hey, I am able to see kamon, karai, anna, and anton in the dataset, it'd be worth using some other names: https://raw.githubusercontent.com/karpathy/makemore/988aa59/...
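The check politelemon did by hand is easy to automate; a minimal sketch, assuming a local copy of makemore's `names.txt` (one name per line):

```python
def find_copies(samples, dataset_path="names.txt"):
    """Return the sampled names that appear verbatim in the dataset.

    Names generated by the model count as memorized copies if they
    match a training example exactly (case-insensitive).
    """
    with open(dataset_path) as f:
        dataset = {line.strip().lower() for line in f}
    return [s for s in samples if s.lower() in dataset]

# e.g. find_copies(["kamon", "karai", "anna", "anton"]) lists any
# that are present in the training data rather than novel.
```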
windowshopping
The part that eludes me is how you get from this to the capability to debug arbitrary coding problems. How does statistical inference become reasoning? For a long time, it seemed the answer was it doesn't. But now, using Claude code daily, it seems it does.
malnourish
I read through this entire article. There was some value in it, but I found it to be very "draw the rest of the owl". It read like introductions to conceptual elements or even proper segues had been edited out. That said, I appreciated the interactive components.
grey-area
The original article from Karpathy: https://karpathy.github.io/2026/02/12/microgpt/
jmkd
It says it's tailored for beginners, but I don't know what kind of beginner can parse multiple paragraphs like this: "How wrong was the prediction? We need a single number that captures "the model thought the correct answer was unlikely." If the model assigns probability 0.9 to the correct next token, the loss is low (0.1). If it assigns probability 0.01, the loss is high (4.6). The formula is −log(p), where p is the probability the model assigned to the correct token. This is called cross-entropy loss."
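The loss values quoted in that paragraph follow directly from the −log(p) formula; a minimal sketch:

```python
import math

def cross_entropy_loss(p_correct):
    """Cross-entropy loss for a single prediction: -log(p), where p is
    the probability the model assigned to the correct next token.
    Low when the model is confident and right, high otherwise."""
    return -math.log(p_correct)

print(round(cross_entropy_loss(0.9), 1))   # confident and correct -> 0.1
print(round(cross_entropy_loss(0.01), 1))  # correct token deemed unlikely -> 4.6
```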
ChrisArchitect
Related: Microgpt https://news.ycombinator.com/item?id=47202708
love2read
Is it becoming a thing to misspell and add grammatical mistakes on purpose to show that an LLM didn't write the blog post? I noticed several spelling mistakes in Karpathy's blog post that this article is based on and in this article.
danhergir
I went through the article, and it makes sense to me that we're getting names as an output, but why do this with names in particular?
kinnth
That was one of the most helpful walkthroughs I've read. Thanks for explaining everything so well, step by step. I wasn't a coder, but with AI I am actually writing code. The more I familiarise myself with everything, the easier it becomes to learn. I find AI fascinating. Making it so simple and clear helps when I think about what I need to feed it.
thebiblelover7
I know many comments mentioned that it was too introductory, or too deep. But as someone who does not have much experience with how these models work, I found this overview to be pretty great. There were some concepts I didn't quite understand, but I think this is a good starting point for learning more about the topic.
dreamking
It seems that T-Mobile is blocking this website, so I can't open this blog page... https://www.t-mobile.com/home-internet/http-warning?url=http...