Show HN: I built a tiny LLM to demystify how language models work

armanified · 249 points · 20 comments · April 06, 2026
github.com · View on Hacker News

Built a ~9M param LLM from scratch to understand how they actually work. Vanilla transformer, 60K synthetic conversations, ~130 lines of PyTorch. Trains in 5 min on a free Colab T4. The fish thinks the meaning of life is food. Fork it and swap the personality for your own character.
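For readers curious what a model at this scale looks like, here is a minimal sketch of a decoder-only transformer in PyTorch. The class name `TinyLM` and every hyperparameter below are illustrative assumptions, not the repo's actual configuration:

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """A tiny decoder-only transformer language model (illustrative config)."""

    def __init__(self, vocab_size=4096, d_model=256, n_heads=8,
                 n_layers=6, max_len=256):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)   # token embeddings
        self.pos = nn.Embedding(max_len, d_model)      # learned positions
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, 4 * d_model,
            batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, ids):
        T = ids.size(1)
        x = self.tok(ids) + self.pos(torch.arange(T, device=ids.device))
        # Causal mask so each position only attends to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(T)
        x = self.blocks(x, mask=mask)
        return self.head(x)  # next-token logits, shape (B, T, vocab)

model = TinyLM()
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")
```

With this made-up config the count lands in the single-digit millions; nudging `d_model` or `n_layers` is how you'd hit a specific target like ~9M.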

Discussion Highlights (13 comments)

AndrewKemendo

I love these kinds of educational implementations. I want to really praise the (unintentional?) nod to Nagel: by limiting capabilities to the representation of a fish, the user is immediately able to understand the constraints. It can only talk like a fish because it's very simple. Especially compared to public models, that's a really simple correspondence to grok intuitively (small LLM → only as verbose as a fish, larger LLM → more verbose), so kudos to the author for making that simple and fun.

nullbyte808

Adorable! Maybe a personality that speaks in emojis?

SilentM68

Would have been funny if it were called "DORY", given the fish's memory-recall issues vs. LLMs' similar recall issues :)

ordinarily

It's genuinely a great introduction to LLMs. I built my own a while ago based on Milton's Paradise Lost: https://www.wvrk.org/works/milton

cbdevidal

> you're my favorite big shape. my mouth are happy when you're here.

Laughed loudly :-D

gnarlouse

I... wow, you made an LLM that can actually tell jokes?

martmulx

How much training data did you end up needing for the fish personality to feel coherent? Curious what the minimum viable dataset looks like for something like this.

NyxVox

Hm, I can actually try the training on my GPU. One of the things I want to try next. Maybe a bit more complex than a fish :)

dinkumthinkum

I think this is a nice project because it is end to end and serves its goal well. Good job! It's a good example of how someone might do something similar for a specific purpose. There are other visualizers that explain different aspects of LLMs, but this is a good applied example.

ankitsanghi

Love it! I think it's important to understand how the tools we use (and will only increasingly use) work under the hood.

kaipereira

This is so cool! I'd love to see a write-up on how you made it, and what you referenced, because designing neural networks always feels like a maze ;)

kubrador

how's it handle longer context, or does it start hallucinating after like 2 sentences? curious what the ceiling is for 9M params

zwaps

I like the idea; it's just that the examples are reproduced from the training data set. How does it handle unknown queries?
