CS336: Language Modeling from Scratch

kristianpaul 417 points 44 comments June 01, 2026
cs336.stanford.edu

Discussion Highlights (17 comments)

storus

Thanks for releasing this again! What has changed this year compared to prior offerings?

meken

I have fond memories of cs224d [1] taught by richardsocher. It’s a bit dated at this point as it was created in the pre-transformer era, but it was a very cool introduction to applying deep learning to NLP at the time. [1] https://cs224d.stanford.edu

tmule

Are video lectures available online?

skerit

> GPU compute for self-study
Those suggestions they make for a B200 start at $4.99 an hour. Is that really required for starting out? I've been tinkering with my own from-scratch LLM, but in the early phases I don't need anything more than a 4090 on Vast.ai.

airstrike

I wonder if people prefer to learn this on their own, or if building a community around open learning is something others are interested in.

sonabinu

I brought a group together to do this class using the YouTube videos and course materials available online. It is challenging but rewarding. We tackled it one lecture video per week. We started with over 30 learners, and by the last session we were down to 8.

dominotw

I recently started reading "build reasoning model from scratch", then I realized that I am not really interested in the building part and just want to understand the theory and practice behind it. I want something like a casual, LessWrong-style, from-the-ground-up explanation.

ChrisArchitect

Related: AI Agent Guidelines for CS336 at Stanford https://github.com/stanford-cs336/assignment1-basics/blob/ma... ( https://news.ycombinator.com/item?id=48359232 )

chainsaw10

I’m intrigued by this course. However, I’m also curious about its prerequisite:
> Machine Learning (e.g. CS221, CS229, CS230, CS124, CS224N) You should be comfortable with the basics of machine learning and deep learning.
Anyone have a good implementation-heavy self-study resource for those topics, or experience with the recorded lectures for those Stanford courses?

armas

I independently worked on the first two assignments over the course of a year. I learned so much! I was wondering what other courses people took on afterwards :)

fg137

I recently completed the 2025 version of this course (videos + most assignments, skipping some of the most costly parts of the tasks). That's quite something. There is a lot going on in the first two assignments, which required a ton of thinking and debugging. Despite having a decent foundation in deep learning, it took me several months to finish using bits of my after-work hours and weekends. (I am not a model part-time student by any means, and sometimes I didn't get to work on this for days, but it could have been much worse.) It's hard to imagine how enrolled Stanford students manage to submit assignments on a two-week cadence.
Coming back to the course: kudos to the course staff, including professors and TAs. They obviously put a ton of thought into designing the course, putting together slides that contain the latest updates in the field, and preparing the wonderful assignments. You get to create a real LM and explore other important parts of the LLM pipeline from small building blocks, validate each step, and see for yourself how everything comes together. You can really feel a sense of achievement after completing the assignments.
That said, while the staff obviously put a lot of effort into making this accessible to everyone, I wish they made a bit more effort to clarify the environment requirements. Their harness works best in a Linux environment with an NVIDIA GPU, which may be taken for granted by researchers but is rare in home computer setups. Their setup also expects specific CUDA versions and/or architectures. For following along at home, the next best setup is Windows with WSL2 + an NVIDIA GPU, plus leased GPUs on various platforms, none of which is exactly trivial (or cheap, for that matter). It would be nice if the staff could put together a bit more guidance in that area, especially on how someone without any compatible GPU can make the most of the course.
(One thing I learned is that if you use macOS and are not careful about memory analysis, your Python code could freeze and force a reboot of your machine.)
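
For readers unsure which of the setups described above they are on, a quick preflight check can be sketched with the Python standard library alone. This is an illustrative sketch, not part of the course harness: the function name `describe_environment` is hypothetical, WSL detection relies on the heuristic that WSL kernel release strings contain "microsoft", and finding `nvidia-smi` on the PATH only suggests that an NVIDIA driver is installed. It says nothing about CUDA versions or GPU architecture.

```python
import platform
import shutil


def describe_environment():
    """Summarize the local setup (hypothetical helper, not from the course)."""
    system = platform.system()            # e.g. "Linux", "Darwin", "Windows"
    release = platform.release().lower()  # kernel / OS release string
    return {
        "os": system,
        # Heuristic (assumption): WSL kernels report "microsoft" in the release.
        "wsl": system == "Linux" and "microsoft" in release,
        # Presence of the nvidia-smi CLI hints at an installed NVIDIA driver.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```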

Oarch

Oh, this is brilliant. I've spent the last month doing something just like this. As a challenge, no libraries allowed besides Python standard libs (so no numpy). Started with Word2Vec, built an RNN, then an LSTM, and am halfway through building a transformer architecture.
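
To give a flavor of what a standard-library-only building block might look like, here is a minimal sketch of scaled dot-product attention with an optional causal mask, using plain lists of floats. It is not the commenter's code. The helper names (`matmul`, `transpose`, `softmax`, `attention`) are illustrative, and it covers only the forward pass for a single head, with no learned projections and no gradients.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, causal=True):
    """Return (output, weights) for softmax(q k^T / sqrt(d_k)) v."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = []
    for i, row in enumerate(scores):
        scaled = [s / math.sqrt(d_k) for s in row]
        if causal:
            # Position i may only attend to positions j <= i.
            scaled = [s if j <= i else float("-inf") for j, s in enumerate(scaled)]
        weights.append(softmax(scaled))
    return matmul(weights, v), weights


if __name__ == "__main__":
    q = k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    out, w = attention(q, k, v)
    print(w)
    print(out)
```

With the causal mask on, the first position can only attend to itself, so its output row is simply the first row of `v`.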

AJRF

Can anyone answer a question: what's the minimum viable GPU to follow along with this course at home? I have a 5080 16GB; does this course really need more than that?
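
The thread does not say what the assignments actually require, but a rough lower bound on training memory is easy to sketch. The estimate below assumes full fp32 training with Adam: 4 bytes per parameter for weights, 4 for gradients, and 8 for the two Adam moment buffers, so 16 bytes per parameter. Activations, which depend on batch size and sequence length, are not modeled. The function names and the 50% activation headroom are illustrative assumptions, not course guidance.

```python
def training_memory_gib(n_params, bytes_per_param=16):
    """Lower bound in GiB: fp32 weights (4) + grads (4) + Adam moments (8).

    Excludes activations, so real usage will be higher.
    """
    return n_params * bytes_per_param / 2**30


def fits(n_params, vram_gib, activation_headroom=0.5):
    """Guess whether training fits, reserving a fraction of VRAM for activations.

    The 0.5 default is an arbitrary assumption; tune it for your workload.
    """
    return training_memory_gib(n_params) <= vram_gib * (1 - activation_headroom)


if __name__ == "__main__":
    for n in (124_000_000, 1_000_000_000):
        need = training_memory_gib(n)
        print(f"{n:>13,} params: at least {need:.1f} GiB, fits in 16 GiB? {fits(n, 16)}")
```

Under these assumptions, a model around 124M parameters leaves plenty of room on a 16 GB card, while one around 1B parameters does not without mixed precision, a lighter optimizer, or other memory-saving tricks.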

artemonster

I wish there was an option to render "executable" lectures as PDF too. I'd love to scroll through these while commuting to and from work.

delis-thumbs-7e

I’d love to do this, but I’m afraid I still lack some of the required skills. But perhaps one day!

tevlon

A couple of days ago, I used Claude to implement an improved version of GPT-1. I am not an ML engineer by any means; I am just a normal backend engineer. I ended up creating a hybrid between GPT-1 and modded-nanogpt (from KellerJordan). I was able to reproduce the results of the original GPT-1 paper on my gaming PC. I don't even have a lot of VRAM: my NVIDIA GeForce RTX 2060 SUPER was able to reproduce most of the results with just 1 hour of training. I would totally recommend doing the same if you are interested in pre-training LLMs. The code is here: https://github.com/epoyraz/modded-gpt-1 But you can also just ask Claude 4.8 or Codex 5.5

wandering-nomad

Can the assignments be done on an MBP M5 Max? I am hoping to get my hands on one in the next couple of weeks and really want to pursue this course.
