Autoresearch: Agents researching on single-GPU nanochat training automatically
simonpure
82 points
22 comments
March 07, 2026
Related Discussions
Found 5 related stories in 48.7ms across 3,471 title embeddings via pgvector HNSW
- Scaling Karpathy's Autoresearch: What Happens When the Agent Gets a GPU Cluster hopechong · 145 pts · March 19, 2026 · 73% similar
- AutoKernel: Autoresearch for GPU Kernels frozenseven · 44 pts · March 11, 2026 · 66% similar
- Show HN: Autoresearch@home austinbaggio · 55 pts · March 11, 2026 · 61% similar
- Autoresearch on an old research idea ykumards · 325 pts · March 23, 2026 · 60% similar
- Autoresearch for SAT Solvers chaisan · 84 pts · March 19, 2026 · 57% similar
Discussion Highlights (10 comments)
falcor84
The only thing missing is for the agents to publish and peer-review their research.
AlexCoventry
Wow, Gemini suggested a very similar experiment to me yesterday. Guess I know where it got the idea from, now. :-)
lostmsu
Non-zero based chart makes it look like it was very successful.
kubb
He's burning Claude tokens to slightly improve his tiny and not very capable LLM? It's fun, I bet, but wake me up when it leads to a research breakthrough.
abeppu
But the experiments it did that "improved" validation BPB in the GH screenshot were all basically hyperparameter changes, right? So is this better or worse, either per experiment or per unit time, than hyperparameter tuning techniques that don't involve an LLM? It's not clear from this whether the LLM is more or less making random changes which sometimes work, or whether the LLM's thinking actually finds "good" changes because of what it has internalized. E.g. how does this compare to a hyperparameter tuning pass with e.g. BayesOpt that runs the same number of 5-min training experiments?
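The baseline this comment asks for could be as simple as random search over the same experiment budget. Below is a minimal, hypothetical sketch: `toy_val_bpb` is a synthetic stand-in for one 5-minute training run (the real comparison would launch nanochat training instead), and all parameter names and ranges are assumptions for illustration only.

```python
import random

def toy_val_bpb(lr, warmup_frac):
    # Synthetic bowl-shaped objective standing in for validation BPB;
    # its optimum (lr=0.02, warmup_frac=0.1, bpb=0.9) is an assumption.
    return (lr - 0.02) ** 2 * 1e4 + (warmup_frac - 0.1) ** 2 * 10 + 0.9

def random_search(n_trials, seed=0):
    # Each trial plays the role of one fixed-length training experiment,
    # so n_trials matches the LLM agent's experiment count.
    rng = random.Random(seed)
    best_bpb, best_params = float("inf"), None
    for _ in range(n_trials):
        params = {
            "lr": rng.uniform(1e-3, 1e-1),
            "warmup_frac": rng.uniform(0.0, 0.5),
        }
        bpb = toy_val_bpb(**params)
        if bpb < best_bpb:
            best_bpb, best_params = bpb, params
    return best_bpb, best_params

best_bpb, best_params = random_search(n_trials=50)
print(best_bpb, best_params)
```

A Bayesian optimizer would replace the uniform sampling with a surrogate-model-guided proposal, but even this naive loop gives the per-experiment baseline the comment is asking the LLM approach to beat.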
mikert89
As AI improves, most tasks will become something like this: environments set up where the model learns through trial and error. Any human endeavor that can be objectively verified in an environment like this can be completely automated.
oezi
Is there an Autoresearch for Jupyter somewhere? I'd point it at a Jupyter cell to improve, based on another cell that calculates the target metric.
freakynit
Wouldn't it make this exercise even more interesting if, for every 25%+ improvement in val_bpb, the existing limits (5 minutes and VRAM usage) were also increased by certain percentages? This would simulate human-like dev iterations much more closely. Infra can be auto-scaled using a platform like Modal.
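The proposed rule is easy to pin down concretely. Here is a hypothetical sketch of that budget-scaling policy: the trigger threshold (25%), the growth percentage (20%), and all names are illustrative assumptions, not part of the original post.

```python
def step_budget(best_bpb, new_bpb, budget, grow_pct=0.20, trigger=0.25):
    """Return (new_best_bpb, new_budget). Lower BPB is better.

    If the new result beats the best seen so far by at least `trigger`
    (relative improvement), grow every budget limit by `grow_pct`.
    """
    if best_bpb is not None and new_bpb <= best_bpb * (1 - trigger):
        budget = {k: v * (1 + grow_pct) for k, v in budget.items()}
    best = new_bpb if best_bpb is None or new_bpb < best_bpb else best_bpb
    return best, budget

budget = {"minutes": 5.0, "vram_gb": 24.0}
best = None
for bpb in [1.2, 1.1, 0.8, 0.79]:  # 1.1 -> 0.8 is a >25% improvement
    best, budget = step_budget(best, bpb, budget)
print(best, budget)
```

Only the 1.1 to 0.8 step crosses the 25% threshold, so the budget grows exactly once (5 minutes becomes 6, 24 GB becomes ~28.8); the small 0.8 to 0.79 improvement updates the best score without unlocking more compute.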
elikoga
> this means that autoresearch will find the most optimal model for your platform in that time budget

I'm looking forward to finding out what model is optimal on my RTX 3090. One thing I'm concerned about is that the models with the best bpb after 5 minutes in smaller setups are only about ~10M parameters in size, which is too small for some emergent effects.
naomi_kynes
The "chief scientist + junior engineers in tmux sessions" framing is interesting as a communication architecture problem. Once you have more than a handful of concurrent experiments, the question becomes: how does the chief scientist reliably get status from the juniors without polling tmux output constantly? And when a junior finds something surprising — a result that changes the research direction — how does that signal propagate back quickly enough to stop wasted compute on now-irrelevant branches? The tmux channel works well for low concurrency. At higher concurrency it starts to look like the same problem as multi-agent coordination in production systems: you need something closer to pub/sub than session polling. Curious how you're thinking about the feedback loop design as you scale the number of concurrent junior agents.
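The polling-to-pub/sub shift this comment describes can be sketched with nothing more than a shared queue: junior agents push status events as they finish, and the chief scientist consumes them without inspecting any session. This is a minimal, hypothetical illustration of the pattern (the agent names, event fields, and fake results are assumptions), not the architecture of the actual project.

```python
import queue
import threading

# Shared event channel: juniors publish, the chief scientist subscribes.
events = queue.Queue()

def junior(agent_id, result):
    # In the real system this would run a 5-minute training experiment;
    # here we just report a fabricated val_bpb result.
    events.put({"agent": agent_id, "val_bpb": result})

threads = [
    threading.Thread(target=junior, args=(i, 1.0 - 0.05 * i))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The chief scientist drains the queue once instead of polling each
# tmux session; a surprising event here is where you would hook in
# logic to cancel now-irrelevant experiment branches.
results = []
while not events.empty():
    results.append(events.get_nowait())
best = min(results, key=lambda r: r["val_bpb"])
print(best)
```

In a production multi-agent setup the in-process queue would become a real broker (Redis pub/sub, NATS, etc.), but the interface stays the same: push on result, react on receive, never poll.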