Can LLMs Beat Classical Hyperparameter Optimization Algorithms?
galsapir
109 points
15 comments
June 09, 2026
Related Discussions
- Can LLMs model real-world systems in TLA+? mad · 64 pts · May 08, 2026 · 54% similar
- Are LLM merge rates not getting better? 4diii · 131 pts · March 12, 2026 · 53% similar
- Let's talk about LLMs cdrnsf · 153 pts · May 04, 2026 · 53% similar
- Show HN: Find the best local LLM for your hardware, ranked by benchmarks andyyyy64 · 279 pts · May 15, 2026 · 53% similar
- LLMs are not the black box you were promised _jayhack_ · 53 pts · June 02, 2026 · 52% similar
Discussion Highlights (5 comments)
harrigan
Somewhat related, the experiment ongoing at https://www.ecdsa.fail/ is fascinating: it's a competitive, leaderboard-style research challenge trying to optimise a quantum circuit for breaking ECDSA (specifically the elliptic-curve point addition in Shor's algorithm). It quickly surpassed a result announced by Google researchers last month. Now it's showing a 40% gain over Google's result.
woadwarrior01
Their centaur idea[1] is interesting and quite straightforward. It should be fairly easy to implement using a coding agent for the LLM and the ask-and-tell interface in pycma[2]. [1]: https://github.com/ferreirafabio/autoresearch-automl/blob/ma... [2]: https://github.com/CMA-ES/pycma
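A minimal sketch of the ask-and-tell loop described above, under loud assumptions: it does not use pycma and does not reproduce the paper's centaur method. `SimpleES` is a hypothetical stand-in optimizer (a plain Gaussian evolution strategy, not CMA-ES) that only mimics the shape of an ask/tell interface, and `llm_propose` is a hypothetical stub where a coding-agent call would go. With pycma itself, the corresponding calls would be `cma.CMAEvolutionStrategy(x0, sigma0)`, `es.ask()` and `es.tell(solutions, function_values)`.

```python
import random


class SimpleES:
    """Hypothetical stand-in optimizer with an ask/tell interface (NOT CMA-ES)."""

    def __init__(self, x0, sigma, popsize=8, seed=0):
        self.mean = list(x0)
        self.sigma = sigma
        self.popsize = popsize
        self.rng = random.Random(seed)

    def ask(self):
        # Sample a population of candidates around the current mean.
        return [[m + self.sigma * self.rng.gauss(0, 1) for m in self.mean]
                for _ in range(self.popsize)]

    def tell(self, solutions, losses):
        # Move the mean to the average of the best quarter, then shrink the step.
        ranked = sorted(zip(losses, solutions), key=lambda p: p[0])
        elite = [s for _, s in ranked[:max(1, len(ranked) // 4)]]
        self.mean = [sum(col) / len(elite) for col in zip(*elite)]
        self.sigma *= 0.9


def llm_propose(history):
    """Hypothetical stub for an LLM or coding-agent suggestion.

    A real version would prompt a model with the evaluation history; this
    placeholder just shrinks the best point seen so far toward the origin.
    """
    if not history:
        return None
    _, best_x = min(history, key=lambda p: p[0])
    return [0.5 * v for v in best_x]


def objective(x):
    # Toy loss (sphere function), standing in for a validation loss.
    return sum(v * v for v in x)


def run_centaur(generations=30):
    es = SimpleES(x0=[3.0, -2.0], sigma=1.0)
    history = []  # (loss, candidate) pairs shared by optimizer and "LLM"
    for _ in range(generations):
        candidates = es.ask()
        suggestion = llm_propose(history)
        if suggestion is not None:
            candidates[-1] = suggestion  # swap one sample for the suggestion
        losses = [objective(c) for c in candidates]
        es.tell(candidates, losses)
        history.extend(zip(losses, candidates))
    return min(history, key=lambda p: p[0])


best_loss, best_x = run_centaur()
```

The point of the sketch is only the control flow: the classical optimizer keeps ownership of the search state, and the model's suggestion is evaluated alongside the sampled candidates. How the linked repository actually mixes the two is not shown here.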
cpard
I'm personally interested in this problem, and it's quite an active research area right now. My feeling is that the research is converging on what the paper claims: combining the two is the right way to do it, and what makes the difference is how you combine them in the harness you build. At the AID-Wild / ACM CAIS 2026 workshop that happened recently, there were plenty of examples of this in the accepted papers. A great example is AI-PROPELLER: Warehouse-Scale Interprocedural Code Layout Optimization with AlphaEvolve. It uses AlphaEvolve and Vizier to evolve compiler code-layout heuristics. ( https://arxiv.org/abs/2606.00131 )
deerstalker
I have been doing some research on this topic, and found that for some budget regimes (really expensive objective function evaluations) and some applications (HPC code parameter autotuning), the frontier LLMs can even outperform classical optimizers. Even open-weight models can perform well on certain applications, but on some they fail abysmally. (Of course, this is limited to a bunch of niche applications.)
janalsncm
Honestly, the results kind of show the LLM is adding very marginal value. TPE crushes Karpathy’s autoresearch and it is neck and neck with the method in this paper, despite not needing to run any LLM inference at all. I remember a few months ago people were fairly skeptical about autoresearch, but we didn’t have a ton of data to say it was better or worse. My own bias is to prefer cheaper methods unless the more expensive method is shown to be better.