Types and Neural Networks
bgavran
77 points
24 comments
April 21, 2026
Related Discussions
Found 5 related stories in 71.8ms across 5,223 title embeddings via pgvector HNSW
- Category Theory Illustrated – Types boris_m · 91 pts · April 03, 2026 · 53% similar
- Language Model Contains Personality Subnetworks PaulHoule · 48 pts · March 02, 2026 · 47% similar
- A Visual Introduction to Machine Learning (2015) vismit2000 · 343 pts · March 15, 2026 · 46% similar
- The Universal Constraint Engine: Neuromorphic Computing Without Neural Networks skinney_uce · 15 pts · April 15, 2026 · 42% similar
- Categorization Is 'Baked' into the Brain XzetaU8 · 12 pts · April 13, 2026 · 41% similar
Discussion Highlights (9 comments)
big-chungus4
So the model generates code, and let's say it is wrongly typed; we then take the rightly typed version and use cross-entropy between them? Is that right? That just sounds like typical training, unless you can somehow take arbitrary code that the model generated and automatically find the rightly typed version, so you won't need a dataset for it.
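[A sketch of the alternative the commenter is asking about: rather than requiring a paired "rightly typed" version for a cross-entropy loss, the type checker itself can act as the oracle, keeping only samples that pass (rejection sampling / self-training). `model_sample` and `type_checks` are hypothetical stand-ins, not anything from the article.]

```python
# Rejection-sampling sketch: the compiler supplies the supervision
# signal, so no corrected-code dataset is needed.
def build_finetune_set(prompts, model_sample, type_checks, k=4):
    keep = []
    for p in prompts:
        for _ in range(k):          # up to k attempts per prompt
            code = model_sample(p)
            if type_checks(code):   # type checker as the oracle
                keep.append((p, code))  # train only on accepted samples
                break
    return keep
```

The cost of this scheme is that rejected samples are simply discarded, which is wasteful when the model's well-typed rate is low.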
Xmd5a
Related:
https://cybercat.institute/2025/05/07/neural-alchemy/
https://cybercat.institute/2026/02/20/categorical-semantics-...
https://cybercat.institute/2025/10/16/dependent-optics-ii/

> The reason I put off starting the series for so long is one of the same reasons blocking the writing of the paper: some of the introductory material is some of the most difficult to write. It has been such a long time that I no longer know how to adequately explain why the problem is so difficult.

My sympathies to Jules
woolion
I'm not sure what to make of TFA (I don't have time right now to investigate in detail, but the subject is interesting). It starts by saying you can stop generation as soon as you have an output that can't be completed -- and there are already more advanced techniques that do that. If your language is typed, then you can use a "proof tree with a hole" and check whether there's a possible completion of that tree. References are "Type-Constrained Code Generation with Language Models" and "Statically Contextualizing Large Language Models with Typed Holes".

Then it switches to using an encoding that would be more semantic, but I think the argument is a bit flimsy: it compares chess to the plethora of languages that LLMs can spout somewhat correct code for (which is behind the success of this generally incorrect approach). What I found more dubious is that it brushed off syntactical differences to say "yeah, but they're all semantically equivalent". Which, it seems to me, is kind of the main problem here; basically any proof is an equivalence of two things, but it can be arbitrarily complicated to see it. If we consider this problem solved, then we can get better things, sure...

I think that without, e.g., a Haskell PoC showing great results, these methods will have a hard time getting traction. Please correct any inaccuracies or incomprehension in this comment!
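[The "proof tree with a hole" pruning woolion mentions can be illustrated with a toy grammar rather than a real type system: at each decoding step, only tokens whose extended prefix is still completable are allowed. The token set and `completable` check below are made up for illustration and stand in for a real typed-hole check.]

```python
# Toy constrained decoding over arithmetic expressions of digits and '+'.
TOKENS = list("0123456789+")

def completable(prefix: str) -> bool:
    # A partial expression can still be completed iff it does not
    # start with an operator and never has two operators in a row.
    if prefix.startswith("+") or "++" in prefix:
        return False
    return True

def allowed_next(prefix: str):
    # Mask: keep only tokens that leave the prefix completable.
    return [t for t in TOKENS if completable(prefix + t)]
```

In the papers woolion cites the mask comes from a real type/grammar check on the partial derivation, but the decoding loop has the same shape.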
toolslive
> This is what most programmers do. They type raw text into the editor; the compiler either processes it into structured data, or returns an error the programmer has to internalise before resubmitting.

In the 1980s structural editors were quite popular (e.g. the BASIC editor on the ZX81). Using these, it is impossible for the programmer to create text that is not a valid program.
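[A minimal sketch of the structural-editing idea: if the program is held as an AST the whole time and edits are constructor choices, syntactically invalid states are simply unrepresentable. The toy expression language below is illustrative only.]

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Num:
    value: int

@dataclass
class Add:
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Add]

def render(e: Expr) -> str:
    # Every value of type Expr prints as a valid program; there is no
    # way to build an Expr that renders to malformed text.
    if isinstance(e, Num):
        return str(e.value)
    return f"({render(e.left)} + {render(e.right)})"
```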
Daffrin
The connection between type systems and neural net structure is underexplored in practice. One thing I'd add: when you're dealing with multi-modal inputs in production — say, mixed structured and unstructured content — the type-safety problem compounds. You end up with implicit contracts at inference boundaries that are very hard to enforce. Has the author written anything on how this applies to transformer architectures specifically? The attention mechanism seems like a place where a richer type theory would be genuinely useful.
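[One way to make the implicit boundary contract Daffrin describes explicit, independent of any particular framework, is a small runtime validator in front of the model; `InferenceInput` and `validate` are hypothetical names, not an API from the article.]

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class InferenceInput:
    text: str
    metadata: dict = field(default_factory=dict)

def validate(raw: dict) -> InferenceInput:
    # Enforce the inference-boundary contract at runtime rather than
    # by convention: reject payloads the model was never meant to see.
    if not isinstance(raw.get("text"), str):
        raise TypeError("text must be a string")
    return InferenceInput(text=raw["text"], metadata=raw.get("metadata", {}))
```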
pron
The problem is that the search space is so large that correcting errors via guardrails is only effective if the original error rate is low (how many Integer -> Integer functions are there? There's ~1 way to get it right and ~∞ ways to get it wrong). Sure, we can help the easy cases, but that's because they're easy to begin with.

In general, we know (or at least assume) that being able to check a solution tractably does not make finding the solution tractable, or we'd know that NP = P. So if LLMs could effectively generate a proof that they've found the correct Integer -> Integer function, either that capability will be very limited or we've broken some known or assumed computational complexity limit. As Philippe Schnoebelen discovered in 2002 [1], languages cannot reduce the difficulty of program construction or comprehension.

Of course, it is possible that machine learning could learn some class of problems previously unknown to be in P and find that it is in P, but we should understand that that is what it's done: realised that the problem was easy to begin with rather than finding a solution to a hard problem. This is valuable, but we know that hard problems that are of great interest do exist.

[1]: https://lsv.ens-paris-saclay.fr/Publis/PAPERS/PDF/Sch-aiml02...
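[pron's check-vs-find asymmetry can be made concrete with subset-sum: verifying a proposed certificate is linear in the input, while the naive search below enumerates up to 2^n subsets. The function names are ours, chosen for illustration.]

```python
from itertools import combinations

def verify(xs, subset_idx, target):
    # Polynomial-time check of a candidate certificate.
    return sum(xs[i] for i in subset_idx) == target

def search(xs, target):
    # Exhaustive search over all 2^n subsets: cheap to check,
    # expensive to find.
    for r in range(len(xs) + 1):
        for idx in combinations(range(len(xs)), r):
            if verify(xs, idx, target):
                return idx
    return None
```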
ogogmad
"Even more, this coproduct can be thought of as an instance of a dependent pair indexed by a finite set" - I know what the individual terms mean, but I don't get what this is saying.
calebh
In my mind the main problem here is setting up the environment for training the LLM and ensuring that there's enough high quality training data for consumption. Getting an environment set up for a single project is non-trivial - here I'm assuming that you want something similar to autocomplete in an IDE or language server integration. Even if you could set this up, are there enough projects to even train on in the first place? Maybe this set-up will work for Haskell, but you can abandon any hope of setting up environments for C or C++. Even languages like Rust or C# may be impossible to train on, despite the build chain being a bit nicer than C or C++.
aesthesia
I'm not totally convinced by this:

> It might appear that this is an argument against scale, and the Bitter Lesson. That is not the case. I see this as a move that lets scale do its work on the right object. As with chess, where encoding the game rules into training produces a leap that no amount of inference-time search can today match, the move here is to encode the programming language itself into the training, and apply scale on a structure that actually reflects what we're trying to produce.

One way to think of the bitter lesson as it applies to generative models is that ~all data carries some information about the structure of reality, and architectures that let you train on more data are better because they learn better underlying world models. Knowledge transfers: LLMs are good at writing code partly because they've seen a lot of code, but also because they understand (at least to some extent) the relationship between that code and the rest of the world.

Constraining a model's output structure also constrains the data that is available to train it. So the big question is whether you can actually meaningfully scale training with these kinds of strictly structured outputs.