The beginning of scarcity in AI
gmays
40 points
61 comments
April 16, 2026
Related Discussions
Found 5 related stories in 68.8ms across 4,783 title embeddings via pgvector HNSW
- The Subprime AI Crisis Is Here dmitrygr · 43 pts · April 03, 2026 · 62% similar
- How the AI Bubble Bursts martinvol · 355 pts · March 30, 2026 · 59% similar
- Why I'm Not Worried About Running Out of Work in the Age of AI 0bytematt · 34 pts · March 20, 2026 · 59% similar
- We're running out of benchmarks to upper bound AI capabilities gmays · 15 pts · April 10, 2026 · 58% similar
- The first 40 months of the AI era jpmitchell · 156 pts · March 28, 2026 · 57% similar
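The retrieval byline above is a standard pgvector pattern: an HNSW index over title embeddings, queried by cosine distance. A rough sketch of what such a lookup might look like, assuming a hypothetical `stories` table with an `embedding` column (table, column, and connection details are illustrative, not from the source):

```python
# Hypothetical reconstruction of the "related stories" lookup described
# above; table, column, and connection details are illustrative.
import psycopg2

conn = psycopg2.connect("dbname=hn_digest")  # placeholder connection
cur = conn.cursor()

# One-time setup: an HNSW index using cosine distance (the <=> operator).
# cur.execute("CREATE INDEX ON stories USING hnsw (embedding vector_cosine_ops)")

# Placeholder query vector; in practice it comes from an embedding model.
query_vec = "[" + ",".join(["0.0"] * 384) + "]"

cur.execute(
    """
    SELECT title, 1 - (embedding <=> %s::vector) AS similarity
    FROM stories
    ORDER BY embedding <=> %s::vector
    LIMIT 5
    """,
    (query_vec, query_vec),
)
for title, similarity in cur.fetchall():
    print(f"{similarity:.0%} similar: {title}")
```

HNSW trades exact results for approximate nearest-neighbor search, which is how a few thousand titles come back in well under 100ms.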
Discussion Highlights (16 comments)
Lapalux
"The first hit is free....."
stupefy
What limits LLM inference accelerators? I've heard about Groq (https://groq.com/), but I'm not sure how far it pushes the problem out.
vessenes
It seems very possible that we have at least five years of real limitations on compute coming up. Maybe ten, depending on ASML. I wonder what an overshoot looks like. I also wonder if there might be room for new entrants in a compute-scarce environment. For instance, at some point, could CoreWeave field a frontier team by holding back 10% of its allocations over time? Pretty unusual situation.
mattas
This notion that "we don't have enough compute" does not cleanly reconcile with the fact that labs are burning cash faster than any cohort of companies in history. If I am a grocery store that pays $1 for oranges and sells them for $0.50, I can't say, "I don't have enough oranges."
isawczuk
It's artificial scarcity. LLM inference will soon be a commodity, like cloud compute. There are still 2-3 years before ASIC LLM inference catches up.
dmazin
Constraints can lead to innovation. Two things that I think will get dramatically better now that companies have an incentive to focus on them:
- harness design
- small models (both local and not)
I think there is still tremendous low-hanging fruit in both areas.
czk
"adaptive" thinking
itmitica
The current inference system is on a downslope. It remains to be seen what new wave of AI systems will replace it, making the whole current architecture obsolete. Meanwhile, they are milking it in the name of scarcity.
henry2023
The US is bound by energy and China is bound by compute. Whichever solves its limitation first will end this “Scarcity Era”.
com2kid
To bang on the same damn drum: open-weight models are six months to a year behind SOTA. If you were building a company a year ago based on what AI could do then, you can build that company today with models that run locally on a user's computer. Yes, that may mean requiring your customers to buy MacBooks or desktops with Nvidia GPUs, but if your product actually improves productivity by any reasonable amount, that purchase cost is quickly made up for.

I'll argue that for anything short of full computer control or writing code, the latest Qwen model will do fine. Heck, you can get a customer service voice chatbot running in 8GB of VRAM plus a couple more gigs for the ASR and TTS engines, and it'll be more powerful than the hundreds of millions spent on chatbots powered by GPT-4.x.

This is like arguing the age of personal computing was over because there weren't enough mainframes for people to telnet into. It misses the point. Yes, deploying and managing personal PCs was a lot harder than dumb terminal + mainframe, but the future was obvious.
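For a sense of what the 8GB figure buys, here is a minimal sketch of local chat inference using llama-cpp-python with a quantized Qwen model. The GGUF filename, context size, and prompts are illustrative assumptions, not from the comment; any quantized build that fits the VRAM budget would do.

```python
# Illustrative sketch only: local chat inference via llama-cpp-python.
# The GGUF filename is a placeholder for any quantized model that fits
# in ~8GB of VRAM; it is an assumption, not part of the comment.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen2.5-7b-instruct-q4_k_m.gguf",  # ~4-5GB quantized weights
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers to the GPU
)

reply = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a customer service agent."},
        {"role": "user", "content": "My order hasn't arrived yet."},
    ],
    max_tokens=256,
)
print(reply["choices"][0]["message"]["content"])
```

The ASR and TTS stages the comment mentions would sit in front of and behind this loop, each taking a few more gigabytes of VRAM.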
paulddraper
This is wrong along multiple axes.
1. Supply can scale. You can point to COVID supply-chain shocks, but those were temporary demand changes: no one spins up a whole fab to address a three-month spike. AI demand is not temporary.
2. Models are getting more efficient. DeepSeek V3 was 1/10th the cost of contemporary ChatGPT, and open-weight models get more runnable or smarter every month. The cutting edge is always the cutting edge, but if scarcity is real, model selection will adjust to fit it.
byyoung3
distillation is an equalizing force
yalogin
Does this also mean RAM prices are not coming down anytime soon?
wg0
There's another side to it too. Whoever is running and selling their own models with inference has invested up to the last dime available in the market. Those valuations are already ridiculously high, be it Anthropic or OpenAI: easily a couple of trillion dollars if combined. All that investment is seeking a return. Correct me if I'm wrong.

Developers and software companies are the only serious users, because they (mostly) review the output of these models out of both culture and necessity. Anywhere else? Other fields? There these models aren't useful, or aren't as useful, and revenue from software companies is by no means going to bring returns on trillion-dollar valuations. Correct me if I'm wrong.

To make matters worse, there's a hole in the bucket in the form of open-weight models. When squeezed further, software companies would either deploy open-weight models or resort to writing code by hand, because this is a very skilled, hardworking tribe; they've been doing this all their lives, and whole careers are built on it. Correct me if I'm wrong.

Eventually, ROI might not be what VCs expect, and constant losses might lead to bankruptcies. All that data center build-out would suddenly be looking for someone to rent its compute capacity, the result of which would be dime-a-dozen open-weight model providers with generous usage tiers, capitalizing on capacity whose bankrupt owners can't use it anymore and want to liquidate it to recoup as much of the investment as possible.

EDIT: Typos
2001zhaozhao
AKA, the beginning of big companies being able to roll over small companies with moar money. (Note: I don't expect this to actually happen until AI gets good enough to either nearly entirely replace humans or solve cooperation, but the long-term trend of scarce AI points in that direction.)
ttul
Energy scarcity will drive more innovation in local silicon and local inference. Apple will be the unexpected beneficiary of this reality.