I put a datacenter GPU in my gaming PC
birdculture
292 points
168 comments
May 31, 2026
Related Discussions
- 768GB Intel Optane DIMMs to run 1T-parameter LLM with single GPU at 4tps walterbell · 26 pts · May 30, 2026 · 54% similar
- Every GPU That Mattered jonbaer · 309 pts · April 07, 2026 · 53% similar
- Show HN: A game where you build a GPU Jaso1024 · 610 pts · April 04, 2026 · 52% similar
- Show HN: How I topped the HuggingFace open LLM leaderboard on two gaming GPUs dnhkng · 358 pts · March 10, 2026 · 52% similar
- 10k-watt GPU meet 40-watt lump of meat speckx · 11 pts · April 21, 2026 · 51% similar
Discussion Highlights (20 comments)
lucamark
Congrats! Most people won’t want to debug drivers, kernels, ACPI, adapters, and fan headers. But for those who do, the capability-per-pound is absurd.
lelanthran
> The compute is still real. The VRAM is still real. And the memory bandwidth is where it gets genuinely surprising.
Because humans write exactly like this /s
matja
The AMD MI250X GPUs are also interesting - 128GB of HBM2E at 3TB/s, sometimes you see them second-hand for under $1k, the catch obviously is that it needs an OAM socket. Never seen an easy way to hook them up to a regular mainboard.
knollimar
A little bit of local copium, but a neat read. Isn't a Raspberry Pi with 16GB of RAM $300 now?
jmyeet
Some context:
- In 2017, the V100 was a ~$10,000 GPU. I believe there was a PCIe version, but this one is probably so cheap because SXM2 is going to be harder to use;
- A 5090 has 1800GB/s of internal memory bandwidth (compared to 900GB/s in the 9-year-old GPU). Of course a 5090 is substantially more expensive;
- A 5090 has ~21k CUDA cores vs ~5k;
- The current $10k NVidia GPU is the RTX 6000 Pro w/ 96GB of VRAM. It has slightly more CUDA cores but is otherwise pretty much just a 5090.
This is unsurprising. NVidia uses VRAM for market segmentation.
Consider this: in 5-10 years, the trillions spent on AI data centers will most likely be sold for scrap as well. That's how short the runway is for OpenAI and Anthropic to recover that investment.
Anyway, I'm kind of impressed the author managed to get this all to work. I don't think it even would've occurred to me that someone had made an SXM2 adapter, particularly because it's not even used anymore. Like, props to whoever did that.
mondainx
Great write-up, I've often considered these DC cards for a project and now you've convinced me to pick one up; you describe the price of the unit against what one spends on tokens and that does it for me.
casey2
Some resale group is going to have to make this easier. The sheer number of these cards otherwise heading towards the landfill is staggering. That is, if Big Tech don't destroy them to prevent model weights from leaking.
Teknomadix
Tesla V100 SXM2 16GB is NOT DGX class as the author writes. It's HGX class. The V100 comes in two classes, SXM2 and SXM4, the latter coming with a max of 80GB of on-board memory. Typically these are installed as 8×A100 80GB SXM4 on an HGX riser, and what that gives you is NVSwitch fabric and 640GB of pooled HBM2e (on-package stacked memory w/ ~2 TB/s of memory bandwidth). 2U standard rack footprint too.
omarqureshi
Could probably avoid the crazy fan with a waterblock - I've seen a whole kit, V100 + PCIe adapter + block, for £235. Yes, you'll have to pay for a pump, radiators and radiator fans, but that should really quieten it down.
recursivegirth
> The compute is still real. The VRAM is still real. And the memory bandwidth is where it gets genuinely surprising.
Had to stop there. Annoying. I can't stand AI use for writing. It makes any otherwise great article feel so disingenuous.
mickeyp
Impressive work. But the problem is not the 30 tok/s, which is fine for agentic coding and chat. It's prefill; slow prefill kills agentic workloads dead. If you have 100,000 tokens at ~150tok/s per the OP, you're looking at:
You have: 100000 / (150/s)
You want: hms
11 min + 6.6666667 sec
Which is quite a wait indeed.
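The same arithmetic can be reproduced as a minimal Python sketch. The function names `prefill_seconds` and `as_min_sec` are hypothetical helpers made up for illustration; the 100,000-token prompt and ~150 tok/s prefill rate are the figures quoted in the comment above, and the model assumes prefill time scales linearly with prompt length, which is a simplification.

```python
def prefill_seconds(prompt_tokens: int, prefill_tok_per_s: float) -> float:
    """Time to process the whole prompt, assuming a constant prefill rate."""
    if prefill_tok_per_s <= 0:
        raise ValueError("prefill rate must be positive")
    return prompt_tokens / prefill_tok_per_s


def as_min_sec(seconds: float) -> tuple[int, float]:
    """Split a duration in seconds into whole minutes and leftover seconds."""
    minutes, rem = divmod(seconds, 60)
    return int(minutes), rem


# Figures from the comment: 100,000 prompt tokens at ~150 tok/s prefill.
secs = prefill_seconds(100_000, 150.0)
minutes, rem = as_min_sec(secs)
print(f"{minutes} min + {rem:.1f} sec")
```

Swapping in a different prompt length or prefill rate shows how quickly the wait grows with context size at a fixed rate.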
whoamii
The real question: did your local LLM write this post?
bob1029
> And yes, if you want the absolute best, Opus 4.8 exists. It also costs more per 20 minutes of heavy use than I paid for this entire GPU and adapter setup combined. But the gap is shockingly small.
I don't think this is a fair characterization of the situation. I use frontier models via API pre-paid tokens every single day, and I can barely rack up $100 per month. The fact that we figured out how to burn double this in 20 minutes is impressive, but I don't think it reflects the reality that many are experiencing right now. There are some exceptionally gluttonous approaches to harnessing LLMs that I think are serving as convenient straw men in these discussions.
Paying for the API will almost always be more economical than self-hosting equivalent infrastructure. I am not against self-hosting, but the article suggests a primarily economic motivation for this effort. If you are consuming fewer than 10^9 tokens per month, I really don't think it's worth your time to try and compete with the hyperscalers. Most of the money is to be found in the integration of this technology with existing businesses.
ewy1
Despite gaming being used in the title, it is not mentioned in the article, but I'm curious how this performs. I've run some multi-vendor Frankenstein setups before and sometimes it even works, so I'm curious to hear your experience with it.
pogue
But could you game with the GPU? Or is that purely a drivers issue?
abejfehr
Based on the title I was really hoping to see how this was used for gaming, but they just ran an LLM on it
wg0
Wait a few years, everyone will be able to put one at half the price.
KnuthIsGod
AI written posts will kill HN.
axpy906
Wow. V100. That brings back memories. Way to go.
gtirloni
> The compute is still real. The VRAM is still real. And the memory bandwidth is where it gets genuinely surprising.
sigh