My first impressions on ROCm and Strix Halo
random_
30 points
21 comments
April 18, 2026
Related Discussions
Found 5 related stories in 56.7ms across 4,930 title embeddings via pgvector HNSW
- Taking on CUDA with ROCm: 'One Step After Another' mindcrime · 118 pts · April 12, 2026 · 51% similar
- Notes on Project Oberon surprisetalk · 23 pts · March 04, 2026 · 47% similar
- Running Mainline Linux, U-Boot, and Mesa on Rockchip: A Year in Review losgehts · 13 pts · March 02, 2026 · 45% similar
- Random musings: 80s hardware, cyberdecks speckx · 28 pts · April 17, 2026 · 43% similar
- OpenClaw: The Complete 2026 Deep Dive (Install, Cost, Hardware, Reviews and More) svrbvr · 23 pts · March 30, 2026 · 43% similar
Discussion Highlights (9 comments)
timmy777
Thanks for sharing. However, this missed being a good writeup due to the lack of numbers and data. I'll give a specific example in my feedback. You said: ``` so far, so good, I was able to play with PyTorch and run Qwen3.6 on llama.cpp with a large context window ``` But there are no numbers, results, or pasted output; no performance figures or timings. Anyone with enough RAM can run these models, it will just be impracticably slow. The Strix Halo is about getting decent performance, so sharing numbers would be valuable here. Do you mind sharing them? Thanks!
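A minimal sketch of the kind of measurement being asked for here: wrap whatever generation call you use and report wall time and tokens/sec. The `generate` callable and its token-list return type are assumptions for illustration, not the article's actual code.

```python
import time

def timed_generate(generate, prompt, **kwargs):
    """Run generate(prompt) and return (tokens, seconds, tokens_per_sec)."""
    t0 = time.perf_counter()
    tokens = generate(prompt, **kwargs)  # assumed to return a list of tokens
    dt = time.perf_counter() - t0
    return tokens, dt, len(tokens) / dt
```

For llama.cpp specifically, the bundled `llama-bench` tool reports prefill and decode throughput directly, which would answer this comment without any wrapper code.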
everlier
Owning the GGUF conversion step is good in some circumstances, but running in fp16 is suboptimal for this hardware due to its low-ish bandwidth. It looks like context is set to 32k, which is the bare minimum needed for OpenCode with its ~10k initial system prompt. So overall, something like Unsloth's UD q8 XL or q6 XL quants frees up a lot of memory and bandwidth, moving into the next tier of usefulness.
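The memory math behind this point can be sketched with back-of-envelope numbers. The bits-per-weight figures below are rough approximations for common llama.cpp quant types (assumed, not measured GGUF file sizes, which also include embeddings and metadata):

```python
# Approximate effective bits per weight for common quant formats.
BITS_PER_WEIGHT = {"fp16": 16.0, "q8_0": 8.5, "q6_k": 6.56, "q4_k_m": 4.85}

def model_gb(params_b: float, quant: str) -> float:
    """Approximate weight footprint in GB for a params_b-billion-param model."""
    return params_b * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9

for q in BITS_PER_WEIGHT:
    print(f"30B @ {q:>6}: {model_gb(30, q):5.1f} GB")
# fp16 is roughly 60 GB for a 30B model; q8 roughly halves that,
# which also roughly halves the memory traffic per generated token.
```

Since decode speed on this class of hardware is largely memory-bandwidth-bound, the footprint reduction translates almost directly into tokens/sec.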
JSR_FDED
Perfect. No fluff, just the minimum needed to get things working.
IamTC
Nice. Thanks for the writeup. My Strix Halo machine is arriving next week. This is handy and helpful.
roenxi
I thought the point of something like Strix Halo was to avoid ROCm altogether? AMD's strategy seems to have been to unify GPU/CPU memory and then let people write their own libraries. The industry looks like it has started to move towards Vulkan. If AMD cards have figured out how to reliably run compute shaders without locking up (never a given in my experience, but that was some time ago), then there shouldn't be a reason to use specialty APIs or software written by AMD outside of drivers. ROCm was always a bit problematic, but the issue was that if AMD cards weren't good enough for AMD engineers to reliably support tensor multiplication, then there was no way anyone else was going to be able to do it. It isn't like anyone is confused about multiplying matrices together; it isn't for everyone, but the naive algorithm is a core undergrad topic and the advanced algorithms surely aren't that crazy to implement. It was never a library problem.
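For reference, the "core undergrad topic" version of matrix multiplication really is just three nested loops (this is the textbook sketch, not anything a GPU library would actually ship; the hard part is making it fast on real hardware):

```python
def matmul(a, b):
    """Naive O(n*m*k) matrix multiply on lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    assert len(a[0]) == k, "inner dimensions must match"
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19.0, 22.0], [43.0, 50.0]]
```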
anko
I would be interested to know what speeds you can get from gemma4 26b + 31b on this machine. Also, how does ROCm compare to Triton?
spoaceman7777
I'm somewhat confused as to why this is on the front page. It doesn't go into any real detail, and the advice it gives is... not good. You should definitely not be quantizing your own GGUFs using an old method like that hf script. There are lots of ways to run LLMs via podman (some even officially recommended by the project!). The chip has been out for almost a year now, and its most notable (and relevant-to-AI) feature is not mentioned in this article: it's the only x86_64 chip below workstation/server grade that has quad-channel RAM, and inference is generally RAM-constrained. I'm also quite puzzled by the bit about running PyTorch via uv. Anyway, I wouldn't recommend following the steps posted there. Poke around Google, or ask your friendly neighborhood LLM for advice on how to set up your Strix Halo laptop/desktop for the tasks described. A good resource to start with would probably be the Unsloth page for whichever model you are trying to run. (There are a few quantization groups competing for top place with GGUFs, and Unsloth is regularly at the top, with incredible documentation on inference, training, etc.) Anyway, sorry to be harsh. I understand that this is just a blog for jotting down stuff you're doing, which is a great thing to do. I'm mostly just commenting on the fact that this is on the front page of HN for some reason.
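The "inference is RAM-constrained" claim can be made concrete with a quick estimate. The figures below are assumptions for illustration (a 4 × 64-bit LPDDR5X-8000 configuration giving ~256 GB/s peak, and a 60% effective-bandwidth efficiency factor), not measurements from the article:

```python
def peak_bandwidth_gb_s(channels: int, bus_width_bits: int, mt_per_s: int) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    return channels * (bus_width_bits / 8) * mt_per_s * 1e6 / 1e9

def decode_tokens_per_s(model_gb: float, bandwidth_gb_s: float,
                        efficiency: float = 0.6) -> float:
    """Each decoded token streams the full weight set once; efficiency is an
    assumed fudge factor for real-world overhead, not a measured value."""
    return bandwidth_gb_s * efficiency / model_gb

bw = peak_bandwidth_gb_s(channels=4, bus_width_bits=64, mt_per_s=8000)
print(f"peak bandwidth ≈ {bw:.0f} GB/s")
print(f"fp16 30B (~60 GB): ≈ {decode_tokens_per_s(60, bw):.1f} tok/s")
print(f"q8 30B (~32 GB):   ≈ {decode_tokens_per_s(32, bw):.1f} tok/s")
```

Under these assumptions, decode speed scales inversely with model footprint, which is why the quantization advice in the other comments matters more than compute throughput here.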
seemaze
Check out the officially supported project Lemonade[0] by AMD. It has gfx1151 specific builds of vLLM, llama.cpp, comfy-ui, and even a PR to merge a Strix Halo port of Apple’s MLX[1] with a quick and easy install. [0] https://www.amd.com/en/developer/resources/technical-article... [1] https://github.com/lemonade-sdk/lemonade/issues/1642
aappleby
No benchmarks?