Quote: a desktop PC is not going to be able to replace Strix Halo's 128 GB and 256 GB/s RAM and AMD advertises this
Let me correct myself + general update: it won't replace Strix Halo's 256 GB/s, but one can get 128 GB of 5600 MT/s RAM for 340 bucks (taxes included), or 256 GB RAM for that matter.
Quote: For $1400 or $1600 I could have a full size gaming desktop. Why would I pay that for something that limits my options?
Correct. If you don't care about the RAM bandwidth (Strix Halo is only about 2.5 times faster, see below) or the size, a desktop PC with a dedicated GPU adds much faster VRAM on top, with all the advantages that come with that (faster prompt processing (pp) and token generation (tg)), plus the option of up to 256 GB RAM (on mainstream AM5 B850, which is basically a rename of B650 plus USB4, and thus maybe on B650 as well). See below for more.
Quote (memory bandwidth results from the Framework Desktop Strix Halo review table):
- BOSGAME M5 (AMD Ryzen AI Max+ 395, Radeon 8060S): 123483 MB/s
- Geekom A9 Max (AMD Ryzen AI 9 HX 370, Radeon 890M): 86541 MB/s
HX 370: 128-bit * 5600 MT/s / 1000 / 8 = 89.6 GB/s: does check out with the measurement.
AI Max+ 395: 256-bit * 8000 MT/s / 1000 / 8 = 256 GB/s: does NOT check out with the measurement (123483 MB/s is less than half of that).
Can you measure with the newest AIDA64 version 8.0?
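Those theoretical numbers are just bus width times transfer rate; here is a minimal Python sanity check (values copied from the post above, nothing measured by me):

# Theoretical memory bandwidth in GB/s: bus width (bits) * transfer rate (MT/s) / 8 bits per byte / 1000
def theoretical_gb_s(bus_width_bits, mt_s):
    return bus_width_bits * mt_s / 8 / 1000

print(theoretical_gb_s(128, 5600))  # HX 370: 89.6 GB/s, measured ~86.5 GB/s -> plausible
print(theoretical_gb_s(256, 8000))  # AI Max+ 395: 256.0 GB/s, measured ~123.5 GB/s -> way off
print(theoretical_gb_s(128, 6200))  # desktop dual-channel DDR5-6200: 99.2 GB/s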
LLMsThe "Strix Halo" APU is a 256-bit chip with a theoretical memory bandwidth of 256 GB/s (256-bit * 8000 MT/s / 1000 / 8) (and ~210 GB/s practically (expected)), comparable to an entry level quad-channel (4 * 64-bit) workstation' memory bandwidth. A normal desktop PC is dual-channel at best. AMD specifically advertises "Strix Halo" for running/inferencing LLMs. You can run the same LLMs on any PC, if you have at least the same amount of RAM (well, running off of a SSD will also work, but the speed will be super slow), ATX sized or not, dual-channel RAM or not, the differences are:
- The size: this mini-PC is about 2.5 liters (20 * 21.5 * 5.7 cm).
- The RAM speed at which any LLM will run: Strix Halo is a quad-channel chip at 8000 MT/s vs. a normal PC, which is dual-channel at 5600 MT/s to 6200 MT/s (2 * 64-bit * 6200 MT/s / 1000 / 8 = 99.2 GB/s). A (mini-)PC based on the "Strix Halo" APU will therefore run an LLM about 2.5 times faster: 256 GB/s / 99.2 GB/s = ~2.58 (see the rough estimate after this list).
- The RAM upgradability: the LPDDR5X RAM in "Strix Halo"-based PCs is not upgradable, maybe because it runs at 8000 MT/s vs. the 5600 MT/s to 6200 MT/s typically seen in DDR5 UDIMMs. A DDR5 UDIMM version with upgradable RAM may appear later, but it's not going to run at 8000 MT/s like the soldered RAM does. CUDIMMs may reach 8000 MT/s.
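Rough rule of thumb behind that speedup claim: token generation is memory-bandwidth-bound, so tokens/s is roughly the usable bandwidth divided by the bytes read per token (about the size of the active weights). A minimal Python sketch with illustrative numbers only (the 12 GB model size is an assumption, e.g. a ~20B model at ~4-5 BPW; real-world speeds will be lower):

# Very rough token-generation estimate: tokens/s ~ memory bandwidth / bytes read per token.
# For dense models that's roughly the whole file; for MoE models only the active experts count.
def est_tg_tok_s(bandwidth_gb_s, active_weights_gb):
    return bandwidth_gb_s / active_weights_gb

model_gb = 12.0                       # assumed model size, see note above
print(est_tg_tok_s(99.2, model_gb))   # dual-channel DDR5-6200 desktop: ~8 tok/s
print(est_tg_tok_s(256.0, model_gb))  # Strix Halo (theoretical): ~21 tok/s, i.e. the ~2.58x
print(est_tg_tok_s(448.0, model_gb))  # RTX 5060 VRAM: ~37 tok/s (if the model fits in VRAM)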
Questions to ask yourself:
- Is the LLM speed difference of 2.5 times (+150 %) worth the price, vs. simply getting 2x48 GB or 2x64 GB RAM sticks for a fraction of the price and then having more (although, yes, slower) RAM? Vs. paying 2400 bucks and being stuck with the hardware and no upgrade path (on a desktop you could later upgrade to 4x64 GB)?
- And, if the size matters, you can still get a mini-ITX case, AM5 mini-ITX motherboard and build a PC of the same size (or get a pre-built mini-ITX PC), with the possibility to:
- Upgrade the RAM.
- Add a dedicated GPU. For 4.0 - 4.5 liter mini-ITX builds: RTX 4060 LP or RTX 5060 LP, and both are still better and faster (and harder, stronger, hehe) than the built-in iGPU in Strix Halo. And if you are OK with a 5.5 liter case, you can even fit a normal/full-sized 4060 Ti 16 GB / 5060 Ti 16 GB or 5070 (rumor: an 18 GB VRAM option in 2026's refresh using 3 GB GDDR7 chips instead of the current 2 GB ones - a 50 % increase in VRAM density). Or a 5070 Ti Super with 24 GB VRAM.
- Upgrade the GPU later, e.g. when/if the 2026 refresh GPUs come out using 3 GB instead of 2 GB GDDR7 chips, giving 50 % more VRAM in the same size.
- A dedicated GPU (4060 / 5060) will also have faster prompt processing (pp).
- The ability to partially or fully offload to the fast VRAM of the GPU (5060: 448 GB/s).
- A dedicated GPU's VRAM adds extra capacity on top of the system RAM.
- And, not LLM-related, but: you can also game at higher FPS if you add a GPU that is faster than Strix Halo's iGPU, which sits between an RTX 4060 Laptop (= a desktop RTX 4050, which doesn't even exist, that's how weak it would be: en.wikipedia.org/wiki/GeForce_RTX_40_series) and an RTX 4070 Laptop (= desktop RTX 4060).
- Having 24 PCIe lanes vs. Strix Halo's 16 PCIe 4.0 lanes. Just know that some non-standard desktop CPUs have only 16 PCIe lanes instead of the full 24.
- Repairability.
- And, if looks matter, there are many arguably better looking mini-ITX cases, too.
OK, but why/who would need 64 GB or 128 GB RAM all of a sudden? The latest hype is AI/running LLMs locally, and it's private. Download e.g. LM Studio (a wrapper for llama.cpp) and try it out, or download llama.cpp directly (its web UI was recently redesigned); the LLMs themselves can be downloaded from huggingface.co. Open-weight LLMs are getting better and better, see "Artificial Analysis Intelligence Index by Open Weights vs Proprietary": artificialanalysis.ai/?intelligence-tab=openWeights.
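If you go the llama.cpp route, the GGUF files can also be fetched programmatically; a minimal Python sketch using the huggingface_hub package (the repo is from the list below, the exact filename is an assumption, check the repo's file list):

from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Download one quantized GGUF file from a repo (filename is assumed, verify it on huggingface.co)
path = hf_hub_download(
    repo_id="unsloth/gpt-oss-20b-GGUF",
    filename="gpt-oss-20b-Q4_K_M.gguf",
)
print(path)  # local cache path; point llama.cpp or LM Studio at this file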
Hardware requirements to run LLMs: basically, the file size of the LLM must fit into your GPU's VRAM and/or system RAM, minus whatever the OS needs. If you can't fit a model into VRAM+RAM (or don't want/need the full weights, e.g. for speed reasons), you have to use a quantized model. The bigger the LLM (more (active) B parameters), the more heavily it can be quantized and still perform well. For smaller LLMs, a Q4_K_M (4.83 BPW (bits per weight)) quant can still be good, because an LLM's performance typically stays close to the full 16 BPW quality down to roughly 5.3 to 3.9 BPW. For big LLMs, even dynamic 1-bit and 2-bit and especially 3-bit quants can perform (very) nicely: docs.unsloth.ai/new/unsloth-dynamic-ggufs-on-aider-polyglot.
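The file-size math in rough numbers, as a minimal Python sketch (parameter counts are taken from the model list below; real GGUF files come out somewhat larger or smaller because not all tensors are quantized to the same BPW):

# Approximate model file size in GB: parameters (billions) * bits per weight / 8 bits per byte
def approx_size_gb(params_b, bpw):
    return params_b * bpw / 8

print(approx_size_gb(20, 4.83))   # gpt-oss-20b at ~Q4_K_M:   ~12 GB -> fits a 16 GB GPU
print(approx_size_gb(120, 4.83))  # gpt-oss-120b at ~Q4_K_M:  ~72 GB -> needs a lot of RAM
print(approx_size_gb(357, 2.0))   # GLM-4.5 at a 2-bit quant: ~89 GB -> 128 GB RAM territory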
Llama.cpp's GGUF format allows you to run an LLM not only on the GPU; if you don't have enough VRAM, you can also partially or fully offload to system RAM.
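For example, with the llama-cpp-python binding (a Python wrapper around llama.cpp), a partial offload is a single parameter; llama.cpp's own CLI has the equivalent -ngl / --n-gpu-layers flag. The file path and layer count below are placeholders:

from llama_cpp import Llama  # pip install llama-cpp-python (built with GPU support)

llm = Llama(
    model_path="models/gpt-oss-20b-Q4_K_M.gguf",  # placeholder path to a downloaded GGUF
    n_gpu_layers=20,   # partial offload: put 20 layers into VRAM, the rest stays in system RAM (-1 = all)
    n_ctx=8192,        # context window size
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain memory bandwidth in one paragraph."}]
)
print(out["choices"][0]["message"]["content"])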
Current SOTA models:
- huggingface.co/unsloth/gpt-oss-20b-GGUF
- huggingface.co/openai/gpt-oss-20b (the original source, from the makers of ChatGPT; downloads last month: 6,556,549, and you've never heard of it?). It can fully fit on a 16 GB GPU (mentioned in OpenAI's own description) and therefore will run very fast.
- "Check out our awesome list for a broader collection of gpt-oss resources and inference partners." -> "Running gpt-oss with llama.cpp"
- huggingface.co/unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF
- huggingface.co/unsloth/GLM-4.5-Air-GGUF (106 B)
- huggingface.co/unsloth/gpt-oss-120b-GGUF (also from the makers of ChatGPT (huggingface.co/openai/gpt-oss-120b))
- huggingface.co/unsloth/Qwen3-235B-A22B-Instruct-2507-GGUF
- huggingface.co/unsloth/GLM-4.5-GGUF (357 B)
and many more, including thinking/reasoning variants (some LLMs, like gpt-oss, have an easily configurable reasoning effort: low, medium, high), like huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507, specialized ones, like Qwen3-Coder-30B-A3B-Instruct-GGUF, big ones, like DeepSeek-V3.2-Exp-GGUF (671 B), and uncensored/abliterated ones.
News and info about AI / LLMs, e.g.: reddit.com/r/LocalLLaMA/top, reddit.com/r/LocalLLM/top
NOTEBOOKCHECK, maybe write an article explaining to your readers the AI/LLM self-hosting capability, i.e. why 64 GB RAM, 128 GB RAM or even more (AMD, when do we get a 256 GB, 512-bit big Strix Halo / "Strix Halo Halo" / Strix Halo Max?) are suddenly a thing now, when normally 32 GB RAM is currently enough for most people.