LLMsThe "Strix Halo" APU is a 256-bit chip with a theoretical memory bandwidth of 256 GB/s (256-bit * 8000 MT/s / 1000 / 8) (and ~210 GB/s practically (expected)), comparable to an entry level quad-channel (64-bit * 4) workstation' memory bandwidth. A normal desktop PC is dual-channel at best. AMD specifically advertises "Strix Halo" for LLM inferencing. You can do the same with a normal desktop ATX sized PC with dual-channel RAM, the differences are:
- The size
- The RAM of any Strix Halo system is not upgradable (it is soldered, which is presumably what lets it run at 8000 MT/s vs the 6000 to 6200 MT/s you get on a normal ATX-type motherboard in a desktop-sized PC)
- And the speed at which an LLM will run (Strix Halo is quad-channel at 8000 MT/s vs a normal desktop PC, which is dual-channel at 6000 to 6200 MT/s: 2 * 64-bit * 6200 / 1000 / 8 = 99.2 GB/s). A (mini-)PC based on the "Strix Halo" APU will therefore run an LLM about 2.5 times faster: 256 GB/s / 99.2 GB/s = ~2.58 (see the quick calculation after this list).
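The napkin math above as a minimal sketch (theoretical peak numbers; real-world bandwidth is lower, as noted):

```python
# Theoretical peak memory bandwidth: bus width in bits / 8 = bytes per transfer,
# times the transfer rate in MT/s, divided by 1000 to get GB/s.
def peak_bandwidth_gbs(bus_width_bits: int, mt_per_s: int) -> float:
    return bus_width_bits / 8 * mt_per_s / 1000

strix_halo = peak_bandwidth_gbs(256, 8000)  # 256-bit LPDDR5X-8000 -> 256.0 GB/s
desktop = peak_bandwidth_gbs(128, 6200)     # dual-channel DDR5-6200 -> 99.2 GB/s

print(f"Strix Halo: {strix_halo:.1f} GB/s")
print(f"Desktop:    {desktop:.1f} GB/s")
print(f"Speedup:    {strix_halo / desktop:.2f}x")  # ~2.58x
```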
Using the relatively expensive Strix Halo APU/chip but giving it only 64 GB RAM is a waste of silicon, because that is simply not enough for many LLMs (the memory bandwidth stays the same either way; the 64 GB configurations just use less dense RAM chips): give it at least 96 GB RAM.
Questions to ask yourself:
- Is the LLM speed difference of about 2.5 times (i.e. ~150 % faster) worth the price, vs simply getting 2x48 GB or 2x64 GB RAM sticks for a fraction of the money and ending up with more (though slower) RAM, instead of paying ~1500 bucks and being stuck with the hardware with no upgrade path?
- And if the size matters, you can still get an AM5 mini-ITX motherboard and build a pretty small system as well (or get a pre-built mini-ITX PC), with the possibility to: 1) upgrade the RAM if needed, 2) add a much faster GPU to the PCIe 4.0 x16 / PCIe 5.0 x16 slot. If you add an NVIDIA GPU, you are going to have fewer issues running LLMs and get much faster prompt processing (pp) with any LLM, and you can, of course, offload an LLM entirely to the GPU (if it has enough VRAM) or offload some of its layers, which speeds things up considerably the more layers fit into the GPU's VRAM (see the offload sketch after this list). And, not LLM related, but: you can also game at higher FPS if you add a GPU that is faster than Strix Halo's iGPU, which sits between an RTX 4060 Laptop (= RTX 4050 desktop, which doesn't even exist, that is how weak it would be, en.wikipedia.org/wiki/GeForce_RTX_40_series) and an RTX 4070 Laptop (= RTX 4060 desktop).
- And if looks matter, there are many arguably better looking mini-ITX cases, too.
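A minimal sketch of partial GPU offload with the llama-cpp-python bindings (the model file name and layer count are placeholders; llama.cpp itself exposes the same thing via its -ngl / --n-gpu-layers option):

```python
from llama_cpp import Llama  # pip install llama-cpp-python (built with CUDA support for an NVIDIA card)

# n_gpu_layers controls how many transformer layers are offloaded to VRAM:
# 0 keeps the whole model in system RAM, -1 offloads everything (if it fits).
llm = Llama(
    model_path="Qwen3-30B-A3B-Instruct-2507-Q8_0.gguf",  # placeholder file name
    n_gpu_layers=24,  # raise until you run out of VRAM
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain memory bandwidth in one paragraph."}]
)
print(out["choices"][0]["message"]["content"])
```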
PS: For what it's worth: a whole notebook with fast 64 GB dual-channel LPDDR5X-7500 RAM, but with a slower iGPU than Strix Point (= slower prompt processing (pp) than on Strix Point), can be had for 1319 bucks: "Lenovo ThinkPad P16s G2 (AMD), Villi Black, Ryzen 7 PRO 7840U, 64GB RAM, 1TB SSD, .." (at least in this region) (the 8840U APU is just a refresh of the 7840U, Strix Point's RAM speed is only slightly faster at 8000 MT/s, and Zen 5 is also only slightly faster than Zen 4) (and for 1511 bucks with a better display).
The higher the memory bandwidth, the faster AI / LLMs will run on your PC.
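A rough rule of thumb for why that is: generating a token means reading (at least) all active model weights from RAM once, so the theoretical upper bound on tokens per second is roughly memory bandwidth divided by the size of the active weights. A minimal sketch, assuming a dense model whose full weights are read for every token:

```python
# Upper bound on token generation speed for a memory-bandwidth-bound LLM.
# Real-world numbers are lower (KV cache reads, efficiency losses, etc.).
def max_tokens_per_s(bandwidth_gbs: float, active_weights_gb: float) -> float:
    return bandwidth_gbs / active_weights_gb

print(max_tokens_per_s(256.0, 30.0))  # ~8.5 tok/s on Strix Halo for a ~30 GB model
print(max_tokens_per_s(99.2, 30.0))   # ~3.3 tok/s on a dual-channel DDR5-6200 desktop
```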
Ok, but why/who would need 64 GB or 128 GB RAM all of a sudden?
The latest hype is AI/running LLMs. Download e.g. LM Studio (wrapper for llama.cpp) and try it out.
The difference to ChatGPT is that it is private (and, to be fair, not as good as the best that proprietary AI has to offer, but it is catching up quickly; and if the open-weight LLMs are always ~1 year behind, does it matter? The gap is actually closing, as per the artificialanalysis.ai/?intelligence-tab=openWeights report).
You can also download llama.cpp directly, and the LLMs themselves can be downloaded from huggingface.co (a small download sketch follows below).
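A minimal sketch of fetching a single GGUF with the huggingface_hub Python package (the exact quant file name inside the repo is an assumption, check the repo's file list):

```python
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Downloads one GGUF file into the local Hugging Face cache and returns its path.
# The filename below is a guess at the Q8_0 quant; check the repo's "Files" tab for the real name.
path = hf_hub_download(
    repo_id="unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF",
    filename="Qwen3-30B-A3B-Instruct-2507-Q8_0.gguf",
)
print(path)  # point LM Studio or llama.cpp at this file
```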
Current SOTA models that fit into 64 GB RAM at a q8 quant (see the size rule of thumb after the model list):
huggingface.co/unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF
huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF
Current SOTA models that fit into 128 GB RAM (at a decent quant, not the full FP16) are e.g.:
huggingface.co/unsloth/GLM-4.5-Air-GGUF
huggingface.co/unsloth/gpt-oss-120b-GGUF (from the makers of ChatGPT)
Better, but won't fit:
huggingface.co/unsloth/Qwen3-235B-A22B-Instruct-2507-GGUF (though the 2-bit and 3-bit quants might)
huggingface.co/unsloth/GLM-4.5-GGUF
huggingface.co/unsloth/DeepSeek-V3.1-GGUF
and many more, including "thinking" variants, like huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507.
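A hedged rule of thumb for checking whether a given quant fits into RAM (actual GGUF sizes vary per quant scheme, and you also need headroom for the KV cache and the OS):

```python
# Approximate model size: parameter count (in billions) * bits per weight / 8,
# plus ~10% overhead for embeddings, quantization scales, etc. Only a rough estimate.
def approx_size_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8 * 1.1

print(approx_size_gb(30, 8))   # Qwen3-30B-A3B at q8     -> ~33 GB, fits into 64 GB
print(approx_size_gb(120, 4))  # gpt-oss-120b at ~4 bit  -> ~66 GB, fits into 128 GB
print(approx_size_gb(235, 8))  # Qwen3-235B-A22B at q8   -> ~259 GB, does not fit
print(approx_size_gb(235, 3))  # Qwen3-235B-A22B at q3   -> ~97 GB, might just fit
```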
News and info about AI / LLMs: reddit.com/r/LocalLLaMA/top/
NOTEBOOKCHECK, maybe write an article explaining to your readers the ability to self-host AI LLMs, or why 64 GB or even 128 GB of RAM are suddenly a thing, because normally 32 GB of RAM is enough for most people.