Correct, with any such mini-PC, one pays the price for it being so small.
Since you mentioned it, let me quote:
LLMs
The "Strix Halo" APU is a 256-bit chip with a theoretical memory bandwidth of 256 GB/s (256-bit * 8000 MT/s / 1000 / 8), with ~210 GB/s expected in practice, comparable to the memory bandwidth of an entry-level quad-channel (4 * 64-bit) workstation. A normal desktop PC is dual-channel at best. AMD specifically advertises "Strix Halo" for running/inferencing LLMs. You can run the same LLMs on any PC that has at least the same amount of RAM (running off an SSD will also work, but it will be extremely slow), ATX-sized or not, dual-channel RAM or not. The differences are:
- The size: This is 2.5 to 4 Liters, depending on the Strix Halo chassis.
- The RAM speed at which any LLM will run: Strix Halo is a quad-channel chip at 8000 MT/s vs a normal PC, which is dual-channel at 5600 MT/s to 6200 MT/s (2 * 64-bit * 6200 MT/s / 1000 / 8 = 99.2 GB/s). A (mini-)PC based on the "Strix Halo" APU will run an LLM about 2.5 times faster: 256 GB/s / 99.2 GB/s = ~2.58.
- The RAM upgradability: The LPDDR5X RAM in "Strix Halo"-based PCs is not upgradable, but it runs at 8000 MT/s vs 5600 MT/s to 6200 MT/s typically seen in DDR5 UDIMMs. A DDR5 UDIMM version with upgradable RAM may appear later, but it's not going to be at 8000 MT/s, like the soldered ones. CUDIMM may reach 8000 MT/s.
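The bandwidth numbers above are just bus width times transfer rate. A minimal sketch of that arithmetic (the helper name `peak_bandwidth_gbs` is made up for illustration; these are theoretical peaks, not measured values):

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    bits per transfer * mega-transfers per second / 8 bits per byte / 1000
    """
    return bus_width_bits * transfer_rate_mts / 8 / 1000


# Figures taken from the post: 256-bit LPDDR5X at 8000 MT/s vs
# dual-channel (2 * 64-bit) DDR5 at 6200 MT/s.
strix_halo = peak_bandwidth_gbs(256, 8000)
desktop_ddr5 = peak_bandwidth_gbs(2 * 64, 6200)
ratio = strix_halo / desktop_ddr5

print(f"Strix Halo:   {strix_halo:.1f} GB/s")
print(f"Desktop DDR5: {desktop_ddr5:.1f} GB/s")
print(f"Ratio:        {ratio:.2f}x")
```

Swapping in 5600 or 6000 MT/s for the desktop shows how the gap changes with slower DIMMs.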
Questions to ask yourself:
- Is the LLM speed difference of 2.5 times (150 % faster) worth the price, vs simply getting 2x48GB or 2x64GB RAM sticks for a fraction of the price and ending up with more RAM (although, yes, slower), instead of paying 2400 bucks and being stuck with the hardware and no upgrade path (on a desktop you could upgrade to 4x64GB)?
- And, if the size matters, you can still get a mini-ITX case, AM5 mini-ITX motherboard and build a PC of the same size (or get a pre-built mini-ITX PC), with the possibility to:
- Upgrade the RAM.
- Have a dedicated GPU. For 4.0 - 4.5 Liter mini-ITX builds: an RTX 4060 LP or RTX 5060 LP, both of which are still better and faster (and harder, stronger, hehe) than the built-in iGPU in Strix Halo. And if you are ok with a 5.5 Liter case, then you can even fit a normal/full-sized 4060 Ti 16GB / 5060 Ti 16GB or 5070 (rumor: 18 GB VRAM option in 2026's Refresh using 3 GB GDDR7 chips, instead of the current 2 GB ones - a 50% increase in VRAM density). A 5070 Ti SUPER 24 GB VRAM may fit too.
- You get the ability to upgrade the GPU later, like when/if in 2026 the Refresh GPUs come out, using 3GB, instead of 2GB, GDDR7 chips and you get 50 % more VRAM in the same size.
- A dedicated GPU (4060 / 5060) will also have much faster prompt processing (pp), since it's much faster than the iGPU (think of it in terms of gaming performance: higher FPS = higher pp).
- The ability to partially or fully offload to the fast VRAM of the GPU (5060: 448 GB/s) for much faster token generation (tg), where the memory bandwidth is key.
- A dedicated GPU adds additional, very fast, memory capacity to the RAM.
- And, not LLM related, but: You can also game at higher FPS if you add a GPU that is faster than Strix Halo's iGPU, which sits between an RTX 4060 Laptop (= RTX 4050 desktop, a card that doesn't even exist, this is how bad it would be; see en.wikipedia.org/wiki/GeForce_RTX_40_series) and an RTX 4070 Laptop (= RTX 4060 desktop).
- Have 24 PCIe 4.0 lanes vs Strix Halo's 16 PCIe 4.0 lanes. Just know that some non-normal CPUs have only 16 PCIe lanes, instead of the full 24.
- Repairability.
- And, if looks matter, there are many arguably better looking mini-ITX cases, too.
- A normal mainstream AMD AM5 B650 / B850 mobo with 2x64 GB dual-channel 6000 MT/s RAM + an RTX 4090 (1 TB/s bandwidth) will have similar token generation (tg) speed to a Strix Halo when layers are also offloaded to the 4090, and much faster prompt processing (pp) than Strix Halo's iGPU with its 4060 Laptop-level performance.
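To play with the offloading argument yourself, here is a rough back-of-the-envelope sketch. It assumes token generation is purely memory-bandwidth-bound and that every weight of a dense model is read once per token; it ignores compute, KV cache, PCIe transfers and real-world efficiency, so treat the result as an upper bound, not a benchmark. The function name and the 40 GB model size are made up for illustration; the 24 GB of VRAM is the 4090's capacity, and the bandwidth figures are the ones from the post.

```python
def tokens_per_second(model_gb: float, ram_gbs: float,
                      vram_gb: float = 0.0, vram_gbs: float = 0.0) -> float:
    """Rough upper bound on tokens/s for a bandwidth-bound dense model.

    Assumption: each token requires reading all weights once. The part that
    fits in VRAM is read at vram_gbs, the rest at ram_gbs.
    """
    on_gpu = min(model_gb, vram_gb) if vram_gbs > 0 else 0.0
    on_ram = model_gb - on_gpu
    seconds_per_token = on_ram / ram_gbs
    if on_gpu > 0:
        seconds_per_token += on_gpu / vram_gbs
    return 1.0 / seconds_per_token


model_gb = 40.0  # hypothetical model size, pick your own

# Strix Halo: everything sits in 256 GB/s unified memory.
strix = tokens_per_second(model_gb, ram_gbs=256.0)

# Desktop: dual-channel DDR5-6000 (96 GB/s) + RTX 4090 (24 GB at ~1000 GB/s).
desktop = tokens_per_second(model_gb, ram_gbs=96.0, vram_gb=24.0, vram_gbs=1000.0)

print(f"Strix Halo:     {strix:.1f} tok/s (upper bound)")
print(f"Desktop + 4090: {desktop:.1f} tok/s (upper bound)")
```

The result swings a lot with `model_gb`: the bigger the share of the model that spills out of VRAM into system RAM, the more the slower DDR5 dominates.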