Strix Halo LLM benchmark 1
YouTube/Just Josh benchmarked an AI/LLM: youtu.be/oyrAur5yYrA?t=743 ("ZBook Ultra G1a: Finally, Strix Halo in a Laptop"):
17.2 tokens per second with Qwen2.5-14B Q4_K_M (8.99 GB).
As I wrote in my previous post:
If 121 GB/s is correct: 121 GB/s / 8.99 GB = 13.5 tokens per second.
If 179 GB/s is correct: 179 GB/s / 8.99 GB = 19.9 tokens per second.
Fortunately, this means the measured 121 GB/s must be incorrect: 8.99 GB * 17.2 [tokens per second = GB/s] = 154.6 GB/s of effective bandwidth.
For comparison (YT/"I tried to run a 70B LLM on a MacBook Pro .."), an M4 Pro (256-bit, LPDDR5X-8533, 273 GB/s) achieves 18.79 tokens per second (youtu.be/5bNDx5XBlLY?t=157) with the same 8.99 GB LLM. (Theoretically, with a ~70% bandwidth efficiency factor: 273 GB/s * 0.7 / 8.99 GB = 21.25 tokens per second, so the measurement is about right.)
-> AMD Strix Halo (256-bit, LPDDR5X-8000, = 256 GB/s) vs M4 Pro (256-bit, LPDDR5X-8533, = 273 GB/s): 17.2 t/s vs 18.79 t/s. That fits, so the measured 121 GB/s is fortunately wrong.
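The napkin math behind these numbers, as a small Python sketch (treating the GGUF file size as the data read per generated token, and the ~0.7 efficiency factor, are simplifying assumptions):

# Memory-bound token generation: every token streams roughly the whole
# quantized model once, so tokens/s ~= effective bandwidth / model size.

def expected_tps(bandwidth_gbs, model_gb, efficiency=1.0):
    # Upper bound on tokens per second for a dense model.
    return bandwidth_gbs * efficiency / model_gb

def effective_bandwidth(model_gb, measured_tps):
    # Back-calculate the bandwidth actually achieved from a measured t/s.
    return model_gb * measured_tps

MODEL_GB = 8.99  # Qwen2.5-14B Q4_K_M file size

print(expected_tps(121, MODEL_GB))          # ~13.5 t/s
print(expected_tps(179, MODEL_GB))          # ~19.9 t/s
print(effective_bandwidth(MODEL_GB, 17.2))  # ~154.6 GB/s -> 121 GB/s looks wrong
print(expected_tps(273, MODEL_GB, 0.7))     # M4 Pro: ~21.3 t/s vs 18.79 t/s measured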
The video also mentions that the 96 GB of RAM configured in the BIOS cannot be used by the software.
Strix Halo LLM benchmark 2
YouTube/jack stone benchmarked an AI/LLM: youtu.be/watch?v=UXjg6Iew9lg ("【English subtitle】GMK EVO-X2 AI Max+ 395 Mini PC Review!"):
Strix Halo RAM bandwidth measured (AIDA64 measures crap):
Strix Halo AIDA64: Read: 119.91 GB/s (LLM benchmarks suggest that this result is incorrect).
- Qwen3-14B (dense), Q4_K_M (8.38 GB): 20.28 tokens per second
- Qwen3-32B (dense), Q4_K_M (18.4 GB): 9.61 tokens per second
- Qwen3-30B-A3B (MoE), Q4_K_M (17.35 GB): 52.89 tokens per second
- Llama-3.3-70B-Instruct (dense), Q3_K_L (37.1 GB)¹ (strange that they didn't test Q4_K_M): 5.45 tokens per second
- Qwen3-235B-A22B (MoE), Q3_K_L (111.17 GB): It doesn't work because, as you can see, only 96 GB of the 128 GB can be used... big fail!
- Qwen3-235B-A22B (MoE), Q2_K (85.69 GB): Even this low quant still doesn't work... big fail!
¹ Checked here: huggingface.co/lmstudio-community/Llama-3.3-70B-Instruct-GGUF/tree/main
Strix Halo RAM bandwidth calculated:
8.38 GB * 20.28 t/s [= GB/s] = 169.95 GB/s.
18.4 GB * 9.61 t/s [= GB/s] = 176.8 GB/s.
37.1 GB * 5.45 t/s [= GB/s] = 202 GB/s.
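The same back-calculation, run over the three dense results in one go (a sketch that only restates the arithmetic above):

# Effective bandwidth = GGUF file size * measured tokens/s (dense models only).
results = [
    ("Qwen3-14B Q4_K_M",               8.38, 20.28),
    ("Qwen3-32B Q4_K_M",              18.4,   9.61),
    ("Llama-3.3-70B-Instruct Q3_K_L", 37.1,   5.45),
]

for name, size_gb, tps in results:
    print(f"{name}: {size_gb * tps:.1f} GB/s")
# -> roughly 170-200 GB/s effective bandwidth, i.e. well above the
#    119.91 GB/s that AIDA64 reports.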
Generally speaking: you can go as low as Q4_K_M, but beyond that the quality drops significantly; the aforementioned Q3 quant (which doesn't fit into RAM anyway) is already questionable. So, if you want to use Qwen3-235B-A22B Q4_K_M (142.65 GB), you need 192 GB RAM (128 GB RAM ~ 256-bit bus width, 192 GB RAM ~ 384-bit bus width), and hopefully more than 75% of that RAM can be used, otherwise even 192 GB RAM wouldn't be enough and you'd need 256 GB RAM (512-bit). Rumor has it that 192 GB up to 256 GB RAM will be available with the "Strix Halo" successor, "Medusa Halo."
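A quick sanity check of that sizing argument (a sketch assuming the ~75% usable-RAM ratio observed on the 128 GB machine carries over, and ignoring context/KV-cache overhead):

MODEL_GB = 142.65           # Qwen3-235B-A22B Q4_K_M file size
USABLE_FRACTION = 96 / 128  # only 96 of the 128 GB were usable in the video (~75%)

required_ram = MODEL_GB / USABLE_FRACTION
print(required_ram)  # ~190.2 GB: 192 GB would only just suffice; any extra
                     # overhead pushes the requirement towards 256 GB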
NOTEBOOKCHECK, why not add LLM benchmarks to your test parcours (say, starting with a 256-bit bus width and 64 GB RAM, or starting with 128 GB RAM would also be OK)? Such a test doesn't take long (the LLM files can be copied onto the device faster than they could be downloaded).