Strix Halo LLM benchmark 1
YouTube/Just Josh benchmarked an AI/LLM: youtu.be/oyrAur5yYrA?t=743 ("ZBook Ultra G1a: Finally, Strix Halo in a Laptop"):
17.2 tokens per second with Qwen2.5-14B Q4_K_M (8.99 GB).
As I wrote in my previous post:
If 121 GB/s is correct: 121 GB/s / 8.99 GB = 13.5 tokens per second.
If 179 GB/s is correct: 179 GB/s / 8.99 GB = 19.9 tokens per second.
Fortunately, this means the measured 121 GB/s must be incorrect: 8.99 GB * 17.2 [tokens per second = GB/s] = 154.6 GB/s of effective bandwidth.
For comparison (YT/"I tried to run a 70B LLM on a MacBook Pro .."), an M4 Pro (256-bit, LPDDR5X-8533, 273 GB/s) achieves 18.79 tokens per second (youtu.be/5bNDx5XBlLY?t=157) with the same 8.99 GB LLM. (Theoretically, with a ~70% bandwidth efficiency factor: 273 GB/s * 0.7 / 8.99 GB = 21.25 tokens per second, so the measurement is about right.)
-> AMD Strix Halo (256-bit, LPDDR5X-8000, = 256 GB/s) vs M4 Pro (256-bit, LPDDR5X-8533, = 273 GB/s): 17.2 t/s vs 18.79 t/s. That fits, so the measured 121 GB/s is fortunately wrong.
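The napkin math behind these numbers, as a small Python sketch (treating the GGUF file size as the data read per generated token, and the ~0.7 efficiency factor, are simplifying assumptions):

# Memory-bound token generation: every token streams roughly the whole
# quantized model once, so tokens/s ~= effective bandwidth / model size.

def expected_tps(bandwidth_gbs, model_gb, efficiency=1.0):
    # Upper bound on tokens per second for a dense model.
    return bandwidth_gbs * efficiency / model_gb

def effective_bandwidth(model_gb, measured_tps):
    # Back-calculate the bandwidth actually achieved from a measured t/s.
    return model_gb * measured_tps

MODEL_GB = 8.99  # Qwen2.5-14B Q4_K_M file size

print(expected_tps(121, MODEL_GB))          # ~13.5 t/s
print(expected_tps(179, MODEL_GB))          # ~19.9 t/s
print(effective_bandwidth(MODEL_GB, 17.2))  # ~154.6 GB/s -> 121 GB/s looks wrong
print(expected_tps(273, MODEL_GB, 0.7))     # M4 Pro: ~21.3 t/s vs 18.79 t/s measured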
The video also mentions that the 96 GB of RAM configured in the BIOS cannot be used by the software.
Strix Halo LLM benchmark 2
YouTube/jack stone benchmarked an AI/LLM: youtu.be/watch?v=UXjg6Iew9lg ("【English subtitle】GMK EVO-X2 AI Max+ 395 Mini PC Review!"):
Strix Halo RAM bandwidth measured (AIDA64 measures crap):
Strix Halo AIDA64: Read: 119.91 GB/s (LLM benchmarks suggest that this result is incorrect).
- Qwen3-14B (dense), Q4_K_M (8.38 GB): 20.28 tokens per second
- Qwen3-32B (dense), Q4_K_M (18.4 GB): 9.61 tokens per second
- Qwen3-30B-A3B (MoE), Q4_K_M (17.35 GB): 52.89 tokens per second
- Llama-3.3-70B-Instruct (dense), Q3_K_L (37.1 GB)¹ (strange that they didn't test Q4_K_M): 5.45 tokens per second
- Qwen3-235B-A22B (MoE), Q3_K_L (111.17 GB): It doesn't work because, as you can see, only 96 GB of the 128 GB can be used... big fail!
- Qwen3-235B-A22B (MoE), Q2_K (85.69 GB): Even this low quant still doesn't work... big fail!
¹ Checked here: huggingface.co/lmstudio-community/Llama-3.3-70B-Instruct-GGUF/tree/main
Strix Halo RAM bandwidth calculated:
8.38 GB * 20.28 t/s [= GB/s] = 169.95 GB/s.
18.4 GB * 9.61 t/s [= GB/s] = 176.8 GB/s.
37.1 GB * 5.45 t/s [= GB/s] = 202 GB/s.
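The same back-calculation, run over the three dense results in one go (a sketch that only restates the arithmetic above):

# Effective bandwidth = GGUF file size * measured tokens/s (dense models only).
results = [
    ("Qwen3-14B Q4_K_M",               8.38, 20.28),
    ("Qwen3-32B Q4_K_M",              18.4,   9.61),
    ("Llama-3.3-70B-Instruct Q3_K_L", 37.1,   5.45),
]

for name, size_gb, tps in results:
    print(f"{name}: {size_gb * tps:.1f} GB/s")
# -> roughly 170-200 GB/s effective bandwidth, i.e. well above the
#    119.91 GB/s that AIDA64 reports.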
Generally speaking: you can go as low as Q4_K_M, but beyond that the quality drops significantly; the aforementioned Q3 quant (which doesn't fit into RAM anyway) is already questionable. So, if you want to use Qwen3-235B-A22B Q4_K_M (142.65 GB), you need 192 GB RAM (128 GB RAM ~ 256-bit bus width, 192 GB RAM ~ 384-bit bus width), and hopefully more than 75% of that RAM can be used, otherwise even 192 GB RAM wouldn't be enough and you'd need 256 GB RAM (512-bit). Rumor has it that 192 GB up to 256 GB RAM will be available with the "Strix Halo" successor, "Medusa Halo."
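A quick sanity check of that sizing argument (a sketch assuming the ~75% usable-RAM ratio observed on the 128 GB machine carries over, and ignoring context/KV-cache overhead):

MODEL_GB = 142.65           # Qwen3-235B-A22B Q4_K_M file size
USABLE_FRACTION = 96 / 128  # only 96 of the 128 GB were usable in the video (~75%)

required_ram = MODEL_GB / USABLE_FRACTION
print(required_ram)  # ~190.2 GB: 192 GB would only just suffice; any extra
                     # overhead pushes the requirement towards 256 GB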
NOTEBOOKCHECK, why not add LLM benchmarks to your test parcours (say, starting with a 256-bit bus width and 64 GB RAM, or starting with 128 GB RAM would also be OK)? Such a test doesn't take long (the LLM files can be copied onto the device faster than they could be downloaded).