Topic summary

Posted by RobertJasiek

- March 28, 2026, 12:51:31

Quote from: Himalayan Salt on March 28, 2026, 12:37:19
Entire laptop market is a monopoly, it's either Nvidia or Apple.

Duopoly or at best oligopoly, but otherwise right.

Posted by Himalayan Salt

- March 28, 2026, 12:37:19

Entire laptop market is a monopoly, it's either Nvidia or Apple. Nobody bothers with anything else. Intel's iGPU performance is still too poor for the cost, and AMD doesn't even exist at all.

It's depressing.

Posted by Semper

- March 28, 2026, 00:43:33

I have the Asus G14 with the RTX 5070 Ti. In the benchmark from Team 2 Films, which is a DaVinci render of a file with several parts, the G14 takes 6000 seconds, worse than the MacBook Neo, while an M5 Max takes less than 180 seconds. For video, the mobile 5070 Ti is in another dimension, and a worse one.
The G14 does very well in Puget, but with a real render the story is different, so what benchmarks like Puget and others show should be taken with a grain of salt.

Posted by Alex

- March 16, 2026, 10:55:58

Token generation is indeed heavily dependent on bandwidth, but that's not the complete story. If you compare M3 Ultra and M5 Max results elsewhere, the two are often matched, especially with dense models rather than MoE models, and that reflects the new architecture having an advantage.

To see the prefill advantage that Apple claims, you don't need anything particularly "tightly integrated", like you do with an NPU on Windows. You just need to grab an MLX version of the model. It's Apple's own framework, sure, but it's widely available and can be used in many popular tools, like LM Studio.

So yes, there is both a way and a need to test local LLMs. I even know more or less what to expect; I just think it's a realistic use of the new computer and hence a nice addition to the already extensive testing conducted here.

Posted by No need to really testLLM

- March 13, 2026, 17:34:30

@Alex
For 3rd party LLM prompt processing (pp) speed, only the iGPU gaming performance matters.
-> So, if the M5's iGPU perf increased by "up to 30%"² vs M4, then the 3rd party AI/LLM prompt processing speed will also increase by roughly the same amount.

For LLM token generation (tg), only the memory bandwidth matters, assuming one has enough unified memory to fit an LLM in the first place (so the memory size matters too).
-> So, if the memory bandwidth increased by 12.5% (= 307.2 GB/s (M5 Pro) / 273 GB/s (M4 Pro))¹, then the AI/LLM token generation speed will also increase by roughly the same amount.

¹ en.wikipedia.org/wiki/MacBook_Pro_(Apple_silicon)
² en.wikipedia.org/wiki/Apple_M5#Performance

Apple's claim of

Quote from: en.wikipedia.org/wiki/Apple_M5#Performance
Peak GPU AI compute: over 4× faster

will be relevant only to their own tightly integrated (aka 1st party) solutions, not 3rd party LLMs. But, ofc, I have nothing against testing the 3rd party LLM pp and tg speeds.
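
A rough sketch of the bandwidth arithmetic in the post above, in Python. The two bandwidth figures are the ones quoted in the post; the 16 GB model size is a made-up value for illustration, and the "one full read of the weights per token" ceiling is a simplifying assumption for dense models that ignores KV-cache traffic, compute limits, and MoE sparsity.

```python
# Bandwidth figures quoted in the post above (GB/s).
M4_PRO_BANDWIDTH_GBPS = 273.0
M5_PRO_BANDWIDTH_GBPS = 307.2


def relative_gain(new: float, old: float) -> float:
    """Return the fractional increase of `new` over `old`."""
    return new / old - 1.0


def tg_upper_bound(bandwidth_gbps: float, model_size_gb: float) -> float:
    """Bandwidth-bound ceiling on tokens/s for a dense model.

    Assumption: every weight is read from memory once per generated token.
    """
    return bandwidth_gbps / model_size_gb


gain = relative_gain(M5_PRO_BANDWIDTH_GBPS, M4_PRO_BANDWIDTH_GBPS)
print(f"Bandwidth gain: {gain:.1%}")

# Hypothetical dense model size, chosen only for illustration.
MODEL_SIZE_GB = 16.0
for name, bandwidth in [("M4 Pro", M4_PRO_BANDWIDTH_GBPS),
                        ("M5 Pro", M5_PRO_BANDWIDTH_GBPS)]:
    ceiling = tg_upper_bound(bandwidth, MODEL_SIZE_GB)
    print(f"{name}: at most {ceiling:.1f} tokens/s")
```

Because the ceiling is linear in bandwidth, the ratio between the two chips stays the same whatever model size is plugged in; only the absolute numbers change.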

Posted by Alex

- March 13, 2026, 16:16:01

Thanks for the quick testing. I'm particularly grateful you included the M5 Pro, as that's the version I'm personally interested in.

Would be great to see something like LM Studio testing, as this was one of the major improvements with these new chips and M5 was indeed noticeably better at it than the previous generation.

Note that you want to report both the prefill speed and the generation speed in tokens/second. For the best comparison, you can directly use any GGUF model for Nvidia/AMD and MLX models for Apple, as the latter include some additional optimizations for Apple's GPUs. Nvidia's CUDA is quite well optimized with pretty much any GGUF version.
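
A minimal sketch of how the two numbers could be derived from raw timings, in Python. The `RunTiming` fields and the helper names are hypothetical, not part of LM Studio or MLX, and the values in the example run are placeholders rather than measurements.

```python
from dataclasses import dataclass


@dataclass
class RunTiming:
    """Raw timings for one benchmark run (hypothetical structure)."""
    prompt_tokens: int       # tokens in the prompt
    generated_tokens: int    # tokens produced after the prompt
    prefill_seconds: float   # time spent processing the prompt
    decode_seconds: float    # time spent generating the output tokens


def prefill_speed(run: RunTiming) -> float:
    """Prompt processing (pp) speed in tokens/second."""
    return run.prompt_tokens / run.prefill_seconds


def decode_speed(run: RunTiming) -> float:
    """Token generation (tg) speed in tokens/second."""
    return run.generated_tokens / run.decode_seconds


# Placeholder values, not measurements from any real machine.
example = RunTiming(prompt_tokens=4096, generated_tokens=256,
                    prefill_seconds=8.0, decode_seconds=12.8)
print(f"prefill: {prefill_speed(example):.0f} tokens/s")
print(f"decode:  {decode_speed(example):.0f} tokens/s")
```

Keeping the two speeds separate matters because, as discussed in this thread, they are limited by different things: prefill mostly by GPU compute, generation mostly by memory bandwidth.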

Posted by dada_dave

- March 12, 2026, 04:41:39

Quote from: Large L2 Cache too on March 10, 2026, 16:44:37
Quote from: davidm on March 10, 2026, 16:27:43
the Mac Pro and Max have 384 and 512bit interfaces

Do you have a source for this? Apple generally doesn't disclose such specifications.

I'm slightly confused myself, because multiple reviews and sites are stating that the M5 Pro and M5 Max are exactly the same chip this time?

So, if true, are they artificially segmenting a 512-bit interface through software to only use 384 bits, or are they hardware-binning bad yields (I thought 3nm was a fairly mature process with very few defective chips?) to get the differing effective bandwidths?

No, the Max and Pro chips have the same CPU die but different GPU dies and different memory bandwidths. Apple has in fact disclosed bandwidth information (and, in the past, the number of memory controllers), and people can work out what the number of memory controllers must be from the bandwidth and RAM type. They also follow a pretty regular pattern, and the M5 is similar to the M4 generation.
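
A small sketch of that inference, in Python. It assumes peak bandwidth = (bus width in bytes) × (transfers per second), and it assumes the 307.2 GB/s M5 Pro figure and the LPDDR5x-9600 data rate mentioned elsewhere in this thread apply to the same chip. The function names are made up for illustration.

```python
def bus_width_bits(bandwidth_gbps: float, data_rate_mtps: float) -> float:
    """Infer the memory interface width in bits.

    bandwidth_gbps: peak bandwidth in GB/s (1 GB = 1e9 bytes)
    data_rate_mtps: memory data rate in MT/s (e.g. 9600 for LPDDR5x-9600)
    """
    bytes_per_transfer = bandwidth_gbps * 1000 / data_rate_mtps
    return bytes_per_transfer * 8


def peak_bandwidth_gbps(width_bits: float, data_rate_mtps: float) -> float:
    """Peak bandwidth in GB/s for a given interface width and data rate."""
    return width_bits / 8 * data_rate_mtps / 1000


# Inverse direction: width implied by a published bandwidth figure.
width = bus_width_bits(307.2, 9600)
print(f"Implied interface width: {round(width)} bits")

# Forward direction: bandwidth for the widths mentioned in this thread,
# all at the same assumed 9600 MT/s data rate.
for bits in (128, 256, 384, 512):
    print(f"{bits}-bit: {peak_bandwidth_gbps(bits, 9600):.1f} GB/s")
```

The result only holds if both inputs describe the same configuration; a different data rate would imply a different width for the same bandwidth.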

Posted by Large L2 Cache too

- March 10, 2026, 16:44:37

Quote from: davidm on March 10, 2026, 16:27:43
the Mac Pro and Max have 384 and 512bit interfaces

Do you have a source for this? Apple generally doesn't disclose such specifications.

I'm slightly confused myself, because multiple reviews and sites are stating that the M5 Pro and M5 Max are exactly the same chip this time?

So, if true, are they artificially segmenting a 512-bit interface through software to only use 384 bits, or are they hardware-binning bad yields (I thought 3nm was a fairly mature process with very few defective chips?) to get the differing effective bandwidths?

Posted by davidm

- March 10, 2026, 16:27:43

It seems this site still doesn't register that it's not just about the memory type (LPDDR5x-9600); it's also, and hugely, about the interface width. Strix Halo has a 256bit interface, the Mac Pro and Max have 384 and 512bit interfaces. Meanwhile, typical PCs have a 128bit interface. That's where a lot of the performance comes from, and it's an area where PCs mostly don't compete unless you go to a huge, power-hungry server board. It's been this way for a while. Macs are a lot more expensive, but I believe Strix Halo is selling well even at a premium. I'd happily pay double for a ThinkPad with a 384bit+ RAM interface.

Posted by pimpom

- March 10, 2026, 13:06:29

Blender v3.3 does not support MetalRT you idiot.

Posted by Doesn't line up

- March 10, 2026, 12:44:29

41% power efficiency improvement over the M4 Max 40-Core GPU is very good, but in Cyberpunk 2077 / Ultra Preset (FSR off) it's only 8%? Doesn't line up.

Posted by Detechtive

- March 10, 2026, 09:59:19

Quote from: joneskind on March 10, 2026, 04:11:19
Quote from: dada_dave on March 10, 2026, 01:37:08
Just FYI Blender 3.3 does not use hw-accelerated ray tracing on Apple Silicon. That didn't come in until Blender 4.2 I think. Though it doesn't explain why the M5 Max did worse than the M4 Max here, it might explain why CB 2024 GPU shows the more expected results.

Even worse. Blender has been using HW-accelerated ray tracing on Nvidia cards since 3.3.

M5 Max 40 should be as powerful as the RTX 5090 laptop in Blender, if not even more, in Blender 4.5.

See opendata.blender

I did check. Assuming that Apple's GPU scales linearly from M5 to M5 Max, Apple's best GPU is still at least 1000 points behind the RTX 5090 laptop. And remember, the Nvidia chip is already more than a year old at this point. Apple-Metal, Nvidia-OptiX.

Posted by PumpkinFury

- March 10, 2026, 08:18:53

My Strix Scar 18 5090 Laptop (175w) hit 6,752 in Steel Nomad.

Posted by joneskind

- March 10, 2026, 04:11:19

Quote from: dada_dave on March 10, 2026, 01:37:08
Just FYI Blender 3.3 does not use hw-accelerated ray tracing on Apple Silicon. That didn't come in until Blender 4.2 I think. Though it doesn't explain why the M5 Max did worse than the M4 Max here, it might explain why CB 2024 GPU shows the more expected results.

Even worse. Blender has been using HW-accelerated ray tracing on Nvidia cards since 3.3.

M5 Max 40 should be as powerful as the RTX 5090 laptop in Blender, if not even more, in Blender 4.5.

See opendata.blender

Posted by dada_dave

- March 10, 2026, 01:37:08

Just FYI Blender 3.3 does not use hw-accelerated ray tracing on Apple Silicon. That didn't come in until Blender 4.2 I think. Though it doesn't explain why the M5 Max did worse than the M4 Max here, it might explain why CB 2024 GPU shows the more expected results.
