/Users/../llama-b8740/llama-bench --no-warmup -m /Users/../Qwen3.5-9B-UD-Q4_K_XL.gguf -p 128 -n 256 -t 1,2,3,4
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_library_init_from_source: error compiling source
ggml_metal_device_init: - the tensor API is not supported in this environment - disabling
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.013 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = false
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 12713.12 MB
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 1 | pp128 | 110.11 ± 4.18 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 1 | tg256 | 9.50 ± 0.30 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 2 | pp128 | 107.58 ± 6.53 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 2 | tg256 | 9.45 ± 0.08 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 3 | pp128 | 110.79 ± 1.14 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 3 | tg256 | 9.29 ± 0.09 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 4 | pp128 | 110.78 ± 1.42 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 4 | tg256 | 8.75 ± 0.95 |

With the default mode (battery saving off):

/Users/../llama-b8740/llama-bench --no-warmup -m /Users/../Qwen3.5-9B-UD-Q4_K_XL.gguf -p 128 -n 256 -t 1,2,3,4
[..]
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 1 | pp128 | 216.04 ± 8.47 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 1 | tg256 | 20.17 ± 0.04 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 2 | pp128 | 209.43 ± 0.35 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 2 | tg256 | 20.21 ± 0.03 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 3 | pp128 | 196.55 ± 5.79 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 3 | tg256 | 18.34 ± 3.22 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 4 | pp128 | 193.50 ± 2.13 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 4 | tg256 | 19.57 ± 0.18 |

Quote from: en.wikipedia.org/wiki/Apple_M5#Performance
Peak GPU AI compute: over 4× faster

That claim does not apply to running third-party LLMs; only the increase in RAM/unified-memory bandwidth speeds up token generation (+28%: 153.6 GB/s (M5) / 120 GB/s (M4) = 1.28). (153.6 GB/s = 128 bit × 9600 MT/s / 8 / 1000.)
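The bandwidth arithmetic above can be double-checked with a few lines of Python (the figures are the ones quoted in the post; the helper name is mine):

```python
def bandwidth_gbs(bus_bits, mt_per_s):
    """Peak memory bandwidth: bytes per transfer x transfers/s -> GB/s (decimal)."""
    return bus_bits / 8 * mt_per_s / 1000

m5 = bandwidth_gbs(128, 9600)  # 153.6 GB/s (M5)
m4 = 120.0                     # GB/s (M4)
speedup = m5 / m4              # 1.28, i.e. ~28% faster token generation
print(m5, speedup)
```

Since token generation is memory-bandwidth bound for dense models, the ~28% bandwidth gain is the realistic upper bound on the tg speedup, regardless of the GPU's peak AI-compute figure.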
Quote from: youtube.com/watch?v=HKxIGgyeISM
Apple's Energy Model - Deconstructed
In this video, I reverse engineer Apple's Energy Model on the Mac Studio M4 Max. In the process I explain why and how measured DC power can appear up to 3 times higher than reported M4 Max GPU power.
[..]
Quote from: juri on March 13, 2026, 03:09:11
and still no matte option for the screen, so idiotic without any reason.

It's not 12W, look again. Maybe the matte coating on your screen played a prank on you ;)
Quote from: not_anton on March 11, 2026, 17:16:44
Good to know, but will those quants fit (if I had to guess I'd say yes)? There's also sysctl iogpu.wired_limit_mb=<MB> (I know not to assign too much to the VRAM (leaving 1-3 GB may be ok), as the OS may start to write/swap to the SSD).

Quote from: Will MBAir fit 27B quants on March 11, 2026, 12:23:33
Will the 24 GB RAM option fit Qwen3.5-27B-UD-Q4_K_XL.gguf (17.6 GB) or Qwen3.5-27B-UD-Q5_K_XL.gguf (20.2 GB)? (huggingface.co/unsloth/Qwen3.5-27B-GGUF) (I know there's mlx-community/Qwen3.5-27B-4bit (16.1 GB) too, but I don't know if its perplexity is good)

I have a 15" M2 Air with 24GB RAM, but it can only run 3B models max because of overheating. Work is fine, gaming is fine, but LLMs throttle it to 0.4 GHz on the GPU within a minute. Sorry, you would need something with a fan or two to make those models useful.
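The fit question above can be sketched as back-of-the-envelope arithmetic. A minimal sketch, assuming the quoted sizes and the 1-3 GB OS headroom mentioned for the raised iogpu.wired_limit_mb; the function name, headroom, and KV-cache allowance are my assumptions, not measured values:

```python
def fits_in_wired_limit(model_gib, total_ram_gib, os_headroom_gib=3.0, kv_cache_gib=1.0):
    """Hypothetical check: does model weights + a small KV cache fit under a
    wired limit of (total RAM - OS headroom) on a unified-memory Mac?"""
    wired_limit_gib = total_ram_gib - os_headroom_gib
    return model_gib + kv_cache_gib <= wired_limit_gib

# 24 GB machine, quants from the quoted question:
print(fits_in_wired_limit(17.6, 24))  # Q4_K_XL: True (18.6 <= 21)
print(fits_in_wired_limit(20.2, 24))  # Q5_K_XL: False (21.2 > 21)
```

So by this rough estimate Q4_K_XL should fit with the wired limit raised, while Q5_K_XL is borderline and risks swapping to the SSD; longer contexts need a larger KV-cache allowance and tighten both cases.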
Quote from: dumb_oems on March 09, 2026, 11:44:29
This is not emphasized enough
...
the actual user experience, especially on battery power
Quote from: JimD on March 09, 2026, 12:11:02
Fanless is not really an attraction for me.