First, I confirmed that the CPU and GPU scores of my new Air 15 M5 align with what is expected.
Here are some llama.cpp llama-bench results.
First, with battery saving mode on:
/Users/../llama-b8740/llama-bench --no-warmup -m /Users/../Qwen3.5-9B-UD-Q4_K_XL.gguf -p 128 -n 256 -t 1,2,3,4
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_library_init_from_source: error compiling source
ggml_metal_device_init: - the tensor API is not supported in this environment - disabling
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.013 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = false
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 12713.12 MB
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 1 | pp128 | 110.11 ± 4.18 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 1 | tg256 | 9.50 ± 0.30 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 2 | pp128 | 107.58 ± 6.53 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 2 | tg256 | 9.45 ± 0.08 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 3 | pp128 | 110.79 ± 1.14 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 3 | tg256 | 9.29 ± 0.09 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 4 | pp128 | 110.78 ± 1.42 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 4 | tg256 | 8.75 ± 0.95 |

And with battery saving mode off (the default):
/Users/../llama-b8740/llama-bench --no-warmup -m /Users/../Qwen3.5-9B-UD-Q4_K_XL.gguf -p 128 -n 256 -t 1,2,3,4
[..]
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 1 | pp128 | 216.04 ± 8.47 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 1 | tg256 | 20.17 ± 0.04 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 2 | pp128 | 209.43 ± 0.35 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 2 | tg256 | 20.21 ± 0.03 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 3 | pp128 | 196.55 ± 5.79 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 3 | tg256 | 18.34 ± 3.22 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 4 | pp128 | 193.50 ± 2.13 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 4 | tg256 | 19.57 ± 0.18 |
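As a quick sanity check on the effect of battery saving mode, the single-thread tg256 figures from the two runs above can be compared directly (a minimal sketch; the numbers are the reported means from the tables, and thread count barely matters here since generation runs on the GPU via Metal):

```python
# tg256 (token generation) throughput at -t 1, taken from the tables above
tg_low_power = 9.50   # t/s with battery saving mode on
tg_default = 20.17    # t/s with battery saving mode off (default)

speedup = tg_default / tg_low_power
print(f"Default mode generates tokens ~{speedup:.1f}x faster "
      f"({tg_default} vs {tg_low_power} t/s)")
# Battery saving mode roughly halves generation throughput.
```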