In battery saving mode (without battery saving, you guessed it, just like above: multiply by ~2):
/Users/../llama-b8740/llama-bench --no-warmup -m /Users/../Qwen3.5-9B-UD-Q4_K_XL.gguf -p 64 -n 64 -t 1-12
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 1 | pp64 | 107.20 ± 6.39 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 1 | tg64 | 9.75 ± 0.06 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 2 | pp64 | 109.29 ± 0.92 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 2 | tg64 | 9.62 ± 0.01 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 3 | pp64 | 109.05 ± 0.96 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 3 | tg64 | 9.47 ± 0.00 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 4 | pp64 | 108.81 ± 0.81 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 4 | tg64 | 9.23 ± 0.13 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 5 | pp64 | 108.59 ± 0.62 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 5 | tg64 | 9.20 ± 0.01 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 6 | pp64 | 108.57 ± 1.21 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 6 | tg64 | 9.12 ± 0.01 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 7 | pp64 | 108.34 ± 1.24 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 7 | tg64 | 9.02 ± 0.04 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 8 | pp64 | 108.07 ± 1.57 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 8 | tg64 | 8.97 ± 0.01 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 9 | pp64 | 107.81 ± 0.76 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 9 | tg64 | 8.76 ± 0.15 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 10 | pp64 | 107.75 ± 0.58 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 10 | tg64 | 8.82 ± 0.03 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 11 | pp64 | 107.64 ± 0.44 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 11 | tg64 | 8.71 ± 0.03 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 12 | pp64 | 107.18 ± 0.34 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 12 | tg64 | 8.67 ± 0.05 |
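This confirms that the number of threads barely matters here: pp64 stays at roughly 107-109 t/s from 1 to 12 threads, and tg64 actually drifts down a little, from 9.75 t/s at 1 thread to 8.67 t/s at 12. On my desktop 7800X3D, 4 threads give pretty much the fastest tokens per second.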
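To put a number on "barely matters", here is a minimal sketch that parses llama-bench's markdown table and reports the best and worst thread count per test. The helper names `parse_bench_table` and `spread` are my own, not part of llama.cpp, and the sample only copies the 1, 4 and 12 thread rows from the table above; paste the full output in to check every row.

```python
# Sketch: summarize how much t/s varies with thread count in llama-bench output.
# parse_bench_table and spread are hypothetical helper names, not llama.cpp APIs.

SAMPLE = """\
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 1 | pp64 | 107.20 ± 6.39 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 1 | tg64 | 9.75 ± 0.06 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 4 | pp64 | 108.81 ± 0.81 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 4 | tg64 | 9.23 ± 0.13 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 12 | pp64 | 107.18 ± 0.34 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | MTL,BLAS | 12 | tg64 | 8.67 ± 0.05 |
"""


def parse_bench_table(text):
    """Return a list of (threads, test, mean_tps) tuples from the table rows."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header, the separator row, and anything that is not
        # a 7-column data row with a numeric thread count.
        if len(cells) != 7 or not cells[4].isdigit():
            continue
        mean_tps = float(cells[6].split("±")[0])  # drop the "± stddev" part
        rows.append((int(cells[4]), cells[5], mean_tps))
    return rows


def spread(rows, test):
    """Return (best_threads, best_tps, worst_threads, worst_tps, relative_drop)."""
    subset = [(threads, tps) for threads, name, tps in rows if name == test]
    best_threads, best_tps = max(subset, key=lambda r: r[1])
    worst_threads, worst_tps = min(subset, key=lambda r: r[1])
    return best_threads, best_tps, worst_threads, worst_tps, (best_tps - worst_tps) / best_tps


if __name__ == "__main__":
    rows = parse_bench_table(SAMPLE)
    for test in ("pp64", "tg64"):
        bt, btps, wt, wtps, drop = spread(rows, test)
        print(f"{test}: best {btps:.2f} t/s at {bt} threads, "
              f"worst {wtps:.2f} t/s at {wt} threads, drop {drop:.1%}")
```

The relative drop ignores the ± column, so a difference smaller than the reported standard deviation should be read as noise rather than as a real effect of the thread count.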