Quote from: yes on May 26, 2026, 09:22:29
"The number of CPU threads doesn't matter for running AI (aka inferencing) (4 threads pretty much tops-out a dual-channel PC)"
Correction: more than 4 threads can improve the speed of MTP (multi-token prediction) LLMs:
Using Qwen3.6-27B-UD-Q4_K_XL.gguf (download: huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF):
./llama.cpp/build/bin/llama-server --models-preset 'LLMs.ini' --threads 8 --models-max 1 --no-models-autoload -np 1 --no-warmup --chat-template-kwargs '{"preserve_thinking": true}' --no-mmproj --offline --no-mmap --spec-type draft-mtp --spec-draft-n-max 2
- 4 threads: ~8.7 tokens per second
- 6 threads: ~10.8 tokens per second
- 8 threads: ~12 tokens per second
- beyond 8 threads: no further speed improvement in my case
Source: reddit/"PSA: Test your "threads" argument in llama.cpp (+80% performance in my case)"
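To put the numbers above in relative terms, here is a small sketch that computes the gain over the 4-thread run. The function name relative_gains is just something I made up for illustration, and the inputs are the approximate tokens-per-second figures from my list, so treat the percentages as rough.

```python
def relative_gains(tokens_per_second, baseline_threads=4):
    """Return the percent speed gain of each thread count over the baseline run.

    tokens_per_second maps thread count -> measured tokens per second.
    relative_gains is a hypothetical helper, not part of llama.cpp.
    """
    baseline = tokens_per_second[baseline_threads]
    return {
        threads: round((speed / baseline - 1.0) * 100.0, 1)
        for threads, speed in tokens_per_second.items()
    }


# Approximate measurements from the list above (threads -> tokens per second).
measured = {4: 8.7, 6: 10.8, 8: 12.0}

gains = relative_gains(measured)
for threads, gain in gains.items():
    print(f"{threads} threads: +{gain}% vs {min(measured)} threads")
```

On my numbers that is roughly +24% at 6 threads and +38% at 8 threads compared with 4 threads. Plug in your own measurements, since the sweet spot depends on the CPU and memory setup.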