Quote from: AI+ on May 08, 2026, 09:42:00
(The number of CPU threads doesn't matter for running AI (aka inferencing) (4 threads pretty much tops-out a dual-channel PC))

Correction: >4 threads improve MTP LLM speed:
./llama.cpp/build/bin/llama-server --models-preset 'LLMs.ini' --threads 8 --models-max 1 --no-models-autoload -np 1 --no-warmup --chat-template-kwargs '{"preserve_thinking": true}' --no-mmproj --offline --no-mmap --spec-type draft-mtp --spec-draft-n-max 2

Quote from: may not adequate for AI on May 06, 2026, 09:06:53
(The number of CPU threads doesn't matter for running AI (aka inferencing) (4 threads pretty much tops-out a dual-channel PC))

Correction: >4 threads improve MTP LLM speed:
./llama.cpp/build/bin/llama-server --models-preset 'LLMs.ini' --threads 8 --models-max 1 --no-models-autoload -np 1 --no-warmup --chat-template-kwargs '{"preserve_thinking": true}' --no-mmproj --offline --no-mmap --spec-type draft-mtp --spec-draft-n-max 2

Quote from: Same on May 13, 2026, 10:07:22
(The number of CPU threads doesn't matter for running AI (aka inferencing) (4 threads pretty much tops-out a dual-channel PC))

Correction: >4 threads improve MTP LLM speed:
./llama.cpp/build/bin/llama-server --models-preset 'LLMs.ini' --threads 8 --models-max 1 --no-models-autoload -np 1 --no-warmup --chat-template-kwargs '{"preserve_thinking": true}' --no-mmproj --offline --no-mmap --spec-type draft-mtp --spec-draft-n-max 2

Quote from: Very slow for AI on May 18, 2026, 12:04:34
(The number of CPU threads doesn't matter for running AI (aka inferencing) (4 threads pretty much tops-out a dual-channel PC))
Quote from: Woof on May 25, 2026, 15:38:16
I'm not impressed. The Mac Mini M5 is about to come out, which will be cheaper and faster.

Correction: >4 threads improve MTP LLM speed:
./llama.cpp/build/bin/llama-server --models-preset 'LLMs.ini' --threads 8 --models-max 1 --no-models-autoload -np 1 --no-warmup --chat-template-kwargs '{"preserve_thinking": true}' --no-mmproj --offline --no-mmap --spec-type draft-mtp --spec-draft-n-max 2
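
If anyone wants to check the thread-count claim on their own machine, here is a minimal Python sketch that prints the same llama-server command with only --threads changed, so the 4-thread and 8-thread runs can be compared side by side. The flags are copied as-is from the command above and are not verified against any particular llama.cpp build; the helper name build_server_cmd and the choice of 4 and 8 threads are my own assumptions for illustration.

```python
import shlex

# Binary path and flags are copied from the command posted in this thread.
# They are NOT verified against any specific llama.cpp build.
SERVER = "./llama.cpp/build/bin/llama-server"


def build_server_cmd(threads: int) -> list[str]:
    """Return the posted llama-server argv with only --threads varied.

    Hypothetical helper for illustration; it is not part of llama.cpp.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return [
        SERVER,
        "--models-preset", "LLMs.ini",
        "--threads", str(threads),
        "--models-max", "1",
        "--no-models-autoload",
        "-np", "1",
        "--no-warmup",
        "--chat-template-kwargs", '{"preserve_thinking": true}',
        "--no-mmproj",
        "--offline",
        "--no-mmap",
        "--spec-type", "draft-mtp",
        "--spec-draft-n-max", "2",
    ]


if __name__ == "__main__":
    # Print one shell-ready command per thread count to try.
    for n in (4, 8):
        print(shlex.join(build_server_cmd(n)))
```

Run each printed command in turn, send the same prompt to both, and compare the tokens-per-second figures the server reports. Keeping every other flag identical is the point, so any speed difference can be put down to the thread count alone.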