32 GB RAM - 8 GB for the OS itself = 24 GB. The SOTA MoE LLM for its size, huggingface.co/unsloth/Qwen3.6-35B-A3B-MTP-GGUF in the UD-Q4_K_XL quant (good for 80-120k unquantized context), is 22.9 GB. Context needs an additional 2 GB for 32k and 8 GB for 128k (it scales linearly). That leaves about 1.1 GB, so the quant barely fits and there isn't even enough room left for a 32k context.
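To make the budget concrete, here is a rough Python sketch of that arithmetic. All figures come from the numbers above; the linear per-token cost is the post's own estimate, not a measurement, and the function names are made up for illustration.

```python
# Figures from the post; the linear context model is an estimate, not a measurement.
TOTAL_RAM_GB = 32.0
OS_RESERVE_GB = 8.0
MODEL_GB = 22.9                 # Qwen3.6-35B-A3B-MTP, UD-Q4_K_XL quant
GB_PER_1K_TOKENS = 2.0 / 32     # 2 GB per 32k tokens, assumed to scale linearly


def context_gb(tokens_k: float) -> float:
    """Estimated memory (GB) for a context of `tokens_k` thousand tokens."""
    return tokens_k * GB_PER_1K_TOKENS


def headroom_gb(model_gb: float) -> float:
    """RAM left for context after the OS reserve and the model weights."""
    return TOTAL_RAM_GB - OS_RESERVE_GB - model_gb


def max_context_k(model_gb: float) -> float:
    """Largest context (in thousands of tokens) that fits in the headroom."""
    return max(headroom_gb(model_gb), 0.0) / GB_PER_1K_TOKENS


if __name__ == "__main__":
    print(f"headroom: {headroom_gb(MODEL_GB):.1f} GB")
    print(f"32k needs {context_gb(32):.1f} GB, 128k needs {context_gb(128):.1f} GB")
    print(f"largest context that fits: ~{max_context_k(MODEL_GB):.1f}k tokens")
```

Under these assumptions the headroom works out to roughly 1.1 GB, or somewhere under 18k tokens of context, which is well short of 32k.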
Quote from: reddit.com/r/LocalLLaMA/comments/1sq94qx/is_anyone_getting_real_coding_work_done_with.. "I've come to the conclusion that (1) 32768 is the biggest context I can get away with in an adequately smart model, and (2) it just ain't enough."
The dense huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF UD-Q4_K_XL quant (17.9 GB) would run much slower (being dense, all 27B parameters are active per token vs 3B active for the 35B model), but at least it fits, with about 6 GB left over for context.
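A small Python sketch of that trade-off, using only the sizes and parameter counts quoted here. The `Quant` class and `active_ratio` helper are hypothetical names, and the active-parameter ratio is only a rough proxy for per-token cost; it is not a benchmark of actual speed.

```python
from dataclasses import dataclass

AVAILABLE_GB = 32.0 - 8.0  # RAM minus the OS reserve, as in the post


@dataclass(frozen=True)
class Quant:
    name: str
    file_gb: float          # size of the quantized weights
    active_params_b: float  # billions of parameters used per token

    def headroom_gb(self) -> float:
        """RAM left over for context once the weights are loaded."""
        return AVAILABLE_GB - self.file_gb


def active_ratio(a: Quant, b: Quant) -> float:
    """How many times more parameters `a` touches per token than `b`."""
    return a.active_params_b / b.active_params_b


moe = Quant("Qwen3.6-35B-A3B-MTP UD-Q4_K_XL", 22.9, 3.0)
dense = Quant("Qwen3.6-27B-MTP UD-Q4_K_XL", 17.9, 27.0)

if __name__ == "__main__":
    for q in (moe, dense):
        print(f"{q.name}: {q.headroom_gb():.1f} GB left for context")
    print(f"dense touches {active_ratio(dense, moe):.0f}x the parameters per token")
```

So the dense model buys about 5 GB of extra headroom at the cost of touching 9x as many parameters per token.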
Had this laptop an additional dGPU with 8 GB of VRAM (but then it would be a gaming laptop, so that doesn't really make sense) or 48 GB of RAM, then yes, it would come much closer to being perfect.