Quote from: 48 and 64 GB RAM nice on Yesterday at 19:39:11
48 GB RAM fits and runs LLMs like GLM-4.7-Flash at a solid q8 quant (32 GB), or gpt-oss-20b (14 GB), though that one would also fit within 32 GB RAM.
Unfortunately 64 GB RAM won't fit gpt-oss-120b even at the smallest quant (62.6 GB), but 96 GB RAM would fit it easily with a lot of context, or even full context (go to huggingface.co/spaces/oobabooga/accurate-gguf-vram-calculator, paste huggingface.co/unsloth/gpt-oss-120b-GGUF/blob/main/gpt-oss-120b-F16.gguf into the calculator, and set context to full).
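As a rough sanity check on those numbers, here's a back-of-the-envelope version of what that calculator does: model weights plus KV cache. A minimal sketch only; the layer/head values for gpt-oss-120b below are placeholders rather than the real config, and the linked calculator also accounts for compute buffers, so treat this as a lower bound.

Code:
# Back-of-the-envelope RAM estimate for a GGUF model: weights + KV cache.
def estimate_ram_gb(weights_gb: float, n_layers: int, kv_heads: int,
                    head_dim: int, context: int, kv_bytes: int = 2) -> float:
    # K and V caches: 2 tensors * layers * kv_heads * head_dim * tokens * bytes
    kv_cache = 2 * n_layers * kv_heads * head_dim * context * kv_bytes
    return weights_gb + kv_cache / 1024**3

# gpt-oss-120b F16 GGUF weights: ~62.6 GB (the file linked above).
# Layer/head numbers here are illustrative placeholders, NOT the real config.
print(f"~{estimate_ram_gb(62.6, 36, 8, 64, 131072):.1f} GB at full context")

With those placeholder numbers the KV cache adds roughly 9 GB on top of the weights, which is why 96 GB fits comfortably while 64 GB never will.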
Well, you'll run the LLM on the NPU, not the iGPU lmao. I bought a laptop with a Ryzen AI 350 and normal SO-DIMMs, upgraded to 64 GB, and I can now use the NPU with 32 GB. It's faster than the iGPU, but you have to use models optimized for it. Right now I'm playing with converting Qwen3-Coder to ONNX at int8 or int4 (the NPU has int8 HW optimization).
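For reference, a minimal sketch of that conversion path using Hugging Face Optimum plus the stock onnxruntime dynamic quantizer. Assumptions: that Optimum's exporter handles the Qwen3 architecture, and the model ID and paths are placeholders. Note that actually targeting the Ryzen AI NPU normally goes through AMD's own Ryzen AI toolchain (Vitis AI quantizer + execution provider), not this generic route:

Code:
# Sketch: export a causal LM to ONNX, then quantize weights to int8.
# Placeholder model ID and output paths; Qwen3 export support assumed.
from optimum.onnxruntime import ORTModelForCausalLM
from onnxruntime.quantization import QuantType, quantize_dynamic

model_id = "Qwen/Qwen3-Coder-30B-A3B-Instruct"  # placeholder

# Export the fp32 ONNX graph to ./qwen3-onnx
ORTModelForCausalLM.from_pretrained(model_id, export=True).save_pretrained("qwen3-onnx")

# Post-training dynamic quantization: weights stored as int8,
# activations quantized on the fly at inference time.
quantize_dynamic(
    model_input="qwen3-onnx/model.onnx",
    model_output="qwen3-onnx/model_int8.onnx",
    weight_type=QuantType.QInt8,
)

To actually run on the NPU you'd then load the quantized graph through onnxruntime's Vitis AI execution provider from AMD's Ryzen AI build, rather than the default CPU provider.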