Quote: The company claims it can run 120 billion parameter models locally
Thing is:
Quote from: reddit.com/r/LocalLLaMA/comments/1u8kr2o/we_need_a_80160b_model_urgently_the_unified/
We need a 80-160B model urgently. The unified memory device market needs more Models.
(read full text there)
For anyone who doesn't know, an alternative from AMD is called Strix Halo:
AMD Strix Halo: 256 GB/s = 256-bit * 8000 MT/s / 1000 / 8.
NVIDIA Spark: 273 GB/s = 256-bit * 8533 MT/s / 1000 / 8.
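To double-check the two bandwidth figures, here is a small sketch of the same formula (bus width in bits, divided by 8 to get bytes, times the transfer rate). The function name is my own, and the 8000 MT/s value for Strix Halo is an assumption: it is the transfer rate that yields the quoted 256 GB/s on a 256-bit bus.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8           # bits -> bytes
    return bytes_per_transfer * transfer_rate_mt_s / 1000  # MB/s -> GB/s


# Assumption: 8000 MT/s for Strix Halo (matches the quoted 256 GB/s).
strix_halo = peak_bandwidth_gb_s(256, 8000)
# 8533 MT/s on a 256-bit bus, as given for the Spark.
spark = peak_bandwidth_gb_s(256, 8533)

print(f"AMD Strix Halo: {strix_halo:.0f} GB/s")
print(f"NVIDIA Spark:   {spark:.0f} GB/s")
```

These are theoretical peaks only; real-world throughput will be lower.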
So many MoE models with 200B to 300B parameters have been released recently that it would be really nice if these devices had twice the memory (= 256 GB RAM / unified memory). Then a good quant of those models (see below) could be run.
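A rough back-of-the-envelope sketch of why 256 GB would matter: the weights of a quantized model take roughly parameters * bits per weight / 8 bytes. The 4.5 bits per weight used here is only an assumed stand-in for a typical 4-bit quant, the function name is hypothetical, and KV cache, context and OS overhead are ignored, so real requirements are higher.

```python
def quant_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (decimal).

    Ignores KV cache, activations and runtime overhead.
    """
    return params_billion * bits_per_weight / 8


# Assumption: ~4.5 bits per weight as a stand-in for a 4-bit quant.
BITS = 4.5

for params in (120, 200, 300):
    print(f"{params}B at {BITS} bits/weight: ~{quant_size_gb(params, BITS):.1f} GB")
```

Under these assumptions a 120B model needs about 68 GB for weights, while 200B to 300B models land at roughly 113 GB to 169 GB, which leaves little or no headroom on half of 256 GB but fits comfortably in 256 GB.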
An alternative in the meantime could be to run a Qwen3.6-27B quant on a used GPU with 24 GB of VRAM (typically an RTX 3090; 24 GB is the absolute bare minimum). Because it's a dense model, it performs much better for its parameter count, see:
artificialanalysis.ai/?models=gpt-oss-120b%2Cgpt-oss-120b-low%2Cgemma-4-31b%2Cgemma-4-31b-non-reasoning%2Cmistral-medium-3-5%2Cdeepseek-v4-flash-non-reasoning%2Cdeepseek-v4-flash-high%2Cdeepseek-v4-flash%2Cminimax-m2-7%2Cstep-3-7-flash%2Cmimo-v2-5-0424%2Cqwen3-6-35b-a3b%2Cqwen3-5-122b-a10b%2Cqwen3-6-27b%2Cqwen3-6-27b-non-reasoning%2Cqwen3-6-35b-a3b-non-reasoning%2Cqwen3-5-122b-a10b-non-reasoning&intelligence=artificial-analysis-intelligence-index