I missed this fairly important news (or it would be important, if memory prices weren't so high).
Quote from: Onymous on May 21, 2026, 11:28:26
The same could be done on strix halo too.
Correct. Gorgon Halo is part of AMD's rename/refresh/rerelease cycle, just as the whole Ryzen AI 400 series is a rename of Ryzen AI 300. Gorgon Halo is a renamed Strix Halo with a firmware/BIOS and/or minor hardware change that allows for more than 128 GB of RAM/unified memory.
While 192 GB of RAM/VRAM sounds nice, only 160 GB of it is actually allocatable. Under Linux, just as with Strix Halo, I assume something like 180 GB will be allocatable.
The biggest issue, as with Strix Halo, will be the memory bandwidth of 256 GB/s and the resulting limit on prompt processing (pp) and token generation (tg) speeds:
Strix Halo: 256 GB/s = 256-bit * 8000 MT/s / 1000 / 8.
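The same arithmetic as a small Python sketch, in case anyone wants to plug in other bus widths or memory speeds. The function name is made up for illustration, and the 128-bit line is just a hypothetical comparison at the same 8000 MT/s, not a spec of any particular APU.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal units)."""
    bytes_per_transfer = bus_width_bits / 8                   # bits -> bytes
    mb_per_second = bytes_per_transfer * transfer_rate_mt_s   # bytes * MT/s = MB/s
    return mb_per_second / 1000                               # MB/s -> GB/s


# Strix Halo figures from the post: 256-bit bus at 8000 MT/s.
print(peak_bandwidth_gb_s(256, 8000))

# Hypothetical 128-bit configuration at the same speed, for comparison only.
print(peak_bandwidth_gb_s(128, 8000))
```

This is the theoretical peak; real-world throughput is lower.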
256 GB/s allows for roughly RTX 4060 Laptop-level performance:
3dmark.com/search - Steel Nomad:
- Strix Halo's Radeon 8060S iGPU (256 GB/s): "Average score: 2031".
- RTX 4060 (notebook) (256 GB/s): "Average score: 2261".
(The 4060 Laptop's 256 GB/s figure is from en.wikipedia.org/wiki/GeForce_RTX_40_series#Mobile)
See also [1].
Supporting up to 192 GB RAM/VRAM/unified memory is still a very important step.
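To put the memory sizes in perspective, here is a rough weights-only estimate in Python. It is only a sketch: the quantization levels are my assumption, the context (KV cache) is left as an optional parameter because it depends on the model architecture, and the 160 GB budget is the allocatable figure mentioned above. The helper name `fits` is hypothetical.

```python
ALLOCATABLE_GB = 160  # allocatable figure quoted in the post for the 192 GB configuration


def fits(params_billion: float, bits_per_weight: float,
         context_overhead_gb: float = 0.0,
         budget_gb: float = ALLOCATABLE_GB) -> tuple[float, bool]:
    """Return (required GB, whether it fits). Weights only unless overhead is given."""
    required_gb = params_billion * bits_per_weight / 8 + context_overhead_gb
    return required_gb, required_gb <= budget_gb


# A 122B-parameter model (e.g. Qwen3.5-122B-A10B) at assumed quantization levels.
for bits in (4, 8, 16):
    required_gb, ok = fits(122, bits)
    print(f"122B @ {bits}-bit: {required_gb:.0f} GB weights, fits in {ALLOCATABLE_GB} GB: {ok}")
```

At 8 bits the weights alone come to about 122 GB, which leaves essentially no room for context on a 128 GB machine but fits within 160 GB.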
Generally:
Running a dense LLM such as Qwen3.6-27B on Strix/Gorgon Halo is not optimal: at only 256 GB/s it will run much slower than the same dense model on a discrete GPU with 24 to 32 GB of VRAM, such as an RTX 3090, 4090 or 5090 (provided the model fits into the VRAM).
Strix/Gorgon Halo is much better suited to MoE LLMs such as Qwen3.5-122B-A10B, DeepSeek-V4-Flash, Step3.7 Flash, MiMo-V2.5 and MiniMax-M2.7, where only a fraction of the weights is read per generated token.
See e.g. this comparison of the mentioned LLMs: artificialanalysis.ai/?models=gpt-oss-120b%2Cgpt-oss-120b-low%2Cgemma-4-31b%2Cgemma-4-31b-non-reasoning%2Cmistral-medium-3-5%2Cdeepseek-v4-flash-non-reasoning%2Cdeepseek-v4-flash-high%2Cdeepseek-v4-flash%2Cminimax-m2-7%2Cstep-3-7-flash%2Cmimo-v2-5-0424%2Cqwen3-6-35b-a3b%2Cqwen3-5-122b-a10b%2Cqwen3-6-27b%2Cqwen3-6-27b-non-reasoning%2Cqwen3-6-35b-a3b-non-reasoning%2Cqwen3-5-122b-a10b-non-reasoning&intelligence=artificial-analysis-intelligence-index
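The dense vs. MoE difference described above comes down to how many bytes have to be read from memory per generated token. A back-of-the-envelope sketch in Python, under assumptions that are mine and not measurements: 4-bit weights, every active weight read exactly once per token, and no KV cache or other overhead. The result is an upper bound and real tg speeds will be lower. The function names are hypothetical.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Size of the weights that must be read, in GB."""
    return params_billion * bits_per_weight / 8


def tg_upper_bound(bandwidth_gb_s: float, active_params_billion: float,
                   bits_per_weight: float = 4.0) -> float:
    """Upper bound on tokens/s if each token reads all active weights once."""
    return bandwidth_gb_s / weights_gb(active_params_billion, bits_per_weight)


HALO_BANDWIDTH = 256.0  # GB/s, from the post

dense_27b = tg_upper_bound(HALO_BANDWIDTH, 27)  # dense: all 27B parameters are active
moe_a10b = tg_upper_bound(HALO_BANDWIDTH, 10)   # MoE with 10B active parameters (the "A10B")

print(f"dense 27B:    <= {dense_27b:.1f} tok/s")
print(f"MoE 10B act.: <= {moe_a10b:.1f} tok/s")
```

Under these assumptions the 122B MoE model has a higher ceiling than the 27B dense one on the same hardware, because only its 10B active parameters are read per token.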
Interesting how the low-end 128-bit APUs support more RAM than the much more expensive 256-bit Strix/Gorgon Halo ones:
Quote from: amd.com/en/products/processors/laptop/ryzen/7000-series/amd-ryzen-7-7840u.html
Max. Memory 256 GB
Quote from: amd.com/en/products/processors/laptop/ryzen/ai-300-series/amd-ryzen-ai-7-350.html
Max. Memory 256 GB
Quote from: amd.com/en/products/processors/laptop/ryzen/ai-400-series/amd-ryzen-ai-7-450.html
Max. Memory 256 GB
AMD, maybe stop artificially limiting your 256-bit APUs? This doesn't exactly make you look like the good guy.
[1]:
AI for the end consumer, both training and inference, requires these 3 things:
1. Memory size: To fit a decently capable LLM + its context.
2. Memory speed, also known as memory bandwidth: Relevant for token generation (output) speed.
3. GPU 3D/FPS performance / compute: Relevant for prompt processing (input) speed. GPU performance is in turn limited by the memory bandwidth (see the Steel Nomad scores posted above).
Prompt processing: The larger the input, the faster the GPU you need if you don't want to wait forever, especially for agentic workflows, where the input is naturally large.
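How long "forever" is can be estimated by dividing the prompt length by the pp speed. A tiny Python sketch; the prompt size and both pp rates are made-up placeholders for illustration, not benchmarks of any specific hardware.

```python
def prompt_wait_seconds(prompt_tokens: int, pp_tokens_per_s: float) -> float:
    """Time until the first output token, ignoring everything except prompt processing."""
    return prompt_tokens / pp_tokens_per_s


PROMPT_TOKENS = 32_000  # hypothetical agentic-workflow input size

# Placeholder pp rates in tokens/s, chosen only to show the scaling.
for label, rate in (("slower GPU", 250.0), ("faster GPU", 2500.0)):
    wait = prompt_wait_seconds(PROMPT_TOKENS, rate)
    print(f"{label}: {wait:.1f} s before the first token")
```

The wait scales linearly with input size, so it recurs on every step of an agentic loop that resends a large context.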
(The number of CPU threads doesn't matter for running AI, i.e. inference: 3-4 threads pretty much max out a dual-channel PC.)