Quote: a desktop PC is not going to be able to replace Strix Halo's 128 GB and 256 GB/s RAM and AMD advertises this
Let me correct myself + general update: it won't replace Strix Halo's 256 GB/s, but one can get 128 GB of 5600 MT/s RAM for 340 bucks (taxes included), or 256 GB RAM for that matter.
Quote: For $1400 or $1600 I could have a full size gaming desktop. Why would I pay that for something that limits my options?
Correct. If you don't care about the RAM bandwidth (Strix Halo is only about 2.5 times faster, see below) or the size, a desktop PC with a dedicated GPU adds much faster VRAM on top, with all the advantages that come with that (faster prompt processing (pp) and token generation (tg)), plus the option of up to 256 GB RAM (on mainstream AM5 B850, which is basically a rename of B650 plus USB4, and thus maybe on B650 as well). See below for more.
Quote (memory bandwidth results from the Framework Desktop Strix Halo review table):
- BOSGAME M5 (AMD Ryzen AI Max+ 395, Radeon 8060S): 123483 MB/s
- Geekom A9 Max (AMD Ryzen AI 9 HX 370, Radeon 890M): 86541 MB/s
HX 370: 128-bit * 5600 MT/s / 1000 / 8 = 89.6 GB/s: does check out with the measurement.
AI Max+ 395: 256-bit * 8000 MT/s / 1000 / 8 = 256 GB/s: does NOT check out with the measurement (123483 MB/s is less than half of that).
Can you measure with the newest AIDA64 version 8.0?
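Those theoretical numbers are just bus width times transfer rate; here is a minimal Python sanity check (values copied from the post above, nothing measured by me):

# Theoretical memory bandwidth in GB/s: bus width (bits) * transfer rate (MT/s) / 8 bits per byte / 1000
def theoretical_gb_s(bus_width_bits, mt_s):
    return bus_width_bits * mt_s / 8 / 1000

print(theoretical_gb_s(128, 5600))  # HX 370: 89.6 GB/s, measured ~86.5 GB/s -> plausible
print(theoretical_gb_s(256, 8000))  # AI Max+ 395: 256.0 GB/s, measured ~123.5 GB/s -> way off
print(theoretical_gb_s(128, 6200))  # desktop dual-channel DDR5-6200: 99.2 GB/s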
LLMsThe "Strix Halo" APU is a 256-bit chip with a theoretical memory bandwidth of 256 GB/s (256-bit * 8000 MT/s / 1000 / 8) (and ~210 GB/s practically (expected)), comparable to an entry level quad-channel (4 * 64-bit) workstation' memory bandwidth. A normal desktop PC is dual-channel at best. AMD specifically advertises "Strix Halo" for running/inferencing LLMs. You can run the same LLMs on any PC, if you have at least the same amount of RAM (well, running off of a SSD will also work, but the speed will be super slow), ATX sized or not, dual-channel RAM or not, the differences are:
- The size: this mini-PC is about 2.5 liters (20 * 21.5 * 5.7 cm).
- The RAM speed at which any LLM will run: Strix Halo is a quad-channel chip at 8000 MT/s vs. a normal PC, which is dual-channel at 5600 MT/s to 6200 MT/s (2 * 64-bit * 6200 MT/s / 1000 / 8 = 99.2 GB/s). A (mini-)PC based on the "Strix Halo" APU will therefore run an LLM about 2.5 times faster: 256 GB/s / 99.2 GB/s = ~2.58 (see the rough estimate after this list).
- The RAM upgradability: the LPDDR5X RAM in "Strix Halo"-based PCs is not upgradable, maybe because it runs at 8000 MT/s vs. the 5600 MT/s to 6200 MT/s typically seen in DDR5 UDIMMs. A DDR5 UDIMM version with upgradable RAM may appear later, but it's not going to run at 8000 MT/s like the soldered RAM does. CUDIMMs may reach 8000 MT/s.
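Rough rule of thumb behind that speedup claim: token generation is memory-bandwidth-bound, so tokens/s is roughly the usable bandwidth divided by the bytes read per token (about the size of the active weights). A minimal Python sketch with illustrative numbers only (the 12 GB model size is an assumption, e.g. a ~20B model at ~4-5 BPW; real-world speeds will be lower):

# Very rough token-generation estimate: tokens/s ~ memory bandwidth / bytes read per token.
# For dense models that's roughly the whole file; for MoE models only the active experts count.
def est_tg_tok_s(bandwidth_gb_s, active_weights_gb):
    return bandwidth_gb_s / active_weights_gb

model_gb = 12.0                       # assumed model size, see note above
print(est_tg_tok_s(99.2, model_gb))   # dual-channel DDR5-6200 desktop: ~8 tok/s
print(est_tg_tok_s(256.0, model_gb))  # Strix Halo (theoretical): ~21 tok/s, i.e. the ~2.58x
print(est_tg_tok_s(448.0, model_gb))  # RTX 5060 VRAM: ~37 tok/s (if the model fits in VRAM)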
Questions to ask yourself:
- Is the LLM speed difference of 2.5 times (+150 %) worth the price, vs. simply getting 2x48 GB or 2x64 GB RAM sticks for a fraction of the price and then having more (although, yes, slower) RAM? Vs. paying 2400 bucks and being stuck with the hardware and no upgrade path (on a desktop you could later upgrade to 4x64 GB)?
- And, if the size matters, you can still get a mini-ITX case, AM5 mini-ITX motherboard and build a PC of the same size (or get a pre-built mini-ITX PC), with the possibility to:
- Upgrade the RAM.
- Add a dedicated GPU. For 4.0 - 4.5 liter mini-ITX builds: RTX 4060 LP or RTX 5060 LP, and both are still better and faster (and harder, stronger, hehe) than the built-in iGPU in Strix Halo. And if you are OK with a 5.5 liter case, you can even fit a normal/full-sized 4060 Ti 16 GB / 5060 Ti 16 GB or 5070 (rumor: an 18 GB VRAM option in 2026's refresh using 3 GB GDDR7 chips instead of the current 2 GB ones - a 50 % increase in VRAM density). Or a 5070 Ti Super with 24 GB VRAM.
- Upgrade the GPU later, e.g. when/if the 2026 refresh GPUs come out using 3 GB instead of 2 GB GDDR7 chips, giving 50 % more VRAM in the same size.
- A dedicated GPU (4060 / 5060) will also have faster prompt processing (pp).
- The ability to partially or fully offload to the fast VRAM of the GPU (5060: 448 GB/s).
- A dedicated GPU's VRAM adds extra capacity on top of the system RAM.
- And, not LLM-related, but: you can also game at higher FPS if you add a GPU that is faster than Strix Halo's iGPU, which sits between an RTX 4060 Laptop (= a desktop RTX 4050, which doesn't even exist, that's how weak it would be: en.wikipedia.org/wiki/GeForce_RTX_40_series) and an RTX 4070 Laptop (= desktop RTX 4060).
- Having 24 PCIe lanes vs. Strix Halo's 16 PCIe 4.0 lanes. Just know that some non-standard desktop CPUs have only 16 PCIe lanes instead of the full 24.
- Repairability.
- And, if looks matter, there are many arguably better looking mini-ITX cases, too.
OK, but why/who would need 64 GB or 128 GB RAM all of a sudden? The latest hype is AI/running LLMs locally, and it's private. Download e.g. LM Studio (a wrapper for llama.cpp) and try it out, or download llama.cpp directly (its web UI was recently redesigned); the LLMs themselves can be downloaded from huggingface.co. Open-weight LLMs are getting better and better, see "Artificial Analysis Intelligence Index by Open Weights vs Proprietary": artificialanalysis.ai/?intelligence-tab=openWeights.
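If you go the llama.cpp route, the GGUF files can also be fetched programmatically; a minimal Python sketch using the huggingface_hub package (the repo is from the list below, the exact filename is an assumption, check the repo's file list):

from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Download one quantized GGUF file from a repo (filename is assumed, verify it on huggingface.co)
path = hf_hub_download(
    repo_id="unsloth/gpt-oss-20b-GGUF",
    filename="gpt-oss-20b-Q4_K_M.gguf",
)
print(path)  # local cache path; point llama.cpp or LM Studio at this file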
Hardware requirements to run LLMs: basically, the file size of the LLM must fit into your GPU's VRAM and/or system RAM, minus whatever the OS needs. If you can't fit a model into VRAM+RAM (or don't want/need the full weights, e.g. for speed reasons), you have to use a quantized model. The bigger the LLM (more (active) B parameters), the more heavily it can be quantized and still perform well. For smaller LLMs, a Q4_K_M (4.83 BPW (bits per weight)) quant can still be good, because an LLM's performance typically stays close to the full 16 BPW quality down to roughly 5.3 to 3.9 BPW. For big LLMs, even dynamic 1-bit and 2-bit and especially 3-bit quants can perform (very) nicely: docs.unsloth.ai/new/unsloth-dynamic-ggufs-on-aider-polyglot.
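The file-size math in rough numbers, as a minimal Python sketch (parameter counts are taken from the model list below; real GGUF files come out somewhat larger or smaller because not all tensors are quantized to the same BPW):

# Approximate model file size in GB: parameters (billions) * bits per weight / 8 bits per byte
def approx_size_gb(params_b, bpw):
    return params_b * bpw / 8

print(approx_size_gb(20, 4.83))   # gpt-oss-20b at ~Q4_K_M:   ~12 GB -> fits a 16 GB GPU
print(approx_size_gb(120, 4.83))  # gpt-oss-120b at ~Q4_K_M:  ~72 GB -> needs a lot of RAM
print(approx_size_gb(357, 2.0))   # GLM-4.5 at a 2-bit quant: ~89 GB -> 128 GB RAM territory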
Llama.cpp's GGUF format allows you to run an LLM not only on the GPU; if you don't have enough VRAM, you can also partially or fully offload to system RAM.
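For example, with the llama-cpp-python binding (a Python wrapper around llama.cpp), a partial offload is a single parameter; llama.cpp's own CLI has the equivalent -ngl / --n-gpu-layers flag. The file path and layer count below are placeholders:

from llama_cpp import Llama  # pip install llama-cpp-python (built with GPU support)

llm = Llama(
    model_path="models/gpt-oss-20b-Q4_K_M.gguf",  # placeholder path to a downloaded GGUF
    n_gpu_layers=20,   # partial offload: put 20 layers into VRAM, the rest stays in system RAM (-1 = all)
    n_ctx=8192,        # context window size
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain memory bandwidth in one paragraph."}]
)
print(out["choices"][0]["message"]["content"])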
Current SOTA models:
- huggingface.co/unsloth/gpt-oss-20b-GGUF
- huggingface.co/openai/gpt-oss-20b (the original source, from the makers of ChatGPT; downloads last month: 6,556,549, and you've never heard of it?). It can fully fit on a 16 GB GPU (mentioned in OpenAI's own description) and therefore will run very fast.
- "Check out our awesome list for a broader collection of gpt-oss resources and inference partners." -> "Running gpt-oss with llama.cpp"
- huggingface.co/unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF
- huggingface.co/unsloth/GLM-4.5-Air-GGUF (106 B)
- huggingface.co/unsloth/gpt-oss-120b-GGUF (also from the makers of ChatGPT (huggingface.co/openai/gpt-oss-120b))
- huggingface.co/unsloth/Qwen3-235B-A22B-Instruct-2507-GGUF
- huggingface.co/unsloth/GLM-4.5-GGUF (357 B)
and many more, including thinking/reasoning variants (some LLMs, like gpt-oss, have an easily configurable reasoning effort: low, medium, high), like huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507, specialized ones, like Qwen3-Coder-30B-A3B-Instruct-GGUF, big ones, like DeepSeek-V3.2-Exp-GGUF (671 B), and uncensored/abliterated ones.
News and info about AI / LLMs, e.g.: reddit.com/r/LocalLLaMA/top, reddit.com/r/LocalLLM/top
NOTEBOOKCHECK, maybe write an article explaining to your readers the AI/LLM self-hosting capability, i.e. why 64 GB RAM, 128 GB RAM or even more (AMD, when do we get a 256 GB, 512-bit big Strix Halo / "Strix Halo Halo" / Strix Halo Max?) are suddenly a thing now, when normally 32 GB RAM is currently enough for most people.