
Intel x Nvidia Serpent Lake leaks as Strix Halo rival with capable CPU and big GeForce RTX Rubin iGPU

Started by Redaktion, December 23, 2025, 10:53:50


Redaktion

Intel and Nvidia are officially developing x86 chips for future Windows laptops. The Intel Serpent Lake architecture could be a result of this partnership. RedGamingTech claims that Intel could use an Nvidia RTX iGPU for the Serpent Lake APUs.

https://www.notebookcheck.net/Intel-x-Nvidia-Serpent-Lake-leaks-as-Strix-Halo-rival-with-capable-CPU-and-big-GeForce-RTX-Rubin-iGPU.1190608.0.html


14

Quote: 16 memory channels for increased bandwidth.
DDR6 has 24 bits per channel? (as opposed to DDR5's 64 bits per channel) -> 16 channels * 24-bit = 384-bit. AMD's Strix Halo has a 256-bit wide memory interface and supports up to 128 GB RAM. Assuming the same memory density and scaling up linearly, this INTEL + NVIDIA APU could have 192 GB, which would be pretty good.

Strix Halo: 8000 MT/s * 256-bit / 1000 / 8 = 256 GB/s.
INTEL + NVIDIA APU (assuming LPDDR6's "between 10,667 and 14,400 MT/s"): 10667 MT/s * 384-bit / 1000 / 8 = 512 GB/s (the bandwidth of a GeForce RTX 4070) to 691 GB/s.

Not bad, especially the ~700 GB/s. The up to 192 GB RAM is the more important value though.
What are 192 GB of RAM needed for? For big-ish (MoE) LLMs. Strix Halo's up to 128 GB RAM is just not cutting it. Starting from 192 GB RAM, one can host the better/smarter LLMs.
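The arithmetic above can be sketched in a few lines of Python. All channel widths, transfer rates, and capacities are the rumored/assumed figures from this post, not confirmed specs:

```python
# Back-of-the-envelope memory math; inputs are the post's assumptions.

def bandwidth_gbps(mt_s: int, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s: transfers/s * bytes per transfer."""
    return mt_s * bus_width_bits / 8 / 1000

# Strix Halo: 256-bit bus at 8000 MT/s
strix_halo = bandwidth_gbps(8000, 256)    # 256.0 GB/s

# Rumored Serpent Lake: 16 LPDDR6 channels * 24 bits = 384-bit bus
bus = 16 * 24                             # 384 bits
low = bandwidth_gbps(10667, bus)          # ~512 GB/s
high = bandwidth_gbps(14400, bus)         # ~691 GB/s

# RAM capacity, scaling Strix Halo's 128 GB linearly with bus width
capacity = 128 * bus / 256                # 192.0 GB

print(strix_halo, round(low), round(high), capacity)
```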

FSV

Quote from: 14 on Yesterday at 14:27:20: What are 192 GB of RAM needed for?

192 GB with an Nvidia badge is going to cost you like $10k :X

If you have that kind of cash, wouldn't it be better to just buy or rent a small nuclear reactor and data centre at that point?

not spending 10k

@FSV

Luckily, for inferencing, only (V)RAM size and bandwidth matter, so if NGREEDIA thinks anyone would spend $10k for this rumored "DGX Spark * 1.5", I have a bridge to sell them (as YT/The Duran would say):

The cheapest AMD Strix Halo PC with the full 128 GB RAM config is ~$2k. With 50% more bus width, and therefore 50% more RAM and 50% more memory performance (assuming the iGPU is also scaled up, which it probably would be, though not necessarily by the full 50%), a new Strix Halo successor / Medusa Halo PC would cost about $3,000.

The DGX Spark is $4,000, right? That is 100% more expensive than what AMD offers, despite having the same RAM capacity and (almost) the same RAM bandwidth (8000 MT/s vs 8533 MT/s).
So, based on the above, this rumored INTEL + NVIDIA APU would cost: $2,000 [Strix Halo / 256-bit APU] * 1.5 [Medusa Halo / 384-bit APU] * 2 [NGREEDIA tax, which applies if you only do inference, because AMD can inference via Vulkan just fine, as long as you don't fine-tune or require CUDA] = $6,000 = $4,000 [DGX Spark] * 1.5.
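As a sanity check on that estimate, here it is spelled out; all prices and multipliers are the poster's assumptions, including the hypothetical brand markup:

```python
# Illustrative price estimate; street prices and the "Nvidia tax"
# multiplier are assumptions from this thread, not real pricing.
strix_halo_128gb = 2000      # cheapest 128 GB Strix Halo PC, ~$2k
bus_scaling = 1.5            # 256-bit -> 384-bit (Medusa Halo class)
nvidia_premium = 2.0         # assumed brand markup for inference-only buyers

serpent_lake_estimate = strix_halo_128gb * bus_scaling * nvidia_premium
print(serpent_lake_estimate)  # 6000.0, i.e. 1.5x the $4,000 DGX Spark
```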

You are not that far off.

This vs a DIY desktop PC: when RAM prices were normal a few months ago, a normal desktop PC with 192 GB or 256 GB RAM plus a dedicated GPU would have been far, far cheaper.
The only difference is that it wouldn't be as compact, and it would be a 128-bit, aka dual-channel (2 * 64-bit), system running at 5600 MT/s, while this would be hexa-channel (6 * 64-bit) running at 8000 MT/s, i.e. 3 * (8000/5600) = ~4.3 times higher memory bandwidth.
192 GB RAM was $600 a few months ago, plus a used 4090 for $800 and the rest for $400 = less than $2,000 total. So it would be 3 times less expensive, and the 4090 would smoke the iGPU in this rumored APU.

AFAIK, Strix Halo's iGPU (Radeon 8060S) is slightly slower than an RTX 4060, so the DIY system with the 4090 would also be much faster in this regard, meaning in LLM prompt processing, which is comparable to gaming performance.

But even when offloading to the 4090's 24 GB of 1008 GB/s VRAM, the DIY system would, in total, be much slower at inferencing than these 384-bit APUs, because the 4090 only has 24 GB of that fast 1008 GB/s memory.

Using a (used) last-gen DDR4 octa-channel server motherboard would give a memory bandwidth of 8 channels * 64-bit per channel * 3200 MT/s / 1000 / 8 = 205 GB/s; 12-channel DDR5 would be at 12 channels * 64-bit per channel * 4800 MT/s / 1000 / 8 = 460 GB/s.
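All of the platforms compared in this thread reduce to the same formula; a quick sketch (channel counts and transfer rates are the post's assumptions):

```python
# Peak memory bandwidth (GB/s) for the platforms compared in this thread.

def bandwidth_gbps(mt_s, channels, bits_per_channel=64):
    # transfers/s * total bus bytes per transfer, in GB/s
    return mt_s * channels * bits_per_channel / 8 / 1000

diy_dual = bandwidth_gbps(5600, 2)       # 89.6  GB/s, desktop dual-channel DDR5
apu_hexa = bandwidth_gbps(8000, 6)       # 384.0 GB/s, "hexa-channel" 384-bit APU
ddr4_server = bandwidth_gbps(3200, 8)    # 204.8 GB/s, octa-channel DDR4 server
ddr5_server = bandwidth_gbps(4800, 12)   # 460.8 GB/s, 12-channel DDR5 server

print(round(apu_hexa / diy_dual, 2))     # ~4.29x over the DIY desktop
```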

The memory bandwidth of these rumored 384-bit, high-MT/s APUs would finally reach server-hardware RAM speeds. These APUs can be built into much smaller systems, but they "make up" for it with their high price. Also, their hardware can't be upgraded, and the iGPU is still weak (RTX 4060 * 1.5 level of performance = slightly slower than a ~RTX 4070).

pp: If the input, aka context, is big-ish, then a 4070-type level of LLM prompt processing (pp) performance would be the bottleneck of these rumored 384-bit APUs.
tg: Using a more modern MoE LLM with only 10B active parameters at a 4-bit quant (= 5 GB read per token; "only" is positive here, as lower is better if quality is the same), the token generation (tg) speed at 500 GB/s would be: 500 GB/s / 5 GB = 100 tokens/s in theory; in practice it's more like 50 t/s.
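The tg estimate follows from the rule of thumb that each generated token streams all active weights once from memory; the 50% efficiency factor below is the poster's practical fudge, not a measured number:

```python
# Token generation is memory-bound: every token reads all active weights.
active_params = 10e9                                  # 10B active params (MoE)
bytes_per_param = 0.5                                 # 4-bit quantization
gb_per_token = active_params * bytes_per_param / 1e9  # 5 GB per token

bandwidth_gbps = 500                                  # rumored 384-bit APU
tg_theoretical = bandwidth_gbps / gb_per_token        # 100 tokens/s
tg_practical = tg_theoretical * 0.5                   # ~50 t/s in practice
print(tg_theoretical, tg_practical)
```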
