The new Ryzen AI Max+ 395 is AMD's latest high-end mobile processor. With up to 16 Zen 5 cores, a powerful Radeon GPU, a fast NPU, and up to 128 GB of RAM, the Ryzen AI Max+ is supposed to be the ideal companion for gaming, content creation, and AI development.
https://www.notebookcheck.net/AMD-Ryzen-AI-Max-395-Analysis-Strix-Halo-to-rival-Apple-M4-Pro-Max-with-16-Zen-5-cores-and-iGPU-on-par-with-RTX-4070-Laptop.963274.0.html
While the standard suite of benchmarks is appreciated in this review, these performance metrics are largely irrelevant to the target audience of a product like this. The AI Max series chips, with their high-bandwidth unified memory architecture, are targeted at LLM inference users. A valuable benchmark for these users would be running various LLMs in verbose mode to check the tokens/s. The advantage of this product is that it can fit massive models in memory, unlike even the highest-end dGPUs. A great comparison for this product segment would be the performance of Llama 3.3 70B under Ollama, on CPU and on GPU (using ROCm), against Apple's M4 series hardware.
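Something like this would capture the number (a rough Python sketch against Ollama's local REST API; it assumes Ollama is serving on the default port 11434 and that llama3.3:70b has already been pulled):

# Sketch: measure Ollama generation speed in tokens/s via its local REST API.
# Assumes Ollama is running on the default port and "llama3.3:70b" is pulled.
import json
import urllib.request

payload = json.dumps({
    "model": "llama3.3:70b",
    "prompt": "Write me a 1000 word story",
    "stream": False,
}).encode()
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    stats = json.load(resp)

# eval_count = generated tokens, eval_duration = generation time in nanoseconds
print(f"{stats['eval_count'] / (stats['eval_duration'] / 1e9):.1f} tok/s")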
I know, right? Give us tok/s for various LLMs at various levels of quantization. People are not scooping up 4090s and 5090s these days to play games at the highest fps; they are buying them to run local AI.
For the "Power Consumption / Cyberpunk 2077 ultra Efficiency", do you / could you do a version that combines the CPU and GPU power?
If lets say 4070 60W = 8060S 60W, that's great, but it's ignoring that the 4070 has a CPU to power alongside it
I don't know what would be a fair way to test, besides making curves; comparing 100W 4070 is "unfair" since you get diminishing returns as you approach 100W on it
Maybe just 1080p60 Medium 60fps limit? Or just test different TDP limits, but it would be arbitrary
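Even a simple combined fps-per-watt figure would help. A quick sketch of the math I mean, with made-up wattages purely for illustration:

# Sketch: efficiency as fps per watt of combined CPU+GPU power.
# All numbers below are invented for illustration, not measurements.
def fps_per_watt(fps: float, cpu_w: float, gpu_w: float) -> float:
    return fps / (cpu_w + gpu_w)

print(fps_per_watt(fps=60, cpu_w=25, gpu_w=60))  # dGPU laptop: CPU powered separately
print(fps_per_watt(fps=60, cpu_w=0, gpu_w=70))   # APU: one package number covers both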
I would love to see a CFD test run on all of these, or at least a meshing process. That is a great general test of overall performance: motherboards, CPUs, GPUs, and memory. Along with added LLM testing...
Quote from: Yeshy on February 18, 2025, 23:15:26
For the "Power Consumption / Cyberpunk 2077 Ultra Efficiency" chart, do you / could you do a version that combines the CPU and GPU power?
If, let's say, a 4070 at 60 W equals the 8060S at 60 W, that's great, but it ignores that the 4070 has a CPU to power alongside it.
I don't know what a fair way to test would be, besides plotting full curves; comparing a 100 W 4070 is "unfair" since you get diminishing returns as you approach 100 W on it.
Maybe just 1080p Medium with a 60 fps limit? Or just test different TDP limits, but that would be arbitrary.
Yeah, total system power should be used when comparing APUs/SoCs against CPU+dGPU setups. You'll find that power consumption is much higher with discrete hardware simply by design: two chips (CPU and GPU), two sets of memory (LPDDR5/DDR5 plus GDDR6), and more VRM MOSFETs to deliver the power.
Aren't LLMs broken under Windows on AMD due to crappy MS DirectML? So it may not get great LLM results unless you load up Linux, and that is assuming amdgpu supports it. Then there is the fact that a lot of the software and libraries out there aren't going to be using the NPU to assist.
"Gets destroyed by last years QC CPU in MC efficiency and SC performance."
Seems like a great chip in portable devices.
Quote from: Donkey545 on February 18, 2025, 17:01:45
While the standard suite of benchmarks is appreciated in this review, these performance metrics are largely irrelevant to the target audience of a product like this. The AI Max series chips, with their high-bandwidth unified memory architecture, are targeted at LLM inference users. A valuable benchmark for these users would be running various LLMs in verbose mode to check the tokens/s. The advantage of this product is that it can fit massive models in memory, unlike even the highest-end dGPUs. A great comparison for this product segment would be the performance of Llama 3.3 70B under Ollama, on CPU and on GPU (using ROCm), against Apple's M4 series hardware.
If it's for AI users, why is it a laptop chip?
Notebookcheck should test it with Unreal Engine, where GPU memory is crucial, and compare it with Nvidia options.
It would also be a practical speed test of dedicated GPU memory vs. shared RAM for GPU use. Does it matter?
Have you set the VRAM allocation to at least 8 GB?
People want to use "AI" on their notebooks too, for always-running local assistants.
The review plays a shell game; it should just talk about memory bandwidth rather than changing things around in the headline, screenshots, text, etc. As others have said, memory bandwidth is the main thing that matters for LLMs: the NPU is not really used for them, it's GPU cores and memory bandwidth. Mixture-of-Experts models may become more popular, and they can run OK on slower memory, but x86 still needs something at least as fast as Apple's Max chips.
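You can get a rough ceiling on decode speed just by dividing bandwidth by the weight bytes that have to be streamed per token. Back-of-envelope sketch (ignores compute limits and KV-cache traffic; sizes and bandwidths are approximate):

# Back-of-envelope: memory-bandwidth ceiling on dense-LLM decode speed.
# Each generated token streams all weights once, so
# tokens/s <= bandwidth / model_size. Figures are approximate.
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

# A 70B model at Q4 is roughly 40 GB of weights.
print(max_tokens_per_sec(256, 40))  # ~6.4 tok/s at Strix Halo's ~256 GB/s
print(max_tokens_per_sec(546, 40))  # ~13.7 tok/s at an M4 Max's ~546 GB/s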
The Asus ROG Flow Z13 GZ302 is a true marvel that brings together what everyone needs: mobility and performance.
It will be my next laptop or tablet, I don't care which. The most important thing, and what 99% of users are looking for, is an APU with the perfect balance between CPU and iGPU, and that is exactly what the AMD Ryzen AI Max+ 395, Ryzen AI Max 390, Ryzen AI Max 385, and Ryzen AI Max 380 processors offer.
Quote from: Papajon on February 19, 2025, 11:34:42
Quote from: Donkey545 on February 18, 2025, 17:01:45
While the standard suite of benchmarks is appreciated in this review, these performance metrics are largely irrelevant to the target audience of a product like this. The AI Max series chips, with their high-bandwidth unified memory architecture, are targeted at LLM inference users. A valuable benchmark for these users would be running various LLMs in verbose mode to check the tokens/s. The advantage of this product is that it can fit massive models in memory, unlike even the highest-end dGPUs. A great comparison for this product segment would be the performance of Llama 3.3 70B under Ollama, on CPU and on GPU (using ROCm), against Apple's M4 series hardware.
If it's for AI users, why is it a laptop chip?
That question makes about as much sense as asking in an RTX 4090 mobile review, "If it's for gamers, why is it a laptop chip?". Sometimes you might want to run an LLM offline. Sure, you could SSH into a server at home over a VPN or whatever, but maybe you just want a laptop that does everything and don't want a desktop. Also, a 128 GB Strix Halo laptop might cost $3,000, while an 80 GB H100 costs tens of thousands of dollars and requires a server to put it in.
Hey, the efficiency per watt for the Cinebench 2024 test does not add up after you have added the update.
Quote from: Papajon on February 19, 2025, 11:34:42
If it's for AI users, why is it a laptop chip?
There is the Z13 tablet and an HP laptop with Strix Halo. Then on the desktop we have HP, Framework, and one of the Chinese nettop makers. Right now it's 3:2 in favor of the desktop with this chip.
Both HP offerings are for prosumers (with inflated pricing), and the Z13 is a tablet at extra cost. The Framework desktop is a bit cheaper, but still priced above a classic desktop (which offers better specs/value).
Can you also add Ollama as an AI benchmark? Have the DeepSeek R1 1.5B, 8B, and 32B models tested. Use the verbose option to see the speed in tokens per second.
The usual prompt is "Write me a 1000 word story".
You can repeat it 5 times to get an average.
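Something like this would automate the averaging (a rough Python sketch against Ollama's local REST API; it assumes Ollama is running on the default port and the deepseek-r1 models are already pulled):

# Sketch: average Ollama decode speed over 5 runs per DeepSeek R1 size.
# Assumes a local Ollama server with the models already pulled.
import json
import urllib.request

def tokens_per_sec(model: str, prompt: str = "Write me a 1000 word story") -> float:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        stats = json.load(resp)
    # eval_count = generated tokens, eval_duration is in nanoseconds
    return stats["eval_count"] / (stats["eval_duration"] / 1e9)

for model in ("deepseek-r1:1.5b", "deepseek-r1:8b", "deepseek-r1:32b"):
    speeds = [tokens_per_sec(model) for _ in range(5)]
    print(f"{model}: {sum(speeds) / len(speeds):.1f} tok/s average")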