Quote from: Michael Davidson on Today at 06:43:28
The GB10's saving grace is of course CUDA and the ability to run NVFP4 models.
Problem is, even after a year, NVFP4 models are basically unavailable.
Maybe the bits per weight (BPW) are simply not enough after all, so nobody even bothers training them in native NVFP4?
Q4_K_M is the most popular quant when it comes to size vs. performance, and it is 5.24 BPW; below that, performance drops off sharply. Gpt-Oss-120B is such a natively trained (QAT) 4-bit model (not on NVFP4 tho), and it constantly makes so many mistakes that I use it less and less, so the BPW may simply not be enough. Forget 4-bit for now and move to 4.5 or 5.24 bits in the meantime. NVIDIA needs to introduce NVFP5 for the Blackwell successor XD, haha.
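To put those BPW figures in terms of memory, here is a rough sketch. It counts weights only and ignores KV cache and runtime overhead; the 120B parameter count is just a round illustrative number and the function name is made up for this post.

```python
def weight_footprint_gb(params_billion: float, bpw: float) -> float:
    """Approximate weight size in GB (10^9 bytes) at a given bits per weight."""
    # params_billion * 1e9 weights * bpw bits / 8 bits per byte / 1e9 bytes per GB
    return params_billion * bpw / 8


if __name__ == "__main__":
    # BPW values taken from the post: plain 4-bit, 4.5, and 5.24
    for bpw in (4.0, 4.5, 5.24):
        print(f"{bpw:>5} BPW -> {weight_footprint_gb(120, bpw):.1f} GB")
```

Going from 4 to 5.24 BPW costs roughly 30% more memory for the weights, which is the trade-off being argued about.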
Quote from: It is a rival on Yesterday at 22:45:43
it's 96 GB now.
Correct, and 819.3 GB/s (M3 Ultra) at 96 GB RAM is still better; the Spark is not a rival.
Total memory size x speed value:
M3 Ultra: 78653 = 819.3 GB/s * 96 GB
GB10/Spark: 34944 = 273 GB/s * 128 GB
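The two numbers above are easy to reproduce. A minimal sketch, with the caveat that "size x speed" is just my own rough heuristic and not a standard benchmark; the names below are made up for illustration.

```python
def size_times_speed(bandwidth_gb_s: float, capacity_gb: float) -> float:
    """Heuristic score: memory bandwidth (GB/s) multiplied by capacity (GB)."""
    return bandwidth_gb_s * capacity_gb


# (bandwidth in GB/s, capacity in GB), figures as quoted in this thread
systems = {
    "M3 Ultra": (819.3, 96),
    "GB10/Spark": (273.0, 128),
}

if __name__ == "__main__":
    for name, (bw, cap) in systems.items():
        print(f"{name}: {size_times_speed(bw, cap):.0f}")
```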
But I agree, offering only 96 GB is crazy low and a waste of the huge 1024-bit APU potential, exactly like giving an RTX 5090 6 GB of VRAM (5.33 times lower: 512 GB -> 96 GB = 32 GB -> 6 GB). It's so low, in fact, that it's going to push some users to get AMD Strix Halo or NVIDIA Spark, depending on their use case, especially considering the price.
Problem is tho, 96 GB @ 819.3 GB/s can fit and run dense models that outperform MoE models on the slower 128 GB RAM ;) and that compensates hugely for the fact that it's not 128 GB. If Qwen releases a dense model of e.g. 50B to 90B, the 819.3 GB/s will start to shine even more. But we don't have to wait: Qwen3.6-27B dense is a good example.
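Why bandwidth matters so much for dense models: as a back-of-the-envelope sketch, assume decoding is purely memory-bound and every weight is read once per generated token. That gives a ceiling of bandwidth divided by weight size. This ignores KV cache reads, compute limits and software overhead, so real numbers will be lower; for MoE models only the active parameters would count. The function name and the 5.24 BPW choice are assumptions for illustration.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, params_billion: float, bpw: float) -> float:
    """Upper bound on tokens/s if each token requires reading all weights once."""
    weights_gb = params_billion * bpw / 8  # weight size in GB (10^9 bytes)
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    # 27B dense model at 5.24 BPW, bandwidth figures as quoted in this thread
    for name, bw in (("M3 Ultra", 819.3), ("GB10/Spark", 273.0)):
        print(f"{name}: ~{decode_ceiling_tok_s(bw, 27, 5.24):.1f} tok/s ceiling")
```

Under these assumptions the ratio between the two machines is simply the bandwidth ratio, about 3x, regardless of model size, as long as the model fits in memory.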