The VRAM amount per $ is good, fair enough, but:
Why not use GDDR7? It offers 3 GB per-chip density, so this GPU, using the same die (which would of course need a GDDR7 PHY), could have 48 GB VRAM:
256-bit / 32-bit per chip = 8 chips; 8 chips * 3 GB per chip = 24 GB; 24 GB * 2 (clamshell, chips on both sides of the PCB) = 48 GB VRAM. The memory bandwidth would also be ~30% higher, because it's GDDR7 instead of GDDR6.
Alternative calculation: 32 GB VRAM * 1.5 (3 GB per chip, instead of the current 2 GB) = 48 GB VRAM.
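A quick back-of-the-envelope sketch of that capacity math in Python, using only the numbers above (nothing vendor-specific, just the 32-bit-per-chip GDDR interface rule):

```python
def vram_gb(bus_width_bits: int, chip_density_gb: float, clamshell: bool = False) -> float:
    """Total VRAM from bus width, per-chip density, and optional clamshell
    (memory chips on both sides of the PCB)."""
    chips = bus_width_bits // 32          # each GDDR chip has a 32-bit interface
    total = chips * chip_density_gb
    return total * 2 if clamshell else total

print(vram_gb(256, 2, clamshell=True))   # GDDR6, 2 GB chips -> 32.0 GB (as shipped)
print(vram_gb(256, 3, clamshell=True))   # GDDR7, 3 GB chips -> 48.0 GB (the proposal)
```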
Frankly, when it comes to AI/LLMs, I'm not interested in 32 GB VRAM GPUs, and I said this over a year ago.
Let's see if NVIDIA releases a consumer 48 GB VRAM GPU in the RTX 60 series (probably not, but NVIDIA did release an RTX PRO 6000 Blackwell with 96 GB VRAM, which I also didn't expect).
Quote: "aimed at AI LLM training / inference without providing specific performance info in the initial press release."
(en.wikipedia.org/wiki/GeForce_RTX_50_series#Desktop, en.wikipedia.org/wiki/Intel_Arc#Workstation_2)
- RTX 5090: 1792 GB/s = 512-bit * 28 Gb/s / 8.
- B70 Pro / B65 Pro: 608 GB/s = 256-bit * 19 Gb/s / 8.
So a consumer 5090 has roughly 3x the memory bandwidth, and CUDA, of course. The big thing the B70/B65 could have had going for them is 48 GB VRAM.
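Same sanity check for those bandwidth figures, using the standard bus-width * per-pin data rate formula (the two cards' numbers are from the bullets above):

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps_per_pin: float) -> float:
    """Memory bandwidth in GB/s: bus width (bits) * per-pin rate (Gb/s) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps_per_pin / 8

rtx_5090 = bandwidth_gb_s(512, 28)   # 1792.0 GB/s
b70_pro  = bandwidth_gb_s(256, 19)   # 608.0 GB/s
print(rtx_5090 / b70_pro)            # ~2.95, i.e. roughly 3x
```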
For inference these are probably fine (if they're already supported by the usual inference apps, like llama.cpp; I know NVIDIA works just fine via Vulkan, ~20% slower than using CUDA), but then again, a 5090 has ~3x the token generation speed and also much higher prompt processing speed.
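For what it's worth, a rough way to see why the bandwidth gap maps almost directly to the token generation gap: single-stream decoding is usually memory-bound, so tokens/s is capped at roughly bandwidth divided by the bytes read per token (the model weights, ignoring KV cache and overhead). The model size below is an illustrative assumption, not a benchmark:

```python
def tokens_per_s_ceiling(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Crude upper bound for memory-bound decoding: each generated token
    streams all weights through the memory bus once."""
    return bandwidth_gb_s / model_size_gb

# e.g. a ~32 GB quantized model that would just fit in 48 GB VRAM (illustrative):
print(tokens_per_s_ceiling(1792, 32))  # 5090-class bandwidth  -> ~56 tok/s ceiling
print(tokens_per_s_ceiling(608, 32))   # B70-class bandwidth   -> ~19 tok/s ceiling, ~3x slower
```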