Intel's Raja Koduri showed off a die shot of the 2-tile Ponte Vecchio Xe HPC GPU, which is said to feature seven advanced silicon technologies, including the 16 compute units that are fabbed on Intel's own 7 nm process. The other key technologies in Xe HPC include Xe Link I/O, HBM2 VRAM, Rambo Cache, EMIB, 10 nm Enhanced SuperFin, and Foveros 3D packaging technology.
https://www.notebookcheck.net/First-Intel-7-nm-wonder-comes-to-life-Raja-Koduri-reveals-full-die-shot-of-the-Ponte-Vecchio-Xe-HPC-2-tile-GPU-with-8-192-cores.517222.0.html
I recall from other presentations that Intel is using HBM2e stacks, not HBM2.
The Rambo Cache is SRAM.
Another advanced technology is that the I/O supports PCIe 5.0/CXL for configuration and cache coherency.
Are you sure 128 EU per die? Intel also said EU could be configured to have dedicated FP64 pipeline, in which case it would only need 64 EU per compute die.
Either way about 21 tflops double precision per Ponte Vecchio GPU.
...Ok, no idea where this puts them.
Someone have a good Rambo Cache joke?
Quote from: danwat1234 on January 30, 2021, 19:16:10
Someone have a good Rambo Cache joke?
"I will fight to keep their "memory" alive for ever" ;)
Quote from: Johan on January 30, 2021, 06:09:49
Are you sure 128 EU per die? Intel also said EU could be configured to have dedicated FP64 pipeline, in which case it would only need 64 EU per compute die.
Either way about 21 tflops double precision per Ponte Vecchio GPU.
You're right. It should be 64 EUs per compute die, as Ponte Vecchio is expected to have 512 EUs per tile, i.e. 1024 EUs for the entire 2-tile package. Corrected :)
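For anyone who wants to check the EU arithmetic, here is a rough sketch using only numbers from the thread and the article (64 EUs per compute die, 16 compute dies, 2 tiles, and the 8,192 cores from the article title). It assumes the 16 compute dies are split evenly across the two tiles; the function name is just illustrative.

```python
# Figures from the article and the replies above; nothing here is an official Intel spec.
EUS_PER_COMPUTE_DIE = 64   # corrected value from the reply above
COMPUTE_DIES_TOTAL = 16    # article: 16 compute units in the 2-tile package
TILES = 2
CORES_IN_TITLE = 8192      # "8,192 cores" from the article title


def eu_counts(eus_per_die, dies_total, tiles):
    """Return (EUs per tile, EUs per package), assuming dies are split evenly per tile."""
    total = eus_per_die * dies_total
    per_tile = total // tiles
    return per_tile, total


per_tile, total = eu_counts(EUS_PER_COMPUTE_DIE, COMPUTE_DIES_TOTAL, TILES)
cores_per_eu = CORES_IN_TITLE // total  # implied by the title, not stated in the thread

print(f"EUs per tile: {per_tile}, EUs per package: {total}, cores per EU: {cores_per_eu}")
```

With 128 EUs per die instead, the package total would double to 2048, which would not match the 8,192-core figure at 8 cores per EU.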
Quote from: S.Yu on January 30, 2021, 19:13:18
...Ok, no idea where this puts them.
A100 has about 10 + 20 TFLOPS double precision; the 20 come from the tensor cores. So this is about 2/3 of the best Nvidia has to offer. The 3090 is around one TFLOP FP64, I believe, so this is about 20 times what the 3090 can do. It won't be as good in FP32. It was made for FP64.
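The ratios in that comparison can be reproduced with a quick sketch. The inputs are the rough figures quoted in this thread (21 TFLOPS for Ponte Vecchio, 10 + 20 for the A100, about 1 for the 3090), not measured or datasheet values, and the variable names are made up for illustration.

```python
# Approximate FP64 figures as quoted in the thread, in TFLOPS; treat them as assumptions.
PONTE_VECCHIO_FP64 = 21.0
A100_FP64_COMBINED = 10.0 + 20.0  # the thread's "10 + 20", the 20 being tensor cores
RTX_3090_FP64 = 1.0               # "around one TFLOP FP64, I believe"


def ratio(a, b):
    """How many times larger a is than b."""
    return a / b


vs_a100 = ratio(PONTE_VECCHIO_FP64, A100_FP64_COMBINED)
vs_3090 = ratio(PONTE_VECCHIO_FP64, RTX_3090_FP64)

print(f"vs A100: {vs_a100:.2f}x, vs 3090: {vs_3090:.0f}x")
```

With these inputs the A100 ratio comes out at 0.70, in line with the "about 2/3" estimate, and the 3090 ratio at 21, in line with "about 20 times".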