
Intel releases new graphics card with up to 32 GB VRAM

Started by Redaktion, Today at 14:00:23


Redaktion

Intel has released a new graphics card with up to 32 GB of VRAM. Also leveraging up to 32 Xe2 cores, the Arc Pro B70 will soon be joined by the cheaper Arc Pro B65, which Intel claims are 'cost-effective' yet 'high-performance solutions'.

https://www.notebookcheck.net/Intel-releases-new-graphics-card-with-up-to-32-GB-VRAM.1258638.0.html

48 GB VRAM when

Why not use GDDR7, which offers 3 GB per-chip density? Then this GPU could have:
256-bit bus / 32-bit per chip = 8 chips; 8 chips * 3 GB per chip = 24 GB VRAM; 24 GB * 2 (clamshell, i.e. chips on both sides of the PCB) = 48 GB VRAM. The memory bandwidth would also be about 30% higher, because it's GDDR7 rather than GDDR6.
Alternative calculation: 32 GB VRAM * 1.5 (3 GB per chip instead of the current 2 GB) = 48 GB VRAM.
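
For anyone who wants to verify the arithmetic, here is a minimal sketch of the calculation above (the 3 GB per-chip density and the clamshell layout are this post's assumptions, not confirmed Intel specs):

```python
# Sketch of the GDDR7 VRAM math from the post above.
# Assumptions (not Intel specs): 32-bit chips, 3 GB per chip, clamshell layout.

BUS_WIDTH_BITS = 256   # Arc Pro B70 memory bus
BITS_PER_CHIP = 32     # one GDDR7 chip per 32-bit channel
GB_PER_CHIP = 3        # assumed high-density GDDR7 module
CLAMSHELL = 2          # assumed chips on both sides of the PCB

chips = BUS_WIDTH_BITS // BITS_PER_CHIP    # 256 / 32 = 8 chips
vram_gb = chips * GB_PER_CHIP * CLAMSHELL  # 8 * 3 * 2 = 48 GB
print(vram_gb)                             # 48
```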
Frankly, when it comes to AI/LLMs, I'm not interested in 32 GB VRAM GPUs... and I said this over a year ago.
Let's see if NVIDIA gives us consumer 48 GB VRAM GPUs in the RTX 60 series (probably not, but NVIDIA did give us the RTX PRO 6000 Blackwell with 96 GB VRAM, which I also didn't expect).

Quote: The ECC memory is essential here, as a single bit flip could ruin a long render or an AI training run.
This GPU is not capable of training an LLM big enough for ECC to matter (even with runs of several days, it is unlikely that anything would happen). Neither is ECC necessary for fine-tuning with this GPU. Also, there are checkpoints, so one doesn't lose everything, and even if one did, on this GPU the cost of losing a run is a few bucks over a few days... in that ballpark. How do I know? I asked the big LLMs over at arena.ai (aka lmarena.ai).
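
To put a rough number on "a few bucks", here is a back-of-the-envelope sketch; the board power and electricity price are purely assumed illustration values, not measured figures for this card:

```python
# Rough electricity cost of losing a multi-day run on a card in this class.
# Both inputs are assumptions for illustration only.
BOARD_POWER_KW = 0.2   # assumed ~200 W sustained board power
EUR_PER_KWH = 0.30     # assumed electricity price
HOURS = 3 * 24         # a three-day run

cost_eur = BOARD_POWER_KW * HOURS * EUR_PER_KWH
print(f"{cost_eur:.2f} EUR")   # 4.32 EUR -- "a few bucks" indeed
```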

(en.wikipedia.org/wiki/GeForce_RTX_50_series#Desktop, en.wikipedia.org/wiki/Intel_Arc#Workstation_2)

  • RTX 5090: 1792 GB/s = 512-bit * 28 Gb/s / 8.
  • B70 Pro / B65 Pro: 608 GB/s = 256-bit * 19 Gb/s / 8.
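
The same formula in code, using the figures from the Wikipedia pages cited above:

```python
# Memory bandwidth = bus width (bits) * per-pin data rate (Gb/s) / 8 bits per byte.
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    return bus_width_bits * data_rate_gbps / 8

rtx_5090 = bandwidth_gb_s(512, 28)   # 1792.0 GB/s
arc_b70 = bandwidth_gb_s(256, 19)    # 608.0 GB/s
print(round(rtx_5090 / arc_b70, 2))  # 2.95 -- roughly 3x
```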

So a consumer 5090 has roughly 3 times the memory bandwidth, and CUDA, of course. The only things these B70/B65 could have going for them are 48 GB of VRAM (if Intel went the GDDR7 route above) and people trying to make them work without CUDA.

For inference these are probably fine (NVIDIA cards also work just fine via Vulkan, only a bit slower than with CUDA), but then again, a 5090 has 3x the token generation speed and much faster prompt processing.

Do you agree or disagree?

Quote from: 48 GB VRAM when on Today at 15:37:40
Why

Found this to be pertinent to the discussion at hand:

reddit.com/r/LocalLLaMA/comments/1rzaz7r/my_experience_spending_2k_and_experimenting_on_a/

Interested in hearing your thoughts.

RobertJasiek

It seems you are asking whether to run AI locally or in the cloud. The answer is... surprise, surprise... it depends! E.g., if you use AI 24/7 over a long period, you are better off running it locally. If you have just a few quick queries per day but need more than cheap local hardware can handle, then pay for a cloud service.

However, cloud comes in two forms: a) already offered the way you want it (maybe a standard LLM), or b) you first need to program and upload. The latter is also sometimes used for short-term research projects (such as a mathematical proof whose maximum complexity is already established), for which the runtime is predictable, so you know in advance that a solution will be generated within the time and cost frame.

I have used deep neural net inference for about 7 hours per day, for years. For such a purpose, local hardware is by far more cost-efficient and always available; cloud would be astronomically expensive. Local computing cost depends on wattage and the resulting power bill: make a greedy hardware choice and soon you pay more for electricity than for the hardware. Your own solar park helps.
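
As a sketch of that break-even logic, here is a toy comparison; every number (hardware price, system draw, electricity price, cloud rate) is an assumed illustration value, not a quote from any vendor:

```python
# Toy break-even: buying a local GPU + paying electricity vs. renting in the cloud.
# All inputs are illustrative assumptions.
LOCAL_GPU_EUR = 1200       # assumed one-time hardware cost
SYSTEM_DRAW_KW = 0.3       # assumed power draw under load
EUR_PER_KWH = 0.30         # assumed electricity price
CLOUD_EUR_PER_HOUR = 1.00  # assumed hourly rate for a comparable cloud GPU
HOURS_PER_DAY = 7          # usage pattern from the post above

def costs_after(days: int) -> tuple[float, float]:
    hours = days * HOURS_PER_DAY
    local = LOCAL_GPU_EUR + hours * SYSTEM_DRAW_KW * EUR_PER_KWH
    cloud = hours * CLOUD_EUR_PER_HOUR
    return local, cloud

for days in (30, 180, 365):
    local, cloud = costs_after(days)
    print(days, round(local), round(cloud))
# 30   1219   210
# 180  1313   1260
# 365  1430   2555  -> with these assumptions, cloud overtakes local within ~half a year
```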

If you train deep neural nets, it depends. Local hardware may be enough but can already become expensive - from €500 to €100,000 everything is possible, not to mention the electricity bill. If you need giant hardware resources, cloud together with your own programming might be necessary, but then we are talking industrial-scale, open-ended expenses. There is, however, a third option: distributed computing via a worldwide community of enthusiasts. This has been done for some research projects and, e.g., for training KataGo (the neural nets for the game of Go).
