Topic summary

Posted by A
 - December 27, 2023, 21:08:15
Well, this is what happens when someone doesn't even realize VRAM can be HBM and says "NNs don't need VRAM" while literally everyone and their mother is using VRAM, be it a desktop GPU or an A100.

The rest of this sh*tpost is just a trip into mental disorder: hundreds of terabytes of RAM, whole power plants... yeah, keep us posted.

Quote from: NikoB on December 27, 2023, 20:35:13
I turn out to be right everywhere
Grand finale
Posted by NikoB
 - December 27, 2023, 20:35:13
Quote from: RobertJasiek on December 14, 2023, 16:07:32
According to the user A, LLMs need much VRAM or unified memory but I do not know if improved / altered LLM nets might work mostly on RAM, too.
Robert, why do you keep trying to communicate with the stupid bot A? Just ignore him as if he weren't in the thread, and that's it. That's the best thing you can do.

Complex neural networks simply require HBM3-class memory as their RAM, but the stupid bot A lies all the time about the need for VRAM. Neural networks don't need VRAM as such; they need the fastest memory available on the market. The RAM in Apple hardware, although 1.5-2 times faster than in the x86 camp, is still inferior to the dedicated VRAM in video cards by up to 10 times, and of course it also loses by up to 10 times to the HBM3 used for serious neural networks in servers.
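To see why the memory tier matters so much, here is a rough back-of-envelope sketch (the bandwidth figures are approximate assumptions of mine, not benchmarks): token generation is largely bandwidth-bound, since each generated token has to read roughly all of the model's weights once.

```python
# Back-of-envelope sketch: decode speed of an LLM is roughly bounded by
# memory bandwidth / bytes read per token (~ the model size), so the memory
# tier (DDR5 vs unified memory vs GDDR vs HBM) sets the ceiling.
# The bandwidth numbers below are approximate assumptions, not measurements.

MODEL_GB = 13.4  # e.g. a 6.7B-parameter model in fp16 (2 bytes/parameter)

bandwidth_gb_s = {
    "dual-channel DDR5 (x86 laptop)": 80,
    "Apple M-series Max unified memory": 400,
    "high-end GDDR6X video card": 1000,
    "H100 HBM3": 3350,
}

for name, bw in bandwidth_gb_s.items():
    print(f"{name}: ~{bw / MODEL_GB:.0f} tokens/s ceiling")
```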

Actually, there is only one problem: there is far too little memory (terabytes, even hundreds of terabytes are needed) at speeds of terabytes per second, which only HBM3 can provide. Obviously, neither Apple nor consumer x86 hardware is in any way suitable for serious neural networks, and will not be for a long time yet, given the slowdown in both performance gains and process technology.

Once upon a time, in the 90s, a computer with 1 teraflops of performance (at 64-bit precision) was considered the height of progress; now many ordinary people have that. This is what led to the overall progress and development in software as we see it now. But a further leap (towards smart expert systems) requires hardware thousands of times more powerful than what is currently available to the average person. It's very simple. As before, advanced technology consumes energy like a whole power plant and costs a lot of money, which is what Nvidia and AMD are now making their money on, but someday it will fit in a pocket gadget, if the IT world finds a way out of the silicon impasse.

Samsung recently promised simultaneous speech translation between two languages using a local neural network in the S24. I don't believe it, and I'm sure Samsung will screw this up for a simple reason: the local resources of top-end smartphones are completely insufficient for accurate translation from one language to another, especially on open-ended topics. If this were possible with such low power consumption and such modest hardware, Google Translate would not produce such wild nonsense on elementary sentences.

I'm afraid Samsung will soon fall out with many people who have entrusted their conversations to such a smartphone. It's a bit like the idiots who believed in a working "autopilot" and died. I turn out to be right everywhere, years later, having seen it all in advance, once the crowd finally realizes how complicated it all is and how unready modern technology is for it.

Hundreds of billions of dollars were poured into autopilots (or rather, businessmen from startups siphoned them off), and the result was a complete failure. The same will happen with simultaneous speech interpreters over the next 20-30 years. People will of course use them (they are greedy for new products without going into much detail), but along with the mistakes and quarrels there will also be a wave of disappointment. Still, Samsung will be able to temporarily stand out from its competitors and sell more. Goal achieved...
Posted by A
 - December 27, 2023, 19:42:05
Smh
Posted by RobertJasiek
 - December 27, 2023, 19:21:00
I cannot spend time reading the full text of every paper somebody links.

If something is basic, explain it in seconds instead of wasting more time on meta-discussion! (16b might refer to various buses or to objects of the model; I am not going to guess which.)
Posted by A
 - December 27, 2023, 19:03:52
Quote from: RobertJasiek on December 27, 2023, 18:41:40
More meaningless statements without context. 6.7B parameters for what LLM? What models with what numbers of parameters does it have and what is the quality of results for each model? 16bit precision of what? What GPU?
Oh man, half of these questions are answered in the paper itself (there's a link to it in the article), and the other half are so basic, especially the "16-bit precision of what" one. Why did you even start arguing in the first place if you don't know the basics?
Posted by RobertJasiek
 - December 27, 2023, 18:41:40
Quote from: A on December 27, 2023, 16:39:47
as little as 5-6GB RAM for running 6.7B parameter LLM in 16bit precision - with a speedup of x20-25 on GPU.

More meaningless statements without context. 6.7B parameters for what LLM? What models with what numbers of parameters does it have and what is the quality of results for each model? 16bit precision of what? What GPU?
Posted by A
 - December 27, 2023, 16:39:47
Quote from: RobertJasiek on December 27, 2023, 16:08:17
Arrogance not meeting reality of many saving money for a much longer time.
It's not arrogance, it's just reality. The MBP is for working professionals who do not find it expensive. You may like it or not.

Quote from: RobertJasiek on December 27, 2023, 16:08:17
So what. x64 notebooks sell many times more. Your statement is empty.
It's not a competition. My statement is on point. There IS a target audience for the MBP - and it sells in the millions - it's just that "many times more" is not the target audience for Apple; they couldn't even produce that many.

Quote from: RobertJasiek on December 27, 2023, 16:08:17
Not bad, but not enough to solve the principle storage problem yet.
What "storage problem", what are you even talking about. ) They were running LLMs using 50% of RAM usually required by those LLMs - as little as 5-6GB RAM for running 6.7B parameter LLM in 16bit precision - with a speedup of x20-25 on GPU.
Posted by RobertJasiek
 - December 27, 2023, 16:08:17
"Don't buy MBP if you are not earning $ for it in 2 weeks to 2 months of work": Arrogance not meeting reality of many saving money for a much longer time.

"MBP [...] sells millions": So what. x64 notebooks sell many times more. Your statement is empty.

LLM sparsification: the paper concludes, "We have demonstrated the ability to run LLMs up to twice the size of available DRAM". So, applied to its specific LLMs, 50% of the storage can be saved. Not bad, but not enough to solve the principal storage problem yet.
Posted by A
 - December 27, 2023, 13:01:16
Quote from: Chakwal on December 14, 2023, 12:29:09
What is more important to the end user, inference or training
Inference

Quote from: Chakwal on December 14, 2023, 12:29:09
Don't know if it's possible yet to run such LLM locally or the code isn't fully open sourced yet? Even if it were, would you need thousands of H100's or something to run such models?
OpenJourney is a Stable Diffusion model (as far as I remember); you can run it locally on an iPhone nowadays. Don't do it though, it's a waste of battery.
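If anyone wants to try it on a desktop instead, here is a minimal sketch using Hugging Face's diffusers library (the model ID "prompthero/openjourney" and the device choice are my assumptions, not something stated in this thread):

```python
# Minimal sketch: run the OpenJourney Stable Diffusion model locally with
# the diffusers library. Model ID and device choice are assumptions.
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"   # "mps" also works on Apple Silicon
dtype = torch.float16 if device != "cpu" else torch.float32

pipe = StableDiffusionPipeline.from_pretrained("prompthero/openjourney", torch_dtype=dtype)
pipe = pipe.to(device)

image = pipe("a cozy cabin in a snowy forest, mdjrny-v4 style").images[0]
image.save("cabin.png")
```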

Quote from: Chakwal on December 14, 2023, 12:29:09
Any thoughts on how well a device such as the Switch 2 would be for running LLM locally?
Switch 2 doesn't exist.

Quote from: Chakwal on December 14, 2023, 12:29:09
do you think memory requirements for LLM's will reduce in future
Requirements reduce all the time.

Quote from: Chakwal on December 14, 2023, 12:29:09
but I also feel it is kind a largely irrelevant because the most people do not buy $4300+ 64GB ram M-silicon Max MBP'
A 32 GB RAM MBP will provide you with enough VRAM to run 34B LLMs.
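A back-of-envelope sketch of why that works (assuming ~4-bit quantization of the weights and that macOS lets the GPU use roughly 70% of unified memory; both numbers are my assumptions, not hard figures):

```python
# Back-of-envelope: does a 34B model fit on a 32 GB unified-memory machine?
# Assumptions: ~4.5 bits/parameter (4-bit quantization plus overhead) and
# roughly 70% of unified memory usable by the GPU.

params = 34e9
bits_per_param = 4.5
weights_gb = params * bits_per_param / 8 / 1e9
print(f"34B model at ~4-bit: ~{weights_gb:.0f} GB of weights")    # ~19 GB

usable_gpu_gb = 32 * 0.70
print(f"GPU-usable unified memory on 32 GB: ~{usable_gpu_gb:.0f} GB")
# The remaining few GB cover the context (KV cache) and the OS.
```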

Quote from: Chakwal on December 14, 2023, 12:29:09
Another thing to note is a lot of these AI/ML libraries (Pytorch, Tensorflow, etc) and tutorials almost all of them are tested or run on Nvidia h/w and accelerated via their own CUDA frameworks.
Most already have Metal bindings, or you can play with something like llama.cpp, which is almost a "one button" native solution for running LLMs locally.
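For illustration, a minimal sketch via the llama-cpp-python bindings (the GGUF file path is a placeholder; n_gpu_layers=-1 asks it to offload everything to Metal or CUDA where available):

```python
# Minimal llama.cpp sketch using the llama-cpp-python bindings.
# The model path is a placeholder; n_gpu_layers=-1 offloads all layers to
# the GPU (Metal on Apple Silicon, CUDA on Nvidia).
from llama_cpp import Llama

llm = Llama(
    model_path="models/some-7b-chat.Q4_K_M.gguf",  # placeholder file name
    n_ctx=4096,
    n_gpu_layers=-1,
)

out = llm("Q: What is unified memory good for? A:", max_tokens=128)
print(out["choices"][0]["text"])
```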

Quote from: Chakwal on December 14, 2023, 12:29:09
The kind of demographic that do have access to that kind of hardware, usually are given to them free by their jobs/workplaces
Don't buy an MBP if you are not earning the money for it in 2 weeks to 2 months of work - easy. It has its own target audience and sells in the millions.

Quote from: Chakwal on December 14, 2023, 12:29:09
You're actually the first person, A, I've heard really pushing for the idea of running LLM's locally on MBP's
It was just an example of what is possible on MBPs without buying a couple of 8-16 GB video cards. I wasn't the one who came up with the idea; there's a lot of hype about it over on r/localllama.

Quote from: RobertJasiek on December 14, 2023, 16:07:32
LLM nets might work mostly on RAM
They will run on the CPU at CPU speed on x86. Get a Threadripper.

Quote from: RobertJasiek on December 14, 2023, 16:07:32
Likely, storage reqirements for inference will drop
Storage requirements never were high in the first place. They went down from bearable to pffft.

Quote from: RobertJasiek on December 14, 2023, 16:07:32
Forget LLMs on gadgets. LLM needs proper hardware.
Your comment didn't age well ))
medium.datadriveninvestor.com/pocket-sized-revolution-7689eba63650
Though I myself strongly oppose wasting phone battery on local inference.
Posted by RobertJasiek
 - December 14, 2023, 16:07:32
The ordinary end user needs inference. Training is for researchers, or for those seeking specialised variants that nobody offers.

Forget LLMs on gadgets. LLMs need proper hardware.

Likely, storage requirements for inference will drop with improved networks for constant object sizes. One can, however, always blow up storage demand by increasing the objects or the quality. According to the user A, LLMs need much VRAM or unified memory, but I do not know whether improved / altered LLM nets might work mostly in RAM, too.

Again, x64 with Nvidia GPUs is not inefficient for AI; quite the contrary, it is often very efficient. Large LLMs with too-large VRAM needs are the exception.
Posted by Chakwal
 - December 14, 2023, 12:29:09
@A:

I don't know much about LLMs, but I am curious about the subject and interested in hearing your thoughts/comments on a few questions of mine regarding this topic:

1) What is more important to the end user, inference or training [*and will the answer to this change in the future?]? By end user, I mean an amateur getting into this stuff, such as AI-generated imagery using Stable Diffusion/Automatic1111/OpenJourney (getting models from Hugging Face), or running chatbots like ChatGPT locally on your machine for free to help get work done quicker (since most of the ones locked behind cloud services are too annoying, as they force you to sign up / register and disclose phone numbers, often with feature-limited API access per request and paywalls).

I don't know if it's possible yet to run such an LLM locally, or if the code isn't fully open-sourced yet? Even if it were, would you need thousands of H100s or something to run such models? I don't know what the "TOP" requirements for such activities would be.

2) Any thoughts on how well a device such as the Switch 2 would do for running LLMs locally? Current rumors state that it'll be coming (or at least announced) in March 2024 and come with 12 GB of VRAM. Of course it'll be locked down, running Nintendo's custom proprietary OS, but looking at Nintendo's previous console history (almost every single one of their devices was hacked/jailbroken shortly after release), I expect someone to be running full-fledged Windows/Linux on the next-gen Switch as well. I don't expect it to perform anywhere close to a massive M3 Max chip with 64 GB of VRAM, but as an entry-level device, possibly a third of the price of an MBA/base MBP, it seems enticing.

3) You mentioned earlier that "70B" is ideal for current open-source models; do you think memory requirements for LLMs will reduce in the future (due to more efficient algorithms) or increase, because being able to handle more parameters gives better or more detailed/accurate results? You also mentioned using coding LLMs and general chat ones - which ones or models do you use exactly? It would also be nice if you could elaborate on what exactly things like "7B", "13B", "15B", "33B", "70B", "120B", "130B", "180B" etc. and tokens per second mean for models and why they matter. You mentioned it being about the math, but why - what exactly about the math makes it better?

Just some comments in general on this thread:

It's an interesting case/argument to make that x86 mobile rigs lack the necessary VRAM capacity to run modern full-scale LLMs locally, but I also feel it is largely irrelevant, because most people do not buy $4300+ 64 GB RAM M-silicon Max MBPs. The kind of demographic that does have access to that kind of hardware usually gets it free from their jobs/workplaces, and most probably work at ML companies to begin with, where they also get free access to a cloud server with several thousand H100s, or a lab with several RTX 3090s NVLinked together on a rack that they can eGPU with, or something.

The points about x86 efficiency (or lack thereof) aren't so interesting to me, because everyone has known x86 to be terrible for power/wattage for the last 10 years (and Arm being vastly superior in this respect isn't an Apple-specific advantage), so nothing new. Nonetheless, it is impressive what Apple has been able to achieve in the mobile form factor while being first to market with mainstream Arm PCs.

Another thing to note is that a lot of these AI/ML libraries (PyTorch, TensorFlow, etc.) and tutorials are almost all tested or run on Nvidia hardware and accelerated via their CUDA frameworks. I am not too sure how much this matters, since it is all mostly open source and should be vendor-agnostic with regard to NPU hardware. But it's nice to know that when everything is in Nvidia's ecosystem and tested for that ecosystem, everything just works (fewer bugs / less buggy?). You're actually the first person, A, I've heard really pushing the idea of running LLMs locally on MBPs, but it does make logical sense because of the large pool of UMA / efficiency, and MBPs are already super popular at Big Tech US firms.
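On the CUDA-versus-vendor-agnostic point, a minimal sketch of what device-agnostic PyTorch code typically looks like (the tensor sizes are arbitrary; "mps" is PyTorch's Metal backend on Apple Silicon):

```python
# Minimal sketch of vendor-agnostic PyTorch: pick CUDA, Apple's Metal (MPS)
# backend, or CPU at runtime instead of hard-coding Nvidia.
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

x = torch.randn(4096, 4096, device=device)
y = x @ x  # runs on whichever accelerator was found
print(device, y.shape)
```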
Posted by A
 - December 04, 2023, 13:18:31
Quote from: RobertJasiek on December 04, 2023, 13:09:13
Sure, but I want to know if these renderers always use raytracing if computation is complex or is raytracing just one possible feature of them?
They don't use anything except ray tracing at all when rendering scenes (e.g. with Blender's Cycles renderer).

Ray tracing on GPU = hardware implementation of ray tracing algorithms.
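For the curious, the core primitive that GPU ray-tracing hardware accelerates is the ray-object intersection test; here is a toy CPU version for a sphere (purely illustrative, not how Blender or any driver actually implements it):

```python
# Toy ray-sphere intersection test - the kind of operation that GPU
# ray-tracing hardware (RT cores) performs millions of times per frame.
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Return the distance to the nearest hit along the ray, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4 * a * c
    if disc < 0:
        return None                      # ray misses the sphere
    t = (-b - math.sqrt(disc)) / (2 * a)
    return t if t > 0 else None

print(ray_sphere_hit((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0))  # ~4.0
```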
Posted by RobertJasiek
 - December 04, 2023, 13:09:13
Sure, but I want to know whether these renderers always use raytracing when the computation is complex, or whether raytracing is just one possible feature of theirs?
Posted by A
 - December 04, 2023, 12:44:28
Quote from: RobertJasiek on December 04, 2023, 12:34:07
In 3D games, ray tracing is a gimmick but what is its function in Blender?
It's a thing both for games and for ray-tracing renderers like Blender or Redshift or 3DStudio.
Posted by RobertJasiek
 - December 04, 2023, 12:34:07
In 3D games, ray tracing is a gimmick but what is its function in Blender?