Apple's first public LLM is called Ferret, powered by 8 Nvidia A100 GPUs

Started by Redaktion, December 29, 2023, 13:16:04


Redaktion

Apple is getting serious about generative AI, releasing its first multimodal large language model, dubbed Ferret. The new AI model has been published under an open-source license and was trained using 8 Nvidia A100 GPUs.

https://www.notebookcheck.net/Apple-s-first-public-LLM-is-called-Ferret-powered-by-8-Nivida-A100-GPUs.787395.0.html

RobertJasiek

8 * A100 is interesting for two reasons:
- Even Apple knows that Nvidia is the choice for AI.
- 8 GPUs is a very modest number for a mega company's big AI project.

A

Quote from: RobertJasiek on December 29, 2023, 14:06:12
Even Apple knows that Nvidia is the choice for AI.

Training*

Quote from: RobertJasiek on December 29, 2023, 14:06:12
8 GPUs is a very modest number for a mega company's big AI project.

8xA100 is simply a single server (not a full rack). You can train it with more or fewer GPUs:
"FERRET is trained on 8 A100 GPUs with 80GB memory. To train on fewer GPUs, you can reduce the per_device_train_batch_size and increase the gradient_accumulation_steps accordingly."
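The quoted trade-off is simple arithmetic: the effective batch size the optimizer sees is per-device batch × accumulation steps × number of GPUs, so shrinking one factor and growing another keeps training roughly equivalent. A minimal sketch (the concrete numbers are illustrative, not Ferret's actual configuration):

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Effective global batch size seen by the optimizer per weight update."""
    return per_device_batch * grad_accum_steps * num_gpus

# Hypothetical baseline: 8 GPUs, batch of 16 per device, no accumulation.
baseline = effective_batch_size(16, 1, 8)   # 128

# Same effective batch on only 2 GPUs: halve the per-device batch and
# raise gradient_accumulation_steps to compensate.
two_gpu = effective_batch_size(8, 8, 2)     # 128

assert baseline == two_gpu  # training dynamics stay roughly equivalent
```

The cost of fewer GPUs is wall-clock time, not correctness: the gradient is accumulated over more sequential micro-batches before each update.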

A

Just to give an example of how small this Ferret model actually is: training the Llama 2 70B-parameter model took 1.7 million A100-hours, which is about 35 days non-stop on 2048xA100.
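The 35-day figure follows directly from the published 1.7 million A100-hour total:

```python
total_gpu_hours = 1_700_000   # reported A100-hours for Llama 2 70B training
num_gpus = 2048               # GPUs used in parallel

wall_clock_hours = total_gpu_hours / num_gpus   # ≈ 830 hours
wall_clock_days = wall_clock_hours / 24         # ≈ 34.6 days
print(f"{wall_clock_days:.1f} days non-stop on {num_gpus} A100s")
```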

And you can run inference on Llama 2 70B locally on a 64GB+ Apple Silicon Mac. Or on x86 with several external GPUs totaling 64GB+ VRAM (or a single A100, if you have a spare $10K just for the GPU) - if you are able to build the rig.

Just so you understand the gap in computational power requirements between training and running the model.
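A rough sketch of why a 64GB machine can hold a 70B-parameter model at all: the weights alone need 2 bytes per parameter at 16-bit precision (~140GB), but common 4-bit quantization cuts that to ~35GB. The numbers below are back-of-envelope and ignore KV cache and activation memory:

```python
params_billion = 70  # Llama 2 70B

def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8  # billions of params × bytes each

for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(params_billion, bits):.0f} GB")
# 16-bit needs ~140 GB (multi-GPU or A100-class territory);
# 4-bit needs ~35 GB, which fits in 64 GB of unified memory with room to spare.
```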

Why tho

So, why doesn't Apple train on its own silicon instead of using another company's?

And if Nvidia are the leaders when it comes to training, why are they not equally interested in keeping such a lead in inference as well? Do they not care about inference anymore?

A

Quote from: Why tho on December 30, 2023, 00:37:34
So, why doesn't Apple train on its own silicon instead of using another company's?

Probably massive parallelization of GPUs and the economics of the process are the two main reasons. Renting an A100 is cheap; anyone can get 8xA100 for $16/hr.

Quote from: Why tho on December 30, 2023, 00:37:34
why are they not equally interested in keeping such a lead in inference

Maybe because they are equally interested in inference and in upselling you to two, three, or four 4090s.

RobertJasiek

Quote from: Why tho on December 30, 2023, 00:37:34
if Nvidia are the leaders when it comes to training, why are they not equally interested in keeping such a lead in inference as well?

Do you believe everything the user A says...?

Nvidia GPUs together with Nvidia libraries are extraordinarily good (fast and efficient) at AI inferencing if VRAM is sufficient.

And this, according to A, means that an Apple M3 Max with enough unified memory (say, 64+GB) can work for these two specific kinds of AI with models of intermediate size (too large for the VRAM of one Nvidia consumer GPU but small enough not to need AI servers): generative image creation; LLMs.

RobertJasiek

Quote from: A on December 30, 2023, 01:35:10
Renting an A100 is cheap; anyone can get 8xA100 for $16/hr.

For short-term projects, renting prices are sort of OK. For 24/7 operation, however, it is astronomical: $140,253/a. Suppose the GPUs live for 7 years: renting would then cost about a million dollars.
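For what it's worth, the arithmetic behind that annual figure, using the $16/hr rate quoted above (the exact dollar amount shifts slightly depending on the assumed hours per year):

```python
hourly_rate = 16.0                    # $/hr for an 8xA100 server, as quoted above
yearly = hourly_rate * 24 * 365       # $140,160 for a year of 24/7 rental
seven_years = yearly * 7              # ≈ $981k over a 7-year GPU lifetime
print(f"${yearly:,.0f}/yr, ${seven_years:,.0f} over 7 years")
```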

Quote
because they are equally interested in inference and in upselling you to two, three, or four 4090s.

Correct.

A

Quote from: RobertJasiek on December 30, 2023, 07:18:42
Nvidia GPUs together with Nvidia libraries are extraordinarily good (fast and efficient) at AI inferencing if VRAM is sufficient.

But this is literally what "user A says"... ))) Don't try to downplay the importance of "sufficient VRAM" by putting speed ahead of it: local inference is not computationally expensive at all; VRAM is all that matters.

Quote from: RobertJasiek on December 30, 2023, 07:18:42
these two specific kinds of AI ....: generative image creation; LLMs.

No, you can run any kind of AI that fits into Mac RAM and doesn't fit into GPU VRAM, including the medical ones that search for proteins. The sweet spot is between 64 and 192GB RAM, where a Mac is cheaper than getting a rack of 4090s. And that's only because of the Mac RAM limit, not because a rack of 4090s wins on price at 192GB or something. So this "sweet spot" will get bigger and bigger with every iteration of the Mac Studio/MBP.

Quote from: RobertJasiek on December 30, 2023, 07:18:42
but small enough not to need AI servers

We are talking about local inference. And it's not "small enough", it's "about the size of GPT-3.5".

Quote from: RobertJasiek on December 30, 2023, 07:26:12
For short-term projects, renting prices are sort of OK. For 24/7 operation, however, it is astronomical: $140,253/a.

Buying 8 of them would come to around the same price tag or more. That's like $10K-18K per A100, plus server components, plus the power bill.

RobertJasiek

Quote from: A on December 30, 2023, 11:30:42
local inference is not computationally expensive at all; VRAM is all that matters.

Utter nonsense! Computational time and space complexities always depend on the algorithms! Take this from somebody who has also studied theoretical informatics and applies time-complex AI every day.

Quote
you can run any kind of AI that fits into Mac RAM and doesn't fit into GPU VRAM,

Provided running the AI software is possible on a Mac at all and, if it is, not too slow.

Quote
64 and 192GB RAM, where a Mac is cheaper than getting a rack of 4090s.

That is false as a general statement, because whether a Mac or a PC is more expensive depends on the exact hardware choice etc.

Quote
"sweet spot"

Apple PR identified. By you again;)

Quote
We are talking about local inference.

Indeed. (As if I did not know, LOL.)

Quote
Buying 8 of them would come to around the same price tag or more. That's like $10K-18K per A100, plus server components, plus the power bill.

I calculated such for one A100 (rent: 3€/h) and found that building one's own computer and paying for power (in Germany, 0.3€/h) is very much cheaper. (It was a quick and dirty calculation, so I did not bother to save it.)

A

Quote from: RobertJasiek on December 30, 2023, 13:12:45
Computational time and space complexities always depend on the algorithms

Oh, just don't start, because I will ask you about two main properties of neural networks (one of which actually goes right against your "utter nonsense" claim) and you will lose the debate immediately. If you can even call it a "debate": a discussion between someone who works on NNs and someone who simply uses one Go AI, which apparently can run inference locally even on an iPhone. )

Quote from: RobertJasiek on December 30, 2023, 13:12:45
studied theoretical informatics

"Studied theoretical informatics" ))) You really should have deduced by now that I'm an active IT professional.

Quote from: RobertJasiek on December 30, 2023, 13:12:45
Provided running the AI software is possible on a Mac at all and, if it is, not too slow.

It is definitely not slower than running on the CPU when you've run out of GPU VRAM.

Quote from: RobertJasiek on December 30, 2023, 13:12:45
Apple PR identified. By you again;)

Lol what, it's just an English idiom.
merriam-webster.com/dictionary/sweet spot
Merriam-Webster dictionary PR.

Quote from: RobertJasiek on December 30, 2023, 13:12:45
I calculated such for one A100 (rent: 3€/h) and found that building one's own computer and paying for power (in Germany, 0.3€/h) is very much cheaper. (It was a quick and dirty calculation, so I did not bother to save it.)

Renting an A100 is actually $2/hr, and electricity just for the fully loaded GPU itself will be about 650EUR/yr. Buying an A100 and building a rig still costs around the same money as renting it for about a year, or more.
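A quick break-even sketch for the rent-vs-buy argument, using the prices thrown around in this thread (the purchase price, rental rate, power draw, and electricity price below are all assumptions, not vendor quotes):

```python
purchase_price = 15_000.0   # mid-range of the $10K-18K A100 estimates above
rent_per_hour = 2.0         # quoted rental rate for a single A100
# Assumed ~0.3 kW draw at an assumed ~$0.25/kWh electricity price:
power_cost_per_hour = 0.3 * 0.25  # ≈ $0.075/hr to run an owned GPU

# Renting costs the full $2/hr; owning costs only electricity per hour,
# so break-even is when the hourly rental savings pay off the purchase price.
break_even_hours = purchase_price / (rent_per_hour - power_cost_per_hour)
print(f"break-even after ~{break_even_hours / 24 / 365:.1f} years of 24/7 use")
```

Under these assumptions, 24/7 use breaks even in under a year, which is consistent with A's estimate; lighter or intermittent use pushes the break-even point out far enough that renting wins.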

A

Quote from: RobertJasiek on December 30, 2023, 13:12:45
That is false as a general statement, because whether a Mac or a PC is more expensive depends on the exact hardware choice etc.

We discussed it just yesterday; try to beat the $5,600 192GB Mac Studio with 4090s.

A

P.S. I'm not even asking you to beat the price of a 96-128GB MBP with an x86 laptop GPU, because there's simply no laptop with that amount of VRAM; it's very inconvenient for you, so you will immediately jump back to desktop 4090s.

RobertJasiek

So what are your two questions you think I cannot answer?

Hint: image or language AI needs some estimated, fixed (upper limit of) time to produce a result of the desired quality. Go move generation needs indefinite time for an increasingly well-chosen move, selected with ever higher confidence. This is because the complexity of 19x19 Go is orders of magnitude larger than the number of particles in the universe. Go AI of the neural-net kind does not use any (of, e.g., my) mathematical theorems of Go theory, which provide immediate solutions only for a few specialised (classes of, e.g., late endgame) positions.

A

Quote from: RobertJasiek on December 30, 2023, 16:23:17
So what are your two questions you think I cannot answer?

It was one question, and it's already stated there.

Quote from: RobertJasiek on December 30, 2023, 16:23:17
Go move generation needs indefinite time

That's IF you try calculating that move. A NN isn't calculating; a NN is "predicting" the move evaluation based on its "previous experience". Back in the day, to evaluate a move you had to go through all moves after it and calculate a function of all possible outcomes. That worked for chess. Not so much for Go.
So instead of an effectively infinite calculation of the best move, you use a NN to predict the best move, which is a finite-complexity task (and actually uses very simple math behind the scenes). So the computational complexity goes down; it's not infinite anymore. Even more: every NN run is ideally not only a finite-time computation, but takes a similar (equal) time for every inference.
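A's constant-cost point can be made concrete with a toy network: a forward pass performs the same arithmetic for every input, so its cost is a fixed function of the architecture, not of how "hard" the position is. A minimal NumPy sketch with arbitrary, made-up layer sizes:

```python
import numpy as np

# Toy fully-connected network; layer sizes are illustrative only
# (e.g. a flattened 19x19 Go board in, per-move logits out).
layer_sizes = [361, 512, 512, 362]
rng = np.random.default_rng(0)
weights = [rng.standard_normal((m, n)) for m, n in zip(layer_sizes, layer_sizes[1:])]

def forward(x):
    """One inference: matmul + ReLU per layer, same work for every input."""
    for w in weights:
        x = np.maximum(x @ w, 0)
    return x

# Every input, easy position or hard, costs exactly the same operations:
flops = sum(2 * m * n for m, n in zip(layer_sizes, layer_sizes[1:]))
print(f"~{flops:,} FLOPs per inference, independent of the input")
```

This is what distinguishes NN inference from tree search: search spends variable, potentially unbounded time exploring continuations, while a single forward pass is a fixed, predictable amount of compute (engines like modern Go AIs combine both, using the NN to guide a bounded search).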
