Apple is getting serious about generative AI, releasing its first multimodal large language model, dubbed Ferret. The new model has been published under an open-source license and was trained on 8 Nvidia A100 GPUs. https://www.notebookcheck.net/Apple-s-first-public-LLM-is-called-Ferret-powered-by-8-Nivida-A100-GPUs.787395.0.html
8 * A100 is interesting for two reasons:
- Even Apple knows that Nvidia is the choice for AI.
- 8 GPUs is a very modest number for a mega company's big AI project.
Quote from: RobertJasiek on December 29, 2023, 14:06:12
Even Apple knows that Nvidia is the choice for AI.
Training*
Quote from: RobertJasiek on December 29, 2023, 14:06:12
8 GPUs is a very modest number for a mega company's big AI project.
8xA100 is simply a single server (not a full rack). You can train it with more or fewer GPUs.
"FERRET is
trained on 8 A100 GPUs with 80GB memory. To train on fewer GPUs, you can reduce the per_device_train_batch_size and increase the gradient_accumulation_steps accordingly"
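To make that trade-off concrete: the effective batch size is just the product of GPU count, per-device batch size, and accumulation steps, so shrinking one knob while growing the other keeps training roughly equivalent at a lower per-GPU memory cost. A minimal sketch (the concrete batch numbers are illustrative assumptions, not values from the Ferret repo):

```python
# Illustrative only: relationship between the knobs mentioned in the quote.
def effective_batch(num_gpus: int,
                    per_device_train_batch_size: int,
                    gradient_accumulation_steps: int) -> int:
    return num_gpus * per_device_train_batch_size * gradient_accumulation_steps

# Hypothetical reference setup: 8 x A100-80GB, batch 16 per GPU, no accumulation.
print(effective_batch(8, 16, 1))  # 128
# Same effective batch on 2 GPUs: shrink the per-device batch, accumulate 8 steps.
print(effective_batch(2, 8, 8))   # 128
```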
Just to give an example of how small this Ferret model actually is: training the Llama 2 70B-parameter model took 1.7 million A100-hours, which is 35 days non-stop on 2048xA100.
And you can run inference on Llama 2 70B locally on a 64GB+ Apple Silicon Mac.
Or on x86 with several external GPUs totalling 64GB+ VRAM (or a single A100, if you have a spare $10K just for the GPU), provided you are able to build the rig.
Just so you understand the gap in computational power requirements between training and running the model.
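A rough back-of-the-envelope version of that gap, using the 1.7M A100-hour figure above plus standard bytes-per-parameter sizes for the weights (the ~20% overhead for KV cache and activations is a rough assumption):

```python
# Training side: the 1.7M A100-hour figure above, turned into wall-clock days.
train_gpu_hours = 1.7e6
gpus = 2048
print(train_gpu_hours / gpus / 24)   # ~34.6 days, i.e. the "35 days" above

# Inference side: memory just to hold 70B weights at common precisions.
params = 70e9
for name, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    weights_gb = params * bytes_per_param / 1e9
    print(name, round(weights_gb * 1.2), "GB")  # +~20% overhead (rough assumption)
```

Which is roughly why a 4-bit-quantized 70B model fits into a 64GB unified-memory Mac, while fp16 weights alone already need at least two 80GB GPUs.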
So, why doesn't Apple train on its own silicon instead of using another company's?
And if Nvidia is the leader when it comes to training, why are they not equally interested in keeping such a lead in inference as well? Do they not care about inference anymore?
Quote from: Why tho on December 30, 2023, 00:37:34
So, why doesn't Apple train on its own silicon instead of using another company's?
Probably massive GPU parallelization and the economics of the process are the two main reasons. Renting A100s is cheap: anyone can get 8xA100 for $16/hr.
Quote from: Why tho on December 30, 2023, 00:37:34
why are they not equally interested in keeping such a lead in inference
Maybe because they are equally interested in inference and in upselling you to two, three, or four 4090s.
Quote from: Why tho on December 30, 2023, 00:37:34
if Nvidia is the leader when it comes to training, why are they not equally interested in keeping such a lead in inference as well?
Do you believe everything user A says...?
Nvidia GPUs together with Nvidia libraries are extraordinarily good (fast and efficient) at AI inferencing if VRAM is sufficient.
And this, according to A, means that an Apple M3 Max with enough unified memory (say, 64+GB) can work for these two specific kinds of AI with models of intermediate size (too large for the VRAM of one Nvidia consumer GPU but small enough not to need AI servers): generative image creation and LLMs.
Quote from: A on December 30, 2023, 01:35:10
Renting A100s is cheap: anyone can get 8xA100 for $16/hr.
For short-term projects, renting prices are sort of OK. For 24/7 use, however, it is astronomical: $140,253 per year. Suppose the GPUs live for 7 years; renting would cost about a million.
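The arithmetic behind that figure, for the record (using the $16/hr quoted above; the exact annual number just depends on how many hours you count per year):

```python
hourly_rate = 16              # USD/hr for 8 x A100, as quoted above
hours_per_year = 24 * 365
annual = hourly_rate * hours_per_year
print(annual)                 # 140,160 USD/yr, roughly the figure above
print(annual * 7)             # ~0.98M USD over a 7-year GPU lifetime
```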
Quote
because they are equally interested in inference and in upselling you to two, three, or four 4090s.
Correct.
Quote from: RobertJasiek on December 30, 2023, 07:18:42
Nvidia GPUs together with Nvidia libraries are extraordinarily good (fast and efficient) at AI inferencing if VRAM is sufficient.
But this is literally what "user A says"... ))) Don't try to downplay the importance of "sufficient VRAM" by putting speed ahead of it: local inference is not computationally expensive at all; VRAM is all that matters.
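For what it's worth, the reasoning behind "VRAM is all that matters" can be sketched with a crude bandwidth model: in single-stream decoding, every generated token streams roughly the whole set of weights through the chip, so memory capacity decides whether the model runs at all and memory bandwidth bounds the speed. The bandwidth numbers below are approximate published specs, and the model ignores batching, prompt processing, and KV-cache traffic:

```python
# Crude estimate: tokens/sec is roughly bounded by bandwidth / model size,
# because each decoded token reads (approximately) all weights once.
model_gb = 40  # e.g. a ~70B model quantized to ~4 bits (assumption)
for device, bandwidth_gbs in [("M2/M3 Max (~400 GB/s)", 400),
                              ("RTX 4090 (~1000 GB/s)", 1000),
                              ("A100 80GB (~2000 GB/s)", 2000)]:
    print(device, round(bandwidth_gbs / model_gb, 1), "tok/s upper bound")
```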
Quote from: RobertJasiek on December 30, 2023, 07:18:42
these two specific kinds of AI ...: generative image creation and LLMs.
No, you can run any kind of AI that fits into Mac RAM and doesn't fit into GPU VRAM, including the medical ones that search for proteins. The sweet spot is between 64 and 192GB RAM, where a Mac is cheaper than getting a rack of 4090s. And that's only because of the Mac RAM limit, not because a rack of 4090s wins on price at 192GB or something. So this "sweet spot" will get bigger and bigger with every iteration of the Mac Studio/MBP.
Quote from: RobertJasiek on December 30, 2023, 07:18:42
but small enough not to need AI servers
We are talking about local inference. And it's not "small enough", it's "about the size of GPT-3.5".
Quote from: RobertJasiek on December 30, 2023, 07:26:12
For short-term projects, renting prices are sort of OK. For 24/7 use, however, it is astronomical: $140,253 per year.
Buying 8 of them will be around the same price tag or more. That's like $10K-18K per A100, plus server components, plus the power bill.
Quote from: A on December 30, 2023, 11:30:42
local inference is not computationally expensive at all; VRAM is all that matters.
Utter nonsense! Computational time and space complexities always depend on the algorithms! Take this from somebody who has also studied theoretical informatics and applies time-complex AI every day.
Quote
you can run any kind of AI that fits into Mac RAM and doesn't fit into GPU VRAM
Provided running the AI software is possible on a Mac at all and, if it is, not too slow.
Quote
64 and 192GB RAM, where a Mac is cheaper than getting a rack of 4090s
That is false as a general statement, because whether a Mac or a PC is more expensive depends on the exact hardware choice, etc.
Quote
"sweet spot"
Apple PR identified. By you again ;)
Quote
We are talking about local inference.
Indeed. (As if I did not know, LOL.)
Quote
Buying 8 of them will be around the same price tag or more. That's like $10K-18K per A100, plus server components, plus the power bill.
I calculated this for one A100 (rent: 3€/h) and found that building one's own computer and paying for power (ca. 0.3€/h in Germany) is very much cheaper. (It was a quick and dirty calculation, so I did not bother to save it.)
Quote from: RobertJasiek on December 30, 2023, 13:12:45
Computational time and space complexities always depend on the algorithms
Oh, just don't start, because I will ask you about two main properties of neural networks (one of which actually goes right against your "utter nonsense" claim) and you will lose the debate immediately. If you can call a discussion between someone who is working on NNs and someone who simply uses one Go AI (which can seemingly be run locally even on an iPhone) a "debate" at all. )
Quote from: RobertJasiek on December 30, 2023, 13:12:45
studied theoretical informatics
"Studied theoretical informatics" ))) You really should have deduced by now that I'm an active IT professional.
Quote from: RobertJasiek on December 30, 2023, 13:12:45
Provided running the AI software is possible on a Mac at all and, if it is, not too slow.
It is definitely not slower than running on the CPU when you've run out of GPU VRAM.
Quote from: RobertJasiek on December 30, 2023, 13:12:45
Apple PR identified. By you again ;)
Lol what, it's just an English idiom.
merriam-webster.com/dictionary/sweet spot
Merriam-Webster dictionary PR.
Quote from: RobertJasiek on December 30, 2023, 13:12:45
I calculated this for one A100 (rent: 3€/h) and found that building one's own computer and paying for power (ca. 0.3€/h in Germany) is very much cheaper. (It was a quick and dirty calculation, so I did not bother to save it.)
Renting an A100 is actually $2/hr, and electricity just for the fully loaded GPU itself will be ±650EUR/yr. Buying an A100 and making a rig is still around the same money as renting it for about a year, or more.
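A quick breakeven sketch with the numbers thrown around in this exchange ($2/hr rental, a purchase price somewhere in the $10K-18K range, roughly the ±650EUR/yr of electricity); everything else about the rig is ignored:

```python
# Rent-vs-buy breakeven for a single A100, using this thread's figures.
rent_per_hour = 2.0          # USD/hr (figure quoted above)
purchase_price = 15000       # USD, somewhere in the $10K-18K range quoted above
electricity_per_year = 700   # USD/yr, roughly the +/-650 EUR figure above

hours_per_year = 24 * 365
print(rent_per_hour * hours_per_year)                 # 17,520 USD/yr to rent 24/7
# Months of 24/7 use until buying pays for itself (resale value etc. ignored):
print(purchase_price / (rent_per_hour * 24 * 30 - electricity_per_year / 12))  # ~11
```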
Quote from: RobertJasiek on December 30, 2023, 13:12:45
That is false as a general statement, because whether a Mac or a PC is more expensive depends on the exact hardware choice, etc.
We discussed it just yesterday: try to beat a $5,600 192GB Mac Studio with 4090s.
P.S. I'm not even asking you to beat the price of a 96-128GB MBP with an x86 laptop GPU, because there is simply no laptop with that amount of VRAM, so it's very inconvenient for you and you will immediately jump back to desktop 4090s.
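For scale, the raw memory arithmetic behind that challenge (the per-card price is a rough street-price assumption, and this counts only the GPUs, not the rest of the rig):

```python
# How many 24GB RTX 4090s are needed to match 192GB of unified memory,
# and roughly what the cards alone would cost (price per card is an assumption).
target_gb = 192
card_vram_gb = 24
price_per_card = 1800        # USD, rough street-price assumption

cards_needed = -(-target_gb // card_vram_gb)          # ceiling division -> 8
print(cards_needed, cards_needed * price_per_card)    # 8 cards, ~$14,400 for GPUs alone
```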
So what are your two questions you think I cannot answer?
Hint: image or language AI needs some estimated, fixed (upper limit of) time to produce a result of the desired quality. Go move generation needs indefinite time for an increasingly well-chosen move, selected with ever higher confidence. This is so because the complexity of 19x19 Go is orders of magnitude larger than the number of particles in the universe. Go AI of the neural-net kind does not use any (of, e.g., my) mathematical theorems on Go theory, which provide immediate solutions only for a few specialised (classes of, e.g., late endgame) positions.
Quote from: RobertJasiek on December 30, 2023, 16:23:17
So what are your two questions you think I cannot answer?
It was one question and it's already stated there.
Quote from: RobertJasiek on December 30, 2023, 16:23:17
Go move generation needs indefinite time
That's IF you try calculating that move. A NN isn't calculating; a NN is "predicting" the move evaluation based on its "previous experience". Back in the day, to evaluate a move you had to go through all the moves after it and calculate a function over all possible outcomes. That worked for chess. Not so much for Go.
So instead of an infinite-complexity calculation of the best move, you are using a NN to predict the best move, which is a finite-complexity task (and actually uses very simple math behind the scenes). So the computational complexity goes down; it's not infinite anymore. Even more: every NN run is ideally not only a finite-time computing task, but takes a similar (equal) time for every inference.
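A toy sketch of the two regimes being argued about here: a single NN forward pass costs roughly the same fixed amount of work every call, while a search-based engine runs an "anytime" loop, loosely in the spirit of the MCTS used by modern Go programs, that keeps refining its answer for as long as you give it time. This is illustrative only, not any engine's real code:

```python
import random, time

def nn_policy(position):
    # Stand-in for one fixed-cost forward pass: same work for every call.
    random.seed(hash(position))
    return {move: random.random() for move in range(361)}  # 19x19 move priors

def pick_move_nn_only(position):
    # Finite, roughly constant time: one forward pass, take the best prior.
    priors = nn_policy(position)
    return max(priors, key=priors.get)

def pick_move_anytime_search(position, time_budget_s):
    # Anytime loop: the longer it runs, the more "visits" refine the choice.
    visits = {move: 0 for move in range(361)}
    deadline = time.monotonic() + time_budget_s
    while time.monotonic() < deadline:
        priors = nn_policy(position)             # the NN guides the search...
        move = max(priors, key=lambda m: priors[m] / (1 + visits[m]))
        visits[move] += 1                        # ...but quality depends on budget
    return max(visits, key=visits.get)

print(pick_move_nn_only("some-position"))
print(pick_move_anytime_search("some-position", time_budget_s=0.1))
```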
I can tell this article was not written by GPT because of the low quality of writing.
I think I'm beginning to understand why people say nobody gives a sheet about AI. 192GB to run decent models locally well? That's some sick joke. Nobody is going to give a damn about this stuff (besides these corporate enterprise companies running it in the cloud). It doesn't matter if they're on x86 or Arm, PC vs. Mac, or using dGPUs. The average person isn't going to buy more than 32GB of RAM (or 48GB including VRAM). So unless these companies can reduce the size of their algorithms/models, or somehow use the SSD as a cache for additional memory, this is bullsheet as far as I'm concerned.
@A, Go NN is not the simple 'learn from the past' it was a decade ago, when it was at about amateur 3-dan level. Now, it is a mixture with other modules, such as pruned tree walks ('tactical reading'). Granted, it is much better than brute force or alpha-beta, but it is definitely still complex enough to profit from aeons of analysis of the current position.
(Phew, lucky that you have not asked me to solve P =? NP :) )
Quote from: RobertJasiek on December 30, 2023, 18:25:44
Now, it is a mixture with other modules, such as pruned tree walks ('tactical reading').
It's more like they've started adding in older algorithms that were non-viable with normal move evaluation. So yeah, of course it's not 100% NN; NN-only engines can't win, they blunder and don't know theory.
There have also been comparatively new algorithmic AI breakthroughs during the last 20 years.
Go AI is essentially agnostic of explicit Go theory, and this has actually turned out to be a strength compared to the previously greater emphasis on (non-mathematical) expert knowledge. (With a very few exceptions. The implicit Go theory of modern AI, as perceived by strong humans, has many similarities to human Go theory, though.)
Quote from: RobertJasiek on December 30, 2023, 18:58:28
There have also been comparatively new algorithmic AI breakthroughs during the last 20 years.
AI stuff is getting old in 3.
Quote from: RobertJasiek on December 30, 2023, 18:58:28
Go AI is essentially agnostic of explicit Go theory
No game openings?
Modern AI needs no opening feeding.
Quote from: RobertJasiek on December 30, 2023, 20:28:16
Modern AI needs no opening feeding.
Oh, they do, why not? If each move is time-limited, the first moves will be the worst calculated. Every bit of theory helps.
1. A single 8-GPU server is not the be-all and end-all; I'd have expected Apple to have a DGX farm.
2. The A100 is not the peak performer by a long shot. H100s poo all over these, and even the L40S is much better bang for the buck for most AI-related workloads.