
Astounding M1X multi-core performance estimates have the Apple Silicon playing in the same ballpark as the Intel Core i9-10900K and Ryzen 7 5800X

Started by Redaktion, March 13, 2021, 10:47:53


Richardd Costello

Layman's terms as best I can.

RISC = simple instructions, executed quickly; to do something complicated you need a group of instructions. Simplicity means fewer transistors, which means less power.

CISC = complex instructions, executed slowly; you can do complicated things with one instruction. Complexity means more transistors, which means more power (consumption).

Most of the time, CPUs execute simple instructions so RISC is the way to go for performance. BUT years ago, the CPU of choice for the IBM PC was chosen by the accounting department rather than the engineering department. They went for a chip that they could get a good discount on because of existing deals and chose the Intel 8086.

This choice gave Intel lots of money to develop their chip family, making it more and more complex and, crucially, utilising faster and faster clock speeds. Other companies couldn't keep up, hence Apple dumping the PowerPC RISC design and jumping on the Intel bandwagon, because even IBM couldn't keep up with Intel's clock speeds.

The simplicity of the RISC design means you need a relatively small number of transistors to achieve one instruction per clock cycle. The 80x86 needed many (a variable number of) cycles per instruction. Intel developed the 80x86 line up to the Pentium 4, at which point they hit a problem: they simply couldn't get them to run faster than 3.4GHz - they were melting. So they switched the architecture to the Core line, which is more RISC-like internally but decodes 80x86 instructions to maintain compatibility. Still complex.

Ultimately Intel has managed to achieve one instruction per clock cycle (like RISC), but to get there they have needed ever more complicated pipelining of instructions, meaning even more transistors, more power, more heat.

Think of a processor like a production line in a car factory. It might take 3 days to make a car, but with many stations all working on cars moving along the line, cars actually come off the end one per minute once production is flowing. CPUs are like this, with many instructions being worked on at the same time.
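The production-line numbers above can be sketched in a few lines; the stage count and instruction count here are made-up illustrative values, not anything measured:

```python
# A minimal sketch of the production-line analogy (hypothetical numbers):
# each instruction takes `stages` cycles end to end, but with pipelining
# one result comes off the line every cycle once the pipeline is full.

def completion_times(num_instructions: int, stages: int) -> list[int]:
    """Cycle at which each instruction finishes in an ideal pipeline."""
    # Instruction i enters at cycle i and finishes `stages` cycles later.
    return [i + stages for i in range(num_instructions)]

times = completion_times(num_instructions=5, stages=3)
print(times)  # first result after 3 cycles, then one per cycle: [3, 4, 5, 6, 7]
```

The first result takes the full pipeline latency, but each subsequent result arrives one cycle later - the car-per-minute effect.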

What has happened recently with the M1 is that Apple has got close to Intel's clock speeds. The M1 is rumoured to run at 3.2GHz, which is very fast for the ARM architecture. This speed means that the ARM chip can process its instructions at the same rate as the Intel chips, but crucially using a LOT fewer transistors.

This has a twofold advantage for Apple: firstly, their chip uses a lot less power (fewer transistors), generating a lot less heat. Secondly, it makes their processor core very small compared with an Intel core.

The physically small core and low power mean they can add more cores, but crucially they can also add more supplemental hardware to the die. They can have the memory controllers and the RAM itself on the chip, and can consequently run the RAM a lot faster than your typical DDR modules. The whole SoC (system on a chip) approach means everything is faster.

Apple have also been able to produce their chips on smaller manufacturing processes than Intel can reliably achieve, adding further benefits.

Intel have always been heading for a dead end with CISC; sooner or later RISC was going to dominate once advanced manufacturing and high clock speeds became mass market.

Intel have now reached that dead end. It started with mobile, where battery life was the main issue. Now it's the desktop, where clock speed and memory performance are key. Soon it will be the server market, where 128- or 256-core ARM CPUs will trundle along feeding us our data using less electricity and, crucially, requiring less cooling. That's why Nvidia wants to buy ARM.

Markus

Quote: Miani came to this estimate by comparing the differences in results between the A12 and A12Z (1.68x faster) and the A14 and M1 (1.87x faster) and then picking an average for the M1X.

You did what? What a waste of time!
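Whatever one thinks of the method, the averaging the quoted article describes is plain arithmetic; the two ratios below are taken from the quote:

```python
# Ratios quoted from the article: A12 -> A12Z and A14 -> M1 multi-core gains.
a12_to_a12z = 1.68
a14_to_m1 = 1.87

# The M1X estimate simply averages the two scaling factors.
m1x_factor = (a12_to_a12z + a14_to_m1) / 2
print(round(m1x_factor, 3))  # 1.775
```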

Mate

@Richardd Costello

You are wrong - the M1 uses more transistors than an Intel/AMD core. However, it can also execute more instructions per cycle, so it can compete with x86 CPUs without boosting to 4GHz+. The main reason for the M1's efficiency is the smaller manufacturing process (they bought TSMC's entire 5nm capacity for wagons of cash) - that alone gives 30-40% lower energy consumption. Add big.LITTLE (another non-Apple technology) and voilà, we have the super energy-efficient M1. That CPU is not a miracle - AMD could easily smash its performance per watt under load if only AMD had access to 5nm nodes.

Additionally, ARM64 is almost as complex as x86-64. It is a reduced instruction set, but it still has ~1000 different instructions. Also, x86 CPUs are not RISC inside. Every CPU (the M1 too) translates its instruction set into micro-ops and then executes those.
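The micro-op translation mentioned above can be illustrated with a toy decoder. The instruction strings and micro-op names here are entirely made up for illustration; real decoders work on binary encodings in hardware:

```python
# Toy illustration of instruction "cracking" (hypothetical mnemonics):
# a memory-to-register add splits into load / add / store style micro-ops.

def decode(instruction: str) -> list[str]:
    """Crack one 'complex' instruction into simple micro-ops."""
    if instruction == "add [mem], reg":
        return ["load tmp, [mem]", "add tmp, reg", "store [mem], tmp"]
    # Simple register-to-register ops map to a single micro-op.
    return [instruction]

print(decode("add [mem], reg"))   # three micro-ops
print(decode("add reg1, reg2"))   # already simple: one micro-op
```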

Lee Rutter

Wow, thank you for this amazing layman's-level detail. Greatly in alignment with what I know, and then some.  :)

Apple has been working on their M1 for half a decade, and obviously the iPhone and iPads have gone through many iterations, so Apple has been quietly working to bring everything together so that they are on par with or ahead of the game.

I am elated that Apple has worked on the M1 processor for so long, and I am looking forward to the M1X, as that is the one I will be buying once it rolls out - hopefully very soon.


Quote from: Richardd Costello on March 13, 2021, 22:54:08
Layman's terms as best I can.
[...]

Jan Onderwater

I did some calculations myself, assuming the CPU scales at 80% efficiency and the GPU at 90%, and that every generation (M1 -> M2) gains 10% in single-core speed.


(En = efficiency cores, HP = high-performance cores, GPU = GPU cores; GB = Geekbench, CB = Cinebench)

M1
En   HP   GPU  | GB Single   GB Multi   Metal       | CB Single   CB Multi
4    4    4    | 1,709       7,398      21,982      | 1,498       7,508
4    8    8    | 1,709       11,387     39,564      | 1,498       10,800
4    12   12   | 1,709       16,856     59,346      | 1,498       15,594
4    16   16   | 1,709       22,325     79,128      | 1,498       20,387
4    24   24   | 1,709       33,262     118,692     | 1,498       29,974
8    24   32   | 1,709       33,712     158,256     | 1,498       31,187
12   36   64   | 1,709       50,568     316,512     | 1,498       46,781
16   48   128  | 1,709       67,424     633,024     | 1,498       62,374
24   72   256  | 1,709       101,136    1,266,048   | 1,499       86,342

M2
En   HP   GPU  | GB Single   GB Multi   Metal       | CB Single   CB Multi
4    4    4    | 1,880       8,137      24,180      | 1,648       8,259
4    8    8    | 1,880       12,526     43,524      | 1,648       11,882
4    12   12   | 1,880       18,542     65,286      | 1,648       17,155
4    16   16   | 1,880       24,558     87,048      | 1,648       22,429
4    24   24   | 1,880       36,590     130,572     | 1,648       32,976
8    24   32   | 1,880       37,083     174,096     | 1,648       34,310
12   36   64   | 1,880       55,625     348,192     | 1,648       51,466
16   48   128  | 1,880       74,166     696,384     | 1,648       68,621
24   72   256  | 1,880       111,250    1,392,768   | 1,648       102,931

M3
En   HP   GPU  | GB Single   GB Multi   Metal       | CB Single   CB Multi
4    4    4    | 2,068       8,951      26,598      | 1,813       9,085
4    8    8    | 2,068       13,778     47,880      | 1,813       10,800
4    12   12   | 2,068       20,394     71,820      | 1,813       15,594
4    16   16   | 2,068       27,011     95,760      | 1,813       20,387
4    24   24   | 2,068       40,246     143,640     | 1,813       29,974
8    24   32   | 2,068       40,787     191,520     | 1,813       31,187
12   36   64   | 2,068       61,181     383,040     | 1,813       46,781
16   48   128  | 2,068       81,574     766,080     | 1,813       62,374
24   72   256  | 2,068       122,362    1,532,160   | 1,813       86,342

[email protected]

This is all well and good, but you cannot use an eGPU box with a 3080 card or 64 GB of RAM, and there are other issues that need to be resolved on these M1 systems. Will I get a new Mac with an Apple processor? Yes, but not for at least 2 more years, when everything is working as it should.

riklaunim

Quote from: [email protected] on March 14, 2021, 17:54:16
This is all well and good, but you cannot use an eGPU box with a 3080 card
AFAIK eGPU on Apple is limited to AMD cards only. Newer Nvidia cards are not supported.

_MT_

Quote from: Mate on March 14, 2021, 15:29:10
Additionally  ARM64 is almost as complex as x86-64. Reduced instruction set but still executes ~1000 different instructions.  Also x86 CPUs are not RISC inside. Every CPU(M1 too) is translating instruction sets to micro-ops and then execute.
Well, real processor designs are converging. The RISC folk found out they could never compete without more complex instructions, and the CISC folk found their instruction sets growing too complex, necessitating a sort of hardware emulation. You can also see wide designs getting faster and fast designs getting wider.

There are fundamental differences between ARM and x86. A big factor for x86 is backward compatibility. It's a strength and a weakness at the same time: they benefit from huge proliferation and from conservatism in IT, but it also ties their hands. They have to live with decisions made decades ago, under completely different circumstances. The rub is that once the market becomes open enough to change, they can take advantage of it as well, either by starting something new or by redefining what x86 is. Essentially, what we are seeing is that there are many ways to skin a cat, so to speak. Design work is about compromises. It isn't religion; it isn't dogma.

Another step we might see in the not so distant future is a sort of super-core design with many-way SMT. For example, instead of having 8 cores with 2-way SMT (meaning 16 threads in total), you could have a single core with 20-way SMT (20 threads processed simultaneously), with dynamic partitioning of resources between the many logical cores, achieving higher utilization. Of course, we might never see it if it doesn't work out.

_MT_

Quote from: riklaunim on March 14, 2021, 19:05:17
AFAIK eGPU on Apple is limited to AMD cards only. Newer Nvidia cards are not supported.
Unless something has changed, no current eGPU will work on the M1. I imagine the problem is simply that Apple isn't interested in releasing drivers - potentially to keep the market captive.

On their Intel platforms they use GPUs from AMD, and so macOS does have drivers for them.

Richard Costello

Quote from: Mate on March 14, 2021, 15:29:10
@Richardd Castello You are wrong - M1 uses more transistors than Intel/AMD core.

M1 = 16 billion / AMD = 40 billion, so don't talk out of your arse, 'Mate'. The size of the transistors and the relative size of the cores mean more supplemental silicon can go on the chip, hence SoC. That will mean more transistors on the M1 overall, but not per core. But that conversation just confuses the layman, which is who my post was aimed at. The same goes for the microcode and the x86 internals not being RISC-like - out-of-order execution etc. used to be classed as a RISC-like feature. But again, that's over-complicating things.

You are probably as aware as I am that ARM isn't really RISC anymore anyway. But in layman's terms the explanation needs relatively simple descriptions. It's called trying to help, rather than being a nob.
