Quote from: Ben3000 on December 22, 2020, 03:39:18
Quote from: Mate on December 20, 2020, 22:22:16
@_MT_
(sorry for the links, I can't post them in comments :< )
Quote: If you need to run at 4.5 GHz to match the performance, I can't see you matching efficiency (under high load) on the same manufacturing node.
Even if the M1's high-performance cores are probably a lot larger? As I said earlier, my Ryzen 4900HS clearly beats the M1 if I underclock it (under-TDP it? not sure how to describe it) to 20W, so in theory it should be as efficient under load as the M1 drawing 14W. AnandTech reviewed the M1 Mac and discovered that under a one-core load the mini draws 10.5W, while at idle it's only 4.2W.
images.anandtech. com /graphs/graph16252/119344.png
Let me quickly dispel something. Underclocking is the same game as overclocking: you are trying to get as close as possible to the edge of what your particular piece of silicon can do.
There is a reason why your 4900HS uses a specific amount of power, within a specific range: namely, to allow as many CPUs as possible to be sold within that range. So your comparison is the same as overclocking a CPU and claiming it's faster than the competition, while maybe 90% of the other owners of that same CPU cannot get that gain.
Next error:
You are looking at 10.5W of WALL power draw for a single-core load on the Mac mini. That includes:
* Power conversion loss
* Memory power consumption
* GPU
* Controller
* Fan
* ... and every other component
Whereas the numbers you are trying to use for your 4900HS are a totally different measurement (package power, not wall power).
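To make the wall-versus-package distinction concrete, here is a rough back-of-the-envelope sketch (the wattages are the ones quoted in this thread; the 90% conversion efficiency and the idle-subtraction approach are assumptions, not measurements):

```python
# Rough sketch: why a wall reading and a reported package power are not comparable.
# Wattages are the ones quoted in this thread; the conversion efficiency is assumed.

M1_WALL_SINGLE_CORE = 10.5    # W, whole Mac mini at the wall, one-core load
M1_WALL_IDLE        = 4.2     # W, whole Mac mini at the wall, idle
CONVERSION_EFFICIENCY = 0.90  # assumed PSU/VRM efficiency, not a measured value

# Crude estimate of what the active core (plus its share of the SoC) adds on top of idle:
m1_core_estimate = (M1_WALL_SINGLE_CORE - M1_WALL_IDLE) * CONVERSION_EFFICIENCY
print(f"M1 single-core estimate: ~{m1_core_estimate:.1f} W")  # ~5.7 W

# A 4900HS "package power" readout already excludes PSU losses, RAM, SSD, fan, etc.,
# so comparing it directly against a wall number overstates the M1's consumption.
RYZEN_PACKAGE_POWER = 35.0    # W, example package-power readout, not a wall number
print(f"4900HS package power:    {RYZEN_PACKAGE_POWER:.1f} W")
```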
youtu+++be/y_jw38QD5qY?t=684
Replace the +++ with a dot.
Notice how, in the real world, the package power reported by the 4900HS is 54W, then drops to 35W, for 20% faster performance. The M1, meanwhile, stays at 13W for 20% less performance than the 4900HS, using 2.4 to 4.8 times less power.
A theoretical 8+4 core M1X could produce 30% faster results than a 5900HS at twice the power efficiency, given the benchmark leaks. So no, AMD cannot magically make power efficiency appear out of thin air.
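For anyone who wants to sanity-check the efficiency claim, it is just performance divided by power. A minimal sketch; the benchmark scores below are placeholders, and only the wattages come from the post above:

```python
# Minimal perf-per-watt comparison. The scores are placeholders to show the
# arithmetic; only the wattages come from the post above.

def perf_per_watt(score: float, watts: float) -> float:
    """Benchmark score per watt of package power; higher is better."""
    return score / watts

m1_score, m1_watts = 100.0, 13.0        # M1 as the baseline (placeholder score)
ryzen_score, ryzen_watts = 120.0, 35.0  # "20% faster" at 35 W package power

m1_eff = perf_per_watt(m1_score, m1_watts)
ryzen_eff = perf_per_watt(ryzen_score, ryzen_watts)
print(f"M1:     {m1_eff:.2f} points/W")
print(f"4900HS: {ryzen_eff:.2f} points/W")
print(f"M1 efficiency advantage: {m1_eff / ryzen_eff:.1f}x")  # ~2.2x with these numbers
```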
AMD and Intel produce desktop CPUs, where power consumption is secondary to performance, and then scale those down to laptops.
Apple started from phone parts, where power efficiency is primary, and scaled them up into its laptop parts. That approach is paying off with much better efficiency.
And while one can complain about Apple's 5nm node: Apple made the investment to buy 5nm capacity (and 80% of it in 2021); AMD did not. AMD fans did not complain when AMD got 7nm while Intel was still stuck on 14nm.
So much for the whole "things will be different when AMD is at 5nm": by that time we are looking past 2022 (AMD is going 6nm in 2022, based on the leaks), which means Apple will already be looking at the M3/M3X on 3nm.
People underestimate how big a lead Apple has on the competition in this regard. Anyway, enjoy the fun competitive times. One thing I do see is that, for Apple, doing away with Intel was a good move, as a lot more people are interested in their products as a result.
Quote from: xpclient on December 21, 2020, 14:09:47
Quote from: vertigo on December 20, 2020, 18:57:49: While GPU encoding is indeed much faster, it's nowhere near as good, and it results in worse quality and larger file sizes. I know that's changed a lot with the RTX NVENC, but even though it's a lot better, I still wonder if it can compete with software encoding on pure quality, i.e. it might be comparable at similar speeds, but I suspect software encoding, when slowed down, would still be superior. So if you want to encode videos, software encoding is still probably the best option in order to maximize quality and file size, whereas if you're talking about Twitch streaming, yeah, NVENC is probably better, or at least as good.
It can compete on pure quality according to this comparison: https://unrealaussies.com/tech/nvenc-x264-quicksync-qsv-vp9-av1/ It beats x264 on Slow as well. Every encoding task has to be evaluated in terms of the time spent too. And Turing NVENC sort of marked a transition: with the quality being so good and the speed insanely fast, the only reason left now to use software encoding is if you want to use low bitrates (2000 kbit/s or lower). Everyone's connections are fast enough to stream high-bitrate video too.
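For anyone who would rather test this on their own clips than take either side's word for it, here is a rough sketch of an A/B comparison with ffmpeg. It assumes an ffmpeg build with libx264 and NVENC support; the input file, bitrate and presets are just placeholders to adjust:

```python
# Rough sketch of an x264-vs-NVENC A/B test using ffmpeg (assumes an ffmpeg build
# with libx264 and h264_nvenc available; tweak presets/bitrates to your use case).
import subprocess
from pathlib import Path

SOURCE = "input.mp4"  # placeholder clip

jobs = {
    # Software encode: slow preset, quality-targeted (CRF).
    "x264_slow.mp4": ["-c:v", "libx264", "-preset", "slow", "-crf", "20"],
    # Hardware encode on the GPU: bitrate-targeted.
    "nvenc.mp4": ["-c:v", "h264_nvenc", "-b:v", "6M"],
}

for out, codec_args in jobs.items():
    cmd = ["ffmpeg", "-y", "-i", SOURCE, *codec_args, "-an", out]
    subprocess.run(cmd, check=True)
    size_mb = Path(out).stat().st_size / 1e6
    print(f"{out}: {size_mb:.1f} MB")

# For a score instead of eyeballing, compare each encode back to the source with a
# metric filter such as ffmpeg's ssim or libvmaf, at matched bitrates; quality
# comparisons at different bitrates are meaningless.
```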
Quote from: Mate on December 20, 2020, 22:45:08
Quote: Imagine that instead of having AVX512 units inside your generic high performance cores, you'd have a core specifically designed for SIMD which could go even wider than 512. Yes, it's the iGPU.
Actually, the GPU is also a coprocessor, the same as the sound chip on the motherboard.
Quote: I think Mate is a bit confused on this. x86 does have certain properties which complicate implementation. And because of backward compatibility, they can't get rid of them.
I was trying to say exactly this. x86 decoders are already a lot more complex, and now you want to add additional tasks to them too.
Quote: For example, perhaps exactly because x86 had many specialized instructions, they opted for variable length (it allows you to make common instructions short so you can get away with a smaller instruction cache, and you use less memory and storage; you can take less space on average while having a lot more instructions).
I'm certain it was one of the key factors of CISC's success in the 80s, when RAM was very small. Now it makes almost no impact. Code occupying 2x more space in memory? No problem at all when a simple GUI takes up far more.
Quote: One big advantage of ARM is that they're not afraid to break backward compatibility. Intel has to live with decisions they made in a different age.
Agree - this is a huge problem. x86 processors need to run even code written for the 286... ARM, on the other hand, has only the 32/64-bit switch, not compatibility with 16/32/64-bit modes plus dozens of instruction set extensions.
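To put rough numbers on the code-density point: x86-64 encodes common instructions in one to a few bytes, while every AArch64 instruction is 4 bytes. A toy sketch (the instruction mix is invented for illustration; the per-instruction byte counts are standard encodings):

```python
# Toy illustration of the code-density argument: x86-64 uses variable-length
# encodings, AArch64 is fixed at 4 bytes. The instruction mix is invented.

x86_lengths = {             # bytes per encoded x86-64 instruction
    "push rbp": 1,          # 55
    "ret": 1,               # C3
    "xor eax, eax": 2,      # 31 C0
    "mov rax, [rbp-8]": 4,  # 48 8B 45 F8
    "mov rax, imm64": 10,   # 48 B8 + 8 immediate bytes
}
AARCH64_LENGTH = 4          # every AArch64 instruction is 4 bytes

avg_x86 = sum(x86_lengths.values()) / len(x86_lengths)
print(f"average x86-64 length for this mix: {avg_x86:.1f} bytes")  # 3.6 bytes
print(f"AArch64 length:                     {AARCH64_LENGTH} bytes")

# Denser code mattered a lot when RAM was tiny; the flip side today is that a
# variable-length ISA makes the decoders (finding where each instruction starts)
# much more complex, which is the point argued above.
```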
Quote from: vertigo on December 20, 2020, 17:05:43
I think it's silly since I doubt the iGPU actually has 12 generations (that they really improved it every single time). On the other hand, I think that it's more consumer friendly and avoids unnecessary confusion. It's a processor. The GPU is inside. Having multiple generations associated with a single product might give information to us, but it would just confuse ordinary consumers. Consider the mess AMD created with Zen+: releasing the mobile 3000 series with a different core than the desktop parts, skipping the 4000 series on desktop, and then mixing two cores in the mobile 5000 series. What a dog's breakfast.
That makes sense. Thanks. Still seems like they're playing some misleading marketing trickery, though, trying to play it off as though it's several generations in. Technically, the iGPU is, but the Xe design isn't.
Quote: If you need to run at 4.5 GHz to match the performance, I can't see you matching efficiency (under high load) on the same manufacturing node.
Quote: Yes, Apple has a manufacturing edge. But that core kicks a**. So much that it can bear comparison with Zen 3 running at desktop frequencies. Can you imagine what 8 Firestorm cores could do with a decent power limit?
Quote: Hopefully, though, the M1 will push AMD and Intel to improve their efficiency, so x86 options will become better with battery life and noise as well.
They would need to tweak their CPUs to work at a lower TDP and with a significantly reduced boost clock. The manufacturing process is also a bigger 'player' there. Before Renoir, AMD was using a 12nm process that was worse than Intel's 14nm, their mobile CPUs sucked, and on top of that Intel had a significant edge in battery life too. AMD celebrated the switch to 7nm by releasing CPUs with 2x better performance than the generation before, and with better energy efficiency.
Quote from: vertigo on December 20, 2020, 19:19:02
I think I would call what they're doing a unified processor. A processor that offers a number of different cores with very different characteristics that can cooperate efficiently (instead of a number of separate processors that are external to each other). Imagine that instead of having AVX512 units inside your generic high performance cores, you'd have a core specifically designed for SIMD which could go even wider than 512. Yes, it's the iGPU. Why not look at iGPU units as just another type of core that is very good at certain types of tasks. Instead of having one monster core that can do everything, let's have independent cores that can each do very different things (it's, really, a question of packaging and organization). What I'm curious about is how far they can take it. When they run out of die space, what will they do? It's one thing to do this for the MBA. It's another to scale this up to the Mac Pro. The M1 has four types of cores inside. Historically, iGPUs were too weak for such thoughts. They existed for efficiency, not performance. The problem with a dGPU is that it's far away and there are significant overheads (it's worth it only for relatively big tasks). I think they're looking in this direction - improving cooperation through better integration.
If I'm understanding you correctly, you're saying ARM, due to its relative simplicity, makes it easier to implement coprocessors, and instead of having one powerful CPU doing everything, they're working toward a central CPU (yes, I realize that's redundant, but intentionally so) for main tasks that's assisted by numerous specialized coprocessors for various tasks. So basically, using delegation to accomplish things with better efficiency vs just having everything done by a more generalized chip (the CPU). Is that right? Sounds like an interesting approach, but the downside would be that due to their specialization those coprocessors can only be utilized a portion of the time, vs the CPU being able to be used for everything, so while it would be more efficient, it will require more space.
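As a toy model of that delegation idea, here is a sketch that routes each task to whichever unit finishes it with the least energy. The core types, throughput and power figures are all invented for illustration; the point is only that a specialized unit can win on energy even if it sits idle most of the time:

```python
# Toy model of heterogeneous dispatch: route each task to the unit that costs the
# least energy for it. All throughput/power figures are invented for illustration.
from dataclasses import dataclass

@dataclass
class Unit:
    name: str
    throughput: dict  # work items per second, keyed by task type
    power: float      # watts while active

UNITS = [
    Unit("big core",    {"scalar": 10,  "simd": 20,  "ml": 5},   5.0),
    Unit("little core", {"scalar": 4,   "simd": 6,   "ml": 2},   0.5),
    Unit("gpu",         {"scalar": 1,   "simd": 200, "ml": 50},  8.0),
    Unit("npu",         {"scalar": 0.1, "simd": 10,  "ml": 400}, 2.0),
]

def energy(unit: Unit, task_type: str, work: float) -> float:
    """Joules needed to finish `work` items of `task_type` on `unit`."""
    return work / unit.throughput[task_type] * unit.power

def dispatch(task_type: str, work: float) -> Unit:
    """Pick the unit that finishes the task with the least energy."""
    return min(UNITS, key=lambda u: energy(u, task_type, work))

for t in ("scalar", "simd", "ml"):
    best = dispatch(t, work=1000)
    print(f"{t:>6}: {best.name:<11} ({energy(best, t, 1000):.0f} J)")
```

With these made-up numbers the scalar work lands on the little core, the SIMD work on the GPU and the ML work on the NPU, each at a fraction of the energy the big core would burn on the same task.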
Quote from: Mate on December 20, 2020, 18:26:17: Well, it depends on how you define efficiency. According to TSMC, 5nm improved energy efficiency by 30%, and the M1 uses 14W under load. Ryzen Renoir (8 cores) can easily outperform the Apple chip in general tasks at 20W, so in theory it should be more efficient under load, as 20 × 0.7 = 14.
I certainly wouldn't put it that way ("can easily outperform"). I define efficiency as an average number of operations per watt under a given load (and I'm interested in general arithmetic). You have to remember that the M1 is a 4+4 configuration. There are two different cores at play. It's the high-performance Firestorm cores that are of interest to me when we are comparing the state of the technology. If memory serves me right, Firestorm running at around 3 GHz can square up with Zen 3 running at roughly 4.5 GHz (Zen 2 can't go high enough). That's a huge difference, and it makes a significant difference to efficiency (it also depends on where exactly the voltage-frequency curve sits). The higher the frequency, the higher the voltage needed to maintain stable operation. And power goes up with the square of voltage (meaning efficiency can drop quickly at higher frequencies). If you need to run at 4.5 GHz to match the performance, I can't see you matching efficiency (under high load) on the same manufacturing node.
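The "power goes up with the square of voltage" remark is the usual dynamic-power relation, P ≈ C·V²·f, and higher clocks also need higher voltage. A quick sketch with made-up voltage points to show why 4.5 GHz costs so much more than 3 GHz:

```python
# Dynamic power scales roughly as P ~ C * V^2 * f, and higher frequencies need
# higher voltage to stay stable. The voltage points below are invented.

def dynamic_power(capacitance: float, voltage: float, frequency_ghz: float) -> float:
    """Relative dynamic power in arbitrary units."""
    return capacitance * voltage**2 * frequency_ghz

C = 1.0  # arbitrary constant; it cancels out in the ratio

low  = dynamic_power(C, voltage=0.85, frequency_ghz=3.0)  # hypothetical 3.0 GHz point
high = dynamic_power(C, voltage=1.25, frequency_ghz=4.5)  # hypothetical 4.5 GHz point

print(f"relative power at 4.5 GHz vs 3.0 GHz: {high / low:.1f}x")
# ~3.2x the power for 1.5x the clock with these numbers: perf/W falls off quickly
# near the top of the frequency range, which is why the comparison point matters.
```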
Quote from: Mate on December 20, 2020, 18:26:17
@_MT_
Quote: Even if AMD was currently on the 5nm node, they wouldn't be able to match M1's efficiency.
Well, it depends on how you define efficiency. According to TSMC, 5nm improved energy efficiency by 30%, and the M1 uses 14W under load. Ryzen Renoir (8 cores) can easily outperform the Apple chip in general tasks at 20W, so in theory it should be more efficient under load, as 20 × 0.7 = 14.
However, it's clear that Apple aims to cover many use cases with dedicated coprocessors, which will make their processors very efficient. It is also probably easier to implement more coprocessors on ARM than on x86, as on x86 this makes the already complex decoders even more complex. As things are now, I can clearly see MacBooks in a year or two simply being the better alternative to premium x86 laptops in performance/price in every segment. AMD and Intel have very complex processor designs, and it will be harder for them to keep up with Apple each year.
@Vertigo
Maybe I used the wrong words - I've been using Windows (on my personal computer) and macOS side by side for years. However, the M1 now comes very close to meeting all my performance expectations (also for gaming!) while blowing x86 devices away in terms of battery life and silence. My G14 would already be on eBay if M1 MacBooks could handle two external displays properly. I'm really tired of the annoying noise and heat for no reason.
Quote from: xpclient on December 20, 2020, 17:52:28
Quote from: neblogai on December 20, 2020, 14:12:25
RDNA2 is coming to Van Gogh - a competitor to Intel's UP4 APUs - and it will likely trounce them in games in the ultraportable/fanless segment. For gaming laptops, the H-series will use a dGPU, and the U-series can use dGPUs like the MX350/450 too. And that iGPU in Tiger Lake is nothing special - Intel was showing wonders in their slides, but in real-life laptop tests even the current Renoir with its Vega iGPU goes toe to toe with it. Lucienne/Cezanne, with higher enabled clocks and CU counts, is likely to be somewhat faster.
The only real area where Intel leads is Thunderbolt. Cezanne is still PCIe 3.0, with no Thunderbolt or USB4. So for external dGPUs, Intel is the only option.
You are thinking purely from a gaming performance perspective. I was thinking more from a media encoding perspective, not just gaming. I already have a gaming laptop with an Intel Coffee Lake Refresh CPU and an Nvidia RTX 2080 GPU, but I also use it heavily for media encoding, since NVENC after Turing is the best hardware H.264 and HEVC encoder in the industry.
Tiger Lake has added VP9, a reworked quality-focused HEVC encoder, and AV1 hardware decoding, which makes it attractive for this market, so I'd like to check it out. :) Nvidia's MX graphics lack NVENC entirely, and AMD's Video Core Next is lacking in quality and features compared to Intel's and Nvidia's implementations.
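If you want to see which hardware acceleration methods and encoders your own ffmpeg build and GPU actually expose before reading too much into any of this, a small sketch (assumes ffmpeg is on the PATH; the output depends entirely on your build and drivers):

```python
# List the hardware acceleration methods and hardware encoders exposed by the
# local ffmpeg build (assumes ffmpeg is on PATH; results depend on build/drivers).
import subprocess

def ffmpeg_lines(*args: str) -> list[str]:
    result = subprocess.run(["ffmpeg", "-hide_banner", *args],
                            capture_output=True, text=True)
    return (result.stdout + result.stderr).splitlines()

print("Hardware acceleration methods:")
for line in ffmpeg_lines("-hwaccels"):
    print(" ", line)

print("\nHardware encoders (NVENC / QSV / AMF / VideoToolbox):")
for line in ffmpeg_lines("-encoders"):
    if any(tag in line for tag in ("nvenc", "qsv", "amf", "videotoolbox")):
        print(" ", line)
```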