Atari2600
I think this is all going too far, isn't it?
Is ignoring the only possibility?
Long since put him on my ignore list.
Not adding anything interesting ===> ignore list.
Did God prohibit AMD and Intel from using the same size of L2 cache as Apple?
Core 2 Duo was using a big shared L2 cache 10 years ago.
Yes? Physics is a bitch. L2 takes more die space than L3, and as you may have noticed, having a lot of L3 with good prefetch units can do a lot to improve the performance of multicore CPUs in parallel workloads with lots of inter-core communication, which is one sort of workload for which Intel and AMD have optimized their CPUs.

Compare that situation to Apple, who exclusively use their A-series SoCs in phones and tablets, where bursty, single-threaded (or sparsely-threaded) applications predominate. There you have less likelihood of core->core writes, meaning maintaining cache coherency is less important (and therefore shared L3 is less important). So Apple chose to spend a lot of die area on L2 that could have been spent elsewhere, or not spent at all (driving higher yields and/or lower cost per die).

Apple has the freedom to charge insane amounts of money for their hardware, and they don't have any OEMs telling them to trim costs, since they provide all their own SoCs for their own designs from top to bottom.
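To make the core->core write point concrete, here's a minimal cache-line ping-pong sketch (my illustration, assuming a POSIX system with pthreads, 64-byte lines, and arbitrary iteration counts; build with gcc -O2 -pthread). Two threads increment counters sharing one line, so every write forces coherency traffic between cores; the padded variant does the same work without the fight:

/* Cost of core-to-core writes via cache-line ping-pong (illustrative). */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 50000000UL

struct shared { volatile unsigned long a, b; };           /* one line        */
struct padded { _Alignas(64) volatile unsigned long a;    /* separate lines  */
                _Alignas(64) volatile unsigned long b; }; /* (C11 alignment) */

static struct shared s;
static struct padded p;

static void *bump(void *arg) {
    volatile unsigned long *x = arg;
    for (unsigned long i = 0; i < ITERS; i++) (*x)++;
    return NULL;
}

static double run(volatile unsigned long *a, volatile unsigned long *b) {
    struct timespec t0, t1;
    pthread_t t;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&t, NULL, bump, (void *)b);
    bump((void *)a);                      /* both cores write "their" word */
    pthread_join(&t, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void) {
    printf("same line:      %.2fs\n", run(&s.a, &s.b));
    printf("separate lines: %.2fs\n", run(&p.a, &p.b));
    return 0;
}

The "same line" case typically runs several times slower, and that coherency traffic is exactly what a big shared L3 with fast core-to-core paths is built to absorb.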
This is redacted. Apple's L2 is shared, so it's working just like the L3 of today's x86 designs.
AMD does use L2 as the last-level cache with Jaguar.
If Apple wants to move to an 8+4 DynamIQ configuration in 2021, they will have to rethink their SLC. Bottom line: L2 takes more transistors than L3 (or presumably SLC). When you increase L2 size, you make sacrifices elsewhere. And not all cache is made the same - look at the L2 and L3 performance of AMD chips versus Intel chips over the last ~15 years. Intel often had/has better cache performance.
None but latency. A big L2 will be slower than a small L2. AMD reduced the L1 (data or instruction) cache to 32KB just to make it faster. And besides, an AMD64 processor has to cache a lot more varied instructions than an ARM64 processor.

Did God prohibit AMD and Intel from using the same size of L2 cache as Apple? NO. Core 2 Duo was using a big shared L2 cache 10 years ago. So your complaint is fair, but you're crying on the wrong shoulder here. You should write a complaint email to Apple headquarters to stop developing such powerful cores, because your ego cannot digest that your brand new x86 looks like garbage compared to the Apple uarch.

Well, the problem is that you should have complained 5 years ago, because the very old Apple A9 Twister core from 2015 already had 7% higher IPC than today's 9900K Coffee Lake and Zen 2.
Stop with the insults/confrontational postings.
Take some more time off for reflection.
AT Mod Usandthem
I don't have Tim Cook's email, and he has no interest in emailing small flies anyway.
What makes you think that L2 SRAM takes more transistors than L3? They use exactly the same SRAM arrays.
Intel and AMD Zen L3 is actually part of a core: every core has a slice of L3, and addresses are interleaved evenly across all slices. Pre-Zen AMD chips had a monolithic L3 which didn't scale with core count, similar to Apple's SLC.
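As a toy model of that slice arrangement (illustration only; the real slice-selection hashes, e.g. Intel's, are undocumented and hash many physical address bits rather than doing a simple modulo):

/* Toy sliced-LLC model: each core contributes one slice, and line
   addresses are interleaved across slices, so shared capacity and
   bandwidth scale with core count instead of one monolithic array. */
#include <stdio.h>
#include <stdint.h>

#define LINE_BYTES 64
#define NUM_SLICES 8                 /* e.g. one slice per core */

static unsigned slice_for(uint64_t paddr) {
    return (unsigned)((paddr / LINE_BYTES) % NUM_SLICES);
}

int main(void) {
    /* Consecutive cache lines land on consecutive slices. */
    for (uint64_t a = 0; a < 8 * LINE_BYTES; a += LINE_BYTES)
        printf("paddr 0x%04llx -> slice %u\n",
               (unsigned long long)a, slice_for(a));
    return 0;
}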
For a phone, the only thing that matters is power efficiency.
In A13, Lightning and Thunder do NOT share L2. The last-level cache between those core blocks is the 16 MB SLC. It's not much different than Conroe (see my previous post), since presently there are only two Lightning cores on A13.

Based on Andrei's breakdown, it looks like the L2 is partially shared:
Last year we determined that the L2 cache structure physically must be around 8MB in size; however, it looks as if the big cores only have access to around 6MB. Apple employs an "L2E" cache – this is seemingly a region of the big-core L2 cache that serves as an L3 to the smaller efficiency cores (which themselves have their own shared L2 underneath in their CPU group).
In this region the new A13 behaves slightly differently, as there's an additional "step" in the latency ladder until about 6MB. Frankly I don't have any proper explanation as to what the microarchitecture is doing here up to the 8MB mark. It does look, however, like the physical structure has remained at 8MB.
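For context, latency-ladder plots like Andrei's come from pointer-chasing microbenchmarks along these lines (a rough sketch, assuming POSIX clocks; real tools pin threads, use huge pages, and defeat prefetchers far more carefully). The footprint at which ns-per-load steps up marks each level's effective size, which is how a ~6MB step inside an 8MB physical structure shows up:

/* Pointer-chasing latency probe: walk a randomly linked list of a given
   footprint and time dependent loads; steps in ns/load expose L1/L2/SLC
   boundaries. Illustrative only. Build with gcc -O2. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double chase(size_t bytes) {
    size_t n = bytes / sizeof(void *);
    void **buf = malloc(n * sizeof(void *));
    size_t *idx = malloc(n * sizeof(size_t));

    /* Shuffle indices so the walk order defeats simple prefetching,
       then link them into one big cycle. */
    for (size_t i = 0; i < n; i++) idx[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }
    for (size_t i = 0; i < n; i++)
        buf[idx[i]] = &buf[idx[(i + 1) % n]];

    const long loads = 20000000;
    struct timespec t0, t1;
    void **p = &buf[idx[0]];
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < loads; i++) p = (void **)*p;  /* dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (p == buf) putchar(' ');        /* keep the chase from being elided */
    free(idx); free(buf);
    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / loads;
}

int main(void) {
    for (size_t kb = 16; kb <= 32768; kb *= 2)
        printf("%6zu KB: %5.1f ns/load\n", kb, chase(kb * 1024));
    return 0;
}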
Yes, I for one am sick of trying to compare desktop/server/laptop cores to PHONE scores. If it's the same OS (and I don't care what OS), running more than one benchmark, probably at least 5 different ones, then it's a valid comparison. Otherwise it's crap. You can't compare a PHONE CPU to a desktop, server, or even a real laptop. Now tablets and the like are a different story, kind of on their own also.

I don't know why we have to keep bringing Apple into every thread on AT. Basically all the active threads.
Lisa keeps repeating that AMD's core is for high-performance computing. Whether they are succeeding or not is a different matter. They are not chasing the cell phone market, at least on the CPU side, for now.
Most of their architecture and patents have been targeted towards their EHP. At this point the major work around the architecture is the interconnect, not the core.
You can bet the core will have, at best, minor changes. But expect radical changes in interconnects, packaging, scalability, and coherency between CPUs, GPUs, FPGAs, etc.
Different goals, different designs. Flip the purpose and you will see a different outcome.
Good choice, but the guy is several levels of screwed already. He's included documents for TGL-H as well xd

Update:
I edited this because I think the info was not meant to go out, unlike a GitHub commit, which is intentional.
Please edit the quote in the post, too.
RIP.
Since this is a Ryzen 4000 series / Zen3 thread, what good would it do to start a new Intel vs Apple discussion in here?

How about comparing the Intel CPU in a Surface Pro with the Apple CPU in an iPad Pro? They're in the same form factor, targeting the same market.
That can also have odd numbers for single-CCD SKUs :>

First, hi all!
I have not seen anyone post anything about the implications of the new CCX arrangement for the product lineup.
AMD could have desktop products from 1 (yeah, I know) to 16 cores in this arrangement (see the sketch after the list):
1
2
4 (4100)
6 (4300)
8 (4600)
10 (4700)
12 (4800)
14 (4900)
16 (4950)
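For what it's worth, here's a tiny sketch of that SKU space (the 4100/4300/etc. numbers above are the poster's guesses, not announced parts), assuming one or two 8-core CCDs with any number of cores fused off per CCD:

/* Enumerate core counts reachable with one or two 8-core CCDs when any
   number of cores per CCD may be disabled: every count from 1 to 16. */
#include <stdio.h>
#include <stdbool.h>

#define CORES_PER_CCD 8

int main(void) {
    bool reachable[2 * CORES_PER_CCD + 1] = { false };

    for (int c = 1; c <= CORES_PER_CCD; c++)     /* single-CCD parts */
        reachable[c] = true;
    for (int a = 1; a <= CORES_PER_CCD; a++)     /* dual-CCD splits  */
        for (int b = 1; b <= CORES_PER_CCD; b++)
            reachable[a + b] = true;

    for (int c = 1; c <= 2 * CORES_PER_CCD; c++)
        if (reachable[c]) printf("%2d-core SKU possible\n", c);
    return 0;
}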
Yes, but those would be odd to market.

That can also have odd numbers for single-CCD SKUs :>
Sure, they could, but the question then becomes: is there a point to that much segmentation? They are currently selling a 12-core at $500 MSRP and a 16-core at $750. Is there really room for a 14-core in between, and is there a significant market consisting of "people who want more than 12 cores but don't need 16 or are willing to pay more than $500 but less than $750"? Because that sounds unlikely to me. I know Intel did this, but they also had astronomically inflated prices, which made the artificial segmentation seem to make much more sense. That question becomes even more precarious between 12 and 8, as the price difference shrinks to half as much even if the relative increase in core count is higher.
Rather than excessively binning for functional cores like this (separate bins for every possible number of working cores per CCD), it seems much more sensible to stick with broader core-count categories (8 working, >=6 working, etc.), as this would leave much more room to further bin for clock scaling, efficiency, etc. A chip with 7 working cores might kick butt as a fast and efficient 12-core or 6-core CPU even if one of its working cores is a "dud" (that scales poorly, has lots of leakage, etc.), while the same piece of silicon would fall into a much worse 7-core (14-core CPU) bin due to that single dud core. It might even be rejected and have to go through a second round of binning to see if it qualifies for a lower core-count bin, making binning more complex, time-consuming, costly, and wasteful.

Not to mention that half the bins (every odd number) would be for only one CPU (or a couple, in tiers with multiple SKUs of the same core count) rather than several as in the current implementation.
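A rough sketch of that broader-bin idea (bin names, thresholds, and the fast/dud distinction are made up for illustration): classify each CCD by how many cores pass both the functional and the speed/leakage screens, so a single dud core merely lowers the usable count instead of forcing a reject-and-retest cycle:

/* Broader core-count binning: wide categories instead of one bin per
   exact working-core count. Thresholds and labels are hypothetical. */
#include <stdio.h>

struct die {
    int working_cores;   /* cores passing the functional test (max 8) */
    int fast_cores;      /* of those, cores meeting the speed screen  */
};

static const char *bin(struct die d) {
    int usable = d.fast_cores;           /* a dud core just drops this */
    if (usable >= 8) return "8-core CCD bin (e.g. top 16-core SKUs)";
    if (usable >= 6) return ">=6-core CCD bin (e.g. 12-core SKUs)";
    if (usable >= 4) return ">=4-core CCD bin (e.g. budget SKUs)";
    return "reject / salvage";
}

int main(void) {
    struct die seven = { .working_cores = 7, .fast_cores = 6 };
    struct die eight = { .working_cores = 8, .fast_cores = 8 };
    printf("7 working, 1 dud    -> %s\n", bin(seven));
    printf("8 working, all fast -> %s\n", bin(eight));
    return 0;
}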