Question How ahead is Intel in CPU design compared to AMD?

Adonisds · Feb 9, 2020

If we could get both companies to design a CPU core now on the same process and using the same number of transistors, who would be ahead and by how much? I'm assuming Intel would be ahead because Skylake has a similar IPC to Zen 2, but Zen 2 uses a more dense node, and Skylake is from 2015 while Zen 2 is from 2019.

The both companies using the same process scenario is just to illustrate the problem and I know it will never happen, but maybe there is a smart way to compare current CPU designs and find out who is ahead and by how much. Wikichip has die sizes and die pictures of Sunny Cove and Zen 2, but I'm not sure how to interpret and compare them.

Another question: If Intel had to design future cores using the same process and the same die size as Skylake, how much further could the Skylake design be improved? Also assuming it will have to have to be x86 too and have all the same functions.

lobz · Feb 11, 2020

Carfax83 said:
Yeah because Skylake-X has comparatively much less L3 cache than mainstream parts, slower non inclusive L3 cache, and the mesh topology which all hurt its gaming performance.

That said, Cascade lake-X can easily be tweaked to offer excellent gaming performance with overclocking (both core and mesh) and using high speed DRAM. I really want to see a desktop Willow Cove part with more than twice the amount of L3 cache as Cascade lake-X, assuming the HEDT version would keep the same cache configuration as the mobile version.

I already said that Zen 2 is a strong core. But for the sake of fairness, many people forget that it's also going up against an almost 5 year old Intel core.

If Intel hadn't screwed up their 10nm process, a Willow Cove desktop variant would be significantly more performant.

That was purely an answer to you stating that the L3 in Zen 2 can only do so much.

On the desktop you will only see WC backported to 14nm (if everything goes according to plan) in 2021 at the earliest, and I really don't think you'll like what it's gonna be like. Though I'm sure that the real world benchmarks are going to look awesome on the slides.

Please don't misunderstand me, Sunny Cove is an awesome uarch already, but here your phrase would fit perfectly: the amazing design can do only so much, trying to hide both the epic failure the 10nm is, and Intel's arrogance.

BigDaveX · Feb 11, 2020

coercitiv said:
The problem with this different timeline thinking of "what if X hadn't failed" is the weird butterfly effect. We know architectures are built with several production parameters in mind: cost, power, performance. So once you "go back in time" and change the outcome of node R&D, then you might influence architecture capabilities as well, because fixing a node is not done for free.

Let's explore. Let's say I get to go back in time and warn Intel of their 10nm fiasco. They believe me because I know the secret Intel password for time travelers. Armed with this knowledge Intel retargets 10nm parameters for 100% success chance. Better safe than sorry, right? Definitely better than this reality, but expect engineers to inform management that Cove arch was built with specific density in mind, hence in order to maintain the high margins they're used to... Cove will get some area adjustments.

So which Willow Cove should we compare Zen 2 against, the one we see today on a limping 10nm node... or the one they would have built for a realistic 10nm node?

Well, remember that the whole reason Intel started up the Tick-Tock cadence in the mid-2000s was to avoid a repeat of Prescott, which ended up being a perfect storm of a more power-hungry design ending up on a 90nm process that suffered from massive leakage issues at the kind of frequencies that Intel were pushing for.

That being the case, Intel would have used Cannon Lake as the guinea pig for their hypothetical non-broken 10nm process, seen what was feasible, and designed Ice Lake accordingly. As it was, all that Intel learned from Cannon Lake was that their initial 10nm process wasn't very useful for producing much more than pretty drinks coasters.

Gideon · Feb 11, 2020

Carfax83 said:
Also, I am close to 50ns for memory latency with quad channel DDR4 3400.

I'm not totally convinced about AMD's decision to embrace chiplet based designs. Although the Zen 2 core is strong, the step backward for memory latency is disconcerting to me. Sure the massive L3 cache helps a lot to hide memory latency, but that can only get you so much. It will be interesting to see what improvements if any Zen 3 will have in that area.

I still don't understand why people are keep pointiing at the chiplet desing for AMD latency problems. Their memory controller had bad latency before going chiplets (zen and zen+) and this was mostly due to the uncore design. The latency is suboptimal due to design-time and complexity issues (AMD didn't have the resources for zen 1 and focused elsewhere for zen 2). It was probably a conscious descision as server workloads are more sensitive to bandwidth rather than latency.

Chiplets are only a small part of AMD's latency disrepancy to Intel (which is about 15-20ns). Several electrical engineers have mentioned on twitter (including @chiakokhua) that going chiplets alone should only add <= 5ns of latency.
If you don't believe them, this packaging technology (BoW), which is very similar to what AMD does, also has a similar design goal (and has working chips on GloFo 14nm):

Since dies are spaced apart, a trace length of 25mm to 50mm is required with a latency of sub-5ns

TL;DR If the chiplets are under 5cm apart, It should should only cost up to 5ns of latency, unless the designers are incompetent. This seems to be backed up by the fact that Zen 2 latencies are about the same as Zen 1 (and between 5-8ns worse than Zen+).

Regarding Zen 3 - It seems the memory controller has been significantly reworked (a post where underfox lists patents of which many directly mention MC rework) That is on top of virtual uop cache (spills to L1 and above if needed) and a new unified L3 cache per CCD. Things *should* improve considerably with Zen 3.

coercitiv · Feb 11, 2020

Gideon said:
I still don't understand why people are keep pointiing at the chiplet desing for AMD latency problems.

It's simple. You see Achilles, you're gonna keep pointing at the heel. It doesn't matter if it's the wrong heel.

2blzd · Feb 11, 2020

lobz said:
1: Skylake-X is the worst example for IPC in gaming, as it falls even behind consumer Skylake.
2: "Sure the massive L3 cache helps a lot to hide memory latency, but that can only get you so much." - that can only get you so much, that Zen 2 has a higher IPC in the vast majority of workloads. Only so much.

Fun fact. 6900k is not Skylake-X...It's Haswell-E.

Carfax83 · Feb 11, 2020

Gideon said:
I still don't understand why people are keep pointiing at the chiplet desing for AMD latency problems. Their memory controller had bad latency before going chiplets (zen and zen+) and this was mostly due to the uncore design. The latency is suboptimal due to design-time and complexity issues (AMD didn't have the resources for zen 1 and focused elsewhere for zen 2). It was probably a conscious descision as server workloads are more sensitive to bandwidth rather than latency.

Chiplets are only a small part of AMD's latency disrepancy to Intel (which is about 15-20ns). Several electrical engineers have mentioned on twitter (including @chiakokhua) that going chiplets alone should only add <= 5ns of latency.
If you don't believe them, this packaging technology (BoW), which is very similar to what AMD does, also has a similar design goal (and has working chips on GloFo 14nm):

TL;DR If the chiplets are under 5cm apart, It should should only cost up to 5ns of latency, unless the designers are incompetent. This seems to be backed up by the fact that Zen 2 latencies are about the same as Zen 1 (and between 5-8ns worse than Zen+).

Well here is the benchmark in question which raised my eyebrow. The 2700x has a 22.5% advantage in memory latency compared to the 3700x. That to me seemed ridiculously large, and the reviewer himself said the chiplet design might be to blame, so that is what put that idea in my head:

These graphs all tell an interesting story, but that last one is the most curious. The revised “Zen+” cores in the second-generation Ryzen parts boasted significant improvements in memory latency over the original Ryzen CPUs—as you can see here—so the major regression in this metric is a bit disappointing. It was likely unavoidable in light of the chiplet-based construction of these CPUs, though.

Source

Regarding Zen 3 - It seems the memory controller has been significantly reworked (a post where underfox lists patents of which many directly mention MC rework) That is on top of virtual uop cache (spills to L1 and above if needed) and a new unified L3 cache per CCD. Things *should* improve considerably with Zen 3.

Well I certainly hope so. Zen 2 was an important step for AMD, but it is Zen 3 that will be the real show stopper and nail Intel's ass to the wall if it they can come up with another large double digit IPC gain while improving on the strengths of Zen 2 and ameliorating or nullifying the weaknesses.

And as I said before, the MASSIVE L3 caches that Zen 2 has is definitely effective for minimizing the impact of memory latency. Some of the games that Intel used to have a huge lead over AMD, ie the Far Cry series which have very little rendering parallelization, AMD was able to make big gains and almost close the gap:

Carfax83 · Feb 11, 2020

2blzd said:
Fun fact. 6900k is not Skylake-X...It's Haswell-E.

Actually the 6900K is Broadwell-E

moinmoin · Feb 11, 2020

Gideon said:
I still don't understand why people are keep pointiing at the chiplet desing for AMD latency problems. Their memory controller had bad latency before going chiplets (zen and zen+) and this was mostly due to the uncore design. The latency is suboptimal due to design-time and complexity issues (AMD didn't have the resources for zen 1 and focused elsewhere for zen 2). It was probably a conscious descision as server workloads are more sensitive to bandwidth rather than latency.

Chiplets are only a small part of AMD's latency disrepancy to Intel (which is about 15-20ns). Several electrical engineers have mentioned on twitter (including @chiakokhua) that going chiplets alone should only add <= 5ns of latency.
If you don't believe them, this packaging technology (BoW), which is very similar to what AMD does, also has a similar design goal (and has working chips on GloFo 14nm):

TL;DR If the chiplets are under 5cm apart, It should should only cost up to 5ns of latency, unless the designers are incompetent. This seems to be backed up by the fact that Zen 2 latencies are about the same as Zen 1 (and between 5-8ns worse than Zen+).

Regarding Zen 3 - It seems the memory controller has been significantly reworked (a post where underfox lists patents of which many directly mention MC rework) That is on top of virtual uop cache (spills to L1 and above if needed) and a new unified L3 cache per CCD. Things *should* improve considerably with Zen 3.

Good call. Btw. OP @Adonisds cache hierarchy with the different latencies is a crucial part of CPU design that directly affects IPC performance. All decisions are done around a balance, Skylake X is a very good example for where the focus was, through a mesh, to deliver a scalable uncore that keeps latencies within a small range. Unless overclocked those are significantly higher than those of the preceding ring bus. Unfortunately the mesh also needs much more power for all the links it offers.
Also whereas Intel has the advantage in L3$ and IMC latencies, AMD is actually in the clear lead regarding L1$ latencies: Whereas with its size increase Intel also increased the L1$ latency to 5 cycles in Ice Lake, Zen always had a L1$ latency of 4 cycles. That's a 25% time penalty for Ice Lake at the first bottleneck a core faces.

TheGiant · Feb 11, 2020

Carfax83 said:
That said, Cascade lake-X can easily be tweaked to offer excellent gaming performance with overclocking (both core and mesh) and using high speed DRAM. I really want to see a desktop Willow Cove part with more than twice the amount of L3 cache as Cascade lake-X, assuming the HEDT version would keep the same cache configuration as the mobile version.
significantly more performant.

gaming wise the whole skl-x lineup is a big zero to gaming
you can not tweak it to be competitive, you can only make the mesh penalty lower
your 6900K was a golden CPU- best of everything in that age
skl-x pretty much ruined it

Carfax83 said:
If Intel hadn't screwed up their 10nm process, a Willow Cove desktop variant would be significantly more performant.

but they did
without AMD challenging them - hello 4cores and 10nm nowhere to be seen

VirtualLarry · Feb 11, 2020

Arkaign said:
At the same time I feel like AMD could do more by adding a Vega chiplet to more AM4/AM5 consumer CPUs. The Ryzen 1300 for example is really hard to justify as a budget work PC option due to needing a GPU to go with it. This is true even up towards higher end options. I'd like to build office PCs using Ryzen 3600, but it's less attractive if I also have to buy a new GPU to go with it (can't really get away with used or old stock for business clients).

I hear you on that. I had high hopes for MCM mfg. Why not an I/O die for Ryzen 3000-series CPUs, with a tiny little Vega 5/6 iGPU. Maybe not enough for gaming, really, but enough for business desktop usage and presentations and especially, watching internet videos.

Gideon · Feb 11, 2020

Carfax83 said:
Well here is the benchmark in question which raised my eyebrow. The 2700x has a 22.5% advantage in memory latency compared to the 3700x. That to me seemed ridiculously large, and the reviewer himself said the chiplet design might be to blame, so that is what put that idea in my head:

Source

Thanks for the link! Yeah, I remember that review. Unfortunately techreport hasn't been quite the same after Scott Watson left and seems to be almost dead by now (the last meaningful review they did was RX 5700 in August 2019). I hope I'm wrong, as they have been a very good source for reviews for at least a decade. That's the site that invented frame-time percentage measurement after all!

Regardless, something is really off with their results. This is the AIDA score for my 3200Mhz CL16 Samsung B-Die running at 3466, otherwise stock (custom timings give better results):

These are my geekbench 4 results for my 1700x and 3700x drop-in-replacement (everything else is the same, at the exact same memory settings, 3466 Mhz):

Micro-Star International Co., Ltd. MS-7A34 vs Micro-Star International Co., Ltd. MS-7A34 - Geekbench

browser.geekbench.com

1700x: 75.9 ns
3700x: 74.2 ns

With expensive memory, and really aggressive timings people have got timings down to almost 60ns. See this thread:

Ryzen 3000 memory speed vs latency in gaming

https://www.purepc.pl/pamieci_ram/test_pamieci_do_procesorow_amd_ryzen_taktowania_vs_opoznienia?page=0,16 this is a really good read for someone who is still undecided between 3200 cl14 and 3600 cl16 3600 cl16 is always faster,by 1-2% in some games,by 5-7% in others. All tests done on single...

www.techpowerup.com

RetroZombie · Feb 11, 2020

Carfax83 said:
And as I said before, the MASSIVE L3 caches that Zen 2 has is definitely effective for minimizing the impact of memory latency.

The massive cache is doing that, but also adding latency.

sao123 · Feb 11, 2020

Adonisds said:
If we could get both companies to design a CPU core now on the same process and using the same number of transistors, who would be ahead and by how much? I'm assuming Intel would be ahead because Skylake has a similar IPC to Zen 2, but Zen 2 uses a more dense node, and Skylake is from 2015 while Zen 2 is from 2019.

The both companies using the same process scenario is just to illustrate the problem and I know it will never happen, but maybe there is a smart way to compare current CPU designs and find out who is ahead and by how much. Wikichip has die sizes and die pictures of Sunny Cove and Zen 2, but I'm not sure how to interpret and compare them.

Another question: If Intel had to design future cores using the same process and the same die size as Skylake, how much further could the Skylake design be improved? Also assuming it will have to have to be x86 too and have all the same functions.

To get back to the original question, I don't think there is a single answer to this question, as both have completely different design philosphies that are clearly dependent on the workload.
This question is not new, and has been a repetitive cycle for pretty much all of digital processing history... which is faster Linear Serial Architecture or Parallel Processing.

So... would you rather have 10 cars that each go 50 miles per hour or 1 car that goes 500 miles per hour? The answer is it depends on how far you need to go and how many people you need to move.

Typically speed advances are made in the serial portion of the loop, and then those advances are applied in a parallel architecture to have multiple copies of that individual serial architecture. Then a wall is hit and no further parallel advancement can happen and back to a single faster serial architecture it goes.
Intel and AMD are complete different phases of this cycle and thusly it is difficult to compare them in an apples to apples comparison.

To me it is clear intel is ahead in the serial portion of the cycle, and AMD is ahead in the parallel portion of the cycle. If you could take intels 4 core design and chiplet them into a single super computer, I think would be the biggest fastest solution... but that doesnt take into account any of the real logistics, such as what actual semiconductor wafers can and can't do, power and heat dissipation, the OS/software to even be able to use such a machine, etc

Carfax83 · Feb 11, 2020

TheGiant said:
gaming wise the whole skl-x lineup is a big zero to gaming
you can not tweak it to be competitive, you can only make the mesh penalty lower
your 6900K was a golden CPU- best of everything in that age
skl-x pretty much ruined it

Well Skylake-X was never going to be a top notch gaming CPU (at least not at stock clocks), because that's not what it's geared towards. However, it can definitely be a very competitive or even exceptional one with tweaks. The one thing nobody can't deny about Intel's 14nm process is that it's highly optimized for attaining very high clock speeds. When you overclock and tweak the 10980xe, it can beat an overclocked 9900K in many games and blow past any of the Zen 2 CPUs in most games:

but they did
without AMD challenging them - hello 4cores and 10nm nowhere to be seen

Yeah and it's a damn shame as well. A 12 core Willow Cove based HEDT CPU with tons of cache and quad channel high speed DDR4 would be right up my alley

Carfax83 · Feb 11, 2020

Gideon said:
Regardless, something is really off with their results. This is the AIDA score for my 3200Mhz CL16 Samsung B-Die running at 3466, otherwise stock (custom timings give better results):

Remember I was comparing the 2700x. The 2700x memory latency results are in line with other reviewers. The Zen+ core made improvements to the cache subsystem and memory controller, leading to lower latency all around.

Source

2nd%20Gen%20AMD%20Ryzen%20Desktop%20Processor-page-016_575px.jpg

These are my geekbench 4 results for my 1700x and 3700x drop-in-replacement (everything else is the same, at the exact same memory settings, 3466 Mhz):

Micro-Star International Co., Ltd. MS-7A34 vs Micro-Star International Co., Ltd. MS-7A34 - Geekbench

browser.geekbench.com

1700x: 75.9 ns
3700x: 74.2 ns

It would be great if you had a 2700x as well to compare. The original Zen had horrible memory latency, while the Zen+ core improved on it significantly.

thesmokingman · Feb 11, 2020

Carfax83 said:
When you overclock and tweak the 10980xe, it can beat an overclocked 9900K in many games and blow past any of the Zen 2 CPUs in most games:

Yea, at 4 times the power draw.

JasonLD · Feb 11, 2020

thesmokingman said:
Yea, at 4 times the power draw.

Well, tbh, if you completely disregard the power draw, they are definitely more fun to push than Zen architectures since they all seemingly have invisible wall when trying to push beyond its limit.

Carfax83 · Feb 11, 2020

thesmokingman said:
Yea, at 4 times the power draw.

I didn't know we were discussing power draw as well

On the real though, nobody in their right mind would recommend a Cascade Lake-X CPU for gaming, if that's what you're primarily going to be doing with it. They are productivity CPUs first and foremost. The point I was making, is that they can be excellent gaming CPUs when appropriately tweaked.

Arkaign · Feb 11, 2020

Carfax83 said:
Remember I was comparing the 2700x. The 2700x memory latency results are in line with other reviewers. The Zen+ core made improvements to the cache subsystem and memory controller, leading to lower latency all around.

Source

It would be great if you had a 2700x as well to compare. The original Zen had horrible memory latency, while the Zen+ core improved on it significantly.

If I go hard at 1.4v DDR4 4133, I get just over 34ns at 5.2Ghz. Most of the time I sit at 5Ghz at DDR4-4000 on stock volts though, 37-39ns. With my 3700x, best seems to be at 3466 speeds at around 62ns. But that is probably somewhat due to my Taichi 470 and this RAM. Had a hell of a time with stuff supposedly in the QVL, maybe because of the changes in bios to Zen2, and ended up with an unlisted 3600 kit.

Arkaign · Feb 12, 2020

VirtualLarry said:
I hear you on that. I had high hopes for MCM mfg. Why not an I/O die for Ryzen 3000-series CPUs, with a tiny little Vega 5/6 iGPU. Maybe not enough for gaming, really, but enough for business desktop usage and presentations and especially, watching internet videos.

Bingo, I'm not asking for the entire range for either AMD or Intel to have iGPU, just more options that make sense. Probably 90% of K-series Intel buyers would rather have a couple of extra cores instead of iGPU they're not using anyway, that's decidedly dGPU class stuff.

For AMD, my dream lineup not going into IPC and clocks, would see :

Ryzen 3 all have small Navi IGP chiplet. Immediately makes them far more attractive for basic office builds.

Ryzen 5 and 7 to have one basic model each with IGP, eg this gen would have been Ryzen 3600G and 3700G.

This opens the door to raising the roof a bit in the APU range to be more gaming specific. Add some CUs and RDNA2 design to make a $200ish model with perhaps twice the performance of the current best APU. Vega efficiency is poor per TF compared to even RDNA1, so that move could really boost things nicely. It would even be interesting to see if there was any packaging possibility for an APU with say 4GB of integrated HBM. I guess we'd need slot-AM5 for that LOFL. Would be hilarious retro greatness, and probably the only way to see APUs closer to what consoles can do. But who knows, maybe the 7nm+ and some clever engineering could make a socketable Zen+Navi+HBM derivative. I know it's probably a bit of a pipe dream due to it eating profitable potential from $150-$250 Navi dGPUs though.

Anyway, nuff rambling

MORE OPTIONS PLEASE, thanks AMD and Intel!

thesmokingman · Feb 12, 2020

Carfax83 said:
I didn't know we were discussing power draw as well

On the real though, nobody in their right mind would recommend a Cascade Lake-X CPU for gaming, if that's what you're primarily going to be doing with it. They are productivity CPUs first and foremost. The point I was making, is that they can be excellent gaming CPUs when appropriately tweaked.

is power draw not part of the design?

Arkaign · Feb 12, 2020

thesmokingman said:
is power draw not part of the design?

Seems like power draw is a compromise between the architecture and the process with the goals. Whenever a company is pressing, even with a good uarch but inferior process, seems like things get toasty. Eg; FX 8320/6350 did pretty well in efficiency, but when they tried to press too far, we got the 9590, a truly hilarious product. Now it's Intel's turn to be sitting on designs that are normally really efficient at 2-4Ghz, but quickly go far below a 1:1 power/performance metric as they push north from there as they try with mixed results to offer competitive performance.

One has to wonder what the results would look like with 14nm Zen2 vs 7nm TSMC Intel products. I'm thinking that the transistor density would limit Intel with hotspottinlg, and we'd actually see lower top clocks possible, while AMD would be forced to go with lower core counts, but achieve higher clock speeds using Intel's 14+++++ lol.

But all we honestly have is what's available, so who knows. I just know it will be monumental when Intel finally buries 14nm for good.

thesmokingman · Feb 12, 2020

Arkaign said:
Seems like power draw is a compromise between the architecture and the process with the goals. Whenever a company is pressing, even with a good uarch but inferior process, seems like things get toasty. Eg; FX 8320/6350 did pretty well in efficiency, but when they tried to press too far, we got the 9590, a truly hilarious product. Now it's Intel's turn to be sitting on designs that are normally really efficient at 2-4Ghz, but quickly go far below a 1:1 power/performance metric as they push north from there as they try with mixed results to offer competitive performance.

One has to wonder what the results would look like with 14nm Zen2 vs 7nm TSMC Intel products. I'm thinking that the transistor density would limit Intel with hotspottinlg, and we'd actually see lower top clocks possible, while AMD would be forced to go with lower core counts, but achieve higher clock speeds using Intel's 14+++++ lol.

But all we honestly have is what's available, so who knows. I just know it will be monumental when Intel finally buries 14nm for good.

It'd be hard to guess because the efficiency of Zen is not just in the process but it's design. It's changing the way we not only monitor cpu stats but how we even think about clockspeeds. Think effective clock vs discrete clock and core parking. But then again, reversing roles wouldn't be fair either because AMD designed their product with the resources they had available to them which was a crap ton less than Intel has at their disposal.

2blzd · Feb 12, 2020

Carfax83 said:
Actually the 6900K is Broadwell-E

Of course it is. I was just testing you to see if you actually knew.

🙋‍♂️

Carfax83 · Feb 12, 2020

Arkaign said:
If I go hard at 1.4v DDR4 4133, I get just over 34ns at 5.2Ghz. Most of the time I sit at 5Ghz at DDR4-4000 on stock volts though, 37-39ns. With my 3700x, best seems to be at 3466 speeds at around 62ns. But that is probably somewhat due to my Taichi 470 and this RAM. Had a hell of a time with stuff supposedly in the QVL, maybe because of the changes in bios to Zen2, and ended up with an unlisted 3600 kit.

This is the lowest score I've ever gotten, with the memory at stock voltage (1.35v). Can you imagine what the bandwidth and latency scores are going to be when we finally get DDR5 capable CPUs, ie Alder lake!

Question How ahead is Intel in CPU design compared to AMD?

Member

Platinum Member

Senior member

Platinum Member

Diamond Member

Senior member

Diamond Member

Diamond Member

Diamond Member

Senior member

No Lifer

Platinum Member

Senior member

Lifer

Diamond Member

Diamond Member

Platinum Member

Senior member

Diamond Member

Lifer

Lifer

Platinum Member

Lifer

Platinum Member

Senior member

Diamond Member