Discussion Intel current and future Lakes & Rapids thread

Page 159

mikk

Diamond Member
May 15, 2012
4,112
2,108
136
Now, they used DDR4-3200 on AMD and DDR4-2666 on Intel. That's a ~20% difference in memory speed. They also divided performance by GHz, which disadvantages the higher clocked CPU because most workloads don't scale perfectly with clock speed.

Yes, it's flawed, but the tester didn't understand that, so there are better IPC tests available. The one from The Stilt is a lot better.
 

RetroZombie

Senior member
Nov 5, 2019
464
386
96
Yes, it's flawed, but the tester didn't understand that, so there are better IPC tests available. The one from The Stilt is a lot better.

I don't think so. If the tester used some H310 or B360 motherboard, what would the performance be?
And if they used a motherboard without MCE* capability, what performance would you get?
With AMD, whether using the potato A320 or the premium X570, the CPU always performs the same.

*(multi core enhancement)
 
  • Like
Reactions: spursindonesia

tamz_msc

Diamond Member
Jan 5, 2017
3,722
3,554
136
Yes, it's flawed, but the tester didn't understand that, so there are better IPC tests available. The one from The Stilt is a lot better.
So, according to Anandtech, Zen 2 has ~7% higher IPC than CFL-R (P0).

Now, they used DDR4-3200 on AMD and DDR4-2666 on Intel. That's a ~20% difference in memory speed. They also divided performance by GHz, which disadvantages the higher clocked CPU because most workloads don't scale perfectly with clock speed.

The Stilt tests a large suite and locks both CPUs to the same core and memory frequencies. SKX-R also ends up faster than both even when you look at the ER results.

Phoronix shows that the hardware fixes going from Skylake-SP to Cascade Lake-SP provide a positive uplift in performance (a smaller penalty). They even show a positive uplift in performance going from CFL-R (P0) to CFL-R (R0).
First of all, Andrei tested both at 3200 MHz, both of them at CL16. If only people actually read the page on which they did the SPEC testing. At the risk of sounding like a broken record, I'll repeat what Andrei has said many times - testing at a lower fixed frequency artificially inflates IPC by reducing the effective memory latency in core clock cycles. Thus, testing for IPC should always be done at the highest possible frequency the architecture was designed to operate at. IPC is not independent of the frequency at which it is measured. This means that an i9 9900T with its lower clocks has higher IPC than an i9 9900K in any workload that is memory-bound. Therefore, Anandtech's testing is correct; The Stilt's results, even if they encompass a wider variety of workloads, don't paint a true picture.
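
To illustrate the mechanism, here's a toy CPI model (the parameters are made up for illustration; this is just a sketch, not AnandTech's methodology): memory latency is fixed in nanoseconds, so each cache miss costs more core cycles at higher clocks, which deflates measured IPC.

Code:
# Toy model: IPC vs core clock when misses cost a fixed time in ns.
# core_cpi, misses_per_instr and mem_latency_ns are made-up parameters.
def ipc(freq_ghz, core_cpi=0.25, misses_per_instr=0.005, mem_latency_ns=70):
    miss_penalty_cycles = mem_latency_ns * freq_ghz  # ns -> core cycles
    return 1 / (core_cpi + misses_per_instr * miss_penalty_cycles)

for f in (3.0, 3.5, 4.3):
    print(f"{f} GHz: IPC = {ipc(f):.3f}")
# 3.0 GHz: IPC = 0.769; 4.3 GHz: IPC = 0.570 - the lower clock "gains"
# ~35% IPC purely because each miss costs fewer core cycles.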

It's quite surprising that there are still quite a few people around who insist on not believing objective facts, decrying them with the same tired old arguments to suit their narrative; the extent to which they resort to these arguments is almost pathological.
 
Last edited:

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
At the risk of sounding like a broken record, I'll repeat what Andrei has said many times - testing at a lower fixed frequency artificially inflates IPC by reducing the effective memory latency in core clock cycles. Thus, testing for IPC should always be done at the highest possible frequency the architecture was designed to operate at. IPC is not independent of the frequency at which it is measured.

OK, when you do IPC (perf/clock) tests, you do them to find out how architectures will fare in hypothetical future scenarios.

The Stilt didn't run his chips at 2.5GHz or some oddly low frequency; he ran them at 3.8GHz, which is fairly high. Even in AT's tests the difference in frequency between the 9900K and 3900X is only 7.5%. The point of fixing frequencies is especially valid because, for whatever reason, it's easy to knock off 100MHz or 200MHz when the CPU is left to auto-boost to its max. And of course, when you want to see how the architecture does, you want to fix it anyway.

I am going to link to two more perf/clock tests-

Zen 2 has a lead in Cinebench and other more vector- and SIMD-oriented code, while Intel has the lead in integer. You can even see that reflected in SiSoftware Sandra tests, with CFL leading in Dhrystone and Zen 2 leading in Whetstone, and also in Anandtech's tests - continuing the tradition of Intel faring better on integer and AMD on FP.

SpecCPU2006
Int - 1.9% lead for CFL
FP - 6.5% lead for Zen 2

SpecCPU2017 Rate 1*
Int - 5% lead for Zen 2
FP - 7.8% lead for Zen 2

(Strictly speaking, integer is the better indicator of a faster architecture, as ILP is higher in general with floating-point code and you can get easy double-digit gains by doubling vector unit resources, but as a product you have to consider both.)
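
For reference on how such overall leads are computed: SPEC composite scores are geometric means of per-subtest ratios, so a headline lead is just the ratio of two geomeans. A minimal sketch (the subtest numbers here are hypothetical, not from any review):

Code:
from math import prod

def spec_composite(ratios):
    # SPEC composites are the geometric mean of per-subtest ratios
    # (reference time / measured time).
    return prod(ratios) ** (1 / len(ratios))

# Hypothetical per-subtest ratios for two chips:
chip_a = [40.1, 35.2, 55.0, 28.4]
chip_b = [41.0, 33.9, 54.1, 29.5]
lead = spec_composite(chip_a) / spec_composite(chip_b) - 1
print(f"chip A lead: {lead:+.1%}")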

Based on tests it seems SpecCPU2006 is a better indicator of performance for the two architectures.

*Not to mention that, for some bizarre reason, the tester decided to use the Rate benchmark set to 1 thread for SpecCPU2017 rather than the Speed version, which is by default the benchmark for scalar performance.
 
  • Like
Reactions: mikk

tamz_msc

Diamond Member
Jan 5, 2017
3,722
3,554
136
The Stilt didn't run his chips at 2.5GHz or some oddly low frequency; he ran them at 3.8GHz, which is fairly high. Even in AT's tests the difference in frequency between the 9900K and 3900X is only 7.5%. The point of fixing frequencies is especially valid because, for whatever reason, it's easy to knock off 100MHz or 200MHz when the CPU is left to auto-boost to its max. And of course, when you want to see how the architecture does, you want to fix it anyway.
The point isn't whether 3.8GHz is high enough or whether 2.5GHz is too low. The point is that any deviation from fmax will have an effect when measuring PPC. Here, I'm giving a direct quote from Andrei:
Here's my 3700X on 429.mcf which is memory intensive:

4325MHz: 50.68 score, 11.71 score per GHz
3500MHz: 45.49 score, 12.99 score per GHz +10.9% IPC
3000MHz: 39.43 score, 13.14 score per GHz +12.1% IPC

And this is why measuring IPC at some arbitrary equal frequency between systems and especially between different micro-architectures is a load of crap.(emphasis mine)
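
As a quick sanity check on those numbers, here's a minimal sketch of the score-per-GHz arithmetic:

Code:
# Andrei's quoted 429.mcf results: clock (GHz) -> SPEC score.
runs = {4.325: 50.68, 3.500: 45.49, 3.000: 39.43}

baseline = runs[4.325] / 4.325  # score per GHz at the highest clock
for ghz in sorted(runs, reverse=True):
    per_ghz = runs[ghz] / ghz
    print(f"{ghz:.3f} GHz: {per_ghz:.2f}/GHz "
          f"({(per_ghz / baseline - 1):+.1%} vs 4.325 GHz)")
# Prints roughly +10.9% and +12.2% for the lower clocks - the quoted
# figures, up to rounding.
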
Based on tests it seems SpecCPU2006 is a better indicator of performance for the two architectures.
SPEC2006 has libquantum, which is known to give very high scores due to compiler shenanigans. I would discount it in favor of the newer 2017 suite.
*Not to mention for some bizarre reason the tester decided to use Rate benchmark set to 1 thread for SpecCPU2017 rather than use the Speed version which is by default the benchmark for scalar performance.
Speed vs rate is time to completion vs throughput; one generally runs multiple copies of the rate benchmark. Setting rate to 1T is how Intel and AMD report PPC numbers on their own slides - it is standard practice. Besides, Andrei has given his reasons for running rate instead of speed:

Moving on to the 2017 suite, we have to clarify that we’re using the Rate benchmark variations. The 2017 suite’s speed and rate benchmarks differ from each other in terms of workloads. The speed tests were designed for single-threaded testing and have large memory demands of up to 11GB, while the rate tests were meant for multi-process tests. We’re using the rate variations of the benchmarks because we don’t see any large differentiation between the two variations in terms of their characterisation, and thus the performance scaling between the two should be extremely similar. On top of that, the rate benchmarks take up to 5x less time (+1 hour vs +6 hours), and we're able to run them on more memory-limited platforms (which we plan to do in the future).

 

Spider-Man

Junior Member
Oct 29, 2018
4
3
41
First of all, Andrei tested both at 3200 MHz, both of them at CL16. If only people actually read the page on which they did the SPEC testing. At the risk of sounding like a broken record, I'll repeat what Andrei has said many times - testing at a lower fixed frequency artificially inflates IPC by reducing the effective memory latency in core clock cycles. Thus, testing for IPC should always be done at the highest possible frequency the architecture was designed to operate at. IPC is not independent of the frequency at which it is measured. This means that an i9 9900T with its lower clocks has higher IPC than an i9 9900K in any workload that is memory-bound. Therefore, Anandtech's testing is correct; The Stilt's results, even if they encompass a wider variety of workloads, don't paint a true picture.

It's quite surprising that there are still quite a few people around who insist on not believing objective facts, decrying them with the same tired old arguments to suit their narrative; the extent to which they resort to these arguments is almost pathological.
You understood it wrong. I confirmed with them that they tested the 9900K at DDR4-2666.

And what you said about the 9900T and 9900K is illogical. Also, the architecture on mainstream 9th Gen is the same architecture used on mainstream 6th Gen. Their methodology is flawed.

@IntelUser2000 brought up valid points.
 
  • Like
Reactions: mikk

Gideon

Golden Member
Nov 27, 2007
1,608
3,573
136
The point isn't whether 3.8GHz is high enough or whether 2.5GHz is too low. The point is that any deviation from fmax will have an effect when measuring PPC. Here, I'm giving a direct quote from Andrei:

Andrei. said:
Here's my 3700X on 429.mcf which is memory intensive:

4325MHz: 50.68 score, 11.71 score per GHz
3500MHz: 45.49 score, 12.99 score per GHz +10.9% IPC
3000MHz: 39.43 score, 13.14 score per GHz +12.1% IPC

And this is why measuring IPC at some arbitrary equal frequency between systems and especially between different micro-architectures is a load of crap.(emphasis mine)
While I understand what @Andrei. is saying there, I'm still not quite convinced this methodology is superior.

I'd like to see the IPC comparison between the 6700K and the 9900KS. They are the same micro-architecture, so if the 6700K ends up with 10% more IPC, then this plainly isn't a very good metric. At least it definitely isn't something anyone would intuitively consider to be IPC.

AMD and Intel are both running these flagship chips at the farthest end of the frequency/voltage curve possible. This methodology heavily screws them while helping mobile chips, which are run at the optimal point of the curve.

This would be particularly evident when comparing these results to mobile chips (e.g. in the A13 review). In that case they should use the most efficient chips instead (probably 15-25W mobile SKUs).
 
  • Like
Reactions: coercitiv and mikk

Gideon

Golden Member
Nov 27, 2007
1,608
3,573
136
You understood it wrong. I confirmed with them that they tested the 9900K at DDR4-2666.

Do you happen to have any proof of these claims? Anandtech's review states in no uncertain terms:
The Ryzen 3900X system was run in the same way as the rest of our article with DDR4-3200CL16, same as with the i9-9900K, whilst the Ryzen 2700X had DDR-2933 with similar CL16 16-16-16-38 timings.
Essentially, what you're saying is that they flat-out lied (by giving you the "real info" and yet not updating the article, which strictly says otherwise). IMO such a claim requires some proof.
 
  • Love
Reactions: spursindonesia

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
The point isn't whether 3.8GHz is high enough or whether 2.5GHz is too low. The point is that any deviation from fmax will have an effect when measuring PPC. Here, I'm giving a direct quote from Andrei:

SPEC2006 has libquantum, which is known to give very high scores due to compiler shenanigans. I would discount it in favor of the newer 2017 suite.

Exactly. So they should have run them at equal, fixed frequencies, because we want to know how the architectures compare in academic terms - the same way most people want to know how the Apple A13 compares to Zen/CFL irrespective of clocks. The clock speeds between the two aren't so different that you can call CFL a speed demon and Zen not, unlike against the A13.

I calculated the SpecCPU Int benchmark to have ~85% scaling with clock speed. So if one CPU has a 100% advantage in clocks, the one with half the clock has a 1/0.85, or ~18%, PPC advantage. For the FP test, sure, it's quite memory sensitive, but for Int it's not, and that's what happens in most consumer workloads. The mcf subtest shows 48% scaling, which is extremely poor, but looking at all the tests it evens out to be fairly realistic.
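
A small sketch of that back-of-the-envelope math, reading "~85% scaling" as the doubled-clock chip retaining 85% of the lower-clocked chip's per-clock throughput (my interpretation of the post, not an established model):

Code:
# If the chip at 2x clock keeps only `scaling` of the per-clock
# throughput, the half-clock chip's apparent PPC advantage is 1/scaling.
scaling = 0.85
print(f"apparent PPC advantage at half clock: {1 / scaling - 1:+.1%}")
# -> +17.6%, i.e. roughly the ~18% figure above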

And again, the SpecCPU results show a rough correlation between the two chips - Intel does better on integer and AMD on FP. The so-called compiler shenanigans don't even apply, since AT's tests don't even run the Intel compiler, precisely to address such concerns.


He should have done it anyway. Why? Because that's how benchmarks are done: as equal as you can be. Otherwise you end up asking more questions.
 
Last edited:
  • Like
Reactions: mikk

tamz_msc

Diamond Member
Jan 5, 2017
3,722
3,554
136
You understood it wrong. I confirmed with them that they tested the 9900K at DDR4-2666.

And what you said about the 9900T and 9900K is illogical. Also, the architecture on mainstream 9th Gen is the same architecture used on mainstream 6th Gen. Their methodology is flawed.

@IntelUser2000 brought up valid points.
Where's your proof that the 9900K was tested at 2666 MHz memory? You're making a claim therefore the onus is on you to prove that claim by linking the appropriate Twitter post, if you claim that you asked about it on Twitter.

Why is the 9900T vs 9900K comparison illogical? Andrei has clearly proven, with the help of an example, that PPC is higher if you lower the operating frequency.
 
  • Like
Reactions: spursindonesia

Andrei.

Senior member
Jan 26, 2015
316
386
136
You understood it wrong. I confirmed with them that they tested the 9900K at DDR4-2666.

And what you said about the 9900T and 9900K is illogical. Also, the architecture on mainstream 9th Gen is the same architecture used on mainstream 6th Gen. Their methodology is flawed.

@IntelUser2000 brought up valid points.

Ask them on Twitter.
I don't know who you asked on Twitter, certainly not me.

The 9900K SPEC data out there was on 3200CL16 as described in the article.

Yes, this is admittedly wrong, and it should have been 2666 from a product standpoint - it was run on an available system at the time, and I had forgotten to mention that the memory would have needed to be downclocked. On the other hand, that article wasn't the place where I would go into arguments about memory imbalance, and having all the systems at 3200CL16 was a better choice in regards to talking about Zen 2.

They also divided performance by GHz, which disadvantages the higher clocked CPU because most workloads don't scale perfectly with clock speed.
What kind of stupid rationale is that? The only performance point that matters in IPC tests is peak performance. That this peak performance point lands at different clock speeds across different micro-architectures is irrelevant. Workloads don't scale perfectly with clock speed because the execution resources of a CPU aren't the only thing that affects performance; you also have to move data around, and that's memory.

While I understand what @Andrei. is saying there, I'm still not quite convinced this methodology is superior.

I'd like to see the IPC comparison between the 6700K and the 9900KS. They are the same micro-architecture, so if the 6700K ends up with 10% more IPC, then this plainly isn't a very good metric. At least it definitely isn't something anyone would intuitively consider to be IPC.
The 9900K probably still has higher IPC because it has double the L3 cache, which likely affects performance to a higher degree than the core-clock:memory imbalance between the two SKUs. The CPU cores actually being the same µarch is irrelevant here.

I don't get the argument here; of course you have to take memory into account in IPC, because that's essentially the single most important aspect of a CPU. We'd otherwise just be measuring Dhrystone IPC, where the whole dataset fits in the L1 cache.
 
Last edited:

Gideon

Golden Member
Nov 27, 2007
1,608
3,573
136
He should have done it anyway. Why? Because that's how benchmarks are done: as equal as you can be. Otherwise you end up asking more questions.
How does your statement make sense in this context? Comparing an orange to another orange isn't any different from comparing two apples. If running rate allows them to compare to more devices (like memory-limited mobile devices), this absolutely makes sense, especially if the results are similar.
 
  • Like
Reactions: spursindonesia

tamz_msc

Diamond Member
Jan 5, 2017
3,722
3,554
136
Exactly. So they should have run them at equal, fixed frequencies.
That I disagree with, given Andrei's findings.
And again, the SpecCPU results show a rough correlation between the two chips - Intel does better on integer and AMD on FP. The so-called compiler shenanigans don't even apply, since AT's tests don't even run the Intel compiler, precisely to address such concerns.
It's not just the Intel compiler that does funny things with libquantum. I'm not sure if GCC does it as well, but I'm fairly certain that ICC isn't the only culprit. Besides, SPEC2017 is newer, more memory intensive, has a larger memory footprint, and drops libquantum altogether. It's overall a better benchmark than SPEC2006.
He should have done it anyway. Why? Because that's how benchmarks are done: as equal as you can be. Otherwise you end up asking more questions.
I think that the reasons Andrei gave are perfectly valid reasons. Besides, both Intel and AMD report SPECrate instead of SPECspeed on their official slides.
 
  • Love
Reactions: spursindonesia

Andrei.

Senior member
Jan 26, 2015
316
386
136
He should have done it anyway. Why? Because that's how benchmarks are done: as equal as you can be. Otherwise you end up asking more questions.
As equal to what? Who's to decide "how benchmarks are done"?
SPEC.jpg

The rate benchmarks in their characterisation are almost identical to the speed versions, with runtime being by far the biggest difference, plus the fact that imagick is missing from the rate suite.

When I went ahead to decide on all of this I talked with all the CPU vendors and Intel even outright recommended to just use Rate 1T instead of Speed, because of the practical reasons I stated in the article originally.

It's not just the Intel compiler that does funny things with libquantum. I'm not sure if GCC does it as well, but I'm fairly certain that ICC isn't the only culprit. Besides, SPEC2017 is newer, more memory intensive, has a larger memory footprint, and drops libquantum altogether. It's overall a better benchmark than SPEC2006.
There's nothing wrong with libquantum on LLVM or GCC. ICC vectorizes across multiple cores and that's the issue as it's no longer an actual ST test in that situation.

I think that the reasons Andrei gave are perfectly valid reasons. Besides, both Intel and AMD report SPECrate instead of SPECspeed on their official slides.
Intel and AMD have to follow SPEC publishing guidelines which are more anal in regards to what they can and can't publish. As our results aren't official submissions we can do whatever as long as it's marked as estimates.
 
Last edited:

tamz_msc

Diamond Member
Jan 5, 2017
3,722
3,554
136
There's nothing wrong with libquantum on LLVM or GCC. ICC vectorizes across multiple cores and that's the issue as it's no longer an actual ST test in that situation.
Thanks for the clarification. Do you think that now that SPEC has stopped taking submissions for CPU2006 benchmarks, it is time more people started evaluating the 2017 suite, provided of course that there are no other constraints?
Intel and AMD have to follow SPEC publishing guidelines which are more anal in regards to what they can and can't publish. As our results aren't official submissions we can do whatever as long as it's marked as estimates.
I'm well aware of SPEC publishing guidelines; I was just talking about what Intel and AMD put out on their marketing slides.
 

Andrei.

Senior member
Jan 26, 2015
316
386
136
Thanks for the clarification. Do you think that now that SPEC has stopped taking submissions for CPU2006 benchmarks, it is time more people started evaluating the 2017 suite, provided of course that there are no other constraints?
There's not *that* big a scaling difference between the two; 2017 is more memory-heavy - for example, you see Zen 2 doing better relative to Intel on 2017 than on 2006, and my take on it is that AMD has a stronger core memory system than Intel right now (yes, I'm talking at the core level; Intel still has a stronger system memory subsystem, hence their latency advantage).

CPU2006 is academically very well understood, and essentially my reason for still using it (besides mobile) is that, given two results from two known micro-architectures, I can exactly pinpoint which characteristics and changes of the µarch have affected given sub-tests.

I'm well aware of SPEC publishing guidelines; I was just talking about what Intel and AMD put out on their marketing slides.
I don't think either of them have published 1T rate slides, at least not that I'm aware of.
 
  • Like
Reactions: Lodix

Nothingness

Platinum Member
Jul 3, 2013
2,371
713
136
When I went ahead to decide on all of this I talked with all the CPU vendors and Intel even outright recommended to just use Rate 1T instead of Speed, because of the practical reasons I stated in the article originally.
I can confirm that all the CPU perf teams I know rely only on SPEC CPU2017 rate with one thread.

There's nothing wrong with libquantum on LLVM or GCC. ICC vectorizes across multiple cores and that's the issue as it's no longer an actual ST test in that situation.
Are you sure that's the trick?

From "LLVM Performance Improvements and Headroom": AoS->SoA brings 2x, and various very aggressive inter-procedural optimizations bring 10x. I'd expect ICC to use these.
 

Magic Carpet

Diamond Member
Oct 2, 2011
3,477
231
106
I'll repeat what Andrei has said many times - testing at a lower fixed frequency artificially inflates IPC by reducing the effective memory latency. Thus, testing for IPC should always be done at the highest possible frequency the architecture was designed to operate at. IPC is not independent of the frequency at which it is measured.
Correct; however, lower clocks *may* give you better performance per watt :)
1575919835820.png
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
The 32EU Gen 11 is ~20% faster than Gen 9, and the 64EU Gen 11 is 2x as fast as Gen 9. Things not scaling linearly benefits the lower end more.

It might only end up being 10-20% slower than the 64EU Gen 11. Of course, that's if whatever variant goes into RKL behaves similarly to the one in Tigerlake, and assuming TGL graphics is really 2x as fast as Gen 11.

Thanks @lobz
That's just because fewer EUs stress memory bandwidth a lot less - well... theoretically it's not scaling, but in the end, in practice, it is, so you're right anyway :)
 

Spider-Man

Junior Member
Oct 29, 2018
4
3
41
So you say they forgot to correct the article? :D :D
It's a mistake on my part. That comparison was done with the same memory on both the 3900X and the 9900K; the general testing was done with different memory (although you could go down the rabbit hole of in-spec and out-of-spec memory).

That still doesn't take away from the fact that dividing performance by clocks almost always disadvantages the higher-clocked part. The 3900X isn't even the highest-clocking 3rd Gen part; that would be the 3950X. You'd see lower IPC on it than on the 3900X, so it's a disservice to Intel here.

SKX-R also wasn't tested. I guarantee it would do better, considering you're looking at a 4.5GHz max boost vs a 5.0GHz max boost.