Discussion Intel current and future Lakes & Rapids thread

Abwx · Aug 3, 2019

Not wanting to add to this IPC debate since that s not the purpose of the thread, but it is worth being mentioned :

https://dash.harvard.edu/bitstream/handle/1/25415919/selse13vopt.pdf?sequence=1

OTT we ll see how the things settles once the chip will be tested elsewhere than in a controled environment without a proper OS.

DrMrLordX · Aug 3, 2019

Abwx said:
OTT we ll see how the things settles once the chip will be tested elsewhere than in a controled environment without a proper OS.

Intel will start selling these things eventually. Question is, how many reviewers will get samples?

ajc9988 · Aug 3, 2019

Yotsugi said:
CPU freq affects your average IPC because your memory doesn't go faster together with CPU.

That is NOT how this works, except as Andrei pointed out on how memory can effect IPC with changes to core frequency.

Let's do an example:

IPC is 10 instructions per cycle, and you run at 4GHz.

10 IPC * 4,000,000,000 = 40,000,000,000 Instructions per second.

Let's now say IPC is 12, but the speed is only 3.6GHz

12 IPC * 3,600,000,000 cycles per second = 43,200,000,000 Instructions per second.

Increasing CPU speed, if perfectly controlled for with other variables, works in this way. It increases the number of cycles per second, not effecting the IPC. If I left the IPC at 10, but reduced the frequency to 3.6, it would produce 36,000,000,000 Instructions per second.

What Andrei referred to below is that you cannot always control for other factors, like the memory subsystems, effects on cache and cache latency, etc. But that is a more advanced and nuanced statement than yours, which is that CPU freq effects the average IPC because your mem doesn't go faster together with cpu, to modify the quoted statement from you.

Zucker2k said:
I don't know what's funny because 1 CYCLE = 1 CLOCK CYCLE. Power budget is not a consideration in an IPC test. Jesus!!

I'm happy you said this, at least, and not that IPC is determined by power budget.

This is easily solved by making sure the memory subsystem is not a bottleneck in your test. Or, you could simply stick to manufacturer specs.

See explanation above and discussion with Andrei below.

SAAA said:
I was always curious of this notion so today I quickly tested with CPU-z to check if the myth stands (on a 8700k):

View attachment 9143

There's at best a 2% difference in scores running from 4.5 to 2.5 GHz fixed in bios, same RAM speed and timings. Myth busted?
Well for this benchmark sure, maybe at 5GHz it decreases noticeably but I won't test that with my crappy cooling… anyone interested open another thread and find out with more benches, also more CPUs!

Andrei. said:
Yes CPU-Z and CB - both known to have tons of memory pressure amirite?

IPC should only be measured at peak performance of a chip because that's the only data-point that matters. Everything below that will artificially inflate IPC because you're essentially improving memory cycles by an equal amount to the clock reduction.

Thank you for actually addressing why there is a variance in the IPC here. Because I have been trying to get the basics through to them on IPC being instructions per cycle and frequency being cycles per second BEFORE I complicated it with explanations on how cache and memory frequency, latency, etc., can effect IPC. In other words, trying to build the foundation, followed by going into other factors that effect IPC that are harder to control for.

Now, although using peak frequency can be preferred to measure IPC due to the other factors you mention, CPU boost behaviors must be considered as well, and if the CPU frequency is changing throughout the testing of IPC, you get the problem of averages, which can actually give a worse image of IPC. So long as the frequency is fixed at that peak, I have no problem with the characterization. But you also understand the reasons that they fix all the CPUs to a single frequency that all of them can achieve for comparison purposes as well. It is a give and take for testing IPC, and there are issues with each method. But, taking a known instruction set with a set number of instructions, running it at a single frequency, and accurately measuring the time to completion will allow for the determination of the number of instructions per cycle. Unfortunately, not all of the software used in daily life has an ability to accurately, down to ms or ns, measure the time to completion, even with a known workload. You also can complicate things further when software selects a different instruction set based on the CPU available. So it can be difficult to fully control, but not impossible.

Edit: and for those that don't understand the difference between Andrei's answer and others, he is NOT saying clock Frequency is effecting IPC, he IS saying that the relationship of frequency to other systems, here being the change in relative memory cycles to the CPU clocks, can effect the IPC, which is a more nuanced statement. It is NOT the frequency, by, in, and of itself that is changing the IPC. That is an important distinction.

This is also why I mentioned controlling for memory frequency, meaning controlling memory bandwidth and latency, if being more correct, is important for testing IPC, which can effect the ability of keeping the cores fed and can change the bandwidth and latency of cache to a degree, something that cannot fully be divorced/controlled for from the IPC calculation (meaning that changes that effect those systems can effect the determined IPC of a CPU).

Yotsugi · Aug 3, 2019

ajc9988 said:
except as Andrei pointed out on how memory can effect IPC with changes to core frequency.

Which is what I said.

ajc9988 said:
It is NOT the frequency, by, in, and of itself that is changing the IPC. That is an important distinction.

Oh no, pure semantics.

ajc9988 · Aug 3, 2019

Yotsugi said:
Which is what I said.

What you wrote was not as clear a statement as his. This is why I returned asking which frequency you were referring to. It is also why, missing a more spelled out statement of the relationship to the system, I did not get the understanding you may have had.

His statement was clear and precise exactly what the relationship that he was referring to was. Yours did not come across as clearly. If that was what was meant, then I apologize for the grief in my retorts.

Yotsugi · Aug 3, 2019

ajc9988 said:
What you wrote was not as clear a statement as his

Semantics.

ajc9988 · Aug 3, 2019

Yotsugi said:
Semantics.

I'm a lawyer. Words are important. Something as simple as a preposition can change the meaning of a patent and mean the difference in an infringement case.

Yotsugi · Aug 3, 2019

ajc9988 said:
Words are important

This is forum we're talking about.

ajc9988 · Aug 3, 2019

Yotsugi said:
This is forum we're talking about.

Yes, but considering I mistook what you said due to clarity, others can as well. Mistakes in communications happen all the time.

Andrei. · Aug 3, 2019

ajc9988 said:
That is NOT how this works, except as Andrei pointed out on how memory can effect IPC with changes to core frequency.

Let's do an example:

IPC is 10 instructions per cycle, and you run at 4GHz.

10 IPC * 4,000,000,000 = 40,000,000,000 Instructions per second.

Let's now say IPC is 12, but the speed is only 3.6GHz

12 IPC * 3,600,000,000 cycles per second = 43,200,000,000 Instructions per second.

Increasing CPU speed, if perfectly controlled for with other variables, works in this way. It increases the number of cycles per second, not effecting the IPC. If I left the IPC at 10, but reduced the frequency to 3.6, it would produce 36,000,000,000 Instructions per second.

What Andrei referred to below is that you cannot always control for other factors, like the memory subsystems, effects on cache and cache latency, etc. But that is a more advanced and nuanced statement than yours, which is that CPU freq effects the average IPC because your mem doesn't go faster together with cpu, to modify the quoted statement from you.

I don't understand the hell you're arguing about, he's saying exactly the same thing.

IPC is a result of the core throughput which is dependent on things like the caches and memory. You can't just separate the two or somehow say IPC is some fixed characteristic.

Here's my 3700X on 429.mcf which is memory intensive:

4325MHz: 50.68 score, 11.71 score per GHz
3500MHz: 45.49 score, 12.99 score per GHz +10.9% IPC
3000MHz: 39.43 score, 13.14 score per GHz +12.1% IPC

And this is why measuring IPC at some arbitrary equal frequency between systems and especially between different micro-architectures is a load of crap.

ajc9988 · Aug 3, 2019

Andrei. said:
I don't understand the hell you're arguing about.

IPC is a result of the core throughput which is dependent on things like the caches and memory. You can't just separate the two or somehow say IPC is some fixed characteristic.

Here's my 3700X on 429.mcf which is memory intensive:

4325MHz: 50.68 score, 11.71 score per GHz
3500MHz: 45.49 score, 12.99 score per GHz +10.9% IPC
3000MHz: 39.43 score, 13.14 score per GHz +12.1% IPC

And this is why measuring IPC at some arbitrary frequency between systems and especially between different micro-architectures is a load of crap.

What I was arguing about is people saying core frequency, by, in, and of itself, which is defined as cycles per second, changed the IPC of a CPU, which is patently false. I didn't say that memory, cache, etc. could not have an effect, and used your statement to show that if you are arguing the relationship of the frequency of the core of the CPU to other systems changes those other factors, such as the cache bandwidth or latency, or, in your case, the number of memory cycles relative to the frequency selected on the CPU, that it can effect IPC.

This is why I mentioned, at the start, that you needed to control memory as well for testing IPC.

Considering those people did not distinguish between frequency, which is cycles in a period, and IPC, which is instructions per cycle, they can easily mislead people on what is measured.

I start there, then I move to more specific examination of the effects of cache and memory on IPC, which allows them to better understand the interplay of the systems on performance metrics.

So what in the hell don't you understand?

Edit: here is the exact quote: "CPU freq affects your average IPC because your memory doesn't go faster together with CPU "

Now, does that explain the same thing you did in the way you did?

Andrei. · Aug 3, 2019

ajc9988 said:
What I was arguing about is people saying core frequency, by, in, and of itself, which is defined as cycles per second, changed the IPC of a CPU, which is patently false. I didn't say that memory, cache, etc. could not have an effect, and used your statement to show that if you are arguing the relationship of the frequency of the core of the CPU to other systems changes those other factors, such as the cache bandwidth or latency, or, in your case, the number of memory cycles relative to the frequency selected on the CPU, that it can effect IPC.

This is why I mentioned, at the start, that you needed to control memory as well for testing IPC.

Considering those people did not distinguish between frequency, which is cycles in a period, and IPC, which is instructions per cycle, they can easily mislead people on what is measured.

I start there, then I move to more specific examination of the effects of cache and memory on IPC, which allows them to better understand the interplay of the systems on performance metrics.

So what in the hell don't you understand?

You're arguing a pointless point. Nobody else said IPC scales just for frequency's sake itself.

Your perspective here only works simply because currently the frequency differences between Intel and AMD aren't all that dramatic.

ajc9988 said:
This is why I mentioned, at the start, that you needed to control memory as well for testing IPC.

No you don't. memory is part of IPC. If you alter memory you alter IPC. You have some skewed logic here.

ajc9988 · Aug 3, 2019

Andrei. said:
You're arguing a pointless point. Nobody else said IPC scales just for frequency's sake itself.

Your perspective here only works simply because currently the frequency differences between Intel and AMD aren't all that dramatic.
No you don't. memory is part of IPC. If you alter memory you alter IPC. You have some skewed logic here.

Your actual argument is that IPC is an imperfect measure of real world performance. That is fine.

But IPC is also to explore the advantages and disadvantages of design choices of the uarch. As such, when testing IPC, you try to control variables that prevent an apples to apples comparison of the uarch, such as memory bandwidth and latency.

Now, your argument is correct that for a purchasing decision, IPC should NOT be the sole metric considered, rather the IPS metric on performance while under a standard use case model for the end consumer.

But that does not make my point useless. And yes, people here, at first, clearly stated that frequency changed IPC without ANY qualifier on it.

And when using IPC as stated above, to explore the advantages and disadvantages of design choices, which includes cache systems, it does NOT require the differences to be small.

I didn't even go into software optimizations and scheduler effects on IPC, which are more factors that influence it.

Do you understand what I'm saying now?

Edit: Some even said that using TDP the same, but at different frequencies and boosts, without controlling variables, gave IPC, and that power consumption was involved in IPC calculations.

Andrei. · Aug 3, 2019

ajc9988 said:
As such, when testing IPC, you try to control variables that prevent an apples to apples comparison of the uarch, such as memory bandwidth and latency.

You have a fundamental misunderstanding of microarchitecture design. You do NOT go around and change memory just to fit your comparison system, because your specific microarchitecture is going to be optimised and designed around the memory subsystem it's meant to end up with in a product. Again your perspective here is skewed because your view here is narrow and just looking at similar desktop CPUs.

What do I do when I go compare an Apple A12 vs Ryzen? In that case your logic is nonsense.

ajc9988 said:
But that does not make my point useless. And yes, people here, at first, clearly stated that frequency changed IPC without ANY qualifier on it.

It was implied, you're the only one going off a massive tangent about it here.

ajc9988 · Aug 3, 2019

Andrei. said:
You have a fundamental misunderstanding of microarchitecture design. You do NOT go around and change memory just to fit your comparison system, because your specific microarchitecture is going to be optimised and designed around the memory subsystem it's meant to end up with in a product. Again your perspective here is skewed because your view here is narrow and just looking at similar desktop CPUs.

What do I do when I go compare an Apple A12 vs Ryzen? In that case your logic is nonsense.

It was implied, you're the only one going off a massive tangent about it here.

So what is compared in that article and other articles when examining IPC? It is desktop to desktop CPUs and looking at how those differences impact the performance in as close to apples to apples comparisons.

What your argument is is that instead of controlling those variable, the rated memory speed should be used because that was the design of the manufacturer related to the memory subsystem. But then you have the issue of ram used having different timings, etc., creating variance even within the same system being tested. So by not controlling those variables, you can get wildly inconsistent IPC measures between systems with the same CPU, needless to say comparing the effects of different design choices on CPUs.

Edit: You are also arguing the absurd. When you go to compare an A12 to Ryzen, they can have different instruction sets, different schedulers, etc., meaning that if you are not even using the same instructions, you cannot compare the IPC. That is also why I mentioned before that software can select different instruction set extensions, which compromises the entire IPC comparison.

Edit 2: To clarify the point on memory. Vendor A uses Samsung memory rated to 3200MT/s. Vendor B uses Micron memory rated to 3200MT/s. Both have different timings resulting in different bandwidths and latencies, even though both are rated for the same 3200MT/s. Will that effect the IPC value on those two systems that use the same CPU?

JoeRambo · Aug 3, 2019

ajc9988 said:
Edit: You are also arguing the absurd. When you go to compare an A12 to Ryzen, they can have different instruction sets, different schedulers, etc., meaning that if you are not even using the same instructions, you cannot compare the IPC. That is also why I mentioned before that software can select different instruction set extensions, which compromises the entire IPC comparison.

What is compared, is not the literal instructions per clock (that would obviously impacted by instruction sets and so on), but rather IPC is derived from benchmark results. As long as benchmark like SPEC is not defeated by some clever compiler tricks and/or hw prefetching access patterns, it gives good overall picture of score/Ghz => derived IPC value of architecture

Zucker2k · Aug 3, 2019

JoeRambo said:
What is compared, is not the literal instructions per clock (that would obviously impacted by instruction sets and so on), but rather IPC is derived from benchmark results. As long as benchmark like SPEC is not defeated by some clever compiler tricks and/or hw prefetching access patterns, it gives good overall picture of score/Ghz => derived IPC value of architecture

This is why I said, stick to manufacturer specs. There's already an established, standardized methodology for IPC tests - namely, SPEC. The only reason for all this back and forth is because of the rather shocking attempt to introduce power budgets into the definition of the term.

AMDK11 · Aug 3, 2019

A given microarchitecture has fixed IPCs, regardless of memory. I believe that the cache memory (latency and bandwidth) is one of those actions that allow the IPC to reach as close as possible to the theoretical maximum for a given microarchitecture.

Andrei. · Aug 3, 2019

AMDK11 said:
A given microarchitecture has fixed IPCs, regardless of memory. I believe that the cache memory (latency and bandwidth) is one of those actions that allow the IPC to reach as close as possible to the theoretical maximum for a given microarchitecture.

So it's fixed IPC yet IPC varies?

DrMrLordX · Aug 3, 2019

In most cases, observers/readers are willing to accept IPC tests at face value without worrying overmuch about how clock scaling will affect the measurement. This is especially true when dealing with stock configurations that won't vary much in clockspeed outside of testing.

IceLake in particular doesn't have much headroom. If someone manages to bench (for example) 3 GHz static clocked IceLake-U versus . . . I don't know, 3 GHz Picasso, I'm not going to pitch a fit since I know neither IceLake-Y nor Picasso will go much higher in clocks than that anyway.

AMDK11 · Aug 3, 2019

Andrei. said:
So it's fixed IPC yet IPC varies?

Yes. If the IPC bottleneck of the microarchitecture is the cache memory, the change in its speed (clocking) will improve / worsen the virtually achievable IPC with respect to the theoretical maximum.

SAAA · Aug 3, 2019

The long quest for the perfect IPC definition keeps going… while we are at it Andrei's test above shows some interesting numbers over just a few hundred MHz, one wonders if other commonly used benches are affected as much by clock speed, oh what about gaming?

The multiplatform geekbench that everyone loves! I'm tempted to run it @ 2.5 GHz and see how much better Skylake fares against apple's cores clock/clock.

Also relevant: do different architectures vary the same over a range, say 3-4GHz, or there's consistent differences under a specific test? It would be another point in favour of architectures that can reach high clocks and keep scaling well. If you gain 10% frequency but IPC goes down 10%...

jpiniero · Aug 3, 2019

Hmm, according to ark the Icelake-Y parts only support 32 GB of LPDDR3 (?). Assuming the LPDDR3 is a typo, the 32 GB is not.

Also for some reason Intel didn't brand the IGP, it's just "Iris Plus" if you get 48 or 64 EUs and "UHD Graphics" for 32 EUs.

Nothingness · Aug 3, 2019

ajc9988 said:
But IPC is also to explore the advantages and disadvantages of design choices of the uarch. As such, when testing IPC, you try to control variables that prevent an apples to apples comparison of the uarch, such as memory bandwidth and latency.

The problem is that a micro architecture is also designed with a frequency target in mind (and also power) and this has huge impacts on design choices. So by setting all chips to the same frequency you're just forgetting that.

Icelake "IPC" at iso freq is really good. I wonder what frequency target Intel had in mind. Right now it's significantly below what we have on 14nm. Is that a process issue or did they target lower frequency and higher IPC to get better efficiency (hoping future processes will help to get to higher frequencies)? Or perhaps it's just too early and we'll see Icelake chips with higher freq.

PS - Anyway the winner at that game is Apple by far: their "IPC" lead at iso clock on SPEC is just huge. Would you say that's the better microarch for laptops and desktops? Or is it the better microarch for a given frequency/power target that fits phones/tablets needs?

AMDK11 · Aug 3, 2019

Each subsequent microarchitecture has always introduced higher IPCs. Until now, the deviations in the x86 programming model were Netbrust (Pentium4) and Bulldozer (FX). In the history of processors, microarchitecture with a higher IPC has always had an advantage over the microarchitecture with low IPC. We probably will not see a much faster clock anymore. We have a lot of craze for the number of cores behind us. In HPC or servers, each core is worth its weight in gold, but on home computers in typical applications of 10-12 cores and even 8 cores is too much. I doubt this will change, because not everything can be parallelized, and the communication / synchronization between the cores absorbs many cycles of the processor. In my opinion, cores, as before, will be increasingly larger in terms of complexity, and thus the number of transistors.

Every microarchitecture from any software model, whether it is x86 (CISC-RISC) / RISC / CISC / VISC, always evolves towards a higher IPC, with some exceptions that have been discontinued in favor of the high IPC.

Computational cores only simplify where they are to be specialized for specific burdens, but even in this case, next generations introduce the expansion of microarchitecture end its complexity.

Discussion Intel current and future Lakes & Rapids thread

Lifer

Lifer

Senior member

Golden Member

Senior member

Golden Member

Senior member

Golden Member

Senior member

Senior member

Senior member

Senior member

Senior member

Senior member

Senior member

Golden Member

Golden Member

Senior member

Senior member

Lifer

Senior member

Senior member

Lifer

Diamond Member

Senior member