Question Alder Lake - Official Thread


TheELF

Diamond Member
Dec 22, 2012
3,973
731
126
It affects performance in any kind of branchy code where the predictors can't maintain high hit rates. If the Gracemont cores have to snag something from L3 outside of their cluster or from main memory, that latency is going to sting you.
Are we still talking about games?!
Because games pre-render and pre-compute everything they possibly can; the only real-time thread is the user input, and everything else is slowed down or paused to match the speed of that one thread.
Unless it gets screwed up somehow and everything happens whenever, like parts of the graphics showing up later or not at all (the Assassin's Creed horror-face bug), or you get heavy stutters because the pausing takes way too long.
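
To illustrate the pacing idea, here's a minimal fixed-timestep loop sketch (generic, not from any real engine; the function names are made up): the simulation and rendering are deliberately slowed to match real time, and a tick that overruns its budget is exactly the stutter described above.

```cpp
#include <chrono>
#include <thread>

// Hypothetical stand-ins for real engine calls:
static int ticks_left = 3;
bool poll_input()        { return ticks_left-- > 0; }  // pretend the user quits after 3 ticks
void update_simulation() {}                            // advance the game state one tick
void render_frame()      {}                            // draw the current state

int main() {
    using clock = std::chrono::steady_clock;
    const auto tick = std::chrono::milliseconds(16);   // ~60 updates per second
    auto next = clock::now() + tick;

    while (poll_input()) {      // the one real-time check per tick
        update_simulation();    // everything pre-computed/pre-rendered lives here
        render_frame();
        // The whole loop is paused here to match real time. If update+render
        // overran the 16 ms budget, this is where the stutter shows up.
        std::this_thread::sleep_until(next);
        next += tick;
    }
}
```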
 

SKORPI0

Lifer
Jan 18, 2000
18,411
2,319
136
Not sure if I'll upgrade to 12th Gen Intel. A DDR5 price of $375 for only 32GB is ridiculous (understandable since it's new); I'll wait until it drops a lot lower.
VENGEANCE® 64GB (2x32GB) DDR5 DRAM 4400MHz C36 Memory is currently priced at $615. :confused:


Paid $275 for Corsair Vengeance RGB Pro 64GB DDR4-3600 in early Sept. last year, and added another 64GB a few weeks later for about the same price. It's about $470 now at Amazon.
I've barely used my 11th Gen (4-month-old) system for video encoding/remuxing. I'm pissed that GeForce RTX 30 series/AMD RX 6000 series GPUs are hard to get at MSRP; I'm not paying for markups on eBay, etc.
Hopefully the RTX 40 series will be more available in late 2022/early 2023.
 

maddie

Diamond Member
Jul 18, 2010
4,749
4,691
136
Are we still talking about games?!
Because games pre-render and pre-compute everything they possibly can; the only real-time thread is the user input, and everything else is slowed down or paused to match the speed of that one thread.
Unless it gets screwed up somehow and everything happens whenever, like parts of the graphics showing up later or not at all (the Assassin's Creed horror-face bug), or you get heavy stutters because the pausing takes way too long.
WHAAAT?
 
  • Haha
Reactions: DAPUNISHER

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
There haven't been many benchmarks of Goldmont Plus (atom-class branch predictor) vs Tremont (core-class branch predictor), or any benchmarks examining mispredictions, of late.

Agner Fog branch bottleneck text (condensed):
Silvermont - The prediction rate is fair.
Goldmont - The prediction rate is fair.
Nehalem - The branch prediction algorithm is good, especially for loops.
Sandy/Ivy - Worse than Nehalem for some cases.
Haswell/Broadwell - The size of the branch target buffer and the construction of the branch predictor is unknown, but at least the prediction rate seems good.
Skylakes - Same as above.
Icelake/Tigerlake - The branch prediction mechanism is quite complicated and allegedly improved over previous models. Same as above.

RWT's VTune and CodeAnalyst comparison of Core 2 vs K8 has Core 2 at ~95% prediction hits in Far Cry, FEAR, Prey, and X3. "Both the Core 2 and K8 tend to encounter a branch every 5-7 instructions. Since each processor has an instruction window of ~100 instructions, this means that on average, there will be as many as 20 predicted branches in-flight at once."
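
For anyone who wants to see the misprediction cost directly, here's a minimal sketch (my own toy, not one of the benchmarks above): the identical loop runs several times faster once the data is sorted, purely because the branch goes from ~50% mispredicted to nearly always predicted.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

long long sum_big_values(const std::vector<int>& v) {
    long long sum = 0;
    for (int x : v)
        if (x >= 128)          // one data-dependent branch per element
            sum += x;
    return sum;
}

int main() {
    std::vector<int> data(1 << 24);          // 16M values in [0, 255]
    std::mt19937 rng(42);
    for (int& x : data) x = rng() % 256;

    auto time_it = [&](const char* label) {
        auto t0 = std::chrono::steady_clock::now();
        volatile long long s = sum_big_values(data);   // volatile: keep the work
        (void)s;
        auto t1 = std::chrono::steady_clock::now();
        long long ms = (long long)std::chrono::duration_cast<
            std::chrono::milliseconds>(t1 - t0).count();
        std::printf("%-40s %lld ms\n", label, ms);
    };

    time_it("random data (predictor misses ~50%):");
    std::sort(data.begin(), data.end());
    time_it("sorted data (predictor hits ~100%):");
}
```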

There is also the case of fast recovery, which was introduced with Nehalem, so a full pipeline flush isn't needed.

Cost for GoldenCove - 6-way decode front-end power consumption + recovery from op-cache (2-cycle recovery???)
Cost for Gracemont - 3-way + 3-way decode front-end power consumption + recovery from L1i (6-cycle recovery? or 5-cycle recovery if OD-ILD bypassed???)

A particularly bad misprediction chain should waste more power on GoldenCove than on Gracemont.
Using a rather old Intel paper on micro-op caches (and me potentially misinterpreting percentages and whatnot)... misprediction recovery from the op-cache should be ~2.6992 watts (40%? of 28% of processor avg power) for GoldenCove, and ~1.687 watts (28% of processor avg power) for L1i recovery on Gracemont.
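
Making that percentage arithmetic explicit (the average-power figures below are back-solved from the quoted watt numbers, so they're my assumption, not something stated in the Intel paper):

```cpp
#include <cstdio>

int main() {
    // Back-solved assumptions, not measured values:
    double golden_cove_avg = 24.1;   // implied by 2.6992 W = 40% of 28% of avg
    double gracemont_avg   = 6.025;  // implied by 1.687 W = 28% of avg

    double gc_recovery = 0.40 * 0.28 * golden_cove_avg;  // op-cache recovery share
    double gm_recovery = 0.28 * gracemont_avg;           // L1i recovery share

    std::printf("GoldenCove recovery: ~%.4f W\n", gc_recovery);  // ~2.6992 W
    std::printf("Gracemont  recovery: ~%.3f W\n", gm_recovery);  // ~1.687 W
}
```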

So, while it hurts on the performance side w/o a sibling thread to take over, it hurts less on the power side. Less power wasted is more ideal for the Alder Lake N lineup.

On another note, I hope we get an Alder Lake big N with 24 E-cores (24 threads in P+E, 24 threads in just E). That way we can really compare a hybrid architecture against a pure low-power architecture w/ extreme-width DVFS/AVFS (AMD-esque). [[8+8 / 0+24 / 6+0 being the big-boy dies]]
 

Harry_Wild

Senior member
Dec 14, 2012
834
150
106
Just wait till Intel moves on from 10nm to 7nm, it's going to be rocking again. Intel swallowed its pride and let TSMC do the manufacturing now, just like Apple. Intel had factories in Mexico and gave up on most of its factories in the U.S. around 10 years ago and lost its edge! Maybe Intel will start manufacturing CPUs in the U.S. again now!
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,573
14,526
136
Just wait till Intel moves on from 10nm to 7nm, it's going to be rocking again. Intel swallowed its pride and let TSMC do the manufacturing now, just like Apple. Intel had factories in Mexico and gave up on most of its factories in the U.S. around 10 years ago and lost its edge! Maybe Intel will start manufacturing CPUs in the U.S. again now!
It's been producing them in Hillsboro, Oregon for quite some time now. YEARS.

In fact, 6 plants, the most recent starting in 2013.
 

UsandThem

Elite Member
May 4, 2000
16,068
7,380
146

DrMrLordX

Lifer
Apr 27, 2000
21,643
10,860
136
Games are bound to how fast the main loop can run, and that main loop starts with "see what the user does" (poll user input).

Congratulations

Now you understand why games respond so well to fast memory and large caches. And why they are hurt so much by intercore latency and/or memory/cache latency.
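
A quick sketch of the effect (a generic dependent pointer chase, nothing game-specific): each load depends on the previous one, so the core can't hide the cache-or-memory latency behind prediction or prefetching, and every step pays it in full.

```cpp
#include <chrono>
#include <cstdio>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

int main() {
    const size_t n = 1 << 22;                 // 4M entries * 8 B = ~32 MB, past most L3s
    std::vector<size_t> next(n);
    std::iota(next.begin(), next.end(), size_t{0});

    // Sattolo's algorithm: shuffle into a single random cycle so the chase
    // below touches every slot exactly once, in cache-hostile order.
    std::mt19937_64 rng(1);
    for (size_t k = n - 1; k > 0; --k)
        std::swap(next[k], next[rng() % k]);

    size_t i = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (size_t step = 0; step < n; ++step)
        i = next[i];                          // each load depends on the previous one
    auto t1 = std::chrono::steady_clock::now();

    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count() / double(n);
    std::printf("~%.1f ns per dependent load (ended at %zu)\n", ns, i);
}
```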
 

Zucker2k

Golden Member
Feb 15, 2006
1,810
1,159
136
G.SKILL announces DDR5-7000 CL40 RAM

Intel's XMP 3.0 goes up to DDR5-6666 CL40
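
Worth converting those CAS numbers to absolute time before panicking about CL40 (simple arithmetic; the comparison modules are just examples I picked):

```cpp
#include <cstdio>

// First-word CAS latency in ns: CL cycles divided by the memory clock,
// which is half the transfer rate (DDR = two transfers per clock).
double cas_ns(double transfer_rate_mts, double cl) {
    return cl / (transfer_rate_mts / 2.0) * 1000.0;
}

int main() {
    std::printf("DDR5-7000 CL40: %.2f ns\n", cas_ns(7000, 40));  // ~11.4 ns
    std::printf("DDR5-6666 CL40: %.2f ns\n", cas_ns(6666, 40));  // ~12.0 ns
    std::printf("DDR4-3600 CL16: %.2f ns\n", cas_ns(3600, 16));  // ~8.9 ns
}
```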
 

Zucker2k

Golden Member
Feb 15, 2006
1,810
1,159
136
  • Like
Reactions: lightmanek

lightmanek

Senior member
Feb 19, 2017
387
754
136
  • Like
Reactions: Zucker2k

TheELF

Diamond Member
Dec 22, 2012
3,973
731
126
Very nice comparison, Alder Lake Cinebench R20 performance per watt... :grimacing:

Damn, so compared to Rocket Lake, ADL is ~50% faster at 125W and ~70% faster at 241W (compared to RKL's 251W), and mostly at the same price, for the CPU alone anyway.
That's all they need to sell them; actually, it's way, way more than they would need to sell them.

But these numbers are collected from various sources, so they're not really 100% certain.
 

Ajay

Lifer
Jan 8, 2001
15,468
7,874
136
Games are bound to how fast the main loop can run, and that main loop starts with "see what the user does" (poll user input).
Games don't poll anything anymore. User input is handled by the OS. The gaming application subscribes via a particular API to receive these inputs (probably a callback, but there are other ways). The gaming application receives a notification of ready input asynchronously (on another thread). The game code decides what needs to be done and then sends a set of instructions to the game's rendering engine to update itself and show the user any necessary changes. Roughly.
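
Something like this, in sketch form (generic callback registration; the names here are made up, and the real thing would be Win32 messages, SDL events, XInput, etc.):

```cpp
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>

struct InputEvent { int key; bool pressed; };

// Hypothetical input layer: the game subscribes once, and the OS/input
// thread pushes events through the callback asynchronously.
class InputSystem {
public:
    void subscribe(std::function<void(const InputEvent&)> cb) { callback_ = std::move(cb); }
    void os_delivers(const InputEvent& e) { if (callback_) callback_(e); }
private:
    std::function<void(const InputEvent&)> callback_;
};

int main() {
    InputSystem input;
    std::mutex m;
    std::queue<InputEvent> pending;  // drained by the game/render thread each tick

    // Game code: no hardware polling -- just queue the notification and let
    // the main loop decide what to tell the rendering engine next frame.
    input.subscribe([&](const InputEvent& e) {
        std::lock_guard<std::mutex> lock(m);
        pending.push(e);
    });

    input.os_delivers({32, true});   // simulate the OS reporting a key press
    std::printf("pending events: %zu\n", pending.size());
}
```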
 

Hitman928

Diamond Member
Apr 15, 2012
5,324
8,015
136
Damn, so compared to Rocket Lake, ADL is ~50% faster at 125W and ~70% faster at 241W (compared to RKL's 251W), and mostly at the same price, for the CPU alone anyway.
That's all they need to sell them; actually, it's way, way more than they would need to sell them.

But these numbers are collected from various sources, so they're not really 100% certain.

It's 70% faster at PL2=PL1=241W for ADL versus RKL, which is at PL2=251W and PL1=125W. So ADL is 70% faster in this comparison but also using more average power across the run (unless tau was set to unlimited, or long enough to complete the bench on RKL, but then why even list the PL1 on RKL?). According to Intel, ADL is 50% faster than RKL when both are at ~240W, so it seems it's ~50% faster at equal power. Maybe low-power situations are different, though.
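
To put the tau point in concrete terms, a simplified model (the 56 s tau below is my illustrative assumption, not a quoted spec): the chip runs at PL2 until the turbo budget expires, then settles to PL1, so the average power depends heavily on how long the benchmark runs.

```cpp
#include <algorithm>
#include <cstdio>

// Average package power for a run of given length: PL2 until the turbo
// budget (tau) runs out, PL1 afterwards. Real turbo budgeting (EWMA) is
// more complicated; this is the back-of-envelope version.
double avg_power(double pl1, double pl2, double tau_s, double run_s) {
    double turbo = std::min(tau_s, run_s);
    return (pl2 * turbo + pl1 * (run_s - turbo)) / run_s;
}

int main() {
    // RKL-style limits from the post (PL2=251 W, PL1=125 W), tau assumed 56 s:
    std::printf("RKL, 60 s run:  ~%.0f W average\n", avg_power(125, 251, 56, 60));   // ~243 W
    std::printf("RKL, 600 s run: ~%.0f W average\n", avg_power(125, 251, 56, 600));  // ~137 W
    // ADL at PL1=PL2=241 W simply averages 241 W regardless of run length.
}
```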
 

TheELF

Diamond Member
Dec 22, 2012
3,973
731
126
Congratulations

Now you understand why games respond so well to fast memory and large caches. And why they are hurt so much by intercore latency and/or memory/cache latency.
The only data point we would have for that would be canned benchmarks, because that's all reviewers give us, and those don't use any user input.
Canned benches can fill up all of their cache with upcoming stuff that the game wouldn't know about, or would have to guess at, if you, the user, actually moved around in it randomly.
Games don't poll anything anymore. User input is handled by the OS. The gaming application subscribes via a particular API to receive these inputs (probably a callback, but there are other ways). The gaming application receives a notification of ready input asynchronously (on another thread). The game code decides what needs to be done and then sends a set of instructions to the game's rendering engine to update itself and show the user any necessary changes. Roughly.
I would look up what the word "poll" means to devs if I cared enough, but I doubt it's exclusively used for assembly-level hardware poking (pulling/pushing, whatever it would be).
The whole point was that the game checks for user input, regardless of how that is accomplished.
 

DrMrLordX

Lifer
Apr 27, 2000
21,643
10,860
136
The only data point we would have for that would be canned benchmarks, because that's all reviewers give us, and those don't use any user input.

What are you talking about? Memory latency, cache latency, and intercore latency (which affects the first two) have affected game benchmark performance for years, even in benchmarks that are not "canned".