Is Intel still on the Nehalem architecture as their groundwork?

CentroX

Senior member
Apr 3, 2016
351
152
116
Nehalem is 10 years old now and was their first Core i5/i7 architecture. Is Intel limited by the fact that their groundwork is old now? Do they need a clean slate for their future generations?
 

Mopetar

Diamond Member
Jan 31, 2011
7,842
5,997
136
Probably. The recent KL release makes it look like they're pretty tapped out in terms of even moderate overall performance gains.

A fresh start and ability to design without existing architecture constraints would allow them to take what they've learned and build something better.

They must have had all manner of good design ideas that they had to kill because they only work if other significant changes are also made, which isn't possible with their tick-tock approach.

I only think they haven't done so already because of their previous focus on getting into mobile and AMD having anemic CPU performance.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Nehalem is 10 years old now and was their first core i5/i7 architecture. Is intel limited by the fact that their groundwork is old now? Do they need a clean slate for their future generations?

Actually it's based off Sandy Bridge. The Nehalem core is a minor change (even minuscule) from the original Core uarch, while Sandy Bridge does the overhaul to finally get it away from having the remnants of Pentium Pro.

Doing an overhaul does not indicate it'll turn out to have more headroom. CPU architectures have largely hit limits on IPC and clock. Look at CPUs from various companies, market positioning, and ISAs. The level is converging. And yes, Zen is another example.

I'd be surprised if the already disappointing 5 to 10% gain today is maintained in the future. Even with transistors, so-called revolutionary changes like FinFET were required just to keep the performance increases going, never mind bring a true breakthrough in performance. In the 90s, we got 50% improvements with new transistors in every area, while FinFET with all its complexities brought 40% only in very limited scenarios.
 

riggnix

Junior Member
Jul 27, 2016
23
3
41
I would actually claim that they are still on Pentium Pro, not on Nehalem. That's kind of hard to define though; as @IntelUser2000 pointed out, you might also say they did the "switch" with Sandy Bridge.

If you consider what happened to AMD in the last 10 years you can see why they would stay on the same architecture though. It took AMD 2 new architectures to be competitive again, not just 1. Assuming Ryzen is competitive of course.

So it might sound horrible to still be "on an old architecture", but it actually makes sense in a lot of ways too.
 

NTMBK

Lifer
Nov 14, 2011
10,239
5,026
136
Actually it's based off Sandy Bridge. Nehalem core is a minor change(even miniscule) from original Core uarch while Sandy Bridge does the overhaul to finally get it away from having the remnants of Pentium Pro.

Nehalem was a very big overhaul of the CPU as a whole- whole new cache hierarchy with introduction of the L3, L2 shrinking dramatically, memory controller unified into the CPU... it was a very Phenom-like architecture. The core itself didn't change that much apart from the (pretty major) addition of Hyperthreading, but everything around it did.
 

Greyguy1948

Member
Nov 29, 2008
156
16
91
Interesting that the small L2 cache has been working so well. I remember Anand asking whether it was too small. Now it will soon be bigger, like 1024 KB. I guess 64-bit was not very interesting when Nehalem was new. One alternative would be to split it 512i + 512d like IBM has in some CPUs.
 

lopri

Elite Member
Jul 27, 2002
13,209
594
126
I'd be surprised if the already disappointing 5 to 10% gain today is maintained in the future. Even with transistors so-called revolutionary changes like FinFET were required just to keep the performance increases going, nevermind bring a true breakthrough in performance. In the 90s, we got 50% improvements with new transistors in every area, while FinFET with all its complexities brought 40% only in very limited scenarios.
Didn't Intel's first FinFET lower the performance? I was very surprised by Ivy Bridge's power/thermal characteristics after hearing all the fanfare of FinFET's awesomeness. Compared to 32nm Sandy Bridge, its power consumption did not decrease and it clocked lower. That made me somewhat skeptical of FinFET, then I was surprised again when Samsung's 14nm FinFET actually did improve upon its 20nm by a sizable margin. TSMC followed shortly thereafter with 16nm FF/FF+ which appeared even better than Samsung's 14nm FF. That was when I was convinced that Intel's process lead is nowhere near where it was thought to be.
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
353
96
Didn't Intel's first FinFET lower the performance? I was very surprised by Ivy Bridge's power/thermal characteristics after hearing all the fanfare of FinFET's awesomeness.
Ivy Bridge had fine power/thermals, but bad TIM on the K chips sort of wrecked it.
Samsung's 14nm FinFET actually did improve upon its 20nm by a sizable margin.
Was not 20nm considered a rolling disaster for both Samsung and TSMC? There's a reason they cancelled GPUs on it.
 
Mar 10, 2006
11,715
2,012
126
Actually it's based off Sandy Bridge. Nehalem core is a minor change(even miniscule) from original Core uarch while Sandy Bridge does the overhaul to finally get it away from having the remnants of Pentium Pro.

Doing an overhaul does not indicate it'll turn out to have more headroom. CPU architectures have largely hit limits on IPC and clock. Look at CPUs from various companies, market positioning, and ISAs. The level is converging. And yes, Zen is another example.

I'd be surprised if the already disappointing 5 to 10% gain today is maintained in the future. Even with transistors so-called revolutionary changes like FinFET were required just to keep the performance increases going, nevermind bring a true breakthrough in performance. In the 90s, we got 50% improvements with new transistors in every area, while FinFET with all its complexities brought 40% only in very limited scenarios.

Nehalem core was very modestly changed compared to the core in the C2D/C2Q line. Sandy Bridge made some significant enhancements, but it was still based on the same Nehalem core with a lot of bits reused.

I would say NHM -> SNB is on the same order as SNB->HSW or HSW->SKL.
 

lopri

Elite Member
Jul 27, 2002
13,209
594
126
Ivy Bridge had fine power/thermals, but bad TIM on k chips sort of wrecked it.
That is one explanation. I wonder why Intel would do such a thing, though? That made the whole FinFET achievement look underwhelming.

Was not 20nm considered a rolling disaster for both Samsung and TSMC? There's a reason they cancelled GPUs on it.
You are talking about 28 nm to 20 nm transition which was a mediocre improvement, and more importantly 20 nm was not FinFET. Also I am not talking about financial/business point of view, which I am sure has a lot of drama in it depending on who you speak to. Technology-wise, the Exynos 7420, which was the 2nd FinFET after Ivy Bridge, was a significant improvement over the Exynos 5433.
 

inf64

Diamond Member
Mar 11, 2011
3,702
4,030
136
Nehalem core was very modestly changed compared to the core in the C2D/C2Q line. Sandy Bridge made some significant enhancements, but it was still based on the same Nehalem core with a lot of bits reused.

I would say NHM -> SNB is on the same order as SNB->HSW or HSW->SKL.
HSW->SKL is the least impressive "jump" in the whole Core generation history. Basically, for Haswell desktop users, it is ~6% higher IPC according to AT. For BDW users it was a ~3% "jump". The only saving grace is KL and its crazy high OC potential.
 

inf64

Diamond Member
Mar 11, 2011
3,702
4,030
136
AT results were deflated because of an early Z170 bug, so pretty please, stop bringing them up.
We will have new results soon as per Ian. He is preparing new CPU charts and there you can check how much "faster" Skylake is.

edit: hardware.fr shows very similar gain of 6.7% in apps.
 

french toast

Senior member
Feb 22, 2017
988
825
136
Processor design is a constant evolution of what came before (even Ryzen uses Bulldozer IP, I would think), so it's hard to pin down just when a new uarch starts and the old one ends.
A bit like that old Only Fools and Horses sketch:
"I've had this brush for 20 years, it's had 8 new handles and 10 new heads!" #trigger.

I consider Kaby Lake to be a heavily modified Nehalem; others may disagree.
 

lopri

Elite Member
Jul 27, 2002
13,209
594
126
Yes. No one throws away a successful μarch or a lesson learned from a failed μarch.
 

ElFenix

Elite Member
Super Moderator
Mar 20, 2000
102,414
8,356
126
Nehalem was a very big overhaul of the CPU as a whole- whole new cache hierarchy with introduction of the L3, L2 shrinking dramatically, memory controller unified into the CPU... it was a very Phenom-like architecture. The core itself didn't change that much apart from the (pretty major) addition of Hyperthreading, but everything around it did.
IIRC the AnandTech Nehalem article talked about how similar it was to Phenom, just better.
 

bjt2

Senior member
Sep 11, 2016
784
180
86
I think that the best solution is POWER-like: separate schedulers for int, fp, memory and branch. Intel is at a unified scheduler. AMD split into int+mem+branch and FP. That is still better than unified, because you can have more ports at even lower FO4, but with fully split schedulers you can lower the FO4 further...
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Nehalem was a very big overhaul of the CPU as a whole- whole new cache hierarchy with introduction of the L3, L2 shrinking dramatically, memory controller unified into the CPU... it was a very Phenom-like architecture. The core itself didn't change that much apart from the (pretty major) addition of Hyperthreading, but everything around it did.

Many CPU experts were saying Sandy Bridge marks the true departure from Pentium Pro, and I am inclined to say the same. Also, I'd think modifying the core is a far more widespread change than adding some uncore and deleting some cache blocks. Nehalem was a good server chip not entirely due to the IMC/QPI, but because the good core had been hobbled by the lack of it. I'd bet Prescott on an IMC/QPI wouldn't have been so impressive.

The lack of core changes was also why the initial impact on consumer applications was not so good. Single thread without Turbo was often only about 5% faster.

Because you might as well say every current chip today is based on Intel's 8086 chip. At what point do we talk about not being based off anymore?

I think that the best solution is POWER like. Separate schedulers for int, fp, memory and branch. Intel is at unified scheduler. AMD split in int+mem+branch and FP. Still better than unified, because you can have more ports at even lower FO4, but with fully split schedulers, you can lower further the FO4...

Funny thing about being perceived as a "loser" and a "winner". When Intel chips were dominating, unified schedulers were "it". They took up less space, they were power efficient, and they were just a more efficient way of doing things. When AMD was doing well with Athlon, people said the same thing about separate schedulers. That different way of doing things was so consistent between the two companies over such a long period that it may just be company culture.

Just a different way of doing things folks.
 

nismotigerwvu

Golden Member
May 13, 2004
1,568
33
91
You really don't need to look beyond the block diagrams to see that Kaby Lake and whatnot are part of the P6 lineage. But this alone means very little. There's no reason why the current architecture can't be modified at bottleneck points. The addition of a uOP cache made a huge difference and no one claims it broke the lineage of the architecture, and the same is true for SMT. It would be far more expensive to just blow the whole thing up and start over than it would be to just alleviate the weaknesses. Lastly, identifying the weaknesses is the easy part; designing a better (faster, more efficient, etc.) implementation is the tricky part.
 

bjt2

Senior member
Sep 11, 2016
784
180
86
Many CPU experts were saying Sandy Bridge marks the true departure from Pentium Pro, and I am inclined to say the same. Also I'd think modifying the core is a far widespread change than adding some uncore and deleting some cache blocks. Nehalem was a good server chip not entirely due to the IMC/QPI but because the good core was hobbled by the lack of it. I'd bet Prescott being on an IMC/QPI wouldn't have been so impressive.

The lack of core changes were also why the initial impact on consumer applications were not so good. Single thread without Turbo were often only about 5% faster.

Because you might as well say every current chip today is based on Intel's 8086 chip. At what point do we talk about not being based off anymore?



Funny thing about being perceived a "loser" and a "winner". When Intel chips were dominating, unified schedulers were "it". They take up less space, they were power efficient, and just a more efficient way of doing things. When AMD was doing well with Athlon, people said the same thing about separate schedulers. That different way of doing things were so consistent between the two companies over such a long period it may just be company culture.

Just a different way of doing things folks.

A unified scheduler is better (per queueing theory) only at the same number of servers (queues). But a split scheduler allows you to implement more queues. The peak computing throughput is 4 INT + 4 FP for Zen and 4 int OR 3 vecint OR 2 FP for SKL/KBL. Split schedulers were not so fast with AMD because they lacked zero-cycle moves, and so had to waste an ALU on such a simple task. Now Zen is like Intel and can do 4 zero-cycle moves per clock in the renamer for each pipeline (INT/FP).
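
To put rough numbers on the queueing argument, here is a toy sketch. It assumes textbook M/M/c queues (random arrivals, exponential service times), which is only an analogy for a real out-of-order scheduler; the function names and the rates used are made up for illustration and do not model any actual CPU.

```python
from math import factorial


def mmc_mean_wait(arrival_rate, service_rate, servers):
    """Mean time spent waiting in ONE shared queue feeding `servers` servers.

    Uses the Erlang C formula for an M/M/c queue. This is a textbook model,
    not a model of any real scheduler.
    """
    offered = arrival_rate / service_rate  # offered load in "servers' worth" of work
    if offered >= servers:
        raise ValueError("unstable: offered load must be below the number of servers")
    tail = offered**servers / factorial(servers) * servers / (servers - offered)
    head = sum(offered**k / factorial(k) for k in range(servers))
    p_wait = tail / (head + tail)  # probability an arrival has to queue
    return p_wait / (servers * service_rate - arrival_rate)


def split_mean_wait(arrival_rate, service_rate, queues):
    """Mean wait when the same work is spread evenly over `queues` independent
    single-server queues (each one an M/M/1 queue)."""
    return mmc_mean_wait(arrival_rate / queues, service_rate, 1)


if __name__ == "__main__":
    load, rate = 3.2, 1.0  # hypothetical numbers, chosen only for illustration
    print("unified, 4 servers:", mmc_mean_wait(load, rate, 4))
    print("split,   4 queues :", split_mean_wait(load, rate, 4))
    print("split,   8 queues :", split_mean_wait(load, rate, 8))
```

With these made-up rates the shared queue waits less than four split queues, which matches the "same number of servers" part of the claim, while eight split queues come out slightly ahead of four pooled ones. Whether a real design can actually afford the extra ports is exactly the part this toy model does not answer.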
 

NTMBK

Lifer
Nov 14, 2011
10,239
5,026
136
iirc the anandtech nehalem article talked about how similar it was to phenom. just better.

Oh definitely, it was "Phenom done right" :) The overall design of Phenom I was pretty sound, but it clocked too low, didn't have enough L3 cache for all those cores, the DDR2 memory was too slow to feed those 4 cores (compounding the cache issues), and it had a nasty TLB bug. Phenom II fixed all of those things, but by that point Intel had brought out a superior version in the form of Nehalem.
 

NTMBK

Lifer
Nov 14, 2011
10,239
5,026
136
Many CPU experts were saying Sandy Bridge marks the true departure from Pentium Pro, and I am inclined to say the same. Also I'd think modifying the core is a far widespread change than adding some uncore and deleting some cache blocks. Nehalem was a good server chip not entirely due to the IMC/QPI but because the good core was hobbled by the lack of it. I'd bet Prescott being on an IMC/QPI wouldn't have been so impressive.

The lack of core changes were also why the initial impact on consumer applications were not so good. Single thread without Turbo were often only about 5% faster.

Because you might as well say every current chip today is based on Intel's 8086 chip. At what point do we talk about not being based off anymore?

From what I can tell, keeping the execution units of the CPU fed is one of the most important challenges in CPU design these days; as such I'd view a complete overhaul of the system architecture as a pretty radical departure. Sure, the memory architecture for Conroe was able to provide enough bandwidth for a single thread, but it really fell apart as soon as you spin up all four cores. Which is why you see ridiculous gains in performance (at the same clock speed!) over C2Q in multithreaded benches:

[Image: 17765.png, multithreaded benchmark chart]

If a change can make that much difference to the performance of the CPU, I'd say that it was pretty significant :) Sure, the core didn't change much, but the uncore is just as important.
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
353
96
Which is why you see ridiculous gains in performance (at the same clock speed!) over C2Q in multithreaded benches:
You do understand that you are comparing 4c+HT against 4c only, and getting a ~33% gain from HT + IPC improvement? That is entirely in line with SMT scaling in Cinebench and a ~10% ST improvement per clock.
If a change can make that much difference to the performance of the CPU, I'd say that it was pretty significant :) Sure, the core didn't change much, but the uncore is just as important.
Yeah, 4 more threads certainly change a lot of performance :p
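
For anyone who wants to check the arithmetic behind the ~33% figure above, a quick sketch. It assumes the two effects simply multiply (total speedup = per-clock ST speedup x SMT scaling), which is a simplification; the function name is made up for illustration.

```python
def implied_smt_scaling(total_speedup, per_clock_st_speedup):
    """Back out the SMT contribution from a combined same-clock speedup.

    Assumes the gains compose multiplicatively:
        total_speedup = per_clock_st_speedup * smt_scaling
    """
    return total_speedup / per_clock_st_speedup


if __name__ == "__main__":
    # ~33% total gain and ~10% ST gain per clock, as quoted in the post above.
    print(round(implied_smt_scaling(1.33, 1.10), 2))  # about 1.21
```

So under that assumption the ~33% splits into roughly 10% from the core and about 21% from Hyperthreading.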