Intel microarchitecture: Nehalem v. Skylake

Dresdenboy · Apr 10, 2016

NTMBK said:
Nehalem also reworked the cache hierarchy, adding a new L2 per-core cache between L1 and shared LLC.

Hyperthreading?

And some minor changes:

jhu said:
Well, depends on the test and how it's run. From my own tests with Povray, I got a 5% IPC increase (1 thread per core) or 30% IPC increase (2 threads per core) over Core 2.

That's a good point regarding IPC numbers and SMT for Zen.

We may also always look into http://www.agner.org/optimize/microarchitecture.pdf, but neither there nor on the web I found a rather complete microarchitectural comparison table. It seems, we need to create our own table then.

And I also found this graph showing the evolution from P6 to Nehalem (thus completing the picture):

Source: http://www.hardwaresecrets.com/inside-intel-nehalem-microarchitecture/4/

waltchan · Apr 10, 2016

Sheep221 said:
The problem now with SB or Nehalem is lack of motherboards, the used ones are rare and overpriced, almost everyone is selling CPUs for cheap but no mobos.

ECS A55F-M4 FM1 boards will follow the same fate after all the current inventory dried up. That's what I'm trying to convince VirtualLarry, these will go up in value when all the quad-core FM1s depreciate down to $5 shipped.

They're keeping the used LGA1156 boards prices high because there's a rising demand for worthless LGA1156 CPUs when they depreciate down to $5 shipped. It will be chaos buying.... I've said this many times already, but no one believes me here.

Deders · Apr 10, 2016

Apparently with Nehalem, most of the changes were made in the front end as to how the CPU was fed data. There are quite a more changes including moving the dedicated loop stream detector (LSD) from the front to the middle where it could serve both sides.

There's a couple of good articles, one sums it up in 3 pages, the other goes into a lot more of the details. I'm learning quite a lot here.

http://arstechnica.com/gadgets/2008/04/what-you-need-to-know-about-nehalem/2/

http://www.realworldtech.com/nehalem/

Borealis7 · Apr 10, 2016

Deders said:
Quote:
Originally Posted by Borealis7
the Ring Bus lives on!

did Nehalem have a Ring Bus? i'm too lazy to google it.

Deders said:
From what I remember, it did not.

So there's your major architectural change. from Nehalem to SNB, the introduction of the ring bus which is still prevalent today.

Sheep221 · Apr 10, 2016

waltchan said:
ECS A55F-M4 FM1 boards will follow the same fate after all the current inventory dried up. That's what I'm trying to convince VirtualLarry, these will go up in value when all the quad-core FM1s depreciate down to $5 shipped.

They're keeping the used LGA1156 boards prices high because there's a rising demand for worthless LGA1156 CPUs when they depreciate down to $5 shipped. It will be chaos buying.... I've said this many times already, but no one believes me here.

It's a pity otherwise because I would gladly invest in another SB or Nehalem system just for the fact those CPUs were of much higher quality than following IB and Skylake respectively. They still perform well for most scenarios and don't need delidding and heavy coolers.

majord · Apr 10, 2016

Dresdenboy said:
After this thread turned to Atoms and past server CPUs, I thought I add something to the topic.

I made a simple animation of Oliverda's uarch diagrams to show the evolution.

This might be of interest - at least the top half in relation to this discussion.

I've been tinkering around with this for a while and decided to finish it in lieu of this thread

It's an attempt to take process out of the equation, to get a visual picture of how the cores have grown (or shrunk).

My 2 cents on Nehalem: As far as uArch is concerned, it was a tweaked Core 2 with SMT. The Large majority of changes were designed to facilitate the introduction of SMT to the Core uArch. (Which explains the focus on front end changes I think) .

Outside of that though the enhancements and subsequent IPC uplift (no SMT) were pretty minimal, and due tot he change in Cache architecture caused a performance regression in a few niche caches (mostly Cache games).

Nehalem was more about the moving to a monolithic, scalable processor architecture. Essentially taking away AMD's last advantage - and running away with it at the same time (i.e it was more advanced). This task was more than enough work without trying to change conroe's basic architecture on top of it all, so stands to reason really.. They saved that for Sandy bridge.

jhu · Apr 10, 2016

NTMBK said:
Nehalem also reworked the cache hierarchy, adding a new L2 per-core cache between L1 and shared LLC.

That's why I put "just" in quotes.

Sheep221 · Apr 10, 2016

majord said:
This might be of interest - at least the top half in relation to this discussion.

I've been tinkering around with this for a while and decided to finish it in lieu of this thread

It's an attempt to take process out of the equation, to get a visual picture of how the cores have grown (or shrunk).

My 2 cents on Nehalem: As far as uArch is concerned, it was a tweaked Core 2 with SMT. The Large majority of changes were designed to facilitate the introduction of SMT to the Core uArch. (Which explains the focus on front end changes I think) .

Outside of that though the enhancements and subsequent IPC uplift (no SMT) were pretty minimal, and due tot he change in Cache architecture caused a performance regression in a few niche caches (mostly Cache games).

Nehalem was more about the moving to a monolithic, scalable processor architecture. Essentially taking away AMD's last advantage - and running away with it at the same time (i.e it was more advanced). This task was more than enough work without trying to change conroe's basic architecture on top of it all, so stands to reason really.. They saved that for Sandy bridge.

Well nehalem definitely brought new things, reintroduced HT, on-die quad cores, processor graphics, on-die DDR3 memory controller, sandy bridge brought the ring bus, increased IPC and on-die iGP, you can't really say it was tweaked conroe.

intangir · Apr 11, 2016

I love all the uarch diagrams and die photos y'all have assembled. That's great stuff, and I love just staring at them. Kudos!

destrekor said:
... can the entire recent "Core" series be said to be related to the original Nehalem architecture?

I guess my main question is whether one could say Intel could radically re-engineer the architecture to be a totally new design, and not one that is iterative of past designs?

This is a bit of a Ship of Theseus question, isn't it? When every component of a design has been replaced piecemeal over the years, is it really the same design at the end? I would guess the line can be drawn differently by different people.

David Kanter, for one, believes a line can be drawn at Sandy Bridge.

http://www.realworldtech.com/sandy-bridge/

David Kanter said:
The Sandy Bridge CPU cores can truly be described as a brand new microarchitecture that is a synthesis of the P6 and some elements of the P4. Although Sandy Bridge most strongly resembles the P6 line, it is an utterly different microarchitecture. Nearly every aspect of the core has been substantially improved over the previous generation Nehalem. Many of these changes, such as the uop cache or physical register files, are drawn from aspects of or concepts behind the P4 microarchitecture.

Agner Fog, who's written some great manuals detailing instruction performance from experimentation on all the recent x86 microarchitectures, expressed bewilderment about why Intel didn't give Sandy Bridge a new brand name different from Nehalem's.

http://www.agner.org/optimize/blog/read.php?i=142

Agner Fog said:
It has struck me that the new Sandy Bridge design is actually under-hyped. I would expect a new processor design with so many improvements to be advertised aggressively, but the new design doesn't even have an official brand name. The name Sandy Bridge is only an unofficial code name. In Intel documents it is variously referred to as "second generation Intel Core processors", "2xxx series", and "Intel microarchitecture code name Sandy Bridge". I have never understood what happens in Intel's marketing department. They keep changing their nomenclature, and they use the same brand names for radically different technical designs. In this case they have no reason to obscure technical differences. How can they cash in on the good reputation of the Sandy Bridge design when it doesn't even have an official name?

And certainly it can be argued that the addition of the PRF, uop cache, 256-bit floating-point SIMD, and ring bus in Sandy Bridge was a bigger change than the addition of FMA, 256-bit integer SIMD, dispatch port expansion, and FIVR in Haswell. And it seems to be more than was advertised about Skylake, which is still somewhat mysterious to me in that it has nothing I can identify as a big main-feature change.

More on the "radical engineering" question in my next post.

intangir · Apr 11, 2016

destrekor said:
And, just because, let's add the original Core series, like Conroe, to the mix. ... I guess my main question is whether one could say Intel could radically re-engineer the architecture to be a totally new design, and not one that is iterative of past designs?

As other people have said, Intel tried with the Pentium 4, and even with non-x86 architectures like the iAPX 432 and Itanium, and those turned out to be dead ends. That's not to say various parts didn't eventually make it into later designs. But there's always risk in creating a new architecture/uarch from scratch, and those new directions are very commonly failures.

Andy Glew had this observation about making the jump to a new microarchitecture: https://groups.google.com/d/msg/comp.arch/dZvMy_oLiwc/OG35DtzV74IJ

Andy Glew said:
Anyway, my big problem with [Instruction-Level Distributed Parallelism] is that it was a microoptimization. To use it, you would basically have to throw away out-of-order CPUs, and start over. And in the first generation, you would just be playing catch up.

I've seen this many times. People think that they can get paid to re-implement an existing CPU better, with a better, newer, microarchiture. Maybe so - but remember that you are then competing with the design team that is already going over the existing design with a fine tooth comb. I've seen this several times with attempts to do a new low power microarchitecture. I think that [Willamette] was much like this -out-of-order done anew, rather than extending P6 OOO. The folks who pushed run-ahead were in this camp: they weren't better than OOO, just more efficient. Or so they believed. Because you also have to remember that there is risk in doing anything new - so if the new supposedly more efficient microarchitecture does not quickly make the phase change to proven to more efficient, it will get canned.

So if the company doesn't have the patience or resources to stick it through the first few iterations of a new design, it's not likely to end up in a better place than if it had just stuck with the existing one. That happened with Willamette, the first Pentium 4, which was in many ways a regression from the Pentium 3, and was only saved, I heard tell, by the transistor performance folks who enabled the next Pentium 4 microarchitecture Northwood to clock up to 3.06 GHz.

I mean, you can look at the first Alpha microprocessor that was sold by DEC for revenue, and it had the codename EV4. What happened to EV1 through EV3? EV3 was apparently manufactured in test runs only.

There's a really good video lecture called "Things CPU Architects Need to Think About", given by a role model of mine, Robert Colwell, about how since the P6 Pentium Pro came out in 1995, x86 microarchitects have run out of "big ideas" that they can use to main-feature their new designs. Unfortunately, Stanford University took the video off the web a few years ago (though they say they'll eventually put them on youtube), so you'll have to make do with my recollections of it instead.

Anyway, what Bob Colwell said was that, in contrast to the 386 which brought 32-bit to the x86, the 486 which brought pipelining, the Pentium P5 which brought superscalar (multiple execution pipelines), and the Pentium Pro P6 which implemented out-of-order instruction execution, the Pentium 4 was just a collection of 10 smaller features with maybe 2-3% performance impact each. And while that may add up to the nice 20 or 30% generational gain in performance that you need to justify the existence of your project, it is horrendous for the engineers to actually reason about ("It's not that we didn't have enough smart people, it's that their heads weren't big enough!"). Because instead of having one feature that provides clarity to all your engineering choices, now you have 10 or 20 little features that all require different conditions to work at their best, and worse, they interact in various ways that are hard to anticipate. And performance simulation is hard in that the matrix of possibilities you have to deal with and the number of workloads you have to test explode pretty quickly into unmanageability. So the amount of engineering work to tune all of these features properly is increasing exponentially with the number of features you're putting in at once.

And that lecture was given in Feb 2004! If you hear David Kaplan of AMD's take on it in Dec 2015, now they're down to putting in features that give you maybe a quarter to half a percent, instead of 2 or 3%. And that means they have to put in a lot more of these features to make a product that will sell.

https://www.youtube.com/watch?v=eDmv0sDB1Ak&t=25m22s

David Kaplan said:
One very common practice is that hardware designers will put disable bits into the hardware. And I've always heard this called "chicken bits", because the designer is "chicken" and maybe the thing won't work.

It's worth noting about this that when processors are built, there are some things that give you a ton of performance. I mean, branch prediction, everyone's got branch prediction nowadays. But the way that x86 chips get the new performance that you see generation on generation is generally by a sum of very very very small parts. There will be features that get you .5% over here, .25% over here, y'know maybe there's a big feature that gets you 1%. These all stack up, and then you get your 10-15% improvement, whatever you're expecting.

If you need to disable one of those to fix a critical bug, that's not always the end of the world, and there have been cases where that's been the workaround that had to go out. ... This does require that designers kind of think about what failures they're going to have and what things they're going to want to disable down the road, and it's one of these things where you might as well throw the kitchen sink at it, because you'd much rather never set a bit than have a bug that requires a $3 million respin.

Released chips can even afford to turn off a few new features via chicken bits if it means avoiding a mask respin to fix a bug, with the cost being the chip being almost unnoticeably slower or hotter than it otherwise might be if the feature were working correctly.

This is why I don't have high hopes for Zen. AMD just doesn't have the kind of resources to stick through the first three (probably crap) implementations to get to something that might be an improvement on what they had before. And AMD can't just skip the effort Intel went through over 5+ iterations of putting in batches of features tock by tock and tuning them after each batch; it would be like jumping processes straight from 90nm to 16nm FINFET.

It's like compound interest and investing. Yes, sure, you can bet on one big risky deal that will possibly pay off in spades of performance. But the smart bet is just making slow, incremental annual gains of a few percent each, and relying on that compounding to build your performance fortune.

majord · Apr 11, 2016

Sheep221 said:
Well nehalem definitely brought new things, reintroduced HT, on-die quad cores, processor graphics, on-die DDR3 memory controller, sandy bridge brought the ring bus, increased IPC and on-die iGP, you can't really say it was tweaked conroe.

With the exception of HT, these do not form part of the core Microarchitecture. As I said, they are processor-level or 'chip level' changes (and significant).

intangir said:
This is why I don't have high hopes for Zen. AMD just doesn't have the kind of resources to stick through the first three (probably crap) implementations to get to something that might be an improvement on what they had before. And AMD can't just skip the effort Intel went through over 5+ iterations of putting in batches of features tock by tock and tuning them after each batch; it would be like jumping processes straight from 90nm to 16nm FINFET.

It's like compound interest and investing. Yes, sure, you can bet on one big risky deal that will possibly pay off in spades of performance. But the smart bet is just making slow, incremental annual gains of a few percent each, and relying on that compounding to build your performance fortune.

Whilst it's true minor iterations have been a sucessful strategy for Intel (For now, anyway ) , to assume Zen would be a failure because it's not a simple iteration of Excavator would imply it is a completely ground-up development. Which it certainly is not.

Dresdenboy · Apr 11, 2016

intangir, thanks for your stories and links. I found the Colwell video in my archive. There are also newer videos with interesting stories. E.g. in this talk he tells the story of putting SSE into the core. (at the 26th minute) According to him, Andy Grove didn't like the idea of making the die 10% bigger for added costs while there were no apps supporting it (no CPU had it of course). The argument that worked was: "AMD's gonna do it!".

This is an interesting point in the discussion whether there would still be good progress without good competition.

A more detailed answer regarding uarchs is in the works.

nenforcer · Apr 11, 2016

Don't forget the progressive TDP improvements which have been made since the original Conroe Core 2 Quads we have seen go from

95W .45nm Quad Core Core 2 Quad ->
130W .32nm Hex Core Gulftown ->
95W .32nm Quad Core Sandy Bridge ->
77W .22nm Quad Core Ivy Bridge ->
88W .22nm Quad Core Haswell ->
65W .14nm Quad Core Broadwell ->
65(91)W .14nm Quad Core Skylake

SarahKerrigan · Apr 11, 2016

Essence_of_War said:
Look around. What fraction of the server and high end workstation, and HPC markets do POWER, SPARC, DEC Alpha, and PA-RISC have?

Sure, SPARC is in the Tianhe II (which is a pretty dubiously designed supercomputer anyway) and POWER will be in the new DoE computers and LLNL and Oak Ridge, but other than that? PA-RISC and DEC are dead, and POWER and SPARC are scraps of the server markets (although they are scraps with probably pretty reasonable margins)

Promises about the future improvements of Itanium without ever really delivering, and while x86-64 DID deliver on first Opterons and then Xeons, killed RISC. And that worked out realllly well for Intel.

I think a look at SGI's financials in the late 90's, and at the performance of the R10k microarchitecture by the early 2000's, is more enlightening for explaining the demise of server MIPS than blaming IPF. Additionally, MIPS survives to present date in datacenter-class chips (Broadcom XLP II - 20 cores, each of which is quad-issue, 4-threaded SMT, and out-of-order) with more microarchitectural sophistication than R10k ever had; they're just mainly sold into the networking niche. (By the way, I know you didn't mention MIPS in this context, but another poster did.)

Saying IPF "killed" PA is weird too, since HP started developing IPF in the late 80's as the successor to PA (Intel jumped on HP's proto-IPF project around 1994 and killed their own 64-bit RISC program.) That's like saying POWER killed ROMP - technically correct but contextually weird.

SPARC was pretty bad in early 2000's, worse than IPF on a lot of things, and I don't think that particularly helped its present market share. Neither did Sun's failure to deliver, over and over - Gemini, UltraSPARC V, Rock... Also, take a look at Oracle and Fujitsu SPARC pricing sometime - both of them publish their list prices. Then compare that to their SPEC numbers. Most of SPARC's exodus, as far as I know, went to x86, not IPF.

Alpha, I think you're right about; it's genuinely a pity 21464 never shipped as it was an impressively wide core with one of the first SMT implementations.

POWER is doing just fine; P8 is doing well in commercial servers and IBM seems to have at least some idea of how to get back into HPC after killing Blue Gene. I'm looking forward to seeing how the accelerator-focused strategy pans out.

intangir · Apr 11, 2016

Dresdenboy said:
intangir, thanks for your stories and links. I found the Colwell video in my archive. There are also newer videos with interesting stories. E.g. in this talk he tells the story of putting SSE into the core. According to him, Andy Grove didn't like the idea of making the die 10% bigger for added costs while there were no apps supporting it (no CPU had it of course). The argument that worked was: "AMD's gonna do it!". This is an interesting point in the discussion whether there would still be good progress without good competition.

Glad you liked them. And thanks for the video pointer; I have not seen that before! And yes, without Netburst's failure against the Athlon64, Intel would not have implemented 64-bit x86 or tick-tock!

A more detailed answer regarding uarchs is in the works.

I'm eagerly waiting to read that reply.

intangir · Apr 11, 2016

majord said:
Whilst it's true minor iterations have been a sucessful strategy for Intel (For now, anyway ) , to assume Zen would be a failure because it's not a simple iteration of Excavator would imply it is a completely ground-up development. Which it certainly is not.

Well, maybe. I'd love to be proven wrong on that. But even if every piece of the design has been used before, the whole is still a never-before-combined gestalt that has to be tuned and tested and debugged. Previous tunings pretty much have to be thrown out because of all the inter-component interfaces that have changed, and you can't necessarily make the same assumptions you did before, or reuse any of the test collateral.

jhu · Apr 12, 2016

intangir said:
Well, maybe. I'd love to be proven wrong on that. But even if every piece of the design has been used before, the whole is still a never-before-combined gestalt that has to be tuned and tested and debugged. Previous tunings pretty much have to be thrown out because of all the inter-component interfaces that have changed, and you can't necessarily make the same assumptions you did before, or reuse any of the test collateral.

They've had at least 5 years to work on this thing. We'll soon find out if it really is all that.

Dresdenboy · Apr 12, 2016

intangir said:
Glad you liked them. And thanks for the video pointer; I have not seen that before! And yes, without Netburst's failure against the Athlon64, Intel would not have implemented 64-bit x86 or tick-tock!

I'm eagerly waiting to read that reply.

OK, here it is:

Regarding Theseus' ship: Of course, the components don't survive process related and microarchitectrual changes (even an ALU has to change with a new scheduler or result bus), but at the high level it continues to be a ship of about the same class, just being faster while consuming less, being less ugly, whatever. It doesn't become a Titanic instead.

Does adding a HW divider and increasing some buffers make the Husky core (Llano) a new microarchitecture? Yes, as it's not the same K10 anymore. But it's clearly been derived from it. So at which level or at which size do changes need to happen to call it totally new (like created from scratch), or just new (not the same anymore)?

The discussion itself remembers me of the long Barcelona IPC enhancing feature list. And of course, each single improvement doesn't improve performance for all softwares out there, and some might even be mutually exclusive. Once I made this list of list of K10's microarchitectural improvements (majord should know it well

), which are many, but of course didn't improve IPC by 1% each. According to one of the fading old reviews, which compared K8 and K10 at the same clock speed (no result spoiling turbo mode at least!), the improvement is ~24% for Cinebench. But as it seems the process in AMD's own fab ate up some of this improvement.

Phenom 2.6GHz vs. Athlon X2 2.6GHz
CB ST (2936 vs. 2359): +24.4%
CB MT (10311 vs. 4570): +225.6% (2578 vs. 2285 pts/core -> +12.8% per core, scaling 87.8% vs. 93.7%!)
CB OGL (3396 vs. 2937): +15.6%

The tested game Supreme Commander: Forged Alliance seems to be scaling well in its AI-heavy benchmark:
SC:FA min/avg/max FPS: +71.5% / +17.9% / +3.7%
2.3GHz K10 vs. K8 min/avg/max: +64.0% / +17.4% / +3.0%
WinRAR (MT): +40.2%
Source: http://www.pcper.com/reviews/Processors/AMD-Phenom-versus-Athlon-X2-Something-Old-Something-New

pcper said:
Despite its 900 MHz clock speed disadvantage, the Phenom 9600 can outpace the Athlon 64 X2 6400+ by up to 40% in applications that take advantage of multi-core CPUs, such as video encoding and 3D rendering.

Regarding your design complexity points (and likely less informative for you than for the forum

):
DEC with its Alpha CPUs surely had to learn something on its way to the desired performance levels. I think, they also had to find new ways to design things while going the low FO4 path. The ARM1 also hasn't been sold in a product.

Interesting: The ARM developers wrote the ISA model and a microarchitectural simulator in BASIC back then. But this was a simple uarch. Alpha was complex. And as someone said a while ago, the best simulator of a chip is the chip itself. But I think, that situation has changed, also indicated by Keller's statements regarding availability of x86 and ARM traces, LinkedIn profiles, papers, use of FPGAs, etc. While in the past the logic complexity of big chips was way beyond what could have been simulated in a cycle exact uarch simulator or even significant parts of them in SPICE, the computing performance continued to grow exponentially, while microarchitectures got features added in a more linear fashion. Colwells "big head" argument actually describes what happens without these options.

The growth might also help to manage the testability. The improvements and complexity can only grow as much as the design team can handle it. So there is a shift from creating abstract "mind models" of the uarch to think through known use cases (while simulating small parts to support this analysis) to simulating increasing amounts of the uarch's components at increasing detail. I often have the feeling, that many discussions going on here revolve around hidden wrong or simply missed assumptions, due to the increasing complexity. How often is energy efficiency left out of uarch discussions, while under the presence of a power wall this even helps high performance processors?! Simulation is the key to standardizing the evaluation of ideas and handle more complex scenarios. We use them for ADAS too. It's just not possible to recreate lots of realistic traffic conditions in a few architected NCAP/NHTSA tests. And nobody wants to go out and try to provoke some crashes to test new ideas (at least not here

).

I think, these changes (simulators, new design targets) might help seeing some success in creating something new with reused proven components. As a counter example to BD, Bobcat didn't do that bad for a synthesizable design, created from scratch. They started to sell the B0 stepping, which might have been caused by the core or any other component. Jaguar hit the market as A1 stepping.

Also a shift in the design goals to avoid for example logic, which causes big voltage droops (to reduce the margin) could mean, that known high performance design decisions have to be reiterated. Instead of the typical multipliers AMD might use rectangular or unpipelined arrays. So with multiple dimensions, this would for example reduce FP workload IPC by 10% while reducing FPU area by 10% and max power by 25%.

P.S.: Oh, I had to add to the newer Colwell video, that the relevant part about using AMD to force development of arch/uarch improvements at Intel begins at minute 26. BTW Dave Ditzel sat in the auditorium during that other talk at Stanford. This all is a very valuable input to my chipdesign game. ^^

cytg111 · Apr 12, 2016

(meta : nice read, thx)

sm625 · Apr 12, 2016

destrekor said:
No, I am not interesting in comparing performance or features, that is obviously futile.

It isnt futile to compare performance. I wish more review sites would keep adding nehalem benchmarks, especially for gaming. There are so many high clocked xeon quads out there for dirt cheap (i7-975 equivalents) it would be nice to know how these things keep up in modern games.

Dresdenboy · Apr 12, 2016

@sm625, doesn't the benchmark database reach that far into the past?

@all:
I added some labels to mAJORD's pic:

The AMD ones should be correct. For Intel, I added question marks when I don't have information and/or can't identify the units.

majord · Apr 13, 2016

:thumbsup: . If you're basing Core-M onwards off of the intel annotated PIII dieshot, the P-M floorplan seems to be very(very) similar. in which case I'd:

invert SCH RET and up to the top, move DEC where RET is (later sitting right on the Decoders I believe? , and take the ? off BP, as it's spot on I think.

Conroe onwards. Couldn't guess any better.. They start to hurt my head D:

Dresdenboy · Apr 13, 2016

majord said:
:thumbsup: . If you're basing Core-M onwards off of the intel annotated PIII dieshot, the P-M floorplan seems to be very(very) similar. in which case I'd:

invert SCH RET and up to the top, move DEC where RET is (later sitting right on the Decoders I believe? , and take the ? off BP, as it's spot on I think.

Conroe onwards. Couldn't guess any better.. They start to hurt my head D:

Thanks. I'll add the changes soon. I also have the annotated Nehalem core, partly annotated Sandy Bridge and Core (the latter to show some new features). Somewhere I should also have a more abstract view of the core layout of Core in my archive.

I'll post it on my blog, if that is OK with you.

guskline · Apr 13, 2016

Dresdenboy, thank you for the work you are doing in this area. I had an "itch" to build a 6700k rig but my cpus in my rigs below are so strong I decided to wait unitl next year to see what Zen really brings to the table.

destrekor · Apr 13, 2016

guskline said:
Dresdenboy, thank you for the work you are doing in this area. I had an "itch" to build a 6700k rig but my cpus in my rigs below are so strong I decided to wait unitl next year to see what Zen really brings to the table.

Yeah I'm currently in no hurry to upgrade, I've got no itch. My i7-2600K @ 4.4GHz handles everything I need perfectly. More would be great, sure, wouldn't it always, but at this point in this chip's life, with modern software, it doesn't feel any worse than it did when new. A C2D at a similar point in time would feel painful - in fact I replaced that C2D with the 2600K at about the same point in time, comparatively, but I feel zero reason to do so right now.

Zen could be interesting, but I still suspect Intel will have the edge in other ways. I very much hope Zen is a very competitive platform, but if I even upgraded right now, CPU performance would be a bonus in reality. I'd be looking for what the new platforms bring: USB 3.1, Thunderbolt 3, more PCIe 3.0 lanes for other chipset features, etc etc etc. So much to be taken advantage of nowadays. I might last one more generation - either Kaby Lake or Zen might draw me into another round of upgrades.

edit:

Also, agreed, thanks so much everyone for turning this into the discussion I hoped it would become, though I expected far less if anything.

Intel microarchitecture: Nehalem v. Skylake

Golden Member

Senior member

Platinum Member

Platinum Member

Golden Member

Senior member

Lifer

Golden Member

Member

Member

Senior member

Golden Member

Golden Member

Senior member

Member

Member

Lifer

Golden Member

Lifer

Diamond Member

Golden Member

Senior member

Golden Member

Diamond Member

Lifer