Doesn't anyone have anything else to say about the IME?
Not much, other than if implemented correctly, the IME should have no impact at all on performance during normal operation. Really, it's like adding another independent watchdog component that operates in parallel with the existing chip components. So the entire premise of the thread is fallacious.
You'll have to look elsewhere for an explanation of the tepid performance gains of the last few generations of x86 chips. In my estimation, it's pretty much the reasons frozentundra123456 summed up: it's hard to squeeze out more IPC gains at this point,
Dennard scaling of transistors has come to an end, and the market is no longer as performance-driven as it used to be (existing chips are fast enough for the majority of customers).
These small IPC gains aren't recent. Here's where the large IPC gains happened:
1) 8086 to 80286
2) 80286 to 80386
3) 80386 to 80486
4) 80486 to Pentium
5) Pentium to Pentium Pro
IPC increases since Pentium Pro have been fairly small. So essentially we've had 20 years of small IPC gains.
Yes, exactly, and with clock speeds no longer increasing, that doesn't leave anywhere for performance improvements to come from.
Pentium 4 to Core2
Core 2 to Nehalem
Nehalem to Sandy Bridge
All pretty big IPC increases.
You don't have a proper perspective on what IPC gains used to look like back in the 1980s and '90s. The gains you listed were in the neighborhood of ~50% for the Pentium 4 to Core transition and maaaybe at most 20% for the generations since then. And the P4 transition was a unique circumstance: switching from the Netburst "speed-racer" microarchitecture that pushed clock speed to a "brainiac" one that emphasized more work per clock cycle.
Here, let me show you some numbers for the earlier x86 generations:
Code:
Year  Gen            MIPS    MHz    IPC    Gain
1978  8086            0.33     5   0.066     -
1982  286             1.5     10   0.15   +127%
1985  386            11.4     33   0.33   +120%
1989  486            40       50   0.80   +142%
1993  Pentium (P5)  126.5     75   1.69   +111%
1995  PPro (P6)     541      200   2.71    +60%
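If you want to check the arithmetic yourself, the table is just IPC = MIPS / MHz with a percentage change between rows. Here's a quick script that recomputes it from the MIPS and clock figures above (the MIPS numbers are the ones quoted in the table, not independently sourced, and gains computed from unrounded IPC values can differ by a few points from the rounded table):

```python
# Recompute IPC (= MIPS / MHz) and generation-over-generation gains
# from the figures quoted in the table above.
chips = [
    ("8086",         1978,   0.33,   5),
    ("286",          1982,   1.5,   10),
    ("386",          1985,  11.4,   33),
    ("486",          1989,  40.0,   50),
    ("Pentium (P5)", 1993, 126.5,   75),
    ("PPro (P6)",    1995, 541.0,  200),
]

prev_ipc = None
for name, year, mips, mhz in chips:
    ipc = mips / mhz
    gain = "-" if prev_ipc is None else f"+{(ipc / prev_ipc - 1) * 100:.0f}%"
    print(f"{year}  {name:<14} {ipc:6.3f}  {gain}")
    prev_ipc = ipc
```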
As you can see, historically, any gain that doesn't at least double IPC isn't anything to get excited about. We've been inching along IPC-wise since the Pentium Pro came out 20 years ago. And things are unlikely to improve no matter how many resources Intel expends on the problem.
The 386 was a pretty straightforward 32-bit extension of the previously 16-bit x86 architecture. The 486 added scalar operation (pipelining) and an integrated floating-point unit, the P5 added superscalar (multiple execution pipelines) and branch prediction, the P6 added out-of-order execution and micro-op decode. Since then, with one 5-year detour through Netburst, Intel has essentially been making incremental tweaks to the base P6 design and bolting on special-purpose hardware like SSE units. There are just no big microarchitectural ideas left that we can expect to suddenly wave a magic wand and unlock new branches of IPC fruit to pick.
Moreover, Robert Colwell, one of the designers of the P6, made the observation that the vast majority of performance gains since 1980 have been from clock speed and not microarchitecture. In his estimation, it's about a 3500x improvement from clock speed (1 MHz to 3.5 GHz) and about 50x from architecture / microarchitecture (0.066 IPC to ~3.5). And now that transistor scaling seems to be running out of steam, we'll be looking forward to perhaps 10% gains from here on in.
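The back-of-the-envelope version of Colwell's decomposition, using the round numbers he cites (total speedup is the product of the clock and IPC factors):

```python
# Colwell's rough decomposition of x86 performance gains since ~1980:
# total speedup = clock-speed factor x IPC factor.
clock_1980_mhz, clock_now_mhz = 1.0, 3500.0
ipc_1980, ipc_now = 0.066, 3.5   # 8086-era IPC vs. a rough modern figure

clock_gain = clock_now_mhz / clock_1980_mhz   # 3500x
ipc_gain = ipc_now / ipc_1980                 # ~53x, i.e. his "about 50x"
total = clock_gain * ipc_gain                 # ~185,000x overall

print(f"clock: {clock_gain:.0f}x, IPC: {ipc_gain:.0f}x, total: {total:,.0f}x")
```

The lopsided split (3500x vs. ~50x) is the whole point: silicon, not microarchitecture, did most of the work.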
You should listen to his keynote speech "The Chip Design Game at the End of Moore's Law", from the HotChips25 symposium in 2013. It's a really good, entertaining watch. At the time, he was Director of the Microprocessor Technology Office at DARPA.
"It's really important: don't confuse performance and transistors. Moore's Law is about transistors. It's [the industry's] job to turn that into performance, or whatever people want to pay money for."
"I don't think there's ever been a technology development exponential like this one. I also heard Ray Kurzweil in particular goes down a different path. He says 'no no no no, Moore's Law is just one of a set of exponentials over history. It's just the latest one; don't worry about it.' And I say, baloney, I don't agree with you at all. I think there haven't been five; I think there's been one, and this is it.
"The way I thought of a chip architect's job in context with Moore's Law was to stay out of the way. The idea was if nature's going to give you this bounty of lots more transistors, oh and they're all faster, oh and they're lower power, don't fight it! Find a way to design the machine so as to leverage that fact, rather than try to be clever and just blow that off.
"I'm not saying microarchitecture has no place. But the way I view it, the scorecard over the last, say, 35 years: I figure, [in] 1980, Bill and I were both at Bell Labs designing essentially a 1 megahertz 32-bit processor. Today, clocks are running about 3500 times faster. And so I think it's entertaining to sort of consider what did we architects bring to the table, in terms of pure architecture--microarchitecture ideas like pipelining, or superscalar, caches, all the stuff that we threw in--relative to just the plain clocks plus large number of transistors. And I think the score is, we came out way on the short end of that stick. I think the silicon gave us way more than the architects could have made up for.
"Why that matters, aside from getting yelled at from [your managers]-- Regina Dugan was the previous head of DARPA, and at some point I said much these same words to her, and she said, well then, you wasted your time professionally for many years, didn't you? And I went, I choose not to view it that way, ok? But aside from personal pride, the question is if the fundamental Energizer Bunny silicon engine stalls out, and we have to resort to being clever and only microarchitecture without additional transistors, how much runway is left? And I'm saying we're going to go down there, we're going to do the best we can, but don't expect 3500x. There's nothing like that on the [horizon]--I don't see that remaining.
"All right, so can we continue to crank out successful new chips? You could ask the question, well sure we can, if you can find enough goodies to bucket them all together and say, okay maybe Moore's Law left and I don't have as many transistors, but I'm a clever person and my new machine is 50% better than the old one in... performance, or power, or something. I would say 50% is pretty good. You could probably find a market for that. Uh, how about 20%? How about ten percent? How far down are you willing to go and still think that you've got something that you can sell? I think that's the future that we have to contemplate seriously and try to avoid, because I don't think the world's going to give a whole lot of extra money for a ten-percenter in general.
"So here's an example. I picked this off the Internet. Unfortunately, in DARPA's zeal to give everything the proper attribution down here, they replaced the person's name with the column, but you'll find it. This was one of those letters to the editor kind of things, a comment column. It's usually a lot of junky stuff down there, but this at least was crisp about what the attitude was. It says, 'Ultimately, I think Moore's Law will never stop. Computer builders will find other methods to make their computers faster.' And I think, well, that's at least--I'm happy for your optimism. I actually think that there's some truth to that for a short time.
"But the problem is that the low-hanging fruit has already been taken, and the amount of effort it's going to take to do anything beyond that is going to be substantial. And you cannot, in my personal opinion, you cannot make up for the lack of an underlying exponential. Those fixes will last us a few--we'll play all the little tricks that we still didn't get around to, we'll make better machines for a while, but you can't fix that exponential. A bunch of incremental tweaks isn't gonna hack it."
Performance doesn't always come down to IPC*MHz; it can depend on the kind of code being run. For instance, wider SIMD paths are part of what set Sandy Bridge processors apart from Nehalem. A specific example of this is comparing my old i5-750 to an Ivy Bridge i5 system I built for my brother. Using the same model GeForce 670 in both, PhysX performance in Arkham City was close to double on the Ivy Bridge chip, even though the i5-750 was clocked 400 MHz higher.
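The SIMD point can be put in rough numbers: a 128-bit vector unit handles 4 single-precision floats per instruction, a 256-bit AVX unit handles 8, so on float-heavy code the wider chip can win even at a lower clock. The clock speeds below are made-up round numbers for illustration, not the actual speeds of the chips in the anecdote:

```python
# Illustrative only: peak single-precision floats per second for a
# narrow-SIMD core at a higher clock vs. a wide-SIMD core at a lower
# clock. Clock speeds are hypothetical round numbers.
FLOAT_BITS = 32

def floats_per_second(simd_width_bits, clock_hz, vec_ops_per_cycle=1):
    """Peak throughput: SIMD lanes * vector ops/cycle * cycles/sec."""
    lanes = simd_width_bits // FLOAT_BITS
    return lanes * vec_ops_per_cycle * clock_hz

sse_peak = floats_per_second(128, 3.2e9)  # older core: 128-bit SIMD, 3.2 GHz
avx_peak = floats_per_second(256, 2.8e9)  # newer core: 256-bit SIMD, 2.8 GHz

# Wider SIMD wins by 1.75x despite a 400 MHz clock deficit.
print(avx_peak / sse_peak)
```

Real speedups are smaller than this peak ratio, of course, since not all code vectorizes and memory bandwidth gets in the way; the point is only that "performance" can scale with vector width, not just IPC and clock.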
I think the reason we are seeing significantly better performance in games like GTA V with Skylake compared to Haswell is that games like these can make better use of the CPU's registers and resources than older, less demanding games.
I think what you're seeing here is an example of software taking advantage of special-purpose hardware additions, like QuickSync or AES encryption instructions, and those sorts of gains are workload-dependent and rely on the unpredictable implementation schedules of software writers. You can't depend on those for general performance improvements.