Breaking news: Security flaw supposedly found in Intel's hyperthreading implementation


dmens

Platinum Member
Mar 18, 2005
2,275
965
136
I find it amusing that so many people here assume P4's SMT implementation is for show and has no performance benefits even though many workloads benefit from fine-grained multithreading, and not just "multitasking" either. It's been documented for years. Then again, if the only benchmarks run are games, there won't be any improvement.

This sounds like a case of fanboyism... especially when they blithely assert that Power and SPARC do SMT "correctly". LOL!

Naturally certain loads will suffer from the resource sharing... but considering the fact that the way resources are shared can be manipulated in software (in multiple ways, no less) or turned off entirely on the fly, only a very small percentage of legacy code actually suffers a performance hit.

By the way, SMT was not a backhanded attempt to overcome P4's deficiencies post-Willamette. It was there from the very beginning.
 

Munky

Diamond Member
Feb 5, 2005
9,372
0
76
I thought hyperthreading was just a technique they used on a P4 that sort of separates the pipeline into logical units, so when there's a stall, it doesn't stall the whole pipeline, only part of it. Correct me if I'm wrong, but this is needed on a P4 because of its long pipeline and the huge performance hit it takes if there's a stall.

AMD, on the other hand, has a shorter pipeline, and takes less of a hit when there's a stall. So, supposing no stall occurs, does hyperthreading do anything then? Does it REALLY let the P4 execute 2 threads simultaneously, or is it just a trick to boost performance in certain cases?
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
paper

it's got some marketing bs, ignore it and go to the parts on the pipes and architectural state. and yes, it can execute two threads simultaneously.

smt was not designed to "fix" more severe flush penalties with a longer global pipe. that problem was supposed to be handled with speculation / replay. smt was designed to use resources when available by bringing in another thread, because stalls occur frequently no matter what.
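To put a rough number on the "use resources when available" argument, here is a minimal sketch of a toy model. It is not how the P4 actually schedules anything: it assumes a single shared execution unit, threads that stall independently with the same probability each cycle, and a unit that is busy whenever at least one thread is ready. The function name and the stall rates are made up for illustration.

```python
def utilization(stall_prob: float, threads: int) -> float:
    """Fraction of cycles in which the shared execution unit has work.

    Toy assumption: each thread is independently stalled in a given cycle
    with probability stall_prob, and the unit is busy whenever at least
    one thread is ready, so it idles only when every thread is stalled.
    """
    return 1.0 - stall_prob ** threads


# Hypothetical per-thread stall rates, chosen only to show the trend.
for p in (0.1, 0.3, 0.5):
    one = utilization(p, 1)
    two = utilization(p, 2)
    print(f"stall={p:.0%}  1 thread: {one:.0%}  2 threads: {two:.0%}")
```

In this model the second thread helps at every stall rate, not only when stalls are long, which is the point being made about stalls occurring no matter what. Real cores share more than one resource, so actual gains will differ.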
 

Duvie

Elite Member
Feb 5, 2001
16,215
0
71
Originally posted by: dmens
I find it amusing that so many people here assume P4's SMT implementation is for show and has no performance benefits even though many workloads benefit from fine-grained multithreading, and not just "multitasking" either. It's been documented for years. Then again, if the only benchmarks run are games, there won't be any improvement.

This sounds like a case of fanboyism... especially when they blithely assert that Power and SPARC do SMT "correctly". LOL!

Naturally certain loads will suffer from the resource sharing... but considering the fact that the way resources are shared can be manipulated in software (in multiple ways, no less) or turned off entirely on the fly, only a very small percentage of legacy code actually suffers a performance hit.

By the way, SMT was not a backhanded attempt to overcome P4's deficiencies post-Willamette. It was there from the very beginning.



The benefits for multitasking are a side product. The fact that it can speed up multi-threaded apps is a bit more blurry, because if Intel had implemented a different architecture with high IPC like AMD's, with a shorter pipeline less prone to stalls when branch mispredictions occur, then the CPU would be as fast as the current chips are with HT. HT is leveling out the problems inherent in the architecture used. It is great for P4 Northwood users because it was the burst of speed needed to put the Northwood in a class of its own at the time. X2 and true dual-core processors show us the real benefit of a second core.

Again, as has been mentioned numerous times, the AMD architecture would benefit very little from HT as it is implemented today. Mentions of HT and future Dothan-based CPUs led to a discussion that pointed to a drastically different implementation of HT in future designs. Why? Because future architectures will not be hampered by the long pipeline and will have ever better branch prediction. HT will move to other forms of SMT and be implemented like a set of SSE codes...


Don't get me wrong, HT is a nice gain in apps when it is turned on versus off, but don't think of it as some sort of turbo bonus... It is basically taking the banana out of the one tailpipe...

As many have mentioned before, a software implementation of a better thread scheduler could do for AMD what HT has done for Intel in terms of multitasking smoothness... It is no great hardware feature, and that is why I have stated it is an added benefit that was inevitable when the cpu recognized the HT as being a 2nd core (virtual core)...
 

Duvie

Elite Member
Feb 5, 2001
16,215
0
71
Originally posted by: dmens
paper

its got some marketing bs, ignore it and go to the parts on the pipes and architectural state. and yes, it can execute two threads simultaneously.

smt was not designed to "fix" more severe flush penalties with a longer global pipe. that problem was supposed to be handled with speculation / replay. smt was designed to use resources when available by bringing in another thread, because stalls occur frequently no matter what.



True, but they are amplified by the longer pipeline when they do occur; the penalty is much greater. By running that second thread simultaneously the CPU can ensure the pipeline stays full... Now AMD and Dothan would not suffer as greatly from this type of stall, so again, as I have stated, this shows HT as more of a beneficial fix to the specific P4 architecture...


I am not saying it is marketing BS... I respect the fact that they continued to get all they can out of it and make the design more efficient...
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
K8 probably stalls even more than Prescott on mispredicts. Even though K8 recovers in fewer cycles than P4, it would still have benefited from SMT simply because of the fact that the functional blocks are often idle, not just because of branches. I would have liked to see K8 with SMT. I am almost certain it will demonstrate significant averaged improvements.

SMT in future P6 families will be implemented much the same way as P4, not drastically different. SMT implementations are similar across the board.

As for software schedulers, well, as I said above, can't blame the hardware when software is not using it correctly. That is a different discussion entirely.
 

Duvie

Elite Member
Feb 5, 2001
16,215
0
71
I will look for the discussion, but I believe it was from an AMD engineer who said that the K8 would not benefit from HT anywhere near as much as the Prescott and Northwood do... K8 has been improving the branch prediction in the last 2 core revisions...

I will try to find the info... If it really benefited, why wouldn't HT have ever been implemented in a Dothan chip? The added heat from the extra transistors couldn't have been the reason, and the larger cache would have been more beneficial to that type of multithreading... I read the paper and it mentioned cache and its role when the processor is scheduling threads, accessing the data, accessing it out of order, etc...
 

Duvie

Elite Member
Feb 5, 2001
16,215
0
71
In terms of Hyperthreading, AMD seems to be quite vocal in its opposition to the technology. In a quote, AMD told us that "Hyperthreading is a fix for an inefficient architecture". AMD has no plans to integrate a Hyperthreading-like technology into their future processors, even though Simultaneous Multi-Threading, the real name behind Hyperthreading, is not an Intel-exclusive technology. AMD likes to dub the technology "Hype"rthreading.

I will find more, but I would think there could be some spin to it, given that the K8 was just emerging when they said this, so in their mind true SMT would arrive in the future with dual core... which, as we all know, the K8 was designed from the ground up to be...

http://www20.tomshardware.com/business/20050330/multicores-04.html

read the top two paragraphs... pretty much states what I said... I am sure she knows a helluva lot more than either one of us...

On the contrary. The much shorter pipeline on AMD will make Hyperthreading pretty pointless. Memory latency of course still exists on the K8, but the massive penalty of a branch misprediction on the P4 is not as bad on the K8. The K8 is also better at pairing MMX/SSE operations.

Just a clip from a tech review on a hardware site... who knows if they know for sure... looking for more of a hardware engineer's take on it... I can see your reluctance since PR and spin are always around us...
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
IMHO, SMT was not done on yonah or dothan because when those projects started the P-M's were supposed to stay in mobile, where chip size and power matter more... even the 5% die size overhead for SMT was not acceptable.

Also, adding SMT takes a lot of time and a ton of validation effort.

Even if K8's predictors got better, I think they'd *still* benefit from SMT. If you can find that paper, I'd appreciate it. I suspect it's PR... like a lot of things these days.

In retrospect, it was definitely a better return on investment for AMD to work on an integrated memory controller on K8. SMT would have added complexity to an already ambitious core. But that doesn't mean SMT does not offer big benefits.

SMT and dual/multi-core are not mutex... saying that SMT is no good because dual cores are coming makes no sense, you can have four threads instead of two. Dual Xeons already proved that model beneficial.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
What's "true" SMT? For years fine-grained multithreading has been considered "true" SMT. Well, that's what p4 does. Hijacking the term for dual core is obvious PR, pure and simple.

On the contrary. The much shorter pipeline on AMD will make Hyperthreading pretty pointless. Memory latency of course still exists on the K8, but the massive penalty of a branch misprediction on the P4 is not as bad on the K8.

The "shorter pipeline will make SMT pointless" mantra is fluff. A branch mispredict is not the only stalling event, and with certain events the K8 is slower than P4, regardless of pipeline length. If anything, K8's increased fetch bandwidth would allow it to keep both threads fed even as it satisfies other memory requests.

The K8 is also better at pairing MMX/SSE operations.

afaik, SIMD heavy workloads tend to stall on mispredicts the least, and they are also the easiest to prefetch, making SMT benefits minimal compared to other, more random loads. Beats me why the article mentioned that fact. that's one hardware engineer's pov. :)
 

Gamingphreek

Lifer
Mar 31, 2003
11,679
0
81
Not having read the thread, how in the world can this happen? This is not a piece of software or anything. All it does is, while data is being sent through the pipeline, send another group of data instead of waiting for all of it to be executed before sending more (in a nutshell).

How can this have any effect on security? It does the same work with it disabled; it isn't like Intel added another line of code, or another stage to the pipeline.

-Kevin
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
paper

just read it now... looks like the key theft issue works only for that particular implementation of openssl, as far as i understand it... actually, the same algorithm compiled into a similar loop can still be exploited.

to use this technique elsewhere you'd have to infect regular binaries then add a spy process, which seems to be overkill? there's probably easier ways to leak forbidden info... maybe.

still, nice find. overhyped by the press and amd fanboyz above though. hiya. :laugh:
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
http://www.realworldtech.com/forums/ind...=3408&Thread=4&entryID=50850&roomID=11

Above is a possible explanation of that. We can obviously see that the press saying "HT has security flaws!!!" is overhyped, since if you read deeply into the article it says:

1. Affects servers likely more than home users
2. All SMT implementations that share memory caches inherit these problems (I mean, which doesn't share caches between logical threads?)

So what it's basically saying is that we are screwed in the future when Intel or AMD puts, say, 16 threads per core (maybe XD, NX or LaGrande will be used to offset that).
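For anyone wondering how a shared cache can leak anything at all, here is a minimal sketch of the general idea as a pure simulation. It measures no real timing and is not the method from the paper: the tiny direct-mapped cache, the set numbers, and the rule that the victim touches one set or another depending on a secret bit are all assumptions made up for illustration.

```python
NUM_SETS = 8  # hypothetical tiny direct-mapped cache, one line per set


class SharedCache:
    """Cache shared by two logical threads; each set remembers its last owner."""

    def __init__(self) -> None:
        self.owner = [None] * NUM_SETS

    def access(self, who: str, cache_set: int) -> bool:
        """Touch a set; return True on a hit (the line already belonged to who)."""
        hit = self.owner[cache_set] == who
        self.owner[cache_set] = who
        return hit


def victim_step(cache: SharedCache, key_bit: int) -> None:
    # Assumption: the victim's memory access pattern depends on a secret bit.
    cache.access("victim", 2 if key_bit else 5)


def spy_guess(cache: SharedCache, key_bit: int) -> int:
    # Prime: the spy fills every set with its own lines.
    for s in range(NUM_SETS):
        cache.access("spy", s)
    # The victim runs on the other logical thread and touches one set.
    victim_step(cache, key_bit)
    # Probe: a miss means the victim evicted the spy's line from that set.
    # On real hardware the spy would only see this as a slower access.
    misses = [s for s in range(NUM_SETS) if not cache.access("spy", s)]
    return 1 if misses == [2] else 0


secret = [1, 0, 1, 1, 0]
cache = SharedCache()
recovered = [spy_guess(cache, bit) for bit in secret]
print(recovered == secret)
```

The spy never reads the victim's memory; it only learns which of its own lines got evicted. That is why the problem follows from sharing the cache between logical threads rather than from anything specific to one vendor's SMT.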

The reason articles saying other SMT implementations are flawed are nonexistent is that none of the other processors are cheap or widespread enough for somebody to do this research. It can be done, but proportionally, Pentium 4s are much more likely to be analyzed for these kinds of flaws since a LOT more people have access to Pentium 4s than Power5s. Look, it took like 3 years for this article to come out, and millions of people have Pentium 4s with HT now.





(The following is not directly related to the above, but I'll say it anyway)

The main reason for Pentium 4 having HT is not to fill the "empty execution units because of longer pipelines" as stated by some people. That is a good reason to add HT, but not the only one.

With HT, you can get performance increases that were not otherwise possible for the same increase in die size and power consumption. That is very attractive for a CPU manufacturer. Increasing performance by adding more execution units is becoming very expensive in resources and money, so what better way than HT to increase performance superlinearly (compared to the die size/power consumption increase)? The overall performance benefit of HT is 5-10%, with less than a 4% increase in die size and a 5-10% increase in power consumption. Before, you had to increase die size by 20% and got even less of a performance increase than HT gives you (by doubling the cache, increasing the number of execution units, adding more pipelines and/or improving branch prediction). If HT were so significantly better for Pentium 4 than Athlon 64 because of the misprediction penalty, Prescott should have gained much more from turning HT on. Instead, Prescott's gain from enabling HT was only slightly better than Northwood's. The major benefit to us is the semi-dual-processor-like smoothness we get.
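As a rough worked version of that comparison, the sketch below divides performance gain by die growth using the figures quoted in the post. The post gives no number for the older approach beyond "even less than HT", so the 5% used for it here is an assumed stand-in, and the helper name is made up.

```python
def perf_per_area(perf_gain_pct: float, die_growth_pct: float) -> float:
    """Percent of performance gained per 1% of extra die area."""
    return perf_gain_pct / die_growth_pct


# HT figures from the post: 5-10% performance for (under) 4% die area.
# "bigger core" uses the quoted 20% die growth with an ASSUMED 5% gain.
options = {
    "HT, low end": (5.0, 4.0),
    "HT, high end": (10.0, 4.0),
    "bigger core (assumed gain)": (5.0, 20.0),
}

for name, (gain, area) in options.items():
    ratio = perf_per_area(gain, area)
    print(f"{name}: {ratio:.2f}% performance per 1% die area")
```

Since the post says HT costs less than 4% of die area, the HT ratios here are lower bounds.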

It's not that Pentium M can't have HT and benefit from it; it's likely the market and the thermal constraints that are preventing Pentium M from having HT, and probably also the fact that Pentium M was never designed from the beginning to have HT, while Pentium 4 was (kinda like how A64 was designed for better dual core operation).