So, OK, SMT (Hyper-Threading), but what about TLS?

waqaasist

Junior Member
Mar 21, 2005
9
0
0
OK, like everyone out there, I am quite disappointed in Intel's pathetic attempt to implement SMT-based processors. It was a hack job where they bolted simultaneous multithreading onto the existing pipeline. SMT can't really be designed like that, because it is a different architecture in its own right. So what do we end up with? A long pipeline that takes a huge performance hit on branch mispredictions. So here's the question: is it feasible to do a complete redesign of a processor in today's market? Granted, true SMT is superior to CMP processors; however, it's far easier to go from an existing core to a CMP than to go from an existing core to an SMT. In the end you come up with a crap product. Now, what would be cool is an SMT processor that implements TLS, where you can predict up to 8 branches simultaneously. However, this requires a complex multi-branch predictor. So do you think it's possible to implement this? Why didn't Intel do a proper redesign? Is the market growing too fast? Well, why not hold out with your current processor as long as possible while designing the next gen? And what about the hardware costs? Is it feasible to implement complex branch predictors and TLS with speculative thread branches, or will it be too expensive for the market and the people willing to buy? In the end there are amazing architectures out there, but trying to understand why they were never implemented is rather difficult.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
So how exactly is Intel's SMT implementation pathetic? And how does TLS tie into this? I have no idea what you're talking about.
 

SuperTool

Lifer
Jan 25, 2000
14,000
2
0
SMT is not superior to CMP by definition. I would take a dual core CMP over a single core SMT any day. Really, SMT is not a big selling point for me at all. Especially if you have to pay two processor licenses for a piece of software because your single core now looks like two processors to the OS.
As far as a complete CPU redesign goes, it's a question of time and money. A shrink and optimization of an existing design costs a couple hundred mil when all is said and done. A complete redesign could cost maybe 500 mil to a billion-plus dollars, as in the case of Itanium, for example. The other consideration is time. A corollary of Moore's law is that if you take 18 months more to design something, it had better be twice as fast. So if you can get something out 18 months earlier, you only need to make it half as good. With a complete redesign you have to start 5 or more years out and try to predict what the market demand and software will be like when the CPU comes out. Itanium bet on good compiler support and market acceptance that never materialized.
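To put rough numbers on that 18-month rule of thumb, here is a minimal sketch. It assumes performance across the industry doubles every 18 months; the doubling period and the function name `required_speedup` are illustrative assumptions, not measured data.

```python
def required_speedup(delay_months: float, doubling_months: float = 18.0) -> float:
    """Factor by which a design that ships `delay_months` later must be
    faster, just to stay level with an assumed doubling trend."""
    return 2.0 ** (delay_months / doubling_months)


if __name__ == "__main__":
    # 60 months stands in for a "5 or more years out" clean-sheet redesign.
    for delay in (0, 18, 36, 60):
        print(f"{delay:>2} months later -> must be {required_speedup(delay):.1f}x as fast")
```

Under that assumption, a design started five years out has to land at roughly 10x the performance of what is shipping today, which is why guessing wrong about the market or the software is so costly.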
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Or you can just do both. Core changes for SMT and CMP tend to happen on different parts of the processor.
 

borealiss

Senior member
Jun 23, 2000
913
0
0
I'm not sure if you're asking a question or just ranting, but here are my 2 cents. SMT is an awesome technology and is especially suited to CPUs with deep pipelines and lots of unused functional units. The die space it needs is minimal compared to a true multicore design, which effectively doubles your die area with Intel's current implementation. AFAIK Intel's initial dual-core strategy is to have two CPUs on a die and not 2 true cores. If all that is needed are a few tweaks to the micro-op scheduler for a potential 50% more utilization of functional units per clock, on a pipeline that is already penalized massively for a bubbled datapath, then I will take it. Couple that with the possibility of SMT on top of a true dual-core design and the potential for throughput is enormous. The only problem I see with Intel's SMT implementation is the performance penalty sometimes seen relative to the same CPU with SMT disabled. Ideally this should never be the case, and it indicates some areas in the scheduler that could use improvement. Typically some of the performance loss seen in Intel's SMT-enabled parts is not seen in a true SMP system, which would indicate a hardware limitation.
 

Calin

Diamond Member
Apr 9, 2001
3,112
0
0
Redesigning a processor from the ground up is certainly possible, and it has been done not once but twice in the not-so-distant past. The Athlon64 would be one example, and the NetBurst architecture (Pentium 4) would be the other. You could also add the Transmeta processors here, as a third example of a new concept for an x86 processor.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Isn't A64 yet another proliferation from K7? And then K7 is quite similar to P6...
 

MetalStorm

Member
Dec 22, 2004
148
0
0
Originally posted by: dmens
Isn't A64 yet another proliferation from K7? And then K7 is quite similar to P6...

Yep, but you can see that even a relatively small redesign like that (on-die memory controller, more cache, better branch prediction, 64-bit support, more GPRs, a longer pipeline and more) took about 4 years.
 

waqaasist

Junior Member
Mar 21, 2005
9
0
0
Incorrect, actually. If you look at the benchmarks, CMP and SMT performance are right on top of each other until you hit 6-way and up. It's easier to add more threads than it is to add more cores. Also, the problem of unused functional units still exists with dual cores. In both cases, dual core or SMT, they will look like two processors to Windows. Think of it this way: you have 300 million transistors, right? Split them 150/150 if you want to do dual core. That means the effective resources per core have been cut in half. Now let's say you go dual-threaded instead. You have more resources per thread and better resource utilization, because the unused functional units can be used by independent threads.
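The transistor-budget argument can be shown with a toy issue-slot count. This is only a sketch under loud assumptions: the per-cycle "ready instruction" numbers for the two threads are invented, one 4-wide SMT core is compared against two 2-wide cores, and cache contention, scheduling overhead and clock speed are all ignored. The function names are hypothetical.

```python
def smt_issued(thread_a, thread_b, width=4):
    """One wide core shared by two threads: each cycle, the ready
    instructions of both threads compete for `width` issue slots."""
    return sum(min(a + b, width) for a, b in zip(thread_a, thread_b))


def cmp_issued(thread_a, thread_b, width=4):
    """Two cores that each get half the issue width: a thread can never
    use the slots sitting idle on the other core."""
    half = width // 2
    return sum(min(a, half) + min(b, half) for a, b in zip(thread_a, thread_b))


if __name__ == "__main__":
    # Made-up ready-instruction counts per cycle (bursty on purpose).
    a = [4, 1, 0, 3]
    b = [0, 2, 4, 1]
    print("SMT, one 4-wide core :", smt_issued(a, b))
    print("CMP, two 2-wide cores:", cmp_issued(a, b))
```

With bursty threads like these the shared core issues more in total, which is the point being argued. If both threads have a steady 2 ready instructions every cycle, the two layouts come out identical in this model, so the result depends entirely on the workload you assume.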
 

waqaasist

Junior Member
Mar 21, 2005
9
0
0
You never want a deep pipeline; it makes mispredictions more expensive. Long pipelines suck because if you mispredict you lose too much in-flight work, which is why a redesign is generally required with SMT. A lot of ASP and DSP processors are SMT-based with shorter pipelines. Take for example IBM's POWER5, which is a crazy architecture. The fastest clock is 1.5 GHz last I checked, but it's dual core and dual-threaded SMT in one processor! SMT is an amazing technology, but it is more effective with shorter pipelines because you can shoot off threads in more directions and resolve them faster with less chance of loss.
 

waqaasist

Junior Member
Mar 21, 2005
9
0
0
Their pipeline is too deep. They didn't actually create an SMT processor in a native redesign but added stuff to the old P4 architecture. If you do true SMT you put in TLS. TLS is thread-level speculation, where you can shoot off threads in the predicted direction of a branch. That way, if you encounter a branch while still trying to resolve the first branch, you can start off a thread down the predicted path of the second branch. This allows each thread to see only threads that are less speculative than itself.
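A toy software model of that ordering rule, purely as a sketch: real TLS is done in hardware, and the class `SpecThread` and the helpers `visible_value` and `resolve_branch` are hypothetical names invented for illustration. Thread 0 is the non-speculative one, and higher numbers are more speculative.

```python
from dataclasses import dataclass, field


@dataclass
class SpecThread:
    order: int             # 0 = non-speculative, higher = more speculative
    predicted_taken: bool  # branch direction this thread was spawned down
    writes: dict = field(default_factory=dict)  # buffered, uncommitted stores


def visible_value(threads, reader_order, addr, memory):
    """A thread sees its own buffered write first, then the closest less
    speculative thread's write, then committed memory. It never sees
    writes from threads more speculative than itself."""
    for t in sorted(threads, key=lambda th: th.order, reverse=True):
        if t.order <= reader_order and addr in t.writes:
            return t.writes[addr]
    return memory.get(addr)


def resolve_branch(threads, order, actual_taken):
    """When the branch that spawned thread `order` resolves: keep everything
    if the prediction was right, otherwise squash that thread and every
    thread more speculative than it."""
    spawned = next(th for th in threads if th.order == order)
    if spawned.predicted_taken == actual_taken:
        return list(threads)
    return [th for th in threads if th.order < order]


if __name__ == "__main__":
    memory = {"x": 0}
    threads = [
        SpecThread(0, True, {"x": 1}),
        SpecThread(1, True, {}),
        SpecThread(2, False, {"x": 99}),
    ]
    print(visible_value(threads, 1, "x", memory))  # thread 1 reads thread 0's write
    survivors = resolve_branch(threads, 1, actual_taken=False)
    print([th.order for th in survivors])          # only thread 0 is left
```

The squash step is where the cost shows up: a wrong guess at thread 1 throws away thread 2 as well, so the deeper the speculation goes, the more the predictor accuracy matters.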
 

waqaasist

Junior Member
Mar 21, 2005
9
0
0
Very good point, Calin! I keep forgetting that the K8 was a redesign, and so is the next-gen AMD proc.
 

Calin

Diamond Member
Apr 9, 2001
3,112
0
0
Originally posted by: dmens
Isn't A64 yet another proliferation from K7? And then K7 is quite similar to P6...

I can accept that the AMD64 is a K7 which is a PentiumPro (even if Intel and AMD had no common paths from the days of K5), and P4 could be a Pentium3 which is a Pentium 2 which is a Pentium Pro.
However, the Transmeta with "native" x86 functionality is a completely new design
 

waqaasist

Junior Member
Mar 21, 2005
9
0
0
A64 is a complete redesign. If anything is similar it might be a functional unit here and there, but even that I would think is rare. K8 is completely separate from K7. Working for AMD, I can tell you that much at least.
 

borealiss

Senior member
Jun 23, 2000
913
0
0
A64 is not a complete redesign, many pipeline blocks were kept the same going from K7 to K8, dispatchers, ALU's, etc... There are core components that are completely new like the xbar, BU/cache/memory controller, SRQ/SRI, HT controller, and the scheduler which has evolved from K7. But for the most part the datapath and execution units are almost untouched.

Deeper pipelines are going to be more of a scheduling nightmare than short ones because the datapath is going to be much harder to keep full. SMT represents a way for a CPU like the P4 to deal with this scenario and keep its datapath full. SMT may work wonderfully on short-pipelined CPUs, but for processors with many unused execution units and a deeper pipeline it will show more benefit. It probably is harder to implement because of the different hardware contexts at different stages.

Longer pipelines are not ideal from a scheduling standpoint, but tradeoffs have to be made between ramping clock speed and IPC. Simply saying that they are bad because of branch mispredictions is not entirely accurate.
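A back-of-the-envelope model of that tradeoff, as a sketch only: every parameter value below is made up, and the model assumes the cycle time is the total logic delay divided by the number of stages plus a fixed per-stage latch overhead, and that a mispredicted branch costs roughly one cycle per pipeline stage. The function name `time_per_instruction_ns` is hypothetical.

```python
def time_per_instruction_ns(depth, logic_ns=10.0, latch_ns=0.5,
                            base_cpi=1.0, branch_frac=0.2, mispredict_rate=0.1):
    """Average time per instruction for a pipeline with `depth` stages."""
    cycle_ns = logic_ns / depth + latch_ns                 # deeper -> faster clock
    cpi = base_cpi + branch_frac * mispredict_rate * depth  # deeper -> costlier flush
    return cpi * cycle_ns


if __name__ == "__main__":
    for depth in (5, 10, 20, 30, 40, 60):
        print(f"{depth:>2} stages: {time_per_instruction_ns(depth):.3f} ns/instruction")
    best = min(range(5, 61), key=time_per_instruction_ns)
    print("best depth under these made-up numbers:", best)
```

In this model the time per instruction first drops as the clock speeds up, then climbs again once misprediction flushes dominate. Where the sweet spot lands depends completely on the assumed numbers; better branch prediction or lower latch overhead pushes it deeper.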

As for a true SMT design, I'm not sure that true SMT designs inherently incorporate TLS.
 

borealiss

Senior member
Jun 23, 2000
913
0
0
Back to your original question: redesigning the P4 for a true SMT/TLS design would probably not be cost-effective. For one, it's a discontinued design, and they did not anticipate the multicore approach that the market was going to take over a monolithic, pipelined, single-threaded CPU. They also did not foresee their P4 design running out of steam so quickly, so their SMT is more of a knee-jerk reaction to the market. If it were prudent for them to redesign the P4 this late in the game, I believe they would have done so. Even their initial dual-core offerings will not be true dual-core processors, so if they are not going to crank out a true CMP design soon, I don't think they are going to make a true SMT/TLS design based on a dying architecture.