No Larrabee this year


Hard Ball

Senior member
Jul 3, 2005
594
0
0
Originally posted by: Nemesis 1
Originally posted by: Hard Ball
Originally posted by: IntelUser2000
Originally posted by: soccerballtux
What's the big deal about this? How is it going to function being such a different chip from ATI/Nvidia's monolithic approach? Isn't this just a bunch of Atom processors stuck together?

It's much different. Atom, in comparison, has heftier branch prediction, better integer processing, and better data prefetcher units, with the design optimized towards single-thread processing and de-emphasized towards FP.

In Larrabee, to make the cores smaller, those things are much simplified, but they beefed up the FP units a lot, which will be critical for graphics processing. Each of the Larrabee cores will have 4x the capability of the FP/SSE units in a Core 2.

The Atom core alone is too big to fit enough cores on a die and be a competitive graphics chip anyway.

Agree with most of what you said, except this:

Atom in comparison has a heftier branch prediction


I am sorry, but you guys lost me. I have a Larrabee diagram, and that diagram does NOT show a branch prediction unit on Larrabee at all, only decoders. As has already been stated, Intel can do this in software through the compiler. Here's the diagram. Notice: NO branch prediction unit, unlike what is shown on other Intel core diagrams.

http://www.eweek.com/c/a/IT-In...side-Intel-Larrabee/4/

The VPU pipeline in Larrabee uses predication to reduce the penalty of branch misprediction, using mask registers to record the polarity of the branches. Actual prediction is necessary only when there is uniform polarity of the bits across the relevant mask register, which is a small percentage of the time under most workloads. So it's not really comparable to the branch predict unit of a mostly scalar processor; nor would it necessarily take less area to implement, and it accomplishes something quite different from the large BTB + BHT + loop predictor + RAS of a standard x86 microarchitecture.

I'm not sure what is really confusing. Almost all microprocessors with a significant pipeline have some type of branch prediction; otherwise the only recourse would be to delay the fetch of instructions after a branch until the branch unit resolves it. Larrabee does not have the type of robust branch prediction that most current microprocessors have, such as the multilevel correlating BHTs and tournament predictors found on some of the more complex designs you see today.

The real trick with Larrabee is that control flow of the execution trace can actually be routed through the vector pipeline, by using predication in the vector units with mask registers, which are basically bits that dictate the destination registers/memory addresses of each of the lanes of a vector instruction. The VPU in Larrabee is essentially a vec16 ALU. In cases where the control flow at run time gives some of the lanes of a vector instruction a different polarity than others, both the target and fall-through instructions are executed, with the results of each written to the appropriate destination location as specified by the mask register; when the mask register contains only uniform control flow, only the appropriate path is executed. In the vector-heavy software that Larrabee is envisioned to excel at, the front-end BP actually has very little effect on the efficiency of the overall execution; the predication scheme does most of the heavy lifting.
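To make that concrete, here is a minimal sketch of the mask-register idea in plain scalar C. It is only an illustration of the control-flow scheme described above, not Larrabee's actual ISA or intrinsics, and the per-lane operations are made up.

```c
/* Minimal sketch of mask-based predication over a 16-lane vector, written as
 * plain scalar C purely for illustration. NOT Larrabee's real instructions. */
#include <stdint.h>

#define LANES 16

void predicated_example(const float a[LANES], const float b[LANES], float out[LANES])
{
    uint16_t mask = 0;

    /* Build the mask: one polarity bit per lane (the "branch" condition). */
    for (int i = 0; i < LANES; i++)
        if (a[i] > b[i])
            mask |= (uint16_t)(1u << i);

    if (mask == 0xFFFF) {                 /* uniform: every lane takes the branch */
        for (int i = 0; i < LANES; i++)
            out[i] = a[i] - b[i];         /* only the "taken" path executes */
    } else if (mask == 0x0000) {          /* uniform: no lane takes the branch */
        for (int i = 0; i < LANES; i++)
            out[i] = a[i] + b[i];         /* only the fall-through path executes */
    } else {
        /* Divergent lanes: execute both paths, merge per lane under the mask. */
        for (int i = 0; i < LANES; i++) {
            float taken    = a[i] - b[i];
            float fallthru = a[i] + b[i];
            out[i] = ((mask >> i) & 1u) ? taken : fallthru;
        }
    }
}
```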

The cache hierarchy and cache control of Larrabee are also quite different from a conventional x86. In particular, some explicit cache-control instructions and modes (in addition to explicit prefetch) are provided, such as modes that can mark lines for early eviction, and even some explicit control of the coherence scheme (actually allowing some lines to be explicitly invalidated, to support a scratch-pad mode for regions of the caches).
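For a rough flavor of what explicit cache hints look like from the software side, here is a small C sketch using today's ordinary SSE non-temporal prefetch and streaming-store intrinsics. This is an analogy only; Larrabee's cache-control instructions are described as going further than this (early-eviction marking, explicit invalidation for scratch-pad-like use of cache regions).

```c
/* Analogy only: standard SSE non-temporal hints, not Larrabee's cache-control
 * instructions. The common idea: tell the cache which lines are streaming data
 * so they do not pollute it. Assumes dst is 16-byte aligned and n is a
 * multiple of 4. */
#include <stddef.h>
#include <xmmintrin.h>

void scale_stream(const float *src, float *dst, size_t n, float k)
{
    __m128 vk = _mm_set1_ps(k);
    for (size_t i = 0; i + 4 <= n; i += 4) {
        /* Hint: fetch well ahead, with minimal cache pollution (NTA). */
        _mm_prefetch((const char *)(src + i + 64), _MM_HINT_NTA);

        __m128 v = _mm_mul_ps(_mm_loadu_ps(src + i), vk);

        /* Streaming store: write around the cache entirely. */
        _mm_stream_ps(dst + i, v);
    }
    _mm_sfence();  /* make the non-temporal stores globally visible */
}
```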
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
I'm not sure what is really confusing. Almost all microprocessors with a significant pipeline have some type of branch prediction

Except for vector processors, which Larry is. Larry is an in order core, you avoid branches.
 

Genx87

Lifer
Apr 8, 2002
41,091
513
126
Originally posted by: IntelUser2000
Originally posted by: SickBeast
Originally posted by: IntelUser2000
Originally posted by: Genx87
Originally posted by: LOUISSSSS
I'm willing to wait till 2010 for a great GPU from Intel. If they're going to do it right, do it right the first time. I believe that Intel will do a great job creating their first discrete GPU.

This isn't their first discrete GPU. The infamous i740 was their first attempt, and it wasn't all that great. They didn't do a follow-up.

The i740 wasn't really Intel's attempt. It was merely a rehash of the GPU created by the company they gobbled up. But Larrabee will be their real in-house project. Let's not look back at the i740 and IGPs and compare them to Larrabee. It's like predicting how Sandy Bridge will turn out by looking at how NetBurst (Pentium 4) CPUs did.

Idontcare: I doubt they can make a 128-core version. I don't know if you have seen it, but the very first leaked documents on Larrabee had 24 cores. It's definitely more now, but I'm not sure if they'll even reach 64 cores.

I actually thought that the i740 was supposed to be quite good; about as good as whatever 3DFX had on the market at the time. Sure it was a little late, but it was competitive and AFAIK it was a high-end part.

http://en.wikipedia.org/wiki/Intel740

According to Wiki the i740 shared system memory. That's not really high-end in my book...

Back then the big new thing for GPUs was the ability to share cheap main memory across the AGP bus. But what happened was that on-board memory became cheap enough that there was little need. Intel made a poor decision to utilize AGP while their competition did the opposite.

Kind of sounds familiar? Intel wants to do raytracing on Larrabee while the industry doesn't.

Intel has a way of trying to capture or revolutionize a market. Sometimes it fails (i740, Rambus, Itanic), while other times it shines (the Centrino line).

 

Hard Ball

Senior member
Jul 3, 2005
594
0
0
Originally posted by: BenSkywalker
I'm not sure what is really confusing. Almost all microprocessors with a significant pipeline have some type of branch prediction

Except for vector processors, which Larry is. Larry is an in order core, you avoid branches.

I see what you are saying; but that's not what Larrabee is, at least not what Intel conceives it to be. Larrabee has a fully functional scalar x86 pipeline, which needs some form of branch prediction. Task scheduling is done entirely in software on Larrabee, in contrast with most vector processors (which often have a hardware command processor). There is no clean implementation of structures common in vector processors either, such as stream buffers or a streaming register file (really a non-coherent cache suited for large streams of data); rather, it uses some additional cues implemented in the conventional caches, plus some software-mediated control of cache lines, to try to accomplish most of the same thing.

An in-order core does not allow you to avoid branches; I'm not sure where the rationale for this assertion comes from. Without any prediction at the I-Agen (instruction address generation) stage, branches will induce obligatory bubbles in the pipeline on in-order and OoO processors alike; one will be much more costly than the other, but the bubbles are there either way. If the pipeline is extremely short, like the classic 5-stage RISC pipeline, you may be able to avoid the penalties of some branches with a very clever design and some extra logic in the ID stage; that is certainly not the case with any x86 design. Even the simplest RISC processors usually use a simple heuristic such as "always predict taken" and the like.
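As a back-of-the-envelope illustration of that point, here is a toy cost model with made-up numbers; it is not a claim about any particular design, just the arithmetic behind "obligatory bubbles" versus a static heuristic.

```c
/* Toy CPI model: with no prediction every branch stalls the front end until
 * it resolves; a static "always taken" heuristic only pays on not-taken
 * branches. All numbers are invented for illustration. */
#include <stdio.h>

int main(void)
{
    const double branch_freq   = 0.20; /* fraction of instructions that branch */
    const double taken_rate    = 0.65; /* fraction of branches actually taken  */
    const int    bubble_cycles = 4;    /* cycles lost per mishandled branch     */

    double cpi_no_prediction = 1.0 + branch_freq * bubble_cycles;
    double cpi_static_taken  = 1.0 + branch_freq * (1.0 - taken_rate) * bubble_cycles;

    printf("CPI with no prediction:        %.2f\n", cpi_no_prediction);
    printf("CPI with static 'taken' guess: %.2f\n", cpi_static_taken);
    return 0;
}
```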
 

SickBeast

Lifer
Jul 21, 2000
14,377
19
81
Originally posted by: chizow
Originally posted by: SickBeast
I don't understand why the NV guys hate on the Larrabee so much. It's new tech and will give us all better hardware if it's the real deal.
Probably has something to do with the fact you tried dragging Nvidia into the discussion while mixing in unbounded optimism for Larrabee based on questionable information and opinion.
Actually, the OP posted an article where an obviously biased NV scientist was overly negative about Larrabee. How on earth did I drag NV into this when it was right in the OP? :confused:
 

Rhino2

Member
Jun 19, 2008
59
0
0
I'm saddened at having to wait to see it make its debut, but hopefully this will just mean a better product in the end.
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,003
126
Originally posted by: BenSkywalker

Larry is an in order core, you avoid branches.
Why does an in-order core guarantee no branching? If I issue a branch instruction on an in-order processor, it'll branch. Why wouldn't it?

In-order simply means the processor executes every instruction in the sequence it was given and can't re-order them during execution.
 

Genx87

Lifer
Apr 8, 2002
41,091
513
126
The wiki page mentions that Larrabee is an in-order superscalar processor.

Larrabee's x86 cores will be based on the much simpler Pentium P54C design which is still being maintained for use in embedded applications. [9] The P54C-derived core is superscalar but does not include out-of-order execution, though it has been updated with modern features such as x86-64 support, [8] similarly to Intel Atom. In-order execution means lower performance for individual cores, but since they are smaller, more can fit on a single chip, increasing overall throughput (and lowering observed memory latency). Execution is also more deterministic so instruction and task scheduling can be done by compiler.
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
Originally posted by: IntelUser2000
Originally posted by: Fox5

600mm^2 for the top-end Larrabee?
You could be looking at Nvidia and ATI chips that are 25% to 50% bigger. Even if they're not, ATI currently markets ~600mm^2 dies in the ~$100 market, and the higher-end packages just play around with the amount and speed of RAM and the core speed. Intel's got a lot to prove here, and I don't think a Pentium-derived design is going to get it right on the first try, except maybe if they're a process node ahead of ATI/Nvidia. (32nm GPU?)

Please, the biggest GPU die is from Nvidia at 576mm^2, and that's the GT200. The 55nm parts are smaller than that. The ATI parts are almost HALF of the GT200, at less than 300mm^2 die size, and that's for the high-end part. Too many misinformed people are creating rumors!

My bad, I typed 4850 die size into google and that result popped out, I didn't check the context to see they were talking about the gtx 280.

I wonder what process node intel is using? I guess if it's a limited release part it doesn't matter, but I can't see them prioritizing an expensive to produce gpu over say more i7s.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
Why does an in-order core guarantee no branching?

It doesn't, you avoid them (i.e., the programmer does). I avoid potholes in my car too, but sometimes they are unavoidable. If you fail to avoid them, you 'break' the design intent of the processor (performance collapses).

If I issue a branch instruction on an in-order processor, it?ll branch.

Very, very slowly, yes. The overwhelming majority of the time you are better off increasing the workload by an order of magnitude, sometimes a couple of orders of magnitude, to avoid a branch if at all possible. If you write code with even a moderate amount of branching, run it on the CPU.
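As a generic illustration of the kind of hoop-jumping being described (a standard branchless idiom, nothing Larrabee-specific), here is a clamp written with and without data-dependent branches:

```c
/* Trading a branch for extra arithmetic: more work per element, but no
 * data-dependent branch for the pipeline (or a SIMD lane) to stumble over. */
#include <stdint.h>

/* Branchy version: up to two data-dependent branches per call. */
int32_t clamp_branchy(int32_t x, int32_t lo, int32_t hi)
{
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

/* Branchless version (assumes lo <= hi). */
int32_t clamp_branchless(int32_t x, int32_t lo, int32_t hi)
{
    int32_t lo_mask = -(int32_t)(x < lo);   /* all ones if x < lo, else 0 */
    int32_t hi_mask = -(int32_t)(x > hi);   /* all ones if x > hi, else 0 */
    x = (x & ~lo_mask) | (lo & lo_mask);
    x = (x & ~hi_mask) | (hi & hi_mask);
    return x;
}
```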

I see what you are saying; but that's not what Larrabee is, at least not what Intel conceives it to be.

Based on what Intel has stated it is exactly what it is conceived to be. I don't think it will work at all like they are claiming, but what we have to work with is what they have told us.

An in-order core does not allow you to avoid branches; I'm not sure where the rationale for this assertion comes from.

Not in an absolute sense. The programmer who doesn't jump through hoops to avoid them on most in-order cores tends to write rather poor code.
 

chizow

Diamond Member
Jun 26, 2001
9,537
2
0
Originally posted by: SickBeast
Actually, the OP posted an article where an obviously biased NV scientist was overly negative about Larrabee. How on earth did I drag NV into this when it was right in the OP? :confused:
No, he wasn't overly negative; he made a statement grounded in fact, based on his knowledge and work in the field, and instead of attacking the veracity of his statements you launched into a tirade about his credibility and intent being influenced solely by his employment, even calling for NV to fire their CEO over it. In any case, I think it was made pretty clear to you in that thread why there is considerable doubt about Larrabee's ability to compete in the market as either a GPGPU or a rasterizer.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Originally posted by: Fox5

My bad, I typed 4850 die size into google and that result popped out, I didn't check the context to see they were talking about the gtx 280.

I wonder what process node intel is using? I guess if it's a limited release part it doesn't matter, but I can't see them prioritizing an expensive to produce gpu over say more i7s.

All I read so far is that it'll end up to be a 45nm product.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: IntelUser2000
Originally posted by: Fox5

My bad, I typed 4850 die size into google and that result popped out, I didn't check the context to see they were talking about the gtx 280.

I wonder what process node intel is using? I guess if it's a limited release part it doesn't matter, but I can't see them prioritizing an expensive to produce gpu over say more i7s.

All I read so far is that it'll end up to be a 45nm product.

This is the extent of what I have seen/read in the public domain as well, and when the time-window for early release included the possibility of 2009, it made a lot of sense that it would be 45nm.

But now that we are seeing more info enter the public domain eliminating 2009 from the timeline window, it is making more and more sense to consider the possibility that Larrabee could debut in a 32nm implementation.

Consider what Intel did with Itanium and Poulson.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Idontcare, you do realize that even though Poulson will skip 45nm just to catch up with its x86 brothers in process node, by the time it's released the 32nm process technology will be well matured?

With Tukwila delayed to the middle of this year, Itanium is a FULL process technology node behind even the Xeon MP CPUs. Back when Itanium 2 first came out it at least kept parity with the Xeon MP processors, which is how it should have been all along. Now the Intel 32nm CPUs will be trickling out while Itanium has just reached 65nm.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: IntelUser2000
Idontcare, you do realize that even though Poulson will skip 45nm just to catch up with its x86 brothers in process node, by the time it's released the 32nm process technology will be well matured?

With Tukwila delayed to the middle of this year, Itanium is a FULL process technology node behind even the Xeon MP CPUs. Back when Itanium 2 first came out it at least kept parity with the Xeon MP processors, which is how it should have been all along. Now the Intel 32nm CPUs will be trickling out while Itanium has just reached 65nm.

Of course, but Itanium is not a XEON competitor, it is a SUN Sparc/Niagara and IBM Power5/6 competitor.

Care to hazard a guess when either SUN or IBM will be fielding their 32nm chips to compete with Poulson?

You may have noticed neither SUN nor IBM is fielding 45nm big-iron CPUs yet...

(PS - you may not know this about me but I spent a decade making the leading-edge process technology for those SUN chips at TI...I may not carry myself like someone who knows much of the field but I happen to know a little bit about it)
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
I'm not talking about Itanium vs. others for performance. I'm talking about process technology.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: IntelUser2000
I'm not talking about Itanium vs. others for performance. I'm talking about process technology.

Me too. There is a reason the process technology for Itanium lags that used for the x86 parts. It's the same reason that the process technology used by all the remaining big-iron CPU competitors to Itanium lags the leading edge as well.

At TI our high-performance SUN process node (internally called dot-C and dot-B) lagged our leading edge release timeline (internally called dot-0 and dot-m) by nearly 18 months.

For example we qualified our 65nm node in late 2005, but the SUN node for 65nm did not qualify until middle 2007.

It's the same at IBM (65nm Power6 was June 2007) as well as Intel, but I would totally agree with you that Tukwila is crazy late to the party, by about a year now.

Intel skipping 45nm to focus on pulling in the Itanium 32nm release timeline will be a major coup for them if they pull it off as it would easily give them a 2yr lead over IBM and SUN releasing their 32nm products.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
I originally responded because people were speculating about the possibility of Larrabee being 32nm when the release date is supposedly Q1 2010. Unless they want to kill the project, that's not going to happen.

Poulson is late 2010 for the simple reason that its 65nm predecessor isn't even here yet. A delay in one project delays the rest.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: IntelUser2000
I originally responded because people were speculating about the possibility of Larrabee being 32nm when the release date is supposedly Q1 2010. Unless they want to kill the project, that's not going to happen.

Poulson is late 2010 for the simple reason that its 65nm predecessor isn't even here yet. A delay in one project delays the rest.

Tukwila was intentionally delayed to re-engineer some interface aspects of the design, changes deemed necessary from a forward-looking viewpoint as the Poulson and Kittson teams fleshed out the Itanium roadmap in parallel with Tukwila's ramp to tapeout.

It's a "take the nasty medicine today and avoid the major surgery tomorrow" situation, as Tukwila represents a major platform change-up for Itanium, a once-in-a-decade type of change-up for that market segment.

The SUN and IBM design teams do not suffer this added headache of accommodating design-in requests from the teams on architectures N+2 and N+3, for the simple fact that IBM and SUN don't have N+2 and N+3 design teams of the scale and scope that Intel fields.

Back on topic though, I'm not sure why you feel Larrabee being 32nm and released in Q1 2010 would kill the project. Can you elaborate on what motivates you to make this statement?
 

Hard Ball

Senior member
Jul 3, 2005
594
0
0
Originally posted by: BenSkywalker

I see what you are saying; but that's not what Larrabee is, at least not what Intel conceives it to be.

Based on what Intel has stated it is exactly what it is conceived to be. I don't think it will work at all like they are claiming, but what we have to work with is what they have told us.

No, not at all; I had a small part in the verification of the RTL of Larrabee, so I know exactly what Larrabee is, and it is and has been envisioned as a general-purpose design that can excel at many different mixtures of workloads. To quote a small snippet from internal Intel documentation (nothing of any value with regard to NDA):
Larrabee's general-purpose many-core architecture delivers performance scalability for various non-graphics visual and throughput computing workloads and common HPC kernels.


Originally posted by: BenSkywalker
An in-order core does not allow you to avoid branches; I'm not sure where the rationale for this assertion comes from.

Not in an absolute sense. The programmer who doesn't jump through hoops to avoid them on most in-order cores tends to write rather poor code.

As we have discussed, Larrabee is designed to work with a large variety of workloads and is not dependent on a small set of program kernels written in assembly (which is essentially what would need to be done to ensure that very little branching occurs under all workloads); it is designed to work with compilers that have the same kind of multi-pass, multi-IR structure and back-end instruction selection/register allocation as most x86 compilers today. In no sense is Larrabee going to be able to avoid branching to such an extent that the absence of any front-end branch prediction at the I-Agen stage would not significantly affect performance. So unless all of the developers that Larrabee is targeting are proficient at writing directly in x86 and the x86 extensions, what you said does not make any sense.