Yonah article here on Anandtech Part II


dmens

Platinum Member
Mar 18, 2005
2,275
965
136
I am confused... P6 loads and stores go to the MOB (memory order buffer), which acts as a conduit for all memory interactions, after they are scheduled from the RS. Once in the MOB, a uop is no longer required in the RS; the MOB continues to use the robid (reorder buffer id) to track the uop. Reorder buffers do not deal with scheduling, they deal with retirement and x86 events.
 

Betwon

Member
Dec 20, 2005
81
0
0
Originally posted by: dmens
I am confused... P6 loads and stores go to the MOB (memory order buffer), which acts as a conduit for all memory interactions, after they are scheduled from the RS. Once in the MOB, a uop is no longer required in the RS; the MOB continues to use the robid (reorder buffer id) to track the uop. Reorder buffers do not deal with scheduling, they deal with retirement and x86 events.

When we say scheduler, we mainly mean the RS.

You are wrong about the MOB. It may have some scheduling function (an "RS" for loads/stores?).

The load/store uops will not be placed into the RS, but go into the MOB and ROB.
For example:
x86 code
mov eax,X; one load uop
mov ebx, ecx; one ALU uop
sub eax, edx; one ALU uop

The load uop will go into the ROB and MOB, but not into the RS.

After register renaming the new uops may be:

RAT:
......
R1: ROB2
R2: ROB1
......


ROB:
LOAD R1,X(TAG:MOB0)
R2<-R3(TAG:RS0)
R1<-R1-R0(TAG:RS1)

RS:
ROB1<-R0
ROB2<-ROB0-R0

MOB:
LOAD ROB0,X
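
For anyone following along, the renaming step in the example above can be sketched in a few lines of Python. This is only a toy model: the rename function, the tuple layout, and the ROBn tag strings are made up for illustration and are not taken from the presentation or from any Intel documentation. It only shows how the RAT ends up pointing R1 at ROB2 and R2 at ROB1, and says nothing about whether the load sits in the RS or the MOB.

```python
# Toy register renaming through a RAT (register alias table).
# All names here are hypothetical; this is not a model of real P6 logic.

def rename(uops):
    """Rename (dst, srcs) uops in program order.

    Each uop gets the next ROB entry as its destination tag. Sources are
    replaced by the ROB tag of their latest in-flight producer, if any;
    otherwise they keep their architectural name.
    """
    rat = {}   # architectural register -> ROB tag of its latest writer
    rob = []   # allocated entries: (tag, architectural dst, renamed srcs)
    for dst, srcs in uops:
        renamed_srcs = [rat.get(src, src) for src in srcs]
        tag = f"ROB{len(rob)}"
        rob.append((tag, dst, renamed_srcs))
        rat[dst] = tag
    return rat, rob


# R1 = eax, R2 = ebx, R3 = ecx, R0 = edx, X = memory operand
uops = [
    ("R1", ["X"]),         # mov eax, X    (load)
    ("R2", ["R3"]),        # mov ebx, ecx  (ALU)
    ("R1", ["R1", "R0"]),  # sub eax, edx  (ALU)
]

rat, rob = rename(uops)
for tag, dst, srcs in rob:
    print(tag, dst, srcs)
print(rat)
```

The last uop reads ROB0 (the load result) and R0, which is the "ROB2<-ROB0-R0" line in the example.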
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
No, on P6 all loads/stores go to the MOB via the RS. There is no direct path from rename to the memory/execution units. The only uops that go directly from rename to retirement in the ROB are FXCH uops, certainly not integer. Without going through the RS, the machine has no way of resolving data dependencies. Also, once a load hits, the MOB still needs to send the robid back to the RS for waking up dependents. Only when the load data writes back and is verified to be valid can the uop write into the speculative ROB and ready itself for retirement.

It is true that the MOB is a "scheduler" of sorts for LD/STD, but it does not bypass the RS.
 

Betwon

Member
Dec 20, 2005
81
0
0
Originally posted by: dmens
No, on P6 all loads/stores go to the MOB via the RS. There is no direct path from rename to the memory/execution units. The only uops that go directly from rename to retirement in the ROB are FXCH uops, certainly not integer. Without going through the RS, the machine has no way of resolving data dependencies. Also, once a load hits, the MOB still needs to send the robid back to the RS for waking up dependents. Only when the load data writes back and is verified to be valid can the uop write into the speculative ROB and ready itself for retirement.

It is true that the MOB is a "scheduler" of sorts for LD/STD, but it does not bypass the RS.
The example above is from an Intel designer.

No.
Load/store uops will not be placed into an RS entry. There is no direct path from rename to the execution units, but it is not true that memory uops go into the RS and that the RS then dispatches the load/store uops into the MOB.

The load/store uops will be placed into the MOB, but not the RS.

If there is another RS (for loads/stores), L/S uops may go into that other "RS" and then go to the MOB.
 

Betwon

Member
Dec 20, 2005
81
0
0
The MOB can communicate with the RS to resolve data or control dependencies.

Right: LOAD uops are stored into the MOB
Wrong: LOAD uops are stored into the RS -> the RS dispatches them -> they are stored into the MOB
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Well, I happen to be an RS designer on a future Intel uproc family based on the P6 style, and I don't know how the load would be able to resolve its source dependencies before going to the MOB if it does not pass through the RS.

I'm pretty damn sure what you said is wrong is exactly what happens. :)

The fact that the MOB has to communicate robid's back to the RS means that all loads must go through the RS. Imagine a load -> load dependency. The second load has to wait in the RS until the MOB returns the robid of the first load before it can even be sent to the MOB.
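
To make the load -> load case concrete, here is a rough Python sketch of the wakeup handshake described above. It is only a toy under stated assumptions: the ReservationStation class and its allocate, wakeup and dispatch_ready methods are invented names for illustration, not the actual P6 logic or any real interface.

```python
# Toy model of RS wakeup for a load -> load dependency.
# Class and method names are hypothetical, for illustration only.

class ReservationStation:
    def __init__(self):
        # Each entry tracks a uop's robid and the producer robids it waits on.
        self.waiting = []

    def allocate(self, robid, sources):
        self.waiting.append({"robid": robid, "pending": set(sources)})

    def wakeup(self, robid):
        # Called when a producer (here, the MOB for a completed load)
        # broadcasts its robid back to the RS.
        for entry in self.waiting:
            entry["pending"].discard(robid)

    def dispatch_ready(self):
        # Uops with no pending sources leave the RS (loads go on to the MOB).
        ready = [e["robid"] for e in self.waiting if not e["pending"]]
        self.waiting = [e for e in self.waiting if e["pending"]]
        return ready


rs = ReservationStation()
rs.allocate(0, [])     # first load: address already available
rs.allocate(1, [0])    # second load: address comes from the first load

first = rs.dispatch_ready()    # first load is sent to the MOB
stalled = rs.dispatch_ready()  # second load still waits in the RS
rs.wakeup(0)                   # MOB returns robid 0 once the first load has data
second = rs.dispatch_ready()   # now the second load can be sent
print(first, stalled, second)
```

The point of the sketch is just the ordering: the second load cannot leave the RS until the robid of the first load comes back.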
 

Betwon

Member
Dec 20, 2005
81
0
0
Originally posted by: dmens
Well, I happen to be an RS designer on a future Intel uproc family based on the P6 style, and I don't know how the load would be able to resolve its source dependencies before going to the MOB if it does not pass through the RS.

I'm pretty damn sure what you said is wrong is exactly what happens. :)
Really?

Well, do you know Dr. Avi Mendelson from Intel?

He gives the P6 example step by step.
 

Betwon

Member
Dec 20, 2005
81
0
0
Originally posted by: dmens
No I work in Oregon.

Don't you know anything about AMD's K7/K8?

After early decode, the load ops are placed into LS1 (Load Store Unit 1) directly; they are not placed into the RS and then dispatched by the RS into LS1. AMD can do it.

Do you really know the P6? Maybe the Pentium II is different.

But you say all P6 CPUs work that way.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
What's with the tone? Yeah, I do know the P6. And yeah, afaik this protocol has been consistent since P6. Having the MOB resolve source dependencies would require the addition of a bunch of new logic which already exists in the RS, so what's the point.

No, I don't know exactly how the K8 does it. But I can imagine them writing various bits of information into their load/store unit after rename then waiting for the RS to write the pending information later. The P6 just waits until dispatch then writes everything at once.
 

Marmion

Member
Dec 1, 2005
110
0
0
Hey, I just clicked - speculating about Conroe clock speeds, is it possible that Intel deliberately clocked the Smithfields/Preslers at 2.8-3.46GHz because that's what they expect to get out of Conroe?
Pure speculation, but if proved true that is some very clever marketing (i.e. no speed drop explanations needed for the ordinary Joe Bloggs).
And it also seems realistic, with Yonah capable of a 2.5GHz clock (then declocked for power reasons to 2.16GHz at launch, 2.33GHz later in the year before Merom), I guess with the slightly longer pipeline then maybe that's what we can expect from Conroe?
 

Betwon

Member
Dec 20, 2005
81
0
0
Originally posted by: dmens
What's with the tone? Yeah, I do know the P6. And yeah, afaik this protocol has been consistent since P6. Having the MOB resolve source dependencies would require the addition of a bunch of new logic which already exists in the RS, so what's the point.

No, I don't know exactly how the K8 does it. But I can imagine them writing various bits of information into their load/store unit after rename then waiting for the RS to write the pending information later. The P6 just waits until dispatch then writes everything at once.

Well, can you tell me something about the P6? Are the store buffer and the load buffer two different buffers or one unified buffer? Is there a load buffer at all?
 

Betwon

Member
Dec 20, 2005
81
0
0
Is the CPU model described in patent 5974523, "Mechanism for efficiently overlapping multiple operand types in a microprocessor", closer to the P6 CPU?
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Sorry to tell you this, but that presentation is a simplification of what really happens in the machine, since a cycle accurate description would take a lot more than 8 slides. To prove my previous point, if you insert say, "LD X,Y" before the "LD R1, X", the R1 load from X would have to wait in the RS until X LD from Y is finished in the MOB and wakeup robid sent to the RS. The presentation depicts the MOB write at the same time as the ROB allocate just to illustrate the allocation of MOB and ROB entries by the in-order portion of the machine, nothing more.

 

Betwon

Member
Dec 20, 2005
81
0
0
But you should notice that:
......................
LD R1,X -- MOB0
R2<-R3 -- RS0
......................

The ID has definitely shown what happens.

Are you saying Dr. Avi may be wrong? Really nothing more?
 

Betwon

Member
Dec 20, 2005
81
0
0
Originally posted by: dmens
Sorry to tell you this, but that presentation is a simplification of what really happens in the machine, since a cycle accurate description would take a lot more than 8 slides. To prove my previous point, if you insert say, "LD X,Y" before the "LD R1, X", the R1 load from X would have to wait in the RS until X LD from Y is finished in the MOB and wakeup robid sent to the RS. The presentation depicts the MOB write at the same time as the ROB allocate just to illustrate the allocation of MOB and ROB entries by the in-order portion of the machine, nothing more.

You have made a mistake.
For an x86 or RISC CPU, there is no LD X,Y instruction. It is impossible!
LOAD means loading data from the memory subsystem into a reg, so X is not a reg!
If you want to store data into X, you have to use a store instruction, such as ST X, reg.
 

Betwon

Member
Dec 20, 2005
81
0
0
You will find that store instructions are placed into the MOB.

Earlier store instructions write their data into the store buffer. If the data isn't ready, a not-ready tag is set.

When an LD instruction loads data into a reg, the MOB always tries to find matching data in the store buffer first. If it finds matching data that is not yet ready, it makes the LD wait until the data is ready. If it cannot find matching data, it fetches the data from cache or RAM.
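
As a rough illustration of the lookup described above, here is a minimal Python sketch. It assumes a store buffer kept in program order and exact address matches only; the load function, the dict fields, and the sample addresses are hypothetical and are not a description of how the real MOB is built.

```python
# Toy store-buffer lookup for a load (store-to-load forwarding).
# Field names and addresses are made up for illustration.

def load(addr, store_buffer, memory):
    """Return (source, value) for a load from addr.

    The store buffer is searched from the youngest store to the oldest so
    that the most recent earlier store to the same address wins.
    """
    for entry in reversed(store_buffer):
        if entry["addr"] == addr:
            if entry["ready"]:
                return ("forwarded", entry["data"])
            # Matching store exists but its data is not ready: the load waits.
            return ("wait", None)
    # No matching store: the data comes from cache or RAM.
    return ("memory", memory[addr])


store_buffer = [
    {"addr": 0x10, "data": 7, "ready": True},       # store with data available
    {"addr": 0x20, "data": None, "ready": False},   # store still waiting on data
]
memory = {0x10: 1, 0x20: 2, 0x30: 3}

hit = load(0x10, store_buffer, memory)
blocked = load(0x20, store_buffer, memory)
miss = load(0x30, store_buffer, memory)
print(hit, blocked, miss)
```

The three calls cover the three cases in the post: matching data that is ready, matching data that is not ready, and no match.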
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
You are not making sense. You say X is not a reg... OK, how do you think addresses are stored if not in physical registers? Magically floating around the AGU to be retrieved at will or somesuch, lol. I already said loads/stores go into the MOB... and I'm fully aware of the MOB's higher protocols.

I'm not saying the presentation is wrong... it is just a gross simplification designed to illustrate basics.
 

Betwon

Member
Dec 20, 2005
81
0
0
Originally posted by: dmens
You are not making sense. You say X is not a reg... OK, how do you think addresses are stored if not in physical registers? Magically floating around the AGU to be retrieved at will or somesuch, lol. I already said loads/stores go into the MOB... and I'm fully aware of the MOB's higher protocols.

I'm not saying the presentation is wrong... it is just a gross simplification designed to illustrate basics.

A pointer register holding a data address is different from a data register. There is no instruction such as "load the value of a reg into memory"; that would be a store instruction.

You should also know about x86 address displacement. Such an instruction exists:
load reg,[0x72h31h].

X is a memory var.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Sorry, mem pointer is still considered data, hence stored in a register. You might want to look up something called "microcode", since x86 backends have about as much similarity to x86 ISA as like, dogs to cats.
 

Betwon

Member
Dec 20, 2005
81
0
0
Originally posted by: dmens
Sorry, mem pointer is still considered data, hence stored in a register. You might want to look up something called "microcode", since x86 backends have about as much similarity to x86 ISA as like, dogs to cats.
Have you forgotten something?
How about:
mov eax,[0x72a3h]?

You should know what address displacement is. Why don't you know it?
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Seriously, what's up with the tone. I'm not going to explain any more, since you are obviously a troll who knows jackshit... because if you did, you should have been able to construct the P6 flow for displaced loads using my previously polite explanations. Goodbye.
 

Betwon

Member
Dec 20, 2005
81
0
0
You haven't explained mov eax,[0x72a3h]: it is an address displacement, not a reg.
Your explanation does not apply to the P6's unified scheduler, yet you think it is impossible for a CPU not to place the load op into the RS.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
I know it is impossible because I own that chunk of logic on a P6-style uproc in dev, and yeah, I know the historical background too because I read all those design docs as well. Nitpicking about the meaning of x86 wording reveals absolutely zilch about backend operation. Please get the basics from a book or something because I'm not going to explain anything in detail.