Technical difference between AGP 1X, 2X, 4X and forthcoming 8X

andri

Senior member
Aug 12, 2000

Hm... Where does the bandwidth difference come from, and what are the biggest differences between those standards? I know that AGP 1X is basically 66MHz PCI plus some features, and AGP 2X is AGP 1X plus DDR, but what about the others?

 

RSI

Diamond Member
May 22, 2000
Doesn't 2x AGP mean AGP but twice as fast? Wouldn't the same pattern apply to the rest?

-RSI
 

highwire

Senior member
Nov 5, 2000
This is almost classic. Everybody tripping out and over the term "quad pumped" for months and no one can say what it really is. Funny.

Maybe it's like the "Kludge Maker", a secret technical device invented on a US Navy ship in WW2. No one knew what it was, and no one wanted to seem stupid by asking. When the inventor, having been rewarded with leisure and privilege for this important work, finally demonstrated it by launching it overboard, sure enough - as it hit the water and sank, it made this definite sound - *k l u d g e*.

BTW, I got nothing out of the linky thread - gibberish.
 

Sohcan

Platinum Member
Oct 10, 1999
There are a number of different ways to achieve signalling beyond DDR. Multiple voltage levels can transfer, for example, two bits at a time: "low" = 00, "sort of low" = 01, "sort of high" = 10, "high" = 11...in combination with DDR signalling, this achieves QDR. Or you can have two DDR clock signals that are 90 degrees out of phase with each other, so that you can sample four times per global clock cycle.
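To make the two approaches concrete, here is a small sketch. The level names come from the post; the functions are hypothetical and this is a toy model, not any real PAM-4 or QDR interface (those define their own voltages and timing). The first part packs two bits into each four-level symbol (combined with DDR's two symbols per clock, that is four bits per clock per wire). The second part shows that two DDR clocks 90 degrees apart give four evenly spaced sample points per cycle.

```python
# Toy model only: level names are the post's labels, not real voltages.

# 1) Multi-level signalling: four levels carry two bits per symbol.
LEVELS = {0b00: "low", 0b01: "sort of low", 0b10: "sort of high", 0b11: "high"}
DECODE = {name: bits for bits, name in LEVELS.items()}

def encode_pam4(data: bytes) -> list[str]:
    """Split each byte into four 2-bit symbols, most significant pair first."""
    symbols = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            symbols.append(LEVELS[(byte >> shift) & 0b11])
    return symbols

def decode_pam4(symbols: list[str]) -> bytes:
    """Reassemble bytes from groups of four symbols."""
    out = bytearray()
    for i in range(0, len(symbols), 4):
        byte = 0
        for name in symbols[i:i + 4]:
            byte = (byte << 2) | DECODE[name]
        out.append(byte)
    return bytes(out)

# 2) Two DDR clocks 90 degrees apart: each samples on both of its edges,
#    and the two sets of edges interleave.
def quadrature_sample_times(period: float) -> list[float]:
    clk0_edges = [0.0, period / 2]                       # rising, falling edge
    clk90_edges = [t + period / 4 for t in clk0_edges]   # shifted 90 degrees
    return sorted(clk0_edges + clk90_edges)

print(encode_pam4(b"\x1b"))          # ['low', 'sort of low', 'sort of high', 'high']
print(quadrature_sample_times(1.0))  # [0.0, 0.25, 0.5, 0.75]
```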

In the case of AGP, some crucial signals are strobed at a fixed multiplier of the global clock. With AGP8X, the address, side-band address, and bus command signals are strobed at 8X the clock signal. It looks like the 8X side-band signal drives two 4X signals, SB_STBS and SB_STBF....these let you latch data twice at a 4X strobe. With AGP's pipelined bus transfers, this allows you to potentially transfer data with the 8X multiplier.
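A simplified timing model of "latching data twice at a 4X strobe", under stated assumptions: two strobes (named STBF and STBS after the signals above) each run at 4x the base clock, data is latched on one edge of each, and STBS lags STBF by half a strobe period. This is a hypothetical illustration of the interleaving idea, not the AGP 3.0 electrical timing; see the spec for the real waveforms.

```python
from fractions import Fraction

def latch_times(multiplier: int = 4, strobes: int = 2) -> list[Fraction]:
    """Latch instants within one base-clock period (period normalized to 1).
    Assumed model: each strobe latches once per strobe period, and the
    strobes are spread evenly across one strobe period."""
    strobe_period = Fraction(1, multiplier)
    times = []
    for s in range(strobes):
        offset = strobe_period * s / strobes   # STBS lags STBF by half a period
        for k in range(multiplier):
            times.append(k * strobe_period + offset)
    return sorted(times)

ts = latch_times()
print(len(ts))               # 8 transfers per base clock
print([str(t) for t in ts])  # ['0', '1/8', '1/4', '3/8', '1/2', '5/8', '3/4', '7/8']
```

Two 4X strobes, interleaved, give eight evenly spaced latch points per base clock, which is the same transfer count a single 8X strobe would give.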

The AGP 3.0 specification is here if anyone is interested.



<< This is almost classic. Everybody tripping out and over the term "quad pumped" for months and no one can say what it really is. Funny.....BTW, I got nothing out of the linky thread - gibberish. >>

Thank you very much for your insightful input.
 

highwire

Senior member
Nov 5, 2000
Thank you, Sohcan, for the reference.

Since most now sound like marketing types, with jargon terms like "quad pumped" and "sideband", I thought that I, as an inheritor of The Enlightenment, could defend knowledge and reason in a jocular way. Apparently the forces of sensitivity and equality do not want obvious muddle pointed out.

A term like "sideband" has a precise meaning in the spectral realm. Outside of that, it is jargon. Quad pumped? The old Ford flathead engine had two water pumps, so I guess that could be called a "duo pumped" device. Jargon.

On to more light. And thanks again for the reference to the AGP 3.0 spec.

OK, here it is. AGP X8 is nothing more than 8 possible BINARY transitions in one 66 MHz clock period. Period is right. No multi-level sigs. No phases. Just BINARY. Fast binary. Some clever sync and stuff, but just binary.
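Counting transfers per clock is also where the bandwidth numbers in the original question come from. A quick calculation, assuming the 32-bit AGP data bus and the nominal 66.67 MHz base clock, and giving theoretical peaks only (no protocol overhead or wait states):

```python
from fractions import Fraction

AGP_CLOCK_HZ = Fraction(200, 3) * 10**6   # nominal 66.67 MHz base clock
BUS_WIDTH_BYTES = 4                        # 32-bit data bus

def peak_bandwidth_mb_s(transfers_per_clock: int) -> float:
    """Theoretical peak in MB/s (10**6 bytes/s) for a given AGP multiplier."""
    bytes_per_s = BUS_WIDTH_BYTES * AGP_CLOCK_HZ * transfers_per_clock
    return float(bytes_per_s / 10**6)

for mode in (1, 2, 4, 8):
    print(f"AGP {mode}X: {peak_bandwidth_mb_s(mode):.1f} MB/s")
# AGP 1X: 266.7 MB/s
# AGP 2X: 533.3 MB/s
# AGP 4X: 1066.7 MB/s
# AGP 8X: 2133.3 MB/s
```

The clock and bus width stay the same across modes; only the number of transfers per clock changes, so each step doubles the peak.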

Kind of a let down after all the jargon.
 

Sohcan

Platinum Member
Oct 10, 1999
2,127
0
0
I think you'll find that just about any technology the marketing departments like to run with does have real technical merit behind it; the merit just isn't apparent from the marketing propaganda.

For example, take the P4's "double pumped" ALUs....Intel's marketing loves to say that the latest P4 has 4GHz ALUs.

The truth of the matter is that the ALUs are pipelined with a fast clock (FCLK) signal that indeed runs at twice the global clock. It takes two FCLK periods, equal to one global clock period, to complete an arithmetic operation....in the first FCLK cycle, the lower 16 bits of the operation are calculated, and in the second FCLK cycle, the upper 16 bits are calculated. But the ALU bypass network runs on the FCLK signal and takes care of fetching operands from the register file and issuing arithmetic operations twice per global clock cycle. This has a few distinct advantages over normal ALUs.
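Here is a sketch of the arithmetic side of that split, with a hypothetical function name: a 32-bit add done as two 16-bit halves, with the carry out of the low half feeding the high half. The two sequential steps stand in for the two FCLK cycles. It models the data flow only, not Intel's actual circuit.

```python
MASK16 = 0xFFFF

def staggered_add(a: int, b: int) -> tuple[int, int, int]:
    """Add two unsigned 32-bit values one 16-bit half at a time.
    Returns (low_half, high_half, full_32_bit_result)."""
    # "FCLK cycle 1": low 16 bits, producing the carry out of bit 15.
    low_sum = (a & MASK16) + (b & MASK16)
    low = low_sum & MASK16
    carry = low_sum >> 16
    # "FCLK cycle 2": high 16 bits, consuming that carry.
    high = (((a >> 16) & MASK16) + ((b >> 16) & MASK16) + carry) & MASK16
    return low, high, (high << 16) | low

print([hex(x) for x in staggered_add(0x0000FFFF, 0x00000001)])
# ['0x0', '0x1', '0x10000']
```

The low half is final after the first step, which is what lets a dependent operation start on it before the high half is done.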

First, a little background (I don't know your background, disregard if you know all this already :))....with superscalar cores, it is the goal to issue as many operations as possible per clock cycle. To accommodate the maximum number of instructions of a certain type that may burst through in a particular clock cycle, you're going to want to have multiple copies of the same unit, such as 3 or 4 integer units (ALUs). But there are limitations to the number of instructions that you may issue in a cycle...aside from memory stalls, data dependencies between instructions may prevent you from issuing multiple instructions in a cycle. There are three types of data dependencies:

Read-after-write:
a = b + c
d = a + e
The result of the first operation is used in the second, so they have to be issued in order.

Write-after-read:
a = b + c
b = e + f
Issuing these instructions out of order or at the same time may change the result of the first, since b will be overwritten.

Write-after-write:
b = a + c
b = d + e
Issuing these instructions out of order or at the same time could semantically change the program.
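The three cases can be told apart mechanically by comparing each instruction's destination with the other's sources. A minimal sketch (the `Instr` type and function are hypothetical), checked against the three examples above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    dest: str
    srcs: tuple[str, ...]

def dependencies(first: Instr, second: Instr) -> set[str]:
    """Classify how `second` depends on the earlier instruction `first`."""
    deps = set()
    if first.dest in second.srcs:
        deps.add("RAW")   # second reads what first writes
    if second.dest in first.srcs:
        deps.add("WAR")   # second overwrites a value first still reads
    if second.dest == first.dest:
        deps.add("WAW")   # both write the same register
    return deps

print(dependencies(Instr("a", ("b", "c")), Instr("d", ("a", "e"))))  # {'RAW'}
print(dependencies(Instr("a", ("b", "c")), Instr("b", ("e", "f"))))  # {'WAR'}
print(dependencies(Instr("b", ("a", "c")), Instr("b", ("d", "e"))))  # {'WAW'}
```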

Write-after-read and write-after-write dependencies can be solved with register renaming, but traditionally there is no way to issue two instructions with a read-after-write dependency at the same time or out of order.
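Register renaming can be sketched in a few lines: give every write a fresh physical register and have sources read the latest mapping. This is a simplified model with hypothetical names (`rename`, physical registers `p0`, `p1`, ...); real renamers also free registers and recover from mispredictions.

```python
from itertools import count

# An instruction is (dest, (src, ...)) using architectural register names.
Program = list[tuple[str, tuple[str, ...]]]

def rename(program: Program) -> Program:
    """Rewrite architectural registers as physical ones (p0, p1, ...).
    Sources read the current mapping; every write gets a fresh register."""
    fresh = count()
    mapping: dict[str, str] = {}

    def lookup(reg: str) -> str:
        # First sight of a register: give its incoming value a physical reg.
        if reg not in mapping:
            mapping[reg] = f"p{next(fresh)}"
        return mapping[reg]

    renamed = []
    for dest, srcs in program:
        phys_srcs = tuple(lookup(s) for s in srcs)   # read before the write
        mapping[dest] = f"p{next(fresh)}"            # new register for result
        renamed.append((mapping[dest], phys_srcs))
    return renamed

# Write-after-read pair from above: a = b + c, then b = e + f
print(rename([("a", ("b", "c")), ("b", ("e", "f"))]))
# [('p2', ('p0', 'p1')), ('p5', ('p3', 'p4'))]
```

After renaming, the second instruction writes p5 while the first still reads the old b from p0, so the two no longer conflict. A true read-after-write pair (a = b + c; d = a + e) still has the second op reading the first op's physical register, which is why renaming cannot remove it.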

But with the P4's two-stage pipelined ALUs, running at 2X the global clock rate, the bypass network can effectively issue two instructions with a read-after-write dependency in the same global clock cycle. From the perspective of the global clock, the first instruction will be issued at cycle N, and the second instruction will be issued at cycle N + 1/2...at this time, the lower 16 bits of the first instruction are already calculated and available to the second instruction, so in a way this solves the read-after-write dependency problem. This doesn't necessarily have a *huge* impact on performance (probably not noticeable for most programs), but it does affect the way the processor can handle bursts of integer operations.
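One way to see why the half-cycle offset works: in a chain of dependent adds, each op's low half needs only the previous op's low half, and its high half needs the previous op's high half, which is ready one FCLK slot later anyway. The idealized schedule below (hypothetical function names; it models only this dependency timing, not the real pipeline) assigns each 16-bit half to an FCLK slot, two slots per global cycle.

```python
from fractions import Fraction

def staggered_schedule(n: int) -> list[tuple[int, int]]:
    """FCLK slots (low_half, high_half) for a chain of n dependent 32-bit adds,
    where op i reads op i-1's result and each half takes one FCLK slot."""
    schedule: list[tuple[int, int]] = []
    for _ in range(n):
        prev_low, prev_high = schedule[-1] if schedule else (-1, -1)
        low = prev_low + 1                   # needs the producer's low half
        high = max(low + 1, prev_high + 1)   # needs its own low-half carry and
                                             # the producer's high half
        schedule.append((low, high))
    return schedule

def issue_cycles(schedule: list[tuple[int, int]]) -> list[Fraction]:
    """Global-clock cycle in which each op starts (2 FCLK slots per cycle)."""
    return [Fraction(low, 2) for low, _ in schedule]

print(staggered_schedule(3))                                  # [(0, 1), (1, 2), (2, 3)]
print([str(c) for c in issue_cycles(staggered_schedule(4))])  # ['0', '1/2', '1', '3/2']
```

In the same model, a conventional single-cycle ALU would start the chain at cycles 0, 1, 2, 3.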

Secondly, two fast ALUs can handle the same number of instructions/cycle as four normal ALUs, but with less die area used and heat produced, which could conceivably be very useful for the MPU designers.

So when the marketing department gets a handle on this concept, they're obviously not going to understand the issues and benefits behind it, so they'll say "Wow, 4GHz ALUs! That must mean the P4 has twice the performance!" I really don't think it should be the fault of the engineers when they have a good idea and marketing doesn't understand it.
 

MadRat

Lifer
Oct 14, 1999
If a CPU could just store problems in an undefined variable cache it could perform out of order computations. (undefined variable = abstract register link to the incomplete sum.) It would simply hold back some computation until the dependencies become defined. Eventually a nasty loop of undefined variables would cause a paradigm that could not be solved! No answer no problem, at least it was like that in my math classes. To go deeper than one level of undefined variables they could have incomplete function caches! (A single, abstract variable represents the sum of the function.) It would be like math class all over again.
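The "hold back computation until the dependencies become defined" idea is roughly how dataflow execution works, and it loosely resembles the reservation stations real out-of-order cores use. A toy sketch with a hypothetical `dataflow_execute` function: ops wait until all their inputs are defined, and a circular dependency simply never fires.

```python
from operator import add

def dataflow_execute(ops, known):
    """Toy deferred evaluator.
    ops:   list of (dest, fn, srcs) in any order
    known: dict of already-defined values
    An op fires once all of its sources are defined; passes repeat until
    nothing new can fire. Returns (values, ops still waiting)."""
    values = dict(known)
    waiting = list(ops)
    progress = True
    while progress:
        progress = False
        for op in list(waiting):        # iterate over a copy while removing
            dest, fn, srcs = op
            if all(s in values for s in srcs):
                values[dest] = fn(*(values[s] for s in srcs))
                waiting.remove(op)
                progress = True
    return values, waiting

# d = a + e comes before a = b + c, so it is held back until a is defined.
values, stuck = dataflow_execute(
    [("d", add, ("a", "e")), ("a", add, ("b", "c"))],
    {"b": 1, "c": 2, "e": 3},
)
print(values["d"], stuck)   # 6 []

# The "nasty loop": x needs y and y needs x, so neither ever fires.
_, stuck = dataflow_execute(
    [("x", add, ("y", "one")), ("y", add, ("x", "one"))], {"one": 1}
)
print(len(stuck))           # 2
```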

Edit: How true that last statement, Sohcan!
 

highwire

Senior member
Nov 5, 2000
Sohcan, thank you for the effusive soliloquy on ALU design. I followed it - even might have refreshed some points for me.

However, most of the design strategies you allude to have been well known and used for years. Anyone who has strung together gates to make an ALU soon knows that there is a speed/complexity issue that must be resolved. As the ALU gets wider, the fast carry gating, for example, might quickly become overwhelming. So, if one has the gate speed, breaking it up and using more and faster cycles makes sense. No biggy. The design will depend, among other things, on the resources in silicon the designer has to work with. If Intel didn't want to talk trick, they might have described their 32 bit ALUs as being implemented with double clocked 16 bit registers. Or, if they decide that their solution should have some unique name outside the Intel cabana, they owe it to the community to define their solutions in the simplest terms they can.
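To put a number on the fast-carry point: a flat (single-level) carry-lookahead expression for the carry out of an n-bit adder ORs together n + 1 product terms of generate (g) and propagate (p) bits, and the longest term has n + 1 inputs. The sketch below uses textbook carry-lookahead, not any particular ALU's circuit, and the function names are hypothetical. It builds that expression and counts its literals.

```python
def gp_bits(a: int, b: int, n: int) -> tuple[list[int], list[int]]:
    """Per-bit generate (a AND b) and propagate (a XOR b), bit 0 first."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]
    return g, p

def lookahead_carry(g: list[int], p: list[int], c0: int) -> int:
    """Flat carry-lookahead for the carry out of bit n-1:
    c_n = g[n-1] | p[n-1]&g[n-2] | ... | p[n-1]&...&p[0]&c0"""
    n = len(g)
    carry = 0
    for j in range(-1, n):              # j = -1 is the c0 term
        term = c0 if j == -1 else g[j]
        for k in range(j + 1, n):       # AND in every propagate above bit j
            term &= p[k]
        carry |= term
    return carry

def lookahead_literals(n: int) -> int:
    """Total literals across all product terms of that flat expression:
    1 + 2 + ... + (n + 1) = (n + 1)(n + 2) / 2."""
    return (n + 1) * (n + 2) // 2

g, p = gp_bits(0b1011, 0b0101, 4)
print(lookahead_carry(g, p, 0))                        # 1 (11 + 5 = 16 overflows 4 bits)
print(lookahead_literals(16), lookahead_literals(32))  # 153 561
```

The flat form grows roughly with the square of the width (153 literals at 16 bits, 561 at 32), which is one reason splitting the add into 16-bit halves, or using multi-level lookahead, is attractive.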

To show how far away terminology can be from the community, here is a comment from the previous aceshardware link which asked what AGP X8 was:

<< Thanks, I will keep on searching... Why is it that the ones I would expect to fill my limited brain with the details (Johan, Paul Demone, Tom, Anand, etc...) have yet to dive into these details. This seems to be the wave (no pun intended) of the future:) >>

I think he got it about right. I said I didn't get anything from that thread, and I meant that none of the submitters had a clue what AGP X8 really was. That is understandable, since none of their usual sources of info seemed to know either.

Marketing departments and even universities are prone to take yesterday's technical meat and potatoes, put a little sauce and a french name to it and call it a paradigm shift.

But a few of us have to ask: Hey, how does your paradigm kludge really work?
The answer usually involves the meat and potatoes we already know.
 

Sohcan

Platinum Member
Oct 10, 1999
2,127
0
0


<< Sohcan, thank you for the effusive soliloquy on ALU design >>

Yeah, yeah....I know I talk too much sometimes. :)

You've got a very good point, most hardware sites don't delve into the technical details, be they benefits, issues, or background of said technologies. But I don't really blame them....Anand, Tom, and other hardware sites write great product reviews, but it's obvious after reading their "technical" articles that they rely mostly on marketing packets. And since marketing handouts obviously aren't going to go into the real meat and issues you describe, the "general community" only gets a sense of what the marketing tells people.

Paul DeMone (and Johan at Ace's to a lesser degree) was able to use his engineering background to interpret what's actually going on....too bad he doesn't write articles at RWT anymore :(. So in the end there are hardly any unified sources that write truly unique in-depth articles, so one must rely on scouring the posts of engineers at various bulletin boards. These days I pretty much rely on comp.arch and engineers who frequent various forums, such as pm here and DeMone and others at Real World Tech.

And what about AGP4X and 8X? I guess the glitz and glamour of CPU architecture has put bus technology on the backburner for informative articles. :)

Oh, and I didn't mean to be curt in my reply to your first post....I mistook your intention.