Asynchronous computer components

Foxery

Golden Member
Jan 24, 2008
1,709
0
0
In modern PCs, the various parts tend to operate at very different frequencies; multipliers and ratios abound. CPU:FSB and FSB:RAM are the ones changeable by overclockers, plus hidden/automatic ones between the FSB and, say, your PCI(e) slots.

In the old days, a classic CPU ran at 2-4X the speed of the system bus and RAM, and these were multiples of 33MHz, the PCI bus speed. There was a simple buffer (the L2 cache) which could just wait out the extra clock ticks until the bus was ready.

But what connects these together and keeps the data flowing smoothly when we use strange ratios which don't evenly divide? e.g. modern 0.5-step CPU multipliers, and RAM/FSB ratios like 5:4 and 6:5. They can't wait around like a 486, and I can't imagine there are little high-speed buffers all over the motherboard. What modern mechanism am I missing here?
 

firewolfsm

Golden Member
Oct 16, 2005
1,848
29
91
There's a quartz crystal on the motherboard. I'm not exactly sure how it works, but everything syncs to it somehow. Google it.
 

bobsmith1492

Diamond Member
Feb 21, 2004
3,875
3
81
Caching? Data from A->B at speed X comes in a burst, gets stored in cache at B, then moved from B->C at speed Y where Y<X. Hard drives do it; video cards do it.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Most of the magic happens at the memory controller -- the northbridge in Intel chipsets. It's a pretty complicated problem, yes, but not unsolvable. Check out this paper on Intel's high-performance memory controller (Bensley).
 

Foxery

3 "I don't know, but I felt like neffing" and 1 white paper about a completely different topic. Ok...

Where's CTho? :/
 

Special K

Diamond Member
Jun 18, 2000
7,098
0
76
Typically different clock domains are separated by a synchronizer circuit. A simple one would consist of three back-to-back flip flops. The first flip flop would be clocked in the first clock domain, and the second two would be clocked in the second clock domain. You need to have two flip flops in the second clock domain to ensure the data is captured correctly. When a signal coming from one clock domain enters a flip flop clocked by a different clock, it is asynchronous. If the setup and hold times of the second clock are not met, then the wrong data value can be flopped. To prevent this from occurring, you send the signal through two consecutive flops in the second clock domain. Provided you hold the original signal steady for more than one cycle of clock two, the second flop in the second clock domain will capture the signal's correct value. The output of the second flip flop is now synchronous with respect to clock 2.
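Here is a minimal Python sketch of the two flops on the receiving side, as a toy model rather than a circuit simulation. The function name `two_flop_sync` and the use of `None` to mark "the input changed inside the setup/hold window" are illustrative conventions, not taken from the linked article. The model also assumes that a metastable first flop always settles to a random 0 or 1 within one cycle, which in real hardware is only very likely, not guaranteed.

```python
import random

def two_flop_sync(samples, rng):
    """Toy model of the two receiving-domain flops of a synchronizer.

    samples: what the first receiving flop sees at each clock-2 edge:
             0 or 1 for a clean sample, None for a setup/hold violation.
    Returns the output of the second flop at each clock-2 edge.
    """
    ff1 = 0  # first flop in the receiving domain (may capture garbage)
    ff2 = 0  # second flop, whose output is treated as synchronous
    out = []
    for s in samples:
        # Second flop captures what the first flop held during the last
        # cycle, i.e. a value that has had a full cycle to settle.
        ff2 = ff1
        # ASSUMPTION: a violated sample resolves to a random 0 or 1.
        ff1 = rng.choice((0, 1)) if s is None else s
        out.append(ff2)
    return out

# A 0 -> 1 transition that lands inside the setup/hold window once,
# then is held steady for several clock-2 cycles.
print(two_flop_sync([0, None, 1, 1, 1], random.Random(0)))
```

The output is always a clean 0 or 1. The only uncertainty is whether the new value shows up one cycle earlier or later, which is why the source has to hold the signal steady for more than one cycle of clock 2.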

I am having trouble explaining this concept in words, but this website does a pretty good job of explaining the concept using sample circuits and waveforms:

http://www.edadesignline.com/howto/205201913

As you mentioned in the OP, it is obviously much easier to handle clocks that are multiples of each other, as these are considered to be the same clock domain and do not suffer from the metastability problems that arise when you try to cross from one clock domain to another.
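To make the "related clocks are easier" point concrete, here is a small Python sketch (an illustration, not something from the linked article). It assumes two clocks locked to the same reference at a frequency ratio a:b, with edges lined up at the start of each repeat and no skew. The helper name `edge_offsets` is hypothetical. It lists where each edge of the second clock falls within the first clock's period.

```python
from fractions import Fraction

def edge_offsets(a, b):
    """Phase of each edge of clock B relative to clock A's most recent edge.

    Clocks are locked at frequency ratio a:b (A:B). Each result is a
    fraction of clock A's period. The pattern repeats every b edges of B.
    """
    return [Fraction(k * a, b) % 1 for k in range(b)]

for a, b in [(2, 1), (4, 1), (5, 4), (6, 5)]:
    offsets = ", ".join(str(x) for x in edge_offsets(a, b))
    print(f"{a}:{b} -> {offsets}")
```

With an integer multiple the edges always line up. With a locked ratio like 5:4 or 6:5 the offsets form a short, repeating, known list, so the designer can work out in advance which edges are safe to sample on. With two unrelated clocks there is no such list: the offset drifts through every value, and sooner or later an edge lands inside the setup/hold window.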
 

dorion

Senior member
Jun 12, 2006
256
0
76
Here I was ready to get all my Asynchronous Computer Links out.

Clockless Computing FTW.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: Special K
Typically different clock domains are separated by a synchronizer circuit. A simple one would consist of three back-to-back flip flops. The first flip flop would be clocked in the first clock domain, and the second two would be clocked in the second clock domain. You need to have two flip flops in the second clock domain to ensure the data is captured correctly. When a signal coming from one clock domain enters a flip flop clocked by a different clock, it is asynchronous. If the setup and hold times of the second clock are not met, then the wrong data value can be flopped. To prevent this from occurring, you send the signal through two consecutive flops in the second clock domain. Provided you hold the original signal steady for more than one cycle of clock two, the second flop in the second clock domain will capture the signal's correct value. The output of the second flip flop is now synchronous with respect to clock 2.

I am having trouble explaining this concept in words, but this website does a pretty good job of explaining the concept using sample circuits and waveforms:

http://www.edadesignline.com/howto/205201913

As you mentioned in the OP, it is obviously much easier to handle clocks that are multiples of each other, as these are considered to be the same clock domain and do not suffer from the metastability problems that arise when you try to cross from one clock domain to another.

Good post. That link is really good.

Originally posted by: Foxery
They can't wait around like a 486, and I can't imagine there are little high-speed buffers all over the motherboard. What modern mechanism am I missing here?

There are little buffers everywhere (at least wherever there's a clock-domain crossing, e.g. between the northbridge and DRAM, between the northbridge and each core, etc).

A couple notes:

1. While it's obviously preferable to have clocks that are in-phase with each other (and rational frequency multiples, e.g. 1/2 or 3/5), at some point it becomes impossible to distribute / generate the clocks while maintaining that guarantee.

2. Three flops isn't necessarily enough. The settling time of a flop goes up as the data arrives closer to the clock edge, and at some point becomes longer than a whole clock cycle. In this case, the first flop in the receiver domain will still be outputting garbage when the second flop reads it, and you get bad data. To avoid this, you can add more flops in the receiving domain, so that even if the second flop doesn't stabilize, the third one hopefully does. You can do some mathematical analysis to find the probability of synchronizer failure for a given synchronizer, and have to add more flops in the chain until the probability of failure is acceptably low. Of course, each one you add adds another cycle of latency, so you don't want to just use arbitrarily long chains.
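The failure-probability analysis in note 2 is usually done with the textbook MTBF model, MTBF = exp(t_settle / tau) / (T_w * f_clk * f_data). A rough Python sketch follows. The function name `synchronizer_mtbf`, the way settling time is counted per extra flop, and every number in the example are assumptions for illustration only. Real values of `tau` and `t_w` come from characterizing the flops in a given process.

```python
import math

def synchronizer_mtbf(stages, f_clk, f_data, tau, t_w, t_setup=0.0):
    """Estimated mean time between synchronizer failures, in seconds.

    stages  : number of flops in the receiving-domain chain (>= 2)
    f_clk   : receiving clock frequency, Hz
    f_data  : rate at which the incoming signal changes, Hz
    tau     : flop resolution time constant, s (process dependent)
    t_w     : width of the metastability window, s (process dependent)
    t_setup : part of each cycle lost to setup/propagation, s
    """
    if stages < 2:
        raise ValueError("need at least two flops in the receiving domain")
    # ASSUMPTION: each flop after the first adds one usable clock period
    # of settling time before the value is consumed.
    t_settle = (stages - 1) * (1.0 / f_clk - t_setup)
    return math.exp(t_settle / tau) / (t_w * f_clk * f_data)

# Made-up example numbers: 1 GHz receiver, input toggling at 100 MHz.
for n in (2, 3, 4):
    mtbf = synchronizer_mtbf(n, f_clk=1e9, f_data=1e8,
                             tau=50e-12, t_w=20e-12, t_setup=100e-12)
    print(f"{n} flops: MTBF ~ {mtbf:.3g} s")
```

In this model each extra flop multiplies the MTBF by exp((1/f_clk - t_setup) / tau), so the improvement is exponential while the latency cost is one cycle per flop. That is the trade-off behind picking the shortest chain whose failure rate is acceptable.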
 

Foxery

Thanks, Special K. The third page made my head spin, but I got the gist.


Originally posted by: CTho9305
There are little buffers everywhere (at least wherever there's a clock-domain crossing, e.g. between the northbridge and DRAM, between the northbridge and each core, etc).

Really? Hmm. How large are we talking about? Are these SRAM modules, (like a CPU L1 cache,) but tiny and at lower speeds?

In the case of enthusiast boards, where the manufacturer doesn't know ahead of time what settings the user will want, how do they decide on the specs/capacities of the flipflops and buffers?

Originally posted by: CTho9305
1. While it's obviously preferable to have clocks that are in-phase with each other (and rational frequency multiples, e.g. 1/2 or 3/5), at some point it becomes impossible to distribute / generate the clocks while maintaining that guarantee.

2. Three flops isn't necessarily enough. The settling time of a flop goes up as the data arrives closer to the clock edge, and at some point becomes longer than a whole clock cycle. In this case, the first flop in the receiver domain will still be outputting garbage when the second flop reads it, and you get bad data.

I'm guessing this means we won't likely see a system which lets you choose arbitrary ratios willy-nilly, since the designer needs to know in advance which ones his circuits are able to process correctly.

/lick CTho
 

CTho9305

This is beyond my knowledge - a lot of this is educated guessing.

Originally posted by: Foxery
Originally posted by: CTho9305
There are little buffers everywhere (at least wherever there's a clock-domain crossing, e.g. between the northbridge and DRAM, between the northbridge and each core, etc).

Really? Hmm. How large are we talking about? Are these SRAM modules, (like a CPU L1 cache,) but tiny and at lower speeds?

They're almost certainly built out of flip flops - they're integrated into the dies, so they're not separate chips. I would be surprised if buffers were large enough to warrant using an SRAM-style design rather than flops... but I'm guessing.

The synchronizers need to run at the highest speed of the source / consumer, so definitely not "at lower speeds". I'm pretty sure one of the presentations on Barcelona has a die photo with the synchronizers labeled if you're really interested.

Originally posted by: Foxery
In the case of enthusiast boards, where the manufacturer doesn't know ahead of time what settings the user will want, how do they decide on the specs/capacities of the flipflops and buffers?

Part of it is that buffers can often indicate when they're full, so the sender stops for a while. Part of it is that "enthusiast boards" aren't doing anything unexpected or secret. Pretty much everything you can tweak is documented somewhere (usually in the BIOS and Kernel Developer's Guide: Barcelona/Phenom, DDR1 CPUs, DDR2 K8's). I know overclockers sometimes like to think they're "pulling one over on the man", but "the man" is well aware of what's going on and could probably put a stop to it easily (e.g. make it require hardware modification).

I think enthusiast boards just make more of the settings available to the user, while other boards only provide safe options.

I would imagine buffers are generally sized for maximum performance with some future-proofing in mind (it's expensive to tape out a new revision), possibly controlled by fuses.

Originally posted by: CTho9305
1. While it's obviously preferable to have clocks that are in-phase with each other (and rational frequency multiples, e.g. 1/2 or 3/5), at some point it becomes impossible to distribute / generate the clocks while maintaining that guarantee.

2. Three flops isn't necessarily enough. The settling time of a flop goes up as the data arrives closer to the clock edge, and at some point becomes longer than a whole clock cycle. In this case, the first flop in the receiver domain will still be outputting garbage when the second flop reads it, and you get bad data.

Originally posted by: Foxery
I'm guessing this means we won't likely see a system which lets you choose arbitrary ratios willy-nilly, since the designer needs to know in advance which ones his circuits are able to process correctly.

"It depends." In a system with no full/empty feedback that relies on known guarantees, operating outside of the allowed ratios would cause problems. However, I would expect most synchronizers use some sort of protocol to ensure that buffers don't overflow... which would allow them to handle arbitrary ratios.
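A toy Python sketch of the full/empty feedback idea, to show why it makes the ratio stop mattering. Everything here is illustrative: `run_fifo` and its parameters are made-up names, time is counted in arbitrary integer ticks, and the model ignores the extra cycles a real design spends synchronizing the full/empty flags across the two domains.

```python
from collections import deque

def run_fifo(producer_period, consumer_period, depth, n_items,
             max_time=100_000):
    """Move n_items through a bounded FIFO between two clock domains.

    The producer acts every producer_period ticks and stalls when the
    FIFO is full. The consumer acts every consumer_period ticks and
    waits when the FIFO is empty. Returns (received_items, stall_count).
    """
    fifo = deque()
    sent = 0
    received = []
    stalls = 0
    for t in range(max_time):
        # Consumer clock edge: take one item if there is one.
        if t % consumer_period == 0 and fifo:
            received.append(fifo.popleft())
        # Producer clock edge: send one item unless the FIFO is full.
        if t % producer_period == 0 and sent < n_items:
            if len(fifo) < depth:
                fifo.append(sent)
                sent += 1
            else:
                stalls += 1  # "full" flag is up, so the sender waits
        if len(received) == n_items:
            break
    return received, stalls

# Producer faster than consumer (period 4 vs 5 ticks), then the reverse,
# then an arbitrary ratio.
for pp, cp in [(4, 5), (5, 4), (3, 7)]:
    items, stalls = run_fifo(pp, cp, depth=4, n_items=50)
    print(f"periods {pp}:{cp} -> in order: {items == list(range(50))}, "
          f"producer stalls: {stalls}")
```

Whatever the ratio, nothing is dropped or reordered. A faster producer simply spends some of its cycles stalled, so the ratio only affects throughput, not correctness.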
 

TuxDave

Lifer
Oct 8, 2002
10,571
3
71
Thanks for the links you guys. I'm actually working on the portion of the chip with multi-clock crossing domains and let me tell you it's freaking impossible to keep track.
 

degibson

Originally posted by: TuxDave
Thanks for the links you guys. I'm actually working on the portion of the chip with multi-clock crossing domains and let me tell you it's freaking impossible to keep track.

Hopefully they're at least in the same order of magnitude -- I once had a design that required a 10-stage resynchronizing pipeline to avoid metastability. Of course, that was bloody old CMOS, and was probably an artifact of our library.
 

TuxDave

Originally posted by: degibson
Originally posted by: TuxDave
Thanks for the links you guys. I'm actually working on the portion of the chip with multi-clock crossing domains and let me tell you it's freaking impossible to keep track.

Hopefully they're at least in the same order of magnitude -- I once had a design that required a 10-stage resynchronizing pipeline to avoid metastability. Of course, that was bloody old CMOS, and was probably an artifact of our library.

If there's anything good to say about it, it's that they're at least multiples of each other. Since I'm not allowed to give out clock frequencies, imagine a clock frequency F and blocks running at 2xF, 4xF, and 10xF. And then there are various flavors within each frequency that are just phase shifts of each other. Yeah... it sucks.