Domino vs. Static xtor logic

Idontcare

Elite Member
Oct 10, 1999
Slide 30 - "Low Power Chip Design"

http://intel.wingateweb.com/US.../SF08_TCHS001_100u.pdf

The graph on the left implies watt/performance doubled in the transition from static to domino...and the only reason for including this graph would be to imply that performance/watt will double in the forthcoming transition back to static from domino...

So someone educate me here, is Nehalem really going to demonstrate significantly lower power consumption numbers (performance normalized) when fully loaded relative to Yorkfield?

To a naive outsider it seems like returning to static CMOS is a MAJOR milestone for Intel and ought to result in some major power reductions (on the order of 25-30%) under full load.

If it doesn't then what was the point?

Any of our architecture guys want to help me out?

Edit: and is this the "going back and doing things different because we can" thing that Gelsinger was talking about Intel doing when they came to terms with the fact that their HK/MG 45nm tech gave them substantially improved PMOS xtors and beta ratios that were back to nearly one again? Is that what a beta ratio of one enables you to do, effectively implement your design in static CMOS? If it isn't, then what was Gelsinger talking about a year ago?
 

dmens

Platinum Member
Mar 18, 2005
heh. i don't think that graph is linearly scaled.

going static CMOS can actually increase power. for example, achieving equivalent speed with a static circuit of the same topology will result in much bigger gates than high-skew domino. the ROI depends on the design and scenario.
 

CTho9305

Elite Member
Jul 26, 2000
I would be wary of seeking education in marketing presentations. They're good for finding questions to ask (as you've done), but they can be misleading. I wonder what they mean by "datapath" on that slide.

It's complicated. When you look at that graph, you have to keep in mind that (as far as I can tell) each bar represents not just a different circuit style, but also a different architecture. I would imagine that P4 could have made static CMOS look pretty bad. Nehalem's IPC probably helps make static CMOS look good compared to the in-order designs that dominated for the first half of the 1990s.

One significant advantage of static CMOS logic is that it's a heck of a lot easier to port. If you're big on tick-tock, it's nice to be able to get the shrink out quickly. Static CMOS is extremely robust - you can change all sorts of process parameters, and the design will still work fine*. On the other hand, any sort of ratioed logic can stop working if the process changes, requiring significant redesign. If you imagine a 16-input dynamic NOR, you need to guarantee that when the PFET is on and all the NFETs are off but leaking, their leakage isn't big enough relative to the PFET to pull the output low. At the same time, you need to guarantee that when just one of the 16 NFETs is on, it's strong enough to overpower any keeper PFETs you may have. Obviously if your new process node leaks more or has more variability, you get to waste engineers on splitting your 16-input dynamic NORs into 8-input gates or worse. If you start with static logic, it will still work fine in the new process node. You can also use a much less experienced engineer to babysit static CMOS across a process shrink than you can use for dynamic logic.
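To make those two constraints concrete, here is a toy Python sketch of the 16-input dynamic NOR check. It assumes the PFET fighting the leakage is a keeper, treats each current as a single number that simply adds up, and uses invented values in arbitrary units; the names dynamic_nor_ok and max_fan_in, the 2x margin, and all the currents are hypothetical. Real sign-off would also have to cover noise, charge sharing and PVT corners.

```python
import math

def dynamic_nor_ok(fan_in, i_off_nfet, i_keeper, i_on_nfet, margin=2.0):
    """Toy check for a dynamic NOR with a PFET keeper (all currents in the same units).

    Holding: the summed off-state leakage of every NFET leg must stay below the
    keeper current by `margin`, or the dynamic node droops low.
    Writability: a single ON NFET must overpower the keeper by `margin`.
    """
    holds = fan_in * i_off_nfet * margin <= i_keeper
    writable = i_on_nfet >= margin * i_keeper
    return holds and writable

def max_fan_in(i_off_nfet, i_keeper, margin=2.0):
    """Largest fan-in whose aggregate leakage the keeper can still hold off."""
    return math.floor(i_keeper / (margin * i_off_nfet))

# Hypothetical numbers (arbitrary units), not data from any real process.
old = dict(i_off_nfet=0.5, i_keeper=20.0, i_on_nfet=100.0)
leaky = dict(old, i_off_nfet=1.0)  # a "new node" with 2x the NFET leakage

print(dynamic_nor_ok(16, **old))    # True
print(dynamic_nor_ok(16, **leaky))  # False
print(dynamic_nor_ok(8, **leaky))   # True
print(max_fan_in(leaky["i_off_nfet"], leaky["i_keeper"]))  # 10
```

Doubling the leakage breaks the 16-input gate but not an 8-input one, which is the "split your 16-input NORs" scenario. Upsizing the keeper is the other knob, but it eats into the writability margin.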

*if you have hold time failures, you can break it... but generally hold time is margined very conservatively.

To elaborate on dmens' post, consider a hypothetical 128-bit adder designed with dynamic logic. The logic may be fast enough that you can use longer-gate or higher-Vt transistors to build the circuit. If you choose to switch to static logic, you may have to use shorter gate lengths or lower Vts, increasing leakage; you may even have to change the architecture (for example, you might switch to a carry-select adder to shorten the carry propagation path, and triple the overall size of the adder). In today's world of fast-transistors-and-slow-wires this is a bad example, but if you're worrying about that you already know enough to not need an example ;)
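A bit-level Python sketch of the carry-select idea, assuming a 128-bit adder split into 16-bit ripple blocks. The block size, the function names and the unit-delay model are illustrative assumptions, not anything from a real design. Each upper block is computed twice, once per possible carry-in, so the real carry only has to pass through one mux per block instead of rippling through every bit.

```python
def ripple_add(a, b, cin, width):
    """Ripple-carry addition of two width-bit integers -> (sum, carry_out)."""
    s, c = 0, cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ c) << i          # full-adder sum bit
        c = (ai & bi) | (c & (ai ^ bi))  # full-adder carry out
    return s, c

def carry_select_add(a, b, width=128, block=16):
    """Carry-select adder (width must be a multiple of block): every upper block
    is computed for cin=0 and cin=1, and the incoming carry just picks one."""
    mask = (1 << block) - 1
    result, carry = 0, 0
    for k in range(0, width, block):
        ak, bk = (a >> k) & mask, (b >> k) & mask
        if k == 0:
            s, carry = ripple_add(ak, bk, 0, block)
        else:
            s0, c0 = ripple_add(ak, bk, 0, block)  # speculative copy, carry-in 0
            s1, c1 = ripple_add(ak, bk, 1, block)  # speculative copy, carry-in 1
            s, carry = (s1, c1) if carry else (s0, c0)  # the per-block mux
        result |= s << k
    return result, carry

def unit_delay(width=128, block=16):
    """Crude critical path in 'stage' units: ripple = one stage per bit;
    carry-select = first block's ripple plus one mux per remaining block."""
    return width, block + (width // block - 1)

print(carry_select_add(2**128 - 1, 1))  # (0, 1): wraps around with carry out
print(unit_delay())                     # (128, 23)
```

Under this crude model the carry path drops from 128 stages to 23, paid for by a second copy of every upper block plus the muxes; real adder delay depends heavily on wiring and gate sizing.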

An even simpler reason that static CMOS can be higher power is that a static CMOS gate could easily have 2X as much cap on its inputs as a dynamic gate for the same driving strength... and switching more cap increases power (you have a tradeoff of switching less cap more often, or more cap less often).
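The cap-versus-activity tradeoff can be written with the usual switching-power relation P = alpha * C * Vdd^2 * f. A minimal sketch, assuming (as above) that the static gate has 2x the input cap; the activity factors are made-up illustrations, and the model ignores the domino clock load, keeper contention and leakage:

```python
def switching_power(alpha, cap, vdd, freq):
    """Dynamic switching power P = alpha * C * Vdd^2 * f (consistent units)."""
    return alpha * cap * vdd ** 2 * freq

def static_breakeven_activity(alpha_domino, cap_domino, cap_static):
    """Activity factor below which the static gate switches less charge than the
    domino gate; Vdd and f cancel because both gates share them."""
    return alpha_domino * cap_domino / cap_static

# Illustrative, normalized numbers: a domino node with unit cap that discharges and
# precharges on half of all cycles, and a static gate with 2x the input cap.
ALPHA_DOMINO, CAP_DOMINO, CAP_STATIC = 0.5, 1.0, 2.0
VDD, FREQ = 1.0, 1.0

print(static_breakeven_activity(ALPHA_DOMINO, CAP_DOMINO, CAP_STATIC))  # 0.25
for alpha_static in (0.1, 0.4):
    p_static = switching_power(alpha_static, CAP_STATIC, VDD, FREQ)
    p_domino = switching_power(ALPHA_DOMINO, CAP_DOMINO, VDD, FREQ)
    print(f"static activity {alpha_static}: static {p_static:.2f} vs domino {p_domino:.2f}")
```

With these numbers the static gate wins only while it switches less than half as often as the domino node; high-activity logic tips the balance toward domino.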

When it comes to beta ratios, as you get closer to 1:1, static CMOS does become more appealing because the penalty you pay in both drive strength and load cap from the PFETs is reduced. I don't remember what Intel's 45nm devices look like though in terms of relative performance.
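One way to see why a beta ratio near 1 helps static CMOS: in a simple sizing model where each stack is widened to match the drive of a unit NMOS, the series PMOS stack of an n-input static NOR must be n * beta wide, and every input sees both devices, while a footless domino NOR input sees only its own NMOS leg. A sketch under those assumptions (widths stand in for capacitance; velocity saturation and stack effects are ignored; 1.27 is the 45nm figure quoted later in the thread):

```python
def static_input_cap(gate, n, beta):
    """Per-input capacitance, in units of a unit-width NMOS gate, for an n-input
    static gate whose pull-down and pull-up each match a unit NMOS's drive.
    Assumes cap scales with gate width and series devices scale up linearly."""
    if gate == "nand":  # n series NMOS of width n, parallel PMOS of width beta
        return n + beta
    if gate == "nor":   # parallel NMOS of width 1, n series PMOS of width n * beta
        return 1 + n * beta
    raise ValueError(f"unknown gate: {gate}")

# A footless domino NOR input drives only its own NMOS leg (width 1).
DOMINO_NOR_INPUT_CAP = 1.0

for beta in (2.0, 1.27, 1.0):
    nor4 = static_input_cap("nor", 4, beta)
    print(f"beta={beta}: static NOR4 input cap = {nor4:.2f}, "
          f"{nor4 / DOMINO_NOR_INPUT_CAP:.1f}x a domino NOR input")
```

Going from beta ~2 to ~1 cuts the static NOR4's per-input load from about 9x to about 5x that of the domino input in this model, which is the reduced PFET penalty in numbers.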

...and the only reason for including this graph would be to imply that performance/watt will double in the forthcoming transition back to static from domino...

Do you see any indication that we're going to escape from the power wall we just ran into? If not, increasing power a lot for a little more performance is unlikely to show up again in the mainstream segments...
 

dmens

Platinum Member
Mar 18, 2005
good points. with the continual march towards cell-based design, there is always more pressure to move as much as possible to static logic for schedule and design simplicity.

i personally think blind domino removal was a step backwards on power/performance on the nehalem products. high activity, high power logic will end up using less power overall with a domino implementation, especially when the clocking has a fine grain disable, which is the case with a lot of ALU logic. it's the "domino for domino's sake" that hurts. workplace politics sucks.
 

Idontcare

Elite Member
Oct 10, 1999
Very interesting guys, many thanks for the lessons here.

So this is very much a gray area...as with all publicized metrics of success, the trade-offs inherent in the implementation are relegated to the fine print or simply not mentioned at all.

But that raises the question - why do it at all then?

What did Intel gain by pushing Nehalem into static CMOS? I assumed from their presentation of the topic that it was going to enable them to reduce power consumption, or trade-off for higher clockspeeds with equivalent power consumption. But is there some other reason (is it easier to implement new architecture with static CMOS, so budgetary reasons in the design phase?)
 

CTho9305

Elite Member
Jul 26, 2000
It's great for shrinks, it's great as process variation goes up, it's easier to clock (if you can get rid of phase paths, duty cycle stops being an issue), you can use cheaper resources to design and maintain it*, it's not that much slower if you have to margin your dynamic circuits too much, if you're switching from dual-rail dynamic logic you just saved half of your routing tracks (which are more valuable as transistors shrink faster than wires), etc etc etc. Sometimes I'm disappointed that I got into the industry at a point where things were changing that made the more interesting-to-design circuit styles less optimal.

*As I mentioned above, you can't really break a static CMOS design. You plop down a bunch of cells, wire them up, and do a few checks to make sure the result is "good enough". With anything fancier, you have to do much more analysis to make sure it'll work across the PVT corners**, and it's harder to optimize (not only do you add a lot of constraints, but you also add a lot more dimensions of flexibility).

** Keep in mind that you technology guys not only give variation out of the box, but also make the transistors change as they age.

One more thing I just thought of: if you look at Intel's 45nm layouts, they're extremely regular. Fancy circuit styles tend to want more fancy / convoluted / irregular layout (partly because you end up with different numbers of NMOS and PMOS transistors).
 

dmens

Platinum Member
Mar 18, 2005
removing domino circuitry would also result in a shallower frequency spread. a highly transparent static CMOS design will be less susceptible to frequency tanking than a domino-heavy design, which as CTho mentioned is heavily restricted by phase paths and vulnerable to clock duty cycle issues. end result is that the junk bin with the slowest transistors will be faster with a transparent static CMOS design than with the domino design.
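A rough Python illustration of why phase paths make frequency sensitive to duty cycle: a domino path that must evaluate within one clock phase only gets duty * period, while a flop-to-flop static path gets the whole period. The 150 ps and 300 ps path delays are hypothetical, and setup time, skew and latch time borrowing are ignored.

```python
def fmax_ghz_phase_path(t_path_ps, duty):
    """A domino phase path must evaluate within one clock phase, so
    duty * period >= t_path, i.e. fmax = duty / t_path."""
    return duty / t_path_ps * 1000.0  # 1/ps -> GHz

def fmax_ghz_cycle_path(t_path_ps):
    """A flop-to-flop static path only needs the full period: fmax = 1 / t_path."""
    return 1000.0 / t_path_ps

for duty in (0.50, 0.45):
    print(f"duty {duty:.0%}: phase path {fmax_ghz_phase_path(150, duty):.2f} GHz, "
          f"cycle path {fmax_ghz_cycle_path(300):.2f} GHz")
```

A 5-point duty-cycle error (50% to 45%) costs the phase path 10% of its fmax, while the full-cycle path does not care.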
 

Idontcare

Elite Member
Oct 10, 1999
Originally posted by: CTho9305
It's great for shrinks, it's great as process variation goes up, it's easier to clock (if you can get rid of phase paths, duty cycle stops being an issue), you can use cheaper resources to design and maintain it*, it's not that much slower if you have to margin your dynamic circuits too much, if you're switching from dual-rail dynamic logic you just saved half of your routing tracks (which are more valuable as transistors shrink faster than wires)

I can certainly appreciate the gravity of those improvements. Particularly the improved margin against process variation, since we are all acutely aware of the limitations of getting a uniform electrical channel as the physical channel narrows and dopant limitations become significant.

This scaling issue is here to stay: your channel shrinks to 100 atoms or fewer, yet you can only place a dopant atom in roughly one out of every 1,000 or 10,000 lattice sites. And since implant is a truly stochastic process, there is nothing that can be done to improve this aspect of xtor variability from chip to chip.
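The dopant-counting point can be put in numbers with simple binomial statistics: if each lattice site independently holds a dopant with probability p, the count in a volume of N sites has mean N*p and a relative spread of about 1/sqrt(N*p). A sketch, assuming a hypothetical cubic region with a given number of atoms per edge (the cube geometry and sizes are illustrative, not a real device):

```python
import math

def dopant_stats(sites, occupancy):
    """Mean dopant count and its relative standard deviation (sigma / mean) for
    `sites` lattice sites, each doped independently with probability `occupancy`."""
    mean = sites * occupancy
    sigma = math.sqrt(sites * occupancy * (1.0 - occupancy))
    return mean, sigma / mean

# Hypothetical cube of silicon, `side` atoms on each edge.
for side in (100, 50):
    for occupancy in (1e-3, 1e-4):
        mean, rel = dopant_stats(side ** 3, occupancy)
        print(f"{side}^3 sites, 1 in {round(1 / occupancy)}: "
              f"~{mean:.1f} dopants, sigma ~{rel:.1%}")
```

Halving the edge length cuts the volume 8x, so the relative spread grows by about sqrt(8), roughly 2.8x.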

And of course this matters more as die size gets larger and larger. I can see the rationale here in Intel going from MCM'ed quads to monolithic quads (and 8-core Beckton) and wanting to avoid as much impact from process variability across 700-800 mm^2 as possible.
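One mechanism behind "variation matters more on bigger dies" is plain statistics: a chip runs only as fast as its slowest critical path, and the worst of many random paths drifts further out than the worst of a few. A Monte Carlo sketch, assuming independent, identical Gaussian path delays (real within-die variation is spatially correlated, so this probably overstates the effect; the path counts are arbitrary):

```python
import random
import statistics

def mean_worst_path(n_paths, trials=500, seed=0):
    """Average (over `trials` simulated dies) of the slowest of `n_paths`
    independent critical paths, each with delay = nominal + N(0, 1) sigma."""
    rng = random.Random(seed)
    worst = [max(rng.gauss(0.0, 1.0) for _ in range(n_paths)) for _ in range(trials)]
    return statistics.mean(worst)

for n in (10, 100, 1000):
    print(f"{n:5d} paths: slowest path ~{mean_worst_path(n):.2f} sigma above nominal")
```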

Originally posted by: CTho9305
** Keep in mind that you technology guys not only give variation out of the box, but also make the transistors change as they age.
You're welcome ;) Of course we'd probably make the xtors a little more robust if only we had the resources to do so, but the design guys are budget hogs who all want their 6-figure salaries to go along with your 9-3 day jobs (plus 2hrs for lunch of course, and coffee breaks every 15min) :p

Originally posted by: dmens
removing domino circuitry would also result in a more shallow frequency spread. a highly transparent static CMOS design will be less susceptible to frequency tanking than a domino heavy design which as CTho mentioned is heavily restricted by phase paths and vulnerable to clock duty cycle issues. end result is that the junk bin with the slowest transistors will be faster with a transparent static CMOS design than the domino design.

Well this certainly speaks to financial motivation then in the production environment sense. One way to boost margins is to do what you can to increase your NUBs, and you can increase your NUBs if you stop scrapping out unsaleable low-frequency bins.
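To put the binning argument in numbers, assume (purely for illustration) that die fmax is normally distributed around 3.0 GHz and anything below a hypothetical 2.4 GHz floor is unsaleable. Python's statistics.NormalDist gives the scrapped fraction directly, and tightening the spread from 0.3 to 0.2 GHz shows why a shallower distribution helps:

```python
from statistics import NormalDist

def scrap_fraction(mean_ghz, sigma_ghz, floor_ghz):
    """Fraction of dies whose fmax falls below the lowest saleable bin."""
    return NormalDist(mean_ghz, sigma_ghz).cdf(floor_ghz)

for sigma in (0.3, 0.2):
    print(f"sigma {sigma} GHz: {scrap_fraction(3.0, sigma, 2.4):.2%} scrapped")
```

Same mean, roughly 17x less scrap; whether that outweighs any loss in mean frequency depends on the real distributions.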

Follow-up question for the panel: would Phenom have stood a better chance of hitting 3GHz+ clocks had it been engineered with static CMOS to better insulate the clocks from the 65nm process variation? Or is static CMOS for Phenom not plausible because the beta-ratios will be more like 2 instead of 1?
 

CTho9305

Elite Member
Jul 26, 2000
Originally posted by: Idontcare
Follow-up question for the panel: would Phenom have stood a better chance of hitting 3GHz+ clocks had it been engineered with static CMOS to better insulate the clocks from the 65nm process variation? Or is static CMOS for Phenom not plausible because the beta-ratios will be more like 2 instead of 1?

That would depend on what limits its critical paths.
 

pm

Elite Member Mobile Devices
Jan 25, 2000
I agree - it depends on the limits on the fundamental critical paths of the design.

Nice thread.
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
not to be mean,

and you guys completely lost me after the second post.

But shouldn't this thread belong in Highly Technical?

I mean if I'm getting lost, this has to be highly technical. :X

 

TuxDave

Lifer
Oct 8, 2002
Originally posted by: aigomorla
not to be mean,

and you guys completely lost me after the second post.

But shouldn't this thread belong in Highly Technical?

I mean if I'm getting lost, this has to be highly technical. :X

If it was in HT, I would've been able to jump into this conversation earlier!! :(
 

VirtualLarry

No Lifer
Aug 25, 2001
Originally posted by: aigomorla
not to be mean,

and you guys completely lost me after the second post.

But shouldn't this thread belong in Highly Technical?

I mean if I'm getting lost, this has to be highly technical. :X

Don't feel bad, I still don't know what a transistor beta ratio is. Guess I should have paid attention in school.
 

Idontcare

Elite Member
Oct 10, 1999
Originally posted by: VirtualLarry
Don't feel bad, I still don't know what a transistor beta ratio is. Guess I should have paid attention in school.

For MOSFET CMOS it's the ratio of the PMOS to NMOS transistor widths (not channel length, which some folks mistake for the width in those fancy cross-sections that get so much publicity) necessary to result in identical Idrive (current output).

Alternatively it can be defined simply as the ratio of NMOS/PMOS drive currents when normalized per unit width (milliamps per micron, for example).

IBM defines it as "Beta refers to the ratio of electrical conductivity between p-FETs and n-FETs." (public source, Power-constrained high-frequency circuits for the IBM POWER6 microprocessor)

Since PMOS Idrive has historically been around 1/2 of that of NMOS Idrive (for same xtor width) it takes a PMOS xtor of 2x the width of an NMOS xtor to have the same Idrive (current). Thus, historically, the beta ratio is said to be around 2.

Intel's 45nm process with HK/MG has very strong PMOS at 1.07 mA/µm, the strongest in the world, and strong NMOS at 1.36 mA/µm, resulting in a beta ratio of 1.27. (public source, Intel's 45nm CMOS Technology)
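The arithmetic behind that 1.27 figure, as a small Python sketch using the drive currents quoted above (the function names are just for illustration):

```python
def beta_ratio(idrive_n, idrive_p):
    """Beta ratio = NMOS / PMOS drive current per unit width (e.g. mA/um).
    Numerically this is also the PMOS/NMOS width ratio needed for equal Idrive."""
    return idrive_n / idrive_p

def pmos_width_for_equal_drive(nmos_width, beta):
    """PMOS width that matches the drive of an NMOS of width `nmos_width`."""
    return nmos_width * beta

beta_45nm = beta_ratio(1.36, 1.07)  # Intel 45nm HK/MG figures, mA/um
print(f"45nm HK/MG beta ~ {beta_45nm:.2f}")  # 1.27
print(f"PMOS width for a 1.0 um NMOS: {pmos_width_for_equal_drive(1.0, beta_45nm):.2f} um")  # 1.27
print(f"historical beta ~2: {pmos_width_for_equal_drive(1.0, 2.0):.2f} um")  # 2.00
```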