CPU design questions

Arachnotronic · Dec 29, 2011

Hi,

So I'm a CS major and have only a rudimentary understanding of what goes into really basic circuits/CPUs, and this background is mostly from a programmer's perspective. I'm curious...what are the "hard" parts of CPU design? What kind of "cleverness" is required? Is it at the higher level architectural level? At the implementation? The fabrication? All of the above?

Some examples of the kinds of technical challenges the guys at Intel/NVIDIA/AMD/etc. have to face to get these chips out the door would be fantastic.

paperwastage · Dec 29, 2011

somewhat related (GPU), a good read (but not that technical)

http://www.anandtech.com/show/2937/1

TuxDave · Dec 29, 2011

In my opinion...

1) Fabrication is at the highest level of difficulty and requires the most "researcher" type mind
2) Architecture of the chip and system is the next most difficult and requires a good amount of creativity. The architects are the ones who come up with new programming models etc...
3) And at the bottom comes circuit. We're really good at making things faster with new plans and ideas but that's all it'll be. Just improving what already exists (widening datapaths, removing a cycle here and there)

Then again I may be biased since I specialize in digital circuits and so everything I do seems painfully obvious

bononos · Dec 30, 2011

What does "diffused in Germany" mean?

PottedMeat · Dec 30, 2011

bononos said:
What does "diffused in Germany" mean?

The actual die was made in Germany - I assume you're talking about some AMD processor.

http://www.anandtech.com/show/1821

then it may get shipped to china/malaysia/wherever for packaging

Modelworks · Dec 30, 2011

Since you are a programmer: one area that is really hard for hardware engineers is designing hardware that the programmer can easily use. I have seen so many great hardware designs with amazing features fail because, while the features are great, they are such a pain for the programmer to implement that they don't get used.

For example, I use a DSP chip. It is powerful, but programming it is like pulling teeth: it takes 9 lines of code just to set up the processor's clock speed. You have to initialize the internal clock, check and wait for that clock to stabilize, switch to the external clock, wait for that clock to stabilize, select the internal PLL that multiplies the clock, check and wait for that clock to stabilize, and only then do you get to start telling the processor what functions you want to enable in the hardware.
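The bring-up order described above can be sketched roughly as follows. This is only a simulation in Python: `FakeClockRegs`, the clock names, and the polling behaviour are all made up for illustration and do not correspond to any real DSP's registers.

```python
class FakeClockRegs:
    """Hypothetical stand-in for a DSP's clock-control registers."""

    def __init__(self, polls_until_stable=3):
        self.polls_until_stable = polls_until_stable
        self.remaining = {}   # clock name -> polls left before it reports stable
        self.active = None    # clock source currently driving the core
        self.log = []

    def switch_to(self, name):
        # Selecting a new source restarts its stabilization countdown.
        self.remaining[name] = self.polls_until_stable
        self.active = name
        self.log.append("switch_to " + name)

    def is_stable(self, name):
        # Each poll moves the simulated clock one step closer to stable.
        if self.remaining[name] > 0:
            self.remaining[name] -= 1
            return False
        return True


def wait_until_stable(regs, name, max_polls=1000):
    """Poll the status flag, giving up after max_polls attempts."""
    for _ in range(max_polls):
        if regs.is_stable(name):
            regs.log.append("stable " + name)
            return
    raise TimeoutError(name + " never stabilized")


def bring_up_clock(regs):
    """Internal clock, then external clock, then PLL, waiting after each."""
    for name in ("internal_osc", "external_osc", "pll"):
        regs.switch_to(name)
        wait_until_stable(regs, name)
    return regs.active
```

Calling `bring_up_clock(FakeClockRegs())` walks through all three sources in order. Even in this toy form, every step is a "do something, then poll and wait" pair, which is where the line count comes from.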

GammaLaser · Jan 1, 2012

Let's not forget verification/validation...and even the big co's can let flaws through the cracks (think Cougar Point bug).

TuxDave · Jan 1, 2012

GammaLaser said:
Let's not forget verification/validation...and even the big co's can let flaws through the cracks (think Cougar Point bug).

I find validation acts more like a cap on complexity. The stuff on a CPU is difficult, but it's only as difficult as validation will allow it.

mustard010 · Jan 1, 2012

Hi. Computer Engineering major here about to intern for Intel. I have to argue that architecture is the biggest thing in industry right now.

Let's start with single core processors. Before, big companies like AMD and Intel were trying to exploit performance gains from ILP (Instruction Level Parallelism). By aggressively pipelining instructions, one could theoretically speed up the process by n, where n is the number of pipe stages. Of course, that argument is grossly simplified.
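To put a number on the "speed up by n" claim, here is a minimal sketch of the textbook idealization. It assumes every stage takes exactly one cycle and that an unpipelined machine would spend `stages` cycles per instruction; the function name and the `stall_cycles` parameter are just for illustration.

```python
def pipeline_speedup(stages, instructions, stall_cycles=0):
    """Ideal speedup of an n-stage pipeline over an unpipelined design.

    Assumes one cycle per stage. stall_cycles lumps together every cycle
    lost to hazards, branch mispredictions, cache misses and so on.
    """
    unpipelined = instructions * stages
    # The first instruction needs `stages` cycles to fill the pipe,
    # then one instruction completes per cycle, plus any stalls.
    pipelined = stages + (instructions - 1) + stall_cycles
    return unpipelined / pipelined
```

With 5 stages and 1000 instructions this gives about 4.98, close to n. Add 996 stall cycles and it drops to exactly 2.5, which is the "grossly simplified" part: real code never keeps the pipe perfectly full.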

AMD/Intel then hit a barrier. They can no longer attain speedups in core clock speed with the current single core architectures (we can talk about this more if you like, but my understanding is that the design became too complex and too difficult).

As an alternative, the push now is in multicore/manycore architectures. Instead of Moore's law in doubling the number of transistors, it's now doubling the number of cores every 18 months. Manycore/multicore architecture seems promising, but programming for it is a little bit of a transition. CS students such as yourself have been taught to program sequentially, not in parallel.

An architecture push to multicore/manycore architecture requires a whole new programming model. A lot of research is being done right now to figure out how we can characterize computational idioms in programming. If you are interested, I suggest reading up on: http://view.eecs.berkeley.edu/wiki/Main_Page . A committee of research professors aggregated for seven months and discussed the future of computing.

The push is for manycore/multicore. GPUs (many-core) and CPUs (multi-core) are being fused to achieve a massive number of parallel processors. Notice that AMD has come up with their APU (Accelerated Processing Unit) line, which combines a CPU and GPU on the same die.

pm · Jan 2, 2012

So I'm a CS major and have only a rudimentary understanding of what goes into really basic circuits/CPUs, and this background is mostly from a programmer's perspective. I'm curious...what are the "hard" parts of CPU design? What kind of "cleverness" is required? Is it at the higher level architectural level? At the implementation? The fabrication? All of the above?

I'm not sure which I'd say is the "hardest part" but I would probably say that I mostly agree with TuxDave. From my perspective the hardest parts are in developing the fabrication recipe, and with I/O circuit design. But the parts that probably could use the most innovative thinking are in validation and security.

mustard010 said:
They can no longer attain speedups in core clock speed with the current single core architectures (we can talk about this more if you like, but my understanding is that the design became too complex and too difficult).

High clock rates are neither too complex nor too difficult - they are too performance/power inefficient. And that led to issues with cooling, but also with power delivery.

mustard010, which site are you going to be working at?

Aluvus · Jan 2, 2012

pm said:
High clock rates are neither too complex nor too difficult - they are too performance/power inefficient. And that led to issues with cooling, but also with power delivery.

Insofar as the pursuit of higher clock rates led to longer pipelines (and as you imply, more involved systems for shuffling power around), I suppose that is a kind of complexity. But yes, fundamentally power/heat is what killed the clock speed horse race.
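A rough way to see why power killed the clock race is the standard dynamic-power relation P ≈ a·C·V²·f. The sketch below is a back-of-the-envelope model, not anything specific to a real chip: it ignores leakage, and the "voltage rises in proportion to frequency" option is a common first-order assumption rather than a measured fact.

```python
def dynamic_power(capacitance_f, voltage_v, freq_hz, activity=1.0):
    """Switching power in watts: activity * C * V^2 * f (leakage ignored)."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


def relative_power(freq_ratio, voltage_tracks_frequency=True):
    """Power multiplier for a given clock multiplier.

    If the supply voltage must rise in proportion to frequency (an
    assumption), power grows with the cube of the clock ratio.
    """
    voltage_ratio = freq_ratio if voltage_tracks_frequency else 1.0
    return voltage_ratio ** 2 * freq_ratio
```

Under that assumption `relative_power(2.0)` is 8.0: doubling the clock costs roughly eight times the power for at best twice the performance.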

mustard010 · Jan 2, 2012

pm said:
High clock rates are neither too complex nor too difficult - they are too performance/power inefficient. And that led to issues with cooling, but also with power delivery.

mustard010, which site are you going to be working at?

Thank you for that insight.... I knew I was missing something so fundamental. I am going to be working in Folsom, CA as a Graphics Software Engineer... mostly validating pre-silicon chips.

wuliheron · Jan 3, 2012

mustard010 said:
Hi. Computer Engineering major here about to intern for Intel. I have to argue that architecture is the biggest thing in industry right now.

I have to agree. The variety of processors today, including Bulldozer, Sandy Bridge, and ARM, speaks to how well the fundamentals have been mastered and to the growing interest in producing new architectures and heterogeneous computing. Some have speculated that 4 or more different processors would be ideal for a heterogeneous architecture, but as far as I know there is no overarching theory to predict what would work best for any particular task.

Arachnotronic · Jan 22, 2012

Thank you, all. Some more questions:

As a CS/Pure Math major, I view problem solving from two kinds of perspectives. First, the math perspective is, "okay, here's a statement: prove that it's true" and from there a lot of creativity ensues in trying to figure out how to actually prove it.

From a CS (software) perspective, it seems as though there are two classes of problems: software design (i.e. how does everything 'fit' together, picking the right algorithms, modularity, etc.) and then there's the more low level algorithmic design (i.e. how do I actually *do* this and do it efficiently).

I just don't know what the..."unsolved" problems in CE/EE are. It seems to me on the fabrication front that all you need is tons of money to buy the right equipment to do things, and on the uarch front, people know what to do and simply need process tech to advance. But I can't imagine people getting paid an average of $130K/yr (at least @ Intel) just to "cook-book" existing ideas...?

Insight much appreciated.

TuxDave · Jan 22, 2012

Intel17 said:
Thank you, all. Some more questions:

As a CS/Pure Math major, I view problem solving from two kinds of perspectives. First, the math perspective is, "okay, here's a statement: prove that it's true" and from there a lot of creativity ensues in trying to figure out how to actually prove it.

From a CS (software) perspective, it seems as though there are two classes of problems: software design (i.e. how does everything 'fit' together, picking the right algorithms, modularity, etc.) and then there's the more low level algorithmic design (i.e. how do I actually *do* this and do it efficiently).

I just don't know what the..."unsolved" problems in CE/EE are. It seems to me on the fabrication front that all you need is tons of money to buy the right equipment to do things, and on the uarch front, people know what to do and simply need process tech to advance. But I can't imagine people getting paid an average of $130K/yr (at least @ Intel) just to "cook-book" existing ideas...?

Insight much appreciated.

I disagree on the fabrication side that it's all about buying the right equipment. If I asked you to make a material that will scratch diamond, what equipment will you go buy to make it happen? It takes quite a bit of research and creativity on materials etc... to figure out an answer to that.

As for uArch, I go back to your CS issue where it's creative to come up with algorithms and therefore the program that will use that algorithm. From what I see in uArch it's the same question. What's the best algorithm to use and then build the hardware to do it. CS and uArch are very linked together where if software finds a great algorithm, uArch will try to implement it in hardware. Same if uArch finds a great algorithm, CS will write the program to use it.

Arachnotronic · Jan 22, 2012

So, what's the difference between, say, "architecture" and "implementation". So the architects lay down the "big picture" and then it's up to the EE/CE folk to actually lay it out, design circuits, etc. What kinds of problem solving goes on there? Can one implementation of the same uArch be "better" than another via better circuit designs, implementation, etc.?

TuxDave · Jan 22, 2012

Intel17 said:
So, what's the difference between, say, "architecture" and "implementation". So the architects lay down the "big picture" and then it's up to the EE/CE folk to actually lay it out, design circuits, etc. What kinds of problem solving goes on there? Can one implementation of the same uArch be "better" than another via better circuit designs, implementation, etc.?

Yes, one implementation of the same uArch can be better than another with better circuit design/hardware support. There are two ways: one is obvious and the other less so.

Obvious way: FP operations typically take multiple cycles. Take for example, FP divides. Those take many cycles to complete but with a good circuit design (and a math major) you can improve performance without touching the uArch by simply reducing the number of cycles in a divide. It's sort of a way how circuit designers need to come up with "algorithms" like CS majors.
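As a rough illustration of "reducing the number of cycles in a divide": a digit-recurrence divider produces a fixed number of quotient bits per iteration, so a higher radix means fewer iterations. The sketch below assumes one iteration per cycle and a power-of-two radix; it is a simplified model, not a description of any particular CPU's divider.

```python
import math


def divide_iterations(quotient_bits, radix):
    """Iterations needed by a digit-recurrence divider (simplified model).

    A radix-r step retires log2(r) quotient bits, so the iteration count
    is ceil(quotient_bits / log2(r)). radix must be a power of two.
    """
    if radix < 2 or radix & (radix - 1):
        raise ValueError("radix must be a power of two >= 2")
    bits_per_iteration = radix.bit_length() - 1
    return math.ceil(quotient_bits / bits_per_iteration)
```

For a 53-bit quotient (a double-precision significand), radix 2 needs 53 iterations, radix 4 needs 27, and radix 16 needs 14. The hard circuit problem is making each higher-radix step still fit in a cycle.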

The less obvious way circuit design can help in the same uArch is due to the fact that most complex instructions are a series of microcode. Direct circuit solution may not have been implemented because putting in this special instruction may have caused all the other uses of that hardware to slow down and therefore hurt performance. The only example that comes to mind (that I can disclose) would be the implementation of STTNI hardware. How does one build the hardware to do string matching on such an arbitrary scale without becoming a beast and taking up a ton of hardware and area AND fast enough to make it worthwhile.

So from my interactions with uArch, they know what algorithms and instructions need to get done. They know how many steps/cycles are required to make it worthwhile, so they need to talk to the circuit designers to figure out how they are going to do this complex instruction in 3 cycles or through two 1-cycle uops, etc....

And then there are circuit problems that don't have to do with uArch ever. For example, how does one increase the width and entries of a register and maintain timing on the new process? Process scaling just makes everything terrible so a circuit designer needs to figure out if a new circuit topology will make it work. The setback is that if it doesn't work, the CPU will take a performance hit because you either have to allocate more cycles or reduce functionality.

Arachnotronic · Jan 22, 2012

So there's definitely a lot of creative problem solving going on with all aspects of CPU design/fabrication and it isn't simply a matter of making high level decisions and then having people do a straightforward implementation?

A5 · Jan 22, 2012

Intel17 said:
I just don't know what the..."unsolved" problems in CE/EE are.

Here's a quick list off the top of my head, there are many more:
Useful quantum computers
Cheaper and more efficient solar cells
Organic ICs
Seamless heterogeneous CPUs
Cheaper lasers for fiber optic networks
Rural broadband access
Reliable gigabit+ wireless (both short and long range)

There's more stuff in E-Mag, DSP, and even Analog that I don't even know about because I've never studied it. EE is an incredibly broad field.

A5 · Jan 22, 2012

Intel17 said:
So there's definitely a lot of creative problem solving going on with all aspects of CPU design/fabrication and it isn't simply a matter of making high level decisions and then having people do a straightforward implementation?

Yes. To put it in your terms, how much problem solving do you do to go from a flowchart in Visio to a working C++ program?

TuxDave · Jan 23, 2012

Intel17 said:
So there's definitely a lot of creative problem solving going on with all aspects of CPU design/fabrication and it isn't simply a matter of making high level decisions and then having people do a straightforward implementation?

Absolutely or else I wouldn't be doing it. It's as creative as you want to make it. There are engineers that don't want to think and want to just do everything with the "dumb" solution and they really suffer because they can't figure out what to do when it doesn't hit frequency or power. And if those guys are unchecked you will lose potential performance in the final product (and sometimes that happens)

pm · Jan 25, 2012

Intel17 said:
What are the kinds of "problems" that CPU architects/design engineers have to face? What kinds of problem solving cleverness is there in the transition from saying, "oh, the CPU will have these execution units and this cache" to actually designing it, and then from there actually building it?

What kinds of things make one CPU design team produce a better CPU than another (say, Bulldozer v.s. Sandy Bridge)? Is it at the high architectural level, or is the circuit design & such just better? Or a combination of both?

The problems are a broad swath of issues. In the logic design stage, you develop the CPU using code that looks suspiciously like the programming languages of either C or Pascal, and during this phase, coders can create bugs and these bugs can be hard to find and hard to fix. For example, a portion of the design might have a counter and then a series of “if” statements (or a “case”), and the coder might forget to include a stage in the “if” flow for a scenario that they can’t imagine happening, but can. In the circuit design stage, you are getting input signals from other engineers’ blocks and those signals might be under-driven and thus produce a really slow edge-rate (it takes too long to get from a 0->1 or vice versa) so then you have to get those fixed, or you might find that you have a reliability problem in a large wire and you need to change the metal width, but this causes other things to move, or you might find that someone goofed and the schematics don't do the same thing as the high-level RTL language. Or, and this might seem unbelievably odd, but you might spend a lot of time trying to get the “tools” (the CPU CAD software) to understand whatever clever thing you have happened to put together and thus waste a whole lot of time trying to get some program to agree with you that your circuit will work.
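The forgotten "if" stage is easy to picture in software terms. The sketch below is plain Python rather than RTL, and the counter values and action names are invented for illustration. In a hardware description language the same omission tends to be worse, since the output may simply keep a stale value instead of failing visibly.

```python
def next_action_buggy(counter):
    """Decode a 2-bit counter, but forget the 'impossible' value 3."""
    if counter == 0:
        return "load"
    elif counter == 1:
        return "compute"
    elif counter == 2:
        return "store"
    # counter == 3 was assumed never to happen, so nothing handles it
    # and the function silently returns None.


def next_action_checked(counter):
    """Same decode, but every unlisted value is caught explicitly."""
    if counter == 0:
        return "load"
    elif counter == 1:
        return "compute"
    elif counter == 2:
        return "store"
    else:
        raise ValueError(f"unhandled counter value {counter}")
```

`next_action_buggy(3)` quietly returns `None`, while `next_action_checked(3)` raises, which is the kind of explicit default branch that makes such bugs show up in validation instead of in silicon.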

There’s a huge amount of cleverness that goes into everything. Some of it is minor cleverness and some is major. I remember working on a CPU codenamed McKinley and someone on my team came up with the idea of a “prevalidated cache” and that this would greatly speed up cache accesses, and instead of a two or three cycle L1 cache, we had a single cycle cache. So that’s a cache with ½ to 1/3 the latency of a traditional design. There were some negatives to the idea too, but overall it was a huge performance improvement. But there’s an idea that took the traditional design and changed it fairly dramatically to result in a much faster design. But beyond the big ideas, there are a lot of little ideas in the way things are assembled. I remember once I was working on a circuit that involved two 64-bit rotators – which allow you to shift a 64-bit value right or left (numbers wrap around, so if you rotated “0111” by two bits right, you’d get “1101”), and every byte could rotate to every other byte, and it supported switching “endianness” (the ordering of the bytes) on the fly. It was a sub-component to a register file. So, when you think about how to do this, it’s more than a bit confusing. You draw this diagram and any one bit can move to any one of 16 places. So, two of us were thinking about it and we realized that we have lots of time to do this, so rather than build two of these monstrosities for the two 64-bit values, we could build one and then use the circuit on the first clock phase to do one 64-bit register, and then swap and use the second clock phase to do the other 64-bit register. Thus saving a lot of space and a bit of power. I’m not sure where this falls on the cleverness spectrum, but I thought it was pretty cool. And this is for some circuit that normally no one would ever hear about that was a precalculation for a “store bypass” from a cache to a register file.
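For anyone who wants to check the “0111” example, here is a small software model of a rotate, plus a toy version of the "one rotator, two clock phases" trick. The function names are made up, and the sequential calls only mimic the idea of time-sharing one circuit; nothing here reflects the actual McKinley design.

```python
def rotate_right(value, amount, width=64):
    """Rotate a width-bit value right; bits falling off the end wrap around."""
    mask = (1 << width) - 1
    value &= mask
    amount %= width
    return ((value >> amount) | (value << (width - amount))) & mask


def rotate_two_registers(reg_a, reg_b, amount_a, amount_b):
    """Reuse one 'rotator' for two registers, one per clock phase."""
    shared_rotator = rotate_right                 # a single circuit in hardware
    phase_1 = shared_rotator(reg_a, amount_a)     # first clock phase
    phase_2 = shared_rotator(reg_b, amount_b)     # second clock phase
    return phase_1, phase_2
```

`rotate_right(0b0111, 2, width=4)` gives `0b1101`, matching the example in the post.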

So what makes one team better is a super hard question, and I’m not sure that I know the answer. There’s a lot to a successful design. It’s very hard to get a huge team of people to work together so that everyone is working together as a cohesive team. For example, I remember one project that had the marketing department constantly changing the requirements. On another I remember that practically the entire first and second levels of management were made up of engineers who had never managed anything before. I remember hearing about one that had a huge problem figuring out the timing of paths between units so that when the chip pieces were all fitted together, there were huge timing paths. I remember teams that struggled with large team defections (a large number of influential engineers left as a group) or layoffs in the middle of the project. I remember hearing of problems with fabs where the circuit performance specs were supposed to be pretty good, but then the fab had yield problems and to resolve them, they slowed the circuits and then a decent design was now much slower and less competitive in the marketplace. I remember one design that was spread across 5 time zones, and had about a third of the team working 12.5 hours off from another third of the team, so that if you asked a question, it took a day to get an answer. I remember one design that didn’t have good access to the electronic characteristics of the circuits (the process file) and a lot of the design was very aggressive and when it was manufactured it had huge electrical design problems.

Beyond what can go wrong, another big lever towards a great design is how many engineers your team has and how much you rely on automated design tools. A great example of this was the DEC Alpha.

The main contribution of Alpha to the microprocessor industry, and the main reason for its performance, was not so much the architecture but rather its implementation. At that time (as it is now), the microchip industry was dominated by automated design and layout tools. The chip designers at Digital continued pursuing sophisticated manual circuit design in order to deal with the overly complex VAX architecture. The Alpha chips showed that manual circuit design applied to a simpler, cleaner architecture allowed for much higher operating frequencies than those that were possible with the more automated design systems.

So I have seen a lot of what can go wrong. I’m not sure what exactly makes projects go well – but it’s my opinion (which a lot of engineers disagree with) that a large portion of it comes down to who your “lead engineers” are, and how good your management team is. I think good leadership is essential in a large project and without good leadership, you end up with a muddled mess that turns out to be late. If you have literally a hundred people trying to work together, you really need a core team of leaders who are experienced at this who can get everyone to work together as a cohesive whole. But beyond just good leadership, you need all of the core design team sub-groups to know what they are doing – a validation team that doesn’t find a lot of bugs until after the design has gone off to manufacturing can totally crater a project – and you need a good manufacturing group to make the chip, and then you need a good marketing team to sell it and work with other companies to use the design – the classic example to me of what happens without good marketing is the DEC Alpha.

Cogman · Jan 25, 2012

IDK, I have a hard time doing boring jobs, so I would imagine the hardest part would be hammering out timing problems.

pm · Jan 25, 2012

Cogman said:
IDK, I have a hard time doing boring jobs, so I would imagine the hardest part would be hammering out timing problems.

It's funny but I don't think hammering out timing problems is boring. Objectively taking a few steps back, my job for some people would probably be unbelievably boring. But I like it. I think timing is kind of fun - intra-block timing which I totally own is nice because you can see things making significant progress. I really like days when I come in, fix a ton of stuff and go home and I can see that "well, that's most of my problems fixed... for now". And then inter-block timing - timing between you and someone else - involves hanging out in someone's cube trying to figure out who will fix what and that's cool too.

The boring parts of the job for me are: meetings, particularly meetings that barely have anything to do with me; status reports ("last week I did X, Y, and Z and this week I'll do A, B and C"); documentation; and, sometimes, planning.

soccerballtux · Jan 28, 2012

Intel17 said:
Hi,

So I'm a CS major and have only a rudimentary understanding of what goes into really basic circuits/CPUs, and this background is mostly from a programmer's perspective. I'm curious...what are the "hard" parts of CPU design? What kind of "cleverness" is required? Is it at the higher level architectural level? At the implementation? The fabrication? All of the above?

Some examples of the kinds of technical challenges the guys at Intel/NVIDIA/AMD/etc. have to face to get these chips out the door would be fantastic.

Finding a way to synchronize all the oscillators on the die so that data can propagate without losing track of which cycle you're on is very tricky business.
You're clocking at 3-4 GHz, which gives you ~5 cm of signal travel per cycle, or 1.25 cm per 90 degrees of phase. That's around half the length of your CPU die, so if you didn't do any synchronizing and just brought them all up at the same time, your clock signal at any instant in time would be 90 degrees out of phase with the rest of the chip.
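The arithmetic behind those figures can be checked with a quick sketch. It assumes signals propagate at about half the speed of light, which is what makes the ~5 cm figure work out at 3 GHz; `velocity_factor=0.5` is that assumption, and real on-chip wires can be considerably slower.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def distance_per_cycle_cm(freq_hz, velocity_factor=0.5):
    """Distance a signal covers in one clock period, in centimetres.

    velocity_factor is the assumed fraction of the speed of light at
    which the signal propagates (0.5 is an illustrative guess).
    """
    metres = velocity_factor * SPEED_OF_LIGHT_M_PER_S / freq_hz
    return metres * 100


def distance_per_quarter_cycle_cm(freq_hz, velocity_factor=0.5):
    """Distance covered in 90 degrees of clock phase."""
    return distance_per_cycle_cm(freq_hz, velocity_factor) / 4
```

At 3 GHz this comes to roughly 5 cm per cycle and 1.25 cm per quarter cycle; at 4 GHz it shrinks to under 4 cm per cycle.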
