Hi! Can you help me understand CPUs?

TechGod123 · Oct 30, 2015

New here and I'm quite interested in how CPUs work!

So, how exactly does a CPU work?
I know it has the control unit and the ALU.
I also know it uses different types of registers: the storage register is used to store the result of an arithmetic operation or a logical calculation (correct me if I'm wrong), and a general register does whatever the control unit tells it to. But there's obviously a lot more I don't know.

What is a floating point unit? What exactly is it? I know that AMD's Bulldozer shared one FPU between two cores, which proved to be a pretty big bottleneck. Why?

If the core "modules" posed such an issue to AMD for performance, why couldn't they use a software implementation to "merge" two physical cores into one like the rumour was back in 2006 for AMD and reverse hyperthreading? Is that possible? If not, why not?

Does every core get a control unit? Or is it more like there's one control unit for one CPU and then each core has its own ALU? If each core gets a control unit, wouldn't that introduce overhead?

What exactly is an instruction set? I see talk of Intel implementing new instruction sets in their CPUs. How does that impact performance? Do new instruction sets mean that the CPUs process the raw data they've been given in a better and more efficient manner?

What is branch prediction? Why is a deep pipeline (correct me if this is the wrong terminology) a bad thing if branch prediction algorithms aren't good?

Sorry for all these questions (I have more, lol) but YouTube was no help and I think I want to be a CPU architect one day.

Roland00Address · Oct 30, 2015

Everybody else is going to go into more detail about things like cache latency, speed, branch prediction, etc. and how they actually work. I am going to stick to why it is important.

But understand that your CPU often has "bottlenecks" where the CPU completes a task but then needs to retrieve data from the cache, or the RAM, or the hard drive, and thus it just sits there waiting. A waiting CPU means you do not get the full potential of your silicon, because the thinking/math parts are waiting for the sorting and organizing parts to get the data ready.

Now there are various technologies that allow the CPU to do something else during this time, like Hyper-Threading. Others cut down the waiting itself: improving the cache speed, and branch prediction (and thus getting the data in the right place before it is needed; think of a car assembly line with different stages of the car, where if there is a bottleneck it takes time to catch up).

AMD's problem is not the modules; feeding the beast is the problem. AMD figured: why don't we do some tricks that slow it down even more and make the organization and sorting even slower, because while we take a small penalty we save silicon and make it easier to fit 2 cores in less space. (If their foundries were not 2 generations behind, they would not have to do this.) They figured they could make up for this with higher clock speeds and a pipeline that could shuffle data around in such a way that, while it still takes forever to retrieve the data, you would not have to do it that often.

In other words, AMD's designs were always meant to get more cores in less die space, for three reasons:

1) Marketing

2) To be less behind on cores per silicon due to the foundry differences

3) Trying to save money on silicon.

In other words, AMD completely downplayed all the problems that were kind of obvious, because they were confident that their engineers and their foundries would find a way to minimize the downsides and maximize the upsides of the tradeoffs that the higher-ups decided were important.

TechGod123 · Oct 30, 2015

So CMT could have been a viable alternative to SMT if AMD had provided enough resources to the cores? For example, doubling the cache, making the pipeline significantly bigger to allow more data to go through, and the like?

TechGod123 · Oct 30, 2015

Roland00Address said:
Everybody else is going to go into more detail about things like cache latency, speed, branch prediction, etc. and how they actually work. I am going to stick to why it is important.

If anyone actually responds in the first place

Ken g6 · Oct 31, 2015

TechGod123 said:
If anyone actually responds in the first place

Patience, grasshopper! It's late on a Friday night here in the USA.

TechGod123 said:
Sorry for all these questions (I have more, lol) but YouTube was no help

Did you see this video? I have a couple of articles too.

TechGod123 said:
What is a floating point unit? What exactly is it? I know that AMD's Bulldozer shared one FPU between two cores, which proved to be a pretty big bottleneck. Why?

Here's the basic way a floating point number is stored in a computer:

[Image: IEEE 754 half-precision floating-point format, showing 1 sign bit, 5 exponent bits and 10 fraction bits]

As you can see, it's very different from a normal binary number. It takes special circuits to perform arithmetic with it.
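
To make that layout a bit more concrete, here is a minimal Python sketch that pulls a 16-bit half-precision pattern apart into its sign, exponent and fraction fields and rebuilds the value by hand. The function names `decode_half` and `struct_view` are made up for illustration; the field widths (1, 5 and 10 bits, exponent bias 15) are those of the IEEE 754 half-precision format, and Python's `struct` format code `"e"` (Python 3.6 and later) is used only as a cross-check.

```python
import struct


def decode_half(bits: int) -> float:
    """Decode a 16-bit IEEE 754 half-precision bit pattern by hand."""
    sign = (bits >> 15) & 0x1        # 1 sign bit
    exponent = (bits >> 10) & 0x1F   # 5 exponent bits, stored with a bias of 15
    fraction = bits & 0x3FF          # 10 fraction bits

    if exponent == 0x1F:             # all ones: infinity or NaN
        value = float("inf") if fraction == 0 else float("nan")
    elif exponent == 0:              # subnormal: no implicit leading 1
        value = (fraction / 1024) * 2.0 ** -14
    else:                            # normal number: implicit leading 1
        value = (1 + fraction / 1024) * 2.0 ** (exponent - 15)
    return -value if sign else value


def struct_view(bits: int) -> float:
    """Let the struct module interpret the same 16 bits (cross-check only)."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


print(decode_half(0x3C00))  # 1.0
print(decode_half(0xC000))  # -2.0
```

Even this simple decode needs shifting, masking, a bias and three special cases, which hints at why doing arithmetic directly on this format takes dedicated circuitry rather than a plain integer adder.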

Floating point really isn't used very much. I don't think it was a really bad decision by AMD to consolidate their FPUs. What they did is put two 128-bit (SSE-sized) FPUs together, which could theoretically be used separately by two cores; or they could be combined for AVX work on one core at a time. I think it's just certain practical aspects of how they implemented the FPU design and the overall CPU design, relating to how many instructions could run at once and how fast, that make AMD so inferior in the FPU department. That, and certain benchmarks (Prime95; also see my sig) are more FPU dependent than most real workloads.

TechGod123 · Oct 31, 2015

Oh and some of this response went completely over my head so I'll do some research on the confusing things.

Ken g6 · Oct 31, 2015

TechGod123 said:
Oh and some of this response went completely over my head so I'll do some research on the confusing things.

That's partly because it was late and I forgot to link the words "floating point".

K7SN · Oct 31, 2015

TechGod123 said:
New here and I'm quite interested in how CPUs work!
...
Sorry for all these questions (I have more, lol) but YouTube was no help and I think I want to be a CPU architect one day.

Everything comes down to improving the performance of a CPU. A CPU designer with today's technology still has one limiting factor: the speed of light. If you're talking about CPUs like the ones AMD and Intel currently sell, there is a long way to go; that limit is perhaps reachable in your lifetime, not mine. Below is a stale (2009) appraisal of the end of advancement.

popsci.com (http://www.popsci.com/gadgets/article/2009-10/scientists-say-moores-law-will-diein-75-years) said:

Silicon wafers. Quantum computing. Light-based processors. Any way you slice it, scientists say that processor speeds will absolutely max out at a certain point, regardless of how hardware or software are implemented.

Lev Levitin and Tommaso Toffoli, two researchers at Boston University, devised an equation which sets a fundamental limit for quantum computing speeds. According to their studies, a perfect quantum computer can generate 10 quadrillion more operations per second than fastest current processors. They estimate that the maximum speed will be reached in 75 years.

Others, including MIT professor Scott Aaronson, think that even with the emergence of quantum computing, Moore's Law will die even sooner, in 20 years. Gordon Moore (along with others) predicted that the axiom would die anywhere between 4 and 15 years from now with regard to silicon chips.

But Levitin says that a variety of factors, such as technological barriers, will slow the process, leading them to believe that processors still have 75 years of evolution left.

Above is the goal; you start with the physics below:

http://electronics.stackexchange.com/questions/122050/what-limits-cpu-speed

TechGod123 · Oct 31, 2015

Ken g6 said:
That's partly because it was late and I forgot to link the words "floating point".

Brb. I think my head just exploded...

TechGod123 · Oct 31, 2015

K7SN said:
Everything comes down to improving the performance of a CPU. A CPU designer with today's technology still has one limiting factor: the speed of light. If you're talking about CPUs like the ones AMD and Intel currently sell, there is a long way to go; that limit is perhaps reachable in your lifetime, not mine. Below is a stale (2009) appraisal of the end of advancement.

Above is the goal; you start with the physics below:

http://electronics.stackexchange.com/questions/122050/what-limits-cpu-speed

Alright. Thanks. I'll give that a go.

know of fence · Nov 2, 2015

Just want to share a great video laying out and testing circuit logic using dominoes.

https://www.youtube.com/watch?v=lNuPy-r1GuQ

adamantine.me · Nov 2, 2015

Well, I audited a course that taught this stuff. Right after we went over some basic arithmetic examples, somebody asked how floating point was done, and the TA basically gave the same response: it is very different and confusing, so don't worry about it.

I'm really tired but the instruction set could impact CPU performance by being smaller. The Wikipedia page actually has a lot more than I imagined: https://en.wikipedia.org/wiki/Instruction_set

That answers some of your questions. As for me...

set bed
add me
reset

richaron · Nov 3, 2015

TechGod123 said:
New here and I'm quite interested in how CPUs work!

So, how exactly does a CPU work?
*SNIP*

Equivalent of joining an aviation group and asking "How does an aeroplane work?" Your questions are just like "What are the windows made out of? How does a Pilot's headset work? What is air-conditioning?"

Answer: There are experts who can spend their whole career working in only one area of your questions. And whilst each of your questions addresses an area which is (arguably) fundamental to aircraft(/CPU) design, you are completely ignorant of the fundamental forces involved.

If you were truly interested in the field I would recommend you start with a course (or book) on discrete mathematics. Get your head around the idea of data types and logical operators building flip-flops and shift registers. Then build upon this basic understanding.
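
As a taste of what that ground-up path looks like, here is a rough Python sketch (a toy simulation, not real hardware description code) of a NAND gate, an SR latch built from two cross-coupled NANDs (the simplest relative of the flip-flop), and a serial-in shift register. All the names (`nand`, `SRLatch`, `shift_in`) are hypothetical, and the small loop that re-evaluates the gates is only a stand-in for the feedback settling in a real circuit.

```python
from typing import List


def nand(a: int, b: int) -> int:
    """A NAND gate: output is 0 only when both inputs are 1."""
    return 0 if (a and b) else 1


class SRLatch:
    """One bit of memory from two cross-coupled NAND gates.

    Inputs are active-low: set_n=0 stores a 1, reset_n=0 stores a 0,
    and set_n=reset_n=1 holds whatever was stored before.
    """

    def __init__(self) -> None:
        self.q, self.q_n = 0, 1

    def step(self, set_n: int, reset_n: int) -> int:
        # Re-evaluate both gates a few times so the feedback loop settles.
        for _ in range(3):
            self.q = nand(set_n, self.q_n)
            self.q_n = nand(reset_n, self.q)
        return self.q


def shift_in(register: List[int], bit: int) -> List[int]:
    """One clock tick of a serial-in shift register: every stored bit
    moves one place along and the new bit enters at the front."""
    return [bit] + register[:-1]


latch = SRLatch()
print(latch.step(0, 1))  # set    -> 1
print(latch.step(1, 1))  # hold   -> 1
print(latch.step(1, 0))  # reset  -> 0

reg = [0, 0, 0, 0]
for incoming in (1, 0, 1, 1):
    reg = shift_in(reg, incoming)
print(reg)  # [1, 1, 0, 1]
```

The point is only that memory falls out of plain logic gates wired in a loop; a real register adds a clock so that all the bits change at the same moment.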

This is a ground-up approach, something which very few in this forum have followed*. If you're not interested in understanding the basics, that doesn't mean you have to give up on your love of computers/technology. And it doesn't mean you'll never know anything about hardware. Just don't bother asking "how does a CPU work?"...

*Not saying most members here are clueless.

Edit: The problem with the aircraft example is that we all already have some concept of air, and movement, and metals, and pressure etc. For the most part computer design requires you to learn a whole new abstract world.

sm625 · Nov 3, 2015

TechGod123 said:
What is a floating point unit? What exactly is it? I know that AMD's Bulldozer shared one FPU between two cores, which proved to be a pretty big bottleneck. Why?

It really wasn't. The cache latency is what killed Bulldozer. It would have been a fine design if they could have matched Intel's cache latency at every level. One of the reasons Athlon 64 was faster than P4 was Athlon 64 having a 3-cycle L1 cache versus P4's 4-cycle, and A64's 17-cycle L2 versus P4's 23-cycle L2. AMD can't just match Intel's performance; it has to exceed it. Given the massive outperformance I just mentioned, overall performance was still closer than you'd expect, simply due to Intel's many optimizations that a company like AMD cannot afford to match. That's why their core pipeline has to be significantly faster. Right now it's not even close.
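
A rough way to see why those cycle counts matter is to work out an average load latency. The Python sketch below plugs the L1 and L2 figures quoted above into a simple weighted average. The hit rates (95% in L1, 90% of the remainder in L2) and the 200-cycle memory latency are assumptions picked purely for illustration and applied equally to both chips; they are not measurements of either CPU, and the function name `average_load_latency` is made up.

```python
def average_load_latency(l1_cycles, l2_cycles, mem_cycles=200.0,
                         l1_hit_rate=0.95, l2_hit_rate=0.90):
    """Expected cycles per load in a toy two-level cache model.

    Each latency is treated as the total cost of a load served at that
    level. Hit rates and memory latency are illustrative assumptions.
    """
    # Cost of a load that missed L1: served by L2 or by main memory.
    l2_or_memory = l2_hit_rate * l2_cycles + (1 - l2_hit_rate) * mem_cycles
    return l1_hit_rate * l1_cycles + (1 - l1_hit_rate) * l2_or_memory


athlon64 = average_load_latency(l1_cycles=3, l2_cycles=17)
pentium4 = average_load_latency(l1_cycles=4, l2_cycles=23)

print(f"Athlon 64: {athlon64:.2f} cycles per load")
print(f"Pentium 4: {pentium4:.2f} cycles per load")
```

With these made-up rates the Athlon 64 figure comes out a bit over one cycle lower per load, and since loads happen constantly, small per-level differences add up quickly.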

TechGod123 · Nov 4, 2015

richaron said:
Equivalent of joining an aviation group and asking "How does an aeroplane work?" Your questions are just like "What are the windows made out of? How does a Pilot's headset work? What is air-conditioning?"

Answer: There are experts who can spend their whole career working in only one area of your questions. And whilst each of your questions addresses an area which is (arguably) fundamental to aircraft(/CPU) design, you are completely ignorant of the fundamental forces involved.

If you were truly interested in the field I would recommend you start with a course (or book) on discrete mathematics. Get your head around the idea of data types and logical operators building flip-flops and shift registers. Then build upon this basic understanding.

This is a ground-up approach, something which very few in this forum have followed*. If you're not interested in understanding the basics, that doesn't mean you have to give up on your love of computers/technology. And it doesn't mean you'll never know anything about hardware. Just don't bother asking "how does a CPU work?"...

*Not saying most members here are clueless.

Edit: The problem with the aircraft example is that we all already have some concept of air, and movement, and metals, and pressure etc. For the most part computer design requires you to learn a whole new abstract world.

You're quite right, I have no knowledge of the underlying maths or physics behind processor nodes, what floating point is, etc. I just thought I would be able to jump in and glean some basic information about CPU architecture but, as you've mentioned, it's pretty important to know the underlying "stuff" before I try to pursue this any further.

KWiklund · Nov 4, 2015

TechGod123 said:
New here and I'm quite interested in how CPUs work!

So, how exactly does a CPU work?
I know it has the control unit and the ALU.
I also know it uses different types of registers: the storage register is used to store the result of an arithmetic operation or a logical calculation (correct me if I'm wrong), and a general register does whatever the control unit tells it to. But there's obviously a lot more I don't know.

Well, an answer to some of your questions might be found on sites like Wikipedia, although they may be higher level than you really want. If you're looking to understand things at the most basic level, you should look at some of the various books on the subject.

Mano and Kime's book is one of the common introductory texts on the subject, although they really build from the ground up: basic digital logic through to registers and data paths.

http://www.amazon.ca/Logic-Computer-Design-Fundamentals-Edition/dp/013198926X

Hennessy and Patterson's books are also good, but are targeted towards upper-year students who have already had the benefit of the one I mentioned above.
http://www.amazon.ca/Computer-Archi...6652137&sr=1-1&keywords=computer+architecture
