Are AMD cpus true octo-cores?

Exophase · May 17, 2013

galego said:
Bulldozer cores are based on the CIC (Clustered Integer Core) architecture developed by DEC in 1996.

I know you got this from Wikipedia but I find this claim questionable. Alpha 21264 has its two ALU/AGU pairs (that aren't even totally symmetric) physically partitioned to separate register files, but it still has the same scheduler in front of it and the same load/store queue, DTLBs, L1 dcache, etc ahead of it. DEC's scheme was in place purely to reduce the number of ports on the register files, and the only difference between it and cloning the reg file to double read ports - a bog standard technique - is that writes weren't automatically synchronized so it also doubled the write ports. The downside is that there was a cycle penalty for when the domains were crossed, but the domains could be crossed implicitly which means that the two clusters still worked on the same logical thread. Probably the only reason anyone made this claim is because both designs use the term cluster for their partitioning.

According to Andy Glew, who worked as a CPU architect for both Intel and AMD, the CMT concept was devised because he witnessed that SMT on Netburst was thrashing the small dcache. The idea was to replicate the dcache, but since the load/store units, AGUs, and even ALUs are on the critical path to the dcache they all needed to be replicated too, as well as part of the scheduler. AMD took this idea a little further, replicating the entire integer scheduler (and on Steamroller the decoders as well). The point is, this makes the root of CMT the split dcache, so if you don't have that you're probably not following the same idea.

Charles Kozierok · May 17, 2013

galego said:
An alternative notation used in some review sites is 4C/8T for Intel and 4M/8T for AMD.

I started with something like that, and given what others have pointed out about the integer cores not even being fully independent, I may move back to something like that. I was trying to avoid "module" as much as possible because it's a jargony and very overloaded term in the computer industry (even worse than "core").

Exophase: I don't think anyone is saying they are the same, just that the Bulldozer design was inspired by the Alpha. I don't know for sure if it's true (and yes, there's a lot of nonsense on Wikipedia) but at least superficially they seem similar enough that it's easy to understand why the comparison is made.

I'm confused by your last sentence, because BD cores do have their own dcache, do they not? You seem to know more about microarchitecture than I do so I'd like to understand what you mean here.

Asterox · May 17, 2013

galego said:
Bulldozer cores are based on the CIC (Clustered Integer Core) architecture developed by DEC in 1996.

An alternative notation used in some review sites is 4C/8T for Intel and 4M/8T for AMD.

The Bulldozer architecture is well understood, and the facts are clear: the FX-8350 is a 4-core (or 4-module) / 8-thread CPU.

In fact this has been known for a long time. See the article linked below on AMD's thirteen-year-old idea, which they called the Double Pumped Core.

So we have the AMD expression "Double Pumped Core"; note that it says Core, not Cores. When it all boils down, things are very simple: one Double Pumped Core is one Bulldozer module, or core. So the FX-8350 has 4 such cores or modules, which makes it an 8-thread CPU.

Remember why the whole Bulldozer CPU marketing team got fired. Long story short, marketing the FX as the first true 8-core CPU ("Get 8-cores in Your System") was the wrong approach, a marketing failure no doubt.

http://chip-architect.com/news/2000_09_27_double_pumped_core.html

http://youtu.be/OIhUh5068qc

Exophase · May 17, 2013

Charles Kozierok said:
Exophase: I don't think anyone is saying they are the same, just that the Bulldozer design was inspired by the Alpha. I don't know for sure if it's true (and yes, there's a lot of nonsense on Wikipedia) but at least superficially they seem similar enough that it's easy to understand why the comparison is made.

Frankly I don't see any real connection at all. I really don't think BD was inspired by this design in any way. I think someone saw similar terminology referring to different things and found a connection where there wasn't one.

The unreleased 21464 had SMT, but it too was more like other SMT uarchs that came out afterwards and didn't have anything in common with the things Bulldozer did differently.

Charles Kozierok said:
I'm confused by your last sentence, because BD cores do have their own dcache, do they not? You seem to know more about microarchitecture than I do so I'd like to understand what you mean here.

Yes, that's what I'm saying. Bulldozer having two separate dcaches for the two integer cores is the root of the design. Anything that doesn't have split dcaches, like Alpha 21264, is a totally different idea.

If you think about it, it's really the dual dcaches that give a PD module any potential advantage over an IB core w/HT (at the same clock anyway). They enable the whole module to execute four loads per cycle, while an IB core can only execute two. None of the other differences in resources are so great, and the gap will narrow further with Haswell, which has 4 ALUs.
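
To put rough numbers on that, here is a back-of-envelope sketch in Python. It only models peak load issue per cycle, not sustained throughput, and it assumes two load ports per dcache for both designs. The function name is made up for illustration.

```python
def peak_loads_per_cycle(dcaches: int, load_ports_per_dcache: int) -> int:
    """Peak loads issued per cycle if every dcache load port is busy."""
    return dcaches * load_ports_per_dcache

# Assumption: each L1 dcache can service two loads per cycle.
pd_module = peak_loads_per_cycle(dcaches=2, load_ports_per_dcache=2)  # two integer cores, one dcache each
ib_core = peak_loads_per_cycle(dcaches=1, load_ports_per_dcache=2)    # one dcache shared by both HT threads

print(f"PD module: {pd_module} loads/cycle, IB core: {ib_core} loads/cycle")
```

This is a ceiling, nothing more. Real code rarely keeps every load port busy, so the practical gap is smaller.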

mrmt · May 17, 2013

Asterox said:
Remember why the whole Bulldozer CPU marketing team got fired. Long story short, marketing the FX as the first true 8-core CPU ("Get 8-cores in Your System") was the wrong approach, a marketing failure no doubt.

I don't think this had much to do with the Bulldozer marketing campaign. There is a limit to what marketing can do, and selling a dog like Bulldozer is well beyond this limit. No marketing campaign could make up for all the benchmarks where Bulldozer sucked big time.

If I were to guess, I'd venture four reasons for the marketing team's demise:

- Marketing gathers data of what the customers want and overall industry trends, and it's clear that Bulldozer is completely disconnected from these. They went for "moar cores, moar heat, moar power" when the market wanted more performance per watt and better single threaded performance.

- AMD consistently missed their sales forecasts for a long time. When marketing cannot correctly forecast demand, it is as good as dead for the company.

- They missed a huge opportunity in GPUs. When Nvidia lost focus in the consumer market, they had a superior product, a cheaper to manufacture product, and yet they could not make any money with their GPU line. And the result is that Nvidia recovered with Kepler and AMD still can't make any money from their GPU business.

- Hype culture. Names like David Baumann, John Fruehe, Mike Houston, and Randy Allen come to mind, from episodes where they outright lied about AMD products. Everybody on this list is out of the company, but when you have this many senior guys spreading lies about the product, the problem goes beyond any one of these individuals.

Those guys must have told some customers the same things they told the press, maybe even more, and this must have tarnished AMD's reputation with OEMs and big customers (like Cray). Once that happens, marketing is toast.

The new AMD marketing team doesn't talk too much, but they at least don't lie with a straight face like Fruehe and Allen did, and this new behavior isn't exclusive to the marketing team. It's better to have Lisa Su claiming that they have nothing to throw against Intel 14nm than Seifert promising that they will have 35% more performance with Bulldozer.

Charles Kozierok · May 17, 2013

Exophase said:
Frankly I don't see any real connection at all. I really don't think BD was inspired by this design in any way. I think someone saw similar terminology referring to different things and found a connection where there wasn't one.

You may be right that they are not related and that this is one of those things that has taken on a life of its own. I think it's possible that people look at the Alpha block diagram and they notice what looks like two integer units and one FP unit, and see that similarity in BD. AMD has focused a lot of attention on the 2+1 concept.

Exophase said:
Yes, that's what I'm saying. Bulldozer having two separate dcaches for the two integer cores is the root of the design. Anything that doesn't have split dcaches, like Alpha 21264, is a totally different idea.

If you think about it, it's really the dual dcaches that give a PD module any potential advantage over an IB core w/HT (at the same clock anyway). They enable the whole module to execute four loads per cycle, while an IB core can only execute two. None of the other differences in resources are so great, and the gap will narrow further with Haswell, which has 4 ALUs.

I misunderstood what you were saying about the D$. If I'm not mistaken, your final paragraph answers a big part of my query as to what the difference is between the BD microarchitecture and just having twice as many EUs.

galego · May 17, 2013

Exophase said:
I know you got this from Wikipedia but I find this claim questionable.

Alpha 21264 Microprocessor
- Designed in 1996
- Bulldozer borrows architectural design

http://meseec.ce.rit.edu/551-projects/winter2011/2-2.pdf

Recall that the 21264 architect later worked for AMD.

Charles Kozierok said:
I started with something like that, and given what others have pointed out about the integer cores not even being fully independent, I may move back to something like that. I was trying to avoid "module" as much as possible because it's a jargony and very overloaded term in the computer industry (even worse than "core").

I don't see any problem with the DEC/AMD definition of cores, but I find the module notation interesting because it hides the complexity arising from different definitions of core. Of course the problem arises again when someone asks what the relation is between 4M/8T and 4C/8T. As a rule of thumb:

4C < 4M < 8C

Exophase · May 17, 2013

galego said:
http://meseec.ce.rit.edu/551-projects/winter2011/2-2.pdf

Recall that the 21264 architect later worked for AMD.

You quoted from a class presentation by graduate students. I don't know where these two students got this information - it could have started from the same uncited Wikipedia claim - but it's wrong (also makes me think of this http://meseec.ce.rit.edu/551-projects/winter2011/2-2.pdf). They either didn't look into the actual 21264 design closely enough to realize the two have nothing in common, or they did and didn't understand it.

It doesn't matter if DEC engineers later worked at AMD. Some people seem to act like every AMD product was finished years ago at DEC.

Saying things like "DEC definition of cores" makes even less sense than saying their designs are related. 21264 had no multithreading whatsoever. You might as well say that a Pentium 1 has two cores.

Please try to give stronger arguments for your claims than "because X said so." X isn't always right, or the statement doesn't always mean what you think it does in context. You'd be doing yourself a big favor if you tried to really understand the science and engineering behind the claims so you can evaluate for yourself whether or not they make sense, instead of just following authority.

Zucker2k · May 17, 2013

Look up in the sky!
It's a bird...
It's a plane.....
No, it's a Bulldozer!

So here's a serious question: Does a Bulldozer core constitute .5, .6, .7, .8, or .9 of a core, or is it a full core? This seems a ridiculous question, and so would any attempt to answer it conclusively, because there's no baseline. What I'm inferring from most of what I'm reading is an attempt to answer the question from an efficiency angle, i.e. SMT implemented differently. IMHO, Bulldozer's shared components were mostly necessitated by an attempt to rein in a bloated design - rather unsuccessfully, I might add, looking at the huge die size and outrageous power consumption. Mind you, it was also intended to operate at a rather high frequency, close to 5 GHz if memory serves. I'm not an expert by any means, but something had to give.

Edit: This does imply efficiency, but it's efficiency at the silicon level, if even that, at the expense of execution resources.

galego · May 17, 2013

Exophase said:
It doesn't matter if DEC engineers later worked at AMD. Some people seem to act like every AMD product was finished years ago at DEC.

Saying things like "DEC definition of cores" makes even less sense than saying their designs are related. 21264 had no multithreading whatsoever. You might as well say that a Pentium 1 has two cores.

By the DEC/AMD definition of cores I mean a Clustered Integer Core.

http://hpseewiki.ipb.ac.rs/index.php/Processor_architectures#Buldozer_architecture

Nobody is saying that DEC designs and AMD designs are exactly the same, but one is/was inspired by the other.

Exophase · May 17, 2013

galego said:
By the DEC/AMD definition of cores I mean a Clustered Integer Core.

http://hpseewiki.ipb.ac.rs/index.php/Processor_architectures#Buldozer_architecture

Nobody is saying that DEC designs and AMD designs are exactly the same, but one is/was inspired by the other.

Just because both used the term cluster to refer to something in their design doesn't mean that DEC's CPU has anything resembling BD's integer core design even the tiniest bit.

Bulldozer wasn't inspired by Alpha's 21264 dual register files. Repeating this won't make it true. I don't know why you're so married to this idea.
