Just a quick answer. I would have to get the info from Intel, but Intel's FMA3 has 3 operands yet can pull from 4 registers. If Intel went FMA4 and used 4 operands, then it could pull from 5 registers.
I find this more to the point
Personally I think a strided load would be a waste in the long term. Sooner or later true scatter/gather will be added (*) and the strided load becomes another superseded legacy instruction that you have to drag with you till the end of days.
If that's not a concern, fine, but please consider adding the gather instruction as soon as possible. An early implementation could work just like Larrabee's: using multiple wide loads till all the elements have been 'gathered'. It would definitely be faster than using individual insertps instructions, with a best-case latency equal to that of a movups (for sequential indexes, or indexes all in the same vector).
And it would be useful for a lot more than just matrix transposition. It opens the door to things that aren't even conceivable today. Truly, any loop with independent iterations could be (automatically) parallelized once we have scatter/gather instructions, no matter how the data is organized, or even in the presence of pointer chasing. So it's not just for HPC or multimedia (although those would benefit massively as well). If you think that's radical, please realise that the rules for writing high-performance software already changed dramatically when we went multi-core. So you might as well finish what you started and add scatter/gather support, or the CPU will keep losing ground to the GPU. You're nearing the point where people just buy the cheapest CPU available and instead invest in a more powerful GPU to do the 'real work'. The competition (both AMD and NVIDIA) is in a rather sweet spot to take the biggest piece of the pie in this scenario. So you'd better give people good reasons to keep buying the latest CPUs, by adding instructions that support algorithms which would otherwise run better outside the CPU. The only reason I care is because I believe it's better for the end user.
Until single-uop execution units are available, that is. Intel is working on scatter/gather with AVX. That's the more forward-looking approach.
Absolutely. It's really about adoption and compatibility:
Scenario 1: FMA instructions are added later when single uop execution units are available.
Let's say this happens in four years. At that point developers will be eager to use FMA, but they have to be careful to still support older processors. So they have the choice of writing two code paths, or just not using FMA till it's ubiquitous. Maintaining multiple code paths is a software engineer's daily nightmare (it's not just FMA, it's other ISA extensions and many other system parameters as well). So it's not uncommon to only start supporting new instructions years later. In fact I believe it has only recently become relatively safe to assume SSE2 support as a minimum (i.e. putting that on the box won't cost us a significant number of clients). That's a full 7 years after its introduction! So in this scenario FMA would suffer pretty slow adoption up to the year 2019...
Scenario 2: FMA instructions are added sooner and executed in two uops.
Developers can and will experiment with these instructions sooner. Compilers and other tools will support them years sooner too. Code size, extra precision, and the potential of seeing faster implementations in future processors (without requiring a code rewrite) are enough incentive for the early adopters. By the time single-uop FMA processors become available, they'll see a nice boost in performance. That's good for Intel too, since real-world applications can be used as benchmarks, which is a lot more convincing for consumers than numbers on paper and a much later return on investment. And just as importantly, those 2-uop FMA processors will still run applications that have one code path and demand FMA as a minimum. They won't run it faster than an application with two code paths (one using separate mul and add), but at least they'll run it. There's nothing more frustrating than not being able to run an application because the hardware doesn't support it (and guess who gets the blame).
So I think scenario 2 is a win for everybody (hardware guys, software guys and consumers). And I strongly believe it applies to much more than FMA. Of course you can't just blindly start adding instructions, but if you already decided you're going to invest transistors into a feature at some point, it really doesn't hurt to have a functional 'interface' much sooner. In fact, if it turns out that developers are not so interested in the feature after all, you have the option of postponing the full-fledged implementation a couple years till they're more interested, investing those transistors elsewhere in the meantime.
Lastly, in case anyone's worried about the marketing aspects: it's simply a case of not marketing to consumers until the faster execution units are added. Core 2's vastly increased SSE performance has been a grand success even though SSE has been around for a decade. It's easy to market when the numbers speak for themselves.
