Rumour: Bulldozer 50% Faster than Core i7 and Phenom II.

Page 12
Status
Not open for further replies.

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
WTF?? You can't compare a ROG mobo to a P8P67 Deluxe. A Crosshair Formula mobo for P67 will probably be more like $100 extra over the AMD equivalent; if anything, a mere $50 premium is on the low end of the likely price delta.

http://www.newegg.com/Product/Produc...82E16813131666 this one is probably a better comparison, and it's on a 2 year old platform that is currently on life support.

At almost the same price, both boards have nearly the same features, and each is a top-of-the-line motherboard in its socket segment (AM3 and 1155); 1366 is in a different category.
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
that's just because asus doesn't have any ROG mobos out yet for p67. once they do, they will be significantly more expensive. ROG is their premium brand; those boards cost significantly more than comparable consumer-class boards.

they don't have an exact asus comparison on an 890fx platform, but this is the closest I could find.

http://www.newegg.com/Product/Produc...82E16813131655

it has the same features, though the p8p67 does offer a few more SATA 6 Gb/s connections and "extra" on a few other items that the 890fx board also has. however, the p8p67 is also newer, so it simply carries the latest features of any board available right now. if they had both been released at the same time, they would probably be nearly feature-identical.
 
Last edited:

996GT2

Diamond Member
Jun 23, 2005
5,212
0
76
Yeah, but like most people I don't live near an MC. Probably a good thing, as I would waste money on unnecessary stuff.

You also have to consider that MC has deals on AMD boards and chips as well. If I'm near an MC in the next few weeks and they're running a deal, I'll be very intrigued by it. However, as somebody else pointed out, getting ALL the features on an intel board is not easy.

What specific features are you referring to? My UD3 is a pretty basic P67 board and it has all of the features I wanted. VRD 12 certified power circuitry with a heatsink over the VRMs, SATA 6 Gb/s, USB 3.0, RAID 0/1/5/10, 4 fan headers, dual BIOS in case one gets corrupted by failed overclocking, ALC892 HD audio. It even supports x8/x4 CrossfireX if you're into running a multi-GPU setup.

It runs my 2500K @ 4.4 without any issues (goes up to 5.0 with more voltage) and only cost about $90 as part of an MC combo (about $120 retail, but that's still quite low). What other features are you looking for? If you want more ports or x8/x8 Crossfire/SLI support, you can easily get them in a ~$150 board, or you can easily get an add-on PCIe SATA 6 Gb/s or USB 3.0 card very inexpensively (often free AR). I had looked into AMD 7xx/8xx boards prior to getting my SB setup, and a decent board for AMD also seemed to come in at around the $150 mark.
 
Last edited:

Stoneburner

Diamond Member
May 29, 2003
3,491
0
76
What specific features are you referring to? My UD3 is a pretty basic P67 board and it has all of the features I wanted. VRD 12 certified power circuitry with a heatsink over the VRMs, SATA 6 Gb/s, USB 3.0, RAID 0/1/5/10, 4 fan headers, dual BIOS in case one gets corrupted by failed overclocking, ALC892 HD audio. It even supports x8/x4 CrossfireX if you're into running a multi-GPU setup.

It runs my 2500K @ 4.4 without any issues (goes up to 5.0 with more voltage) and only cost about $90 as part of an MC combo (about $120 retail, but that's still quite low). What other features are you looking for? If you want more ports or x8/x8 Crossfire/SLI support, you can easily get them in a ~$150 board, or you can easily get an add-on PCIe SATA 6 Gb/s or USB 3.0 card very inexpensively (often free AR). I had looked into AMD 7xx/8xx boards prior to getting my SB setup, and a decent board for AMD also seemed to come in at around the $150 mark.

the GA-P67A-UD4 would be the board for me. That's $180 to $200.

Again, I'm going to wait for Bulldozer and assess then which route I'm taking. I'm not going to get Sandy Bridge right now, despite its excellent performance, unless I happen to be near the Bay Area or LA where I can hit up an MC.
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
ah, thank you. that one is $349, but it is a better board than a Formula, so it's probably not a fair apples-to-apples comparison any more than atenra's board was.
 

podspi

Golden Member
Jan 11, 2011
1,982
102
106
The OS doesn't know about modules, it only sees cores. But all cores are physical cores, so it won't matter.

This might be the case if you're running 8/16 full and independent threads, but there are lots of cases where I can imagine it would matter. The most obvious case is on laptops, where you'd almost always prefer to pack as many threads on a module as possible to power down the other modules.

Also, if you are running two related threads that could benefit from shared L2 cache, you are going to want to run those threads on the same module.

Ideally, I'd like at least the ability to differentiate between cores, as well as the ability to perhaps explicitly shutdown modules for mobile offerings (quad-core CPU becomes dual-core when the battery needs to last).

From what you're saying, it sounds to me like AMD is suggesting developers will not need to optimize for Bulldozer's unique design. While this may mostly be the case for high-throughput workloads, it seems shortsighted to me. From the high-level views of Bulldozer AMD has allowed us to see, if Core 0 is being utilized at 100% and I have another high-throughput thread, there is a difference between placing that thread on Core 1 versus Core 2. Even if performance will be acceptable in either case, why not let developers test their applications and optimize for your product?
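The laptop scenario above (pack threads onto as few modules as possible so idle modules can power down) can be sketched as a toy placement policy. This is purely illustrative Python, and the core numbering — cores 2m and 2m+1 forming module m — is an assumption for the example, not AMD's published mapping:

```python
def pick_core(busy, cores_per_module=2):
    """Pick a core for a new thread, preferring modules that already
    have work so fully idle modules can stay powered down.
    `busy` is a list of booleans, one entry per core."""
    modules = [busy[i:i + cores_per_module]
               for i in range(0, len(busy), cores_per_module)]
    # First pass: a free core on a module that is already partly busy.
    for m, cores in enumerate(modules):
        if any(cores) and not all(cores):
            return m * cores_per_module + cores.index(False)
    # Second pass: wake the first fully idle module.
    for m, cores in enumerate(modules):
        if not any(cores):
            return m * cores_per_module
    return None  # every core is busy

# Cores 0 and 2 busy: the packing policy picks core 1 (same module as 0),
# whereas a naive "next free core" policy would also accept core 3.
print(pick_core([True, False, True, False]))  # -> 1
```

The opposite preference (spread threads across modules so each gets the module's front end to itself) would just swap the order of the two passes; that is exactly the policy trade-off being debated here.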
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
he also stated the other day that he can't comment on their discussions with software companies. maybe I'm reading too much into that statement, but from the context I assume that means that AMD is actively working with software companies to maximize the potential of the new architecture.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
he also stated the other day that he can't comment on their discussions with software companies. maybe I'm reading too much into that statement, but from the context I assume that means that AMD is actively working with software companies to maximize the potential of the new architecture.

Yep, they do; not only for Bulldozer but also for Llano and Brazos.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
This might be the case if you're running 8/16 full and independent threads, but there are lots of cases where I can imagine it would matter. The most obvious case is on laptops, where you'd almost always prefer to pack as many threads on a module as possible to power down the other modules.

Also, if you are running two related threads that could benefit from shared L2 cache, you are going to want to run those threads on the same module.

Ideally, I'd like at least the ability to differentiate between cores, as well as the ability to perhaps explicitly shutdown modules for mobile offerings (quad-core CPU becomes dual-core when the battery needs to last).

From what you're saying, it sounds to me like AMD is not suggesting developers will need to optimize for Bulldozer's unique design. While this may mostly be the case for high-throughput workloads, this seems shortsighted to me. From the high-level views of Bulldozer AMD has allowed us to see, if Core 0 is being utilized at 100%, and I have another high-throughput thread, there is a difference between whether that thread is placed on Core 1 or Core 2, even if performance will be acceptable in either case, why not allow developers to test their applications and optimize for your product?

Truly concurrent (highly parallel) software is difficult to write using the currently popular languages/platforms, so the ROI won't be there for most client-side software packages (notable exceptions would be high-priced imaging and video software).

On the server side the ROI would be higher, but there is a complication: many server applications run in a VM environment. Very high-performance dedicated database servers, web servers, and HPC applications would be good targets for the concurrent programming paradigm.

There are some games that would benefit from better multithreaded optimization, but the dominance of console games and the need to share a code base with them precludes this, at least for the current generation of consoles (it can be done on the PS3, but not the same way as on a Windows PC).
 
Last edited:

JFAMD

Senior member
May 16, 2009
565
0
0
This might be the case if you're running 8/16 full and independent threads, but there are lots of cases where I can imagine it would matter. The most obvious case is on laptops, where you'd almost always prefer to pack as many threads on a module as possible to power down the other modules.

Also, if you are running two related threads that could benefit from shared L2 cache, you are going to want to run those threads on the same module.

Ideally, I'd like at least the ability to differentiate between cores, as well as the ability to perhaps explicitly shutdown modules for mobile offerings (quad-core CPU becomes dual-core when the battery needs to last).

From what you're saying, it sounds to me like AMD is not suggesting developers will need to optimize for Bulldozer's unique design. While this may mostly be the case for high-throughput workloads, this seems shortsighted to me. From the high-level views of Bulldozer AMD has allowed us to see, if Core 0 is being utilized at 100%, and I have another high-throughput thread, there is a difference between whether that thread is placed on Core 1 or Core 2, even if performance will be acceptable in either case, why not allow developers to test their applications and optimize for your product?

So, here's the deal: we are obviously working with the OS and app vendors. My comment was aimed at the people who are obsessing over "how do I override what you are doing, because I think I am smarter and I know how to do it better."

Every environment is different, but there are people who believe they will get some massive boost by threading across modules vs. just loading threads in order. The reality is that the OS is going to figure out the best place to put the next thread. When you start up an app, all of the cores may fire up, but once that happens, they all free up at different times. So people will never see that perfect world, and more importantly, the performance delta for most apps is not going to be radically different.

he also stated the other day that he can't comment on their discussions with software companies. maybe I'm reading too much into that statement, but from the context I assume that means that AMD is actively working with software companies to maximize the potential of the new architecture.

Yes, all the usual suspects.
 

podspi

Golden Member
Jan 11, 2011
1,982
102
106
My comment was pointed at the people who are obsessing about "how do I over-ride what you are doing because I think I am smarter and I know how to do it better."

():)

The reality is that the OS is going to figure out the best place to put the next thread.



This is all I really wanted to know. I realize that this is probably a touchy subject (from a PR point of view) because people are already accusing Bulldozer of not having "real cores", and to be clear that wasn't what I was trying to say. Many of the things I'm doing (data-analysis) will consume all the cores anyway, but particularly on the power side, I'm fascinated by the idea of power-gating entire modules when they aren't needed.
 

SickBeast

Lifer
Jul 21, 2000
14,377
19
81
It will be interesting to see how well Bulldozer can run Crysis 2. IMO that's going to be the most important benchmark for most people. The game is now heavily multi-threaded, and it should allow us to see how BD does in terms of gaming compared to SB.
 

beginner99

Diamond Member
Jun 2, 2009
5,320
1,768
136
This might be the case if you're running 8/16 full and independent threads, but there are lots of cases where I can imagine it would matter. The most obvious case is on laptops, where you'd almost always prefer to pack as many threads on a module as possible to power down the other modules.

Also, if you are running two related threads that could benefit from shared L2 cache, you are going to want to run those threads on the same module.

Ideally, I'd like at least the ability to differentiate between cores, as well as the ability to perhaps explicitly shutdown modules for mobile offerings (quad-core CPU becomes dual-core when the battery needs to last).

From what you're saying, it sounds to me like AMD is not suggesting developers will need to optimize for Bulldozer's unique design. While this may mostly be the case for high-throughput workloads, this seems shortsighted to me. From the high-level views of Bulldozer AMD has allowed us to see, if Core 0 is being utilized at 100%, and I have another high-throughput thread, there is a difference between whether that thread is placed on Core 1 or Core 2, even if performance will be acceptable in either case, why not allow developers to test their applications and optimize for your product?

AFAIK Windows is aware of hyper-threading, and hence the scheduler can "decide" on which core to run a certain thread: on an already-active core as a second thread using hyper-threading, or on an idle core.
Windows 7 was optimized for "shutting down" as many cores (or sockets) as possible to save power, so workload is first put on the same core until it becomes too much. Vista always distributed workload evenly, so in general more cores were active.

The question, as I actually asked previously, is whether the OS will be aware that 2 logical cores belong to the same module or to different ones, and that seems to be the case according to JFAMD. But in the end it's the scheduler that decides.
(On Win 7 you can actually override the scheduler if you want to: http://msdn.microsoft.com/en-us/library/dd627187(v=vs.85).aspx)
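Overriding the scheduler on Windows comes down to handing an affinity bitmask to an API such as SetThreadAffinityMask, where bit N selects logical core N. As a sketch of what a module-wide mask would look like, here is a small helper; the layout (cores numbered consecutively within a module, two per module) is an assumption for illustration, not AMD's published mapping:

```python
def module_affinity_mask(module_index, cores_per_module=2):
    """Build a bitmask selecting every core of one module, assuming
    cores are numbered consecutively within a module (an assumption).
    A mask like this is the shape of argument an affinity API such as
    Windows' SetThreadAffinityMask expects."""
    mask = 0
    first = module_index * cores_per_module
    for core in range(first, first + cores_per_module):
        mask |= 1 << core
    return mask

print(bin(module_affinity_mask(0)))  # -> 0b11   (cores 0-1)
print(bin(module_affinity_mask(1)))  # -> 0b1100 (cores 2-3)
```

Pinning two cooperating threads with the same module mask is the manual version of the "keep threads that share L2 on one module" idea discussed earlier in the thread.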
 

grimpr

Golden Member
Aug 21, 2007
1,095
7
81
Well, that depends. If Crytek used anything with the Intel logo in their multithreaded CPU physics engine, which frankly puts PhysX to shame, an 8-core 125W TDP BD won't stand a chance even against locked-multiplier SB 2300s.
 

JFAMD

Senior member
May 16, 2009
565
0
0
AFAIK Windows is aware of hyper-threading, and hence the scheduler can "decide" on which core to run a certain thread: on an already-active core as a second thread using hyper-threading, or on an idle core.
Windows 7 was optimized for "shutting down" as many cores (or sockets) as possible to save power, so workload is first put on the same core until it becomes too much. Vista always distributed workload evenly, so in general more cores were active.

The question, as I actually asked previously, is whether the OS will be aware that 2 logical cores belong to the same module or to different ones, and that seems to be the case according to JFAMD. But in the end it's the scheduler that decides.
(On Win 7 you can actually override the scheduler if you want to: http://msdn.microsoft.com/en-us/library/dd627187(v=vs.85).aspx)

You can obviously tie processes to cores, but that is - in most cases outside of virtualization - not necessarily going to optimize the application.

In reality, the scheduler is going to look for the next open core. Threads have varying lengths, so they may all start together at the beginning, but trying to figure out dependencies and how threads could share the cache is a difficult exercise.

I don't know for sure, but my guess is that core scheduling is going to be based on availability, because doing anything else would require the scheduler to look deep into each thread to see what is happening and then try to figure out the best place to put it.
 

Mopetar

Diamond Member
Jan 31, 2011
8,509
7,765
136
Any word yet on how overclocking will work?

Intel essentially locked it down for anything but their K-branded parts. I haven't heard if AMD is going to follow a similar route.
 

cusideabelincoln

Diamond Member
Aug 3, 2008
3,275
46
91
I don't think (as far as I know) AMD is putting the clock generator on the CPU, so I imagine overclocking with Bulldozer will work exactly like it does with Phenom: Black Editions will have unlocked multipliers, and non-Blacks overclock via the bus.
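For readers newer to Phenom-style overclocking, the two knobs combine multiplicatively: core clock = reference (bus) clock x CPU multiplier. A quick sketch — the 200 MHz reference clock matches AM3 convention, but the specific multiplier values here are made up for illustration:

```python
def core_clock_mhz(ref_clock_mhz, multiplier):
    """Effective core clock: reference (bus) clock times CPU multiplier."""
    return ref_clock_mhz * multiplier

# Black Edition route: raise the unlocked multiplier, leave the bus alone.
print(core_clock_mhz(200, 20))  # -> 4000 (4.0 GHz)
# Non-Black route: same target clock by raising the reference clock instead,
# which also drags the memory and HT clocks up with it.
print(core_clock_mhz(250, 16))  # -> 4000 (4.0 GHz)
```

That side effect in the second route is why bus overclocking takes more tuning than a simple multiplier bump.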
 

Dice144

Senior member
Oct 22, 2010
654
1
81
With the Intel chipset issues I really cannot wait for BD, now more than before. I almost ordered an Intel ITX system, but now I gotta wait. Come on, BD!
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
Yeah, jfamd, if you're reading this, you guys need to crack the whip on your BD teams and see if you can get it launched before SB resumes sales. That would be a real coup for amd, with the added pleasure of sticking it to intel at the same time.
 

HW2050Plus

Member
Jan 12, 2011
168
0
0
What's more beneficial per core, shared schedulers or dedicated schedulers? To me it seems like shared is more flexible, as in the following case: if only one pipe is needed for some execution, most of the scheduler resources are dedicated to it?
If both pipes are being used, the schedulers work their hardest at distributing appropriately, but hopefully can mix and match / be more efficient in a shared setup?

If that's the case, couldn't the "not quite" doubling of schedulers be a non-issue, since going shared is more effective utilization of them?
This is a design question. Intel CPUs cannot execute integer with SSE or FPU instructions in parallel (per issue port), so a separate scheduler doesn't make sense. They use a shared scheduler because the execution resources are shared.

AMD CPUs can execute integer and SSE/FPU in parallel, and therefore they have always had dedicated schedulers. AMD's problem was that actual software could seldom make use of this massive execution parallelism (3x integer + 3x FPU/SSE + 3x AGU + 3x load/store in one cycle), all the more since most compilers optimized for Intel, and it is generally difficult to make use of that much parallel execution power. A lesson Intel also learned with Itanium.

Intel's Core design has 3x integer/FPU/SSE + 1 AGU + 2 load/store. But as you know, having fewer units does not hurt the performance of Intel's Core.

You might then ask why AMD chose this resource-wasting design for their old and current products. It was because it simplified the architecture (if you have 3 of everything, that simplifies scheduling a lot), and they didn't know at the time that software would not be able to make use of those massive execution resources (esp. AGU and FPU/SSE).

Bulldozer is the technical solution, or if you like the "magic trick", to squeeze everything out of the capabilities of their design approach.

It has 2x integer + 2x FPU/SSE + 2x AGU per core (still 2 of everything to keep it simple, though the usefulness of a second AGU is questionable). That is much better economics, because fewer resources are wasted (an AGU does not consume too much). The high-clock design compensates for the missing third integer execution unit, while the third FPU/SSE and AGU were simply unused by any real software and just consumed die area and power.

All this reduction freed up die area to add another core. That works because the front end and back end are likewise oversized in current designs. So you get another core for free, so to speak. Chip-wide, you get double the cores at the same price, or let's say the same production cost/die size/power consumption.

That is also why Intel will have to do much more work to get a Sandy Bridge successor to use the same design trick, though that is no real problem for Intel, as they have plenty of resources to do it even if it means more work.

All this is much more complex than I have written here; the above is simplified to point out what was done with Bulldozer and why. E.g. there are some constraints in former AMD CPUs limiting their scalability in some cases, and there is a design advantage for Intel in that their parallel integer instructions share the issue port.

With the Core architecture Intel pushed their design approach to its limits, and now Bulldozer from AMD will come and push their design approach to its limits.

AMD will then have more execution units (4+2 in Bulldozer vs. 3 in Sandy Bridge) running at better utilization (IPC/unit) and at higher clock rates. Those are three important improvements, besides some other minor ones, that will give this radical performance boost.

Intel likely has better branch prediction with Sandy Bridge, some minor core features AMD doesn't provide with Bulldozer, and Hyper-Threading.

Therefore Bulldozer is not twice as fast as Sandy Bridge but, as this rumor suggests, something like 50% faster overall. Though the improvement will vary greatly depending on the benchmark used.

And a word regarding this: "AMD 8 cores vs. Intel 4 cores. Ha, I am not impressed." Once again: both CPUs will show 8 cores to the operating system. It is just different how those 4 extra cores are generated: Intel's virtual cores come from Hyper-Threading, and AMD's real cores come from their "split module technology".

And another word about the rumor. This could really get interesting, because the comparison leaked on donanimhaber shows media, rendering, and gaming. What is interesting about that is that none of those should be the real strength of Bulldozer, since its real strength is integer power: databases, compilers, web servers, chess, etc.
 

HW2050Plus

Member
Jan 12, 2011
168
0
0
You can obviously tie processes to cores, but that is - in most cases outside of virtualization - not necessarily going to optimize the application.

In reality, the scheduler is going to look for the next open core. Each thread has varying lengths, so at the beginning they all may start together, but trying to figure dependencies and how threads could share the cache is a difficult exercise.

I don't know for sure but my guess is that scheduling cores is going to be based on availability because doing anything else is going to require the scheduler to start looking deep into the thread to see what is happening and then try to figure out the best place to put a thread.
That works in principle really simply: the cores have preferences, like "use these 4 cores before you use the other 4 cores." Plus it should use cores in a certain order (to allow power saving/Turbo mode).

So all Windows 7 has to do is check whether core 0 is not fully busy; if so, assign the thread there. Otherwise move on to the next core in the ordered list.

pseudo code:
x = 0;
// check the bound first, so we never query a core past the end of the list
while (x < coreCount && IsCoreFullyBusy(x))
{
    x++;
}
AssignThreadToCore(thread, x);

With that simple piece of code the task scheduler is hyper-threading and power-saving/turbo-mode aware. No need for magic thread analysis.

Maybe it is a bit more clever and replaces "IsCoreFullyBusy(x)" with "(CoreUtilization(x) - ThreadConsumptionInPast(thread))".

However, the main point is just the ordering in which the cores are checked. So the trick is inside AssignThreadToCore().
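That walk-the-ordered-list idea translates directly into a runnable sketch. Note the bound check has to come before the busy check so we never index past the last core; IsCoreFullyBusy is modeled here as a simple utilization threshold, which is an assumption for illustration:

```python
def assign_thread(core_load, busy_threshold=0.9):
    """Walk the preference-ordered core list and return the index of the
    first core that is not fully busy; fall back to the last core if all
    are saturated. `core_load` holds per-core utilization in [0.0, 1.0],
    already sorted in the scheduler's preferred fill order."""
    x = 0
    # bound check first, then the busy check, so x stays a valid index
    while x < len(core_load) - 1 and core_load[x] >= busy_threshold:
        x += 1
    return x

# Core 0 is saturated, core 1 is nearly idle: the thread lands on core 1,
# leaving cores 2 and 3 free to stay in a power-saving state.
print(assign_thread([1.0, 0.1, 0.0, 0.0]))  # -> 1
```

Because the list is walked in a fixed order, power saving and turbo headroom fall out automatically: later cores are only touched once earlier ones fill up, which is exactly the Windows 7 behavior described above.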
 