Fudzilla: Bulldozer performance figures are in

Status
Not open for further replies.

Zstream

Diamond Member
Oct 24, 2005
3,395
277
136
The thing is, even if we're to assume everything is multi-threaded:

You have CPU X with eight cores, a CMT architecture, and a 3.6GHz clock speed. You have CPU Y with four cores, a CMP architecture, and a 3.3GHz clock speed. If your IPC is too low, you won't have that much multi-threaded speed. The Phenom II X6 1100T has two more cores than the Core i5-2500K and can only match it in multi-threaded.

Are you saying HT is equivalent to the BD approach? When a program is properly multi-threaded, it has very little need for HT.
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
4,310
8
81
That's all it needs to do. A different approach to the same end. The problem comes when it's too slow in single-threaded applications. We'll have to see, but BD, with its aggressive turbo and fatter front end, is likely to solve that. If it can perform as well at the same power and cost, why isn't that acceptable?

I'm gonna have to disagree with that. If your CPU is only able to match the other in multi-threaded while being left with a 40% gap separating them in single-threaded, I'll choose the other CPU because it's more balanced. It'd do well in all types of workloads, and that's why Nehalem and Sandy Bridge are so great. With Bulldozer I'm pretty sure we'll have another deja vu moment like when we saw the Phenom II X6 1090T versus the Core i7-860, where the X6 was overall around 15% faster in multi-threaded but 25% slower in single-threaded. I don't want that; I want balance.

Turbo doesn't really matter to the people buying these, as we're enthusiasts. If they can really reach 4.8-5GHz on good air cooling and with medium overvolts, then they'll probably be a good alternative for things like video encoding, compiling, and the like. 3D rendering depends more on floating point performance, so we'll have to see how it does there. Overall, though, I expect Intel to have yet again more balance when it comes to speed in different workloads.
 

Davidh373

Platinum Member
Jun 20, 2009
2,428
0
71
That's all it needs to do. A different approach to the same end. The problem comes when it's too slow in single-threaded applications. We'll have to see, but BD, with its aggressive turbo and fatter front end, is likely to solve that. If it can perform as well at the same power and cost, why isn't that acceptable?

It doesn't have the same power cost though lol. They have roughly the same idle consumption, but load is WAY different. 30W-40W difference? That's like another i3 on top of the i5 lol. If you are referring to BD vs. i5 2500k, let's put this in perspective. 8-Cores at 130W TDP vs. 4-Cores at 95W... let's consider also that BD is rumored to ship with stock WATERCOOLING. Yeah, it'll have the same power consumption... in your dreams...
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
It doesn't have the same power cost though lol. They have roughly the same idle consumption, but load is WAY different. 30W-40W difference? That's like another i3 on top of the i5 lol. If you are referring to BD vs. i5 2500k, let's put this in perspective. 8-Cores at 130W TDP vs. 4-Cores at 95W... let's consider also that BD is rumored to ship with stock WATERCOOLING. Yeah, it'll have the same power consumption... in your dreams...

TDP doesn't mean power consumption
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
4,310
8
81
Are you saying HT is equivalent to the BD approach? When a program is properly multi-threaded, it has very little need for HT.

That's not what I said. CMP>CMT>>SMT

And when programs are properly multi-threaded, even considering what others will tell you, SMT will definitely give it a good advantage. The heaviest multi-threaded applications like compiling, video encoding and 3D rendering definitely see a benefit from SMT, most of the time by 20% with Intel's HyperThreading. Regardless, what I've said earlier isn't taking SMT into account. I'm not expecting the FX series to compete with the Core i7 overall, but the Core i5. I'm having a feeling of deja vu, as if we're looking at the Core i7-860 vs the Phenom II X6 1090T again.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
That's not what I said. CMP>CMT>>SMT

And when programs are properly multi-threaded, even considering what others will tell you, SMT will definitely give it a good advantage. The heaviest multi-threaded applications like compiling, video encoding and 3D rendering definitely see a benefit from SMT, most of the time by 20% with Intel's HyperThreading. Regardless, what I've said earlier isn't taking SMT into account. I'm not expecting the FX series to compete with the Core i7 overall, but the Core i5.

CMT > CMP > SMT

Get used to it or drop it.
 

Zstream

Diamond Member
Oct 24, 2005
3,395
277
136
I'm gonna have to disagree with that. If your CPU is only able to match the other in multi-threaded while being left with a 40% gap separating them in single-threaded, I'll choose the other CPU because it's more balanced. It'd do well in all types of workloads, and that's why Nehalem and Sandy Bridge are so great. With Bulldozer I'm pretty sure we'll have another deja vu moment like when we saw the Phenom II X6 1090T versus the Core i7-860, where the X6 was overall around 15% faster in multi-threaded but 25% slower in single-threaded. I don't want that; I want balance.

Turbo doesn't really matter to the people buying these, as we're enthusiasts. If they can really reach 4.8-5GHz on good air cooling and with medium overvolts, then they'll probably be a good alternative for things like video encoding, compiling, and the like. 3D rendering depends more on floating point performance, so we'll have to see how it does there. Overall, though, I expect Intel to have yet again more balance when it comes to speed in different workloads.

While you're correct in today's environment, tomorrow's will be different. The APIs are making it increasingly easy to go multi-threaded. So you're probably right: it won't be awesome for single-threaded apps.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
I'm gonna have to disagree with that. If your CPU is only able to match the other in multi-threaded while being left with a 40% gap separating them in single-threaded, I'll choose the other CPU because it's more balanced. It'd do well in all types of workloads, and that's why Nehalem and Sandy Bridge are so great. With Bulldozer I'm pretty sure we'll have another deja vu moment like when we saw the Phenom II X6 1090T versus the Core i7-860, where the X6 was overall around 15% faster in multi-threaded but 25% slower in single-threaded. I don't want that; I want balance.

Turbo doesn't really matter to the people buying these, as we're enthusiasts. If they can really reach 4.8-5GHz on good air cooling and with medium overvolts, then they'll probably be a good alternative for things like video encoding, compiling, and the like. 3D rendering depends more on floating point performance, so we'll have to see how it does there. Overall, though, I expect Intel to have yet again more balance when it comes to speed in different workloads.

Well, if there is indeed a 40% difference it will make your argument stronger. That's yet to be seen. While we are enthusiasts and will likely O/C, I don't think that is a big section of the market overall. The way the product performs out of the box can't just be dismissed like it's completely irrelevant, especially when only the K models are O/C'able.
 

Zstream

Diamond Member
Oct 24, 2005
3,395
277
136
That's not what I said. CMP>CMT>>SMT

And when programs are properly multi-threaded, even considering what others will tell you, SMT will definitely give it a good advantage. The heaviest multi-threaded applications like compiling, video encoding and 3D rendering definitely see a benefit from SMT, most of the time by 20% with Intel's HyperThreading. Regardless, what I've said earlier isn't taking SMT into account. I'm not expecting the FX series to compete with the Core i7 overall, but the Core i5. I'm having a feeling of deja vu, as if we're looking at the Core i7-860 vs the Phenom II X6 1090T again.

Uhh, no. Properly threaded and programmed applications see a decrease in performance with HT on.
 

Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
Uhh, no. Properly threaded and programmed applications see a decrease in performance with HT on.
Then I guess it's a fortunate thing for Intel that "improperly threaded and programmed" applications are overwhelmingly in the majority. Though even the properly threaded and programmed applications will run incredibly fast on SB due to its inherently strong core.
 

Zstream

Diamond Member
Oct 24, 2005
3,395
277
136
Then I guess it's a fortunate thing for Intel that "improperly threaded and programmed" applications are overwhelmingly in the majority. Though even the properly threaded and programmed applications will run incredibly fast on SB due to its inherently strong core.

That is correct :)

However, if we had good software, you would see 8-16 core Intel CPUs by now.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
I'm gonna have to disagree with that. If your CPU is only able to match the other in multi-threaded while being left with a 40% gap separating them in single-threaded, I'll choose the other CPU because it's more balanced.

We only have to take a look at $120 X4 965 vs. i3 and $160-180 X6 vs. i5 (previous and current generation) to realize that 80% of the market did not want more slow cores. There is no evidence that this will change any time soon until programs use 6-8 threads ubiquitously. Sure, there are plenty of users who DO want 6-8 cores. But since at least 2007, market share / market trends have shown that selling MORE cores for the same price didn't do wonders for AMD in terms of improving their overall profitability or market share.

The majority of the market isn't interested in content creation CPUs. Actually in P4 vs. A64 days, A64 excelled exactly in the type of programs in which SB beats the Phenom X6. Ironically, P4 was the better encoding, rendering/content creation CPU. AMD managed to gain a ton of market share back then.

The truth is, any of the following would have been great for AMD:

1) Same performance as SB but lower power consumption (especially important for laptops!)
2) Same performance as SB but better overclocking (especially important for us overclockers)
3) Combination of 1 & 2

If AMD matches the performance of SB, I would be happy (and I am sure most would welcome a competitive AMD). The problem is looking back at history and comparing 2 very similarly performing CPUs (A64 mobile and Pentium M (Centrino)), A64 was very successful on the desktop but was still a huge disappointment for the mobile segment (by far the faster growing segment).

For this very reason, if an 8-core 8150 matches a 2600k, that would only satisfy the desktop segment. If AMD wants to turn the heat on Intel, having lower power consumption over Intel would be a HUGE WIN for them.

I mean an overclocked X6 may trade blows with an overclocked 2500k at times, but it consumes nearly 2x more power!

It's going to be interesting to see how a 4.8ghz 8150 fares against a 4.6-4.7ghz 2500k/2600k.
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
4,310
8
81
CMT > CMP > SMT

Get used to it or drop it.

CMT is equivalent to CMP when it's not sharing resources. When it is, it's inferior. By how much depends on the architecture.

While you're correct in today's environment, tomorrow's will be different. The APIs are making it increasingly easy to go multi-threaded. So you're probably right: it won't be awesome for single-threaded apps.

Most things today are multi-threaded. The problem is if you have a CPU with low IPC and many cores it's not gonna do significantly better in them than one with high IPC and half as many cores. I'd gladly trade 10-15% lower multi-threaded performance for 25-30% higher single-threaded performance because it works well for all workloads. It's fast in single-threaded; it's fast in multi-threaded.

Well, if there is indeed a 40% difference it will make your argument stronger. That's yet to be seen. While we are enthusiasts and will likely O/C, I don't think that is a big section of the market overall. The way the product performs out of the box can't just be dismissed like it's completely irrelevant, especially when only the K models are O/C'able.

The 40% difference comes from comparing the Core i5-2500K and Phenom II X6 1100T in single-threaded. Given the high clock speeds and that they need so many cores, as well as the pricing, I only expect AMD to have improved in this metric by 10% better than before. Most of that will come from the higher clock speeds, probably. As for multi-threaded, I'm willing to wager a bit higher increase. It'd [FX-8120] end up being 10-15% faster than the Core i5-2500K in multi-threaded and 25-30% slower in single-threaded, hence my argument on Intel balancing workloads better.
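
To make the arithmetic behind these estimates explicit, here is a small sketch in C. It is only a back-of-the-envelope model, and everything in it beyond the 40% and 10% figures is an assumption: the function names are hypothetical, and "40% gap" can be read two ways. Under reading A the i5-2500K is 1.40x the 1100T; under reading B the 1100T is 40% slower than the i5-2500K. A 10% single-threaded uplift over the 1100T then leaves the new chip roughly 21% behind under reading A and 34% behind under reading B, which brackets the 25-30% estimate.

```c
#include <assert.h>

/* Fraction by which `slow` trails `fast`; 0.25 means "25% slower". */
double slowerBy(double slow, double fast) {
	assert(fast > 0.0);
	return 1.0 - slow / fast;
}

/* Reading A (assumption): the faster chip is (1 + gap) times the old chip.
   The old chip is normalised to 1.0 and the new chip gains `uplift` over it. */
double deficitReadingA(double gap, double uplift) {
	double fastChip = 1.0 + gap;
	double newChip = 1.0 + uplift;
	return slowerBy(newChip, fastChip);
}

/* Reading B (assumption): the old chip is `gap` slower than the faster chip.
   The faster chip is normalised to 1.0 and the new chip gains `uplift`
   over the old chip. */
double deficitReadingB(double gap, double uplift) {
	double oldChip = 1.0 - gap;
	double newChip = oldChip * (1.0 + uplift);
	return slowerBy(newChip, 1.0);
}
```

Calling deficitReadingA(0.40, 0.10) and deficitReadingB(0.40, 0.10) reproduces the two figures above. Neither says anything about real hardware; they only show how sensitive the conclusion is to what the 40% is measured against.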
 

Zstream

Diamond Member
Oct 24, 2005
3,395
277
136
Most things today are multi-threaded. The problem is if you have a CPU with low IPC and many cores it's not gonna do significantly better in them than one with high IPC and half as many cores. I'd gladly trade 10-15% lower multi-threaded performance for 25-30% higher single-threaded performance because it works well for all workloads. It's fast in single-threaded; it's fast in multi-threaded.

The things that are important to you and are single-threaded are what? I'm not trying to be an arse, it's just an honest question.

To say most applications are multi-threaded is not correct. I code a bunch of random programs and only 1/4 of mine are multi-threaded. All my .Net applications are meant for single-core CPUs, as most large corporations have still failed to move forward. I have only recently moved my code to be multi-threaded.

My argument is that even a decent quad core can run most single thread programs with little stress. So, my main focus right now is to code for 4 - 8 cores.
 

bronxzv

Senior member
Jun 13, 2011
460
0
71
Uhh, no. Properly threaded and programmed applications see a decrease in performance with HT on.

You got it reversed. Although a poorly programmed application can see a decrease in performance with HT enabled, any properly threaded application (i.e. with minimum synchronization between threads and good load balancing) easily enjoys a 15%-20% speedup from HT, 20%-25% with cache-aware code (cache blocking for half the L1D/L2, minimizing aliasing issues), and 30% or more in some cases (typically code with a lot of LLC misses, i.e. memory-latency bound).
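
As an illustration of what "minimum synchronization between threads and good load balancing" can look like inside an application, here is a minimal sketch in C using POSIX threads and C11 atomics. It is not taken from any of the applications discussed; the names Work, worker and parallelSum, the CHUNK size and the MAX_THREADS cap are all hypothetical. Each thread claims the next chunk of work from a shared counter, so a thread that gets slowed down (for example by sharing a core with its HT sibling) simply claims fewer chunks, and threads only touch shared state once per chunk plus once at the end.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

enum { CHUNK = 1024, MAX_THREADS = 64 };   /* hypothetical tuning values */

typedef struct {
	const int *data;
	unsigned long n;
	atomic_ulong next;    /* start index of the next unclaimed chunk */
	atomic_llong total;   /* final result, updated once per thread */
} Work;

static void *worker(void *arg) {
	Work *w = arg;
	long long local = 0;   /* private accumulator: no sharing in the hot loop */
	for (;;) {
		/* Claim a chunk; this is the only per-chunk synchronization. */
		unsigned long start = atomic_fetch_add(&w->next, CHUNK);
		if (start >= w->n)
			break;
		unsigned long end = start + CHUNK < w->n ? start + CHUNK : w->n;
		for (unsigned long i = start; i < end; i++)
			local += w->data[i];
	}
	atomic_fetch_add(&w->total, local);
	return NULL;
}

/* Sums data[0..n) using up to `nthreads` helper threads plus the caller. */
long long parallelSum(const int *data, unsigned long n, unsigned nthreads) {
	pthread_t th[MAX_THREADS];
	unsigned started = 0;
	Work w;

	assert(data != NULL || n == 0);
	w.data = data;
	w.n = n;
	atomic_init(&w.next, 0);
	atomic_init(&w.total, 0);

	if (nthreads > MAX_THREADS)
		nthreads = MAX_THREADS;
	for (unsigned t = 0; t < nthreads; t++)
		if (pthread_create(&th[started], NULL, worker, &w) == 0)
			started++;
	worker(&w);   /* the calling thread also pulls chunks */
	for (unsigned t = 0; t < started; t++)
		pthread_join(th[t], NULL);
	return atomic_load(&w.total);
}
```

The balancing here is done by the program itself, independently of which core the operating system schedules each thread on. Whether a given workload then gains or loses from HT still has to be measured.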
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
The 40% difference comes from comparing the Core i5-2500K and Phenom II X6 1100T in single-threaded. Given the high clock speeds and that they need so many cores, as well as the pricing, I only expect AMD to have improved in this metric by 10% better than before. Most of that will come from the higher clock speeds, probably. As for multi-threaded, I'm willing to wager a bit higher increase. It'd [FX-8120] end up being 10-15% faster than the Core i5-2500K in multi-threaded and 25-30% slower in single-threaded, hence my argument on Intel balancing workloads better.

I know where the argument comes from. BD is a completely new arch though. It's not just a Thuban shrink.

All I'm saying is that we don't know what BD is going to do single-threaded (we can guess and assume), and I'm not ready to just dismiss the higher clock speeds until Intel raises theirs (if it even ends up that they need to; they might not have to if BD ends up being as big of a dog as you are assuming). It's also possible that in stock configuration BD will beat SB single-threaded if its higher clocks are enough to make up the performance difference. It appears that this is what AMD is shooting for.
 

Zstream

Diamond Member
Oct 24, 2005
3,395
277
136
You got it reversed. Although a poorly programmed application can see a decrease in performance with HT enabled, any properly threaded application (i.e. with minimum synchronization between threads and good load balancing) easily sees a 15%-20% speedup from HT, 20%-25% with cache-aware code (cache blocking for half the L1D/L2, minimizing aliasing issues), and 30% or more in some cases (typically code with a lot of cache misses, i.e. memory-latency bound).

Huh? Are you serious? The load balancing is done by Microsoft or by the application saying to do something on a cpu. There is no current load balancing mechanism.

A software application can say to run a process on core 4 or to run this thread on core 3. However, it can't load balance on its own.

You can set it up so that a process gets moved once it uses a certain percentage of a CPU, but that's not load balancing. Load balancing would be to distribute the thread or process across multiple cores.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Uhh, no. Properly threaded and programmed applications see a decrease in performance with HT on.
Wait, what? So if I have a computation that is integer-heavy (or whatever), that makes it "improperly" threaded/programmed? Something which the programmer most certainly doesn't even know, because it's basically impossible to predict the exact code a compiler generates for a halfway complex program?

Well let's see how well you do with a really simple test case: So please tell us whether the following simple matrix multiplication algorithm will, if split among several threads (assuming some fork/join framework so that we can trivially split the recursive calls), be "properly" programmed:

Code:
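/* Recursively computes c += a * b over rows [i0,i1), columns [j0,j1) and inner index [k0,k1). */
void matrixMult(int **a, int **b, int **c, unsigned i0, unsigned i1, unsigned j0, unsigned j1, unsigned k0, unsigned k1) {
	unsigned di = i1 - i0, dj = j1 - j0, dk = k1 - k0;
	const unsigned LEAFSIZE = 8; /* for simplicity hardcoded here, 8 isn't an especially great default value obviously on modern cpus */
	if (di >= dj && di >= dk && di > LEAFSIZE) {
		/* rows are the largest dimension: halve them (the halves write disjoint rows of c) */
		unsigned im = i0 + di / 2;
		matrixMult(a, b, c, i0,im,j0,j1,k0,k1);
		matrixMult(a, b, c, im,i1,j0,j1,k0,k1);
	} else if (dj >= dk && dj > LEAFSIZE) {
		/* columns are the largest dimension: halve them (the halves write disjoint columns of c) */
		unsigned jm = j0 + dj / 2;
		matrixMult(a, b, c, i0,i1,j0,jm,k0,k1);
		matrixMult(a, b, c, i0,i1,jm,j1,k0,k1);
	} else if (dk > LEAFSIZE) {
		/* inner index is the largest dimension: both halves accumulate into the same cells of c */
		unsigned km = k0 + dk / 2;
		matrixMult(a, b, c, i0,i1,j0,j1,k0,km);
		matrixMult(a, b, c, i0,i1,j0,j1,km,k1);
	} else {
		/* leaf: plain triple loop on a block small enough to stay in cache */
		for (unsigned i = i0; i < i1; i++)
			for (unsigned j = j0; j < j1; j++)
				for (unsigned k = k0; k < k1; k++)
					c[i][j] += a[i][k] * b[k][j];
	}
}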

If not, we're obviously all ears how to improve such a simple algorithm so that it passes your high standards!
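
For readers who want to see the fork/join split spelled out, here is a minimal sketch in C that uses POSIX threads in place of a real fork/join framework (an assumption; no framework is named above). Task, runPair, multTask, matrixMultParallel, allocMatrix and the depth cutoff are hypothetical names introduced for illustration. The i and j splits write disjoint parts of c and can safely run in parallel. The two halves of a k split accumulate into the same cells of c, so this sketch runs them one after the other rather than adding locks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define LEAFSIZE 8u   /* same leaf size as the code above */

/* One sub-problem: c += a * b over [i0,i1) x [j0,j1) x [k0,k1). */
typedef struct {
	int **a, **b, **c;
	unsigned i0, i1, j0, j1, k0, k1;
	unsigned depth;   /* how many more levels may still fork a thread */
} Task;

static void multTask(Task t);

static void *multThread(void *arg) {
	multTask(*(Task *)arg);
	return NULL;
}

/* Runs two halves: forked if they are independent and depth allows, else in order. */
static void runPair(Task lo, Task hi, int independent) {
	pthread_t th;
	if (independent && lo.depth > 0) {
		lo.depth--;
		hi.depth--;
		if (pthread_create(&th, NULL, multThread, &lo) == 0) {
			multTask(hi);             /* this thread takes the other half */
			pthread_join(th, NULL);   /* join before lo goes out of scope */
			return;
		}
	}
	multTask(lo);   /* sequential path, also the fallback if forking failed */
	multTask(hi);
}

static void multTask(Task t) {
	unsigned di = t.i1 - t.i0, dj = t.j1 - t.j0, dk = t.k1 - t.k0;
	Task lo = t, hi = t;
	if (di >= dj && di >= dk && di > LEAFSIZE) {
		lo.i1 = hi.i0 = t.i0 + di / 2;
		runPair(lo, hi, 1);   /* disjoint rows of c */
	} else if (dj >= dk && dj > LEAFSIZE) {
		lo.j1 = hi.j0 = t.j0 + dj / 2;
		runPair(lo, hi, 1);   /* disjoint columns of c */
	} else if (dk > LEAFSIZE) {
		lo.k1 = hi.k0 = t.k0 + dk / 2;
		runPair(lo, hi, 0);   /* same cells of c: keep in order */
	} else {
		for (unsigned i = t.i0; i < t.i1; i++)
			for (unsigned j = t.j0; j < t.j1; j++)
				for (unsigned k = t.k0; k < t.k1; k++)
					t.c[i][j] += t.a[i][k] * t.b[k][j];
	}
}

/* Entry point for n x n matrices; depth d allows up to 2^d threads at once. */
void matrixMultParallel(int **a, int **b, int **c, unsigned n, unsigned depth) {
	Task t = { a, b, c, 0, n, 0, n, 0, n, depth };
	assert(a && b && c);
	multTask(t);
}

/* Helper for trying it out: zero-filled n x n matrix (leaks on failure; sketch only). */
int **allocMatrix(unsigned n) {
	int **m = malloc(n * sizeof *m);
	if (!m)
		return NULL;
	for (unsigned i = 0; i < n; i++) {
		m[i] = calloc(n, sizeof **m);
		if (!m[i])
			return NULL;
	}
	return m;
}
```

Creating a raw thread per fork is far more expensive than what a work-stealing fork/join runtime does, which is why the sketch caps forking with depth. A call such as matrixMultParallel(a, b, c, n, 3) would use at most eight threads at a time. Whether running two of those threads on one HT-enabled core helps or hurts is exactly the kind of thing that has to be measured rather than predicted from the source.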
 

bronxzv

Senior member
Jun 13, 2011
460
0
71
If not, we're obviously all ears how to improve such a simple algorithm so that it passes your high standards!

Thinking the same. Why the hell are all these SPECrate submissions made with HT enabled if it's really possible to achieve better scores without HT?

Why do Intel IPP/MKL enjoy speedups with HT? What are all these developers doing?!
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
4,310
8
81
The things that are important to you and are single-threaded are what? I'm not trying to be an arse, it's just an honest question.

To say most applications are multi-threaded is not correct. I code a bunch of random programs and only 1/4 of mine are multi-threaded. All my .Net applications are meant for single-core CPUs, as most large corporations have still failed to move forward. I have only recently moved my code to be multi-threaded.

My argument is that even a decent quad core can run most single thread programs with little stress. So, my main focus right now is to code for 4 - 8 cores.

Well, let's see. Here are some popular applications:

Single-threaded: iTunes, LAME, Cinebench ST, Photoshop*

Mildly multi-threaded (two-four threads/cores): almost all games, Windows Media Encoder 9, x264 1st Pass, Blender 3D, PAR2 MT Decompression, WinRAR 3.80

Multi-threaded: Photoshop*, Xmpeg + DivX, x264 HD 2nd Pass, 3dsmax 9, Cinebench MT, POV-Ray, 7-zip Compression/Decompression, Visual Studio 2008, Sorenson Flash Video Creation, Microsoft Excel.

From this we can see why Intel does so well. Pretty much all applications are either mildly multi-threaded or fully multi-threaded. Sandy Bridge especially excels in mildly multi-threaded applications, and that's where a big bulk of applications is. Sandy Bridge also does well at multi-threaded, so they're not at a big loss there. The problem for Bulldozer, and why I think it'll ultimately not be the choice of most enthusiasts, is that it'll probably be somewhat better than Sandy Bridge in multi-threaded and in everything below it'll be 15-30% slower. Thuban, Part 2.

Quote me on this, if you want.

It'd be interesting to see how it does in POV-Ray given the module concept and the shared front-end.

*Depending on the filter.
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
4,310
8
81
I know where the argument comes from. BD is a completely new arch though. It's not just a Thuban shrink.

All I'm saying is that we don't know what BD is going to do single-threaded (we can guess and assume), and I'm not ready to just dismiss the higher clock speeds until Intel raises theirs (if it even ends up that they need to; they might not have to if BD ends up being as big of a dog as you are assuming). It's also possible that in stock configuration BD will beat SB single-threaded if its higher clocks are enough to make up the performance difference. It appears that this is what AMD is shooting for.

It may be a new architecture, but that doesn't mean it's a significant speedup from K10.5. I think it'll be a very minor improvement in IPC, truth be told.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Umm, the claim that Photoshop is heavily multi-threaded (if we're already making a distinction between mildly and something else, is that "heavily"?) should bear an extremely, extremely large asterisk. And Visual Studio? Same thing.

There are several quite important things in both of them that are basically completely single threaded (check out some of the quite computation heavy filters and see how many threads PS uses) and it's quite unlikely this will change in the near future. So that makes them pretty good examples of programs where single threaded performance is still important in some situations? Well who would've thought that.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
it's multithreaded for the most time consuming task: compiling, when building big projects (hundreds of source files) it makes a tremendous difference
So c2.dll uses multiple cores on your Visual Studio version (it at least doesn't on my VS2010 ultimate install)? Or are you just not using LTCG? Well that certainly is a possibility to avoid that problem, but then you're giving up quite some performance increase without PGO and LTCG. Compiling in parallel? Easy. Linking and doing the link time stuff in parallel? Much harder problem that.

MS stated that they're looking into improving it, but they say it's not a simple problem - haven't tested the Dev build for any serious projects where it'd make a difference though, so did they "fix" that?
 