Fudzilla: Bulldozer performance figures are in

Status
Not open for further replies.

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
Um, the claim that Photoshop is heavily multi-threaded (if we're already distinguishing between mildly and heavily multi-threaded?) should bear an extremely, extremely large asterisk. And Visual Studio? Same thing.




There are several quite important things in both of them that are basically completely single-threaded (check out some of the computation-heavy filters and see how many threads PS uses), and it's quite unlikely this will change in the near future. So that makes them pretty good examples of programs where single-threaded performance still matters in some situations? Well, who would've thought that.

See above. In Photoshop many filters are multi-threaded. The most intense almost always are.

It's multithreaded for the most time-consuming task, compiling: when building big projects (hundreds of source files), it makes a tremendous difference.

This.
 

bronxzv

Senior member
Jun 13, 2011
Or are you just not using LTCG?

I use Intel C++ XE 12.1 plugged into VS 2008 with the "-MP" compilation flag. It simply compiles as many files in parallel as there are logical processors, and I get more than a 4x speedup on my 2600K with an SSD.
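A minimal sketch of the scheduling idea described above, assuming POSIX threads: one worker per online logical processor (queried with `sysconf(_SC_NPROCESSORS_ONLN)`, which Linux and macOS provide but strict POSIX does not require), each worker claiming the next source file from a shared atomic counter. `NUM_FILES`, `compile_one` and `parallel_build` are hypothetical names; a real build typically runs separate compiler processes, while this sketch only records which worker handled each file.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <unistd.h>

#define NUM_FILES 40     /* hypothetical number of source files in the project */
#define MAX_WORKERS 64   /* cap on worker threads for this sketch */

static atomic_int next_file;         /* index of the next unclaimed source file */
static int compiled_by[NUM_FILES];   /* which worker handled each file (-1 = none yet) */

/* Hypothetical stand-in for running the compiler on one translation unit. */
static void compile_one(int file, int worker) {
    compiled_by[file] = worker;
}

/* Each worker keeps claiming files until the shared queue is empty. */
static void *worker_main(void *arg) {
    int id = (int)(long)arg;
    for (;;) {
        int f = atomic_fetch_add(&next_file, 1);  /* claim one file atomically */
        if (f >= NUM_FILES)
            break;                                /* nothing left to build */
        compile_one(f, id);
    }
    return NULL;
}

/* Builds every file with one worker per logical processor; returns the worker count. */
int parallel_build(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n < 1) n = 1;
    if (n > MAX_WORKERS) n = MAX_WORKERS;

    for (int i = 0; i < NUM_FILES; i++)
        compiled_by[i] = -1;
    atomic_store(&next_file, 0);

    pthread_t threads[MAX_WORKERS];
    long started = 0;
    for (long i = 0; i < n; i++) {
        if (pthread_create(&threads[i], NULL, worker_main, (void *)i) != 0)
            break;              /* keep whatever workers did start */
        started++;
    }
    if (started == 0) {         /* no threads at all: build serially */
        worker_main((void *)0L);
        started = 1;
    } else {
        for (long i = 0; i < started; i++)
            pthread_join(threads[i], NULL);
    }

    /* Sanity check: every file was claimed by some worker. */
    for (int i = 0; i < NUM_FILES; i++)
        assert(compiled_by[i] >= 0);
    return (int)started;
}
```

Because the files are independent, this part scales with the worker count until something serial, such as the link step, dominates the build.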
 

Voo

Golden Member
Feb 27, 2009
Yeah, great that you tested the part that nobody disputed is multithreaded, but missed the part about LTCO? Great that you tested last-century stuff; just too bad that the performance gains from link-time optimization are totally worth the longer "link" times for medium and large programs, and that part ISN'T especially well multithreaded.

And great that the parts of Photoshop tested by the benchmark were multithreaded; that doesn't change the fact that several filters aren't:

http://www.tomshardware.com/reviews/cool-n-quiet-power-management said:
For this test, we ran Driverheaven.net's Photoshop benchmark script, using the default test image. The script runs several filters in sequence: Texturizer, CMYK Color Conversion, RGB Color Conversion, Ink Outlines, Dust & Scratches, Watercolor, Texturizer, Stained Glass, Lighting Effects, Mosaic Tiles, Extrude, Smart Blur, Underpainting, Palette Knife, and Sponge.



The red line says it all: Adobe Photoshop CS4, or more precisely, the filters used in the benchmark script, do not fully utilize both cores.



I use Intel C++ XE 12.1 plugged into VS 2008 with the "-MP" compilation flag. It simply compiles as many files in parallel as there are logical processors, and I get more than a 4x speedup on my 2600K with an SSD.
And now test how big your speedup is with link-time optimization and the VS compiler on a larger project. No idea about the Intel compiler, but at least with VC++ you can gain some large performance increases from PGO + LTCO.
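The disagreement over the link step is essentially Amdahl's law, a standard model that nobody in the thread names: if a fraction p of the single-core runtime parallelizes perfectly across n cores and the rest stays serial, the speedup is 1 / ((1 - p) + p / n). The sketch below just evaluates that formula; `amdahl_speedup` is a hypothetical helper, and the fractions in the note after it are made-up illustrations, not measurements of VS, LTCG or the Intel compiler.

```c
#include <assert.h>

/* Amdahl's law: overall speedup on n cores when a fraction p of the
   single-core runtime parallelizes perfectly and the remaining (1 - p),
   e.g. a serial link step, does not parallelize at all. */
double amdahl_speedup(double p, int n) {
    assert(p >= 0.0 && p <= 1.0);
    assert(n >= 1);
    return 1.0 / ((1.0 - p) + p / n);
}
```

For a hypothetical build where compiling is 90% of the time and a serial link the other 10%, `amdahl_speedup(0.9, 8)` is about 4.7, and no number of cores can push it past 10x. If link-time code generation moved half of the work into the serial link (p = 0.5), eight cores would give only about 1.8x.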
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
Yeah, great that you tested the part that nobody disputed is multithreaded, but missed the part about LTCO? Great that you tested last-century stuff; just too bad that the performance gains from link-time optimization are totally worth the longer "link" times for medium and large programs, and that part ISN'T especially well multithreaded.

And great that the parts of Photoshop tested by the benchmark were multithreaded; that doesn't change the fact that several filters aren't.


And now test how big your speedup is with link-time optimization and the VS compiler on a larger project. No idea about the Intel compiler, but at least with VC++ you can gain some large performance increases from PGO + LTCO.


Who cares, really? Many of the intensive filters are multi-threaded. Even if they weren't, it wouldn't change the overall picture or my main point. It's only one program out of the ~20.
 

Voo

Golden Member
Feb 27, 2009
Who cares, really? Many of the intensive filters are multi-threaded. Even if they weren't, it wouldn't change the overall picture or my main point. It's only one program out of the ~20.
Not "many", "some". Depending on what you do, performance will be quite limited by single-threaded performance (sadly, I know that all too well). Also, lots of the nicely multithreaded stuff is, or can be, implemented on the GPU, which is much faster than any CPU for that kind of work anyhow.

Then the same is true for VS, so that doesn't fit in the list that well. Excel? Same thing as VS/PS. So we end up with some benchmarks (who cares?), some encoding (yup, that's embarrassingly parallel) and compression (same as encoding). That's not such an impressive list anymore.

Not the best examples of why single-threaded performance doesn't matter anymore, I fear.
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
Not "many", "some". Depending on what you do, performance will be quite limited by single-threaded performance (sadly, I know that all too well). Also, lots of the nicely multithreaded stuff is, or can be, implemented on the GPU, which is much faster than any CPU for that kind of work anyhow.

Then the same is true for VS, so that doesn't fit in the list that well. Excel? Same thing as VS/PS. So we end up with some benchmarks (who cares?), some encoding (yup, that's embarrassingly parallel) and compression (same as encoding). That's not such an impressive list anymore.

Not the best examples of why single-threaded performance doesn't matter anymore, I fear.

As far as Visual Studio goes, the time-consuming process (compiling, mostly) is heavily multi-threaded. Excel simulations and math calculations are multi-threaded, as well. Looks to me like you're grasping at straws.

And I never said single-threaded performance doesn't matter; as a matter of fact, I said the opposite. Perhaps you should read a page or two back before you make false statements.
 

Voo

Golden Member
Feb 27, 2009
As far as Visual Studio goes, the time-consuming process (compiling, mostly) is heavily multi-threaded.
You ignored the whole LTCO part, didn't you? Because if you enable it, you'll notice that linking suddenly takes an awful lot of time (well, technically the parts that take the extra time are still mostly compiling), and it isn't especially well multithreaded.
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
You ignored the whole LTCO part, didn't you? Because if you enable it, you'll notice that linking suddenly takes an awful lot of time (well, technically the parts that take the extra time are still mostly compiling), and it isn't especially well multithreaded.

You didn't use a clear term. You might want to explain what LTCO is. Are you referring to LTCG?

Again, the most time-consuming process is compiling, and that's completely multi-threaded. You can nitpick one or two things, but it doesn't change things: we're talking about ~20 applications.
 

Voo

Golden Member
Feb 27, 2009
You didn't use a clear term. You might want to explain what LTCO is. Are you referring to LTCG?

Again, the most time-consuming process is compiling, and that's completely multi-threaded.
Oops, yeah, LTCG. I used the correct one before, though (generation instead of optimization, oh well). And again, that's not true. If you use LTCG + PGO on a large project, most of the time is spent linking, and that is sadly not especially well multithreaded.

So basically the only applications that are completely multithreaded are encoding/compressing (and I wonder how long we'll keep using the CPU for that anyhow), the usual parallel stuff; for everything else the picture is more complex. My own Photoshop work, as a good example, is sadly limited to one core for a rather large portion of the time, and it uses the GPU for lots of formerly CPU-intensive stuff.
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
Oops, yeah, LTCG. I used the correct one before, though (generation instead of optimization, oh well). And again, that's not true. If you use LTCG + PGO on a large project, most of the time is spent linking, and that is sadly not especially well multithreaded.

So basically the only applications that are completely multithreaded are encoding/compressing (and I wonder how long we'll keep using the CPU for that anyhow), the usual parallel stuff; for everything else the picture is more complex. My own Photoshop work, as a good example, is sadly limited to one core for a rather large portion of the time, and it uses the GPU for lots of formerly CPU-intensive stuff.

Even if you were to take out Visual Studio and Photoshop and place them as being both single- and multi-threaded, it wouldn't change the picture any. What are you arguing here, really? It still leaves Xmpeg + DivX, x264 HD 2nd Pass, 3dsmax 9, Cinebench MT, POV-Ray, 7-zip, Sorenson Flash Video Creation, and Microsoft Excel (math calculations and simulations).

Most applications are still mildly multi-threaded and multi-threaded... :|
 

Voo

Golden Member
Feb 27, 2009
I'm arguing that, apart from a handful of applications (encoding, compressing, ...), most consumer applications aren't especially heavily multi-threaded, i.e. they don't scale especially well beyond 2-4 cores at the moment, and the most useful applications in your "heavily multithreaded" list shouldn't be put in it without a big asterisk. That gives a much more realistic real-life picture, and it will stay true for some time to come: even programs that can use lots of cores for some tasks still have large parts that aren't particularly well multi-threaded.


E.g., Excel is only truly multi-threaded if you limit yourself to Monte Carlo simulations and similar stuff.
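Monte Carlo simulation is the textbook embarrassingly parallel case: every sample is independent, so each thread can draw its own points and the threads only meet at the final sum. A minimal sketch with POSIX threads that estimates π this way; the xorshift64 generator, the seeds and the names `mc_task` and `estimate_pi` are illustrative assumptions and say nothing about how Excel implements its calculations.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 64

typedef struct {
    uint64_t seed;   /* private RNG state: threads share nothing while sampling */
    long samples;    /* points this thread draws */
    long hits;       /* result: points that fell inside the quarter circle */
} mc_task;

/* xorshift64: a tiny PRNG, good enough for a demo, not for serious statistics. */
static uint64_t xorshift64(uint64_t *s) {
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

/* Uniform double in [0, 1) built from the top 53 bits. */
static double uniform01(uint64_t *s) {
    return (double)(xorshift64(s) >> 11) * (1.0 / 9007199254740992.0);  /* 2^53 */
}

static void *mc_worker(void *arg) {
    mc_task *t = arg;
    long hits = 0;
    for (long i = 0; i < t->samples; i++) {
        double x = uniform01(&t->seed);
        double y = uniform01(&t->seed);
        if (x * x + y * y < 1.0)
            hits++;
    }
    t->hits = hits;   /* each thread writes only its own task: no locks needed */
    return NULL;
}

/* Estimates pi with nthreads independent workers; only the final sum is serial. */
double estimate_pi(int nthreads, long samples_per_thread) {
    assert(nthreads >= 1 && nthreads <= MAX_THREADS && samples_per_thread > 0);
    pthread_t tid[MAX_THREADS];
    int started[MAX_THREADS];
    mc_task tasks[MAX_THREADS];

    for (int i = 0; i < nthreads; i++) {
        /* Odd constant times a nonzero value below 2^64 is never 0 mod 2^64,
           so every seed is distinct and nonzero (xorshift needs a nonzero state). */
        tasks[i].seed = 0x9E3779B97F4A7C15ULL * (uint64_t)(i + 1);
        tasks[i].samples = samples_per_thread;
        tasks[i].hits = 0;
        started[i] = pthread_create(&tid[i], NULL, mc_worker, &tasks[i]) == 0;
        if (!started[i])
            mc_worker(&tasks[i]);   /* fall back to running this share inline */
    }

    long total = 0;
    for (int i = 0; i < nthreads; i++) {
        if (started[i])
            pthread_join(tid[i], NULL);
        total += tasks[i].hits;
    }
    return 4.0 * (double)total / ((double)nthreads * (double)samples_per_thread);
}
```

With a few hundred thousand samples per thread the estimate should typically land within about 0.01 of π. The serial part (starting threads and adding up nthreads counters) is tiny, which is why this kind of workload scales almost linearly with cores, unlike the spreadsheet recalculation and linking cases discussed above.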
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
I'm arguing that, apart from a handful of applications (encoding, compressing, ...), most consumer applications aren't especially heavily multi-threaded, i.e. they don't scale especially well beyond 2-4 cores at the moment, and the most useful applications in your "heavily multithreaded" list shouldn't be put in it without a big asterisk. That gives a much more realistic real-life picture, and it will stay true for some time to come: even programs that can use lots of cores for some tasks still have large parts that aren't particularly well multi-threaded.


E.g., Excel is only truly multi-threaded if you limit yourself to Monte Carlo simulations and similar stuff.

Your point is...? I already said most applications are mildly multi-threaded (use two to four cores/threads) or multi-threaded, which still holds true.

Even if we were to say (erroneously) that, as you say, Excel, Photoshop, and Visual Studio are just as single-threaded as they are multi-threaded, we'd have this new list:

Single-threaded: iTunes, LAME, Photoshop, Visual Studio, Excel, Cinebench ST

Mildly multi-threaded: almost all games, Windows Media Encoder 9, x264 1st Pass, LAME-MT, Blender 3D, PAR2 MT Decompression, WinRAR 3.80

Multi-threaded: Photoshop, Xmpeg + DivX, x264 HD 2nd Pass, 3dsmax 9, Cinebench MT, POV-Ray, 7-zip Compression/Decompression, Visual Studio 2008, Sorenson Flash Video Creation, Microsoft Excel

That's six single-threaded programs, seven mildly multi-threaded programs, and ten multi-threaded programs. And gaming in reality encompasses more than 20 mildly multi-threaded applications, so I'm giving the worst-case scenario.

So we go back to where I was before: most applications are mildly multi-threaded or multi-threaded. Given that Sandy Bridge does great at single-threaded, great at mildly multi-threaded and good at multi-threaded work, while Bulldozer will probably do badly at single-threaded, mediocre at mildly multi-threaded, and great at multi-threaded work, I think most enthusiasts will choose to go with Sandy Bridge. It just has more balance.

There's a reason why AMD needed so many revisions and got so many delays and tried desperately to increase clock speeds: so it's not extremely embarrassed in anything that's not multi-threaded. AMD wouldn't be putting four modules on these unless they absolutely needed to. Again, I smell another Core i7-860 vs Phenom II X6 1090T coming up, with the exception that Bulldozer should have much better power consumption than K10.5.
 

NostaSeronx

Diamond Member
Sep 18, 2011
Here we go Opteron dogfights ;)


Edit: the BDs are ES (engineering samples) again

4 x Opteron MC 6134 (8 Cores):
http://browse.geekbench.ca/geekbench2/view/405187

2 x Opteron Interlagos (16 Cores):
http://browse.geekbench.ca/geekbench2/view/484217

2 x Opteron MC 6180 (12 Cores):
http://browse.geekbench.ca/geekbench2/view/402195

2 x Opteron Interlagos x12 (12 Cores):
http://browse.geekbench.ca/geekbench2/view/483705

In most of those benchmarks, single-threaded results go up while multithreaded results go down.

Also, the last two are invalid comparisons, since they pit a 64-bit Magny-Cours run against a 32-bit Interlagos run.
 

Abwx

Lifer
Apr 2, 2011
Even if you were to take out Visual Studio and Photoshop and place them as being both single- and multi-threaded, it wouldn't change the picture any. What are you arguing here, really? It still leaves Xmpeg + DivX, x264 HD 2nd Pass, 3dsmax 9, Cinebench MT, POV-Ray, 7-zip, Sorenson Flash Video Creation, and Microsoft Excel (math calculations and simulations).

Most applications are still mildly multi-threaded and multi-threaded... :|
You said it, "are still", but as time goes by, is there a high probability that they will be even less multithreaded, or is it likely that parallelization will improve? :)
 

zlejedi

Senior member
Mar 23, 2009
By the time almost everything is multi-threaded, will we still be using BD/SB, or rather some nice 14nm chip with 16 cores bought for $100? ;)
 

Zstream

Diamond Member
Oct 24, 2005
Code:
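/* Recursive divide-and-conquer matrix multiply: c += a * b restricted to
   rows [i0,i1), columns [j0,j1) and inner dimension [k0,k1) of the product.
   Each call halves the largest dimension until all three are at most LEAFSIZE. */
void matrixMult(int **a, int **b, int **c, unsigned i0, unsigned i1, unsigned j0, unsigned j1, unsigned k0, unsigned k1) {
	unsigned di = i1 - i0, dj = j1 - j0, dk = k1 - k0;
	const unsigned LEAFSIZE = 8; /* hardcoded here for simplicity; 8 obviously isn't an especially good default on modern CPUs */
	if (di >= dj && di >= dk && di > LEAFSIZE) {
		/* Split the rows: the two calls write disjoint rows of c. */
		unsigned im = i0 + di / 2;
		matrixMult(a, b, c, i0,im,j0,j1,k0,k1);
		matrixMult(a, b, c, im,i1,j0,j1,k0,k1);
	} else if (dj >= dk && dj > LEAFSIZE) {
		/* Split the columns: the two calls write disjoint columns of c. */
		unsigned jm = j0 + dj / 2;
		matrixMult(a, b, c, i0,i1,j0,jm,k0,k1);
		matrixMult(a, b, c, i0,i1,jm,j1,k0,k1);
	} else if (dk > LEAFSIZE) {
		/* Split the inner dimension: both calls accumulate into the same block of c. */
		unsigned km = k0 + dk / 2;
		matrixMult(a, b, c, i0,i1,j0,j1,k0,km);
		matrixMult(a, b, c, i0,i1,j0,j1,km,k1);
	} else {
		/* Leaf: plain triple loop over a small block. */
		for (unsigned i = i0; i < i1; i++)
			for (unsigned j = j0; j < j1; j++)
				for (unsigned k = k0; k < k1; k++)
					c[i][j] += a[i][k] * b[k][j];
	}
}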

*sigh*

You can't thread this code... It has to finish. The code has to finish on the same thread it was started on. If you didn't do that, you would have to calculate the answer to

if (di >= dj && di >= dk && di > LEAFSIZE) {
	unsigned im = i0 + di / 2;
	matrixMult(a, b, c, i0,im,j0,j1,k0,k1);
	matrixMult(a, b, c, im,i1,j0,j1,k0,k1);

and take the answer and start a new thread. But that's useless.
 

Zstream

Diamond Member
Oct 24, 2005
By the time almost everything is multi-threaded, will we still be using BD/SB, or rather some nice 14nm chip with 16 cores bought for $100? ;)

Quite possible; however, I'd look at it like a 4850 GPU, if it offers good performance. The 4850 is still a decent GPU.
 

Voo

Golden Member
Feb 27, 2009
*sigh*

You can't thread this code... It has to finish. The code has to finish on the same thread it was started on.
Wait, you're claiming that all those fork/join frameworks out there that make it trivial to parallelize that code don't exist? Heck, even Java 7 ships with one, which is the best sign that the technology stopped being new about a decade ago (when did Leiserson start with that stuff? Seems like a long time ago).

But if you've got no idea, just assume that the code is trivial to parallelize and tell us how to improve it so that HT won't bring any performance gains, because otherwise the code is obviously unreasonable and horribly written.
 

Cogman

Lifer
Sep 19, 2000
*sigh*

You can't thread this code... It has to finish. The code has to finish on the same thread it was started on. If you didn't do that, you would have to calculate the answer to

if (di >= dj && di >= dk && di > LEAFSIZE) {
	unsigned im = i0 + di / 2;
	matrixMult(a, b, c, i0,im,j0,j1,k0,k1);
	matrixMult(a, b, c, im,i1,j0,j1,k0,k1);

and take the answer and start a new thread. But that's useless.
What?

I take it you have never actually done any multithreaded programming. This is an IDEAL problem for multithreading (in fact, it is COMMONLY multithreaded; do a Google search for multithreaded matrix multiplication).

In fact, most forking recursion like this is a pretty good candidate for multithreading. So long as the data from the first function doesn't change the outcome of the second, you can thread it.
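The fork/join point can be made concrete with the matrixMult code quoted earlier in the thread. A minimal sketch using raw POSIX threads instead of a framework such as Cilk or Java 7's ForkJoinPool: at an i or j split the two halves write disjoint parts of c, so one half runs on a new thread while the current thread does the other, and then the two join. The k split stays sequential, because both halves read and update the same c[i][j] entries and running them concurrently would be a data race. `matrixMultParallel`, `mm`, `mm_args`, `fork2` and the `SPAWN_DEPTH` cap are hypothetical names for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define LEAFSIZE 8      /* leaf size from the original post */
#define SPAWN_DEPTH 3   /* hypothetical cap: fork only 3 levels deep, so at most 8 threads run at once */

/* Arguments for one recursive call, so it can be handed to pthread_create. */
typedef struct {
    int **a, **b, **c;
    unsigned i0, i1, j0, j1, k0, k1;
    int depth;
} mm_args;

static void mm(int **a, int **b, int **c, unsigned i0, unsigned i1,
               unsigned j0, unsigned j1, unsigned k0, unsigned k1, int depth);

static void *mm_thread(void *p) {
    mm_args *x = p;
    mm(x->a, x->b, x->c, x->i0, x->i1, x->j0, x->j1, x->k0, x->k1, x->depth);
    return NULL;
}

/* Fork: run lo on a new thread and hi on this one, then join. Runs both
   here instead if we are already deep enough or thread creation fails. */
static void fork2(mm_args *lo, mm_args *hi, int depth) {
    pthread_t t;
    if (depth < SPAWN_DEPTH && pthread_create(&t, NULL, mm_thread, lo) == 0) {
        mm_thread(hi);
        pthread_join(t, NULL);   /* join: both halves of c are finished past this point */
    } else {
        mm_thread(lo);
        mm_thread(hi);
    }
}

static void mm(int **a, int **b, int **c, unsigned i0, unsigned i1,
               unsigned j0, unsigned j1, unsigned k0, unsigned k1, int depth) {
    unsigned di = i1 - i0, dj = j1 - j0, dk = k1 - k0;
    if (di >= dj && di >= dk && di > LEAFSIZE) {
        unsigned im = i0 + di / 2;   /* rows [i0,im) and [im,i1) of c are disjoint */
        mm_args lo = {a, b, c, i0, im, j0, j1, k0, k1, depth + 1};
        mm_args hi = {a, b, c, im, i1, j0, j1, k0, k1, depth + 1};
        fork2(&lo, &hi, depth);
    } else if (dj >= dk && dj > LEAFSIZE) {
        unsigned jm = j0 + dj / 2;   /* columns [j0,jm) and [jm,j1) of c are disjoint */
        mm_args lo = {a, b, c, i0, i1, j0, jm, k0, k1, depth + 1};
        mm_args hi = {a, b, c, i0, i1, jm, j1, k0, k1, depth + 1};
        fork2(&lo, &hi, depth);
    } else if (dk > LEAFSIZE) {
        /* Both halves update the SAME c[i][j] entries: keep them sequential. */
        unsigned km = k0 + dk / 2;
        mm(a, b, c, i0, i1, j0, j1, k0, km, depth);
        mm(a, b, c, i0, i1, j0, j1, km, k1, depth);
    } else {
        for (unsigned i = i0; i < i1; i++)
            for (unsigned j = j0; j < j1; j++)
                for (unsigned k = k0; k < k1; k++)
                    c[i][j] += a[i][k] * b[k][j];
    }
}

/* c += a * b for n x n matrices stored as arrays of row pointers. */
void matrixMultParallel(int **a, int **b, int **c, unsigned n) {
    assert(a != NULL && b != NULL && c != NULL);
    mm(a, b, c, 0, n, 0, n, 0, n, 0);
}
```

Creating a thread per fork keeps the sketch short; a real fork/join runtime would instead use a fixed pool of workers with work stealing, which is what makes deep recursion like this cheap to parallelize.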
 

zlejedi

Senior member
Mar 23, 2009
Quite possible; however, I'd look at it like a 4850 GPU, if it offers good performance. The 4850 is still a decent GPU.

Well, yes, but I sold mine two years ago for a GTX 260, then a year ago I sold the 260 to get a GTX 470, and now I'm waiting for the next-gen 28nm cards to sell the 470 and... ;)
 

StrangerGuy

Diamond Member
May 9, 2004
Quite possible; however, I'd look at it like a 4850 GPU, if it offers good performance. The 4850 is still a decent GPU.

If I were AMD and had a 4850-esque BD chip that's 90% as good as a 2500K for $150-165, I'd be previewing the crap out of it. The only reason AMD hasn't is that they know it's going to disappoint.
 