Is Ivy Bridge the answer to AMD Bulldozer?

Page 7

RobertPters77

Senior member
Feb 11, 2011
480
0
0
They are not directly comparable. AMD's performance per transistor, per watt, and per mm^2 of die area is currently much superior to Nvidia's. It's especially ridiculous when the full Fermi GF100 die has 3.2 billion transistors at 529 mm^2 and runs its shaders at twice the speed of the rest of the chip. Over on the AMD side of things we have Cypress XT, with about 2.2 billion transistors on a 334 mm^2 die, running a single clock domain over the entire chip at a much lower speed than Nvidia's shaders need to run at.
Didn't the Folding@home folks report that ATI/AMD GPUs are barely using 50% of their available shaders when folding?


I would liken AMD's graphics to many Japanese cars: highly efficient, well engineered for their intended purpose. Nvidia's graphics are more like an American car: heavy and large-muscled, with a very high theoretical capability through sheer brute force, but at the cost of fuel and efficiency.
Wait. So fewer shaders that do more = power-hungry muscle car, and more shaders that do less = lean, efficient green machine?


So you find all VLIW-based designs disappointing? No offence, but if after your thread on this issue you don't understand the differences and how it really doesn't matter, then you have some comprehension issues.
You're right, I don't understand. But I want to. No one can explain to me why AMD's VLIW design doesn't match and/or beat Nvidia's shader design without the same old 'It wasn't designed that way!' retort. I don't think AMD cards are bad, though.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
The number of shader units is a really terrible way to measure performance, especially when the architectures are so different.

One could also say that AMD's GPU architecture is akin to having 1600 cores with no SIMD, while Nvidia's GPU is using 400 cores with 4-wide SIMD. The results are the same.

Another way of putting AMD's GPU architecture is 320 cores, each 5-wide. Which is what it actually is. A bigger number makes marketing easier, though.

(Notice the numbers used for comparison are entirely made up. Calling each shader a core is also being generous.)

Makes no difference to us because the performance ends up roughly in the same class.
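
To put some rough real numbers on that (the public paper specs of the actual chips, quoted from memory, rather than the made-up figures above):

Code:
HD 5870 (Cypress XT): 1600 ALUs x 2 FLOPs (MAD) x 0.850 GHz ≈ 2.72 TFLOPS peak
GTX 480 (GF100):       480 ALUs x 2 FLOPs (FMA) x 1.401 GHz ≈ 1.35 TFLOPS peak

Twice the paper FLOPS on the AMD side, yet gaming performance ends up in the same class, which is exactly why counting shader units tells you so little.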
 

iSeeker

Junior Member
Mar 14, 2011
6
0
0
I will agree if we're only talking about one application running at any given time, but what if we run two apps simultaneously? 16 cores will be faster than 8, and what if we run three or four apps simultaneously, etc.?

AMD and Intel don’t make CPUs for Desktop/Laptop use only

Agree for server purposes, but that's not my interest.

For a lot of desktop users, most multi-tasking is about small utilities that run in the background. And often only one UI has focus, with only one main task active.
Even more so for gamers, where everything else is shut down.

My main PC usage is running compilers and console apps that do heavy 2D calculations and maths (stochastic screening).
My perception of PC performance is the duration of the heaviest thread that I'm waiting on for results.
All I read about BD is hints towards more cores compared to the competition, and that worries me a bit.
For me, IB's use of 22 nm sounds more aimed at increasing single-core performance, even if it is merely a clock speed increase.

Furthermore, it's going to be 1155 compatible, so it wouldn't stop me from already investing in an i5 2500K based system now, a system at a price I can afford.
That's what's a bit strange to me: BD doesn't even try to convince me otherwise. But I could be biased by non-AMD info.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136

Agree for server purposes, but that's not my interest.

For a lot of desktop users, most multi-tasking is about small utilities that run in the background. And often only one UI has focus, with only one main task active.
Even more so for gamers, where everything else is shut down.

My main PC usage is running compilers and console apps that do heavy 2D calculations and maths (stochastic screening).
My perception of PC performance is the duration of the heaviest thread that I'm waiting on for results.
All I read about BD is hints towards more cores compared to the competition, and that worries me a bit.
For me, IB's use of 22 nm sounds more aimed at increasing single-core performance, even if it is merely a clock speed increase.

Furthermore, it's going to be 1155 compatible, so it wouldn't stop me from already investing in an i5 2500K based system now, a system at a price I can afford.
That's what's a bit strange to me: BD doesn't even try to convince me otherwise. But I could be biased by non-AMD info.


Multitasking was not really possible a few years ago on the desktop (not to mention laptops), and most users have learned to use their systems in single-task mode (one app at a time).

With multi-core CPUs, desktop systems are becoming more capable of multitasking, and users are only now starting to learn that they can run more than one app at the same time.

We can decode/transcode and play games or run more apps at the same time now, and people will have to learn to do that in order to fully use their systems. Multitasking is how we need to learn to use our desktops/laptops, not waiting for multithreaded programs to do the entire job.

Bulldozer will not only have more cores than SB (1155), but its IPC will also be higher than Phenom II's. It might not have the same IPC as SB, but the combination of higher core count and higher IPC than last-gen AMD processors will make BD a very competitive processor at 32nm.

And I have just learned that there will be an AMD 980G chipset for people that need integrated graphics.

Edit: 980G and not 990G
 

iSeeker

Junior Member
Mar 14, 2011
6
0
0
We can decode/transcode and play games or run more apps at the same time now, and people will have to learn to do that in order to fully use their systems. Multitasking is how we need to learn to use our desktops/laptops, not waiting for multithreaded programs to do the entire job.

I agree that multi-tasking only recently became real multi-tasking with the introduction of multi-core CPUs.
But I don't quite agree that humans need to become more multi-tasking aware just because their new PC hardware asks for it.
Humankind (especially men :) is not very good at multitasking, and that is why user interface systems have only one control in focus at any given time.
All we do with our PC is sequential context switching from one app to another, and even at that we are not so strong ('what was I doing again with that app?').

Another objection is causality: when I start an app for a specific task, I can't start my next task before I know the results of the first. And my PC work is mainly based on that.

So for me, multi-tasking should be an internal thing that I'm not bothered with too much. And I think current system software (OS, UI, apps, ...) doesn't have that intelligence yet. Right now I'm helped more by strong single-threaded performance.
 

podspi

Golden Member
Jan 11, 2011
1,982
102
106
Right now I'm helped more by strong single-threaded performance.

It is hard to benchmark that sort of thing... Are you using a certain program or is this all custom-written stuff?

Honestly, you should always get whatever works best for you. It sucks if you have a company preference and they're just not selling you what you need, but Intel (from a product point of view) is not a bad company. Their behavior hasn't always been the best :thumbsdown: but supposedly they've paid their debt to society.

When I needed a portable computer (years ago) I bought Intel, even though I preferred AMD, because the C2D was much, much, much better than the Turion. When I needed a work machine, everything I do is embarrassingly parallel, so I picked up an X6.
 

iSeeker

Junior Member
Mar 14, 2011
6
0
0
Honestly, you should always get whatever works best for you
That's right, and maybe my posts are too specific to my own needs.
Finding a benchmark for me is not difficult: I just run my program that relaxes a 2D field of points into a nearly perfect Poisson-disc-spaced distribution. I know I should re-engineer my (highly recursive) implementation towards more core usage, but I have no Einstein brain, and I don't have enough resources and time to just concentrate on this task.
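
For the curious, here is a minimal toy sketch of what one pass of such a relaxation could look like (this is not my actual code; the point count, target radius, and brute-force O(N^2) pair loop are purely illustrative):

Code:
/* Toy sketch: one repulsion-based relaxation pass over a 2D point set,
   nudging pairs of points apart until their spacing approaches a target
   Poisson-disc radius R. Real stochastic-screening code would use a
   spatial grid instead of the brute-force O(N^2) pair loop. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define N    1024   /* number of points (illustrative)           */
#define R    0.03   /* target Poisson-disc radius (illustrative) */
#define STEP 0.25   /* fraction of the overlap corrected per pass */

typedef struct { double x, y; } Pt;

static void relax_pass(Pt *p, int n)
{
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            double dx = p[j].x - p[i].x;
            double dy = p[j].y - p[i].y;
            double d  = sqrt(dx * dx + dy * dy);
            if (d > 0.0 && d < R) {
                /* push the pair apart along the line connecting them */
                double push = STEP * (R - d) / d;
                p[i].x -= dx * push; p[i].y -= dy * push;
                p[j].x += dx * push; p[j].y += dy * push;
            }
        }
    }
}

int main(void)
{
    Pt *p = malloc(N * sizeof *p);
    for (int i = 0; i < N; i++) {              /* random start in the unit square */
        p[i].x = rand() / (double)RAND_MAX;
        p[i].y = rand() / (double)RAND_MAX;
    }
    for (int pass = 0; pass < 200; pass++)     /* iterate until roughly settled */
        relax_pass(p, N);
    printf("first point ends up at (%f, %f)\n", p[0].x, p[0].y);
    free(p);
    return 0;
}

The outer loop over points is exactly the kind of thing that could be split across cores, which is the re-engineering I haven't found the time for.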
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
My perception of PC performance is the duration of the heaviest thread that I'm waiting on for results.
All I read about BD is hints towards more cores compared to the competition, and that worries me a bit.
The solution could be to have HW-based thread prioritization; without it, any other, less important thread would wake up some HW, maybe some additional cores, taking away potential performance (which lies in the free TDP headroom available to turbo) from the important thread.

Alternatively, efficiently implemented turbo modes are a sound solution. Many threads -> let them run on multiple cores for throughput. Few or single threads -> switch off unused cores and use the free power budget for turbo to make the remaining cores faster. Twice the power could actually lead to a 30-40% higher clock (with voltage adjusted accordingly). Reduced sharing of the IMC and L3 cache adds to that.
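
A rough back-of-envelope for that figure (my own numbers, simply assuming dynamic power scales as f*V^2):

Code:
P_dyn ~ f * V^2
voltage held flat:              f2/f1 = P2/P1          = 2.00  (theoretical upper bound)
voltage rising linearly with f: f2/f1 = (P2/P1)^(1/3)  ≈ 1.26  (lower bound)
partial voltage bump:           somewhere in between   ≈ 1.3-1.4

So with a doubled per-module power budget and the modest voltage increase real silicon needs, 30-40% more clock is a plausible middle ground.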

And it's not like the BD core is a small core. Actually, the module (the originally designed "core") is slightly larger than an SB core. It's just that AMD went for dedicated integer and L/S resources for the second thread, while on SB the whole core is shared between the two threads.
See http://www.xtremesystems.org/forums/showpost.php?p=4760114&postcount=173 for a comparison.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
That's right, and maybe my posts are probably too much specific for my needs.
Finding a benchmark for me is not difficult: I just run my program that relaxes a 2D field of points into a nearly perfect poisson disc spaced distribution. I know I should re-engeneer my (highly recursive) implementation towards more core usage, but I have no Einstein brain, and I don't have enough resources and time to just concentrate on this task.
This will be an interesting use case on BD. I assume that you're using floating point and you seem to compile it for your target hardware. So your program might benefit at least from
a) a beefed up FPU with FMA instructions (see the small sketch after this list)
b) improved, decoupled branch prediction, which might handle the recursive calls well
c) 8MB of L3 to be used by your program + 2 MB of L2 (not shared, if only one thread)
d) an improved NB+IMC (they said 50%) since you seem to work with large working sets
e) improved turbo based on power consumption estimation + power gated modules, so the module used by your program should easily stay in the highest P-state. The already mentioned 500MHz all core boost is not the highest P-state BTW.
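
To illustrate (a) with a toy example (hypothetical code, not from any real project, and the gcc flag is just one plausible way to target BD's FMA4 units): inner-loop distance math like the kind in your relaxation code maps nicely onto fused multiply-adds when written with C99 fma(), e.g. built with gcc -O2 -mfma4:

Code:
/* Toy example: squared-distance / overlap test from a point-relaxation loop,
   written so the dx*dx + dy*dy term can become a single fused multiply-add. */
#include <math.h>
#include <stdio.h>

static double overlap(double dx, double dy, double r)
{
    double d = sqrt(fma(dx, dx, dy * dy));  /* dx*dx + dy*dy via one fused mul-add */
    return (d < r) ? (r - d) : 0.0;         /* how far the pair must be pushed apart */
}

int main(void)
{
    printf("%f\n", overlap(0.01, 0.02, 0.05));
    return 0;
}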
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,139
4,010
136
That post from XS really shows how much bigger the int core is on Bulldozer compared to STARS: with 1/4 of the cache, the core is 16% bigger. Just eyeballing it, it looks like minus the L1D, the BD int core is around 45% larger than a STARS core.

Here is my rough eyeballing math, done at work :D

Llano core: 3.3 mm^2
Llano L1D: a little under 1/3 of the core size, so ~1.0 mm^2

BD int core: 3.84 mm^2
BD L1D: about 1/2 the size of Llano's L1D, so ~0.5 mm^2

so (3.84 - 0.5) / (3.3 - 1.0) = 3.34 / 2.3 = 1.45

Is there any more detailed analysis of the SB core than in that link?
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
It's critical you include the L1 caches in core measurements. That's how important they are to the function of a particular architecture.

So the correct core comparison is

Sandy Bridge: minus the L2
Bulldozer: minus the one "core"
K10: of course this is just the L1 included core

16.5mm2 vs. 15.6mm2 vs. 9.69mm2 respectively

Bobcat is not comparable because it's not even on the same process generation.
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,139
4,010
136
It's critical you include the L1 caches in core measurements. That's how important they are to the function of a particular architecture.

So the correct core comparison is

Sandy Bridge: minus the L2
Bulldozer: minus the one "core"
K10: of course this is just the L1 included core

16.5mm2 vs. 15.6mm2 vs. 9.69mm2 respectively

Bobcat is not comparable because it's not even on the same process generation.

I know, but you get an idea of how much more logic there is in the BD int core compared to STARS (that was my only point). It's also interesting that even though the BD L1D is about 1/2 the physical size of Llano's, it's only 1/4 of the size in bytes, with higher latency but higher associativity. Also, is the L1D in STARS exclusive of the L2? (I think it is.)
 

Morg.

Senior member
Mar 18, 2011
242
0
0
Your question is better suited for Marty McFly or Christopher Lloyd in all honesty.

One only has to look at the sheer amount of hardware AMD has to throw at Intel just to compete. 8 CPUs??? Seriously???? You really need eight cores to be competitive, AMD!?!?!?

We'll have to wait and see, as I think comparing it to Sandy Bridge is a bit of a stretch in itself.

This is obvious nonsense, my dear.
Intel needs HT to compete with AMD's number of cores?

What matters in the end is the following:

Native thread count (HT does NOT count as it's NOT as effective, but you *could* count HT threads if your implementation requires massive threading)
Max single-thread power (HT makes you lose a f*ton of that, yes it does ;) )
Max multithread power (AMD and Intel have always been tied at every price point they fight for, with AMD paying for the spot and getting it)
Energy efficiency (those CPUs are not designed to fit in your gaming rig, they are designed to be server CPUs, and Intel and AMD get a share of the high-end desktop market out of cheaper versions of these server CPUs)
Scalability (again, these are not toys, they're server CPU derivatives)

And, so far:
Native thread count: HT Intel > AMD > non-HT Intel
Max single-thread power: non-HT Intel > AMD > HT Intel
Max multithread power: tied @ same price
Energy efficiency: Intel > AMD for 1 and 2 socket applications
Scalability: AMD >>>> Intel (yes, Intel never went past dual socket for some reason)


The future is in parallel computing, and nVidia will kill both AMD and Intel on that one, you can be sure.

In that sense, I would place the bet on AMD this time, as they have numerous times shown they are more ready for parallel than Intel.

(AMD dual core vs Core 2 Duo)
(Phenom real quad core vs Core 2 Quad)
(Core i7 copying the multicore architecture first shown by AMD)
(Bulldozer, another innovative architecture for a multicore die)

Now will it make any difference ... who knows.

You have to remember that Intel only has one single thing now: the Core 2 core design, which it's been stretching from Core 2 Duo to Sandy Bridge and probably Ivy Bridge too.

At some point, that core design advantage is going to disappear, and all the other factors are going to matter more.

And while they are able to copy multicore designs from AMD, they still release them much later (ideas from the Phenom 1 were not integrated into Intel products until the i7 - read: actual native quad-core architecture, IMC, HT (HyperTransport rebranded as QuickPath, but who cares)).

The only reason I would bet on Intel is because I know they don't mind getting their hands dirty to keep the business running :)
 

996GT2

Diamond Member
Jun 23, 2005
5,212
0
76
In that sense, I would place the bet on AMD this time, as they have numerous times shown they are more ready for parallel than Intel.

(AMD dual core vs Core 2 Duo)
(Phenom real quad core vs Core 2 Quad)
(Core i7 copying the multicore architecture first shown by AMD)
(Bulldozer, another innovative architecture for a multicore die)

"More ready"=worse performance?

Do I need to remind you that:

1) Core 2 Duo annihilated the Athlon X2s of the era in performance
2) Core 2 Quad annihilated Phenom I Quads and is still faster than a Phenom II Quad clock for clock
3) Core i7 (Nehalem) annihilates anything AMD has out right now

And we haven't even mentioned Sandy Bridge yet.

Why do you feel like placing the bets in AMD's favor again??
 

Morg.

Senior member
Mar 18, 2011
242
0
0
"More ready"=worse performance?

Do I need to remind you that:

1) Core 2 Duo annihilated the Athlon X2s of the era in performance
2) Core 2 Quad annihilated Phenom I Quads and is still faster than a Phenom II Quad clock for clock
3) Core i7 (Nehalem) annihilates anything AMD has out right now

And we haven't even mentioned Sandy Bridge yet.

Why do you feel like placing the bets in AMD's favor again??

You sir iz correct.
As I said: core design on Core 2 -> Core i7 >>> core design on anything AMD since Barton, more or less.

On the other side: multicore die architecture design on AMD from the X2 to Phenom II >>>> multicore die architecture design from Core 2 to Core i7 (well, not really anymore, since the i7 is more or less the same die architecture as Phenom II).

And hey, I've got an OC'd E6600 at home, I know it's better.

I feel like placing the bet on AMD because the deeper you go into parallelism, the more the multicore architecture supplants the core architecture.

Also, it's been a while since AMD announced Bulldozer, and that implies (for me) a new core design, which was their problem when racing against Intel's Core 2 cores.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
"More ready"=worse performance?

Do I need to remind you that:

1) Core 2 Duo annihilated the Athlon X2s of the era in performance
2) Core 2 Quad annihilated Phenom I Quads and is still faster than a Phenom II Quad clock for clock
3) Core i7 (Nehalem) annihilates anything AMD has out right now

And we haven't even mentioned Sandy Bridge yet.

Why do you feel like placing the bets in AMD's favor again??

Have you actually looked at any heavily threaded, embarrassingly parallel benchmarks? AMD usually wins those, even at the high end (Magny-Cours vs Xeon). Also, they have better CPU to CPU communication in their processors, which is why they have 4P servers, and Intel only has 2P servers. AMD processors scale better with more cores as well. IDC has shown this in his posts in the past. So yes, AMD is currently better at scaling than Intel. Although given the resources of both, I would not be surprised if Intel caught up, and quickly, in that area.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
I know, but you get an idea of how much more logic there is in the BD int core compared to STARS (that was my only point). It's also interesting that even though the BD L1D is about 1/2 the physical size of Llano's, it's only 1/4 of the size in bytes, with higher latency but higher associativity. Also, is the L1D in STARS exclusive of the L2? (I think it is.)
As I wrote on Twitter, you have to be careful with mAJORD's Llano core area analysis, since he didn't include some parts of the integer core:
[image: Llano-x86-Core.jpg]


At least the parts to the left of and above "Exec" should be included as well.
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
Have you actually looked at any heavily threaded, embarrassingly parallel benchmarks? AMD usually wins those, even at the high end (Magny-Cours vs Xeon).

And GPUs beat them both. Those heavily threaded, embarrassingly parallel applications are only a small piece of the market, which can be offloaded now.


Also, they have better CPU to CPU communication in their processors, which is why they have 4P servers, and Intel only has 2P servers. AMD processors scale better with more cores as well.

Are you sure about that? Nehalem-EX and the soon-to-be-released Westmere-EX are 8P servers:
http://www.xbitlabs.com/news/cpu/di..._Critical_Servers_Features_to_Mainstream.html
http://www.theregister.co.uk/2011/02/25/intel_westmere_ex_sandy_bridge_ep_xeons/
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Have you actually looked at any heavily threaded, embarrassingly parallel benchmarks? AMD usually wins those, even at the high end (Magny-Cours vs Xeon). Also, they have better CPU to CPU communication in their processors, which is why they have 4P servers, and Intel only has 2P servers. AMD processors scale better with more cores as well. IDC has shown this in his posts in the past. So yes, AMD is currently better at scaling than Intel. Although given the resources of both, I would not be surprised if Intel caught up, and quickly, in that area.

Been a while since I've compiled data with recent hardware, and of course the result is app dependent, but the following showcases what Martimus is speaking to:
[image: Euler3DBenchmarkScaling.gif]


Looking forward to comparing Sandy Bridge Xeon and Bulldozer thread scaling, single-socket as well as multi-socket. :thumbsup:

Gotta wait for the hardware and then the reviews of the hardware though. :(
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
Been a while since I've compiled data with recent hardware, and of course the result is app dependent, but the following showcases what Martimus is speaking to:

Would be interesting to see that chart with Beckton and Tukwila on there. :awe:
 

SickBeast

Lifer
Jul 21, 2000
14,377
19
81
It's really simple actually. If 8 cores outperform 4 cores, then more cores are preferable and that's the bottom line, end of story. Sure, you can pick a couple of corner cases where the i3 can outperform Thuban, but I'll guarantee you that I can pick more cases where Thuban demolishes the i3. So because you find one or two examples where the i3 is faster, you are suggesting that proves fewer cores are better? C'mon dude, I've no doubt you know better than that.
Please show me some games that benefit from 8 cores. You can't. OTOH, I can show you a ton of games that benefit from faster single-threaded performance.

Single-threaded performance is still very important, and I really hope that Bulldozer can compete with Intel when it comes out.