
Can AMD "rescue" the Bulldozer?

IMO the shared module design is the biggest drawback, so the best real fix would be a pretty dramatic change. Duplicate the missing portions on the 'cores' so that each core is complete in and of itself as far as FPU/INT.

The problem with the shared core/module design is that it is something you should ONLY resort to IF your problem is that you already have too much single-threaded IPC and not enough cores.

Taking already weak cores and making them even weaker by hobbling them and sharing components is a doomed scenario. Time to go cruise the Titanic through the N.Atlantic in January :|
 
I think they really need to work on the L3 latency. I'm not a designer, but it is getting killed by Intel in L2 and L3 latency. Go for a smaller and faster L3, save some die space, and make some more profit.
 
I do think it's about time AMD was able to get their cache running at core clock speed. The company fell flat after HyperTransport. HT development could have paralleled well with some serious cache and uncore improvements, but it seems they've only taken several baby steps forward. Even at the slow pace they've had, I think it's the main thing that may make a niche for BD Interlagos in servers: lots of memory channels combined with a good interprocessor connection.
 
Even if they do respin it, fix the performance, and adjust their out-to-lunch pricing, it still isn't going to be enough for some people to go back to AMD.

Remember, it's not just the performance that was a flop but the marketing as well. Personally, I can't overlook the constant lies: that IPC would increase (repeated long after AMD knew that was BS), that it would work on current AM3 boards (said in 2010, also BS), and the brutal attacks on people posting leaked benchmarks, claiming they were BS, spreading FUD, and in no way real (when they were true 90% of the time). Perhaps this is why they fired most of the marketing department, I don't know, but either way recovering from that will take as much work as fixing the CPU. I for one won't consider an AMD CPU again until their marketing department has a clean track record for 2-3 years straight with no BS.
 
Bulldozer in fact does work on some AM3 boards. I'm not sure if it's due to board design, or something as simple as board manufacturers releasing a BIOS that supports it. But ... some AM3 boards can run Bulldozer processors.
 
Well, there are some rumours that Bulldozer's L2, L3 and floating point are all working at half clock,
and that the Trinity "IPC boost" is pretty much just the L2 and floating point now running at full clock.
 
It was discussed prior to Bulldozer's release that some motherboards listed as AM3 actually had AM3+ sockets in them. There was a change made to the socket, but the motherboard and chipset remained the same, so it was still released as an AM3 motherboard. Basically they kept making the same motherboard but slapped an updated socket into it without officially changing any of the motherboard's specs. Bulldozer technically doesn't work in AM3. I think the easiest way to tell them apart was the color difference of the socket.

Can anyone else confirm this? Or am I wrong?
 
Well they used a new 32nm fabrication process and a new architecture, and still fail to beat their older processors. They're basically still in 2008.
 
If you have seen the programming guide for AMD Family 15h (Bulldozer), the IMUL is indeed working at half clock....
 
Well they used a new 32nm fabrication process and a new architecture, and still fail to beat their older processors. They're basically still in 2008.

This sums up Bulldozer pretty well.

They moved to 32nm, they had a whole new design, and still failed to improve over the old CPUs, let alone Intel's.

They couldn't even beat themselves LOL hahahah. AMD really did mess up big time with this.
 
BD probably needs to boost performance by 30-40% to be even remotely threatening to SB, much less the new Ivy that's coming up. I doubt that can be pulled off any time soon; even Intel only manages a 15-20% boost per new chip per year. A BD derivative that is 30-40% better will probably be at least a year away, and AMD's own presentation slides only expect a 15-20% boost per year. So it will take them about 2 more years to catch up with the current crop of Intel chips, and by then Intel will already have put out Haswell and its successor. With AMD's track record, I just hope they don't keep regressing; not regressing in performance/efficiency would already be doing pretty well.
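As a rough check on the catch-up arithmetic above, here is a minimal Python sketch. It assumes gains compound yearly and simply takes the 30-40% gap and the 15-20% yearly rate from the post as inputs. The function name and the leader_rate parameter are made up for illustration, and none of this is measured data.

```python
import math

def years_to_catch_up(gap, chaser_rate, leader_rate=0.0):
    """Years until the chaser matches the leader, with compounding yearly gains.

    gap:         leader's current advantage as a fraction (0.35 = 35% faster)
    chaser_rate: chaser's yearly improvement as a fraction
    leader_rate: leader's yearly improvement (0.0 = a fixed target)
    Returns None when the chaser never closes the gap.
    """
    if chaser_rate <= leader_rate:
        return None
    # Solve (1 + chaser_rate)^t = (1 + gap) * (1 + leader_rate)^t for t.
    relative_growth = (1 + chaser_rate) / (1 + leader_rate)
    return math.log(1 + gap) / math.log(relative_growth)

# Chasing today's chips (fixed target), midpoints of the ranges in the post.
print(years_to_catch_up(0.35, 0.175))
# Chasing a leader that keeps improving 10% per year (hypothetical rate).
print(years_to_catch_up(0.35, 0.175, leader_rate=0.10))
# Chasing a leader that improves just as fast: never catches up.
print(years_to_catch_up(0.35, 0.175, leader_rate=0.175))
```

Against a fixed target the midpoint case comes out a bit under two years, which lines up with the "about 2 more years" estimate. Against a leader that keeps improving, the time stretches out quickly or never arrives.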
 
My guess is that the current Bulldozer cores are pretty small. The decode isn't very wide: 4-wide shared between two integer cores. That comes out to 2-wide per core.

The old Phenom II/Lisbon CPUs were 3-wide and very likely had larger integer cores.

So my guess is that there is plenty of room to upgrade Bulldozer, assuming something else in the design isn't bottlenecking single-thread performance.

On the other hand, maybe AMD won't upgrade Bulldozer to wider decode and larger cores? Maybe they keep the design "small" overall and perfect it for low power mobile APU and many core server SKUs? (This makes a degree of sense to me considering AMD's process tech disadvantage)

Perhaps (for future higher single threaded performance designs) AMD will go back to Phenom II/Lisbon (or Llano) and build up from that 3-wide design? Maybe add SMT, beef up the core and add a 4-wide decode like the Intel CPUs?
 
If BD got a 10% boost across the whole current lineup, they would be very competitive with Intel on all models. 30-40% would bring them into the range of the currently released SB-E.

Whether you like BD or not, if it had launched at 3.9-4.5GHz for the top model instead of 3.6-4.2GHz, it would have been very competitive with the 2600, which is the current top model of Intel's mainstream line.

Whether AMD can 'fix' BD is a whole other thing. They would at least need to be able to reach 3.9-4.2GHz (process fix) and get an IPC gain (especially in the worst-case scenarios) to compete with IB in performance. Gaining 20% performance seems doable when you have manufacturing issues you might solve (clock limitations) and can do some refinement on BD. However, I don't think they can become competitive in performance/W, since they would need to increase performance while dropping 50W.
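To put rough numbers on the clock and performance/W points above, here is a small Python sketch. It assumes performance scales linearly with clock speed (optimistic, since memory and cache latency don't scale with it), and it uses the FX-8150's rated 125W TDP as a stand-in for actual power draw, which is an assumption and not a measurement. The function names are hypothetical.

```python
def clock_speedup(old_ghz, new_ghz):
    """Speedup from a clock bump, ASSUMING performance scales linearly with clock."""
    return new_ghz / old_ghz

def perf_per_watt_gain(perf_gain, old_watts, watts_saved):
    """Factor by which performance/W improves.

    perf_gain:   performance increase as a fraction (0.20 = 20% faster)
    old_watts:   baseline power (here a rated TDP, used as an assumption)
    watts_saved: how much power is cut from the baseline
    """
    new_watts = old_watts - watts_saved
    if new_watts <= 0:
        raise ValueError("watts_saved must be smaller than old_watts")
    return (1 + perf_gain) * old_watts / new_watts

# Base and turbo clocks from the post: 3.6 -> 3.9 GHz and 4.2 -> 4.5 GHz.
print(f"base clock speedup:  {clock_speedup(3.6, 3.9):.3f}")
print(f"turbo clock speedup: {clock_speedup(4.2, 4.5):.3f}")

# 20% more performance while dropping 50W from an assumed 125W baseline.
print(f"perf/W improvement:  {perf_per_watt_gain(0.20, 125, 50):.2f}x")
```

Under these assumptions the clock bump alone is worth roughly 7-8%, close to the 10% figure, while the performance/W target works out to about a doubling, which is why it looks so much harder than the raw performance target.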
 
I read that Windows 8 is supposed to make better use of the Bulldozer architecture than 7 or XP do. That might help.
 
My guess is that the current Bulldozer cores are pretty small. The decode isn't very wide: 4-wide shared between two integer cores. That comes out to 2-wide per core.

The old Phenom II/Lisbon CPUs were 3-wide and very likely had larger integer cores.
That really sounds like what this website's editor believes...
Well, I once thought this was not that bad. A "core" can handle 4 Mops, but the 4 decode units can issue 4-8 Mops.
Does that make Intel's design of 1 complex + 3 simple decoders sound more useful? 😕
Anyway, I see Real World Tech wrote that Bulldozer can only let 4 Mops go through dispatch, which is information I can't find in the software optimization guide for Bulldozer.
If that's true, it will be a fundamental problem.
It sounds like AMD designed the decode unit wrongly. :hmm:
 
And now the Athlon II and Phenom II lines are no longer being made by AMD, so AMD is only good for mobile and GPUs.

http://www.tomshardware.com/news/amd-cpu-apu-athlon-phenom-Llano-Bulldozer,14173.html

So glad I got my 1055T last Friday!

Surely AMD would make a better profit on Thuban because it's cheaper to make anyway? I mean, due to its smaller die size, even at a larger process node.

AMD needs to:
1. Reduce L2 cache to 256k per core and seriously reduce latency of both L2 and L3.
2. Bump decode units back up to 3 per core.
3. Bump execution units up to 3 ALU/AGU per core.
4. Double shared floating point execution units.

That would just about fix it. They can keep the longer pipeline if GF can sort out its issues and boost clock speeds such that the top SKU can turbo up to at least 4.5GHz without needing 1.21 jigawatts.
 
If you have seen the programming guide for AMD Family 15h (Bulldozer), the IMUL is indeed working at half clock....

Thanks, I hadn't read it.

I read that Windows 8 is supposed to make better use of the Bulldozer architecture than 7 or XP do. That might help.

The scheduler will try to use both cores of a module first. This lets the other modules be power-gated, which allows the module in use to turbo higher.
It will help more with the power-saving features; the performance gain won't be high.
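The packing behaviour described above can be sketched as a toy thread-placement model in Python. This is only an illustration of the idea as stated in the post, not how the Windows scheduler is actually implemented. The four modules with two cores each match a four-module part like the FX-8150, and the "pack" and "spread" policy names are made up for the example.

```python
def place_threads(n_threads, policy, n_modules=4, cores_per_module=2):
    """Return a list with the number of busy cores in each module.

    policy "pack":   fill both cores of a module before touching the next one
    policy "spread": put each new thread on the least-loaded module
    """
    if n_threads > n_modules * cores_per_module:
        raise ValueError("more threads than cores in this toy model")
    busy = [0] * n_modules
    for _ in range(n_threads):
        if policy == "pack":
            # First module that still has a free core.
            target = next(i for i, b in enumerate(busy) if b < cores_per_module)
        elif policy == "spread":
            # Least-loaded module, lowest index on ties.
            target = min(range(n_modules), key=lambda i: busy[i])
        else:
            raise ValueError(f"unknown policy: {policy}")
        busy[target] += 1
    return busy

def idle_modules(busy):
    """Modules with no busy cores, i.e. candidates for power gating."""
    return sum(1 for b in busy if b == 0)

for threads in (2, 4):
    for policy in ("pack", "spread"):
        layout = place_threads(threads, policy)
        print(threads, policy, layout, "idle modules:", idle_modules(layout))
```

With two threads, packing leaves three modules idle while spreading leaves two. Packing gives more room for gating and turbo, at the cost of both threads sharing one module's front end and FPU.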
 
So Intel's AMD killer is selling at a measly 3:1 ratio. There's nothing there for Intel to be impressed about, considering every review on the net has proclaimed AMD's death due to Bulldozer's 'performance'. Gotta love the propaganda, keep up the good work, it seems to be working in AMD's favor! 🙂

Also, just as a reminder, Intel's virtualization is broken in SB.

It seems we underestimated the masses of idiots being lured in by "more cores, more GHz" marketing.
 
I have 2 ASUS M4A89GTD PRO/USB3 boards that are AM3 socket, which I bought last February. There is a beta BIOS (3027) for these boards that adds support for BD. The socket is definitely AM3 and not AM3+. I'm not going to upgrade (downgrade?) to BD, as I already have a 1090T in both and see no need to change at the moment. If they improve BD within the next year, then maybe, but I don't think that will happen.

My understanding is that an AM3+ chip will physically fit in an AM3 socket; it's up to the vendor to add support for it, as AMD will not officially support that option.
 
So glad I got my 1055T last Friday!

Surely AMD would make a better profit on Thuban because it's cheaper to make anyway? I mean, due to its smaller die size, even at a larger process node.

AMD needs to:
1. Reduce L2 cache to 256k per core and seriously reduce latency of both L2 and L3.
2. Bump decode units back up to 3 per core.
3. Bump execution units up to 3 ALU/AGU per core.
4. Double shared floating point execution units.

That would just about fix it. They can keep the longer pipeline if GF can sort out its issues and boost clock speeds such that the top SKU can turbo up to at least 4.5GHz without needing 1.21 jigawatts.

So basically you want them to make 2 full cores again... waste a lot of die space in the process and call it a day.

Reducing L2 to 256 KB is bad. 1MB sounds better. Reduce the latency of this cache and implement an L0-cache-like structure or other means to bypass branch hits (I think this is already in the pipeline for Steamroller).
Decode width is fine; in combination with some enhancements like SB's, it should do the trick.
Bumping execution resources might be an option, although they would be far better off making their AGUs do more basic calculations.
Doubling the FPU is pure nonsense; working on execution times would have more effect...
 
Realistically, in terms of what is somewhat possible, AMD could do the following:

Reduce cache latencies across all levels by about 25%.
Improve the prefetcher by 10%.
Add one more INT execution unit per core (2 per module).
Improve FPU throughput by 20%.

If they did ALL those things they might match the IPC of SB. For a company like AMD that would take two+ years. And in two years Intel will have something with 20% more IPC than SB. With Intel milking it, it will be more like 10%.
 