I think AMD will need software to be recompiled to use its FMAC units, but since AMD's FMAC won't be compatible with Intel's future FMAC, I don't hold much hope for support outside of HPC until both AMD and Intel support the same standard.
Why would AMD need software to be recompiled?
The way I understand it, FMAC4 (Bulldozer) is FMAC3 (Haswell) with the ability to assign the accumulation to a 4th register. It seems to me that FMAC4 is a superset of the proposed operations and all FMAC3 instructions could decode into FMAC4 operations.
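For what it's worth, here is a rough sketch of that operand difference using compiler intrinsics (my own toy example, not anything out of AMD's or Intel's docs; the compile flags and the exact mnemonics the compiler ends up emitting are assumptions, and you would need a chip and compiler that actually support the extension in question, e.g. both flavors only on something like Piledriver):

/* toy fused multiply-add sketch: d = a*b + c
 * build (assumption): gcc -O2 -mfma -mfma4 fma_sketch.c */
#include <immintrin.h>   /* FMA3 intrinsics (_mm256_fmadd_ps) */
#include <x86intrin.h>   /* FMA4 intrinsics (_mm256_macc_ps)  */
#include <stdio.h>

int main(void)
{
    __m256 a = _mm256_set1_ps(2.0f);
    __m256 b = _mm256_set1_ps(3.0f);
    __m256 c = _mm256_set1_ps(1.0f);

    /* FMA4 (Bulldozer): the result goes to a 4th register, so all three
     * sources survive; roughly vfmaddps ymm0, ymm1, ymm2, ymm3 */
    __m256 d4 = _mm256_macc_ps(a, b, c);

    /* FMA3 (Haswell): same math, but the destination must reuse one of
     * the source registers; roughly vfmadd231ps ymm0, ymm1, ymm2,
     * so the compiler copies whichever source it still needs */
    __m256 d3 = _mm256_fmadd_ps(a, b, c);

    float r4[8], r3[8];
    _mm256_storeu_ps(r4, d4);
    _mm256_storeu_ps(r3, d3);
    printf("FMA4: %g  FMA3: %g\n", r4[0], r3[0]);   /* both print 7 */
    return 0;
}

If that holds, a decoder that understands the 4-operand form can treat the 3-operand form as the special case where the destination aliases one of the sources, which is the superset argument above.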
I agree with you. Since most of the calculations are done on the integer units, doubling them makes sense. We don't know how much AMD beefed up each unit to increase IPC, but AMD's answer to Intel's HT is throwing more hardware at it. Hyper-Threading benefits from the bubbles in the execution pipeline, but what if a thread is able to fill the pipeline by itself? Or what if both threads stall within the pipeline? Hyper-Threading may actually cause a performance drop, plus HT is known to have a cache-pollution issue. And AMD has pretty strong FPU performance, so it makes sense to beef up the integer side.
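If anyone wants to poke at the bubble argument themselves, a crude way is to time a latency-bound loop with one thread and then with two threads pinned to the same physical core. A toy sketch (my own, nothing official; the thread count, the taskset pinning, and which logical CPUs are actually HT siblings are assumptions you would have to check on your own box):

/* Toy timing sketch for the SMT "pipeline bubble" argument.
 * Build:  gcc -O2 -pthread smt_sketch.c
 * Run:    taskset -c 0 ./a.out 1      (one thread on one core)
 *         taskset -c 0,4 ./a.out 2    (two threads on HT siblings;
 *                                      which CPU IDs are siblings is
 *                                      machine-specific, check first) */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ITERS 200000000UL

static volatile double sink;   /* keeps the compiler from deleting the loop */

/* Serial dependency chain: every add waits on the previous result, so the
 * FP pipeline is mostly bubbles, exactly the case SMT is good at filling. */
static void *dependent_chain(void *arg)
{
    double x = 1.0;
    for (unsigned long i = 0; i < ITERS; i++)
        x = x * 1.0000001 + 0.0000001;
    sink = x;
    return NULL;
}

int main(int argc, char **argv)
{
    int nthreads = (argc > 1) ? atoi(argv[1]) : 1;
    if (nthreads < 1 || nthreads > 64) nthreads = 1;
    pthread_t tid[64];
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, dependent_chain, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d thread(s): %.2f s, %.1f M iters/s total\n",
           nthreads, secs, nthreads * (double)ITERS / secs / 1e6);
    return 0;
}

With a dependent chain like this, two HT threads should finish in roughly the same wall time as one, so total throughput nearly doubles; swap in an unrolled loop of independent operations that already keeps the pipeline full and most of that gain should disappear, which is the scenario described above.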
A pretty basic example of the hardware approach in a heavily multi-threaded scenario: the AnandTech moderator Mark stated that in F@H, the X6 1090T was actually faster than his heavily overclocked i7 930 or 940 (don't remember which one), because you have 4 physical cores with 4 logical cores sharing execution resources compared to 6 cores with their own individual execution resources. If only AMD had better IPC, the performance advantage would be even bigger.
Except that such implementations suck. One of the things that low-latency communication between the GPGPU and CPU should allow for is for software like that to take advantage of the GPU bits, but still be able to use the CPU for branchy goodness. Currently, latency is so high as to be prohibitive for any code that is not exceptionally loosely coupled.

P.S. I don't think many people would be concerned about CPU media encoding provided these tasks were better handled on a discrete video card or Fusion iGPU.
I think this is AMD's strategy. Its mid-range and entry level are becoming AMD Fusion, with an IGP already included. I guess they're assuming that if you're looking at Bulldozer you'll have a discrete card capable of offloading most FP tasks to the GPU. This is their idea of the future.
Except that such implementations suck. One of the things that low-latency communication between the GPGPU and CPU should allow for is for software like that to take advantage of the GPU bits, but still be able to use the CPU for branchy goodness. Currently, latency is so high as to be prohibitive for any code that is not exceptionally loosely coupled.
AMD hyped it up first, but Intel will actually get there first, by the looks of things. AMD is currently lacking in software more than hardware, though, and hopefully they will set aside some funds to fix that. nVidia went from wishy-washy on anything not CUDA to, "here's nearly complete support." Intel just hasn't done much publicly since the Larrabee failure, but make no mistake: when they feel ready, it'll be like nVidia did, with some polish. AMD needs to put a few good people full-time on nothing but their GPGPU software support, both the SDK and drivers. If they can do a good enough job, big companies may end up supporting all three companies within their products, and small companies, along with free projects, could move to generic OpenCL and DirectCompute, with vendor-optimized versions of a single code base (going for the most performance for the dev/support cost, rather than simply max performance, while still getting benefits from everyone's extra hardware).
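The nice thing about the generic OpenCL route is that the same host code runs against whichever vendor's runtime is installed. A minimal sketch of what that looks like (my own example, assuming an OpenCL SDK/ICD is installed; build with something like gcc probe.c -lOpenCL):

#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint count = 0;

    /* Ask the ICD loader for whatever platforms are installed;
     * AMD, nVidia and Intel runtimes all show up through the same call. */
    if (clGetPlatformIDs(8, platforms, &count) != CL_SUCCESS || count == 0) {
        fprintf(stderr, "no OpenCL platforms found\n");
        return 1;
    }

    for (cl_uint i = 0; i < count && i < 8; i++) {
        char name[128] = "", vendor[128] = "";
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                          sizeof name, name, NULL);
        clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR,
                          sizeof vendor, vendor, NULL);
        printf("Platform %u: %s (%s)\n", i, name, vendor);
    }
    return 0;
}

From there, the vendor-optimized part would mostly be picking different kernels or work-group sizes per platform, while the host code stays one code base.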
Personally, I want a >13" very light Bobcat notebook with a real keyboard, that doesn't cost a mint and doesn't come with a crappy high-power chipset (i.e., the larger Atom platforms, which fall flat compared to CULV and netbooks), and I don't want Intel's IGP. If a quality maker, like Asus, Samsung, Lenovo, HP (a business model), etc., comes out with such an animal, I'd buy one and finally retire my PIII Thinkpad (Atom+ION is fast enough, but I want a real keyboard, dammit!). I was going to hold out for Cortex A9 machines, but I think there's a better chance of a good Bobcat.
They've already said their first Zambezi chip will be a quad-core. And they've already mentioned before that the performance of one int core of Bulldozer is greater than the int performance of Deneb.

I guess I see it a different way. Each core, or half module, is smaller now since they cut down the number of IEUs per core and maybe managed to save some area with the quasi-merged FP unit/scheduler shared across cores. So it could be that in terms of die area, 1 module is roughly the same area as 1 core, and assuming AMD is pricing based on area, then for the same money you can get 4 modules of Bulldozer against a 4-core Phenom II.
I could be completely wrong since I haven't looked at any of the die area numbers yet.
JFAmd, I know you're a server-side guy, but if the server bulldozers are drop-in upgrades, are the desktop CPUs going to be AM3 compatible?
The key word there is "likely"; that's not very reassuring. And the Gizmodo article http://gizmodo.com/5620423/amd-announces-8+core-bulldozer-cpu makes it sound like it won't be upgradeable.

Bulldozer: One of two new x86 architectures, Bulldozer will be used in performance desktops and servers. Bulldozer-based modules will serve as the basis for AMD's next generation of processors. The company has already confirmed that it'll maintain socket compatibility with existing Magny-Cours-based Opteron processors. Thus, you can expect to see Bulldozer-based CPUs dropping into existing server boards and, likely, Socket AM3 desktop platforms as well. AMD's target power use for Bulldozer-based chips is between 10 and 100 W.
It sounds like you can put Thuban CPUs in the new BD platform but can't put BD in the current Thuban platform. Same pin layout but possibly different power requirements may explain the vagueness about AM3 upgradeability.

AMD officials say Bulldozer is being targeted at servers and performance desktop machines. The good news is that Bulldozer will be drop-in compatible with most current high-end servers. The bad news is that it won't be compatible with existing AM3 boards. Instead, AMD says it will introduce a new AM3+ socket. These sockets will be backward compatible with older chips, so you could drop a Phenom II X6 in it.
That statement fits in with early reports of "systems with 33 per cent more cores and 50 per cent more performance", i.e. a 16-core BD is 50% faster than a 12-core Magny. The article states that the high-end 16-core BD will be around 2.75 GHz.

Fruehe has also said in interviews with El Reg that Bulldozer's shared-component approach results in a Bulldozer module with two quasi-cores, and yields about 1.8 times the performance of two current Magny-Cours cores. That's a 10 per cent performance hit, clock for clock, for every pair of cores, but much lower power consumption because of the shared nature of the Bulldozer modules.
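Back-of-the-envelope, those two claims line up (my own arithmetic, treating throughput as cores x per-core factor x clock): 1.8x per module over two Magny-Cours cores means each Bulldozer core is worth roughly 0.9 of a Magny-Cours core clock for clock, so 16 cores x 0.9 = 14.4 "Magny-equivalent" cores, or about 1.2x a 12-core at the same clock. Getting from 1.2x to the full 1.5x then needs roughly a 1.25x clock advantage, and 2.75 GHz against the 2.2 to 2.3 GHz that the 12-core Magny-Cours parts ship at is just about that.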
nVidia went from wishy-washy on anything not CUDA to, "here's nearly complete support."
As it gains traction, I suspect the general standards options like DirectCompute, which works on all vendors' products that can use DirectX 10 and 11, will take over, rather than one company's implementation.
jvroig said: A 4-module Bulldozer does not go against a 4-core Phenom II. Rather, it would be a 2-module Bulldozer (thus, quad-core vs quad-core), which would then make it 8 vs 12 (Deneb), instead of 16 vs 12 as you stated.
The same happened when Core i7 9xx (8 Logical Cores) was introduced and everyone compared them against Core 2 Quad (4 Physical Cores).
The same happened when Core i7 9xx (8 Logical Cores) was introduced and everyone compared them against Core 2 Quad (4 Physical Cores).

The difference being the i7 9xx was marketed as a quad.
4 Module (8 Logical Cores) Bulldozer

That's 8 physical cores. You are letting the whole "half-cores / mini-cores / modules" language distort your idea of what is a logical and a physical core, especially when in the same context you use the same for hyperthreading.
That's 8 physical cores. You are letting the whole "half-cores / mini-cores / modules" language distort your idea of what is a logical and a physical core, especially when in the same context you use the same for hyperthreading.
Even AMD calls the cores in Bulldozer logical, just look at the presentations.

I did, and I just reviewed them now through AT Gallery. I can't see where they called them logical cores. The only mention of "logical" I saw was in slide 13, where they mentioned "each core is a logical processor from the viewpoint of software". That is far from an admission that each core is simply a logical core. That statement just means "despite all the shared resources in a Bulldozer (module), the OS will see them as two cores and make use of them as such".
If you want to be fair in the comparison on a Technical level (Cores) you have to compare a 2 Core SB (4 Logical Cores) to a 2 Module Bulldozer (4 Logical Cores) and the same with Quad Core SB (8 Logical Cores) to a 4 Module Bulldozer (8 Logical Cores).

If you did that on a "Technical level", it still won't exactly be fair, right? Because when you take away the semantics ("logical" vs "physical"), each Bulldozer core will have more hardware dedicated to it than the corresponding SB logical core through hyperthreading (assuming the same as Nehalem, which is not too unreasonable an assumption).
... which makes me wonder why you bothered comparing an octo-core Bulldozer against a quad-core Deneb to count ALUs and AGUs in the first place.

It's about the ALU+AGU count, not about what is fair or not. If you want to compare based on performance level, or dollar value, or market segment, then go ahead and do so. I was not concerned with that, and if you review that post of mine, you will see I did not bother with anything but correcting the context for ALU+AGU concerns.
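For anyone trying to follow those numbers: assuming the commonly cited figures of 2 ALUs + 2 AGUs per Bulldozer integer core (so 4 + 4 per module) and 3 ALUs + 3 AGUs per Phenom II/Deneb core, a 2-module (quad-core) Bulldozer comes to 8 ALUs against a quad-core Deneb's 12, and only a 4-module (8-core) Bulldozer gets to 16 against 12, which is where jvroig's "8 vs 12 instead of 16 vs 12" comes from.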
What do you mean by that?

That nVidia got on board pretty much because of Apple, like everyone did, and had very little outside of Apple's stuff to show for it, and tended to downplay it. Then AMD, who had been talking it all up like the second coming, didn't have anything, and then still didn't have anything, repeat over and over again. Many people were wanting something they could port later on, or port between OS X and whatever, or for whatever other reason, so nVidia took the opportunity to hurry up and get it out there, and in their drivers.
AtenRa said: If you want to be fair in the comparison on a Technical level (Cores) you have to compare a 2 Core SB (4 Logical Cores) to a 2 Module Bulldozer (4 Logical Cores) and the same with Quad Core SB (8 Logical Cores) to a 4 Module Bulldozer (8 Logical Cores).

There isn't a more fair comparison. Intel's CPUs with HT are actually four cores, but with extra sets of registers and other related goodies, so that when a core stalls, it doesn't just sit idle. An optimized application, though, can get little benefit from it, or will even run slower, and cause increased power consumption in the process, sometimes (less so w/ the i7, but that problem w/ the P4 I'm sure played a part in ditching it for the Core 2s). What AMD is basically doing is improving their perf/watt in ways that don't need die shrinks, but also going, "hey, these units are always on, and are very high bandwidth, so if we beefed them up some and shared them between cores, we could save plenty of space and power, yet only have a very small performance loss in some rare corner cases." Two modules is still really four cores, just not four completely separate cores.
That nVidia got on board pretty much because of Apple, like everyone did, and had very little outside of Apple's stuff to show for it, and tended to downplay it.
I.e., I saw it as nVidia pushing good OpenCL support as soon as they did, in large part, as an opportunistic move against AMD/ATi.