June 7th for the new ones? OK, I will wait to see how these stack up.
I wonder if JFAMD could clear this up: can BD do 4 macro-ops per core or per module? Or is this a secret?
As I recall, AMD does not promote modules but cores.
Bulldozer's decode unit extracts and decodes up to four x86 instructions per cycle from raw instruction bytes. The decode pipeline converts x86 instructions into Cops that can directly execute on the functional units.
The scheduler picks and schedules four Cops per cycle to the execution units out of order.
Source: Michael Butler, Leslie Barnes, Debjit Das Sarma, Bob Gelinas, "Bulldozer: An Approach to Multithreaded Compute Performance," IEEE Micro, pp. 6-15, March/April 2011
AMD designed the Bulldozer FPU to deliver industry-leading performance on HPC, multimedia, and gaming applications. The primary means of achieving such performance is a four-wide, two-way, multithreaded, fully out-of-order FPU, combined with two 128-bit FMAC units supported by a 128-bit high-bandwidth load/store subsystem.
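The FPU figures in the quote imply a peak floating-point rate per module. This is a rough sketch, assuming each 128-bit FMAC unit completes one fused multiply-add per cycle and counting an FMA as two FLOPs; the 4-module, 3.0 GHz chip in the usage line is hypothetical, not an announced part.

```python
# Peak floating-point throughput of one Bulldozer module's shared FPU,
# from the quoted figures: two 128-bit FMAC units per module.
# Assumption: each FMAC retires one fused multiply-add per cycle,
# and an FMA counts as two floating-point operations.

FMAC_UNITS = 2          # per module (shared by both cores)
VECTOR_BITS = 128       # width of each FMAC unit
FLOPS_PER_FMA = 2       # one multiply + one add


def peak_flops_per_cycle(element_bits: int) -> int:
    """Peak FLOPs per cycle per module for a given element width."""
    lanes = VECTOR_BITS // element_bits   # SIMD lanes per unit
    return FMAC_UNITS * lanes * FLOPS_PER_FMA


def peak_gflops(element_bits: int, clock_ghz: float, modules: int) -> float:
    """Peak GFLOP/s for a chip with `modules` modules at `clock_ghz`."""
    return peak_flops_per_cycle(element_bits) * clock_ghz * modules


if __name__ == "__main__":
    print("double precision:", peak_flops_per_cycle(64), "FLOPs/cycle/module")  # 8
    print("single precision:", peak_flops_per_cycle(32), "FLOPs/cycle/module")  # 16
    # Hypothetical 4-module chip at a hypothetical 3.0 GHz clock
    print("4 modules @ 3.0 GHz, DP:", peak_gflops(64, 3.0, 4), "GFLOP/s")       # 96.0
```

Real code rarely gets near this ceiling; it only shows what "two 128-bit FMACs" buys on paper.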
It can do both.
In recent papers the Bulldozer designers call those ops "Cops" (complex ops, the equivalent of an ALU/FP micro-op plus a memory op [load, store, or load+store]).
Decode: 4 Cops/cycle/module, or up to 5 in case of branch fusion (IIRC the branch op has to be in last place then)
Issue: 2 ALU ops + 2 AGLU ops per cycle per core, plus 4 FP/SIMD ops per cycle per module in the FPU (belonging to both threads)
Does this mean the theoretical max throughput of a module is 4 "cops" a cycle?
4, plus whatever you gain from fusing ops. Maybe 6 on average for two threads both doing tight loops with two branches that have tests right before them?
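The decode and issue numbers above can be turned into rough per-module ceilings. This is a back-of-the-envelope sketch, not a pipeline simulator: it reads the "up to 5" figure as at most one fused compare+branch pair per cycle occupying a single decode slot (an interpretation, not a documented rule), and it ignores stalls, mispredictions, and dependency chains.

```python
# Rough upper bounds for one Bulldozer module, using the numbers quoted above.
# Ignores cache misses, mispredictions, dependency chains, and how the
# two threads actually share the decoder.

DECODE_COPS_PER_CYCLE = 4      # per module, shared by both threads
MAX_FUSED_PAIRS_PER_CYCLE = 1  # assumption: reading of "up to 5 with branch fusion"

ALU_PER_CORE = 2
AGLU_PER_CORE = 2
CORES_PER_MODULE = 2
FP_PER_MODULE = 4


def peak_issue_per_module() -> int:
    """Peak ops issued per cycle across the module (both integer cores + shared FPU)."""
    integer = CORES_PER_MODULE * (ALU_PER_CORE + AGLU_PER_CORE)
    return integer + FP_PER_MODULE


def peak_x86_per_cycle(fused_pairs: int) -> int:
    """x86 instructions decoded per cycle when `fused_pairs` cmp+jcc pairs
    each fold into one Cop (capped at MAX_FUSED_PAIRS_PER_CYCLE)."""
    fused = min(fused_pairs, MAX_FUSED_PAIRS_PER_CYCLE)
    return DECODE_COPS_PER_CYCLE + fused


if __name__ == "__main__":
    print("issue slots per module:", peak_issue_per_module())       # 12
    print("x86 instr/cycle, no fusion:", peak_x86_per_cycle(0))     # 4
    print("x86 instr/cycle, with fusion:", peak_x86_per_cycle(1))   # 5
    # Sustained rate is bounded by the narrowest stage, here decode.
    print("decode-bound Cops/cycle:",
          min(DECODE_COPS_PER_CYCLE, peak_issue_per_module()))      # 4
```

The gap between 12 issue slots and 4 decoded Cops is why the answer above is "4, plus whatever fusion gains".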
I think this should be enough. I'm doing some massive (100 GB+) archiving right now with WinRAR, and according to CodeAnalyst, average IPC is 0.35 on my Thuban.
Note that for a large part of that, the CPU was twiddling its proverbial thumbs waiting for memory (or even disk...). To improve on that, you'd have to either decrease memory latency (not really possible) or increase IPC during the sections where the processor is actually doing something. Even if the average IPC is <1, upping the peak IPC can help.
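The point about stalls can be checked with a toy model: total cycles = instructions / peak IPC + stall cycles. The workload size and stall count below are hypothetical, picked only to give an average IPC well under 1, in the spirit of the 0.35 CodeAnalyst figure.

```python
# Toy model: a workload alternates between busy phases, where the core runs
# at its peak IPC, and stall phases, where it waits on memory or disk.
# All numbers are hypothetical, chosen to give an average IPC well below 1.


def run_time(instructions: float, peak_ipc: float, stall_cycles: float) -> float:
    """Total cycles = cycles spent executing at peak IPC + cycles stalled."""
    return instructions / peak_ipc + stall_cycles


def average_ipc(instructions: float, peak_ipc: float, stall_cycles: float) -> float:
    """Instructions retired per cycle over the whole run, stalls included."""
    return instructions / run_time(instructions, peak_ipc, stall_cycles)


if __name__ == "__main__":
    work = 1_000_000       # instructions (hypothetical)
    stalls = 2_000_000     # cycles spent waiting (hypothetical)
    for peak in (2.0, 3.0):
        t = run_time(work, peak, stalls)
        print(f"peak IPC {peak}: {t:,.0f} cycles, "
              f"average IPC {average_ipc(work, peak, stalls):.3f}")
```

Raising peak IPC from 2 to 3 cuts the run from 2.5 million to about 2.33 million cycles (roughly 7% faster), even though the average IPC stays below 0.5.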
You lost me there; what do you mean?
On the other hand, AMD has pretty much transitioned completely to selling big 6-core chips.
So you're saying AMD is desperate, and that's why Bulldozer's die size is bigger than Intel's Sandy Bridge?
So die size is related to how desperate the company is? I guess, then, NVIDIA is the most desperate company because of GF110's die size of 520mm^2.
What is the Bulldozer die size? And what is the source of that information?
I'll bet you the share of the X6 chips in AMD's product mix is in the single digits.
Far, far away from being "completely transitioned".
As a humorous aside, both [Cypress and GF104] are made on the same process, TSMC’s 40nm, and literally at the same fab. AMD managed to cram 2.15 billion transistors into 334mm^2, about 6.44 million transistors per mm^2. GF104 has 1.95 billion transistors in 367mm^2, about 5.31 million transistors per mm^2. This means AMD’s Evergreen architecture is over 20% more space efficient than GF104 while delivering much more raw performance and vastly more performance per watt. When SemiAccurate teases Nvidia’s layout and physical design teams, it is for a reason.
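The density figures in the quote can be reproduced directly from the transistor counts and die areas it gives; nothing beyond those four numbers is assumed.

```python
# Reproduces the transistor-density arithmetic from the SemiAccurate quote.
# Transistor counts and die areas are the figures given in the quote.

chips = {
    "Cypress (AMD Evergreen)": (2.15e9, 334.0),   # transistors, mm^2
    "GF104 (Nvidia)": (1.95e9, 367.0),
}


def density_millions_per_mm2(transistors: float, area_mm2: float) -> float:
    """Transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


if __name__ == "__main__":
    d = {name: density_millions_per_mm2(t, a) for name, (t, a) in chips.items()}
    for name, value in d.items():
        print(f"{name}: {value:.2f} M transistors/mm^2")
    ratio = d["Cypress (AMD Evergreen)"] / d["GF104 (Nvidia)"]
    print(f"Cypress is about {100 * (ratio - 1):.0f}% denser")
```

This gives 6.44 and 5.31 M transistors/mm^2 and a ratio of about 1.21, consistent with the quote's "over 20%".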
I don't know what the cause/effect here is, but I think you can find some pretty good correlation between how well a company is doing and how big their chips are.
SemiAccurate said: AMD managed to cram 2.15 billion transistors into 334mm^2, about 6.44 million transistors per mm^2. GF104 has 1.95 billion transistors in 367mm^2, about 5.31 million transistors per mm^2. This means AMD's Evergreen architecture is over 20% more space efficient than GF104
As JFAMD has pointed out several times, consumers don't care about die size. No one makes purchasing decisions based on die size. They look at raw performance, performance/price, performance/watt, or some other more complicated metric.
The problem is simple for Nvidia, the economics of this part don't work out, the underlying architecture is wrong, so the resultant parts start out with an uphill battle. This is a problem for Nvidia, not for the end user. If the GTX460 is priced at a loss, the consumer shouldn't care, they get a deal, and that is the end of it. Retail buyers rarely care if the part is making a profit for the manufacturer.
A 4-core Bulldozer is smaller than a 4-core Sandy Bridge.
This entire discussion only becomes relevant if an 8-core BD is slower than a 4-core SB.
I don't understand what all this talk about die size is about. YES, the 8-core Bulldozer is larger than the 4-core Sandy Bridge, but, umm, it has twice as many REAL CPU cores, so it should be.
The CPU core size advantage isn't that significant: at 45nm it took AMD 346mm^2 to match the throughput of a 263mm^2 Nehalem.
At the end of the day, AMD has a smaller CPU per core than Intel and so will have more room to improve on that design later on, either by tacking on a GPU or more cores.
Overall it's a better option than Intel's "let's stick more FULL CPUs next to each other and try to shrink the process node as fast as possible so we can stay ahead."
Is it, though? I mean, the chip in the image posted above that's advertised as an "8-core" is actually a 4-core chip...
It depends on how real those Bulldozer cores are; right now a Sandy Bridge core is getting close to the throughput of two existing AMD cores at the same frequency.
http://www.anandtech.com/bench/Product/289?vs=85
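The 45nm area comparison made earlier in the thread can be expressed as throughput per mm^2. This sketch takes "matched throughput" at face value; which benchmarks that claim rests on isn't stated in the thread.

```python
# Throughput per unit area from the 45nm comparison quoted in the thread:
# AMD needed 346 mm^2 to match the throughput of a 263 mm^2 Nehalem.
# "Matched throughput" is taken at face value here.


def area_ratio_for_equal_throughput(area_a_mm2: float, area_b_mm2: float) -> float:
    """If chips A and B deliver the same throughput, B's throughput per mm^2
    exceeds A's by the factor area_a / area_b."""
    return area_a_mm2 / area_b_mm2


if __name__ == "__main__":
    amd_45nm = 346.0       # mm^2
    nehalem = 263.0        # mm^2
    r = area_ratio_for_equal_throughput(amd_45nm, nehalem)
    print(f"Nehalem throughput per mm^2 is {r:.2f}x AMD's "
          f"({100 * (r - 1):.0f}% higher)")
```

On those numbers Nehalem delivered roughly 32% more throughput per mm^2, which is the sense in which the per-core size advantage "isn't that significant".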
