PlasmaBomb
Lifer
- Nov 19, 2004
I've seen a ton of people post this phrase.
Was this written by a grade 5 student ??
Its disappointed!!!
It is a meme.
Well, if the respin allows them to hit higher clocks, that will help with the cache throughput, right? I thought that was one of the reasons it was underperforming: BD wasn't able to come close to the clocks they were hoping for.
Or do the cache problems go beyond just not being able to meet clock speed targets?
Actually, at higher clock speeds latencies might be getting even worse, as checked here:
http://forum.purepc.pl/findpost-t322141-p3662624.html
A respin might let them run both higher core and uncore speeds, depends what the issues are and where they feel they can get the most benefit out of tweaking...
No, monkeying around with things like cache timings and size is not something you can get away with in a respin.
It could be addressed in Piledriver, provided they planned for such changes maybe 2 yrs ago or so.
Can the cache measurements be "biased" by an unbalanced GF process, causing unforeseen jitter or whatever? Sorry, I don't have the technical insight, but I simply try to understand more of the situation.
Update: As many astute readers have pointed out, Core 2's prefetchers are able to work their magic with ScienceMark 2.0, which results in the significant memory latency advantage over AMD's Athlon 64 FX-62. This advantage will not always exist; where it doesn't, AMD will continue to have lower latency memory access and where it does, Intel can gain performance advantages similar to what ScienceMark 2.0 shows.
Updated - 1/5/07: Although AMD previously did not mention any issues with our findings, we were contacted today and informed that the latency information both ScienceMark and CPU-Z produced is incorrect. The Brisbane core's L2 latency should be 14 cycles, up from 12 cycles and not 20 cycles. This would help explain the relatively low impact on application performance that we've seen across the board. We are still waiting to hear back from AMD on a handful of other issues regarding Brisbane and will update you as soon as we have more information.
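Both updates quoted above come down to how a latency test walks memory. As a rough sketch only: one common way to keep hardware prefetchers from hiding latency is a randomized pointer chase, where each load depends on the previous one and the order is unpredictable. Whether ScienceMark, CPU-Z, AIDA or Sandra work this way is not stated in this thread; the function names below are made up for illustration.

```python
import random
import time


def sattolo_cycle(n: int, seed: int = 0) -> list:
    """Build a permutation of range(n) that is one single cycle (Sattolo's algorithm).

    Following nxt[idx] repeatedly visits every slot once before returning to
    the start, in an order a stride prefetcher cannot predict.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i; never j == i, which forces one cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def sequential_cycle(n: int) -> list:
    """Prefetcher-friendly baseline: slot i points at slot i + 1, wrapping around."""
    return [(i + 1) % n for i in range(n)]


def chase(nxt: list, steps: int) -> int:
    """Follow the chain for `steps` dependent loads and return the final index."""
    idx = 0
    for _ in range(steps):
        idx = nxt[idx]
    return idx


def cycle_length(nxt: list) -> int:
    """Count the hops needed to get from slot 0 back to slot 0."""
    idx, hops = nxt[0], 1
    while idx != 0:
        idx = nxt[idx]
        hops += 1
    return hops


def time_per_step_ns(nxt: list, steps: int) -> float:
    """Average wall-clock time per hop, in nanoseconds."""
    start = time.perf_counter_ns()
    chase(nxt, steps)
    return (time.perf_counter_ns() - start) / steps


if __name__ == "__main__":
    n = 1 << 20  # assumed working-set size in slots, picked arbitrarily
    for name, chain in (("sequential", sequential_cycle(n)), ("random", sattolo_cycle(n))):
        print(f"{name}: {time_per_step_ns(chain, n):.1f} ns per hop")
```

In Python the interpreter overhead swamps the actual cache latency, so the printed numbers only show the shape of the method. A real measurement tool would do the same walk in C or assembly and vary the working-set size to step through L1, L2 and memory.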
3 issue as in three ALUs? *CPU hardware noob*
Has anyone done any studies/research on overclocking Bobcat derived parts to see how the TDP and TDW changes?
Isn't each "core" in a BD module 2 issue? Where does a Bobcat core sit in reference to each core in a Bulldozer module? Judging from indicated performances, 8x Bobcat cores might hold their own in overall capability vs a full 4 module Bulldozer chip unless we get into AVX or something like that. It really seems that AMD was overly confident in the module's ability to decode and schedule data into the two integer cores, at least with today's programs.
How can we be sure that the cache latency is the main reason? Sounds to me a bit like (using CA - car analogies - for the sake of having just a touch screen keyboard right now) "Oh, the new Corvette has a lower max. mph! Ah, I can see why: the diameter of the exhausts is smaller!" It's just too early to say such things w/o some decent profiling.
I believe they planned for equivalent IPC but to enable higher clocks. I also believe their cache hierarchy is holding them back. The issue I have with having any confidence whatsoever in the cache-impacting IPC argument though (I'm arguing with myself here) is that cache latency and the congestion from it is something that is readily simulated as well as being "baked in" when they design the microarchitecture.
The latency certainly is exactly what they intended, but maybe they failed to simulate the congestion that would come from it with conventional instructions mixes?
According to AIDA measurements, BD's raw integer instruction throughput is a little higher (10-15% IIRC), while FP throughput is more like 2-4x for floats wider than 32 bits, as expected.
Well, here's what's known about the caches, one to one:
BD : 4c L1, 2x 2R/1W (48B)
SB : 4c L1, 1x R/W (48B combined)
BD : 2x16k (4 way) for 2 threads
SB : 1x32k (8 way) for 2 threads
BD : 20c L2 2048KB
SB : 9(?)c L2 256KB
...
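To put the cycle counts above next to the clock speed discussion, here is a minimal sketch that converts L2 latency in cycles into nanoseconds. The cycle counts are the ones listed in the post (the SB figure carries a "(?)" there); the clock speeds are assumptions picked only for illustration, and the function names are made up.

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in core cycles to nanoseconds at the given clock.

    One cycle at f GHz lasts 1/f ns, so latency_ns = cycles / f.
    """
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    return cycles / clock_ghz


# L2 latencies in cycles as quoted in the thread.
L2_CYCLES = {"BD": 20, "SB": 9}

# ASSUMPTION: illustrative clock speeds, not taken from the thread.
ASSUMED_CLOCK_GHZ = {"BD": 3.6, "SB": 3.4}


def l2_latency_ns() -> dict:
    """Return the L2 latency in nanoseconds for each chip under the assumed clocks."""
    return {
        chip: cycles_to_ns(L2_CYCLES[chip], ASSUMED_CLOCK_GHZ[chip])
        for chip in L2_CYCLES
    }


if __name__ == "__main__":
    for chip, ns in l2_latency_ns().items():
        print(f"{chip}: {L2_CYCLES[chip]} cycles is about {ns:.2f} ns "
              f"at an assumed {ASSUMED_CLOCK_GHZ[chip]} GHz")
```

The point of the conversion is that a fixed cycle count gets shorter in wall-clock terms as the clock rises, so a respin that only raises clocks helps absolute latency a little but leaves the cycle gap between the two designs unchanged.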
I think that he's talking about the misspelling of "disappoint" as "dissapoint", which is very prevalent on the internet.
Also, this quote from IDC might be a little bit prescient. http://forums.anandtech.com/showpost.php?p=30358536&postcount=231
No, monkeying around with things like cache timings and size is not something you can get away with in a respin.
It could be addressed in Piledriver, provided they planned for such changes maybe 2 yrs ago or so.
Well, AMD clearly put more/better (or maybe even more better) resources into Bobcat, so they might have been planning that as the next NEXT CPU family for a while. However, unlike Intel, they can't just ramble around for 5+ years while it happens. They are likely pushing it hard even now, and it will come out ASAP if they can make it work.
Bobcat also has lower IPC than Stars.
I remember JF saying at one point that customers had asked him if they were going to get super-dense BC servers, and JF said that they would be better served through BD. I wonder if that is still the case.
You could get a WHOLE LOT more BC cores vs. BD cores for the same power envelope.
All of these shenanigans about the cache.
Bulldozer has a longer pipeline than any processor currently on the market.
Bulldozer artificially inflates core count by forcing extra integer units to share the floating point units.
Those are the reasons why the thing is so inefficient.
I'm sure that a better cache could help somewhat, but the crux of the problem is that Bulldozer is inherently inefficient and seems to have been created to enhance a pissing contest involving words like core count and megahertz.
Is that true, though? Bobcat is great for what it is, but AMD has claimed that TDP per core is going down to ~ 5W (on server) for BD. Of course, AMD also said IPC wouldn't go down, so I guess that isn't a terribly strong argument :$
If it turns out that you're right and perf/W ends up being much higher for BC, we actually COULD see an Opteron BC. Maybe the rumored 28nm Opterons are in fact them...
But didn't they deliberately increase the latency to enable higher clocks? Dropping it to 20, much less 12-15, would be a huge task, right?
"Dropping" to 20 isn't necessary since the 2MB L2 latency is 20c according to the Software Optimization Manual, AIDA and other latency measurement tools. Sandra might suffer from a wrong way of measurement, or at least they didn't adapt the code correctly.