IntelUser2000
Elite Member
- Oct 14, 2003
According to Intel, the 970 runs @ 3.2GHz?
Apparently not the tested chip. It must be underclocked.
But the published FDIV latency is 42 cycles (compared to 19 in Phenom and 14 in SB...), which equals 7 consecutive 6-cycle FMAs, which is "coincidentally" the number of operations needed for a full-precision software divide when you have FMA and a single extra rounding mode. I take this to mean that there is no hardware FDIV in BD.
FSQRT also got a huge increase, 35 => 52. However, this is still faster than any software FSQRT I know of, so I assume they have some special-purpose hardware left for it.
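To make the "divide as a chain of FMAs" idea concrete, here is a minimal sketch in Python. It is not AMD's sequence: the linear seed, the iteration count, and the names `fma`, `fma_divide` and `chain_latency` are all assumptions made for illustration, and the fused multiply-add is emulated with exact rationals. With this crude seed the chain is longer than the 7 operations mentioned above; a more accurate starting estimate would need fewer iterations.

```python
from fractions import Fraction
import math

def fma(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: a*b + c with a single final rounding."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def chain_latency(n_ops: int, fma_latency: int = 6) -> int:
    """Latency of n dependent FMAs, using the 6-cycle figure from the post."""
    return n_ops * fma_latency

def fma_divide(a: float, b: float, iterations: int = 4):
    """Return (a / b, dependent FMA-class ops) for finite a and finite b > 0.

    Hypothetical sequence: Newton-Raphson on the reciprocal, then one
    residual correction of the quotient.
    """
    if not (math.isfinite(a) and math.isfinite(b) and b > 0.0):
        raise ValueError("sketch only handles finite a and finite b > 0")
    m, e = math.frexp(b)                  # b = m * 2**e with 0.5 <= m < 1
    y = 48.0 / 17.0 - (32.0 / 17.0) * m   # crude linear seed for 1/m (assumption)
    ops = 0
    for _ in range(iterations):
        err = fma(-m, y, 1.0)             # err = 1 - m*y
        y = fma(y, err, y)                # y = y + y*err
        ops += 2
    a_scaled = math.ldexp(a, -e)          # a / 2**e, so a/b == a_scaled / m
    q = a_scaled * y                      # first quotient estimate
    r = fma(-m, q, a_scaled)              # residual a_scaled - m*q
    q = fma(r, y, q)                      # corrected quotient
    ops += 3                              # the multiply is counted as one op
    return q, ops
```

For example, `fma_divide(1.0, 3.0)` returns a quotient within one ulp of `1.0 / 3.0`, and `chain_latency(7)` reproduces the 42-cycle figure.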
Remember that the ALUs on those monsters were double-pumped -- or 6 FO4. They used dynamic logic, but I literally cannot imagine how it's possible to make a 32-bit add with all the overhead in 6 FO4. Crazy.
from http://www.eecg.toronto.edu/~moshovos/ACA05/read/Pentium4ArchITJ.pdf
The processor does ALU operations with an effective
latency of one-half of a clock cycle. It does this operation
in a sequence of three fast clock cycles (the fast clock
runs at 2x the main clock rate) as shown in Figure 7. In
the first fast clock cycle, the low order 16-bits are
computed and are immediately available to feed the low
16-bits of a dependent operation the very next fast clock
cycle. The high-order 16 bits are processed in the next
fast cycle, using the carry out just generated by the low
16-bit operation. This upper 16-bit result will be
available to the next dependent operation exactly when
needed. This is called a staggered add. The ALU flags
are processed in the third fast cycle. This staggered add
means that only a 16-bit adder and its input muxes need to
be completed in a fast clock cycle. The low order 16 bits
are needed at one time in order to begin the access of the
L1 data cache when used as an address input.
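The staggered add described in that quote can be modelled in a few lines of Python. This is only a behavioural sketch: the function name `staggered_add` and the particular set of flags are assumptions for illustration, and nothing here models timing or the dynamic logic itself.

```python
MASK16 = 0xFFFF

def staggered_add(a: int, b: int):
    """Add two 32-bit values as two 16-bit halves plus a flags step.

    Returns (result, flags). Each commented step corresponds to one
    fast clock cycle in the quoted description.
    """
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF

    # Fast cycle 1: low 16 bits; the result is usable by a dependent op right away.
    lo = (a & MASK16) + (b & MASK16)
    lo_result = lo & MASK16
    carry = lo >> 16

    # Fast cycle 2: high 16 bits, consuming the carry out of the low half.
    hi = (a >> 16) + (b >> 16) + carry
    hi_result = hi & MASK16
    carry_out = hi >> 16

    # Fast cycle 3: flags, computed from the full 32-bit result.
    result = (hi_result << 16) | lo_result
    flags = {
        "zero": result == 0,
        "carry": bool(carry_out),
        "sign": bool(result >> 31),
    }
    return result, flags
```

For instance, `staggered_add(0x0000FFFF, 1)` carries from the low half into the high half and yields `0x00010000`, while `staggered_add(0xFFFFFFFF, 1)` wraps to 0 with the zero and carry flags set.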
Maybe we'll see something loosely similar in a future CPU.
Wasn't the reason they couldn't ramp clockspeed gate leakage? Which was overlooked in predictions because it was never a major factor before 90nm? And doesn't high-K gate material really help against gate leakage?
I second the wish to see P4 on a modern process. Or even just the ALUs, because damn, that's just crazy.
5GHz 8-core at stock? You guys are in dreamland. Not unless AMD went forward in time and brought back some 16nm CPUs from 2015.
Why is everyone still so sure bulldozer is going to come out at 3.5 after we have just seen an engineering sample doing 1.8?
There were "2.3GHz" Interlagos chips mentioned in supercomputer upgrade plans, scheduled for June. The 3.5GHz likely comes from the ISSCC paper.
K8 samples before launch were mostly running at 800MHz. So this kind of analysis might be useless, and it depends heavily on AMD's ES policy. We're not observing trees here.
I am sure you could do some study of how close engineering samples are to final speeds, using engineering samples from previous generations of chips. I haven't done that, but from what I remember of previous chips (more Intel than AMD), they were mostly either at or very close to the final released speed. I doubt you'll find many where the speed doubled, particularly this close to release.
Already there and old by now. IBM Power 6: 5 GHz base clock at 65 nm (2007)!
Maybe we'll see something loosely similar in a future CPU.
Though I don't know if they used that type of ALU; I think they used a regular one. However, Power 6 is not OoO. I am not sure, but AFAIK Power 6 also has 12 FO4.
These are x87 latencies with at least 80-bit precision. As I posted here, double precision SSE division has a latency of 27 cycles. What's not written there is that single precision has an even lower latency of 24 cycles (as one would expect). Same for SQRT, where DP lands at 38 cycles and SP at 29 cycles.
According to a patent the FMA units might have a forwarding path w/o rounding, which would provide the result after 5 cycles which would surely help in an implementation like Goldschmidt's division algorithm.
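For readers who have not met Goldschmidt's algorithm: it scales numerator and denominator by the same factor each step until the denominator converges to 1, at which point the numerator is the quotient. The sketch below is a plain Python illustration under stated assumptions: the name `goldschmidt_divide` and the default step count are mine, every multiply is rounded normally, and the unrounded forwarding path from the patent is not modelled.

```python
import math

def goldschmidt_divide(n: float, d: float, steps: int = 6) -> float:
    """Approximate n / d for finite n and finite d > 0 by Goldschmidt iteration."""
    if not (math.isfinite(n) and math.isfinite(d) and d > 0.0):
        raise ValueError("sketch only handles finite n and finite d > 0")
    m, e = math.frexp(d)        # d = m * 2**e with 0.5 <= m < 1
    num = math.ldexp(n, -e)     # scale n the same way, so n/d == num/m
    den = m
    for _ in range(steps):
        f = 2.0 - den           # if den = 1 - x, then f = 1 + x
        num *= f                # the two multiplies are independent of each other
        den *= f                # den becomes 1 - x*x, so the error squares each step
    return num
```

Starting from a worst-case error of 0.5, six squarings push the denominator error below double-precision resolution, which is why the default is 6 here; a table-based starting factor would shorten this. Unlike Newton-Raphson, the rounding errors in `num` are not self-correcting, so the result can be off by a few ulps, which is where an unrounded intermediate result would help.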
+(define_insn_reservation "bdver1_ssediv_double" 27
I think a 3.5GHz Zambezi at time-zero release, on an immature 32nm process with only the first few stepping respins under their belt, would be reasonable.
I am not saying what the clock speed is, but you always ship with the highest clock speeds that you can hit.
All of the future updates come from process/design enhancements. One would be a fool to hold back when bringing a new product to market.
My thinking would be in regards to the Turbo clock. I like the idea and would want C&C to still be able to run. Will they include a way to "overclock" the TDP and set the headroom to 175 watts, for example?
I doubt the motherboards would be able to handle that.
You bring up an excellent point. If BD is designed to hit up against its TDP all the time, then unless TurboCORE disables itself when not on stock settings, we could see a lot of blown motherboards (a similar thing happened when Thuban was released; there is still an ancient thread going on about it over at HardOCP).
So if we found out a CPU could run at 5GHz, and let's say that was at 180W of heat output and it was stable, wouldn't you rather that it only ran at that when it really needed to, and that it would also give that extra power to other cores on a more distributed load (like 4 cores at 4.5GHz and 6 cores at 4.0GHz), and so on?
I ran my QX6700 @ 4GHz for a couple of years on vapor-phase cooling; it used more than 300W on an ASUS P5E WS Pro mobo (not a cheap mobo in its time).
Just to check again. Are you sure that wasn't system power?
http://www.tomshardware.com/reviews/overclock-core-i7,2268-10.html