Digital Foundry: next-gen PlayStation and Xbox to use AMD's 8-core CPU and Radeon HD


ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
I like Blu-ray personally. Storing 100GB on a disc works out way cheaper than 100GB on an HDD

Seek time is ~10x higher and transfer speed is something like half that of an HDD for the current fastest Blu-ray drives. Sounds like patience would be required.
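
Back of the envelope with assumed numbers (not from this thread): a 12x BD drive tops out around 50 MB/s, so just streaming the full 100GB off the disc takes roughly 100,000 / 50 ≈ 2,000 seconds, well over half an hour, versus about half that from a ~100 MB/s HDD - and the 10x seek penalty makes scattered reads even worse.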
 

wlee15

Senior member
Jan 7, 2009
313
31
91
The Power Processing Element is based on the PowerPC 970, which is the main CPU in both those consoles.

The PowerPC 970 is an out-of-order processor that can decode up to 8 instructions per cycle; the PPE is an in-order processor that can decode 2 instructions per cycle. The PowerPC 970 has 2 integer units, 2 scalar FP units, 2 load/store units, 1 branch unit, and 1 VMX unit. The PPE has 1 integer unit, 1 combined scalar FP/VMX128 unit, 1 load/store unit, and 1 branch unit. The PPE also has SMT, while the PowerPC 970 does not. They really couldn't be more different.
 

Fx1

Golden Member
Aug 22, 2012
1,215
5
81
Seek time is ~10x higher and transfer speed is something like half that of an HDD for the current fastest Blu-ray drives. Sounds like patience would be required.

I couldn't really care less, to be honest. A movie plays fine on them and games seem to do OK with a small install to the HDD.

If I wanted fast I'd use my SSDs on my PC.
 

ElFenix

Elite Member
Super Moderator
Mar 20, 2000
102,395
8,558
126
Broadband in the US is ranked 15th in the OECD in terms of subscribers per 100 people.

I just want to point out a huge myth. It has absolutely nothing to do with size, which is often used as the primary excuse. You can only blame the politicians for why you don't have faster and cheaper uncapped internet.

Companies act like companies do. So while their behaviour ain't "ethical", it's not them you should blame.

Going off on a tangent here: for the vast majority of home users the most intensive thing they do on broadband is stream Netflix, which works fine in the single-digit Mbps range.
 

Scientist113

Junior Member
Jan 21, 2013
19
0
0
Those scores seem awfully low compared to http://www.anandtech.com/show/4134/the-brazos-review-amds-e350-supplants-ion-for-miniitx/6
And other benchmarks show a greater difference.
Upon closer examination, mine only shows PCMark. Yours shows that one, and others. But your PCMark result is an average of all 6, while mine shows all 6 separately. I didn't look at all of the individual marks. So it turns out the different benchmarks do agree: the Brazos is better across all 6 averaged, by 22%. In Productivity (i.e. office/databases, etc.) Intel's is better, and also in hard-drive speed. But in everything else AMD's is better. It is way better in your Javascript and browser benchmarks - of course, out-of-order processing. Those are heavy on program code.
 

SiliconWars

Platinum Member
Dec 29, 2012
2,346
0
0
Upon closer examination, mine only shows PCMark. Yours shows that one, and others. But your PCMark result is an average of all 6, while mine shows all 6 separately. I didn't look at all of the individual marks. So it turns out the different benchmarks do agree: the Brazos is better across all 6 averaged, by 22%. In Productivity (i.e. office/databases, etc.) Intel's is better, and also in hard-drive speed. But in everything else AMD's is better. It is way better in your Javascript and browser benchmarks - of course, out-of-order processing. Those are heavy on program code.

Best to avoid PCMark altogether and go with some proper benchmarks. Brazos is well ahead of Atom in single-threaded work and slightly ahead in multithreaded work on average. The real difference is in actual use, where the Atoms are sluggish and not very nice to use. Obviously Brazos's graphics are in another league.

If you'd ever used both side by side you'd think Atom was 10 years older.
 

wlee15

Senior member
Jan 7, 2009
313
31
91
In China, for example, it's mandatory from April for new construction to have fiber and to be connected. In Sweden the fiber network is owned by the public and rented out as dark fiber. In Denmark the government sets minimum requirements. That's what's needed: a government that wants broadband so people can start using it to save society costs. I can only do my taxes online, for example; there is no such thing as paper.

Here in Western Canada our telecom provider hasn't run copper to new construction since 2011, so I would assume the same is true for much of the US as well, since the cost difference between fiber and copper in new construction is negligible. Obviously that doesn't mean you can get 100 megabit+ speeds, since the telecoms still have to keep a coherent pricing scheme with those still on DSL.
 

Scientist113

Junior Member
Jan 21, 2013
19
0
0
The PowerPC 970 is an out-of-order processor that can decode up to 8 instructions per cycle; the PPE is an in-order processor that can decode 2 instructions per cycle. The PowerPC 970 has 2 integer units, 2 scalar FP units, 2 load/store units, 1 branch unit, and 1 VMX unit. The PPE has 1 integer unit, 1 combined scalar FP/VMX128 unit, 1 load/store unit, and 1 branch unit. The PPE also has SMT, while the PowerPC 970 does not. They really couldn't be more different.
According to IBM they are both based on the POWER4 architecture. Just an interesting side note. Why's that? Basically the same hardware: similar pipeline, clock speeds, and transistor size. It simply uses half the execution units, except for the VMX. They took the POWER4 and modified it.
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,055
3,862
136
According to IBM they are both based on the POWER4 architecture. Just an interesting side note. Why's that? Basically the same hardware: similar pipeline, clock speeds, and transistor size. It simply uses half the execution units, except for the VMX. They took the POWER4 and modified it.


They are nothing alike. Xenon at best (MS numbers) delivers around 0.2 IPC per thread. That's optimised code, compared against un-optimised x86 code that gets around 0.9 IPC per thread on Bobcat; Jaguar will be pushing 1.1. I don't know what the 970's IPC is like exactly; I believe it was somewhere around 20% higher than Prescott-based P4s at the time, so somewhere around 0.8-0.9. That's worlds apart from Xenon, and it hurt people very badly - I particularly remember John Carmack not being too thrilled.
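
Back of the envelope with those numbers: Xenon's 3 cores / 6 hardware threads at 3.2GHz and ~0.2 IPC per thread works out to roughly 3.2 x 6 x 0.2 ≈ 3.8 billion instructions/s of general-purpose throughput, while 8 Jaguar cores at a rumoured ~1.6GHz (an assumption, not a confirmed clock) and ~1.1 IPC would be about 1.6 x 8 x 1.1 ≈ 14 billion - several times the throughput before you even get to the far better cache and memory behaviour.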
 

Scientist113

Junior Member
Jan 21, 2013
19
0
0
Okay, new information. Apparently the Bobcat is a 380-million-transistor processor with just 2 cores. Why? The on-chip GPU: 80 32-bit SIMD units. However, the Jaguar will not have that; it will have a separate SIMD unit. So it's hard to tell how many transistors the Bobcat cores themselves actually account for. In this image of an A8 Fusion CPU

http://www.techradar.com/reviews/pc-mac/pc-components/processors/amd-a8-3500m-965258/review

The right side is obviously the SIMD math. It appears to take up half of the CPU die, so I think 180 million transistors is an accurate estimate for the Bobcat cores. The Jaguar will have an extra 64-bit FP unit, so we can put that one at 200 million transistors for each 2-core unit. Thus I'll have to revise my statement: the PS4 CPU will likely have 800 million transistors for computation alone.

Numbers for the AMD chips' transistor counts:
http://www.techarp.com/showarticle.aspx?artno=347&pgno=2
http://www.alternatewars.com/BBOW/Computing/Computing_Power.htm

Also my original benchmark:
http://www.hardwarezone.com/feature...os-lord-netbooks/benchmarking-and-performance

The guy was comparing a 1 GHz AMD Bobcat vs a 1.5 GHz Intel Atom - thus another reason the difference was not that high.

Thus we're seeing more than 10 times greater CPU performance, as the Bobcat is 20-50% faster than the Atom core. That means 12-15 times better performance for the PlayStation 4's CPU compared to the PlayStation 3's in general-purpose computing. As for SIMD, I expect 2-3 times greater performance for the CPU as well. Thus, a significant improvement: 6 times the transistors.
 

jpiniero

Lifer
Oct 1, 2010
16,567
7,070
136
Wouldn't it just make more sense to drop the SIMD unit and do any of those calculations on the GPU?
 

SiliconWars

Platinum Member
Dec 29, 2012
2,346
0
0
@Scientist - Nah, the Bobcat cores are tiny in comparison to Llano, and the chip layout is a lot different. It's more like 1/4 compared to 1/2.
 

Scientist113

Junior Member
Jan 21, 2013
19
0
0
However, the 12-15 times greater general-purpose processing will open the door to new possibilities: greater AI performance, more characters (playable and non-playable), and better physics. This generation had very small numbers for all of those.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
They are nothing alike. Xenon at best (MS numbers) delivers around 0.2 IPC per thread.

Do you have a link for that? I keep seeing people mention it, but no source.

So if you've got some official Microsoft numbers, that would be great.
 

wlee15

Senior member
Jan 7, 2009
313
31
91
According to IBM they are both based on the POWER4 architecture. Just an interesting side note. Why's that? Basically the same hardware: similar pipeline, clock speeds, and transistor size. It simply uses half the execution units, except for the VMX. They took the POWER4 and modified it.

AMD's Bobcat processor and Intel's Merom variant of their Core 2 processor have a nearly identical instruction set, but it would obviously be incorrect to state that one is based on the other.
 

NTMBK

Lifer
Nov 14, 2011
10,423
5,727
136
According to IBM they are both based on the POWER4 architecture. Just an interesting side note. Why's that? Basically the same hardware: similar pipeline, clock speeds, and transistor size. It simply uses half the execution units, except for the VMX. They took the POWER4 and modified it.

I'm sorry, but this is entirely wrong. The PPE for the Cell (and the Xenon core which was based on it) was a brand new design, built from the ground up.

If you want to learn about how that chip was designed, read The Race for a New Game Machine. It's written by one of the chief architects of that PowerPC core. Obviously he is pretty biased and thinks it did pretty well, but if you read between the lines you can see where things went off track. (For instance, according to that account, out-of-order execution was thrown out and replaced with SMT because one engineer wanted to make his mark on the core.)
 

Scientist113

Junior Member
Jan 21, 2013
19
0
0
Wouldn't it just make more sense to drop the SIMD unit and do any of those calculations on the GPU?
It's far more power efficient. It makes the computer smaller. You don't even need a separate graphics chip on the motherboard, let alone another video card.
 

Scientist113

Junior Member
Jan 21, 2013
19
0
0
This also allows for far greater optimization of the memory, which is key, since memory is the main problem. A regular computer:

CPU
A few hundred registers
8 MB of L2 cache
512 Kbytes of L1 cache
An external memory bus
A bus to the GPU

GPU
512 MB - 4 GB of GDDR5 RAM
A separate bus to the CPU

Massive inefficiency, as data has to be transferred between the 3 separate pools of memory over buses and across physical distance. Higher latencies.

The ideal PC:
1 CPU, with 8 MB of L2 cache
GPU - right next to the CPU, with a bus to the CPU's L2 cache
Registers, which the CPU and GPU can access
An external memory bus

Far more efficient. Far less latency.
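
To put rough numbers on the copy overhead, here's a quick Python back-of-the-envelope sketch; the bandwidth, latency and buffer size below are my own assumptions, not figures from this thread:

pcie_bw  = 8e9     # assumed ~8 GB/s effective bus bandwidth between the pools
pcie_lat = 10e-6   # assumed ~10 us setup latency per transfer
buf      = 64e6    # assumed 64 MB of per-frame data that has to move
copies   = 2       # CPU -> GPU, and results back again

copy_time = copies * (pcie_lat + buf / pcie_bw)
print("copy overhead per frame: %.1f ms" % (copy_time * 1e3))  # ~16.0 ms
print("frame budget at 60 fps:  %.1f ms" % (1000.0 / 60))      # ~16.7 ms

With a single shared pool that copy simply never happens, which is the latency/efficiency win described above.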
 

Scientist113

Junior Member
Jan 21, 2013
19
0
0
The Pentium 4 Extreme Edition's maximum memory bandwidth was 8.512 GB/s. It was released in 2004:

http://ark.intel.com/products/27492...-HT-Technology-3_73-GHz-2M-Cache-1066-MHz-FSB

The second generation Core i7 Extreme Edition (the latest Intel CPU) can only support 25.6 GB/s.

2004 - Pentium 4 HT Extreme - 8.512 GB/s
2013 - Core i7 Extreme Edition - 25.6 GB/s

In 9 years, only a 3 times increase in bus bandwidth. That's hardly anything. How do you improve memory bandwidth? Put the memory next to the GPU.
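
For scale: 25.6 / 8.512 ≈ 3.0, and 3.0^(1/9) ≈ 1.13, so that works out to only about 13% per year of compound growth in CPU memory bandwidth over that stretch.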
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
That article is 2yrs old.

I made my comment within the context of assuming we'd be comparing Jaguar (built on TSMC 28nm) to a modern Atom SKU (built on 22nm).

I'm expecting the 22nm Atom to sip power like a Prius, but also to be designed such that it is not going to scale to the kinds of performance capabilities that 28nm Jaguar has been designed to hit.

Are you expecting Jaguar to have even better performance/watt than 22nm Atom, despite its process handicap of being built on 28nm? (If you are, then I'd buy that argument, given that AMD was able to do just that with their 40nm Brazos versus Intel's 32nm Atom IIRC.)

Be careful speculating about 22nm Atom. I've heard their microarchitecture is vastly improved.

Yep, typical xbitlabs - they do a lot of what seems to be good testing, but it's horribly flawed at best, or biased, depending on what they are reviewing. The original Brazos review shows a completely different story.

[attached images: power-1.png, power-2.png]

That's not necessarily useful for comparing core efficiency... the Bobcat may be doing twice as much work in CPU Burn as the Atom. If you run it at a lower frequency and voltage so that it runs the CPU Burn loop at the same speed as Atom, the change in power from idle might shrink dramatically. In the Jaguar talk at Hot Chips, the chief architect actually mentioned something related: the clock gating efficiency of Jaguar appeared to be lower than Bobcat for the "power virus" test pattern, but it was running the test about twice as fast, so in fact it's still more efficient.

On 4 higher-frequency cores vs 8 lower-frequency cores: the higher frequency of the four cores will require more voltage. Power scales quadratically with voltage, so I can see why they decided to use 8 cores rather than 4, especially if they don't want RROD issues this time around.

That's only true for a particular design (i.e. an Atom at 1GHz vs. an Atom at 2GHz). It's not true when you're comparing two different microarchitectures. For example, a Bulldozer or Llano would likely reach 1.6GHz at a lower voltage than a Bobcat. Sometimes a higher-frequency design can be more power-efficient than a lower-frequency design, but I suspect the details of that are beyond the scope of this thread.
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
Power is proportional to C*V^2*f, where C is the capacitive load of all the circuits switching each clock cycle. A more complex design (Piledriver) will inevitably have a greater C than a simpler design (Jaguar). Is it possible that Piledriver would be able to operate at 1.6GHz with lower voltage than a Jaguar? Possibly, but I certainly don't think that's a given. What we can definitely say is that using a wider, lower-voltage design is a well-known technique to reduce power consumption while maintaining or increasing performance (Intel is also doing this for the ULV GT3e graphics).
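
As a toy illustration of that tradeoff, here's a quick Python sketch; the capacitance, voltages and clocks below are assumptions chosen for the arithmetic, not real chip specs:

def dynamic_power(cores, cap_per_core, volts, freq_hz):
    # dynamic power only: P ~ C * V^2 * f, ignoring leakage and uncore
    return cores * cap_per_core * volts**2 * freq_hz

C = 1e-9  # arbitrary per-core switched capacitance (same microarchitecture assumed)

wide   = dynamic_power(8, C, 0.90, 1.6e9)  # 8 slower cores at a lower voltage
narrow = dynamic_power(4, C, 1.20, 3.2e9)  # 4 faster cores needing a higher voltage

print(wide, narrow, narrow / wide)

Both configs have the same nominal core-GHz (8 x 1.6 = 4 x 3.2), but the 4-core setup burns roughly (1.20/0.90)^2 ≈ 1.8x the dynamic power in this toy model, which is the wider-and-slower argument in a nutshell.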
 

Fx1

Golden Member
Aug 22, 2012
1,215
5
81
I think the biggest issue will be the 8GB DDR3 vs the 4GB GDDR5.

Personally I vote GDDR5, since bottlenecks ruin everyone's party.
 

SPBHM

Diamond Member
Sep 12, 2012
5,066
418
126
I think the biggest issue will be the 8GB DDR3 vs the 4GB GDDR5.

Personally I vote GDDR5, since bottlenecks ruin everyone's party.

well, it's 256-bit DDR3 + embedded SRAM
it shouldn't be a problem if the GPU is really 12 CUs at 800MHz
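
For rough scale, assuming those are GCN-style CUs with 64 shader ALUs each and 2 FLOPs per ALU per clock (my assumption, not something stated here): 12 x 64 x 2 x 0.8GHz ≈ 1.23 TFLOPS, which the 256-bit DDR3 plus embedded SRAM should be able to keep fed.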