XBitlabs: Advanced Micro Devices Set to Unveil New Strategy Next Week


Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Cell is difficult to program for because to take advantage of it you have to take advantage of the parallelism which isn't an easy thing to do.
You have to take advantage of SIMD-oriented coprocessors that do not support any meaningful form of multitasking (OTOH, they can be chained together), can compete with each other and the CPU for memory accesses (in theory this can be avoided, but not in reality), and have to fit their code and data within 256KB at any given time, for each kernel. It was like a big iron CPU and DSP got together and had a baby. Taking advantage of parallelism in a regular CPU is going to be easy, by comparison, but slower (CPU+Fermi/GCN should be able to make up for SPEs). An 8C ARM <whatever> would be 8 normal CPU cores, not remotely Cell-like.

Your 8-core ARM CPU isn't going to be any easier to program for. Programmers are having a hard enough time figuring out what to do with quad-core CPUs.
Quad core is already being used effectively by quite a few games, some even getting 80%+ per core up to quad. The hard part has been that software support for such parallelism sucked. Now that the support is catching up, it's just one of the hard parts of making high-performance software, not really any harder than any other.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,683
2,571
136
Cell is difficult to program for because to take advantage of it you have to take advantage of the parallelism which isn't an easy thing to do. Your 8-core ARM CPU isn't going to be any easier to program for. Programmers are having a hard enough time figuring out what to do with quad-core CPUs.

Not just that. With Cell, the biggest hurdle is that you have to fit all the code and data you need into the local store, and you cannot just do normal memory reads from RAM.

Compared to Cell, programming for multicore desktops is a walk in the park. (Which, combined with how little proper multithreaded programming is being done on the desktop, tells you all you need to know about Cell.)
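To make the local store constraint concrete, here is a rough sketch in Python rather than real SPE code. It assumes the 256KB figure quoted earlier in the thread; the 64KB code reserve, the 4-byte item size, and the function names are made up for illustration. The list slices only stand in for the explicit DMA transfers a real kernel would issue.

```python
LOCAL_STORE_BYTES = 256 * 1024   # per-SPE limit quoted earlier in the thread
CODE_RESERVE_BYTES = 64 * 1024   # hypothetical space kept for code and stack
ITEM_BYTES = 4                   # assumption: 32-bit values


def chunk_len(double_buffered=True):
    """How many items fit per buffer once code and all buffers share the store."""
    budget = LOCAL_STORE_BYTES - CODE_RESERVE_BYTES
    # One input and one output buffer, doubled if transfers overlap compute.
    buffers = 4 if double_buffered else 2
    return budget // (buffers * ITEM_BYTES)


def scale_in_chunks(main_memory, factor):
    """Process a large array piece by piece, never touching main memory directly."""
    n = chunk_len()
    result = []
    for start in range(0, len(main_memory), n):
        local_in = main_memory[start:start + n]      # stands in for a DMA "get"
        local_out = [x * factor for x in local_in]   # kernel sees local data only
        result.extend(local_out)                     # stands in for a DMA "put"
    return result
```

On a normal multicore CPU the whole function collapses to a single loop over the array, which is the point being made above: the chunking and transfer bookkeeping is extra work the programmer has to do by hand.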
 

ElFenix

Elite Member
Super Moderator
Mar 20, 2000
102,402
8,574
126
bah. ok, still, my main thrust has been and will be that ARM isn't some magic technology for power consumption, and at similar performance levels PowerPC can have similar power consumption. ARM's main advantage with low power designs over other RISC-type architectures is their experience building them. that advantage doesn't apply so much at the relatively higher performance level that consoles demand (with a power budget orders of magnitude larger), and so IBM's experience building relatively higher performance chips becomes an advantage over ARM.
 

eastofeastside

Junior Member
Nov 19, 2011
17
3
81
Take a look at this Applied Micro X-Gene ARMv8 64-bit server chip. Couldn't something similar work for a next-gen console?

http://www.anandtech.com/show/5098/appl ... -armv8-soc

APM's performance estimates put a 3GHz X-Gene at roughly half the integer performance of a 2.4GHz Sandy Bridge. The X-Gene advantage however is the ability to integrate many more cores. APM expects a quad-core X-Gene will be able to perform similarly to a dual-core Sandy Bridge Xeon, but with much lower power consumption.
Half the performance of Intel Sandy Bridge Xeon at only 2 watts per core.

Despite the aggressive architecture, each core is estimated to consume only 2W per core.
You could have a quad core configuration running at only 8 watts.
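A quick back-of-the-envelope check of those figures, as a sketch only. It takes APM's own estimates from the quotes above (2W per core, roughly half the integer performance of a Sandy Bridge core) and assumes perfect scaling across cores, which real workloads won't give you. The function names are made up.

```python
WATTS_PER_CORE = 2.0       # APM's estimate, from the quote above
RELATIVE_INT_PERF = 0.5    # one 3GHz X-Gene core vs one 2.4GHz Sandy Bridge core


def chip_power(cores):
    """Core power only; ignores uncore, memory controller and any GPU."""
    return cores * WATTS_PER_CORE


def snb_core_equivalents(cores):
    """Integer throughput in Sandy Bridge cores, assuming perfect scaling."""
    return cores * RELATIVE_INT_PERF


for cores in (4, 8):
    print(cores, "cores:", chip_power(cores), "W,",
          snb_core_equivalents(cores), "SNB-core equivalents")
```

So the quad-core case works out to 8W and about two Sandy Bridge cores' worth of integer throughput, which matches APM's claim, but only on threaded integer code.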

Here is a hypothetical ARM console chip integrated directly with the GPU:

So what might Microsoft do? I'll speculate that they won't design their own entirely new pipeline; the return on investment seems slim compared to other things they could spend time on. It's more likely they'd start from an existing ARM core and begin making changes. Microsoft will certainly integrate a powerful GPU onto the processor die, as not doing so would be a step backwards from the existing XBox 360-S. I'll speculate they will tightly couple the GPU, allowing very low latency access to it as an ARM coprocessor in addition to the more straightforward memory mapped device. This is not unique: some of the on-chip XScale functional units can be accessed both as coprocessors for low latency and as memory mapped registers to get to the complete functionality of the unit. Having very low latency access to the GPU would allow efficient offloading of even small chunks of processing to GPU threads.

One possibility is to let the GPU directly access the ARM processor cache and registers. This would allow GPU offloading to work almost exactly like a function call, putting arguments into registers or onto the stack with a coprocessor instruction to dispatch the GPU. When the GPU finishes, the ARM returns from the function call. For operations where the GPU is dramatically better suited, the ARM CPU would spend less time stalled than it would take to compute the result itself. If the ARM CPU supported hardware threads, it could switch to a different register file and run some other task while the GPU is crunching.

Part of the success of the XBox is due to its straightforward programming model compared to the Sony PS3. XBox has a fast SMP CPU paired with a GPU, where PS3 has an unruly gaggle of Cell processors to be managed explicitly. XBox cannot rely on the individual cores getting faster, as single core performance has leveled off due to power dissipation constraints. XBox has to make it easy for game developers to take advantage of more cores. Tightly coupling the GPU threads so they can function more like one big SMP system is one avenue to do this.
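The "offload works like a function call" idea from the quote can be mimicked in software, as a loose analogy only. In this sketch a worker thread stands in for the GPU and a future stands in for the coprocessor dispatch; `gpu_kernel` and `run_frame` are hypothetical names, and nothing here reflects how any real console hardware behaves.

```python
from concurrent.futures import ThreadPoolExecutor


def gpu_kernel(values):
    """Stand-in for work the GPU would be much better suited to."""
    return [v * v for v in values]


def run_frame(values):
    with ThreadPoolExecutor(max_workers=1) as gpu:
        pending = gpu.submit(gpu_kernel, values)  # "dispatch" with the arguments
        other_work = sum(values)                  # CPU runs another task meanwhile
        squares = pending.result()                # "return" from the offloaded call
    return squares, other_work
```

The caller sees an ordinary call that returns a value, which is the programming model the quote is arguing for. The hardware-thread switch it describes corresponds to the CPU doing `other_work` before it blocks on the result.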
 

wlee15

Senior member
Jan 7, 2009
313
31
91
If you're looking at half the speed of a 2.4 GHz Sandy Bridge at 3 GHz, then you're not doing much better than the rather simple 3.2 GHz Power-based CPU that you have in the existing consoles.
 

iCyborg

Golden Member
Aug 8, 2008
1,356
65
91
And that is only integer performance, which is about the best case scenario for ARM.
 

eastofeastside

Junior Member
Nov 19, 2011
17
3
81
If you're looking at half the speed of a 2.4 GHz Sandy Bridge at 3 GHz, then you're not doing much better than the rather simple 3.2 GHz Power-based CPU that you have in the existing consoles.

Specifically the Xeon E3-1260L mentioned in the performance chart?
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,683
2,571
136
If you're looking at half the speed of a 2.4 GHz Sandy Bridge at 3 GHz, then you're not doing much better than the rather simple 3.2 GHz Power-based CPU that you have in the existing consoles.

The in-order cores in Xenon and Cell don't come even close to half the speed of 2.4GHz SNB. Clock to clock, they are considerably worse than P4.