Discussion [TweakTown] Guerrilla dev: PS3's Cell CPU is by far stronger than new Intel CPUs

Hitman928

Diamond Member
Apr 15, 2012
6,606
12,103
136

Even desktop chips nowadays, the fastest Intel stuff you can buy is not by far as powerful as the Cell CPU, but it's very difficult to get power out of the Cell. I think it was ahead of its age, because it was a little bit more like how GPUs work nowadays, but it was maybe not balanced nicely and it was too hard to use. It overshot a little bit in power and undershot in usability, but it was definitely visionary

Interesting perspective. I completely disagree, but interesting. In a way there's some truth to it: the Cell had different types of compute units, especially the SPEs, which made it very powerful (on paper) for SIMD workloads compared to CPUs of the time. But I don't see how it could be considered any more powerful than a modern APU.
 

Arkaign

Lifer
Oct 27, 2006
20,736
1,379
126
Lol that's just factually incorrect, but it does lead to some hilarious logic problems.

The PS3 had Cell @ 3.2 GHz. The thing was basically a very streamlined PPC design with 1 full core (the controller) and theoretically 8 SPEs (sort of 'dumb' cores, stripped down to run very specific tasks). However, the 7th was dedicated to OS background/security work, and the 8th was disabled for yield purposes, leaving an effective big.LITTLE-style 1+6 layout.

NO branch prediction or out-of-order execution. It would have been catastrophic as a general-purpose CPU, but in the right hands it was capable of some nice stuff.

Anyway, as far as Intel (or AMD x86, for that matter) goes, let's compare, lol.

The Jaguar uArch was developed as a very low-power CPU for netbooks and the like. When Sony and MS came knocking for APUs for the 8th-gen consoles, it was literally the only thing AMD had that would fit such a TDP limit while leaving plenty of die space for the GPU portion, which is critical for console performance.

Because we have a de facto identical CPU on PC, we can compare directly to Intel. The PS4 used a 1.6 GHz Jaguar, which is essentially identical to the Athlon 5150, just with 4 cores for the AM1 part vs 8 cores for the PS4.

Using Geekbench 5, we get the following (not the best benchmark, but the easiest for comparing across such a lengthy gap in time):

Athlon 5150 1.6 GHz single-core: 136
Pentium E2140 1.6 GHz single-core: 191
P4 Cedar Mill 1.6 GHz single-core: 123 (calculated to normalize to a 1.6 GHz clock speed)

And for a modern CPU by comparison, hah:

9900K 1.6 GHz single-core: 591 (calculated to normalize to a 1.6 GHz clock speed)
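To make the normalization explicit, here's a minimal Python sketch of the arithmetic. The inputs in the example are hypothetical, and real scores won't scale perfectly linearly with clock:

```python
# Toy clock normalization: assumes a single-core score scales linearly
# with frequency, which is optimistic (it ignores memory and boost effects).

def normalize(score: float, actual_ghz: float, target_ghz: float = 1.6) -> float:
    """Scale a measured single-core score to an equivalent score at target_ghz."""
    return score * (target_ghz / actual_ghz)

# Hypothetical inputs: a chip scoring 1200 at 4.8 GHz lands
# around 400 when normalized to 1.6 GHz.
print(round(normalize(1200, 4.8)))  # 400
```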

As you can see, Jaguar is barely better than a Pentium 4 clock for clock. Yet it replaced the PS3's processor in the 8th gen, at 1.6 GHz vs the 3.2 GHz Cell, and it has been shown to be capable of running the same games, such as The Last of Us, the Uncharted series, and God of War 3 (the most lauded 7th-gen titles), at equal or higher framerates.

TL;DR: just because someone is a "dev" doesn't mean they can't say incredibly dumb things. Cell was an ambitious and interesting product, but it's not even remotely close to modern x86 units in performance. It would get crushed by 10+ year-old CPUs.
 

majord

Senior member
Jul 26, 2015
509
710
136
Arkaign

As you can see, Jaguar is barely better than a Pentium 4 clock for clock. Yet it replaced the PS3's processor in the 8th gen, at 1.6 GHz vs the 3.2 GHz Cell. [snip]

Jaguar is well faster than the P4 clock for clock (it's faster than K8) outside of that GB result.

Your point still stands, though.
 

Magic Carpet

Diamond Member
Oct 2, 2011
3,477
233
106
Arkaign

When I get bored, I fire up my Kabini box (A4-5000) and play around. Yes, it's got 4 cores, but man... you literally have to adjust to the experience before you can actually use it, it's that slow. It definitely feels slower than any of the Pentium 4s I used back in the day (~3 GHz SKUs). Only good for watching YouTube videos, and thinking... that's the kind of hardware the consoles have got, mmm... sweet! Not.

Honestly, even with 8 slightly higher-clocked cores, I have no idea how the PS4 is able to produce that kind of FPS/quality in those games. On the PC side you would quickly become CPU-limited in any of those games with similar GPU power.


Or maybe the PS4 IS slow and it's just extremely well optimized... I know that if I can't get at least 60 fps in an FPS game, I start dropping gfx settings. You see, what's "good" here is very subjective. In the late 90s there was a famous Diamond Monster 3D II ad: "if you can't play at 60 fps, you can't play." That's definitely me.
 

Hitman928

Diamond Member
Apr 15, 2012
6,606
12,103
136
Arkaign

When I get bored, I fire up my Kabini box (A4-5000) and play around. [...] It definitely feels slower than any of the Pentium 4s I used back in the day (~3 GHz SKUs). [snip]

The A4-5000 runs at 1.5 GHz, so half the speed of the P4 in that graph. If you scaled the A4-5000 score up to 3 GHz, even assuming only 90% scaling, you'd still end up with roughly 25% better clock-for-clock performance than the P4. Obviously, running at 1.5 GHz it's going to be slower than a P4 running at 3 GHz, except maybe in code using the more modern instruction sets the P4 lacks. A rough sketch of that estimate is below.
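Here's the shape of that arithmetic in Python. The two scores are placeholders, since the graph's exact numbers aren't reproduced here; only the scaling logic matters:

```python
# Rough frequency-scaling estimate: project the A4-5000's score from
# 1.5 GHz to 3 GHz, assuming only 90% of linear scaling is realized.
a4_score_1p5ghz = 500   # placeholder -- read the real value off the graph
p4_score_3ghz = 720     # placeholder -- read the real value off the graph

a4_projected_3ghz = a4_score_1p5ghz * (3.0 / 1.5) * 0.9   # = 900.0
advantage = a4_projected_3ghz / p4_score_3ghz - 1
print(f"Projected clock-for-clock advantage: {advantage:.0%}")  # ~25%
```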
 

Magic Carpet

Diamond Member
Oct 2, 2011
3,477
233
106
Hitman928

Of course, Kabini had newer instructions and whatnot, and those kinds of workloads would slaughter any P4; I was merely thinking about legacy code. The ST performance just wasn't there, because of the low clocks. It's similar to late P3 versus early P4: until the P4 gained enough clock, it was slower at 3x the TDP. Out of the box, ST/SC performance really is what matters most, on the desktop at least, but of course with DX12/Vulkan tech it's less of an issue now for gaming (e.g. BL3 under DX12 can efficiently use 8 threads versus 4 under DX11).



So the question is: what took them so long to bring the MT improvements found in DX12/Vulkan to PC users? Apparently, the slow PS4 has been enjoying them from day one.
 

BigDaveX

Senior member
Jun 12, 2014
440
216
116
I can imagine the Cell might be able to beat the current crop of Intel desktop CPUs when it comes to certain kinds of specialized vector math. Trouble is, any halfway-decent GPU would slaughter them both.

And that's part of the reason why the current console generation was able to get away with Jaguar CPUs: the PS3, and to a lesser extent the Xbox 360, had to make do with older GPUs that offered little in the way of general compute, but the PS4 and Xbox One were both equipped with GCN-derived graphics cores, which excel at GPGPU work and can take that strain off the CPU.
 

majord

Senior member
Jul 26, 2015
509
710
136
Hitman928

Of course, Kabini had newer instructions and what not, those kind of workloads would slaughter any P4s, I was merely thinking about legacy code. The ST performance wasn't just there, because of low clocks.

Legacy or no legacy, it [Jaguar] is way faster: 50%+ higher IPC. I don't know what that graph you posted is, but the fact that the Phenom II X6 isn't much faster than the P4 there says a lot about its relevance.
 

Magic Carpet

Diamond Member
Oct 2, 2011
3,477
233
106
Legacy or no legacy, it [Jaguar] is way faster: 50%+ higher IPC. I don't know what that graph you posted is, but the fact that the Phenom II X6 isn't much faster than the P4 there says a lot about its relevance.
I shelved the P3/P4 machines a while ago, but I still have the data; the rest are still in active use. When I say Kabini is dog-slow, I mean general usage in Windows 10. You've obviously never used it; that's okay, you're not missing much. At 3 GHz it would have been much better, but at only 1.5 GHz nobody cares about its IPC improvements if the whole experience is unacceptable. My Merom-based laptop (T7600) gets around 820 points in this test, and that would be my bare minimum for acceptable ST performance. Not sure what the engineers were smoking at the time. Yeah, a YouTube box @ 15 W, sounds awesome.

Thuban was clocked at 3.1 GHz, yeah, still slow... but much more usable even with its inferior IPC. The lowest-clocked Haswell i7 gets around 2200 marks, a Ryzen 3700X around 3000 marks. I know what's fast and what's slow.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
I'm glad someone started a thread on this article, as it popped up on my news feed.

When I first read it, my immediate thought was how on Earth could this be true when x86-64 CPUs now have AVX/AVX2/AVX-512?

Aren't modern x86-64 CPUs already past the 1 teraflop threshold?
 

gdansk

Diamond Member
Feb 8, 2011
4,061
6,714
136
I'm glad someone started a thread on this article, as it popped up on my news feed.

When I first read it, my immediate thought was how on Earth could this be true when x86-64 CPUs now have AVX/AVX2/AVX-512?

Aren't modern x86-64 CPUs already past the 1 teraflop threshold?
I guess so. I just benched my 3700X and it achieved 1.045 TFLOPS using 256-bit FMA3. But that doesn't seem possible, since 8 cores * 4 GHz * 8 single-precision lanes * 2 'flops' per FMA = 512 GFLOPS. I'm missing something. Maybe it can do two FMAs per clock cycle?
 

moinmoin

Diamond Member
Jun 1, 2017
5,203
8,365
136
I guess so. I just benched my 3700X and it achieved 1.045 TFLOPS using 256-bit FMA3. But that doesn't seem possible, since 8 cores * 4 GHz * 8 single-precision lanes * 2 'flops' per FMA = 512 GFLOPS. I'm missing something. Maybe it can do two FMAs per clock cycle?
Yes, your 3700X can do two 256-bit FMAs per clock cycle. The previous two Ryzen generations could only do one 256-bit FMA per clock cycle. The corrected peak-throughput arithmetic is sketched below.
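A minimal Python sketch of that corrected arithmetic, using the ~4 GHz clock from the post above (actual all-core clocks under AVX load will vary):

```python
# Theoretical peak single-precision throughput of a Zen 2 3700X at ~4 GHz,
# with two 256-bit FMA pipes per core.
cores = 8
clock_hz = 4.0e9
sp_lanes = 256 // 32     # 8 single-precision lanes per 256-bit vector
flops_per_fma = 2        # a fused multiply-add counts as 2 flops
fma_pipes = 2            # Zen 2 retires two 256-bit FMAs per cycle

peak_flops = cores * clock_hz * sp_lanes * flops_per_fma * fma_pipes
print(f"{peak_flops / 1e12:.3f} TFLOPS")  # 1.024 TFLOPS
```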
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Sheesh, why do so many people love to dump on x86? Even when the facts are so obviously against them?

I remember many years ago there was a U.S. gov't contract for a new CPU design that could reach 1 teraflop. For the life of me I can't remember the codename for that damn CPU, as it had to be at least 15 years ago, and a Google search isn't returning anything useful.

At any rate, I don't think anyone would have imagined 15 years ago that we'd have x86 desktop CPUs capable of hitting or exceeding the teraflop barrier.
 

Arkaign

Lifer
Oct 27, 2006
20,736
1,379
126
Sheesh, why do so many people love to dump on x86? Even when the facts are so obviously against them?

I remember many years ago there was a U.S. gov't contract for a new CPU design that could reach 1 teraflop. For the life of me I can't remember the codename for that damn CPU, as it had to be more than 10 years ago, and a Google search isn't returning anything useful.

At any rate, I don't think anyone would have imagined 10-15 years ago that we'd have x86 CPUs capable of hitting or exceeding the teraflop barrier.

Haha, yeah. x86 these days is so juiced up on steroids that it's hardly recognizable. I remember when RISC was the big theorized displacement for x86, but x86 adapted and absorbed so much innovation that it doesn't really make sense to worry about replacing it for most current applications. ARM is alright and scales down better, but I've yet to see one scale UP to the level of Ryzen etc.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,802
1,285
136
There was always that weird Intel project, with four x86 PPEs (Harpertown-derived) and 64 VLIW SIMD SPEs (x86 POD-extension ISA) on 45 nm.

Cell BE @ 3.2 GHz => 200 GFLOPS single-precision, or 50 GFLOPS double-precision.
Intel's 2.5D/3D POD @ 3 GHz => 1.5 TFLOPS single-precision, or 768 GFLOPS double-precision.
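For reference, the commonly cited Cell figure falls out of simple peak arithmetic; a sketch (it counts only the SPEs, not the PPE's VMX unit):

```python
# Commonly cited Cell BE peak: 8 SPEs, each retiring one 4-wide
# single-precision FMA per cycle (8 flops/cycle), at 3.2 GHz.
spes = 8
flops_per_cycle = 4 * 2   # 4 SP lanes * 2 flops per fused multiply-add
clock_hz = 3.2e9

peak = spes * flops_per_cycle * clock_hz
print(f"{peak / 1e9:.1f} GFLOPS")  # 204.8 GFLOPS, i.e. the ~200 quoted above
```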

 

Nereus77

Member
Dec 30, 2016
142
251
136
"The Cell processor was (and apparently still is) a potent workhorse that could push some serious performance, but tapping its power was a convoluted, complex, and often frustrating, time-consuming process."

Yeah, that doesn't sound like an amazing processor to me...
 

Arkaign

Lifer
Oct 27, 2006
20,736
1,379
126
You mean you failed to notice that ARM systems are starting to enter TOP500.org? Or that AWS has designed an ARM chip that's competitive against x86 offerings for DB tasks?

Of course we've yet to see an acceptable ARM desktop...

Yeah, that's kind of what I mean: a general-purpose product that can run Windows competently. I remember giving Windows RT a try on those first Surface models, and holy cow. It almost made me misty for Kabini, lol. Which is impossible.
 

SPBHM

Diamond Member
Sep 12, 2012
5,065
418
126
This sounds like the type of thing you would hear in 2006/2007.

I highly doubt the Cell would be faster than a current desktop CPU at anything meaningful for real-world applications.

I guess Guerrilla being a PS4 developer and dealing with a very slow CPU made them think that!?

[Image: Ubisoft cloth-simulation benchmark, PS4 vs PS3]
 

Nothingness

Diamond Member
Jul 3, 2013
3,283
2,341
136
A general-purpose product that can run Windows competently. I remember giving Windows RT a try on those first Surface models, and holy cow. It almost made me misty for Kabini, lol. Which is impossible.
Things have changed since Windows RT. That doesn't mean the new ARM laptops are any better for those who need Windows, but they are in a completely different league, both from a performance point of view and from a software point of view (with i386 emulation).

I'd be happy with a Linux machine and a CPU at the performance level of the one found in the Surface Pro X. I can dream :)

But let's get back on topic!

I did some (light) coding on the Cell and it was fun because it was very different. Comparing it to a modern CPU is ridiculous, for sure. But I wouldn't bet that a CPU-only match against Jaguar would be a win for the AMD CPU. As others pointed out (and as the article makes clear), the main difference is in ease of development, and in consoles the GPU is what makes the largest difference.
 

Arkaign

Lifer
Oct 27, 2006
20,736
1,379
126
Ah man, I literally just watched the LTT video on the SPX, and man, even clocked at a ludicrous 3 GHz, with 16 GB of DDR4 and a 256 GB state-of-the-art PCIe SSD, it's... kind of terrible? The native ARM-compiled browser is choppy, and emulated apps run like garbage, of course, but it at least gets ~10 hrs of battery life and is a bit thinner. It still gets wiped by the base Intel Surface models. Hell, it's probably worse than the OG Surface Pro; I used one of those recently with a fresh install of 1903 (before 1909 came out), and it flies for Office/Windows/web.

I do wonder if some of the SPX's issues come down to the super-high-res display. It seems like that could needlessly waste performance.

Link for those curious: Surface Pro X: $1499 Molasses... but THIN molasses
 

Panino Manino

Golden Member
Jan 28, 2017
1,046
1,278
136
I understood them to be talking about VMX/AVX floating point, and in that sense the Cell is still good to this day. However, that makes the Cell a good GPU, not a good CPU.
It doesn't matter much that Jaguar was weaker (and notice that despite being "much weaker" it came in just a bit behind the Cell in that test), because work that used to be done on the Cell was offloaded to the GPU, that "GPU compute" thing that was the subject of so much talk a few years ago.

And not just graphics.
The old consoles had to process audio on the CPU, but the current consoles have dedicated blocks for that work, so it doesn't matter if the CPU is weaker in theory.