# Cell CPU vs. modern day CPU

#### dragonlike55

##### Junior Member
I know this comparison isn't easy to make because of the different architectures involved, but I'm wondering which CPU today can rival or even surpass the Cell in vector and floating-point math. AFAIK any modern CPU can blow the Cell out of the water overall, but I'm interested in the particular area I mentioned. I'd really like to know how far we've come in this regard with modern CPUs, namely gaming/workstation CPUs. The whole idea of a 10+ year old CPU competing with modern ones seems a bit far-fetched, I admit. It would also be helpful to hear an answer from someone who works in the field, but any answer is most welcome. I apologize if I haven't been perfectly clear; I have little experience with this.


#### Schmide

##### Diamond Member
I guess I'll do the math. Cell was basically a CPU with one dual-threaded PPE core plus 8 SPEs (Synergistic Processing Elements).

Being liberal with my factors:

Each hardware thread was ~3 GFlops (equal to a Core 2), so 6 GFlops of CPU.
Each SPE was ~25 GFlops (comparable to a very low-end video card), so ~200 GFlops.

Using AMD's Graphics Core Next (GCN, 2012) as a starting point for modern video cards. (The nVidia equivalent would be Fermi, the GeForce 400 series.)

A GCN compute unit has 64 stream processors and produces approximately 128 GFlops per GHz.

A low-end first-generation GCN GPU (Oland) has 5 compute units, for 5 × 128 = 640 GFlops.

In terms of modern flagships:

An AMD 6900 XT = 24 TFlops (24,000 GFlops)

An nVidia 3090 = 35 TFlops

Consoles:

PS5 = 10 TFlops
Xbox Series X = 12 TFlops

So console to console:

New consoles are ~10,000 / 200 = 50 times faster.
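These back-of-the-envelope figures all come from the same peak-throughput accounting: units × flops per cycle × clock. A minimal Python sketch using the rough numbers from this thread (the SPE breakdown of 4-wide FMA at 3.2 GHz is the commonly cited accounting, stated here as an assumption, not an official spec):

```python
def peak_gflops(units, flops_per_cycle, clock_ghz):
    """Theoretical peak GFlops for a group of identical execution units."""
    return units * flops_per_cycle * clock_ghz

# Cell SPE: 4-wide single-precision FMA = 8 flops/cycle, 3.2 GHz, 8 SPEs
spe_total = peak_gflops(units=8, flops_per_cycle=8, clock_ghz=3.2)

# The thread's round figures
cell_spe = 200   # GFlops, all 8 SPEs
ps5 = 10_000     # GFlops

print(f"8 SPEs: ~{spe_total:.1f} GFlops theoretical")   # ~204.8
print(f"PS5 vs Cell SPE array: ~{ps5 / cell_spe:.0f}x")  # ~50x
```

The ~204.8 GFlops theoretical peak is where the thread's rounded "200 GFlops" comes from.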

#### fkoehler

##### Member
I understand what/why you are asking this, as it's puzzled me before as well.
However, do you even have a baseline on the Cell and any modern CPU to question whether there is any reason to suppose it is NOT already in the dustbin of history?

I'm wondering why anyone would come to think the Cell would be faster at anything compared to any of today's normal processors.

> I know this comparison isn't easy to make because of the different architectures involved, but I'm wondering which CPU today can rival or even surpass the Cell in vector and floating-point math. AFAIK any modern CPU can blow the Cell out of the water overall, but I'm interested in the particular area I mentioned. I'd really like to know how far we've come in this regard with modern CPUs, namely gaming/workstation CPUs. The whole idea of a 10+ year old CPU competing with modern ones seems a bit far-fetched, I admit. It would also be helpful to hear an answer from someone who works in the field, but any answer is most welcome. I apologize if I haven't been perfectly clear; I have little experience with this.

#### Thala

##### Golden Member
> PS5 = 10 TFlops
> Xbox Series X = 12 TFlops
>
> So console to console:
>
> New consoles are ~10,000 / 200 = 50 times faster.

Technically the PS3 had a GPU in addition to the Cell engine, so the console-to-console comparison is not really correct. I vaguely remember Sony talking about 1 TFlop for the PS3's overall compute performance.
I also assume that the OP was asking about the Cell engine vs. CPU, not the Cell engine vs. GPU.

#### Schmide

##### Diamond Member
Well, IBM did continue to release iterations, including a blade server that was reported to reach that level. It did not live up to expectations.

> Each hardware thread was ~3 GFlops (equal to a Core 2), so 6 GFlops of CPU.

The PPE cores were rounding errors, but...

A Zen 2 core does ~64 GFlops of single-precision FMA.

So for an Xbox Series X vs. the PS3's Cell:

8 × 64 = 2^(3+6) = 2^9 = 512 GFlops, and 512 / 6 ≈ 85×.

Looks even more lopsided.
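The ~64 GFlops per Zen 2 core figure can be sanity-checked with the same per-cycle accounting: an FMA counts as 2 flops per 32-bit lane, and a 256-bit pipe has 8 fp32 lanes. A sketch (the choice of one FMA pipe at a 4 GHz clock is an assumption that reproduces the quoted number; Zen 2 actually has two 256-bit FMA pipes, so its true peak is higher):

```python
def core_gflops(simd_lanes, fma_pipes, clock_ghz):
    # FMA = multiply + add = 2 flops per lane, per pipe, per cycle
    return simd_lanes * 2 * fma_pipes * clock_ghz

# ~64 GFlops matches e.g. one 256-bit FMA pipe (8 fp32 lanes) at 4 GHz
per_core = core_gflops(simd_lanes=8, fma_pipes=1, clock_ghz=4.0)
print(per_core)          # 64.0
print(8 * per_core / 6)  # ~85.3, the ~85x in the post above
```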


#### dragonlike55

##### Junior Member
> I guess I'll do the math. Cell was basically a CPU with one dual-threaded PPE core plus 8 SPEs (Synergistic Processing Elements).
>
> Being liberal with my factors:
>
> Each hardware thread was ~3 GFlops (equal to a Core 2), so 6 GFlops of CPU.
> Each SPE was ~25 GFlops (comparable to a very low-end video card), so ~200 GFlops.
>
> Using AMD's Graphics Core Next (GCN, 2012) as a starting point for modern video cards. (The nVidia equivalent would be Fermi, the GeForce 400 series.)
>
> A GCN compute unit has 64 stream processors and produces approximately 128 GFlops per GHz.
>
> A low-end first-generation GCN GPU (Oland) has 5 compute units, for 5 × 128 = 640 GFlops.
>
> In terms of modern flagships:
>
> An AMD 6900 XT = 24 TFlops (24,000 GFlops)
>
> An nVidia 3090 = 35 TFlops
>
> Consoles:
>
> PS5 = 10 TFlops
> Xbox Series X = 12 TFlops
>
> So console to console:
>
> New consoles are ~10,000 / 200 = 50 times faster.
May I ask, do you know by any chance whether there were any CPUs at that time (2006-2007) that could achieve 200-300 GFLOPS?


#### dragonlike55

##### Junior Member
> I understand what/why you are asking this, as it's puzzled me before as well.
> However, do you even have a baseline on the Cell and any modern CPU to question whether there is any reason to suppose it is NOT already in the dustbin of history?
>
> I'm wondering why anyone would come to think the Cell would be faster at anything compared to any of today's normal processors.
Exactly. You see, I have some knowledge of computer architecture and I'd like to learn more, but at this point in my life I can't. I would in no way call myself an expert. It was interesting information I saw, and as I cruised through I found all sorts of things. And it LOOKED like the info came from people who know their stuff. Basically they agreed that the Cell could do vector and floating-point operations exceedingly well. There was one post on reddit, IIRC, that even compared it to a modern quad core in this department. Since I have a brain and am no sheep, I had to dig further. And all I found was that Sony has some good PR when it comes to these things, like they did with the PS5 SSD.

#### dragonlike55

##### Junior Member
> Technically the PS3 had a GPU in addition to the Cell engine, so the console-to-console comparison is not really correct. I vaguely remember Sony talking about 1 TFlop for the PS3's overall compute performance.
> I also assume that the OP was asking about the Cell engine vs. CPU, not the Cell engine vs. GPU.

Yes, CPU vs. CPU.

#### Schmide

##### Diamond Member
> May I ask, do you know by any chance whether there were any CPUs at that time (2006-2007) that could achieve 200-300 GFLOPS?

Like I thought.

There were no CPUs, including the Cell, that could reach that level.

The SPE is more like a GPU than a CPU. It is unfair to treat it as a conventional CPU.

In 2007 AMD released their 2000 series cards, which broke away from the fixed pipeline (DX10). They could do near 400 GFlops. nVidia had launched its 8000 series at the end of 2006. The teraflop era didn't come until 2008.

Edit: Technically the Xbox 360's Xenos was released in 2005, had some DX10-class features, and could do 240 GFlops.
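The GPU figures from that era follow the same peak-flops accounting as the CPU numbers earlier in the thread. For example, the Xenos number falls out of 48 shader ALUs, each processing a vec4+scalar (5 components) multiply-add (2 flops per component) at 500 MHz; this ALU breakdown is the commonly cited accounting, stated here as an assumption:

```python
def gpu_peak_gflops(alus, components, flops_per_component, clock_ghz):
    """Peak GFlops for a GPU shader array of identical ALUs."""
    return alus * components * flops_per_component * clock_ghz

# Xbox 360 Xenos: 48 ALUs * 5 components (vec4 + scalar) * 2 (MADD) * 0.5 GHz
print(gpu_peak_gflops(alus=48, components=5, flops_per_component=2,
                      clock_ghz=0.5))  # 240.0
```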


#### Thala

##### Golden Member
> There were no CPUs, including the Cell, that could reach that level.
>
> The SPE is more like a GPU than a CPU. It is unfair to treat it as a conventional CPU.

That view surely is debatable. The SPEs could be treated as SIMD extensions to the control CPU, though conceptually more asynchronous than conventional SIMD extensions, where the PEs run synchronously with the CPU pipeline. This holds in particular when we consider that back in the day, GPGPU was not a thing yet.

#### NTMBK

##### Lifer
> That view surely is debatable. The SPEs could be treated as SIMD extensions to the control CPU, though conceptually more asynchronous than conventional SIMD extensions, where the PEs run synchronously with the CPU pipeline. This holds in particular when we consider that back in the day, GPGPU was not a thing yet.

They're nothing like putting SIMD in the CPU. They have no access to main memory - they can only access a tiny scratchpad, which you need to DMA data in and out of. They run a completely different instruction set.

If you want to optimize your code with SIMD, you can take your slow function and replace the slow math operations with SIMD versions. If you want to optimize your code with SPEs, you need to completely rearchitect your engine.
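The contrast between the two programming models can be sketched as a toy in Python with NumPy: the SIMD path is a drop-in replacement for the slow math, while the SPE-style path forces the host to stage data through a small "local store" explicitly. This is only an illustration of the structural difference, not of the actual Cell toolchain; the 256 KB local-store size matches the SPE's, but the workload is arbitrary:

```python
import numpy as np

data = np.arange(1_000_000, dtype=np.float32)

# SIMD-style optimization: the slow scalar math is replaced in place;
# the surrounding program structure does not change.
result_simd = np.sqrt(data) * 2.0  # vectorized drop-in

# SPE-style offload (toy model): the worker only sees a small local
# store, so the host must explicitly stage chunks in and out ("DMA").
LOCAL_STORE = 256 * 1024 // 4  # 256 KB of float32, the SPE local-store size

out = np.empty_like(data)
for start in range(0, len(data), LOCAL_STORE):
    chunk = data[start:start + LOCAL_STORE].copy()  # "DMA in"
    chunk = np.sqrt(chunk) * 2.0                    # compute on the local copy
    out[start:start + LOCAL_STORE] = chunk          # "DMA out"

assert np.allclose(result_simd, out)
```

Both paths compute the same result, but only the second forces the chunking and data movement into the program's structure, which is the rearchitecting the post describes.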


#### ThatBuzzkiller

##### Golden Member
> They're nothing like putting SIMD in the CPU. They have no access to main memory - they can only access a tiny scratchpad, which you need to DMA data in and out of. They run a completely different instruction set.
>
> If you want to optimize your code with SIMD, you can take your slow function and replace the slow math operations with SIMD versions. If you want to optimize your code with SPEs, you need to completely rearchitect your engine.

Pretty much this. The Cell can be described as having a heterogeneous ISA with the most extreme form of NUMA (non-uniform memory access), and you need a separate compiler to handle the SPEs, which is not at all comparable to other CPU architectures (x86/ARM) with traditional SIMD, since those have a unified compiler.

#### Nothingness

##### Platinum Member
> They're nothing like putting SIMD in the CPU. They have no access to main memory - they can only access a tiny scratchpad, which you need to DMA data in and out of. They run a completely different instruction set.

Yes, but that still doesn't make them look like a GPU. IMHO the SPEs are closer to a CPU than to a GPU. I guess a better analogy is to consider them DSPs with no shared memory (like many SoCs have).

#### Mopetar

##### Diamond Member
It's probably been surpassed even in the niches where it likely excelled, simply on account of it being so old. I do recall a story about it being used for crypto mining before GPUs (and then ASICs) took over, so it probably lived a lot longer than most technology in that regard.

I think the original Cell chips were made on a 90 nm process, which is so many full steps away from modern nodes that if there's something it does better than modern CPUs, it's only because it's such a niche area that almost no one cares, or it's not worth it for 99.999% of users.

It was a cool design for sure and in some ways a lot more forward thinking than anyone might have imagined.


#### Thala

##### Golden Member
> They're nothing like putting SIMD in the CPU. They have no access to main memory - they can only access a tiny scratchpad, which you need to DMA data in and out of. They run a completely different instruction set.
>
> If you want to optimize your code with SIMD, you can take your slow function and replace the slow math operations with SIMD versions. If you want to optimize your code with SPEs, you need to completely rearchitect your engine.

I know - therefore I said it is debatable. And as I said, back in the day Sony surely positioned the Cell engine against CPUs. In addition, if you were to design a DSP engine to offload some computation from the CPU, it would most likely look more like the Cell SPEs than like a GPU.

#### moinmoin

##### Diamond Member
I wish we could get an interview with Dr Lisa Su about Cell considering she was part of the team at IBM that worked with Sony and Toshiba on it.

#### eek2121

##### Platinum Member
Is anyone besides me curious as to how well a 7 nm shrink with more cores and more SPEs would perform? Maybe quadruple everything?
