The PowerPC 750 in modern terms, how would it be?

tipoo

Senior member
Oct 4, 2012
245
7
81
A good briefing from Ars, if you've never seen it, on the 750 architecture.

http://arstechnica.com/features/2004/10/ppc-2/

So, I became interested in this when it was revealed that the cores in the Wii U are extremely similar to PPC 750s, which makes sense given that the GC and Wii also used that processor. The difference in the U is that it's the only multicore 750 implementation that I know of, plus it's clocked higher than any other 4-stage-pipeline design that I know of.

So I'm curious: how do you think a PPC 750 clocked much higher than it was originally designed for, but still lower than most mainstream processors, would fare performance-wise? It fetches four instructions per cycle and dispatches two, which is more than, say, low-power cores like AMD's Jaguar (which is also an interesting comparison since the PS4/Durango use it).

Most people think of a short processor pipeline as an unqualified good thing, but there's a balance to be struck between crazy-deep NetBurst-style pipelines and short 4-stage pipelines like this one. Shorter isn't always better: a shorter pipeline means fewer instructions have to be flushed on a branch mispredict, wasting fewer clocks, but a deeper pipeline allows higher clock speeds, and with the low mispredict rates of modern branch predictors that trade-off often pays off.

So...That's some stuff. Thoughts?
 

wlee15

Senior member
Jan 7, 2009
313
31
91
Remember that an x86 processor can include one memory operand in an instruction, while a PowerPC processor only works register-to-register. So a load-execute-store that might be a single x86 instruction would be three instructions on PPC. Plus Jaguar's out-of-order engine is far superior to that of the PowerPC 750.
 

Centauri

Golden Member
Dec 10, 2002
1,631
56
91
IBM's PPC 750 and the heavily related G4 7400/7410 probably represent one of the best microarchitectures of the past few decades. Very ahead of its time in clock cycle efficiency and power efficiency.

A shame it wasn't given more attention at the turn of the century, because I think it would have borne an excellent lineage.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
According to the Broadway user manual, it's only 4 pipeline stages in the shortest path. It has separate pipelines and as you'd expect the load/store pipe adds one stage and the FPU pipe adds two.

CPU architect Mitch Alsup once noted that he was able to hit 3GHz with a 6-stage pipeline for an x86 processor, and since this was years ago it must have been on a pretty old process by today's standards. He didn't describe the uarch, but I assume it was a pretty simple scalar design; he also didn't say anything about the cache interface. If you look closely at the pipeline stages of deeper designs you'll see several of them are there for extracting more ILP, not merely increasing clock speed.

That doesn't mean a 750 would be able to hit 3GHz though. It's probably already pushing L1 dcache latency to its limits, and that's with only a single added cycle of load-use latency.

I'm pretty confident it'd fare much worse than Jaguar. The only part of the 750 that looks wider is its ability to remove some branches at the front end before decode, which I don't think Jaguar can do. Jaguar's execution core (2 ALU + 2 AGU + 2 128-bit FPU, with a simultaneous load + store) is significantly wider than the 750's, and its OoO capabilities greatly outclass it. The mispredicted-branch penalty is a lot higher, but the branch prediction is a lot better. There's also some degree of hardware auto-prefetching on Jaguar, which should bring the L2 miss rate down a lot.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
See here. That 750 result was from my ibook before it died. I'll eventually root my Wii and test that too, but I'm lazy so it may not happen.
 

NTMBK

Lifer
Nov 14, 2011
10,400
5,635
136
It's always interesting to see these older designs make a comeback on newer processes. Kind of like how the original Pentium core can hit 1GHz on 22nm (in the Xeon Phi).
 

tipoo

Senior member
Oct 4, 2012
245
7
81
From what I've been reading, it seems like Espresso is more in the ballpark of ARM cores than even ULV x86 cores.

It seems to me everything Nintendo chose was for easy hardware Wii compatibility. In Wii mode, it literally just shuts down two cores and some of the cache, lowers the clock to Wii levels, and behaves as normal, so there's a very easy 1:1 Wii U to Wii mapping. The GPU is a bit different, but it has an 8-bit CPU stuck in there to enable easy Wii compatibility through some sort of translation, and it also has some SRAM and higher-density eDRAM (denser than the larger 32 MB pool) to emulate the Wii's GPU buffers. So Nintendo seems very much stuck in legacy mode. I wish they had made a clean break and gone x86 like everyone else to make cross-platform development easier. Even three Jaguar cores would have been better, from everything I see. And Jaguar actually has great SIMD extensions.

There's not much I would miss if they dropped Wii compat, and more future-proof power would have been much appreciated.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,111
136
IBM's PPC 750 and the heavily related G4 7400/7410 probably represent one of the best microarchitectures of the past few decades. Very ahead of its time in clock cycle efficiency and power efficiency.

A shame it wasn't given more attention at the turn of the century, because I think it would have borne an excellent lineage.

It was given a lot of attention in the high-performance embedded world.
 

Blitzvogel

Platinum Member
Oct 17, 2010
2,012
23
81
From what I've been reading, it seems like Espresso is more in the ballpark of ARM cores than even ULV x86 cores.

It seems to me everything Nintendo chose was for easy hardware Wii compatibility. In Wii mode, it literally just shuts down two cores and some of the cache, lowers the clock to Wii levels, and behaves as normal, so there's a very easy 1:1 Wii U to Wii mapping. The GPU is a bit different, but it has an 8-bit CPU stuck in there to enable easy Wii compatibility through some sort of translation, and it also has some SRAM and higher-density eDRAM (denser than the larger 32 MB pool) to emulate the Wii's GPU buffers. So Nintendo seems very much stuck in legacy mode. I wish they had made a clean break and gone x86 like everyone else to make cross-platform development easier. Even three Jaguar cores would have been better, from everything I see. And Jaguar actually has great SIMD extensions.

There's not much I would miss if they dropped Wii compat, and more future-proof power would have been much appreciated.

Their obsession with BC, and I think to some extent architectural familiarity, was pretty foolish. Re-engineering the architecture with wider SIMD would probably have been too expensive on top of adapting it to multicore and installing the larger L2 cache. Thinking of that, I wonder how much it cost to integrate the Wii GPU and 32 MB of eDRAM into the graphics die.

Nintendo perhaps should've injected some cash into AMD to get:

1) Custom 4 core Jaguar APU of some sort with 384+ GCN SPs.

2) Custom 4 core Bobcat APU a la Krishna but with 320+ VLIW5 SPs.

3) Expanded Trinity variant with 384+ VLIW4 SPs.

I'd give all of these 128-bit GDDR5 interfaces, and keep the 2 GB of GDDR5.
 

tipoo

Senior member
Oct 4, 2012
245
7
81
Kind of interesting reading about how the 750 evolved; the GX and the cancelled VX may have been reused in the U. The VX was cancelled when Apple moved to the G4 in the same year. It had more cache than the original, actually more than the U's. The GX was the highest-clocked 750, at 1.1GHz, pretty close to the 1.2 of the U.

http://en.wikipedia.org/wiki/PowerPC_7xx#PowerPC_750GX