How the PlayStation 4 is better than a PC


galego

Golden Member
Apr 10, 2013
1,091
0
0
PC isn't a PS3; they work differently. Try to understand that the PS3 isn't the same as the PS4, and that the PS4 is far closer to a PC than it is to the PS3... No backward compatibility, hello?

Backward compatibility between the PS3 and PS4 was not the point; compatibility was not even mentioned.
 

notty22

Diamond Member
Jan 1, 2010
3,375
0
0
Too bad for the PS3 guys with a big game collection. Screwed.

I only considered buying a PS3 a few times, because of my interest in Gran Turismo. It was just announced that the next version of GT, 6, is coming to the PS3. I have to guess that's because of the massive time spent developing each version.
Ideally, I thought they would have had a GT ready for the PS4 launch, but I assume they are saving a version for a 1-2 punch after the launch of the system, in case it pulls a Wii U.
Gran Turismo 6 announced for PS3 this autumn

Gran Turismo 6 coming this holiday season for PlayStation 3
Gran Turismo 6 officially announced, available this holiday
 

galego

Golden Member
Apr 10, 2013
1,091
0
0
Which makes a lot of sense. Assuming Carmack was right (I don't see any reason to second-guess him) and consoles just "double everything" due to optimization, you'll get close to the spec you posted if you double the PS4.

Basically, a $500 console will perform like a $1000 PC.
It's a fair trade considering the sacrifices you have to make: No Steam + No Upgrades + Zero Flexibility.

I do not see why anyone should doubt (or be troubled by) this fact.

Saying "years ahead" (in performance) is PR bull**** and we all know it.
Maybe HUMA is the future (in design) but it's not "years ahead" in performance.

Carmack isn't the only one giving a 2x factor. Huddy gives 2x as well, and a Beyond3D post introduced earlier in this thread by another poster also mentioned something like 2x-3x hardware to compensate for the overhead.

Therefore I took the 2x, and effectively

1.84 TFLOPS x 2 ~ GTX-680

which makes sense, as you note.

About why they chose 16 GB of RAM, I think it must be related to texture compression. Unreal Engine uses 4:1 and 8:1 compression. The GTX-680 has 2 GB and the Elemental demo requires 2 GB. Therefore, if they are going to store the uncompressed textures in RAM, they would need something between 8 GB and 16 GB. Add the rest of the game and the OS, and this would explain why their PC had 16 GB of RAM installed.
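To put rough numbers on that (just a back-of-the-envelope Python sketch; the 2 GB figure and the 4:1 / 8:1 ratios are the ones mentioned above, nothing else is assumed):

# Rough sketch: RAM needed to hold the textures uncompressed, assuming the
# Elemental demo fills ~2 GB of VRAM compressed and Unreal Engine's quoted
# 4:1 and 8:1 compression ratios.
compressed_gb = 2.0
for ratio in (4, 8):
    uncompressed_gb = compressed_gb * ratio
    print(f"{ratio}:1 compression -> ~{uncompressed_gb:.0f} GB uncompressed")
# ~8 GB to ~16 GB, which (plus the rest of the game and the OS) lines up
# with the 16 GB installed in Epic's comparison PC.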

About why they chose an i7, this is the part where I lack info. I suppose it must be related to the i5 lacking Hyper-Threading, but I cannot say more right now. I will research it. Does someone here know the GFLOPS of the i5-3570K?

PS: That quote you love so much for the x10 factor, "the PC can actually show you only a tenth of the performance if you need a separate batch for each draw call"... Did you notice it's only "IF you need a separate batch"?

Both batching of draw calls and the overhead reduction it introduces were already discussed by me in this thread.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Remember that texture compression (or any compression) is a trade off between size and computation.

You're sacrificing computational time to compress/decompress textures to save space in memory.
For large-scale compression, like Megatexture, yes. For the typical DXT formats, though, that doesn't apply. By the time nV and AMD had DX9 GPUs, texture-compression-related performance problems had been taken care of. It's entirely a size vs. quality trade-off today. If the quality difference is visible in the end result, use that texture uncompressed. Usually it's not, by the time all is said and done.
Of course it is standardized. And there are many methods offering different effects.
We had dedicated hardware doing texture compression - the pixel shader. Now everything uses a unified architecture, where every shader can be whatever is needed at that moment. If heavy texture decompression is required, a huge chunk of the core may be assigned to it.
Bandwidth and capacity are closely tied together. To efficiently use high capacity, a lot of bandwidth is needed.
Pixel shaders need to be able to spend their time doing other work. IIRC, the only notable use of shaders for this has been Doom 3, performing post-processing that swapped R with A.

By dedicated hardware, I mean die space in the part of the chip where textures come in from memory, are buffered, and are translated for use by the rest of the hardware, for the purpose of decompressing texture blocks, to make it a free bandwidth saver for the developer and user with common formats. Compressed blocks are small, with the most common size being 4x4 pixels, which is going to be smaller than a single DDR burst (512 bits for a 64-bit channel). So it's easy enough to read in the whole block, and then serve out all the needed pixels from that block, to the shaders that need them, or simply to be applied to a surface. DXT1-5, for instance, are going to be free on almost any GPU from the last 10 years.
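To make the block arithmetic concrete, here is a small Python sketch; the 8-byte and 16-byte block sizes are the standard DXT1 and DXT3/5 figures (not something stated in this thread), and the burst size is the 512-bit number above:

# DXT block-compression arithmetic: bytes per 4x4 block, effective bits per
# pixel, ratio vs. uncompressed RGBA8, and whether a block fits in one burst.
PIXELS_PER_BLOCK = 4 * 4
UNCOMPRESSED_BYTES = PIXELS_PER_BLOCK * 4   # RGBA8 = 4 bytes per pixel
DDR_BURST_BYTES = 512 // 8                  # 512-bit burst on a 64-bit channel

for fmt, block_bytes in (("DXT1", 8), ("DXT3/DXT5", 16)):
    bpp = block_bytes * 8 / PIXELS_PER_BLOCK
    ratio = UNCOMPRESSED_BYTES / block_bytes
    fits = "fits in" if block_bytes <= DDR_BURST_BYTES else "exceeds"
    print(f"{fmt}: {block_bytes} B/block, {bpp:.0f} bpp, {ratio:.0f}:1 vs RGBA8, {fits} a {DDR_BURST_BYTES} B burst")
# DXT1 is 8:1 and DXT3/5 are 4:1 against RGBA8, and both block sizes are far
# smaller than a single 64 B burst, so the whole block arrives in one read.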
 

BallaTheFeared

Diamond Member
Nov 15, 2010
8,115
0
71
Backward compatibility between the PS3 and PS4 was not the point; compatibility was not even mentioned.

Point is, they're two different systems; the PS4 is so far detached from the PS3 that they can't make it work.

Didn't take you long to turn draw calls into GPU performance again :|
 

PrincessFrosty

Platinum Member
Feb 13, 2008
2,300
68
91
www.frostyhacks.blogspot.com
Most of the time the overhead of texture compression is dealt with via dedicated parts of the pipeline along with heavy optimization. I wouldn't call it "free" as some others might, but it certainly benchmarks that way.

This idea of double the performance due to overheads is a nice one, but again we need to consider the scope of it: will games actually leverage this out of the gate? Almost certainly not, and here is why I think that.

1) It will be a rush to release as fast as possible; being a launch title or out early will net you more sales while titles are thin on the ground.
2) Comparisons will mostly be made to the old generation; people's expectations in the first few years are easy to hit, because making stuff better than last gen is easy without the need to optimize.
3) Optimizations take time and effort to code; they also require more specialist knowledge and some time spent with the platform getting to understand it. These are NOT trivial things to implement.

It all points to optimisations coming late in the console's lifecycle, when the hardware is both technically better understood and, because of tighter competition, the effort is much easier to justify. I think we just need to keep this in perspective here. It's also per title: AAA titles may optimize, others may not. It's not free; it takes development time and cost.

You only need to look at launch titles of something like the PS3 vs. titles made 4-5 years in; there are some huge visual and performance differences. Some PS3 games look really nice and perform insanely well (Killzone 3, Rage) and other games (too many to mention) look worse and perform like total shit.
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
By dedicated hardware, I mean die space in the part of the chip where textures come in from memory, are buffered, and are translated for use by the rest of the hardware, for the purpose of decompressing texture blocks, to make it a free bandwidth saver for the developer and user with common formats. Compressed blocks are small, with the most common size being 4x4 pixels, which is going to be smaller than a single DDR burst (512 bits for a 64-bit channel). So it's easy enough to read in the whole block, and then serve out all the needed pixels from that block, to the shaders that need them, or simply to be applied to a surface. DXT1-5, for instance, are going to be free on almost any GPU from the last 10 years.

Dunno what part of the GPU you are talking about.
I was only able to find this:
http://staff.elka.pw.edu.pl/~jstacher/publikacje/GPU-Based_Hierarchical_Texture_Decompression.pdf
3.3. HiTCg Decompression.

The rectangular texture sampler object, texture coordinates and scale/offset are the input data to our algorithm (Figure 4). The output constitutes the linearly interpolated texel values. The decompression algorithm is executed in the pixel shader units.

DXT (3Dc by ATI) seems to be using shaders as well:
http://www.hardwaresecrets.com/datasheets/3Dc_White_Paper.pdf
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Most of the time the overhead of texture compression is dealt with via dedicated parts of the pipeline along with heavy optimization. I wouldn't call it "free" as some others might, but it certainly benchmarks that way.
If it benchmarks that way, how is it not free? IIRC, GeForces were faster with compression than without by the NV20 (GF 3), and they definitely were by the NV40 (GF 6xxx; benchmarks below).

Dunno what part of the GPU you are talking about.
I was only able to find this:
http://staff.elka.pw.edu.pl/~jstacher/publikacje/GPU-Based_Hierarchical_Texture_Decompression.pdf

DXT (3Dc by ATI) seems to be using shaders as well:
http://www.hardwaresecrets.com/datasheets/3Dc_White_Paper.pdf
Until recently, GPUs have generally decompressed textures as part of reading them in from memory. Fermi and GCN are fancier, but aside from keeping compressed textures compressed until the pixels are actually read, the physical implementations are not disclosed.

Time machine, go!
http://ixbtlabs.com/articles2/gffx/nv40-part1.html
Note how the compressed formats are the fastest.

You don't see much about it today because it's just there (required by DX10 and up), and because it became an on-by-default thing with the VRAM and bandwidth limitations of the last consoles. The last hardware I recall making a point about having it in hardware was Nintendo's GameCube GPU, Flipper.

3Dc is newer. It was developed using shaders, and without benchmarking for it we won't get to know whether it's eating shaders on GPU uarch X, or at what point that might stop being the case. S3TC, OTOH, has been in GPUs since DX6/7. It's part of the landscape, like hardware TnL.
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
Time machine, go!
http://ixbtlabs.com/articles2/gffx/nv40-part1.html
Note how the compressed formats are the fastest.

Thanks! That is some in-depth info right there. I will look into it in detail later.
Comparing performance with and without compression is a hard nut to crack. It all depends on hardware specifics and how the hardware is balanced. I believe that if you take an Nvidia 8800 GTS 640MB with 320-bit GDDR3 (G80 core), it will do better without compression (or with lower compression) due to its relatively high memory bandwidth and capacity compared to its slow core (96 shaders at 500 MHz). On the other hand, the GeForce GT 640 (GDDR3) has a roughly 2x faster core but less than half the memory bandwidth; it will work wonders with heavily compressed textures, saving as much bandwidth as possible.
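A relative sketch of that balance argument in Python, using only the ratios stated above (core roughly 2x faster, bandwidth a bit under half); the numbers are normalized placeholders, not measured specs:

# Bandwidth available per unit of shader throughput, 8800 GTS normalized to 1.
gts_8800 = {"compute": 1.0, "bandwidth": 1.0}
gt_640   = {"compute": 2.0, "bandwidth": 0.45}   # per the rough ratios above

for name, gpu in (("8800 GTS", gts_8800), ("GT 640", gt_640)):
    print(f"{name}: {gpu['bandwidth'] / gpu['compute']:.2f} bandwidth per unit of compute")
# The GT 640 ends up with roughly 4-5x less bandwidth per unit of compute,
# which is why aggressive texture compression should help it far more.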
 

galego

Golden Member
Apr 10, 2013
1,091
0
0
Returning to why Epic selected an i7 in their PC-PS4 comparisons.

After some searching I found several people on forums who claim that the CPU on the PS4 peaks at 102 GFLOPS.

A Jaguar core's FPUs can do 4 MULs + 4 ADDs per cycle, or 8 FLOPS/cycle. That gives us 8 (FLOPS) × 1600 (MHz) × 8 (cores) = 102 GFLOPS.

This makes a lot of sense, because 1840 + 102 then gives about the 2 TFLOPS total that is quoted in some press releases.

The i5-3570K peaks at 108.8 GFLOPS but lacks HT. The i7-3770K peaks at 112 GFLOPS but has HT. The PS4 is a real 8-core design.
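The peak-rate arithmetic being quoted, written out as a quick Python sketch (theoretical single-precision peaks only, not sustained throughput):

# Theoretical peak, single precision: FLOPS/cycle x clock (GHz) x cores.
def peak_gflops(flops_per_cycle, clock_ghz, cores):
    return flops_per_cycle * clock_ghz * cores

jaguar_cpu = peak_gflops(8, 1.6, 8)   # 4 MULs + 4 ADDs per cycle per core
print(f"PS4 CPU (8x Jaguar @ 1.6 GHz): ~{jaguar_cpu:.1f} GFLOPS")  # ~102.4

ps4_gpu = 1840                        # the 1.84 TFLOPS GPU figure from above
print(f"CPU + GPU: ~{jaguar_cpu + ps4_gpu:.0f} GFLOPS, i.e. about 2 TFLOPS")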

What I don't know exactly is how much CPU overhead one should assume from using a light OS.
 

MrK6

Diamond Member
Aug 9, 2004
4,458
4
81
Returning to why Epic selected an i7 in their PC-PS4 comparisons.

After some searching I found several people on forums who claim that the CPU on the PS4 peaks at 102 GFLOPS.

A Jaguar core's FPUs can do 4 MULs + 4 ADDs per cycle, or 8 FLOPS/cycle. That gives us 8 (FLOPS) × 1600 (MHz) × 8 (cores) = 102 GFLOPS.

This makes a lot of sense, because 1840 + 102 then gives about the 2 TFLOPS total that is quoted in some press releases.

The i5-3570K peaks at 108.8 GFLOPS but lacks HT. The i7-3770K peaks at 112 GFLOPS but has HT. The PS4 is a real 8-core design.

What I don't know exactly is how much CPU overhead one should assume from using a light OS.
From an engineering standpoint, it looks like Sony learned its lesson(s) from the PS3. MHz = power cost, which adds a lot of overhead and monetary cost in the hardware design (not to mention failure points, etc.). By using more cores at a lower frequency, they can save a lot of power and make up performance with proper software optimization.
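As a toy illustration of that trade-off (a sketch only; dynamic power scales roughly with C·V²·f, and the voltages below are made-up values chosen to show the trend, not PS4 figures):

# Toy model: dynamic power ~ cores x voltage^2 x frequency (relative units).
def relative_power(cores, freq_ghz, volts):
    return cores * (volts ** 2) * freq_ghz

few_fast  = relative_power(cores=4, freq_ghz=3.2, volts=1.20)  # hypothetical
many_slow = relative_power(cores=8, freq_ghz=1.6, volts=0.90)  # hypothetical

print(f"4 fast cores: {few_fast:.1f}")   # ~18.4
print(f"8 slow cores: {many_slow:.1f}")  # ~10.4
# Same aggregate clock (4 x 3.2 = 8 x 1.6), but the wide-and-slow option burns
# noticeably less power because voltage enters squared.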
 

Rakehellion

Lifer
Jan 15, 2013
12,181
35
91
Remember that texture compression (or any compression) is a trade off between size and computation.

You're sacrificing computational time to compress/decompress textures to save space in memory.

Not true. Texture decompression is free on any modern GPU, as it's built right into the hardware, not software. In fact, using uncompressed textures might actually be slower, since they use more memory bandwidth.

That being said, using compressed textures in a modern game engine is pretty much mandatory if you want to stay competitive, since the storage gains approach 10:1 and memory space is limited.
 
Aug 11, 2008
10,451
642
126
I really don't know anything about programming, but these GFLOPS numbers don't really seem that significant to me in the real world for a CPU. They would only be relevant for 8 cores vs. 4 cores if the workload somehow scaled perfectly to the 8 cores and none of the processes were waiting for data from another process. I don't see how, in the real world, 8 cores at 1.6 GHz will come even close to an 8350, i5, or i7.

I thought this metric was more often used as a comparison for GPUs, where the workload is much more parallel than the typical workload for a CPU.
 

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91
Returning to why Epic selected an i7 in their PC-PS4 comparisons.

After some searching I found several people on forums who claim that the CPU on the PS4 peaks at 102 GFLOPS.

A Jaguar core's FPUs can do 4 MULs + 4 ADDs per cycle, or 8 FLOPS/cycle. That gives us 8 (FLOPS) × 1600 (MHz) × 8 (cores) = 102 GFLOPS.

This makes a lot of sense, because 1840 + 102 then gives about the 2 TFLOPS total that is quoted in some press releases.

The i5-3570K peaks at 108.8 GFLOPS but lacks HT. The i7-3770K peaks at 112 GFLOPS but has HT. The PS4 is a real 8-core design.

What I don't know exactly is how much CPU overhead one should assume from using a light OS.

And the extractable performance is what?

Because a four-core 3.2 GHz Jaguar (equivalent to 8 cores at 1.6 GHz) does not match a 3570K at 3.7 GHz. It's weaker both IPC-wise and clock-speed-wise.
 

Rakehellion

Lifer
Jan 15, 2013
12,181
35
91
I really don't know anything about programming, but these GFLOPS numbers don't really seem that significant to me in the real world for a CPU. They would only be relevant for 8 cores vs. 4 cores if the workload somehow scaled perfectly to the 8 cores and none of the processes were waiting for data from another process. I don't see how, in the real world, 8 cores at 1.6 GHz will come even close to an 8350, i5, or i7.

I thought this metric was more often used as a comparison for GPUs, where the workload is much more parallel than the typical workload for a CPU.

It probably isn't as fast as an i7, but 8 cores really isn't that much for a CPU, especially if your software is guaranteed to be multithreaded as in the case of the PS4.
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
And the extractable performance is what?

Because a four-core 3.2 GHz Jaguar (equivalent to 8 cores at 1.6 GHz) does not match a 3570K at 3.7 GHz. It's weaker both IPC-wise and clock-speed-wise.

Except there is no such thing as a 3.2 GHz Jaguar. The 2 GHz ones that have been tested are, in some ways, faster than a same-clocked SNB, though I cannot vouch for the accuracy of those results. Nobody has really been able to test a Jaguar properly, so nobody knows for sure how it will perform.

http://gamrconnect.vgchartz.com/thread.php?id=159069
 

Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
Returning to why Epic selected an i7 in their PC-PS4 comparisons.

After some searching I found several people on forums who claim that the CPU on the PS4 peaks at 102 GFLOPS.

A Jaguar core's FPUs can do 4 MULs + 4 ADDs per cycle, or 8 FLOPS/cycle. That gives us 8 (FLOPS) × 1600 (MHz) × 8 (cores) = 102 GFLOPS.
This should only be for single-precision operations. Ivy Bridge can do 8 single-precision MULs and 8 ADDs per cycle per core.
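To put numbers on that correction (a sketch; the 16 SP FLOPS/cycle assumes one 8-wide AVX multiply plus one 8-wide AVX add per cycle, and the i5-3570K's 3.4 GHz base clock):

# Peak single-precision throughput: FLOPS/cycle x clock (GHz) x cores.
def peak_gflops(flops_per_cycle, clock_ghz, cores):
    return flops_per_cycle * clock_ghz * cores

jaguar_ps4 = peak_gflops(8, 1.6, 8)    # 128-bit units: 4 MULs + 4 ADDs/cycle
ivy_3570k  = peak_gflops(16, 3.4, 4)   # AVX: 8 MULs + 8 ADDs/cycle per core

print(f"PS4 Jaguar CPU: ~{jaguar_ps4:.0f} GFLOPS SP")  # ~102
print(f"i5-3570K      : ~{ivy_3570k:.0f} GFLOPS SP")   # ~218
# The ~109 GFLOPS figure quoted earlier counts only one 8-wide op per cycle;
# counting both pipes roughly doubles the Ivy Bridge peak.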

Really, the only reason Jaguar is on the PS4 is it's the best that AMD can do.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,476
136
This should only be for single-precision operations. Ivy Bridge can do 8 single-precision MULs and 8 ADDs per cycle per core.

Really, the only reason Jaguar is on the PS4 is it's the best that AMD can do.

AMD and Sony worked closely in designing the PS4 SoC. Jaguar was chosen for its perf/sq mm, perf/watt, and excellent multithreaded performance. A single Jaguar core has an area of 3.1 sq mm on TSMC's 28nm process, so 8 Jaguar cores measure just 25 sq mm. You can see various core sizes below: Atom on 32nm measures 5.6 sq mm, Haswell on 22nm measures 14.2 sq mm, Llano on GF 32nm measures 9.69 sq mm, Sandy Bridge on Intel 32nm measures 18.4 sq mm, and Ivy Bridge measures 10 sq mm on Intel 22nm.

http://semiaccurate.com/forums/showthread.php?t=6857

http://www.amdzone.com/phpbb3/viewtopic.php?f=532&t=139016&p=215686#p215677

A quad-core Temash (based on Jaguar) matches a Core i3 Sandy Bridge (dual core with HT) at the same clocks for multithreaded performance. Power consumption on Temash is actually lower.

http://translate.google.com/transla...6597-amd-temash-specifikationer-och-prestanda

8 Jaguar cores will match a quad-core Sandy Bridge with HT at the same clocks. And even on Intel's 22nm process, the 4 Ivy Bridge cores measure 40 sq mm, compared to 25 sq mm for Jaguar.

Also, Sony's major concern while designing the PS4 was to reduce losses and make the console more affordable. The PS3 launched at USD 499-599, which was much higher than the traditional launch price of USD 299. One of the major reasons was the Blu-ray drive and the hardware cost of two big separate chips: the Reality Synthesizer (RSX) by Nvidia and the Cell Broadband Engine by Sony. Sony also lost money heavily even at those prices.

http://www.extremetech.com/gaming/155521-playtation-4-sony-announces-the-ps4-wont-be-sold-at-loss

With the PS4, Sony needs a single AMD SoC which integrates 8 Jaguar CPU cores, an 1152-core GCN GPU, dedicated video decode and encode, an integrated 256-bit GDDR5 memory controller, and the northbridge (maybe even the southbridge, though that's not known for sure). The PS4 can be expected to launch at around USD 299-349. We will know more at E3, but the launch price of the PS4 will be much lower than the PS3's.

The 1280-SP Pitcairn chip measures 212 sq mm, and the PS4 has 1152 GCN cores plus 8 Jaguar cores in 2 quad-core blocks, each with 2 MB of shared L2 cache (4 MB of L2 cache measures approx. 12 sq mm on TSMC 28nm, since 512 KB of cache on TSMC 40nm measures 3 sq mm, so 1 MB on 28nm will be about the same 3 sq mm).

http://www.chip-architect.com/news/2010_09_04_AMDs_Bobcat_versus_Intels_Atom.html

Put those numbers together and the entire PS4 APU SoC would be below 300 sq mm. On a mature 28nm process that should be easy to manufacture with good yields, and quite cheap.
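A back-of-the-envelope version of that area estimate, using only the figures in this post (scaling Pitcairn's area linearly with shader count, and the allowance for the remaining uncore, are rough assumptions):

# Rough die-area estimate for the PS4 SoC from the numbers quoted above.
pitcairn_mm2     = 212                            # 1280-SP Pitcairn
gpu_mm2          = pitcairn_mm2 * 1152 / 1280     # naive linear scaling
jaguar_cores_mm2 = 8 * 3.1                        # 8 Jaguar cores, ~3.1 sq mm each
l2_cache_mm2     = 12                             # ~4 MB of L2 at ~3 sq mm per MB

subtotal = gpu_mm2 + jaguar_cores_mm2 + l2_cache_mm2
print(f"GPU ~{gpu_mm2:.0f} + CPU ~{jaguar_cores_mm2:.0f} + L2 ~{l2_cache_mm2} = ~{subtotal:.0f} sq mm")
# ~228 sq mm before memory controllers and the rest of the uncore, which is
# how you end up comfortably below 300 sq mm for the full SoC on 28nm.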

If Sony had chosen Intel and Nvidia, they would have had to pay for the design of 2 chips and bear the extra manufacturing cost of 2 chips. A 2-chip design would also draw more power and take more area.

But the most important and often overlooked factor is that the PS4 will feature full HSA integration. The programmability advantages of a single APU with a unified address space and fully coherent memory between CPU and GPU are underestimated. This will allow programmers to do fine-grained work distribution to the GPU's compute units. The CPU and GPU have much faster communication and lower latency, which allows things like CPU physics affecting GPU physics and vice versa.

http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?page=1

So a simplistic statement saying Jaguar is in the PS4 because it's the best AMD can do says more about your lack of understanding. Jaguar was chosen based on specific requirements relating to perf/sq mm and perf/watt; Piledriver or Llano are not going to match the perf/sq mm of Jaguar. Jaguar allows Sony to design a chip which is easy to manufacture and quite cheap, which lets Sony sell the PS4 at lower prices while still reducing losses compared to the PS3.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
After looking at the Temash review of Jaguar-based cores, I would say my faith in any kind of performance from the PS4 is severely lowered. And I didn't have much hope in it to begin with.

The Temash review said the following:
The low CPU performance, in more detail the low performance per core, slows down the graphics card in many games. We want to illustrate this with an example:

And that's a 1 GHz quad-core Jaguar.

No wonder we already see a 30 FPS-capped demo for the PS4.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,476
136
After looking at the Temash review of Jaguar-based cores, I would say my faith in any kind of performance from the PS4 is severely lowered. And I didn't have much hope in it to begin with.

The Temash review said the following:

And that's a 1 GHz quad-core Jaguar.

No wonder we already see a 30 FPS-capped demo for the PS4.

Temash is a tablet SoC with 1 GHz Jaguar cores and an 8W TDP. It has a single-channel DDR3-1600 memory controller with 12.8 GB/s: low bandwidth and low core clocks.

The PS4 SoC will have a 1.8-2.0 GHz core clock. The chip will have a 256-bit GDDR5 memory controller at 176 GB/s: massive bandwidth for the CPU. The PS4 SoC should have a TDP of around 100W, and the entire PS4 will draw around 170W-200W. Also, the software environment is worlds apart: a desktop OS and the industry-standard DX11 API are no match for a highly optimized console OS with low API overheads and close-to-metal programmability.
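A quick sanity check on those bandwidth figures (a sketch; the 5.5 GT/s effective GDDR5 rate is the one implied by the quoted 256-bit bus and 176 GB/s, not something stated in the post):

# Memory bandwidth = bus width in bytes x effective data rate in GT/s.
def bandwidth_gbs(bus_bits, data_rate_gts):
    return (bus_bits / 8) * data_rate_gts

ps4    = bandwidth_gbs(256, 5.5)   # 256-bit GDDR5 -> 176 GB/s
temash = bandwidth_gbs(64, 1.6)    # single-channel DDR3-1600 -> 12.8 GB/s

print(f"PS4 SoC: {ps4:.1f} GB/s")
print(f"Temash : {temash:.1f} GB/s")   # roughly a 14x gap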

Also, the top game developers are aggressively pursuing multithreading in their game engines; Frostbite 3 and CryEngine 3 have already shown the direction they are taking. With both next-gen consoles having 8 cores, you can bet the consoles' multithreaded CPU horsepower will be exploited to the fullest.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
Temash is a tablet SoC with 1 GHz Jaguar cores and an 8W TDP. It has a single-channel DDR3-1600 memory controller with 12.8 GB/s: low bandwidth and low core clocks.

The PS4 SoC will have a 1.8-2.0 GHz core clock. The chip will have a 256-bit GDDR5 memory controller at 176 GB/s: massive bandwidth for the CPU. The PS4 SoC should have a TDP of around 100W, and the entire PS4 will draw around 170W-200W. Also, the software environment is worlds apart: a desktop OS and the industry-standard DX11 API are no match for a highly optimized console OS with low API overheads and close-to-metal programmability.

The PS4 will have a 1.6 GHz core clock, and that memory will have no effect on it; it's just there for the GPU part. Lastly, there is not as much OS difference as you wish to believe. It might end up at 5% or less in the real world.

It's getting very clear that the next-gen consoles are discount solutions for yesterday rather than for tomorrow.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
After looking at the Temash review of Jaguar-based cores, I would say my faith in any kind of performance from the PS4 is severely lowered. And I didn't have much hope in it to begin with.

The Temash review said the following:

And that's a 1 GHz quad-core Jaguar.

No wonder we already see a 30 FPS-capped demo for the PS4.

You do realize that the games will be designed specifically to run on an APU with those 8 Jaguar cores? Trying to compare performance outside of that parameter is meaningless. It's like using tires that are made from racing compounds on the street.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
You do realize that the games will be designed specifically to run on an APU with those 8 Jaguar cores? Trying to compare performance outside of that parameter is meaningless. It's like using tires that are made from racing compounds on the street.

Games will not use 8 cores; at best they will use 7, since 1 core is dedicated to the OS, just as 1 GB of memory is reserved. Nor will the PS4 magically render Amdahl's law irrelevant in terms of core scaling.
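For what Amdahl's law actually implies here, a small sketch (the parallel fractions are illustrative guesses, not measurements of any real engine):

# Amdahl's law: speedup on n cores for a workload with parallel fraction p.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

cores = 7   # one core reserved for the OS, per the post above
for p in (0.50, 0.80, 0.95):
    print(f"parallel fraction {p:.0%}: {amdahl_speedup(p, cores):.2f}x over one core")
# 50% -> 1.75x, 80% -> 3.18x, 95% -> 5.38x: even a well-threaded engine gets
# nowhere near a clean 7x out of seven 1.6 GHz cores.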

You are trying to tell me there is some fairy dust. I tell you there is reality.

PS4 demos are already capped at 30 FPS due to limitations.
 