Question Raptor Lake - Official Thread


Hulk

Diamond Member
Oct 9, 1999
4,269
2,089
136
Since we already have the first Raptor Lake leak, I'm thinking it should have its own thread.
What do we know so far?
From Anandtech's Intel Process Roadmap articles from July:

Built on Intel 7 with upgraded FinFET
10-15% PPW (performance-per-watt) improvement
Last non-tiled consumer CPU as Meteor Lake will be tiled

I'm guessing this will be a minor update to ADL with just a few microarchitecture changes to the cores. The larger change will be the new process refinement allowing 8+16 at the top of the stack.

Will it work with current Z690 motherboards? If so, that could be a major selling point for moving to ADL now rather than waiting.
 
  • Like
Reactions: vstar

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
When RT is turned on, the CPU has to build and maintain the BVH structure, and that is apparently quite CPU- and bandwidth-intensive. The overclocked 12900K with DDR5-6400 nearly catches up with the stock 13900K and its slower DDR5-5600 RAM, which is very indicative of how big a role bandwidth plays.

But this goes back to what I've been saying about how, when there is a high amount of ILP in a workload, RPL and ADL just pull ahead like nobody's business. Most desktop applications lack high ILP in the code, but in certain types of workloads it is much more common. I'm not presenting this as fact, just a theory.
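
To give a rough sense of why building or rebuilding a BVH chews through memory, here's a toy, purely illustrative sketch of a median-split build over object bounding boxes. It's not how Spider-Man, Nixxes, or any DXR driver actually does it (the names and structure here are made up for illustration), but it shows the pattern: every object's AABB gets read, merged, and re-partitioned at every level of the tree, so the whole working set streams through the cache and memory hierarchy several times per rebuild.

```cpp
// Toy median-split BVH build over per-object AABBs. Purely illustrative:
// the types and layout here are hypothetical, not from any shipping engine.
#include <algorithm>
#include <vector>

struct AABB { float min[3], max[3]; };

struct BVHNode {
    AABB bounds{};
    int  left = -1, right = -1;  // child node indices; -1 means leaf
    int  first = 0, count = 0;   // leaf: range into the object index list
};

static AABB merge(const AABB& a, const AABB& b) {
    AABB r;
    for (int i = 0; i < 3; ++i) {
        r.min[i] = std::min(a.min[i], b.min[i]);
        r.max[i] = std::max(a.max[i], b.max[i]);
    }
    return r;
}

// Builds the subtree for objects index[first .. first+count) and returns its node id.
// Assumes count >= 1. Note how every level touches every AABB in its range: bounds are
// re-merged and the index list is re-partitioned, which is mostly memory traffic.
static int build(const std::vector<AABB>& boxes, std::vector<int>& index,
                 std::vector<BVHNode>& nodes, int first, int count) {
    BVHNode node;
    node.bounds = boxes[index[first]];
    for (int i = 1; i < count; ++i)
        node.bounds = merge(node.bounds, boxes[index[first + i]]);

    const int nodeId = static_cast<int>(nodes.size());
    nodes.push_back(node);

    if (count <= 4) {            // small leaf: just record the object range
        nodes[nodeId].first = first;
        nodes[nodeId].count = count;
        return nodeId;
    }

    // Split on the longest axis at the median object.
    int axis = 0;
    const float ext[3] = { node.bounds.max[0] - node.bounds.min[0],
                           node.bounds.max[1] - node.bounds.min[1],
                           node.bounds.max[2] - node.bounds.min[2] };
    if (ext[1] > ext[axis]) axis = 1;
    if (ext[2] > ext[axis]) axis = 2;

    const int mid = first + count / 2;
    std::nth_element(index.begin() + first, index.begin() + mid,
                     index.begin() + first + count,
                     [&](int a, int b) { return boxes[a].min[axis] < boxes[b].min[axis]; });

    const int l = build(boxes, index, nodes, first, count / 2);
    const int r = build(boxes, index, nodes, mid, count - count / 2);
    nodes[nodeId].left = l;
    nodes[nodeId].right = r;
    return nodeId;
}
```

Multiply that kind of work by a large number of dynamic instances per frame and it's easy to see where the bandwidth sensitivity could come from.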
 
  • Like
Reactions: Henry swagger

Tarkin77

Member
Mar 10, 2018
75
163
106
Games with ray tracing love wider and deeper cores. The 512-entry ROB is too strong.
Or maybe Windows 11 22H2 doesn't like Ryzen


Ryzen does much better with Windows 10 (not tested with Zen 4, but Zen 3 results look fine)

PCGH has the 13900K over 100% faster than 8-core Zen 2, while at gamecpu.ru RPL is not even 40% faster, so it would not surprise me if Zen 4 were actually stronger in this game if both test systems used Windows 10.
 

Harry_Wild

Senior member
Dec 14, 2012
838
152
106
If you're like me, doing general computing tasks like internet surfing, watching streaming content, and email, you want to focus on the single-core performance figures of the CPU. Look for a higher clock speed and fast DDR5 too, plus a fast NVMe drive like PCIe 4.0.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Or maybe Windows 11 22H2 doesn't like Ryzen


Ryzen does much better with Windows 10 (not tested with Zen 4, but Zen 3 results look fine)

PCGH has the 13900K over 100% faster than 8-core Zen 2, while at gamecpu.ru RPL is not even 40% faster, so it would not surprise me if Zen 4 were actually stronger in this game if both test systems used Windows 10.

I don't think you're looking at the right graph. You're looking at the one with RT disabled, which shows the 13900KF being 37.5% faster than the 3900X. But when you look at the RT-enabled graph, the difference jumps up to 80% in favor of the 13900KF.

I actually just bought the game out of curiosity, and it runs very smoothly with no hitching or anything. I'm practically locked at 120 FPS with V-Sync on, no DLSS, just DLAA and maxed-out settings.
 
  • Like
Reactions: Henry swagger

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Kind of makes you wonder why 3D cache à la the 5800X3D wouldn't be the preferred solution to that problem.

I don't know. The 5800X3D is 23% faster than the 5800X, but the 13900K with DDR5-5600 is 35% faster than the 12900K with DDR5-4400 in that benchmark. To me this indicates the working set must be too large to hold comfortably in cache, because a 35% increase is huge. Also, when they outfitted the 12900K with DDR5-6400, it caught up with the 13900K with DDR5-5600, which implies system memory bandwidth is the main bottleneck. It would be interesting if they tested with DDR5-7200 or faster on the 13900K just to see the difference.

The developer also stated that the CPU is decompressing data from storage on the fly, so that's another factor in this game's performance. When I used MSI Afterburner, I saw all 32 threads on my 13900KF lit up like a Christmas tree in Spider-Man Remastered. They weren't maxed out by any means, but they were all being well utilized. That's the first time I've seen that in any game I've played on my new system, because there are always some idling cores while gaming.
 
  • Wow
Reactions: igor_kavinski
igor_kavinski

Jul 27, 2020
16,831
10,781
106
Also, when they outfitted the 12900K with DDR5-6400, it caught up with the 13900K with DDR5-5600, which implies system memory bandwidth is the main bottleneck.
Suppose the 7600X3D gets 64MB of V-Cache due to its relatively lower price point. With DDR5-6000 CL30, if the cache can keep fulfilling requests and the CPU cores stay busy with that data while the cache is refilled with the next block of required data quickly enough, it could result in better utilization of the available cores. I can see the 7600X3D being able to overcome the need for higher-speed RAM in bandwidth-hungry games like Spider-Man Remastered. And if the Zen 4 cores really are being held back by slower RAM, the 7600X3D could trounce the 13600K in 90% of games tested.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Suppose the 7600X3D gets 64MB of V-Cache due to its relatively lower price point. With DDR5-6000 CL30, if the cache can keep fulfilling requests and the CPU cores stay busy with that data while the cache is refilled with the next block of required data quickly enough, it could result in better utilization of the available cores. I can see the 7600X3D being able to overcome the need for higher-speed RAM in bandwidth-hungry games like Spider-Man Remastered. And if the Zen 4 cores really are being held back by slower RAM, the 7600X3D could trounce the 13600K in 90% of games tested.

A large L3 cache would definitely help, as evidenced by the fact that the 5800X3D is 23% faster than the regular 5800X. The question is, would it be enough to beat Raptor Lake? Perhaps, but the lead Raptor Lake has in this title is absolutely massive, about two generational leaps. From what I've seen, system memory bandwidth is the trump card, and that likely has as much to do with the decompression being done by the CPU as with the BVH building and maintenance.

Cache helps a lot, but when the working set is too large the CPU has no choice but to go out to system memory, and in this particular game, that looks to happen very often.
 

biostud

Lifer
Feb 27, 2003
18,286
4,813
136
I don't know. The 5800X3D is 23% faster than the 5800X, but the 13900K with DDR5-5600 is 35% faster than the 12900K with DDR5-4400 in that benchmark. To me this indicates the working set must be too large to hold comfortably in cache, because a 35% increase is huge. Also, when they outfitted the 12900K with DDR5-6400, it caught up with the 13900K with DDR5-5600, which implies system memory bandwidth is the main bottleneck. It would be interesting if they tested with DDR5-7200 or faster on the 13900K just to see the difference.

The developer also stated that the CPU is decompressing data from storage on the fly, so that's another factor in this game's performance. When I used MSI Afterburner, I saw all 32 threads on my 13900KF lit up like a Christmas tree in Spider-Man Remastered. They weren't maxed out by any means, but they were all being well utilized. That's the first time I've seen that in any game I've played on my new system, because there are always some idling cores while gaming.
Maybe developers should start utilizing DirectStorage with GPU decompression.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Maybe developers should start utilizing DirectStorage with GPU decompression.

Is GPU decompression part of the DirectStorage standard? If it's not, I don't think we can expect it anytime soon. The efficiency cores on ADL and RPL definitely help out with decompression; I saw activity across all of them when playing Spider-Man Remastered, and I haven't seen that in any other game so far.
 
  • Like
Reactions: biostud

Rigg

Senior member
May 6, 2020
472
979
136
Is GPU decompression part of the DirectStorage standard? If it's not, I don't think we can expect it anytime soon. The efficiency cores on ADL and RPL definitely help out with decompression; I saw activity across all of them when playing Spider-Man Remastered, and I haven't seen that in any other game so far.
Yes.
[Image: DirectStorage modern I/O stack diagram]
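
For anyone curious what that looks like from the application side, here's a rough sketch of a DirectStorage read request that asks for GPU decompression. It's pieced together from memory of Microsoft's public DirectStorage 1.1 samples, so treat the exact struct and enum names as approximate and check dstorage.h before relying on them; error handling and fence-based completion tracking are omitted.

```cpp
// Sketch of a DirectStorage read request using GPU decompression (GDeflate).
// Approximate, from memory of Microsoft's DirectStorage 1.1 samples; resource
// creation, error handling and completion tracking are left out.
#include <cstdint>
#include <dstorage.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

void LoadCompressedAsset(ID3D12Device* device, ID3D12Resource* destBuffer,
                         const wchar_t* path,
                         uint32_t compressedSize, uint32_t uncompressedSize)
{
    ComPtr<IDStorageFactory> factory;
    DStorageGetFactory(IID_PPV_ARGS(&factory));

    ComPtr<IDStorageFile> file;
    factory->OpenFile(path, IID_PPV_ARGS(&file));

    DSTORAGE_QUEUE_DESC queueDesc{};
    queueDesc.Capacity   = DSTORAGE_MAX_QUEUE_CAPACITY;
    queueDesc.Priority   = DSTORAGE_PRIORITY_NORMAL;
    queueDesc.SourceType = DSTORAGE_REQUEST_SOURCE_FILE;
    queueDesc.Device     = device;

    ComPtr<IDStorageQueue> queue;
    factory->CreateQueue(&queueDesc, IID_PPV_ARGS(&queue));

    DSTORAGE_REQUEST request{};
    request.Options.SourceType        = DSTORAGE_REQUEST_SOURCE_FILE;
    request.Options.DestinationType   = DSTORAGE_REQUEST_DESTINATION_BUFFER;
    request.Options.CompressionFormat = DSTORAGE_COMPRESSION_FORMAT_GDEFLATE; // GPU inflates
    request.Source.File.Source        = file.Get();
    request.Source.File.Offset        = 0;
    request.Source.File.Size          = compressedSize;
    request.UncompressedSize          = uncompressedSize;
    request.Destination.Buffer.Resource = destBuffer;
    request.Destination.Buffer.Offset   = 0;
    request.Destination.Buffer.Size     = uncompressedSize;

    queue->EnqueueRequest(&request);
    queue->Submit(); // completion would normally be tracked with a fence or status array
}
```

The CompressionFormat field on the request is the part that matters here: the runtime reads the compressed bytes off the SSD and the decompression lands on the GPU (with a CPU fallback where that's not supported), instead of burning CPU threads on the inflation.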
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
I wonder, would using GPU decompression result in a framerate decrease? I don't want it to be like PhysX where turning it on affects performance. It will only be successfully adopted if it can be used seamlessly with no performance decrease to rendering.

For a while there was a concerted effort to have the GPU take on workloads that were predominantly run on CPUs, i.e. physics and encoding. GPU-based physics became less and less palatable over time due to the cost of a dedicated PhysX card and/or the performance issues of running physics on the GPU, in terms of added latency and penalties to rendering performance.

And as CPUs gained cores and became more powerful, that drove the final nail into the coffin of GPU-accelerated physics.

GPU-based encoding was even more short-lived as I recall, because while the GPU could encode at breakneck speed, the quality was so poor as to make it unviable.

So my point is that offloading some of these workloads to the GPU could end up backfiring, overtaxing the GPU while underutilizing the CPU.
 

JustViewing

Member
Aug 17, 2022
139
239
76
If you're like me, doing general computing tasks like internet surfing, watching streaming content, and email, you want to focus on the single-core performance figures of the CPU. Look for a higher clock speed and fast DDR5 too, plus a fast NVMe drive like PCIe 4.0.
For those types of workloads you really don't need very fast single-threaded performance. With a good NVMe drive and a good internet connection, I doubt anyone would notice any performance difference between the latest CPU and a five-year-old quad-core. I doubt you even need an NVMe drive; a SATA SSD will do just fine.
 

coercitiv

Diamond Member
Jan 24, 2014
6,257
12,197
136
Maybe developers should start utilizing DirectStorage with GPU decompression.
They will; what we're seeing now is the stopgap solution.

I wonder, would using GPU decompression result in a framerate decrease? I don't want it to be like PhysX where turning it on affects performance. It will only be successfully adopted if it can be used seamlessly with no performance decrease to rendering.
Of course there's going to be a price to pay in terms of performance, but I don't understand why you would think of it in terms of loss when it's a pure gain. The alternatives are CPU decompression, which affects performance to a larger degree, or keeping the long loading times, during which the FPS is effectively zero. A third option would be a dedicated logic block, but even that has a hidden cost.

I'm more excited about DirectStorage than I am about ray tracing (for the immediate future); it brings a more palpable gain to game immersion.
 
  • Like
Reactions: Elfear

Timmah!

Golden Member
Jul 24, 2010
1,430
660
136
When RT is turned on, the CPU has to build and maintain the BVH structure, and that is apparently quite CPU- and bandwidth-intensive. The overclocked 12900K with DDR5-6400 nearly catches up with the stock 13900K and its slower DDR5-5600 RAM, which is very indicative of how big a role bandwidth plays.

But this goes back to what I've been saying about how, when there is a high amount of ILP in a workload, RPL and ADL just pull ahead like nobody's business. Most desktop applications lack high ILP in the code, but in certain types of workloads it is much more common. I'm not presenting this as fact, just a theory.

Is this something you know for sure, or just a guess? Because I was under the impression this is what the RT cores inside the GPU do, at least in the case of Nvidia's GPUs.

see here:


It concerns V-Ray and not games, but it specifically says that the BVH thing is part of the "raycasting" process, which is accelerated by RT cores. So if the GPU itself has a dedicated hardware unit to do this computation, it makes no sense for the CPU to do it, since the CPU is as general-purpose as it gets, and a different chip on top of that.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Of course there's going to be a price to pay in terms of performance, but I don't understand why you would think of it in terms of loss when it's a pure gain. The alternatives are CPU decompression, which affects performance to a larger degree, or keeping the long loading times, during which the FPS is effectively zero. A third option would be a dedicated logic block, but even that has a hidden cost.

I never said it's going to be a pure loss, just that the technology's reception hinges on how big, small, or negligible the performance hit for implementing it is. I mean, it's not like we haven't seen this type of thing before with GPU-accelerated PhysX, and that turned out to be a failure for several reasons. If the performance hit from GPU decompression is significant, i.e. 10% or more, then a lot of gamers won't embrace it.

Core counts on CPUs have continued to increase to the point where the vast majority of game engines can't fully utilize 8-core CPUs (let alone 16 cores or more), while the GPU's workload has grown with higher-resolution monitors, faster refresh rates, and advanced graphical effects like ray tracing that burden the GPU more than ever before. Ideally, if the decompression could be done asynchronously it would probably work without much of a performance hit, but then developers are going to have to balance it properly.

Perhaps I'm being overly pessimistic because I've been a PC gamer for a long time and had a front-row seat during the GPU-accelerated physics era, which was hailed as the next big thing in gaming. I actually used and enjoyed GPU PhysX for a few years back in the day with dedicated PhysX cards, but the technology never saw wide adoption, primarily due to being proprietary. Nowadays, all the advanced physics effects that used to be possible only with hardware acceleration can be done in software on the CPU with even higher performance and efficiency, thanks to how much more powerful CPUs have become.

I'm more excited about DirectStorage than I am about ray tracing (for the immediate future); it brings a more palpable gain to game immersion.

I don't see how DirectStorage is going to increase game immersion. Games already load very fast on PC with a fast multi-core CPU and a good NVMe drive, and streaming/asset-decompression technology has improved by leaps and bounds over the years with faster, higher-core-count CPUs, faster RAM, larger frame buffers, and faster storage.

I remember the frenzy the PlayStation fanboys were in when Epic showcased the Unreal Engine 5 PS5 demo; a lot of them were saying it wouldn't be possible on PC or the XSX since those lacked the super-fast, high-throughput decompression hardware block the PS5 uses. But the demo ran even better on comparable PC hardware without the PS5's hardware decompression.

And heck, if Star Citizen with its amazingly huge environments can run well without GPU decompression, then I struggle to see just how much more GPU decompression would add.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Is this something you know for sure, or just a guess? Because I was under the impression this is what the RT cores inside the GPU do, at least in the case of Nvidia's GPUs.

In the Digital Foundry Spider-Man Remastered PC video, Alex goes into why the game is so CPU-dependent. He asked Nixxes directly, and according to them it's because "BVH building is very expensive and the extra cost on PC of decompressing game assets from storage into memory."


It concerns V-Ray and not games, but it specifically says that the BVH thing is part of the "raycasting" process, which is accelerated by RT cores. So if the GPU itself has a dedicated hardware unit to do this computation, it makes no sense for the CPU to do it, since the CPU is as general-purpose as it gets, and a different chip on top of that.

I think it's similar in many ways to how the CPU and GPU work together to render a frame. The CPU sets it up, and the GPU executes. It's the same with ray tracing, in that the CPU sets up the BVH while the GPU executes.
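
That division of labor is visible right in the D3D12 ray tracing API, for what it's worth. Roughly (simplified from memory of the DXR docs, so double-check the field names; resource creation, scratch sizing, and barriers are left out), the CPU fills out the instance list and the build inputs every frame, and the actual acceleration-structure build is just a command recorded for the GPU to execute:

```cpp
// Rough DXR sketch of the CPU-setup / GPU-execute split for the top-level BVH.
// Simplified from memory of the D3D12 ray tracing docs; buffer creation, scratch
// sizing and barriers are omitted, and exact names should be double-checked.
#include <d3d12.h>
#include <vector>

void BuildTopLevel(ID3D12Device5* device,
                   ID3D12GraphicsCommandList4* cmdList,
                   const std::vector<D3D12_RAYTRACING_INSTANCE_DESC>& instances,
                   ID3D12Resource* instanceBuffer,  // upload buffer already filled with `instances`
                   ID3D12Resource* scratch,
                   ID3D12Resource* tlas)
{
    // CPU side: describe what to build (this is the per-frame game-thread work).
    D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_INPUTS inputs{};
    inputs.Type          = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL;
    inputs.Flags         = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PREFER_FAST_TRACE;
    inputs.DescsLayout   = D3D12_ELEMENTS_LAYOUT_ARRAY;
    inputs.NumDescs      = static_cast<UINT>(instances.size());
    inputs.InstanceDescs = instanceBuffer->GetGPUVirtualAddress();

    D3D12_RAYTRACING_ACCELERATION_STRUCTURE_PREBUILD_INFO prebuild{};
    device->GetRaytracingAccelerationStructurePrebuildInfo(&inputs, &prebuild);
    // (The caller is assumed to have sized `scratch` and `tlas` from `prebuild`.)

    D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_DESC buildDesc{};
    buildDesc.Inputs                           = inputs;
    buildDesc.ScratchAccelerationStructureData = scratch->GetGPUVirtualAddress();
    buildDesc.DestAccelerationStructureData    = tlas->GetGPUVirtualAddress();

    // GPU side: the actual build is a command the GPU executes later.
    cmdList->BuildRaytracingAccelerationStructure(&buildDesc, 0, nullptr);
}
```

As far as I understand it, the expensive CPU part is keeping those instance and geometry descriptions up to date every frame as things stream in and move around, while the traversal and intersection work itself stays on the GPU's RT hardware.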
 
  • Like
Reactions: Timmah!