Discussion i7-5775C: the blast from the past - Effect of large L4 cache in high-refresh-rate gaming

Gideon

Golden Member
Nov 27, 2007
1,608
3,573
136
Ian is in the middle of updating the Anandtech bench results for 2020 CPU tests.

He recently added the Broadwell i7-5775C desktop CPU, the one with 128 MB of eDRAM L4 on a separate die. With all the Zen 3 L3 hype going on, I took a peek at how this CPU performs in games now, since high-refresh-rate gaming really wasn't a thing back then and GPUs were the absolute bottleneck.

I was really surprised by the results.

Not only does the i7-5775C (3.3 GHz base, 3.7 GHz boost) win handily vs the Haswell i7-4790K (4.0 GHz base, 4.4 GHz boost):

In a number of charts it beats every single current CPU, despite being a low-clocked 4C/8T Broadwell:

Civilization VI - 1080p Max (1st in both FPS and 95th Percentile)
Strange Brigade 1080p Ultra (1st in both FPS and 95th Percentile)
F1 2019 1080p Ultra (4th, beating everything below the 10600K in both FPS and 95th Percentile)

Now obviously it's a small subsample of games, and it doesn't fare anywhere near as well in others like FF 15 and Borderlands 3 (and probably any heavily multithreaded games like BF5).

But still, even compared to the 7700K (which has a slight IPC advantage and nearly a GHz of clock-speed advantage) it's neck and neck. I'd even say that in gaming benchmarks the 7700K loses more than it wins (in productivity it's obviously the other way around).

Really impressive overall! Too bad those Crystal Well eDRAM caches never carried over to later CPUs. Can't imagine how well a modern chip with 256 MB of off-die 14nm eDRAM would perform now.
 

arandomguy

Senior member
Sep 3, 2013
556
183
116
It's worth keeping in mind that Anandtech's test setup for CPUs uses a "stock" JEDEC memory configuration. Anandtech's reasoning is that most users don't even enable XMP (much less do any further memory tuning), so the default configuration is more representative. This does mean, however, that the 5775C's relative results in gaming (which becomes very memory sensitive when targeting those high frame rates) benefit comparatively more from the cache setup than they would against even just the XMP settings of relatively common overclocked RAM kits.

In practice we aren't even talking about more expensive, highly binned kits like those using Samsung B-dies. DDR4-3200 CL16 kits have been essentially the minimum spec for the last few years from a DIY enthusiast standpoint; anything slower is sometimes even slightly more expensive and/or harder to find at this point.
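To put rough numbers on that memory sensitivity, here's a quick back-of-the-envelope sketch (my own illustration, not Anandtech's methodology; it assumes dual-channel, 64-bit channels, and treats DDR4-2133 CL15 as a representative JEDEC profile, DDR4-3200 CL16 as a common XMP kit):

```python
def peak_bw_gbs(data_rate_mts, channels=2, bytes_per_channel=8):
    """Theoretical peak bandwidth in GB/s for a dual-channel DDR setup."""
    return data_rate_mts * bytes_per_channel * channels / 1000

def cas_only_ns(data_rate_mts, cl):
    """CAS latency alone in ns (real load-to-use latency adds controller overhead)."""
    return cl * 2000 / data_rate_mts

for name, rate, cl in [("JEDEC DDR4-2133 CL15", 2133, 15),
                       ("XMP   DDR4-3200 CL16", 3200, 16)]:
    print(f"{name}: {peak_bw_gbs(rate):.1f} GB/s peak, {cas_only_ns(rate, cl):.1f} ns CAS")
# JEDEC DDR4-2133 CL15: 34.1 GB/s peak, 14.1 ns CAS
# XMP   DDR4-3200 CL16: 51.2 GB/s peak, 10.0 ns CAS
```

The gap is large enough that a big on-package cache can mask a lot of it, which is presumably why the 5775C looks so strong against the stock-JEDEC configurations.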
 

Gideon

Golden Member
Nov 27, 2007
1,608
3,573
136
It's worth keeping in mind that Anandtech's test setup for CPUs uses a "stock" JEDEC memory configuration. Anandtech's reasoning is that most users don't even enable XMP (much less do any further memory tuning), so the default configuration is more representative. This does mean, however, that the 5775C's relative results in gaming (which becomes very memory sensitive when targeting those high frame rates) benefit comparatively more from the cache setup than they would against even just the XMP settings of relatively common overclocked RAM kits.

In practice we aren't even talking about more expensive, highly binned kits like those using Samsung B-dies. DDR4-3200 CL16 kits have been essentially the minimum spec for the last few years from a DIY enthusiast standpoint; anything slower is sometimes even slightly more expensive and/or harder to find at this point.
That's a valid point. I really wish they'd add memory overclocking to at least the "overclocking" page of the reviews. It's far more relevant nowadays than core OC, with both vendors pretty much maxing out single-core boost out of the factory.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
Trying to tune the memory configuration for dozens of different processors on dozens of different boards is an extremely onerous task. I don't expect it, nor would I even bother asking for it. I'd only ask that they test at the maximum officially supported memory speed for each platform.

Does that negatively impact some platforms more than others? Sure! But, those choices were made by the vendors and manufacturers.

Going forward, with AMD championing such large L3 caches, and with DRAM speeds slated to get much faster, I don't see much point in having large L4 caches in desktop systems.
 

Gideon

Golden Member
Nov 27, 2007
1,608
3,573
136
Going forward, with AMD championing such large L3 caches, and with DRAM speeds slated to get much faster, I don't see much point in having large L4 caches in desktop systems.
Well, I could see it happening on a design with an active interposer (say, Zen 4) that would perhaps have an even smaller L3 but a large L4 cache or local memory pool for all the chiplets. It could even be HBM2.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
L4 cache as implemented in Broadwell had two nasty problems:

1) The L4 tags ate 2MB of L3, leaving less for everything else.
2) Overall L4 latency and bandwidth were pretty bad, I think ~40-50 GB/s and ~45 ns. To put those numbers into perspective, by the end of the DDR3 era people were running DDR3-2000 CL11 setups that managed something like 35 GB/s and ~40 ns of latency.

So nothing to write home about: it was easy for enthusiasts to beat those numbers without giving up L3 capacity or paying the L4 tag-check price on each and every memory access. A bad deal for the silicon spent on the controller, the tags, and the actual cache die.

Even Intel realized these problems and, in the Skylake generation, moved from an L4 cache with tags to a "system-side" L4 memory cache, but I don't think they had much success with those either.
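For a quick sanity check of those DDR3-2000 figures, here is a rough sketch (assuming dual-channel, 64-bit channels; the ~40 ns quoted above is total load-to-use latency, which includes controller overhead on top of CAS):

```python
# Theoretical peak bandwidth: 2000 MT/s x 8 bytes per channel x 2 channels
peak_gbs = 2000 * 8 * 2 / 1000     # = 32.0 GB/s theoretical peak for dual channel
# One memory clock at DDR3-2000 runs at 1000 MHz, i.e. 1.0 ns per cycle,
# so CL11 alone is ~11 ns; the rest of the ~40 ns comes from the controller and fabric.
cycle_ns = 1000 / (2000 / 2)       # = 1.0 ns
cas_ns = 11 * cycle_ns             # = 11.0 ns
print(f"{peak_gbs:.1f} GB/s peak, {cas_ns:.1f} ns CAS")
```

Which is why a fast DDR3 kit really did land in the same neighborhood as the eDRAM on both bandwidth and latency.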
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
Feels like AMD's Zen team looked at that eDRAM size while thinking about the possible L3$ design and thought "hold our beer". With the 5775C Intel essentially showed that a larger cache is useful not only in datacenters but also in gaming systems (thus leading to the infamous moniker "gamecache").

Though one could argue consoles set a precedent for using eDRAM as a cache, as in the PS2, GC, PSP, Wii, Wii U, Xbox 360, and Xbox One (ESRAM in that last case)...
 
  • Like
Reactions: Tlh97

richierich1212

Platinum Member
Jul 5, 2002
2,741
360
126
The 5775C wasn't readily available at launch back then. I remember contemplating ordering one from Amazon.jp to mess around with. It would've replaced my 4790K, but it was a sidegrade at best and Broadwell didn't clock as high as Devil's Canyon.
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
Those generally weren't caches; they were explicitly managed memory pools.
That's true. Since console development (especially on older consoles) mostly happened close to the metal, it was up to the games how everything was handled, so deliberate management of all memory was more common than "random" caching.
 

Soulkeeper

Diamond Member
Nov 23, 2001
6,712
142
106
This was a tempting CPU at the time.
Also worth noting is that the Yorkfield Core 2 Quad had 12MB of L2 (6MB per core). This gave that CPU a nice edge over the 4MB competition.
I kinda miss the big L2 caches, and still question the need for L3/L4. An L2 can be completely shared between all cores. However, an HBM chip on the package might make sense today. APUs would love it.
 

Insert_Nickname

Diamond Member
May 6, 2012
4,971
1,691
136
Also worth noting is that the Yorkfield Core 2 Quad had 12MB of L2 (6MB per core). This gave that CPU a nice edge over the 4MB competition.

It wasn't really 12MB, but 2x 6MB, since Yorkfield was dual-die. Accessing the other die also meant a trip over the FSB, so latency between the dies suffered quite a bit.

So essentially more of a dual-socket-on-chip setup, but it was pretty cool at the time.
 

ZGR

Platinum Member
Oct 26, 2012
2,052
656
136
Those performance numbers are fantastic. It really is too bad Intel didn't follow through with more L4 CPUs. They did have Skylake L4 options, but never in an overclockable desktop form factor.

The L4 cache was improved with Skylake and Kaby Lake; Broadwell's implementation is inferior. No idea what the performance uplift would have been.


"Rather than acting as a pseudo-L4 cache, the eDRAM becomes a DRAM buffer and automatically transparent to any software (CPU or IGP) that requires DRAM access. As a result, other hardware that communicates through the system agent (such as PCIe devices or data from the chipset) and requires information in DRAM does not need to navigate through the L3 cache on the processor. "

Since Skylake's launch, I've always felt like my 5775C had the beta version of the L4 cache, so seeing these kinds of performance numbers kinda saddens me. What would a 10900K look like with the L4 cache? Wish we could see.
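To picture the difference the quote describes, here is a toy model (purely an illustration, with made-up latency constants; the ~45 ns figure is just the Broadwell L4 estimate quoted earlier in the thread). Broadwell's L4 sits behind the L3 and is only reachable by CPU/iGPU requests that missed L3, while the Skylake-era eDRAM is a memory-side cache that fronts DRAM for anything going through the system agent:

```python
from dataclasses import dataclass

EDRAM_NS = 45   # rough Broadwell L4 hit latency quoted earlier in the thread
DRAM_NS = 65    # illustrative JEDEC DRAM load latency, not a measured figure

@dataclass
class Request:
    agent: str        # "core", "igpu", or "pcie_dma"
    in_edram: bool    # whether the line happens to be resident in eDRAM

def broadwell_style(req: Request) -> float:
    """L4 behind L3: only CPU/iGPU traffic that missed L3 can hit the eDRAM."""
    if req.agent in ("core", "igpu") and req.in_edram:
        return EDRAM_NS
    return DRAM_NS    # PCIe/DMA traffic and eDRAM misses go straight to DRAM

def memory_side_style(req: Request) -> float:
    """Skylake-era memory-side cache: fronts DRAM for every agent in the system."""
    return EDRAM_NS if req.in_edram else DRAM_NS

for r in (Request("core", True), Request("pcie_dma", True)):
    print(r.agent, broadwell_style(r), memory_side_style(r))
# core     45 45   -> both designs help CPU-side hits
# pcie_dma 65 45   -> only the memory-side cache can help I/O traffic
```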
 
  • Like
Reactions: Tlh97 and Drazick

zir_blazer

Golden Member
Jun 6, 2013
1,160
400
136
Did anyone theorycraft whether it was possible to couple a 6C/8C Skylake die with a 128 MiB Crystalwell die from a physical size perspective? Just look at LGA 1150 Broadwell. Given that Skylake supporting Crystalwell was a fact, I suppose Intel may simply not have been able to fit both on a desktop socket package. Note that the interface to the Crystalwell die may have been removed in the newer 6C+ die designs, so putting it back may require a new die.
Also, consider costs. There is an extra die, a more complex package PCB, and MCM assembly, all of which should be significantly more expensive than standard single-die parts. It is weird that Intel bothered to do this on Broadwell, when it was already superior to any AMD offering (except APUs on the GPU side), yet doesn't do it now, when it actually needs every boost it can get. Ironically, the two desktop Broadwell models were barely more expensive than the previous Haswells, so for the end user they were beautifully priced considering how much silicon you were getting. I suppose it ballooned manufacturing cost at a time when there was no need for it and cut into profits. There may be a reason it launched so close to Skylake: Intel may have wanted to artificially keep demand low while still launching the generation rather than skipping it entirely.
 

Leeea

Diamond Member
Apr 3, 2020
3,599
5,340
106
wow.

I purchased an i7-4790K in July 2017 for $360* as the final upgrade for my 1150 board, and I use it as my daily driver to this day.

Three years later, I'm now wondering in hindsight if I should have gone for this thing instead. There is a long list of games where it spanks** my 4790K. It probably would have been much cheaper, too.

*That was expensive for it, and it was out of production even then. At the time I saw the 4790K as the last four-core gaming upgrade for my 1150 mainboard.

**There is a small chance my XMP DDR3-1600 low-CAS (8-8-8-24) memory helps the 4790K, but that memory would also help the i7-5775C.
 
  • Like
Reactions: Tlh97

Leeea

Diamond Member
Apr 3, 2020
3,599
5,340
106
apples to oranges:

Having picked up Strange Brigade in a Steam sale, I decided to see how well my system (with a Vega 56) would do against Anandtech's bench test system (which has a 2080 Ti).
Mine was destroyed: a DX12 1440p Low result of 161 avg fps on mine compared to 276 avg fps on Anandtech's.

It would seem that in dollars-per-fps upgrade terms, there is no point in upgrading my ancient 4790K. For me, the graphics card continues to be the most important factor.

I suppose I should be delighted that my CPU, first manufactured in 2014, slogs on. It saves a lot of money. I really wanted to upgrade to Ryzen though :\.


--------------------------

oranges to apples:

Next, I decided to take a look at Anandtech's GPU comparison benches from 2019.

Anandtech's 2019 GPU comparison uses an i9-9900K + 32 GB DDR4-3600 low CAS.
My computer uses an i7-4790K + 16 GB DDR3-1600 low CAS.

With Strange Brigade at 1440p Ultra quality:
Anandtech's system with a Vega 56 in 2019 averages 91.4 fps
my system with a Vega 56 in 2020* (default, balanced preset) averages 94.2 fps
my system with a Vega 56 in 2020 (overclocked) averages 102 fps

*2020 also means I have 2020 drivers, 2020 Strange Brigade patches, 2020 Windows 10 patches, etc.

It appears to me the CPU + memory are all but irrelevant at the graphics settings I play at, provided:
- the CPU has enough cores
- there is enough memory to avoid the swap file
- the graphics card is inferior to a 2080 Ti
 

zir_blazer

Golden Member
Jun 6, 2013
1,160
400
136
Well, wadda you know? Dr. Ian Cutress is monitoring these forums :D

I think a big point he didn't mention about Broadwell is that Intel limited its support to the new H97 and Z97 chipsets (which were released, I think, around the same time as Devil's Canyon), but not to the original 8-series chipsets released with the Haswell launch, like the Z87. The fun thing is that the workstation/server C226 chipset (NOT the C222), which is from the 8-series generation and launched around the same time as Haswell, DOES officially support Xeon E3-1200 v4 Broadwell. So the whole point about needing an extra pin, or whatever lame excuse Intel came up with to justify not supporting Broadwell on the previous generation, was completely bogus, and it literally killed any meaningful upgrade path for early Haswell adopters.
Due to Broadwell's minimal popularity, I don't even recall whether BIOS modders managed to get it working on older motherboards or not.
 

Gideon

Golden Member
Nov 27, 2007
1,608
3,573
136
Well, wadda you know? Dr. Ian Cutress is monitoring these forums :D

Well, tbf, I think Ian had this article in the works long before I made the thread (and it's the reason these benches were up in the first place).

Still, I agree with his conclusions. High-end CPUs are expensive enough now that a purely gaming-oriented (iGPU-less) SKU with a 2 GB HBM2 cache on the package might make some sense (once higher-end chips move to 2.5D packaging anyway).
 

SPBHM

Diamond Member
Sep 12, 2012
5,056
409
126
Very nice gaming results, still.

The new Radeons have that 128 MB cache that I think is over 1 TB/s. It would be nice to have something like that in a CPU; well, they already have 32 MB with Zen 3, but...
 

Panino Manino

Senior member
Jan 28, 2017
813
1,010
136
As we move into an era where AMD is showcasing its new ‘double’ 32 MiB L3 cache on Zen 3 as a key part of their improved gaming performance, we already had 128 MiB of gaming acceleration in 2015. It was enabled through a very specific piece of hardware built into the chip. If we could do it in 2015, why can’t we do it in 2020?

In this case, this extra cache would be inside the IOD, right?