Markfw
Moderator Emeritus, Elite Member
- May 16, 2002
- 27,404
- 16,255
- 136
Amd is not paying you so stop with the cheerleading lol

If anybody is a cheerleader here, it's YOU, for Intel.
Do you still have the Core X CPU? Which one is it? Could you post a screenshot of MaxxMem² on it? I had to overclock the mesh by 25% on my Cascade Lake-X CPU just to get close to the latencies of their desktop CPUs.
Amd is not paying you so stop with the cheerleading lol
So AMD giving the middle finger to its customers is cheerleading? Ha!
*Intel's 10nm was equivalent to foundries' 7nm

This is true, but @Hulk is still correct. Intel is jumping from ~7nm to 20A, which is a GAAFET node. TSMC has 10, 7, 5, and 3 (plus tweaked versions of each) before moving to GAAFET at 2/20A. Samsung also had a 5 node before moving to GAAFET at 3, and is scheduled to move to 2/20A in 2025. Intel is trying to catch up with (and even leapfrog) the others, and is jumping forward a bit, comparatively.
Also, from what I've seen discussed/rumored online, Intel's 20A and 18A are basically the same process. It's just that 20A doesn't have all of the features and cell libraries available yet.
There are stock EPYC Genoa CPUs today with 64 cores that are faster in MT than the 5995WX, so one could build a single-CPU monster WS today.

Seriously though, it looks like the 5995WX isn't even really threatened by many/any of these Sapphire Rapids-W CPUs. Nobody's really lighting a fire under AMD's butt just yet.
No, I just sold it a few weeks back when I got my 13700KF.

Do you still have the Core X CPU? Which one is it? Could you post a screenshot of MaxxMem² on it?
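For anyone curious what a tool like MaxxMem² is actually measuring on the latency side, it boils down to a dependent pointer chase: each load's address comes from the previous load, so nothing can overlap and the average time per step approximates cache/memory latency rather than bandwidth. A minimal sketch in Python (interpreter overhead swamps the absolute numbers, which is why real tools use native code; treat this as illustrative only):

```python
import random
import time

def chase_latency(n_elems, iters=1_000_000):
    # Build one big random cycle so every load depends on the previous
    # one and the hardware prefetcher can't guess the next address.
    order = list(range(n_elems))
    random.shuffle(order)
    nxt = [0] * n_elems
    for i in range(n_elems):
        nxt[order[i]] = order[(i + 1) % n_elems]
    p = 0
    t0 = time.perf_counter()
    for _ in range(iters):
        p = nxt[p]  # serialized, dependent loads: no overlap possible
    t1 = time.perf_counter()
    return (t1 - t0) / iters * 1e9  # ns per dependent access

# Larger working sets fall out of the caches, so the per-access time rises.
for n in (1 << 10, 1 << 20):
    print(f"{n:>8} elements: ~{chase_latency(n):.0f} ns/access")
```

Only the trend across working-set sizes means anything here; the absolute figures include Python's object-indirection overhead.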
*Intel's 10nm was equivalent to foundries' 7nm
*Similarly, Intel's 7nm (renamed to Intel 4/3) is equivalent to something between foundries' 5nm and 3nm
*Intel's 20/18A is a rename from 5nm and is equivalent to foundries' 2nm (remember that Intel does bigger jumps per node, so two Intel nodes may be more equivalent to three foundry nodes rather than two)
*Intel 18A is *not* the same as 20A, just as Intel 3 is not the same as Intel 4. There have been various improvements in order to yield the 10% gain in performance per watt; in the old nomenclature it would have been called 5nm+(+)
There are actual numbers of the node itself, which are measured the same way across foundries, that you could use to compare Intel 4 and TSMC nodes.

Where is the evidence that Intel is doing bigger jumps than the foundries? They tried that with the original 10nm, which was supposed to scale 2.4x, got badly burned, and had to back it off; similar to how TSMC had to back off N3 with N3E, except it didn't take TSMC years to concede there was a problem like it did Intel. Intel isn't making that mistake again; they are not going to be very aggressive in the future, especially with how quickly they want us to believe they can roll out new processes.
Until we see some figures for SRAM cell size and transistors/mm^2 for actual Intel CPUs, not claimed figures that no one approaches in real life and that aren't even measured the same way across foundries, there's really no way to know how the processes compare.
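To make that concrete, here's the kind of back-of-the-envelope math an SRAM cell size enables, using commonly cited high-density bitcell areas. The specific numbers are assumptions for illustration, and real macros land well below these figures once sense amps, decoders, and redundancy are added:

```python
# Convert a published SRAM bitcell area (um^2 per bit) into an ideal
# array density in Mbit/mm^2, ignoring all peripheral circuitry.
def mbit_per_mm2(bitcell_um2):
    bits_per_mm2 = 1e6 / bitcell_um2  # 1 mm^2 = 1,000,000 um^2
    return bits_per_mm2 / 1e6         # bits -> Mbit

# Commonly cited HD bitcell areas (assumptions, not measured figures):
for node, cell in [("TSMC N5", 0.021), ("TSMC N3E", 0.021), ("Intel 4", 0.024)]:
    print(f"{node}: {mbit_per_mm2(cell):.1f} Mbit/mm^2 (bitcell {cell} um^2)")
```

Note how the N3E bitcell being essentially unchanged from N5 is the "SRAM has stopped scaling" observation in numeric form.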
And until Intel actually starts shipping stuff when they claim they will, it is all academic. Intel buries us in roadmaps showing how they are going to release these processes on a yearly cadence (actually even faster than that, given how they keep pushing the EUV stuff out), but they have been far from reliable in releasing new processes and have yet to ship anything made using EUV. Given Samsung's continued problems getting remotely decent yields with their EUV-based processes, no one should assume Intel is suddenly going to turn around the screwups they've had on the fab side for the better part of a decade.
Both are getting updates to be able to run up to 192 GB of DDR5 sooner or later. That's the 4x48 GB kits from Corsair, but the fastest of those are relegated to the four-DIMM population speed and are on Samsung dies. CL40, too. The 48 GB and 96 GB kits are two-stick kits. I don't know of any 64 GB sticks for a 128 GB kit at the moment, and anything in that size range will be stuck at 5600 or 6000.

I've got a dead X399 board as decoration somewhere... haven't felt the need to deal with HEDT platform headaches in a while given how powerful consumer CPUs have become in the past 4 years.
I'd be happy for either Intel or AMD to change my mind.
Good point. I totally forgot about that; I think you are right.

My first guess would be that "bandwidth per core" refers to the core-to-fabric interface more so than the caches. Anyone keep track of what that has been for the last few gens?
Where is the evidence Intel are doing bigger jumps than the foundries?
There are actual numbers of the node itself, which are measured the same way across foundries, that you could use to compare Intel 4 and TSMC nodes.
Ironically enough, however, despite having ~15%(?) lower density than TSMC 5nm, the 512KB data array in RWC is smaller than in Zen 4. Same story in Zen 3 vs ADL for SRAM in L2. Design stuff, I guess.
Just want to add, I made a slight mistake on INT.

Integer performance: Integer performance is very difficult to increase, and improvement in integer performance indicates the "quality" of the core. It is a complex combination of latency, frequency, balance of units, and branch prediction (and pipeline length), across all areas. You cannot have too big an L1 cache, as that increases latency, but it can't be too small either. You cannot just increase branch targets either; latency will come into play, and so does the performance of the algorithm. Decoders won't scale without beefing up the rest.
How do you increase the top speed of a car from 200 km/h to 300 km/h? Just double the engine displacement and horsepower? No. Aerodynamics needs to be greatly improved, since wind resistance is the limit at high speeds. You need the transmission to keep up so it does not fail, and to shift quickly and seamlessly. And you need to do all that while making the car lighter. You need a capable driver, because otherwise an accident might happen. Not a simple problem at all.
The 787 achieved a 20% fuel consumption reduction by moving to composite materials rather than just aircraft aluminum for the airframe. Then they had to move the battery to lighter lithium technology. The engines were replaced with bigger, slower-running turbofans. And they had to do aerodynamic work using CFD as well.
That's why integer performance is the most important aspect of a general-purpose CPU. It is said that in chasing each 1% improvement in performance, CPU architects did what can be called "heroic" work. It's amazing what they are doing now. The work pays off, though: improving integer performance, i.e. the uarch, benefits everything. Deep learning, floating point, word processing, gaming, emulation, snappiness.
That's why it's absolutely foolish when mega-corporations mistreat employees, especially veteran ones. These kinds of decisions require very seasoned architects with 30+ years of experience. That's an entire lifetime doing nothing but being a CPU architect, and being at the top of the field at that!
Density: The L2 array is not LLC (Last Level Cache) anymore and is partway to being a core cache. It has more stringent voltage and power requirements, so the cells used aren't exactly the bog-standard ones used for L3.
They could have improved the density aspect compared to the competition in Redwood Cove, thus making it more favorable against chips such as Zen 4. This is me speculating based on what you said, though.
Also, there is a fundamental limit. A company that does a better job scaling down will reach the limits faster than those that don't. You could argue the limits "democratize" compute and make the latest technologies available to small companies and those with fewer resources.
That's why DRAM has been on 10nm-class nodes (10-19nm) for five years now: 10x, 10y, 10z, 10a, 10b, and they even talk about 10c! They will reach a decade on this class, since Micron just announced 10b (1-beta) availability as the most advanced node. 10x = 17-18nm, depending on the manufacturer.
Very much a possibility the designations are as follows:
10x = 17-18nm
10y = 16-17nm
10z = 15-16nm
10a = 14-15nm
10b = 13-14nm
10c = 12-13nm
1nm improvement per "generation".
Why? Because DRAM at 10nm is far, far denser than DRAM on logic processes (3x the density of eDRAM), and that in turn is denser than logic cells. TSMC is showing almost no SRAM gains on the N3 node, meaning even SRAM is hitting hard limits. Logic isn't hitting them yet because it's less dense.
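As a sanity check on what each "1nm per generation" DRAM shrink is worth: in the ideal case, cell area scales with the square of the linear feature size, so density scales with its inverse square (real gains are smaller, since the capacitor and periphery don't shrink as nicely):

```python
# Ideal (best-case) density gain from a linear feature-size shrink:
# area per cell ~ feature^2, so density ~ 1 / feature^2.
def density_gain(old_nm, new_nm):
    return (old_nm / new_nm) ** 2

print(f"18nm -> 13nm: {density_gain(18, 13):.2f}x ideal")  # ~1.92x
print(f"18nm -> 17nm: {density_gain(18, 17):.2f}x ideal")  # ~1.12x per 'generation'
```

So even under perfect scaling, a full 10x-to-10b march buys less than 2x density, which is consistent with how slowly DRAM capacities per die have been growing.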


SPR continues to be broken? Hm.

Chips and Cheese article about Sapphire Rapids L3 cache performance:
Sapphire Rapids: Golden Cove Hits Servers
Last year, Intel's Golden Cove brought the company back to being competitive against AMD. (chipsandcheese.com)
View attachment 78051
View attachment 78052
I must ask: what capacity? The CPU is starved for L3 at only 112.5 MiB per CPU.
Chips and Cheese said: On Intel DevCloud, the chip appears to be set up to expose all four chiplets as a monolithic entity, with a single large L3 instance. Interconnect optimization gets harder when you have to connect more nodes, and SPR is a showcase of this. Intel's mesh has to connect 56 cores with 56 L3 slices. Because L3 accesses are evenly hashed across all slices, there's a lot of traffic going across that mesh. SPR's memory controllers, accelerators, and other IO are accessed via ring stops too, so the mesh is larger than the core count alone would suggest. Did I mention it crosses die boundaries too? Intel is no stranger to large meshes, but the complexity increase in SPR seems remarkable. [...] Intel engineers now have an order of magnitude more bandwidth going across EMIB stops. The mesh is even larger, and has to support a pile of accelerators too. L3 capacity per slice has gone up too, from 1.25 MB on Ice Lake SP to 1.875 MB on SPR.
From that perspective, Intel has done an impressive job. SPR has similar L3 latency to Ampere Altra and Graviton 3, while providing several times as much caching capacity. Intel has done this despite having to power through a pile of engineering challenges. But from another perspective, why solve such a hard problem when you don’t have to?
In contrast, AMD has opted to avoid the giant interconnect problem entirely. EPYC and Ryzen split cores into clusters, and each cluster gets its own L3. Cross-cluster cache accesses are avoided except when necessary to ensure cache coherency. That means the L3 interconnect only has to link eight cache slices with eight cores. The result is a very high performance L3, enabled by solving a much simpler interconnect problem than Intel.
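The "evenly hashed across all slices" point is easy to see with a toy model: every cache line maps to one of N slices, so with 56 slices only about 1/56 of a core's L3 accesses land on its own slice and everything else must cross the mesh. The hash below is a made-up multiplicative mix for illustration, not Intel's actual (undisclosed) slice-hash function:

```python
from collections import Counter

LINE_BYTES = 64
N_SLICES = 56  # Sapphire Rapids: one L3 slice per core

def slice_of(addr):
    """Map a physical address to an L3 slice (hypothetical hash)."""
    line = addr // LINE_BYTES
    mixed = (line * 0x9E3779B1) & 0xFFFFFFFF  # Fibonacci-style mixing
    return mixed % N_SLICES

# Sweep 56,000 consecutive cache lines and count how many land on each slice.
hits = Counter(slice_of(a) for a in range(0, N_SLICES * 1000 * LINE_BYTES, LINE_BYTES))
print(f"slices used: {len(hits)}/{N_SLICES}")
print(f"fraction of accesses that are slice-local: {1 / N_SLICES:.1%}")
```

With eight cores and eight slices per cluster (the Zen approach), the same math gives a 1/8 local fraction over a far smaller interconnect, which is the Chips and Cheese point about solving a simpler problem.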
This boosting behavior is likely specific to Intel DevCloud and not indicative of how fast SPR can boost
Even on Intel client parts, using a simple ring bus, Intel has marginally higher L3 latency regardless, no?

That means the L3 interconnect only has to link eight cache slices with eight cores. The result is a very high performance L3, enabled by solving a much simpler interconnect problem than Intel.
I'm talking about relative to Zen 3 latency; hence me quoting Chips and Cheese talking about simply connecting 8 cores with 8 cache slices, as in Zen 3.

Not even close, Golden Cove client is so much faster
View attachment 78054
Sapphire Rapids L3 Latency is 142 Cycles
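Cycle counts only compare directly at the same clock, so converting to wall-clock time needs the uncore/mesh frequency. A trivial conversion, with the clock speeds here being assumptions for illustration rather than measured values:

```python
def cycles_to_ns(cycles, ghz):
    """Latency in nanoseconds for a given cycle count and clock (GHz)."""
    return cycles / ghz

# 142 cycles at an assumed 2.0 GHz mesh clock vs. a client ring assumed at 4.0 GHz:
print(f"142 cycles @ 2.0 GHz = {cycles_to_ns(142, 2.0):.1f} ns")  # 71.0 ns
print(f"142 cycles @ 4.0 GHz = {cycles_to_ns(142, 4.0):.1f} ns")  # 35.5 ns
```

This is why a server mesh can look even worse in nanoseconds than its cycle count suggests: the uncore typically runs well below core clocks.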
Correct.

I'm talking about relative to Zen 3 latency; hence me quoting Chips and Cheese talking about simply connecting 8 cores with 8 cache slices, as in Zen 3.
Doesn't GLC have higher L3 latency than Zen 3 regardless of ringbus vs mesh?
Yes, but one thing I'll bet money on: Intel 7 is at least as good as TSMC N7, Intel 4 will be at least as good as TSMC N4, and so on...

Let me rephrase: it's literally marketing. Intel could call its nodes whatever it likes. I don't think Intel has made any actual claims yet as to density or quality.
No, they aren't, unless you are big enough to order tens of thousands of units from Intel directly. Once you involve a 3rd party, this is no longer the case. I can absolutely tell you that the price on Intel Ark is on the high side compared to the server hardware I can get from Dell. Often I can get a full system for less than that price.

Quotes are confidential. You would know that if you had ever dealt with such things. I am done talking with you. Waste of energy.
AMD has a better core with higher perf/watt. There may be some pro-AMD folks here, but the reality is Intel has fallen behind both in absolute performance and in perf/watt, compute density, and compute density per watt.

Amd is not paying you so stop with the cheerleading lol
