Discussion [Speculation] Working silicon that must exist because I don't see why it wouldn't


DavidC1

Golden Member
Dec 29, 2023
1,673
2,757
96
I'm actually waiting for RAM to go obsolete, with Gen7 or Gen8 NVMe's replacing RAM/storage altogether.
Gen5 is, I think, considered faster than DDR4 in a RAID-0 utilizing a full 16x PCI-E slot.

I think by Gen7 we may see the DDR slot disappear altogether.
How? NAND flash has terrible latency. You could have a 1TB/s NVMe drive and you still couldn't replace DRAM. Optane DIMMs were 100x faster in this regard and they were still significantly behind DRAM.
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
21,052
3,533
126
Have you set that up in your TR system?
No, not with PCI-E 5.0.

How? NAND flash has terrible latency. You could have a 1TB/s NVMe drive and you still couldn't replace DRAM. Optane DIMMs were 100x faster in this regard and they were still significantly behind DRAM.
I am predicting that by the time we hit Gen7, the PCI-E lanes will be Gen7 as well, and it will probably work itself out since the PCI-E lanes are all tied directly to the CPU now.

It's like how you look at sci-fi movies and see how everything is based off iso crystals of some sort.
I am thinking that will be the RAM/OS/storage all in one part, and the CPU/GPU/motherboard will be its own.
 

DavidC1

Golden Member
Dec 29, 2023
1,673
2,757
96
It's like how you look at sci-fi movies and see how everything is based off iso crystals of some sort.
I am thinking that will be the RAM/OS/storage all in one part, and the CPU/GPU/motherboard will be its own.
That changes nothing. NAND tech itself is limited to about 50-100µs, and that's under random reads. It's literally 1000x the difference. Writes are worse because NAND needs an erase before it can be rewritten, which is so slow that without a DRAM buffer it would be absolutely unusable beyond 1-bit SLC SSDs. Under random writes, bufferless 2-bit NAND is down in slow-HDD territory. Even DRAM-less SSDs use tricks such as borrowing system DRAM or using SLC caches.

The purpose of a DRAM buffer is that the controller keeps a table mapping the NAND's addresses at a roughly fixed ratio, and updates or clears entries as necessary. So 1GB of DRAM on a 1TB SSD means a 1:1000 ratio. In very simple terms, each byte of DRAM has to cover one location in NAND, and a "page" in this example is 1000 bytes.
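Just to sketch what that table looks like (purely illustrative; the 4-bytes-per-entry figure is a common rule of thumb, not any particular controller's actual FTL layout, and it lands at the same roughly 1:1000 ratio):

```python
# Rough sketch of why ~1 GB of mapping DRAM tends to accompany ~1 TB of NAND.
# Assumed figures: 4 bytes of logical-to-physical mapping per entry.

DRAM_BYTES  = 1 * 1024**3      # 1 GiB of mapping DRAM on the SSD
NAND_BYTES  = 1024 * 1024**3   # 1 TiB of NAND behind it
ENTRY_BYTES = 4                # assumed size of one mapping entry

entries   = DRAM_BYTES // ENTRY_BYTES   # how many pages the DRAM can track
page_size = NAND_BYTES // entries       # NAND covered per entry
print(f"{entries:,} entries, each covering {page_size} bytes of NAND")
# -> 268,435,456 entries, each covering 4096 bytes (about a 1:1000 DRAM:NAND ratio)

# The controller consults a table like this on every random access:
l2p = {}                                 # logical page -> physical page (toy dict)

def translate(logical_page: int) -> int:
    """Return the physical NAND page for a logical page, or -1 if unmapped."""
    return l2p.get(logical_page, -1)
```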

So even on SSDs, it's all about the DRAM.

DRAM is anywhere from 50-150ns. It's incomparable. Who cares about direct CPU connections when you are that slow?
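Quick arithmetic on that gap, using only the latency figures quoted in this post:

```python
# Ratio of NAND random-read latency to DRAM access latency, figures as quoted above.
nand_latency_ns = (50_000, 100_000)   # 50-100 µs NAND random read
dram_latency_ns = (50, 150)           # 50-150 ns DRAM access

best_case  = nand_latency_ns[0] / dram_latency_ns[1]   # ~333x
worst_case = nand_latency_ns[1] / dram_latency_ns[0]   # ~2000x
print(f"NAND is roughly {best_case:.0f}x to {worst_case:.0f}x slower than DRAM")
```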

Not to mention your fancy combined drive would die in a few days, because DRAM endurance is essentially unlimited while NAND's is severely limited and needs all sorts of tricks such as wear levelling. Optane DIMMs showed promise because they had 200ns random reads and 500ns random writes, along with a 5-year warranty for 24/7 writes, which works out to the 1-10 million cycles per cell range, or about 500 petabytes for a 256GB Optane DIMM. At roughly 500x the random read speed, it came rather close to the "1000x faster" claim.
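A back-of-the-envelope check on those endurance figures (the warranty terms are as paraphrased here, not pulled from a datasheet):

```python
# Full-capacity write cycles implied by ~500 PB of lifetime writes on a 256 GB DIMM.
dimm_bytes      = 256e9     # 256 GB Optane DIMM
lifetime_writes = 500e15    # ~500 petabytes of rated writes

cycles = lifetime_writes / dimm_bytes
print(f"~{cycles:,.0f} full-capacity writes")   # ~1,953,125 -> roughly 2M cycles per cell

# Versus TLC NAND at roughly 1,000-3,000 program/erase cycles per cell:
print(f"~{cycles / 3000:,.0f}x to ~{cycles / 1000:,.0f}x the TLC endurance")
```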

The really old MLC SSDs, with much more robust 60nm lithographies, were in the 10K cycles per cell range. Now you are talking 1,000-3,000 with TLC.
 

Thunder 57

Diamond Member
Aug 19, 2007
3,814
6,427
136
I'm actually waiting for RAM to go obsolete, with Gen7 or Gen8 NVMe's replacing RAM/storage altogether.
Gen5 is, I think, considered faster than DDR4 in a RAID-0 utilizing a full 16x PCI-E slot.

I think by Gen7 we may see the DDR slot disappear altogether.

How? NAND flash has terrible latency. You could have a 1TB/s NVMe drive and you still couldn't replace DRAM. Optane DIMMs were 100x faster in this regard and they were still significantly behind DRAM.

I was going to post what @DavidC1 said, but he beat me to it, twice. My first concern was the limited lifespan of NAND, but latency would be a huge factor that I did not even consider at first. CPUs need low latency.

How many times have you had DRAM fail? I did once. And since any that is worth a crap has a lifetime warranty, all I had to do was put "Failed in Memtest86" in the RMA and I got a free replacement.

If anything, I wish Intel/Micron hadn't given up on Optane. The performance at low QD levels was unbeatable. Just a bit too pricey to be worth it for the mass market, I guess.
 

DavidC1

Golden Member
Dec 29, 2023
1,673
2,757
96
How many times have you had DRAM fail? I did once. And since any that is worth a crap has a lifetime warranty, all I had to do was put "Failed in Memtest86" in the RMA and I got a free replacement.
They fail, of course, but those are normal variations, and unpredictable ones at that. NAND life cycles, though? Those are predictable, and low enough that while most of us may never exhaust them, it's within the realm of plausibility.
If anything, I wish Intel/Micron hadn't given up on Optane. The performance at low QD levels was unbeatable. Just a bit too pricey to be worth it for the mass market, I guess.
I really think it had potential, but I get their point. If they had succeeded, then eventually they would have had to compete in a low-margin market, which basically means moderate money unless you are #1.

And the road to getting there? One analyst said you would need to reach 1/10th the volume of NAND or DRAM to fulfill the cost promises. 1/10th the volume of NAND or DRAM, man oh man, that's not just successful, that's wildly successful.

Intel would not only have had to execute perfectly, but also be willing to work with everyone, competition or not (ARM, AMD), and grind for years at that. Gelsinger trained under Andy Grove, who was famous for shifting Intel away from system memory and into CPUs, which were much more profitable. This would have been a big shift back into memory.
 

Doug S

Diamond Member
Feb 8, 2020
3,335
5,822
136
I was going to post what @DavidC1 said, but he beat me to it, twice. My first concern was the limited lifespan of NAND, but latency would be a huge factor that I did not even consider at first. CPUs need low latency.

How many times have you had DRAM fail? I did once. And since any that is worth a crap has a lifetime warranty, all I had to do was put "Failed in Memtest86" in the RMA and I got a free replacement.

If anything, I wish Intel/Micron hadn't given up on Optane. The performance at low QD levels was unbeatable. Just a bit too pricey to be worth it for the mass market, I guess.


There's a third problem he's also overlooking: granularity. NAND writes happen in pages that are far, far larger than cache lines. So even if you had magic NAND with unlimited write cycles, no erase required before write, and latency comparable to DRAM, you'd STILL have to re-architect the entire caching hierarchy so that the last-level/system-level cache had line sizes matching the page size of the NAND you were using.
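To put hypothetical numbers on that mismatch (the 64-byte cache line is standard, but the 16 KiB page and 256-page erase block below are just typical illustrative NAND figures, not any specific part):

```python
# Granularity mismatch between a CPU cache line and NAND program/erase units.
cache_line = 64                 # bytes a CPU writes back at a time
nand_page  = 16 * 1024          # smallest programmable NAND unit (assumed)
nand_block = 256 * nand_page    # smallest erasable unit (assumed)

print(f"one flushed line programs {nand_page // cache_line}x more data than it changed")  # 256x
print(f"an in-place update can drag in a {nand_block // 1024} KiB erase block")           # 4096 KiB
```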

And all that is for what? What's the gain here? Not having DIMM slots? Oh yay, the boards can be made a bit smaller, that TOTALLY sounds worth all this effort! :rolleyes:
 

DavidC1

Golden Member
Dec 29, 2023
1,673
2,757
96
I want them to introduce a high end iGPU with a relatively low end CPU, and price it as such. I know why they don't do it. They are forcing you to buy high end CPUs. It's such a beancounter move.

You used to pay $4-5 extra for a chipset with graphics and pair it with a $100 Celeron. Now if you want the fastest of the "free" graphics, you have to get an i7 or higher. The real cost for those folks is easily $150 for bottom-of-the-barrel graphics, since they don't care about the CPU.

You are better off getting a cheap dGPU, because you don't have to pair it with the fastest, most expensive CPU.

Notice that integration doesn't really bring the price down. In addition to what I said, the CPUs that started integrating graphics went up in price by exactly the cost adder the graphics chipset used to command. And even though you have one less big chip on your motherboard, the motherboard still costs the same!

Novalake Celeron with 24 Xe4 graphics for $150 please!
 
Jul 27, 2020
26,228
18,060
146
Novalake Celeron with 24 Xe4 graphics for $150 please!
Possibly more absurd than my P4 wish. You know they won't ever do it because of the "there's no market for it" excuse. But it would be the quickest way to establish a huge following, especially in third-world markets, enticing people there to ditch low-end dGPUs.

At least with the P4, the millions of people still alive who used to have one might buy it again for nostalgia.
 

Thunder 57

Diamond Member
Aug 19, 2007
3,814
6,427
136
I guess because Intel advertised the hell out of it during that era.

No, I mean why would somebody buy a modernized P4? It's a flawed design. I think we've been through this before, but I suggest you read the Chips & Cheese article on it. If you just want the CliffsNotes, go to the "Putting it Together" section. The P4 did have some good ideas that we saw later with Sandy Bridge, but it was too ambitious and too soon for its time. It was good at integer code with few branches and at SSE2, but those were the highlights IMHO.
 
Jul 27, 2020
26,228
18,060
146
It was good at integer code with few branches and at SSE2, but those were the highlights IMHO.
And it could be a good integer-crunching CPU with upgraded AVX-512 on 18A, cheaper to make due to a smaller die size than the current architectures, and it could easily hit 6.5 GHz due to its lower complexity. It could be the modern Celeron for the cash-strapped.
 

Thunder 57

Diamond Member
Aug 19, 2007
3,814
6,427
136
And it could be a good integer-crunching CPU with upgraded AVX-512 on 18A, cheaper to make due to a smaller die size than the current architectures, and it could easily hit 6.5 GHz due to its lower complexity. It could be the modern Celeron for the cash-strapped.

What you are proposing sounds exactly like Larrabee, except using the P4 as the core rather than a modern Pentium (P5). How'd that work out?
 

DavidC1

Golden Member
Dec 29, 2023
1,673
2,757
96
And it could be a good integer-crunching CPU with upgraded AVX-512 on 18A, cheaper to make due to a smaller die size than the current architectures, and it could easily hit 6.5 GHz due to its lower complexity. It could be the modern Celeron for the cash-strapped.
LOL. You know the "Bonnell" Atom? The one that still causes people to criticize E cores because they can't learn beyond what they read 17 years ago? The in-order design?

I estimated that the integer performance of that chip was not too far away from Pentium 4 derivatives at the same clock. Within 10-15%. The P4 was faster at SSE2, but you are talking maybe 30% over that Atom.

The 22nm out-of-order successor beat the P4 per clock in both INT and FP. It didn't even need SSE of any sort.
Possibly more absurd than my P4 wish. You know they won't ever do it because of the "there's no market for it" excuse. But it would be the quickest way to establish a huge following, especially in third-world markets, enticing people there to ditch low-end dGPUs.
You don't know much about CPU design, do you? Not even in a high-level sense?

Celeron with a fast iGPU can be done right now.

For your P4 idea, they'd need to spend tens of millions of dollars to port it to the latest process, and I assure you it's not Microsoft Paint's Ctrl+C, Ctrl+V.
 
Jul 27, 2020
26,228
18,060
146
What you are proposing sounds exactly like Larrabee, except using the P4 as the core rather than a modern Pentium (P5). How'd that work out?
The P55C is much more ancient than the P4. And Larrabee was supposed to be a programmable GPU running on a sea of tiny CPU cores. Had Intel managed to make it work, and had it not been abandoned after Pat left, it could've made real-time ray tracing fashionable much earlier than GeForce RTX did. Intel's problem has always been that they find it hard to stick with their ideas and see them through to completion. They get spooked by anything mildly successful from AMD and then reactively change their tune to one-up AMD, often with disastrous results. Hybrid cores are one such disaster.
 
Jul 27, 2020
26,228
18,060
146
We already got a modern P4; it's called Sandy Bridge. It takes the crazy part of the P4 and optimizes it to be much more efficient. It's 2.5x the perf/clock.
OK, I'm down with that. Let's give Sandy Bridge 35 pipeline stages, AVX-512, 6.5 GHz all-core clocks with boost clocks a few hundred MHz above that and we are good :)
 

DavidC1

Golden Member
Dec 29, 2023
1,673
2,757
96
The P55C is much more ancient than the P4. And Larrabee was supposed to be a programmable GPU running on a sea of tiny CPU cores.
It did work. It got so delayed that they repurposed it for a workstation accelerator.

And the successor had a far superior core based on Silvermont. Yeah, that same core that beats the P4 in per-clock performance. They quoted over 3x the performance per core despite just a 30% increase in clocks.

The 10nm successor, called Knights Hill, was to feature the Goldmont core, which is another 30% faster per clock and moves FP to out-of-order execution, with SSE instructions running on a properly pipelined FPU.
 

zir_blazer

Golden Member
Jun 6, 2013
1,239
537
136
Couldn't the current E-core-only Xeons like Sierra Forest be considered a Xeon Phi of sorts? No GPU-wannabe functionality, but they fit the definition of a sea of small cores. They also have socket compatibility with the P-core-only Xeons, which is something Xeon Phi apparently intended but didn't manage to do, in the sense that it used a different pinout and motherboards.
 

DavidC1

Golden Member
Dec 29, 2023
1,673
2,757
96
Couldn't the current E-core-only Xeons like Sierra Forest be considered a Xeon Phi of sorts? No GPU-wannabe functionality, but they fit the definition of a sea of small cores. They also have socket compatibility with the P-core-only Xeons, which is something Xeon Phi apparently intended but didn't manage to do, in the sense that it used a different pinout and motherboards.
Xeon Phi was more accelerator-oriented, hence the low clocks. It was also 1S only. It had HMC memory on-package, which is Betamax compared to HBM. They modified the core to support 4 threads and many other small details that would specifically boost HPC performance. The architect for the chip said the core had been so heavily modified that they didn't call it Silvermont. They touched things like the out-of-order buffers, so it was a redesign.

I don't know the figures for Knights Landing, but for the predecessor they said the HPC-specific optimizations improved performance three- to fivefold compared to a hypothetical untouched P55C.

Also, Silvermont was in no way comparable to the Broadwell core at the time. Skymont is not too far from Lion Cove and Golden Cove. Clearwater Forest is going to be, in many ways, a replacement for Granite Rapids.