
NV40, where are you????? Are you going to be called GeForce FX 2?

I saw those NV40 specs a while back, and I'm pretty sure they're bogus...

I mean, do you know how much eDRAM costs? Not to mention that there are only 2 or 3 manufacturers of it in the first place. Those specs are just too astronomical of a jump methinks.

Now on the other hand, if NVidia does pull it off and manage to offer the card at a reasonable price (at or below current top end prices), then, holy crap.
 
I don't think you can be sure of it, or call anything bogus or legit. Fact is, no one knows.
This "could be" totally false. Or, for the fearful types, it could be 100% accurate.


I agree though. It would be Holy Crap indeed.
 
Originally posted by: Insomniak
I mean, do you know how much eDRAM costs? Not to mention that there are only 2 or 3 manufacturers of it in the first place. Those specs are just too astronomical of a jump methinks.
Actually, not much. The GC has built-in DRAM on the Flipper chip. If nVidia were to use the type of RAM the GC uses, it could add that 16MB quite cheaply.
Now on the other hand, if NVidia does pull it off and manage to offer the card at a reasonable price (at or below current top end prices), then, holy crap.
As above, the GC has built-in RAM, and the theoretical maximum bandwidth between the Flipper and its memory is 10GB/s+. And we've all seen some of the stuff that the GC can push out.
 
Actually, not much. The GC has built-in DRAM on the Flipper chip. If nVidia were to use the type of RAM the GC uses, it could add that 16MB quite cheaply.

But it would be quite a waste, as even the peak bandwidth of the memory would still be 1/4th the bandwidth of the regular memory. Unless the on-chip RAM is running at astronomical speeds (which it probably won't, considering how cache runs right now), it's pretty much a waste of silicon.

And we've all seen some of the stuff that the GC can push out.

And what that actually has to do with the on-chip memory remains to be seen. Plus, comparing what a console can push out to what a PC can has always been a moot comparison.
 
If ATI did it, would you call it a waste then? No offense, just curious. And where did you get 1/4th from?
 
And can we also speculate that GPUs/VPUs will become more CPU-like? It's almost like having L2 cache in the core of the graphics processor... What kind of memory is used for L2 cache again?

From the looks of Doom 3 and Half-Life 2 bringing today's top hardware to a crawl, it's not out of the question that the hardware makers would feel the "market need" to go hardcore on the next-gen GPUs/VPUs. People were used to getting well over 100fps in most games easily. Now it is questionable whether the games can be played with any real enjoyment at all. The software companies, including Microsoft of course, are probably pushing hardware vendors to pick up the pace to keep up with DX9/9.1 and so on. Some heavy-duty hardware is needed right now to push and shade these super-high-powered games.

IMHO
 
Originally posted by: gorillaman
If ATI did it, would you call it a waste then? No offense, just curious. And where did you get 1/4th from?

It doesn't matter who does it; if either company did, it would most likely drive the cost up immensely and offer little to no performance increase. And I get 1/4th because the peak of the memory in the GC chip is 10GB/s, while the memory that will be used in the R420/NV40 will run anywhere from 30-45GB/s.

What kind of memory is used for L2 cache again?

They use SRAM for the cache; it really only comes in small densities and has adequate bandwidth.
 
Originally posted by: Jeff7181
Originally posted by: videoclone
🙂 I think Nvidia will be making something fast, but I don't think it will be based on a new fab. I think they will stick with 0.13 micron

!!!! It's more mature, and they're not going to make the same mistake twice!!!!! And have their new core delayed a year due to poor fabrication maturity and yield.

The core speed will be 550-600, but I don't think anything higher than that. They will stick with an extravagant cooling solution on the beast, and I wouldn't be surprised if the card ends up being even bigger than the old GF FX's. They may also change the name back to GeForce 5 or GeForce FX2.

You're one of those guys whose voice gets higher and who can't sit still when talking about computers, aren't you?


I am a computeraholic and I'm proud of it 🙂 ... why are you even here? This is the place for this kind of talk!!!!!!

You're in the wrong place if you find it strange...... maybe you should go back to nark land! I know you're asking for a dual channeling!! Big time...
 
Originally posted by: reever
Originally posted by: gorillaman
If ATI did it, would you call it a waste then? No offense, just curious. And where did you get 1/4th from?

It doesn't matter who does it; if either company did, it would most likely drive the cost up immensely and offer little to no performance increase. And I get 1/4th because the peak of the memory in the GC chip is 10GB/s, while the memory that will be used in the R420/NV40 will run anywhere from 30-45GB/s.

What kind of memory is used for L2 cache again?

They use SRAM for the cache; it really only comes in small densities and has adequate bandwidth.

Ahh, SRAM. Thanks. I'm not going to pretend to understand how the architecture operates, but why, instead of this eDRAM, wouldn't they use a small amount of this much faster SRAM? Like 256K or even less? Is it more cost-prohibitive than 16MB of eDRAM?

This is of course if any of the NV40 specs are true.

 
As far as eDRAM goes, all the current chips have some type of embedded RAM that they use for a texture cache (at least). The hype around eDRAM a few years back was about using it for a framebuffer, which back when it was a hot topic would have screamed compared to the competition. You could argue it still would; however-

1600x1200x32 with 4x AA needs roughly 29.3MB. If you want an FP buffer and 8x AA, you could be looking at close to 120MB of eDRAM to use as a framebuffer. That is not going to happen even on .09u from an IMR, particularly not one with the amount of feature support rumored to be in the upcoming parts.
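In Python, a minimal sketch of that arithmetic (assuming 4 bytes per pixel for the 32-bit buffer, 8 bytes for a 64-bit FP buffer, one full-resolution buffer per AA sample, and ignoring Z and compression):

```python
# Back-of-envelope framebuffer sizing for the figures above.
def framebuffer_mib(width, height, bytes_per_pixel, aa_samples):
    """Raw framebuffer size in MiB: one full-resolution buffer per AA sample."""
    return width * height * bytes_per_pixel * aa_samples / 2**20

print(framebuffer_mib(1600, 1200, 4, 4))  # 29.296875 -> "~29.3MB" (32-bit, 4x AA)
print(framebuffer_mib(1600, 1200, 8, 8))  # 117.1875  -> "close to 120MB" (FP, 8x AA)
```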

An increased amount of on-chip RAM is likely. If it is fast enough, with low enough latency, it could mask the performance hit from AF as long as the core was busy doing something, anything else. It could also help work around an imbalance if a part's pixel pipe configuration exceeded 8 reads per clock (be it 12x1, 8x2 or even 4x3), although without a significant amount it wouldn't be enough to get close to the theoretical rate.
 
Originally posted by: gorillaman
OK, here is what I have come up with so far..

Here are the nVidia NV40 Specs:

- 0.09u process
- 300-350 Million Transistors
- 750-800 MHz Core clock speed
- 16 Mb Embedded DRAM (134 million transistors)
- 1.4 GHz 256-512 Mb DDR-II Memory
- 8 Pixel Rendering Pipelines (4 texels each)
- 16 Vertex Shader Engines
- 204.8 GB/sec Bandwidth (eDRAM)
- 44.8 GB/sec Bandwidth (DDR-II)
- 25.6 GigaTexels per Second
- 3 Billion Vertices per Second
- DirectX 9.1 (or even DirectX 10) features


To compare, here are the nVidia NV30 Specs:
- 0.13u process
- 500 MHz Core clock speed
- 500 (1Ghz) MHz 128-256 Mb DDR-II
- 125 million transistors
- 8 pixel pipelines with 2 texturing units each
- 16 texture layers per rendering pass
- 3.2 gigapixels per second
- 6.4 gigatexels per second
- 360-400 million vertices per second
- 16 gigabytes/sec

- DirectX 9.0+ features (Pixel Shader 2.0+, Vertex Shader 2.0+, etc.)
- 128 and 64-bit Floating-Point Pixel Processing - Quad Vertex Shader Engine
- Improved FSAA (Programmable Grid AA or Adaptive AA)
- Improved HSR (Lightspeed Memory Architecture III)
- AGP 3.0 (AGP 8x)

Note: The data is not official and shouldn't be treated as such.

^ Bwahahahahahaahaha! Those were a "guesstimate" made on some board by a hardware guru named ChairmanSteve well over a year ago (and used in many publications too).

I'll dig up the link from where it was originally posted, but that has NOTHING to do with Nvidia (this may take a while).

Also, just look at the features:

.09 - we already know ATI and Nvidia will be using .13.
eDRAM - not likely.
750-800MHz core speed - that used to be incredibly optimistic; after the FX cards, something like 500-650MHz is probably doable for the core.
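For what it's worth, whoever dreamed those numbers up at least did the arithmetic. A quick sketch showing the rumored figures hang together (the 256-bit external bus and 2048-bit internal eDRAM path are assumptions back-solved from the bandwidth lines, not anything confirmed):

```python
# Sanity-checking the rumored NV40 numbers: bandwidth = clock * bus width / 8,
# fill rate = clock * pipelines * texels per pipeline.
core_hz = 800e6    # top of the rumored 750-800 MHz range
ddr2_hz = 1400e6   # "1.4 GHz" effective DDR-II clock

print(ddr2_hz * 256 / 8 / 1e9)   # 44.8  -> GB/s, matches the DDR-II line
print(core_hz * 2048 / 8 / 1e9)  # 204.8 -> GB/s, matches the eDRAM line
print(core_hz * 8 * 4 / 1e9)     # 25.6  -> GTexels/s, 8 pipes x 4 texels each
```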
 
Originally posted by: reever
Originally posted by: gorillaman
Reever seems to have "Conspiracy Theory Syndrome".... LOL.. Everyone lies to him and he knows it without a doubt. Let him be the low IQ on the totem pole, right where he constantly puts himself, because he can't control what he types. It's the metacarpal version of Tourette's syndrome. Don't pay him any mind, he doesn't know any better. And I know that I don't feel like making time for things like this. He's really a good kid, but his knuckles are red and swollen from smashing his fingers with a hard rubber mallet whenever they act up. I hope he seeks treatment.

It's so much easier to make personal attacks than it is to discuss technology, right?

You've hit the nail on the head there my friend. Sad but true. :beer:
 
Originally posted by: jiffylube1024
Originally posted by: gorillaman
OK, here is what I have come up with so far..

Here are the nVidia NV40 Specs:

- 0.09u process
- 300-350 Million Transistors
- 750-800 MHz Core clock speed
- 16 Mb Embedded DRAM (134 million transistors)
- 1.4 GHz 256-512 Mb DDR-II Memory
- 8 Pixel Rendering Pipelines (4 texels each)
- 16 Vertex Shader Engines
- 204.8 GB/sec Bandwidth (eDRAM)
- 44.8 GB/sec Bandwidth (DDR-II)
- 25.6 GigaTexels per Second
- 3 Billion Vertices per Second
- DirectX 9.1 (or even DirectX 10) features


To compare, here are the nVidia NV30 Specs:
- 0.13u process
- 500 MHz Core clock speed
- 500 (1Ghz) MHz 128-256 Mb DDR-II
- 125 million transistors
- 8 pixel pipelines with 2 texturing units each
- 16 texture layers per rendering pass
- 3.2 gigapixels per second
- 6.4 gigatexels per second
- 360-400 million vertices per second
- 16 gigabytes/sec

- DirectX 9.0+ features (Pixel Shader 2.0+, Vertex Shader 2.0+, etc.)
- 128 and 64-bit Floating-Point Pixel Processing - Quad Vertex Shader Engine
- Improved FSAA (Programmable Grid AA or Adaptive AA)
- Improved HSR (Lightspeed Memory Architecture III)
- AGP 3.0 (AGP 8x)

Note: The data is not official and shouldn't be treated as such.

^ Bwahahahahahaahaha! Those were a "guesstimate" made on some board by a hardware guru named ChairmanSteve well over a year ago (and used in many publications too).

I'll dig up the link from where it was originally posted, but that has NOTHING to do with Nvidia (this may take a while).

Also, just look at the features:

.09 - we already know ATI and Nvidia will be using .13.
eDRAM - not likely.
750-800MHz core speed - that used to be incredibly optimistic; after the FX cards, something like 500-650MHz is probably doable for the core.

It has pretty much no chance of being 0.09, since even Intel has only just switched to 0.09.
Normally processors are ahead of graphics cards in terms of process technology; also, graphics cards tend to make smaller jumps: 0.18 > 0.15 > 0.13, while processors have gone 0.18 > 0.13 > 0.09.
The next logical step (and the one ATi is supposedly going to) is 0.11 (the next step down, and one of the RAM processes, IIRC). This seems more probable than 0.09, which is a two-step jump (something that doesn't seem to have happened in graphics cards recently).

Also, nVidia is unlikely to move to 0.09 a year after going to 0.13, and some of their lineup is still 0.15 IIRC (5200?).
Using a more mature process (like ATi did by continuing with 0.15) would also make more sense.
And 300 million transistors would be extremely unlikely; the yield of a 300M-transistor design on a new process at the required clock speeds would probably be 1%. It's just not going to happen for a long while.
DDR-II at 1GHz+ is likely and achievable, but I expect we will stick with 128MB as the "base" RAM on cards, not 512MB versions just yet (although I may well be wrong, I don't see companies seeing a need for more RAM, at more cost and less profit).
 
Originally posted by: reever
But it would be quite a waste, as even the peak bandwidth of the memory would still be 1/4th the bandwidth of the regular memory. Unless the on-chip RAM is running at astronomical speeds (which it probably won't, considering how cache runs right now), it's pretty much a waste of silicon.
Assuming the chip ran at 500-600MHz, running on-chip cache at that speed wouldn't be much of a problem, assuming they could integrate it without a huge increase in voltage and heat.


And what that actually has to do with the on-chip memory remains to be seen. Plus, comparing what a console can push out to what a PC can has always been a moot comparison.
What it has to do with it is this: more on-chip memory can help gfx chips in a very big way. And any other chip, for that matter.
Even if it wasn't 16MB, even if it was only 512KB, it would do wonders for the chip.

If they considered using the 1T-SRAM that the Flipper uses, then it would be quite cheap.

It doesn't matter who does it; if either company did, it would most likely drive the cost up immensely and offer little to no performance increase.
If either company did do it, it would mean quite a performance increase compared to the same chip with no on-chip memory.
 
Assuming the chip ran at 500-600MHz, running on-chip cache at that speed wouldn't be much of a problem, assuming they could integrate it without a huge increase in voltage and heat.

Cache on processors runs at core speed and only gives 20-25GB/s of bandwidth; main video memory would still have higher bandwidth. If you need more bandwidth, you would have to increase the bus width and associativity of the cache, which would make costs skyrocket and prove just about impossible to manufacture. Intel is the master at making cache, and not even they can do it effectively.


more on-chip memory can help gfx chips in a very big way. And any other chip, for that matter. Even if it wasn't 16MB, even if it was only 512KB, it would do wonders for the chip.

How would it help? What would the cache hold? Framebuffer, textures, instructions? If it held anything but instructions, you would need at least 16MB of memory for it to affect anything; there is no point in holding 512K of textures or framebuffer.
 
Originally posted by: reever
Cache on processors runs at core speed and only gives 20-25GB/s of bandwidth; main video memory would still have higher bandwidth. If you need more bandwidth, you would have to increase the bus width and associativity of the cache, which would make costs skyrocket and prove just about impossible to manufacture. Intel is the master at making cache, and not even they can do it effectively.
Only 25GB/s. Well, ah, heck me, as if that number's small!
As I said, by using the right type of SRAM, it could be done relatively cheaply.
How would it help? What would the cache hold? Framebuffer, textures, instructions? If it held anything but instructions, you would need at least 16MB of memory for it to affect anything; there is no point in holding 512K of textures or framebuffer.
You don't seem to fully understand the nature of cache, do you?
Can the CPU hold the entire OS in its cache? No, it can't, so it transfers the parts it doesn't need to main memory. And that's what'll happen on gfx chips, if they use cache.
Look at the GC (again). The Flipper has 3MB of cache on it, yet it's pumping higher resolutions and texture detail than many PS2 and Xbox games, which supposedly use "traditional" gfx chips. The GC chucks around very high-res textures and, with the aid of S3TC, can do it successfully, thanks to its on-chip cache.
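To put a rough number on the S3TC point: DXT1 packs a 4x4 block of texels into 8 bytes (4 bits per texel), so a small on-chip texture cache holds several times its size in uncompressed-equivalent texture data. A sketch, assuming 32-bit source texels and ignoring mipmap and alignment overhead:

```python
# Effective capacity of a compressed on-chip texture cache.
def effective_capacity_mb(cache_mb, source_bits=32, compressed_bits=4):
    """Uncompressed-equivalent texture data the cache can hold."""
    return cache_mb * source_bits / compressed_bits

print(effective_capacity_mb(1.0))  # 8.0 -> Flipper's ~1MB texture cache acts like ~8MB
```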

 
Only 25GB/s. Well, ah, heck me, as if that number's small!
As I said, by using the right type of SRAM, it could be done relatively cheaply.

27.2GB/s - The memory bandwidth of a GF FX 5900U 256MB card.
21.8GB/s - The memory bandwidth of a Radeon 9800Pro 128MB card.

25GB/s is small. These are raw figures, excluding any kind of compression (which may be something like 4:1 compression ratio) allowing 80GB/s+ theoretical memory bandwidth. So yes, 25GB/s is small.
RAM on a computer can offer maybe 6.4GB/s max (dual channel DDR400) IIRC, compared to:
At 1.5GHz, the Pentium 4's L2 cache offers 48GB/s of throughput, while a theoretical 1.5GHz Pentium III would only offer 24GB/s of available bandwidth.
Remember that the P4 is running at 3 times the speed of current graphics processors, so about 16GB/s unless they increased the data path.
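Those throughput figures fall straight out of clock speed times data-path width. A sketch (the path widths are back-solved from the numbers quoted above, not taken from datasheets):

```python
# Cache throughput: clock (GHz) * data path (bits) / 8 = GB/s.
def cache_gbs(clock_ghz, path_bits):
    return clock_ghz * path_bits / 8

print(cache_gbs(1.5, 256))  # 48.0 -> 1.5GHz P4 L2, 256-bit path
print(cache_gbs(1.5, 128))  # 24.0 -> hypothetical 1.5GHz P3, 128-bit path
print(cache_gbs(0.5, 256))  # 16.0 -> the same P4-style path at ~500MHz GPU clocks
```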

I don't see why you need "cache"-type RAM on a thing that already has 25GB/s of bandwidth from its main memory. The Flipper on the Gamecube is clocked at 162MHz:
GPU - "Flipper" (system LSI)
Manufacturing Process: 0.18 microns NEC Embedded DRAM Process
Clock Frequency: 162 MHz
Embedded Frame Buffer/Z Buffer: Approx. 2 MB, Sustainable Latency: 6.2 ns (1T-SRAM)
Embedded Texture Cache: Approx. 1 MB, Sustainable Latency: 6.2 ns (1T-SRAM)
Texture Read Bandwidth: 10.4 GB/second (Peak)
Main Memory Bandwidth: 2.6 GB/second (Peak)

- Unified memory architecture, with 64MB shared between CPU and graphics tasks. Memory bandwidth of 6.4GB/second. 125 million polygons per second.
Xbox specs. Less bandwidth, at 6.4GB/s, versus the 10.4GB/s peak for the Gamecube.
Can't be bothered with PS2.

And consoles are different from PCs anyway, as they are engineered for gaming and nothing else, which is why an Xbox can put out fairly good graphics despite essentially being a 733MHz P3 with a GF3 graphics card.
 
OK, ATI will try 16x1; how will they fill all those pipelines? Wouldn't that take a massive amount of bandwidth?

I just remember reading from Anand that more is not always better, but sometimes just more.
 
Originally posted by: gorillaman
And can we also speculate that GPUs/VPUs will become more CPU-like? It's almost like having L2 cache in the core of the graphics processor... What kind of memory is used for L2 cache again?

From the looks of Doom 3 and Half-Life 2 bringing today's top hardware to a crawl, it's not out of the question that the hardware makers would feel the "market need" to go hardcore on the next-gen GPUs/VPUs. People were used to getting well over 100fps in most games easily. Now it is questionable whether the games can be played with any real enjoyment at all. The software companies, including Microsoft of course, are probably pushing hardware vendors to pick up the pace to keep up with DX9/9.1 and so on. Some heavy-duty hardware is needed right now to push and shade these super-high-powered games.

IMHO


Well, that's been the trend of late....we're basically moving to multiprocessor systems....

The GPU is much like a self-contained little computing system (its own core, memory) that works solely on graphics... I think it's only a matter of time until we see APUs (Audio Processing Units) that are similar to GPUs in construction but are optimized for detailed, highly complex sound creation - a lot of sound cards already take some of the work away from the CPU.

The hardware IS getting more efficient, but at the same time we're splitting up system work among several different processing cores. That's how we've been pulling off these quantum leaps in graphical fidelity over the past few years.

 
By using the right type of SRAM, it could be done relatively cheaply.

And still, it will not have better bandwidth than main video memory. I'm sure Intel and AMD would love to hear your insights on how to make their cache run faster and cheaper.

Can the CPU hold the entire OS in its cache? No, it can't, so it transfers the parts it doesn't need to main memory. And that's what'll happen on gfx chips, if they use cache.

You can't directly make a comparison between GPUs and CPUs. GPUs would get zero benefit from anything like an L1 or L2 cache; instructions are already small enough to be streamed through the GPU without anything left over, not to mention that GPUs have an enormous number of normal and temp registers compared to what CPUs have, which would make any sort of instruction cache overkill.

 
Originally posted by: BoomAM
You are too stubborn to argue with.
We don't even know if the NV40 will have cache yet.

No, he's making sense. Gamecube/Xbox/PS2 are totally different cases because they use such small amounts of texture data, etc. (usually 10MB max, often much less). PC games often use >50MB for textures. What purpose would a small cache on a GPU like that serve?

The only way it would be effective would be like in ChairmanSteve's example - something like 8-16MB of ultra-high-bandwidth eDRAM, which would unfortunately cost a fortune. (I'm sure he meant 16MB, not Mb (megabits), because he says 134 million transistors, which is way too many for only 2MB (=16Mb).)
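The transistor count is actually the giveaway here: eDRAM uses a one-transistor-one-capacitor cell, roughly one transistor per bit. A quick check (assuming a bare 1T1C array with no overhead):

```python
# ~1 transistor per bit for eDRAM, so the count reveals which unit was meant.
BITS_PER_MB = 8 * 2**20

print(16 * BITS_PER_MB / 1e6)  # ~134.2 -> 16MB lines up with "134 million transistors"
print(2 * BITS_PER_MB / 1e6)   # ~16.8  -> 2MB (=16Mb) would need only ~17M transistors
```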

For the time being, sticking to the "conventional" GPU memory type (high-speed DDR on a 256-bit bus) seems to make the most sense for the gigantic texture swaps required on the PC.
 