
A DPU. How does that sound?

Originally posted by: Modelworks


Something has to process the instructions from the OS to the various chips on the board. Direct access to on-board chips is not available on the x86 platform. In the embedded world it is done all the time, but that is a totally different architecture. If you use CPU time to get data from your proposed DPU, then it would have to take less time than the CPU could do it for itself. Right now the CPU and the data it needs from storage are not a bottleneck for any application except file copying, and for that there are already controller cards.

Of course there is no direct access for programs, even in embedded systems. Since ARM is coming up strong and fast, it is possible with embedded Linux, for example, to have a fully pre-emptive OS. But the OS can do anything you want it to, as long as it knows a certain hardware feature exists and has the code to use it: just drivers and algorithms. That is what I mean, and Idontcare explained it perfectly: prefetchers.
Combine the OS's knowledge of when data is needed with a specialized core that takes care of the data, and things will speed up greatly. A one-stop caching system for both reads and writes will improve it further. See it like this: you need data on your HDD after you modify it.
The DPU gives you the modified data from the large local DRAM cache while this data is still being written to the HDD or SSD. I am sure it sounds familiar now, because that is exactly what the CPU does with its main memory and local on-die cache. But since those chunks of data are relatively small, the hardware can handle it. With the boatloads of data coming from the HDD or SSD, and the OS just servicing requests from programs, it is the OS that knows best what will be needed.
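The write-back behavior described above can be sketched in a few lines. This is a hypothetical illustration, not a real DPU API: a modified block is readable from a DRAM-backed cache immediately, while a background thread is still flushing it to the much slower HDD/SSD.

```python
# Hypothetical sketch of the idea above (not a real DPU API): a modified
# block is readable from a DRAM-backed cache immediately, while a background
# thread is still flushing it to the much slower HDD/SSD.
import threading
import time

class WriteBackCache:
    def __init__(self, backing_store):
        self.cache = {}               # block_id -> data ("large local DRAM cache")
        self.backing = backing_store  # the slow HDD/SSD
        self.lock = threading.Lock()

    def write(self, block_id, data):
        with self.lock:
            self.cache[block_id] = data   # visible to readers right away
        # flush to slow storage in the background
        threading.Thread(target=self._flush, args=(block_id, data)).start()

    def _flush(self, block_id, data):
        time.sleep(0.05)                  # simulate a slow disk write
        self.backing[block_id] = data

    def read(self, block_id):
        with self.lock:
            if block_id in self.cache:    # cache hit: no disk access at all
                return self.cache[block_id]
        return self.backing.get(block_id) # miss: fall back to storage

disk = {}
c = WriteBackCache(disk)
c.write(7, b"modified data")
print(c.read(7))   # served from cache even if the disk write is still pending
```

A real implementation would also track dirty bits and handle eviction; this only shows the read-during-flush behavior.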

Now of course the HDD uses its local onboard cache for this feature as well. As do some RAID controllers. As does the OS. All I am saying is: get rid of the separate caching systems and devise one central version.

Applications that need large amounts of data spend more time waiting for the cpu to process the data than they do reading it from storage.

We are coming to a part of PC history where specialized cores take over. When the GPU finally stops being scary, the processing of data will speed up greatly, while the read and write speeds of storage will make the performance gap between HDD/SSD and the CPU/GPU/memory combo even larger.

 
One must wonder how much effect compression can have on overall bandwidth.

To say it's dead technology? Better go check SQL Server 2008 (R2).
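For a rough sense of the bandwidth question, a quick back-of-the-envelope sketch. The payload and the 100 MB/s figure are invented for illustration; real ratios depend entirely on how compressible the data is.

```python
# Back-of-the-envelope sketch of what compression does to effective I/O
# bandwidth: if data compresses N:1, a link moving B bytes/s of compressed
# data delivers N*B bytes/s of useful payload.
import zlib

payload = b"AnandTech forum post about storage bandwidth " * 200
compressed = zlib.compress(payload)
ratio = len(payload) / len(compressed)

link_bandwidth_mb_s = 100                  # raw storage bandwidth (assumed)
effective = link_bandwidth_mb_s * ratio    # useful payload delivered per second

# This ignores the CPU cost of (de)compression, which is the real trade-off.
print(f"ratio {ratio:.1f}:1 -> effective {effective:.0f} MB/s")
```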
 
Originally posted by: William Gaatjes
Let's use a few million transistors to set up a device that can handle this. It would really improve things. Let the OS handle it on the software side, and use that special core to work out what to prefetch. Since there are spare cores as well, we can use some of their calculation power too. The number of cores will just keep growing. And a specially designed core for a certain task will always be faster, on the same process, than a general core.

I see, kinda like how Intel threw a million xtors at their PCU which serves as a sophisticated power-consumption control feedback loop to regulate clockspeeds and power-states.

If you can do that in the name of minimizing power-consumption so your performance/watt is improved then you should also be interested in throwing a few million xtors at integrating the IOP341 (or similar) controller onto the cpu-die (or at least under the IHS) with the prioritizing being better management of the data flows so that performance improves.

For Intel this would require the performance increase from doing this to be at least 2x the increase in power-consumption associated with those extra xtors operating plus any other ancillary power-consumption the activity generates within the socket.
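That 2x margin is easy to put into a toy break-even rule; the numbers below are made up purely for illustration.

```python
# Toy version of the break-even rule above: integrate the controller only if
# the performance gain is at least `margin` times the extra power it costs.
# All numbers are made up for illustration.
def worth_integrating(perf_gain_pct, power_increase_pct, margin=2.0):
    """True if the speedup clears the stated 2x power margin."""
    return perf_gain_pct >= margin * power_increase_pct

print(worth_integrating(10, 4))   # 10% faster for 4% more power -> True
print(worth_integrating(5, 4))    # 5% faster for 4% more power -> False
```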

I can see a high-level path to implementing it, and I really would be surprised if it hasn't been an idea in the idea jar at Intel/AMD for a while now, but somewhere someone has to come up with the relative likelihood of people being willing to pay extra for this feature (just as they did for turbo-clocking and SMT and black editions, etc) and I'm wondering if we just haven't seen it yet because there is little confidence in their being able to sell it for the kind of ROI that merits making the "I" years in advance of seeing any of the "RO".
 
I have a book somewhere, if I can find it, that came out last year by an engineer who worked for Altera. In it he describes what he thinks the future will be: a bunch of dedicated-function chips, with the CPU acting only as a traffic cop for the data. He uses the analogy of a cop standing in the middle of a four-way intersection: as a car (the program or data) approaches, he sends it to whatever lane (bus connected to a dedicated chip) it needs to go to. The CPU performs no actual program calculations. It is a lot more complex than that, but it resolves some of the issues with programming for SMP and also boosts execution; I think he said somewhere around 300%.
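A toy sketch of that traffic-cop model (the unit names and routing table are invented for illustration): the "CPU" computes nothing itself, it only waves each work item into the lane of the dedicated unit that handles it.

```python
# Toy model of the "traffic cop" architecture: the CPU routes work to
# dedicated units and performs no calculations itself. Unit names and the
# routing table are invented for illustration.
def gpu_unit(item):
    return f"GPU rendered {item}"

def dsp_unit(item):
    return f"DSP filtered {item}"

def crypto_unit(item):
    return f"crypto engine signed {item}"

# the four-way intersection: one lane (bus) per dedicated chip
ROUTES = {"frame": gpu_unit, "audio": dsp_unit, "packet": crypto_unit}

def traffic_cop(work_type, item):
    # the cop: look at the "car" and wave it into the right lane
    return ROUTES[work_type](item)

print(traffic_cop("frame", "scene1"))   # GPU rendered scene1
print(traffic_cop("audio", "clip7"))    # DSP filtered clip7
```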
 
Now sure, the performance of the RAID card is not as good as just going ramdisk. If I understand your argument, you are saying that until we see mass storage operating at the efficiency and speed of a ramdisk, there is room and opportunity for improvement, and I totally agree with that sentiment.
Speaking of, companies have been working on the tech for non-volatile RAM for decades now... they claim they are getting close to a major breakthrough, but that does not necessarily mean they will...
 
Originally posted by: Idontcare
Originally posted by: William Gaatjes
Let's use a few million transistors to set up a device that can handle this. It would really improve things. Let the OS handle it on the software side, and use that special core to work out what to prefetch. Since there are spare cores as well, we can use some of their calculation power too. The number of cores will just keep growing. And a specially designed core for a certain task will always be faster, on the same process, than a general core.

I see, kinda like how Intel threw a million xtors at their PCU which serves as a sophisticated power-consumption control feedback loop to regulate clockspeeds and power-states.

If you can do that in the name of minimizing power-consumption so your performance/watt is improved then you should also be interested in throwing a few million xtors at integrating the IOP341 (or similar) controller onto the cpu-die (or at least under the IHS) with the prioritizing being better management of the data flows so that performance improves.

For Intel this would require the performance increase from doing this to be at least 2x the increase in power-consumption associated with those extra xtors operating plus any other ancillary power-consumption the activity generates within the socket.

I can see a high-level path to implementing it, and I really would be surprised if it hasn't been an idea in the idea jar at Intel/AMD for a while now, but somewhere someone has to come up with the relative likelihood of people being willing to pay extra for this feature (just as they did for turbo-clocking and SMT and black editions, etc) and I'm wondering if we just haven't seen it yet because there is little confidence in their being able to sell it for the kind of ROI that merits making the "I" years in advance of seeing any of the "RO".


I know what you mean. The problem, as even Microsoft encountered when introducing new technologies with Vista a few years back, is that the media will burn down anything that does not bring an immediate benefit. Although obviously it is not that simple with Vista, since it also has real design flaws from a pure performance point of view. Understandable ones, though, and not design flaws from the point of view of the "financial partners": I am talking about the greatly increased protection of data through DRM technologies. But the kernel is modern, and thus beneficial.


I'm wondering if we just haven't seen it yet because there is little confidence in their being able to sell it for the kind of ROI that merits making the "I" years in advance of seeing any of the "RO".

I think it is because there are so many people in the loop: HDD manufacturers, motherboard OEMs, the chipset designers and the CPU designers,
the operating system developers, and not to forget the marketing department. And of course the cost of research and development. Although I think the open-source community would jump on this technology the second it is available. But then again, open source means less profit.


Originally posted by: Modelworks
I have a book somewhere, if I can find it, that came out last year by an engineer who worked for Altera. In it he describes what he thinks the future will be: a bunch of dedicated-function chips, with the CPU acting only as a traffic cop for the data. He uses the analogy of a cop standing in the middle of a four-way intersection: as a car (the program or data) approaches, he sends it to whatever lane (bus connected to a dedicated chip) it needs to go to. The CPU performs no actual program calculations. It is a lot more complex than that, but it resolves some of the issues with programming for SMP and also boosts execution; I think he said somewhere around 300%.

The CPU would be like Gary or Gayle in the Amiga back in the day 🙂.

EDIT: Do not get me wrong, I think what you describe is the way to go.




Originally posted by: taltamir
Speaking of, companies have been working on the tech for non-volatile RAM for decades now... they claim they are getting close to a major breakthrough, but that does not necessarily mean they will.

Yeah, there have been so many technologies, but nothing has been able to keep up with the price and speed of current DRAM technology.

EDIT :
I am still waiting for MRAM and memristor technologies. I think MRAM would be best for bulk storage, as an SSD, for now.


Right now flash is number one because it is a low-risk, established technology.
I agree with Idontcare: as long as it is not directly beneficial, the media will burn it down and the customer will not touch it.

 
Originally posted by: taltamir
Now sure, the performance of the RAID card is not as good as just going ramdisk. If I understand your argument, you are saying that until we see mass storage operating at the efficiency and speed of a ramdisk, there is room and opportunity for improvement, and I totally agree with that sentiment.
Speaking of, companies have been working on the tech for non-volatile RAM for decades now... they claim they are getting close to a major breakthrough, but that does not necessarily mean they will...

heh, one of the many projects I got to work on at TI...FRAM

It worked, and worked great, but got no traction with actual IC designers (besides Ramtron, which just sold discrete chips for niche applications because the density was so low), so it just sat on the shelf once we got it to full-production worthiness a few years ago. (We were the foundry for the FRAM that Ramtron sold/sells.)

Kind of like eDRAM: a great product with great performance, but it's taken ages for it to gain traction in mainstream high-volume commodity microprocessors. Hopefully with POWER7 we'll see the technology thoroughly vetted and gaining broader acceptance and use.
 
Originally posted by: Idontcare
Originally posted by: taltamir
Now sure, the performance of the RAID card is not as good as just going ramdisk. If I understand your argument, you are saying that until we see mass storage operating at the efficiency and speed of a ramdisk, there is room and opportunity for improvement, and I totally agree with that sentiment.
Speaking of, companies have been working on the tech for non-volatile RAM for decades now... they claim they are getting close to a major breakthrough, but that does not necessarily mean they will...

heh, one of the many projects I got to work on at TI...FRAM

It worked, and worked great, but got no traction with actual IC designers (besides Ramtron, which just sold discrete chips for niche applications because the density was so low), so it just sat on the shelf once we got it to full-production worthiness a few years ago. (We were the foundry for the FRAM that Ramtron sold/sells.)

Kind of like eDRAM: a great product with great performance, but it's taken ages for it to gain traction in mainstream high-volume commodity microprocessors. Hopefully with POWER7 we'll see the technology thoroughly vetted and gaining broader acceptance and use.


Wow...
That's nice work.

IBM and AMD are partners; I too hope for some amazing new trick up the silicon sleeve.
I know AMD has a license for Z-RAM, which is also a kind of embedded DRAM.
I do not know if they have already used such technology.



 
Originally posted by: William Gaatjes
Wow...
That's nice work.

Thanks, I thoroughly enjoyed working on that project, a real geeky science and engineering affair.

Originally posted by: William Gaatjes
IBM and AMD are partners; I too hope for some amazing new trick up the silicon sleeve.
I know AMD has a license for Z-RAM, which is also a kind of embedded DRAM.
I do not know if they have already used such technology.

Unfortunately Z-RAM ran into some parametric issues that basically precluded it from becoming a viable technology, and GF (and thus AMD) has dropped further efforts to commercialize it:

GlobalFoundries Outlines 22 nm Roadmap

Pellerin also said that GlobalFoundries is no longer pursuing the one-transistor ZRAM developed by Innovative Silicon Inc. (ISI, Lausanne, Switzerland), a capacitor-less design based on SOI substrates.

Instead, GlobalFoundries is working on a thyristor-based memory with T-RAM Semiconductor Inc. (Milpitas, Calif.). GlobalFoundries and T-RAM announced in mid-May that GlobalFoundries would co-develop 32 and 22 nm versions of the T-RAM, which is based on SOI technology, for low-power cache applications.

http://www.semiconductor.net/a...urce=title&rid=8242135

T-RAM is intriguing and does appear to be viable for commercialization; at least, the early vetting process has not found the kind of critically fatal flaws that were found when vetting Z-RAM.

(T-RAM is not to be confused with TT-RAM, which itself is a zram competitor)

Scaling Limits of Double-Gate and Surround-Gate Z-RAM Cells

We consider the scaling of the capacitorless single-transistor [zero-capacitor RAM (Z-RAM)] dynamic RAM (DRAM) cells having surround-gate and double-gate structures.

We find that the scaling is limited to the channel length of approximately 25 nm for both types of cells, which is somewhat more pessimistic than previously believed.

The mechanisms that are found to be of most importance in imposing the scaling limits are as follows: 1) short-channel effects; 2) quantum confinement of carriers in the body; and 3) band-to-band tunneling at the source/drain-to-body junctions. Like other DRAM cells, practical considerations such as the process variations in cell dimensions, random doping fluctuations, and single-event upsets are likely to remain as important scaling concerns for Z-RAM cells.

http://ieeexplore.ieee.org/sea...r.jsp?arnumber=4294187

(emphasis added by me in the above abstract)
 