Is 25 nm approaching theoretical technical limits?

Discussion in 'Memory and Storage' started by RhoXS, Feb 12, 2011.

  1. RhoXS

    RhoXS Member

    Joined:
    Aug 14, 2010
    Messages:
    122
    Likes Received:
    0
Normally, as I perceive it, a die shrink is a win-win for the OEM and the consumer. The OEM makes a lot more product from the same resources and the consumer gets better performance. At least that is the way it has always seemed to happen.

Now, the flash memory manufacturing process reduction from 34 nm to 25 nm seems to have resulted in slower speeds. OCZ stated: "One effect of the die shrink ... the NAND is not as robust as the previous generation ... This made necessary a reduction in the rated P/E cycles." Does this imply that this latest shrink has exceeded theoretical limits and was only done because of the huge savings in the manufacturing process? Are we unlikely to see another die-shrink iteration unless an entirely new technology is implemented?
     
  2. Tsavo

    Tsavo Platinum Member

    Joined:
    Sep 29, 2009
    Messages:
    2,618
    Likes Received:
    0
No, all this means is that OCZ doesn't have enough money to fully vet a new technology. Intel spends more on toilet paper than OCZ spends in total, so the end isn't nigh in terms of 25nm flash.

Very large companies will put very large amounts of cash toward perfecting a new technology. Other companies, very much not so. Look at how Intel handled their Sandy Bridge chipset and how much money they spent verifying it, only to release a product that has errors and faces a $1B recall.

    The cost of playing in this field only goes up every year and we'll find that there are fewer and fewer companies able to front those costs as time goes by.

    Look at the firestorm of 25nm vs 34nm flash on nearly every tech site concerning OCZ. They (OCZ) are reacting in the manner that they can afford to...nothing else.
     
  3. Mark R

    Mark R Diamond Member

    Joined:
    Oct 9, 1999
    Messages:
    8,496
    Likes Received:
    0
As flash cells are shrunk, they become less robust. This is a fundamental feature of the technology. The overall volume of the cell becomes smaller, so fewer electrons can be stored in it (the signal picked up by the sense electronics is weaker and less distinct, so you get a higher error rate), and the insulating barriers around the cell must be made thinner to save space, allowing the electrons to leak out of the cell more easily (reducing power-off data retention time). The thinner insulation also wears out more quickly (reducing the number of program/erase cycles).
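
To put rough numbers on the "fewer electrons" point, here is a toy Python sketch. The ~1000-electron figure at 90nm and the simple area-scaling assumption are illustrative ballparks of my own, not vendor data:

```python
# Rough back-of-the-envelope estimate of stored electrons vs. process node.
# Illustrative assumption: the charge a floating gate holds scales roughly
# with its area, and an MLC part must split that charge among 4 levels.

ELECTRONS_AT_90NM = 1000  # assumed ballpark for a ~90nm-class cell

def electrons_per_cell(node_nm, ref_nm=90, ref_electrons=ELECTRONS_AT_90NM):
    """Charge ~ gate area ~ (feature size)^2 -- a crude scaling model."""
    return ref_electrons * (node_nm / ref_nm) ** 2

for node in (90, 50, 34, 25, 18):
    n = electrons_per_cell(node)
    # With 4 MLC levels, the margin between adjacent levels shrinks too.
    print(f"{node:3d} nm: ~{n:5.0f} electrons/cell, "
          f"~{n / 4:4.0f} electrons between MLC levels")
```

Under those assumptions, an 18nm cell is down to a few tens of electrons between adjacent MLC levels, which is why the sensed signal gets so noisy.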

It's difficult to define a 'fundamental' limit for flash, because it may be possible to work around poor performance, and as-yet-unknown manufacturing techniques and semiconductor materials may be developed. However, it has been suggested in the scientific literature that 18-22 nm is the realistic limit. Beyond that, the performance, reliability, and lifespan of the flash would be too poor, no matter how much wear levelling is applied or how sophisticated the ECC is.

Enterprise-grade SSD flash will need higher specifications than flash for toy cameras. Enterprise applications are unlikely to tolerate 18 nm flash with 100 write cycles and one lost sector per 100 GB of data stored. However, this probably would be acceptable for toys or throwaway devices.

Of course, there is some scope for new advanced materials (in the same way that special materials like SOI and high-k brought new life to CPU processes), but the research on this isn't publicly available and, in any case, is unlikely to push the limit much further - maybe to 16 nm at the outside.
     
  4. Emulex

    Emulex Diamond Member

    Joined:
    Jan 28, 2001
    Messages:
    9,759
    Likes Received:
    0
There are some extremely high-tech methods of invalidating lines and chip sparing that could apply to an enterprise array. The idea would be to use the SSD more intelligently: tiered storage with a dedicated controller (or a software controller).

The OS and drivers need to be more aware of the flash memory, IMO. That is not the case now, except with high-end SAN storage, and most of those are just doing read caching or sticking to SLC.

Keep in mind the demand for 25nm is insane. It's possible OCZ is buying the dregs of the stock and making it work for maximum profit. Micron/Intel have their stock on lockdown for enterprise.
     
  5. ksec

    ksec Member

    Joined:
    Mar 5, 2010
    Messages:
    147
    Likes Received:
    0
Well, Toshiba managed to prove us wrong with 16nm NAND. But it means nothing if it only has 1,000 read/write cycles.

Error rates are also rising with each node, Moore's Law style, though clever controller and software techniques will mitigate some of those shortcomings.
     
  6. Hacp

    Hacp Lifer

    Joined:
    Jun 8, 2005
    Messages:
    13,926
    Likes Received:
    0
This is where people get confused. Die shrinks do not automatically mean better performance. In CPUs, die shrinks translate to better performance because more transistors fit in the same area. For memory, die shrinks generally mean less performance. This has to do both with the oxide walls around the floating gate degrading faster because they are thinner, and with quantum tunneling through those thinner walls.
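
As a toy illustration of how steeply tunneling grows as the oxide thins, here is a crude WKB-style estimate in Python. Real NAND leakage is dominated by other mechanisms (trap-assisted and Fowler-Nordheim conduction), and the barrier height is a textbook simplification, so treat this purely as "thinner barrier, exponentially more leakage":

```python
import math

# Toy WKB estimate of direct-tunneling probability through the tunnel oxide.
# Not the actual dominant leakage mechanism in real flash -- just a sketch
# of how exponentially sensitive tunneling is to barrier thickness.

HBAR = 1.054571e-34   # J*s
M_E = 9.109383e-31    # electron mass, kg
EV = 1.602177e-19     # J per eV
BARRIER_EV = 3.1      # assumed Si/SiO2 barrier height

def tunnel_probability(thickness_nm):
    kappa = math.sqrt(2 * M_E * BARRIER_EV * EV) / HBAR  # decay const, 1/m
    return math.exp(-2 * kappa * thickness_nm * 1e-9)

p8, p6 = tunnel_probability(8.0), tunnel_probability(6.0)
print(f"8 nm oxide: {p8:.1e}   6 nm oxide: {p6:.1e}   ratio: {p6 / p8:.1e}")
```

Shaving just 2 nm off the barrier raises the tunneling probability by many orders of magnitude in this model, which is the qualitative point.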
     
  7. taltamir

    taltamir Lifer

    Joined:
    Mar 21, 2004
    Messages:
    13,574
    Likes Received:
    0
They don't become slower, and not "less good" either; they become less reliable, and there is a difference. It must be compensated for, and compensating for it is possible; it just means the benefit from the transition is not as great as it would have been otherwise.

There are still benefits to the transition, overall. But we are indeed reaching the limit of the current tech, and it is possible that floating-gate NAND will not go smaller than 25nm.

However, there are 4 or 5 other technologies in the works that can go much further. I would like to point out that every 2 or 3 generations, CPU technology has shifted significantly. Things like high-k metal gates, immersion lithography (refracting the laser through water), switching to ever narrower bands of light, and other improvements have been necessary to fuel ever-improving miniaturization... at ever-increasing dollar cost. Likewise, spinning disks have seen tremendous shifts and fundamental technology changes to keep them viable, and TV display technology is constantly shifting from one tech to another that is nothing alike in how it works, but similar in how it appears to the end consumer. In the end, customers may never even realize that at some point their SSD's floating-gate NAND was replaced with phase-change memory, which will in turn be replaced with the next best thing.

Every single aspect of computer technology is actually reaching its current EOL. HDDs cannot be further miniaturized with current tech, SSDs cannot be, CPUs, etc. What has always happened is that new tech was adopted, and in all the above cases there are several new technologies in late development that will replace current tech for next-gen needs. People cry in alarm over SSD tech for some reason, even though we have seen it all before, and even though we have alternatives in development.
     
    #7 taltamir, Feb 13, 2011
    Last edited: Feb 13, 2011
  8. adamantinepiggy

    Joined:
    May 29, 2010
    Messages:
    174
    Likes Received:
    0
Upfront FYI: As I always state, I am not an engineer, simply an old IT guy who supports the engineers within an SSD R&D department, so I might not have it technically perfect. Just putting out the gist of what I think I know.

With the die shrink from 34nm to 25nm you get an increase in density (more memory cells per area), but you lose reliability per cell. This exacerbates current controller chip/firmware weaknesses more than it hurts the NAND itself in regards to SSD operation.

Compared to a hard drive, an SSD's "individual" cell bit is actually slower to flip between 1 and 0 than the magnetic bit of a typical hard drive. What makes an SSD fast is that, unlike a hard drive, it can flip the states of many cells at once, whereas a hard drive can only flip the bit currently under the actuator heads. The same applies to reading data. Compounded with the latency of getting the heads to the proper position on the hard drive, an SSD ends up being much faster at writing real data, which is always more than an individual bit. This reading/writing parallelism is what allows SSDs to be speedy compared to hard drives.

The more NAND on an SSD, the more parallelism can be performed at once. Unfortunately, the only way to cram on more NAND is to shrink each cell so more can fit in a given space. One of the problems with these NAND die shrinks is that the bit-flipping write "process" on the smaller NAND is slower. It is not necessarily slower "physically"; rather, the tests to ensure that the cell did indeed change state properly are more complicated and involved, which slows down the overall process. It is also more complicated to read existing data on smaller NAND cells, but reading is not hampered to anything approaching the degree writing is.
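
A rough sketch of the pulse-and-verify loop being described (ISPP-style programming); the timings and step sizes here are invented, but they show why a tighter threshold window at a smaller node means more iterations and a slower overall write:

```python
# Sketch of an ISPP-style (incremental step pulse) program-verify loop.
# Timings and voltages are made up for illustration; the point is that
# tighter verify margins -> smaller steps -> more pulse/verify iterations.

def program_cell(target_vt, step_v, t_pulse_us=10, t_verify_us=25):
    vt, elapsed = 0.0, 0.0
    while vt < target_vt:            # pulse, then verify, until Vt lands
        vt += step_v                 # one program pulse nudges Vt up a step
        elapsed += t_pulse_us + t_verify_us
    return elapsed

# A smaller node needs a finer step to land inside a narrower Vt window:
print("coarse step (older node):  ", program_cell(2.0, step_v=0.5), "us")
print("fine step (smaller node):  ", program_cell(2.0, step_v=0.125), "us")
```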

Even though each per-cell write is slower, the larger physical amount of NAND allows more channels to be active at once and thus more parallelism (assuming the controller chip can handle it). So while the smaller NAND is individually worse in write speed, the greater parallelism capability still makes the overall SSD package faster WITH LARGER AMOUNTS OF NAND (small-capacity SSDs need not apply).
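
A crude throughput model of that trade-off; all timings here are invented, only the ratios matter:

```python
# Toy model: slower per-page programming at 25nm vs. more channels working
# in parallel. Page size and program times are illustrative, not datasheet
# values.

def seq_write_mbps(channels, page_kb=8, t_prog_us=1300):
    """Aggregate MB/s if every channel programs one page concurrently."""
    return channels * (page_kb / 1024) / (t_prog_us / 1e6)

print("34nm-ish, 4 ch:", round(seq_write_mbps(4, t_prog_us=900)), "MB/s")
print("25nm-ish, 4 ch:", round(seq_write_mbps(4, t_prog_us=1300)), "MB/s")
print("25nm-ish, 8 ch:", round(seq_write_mbps(8, t_prog_us=1300)), "MB/s")
```

The smaller node loses at equal channel counts but wins once the bigger drive brings more channels into play, which is piggy's point about larger capacities.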

The fact that each cell of the smaller NAND is also less durable can be compensated for by smarter controllers and well-balanced overprovisioning. Yes, if you target SSD weaknesses with specific tests, the smaller NAND cells will die much faster (like filling a drive to 99.9% and running burn tests on the remaining 0.1%). There's no way around that, as cell life is a physical limitation. While larger-celled NAND might survive this type of testing longer, it's largely a moot point, since SSDs are not designed to work this way, whether with 34nm or 25nm NAND.

As such, it might not matter much anyway about wearing NAND cells out from too much use. Most people notice that SSDs seem pretty binary in failure nature anyway. It's the controller/firmware interaction that seems to kill SSDs more often than any worries about long-term NAND burnout. You will probably see even more controller/firmware physical and software failures (AKA bricked SSDs) compared to pure NAND burnout failures going forward. NAND physical and electrical limitations are pretty well set. It's making controllers, and especially firmware, that can compensate for the smaller cells' disadvantages that's the hard part.

In addition, while parallelism with more physical memory allows more NAND to be operated upon at once, the load this places on the controller chip gets much higher (I don't know if it's exponential or not, but it's a lot higher as parallelism goes up). Issues with the controller/firmware getting the SSD into an unrecoverable state (basically a bricked SSD) are already far more common than long-term NAND burnout. The higher loads make the controller chip hotter, draw more power through the power circuitry, etc., and make the likelihood of controller/firmware failure even higher. These controllers are becoming like the GPUs of video cards, with increasing functionality raising heat by a ton; however, unlike a video card, the tiny form factor limits what you can do with the heat.

You'll notice that some SSD makers already add a thermal pad between the controller and the case of 2.5" SSDs. It gets much worse with the increased loads caused by the increased error checking of smaller-die NAND, and also from newly added functionality built into the controller.

As always, there are all sorts of tricks you can do to reduce heat through hardware and software; however, if you do too many of them, you end up slowing the controller down substantially, to the point where the speed advantage of higher parallelism is overcome by having to throttle the controller for simple reliability. Heat doesn't just affect the controller either; it spreads to other components and affects their reliability as well, especially given the required form factors. It's not like you can just add a heat sink or fans to an SSD.

Smaller-capacity SSDs will probably show much worse performance with smaller-die NAND when all else is roughly equal, especially designs that had larger-die NAND in mind. Small-capacity SSDs do not benefit from increased parallelism (not enough NAND channels), yet they require the more complicated error-checking procedures. I personally wouldn't buy small-capacity versions (like 64GB) of next-release SSDs. I can tell you that this upcoming 25nm-specific generation of SSDs will probably start artificially limiting performance to give wiggle room for the following generation, as performance appears to be topping out within the constrained power and form-factor limitations. We still have to keep within the defined power/heat constraints of the form factor.
     
  9. taltamir

    taltamir Lifer

    Joined:
    Mar 21, 2004
    Messages:
    13,574
    Likes Received:
    0
I would like to point out that you lose reliability with spindle drives as well. This is why we are transitioning to 4K sectors (extra ECC). Going beyond 2TB with 512B sectors is possible with existing 64-bit controllers, and beyond 12TB with future 128-bit controllers, but 4K sectors let you get much more out of your ECC, compensating for the progressive reliability decreases we have seen with each generation.
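
For a sense of the numbers, here is a quick comparison using the ballpark ECC figures commonly cited for Advanced Format drives (roughly 50 ECC bytes per 512B sector vs. 100 per 4K sector; treat these as illustrative):

```python
# Rough comparison of ECC overhead for legacy 512-byte vs. 4K sectors,
# using ballpark Advanced Format figures (illustrative, not a spec).

SECTOR_512_ECC = 50    # ~bytes of ECC per 512-byte sector (ballpark)
SECTOR_4K_ECC = 100    # ~bytes of ECC per 4096-byte sector (ballpark)

data = 4096
legacy_ecc = 8 * SECTOR_512_ECC   # eight 512B sectors per 4KB of data
af_ecc = SECTOR_4K_ECC            # one 4K sector covers the same data

print(f"legacy: {legacy_ecc} ECC bytes per 4KB ({legacy_ecc / data:.1%})")
print(f"4K    : {af_ecc} ECC bytes per 4KB ({af_ecc / data:.1%})")
# One long codeword over 4KB is also *stronger* than eight short ones,
# not just smaller -- that is where the reliability headroom comes from.
```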

As you shrink CPUs and RAM, they become more susceptible to radiation-induced bit flips, so their reliability also decreases. It's a fact of life that as you shrink anything, it becomes more sensitive and can be disrupted more easily.

@adamantinepiggy: NAND cells cannot flip states; that's actually part of why they need TRIM. NAND cells can SET a state on an empty cell, and they can erase... but they can only erase many cells at once. So to "flip" 1 bit you need to read a whole bunch of cells (everything in the group to be erased, except that 1 bit which is being flipped), then erase the whole group, then write all of them back.
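
A minimal sketch of that dance, with invented page and block sizes:

```python
# Why "flipping one bit" is expensive on NAND: programming can only clear
# bits (1 -> 0) within a page, and erase (back to all 1s) works on a whole
# multi-page block. Sizes and names here are invented for clarity.

PAGES_PER_BLOCK = 64
PAGE_BYTES = 4096

class NandBlock:
    def __init__(self):
        self.pages = [bytearray(b"\xff" * PAGE_BYTES)
                      for _ in range(PAGES_PER_BLOCK)]

    def program(self, page, data):
        # Programming can only pull bits from 1 to 0, never back.
        self.pages[page] = bytearray(a & b for a, b in
                                     zip(self.pages[page], data))

    def erase(self):
        # Erase is all-or-nothing for the whole block.
        self.pages = [bytearray(b"\xff" * PAGE_BYTES)
                      for _ in range(PAGES_PER_BLOCK)]

def rewrite_one_byte(block, page, offset, value):
    """The read-erase-rewrite cycle described above."""
    snapshot = [bytes(p) for p in block.pages]   # 1. read the whole block
    block.erase()                                # 2. erase everything
    for i, old in enumerate(snapshot):           # 3. re-program every page
        data = bytearray(old)
        if i == page:
            data[offset] = value                 # ...with the one change
        block.program(i, data)
```

In a real drive the FTL avoids doing this in place (it writes the new data to a fresh page and remaps), but the erase-block granularity is exactly why TRIM and garbage collection exist.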

The solution is simple: avoid the ultra-budget, "too small to enjoy the benefits of parallelism" drives. Since prices are coming down, it shouldn't be too hard.
I would also point out that at the lowest end of parallelism you have USB sticks with single- or dual-channel NAND. So while fewer chips will hurt the lowest end, the high end will still enjoy full parallelism at a much lower price.
     
    #9 taltamir, Feb 14, 2011
    Last edited: Feb 14, 2011
  10. jimhsu

    jimhsu Senior member

    Joined:
    Mar 22, 2009
    Messages:
    703
    Likes Received:
    0
    I concur with the statement that tomorrow's 120GB drives will be essentially equivalent to today's (34nm) 60GB drives. Buying a 25nm drive and expecting it to act like a 34nm drive OF THE SAME CAPACITY is just asking for problems.
     
  11. dangerman1337

    dangerman1337 Senior member

    Joined:
    Sep 16, 2010
    Messages:
    312
    Likes Received:
    0
I think we should all wait until other 25nm SSDs come out; the OCZ ones were probably just pushed to market early for profit (the shrink reduces the cost of the flash memory). That said, I suspect we won't see another flash memory shrink unless there's a major breakthrough, or we just move on to memristors or something similar, hopefully coming soon.
     
  12. Arsynic

    Arsynic Senior member

    Joined:
    Jun 22, 2004
    Messages:
    410
    Likes Received:
    0
    When you purchase bleeding edge tech from a small manufacturer like OCZ, you're basically paying to be their QA tester.
     
  13. Modelworks

    Modelworks Lifer

    Joined:
    Feb 22, 2007
    Messages:
    16,237
    Likes Received:
    0
It might go below 25nm, but I doubt much smaller. There are a lot of reasons not to go smaller. When the cost of changing out the manufacturing process is only slightly less than switching to another technology entirely, corporations will switch to the other technology, because staying with NAND for just one more production run isn't worth it.

They have already had to go vertical with transistors to fit them closer together, and are even implementing stacked NAND cells to try to fit more into the same space.

I think the future is PCM (phase-change memory), where the chip material is similar to that used in CD-RW blanks. When heated, it changes its structure, altering its resistance. Instead of cells built from capacitors and transistors, a read simply measures the resistance of the cell to determine a 0 or 1. The only connection each cell needs is two wires: apply current and the cell changes to a state that reads as a 1; apply current again and it changes to a state that reads as a 0. The cells actually get faster the smaller they get, so scaling them down to 5nm has already been done in labs. The problem right now is manufacturing cost. The cells in testing are as fast as DDR memory and immune to things like stray magnetic fields or RF noise, which is another bonus; servers wouldn't need ECC hardware. The test chips I have are supposedly able to outlast even the best NAND chips in write cycles.
    http://www.micron.com/products/pcm/parallel_pcm.html
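
A toy model of such a cell (the resistance values and read threshold are invented for illustration; this is not Micron's actual part):

```python
# Toy PCM cell: write changes the material's phase (amorphous = high
# resistance, crystalline = low), and a read just senses resistance.
# All numbers are invented for illustration.

R_CRYSTALLINE = 10_000     # ohms, low-resistance state read as 1 (assumed)
R_AMORPHOUS = 1_000_000    # ohms, high-resistance state read as 0 (assumed)
R_THRESHOLD = 100_000      # read discriminates around this point

class PcmCell:
    def __init__(self):
        self.resistance = R_AMORPHOUS

    def set_bit(self):
        # Moderate, longer pulse lets a crystal lattice form.
        self.resistance = R_CRYSTALLINE

    def reset_bit(self):
        # Short, intense pulse melts the material and quenches it to glass.
        self.resistance = R_AMORPHOUS

    def read(self):
        # Non-destructive: just sense current through the two wires.
        return 1 if self.resistance < R_THRESHOLD else 0

cell = PcmCell()
cell.set_bit()
assert cell.read() == 1
cell.reset_bit()
assert cell.read() == 0
```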
     
  14. taltamir

    taltamir Lifer

    Joined:
    Mar 21, 2004
    Messages:
    13,574
    Likes Received:
    0
I doubt PCM is very similar to CD-RW, considering its properties, but I am not sure. However, PCM doesn't automatically change its layout when heated, nor is it "flipped" by a current...

The difference between glass and a crystal lattice is that glass cooled much faster and has a less ordered structure... ANY material can be made into a glass. http://en.wikipedia.org/wiki/Glass

PCM uses a material that normally forms glass when cooled (or at least would, within the environment of the drive). The material is heated to melt it, then either allowed to cool rapidly, forming glass, or forced to cool slowly (by applying more heat as it cools) to form a crystal lattice. The end result is a cell containing a solid which lasts nearly forever and is either a glass or a crystal lattice, which have different resistances.
     
    #14 taltamir, Feb 14, 2011
    Last edited: Feb 14, 2011
  15. adamantinepiggy

    Joined:
    May 29, 2010
    Messages:
    174
    Likes Received:
    0
Yeah, I know they don't just flip bits, but it's easier to say that than to explain the entire process of getting a single cell from a 0 to a 1 state, as the end result is still a flipped bit.
     
  16. Modelworks

    Modelworks Lifer

    Joined:
    Feb 22, 2007
    Messages:
    16,237
    Likes Received:
    0
The material used is a chalcogenide; the main difference is that in a CD-RW the optical change is used as the indicator, while in PCM the resistance change is used as the indicator. It does change structure when heated, and current polarity determines the bit.

Not quite:
Applying current one way causes the material to crystallize; sending a reverse current causes it to reset or clear. The material is electrically conductive, unlike normal glass, which insulates.


    From the Micron patent
     
  17. taltamir

    taltamir Lifer

    Joined:
    Mar 21, 2004
    Messages:
    13,574
    Likes Received:
    0
Thank you for an informative post. It seems that it is very similar after all, although it still uses different materials.

    A chalcogenide glass (hard "ch" as in "chemistry") is a glass containing one or more chalcogenide elements.
    The chalcogens (pronounced /ˈkælkədʒɨn/) are the chemical elements in group 16 (old-style: VIB or VIA) of the periodic table. This group is also known as the oxygen family. It consists of the elements oxygen (O), sulfur (S), selenium (Se), tellurium (Te), the radioactive element polonium (Po), and the synthetic element ununhexium (Uuh).

And the quote you gave from the Micron patent describes chalcogenide glass mixed with germanium selenide, silver, or silver selenide. So although both use chalcogenide glasses, they still vary in composition.

Glass is indeed normally non-conductive; there are your 0 and 1 forms right there.
Interestingly, the Micron implementation doesn't work that way. Instead of controlling the rate of cooling to form glass or a crystal lattice, they manipulate the polarity of the voltage applied to the device. This is different from what I said. I am 100% sure I read of it working the way I described, so either I read a bad source or these are competing implementations.
     
    #17 taltamir, Feb 15, 2011
    Last edited: Feb 15, 2011
  18. capeconsultant

    capeconsultant Senior member

    Joined:
    Aug 10, 2005
    Messages:
    454
    Likes Received:
    0
This is seriously interesting stuff and supports my rant in this same forum that things have hit a bit of a snag with SSD NAND chips. Maybe that's why there is no tried-and-true current new generation of drives; Crucial seems to have the latest. If 25nm chips are problematic and have the effect of limiting reliability, speed, and wear unless more chips are used, well, doesn't that cancel out the move to the smaller size?

I actually believe that there are points at which things cannot be made smaller. The risks outweigh the rewards at some point.

And hey, I tip my hat to OCZ. Sure, they are not Intel, and yes, they are using us as testers to a degree. But at least they are in the game mixing it up, and that counts for something! And I wish I made 0.0000001% of Intel's toilet paper budget as a yearly salary :))))))!
     
  19. Dadofamunky

    Dadofamunky Platinum Member

    Joined:
    Jan 4, 2005
    Messages:
    2,185
    Likes Received:
    0
    Can I just say, that was well DONE.

    So is this. Good discussion.
     
    #19 Dadofamunky, Feb 15, 2011
    Last edited: Feb 15, 2011