OCZ Vertex 2E loses data?

Discussion in 'Memory and Storage' started by no.human.being, Sep 11, 2014.

  1. no.human.being

    no.human.being Junior Member

    Joined:
    Sep 11, 2014
    Messages:
    5
    Likes Received:
    0
    The OCZ Vertex 2E (180 GB) in my notebook just lost all of its data.

    I used the drive as system drive and had a Linux distro installed to it. All of a sudden, the system wouldn't find a bootable device.

    I thought that the boot sector was damaged, so I booted the system into a live Linux from a USB device and the drive showed up as "unpartitioned".

    I thought: "Oh damn, the partition table is damaged."

    However, as I grabbed a hex-view of the device, all that was stored on it was zeroes, as if an "ATA secure-erase" command had been issued to it. (But I was sure it had not.)
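    A check like the hex-view above can be scripted instead of eyeballed. This is only a minimal sketch: the function name `first_nonzero_offset` and the chunk size are made up for illustration, and the path is whatever raw device or image file you can read (e.g. something like /dev/sdX on Linux, usually requiring root).

```python
def first_nonzero_offset(path, chunk_size=1024 * 1024):
    """Return the offset of the first non-zero byte in `path`, or None if every byte is zero.

    `path` can be a regular file or a raw block device opened read-only.
    """
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Reached end of file/device without seeing a non-zero byte.
                return None
            stripped = chunk.lstrip(b"\x00")
            if stripped:
                # Number of leading zero bytes in this chunk locates the first non-zero byte.
                return offset + (len(chunk) - len(stripped))
            offset += len(chunk)
```

    Calling it on the device and getting `None` back would correspond to the "all zeroes" observation; any integer result means at least one sector still holds data.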

    I decided to replace the drive, since obviously the memory was becoming unreliable, but I thought I might operate it again until I get a new drive. So I performed a "secure-erase" (to make sure every page is in fact physically erased and in a clean state) and installed the operating system to it again. After installation, my system booted from the device just fine. So I thought: "Ok, it's probably gonna stay until the new drive arrives. After that I'm just gonna replace it just in case."

    However, the data only remained on the drive until I shut the system down. After power-cycling it, it came up completely blank (all sectors storing only zeroes) again.

    I had heard that SSDs, especially the Vertex 2E series, which appears to have more than its share of reliability issues, might either turn read-only or "panic-lock" and completely disappear from the host system (basically the controller "dying") without any prior warning, so I have quite recent backups of all important data on a conventional hard-disk drive. However, an SSD simply becoming "volatile" and losing its contents on power-down is new to me.

    What's also interesting is that all S.M.A.R.T. values look normal (or rather very good) and self-tests complete without any errors.

    The drive is 3 years old now, so it's out of warranty and "allowed to fail". However, flash memory that basically "continues working" but becomes "volatile" is something I don't really grasp. Does anyone have an idea what might have happened here?
     
  3. BrightCandle

    BrightCandle Diamond Member

    Joined:
    Mar 15, 2007
    Messages:
    4,763
    Likes Received:
    0
    Both my Vertex 2E SSDs died, becoming unavailable to the BIOS/OS in the usual way. One did so in warranty, the other just outside it.

    Terrible quality drives in the end, with utterly awful reliability on the grand scale: over 30% died within warranty, and who knows how many have died since.
     
  4. Phynaz

    Phynaz Lifer

    Joined:
    Mar 13, 2006
    Messages:
    10,035
    Likes Received:
    738
    OCZ - Complete crap.
     
  5. no.human.being

    no.human.being Junior Member

    Joined:
    Sep 11, 2014
    Messages:
    5
    Likes Received:
    0
    Yeah, but does anyone know why? I mean, they use pretty standard SandForce controllers and Intel NAND. SandForce controllers are probably not the best, but back in the days of the Vertex 2E and shortly after, lots of SSDs used them, and most of those don't have nearly as bad a reputation as OCZ's. I don't understand why the controller would die that often. It's basically a processor with no "moving parts". I understand that memory wears out, but why would the controller fail? It might contain a bit of NAND itself to store its own firmware, and that might fail, but the firmware seems basically intact: the drive reports itself to the host system, I can read SMART values, and I can issue ATA commands. That doesn't look like a controller error at all. Basically all that's "OCZ" on that drive is probably the PCB layout and casing; the controller is SandForce and the NAND is Intel, a combination seen in many other SSDs of that time.

    But I heard OCZ uses "off-spec" NAND (basically NAND that's considered "not good enough for any critical application other than simple bulk storage of unimportant data") for their SSDs - that's at least what one of their competitors claimed and since OCZ, as far as I can see, has not sued that competitor, that claim might be correct.

    I still don't understand why the NAND would basically become "clean" - and even less why it would become "volatile" and lose its contents on power-down. The best explanation I could get is that the NAND basically failed completely - and all data written to the drive is stored in a volatile DRAM cache, so that the host sees it as "written", but it never actually hit the NAND. And when you power the drive down, the contents of that cache get lost. However, I pretty much doubt that a full operating-system will fit into that cache. I mean I installed an OS and booted from that drive after it "failed".

    I mean it's clear that the drive is defunct and needs replacement (which I already ordered - Samsung 840). I just don't understand what sort of malfunction might cause the behaviour I see here.
     
  6. BrightCandle

    BrightCandle Diamond Member

    Joined:
    Mar 15, 2007
    Messages:
    4,763
    Likes Received:
    0
    OCZ was responsible for everything else: power circuitry, the PCB and much of the firmware. Somewhere in there is what failed. Maybe the line(s) that lead to the clear-NAND function are being shorted by something that occurs at boot, so it's literally running a secure erase every time you power cycle the machine.
     
  7. no.human.being

    no.human.being Junior Member

    Joined:
    Sep 11, 2014
    Messages:
    5
    Likes Received:
    0
    Today's EEPROMs no longer have a simple "erase line". Before I bought my Vertex 2E, I asked OCZ whether I could clear the NAND physically by opening the SSD and raising an "erase line" to high (or low if it's active-low) on each NAND chip or whatever, in case the controller "panic-locks" and I want to erase the NAND, since I used that SSD for sensitive corporate data. (Fortunately everything was secured with strong encryption, so I can safely dispose of the drive at some point.)

    They told me that, for some years now, EEPROM no longer has an actual "erase line", which erases the entire chip (which is normally not what you want obviously - in almost all circumstances you need some sort of "selective erase"). You can erase blocks by sending an "erase command" over the data bus with the address of the block you want to erase. So when the controller wants to perform a "secure erase", it actually has to run through the address range of each NAND chip and issue an "erase" command for each block. And while "secure erase" is extremely fast on SSDs (compared to HDDs which actually have to overwrite each sector - even though it's still blazingly fast compared to sending "write" commands to a HDD for actually "writing" zeroes to the sectors, so there's still a difference here) it's still not "instant". It still takes a few seconds indicating that this is in fact a "sequential operation".
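    The block-by-block erase described above can be illustrated with a toy model. Everything here is hypothetical (`ToyNandChip` and `secure_erase` are invented names, and this is not how any real controller firmware or NAND command set is programmed); it only shows why a whole-drive erase is a sequential loop over blocks rather than a single signal. The one physical detail assumed is that erased NAND cells read back as all ones (0xFF).

```python
class ToyNandChip:
    """Hypothetical in-memory stand-in for one NAND chip; not a real device API."""

    ERASED = 0xFF  # assumption: erased NAND cells read back as all ones

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        # Start with every block "programmed" (all zero bits) so an erase is visible.
        self.blocks = [bytearray(block_size) for _ in range(num_blocks)]

    def erase_block(self, index):
        """Model an 'erase' command addressed to a single block."""
        self.blocks[index] = bytearray([self.ERASED]) * self.block_size


def secure_erase(chips):
    """Walk every block of every chip, erasing one block per command.

    Returns the number of erase commands issued.
    """
    commands = 0
    for chip in chips:
        for index in range(len(chip.blocks)):
            chip.erase_block(index)
            commands += 1
    return commands
```

    With two chips of four blocks each, `secure_erase` issues eight erase commands, which is the sense in which the operation is fast but still not instant.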

    Whatever the case, I won't trust this disk with any data anymore. While I might be able to install an OS and run from it as long as I don't power down the machine, I won't do this, as I can't trust it to store any meaningful data. So I certainly won't grab data from, say, a NAS, copy it to the SSD, work on it, then upload it back. That has too much potential for sequences of zeroes to be uploaded at some point, making the "master copy" worthless as well. ;)
     
  8. ronbo613

    ronbo613 Golden Member

    Joined:
    Jan 9, 2010
    Messages:
    1,232
    Likes Received:
    45
    The End
     
  9. VirtualLarry

    VirtualLarry Lifer

    Joined:
    Aug 25, 2001
    Messages:
    40,168
    Likes Received:
    2,714
    I would be interested in a HDTune surface scan of the drive. Is it full of red blocks (failed CRC, sector read error) or green blocks?

    I have a pair of refurb OCZ 240GB Vertex Plus R2 drives. I had one in a Foxconn NanoPC, and due to the high ambient temps in that device, the NAND started failing and corrupting data.

    I would get increasing amounts of red sectors on the surface scan. Eventually, the increasing bad sectors ate some of my system files and the OS wouldn't boot.

    However, I was able to revive it via a Secure Erase. It lasted about a month before it failed again in the same way.

    I have since secure-erased it again, and it has been doing fine in a much cooler ATX desktop case with decent fans.
     
  10. no.human.being

    no.human.being Junior Member

    Joined:
    Sep 11, 2014
    Messages:
    5
    Likes Received:
    0
    [​IMG]

    [​IMG]

    Looks good eh?

    Now what I did after that was connect the drive to my server for testing purposes, to make sure that it's not the notebook's fault.

    I disconnected all other drives from the server and booted a Live Linux from USB, then used "dd" to completely fill it with random data (from "/dev/urandom" in 512 kiB blocks), which took about 3 hours and a bit.

    What I noticed immediately was that, after this, the server recognized the SSD as a very "weird device".

    When you "invert" every 2-byte-sequence (the controller seems to be some 16-bit processor which got the endianness wrong here) you get the following.

    Now that looks a lot like "controller error", eh? Looks like the controller's grabbing some data that's intended for something else and sending that out as the model and serial number. Also note that the drive does not respond to any SMART requests.
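    The byte-pair swap mentioned above can be sketched as follows. The function name `swap_byte_pairs` and the sample string are made up for illustration. One caveat worth hedging: as far as I know, the ATA IDENTIFY DEVICE data stores its ASCII strings with the two bytes of each 16-bit word swapped anyway, so a raw dump of the model or serial field usually needs this step even on a healthy drive.

```python
def swap_byte_pairs(raw):
    """Swap the two bytes inside every 16-bit word of `raw` and return the result as bytes."""
    if len(raw) % 2:
        raise ValueError("input length must be a multiple of 2 bytes")
    swapped = bytearray(len(raw))
    swapped[0::2] = raw[1::2]  # second byte of each pair moves to the front
    swapped[1::2] = raw[0::2]  # first byte of each pair moves to the back
    return bytes(swapped)


# Hypothetical example of a raw, byte-swapped model field as it might appear in a dump.
print(swap_byte_pairs(b"CO-ZEVTRXE 2"))  # b'OCZ-VERTEX2 '
```

    If the strings are still garbage after swapping, that points more strongly at the drive returning wrong data than at a simple byte-order mix-up.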

    However, wherever you read from the device, you'd get random values, not zeroes, so it looks like everything got actually stored in NAND. (180 GB of random values cannot possibly be cached.) When reading from different offsets, you got different numbers. When reading from the same offset again after reading different offsets, you got the same numbers as when you were reading from that offset for the first time.

    Then I powered my server down, left it off for a minute or so, then booted it back up and looked at the drive. Still got the same numbers. So seems like everything's stored. After the reboot, that "bad identification" crap was also gone.
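    A comparison like this ("same numbers at the same offsets, before and after a power cycle") is easier to trust with checksums than by eye. The following is a minimal sketch under my own assumptions: `sample_digests`, the offsets and the sample length are all illustrative choices, and the path is whatever raw device or image file is being tested.

```python
import hashlib


def sample_digests(path, offsets, length=4096):
    """Read `length` bytes at each offset in `path` and return {offset: SHA-256 hex digest}."""
    digests = {}
    with open(path, "rb") as f:
        for offset in offsets:
            f.seek(offset)
            digests[offset] = hashlib.sha256(f.read(length)).hexdigest()
    return digests
```

    The idea would be to save the returned dictionary before powering down, run the same call with the same offsets afterwards, and compare the two dictionaries; any differing digest marks a region that did not survive.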

    I'm gonna leave the drive alone (and unpowered) now for some time and check again later if it still contains valid data.

    If it does, I'm gonna put it in my notebook, boot a live system there and check if it still contains valid data. If it's zeroed afterwards, then my notebook's probably doing some weird sh*t (for some reason ATA-erasing or TRIM-ming the entire drive on every boot).
     
    #9 no.human.being, Sep 13, 2014
    Last edited: Sep 13, 2014
  11. AdamK47

    AdamK47 Lifer

    Joined:
    Oct 9, 1999
    Messages:
    13,094
    Likes Received:
    563
    False
     
  12. no.human.being

    no.human.being Junior Member

    Joined:
    Sep 11, 2014
    Messages:
    5
    Likes Received:
    0
    Put it back into the notebook, still everything stored. Plus device name and serial number are back to "normal" and SMART is working again.

    Seems like the drive "works" now, but since it's that unreliable, replacing it is probably the right option. I probably shouldn't even bother installing an OS on it.

    EDIT: ... or not. Can't even format the drive to Ext4.
     
    #11 no.human.being, Sep 14, 2014
    Last edited: Sep 14, 2014