Hybrid drive caching algorithms

name99 · Sep 11, 2010

I just read the review of the Seagate Momentus XT where we see speculation regarding the caching algorithm that is used in the sentence:
"most likely via a history table of LBAs and their frequency of access"

I don't think so. If you look at the number of LBs that exist, it is freaking HUGE --- even if you cluster at, say, 4KB clusters, on a 500GB drive you have about 125 million of these and that's still a not insubstantial amount of RAM --- and an array which then has to be ordered dynamically to do anything useful.

I felt compelled to write this post because it appears that the Seagate drive is fragile to streaming data pushing good data out the cache, and it doesn't have to be this. So with luck someone from Seagate (or one of it's competitors) will read this and give us a better hybrid drive in future.

The way I would handle this is to treat the thing like a CPU cache with sets and ways. If we treat it as non-associative, then we have each block of cache (whether a "block" is 512 bytes or 4KB) corresponds to ~128 blocks of disk. The absolute dumbest way to do things is that, for each block, as the block is read, if it's not in the cache it's put in it's appropriate single pre-ordained place --- like a simple-minded 1-way cache.

But of course that's the dumbest way of doing things. Much better would be to make the cache 4 or 8 way wide, and for each way to store a dynamic LRU to MRU ordering (or the various tricks CPU designers have used to fake this), then when a block is read that is not in the cache, we toss the oldest block in the cache and store the newly read block.

BUT, and this is important, this is STILL not optimal --- it's not optimal for CPUs, and it's not optimal for drives. It is, however, easy to fix in drives, harder in CPUs. The problem is streaming data. With the model described above, any sort of operation that performs a one-time run through a large file (copying/backup, or just watching a movie) is going to replace the entire cache with one-time data. I don't know what the standard ways to deal with this are, but I have an easy solution which is that, associated with each way is a small amount of RAM that stores the most recently seen blockID as a POTENTIAL candidate for the cache. So at any given time, a way contains, say, 4 blocks of good data, plus the ID of the most recent block mapped onto that way which did not hit in the cache. If the next block that does not hit in the cache is the SAME as the potential candidate, we treat that as a verification that we are not streaming, and the on this second read we move the block into the cache.

You can expand this idea, based on real-world data, to whatever works best. In particular, this scheme as exactly described is potentially fragile in that it requires two successive reads to the same block (within a particular way) without an intervening read elsewhere in the way. So it is good at keeping out streaming data, but potentially also keeps out some re-used data. You can deal with this by having the per-way pool of potential blockIDs be 2, or 3, or N in size --- when the pool is N in size, we can allow up to N-1 reads in that way to intervene between two successive reads to a block, and still catch the block.

So there is scope for some ingenuity in quite how these systems are designed. If I had to guess, my guess would be that the current system is something like 4-way associative. Not clear if they are using my idea (or some equivalent) to prevent streaming from screwing the system over. The test that should be done, which I don't see in the post, would be to time something like a bunch of app launches, THEN read 4GB sequentially from the disk, time the app launches again, and see if the time has gone down. It would not surprise me if this first round of firmware does little to nothing to prevent streaming pollution --- not least because the existing benchmarks are not testing for it. On the other hand, this also all suggests that there is scope, in time, for much better engineering to figure out the optimal number of ways for the cache, the optimal cache block size, and the optimal strategy to prevent streaming from polluting the cache.

Soulkeeper · Sep 12, 2010

Nice post. The world needs more thinkers 🙂
Welcome to AT !

corkyg · Sep 12, 2010

A most thorough and well written analysis! I see the Momentus XL drives as a "mark on the wall," and they should go from there. I happen to have three of them, all in a pair of Lenovo Thinkpads. My usage involves very little streaming of data. What I experience is that the drive is an improvement over the standard 7200 RPM pure spindles in the sense that, as they are used, they learn my boot sequences and preferences, and have reduced boot and Windows load times by about 10-15% on average. Same for shutdown.

I hope that Seagate engineers take note of your comments and make improvements in the next generation.

Hybrid drive caching algorithms

name99

Senior member

Soulkeeper

Diamond Member

corkyg

Elite Member | Peripherals

TRENDING THREADS