Interesting post, Sahakiel! Let me share my thoughts.
One big problem with your argument: That requires you to break down memory chips to feed each controller. There's really no point in scaling a controller past 10 channels when you're only adding 10 memory chips at most.
Indeed! So that is why SSD performance should scale with capacity: a future 1TB SSD should be many times faster than a smaller 160GB SSD of the same brand family.
Currently we only see this with Intel's X25-V, which has essentially half the channels of the original X25-M. But I could imagine this 'scale performance with size' approach becoming the logical thing for SSDs, especially once the controller market has matured a bit.
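As a rough back-of-the-envelope illustration of that scaling idea: if each NAND channel can sustain a fixed bandwidth, total throughput grows with the channel count. The 40MB/s per-channel figure below is just an assumption for the sake of the example, not a real specification:

```python
# Back-of-the-envelope: if each NAND channel sustains a fixed bandwidth,
# aggregate throughput scales with the channel count. The 40 MB/s figure
# is an assumed number for illustration, not a real specification.

PER_CHANNEL_MB_S = 40

for channels in (5, 10, 20):
    print(f"{channels:2} channels -> up to ~{channels * PER_CHANNEL_MB_S} MB/s "
          f"sequential read, assuming nothing else bottlenecks")
```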
Let me sketch my vision of future SSDs:
Stacked flash chips with controller underneath:
http://www.dailytech.com/article.aspx?newsid=1761
Imagine a controller sitting at the bottom with 8 NAND chips stacked on top. That gives you an 8-channel controller and a nice capacity like 8*64Gb = 64GiB. Very good use of real estate, since the controller needs no extra board space, and high theoretical speeds due to the NAND and controller being so closely integrated.
Simple SSDs would have only one NAND stack and could be plugged directly into a SATA port on the motherboard. They would still have a decent capacity (64GB/128GB) and a decent controller (unlike the current products that plug directly into a PATA/SATA port).
More expensive / higher-capacity SSDs would come in either 1.8" or 2.5" form factor, with some exceptional beasts at 3.5"; thus a variety of form factors for different uses. A 2.5" model could have 2 to 8 NAND stacks, while a 3.5" model could have something like 16 to 32 NAND stacks, all connected by a 'master controller', though the individual NAND controllers do most of the dirty work. The master controller multiplexes all of them into a single SAS or SATA/1200 port, and might share its architecture with high-end PCI-Express 3.0 SSDs.
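To make the multiplexing idea a bit more concrete, here is a minimal sketch of a master controller striping host LBAs across a set of per-stack controllers, RAID0-style. It is purely my own illustration; the class names and the 128-sector stripe size are assumptions, not how any real firmware works:

```python
# Hypothetical sketch: a 'master controller' striping host LBAs across
# per-stack NAND controllers. Purely illustrative; real firmware also does
# queueing, error handling, wear leveling, etc.

STRIPE_SECTORS = 128  # assumed stripe size per NAND stack, in 512-byte sectors

class NandStackController:
    """Stands in for one stacked-NAND controller (one group of channels)."""
    def __init__(self, ident):
        self.ident = ident
        self.store = {}  # local LBA -> data, pretending to be the NAND stack

    def write(self, local_lba, data):
        self.store[local_lba] = data

    def read(self, local_lba):
        return self.store.get(local_lba, b'\x00' * 512)

class MasterController:
    """Multiplexes the per-stack controllers behind one host port."""
    def __init__(self, num_stacks):
        self.stacks = [NandStackController(i) for i in range(num_stacks)]

    def _route(self, host_lba):
        stripe = host_lba // STRIPE_SECTORS
        stack = self.stacks[stripe % len(self.stacks)]        # which stack
        local_lba = (stripe // len(self.stacks)) * STRIPE_SECTORS \
                    + host_lba % STRIPE_SECTORS                # LBA inside it
        return stack, local_lba

    def write(self, host_lba, data):
        stack, local_lba = self._route(host_lba)
        stack.write(local_lba, data)

    def read(self, host_lba):
        stack, local_lba = self._route(host_lba)
        return stack.read(local_lba)

# Example: a 3.5" 'monster' with 16 NAND stacks behind one port.
ssd = MasterController(num_stacks=16)
ssd.write(1000, b'x' * 512)
print(ssd.read(1000)[:1])
```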
This multiple-controller configuration may also relieve the processing-power limitations of the individual controllers, since it moves some of the logic away from them and integrates it into a separate chip that can be passively cooled by a small heatsink. The separate master chip would handle write-back and the mapping table, using only SRAM buffers for low-latency I/O and for storing that mapping table. Not sure if that would be feasible, as Intel opted to use a DRAM chip for the mapping table instead of more SRAM, which is understandable.
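To be clear about the two structures I mean, here is a toy sketch of a mapping table that translates host LBAs to NAND pages plus a small write-back buffer standing in for the controller's SRAM. The sizes and names are invented for illustration and don't describe Intel's actual firmware:

```python
# Toy illustration of a mapping table plus a small write-back buffer.
# Sizes and behaviour are invented for the example; they do not reflect
# any real controller's firmware.

SRAM_BUFFER_BYTES = 256 * 1024   # X25-M is reported to have 256K of SRAM
PAGE_SIZE = 4096                 # assumed NAND page size

class ToyFTL:
    def __init__(self):
        self.mapping = {}        # host LBA -> physical NAND page (mapping table)
        self.write_buffer = {}   # LBA -> data waiting to be flushed to NAND
        self.next_free_page = 0

    def write(self, lba, data):
        # Writes land in the (SRAM-like) buffer first: low latency for the host.
        self.write_buffer[lba] = data
        if len(self.write_buffer) * PAGE_SIZE >= SRAM_BUFFER_BYTES:
            self.flush()

    def flush(self):
        # Flush buffered writes to 'NAND' and update the mapping table.
        for lba in self.write_buffer:
            self.mapping[lba] = self.next_free_page   # remap LBA to a fresh page
            self.next_free_page += 1
        self.write_buffer.clear()

    def lookup(self, lba):
        # A read first checks the buffer, then the mapping table.
        if lba in self.write_buffer:
            return 'buffer'
        return self.mapping.get(lba, 'unmapped')

ftl = ToyFTL()
ftl.write(42, b'a' * PAGE_SIZE)
print(ftl.lookup(42))   # 'buffer' before the flush
ftl.flush()
print(ftl.lookup(42))   # physical page number after the flush
```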
Power could also be conserved by switching individual chips off. This would work especially well to lower power consumption on the 3.5" 'monsters' with a lot of NAND chips.
Concluding my future fantasy: I think there's a LOT of performance headroom that current generations are not utilizing. With the technology available today, we could build much better SSDs rated at much higher speeds.
An alternative would be software NAND controllers: you buy 'dumb' NAND memory that is essentially controller-less, and a software driver implements the NAND controller's functions. This would add some latency, but it would also let you throw a lot of CPU power at the problem. I read that such a driver is in progress, but it will likely be a difficult task with many uncertainties and potential hardware issues.
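As a rough illustration of what such a driver would have to do, here is a minimal sketch of a host-side translation layer on top of a hypothetical 'dumb' NAND device. The DumbNand interface (read_page/write_page/erase_block) is invented for the example and is not a real driver API:

```python
# Sketch of a host-side ('software') NAND controller on top of 'dumb' flash.
# The DumbNand interface is invented for illustration; it is not a real API.

PAGES_PER_BLOCK = 64

class DumbNand:
    """Raw flash: a page can only be written once until its block is erased."""
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.pages = {}                       # (block, page) -> data

    def write_page(self, block, page, data):
        assert (block, page) not in self.pages, "must erase before rewrite"
        self.pages[(block, page)] = data

    def read_page(self, block, page):
        return self.pages.get((block, page))

    def erase_block(self, block):
        for page in range(PAGES_PER_BLOCK):
            self.pages.pop((block, page), None)

class SoftwareController:
    """The host driver does the remapping a hardware controller would do."""
    def __init__(self, nand):
        self.nand = nand
        self.mapping = {}        # LBA -> (block, page), kept in host RAM
        self.cursor = 0          # next free page, written log-style

    def write(self, lba, data):
        block, page = divmod(self.cursor, PAGES_PER_BLOCK)
        self.nand.write_page(block, page, data)
        self.mapping[lba] = (block, page)     # the old copy becomes stale
        self.cursor += 1

    def read(self, lba):
        block, page = self.mapping[lba]
        return self.nand.read_page(block, page)

drive = SoftwareController(DumbNand(num_blocks=1024))
drive.write(7, b'hello')
drive.write(7, b'world')      # the rewrite goes to a fresh page; map is updated
print(drive.read(7))          # b'world'
```

Garbage collection of the stale pages, wear leveling and error handling are exactly the parts that would make a real driver hard, and they all cost host CPU time.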
The point is SSD's do share some similarities with RAID 0 arrays, only with memory chips instead of entire hard drives. If you're familiar with RAID, you know the limitations and, unsurprisingly, SSD's don't seem to be breaking them, yet.
I do think I have a decent understanding of RAID0 performance characteristics, but you weren't very specific; could you elaborate on this further?
Hence, the artificial controller limitation you cite is actually a physical memory limitation.
Can you explain that? By 'memory limitation', do you mean the NAND or the DRAM memory chip? In this context, it is interesting to note two things:
1) Intel doesn't use the DRAM for write-back; it has an internal 256K SRAM buffer cache for that job. The DRAM is only used for the LBA mapping tables.
2) Intel's X25-M G1 has a 133MHz SDRAM chip; the X25-M G2 has a 100MHz SDRAM chip, yet its performance does not appear to suffer from the slower DRAM.
Internal SRAM buffers would explain the extremely low latencies Intel achieves for both reads and writes. If all the data had to pass through one SDRAM chip it would indeed be slow, since that memory is also used for the controller's own housekeeping. As stated, though, in Intel's case the DRAM is only used for LBA lookups, essentially finding out where a data block was actually stored.
If you can't get faster memory through increasing density, your only recourse is to make the output logic for the memory itself faster. Your access latencies take a hit (extra logic) but your sustained transfer (down to a certain chunk size depending on output logic) will increase.
If this is meant as an analogy for RAID0, then may I add that parallel I/O (RAID0/interleaving/striping) not only accelerates sequential workloads, but also random I/O workloads, i.e. those with few contiguous I/O requests. This applies to both SSDs and HDDs, though HDDs have latency handicaps that RAID0 cannot circumvent, so they fall back to single-disk performance or gain only limited benefit from the striping.
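To illustrate why striping also helps random I/O, here is a little back-of-the-envelope simulation. It's my own toy model with made-up numbers (4-disk array, 128KB stripes): with enough outstanding random requests, the requests spread across the stripe members and can be serviced in parallel:

```python
# Toy model: how many member disks of a RAID0 set are kept busy by N
# outstanding random 4K requests? Numbers and model are invented for
# illustration only.
import random

STRIPE_KB = 128              # assumed stripe size
DISKS = 4                    # assumed array width
ARRAY_KB = 4 * 1024 * 1024   # assumed 4GiB test area

def disks_hit(queue_depth):
    """Count distinct disks touched by queue_depth random 4K requests."""
    hit = set()
    for _ in range(queue_depth):
        offset_kb = random.randrange(0, ARRAY_KB, 4)
        stripe = offset_kb // STRIPE_KB
        hit.add(stripe % DISKS)        # RAID0: the stripe number picks the disk
    return len(hit)

random.seed(1)
for qd in (1, 2, 4, 8, 32):
    trials = [disks_hit(qd) for _ in range(1000)]
    print(f"QD={qd:2}: on average {sum(trials)/len(trials):.2f} of {DISKS} disks busy")
```

At queue depth 1 only one disk works at a time, which is why single-request random I/O doesn't scale; with a deeper queue nearly all members stay busy, which is the near-linear scaling I mean below.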
If RAID0 is properly implemented, those limitations are very minor and RAID0 can substantially increase your I/O performance for both sequential and random workloads. Given enough queued I/Os, scaling should be nearly 100 percent, i.e. linear, until it hits a bottleneck in interface latency/bandwidth or CPU/RAM. Most intelligent RAID5 drivers are memory-bottlenecked, such as geom_raid5 in FreeNAS: the parity calculations take only a fraction of the time spent on all the memory copies from splitting and combining I/O requests, which is what RAID5 needs in order to write fast.
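To show what I mean by the parity being cheap relative to the copying, here is a tiny sketch of a full-stripe RAID5 write. It's a toy model with assumed chunk sizes, not how geom_raid5 is actually implemented:

```python
# Toy full-stripe RAID5 write: parity is a simple XOR over the data chunks,
# while assembling the stripe copies every byte of data at least once.
# Purely illustrative; not how geom_raid5 really works.

CHUNK = 64 * 1024            # assumed chunk (stripe unit) size
DATA_DISKS = 3               # 3 data disks + 1 parity disk

def xor_chunks(chunks):
    """Compute the parity chunk as the byte-wise XOR of the data chunks."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

def full_stripe_write(request):
    """Split one big host write into chunks, compute parity, return the buffers."""
    assert len(request) == CHUNK * DATA_DISKS
    # The splitting/combining step: at least one memory copy per data chunk.
    chunks = [request[i * CHUNK:(i + 1) * CHUNK] for i in range(DATA_DISKS)]
    parity = xor_chunks(chunks)
    return chunks + [parity]   # one buffer per member disk

stripe = full_stripe_write(bytes(CHUNK * DATA_DISKS))
print(len(stripe), "buffers of", len(stripe[0]), "bytes each")
```

The XOR itself is trivial for a modern CPU; it's shuffling all those buffers around in RAM that eats memory bandwidth, which is why RAM write-back matters so much for software RAID5.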
Unfortunately, I haven't seen a perfect RAID driver yet that exploits all or even most of the theoretical potential. RAID0 drivers do best in this regard. RAID1 is terrible, since most implementations don't benefit from having an additional disk that could be reading something other than what the primary disk is reading. Windows onboard RAID drivers aren't much better, though Intel's is the only Windows software RAID5 implementation that uses RAM write-back and is thus capable of better-than-single-disk write speeds.