I think you'd see the greatest benefit when using traditional SATA ports because you'd no longer be choking on the limited bandwidth of a single port. Sequential throughput would see a big boost, especially on reads.
Most modern consumer motherboards have an abundance of SATA ports but a limited number of PCIe lanes. I think that makes a compelling case for buying large when it comes to PCIe SSDs, since you can't just pile on more drives when you need more space.
Well, that's a traditional view. I gave my own solution in a previous post, and here's a bench result with Magician on an ADATA SP550 in AHCI-mode with a moderate 2.5 GB RAM-cache and a 40GB NVMe M.2 caching volume:
Now, obviously, this is just a benchmark. It pretty much conforms to similar results with Anvil, CrystalDiskMark, ATTO and others. The "eye of the needle" where sequential reads "fall down" is any bench test with a test size of more than X GB on a cache of X GB or less. However, this all hinges on how big a single file you might load and how often you'd load a file of that size, because for most work, and for most programs and games, a 2.5 GB cache is more than sufficient.
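To put rough numbers on that "eye of the needle," here is a minimal sketch of a time-weighted throughput model. It is not how Primo or any benchmark actually measures things; it just assumes one sequential pass where whatever fits in the cache is read at cache speed and the remainder at the backing disk's speed. The function name and the 20,000 MB/s and 500 MB/s figures are illustrative assumptions, not measurements.

```python
def effective_read_mbps(test_size_gb, cache_gb, cache_mbps, disk_mbps):
    """Time-weighted average throughput for one sequential pass.

    Simplified, optimistic model (an assumption, not the caching
    program's real behavior): the first cache_gb of the test is served
    from cache, everything beyond that comes from the backing disk.
    """
    if test_size_gb <= 0:
        raise ValueError("test_size_gb must be positive")
    hit_gb = min(test_size_gb, cache_gb)   # portion that fits in the cache
    miss_gb = test_size_gb - hit_gb        # portion that falls through to disk
    seconds = hit_gb * 1024 / cache_mbps + miss_gb * 1024 / disk_mbps
    return test_size_gb * 1024 / seconds


# Hypothetical speeds: 20,000 MB/s for RAM cache, 500 MB/s for a SATA SSD.
for size_gb in (1, 2.5, 4, 8):
    mbps = effective_read_mbps(size_gb, cache_gb=2.5,
                               cache_mbps=20_000, disk_mbps=500)
    print(f"{size_gb:>4} GB test on a 2.5 GB cache: ~{mbps:,.0f} MB/s")
```

Because the average is weighted by time rather than by size, even a modest share of misses drags the result down close to the disk's speed, which is why the score drops so sharply once the test size passes the cache size.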
The Primo program is also caching to my NVMe M.2 960 EVO, and it does it in sort of a "stealth" mode, when the system is idle. Unlike ISRT (which also requires RAID-mode), the caches fill slowly. The assumption even I would make is that it would hammer the caching-NVMe-SSD and rack up an enormous total of terabytes written. But it depends on deployment and usage: for an OS-boot disk, the persistent cache may fill up, but subsequent writes to it are much less frequent. As an example, another system of mine with a regular SATA caching-SSD had only accumulated about 6TB of writes over a two-year deployment.
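For a sense of scale on that 6TB-in-two-years figure, here is a small back-of-the-envelope sketch. The 6TB and two years come from the system described above; the 100 TBW rating is purely a hypothetical placeholder (check the spec sheet for your own drive), and the function name is made up for illustration.

```python
def endurance_estimate(written_tb, years_in_service, rated_tbw):
    """Return (average GB written per day, years to reach the rated TBW).

    Assumes the write rate stays constant and uses decimal units
    (1 TB = 1000 GB), the way drive vendors usually quote endurance.
    """
    if written_tb <= 0 or years_in_service <= 0:
        raise ValueError("written_tb and years_in_service must be positive")
    gb_per_day = written_tb * 1000 / (years_in_service * 365)
    tb_per_year = written_tb / years_in_service
    years_to_rated = rated_tbw / tb_per_year
    return gb_per_day, years_to_rated


# 6 TB over 2 years is from the post; rated_tbw=100 is a hypothetical rating.
gb_per_day, years = endurance_estimate(written_tb=6, years_in_service=2,
                                       rated_tbw=100)
print(f"~{gb_per_day:.1f} GB/day, ~{years:.0f} years to reach the rated TBW")
```

At that write rate the caching SSD's endurance rating is not the limiting factor under these assumptions; a heavier or more write-intensive workload would change the picture.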
You can cache a RAID0 array and an AHCI disk (with the different modes requiring different controllers). But it's the best argument I can make for buying a single large SSD, which you'd only cache to RAM unless you had an NVMe caching volume available -- yet to similar effect. Most of the 21,000 MB/s score shown here is due to the RAM-cache, with only some portion representing hits to the NVMe.
RAM is cheap. A 16GB system will almost always have RAM to spare unless you have some task or workload that hogs memory.
A couple days of operating this configuration, and everything I use in OS and software is just "right there."
Here's a benchie for a 5400 RPM 2.5" 2TB laptop spinner -- Seagate Barracuda:
If I choose to configure these disks for deferred writes -- equivalent to the "Maximum" setting of ISRT -- then the "Write" part of the equation changes also.
So someone would say "Gee, though! You have to have two SATA SSDs or 'the spare, small one' to do this." In fact, you could get a large 1TB SATA SSD, put the OS volumes on it, add a ~50GB caching volume for an HDD, and cache all of that to RAM for both the SSD and the HDD, resulting in similar benchmarks.
You spend $30 on a lifetime license, with free upgrades through all future versions. You configure it, tweak it and -- if you want -- forget about it. You can change the RAM-caching on the fly. If you change the size of the SSD caches, they'll be purged and the stealth-caching will begin all over again. But that still means you can adjust it for usage, depending on what you're doing.
My priority is to conserve SATA ports -- for things like hot-swap bays and eSATA. You could hook up two SSDs in RAID-mode, but then you've used two ports, your storage volume is still more limited, you can't break up a RAID0 array without losing it -- and all the other misgivings. But if you had a RAID0, you could still cache it and throw AHCI-mode disks from a different controller into the configuration.
Put it another way. Instead of spending money on duplicate SSDs, you buy a program that will allow adding other disks, like spinners, into the mix.
It may be a stopgap anticipating cheaper NVMe or -- whatever -- but it works great. I've got plans to shell out for a 1TB Pro or EVO, and get myself a 27" 1440p gaming monitor.
I can wait. The only thing nagging me? My curiosity. And, of course, the other way to go is extra RAM: a 32GB kit, with the surplus allocated to larger caches. The only thing about that is that the Hiberfil.sys will need to be at least 16GB, but 16GB is the default for a 16GB kit.