Question How best to benchmark an NVMe drive in Linux?

garengllc

Junior Member
Feb 25, 2020
4
0
6
I am running Ubuntu 18.04 on a fairly new system. I recently bought a Samsung 970 EVO Plus to do some speed tests for work. While trying different things and not getting the results I wanted, I stumbled upon the great write-up here: https://www.anandtech.com/show/13512/the-crucial-p1-1tb-ssd-review/7

It is really well written, and the 970 EVO Plus is one of the comparison models in their tests. In particular I am interested in the "whole drive sequential write" test. Looking at the plot, I was thinking that 1200MB/s was a reasonable rate to hope for. When I run a test via dd for 950GB, I only get 832MB/s. Not bad, but there should be roughly a 50% bump in there somewhere. My command was: sync; dd if=/dev/zero of=~/SSD/tempfile bs=128000 count=7421875; sync. The drive is formatted as ext4 and mounted with noatime to try to speed things up.

I assume that AnandTech doesn't release their testing software, so does anyone else have a good way to test things in Linux? In the end I need to write a program to stream the data to disk, but for now I would at least like to see the expected numbers.

TIA
 

Billy Tallis

Senior member
Aug 4, 2015
293
146
116
The synthetic tests used in our SSD reviews are done with fio: https://github.com/axboe/fio

The whole-drive sequential write test uses a fio job that writes 1GB sequentially, and the script increments the offset by 1GB at a time to get all the data points that go into the plot.

One factor that may be contributing to your dd test being slower is that you're using a block size that doesn't fit well with the underlying NAND. Try 131072 bytes instead of 128000. Real memory isn't decimal.
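
Not the exact script, but the rough shape of it is a loop like the one below. The device name, block size, and queue depth here are placeholders, and writing to the raw device like this destroys whatever is on it, so treat it as a sketch rather than the actual test harness.

Code:
#!/bin/bash
# Sketch of a whole-drive sequential write sweep with fio.
# WARNING: writes directly to the raw device and destroys all data on it.
# Run as root; adjust DEV, block size, and iodepth as needed.
DEV=/dev/nvme0n1
TOTAL_GB=$(( $(blockdev --getsize64 "$DEV") / 1024 / 1024 / 1024 ))

for (( off=0; off<TOTAL_GB; off++ )); do
    fio --name=seqwrite-sweep \
        --filename="$DEV" \
        --rw=write --bs=128k \
        --ioengine=libaio --iodepth=32 --direct=1 \
        --offset="${off}G" --size=1G \
        --output="sweep_${off}G.json" --output-format=json
done

Each iteration writes one 1GB slice at the next offset, so plotting the per-slice bandwidth from the output gives the kind of curve that goes into the plot.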
 

garengllc

Junior Member
Feb 25, 2020
4
0
6
Billy Tallis said:
The synthetic tests used in our SSD reviews are done with fio: https://github.com/axboe/fio

The whole-drive sequential write test uses a fio job that writes 1GB sequentially, and the script increments the offset by 1GB at a time to get all the data points that go into the plot.

One factor that may be contributing to your dd test being slower is that you're using a block size that doesn't fit well with the underlying NAND. Try 131072 bytes instead of 128000. Real memory isn't decimal.
Thanks for the quick response, and for the information on how the tests were run. I actually did some tests with fio as a plugin for SPDK. I got better numbers doing that, but what concerned me was that it is really doing the writes a different way. I was worried that I would not be able to recreate that kind of throughput without stepping into the STEEP learning curve of SPDK. Would that be your take on it as well?

You know, I read 128k everywhere, but I wasn't thinking about the value being KiB. I will see what I can do with that once my iozone testing finishes (I've tried Bonnie++, fio/SPDK, and dd at this point).
 

garengllc

Junior Member
Feb 25, 2020
4
0
6
Billy Tallis said:
Ignore SPDK and just use fio with one of the more normal ioengine backends.
OK, sounds good

BTW, I just reran a 30GB dd test and didn't see a difference between MiB and MB buffer sizes. One important caveat: this was only 30GB so I could get some quick results. I don't know how similar things will be when I write 900GB (I will try it later, it just takes so long :) )
$ sync; dd if=/dev/zero of=~/SSD/tempfile bs=128000 count=234375; sync
234375+0 records in
234375+0 records out
30000000000 bytes (30 GB, 28 GiB) copied, 13.8327 s, 2.2 GB/s

$ sync; dd if=/dev/zero of=~/SSD/tempfile bs=131072 count=228882; sync
228882+0 records in
228882+0 records out
30000021504 bytes (30 GB, 28 GiB) copied, 13.463 s, 2.2 GB/s
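
If I understand the suggestion, a plain fio run with a normal backend through the filesystem would look something like the command below. The path, block size, test size, and queue depth are just my guesses, not anything from the review setup.

Code:
# Sequential write to a file on the ext4 mount using the libaio backend.
# --direct=1 bypasses the page cache and --end_fsync=1 flushes before
# fio reports the result, so the number isn't inflated by caching.
fio --name=seqwrite \
    --filename="$HOME/SSD/fio-testfile" \
    --rw=write --bs=128k --size=30G \
    --ioengine=libaio --iodepth=32 --direct=1 \
    --end_fsync=1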
 

Soulkeeper

Diamond Member
Nov 23, 2001
6,712
142
106
To be fair, you saw about a 3% improvement. For best results, run the same thing 2-3x and take the average.
hdparm -tT /dev/nvme0n1 is another option
It's very hard to isolate these things because of the drive's DRAM buffer, NVMe host-side caching, kernel/driver overhead, syncing, and other variables.
Your drive is also mounted and has a filesystem on it, which impacts things. If you don't care about data loss, benchmark the entire device with a destructive read/write test for the best results, i.e. against /dev/nvme0n1 directly.

The Phoronix Test Suite has several disk suites/tests you can use too (aio-stress, fs-mark, iozone, tiobench, postmark, etc.)
Here is an older test I ran with that, for comparison https://openbenchmarking.org/result/1306266-SO-1306266SO28
I recently stumbled across https://github.com/koct9i/ioping for latency testing.
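
Something along these lines would exercise the raw device, assuming your namespace really is /dev/nvme0n1 and you run it as root; the block size and queue depth are just example values:

Code:
# WARNING: the fio run writes over the whole device and destroys all data on it.
fio --name=raw-seqwrite --filename=/dev/nvme0n1 \
    --rw=write --bs=128k --ioengine=libaio --iodepth=32 --direct=1

# Non-destructive latency spot-check with ioping (reads only, direct I/O):
ioping -c 20 -D /dev/nvme0n1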
 

garengllc

Junior Member
Feb 25, 2020
4
0
6
Thanks all for the help, I am still diving into this. I went off on a libaio rabbit hole and had to shelve that for now. What makes things difficult in my actual program, versus all the speed-test examples (which I know is what I originally asked about), is that I will receive data in a bursty fashion (say 1018B at a shot; it is always the same size), so I will need to either buffer it internally or make sure the write work gets done between data captures. I went down the libaio path since I figured it roughly covered that use case, but it was a mess to incorporate into my larger app.

I will take a look at your links next, Soulkeeper.