Hi, I’m Tim Amundsen. After over ten years in the memory and storage industry, engaging with customers as an SSD Application Engineer, Field Sales Engineer, and Solutions Architect, I created a quick and straightforward guide to SSD testing.
Purchasing the latest-and-greatest storage device should be considered an investment that will bring a return for years. And like any wise investor, you need to do your homework before placing a purchase order. Especially in the enterprise segment, reading the datasheet or someone else’s drive review is not good enough. Lucky for you, evaluating drives for enterprise workloads can be a simple and rewarding experience, and I’m going to show you how to do it.
This blog will guide you through a simple SSD performance checkout. This is more of an evaluation than a qualification or validation exercise. An evaluation is a relatively quick and effective assessment to confirm interoperability and performance, while a validation or qualification is a more in-depth assessment. I love Pareto’s Principle and believe it applies to SSD testing. You get 80% of the decision-defining data from just 20% of the work and time.
Take the Time to Test the Drive for Yourself
The datasheet performance numbers provided by the manufacturer are often best-case or corner-case bandwidth and IOPS measurements and are not representative of performance in your system. CPU, DRAM, BIOS settings, and OS settings significantly influence SSD performance. So, let’s ensure you test in an environment as close to your production/final configuration as possible. Those datasheet numbers simply provide a broad target. I expect to get within 10% of the datasheet numbers when I do my drive testing. If you don’t achieve that rule of thumb, review the pre-work section below and reach out to your SSD vendor for additional guidance. They are usually accommodating.
First, establish your baseline and metrics for “better” or “good enough.” Your target may be simply relative to what you had before. Was the last drive good enough on performance, and now you want more capacity? Or was the system held back because of a storage performance bottleneck? Either way, let’s establish a baseline with the existing drives. Then we can repeat the checkout steps on the new drive and have discussion-defining data to justify the investment.
Now a quick reminder, this is a high-level evaluation guide, not a full qualification or validation test flow. I will not provide details on testing signal integrity, out-of-band management operations, thermals, shock, vibration, and other environmental metrics. Instead, we will keep it simple and focus on performance testing, including preparing the system, preparing the drives, providing a recommended tool, understanding workloads, and reviewing results. None of the information I share here is new, novel, or proprietary. SSD testing practices are well-defined so that we can all use the same measuring stick…usually. This blog references many excellent publications, and links are provided so you can dive deeper and get a second opinion.
Step 1 – Prepare the System
Just as you cannot put a square peg in a round hole, ensure the new drive you want to test is physically compatible with your system. If you are replacing an existing drive, this is an easy visual comparison between the old and new drive, paying attention to the pins. For example, 2.5” SATA and U.2 7mm NVMe drives look similar except for the pins. If you are introducing a new form factor to the system, check the service manual and verify the availability of the slots. The Storage Networking Industry Association (SNIA) has a great outline on their site of the most common data center SSD form factors, including U.2 and E1.S.
Now let’s check the system’s BIOS/BMC settings. If you are replacing an existing drive, this may not be needed, but if you are introducing a new drive type to the system, ensure you supply enough airflow to the drive. Solid-state drives can peak at 25 watts and rely on airflow, managed through the BIOS/BMC, to stay cool enough to perform correctly. To access the BIOS/BMC and increase fan speed, check the system documentation or reach out to your system manufacturer. For example, on a Dell PowerEdge system, you change fan speed preferences via iDRAC.
Next, power on the system and confirm that the drive has been appropriately enumerated. In Windows, we can do this by going to ‘Disk Management’ as outlined in this guide by Kazim Ali Alvi at allthings.how. On a Linux system, run this command in the terminal: $ sudo nvme list.
NVMe-CLI is a great Linux tool and referenced throughout this guide. If you want to dive deeper into NVMe-CLI, check out this article by JM Hands, SSD Product Line Manager at Intel, and NVME Working Group Co-Chair.
While confirming proper enumeration, let’s check the PCIe lane count and speed. It is possible for the drive to enumerate but find itself stuck at a lower lane count or speed. If you are working in a Windows environment, follow this guide from Avalue Technology. And for Linux, use this guide from Dell. Most drives today are PCIe Gen 4, which runs at 16 GT/s per lane, and consume four PCIe lanes.
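Knowing the link’s theoretical ceiling helps you judge whether a result is link-limited. The back-of-envelope sketch below assumes a Gen 4 x4 link and ignores protocol overhead; the integer math rounds the commonly quoted ~7.9 GB/s figure down:

```shell
# Rough bandwidth ceiling for a PCIe Gen 4 x4 link.
# 16 GT/s per lane (one bit per transfer), 4 lanes, 128b/130b encoding.
gts=16
lanes=4
gbps=$(( gts * lanes * 128 / 130 / 8 ))  # integer GB/s, rounds down
echo "~${gbps} GB/s raw link ceiling"
```

If your sequential read results cap far below this and the datasheet number, re-check the negotiated lane count and speed before blaming the drive.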
Step 2 – Prepare the Drive
An essential step to preparing the drive for testing is to ensure you have the latest firmware. If ‘firmware’ is a new term for you or you want to learn more about it, Micron produced an informative video explaining the fundamentals of SSD firmware. Using the latest firmware may improve performance, stability, and security. Confirming the current firmware version is straightforward. You can follow this guide from Dell for a Windows system using the command line or this guide from Micron for using Windows Device Manager. On Linux, we use NVMe-CLI as outlined in JM’s guide referenced earlier. Check the vendor support website to confirm you are on the latest firmware and use their recommended tool or command to perform the update.
Starting with a clean slate will remove the impact of past operations on the drive performance. If you are using a brand new drive, you can skip this step. To get that “clean slate,” let’s wipe all data on the drive. On a Windows system, use the diskpart clean command as outlined in this article on WinAero. On a Linux system, use the command $ sudo nvme format -s 1 <the path to the drive, e.g., /dev/nvme0n1>.
In this fresh-out-of-box state, an SSD accepts IO faster than usual. This is nice, but it will not last. This SNIA presentation shows the temporary elevated performance on a fresh drive followed by a sudden drop known as the ‘write cliff.’ Next comes a gradual decline in performance before reaching a more consistent and realistic number. The process for getting to steady state performance is called preconditioning. To precondition a drive, perform two entire drive writes of the target workload. You can use this guide from Intel on how to precondition using FIO. We will talk more about FIO in the next section. Remember to do this before changing workload types, e.g., jumping from random to sequential or from small to large block sizes.
Step 3 – Tools for Synthetic Benchmarking SSDs
There are many tools for pushing synthetic workloads to an SSD and measuring performance, but they are not all equal. We can have a separate blog on this topic, but in a nutshell, the tool needs to check the following boxes.
- Ability to precondition the drive
- Ability to exclude measurements when performance is still ramping
- Ability to control the workload basics: block size, sequential vs. random, read vs. write mix, queue depth
- Ability to run a series of back-to-back tests with no human interaction
- Ability to control the compressibility of the data
As the name denotes, Flexible I/O (FIO) is a very flexible tool with many knobs to turn as you create a synthetic workload to model your actual workload and data. Written by Jens Axboe, it has become an industry-standard tool that removes the hassle of writing special test case programs for each specific workload. It runs on both Windows and Linux, and I recommend downloading this free tool for all your SSD testing.
FIO’s syntax and arguments are simple; you can follow this guide from Oracle to understand the basics of FIO. But one element that is often overlooked is controlling data compressibility. You want to set the compression ratio if the storage devices can compress data independently of the CPU. You can read about this in the FIO syntax and arguments referenced above, but for your notes, these are the arguments for 60% compressible data: --buffer_compress_percentage=60 --buffer_compress_chunk=4096 --refill_buffers=1. Several full FIO command examples are below.
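To get a feel for what a compressibility target means, here is a small illustration independent of fio, using gzip as a stand-in for drive-side compression (the file name and sizes are arbitrary choices of mine):

```shell
# Build a 100 KiB sample that is roughly 60% compressible:
# 60 KiB of zeros (trivially compressible) plus 40 KiB of random data (not).
head -c 61440 /dev/zero    >  sample.bin
head -c 40960 /dev/urandom >> sample.bin
gzip -c sample.bin > sample.bin.gz
orig=$(wc -c < sample.bin)
comp=$(wc -c < sample.bin.gz)
echo "original: ${orig} bytes, compressed: ${comp} bytes"
```

The compressed copy lands near the size of the 40 KiB random portion: the zero run collapses while the random bytes do not. A drive with built-in compression benefits the same way, which is why leaving fio’s write buffers fully compressible can wildly inflate results on such hardware.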
Step 4 – Workloads
As mentioned, we need a measuring stick to show quantifiable improvement. One industry standard methodology for establishing a measuring stick is “four-corner” testing. This method will test the following workloads with a high queue depth. Remember to precondition before changing from sequential to random so you can reach steady state performance. “Four-corners” test results are often included in the product datasheet.
- 128k Sequential Read
- 128k Sequential Write
- 4k Random Read
- 4k Random Write
A better way to quantify improvement is to create a workload that represents real-world application data. In reality, data is a mix of reads and writes. The image below from Solidigm shows workloads across a variety of applications. In FIO, you can control the read-to-write mix with these arguments: --rw=randrw --rwmixread=70.
Here are a few example FIO scripts. They assume a 7.68 TB drive (--size=7680g) at /dev/nvme0n1; adjust the size and device path to match your drive:
- Sequential Fill 1
$ sudo fio --filename=/dev/nvme0n1 --rw=write --size=7680g --bs=512k --iodepth=8 --numjobs=1 --ioengine=libaio --buffer_compress_percentage=60 --buffer_compress_chunk=4096 --refill_buffers=1 --randrepeat=0 --norandommap=1 --log_avg_msec=200 --name=preconditioning_fill_1 --direct=1 --group_reporting=1 --output=./preconditioning_fill_1.log
- Sequential Fill 2 (remember two drive fills to reach steady state performance)
$ sudo fio --filename=/dev/nvme0n1 --rw=write --size=7680g --bs=512k --iodepth=8 --numjobs=1 --ioengine=libaio --buffer_compress_percentage=60 --buffer_compress_chunk=4096 --refill_buffers=1 --randrepeat=0 --norandommap=1 --log_avg_msec=200 --name=preconditioning_fill_2 --direct=1 --group_reporting=1 --output=./preconditioning_fill_2.log
- 4k Random Read
$ sudo fio --filename=/dev/nvme0n1 --rw=randread --size=7680g --ramp_time=240 --runtime=120 --bs=4k --iodepth=32 --numjobs=32 --ioengine=libaio --buffer_compress_percentage=60 --buffer_compress_chunk=4096 --refill_buffers=1 --randrepeat=0 --norandommap=1 --log_avg_msec=200 --name=4k_random_read --direct=1 --group_reporting=1 --output=./4k_random_read.log
- 4k Random Write
$ sudo fio --filename=/dev/nvme0n1 --rw=randwrite --size=7680g --ramp_time=240 --runtime=120 --bs=4k --iodepth=32 --numjobs=32 --ioengine=libaio --buffer_compress_percentage=60 --buffer_compress_chunk=4096 --refill_buffers=1 --randrepeat=0 --norandommap=1 --log_avg_msec=200 --name=4k_random_write --direct=1 --group_reporting=1 --output=./4k_random_write.log
- 4k 70% Read 30% Write
$ sudo fio --filename=/dev/nvme0n1 --rw=randrw --size=7680g --ramp_time=240 --runtime=120 --bs=4k --iodepth=32 --numjobs=32 --ioengine=libaio --rwmixread=70 --buffer_compress_percentage=60 --buffer_compress_chunk=4096 --refill_buffers=1 --randrepeat=0 --norandommap=1 --log_avg_msec=200 --name=4k_70read_30write --direct=1 --group_reporting=1 --output=./4k_70read_30write.log
- 128k Sequential Write
$ sudo fio --filename=/dev/nvme0n1 --rw=write --size=7680g --ramp_time=240 --runtime=120 --bs=128k --iodepth=32 --numjobs=32 --ioengine=libaio --buffer_compress_percentage=60 --buffer_compress_chunk=4096 --refill_buffers=1 --randrepeat=0 --norandommap=1 --log_avg_msec=200 --name=128k_sequential_write --direct=1 --group_reporting=1 --output=./128k_sequential_write.log
- 128k Sequential Read
$ sudo fio --filename=/dev/nvme0n1 --rw=read --size=7680g --ramp_time=240 --runtime=120 --bs=128k --iodepth=32 --numjobs=32 --ioengine=libaio --buffer_compress_percentage=60 --buffer_compress_chunk=4096 --refill_buffers=1 --randrepeat=0 --norandommap=1 --log_avg_msec=200 --name=128k_sequential_read --direct=1 --group_reporting=1 --output=./128k_sequential_read.log
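Typing these one at a time invites typos and idle time between runs. As a sketch of the “no human interaction” requirement from Step 3, the script below chains the four corners back to back. The device path, size, and argument choices are placeholders to adapt, and the sudo line is commented out so it is a dry run by default:

```shell
#!/bin/sh
# Unattended four-corner sweep (sketch). Adjust DEV and SIZE to your drive.
DEV=/dev/nvme0n1
SIZE=7680g

run_fio() {  # $1=job name, $2=rw pattern, $3=block size
    cmd="fio --filename=$DEV --size=$SIZE --rw=$2 --bs=$3 \
--iodepth=32 --numjobs=32 --ioengine=libaio --direct=1 \
--ramp_time=240 --runtime=120 --group_reporting=1 \
--name=$1 --output=./$1.log"
    echo "$cmd"    # log the exact command before it runs
    # sudo $cmd    # uncomment on the test system
}

run_fio 128k_sequential_read  read      128k
run_fio 128k_sequential_write write     128k
run_fio 4k_random_read        randread  4k
run_fio 4k_random_write       randwrite 4k
```

Insert the preconditioning fills before the sequential and random groups, per Step 2, and the sweep can run overnight without supervision.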
Step 5 – Assessing Results
After running your FIO script, you will get a text output. Amy Tobey writes a great blog detailing each part of the FIO results printout. For random workloads, make note of the IOPS. For sequential workloads, make note of the bandwidth.
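The two metrics are tied together by block size, which gives you a quick cross-check on any result. The numbers below are invented purely for illustration:

```shell
# bandwidth = IOPS x block size. A hypothetical 4k random read result:
iops=800000    # made-up IOPS figure
bs=4096        # block size in bytes
mbps=$(( iops * bs / 1000000 ))
echo "${mbps} MB/s"  # 800000 x 4096 / 1,000,000 = 3276 MB/s
```

If a run reports healthy IOPS but suspicious bandwidth, or vice versa, checking this identity against the logged block size is the fastest way to spot a misconfigured job.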
Though this guide is not exhaustive on assessing an SSD, I hope you found it helpful in providing an outline to follow and offering further resources for additional study. You don’t need to be an expert or spend months testing to get quantifiable data to help your storage purchase decision.
You also have an opportunity to contribute to the knowledge pool. As you write FIO scripts to test drives with your workload, please share them on GitHub or another platform.