
Maximizing IT Infrastructure Efficiency: Hardware vs. Software Compression (Part 2)

  • JB Baker 
  • 5 min read

The goal of this Part 2 is to answer an important question: should I use hardware or software compression? 🤔

This is where you really start to consider system configuration and architectural choices, and to make more complex tradeoffs.

First off, let’s align on how we’re using these terms:

  • Software compression refers to running the compression algorithm on the system CPU. This is the default method for compressing data; some applications offer SW compression settings and/or enable some form of SW compression by default.
  • Hardware compression refers to using a dedicated state machine that runs the compression algorithm. This state machine is a specific set of circuitry in a chip (SoC, ASIC, or FPGA) that is set up to run one fixed function (compression or decompression, in this case). In computing, repetitive, fixed-function tasks such as compression and decompression can be done faster and more efficiently in state machines than in software running on general-purpose processors.
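
To make the software side concrete, here is a minimal sketch of SW compression using Python's standard `zlib` module (a DEFLATE implementation that runs entirely on the host CPU). The payload and level are illustrative choices, not recommendations:

```python
import zlib

# Software compression: the algorithm runs on the system CPU.
data = b"example payload " * 4096          # ~64 KB of repetitive data
compressed = zlib.compress(data, level=6)  # DEFLATE at a middle-of-the-road level

ratio = len(data) / len(compressed)
print(f"{len(data)} B -> {len(compressed)} B (ratio {ratio:.1f}:1)")

# Decompression is the mirror image, also burning CPU cycles.
assert zlib.decompress(compressed) == data
```

With hardware compression, the same logical transformation happens, but inside a fixed-function engine (e.g., in the SSD controller) instead of consuming these CPU cycles.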

Now, let’s take a look at some of the effects of choosing Hardware or Software for compression:

Compression algorithms are computationally intense, especially when dealing with large datasets or when using one of the heavier weight compression algorithms as mentioned in part 1. Offloading compression to dedicated hardware or specialized accelerators can significantly speed up the compression process and improve overall system performance.
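
You can see the computational cost scale with algorithm weight on your own machine with a quick timing sketch. The buffer mix and levels below are arbitrary assumptions; absolute timings will vary widely by CPU:

```python
import os
import time
import zlib

# 2 MB test buffer: half incompressible (random), half highly compressible.
data = os.urandom(1 << 20) + b"A" * (1 << 20)

# Higher zlib levels stand in for "heavier weight" algorithms: more CPU
# time spent searching for matches in exchange for a smaller output.
for level in (1, 6, 9):
    t0 = time.perf_counter()
    out = zlib.compress(data, level)
    dt = time.perf_counter() - t0
    print(f"level {level}: {len(out) / 1e6:.2f} MB out in {dt * 1e3:.1f} ms")
```

The heavier settings typically produce a smaller output but take noticeably longer per buffer, which is exactly the CPU cost that offloading to a hardware engine removes.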

Compression tasks can consume a significant amount of CPU resources, especially in scenarios where multiple compression tasks are running concurrently or alongside other computationally demanding processes. While the first-level benefit of offloading is simply "freeing up CPU cycles for other critical tasks," there's also a benefit to the efficiency of those CPU cycles. For each compression action, the CPU has to time-slice between compression and the application. That imposes an extra burden beyond what you might see in a simple benchmark that runs compression exclusively, rather than concurrently with multiple users hitting the application. The use of larger compression blocks (e.g., 16KB or 32KB) can also consume significant amounts of CPU cache, evicting application threads' data (cache pollution) and further impairing application performance and latency.
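
A rough way to see how fully a compressor occupies a core is to compare wall-clock time against process CPU time for one compression call. This is a sketch, not a rigorous benchmark; the buffer composition is an arbitrary assumption:

```python
import os
import time
import zlib

# ~4 MB buffer: alternating random and compressible regions.
data = (os.urandom(256 * 1024) + b"\x00" * (256 * 1024)) * 8

wall0, cpu0 = time.perf_counter(), time.process_time()
zlib.compress(data, 6)
wall = time.perf_counter() - wall0
cpu = time.process_time() - cpu0

# On an otherwise idle core, CPU time tracks wall time closely: the
# compressor keeps the core fully busy for the duration, leaving no
# headroom for application threads scheduled on that core.
print(f"wall: {wall * 1e3:.1f} ms, cpu: {cpu * 1e3:.1f} ms")
```

Note that this simple measurement cannot show the second-order cache-pollution effect described above; that shows up as degraded latency in the *application* threads sharing the cache, not in the compressor's own timing.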

Offloading compression to dedicated hardware or accelerators allows you to scale compression capability independently from the CPU. You can add or upgrade compression-specific hardware to handle increased workloads or higher data throughput without needing either to upgrade the entire system or to over-spec the system in the first place. If you're using NVMe SSDs and looking to get the most out of their performance potential, it's awfully tough for SW compression to keep up with the drives as you scale beyond 2 drives.
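
A back-of-envelope calculation shows why SW compression struggles to keep pace as drives are added. The per-drive and per-core throughput figures below are illustrative assumptions for the sketch, not measured values from Figure 1:

```python
# Assumptions (hypothetical, for illustration only):
DRIVE_GBPS = 6.0   # sequential throughput of one NVMe SSD, GB/s
CORE_GBPS = 0.5    # SW compression throughput of one CPU core, GB/s

for drives in (1, 2, 4, 8):
    total_gbps = drives * DRIVE_GBPS
    cores_needed = total_gbps / CORE_GBPS
    print(f"{drives} drive(s) = {total_gbps:.0f} GB/s -> "
          f"~{cores_needed:.0f} cores just for compression")
```

Under these assumed numbers, even two drives would demand roughly two dozen cores dedicated to compression alone, while drive-based HW compression adds a compression engine with every drive, so throughput scales linearly with the drive count.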

Figure 1: Can the throughput scale to keep up with your storage? With 12 Xeon cores running SW compression, the CPUs are maxed out before 2 NVMe SSDs are saturated. Drive-based HW compression enables throughput to scale with every drive added.

Compression tasks can be power-hungry, particularly when performed by the CPU. Dedicated compression hardware or accelerators are often optimized for power efficiency, enabling you to achieve compression at a lower energy cost. This benefit is particularly important in mobile devices, embedded systems, or battery-powered devices where power consumption is a concern. In Part 1, I referenced the compression throughput per core with various compression algorithms. Taking a look at the power that CPU compression would consume in comparison to HW compression is eye-popping: even after giving up some storage capacity savings by using LZ4, you're looking at roughly 100x the power per GB of data compressed in SW instead of HW1.
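
The energy comparison reduces to simple arithmetic: watts divided by throughput gives joules per GB. The wattage figures below are hypothetical placeholders chosen to illustrate a 100x gap, not the measured values behind Figure 2:

```python
# Target: sustain 4 GB/s of compression throughput.
TARGET_GBPS = 4.0

# Hypothetical power draws (assumptions for illustration only):
SW_WATTS = 150.0   # CPU cores running SW compression at 4 GB/s
HW_WATTS = 1.5     # incremental power of in-SSD compression engines

for name, watts in (("SW (CPU)", SW_WATTS), ("HW (SSD engine)", HW_WATTS)):
    joules_per_gb = watts / TARGET_GBPS   # W / (GB/s) = J/GB
    print(f"{name}: {joules_per_gb:.2f} J per GB compressed")

print(f"SW/HW power ratio: {SW_WATTS / HW_WATTS:.0f}x")
```

At data-center scale, a per-GB energy gap of this magnitude compounds across every write, which is why the power axis in Figure 2 is so lopsided.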

Figure 2: Total power needed to provide 4GB/s of compression throughput with GZIP and LZ4 SW running on Xeon Gold cores vs GZIP-equivalent compression running in a HW state machine in the SSD controller. Lower is better!

Some compression algorithms, such as the H.264 or HEVC video codecs, are specifically designed with hardware acceleration in mind. Offloading these algorithms to specialized hardware can yield significant efficiency gains and much higher throughput compared to relying solely on the CPU.

Compression tasks can often be parallelized, meaning that they can be split into multiple subtasks and processed simultaneously. Dedicated compression hardware or accelerators often offer parallel processing capabilities, allowing for faster compression and decompression times compared to sequential processing on a CPU.
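
The parallelization idea can be sketched in a few lines: split the input into independent chunks and compress them concurrently. This is a simplified illustration (in CPython, `zlib` releases the GIL during compression, so threads genuinely overlap); the chunk size and worker count are arbitrary assumptions:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # 1 MB chunks (illustrative choice)

def compress_parallel(data: bytes, workers: int = 4) -> list[bytes]:
    """Split data into fixed-size chunks and compress them concurrently."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is an independent subtask, so they can run in parallel.
        return list(pool.map(lambda c: zlib.compress(c, 6), chunks))

def decompress_parallel(blocks: list[bytes]) -> bytes:
    """Reassemble the original data from independently compressed blocks."""
    return b"".join(zlib.decompress(b) for b in blocks)
```

A hardware compression engine applies the same principle with multiple fixed-function pipelines, without tying up CPU threads to get the parallelism. Note the tradeoff: compressing chunks independently sacrifices a little ratio versus one continuous stream, since matches can't span chunk boundaries.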

Overall, HW compression offers significantly better power efficiency and scalability of compression throughput without introducing the application performance penalties that can come from SW compression.  While some workloads won’t run into the penalties, applications that need the high read/write performance and low latency of SSDs can quickly become constrained by SW compression.

In part 3, we’ll take a look at the last couple of questions – where to compress and how to deal with encryption. 

JB Baker

JB Baker is a successful technology business leader with a 20+ year track record of driving top and bottom line growth through new products for enterprise and data center storage. He joined ScaleFlux in 2018 to lead Product Planning & Marketing as we expand the capabilities of Computational Storage and its adoption in the marketplace.