In this blog, published on InfoWorld, Dr. Tong Zhang describes how off-loading heavy-duty computation tasks to storage nodes can enable efficient scaling of data center performance.
Introducing the Computational Storage Drive
The inevitable slowing of Moore’s Law has pushed the computing industry to undergo a paradigm shift from traditional CPU-only homogeneous computing to heterogeneous computing, in which CPUs are complemented by special-purpose, domain-specific computing fabrics. This shift is well reflected by the tremendous growth of hybrid CPU/GPU computing, significant investment in AI/ML processors, the wide deployment of SmartNICs, and, more recently, the emergence of computational storage drives. Not surprisingly, as a new entrant into the computing landscape, the computational storage drive is unfamiliar to most people, and many questions naturally arise, e.g., “What is a computational storage drive?”, “Where should it be used?”, and “What kind of computational function/capability should it provide?”
Resurgence of a Simple, Decades-Old Idea
The essence of computational storage is to empower data storage devices with additional data processing or computing capabilities. Loosely speaking, any data storage device, built on any storage technology (such as flash memory or magnetic recording), that can carry out data processing tasks beyond its core data storage duty can be called a computational storage drive. The idea of empowering data storage devices with additional computing capability is certainly not new. It can be traced back more than 20 years, to the intelligent memory (IRAM) and intelligent disks (IDISKs) papers from Professor David Patterson’s group at UC Berkeley around 1997. Fundamentally, computational storage complements host CPUs to form a heterogeneous computing platform. Early academic research showed that such a heterogeneous computing platform can significantly improve performance and/or energy efficiency for a variety of applications, such as databases, graph processing, and scientific computing. However, the industry chose not to adopt the idea for real-world applications, simply because storage professionals of the time could not justify investment in such a disruptive concept in the presence of steady CPU advancement. As a result, the topic lay largely dormant for the past two decades.
Fortunately, the idea has recently seen a significant resurgence of interest from both academia and industry, driven by two grand industry trends:
- There is a growing consensus that heterogeneous computing must play an increasingly important role as CMOS technology scaling slows down.
- The significant progress of high-speed solid-state storage technologies has pushed the system bottleneck from data storage to computing.

The concept of computational storage natively matches both trends. Not surprisingly, we have seen resurgent interest in the topic over the past few years, not only from academia but also, arguably more importantly, from industry. Momentum in this space was highlighted when the NVMe standards committee recently commissioned a working group to extend NVMe to support computational storage drives, and SNIA (Storage Networking Industry Association) formed a working group to define a programming model for computational storage drives.
Going into the Real World
As data centers have become the cornerstone of modern information technology infrastructure, responsible for storing and processing ever-growing amounts of data, they are clearly the best place for computational storage drives to begin their journey toward real-world application. The key question is how computational storage drives can best serve the needs of data centers. Data centers prioritize cost savings, and their hardware TCO (total cost of ownership) can be reduced via only two avenues: (1) cheaper hardware manufacturing, and (2) higher hardware utilization. The slowdown of technology scaling forces data centers to rely increasingly on the second avenue, which naturally leads to the current trend toward compute and storage disaggregation. Although the term “computation” does not appear in their name, storage nodes in a disaggregated infrastructure can be responsible for a wide range of heavy-duty computational tasks:
- Storage-centric computation: Cost saving demands the pervasive use of at-rest data compression in storage nodes. Lossless data compression is well known for its significant CPU overhead, mainly because of the high CPU cache miss rate caused by the randomness in the compression data flow. Meanwhile, storage nodes must also ensure at-rest data encryption. Moreover, data deduplication and RAID or erasure coding can also be on the task list of storage nodes. All of these storage-centric tasks demand a significant amount of computing power.
- Network-traffic-alleviating computation: Disaggregated infrastructure imposes a variety of application-level computation tasks on storage nodes in order to alleviate the burden on inter-node networks. In particular, compute nodes can off-load low-level data processing functions such as projection, selection, filtering, and aggregation to storage nodes, greatly reducing the amount of data that must be transferred back to the compute nodes (see the sketch after this list).
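To make the pushdown idea concrete, below is a minimal, purely illustrative Python sketch. The table, function names, and pushdown interface are all hypothetical; a real storage node would expose this capability through something like an NVMe computational-storage command set rather than local function calls.

```python
# Illustrative sketch (not a real storage-node API): pushing selection,
# projection, and aggregation down to the storage node shrinks the data
# that must cross the inter-node network.
import json

# Pretend this table lives on a storage node's drives.
ROWS = [
    {"user_id": i, "region": "eu" if i % 3 else "us", "bytes_used": i * 1024}
    for i in range(10_000)
]

def scan_without_pushdown():
    """Naive path: ship every row to the compute node, then filter there."""
    payload = json.dumps(ROWS).encode()  # the full table crosses the network
    rows = json.loads(payload)
    return sum(r["bytes_used"] for r in rows if r["region"] == "us")

def scan_with_pushdown():
    """Pushdown path: the storage node filters, projects, and aggregates,
    returning only the final scalar to the compute node."""
    partial = sum(r["bytes_used"] for r in ROWS if r["region"] == "us")
    payload = json.dumps(partial).encode()  # a few bytes cross the network
    return json.loads(payload), len(payload)

if __name__ == "__main__":
    full = json.dumps(ROWS).encode()
    result, shipped = scan_with_pushdown()
    assert result == scan_without_pushdown()
    print(f"without pushdown: {len(full):,} bytes shipped")
    print(f"with pushdown:    {shipped:,} bytes shipped")
```

The point of the sketch is the asymmetry: the naive path ships the entire serialized table across the network, while the pushdown path ships only a few bytes of aggregated result.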
To reduce storage node cost, it is necessary to off-load heavy computation from CPUs. Compared with the conventional design practice of off-loading computation to standalone PCIe accelerators, migrating computation directly into each storage drive is a much more scalable solution. It also minimizes data traffic over memory/PCIe channels and avoids data computation/transfer hotspots. This naturally calls for computational storage drives. Storage-centric computation tasks (in particular, compression and encryption) are the most convenient low-hanging fruit for computational storage drives to pick: their computation-intensive, fixed-function nature makes compression/encryption perfectly suited to implementation as customized hardware engines inside computational storage drives.
Moving beyond storage-centric computation, computational storage drives can further assist storage nodes with the computation tasks that alleviate inter-node network traffic. The tasks in this category are application-dependent and hence demand a programmable computing fabric (e.g., ARM or RISC-V cores, or even an FPGA) inside computational storage drives. Clearly, “computation” and “storage” inside a computational storage drive must work together cohesively and seamlessly to provide the best possible end-to-end computational storage service. As host-side PCIe and memory bandwidth continue to improve, tight integration of “computation” and “storage” becomes even more important. Therefore, it is necessary to integrate the computing fabric and the storage media control fabric into one chip.
Architecting Computational Storage Drives
At a glance, a commercially viable computational storage drive should have the architecture illustrated in the figure below: a single chip integrates the flash memory control and computing fabrics, connected by a very high-bandwidth on-chip bus, and the flash memory control fabric can serve flash access requests from both the host and the computing fabric. Given the universal use of at-rest compression/encryption in data centers, computational storage drives must own compression/encryption in order to further assist with any application-level computation tasks. Therefore, computational storage drives should strive to provide best-in-class support for compression and encryption, ideally in both in-line and off-loaded modes.
For in-line compression/encryption, computational storage drives implement compression and encryption directly along the storage IO path, transparently to the host: for each write IO request, data goes through the pipelined compression→encryption→write-to-flash path; for each read IO request, data goes through the pipelined read-from-flash→decryption→decompression path. Such in-line data processing minimizes the latency overhead induced by compression/encryption, which is highly desirable for latency-sensitive applications such as relational databases. Moreover, computational storage drives may integrate additional compression and security hardware engines to provide off-loading services through well-defined APIs. Security engines could include modules such as a root of trust, a random number generator, and multi-mode private/public key ciphers. The embedded processors are responsible for assisting host CPUs with the various network-traffic-alleviating functions. Finally, it is key to remember that a good computational storage drive must first be a good storage device: its IO performance must be at least comparable to that of a normal storage drive. Without a solid storage foundation, computation becomes practically irrelevant.
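To make the in-line IO path concrete, here is a minimal host-side model in Python. It is a sketch under loose assumptions: zlib stands in for the drive’s hardware compression engine, and a toy (deliberately insecure) XOR keystream stands in for its AES engine. A real computational storage drive performs both steps in hardware, transparently to the host.

```python
# Minimal model of the in-line IO path: write = compress -> encrypt -> flash,
# read = flash -> decrypt -> decompress. All names here are hypothetical.
import hashlib
import zlib

KEY = b"demo-key"  # hypothetical key; real drives manage keys in hardware

def _keystream(length: int) -> bytes:
    """Toy keystream (NOT secure), a stand-in for a hardware AES engine."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(KEY + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

FLASH = {}  # logical block address -> stored payload

def write_io(lba: int, data: bytes) -> None:
    """Write path: compress -> encrypt -> write-to-flash."""
    compressed = zlib.compress(data)
    stream = _keystream(len(compressed))
    FLASH[lba] = bytes(a ^ b for a, b in zip(compressed, stream))

def read_io(lba: int) -> bytes:
    """Read path: read-from-flash -> decrypt -> decompress."""
    stored = FLASH[lba]
    stream = _keystream(len(stored))
    return zlib.decompress(bytes(a ^ b for a, b in zip(stored, stream)))

if __name__ == "__main__":
    page = b"hello computational storage " * 100
    write_io(0, page)
    assert read_io(0) == page
    print(f"logical {len(page)} bytes stored as {len(FLASH[0])} bytes")
```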
Following the above reasoning and the architecture it naturally yields, ScaleFlux Inc. (a Silicon Valley startup) has launched the world’s first computational storage drives for data centers. Its products are being deployed in hyperscale and webscale data centers worldwide, helping data center operators reduce system TCO in two ways:
- Storage node cost reduction: The CPU load reduction enabled by ScaleFlux’s computational storage drives allows storage nodes to use less expensive CPUs. Hence, without changing the compute/storage load on each storage node, one can directly deploy computational storage drives to reduce per-node CPU and storage cost.
- Storage node consolidation: One can leverage the CPU load reduction and the intra-node data traffic reduction to consolidate the workloads of multiple storage nodes onto one storage node. Meanwhile, the transparent data reduction (e.g., compression) enabled by computational storage drives largely increases the effective per-drive storage density/capacity, which further supports storage node consolidation (see the rough sketch after this list).
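As a rough back-of-the-envelope sketch of the consolidation argument, the Python snippet below uses purely illustrative ratios; none of these figures come from the article or from ScaleFlux.

```python
# Hypothetical consolidation arithmetic: freed CPU headroom and effective
# capacity gains together bound how many nodes can be merged into one.
NODES_BEFORE = 4            # storage nodes running a given workload today
CPU_LOAD_REDUCTION = 0.30   # assumed CPU freed by off-loading compression/encryption
COMPRESSION_RATIO = 2.0     # assumed effective capacity gain from transparent compression

effective_capacity_per_node = COMPRESSION_RATIO
effective_cpu_per_node = 1.0 / (1.0 - CPU_LOAD_REDUCTION)

# Consolidation is limited by whichever resource gains less headroom.
consolidation_factor = min(effective_capacity_per_node, effective_cpu_per_node)
nodes_after = NODES_BEFORE / consolidation_factor

print(f"~{NODES_BEFORE} nodes consolidate to ~{nodes_after:.1f} nodes "
      f"(limited by the smaller of capacity x{effective_capacity_per_node} "
      f"and CPU x{effective_cpu_per_node:.2f} gains)")
```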
Looking into the Future
The inevitable paradigm shift toward heterogeneous and domain-specific computing opens a wide door for opportunities and innovations. Natively echoing the wisdom of moving computation closer to data, computational storage drives are destined to become an indispensable component of future computing infrastructure. Driven by industry-wide standardization efforts (e.g., NVMe and SNIA), this emerging area is being actively pursued by more and more companies. It will be exciting to see how this disruptive new technology progresses and evolves over the next few years.
Want to learn more? Read Dr. Zhang’s white paper, “Computational Storage Drives and Data Processing Units: Friends or Foes?”