
ABOUT THE AUTHOR

DR. TONG ZHANG
Chief Scientist and Co-Founder, ScaleFlux

Dr. Tong Zhang is a well-established researcher with significant contributions to the areas of data storage systems and VLSI signal processing. He is a co-founder and Chief Scientist of ScaleFlux, responsible for developing key techniques and algorithms for Computational Storage products and exploring their optimal integration into mainstream application domains such as databases. Dr. Zhang is currently a Professor at Rensselaer Polytechnic Institute. With more than 15 years of experience, his research spans databases, filesystems, solid-state and magnetic data storage devices and systems, digital signal processing and communications, error-correction coding, VLSI architectures, and computer architecture. He has published over 150 technical papers with an H-index of 36 and has chaired multiple international conferences. He carried out pioneering work in the emerging field of flash memory signal processing, and his research group made key contributions to the large-scale practical adoption of low-density parity-check (LDPC) codes. He has received two conference best paper awards and holds more than 20 US patents. He earned his bachelor's and master's degrees in electronic and information engineering from Xi'an Jiaotong University and his Ph.D. in electrical and computer engineering from the University of Minnesota.
Computational Storage: An Inevitable Trend and ScaleFlux’s First Step
May 8th, 2018 | 4 Minute Read | Dr. Tong Zhang

ScaleFlux officially launched the industry's first Computational Storage product in 2017 and is now shipping its CSS 1000 Series (Computational Storage Subsystem) in volume to customers worldwide. As a new entrant into the IT ecosystem, Computational Storage may be unfamiliar to many and raises questions such as:

  1. What is Computational Storage?
  2. Why do I need Computational Storage?
  3. How will Computational Storage evolve and eventually become mainstream?

This post attempts to address these questions from ScaleFlux’s perspective.

A Simple and Decades-Old Idea

The essence of Computational Storage is to empower data storage devices with additional processing or computing capability. Loosely speaking, any data storage device (e.g., HDD, SSD, or DIMM) that can carry out data processing tasks beyond its core data storage duty can be classified as Computational Storage. Note that we use the term “storage” in a general sense; it does not have to be non-volatile or persistent.

The simple idea of empowering data storage devices with additional computing capability is certainly not new and can be traced back more than 20 years, e.g., to the intelligent memory (IRAM) and intelligent disks (IDISKs) papers from Professor David Patterson's renowned group at UC Berkeley around 1997. Fundamentally, Computational Storage complements host CPUs (and GPUs) to form a heterogeneous computing platform in which computing power naturally scales with data volume. Early academic research showed that such heterogeneous computing platforms could significantly improve performance and/or energy efficiency for a variety of applications, including databases, graph processing, and scientific computing. However, industry chose not to commercialize the idea because the potential benefits could not justify the upfront cost at the time.

Computational Storage is fundamentally subject to two cost overheads:

  1. Manufacturing costs: the cost of adding computing engines into data storage devices
  2. System integration costs: the cost of modifying the application source code and existing infrastructure software/hardware (SW/HW) stack (e.g., OS kernel, filesystem, and I/O interface) in order to embrace the underlying heterogeneous computing platform

As a result, the topic lay largely dormant for many years.

Its Resurgence Today

The idea of Computational Storage has recently seen a significant resurgence of interest from both academia and industry, driven by two sweeping trends:

  1. There is a growing consensus that heterogeneous computing (e.g., complementing the CPU with GPUs and/or FPGAs) is becoming increasingly necessary and indispensable as CMOS technology scaling slows down
  2. The significant progress of high-speed solid-state data storage technologies (NAND Flash and emerging non-volatile memory technologies such as 3DXP and STT-RAM) pushes the system bottleneck from data storage to computing

The concept of Computational Storage aligns very nicely with these two major trends, and as a result we have seen renewed interest in the topic over the past few years, during which many terms were coined, such as intelligent SSD, smart SSD, near-memory computing, and in-memory computing. At ScaleFlux, we are fully convinced that these trends open the door to transferring this decades-old idea from academic papers into real-world applications. That being said, as the famous quote

“Those who fail to learn from history are doomed to repeat it.”

teaches us, we must genuinely acknowledge that Computational Storage still faces the same cost obstacles today as it did 20 years ago. Although the trends above certainly give the industry more incentive to overcome these obstacles, we cannot simply ignore them, especially at this early stage of deployment. We must judiciously weigh the benefit vs. cost trade-off as we find our way forward on this unexplored path.

ScaleFlux’s View

How will Computational Storage evolve and eventually become mainstream? Without a crystal ball, we cannot definitively predict the long-term path it will take. Nevertheless, it is reasonable to expect that the path forward will be evolutionary, and a fundamental key to adoption is the gradual elimination of the stated cost obstacles.

Common sense tells us that one should start by picking low-hanging fruit that naturally exhibits a favorable benefit vs. cost trade-off. Specifically, it makes sense to initially avoid the challenge of convincing users to change their application source code and existing infrastructure SW/HW stack, and instead focus on applications that incur no system integration cost at all. In other words, we should choose computing tasks that can be migrated into storage devices with complete transparency to user applications and the existing infrastructure SW/HW stack.
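
To make "complete transparency" concrete, consider lossless compression. An application that already calls zlib through its standard API would not need to change at all if the compression work moved onto the drive; only the backend behind the same function signature changes. The minimal Python sketch below illustrates the principle; the `_device` handle is a hypothetical stand-in for an on-drive engine, not a real ScaleFlux API:

```python
import zlib

# Hypothetical handle to an on-drive compression engine; None means no
# Computational Storage device is present, so we fall back to software.
# (Illustrative sketch only -- not a real ScaleFlux API.)
_device = None

def compress(data: bytes, level: int = 6) -> bytes:
    """Drop-in stand-in with the same signature as zlib.compress()."""
    if _device is not None:
        return _device.compress(data, level)  # offloaded to the drive
    return zlib.compress(data, level)         # standard software path

payload = b"2018-05-08 INFO request served in 3ms\n" * 10_000
blob = compress(payload)
assert zlib.decompress(blob) == payload       # same zlib format either way
print(f"{len(payload):,} B -> {len(blob):,} B")
```

The key property is that the output stays in the standard zlib format, so data written through either path can be read back through the other, and neither the application nor the rest of the stack can tell the difference.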

Meanwhile, to best justify the manufacturing cost, we should choose computing tasks that are widely applicable in mainstream applications, highly compute-intensive, and well suited to customized FPGA/ASIC implementations. Equally important, Computational Storage devices must achieve top-notch raw storage performance (e.g., in terms of IOPS, throughput, latency, and predictability), and the total BOM (bill of materials) cost should be low enough to be at least comparable to commodity storage devices with similar performance.
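
Erasure coding, one of the tasks our first product targets, is a good example of such a fixed, well-defined kernel over bulk data. The toy Python sketch below uses a single XOR parity block (the RAID-5 idea; real deployments would use stronger codes such as Reed-Solomon) purely to show the shape of the computation that makes these tasks a natural fit for FPGA/ASIC offload:

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

stripe = [b"AAAA", b"BBBB", b"CCCC"]       # data blocks in one stripe
parity = xor_blocks(stripe)                # parity block stored alongside

lost = 1                                   # pretend block 1 disappeared
survivors = [blk for i, blk in enumerate(stripe) if i != lost]
rebuilt = xor_blocks(survivors + [parity])
assert rebuilt == stripe[lost]             # single-block recovery works
print(rebuilt)
```

The per-byte operation is trivial and identical across the whole stripe, which is exactly the regular, data-parallel structure that customized hardware executes far more efficiently than a general-purpose core.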

Successful commercialization of Computational Storage should start by being as transparent as possible to applications and the existing infrastructure SW/HW stack, even if the overall system performance gain from the resulting heterogeneous computing platform does not reach the 100x scale often claimed by “revolutionary” products.

This explains why ScaleFlux has designed and shipped its first-generation CSS 1000 product with the following features:

  1. Near-zero system integration cost overhead: The chosen computing tasks include lossless data compression (in particular zlib), erasure coding, and AES encryption, all of which are typically encapsulated in stand-alone libraries with clean and simple APIs exposed to applications and systems. We migrate the core of these computing tasks into the FPGA on the storage device while maintaining exactly the same APIs, as the compression sketch above illustrates. Meanwhile, the product leverages the NVMe infrastructure and integrates easily into existing application environments. As a result, our customers enjoy accelerated computing with only minor changes to their application source code and existing infrastructure SW/HW stack.
  2. Low manufacturing cost overhead: Thanks to our highly optimized flash memory controller design, ScaleFlux Computational Storage uses a single, moderate-cost FPGA device to handle both flash memory control and computing tasks. In addition, the FPGA incorporates a very powerful LDPC (low-density parity-check) engine that achieves superior error correction performance, which enables the use of low-cost 3D NAND flash memory.
  3. High-performance data storage: Data storage performance is on par with (and even beyond) existing high-end NVMe SSDs, e.g., >500K random read IOPS and 3 GB/s sequential read throughput. Even more importantly, it has a well-controlled tail latency profile, which is highly desirable for mission-critical applications. For example, it noticeably outperforms high-end NVMe SSDs when serving the key-value store Aerospike, which has among the most stringent tail latency requirements (please refer to the Aerospike SSD certification page for details). The sketch after this list shows why tail percentiles, rather than averages, are the right lens for this behavior.
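
Why focus on tail latency rather than averages? A tiny fraction of slow I/Os barely moves the mean, yet it dominates what a latency-sensitive store like Aerospike experiences at the 99.9th percentile. The following self-contained Python sketch uses purely synthetic latency numbers (illustrative only, not measurements of any product) to make the effect concrete:

```python
import random
import statistics

# Synthetic read latencies in microseconds: 99% fast reads plus 1% slow
# outliers, mimicking a device with a poorly controlled tail.
random.seed(1)
samples = ([random.gauss(90, 10) for _ in range(99_000)]
           + [random.gauss(2_000, 300) for _ in range(1_000)])

def percentile(data, p):
    """Nearest-rank percentile over a sorted copy of the samples."""
    ordered = sorted(data)
    k = min(len(ordered) - 1, max(0, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

print(f"mean : {statistics.mean(samples):7.1f} us")   # barely moved by outliers
print(f"p50  : {percentile(samples, 50):7.1f} us")
print(f"p99  : {percentile(samples, 99):7.1f} us")
print(f"p99.9: {percentile(samples, 99.9):7.1f} us")  # dominated by the tail
```

Here the mean sits around 110 us while the p99.9 latency lands near 2,000 us: a drive whose high percentiles stay close to its median delivers predictable service times, whereas one with this profile does not, even though its average looks healthy.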

The above discussion also helps to answer the following frequently asked questions:

  1. Why not integrate CPU cores (e.g., ARM or emerging RISC-V) into Computational Storage devices to carry out the computation?
    To win the benefit vs. cost argument at this early stage, Computational Storage should focus on computing tasks that are highly compute-intensive and at the same time well defined. If a powerful Xeon processor has a hard time executing these tasks, how much could an integrated ARM core help? In addition, programming integrated CPU cores may incur significant system integration cost for end users.
  2. Why not accelerate tasks like machine learning and artificial intelligence (everyone in town is talking about them, right?)?
    We fully understand the importance of machine learning and artificial intelligence, and have no doubt that future Computational Storage could and should play a role in those applications. Nevertheless, the inherently high system integration cost prevents them from being desirable initial targets for Computational Storage. Moreover, quickly evolving machine learning algorithms and frameworks make the situation even more complicated.
  3. Why not do in-storage computing (e.g., Computational Storage devices internally carry out operations such as image recognition or graph processing on the stored data, and simply send the final results to host)?
    Indeed, in-storage computing appears very appealing and revolutionary, and we believe that Computational Storage with in-storage computing capability will eventually evolve to help address many important problems. However, here again the very high system integration cost makes it unlikely to be a viable near-term target. In addition, to achieve reasonably good in-storage processing performance compared with CPU/GPU-based computing, a Computational Storage device must incorporate sufficiently powerful processors and enough DRAM resources, which may significantly increase the device manufacturing cost.

Our effort to enhance and optimize the design of the CSS 1000 Series proceeds in tandem with our exploration of a future in which truly general-purpose Computational Storage is mainstream. Clearly, as we move beyond “low-hanging fruit” application targets, Computational Storage will no longer be transparent to applications or the underlying infrastructure SW/HW stack. At that point, we must be prepared to overcome higher system integration costs and eventually establish an industry-wide standardized framework for architecting, implementing, and deploying Computational Storage.

We believe it will be a long and evolutionary path forward, one that can only be completed through industry-wide collaboration. ScaleFlux’s first significant step in the commercialization of Computational Storage serves as a good starting point for exploring this exciting new paradigm. We have already started discussions and joint efforts with well-established industry giants and fast-moving startups to move Computational Storage to its next stage.

Stay tuned, and we look forward to sharing new results and discoveries from our market exploration and deployment of Computational Storage.
