It is the fifth part of our serial blogs on In-storage Transparent Compression: A Catalyst for Innovating Infrastructure Software. Part 5: Implement a Table-less Hash-based Key-Value Store
To implement a key-value (KV) store, the core design decision is the index data structure that can be either tree-based or hash-based. In conventional practice, a hash-based KV store must use an in-memory hash table to maintain the mapping from the key space to storage space. As a result, the memory usage of a hash-based KV store linearly scales with the size of the key space (i.e., the total number of KV pairs). In contrast, tree-based index data structures, e.g., B+ tree and log-structured merge (LSM) tree, consume much less memory resource by keeping the vast majority of index on storage devices. Therefore, although hash-based index can support higher access throughput and lower access latency, it is only used by in-memory data stores (e.g., Redis) where the total number of KV pairs is relatively small. Large-scale, storage-resident KV stores are almost always built upon tree-based index data structure.
This blog shows that in-storage transparent compression for the first time makes it possible to implement a large-scale hash-based KV store at very small usage of memory resource. To fundamentally overcome the memory cost barrier of hash-based KV store, the only option is to hash the key space directly onto the logical storage space, without using a hash table at all. Nevertheless, KV pairs can no longer be tightly placed over the logical storage space, leading to substantial logical storage space under-utilization (e.g., almost all the 4KB LBA blocks have some empty space left unoccupied). Therefore, when running on conventional storage hardware, such table-less KV store will suffer from a prohibitively high storage cost and hence is not practically viable. In contrast, as pointed out in the previous blogs, storage hardware with built-in transparent compression enables virtually variable-size block I/O, which naturally embraces the logical storage space under-utilization within each 4KB LBA block. Therefore, when running on storage hardware with built-in transparent compression, table-less hash-based KV store can eliminate the memory cost obstacle without sacrificing storage cost. This for the first time makes hash-based approach a viable alternative to its tree-based counterparts on implementing large-capacity KV store.
Accordingly, Fig. 1 illustrates the conventional hash-based KV store (on the left) and the table-less hash-based KV store (on the right). Let K denote the key space of the KV store and L denote the logical LBA storage space. We use a hash function fK®L to hash each key kiÎK directly onto one liÎL, without going through an intermediate hash table. By obviating the use of a hash table, it eliminates the memory cost obstacle faced by conventional implementation of hash-based KV store. Moreover, it relieves CPU from managing/searching hash table, leading to less CPU usage.
For the purpose of demonstration, we implemented a table-less hash-based KV store called KallaxDB and compared it with RocksDB and WiredTiger. We ran all the experiments on a server with 24-core 2.6GHz Intel CPU, 64GB DDR4 DRAM, and a 3.2TB CSD 2000 with built-in transparent compression. We use the following five YCSB benchmarks: YCSB A (50% reads, 50% updates), YCSB B (95% reads, 5% updates), YCSB C (100% reads), YCSB D (95% reads, 5% inserts), and YCSB F (50% reads, 50% read-modify-writes). We load around 400GB dataset with 400B KV pair size into the three KV stores, and the following table shows the logical/physical storage space usage and index memory usage. The results clearly show that although KallaxDB occupies a much larger logical storage space, its physical storage cost is very similar to the others. Meanwhile KallaxDB has a much smaller memory usage compared with others.
Fig. 2 shows the measured results of average operational throughput (i.e., ops/s), average read latency, 99-percentile read latency, and CPU efficiency. We quantify the CPU efficiency in terms of “CPU cycles per KV store operation”. The results in Fig. 2 show that the table-less hash-based KV store consistently outperforms RocksDB and WiredTiger on all the performance and CPU usage metrics. For example, under the workload YCSB F, KallaxDB outperforms RocksDB and WiredTiger by 2.6x and 5.8x, respectively. Compared with RocksDB and WiredTiger, KallaxDB achieves 1.4x~2.7x shorter average read latency across all the five YCSB workloads. On average, the CPU efficiency of KallaxDB is 2.3x~3.2x better than that of RocksDB, and 2.8x~4.7x better than that of WiredTiger. For details, please refer to our paper “KallaxDB: A Table-less Hash-based Key-Value Store on Storage Hardware with Built-in Transparent Compression” published on the International Workshop on Data Management on New Hardware in 2021.