Mat Young, Principal Solutions Architect
Today, I’m going to take you through what computational storage is and how it can help your business. A little about myself. I was lucky enough to join the industry back in the 1990s. I’ve had the privilege of working with some really great large companies such as Data General and Microsoft, and I’ve also had some fun with some startups, most notably Fusion-io.
I was very excited to be asked to join the ScaleFlux team back in December of last year. And today I’m going to take you through what computational storage is and how you might take advantage of it.
A proven approach, now applied to storage
So, computational storage. It’s not a particularly new idea or a new methodology. We’re all familiar with the idea of graphics processing units, or GPUs, from NVIDIA, AMD and others, which are able to perform some very specialist mathematics and relieve the CPU of the pressure of doing those functions.
And sometimes they do it even better than the CPU can, which gains even more power from that compute resource. In a similar way, we have the idea of SmartNICs or DPUs. These, again, are network interface cards with an amount of compute on board that can do things like compression, decompression, encryption and many other things at wire speed.
So that leaves us with computational storage. Computational storage, very simply, is the idea of a storage device that has an element of compute on board to perform a specialist function. These can come in many different shapes and sizes. We have NAND-based SSDs, and we have some devices on the memory bus with DRAM. They may have FPGAs or ASICs, but their basic role remains the same: we use the right type of compute resource as close to the data as possible so that we can relieve pressure on the CPU and get more work done.
So let’s look at the ScaleFlux approach. We take our own SSD design. This is designed to be as physically performant as any Samsung, Intel or Micron device. And then we add our own custom silicon to give it some additional function that enables it to deliver better performance, better capacity utilization and more endurance.
Now, there’s a couple of ways you could do this. FPGAs are slightly easier to design with, but they consume host resources and can be more complex in other ways, and they’re not quite as fast as an ASIC design. Delivering an ASIC is, from an engineering point of view, extremely tricky and very expensive if you get it wrong. ScaleFlux managed to deliver this with their first-ever silicon design, and that’s what sits at the heart of the CSD 3000.
This ASIC allows us to transparently compress and decompress data before writing to the NAND. Well, what does this deliver? Very simply, if you write fewer bits to the NAND, you have an increase in performance from an IOPS or bandwidth perspective from the host, and you also have the ability to run the device to a far higher capacity utilization before you see any abnormal performance side effects.
And again, if you’re writing less data, it means that your endurance and your reliability are greatly increased. So looking at the design in front of you, you can see that we’re a PCIe Gen4 x4 interface. We then have, as I mentioned, our own custom ASIC on board, which is then backed up by industry-leading NAND to persist and store the data.
When we bring this all together, even with a modest amount of data compression, this allows you to get industry-leading performance both in IOPS and in bandwidth. It allows you to utilize your device well above 85 or 90% in some cases, and it also allows you to expand the namespace. So you may see double or even triple the capacity that you can use on the device.
And then finally, because we’re writing less data to the NAND, this allows us to go from a five drive-writes-per-day (DWPD) device to a nine-plus drive-writes-per-day device.
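The endurance figure follows from simple arithmetic. Here is a rough sketch, assuming that write amplification stays unchanged and that every host write shrinks by the same average compression ratio. Both are simplifications for illustration, not ScaleFlux’s published endurance model, and the function names are hypothetical.

```python
def effective_dwpd(rated_dwpd: float, compression_ratio: float) -> float:
    """Host-visible drive writes per day when each host byte costs
    1/compression_ratio bytes of NAND (assumed uniform ratio)."""
    if compression_ratio < 1.0:
        raise ValueError("compression ratio must be >= 1.0")
    return rated_dwpd * compression_ratio


def required_ratio(rated_dwpd: float, target_dwpd: float) -> float:
    """Average compression ratio needed to reach a target host-visible DWPD."""
    if rated_dwpd <= 0:
        raise ValueError("rated DWPD must be positive")
    return target_dwpd / rated_dwpd


if __name__ == "__main__":
    # Going from 5 DWPD to 9 DWPD implies an average ratio of about 1.8:1.
    print(required_ratio(5, 9))
    print(effective_dwpd(5, 2.0))
```

Under these assumptions, the nine-plus figure corresponds to a fairly modest compression ratio of a little under 2:1.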
Okay. So we’ve talked about the hype around computational storage, what it is, what it might be, what it currently is. I want to take a little bit of time now to deep dive on the ScaleFlux approach, running through what a CSD 3000 is and how it’s made up. To do that, I’m going to use my digital whiteboard to walk through the ScaleFlux approach to designing a computational storage device.
Well, first off, we need to build an SSD that is physically the equal of any of the other enterprise-class SSDs. On board we have some NAND, and we use 16 channels to access that NAND, which is common design practice for other enterprise-class SSDs. We chose a U.2 form factor. This is one of the most common form factors for SSDs. And as we move forward into Gen5 in enterprise servers, we’ll do other form factors such as E1.S, E3.S, etc.
What else do we have on board? Well, we need a portion of DRAM, and the DRAM does a few things. It acts as a buffer area to land data before we commit it to the NAND channels, and it allows us to manage our flash translation layer. At the same time, we do a few other functions with that DRAM as well.
The next piece, and this is the heart of our system and something that makes us a little different, is our ASIC, also known as an SoC. Now, this is custom designed by ScaleFlux, and inside it has a number of Arm cores. And what do those Arm cores do? Well, the first function that we implemented in computational storage was inline transparent compression and decompression: comp and decomp.
So what’s the other side of this? Well, to get data into the device, we are a PCIe Gen4 interface. It does work in older Gen3 servers, but you’re not going to see quite the performance that you would from a Gen4 bus. So as we write data, it comes from the host into the DRAM, flows through our ASIC, gets compressed and then gets written to the NAND, where it’s persisted.
For data coming out, it’s very similar. We read from those NAND channels, and the data gets rehydrated, or decompressed, as it comes out of the device and then goes to the host. So you can see from here that compression is always on. There are different ways you can use it: either having a regular-sized namespace, or you could do something called expanded namespace. And we’re going to come on to the details of that a little later on.
So where can we use these wonderful ScaleFlux devices? Let’s give ourselves a little space. We’ve seen customers use them on large datasets, particularly focused around machine learning and artificial intelligence. We have customers using them today in databases, both traditional structured databases and unstructured databases such as NoSQL. We also see customers using them for streaming data. These streams might be IoT data coming in from sensors, or they could be market data, where we see financial institutions very interested in being able to take those high-speed network packets, manipulate them, store them and then calculate risk upon them very quickly.
ScaleFlux hasn’t worked out every use case. We have customers that come to us almost on a daily basis asking, “Could your device help us with this problem?” And that’s one of the things that we enjoy about the job: we’re helping customers work out new ways of using these devices to make their workloads that much more efficient and that much faster.
Who knows, maybe the next sci-fi blockbuster might even be filmed, shot and stored on ScaleFlux devices. We certainly are having those conversations today. So far it sounds pretty straightforward and pretty simple. And in essence, the idea is that simple. Compressing data on the storage device, as opposed to elsewhere in the CPU or only on the network, allows you to gain additional performance from your application.
Now, this can have many side effects. This could be increased license efficiency if you’re paying for a licensed application such as SQL Server or Oracle. It could be that you’re particularly sensitive to rack power, thermal or other considerations from a sustainability perspective. But the point is that by using computational storage, especially one with compression built into the storage device itself, you can be more efficient on many, many levels.
So how can I take advantage of this today? And what’s in it for me? Well, the first thing you should look at is: is your application performance bound? Are you looking at system resource constraints, either in DRAM or CPU, or is it just that your storage device or storage array can’t keep up? That’s the first indication that this could be a great fit.
The next one is: is your dataset compressible? Now, to be fair, most people don’t know whether their data is compressible or not, and some people have been led to believe that their data is completely incompressible. That can happen, but it’s extremely rare. By working with ScaleFlux, we can provide you a tool that allows you to assess the compressibility of the datasets and applications that you care about, and very quickly give you an indication of the degree of benefit that you’re going to see by deploying the CSD 3000 in your architecture.
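To get a first feel for compressibility before running any vendor tool, a rough estimate can be sketched as below. This is not the ScaleFlux assessment tool. It assumes zlib as a stand-in compressor and a 4 KiB block size, neither of which is confirmed to match what the drive does internally, so treat the result as indicative only. The function names are hypothetical.

```python
import zlib


def estimate_compression_ratio(data: bytes, block_size: int = 4096,
                               level: int = 1) -> float:
    """Compress each block independently and return raw size / stored size.

    Blocks that do not shrink are counted at their raw size, on the
    assumption that a device would store them uncompressed.
    """
    if not data:
        raise ValueError("no data to sample")
    stored = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        compressed = zlib.compress(block, level)
        stored += min(len(block), len(compressed))
    return len(data) / stored


def estimate_file(path: str, sample_bytes: int = 64 * 1024 * 1024) -> float:
    """Estimate the ratio from the first sample_bytes of a file."""
    with open(path, "rb") as handle:
        return estimate_compression_ratio(handle.read(sample_bytes))
```

Running `estimate_file` over a representative database file or log gives a quick indication of whether the data sits closer to 1:1 or to the 2:1 and higher range discussed in the talk.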
The other thing is that our device right now is a U.2 form factor device. That means you need a server with a U.2 backplane, and ideally it should be Gen4 PCIe. The devices do work in the earlier Gen3, but you’re going to lose performance, and the whole point of using a computational storage device is to gain that performance, capacity increase and endurance.
So in today’s presentation, I’ve kept it deliberately high level and very simple, to cover the basics. In the coming weeks, I’m going to take a little bit more time and go in depth around how the performance gains are realized, and how you can get that capacity utilization up to a very high level or even extend your namespace capacity by two or three times.
And then I’m also going to follow up with how we get the endurance benefits, and then wrap it all up with how we operationalize these sorts of devices in our application infrastructure. But if anything I’ve said today piques your interest and you have an application problem, maybe one related to performance, we’d love to hear from you here at ScaleFlux.
You can reach out to us at the link below, and the rest of the ScaleFlux team and I look forward to helping you unlock the power of computational storage and really make those applications fly.
We’re already having some great questions coming up in the feed, so I’m going to take a couple of those straight out of the gate.
And the first one: do you require special drivers? No. This is a plug-and-play Gen4 NVMe device. So any server that has a U.2 slot for NVMe, we will plug in and work with. We use, like everybody else in the industry, the NVMe command-line set. We have a plug-in, just like every other vendor, that allows you to see some additional reporting.
So to recap, we don’t require any special drivers. We’re a plug-and-play U.2 NVMe device.
Another question that’s coming in: we talked a little bit about the increased utilization. Well, this has two main approaches. We’re going to talk about the simple one. Let’s say you buy a 7.68 terabyte device and use it with the namespace it comes with. Because the compression is always on, there’s no turning it off. Even with data streams that are thought to be relatively incompressible, there’s still a small amount of compressibility, and even if it’s a few percentage points, that does give you endurance and other increases in performance.
The increased utilization is very straightforward. With most devices, people limit the amount of capacity used to somewhere between 75 and 80%, because what happens beyond those utilization rates is that the device can suddenly show abnormal performance, where the read and write characteristics change depending on how much data is being stored on the device. Because we’re compressing our data, whilst the device may appear to be 80% full, the reality under the covers in the NAND is much less.
So you can still use all of that 7.68 terabytes right up to the limit and not see any write cliff or any abnormal performance behavior as you would on another device. Now, the next question that comes to us is: how do I use the expanded namespace, the 2x or 3x capability? We’re going to do a whole video on that, because there is a little bit of operational detail that is worth diving into.
But in very simple terms, today you would receive the device having already looked at your data. You’ve figured out that you have a compression ratio of, say, 3:1 or 4:1. How do you know that? Well, we actually have a tool that you can run on your file system, on your application data, so that you can determine that up front.
Once you have the output from that tool, you can then take it to the device, delete the namespace that it comes with and create a larger namespace. So let’s just use simple numbers for the minute. Let’s say it’s a 4 TB device and you want to create an 8 TB namespace. You would put that command in through the NVMe CLI, and then you would have this 8 TB namespace.
You would start writing data at that point, and your file system would say that it has 8 TB to write to. You do have to keep an eye on not just the file system utilization but the physical device utilization as well. That means that if you start writing data that has a lower compression ratio, or maybe incompressible data, then you might have a little bit of difference between those two figures, which you need to manage.
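The gap between file system utilization and physical utilization can be illustrated with a small calculation. This is a sketch under the assumption of a single average compression ratio across all written data. The names and the 90% alert threshold are hypothetical, and a real deployment would read physical usage from the device’s own reporting rather than estimate it.

```python
from dataclasses import dataclass


@dataclass
class Utilization:
    logical_pct: float   # what the file system sees
    physical_pct: float  # what the NAND actually holds


def utilization(logical_used_tb: float, namespace_tb: float,
                physical_capacity_tb: float,
                compression_ratio: float) -> Utilization:
    """Estimate both utilization figures for an expanded namespace."""
    if compression_ratio < 1.0:
        raise ValueError("compression ratio must be >= 1.0")
    physical_used_tb = logical_used_tb / compression_ratio
    return Utilization(
        logical_pct=100.0 * logical_used_tb / namespace_tb,
        physical_pct=100.0 * physical_used_tb / physical_capacity_tb,
    )


def needs_attention(u: Utilization, threshold_pct: float = 90.0) -> bool:
    """Flag the device when physical fill passes the (assumed) threshold."""
    return u.physical_pct >= threshold_pct


if __name__ == "__main__":
    # 4 TB device exposed as an 8 TB namespace, 6 TB written.
    print(utilization(6.0, 8.0, 4.0, 2.0))   # data compresses 2:1 as planned
    print(utilization(6.0, 8.0, 4.0, 1.5))   # data compresses worse than planned
```

With the same 6 TB written, the file system reports the same fill level in both cases, but at 1.5:1 the physical device is already full, which is exactly the difference that has to be managed.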
But actually getting going with expanded namespace is very straightforward and very easy. Delete the old one, create the new one and start writing data.
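The delete-and-create sequence might look something like the dry run below, which only builds the command lines and does not execute them. It assumes the drive follows the standard nvme-cli namespace management commands (`delete-ns`, `create-ns`, `attach-ns`). The device path, namespace ID, controller ID, LBA format index and 4096-byte block size are all placeholder assumptions, so check the vendor documentation before running anything, since deleting a namespace destroys its data.

```python
TB = 10 ** 12  # decimal terabyte, as drive capacities are usually quoted


def blocks_for(capacity_bytes: int, lba_size: int = 4096) -> int:
    """Number of whole logical blocks that fit in the requested capacity."""
    return capacity_bytes // lba_size


def expanded_namespace_commands(ctrl: str = "/dev/nvme0", old_nsid: int = 1,
                                target_tb: int = 8, lba_size: int = 4096,
                                flbas: int = 0, controller_id: int = 0):
    """Return nvme-cli argument lists for delete, create and attach.

    Assumes the new namespace is assigned the same ID as the old one and
    that LBA format index `flbas` corresponds to `lba_size` on this device.
    """
    blocks = blocks_for(target_tb * TB, lba_size)
    return [
        ["nvme", "delete-ns", ctrl, f"--namespace-id={old_nsid}"],
        ["nvme", "create-ns", ctrl, f"--nsze={blocks}", f"--ncap={blocks}",
         f"--flbas={flbas}"],
        ["nvme", "attach-ns", ctrl, f"--namespace-id={old_nsid}",
         f"--controllers={controller_id}"],
    ]


if __name__ == "__main__":
    for argv in expanded_namespace_commands():
        print(" ".join(argv))
```

Printing the commands first, rather than running them directly, leaves room to confirm the namespace size and IDs against the output of the compressibility assessment.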