In this podcast, recorded live at Dell Technologies World 2019, Chris talks to Rob Davis, VP of Storage Technology at Mellanox, about the development of SmartNIC technology and storage. SmartNICs offload tasks such as encryption, compression and protocol traffic from core CPUs. This allows system vendors to create more efficient and cheaper products. It can also provide backward compatibility for legacy applications and operating systems.
Rob takes us through the implementation of SmartNICs on both the initiator (host) and target (array) side of the storage network. We discuss vendors using the technology and what we can expect in terms of evolution in SmartNICs in the coming years.
You can read more about the Mellanox BlueField and Innova SmartNIC ranges here – https://www.mellanox.com/products/smartnic/.
Elapsed Time: 00:16:34
- 00:00:00 – Intros
- 00:00:42 – What are SmartNICs?
- 00:02:00 – Network offload has been around forever
- 00:02:44 – Additional components can be SoC, FPGA, processor
- 00:04:00 – For storage, SmartNICs could be Initiators or Targets
- 00:06:00 – How is storage accelerated at the initiator end?
- 00:08:50 – Hyperscalers are using SmartNICs with their own software
- 00:10:00 – How much cost do SmartNICs add?
- 00:11:50 – Where are vendors using the technology?
- 00:12:30 – SmartNICs are great for disaggregated storage solutions
- 00:15:00 – Where do we think SmartNICs are headed?
- 00:15:23 – Wrap Up
Chris Evans: Hi. This is Chris Evans, recording another Storage Unpacked Podcast at Dell Technologies World. This time I’m joined by Rob Davis from Mellanox. Hi, Rob.
Rob Davis: Hi, Chris. How are you?
Chris Evans: Pretty good, thanks. How are you?
Rob Davis: Good. Thanks for this opportunity.
Chris Evans: Enjoying the show?
Rob Davis: Very much.
Chris Evans: Excellent. Could you just tell us what you do at Mellanox?
Rob Davis: Sure. I’m the Vice President of Storage Technologies, so I overlay all of our different organizations, different product lines. And I came from QLogic where I was CTO for 15 years and moved over to Mellanox to basically take their super high-performance networking technologies and apply them to storage.
Chris Evans: Fantastic. So, I thought it would be really interesting to talk about the subject of SmartNICs, because this seems to be A, a technology that’s starting to appear in storage products and B, a technology that Mellanox sells. So, as a company to talk to, you seem like the perfect company. But on a serious note we’re seeing SmartNICs appear in storage platforms, storage architectures. And really I’m interested to start off with … to understand where the technology came from? And why Mellanox built them in the first place?
Rob Davis: Sure. So, SmartNICs are basically a network adapter that offloads processing from the main CPU. So, where did they come from? Efficiency, probably in cloud most of all, but also in storage: the lower-cost the CPU you can put in your storage system, the better. And if you can offload particular functions like we do in the storage area, NVMe-over-Fabrics for example, encryption, compression, those are processes that are highly CPU-intensive. Which means the vendor that’s building that storage system has to buy a very high-end processor and it drives the expense of their product up. If they can offload those processes to a SmartNIC, they can have a much lower cost point on their products.
Chris Evans: Okay. So, this didn’t originally evolve as a storage solution, it was more of a network offload and it just happens to be of a [inaudible 00:02:07] for storage?
Rob Davis: I think the words, the marketing term SmartNIC, had not been coined, but they’ve been around forever. I mean, think of FPGAs coupled with NIC ASICs.
Chris Evans: We have TOE cards, I suppose it … one stage for [crosstalk 00:02:22].
Rob Davis: Exactly. That would be a SmartNIC. It’s kind of … it’s a very murky area because what used to be a SmartNIC, which would have been a TOE card for example, then later gets built into the NIC, right? As part of the NIC. So, it’s sort of a … maybe a stepping stone.
Chris Evans: Okay. So, if we look at the architecture of what these cards look like, they’re a network card and in your case it’s a system on a chip?
Rob Davis: So, typically they’re a network card that has an addition, whether it’s an FPGA, whether it’s a processor. In our case, we have a mixture of a processor and hardware offload engines. And so we have a … it’s called the BlueField SmartNIC. It has a 16-core Arm Processor on it, A72 technology and a whole lot of different offloads for storage and many other functions. Because we don’t only sell this for the storage space, we also sell it in the Telco World, the hyperscale cloud customers, security areas, lots of different areas. I just focus on the storage side of it.
Chris Evans: On the storage side of it, great. Okay. Just as a reference for anybody who’s listening, and I might have said this a few times, we have a lot of noise behind us and I apologize if noise does creep in sometimes. I think we’re next to the Alienware demonstration area and they’re cranking that up every so often, so hopefully they’ll calm down for a little bit.
Chris Evans: Okay, so system on a chip offload and we’re starting to see software vendors using it. Now, in the vendors I’ve looked at, I’ve mostly seen initiator use of that technology, but I guess it could be used in initiator and target from what you’ve already started to say.
Rob Davis: Yeah. On the target side, think about an all-flash array that is full of NVMe SSDs. And with our SmartNIC, we put it into those systems and it allows them to offload the CPU from running the NVMe-over-Fabrics stack. So, we offload that actually into hardware gates in our BlueField SoC, system on chip. And so when the incoming NVMe traffic comes into the target, it’s automatically routed directly to those SSDs without having to touch the local processor, except for exceptions and configuration. So, it actually runs that CPU utilization right down near zero.
Rob Davis: Now the maker of that all-flash array can buy a smaller … a lower-end processor, have a much better cost point on their storage or also add more features into that mainstream processor. It’s crazy to take a processor designed for calculations and move data with it. And that’s basically what’s happening a lot in NVMe all-flash arrays today.
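The target-side split Rob describes, with ordinary reads and writes served in hardware and only configuration and exceptions reaching the local processor, can be illustrated with a small model. This is a minimal sketch rather than Mellanox code: the `Target` class, the opcode names and the traffic mix are all hypothetical and chosen only to show why CPU involvement falls towards zero.

```python
from dataclasses import dataclass

# Hypothetical opcode set for the sketch; real NVMe-oF capsules carry far more state.
IO_OPCODES = {"read", "write"}


@dataclass
class Target:
    hw_handled: int = 0   # commands served by the SmartNIC fast path
    cpu_handled: int = 0  # commands escalated to the host CPU

    def dispatch(self, opcode: str, ok: bool = True) -> str:
        """Route one incoming command and return which path served it."""
        if opcode in IO_OPCODES and ok:
            self.hw_handled += 1
            return "smartnic"
        # Configuration commands and error cases fall back to the CPU.
        self.cpu_handled += 1
        return "cpu"


def cpu_share(target: Target) -> float:
    """Fraction of all commands that had to touch the host CPU."""
    total = target.hw_handled + target.cpu_handled
    return target.cpu_handled / total if total else 0.0


target = Target()
for _ in range(998):
    target.dispatch("read")          # steady-state I/O stays in hardware
target.dispatch("connect")           # configuration goes to the CPU
target.dispatch("write", ok=False)   # an exception goes to the CPU
print(f"CPU share: {cpu_share(target):.3f}")
```

With this made-up mix, two commands in a thousand reach the CPU. The real ratio depends entirely on the workload.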
Chris Evans: Right. I mean that’s really quite an interesting comparison to perhaps the offload that you would see, or at least the direct access you might see in some other technologies like say RDMA where you can directly address the memory in other machines by going through the NIC and not going through the CPU to do that. So potentially, I would imagine vendors are really keen on this because from the scalability perspective, they can really scale their platform up without bottlenecking through, as you said, the local CPU and memory on that particular system.
Rob Davis: Absolutely. And RDMA is a big part of that offload engine, and NVMe-over-Fabrics using RoCE or InfiniBand is where you get the optimum performance, but you can also use NVMe over TCP. There are other neat tricks you can do there: because of the processor that’s on that SoC, you can actually customize it for your particular application.
Chris Evans: Right. So, what about the initiator, at the host end? Because it’s clearly great to accelerate the traffic at the target end, where you’ve got a big storage box where you might want to make sure you can optimize that SSD storage. What about at the initiator? You can use it just as easily there?
Rob Davis: Absolutely. So on the initiator side, besides the security and the Telco and all the other applications that I’m not going to go into because I’m the storage guy.
Chris Evans: Yeah, fair enough. No worries.
Rob Davis: On the initiator side, what we can do is actually make that SmartNIC look like a local NVMe SSD, so you can plug it into any operating system, any hypervisor and instantly have access to NVMe-over-Fabrics. And with customization you can actually support legacy protocols: you can support NVMe-over-Fabrics and, if you have some legacy storage you also want supported, you can just route that to the CPU, and the CPU can take the NVMe-over-Fabrics and convert it to, say, iSCSI.
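The initiator-side idea, where the host sees one local NVMe device while the card decides where each request really goes, can be sketched as a routing table. Everything here is illustrative: `EmulatedNvmeDevice`, `Backend` and the array names are hypothetical, and the sketch reads "the CPU" in the conversation as the SmartNIC's onboard processor, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    protocol: str  # "nvme-of" or "iscsi" in this sketch


class EmulatedNvmeDevice:
    """What the host sees: one local NVMe device with numbered namespaces."""

    def __init__(self) -> None:
        self._map = {}  # namespace ID -> Backend

    def attach(self, nsid: int, backend: Backend) -> None:
        self._map[nsid] = backend

    def submit(self, nsid: int, op: str, lba: int) -> str:
        backend = self._map[nsid]
        if backend.protocol == "nvme-of":
            # Native path: the request goes out over the fabric as NVMe-oF.
            return f"nvme-of {op} lba={lba} -> {backend.name}"
        # Legacy path (assumed): the onboard processor translates the request.
        return f"iscsi {op} lba={lba} -> {backend.name} (translated)"


dev = EmulatedNvmeDevice()
dev.attach(1, Backend("flash-array", "nvme-of"))
dev.attach(2, Backend("legacy-array", "iscsi"))
print(dev.submit(1, "read", 0))
print(dev.submit(2, "write", 128))
```

The host-facing interface is identical for both namespaces, which is the compatibility benefit discussed in the exchange that follows.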
Chris Evans: That’s really quite a powerful scenario, because in a lot of the technology that we’ve seen today, people tend to talk about NVMe-over-Fabrics and you’re pretty much tied to the Linux stack in order to achieve that. So, are you saying that with a plug-in card, like a SmartNIC, you could run a different operating system …
Rob Davis: Windows or VMware.
Chris Evans: Like Windows and VMware and then you just … it just appears to look like a local device.
Rob Davis: Or back ports, right? Because a lot of times you’ve got an older version of Linux or an older version of … and VMware, because VMware is coming out with NVMe-over-Fabrics support. But you have an older version and you don’t really want to touch that, because you’ll affect the applications potentially.
Chris Evans: Yeah. Okay. So, that’s given you not just the performance benefit, but it’s obviously given you a serious compatibility benefit you couldn’t achieve another way.
Rob Davis: Absolutely.
Chris Evans: That’s really quite interesting.
Rob Davis: And you can add things like compression or encryption, so if you want to encrypt the data coming out of the initiator, you can encrypt it before it gets sent out by using the SmartNIC’s encryption engines or compression engines and the same at the other end of the wire. If you want to encrypt the data being stored before you store it, you can encrypt it. And those are very high CPU utilization functions, compression and encryption. So, having them offloaded is very …
Chris Evans: A significant saving.
Rob Davis: Yeah.
Chris Evans: Yeah. So, if we look at that as a technology and we think, okay, “So, I can put this in the host. I can effectively give what looks like a local drive there.” What are we seeing in terms of adoption? Are we seeing storage vendors using that technology and saying, “We’ll try and pair it up with an initiator and the target.” Or are we actually seeing solutions built specifically just for initiator type options?
Rob Davis: So, typically it depends on the customer. And the hyperscalers, of course, they have armies of programmers and they take these cards and they write their own software for them to support whatever kind of offload they want to do, in storage or all the other things. The server OEMs, they love the capability of the compatibi- … the way to get instant compatibility with new technologies. So, there they’re packaging their own applications, just the specific Plug and Play kind of products. But those probably aren’t going to see the market I’d say for another year or so, it’s kind of in development phase.
Chris Evans: Okay. Is it still early days do you think, for this sort of technology? I mean we sort of said that it’s been around for a little while in the format, but in the current instantiation?
Rob Davis: It’s early days for the storage piece on the initiator side. Some of those other technologies are much further along and already in production. And on the target side, the NVMe-over-Fabrics offload we talked about, there’s already OEMs showing products with that technology today. At Flash Memory Summit last year for example, there were …
Chris Evans: And that’s right around … funny enough, that’s one of the areas where I looked at and thought, “I need to come and talk to somebody about this technology.” Because I saw some of this stuff at Flash Memory Summit and …
Rob Davis: And I think you’ll see the initiator side this year probably.
Chris Evans: This year. Right. Okay.
Rob Davis: That’s my prediction.
Chris Evans: What about cost then in terms of building systems? What is the difference? Is it adding a significant cost to the development of the system?
Rob Davis: Well, it’s more expensive than a regular NIC. Something to say about that in a minute is … well, I’ll say it now. What you see today on SmartNICs you typically are going to see in a couple of years on the actual NICs. Because the functions that really become very popular will get moved into the NICs themselves. But the cost is definitely more than a standard NIC. Percentage-wise, I don’t know exactly. I think … I know for sure they’re on our website already, the BlueField SmartNICs. So, I’m sure you can go look up your own pricing.
Chris Evans: And to be fair, you’ve already said, Rob, that there’s a trade-off between using more cores in an actual server and so on. And so the fact that it might cost slightly more to buy a SmartNIC is clearly offset by the cost of actually building out a more expensive system.
Rob Davis: Absolutely true. I apologize for the costing, but I deal with costs at all different levels and I’m not sure what the [crosstalk 00:11:13] pricing is.
Chris Evans: No. Absolutely. And I wasn’t expecting you to sort of …
Rob Davis: Rattle off $19.95.
Chris Evans: Yeah, exactly. I was just … I think if people thought it was going to be double the cost, they might think that’s really difficult to sort of swallow, but when you’re talking about 10, 20, 30% more expensive that’s probably more …
Rob Davis: And another issue with us is we sell mostly OEM and I have no idea what Dell pricing is.
Chris Evans: No, absolutely. So, let’s talk about where companies are using the technology in a bit more detail. Because I don’t want you to talk in specifics, but you mentioned obviously VMware could be accelerating stuff, that’s one scenario. So, hypervisors could be … but what about vendors that are taking this technology and using it [crosstalk 00:11:53]. So, I imagine Dell must be able to …
Rob Davis: Yeah. So, hyperscalers … Dell’s customers, they love this technology. Because they’re currently trying to get the most efficiency they can out of their storage and they’re doing a new technology called Compute Storage Disaggregation. It has many names, Composable Infrastructure, sometimes it’s called Rack Scale. Basically what they do is they disconnect the storage from the CPU within a rack, so that they can mix and match the storage requirement to the CPU requirement depending on the application they want to run. So for example, they can serve up pages during the day and analyze how those pages were used at night, taking different mixes of CPU and storage for those different applications. And they love this technology because what a SmartNIC can do is make that application think it’s talking to local storage when it’s really remote, which is just the problem they’re trying to solve.
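The mix-and-match idea behind compute storage disaggregation can be sketched as two free pools within a rack that are combined differently for each workload. This is a toy model under stated assumptions: the `Rack` class, the pool sizes and the day and night allocations are invented for illustration and do not represent any vendor's composing software.

```python
class Rack:
    """Free pools of CPU cores and storage capacity within one rack."""

    def __init__(self, cores: int, storage_tb: int) -> None:
        self.free_cores = cores
        self.free_tb = storage_tb

    def compose(self, cores: int, storage_tb: int) -> dict:
        """Carve a logical system out of the free pools."""
        if cores > self.free_cores or storage_tb > self.free_tb:
            raise ValueError("not enough free resources in the rack")
        self.free_cores -= cores
        self.free_tb -= storage_tb
        return {"cores": cores, "storage_tb": storage_tb}

    def release(self, system: dict) -> None:
        """Return a logical system's resources to the pools."""
        self.free_cores += system["cores"]
        self.free_tb += system["storage_tb"]


rack = Rack(cores=256, storage_tb=400)               # illustrative sizes
web = rack.compose(cores=192, storage_tb=50)         # daytime: serve pages
rack.release(web)
analytics = rack.compose(cores=64, storage_tb=350)   # night: analyze page usage
print(rack.free_cores, rack.free_tb)
```

The same physical rack serves a compute-heavy mix by day and a storage-heavy mix at night. The SmartNIC's part, as described above, is making the remote storage in each mix look local to the application.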
Chris Evans: Yeah, I agree with the disaggregated side of things or composable. And I think the disaggregated one is really interesting for me personally, because I look at it and think when we go back and think about Fibre Channel, my history is in enterprise systems and deploying Fibre Channel …
Rob Davis: Mine as well. QLogic, 15 years.
Chris Evans: So, QLogic for me … it’s interesting you said you worked for QLogic, I’m thinking, how many generations of QLogic cards I spent installing into servers. But that was when they were only sort of one, two, four gig-type devices.
Rob Davis: Wow. One gig, you’re way back there.
Chris Evans: Which is a long way back. Yeah. But when you look at that, you think … and where was I going with my question? I was talking about …
Rob Davis: Compute storage disaggregation.
Chris Evans: Compute storage disaggregation. Yes. So, when you look at that, you think the centralized storage benefit was significant over the last sort of 20 years. It made a lot of sense to have centralized storage, but obviously the complication around building the network, especially on a Fibre Channel network, that was separate to your IP network was either expensive or potentially relatively complex, shall we say. Whereas now with the idea of being able to make a device look like it’s entirely local, that gives you so much more flexibility. Because that device could be connected and visible to one host one day, another host another. Or even within minutes or seconds, you could swap things around very dynamically.
Rob Davis: Depends on how good your composing software is.
Chris Evans: Yeah. So, I think that is an area where we’re going to see more, simply because it just gives people more flexibility.
Rob Davis: And the SmartNICs are really well suited for that because of that onboard processor: they can interface to that composing software very well … out of band, and while an application is running, the next one can be getting … can be setting up.
Chris Evans: And do you think we’ll get to the level where you are actually changing the code on the card that dynamically?
Rob Davis: I’m not sure if you’re changing the code, but you’re changing the parameters of where the storage is pointing or what protocols are running, those sort of things. Yeah.
Chris Evans: Okay. So as we come towards the end of our time, so where do you think this technology’s headed? What can you … give us a little bit of a teaser for the future? And what can you not tell us?
Rob Davis: Well, I can tell you that we have a roadmap for new SoC chips out into at least 2022-ish timeframe. So from our perspective, they’re here to stay. From the perspective of the industry, it seems like all of our competitors are doing them as well. Intel just entered the space a few months ago. So, it’s a wide open space.
Chris Evans: Okay. It’s good to see that storage still has a significant value and we still have to put a lot of effort into it. And it continues to evolve and I hope that continues to be the case for a long time forward.
Rob Davis: I agree with you. Thank you.
Chris Evans: Okay. Well thanks, Rob. I appreciate it. If people want to go and find out a bit more is there anywhere on the website they should go?
Rob Davis: Absolutely. If you go to the Mellanox web site, we have a whole section on SmartNICs and some white papers and blogs and pointers to other stuff. So, take a look and we are very … always looking for new customers in this space. So, feel free to reach out and we do POCs and we’re happy to look at your problems and solve them in storage.
Chris Evans: Brilliant. Okay. Thanks, Rob. Talk to you soon. Thank you.
Rob Davis: Thank you. Bye-bye.
Related Podcasts & Blogs
- #89 – Choices in NVMe Architectures
- #76 – Fibre Channel and NVMe with Mark Jones
- #61 – Introduction to NVM Express with Amber Huffman
Copyright (c) 2016-2019 Storage Unpacked. No reproduction or re-use without permission. Podcast Episode 0E59