This week Chris and Martin are joined by Chris Mellor for what is turning into an annual look at storage technology for the year ahead. We should point out that this is only for enterprise storage and not a forecast on the world in general!
We start with a discussion on media. At the macro level, capacities continue to grow for SSDs and HDDs year on year. This trend is expected to continue, but what micro-level advancements are being made?
QLC increases in layers, while PLC (penta-level cell) flash is being mooted as increasingly more practical. Hard drives are at 20TB and growing, but suffering the challenges of throughput. Will 2020 be the year we see multi-actuator take off?
Does older media eventually decline into the archive tier, as the new tape? Could punched cards make a comeback? We think it’s unlikely.
The final part of the discussion turns to systems and composable systems in particular. Composability enables servers to be dynamically built from their constituent parts. The technology is in the early days and maybe not ready yet for mass adoption. Similarly, ARM processors look to be on the cusp of data centre relevance.
Part II of this recording will follow next week.
Elapsed Time: 00:38:00
- 00:00:00 – Intros
- 00:01:40 – Storage predictions only please!
- 00:02:30 – Kioxia fire in the clean room
- 00:04:00 – NAND prices will go up in 2020
- 00:05:15 – 128-layer flash on the way
- 00:06:55 – Will PLC flash be “a thing”?
- 00:10:40 – Enmotus have developed a hybrid SSD
- 00:11:40 – 20TB+ hard drive capacity, what’s next?
- 00:13:20 – Are hard drives the new tape?
- 00:14:35 – Is the physical reliability of tape good or bad?
- 00:16:55 – Has much happened in 12 months – 10 years?
- 00:19:30 – Migration to SSDs wasn’t a big architectural change
- 00:20:40 – Uh oh! Gartner Hype Cycles!
- 00:22:00 – Bring back punched cards!
- 00:23:20 – What has happened with composable systems?
- 00:26:34 – Liqid could be an interesting composable solution
- 00:27:35 – Obligatory mainframe reference!
- 00:30:00 – ARM servers could be popular in 10 years’ time.
- 00:31:10 – Containerisation – moving from x86 to ARM
- 00:34:05 – Is hyperscaler IT diverging from enterprise IT?
- 00:37:40 – Wrap Up
Speaker 1: This is Storage Unpacked, subscribe at StorageUnpacked.com.
Chris Evans: Hi, this is Chris Evans recording another Storage Unpacked podcast and I’m joined by Martin and Chris this week. Hi you guys. How are you doing?
Chris Mellor: Hello, Chris. I’m doing fine. Hi, Martin.
Martin: Hi, Chris and hi, Chris. Happy New Year everybody.
Chris Mellor: Oh yes, and you.
Chris Evans: Well we’re a few days in aren’t we?
Martin: Supposedly in France, it’s acceptable to wish people Happy New Year until the end of January.
Chris Evans: Is it?
Martin: If you’ve got any French listeners, they can confirm or deny this.
Chris Evans: I lived in France for three years and I don’t honestly remember what you did. That’s a bit naughty of me, isn’t it?
Martin: Yeah, that’s the wine, Chris.
Chris Evans: Yeah, probably. Anyway, moving on. You’ve just highlighted the fact, Martin, that it is the start of the new year. It is 2020 and we did our sort of recap of the success or not of our predictions a couple of weeks ago. And actually, we didn’t really make any serious predictions. We just talked about the industry. You didn’t make that one Chris because you were off going gallivanting in the US again, one of your trips.
Chris Mellor: I was.
Chris Evans: But thank you.
Chris Mellor: I loved the IT Press Tour meeting. A lot of startups.
Chris Evans: Yes and doing lots of writing about it I guess. Or not, maybe you didn’t like them. Maybe you didn’t write anything about them at all. I couldn’t say.
Chris Mellor: They’re a lovely bunch of people.
Chris Evans: Good. Wonderful. Well let’s do our stuff then and see if we can go through and work out what we think is going to happen in 2020. Now I sort of just-
Martin: In storage that is with particularly what’s happening in storage. We’re not predicting what anybody else is going to do because who knows at the moment.
Chris Evans: That is a good point actually I should have said latest enterprise data storage as well even more to qualify that because we don’t know what’s happening in the general sort of storage industry for keeping stuff in lockups and things like that. We should step away from all of that and talk about enterprise storage.
Chris Evans: Now, I made a list of things. I think Chris, you provided a few things so we will wander through the lists of what we think and then we might add some. We might take some away. Let’s see how we get on. But the first area is to talk about media. Now before we started recording, this might be relevant, Chris, you talked about a fire actually. Do you want to go back over that again and then we’ll dive into QLC and see what relevance that has to the discussion?
Chris Mellor: Sure thing. There’s been a fire this morning at a KIOXIA flash foundry in Japan. Nobody was hurt and the fire’s been put out, but the clean room in this Fab 6 is out of action while it’s being inspected and the damage assessed. It would seem that all flash wafers downstream of the clean room are going to be okay and production can continue. But all production upstream of the clean room will have to wait until the clean room’s fixed. It’s for 64 and 96 layer flash, so it is current flash, and KIOXIA apparently is responsible for up to 3% of the world’s flash production. There may be a hiccup in NAND supply with a consequent rise in prices.
Chris Evans: You know when we used to say with things like Spectrum Protect and stuff like that, we’d go TSM. Who were KIOXIA beforehand?
Chris Mellor: Toshiba Memory Corporation. It’s Toshiba’s flash memory business, which is in a joint venture with Western Digital. It got sold off by Toshiba to a bunch of private equity people, then renamed itself KIOXIA. It bought itself more or less out of private equity and is still in a flash foundry partnership with Western Digital.
Chris Evans: Okay. So that might have an impact, but at the end of the day, the comment on here was to say what do we think is going to happen in 2020. I think maybe the first thing is to think that prices might go up?
Chris Mellor: Yeah. I think prices … The market seems to be stabilizing. A trough bottom has been reached and it looks like prices will go up quite steadily during 2020.
Martin: Yeah, I have a conspiracy theory. We seem to get one of these every year and I just wonder if it’s a way of the NAND manufacturers trying to manage for bottom price?
Chris Mellor: You think it’s an insurance job?
Martin: I do. I do. It’s a double insurance job. Not only are you going to get some money back, but it’s also we’ll get a nice increase and they’ll start making bigger margins. I think we engineer this.
Chris Evans: Do you recommend I ring up their insurance company and go, we’ve had this thing happen and the insurance company is saying you do know this is all recorded. If we want to go back over this, this conversation is recorded.
Martin: And then they turn around and say, oh yeah, that’s fine because it’s still recorded on our media, so keep recording.
Chris Evans: Yeah. Good point. Okay. Chris, what would you say then in terms of how much you think this is going to affect the industry? Minor? Major?
Chris Mellor: Minor. Minor. Small potatoes I think.
Chris Evans: Yeah, fair enough.
Chris Mellor: Unless there’s some dreadful problem that puts the clean room out of action for weeks.
Chris Evans: Yep, fair enough. But what about QLC in general? We’ve seen, what, 96 layers? There’s talk about 144. That technology is going to keep on growing, isn’t it?
Chris Mellor: SK Hynix is sampling 128 layer flash at the moment. Micron says it has 128 layer flash scheduled for production later this year. I don’t see any reason why the layer count shouldn’t increase.
Chris Evans: And that increases capacity?
Chris Mellor: Faster capacities go up.
Chris Evans: Yeah, that increases capacity. Do you remember quite a few years ago, you and I were downstairs at the VMworld conference in Barcelona? Anybody who knows that conference area knows that there’s a set of escalators that takes you downstairs to a quiet area and then you walk into the conference area. I remember you and I having a discussion about NAND, and I think you were saying that for some reason we saw a gap and that NAND wasn’t going to scale to take all the demand of data around the world. But that doesn’t really seem to have played out, does it?
Chris Mellor: No, it doesn’t. I think the effects of increasing bit counts in cells and increasing the layer counts in dies is sending NAND capacity up. Well, it’s skyrocketing. There doesn’t seem to be any practical limit to it at the moment.
Chris Evans: You don’t have to worry about shortage in that sense then. We’re still going to have the capacity to deal with it. I guess maybe price affects that though, Martin.
Martin: Yeah, I imagine so. It’s interesting. You were just talking about the impact this might have, I mean due to the fire will have on prices. I bought a new SSD just earlier this week and it’s gone up. It’s gone up five pounds in price today.
Chris Evans: You don’t normally see that do you?
Martin: No, I don’t normally see it. There can be a little bit of fluctuation, but I don’t normally expect to see things shift so suddenly in consumer flash, and I do imagine this might actually have impacted it. You never know. Suddenly you see these things happen. People feel, well, there’s going to be a constraint in supply, so a few people decide, okay, I’m just going to bump the price a little bit.
Chris Evans: Okay. Moving on. PLC, everybody seems to think that this might be a thing, but I’m not really sure whether I think it will be.
Chris Mellor: It’s gone from being, oh we’ll never do it, to yes, of course it’s going to happen, in about three months. When QLC, the four bits per cell flash, started getting really well established in the middle of last year, then PLC raised its head as a possibility. The idea that you’d be able to pick out one voltage level from 32 levels to find out the bit value of a five bits per cell flash seemed nonsense. And now, Western Digital people are talking blithely about, oh well we can do that. We’ll have controllers with machine learning models in them and PLC flash will be quite feasible in a year or two’s time.
Chris Evans: I don’t know. Are you going to say something then, Martin?
Martin: I was going to say it really depends how far it’s going to drive down the price. If it drives down the price, it might have a future as almost a write once, read many times medium. I mean it’s sort of an archive. Maybe somewhere you’re going to stick all your Splunk analytics for analysis going forward. I think for general purpose flash QLC’s probably about the sweet spot. Yeah, it will be interesting to see if PLC actually has this. It’s really going to have to come down to lifecycle, isn’t it? Whether it has the endurance we expect. Because again, if you look up the tables which show endurance cycles, you really start to see a drop.
Martin: Obviously, SLC’s got the best endurance. MLC is not too bad compared to that. Then you start seeing these big drops to TLC, QLC. Then PLC may have, it’s not horrendous compared to QLC, but it might be just that little bit worse.
Chris Evans: You think it’ll be too far?
Martin: You might think, you know what, I’m not going to risk it. Yeah, but we know the flash manufacturers will keep pushing things. They want to keep this cycle going. But, maybe PLC is the last we get for the time being?
Chris Mellor: Oh heavens, if you had something after PLC, its endurance and its speed, I think it would be so horribly slow. And the complexity of the controllers: that would be 64 levels, 64 voltage levels to sort out, wouldn’t it? Horrendous.
Martin: Yeah. Maybe the next generation after PLC is for tape replacement, the final tape replacement.
Chris Evans: I would question a couple of things there. First of all, obviously we know that we get diminishing returns as we only add an extra bit into the capacity. Obviously with four bits going to five then you’re only sort of getting that 20% gain rather than the 100% gain that we got with going from SLC to MLC when we’d effectively doubled the capacity.
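Editor's note: the level counts and diminishing returns mentioned above can be checked with a little arithmetic. The sketch below is only an illustration and assumes nothing beyond the usual bits-per-cell counts (SLC = 1 through PLC = 5) plus a hypothetical six-bit cell; the function names are made up for this example. Going from four bits to five adds 25% relative to QLC (equivalently, the fifth bit is 20% of the PLC cell's total), which is in line with the rough figure quoted.

```python
# Illustrative arithmetic only. Bit counts are the standard ones discussed above;
# the 6-bit entry is hypothetical and included to show the 64-level case.
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4, "PLC": 5, "6-bit (hypothetical)": 6}


def voltage_levels(bits_per_cell: int) -> int:
    """A cell storing n bits has to distinguish 2**n charge levels."""
    return 2 ** bits_per_cell


def gain_over_previous(bits_per_cell: int) -> float:
    """Fractional capacity gain from adding one bit to a cell holding (n - 1) bits."""
    if bits_per_cell < 2:
        raise ValueError("need at least 2 bits to compare with the previous generation")
    return bits_per_cell / (bits_per_cell - 1) - 1


for name, bits in CELL_TYPES.items():
    gain = f"{gain_over_previous(bits):.0%}" if bits > 1 else "n/a"
    print(f"{name}: {voltage_levels(bits)} levels, gain over previous: {gain}")
```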
Chris Evans: But from the things that I’ve seen, first of all, Peter Kirkpatrick, who was on the podcast with us before Christmas, he said he’d seen the presentation about PLC at, I think he might have said it was Flash Memory Summit, I can’t remember which event it was. It seemed to be that PLC was being used almost like a discovery area to work out how to fix some of the problems that might then be pushed back into QLC, so we just make better QLC.
Chris Evans: And I know that Chris, you’ve published a graph on one of your recent articles which actually showed the different levels of endurance depending on the cell size. So depending on the actual manufacturing cell size, you’ve got better quality. It may well be that what we end up doing is we end up sort of compromising on increasing cell size to get better endurance and then we change after that.
Chris Evans: And third, I think I might have said two things, but there was a third one. You may have seen that Enmotus yesterday announced a hybrid drive, which is basically SLC flash and QLC flash on the same physical drive with a controller managing the balance of active data between the two. Effectively, it’s using machine learning, ML, AI type stuff to move the actual active data back and forth between the SLC and the QLC. That might make PLC possibly attractive if we use that sort of hybrid function as long as the costs actually still stack up.
Chris Mellor: I think it’s a fascinating area because increasing the cell’s bit count gives us progressively lower returns. One bit added to four bits is not as effective as adding one bit to three bits. But at the same time, the layer count’s going up. The two things together mean that the capacities on a die can shoot up even if we only get minor increases in capacity per cell. We get so much increase from the extra layers that the two combined give us an enormous increase in capacity.
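Editor's note: a minimal sketch of how the two effects compound, under the simplifying assumption that die capacity scales linearly with both layer count and bits per cell (real dies also differ in cell size, die area and spare capacity, which this ignores). The 96 and 128 layer counts come from the discussion; the function names are hypothetical.

```python
def relative_capacity(layers: int, bits_per_cell: int) -> int:
    """Unitless capacity proxy: assumes capacity is proportional to layers x bits per cell."""
    return layers * bits_per_cell


def scaling_factor(old: tuple, new: tuple) -> float:
    """How many times larger the new (layers, bits) design is than the old one."""
    return relative_capacity(*new) / relative_capacity(*old)


qlc_96 = (96, 4)    # 96-layer QLC
plc_96 = (96, 5)    # same layer count, one more bit per cell
qlc_128 = (128, 4)  # more layers, same bits per cell
plc_128 = (128, 5)  # both changes together

print(f"extra bit only:    {scaling_factor(qlc_96, plc_96):.2f}x")
print(f"extra layers only: {scaling_factor(qlc_96, qlc_128):.2f}x")
print(f"both combined:     {scaling_factor(qlc_96, plc_128):.2f}x")
```

Under these assumptions the extra bit alone gives 1.25x, the extra layers alone about 1.33x, and the two together about 1.67x.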
Chris Evans: Well, I’ll use that as a segue into the next topic and that’s really the idea of 20 plus terabyte hard drives. Because what we just talked about in terms of techniques and other things that are used to extend the life and capacity of flash, we’ve seen that in the hard drive industry for years and that still seems to be moving on and on and on.
Chris Mellor: Yes, we’ve got Western Digital sampling its 18 terabyte and 20 terabyte drives at the moment with some elements of its MAMR, microwave-assisted magnetic recording, technology in them. We’ve got Seagate demonstrating a Lyve Drive at CES with 18 terabyte HAMR drives inside it, which are not yet in production. And, both Western Digital and Seagate see themselves bringing out 40 terabyte plus drives in a couple of years. It seems like they can just carry on increasing capacity again and again.
Chris Mellor: However, the ability to get data onto and off these drives seems to be stuck at the one actuator level. Multi actuator drives with two actuators will increase it one and a half, 1.7 times perhaps. But, the idea of having a 40 terabyte drive with the same actuator scheme as a 20 terabyte drive is pretty poor in terms of IO density.
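Editor's note: the IO density concern above can be made concrete with a rough sketch. The 250 MB/s single-actuator throughput is an assumed round number for illustration, not a vendor specification, and the 1.7x factor is simply the upper end of the estimate given in the discussion. Function and constant names are hypothetical.

```python
# All figures here are illustrative assumptions, not vendor specifications.
SINGLE_ACTUATOR_MB_S = 250.0  # assumed sustained throughput of one actuator
DUAL_ACTUATOR_FACTOR = 1.7    # upper end of the "one and a half, 1.7 times" estimate


def hours_to_fill(capacity_tb: float, throughput_mb_s: float) -> float:
    """Hours to write (or read back) a whole drive sequentially, with 1 TB = 1,000,000 MB."""
    return capacity_tb * 1_000_000 / throughput_mb_s / 3600


def io_density(capacity_tb: float, throughput_mb_s: float) -> float:
    """Throughput available per terabyte stored, in MB/s per TB."""
    return throughput_mb_s / capacity_tb


scenarios = {
    "20 TB, one actuator": (20, SINGLE_ACTUATOR_MB_S),
    "40 TB, one actuator": (40, SINGLE_ACTUATOR_MB_S),
    "40 TB, two actuators": (40, SINGLE_ACTUATOR_MB_S * DUAL_ACTUATOR_FACTOR),
}

for label, (capacity_tb, mb_s) in scenarios.items():
    print(
        f"{label}: {hours_to_fill(capacity_tb, mb_s):.1f} h to fill, "
        f"{io_density(capacity_tb, mb_s):.2f} MB/s per TB"
    )
```

With these assumed numbers, doubling capacity on one actuator halves the throughput per terabyte, and a second actuator recovers only part of that loss.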
Chris Evans: Is that any different to a tape though in terms of the density?
Chris Mellor: Well it’s got to be better than tape hasn’t it, because you only have one drive motor, one read-write recording head per tape, whereas at least with these multi actuator drives you’ll have two when the drive’s logically split into two half drives.
Chris Evans: I did make a comment to say, are hard drives really the new tape, and that was sort of tongue in cheek really to a certain degree, but you can see hard drives acting almost like a sort of semi-offline capability rather than just being purely offline like tape is, Martin, I would’ve said.
Martin: Yeah it’s possible, Chris. But remember the issue with hard drives is we know when hard drives fail and that’s when they spin up and spin down. Tape is fantastic because with tape you don’t spin up and spin down particularly. You don’t have to worry about that. Environmentally it’s very good. You can get very good storage densities. If you actually look at the magnetic densities on tape media compared to a hard drive, I mean a tape is a lot less dense. There’s a long way for IBM especially to go with what they could probably put on a tape if they really wanted to.
Chris Evans: Tape is the new tape.
Martin: Tape is the new tape. People keep talking about the death of tape and at the moment it doesn’t … If you want an offline archive and you want a longterm data retention, it looks like tape’s still where you want to be.
Chris Evans: I’ve got a question for you then, Martin, because you still deal with, well you have people who deal with tape. You don’t lower yourself to actually deal with that yourself but you have people to do that on your behalf. I haven’t used tape for a few years. Let’s say LTO-2 was the last one I had in the lab. I remember 3480s, when I used to find that the ends would come off the actual, the leader that would come out of the cartridge. But how reliable do you find those devices these days, compared to what they used to be? Are they getting more reliable? Do you see lots of failures or can you really afford to put that much data on one tape and be happy?
Martin: Well, being as it’s a working media, if you haven’t got two copies you haven’t got a copy.
Chris Evans: Of course.
Martin: You generally have two copies. We do have a special tool for replacing leader pins, so actually still.
Chris Evans: Do you?
Martin: Yeah, we do have a tool. IBM gave it to us because we had enough. There was an issue with I think early life LTO-5. We had a whole load of leader pins pop and they eventually got so fed up of it they gave us a tool which looks like the inside of a toilet roll.
Chris Evans: Right, okay. Yeah.
Martin: You have your tee and you can unwind the tape a little part and reseat the pin and put it back in again. For tapes generally, yeah, we’ve got a lot of tape so we do see a few failures, but we have over 30,000 tapes. We’re expecting to see some failures. If you were to do the maths, you’d probably find that tape from a failure point of view is more reliable than hard drive.
Chris Evans: No. Fair enough, because I just don’t know. I mean I’m just out of touch with knowing the level of reliability and obviously if you do have a tape failure, it’s copying the whole tape again, isn’t it? You’re replicating from another copy in its entirety. It makes no sense to try and just replace the failed data like you would do on a hard drive.
Martin: Yeah. But as hard drives get bigger, that failure zone is getting larger as well. If you lost a 20 terabyte hard drive or 40 terabyte hard drive, that’s a huge amount of data to try and recover.
Chris Evans: That is true. Now from my perspective, I wonder whether when you look at the size of the drives and the capacities and the cost of putting in additional actuators and so on, whether the manufacturers have looked at the cost of things like multiple controllers, fail over controllers, like redundant controllers and things like that to see whether things like that might be worth justifying in order to make sure that they can keep continuing to increase the reliability of the drives? Because as you said, Chris, they’ve got to do something to I guess mitigate our concerns about the fact that copying data on and off is so slow.
Chris Mellor: I’m thinking that I’m off at a slight tangent, forgive me, but I’m thinking this time last year we were probably thinking that disk drives were going to make more of an assault on tape and the Optane media was going to make more of an assault on the gap between DRAM and NAND. But basically, neither of those two things have come to pass. We had a storage hierarchy at the beginning of 2019 and here we are at the beginning of 2020 with exactly the same storage hierarchy. It doesn’t seem likely that it is actually going to change this year either much.
Chris Mellor: I don’t personally think disk drive reliability is much of an issue. The failure rates for disk drives, if you look up Backblaze’s drive statistics, are astonishingly low, down at single-digit percentages. It’s less than 1% in some instances. It’s tremendous technology.
Chris Evans: Here’s a question for you. Who was it said? Was it Bill Gates that said people overestimate what will happen in 12 months or one year, but they underestimate what will change in 10 years? I think it was a Bill Gates quote. But maybe, the fact that we look at this on an annual basis and we think things might change dramatically, actually in fact they never do. But if we went back and looked at this over 10 years, then we would see a very different scenario.
Chris Evans: Martin and I sort of touched on this a little bit a couple of weeks ago when we were sort of saying, you look back at where we were 10 years ago, we were talking about one terabyte drives and now we’re talking about 20. We didn’t have flash. We didn’t have 3D XPoint. We didn’t have any of those other types of technologies really to that degree like MRAM or any of those sort of things. On a 10 year cycle, we’ve seen a lot of change, but maybe one year is too short to measure this.
Martin: The cleverness lies in looking at the various things that are happening, like MRAM developments and Optane and so forth and working out somehow that yes, these things will fly. They have longevity. Whereas other things, putting compute in solid state storage drives for instance, isn’t going to fly at all. I don’t know how to pick out the winners from the losers.
Chris Evans: Yeah. It’s hard though, isn’t it Martin? I mean we talk about this week in, week out. You look at trying to work out who’s going to be successful in all of this.
Martin: Yeah. I think sometimes we have to look at technology and see how it would be deployed. I mean especially to Chris’s point about Optane. Optane’s interesting because I think one of the reasons that Optane’s adoption has been slower is that, to use Optane properly, you have to make fundamental changes at the operating system level, and it means that people are going to have to understand how their systems perform differently. The rapid change to SSDs really was not a massive architectural change. You could just swap them in and swap them out and you’d have something which is faster. We can argue about whether, in the enterprise arrays, that was the right thing to do and whether you got the most out of it. But it was easy. When we start going to Optane, Optane you can’t really treat like a hard drive because you just can’t get the densities at the moment. They don’t exist. Architecturally things need to change.
Martin: I think we were probably very enthusiastic cheerleaders of Optane and now if you look back at it, that might have been a bit rash. But I think in 10 years’ time, assuming we are still recording in 10 years’ time, we’ll look back and think well it happened. There was a sudden change for software. The applications caught up and it was the obvious thing to do.
Martin: I think as you were talking about quotes, William Gibson once said the future is already here. It’s just not very evenly distributed. That’s true. The future’s here if you look at say Linux and Windows, they do support Optane and they support it in a right way, but it’s just not being distributed very well yet. We’ll see it happen. I’m sure.
Chris Evans: Perhaps it’s the Gartner hype cycle effect here with start of last year Optane being close to the initial peak of the hype cycle and it’s gone down to the bottom. It’s climbing up out of it, but eventually it will go past the peak and become a very popular technology.
Martin: We have to remember, Chris, that there is no empirical evidence behind the Gartner hype cycle. The Gartner hype cycle itself is a hype, but yeah, I take your point. I think you’re right.
Chris Evans: Let’s get it over here. Where is the hype cycle on the hype side?
Martin: I think it’s over hyped actually.
Chris Mellor: Well, I don’t know.
Chris Evans: Oh dear. Okay. Then let’s wrap this section up on media because I think we can summarize this by saying incremental change every year, and we’re going to keep on seeing it over a 10 year period. That’s a better way of looking at this. And actually maybe, I should go in and try and work out how we’re going to measure it, but we should look at it and say when did 3D XPoint come in? How long has it been out there and really what’s happened over its lifetime? Because over its lifetime is probably a better indication than anything else at the moment, because it’s not been in market long enough for us to care.
Chris Evans: I did say it’s funny how this disk and tape still hangs in. I did have a question to say, did we think older or slower media eventually becomes the archive tier? But actually when I looked at it I thought, well, we’re not using punch cards anymore and we’re not using Ferrite Core. And then I suddenly thought, I wonder if we are. I wonder if there’s anybody who’s actually still using punch cards somewhere. There probably is.
Martin: Amazon’s deep glacier archive is actually an Iron Mountain vault full of old punch cards.
Chris Evans: Cards. Right, okay.
Chris Mellor: There you go. Cost-effective.
Chris Evans: Yeah, you certainly don’t want a fire in it like you had in your own NAND fab do you? That certainly wouldn’t help things. There you go. I tell you what. It might help Travelex this week. At least you couldn’t hack it could you if it’s all on punch cards. You couldn’t encrypt the data.
Martin: Well, there’s a new technology for punch cards. They’re now having them layered.
Chris Evans: Layered punch cards.
Martin: We have a 3D stack of punch cards.
Chris Evans: Like Jenga.
Chris Mellor: Yeah, you could back them up though, you can stick them through a knitting machine. You could stick them through a jacquard loom or something that could be your backup. It could be this wonderful woven tapestry.
Martin: You mean you could weave data into a sweater? No, stop it. I’m not going there.
Chris Mellor: Yeah you could. Density would be rubbish, but yeah.
Chris Evans: I think we should move on and talk about systems before we get … I can tell it’s a new year can’t you? All the Christmas frivolity still hanging in.
Chris Evans: Let’s move on and talk about systems. There’s a couple of them. I think you might have pushed us in this direction, Chris. Let’s talk about things like composable and rack scale and so on. I sort of extended this a little bit and said why have those technologies not been as readily adopted as say HCI was. Maybe that’s related to the discussion we just had about 3D XPoint and its adoption compared to SSDs. Why don’t you give us the benefit of your wisdom on this area?
Chris Mellor: Yeah, I think Martin put his finger on this. The flash adoption was easy because it was really a plug-in replacement for disk drives, forcing the issue a little bit, whereas Optane is not a plug-in replacement for anything. I think composable systems sound to me like an absolutely great idea. Instead of having 50 individual servers each with their own CPU, memory, storage and networking and so on, I’ll have one massive server and carve it up via software means into the ones I need for particular workloads.
Chris Mellor: But, that seems to me actually an extraordinarily difficult thing to do. This is a granularity thing. If I’m looking at my big massive server and the granularity is that I can carve it up in 20% lumps of compute or 20% lumps of storage or DRAM, but I actually need sub 10% changes, if I need say 5% of this, and the granularity of the control isn’t fine enough, then composable systems aren’t going to do me any good at all because I’ll still have wasted and stranded capacity.
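Editor's note: a small sketch of the granularity argument above, assuming a composable system that can only hand out a resource in fixed-size lumps. The 20% lump and 5% demand come from the discussion; the other workload sizes and the function name are hypothetical, chosen only to show how stranded capacity accumulates.

```python
def allocate(demand_pct: int, lump_pct: int) -> tuple:
    """Return (allocated, stranded) in percent when resources come only in whole lumps."""
    if demand_pct <= 0 or lump_pct <= 0:
        raise ValueError("demand and lump size must be positive")
    lumps = -(-demand_pct // lump_pct)  # integer ceiling division
    allocated = lumps * lump_pct
    return allocated, allocated - demand_pct


# Hypothetical workloads, each needing this percentage of one resource pool.
workloads = [5, 12, 33]

for lump in (20, 5):
    stranded_total = 0
    for demand in workloads:
        allocated, stranded = allocate(demand, lump)
        stranded_total += stranded
        print(f"lump {lump}%: need {demand}%, get {allocated}%, stranded {stranded}%")
    print(f"lump {lump}%: total stranded {stranded_total}%")
```

In this toy case the coarse 20% lumps strand 30% of the pool across the three workloads, while 5% lumps strand 5%, which is the sense in which control that is too coarse leaves capacity wasted.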
Chris Mellor: I also think that ensuring you get a good rate of use out of the components is going to be very, very difficult. How can you know with a composable system that over a period of time you’re going to get a certain minimum level of use out of the compute cores, the disk drives, the flash drives and network switches and so on? I don’t think you can because it’s going to be workload dependent. People will start trying to match a composable system to a suite of applications. It just leaves me thinking these things are just there for hyperscalers really.
Chris Mellor: If I’ve got a hundred servers, then a composable system probably isn’t worthwhile. If I’ve got 10,000 server based systems, then a composable system probably really is worthwhile because the improvements in efficiency I get will be so great at that level. But for the rest of us, for the non hyperscalers, I think composable systems might be a figment of our imagination. They may not fly at all.
Chris Evans: Any opinions, Martin, before I chime in?
Martin: Well, we discovered when HP started talking about composable systems, or as we like to call them, compostable, which is a bit unfair. But I think one of the promises, A, there are two things. It is a technical challenge. I don’t think anybody really understands it. Some don’t understand how we’d use them, but the marketing around it has been terrible. Anybody looking at it from the outside would say, well surely that’s just Nutanix. Surely it’s just an HCI system given a different name. Nobody’s done a very good job at distinguishing them from HCI.
Chris Evans: Yeah. Well I’ve written something. I wrote something actually towards the end of last year talking about this. One company that I’ve seen that looks like they could be interesting so far in this area is a company called Liqid. The reason I think they’re interesting is because they’re connecting all the devices together using a PCI Express backbone, which is low latency, high performance. Therefore, if you were trying to build a server where you were putting in say GPUs or other types of more bespoke and specific devices, it would be potentially really useful because you can add them in, use them for 24 hours, do what you need to do and put them somewhere else. You’d have to be pretty hot on reorganizing that and making efficient use of it, but so far that’s about the only platform I can see that’s got the flexibility and the low latency and the performance to make it work.
Chris Evans: You mentioned HPE and their system, but that’s composable in the sense of connecting storage across a network to existing physical servers. We certainly can’t say here’s a processor or here’s 10 processors, let’s put them together and then call that a machine and then two minutes later, let’s split it into two five-CPU machines. But you know what, we could do that 30 years ago on the mainframe. I’m sorry to say it, but we could.
Chris Mellor: I thought you were going to bring that up. I thought you were going to bring up Sysplex. It’s Sysplex, parallel Sysplex.
Chris Evans: It’s what basically it’s what Amdahl invented first of all. Then, IBM then went back and re-engineered their systems to produce LPARs and the ability to map. In fact, in the Amdahl implementations, you could actually split a physical processor up into logical ones, cross domain and you could over subscribe. It was a bit more advanced.
Chris Evans: But you know that technology has been around, and it only existed like it did because everything was in one massive monolithic system. We had other examples of technology like that. There’s some SPARC machines could do that like the 10Ks, the 25Ks, they could do that to a certain degree.
Chris Evans: The technology’s existed and been out there and been used, but I think you’re right, Chris, there’s got to be enough of a reason for you to want to chop and change in your environment to make this really worthwhile.
Chris Mellor: I think HCI, the hyperconverged systems, have a wide appeal across a range of enterprise sizes, whereas composable systems to my mind are only going to be appealing to the hyperscaler people because they’re the only people who have thousands and thousands of components and can get efficiencies out of their use that other people can’t.
Chris Evans: I’m going to throw one into the mix here. A company that I’ve spoken to already, they’re UK based. They’re based in Cambridge. They’re called Bamboo Systems. Have you spoken to them at all? They are doing ARM-based servers. Funnily enough, their chairman, I think it is, is Geoff Barrall from the old Connected Data and Drobo days, and obviously...
Martin: Oh, Bamboo.
Chris Evans: Yes.
Martin: This is a strange company.
Chris Evans: But they could be interesting from the point of view that what they’re doing is producing ARM-based servers that have to have some sort of high-speed mesh interconnect. Now, I’m not entirely sure how that mesh is configured, but imagine you could actually create a mesh of processors together that can act as a system. How configurable will that mesh be? If that mesh was highly configurable, you maybe could create logical sort of servers within a physical server framework, and that might be a way to do some of what we’ve just been talking about. Whether that’s possible or not, I don’t know, but I’m really interested to see whether something like that might be the way that we move it forward.
Martin: Maybe one of the things that you can detect if you look at IT in 10-year chunks is that ARM becoming useful in enterprise servers is one of these 10-year chunk effects where, year by year, it doesn’t seem as if it’s going to happen. Possibly when we look in five years’ time, ARM servers will be very popular indeed.
Chris Evans: I think they will be. Absolutely. I think there will be.
Chris Mellor: See, I’m not seeing it at the moment. I’ve spoken to ARM so many times and I’m just not seeing it. Everybody, certainly in the enterprise corporate space, is waiting for somebody like Dell or HP or somebody else to start building servers using ARM components. I think if that happens, yeah, then we might see it happen, but until they’re embraced by one of the big boys, I don’t see it at the moment.
Chris Evans: Well, it’s needed Amazon to do it themselves, hasn’t it, to actually take the reference design and create the Graviton processors that are based on it, one last year, two this year. The new one’s based off Neoverse designs. The previous one was based off the Cortex-A72 design, which is similar-ish to what’s in your Raspberry Pi really, except that’s a system on a chip.
Martin: Could you contribute to my education please? If you have containerized software, can those containers be ported readily from an x86 system to an ARM system?
Chris Evans: You have to recompile.
Chris Mellor: Yeah, you have to recompile. It’s exactly the same situation we’re in today. If you run Linux on a Raspberry Pi or any one of the ARM servers, you can recompile the applications, but you can’t move them natively, or not easily. There are ways you can do it, but you’d always take a performance hit, so you’d always want to compile natively, certainly on an ARM which has less performance than the equivalent Intel architecture.
Martin: Okay. So containerization isn’t a sly way for ARM processors to slip into the enterprise data center.
Chris Mellor: Well, Kubernetes allows you to orchestrate across ARM and Intel-based servers relatively easily. As long as you have a container image for each, it’s fine. You could actually have a service which ran quite happily on both Intel and ARM, and it was just a recompiled version of the same application.
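[Editor’s illustration, not part of the recording.] The idea of one service running on both Intel and ARM nodes from recompiled builds can be sketched roughly as below. This is a minimal sketch under stated assumptions: the image names, node names and helper functions are all hypothetical, and it does not call any real Kubernetes API. It only mimics the idea of picking the matching per-architecture build, using the `amd64` and `arm64` architecture names that Kubernetes nodes report.

```python
from typing import Dict

# Hypothetical example: the same application compiled once per CPU architecture.
# These image names are invented for illustration only.
IMAGE_VARIANTS: Dict[str, str] = {
    "amd64": "example/web-service:1.0-amd64",  # build for Intel/AMD x86-64 nodes
    "arm64": "example/web-service:1.0-arm64",  # recompiled build for 64-bit ARM nodes
}


def image_for_node(node_arch: str) -> str:
    """Return the build that matches a node's architecture, or fail loudly."""
    try:
        return IMAGE_VARIANTS[node_arch]
    except KeyError:
        raise ValueError(f"no build of the service for architecture {node_arch!r}")


def plan_rollout(nodes: Dict[str, str]) -> Dict[str, str]:
    """Map each node name to the image variant it would need to run."""
    return {name: image_for_node(arch) for name, arch in nodes.items()}


if __name__ == "__main__":
    # Hypothetical mixed cluster: one Intel node, one ARM node.
    cluster = {"node-a": "amd64", "node-b": "arm64"}
    for node, image in plan_rollout(cluster).items():
        print(node, "->", image)
```

The point of the sketch is that nothing is translated at run time: each architecture gets its own natively compiled build, and the scheduler's job is only to match build to node. A node with no matching build is an error rather than a candidate for emulation.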
Chris Mellor: We’ve had cross-architecture type schedulers for a long time. IBM had one which would allow you to run workloads on RISC-based servers and SGI-based servers in a supercompute environment. It’s been around, but this hybrid architecture within a data center hasn’t really taken off. Even to the extent that, if you look at most data centers, you don’t see a big mix of Intel and AMD, although we might start to see that happen now with the latest AMD chips because your power-to-price ratio is so much better on them at the moment.
Martin: Okay. Just to add one little bit to that, I think if you look at the difference between, say, Intel and ARM in terms of the way that the processors work, obviously we know that ARM is a RISC-based design. The background to that is that you expect the more efficient and smaller instruction set to be executed quicker or in fewer clock cycles than you would if it was a CISC system like Intel, a complex instruction set. And as a result, the Intel systems are great in a straight line. If you’ve got a single-threaded application, they have, I guess, high performance because they execute the instructions much quicker because they’ve got a higher clock speed. Whereas with the ARM processors you’ve got a much more power-efficient solution that’s probably suited more readily to scaling, let’s call it scaling out rather than scaling up. And therefore, if you’ve got a distributed system, like something that runs thousands of containers, that might be more suitable to the ARM architecture than it would be to the Intel architecture.
Martin: Amazon are looking at it for things like, I think, some of their load balancers and things like that, where they know they’re handling many, many threads and they’re trying to distribute those threads across many cores, but they need a low-power processor to do that because there are other interrupts going on in the background. I can see there’s a definite use for it, but you really do have to understand the application to know where it will work and where it won’t.
Chris Evans: Do you, Martin or Chris, think that the way hyperscalers are using IT is diverging more and more away from the way that enterprises use IT, and that the hyperscaler computing model will become less and less applicable to enterprises in terms of implementing boxes of hardware?
Chris Mellor: I think we’re very different already in how you implement it. But functionally, the way an enterprise will consume it or how a developer will consume it is identical. We’ve seen some very different architectural models, but that’s not necessarily a big problem because it’s scale. The hyperscalers’ scale is so different to a lot of enterprises that you have to implement in different ways.
Chris Evans: Yeah, so if you look at, say, the technology that Amazon developed with Nitro, they went off and decided that there was more value in developing bespoke hardware that they could offload things like networking and storage functionality to, and security of course, in order to gain more efficiency from the actual servers themselves at scale. And clearly for them, maybe 3%, 5%, 10%, whatever the saving is, it’s a significant cost saving compared to, I guess, most enterprises, unless you’re really large.
Chris Evans: Now the question is, why is that technology not being developed by people like Dell and HP? Why have we not seen an equivalent type of technology? I wonder whether we’ll see things like that. If you’ve not seen it, there’s a new startup on the block from a couple of famous names, a company called Oxide Computer, and they’re looking to take the open compute definitions that people like Facebook developed and actually build those into systems that you could deploy in your own data center. That would give you the sort of hardware infrastructure that people like Amazon have got, so that you could deploy that yourselves without having to go down the route of Outposts. My question there, though, is the software side of it as well.
Martin: I’m beginning to think that IT supply could split into companies that supply the hyperscalers and companies that supply enterprises. The reason that Dell and others didn’t develop this Nitro hardware is because they’re focused on selling systems to enterprises and enterprises don’t want it. There are no specialized companies developing this kind of hardware just looking at Amazon or Facebook or Google because to succeed, they’d have to sell tens of thousands of pieces of kit to five customers.
Chris Evans: Yes. It’s a horrible, horrible business model, isn’t it?
Chris Mellor: That’s a massive risk.
Chris Evans: HP did a couple of things didn’t they? Wasn’t the Apollo meant to be like a more lightweight server model and of course Moonshot, which was definitely more lightweight. But, I don’t know whether they’ve been successful.
Martin: But it failed. As far as R&D and developing advanced IT architecture is concerned, HPE is a catastrophe. The Machine is a miserable effort in hindsight. I think composable systems, I fear, are going to go the same way, and HPE just progresses its offer by basically buying up other companies.
Chris Evans: Okay. Oh well. We’ve put the knife in the back of composable systems in that discussion.
Chris Mellor: Composable systems, they’re going to be recycled as mainframes.
Chris Evans: I think that’s probably a pretty good point to say end of part one. Keep listening everybody, and we’ll start part two soon.
Speaker 1: You’ve been listening to Storage Unpacked. For show notes and more, [email protected]. Follow us on Twitter at StorageUnpacked or join our LinkedIn group by searching for Storage Unpacked podcast. You can find us on all good podcatchers, including Apple Podcasts, Google Podcasts, and Spotify. Thanks for listening.
Related Podcasts & Blogs
- #136 – The End of the Year Show 2019
- #82 – Storage Predictions for 2019
- #37 – State of the Storage Union with Chris Mellor
- Are ARM Processors Ready for Data Centre Primetime?
- Liqid’s PCIe Fabric is the Key to Composable Infrastructure
Copyright (c) 2016-2020 Storage Unpacked. No reproduction or re-use without permission. Podcast episode #WASP.