Quick Stealth Edit: I should preface this by saying that much of this discussion is focused on storage in virtualized environments, but the overall gist of the 3Es should apply to physical environments as well.
So in part one, I rambled on about some of the stuff we see with legacy storage platforms and how they don't necessarily fit what I call my 3Es of Storage: Efficient, Effective, Easy. Obviously that begs the question: what does? So today I get to ramble on some more.
When it comes to storage Efficiency: deduplication and compression have long been the two techniques utilized in the marketplace, often used in conjunction to further accelerate the "space saved" equation. As spinning disks reach sizes that hurt my head (rebuild times forever, begin), the capacity issue isn't as pressing as it once was when it comes to simply storing data. Still, there is relevance beyond simple data reduction, especially if these two functions are applied during data ingestion and not as a post-process. When done across all tiers of storage (DRAM, SSD, HDD) for primary, backup, and replicated data, the efficiency of storage moves into the realm of IO reduction.
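To make the inline (ingest-time) part concrete, here is a minimal toy sketch of a block store that fingerprints, deduplicates, and compresses data as it arrives rather than as a post-process. The class name, block size, and counters are all my own illustrative assumptions; real platforms do this per tier with far more sophistication.

```python
import hashlib
import zlib


class InlineDedupStore:
    """Toy sketch: deduplicate and compress at ingest, not post-process."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}        # fingerprint -> compressed block
        self.logical_bytes = 0  # what clients think they wrote
        self.physical_bytes = 0 # what actually hit the media

    def write(self, data):
        """Ingest data; return the fingerprints that describe it."""
        fingerprints = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            self.logical_bytes += len(block)
            if fp not in self.blocks:
                # Only genuinely new blocks generate physical IO.
                compressed = zlib.compress(block)
                self.blocks[fp] = compressed
                self.physical_bytes += len(compressed)
            fingerprints.append(fp)
        return fingerprints


store = InlineDedupStore()
store.write(b"A" * 8192)  # two identical blocks: only one is stored
store.write(b"A" * 4096)  # already seen: no new physical write at all
print(store.logical_bytes, store.physical_bytes, len(store.blocks))
```

The point of the sketch is the last line: 12 KB of logical writes collapse into a single small compressed block, which is exactly the "space saved" equation moving toward IO reduction.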
True storage efficiency runs across all tiers and all data sets. When you start to do that, you start to eradicate IO operations. I've used this quote from Gene Amdahl repeatedly, and it boils true efficiency down to this: "The best IO is the one you don't have to do." When I talk about efficiency, that's essentially what I'm talking about. This aspect evolves further when we apply it to backup and replication operations; done correctly, it can allow near-instantaneous point-in-time full backups at the object level, or in the case of virtualized workloads, the Virtual Machine level, all while creating no additional IO and consuming no additional space. We simply don't write what we don't have to.
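One way to picture a "full backup with no additional IO and no additional space" is a metadata-only snapshot: the backup copies the object's list of block references and bumps reference counts, and never touches the data blocks themselves. The catalog below is a hypothetical sketch of that idea, with names of my own invention.

```python
from collections import Counter


class SnapshotCatalog:
    """Toy sketch: a point-in-time 'backup' is a metadata copy only."""

    def __init__(self):
        self.refcount = Counter()  # fingerprint -> number of referents
        self.objects = {}          # object name -> ordered fingerprints

    def register(self, name, fingerprints):
        self.objects[name] = list(fingerprints)
        self.refcount.update(fingerprints)

    def snapshot(self, name, snap_name):
        """Full point-in-time backup: no blocks read, no blocks written."""
        fps = self.objects[name]
        self.objects[snap_name] = list(fps)
        self.refcount.update(fps)  # blocks are now pinned by the snapshot


catalog = SnapshotCatalog()
catalog.register("vm01", ["fp-a", "fp-b", "fp-a"])
catalog.snapshot("vm01", "vm01@t0")  # effectively instantaneous
print(catalog.refcount["fp-a"])
```

The snapshot is a complete, independently restorable view of the VM, yet the only cost was a few catalog entries: we simply didn't write what we didn't have to.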
This leads us into the Effective aspect of the 3Es, where we start to provide data management that is specific to objects. In the case of a virtualized infrastructure that object is the Virtual Machine, and in that sense we can provide VM-centric storage operations. When you take deduplication and compression and apply them globally, one of the primary benefits is the ability to remove IO from the storage equation. Let's take the case of a traditional backup or clone: there is a high amount of read activity followed by a smaller amount of write activity. That is the traditional backup IO profile: high read IO, moderate to low write IO (at least for incremental backups). In an Efficient/Effective storage system this IO doesn't exist, because the system understands the underlying file structure and has no need to recreate blocks that already exist.
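The two IO profiles can be contrasted with a rough simulation: a legacy-style backup that must read every block to find changes, versus a dedup-aware backup that compares fingerprints in metadata and only moves blocks the target has never seen. Both functions and the workload mix are illustrative assumptions, not any vendor's actual algorithm.

```python
import hashlib


def traditional_backup(source_blocks, previous):
    """Legacy profile: read every block, write whatever changed."""
    reads = writes = 0
    for i, block in enumerate(source_blocks):
        reads += 1                      # high read IO, every time
        if previous.get(i) != block:
            writes += 1                 # moderate/low write IO
            previous[i] = block
    return reads, writes


def dedup_aware_backup(source_fps, target_index):
    """Fingerprint-aware profile: metadata lookups, not data reads."""
    reads = writes = 0
    for fp in source_fps:
        if fp not in target_index:      # only truly new blocks move
            writes += 1
            target_index.add(fp)
    return reads, writes


# 100 blocks, but only two distinct contents (shared OS image + app data).
blocks = [b"os-image"] * 90 + [b"app-data"] * 10
fps = [hashlib.sha256(b).hexdigest() for b in blocks]

r1, w1 = traditional_backup(blocks, {})
r2, w2 = dedup_aware_backup(fps, set())
print((r1, w1), (r2, w2))
```

Under these toy assumptions the legacy path performs 100 reads and 100 writes for its first pass, while the fingerprint-aware path performs zero data reads and two writes, which is the "IO that doesn't exist" being described above.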
So with your standard backup you can significantly reduce the amount of local IO, but what's not immediately obvious is new functionality that doesn't really exist in legacy or modern storage systems: in this case, virtual machine workload mobility.
Thanks to my buddy Hans De Leenheer for the great illustration below.
What we see here are two separate datacenters with VM workloads residing in both. What's also seen is that common data exists between both locations at the block level. This provides two areas of functionality. The first is the ability to replicate a Virtual Machine, in this case from San Francisco to London: data is evaluated at both ends, and only the blocks that don't already exist in London are sent across. At that point the secondary location has all the blocks required to rebuild that Virtual Machine; in essence, the backup operation becomes replication at the VM level (VM-centric). We don't have to replicate unnecessary data (more efficiency), because we can strip out log and swap files. We don't have to send blocks that already exist, significantly reducing the amount of IO that needs to traverse the WAN link. And we don't have to replicate VMs that merely reside in the same datastore or LUN, as we would with many array-based replication features. IO becomes a function of sending and receiving only what is necessary (aka Efficient/Effective). Now take this one step further and simply move the VM from San Francisco to London. Turn off the VM in San Francisco, transfer only the unique blocks, and recreate it across the WAN in London. Depending on the amount of data and the available bandwidth, this operation could take as little as a few minutes, and what you have effectively done is a cold vMotion/Storage vMotion over distance without the need for shared storage.
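The replication logic described above can be sketched in a few lines: strip files that never need to cross the WAN, exchange fingerprints with the remote site, and ship only the blocks it is missing. The function, filenames, and skip patterns here are hypothetical placeholders to show the shape of the idea, not a real product's protocol.

```python
import hashlib


def replicate_vm(vm_blocks, remote_index, skip_patterns=("swap", "log")):
    """Toy VM-centric WAN replication sketch.

    vm_blocks:    list of (filename, block bytes) making up the VM
    remote_index: fingerprints the remote site already holds
    Returns only the blocks that must actually cross the WAN.
    """
    to_send = []
    for filename, block in vm_blocks:
        if any(p in filename for p in skip_patterns):
            continue  # efficiency: never ship swap/log data
        fp = hashlib.sha256(block).hexdigest()
        if fp not in remote_index:  # remote already has it? skip it
            remote_index.add(fp)
            to_send.append(block)
    return to_send


# London already holds the common base-OS block from its own VMs.
london_index = {hashlib.sha256(b"base-os").hexdigest()}
sf_vm = [("disk.vmdk", b"base-os"),
         ("disk.vmdk", b"app-bits"),
         ("vm.swap",   b"scratch"),
         ("vm.log",    b"chatter")]

wire_payload = replicate_vm(sf_vm, london_index)
print(len(wire_payload))
```

Of four source blocks, only one unique, replication-worthy block crosses the WAN; the swap and log data are stripped and the shared base-OS block is already present on the far side.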
The last aspect of the 3Es is Easy, and honestly, who doesn't like that? Easy from the standpoint of abstracting away the mundane and complex nature of storage as it has been in the past. Gone are the days of zones, LUN masking, LUNs in general, configuration and updating of HBAs, WWNs/WWPNs, RAID sets, configuration of disk groups, and so on; the list goes on and on. What could have taken hours of work should be replaced by a few mouse clicks and a small amount of data input. Moving the storage closer to the VM and compute, out of the array as it were and onto the hosts themselves, is one aspect of this. Not having to traverse the traditional storage area networks, with all the headaches involved, is another benefit. Furthermore, not having to rely on a dedicated storage team to provision simple resources, a task that for some organizations can take days or weeks due to politics, allows IT to become more agile, more effective, and to dedicate its time and resources to the business itself instead of the joys of menial tasks.
In conclusion: I very well may have no idea WTF I’m talking about, but I do see a need for significant disruption in the IT space.
The last few months have been a real journey of discovery for me as I've moved into working with a technology that I truly believe is not just new and exciting, but also addresses key aspects of some of my overriding philosophies around IT as a whole. I believe there should be an aspect of IT that is simple. I often use the hashtag #SimplifyIT in discussions around hyperconvergence, and part of its goal is to take the complexities out of that legacy stack of technologies and condense, consolidate, and converge (oooh, 3Cs, sounds like a new blog post). But I also believe that the silos we have created do us no favors, and in order to break those silos apart we need to do what we can to make technology approachable and accessible to the cross section of IT department members. That's where I see a need for the technological aspects of the 3Es when it comes to storage. It doesn't necessarily have to follow the items I have laid out exactly, but it should make an attempt to take some of those processes and apply them in the datacenter.