Backup Is Broken

[Figure: nightly backup IO over time, reads vs. writes]

Every night millions of terabytes of data are read and written, and in my view most of that is wasted IO. This graph should look familiar; it's what I'm speaking of when I say that backup is broken: every time I need to protect or replicate data, I have to read and write that data, in many cases to multiple locations and multiple mediums. In the beginning, this graph would have shown roughly equal reads and writes. As backup software became more intelligent, it allowed us to write out only the deltas, the changes that occurred since the last time we protected the data.
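To make the delta idea concrete, here's a minimal changed-block sketch (purely hypothetical, not any particular product's code). Notice the catch: even the smarter incremental still reads every block to find the changes, and that read traffic is exactly the wasted IO I'm talking about.

```python
import hashlib

BLOCK_SIZE = 4096  # 4 KB blocks, a common granularity

def block_hashes(path):
    """Hash every fixed-size block of a file (this is the read side of the IO)."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            hashes.append(hashlib.sha256(block).hexdigest())
    return hashes

def incremental_backup(path, previous_hashes, backup_store):
    """Write only the blocks that changed since the last run.

    Note the catch: we still had to READ the whole file to find the deltas,
    even though we only WRITE the changes.
    """
    current = block_hashes(path)
    written = 0
    with open(path, "rb") as f:
        for i, digest in enumerate(current):
            if i >= len(previous_hashes) or previous_hashes[i] != digest:
                f.seek(i * BLOCK_SIZE)
                backup_store[(path, i)] = f.read(BLOCK_SIZE)  # delta write
                written += 1
    return current, written  # new baseline plus count of blocks written
```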

You're Doing It Wrong

As a former TSM user (and Backup Exec, NetBackup, Veeam, vRanger, ARCserve, Legato), the idea of incremental forever was always appealing to me. Yet there is still a host of data protection products today subscribing to schemes cooked up in the stone age of IT. I'm not the only person who believes this; over the last few years I've started seeing more focus placed on intelligent backup and data protection. Still, the underlying fact remains that a significant number of organizations are following traditional data protection schemes designed in the 1990s. Things get further complicated at the storage layer, where snapshots are being leveraged and then called "backups". Repeat after me:

Storage Layer Data Protection Isn’t Cutting It.

One of the primary tools used at the storage level is the snapshot. We use snapshots for data protection, as well as with virtual machines to protect data (to an extent). As I'm fond of screaming on Twitter: Snapshots are Not Backups! In my view, snapshots are helpful when I want a quick rollback point for something I'm changing on a temporary basis to test. If a platform leverages snapshots today and calls that data protection, run away, and run fast, as if the fast zombies from 28 Days Later were chasing you. Thankfully that's probably not going to happen. Now for the sake of argument, let's say that snapshots are a valid form of data protection. At the storage level, what happens if I want to protect just one VM (the green one in the image to the left) in a datastore or LUN? In general terms, the storage array has no inherent understanding of the data within it at the block level, so the concept of VM-centricity is alien. There is no inherent understanding of what constitutes the VM object in the LUN, and therefore segmenting or isolating a single virtual machine at that layer is not possible. That's where we get into the situation of having to protect and replicate large chunks of data we don't need. Backup software itself can address the limitations I've discussed above, but along with my "What If" question in the last post: what if VM-centric protection were built into your infrastructure as a native function, so you didn't have to leverage a third-party software layer?

Now that's just local data protection, and I've yet to even touch on moving data from one location to another for offsite protection. The same scenario plays out when we do replication at the array level. The lack of data awareness means we can and do replicate a good amount of data we don't need or want. Swap files, temp files? All included, even when we have no need of them. To go further, if I want to get granular down to the individual VM, the storage systems we have today cannot provide replication at that level; we have to leave that functionality up to software solutions.

Commercial Time!

So my counterpart Damian Bowersock has written a bit about our New Approach to Data Protection: Part 1 and Part 2. I wanted to put my own take on it and discuss how I see some of the benefits of OmniCube.

One functional aspect of the SimpliVity OmniCube platform that sets it apart from other converged systems is its built-in backup and replication capabilities. These are native operations that come with the platform, not à la carte or pay-for services; it's all included. They are independent, crash-consistent, point-in-time backups of the virtual machines that run on the OmniCube. They are also VM-centric in nature, meaning we address data protection at the VM object level for local protection as well as off-site replication. Furthermore, they are policy driven and automated. Create a policy with your data protection scheme and apply it to the VMs you wish. Need to back up that VM every 10 minutes? Simply create a policy and set the rules in place to do so. From that point on, it's an automated process with all your standard aging and data retention rules. Oh, and the best part: a local backup takes 3 seconds, consumes no space, and has zero impact on the running virtual machine. The backup window in the graph at the top of this post becomes a thing of the past.
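To give a feel for what "policy driven" means here, below is a rough sketch of a per-VM protection policy; the structure and field names are mine for illustration, not SimpliVity's actual interface.

```python
from dataclasses import dataclass, field

@dataclass
class BackupRule:
    frequency_minutes: int      # how often a point-in-time backup is taken
    retention_days: int         # how long each backup is kept
    destination: str = "local"  # "local" or a named remote site

@dataclass
class ProtectionPolicy:
    name: str
    rules: list = field(default_factory=list)

# "Back up every 10 minutes, keep 2 days locally, replicate hourly off-site."
gold = ProtectionPolicy(
    name="gold",
    rules=[
        BackupRule(frequency_minutes=10, retention_days=2),
        BackupRule(frequency_minutes=60, retention_days=30, destination="dr-site"),
    ],
)

# Applying the policy to a VM is then a one-time assignment; the platform
# handles the scheduling, aging, and retention from that point on.
vm_policies = {"sql01": gold}
```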

RPO/RTO can be met very quickly, and replication schedules can be crafted at the per-VM level based on data change rates instead of trying to meet a one-size-fits-all scheme. Now, that is for local VMs running on a high-availability-enabled OmniCube platform (two OmniCubes). One thing to keep in mind is that as the system scales, there is high availability not just for running VMs but at the storage level as well. So if I have two OmniCube systems connected together and one fails, not only does standard VMware HA kick off to bring the VM up on the surviving system, the backup data and VM storage are protected as well.


Blah Blah Blah

So yet again, I find myself being a bit long-winded and prattling on. What started out as a screengrab from a system in the lab moved into a full blog post, which migrated into a second blog post, and looks like it might move into a third. Hopefully you have not fallen asleep yet and can see that there is something to this hyper-convergence stuff that you were not aware of. Next up: VM mobility. Oh, and if you are looking for additional information on OmniCube, this is a good starting point.

Edited to add, because my Veeam friends asked: what about granular file-level recovery and application-integrated backup/recovery? Application-consistent backup is currently available, but single-file-level recovery is not. For the larger part of this discussion I've focused on machine-level protection rather than app/file. Hope that helps.

Posted in HyperConvergence, Omnicube, SimpliVity, Storage, Storage & Virtualization

vExpert Weekly Digest – December 19th 2013

It's that time yet again: the vExpert Weekly Digest is now available for your perusal. This week I cheesed and put my own article on the cover, primarily because that image cracks me up every time I see it. Good stuff abounds for your reading pleasure; make sure to check out the two-part series by Kenneth Hui on loading OpenStack on your laptop. Also some good storage-related stuff from the usual suspects. And as a very special treat, the entire Metallica concert from Antarctica, just to mix it up and throw everyone into the Christmas spirit.

Posted in vExpert, vExpert-Weekly, Virtualization

DEDUPE ALL THE DATA

Fun fact: this post will be markety in nature.

I just got back from the Gartner Data Center Symposium, where we (SimpliVity) were a Platinum sponsor. This was my first time at a Gartner event, and I found the overall atmosphere of the show to be very well put together and effective. One of the really nice things about this show was that the sessions did not compete with time on the solutions exchange floor, so participants did not have to sacrifice a session in order to go talk with the vendors they were interested in.

One other aspect of the show I really liked was the ability to speak to both end users/decision makers and the Gartner analysts. I found myself having much longer one-on-one or small-group discussions about our technology and how it would fit into their organizations, instead of just running through a canned pitch/demo. For me, the evangelization aspect of working the show floor is what I tend to enjoy. That said, I was doing a fair amount of demonstrations of our new software release on the OmniCube system.

This of course leads me into the thrust of this post and discussion.

Dedupe All The Data.

There are three primary means of reducing the storage footprint for virtualized workloads in the datacenter today: deduplication, compression, and thin provisioning. All three reduce the physical storage required, all three have matured to the point of wide adoption, and the technology behind them is well understood and accepted across the IT spectrum.

Deduplication: It’s Not Just For Space Reduction

For some storage platforms, dedupe takes place as a post-process action: it comes in after the data has been written and reduces the total physical capacity after the fact. This is helpful for storage space reduction, but it has little to no benefit for reducing IO operations. In fact, I'd argue it generates more IO within the array, as the work required to hydrate/dehydrate the data carries significant overhead. That overhead impacts performance, which is also why most storage systems today cannot do inline deduplication of data. As with tiering and data progression actions (where data is moved about the array during a scheduled period or in real time), post-process dedupe can lay some serious hammertime onto your underlying disk infrastructure.
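Here's a toy sketch of the inline alternative, using a hypothetical content-addressed store (not any vendor's code): each block is hashed and compressed before it ever reaches disk, and a duplicate block never generates a write at all, so there is nothing to come back and dehydrate later.

```python
import hashlib
import zlib

class InlineDedupeStore:
    """Toy content-addressed store: dedupe and compress before writing."""

    def __init__(self):
        self.blocks = {}    # digest -> compressed block ("physical" storage)
        self.refcount = {}  # digest -> number of logical references

    def write(self, block: bytes) -> str:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in self.blocks:
            # Unique data: compress it and store it exactly once.
            self.blocks[digest] = zlib.compress(block)
        # Duplicate data: no write happens at all, only a reference is added.
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

    def read(self, digest: str) -> bytes:
        return zlib.decompress(self.blocks[digest])
```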

Now don't get me wrong, there is a lot to be said for space reduction at the storage level. The all-flash array vendors require it to be efficient and to bring the cost of their flash down in line with the cost of disk; if they can't, it blows the cost model for flash as a general-purpose storage platform. Then there is deduplication used in backup and replication technologies. Given the data explosion, the need to reduce RPO/RTO for backup windows is significant. That weekend full that takes 64 hours to back up 50TB of data isn't cutting it for backup, and it most certainly isn't going to work for a recovery point objective of 4 hours. So we get backup appliances that incorporate dedupe so the same data isn't backed up redundantly. Going further, replication at the storage level really benefits from deduplication as well: if you only have to send data across the WAN once, you're much better off than sending the same blocks over and over again. Bandwidth is expensive.
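Rough arithmetic on those numbers shows just how badly the weekend full scales (back-of-the-envelope, decimal units):

```python
# Sustained throughput needed for the weekend full versus a 4-hour window,
# using the 50TB / 64-hour figures above.
data_tb = 50
full_hours = 64
rpo_hours = 4

def gbit_per_sec(tb, hours):
    return tb * 8000 / (hours * 3600)   # 1 TB = 8000 gigabits

print(f"64-hour full:  ~{gbit_per_sec(data_tb, full_hours):.1f} Gbit/s sustained")
print(f"4-hour window: ~{gbit_per_sec(data_tb, rpo_hours):.1f} Gbit/s sustained")
```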

All This Brings Me to What If?

What if all of our data were deduplicated before it hit the disk? To do that, you have to be able to do inline deduplication. Taking it further, you need to do it at a very fine-grained level (4-8KB blocks). Just so we are clear, this can't be done post-process, and it's not enough to do it on a single tier of storage within the array: you need to provide dedupe once and forever across all tiers of storage (DRAM, SSD, HDD), across all classifications of data (primary, backup, WAN, and cloud), and do it on a global scale (across all locations that contain your data). Oh, and for fun let's throw in the requirement that there be no performance penalty at any level of that process. One last thing: throw compression into the mix as well, because that would be sweet. Sweet like a Choco Taco made with a bacon shell. If you have this functionality, you can move beyond simple space reductions on the storage tier and into the realm of Data Virtualization.

Data Virtualization Engine

When I first started working with our OmniCube platform I didn't quite get the true benefit of what we call our Data Virtualization Engine. I thought, hey, it's cool that we dedupe and compress data, that will save tons of space. What I didn't fully grasp was the drastic amount of IO we would essentially eradicate. With inline deduplication and compression occurring before data ever moves down to the traditional disk layer, you gain the ability to simply not do the IO at all. As I'm fond of pointing out, Gene Amdahl said, "The best IO is the one you don't have to do", and I can think of no other platform that illustrates this like the OmniCube.

So, what am I looking at in the image above? Essentially, the data structure layout on the OmniCube system. Data is broken out into logical groupings of VM data, local backups, and remote backups (data replicated from one OmniCube Federation to another). Also broken out are the deduplication and compression ratios, which when multiplied give an Efficiency quotient; in the example above, it's 164:1. And while a total of nearly 72TB of data is being stored on a pair of OmniCubes that can physically hold 28.4TB, the actual amount of unique data written has been only 445GB (about 1.5% of physical capacity). The result is called Savings, i.e. eradicated IO, or IO we have not had to write. It goes a step further as well: when you don't have to write data more than once, new writes that are not truly unique remain unwritten to the backend disks even though they are acknowledged to the VM. This increases performance far beyond what a traditional storage platform can achieve, even with hundreds of spindles. A write you don't have to do will always be faster than the one you must do, regardless of how fast your underlying storage is. This is why we call the equation Efficiency and the result Savings.
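Restating the arithmetic behind that screenshot (a sketch using the rounded figures quoted above, which is why it lands near, rather than exactly on, the 164:1 and 1.5% shown in the interface):

```python
# Efficiency = dedupe ratio x compression ratio, or equivalently logical/physical.
logical_tb = 72.0      # VM data + local backups + remote backups, as presented
physical_gb = 445.0    # unique, compressed data actually written to disk
capacity_tb = 28.4     # raw capacity of the OmniCube pair

efficiency = logical_tb * 1000 / physical_gb        # ~162:1 (quoted as 164:1)
capacity_used = physical_gb / (capacity_tb * 1000)  # ~1.6% (quoted as 1.5%)
savings_tb = logical_tb - physical_gb / 1000        # ~71.6 TB of IO never written

print(f"Efficiency ~{efficiency:.0f}:1, "
      f"capacity used ~{capacity_used:.1%}, "
      f"savings ~{savings_tb:.1f} TB")
```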

Yeah Right, Lab Queen.

Ok, so you're probably asking yourself what the makeup of those machines was; I bet it was just the same machine over and over again. Well, for the record, it's not. It's several VMs with a data creation script that generates change-rate data over the day, roughly a 2:1 ratio of Windows to Linux of varying sizes. The larger machines are roughly 470GB when compressed, with the smaller systems clocking in at 3GB, 16GB, and 40GB. Now for a bit more explanation about the numbers. One thing that drives the seemingly unrealistic Efficiency ratios is the fact that we never have to write the same block twice in the system. I can also take a backup of a virtual machine 100 times, and we will count each of those in the logical calculation. Before you cry foul, keep one thing in mind: if you were going to back those machines up through traditional means, you would have had to read and write out all of that data. We don't, and since we don't, it's our position that you should understand just how efficient the Data Virtualization is.
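To make that concrete with rough numbers (illustrative only; the change-rate figure below is invented for the example): one 470GB VM backed up 100 times counts as roughly 47TB on the logical side, while the physical side grows only by whatever is genuinely new.

```python
# Illustrative only: why repeated backups inflate the logical side of the ratio.
vm_size_gb = 470          # one of the larger lab VMs, per the text
backups = 100             # point-in-time backups taken of that VM
daily_unique_gb = 5       # hypothetical unique data added by the change script

logical_gb = vm_size_gb * backups            # 47,000 GB counted logically
physical_gb = vm_size_gb + daily_unique_gb   # only new unique blocks get written

print(f"Logical {logical_gb/1000:.0f} TB vs physical ~{physical_gb} GB "
      f"=> ~{logical_gb/physical_gb:.0f}:1 for this VM alone")
```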

This post is getting a little long-winded and, in my usual fashion, a little divergent from the general thrust of what I was looking to write. But the one thing that should stick with you, dear reader, is that you cannot do what I've shown you above with traditional storage platforms, especially ones that claim dedupe but only do it across a single tier of storage or as a post-process action.

The sister post to this one will go further into backup and replication and why I tend to say that Backup is Broken. You will have to stay tuned for that one. As always, more information about SimpliVity can be found here.

Edited to add, per a point brought up in the comments: we are not simply writing a single copy of this primary data; it would not be proper to do so. In the next post about backup, I'll go into more depth on the underlying data protection aspects. To put it simply, we will have more than a single copy of primary data, and it will not always reside on the same system. Full storage HA comes with a secondary OmniCube unit or multi-site implementations.

Posted in HyperConvergence, Omnicube, SimpliVity, Storage, Storage & Virtualization, Uncategorized

vExpert Weekly Digest – December 6th 2013

Back to our regularly scheduled programming. This week finds a good collection of stuff that isn't necessarily VMware related but is still well within the virtualization realm. Overall a bit of a slow week; I'm guessing it has something to do with the holidays. As for me, I'll be at the Gartner Data Center event in Las Vegas from this Sunday through Wednesday. This is my first time at a Gartner event, so I'm not 100% sure what to expect.

Shameless plug time: if you are attending the event, SimpliVity is hosting a Scotch & Rock event on Wednesday evening. Come join us for Scotch tasting and 80's rock. Sadly I won't be able to attend, as I'll be presenting in conjunction with Brocade in Santa Clara that day (click here to register) on "Supercharging your VDI Environment", which sounds groovy.

Posted in SimpliVity, vExpert, vExpert-Weekly, Virtualization

vExpert Weekly Digest – Black Friday Edition

What better way to pass the post-holiday food coma that many of you are probably still recovering from than with some casual reading of vExpert community ramblings? Or you could be out spending the day trying to score an Xbox One/PS4 during Black Friday. Lots of good stuff this week; make sure you check out Mike Preston's ongoing series on VCAP prep. Also lots of discussion around OpenStack and AWS this week, both subjects I'm continuing to expand my knowledge on.

Posted in vExpert, vExpert-Weekly, Virtualization

vExpert Weekly Digest – Yes I paid my webhosting bill edition

So that was embarrassing. Having been on the road for some time, I realize I probably should have actually listened to those GoDaddy spam messages that kept showing up on the home phone I no longer use. Hosting fail.

That said, massive travel along with huge crazy awesomeness at work (aka a $58M C round) has kept me completely bogged down in tech evangelism, selling, outreach, awesome. Still, I've made time in my unrelenting schedule to whip up an edition of the vExpert Weekly Digest for your reading pleasure. I've not forgotten about the half-dozen unwritten blog posts still sitting in my skull; once I figure out how to completely avoid sleep, I'll start cranking them out.

Posted in vExpert, vExpert-Weekly, Virtualization, VMWare

VMUG Portlandia

I only recently started watching Portlandia; yes, it's funny and worth at least an introductory viewing.

That said, I’ll be displaying my wares at the Portland VMUG this November 12th, so if you are in the area please swing by and say hi.

Following the day's events there will be vBeers at the Spirit of 77, right across from the convention center where the End User Conference is taking place. Looks like it will be a pretty awesome place to hang out and chew the fat.

Posted in SimpliVity, vBeers, VMUG

vExpert Weekly Digest – November 5th 2013 – Super Slacker Edition

Long time no post; frankly, I've been mega busy these last few weeks. Call it Geek, Interrupted (only I don't look anything like Angelina Jolie). With my work schedule taking me all over the western United States (yes, I'm the official Southwest seat warmer for the Nerd Bird), I've managed to carve out a few key minutes to pull together a quick vExpert Weekly Digest for November 5th.

So much great content out there; I really wish I could put it all in one place, but then you, my readers, would get nothing done. So give it a flip. As always, feedback is welcome, and you can find me on the Twitters @Bacon_Is_King.

Posted in vExpert, vExpert-Weekly, VMUG, VMWare

vExpert Weekly Digest – October 11th 2013

New edition hot off the Ethernet. With VMworld Europe kicking off this Monday, there is a lot of buzz around what the VMware community will be doing across the pond. Of course there will be new sessions and announcements coming out of this year's show, so I'm hoping to collect them all in the next issue. If you are going to VMworld Europe, it's probably not too late to grab a ticket for the vRockstar party at the Hard Rock. Hit up Hans or Patrick.

Posted in vExpert, vExpert-Weekly, VMWare, VMworld

3E’s of Storage Part Deux

Quick stealth edit: I should preface this by saying that much of this discussion is focused on storage in virtualized environments, but the overall gist of the 3E's should apply to physical as well.

So in part one, I rambled on about some of the stuff we see with legacy storage platforms and how they don't necessarily fit what I call my 3E's of Storage: Efficient, Effective, Easy. Obviously that raises the question: what does? So today I get to ramble on some more.

Efficient

When it comes to storage efficiency, deduplication and compression have long been the two factors utilized in the marketplace, with both used in conjunction to further accelerate the "space saved" equation. As spinning disks reach sizes that hurt my head (rebuild times: forever), the capacity issue isn't as pressing as it once was when it comes to simply storing data. Still, there is relevance beyond simple data reduction, especially if these two functions are applied during data ingestion and not as a post-process. When done across all tiers of storage (DRAM, SSD, HDD) for primary, backup, and replicated data, storage efficiency moves into the realm of IO reduction.

True storage efficiency runs across all tiers and all data sets. When you start to do that, you start to eradicate IO operations. I've used this Gene Amdahl quote repeatedly, and it boils true efficiency down to this: "The best IO is the one you don't have to do". When I talk about efficiency, that's essentially what I mean. This aspect evolves further when we apply it to backup and replication actions; done correctly, it allows near-instantaneous point-in-time full backups at the object level, or in the case of virtualized workloads the virtual machine level, all while creating no additional IO and consuming no additional space. We simply don't write what we don't have to.
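As a toy model of how a "full" backup can cost no data IO: in a deduplicated, metadata-driven system the backup is just a frozen copy of the object's block reference list. This is purely illustrative, not the actual implementation.

```python
class DedupedStore:
    """Toy model: a VM is an ordered list of references to unique blocks."""

    def __init__(self):
        self.vm_blocks = {}   # vm name -> list of block digests
        self.backups = {}     # (vm name, label) -> frozen list of digests

    def backup(self, vm: str, label: str):
        # A point-in-time "full" backup is only a copy of the reference list.
        # No data blocks are read or rewritten, so it costs no data IO and no
        # additional capacity beyond a small amount of metadata.
        self.backups[(vm, label)] = tuple(self.vm_blocks[vm])

    def restore(self, vm: str, label: str):
        # Restoring is equally cheap: point the VM back at the frozen list.
        self.vm_blocks[vm] = list(self.backups[(vm, label)])
```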

This leads us into the Effective aspect of the 3E's, where we start to provide data management specific to objects. In the case of a virtualized infrastructure, that object is the virtual machine, and in that sense we can provide VM-centric storage operations. When you apply deduplication and compression globally, one of the primary benefits is the reduction of IO and the ability to remove IO from the storage equation altogether. Let's take the case of a traditional backup or clone: a high amount of read activity followed by a smaller amount of write activity. This is the traditional backup IO profile, high read IO and moderate to low write IO (at least for incremental backups). In an Efficient/Effective storage system this IO doesn't exist, because there is an understanding of the underlying file structure and no need to recreate blocks that already exist.

So with your standard backup you can significantly reduce the amount of local IO, but what's not immediately obvious is new functionality that doesn't really exist in legacy or even modern storage systems: virtual machine workload mobility.

Thanks to my buddy Hans De Leenheer for the great illustration below.

What we see here is two separate datacenters with VM workloads residing in both. What's also seen is that commonality of data exists between both locations at the block level. This provides two areas of functionality. The first is the ability to replicate a virtual machine from, in this case, San Francisco to London: data is evaluated at both ends, and only the blocks that don't already exist in London are sent across. The secondary location then has all the blocks required to rebuild that virtual machine; in essence, the backup operation becomes replication at the VM level (VM-centric). We don't have to replicate unnecessary data (more efficiency), stripping out log and swap files. We don't have to send blocks that already exist, significantly reducing the amount of IO that needs to traverse the WAN link. And we don't have to replicate VMs that happen to reside in the same datastore or LUN but that we have no desire to replicate, as we would with many array-based replication features. IO becomes a function of sending and receiving only what is necessary (aka Efficient/Effective). Now take this one step further and simply move the VM from San Francisco to London: turn off the VM in San Francisco, transfer only the unique blocks, and recreate it across the WAN in London. Depending on the amount of data and the available bandwidth this operation could take as little as a few minutes, and what you have effectively done is a cold vMotion/Storage vMotion over distance without the need for shared storage.
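A minimal sketch of that idea (hypothetical helper names; a real system would exchange digest summaries rather than full sets, but the principle is the same): compare block digests between sites and ship only what the destination is missing.

```python
def replicate_vm(vm_digests, local_blocks, remote_digests, send_block):
    """Ship only the unique blocks the destination does not already hold.

    vm_digests     -- ordered block digests that make up the VM being replicated
    local_blocks   -- digest -> block bytes at the source site
    remote_digests -- set of digests already present at the destination
    send_block     -- callable that pushes one (digest, bytes) pair over the WAN
    """
    # Deduplicate the VM's own block list, then drop anything London already has.
    needed = [d for d in dict.fromkeys(vm_digests) if d not in remote_digests]
    for digest in needed:
        send_block(digest, local_blocks[digest])   # only these cross the WAN
    # The destination now holds every required block; a small manifest (the
    # ordered digest list) is all that remains to send so the VM can be
    # reassembled on the far side.
    return needed
```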

The last aspect of the 3E's is Easy, and honestly, who doesn't like that? Easy from the standpoint of abstracting away the mundane and complex nature that storage has had in the past. Gone are the days of zones, LUN masking, LUNs in general, configuring and updating HBAs, WWNs/WWPNs, RAID sets, disk groups, and so on; the list goes on and on. What could have taken hours of work should be replaced by a few mouse clicks and a small amount of data input. Moving the storage closer to the VM and compute, out of the array as it were and onto the hosts themselves, is one aspect of this; not having to traverse the traditional storage area network and the headaches involved with it is a benefit. Furthermore, not having to rely on a dedicated storage team to provision simple resources, a task that for some organizations can take days or weeks due to political aspects, allows IT to become more agile, more effective, and to dedicate its time and resources to the business itself instead of the joys of menial tasks.

In conclusion: I very well may have no idea WTF I’m talking about, but I do see a need for significant disruption in the IT space.

The last few months have been a real journey of discovery for me as I've moved into working with a technology that I truly believe is not just new and exciting, but also addresses key aspects of my overriding philosophies about IT as a whole. I believe there should be an aspect of IT that is simple. I often use the hashtag #SimplifyIT in discussions around hyperconvergence, and part of its goal is to take the complexity out of that legacy stack of technologies and condense, consolidate, and converge it (oooh, 3C's sounds like a new blog post). I also believe that the silos we have created do us no favors, and in order to break those silos apart we need to do what we can to make technology approachable and accessible to the cross-section of IT department members. That's where I see a need for the technological aspects of the 3E's when it comes to storage. It doesn't have to follow the items I've laid out exactly, but it should attempt to take some of those processes and apply them in the datacenter.

Posted in Backup & Recovery, Enterprise Tech, HyperConvergence, Omnicube, Storage, Storage & Virtualization