Bear with me, folks, as this is going to be a two-parter, and yes, I ramble when I write, which is also how I speak.
Like many people who work in the technology field, I’m a bit of a pack rat when it comes to old hardware and tech manuals. Rooting around the garage the other day, I came across a pile of Storage magazines from the mid to late 2000s.
In one of the lead stories, “How to provision the right way,” one thing stands out immediately: many of the discussion points and trends from a decade ago are still just as relevant today.
We still struggle to provision and allocate storage.
We still employ tricks to address performance.
We still have yet to tame storage growth.
When one looks at the numerous storage products on the market today, some are afflicted by all of those challenges, some can help alleviate a few, and very few have been able to address them all.
Let’s tackle these issues one by one.
Provisioning/Allocation
Storage administrators have historically faced a difficult decision tree when it comes to provisioning and allocating storage.
As frustrating as it may seem, there are still solutions today that require the administrator to pick a specific RAID level at the LUN level, or to perform LUN masking, striping, and concatenation of RAID groups or disk pools. There can also be manual path selection concerns for most dual-controller storage solutions that lack a proprietary pathing software component. Adding further complexity, some solutions still deploy disparate disk types, which demand complex design work during provisioning.
While becoming less prevalent, some solutions leverage LUN mirroring for data protection. This requires that the administrator allocate additional storage for data protection or purchase twice the storage capacity actually required (try explaining that one to your CFO). Then, of course, there are reserves and set-asides for replication events, snapshot reserves, and the like.
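To put some rough numbers on that mirroring penalty, here's a minimal back-of-the-envelope sketch. The figures and protection schemes are illustrative assumptions only, not any particular vendor's math:

```python
# Back-of-the-envelope capacity math for mirrored vs. parity protection.
# Figures are illustrative only; real arrays add metadata, spares, and
# snapshot/replication reserves on top of this.

def usable_capacity_tb(raw_tb: float, scheme: str) -> float:
    """Return rough usable capacity for a given protection scheme."""
    overhead = {
        "mirror": 0.50,         # RAID 1/10-style mirroring: half the raw capacity
        "raid6_8plus2": 0.20,   # 8+2 parity group: two drives of every ten go to parity
    }
    return raw_tb * (1.0 - overhead[scheme])

if __name__ == "__main__":
    raw = 100.0  # TB of raw disk purchased
    for scheme in ("mirror", "raid6_8plus2"):
        print(f"{scheme}: {usable_capacity_tb(raw, scheme):.0f} TB usable of {raw:.0f} TB raw")
```

Layer snapshot reserves and replication set-asides on top of that, and the gap between what you bought and what you can actually hand out gets even wider.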
Given the strong move by many organizations toward more automated and simplified management across all aspects of the data center, the issues above pose a barrier to rapid provisioning and allocation of storage for the next-generation data center.
Storage Performance Shenanigans
Let’s face it, storage performance has always been a challenge, especially with primarily disk-based solutions. Vendors have done everything from short-stroking hard drives to loading in expensive cache memory and cache cards to, in some instances, throwing a few solid-state drives into the mix to address the shortcomings of the underlying architecture.
While most solutions today have moved away from those tricks, data tiering/disk tiering is still commonly used to address performance shortcomings.
On its face, a tiering solution seems smart. Let’s use HDDs for what they do best (capacity, sequential I/O), use faster disk, flash, or cache tiers for what they do best (high-speed random I/O), and maybe even throw in the “cloud” as a long-term archival tier, because it’s “cheap.”
But (and there’s always a but) the devil tends to be in the details with these types of solutions.
Specifically, the ability to predict when data will be “hot” is a challenge in and of itself, and the backend management of those “hot blocks” can result in resource contention in the classic robbing-Peter-to-pay-Paul sense. CPU cycles and overhead dedicated to moving granular data chunks (ranging in size from 32 KB to 1 MB) can reduce performance, and in some instances, by the time the data has been moved from the capacity tiers into the performance tier, the event that triggered the move may already be over.
So now we get to start all over again, moving data up and down the tiers, wasting CPU cycles, burning through wear cycles on flash media, and shortening the life of spinning media. The hidden cost of data movement inside the array, and sometimes off of it, must be taken into account when evaluating solutions that rely on on-array tiering and data movement.
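To make the mechanics concrete, here is a deliberately simplified sketch of the kind of scan-and-promote loop a tiering engine runs. The chunk size, promotion threshold, and access pattern are all made-up illustrative values, not any vendor's actual algorithm:

```python
# A deliberately naive sketch of "hot block" promotion between tiers,
# just to illustrate where the overhead comes from.

from collections import Counter

CHUNK_SIZE = 1 * 1024 * 1024  # assume the array tiers data in 1 MB chunks
PROMOTE_THRESHOLD = 3         # accesses per scan interval before a chunk counts as "hot"

def plan_promotions(access_log: list[int], already_hot: set[int]) -> tuple[set[int], int]:
    """Decide which chunks to promote and how many bytes that movement costs."""
    counts = Counter(access_log)
    newly_hot = {chunk for chunk, hits in counts.items()
                 if hits >= PROMOTE_THRESHOLD and chunk not in already_hot}
    bytes_moved = len(newly_hot) * CHUNK_SIZE  # every promotion is extra back-end I/O
    return newly_hot, bytes_moved

if __name__ == "__main__":
    # Chunk IDs touched during one scan interval; by the time the scan runs,
    # the burst that made chunks 7 and 9 hot may already be over.
    log = [7, 7, 7, 9, 9, 9, 9, 12, 3]
    hot, moved = plan_promotions(log, already_hot=set())
    print(f"promote chunks {sorted(hot)}, moving {moved / 1024 / 1024:.0f} MB behind the scenes")
```

A real array runs this kind of loop continuously, in both directions, which is exactly where the CPU cycles and media wear described above come from.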
Death, Taxes, and Growing Storage Demands
Indeed, as the old adage goes, there is nothing certain in life but death and taxes.
Let’s add storage growth to that as well. For most of us who started out as “storage admins,” predicting storage growth within the organization has always been an exercise in futility. Sure, we poll the organization or business units that need storage. We work with teams to “predict” specific allocation needs for a project, which, by and large, always seem to end up at 3x what was asked for.
Bottom line: things change rapidly, and that sweet box of storage we bought last year can be outgrown by a phone call from a business unit, a new customer being onboarded, or an unforeseen merger or acquisition. While some storage growth is indeed predictable in smaller environments, as the organization scales, so do its needs.
Many storage solutions today are perfectly capable of adding additional shelves of capacity, but when that capacity outgrows the controllers behind it, we have a problem, and that’s when the dreaded “forklift upgrade” or expansion starts to rear its ugly head. Scaling storage capacity in step with the compute capability needed to drive it tends to elude many of the current crop of storage arrays, because they cannot scale out in a way that addresses capacity and compute in parallel.
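A toy model shows why capacity that grows behind fixed controllers eventually hurts. The controller count and IOPS figures below are purely illustrative assumptions, not benchmarks of any product:

```python
# Toy model: performance per TB when capacity scales but compute does not.

def iops_per_tb(controller_iops: int, controllers: int, capacity_tb: int) -> float:
    """Controller performance spread across the capacity sitting behind it."""
    return (controller_iops * controllers) / capacity_tb

SCALE_UP_CONTROLLERS = 2     # dual-controller array: compute is fixed
CONTROLLER_IOPS = 100_000    # made-up per-controller ceiling

for capacity in (100, 200, 400, 800):  # adding shelves over time, in TB
    print(f"{capacity} TB behind {SCALE_UP_CONTROLLERS} controllers: "
          f"{iops_per_tb(CONTROLLER_IOPS, SCALE_UP_CONTROLLERS, capacity):.0f} IOPS/TB")

# A scale-out design adds a controller (node) with each capacity increment,
# so the ratio stays roughly flat instead of halving with every expansion.
```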
The result tends to be a more siloed storage approach, as organizations outgrow the capacity and performance of solutions that may have met their needs initially.
What to do?
Sometimes an organization will face one of these challenges; sometimes it will face all three. When looking to the future and attempting to address provisioning, performance, and capacity at the same time, it’s important to remember that no one size fits all and that there are indeed trade-offs to be made. The goal, of course, is to limit the impact of those trade-offs while minimizing risk and reducing costs.
I have a bit of a bias when it comes to how to address these issues, so in the next part of this post I’ll tackle each one head on. Stay tuned.