As most of you know, VMware changed its licensing model for vSphere 5 just prior to VMworld 2011. The reaction was less than positive. I think the initial thread in the VMTN forums was close to 133k views by the time VMware decided to revise the initial scheme and raise the entitlements across the board.
A little history:
If you’re not familiar with the new licensing entitlement breakdown, the revised edition is here.
I think customers were right to be alarmed at the low initial entitlements per socket. The documentation that came out with the original entitlements pointed to consolidation ratios of around 5:1, which may have been the case in, say, 2006, but today many customers are achieving ratios of 35:1 and higher depending on the workload being virtualized. In my organization I have roughly 30:1 consolidation ratios per host, and expect to reach 45:1 with additional per-host memory upgrades in the near future. The increased entitlement went a long way toward quelling the uprising that resulted.
As an FYI, Rynardt Spies of VirtualVCP.com (@rynardtspies) has an excellent vSphere licensing calculator that was very helpful in determining pricing.
Birth of a Virtual Environment:
For the record, I have learned a great deal in the last 2 years when it comes to virtualization and host sizing. My initial VMware hosts were dual-socket, dual-core IBM x3650s with 12GB of RAM running ESXi 3.5. I had migrated off of Windows Virtual Server 2005 and onto “free” ESXi 3.5 as a proof of concept, both to show my manager that VMware was going to be the better virtualization play and to show that I could do the work myself without having to engage outside consultants. My organization is very cost conscious and it is difficult to get budget for many of the IT projects that we take on. Regardless of the huge ROI we could realize with virtualization, there was serious pushback from upper IT management and a great deal of skepticism and fear.
Currently I manage a small virtualization environment and we are primarily an IBM shop. I have 3 Production Data Centers, separated out by host type and location. Our first true production cluster was 2 IBM 3850 M2 quad-socket, quad-core systems with 128GB of RAM. We use 8Gb FC for datastore connectivity and 10Gb Ethernet for LAN. At the time I purchased these systems, they were the largest x86 machines in our environment, eclipsed only by our Power5 and iSeries.
As we started P2V’ing more and more systems, and as requests for new servers came in, the need for additional hosts to support the ever-growing environment became clear. At that time the new Nehalem architecture was being finalized, and we opted to wait until the IBM X5 series of servers was released before we purchased new hosts. A second production cluster was introduced with 2-socket, 8-core x3690 X5 servers with 128GB of RAM each. We were very pleased with the performance of the new X5 servers, and noticed that they were far more efficient than our older 4-socket xSeries servers.
As with many virtualized environments, we are memory constrained, to the point that deploying additional VM’s will have to wait until something changes. The graph above was the impetus for this post.
Now when I graph memory utilization per socket, the real issue is exposed.
As you can see, some servers perform better than others when it comes to memory utilization per socket. For me the idea is to get as much bang for your buck as you can. The 3690 systems are currently utilizing 46GB of memory per socket, versus 13GB per socket for the 4-socket 3850’s. Part of this is higher socket density, but when you look at CPU utilization, the higher socket count systems ultimately become less efficient and far more costly than the 2-socket systems, which outperform them.
Now, given the changes to the VMware licensing, my goal is to utilize the full memory entitlement per socket on my ESXi hosts. I want to be able to deploy enough VM’s per host to use up the entire 192GB vRAM entitlement for a two-socket host, and allow for N+1 redundancy in the cluster, without incurring a vTAX cost. Thus, the sweet spot:
- 4 ESXi Hosts with 256 GB of RAM each
- 192 GB active on each system takes full advantage of Enterprise Plus licensing
- Spare 64GB capacity on each host provides for N+1 with 4 hosts
- 4 Hosts provides for approximately 192 Standard Servers at 4GB of RAM each
- Consolidation Ratio of 48:1 (best case scenario, but not likely with real world workloads)
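The arithmetic behind the bullets above can be checked with a few lines of Python. This is only a rough sketch: the 96GB per-socket figure is simply the 192GB two-socket entitlement divided by two, and the function and key names are made up for illustration.

```python
def sweet_spot(hosts, sockets_per_host, vram_per_socket_gb, ram_per_host_gb, vm_size_gb):
    """Summarize licensed vRAM, spare physical RAM and VM count for a cluster."""
    licensed_per_host = sockets_per_host * vram_per_socket_gb  # vRAM entitlement per host
    spare_per_host = ram_per_host_gb - licensed_per_host       # physical RAM held back for failover
    cluster_vram = hosts * licensed_per_host                   # pooled entitlement across the cluster
    vm_count = cluster_vram // vm_size_gb                      # VMs that fit in the entitlement
    return {
        "licensed_per_host_gb": licensed_per_host,
        "spare_per_host_gb": spare_per_host,
        "cluster_vram_gb": cluster_vram,
        "vm_count": vm_count,
        "vms_per_host": vm_count / hosts,
    }


# Assumed inputs: 4 two-socket hosts, 96GB vRAM per socket, 256GB RAM per host, 4GB VMs.
summary = sweet_spot(hosts=4, sockets_per_host=2, vram_per_socket_gb=96,
                     ram_per_host_gb=256, vm_size_gb=4)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The 48 VMs per host is a ceiling set by memory alone; real workloads with larger VMs or CPU contention will land lower.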
Arriving at this conclusion:
I’m no Excel wizard, but I’ve tried my best to determine exactly where the price point was for our organization when it came to getting the most out of our existing hosts, and planning for future host purchases.
Here I took 4 different host/memory configurations to find the sweet spot where I can support N+1 while utilizing the full memory entitlement for each socket, without a tremendous amount of surplus memory being wasted. The idea is to have zero excess memory while utilizing the full licensed amount during a host failure.
I had to choose 512GB of RAM per host as my upper limit because that is the max for the X5 servers without going to an external memory tray.
For my IBM systems, the prices reflect what I’ve paid in the past and what I’ve been quoted for future purchases. Your prices will of course vary, and this is by no means 100% accurate, but it does give a good indication of what I would expect to pay given our current environment.
The red line is where N+1 comes into effect. A 3 host cluster will need 384GB of RAM per host in order to utilize the full licensed vRAM per socket and sustain the loss of a single host. A 4 host cluster needs 256GB, as does a 5 host cluster. It’s not until we reach 6 hosts with 384GB of RAM per machine that we get the biggest impact: the ability to lose half the cluster and still have enough capacity in the remaining systems to run all of the licensed virtual machines. Still, that’s a lot of unused physical memory.
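The per-host numbers above fall out of a simple rule: the surviving hosts must hold the licensed vRAM of the whole cluster. Here is a minimal sketch of that rule. The list of memory configurations (128, 256, 384 and 512GB) is my assumption, capped at the 512GB X5 limit, and the function names are hypothetical.

```python
CONFIGS_GB = (128, 256, 384, 512)   # assumed memory options per host
LICENSED_PER_HOST_GB = 192          # two sockets' worth of vRAM entitlement


def ram_needed_per_host(hosts, failures=1, licensed_per_host_gb=LICENSED_PER_HOST_GB):
    """Physical RAM each surviving host needs to carry the cluster's full licensed vRAM."""
    survivors = hosts - failures
    if survivors < 1:
        raise ValueError("at least one host must survive")
    total_vram = hosts * licensed_per_host_gb
    return -(-total_vram // survivors)  # integer ceiling division


def smallest_config(required_gb, configs=CONFIGS_GB):
    """Smallest available memory configuration that covers the requirement, or None."""
    for size in sorted(configs):
        if size >= required_gb:
            return size
    return None


for hosts in (3, 4, 5, 6):
    need = ram_needed_per_host(hosts)
    print(f"{hosts} hosts, N+1: need {need}GB per host, buy {smallest_config(need)}GB")

# Losing half of a 6 host cluster instead of a single host:
print("6 hosts, lose 3:", ram_needed_per_host(6, failures=3), "GB per host")
```

A 3 host cluster needs 288GB per survivor, which is why it gets bumped to the 384GB configuration, while 4 hosts land exactly on 256GB.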
For me, the sweet spot will be 4 hosts at 256GB of RAM. With one host down, the remaining three hold exactly the licensed vRAM of all four, so there is no excess memory. If I go to 5 hosts, I will have 64GB of spare memory not utilized by actual VM’s, but 4 hosts leaves me with 0. For me this is an exercise in being efficient with memory utilization while supporting the loss of a host for a short period of time and still allowing all VM’s to run. I know some shops have different requirements, but for me, being able to run all of my hosts at 75% memory utilization, which equates to the amount licensed, is a good fit. That 25% overhead is simply buying me the +1 without the cost of buying and licensing an entire extra host.
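To make the "no excess memory" claim concrete, here is a small sketch that compares the physical RAM left after a failure with the cluster's licensed vRAM. It assumes 192GB of licensed vRAM per host; the function names are hypothetical.

```python
def surplus_after_failure(hosts, ram_per_host_gb, licensed_per_host_gb=192, failures=1):
    """Physical RAM left over on the surviving hosts once all licensed vRAM is placed."""
    surviving_ram = (hosts - failures) * ram_per_host_gb
    licensed_vram = hosts * licensed_per_host_gb
    return surviving_ram - licensed_vram  # negative means the survivors cannot hold everything


def normal_utilization(ram_per_host_gb, licensed_per_host_gb=192):
    """Fraction of physical RAM in use on each host when nothing has failed."""
    return licensed_per_host_gb / ram_per_host_gb


print("4 hosts @ 256GB, one down:", surplus_after_failure(4, 256), "GB surplus")
print("5 hosts @ 256GB, one down:", surplus_after_failure(5, 256), "GB surplus")
print("Normal utilization @ 256GB:", normal_utilization(256))
```

A negative result from `surplus_after_failure` would mean the cluster cannot absorb the failure without shutting VMs down, for example 3 hosts at 256GB each.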
So if I have 4 hosts with 256GB of RAM and each host is running at the licensed vRAM max of 192GB, then after losing one host the cluster will still have enough physical memory (3 x 256GB = 768GB) to support the full licensed vRAM entitlement of all four servers (4 x 192GB = 768GB). That is, unless I don’t understand what I’ve been led to believe so far in regards to the licensing scheme for vSphere 5.
Now, I could be totally wrong on all of this and have missed some glaringly obvious flaw, so if you see something that I am missing, please let me know. I have Excel spreadsheets that back most of this up. I’m afraid they might not make a lot of sense to anyone but myself, but I will post them soon.