Through Howard Marks and a discussion on the need for new benchmark tools, I saw this link to Demartek digging under the covers of the various IO tools, finding that some of their data sets lend themselves to dedupe and compression algorithms far better than others.
SQLIO: all zeros = all worthless, though I do believe there is an option to use random data sets. I'm just not sure what that output looks like.
IOmeter 2006 = good
IOmeter 2008 = crap. Why the change?
The full data is at the link above. Perhaps my initial take on this is somewhat simplistic, but I'd say the overall issue remains: simple tools produce simple results. (If you want to sanity-check your own test files, there's a rough sketch below.)
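For what it's worth, here's a minimal sketch of how you could check a benchmark's test file yourself: read it in fixed-size blocks, hash each block to see how much of it would deduplicate, and compress it to see how well it squeezes down. This is my own illustration, not anything from Demartek's report, and the file name is just a placeholder for whatever your tool writes (IOmeter's iobw.tst, for example).

import hashlib
import sys
import zlib

BLOCK_SIZE = 4096  # assuming 4KB blocks, a common dedupe granularity


def analyze(path, sample_limit=256 * 1024 * 1024):
    """Estimate how dedupe- and compression-friendly a test file is."""
    unique_blocks = set()
    total_blocks = 0
    raw_bytes = 0
    compressed_bytes = 0
    with open(path, "rb") as f:
        while raw_bytes < sample_limit:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            total_blocks += 1
            raw_bytes += len(block)
            unique_blocks.add(hashlib.sha1(block).digest())
            compressed_bytes += len(zlib.compress(block))
    print(f"blocks read:       {total_blocks}")
    print(f"unique blocks:     {len(unique_blocks)}")
    print(f"dedupe ratio:      {total_blocks / max(len(unique_blocks), 1):.1f}:1")
    print(f"compression ratio: {raw_bytes / max(compressed_bytes, 1):.1f}:1")


if __name__ == "__main__":
    # e.g. python check_testfile.py iobw.tst
    analyze(sys.argv[1])

An all-zeros file like SQLIO's should come back as one unique block with a huge compression ratio; a file of genuinely random data should show roughly one unique block per block read and a ratio close to 1:1.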
So this quip from Howard got me thinking about a demo I saw from Nimble the other day:
If that weren’t depressing enough, even the most sophisticated benchmarks write the same, or random, data to create their entire dataset. While disk drives, and most SSDs, perform the same regardless of the data you write to them, the same can’t be said about storage systems that include data reduction technology such as compression or data deduplication. If we test a storage system that does inline deduplication, like the new generation of all solid state systems from Pure Storage, Nimbus Data or Solidfire, and use a benchmark that writes a constant data pattern all the time, the system will end up storing a 100GB test file in just a few megabytes of memory, eliminating pretty much all IO to the back end disk drives, or flash, to deliver literally unreal performance numbers.
During the demo, they run IOmeter on one of their rigs the entire time. You can see the system chugging along at around 16k IOPS, but without knowing which IOmeter version (and data pattern) they are using, that number is pretty worthless. I'm not trying to pick on them; it's just something I was unaware of with IOmeter, as well as some of the other tools, and it puts the results into better perspective for me.
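To put a rough number on Howard's point about constant patterns, here's a quick sketch (my own, with made-up sizes) comparing what a dedupe-aware array would have to store for a constant write pattern versus truly random data:

import os
import zlib

BLOCK_SIZE = 4096
NUM_BLOCKS = 1024  # small stand-in for the ~26 million 4KB blocks in a 100GB file

constant_block = b"\x00" * BLOCK_SIZE                        # what an all-zeros benchmark writes
random_blocks = [os.urandom(BLOCK_SIZE) for _ in range(NUM_BLOCKS)]

# An inline-dedupe system only has to store the unique blocks.
unique_constant = len({constant_block for _ in range(NUM_BLOCKS)})  # always 1
unique_random = len(set(random_blocks))                             # roughly NUM_BLOCKS

print(f"constant pattern: {unique_constant} unique block(s) out of {NUM_BLOCKS}")
print(f"random data:      {unique_random} unique blocks out of {NUM_BLOCKS}")

# Compression tells the same story.
print(f"constant block compresses to {len(zlib.compress(constant_block))} bytes per 4KB")
print(f"random block compresses to   {len(zlib.compress(random_blocks[0]))} bytes per 4KB")

Scale that up to a 100GB test file (about 26 million 4KB blocks) and the back end is storing essentially one block, which is how a benchmark ends up reporting the "literally unreal" numbers Howard describes.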
Just as an aside, the IOmeter version used in the VMware IO Analyzer tool is 2006.