Windows Storage Spaces 2016: What I learned about Microsoft’s SDS by upgrading my home lab.

I built it wrong 3 times, destroyed 3TB of data, but learned a lot.


In full disclosure, the most I’ve ever used Storage Spaces is in a very limited lab when I was studying for exam 70-411. I have always used more “established” SDS options (Ceph, ZFS, Gluster) and looked at Storage Spaces as a “cute” feature that they included in Windows Home Server, not something for actual enterprise use. I will also add that I didn’t get to play with any of the new clustering features or Storage Spaces Direct, and that makes me sad because that is what I’m really interested in for my prod environment.

Where I started:
My server is a whitebox. It used to be my gaming rig, which I turned into a Hyper-V host when I stopped gaming and switched my work battle-station to my MBP. The specs are an i7 920, 24GB RAM, 3x WD RE3 1TB, and 3x WD Red 2TB, running Server 2016 (RTM). The WD RE3s are in RAID 0 controlled by the motherboard (Intel ICH10) and were split into 2 partitions, one for the hypervisor OS and one for VM storage. The 3 WD Reds were passed to my Plex VM as Hyper-V pass-through disks (RDMs, in VMware terms). I have 6 VMs running at all times: 1x DC (2012 R2), 1x Ubiquiti WAP controller (CentOS), 1x pfSense, 1x media server running Plex, 1x ELK stack box (Ubuntu), and 1x app server (2012 R2). I ended up with the RAID 0 because I could only get my hands on 3 RE3 1TB drives (the 4th one I got failed SMART during my DBAN of the drive, was unusable and out of warranty, and at the time I couldn’t afford to go buy another one) and I needed the IOPS for the VMs. So I ran an MBR-partitioned 3TB RAID 0 where all the VMs and the boot OS lived (yeah, I know this was dumb, but I built this back in 2011 when I knew just enough to get stuff working).

What I wanted to accomplish:
I wanted to get rid of my RDMs (and the rat’s nest of symlinks I created to make them look like one single large drive) and move everything into a single pool of drives managed by the hypervisor, where I would just have to manage VHDX sizes on the VMs and not worry about shuffling media around drives. I purchased a Toshiba OCZ RD400 256GB (NVMe) and a 5TB Hitachi Deskstar 7.2K on Black Friday for $260 combined (thanks, Fry’s). I wanted that pool to have either an SSD tier or, at the very least, an SSD write cache. I went NVMe for my cache disk for a few simple reasons: I was out of SATA ports on my motherboard, I had no drive bays left in my case, and the NVMe drive was cheaper than a regular SATA III 256GB SSD plus a SIIG 6-port SATA III controller.

My First Attempt
After copying all my media off the 2TB disks and fixing all the system issues around that copy (see last section), I decided that I wanted to do a single parity space with an SSD tier: 3x 2TB WD Red in parity plus a single SSD in an SSD tier, then a volume that spans both, with ReFS doing the auto tiering. After lots of googling, I determined this wasn’t going to be possible (or at the very least I couldn’t figure it out). It seemed that the only two options I had for a tiered disk were Mirror or Simple. I didn’t really want to do a simple space because I wanted some sort of failure protection (my disks are now 5+ years old, and I don’t trust them), but at the same time I really wanted tiered storage. I kept getting failures saying the disk couldn’t be created because either the column settings or the disk resiliency settings were wrong. Lots of Google, PowerShell, and scotch later, I gave up and decided that forcing an unsupported configuration wasn’t going to happen. All the guides and whitepapers I read while trying to set this up recommended a minimum of 2 SSDs in a mirror for the SSD tier, with the storage tier also being a mirror. Many graphics showed the ability to use SSD + parity, but like I said, I couldn’t get that working (I did get this working later).

For those doing multiple tiers, this TechNet blog was super helpful in explaining why I still couldn’t use the wizard even though I had mixed drives. The PowerShell snippet I used to fix the issue was:

Get-PhysicalDisk | Where MediaType -eq "Unknown" | Set-PhysicalDisk -MediaType "HDD"
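Before running a bulk change like that, it can help to look at what Storage Spaces actually reports for each disk. This is a minimal sketch, assuming the in-box Storage module on Server 2016. The exact MediaType string for undetected disks may not be "Unknown" on every build (some report "Unspecified"), so it is worth checking the value before filtering on it.

```powershell
# List every physical disk with the properties that decide pooling and tiering.
Get-PhysicalDisk |
    Sort-Object FriendlyName |
    Format-Table FriendlyName, MediaType, BusType, CanPool, CannotPoolReason, Size -AutoSize

# Count disks per reported MediaType so the filter string can be verified first.
Get-PhysicalDisk | Group-Object MediaType | Select-Object Name, Count
```

The BusType and CannotPoolReason columns are also a quick way to spot a disk that the wizard silently refuses to offer.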

My Second Attempt
After reading more about 2016, I decided that I would just do a write-back cache on my NVMe drive and then put the disks in parity. This worked. I built my new fixed-provisioned virtual disk with 3-drive parity, gave it 100GB of write-back cache, then formatted it as a ReFS volume once it was complete, and it all seemed good!
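For anyone who wants to try the same layout, the steps above might look roughly like this in PowerShell. This is a sketch and not the exact commands I ran: the pool name, virtual disk name, and volume label are hypothetical, and it assumes every poolable disk on the box should go into the pool.

```powershell
# Hypothetical names; substitute your own.
$poolName  = 'LabPool'
$vdiskName = 'LabParity'

# Pool every disk that is currently eligible (CanPool = True).
$disks     = Get-PhysicalDisk -CanPool $true
$subsystem = Get-StorageSubSystem | Select-Object -First 1
New-StoragePool -FriendlyName $poolName `
    -StorageSubSystemFriendlyName $subsystem.FriendlyName `
    -PhysicalDisks $disks

# Fixed-provisioned single-parity space with a 100GB write-back cache.
New-VirtualDisk -StoragePoolFriendlyName $poolName `
    -FriendlyName $vdiskName `
    -ResiliencySettingName Parity `
    -ProvisioningType Fixed `
    -WriteCacheSize 100GB `
    -UseMaximumSize

# Bring the new disk online as GPT and format it with ReFS.
Get-VirtualDisk -FriendlyName $vdiskName |
    Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem ReFS -NewFileSystemLabel 'Media'
```

Options like -WriteCacheSize are exactly the kind of setting the GUI wizard does not expose, which is a big part of why I ended up in PowerShell.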

The Final Product:
Windows Storage Spaces 2016
1 storage pool containing all disks (except the 5TB): 3x 1TB, 3x 2TB, 1x NVMe.
1 virtual disk using parity (survives a single disk failure), 5.4TB total usable, 100GB write cache.
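As a rough sanity check on that usable figure, here is one way the arithmetic could work out. The column count is an assumption on my part (I have not confirmed it against the virtual disk), so treat this as an illustration of how parity overhead is calculated, not as a statement of how this space is actually laid out.

```powershell
# Raw capacity of the six HDDs, in decimal bytes as drive vendors count them.
$rawBytes = (3 * 1e12) + (3 * 2e12)

# ASSUMPTION: 3 columns, so one column's worth of every stripe holds parity.
$columns     = 3
$usableBytes = $rawBytes * ($columns - 1) / $columns

# Windows reports sizes in binary units (1TB is 2^40 bytes in PowerShell).
'{0:N2}' -f ($usableBytes / 1TB)
```

With those assumptions the result lands close to the 5.4TB that Windows shows. A different column count would give a different efficiency.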

Problems I had / Things I learned along the way:
1. Intel RST does NOT play nice with Windows Server 2016. Installing it prevented the server from booting if there were any disks attached to the ICH10 controller. It would just spin forever at the Windows loading screen.
2. The JMicron JMB36x WHQL drivers cause the disks to show as SCSI instead of SATA. This prevents the disks from being added to the primordial pool in Storage Spaces.
4. DO NOT use the GUI for Storage Spaces. Learn the PowerShell. The GUI is extremely limiting and caused a lot of head scratching while I tried to figure out why things weren’t working. PowerShell gave useful errors, and there were a lot of settings (like cache size and column settings) that I couldn’t select from the GUI wizards.
5. When you attach normal HDDs internally, you have to set the disk type. It picked up my NVMe drive as an SSD no problem, but my HDDs were all “Unknown”. The fix for this is the PS one-liner below. It wasn’t useful for me in the end, but if you are building an auto-tiered storage pool, it is crucial.

Get-PhysicalDisk | Where MediaType -eq "Unknown" | Set-PhysicalDisk -MediaType "HDD"
6. An SSD added to a pool will by default (in 2012 R2 and 2016) be used for cache. You do not need to set this manually. Just make sure that all your caches together don’t exceed the total available SSD space.
7. If you have a parity disk, you can see when your cache is draining to disk during a large write operation. Transferring 4TB of data was very hard on this pool. I totally agree with MS now that the parity tier is really for cold data only, but unfortunately you have to actually get cold data into that tier somehow.
8. You CANNOT modify settings like resiliency, column count, and cache size on a virtual disk once you create it. Once you execute that New-VirtualDisk command, that’s the last time you get to choose them. Make sure that you do all appropriate testing and verification BEFORE you start migrating data.
9. If you use a parity space, use the following PowerShell to get optimal performance out of it:
Set-StoragePool -FriendlyName <Storage Pool Name> -IsPowerProtected $True
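Since several of these lessons come down to settings you cannot change later, it is worth reading them back before trusting the space with data. This is a minimal sketch using the standard Storage module cmdlets and nothing specific to my setup. Note that IsPowerProtected tells Storage Spaces the hardware is protected against power loss, so it is safest on a box with a UPS.

```powershell
# Settings chosen at creation time; review them before migrating data.
Get-VirtualDisk |
    Format-List FriendlyName, ResiliencySettingName, NumberOfColumns,
                PhysicalDiskRedundancy, ProvisioningType, WriteCacheSize, Size

# Confirm whether each non-primordial pool is flagged as power protected.
Get-StoragePool -IsPrimordial $false |
    Format-Table FriendlyName, IsPowerProtected, HealthStatus -AutoSize
```

If WriteCacheSize or NumberOfColumns is not what you expected, the time to recreate the virtual disk is before the 4TB copy, not after.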

Performance of my Space
I am seeing 310 MB/s average when the writes are hitting the cache and not the disks. When they leave the cache and start hitting the disks I see about 105 MB/s average, which is normal for SATA II disks. HOWEVER, that only tells half the story. My disk latencies during transfers are horrific: anywhere from 600 to 20,000ms during sustained write operations. CPU utilization on the hypervisor was 6% average, and in the guest it was 29% average. Again, these are not great numbers.
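If you want to watch for the same latency spikes on your own pool, the built-in performance counters are one way to do it. This is a sketch, not how I collected the numbers above. It assumes an English-language Windows install (counter names are localized), and the sampling interval is arbitrary.

```powershell
# Sample per-disk write latency and queue depth every 5 seconds for one minute.
$counters = '\PhysicalDisk(*)\Avg. Disk sec/Write',
            '\PhysicalDisk(*)\Current Disk Queue Length'

Get-Counter -Counter $counters -SampleInterval 5 -MaxSamples 12 |
    ForEach-Object {
        # Each sample set holds one CounterSample per disk instance and counter.
        $_.CounterSamples |
            Select-Object Timestamp, InstanceName, Path, CookedValue
    }
```

The latency counter is reported in seconds, so a CookedValue of 0.6 corresponds to 600ms.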

Final Thoughts
So far I like Storage Spaces. It had a small learning curve, but honestly one that was a lot shallower than, say, Ceph’s. In typing this all up after multiple hours of sleep, and going back through my history to find old links, I do think I figured out where I was going wrong with storage tiering, and I may revisit that to see if I can get it working. I also unplugged a drive from the pool to see what the rebuild process was like. It was a little more manual than I would have liked (it didn’t just auto rebuild once it saw the drive was replaced), but the overall process took under 2 hours and didn’t cause a huge performance hit during that time. I’m excited to keep playing with Storage Spaces, and I really like what MS is doing with all the hyperconvergence in Server 2016.
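For reference, a manual rebuild along those lines might look something like the following. This is a sketch under assumptions and not a transcript of what I ran: the pool name is hypothetical, and it assumes the replacement disk is the only poolable disk on the system.

```powershell
# Hypothetical pool name; substitute your own.
$poolName = 'LabPool'

# Find pool members that are no longer healthy.
$failed = Get-StoragePool -FriendlyName $poolName |
    Get-PhysicalDisk |
    Where-Object { $_.OperationalStatus -ne 'OK' }

# Mark them retired so no new data is placed on them.
$failed | Set-PhysicalDisk -Usage Retired

# Add the replacement disk, then rebuild the affected spaces onto it.
Add-PhysicalDisk -StoragePoolFriendlyName $poolName `
    -PhysicalDisks (Get-PhysicalDisk -CanPool $true)
Get-StoragePool -FriendlyName $poolName |
    Get-VirtualDisk |
    Repair-VirtualDisk -AsJob

# Watch rebuild progress; remove the retired disk only after the jobs finish.
Get-StorageJob
# Remove-PhysicalDisk -StoragePoolFriendlyName $poolName -PhysicalDisks $failed
```

The Remove-PhysicalDisk line is left commented out on purpose, since pulling the old disk before the repair jobs complete is how you turn a degraded space into a dead one.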