Recover or Clone a Virtual Machine using Datrium Protection Snapshots.

In my last post I talked about how easy it was to set up application-consistent recovery points with Datrium DVX Protection groups.

In this post we will look at how to actually use these snapshots in the real world to recover or clone a VM, or a VM file.

Recover a Virtual Machine:

OK, so you get the phone call “my VM is not accessible,” or a patch install fails and you get the dreaded “Please do the needful and revert with urgency.” You could pull from a backup and wait hours for the restore to complete, OR you could restore a DVX snap and be back up in seconds! Let’s look at how to do this.

Recovering an entire protection group:

Maybe we have an entire group of servers that all got the same bad MS update, and all of them need to be rolled back to their state before the WSUS update. We start by powering off the VM guests. This will always be the first step in a VM restore with Datrium. A single file restore, guest file restore, or application restore is a different process. After powering off the VMs in the PG (protection group), we can select the PG and click Recover.

Screen Shot 2018-06-19 at 6.16.07 PM.png

The following window will pop up with the latest snap selected.

Screen Shot 2018-06-19 at 6.17.04 PM.png

If the latest isn’t the one you want to restore from, click Select another snapshot and you will be given a list of all snapshots for that PG across all DVX systems you have replicated to, including CloudDVX.

Click Restore, and you will see the restore operation start in the tasks pane. This usually takes about 30 seconds if all snapshots are on the on-premises DVX. If the snapshots are on remote DVX systems or in CloudDVX, the process could take longer because the data needs to be replicated between sites.

Once the restore operation is complete, power on the VMs and you’re good to go! It’s that simple!

Recover an individual VM:

There are two places you can recover an individual VM from: the VMs tab in the DVX plugin, or the Monitor > Datrium DVX > Protection tab on the individual VM. In this case we are going to use the Monitor > Datrium DVX > Protection route, because when I am restoring or snapping a single VM I am usually already on that VM in VMware rather than in the DVX plugin.

The VM I am selecting for this restore is a 1TB file server. Maybe CryptoLocker hit your main file share and you need to recover the whole thing.

The first step is always to shut down the VM, and then select your recovery point.

Once the VM is offline, the next step is to select the snapshot you want to recover from. In most cases this will be the latest, but it could be older if, for example, the encryption started earlier and you need to go back 2-3 hours. Then click VM Snapshot Actions. You will notice that you have a lot of options here, but for now we are just going to worry about Restore VM From Snapshot.

Screen Shot 2018-06-20 at 5.38.33 PM

The restore screen will pop up, and at this point you just need to click Restore. Side note: DVX takes a snapshot of the VM before the restore, so you’re able to restore or clone from the pre-restore state if needed.

Screen Shot 2018-06-20 at 5.39.02 PM

In the Protection tasks column you will see when your restore has completed. It usually takes under 30 seconds.

Screen Shot 2018-06-20 at 5.39.15 PM

Once the restore is complete, power on the VM and you’re good to go!

BUT WAIT! What if the snapshot you want is in your CloudDVX remote archive? Simple: open the Sources drop-down, select your CloudDVX instance or a remote replicated DVX, and follow the same process!

Screen Shot 2018-06-20 at 5.40.24 PM

Recovering a VM File:

Sometimes we don’t need to recover a whole VM; we just need to recover a VMX or VMDK that has been corrupted. We follow the same process as above, but use the VM Files in Snapshot option. We select the file we want to recover, and choose whether to clone or recover it. Cloning is useful if you are trying to recover a file back out of a disk, whereas recover is…well, a recovery.

Screen Shot 2018-06-20 at 5.56.32 PM.png

Cloning a Virtual Machine:

There are a few common reasons an administrator might want to clone a virtual machine. Maybe you want to clone a running VM to a development sandbox to test an update. Maybe you want to rapidly deploy a new virtual machine from a template. Let’s talk about how to do that.

If you want to clone a running VM, just select the VM, go to the Monitor > Datrium DVX tab, and click Clone.

Screen Shot 2018-06-20 at 6.18.18 PM

The following screen will pop up so you can name the cloned VM.

Screen Shot 2018-06-20 at 6.08.48 PM

If you want to clone from a snapshot, follow the same process as restoring, but click Clone from Snapshot instead of Restore VM From Snapshot.

Screen Shot 2018-06-20 at 6.08.23 PM

Once you have cloned your snapshot or VM, you will need to register the newly cloned VM with VMware.

Start by selecting your DVX datastore in vCenter, click Actions, and then Register VM.

Screen Shot 2018-06-20 at 6.11.33 PM

Find the folder that matches the cloned VM name, select the VMX, and click OK.

Screen Shot 2018-06-20 at 6.12.02 PM

Complete the registration wizard to select the host and folder to register the VM to. Once finished, you can find your VM in your vCenter inventory and power it on.

Screen Shot 2018-06-20 at 6.13.02 PM

So there we have it, multiple snapshot recovery or clone options in Datrium DVX. In the next segment we will discuss application restore options.

Configure Datrium protection groups for application-consistent recovery points.

Datrium has had snapshotting since the DVX 2.0 release, which dropped in April of 2017. However, one of the flaws I saw with it initially was that there was no application consistency. The DVX 3.0 release in August of the same year added a VSS writer, which solves this issue.

So, you have a DVX, you want to use the DVX snapshots to protect your VMs from failure, and maybe even replicate them off site. Let’s talk about how we do that. In this example we’re going to use vCenter to do our protection setup.

First we need to log into the vCenter web client and select the DVX plugin. Note that Datrium does not support the VMware fat client.

Screen Shot 2018-06-19 at 3.15.19 PM

Select the Protection Tab.

Screen Shot 2018-06-19 at 3.16.22 PM

Select Create and the new protection group tab will come up. In this example we will do a couple of things. First, I’m going to create a dynamic pattern search for all VMs whose names match my CTJ naming pattern. Then I’m going to exclude any VM whose name matches my template or clone patterns. In my environment, I use “*-template” and “*-clone” to mark VMs that are templates for rapid deployment or clones of production for testing. I don’t need to snapshot either of these in this schedule. Hover over VM pattern tips to learn how to string multiple patterns together, how to exclude a pattern, or how to wildcard a single character instead of all following characters.
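
To make the include / exclude idea concrete, here is a minimal sketch using PowerShell's own -like wildcards. It only mimics the selection logic and is not the DVX pattern syntax (hover over VM pattern tips for that). The CTJ* include pattern and every VM name other than CTJSQLP01 / 02 are assumptions made up for illustration.

```powershell
# Hypothetical inventory; only CTJSQLP01 and CTJSQLP02 are names from this post.
$vmNames = 'CTJSQLP01', 'CTJSQLP02', 'CTJWEB01', 'CTJWEB01-clone', 'CTJ2016-template', 'OTHERVM01'

# Assumed include pattern, plus the two exclude suffixes described above.
$include = 'CTJ*'
$exclude = '*-template', '*-clone'

# Keep a VM when it matches the include pattern and none of the exclude patterns.
$protected = $vmNames | Where-Object {
    $name = $_
    ($name -like $include) -and -not ($exclude | Where-Object { $name -like $_ })
}

$protected
```

With this made-up inventory, the clone and the template drop out even though their names start with CTJ, which is the same behavior I want from the protection group.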

Screen Shot 2018-06-19 at 3.17.56 PM

This screenshot shows the result of this naming-convention search: 6 VMs that I am responsible for. But notice CTJSQLP01 / 02. These are production SQL servers, and a crash-consistent snapshot would NOT be acceptable. So let’s look at how to configure them properly for VSS.

Screen Shot 2018-06-19 at 3.18.07 PM

Under Dynamic pattern, there is Select Individual VMs. Click this box, and then search for the VMs you want to protect with VSS. Once they are selected, they move to the Selected column, where you see the Use VSS option. You will notice that SQL01 has a nice green check mark, while SQL02 has a yellow exclamation mark because the VSS agent is not installed there. Follow the instructions in my previous post, Installing Datrium VSS Writer using Powershell, to install the VSS writer automatically, or manually download the VSS writer from the management IP of your DVX and run the MSI package.

Screen Shot 2018-06-19 at 3.24.55 PM

After selecting the VSS VMs, click Continue and you will see the protection schedule page. Here we can set the snapshot schedules, their retention, and replication if we want it. Notice that the retention on the replication can be different from the local DVX snapshot retention. This way you can, for instance, send the snapshot to a CloudDVX instance running in AWS for long-term archive, but keep the on-premises snap just long enough to serve as an “oops” button. Also notice that replication can be a 1:1 or a 1:Many option, which helps if you have multiple DR sites or, as in my case, both DR and CloudDVX for multiple retention and restore possibilities.

Screen Shot 2018-06-19 at 3.25.35 PM

You can learn more about Datrium Data Cloud Foundation or CloudDVX on Datrium’s website and determine how this could be beneficial in your own DVX environment.

In my next post we will look at recovering VMs from a snapshot, cloning VMs from a snapshot, and recovering individual VM files from a snapshot.

 

Installing Datrium VSS Writer using Powershell.

With Datrium’s DVX 3.0 release we added a VSS writer to the mix. The idea is that not only can you have storage-based snapshots on a per-VM basis, but now they can be application consistent! The one drawback is that you have to manually install the MSI. This is changing in future releases, but for now we have to automate the process ourselves.

Clint Wyckoff is currently writing a SQL recovery paper for Datrium and asked me to peer review it. One of the things we decided to add was a PowerShell automated deployment script, which can be found below!

#################################################################
#   Push Datrium VSS Provider to a VM.                          #
#   Created by - Datrium - Cameron Joyce & Clint Wyckoff        #
#   Last Modified - Jun 13 2018                                 #
#################################################################
# This script will install the SQL VSS package on servers which require the Datrium VSS Agent.
# You will need to be able to reach all SQL servers and the management IP of the Datrium Data Node from the
# system you run this script on. It is assumed that both local and remote hosts are running PowerShell 3.0 or newer.

# Variables
$vm = Read-Host "What is the FQDN or IP of the VM to deploy Datrium VSS Agent to?"
$damgmtfloat = Read-Host "What is the IP address of the Datrium Data Node?"
$dapw = Read-Host "What is the Admin password to the Datrium Data Node?"
$date = Get-Date -Format "yyyy-MM-dd" # Used in the name of the failure log written below

# Download a copy of the VSS MSI from the DataNode (skipped if C:\Temp\davss.msi already exists)
Try{
    If(!(Test-Path C:\Temp\davss.msi)){
        Invoke-WebRequest "https://$damgmtfloat/static/Datrium-VSS-Provider-1.1.0.0.msi" -OutFile C:\Temp\davss.msi -ErrorAction Stop
    }
}
Catch{
    # ${damgmtfloat} is wrapped in braces so the ":7443" is not parsed as part of the variable name.
    Write-Warning "Failed downloading the MSI package. Manually download from https://${damgmtfloat}:7443 and copy to the C:\Temp folder of $vm."
    Return
}

# Attempt to connect to each VM and install the MSI package.
If(Test-Connection -ComputerName $vm -count 1 -Quiet){
    Write-Host "Checking WMI on $vm"
}

# Try / Catch block for WMI errors. A client that passes Test-Connection may not have PSRemoting enabled and will error. This will handle that.
Try{
    $ErrorActionPreference = "Stop"
    Invoke-Command -ComputerName $vm -ScriptBlock {If(!(Test-Path "C:\Temp")){New-Item "C:\Temp" -Type Directory}}
}
Catch [System.Management.Automation.Remoting.PSRemotingTransportException]{
    Write-Warning "$vm failed connecting to WMI."
    Write-Output "$vm Failed on WMI" | Out-File "C:\Temp\failed_$date.txt" -Append
    # Stop here; there is no enclosing loop for Continue to act on.
    Return
}
Finally{
    $ErrorActionPreference = "Continue"
}

# Install DA VSS
Write-Host "Checking: \\$vm\c$\Temp exists" -ForegroundColor Yellow
If(!(Test-Path "\\$vm\C$\Temp\")){
    Write-Host "\\$vm\c$\Temp DOES NOT EXIST ->> CREATING DIRECTORY \\$vm\c$\Temp" -ForegroundColor Yellow
    New-Item "\\$vm\c$\Temp" -ItemType Directory
}
Else{
    Write-Host "C:\Temp DOES EXIST" -ForegroundColor Yellow
}

Write-Host "Copying DAVSS.msi to \\$vm\c$\Temp" -ForegroundColor Yellow
Copy-Item "C:\temp\davss.msi" -Destination "\\$vm\C$\Temp\davss.msi" -Force

Write-Host "Modifying UAC for installation" -ForegroundColor Yellow
Invoke-Command -ComputerName $vm -ScriptBlock {& cmd.exe /c "C:\Windows\System32\reg.exe ADD HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System /v EnableLUA /t REG_DWORD /d 0 /f"}

Write-Host "Installing MSI package" -ForegroundColor Yellow
Invoke-Command -ComputerName $vm -ArgumentList $damgmtfloat, $dapw -ScriptBlock {& cmd.exe /c "C:\Temp\davss.msi /quiet NETSHELF_IP=$($args[0]) ADMIN_PASSWD=$($args[1])"}

Write-Host "Installation of Datrium VSS Agent SUCCESSFUL on $VM" -ForegroundColor Green

This will be on the GetSysadminBlog GitHub repo, as well as being on Datrium’s official resources.

Datrium DVX and CVE-2017-5753, CVE-2017-5715 (Spectre), CVE-2017-5754 (Meltdown)

Given the current concerns about the latest security flaws in Intel and AMD processors, Datrium thought it would be good to check whether there were any issues they needed to address. The short answer is no: the DVX software and the DataNode are unaffected. Normal VMware or 3rd party server hardware patching still needs to be done, but no DVX patches are needed.

Datrium has investigated impacts and mitigation options for Spectre, Meltdown, and related issues.
Summary
● Datrium DVX software is not directly affected and no action is required.
● Datrium Data Nodes are not affected and no action is required.
● 3rd party servers being used as Compute Nodes likely need to be patched with updated BIOS / firmware.
● Any Compute Nodes (Datrium or 3rd party) running ESXi should be patched with VMware fixes.
● Guest OS instances will likely need to be patched.
3rd Party Compute Nodes
3rd party servers that are being used in a DVX system should be updated based on recommendations from the server vendor. Please contact the server vendor directly for details.
VMware Patches
All Compute Nodes (both Datrium and 3rd Party) in any DVX that are running ESXi should update to the appropriate patches from VMware. For more information, please see the following link:
Guest OS
It is very likely that all guest OS instances running in DVX will need to be patched. Please contact the OS vendor directly for details and recommendations.
Performance Impacts
At this time we are unaware of any significant impact on performance from BIOS and VMware patches. However, based on reported behavior of the various fixes from VMware and server vendors, it is possible that some workloads will exhibit some non-trivial changes in performance related to guest OS patches and updates. We recommend that you work closely with your guest OS vendors to ensure performance concerns are addressed.

VMware Flash Read Cache (VFRC) Performance. Does it really make a difference?

I’ve been generally disappointed with the performance of my MS S2D storage array due to the way MS lays writes down onto parity pools. All the SSD caching and NVMe write cache just wasn’t doing enough, and it was time to go a different route. I’ve written about my Datrium install a few times now. For those who are unfamiliar with the solution, the short version is that Datrium takes host-populated flash and builds a dedicated read/write pool on the host. It does an extremely good job of this, and if you would like to know more, their website (https://www.datrium.com/open-convergence-architecture) covers it. I like that I am able to populate my host with “cheap” SSD for performance, and then dump the data to cheaper spinning disk for cold storage.

So how can I do this at home without spending a bunch of money? My first thought was Dell’s CacheCade, which I have on the PERC H700 I used for this test. The problem with CacheCade, however, is that it has a hard limitation of 512GB per host, reads above 64KB are not cached, and there is no support for write cache. I then thought about VSAN and using the flash caching in that product; however, given my lab and limited resources, VSAN would be a no-go as well.

Finally I came to VFRC. Like CacheCade it is only a read-side cache; however, unlike CacheCade it will cache all reads, and it can generate up to a 32TB pool with a maximum of 400GB of cache per VMDK. The downsides of VFRC are that you have to enable it per VMDK, and you have to set the cache block size to the average of your workload’s IO size, otherwise you will degrade the performance of the cache. That said, it is extremely easy to get set up and running, and it is included in your ESXi Enterprise Plus licensing, so you don’t get hit with additional cost for VSAN licensing (though VSAN cache tiering would provide a significantly better performance experience).
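
If you know a volume's throughput and IOPS, the average IO size is simply one divided by the other, which gives you a starting point for the cache block size. A minimal sketch of that arithmetic follows; the function name and the sample numbers are made up for illustration, and you would plug in figures from your own monitoring.

```powershell
# Hypothetical helper: average IO size in KB from observed throughput and IOPS.
function Get-AverageIoSizeKB {
    param(
        [double]$ThroughputMBps,  # observed throughput in MB/s
        [double]$Iops             # observed IO operations per second
    )
    # MB/s -> KB/s, then divide by operations per second to get KB per operation.
    [math]::Round($ThroughputMBps * 1024 / $Iops, 1)
}

# Made-up sample workload: 50 MB/s at 3200 IOPS.
$averageKB = Get-AverageIoSizeKB -ThroughputMBps 50 -Iops 3200
"Average IO size: $averageKB KB"
```

For this sample workload the average works out to 16 KB, so that would be the block size to try first.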

Let’s look at the test VM. The server is a Dell R510 w/ 2x Xeon E5640 CPUs, 64GB of RAM, and the PERC H700 1GB BBWC card. I have 3 WD RE4 2TB drives + 1 Samsung 850 Pro 256GB SSD. The server is running ESXi 6.5 U1, and the VM is a 4 vCPU 6GB vRAM PVSCSI controller VM.

First up let’s see what this machine does with a VM running directly on the OCZ RD400 NVMe AIC. This is a Thin provisioned .vmdk living on a VMFS v6 volume.

Screen Shot 2017-08-20 at 7.18.32 PM

This test shows that an 8k 70/30 random read/write test against a 5GB file with 30 minutes of sustained IO generated 165MBps of throughput and 21,193 IOPS, and averaged 0.3ms of read latency. This is fantastic and exactly what I would expect of an NVMe disk.

So now let’s run exactly the same test against my VFRC volume. This is a RAID 5 of 3x SATA6 2TB Western Digital RE4 drives + 50GB of VFRC at an 8k block size.

Screen Shot 2017-08-20 at 7.19.31 PM

For a SATA RAID5 that’s not bad. 3.3MBps, 429 IOPS, and an average read latency of 18ms. This isn’t amazing performance by any means, but again for a 3 disk SATA array this is pretty fantastic.

Let’s compare that to a disk with no vFlash Cache.

Screen Shot 2017-08-20 at 7.19.45 PM

1.87MBps, 239 IOPS, and 33ms average latency. The VFRC disk delivers roughly 1.8x the performance of the traditional disk running on the same array.
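
As a quick sanity check on the figures above, here is a short sketch that recomputes throughput from IOPS at the 8k test block size and works out how much VFRC actually helped. The numbers are copied from the three runs above; the variable names are just mine.

```powershell
# IOPS figures from the three test runs; 8 KB is the test block size.
$blockKB = 8
$runs = @(
    [pscustomobject]@{ Name = 'NVMe';          IOPS = 21193 }
    [pscustomobject]@{ Name = 'RAID 5 + VFRC'; IOPS = 429 }
    [pscustomobject]@{ Name = 'RAID 5 only';   IOPS = 239 }
)

# Throughput implied by IOPS x block size, in MB/s.
foreach ($run in $runs) {
    $mbps = [math]::Round($run.IOPS * $blockKB / 1024, 2)
    "{0,-14} {1,6} IOPS = {2,7} MB/s" -f $run.Name, $run.IOPS, $mbps
}

# How much better the VFRC volume did than the uncached volume.
$iopsGain    = [math]::Round(429 / 239, 2)   # IOPS ratio
$latencyGain = [math]::Round(33 / 18, 2)     # latency ratio (uncached / cached)
"IOPS gain: ${iopsGain}x, latency gain: ${latencyGain}x"
```

The implied throughput lines up with what the tool reported for each run, and both ratios land at about 1.8x.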

So as you can see VFRC is a great option for those who have traditional storage, either local or SAN, where a flash tier would normally not be available.

Finally, let’s look at the same workload on my Datrium cluster.

Screen Shot 2017-08-20 at 7.20.24 PM

As you can see, this is not going to be a replacement for having dedicated flash resources, and Datrium caches much more efficiently than VMware does, however if you need a quick “Cheap” fix, VFRC is a great answer.

Microsoft Exchange: Powershell Script to add a SMTP Address to all users.

Recently, during an Exchange to Office 365 migration, a bunch of the mailboxes were not part of the default SMTP address policy and therefore didn’t get the “@domain.mail.onmicrosoft.com” address.

Instead of trying to update them all manually, I just wrote a quick script to add the address with their default alias so we could get them moved to Office 365.

 

#################################################
#   Add new SMTP address to all mailboxes.      #
#   Created by - Cameron Joyce                  #
#   Last Modified - Jun 04 2017                 #
#################################################
# This script will add a new SMTP address for each mailbox on a specified Exchange server.
# This script must be run from the Exchange Management Shell for Exchange 2010 - 2016

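# Get-Mailbox returns at most 1000 mailboxes by default, so ask for all of them.
$mailboxes = Get-Mailbox -Server servername -ResultSize Unlimited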

Foreach ($mailbox in $mailboxes){
    $name = $mailbox.alias
    Set-Mailbox "$name" -EmailAddresses @{add="$name@domain.com"}
}
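
Before changing every mailbox, it can be worth previewing which ones are actually missing the address. The sketch below does that with stand-in objects so it runs anywhere; the two sample mailboxes are made up, and in the Exchange Management Shell the list would come from Get-Mailbox instead.

```powershell
# Stand-in mailboxes (hypothetical); real ones would come from Get-Mailbox.
$mailboxes = @(
    [pscustomobject]@{ Alias = 'jsmith'; EmailAddresses = @('SMTP:jsmith@domain.com') }
    [pscustomobject]@{ Alias = 'adoe';   EmailAddresses = @('SMTP:adoe@domain.com', 'smtp:adoe@domain.mail.onmicrosoft.com') }
)
$routingDomain = 'domain.mail.onmicrosoft.com'

# Collect the address for each mailbox that does not already have it.
# -notcontains compares case-insensitively, so SMTP: and smtp: prefixes both match.
$toAdd = foreach ($mailbox in $mailboxes) {
    $address = "smtp:$($mailbox.Alias)@$routingDomain"
    if ($mailbox.EmailAddresses -notcontains $address) { $address }
}

$toAdd
```

Only the mailboxes listed in the output would need the Set-Mailbox call from the script above.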

VMware vSphere: A general system error occurred: Connection refused When starting Virtual Machines.

I had an issue the other day with starting a VM. DRS would place it successfully, but the power-on would fail with “A general system error occurred: Connection refused”.

Screen Shot 2017-06-02 at 1.10.34 PM

Googling tells me that the culprit is the vmware-vpx-workflow service being stopped. I SSHed into my VCSA and sure enough found that the service was indeed stopped.

Screen Shot 2017-06-02 at 1.15.32 PM.png

So I attempted to start the service, and that failed.

Screen Shot 2017-06-02 at 1.15.43 PM.png

What the hell? Tailing all the logs in the /var/log/vmware/workflow folder didn’t turn up anything. However, after re-reading the errors during start I realized…maybe it’s a disk space issue.

Screen Shot 2017-06-02 at 1.16.17 PM.png

Sure enough, our log disk was full. I grew the log disk, ran the autogrow command in VMware to resize the disks in the VCSA, restarted the services, and voilà!

Screen Shot 2017-06-02 at 1.16.28 PM.png

After updating the disk size, we were all set and I was able to start VMs without issues.