Automatically clean up VMware snapshots using PowerCLI.

Something that every VMware admin who also uses Veeam has had to deal with once if not multiple times is Veeam not cleaning up snapshots properly which then leads to broken snapshot chains, leading to VMDK corruption, and finally leading to an admin crying into his / her bourbon realizing that “no the VM isn’t coming back up, no I can’t consolidate the snap chain and recover it, and no there haven’t been backups in $n days to recover from.” A process I’ve put into my own environment to prevent this is a simple PowerCLI script that looks for all snapshots over 24 hours old, and removes them. VMware recommends that you never have more than 3 snapshots in a chain and those should never be over 72 hours old from a performance standpoint. Personally I agree completely with that. Snapshots should only be used before making a big change so you can quickly roll back, not something that you create and then live off of.

This script requires that the VMware PowerCLI modules are installed on the system you run it from.

#################################################################
#   Remove all snapshots from vSphere from the last 24 Hours    #
#   Created by - Cameron Joyce                                  #
#   Last Modified - Jun 19 2017                                 #
#################################################################
# This script uses PowerCLI to remove all snapshots from virtual machines that are 24 hours old. 

# Load all VMware Modules, and set PowerCLI config.
Get-Module -ListAvailable VM* | Import-Module

# Connect to vSphere vCenter Server.
Try{
    connect-viserver -server your.vmware.server -user administrator@vsphere.local -Password Password
}
Catch{
    Write-Host "Failed Connecting to VSphere Server."
    Send-MailMessage -From "" -To "server@domain.com" -Subject "Unable to Connect to VSphere to clean snapshots" -Body `
    "The powershell script is unable to connect to host your.vmware.server. Please investigate." -SmtpServer "smtp.server.com"
    Break
}

# Variables
$date = get-date -f MMddyyyy
$logpath = "C:\Scripts\Script_Logs"

# Verify the log folder exists.
If(!(Test-Path $logpath)){
    Write-Host "Log path not found, creating folder."
    New-Item $logpath -Type Directory
}

# Get all snapshots older than 24 hours, remove them.
If((get-snapshot -vm *) -ne $null){
    $snapshotlist = get-snapshot -vm * | select VM, Name, SizeMB, @{Name="Age";Expression={((Get-Date)-$_.Created).Days}}
    Write-Host "Current Snapshots in Dallas vSphere"
    Write-Output $snapshotlist
    Write-Output "Snapshots existing before cleanup" | Out-File $logpath\Snapshots_$date.txt -Append
    Write-Output $snapshotlist | Out-File $logpath\Snapshots_$date.txt -Append
}

# Check to make sure that all snapshots have been cleaned up.
If((get-snapshot -vm *) -ne $null){
    get-snapshot -vm * | Where-Object {$_.Created -lt (Get-Date).AddDays(-1)} | Remove-Snapshot -Confirm:$false
    $snapshotlist = get-snapshot -vm * | select VM, Name, SizeMB, @{Name="Age";Expression={((Get-Date)-$_.Created).Days}}
    Write-Host "Current Snapshots in Dallas vSphere after cleanup"
    Write-Output $snapshotlist
    Write-Output "Snapshots existing after cleanup" | Out-File $logpath\Snapshots_$date.txt -Append
    Write-Output $snapshotlist | Out-File $logpath\Snapshots_$date.txt -Append
}
Else{
    Write-Output "No Snapshots to clean up." | Out-File $logpath\Snapshots_$date.txt -Append
}

# Send snapshot log to email.
$emailbody = (Get-Content $logpath\Snapshots_$date.txt | Out-String)
Send-MailMessage -From "server@domain.com" -To "user@domain.com.com" -Subject "Daily vSphere snapshot cleanup report" -Body $emailbody -SmtpServer "smtp.server.com"

# Exit VIM server session.
Try{
    disconnect-viserver -server your.vmware.server -Confirm:$false
}
Catch{
    Write-Host "Failed disconnecting from VSphere."
    Send-MailMessage -From "server@domain.com" -To "user@domain.com" -Subject "Disconnection from VSphere Failed" -Body `
    "The powershell script is unable to disconnect from VSphere. Please manually disconnect" -SmtpServer "smtp.server.com"
}

# Cleanup Snapshot logs older than 30 days.
gci -path $logpath -Recurse -Force | Where-Object {!$_.PSIsContainer -and $_.LastWriteTime -lt (Get-Date).AddDays(-30)} | Remove-Item -Force

There is more logic in here for sending email alerts than actual VMware commands, however when this is running automated from a PS job, it is super helpful to have all the emails.

Create AES secure passwords for use in PowerShell scripting.

Something that I’ve always wanted to get away from in my scripting is leaving passwords in plain text. It fails audits and is just generally insecure and needs to be avoided at all costs. A solution I’ve come up to deal with this so far is to generate a secure key and password hash using AES. Now of course the problem with this is that you still need to secure your keys as they can be used to decrypt your hash into plain text, however the same can be said for PGP or any other reversible encryption.

#################################################################
#   Generate AES Secured Password for Scriptng                  #
#   Created by - Cameron Joyce                                  #
#   Last Modified - Dec 25 2016                                 #
#################################################################
# This script is used to generate a secured AES key and password string for use with other automation. 

# Variables
$Key = New-Object Byte[] 32   # AES Key. Sizes for byte count are 16 (128) 24 (192) 32 (256).
$UnSecPass = Read-Host "Enter Password you wish to use"
$PassName = Read-Host "Enter a filename for the password file."
$SecPass = "$UnSecPass" | ConvertTo-SecureString -AsPlainText -Force
$PasswordFile = "$env:Userprofile\Downloads\$PassName.txt" # OutFile Path for encrypted password.
$KeyFile = "$env:Userprofile\Downloads\$PassName.AES.key" # Path to Generated AES Key.

# Create Random AES Key in length specified in $Key variable.
[Security.Cryptography.RNGCryptoServiceProvider]::Create().GetBytes($Key)

# Export Generated Key to File
$Key | out-file $KeyFile

# Combine Plaintext password with AES key to generate secure Password.
$SecPass | ConvertFrom-SecureString -key $Key | Out-File $PasswordFile

How do we use this in real life though? Well here is an excerpt from another script I wrote to install the SCCM agent using my credentials without being logged in.

# Create secure credentials.
$User = "domain\username"
$PasswordFile = "C:\password"
$KeyFile = "C:\key"
$key = Get-Content $KeyFile
$MyCredential = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $User, (Get-Content $PasswordFile | ConvertTo-SecureString -Key $key)

# Using the $MyCredential in practice.
Start-Process Powershell -ArgumentList "-File C:\Temp\SCCM_Agent_Install.ps1 -Verb runAs -Credential $MyCredential"

There is much to be improved on here, however it satisfied my audit requirements and makes me considerably more comfortable that I don’t have a folder full of .ps1 files on a server with my password in plain text.

Cleaning up Lync ADSI attributes for all users in Active Directory for Office 365 migration using PowerShell.

Working through the same migration as my last blog post I ran into a second issue. The client is currently running an on premise Lync 2010 server and they want to migrate all of their users to Skype for Business in Office 365. The users are just using Lync for basic IM, and aren’t using any of the conferencing, or SIP features. Traditionally the way you would move users is to follow the guidelines for a Lync 2010 Hybrid deployment (Microsoft KB explaining how to do this) however that is a lot of effort if all you want is your users in SFB and you don’t care about their data.

If you just license the users in the Office 365 portal you will notice that the license applies, however it doesn’t actually create the Skype user, and the Skype user isn’t able to sign into the Skype client. This is because they still have their on premise Lync Server attributes in their AD account. You will need to clear these out and re-run a Azure AD Sync Delta sync to allow the user accounts to create. Given that going through each user’s ADSI Attributes and clearing them by hand is super tedious, here is a simple PS Script to fix this for you.

#####################################################
#   Remove msRTCSIP Attributes from all users.      #
#   Created by - Cameron Joyce                      #
#   Last Modified - Mar 05 2017                     #
#####################################################
# This script will remove the msRTCSIP attributes from all users in ActiveDirectory. This is meant to be used in an
# Office 365 migration in which you have an on premise lync server, however do not plan to do a hybrid migration to
# migrate users. 

# Variables
$users = Get-ADUser -Filter *

# Foreach user in AD, set all attributes to $null
foreach($user in $users){
    $ldapDN = "LDAP://" + $user.distinguishedName
    $adUser = New-Object DirectoryServices.DirectoryEntry $ldapDN
    $adUser.PutEx(1, "msRTCSIP–DeploymentLocator", $null)
    $adUser.PutEx(1, "msRTCSIP-FederationEnabled", $null)
    $adUser.PutEx(1, "msRTCSIP-InternetAccessEnabled", $null)
    $adUser.PutEx(1, "msRTCSIP-Line", $null)
    $adUser.PutEx(1, "msRTCSIP-OptionFlag", $null)
    $adUser.PutEx(1, "msRTCSIP-PrimaryHomeServer", $null)
    $adUser.PutEx(1, "msRTCSIP–PrimaryUserAddress", $null)
    $adUser.PutEx(1, "msRTCSIP–UserEnabled", $null)
    $adUser.PutEx(1, "msRTCSIP-UserPolicies", $null)
    $adUser.SetInfo()
}

Diagnosing and fixing slow migration times to Office 365 from Exchange 2010.

I’m currently working on an Office 365 migration for a client and have been seeing extremely long migration times for mailboxes. We are talking 30+ hours to move a single 20GB mailbox. The client has a 100Mbps symmetrical fiber pipe, so we know that isn’t the issue, and while their hypervisor and SAN setup aren’t the highest performing we should be seeing better performance than that.

First thing we have to do is actually figure out how bad the situation is. To do so we use this “You had me at EHLO” blog post to figure out what is going on. You need to download this TechNet gallery script and save it locally. Next open up an administrative PowerShell window and change directory to the Directory the script you just downloaded and run the following commands (or if you want, dump them into PowerShell ISE and run them from there.

# Enable unsigned scripts.
Set-ExecutionPolicy Unrestricted

# Create the ProcessStats function.
# Notice the space between the two periods.
# This is important. It is Period space Period Slash.
. .\AnalyzeMoveRequestStats.ps1

# Connect to MSOL Service.
Connect-MsolService

# Set the variables
$moves = Get-MoveRequest | ?{$_.Status -ne 'queued'}
$stats = $moves | Get-MoveRequestStatistics –IncludeReport

# Generate your report.
ProcessStats -stats $stats -name ProcessedStats1

After running this, my results looked like this.

Name                            : ProcessedStats1
StartTime                       :
EndTime                         : 3/6/2017 5:29:27 PM
MigrationDuration               : 1 day(s) 04:08:33
MailboxCount                    : 61
TotalGBTransferred              : 302.20
PercentComplete                 : 92.01
MaxPerMoveTransferRateGBPerHour : 1.75
MinPerMoveTransferRateGBPerHour : 0.19
AvgPerMoveTransferRateGBPerHour : 1.05
MoveEfficiencyPercent           : 80.05
AverageSourceLatency            : 1,469.07
AverageDestinationLatency       : 929.00
IdleDuration                    : 214.40 %
SourceSideDuration              : 71.68 %
DestinationSideDuration         : 14.10 %
WordBreakingDuration            : 7.14 %
TransientFailureDurations       : 0.91 %
OverallStallDurations           : 0.82 %
ContentIndexingStalls           : 0.00 %
HighAvailabilityStalls          : 0.00 %
TargetCPUStalls                 : 0.77 %
SourceCPUStalls                 : 0.05 %
MailboxLockedStall              : 0.00 %
ProxyUnknownStall               : 0.00 %

So our problem is the on premise server causing the problem. We can see in both the AverageSourceLatency and the SourceSideDuration for the idle timer that they are significantly higher than the target. We also notice that we are only transferring 1.75GB per hour which is again horrible.

The resolution for all of this is to increase the MaxActiveMovesPerTargetMDB and MaxActiveMovesPerTargetServer settings from their default of 2, and 5 respectively to anything between 10 and 100. Personally I set mine to 20. Tony Redmond does an excellent job of explaining what these settings do, and how to modify them in his blog post here. Additionally he explains why Microsoft sets them so low to begin with. The TL;DR is that by setting these settings higher you use significantly more CPU and Disk IO on your CAS servers, which in smaller environments can cause disruptions in services to your users, and can also overwhelm your CAS to the point where it can’t keep up with the number of migrations it is running and they time out. All the articles and TechNet forum posts I’ve seen have suggested using a setting of 10 for these, which I agree is a good starting point, then you can tune your server from there. Another note on this is that you need to modify each CAS server in your environment and restart the Exchange Mailbox Replication service after modifying the config, on each server so that this will take effect.

Enabling users for ActiveSync based on group membership using Exchange Powershell.

I recently had a task where I was required to create a nightly task to enable or disable users’ ActiveSync access based on being a member of a group. I wrote a simple powershell script and tied it to a nightly Powershell Job to to run at midnight.


#####################################################
#   Disable ActiveSync for all users except Group   #
#   Created by - Cameron Joyce                      #
#   Last Modified - Feb 24 2017                     #
#####################################################
# This script will disable ActiveSync in Exchange for all users except those in a specified security group.

# Import Exchange Modules
Add-PSSnapin Microsoft.Exchange.Management.PowerShell.SnapIn;

# Variables
$AsMemeber = @(Get-DistributionGroupMember -Identity 'ActiveSync Users' | Select Name) # Insert all users from the ActiveSync Users group into an array.
$mailboxes = Get-Mailbox -ResultSize Unlimited # Get all Mailboxes in the exchange Orginization.

# For each mailbox check to see if the mailbox user is a member of the ActiveSync users group, if so enable OWA and AS. If not, disable it.
Foreach($Mailbox in $Mailboxes){
    $Ismember = $false # Set the variable to the default of off
    $Name = $mailbox.Name # Convert the property to a string value.
    If($AsMemeber -like "*$name*"){ # If the Name of the mailbox is found in the array of ActiveSync Users, set the variable from $false to $true.
        $Ismember = $true
    }
    If($ismember){ # If the member is part of the Array do the following
        Write-Host "$name is an ActiveSync user and is being enabled"
        Set-CASMailbox $MName –ActiveSyncEnabled $true
        $astatus = Get-CASMailbox $Name | Select-Object Name, ActiveSyncEnabled
        if($astatus -like "False"){
            Write-Host "Failure occured setting ActiveSync policy on the following mailbox"
            Write-Output $astatus
        }
        Set-CASMailbox $Name -OWAforDevicesEnabled $true
        $ostatus = Get-CASMailbox $Name| Select-Object Name, OWAforDevicesEnabled
        if($ostatus -like "False"){
            Write-Host "Failure occured setting OWA for Devices policy on the following mailbox"
            Write-Output $ostatus
        }
    }
    Else{ # If the mailbox is not a member of the Array do the following.
        Write-Host "$name is not an ActiveSync user and is being disabled"
        Set-CASMailbox $Name –ActiveSyncEnabled $false
        Set-CASMailbox $Name –OWAforDevicesEnabled $false
    }
}

Performance of Microsoft Storage Space 2016 on Dell PERC H700 RAID controller.

Before we get into this post let me be up front and say that this is a completely unsupported configuration by Microsoft. Much like UnRAID or ZFS, storage spaces wants direct access to the disk to work properly. You can force Storage spaces to work with RAID volumes, but if you have problems MS support will not assist. Your biggest issue will be handling failures, as Storage spaces will not be able to accurately predict SMART failures.

Now with that out of the way. Here is the setup for this. Dell T710 as the server, Perc H700 with 512MB BBWC, 2x Samsung 850 EVO + 2x 500 GB Seagate Constellation ES SATA drives in a Tiered mirror space, and 4 500GB Seagate Constellation ES + 4 1TB WD Black SATA disks in a Parity space with 30GB SSD Write Cache. I wanted to compare how drives configured on a RAID controller that had its own built in write cache would stack against my configuration.

We setup the test the same as the earlier test on my homelab server. 100GB Testfile from IOMeter and a 75% Read 512 and 75% Read 4k. I only tested against the host, not the VM, and we tested against both the Mirrored tiered drives, and the Parity array. Here are the results.

SERVER TYPE: Dell T710
CPU TYPE / NUMBER: Xeon x5560 x2
HOST TYPE: Server 2016
STORAGE TYPE / DISK NUMBER / RAID LEVEL: Storage Spaces Tiered Mirror
Test name Latency Avg iops Avg MBps cpu load
512 B; 75% Read; 0% random 0.12 8601 4 0%
4 KiB; 75% Read; 0% random 0.15 6742 26 0%
SERVER TYPE: Dell T710
CPU TYPE / NUMBER: Xeon x5560 x2
HOST TYPE: Server 2016
STORAGE TYPE / DISK NUMBER / RAID LEVEL: Storage Spaces Parity
Test name Latency Avg iops Avg MBps cpu load
512 B; 75% Read; 0% random 0.11 8638 4 0%
4 KiB; 75% Read; 0% random 0.16 6169 24 0%

When we compare these figures to my setup we see a couple things. Firstly the CPU load on the host is considerably less. I was seeing between 10-15% CPU utilization during my tests, because my CPU has less compute power than a single Xeon, let alone 2.Next we notice that on average my latency was lower. This is due to the fact that I am using NVMe Cache instead of just SATA SSD cache for my tiers. Lastly we look at average IOPS and throughput which again are significantly higher on my system because we are seeing the NVMe cache really help things out.

Conclusions:
A RAID controller with its own dedicated cache on the card, not only is an unsupported configuration, however also really isn’t a substitute for NVMe write cache. Additionally Parity spaces greatly benefit from more powerful CPUs in terms of overall system performance. Replacing my stock i7-920 with an i7-965 with a QPI Bus speed of 1×6.2k should help cut the CPU overhead down some, and is about a $75 upgrade these days.