VMware Paravirtual SCSI adapter: Is it really that much faster?

I asked myself the same question after reading a best practice guide from Datrium that suggested using the VMware Paravirtual SCSI (PVSCSI) controller instead of the LSI SAS controller that VMware selects by default when you create a Windows VM.

Out of curiosity I spun up a new Server 2016 VM with 4 cores, 8 GB of RAM, and a 100 GB drive, hosted on my Datrium storage, to find out how much of a difference there really was.

I ran this test during a normal production workload and used Microsoft DiskSpd with a 16K I/O size (my current average for my app servers) to see what results we would get. The specific command I used was:

diskspd.exe -b16K -d1800 -h -L -o2 -t4 -r -w50 -c10G C:\io.dat
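
For reference, and assuming the stock DiskSpd flag meanings, that command breaks down roughly like this:

# -b16K   16 KiB I/O size
# -d1800  1800 second (30 minute) test duration
# -h      disable software caching and hardware write caching (write-through)
# -L      capture latency statistics
# -o2     2 outstanding I/Os per thread
# -t4     4 worker threads against the target
# -r      random I/O
# -w50    50% writes (a 50/50 read/write mix)
# -c10G   create a 10 GiB test file (C:\io.dat)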

The first run on the VMware LSI SAS controller resulted in this.

Command Line: C:\Users\cjoyce_admin\Downloads\Diskspd-v2.0.17\amd64fre\diskspd.exe -b16K -d1800 -h -L -o2 -t4 -r -w50 -c10G c:\io.dat

Input parameters:

timespan: 1
-------------
duration: 1800s
warm up time: 5s
cool down time: 0s
measuring latency
random seed: 0
path: 'c:\io.dat'
think time: 0ms
burst size: 0
software cache disabled
hardware write cache disabled, writethrough on
performing mix test (read/write ratio: 50/50)
block size: 16384
using random I/O (alignment: 16384)
number of outstanding I/O operations: 2
thread stride size: 0
threads per file: 4
using I/O Completion Ports
IO priority: normal

Results for timespan 1:
*******************************************************************************

actual test time: 1800.00s
thread count: 4
proc count: 4

CPU | Usage | User | Kernel | Idle
-------------------------------------------
0| 8.35%| 1.84%| 6.50%| 91.65%
1| 8.38%| 1.89%| 6.48%| 91.62%
2| 7.78%| 1.79%| 5.99%| 92.22%
3| 7.39%| 1.60%| 5.79%| 92.61%
-------------------------------------------
avg.| 7.97%| 1.78%| 6.19%| 92.03%

Total IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 15150776320 | 924730 | 8.03 | 513.74 | 3.888 | 3.175 | c:\io.dat (10240MB)
1 | 15089106944 | 920966 | 7.99 | 511.65 | 3.904 | 3.289 | c:\io.dat (10240MB)
2 | 15108947968 | 922177 | 8.00 | 512.32 | 3.899 | 3.140 | c:\io.dat (10240MB)
3 | 15109013504 | 922181 | 8.01 | 512.32 | 3.898 | 3.086 | c:\io.dat (10240MB)
-----------------------------------------------------------------------------------------------------
total: 60457844736 | 3690054 | 32.03 | 2050.03 | 3.897 | 3.173

Read IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 7574110208 | 462287 | 4.01 | 256.83 | 3.274 | 2.741 | c:\io.dat (10240MB)
1 | 7539032064 | 460146 | 3.99 | 255.64 | 3.297 | 2.966 | c:\io.dat (10240MB)
2 | 7562526720 | 461580 | 4.01 | 256.43 | 3.297 | 2.861 | c:\io.dat (10240MB)
3 | 7543046144 | 460391 | 4.00 | 255.77 | 3.293 | 2.613 | c:\io.dat (10240MB)
-----------------------------------------------------------------------------------------------------
total: 30218715136 | 1844404 | 16.01 | 1024.67 | 3.290 | 2.798

Write IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 7576666112 | 462443 | 4.01 | 256.91 | 4.501 | 3.448 | c:\io.dat (10240MB)
1 | 7550074880 | 460820 | 4.00 | 256.01 | 4.510 | 3.479 | c:\io.dat (10240MB)
2 | 7546421248 | 460597 | 4.00 | 255.89 | 4.501 | 3.289 | c:\io.dat (10240MB)
3 | 7565967360 | 461790 | 4.01 | 256.55 | 4.503 | 3.389 | c:\io.dat (10240MB)
-----------------------------------------------------------------------------------------------------
total: 30239129600 | 1845650 | 16.02 | 1025.36 | 4.504 | 3.402
%-ile | Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
min | 0.000 | 0.000 | 0.000
25th | 1.360 | 2.258 | 1.709
50th | 2.818 | 3.885 | 3.269
75th | 4.481 | 6.093 | 5.443
90th | 6.259 | 8.370 | 7.195
95th | 7.163 | 9.928 | 8.987
99th | 10.090 | 13.425 | 12.593
3-nines | 23.523 | 30.284 | 27.785
4-nines | 47.191 | 52.535 | 49.878
5-nines | 190.339 | 161.402 | 190.339
6-nines | 534.581 | 534.289 | 534.289
7-nines | 545.593 | 535.040 | 545.593
8-nines | 545.593 | 535.040 | 545.593
9-nines | 545.593 | 535.040 | 545.593
max | 545.593 | 535.040 | 545.593

Overall not terrible. Now let's look at what we get when we replace the LSI SAS controller with a PVSCSI controller.

Input parameters:

timespan: 1
-------------
duration: 1800s
warm up time: 5s
cool down time: 0s
measuring latency
random seed: 0
path: 'c:\io.dat'
think time: 0ms
burst size: 0
software cache disabled
hardware write cache disabled, writethrough on
performing mix test (read/write ratio: 50/50)
block size: 16384
using random I/O (alignment: 16384)
number of outstanding I/O operations: 2
thread stride size: 0
threads per file: 4
using I/O Completion Ports
IO priority: normal

Results for timespan 1:
*******************************************************************************

actual test time: 1800.00s
thread count: 4
proc count: 4

CPU | Usage | User | Kernel | Idle
-------------------------------------------
0| 7.37%| 1.53%| 5.84%| 92.63%
1| 7.02%| 1.40%| 5.62%| 92.98%
2| 6.35%| 1.25%| 5.10%| 93.65%
3| 6.04%| 1.22%| 4.82%| 93.96%
-------------------------------------------
avg.| 6.70%| 1.35%| 5.35%| 93.30%

Total IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 15667019776 | 956239 | 8.30 | 531.24 | 3.760 | 2.938 | c:\io.dat (10240MB)
1 | 15743369216 | 960899 | 8.34 | 533.83 | 3.741 | 3.011 | c:\io.dat (10240MB)
2 | 15789637632 | 963723 | 8.37 | 535.40 | 3.730 | 2.841 | c:\io.dat (10240MB)
3 | 15788425216 | 963649 | 8.36 | 535.36 | 3.731 | 2.914 | c:\io.dat (10240MB)
-----------------------------------------------------------------------------------------------------
total: 62988451840 | 3844510 | 33.37 | 2135.84 | 3.740 | 2.926

Read IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 7831814144 | 478016 | 4.15 | 265.56 | 2.660 | 2.405 | c:\io.dat (10240MB)
1 | 7862943744 | 479916 | 4.17 | 266.62 | 2.640 | 2.538 | c:\io.dat (10240MB)
2 | 7904346112 | 482443 | 4.19 | 268.02 | 2.632 | 2.247 | c:\io.dat (10240MB)
3 | 7881277440 | 481035 | 4.18 | 267.24 | 2.631 | 2.557 | c:\io.dat (10240MB)
-----------------------------------------------------------------------------------------------------
total: 31480381440 | 1921410 | 16.68 | 1067.45 | 2.641 | 2.440

Write IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 7835205632 | 478223 | 4.15 | 265.68 | 4.859 | 3.010 | c:\io.dat (10240MB)
1 | 7880425472 | 480983 | 4.18 | 267.21 | 4.840 | 3.045 | c:\io.dat (10240MB)
2 | 7885291520 | 481280 | 4.18 | 267.38 | 4.831 | 2.946 | c:\io.dat (10240MB)
3 | 7907147776 | 482614 | 4.19 | 268.12 | 4.827 | 2.833 | c:\io.dat (10240MB)
-----------------------------------------------------------------------------------------------------
total: 31508070400 | 1923100 | 16.69 | 1068.39 | 4.839 | 2.959
%-ile | Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
min | 0.000 | 0.000 | 0.000
25th | 1.189 | 2.947 | 1.810
50th | 1.868 | 4.126 | 3.120
75th | 3.536 | 6.037 | 4.971
90th | 5.392 | 8.026 | 6.924
95th | 6.269 | 9.628 | 8.417
99th | 9.446 | 13.234 | 12.021
3-nines | 22.655 | 32.422 | 28.825
4-nines | 45.679 | 50.249 | 48.554
5-nines | 158.326 | 159.371 | 159.371
6-nines | 475.470 | 427.329 | 427.329
7-nines | 475.711 | 427.338 | 475.711
8-nines | 475.711 | 427.338 | 475.711
9-nines | 475.711 | 427.338 | 475.711
max | 475.711 | 427.338 | 475.711

So overall we see roughly a 4% increase in throughput (2,135.84 vs. 2,050.03 total IOPS, 33.37 vs. 32.03 MB/s), with average latency dropping from 3.90 ms to 3.74 ms. Not groundbreaking numbers, however if you’re trying to squeeze every last drop of performance out of your VMs this could be a big step in the right direction.

Speaking of squeezing every last drop, let's see what happens when we test against a ReFS formatted disk.
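
For context, here is a minimal PowerShell sketch of how a fresh data disk could be brought online and formatted as ReFS on Server 2016. The disk number (1) and the volume label are assumptions on my part; only the E: drive letter matches the test below.

# Disk number 1 and the label are assumptions; E: matches the test target below
Get-Disk -Number 1 | Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -UseMaximumSize -DriveLetter E |
    Format-Volume -FileSystem ReFS -NewFileSystemLabel "Data"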

Command Line: C:\Users\cjoyce_admin\Downloads\Diskspd-v2.0.17\amd64fre\diskspd.exe -b16K -d1800 -h -L -o2 -t4 -r -w50 -c10G E:\io.dat

Input parameters:

timespan: 1
-------------
duration: 1800s
warm up time: 5s
cool down time: 0s
measuring latency
random seed: 0
path: 'E:\io.dat'
think time: 0ms
burst size: 0
software cache disabled
hardware write cache disabled, writethrough on
performing mix test (read/write ratio: 50/50)
block size: 16384
using random I/O (alignment: 16384)
number of outstanding I/O operations: 2
thread stride size: 0
threads per file: 4
using I/O Completion Ports
IO priority: normal

Results for timespan 1:
*******************************************************************************

actual test time: 1800.02s
thread count: 4
proc count: 4

CPU | Usage | User | Kernel | Idle
-------------------------------------------
0| 8.65%| 1.62%| 7.03%| 91.35%
1| 8.69%| 1.49%| 7.20%| 91.31%
2| 7.83%| 1.35%| 6.47%| 92.17%
3| 7.43%| 1.36%| 6.07%| 92.57%
-------------------------------------------
avg.| 8.15%| 1.46%| 6.69%| 91.85%

Total IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 18047041536 | 1101504 | 9.56 | 611.94 | 3.263 | 2.708 | E:\io.dat (10240MB)
1 | 18078842880 | 1103445 | 9.58 | 613.02 | 3.258 | 3.004 | E:\io.dat (10240MB)
2 | 18066751488 | 1102707 | 9.57 | 612.61 | 3.260 | 2.712 | E:\io.dat (10240MB)
3 | 18132910080 | 1106745 | 9.61 | 614.85 | 3.248 | 2.727 | E:\io.dat (10240MB)
-----------------------------------------------------------------------------------------------------
total: 72325545984 | 4414401 | 38.32 | 2452.42 | 3.257 | 2.791

Read IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 9020080128 | 550542 | 4.78 | 305.85 | 2.762 | 2.399 | E:\io.dat (10240MB)
1 | 9030025216 | 551149 | 4.78 | 306.19 | 2.760 | 2.927 | E:\io.dat (10240MB)
2 | 9041592320 | 551855 | 4.79 | 306.58 | 2.759 | 2.342 | E:\io.dat (10240MB)
3 | 9050865664 | 552421 | 4.80 | 306.90 | 2.752 | 2.479 | E:\io.dat (10240MB)
-----------------------------------------------------------------------------------------------------
total: 36142563328 | 2205967 | 19.15 | 1225.53 | 2.758 | 2.547

Write IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 9026961408 | 550962 | 4.78 | 306.09 | 3.764 | 2.899 | E:\io.dat (10240MB)
1 | 9048817664 | 552296 | 4.79 | 306.83 | 3.754 | 2.998 | E:\io.dat (10240MB)
2 | 9025159168 | 550852 | 4.78 | 306.03 | 3.762 | 2.954 | E:\io.dat (10240MB)
3 | 9082044416 | 554324 | 4.81 | 307.96 | 3.742 | 2.870 | E:\io.dat (10240MB)
-----------------------------------------------------------------------------------------------------
total: 36182982656 | 2208434 | 19.17 | 1226.90 | 3.756 | 2.931
%-ile | Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
min | 0.267 | 0.297 | 0.267
25th | 1.252 | 1.773 | 1.403
50th | 2.019 | 3.097 | 2.618
75th | 3.724 | 5.038 | 4.275
90th | 5.581 | 6.998 | 6.240
95th | 6.395 | 8.584 | 7.525
99th | 9.641 | 12.213 | 11.021
3-nines | 20.505 | 26.232 | 23.305
4-nines | 42.971 | 45.559 | 44.280
5-nines | 238.498 | 175.573 | 204.921
6-nines | 502.382 | 359.149 | 435.862
7-nines | 547.128 | 547.124 | 547.128
8-nines | 547.128 | 547.124 | 547.128
9-nines | 547.128 | 547.124 | 547.128
max | 547.128 | 547.124 | 547.128

With a ReFS formatted disk on top of PVSCSI we see roughly a 15% increase over the previous PVSCSI run (2,452.42 vs. 2,135.84 total IOPS), and nearly 20% over the original LSI SAS baseline!

So if your applications support it, and you truly want to squeeze every last drop out of your storage, ReFS on top of PVSCSI is the combination to go with!
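
If you want to try this on an existing VM, the controller type can be flipped from PowerCLI. The snippet below is only a minimal sketch, assuming the VM is powered off, VMware Tools (which carries the Windows PVSCSI driver) is already installed in the guest, and "MyVM" stands in for your actual VM name:

# "MyVM" is a placeholder; run this with the VM powered off
Get-VM "MyVM" | Get-ScsiController | Set-ScsiController -Type ParaVirtual

If the PVSCSI driver is not present in the guest before the boot disk's controller is swapped, Windows may fail to boot, so test the change on a non-production VM first.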

Manually adding hosts to Datrium DVX without the vCenter plugin.

I will be doing a full writeup later on my experience with Datrium DVX, but I wanted to make sure that I got this out before I forgot.

DVX 1.1.x does not support vSphere 6.0 U3. I found this out, of course, AFTER I updated my vCenter appliance and the DVX plugin kept erroring out and crashing. I needed to add 3 new hosts to my DVX cluster so that we could expand from our POC to our prod environment, but I didn’t want to downgrade vCenter. Datrium has an excellent standalone management console, but the one thing that console can’t do is add new hosts, so how do we do this?

BEFORE WE CONTINUE I HAVE SOME DISCLAIMERS!

  1. This is not the process to add disks to your existing Datrium array. If you already have an array and want to use the dacli to add disks, we will cover that later.
  2. This is NOT a best practice. This is an emergency procedure.
  3. Datrium support is awesome. Just call them and have them work through this with you.

Now that we have that out of the way, if you still want to be self-sufficient and do this on your own, here are the steps.

  1. SSH to the ESXi host you would like to add to your DVX cluster.
  2. Install the DVX VIB on that host:
esxcli software vib install -d http://datrium.mgmt.float.ip/static/esxVibHEAD/index.xml --no-sig-check
  3. Source /etc/profile:
source /etc/profile
  4. Exit and re-enter the SSH session using the “exit” command.
  5. Enter the dacli:
da
  6. Run dacli to select all SSDs for use. It will warn you that it is going to scan all drives in the system and wipe all SSDs. That is true; however, it will ignore any SSDs with the VMware OS partitions on them, so if your host runs ESXi from an SSD you are safe. Press Y when prompted.
dacli SSD select-all

6a. If no SSDs are found, you will need to wipe those SSDs and rebuild them as GPT disks. I had this issue with my Samsung PM863a 960GB drives; I did not have it with my Intel DC P3608 4TB AIC. For each disk you wish to add, run the following command (see the sketch after this list for one way to find the device IDs):

partedUtil setptbl /vmfs/devices/disks/yourdiskidhere gpt

Once that is complete you will be able to run dacli SSD select-all again.

  7. Premount the NFS target:
premount datrium.data.float.ip
  8. In the VMware vCenter Web Client, find the Datrium NFS datastore, select Actions, mount the datastore to the new host, then select the host you just added.
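
For step 6a, this is roughly how I would go about identifying the SSD device IDs to feed to partedUtil. It is only a sketch; the naa ID below is a made-up example, so substitute the IDs from your own host:

# List storage devices; look for "Is SSD: true" and note the naa.* identifiers
esxcli storage core device list

# The same IDs show up as device nodes here
ls /vmfs/devices/disks/

# Check the current partition table on a candidate disk (example naa ID)
partedUtil getptbl /vmfs/devices/disks/naa.55cd2e404c123456

# Rebuild it as GPT -- this wipes the disk, so double-check the ID first
partedUtil setptbl /vmfs/devices/disks/naa.55cd2e404c123456 gpt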

[Screenshots: mounting the Datrium NFS datastore to the new host in the vSphere Web Client]

We mount the datastore this way because if you use esxcfg-nas to mount it, it will show up as a duplicate datastore and you will not be able to live migrate VMs between hosts.
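
Once the datastore is mounted, a quick sanity check from the host's SSH session (my own habit, not part of Datrium's documented procedure) is to confirm the NFS mount shows up with the same volume name as on your existing hosts:

# Confirm the Datrium datastore is mounted on the new host
esxcli storage nfs list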

Not too terribly difficult; however, this is a last-ditch, 4 a.m., it's-gotta-be-up-in-2-hours kind of fix, and NOT by any means how you should be adding hosts to your cluster.