
Evaluate PernixData FVP with replayed Production IO Traces

Using synthetic workloads to test drive PernixData FVP might result in odd findings. The most meaningful approach to test FVP is to deploy the software to production in monitor mode, let Architect do its magic, and enable acceleration after checking the recommendations a couple of days later. Although it is possible to deploy FVP, test-drive it, and remove it without any downtime for virtual machines, this approach might not fit all environments.

[Image: PernixData FVP replay workload]

If you have separate DEV/QA environments with sophisticated load generators, the solution is obvious. If you don't, there is another option: record production I/O traces and replay them on an FVP-accelerated test platform.

Required Tools

  • vscsiStats (if you have an ESXi host, you already have it)
  • I/O Analyzer (Free Fling)

vscsiStats is a storage profiling tool that has been available since ESXi 4.1. It collects and reports counters on storage activity at the virtual SCSI device level.
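
A quick way to get a feeling for what vscsiStats collects, before setting up a full trace, is to print one of its histograms directly on the host. A minimal sketch, using the example worldGroupID 966699 from the capture below:

    vscsiStats -s -w 966699           # start stats collection for all virtual disks of the VM
    vscsiStats -p ioLength -w 966699  # print the I/O length histogram collected so far
    vscsiStats -x -w 966699           # stop the collection again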

VMware I/O Analyzer is a well-known virtual appliance that can not only generate synthetic I/O but also replay traces recorded with vscsiStats.

Testbed

I’ve tested PernixData FVP in my homelab. There is no enterprise-grade hardware involved, so don't expect outstanding results:

  • Server: 5th gen Intel NUC (NUC5I5MYHE) running ESXi 6.0 Update 2 (Build: 3620759)
  • Storage: HP Microserver N40L running FreeNAS providing iSCSI Datastores
  • PernixData FVP Management Server 3.1.010920.4
  • PernixData FVP Host Extension 3.1.0.4-39343
  • Acceleration Resource: Samsung 850 EVO M.2 250GB SSD

Record Workloads with vscsiStats

Pick a Virtual Machine that is running the workload you want to record and locate the ESXi host where the VM is running.

  1. Connect to the ESXi host with SSH
  2. Get a list of all available worlds (Virtual Machines) with vscsiStats -l and note the worldGroupID
    root@esx01:~ $ vscsiStats -l
    Virtual Machine worldGroupID: 966699, Virtual Machine Display Name: io1, Virtual Machine Config File: /vmfs/volumes/datastore1/io1/io1.vmx, {
       Virtual SCSI Disk handleID: 8196 (scsi0:0)
       Virtual SCSI Disk handleID: 8197 (scsi0:1)
    }

    [Image: vscsiStats -l output]

  3. Start SCSI command tracing with vscsiStats -t -s -w [worldGroupID]

    root@esx01:~ $ vscsiStats -t -s -w 966699
    vscsiStats: Starting Vscsi stats collection for worldGroup 966699, handleID 8196 (scsi0:0)
    vscsiStats: Starting Vscsi cmd tracing for worldGroup 966699, handleID 8196 (scsi0:0)
    vscsi_cmd_trace_966699_8196
    Success.
    vscsiStats: Starting Vscsi stats collection for worldGroup 966699, handleID 8197 (scsi0:1)
    vscsiStats: Starting Vscsi cmd tracing for worldGroup 966699, handleID 8197 (scsi0:1)
    vscsi_cmd_trace_966699_8197
    Success.

    [Image: vscsiStats command tracing output]

  4. Write the trace to a file with logchannellogger [traceChannel] [outputFile] &

    root@esx01:~ $ logchannellogger vscsi_cmd_trace_966699_8197 io.trc &
  5. Wait at least 30 minutes to collect data. Keep in mind that vscsiStats stops its collection automatically after 30 minutes, so restart it with -s if you need a longer trace.
  6. Stop command tracing with vscsiStats -x
    root@esx01:~ $ vscsiStats -x
    vscsiStats: Stopping all Vscsi stats collection for worldGroup 966699, handleID 8196 (scsi0:0)
    Success.
    vscsiStats: Stopping all Vscsi stats collection for worldGroup 966699, handleID 8197 (scsi0:1)
    Success.
  7. Convert the binary trace file to a CSV file with vscsiStats -e [traceFile] > [csvFile]
    root@esx01:~ $ vscsiStats -e io.trc > io.csv
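
For reference, here are the capture steps condensed into a single sequence for the ESXi shell. This is only a sketch based on the example above; the worldGroupID (966699), the trace channel of scsi0:1, the output paths, and the 30-minute runtime have to be adjusted to your environment:

    vscsiStats -t -s -w 966699                                  # start stats collection and command tracing
    logchannellogger vscsi_cmd_trace_966699_8197 /tmp/io.trc &  # write the trace channel to a binary file
    sleep 1800                                                  # let the production workload run for 30 minutes
    vscsiStats -x                                               # stop all collection and tracing
    kill $!                                                     # stop the background logchannellogger
    vscsiStats -e /tmp/io.trc > /tmp/io.csv                     # convert the binary trace to CSV for I/O Analyzer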

Now you have a defined workload that can be tested with and without PernixData FVP. Replay the workload at least twice, once without and once with FVP, to see the difference. A third run can show better read cache results because the cache is already warm by then.

Replay the Workloads with I/O Analyzer

  1. Download and deploy I/O Analyzer
  2. Connect to the VM console and log in as root (password: vmware) to start I/O Analyzer
  3. Open the I/O Analyzer web interface
  4. Select Upload vscsi trace
    [Image: Upload vscsi trace]
  5. View vscsi trace characteristics to determine trace replay configuration
    [Image: vscsi trace characteristics]
  6. The workload I'm using here was captured from my vCenter Server. Required parameters are:
    Duration: ~2000 Seconds
    Disk Size: ~100GB
    [Images: vscsi trace statistics and trace IOPS]
  7. Resize the 2nd virtual disk of the I/O Analyzer appliance. This is a very important step because the default working set is far too small to get meaningful results.
    [Image: Resize the I/O Analyzer appliance disk]
  8. Reboot the I/O Analyzer Appliance
  9. Start I/O Analyzer (console login root/vmware), navigate to the workload configuration, and add the ESXi host running the appliance to the configuration.
    [Image: Configure I/O Analyzer tests]
  10. Add a Workload Entry for the I/O Analyzer Appliance with the following configuration
    Test Type: Trace Replay
    Trace: [Your Trace]
    [Image: Configure trace replay workload]
  11. Set the workload duration to the trace length
    [Image: Trace replay workload configuration]
  12. Press Run Now
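
Optionally, before pressing Run Now, you can start an esxtop batch capture on the ESXi host so the replay can also be analyzed from host-level statistics afterwards. A sketch, with the sample count sized roughly to the ~2000 second trace used here (file name and interval are arbitrary):

    esxtop -b -d 10 -n 210 > /tmp/replay-esxtop.csv &   # batch mode: one sample every 10 seconds, 210 samples (~35 minutes)

The resulting CSV can be opened in Windows perfmon or esxplot after the replay.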

During the replay, you can monitor statistics with PernixData Architect or esxtop. When the test finishes, I/O Analyzer generates a report. After the first test, I activated FVP and ran the same test again:

Trace Replay Statistics in PernixData Architect
[Images: Average write latency and average read latency in PernixData Architect]

Acceleration was done with a cheap consumer-grade SSD, so the results are not as good as you would see with an enterprise-grade SSD and better backend storage. Also, the cache was cold, so the read latency spikes were caused by the backend storage.

The report created by I/O Analyzer shows the following latency statistics (left graph without FVP, right graph with FVP):
[Images: I/O Analyzer trace replay without FVP and with FVP]

Another example shows the effect on a read-intensive workload, a VDI boot storm. FVP was activated in the zone marked blue. The average latency dropped below 1 ms while IOs were served from the local cache:
[Images: Average latency and average IOPS during the VDI boot storm]
