
Evaluate PernixData FVP with replayed Production IO Traces

Using synthetic workloads to test drive PernixData FVP might result in odd findings. The most meaningful approach to test FVP is to deploy the software to production in monitor mode, let Architect do its magic, and enable acceleration after checking the recommendations a couple of days later. Although it is possible to deploy FVP, test-drive it, and remove it without any downtime for virtual machines, this approach might not fit all environments.

[Image: PernixData FVP replay workload]

If you have separate DEV/QA environments with sophisticated load generators, the solution is obvious. If you don't, there is another option: record production I/O traces and replay them on an FVP-accelerated test platform.

Required Tools

  • vscsiStats (if you have an ESXi host, you already have it)
  • I/O Analyzer (Free Fling)

vscsiStats is a storage profiling tool that has been available since ESXi 4.1. It collects and reports counters on storage activity at the virtual SCSI device level.
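
A quick way to get a feeling for what vscsiStats collects, before setting up a full trace, is to print one of its histograms directly on the host. A minimal sketch, using the example worldGroupID 966699 from the capture below:

    vscsiStats -s -w 966699           # start stats collection for all virtual disks of the VM
    vscsiStats -p ioLength -w 966699  # print the I/O length histogram collected so far
    vscsiStats -x -w 966699           # stop the collection again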

VMware I/O Analyzer is a well-known virtual appliance that can not only generate synthetic I/O but also replay traces recorded with vscsiStats.

Testbed

I’ve tested PernixData FVP in my homelab. There is no enterprise-grade hardware involved, so don't expect outstanding results:

  • Server: 5th gen Intel NUC (NUC5I5MYHE) running ESXi 6.0 Update 2 (Build: 3620759)
  • Storage: HP Microserver N40L running FreeNAS providing iSCSI Datastores
  • PernixData FVP Management Server 3.1.010920.4
  • PernixData FVP Host Extension 3.1.0.4-39343
  • Acceleration Resource: Samsung 850 EVO M.2 250GB SSD

Record Workloads with vscsiStats

Pick a Virtual Machine that is running the workload you want to record and locate the ESXi host where the VM is running.

  1. Connect to the ESXi host with SSH
  2. Get a list of all available worlds (Virtual Machines) with vscsiStats -l and note the worldGroupID
    root@esx01:~ $ vscsiStats -l
    Virtual Machine worldGroupID: 966699, Virtual Machine Display Name: io1, Virtual Machine Config File: /vmfs/volumes/datastore1/io1/io1.vmx, {
       Virtual SCSI Disk handleID: 8196 (scsi0:0)
       Virtual SCSI Disk handleID: 8197 (scsi0:1)
    }

    [Image: vscsiStats -l output]

  3. Start SCSI command tracing with vscsiStats -t -s -w [worldGroupID]

    root@esx01:~ $ vscsiStats -t -s -w 966699
    vscsiStats: Starting Vscsi stats collection for worldGroup 966699, handleID 8196 (scsi0:0)
    vscsiStats: Starting Vscsi cmd tracing for worldGroup 966699, handleID 8196 (scsi0:0)
    vscsi_cmd_trace_966699_8196
    Success.
    vscsiStats: Starting Vscsi stats collection for worldGroup 966699, handleID 8197 (scsi0:1)
    vscsiStats: Starting Vscsi cmd tracing for worldGroup 966699, handleID 8197 (scsi0:1)
    vscsi_cmd_trace_966699_8197
    Success.

    [Image: vscsiStats command tracing output]

  4. Write the trace to a file with logchannellogger [traceChannel] [outputFile] &

    root@esx01:~ $ logchannellogger vscsi_cmd_trace_966699_8197 io.trc &
  5. Wait at least 30 minutes to collect data. Keep in mind that vscsiStats stops its collection automatically after 30 minutes, so restart it with -s if you need a longer trace.
  6. Stop command tracing with vscsiStats -x
    root@esx01:~ $ vscsiStats -x
    vscsiStats: Stopping all Vscsi stats collection for worldGroup 966699, handleID 8196 (scsi0:0)
    Success.
    vscsiStats: Stopping all Vscsi stats collection for worldGroup 966699, handleID 8197 (scsi0:1)
    Success.
  7. Convert the binary trace file to a CSV file with vscsiStats -e [traceFile] > [csvFile]
    root@esx01:~ $ vscsiStats -e io.trc > io.csv
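
For reference, here are the capture steps condensed into a single sequence for the ESXi shell. This is only a sketch based on the example above; the worldGroupID (966699), the trace channel of scsi0:1, the output paths, and the 30-minute runtime have to be adjusted to your environment:

    vscsiStats -t -s -w 966699                                  # start stats collection and command tracing
    logchannellogger vscsi_cmd_trace_966699_8197 /tmp/io.trc &  # write the trace channel to a binary file
    sleep 1800                                                  # let the production workload run for 30 minutes
    vscsiStats -x                                               # stop all collection and tracing
    kill $!                                                     # stop the background logchannellogger
    vscsiStats -e /tmp/io.trc > /tmp/io.csv                     # convert the binary trace to CSV for I/O Analyzer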

Now you have a defined workload that can be tested with and without PernixData FVP. Replay the workload at least twice, once without and once with FVP, to see the difference. A third run can show better read cache results because the cache is already warm by then.

Replay the Workloads with I/O Analyzer

  1. Download and deploy I/O Analyzer
  2. Connect to the VM console and log in as root (password: vmware) to start I/O Analyzer
  3. Open the I/O Analyzer web interface
  4. Select Upload vscsi trace
    [Image: Upload vscsi trace]
  5. View vscsi trace characteristics to determine trace replay configuration
    [Image: vscsi trace characteristics]
  6. The workload I'm using here was captured from my vCenter Server. Required parameters are:
    Duration: ~2000 Seconds
    Disk Size: ~100GB
    [Images: vscsi trace statistics and trace IOPS]
  7. Resize the 2nd virtual disk of the I/O Analyzer appliance. This is a very important step because the default working set is far too small to get meaningful results.
    [Image: Resize the I/O Analyzer appliance disk]
  8. Reboot the I/O Analyzer Appliance
  9. Start I/O Analyzer (console login root/vmware), navigate to the workload configuration, and add the ESXi host running the appliance to the configuration.
    [Image: Configure I/O Analyzer tests]
  10. Add a Workload Entry for the I/O Analyzer Appliance with the following configuration
    Test Type: Trace Replay
    Trace: [Your Trace]
    [Image: Configure trace replay workload]
  11. Set the workload duration to the trace length
    [Image: Trace replay workload configuration]
  12. Press Run Now
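
Optionally, before pressing Run Now, you can start an esxtop batch capture on the ESXi host so the replay can also be analyzed from host-level statistics afterwards. A sketch, with the sample count sized roughly to the ~2000 second trace used here (file name and interval are arbitrary):

    esxtop -b -d 10 -n 210 > /tmp/replay-esxtop.csv &   # batch mode: one sample every 10 seconds, 210 samples (~35 minutes)

The resulting CSV can be opened in Windows perfmon or esxplot after the replay.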

During the replay, you can monitor statistics with PernixData Architect or esxtop. When the test finishes, I/O Analyzer generates a report. After the first test, I activated FVP and ran the same test again:

Trace Replay Statistics in PernixData Architect
[Images: Average write latency and average read latency in PernixData Architect]

Acceleration was done with a cheap consumer-grade SSD, so the results are not as good as you would see with an enterprise-grade SSD and better backend storage. Also, the cache was cold, so the read latency spikes were caused by the backend storage.

The report created by I/O Analyzer shows the following latency statistics (left graph without FVP, right graph with FVP):
[Images: I/O Analyzer trace replay without FVP and with FVP]

Another example shows the effect on a read-intensive workload, a VDI boot storm. FVP was activated in the zone marked blue. The average latency dropped below 1 ms while IOs were served from the local cache:
[Images: Average latency and average IOPS during the VDI boot storm]
