NSX-T Backup Monitoring

If you have configured automated backups in NSX-T, you might not be aware that failed backup jobs do not trigger alarms in the integrated NSX-T alarm dashboard. When a backup fails, the error message is only shown in the Backup & Restore configuration.

At the moment, you have to manually check that the backup is running as expected. This can also be done using the API:

> GET /api/v1/cluster/backups/history HTTP/1.1
{
  "cluster_backup_statuses": [
    {
      "backup_id": "5cf21742-091a-b9b9-1f24-ad75ede2d23b-1615489436",
      "start_time": 1615489436085,
      "end_time": 1615489440865,
      "success": false,
      "error_code": "BACKUP_AUTHENTICATION_FAILURE",
      "error_message": "either backup server login failed or unauthorized access to backup directory"
    }
  ],
  "node_backup_statuses": [
    {
      "backup_id": "5cf21742-091a-b9b9-1f24-ad75ede2d23b-1615403036",
      "start_time": 1615403036017,
      "end_time": 1615403354709,
      "success": true
    }
  ],
  "inventory_backup_statuses": [
    {
      "backup_id": "inventory-1615490636",
      "start_time": 1615490636254,
      "end_time": 1615490641758,
      "success": true
    }
  ]
}

In this example, the cluster backup failed. Besides the backup status itself, you should also check when the last backup finished. The end_time is given as a Unix timestamp in milliseconds.
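To make the two checks concrete, here is a minimal sketch that evaluates an already parsed history response. It is not the published script. The function name evaluate_backup_history, the 1440-minute default for the maximum age, and the fixed now_ms value used for the demonstration are assumptions made for illustration. The sample data is a trimmed copy of the response above.

```python
import time

# Trimmed copy of the API response shown above.
sample = {
    "cluster_backup_statuses": [
        {"end_time": 1615489440865, "success": False,
         "error_code": "BACKUP_AUTHENTICATION_FAILURE"}
    ],
    "node_backup_statuses": [
        {"end_time": 1615403354709, "success": True}
    ],
    "inventory_backup_statuses": [
        {"end_time": 1615490641758, "success": True}
    ],
}


def evaluate_backup_history(history, max_age_minutes=1440, now_ms=None):
    """Return a list of problems found in a backup history response.

    max_age_minutes is an assumed default, not a value taken from NSX-T.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # end_time is in milliseconds
    problems = []
    for key in ("cluster_backup_statuses",
                "node_backup_statuses",
                "inventory_backup_statuses"):
        kind = key.split("_")[0]  # "cluster", "node" or "inventory"
        statuses = history.get(key, [])
        if not statuses:
            problems.append(f"no {kind} backup found")
            continue
        # Only the most recently finished backup of each type matters.
        latest = max(statuses, key=lambda s: s["end_time"])
        if not latest.get("success"):
            code = latest.get("error_code", "unknown error")
            problems.append(f"{kind} backup failed ({code})")
            continue
        age_minutes = (now_ms - latest["end_time"]) // 60000
        if age_minutes > max_age_minutes:
            problems.append(f"{kind} backup is too old ({age_minutes} minutes)")
    return problems


# Fixed "current time" so the result is reproducible.
problems = evaluate_backup_history(sample, now_ms=1615490700000)
for line in problems:
    print(line)
```

With the fixed timestamp, the cluster backup is reported as failed and the node backup as older than 24 hours, while the inventory backup passes both checks.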

I've published a Nagios check to monitor the status and age of NSX-T backups.

usage: check_nsxt_backup.py [-h] -n NSX_HOST [-t TCP_PORT] -u USER -p PASSWORD
                            [-i] [-a MAX_AGE]

# python check_nsxt_backup.py -n nsx.virten.lab -u audit -p password
NSX-T cluster backup failed
NSX-T node backup is to old (1461 minutes)
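Nagios-compatible plugins report their state through the exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. How the published script maps its findings to these codes is not shown here. The sketch below is one possible mapping, assuming a failed backup is CRITICAL and an outdated backup is WARNING. The function name nagios_state and the message texts are hypothetical.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nagios_state(failed, stale):
    """Map lists of failed and stale backup types to (exit_code, message).

    Assumed policy: any failed backup is CRITICAL, an outdated one is WARNING.
    """
    messages = [f"NSX-T {kind} backup failed" for kind in failed]
    messages += [f"NSX-T {kind} backup is too old" for kind in stale]
    if failed:
        return CRITICAL, "; ".join(messages)
    if stale:
        return WARNING, "; ".join(messages)
    return OK, "NSX-T backups are fine"


code, message = nagios_state(["cluster"], ["node"])
print(message)
# A real plugin would end with sys.exit(code) so Nagios can read the state.
```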

The script is available on GitHub: github.com/fgrehl/virten-scripts/blob/master/python/check_nsxt_backup.py
