If you have configured automated backups in NSX-T, you might not be aware that failed backup jobs do not trigger alarms in the integrated NSX-T alarm dashboard. When a backup fails, the only indication is an error message in the Backup & Restore configuration.

At the moment, you have to manually check that the backup is running as expected. This can also be done using the API:
> GET /api/v1/cluster/backups/history HTTP/1.1
{
  "cluster_backup_statuses": [
    {
      "backup_id": "5cf21742-091a-b9b9-1f24-ad75ede2d23b-1615489436",
      "start_time": 1615489436085,
      "end_time": 1615489440865,
      "success": false,
      "error_code": "BACKUP_AUTHENTICATION_FAILURE",
      "error_message": "either backup server login failed or unauthorized access to backup directory"
    }
  ],
  "node_backup_statuses": [
    {
      "backup_id": "5cf21742-091a-b9b9-1f24-ad75ede2d23b-1615403036",
      "start_time": 1615403036017,
      "end_time": 1615403354709,
      "success": true
    }
  ],
  "inventory_backup_statuses": [
    {
      "backup_id": "inventory-1615490636",
      "start_time": 1615490636254,
      "end_time": 1615490641758,
      "success": true
    }
  ]
}
In this example, the cluster backup failed. Besides the backup status itself, you should also check when the last backup finished, because a successful but outdated backup means the job is no longer running. The end_time field is a Unix timestamp in milliseconds.
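To illustrate both checks, here is a minimal Python sketch that evaluates an already parsed history response. It is not the logic of the published script: the function name check_backup_history, the 1440 minute default and the choice to judge each backup type by the entry that finished last are my own assumptions. The sample data is a trimmed copy of the response above, and now_ms is fixed so the result is reproducible.

```python
import time

# Keys as they appear in the history response shown above.
STATUS_KEYS = {
    "cluster": "cluster_backup_statuses",
    "node": "node_backup_statuses",
    "inventory": "inventory_backup_statuses",
}


def check_backup_history(history, max_age_minutes=1440, now_ms=None):
    """Return a list of problems found in a parsed backup history dict."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # end_time is in milliseconds
    problems = []
    for name, key in STATUS_KEYS.items():
        entries = history.get(key) or []
        if not entries:
            problems.append(f"no {name} backup found")
            continue
        # Assumption: only the entry that finished last is evaluated.
        latest = max(entries, key=lambda e: e.get("end_time", 0))
        if not latest.get("success", False):
            code = latest.get("error_code", "unknown error")
            problems.append(f"{name} backup failed ({code})")
            continue
        age_minutes = (now_ms - latest.get("end_time", 0)) // 60000
        if age_minutes > max_age_minutes:
            problems.append(f"{name} backup is too old ({age_minutes} minutes)")
    return problems


# Trimmed copy of the example response from this post.
sample = {
    "cluster_backup_statuses": [
        {"end_time": 1615489440865, "success": False,
         "error_code": "BACKUP_AUTHENTICATION_FAILURE"}
    ],
    "node_backup_statuses": [
        {"end_time": 1615403354709, "success": True}
    ],
    "inventory_backup_statuses": [
        {"end_time": 1615490641758, "success": True}
    ],
}

# Fixed "current time" shortly after the inventory backup finished.
for problem in check_backup_history(sample, now_ms=1615491000000):
    print(problem)
```

With the fixed timestamp, the sketch reports the failed cluster backup and flags the node backup as older than 24 hours, while the inventory backup passes both checks.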
I've published a Nagios check to monitor the status and age of NSX-T backups.
usage: check_nsxt_backup.py [-h] -n NSX_HOST [-t TCP_PORT] -u USER -p PASSWORD
[-i] [-a MAX_AGE]
# python check_nsxt_backup.py -n nsx.virten.lab -u audit -p password
NSX-T cluster backup failed
NSX-T node backup is to old (1461 minutes)
The script is available on GitHub: github.com/fgrehl/virten-scripts/blob/master/python/check_nsxt_backup.py