Filesystem-Consistent Linux Backups with VMware

Life is easy if you are running Windows: if you want to create image-based VMware backups with Veeam Backup & Replication, Quest vRanger, PHD Virtual or any other VADP-based product, you can use VSS. Backing up Linux is much more complex because there is no equivalent. What you get are crash-consistent copies of your virtual disks. After some research I couldn’t find any established solution. Here is what the backup vendors say (I am not talking about application-aware backups, as that is a different problem):

Veeam Backup & Replication v6
Veeam refers to the “Enable VMware tools quiescence” option. But does it actually work? Yes, the option exists and you can enable it, but the vmsync driver inside your virtual machine is disabled by default. So if you activate “Quiesce”, nothing actually happens: the backup succeeds, but all you get is an inconsistent state.
Source: User Guide

Quest vRanger 5.3.1
The solution Quest provides is only a small hint: install VMware Tools, create freeze scripts and enable guest quiescing. But who will support my custom script?
Source: Quest Solution SOL84967

PHD Virtual Backup
PHD Virtual does not provide any information about consistent Linux backups. The only thing I could find was a note: “Quiesce? Windows only!”


Possible Solution?

So, how do you create a filesystem-consistent Linux backup with Veeam, vRanger or PHD Virtual? As every vendor does the same thing (triggering the VMware API), the answer is identical. But first let’s look at the basics: What do I have to do to get a consistent state? And how can I determine whether my backup is consistent?

An inconsistent filesystem has to be recovered before it can be mounted. Using dmesg you can determine whether it was consistent or not:

Consistent filesystem mount:

root@ubuntu:~# dmesg |grep EXT
[3.711991] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
[7.685314] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro

Inconsistent filesystem mount:

root@ubuntu:~# dmesg |grep EXT
[3.780568] EXT4-fs (dm-0): INFO: recovery required on readonly filesystem
[3.780855] EXT4-fs (dm-0): write access will be enabled during recovery
[4.153234] EXT4-fs (dm-0): recovery complete
[4.178622] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
[8.058018] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro

These tests were made with Ubuntu 12.04 LTS, the current release at the time of writing.
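To check a restored VM quickly, the dmesg test above can be wrapped in a small shell function. This is a minimal sketch that assumes ext4 logs the exact “recovery required” wording shown above; other filesystems or kernel versions may word it differently, and the function name check_fs_consistency is hypothetical.

```shell
#!/bin/sh
# Sketch: classify ext4 mounts as clean or recovered from kernel log text.
# Assumes the ext4 wording shown above ("recovery required"); the function
# name check_fs_consistency is hypothetical. Reads the log text from stdin.
check_fs_consistency() {
    # A journal replay at mount time means the filesystem was not clean
    if grep -q 'EXT4-fs.*recovery required'; then
        echo "inconsistent"
    else
        echo "consistent"
    fi
}
```

On the booted restore, `dmesg | check_fs_consistency` would print either "consistent" or "inconsistent". Reading from stdin keeps the function independent of how the log is obtained (dmesg, a saved file, or a serial console capture).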


Solution 1: Custom fsfreeze Script

The fsfreeze command suspends and resumes access to a filesystem. While access is suspended, the volume is in a consistent state and can be copied. Please note that fsfreeze (part of util-linux) is a relatively new tool and is not available on older systems.
1. Install VMware Tools

2. Create custom scripts:

root@ubuntu:~# touch /usr/sbin/pre-freeze-script
root@ubuntu:~# touch /usr/sbin/post-thaw-script

3. Edit both scripts and add your mount points. With only one mount point, the files should look like this:

root@ubuntu:~# cat /usr/sbin/pre-freeze-script
#!/bin/sh
fsfreeze -f /
root@ubuntu:~# cat /usr/sbin/post-thaw-script
#!/bin/sh
fsfreeze -u /

4. Make both files executable:

root@ubuntu:~# chmod 755 /usr/sbin/pre-freeze-script
root@ubuntu:~# chmod 755 /usr/sbin/post-thaw-script

5. Activate the “Quiesce” option in your backup client

During the backup, the backup client tells the vCenter server to create a snapshot with the “quiesce” option. VMware Tools then runs both scripts, freezing write I/O on the filesystem before the snapshot is created and thawing it afterwards.
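Step 3 only covers a single mount point. If you need to freeze several filesystems, one option is a single script that both names link to. The sketch below is my own hypothetical variant, not something VMware or the backup vendors provide: the MOUNTPOINTS list, the FSFREEZE override and the symlink layout are assumptions. It freezes the listed mount points in order, thaws them in reverse order, and thaws anything already frozen if one freeze fails.

```shell
#!/bin/sh
# Hypothetical combined freeze/thaw script for several mount points.
# Link /usr/sbin/pre-freeze-script and /usr/sbin/post-thaw-script to it;
# the name it is invoked by selects the action.
# MOUNTPOINTS is an example; list only real mount points of your system.
MOUNTPOINTS="/var /"
# Command used to freeze/thaw; override only for dry runs (not a VMware Tools feature).
FSFREEZE="${FSFREEZE:-fsfreeze}"

freeze_all() {
    frozen=""
    for mp in $MOUNTPOINTS; do
        if "$FSFREEZE" -f "$mp"; then
            frozen="$mp $frozen"   # remember in reverse order for rollback
        else
            # Roll back so no filesystem stays frozen after a failure
            for f in $frozen; do
                "$FSFREEZE" -u "$f"
            done
            return 1
        fi
    done
}

thaw_all() {
    reversed=""
    for mp in $MOUNTPOINTS; do
        reversed="$mp $reversed"
    done
    # Thaw in reverse order of freezing; keep going if one thaw fails
    status=0
    for mp in $reversed; do
        "$FSFREEZE" -u "$mp" || status=1
    done
    return "$status"
}

case "$(basename "$0")" in
    pre-freeze-script) freeze_all ;;
    post-thaw-script)  thaw_all ;;
esac
```

To use it, you could save it under a path of your choice (for example /usr/local/sbin/vm-freeze, a hypothetical name), make it executable, replace the two files from step 2 with symlinks to it (`ln -s /usr/local/sbin/vm-freeze /usr/sbin/pre-freeze-script`, and likewise for post-thaw-script), and do a dry run with `FSFREEZE=echo /usr/sbin/pre-freeze-script` to see which commands would run.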


Solution 2: vmsync

As mentioned above, VMware has created a sync driver intended for consistent backups. Unfortunately I couldn’t find any documentation for this driver. It is disabled by default, and the installer only gives this short explanation:

[EXPERIMENTAL] The VMware FileSystem Sync Driver (vmsync) is a new feature that
creates backups of virtual machines. Please refer to the VMware Knowledge Base
for more details on this capability. Do you wish to enable this feature?

The [EXPERIMENTAL] label makes clear that this feature is not supported at the moment, and I couldn’t find the Knowledge Base details it refers to either.

To enable the vmsync driver, answer yes during the VMware Tools installation or run vmware-config-tools.pl later:

root@ubuntu12:~# vmware-config-tools.pl
Initializing...

Making sure services for VMware Tools are stopped.

vmware-tools stop/waiting

[EXPERIMENTAL] The VMware FileSystem Sync Driver (vmsync) is a new feature that
creates backups of virtual machines. Please refer to the VMware Knowledge Base
for more details on this capability. Do you wish to enable this feature?
[no] yes
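Before relying on the driver, it is worth confirming that the module is actually loaded. A minimal sketch, assuming the kernel module is registered under the name vmsync, as the installer prompt suggests; the function name vmsync_loaded is hypothetical, and `lsmod | grep vmsync` gives the same information interactively.

```shell
#!/bin/sh
# Sketch: report whether the vmsync kernel module is loaded.
# Assumes the module name "vmsync" (from the installer prompt above);
# the function name vmsync_loaded is hypothetical.
vmsync_loaded() {
    modules_file="${1:-/proc/modules}"   # defaults to the running kernel's module list
    # Each /proc/modules line starts with the module name followed by a space
    if grep -q '^vmsync ' "$modules_file" 2>/dev/null; then
        echo "vmsync loaded"
        return 0
    fi
    echo "vmsync not loaded"
    return 1
}
```

Called without arguments it reads /proc/modules; the optional file argument only exists so the check can be tried against a saved module list.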

After activating vmsync and the “quiesce” option in your backup client, you can create consistent backups. I tested both solutions in test environments with Ubuntu 12.04 and RHEL 6 systems and was able to create consistent backups. Please note, however, that neither solution is supported by VMware or any backup vendor, so test thoroughly before rolling it out. Be warned that I encountered serious issues with these solutions and would not use them in production myself!

HP ProLiant N40L Temperature

While running some stress tests I monitored the temperature of my HP N40L. The N40L mainboard has 3 sensors: CPU, Northbridge and Ambient. The room where the servers are located has a temperature of 20°C. Surprisingly, the fan speed did not change, even at full load.

Here are my results:

Idle:
CPU: 34°C
North Bridge: 37°C
Ambient: 21°C

Moderate Load (both CPU cores at 50% and some hard drive activity):
CPU: 42°C
North Bridge: 38°C
Ambient: 22°C

Full Load:
CPU: 51°C
North Bridge: 39°C
Ambient: 23°C

Storage vMotion and dvSwitch / HA problem explained

Today I tried to explain why this Storage vMotion / dvSwitch / HA problem actually exists, how the virtual machines are affected and what can be done to mitigate it. The issue seems hard to explain, so I searched the internet for pictures. I found many explanations, but nobody had drawn a picture of it, so I did:

Explanation

  1. We have 2 ESXi hosts, both connected to shared storage containing 2 LUNs. The first ESXi host runs two virtual machines (VM1 & VM2), the second runs only one (VM3). All three virtual machines are connected to a Distributed Virtual Switch. Each LUN has an additional folder containing the dvSwitch port information (.dvsData). ESXi hosts need this information to know which port belongs to which virtual machine without asking vCenter.
  2. VM2 is migrated to LUN 2 using Storage vMotion, either manually or triggered by Storage DRS. This is where the bug happens: all files such as .vmx and .vmdk are moved to LUN 2, but for some reason the dvSwitch port information remains on LUN 1. Nothing is noticeable at this point; VM2 stays up and running without any network issues.
  3. The first ESXi host dies. This is where HA should kick in and restart its virtual machines on another host.
  4. HA starts VM1 on the second ESXi host. Everything is fine.
  5. HA tries to start VM2 on the second ESXi host. During this process, HA tries to read the port information from the .dvsData directory. It fails because it can’t find the port information in the .dvsData directory on LUN 2.

Operation failed, diagnostics report: Failed to open file /vmfs/volumes/UUID/.dvsData/ID/Port Status (bad0003)= Not found
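To see the mismatch from step 2 on a host, you could check whether the port file exists on the datastore where the VM now lives. This is a hypothetical sketch: the directory layout .dvsData/<switch ID>/<port> is inferred from the error message above, the function name and arguments are illustrative, and looking up the switch ID and port key for a given VM is left out. For finding and fixing affected VMs, use Alan Renouf’s script listed below.

```shell
#!/bin/sh
# Hypothetical sketch: does the dvSwitch port file exist on a datastore?
# Layout .dvsData/<switch ID>/<port key> is inferred from the HA error message;
# dvs_port_file_exists and its arguments are illustrative names.
dvs_port_file_exists() {
    datastore="$1"   # e.g. /vmfs/volumes/<datastore UUID or name>
    switch_id="$2"   # dvSwitch ID as used for the .dvsData subdirectory
    port_key="$3"    # dvPort of the VM's network adapter
    port_file="$datastore/.dvsData/$switch_id/$port_key"
    if [ -f "$port_file" ]; then
        echo "found: $port_file"
        return 0
    fi
    echo "missing: $port_file"
    return 1
}
```

Run against both LUNs for VM2, it would report "found" on LUN 1 and "missing" on LUN 2, which is the state HA stumbles over in step 5.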

Additional Information

VMware KB2013639
Issue explained by Duncan Epping @ Yellow-Bricks
Script to identify and fix affected VMs by Alan Renouf

Running WSX as Appliance

A few weeks ago VMware published a Technology Preview of VMware Workstation that also comes with a service called WSX. This service lets you connect to virtual machine consoles through a web interface, without any plugins. Many ideas around WSX have been posted on its creator’s blog, and I want to pick up one of them: stripping WSX out of its 400MB download package and deploying it as a single package or appliance.

Running WSX standalone is nothing new, as William Lam has already written about it. So here is Part 2…


Datastore cluster permissions lost – Script Workaround

As you might know, all permissions set at the datastore cluster level are lost when the vCenter service restarts (VMware KB: 2008326). VMware’s workaround is to place the datastore clusters inside a folder, but in some cases that is not an option, so I decided to write a PowerCLI script which recreates the permissions after a vCenter service restart.

First you have to find the affected permissions, i.e. permissions that are set directly on datastore clusters. A datastore cluster is referred to as a “StoragePod”, so this is the keyword:


Datastore cluster permissions lost

After migrating datastores to datastore clusters and adding permissions at the datastore cluster level, I ran into puzzling issues where users suddenly failed to create VMs. Users got an error message while selecting the cluster:

You do not have the privilege ‘Datastore > Allocate space’ on the datastore connected to the selected Cluster

I checked the vCenter permissions and noticed that the datastore permission was missing. I remembered a bug that causes vCenter permissions to disappear after Windows users or groups are renamed, so I simply reassigned the permission. Shortly afterwards the problem recurred, so I searched the VMware KB and found that this is a known issue: KB: 2008326

VMware’s resolution is to place the datastore cluster inside a folder, set the permissions on that folder and propagate them. Unfortunately you can’t simply move an existing cluster into a folder:


Move entities – The specified folder does not support this operation.

The solution is to create a new datastore cluster inside the folder, recreate all settings and move the VMFS datastores into the new cluster. This can be done without any interruption to the running virtual machines.


Shutting down the VSA

Since you cannot control the VSA appliance directly, the question comes up of how to properly shut down a VSA cluster. Here is the answer:

  1. Shut down all VMs
  2. Put the VSA cluster into maintenance mode
  3. Shut down the ESXi hosts

Do not…

  • …put the ESXi hosts into maintenance mode
  • …shut down the VSA appliance



HP N40L Shared Storage with vSphere Storage Appliance (VSA)

Without shared storage it is quite hard to build a reasonable test scenario. With vSphere 5, VMware introduced the vSphere Storage Appliance (VSA). The VSA turns the local storage of up to 3 servers into mirrored shared storage. This sounds great for a test environment because it supports many VMware features such as vMotion, HA and DRS.

Before installing, there are a few things to check, because the VSA has very strict system requirements. As this is only a test environment and I don’t need vendor support, my main goal is simply to get the VSA up and running. The server requirements are:

  • 6GB RAM
  • 2GHz CPU
  • 4 NICs
  • Identical configuration across all nodes
  • Clean ESXi 5.0 Installation

I deliberately ignored all the vendor/model and hardware RAID controller requirements, as these are only soft requirements. The HP ProLiant N40L meets all of the above requirements except the 2GHz CPU. However, there is a small XML file containing the host audit configuration that the installer uses during installation. I am going to tweak this file a little to get the installation done.
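The requirement list above can be turned into a quick pre-check that shows which hard requirement a host misses. This is a minimal sketch, not the installer’s actual host audit: the function name vsa_precheck and its arguments (RAM in MB, CPU clock in MHz, NIC count) are hypothetical, 6GB is taken as 6144 MB, and the identical-configuration and clean-install checks are left out.

```shell
#!/bin/sh
# Sketch: compare host specs against the VSA minimums listed above.
# vsa_precheck RAM_MB CPU_MHZ NIC_COUNT (hypothetical name and arguments).
vsa_precheck() {
    ram_mb="$1"
    cpu_mhz="$2"
    nics="$3"
    status=0
    if [ "$ram_mb" -lt 6144 ]; then      # 6GB RAM
        echo "FAIL: RAM $ram_mb MB < 6144 MB"
        status=1
    fi
    if [ "$cpu_mhz" -lt 2000 ]; then     # 2GHz CPU
        echo "FAIL: CPU $cpu_mhz MHz < 2000 MHz"
        status=1
    fi
    if [ "$nics" -lt 4 ]; then           # 4 NICs
        echo "FAIL: $nics NICs < 4 NICs"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "OK"
    fi
    return "$status"
}
```

For an N40L with 8GB of RAM and additional NICs (assumed for this example), `vsa_precheck 8192 1500 4` prints only `FAIL: CPU 1500 MHz < 2000 MHz`, the one requirement the text says the N40L misses.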

vSphere 5 Homelab – ESX on HP ProLiant N40L Microserver

Hewlett-Packard has launched an extremely affordable server for SMB and home users. Not only its price of approximately 200 euros but also its low power consumption make it a great candidate for a virtualization home lab. An optional Remote Access Card (RAC) adds iLO-like functions to the server.


The HP N40L has 2 CPU cores and supports up to 8GB of RAM. Although this is quite low for a hypervisor, it should be sufficient for a pure test environment. The server only has a software-based RAID controller, which does not work with ESX. If you want to use the local disks as an array, you have to buy an additional RAID controller; the P410, for example, is supported. I decided not to buy a RAID controller because I want to store my VMs on shared storage. The good news is that the server ships with 4 hard drive trays, which allow the installation of any SATA hard drive.


Features

The server ships with the following configuration:

  • Processor: AMD Turion™ II Neo N40L (2x 1.50GHz)
  • Memory: 2GB DDR3 PC3-10600E UDIMM
  • Hard Disk: 1x Seagate Barracuda (250GB, 7200RPM, SATA)
  • LAN: 1x 10/100/1000 MBit (NC107i)
  • PSU: 150 Watt, non-redundant
  • Ports: VGA, eSATA, 7x USB 2.0 (4x Front, 2x Back, 1x On-Board)
