Free high availability: Create a XenServer virtualization cluster
By Lab Notes
Created 2010-11-12 03:00AM
High availability used to be expensive, requiring both specialized software and redundant hardware. Today it almost grows on trees, but you need two key ingredients: virtualization [1] and network storage [2]. By creating a virtual server farm across multiple physical servers, and storing all of your virtual machines on a central SAN or NAS, you can ensure that the failure of any given piece of physical hardware will not bring down your virtual environment.
We recently created a highly available virtual server cluster based on the free edition of Citrix XenServer; this article outlines the process step by step. Although the obvious choice for any enterprise-grade virtualization deployment is VMware vSphere [3], I chose XenServer for two reasons. First, we're cheap. We don't like spending money on things that we can get for free. Second, and more important, the licensing of VMware is extremely confusing; we're never sure what exactly is required to be properly licensed. (I guess this also amounts to being cheap. We always feel that we are being overcharged for items we aren't fully utilizing.)
It should be mentioned that there are many different flavors of Xen available. Just about any version of Linux includes a Xen implementation. The version discussed here is not "pure" open source Xen, but XenServer, the commercial bare-metal hypervisor originally developed by XenSource, which was subsequently purchased by Citrix [4]. The version of XenServer we used was the latest available at the time, 5.6.0.
Virtualization ABCs
Aside from the cost and licensing issues, the primary reasons to implement a virtual server farm on XenServer are the same as for using any other virtualization suite:
Consolidation. Because virtualization allows us to squeeze multiple servers into a much smaller hardware footprint, we can provide the same level of functionality as before with far fewer physical machines. Fewer servers means less power consumption, lower overall hardware cost, and lower manpower requirements.
Isolation of services. With the ability to quickly create virtual servers instead of standing up physical servers, there is no longer a need to consolidate services on one server. Need a new service? Instead of adding yet another service to an existing server, create a new virtual server. This allows for much simpler troubleshooting. When each service is completely isolated in its own virtual machine, there is no more second-guessing about whether service X conflicts with service Y.
Multiple-OS environment. Services that are best handled by one operating system can run side-by-side with services that are best handled by another. Most of us have an OS preference for specific applications. You like your service X to run on Windows, and your service Y to run on Linux? You no longer need two physical servers to do so. Simply stand up two virtual servers, and away you go.
Easy hardware upgrades. A system that is performing as desired need not be entirely rebuilt in the inevitable event of a hardware failure. All of us know the familiar scenario in which a service runs exactly the way we want it, and yet the physical hardware hosting this service is nearing the end of its life. With virtual servers, upgrading hardware no longer means having to rebuild the entire system from scratch. The hardware abstraction layer is standardized, and the hard drives are simple flat files. Copying these flat files to new hardware is a snap. With the proper preparation, the downtime for hardware upgrades can be reduced from potentially days to minutes.
Step 1: Install XenServer
The current version of XenServer is available from Citrix [5]. You will want the Product Installer ISO, which will be used to install both XenServer and its central management console, XenCenter. Additionally, if you are going to use virtual machines to run Linux, you will need the Linux Guest Support ISO. These will need to be burned to CD for installation on a bare-metal system.
Installation is very straightforward, following the instructions from the guided setup. Below is a brief excerpt of the installation process. Booting from the CD, you will be met with a Citrix screen. Pressing Enter or waiting on the timeout will proceed to the installation. An abbreviated version of the questions presented is listed in the table below, along with some generic responses.
Installer prompt | Response
Select Keymap | [qwerty] us
Welcome to XenServer Setup | <OK>
EULA | <Accept EULA>
Select Installation Source | Local Media
Supplemental Packs | Yes -- choose Yes if you intend to run Linux VMs in your environment. If your environment is going to be purely Windows, there is no need for supplemental packs.
Verify Installation Source | Skip verification -- if you are not confident in the downloaded ISO, you can verify, but we have used these disks repeatedly and know they are good copies.
Set Password | <choose your password>
Networking | eth0 (<MAC>) -- this choice can vary depending on your system and the number of NICs you have. It is best practice to plug in the NIC you intend to use for administration and unplug all others; the unused NIC(s) will indicate "[no link]".
Networking (cont.) | Static Configuration: IP Address: <varies>; Subnet Mask: <varies>; Gateway: <not needed>
Hostname and DNS Configuration | Hostname: <what you want>; DNS: <must have at least a dummy IP>
Select Time Zone | America
Select Time Zone (cont.) | Los Angeles
System Time | Manual time entry
Confirm Installation | Install XenServer
After the basic installation you will be prompted to install supplemental packs. We installed the Linux Guest Support pack and hit <OK>. A prompt asks if you want to verify the disk, use it, or go back. Choose <Use>. It is possible to install more than one supplemental pack, so you will be prompted again. There are no other supplements that we intend to use, so <Skip>. When finished, you will be prompted to reboot: <OK>.
Naturally, this installation procedure will be repeated on as many physical hosts as you want in your environment. For our purposes, we used three physical servers that we called xennode01, xennode02, and xennode03.
Step 2: Install XenCenter
After reboot, you will see a status display. The most important piece of information here is the IP address that you assigned to XenServer during the installation. From another machine on your network, you'll need to open a browser and navigate to that IP in order to download the installer for XenCenter. XenCenter is the Windows-based central management console for all of your XenServer hosts. The installer is in an MSI format, meaning you will need to be using some version of Windows to execute it.
Optional: As an alternative to the above method, you can simply use the same CD used to install XenServer. It contains the XenCenter MSI as well. We prefer the previous method, as it confirms that XenServer is installed and reachable over the network, and we don't have to shuffle CDs.
Installation of XenCenter is very straightforward, following a wizard. Once this is done, launch XenCenter. From this point forward, 99 percent of your administration of XenServer will be handled from XenCenter. XenCenter allows you to create, delete, start, stop, and administer virtual machines.
Step 3: Add servers using XenCenter
To administer each of the XenServer servers through XenCenter, you can simply "add a server." There are multiple shortcuts to this function, but for the sake of simplicity, using the top toolbar, select Server -> Add.
You will be prompted for the server IP and user/password credentials. Unless you changed something from the directions above, the user name will be root and the password will be as set during installation.
You may be prompted to verify the SSL certificate. <Accept>
Step 4: Create a XenServer pool
A pool in XenCenter is a collection of servers that you can manage as a group. If your physical servers are all of the same type, creating this pool will simplify administration. If you are intending to use XenServer's high-availability functions, a pool is required. By creating a pool and storing all of your virtual machines on an external share, the virtual machines are freed from ties to any specific physical host. In the event of a physical host failure, the VMs on that host can be restarted immediately on another host in the pool.
The process for creating a pool is as follows: From the toolbar, select Pool -> New. You will be prompted to provide a name for the pool and an optional description. On the next screen, you will be asked to select a "master" server. This master server should be one that you have already connected to as a single server. Below the master server selection, you can add other members to the pool by selecting the check box next to the list.
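For those who prefer the command line, the same pool can be assembled with the xe CLI from each host's console. This is only a sketch, using our pool name; the master IP and password are placeholders:

# On the server chosen as master, name the pool:
xe pool-param-set uuid=$(xe pool-list params=uuid --minimal) name-label="myXENpool"

# On each of the other servers (xennode02 and xennode03 in our case), join the pool:
xe pool-join master-address=<master-ip> master-username=root master-password=<password>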
To support high availability, your pool will need an external storage repository. You can create a pool without external storage, but doing so would be useful only for administrative purposes. If your VM's storage is hosted on the physical machine, and the physical machine goes down, there is no easy way of recovering that VM. Setting up your external storage will likely be the trickiest part of the installation. We cannot offer any help here, as this will depend on what equipment you are using. XenServer supports NFS, iSCSI, HBA, StorageLink, and CIFS.
We used an iSCSI target that could be easily referenced by each of the nodes created earlier. One could argue that the external iSCSI drive could fail, and you still have a single point of failure. In our case, our external iSCSI target is a RAID-6 array with two controllers, so multiple failures would have to occur to lose the flat files that constitute the VMs' virtual hard disks.
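For reference, a shared iSCSI storage repository can also be created from the CLI. The commands below are a sketch only; the target address, IQN, and SCSI ID are placeholders for whatever your storage presents, and xe sr-probe can be used to discover the SCSI ID:

# Probe the target to find the SCSI ID of the LUN (placeholder values):
xe sr-probe type=lvmoiscsi device-config:target=<target-ip> device-config:targetIQN=<target-iqn>

# Create the shared SR that the whole pool will use for VM disks:
xe sr-create name-label="iSCSI VM storage" shared=true type=lvmoiscsi content-type=user \
    device-config:target=<target-ip> \
    device-config:targetIQN=<target-iqn> \
    device-config:SCSIid=<scsi-id>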
Step 5: Install virtual machines
With a pool set up with external storage, you can create virtual machines that are not tied to any physical server. During the installation, choose the option to create the virtual hard drive on the external store, as well as the option "Don't assign this VM a home server..." It should be noted that these VMs are still assigned to a server for processing and memory allocation, but the virtual hard disks are stored elsewhere. We chose to create both a Linux VM and a Windows VM, and our pool looked something like this:
myXENpool
- xennode03
- xennode01
  - Fedora Test
  - Windows XP Test
- xennode02
- CIFS ISO library
- iSCSI Target

As you can see, the VMs had been assigned to xennode01. Before moving on, we verified that both machines had good network connectivity by simply pinging the network interfaces on each.
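As an aside, the New VM wizard's work can also be done from the CLI. The snippet below is a hedged sketch; the template label and SR name are examples rather than the exact strings your installation will show:

# Create a VM from a template, placing its disks on the shared SR:
SR_UUID=$(xe sr-list name-label="iSCSI VM storage" params=uuid --minimal)
xe vm-install template="<template name>" new-name-label="Fedora Test" sr-uuid=$SR_UUID
# Leaving the home server unset (XenCenter's "Don't assign this VM a home server")
# corresponds to leaving the VM's affinity parameter empty.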
Step 6: The true test (graceful)
Now that we had two VMs running, we could run some high-availability tests. With a ping test running against both machines, we wanted to see what happened if we stopped xennode01 (the hosting server). To do this gracefully, we would put that server into Maintenance Mode. Right-clicking xennode01 and selecting Maintenance Mode gives us a prompt about migrating the VMs -- namely, a live migration requires that XenServer Tools be installed on the VMs. Installing the tools on either Linux or Windows prompts a reboot (which does interrupt the ping test).
After the installation and reboot, verify that XenServer Tools has installed. You can easily see this on the General tab of the instance in question. Under "Virtualization state," you will see either "Tools not installed" or "Optimized (version 5.6 installed)." Verification is important, as the XenServer Tools did not install properly on my Linux machine the first time.
With XenServer Tools properly installed in our Linux and Windows VMs, right-clicking xennode01 and selecting Maintenance Mode results in a smooth migration. During the migration, ping times rise from less than 1ms to about 30ms, and the VMs land successfully on xennode03, after which the pings return to less than 1ms.
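Under the hood, Maintenance Mode is roughly equivalent to disabling and evacuating the host. A hedged sketch of doing the same from a host console (the host name is ours):

# Prevent new VMs from starting on this host, then live-migrate its residents away:
HOST_UUID=$(xe host-list name-label=xennode01 params=uuid --minimal)
xe host-disable uuid=$HOST_UUID
xe host-evacuate uuid=$HOST_UUID
# When maintenance is finished, bring the host back:
xe host-enable uuid=$HOST_UUID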
Step 7: The true test (clumsy)
So that was cool, but if there is going to be a hardware failure, it is doubtful that we will be able to switch gracefully to Maintenance Mode. After ensuring both the Linux and Windows VMs are running on xennode03 (which happens to be our master controller), we physically remove power (pull the plug) on xennode03.
Result? No surprise, the pings fail and we lose access in XenCenter. Trying to reconnect to the pool doesn't work because XenCenter accesses the pool and all nodes through the main controller. So how do we get control back? From one of the other physical servers, I use the local XenServer interface to navigate to Resource Pool Configuration. After a long wait, it would appear that we are getting nowhere here. Using SSH to access xennode01, we type:
xe pool-emergency-transition-to-master
This command forces xennode01 (which we are currently SSH'ed to) to become the master controller.

xe pool-recover-slaves
This command causes the master controller to find the other nodes that are part of the pool and inform them of the master controller change.
We're back! Well, not quite. We can now see the pool (by connecting to a different IP), but the VMs are not back online. Still using SSH:
xe host-list params=uuid,name-label,host-metrics-live
This returns a list of the pool members.
We get in return:
uuid ( RO)                : 5ff9245d-726d-41ee-872b-1480ab4e2a56
           name-label ( RW): xennode01
    host-metrics-live ( RO): true

uuid ( RO)                : a1716dba-7a75-4e99-94f6-27c00b8b122d
           name-label ( RW): xennode03
    host-metrics-live ( RO): false

uuid ( RO)                : 79285776-847d-4ce0-acd3-86934a026634
           name-label ( RW): xennode02
    host-metrics-live ( RO): true
So now we have the uuid for the host that is down (note the host-metrics-live value of false): a1716dba-7a75-4e99-94f6-27c00b8b122d. Now we enter:

xe vm-list is-control-domain=false resident-on=a1716dba-7a75-4e99-94f6-27c00b8b122d
This command lists the VMs the cluster thinks are running on the downed node. (The is-control-domain=false parameter removes dom0, the control domain, from the list.)
We get:
uuid ( RO)           : 5d21d7e2-5cb3-5e20-4307-b69d7eea8d94
     name-label ( RW): Windows XP Test
    power-state ( RO): running

uuid ( RO)           : 2fbb543e-aac0-4488-8e57-099d2f71f01e
     name-label ( RW): Fedora13_32
    power-state ( RO): running
Both VMs need to be recognized as turned off, so we enter:
xe vm-reset-powerstate resident-on=a1716dba-7a75-4e99-94f6-27c00b8b122d --force --multiple
This command forces the nodes in the pool to recognize the VMs associated with this uuid as powered off. (The --multiple flag is only necessary if you need to turn off multiple VMs. Be careful: incorrect usage can force all VMs in the cluster to be powered off.)
Returning to XenCenter, the console now shows that both the Windows VM and the Linux VM are off. Starting them moves the VMs to a different server (xennode01), and we are back in business.
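If you would rather stay at the SSH prompt, the restart can also be done with the CLI; a sketch, using our VM names:

# Start each VM and pin the restart to a surviving host:
xe vm-start vm="Windows XP Test" on=xennode01
xe vm-start vm="Fedora13_32" on=xennode01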
Final notes
It is not difficult to create a highly available virtual server cluster using XenServer. Given the right approach, the availability can be carried even further. Within the XenServer functionality are methods to bond network cards, so the failure of any given NIC does not bring down a system. Using a different switch for each network card removes the switches from the possible failure list. Failure of the external storage can be overcome with the right RAID environment. And all this is possible with free software!
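As one example, a NIC bond can be created from the CLI as well as from XenCenter. The following is only a sketch, assuming eth0 and eth1 on xennode01 are the two NICs to bond:

# Create a network for the bond, then bond the two physical interfaces onto it:
NET_UUID=$(xe network-create name-label="Bonded network")
PIF1=$(xe pif-list host-name-label=xennode01 device=eth0 params=uuid --minimal)
PIF2=$(xe pif-list host-name-label=xennode01 device=eth1 params=uuid --minimal)
xe bond-create network-uuid=$NET_UUID pif-uuids=$PIF1,$PIF2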
XenServer is built on a Linux kernel. A few people have suggested that we could script the recovery and even add a cron job to check for downed nodes and execute the recovery script. This seems very plausible. One would just have to be sure that multiple copies of the script weren't getting executed from the different nodes, and that a node that went down and stayed down didn't prompt recurring script executions. If this were done properly, you would get automatic recovery of downed VMs!
As a starting point we offer the following. (Please note: We have not extensively tested this script.)
#!/bin/bash
# Look for any pool member whose host-metrics-live flag is false.
# The sed joins each host's record onto a single line so grep/awk can pick out
# the uuid of a downed host; gsub strips the whitespace around the uuid.
downedhost=`xe host-list params=uuid,name-label,host-metrics-live | \
    sed -e :a -e '$!N;s/\n/|/g;ta;s/|||/\n/g' | \
    grep false | \
    awk -F"[:,|]" '{ gsub(/ /,"",$2); print $2 }'`

if [ -z "$downedhost" ]; then
    echo "Hosts all good."
else
    echo "$downedhost is down! Promoting myself to master."
    # Take over as pool master, tell the surviving slaves, and mark the
    # downed host's VMs as powered off so they can be restarted elsewhere.
    xe pool-emergency-transition-to-master
    xe pool-recover-slaves
    xe vm-reset-powerstate resident-on=$downedhost --force --multiple
fi
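To run the check automatically, the script could be scheduled from dom0's crontab. The path and five-minute interval below are illustrative only:

# Add via `crontab -e` as root (the script path is hypothetical):
*/5 * * * * /root/check-downed-host.sh >> /var/log/check-downed-host.log 2>&1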