XenServer – Pool Master Recovery (The Missing Part 1 to XenServer Hosts in Halted Mode)

In July of 2012 I wrote a “part 2” regarding XenServer hosts in halted mode. However, I seem to have misplaced part 1, which I’ve rewritten here after needing to reference these steps again recently.

There are several events which can cause a XenServer pool to become corrupt. In one recent instance, the pool master was unable to communicate with the HA storage repository (SR) and fenced. In another, several hosts shut down unexpectedly, and the pool master was among them. Here are the steps I performed to recover the pool master.

  1. Choose the server you want to become the new master, and on that box run “xe pool-emergency-transition-to-master”
  2. Once that has completed, on the newly transitioned master, run “xe pool-recover-slaves”
  3. Once that has completed, you should be able to run “xe host-list” and see all of your hosts listed
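
The three steps above can be chained in a small shell function, run on the host chosen as the new master. This is only a sketch: the function name recover_pool_master is made up for illustration, and it assumes the xe CLI is on the PATH and that each command should succeed before the next one runs. In practice the toolstack may need a few moments after the transition before the second command succeeds.

```shell
#!/bin/sh
# Hypothetical helper; run it on the host that should become the new master.
recover_pool_master() {
    # Step 1: promote this host to pool master.
    xe pool-emergency-transition-to-master || return 1
    # Step 2: tell the remaining hosts to reconnect to the new master.
    xe pool-recover-slaves || return 1
    # Step 3: confirm that every host is listed again.
    xe host-list
}
```

After sourcing the file, calling recover_pool_master from the dom0 shell runs the steps in order and stops at the first failure.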

Enjoy

Based in part on information from: XenServer System Recovery Guide

Hung VM, unable to force reboot/shutdown

I have been working with a few vendor-provided VMs which run Linux. For some reason this specific set of Linux VMs does not respond properly to reboot or shutdown commands when the VMs are hung. This is true even of force-shutdown. The following process works great for virtual servers that are non-responsive in a XenServer environment, after normal reboot/shutdown attempts have failed.

  1. “xe vm-list name-label={vm logical name}” to get the UUID of the VM that is hung
  2. “list_domains” to list the domain UUIDs, so you can determine the domain number of the VM by matching the UUIDs in this output against the UUID from the previous command
  3. “/opt/xensource/debug/destroy_domain -domid XX” where XX is the domain number from the previous command
  4. “xe vm-reboot name-label={vm logical name} --force”
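
The UUID matching in step 2 is the fiddly part, so here is a rough sketch of how the four steps could be scripted. The function names domid_for_uuid and kill_hung_vm are hypothetical, and the parser assumes list_domains prints pipe-separated rows in the form "id | uuid | state"; check the output on your own host before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print the domain id whose uuid column matches $1.
# Reads list_domains-style output on stdin; assumed layout "id | uuid | state".
domid_for_uuid() {
    awk -F'|' -v want="$1" '
        {
            id = $1; uuid = $2
            gsub(/[ \t]/, "", id)      # strip padding around the id column
            gsub(/[ \t]/, "", uuid)    # strip padding around the uuid column
            if (uuid == want) { print id; exit }
        }'
}

# Hypothetical wrapper for steps 1 to 4; $1 is the VM name-label.
kill_hung_vm() {
    vm_uuid=$(xe vm-list name-label="$1" params=uuid --minimal)
    [ -n "$vm_uuid" ] || { echo "no VM named $1" >&2; return 1; }
    domid=$(list_domains | domid_for_uuid "$vm_uuid")
    [ -n "$domid" ] || { echo "no running domain for $vm_uuid" >&2; return 1; }
    /opt/xensource/debug/destroy_domain -domid "$domid"
    xe vm-reboot uuid="$vm_uuid" --force
}
```

If more than one VM shares the same name-label, the --minimal output is a comma-separated list and the lookup will find nothing, so in that case pass the UUID through the steps by hand.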

Enjoy

 

Based in part on information from: http://www.r2dtop.com/xenserver-6-virtual-machine-crash-and-hang-issue/

 

Disabled Mailbox is not showing in Disconnected Mailbox Area

In Exchange, when you delete an Active Directory user account, the mailbox is not deleted automatically. Instead, Exchange considers the mailbox to be in a “disconnected” state: the mailbox exists, but it is no longer associated with an Active Directory user account. There are several reasons why you might want to keep the mailbox around and perhaps eventually reconnect it. Today I was working on a very corrupt user account in AD, but the mailbox itself was fine. I simply deleted the user account from AD (after ensuring proper backups were taken) and then created a new user account. Even though the username is the same as the one I just deleted, the two accounts have different GUIDs, so they are, in fact, different users. After creating the AD user account, I went over to the Exchange Management Console and the user’s mailbox was missing from both the Mailbox list and the Disconnected Mailbox list. This is because mailboxes are only moved to the disconnected list during a mailbox maintenance process. However, you can speed this up.

 

Launch the Exchange Management Shell and run the following against the mailbox database that held the mailbox:

Clean-MailboxDatabase <mailbox database name>

After that is complete, go back to the Disconnected Mailbox list and refresh the page, and you will find your mailbox.

 

Enjoy!

How to Remove a XenServer Slave when it No Longer Exists in the Pool

Citrix article CTX126382 describes how to remove a XenServer slave from a pool; however, it does not completely clean up afterwards. While the host will be removed, its storage repositories, such as DVD and local storage, will be left behind.

To clean these up perform the following:

1) Click on the disconnected storage repository on the console

2) On the general tab, right-click on the UUID and select copy

3) On the Pool master console, type: xe sr-forget uuid= (and then right-click paste which will insert the UUID of the disconnected storage repository)

Repeat this process for all disconnected storage repositories, which are typically local storage, DVD, and removable storage.
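
If there are several leftover repositories, the repetition can be wrapped in a loop. The sketch below assumes you have already copied the UUIDs as described above; the function name forget_srs is made up for illustration, and xe may refuse to forget a repository that is still attached to a live host.

```shell
#!/bin/sh
# Hypothetical helper: forget every SR whose UUID is passed as an argument.
forget_srs() {
    for sr_uuid in "$@"; do
        echo "Forgetting SR $sr_uuid"
        # Keep going on failure so one stubborn SR does not block the rest.
        xe sr-forget uuid="$sr_uuid" || echo "sr-forget failed for $sr_uuid" >&2
    done
}
```

On the pool master console this would be called with one UUID per disconnected repository, for example forget_srs followed by the UUIDs of the local storage, DVD, and removable storage entries.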

 

 

 

Xenserver hosts in halted mode (part 2)

I recently encountered a problem where one server in a pool had shut down unexpectedly in a way which caused the VMs running on that host to fail. We restarted the host and found that about half of the VMs returned to the pool and could be started on another pool member; however, a handful of VMs were unable to start. Using information I have previously posted, I checked the power-state for these VMs and they were in a halted state. However, they did not appear in the output of the list_domains command, and further attempts at recovery failed.

At that point we took a closer look at the system and, by running the command df from the console, discovered that the dom0 drive had zero free disk space. I connected using WinSCP, browsed to the log directory, and deleted a majority of the old and large log files, which freed up over 59% of the disk space. One reboot later, the disk space issue was resolved.
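
For anyone who prefers to stay on the console rather than use WinSCP, a minimal sketch of the same check follows. The helper names used_percent and largest_logs are hypothetical, and /var/log is assumed to be the log directory in question. Nothing here deletes anything; it only shows where the space has gone.

```shell
#!/bin/sh
# Hypothetical helper: print the Use% figure (without the % sign) for the
# filesystem that holds the given path. Relies on POSIX "df -P" output, where
# the fifth column of the second line is the capacity.
used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: list the ten largest entries (sizes in KB) under the
# given directory, defaulting to /var/log.
largest_logs() {
    du -ak "${1:-/var/log}" 2>/dev/null | sort -rn | head -n 10
}
```

Running used_percent / shows how full the dom0 root filesystem is, and largest_logs points at the files worth reviewing before you remove anything.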

However, in this case there was a second issue: the host that was in this state was hosting the Citrix license server, and this host was unable to contact the license server, so it couldn’t start VMs. But since this VM was halted instead of stopped, I couldn’t start it on a different host yet. By going into the license manager in XenCenter, I removed licensing on the host, which placed it into a 28-day grace period. Once this was completed I could restart the halted VMs, and then repoint the host back to the license server to retain the Enterprise license feature set.

XenServer: Changing management adapter in pool

After going through several rounds of problems moving a management adapter for a XenServer pool, I have found the following working process. However, it is because of this process that Citrix makes very clear that you should configure it properly in the first place, and if you need to make changes post-installation, to make them BEFORE you join the host to a pool. Also, you must change the subnet when changing interfaces, even if that means moving to a temporary, non-existent IP address space and then moving back to the correct IP address space once you are on the correct network interface.

However, let’s say you have a pool in production and you need to make the change…

  1. Perform a metadata backup and back up your virtual machines before performing the rest of this procedure.
  2. Disable High Availability from XenCenter, if enabled.
  3. Disable external authentication (Active Directory)
  4. Log on to a pool member from the physical console and change the management interface IP address
  5. From the xsconsole, go to Network and Management Interface > Configure Management Interface.
    1. Note: xsconsole freezes when the change is applied. You can use the key sequence CTRL+Z to gain access to the command prompt to run step 8 below. Then, use the command fg %1 to return to xsconsole and exit cleanly.
  6. From the CLI, use the following command: xe pif-reconfigure-ip uuid= IP= gateway= netmask= DNS= mode=
  7. To locate the correct PIF uuid for the pif-reconfigure-ip command, use the following command: xe pif-list params=uuid,host-name-label,device,management
  8. From the CLI, run the following command: xe-toolstack-restart
  9. The server enters emergency mode. Verify that the server is using the new IP address: ping it from another host, try a Secure Shell connection to it, or use the ifconfig command. Verify that the server is in emergency mode by running xe host-is-in-emergency-mode from the CLI. You should get True as the output.
  10. Repeat steps 4 through 9 on each of the pool members.
  11. Change the management interface IP address on the pool master using steps 4 through 7 above.
  12. Run the following command on the pool master: xe-toolstack-restart
  13. DO NOT RUN THIS COMMAND ON THE POOL MASTER
    From the CLI, on each of the pool members, run xe pool-emergency-reset-master master-address=IP_OF_THE_MASTER.
    DO NOT RUN THIS COMMAND ON THE POOL MASTER
  14. Verify the correct status of the pool. Connect with XenCenter to the new master’s IP address and check everything from there.
  15. Re-enable High Availability and external authentication, if required
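
As a rough illustration of the CLI part of the procedure (steps 6 and 8) on a single host, the two commands can be wrapped as shown below. The function name move_management_ip and its argument order are made up, every value is a placeholder you supply, and mode=static is an assumption for a fixed address.

```shell
#!/bin/sh
# Hypothetical helper for the CLI steps on one host. All values are
# placeholders supplied by the caller:
#   $1 = PIF uuid   $2 = new IP   $3 = netmask   $4 = gateway   $5 = DNS server
move_management_ip() {
    # Assign the new static address to the management PIF.
    xe pif-reconfigure-ip uuid="$1" mode=static \
        IP="$2" netmask="$3" gateway="$4" DNS="$5" || return 1
    # Restart the toolstack so the change takes effect.
    xe-toolstack-restart
}
```

The PIF uuid comes from the xe pif-list command in step 7. Once the toolstack is back up, the checks in step 9 (ping, SSH, and xe host-is-in-emergency-mode) still need to be done by hand.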

If during this process, any of your pool-slave hosts reboot and show missing management interface, and no network cards, please see our post over at: https://reddingitpro.wordpress.com/2012/04/07/xenserver-missing-network-cards-pool-member/

You can also view a video walk through of this process at: http://www.citrix.com/tv/#videos/4330

Adapted from CTX123477

XenServer: Hung VM

I’ve experienced several instances where a VM appears to hang and is non-responsive, not only at the console level, but also to the XenServer hypervisor and XenCenter. Attempts to force a shutdown of the server using xe vm-reboot or xe vm-shutdown fail with the error “Another operation involving the object is currently in progress class: VM”.

The following has worked consistently to recover the VM.

1 – “xe vm-list” to get the uuid of the VM that is hung
2 – “list_domains” to list the domain uuid’s so you can determine the domain # of the VM above by matching the uuids from this output with the uuid for your VM from the previous command.
3 – “/opt/xensource/debug/destroy_domain -domid XX” where XX is the domain number from the previous command
4 – “xe vm-reboot uuid=XXXX --force” where XXXX is the uuid from the first vm-list command for your VM.

XenServer 6.0 – Import/Export OVF

We had received several OVFs from a vendor who exported their VMs from VMware, and we needed to import them into our XenServer 6.0 environment. After learning that this functionality is now built into Citrix XenServer and no longer requires XenConvert, we were excited. However, our initial test import failed. After re-reading the documentation and searching several forums, nothing appeared to resolve the problem: the import would start and several seconds later it would fail.

So we imported the images into our VMware environment to ensure the OVFs were good, and even exported them again just to make sure the OVF files themselves were not the issue.

We then tried to export a XenServer VM via OVF and it failed as well. However, we could import and export XVA files without issues. Okay, so we had it narrowed down. A bit more research brought us to this Citrix Blog about TransferVM

http://blogs.citrix.com/2010/12/09/diagnosing-xenserver-appliance-wizard-failures/

We attempted this, but it said that the package was already installed.

We then contacted Citrix, who said to try navigating to /opt/xensource/packages/files/transfer-vm and then running uninstall-transfer-vm.sh

However, that didn’t work: it prompted for a UUID, but nothing documented which UUID it expected.

We brought this back to our test environment and it worked fine. We uninstalled and then reinstalled, and our OVF imports worked properly. The difference between the test environment and production is that production is in a pool, whereas the test environment is standalone.

I have tried to find documentation on which UUID it is looking for but at this point I’ve tried it with the pool, host, and sr UUIDs to no avail. I might have to resort to cycling hosts out of the pool into standalone mode and reinstalling the transfer-vm component and then rejoining the pool.

Unrecoverable error during 5.5 restore (from failed 5.6)

This weekend I decided to upgrade our two XenServer 5.5 servers, in a farm configuration, to 5.6 FP1. However, I found conflicting information on how to perform the actual upgrade. The mistake I made was to put the server into maintenance mode before shutting it down. When performing the upgrade you must keep the pool master in normal mode, with all VMs migrated off of it, and then shut it down, which will place the farm into a recovery mode. While in this mode you are supposed to perform the in-place upgrade in a rolling style. I misread that step. So instead I ran the upgrade with the pool master in maintenance mode (thus it was no longer the true pool master, as it had nominated another server to be the master). It let me perform the upgrade, and everything appeared to be working fine. The server rebooted and I was greeted by the regular XSConsole. However, I noticed two things:
1) XenCenter still saw the server as offline;
2) XSConsole showed that there were no network interfaces (NO NICS).

After researching the issue, I discovered it was caused by an improper upgrade, but never fear, there is a built-in restore option. Simply insert the upgrade CD and reboot, and it will prompt with a restore option. It was working great until about 95%, where it errored out saying:
“Installer only supports having a single kernel of each type installed. Found 2 of kernel-xen”

Apparently, if you have any prior backups on the server, plus the one made during the upgrade, the restore will fail. I found a Citrix Forum post, http://forums.citrix.com/message.jspa?messageID=1521356, which described my specific situation, and I attempted the recovery without success. Having only mild Linux experience, since I am a Microsoft guy, it took me a while to discover what I was missing from that forum post. Here are the actual steps for a Windows guy:
1) Reboot the server with the 5.6 upgrade CD
2) When prompted for advanced setup, press F2 (it will quickly auto select standard install if you aren’t watching)
3) It will prompt you for which advanced setup mode, type “shell” and press enter (no quotes)
4) Setup will continue and dump you to a command line
5) Type “vi /opt/xensource/installer/backend.py” and press enter (again, without the quotes)
6) You are now in the vi editor, which is a pain. You can google how to navigate, but for the purposes of this, type “/kernel” and press enter, and repeat that until you see the line beginning with: assert len(out) == 1, "Installer only supports having a single kernel
7) With the cursor over that line, type dd (this should delete the entire line)
8) Then move the cursor over to “return out[0]” and press “a” to enter append mode, change it to read “return out[-1]”, then press “esc” and type “ZZ” (case sensitive).
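
If vi is too painful, the same two edits can in principle be made non-interactively. The sketch below is an assumption-heavy alternative rather than what the forum post describes: patch_backend is a made-up name, it assumes awk is available in the installer shell, and it assumes the “return out[0]” line comes after the assert line. It keeps the original file next to the patched one as a .orig copy.

```shell
#!/bin/sh
# Hypothetical helper: apply the edits from steps 7 and 8 to the file given
# as $1, keeping the untouched original alongside it as "$1.orig".
patch_backend() {
    cp "$1" "$1.orig" || return 1
    awk '
        # Step 7: drop the assert line and remember that we passed it.
        /assert len\(out\) == 1, "Installer only supports having a single kernel/ { armed = 1; next }
        # Step 8: change only the first "return out[0]" that follows it.
        armed && /return out\[0\]/ { sub(/out\[0\]/, "out[-1]"); armed = 0 }
        { print }
    ' "$1.orig" > "$1"
}
```

From the installer shell this would be run as patch_backend /opt/xensource/installer/backend.py, and it is worth comparing the result against the .orig copy before continuing.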
