Wednesday, March 7, 2012

Apache Active Directory Login for Nagios Access

http://telinit0.blogspot.com/2009/09/apache-active-directory-login-for.html
I initially asked for a new AD group with the Nagios users' login names as members, but it turned out to be unusable. So instead I searched at the IT department level, filtered on the login names, and required the specific users who need access to the Nagios site.

I found the required fields by trial and error. The ldapsearch utility (part of the openldap-clients package) came in very handy.

One command I used to test the filtering is this:

ldapsearch -b 'OU=Departments,OU=Users,OU=domain,DC=asia,DC=org' -D 'CN=srv_nagios,OU=Service Accounts,OU=Operations,DC=asia,DC=org' -h ldap_server -x -W sAMAccountName

(It will prompt for the srv_nagios password.)

srv_nagios is an unprivileged AD account used to bind the process to Active Directory, since the AD doesn't allow anonymous browsing. Without the account, one error I got was:

auth_ldap authenticate: user user01 authentication failed; URI /nagios/ [ldap_search_ext_s() for user failed][Operations error]

Also, while I still didn't have the correct parameters, ldapsearch mostly returned this message:

# search result
search: 2
result: 1 Operations error
text: 00000000: LdapErr: DSID-0C090627, comment: In order to perform this ope
ration a successful bind must be completed on the connection., data 0, vece


In the Nagios web config file:

/etc/httpd/conf.d/nagios.conf

the following is what worked for me (your requirements will vary, so work with your AD admin on the correct fields to use).

ScriptAlias /nagios/cgi-bin "/usr/lib64/nagios/cgi"
<Directory "/usr/lib64/nagios/cgi">
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthBasicProvider ldap
AuthzLDAPAuthoritative  off
AuthLDAPURL "ldap://ldap_server:389/OU=Departments,OU=Users,OU=domain,DC=asia,DC=org?sAMAccountName?sub?(objectClass=*)"
AuthLDAPBindDN "CN=srv_nagios,OU=Service Accounts,OU=Operations,DC=asia,DC=org"
AuthLDAPBindPassword "secretpassword"
AuthLDAPGroupAttribute  memberOf
AuthLDAPGroupAttributeIsDN off
AuthName "Nagios Access"
AuthType Basic
Require ldap-user user1 user2 user3
Require ldap-user user4 user5
</Directory>

Alias /nagios "/usr/share/nagios"

<Directory "/usr/share/nagios">
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthBasicProvider ldap
AuthzLDAPAuthoritative  off
AuthLDAPURL "ldap://ldap_server:389/OU=Departments,OU=Users,OU=domain,DC=asia,DC=org?sAMAccountName?sub?(objectClass=*)"
AuthLDAPBindDN "CN=srv_nagios,OU=Service Accounts,OU=Operations,DC=asia,DC=org"
AuthLDAPBindPassword "secretpassword"
AuthLDAPGroupAttribute  memberOf
AuthLDAPGroupAttributeIsDN off
AuthName "Nagios Access"
AuthType Basic
Require ldap-user user1 user2 user3
Require ldap-user user4 user5
</Directory>

Watch for errors in /var/log/httpd/error_log - if there's a problem, log entries are a very big help.

NOTE: this works with Apache 2.2.x. I first tried a 2.0.x version, but the mod_authnz_ldap module is not built into it.

Once it was working, I redefined the contacts (one for each Nagios user) and assigned them to groups. These groups are then used in templates, which are in turn used by the service definitions.

----
I use a setup similar to this that recently broke for no apparent reason. It turns out you can use port 3268 for Global Catalog searches; changing from 389 to 3268 got my config working again.
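As a rough illustration of the port change, here is a sketch that rewrites the port in an AuthLDAPURL value. The helper name `use_global_catalog` is hypothetical, and it only handles plain ldap:// URLs; anything else is returned unchanged.

```python
import re

# Host part of a plain ldap:// URL, with an optional ":port" after it.
LDAP_HOST = re.compile(r"^(ldap://[^:/]+)(?::\d+)?")


def use_global_catalog(auth_ldap_url, gc_port=3268):
    """Return the URL with its port replaced by the Global Catalog port."""
    return LDAP_HOST.sub(rf"\g<1>:{gc_port}", auth_ldap_url, count=1)


URL_389 = ("ldap://ldap_server:389/OU=Departments,OU=Users,OU=domain,"
           "DC=asia,DC=org?sAMAccountName?sub?(objectClass=*)")
```

The search base, attribute, scope and filter after the host are left untouched, so only the port differs between the two configs.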

===========
RADIUS authentication

ScriptAlias /nagios/cgi-bin "/usr/lib/nagios/cgi"
<Directory "/usr/lib/nagios/cgi">
#  SSLRequireSSL
   Options ExecCGI
   AllowOverride None
   Order allow,deny
   Allow from all
#  Allow from 127.0.0.1
###############
#   AuthName "Nagios Access"
#   AuthType Basic
#   AuthUserFile /etc/nagios/htpasswd.users
#   Require valid-user
###############
   AuthName "Login with SecurID"
   AuthType Basic
   AuthBasicProvider xradius
   AuthXRadiusAddServer "radius-server:1645" "radius-secret"
   AuthXRadiusTimeout 5
   AuthXRadiusRetries 1
   AuthXRadiusRejectBlank on
   Require valid-user
###############
</Directory>
Alias /nagios "/usr/share/nagios"
<Directory "/usr/share/nagios">
#  SSLRequireSSL
   Options None
   AllowOverride None
   Order allow,deny
   Allow from all
###############
#   AuthName "Nagios Access"
#   AuthType Basic
#   AuthUserFile /etc/nagios/htpasswd.users
#   Require valid-user
###############
   AuthName "Login with SecurID"
   AuthType Basic
   AuthBasicProvider xradius
   AuthXRadiusAddServer "radius-server:1645" "radius-secret"
   AuthXRadiusTimeout 5
   AuthXRadiusRetries 1
   AuthXRadiusRejectBlank on
   Require valid-user
</Directory>

Monday, March 5, 2012

Xen Server Recovery

- license: /etc/xensource/license
              /opt/xensource/gpg/pubring.gpg, trustdb.gpg

Customize XenServer 6.0
- use gdisk, mkfs to repartition and create /iso; add local ISO repository
- install tcsh-6.13-10.el4.i386.rpm
- install hp-agents-xs-8.6.iso
- install hpcciss-4.6.28-XS60E16.zip (http://support.citrix.com/article/CTX133952)

XenCenter Role Based Access Control https://support.citrix.com/article/CTX126442
Host Base Access Control http://support.citrix.com/article/CTX118504
XenServer System Recovery Guide http://support.citrix.com/servlet/KbServlet/download/17140-102-671536/XenServer%20System%20Recovery%20Guide.pdf

Where’s the XenServer 6 auto start VM feature

http://www.virtues.it/2011/10/xenserver6-vm-auto-start-feature/

http://www.danieletosatto.com/2011/05/10/tools-for-xenserver-troubleshooting/
http://blogs.citrix.com/?s=Tools+for+XenServer+Troubleshooting
XenServer HA http://www.citrix.com/content/dam/citrix/en_us/documents/products/citrixxenserverhaquickstartguide.pdf
HA under hood http://blogs.citrix.com/2008/09/17/peeking-under-the-hood-of-high-availability/

Citrix XenServer 6.0 PXE installation

XenServer System Recovery
  1. Is the Pool Master Down? (No -> 3)
    xe host-list or xe host-is-in-emergency-mode
  2. Recover Pool operations. Promote a member server to a master:
    • xe pool-emergency-transition-to-master
    • xe pool-recover-slaves
  3. Verify which XenServer(s) failed. At any surviving pool member:
    xe host-list params=uuid,name-label,host-metrics-live
    Any servers listed as host-metrics-live=false have failed.
  4. Verify which VMs failed. (If a failure has occurred in another subsystem on the server (for example, the xapi service has failed) and virtual machines are still active, starting the VMs again could in rare cases cause data corruption.)
    xe vm-list is-control-domain=false resident-on=UUID_of_failed_server
  5. Reset Power state on failed VMs
    xe vm-reset-powerstate resident-on=UUID_of_failed_server --force --multiple
    Caution! Incorrectly using the "--multiple" option could result in ALL virtual machines within the pool being reset. Be careful to use the "resident-on" parameter as well. Alternately, you can reset VMs individually.
    Repeat Step 4 to verify.
  6. Restart VMs on another XenServer
    Verify in XenCenter that the VMs that were running on the failed server are halted. VMs that have a home server assigned will not appear in XenCenter, because that home server is still down. A workaround is to change the home server by setting the affinity parameter from the CLI:
    xe vm-param-set uuid=<uuid of vm to change> affinity=<uuid of new homeserver>
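Step 3 can be scripted by parsing the output of `xe host-list params=uuid,name-label,host-metrics-live`. This is a minimal sketch, assuming the usual xe layout of one `key ( RO): value` line per parameter with a blank line between records; the function names and the sample host names are made up for illustration.

```python
def parse_xe_records(output):
    """Split xe list output into one dict per record."""
    records, current = [], {}
    for line in output.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        # Drop the access marker such as "( RO)" from the key.
        current[key.split("(")[0].strip()] = value.strip()
    if current:
        records.append(current)
    return records


def failed_hosts(output):
    """Return name-labels of hosts reporting host-metrics-live=false."""
    return [r["name-label"] for r in parse_xe_records(output)
            if r.get("host-metrics-live") == "false"]


SAMPLE_HOST_LIST = """uuid ( RO)                 : 11111111-aaaa-bbbb-cccc-000000000001
          name-label ( RW): xen1
    host-metrics-live ( RO): true

uuid ( RO)                 : 11111111-aaaa-bbbb-cccc-000000000002
          name-label ( RW): xen2
    host-metrics-live ( RO): false
"""
```

The UUIDs of the failed hosts from the same records can then be fed to the `resident-on=` commands in steps 4 and 5.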
XenServer Loses Management Parameters (http://support.citrix.com/article/CTX119240)
- xsconsole "Status Display" shows the XenServer version as <Unknown> <Unknown> and the management network parameters as <No network configured>; Configure Management Interface -> Select NIC shows <No interfaces present>.
--------------

Resetting XenServer Networking Configuration in an Emergency

Incorrect networking settings can cause loss of network connectivity, and a XenServer host may become inaccessible via XenCenter or remote SSH. Emergency Network Reset provides a simple mechanism to recover and reset a host's networking.

This feature is available from the Command Line Interface (CLI) using the xe-reset-networking command and within the Network and Management Interface section of xsconsole.

Incorrect settings that could cause a loss of network connectivity include renaming network interfaces, creating bonds or VLANs, or mistakes when changing the management interface (for example, entering the wrong IP address). In addition, you may want to run this utility if a rolling pool upgrade, manual upgrade, hotfix installation or driver installation causes a lack of network connectivity, or if a Pool Master or host in a resource pool is unable to contact other hosts.

This utility should only be used in an emergency, as it will remove the configuration for all PIFs, Bonds, VLANs and tunnels associated with the host. Guest Networks and VIFs are preserved. As part of this utility, VMs will be shut down forcefully; where possible, VMs should be cleanly shut down before running this command. Before applying a reset, users can make changes to the Primary Management Interface and specify which IP configuration, DHCP or Static, should be used.

If the Pool Master requires a network reset, it must be carried out before a network reset of any other pool members. It should then be followed by a network reset on all remaining hosts in the pool to ensure that the pool's networking configuration is homogeneous. This is a particularly important factor for XenMotion.

Note:

If the Pool Master's IP address (the Primary Management Interface) changes, as a result of a network reset or xe host.management_reconfigure, you must also apply the network reset command to other hosts in the pool, so that they can reconnect to the Pool Master on its new IP address. In this situation, the IP address of the Pool Master must be specified.

Network reset is NOT supported if High Availability (HA) is enabled. To reset network configuration in this scenario, you must first manually disable HA, and then run the network reset command.

Verifying the Network Reset

After specifying the configuration mode to be used after the network reset, xsconsole and the CLI will display the settings which will be applied after host reboot. This offers a final chance to make any modifications before applying the emergency network reset command. After reboot, the new network configuration can be verified in XenCenter and xsconsole. In XenCenter, with the host selected, click the Networking tab to display the new network configuration. In xsconsole, this information is displayed in the Network and Management Interface section.

Note:

Emergency Network Reset should also be applied on other pool members to replicate bonds, VLANs or tunnels from the Pool Master's new configuration.

Using the CLI for Network Reset

The following table shows the available optional parameters which can be used with the xe-reset-networking command.

Warning:

Users are responsible for ensuring the validity of parameters for the xe-reset-networking command, so check the parameters carefully. If invalid parameters are specified, network connectivity and configuration will be lost. In this situation, Citrix advises customers to re-run the command xe-reset-networking without using any parameters.

Resetting the networking configuration of a whole pool must begin on the Pool Master, and should then be followed by a network reset on all remaining hosts in the pool.

Parameter (Required/Optional): Description

-m, --master (Optional)
  IP address of the Pool Master's primary management interface. Defaults to the last known Pool Master's IP address.
--device (Optional)
  Device name of the primary management interface. Defaults to the device name specified during installation.
--mode=static (Optional)
  Enables the following four networking parameters for static IP configuration for the primary management interface. If not specified, networking will be configured using DHCP.
--ip (Required if mode=static)
  IP address for the host's primary management interface. Only valid if mode=static.
--netmask (Required if mode=static)
  Netmask for the primary management interface. Only valid if mode=static.
--gateway (Optional)
  Gateway for the primary management interface. Only valid if mode=static.
--dns (Optional)
  DNS Server for the primary management interface. Only valid if mode=static.


Pool Master Command Line Examples

Examples of commands that could be applied on a Pool Master:

To reset networking for DHCP configuration:

xe-reset-networking

To reset networking for Static IP configuration:

xe-reset-networking --mode=static --ip=<ip-address> \
  --netmask=<netmask> --gateway=<gateway> \
  --dns=<dns>

To reset networking for DHCP configuration if another interface became the primary management interface after initial setup:

xe-reset-networking --device=<device-name>

To reset networking for Static IP configuration if another interface became the primary management interface after initial setup:

xe-reset-networking --device=<device-name> --mode=static \
--ip=<ip-address> --netmask=<netmask> \
--gateway=<gateway> --dns=<dns>

Pool Member Command Line Examples

All previous examples also apply to pool members. Additionally the Pool Master's IP address can be specified (which will be necessary if it has changed.)

To reset networking for DHCP configuration:

xe-reset-networking

To reset networking for DHCP if the Pool Master's IP address was modified:

xe-reset-networking --master=<master-ip-address>

To reset networking for Static IP configuration, assuming the Pool Master's IP address didn't change:

xe-reset-networking --mode=static --ip=<ip-address> --netmask=<netmask> \
  --gateway=<gateway> --dns=<dns>

To reset networking for DHCP configuration if the primary management interface and the Pool Master's IP address were modified after initial setup:

xe-reset-networking --device=<device-name> --master=<master-ip-address>
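Since a mistyped option here can cost you the management network, it may be worth generating the command line rather than typing it. Below is a sketch under the rules in the parameter table above (static mode needs both --ip and --netmask; --gateway and --dns are optional). The function `reset_networking_args` is a hypothetical helper of mine, not something shipped with XenServer, and it only builds the argument list without running anything.

```python
def reset_networking_args(device=None, master=None, ip=None,
                          netmask=None, gateway=None, dns=None):
    """Build an xe-reset-networking argument list; DHCP unless static values are given."""
    args = ["xe-reset-networking"]
    if device:
        args.append(f"--device={device}")
    if master:
        args.append(f"--master={master}")
    if ip or netmask or gateway or dns:
        # Any static value switches to static mode, which needs ip and netmask.
        if not (ip and netmask):
            raise ValueError("static mode requires both ip and netmask")
        args += ["--mode=static", f"--ip={ip}", f"--netmask={netmask}"]
        if gateway:
            args.append(f"--gateway={gateway}")
        if dns:
            args.append(f"--dns={dns}")
    return args
```

For example, `reset_networking_args(device="eth1", master="10.0.0.1")` gives the pool-member DHCP case with a changed master address (the device name and IP are placeholders).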

--------------


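In XenServer, a hung VM is not all that rare, and a VM that stays hung for a long time is frustrating and blocks whatever you wanted to do next.
Normally, to shut down or restart a VM, we recommend this order of operations:
  1. Log in to the VM and use the operating system's own shutdown or restart function.
  2. Choose Shutdown or Restart from the XenCenter menu. This menu works by having XenServer Tools issue the system's own commands, so if XenServer Tools is misbehaving the operation can leave the VM hung. This is probably also the main cause of hung VMs (the VM icon in XenCenter stays yellow).
  3. Try Force Shutdown and Force Restart from the XenCenter menu.
If the VM is still hung for a long time after all of these, there are several ways to shut it down, or rather to force it off and reset its state. Each one is more harmful than the last, so try them in order:
  1. Try resetting the VM's power state
    xe vm-reset-powerstate force=true vm=<vm name>
  2. Try restarting the toolstack
    xe-toolstack-restart
  3. Try destroying the domain
    # First get the VM's UUID
    xe vm-list name-label=<vm name> params=uuid
    # Get the VM's domain ID
    list_domains | grep <VM-UUID>
    # Try to reset the hung VM
    /opt/xensource/debug/xenops destroy_domain -domid <vm domain id>
  4. If that still does not work, force the VM into a crashed state
    # First get the VM's UUID
    xe vm-list name-label=<vm name> params=uuid
    # Get the VM's domain ID
    list_domains | grep <VM-UUID>
    # Manually trigger the VM's crash mechanism
    /usr/lib/xen/bin/crash_guest <domain ID>
  5. If even the crash mechanism does not work, the only option left is to forcibly power off the XenServer host.
Note: after crashing a VM, the VM will be sitting at a blue screen. At that point you can try the normal shutdown or force shutdown commands again to power it off.
BTW: in some cases a shutdown or similar operation is simply delayed for some reason, and cancelling it fails too. Waiting a little longer may be all it takes.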
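The "least harmful first, stop as soon as it works" idea in the list above can be sketched as a small loop. This is only an illustration: `hung_vm_steps` and `escalate` are names I made up, the commands are the ones from the list, and the `run` and `is_halted` callables are left to the caller (for example a subprocess call and a check of the VM's power-state).

```python
def hung_vm_steps(vm_name, domid):
    """Ordered (label, argv) pairs, least harmful first."""
    return [
        ("reset power state",
         ["xe", "vm-reset-powerstate", "force=true", f"vm={vm_name}"]),
        ("restart toolstack", ["xe-toolstack-restart"]),
        ("destroy domain",
         ["/opt/xensource/debug/xenops", "destroy_domain", "-domid", str(domid)]),
        ("crash guest", ["/usr/lib/xen/bin/crash_guest", str(domid)]),
    ]


def escalate(steps, run, is_halted):
    """Run steps in order until is_halted() is true; return the labels attempted."""
    attempted = []
    for label, argv in steps:
        if is_halted():
            break
        run(argv)
        attempted.append(label)
    return attempted
```

Checking the state before every step keeps the more drastic commands from running once an earlier one has already done the job.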
---
Auto Restart VM for XenServer 6.0
http://forums.citrix.com/thread.jspa?threadID=300865
http://burm.net/2012/01/28/xenserver-tips-and-tricks-auto-start-your-vm/

When using the free edition of XenServer 6.0, you’ll want to do a few things, such as enabling Auto Start / Auto Boot / Auto Power On.  For some reason this feature was removed from the “Free” version in 6.0, so let’s go ahead and set it up.
First you’re going to want to get the UUIDs of the VMs you wish to enable auto start on, as well as the UUID of the pool these VMs reside in.
To get the list of pools on your XenServer, type:
xe pool-list
Copy the UUID of the pool; in my case there is just one pool. Then issue the following command, replacing UUID with your pool’s UUID:
xe pool-param-set uuid=UUID other-config:auto_poweron=true
Then, at the command prompt of your XenServer, type:
xe vm-list
You should get a full list of the VMs on the server, along with their names and UUIDs. Copy the UUID of the VM you wish to auto start, then issue the VM command below, again replacing UUID with that VM’s UUID.
To Auto Start your Pool (replace UUID with the UUID of your Pool):
xe pool-param-set uuid=UUID other-config:auto_poweron=true
To Auto Start your VM (replace UUID with the UUID of your VM):
xe vm-param-set uuid=UUID other-config:auto_poweron=true
And that’s it. The next time you need to power cycle your main server, the VMs should power up automatically.
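With more than a couple of VMs, the same two commands can be generated in one go. A small sketch, assuming you have already collected the UUIDs from `xe pool-list` and `xe vm-list`; the helper names are hypothetical and the UUIDs in any call are placeholders.

```python
import shlex


def autostart_commands(pool_uuid, vm_uuids):
    """Return the xe argument lists that enable auto_poweron on a pool and its VMs."""
    flag = "other-config:auto_poweron=true"
    commands = [["xe", "pool-param-set", f"uuid={pool_uuid}", flag]]
    for vm_uuid in vm_uuids:
        commands.append(["xe", "vm-param-set", f"uuid={vm_uuid}", flag])
    return commands


def as_shell(argv):
    """Render one argument list as a shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

Printing `as_shell(cmd)` for each entry gives lines that can be reviewed and then pasted at the XenServer prompt.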

---
Force remove dead pool member
Prolog:
We had a XenServer go down and it required a rebuild.  The problem was that we could not reuse the same name until the old server was removed from XenCenter.  Running xe host-forget uuid=<Host UUID> did not work because the pool master thought a VM was still running on the missing server.  On top of that, using some other, more drastic commands, we had already removed the host UUID from the pool master's host list, so we could not use the typical xe commands to shut down the VM and remove the host.
Get to work:
Here are the steps I used to remove the host (these commands were culled from different sources, so hopefully putting them in one place will be a benefit).  For the purposes of this example the dead server will be known as FISHHEAD and the pool master IP will be 192.168.1.1.
I first had to get the UUID of FISHHEAD, which was no longer in the host list, so I ran this command from a server with XenCenter installed:
xe -s 192.168.1.1 -u root -pw <MYROOTPW> pool-sync-database
This generated the following error:
You attempted an operation which involves a host which could not be contacted.
host: 5491fe8d-70ae-4a82-aae1-ab2719f1469e (FISHHEAD)
Now with the UUID of FISHHEAD I ran this from the pool-master’s console.
xe vm-list resident-on=5491fe8d-70ae-4a82-aae1-ab2719f1469e
This listed FISHHEADGUEST (UUID=1d75984e-1b9c-0ea0-0a22-8db2175ca70f), the VM that was preventing the removal of FISHHEAD.
I used this command to power off FISHHEADGUEST:
xe vm-reset-powerstate uuid=1d75984e-1b9c-0ea0-0a22-8db2175ca70f force=true
Finally, I could use host-forget to remove FISHHEAD:
xe host-forget uuid=5491fe8d-70ae-4a82-aae1-ab2719f1469e
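The trick in this procedure is reading the dead host's UUID out of the pool-sync-database error. As a sketch, assuming the error text has the same "host: <uuid> (<name>)" line as shown above, a regular expression can pull both values out; `unreachable_host` is a name I chose for illustration.

```python
import re

# "host: <36-character uuid> (<name-label>)" as printed in the error above.
HOST_LINE = re.compile(r"^host:\s*([0-9a-f-]{36})\s*\((.+)\)\s*$", re.MULTILINE)


def unreachable_host(error_text):
    """Return (uuid, name) of the host named in the error, or None."""
    match = HOST_LINE.search(error_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


SAMPLE_ERROR = """You attempted an operation which involves a host which could not be contacted.
host: 5491fe8d-70ae-4a82-aae1-ab2719f1469e (FISHHEAD)
"""
```

The UUID returned here is the value to pass to `resident-on=` and later to `host-forget`.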