Thursday, May 10, 2012

Catalog Recovery Procedure

DOCUMENTATION: Catalog recovery procedure: an example of disaster recovery.


Details:

Sample Catalog recovery procedure
This is only an example of the procedures used. Not every step is required, and some steps may not apply to every situation.

These directions assume that NetBackup has already been installed on the DR (disaster recovery) server.

Note: these instructions describe a full recovery of the master for redeployment into production.

1. Load media into DR site robot.
For safety, only load the catalog tape initially.

Alternatively, load all media as write protected.

2. Validate the bp.conf and vm.conf (if applicable) configuration files.
Additional things to check:

  • Check /usr/openv/volmgr/vm.conf for a MEDIA_ID_BARCODE_CHARS entry, if it is needed.
  • Check /usr/openv/netbackup/db/config for the touch files NUMBER_DATA_BUFFERS and SIZE_DATA_BUFFERS.
  • Check /usr/openv/netbackup for the touch file NET_BUFFER_SZ.
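
The checks above can be scripted. The following is a minimal sketch, not part of the original procedure: the function name check_dr_config is hypothetical, and the install root is passed as an argument (default /usr/openv) so the function can be pointed at a copy of the tree.

```shell
#!/bin/sh
# Sketch: report which of the step 2 settings are present.
# $1 is the install root; it defaults to /usr/openv.
check_dr_config() {
    root="${1:-/usr/openv}"

    # vm.conf entry; only needed if the site relies on it.
    if grep -q '^MEDIA_ID_BARCODE_CHARS' "$root/volmgr/vm.conf" 2>/dev/null; then
        echo "present: MEDIA_ID_BARCODE_CHARS in vm.conf"
    else
        echo "absent:  MEDIA_ID_BARCODE_CHARS in vm.conf"
    fi

    # Buffer tuning files; print the stored value when a file exists.
    for f in netbackup/db/config/NUMBER_DATA_BUFFERS \
             netbackup/db/config/SIZE_DATA_BUFFERS \
             netbackup/NET_BUFFER_SZ; do
        if [ -f "$root/$f" ]; then
            echo "present: $f = $(cat "$root/$f")"
        else
            echo "absent:  $f"
        fi
    done
}
```

Run check_dr_config with no argument on the DR master and compare the report with the values recorded at the production site.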
3. In the GUI, run the device configuration wizard.
Uncheck all servers except the master server.
Run the wizard to configure the devices.
On the first result dialog, check for any limitations.

To change from the default drive densities, if desired:
In the Drag and Drop Configuration dialog, select the drive and click on the Properties button. Verify the Drive Density.
Repeat this for each drive.

In the Configure Storage Unit dialog, select the storage unit and click Properties to verify the storage unit settings.

4. Run an inventory of the robot.
Preview the inventory and verify the media type is correct.

Make note of the barcodes and any changes from the production site, as different robots can return different barcodes for the same tape. For instance, LTO tapes can have "L#" on the end of the barcode. This can be disabled via the robot console option for short labels.
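
To compare barcodes between the two sites, it can help to normalize them first. This is a rough sketch under one assumption that is not stated in the original: a long label is eight characters, the last two being "L" plus a digit. The function names short_label and same_tape are hypothetical.

```shell
#!/bin/sh
# Strip the trailing "L#" from an 8-character barcode; leave anything
# else (for example a 6-character short label) unchanged.
short_label() {
    case "$1" in
        ??????L[0-9]) printf '%s\n' "${1%??}" ;;
        *)            printf '%s\n' "$1" ;;
    esac
}

# Succeed when two barcodes refer to the same tape once the
# optional "L#" suffix is ignored.
same_tape() {
    [ "$(short_label "$1")" = "$(short_label "$2")" ]
}
```

For example, same_tape ABC123L4 ABC123 succeeds, while same_tape ABC123L4 ABC124 fails.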

Update the volume configuration.

5. Add the following line to bp.conf:
RESOURCE_MONITOR_INTERVAL = 3600

This will change media server polling from 10 minutes to 1 hour.
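
If the procedure is rehearsed more than once, the entry should not be appended twice. A minimal sketch of an idempotent append follows; the helper name add_conf_entry is hypothetical and the file path is passed in by the caller.

```shell
#!/bin/sh
# Append "KEY = VALUE" to a config file unless KEY is already set there.
# Usage: add_conf_entry FILE KEY VALUE
add_conf_entry() {
    file="$1"; key="$2"; value="$3"
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$file" 2>/dev/null; then
        echo "${key} already set in ${file}; leaving it unchanged" >&2
        return 0
    fi
    printf '%s = %s\n' "$key" "$value" >> "$file"
}
```

Usage for this step would be: add_conf_entry /usr/openv/netbackup/bp.conf RESOURCE_MONITOR_INTERVAL 3600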

6. Make copies of the DR environment bp.conf and vm.conf files.
# cd /usr/openv/netbackup
# cp bp.conf bp.conf.dr
# cd ../volmgr
# cp vm.conf vm.conf.dr

7. Recover the entire catalog.
This is performed from the GUI on the master server. Always log in to the Java GUI using the short hostname, as the fully qualified domain name may not match the production site.

Note: Optionally, the catalog recovery can be performed from the command line:
# /usr/openv/netbackup/bin/admincmd/bprecover -wizard

8. Manually deactivate all backup policies.
From the GUI, select all policies, right click and select Deactivate. This may take a while.

Be sure that all policies are deactivated before proceeding.
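
With many policies, a command-line route may be quicker than the GUI. The sketch below is an assumption layered on top of the original procedure: it assumes that bppllist prints one policy name per line and that bpplinfo accepts -modify -inactive, so confirm both against the command reference for your release. The function only prints the commands; nothing is executed until the generated file is reviewed and run.

```shell
#!/bin/sh
# Read policy names (one per line) on stdin and print, without running,
# one deactivation command per policy.
# $1 optionally overrides the admincmd directory.
deactivate_policy_cmds() {
    admincmd="${1:-/usr/openv/netbackup/bin/admincmd}"
    while IFS= read -r policy; do
        [ -n "$policy" ] || continue
        printf '%s/bpplinfo %s -modify -inactive\n' "$admincmd" "$policy"
    done
}
```

A possible use: pipe the output of bppllist into deactivate_policy_cmds, redirect the result to /tmp/deactivate_policies, inspect it, then run it with sh.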

9. Shut down NetBackup.
# /usr/openv/netbackup/bin/bp.kill_all

Verify with bpps -x that only /opt/VRTSpbx/bin/pbx_exchange is running.
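
Scanning the bpps output by eye is error prone. As a sketch, the filter below prints any process line other than pbx_exchange; it assumes the listing format shown in step 17, where the second column of a process line is the numeric PID. The name unexpected_procs is hypothetical.

```shell
#!/bin/sh
# Read "bpps -x" output on stdin and print every process line that is
# not pbx_exchange. Section headings and separator rows are skipped
# because their second field is not a number.
unexpected_procs() {
    awk '$2 ~ /^[0-9]+$/ && $0 !~ /pbx_exchange/'
}
```

Pipe bpps -x into unexpected_procs; empty output means the shutdown is complete.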

10. Prep the bp.conf and vm.conf configuration files.
Copy bp.conf and vm.conf to bp.conf.prod and vm.conf.prod:
# cd /usr/openv/netbackup
# cp bp.conf bp.conf.prod
# cd ../volmgr
# cp vm.conf vm.conf.prod

Then, copy back the bp.conf.dr and vm.conf.dr to bp.conf and vm.conf:
# cd /usr/openv/netbackup
# cp bp.conf.dr bp.conf
# cd ../volmgr
# cp vm.conf.dr vm.conf
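
The save-and-restore pair above can be wrapped in a helper that refuses to run when the .dr copy from step 6 is missing. This is only a sketch; the name restore_dr_conf is hypothetical.

```shell
#!/bin/sh
# Save the recovered (production) copy of a config file as NAME.prod,
# then put the DR copy saved earlier (NAME.dr) back in place.
# Usage: restore_dr_conf DIR NAME
restore_dr_conf() {
    dir="$1"; name="$2"
    if [ ! -f "$dir/$name.dr" ]; then
        echo "missing $dir/$name.dr; not touching $name" >&2
        return 1
    fi
    cp "$dir/$name" "$dir/$name.prod" && cp "$dir/$name.dr" "$dir/$name"
}
```

For this step it would be called once as restore_dr_conf /usr/openv/netbackup bp.conf and once as restore_dr_conf /usr/openv/volmgr vm.conf.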

Verify that the hostnames of any remote Windows console servers are included in the bp.conf with a SERVER entry.

11. Make sure bp.conf and vm.conf are configured correctly to reflect the DR environment.
Append a FORCE_RESTORE_MEDIA_SERVER entry to bp.conf for each media server that was used for backups in production but is not present at the DR site. The syntax of these entries is as follows:

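FORCE_RESTORE_MEDIA_SERVER = <from_host> <to_host>

Here <from_host> is the production media server that wrote the backups and <to_host> is the DR server that will perform the restores.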
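
When several production media servers are missing at DR, the entries can be generated rather than typed. The sketch below assumes the two-host form of the entry (original media server first, restoring server second); verify that form against the bp.conf documentation for your release. The function name and all host names are hypothetical.

```shell
#!/bin/sh
# Print one FORCE_RESTORE_MEDIA_SERVER line per production media server.
# Assumes the form "FORCE_RESTORE_MEDIA_SERVER = from_host to_host".
# Usage: force_restore_entries DR_SERVER PROD_SERVER...
force_restore_entries() {
    dr_server="$1"; shift
    for prod_server in "$@"; do
        printf 'FORCE_RESTORE_MEDIA_SERVER = %s %s\n' "$prod_server" "$dr_server"
    done
}
```

For example, force_restore_entries drmaster prodmedia1 prodmedia2 prints two lines that can be reviewed and then appended to /usr/openv/netbackup/bp.conf.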

12. Perform a partial startup of nbemm.
This will allow modification of the nbemm database without the job manager running and kicking off jobs.

# /usr/openv/netbackup/bin/nbdbms_start_stop start
# /usr/openv/netbackup/bin/nbemm

Run bpps -x to verify that nbemm and NB_dbsrv are running.

13. Deactivate all media servers not participating in the DR.
# /usr/openv/netbackup/bin/admincmd/nbemmcmd -updatehost -machinename <media_server> -machinestateop set_admin_pause -machinetype media -masterserver <master_server>

Be sure to execute this command against every unavailable media server.
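
To avoid missing a server, the commands can be generated from a list. This sketch only prints the nbemmcmd invocations, using the flags from the command above; the function name and the host names in the example are hypothetical.

```shell
#!/bin/sh
# Print one nbemmcmd pause command per unavailable media server.
# Usage: pause_media_server_cmds MASTER MEDIA_SERVER...
pause_media_server_cmds() {
    master="$1"; shift
    for media in "$@"; do
        printf '%s -updatehost -machinename %s -machinestateop set_admin_pause -machinetype media -masterserver %s\n' \
            /usr/openv/netbackup/bin/admincmd/nbemmcmd "$media" "$master"
    done
}
```

For example, redirect pause_media_server_cmds drmaster prodmedia1 prodmedia2 to a file, review the file, then run it with sh.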

14. Start nbevtmgr and bpdbm.
# /usr/openv/netbackup/bin/nbevtmgr
# /usr/openv/netbackup/bin/initbpdbm

Again, run bpps -x to verify that bpdbm and nbevtmgr are running.
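
The bpps checks in steps 12 and 14 can be turned into a pass/fail test. The following is a sketch, assuming each daemon appears in the listing as the last path component of its command, as in the sample output in step 17. The name procs_running is hypothetical.

```shell
#!/bin/sh
# Read "bpps -x" output on stdin; succeed only if every daemon named
# on the command line appears as the final component of a command path.
# Usage: bpps -x | procs_running NAME...
procs_running() {
    out=$(cat)
    missing=0
    for name in "$@"; do
        if ! printf '%s\n' "$out" | grep -Eq "/${name}( |\$)"; then
            echo "not running: ${name}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

After step 12, pipe bpps -x into procs_running nbemm NB_dbsrv; after step 14, use procs_running bpdbm nbevtmgr.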

15. Delete all storage units.
This step is optional, but will give a cleaner experience. Either use the GUI or the command line.

From the command line:
# /usr/openv/netbackup/bin/admincmd/bpstulist -go | cut -f 1 -d ' ' > /tmp/stu_groups
# /usr/openv/netbackup/bin/admincmd/bpstulist | cut -f 1 -d ' ' > /tmp/stu_list
# for i in `cat /tmp/stu_groups` ; do echo "/usr/openv/netbackup/bin/admincmd/bpstudel -group $i" ; done > /tmp/delete_stu_groups
# for i in `cat /tmp/stu_list` ; do echo "/usr/openv/netbackup/bin/admincmd/bpstudel -label $i" ; done > /tmp/delete_stus
# sh /tmp/delete_stu_groups
# sh /tmp/delete_stus

Note: Delete the storage unit groups before deleting the storage units!
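
The same two-pass deletion can be sketched as a single filter that turns a bpstulist listing into bpstudel commands. It assumes, as the commands above do, that the name is the first space-separated field of each line. The function name stu_delete_cmds is hypothetical, and nothing is deleted until the printed commands are run.

```shell
#!/bin/sh
# Read a storage unit (or storage unit group) listing on stdin and print
# one bpstudel command per entry. $1 is "-group" or "-label".
stu_delete_cmds() {
    flag="$1"
    cut -f 1 -d ' ' | while IFS= read -r name; do
        [ -n "$name" ] || continue
        printf '/usr/openv/netbackup/bin/admincmd/bpstudel %s %s\n' "$flag" "$name"
    done
}
```

Pipe bpstulist -go into stu_delete_cmds -group first and run the result, then pipe bpstulist into stu_delete_cmds -label, so that groups are removed before the units they contain.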

16. Delete all tape devices from the command line.
# /usr/openv/netbackup/bin/admincmd/nbemmcmd -deletealldevices -allrecords

Verify no devices are returned:
# /usr/openv/volmgr/bin/tpconfig -emm_dev_list -noverbose

17. Stop and restart NetBackup.
# /usr/openv/netbackup/bin/bp.kill_all

Use bpps -x to verify that only /opt/VRTSpbx/bin/pbx_exchange is running.

# /usr/openv/netbackup/bin/bp.start_all
...
# bpps -x
NB Processes
------------
root 13809 13808 0 15:05:07 ? 0:00 /usr/openv/netbackup/bin/nbproxy dblib nbjm
root 13775 1 0 15:05:03 ? 0:00 /usr/openv/netbackup/bin/bpcompatd
root 13786 1 1 15:05:04 ? 0:00 /usr/openv/netbackup/bin/bpdbm
root 13757 1 0 15:05:01 ? 0:00 /usr/openv/netbackup/bin/nbrb
root 13747 1 0 15:04:59 ? 0:00 /usr/openv/netbackup/bin/nbevtmgr
root 13795 13786 0 15:05:05 ? 0:00 /usr/openv/netbackup/bin/bpjobd
root 13856 1 0 15:05:12 ? 0:00 /usr/openv/netbackup/bin/nbsvcmon
root 13813 1 1 15:05:08 ? 0:01 /usr/openv/netbackup/bin/nbstserv
root 13811 13810 1 15:05:07 ? 0:01 /usr/openv/netbackup/bin/nbproxy dblib nbpem
root 13818 1 1 15:05:09 ? 0:01 /usr/openv/netbackup/bin/nbrmms
root 13752 1 2 15:05:00 ? 0:02 /usr/openv/netbackup/bin/nbemm
root 13808 13797 0 15:05:07 ? 0:00 sh -c "/usr/openv/netbackup/bin/nbproxy" dblib nbjm
root 13770 1 1 15:05:02 ? 0:01 /usr/openv/netbackup/bin/bprd
root 13844 1 0 15:05:11 ? 0:00 /usr/openv/netbackup/bin/nbsl
root 13797 1 0 15:05:05 ? 0:00 /usr/openv/netbackup/bin/nbjm
root 13810 13804 0 15:05:07 ? 0:00 sh -c "/usr/openv/netbackup/bin/nbproxy" dblib nbpem
root 13804 1 0 15:05:06 ? 0:00 /usr/openv/netbackup/bin/nbpem
root 13742 1 0 15:04:57 ? 0:02 /usr/openv/db/bin/NB_dbsrv


MM Processes
------------
root 13783 1 1 15:05:04 ? 0:01 vmd -v


Shared Symantec Processes
-------------------------
root 142 1 1 16:46:54 ? 0:52 /opt/VRTSpbx/bin/pbx_exchange

18. In the GUI, run the device configuration wizard to configure shared drives.
Uncheck all servers except the robot control host.
Run the wizard to configure the devices.
On the first result dialog, check for any limitations.

To change from the default drive densities, if desired:
In the Drag and Drop Configuration dialog, select the drive and click on the Properties button. Verify the Drive Density.
Repeat for each drive.

In the Configure Storage Unit dialog, select the storage unit and click Properties to verify the storage unit settings.

If needed, repeat the process for any additional servers.

Using the Device Monitor, make sure that no drives have the RESTART bit set. Restart ltid on the servers if needed.

19. Run the robot inventory.
Before running the inventory, use the GUI to verify all the recovery media are set to non-robotic.
If they are not, select all robotic media, right click and select Move. Make sure the "Volume is in a robotic library" option is unchecked.

Note: The volume group may be "---"; this is okay.

Click OK.

Make note of the barcodes and any changes from the production site (the presence of the "L#" tag). The hardware should be able to toggle the "L#" tag (long vs. short labels). If that is not possible, the barcode of the media can be changed with the following command:
# /usr/openv/volmgr/bin/vmchange -barcode <barcode> -m <media_id>

20. Verify that restores work.

More information on DR procedures can be found in Chapter 7 of the NetBackup Troubleshooting Guide (linked below).
