Tuesday, May 4, 2010

Recover OCR from Valid OCR Mirror

The restore process uses the good OCR copy (whether it is the primary OCR or the OCR mirror) to restore the missing or corrupt copy. As long as at least one copy of the OCR is available, that valid copy can be used to restore the contents of the other. The best part about this type of recovery is that it doesn't require any downtime: Oracle Clusterware and the applications can remain online during the recovery process.
For the purpose of this example, let's corrupt the primary OCR file:

[root@racnode1 ~]# dd if=/dev/zero of=/u02/oradata/racdb/OCRFile bs=4k count=100
100+0 records in
100+0 records out
409600 bytes (410 kB) copied, 0.00756842 seconds, 54.1 MB/s


Running ocrcheck picks up the now corrupted primary OCR file:
[root@racnode1 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262120
Used space (kbytes) : 4668
Available space (kbytes) : 257452
ID : 1331197
Device/File Name : /u02/oradata/racdb/OCRFile <-- Corrupt OCR
Device/File needs to be synchronized with the other device
Device/File Name : /u02/oradata/racdb/OCRFile_mirror
Device/File integrity check succeeded
Cluster registry integrity check succeeded
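
When this check is scripted rather than read by eye, the status line that follows each "Device/File Name" line is the part that matters. The following is a minimal sketch of a filter that reads ocrcheck output on standard input and prints every OCR location whose status is anything other than "integrity check succeeded". The function name ocr_unhealthy_devices is made up for this example, and the parsing assumes the output layout shown above.

```shell
# Hypothetical helper: reads ocrcheck output on stdin and prints each OCR
# location whose status line does not say "integrity check succeeded".
ocr_unhealthy_devices() {
    awk '
        /Device\/File Name/ {
            # Keep only the path that follows the colon; any trailing
            # annotation on the line is ignored.
            sub(/^.*Device\/File Name[ \t]*:[ \t]*/, "")
            dev = $1
            next
        }
        /^[ \t]*Device\/File / && dev != "" {
            # This is the status line for the device remembered above.
            if ($0 !~ /integrity check succeeded/) print dev
            dev = ""
        }
    '
}
```

It would be used as: ocrcheck | ocr_unhealthy_devices. An empty result means every listed location passed its integrity check.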

Note that after losing one OCR copy (in this case, the primary OCR file), Oracle Clusterware and the applications remain online.

While the applications and CRS remain online, perform the following steps to recover the primary OCR using the contents of the OCR mirror.
When using a clustered file system, remove the corrupt OCR file and re-initialize it:
[root@racnode1 ~]# rm /u02/oradata/racdb/OCRFile
[root@racnode1 ~]# cp /dev/null /u02/oradata/racdb/OCRFile
[root@racnode1 ~]# chown root /u02/oradata/racdb/OCRFile
[root@racnode1 ~]# chgrp oinstall /u02/oradata/racdb/OCRFile
[root@racnode1 ~]# chmod 640 /u02/oradata/racdb/OCRFile
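
The five commands above can be wrapped in a small helper so the same steps are applied consistently to any OCR target file. This is only a sketch: the function name reinit_ocr_file and the OCR_OWNER / OCR_GROUP override variables are invented for this example. The defaults match the root:oinstall ownership and 640 mode used in this article.

```shell
# Hypothetical helper: recreate an empty OCR target file with the ownership
# and mode used in this article. OCR_OWNER and OCR_GROUP are made-up
# overrides; they default to root and oinstall.
reinit_ocr_file() {
    target=$1
    owner=${OCR_OWNER:-root}
    group=${OCR_GROUP:-oinstall}

    rm -f -- "$target"          || return 1
    # Creating an empty file; same effect as: cp /dev/null "$target"
    : > "$target"               || return 1
    chown "$owner" "$target"    || return 1
    chgrp "$group" "$target"    || return 1
    chmod 640 "$target"         || return 1
}
```

As root, it would be called as: reinit_ocr_file /u02/oradata/racdb/OCRFile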

NOTE: If the target OCR is located on a raw device, verify that the permissions are correct for an OCR file (owned by root:oinstall with 0640 permissions) and that the device is shared by all nodes in the cluster. Then, from only one node in the cluster, use the dd command to zero out the device so that it contains no data.
[root@racnode1 ~]# ls -l /dev/raw/raw1
crw-r----- 1 root oinstall 162, 1 Oct 6 11:05 /dev/raw/raw1
[root@racnode2 ~]# ls -l /dev/raw/raw1
crw-r----- 1 root oinstall 162, 1 Oct 6 11:04 /dev/raw/raw1
[root@racnode1 ~]# dd if=/dev/zero of=/dev/raw/raw1


Restore the primary OCR using the contents of the OCR mirror. Note that this operation is the same process used when adding a new OCR location:
[root@racnode1 ~]# ocrconfig -replace ocr /u02/oradata/racdb/OCRFile

NOTE: If the target OCR is located on a raw device, substitute the path name above with the shared device name (for example, /dev/raw/raw1).
Verify the restore was successful by viewing the Clusterware alert log file.
[root@racnode1 ~]# tail $ORA_CRS_HOME/log/racnode1/alertracnode1.log
...
2009-10-06 17:46:51.118
[crsd(11054)]CRS-1007:The OCR/OCR mirror location was replaced by /u02/oradata/racdb/OCRFile.
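
On a busy cluster the alert log can be long, so it may be easier to search for the CRS-1007 message directly instead of tailing the file. A minimal sketch, assuming the message code shown above; the function name last_ocr_replace_event is made up for this example.

```shell
# Hypothetical helper: reads a Clusterware alert log on stdin and prints the
# most recent CRS-1007 entry. Returns non-zero if no such entry exists.
last_ocr_replace_event() {
    entry=$(grep 'CRS-1007' | tail -n 1)
    [ -n "$entry" ] && printf '%s\n' "$entry"
}
```

It would be used as: last_ocr_replace_event < $ORA_CRS_HOME/log/racnode1/alertracnode1.log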
Verify the OCR configuration by running the ocrcheck command:
[root@racnode1 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262120
Used space (kbytes) : 4668
Available space (kbytes) : 257452
ID : 1331197
Device/File Name : /u02/oradata/racdb/OCRFile
Device/File integrity check succeeded <-- Primary OCR Restored
Device/File Name : /u02/oradata/racdb/OCRFile_mirror
Device/File integrity check succeeded
Cluster registry integrity check succeeded


As the oracle user account with user equivalence enabled on all the nodes, run the cluvfy command to validate the OCR configuration:
[oracle@racnode1 ~]$ ssh racnode1 "hostname; date"
racnode1
Tue Oct 6 17:52:52 EDT 2009
[oracle@racnode1 ~]$ ssh racnode2 "hostname; date"
racnode2
Tue Oct 6 17:51:50 EDT 2009
[oracle@racnode1 ~]$ cluvfy comp ocr -n all
Verifying OCR integrity
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.
Uniqueness check for OCR device passed.
Checking the version of OCR...
OCR of correct Version "2" exists.
Checking data integrity of OCR...
Data integrity check for OCR passed.
OCR integrity check passed.
Verification of OCR integrity was successful.
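
If the cluvfy check is run from a script, the final summary line is the simplest thing to test for. The sketch below assumes the success message is worded exactly as in the output above; the function name cluvfy_ocr_passed is made up for this example.

```shell
# Hypothetical helper: reads "cluvfy comp ocr" output on stdin and succeeds
# only if the final success summary line is present.
cluvfy_ocr_passed() {
    grep -q 'Verification of OCR integrity was successful'
}
```

It would be used as: cluvfy comp ocr -n all | cluvfy_ocr_passed && echo "OCR verified"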

Recover OCR from Automatically Generated Physical Backup

This section demonstrates how to recover the Oracle Cluster Registry from a lost or corrupt OCR file. This example assumes that both the primary OCR and the OCR mirror have been lost through an accidental delete by a user and that the latest automatic OCR backup copy on the master node is accessible. At this time, the second node in the cluster (racnode2) is the master node and is currently available. We will restore the OCR using the latest OCR backup copy from racnode2, which is located at /u01/app/crs/cdata/crs/backup00.ocr.
Let's now corrupt the OCR by removing both the primary OCR and the OCR mirror:

[root@racnode1 ~]# rm /u02/oradata/racdb/OCRFile
[root@racnode1 ~]# rm /u02/oradata/racdb/OCRFile_mirror
Running ocrcheck fails to provide any useful information given that both OCR files are lost:

[root@racnode1 ~]# ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
Note that after losing both OCR files, Oracle Clusterware and the applications remain online. Before restoring the OCR, the applications and CRS will need to be shut down as described in the steps below.
Perform the following steps to recover the OCR from the latest automatically generated physical backup:
With CRS still online, identify the master node (which in this example is racnode2) and all OCR backups using the ocrconfig -showbackup command:
[root@racnode1 ~]# ocrconfig -showbackup
racnode2 2009/10/07 12:05:18 /u01/app/crs/cdata/crs
racnode2 2009/10/07 08:05:17 /u01/app/crs/cdata/crs
racnode2 2009/10/07 04:05:17 /u01/app/crs/cdata/crs
racnode2 2009/10/07 00:05:16 /u01/app/crs/cdata/crs
racnode1 2009/09/24 08:49:19 /u01/app/crs/cdata/crs
Note that ocrconfig -showbackup may result in a segmentation fault or simply not show any results if CRS is shut down.
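
To pick out the newest backup without relying on the order of the listing, the date and time columns can be compared directly. This is a sketch that assumes the four-column layout shown above (node, date, time, location); the function name latest_ocr_backup is made up for this example.

```shell
# Hypothetical helper: reads "ocrconfig -showbackup" output on stdin and
# prints the node and location of the entry with the newest timestamp.
latest_ocr_backup() {
    awk '
        NF >= 4 && $2 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ {
            # YYYY/MM/DD HH:MM:SS sorts correctly as a plain string.
            key = $2 " " $3
            if (key > best) { best = key; node = $1; loc = $4 }
        }
        END { if (best != "") print node, loc }
    '
}
```

It would be used as: ocrconfig -showbackup | latest_ocr_backup. The node it prints is the one to run the restore from.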
For documentation purposes, identify the number and location of all configured OCR files that will be recovered in this example.
[root@racnode2 ~]# cat /etc/oracle/ocr.loc
#Device/file /u02/oradata/racdb/OCRFile getting replaced by device /u02/oradata/racdb/OCRFile
ocrconfig_loc=/u02/oradata/racdb/OCRFile
ocrmirrorconfig_loc=/u02/oradata/racdb/OCRFile_mirror
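
The two location entries can also be pulled out of ocr.loc programmatically, which is handy when the same paths are needed in later steps. A minimal sketch, assuming the key=value layout shown above; the function name ocr_locations is made up for this example.

```shell
# Hypothetical helper: prints the primary and mirror OCR locations from an
# ocr.loc file (default: /etc/oracle/ocr.loc). Comment lines are skipped
# because their first field never matches either key.
ocr_locations() {
    awk -F= '$1 == "ocrconfig_loc" || $1 == "ocrmirrorconfig_loc" { print $2 }' \
        "${1:-/etc/oracle/ocr.loc}"
}
```

Called with no argument it reads /etc/oracle/ocr.loc; a different file can be passed as the first argument.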
Although all OCR files have been lost or corrupted, the Oracle Clusterware daemons as well as the clustered database remain running. In this scenario, Oracle Clusterware and all managed resources need to be shut down in order to recover the OCR. Attempting to stop CRS using crsctl stop crs will fail given it cannot write to the now lost/corrupt OCR file:
[root@racnode1 ~]# crsctl stop crs
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
With the environment in this unstable state, shut down all database instances on all nodes in the cluster and then reboot each node:
[oracle@racnode1 ~]$ sqlplus / as sysdba
SQL> shutdown immediate
[root@racnode1 ~]# reboot
------------------------------------------------
[oracle@racnode2 ~]$ sqlplus / as sysdba
SQL> shutdown immediate
[root@racnode2 ~]# reboot
When the Oracle RAC nodes come back up, note that Oracle Clusterware will fail to start as a result of the lost/corrupt OCR file:
[root@racnode1 ~]# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
[root@racnode2 ~]# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
When using a clustered file system, re-initialize both the primary OCR and the OCR mirror target locations identified earlier in the /etc/oracle/ocr.loc file:
[root@racnode1 ~]# rm -f /u02/oradata/racdb/OCRFile
[root@racnode1 ~]# cp /dev/null /u02/oradata/racdb/OCRFile
[root@racnode1 ~]# chown root /u02/oradata/racdb/OCRFile
[root@racnode1 ~]# chgrp oinstall /u02/oradata/racdb/OCRFile
[root@racnode1 ~]# chmod 640 /u02/oradata/racdb/OCRFile
[root@racnode1 ~]# rm -f /u02/oradata/racdb/OCRFile_mirror
[root@racnode1 ~]# cp /dev/null /u02/oradata/racdb/OCRFile_mirror
[root@racnode1 ~]# chown root /u02/oradata/racdb/OCRFile_mirror
[root@racnode1 ~]# chgrp oinstall /u02/oradata/racdb/OCRFile_mirror
[root@racnode1 ~]# chmod 640 /u02/oradata/racdb/OCRFile_mirror
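
Before moving on, it is worth confirming that both targets really ended up with the expected owner, group and mode. The following sketch checks the ls -ld listing of a single path; the function name check_ocr_perms and its argument order are made up for this example, with defaults matching root:oinstall and rw-r-----. It only inspects the listing, so it should work for raw devices as well as regular files.

```shell
# Hypothetical helper: succeeds only if the path has mode rw-r----- and the
# expected owner and group (defaults: root and oinstall).
check_ocr_perms() {
    file=$1
    want_owner=${2:-root}
    want_group=${3:-oinstall}

    ls -ld -- "$file" | awk -v o="$want_owner" -v g="$want_group" '
        # Characters 2-10 of the first column are the permission bits.
        { ok = (substr($1, 2, 9) == "rw-r-----" && $3 == o && $4 == g) }
        END { exit ok ? 0 : 1 }
    '
}
```

For example: check_ocr_perms /u02/oradata/racdb/OCRFile && check_ocr_perms /u02/oradata/racdb/OCRFile_mirror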

NOTE: If the target OCR is located on one or more raw devices, verify that the permissions are correct for an OCR file (owned by root:oinstall with 0640 permissions) and that each device is shared by all nodes in the cluster. Then, from only one node in the cluster, use the dd command to zero out the device(s) so that they contain no data.
[root@racnode1 ~]# ls -l /dev/raw/raw[12]
crw-r----- 1 root oinstall 162, 1 Oct 7 15:00 /dev/raw/raw1
crw-r----- 1 root oinstall 162, 2 Oct 7 15:00 /dev/raw/raw2
[root@racnode2 ~]# ls -l /dev/raw/raw[12]
crw-r----- 1 root oinstall 162, 1 Oct 7 14:59 /dev/raw/raw1
crw-r----- 1 root oinstall 162, 2 Oct 7 14:59 /dev/raw/raw2
[root@racnode1 ~]# dd if=/dev/zero of=/dev/raw/raw1 <-- OCR (primary)
[root@racnode1 ~]# dd if=/dev/zero of=/dev/raw/raw2 <-- OCR (mirror)
Before restoring the OCR, dump the contents of the physical backup you intend to recover from the master node (racnode2) to validate its availability as well as the accuracy of its contents:
[root@racnode2 ~]# ocrdump -backupfile /u01/app/crs/cdata/crs/backup00.ocr
[root@racnode2 ~]# less OCRDUMPFILE
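
As a quick sanity check before paging through the dump, you can count how many registry keys it contains. This sketch assumes that each key in OCRDUMPFILE appears on its own line in square brackets (for example [SYSTEM]); treat that as an assumption about the dump format. The function name ocrdump_key_count is made up for this example.

```shell
# Hypothetical helper: prints the number of bracketed key lines in an ocrdump
# output file (default: OCRDUMPFILE in the current directory). Because it
# relies on grep -c, it returns non-zero when no key lines are found.
ocrdump_key_count() {
    grep -c '^\[.*\]$' "${1:-OCRDUMPFILE}"
}
```

A count of zero, or a missing file, suggests the backup could not be read and should not be used for the restore.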
With CRS down, perform the restore operation from the master node (racnode2) by applying the latest automatically generated physical backup:
[root@racnode2 ~]# ocrconfig -restore /u01/app/crs/cdata/crs/backup00.ocr
Restart Oracle Clusterware on all of the nodes in the cluster by rebooting each node or by running the crsctl start crs command:
[root@racnode1 ~]# crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
[root@racnode2 ~]# crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
Verify the OCR configuration by running the ocrcheck command:
[root@racnode1 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262120
Used space (kbytes) : 4668
Available space (kbytes) : 257452
ID : 1331197
Device/File Name : /u02/oradata/racdb/OCRFile
Device/File integrity check succeeded <-- Primary OCR Restored
Device/File Name : /u02/oradata/racdb/OCRFile_mirror
Device/File integrity check succeeded <-- Mirror OCR Restored
Cluster registry integrity check succeeded
As the oracle user account with user equivalence enabled on all the nodes, run the cluvfy command to validate the OCR configuration:
[oracle@racnode1 ~]$ ssh racnode1 "hostname; date"
racnode1
Wed Oct 7 16:29:49 EDT 2009
[oracle@racnode1 ~]$ ssh racnode2 "hostname; date"
racnode2
Wed Oct 7 16:29:06 EDT 2009
[oracle@racnode1 ~]$ cluvfy comp ocr -n all
Verifying OCR integrity
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.
Uniqueness check for OCR device passed.
Checking the version of OCR...
OCR of correct Version "2" exists.
Checking data integrity of OCR...
Data integrity check for OCR passed.
OCR integrity check passed.
Verification of OCR integrity was successful.
Finally, verify the applications are running:
[root@racnode1 ~]# crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.racdb.db application ONLINE ONLINE racnode1
ora....b1.inst application ONLINE ONLINE racnode1
ora....b2.inst application ONLINE ONLINE racnode2
ora....srvc.cs application ONLINE ONLINE racnode2
ora....db1.srv application ONLINE ONLINE racnode1
ora....db2.srv application ONLINE ONLINE racnode2
ora....SM1.asm application ONLINE ONLINE racnode1
ora....E1.lsnr application ONLINE ONLINE racnode1
ora....de1.gsd application ONLINE ONLINE racnode1
ora....de1.ons application ONLINE ONLINE racnode1
ora....de1.vip application ONLINE ONLINE racnode1
ora....SM2.asm application ONLINE ONLINE racnode2
ora....E2.lsnr application ONLINE ONLINE racnode2
ora....de2.gsd application ONLINE ONLINE racnode2
ora....de2.ons application ONLINE ONLINE racnode2
ora....de2.vip application ONLINE ONLINE racnode2
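
With this many resources it is easy to miss one that did not come back. The sketch below reads crs_stat -t output on standard input and prints the name of any resource whose Target or State column is not ONLINE. The function name resources_not_online is made up for this example, and since crs_stat -t abbreviates resource names, the names printed are the abbreviated ones.

```shell
# Hypothetical helper: reads "crs_stat -t" output on stdin and prints the
# (abbreviated) name of every resource that is not ONLINE/ONLINE.
resources_not_online() {
    awk '
        # Skip the header row and the dashed separator line.
        $1 == "Name" || $1 ~ /^-+$/ { next }
        NF >= 4 && ($3 != "ONLINE" || $4 != "ONLINE") { print $1 }
    '
}
```

It would be used as: crs_stat -t | resources_not_online. No output means every listed resource is online.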
