Thursday, May 10, 2012


Task List:

Checking CRS Status
Viewing Name Of the Cluster
Viewing Nodes Configuration
Checking Votedisk Information
Checking OCR Disk information
Timeout Settings in Cluster
ADD/Remove OCR files
ADD/Remove Votedisk
Backing Up OCR
Backing Up Votedisk
Restoring OCR Devices
Restoring Voting Disk Devices
Changing Public IPs as well as Virtual IPs

Checking CRS Status:

The below two commands are generally used to check the status of CRS. The first command lists the status of CRS
on the local node where as the other command shows the CRS status across all the nodes in Cluster.

crsctl check crs <<-- for the local node
crsctl check cluster <<-- for remote nodes in the cluster

[root@node1-pub ~]# crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy
[root@node1-pub ~]#

Checking Viability of CSS across nodes:

crsctl check cluster

For this command to run, CSS needs to be running on the local node. The "ONLINE" status for remote node says that CSS is running on that node.
When CSS is down on the remote node, the status of "OFFLINE" is displayed for that node.

[root@node1-pub ~]# crsctl check cluster
node1-pub    ONLINE
node2-pub    ONLINE
Viewing Cluster name:

I use below command to get the name of Cluster. You can also dump the ocr and view the name from the dump file.

ocrdump -stdout -keyname SYSTEM | grep -A 1 clustername | grep ORATEXT | awk '{print $3}'

[root@node1-pub ~]# ocrdump -stdout -keyname SYSTEM | grep -A 1 clustername | grep ORATEXT | awk '{print $3}'
test-crs
[root@node1-pub ~]#

OR

ocrconfig -export /tmp/ocr_exp.dat -s online
for i in `strings /tmp/ocr_exp.dat | grep -A 1 clustername` ; do if [ $i != 'SYSTEM.css.clustername' ]; then echo $i; fi; done


[root@node1-pub ~]# ocrconfig -export /tmp/ocr_exp.dat -s online
[root@node1-pub ~]# for i in `strings /tmp/ocr_exp.dat | grep -A 1 clustername` ; do if [ $i != 'SYSTEM.css.clustername' ]; then echo $i; fi; done
test-crs
[root@node1-pub ~]#

OR

Oracle creates a directory with the same name as Cluster under the $ORA_CRS_HOME/cdata. you can get the cluster name from this directory as well.

[root@node1-pub ~]# ls /u01/app/crs/cdata
localhost  test-crs

Viewing No. Of Nodes configured in Cluster:

The below command can be used to find out the number of nodes registered into the cluster.
It also displays the node's Public name, Private name and Virtual name along with their numbers.

olsnodes -n -p -i

[root@node1-pub ~]# olsnodes -n -p -i
node1-pub       1       node1-prv       node1-vip
node2-pub       2       node2-prv       node2-vip

Viewing Votedisk Information:

The below command is used to view the no. of Votedisks configured in the Cluster.

crsctl query css votedisk

[root@node1-pub ~]# crsctl query css votedisk
 0.     0    /u02/ocfs2/vote/VDFile_0
 1.     0    /u02/ocfs2/vote/VDFile_1
 2.     0    /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).
[root@node1-pub ~]#

[root@node1-pub ~]# crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy
[root@node1-pub ~]#

Viewing OCR Disk Information:

The below command is used to view the no. of OCR files configured in the Cluster. It also displays the version of OCR
as well as storage space information. You can only have 2 OCR files at max.

ocrcheck

[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3848
         Available space (kbytes) :     258272
         ID                       :  744414276
         Device/File Name         : /u02/ocfs2/ocr/OCRfile_0
                                    Device/File integrity check succeeded
         Device/File Name         : /u02/ocfs2/ocr/OCRfile_1
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

Various Timeout Settings in Cluster:

Disktimeout:
    Disk Latencies in seconds from node-to-Votedisk. Default Value is 200. (Disk IO)
Misscount:
    Network Latencies in second from node-to-node (Interconnect). Default Value is 60 Sec (Linux) and 30 Sec in Unix platform. (Network IO)
    Misscount < Disktimeout

NOTE: Do not change them without contacting Oracle Support. This may cause logical corruption to the Data.

IF
  (Disk IO Time > Disktimeout) OR (Network IO time > Misscount)
THEN
   REBOOT NODE
ELSE
   DO NOT REBOOT
END IF;

crsctl get css disktimeout
crsctl get css misscount
crsctl get css  reboottime

[root@node1-pub ~]# crsctl get css disktimeout
200

[root@node1-pub ~]# crsctl get css misscount
Configuration parameter misscount is not defined. <<<<< This message indicates that the Misscount is not set maually and it is set to it's
Default Value On Linux, it is default to 60 Second. If you want to chang it, you can do that as below. (Not recommended)

[root@node1-pub ~]# crsctl set css misscount 100
Configuration parameter misscount is now set to 100.
[root@node1-pub ~]# crsctl get css misscount
100

The below command sets the value of misscount back to its Default values:

 crsctl unset css misscount

[root@node1-pub ~]# crsctl unset css misscount

[root@node1-pub ~]# crsctl get css  reboottime
3
Add/Remove OCR file in Cluster:

Removing OCR File

(1) Get the Existing OCR file information by running ocrcheck utility.

[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3852
         Available space (kbytes) :     258268
         ID                       :  744414276
         Device/File Name         : /u02/ocfs2/ocr/OCRfile_0 <-- OCR
                                    Device/File integrity check succeeded
         Device/File Name         : /u02/ocfs2/ocr/OCRfile_1 <-- OCR Mirror
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

(2) The First command removes the OCR mirror (/u02/ocfs2/ocr/OCRfile_1). If you want to remove the OCR
      file (/u02/ocfs2/ocr/OCRfile_1) run the next command.

ocrconfig -replace ocrmirror
ocrconfig -replace ocr

[root@node1-pub ~]# ocrconfig -replace ocrmirror
[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3852
         Available space (kbytes) :     258268
         ID                       :  744414276
         Device/File Name         : /u02/ocfs2/ocr/OCRfile_0 <<-- OCR File
                                    Device/File integrity check succeeded

                                    Device/File not configured  <-- OCR Mirror not existed any more

         Cluster registry integrity check succeeded

Adding OCR

You need to add OCR or OCR Mirror file in a case where you want to move the existing OCR file location to the different Devices.
The below command add ths OCR mirror file if OCR file alread exists.

(1) Get the Current status of OCR:

[root@node1-pub ~]# ocrconfig -replace ocrmirror
[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3852
         Available space (kbytes) :     258268
         ID                       :  744414276
         Device/File Name         : /u02/ocfs2/ocr/OCRfile_0 <<-- OCR File
                                    Device/File integrity check succeeded

                                    Device/File not configured  <-- OCR Mirror does not exist

         Cluster registry integrity check succeeded

As You can see, I only have one OCR file but not the second file which is OCR Mirror.
So, I can add second OCR (OCR Mirror) as below command.

ocrconfig -replace ocrmirror

[root@node1-pub ~]# ocrconfig -replace ocrmirror /u02/ocfs2/ocr/OCRfile_1
[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3852
         Available space (kbytes) :     258268
         ID                       :  744414276
         Device/File Name         : /u02/ocfs2/ocr/OCRfile_0
                                    Device/File integrity check succeeded
         Device/File Name         : /u02/ocfs2/ocr/OCRfile_1
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

You can have at most 2 OCR devices (OCR itself and its single Mirror) in a cluster. Adding extra Mirror gives you below error message

[root@node1-pub ~]# ocrconfig -replace ocrmirror /u02/ocfs2/ocr/OCRfile_2
PROT-21: Invalid parameter
[root@node1-pub ~]#

Add/Remove Votedisk file in Cluster:

Adding Votedisk:

Get the existing Vote Disks associated into the cluster. To be safe, Bring crs cluster stack down on all the nodes
but one on which you are going to add votedisk from.

(1) Stop CRS on all the nodes in cluster but one.

[root@node2-pub ~]# crsctl stop crs

(2) Get the list of Existing Vote Disks

crsctl query css votedisk

[root@node1-pub ~]# crsctl query css votedisk
 0.     0    /u02/ocfs2/vote/VDFile_0
 1.     0    /u02/ocfs2/vote/VDFile_1
 2.     0    /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).

(3) Backup the VoteDisk file

Backup the existing votedisks as below as oracle:

dd if=/u02/ocfs2/vote/VDFile_0 of=$ORACLE_BASE/bkp/vd/VDFile_0

[root@node1-pub ~]# su - oracle
[oracle@node1-pub ~]$ dd if=/u02/ocfs2/vote/VDFile_0 of=$ORACLE_BASE/bkp/vd/VDFile_0
41024+0 records in
41024+0 records out
[oracle@node1-pub ~]$

(4) Add an Extra Votedisk into the Cluster:

    If it is a OCFS, then touch the file as oracle. On raw devices, initialize the raw devices using "dd" command

touch /u02/ocfs2/vote/VDFile_3 <<-- as oracle
crsctl add css votedisk /u02/ocfs2/vote/VDFile_3 <<-- as oracle
crsctl query css votedisks

[root@node1-pub ~]# su - oracle
[oracle@node1-pub ~]$ touch /u02/ocfs2/vote/VDFile_3
[oracle@node1-pub ~]$ crsctl add css votedisk /u02/ocfs2/vote/VDFile_3
Now formatting voting disk: /u02/ocfs2/vote/VDFile_3.
Successful addition of voting disk /u02/ocfs2/vote/VDFile_3.

(5) Confirm that the file has been added successfully:

[root@node1-pub ~]# ls -l /u02/ocfs2/vote/VDFile_3
-rw-r-----  1 oracle oinstall 21004288 Oct  6 16:31 /u02/ocfs2/vote/VDFile_3
[root@node1-pub ~]# crsctl query css votedisks
Unknown parameter: votedisks
[root@node1-pub ~]# crsctl query css votedisk
 0.     0    /u02/ocfs2/vote/VDFile_0
 1.     0    /u02/ocfs2/vote/VDFile_1
 2.     0    /u02/ocfs2/vote/VDFile_2
 3.     0    /u02/ocfs2/vote/VDFile_3
Located 4 voting disk(s).
[root@node1-pub ~]#

Removing Votedisk:

Removing Votedisk from the cluster is very simple. Tthe below command removes the given votedisk from cluster configuration.

crsctl delete css votedisk /u02/ocfs2/vote/VDFile_3

[root@node1-pub ~]# crsctl delete css votedisk /u02/ocfs2/vote/VDFile_3
Successful deletion of voting disk /u02/ocfs2/vote/VDFile_3.
[root@node1-pub ~]#

[root@node1-pub ~]# crsctl query css votedisk
 0.     0    /u02/ocfs2/vote/VDFile_0
 1.     0    /u02/ocfs2/vote/VDFile_1
 2.     0    /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).
[root@node1-pub ~]#

Backing Up OCR

Oracle performs physical backup of OCR devices every 4 hours under the default backup direcory $ORA_CRS_HOME/cdata/
and then it rolls that forward to Daily, weekly and monthly backup. You can get the backup information by executing below command.

ocrconfig -showbackup

[root@node1-pub ~]# ocrconfig -showbackup

node2-pub     2007/09/03 17:46:47     /u01/app/crs/cdata/test-crs/backup00.ocr

node2-pub     2007/09/03 13:46:45     /u01/app/crs/cdata/test-crs/backup01.ocr

node2-pub     2007/09/03 09:46:44     /u01/app/crs/cdata/test-crs/backup02.ocr

node2-pub     2007/09/03 01:46:39     /u01/app/crs/cdata/test-crs/day.ocr

node2-pub     2007/09/03 01:46:39     /u01/app/crs/cdata/test-crs/week.ocr
[root@node1-pub ~]#

Manually backing up the OCR

ocrconfig -manualbackup <<--Physical Backup of OCR

The above command backs up OCR under the default Backup directory. You can export the contents of the OCR using below command (Logical backup).

ocrconfig -export /tmp/ocr_exp.dat -s online <<-- Logical Backup of OCR

Restoring OCR

The below command is used to restore the OCR from the physical backup. Shutdown CRS on all nodes.

ocrconfig -restore

Locate the avialable Backups

[root@node1-pub ~]# ocrconfig -showbackup

node2-pub     2007/09/03 17:46:47     /u01/app/crs/cdata/test-crs/backup00.ocr

node2-pub     2007/09/03 13:46:45     /u01/app/crs/cdata/test-crs/backup01.ocr

node2-pub     2007/09/03 09:46:44     /u01/app/crs/cdata/test-crs/backup02.ocr

node2-pub     2007/09/03 01:46:39     /u01/app/crs/cdata/test-crs/day.ocr

node2-pub     2007/09/03 01:46:39     /u01/app/crs/cdata/test-crs/week.ocr

node1-pub     2007/10/07 13:50:41     /u01/app/crs/cdata/test-crs/backup_20071007_135041.ocr

Perform Restore from previous Backup

[root@node2-pub ~]# ocrconfig -restore /u01/app/crs/cdata/test-crs/week.ocr

The above command restore the OCR from week old backup.
If you have logical backup of OCR (taken using export option), then You can import it with the below command.

ocrconfig -import /tmp/ocr_exp.dat

Restoring Votedisks

    Shutdown CRS on all the nodes in Cluster.
    Locate the current location of the Votedisks
    Restore each of the votedisks using "dd" command from the previous good backup of Votedisk taken using the same "dd" command.
    Start CRS on all the nodes.

crsctl stop crs
crsctl query css votedisk
dd if= of= <<-- do this for all the votedisks
crsctl start crs

Changing Public and Virtual IP Address:


Current Config                                               Changed to

Node 1:

Public IP:       216.160.37.154                              192.168.10.11
VIP:             216.160.37.153                              192.168.10.111
subnet:          216.160.37.159                              192.168.10.0
Netmask:         255.255.255.248                             255.255.255.0
Interface used:  eth0                                        eth0
Hostname:        node1-pub.hingu.net                         node1-pub.hingu.net

Node 2:

Public IP:       216.160.37.156                              192.168.10.22
VIP:             216.160.37.157                              192.168.10.222
subnet:          216.160.37.159                              192.168.10.0
Netmask:         255.255.255.248                             255.255.255.0
Interface used:  eth0                                        eth0
Hostname:        node1-pub.hingu.net                         node2-pub.hingu.net

=======================================================================
(A)

Take the Services, Database, ASM Instances and nodeapps down on both the Nodes in Cluster. Also disable the nodeapps, asm and database instances to prevent them from restarting in case if this node gets rebooted during this process.

srvctl stop service -d test
srvctl stop database -d test
srvctl stop asm -n node1-pub
srvctl stop asm -n node2-pub
srvctl stop nodeapps -n node1-pub,node1-pub2
srvctl disable instance -d test -i test1,test2
srvctl disable asm -n node1-pub
srvctl disable asm -n node2-pub
srvctl disable nodeapps -n node1-pub
srvctl disable nodeapps -n node2-pub

(B)
Modify the /etc/hosts and/or DNS, ifcfg-eth0 (local node) with the new IP values
on All the Nodes

(C)
Restart the specific network interface in order to use the new IP.

ifconfig eth0 down
ifconfig eth0 up

Or, you can restart the network.
CAUTION: on NAS, restarting entire network may cause the node to be rebooted.

(D)
Update the OCR with the New Public IP.
In case of public IP, you have to delete the interface first and then add it back with the new IP address.

As oracle user, Issue the below command:

oifcfg delif -global eth0
oifcfg setif -global eth0/192.168.10.0:public

(E)
Update the OCR with the New Virtual IP.
Virtual IP is part of the nodeapps and so you can modify the nodeapps to update the Virtual IP information.

As privileged user (root), Issue the below commands:

srvctl modify nodeapps -n node1-pub -A 192.168.10.111/255.255.255.0/eth0 <-- for Node 1
srvctl modify nodeapps -n node1-pub -A 192.168.10.222/255.255.255.0/eth0 <-- for Node 2

(F)
Enable the nodeapps, ASM, database Instances for all the Nodes.

srvctl enable instance -d test -i test1,test2
srvctl enable asm -n node1-pub
srvctl enable asm -n node2-pub
srvctl enable nodeapps -n node1-pub
srvctl enable nodeapps -n node2-pub

(G)
Update the listener.ora file on each nodes with the correct IP addresses in case if it uses the IP address instead of the hostname.

(H)
Restart the Nodeapps, ASM and Database instance

srvctl start nodeapps -n node1-pub
srvctl start nodeapps -n node2-pub
srvctl start asm -n node1-pub
srvctl start asm -n node2-pub
srvctl start database -d test

=======================================================================

 

Oracle Locking

Oracle Locking


In Oracle databases, many users may update the same information at roughly the same time. Locking allows one user to update data at a given moment so that another person cannot modify the same data. The data is locked by the transaction until it is committed or rolled back and this is known as data concurrency. Another purpose of Oracle locking is to ensure that all processes can always read the original data as they were at the time the query began even though other users could be modifying the underlying data. This is known as read consistency.



Although locks are a necessity in Oracle, they can create performance issues. Each time a user issues a lock, another user would be prevented from processing the locked data. Oracle locking allows a variety of locks depending on the resources required - a single row, many rows, an entire table, many tables, etc. However, the larger the scope of the lock, the more users will be prevented from processing the data. The Oracle enqueue wait event is the best indication of locking in Oracle databases.



Oracle 9i Enqueue Wait Event

In Oracle 9i, when a session is waiting on the "enqueue" wait event, this indicates a wait for a lock that is held by another user (or sessions) in an incompatible mode to the requested mode. When sessions are found waiting on an enqueue, the following query can be used to find out which session is requesting the lock, the type and mode of the requested lock and the session that is blocking the request:



>SELECT DECODE(request,0,Holder: ,Waiter: )

sid sess,

id1, id2, lmode, request, type

FROM V$LOCK

WHERE (id1, id2, type) IN

(SELECT id1, id2, type FROM V$LOCK WHERE request>0)

ORDER BY id1, request

In Oracle 9i there are approximately 40 types of locks specified by the TYPE column in V$LOCK and each has a unique solution set. The following are examples of the types of locks:

TX: This enqueue is a transaction lock and is typically caused by incorrect application logic or table setup issues.


TM: This enqueue represents a DML lock and is generally due to application issues, particularly if foreign key constraints have not been indexed.

ST: When Oracle performs space management operations (such as allocating temporary segments for a sort, allocating extents for a table, etc), the user session waits on the ST enqueue.

Oracle 10g Enqueue Wait Events

Oracle 10g makes the process of analyzing locks easier by separating the "enqueue" wait event from Oracle 9i into over 200 distinct wait events. Oracle also includes more information about the lock type within the wait event name. For example, an enqueue wait event named "enq: TX - row lock contention" indicates that row locking is occurring, while "enq: TX - index contention" indicates contention on an index. In Oracle 9i, both of these sessions would have been found waiting on the "enqueue" wait event with a lock type of "TX", so Oracle 10g definitely helps isolate the specific issue.

In conclusion, Oracle 10g makes it much easier to track down the specific causes of Oracle locking problems now that the Oracle "enqueue" wait event from 9i and before has been broken up into over 200 distinct events in 10gR2.

Configuration of Snapshot Standby Database in Oracle 11g

Configuration of Snapshot Standby Database in Oracle 11g



Snapshot Standby is a new features introduced in Oracle 11g. A snapshot standby database is a type of updatable standby database that provides full data protection for a primary database. A snapshot standby database receives and archives, but does not apply, redo data from its primary database. Redo data received from the primary database is applied when a snapshot standby database is converted back into a physical standby database, after discarding all local updates to the snapshot standby database.



The main benefits of snapshot standby database is that we can convert the physical standby database into a read-write real time clone of the production database and we can then temporarily use this environment for carrying out any kind of development testing, QA type work or to test the impact of a proposed production change on an application.



The best part of this snapshot features is that the Snapshot Standby database in turn uses the Flashback Database technology to create a guaranteed restore point to which the database can be later flashed back to.All the features of the flashback are inherent in the snapshot standby.



Here we will configure the snapshot standby database



Step 1 : Create the physical standby database

Create the physical standby database .



Step 2: Enable Flashack Parameter



SQL> alter system set db_recovery_file_dest_size=4G scope=both ;

System altered.



SQL> alter system set db_recovery_file_dest='D:\standby\fra\' scope=both ;

System altered.



SQL> show parameter db_recovery

NAME TYPE VALUE

----------------------------- ----------- -------------------------

db_recovery_file_dest string D:\standby\fra\

db_recovery_file_dest_size big integer 4G



Step 3 : Stop the media recovery process

SQL> alter database recover managed standby database cancel;

Database altered.



Step 4 : Ensure that the database is mounted, but not open.

SQL> shut immediate

ORA-01109: database not open

Database dismounted.

ORACLE instance shut down.

SQL> startup mount



Step 5 : Create guaranteed restore point

SQL> create restore point snapshot_rspt guarantee flashback database;

Restore point created.



Step 6 : Perform the conversion to snapshot standby database

SQL> alter database convert to snapshot standby ;

Database altered.



SQL> select name,open_mode,database_role from v$database;

NAME OPEN_MODE DATABASE_ROLE

--------- -------------- --------------------------

NOIDA MOUNTED SNAPSHOT STANDBY



SQL> alter database open;

Database altered.



SQL> select name,db_unique_name ,open_mode,database_role from v$database;

NAME DB_UNIQUE_NAME OPEN_MODE DATABASE_ROLE

------- ---------------------- --------------- ------------------------

NOIDA gurgoan READ WRITE SNAPSHOT STANDBY



Since the database is in read-write mode , so we can make some changes or even change the parameter say tunning parameter and check there performance after converting to physical standby and further flashback the whole changes.



SQL> select name,guarantee_flashback_database from v$restore_point;

NAME GUA

----------------------------------------------------------------- -------

SNAPSHOT_STANDBY_REQUIRED_11/18/2011 20:41:01 YES

SNAPSHOT_RSPT YES



While the original physical standby database has been now converted to snapshot database, some changes are happening on the Primary database and those changes are shipped to the standby site but not yet applied. They will accumulate on the standby site and will be applied after the snapshot standby database is converted back to a physical standby database.



Step 7 : Convert snapshot standby to physical standby

SQL> shut immediate

SQL> startup mount



SQL> alter database convert to physical standby ;

Database altered.



SQL>shut immediate

SQL> startup mount



SQL> alter database recover managed standby database disconnect from session;

Database altered.



SQL> alter database recover managed standby database cancel;

Database altered.



SQL> alter database open;

Database altered.



SQL> select name,open_mode,db_unique_name,database_role from v$database;

NAME OPEN_MODE DB_UNIQUE_NAME DATABASE_ROLE

------- -------------- ----------------------- ----------------------

NOIDA READ ONLY gurgoan PHYSICAL STANDBY



SQL> alter database recover managed standby database using current logfile disconnect;

Database altered.



Wednesday, May 2, 2012

Performing a Role Transition Using Switchover

 

A switchover allows a primary and standby to reverse roles without any data loss and without any need to re-create the previous primary. In contrast, a failover implies data loss and can result in the need for the old primary to be re-created. Switchovers are normally performed for planned maintenance. For example, if the primary host needed to replace a faulty CPU that required downtime, we could perform a switchover and have users automatically redirected to the new primary. The impact to the user base could be greatly reduced, thus increasing our availability.
A switchover can be performed using either a physical or logical standby. However, there are some issues you should be aware of. If you have a configuration with a primary database, a physical standby, and a logical standby, and you perform a switchover to the logical standby, your physical standby will no longer be a part of the configuration and must be re-created. In the same scenario, if you perform a switchover to the physical standby, the logical standby remains in the configuration and does not need to be re-created. For this reason it can be stated that a physical standby is a better option for a switchover candidate than a logical standby when multiple standby types exist in the configuration.
The secret to successfully performing a switchover is proper planning and testing. Following is a list of items that should be considered prior to performing a switchover:
  • First and most important, verify that the initialization parameter for both the primary and the standby support both roles.  Pay special attention to the VALID_FOR attributes to the LOG_ARCHIVE_DEST_n parameter, as this will play greatly into the switchover.
  • Verify that the primary and standby host each have TNS aliases that point to one another and that those aliases function correctly. Also, verify that those functioning aliases are the ones used in the LOG_ARCHIVE_DEST_n parameters.
  • For a fast and efficient switchover, disconnect all user connections. If that is not feasible, restrict the user activity as much as possible. It is possible to failover user connections if they have connected via an OCI application and the proper transparent application failover setup has been performed.
  • Verify that both the primary and standby temporary tablespaces are populated with tempfiles.
  • Have the standby, either physical or logical, applying changes from the primary. Verify that the application of redo is current with the primary. Using the real-time apply method will speed up the switchover.
  • If the primary is a RAC database, you must shut down all instances but one.

    Once you have reviewed the above recommendations, you are ready to perform the switchover. We highly recommend testing switchover in your test environment prior to attempting it on your production system. This testing will root out any small configuration errors and make the production event smooth and painless.
    Step 1.  On the primary database, query the V$DATABASE view to verify that the SWITCHOVER_STATUS column indicates that a switchover is possible.
    select switchover_status from v$database;
    If SWITCHOVER_STATUS returns a value of TO_STANDBY, everything is good. If the query returns SESSIONS ACTIVE, you should perform the SWITCHOVER command with the SESSION SHUTDOWN clause.
    Step 2.  Convert the primary database into a physical standby.
    alter database commit to switchover to physical standby;
    If the SWITCHOVER_STATUS column in Step 1 returned SESSIONS ACTIVE, issue the following command:
    alter database commit to switchover to physical standby with session shutdown;
    Step 3.  Shut down and restart the old primary as a new standby.
    shutdown immediate;
    startup mount;
    Congratulations, you now have two standbys.
    Step 4.  When we converted the primary to a standby, we generated a marker in the redo stream and sent that marker to the standby. That marker states that no more redo has been generated. As soon as the standby receives and recovers that marker, it is eligible to become a primary database. Query the SWITCHOVER_STATUS column of V$DATABASE on the standby to ensure that the marker has been recovered and it is ready for the switchover to primary.
    select switchover_status from v$database;
    If SWITCHOVER_STATUS returns TO_PRIMARY, the marker has been recovered and you can proceed with the SWITCHOVER TO PRIMARY command. If the status is SESSIONS ACTIVE, you should either disconnect active sessions or issue the SWITCHOVER command with the SESSION SHUTDOWN clause. If the status states NOT ALLOWED, the marker has not been received and recovered by the standby, and switchover cannot proceed.
    Step 5.  Convert the standby to a primary database.
    alter database commit to switchover to primary;
    Or, if the SWITCHOVER_STATUS returned SESSIONS ACTIVE:
    alter database commit to switchover to primary with session shutdown;
    Step 6.  Shut down and restart the new primary database.
    shutdown immediate;
    startup;
    At this point, the switchover process is complete. If you have configured the VALID_FOR attribute to the LOG_ARCHIVE_DEST_n parameter, your new primary is already configured to send redo to the new standby. If you so desire, you can start managed recovery on the new primary to have it begin applying changes. If you performed the switchover to do maintenance on the primary host, you can shut down the new standby and perform the required maintenance. Before shutting down the standby, you should consider the protection mode you are in on your new primary and make sure that the conditions for that protection mode are satisfied.

    Performing Switchover with Logical Standby

    After validating the requirements, we are ready to switch over to our logical standby. As we stated earlier, we highly recommend testing the switchover in a test environment prior to performing in production to root out any small configuration errors.
    Step 1.  On the primary database, query the SWITCHOVER_STATUS column of the V$DATABASE view to see if a switchover is allowed.
    select switchover_status from v$database;
    If the SWITCHOVER_STATUS column states TO STANDBY, TO LOGICAL STANDBY, or SESSIONS ACTIVE, your configuration is ready to perform the switchover.
    Step 2.  New to Oracle Database 10g is the ability to prepare the primary database and logical standby for a switchover, thus reducing the time to complete the switchover. On the primary, issue the command to prepare the primary for the conversion to a logical standby.
    alter database prepare to switchover to logical standby;
    Step 3.  On the logical standby, issue the command to prepare it to become a primary database. This command instructs the logical standby to begin transmitting the LogMiner dictionary to the current primary that will soon be the new standby. Depending on the size of the LogMiner dictionary, this command could take a few minutes to complete.
    alter database prepare to switchover to primary;
    Step 4.  At this point, we need to query the SWITCHOVER_STATUS column of V$DATABASE on the primary to see if the receipt of the LogMiner dictionary from the standby has completed. Once the status states TO LOGICAL STANDBY, we are ready to proceed.
    Step 5.  On the primary database, issue the statement to convert the primary into a logical standby.
    alter database commit to switchover to logical standby;
    This command will wait for all current transactions to complete and will stop any new transaction from occurring. If the statement is taking some time to complete, check V$TRANSACTION and end any long-running transactions.
    Step 6.  On the logical standby query, check the SWITCHOVER_STATUS column of V$DATABASE to see if the logical standby has recovered all of the information from the primary, including the marker that indicates that the switchover is occurring. Once the SWITCHOVER_STATUS column states TO PRIMARY, issue the statement to convert the logical standby into a primary database.
    alter database commit to switchover to primary;
    There is no need to bounce either the new primary or new logical standby database. Optionally, you can start the SQL Apply engine on the new logical standby. 

 

Thursday, December 8, 2011

Upgrading the Protection Mode and Downgrading the Protection Mode

Upgrading the Protection Mode
Use the following steps to upgrade the protection mode from the default of Maximum Performance.


Identify the current configuration:


Primary Database (TNS Service Name): TESTDB_VMLINUX3.IDEVELOPMENT.INFO
Standby Database (TNS Service Name): TESTDB_VMLINUX4.IDEVELOPMENT.INFO
Current Protection Mode: Maximum Performance mode using ARCH

Both of the higher protection modes (Maximum Availability and Maximum Protection) require the use of standby redo logs on the destination standby database. If standby redo logs do not exist for the standby database, create them now.

[Connect to the standby database]

SQL> connect sys/change_on_install@testdb_vmlinux4.idevelopment.info as sysdba

[Verify if using oracle managed files]

SQL> select value from v$parameter where name = 'db_create_online_log_dest_1';

VALUE
-------------------
/u02/oradata/TESTDB

[Do any standby redo logs exist on the standby database?]

SQL> select group#, bytes, 'online' as type from v$log
2 union
3 select group#, bytes, 'standby' as type from v$standby_log
4 order by 1;

GROUP# BYTES TYPE
---------- ---------- -------
1 104857600 ONLINE
2 104857600 ONLINE
3 104857600 ONLINE

[Need to first cancel Managed Recovery mode if it is running]

SQL> alter database recover managed standby database cancel;

[Create appropriate number standby redo log files on the standby database]

SQL> alter database add standby logfile group 4 size 100m;
SQL> alter database add standby logfile group 5 size 100m;
SQL> alter database add standby logfile group 6 size 100m;
SQL> alter database add standby logfile group 7 size 100m;

[Put standby database back into Managed Recovery mode]

SQL> alter database recover managed standby database disconnect from session;

[Verify new standby redo logs on standby database]

SQL> select group#, bytes, 'online' as type from v$log
2 union
3 select group#, bytes, 'standby' as type from v$standby_log
4 order by 1;

GROUP# BYTES TYPE
---------- ---------- -------
1 104857600 ONLINE
2 104857600 ONLINE
3 104857600 ONLINE
4 104857600 STANDBY
5 104857600 STANDBY
6 104857600 STANDBY
7 104857600 STANDBY



Ensure that the attributes for LOG_ARCHIVE_DEST_n are configured on the primary instance to support the desired protection mode. Also note that the destination standby database should be enabled to support the target protection mode as documented in this article.

For the mode of Maximum Protection, the standby database must be up and mounted!




Set the LOG_ARCHIVE_DEST_n initialization parameter on the primary database to support the required protection mode:

[Connect to the primary database]

SQL> connect sys/change_on_install@testdb_vmlinux3.idevelopment.info as sysdba

[Configure log transport services to support the desired protection mode]

SQL> alter system set log_archive_dest_2='service=testdb_vmlinux4.idevelopment.info LGWR SYNC AFFIRM';
SQL> alter system set log_archive_dest_state_2=enable;



The primary database will need to be closed and then placed in the MOUNT stage.

SQL> shutdown immediate
SQL> startup mount



On the primary database, change the protection mode and open the database:

For Maximum Protection mode:

SQL> alter database set standby database to maximize protection;
SQL> alter database open;



For Maximum Availability mode:

SQL> alter database set standby database to maximize availability;
SQL> alter database open;



Query the data dictionary on the primary database to verify the new protection mode:

SQL> SELECT PROTECTION_MODE, PROTECTION_LEVEL, DATABASE_ROLE FROM V$DATABASE;

PROTECTION_MODE PROTECTION_LEVEL DATABASE_ROLE
------------------- ------------------- --------------
MAXIMUM PROTECTION MAXIMUM PROTECTION PRIMARY



(Optional) Although the steps in this section are optional, there are highly recommended so that the primary database can easily and quickly switchover to a standby role without the need for DBA intervention. For example, standby redo logs are only used on the physical standby database; however, creating and having standby redo logs ready to go on the physical primary database makes role transition much easier if the primary would every have to become the standby:

[Connect to the primary database]

SQL> connect sys/change_on_install@testdb_vmlinux3.idevelopment.info as sysdba

[Do any standby redo logs exist on the primary database?]

SQL> select group#, bytes, 'online' as type from v$log
2 union
3 select group#, bytes, 'standby' as type from v$standby_log
4 order by 1;

GROUP# BYTES TYPE
---------- ---------- -------
1 104857600 ONLINE
2 104857600 ONLINE
3 104857600 ONLINE

[Create appropriate number standby redo log files on the primary database]

SQL> alter database add standby logfile group 4 size 100m;
SQL> alter database add standby logfile group 5 size 100m;
SQL> alter database add standby logfile group 6 size 100m;
SQL> alter database add standby logfile group 7 size 100m;

[Verify new standby redo logs on primary database]

SQL> select group#, bytes, 'online' as type from v$log
2 union
3 select group#, bytes, 'standby' as type from v$standby_log
4 order by 1;

GROUP# BYTES TYPE
---------- ---------- -------
1 104857600 ONLINE
2 104857600 ONLINE
3 104857600 ONLINE
4 104857600 STANDBY
5 104857600 STANDBY
6 104857600 STANDBY
7 104857600 STANDBY



Also, on the standby database(s), configure the LOG_ARCHIVE_DEST_n parameter attributes so the Data Guard configuration can continue to operate in the new protection mode after a switchover:

[Connect to the standby database]

SQL> connect sys/change_on_install@testdb_vmlinux4.idevelopment.info as sysdba

SQL> alter system set log_archive_dest_2='service=testdb_vmlinux3.idevelopment.info LGWR SYNC AFFIRM';
SQL> alter system set log_archive_dest_state_2=defer;



Downgrading the Protection Mode
Use the following steps to downgrade the protection mode from the current higher protection mode.


Identify the current configuration:


Primary Database (TNS Service Name): TESTDB_VMLINUX3.IDEVELOPMENT.INFO
Standby Database (TNS Service Name): TESTDB_VMLINUX4.IDEVELOPMENT.INFO
Current Protection Mode: Maximum protection using LGWR SYNC AFFIRM

Configure the attributes for LOG_ARCHIVE_DEST_n on the primary instance to support the new desired protection mode:

[Connect to the primary database]

SQL> connect sys/change_on_install@testdb_vmlinux3.idevelopment.info as sysdba

[Configure log transport services to support the desired protection mode]

SQL> alter system set log_archive_dest_2='service=testdb_vmlinux4.idevelopment.info ARCH';
SQL> alter system set log_archive_dest_state_2=enable;



Shutdown and mount the primary and standby database:

[Connect to the primary database]

SQL> connect sys/change_on_install@testdb_vmlinux3.idevelopment.info as sysdba
SQL> shutdown immediate
SQL> startup mount

[Connect to the standby database]

SQL> connect sys/change_on_install@testdb_vmlinux4.idevelopment.info as sysdba
SQL> alter database recover managed standby database cancel;
SQL> shutdown immediate
SQL> startup nomount
SQL> alter database mount standby database;



Downgrade the primary database to the desired protection mode:

[Connect to the primary database]

SQL> connect sys/change_on_install@testdb_vmlinux3.idevelopment.info as sysdba
[For Maximum Performance mode]

SQL> alter database set standby database to maximize performance;
[For Maximum Availability mode]

SQL> alter database set standby database to maximize availability;



Both of the higher protection modes (Maximum Availability and Maximum Protection) require the use of standby redo logs on the destination standby database. It you were to downgrade from Maximum Protection mode to Maximum Availability mode, you would need to keep the standby redo logs that exist for the standby database. If downgrading to Maximum Performance mode, you can drop the standby redo logs from the standby unless you are going to be configuring log transport services to use LGWR. For this example, I am going to be downgrading to maximum performance mode and using ARCH for log transport services and can therefore drop any standby redo logs from the standby database (and the primary if they exist):

[Connect to the standby database]

SQL> connect sys/change_on_install@testdb_vmlinux4.idevelopment.info as sysdba

[Drop any standby redo logs from the standby database]

SQL> alter database drop standby logfile group 4;
SQL> alter database drop standby logfile group 5;
SQL> alter database drop standby logfile group 6;
SQL> alter database drop standby logfile group 7;



If standby redo logs exist on the primary and you are downgrading to Maximum Performance mode, they can be removed:

[Connect to the primary database]

SQL> connect sys/change_on_install@testdb_vmlinux3.idevelopment.info as sysdba

[Drop any standby redo logs from the primary database]

SQL> alter database drop standby logfile group 4;
SQL> alter database drop standby logfile group 5;
SQL> alter database drop standby logfile group 6;
SQL> alter database drop standby logfile group 7;



Put the standby database in managed recovery mode and open the primary database with the new protection mode enabled:

[Connect to the standby database]

SQL> connect sys/change_on_install@testdb_vmlinux4.idevelopment.info as sysdba
SQL> alter database recover managed standby database disconnect from session;

[Connect to the primary database]

SQL> connect sys/change_on_install@testdb_vmlinux3.idevelopment.info as sysdba
SQL> alter database open;



Query the data dictionary on the primary database to verify the new protection mode:

SQL> select protection_mode, protection_level, database_role from v$database;

PROTECTION_MODE PROTECTION_LEVEL DATABASE_ROLE
-------------------- -------------------- --------------
MAXIMUM PERFORMANCE MAXIMUM PERFORMANCE PRIMARY

Different kind of protection modes Oracle offers in a Data Guard Environment



Maximum Protection—This mode offers the highest level of data protection. Data is synchronously transmitted to the standby database from the primary database and transactions are not committed on the primary database unless the redo data is available on at least one standby database configured in this mode. If the last standby database configured in this mode becomes unavailable, processing stops on the primary database. This mode ensures no-data-loss.


Maximum Availability—This mode is similar to the maximum protection mode, including zero data loss. However, if a standby database becomes unavailable (for example, because of network connectivity problems), processing continues on the primary database. When the fault is corrected, the standby database is automatically resynchronized with the primary database.


Maximum Performance—This mode offers slightly less data protection on the primary database, but higher performance than maximum availability mode. In this mode, as the primary database processes transactions, redo data is asynchronously shipped to the standby database. The commit operation of the primary database does not wait for the standby database to acknowledge receipt of redo data before completing write operations on the primary database. If any standby destination becomes unavailable, processing continues on the primary database and there is little effect on primary database performance.

Tuesday, November 15, 2011

How to find log sequences used during the recovery

RMAN> restore database preview;

Starting restore at 2009-MAY-22 11:24:50

using channel ORA_DISK_1

List of Backup Sets

===================

BS Key Type LV Size Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ --------------------

1 Full 209.16M DISK 00:01:50 2009-MAY-22 11:21:04

BP Key: 1 Status: AVAILABLE Compressed: YES Tag: TAG20090522T111914

Piece Name: /u01/app/oracle/flash_recovery_area/ORCL/backupset/2009_05_22/o1_mf_nnndf_TAG20090522T111914_51d6fm3b_.bkp

List of Datafiles in backup set 1

File LV Type Ckp SCN Ckp Time Name

---- -- ---- ---------- -------------------- ----

1 Full 1621673 2009-MAY-22 11:19:15 /u01/app/oracle/oradata/ORCL/system01.dbf

2 Full 1621673 2009-MAY-22 11:19:15 /u01/app/oracle/oradata/ORCL/sysaux01.dbf

3 Full 1621673 2009-MAY-22 11:19:15 /u01/app/oracle/oradata/ORCL/undotbs01.dbf

4 Full 1621673 2009-MAY-22 11:19:15 /u01/app/oracle/oradata/ORCL/users01.dbf



List of Archived Log Copies for database with db_unique_name ORCL

=====================================================================



Key Thrd Seq S Low Time

------- ---- ------- - --------------------

30 1 97 A 2009-MAY-22 11:11:24

Name: /u01/app/oracle/flash_recovery_area/ORCL/archivelog/2009_05_22/o1_mf_1_97_51d6kvcr_.arc

Media recovery start SCN is 1621093

Recovery must be done beyond SCN 1621673 to clear datafile fuzziness

Finished restore at 2009-MAY-22 11:24:51

RMAN> exit





Recovery Manager complete.

How to find log sequences used during the recovery

From above restore preview, we confirm that all redo logs generated from SCN 1621093 and 1621673 will be used.

SQL> select thread#,sequence# from v$archived_log

2 where --cross begin backup only (sample SEQ# 90)

3 (first_change#<=1621093 and next_change#>=1621093 and next_change#<=1621673)

4 or --within begin backup and end backup (sample SEQ# 91,92)

5 (first_change#>=1621093 and next_change#<=1621673)

6 or --cross end backup only (sample SEQ# 93)

7 (first_change#>=1621093 and first_change#<=1621673 and next_change#>=1621673)

8 or --cross begin backup and end backup (sample SEQ# 150)

9 (first_change#<=1621093 and next_change#>=1621673)

10 ;



THREAD# SEQUENCE#

---------- ----------

1 97