Friday, December 17, 2010

Oracle Clusterware Components

Oracle Clusterware is the software that enables the nodes to communicate with each other, forms the cluster, and makes the nodes act as a single logical server. Oracle Clusterware is run by Cluster Ready Services (CRS) using two key components: the Oracle Cluster Registry (OCR), which records and maintains the cluster and node membership information, and the voting disk, which acts as a tiebreaker during communication failures. Consistent heartbeat information from all the nodes is sent to the voting disk while the cluster is running. The CRS service has four components, namely OPROCd, the CRS daemon (crsd), the Oracle Cluster Synchronization Service daemon (ocssd) and the Event Manager daemon (evmd), and each handles a variety of functions. Failure or death of the CRS daemon can cause a node failure, and Clusterware automatically reboots the node to avoid data corruption resulting from possible communication failures between the nodes. The CRS daemon runs as the superuser 'root' on UNIX platforms and runs as a service on Windows platforms.

The following functionalities are covered by Oracle Cluster Ready Services:

  • CRS is installed and run from a separate Oracle home known as ORA_CRS_HOME, which is independent of ORACLE_HOME.
CRSD --

  • CRSd manages resources, such as starting and stopping services and failing over application resources. It spawns separate processes to manage application resources.
  • The CRS daemon has two modes of running: during a planned clusterware start it is started in 'reboot' mode, and after an unplanned shutdown it is started in 'restart' mode.
  • In reboot mode it 'auto' starts all the resources under its management. In restart mode it retains the previous state and returns the resources to the state they were in before the shutdown.
  • Manages the Oracle Cluster Registry and stores the current known state in the Oracle Cluster Registry.
  • Runs as 'root' on UNIX and 'LocalSystem' on Windows, and automatically restarts in case of failure.
  • CRS requires a public interface, a private interface and a Virtual IP (VIP) for its operation. All of these interfaces should be up and able to ping each other before starting the CRS installation. Without this network infrastructure, CRS cannot be installed.
OCSSD ---

The Oracle Cluster Synchronization Services daemon (ocssd) is the component that provides synchronization services between the nodes and access to node membership. It also enables basic cluster services, including cluster group services and cluster locking, and it can run without integration with vendor clusterware. Failure of ocssd causes the machine to reboot to avoid a split-brain situation. This daemon is also required in a single-instance configuration if Automatic Storage Management (ASM) is used. Automatic Storage Management was a new feature in Oracle Database 10g. Ocssd runs as the 'oracle' user. The following functionalities are covered by the CSS daemon:

  • CSS provides basic 'group services' support. Group services is a distributed group membership system that allows applications to coordinate activities to achieve a common result.
  • Group services use vendor clusterware group services when vendor clusterware is available, but they are capable of working independently when no vendor clusterware group services are available.
  • 'Lock services' is another service from the CSS daemon. Lock services provide basic cluster-wide serialization locking functions, using a FIFO mechanism to manage locking.
  • Node services is the third service provided by the CSS daemon. It uses the OCR to store data and updates the information during reconfiguration. It also manages the OCR data, which is otherwise static.
EVMD --

The third component in Oracle Clusterware is the Event Management logger, which also runs as the daemon process 'evmd'. The daemon process 'evmd' spawns a permanent child process called 'evmlogger' and generates events when things happen. The 'evmlogger' child process spawns new child processes on demand and scans the callout directory to invoke callouts. Evmd restarts automatically on failure, and death of the evmd process does not halt the instance. Evmd runs as the 'oracle' user.

OPROCD--

Oprocd provides the server fencing solution for Oracle Clusterware. It is the process monitor for Oracle Clusterware and uses the hangcheck timer or watchdog timer (depending on the implementation) to protect cluster integrity. Oprocd is locked in memory, runs as a real-time process, sleeps for a fixed time, and runs as the 'root' user. Failure of the oprocd process causes the node to restart.
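
A quick way to confirm that these daemons are healthy on a 10g node is to check the stack with crsctl and look for the processes themselves. This is only a minimal sketch; output and exact process names vary by platform and release.

crsctl check crs
ps -ef | egrep 'crsd|ocssd|evmd|oprocd' | grep -v grep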




Thursday, October 7, 2010

Latches

What is a latch?
Latches are locking mechanisms used to protect shared memory structures from potential corruption due to concurrent access. In other words, latches ensure exclusive access to the shared data structures in the SGA; access to SGA structures is serialized using latches. Latches, unlike locks or enqueues, are not used to protect database objects.
Latches are used in almost all operations. For example:
1. When a session reads a block from disk, it must modify a free block in the buffer cache and adjust the buffer cache LRU chain.
2. When a session reads a block from the SGA, it will modify the LRU chain.
3. When a new SQL statement is parsed, it will be added to the library cache within the SGA.
4. As modifications are made to blocks, entries are placed in the redo buffer.
5. The database writer periodically writes buffers from the cache to disk (and must update their status from “dirty” to “clean”).
6. The redo log writer writes entries from the redo buffer to the redo logs.

Latch Types
There are three types of latches: parent, child, and solitary latches. The parent and solitary latches are fixed in the Oracle kernel code. Child latches are created at instance startup.

Latch operation: how are latches acquired?

A process may request a latch in the willing-to-wait or no-wait (immediate) mode. If the latch is available on the first request, the process acquires it. Before modifying the protected data structure, the process writes the recovery information in the latch recovery area so that PMON knows what to clean up if the process dies while holding the latch.

If the latch is not available, the process spins on the CPU for a short while and retries the latch. This spin-and-retry activity can be repeated up to the value specified in the initialization parameter _SPIN_COUNT (default 2000). The values of the initialization parameters _MAX_EXPONENTIAL_SLEEP and _MAX_SLEEP_HOLDING_LATCH are used to arrive at the sleep duration between retries.

Latches are very light and on most systems, a single machine instruction called “test and set” is used to see if the latch is taken (by looking at a specific memory address) and if not, acquire it (by changing the value in the memory address).
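
To see which latches are actually contended, V$LATCH can be queried from SQL*Plus; a minimal sketch (run as a privileged user):

select name, gets, misses, sleeps, immediate_gets, immediate_misses
from v$latch
where misses > 0
order by sleeps desc;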




Wakeup mechanisms

1. Timeout
The operating system signals (wakes up) the process when a set alarm is triggered

2. Latch wait posting
The next process to free the required latch will wake up the process waiting for the latch. This is initiated by the requesting process before going to sleep, by putting itself on a latch wait list.

Wednesday, October 6, 2010

Unix for the DBA


How to kill all similar processes with single command (in this case opmn)

ps -ef | grep opmn | grep -v grep | awk '{print $2}' | xargs -i kill -9 {}

Locating Files under a particular directory

find . -print |grep -i test.sql

Using AWK in UNIX

To remove a specific column of output from a UNIX command – for example to determine the UNIX process Ids for all Oracle processes on server (second column)

ps -ef |grep -i oracle |awk '{ print $2 }'

Changing the standard prompt for Oracle Users


Edit the .profile for the oracle user

PS1="`hostname`*$ORACLE_SID:$PWD>"

Display top 10 CPU consumers using the ps command

/usr/ucb/ps auxgw | head -11

Show number of active Oracle dedicated connection users for a particular ORACLE_SID

ps -ef | grep $ORACLE_SID|grep -v grep|grep -v ora_|wc -l
Display the number of CPU’s in Solaris

psrinfo -v | grep "Status of processor"|wc -l
Display the number of CPU’s in AIX

lsdev -C | grep Process|wc -l

Display RAM Memory size on Solaris

prtconf |grep -i mem

Display RAM memory size on AIX

First determine name of memory device

lsdev -C |grep mem

then assuming the name of the memory device is ‘mem0’

lsattr -El mem0

Swap space allocation and usage

Solaris : swap -s or swap -l
AIX : lsps -a

Total number of semaphores held by all instances on server

ipcs -as | awk '{sum += $9} END {print sum}'

View allocated RAM memory segments

ipcs -pmb

Manually deallocate shared memeory segments

ipcrm -m <shared_memory_id>
Show mount points for a disk in AIX

lspv -l hdisk13

Display amount of occupied space (in KB) for a file or collection of files in a directory or sub-directory

du -ks * | sort -n| tail

Display total file space in a directory

du -ks .
Cleanup any unwanted trace files more than seven days old

find . -name "*.trc" -mtime +7 -exec rm {} \;

Locate Oracle files that contain certain strings

find . -print | xargs grep rollback

Locate recently created UNIX files (in the past one day)

find . -mtime -1 -print

Finding large files on the server (more than 100MB in size)

find . -size +204800 -print     (find -size counts 512-byte blocks, so 204800 blocks = 100MB)

Crontab :

To submit a task every Tuesday (day 2) at 2:45PM

45 14 * * 2 /opt/oracle/scripts/tr_listener.sh > /dev/null 2>&1

To submit a task to run every 15 minutes on weekdays (days 1-5)

0,15,30,45 * * * 1-5 /opt/oracle/scripts/tr_listener.sh > /dev/null 2>&1

To submit a task to run every hour at 15 minutes past the hour on weekends (days 6 and 0)

15 * * * 0,6 /opt/oracle/scripts/tr_listener.sh > /dev/null 2>&1



SRVCTL:
srvctl command target [options]
commands: enable|disable|start|stop|relocate|status|add|remove|
modify|getenv|setenv|unsetenv|config
targets: database/db|instance/inst|service/serv|nodeapps|asm|listener


srvctl -help or srvctl -v
srvctl -V -- prints version
srvctl version: 10.2.0.0.0 (or) srvctl version: 11.0.0.0.0
srvctl -h -- print usage
srvctl status service -h


Database:
srvctl add database -d db_name -o ORACLE_HOME [-m domain_name][-p spfile] [-A name|ip/netmask]
[-r {PRIMARY|PHYSICAL_STANDBY|LOGICAL_STANDBY|SNAPSHOT_STANDBY}]
[-s start_options] [-n db_name] [-y {AUTOMATIC|MANUAL}]


srvctl remove database -d db_name [-f]


srvctl start database -d db_name [-o start_options] [-c connect_str | -q]
srvctl start database -d db_name [-o open]
srvctl start database -d db_name -o nomount
srvctl start database -d db_name -o mount
srvctl start db -d prod


srvctl stop database -d db_name [-o stop_options] [-c connect_str | -q]
srvctl stop database -d db_name [-o normal]
srvctl stop database -d db_name -o transactional
srvctl stop database -d db_name -o immediate
srvctl stop database -d db_name -o abort


srvctl status database -d db_name [-f] [-v] [-S level]
srvctl status database -d db_name -v service_name
srvctl status database -d hrms


srvctl enable database -d db_name
srvctl disable database -d db_name


srvctl config database
srvctl config database -d db_name [-a] [-t]


srvctl modify database -d db_name [-n db_name] [-o ORACLE_HOME] [-m domain_name] [-p spfile]
[-r {PRIMARY|PHYSICAL_STANDBY|LOGICAL_STANDBY|SNAPSHOT_STANDBY}] [-s start_options] [-y {AUTOMATIC|MANUAL}]
srvctl modify database -d hrms -r physical_standby
srvctl modify db -d RAC -p /u03/oradata/RAC/spfileRAC.ora -- moves parameter file


srvctl getenv database -d db_name [-t name_list]
srvctl setenv database -d db_name {-t name=val[,name=val,...]|-T name=val}
srvctl unsetenv database -d db_name [-t name_list]


Instance:
srvctl add instance -d db_name -i inst_name -n node_name
srvctl remove instance -d db_name -i inst_name [-f]


srvctl start instance -d db_name -i inst_names [-o start_options] [-c connect_str | -q]
srvctl start instance -d db_name -i inst_names [-o open]
srvctl start instance -d db_name -i inst_names -o nomount
srvctl start instance -d db_name -i inst_names -o mount
srvctl start instance -d prod -i prod3


srvctl stop instance -d db_name -i inst_names [-o stop_options] [-c connect_str | -q]
srvctl stop instance -d db_name -i inst_names [-o normal]
srvctl stop instance -d db_name -i inst_names -o transactional
srvctl stop instance -d db_name -i inst_names -o immediate
srvctl stop instance -d db_name -i inst_names -o abort
srvctl stop inst -d prod -i prod6


srvctl status instance -d db_name -i inst_names [-f] [-v] [-S level]
srvctl status inst -d racdb -i racdb2


srvctl enable instance -d db_name -i inst_names
srvctl disable instance -d db_name -i inst_names


srvctl modify instance -d db_name -i inst_name {-s asm_inst_name|-r} -- set a dependency of instance to ASM
srvctl modify instance -d db_name -i inst_name -n node_name -- move the instance
srvctl modify instance -d db_name -i inst_name -r -- remove the instance


srvctl getenv instance -d db_name -i inst_name [-t name_list]
srvctl setenv instance -d db_name [-i inst_name] {-t "name=val[,name=val,...]" | -T "name=val"}
srvctl unsetenv instance -d db_name [-i inst_name] [-t name_list]


Service:
srvctl add service -d db_name -s service_name -r pref_insts [-a avail_insts] [-P TAF_policy]
srvctl add service -d db_name -s service_name -u {-r "new_pref_inst" | -a "new_avail_inst"}
srvctl add service -d RAC -s PRD -r RAC01,RAC02 -a RAC03,RAC04
srvctl add serv -d CRM -s CRM -r CRM1 -a CRM3 -P basic


srvctl remove service -d db_name -s service_name [-i inst_name] [-f]


srvctl start service -d db_name [-s service_names [-i inst_name]] [-o start_options]
srvctl start service -d db_name -s service_names [-o open]
srvctl start service -d db_name -s service_names -o nomount
srvctl start service -d db_name -s service_names -o mount


srvctl stop service -d db_name [-s service_names [-i inst_name]] [-f]


srvctl status service -d db_name [-s service_names] [-f] [-v] [-S level]


srvctl enable service -d db_name -s service_names [-i inst_name]
srvctl disable service -d db_name -s service_names [-i inst_name]


srvctl config service -d db_name [-s service_name] [-a] [-S level]
srvctl config service -d db_name -a -- -a shows TAF configuration
srvctl config service -d WEBTST -s webtest
webtest PREF: WEBTST1 AVAIL: WEBTST2


srvctl modify service -d db_name -s service_name -i old_inst_name -t new_inst_name [-f]
srvctl modify service -d db_name -s service_name -i avail_inst_name -r [-f]
srvctl modify service -d db_name -s service_name -n -i preferred_list [-a available_list] [-f]
srvctl modify service -d db_name -s service_name -i old_inst_name -a avail_inst -P TAF_policy
srvctl modify serv -d PROD -s SDW -n -i I1,I2,I3,I4 -a I5,I6


srvctl relocate service -d db_name -s service_name -i old_inst_name -t target_inst [-f]


srvctl getenv service -d db_name -s service_name -t name_list
srvctl setenv service -d db_name [-s service_name] {-t "name=val[,name=val,...]" | -T "name=val"}
srvctl unsetenv service -d db_name -s service_name -t name_list


Nodeapps:
#srvctl add nodeapps -n node_name -o ORACLE_HOME -A name|ip/netmask[/if1[|if2|...]]
#srvctl add nodeapps -n lnx02 -o $ORACLE_HOME -A 192.168.0.151/255.255.0.0/eth0


#srvctl remove nodeapps -n node_names [-f]


#srvctl start nodeapps -n node_name -- Starts GSD, VIP, listener & ONS
#srvctl stop nodeapps -n node_name [-r] -- Stops GSD, VIP, listener & ONS


#srvctl status nodeapps -n node_name


#srvctl config nodeapps -n node_name [-a] [-g] [-o] [-s] [-l]


#srvctl modify nodeapps -n node_name [-A new_vip_address]
#srvctl modify nodeapps -n lnx06 -A 10.50.99.43/255.255.252.0/eth0


#srvctl getenv nodeapps -n node_name [-t name_list]
#srvctl setenv nodeapps -n node_name {-t "name=val[,name=val,...]"|-T "name=val"}
#srvctl unsetenv nodeapps -n node_name [-t name_list]


ASM:
srvctl add asm -n node_name -i asminstance -o ORACLE_HOME [-p spfile]
srvctl remove asm -n node_name [-i asminstance] [-f]


srvctl start asm -n node_name [-i asminstance] [-o start_options] [-c connect_str | -q]
srvctl start asm -n node_name -i asminstance [-o open]
srvctl start asm -n node_name -i asminstance -o nomount
srvctl start asm -n node_name -i asminstance -o mount


srvctl stop asm -n node_name [-i asminstance] [-o stop_options] [-c connect_str | -q]
srvctl stop asm -n node_name -i asminstance [-o normal]
srvctl stop asm -n node_name -i asminstance -o transactional
srvctl stop asm -n node_name -i asminstance -o immediate
srvctl stop asm -n node_name -i asminstance -o abort


srvctl status asm -n node_name


srvctl enable asm -n node_name [-i asminstance]
srvctl disable asm -n node_name [-i asminstance]
srvctl config asm -n node_name
srvctl modify asm -n node_name -i asminstance [-o ORACLE_HOME] [-p spfile]


Listener:
srvctl add listener -n node_name -o ORACLE_HOME [-l listener_name] -- 11g command
srvctl remove listener -n node_name [-l listener_name] -- 11g command


srvctl start listener -n node_name [-l listener_names]
srvctl stop listener -n node_name [-l listener_names]


srvctl config listener -n node_name

Oracle Clusterware processes for 10g on Unix and Linux

What are Oracle Clusterware processes for 10g on Unix and Linux

Cluster Synchronization Services (ocssd) — Manages cluster node membership and runs as the oracle user; failure of this process results in cluster restart.

Cluster Ready Services (crsd) — The crsd process manages cluster resources (which could be a database, an instance, a service, a listener, a virtual IP (VIP) address, an application process, and so on) based on the resource's configuration information that is stored in the OCR. This includes start, stop, monitor and failover operations. This process runs as the root user.

Event manager daemon (evmd) —A background process that publishes events that crs creates.

Process Monitor Daemon (OPROCD) — This process monitors the cluster and provides I/O fencing. OPROCD performs its check, stops running, and if the wake-up is beyond the expected time, OPROCD resets the processor and reboots the node. An OPROCD failure results in Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on Linux platforms.

RACG (racgmain, racgimon) —Extends clusterware to support Oracle-specific requirements and complex resources. Runs server callout scripts when FAN events occur.

What are Oracle database background processes specific to RAC

•LMS—Global Cache Service Process

•LMD—Global Enqueue Service Daemon

•LMON—Global Enqueue Service Monitor

•LCK0—Instance Enqueue Process

To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy a query or transaction, Oracle RAC instances use two processes, the Global Cache Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file and each cached block using a Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances.

What are Oracle Clusterware Components

Voting Disk — Oracle RAC uses the voting disk to manage cluster membership by way of a health check and arbitrates cluster ownership among the instances in case of network failures. The voting disk must reside on shared disk.

Oracle Cluster Registry (OCR) — Maintains cluster configuration information as well as configuration information about any cluster database within the cluster. The OCR must reside on shared disk that is accessible by all of the nodes in your cluster

How do you troubleshoot node reboot

Please check metalink ...

Note 265769.1 Troubleshooting CRS Reboots
Note.559365.1 Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions.
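
It also helps to review the Clusterware logs on the evicted node along with the OS log. A minimal sketch of the usual 10gR2 locations, assuming $ORA_CRS_HOME is set and substituting your own host name:

ls -l $ORA_CRS_HOME/log/<hostname>/alert<hostname>.log
ls -l $ORA_CRS_HOME/log/<hostname>/crsd $ORA_CRS_HOME/log/<hostname>/cssd $ORA_CRS_HOME/log/<hostname>/evmd
tail -100 /var/log/messages     (Linux; the OS log often shows what triggered the reboot)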

How do you backup the OCR

There is an automatic backup mechanism for the OCR. The default location is: $ORA_CRS_HOME/cdata/<clustername>/

To display backups :
#ocrconfig -showbackup
To restore a backup :
#ocrconfig -restore <backup_file_name>

With Oracle RAC 10g Release 2 or later, you can also use the export command:
#ocrconfig -export <file_name> -s online, and use the -import option to restore the contents back.
With Oracle RAC 11g Release 1, you can do a manual backup of the OCR with the command:
# ocrconfig -manualbackup

How do you backup voting disk

#dd if=voting_disk_name of=backup_file_name

How do I identify the voting disk location

#crsctl query css votedisk

How do I identify the OCR file location

check /var/opt/oracle/ocr.loc or /etc/ocr.loc ( depends upon platform)
or
#ocrcheck

Is ssh required for normal Oracle RAC operation ?

"ssh" are not required for normal Oracle RAC operation. However "ssh" should be enabled for Oracle RAC and patchset installation.

What is SCAN?

Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC) 11g Release 2 feature that provides a single name for clients to access an Oracle Database running in a cluster. The benefit is that clients using SCAN do not need to change their connect strings if you add or remove nodes in the cluster.
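
For illustration, a client tnsnames.ora entry using SCAN could look like the following; the SCAN name (cluster-scan.example.com) and service name (orcl) here are hypothetical:

ORCL =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = cluster-scan.example.com)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = orcl)
)
)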


What is the purpose of Private Interconnect ?

Clusterware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the clustered nodes. This communication is based on the TCP protocol.
RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP). Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches of participating nodes in the cluster.

Why do we have a Virtual IP (VIP) in Oracle RAC?

Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error. As a result, you don't really have a good HA solution without using VIPs.
When a node fails, the VIP associated with it is automatically failed over to another node, and the new node re-ARPs the world, indicating a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which sends error RST packets back to the clients. This results in the clients getting errors immediately.

What do you do if you see GC CR BLOCK LOST in top 5 Timed Events in AWR Report?

This is most likely due to a fault in the interconnect network.
Check netstat -s.
If you see "fragments dropped" or "packet reassemblies failed", work with your system administrator to find the fault in the network.

How many nodes are supported in a RAC Database?

With 10g Release 2, Oracle Clusterware supports 100 nodes in a cluster, and a RAC database supports 100 instances.

srvctl cannot start an instance and returns errors PRKP-1001 and CRS-0215, but SQL*Plus can start it on both nodes. How do you identify the problem?

Set the environment variable SRVM_TRACE to TRUE and start the instance with srvctl. You will then get a detailed error stack.
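
A minimal sketch, using a hypothetical database prod and instance prod1:

export SRVM_TRACE=TRUE
srvctl start instance -d prod -i prod1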

what is the purpose of the ONS daemon?

The Oracle Notification Service (ONS) daemon is a daemon started by the CRS clusterware as part of the nodeapps. There is one ONS daemon started per clustered node.
The Oracle Notification Service daemon receives a subset of published clusterware events via the local evmd and racgimon clusterware daemons and forwards those events to application subscribers and to the local listeners.

This is in order to facilitate:

a. FAN, or Fast Application Notification, the feature that allows applications to respond to database state changes.
b. the 10gR2 Load Balancing Advisory, the feature that permits load balancing across different RAC nodes depending on the load on those nodes. The RDBMS MMON process creates an advisory for the distribution of work every 30 seconds and forwards it via racgimon and ONS to the listeners and applications.
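
To verify ONS on a node you can use the onsctl utility from the home that owns the ONS configuration; a minimal sketch (output varies by release):

onsctl ping
onsctl debug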

Tuesday, October 5, 2010

ASM REBALANCING

Dynamic Storage Configuration

ASM enables you to change the storage configuration without having to take the database offline. It automatically rebalances—redistributes file data evenly across all the disks of the disk group—after you add disks to or drop disks from a disk group.

Should a disk failure occur, ASM automatically rebalances to restore full redundancy for files that had extents on the failed disk. When you replace the failed disk with a new disk, ASM rebalances the disk group to spread data evenly across all disks, including the replacement disk.

Tuning Rebalance Operations

The V$ASM_OPERATION view provides information that can be used for adjusting ASM_POWER_LIMIT and the resulting power of rebalance operations. The V$ASM_OPERATION view also gives an estimate in the EST_MINUTES column of the amount of time remaining for the rebalance operation to complete. You can see the effect of changing the rebalance power by observing the change in the time estimate.
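
A hedged example of raising the rebalance power on the fly and watching progress, assuming a hypothetical disk group named DATA (power values range from 0 to 11 in 10g):

ALTER DISKGROUP data REBALANCE POWER 8;

SELECT group_number, operation, state, power, est_minutes
FROM v$asm_operation;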

Effects of Adding and Dropping Disks from a Disk Group

ASM automatically rebalances whenever disks are added or dropped. For a normal drop operation (without the FORCE option), a disk is not released from a disk group until data is moved off of the disk through rebalancing. Likewise, a newly added disk cannot support its share of the I/O workload until rebalancing completes. It is more efficient to add or drop multiple disks at the same time so that they are rebalanced as a single operation. This avoids unnecessary movement of data.
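
For example, adding and dropping disks in a single statement triggers only one rebalance; a minimal sketch with hypothetical disk path and disk names:

ALTER DISKGROUP data
ADD DISK '/dev/rdsk/c3t4d5s6' NAME data_0005
DROP DISK data_0001
REBALANCE POWER 6;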

For a drop operation, when rebalance is complete, ASM takes the disk offline momentarily, and then drops it, setting disk header status to FORMER.

You can add or drop disks without shutting down the database. However, a performance impact on I/O activity may result.

Testing the Fail over in Real Application Clusters Oracle10gR2 using service


How to test the Fail-over, if the Client is connected to a RAC database and one of the nodes goes down?
1. Client connected to Database from a SQL Plus session from a remote Workstation.
*********************************************
SQL> show user
USER is "SCOTT"
SQL> select INSTANCE_NAME, INSTANCE_NUMBER from v$INSTANCE;
INSTANCE_NAME INSTANCE_NUMBER
---------------- ---------------
devdb1 1
*********************************************
2. Checking the User/Client status from the DB Server
SQL> select USERNAME, INST_ID, MACHINE, TERMINAL, STATUS
2 from gv$session where USERNAME = 'SCOTT';
USERNAME INST_ID MACHINE TERMINAL STATUS
--------- ---------- ------------------------------ -------------- --------
SCOTT 1 WORKGROUP\PSDBA03-02 PSDBA03-02 INACTIVE
****************************************
3. On the DB Server, force-kill the instance devdb1.
4. The status of the DB instance devdb1 is now down.
Name Type Target State Host
------------------------------------------------------------
ora....SDEV.cs application ONLINE ONLINE rac2
ora....db1.srv application ONLINE OFFLINE
ora....db2.srv application ONLINE ONLINE rac2
ora.devdb.db application ONLINE ONLINE rac2
ora....b1.inst application OFFLINE OFFLINE
ora....b2.inst application ONLINE ONLINE rac2
ora....SM1.asm application ONLINE ONLINE rac1
ora....C1.lsnr application ONLINE ONLINE rac1
ora.rac1.gsd application ONLINE ONLINE rac1
ora.rac1.ons application ONLINE ONLINE rac1
ora.rac1.vip application ONLINE ONLINE rac1
ora....SM2.asm application ONLINE ONLINE rac2
ora....C2.lsnr application ONLINE ONLINE rac2
ora.rac2.gsd application ONLINE ONLINE rac2
ora.rac2.ons application ONLINE ONLINE rac2
ora.rac2.vip application ONLINE ONLINE rac2
****************************
5. Here you can see the difference from a normal service: a plain SQL*Plus connection would throw an error, but with client-side load balancing and TAF the user connection still exists. For the user the failover is transparent, and the session can continue without the need to reconnect.
SQL> select INSTANCE_NAME from v$instance;
INSTANCE_NAME
----------------
devdb2
SQL> show user
USER is "SCOTT"
This is a simple test for fail-over, with a SQL Plus connections from a remote server.
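
For reference, the client-side entry that gives this behaviour combines connect-time load balancing with a TAF failover clause; a minimal sketch with hypothetical VIP host names and service name:

DEVDB_TAF =
(DESCRIPTION =
(ADDRESS_LIST =
(LOAD_BALANCE = ON)
(FAILOVER = ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = rac1-vip)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = rac2-vip)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = devdb_srv)
(FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 20)(DELAY = 5))
)
)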

Monday, October 4, 2010

RAC Switchover / Switchback

RAC Switchover / Switchback for 2 Node Primary with 2 Node Standby. Got from oracle forums.
DB Name DB Unique Name Host Name Instance Name

--------------------------------------------------------------------------------
live live linux1 live1
live live linux2 live2
live livestdby linux3 livestdby1
live livestdby linux4 livestdby2

Verify that each database is properly configured for the role it is about to assume and the standby database is in mounted state.
(Verify all Dataguard parameters on each node for Primary & Standby)

Like,
Log_archive_dest_1
Log_archive_dest_2
Log_archive_dest_state_1
Log_archive_dest_state_2
Fal_client
Fal_server
Local_listener
Remote_listener
Standby_archive_Dest
Standby_file_management
service_names
db_unique_name
instance_name
db_file_name_convert
log_file_name_convert

Verify that both Primary RAC & Dataguard RAC are functioning properly and both are in Sync
On Primary,
Select thread#,max(sequence#) from v$archived_log group by thread#;
On Standby,
Select thread#,max(sequence#) from v$log_history group by thread#;

Before performing a switchover from a RAC primary shut down all but one primary instance (they can be restarted after the switchover has completed).
./srvctl stop instance -d live -i live1
Before performing a switchover or a failover to a RAC standby shut down all but one standby instance (they can be restarted after the role transition has completed).
./srvctl stop instance -d live -i livestdby1

On the primary database initiate the switchover:
alter database commit to switchover to physical standby with session shutdown;
Shutdown former Primary database & Startup in Mount State.
Shut immediate;
Startup mount;
select name,db_unique_name, log_mode,open_mode,controlfile_type,switchover_status,database_role from v$database;

Make log_Archive_Dest_state_2 to DEFER
alter system set log_archive_dest_state_2='DEFER' sid='*';

On the (old) standby database,
select name,log_mode,open_mode,controlfile_type,switchover_status,database_role from v$database;

On the (old) standby database switch to new primary role:
alter database commit to switchover to primary;
shut immediate;
startup;

On new Primary database,
select name,log_mode,open_mode,controlfile_type,switchover_status,database_role from v$database;
Make log_Archive_Dest_state_2 to ENABLE
alter system set log_archive_dest_state_2='ENABLE' sid='*';
Add tempfiles in New Primary database.
Do some archivelog switches on new primary database & verify that archives are getting transferred to Standby database.

On new primary,
select error from v$archive_Dest_status;
select max(sequence#) from v$archived_log;

On new Standby, Start Redo Apply

alter database recover managed standby database using current logfile disconnect;

Select max(sequence#) from v$log_history; (should be matching with Primary)



Now Start RAC databases services (both Primary – in open & Standby – in mount)
On new Primary Server.
./srvctl start instance -d live -i livestdby2
Verify using ./crs_stat -t
Check that database is opened in R/W mode.
On new Standby Server.
./srvctl start instance -d live -i live2 -o mount

Now add TAF services on new Primary (former Standby) Server.
By Command Prompt,
./srvctl add service -d live -s srvc_livestdby -r livestdby1,livestdby2 -P BASIC
OR
By GUI,
dbca -> Oracle Real Application Clusters database -> Service Management -> select database -> add services, details (Preferred / Available), TAF Policy (Basic / Preconnect) -> Finish

Start the services,
./srvctl start service -d live

Verify the same,
./crs_stat -t

Perform TAF testing, to make sure Load Balancing

OCR is automatically backed up

OCR is automatically backed up physically:
Every four hours: CRS keeps the last three copies.
At the end of every day: CRS keeps the last two copies.
At the end of every week: CRS keeps the last two copies

[oracle@rac2 ~]$ ocrconfig -showbackup

rac2 2010/10/04 10:29:12 /u01/app/oracle/product/crs/cdata/crs

rac2 2010/10/04 06:29:11 /u01/app/oracle/product/crs/cdata/crs

rac2 2010/10/04 02:29:11 /u01/app/oracle/product/crs/cdata/crs

rac2 2010/10/03 02:29:07 /u01/app/oracle/product/crs/cdata/crs

dbrac1 2010/09/17 04:17:06 /u01/app/oracle/product/crs/cdata/crs
[oracle@rac2 ~]$ cd /u01/app/oracle/product/crs/cdata/crs
[oracle@rac2 crs]$ ls -ltr
total 27916
-rw-r--r-- 1 root root 4079616 Jun 13 02:16 week.ocr
-rw-r--r-- 1 root root 4079616 Sep 29 00:58 week_.ocr
-rw-r--r-- 1 root root 4079616 Oct 3 02:29 day.ocr
-rw-r--r-- 1 root root 4079616 Oct 4 02:29 day_.ocr
-rw-r--r-- 1 root root 4079616 Oct 4 02:29 backup02.ocr
-rw-r--r-- 1 root root 4079616 Oct 4 06:29 backup01.ocr
-rw-r--r-- 1 root root 4079616 Oct 4 10:29 backup00.ocr

Wednesday, September 29, 2010

Checkpoint Tuning

Determining the time to recover from an instance failure is a necessary component of meeting required service level agreements. For example, if service levels dictate that when a node fails, instance recovery time can be no more than 3 minutes, FAST_START_MTTR_TARGET should be set to 180.

Fast-start checkpointing refers to the periodic writes by the database writer (DBWn) processes for the purpose of writing changed data blocks from the Oracle buffer cache to disk and advancing the thread-checkpoint. Setting the database parameter FAST_START_MTTR_TARGET to a value greater than zero enables the fast-start checkpointing feature.

Fast-start checkpointing should always be enabled for the following reasons:

It reduces the time required for cache recovery, and makes instance recovery time-bounded and predictable. This is accomplished by limiting the number of dirty buffers (data blocks which have changes in memory that still need to be written to disk) and the number of redo records (changes in the database) generated between the most recent redo record and the last checkpoint.

Fast-start checkpointing eliminates bulk writes and the corresponding I/O spikes that occur traditionally with interval-based checkpoints, providing a smoother, more consistent I/O pattern that is more predictable and easier to manage.
If the system is not already near or at its maximum I/O capacity, fast-start checkpointing will have a negligible impact on performance. Although fast-start checkpointing results in increased write activity, there is little reduction in database throughput, provided the system has sufficient I/O capacity.

Check-Pointing

Check-pointing is an important Oracle activity which records the highest system change number (SCN) such that all data blocks with changes at or below that SCN are known to have been written out to the data files. If there is a failure and then subsequent cache recovery, only the redo records containing changes at SCNs higher than the checkpoint need to be applied during recovery.

As we are aware, instance and crash recovery occur in two steps - cache recovery followed by transaction recovery. During the cache recovery phase, also known as the rolling forward stage, Oracle applies all committed and uncommitted changes in the redo log files to the affected data blocks. The work required for cache recovery processing is proportional to the rate of change to the database and the time between checkpoints.

Mean time to recover (MTTR)

Fast-start recovery can greatly reduce the mean time to recover (MTTR), with minimal effects on online application performance. Oracle continuously estimates the recovery time and automatically adjusts the check-pointing rate to meet the target recovery time.

With 10g, the Oracle database can now self-tune check-pointing to achieve good recovery times with low impact on normal throughput. You no longer have to set any checkpoint-related parameters.

This method reduces the time required for cache recovery and makes the recovery bounded and predictable by limiting the number of dirty buffers and the number of redo records generated between the most recent redo record and the last checkpoint. Administrators specify a target (bounded) time to complete the cache recovery phase of recovery with the FAST_START_MTTR_TARGET initialization parameter, and Oracle automatically varies the incremental checkpoint writes to meet that target.

The TARGET_MTTR field of V$INSTANCE_RECOVERY contains the MTTR target in effect. The ESTIMATED_MTTR field of V$INSTANCE_RECOVERY contains the estimated MTTR should a crash happen right away.

Enable MTTR advisory

Enabling MTTR Advisory Enabling MTTR Advisory involves setting two parameters:

STATISTICS_LEVEL = TYPICAL
FAST_START_MTTR_TARGET > 0

Estimate the value for FAST_START_MTTR_TARGET as follows:

SELECT TARGET_MTTR,
ESTIMATED_MTTR,
CKPT_BLOCK_WRITES
FROM V$INSTANCE_RECOVERY;

TARGET_MTTR ESTIMATED_MTTR CKPT_BLOCK_WRITES
----------- -------------- -----------------
214 12 269880

FAST_START_MTTR_TARGET = 214;

Whenever you set FAST_START_MTTR_TARGET to a nonzero value, then set the following parameters to 0.

LOG_CHECKPOINT_TIMEOUT = 0
LOG_CHECKPOINT_INTERVAL = 0
FAST_START_IO_TARGET = 0
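
If an spfile is in use, the target can also be set dynamically; a minimal sketch assuming a 180-second target:

ALTER SYSTEM SET FAST_START_MTTR_TARGET = 180 SCOPE = BOTH SID = '*';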

Disable MTTR advisory

FAST_START_MTTR_TARGET = 0
LOG_CHECKPOINT_INTERVAL = 200000

Tuesday, September 28, 2010

Duplicate Oracle Database with RMAN

Overview
A powerful feature of RMAN is the ability to duplicate (clone) a database from a backup. It is possible to create a duplicate database on:

A remote server with the same file structure
A remote server with a different file structure
The local server with a different file structure
A duplicate database is distinct from a standby database, although both types of databases are created with the DUPLICATE command. A standby database is a copy of the primary database that you can update continually or periodically by using archived logs from the primary database. If the primary database is damaged or destroyed, then you can perform failover to the standby database and effectively transform it into the new primary database. A duplicate database, on the other hand, cannot be used in this way: it is not intended for failover scenarios and does not support the various standby recovery and failover options.

To prepare for database duplication, you must first create an auxiliary instance. For the duplication to work, you must connect RMAN to both the target (primary) database and an auxiliary instance started in NOMOUNT mode.

So long as RMAN is able to connect to the primary and duplicate instances, the RMAN client can run on any machine. However, all backups, copies of datafiles, and archived logs used for creating and recovering the duplicate database must be accessible by the server session on the duplicate host.

As part of the duplicating operation, RMAN manages the following:

Restores the target datafiles to the duplicate database and performs incomplete recovery by using all available backups and archived logs.

Shuts down and starts the auxiliary database.

Opens the duplicate database with the RESETLOGS option after incomplete recovery to create the online redo logs.

Generates a new, unique DBID for the duplicate database.

Preparing the Duplicate (Auxiliary) Instance for Duplication
Create an Oracle Password File

First we must create a password file for the duplicate instance.

export ORACLE_SID=APP2
orapwd file=orapwAPP2 password=manager entries=5 force=y

Ensure Oracle Net Connectivity to both Instances

Next add the appropriate entries into the TNSNAMES.ORA and LISTENER.ORA files in the $TNS_ADMIN directory.

LISTENER.ORA

APP1 = Target Database, APP2 = Auxiliary Database

LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = gentic)(PORT = 1521))
)
)

SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(GLOBAL_DBNAME = APP1.WORLD)
(ORACLE_HOME = /opt/oracle/product/10.2.0)
(SID_NAME = APP1)
)
(SID_DESC =
(GLOBAL_DBNAME = APP2.WORLD)
(ORACLE_HOME = /opt/oracle/product/10.2.0)
(SID_NAME = APP2)
)
)

TNSNAMES.ORA

APP1 = Target Database, APP2 = Auxiliary Database

APP1.WORLD =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = gentic)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = APP1.WORLD)
)
)

APP2.WORLD =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = gentic)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = APP2.WORLD)
)
)

SQLNET.ORA

NAMES.DIRECTORY_PATH= (TNSNAMES)
NAMES.DEFAULT_DOMAIN = WORLD
NAME.DEFAULT_ZONE = WORLD
USE_DEDICATED_SERVER = ON

Now restart the Listener

lsnrctl stop
lsnrctl start

Create an Initialization Parameter File for the Auxiliary Instance

Create an INIT.ORA parameter file for the auxiliary instance; you can copy the file from the target instance and then modify the parameters.

### Duplicate Database
### -----------------------------------------------
# This is only used when you duplicate the database
# on the same host to avoid name conflicts

DB_FILE_NAME_CONVERT = (/u01/oracle/db/APP1/,/u01/oracle/db/APP2/)
LOG_FILE_NAME_CONVERT = (/u01/oracle/db/APP1/,/u01/oracle/db/APP2/,
/opt/oracle/db/APP1/,/opt/oracle/db/APP2/)

### Global database name is db_name.db_domain
### -----------------------------------------

db_name = APP2
db_unique_name = APP2_GENTIC
db_domain = WORLD
service_names = APP2
instance_name = APP2

### Basic Configuration Parameters
### ------------------------------

compatible = 10.2.0.4
db_block_size = 8192
db_file_multiblock_read_count = 32
db_files = 512
control_files = /u01/oracle/db/APP2/con/APP2_con01.con,
/opt/oracle/db/APP2/con/APP2_con02.con

### Database Buffer Cache, I/O
### --------------------------
# The Parameter SGA_TARGET enables Automatic Shared Memory Management

sga_target = 500M
sga_max_size = 600M

### REDO Logging without Data Guard
### -------------------------------

log_archive_format = APP2_%s_%t_%r.arc
log_archive_max_processes = 2
log_archive_dest = /u01/oracle/db/APP2/arc

### System Managed Undo
### -------------------

undo_management = auto
undo_retention = 10800
undo_tablespace = undo

### Traces, Dumps and Passwordfile
### ------------------------------

audit_file_dest = /u01/oracle/db/APP2/adm/admp
user_dump_dest = /u01/oracle/db/APP2/adm/udmp
background_dump_dest = /u01/oracle/db/APP2/adm/bdmp
core_dump_dest = /u01/oracle/db/APP2/adm/cdmp
utl_file_dir = /u01/oracle/db/APP2/adm/utld
remote_login_passwordfile = exclusive

Create a full Database Backup

Make sure that a full backup of the target is accessible on the duplicate host. You can use the following BASH script to backup the target database.

rman nocatalog target / <<-EOF
configure retention policy to recovery window of 3 days;
configure backup optimization on;
configure controlfile autobackup on;
configure default device type to disk;
configure device type disk parallelism 1 backup type to compressed backupset;
configure datafile backup copies for device type disk to 1;
configure maxsetsize to unlimited;
configure snapshot controlfile name to '/u01/backup/snapshot_controlfile';
show all;

run {
allocate channel ch1 type Disk maxpiecesize = 1900M;
backup full database noexclude
include current controlfile
format '/u01/backup/datafile_%s_%p.bak'
tag 'datafile_daily';
}

run {
allocate channel ch1 type Disk maxpiecesize = 1900M;
backup archivelog all
delete all input
format '/u01/backup/archivelog_%s_%p.bak'
tag 'archivelog_daily';
}

run {
allocate channel ch1 type Disk maxpiecesize = 1900M;
backup format '/u01/backup/controlfile_%s.bak' current controlfile;
}

crosscheck backup;
list backup of database;
report unrecoverable;
report schema;
report need backup;
report obsolete;
delete noprompt expired backup of database;
delete noprompt expired backup of controlfile;
delete noprompt expired backup of archivelog all;
delete noprompt obsolete recovery window of 3 days;
quit
EOF

Creating a Duplicate Database on the Local Host
Before beginning RMAN duplication, use SQL*Plus to connect to the auxiliary instance and start it in NOMOUNT mode. If you do not have a server-side initialization parameter file for the auxiliary instance in the default location, then you must specify the client-side initialization parameter file with the PFILE parameter on the DUPLICATE command.



Get original Filenames from TARGET

To rename the database files you can use the SET NEWNAME command. Therefore, get the original filenames from the target and modify these names in the DUPLICATE command.

ORACLE_SID=APP1
export ORACLE_SID

set feed off
set pagesize 10000
column name format a40 heading "Datafile"
column file# format 99 heading "File-ID"

select name, file# from v$dbfile;

column member format a40 heading "Logfile"
column group# format 99 heading "Group-Nr"

select member, group# from v$logfile;

Datafile File-ID
---------------------------------------- -------
/u01/oracle/db/APP1/sys/APP1_sys1.dbf 1
/u01/oracle/db/APP1/sys/APP1_undo1.dbf 2
/u01/oracle/db/APP1/sys/APP1_sysaux1.dbf 3
/u01/oracle/db/APP1/usr/APP1_users1.dbf 4

Logfile Group-Nr
---------------------------------------- --------
/u01/oracle/db/APP1/rdo/APP1_log1A.rdo 1
/opt/oracle/db/APP1/rdo/APP1_log1B.rdo 1
/u01/oracle/db/APP1/rdo/APP1_log2A.rdo 2
/opt/oracle/db/APP1/rdo/APP1_log2B.rdo 2
/u01/oracle/db/APP1/rdo/APP1_log3A.rdo 3
/opt/oracle/db/APP1/rdo/APP1_log3B.rdo 3
/u01/oracle/db/APP1/rdo/APP1_log4A.rdo 4
/opt/oracle/db/APP1/rdo/APP1_log4B.rdo 4
/u01/oracle/db/APP1/rdo/APP1_log5A.rdo 5
/opt/oracle/db/APP1/rdo/APP1_log5B.rdo 5
/u01/oracle/db/APP1/rdo/APP1_log6A.rdo 6
/opt/oracle/db/APP1/rdo/APP1_log6B.rdo 6
/u01/oracle/db/APP1/rdo/APP1_log7A.rdo 7
/opt/oracle/db/APP1/rdo/APP1_log7B.rdo 7
/u01/oracle/db/APP1/rdo/APP1_log8A.rdo 8
/opt/oracle/db/APP1/rdo/APP1_log8B.rdo 8
/u01/oracle/db/APP1/rdo/APP1_log9A.rdo 9
/opt/oracle/db/APP1/rdo/APP1_log9B.rdo 9
/u01/oracle/db/APP1/rdo/APP1_log10A.rdo 10
/opt/oracle/db/APP1/rdo/APP1_log10B.rdo 10

Create Directories for the duplicate Database

mkdir -p /u01/oracle/db/APP2
mkdir -p /opt/oracle/db/APP2
cd /opt/oracle/db/APP2
mkdir con rdo
cd /u01/oracle/db/APP2
mkdir adm arc con rdo sys tmp usr bck
cd adm
mkdir admp bdmp cdmp udmp utld

Create Symbolic Links to Password and INIT.ORA File

Oracle must be able to locate the Password and INIT.ORA File.

cd $ORACLE_HOME/dbs
ln -s /home/oracle/config/10.2.0/orapwAPP2 orapwAPP2
ln -s /home/oracle/config/10.2.0/initAPP2.ora initAPP2.ora

Duplicate the Database

Now you are ready to duplicate the database APP1 to APP2.

ORACLE_SID=APP2
export ORACLE_SID

sqlplus sys/manager as sysdba
startup force nomount pfile='/home/oracle/config/10.2.0/initAPP2.ora';
exit;

rman TARGET sys/manager@APP1 AUXILIARY sys/manager@APP2

Recovery Manager: Release 10.2.0.4.0 - Production on Tue Oct 28 12:00:13 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
connected to target database: APP1 (DBID=3191823649)
connected to auxiliary database: APP2 (not mounted)

RUN
{
SET NEWNAME FOR DATAFILE 1 TO '/u01/oracle/db/APP2/sys/APP2_sys1.dbf';
SET NEWNAME FOR DATAFILE 2 TO '/u01/oracle/db/APP2/sys/APP2_undo1.dbf';
SET NEWNAME FOR DATAFILE 3 TO '/u01/oracle/db/APP2/sys/APP2_sysaux1.dbf';
SET NEWNAME FOR DATAFILE 4 TO '/u01/oracle/db/APP2/usr/APP2_users1.dbf';
DUPLICATE TARGET DATABASE TO APP2
PFILE = /home/oracle/config/10.2.0/initAPP2.ora
NOFILENAMECHECK
LOGFILE GROUP 1 ('/u01/oracle/db/APP2/rdo/APP2_log1A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log1B.rdo') SIZE 10M REUSE,
GROUP 2 ('/u01/oracle/db/APP2/rdo/APP2_log2A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log2B.rdo') SIZE 10M REUSE,
GROUP 3 ('/u01/oracle/db/APP2/rdo/APP2_log3A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log3B.rdo') SIZE 10M REUSE,
GROUP 4 ('/u01/oracle/db/APP2/rdo/APP2_log4A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log4B.rdo') SIZE 10M REUSE,
GROUP 5 ('/u01/oracle/db/APP2/rdo/APP2_log5A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log5B.rdo') SIZE 10M REUSE,
GROUP 6 ('/u01/oracle/db/APP2/rdo/APP2_log6A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log6B.rdo') SIZE 10M REUSE,
GROUP 7 ('/u01/oracle/db/APP2/rdo/APP2_log7A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log7B.rdo') SIZE 10M REUSE,
GROUP 8 ('/u01/oracle/db/APP2/rdo/APP2_log8A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log8B.rdo') SIZE 10M REUSE,
GROUP 9 ('/u01/oracle/db/APP2/rdo/APP2_log9A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log9B.rdo') SIZE 10M REUSE,
GROUP 10 ('/u01/oracle/db/APP2/rdo/APP2_log10A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log10B.rdo') SIZE 10M REUSE;
}

The whole, long output is not shown here, but check that RMAN was able to open the duplicate database with the RESETLOGS option.

.....
.....
contents of Memory Script:
{
Alter clone database open resetlogs;
}
executing Memory Script

database opened
Finished Duplicate Db at 28-OCT-08

As the final step, remove or comment out the DB_FILE_NAME_CONVERT and LOG_FILE_NAME_CONVERT parameters in the INIT.ORA file and restart the database.

initAPP2.ora

### Duplicate Database
### -----------------------------------------------
# This is only used when you duplicate the database
# on the same host to avoid name conflicts
# DB_FILE_NAME_CONVERT = (/u01/oracle/db/APP1/,/u01/oracle/db/APP2/)
# LOG_FILE_NAME_CONVERT = (/u01/oracle/db/APP1/,/u01/oracle/db/APP2/,
/opt/oracle/db/APP1/,/opt/oracle/db/APP2/)

sqlplus / as sysdba
shutdown immediate;
startup;

Total System Global Area 629145600 bytes
Fixed Size 1269064 bytes
Variable Size 251658936 bytes
Database Buffers 373293056 bytes
Redo Buffers 2924544 bytes
Database mounted.
Database opened.

Creating a Duplicate Database to Remote Host
This scenario is exactly the same as described for the local host. Copy the RMAN backup files to the remote host into the same directory as on the local host.



cd /u01/backup
scp gentic:/u01/backup/* .

The other steps are the same as described under «Creating a Duplicate Database on the Local Host».

Duplicate Oracle Database with RMAN

Overview
A powerful feature of RMAN is the ability to duplicate (clone), a database from a backup. It is possible to create a duplicate database on:

A remote server with the same file structure
A remote server with a different file structure
The local server with a different file structure
A duplicate database is distinct from a standby database, although both types of databases are created with the DUPLICATE command. A standby database is a copy of the primary database that you can update continually or periodically by using archived logs from the primary database. If the primary database is damaged or destroyed, then you can perform failover to the standby database and effectively transform it into the new primary database. A duplicate database, on the other hand, cannot be used in this way: it is not intended for failover scenarios and does not support the various standby recovery and failover options.

To prepare for database duplication, you must first create an auxiliary instance. For the duplication to work, you must connect RMAN to both the target (primary) database and an auxiliary instance started in NOMOUNT mode.

So long as RMAN is able to connect to the primary and duplicate instances, the RMAN client can run on any machine. However, all backups, copies of datafiles, and archived logs used for creating and recovering the duplicate database must be accessible by the server session on the duplicate host.

As part of the duplicating operation, RMAN manages the following:

Restores the target datafiles to the duplicate database and performs incomplete recovery by using all available backups and archived logs.

Shuts down and starts the auxiliary database.

Opens the duplicate database with the RESETLOGS option after incomplete recovery to create the online redo logs.

Generates a new, unique DBID for the duplicate database.

Preparing the Duplicate (Auxiliary) Instance for Duplication
Create an Oracle Password File

First we must create a password file for the duplicate instance.

export ORACLE_SID=APP2
orapwd file=orapwAPP2 password=manager entries=5 force=y

Ensure Oracle Net Connectivity to both Instances

Next add the appropriate entries into the TNSNAMES.ORA and LISTENER.ORA files in the $TNS_ADMIN directory.

LISTENER.ORA

APP1 = Target Database, APP2 = Auxiliary Database

LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = gentic)(PORT = 1521))
)
)

SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(GLOBAL_DBNAME = APP1.WORLD)
(ORACLE_HOME = /opt/oracle/product/10.2.0)
(SID_NAME = APP1)
)
(SID_DESC =
(GLOBAL_DBNAME = APP2.WORLD)
(ORACLE_HOME = /opt/oracle/product/10.2.0)
(SID_NAME = APP2)
)
)

TNSNAMES.ORA

APP1 = Target Database, APP2 = Auxiliary Database

APP1.WORLD =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = gentic)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = APP1.WORLD)
)
)

APP2.WORLD =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = gentic)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = APP2.WORLD)
)
)

SQLNET.ORA

NAMES.DIRECTORY_PATH= (TNSNAMES)
NAMES.DEFAULT_DOMAIN = WORLD
NAME.DEFAULT_ZONE = WORLD
USE_DEDICATED_SERVER = ON

Now restart the Listener

lsnrctl stop
lsnrctl start

Create an Initialization Parameter File for the Auxiliary Instance

Create an INIT.ORA parameter file for the auxiliary instance, you can copy that from the target instance and then modify the parameters.

### Duplicate Database
### -----------------------------------------------
# This is only used when you duplicate the database
# on the same host to avoid name conflicts

DB_FILE_NAME_CONVERT = (/u01/oracle/db/APP1/,/u01/oracle/db/APP2/)
LOG_FILE_NAME_CONVERT = (/u01/oracle/db/APP1/,/u01/oracle/db/APP2/,
/opt/oracle/db/APP1/,/opt/oracle/db/APP2/)

### Global database name is db_name.db_domain
### -----------------------------------------

db_name = APP2
db_unique_name = APP2_GENTIC
db_domain = WORLD
service_names = APP2
instance_name = APP2

### Basic Configuration Parameters
### ------------------------------

compatible = 10.2.0.4
db_block_size = 8192
db_file_multiblock_read_count = 32
db_files = 512
control_files = /u01/oracle/db/APP2/con/APP2_con01.con,
/opt/oracle/db/APP2/con/APP2_con02.con

### Database Buffer Cache, I/O
### --------------------------
# The Parameter SGA_TARGET enables Automatic Shared Memory Management

sga_target = 500M
sga_max_size = 600M

### REDO Logging without Data Guard
### -------------------------------

log_archive_format = APP2_%s_%t_%r.arc
log_archive_max_processes = 2
log_archive_dest = /u01/oracle/db/APP2/arc

### System Managed Undo
### -------------------

undo_management = auto
undo_retention = 10800
undo_tablespace = undo

### Traces, Dumps and Passwordfile
### ------------------------------

audit_file_dest = /u01/oracle/db/APP2/adm/admp
user_dump_dest = /u01/oracle/db/APP2/adm/udmp
background_dump_dest = /u01/oracle/db/APP2/adm/bdmp
core_dump_dest = /u01/oracle/db/APP2/adm/cdmp
utl_file_dir = /u01/oracle/db/APP2/adm/utld
remote_login_passwordfile = exclusive

Create a full Database Backup

Make sure that a full backup of the target is accessible on the duplicate host. You can use the following BASH script to backup the target database.

rman nocatalog target / <<-EOF
configure retention policy to recovery window of 3 days;
configure backup optimization on;
configure controlfile autobackup on;
configure default device type to disk;
configure device type disk parallelism 1 backup type to compressed backupset;
configure datafile backup copies for device type disk to 1;
configure maxsetsize to unlimited;
configure snapshot controlfile name to '/u01/backup/snapshot_controlfile';
show all;

run {
allocate channel ch1 type Disk maxpiecesize = 1900M;
backup full database noexclude
include current controlfile
format '/u01/backup/datafile_%s_%p.bak'
tag 'datafile_daily';
}

run {
allocate channel ch1 type Disk maxpiecesize = 1900M;
backup archivelog all
delete all input
format '/u01/backup/archivelog_%s_%p.bak'
tag 'archivelog_daily';
}

run {
allocate channel ch1 type Disk maxpiecesize = 1900M;
backup format '/u01/backup/controlfile_%s.bak' current controlfile;
}

crosscheck backup;
list backup of database;
report unrecoverable;
report schema;
report need backup;
report obsolete;
delete noprompt expired backup of database;
delete noprompt expired backup of controlfile;
delete noprompt expired backup of archivelog all;
delete noprompt obsolete recovery window of 3 days;
quit
EOF

Creating a Duplicate Database on the Local Host
Before beginning RMAN duplication, use SQL*Plus to connect to the auxiliary instance and start it in NOMOUNT mode. If you do not have a server-side initialization parameter file for the auxiliary instance in the default location, then you must specify the client-side initialization parameter file with the PFILE parameter on the DUPLICATE command.



Get original Filenames from TARGET

To rename the database files you can use the SET NEWNAME command. Therefore, get the original filenames from the target and modify these names in the DUPLICATE command.

ORACLE_SID=APP1
export ORACLE_SID

set feed off
set pagesize 10000
column name format a40 heading "Datafile"
column file# format 99 heading "File-ID"

select name, file# from v$dbfile;

column member format a40 heading "Logfile"
column group# format 99 heading "Group-Nr"

select member, group# from v$logfile;

Datafile File-ID
---------------------------------------- -------
/u01/oracle/db/APP1/sys/APP1_sys1.dbf 1
/u01/oracle/db/APP1/sys/APP1_undo1.dbf 2
/u01/oracle/db/APP1/sys/APP1_sysaux1.dbf 3
/u01/oracle/db/APP1/usr/APP1_users1.dbf 4

Logfile Group-Nr
---------------------------------------- --------
/u01/oracle/db/APP1/rdo/APP1_log1A.rdo 1
/opt/oracle/db/APP1/rdo/APP1_log1B.rdo 1
/u01/oracle/db/APP1/rdo/APP1_log2A.rdo 2
/opt/oracle/db/APP1/rdo/APP1_log2B.rdo 2
/u01/oracle/db/APP1/rdo/APP1_log3A.rdo 3
/opt/oracle/db/APP1/rdo/APP1_log3B.rdo 3
/u01/oracle/db/APP1/rdo/APP1_log4A.rdo 4
/opt/oracle/db/APP1/rdo/APP1_log4B.rdo 4
/u01/oracle/db/APP1/rdo/APP1_log5A.rdo 5
/opt/oracle/db/APP1/rdo/APP1_log5B.rdo 5
/u01/oracle/db/APP1/rdo/APP1_log6A.rdo 6
/opt/oracle/db/APP1/rdo/APP1_log6B.rdo 6
/u01/oracle/db/APP1/rdo/APP1_log7A.rdo 7
/opt/oracle/db/APP1/rdo/APP1_log7B.rdo 7
/u01/oracle/db/APP1/rdo/APP1_log8A.rdo 8
/opt/oracle/db/APP1/rdo/APP1_log8B.rdo 8
/u01/oracle/db/APP1/rdo/APP1_log9A.rdo 9
/opt/oracle/db/APP1/rdo/APP1_log9B.rdo 9
/u01/oracle/db/APP1/rdo/APP1_log10A.rdo 10
/opt/oracle/db/APP1/rdo/APP1_log10B.rdo 10

Create Directories for the duplicate Database

mkdir -p /u01/oracle/db/APP2
mkdir -p /opt/oracle/db/APP2
cd /opt/oracle/db/APP2
mkdir con rdo
cd /u01/oracle/db/APP2
mkdir adm arc con rdo sys tmp usr bck
cd adm
mkdir admp bdmp cdmp udmp utld

Create Symbolic Links to Password and INIT.ORA File

Oracle must be able to locate the password file and the INIT.ORA file.

cd $ORACLE_HOME/dbs
ln -s /home/oracle/config/10.2.0/orapwAPP2 orapwAPP2
ln -s /home/oracle/config/10.2.0/initAPP2.ora initAPP2.ora

Duplicate the Database

Now you are ready to duplicate the database APP1 to APP2.

ORACLE_SID=APP2
export ORACLE_SID

sqlplus sys/manager as sysdba
startup force nomount pfile='/home/oracle/config/10.2.0/initAPP2.ora';
exit;

rman TARGET sys/manager@APP1 AUXILIARY sys/manager@APP2

Recovery Manager: Release 10.2.0.4.0 - Production on Tue Oct 28 12:00:13 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
connected to target database: APP1 (DBID=3191823649)
connected to auxiliary database: APP2 (not mounted)

RUN
{
SET NEWNAME FOR DATAFILE 1 TO '/u01/oracle/db/APP2/sys/APP2_sys1.dbf';
SET NEWNAME FOR DATAFILE 2 TO '/u01/oracle/db/APP2/sys/APP2_undo1.dbf';
SET NEWNAME FOR DATAFILE 3 TO '/u01/oracle/db/APP2/sys/APP2_sysaux1.dbf';
SET NEWNAME FOR DATAFILE 4 TO '/u01/oracle/db/APP2/usr/APP2_users1.dbf';
DUPLICATE TARGET DATABASE TO APP2
PFILE = /home/oracle/config/10.2.0/initAPP2.ora
NOFILENAMECHECK
LOGFILE GROUP 1 ('/u01/oracle/db/APP2/rdo/APP2_log1A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log1B.rdo') SIZE 10M REUSE,
GROUP 2 ('/u01/oracle/db/APP2/rdo/APP2_log2A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log2B.rdo') SIZE 10M REUSE,
GROUP 3 ('/u01/oracle/db/APP2/rdo/APP2_log3A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log3B.rdo') SIZE 10M REUSE,
GROUP 4 ('/u01/oracle/db/APP2/rdo/APP2_log4A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log4B.rdo') SIZE 10M REUSE,
GROUP 5 ('/u01/oracle/db/APP2/rdo/APP2_log5A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log5B.rdo') SIZE 10M REUSE,
GROUP 6 ('/u01/oracle/db/APP2/rdo/APP2_log6A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log6B.rdo') SIZE 10M REUSE,
GROUP 7 ('/u01/oracle/db/APP2/rdo/APP2_log7A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log7B.rdo') SIZE 10M REUSE,
GROUP 8 ('/u01/oracle/db/APP2/rdo/APP2_log8A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log8B.rdo') SIZE 10M REUSE,
GROUP 9 ('/u01/oracle/db/APP2/rdo/APP2_log9A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log9B.rdo') SIZE 10M REUSE,
GROUP 10 ('/u01/oracle/db/APP2/rdo/APP2_log10A.rdo',
'/opt/oracle/db/APP2/rdo/APP2_log10B.rdo') SIZE 10M REUSE;
}

The full output is not shown here, but check that RMAN was able to open the duplicate database with the RESETLOGS option.

.....
.....
contents of Memory Script:
{
Alter clone database open resetlogs;
}
executing Memory Script

database opened
Finished Duplicate Db at 28-OCT-08

As the final step, remove or comment out the DB_FILE_NAME_CONVERT and LOG_FILE_NAME_CONVERT parameters in the INIT.ORA file and restart the database.

initAPP2.ora

### Duplicate Database
### -----------------------------------------------
# This is only used when you duplicate the database
# on the same host to avoid name conflicts
# DB_FILE_NAME_CONVERT = (/u01/oracle/db/APP1/,/u01/oracle/db/APP2/)
# LOG_FILE_NAME_CONVERT = (/u01/oracle/db/APP1/,/u01/oracle/db/APP2/,
# /opt/oracle/db/APP1/,/opt/oracle/db/APP2/)

sqlplus / as sysdba
shutdown immediate;
startup;

Total System Global Area 629145600 bytes
Fixed Size 1269064 bytes
Variable Size 251658936 bytes
Database Buffers 373293056 bytes
Redo Buffers 2924544 bytes
Database mounted.
Database opened.

Creating a Duplicate Database to Remote Host
This scenario is exactly the same as described for the local host. Copy the RMAN backup files to the remote host, into the same directory as on the local host.



cd /u01/backup
scp gentic:/u01/backup/* .

The other steps are the same as described under «Creating a Duplicate Database on the Local Host».

Monday, September 27, 2010

Memory Architecture of an Oracle instance

Oracle Database can set limits on how much virtual memory the database uses for the SGA. It can start instances with minimal memory and allow the instance to use more memory by expanding the memory allocated for SGA components, up to a maximum determined by the SGA_MAX_SIZE initialization parameter.

SGA_MAX_SIZE is not dynamic; you cannot change its value while the instance is running.

However, as compared to previous versions of Oracle, in 10g SGA_MAX_SIZE does not define the size of memory allocated, but rather the MAXIMUM size that CAN be allocated.
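As a rough illustration (the 800M value is only a placeholder, and an spfile is assumed), raising that ceiling means changing SGA_MAX_SIZE in the spfile and restarting the instance:

ALTER SYSTEM SET sga_max_size = 800M SCOPE=SPFILE;
SHUTDOWN IMMEDIATE
STARTUP
SHOW PARAMETER sga_max_size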

The SGA_TARGET Initialization Parameter

The SGA_TARGET initialization parameter reflects the total size of the SGA and includes memory for the following components:

* Fixed SGA and other internal allocations needed by the Oracle Database instance
* The log buffer
* The shared pool
* The Java pool
* The buffer cache
* The keep and recycle buffer caches (if specified)
* Nonstandard block size buffer caches (if specified)
* The Streams pool

It is significant that SGA_TARGET includes the entire memory for the SGA, in contrast to earlier releases in which memory for the internal and fixed SGA was added to the sum of the configured SGA memory parameters. Thus, SGA_TARGET gives you precise control over the size of the shared memory region allocated by the database. If SGA_TARGET is set to a value greater than SGA_MAX_SIZE at startup, then the latter is bumped up to accommodate SGA_TARGET.

Automatically Managed SGA Components

When you set a value for SGA_TARGET, Oracle Database 10g automatically sizes the most commonly configured components, including:

* The shared pool (for SQL and PL/SQL execution)
* The Java pool (for Java execution state)
* The large pool (for large allocations such as RMAN backup buffers)
* The buffer cache
* The Streams pool

You need not set the size of any of these components explicitly. By default the parameters for these components will appear to have values of zero. Whenever a component needs memory, it can request that it be transferred from another component by way of the internal automatic tuning mechanism. This transfer of memory occurs transparently, without user intervention.

The performance of each of these automatically sized components is monitored by the Oracle Database instance. The instance uses internal views and statistics to determine how to distribute memory optimally among the components. As the workload changes, memory is redistributed to ensure optimal performance. To calculate the optimal distribution of memory, the database uses an algorithm that takes into consideration both long-term and short-term trends.
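To see how the instance has currently distributed this memory among the automatically sized components, one option is to query the V$SGA_DYNAMIC_COMPONENTS view, for example:

SELECT component, current_size/1024/1024 AS size_mb
FROM   v$sga_dynamic_components
ORDER  BY current_size DESC;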

Manually Managed SGA Components

There are a few SGA components whose sizes are not automatically adjusted. The administrator needs to specify the sizes of these components explicitly, if needed by the application. Such components are:

* Keep/Recycle buffer caches (controlled by DB_KEEP_CACHE_SIZE and DB_RECYCLE_CACHE_SIZE)
* Additional buffer caches for non-standard block sizes (controlled by DB_nK_CACHE_SIZE, n = {2, 4, 8, 16, 32})

The sizes of these components are determined by the administrator-defined values of their corresponding parameters. These values can, of course, be changed at any time, either using Enterprise Manager or from the command line with an ALTER SYSTEM statement.

The memory consumed by manually sized components reduces the amount of memory available for automatic adjustment. For example, in the following configuration:

SGA_TARGET = 256M
DB_8K_CACHE_SIZE = 32M


The instance has only 224 MB (256 - 32) remaining to be distributed among the automatically sized components.
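If the application later needs a different size for one of these manually managed components, it can be resized on the fly with ALTER SYSTEM; a minimal sketch with placeholder values (remember that DB_nK_CACHE_SIZE can only be set for block sizes other than the standard one):

ALTER SYSTEM SET db_8k_cache_size = 64M;
ALTER SYSTEM SET db_keep_cache_size = 16M;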

Automatic Shared Memory Management

In previous database releases, a database administrator (DBA) was required to manually specify different SGA component sizes by setting a number of initialization parameters, including the SHARED_POOL_SIZE, DB_CACHE_SIZE, JAVA_POOL_SIZE, and LARGE_POOL_SIZE parameters. Oracle Database 10g includes the Automatic Shared Memory Management feature which simplifies the SGA memory management significantly. In Oracle Database 10g, a DBA can simply specify the total amount of SGA memory available to an instance using the SGA_TARGET initialization parameter and the Oracle Database will automatically distribute this memory among various subcomponents to ensure most effective memory utilization.

When automatic SGA memory management is enabled, the sizes of the different SGA components are flexible and can adapt to the needs of a workload without requiring any additional configuration. The database automatically distributes the available memory among the various components as required, allowing the system to maximize the use of all available SGA memory.

Consider a manual configuration in which 1 GB of memory is available for the SGA and distributed to the following initialization parameters:

SHARED_POOL_SIZE=128M
DB_CACHE_SIZE=896M


If an application attempts to allocate more than 128 MB of memory from the shared pool, an error is raised that indicates that the available shared pool has been exhausted. There could be free memory in the buffer cache, but this memory is not accessible to the shared pool. You would have to manually resize the buffer cache and the shared pool to work around this problem.

With automatic SGA management, you can simply set the SGA_TARGET initialization parameter to 1G. If an application needs more shared pool memory, it can obtain that memory by acquiring it from the free memory in the buffer cache.

Setting a single parameter greatly simplifies the administration task. You specify only the amount of SGA memory that an instance has available and forget about the sizes of individual components. No out of memory errors are generated unless the system has actually run out of memory.

Automatic SGA management can enhance workload performance without requiring any additional resources or manual tuning effort. With manual configuration of the SGA, it is possible that compiled SQL statements frequently age out of the shared pool because of its inadequate size. This can increase the frequency of hard parses, leading to reduced performance. When automatic SGA management is enabled, the internal tuning algorithm monitors the performance of the workload, increasing the shared pool if it determines the increase will reduce the number of parses required.
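A minimal sketch of switching to automatic SGA management for the 1 GB example above (this assumes SGA_MAX_SIZE is at least 1G and STATISTICS_LEVEL is TYPICAL or ALL):

ALTER SYSTEM SET sga_target = 1G SCOPE=BOTH;
-- with SGA_TARGET set, the old parameters act only as minimum sizes, so remove the minimums
ALTER SYSTEM SET shared_pool_size = 0 SCOPE=BOTH;
ALTER SYSTEM SET db_cache_size = 0 SCOPE=BOTH;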

Moving, copying or cloning a database from one server to another with different directory structures can be easily accomplished with RMAN

Moving, copying or cloning a database from one server to another with different directory structures can be easily accomplished with RMAN. Imagine that you have a database on one node and you want to copy it to another node, without shutting down your database, and move your datafiles to a different directory structure… This will be demonstrated here using RMAN.

ASSUMPTIONS

Source Database

* 10.2.0.4 database online (sid neo) at server1 (app)
* archivelog mode is enabled
* db datafiles are in the directory /opt/oracle/oradata/neo2
* database will be backed up online with RMAN to /u01/backup

Destination Database

* 10.2.0.4 Oracle Home installed without any database running at server2 (mynode2.com)
* db datafiles must be created / moved to a different directory: /opt/oracle/oradata/neo
* only the manual backup created at server1 will be moved to server2

AT SERVER1
Log on as the oracle software owner at server1 and set your environment variables. Then open RMAN and back up the source database we want to copy / move / clone.

[oracle@neoface oracle]$ export ORACLE_HOME=/opt/oracle/product/10.2.0/db_1
[oracle@neoface oracle]$ export ORACLE_SID=neo
[oracle@neoface oracle]$ export PATH=$ORACLE_HOME/bin:$PATH
[oracle@neoface oracle]$ rman target /
RMAN> backup database plus archivelog;

cf_NEO_c-1689570411-20090106-00 (control file backup)
back_NEO_675389594_736_1
back_NEO_675389780_737_1
back_NEO_675390018_738_1
back_NEO_675390293_739_1

Copy those 5 backup files to server2

[oracle@neoface oracle]$ scp /u01/backup/back_NEO* /u01/backup/cf_NEO* root@mynode2.com:/u01/backup/

Create an initialization file (pfile) from the current spfile. Then copy it to server2.

[oracle@neoface oracle]$ sqlplus "/ as sysdba"
SQL> create pfile from spfile;
SQL> exit;
[oracle@neoface oracle]$ scp /opt/oracle/product/10.2.0/db_1/dbs/initneo.ora oracle@mynode2.com:/opt/oracle/product/10.2.0/db_1/dbs/initneo.ora


AT SERVER2

Logon at server2 to do the following steps:

* create the OS directories to hold the datafiles, the admin log files and the pfile
* edit the pfile to modify the instance name in parameters like bdump, udump, etc.
* change the ownership of the pfile so it belongs to the oracle user
* connect to RMAN and start the database in nomount mode
* restore the control file from the backup
* mount the database
* validate the catalog by crosschecking and cataloging the 4 backup pieces we copied
* rename the datafiles and redolog files and restore the database

Switch to oracle user and create datafiles directories :

[root@mynode2 root] su - oracle
[oracle@mynode2 oracle]$ mkdir /opt/oracle/admin/neo -p
[oracle@mynode2 oracle]$ cd /opt/oracle/admin/neo
[oracle@mynode2 oracle]$ mkdir cdump udump bdump pfile

[oracle@mynode2 oracle]$ mkdir /opt/oracle/oradata/neo -p

Edit your pfile according to your new directory structure:

[oracle@mynode2 oracle]$ vi /opt/oracle/product/10.2.0/db_1/dbs/initneo.ora

Set environment variables and start working on RMAN:

[oracle@mynode2 oracle]$ export ORACLE_HOME=/opt/oracle/product/10.2.0/db_1
[oracle@mynode2 oracle]$ export ORACLE_SID=neo
[oracle@mynode2 oracle]$ export PATH=$ORACLE_HOME/bin:$PATH
[oracle@mynode2 oracle]$ rman target /
RMAN> startup nomount
RMAN> restore controlfile from '/u01/backup/cf_NEO_c-1689570411-20090106-00';
RMAN> alter database mount ;
RMAN> exit

Now that the database is mounted, we’ll check the correct database SCN from the current log that we’ll use later to recover the database. Take note of your current SCN.

[oracle@mynode2 oracle]$ sqlplus "/ as sysdba"
SQL> select group#, first_change#, status, archived from v$log;

GROUP# FIRST_CHANGE# STATUS ARC
---------- ------------- ---------------- ---
1 336565140 ACTIVE YES
2 336415067 CURRENT NO
3 336523814 INACTIVE YES

SQL> exit;

[oracle@mynode2 oracle]$ rman target /

As we only copied to this server the backup we created at the beginning, and not all the backups we had on server1, we must crosscheck the catalog against the OS files. Run the following commands at the RMAN prompt:

RMAN> CROSSCHECK backup;
RMAN> CROSSCHECK copy;
RMAN> CROSSCHECK backup of database;
RMAN> CROSSCHECK backup of controlfile;
RMAN> CROSSCHECK archivelog all;

Now let’s catalog the 4 backup pieces that we copied to server2:

RMAN> CATALOG backuppiece '/u01/backup/back_NEO_675389594_736_1';
RMAN> CATALOG backuppiece '/u01/backup/back_NEO_675389780_737_1';
RMAN> CATALOG backuppiece '/u01/backup/back_NEO_675390018_738_1';
RMAN> CATALOG backuppiece '/u01/backup/back_NEO_675390293_739_1';

Next, as we changed the directory of our datafiles, we must rename the redologs:

RMAN> ALTER DATABASE rename file '/opt/oracle/oradata/neo2/redo01.log' to '/opt/oracle/oradata/neo/redo01.log';
RMAN> ALTER DATABASE rename file '/opt/oracle/oradata/neo2/redo02.log' to '/opt/oracle/oradata/neo/redo02.log';
RMAN> ALTER DATABASE rename file '/opt/oracle/oradata/neo2/redo03.log' to '/opt/oracle/oradata/neo/redo03.log';

If you use BLOCK CHANGE TRACKING to allow fast incremental backups, and you want to move the datafiles to a different directory, you must disable this feature and re-enable it specifying the new directory:

RMAN> ALTER DATABASE disable block change tracking;
RMAN> ALTER DATABASE enable block change tracking using file '/opt/oracle/oradata/neo/block_change_tracking.f';

This will avoid errors like ORA-19751 and ORA-19750.

Now let’s run the script that will restore our database, renaming the datafiles and recovering until the archivelog with SCN 336415067, the current one.

RMAN> run {
set newname for datafile 1 to '/opt/oracle/oradata/neo/system01.dbf';
set newname for datafile 2 to '/opt/oracle/oradata/neo/undotbs01.dbf';
set newname for datafile 3 to '/opt/oracle/oradata/neo/sysaux01.dbf';
set newname for datafile 4 to '/opt/oracle/oradata/neo/data01.dbf';
set newname for datafile 5 to '/opt/oracle/oradata/neo/index01.dbf';
set newname for datafile 6 to '/opt/oracle/oradata/neo/users01.dbf';
set newname for datafile 7 to '/opt/oracle/oradata/neo/streams.dbf';
set newname for datafile 8 to '/opt/oracle/oradata/neo/data01brig.dbf';
set newname for datafile 9 to '/opt/oracle/oradata/neo/index02.dbf';
restore database;
switch datafile all;
recover database until scn 336415067;
}

RMAN> ALTER DATABASE open resetlogs;

I did not manage to avoid errors like ORA-01110 and ORA-01180 in RMAN without using the "until" clause directly in the "recover database" command, instead of placing it, as most people do, as the first instruction after the run command.


----------------------------------------------------------------------------------
Renaming the database in the process

Pfile

Changing the db name and path:
sqlplus / as sysdba
create pfile from spfile;
exit

cd $ORACLE_HOME/dbs
rm -f spfiledbname3.ora
vi initdbname3.ora
then, inside vi, remove the first lines (the comment header) and replace every occurrence of the old name:

:%s/dbname/dbname3/g

Database

Look for the datafile IDs:

list backup of database;

Add the RMAN statements to relocate the datafiles:
run {
set newname for datafile 1 to '/u02/oradata/dbname3/system01.dbf';
set newname for datafile 2 to '/u02/oradata/dbname3/undotbs01.dbf';
set newname for datafile 3 to '/u02/oradata/dbname3/sysaux01.dbf';
set newname for datafile 4 to '/u02/oradata/dbname3/users01.dbf';
set newname for datafile 5 to '/u02/oradata/dbname3/tsname.dbf';
restore database;
}
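The block above only restores the files to their new locations; following the same pattern as the clone earlier in this post, you would normally also switch the datafiles and recover inside that run block before opening the database. A minimal sketch (paths as in the snippet above):

run {
set newname for datafile 1 to '/u02/oradata/dbname3/system01.dbf';
set newname for datafile 2 to '/u02/oradata/dbname3/undotbs01.dbf';
set newname for datafile 3 to '/u02/oradata/dbname3/sysaux01.dbf';
set newname for datafile 4 to '/u02/oradata/dbname3/users01.dbf';
set newname for datafile 5 to '/u02/oradata/dbname3/tsname.dbf';
restore database;
switch datafile all;
recover database;
}
alter database open resetlogs;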

Brief explanation of how assorted Oracle files can be renamed or moved to a new location

To move or rename a controlfile, do the following (see the sketch after this list):

* Alter the control_files parameter using the ALTER SYSTEM command.
* Shutdown the database.
* Rename the physical file on the OS.
* Start the database.
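A minimal sketch of those steps, with hypothetical paths (adjust to your own layout):

ALTER SYSTEM SET control_files =
  '/u02/oradata/dbname/control01.ctl',
  '/u02/oradata/dbname/control02.ctl' SCOPE=SPFILE;
SHUTDOWN IMMEDIATE
-- at the OS prompt, move the physical files, e.g.
-- mv /u01/oradata/dbname/control01.ctl /u02/oradata/dbname/control01.ctl
STARTUP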

To move or rename a logfile do the following:

* Shutdown the database.
* Rename the physical file on the OS.
* Start the database in mount mode.
* Issue the ALTER DATABASE RENAME FILE command to rename the file within the Oracle dictionary.
* Open the database.

To move or rename a datafile, do the following (see the sketch after this list):

* Shutdown the database.
* Rename the physical file on the OS.
* Start the database in mount mode.
* Issue the ALTER DATABASE RENAME FILE command to rename the file within the Oracle dictionary.
* Open the database.
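The logfile and datafile procedures follow the same pattern; a minimal sketch for a single datafile, with hypothetical paths:

SHUTDOWN IMMEDIATE
-- at the OS prompt: mv /u01/oradata/dbname/users01.dbf /u02/oradata/dbname/users01.dbf
STARTUP MOUNT
ALTER DATABASE RENAME FILE '/u01/oradata/dbname/users01.dbf' TO '/u02/oradata/dbname/users01.dbf';
ALTER DATABASE OPEN;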

ORA-01555: snapshot too old during export EXP-00000: Export terminated unsuccessfully

It seems that other sessions are updating the database during your export.
So use CONSISTENT=N (which is the default); do not specify CONSISTENT=Y.
Otherwise, increase your RBS (rollback segments, in 8i) and look into undo management in 9i:
there is a long-running transaction that needs more undo space than is available.
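For example, a hypothetical schema export with the default behaviour made explicit (user, password and file names are placeholders):

exp scott/tiger file=scott.dmp log=scott.log owner=scott consistent=n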

SQL> show parameter undo


NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
undo_management string AUTO
undo_retention integer 0
undo_suppress_errors boolean FALSE
undo_tablespace string UNDOTBS1



If the UNDO_RETENTION initialization parameter is not specified, the default value is 900 seconds. (I must have tweaked mine; ignore it.)

To reset it, the command would be:
ALTER SYSTEM SET UNDO_RETENTION = 30000;

Quoting the docs:

Committed undo information normally is lost when its undo space is overwritten by a newer transaction. But for consistent read purposes, long running queries might require old undo information for undoing changes and producing older images of data blocks. The initialization parameter, UNDO_RETENTION, provides a means of explicitly specifying the amount of undo information to retain. With a proper setting, long running queries can complete without risk of receiving the "snapshot too old" error.
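To gauge whether the retention is set high enough, one option is to compare the longest query recorded against the undo generated over time in the V$UNDOSTAT view (one row per ten-minute interval), for example:

SELECT begin_time, end_time, undoblks, maxquerylen
FROM   v$undostat
ORDER  BY begin_time;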

Cost Based Optimizer (CBO) and Database Statistics

Whenever a valid SQL statement is processed Oracle has to decide how to retrieve the necessary data. This decision can be made using one of two methods:

•Rule Based Optimizer (RBO) - This method is used if the server has no internal statistics relating to the objects referenced by the statement. This method is no longer favoured by Oracle and will be desupported in future releases.

•Cost Based Optimizer (CBO) - This method is used if internal statistics are present. The CBO checks several possible execution plans and selects the one with the lowest cost, where cost relates to system resources.

If new objects are created, or the amount of data in the database changes, the statistics will no longer represent the real state of the database, so the CBO decision process may be seriously impaired. The mechanisms and issues relating to the maintenance of internal statistics are explained below:

•Analyze Statement
•DBMS_UTILITY
•DBMS_STATS
•Scheduling Stats
•Transferring Stats
•Issues

Analyze Statement
The ANALYZE statement can be used to gather statistics for a specific table, index or cluster. The statistics can be computed exactly, or estimated based on a specific number of rows, or a percentage of rows:

ANALYZE TABLE employees COMPUTE STATISTICS;
ANALYZE INDEX employees_pk COMPUTE STATISTICS;

ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 100 ROWS;
ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 15 PERCENT;

DBMS_UTILITY

The DBMS_UTILITY package can be used to gather statistics for a whole schema or database. Both methods follow the same format as the analyze statement:

EXEC DBMS_UTILITY.analyze_schema('SCOTT','COMPUTE');
EXEC DBMS_UTILITY.analyze_schema('SCOTT','ESTIMATE', estimate_rows => 100);
EXEC DBMS_UTILITY.analyze_schema('SCOTT','ESTIMATE', estimate_percent => 15);

EXEC DBMS_UTILITY.analyze_database('COMPUTE');
EXEC DBMS_UTILITY.analyze_database('ESTIMATE', estimate_rows => 100);
EXEC DBMS_UTILITY.analyze_database('ESTIMATE', estimate_percent => 15);

DBMS_STATS

The DBMS_STATS package was introduced in Oracle 8i and is Oracle's preferred method of gathering object statistics. Oracle lists a number of benefits to using it, including parallel execution, long-term storage of statistics and transfer of statistics between servers. Once again, it follows a similar format to the other methods:

EXEC DBMS_STATS.gather_database_stats;
EXEC DBMS_STATS.gather_database_stats(estimate_percent => 15);

EXEC DBMS_STATS.gather_schema_stats('SCOTT');
EXEC DBMS_STATS.gather_schema_stats('SCOTT', estimate_percent => 15);

EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES');
EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES', estimate_percent => 15);

EXEC DBMS_STATS.gather_index_stats('SCOTT', 'EMPLOYEES_PK');
EXEC DBMS_STATS.gather_index_stats('SCOTT', 'EMPLOYEES_PK', estimate_percent => 15);

This package also gives you the ability to delete statistics:

EXEC DBMS_STATS.delete_database_stats;
EXEC DBMS_STATS.delete_schema_stats('SCOTT');
EXEC DBMS_STATS.delete_table_stats('SCOTT', 'EMPLOYEES');
EXEC DBMS_STATS.delete_index_stats('SCOTT', 'EMPLOYEES_PK');

Scheduling Stats

Scheduling the gathering of statistics using DBMS_JOB is the easiest way to make sure they are always up to date:

SET SERVEROUTPUT ON
DECLARE
l_job NUMBER;
BEGIN
DBMS_JOB.submit(l_job,
'BEGIN DBMS_STATS.gather_schema_stats(''SCOTT''); END;',
SYSDATE,
'SYSDATE + 1');
COMMIT;
DBMS_OUTPUT.put_line('Job: ' || l_job);
END;
/

The above code sets up a job to gather statistics for SCOTT at the current time every day. You can list the current jobs on the server using the DBA_JOBS and DBA_JOBS_RUNNING views.
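For example, to see what is scheduled and when each job will next run:

SELECT job, what, next_date, next_sec, broken
FROM   dba_jobs;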

Existing jobs can be removed using:

EXEC DBMS_JOB.remove(X);
COMMIT;

Where 'X' is the number of the job to be removed.

Transferring Stats

It is possible to transfer statistics between servers allowing consistent execution plans between servers with varying amounts of data. First the statistics must be collected into a statistics table. In the following examples the statistics for the APPSCHEMA user are collected into a new table, STATS_TABLE, which is owned by DBASCHEMA:

SQL> EXEC DBMS_STATS.create_stat_table('DBASCHEMA','STATS_TABLE');
SQL> EXEC DBMS_STATS.export_schema_stats('APPSCHEMA','STATS_TABLE',NULL,'DBASCHEMA');

This table can then be transferred to another server using your preferred method (Export/Import, SQL*Plus COPY, etc.) and the stats imported into the data dictionary as follows:

SQL> EXEC DBMS_STATS.import_schema_stats('APPSCHEMA','STATS_TABLE',NULL,'DBASCHEMA');
SQL> EXEC DBMS_STATS.drop_stat_table('DBASCHEMA','STATS_TABLE');

Issues

•Exclude dataload tables from your regular stats gathering, unless you know they will be full at the time that stats are gathered.
•I've found gathering stats for the SYS schema can make the system run slower, not faster.
•Gathering statistics can be very resource intensive for the server so avoid peak workload times or gather stale stats only (see the sketch after this list).
•Even if scheduled, it may be necessary to gather fresh statistics after database maintenance or large data loads.
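A sketch of the "gather stale stats only" approach mentioned above (this relies on table monitoring, which is enabled by default in 10g; the schema name is a placeholder):

EXEC DBMS_STATS.gather_schema_stats('SCOTT', options => 'GATHER STALE');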