Wednesday 25 March 2015

ORA-07445: exception encountered: core dump SIGSEGV

May 29, 2014 10:07:02 AM
~ORA-07445: exception encountered: core dump [_intel_fast_memcmp()+72] [SIGSEGV] [Address not mapped to object] [0x0] [] []
Mar 17, 2015 2:56:05 PM
~ORA-07445: exception encountered: core dump [_intel_fast_memcmp()+38] [SIGSEGV] [Address not mapped to object] [0x0] [] []

*ORA-600/ORA-7445 Error Look-up Tool 
ID 153788.1
ID 362953.1

Problem Symptoms

Whenever I check my alert log file, I see text like the following:
Errors in file /var/opt/oracle/admin/udump/orastdby_ora_31795.trc
ORA-07445: exception encountered: core dump [_intel_fast_memcmp()+38] [SIGSEGV] [Address not mapped to object] [0x0] [] []
After checking the trace file, the call stack trace shows:
 
ksedst ksedmp ssexhd intel_fast_memcmp

Recent Changes of The Database
We have changed the CURSOR_SHARING parameter from EXACT to SIMILAR.

Cause of The Problem
This is an Oracle bug, bug number 4456646. The bug may fire when the CURSOR_SHARING parameter is not set to EXACT; it occurs during literal replacement when empty string literals are in use. The bug affects Oracle version 10.2.0.1. In our environment it was a RHEL Linux 32-bit production server.

Solution of The Problem
 
Way 1:
 
As a workaround, disable literal replacement by setting CURSOR_SHARING back to EXACT:
ALTER SYSTEM SET cursor_sharing='EXACT' SCOPE=both;
(SCOPE=both applies if you use an spfile.)
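
A quick way to confirm the current value before and after the change (a minimal check from SQL*Plus):
SQL> show parameter cursor_sharing
SQL> select name, value, isdefault from v$parameter where name = 'cursor_sharing';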

Way 2:
 
This bug is fixed in the 10.2.0.2 patchset, so apply the 10.2.0.2 patchset in order to fix the problem.

Way 3:
 
Apply a one-off patch. Download one-off patch 4456646 from Metalink and apply it if it is available for your OS. Currently it is available only for Linux x86 and Linux x86-64.


A cursor is a memory area in the shared pool (library cache) that stores SQL information such as the SQL text and execution plan. Using bind variables allows these cursors to be shared in the shared pool.

Friday 20 March 2015

RAC BG PROCESS

RAC Background Processes
                Lock Monitor Processes (LMON)
                Lock Monitor Services (LMS)
                Lock Monitor Daemon Process (LMD)
                LCKn (Lock Process)
                DIAG (Diagnostic Daemon)


1. Lock Monitor Processes (LMON)

     LMON Maintains GCS memory structures.

     LMON handles the abnormal termination of processes and instances.

     LMON deals with reconfiguration of locks & resources when an instance joins or leaves the cluster (during reconfiguration, LMON generates trace files).

     LMON is responsible for executing dynamic lock remastering every 10 mins ( Only in 10g R2 & later versions).

     The LMON process manages global locks & resources.

     It monitors all instances in the cluster, primarily for dictionary cache locks, library cache locks & deadlocks on deadlock-sensitive enqueues & resources.

     LMON also provides cluster group services.

     It is also called the GES [Global Enqueue Service] monitor.


2. Lock Monitor Services (LMS)

Of all Background processes, LMS is the most active

It consumes a significant amount of CPU time. (10g R2: ensure that the LMS process does not encounter CPU starvation.)

Its primary job is to transport blocks across the nodes for cache-fusion requests.

When there is a consistent-read request, the LMS process rolls back the block, makes a Consistent-Read image of the block and then ships this block across the HSI (High Speed Interconnect) to the process requesting from a remote node.

LMS also constantly checks with the LMD background process (the GES daemon) to pick up the lock requests placed by the LMD process.

Each node will have 2 or more LMS processes.

The GCS_SERVER_PROCESSES parameter in the init.ora file denotes the number of LMS processes.

The parameter's value is set based on the number of CPUs (MIN(CPU_COUNT/2, 2)).
In 10gR2, for a single-CPU instance, only one LMS process is started.

It is advised to increase the parameter value if global cache activity is very high (see the query at the end of this section).

Also called the GCS (Global Cache Services) processes.

Internal View: X$KJMSDP
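
To check how many LMS processes an instance is configured with, and how many are actually running, a quick check like the following can be used (a minimal sketch; v$process.pname holds the background process name):
SQL> show parameter gcs_server_processes
SQL> select count(*) from v$process where pname like 'LMS%';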


3. Lock Monitor Daemon Process ( LMDn)

LMD process performs lock and deadlock detection globally.

It also monitors for lock conversion timeouts.

Sometimes, it is also referred to as the GES (Global Enqueue Service) daemon since its job is to manage the global enqueue and global resource access.

LMD process also handles deadlock detection and remote enqueue requests.

Remote resource requests are the requests originating from another instance.

Internal View: X$KJMDDP


4. LCKn (Lock Process)

It manages instance resource requests & cross instance calls for shared resources.

During instance recovery, it builds a list of invalid lock elements and validates lock elements.


5. DIAG (Diagnostic Daemon)

A new background process introduced in Oracle 10g as part of the new enhanced diagnosability framework.

Regularly monitors the health of the instance.

Also checks instance hangs & deadlocks.

It captures the vital diagnostics data for instance & process failures.

QUICK REFERENCE:
RAC – Real Application Cluster
GCS – Global Cache Services
GES - Global Enqueue Service
LMD – Lock Monitor Daemon
LMON – Lock Monitor Processes
LMS –  Lock Monitor Services
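
To see which of these background processes are actually running on an instance, a simple dictionary query is enough (a minimal sketch; paddr <> '00' filters out background slots that are not started):

select name, description
from v$bgprocess
where paddr <> '00'
  and (name like 'LM%' or name in ('LCK0', 'DIAG'));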

Cache fusion Recovery

RAC 11gR2  

Internal Cache Fusion Recovery.

1. Instance crash

2. GRD Lock Freeze

3. GES reconfigure

4. DLM recovery is performed

5. GCS (PCM locks) is remastered

6. Pending writes and notifications are processed

7. SINGLE PASS RECOVERY FOLLOWS
The instance recovery lock is acquired by SMON.
The recovery set is prepared and built; SMON allocates it in its PGA.
SMON acquires locks on the buffers that need recovery.

8. TWO PASS RECOVERY
The DB is partially available.
Blocks are made available as they are recovered.
Locks are released by SMON and recovery is complete.
The system is fully available.


Cluster Re-mastering and Fast re-configuring

RAC 11g R2

_gcs_file_to_lock ------ the old lazy remastering behaviour: at reconfiguration time (when a node joins or leaves the cluster), the locks are deleted from the DLM in all instances and then reconfigured; in the newer Oracle version 11g the parameters below are used instead.


_gcs_fast_reconfig, _lm_master_weight -- the main parameters to maintain (they control GRD reconfiguration in 11g and the redistribution of resources when a node joins or leaves).



The most common reasons for re-mastering and re-configuration:

1. when a node joins or leaves a cluster.

2. when an instance death is detected. Every instance updates the controlfile with a 
     heartbeat through its Checkpoint (CKPT) process. If heartbeat information is not 
     present for x amount of time (normally 30 seconds in UNIX variants), the instance is 
     considered to be dead and the Instance Membership Recovery (IMR) 
     process initiates reconfiguration.

3. when a reconfiguration event is due to a communication failure. Communication channels are
     established between the Oracle processes across the nodes.


(GRD / CLUSTER Re-mastering and Re-configuring) 

GRD / cluster re-mastering and re-configuring is done dynamically by LMON.

The reconfiguration process triggers IMR (Instance Membership Recovery), and a seven-step process ensures complete reconfiguration:

1. The name service is frozen.
2. The IDLM (lock database) is frozen.
3. Determination of membership, validation and IMR occur.
4. Bitmap rebuild takes place (GRD), including instance names.
5. All dead instance entries are deleted.
6. The name service is unfrozen and released for use.
7. Reconfiguration is handed over to GES/GCS.


When IMR starts and node eviction takes place, let's look at the corresponding messages in the ALERT.LOG and LMON trace files to get a better picture.


(IMR is a deep level of the node eviction process.) To read about IMR (Instance Membership Recovery), click: here
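
Dynamic remastering activity on a running instance can also be observed from the data dictionary; a minimal sketch (V$DYNAMIC_REMASTER_STATS is available from 10gR2 onwards):

select remaster_ops, remaster_time, remastered_objects, current_objects
from v$dynamic_remaster_stats;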




Past Image

11gR2 RAC

PI (Past Image) is a copy of a globally dirty block and is maintained in the database buffer cache. It can be created and saved when a dirty block is shipped across to another instance, after the resource role has been set to global (if it was not already set).

A PI must be maintained by an instance until it, or a later version of the block, is written to disk.
GCS is responsible for informing an instance that its PI is no longer needed after another instance writes a newer version of the same block.

PIs are discarded when GCS posts all the holding instances that a new and consistent version of that particular block is now on disk.


A buffer header dump in the trace file shows a past image like this:

st: PI md: NULL tch: 1 le: 0x7bfa6000                        -- Past Image


Note: in simple words, it eliminates write/write contention.
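
A rough way to see whether any past images currently exist in the buffer caches across the cluster is to look at block status in gv$bh (a minimal sketch; 'pi' is the past-image status):

select inst_id, status, count(*)
from gv$bh
group by inst_id, status
order by inst_id, status;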

Friday 13 March 2015

OMOTION

RAC 11g R2 11.2.0.2 -- OMOTION ---- Let's try to perform the online relocation ourselves and see what happens to the existing instance and on the target instance.


After installation of the Oracle 11gr2 Grid Infrastructure and a "software only installation" of 11gr2 RAC, I installed patch 9004119 .

If you want to read about this topic with a full explanation, read the links below.

Below are the steps followed after patching, on 11.2.0.1.0:
[oracle@host01 ~]$ srvctl config database -d orcl
Database unique name: orcl
Database name: orcl
Oracle home: /u01/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile: +FRA/orcl/spfileorcl.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: orcl
Database instances:
Disk Groups: FRA
Mount point paths:
Services: srvc1
Type: RAC One Node
Online relocation timeout: 30
Instance name prefix: orcl
Candidate servers: host01,host02,host03
Database is administrator managed

Before we start the migration, let's check the status of the database and its instance right now.
[oracle@host01 ~]$ srvctl status database -d orcl
 Instance orcl_1 is running on node host01
 Online relocation: INACTIVE

So what we have here is a RAC One Node database with the SID orcl, and it is running on the node HOST01 with the instance orcl_1. So now, we shall try to relocate the instance from this node to the target node HOST02. Also, it's shown that the online relocation is not active at the moment.
It's important to mention that with version 11.2.0.1 this task was done by a utility called Omotion, but from 11.2.0.2 onwards the use of this utility is not required. The release of the software used for this demo was 11.2.0.3, so obviously the utility wasn't required.
The relocation is done using the command SRVCTL RELOCATE DATABASE, to which we pass the name of the target node and the option for verbose output. Below is the output of this command:
[oracle@host01 ~]$ srvctl relocate database -d orcl -n host02 -w 30 -v
Configuration updated to two instances
Instance orcl_2 started
Services relocated
Waiting for up to 30 minutes for instance orcl_1 to stop ...
Instance orcl_1 stopped
Configuration updated to one instance


And from another session, we can see that the migration is going on.
[oracle@host01 ~]$ srvctl status database -d orcl
Instance orcl_2 is running on node host02
Online relocation: ACTIVE
Source instance: orcl_1 on host03
Destination instance: orcl_2 on host02

We can see that the second instance has come up and the relocation status is also shown as ACTIVE, which means that the relocation is still going on. We may need to run the command a couple of times, as it can take a while for the source instance to stop.
[oracle@host01 ~]$ srvctl status database -d orcl
Instance orcl_2 is running on node host02
Online relocation: ACTIVE
Source instance: orcl_1 on host03
Destination instance: orcl_2 on host02

[oracle@host01 ~]$ srvctl status database -d orcl
Instance orcl_2 is running on node host02
Online relocation: ACTIVE
Source instance: orcl_1 on host03
Destination instance: orcl_2 on host02

Finally, when the relocation is over, this is shown as the output:
[oracle@host01 ~]$ srvctl status database -d orcl
Instance orcl_2 is running on node host02
Online relocation: INACTIVE [oracle@host01 ~]$

As we can see, once the relocation is complete, only the instance "orcl_2" is running and the ONLINE RELOCATION status goes back to INACTIVE, meaning the relocation has completed.
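
To move the database back to its original node, the same command can simply be run against that node (a sketch; node and database names are the ones used in this demo):

[oracle@host01 ~]$ srvctl relocate database -d orcl -n host01 -w 30 -v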


Application Context

RAC 11gR2 

Using application context makes diagnosis more useful and simple in Oracle RAC 11g.

DBMS_APPLICATION_INFO can be used to diagnose where the application is creating a problem and why a query hangs. Follow the steps below.



exec dbms_application_info.set_module(module_name => 'UPDATE RECORD', action_name => 'UPDATE TAB1');
exec DBMS_APPLICATION_INFO.SET_CLIENT_INFO(CLIENT_INFO => 'SUNRAY');
UPDATE TAB1 SET COL1='UPDATED';





create table TAB1 (COL1 varchar2(10)); --- Session 1

insert into tab1 values ('FIRST'); --- Session 1

update tab1 set col1='SECOND'; --- Session 2




Now Session 2 hangs and waits forever, because the same row is being updated by another session, i.e. Session 1, which has not yet committed.
Because we set the application context using the DBMS_APPLICATION_INFO package, we can easily identify the problematic code location via this simple query:


select sid,event,module,action,client_info
from v$session where state='WAITING' and
WAIT_CLASS='Application';
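
Once the work is done, the module, action and client info can be cleared so that stale values are not reported for the session (a small sketch using the same package):

exec dbms_application_info.set_module(module_name => NULL, action_name => NULL);
exec dbms_application_info.set_client_info(client_info => NULL);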



-------------------

Else, the old method: v$session, v$process, v$session_wait, v$system_event, dba_objects.

FLASHBACK Enable

11gR2 RAC

1. Verify archive log mode and whether flashback is enabled:

archive log list

select FLASHBACK_ON from v$database;

2. Set the cluster_database parameter to false for the instance 'racdb1':

alter system set cluster_database=false scope=spfile sid='racdb1';


3. Setting up FLASHBACK Area.

alter system set DB_RECOVERY_FILE_DEST_SIZE=200M scope=spfile;
alter system set DB_RECOVERY_FILE_DEST='+FRA/flashback/' scope=spfile;


4. Shut down all instances.

srvctl stop database -d RACDB

5. Mount the database.

startup mount

6. Enable Flashback.

alter database flashback on;


7. Set the parameter cluster_database back to true.

alter system set cluster_database=true scope=spfile sid='racdb1';

8. Shut down the local instance.

shutdown

9. Start all instances.

srvctl start database -d racdb
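
After the instances are back up, it can be verified that flashback is now on (a quick check from any instance):

select flashback_on, log_mode from v$database;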

Archive Logs Enable

11gR2 RAC

1. Set cluster_database to false for the instance 'racdb1':

alter system set cluster_database=false scope=spfile sid='racdb1';

2. Shut down all instances

srvctl stop database -d racdb

3. Mount Database

startup mount

4. Enable Archiving 

alter database archivelog;

5. Set the parameter cluster_database to true for the instance "racdb1".

alter system set cluster_database=true scope=spfile sid='racdb1';

6. Shutdown the local instance

shutdown

7. Bring up all instances

srvctl start database -d racdb

Once in archive log mode, each instance can archive its redo logs automatically.
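
To confirm the change after the restart, a quick check from any instance:

archive log list
select log_mode from v$database;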

Wednesday 4 March 2015

GPNP PROFILE 11G R2 RAC

In this post, I will explain what is GPnP profile, what does it contain and how is it used by clusterware.

WHAT IS GPNP PROFILE?


GPnP stands for Grid Plug and Play; the GPnP profile and its management tool, gpnptool, are a new feature in Oracle 11g.

The GPnP profile is a small XML file located in GRID_HOME/gpnp/<hostname>/profiles/peer under the name profile.xml. It is used to establish the correct global personality of a node. Each node maintains a local copy of the GPnP profile, and the profile is maintained by the GPnP daemon (GPnPD).

WHAT DOES GPNP PROFILE CONTAIN?
The GPnP profile is used to store information required for the startup of Oracle Clusterware, such as the SPFILE location, the ASM disk string, etc.
It contains various attributes defining the node personality:
- Cluster name
- Network classifications (public/private)
- Storage to be used for CSS
- Storage to be used for ASM: SPFILE location, ASM disk string, etc.
- Digital signature information: the profile is security sensitive. It might identify the storage to be used as the root partition of a machine; hence, it contains digital signature information of the provisioning authority.
Here is the GPnP profile of my RAC setup.
gpnptool can be  used  for reading/editing the gpnp profile.
[root@host01 peer]# gpnptool get
<?xml version="1.0" encoding="UTF-8"?><gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd" ProfileSequence="7" ClusterUId="14cddaccc0464f92bfc703ec1004a386" ClusterName="cluster01" PALocation=""><gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*"><gpnp:Network id="net1" IP="192.9.201.0" Adapter="eth0" Use="public"/><gpnp:Network id="net2" IP="10.0.0.0" Adapter="eth1" Use="cluster_interconnect"/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/><orcl:ASM-Profile id="asm" DiscoveryString="" SPFile="+DATA/cluster01/asmparameterfile/registry.253.783619911"/><ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/><ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/><ds:Reference URI=""><ds:Transforms><ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><ds:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"> <InclusiveNamespaces xmlns="http://www.w3.org/2001/10/xml-exc-c14n#" PrefixList="gpnp orcl xsi"/></ds:Transform></ds:Transforms><ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><ds:DigestValue>4VMorzxVNa+FeOx2SCk1unVBpfU=</ds:DigestValue></ds:Reference></ds:SignedInfo><ds:SignatureValue>bbzV04n2zSGTtUEvqqB+pjw1vH7i8MOEUqkhXAyloX0a41T2FkDEA++ksc0BafndAk7tR+6LGdppE1aOsaJUtYxQqaHJdpVsJF+sj2jN7LPJlT5NBt+K7b08TLjDID92Se6vEiDAeeKlEbpVWKMUIvQvp6LrYK8cDB/YjUnXuGU=</ds:SignatureValue></ds:Signature></gpnp:GPnP-Profile>

WHO UPDATES GPNP PROFILE?
The GPnPd daemon replicates changes to the profile during:
  - installation,
  - system boot, or
  - profile updates.
The profile is updated whenever changes are made to a cluster with configuration tools like:
  . oifcfg (change network),
  . crsctl (change location of voting disk),
  . asmcmd (change ASM_DISKSTRING, SPFILE location), etc.

HOW IS GPNP PROFILE USED BY CLUSTERWARE?

To start the clusterware, the voting disk needs to be accessed. If the voting disk is on ASM, this information (that the voting disk is on ASM) is read from the GPnP profile (<orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>). The voting disk is read using the kfed utility even if ASM is not up.
Next, the clusterware checks whether all the nodes have the updated GPnP profile, and the node joins the cluster based on the GPnP configuration. Whenever a node is started or added to the cluster, the clusterware software on the starting node starts a GPnP agent.
- If the node is already part of the cluster, the GPnP agent reads the existing profile on that node.
- If the node is being added to the cluster, the GPnP agent locates an agent on another existing node using the multicast protocol (provided by mDNS) and gets the profile from that agent.
Next, CRSD needs to read the OCR to start up various resources on the node, and hence to update it as the status of resources changes. Since the OCR is also on ASM, the location of the ASM SPFILE must be known.
The order of searching for the ASM SPFILE is:
- GPnP profile
- ORACLE_HOME/dbs/spfile<sid>.ora
- ORACLE_HOME/dbs/init<sid>.ora
In a cluster environment, the location of the SPFILE for ASM is read from the GPnP profile.

[grid@host01 peer]$ gpnptool getpval -asm_spf
Warning: some command line parameters were defaulted. Resulting command line:
         /u01/app/11.2.0/grid/bin/gpnptool.bin getpval -asm_spf -p=profile.xml -o-
+DATA/cluster01/asmparameterfile/registry.253.793721441
The output of the command shows that the SPFILE is on ASM in the DATA diskgroup. To find out the location of the ASM disks, the following command is issued:

[root@host01 peer]# gpnptool getpval -asm_dis
ASM-Profile id="asm" DiscoveryString=""
The device headers of every device in the disk string returned by the above command are scanned (if configured by you at ASM initial setup time). Here the discovery string is blank, as the ASM_DISKSTRING parameter has not been set; hence, the headers of all the ASM disks are scanned.
Here, I have shown the output only for the disk which contains the SPFILE (spfflg is not null).

[root@host01 ~]#  kfed read /dev/sdb3 | grep -E 'spf|ausize'
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile:                       16 ; 0x0f4: 0x00000010
kfdhdb.spfflg:                        1 ; 0x0f8: 0x00000001
In the output above, we see that
     the device /dev/sdb3 contains a copy of the ASM spfile (spfflg=1).
     The ASM spfile location starts at the disk offset of 16 (spfile=16)
Considering the allocation unit size (kfdhdb.ausize = 1M), let’s dump the ASM spfile from the device:

[root@host01 ~]#  dd if=/dev/sdb3 of=spfileASM_Copy2.ora skip=16  bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.170611 seconds, 6.1 MB/s

[root@host01 ~]# strings spfileASM_Copy2.ora
+ASM1.__oracle_base='/u01/app/grid'#ORACLE_BASE set from in memory value
+ASM2.__oracle_base='/u01/app/grid'#ORACLE_BASE set from in memory value
+ASM3.__oracle_base='/u01/app/grid'#ORACLE_BASE set from in memory value
+ASM3.asm_diskgroups='FRA'#Manual Mount
+ASM2.asm_diskgroups='FRA'#Manual Mount
+ASM1.asm_diskgroups='FRA'#Manual Mount
*.asm_power_limit=1
*.diagnostic_dest='/u01/app/grid'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'
Using the parameters in SPfile, ASM is started.
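
Once ASM is up, the SPFILE location it actually used can be cross-checked from the instance itself (a quick check, assuming a SQL*Plus connection to the local +ASM instance as sysasm):

SQL> show parameter spfile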

Once ASM is up, OCR is read by CRSD and various resources on the node are started.
Each node reads the network information in the GPnP profile and, using GNS, negotiates an appropriate network identity for itself. Hence, nodes can be dynamically added/deleted.

What happens if GPnP profile is lost?
To know please click  here.
————————————————————
GPNPTOOL COMMAND REFERENCE:

How to get GPnP profile info on the local node RAC1
[root@rac1 ~]# gpnptool get      

How to read the profile
[root@inssc3 bin]# ./gpnptool get

How to find whether the GPnP daemon is running on the local node
[root@host01 peer]# gpnptool lfind
Success. Local gpnpd found.

How to find the location of ASM spfile if the ASM is down
[root@host01 peer]# gpnptool getpval -asm_spf
+DATA/cluster01/asmparameterfile/registry.253.783619911

How to find all RD-discoverable resources of a given type
[root@host01 peer]# gpnptool find
Found 3 instances of service 'gpnp'.
        mdns:service:gpnp._tcp.local.://host03:18015/agent=gpnpd,cname=cluster01,host=host03,pid=5066/gpnpd h:host03 c:cluster01
        mdns:service:gpnp._tcp.local.://host02:17637/agent=gpnpd,cname=cluster01,host=host02,pid=5236/gpnpd h:host02 c:cluster01

        mdns:service:gpnp._tcp.local.://host01:16633/agent=gpnpd,cname=cluster01,host=host01,pid=5206/gpnpd h:host01 c:cluster01

