Insights and Strategies for DBA: HangAnalyze & SYSTEMSTATE

Type of Hang issues in oracle

1.Hang Database
2.Hang Session
3.Hang Instance/Performance
4.Hang DMLs

sqlplus -prelim' option and before I perform instance abort, I have collected the hanganalyze (level 3) and systemstate (level 266)

Most critial is HANG DATABASE , Below how to diagnose and detection of HANG issue.

If you encounter a database-hang situation, you need to take system state dumps so that Oracle Support can begin to diagnose the root cause of the problem.
Whenever you take such dumps for a hang, it is important to take at least three of them a few minutes apart, on all instances of your database.
That way, evidence shows whether a resource is still being held from one time to the next.

The maxdump file size should be set to unlimited, as this will generate bigger and larger trace files, depending on the size of the System Global Area (SGA),
the number of sessions logged in, and the workload on the system. The SYSTEMSTATE dump contains a separate section with information for each process.
Normally, you need to take two or three dumps in regular intervals.

Its a HUGE FILE.

- Hanganalyze is a summary and will confirm if the db is really hung or just slow and provides a consistent snapshot.

- Systemstate dump shows what each process on the database is doing

PMON trace file generate, when database in hang state and Execute " oradebug DUMP SYSTEMSTATE " again repeat at the time of Database into HANG STATE

--------------------------
- Trace Level -
--------------------------
1-2 : Only hanganalyze output. No process dump at all.
3 : Level 2 + Dump only processes thought to be in a hang (IN_HANG state.)
4 : Level 3 + Dump leaf nodes (blockers) in wait chains.
5 : Level 4 + Dump all processes involved in wait chains
10 : Dump all processes.

ALTER SESSION SET EVENTS ‘IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 266′;

For Oracle 9.2.0.5 and less use level 10 instead of 266

ALTER SESSION SET EVENTS ‘IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 266′;

Level 266 includes short stacks (Oracle function calls) which are useful for Oracle Developers to determine which Oracle function’s are causing the problem. This is also helpful in matching existing bugs.

In case you are unable to connect to database then capture systemstate using below note

Note 121779.1-Taking a SYSTEMSTATE dump when you cannot CONNECT to Oracle.

Apart from this, following information can also be captured

a)Database alert log

b)AWR report /statspack report for 30-60 min during database hang

c)Output of OS tools to ensure that everything is fine at OS level.

- Following Methods of SYSTEMSTATE.

1. SQL*PLUS

alter session set max_dump_file_size=unlimited;

alter session set events 'immediate trace name systemstate level 10';

2. USING oradebug

(a) Non-Rac

sqlplus "/ as sysdba"

oradebug setmypid

oradebug unlimit

oradebug dump systemstate 266

wait 90 seconds

oradebug dump systemstate 266

wait 90 seconds

oradebug dump systemstate 266

oradebug tracefile_name

quit

(b) RAC

$ sqlplus '/ as sysdba'

oradebug setmypid

oradebug unlimit

oradebug setinst all

oradebug -g all hanganalyze 4

oradebug tracefile_name

3. When the entire database is hung and you cannot connect to SQL*Plus, you can try invoking SQL*Plus with the prelim option if you’re using Oracle 10g or later.

-- RAC / NON-RAC

RAC Hanganalyze and Systemstate with fixes for bug 11800959 and 11827088

sqlplus -prelim

oradebug setmypid

oradebug unlimit

oradebug -g all hanganalyze 3

oradebug -g all dump systemstate 266 ( -g is denote to global n its using in case of RAC)

oradebug tracefile_name

Note: - Above the 3 methods generate the trace file into the USER_DUMP_DEST

Caution: Do not kill critical processes like SMON or PMON as that would terminate the instance.

Other Commands:

To Check the current trace file name:

SQL> oradebug tracefile_name

Flush any pending writes to the trace file and close it:

SQL> oradebug flush
SQL> oradebug close_trace

To trace a specific session with it's OS PID:

SQL> oradebug setospid

To trace a specific session with it's Oracle PID:

SQL> oradebug setorapid

To freeze/Suspend a session:

First you have to get the PID for that session and set ORADEBUG as it:

SQL> oradebug setospid 12518

*12518 is the PID for the session you want to kill

Second Freeze/Hang the session even it running in a middle of something it will freeze:

SQL> oradebug suspend

To UnFreeze/Resume the session to continue it's work:

SQL> oradebug resume

Trace Oracle Clusterware:

Trace CRS events:

SQL> oradebug dump crs 3

Trace CSS behaviour:

SQL> oradebug dump css 3

Trace OCR:

SQL> oradebug dump ocr 3

Remember:

-You can run oradebug in parallel using diffrent sqlplus sessions.

-ORADEBUG utility is poorly documented by Oracle because the caveats with using this tool to avoid potential damage to the database when calling kernel functions.

PART-II

Reference:

Doc ID 976714.1 : Resolving Issues Where 'PMON failed to acquire latch, see PMON dump' Warnings are Seen in the Alert Log.

Doc ID 452358.1 : How to Collect Diagnostics for Database Hanging Issues.

Doc ID 1951971.1 : SRDC - How to Collect Standard Information for Issues Where 'PMON failed to acquire latch, see PMON dump' Warnings are Seen in the Alert Log.

Doc ID 215858.1 : Interpreting HANGANALYZE trace files to diagnose hanging and performance problems for 9i and 10g.

Insights and Strategies for DBA

Labels

Tuesday, 2 December 2014

HangAnalyze & SYSTEMSTATE

No comments: