APPLIES TO: 
Oracle Database - Enterprise Edition - Version 9.0.1.0 and later 
Oracle Database - Personal Edition - Version 9.0.1.0 and later Oracle Database - Standard Edition - Version 9.0.1.0 and later Information in this document applies to any platform. 
When a database appears to be hung, it is useful to collect
  information from the database in order to determine the root cause of the
  hang. The root cause of  the hang can often be isolated and solved using
  the diagnostic information gathered. Alternatively, if this is not possible,
  we can use the information obtained in order to help eliminate future
  occurences 
Service Request Data Collection (SRDC) documents have been specially
  designed to provide the reader with the necessary instructions to provide a
  step by step guide to collecting information for a various common Database
  Performance Issues. 
Document 1938786.1 List of Available
  Database Performance Related SRDC Documents 
Database hangs are characterised by a number of processes waiting for
  some other activities to complete. Typically there is one or more blockers
  that are stuck or perhaps working hard and not freeing resources quickly
  enough. In order to diagnose this the following diagnostics are needed: 
A. Hanganalyze and Systemstate Dumps 
B. AWR/Statspack snapshots of General database performance 
C. Up to date RDA 
Please refer to the relevant sections below for more details on how to
  collect these. 
Note: Hangs within Multitenant Databases 
If you are running a Multitenant Database then you should determine whether the hang situation that you are encountering is at the container level or with one specific pluggable databases (PDBs). Once this is established then connect to and collect the diagnostics only within the PDB that is experiencing the issue. If it is unclear where the hang is, then from the diagnostic point of view it is better to collect diagnostics connected to the Root container, so that ALL processes for all PDBs are covered rather than omit useful information. However, if you have a large number of PDBs and only one of them is 'hanging' this could result in the collection of a large amount of unrelated data. With this in mind, please make every effort to identify what is hanging and collect only within that database. 
Hanganalyze and Systemstate dumps provide information on the processes
  in the database at a specific point in time. Hanganalyze provides information
  on all processes involved in the hang chain, whereas systemstate provides
  information on all processes in the database. When looking at a potential
  hang situation, you need to determine whether a process is stuck or moving
  slowly. By collecting these dumps at 2 consecutive intervals  this can
  be established. If a process is stuck, these traces also provide the
  information to start further diagnosis and possibly help to provide the
  solution. 
 
Using SQL*Plus connect as SYSDBA using the following command: 
sqlplus '/ as sysdba' 
If there are problems making this connection then in 10gR2 and above,
  the sqlplus "preliminary connection" can be used : 
sqlplus -prelim '/ as sysdba' 
Note: From 11.2.0.2 onwards, hanganalyze will not produce output under a
  sqlplus "preliminary connection" since it requires a process state
  object and a session state object. If a hanganalyze is attempted, although
  the hanganalyze will appear to be successful: 
SQL>  oradebug hanganalyze 3 
Statement processed. the tracefile will contain the following output: 
HANG ANALYSIS: 
ERROR: Can not perform hang analysis dump without a process state object and a session state object. ( process=(nil), sess=(nil) ) For more about connecting with a preliminary connection, see: 
Document 986640.1 How To Connect Using A
  Sqlplus Preliminary Connection 
Sometimes, database may actually just be very slow and not actually
  hanging. It is therefore recommended,  where possible to get 2
  hanganalyze and 2 systemstate dumps in order to determine whether processes
  are moving at all or whether they are "frozen". 
Hanganalyze 
sqlplus '/ as sysdba' 
oradebug setmypid oradebug unlimit oradebug hanganalyze 3 -- Wait one minute before getting the second hanganalyze oradebug hanganalyze 3 oradebug tracefile_name exit 
Systemstate 
sqlplus '/ as sysdba' 
oradebug setmypid oradebug unlimit oradebug dump systemstate 266 oradebug dump systemstate 266 oradebug tracefile_name exit 
There are 2 bugs affecting RAC that without the relevant patches being
  applied on your system, make using level 266 or 267 very costly. Therefore
  without these fixes in place it highly unadvisable to use these level 
For information on these patches see: 
Document 11800959.8 Bug 11800959 - A
  SYSTEMSTATE dump with level >= 10 in RAC dumps huge BUSY GLOBAL CACHE
  ELEMENTS - can hang/crash instances 
Document 11827088.8 Bug 11827088 - Latch 'gc element' contention, LMHB terminates the instance 
Note:  both bugs are fixed in 11.2.0.3. 
Collection commands
  for Hanganalyze and Systemstate: RAC with fixes for bug 11800959 and bug 11827088 
 For 11g: 
sqlplus '/ as sysdba' 
oradebug setorapname reco oradebug unlimit oradebug -g all hanganalyze 3 oradebug -g all hanganalyze 3 oradebug -g all dump systemstate 266 oradebug -g all dump systemstate 266 exit 
Collection commands
  for Hanganalyze and Systemstate: RAC without fixes for Bug 11800959 and Bug 11827088 
sqlplus '/ as sysdba' 
oradebug setorapname reco oradebug unlimit oradebug -g all hanganalyze 3 oradebug -g all hanganalyze 3 oradebug -g all dump systemstate 258 oradebug -g all dump systemstate 258 exit 
For 10g, run oradebug setmypid instead of oradebug setorapname reco: 
sqlplus '/ as sysdba' 
oradebug setmypid oradebug unlimit oradebug -g all hanganalyze 3 oradebug -g all hanganalyze 3 oradebug -g all dump systemstate 258 oradebug -g all dump systemstate 258 exit 
In RAC environment, a dump will be created for all RAC instances in
  the DIAG trace file for each instance. 
Hanganalyze levels: 
 
Systemstate levels: 
 
If connection to the system is not possible in any form, then please
  refer to the following article which describes how to collect systemstates in
  that situation: 
Document 121779.1 Taking a SYSTEMSTATE
  dump when you cannot CONNECT to Oracle. 
On RAC Systems, hanganalyze, systemstates and some other RAC
  information can be collected using the 'racdiag.sql' script, see: 
Document 135714.1 Script to Collect RAC
  Diagnostic Information (racdiag.sql) 
Sometimes you may wish to collect systemstate information at the time
  that a particular error occurs. This can be done by setting an event in the
  session or system wide to trigger based upon the detection of an error. For
  example, if a hang was being encountered that was related to an ORA-00054
  error, then you could capture a systemstate when the ORA-00054 occurs using
  the following command: 
sqlplus '/ as sysdba' 
ALTER SYSTEM SET events '54 trace name systemstate level 258'; 
The next time an ORA-00054 is encountered, a systemstate will be
  dumped. 
The tracing can be disabled with : 
ALTER SYSTEM SET events '54 trace name context off'; 
You can also set such events in the spfile. See: 
Document 160178.1 How To Set EVENTS In The
  SPFILE  
Note: be aware that this will produce a trace for every occurrence of
  the error. 
Starting from 11g release 1, the dia0 background processes starts
  collecting hanganalyze information and stores this in memory in the
  "hang analysis cache". It does this every 3 seconds for local
  hanganalyze information and every 10 seconds for global (RAC) hanganalyze
  information. This information can provide a quick view of hang chains
  occurring at the time of a hang being experienced. 
For more information see: 
Document 1428210.1 Troubleshooting
  Database Contention With V$Wait_Chains 
Hangs are a visible effect of a number of potential causes, this can
  range from a single process issue to something brought on by a global
  problem. 
Collecting information about the general performance of the database in the build up to, during and after the problem is of primary importance since these snapshots can help to determine the nature of the load on the database at these times and can provide vital diagnostic information. This may prove invaluable in identifying the area of the problem and ultimately resolving the issue. To do this, please take and upload snapshot reports of database performance (AWR (or statspack) reports) immediately before, during and after the hang.. Please refer to the following article for details of what to collect: 
Document 781198.1 Diagnostics for Database
  Performance Issues 
An up to date current RDA provides a lot of additional information
  about the configuration of the database and performance metrics and can be
  examined to spot background issues that may impact performance. 
See the following note on My Oracle Support: 
Document 314422.1 Remote Diagnostic Agent
  (RDA) 4 - Getting Started 
On some systems a hang can occur when the DBA is not available to run
  diagnostics or at times it may be too late to collect the relevant
  diagnostics. In these cases, the following methods may be used to gather
  diagnostics: 
 
Document 362094.1 HANGFG User Guide 
Additionally, this script can collect
  information with lower impact on the target database. 
 
Document 352363.1 LTOM - The On-Board
  Monitor User Guide 
 
Document 459694.1 Procwatcher: Script to
  Monitor and Examine Oracle DB and Clusterware Processes 
Document 1352623.1 How To Troubleshoot Database Contention With Procwatcher 
 
Document 301137.1 OSWatcher User Guide
  (Includes: [Video]) 
Real-Time ADDM is a feature of Oracle Enterprise Manager Cloud Control
  12c that allows you to analyze database performance automatically when you
  cannot logon to the database because it is hung or performing very slowly due
  to a performance issue. It analyzes current performance when database is
  hanging or running slow and reports sources of severe contention. 
For more information see the following video: 
Sometimes we may only notice a hang after it has occurred. In this
  case the following information may help with Root Cause Analysis: 
 
Document 243132.1 10g and above Active
  Session History (Ash) And Analysis Of Ash Online And Offline 
Document 555303.1 ashdump* scripts and post-load processing of MMNL traces 
  | 
 
I serve as a go-to resource for anyone involved in managing and maintaining databases, offering insights and solutions to common challenges, as well as exploring new database technologies and strategies to improve database efficiency and reliability.
Monday, 15 February 2016
How to Collect Diagnostics for Database Hanging Issues (Doc ID 452358.1)
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment