Monday, 15 February 2016

SRDC - How to Collect Standard Information for Issues Where 'PMON failed to acquire latch, see PMON dump' Warnings are Seen in the Alert Log (Doc ID 1951971.1)

Product:
  • Oracle Database - Enterprise Edition - Version 6.0.0.0 and later
  • Information in this document applies to any platform.
What is being collected and why?
Provide a step by step guide to collecting information for issues where 'PMON failed to acquire latch, see PMON dump' warnings are seen in the alert log
Action Plan
Diagnostics Checklist
 1) Alert Log covering the period of the warning                                       [  ]
 2) Collect the PMON trace file                                                        [  ]

 Required diagnostics if the database hangs at the time of the warning
 3) Generate 2 Hang Analyze Level 3 traces                                             [  ]
 4) Generate 1 Systemstate Level 258 trace                                             [  ]
 5) Collect OSWatcher data                                                             [  ]
 6) Collect other trace files                                                          [  ]
 7) Gather Other Diagnostics                                                           [  ]

 With Diagnostic pack license
   a) Generate and Check ADDM report, Implementing findings                            [  ]
   b) AWR report covering problem period                                               [  ]
   d) AWR Compare Periods report comparing the good and bad periods                    [  ]
   e) Collect an ASH report for the hang period                                        [  ]

 Without  Diagnostic pack license

   a) Gather a statspack report at the time of the issue                               [  ]
   b) Gather a statspack report covering good period for comparison                    [  ]

 Collate and upload the diagnostic information

 8) Collate and upload the diagnostic information                                      [  ]


Diagnostics for 'PMON failed to acquire latch, see PMON dump' Warnings Seen in the Alert Log

1) Alert log covering the time that the warning occurred

The alert log of a database is a chronological log of messages, errors and warnings that occur. To find the directory in which the alert log resides, use the following select.

SQL> select value from v$diag_info where name ='Diag Trace';

VALUE
-----------------------------------------
/dbadmin/diag/rdbms/v12101/trace

The alert.log is named alert_SID.log.

In an example where the SID is "v120101" and the directory the alert log resided in was "/dbadmin/diag/rdbms/v12101/trace", the full path of the alert log would be:

/dbadmin/diag/rdbms/v12101/trace/alert_v12101.log


2) Collect the PMON trace file

Collect the PMON trace file. The PMON trace file resides in the same directory as the alert log. You can find it using the following select:

SQL> select value from v$diag_info where name ='Diag Trace';

VALUE
-----------------------------------------
/dbadmin/diag/rdbms/v12101/trace

The PMON trace is named SID_pmon_PROCESSID.trc. 

In an example where the SID is "v120101", PROCESSID = 3841 and the directory the alert log resided in was "/dbadmin/diag/rdbms/v12101/trace", the full path of the PMON trace would be:

/dbadmin/diag/rdbms/v12101/trace/v12101_pmon_3841.trc

Getting a PMON process dump from Oracle
Oracle has delivered a hidden tool within server manager (SQL*Plus in Oracle8 and beyond), that allows you to view specific internal Oracle structures.
The following procedure provides a process dump for the Oracle database and shows all statistics for the PMON background process.. 

1 – We start by issuing the oradebug setorapid command to process 2.  Process number 2 is the Oracle process monitor (PMON)
SVRMGR> oradebug setorapid 2
Unix process pid: 25159, image: ora_pmon_test

2 – Next we issue the procstat command to generate the statistics 
SVRMGR> oradebug procstat
Statement processed.

3 – Now we can use the TRACEFILE_NAME command to see the location of our trace file
SVRMGR> oradebug TRACEFILE_NAME
/app/oracle/admin/orcl/bdump/pmon_25159.trc

Below is the listing from this procedure.  As you can see, this provides detailed information regarding important PMON activities.
SQL> !cat /u01/app/oracle/admin/prodc1/bdump/prodc1_pmon_25159.trc


Required diagnostics if the database hangs at the time of the warning

When the 'PMON failed to acquire latch, see PMON dump' warnings are seen in the alert log, it is likely that the database will hang or appear to hang. It is very important to collect diagnostic traces immediately when you encounter the hang AND while it is occurring. If you capture data long after you first encounter a hang, it may not provide enough information as to what originally caused the hang. The quicker you can capture hang diagnostics the easier they are to interpret since there are likely to be less blocked processes. For this reason capture hanganalyze and systemstate dumps prior to collecting other information such as: AWR/ASH/STATSPACK, etc.

Specifically, Hanganalyze and Systems State dumps must be taken while the hang is occurring otherwise they are of little to no use for diagnosis.

3) Collect HangAnalyze traces

Generate 2 HangAnalyze Level 3 traces 1 minute apart, connected as the sys user:

# sqlplus / as sysdba
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug tracefile_name
SQL> oradebug hanganalyze 3

The location of the hanganalyze trace is written to the screen by the oradebug tracefile_name command and also after the hanganalyze command has been run. It can be found in the Oracle trace directory.
Wait 1 minute and repeat.

4) Collect Systemstate dump

Generate 1 Systemstate Level 258 , connected as the sys user:

# sqlplus / as sysdba
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug tracefile_name
SQL> oradebug dump systemstate 258

The location of the systemstate trace is written to the screen by the oradebug tracefile_name command. It can be found in the Oracle trace directory.

5) Collect OSWatcher (osw) data

If you have OSWatcher installed and it was collecting data over the period of the problem, then "tar up" the OSWatcher archive data and upload the entire archive directory to the SR. The location of OSWatcher data can be found by looking in /tmp/osw.hb. If not then omit this step. See Document 301137.1 for details about OSWatcher.

6) Collect any other Oracle trace files that got generated during the time of the hang

Check the Oracle trace directory for any other files whose timestamp is within the timestamp window of the hang.

7) Gather Additional Diagnostics

The final part of the collection will depend upon whether you have a diagnostic pack license:


With Diagnostic Pack License
If you have the diagnostic pack license:


7a) Generate and Check ADDM report, implement findings, re-test
Collect an ADDM (Automatic Database Diagnostic Monitor) report covering a short period while the problem is occurring (a 1 hour snapshot duration (or less) is preferable).  Gather the ADDM report through Oracle Enterprise Manager or in SQL*Plus using the ADDM report generation script: $ORACLE_HOME/rdbms/admin/addmrpt.sql :
# cd $ORACLE_HOME/rdbms/admin/
# sqlplus SCOTT/TIGER
SQL> START addmrpt.sql

For more information see Document 1680075.1 "How to Generate and Check an ADDM report" for instructions on how to do this.

Once you have Collected an ADDM Report, reviewed its findings and implemented any recommendations, re-check the performance. If you are still encountering problems, proceed to the next step.

Gather Diagnostics AWR reports

Collect AWR reports covering 2 periods. These periods should be of the same duration(a 1 hour snapshot duration (or less) is preferable) and should cover:
  • The same problem period as with the ADDM report above
  • (If possible) Another period where the problem was not being seen, but the load was similar (for comparison)

7b) AWR report for the problem period
To gather an AWR report, use the AWR report generation script: $ORACLE_HOME/rdbms/admin/awrrpt.sql :

# cd $ORACLE_HOME/rdbms/admin/
# sqlplus SCOTT/TIGER
SQL> START awrrpt.sql

The report will prompt whether you want HTML or TXT (choose HTML)
For more details, refer to the following article:

Document 1903158.1 How to Collect Standard Diagnostic Information Using AWR Reports for Performance Issues

7c) AWR report for another period

If possible, gather a report from another period with similar load where problems are not seem. For example, if you had problems at 2pm today but things were fine yesterday, collect a report from yesterday at 2pm. Use the procedure above to collect the report.

7d) AWR Compare Periods report comparing the 'good' and 'bad' periods

If possible, gather a Workload Repository Compare Periods report to compare the good and bad periods selected above. This can help quickly identify differences so that efforts can be focused on likely problem areas. To collect the report use the awrddrpt.sql from the $ORACLE_HOME/rdbms/admin directory supplying the Begin and End Snapshot Ids for both periods:

# cd $ORACLE_HOME/rdbms/admin/
# sqlplus SCOTT/TIGER
SQL> START awrddrpt.sql

7e) ASH report

If SQL performance is suspected as the cause of the slowness then collect an ASH report for the same problem period. To gather an ASH report use the ASH report generation script: $ORACLE_HOME/rdbms/admin/ashrpt.sql :

# cd $ORACLE_HOME/rdbms/admin/
# sqlplus SCOTT/TIGER
SQL> START ashrpt.sql

The report will prompt whether you want HTML or TXT (choose HTML).
For more details on gathering ASH reports, refer to the following article:

Document 1903145.1 How to Collect Standard Diagnostic Information Using ASH Reports for Performance Issues

Without Diagnostic Pack License

If you do not have a diagnostic pack license then you need to install and gather a statspack report at the time of the issue. To install statspack, see:

Document 1931103.1 How to Collect Standard Diagnostic Information Using Statspack Reports for Performance Issues

The Statspack report is similar to AWR, but AWR is much more comprehensive and contains significant additional information not available in Statspack.

7a) Gather a statspack report at the time of the issue

Assuming that you had statspack installed at the time of the incident and staspack snapshots were being collected, you can gather a statspack report using the following command:

SQL> connect perfstat/perfstat
SQL> start ?/rdbms/admin/spreport

You will be prompted for the beginning snapshot Id, the ending snapshot Id and the name of the report text file to be created

7b) Gather a statspack report for another period

If possible, gather a report from another period with similar load where problems are not seem. For example, if you had problems at 2pm today but things were fine yesterday, collect a report from yesterday at 2pm. Use the procedure above to collect the report.

Next Steps

8) Collate and upload the diagnostic information
You can find more guidance regarding troubleshooting 'PMON failed to acquire latch, see PMON dump' warnings in the following document:

Document 976714.1 Resolving Issues Where 'PMON failed to acquire latch, see PMON dump' Warnings are Seen in the Alert Log

If you have been unable to resolve your issue, please collate the information and upload the files to support along with a clear problem explanation. In most cases the supplied information will be sufficient to progress the issue towards a solution but, in some cases, further follow up information may be required. The list of files to upload will be:

Alert log covering problem period along with any trace files referenced in that period
PMON trace file: SID_pmon_PROCESSID.trc
Hanganalyze trace file(s)
Systemstate trace file(s)
OSWatcher data
With Diagnostic pack license
Without Diagnostic pack license
ADDM Report (Text)
3 AWR reports (HTML):
  AWR report covering problem period
  AWR report covering good period
  AWR Compare Periods report (good/bad)
ASH (HTML) report for the same period

2 Statspack reports (Text):
  Statspack report covering problem period
  Statspack report covering good period



No comments: