Showing posts with label cpu. Show all posts

Sep 30, 2019

Bug 24590018 on Exadata: SCM0 at the top of CPU consumption





First of all, sorry for my language; my English is not very rich 😔


Straight to the point ... 

A few days ago I was working on a database migration from on-premises to a cloud environment (Exadata), and I noticed that the SCM0 background process was at the top of the CPU consumers. That was very strange, because the database was idle, with no active users on it.


TOP example:
top - 13:31:04 up 72 days, 11:01,  4 users,  load average: 13.41, 9.01, 7.45
Tasks: 5896 total,  13 running, 5870 sleeping,   0 stopped,  13 zombie
%Cpu(s): 22.0 us,  2.6 sy,  0.0 ni, 75.1 id,  0.0 wa,  0.0 hi,  0.1 si,  0.1 st
KiB Mem : 74261779+total,  5118632 free, 57913158+used, 15836755+buff/cache
KiB Swap: 16777212 total, 16776784 free,      428 used. 14184574+avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 74136 oracle    20   0 8299256  89592  82416 R  98.4  0.0  30840:38 ora_scm0_ctmtst
265577 oracle    20   0   79.7g 865372  85984 R  98.4  0.1   1518:56 oracle
343847 oracle    20   0   80.3g 117968  85712 R  67.7  0.0   8:10.80 oracle
 49052 odhagent  20   0   25.7g 464020  10012 S  28.8  0.1   4959:42 java
 24823 root      20   0 6273548 822004  61116 S  24.1  0.1   1526:05 ohasd.bin
  1887 oracle    20   0 8282336  99408  91092 S  22.6  0.0   0:00.72 oracle_1887_ctm
  1863 oracle    20   0 8413404  99384  91068 S  22.3  0.0   0:00.71 oracle_1863_ctm
393677 oracle    20   0   79.0g  84496  70540 S  22.3  0.0   0:02.01 oracle
396019 oracle    20   0  180680  82388  28944 S  21.9  0.0   0:04.45 rman
  1872 oracle    20   0 8413408  98800  90480 S  21.6  0.0   0:00.69 oracle_1872_ctm
  1877 oracle    20   0 8282332  98908  90572 S  21.6  0.0   0:00.69 oracle_1877_ctm
255797 oracle    -2   0 7749688  63404  59912 S  21.6  0.0   1159:44 ora_vktm_ctmts5
396710 oracle    20   0  180692  82420  28992 S  21.6  0.0   0:04.43 rman
  1867 oracle    20   0 8544480  98284  89968 S  21.3  0.0   0:00.68 oracle_1867_ctm

We can also see that the process had accumulated about 514 hours (roughly 21 days) of CPU time, awesome 😳🤨
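The arithmetic behind that figure can be checked with a small shell sketch. It assumes, as in the listing above, that top prints TIME+ for this process as minutes:seconds; the variable names are only illustrative.

```shell
# TIME+ of ora_scm0_ctmtst as printed by top above (assumed format: minutes:seconds)
time_plus="30840:38"

# Keep only the minutes part, then convert with integer arithmetic
minutes=${time_plus%%:*}
hours=$(( minutes / 60 ))
days=$(( minutes / 1440 ))

echo "${hours} hours, about ${days} days of CPU time"
```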

Although the instance had been started only a few days earlier, I could see the SCM0 background process at the top of the list, which is very weird. Well, this behaviour is a known issue, specifically:
12.2 RAC DB Background process SCM0 consuming excessive CPU (Doc ID 2373451.1)

That is a bug: Bug 24590018 - RAC PERF: SCM0 PROCESS USING 100% CPU, FG'S USING ~80% SYS CPU POSTING SCM0

In the note, Support gives the following explanation and solution:

The DLM Statistics Collection and Management slave (SCM0) is responsible for collecting and managing the statistics related to global enqueue service (GES) and global cache service (GCS). This slave exists only if DLM statistics collection is enabled.
The value of "_dlm_stats_collect" is set to 1. Run the following command to change it to 0:

alter system set "_dlm_stats_collect" = 0 scope = spfile sid = '*';

This requires a restart for the change to take effect. If a restart is not an option, as a workaround you may kill the SCM0 process at OS level; a new process will be respawned shortly afterwards:

kill -9 <os pid of SCM0>

Disabling DLM statistics collection has no negative impact on performance or anything else in 12.2. On 18c or 19c, however, it should be enabled again. So far there are no reports of this behaviour on the latest database versions.
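For the kill workaround, the OS PID of SCM0 has to be found first. A minimal sketch, assuming the background process is named ora_scm0_<SID> as in the top output above; the function name find_scm0_pid and the sample input are hypothetical.

```shell
# Print the PID of the SCM0 background process for a given instance.
# Reads "PID COMMAND" lines on stdin, for example from: ps -eo pid,args
find_scm0_pid() {
  awk -v name="ora_scm0_$1" '$2 == name { print $1 }'
}

# Sample input taken from the top listing above; on a live host one would
# run something like: ps -eo pid,args | find_scm0_pid ctmtst
pid=$(printf '%s\n' '74136 ora_scm0_ctmtst' '255797 ora_vktm_ctmts5' | find_scm0_pid ctmtst)
echo "SCM0 pid: ${pid}"
```

The PID printed is the one to pass to kill -9; checking it by eye before killing anything is a sensible precaution.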


Mar 28, 2017

Apply PSU to RAC 12c on AIX 7.1 (non-pluggable database environment)

Recently I had the opportunity to apply a PSU to a 12cR1 RAC environment (non-pluggable database architecture) on AIX 7.1. These are the steps I followed:

1.- Make sure the latest OPatch version (OPatch Version: 12.2.0.1.8) is installed on all nodes (same version everywhere).
2.- Download PSU 24917825 for the Grid Infrastructure.
3.- Download PSU 24732082 for the databases.
4.- Take a backup of the database and Grid Infrastructure ORACLE_HOMEs (and also of your oraInventory) on all nodes (I recommend it).
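The backup in step 4 could be sketched as follows. This is only an illustration: the function name backup_home and the destination directory are hypothetical, and the archive is piped through gzip so that it does not depend on tar having a -z option.

```shell
# Archive one directory (for example an ORACLE_HOME) into a dated .tar.gz
# under dest_dir and print the name of the archive that was written.
backup_home() {
  src=$1
  dest_dir=$2
  archive="${dest_dir}/$(basename "$src")_$(date +%Y%m%d).tar.gz"
  # cd to the parent so the archive holds a relative path
  ( cd "$(dirname "$src")" && tar -cf - "$(basename "$src")" ) | gzip > "$archive"
  echo "$archive"
}

# Hypothetical usage with a home from this post (run as a user able to read
# every file in it; /backup is an assumed destination):
#   backup_home /u01/app/12.1.0/grid /backup
```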

My installation media is under /u01/medios/ in my environments (same path on all nodes).

A PSU can be applied to RAC nodes in different ways. This is the method I chose, explained step by step.

Apply PSU to Grid Infrastructure


The steps are the following (run as the grid user):

cd /u01/medios
unzip p24917825_121020_AIX64-5L.zip   

Note: With OPatch version 12.2.0.1.8 it is not necessary to create a response file with the emocmrsp command.

Run the following steps as the root user; otherwise you will get error OPATCHAUTO-72046: Invalid wallet parameters (Doc ID 2150070.1). First we check for patch conflicts:

export ORACLE_HOME=/u01/app/12.1.0/grid
/u01/app/12.1.0/grid/OPatch/opatchauto apply /u01/medios/24917825 -analyze


If there are no patch conflicts, we can proceed to apply the patch to the Grid Infrastructure. Remember that this patch is rolling installable. It took me about 30 minutes to apply (as the root user):


[nodo01]#> export ORACLE_HOME=/u01/app/12.1.0/grid
[nodo01]#>export PATH=$PATH:$ORACLE_HOME/OPatch
[nodo01]#>/u01/app/12.1.0/grid/OPatch/opatchauto apply /u01/medios/24917825  -oh /u01/app/12.1.0/grid

OPatchauto session is initiated at Sat Feb  4 04:09:12 2017

System initialization log file is /u01/app/12.1.0/grid/cfgtoollogs/opatchautodb/systemconfig2017-02-04_03-09-19AM.log.

Session log file is /u01/app/12.1.0/grid/cfgtoollogs/opatchauto/opatchauto2017-02-04_03-09-52AM.log
The id for this session is DM67

Executing OPatch prereq operations to verify patch applicability on home /u01/app/12.1.0/grid
Patch applicablity verified successfully on home /u01/app/12.1.0/grid


Verifying patch inventory on home /u01/app/12.1.0/grid
Patch inventory verified successfully on home /u01/app/12.1.0/grid


Bringing down CRS service on home /u01/app/12.1.0/grid
Prepatch operation log file location: /u01/app/12.1.0/grid/cfgtoollogs/crsconfig/crspatch_cmh-racn1_2017-02-04_04-16-44AM.log
CRS service brought down successfully on home /u01/app/12.1.0/grid


Start applying binary patch on home /u01/app/12.1.0/grid
Successfully executed command: /usr/sbin/slibclean

Binary patch applied successfully on home /u01/app/12.1.0/grid


Starting CRS service on home /u01/app/12.1.0/grid
Postpatch operation log file location: /u01/app/12.1.0/grid/cfgtoollogs/crsconfig/crspatch_cmh-racn1_2017-02-04_04-34-16AM.log
CRS service started successfully on home /u01/app/12.1.0/grid


Verifying patches applied on home /u01/app/12.1.0/grid
Patch verification completed with warning on home /u01/app/12.1.0/grid

OPatchAuto successful.

--------------------------------Summary--------------------------------

Patching is completed successfully. Please find the summary as follows:

Host:cmh-racn1
CRS Home:/u01/app/12.1.0/grid
Summary:

==Following patches were SUCCESSFULLY applied:

Patch: /u01/medios/24917825/21436941
Log: /u01/app/12.1.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2017-02-04_03-18-06AM_1.log

Patch: /u01/medios/24917825/24732082
Log: /u01/app/12.1.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2017-02-04_03-18-06AM_1.log

Patch: /u01/medios/24917825/24828633
Log: /u01/app/12.1.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2017-02-04_03-18-06AM_1.log

Patch: /u01/medios/24917825/24828643
Log: /u01/app/12.1.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2017-02-04_03-18-06AM_1.log



OPatchauto session completed at Sat Feb  4 04:42:40 2017
Time taken to complete the session 33 minutes, 30 seconds

Now the same process must be repeated on the rest of the nodes. Once Grid Infrastructure is at the same version on all nodes, we can apply the PSU to the database software.

Apply PSU to database software


Just like the previous part, this is a rolling installation. We start with node 1. The first step is to shut down all database services on that node (all the following commands are run as the oracle user):
srvctl stop instance -d database_name -i instance_name
Then we check for patch conflicts:

[nodo01]#>cd /u01/medios/24732082
[nodo01]#>opatch prereq CheckConflictAgainstOHWithDetail -ph ./


If all the previous checks are fine, we proceed to apply the database PSU:

[nodo01]#>cd  /u01/medios/24732082
[nodo01]#>opatch  apply

Oracle Interim Patch Installer version 12.2.0.1.8
Copyright (c) 2017, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/12.1.0/dbhome_1
Central Inventory : /u01/app/oraInventory
  from           : /u01/app/oracle/product/12.1.0/dbhome_1/oraInst.loc
OPatch version    : 12.2.0.1.8
OUI version       : 12.1.0.2.0
Log file location : /u01/app/oracle/product/12.1.0/dbhome_1/cfgtoollogs/opatch/opatch2017-02-04_04-01-02AM_1.log

Verifying environment and performing prerequisite checks...
OPatch continues with these patches:   23054246  24006101  24732082  

Do you want to proceed? [y|n]
y
User Responded with: Y
All checks passed.

This node is part of an Oracle Real Application Cluster.
Remote nodes: 'cmh-racn2' 'cmh-racn3' 'cmh-racn4'
Local node: 'cmh-racn1'
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/oracle/product/12.1.0/dbhome_1')


Is the local system ready for patching? [y|n]
y
User Responded with: Y
Backing up files...
Applying sub-patch '23054246' to OH '/u01/app/oracle/product/12.1.0/dbhome_1'

Patching component oracle.rdbms.dv, 12.1.0.2.0...

Patching component oracle.rdbms.rsf, 12.1.0.2.0...

Patching component oracle.rdbms.rman, 12.1.0.2.0...

Patching component oracle.rdbms, 12.1.0.2.0...

Patching component oracle.rdbms.dbscripts, 12.1.0.2.0...

Patching component oracle.ldap.rsf, 12.1.0.2.0...
Skip copying to "/u01/app/oracle/product/12.1.0/dbhome_1/lib/libzt12.a" because it is the same as
the file in incoming patch "/u01/medios/24732082/23054246/files/lib/libzt12.a"

Patching component oracle.install.deinstalltool, 12.1.0.2.0...
Skip copying to "/u01/app/oracle/product/12.1.0/dbhome_1/lib/libldapjclnt12.so" because it is the same as
the file in incoming patch "/u01/medios/24732082/23054246/files/lib/libldapjclnt12.so"

Patching component oracle.ldap.rsf.ic, 12.1.0.2.0...

Patching component oracle.oracore.rsf, 12.1.0.2.0...

Patching component oracle.ctx, 12.1.0.2.0...

Patching component oracle.xdk, 12.1.0.2.0...
Skip copying to "/u01/app/oracle/product/12.1.0/dbhome_1/bin/xmldiff" because it is the same as
the file in incoming patch "/u01/medios/24732082/23054246/files/bin/xmldiff"
Skip copying to "/u01/app/oracle/product/12.1.0/dbhome_1/bin/xvm" because it is the same as
the file in incoming patch "/u01/medios/24732082/23054246/files/bin/xvm"
Skip copying to "/u01/app/oracle/product/12.1.0/dbhome_1/bin/xmlpatch" because it is the same as
the file in incoming patch "/u01/medios/24732082/23054246/files/bin/xmlpatch"
Skip copying to "/u01/app/oracle/product/12.1.0/dbhome_1/bin/xsl" because it is the same as
the file in incoming patch "/u01/medios/24732082/23054246/files/bin/xsl"
Skip copying to "/u01/app/oracle/product/12.1.0/dbhome_1/bin/xmlcg" because it is the same as
the file in incoming patch "/u01/medios/24732082/23054246/files/bin/xmlcg"

Patching component oracle.nlsrtl.rsf, 12.1.0.2.0...
Skip copying to "/u01/app/oracle/product/12.1.0/dbhome_1/bin/lxegen" because it is the same as
the file in incoming patch "/u01/medios/24732082/23054246/files/bin/lxegen"
Skip copying to "/u01/app/oracle/product/12.1.0/dbhome_1/bin/lcsscan" because it is the same as
the file in incoming patch "/u01/medios/24732082/23054246/files/bin/lcsscan"
Skip copying to "/u01/app/oracle/product/12.1.0/dbhome_1/bin/lxchknlb" because it is the same as
the file in incoming patch "/u01/medios/24732082/23054246/files/bin/lxchknlb"

Patching component oracle.xdk.parser.java, 12.1.0.2.0...
Skip copying to "/u01/app/oracle/product/12.1.0/dbhome_1/bin/xml" because it is the same as
the file in incoming patch "/u01/medios/24732082/23054246/files/bin/xml"

Patching component oracle.ctx.atg, 12.1.0.2.0...
Skip copying to "/u01/app/oracle/product/12.1.0/dbhome_1/lib/libanllexer12.so" because it is the same as
the file in incoming patch "/u01/medios/24732082/23054246/files/lib/libanllexer12.so"
Applying sub-patch '24006101' to OH '/u01/app/oracle/product/12.1.0/dbhome_1'

Patching component oracle.sqlplus, 12.1.0.2.0...

Patching component oracle.rdbms, 12.1.0.2.0...

Patching component oracle.network.listener, 12.1.0.2.0...

Patching component oracle.network.rsf, 12.1.0.2.0...

Patching component oracle.rdbms.dv, 12.1.0.2.0...

Patching component oracle.rdbms.rman, 12.1.0.2.0...

Patching component oracle.rdbms.dbscripts, 12.1.0.2.0...

Patching component oracle.sqlplus.ic, 12.1.0.2.0...

Patching component oracle.rdbms.rsf, 12.1.0.2.0...
Applying sub-patch '24732082' to OH '/u01/app/oracle/product/12.1.0/dbhome_1'

Patching component oracle.rdbms.install.plugins, 12.1.0.2.0...

Patching component oracle.rdbms.rsf, 12.1.0.2.0...
Skip copying to "/u01/app/oracle/product/12.1.0/dbhome_1/rdbms/mesg/oraus.msb" because it is the same as
the file in incoming patch "/u01/medios/24732082/24732082/files/rdbms/mesg/oraus.msb"

Patching component oracle.tfa, 12.1.0.2.0...

Patching component oracle.rdbms.rman, 12.1.0.2.0...

Patching component oracle.rdbms, 12.1.0.2.0...

Patching component oracle.rdbms.dbscripts, 12.1.0.2.0...
…………………..
…………………. (I cut this part because it is very, very long ☹)
………………….

Patching in rolling mode.

Remaining nodes to be patched:
'cmh-racn2' 'cmh-racn3' 'cmh-racn4'
What is the next node to be patched?



At this point the installer asks for the hostname of the next RAC node on which to apply the PSU. Now we start the database service again on this node only (node 1), and shut down the database service on the next node (node 2). Once the instance is up on node 1 and the service is down on node 2, we can continue with the installer.
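The hand-over between two nodes can be sketched as a dry run that only prints the srvctl commands. The function name rolling_handover and the instance names CMH1 and CMH2 are hypothetical placeholders; only the database name CMH appears in the logs of this post.

```shell
# Print (not execute) the srvctl commands for handing over from the node
# that was just patched to the next node to be patched.
rolling_handover() {
  db=$1
  patched_inst=$2
  next_inst=$3
  echo "srvctl start instance -d ${db} -i ${patched_inst}"
  echo "srvctl stop instance -d ${db} -i ${next_inst}"
}

rolling_handover CMH CMH1 CMH2
```

Once both commands have been run for real, the next hostname can be typed at the installer prompt.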

When the installation has finished on all nodes, we need to load the modified SQL into the database.

Load modified SQL into the database (datapatch)

This only needs to be run once, preferably on node 1 as the oracle user:


cd $ORACLE_HOME/OPatch
$ ./datapatch -verbose
SQL Patching tool version 12.1.0.2.0 Production on Wed Feb 22 05:35:11 2017
Copyright (c) 2012, 2016, Oracle.  All rights reserved.

Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_29229074_2017_02_22_05_35_11/sqlpatch_invocation.log

Connecting to database...OK
Bootstrapping registry and package to current versions...done
Determining current state...done

Current state of SQL patches:
Patch 22674709 (Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)):
 Installed in the binary registry only
Bundle series PSU:
 ID 170117 in the binary registry and ID 160419 in the SQL registry

Adding patches to installation queue and performing prereq checks...
Installation queue:
 Nothing to roll back
 The following patches will be applied:
   22674709 (Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016))
   24732082 (DATABASE PATCH SET UPDATE 12.1.0.2.170117)

Installing patches...
Patch installation complete.  Total patches installed: 2

Validating logfiles...
Patch 22674709 apply: SUCCESS
 logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/22674709/20077876/22674709_apply_CMH_2017Feb22_05_36_19.log (no errors)
Patch 24732082 apply: SUCCESS
 logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/24732082/20919987/24732082_apply_CMH_2017Feb22_05_43_58.log (no errors)
SQL Patching tool complete on Wed Feb 22 05:44:33 2017
You have mail in /usr/spool/mail/oracle



Now, when we query the catalog, we can see that our database is up to date with the latest PSU:
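A query along these lines is one way to check it. This is a sketch, not necessarily the query used in this post: it reads dba_registry_sqlpatch, the view that datapatch records into in 12c, and the variable name patch_sql is only illustrative.

```shell
# Query against dba_registry_sqlpatch (one row per datapatch action)
patch_sql="select patch_id, action, status, action_time, description
from dba_registry_sqlpatch
order by action_time;"

# Run it only where sqlplus is available (that is, on the database host);
# elsewhere just print the query text
if command -v sqlplus >/dev/null 2>&1; then
  printf '%s\n' "set linesize 200" "$patch_sql" "exit" | sqlplus -s "/ as sysdba"
else
  echo "sqlplus not found; query text:"
  echo "$patch_sql"
fi
```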


We can run the same check against the oraInventory for the Grid and database ORACLE_HOMEs:
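A minimal sketch of that inventory check, using opatch lspatches with the two homes that appear in the logs of this post. The function name list_patches is hypothetical, and each home would normally be checked as its own software owner.

```shell
# List the patches registered in the inventory for one ORACLE_HOME.
# Returns 1 if no opatch executable is found under that home.
list_patches() {
  home=$1
  if [ ! -x "${home}/OPatch/opatch" ]; then
    echo "no opatch under ${home}" >&2
    return 1
  fi
  ORACLE_HOME="$home" "${home}/OPatch/opatch" lspatches
}

# Homes taken from the logs in this post
for home in /u01/app/12.1.0/grid /u01/app/oracle/product/12.1.0/dbhome_1; do
  list_patches "$home" || true
done
```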

What remains is to explain what happens with the OJVM upgrade (coming soon).

Bye : )

Jul 4, 2016

Problem with SMON consuming 100% CPU after killing a database session



I want to share a situation I ran into involving excessive CPU usage by SMON after killing a database session that was causing a lock. It happened at a client site: after some sessions that were causing a lock were killed, SMON was left hung consuming 100% of one core, and note that this lasted more than 24 hours, until I applied what I describe below.


This caught my attention because it is not normal. Searching, I found that it is a bug, caused by a transaction that SMON cannot roll back. The following shows the transaction with status DEAD:
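A query along these lines is commonly used to list dead transactions. It is a sketch rather than the exact query used here; it reads the same x$ktuxe fixed table as the verification query at the end of this post and has to be run as SYS.

```shell
# Query text kept in a quoted here-document so the shell leaves x$ktuxe alone
dead_txn_sql=$(cat <<'EOF'
select ktuxeusn, ktuxeslt, ktuxesqn, ktuxesta, ktuxecfl, ktuxesiz
from x$ktuxe
where ktuxesta = 'ACTIVE' and ktuxecfl = 'DEAD';
EOF
)

# Run it only where sqlplus is available; elsewhere just print the query text
if command -v sqlplus >/dev/null 2>&1; then
  printf '%s\n' "$dead_txn_sql" "exit" | sqlplus -s "/ as sysdba"
else
  echo "$dead_txn_sql"
fi
```

If ktuxesiz (the undo blocks still to be rolled back) shrinks between runs, the rollback is usually progressing; a value that never changes suggests it is stuck.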


This database used undo_management set to MANUAL, that is, rollback segments. As can be seen, the transaction that SMON is trying to undo lives in segment RBS_4. After a shutdown and startup of the database, it still showed SMON undoing that transaction and still keeping the CPU at 100%.

This is the result of a bug that affects this database version:
Bug 18132629 : UNDO SEGMENT IN PARTLY AVAILABLE STAGE, DEAD TRANSACTION EXIST.

Normally an 11g database uses undo_management set to AUTO rather than MANUAL. Leaving aside that this is not the right setup for 11g, the only alternative left was to use a hidden parameter called _OFFLINE_ROLLBACK_SEGMENTS, which must be used with great caution: used incorrectly, it can leave inconsistencies or logical corruption in the database. Before applying the following, make sure you have a backup of the database. The parameter must be given the segment to take offline (this is the only way to take it offline, because the traditional way, "alter rollback segment RBS_4 offline", is not allowed by the engine while there is still an active transaction inside it, even if its status is DEAD).

alter system set "_OFFLINE_ROLLBACK_SEGMENTS"='RBS_4' scope=spfile ;

After restarting the database and opening it in normal mode, the segment can be dropped without problems:

SQL> drop rollback segment RBS_4;

Rollback segment dropped.

After that, we verify that the transaction is finally no longer rolling back:

SQL> select b.name "UNDO Segment Name", b.inst# "Instance ID", b.status$ STATUS,
a.ktuxesiz "UNDO Blocks", a.ktuxeusn, a.ktuxeslt xid_slot, a.ktuxesqn xid_seq, a.ktuxecfl
from x$ktuxe a, undo$ b
where a.ktuxesta = 'ACTIVE' and a.ktuxeusn = b.us#
/

no rows selected

Note that this must be used only as a last resort. I suggest removing the parameter from the spfile once the workaround has been applied. Keep in mind that if something goes wrong, as with any unauthorized use of hidden parameters, Oracle Support will often wash its hands of it, as I have seen many times. Do not forget:

alter system reset "_OFFLINE_ROLLBACK_SEGMENTS" scope=spfile;

Regards.