  1. RAC Troubleshooting Julian Dyke Independent Consultant Web Version - May 2008 ©2008 Julian Dyke juliandyke.com

  2. Agenda • Installation and Configuration • Oracle Clusterware • ASM and RDBMS

  3. Installation and Configuration

  4. Cluster Verification Utility Overview • Introduced in Oracle 10.2 • Checks cluster configuration • stages - verifies all steps for specified stage have been completed • components - verifies specified component has been correctly installed • Supplied with Oracle Clusterware • Can be downloaded from OTN (Linux and Windows) • Also works with 10.1 (specify -10gR1 option) • For earlier versions see Metalink Note 135714.1, Script to Collect RAC Diagnostic Information (racdiag.sql)

  5. Cluster Verification Utility CVUQDISK Package • On the Red Hat 4 and Enterprise Linux platforms, the following additional RPM is required for CLUVFY: cvuqdisk-1.0.1-1.rpm • This package is supplied in the clusterware/cluvfy/rpm directory on the clusterware CD-ROM • It can also be downloaded from OTN • On each node as the root user install the RPM using: rpm -ivh cvuqdisk-1.0.1-1.rpm
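      A quick way to confirm that the package is already present on a node is to query the RPM database (a minimal check; the node name london1 is only an example, and the version string may differ on your system):

      [root@london1]# rpm -qa | grep cvuqdisk
      cvuqdisk-1.0.1-1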

  6. Cluster Verification Utility Stages • CLUVFY stages include:

  7. Cluster Verification Utility Components • CLUVFY components include:

  8. Cluster Verification Utility Example • For example, to check the configuration before installing Oracle Clusterware on london1 and london2 use: sh runcluvfy.sh stage -pre crsinst -n london1,london2 • Checks: • node reachability • user equivalence • administrative privileges • node connectivity • shared storage accessibility • If any checks fail, append -verbose to display more information (as shown below)
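      For instance, a failed run can be repeated with per-node detail like this (a sketch of the invocation only; the output varies by environment and is omitted here):

      sh runcluvfy.sh stage -pre crsinst -n london1,london2 -verbose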

  9. Cluster Verification Utility Trace & Diagnostics • To enable trace in CLUVFY use: export SRVM_TRACE=true • Trace files are written to the $CV_HOME/cv/log directory • By default this directory is removed immediately after CLUVFY execution completes • On Linux/Unix comment out the following line in runcluvfy.sh to preserve it: # $RM -rf $CV_HOME • The pathname of the CV_HOME directory is based on the operating system process ID e.g. /tmp/18124 • It can be useful to echo the value of CV_HOME in runcluvfy.sh: echo CV_HOME=$CV_HOME
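      Putting these steps together, a traced run might look like the following (a minimal sketch; the process ID 18124 is only an example, and the output line assumes the echo suggested above has been added):

      $ export SRVM_TRACE=true
      $ vi runcluvfy.sh                  # comment out the "$RM -rf $CV_HOME" line and add: echo CV_HOME=$CV_HOME
      $ sh runcluvfy.sh stage -pre crsinst -n london1,london2
      CV_HOME=/tmp/18124
      $ ls /tmp/18124/cv/log             # trace files are preserved here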

  10. Oracle Universal Installer (OUI) Trace & Diagnostics • On Unix/Linux to launch the OUI with tracing enabled use: ./runInstaller -J-DTRACING.ENABLED=true -J-DTRACING.LEVEL=2 • Log files will be written to $ORACLE_BASE/oraInventory/logs • To trace root.sh execute it using: sh -x root.sh • Note that it may be necessary to clean up the CRS installation before executing root.sh again
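      To keep a record of the root.sh trace for later analysis, the traced output can be captured to a file (a simple sketch; the log file name is arbitrary):

      # sh -x root.sh 2>&1 | tee /tmp/root_sh_trace.log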

  11. DBCA Trace & Diagnostics • To enable trace for the DBCA in Oracle 9.0.1 and above • Edit $ORACLE_HOME/bin/dbca and change:

      # Run DBCA
      $JRE_DIR/bin/jre -DORACLE_HOME=$OH -DJDBC_PROTOCOL=thin -mx64m -classpath $CLASSPATH oracle.sysman.assistants.dbca.Dbca $ARGUMENTS

      • to:

      # Run DBCA
      $JRE_DIR/bin/jre -DORACLE_HOME=$OH -DJDBC_PROTOCOL=thin -mx64m -DTRACING.ENABLED=true -DTRACING.LEVEL=2 -classpath $CLASSPATH oracle.sysman.assistants.dbca.Dbca $ARGUMENTS

      • Redirect standard output to a file e.g. $ dbca > dbca.out &

  12. Oracle Clusterware

  13. Oracle Clusterware Overview • Provides • Node membership services (CSS) • Resource management services (CRS) • Event management services (EVM) • In Oracle 10.1 and above resources include • Node applications • ASM Instances • Databases • Instances • Services • Node applications include: • Virtual IP (VIP) • Listeners • Oracle Notification Service (ONS) • Global Services Daemon (GSD)

  14. Oracle Clusterware Virtual IP (VIP) • Node application introduced in Oracle 10.1 • Allows a Virtual IP address to be defined for each node • All applications connect using Virtual IP addresses • If a node fails, its Virtual IP address is automatically relocated to another node • Failover only applies to newly connecting sessions

  15. Oracle Clusterware VIP (Virtual IP) Node Application • Diagram (before/after failover): before, Node 1 runs VIP1, Listener1 and Instance1 while Node 2 runs VIP2, Listener2 and Instance2; after Node 1 fails, VIP1 is relocated to Node 2, which then hosts both VIP1 and VIP2

  16. Oracle Clusterware VIP (Virtual IP) Node Application • On Linux during normal operation, each node will have one VIP address. For example:

      [root@server3]# ifconfig
      eth0      Link encap:Ethernet  HWaddr 00:11:D8:58:05:99
                inet addr:192.168.2.103  Bcast:192.168.2.255  Mask:255.255.255.0
                inet6 addr: fe80::211:d8ff:fe58:599/64 Scope:Link
                UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                RX packets:6814 errors:0 dropped:0 overruns:0 frame:0
                TX packets:10326 errors:0 dropped:0 overruns:0 carrier:0
                collisions:0 txqueuelen:1000
                RX bytes:684579 (668.5 KiB)  TX bytes:1449071 (1.3 MiB)
                Interrupt:217 Base address:0x8800

      eth0:1    Link encap:Ethernet  HWaddr 00:11:D8:58:05:99
                inet addr:192.168.2.203  Bcast:192.168.2.255  Mask:255.255.255.0
                UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                Interrupt:217 Base address:0x8800

      • The resource for VIP address 192.168.2.203 is initially running on server3

  17. Oracle Clusterware VIP (Virtual IP) Node Application • If Oracle Clusterware on server3 is shut down, the VIP resource is transferred to another node (in this case server11)

      [root@server11]# ifconfig
      eth0      Link encap:Ethernet  HWaddr 00:1D:7D:A3:0A:55
                inet addr:192.168.2.111  Bcast:192.168.2.255  Mask:255.255.255.0
                inet6 addr: fe80::21d:7dff:fea3:a55/64 Scope:Link
                UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                RX packets:2792 errors:0 dropped:0 overruns:0 frame:0
                TX packets:4097 errors:0 dropped:0 overruns:0 carrier:0
                collisions:0 txqueuelen:1000
                RX bytes:329891 (322.1 KiB)  TX bytes:593615 (579.7 KiB)
                Interrupt:177 Base address:0x2000

      eth0:1    Link encap:Ethernet  HWaddr 00:1D:7D:A3:0A:55
                inet addr:192.168.2.211  Bcast:192.168.2.255  Mask:255.255.255.0
                UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                Interrupt:177 Base address:0x2000

      eth0:2    Link encap:Ethernet  HWaddr 00:1D:7D:A3:0A:55
                inet addr:192.168.2.203  Bcast:192.168.2.255  Mask:255.255.255.0
                UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                Interrupt:177 Base address:0x2000

  18. Oracle Clusterware VIP Failover • VIP addresses can occasionally be failed over incorrectly. For example, the VIP for server3 is left running on server11:

      HA Resource          Target       State
      -----------          ------       -----
      ora.server11.vip     application  ONLINE on server11
      ora.server12.vip     application  ONLINE on server12
      ora.server3.vip      application  ONLINE on server11
      ora.server4.vip      application  ONLINE on server4

      • To relocate it back to its home node use:

      [root@server3]# ./crs_relocate ora.server3.vip -c server3
      Attempting to stop `ora.server3.vip` on member `server11`
      Stop of `ora.server3.vip` on member `server11` succeeded.
      Attempting to start `ora.server3.vip` on member `server3`
      Start of `ora.server3.vip` on member `server3` succeeded.

      HA Resource          Target       State
      -----------          ------       -----
      ora.server11.vip     application  ONLINE on server11
      ora.server12.vip     application  ONLINE on server12
      ora.server3.vip      application  ONLINE on server3
      ora.server4.vip      application  ONLINE on server4
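      Resource listings of this form can be obtained with the crs_stat utility, for example (the column layout above is abbreviated; crs_stat -t prints additional Type and Host columns):

      [root@server3]# $ORA_CRS_HOME/bin/crs_stat -t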

  19. Oracle Clusterware Logging • In Oracle 10.2, Oracle Clusterware log files are created in the $CRS_HOME/log directory • can be located on shared storage • $CRS_HOME/log directory contains • subdirectory for each node e.g. $CRS_HOME/log/server6 • $CRS_HOME/log/<node> directory contains: • Oracle Clusterware alert log e.g. alertserver6.log • client - logfiles for OCR applications including CLSCFG, CSS, OCRCHECK, OCRCONFIG, OCRDUMP and OIFCFG • crsd - logfiles for CRS daemon including crsd.log • cssd - logfiles for CSS daemon including ocssd.log • evmd - logfiles for EVM daemon including evmd.log • racg - logfiles for node applications including VIP and ONS
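      For example, to follow the Oracle Clusterware alert log on a node named server6 (the node name here is only illustrative):

      [root@server6]# tail -f $CRS_HOME/log/server6/alertserver6.log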

  20. Oracle Clusterware Log Files • Log file locations in $ORA_CRS_HOME:

      $ORA_CRS_HOME/
          log/
              <nodename>/
                  alert<nodename>.log
                  client/
                  crsd/
                  cssd/
                  evmd/
                  racg/
                      racgeut/
                      racgimon/
                      racgmain/

  21. Oracle Clusterware Log Files • Log file locations in $ORACLE_HOME (RDBMS and ASM):

      $ORACLE_HOME/
          log/
              <nodename>/
                  client/
                  racg/
                      racgeut/
                      racgimon/
                      racgmain/
                      racgmdb/

  22. Oracle Clusterware Troubleshooting • If the OCR or voting disk is not available, error files may be created in /tmp e.g. /tmp/crsctl.4038 • For example, if the OCR cannot be found:

      OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage
      Operating System error [No such file or directory] [2]

      • In this case the OCR is inaccessible, no CRS daemons will start, and no errors are written to the log files • If the voting disk has incorrect ownership:

      clsscfg_vhinit: unable(1) to open disk (/dev/raw/raw2)
      Internal Error Information:
        Category: 1234
        Operation: scls_block_open
        Location: statfs
        Other: statfs failed /dev/raw/raw2
        Dep: 2
      Failure 1 checking the Cluster Synchronization Services voting disk '/dev/raw/raw2'.
      Not able to read adequate number of voting disks
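      Where the voting disk error above is caused by incorrect ownership, checking and correcting the permissions on the raw device is usually the first step (a sketch; the expected owner depends on how the software was installed, oracle:oinstall is assumed here):

      [root@server3]# ls -l /dev/raw/raw2
      [root@server3]# chown oracle:oinstall /dev/raw/raw2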

  23. Oracle Clusterware racgwrap • Script called on each node by SRVCTL to control resources • Copy of script in each Oracle home • $ORA_CRS_HOME/bin/racgwrap • $ORA_ASM_HOME/bin/racgwrap • $ORACLE_HOME/bin/racgwrap • Sets environment variables • Invokes the racgmain executable • Generated from racgwrap.sbs • Differs in each home • Sets $ORACLE_HOME and $ORACLE_BASE environment variables for racgmain • Also sets $LD_LIBRARY_PATH • Enable trace by setting _USR_ORA_DEBUG to 1 (see the example below)
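      For example, to trace SRVCTL resource operations in the RDBMS home (a sketch; the variable is set inside the script, so take a backup copy before editing):

      $ cp $ORACLE_HOME/bin/racgwrap $ORACLE_HOME/bin/racgwrap.orig
      $ vi $ORACLE_HOME/bin/racgwrap     # change the _USR_ORA_DEBUG setting from 0 to 1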

  24. Oracle Clusterware racgwrap • In Unix systems the Oracle SGA is located in one or more operating system shared memory segments • Each segment is identified by a shared memory key • Shared memory key is generated by the application • Each shared memory key maps to a shared memory ID • Shared memory ID is generated by the operating system • Shared memory segments can be displayed using ipcs -m:

      [root@server3]# ipcs -m
      ------ Shared Memory Segments --------
      key         shmid    owner    perms    bytes         nattch    status
      0x8a48ff44  131072   oracle   640      94371840      20
      0x17d04568  163841   oracle   660      2099249152    246

      • Oracle generates the shared memory key from the values of • $ORACLE_HOME • $ORACLE_SID

  25. Oracle Clusterware racgwrap • If the instance is currently running e.g.

      [oracle@server3]$ ps -ef | grep pmon_PROD1
      oracle    8653     1  0 16:13 ?        00:00:00 ora_pmon_PROD1

      • But SQL*Plus cannot connect to the instance:

      [oracle@server3]$ export ORACLE_SID=PROD1
      [oracle@server3]$ sqlplus / as sysdba
      ...
      Connected to idle instance

      • Compare the $ORACLE_HOME environment variable to the ORACLE_HOME variable in $ORACLE_HOME/bin/racgwrap:

      [oracle@server3]$ echo $ORACLE_HOME
      /u01/app/oracle/product/10.2.0/db_1

      [oracle@server3]$ grep "^ORACLE_HOME" $ORACLE_HOME/bin/racgwrap
      ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1/
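      The trailing slash in the racgwrap value means the two homes generate different shared memory keys, so SQL*Plus using the unslashed $ORACLE_HOME cannot attach to the instance started by SRVCTL. One way to resolve this (an assumption based on the comparison above, not the only possible fix) is to make the two values identical and restart the instance through SRVCTL; the database name PROD is only an example:

      [oracle@server3]$ vi $ORACLE_HOME/bin/racgwrap      # remove the trailing "/" from the ORACLE_HOME= line
      [oracle@server3]$ srvctl stop instance -d PROD -i PROD1
      [oracle@server3]$ srvctl start instance -d PROD -i PROD1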

  26. Oracle Clusterware Process Monitor (OPROCD) • Process Monitor Daemon • Provides Cluster I/O Fencing • Implemented on Unix systems • Not required with third-party clusterware • Implemented in Linux in 10.2.0.4 and above • In 10.2.0.3 and below the hangcheck timer module is used • Provides hangcheck timer functionality to maintain cluster integrity • Behaviour similar to hangcheck timer • Runs as root • Locked in memory • Failure causes reboot of system • See /etc/init.d/init.cssd for operating system reboot commands
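      To confirm whether OPROCD is active on a node, and with which parameters it was started, check the process list (a simple sketch; the exact process name and arguments vary by release):

      [root@server3]# ps -ef | grep oprocd | grep -v grep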

  27. Oracle Clusterware Process Monitor (OPROCD) • OPROCD takes two parameters • -t - Timeout value • Length of time between executions (milliseconds) • Normally defaults to 1000 • -m - Margin • Acceptable margin before rebooting (milliseconds) • Normally defaults to 500 • Parameters are specified in /etc/init.d/init.cssd • OPROCD_DEFAULT_TIMEOUT=1000 • OPROCD_DEFAULT_MARGIN=500 • Contact Oracle Support before changing these values

  28. Oracle Clusterware Process Monitor (OPROCD) • /etc/init.d/init.cssd can increase OPROCD_DEFAULT_MARGIN based on two CSS variables • reboottime (mandatory) • diagwait (optional) • The values for these can be obtained using:

      [root@server3]# crsctl get css reboottime
      [root@server3]# crsctl get css diagwait

      • Both values are reported in seconds • The algorithm is: if diagwait > reboottime then OPROCD_DEFAULT_MARGIN := (diagwait - reboottime) * 1000 (sketched below) • Therefore increasing diagwait will reduce the frequency of reboots e.g.

      [root@server3]# crsctl set css diagwait 13
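      The calculation performed by init.cssd can be sketched in shell as follows (illustrative only; the real script also handles unset values, and crsctl may print additional text around the numbers):

      REBOOTTIME=$(crsctl get css reboottime)     # e.g. 3 (seconds)
      DIAGWAIT=$(crsctl get css diagwait)         # e.g. 13 (seconds)
      if [ "$DIAGWAIT" -gt "$REBOOTTIME" ]; then
          OPROCD_DEFAULT_MARGIN=$(( (DIAGWAIT - REBOOTTIME) * 1000 ))   # milliseconds
      fi
      echo $OPROCD_DEFAULT_MARGIN                 # 10000 with the example values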

  29. Oracle Clusterware Heartbeats • CSS maintains two heartbeats • Network heartbeat across interconnect • Disk heartbeat to voting device • Disk heartbeat has an internal I/O timeout (in seconds) • Varies between releases • In Oracle 10.2.0.2 and above the disk heartbeat timeout can be specified by the CSS disktimeout parameter • Maximum time allowed for a voting file I/O to complete • If exceeded, the file is marked offline • Defaults to 200 seconds

      crsctl get css disktimeout
      crsctl set css disktimeout <value>

  30. Oracle Clusterware Heartbeats • Network heartbeat timeout can be specified by the CSS misscount parameter • Default values (Oracle Clusterware 10.1 and 10.2) differ by platform • Default value when running with vendor clusterware is 600 seconds

      crsctl get css misscount
      crsctl set css misscount <value>

  31. Oracle Clusterware Heartbeats • Relationship between internal I/O timeout (IOT), MISSCOUNT and DISKTIMEOUT varies between releases

  32. Oracle Clusterware Heartbeats • Where DISKTIMEOUT is supported, CSS will not evict a node from the cluster when I/O to the voting disk takes more than MISSCOUNT seconds, except • during initial cluster formation • slightly before reconfiguration • Nodes will not be evicted as long as voting disk operations are completed within DISKTIMEOUT seconds

  33. Oracle Clusterware CRSCTL • CRSCTL can also be used to enable and disable Oracle Clusterware • To enable Clusterware use: # crsctl enable crs • To disable Clusterware use: # crsctl disable crs • These commands update the following file: • /etc/oracle/scls_scr/<node>/root/crsstart
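      The effect of these commands can be verified by inspecting the file directly; it simply contains the string enable or disable (the node name london1 below is only an example):

      # cat /etc/oracle/scls_scr/london1/root/crsstart
      enable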

  34. Oracle Clusterware CRSCTL • In Oracle 10.2, CRSCTL can be used to check the current state of Oracle Clusterware daemons • To check the current state of all Oracle Clusterware daemons:

      # crsctl check crs
      CSS appears healthy
      CRS appears healthy
      EVM appears healthy

      • To check the current state of individual Oracle Clusterware daemons:

      # crsctl check cssd
      CSS appears healthy

      # crsctl check crsd
      CRS appears healthy

      # crsctl check evmd
      EVM appears healthy

  35. Oracle Clusterware CRSCTL • CRSCTL can be used to manage the CSS voting disk • To check the current location of the voting disk use:

      # crsctl query css votedisk
       0.     0    /dev/raw/raw3
       1.     0    /dev/raw/raw4
       2.     0    /dev/raw/raw5

      • To add a new voting disk use:

      # crsctl add css votedisk <path_name>

      • To delete an existing voting disk use:

      # crsctl delete css votedisk <path_name>
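      In Oracle 10.2, voting disk changes are usually made with Oracle Clusterware stopped on all nodes and the -force option appended (an assumption based on common practice; check the documentation for your release, and note that /dev/raw/raw6 is only an example path):

      # crsctl stop crs                                 (run on all nodes)
      # crsctl add css votedisk /dev/raw/raw6 -force
      # crsctl start crs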

  36. Oracle Clusterware Debugging • In Oracle 10.2 and above • Oracle Clusterware debugging can be enabled and disabled for • CRS • CSS • EVM • Resources • Subcomponents • Debugging can be controlled • statically using environment variables • dynamically using CRSCTL • Debug settings can be persisted in OCR for use in subsequent restarts

  37. Oracle Clusterware Debugging • To list modules available for debugging use:

      # crsctl lsmodules crs
      # crsctl lsmodules css
      # crsctl lsmodules evm

      • In Oracle 11.1 modules include:

  38. Oracle Clusterware Debugging • To debug individual modules use:

      # crsctl debug log crs <module>:<level>[,<module>:<level>]

      • For example:

      # crsctl debug log crs "CRSCOMM:2,COMMCRS:2,COMMNS:2"
      Set CRSD Debug Module: CRSCOMM  Level: 2
      Set CRSD Debug Module: COMMCRS  Level: 2
      Set CRSD Debug Module: COMMNS  Level: 2

      • Values only apply to the current node • Stored within OCR in SYSTEM.crs.debug.<node>.<module> • For example:

      # ocrdump -stdout -keyname SYSTEM.crs.debug.vm1.CRSCOMM

      • Log will be written to: • $ORA_CRS_HOME/log/<node>/crsd/crsd.log

  39. Oracle Clusterware Debugging • To debug an individual resource use:

      # crsctl debug log res <resource>:<level>

      • For example:

      # crsctl debug log res ora.vm1.vip:5
      Set Resource Debug Module: ora.vm1.vip  Level: 5

      • To disable debugging again set level 0 e.g.:

      # crsctl debug log res ora.vm1.vip:0
      Set Resource Debug Module: ora.vm1.vip  Level: 0

      • OCR debug value is stored in USR_ORA_DEBUG • To check the current debug value set in OCR for ora.vm1.vip use:

      # ocrdump -stdout -keyname CRS.CUR.ora\!vm1\!vip.USR_ORA_DEBUG

      • Log will be written to • $ORA_CRS_HOME/log/<node>/racg/<resource>.log

  40. Oracle Clusterware Debugging • Debugging for CRSD and EVMD can also be configured using environment variables • To enable tracing for all modules use ORA_CRSDEBUG_ALL • For example: # export ORA_CRSDEBUG_ALL=5 • To enable tracing for individual modules use ORA_CRSDEBUG_<module> • For example: # export ORA_CRSDEBUG_CRSOCR=5 • Note that these environment variables have not been implemented in OCSSD or OPROCD

  41. Oracle Clusterware Debugging • In Oracle 10.1 and above debugging can also be configured in • $ORA_CRS_HOME/srvm/admin/ocrlog.ini • By default this file contains:

      # "mesg_logging_level" is the only supported parameter currently.
      # level 0 means minimum logging. Only error conditions are logged
      mesg_logging_level = 0

      # The last appearance of a parameter will override the previous value.
      # For example, log level will become 3 when the following value is uncommented.
      # Change to log level 3 for detailed logging from Oracle Cluster Registry
      # mesg_logging_level = 3

      # Component log and trace level specification template
      #comploglvl="comp1:3;comp2:4"
      #comptrclvl="comp1:2;comp2:1"

      • Component level logging can be configured in this file e.g.: comploglvl="OCRAPI:5;OCRCLI:5;OCRSRV:5;OCRMAS:5;OCRCAC:5"

  42. Oracle Clusterware Debugging • Component level logging can also be configured in the OCR • For example: crsctl debug log crs OCRAPI:5;OCRCLI:5;OCRSRV:5;OCRMAS:5;OCRCAC:5 • Components include: • OCRAPI - OCR Abstraction Component • OCRCAC - OCR Cache Component • OCRCLI - OCR Client Component • OCRMAS - OCR Master Thread Component • OCRMSG - OCR Message Component • OCRSRV - OCR Server Component • OCRUTL - OCR Util Component

  43. Oracle Clusterware Debugging • CRSCTL can also generate state dumps:

      crsctl debug statedump crs
      crsctl debug statedump css
      crsctl debug statedump evm

      • CSS dump is written to • $ORA_CRS_HOME/log/<node>/cssd/ocssd.log • Dump contents can be made more readable e.g.: cut -c58- < ocssd.log > ocssd.dmp

  44. Oracle Clusterware OLSNODES • The olsnodes utility lists the nodes in the cluster • With no arguments olsnodes lists the node names e.g.

      $ olsnodes
      london1
      london2

      • In Oracle 10.2 and above, the -p argument lists node names and private interconnect names:

      $ olsnodes -p
      london1 london1-priv
      london2 london2-priv

      • In Oracle 10.2 and above, the -i argument lists node names and VIP addresses:

      $ olsnodes -i
      london1 london1-vip
      london2 london2-vip

  45. Oracle Clusterware OCRCONFIG • In Oracle 10.1 and above the OCRCONFIG utility performs various administrative operations on the OCR including: • displaying backup history • configuring backup location • restoring OCR from backup • exporting OCR • importing OCR • upgrading OCR • downgrading OCR • In Oracle 10.2 and above OCRCONFIG can also • manage OCR mirrors • overwrite OCR files • repair OCR files
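      For example, a logical copy of the OCR can be taken and later restored with the export and import options (a sketch; the export file name is arbitrary and, for an import, Oracle Clusterware should be stopped first):

      # ocrconfig -export /tmp/ocr_export.dmp
      # ocrconfig -import /tmp/ocr_export.dmp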

  46. Oracle Clusterware OCRCONFIG • Options include:

  47. Oracle Clusterware OCRCONFIG • In Oracle 10.1 and above • OCR is automatically backed up every four hours • Previous three backup copies are retained • Backup copy retained from end of previous day • Backup copy retained from end of previous week • Check node, times and location of previous backups using the showbackup option of OCRCONFIG e.g.

      # ocrconfig -showbackup
      london1  2005/08/04 11:15:29  /u01/app/oracle/product/10.2.0/crs/cdata/crs
      london1  2005/08/03 22:24:32  /u01/app/oracle/product/10.2.0/crs/cdata/crs
      london1  2005/08/03 18:24:32  /u01/app/oracle/product/10.2.0/crs/cdata/crs
      london1  2005/08/02 18:24:32  /u01/app/oracle/product/10.2.0/crs/cdata/crs
      london1  2005/07/31 18:24:32  /u01/app/oracle/product/10.2.0/crs/cdata/crs

      • ENSURE THAT YOU COPY THE PHYSICAL BACKUPS TO TAPE AND/OR REDUNDANT STORAGE
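      One simple way to follow the warning above is to copy the automatically generated backup files to a second location after checking the backup history (the destination directory below is only illustrative):

      # cp $ORA_CRS_HOME/cdata/crs/*.ocr /u02/ocr_backup/

      Alternatively, the backup location itself can be moved to resilient storage with ocrconfig -backuploc, described on the next slide.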

  48. Oracle Clusterware OCRCONFIG • In Oracle 11.1 and above OCR can be backed up manually using: # ocrconfig -manualbackup • Backups will be written to the location specified by: # ocrconfig -backuploc <directory_name> • Manual backups can be listed using: # ocrconfig -showbackup manual • Automatic backups can be listed using: # ocrconfig -showbackup auto

  49. Oracle Clusterware OCRCONFIG • To restore the OCR from a physical backup copy • Check you have a suitable backup using: # ocrconfig -showbackup • Stop Oracle Clusterware on each node using: # crsctl stop crs • Restore the backup file using: # ocrconfig -restore <filename> • For example: # ocrconfig -restore $ORA_CRS_HOME/cdata/crs/backup00.ocr • Start Oracle Clusterware on each node using: # crsctl start crs

  50. Oracle Clusterware OCRCHECK • In Oracle 10.1 and above, you can verify the configuration of the OCR using the OCRCHECK utility:

      # ocrcheck
      Status of Oracle Cluster Registry is as follows :
               Version                  :          2
               Total space (kbytes)     :     262144
               Used space (kbytes)      :       7752
               Available space (kbytes) :     254392
               ID                       : 1093363319
               Device/File Name         : /dev/raw/raw1
                                          Device/File integrity check succeeded
               Device/File Name         : /dev/raw/raw2
                                          Device/File integrity check succeeded
               Cluster registry integrity check succeeded

      • In Oracle 10.1 this utility does not print the ID and Device/File Name information
