Oracle Cache Fusion: Cache Fusion Concepts, Data Block Shipping, and Recovery with Cache Fusion
Objectives At the end of this module the student will understand the following tasks and concepts. • Cache fusion concepts • Block transfers using cache fusion • Cache Fusion and Recovery
Overview • Synchronization • Past Images • Cache Fusion Scenarios • Recovery Methodology and steps • Recovery Process Scenarios
Synchronization of Concurrent Tasks • RAC synchronization through Cache Fusion: • Maintains cluster-wide concurrency of resources • Ensures integrity of shared data • Data blocks and enqueues are synchronized when nodes within a cluster: • Acquire block or enqueue ownership • Release block or enqueue ownership
Minimize Synchronization • Key is to optimally divide tasks among nodes, so that very little synchronization is necessary • Minimize inter-node communication, because block access in local cache is cheapest: • Block access in local cache ~ 0.01 msec • Block access in remote cache ~ 2.5 msec • Block access on disk ~ 14 msec+
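The latency figures above can be turned into a quick back-of-the-envelope estimate. The following is a minimal sketch, assuming a workload described only by the fraction of block accesses served locally, remotely, and from disk; the function name and the two example workload mixes are hypothetical and not from the slides.

```python
# Approximate per-access latencies quoted on the slide, in milliseconds.
LOCAL_MS = 0.01    # block found in the local buffer cache
REMOTE_MS = 2.5    # block shipped from a remote buffer cache
DISK_MS = 14.0     # block read from disk (the slide gives this as a lower bound)


def expected_access_ms(local_frac, remote_frac, disk_frac):
    """Return the weighted average block access time for a given access mix."""
    total = local_frac + remote_frac + disk_frac
    if abs(total - 1.0) > 1e-9:
        raise ValueError("access fractions must sum to 1")
    return (local_frac * LOCAL_MS
            + remote_frac * REMOTE_MS
            + disk_frac * DISK_MS)


# Hypothetical mixes: same disk traffic, different amounts of inter-node traffic.
well_partitioned = expected_access_ms(0.95, 0.03, 0.02)
poorly_partitioned = expected_access_ms(0.60, 0.38, 0.02)

print(f"well partitioned:   {well_partitioned:.4f} ms per access")
print(f"poorly partitioned: {poorly_partitioned:.4f} ms per access")
```

With these assumed mixes, moving a third of the accesses from the local cache to a remote cache more than triples the average access time, which is the motivation for dividing tasks so that little synchronization is needed.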
Past Images • Past Images (PIs) were introduced in Oracle 9i RAC to maintain data integrity • A dirty data block is not written to disk immediately • Another instance may request the same block for read or write • An image of the block is created at the owning instance and shipped to the requesting instance • The backup image kept at the owning instance is a Past Image, and is kept in memory for local consistency
“Juggling” Data with Multiple Past Images • Multiple Past Image versions of a data block may be kept by different instances • Upon a checkpoint, only the current image is written to disk; Past Images are discarded • In the event of a failure, current version of block can be reconstructed from PIs • Since PIs are kept in memory, they aid in avoiding frequent disk writes • This avoids “disk pinging” experienced with 8i OPS due to frequent writes to disk • Data is “juggled” in memory, without touching down on the disk
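The "juggling" described above can be illustrated with a toy model. This is a minimal sketch, assuming block contents can be represented by a simple version counter; the `Cluster` and `Instance` classes and their methods are hypothetical illustrations, not Oracle structures or APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    current: dict = field(default_factory=dict)      # block id -> current version
    past_images: dict = field(default_factory=dict)  # block id -> list of PI versions


class Cluster:
    def __init__(self, names):
        self.instances = {n: Instance(n) for n in names}
        self.disk = {}        # block id -> version on disk
        self.disk_writes = 0  # counts physical writes

    def modify(self, inst_name, block_id):
        """Modify a block; a dirty copy held elsewhere is shipped, leaving a PI behind."""
        target = self.instances[inst_name]
        for other in self.instances.values():
            if other is not target and block_id in other.current:
                version = other.current.pop(block_id)
                other.past_images.setdefault(block_id, []).append(version)
                target.current[block_id] = version  # shipped in memory, no disk write
        base = target.current.get(block_id, self.disk.get(block_id, 0))
        target.current[block_id] = base + 1

    def checkpoint(self, block_id):
        """Write only the current image to disk and discard all Past Images."""
        for inst in self.instances.values():
            if block_id in inst.current:
                self.disk[block_id] = inst.current[block_id]
                self.disk_writes += 1
            inst.past_images.pop(block_id, None)


cluster = Cluster(["A", "B", "C"])
cluster.modify("A", 42)   # A holds version 1
cluster.modify("B", 42)   # A keeps PI of version 1, B holds version 2
cluster.modify("C", 42)   # B keeps PI of version 2, C holds version 3
writes_before_checkpoint = cluster.disk_writes
cluster.checkpoint(42)
print(writes_before_checkpoint, cluster.disk_writes, cluster.disk)
```

In this model the block changes hands three times without a single disk write, and the checkpoint writes only the current image while every Past Image is dropped.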
Cache Fusion Scenario 1: Read/Read Cache Fusion – GCS Processing • Participants: Global Cache Service (GCS), Instance 1 and Instance 2, each holding the block in its SGA buffer cache • Steps: (1) Request, (2) Forward, (3) Ship, (4) Inform GCS • GCS resource states: 1: (Shared, Local), 2: (Shared, Local) • Instance 1 locks: (None) → (Shared, Local) • Instance 2 locks: (Shared, Local) • Based on “Oracle 9i RAC Oracle Real Application Clusters Configuration and Internals,” Mike Ault, Madhu Tumma
Cache Fusion Scenario 2: Write/Write Cache Fusion – GCS Processing • Participants: Global Cache Service (GCS), Instance 1 and Instance 2, each holding the block in its SGA buffer cache • Steps: (1) Request, (2) Forward, (3) Ship, (4) Inform GCS • GCS resource states: 1: (Exclusive, Global), 2: (Null, Global, Past Image) • Instance 1 locks: (None) → (Exclusive, Global) • Instance 2 locks: (Exclusive, Local) → (Null, Global, Past Image) • Based on “Oracle 9i RAC Oracle Real Application Clusters Configuration and Internals,” Mike Ault, Madhu Tumma
Cache Fusion Scenario 3: Write Blocks to Disk – GCS Processing • Participants: Global Cache Service (GCS), Instance 1 and Instance 2, each holding the block in its SGA buffer cache • Steps: (1) Write Request, (2) Forward, (3) Write, (4) Notify and inform GCS of the write, (5) Flush PI • GCS resource states: 1: (Exclusive, Local), 2: (None) • Instance 1 locks: (Exclusive, Global) → (Exclusive, Local) • Instance 2 locks: (Null, Global, Past Image) → (None) • Based on “Oracle 9i RAC Oracle Real Application Clusters Configuration and Internals,” Mike Ault, Madhu Tumma
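The lock transitions in the three scenarios above can be sketched as a small state model. This is an illustration under stated assumptions: the `Lock` tuple and the three functions are hypothetical names, the model tracks a single block, and it assumes Instance 1 is the requester in Scenarios 1 and 2, which is how the before/after lock states in the diagrams read.

```python
from typing import Dict, NamedTuple


class Lock(NamedTuple):
    mode: str                 # "Shared", "Exclusive", or "Null"
    role: str                 # "Local" or "Global"
    past_image: bool = False  # True when the instance keeps a Past Image


Locks = Dict[str, Lock]       # instance name -> lock; a missing key means no lock


def read_read(locks: Locks, requester: str, holder: str) -> Locks:
    """Scenario 1: the holder ships a copy; both end up (Shared, Local)."""
    if locks.get(holder) != Lock("Shared", "Local"):
        raise ValueError("holder must hold (Shared, Local)")
    updated = dict(locks)
    updated[requester] = Lock("Shared", "Local")
    return updated


def write_write(locks: Locks, requester: str, holder: str) -> Locks:
    """Scenario 2: the holder ships the dirty block and keeps a Past Image."""
    held = locks.get(holder)
    if held is None or held.mode != "Exclusive":
        raise ValueError("holder must hold an Exclusive lock")
    updated = dict(locks)
    updated[holder] = Lock("Null", "Global", past_image=True)
    updated[requester] = Lock("Exclusive", "Global")
    return updated


def write_to_disk(locks: Locks) -> Locks:
    """Scenario 3: the current holder writes; Past Images are flushed."""
    updated = {}
    for name, lock in locks.items():
        if lock.past_image:
            continue  # PI flushed, the instance no longer holds a lock
        if lock.mode == "Exclusive":
            updated[name] = Lock("Exclusive", "Local")
        else:
            updated[name] = lock
    return updated


s1 = read_read({"Instance 2": Lock("Shared", "Local")}, "Instance 1", "Instance 2")
s2 = write_write({"Instance 2": Lock("Exclusive", "Local")}, "Instance 1", "Instance 2")
s3 = write_to_disk(s2)
print(s1, s2, s3, sep="\n")
```

Chaining Scenario 2 into Scenario 3 reproduces the end states shown in the diagrams: Instance 1 ends with (Exclusive, Local) and Instance 2 holds no lock once its Past Image is flushed.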
Online Instance Recovery Steps • Instance Failure detected by Cluster Manager and GCS • Reconfiguration of GES resources (enqueues); global resource directory is frozen • Reconfiguration of GCS resources; involves redistribution among surviving instances • One of the surviving instances becomes the “recovering instance”
Online Instance Recovery Steps (cont.) • The SMON process of the recovering instance starts the first-pass read of the failed instance’s redo log thread • SMON finds block written records (BWRs) in the redo and drops the corresponding blocks from the recovery set, because those blocks have already been written to disk • SMON prepares the recovery set (recovery list) of blocks modified by the failed instance but not written to disk • Entries in the recovery list are sorted by first dirty SCN • SMON informs each block’s master node to take ownership of the block for recovery
Online Instance Recovery Steps (cont.) • Second pass of log read begins. • Redo is applied to the data files. • Global Resource Directory is unfrozen
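The two log passes in the steps above can be sketched in a simplified form. This is a minimal sketch, assuming the redo thread is a plain list of change records and block written records (BWRs); the `Redo` tuple and both functions are hypothetical, and real redo formats and SMON behavior are far more involved.

```python
from typing import NamedTuple


class Redo(NamedTuple):
    kind: str   # "change" (block modification) or "bwr" (block written record)
    block: int  # block identifier
    scn: int    # system change number of the record


def first_pass(redo_thread):
    """Build the recovery set: blocks changed but not yet written to disk."""
    first_dirty = {}
    for rec in redo_thread:
        if rec.kind == "change":
            # Remember the SCN of the first change that is not on disk.
            first_dirty.setdefault(rec.block, rec.scn)
        elif rec.kind == "bwr":
            # The block reached disk, so earlier changes need no recovery.
            first_dirty.pop(rec.block, None)
    # Sort entries by first dirty SCN, as in the recovery steps.
    return sorted(first_dirty.items(), key=lambda item: item[1])


def second_pass(redo_thread, recovery_set):
    """Select the redo records that must be applied to the data files."""
    first_dirty = dict(recovery_set)
    return [
        rec for rec in redo_thread
        if rec.kind == "change"
        and rec.block in first_dirty
        and rec.scn >= first_dirty[rec.block]
    ]


thread = [
    Redo("change", 10, 100),
    Redo("change", 20, 105),
    Redo("bwr", 10, 110),      # block 10 written to disk after SCN 100
    Redo("change", 30, 120),
    Redo("change", 10, 130),   # block 10 dirtied again
    Redo("change", 20, 140),
]

recovery_set = first_pass(thread)
to_apply = second_pass(thread, recovery_set)
print(recovery_set)
print(to_apply)
```

In this example the change to block 10 at SCN 100 is skipped because a later BWR shows it was already on disk, while the change at SCN 130 puts the block back in the recovery set.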
RAC Recovery – Solving the Mystery • Recovering from a failed RAC instance is like solving a detective mystery • Part of the evidence is missing: by definition, the failed instance is inaccessible at recovery time • There are clues: in this case, the lock state and mode of the blocks on the surviving nodes • Tip 1: Whichever node has an exclusive lock on a given block has the most recent version of that block • Tip 2: If the mode of surviving blocks is local, any change was made on one node only. If the mode of surviving blocks is global, changes have been made on multiple nodes. • Tip 3: If the block mode is global and a node has a past image, there is a newer version of the block on another node.
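The tips above amount to a small set of inference rules over a surviving node's lock state. The sketch below encodes them directly; the function name and the string values for mode and role are hypothetical choices made for illustration.

```python
def interpret_lock(mode, role, past_image=False):
    """Return the clues a surviving node's lock state gives about a block."""
    clues = []
    if mode == "Exclusive":
        # Exclusive holder has the most recent version of the block.
        clues.append("this node holds the most recent version of the block")
    if role == "Local":
        # Local role: the block was changed on one node only.
        clues.append("changes were made on one node only")
    elif role == "Global":
        # Global role: the block was changed on multiple nodes.
        clues.append("changes were made on multiple nodes")
    if role == "Global" and past_image:
        # Global role plus a past image: a newer version exists elsewhere.
        clues.append("a newer version of the block exists on another node")
    return clues


for state in [("Exclusive", "Local", False), ("Null", "Global", True)]:
    print(state, "->", interpret_lock(*state))
```

A node holding (Null, Global, Past Image) therefore signals that the current version lives on some other node, which may be the failed one.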
Lock Remastering – (Scenario-1) • Instances: Recovering Instance A, Open Instance B, Failed Instance C (X) • Lock held: A = Exclusive, Local; B = None • Current copy of buffer • Lock acquired: A = Exclusive, Local; B = None • Action: Remove from recovery list • Based on “Oracle 9i RAC Oracle Real Application Clusters Configuration and Internals,” Mike Ault, Madhu Tumma
Lock Remastering – (Scenario-2) • Instances: Recovering Instance A, Open Instance B, Failed Instance C (X) • Lock held: A = Exclusive, Global; B = Null, Global, Past Image • Lock acquired: A = Exclusive, Global; B = Null, Global, Past Image • Action: Remove from recovery list; write to disk • Based on “Oracle 9i RAC Oracle Real Application Clusters Configuration and Internals,” Mike Ault, Madhu Tumma
Lock Remastering – (Scenario-3) • Instances: Recovering Instance A, Open Instance B, Failed Instance C (X) • Lock held: A = None; B = Exclusive, Local • Lock acquired: A = None; B = Exclusive, Local • Action: Remove from recovery list; no need to write to disk • Based on “Oracle 9i RAC Oracle Real Application Clusters Configuration and Internals,” Mike Ault, Madhu Tumma
Lock Remastering – (Scenario-4) • Instances: Recovering Instance A, Open Instance B, Failed Instance C (X) • Lock held: A = Null, Global, Past Image; B = Exclusive, Global • Lock acquired: A = Null, Global, Past Image; B = Exclusive, Global • Action: Remove from recovery list; write to disk • Based on “Oracle 9i RAC Oracle Real Application Clusters Configuration and Internals,” Mike Ault, Madhu Tumma
Lock Remastering – (Scenario-5) • Instances: Recovering Instance A, Open Instance B, Failed Instance C (X) • Lock held: A = None; B = Exclusive, Global • Lock acquired: A = Null, Global, Past Image; B = None • Action: Remove from recovery list; write to disk • Based on “Oracle 9i RAC Oracle Real Application Clusters Configuration and Internals,” Mike Ault, Madhu Tumma
Lock Remastering – (Scenario-6) • Instances: Recovering Instance A, Open Instance B, Failed Instance C (X) • Lock held: A = None; B = None; (Exclusive, Local implied) • Lock acquired: A = Exclusive, Local; B = None • Action: Keep in recovery list • Based on “Oracle 9i RAC Oracle Real Application Clusters Configuration and Internals,” Mike Ault, Madhu Tumma
Lock Remastering – (Scenario-7) • Instances: Recovering Instance A, Open Instance B, Failed Instance C (X) • Lock held: A = None; B = Null, Global, Past Image; (Exclusive, Global implied) • Lock acquired: A = Exclusive, Global; B = Null, Global, Past Image • Action: Keep in recovery list; send CR block to A • Based on “Oracle 9i RAC Oracle Real Application Clusters Configuration and Internals,” Mike Ault, Madhu Tumma
Lock Remastering – (Scenario-8) • Instances: Recovering Instance A, Open Instance B, Failed Instance C (X) • Lock held: A = Null, Global, Past Image; B = Null, Global; (Exclusive, Global implied) • Lock acquired: A = Exclusive, Global, Past Image; B = Null, Global • Action: Keep in recovery list; send CR block to A • Based on “Oracle 9i RAC Oracle Real Application Clusters Configuration and Internals,” Mike Ault, Madhu Tumma
Review • Name two occasions on which data blocks and enqueues are synchronized. • True or false: since communication across the private interconnect is fast, you want to maximize the amount of inter-node communication. • True or false: multiple instances may hold Past Images of the same block at the same time. • All cache fusion requests must pass through the ___. • True or false: an instance with an exclusive lock with a global resource role contains the most current version of a block.
Summary • Cache Fusion synchronizes resources cluster-wide such as: • Data blocks • Enqueues • Past Images are used to maintain data block integrity • No immediate disk write required • Avoids “disk pinging” • The GCS plays a key role in performing the necessary block transfers • Including lock mode conversions • Upon recovery, GCS remasters cache resources of a failed node or nodes on one or more recovery nodes