Golden Gate architecture: GoldenGate Bounded Recovery

From Robs_Wiki

Extract Recovery concept

Before we talk about the Bounded Recovery concept (which is part of the Extract recovery concept), let's first look at how Extract recovery works in general.
When Extract encounters the start of a transaction in the redo log, it starts caching all of the data belonging to that transaction in memory.

  • When Extract encounters a commit record for the transaction, it writes the entire cached transaction to the trail file and clears it from memory.
  • When Extract encounters a rollback record for the transaction, it discards the entire transaction from memory.
  • Extract keeps the transaction in memory until it encounters either the commit or the rollback record.
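The caching behaviour above can be sketched as a toy Python model. This is not GoldenGate code: the class `TransactionCache`, its methods, and the plain list that stands in for the trail file are hypothetical, chosen only to show what happens to a cached transaction on commit and on rollback.

```python
from collections import defaultdict


class TransactionCache:
    """Toy model of Extract's in-memory transaction cache (hypothetical, not GoldenGate code)."""

    def __init__(self):
        self.open_txns = defaultdict(list)  # txn id -> cached change records
        self.trail = []                     # stands in for the trail file

    def on_change(self, txn_id, record):
        # The first record of a transaction opens its cache entry; later records append to it.
        self.open_txns[txn_id].append(record)

    def on_commit(self, txn_id):
        # Write the whole cached transaction to the trail, then free the memory.
        self.trail.extend(self.open_txns.pop(txn_id, []))

    def on_rollback(self, txn_id):
        # Discard the cached transaction; nothing reaches the trail.
        self.open_txns.pop(txn_id, None)


cache = TransactionCache()
cache.on_change("T1", "insert A")
cache.on_change("T2", "update B")
cache.on_change("T1", "delete C")
cache.on_change("T3", "insert D")
cache.on_commit("T1")
cache.on_rollback("T2")
print(cache.trail)            # ['insert A', 'delete C']
print(dict(cache.open_txns))  # {'T3': ['insert D']}
```

Whatever is still in `open_txns` when Extract stops (T3 here) exists only in memory, which is why it has to be rebuilt during recovery.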

Now, if Extract stops (intentionally or unintentionally), all of the cached information must be recovered when Extract starts again. This applies to all transactions that were open at the time Extract stopped. Extract performs the recovery as follows:

  • If there were no open transactions when Extract stopped, the recovery begins at the current Extract read checkpoint. This is a normal recovery.
  • If there were open transactions whose start points in the log were very close in time to the time when Extract stopped, Extract begins recovery by re-reading the archived and online redo log files from the beginning of the oldest open transaction. This is also considered a normal recovery.
  • If there were any long-running transactions, Extract begins a Bounded Recovery. A transaction is considered long-running when it is older than one Bounded Recovery interval (4 hours by default) at the next Bounded Recovery (BR) checkpoint.
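The three cases above can be expressed as a small decision function. This is a hypothetical sketch, not Extract's actual logic: `plan_recovery`, its arguments, and the use of plain integers as redo positions are assumptions, and a transaction is treated as long-running when the last BR checkpoint persisted it.

```python
def plan_recovery(open_txns, persisted_ids, read_checkpoint):
    """Pick a recovery mode after Extract restarts (hypothetical toy model).

    open_txns:       txn id -> redo position where the transaction started
    persisted_ids:   ids written to disk by the last BR checkpoint (long-running ones)
    read_checkpoint: redo position of Extract's current read checkpoint
    """
    if not open_txns:
        # Nothing was in flight: resume at the read checkpoint.
        return ("normal", read_checkpoint)
    if any(tid in persisted_ids for tid in open_txns):
        # Long-running transactions exist: the start point comes from the BR checkpoint.
        return ("bounded", None)
    # Only recent open transactions: re-read the logs from the oldest one's start.
    return ("normal", min(open_txns.values()))


print(plan_recovery({}, set(), 500))                         # ('normal', 500)
print(plan_recovery({"T1": 480, "T2": 495}, set(), 500))     # ('normal', 480)
print(plan_recovery({"T1": 100, "T2": 495}, {"T1"}, 500))    # ('bounded', None)
```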

Bounded Recovery concept

At each Bounded Recovery interval (4 hours by default), Extract makes a Bounded Recovery checkpoint, which persists the current state and data of Extract to disk, including the state and data of any long-running transactions. As stated before, a transaction is considered long-running when it is older than one BR interval at the next BR checkpoint. During a recovery, Extract restores its state from the last BR checkpoint instead of having to re-read all the archived logs for the long-running transactions. The interval can be changed with the BRINTERVAL option of the BR parameter, but Oracle states that the default is sufficient and should only be changed under the guidance of Oracle Support.
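To make the checkpoint rule concrete, the sketch below lists which open transactions each BR checkpoint would persist. It assumes, hypothetically, that Extract starts at hour 0 and takes a BR checkpoint every 4 hours; the workload is invented and `persisted_at_checkpoint` is not a GoldenGate function.

```python
BR_INTERVAL = 4  # hours; the default BRINTERVAL


def persisted_at_checkpoint(checkpoint_hour, open_txns, interval=BR_INTERVAL):
    """Ids of transactions a BR checkpoint would write to disk (hypothetical toy model)."""
    return sorted(
        tid
        for tid, (start, end) in open_txns.items()
        if start < checkpoint_hour < end          # still open at the checkpoint
        and checkpoint_hour - start > interval    # older than one BR interval
    )


# Hypothetical workload: txn id -> (start hour, commit/rollback hour)
txns = {"T1": (1, 14), "T2": (7, 13), "T3": (9, 10)}
for checkpoint in range(BR_INTERVAL, 16, BR_INTERVAL):
    print(checkpoint, persisted_at_checkpoint(checkpoint, txns))
# 4 []
# 8 ['T1']
# 12 ['T1', 'T2']
```

T1 is only 3 hours old at the first checkpoint, so it is not persisted until the second one; T3 never reaches the threshold before it ends.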

Oracle states that the need to persist long-running transactions is rare and that, in most cases, the BR checkpoint contains no long-running transactions. The WARNLONGTRANS parameter specifies how long a transaction can be open before Extract writes a warning to the ggserr.log file that there is a long-running transaction. This can be used to monitor the occurrence of long-running transactions.
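A WARNLONGTRANS-style check boils down to comparing each open transaction's age against a threshold. The sketch below illustrates only that comparison; it does not reproduce how Extract tracks transactions or what it actually writes to ggserr.log, and the function name, one-hour default and message text are hypothetical.

```python
from datetime import datetime, timedelta


def long_running(open_txns, now, threshold=timedelta(hours=1)):
    """Ids of transactions open longer than `threshold`, oldest first (hypothetical)."""
    too_old = [(start, tid) for tid, start in open_txns.items() if now - start > threshold]
    return [tid for start, tid in sorted(too_old)]


# Hypothetical open transactions: txn id -> start time
now = datetime(2024, 1, 1, 12, 0)
open_txns = {
    "T1": datetime(2024, 1, 1, 6, 0),    # open for 6 hours
    "T2": datetime(2024, 1, 1, 11, 30),  # open for 30 minutes
    "T3": datetime(2024, 1, 1, 9, 0),    # open for 3 hours
}
for tid in long_running(open_txns, now):
    print(f"WARNING: transaction {tid} is long-running")  # hypothetical message text
```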

How many archives should be kept on disk?

So Extract restores the Bounded Recovery checkpoint during a recovery and picks up from there. But how many archived logs should be kept on disk? Remember that a transaction is considered long-running only when it is older than 4 hours at the next BR checkpoint. In the worst case, a transaction is 3 hours and 59 minutes old at a BR checkpoint, so it is not included in that checkpoint. Only at the following checkpoint, 4 hours later, will it be older than 4 hours and be persisted. But what if the Extract process crashes just before that second BR checkpoint? The last BR checkpoint does not contain the transaction, so Extract has to re-read the logs from the start of that transaction, which is almost 8 hours (two BR intervals) back. This is why Oracle advises keeping 8 hours of archived logs on disk.
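The 8-hour figure can be checked with a brute-force sketch under a simplified, hypothetical model: BR checkpoints at every multiple of 4 hours since Extract started, a single open transaction, and a recovery that reads from the last BR checkpoint if that checkpoint persisted the transaction, otherwise from the earlier of the checkpoint and the transaction's start. Times are whole minutes to avoid floating-point noise, and `minutes_of_redo_needed` is an illustrative name.

```python
INTERVAL_MIN = 4 * 60  # default BRINTERVAL (4 hours), in minutes


def minutes_of_redo_needed(txn_start: int, crash: int) -> int:
    """Minutes of redo to re-read after a crash, for one open transaction (toy model)."""
    # Last BR checkpoint at or before the crash (0 means Extract's own start).
    checkpoint = (crash // INTERVAL_MIN) * INTERVAL_MIN
    # Persisted only if the transaction was older than one interval at that checkpoint.
    persisted = checkpoint - txn_start > INTERVAL_MIN
    # Persisted: restore from the checkpoint. Otherwise re-read from the
    # transaction start (or from the checkpoint if the transaction began later).
    read_from = checkpoint if persisted else min(checkpoint, txn_start)
    return crash - read_from


worst = max(
    minutes_of_redo_needed(start, crash)
    for start in range(INTERVAL_MIN)                     # transaction start minute
    for crash in range(start, start + 3 * INTERVAL_MIN)  # crash while it is still open
)
hours, minutes = divmod(worst, 60)
print(f"worst case: {hours}h{minutes:02d}m of redo")  # worst case: 7h59m of redo
```

In this model the worst case is a transaction that starts right at one checkpoint and a crash one minute before the checkpoint after next, giving just under two BR intervals of redo.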