Monday, December 9, 2013

Dataguard and DG Broker

Dataguard

Oracle Data Guard ensures high availability, data protection, and disaster recovery for enterprise data. Data Guard provides a comprehensive set of services that create, maintain, manage, and monitor one or more standby databases to enable production Oracle databases to survive disasters and data corruptions. Data Guard maintains these standby databases as copies of the production database. Then, if the production database becomes unavailable because of a planned or an unplanned outage, Data Guard can switch any standby database to the production role, minimizing the downtime associated with the outage.
You can manage primary and standby databases using the SQL command-line interfaces or the Data Guard broker interfaces.


Primary Database: 

             It is the running production database.

Stand By Database:
            A replica of the primary database. From a single primary database you can create up to thirty standby databases. Once created, Data Guard automatically maintains each standby database by transmitting redo data from the primary database and then applying it to the standby database. Like a primary database, a standby database can be either a single-instance Oracle database or an Oracle RAC database.
The types of standby databases are as follows:
    ·         Physical standby database - provides a physically identical, block-for-block copy of the primary database, with the same database structures, schema and indexes. A physical standby database is kept synchronized with the primary database through Redo Apply.
    ·         Logical standby database - kept synchronized with the primary database through SQL Apply, which transforms the data in the redo received from the primary database into SQL statements and then executes those statements on the standby database.
    ·         Snapshot standby database - like a physical or logical standby database, a snapshot standby database receives and archives redo data from the primary database; unlike them, it does not apply the redo it receives. The redo received by a snapshot standby database is not applied until the snapshot standby is converted back into a physical standby database.
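For illustration, converting between a physical and a snapshot standby is a single command once the broker (covered later in this post) manages the configuration; without the broker the equivalent SQL is ALTER DATABASE CONVERT TO SNAPSHOT STANDBY / CONVERT TO PHYSICAL STANDBY plus the necessary restarts. This is only a sketch: it assumes the standby is named prod1dr, as in the examples later on, and that a fast recovery area is configured on it (the conversion creates a guaranteed restore point behind the scenes).
Convert a physical standby to a snapshot standby and back
DGMGRL> convert database 'prod1dr' to snapshot standby;
# ... test against the read-write snapshot standby ...
DGMGRL> convert database 'prod1dr' to physical standby;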
Dataguard Protection Modes:
    ·         Maximum availability - its first priority is availability, its second priority is zero data loss. If the standby becomes unreachable, the primary waits the number of seconds specified by the NET_TIMEOUT attribute before giving up on the standby and continuing to process transactions. Once the connection is re-established, the primary automatically resynchronizes the standby database.
    ·         Maximum performance - this is the default protection mode. It provides the highest level of data protection possible without affecting the performance of the primary database. This is accomplished by allowing transactions to commit as soon as all redo data generated by those transactions has been written to the online redo log on the primary.
    ·         Maximum protection - this mode ensures that no data loss will occur if the primary database fails. Transactions do not commit until all redo data needed to recover them has been written to the online redo log and to the standby redo log of at least one synchronized standby database.
Set the data protection mode.
Execute the following SQL statement on the primary database:
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE {AVAILABILITY | PERFORMANCE | PROTECTION};
Perform the following query on the primary database to confirm that it is operating in the new protection mode:
SQL> SELECT PROTECTION_MODE FROM V$DATABASE;
Failover
In a failover, a standby database becomes the primary database when the primary database fails.
Switchover
The primary database will switch roles with one of its standby databases. There is no data loss during a switchover. After a switchover, each database continues to participate in the Data Guard configuration with its new role.
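With the broker in place (covered below), both role transitions are single DGMGRL commands; the sketch here assumes the prod1/prod1dr names used later in this post. Without the broker the same transitions are performed with ALTER DATABASE COMMIT TO SWITCHOVER and failover statements in SQL*Plus.
Role transitions via DGMGRL
DGMGRL> switchover to 'prod1dr';
# after an unplanned outage of the primary
DGMGRL> failover to 'prod1dr';
# bring the failed old primary back as a standby (requires flashback database)
DGMGRL> reinstate database 'prod1';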
Dataguard Services

The services required on the primary database are:
  • Log Writer Process (LGWR) - Collects redo information and updates the online redo logs. It can also create local archived redo logs and transmit online redo to standby databases.
  • Archiver Process (ARCn) - One or more archiver processes make copies of online redo logs either locally or remotely for standby databases.
  • Fetch Archive Log (FAL) Server – Pushes archived redo log files to the standby site. Services requests for archive redo logs from FAL clients running on multiple standby databases. Multiple FAL servers can be run on a primary database, one for each FAL request.
The services required on the standby database are:
  • Fetch Archive Log (FAL) Client - Pulls archived redo log files from the primary site. Initiates transfer of archived redo logs when it detects a gap sequence.
  • Remote File Server (RFS) - Receives archived and/or standby redo logs from the primary database.
  • Archiver (ARCn) Processes - Archives the standby redo logs applied by the managed recovery process (MRP).
  • Managed Recovery Process (MRP) - Applies archive redo log information to the standby database.
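A quick way to see these services at work on the standby is the pair of queries below (a sketch; the same v$managed_standby query appears again later in this post once the broker configuration is enabled):
Check redo transport/apply processes and archive gaps on the standby
sql> select process, client_process, thread#, sequence#, status from v$managed_standby;
sql> select thread#, low_sequence#, high_sequence# from v$archive_gap;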
Synchronous transport (SYNC):
Synchronous transport is also referred to as the "zero data loss" method. The sequence of events is:
  1. The user commits a transaction, creating a redo record in the SGA; the LGWR reads the redo record from the log buffer, writes it to the online redo log file and waits for confirmation from the LNS (Log Network Server).
  2. The LNS reads the same redo record from the buffer and transmits it to the standby database using Oracle Net Services; the RFS receives the redo at the standby database and writes it to the standby redo logs.
  3. When the RFS receives a write-complete from the disk, it transmits an acknowledgment back to the LNS process on the primary database, which in turn notifies the LGWR that the transmission is complete; the LGWR then sends a commit acknowledgment to the user.

Asynchronous transport (ASYNC) differs from SYNC in that it eliminates the requirement for the LGWR to wait for an acknowledgment from the LNS. A log file gap occurs whenever the primary database continues to commit transactions while the LNS process has stopped transmitting redo to the standby database (for example because of network issues); the primary simply continues writing to the current log file.
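How the redo is shipped is controlled by the LOG_ARCHIVE_DEST_n parameter on the primary. The sketch below shows what a SYNC and an ASYNC destination might look like, assuming prod1dr is both the TNS alias and the DB_UNIQUE_NAME of the standby; adjust the attributes to your own environment.
Example redo transport settings on the primary
# synchronous - used by maximum availability / maximum protection
sql> alter system set log_archive_dest_2='SERVICE=prod1dr SYNC AFFIRM NET_TIMEOUT=30 VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=prod1dr' scope=both;
# asynchronous - used by maximum performance (the default)
sql> alter system set log_archive_dest_2='SERVICE=prod1dr ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=prod1dr' scope=both;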
Data Guard Broker

            The broker is an integral part of Data Guard and is its management framework; it also integrates with Oracle Enterprise Manager. You can of course use SQL*Plus to manage Data Guard, but the broker, especially when used with Enterprise Manager, becomes a powerful tool offering the following:
  • Configuration
  • Monitoring
  • Alerting
  • Performance Analysis
  • Manual switchovers
  • Automatic failovers (Fast-Start Failover)
The broker can make changes to a Data Guard setup (physical or logical standby databases); the types of change include the configuration itself, transport modes, apply setup, role change services and the protection mode. You can also monitor the health of any configuration. The broker is also responsible for implementing and managing the automatic failover capability known as Fast-Start Failover (FSFO).
The broker has three parts,
  • A set of background processes on each database.
  • A set of configuration files.
  • Command line interface (CLI) called DGMGRL.
Background Processes:

            The broker has a number of processes running; the broker manages these processes automatically and we have no control over them:
  • Data Guard Monitor (DMON) - this is the main process and is responsible for coordinating all broker actions as well as maintaining the configuration files. This process is enabled or disabled with the DG_BROKER_START parameter.
  • Broker Resource Manager (RSM) - responsible for handling any SQL commands used by the broker that need to be executed on one of the databases in the configuration. These SQL commands are issued as a result of a change to the configuration made through DGMGRL, or are the configuration commands executed by the broker during database startup.
  • Data Guard Net Server (NSVn) - from 1 to n of these can exist; they are responsible for making contact with the remote databases and sending across any work items to them.
  • DRCn - these network receiver processes establish the connection from the source database's NSVn process, much like the LogWriter Network Service (LNS) to Remote File Server (RFS) connection used for redo transport. When the broker needs to send something (data or SQL) between databases it uses this NSV-to-DRC connection; these connections are started as needed.
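A simple way to confirm these processes are actually running once the broker has been started (DG_BROKER_START is covered below) - just a sketch:
Check the broker processes
# the DMON background process appears as ora_dmon_<SID> at the OS level
$ ps -ef | grep ora_dmon
sql> show parameter dg_broker_start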
Configuration Files
Each database has two copies of the configuration files, stored in the locations given by the parameters below:
  • DG_BROKER_CONFIG_FILE1
  • DG_BROKER_CONFIG_FILE2
By default they are stored in the $ORACLE_HOME/dbs directory with filenames of the form dr1<db_unique_name>.dat and dr2<db_unique_name>.dat. If you are using RAC they should be kept in a shared location, as only one set of files can exist for the entire RAC.
The primary holds the master copies of the configuration files, and all changes to these files are made by the primary. Even if you make a change while connected to a standby, the change is passed to the primary via the NSV-DRC processes, the DMON process makes the change and it is then propagated back to the standby databases. The reason for the multiple copies of the configuration file is that if the primary has a problem, each standby database still has a copy; when a failover occurs, the standby that becomes the primary takes over as the master of those configuration files.

Command line interface

You have two choices for interacting with Data Guard: Enterprise Manager (EM) or the broker command line interface, DGMGRL. You can swap between the two with a few simple configuration changes. To gain the full functionality of Data Guard through Grid Control you must use the broker.
The broker CLI, DGMGRL, is installed with the Oracle database and client software; you can run it on any platform you wish and it does not need to be on the same platform as the primary or standby databases.
You can access the DGMGRL command line using the below
DGMGRL CLI
[primargydg01] > dgmgrl sys/oracle
[primargydg01] > dgmgrl sys/oracle@PROD1
When you start the DGMGRL command line it does not connect you to a database, only to the Data Guard configuration. Before you start to use the broker there are a number of things to do first:
  • Configure the Broker parameters
  • The broker and the listener
  • RAC and the broker
  • Connecting to the broker
Even if you use the CLI, as mentioned above there are two configuration files; both parameters must be set on the primary and on all standby databases, and you can put these files anywhere you like.
Data Guard configuration files location
# Directory
alter system set DG_BROKER_CONFIG_FILE1 = '/';
alter system set DG_BROKER_CONFIG_FILE2 = '/';
# ASM
alter system set DG_BROKER_CONFIG_FILE1 = '+DATA/PROD1/Broker/dr1PROD1.dat';
alter system set DG_BROKER_CONFIG_FILE2 = '+FLASH/PROD1/Broker/dr2PROD1.dat';
Note: if you are using ASM then the directories must already exist - use the asmcmd command to create them. The broker will create aliases to the actual config files, but the directories must be there.
In a RAC environment they should be in a shared area and all nodes should point to the same files, only one set of files must exist in a RAC.
Once the directory has been created you are ready to start up the broker, do this on all databases (primary and standby)
Start the Broker
alter system set DG_BROKER_START=TRUE SCOPE=BOTH;
When you start the broker it does not create a configuration yet, because that is done with DGMGRL commands; all it does is start the necessary processes mentioned earlier. If you need to change the location of the files, the steps are below (a worked sketch follows the list):
  1. stop the broker using the DG_BROKER_START parameter by setting this to false (on all nodes in the cluster)
  2. change the DG file destination parameters
  3. copy the files to the new location
  4. then re-start the broker, again set the DG_BROKER_START parameter to true (on all nodes in the cluster)
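Put together, the relocation might look like the sketch below on each instance; the paths are only placeholders for wherever you want the files to live.
Relocate the broker configuration files
sql> alter system set dg_broker_start=FALSE scope=both;
sql> alter system set dg_broker_config_file1='<new_location>/dr1PROD1.dat' scope=both;
sql> alter system set dg_broker_config_file2='<new_location>/dr2PROD1.dat' scope=both;
# copy the existing dr*.dat files to the new location (OS copy, or asmcmd cp for ASM)
sql> alter system set dg_broker_start=TRUE scope=both;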
The broker uses the listener.ora to make connections to the databases, to set up both redo transport and archive gap resolution, and to perform role transitions. We need to create a special static entry in the listener.ora file for each database in the broker configuration; this entry makes it possible for the broker to connect to an idle instance using a remote SYSDBA connection and perform the necessary startup. Here is an example:
Broker listener.ora static entry
## Each database in the configuration needs its own static entry in the listener.ora on its host
## On the standby server (PROD1DR)
SID_LIST_LISTENER =
  (SID_LIST =
     (SID_DESC =
        (GLOBAL_DBNAME = PROD1DR_DGMGRL)
        (ORACLE_HOME = /scratch/OracleHomes/OraHome111)
        (SID_NAME = PROD1DR)
     )
  )

## On the primary server (PROD1)
SID_LIST_LISTENER =
  (SID_LIST =
     (SID_DESC =
        (GLOBAL_DBNAME = PROD1_DGMGRL)
        (ORACLE_HOME = /scratch/OracleHomes/OraHome111)
        (SID_NAME = PROD1)
     )
  )
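After editing listener.ora the listener needs to pick up the new static entry; a minimal check, assuming the default listener name LISTENER, is shown below. Statically registered services show a status of UNKNOWN in the output, which is expected.
Reload and check the listener
$ lsnrctl reload
$ lsnrctl status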
Once the above has been done we are ready to create a broker configuration; there is no difference between a non-RAC and a RAC environment. To connect to the configuration you use the dgmgrl command, but one word of advice: make sure that you use a specific user and password and not "/" as the user, as this has been known to cause problems at a later date.
DGMGRL CLI
[primargydg01] > dgmgrl sys/oracle
[primargydg01] > dgmgrl sys/oracle@PROD1
There are four main areas to DGMGRL CLI
  • Connection and Help - connect, help and exit
  • Creation and Editing - create, add, enable, edit and convert
  • Monitoring - show
  • Role Transition - switchover, failover and reinstate
The first step is to create the base configuration
Create base configuration
# Primary Database server

DGMGRL> connect sys/password
DGMGRL> create configuration prod1 as
> primary database is prod1
> connect identifier is prod1;

Configuration "prod1" created with primary database "prod1"
The above command creates a base configuration; you can display the configuration using the command below.
Display configuration
# Primary Database server

DGMGRL> connect sys/password
DGMGRL> show configuration
Configuration - prod1
Protection Mode: MaxPerformance
Databases:
prod1 - Primary database
prod1dr - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
## Here is the same output after adding a logical standby database
DGMGRL> show configuration;
Configuration - prod1
Protection Mode: MaxPerformance
Databases:
prod1 - Primary database
prod1dr - Physical standby database
prod1lr - Logical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
Next we add the standby database to the configuration. If you receive an error stating that the properties could not be imported from the database, check DB_UNIQUE_NAME, the TNS name and the transport parameter (LOG_ARCHIVE_DEST_n). The proper setup is to have your redo transport parameter defined using the DB_UNIQUE_NAME method, meaning that each redo transport parameter must contain the DB_UNIQUE_NAME= attribute. The broker searches all of your LOG_ARCHIVE_DEST_n parameters looking for a destination whose unique name matches the database name you entered in the command. If you have not done this then you need to use the full set of arguments to add the database, as shown below.
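In other words, before adding the standby the primary should already have something like the settings sketched below (prod1dr is assumed to be both the TNS alias and the DB_UNIQUE_NAME of the standby):
Expected transport setup on the primary
sql> show parameter log_archive_config
# e.g. DG_CONFIG=(prod1,prod1dr)
sql> show parameter log_archive_dest_2
# e.g. SERVICE=prod1dr ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=prod1dr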
Add the standby database
# Primary Database server - the full command set

DGMGRL> connect sys/password
DGMGRL> add database prod1dr
> as connect identifier is prod1dr
> maintained as physical;
Database "prod1dr" added
# if you want to add a logical standby you can use the following
DGMGRL> add database prod1lr
> as connect identifier is prod1lr
> maintained as logical;

Database "prod1lr" added
  
When you display the configuration you should see that the standby database has been added
Display configuration
# Primary Database server
DGMGRL> connect sys/password
DGMGRL> show configuration
Configuration - prod1
Protection Mode: MaxPerformance
Databases:
prod1 - Primary database
prod1dr - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
DISABLED
## Because we have not setup any parameters for Broker all parameters defaults will be presumed,
## some parameters are set by examining the database others are Broker default values.

DGMGRL> show database verbose prod1
Database - prod1
Role: PRIMARY
Intended State: OFFLINE
Instance(s):
PROD1
Properties:
DGConnectIdentifier = 'prod1'
ObserverConnectIdentifier = ''
LogXptMode = 'ASYNC'
DelayMins = '0'
Binding = 'optional'
MaxFailure = '0'
MaxConnections = '1'
ReopenSecs = '300'
NetTimeout = '30'
RedoCompression = 'DISABLE'
LogShipping = 'ON'
PreferredApplyInstance = ''
ApplyInstanceTimeout = '0'
ApplyParallel = 'AUTO'
StandbyFileManagement = 'AUTO'
ArchiveLagTarget = '0'
LogArchiveMaxProcesses = '4'
LogArchiveMinSucceedDest = '1'
DbFileNameConvert = ''
LogFileNameConvert = ''
FastStartFailoverTarget = ''
StatusReport = '(monitor)'
InconsistentProperties = '(monitor)'
InconsistentLogXptProps = '(monitor)'
SendQEntries = '(monitor)'
LogXptStatus = '(monitor)'
RecvQEntries = '(monitor)'
HostName = 'primarydg01'
SidName = 'PROD1'
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=primarydg01)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=PROD1_DGMGRL)(INSTANCE_NAME=PROD1)(SERVER=DEDICATED)))'
StandbyArchiveLocation = 'USE_DB_RECOVERY_FILE_DEST'
AlternateLocation = ''
LogArchiveTrace = '0'
LogArchiveFormat = '%t_%s_%r.dbf'
TopWaitEvents = '(monitor)'
Database Status:
DISABLED
DGMGRL> show database verbose prod1dr
Database - prod1dr
Role: PHYSICAL STANDBY
Intended State: OFFLINE
Transport Lag: (unknown)
Apply Lag: (unknown)
Real Time Query: OFF
Instance(s):
PROD1DR
Properties:
DGConnectIdentifier = 'prod1dr'
ObserverConnectIdentifier = ''
LogXptMode = 'ASYNC'
DelayMins = '0'
Binding = 'OPTIONAL'
MaxFailure = '0'
MaxConnections = '1'
ReopenSecs = '300'
NetTimeout = '30'
RedoCompression = 'DISABLE'
LogShipping = 'ON'
PreferredApplyInstance = ''
ApplyInstanceTimeout = '0'
ApplyParallel = 'AUTO'
StandbyFileManagement = 'AUTO'
ArchiveLagTarget = '0'
LogArchiveMaxProcesses = '4'
LogArchiveMinSucceedDest = '1'
DbFileNameConvert = ''
LogFileNameConvert = ''
FastStartFailoverTarget = ''
StatusReport = '(monitor)'
InconsistentProperties = '(monitor)'
InconsistentLogXptProps = '(monitor)'
SendQEntries = '(monitor)'
LogXptStatus = '(monitor)'
RecvQEntries = '(monitor)'
HostName = 'physicaldg01'
SidName = 'PROD1DR'
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=physicaldg01)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=PROD1DR_DGMGRL)(INSTANCE_NAME=PROD1DR)(SERVER=DEDICATED)))'
StandbyArchiveLocation = 'USE_DB_RECOVERY_FILE_DEST'
AlternateLocation = ''
LogArchiveTrace = '0'
LogArchiveFormat = '%t_%s_%r.dbf'
TopWaitEvents = '(monitor)'
Database Status:
DISABLED
When you add any additional standby databases, always double-check that the parameters are set correctly for the type of configuration you want; any changes should be made before enabling the additional standby database.
Once we are happy that all standby databases have been added and their parameters have been checked, we are ready to enable the configuration. The command issues ALTER SYSTEM commands on both the primary and the standby databases, starts redo transport to the standby databases and starts the apply services. Before examining the configuration, give it some time to complete as it does take a while; otherwise you will see ORA-16610 errors.
Enabling the configuration
# Primary Database server

DGMGRL> connect sys/password
DGMGRL> enable configuration
Enabled.
You can tail the alert log file and see what action is taking place; after a few minutes you should see a successful configuration running. You can also check the redo transport and apply services by querying the v$managed_standby view.
  
Display configuration
# Primary Database server

DGMGRL> connect sys/password
DGMGRL> show configuration;
Configuration - prod1
Protection Mode: MaxPerformance
Databases:
prod1 - Primary database
prod1dr - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
# Standby database - check the LGWR - RFS connection process and the MRP0 is in the applying state

sql> select client_process, process, thread#, sequence#, status from v$managed_standby;

Broker Properties
As mentioned above, the broker has a number of properties which can be changed; to change a property we use the EDIT command in the DGMGRL CLI.
Edit configuration
EDIT CONFIGURATION SET PROPERTY <property> = <value>;
EDIT DATABASE <database-name> SET PROPERTY <property> = <value>;
EDIT INSTANCE <instance-name> ON DATABASE <database-name> SET PROPERTY <property> = <value>;
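For example (the property values here are purely illustrative):
Edit property examples
DGMGRL> edit configuration set property CommunicationTimeout = 120;
DGMGRL> edit database prod1dr set property LogXptMode = 'SYNC';
DGMGRL> edit instance 'PROD1' on database prod1 set property LogArchiveTrace = '127';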
Configuration Properties
BystandersFollowRoleChange
FastStartFailoverAutoReinstate
whether the old primary is automatically reinstated as a standby after a fast-start failover
FastStartFailoverLagLimit
the maximum apply lag, in seconds, the target standby may have and still be used for a fast-start failover (ignored in maximum protection mode)
FastStartFailoverPmyShutdown
whether the primary is shut down (aborted) if it is found to be in a hung or isolated state
FastStartFailoverThreshold
the number of seconds to wait after losing contact with the primary before a fast-start failover is initiated
CommunicationTimeout
the amount of time the broker will wait for a response before giving up (default 180 seconds); you can remove the timeout by setting this to 0
Database Properties
FastStartFailoverTarget
ObserverConnectIdentifier
ApplyInstanceTimeout
defines how long the broker should wait before moving the apply process to another instance in a standby RAC; the default is 0, which means immediately
PreferredApplyInstance
allows you to tell the broker which instance you would like the apply to run on when the standby is a RAC; by default it is empty, which means the broker can choose any instance
ArchiveLagTarget
alters database parameter: ARCHIVE_LAG_TARGET
DbFileNameConvert
alters database parameter: DB_FILE_NAME_CONVERT
LogArchiveMaxProcesses
alters database parameter: LOG_ARCHIVE_MAX_PROCESSES
LogArchiveMinSucceedDest
alters database parameter: LOG_ARCHIVE_MIN_SUCCEED_DEST
LogFileNameConvert
alters database parameter: LOG_FILE_NAME_CONVERT
LogShipping
(standby role only)
enables or defers redo transport to that standby database.
alters database parameter: LOG_ARCHIVE_DEST_STATE_n
StandbyFileManagement
(standby role only)
alters database parameter: STANDBY_FILE_MANAGEMENT
Instance Properties
HostName
No explanation needed here
SidName
No explanation needed here
LogArchiveTrace
alters database parameter: LOG_ARCHIVE_TRACE
LogArchiveFormat
alters database parameter: LOG_ARCHIVE_FORMAT
StandbyArchiveLocation
alters database parameter: LOG_ARCHIVE_DEST_n
AlternateLocation
alters database parameter: LOG_ARCHIVE_DEST_n
LsbyMaxSga
(logical standby only) sets the SQL Apply MAX_SGA value
LsbyMaxServers
(logical standby only) sets the SQL Apply MAX_SERVERS value

Broker State and Monitoring
There are a number of commands that you can use to change the state of the database
turn off/on the redo transport service for all standby databases
Primary
DGMGRL> edit database prod1 set state=transport-off;

DGMGRL> edit database prod1 set state=transport-on;
turn off/on the apply state
Standby
DGMGRL> edit database prod1dr set state=apply-off;

DGMGRL> edit database prod1dr set state=apply-on;
put a database into a real-time query mode
Standby
DGMGRL> edit database prod1dr set state=apply-off;
sql> alter database open read only;
DGMGRL> edit database prod1dr set state=apply-on;
change the protection mode
Primary
# Choose what level of protection you require
sql> alter database set standby database to maximize performance;
sql> alter database set standby database to maximize availability;
sql> alter database set standby database to maximize protection;
# display the configuration
DGMGRL> show configuration 
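When the configuration is under broker control it is cleaner to change the mode through DGMGRL instead of SQL; the broker will not let you raise the mode until the transport is ready for it (LogXptMode must be SYNC for MaxAvailability or MaxProtection). A sketch:
Change the protection mode via the broker
DGMGRL> edit database prod1dr set property LogXptMode = 'SYNC';
DGMGRL> edit configuration set protection mode as MaxAvailability;
DGMGRL> show configuration;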
There are a number of useful monitoring commands and log files that can help with diagnosing problems
configuration
DGMGRL> show configuration;
database
DGMGRL> show database prod1;
DGMGRL> show database prod1dr;

# There are a number of specific information commands, here are the most used
DGMGRL> show database prod1 statusreport;
DGMGRL> show database prod1 inconsistentProperties;
DGMGRL> show database prod1 inconsistentlogxptProps;
DGMGRL> show database prod1 logxptstatus;
DGMGRL> show database prod1 latestlog;
Logfiles
# change the instance name to reflect the one you have chosen

prod1 (alert log): /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/alert_PROD1.log
prod1 (DG log): /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/drcPROD1.log
prod1dr (alert log): /u01/app/oracle/diag/rdbms/prod1dr/PROD1DR/trace/alert_PROD1DR.log
prod1dr (DG log): /u01/app/oracle/diag/rdbms/prod1dr/PROD1DR/trace/drcPROD1DR.log
DG Menu Utility
DG Menu
========================================
# ------------------------------------------------------------------------- #
# #
# Data Guard Check List - primarydg01
# #
# 0. Review database information and status #
# 1. Check for password file #
# 2. Check for forced logging and unrecoverable activities #
# ------------------------------------------------------------------------- #
# 3. Check for archive log mode #
# 4. Check for standby redo logs #
# 5. Check current SCN on primary and standby databases #
# 6. Check archive log destinations #
# ------------------------------------------------------------------------- #
# 7. Check Data Guard Status View for errors and fatal messages #
# 8. Check Managed Recovery Process Status #
# 9. Check for missing archive logs #
# 10. Check archive log gaps on the standby database #
# 11. Check average apply rate / active apply rate #
# 12. Check transport / apply lag #
# 13. How far behind is my Data Guard in terms of time? #
# #
# ------------------------------------------------------------------------- #
# 20. Launch the Logical Standby Data Guard Submenu #
# ------------------------------------------------------------------------- #
# 21. Generate init.ora entries for primary database #
# 22. Generate init.ora entries for standby database #
# 23. Generate tnsnames.ora entries for primary and standby databases #
# 24. Generate SQL syntax to create standby redo logs #
# #
# ------------------------------------------------------------------------- #
# 30. Generate syntax to duplicate standby database from active database #
# #
# x. Exit #
# ------------------------------------------------------------------------- #
# Enter Task Number:
# ------------------------------------------------------------------------- #
# #
# Logical Standby Data Guard Check List - primarydg01
# #
# 1. Check Logical Progress - View Overall Progress Of SQL Apply #
# 2. Check Logical Events - History on Logical Standby Apply Activity #
# 3. Check Logical Events - Detailed View #
# 4. Check Logical Stats - Logical Standby Stats #
# 5. Check Logical Parameters - Logical Standby Parameters #
# 6. Look At What The Logical Standby Processes Are Doing #
# Coordinator, Reader, Builder, Preparer, Analyzer, Applier ... #
# 7. Look At The Status Codes For The Logical Standby Processes #
# 8. Look At Events The Applier Process Is Stuck On #
# ------------------------------------------------------------------------- #
# 10. Check the LCR - Look At Bytes Paged Out #
# 11. Generate Syntax To Skip Transactions #
# Based On MAX(EVENT_TIME) FROM DBA_LOGSTDBY_EVENTS #
# DO NOT SKIP DML STATEMENTS #
# 12. Diagnostic Script Per Metalink Note ID: 241512.1 #
# Look for output in logical_diag_[ORACLE_SID_MONDD_HHMM.out] format #
# ------------------------------------------------------------------------- #
# 20. Review What Is NOT Supported In Your Logical Standby Database #
# 21. Review Tables That Do NOT have Unique Identifiers #
# 22. Check Primary Database For Supplemental Logging #
# #
# ------------------------------------------------------------------------- #
# 30. Start Logical Standby Database #
# 40. Stop Logical Standby Database - PLEASE BE CAREFUL !!!!! #
# THIS WILL STOP THE LOGICAL STANDBY APPLY PROCESS #
# ------------------------------------------------------------------------- #
# #
# x. Exit #
# ------------------------------------------------------------------------- #
# Enter Task Number:

Removing the Broker
Finally, if you wish to remove the broker, follow the steps below.
Remove the broker
  1. Connect to the primary database using DGMGRL
  2. Run "remove configuration preserve destinations" in DGMGRL
  3. Connect to the primary using SQL*Plus
  4. Run "alter system set dg_broker_start=false"
  5. Repeat step 4 on all the standby databases
  6. Remove the two broker configuration files from all databases
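A sketch of what those steps look like on the primary (repeat the ALTER SYSTEM and the file removal on every standby; the file locations are whatever DG_BROKER_CONFIG_FILE1/2 point to):
Remove the broker - example
DGMGRL> connect sys/password
DGMGRL> remove configuration preserve destinations;
sql> alter system set dg_broker_start=FALSE scope=both;
sql> show parameter dg_broker_config
# then delete the two dr*.dat files shown by the query above (rm, or asmcmd rm for ASM)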

Automatic Failover

Data Guard provides a feature called Fast-Start Failover (FSFO) which uses the broker to perform the failover actions when there is a problem. This architecture uses a third member as a quorum, ensuring that a failover occurs only when everything meets the rules that you have defined; after the failover has happened the old primary is never allowed to open, to avoid any split-brain scenario, which would be a bit of a nightmare should both databases be open and processing transactions.
The third member is called the Observer and its job is to maintain a connection with the primary and the target standby databases, monitoring their health and performing any failovers necessary; the Observer will also reinstate the old primary when it comes back, if it can. The Observer pings the primary database and at the first sign of trouble starts a countdown (which you configure). If it re-establishes the connection it makes all the necessary checks before going back to watch mode; if the timer expires it checks that the standby can take over and initiates a failover. This failover happens automatically and in the background using the broker. If and when the primary comes back, the Observer reinstates the old primary as a standby database, again using the broker to achieve this.
It is important to consider where the Observer is placed in the network. Only one Observer per Data Guard configuration can run, so its server must have access to both the primary and standby databases with as much redundant networking as possible. The next question is how much the Observer is going to cost: not much, as it can run on most platforms and only requires the Oracle client software for the version of Oracle that you are running; you must set up the TNSNAMES entries on the Observer host to allow it to reach the databases. If the Observer crashes it has no impact on the current Oracle environment; the only impact is that FSFO will not be available until the Observer is up and running again. The Observer can detect the following:
  • Database crash
  • System crash
  • The loss of the network
  • Complete site outage
You can also get FSFO to perform a shutdown abort on the primary when other issues arise such as
  • Datafile Offline
  • Corrupted Controlfile
  • Corrupted Dictionary
  • Inaccessible Logfile
  • Stuck Archiver
The conditions above must be entered exactly as written, otherwise the broker will not understand them.
Monitor a specific condition via the Broker
DGMGRL> enable fast_start failover condition "Corrupted Controlfile";
DGMGRL> enable fast_start failover condition "Datafile Offline";
To display what you are monitoring, use
Display the conditions being monitored
DGMGRL> show fast_start failover;
Fast-Start Failover: DISABLED
Threshold: 30 seconds
Target: (none)
Observer: (none)
Lag Limit: 30 seconds
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Offline YES
Oracle Error Conditions:
(none)
Now that you have an overview of FSFO it's time to set it up and test it. Just a quick check before we progress: make sure that the following has been set up or configured (a short sketch of the flashback and Observer steps follows the list).
  • Use the Broker with all its prerequisites
  • Enable flashback database on both primary and standby
  • Setup the configuration correctly for the protection mode (standby redo logs files on both sides, redo transport setup the same in both directions)
  • Set up the Observer system and configure its TNSNAMES entries
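The flashback and Observer pieces of that checklist might look like the sketch below; it assumes a fast recovery area is already configured on both databases and that the Observer host has TNS aliases for prod1 and prod1dr. On some versions or standby states you may need to briefly stop redo apply before enabling flashback.
Enable flashback and start the Observer
# on the primary and on the standby
sql> alter database flashback on;
# on the Observer host - START OBSERVER blocks, so run it in its own session (e.g. under nohup)
DGMGRL> connect sys/password@prod1
DGMGRL> start observer;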
If you are using more than one standby you must let the broker know which one you want to become the primary; if you have only one, the broker already knows.
Select the standby to become the primary
DGMGRL> edit database prod1 set property FastStartFailoverTarget = 'prod1dr';
DGMGRL> edit database prod1dr set property FastStartFailoverTarget = 'prod1';
Now it's time to decide how long you should wait before failing over; you don't want it too short just in case the network blips. By default it is set to 30 seconds, but you can go as low as 6 seconds if you wish.
change threshold
DGMGRL> edit configuration set property FastStartFailoverThreshold = 45;
You can also control the amount of data loss if you are using one of the lesser protection modes; the greater the lag limit, the greater the potential data loss. Again the time is in seconds.
lag limit
DGMGRL> edit configuration set property FastStartFailoverLagLimit = 60;
If the data loss is less than the limit the failover will proceed; if more redo would be lost than the lag limit allows, the failover will not occur and nothing happens until the primary database either comes back and processing continues, or you choose to fail over manually, accepting the additional data loss. If you are using maximum protection mode then this property is ignored.
There are two more properties that you can set regarding the primary: one shuts it down if it becomes hung, and the other reinstates it if a failover does occur.
abort primary if in a hung state
DGMGRL>edit configuration set property FastStartFailoverPmyShutdown = true;
reinstate primary after a failover
DGMGRL>edit configuration set property FastStartFailoverAutoReinstate = true;
Once you are happy with everything you can now enable the FSFO
Enable FSFO
DGMGRL> enable fast_start failover;

## Display the configuration

DGMGRL> show fast_start failover;
Once everything is set up you can test FSFO by performing a shutdown abort on the primary and checking that the failover occurs, that the old primary is reinstated, and that the amount of data loss is as expected if you are using one of the lesser protection modes. If you are using a test environment this is the time to experiment with different settings. Keep an eye on the log files, including the broker log file, to see how Oracle handles the failovers and to become familiar with them.
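A minimal test run might look like the sketch below (test systems only; exactly what happens depends on the threshold, lag limit and reinstate properties set earlier):
Test FSFO
# on the primary
sql> shutdown abort
# wait for the FastStartFailoverThreshold to expire, then check from a DGMGRL session
DGMGRL> show configuration;
# once the old primary host is started and mounted again the Observer normally reinstates it;
# it can also be done by hand
DGMGRL> reinstate database prod1;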
