Data Guard Info StepbyStep
Abstract
Specific Data Guard Environment Presented
Overview Of Data Guard Concepts
  Operational Requirements
  Data Guard Architecture
Appendix 1: SPFILEs
  SPFILE for PROD1 on node1
  SPFILE for PROD1 on node2
  SPFILE for PROD2 on node1
  SPFILE for PROD2 on node2

  tnsnames.ora for node1
  tnsnames.ora for node2
  tnsnames.ora for Non-Privileged Clients
  tnsnames.ora for DBA Clients
Paper #36226
Database
ABSTRACT
Oracle has introduced many new features in Oracle9i Data Guard to enhance Oracle8i standby database functionality. This white paper covers how to implement Oracle9i Data Guard. An actual production Data Guard environment is presented: a bidirectional physical standby database configuration between two Red Hat Linux 7.1 servers running two separate databases. However, the concepts presented apply to any platform. Step-by-step procedures and actual configuration files demonstrate a working Data Guard implementation. 1
The specifics of this implementation were to create a standby database on server node2 for an existing database, PROD1, on node1, and similarly to create a standby database on server node1 for a separate existing database, PROD2, on node2. Because one implementation was the mirror image of the other, the generic steps to build each standby were the same. The first standby built was the PROD1 standby on node2; the PROD2 standby was then built on node1. Whenever specific commands or values are indicated here, they are for the PROD1 standby implementation on node2. Each server was running a separate instance of Oracle 9.2.0.1.0 configured for dedicated server on a Red Hat Linux 7.1 platform. Each server was a Dell PowerEdge 2550 with two 1000 MHz P3 processors and 4GB RAM. The two servers were located on a WAN, and the network provided high throughput between the servers. These servers were located in different geographical locations, thereby providing disaster recovery. The primary database on one server had its standby database on the other server to make efficient use of each system with no idle hardware. If either primary database became incapacitated, the physical standby database at the other location could be failed over to the primary role so processing could continue.
1
For questions or issues regarding this material, please feel free to contact me at MichaelNew@earthlink.net.
OPERATIONAL REQUIREMENTS
Below are operational requirements for maintaining a standby database. Some of these requirements are more lax than Data Guard best practices would dictate (see Best Practices For Data Guard Configurations below):
The primary and standby databases must be the same database release.
To use the Data Guard broker, the database server must be licensed for Oracle9i Enterprise Edition or Personal Edition.
The operating system on the primary and standby sites must be the same, but the operating system release does not need to be the same.
The hardware and operating system architecture on the primary and standby locations must be the same. For example, a Data Guard configuration with a primary database on a 32-bit Linux system must be configured with a standby database on a 32-bit Linux system.
The primary database can be a single instance database or a multi-instance Real Application Clusters database. The standby databases can be single instance databases or multi-instance Real Application Clusters databases, and these standby databases can be a mix of both physical and logical types.
If using a physical standby database, log transport services must be configured to specify a dedicated server process rather than a shared server (dispatcher) process in managed recovery mode. Although the read-only mode allows a shared server process, you must have a dedicated server once you open the database again in managed recovery mode. 3
The hardware (for example, the number of CPUs, memory size, storage configuration) can be different between the primary and standby systems.
Each primary database and standby database must have its own control file.
If you place your primary and standby databases on the same system, you must adjust the initialization parameters correctly.
3
Oracle9i Data Guard Concepts and Administration, Section 5.1, Introduction to Log Transport Services. This requirement is easy to miss; it is referenced only in a Note in this section.
On the standby location, log apply services use the following processes:
Managed recovery process (MRP) - For physical standby databases only, the MRP applies archived redo log information to the physical standby database.
Logical standby process (LSP) - For logical standby databases only, the LSP applies archived redo log information to the logical standby database, using SQL interfaces.

On the primary and standby locations, the Data Guard broker uses the following processes:
Data Guard broker monitor (DMON) process - These processes work cooperatively to manage the primary and standby databases as a unified configuration. The DMON processes work together to execute switchover and failover operations, monitor the status of the databases, and manage log transport services and log apply services.
Figure 2 identifies the relationships of these processes to the operations they perform and the database objects on which they operate in the absence of the Data Guard broker.
In this figure, the standby redo logs are optionally configured for physical standby databases, except when running in maximum protection mode, which requires physical standby databases and standby redo logs. Logical standby databases do not use standby redo logs.
occurs. Minimal data loss is possible during a forced failover when operating in Maximum Performance mode. You should not fail over to a standby database except in an emergency, because the failover operation is an unplanned transition that may result in data loss. If you need to fail over before the primary and standby databases are resynchronized, and if the standby database becomes inaccessible, data on the primary database may diverge from data on standby databases. You can prevent this by using a combination of forced logging (to force the generation of redo records of changes against the database), standby redo logs, and the Maximum Protection mode with physical standby databases. The amount of data differences or data loss you incur during a failover operation is directly related to how you set up the overall Data Guard configuration, and log transport services in particular, and can be prevented entirely by using Maximum Protection mode.
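As a minimal sketch of the forced logging safeguard mentioned above, the statements below enable forced logging on the primary and then confirm the setting together with the current protection mode. They assume a SQL*Plus session connected AS SYSDBA on an Oracle 9.2 primary; nothing here is taken from the configuration files of the environment presented.

```sql
-- Sketch only: run on the primary, connected AS SYSDBA.
-- Force redo generation even for operations declared NOLOGGING.
ALTER DATABASE FORCE LOGGING;

-- Confirm forced logging and report the current protection mode.
SELECT force_logging, protection_mode
  FROM v$database;
```

Forced logging by itself does not guarantee zero data loss; it only ensures that every change produces redo that can be shipped to the standby.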
Bug No. 2083080 - ORA-1679 occurs when you activate the standby database, if you already opened the standby in read-only mode, and mounted it (by recovery).
Bug No. 2151468 - switchover from the standby database hangs, but works from the primary database.
Maximum Availability Architecture (MAA), July 2002, an Oracle White Paper.
Bug No. 1034871 - BACKUP CONTROLFILE TO TRACE output is wrong if using a standby controlfile. In particular, read-only files may be listed when they should not be.
Bug No. 1920673 - Standby database recovery may fail (end-of-redo error is possible) in delayed protection mode.
Bug No. 1950279 - A network disconnect from a no-data-loss standby can corrupt primary online redo logs.
IMPLEMENT BACKUP AND RECOVERY
Implement an efficient Backup and Recovery (B&R) architecture before setting up a standby database. A standby provides High Availability (HA) functionality. The bidirectional failover standby database design presented here is a common, practical, and stable HA configuration. However, good B&R mechanisms should be in place before implementing a standby. 5

It is extremely important to recognize the difference between HA and B&R functionality. The impetus for HA, obviously, is to provide a high degree of access to database users. B&R, however, is concerned with the recovery of data in the event of physical or logical corruption. (A standby can prevent logical corruption, but only if it is discovered before being propagated to the standby.) If no B&R mechanism exists, some failures would result in data loss despite a working standby.

A standby is not considered a B&R solution because it may not always be available. Standby environments depend on multiple devices, which substantially increases the potential for hardware failure. It is common to experience problems in the network or the failover platform that prohibit the transmission of archive logs from the primary to the standby. Naturally you do not want to shut down your primary production environment in these cases, thereby defeating the HA objective. Another time standby databases are vulnerable as a B&R solution is during failover and subsequent switchback. These operations require refreshing both primary and secondary to a current state. During these activities the standby capability is diminished or non-existent. Therefore, for the length of these periods, any concurrent hardware failure would result in data loss if no B&R mechanisms existed.

When implementing backup and recovery, it is highly advantageous to use multiple, even overlapping, backup methods. This allows you to choose the most expedient mechanism to minimize downtime when database recovery is required.
Experience shows that using a combination of binary and logical (i.e. database exports) backups to tape and disk provides the maximum flexibility to recover from any failure scenario.
5
Data Management Solutions Oracle High Availability Guidelines and Recommendations, M. Burke, 2003, ThinkSpark, LLP. General overview of Oracle-based High Availability techniques. Please contact ThinkSpark for copies and additional information.
Maximum Performance mode with LGWR ASYNC (AFFIRM or NOAFFIRM) option for an environment that tolerates minimal data loss and divergence when sites are temporarily inaccessible. Performance overhead is minimized.
The only difference between the Maximum Protection and Maximum Performance configuration is whether LGWR writes synchronously or asynchronously, respectively. For the environment presented here, Maximum Performance mode with LGWR ASYNC NOAFFIRM was chosen based upon client requirements.
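The choice of protection mode can be sketched in SQL as follows. This is an illustration under assumptions rather than the exact procedure used at the client site: it assumes an Oracle 9.2 primary and a SYSDBA session, and the database state required to change the mode should be checked against the documentation for your release.

```sql
-- Sketch only: run on the primary, connected AS SYSDBA.
-- Maximum Performance is the default mode; the statement is shown for completeness.
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE;

-- Verify the configured mode and the level currently in effect.
SELECT protection_mode, protection_level
  FROM v$database;
```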
IDENTICAL LOGICAL STRUCTURES
Logical structures include database-related files and other operating system (O/S) files. Hardware solutions usually involve a considerable investment in hardware or software. However, no such investment is necessary to implement identical file and directory structures. At the database level, datafiles, log files, and control files should have matching names and paths, and use OFA naming conventions if at all possible. Archive directories should also be identical between sites, including size and structure. At the O/S level, maintain the same partition names, privileges, accounts, other software, and database and non-database directories and files. Duplicate logical structures allow for identical operational practices. The Data Guard environment covered in this paper used identical logical structures.
redo log traffic in either direction, depending upon the role of each site. The primary and secondary site Oracle network configurations here were identically structured on both sites. As with Oracle network configurations, O/S network infrastructures should be the same across all sites. This includes the number of NICs. You must ensure that the operating system network infrastructure between sites will support the redo traffic. Because the redo logs on the production database update the physical standby database, your network infrastructure must have sufficient bandwidth to handle the maximum redo traffic at peak load. Furthermore, network latency affects overall throughput and response time for OLTP and batch operations.

Not only does a separate NIC at each site provide the required bandwidth for redo log transmission, it also guards against a single NIC point of failure. A loss of network communications between primary and backup systems due to a network card failure would prevent archive log transport from the primary to the standby. Therefore, for redundancy purposes alone, you would do well to install two NICs in (or make a backup NIC available to) each server. For this Data Guard environment, redo traffic traveled on a 100GB dedicated backbone via a second NIC at each site, whereas regular database application traffic was directed through a 10GB network. The networks were distinguished by different HOST names in the Oracle network configuration files, which were resolved in the /etc/hosts file and routed to the appropriate NIC.
The combination of defining both primary-only and standby-only parameters on both sites, and of using SPFILE to change the four parameters that must be different between sites, greatly simplifies the process of role reversal. Each database in the Data Guard environment presented uses identical initialization parameters (with the four exceptions) at both primary and secondary sites.
IDENTICAL HARDWARE ENVIRONMENTS
In an ideal world, you would use the same hardware at all sites, including middle or application tiers. The same RAM, CPU, and storage systems ensure that after a switchover or failover, the secondary site now in the production role has the capacity to handle the same load and provide the same level of fault tolerance. The primary and secondary site would then only differ by their current production or standby roles. The same operational procedures can be leveraged across both sites. If the hardware is different between servers hosting primary and standby systems, you may have to restrict work done after a switchover to a server with fewer resources.

The storage options that exist today provide a variety of features and options. The following features will increase high availability of your Data Guard environment:
Full redundancy for all hardware components.
Online parts replacement (hot swappable parts).
Online patch application.
Mirrored write cache with battery backup.
Load balancing and failover capabilities across all host bus adaptors.
Hardware mirroring and striping capabilities - You may want to consider coupling Data Guard with remote mirroring technologies, which can be useful to:
  Synchronize non-database files such as Oracle binaries or other software between primary and secondary sites.
  Synchronize important flat or binary files such as SPFILEs or init.ora files between primary and secondary sites.

That said, Data Guard and physical standby databases provide the following benefits over database remote mirroring solutions with a third party technology:
Protection from user error or data corruption.
Reduced network utilization because only the redo traffic is transferred.
Role management facilities provide simple and integrated switchover and failover procedures.
Data Guard's remote log transport and database protection modes provide a better switchover and failover no-data-loss solution without needing to customize or script, or use remote mirroring technologies.
Simplified support and certification by using an Oracle-based solution.
These features often make Data Guard preferable to remote mirroring solutions. Our client here preferred to use relatively inexpensive identical hardware (see Specific Data Guard Environment Presented above), which was sufficient to avoid a further investment in remote mirroring technology.
applies to the primary role, the secondary role, or both. Regardless of the role a parameter applies to, all parameters, except the four whose values differ based upon role (FAL_SERVER, FAL_CLIENT, LOG_ARCHIVE_DEST_2, and LOG_ARCHIVE_DEST_STATE_2; see Four Parameters That Differ By Role below), are set equally on both primary and secondary sites to simplify role reversal. The Data Guard Broker was enabled (DG_BROKER_START=TRUE; the Oracle default is FALSE) for this configuration, which is a best practices setting. LOG_ARCHIVE_DEST_STATE_2 is the only parameter you need to change during the course of building the standby database (see Dynamically Enable LOG_ARCHIVE_DEST_STATE_2 below). Initially, you need to set this parameter to DEFER, then you dynamically set it to ENABLE when you recover the standby database.

By way of example, recommended values and attributes shown are those for the PROD1 primary database running in Maximum Performance mode on node1. All parameter settings except one, LOG_ARCHIVE_DEST_2, are generic Data Guard best practices that apply irrespective of the database protection mode chosen. (Of course, net service and file/directory names for several parameters are environment-specific.) LOG_ARCHIVE_DEST_2 attributes and values are considered best practices for Maximum Performance protection mode configurations, but must be tuned along with other values for each unique environment. Parameters are omitted that relate either to logical standby databases or to Data Guard environments where one system hosts both primary and standby databases.
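The DEFER-then-ENABLE sequence for LOG_ARCHIVE_DEST_STATE_2 can be sketched as follows. This assumes the primary instance was started with an SPFILE (so that SCOPE = BOTH is accepted) and a SYSDBA session; the V$ARCHIVE_DEST query is simply one way to confirm the result.

```sql
-- Sketch only: run on the primary (PROD1 on node1), connected AS SYSDBA.

-- While the standby is being built, keep the remote destination deferred.
ALTER SYSTEM SET log_archive_dest_state_2 = DEFER SCOPE = BOTH;

-- Just before recovering the standby, enable the destination dynamically.
ALTER SYSTEM SET log_archive_dest_state_2 = ENABLE SCOPE = BOTH;

-- Confirm the state of the remote destination and look for transport errors.
SELECT dest_id, status, destination, error
  FROM v$archive_dest
 WHERE dest_id = 2;
```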
Parameter Name                 Applies To  Value
ARCHIVE_LAG_TARGET             Pri         0
COMPATIBLE                     Both        9.2.0.1.0 (use latest release)
CONTROL_FILE_RECORD_KEEP_TIME  Both        14
CONTROL_FILES                  Both        Platform-specific
DB_CREATE_ONLINE_LOG_DEST_n    Both        "" (default)
DB_FILE_NAME_CONVERT           Sec         "" (default)
DB_FILES                       Both        471
DB_NAME                        Both        PROD1 (set to same name)
DG_BROKER_START                Both        TRUE
FAL_SERVER                     Sec         PROD1_node1
FAL_CLIENT                     Sec         PROD1_node2
LOCAL_LISTENER                 Both        ""
LOG_ARCHIVE_DEST_1             Both        location=/arch1/prod1 arch async mandatory noreopen max_failure=0 alternate=log_archive_dest_3
LOG_ARCHIVE_DEST_STATE_1       Both        Enable
LOG_ARCHIVE_DEST_2             Pri         service=PROD1_node2 optional lgwr async=20480 noaffirm reopen=15 max_failure=10 delay=30 net_timeout=30
  (Maximum Performance mode only)
LOG_ARCHIVE_DEST_STATE_2       Pri         enable (if role is production) - initially set to defer until recovering standby
                                           defer (if role is standby)
LOG_ARCHIVE_DEST_3             Both        location=/arch2/prod1
LOG_ARCHIVE_DEST_STATE_3       Both        alternate
LOG_ARCHIVE_FORMAT             Both        arch_%t_%S.arc
LOG_ARCHIVE_MAX_PROCESSES      Both        1 (default but tune for environment)
LOG_ARCHIVE_MIN_SUCCEED_DEST   Both        1 (default informed by choice of Maximum Performance mode)
LOG_ARCHIVE_START              Both        TRUE
REMOTE_ARCHIVE_ENABLE          Both        TRUE
REMOTE_LOGIN_PASSWORDFILE      Both        EXCLUSIVE
SERVICE_NAMES                  Both        PROD1.domain.com
STANDBY_ARCHIVE_DEST           Sec         /arch1/prod1 (=LOG_ARCHIVE_DEST_1)
STANDBY_FILE_MANAGEMENT        Both        AUTO

Table 1: Recommended Data Guard initialization parameter settings for PROD1 in Maximum Performance mode on node1.
AVOID ENVIRONMENT VARIABLES
When implementing Data Guard at a customer site, I came across a standby limitation with an easy workaround. The error does not occur until after building the standby and placing it in managed recovery. The error stems from using Unix environment variables to define initialization parameters, which has been possible since Oracle 7.0. However, using environment variables, such as ORACLE_SID, to define USER_DUMP_DEST or BACKGROUND_DUMP_DEST causes the alert log error ARC0: Error 7446 attaching RFS server to standby instance at host '<FAL_SERVER_on_stby>'. This behavior is not documented in any Oracle Corporation sources as relating to standby databases. Error 7446 apparently denotes ORA-7446: sdnfy: bad value string for parameter string, which is in the Oracle Error Messages manual. The action stated there is to make sure the directory you have specified is a valid directory/file specification. An oblique reference to the cause of the problem is found in a generic Metalink Note 6 about the ORA-7446 error. This Note states that environment variables are not defined for remote SQL*Net connections. Since redo logs are transmitted from the primary to the standby database over SQL*Net, this error makes sense in the context of standby databases. Since environment variables cannot be expanded, the solution, of course, is to replace the environment variable with its expanded equivalent.
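A minimal sketch of the workaround follows. The directory paths are hypothetical OFA-style examples, not the paths used at the customer site; substitute the fully expanded directories for your own installation. The statements assume an SPFILE and a SYSDBA session.

```sql
-- Sketch only: check whether the dump destinations still contain
-- unexpanded environment variables.
SELECT name, value
  FROM v$parameter
 WHERE name IN ('user_dump_dest', 'background_dump_dest');

-- Replace them with fully expanded paths (hypothetical paths shown).
ALTER SYSTEM SET user_dump_dest = '/u01/app/oracle/admin/PROD1/udump' SCOPE = BOTH;
ALTER SYSTEM SET background_dump_dest = '/u01/app/oracle/admin/PROD1/bdump' SCOPE = BOTH;
```

Because both sites share the same parameter settings, the same change would be made on the standby.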
A diff between exported SPFILEs (see Appendix 1 for exported SPFILEs in their entirety) for PROD1 from primary and secondary sites shows just how slightly even these four values differ (italicized):
$ diff initPROD1.ora_frSPFILE_node1 initPROD1.ora_frSPFILE_node2
< *.fal_client='PROD1_node2'
< *.fal_server='PROD1_node1'
---
> *.fal_client='PROD1_node1'
> *.fal_server='PROD1_node2'
27c27
< *.log_archive_dest_2='service=PROD1_node2 optional lgwr async=20480 noaffirm reopen=15 max_failure=10 delay=30 net_timeout=30'
---
> *.log_archive_dest_2='service=PROD1_node1 optional lgwr async=20480 noaffirm reopen=15 max_failure=10 delay=30 net_timeout=30'
30c30
< *.log_archive_dest_state_2='enable'#stby - set to defer when in stby role
---
> *.log_archive_dest_state_2='defer'#stby - set to enable when in pri role
Recall that for PROD1, node1 is the primary and node2 is the standby. You can see that the first three parameter values differ only by an Oracle Net service name, PROD1_node2 vs. PROD1_node1. For details on the Oracle Net service names, see Create Oracle Network Configuration Files below. As is required, the value for the fourth parameter, LOG_ARCHIVE_DEST_STATE_2, is set to ENABLE on the primary (node1) so that LOG_ARCHIVE_DEST_2 can specify the standby destination for Oracle Net to transmit the archive logs. (See Set Initialization Parameters - Use SPFILE below about setting this parameter to DEFER until just before recovering the standby, when you dynamically set it to ENABLE.) LOG_ARCHIVE_DEST_STATE_2 is set to DEFER on the standby (node2) to disable LOG_ARCHIVE_DEST_2 because this is a primary-only parameter.
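To illustrate how little changes during role reversal, the sketch below lists the four role-dependent parameters on a site and shows the single setting that is flipped when roles change. It assumes an SPFILE and a SYSDBA session on each node; the queries are illustrative and are not part of the original procedure.

```sql
-- Sketch only: inspect the current role and the four role-dependent parameters.
SELECT database_role FROM v$database;

SELECT name, value
  FROM v$parameter
 WHERE name IN ('fal_client', 'fal_server',
                'log_archive_dest_2', 'log_archive_dest_state_2')
 ORDER BY name;

-- On the site that has taken over the primary role:
ALTER SYSTEM SET log_archive_dest_state_2 = ENABLE SCOPE = BOTH;

-- On the site that is now in the standby role:
ALTER SYSTEM SET log_archive_dest_state_2 = DEFER SCOPE = BOTH;
```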
6
Metalink Note 20480.1: OERR: ORA 7446 sdnfy: bad value '%s' for parameter %s.
SPFILE comments for LOG_ARCHIVE_DEST_STATE_2 serve as reminders for the proper setting of this parameter depending upon the role of the site.
Description and Rationale
Maintaining a standby requires archiving to be enabled and started (LOG_ARCHIVE_START = TRUE). Additionally, remote archiving must be enabled (REMOTE_ARCHIVE_ENABLE = TRUE). LOG_ARCHIVE_FORMAT should have a thread and sequence attribute and should be consistent across all instances. The %S instructs the format to zero-fill the sequence number piece of the log archive file name. Reduce work for LGWR. Postpone hang situations if local and alternate archive destinations are full. Allow more time to detect and fix space problems. LOG_ARCHIVE_DEST_STATE_1 should always be set to ENABLE. In all cases, use LGWR (not ARCn). All production instances in a RAC archive to the same standby destination using the same net service name. The net service name can have an address list that contains all standby instance nodes with the first node in the list being the primary standby instance that is normally running managed recovery. Use a backup net service name only if you want to switch to the secondary standby host. In this case, only one physical standby node exists. In order to support role reversal, both primary and secondary sites archive locally to one device (LOG_ARCHIVE_DEST_1), and alternately archive locally to a separate device (LOG_ARCHIVE_DEST_3). ARCH can switch to the alternate if it encounters a write error or if the destination runs out of space. Use only 2 archive destinations:
/arch1/<DB_NAME> as the primary archive destination
/arch2/<DB_NAME> as the alternate archive destination
If they are identical across the nodes, then it is predictable and easy to manage even after a switchover or failover operation.

ARCH process archives locally to LOG_ARCHIVE_DEST_1 on both primary and standby. LGWR should archive both online and archived redo logs from the primary to the standby (and, for RAC, to only one standby instance and node).
Create identical local alternate archive destinations, LOG_ARCHIVE_DEST_1 and LOG_ARCHIVE_DEST_3, on both primary and standbys. Set LOG_ARCHIVE_DEST_STATE_3 = ALTERNATE. Archive directory structure is identical across all production and standby nodes.
To avoid confusion, set STANDBY_ARCHIVE_DEST = LOG_ARCHIVE_DEST_1 (local archive directory). If standby redo logs (SRLs) are present, the standby's ARCH process writes to the local archive destination. If there is a gap, the fetch archive log (FAL) process writes to the standby archive destination.
Archiving Rule: Archive destinations should be sized to hold all the archived redo log files since the last on-disk backup.
Description and Rationale: For standby database instantiation, the on-disk backup accompanied with the local archived redo log files can be leveraged to re-create a new standby database. Any node can play the role of the primary standby node since switchover or node failures can occur.

Archiving Rule: Set LOG_ARCHIVE_DEST_STATE_2 to ENABLE on primary and to DEFER on standby.
Description and Rationale: The setting depends on the database role. In a production role, the state is enabled. When the database is in a physical standby role, the state is deferred.
Archiving strategy is intricately linked with the settings for LOG_ARCHIVE_DEST_2. This parameter applies to the remote standby destination because the SERVICE attribute of this parameter points to the standby database net alias. Following is a description of all attributes of this parameter as tuned for this environment, but derived from best practices principles for Maximum Performance mode:

OPTIONAL - Specifies that the primary can reuse online redo logs even if archiving to the standby fails. The customer here did not want primary database operations to halt if the primary database lost contact with the standby. The MANDATORY attribute of LOG_ARCHIVE_DEST_1 still required local primary archiving of redo logs to succeed.

LGWR - Specifies that LGWR rather than ARCH is responsible for transmitting redo logs to the standby. This allows redo records generated on the primary to be transmitted at the record level, allowing for minimal data loss. Otherwise, using ARCH, a redo log switch needs to occur so the redo log can be archived and transmitted to the standby.

ASYNC=20480 - When using the primary database log writer process to archive redo logs, you can specify synchronous (SYNC) or asynchronous (ASYNC) network transmission of redo logs to archiving destinations. With ASYNC, control will be returned to the application processes immediately, even if the data has not reached the destination. This mode has a reasonable degree of data protection on the destination database, with minimal performance effect on the primary database. In general, for slower network connections, use larger block counts. ASYNC=20480 specifies an SGA network buffer of 20480 512-byte blocks. In Maximum Performance mode, this 10MB buffer size (the largest allowed) performs best in a WAN. (In a LAN, ASYNC buffer size does not impact primary database throughput.) Also, in a WAN, using the maximum buffer size reduces "Timing out" messages due to an async buffer full condition. This is because the smaller the buffer, the greater the chance of the buffer filling up as latency increases. 7

NOAFFIRM - Specifies to perform asynchronous log archiving disk write I/O operations on the standby database. It is not necessary for the primary database to receive acknowledgment of the availability of the modifications on the standby database in a Maximum Performance environment. This attribute applies to local and remote archive destination disk I/O operations, and to standby redo log disk write I/O operations. However, the NOAFFIRM attribute has no effect on primary database online redo log disk I/O operations.

REOPEN=15 MAX_FAILURE=10 - Denotes that if there is a connection failure, the network server reopens the connection after 15 seconds and retries up to 10 times. The maximum retry time for all failed operations is calculated as REOPEN multiplied by MAX_FAILURE, or 150 seconds (2.5 minutes).

DELAY=30 - Specifies that recovery apply is delayed for 30 minutes from the time the log is archived on the physical standby, but the redo transfer to the standby is not delayed. The correct recovery delay is important in ensuring that a user error or corruption does not get propagated to the standby database, which would compromise your disaster recovery solution. The recovery delay setting is critical for standby configurations regardless of the protection mode. The delay allows the managed recovery process (MRP) on the standby database to intentionally lag behind in applying archived redo log files. Without a recovery delay, when the standby database is in managed recovery mode, archived redo is automatically applied upon a log switch. Reducing the delay time reduces standby recovery time due to the reduced number of archived redo log files required for standby recovery. But a short delay time is possible only if you have a monitoring infrastructure that detects problems and stops the standby database within that timeframe (see Monitor Data Guard Configuration below). In the case of this client, OEM Data Guard Manager events monitored the configuration tightly enough to allow for a 30-minute delay.

NET_TIMEOUT=30 - Designates that if there is no reply for a network operation within 30 seconds, then the network server errors out due to the network timeout instead of stalling for the default network timeout period (TCP timeout value). A NET_TIMEOUT of 30 seconds here provided enough cushion to accommodate the latency during peak redo traffic through the dedicated NIC on the WAN.

7
Maximum Availability Architecture (MAA), Feb 2003, an Oracle White Paper.
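The arithmetic behind two of these attributes, and the way the complete attribute string is applied, can be sketched as follows. The attribute string is the one listed for PROD1 on node1 in Table 1; the use of ALTER SYSTEM with SCOPE = BOTH assumes an SPFILE and a SYSDBA session, and is an illustration rather than the exact command history of this implementation.

```sql
-- Sketch only: check the buffer size and maximum retry time by simple arithmetic.
SELECT 20480 * 512               AS async_buffer_bytes,  -- 10485760
       20480 * 512 / 1024 / 1024 AS async_buffer_mb,     -- 10
       15 * 10                   AS max_retry_seconds    -- 150
  FROM dual;

-- Apply the remote destination attributes on the primary.
ALTER SYSTEM SET log_archive_dest_2 =
  'service=PROD1_node2 optional lgwr async=20480 noaffirm reopen=15 max_failure=10 delay=30 net_timeout=30'
  SCOPE = BOTH;
```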
Data Guard functionality requires, or archiving strategy informs, the above archive-related parameter settings. However, Data Guard best practices address the following five additional parameter settings:
DISABLE ARCHIVE_LAG_TARGET
The initialization parameter ARCHIVE_LAG_TARGET limits the amount of data that can be lost and can effectively increase the availability of the standby database by forcing a log switch after a user-specified time period elapses. As with the Data Guard environment here, you would be better off disabling this time-based thread advance feature by setting it to zero to eliminate archive log switches based on time. Instead, as with any database, size redo logs such that log switches occur frequently enough to meet requirements for maximum allowable loss of data.
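As a minimal sketch, the statements below disable time-based log switches and then report how often log switches actually occur, which is one way to judge whether the redo logs are sized appropriately. They assume an SPFILE and a SYSDBA session on the primary; the hourly grouping is an arbitrary choice for illustration.

```sql
-- Sketch only: disable the time-based thread advance feature.
ALTER SYSTEM SET archive_lag_target = 0 SCOPE = BOTH;

-- Count log switches per hour to check redo log sizing against
-- the maximum allowable loss of data.
SELECT TRUNC(first_time, 'HH24') AS switch_hour,
       COUNT(*)                  AS log_switches
  FROM v$log_history
 GROUP BY TRUNC(first_time, 'HH24')
 ORDER BY switch_hour;
```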
TUNE CONTROL_FILE_RECORD_KEEP_TIME
CONTROL_FILE_RECORD_KEEP_TIME specifies the minimum number of days before a reusable record in the control file can be reused. Setting this parameter prevents the ARCHIVELOG mechanism from overwriting an archive log name (a serially reusable record) in the control file. Setting this parameter value higher than the default of 7 days helps to ensure that data is made available on the standby database.
UNSET DB_CREATE_ONLINE_LOG_DEST_n
The DB_CREATE_ONLINE_LOG_DEST_n parameter should be unset, as it was in the Data Guard environment here. This parameter sets the default location for Oracle-managed control files and online redo logs. But it causes a problem for instantiation of standby databases using standby redo logs because the SRL names cannot be reused. In other words, when this parameter is set, Oracle dynamically creates a name for the online logs that cannot be manually created on the standby database.
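The three parameter recommendations above lend themselves to an automated check against an exported PFILE. The following Python sketch is only an illustration: the function names are hypothetical, the parser handles just the simple name=value lines seen in Appendix 1, and it treats a missing parameter as taking the default (0 for ARCHIVE_LAG_TARGET, 7 for CONTROL_FILE_RECORD_KEEP_TIME).

```python
def parse_pfile(text):
    """Parse simple '*.name=value' lines of a text PFILE into a dict.

    Assumption: '#' starts a comment and never appears inside a value.
    """
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        name = name.strip().lower()
        if name.startswith("*."):
            name = name[2:]
        params[name] = value.strip().strip("'")
    return params


def check_best_practices(params):
    """Return a list of findings; an empty list means all three checks pass."""
    findings = []
    if params.get("archive_lag_target", "0") != "0":
        findings.append("archive_lag_target should be 0")
    if int(params.get("control_file_record_keep_time", "7")) <= 7:
        findings.append("control_file_record_keep_time should exceed 7 days")
    for name, value in sorted(params.items()):
        if name.startswith("db_create_online_log_dest_") and value:
            findings.append(name + " should be unset")
    return findings


sample = """
*.archive_lag_target=0
*.control_file_record_keep_time=14
*.db_create_online_log_dest_1=''
"""
print(check_best_practices(parse_pfile(sample)))
```

Run against the settings used in this environment, the check returns no findings.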
Paper #36226
A helpful note on setting up dynamic service registration is Metalink Note 76636.1: Service Registration in Net 8i.
capability; after all, primary and standby roles change very infrequently. Moreover, you can easily simulate this functionality by manually controlling the listener states on the primary and secondary sites according to the role of these sites (see explanation of Static SID below). Certainly, the biggest disadvantage to using automatic service registration is that it precludes you from using the Data Guard broker to manage the standby database.9 An unavoidable though much smaller constraint of service registration is that if the listener is started after the instance, service registration does not occur immediately (though PMON should register the instance within a short time). In this case, you would need to issue the ALTER SYSTEM REGISTER statement on the database to instruct PMON to immediately register the instance with the listeners.
All that said, dynamic service registration requires setting the following initialization parameters correctly:
- SERVICE_NAMES for the database service name
- INSTANCE_NAME for the instance name
- LOCAL_LISTENER (you only need this parameter if the listener uses a non-default address, that is, a port other than 1521)
PMON registers a database service with the listener and resolves LOCAL_LISTENER by finding the corresponding net service name (alias) in the local tnsnames.ora file. You need a locally managed tnsnames.ora on both production and standby databases so you can use the same alias on each node. On the primary node, SERVICE_NAME or SID_NAME in listener.ora and SID in tnsnames.ora files should equal the SERVICE_NAMES initialization parameter. The listener.ora file needs to define a listener for redo log traffic (see Suggested Oracle Network Configuration below), but a separate listener is not required for database client connections on default port 1521.
In fact, barring the required listener.ora definition for redo traffic, a listener.ora file is not even required for database clients unless they use a port other than 1521; in that case, you need to define an identical listener (except for HOST settings) on primary and secondary hosts, but do not need statically configured information. You may decide to use both dynamic service registration and a static SID, because OEM requires a static SID to discover and manage listeners. A static SID does not override service registration: database clients still connect by means of automatic service registration, and the static SID is an appendage only for OEM's benefit. The customer here wanted to use the Data Guard broker, so automatic service registration was not an option.
Static SID
The other method for configuring the listener is to use a static SID list. There are several advantages to using a static SID:
- A static SID is more reliable, less complex and more intuitive to manage than using dynamic service registration.
- A static SID containing the GLOBAL_DBNAME parameter is required by standby databases and listeners in order for Oracle Enterprise Manager (OEM) to automatically discover and manage them.
- A static SID allows you to manage the Data Guard configuration using the Data Guard broker through OEM.
In a Data Guard environment, a minor disadvantage to a static SID (compared with automatic service registration) is that it requires you to manually stop the listener on the primary site and start it on the secondary site during a switchover or failover. However, this drawback is more than offset by the fact that a static SID allows you to use Data Guard Manager within OEM, which greatly simplifies the process of role reversal. As an example, following are the listener definition and static SID list entry (SID_LIST_LSNRPROD1PUB) for the listener LSNRPROD1PUB, configured for non-privileged database access on PROD1:
LSNRPROD1PUB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1021)(SDU = 32768))
    )
  )
SID_LIST_LSNRPROD1PUB =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /usr/oracle/product/9.0.1)
      (SID_NAME = PROD1)
    )
  )
ALTERNATE STANDBY CONNECTION
If the standby site needs to be shut down for maintenance, you can alter the LOG_ARCHIVE_DEST_2 setting on the production database to point to an alternate service name (alias) of another working standby, whose alias is resolved in the tnsnames.ora file. This allows the production database to remain open in any protection mode. If you need to shut down the standby node for maintenance, one of the following will occur:
- If the Data Guard environment is configured with Maximum Protection database mode, the production database incurs an outage unless you can switch to an available standby database.
- If the Data Guard environment is configured with Maximum Availability or Maximum Performance database mode, you can shut down the standby node for maintenance without impacting the production database. However, you lose fault tolerance while the standby is down. The longer the outage, the further the standby will lag behind the production database.
Still, the choice of Maximum Performance mode here means that it is not necessary to implement an alternate standby connection.
CONNECT-TIME FAILOVER
Connect-time failover occurs when a connection request is forwarded to a second listener address if the first listener address does not respond. Connect-time failover requires that the service name in the tnsnames.ora file for database clients contain two addresses: one for the production node and one for the standby node. The second address allows for connect-time failover in case the first connection fails. Irrespective of whether you use dynamic service registration or a static SID, Data Guard best practices call for database clients to use connect-time failover to connect to the primary or the standby. With service registration, connect-time failover should work automatically, as only the primary database should be registered with its listener. When using a static SID, this is manually accomplished by making sure that only the listener on the primary site is running.
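The idea behind connect-time failover can be illustrated outside Oracle Net with plain TCP sockets. The Python sketch below is an analogy only, not how Oracle Net is implemented: it tries each address in order and returns the first connection that succeeds. The function name connect_with_failover is hypothetical.

```python
import socket


def connect_with_failover(addresses, timeout=5.0):
    """Try each (host, port) pair in order; return the first socket that connects.

    A refused or timed-out attempt is treated like a listener that is not
    running, and the next address in the list is tried.
    """
    last_error = None
    for host, port in addresses:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            last_error = exc  # remember the failure and move to the next address
    raise ConnectionError("no listener answered on any address") from last_error
```

For the PROD1 client alias, the address list would be [("node1", 1021), ("node2", 1021)]. Because only the listener on the current primary node is running, the attempt against the other node fails and the client lands on the primary.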
SESSION DATA UNIT (SDU) PARAMETER SETTING
Reducing the number of round trips across the network is key to optimizing the performance for transporting redo log data to a standby site. With Oracle Net Services it is possible to optimize data transfer by adjusting the size of the Oracle Net setting for SDU. In a WAN environment, Oracle recommends setting the SDU to the maximum setting of 32k (32768). The SDU parameter designates the size of an Oracle Net buffer used to place data into before it delivers each buffer to the TCP/IP network layer for transmission across the network. Oracle Net sends the data in the buffer either when requested or when it is full. Oracle internal Data Guard testing on a WAN demonstrates that the maximum setting of 32k performs best on a WAN. This was the setting used at the customer site reviewed here. The SDU parameter needs to be set at the listener and connection levels, i.e., in the tnsnames.ora and listener.ora. See the network configuration files in Appendix 2 for the syntax of the SDU parameter setting.
In addition to setting the SDU parameter, increasing the TCP send and receive window sizes can improve performance. Use caution, however, because this may adversely affect networked applications that do not exhibit the same characteristics as archiving. This method consumes a large amount of system resources, so involve your network administrator in any TCP window sizing.
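To see why a larger SDU reduces the number of buffers handed to the network layer, consider the following back-of-the-envelope Python sketch. It is an approximation only: it ignores Oracle Net header overhead and TCP behavior, the 1 MB payload is an illustrative figure, and the 2048-byte comparison value is an assumed smaller setting rather than a figure from this paper.

```python
import math


def buffers_needed(payload_bytes, sdu_bytes):
    """Number of SDU-sized buffers needed to carry a payload (headers ignored)."""
    return math.ceil(payload_bytes / sdu_bytes)


redo_bytes = 1048576  # 1 MB of redo; illustrative figure only
for sdu in (2048, 32768):  # 2048 is an assumed smaller SDU for comparison
    print("SDU", sdu, "->", buffers_needed(redo_bytes, sdu), "buffers")
```

Under these assumptions the 32k setting moves the same redo in 32 buffers instead of 512.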
LISTENER.ORA
10 Metalink Note 175122.1: Data Guard 9i - Net8 Configuration for a 2-node database environment.
Two listeners service the primary database and its standby:
Listener for non-privileged database clients: This listener is for non-privileged (other than DBA) clients to connect to the current primary database on a non-default port (not 1521). Using a non-default port prevents automatic service registration from occurring by accident. Proper Data Guard operation relies upon starting this dedicated client listener only on the node where the primary database is currently running. You need to make sure you stop the corresponding listener on the secondary node. To allow for transparent client connections to the database, clients use the same net service name (tnsnames.ora alias) regardless of which is currently the primary database. This is possible because this alias contains addresses for both primary and standby nodes; users can only connect to the primary node because only the primary listener will be running.
Listener for redo log traffic and DBA access: This listener has two addresses, one for redo log traffic and the other for DBA access. The address for redo log traffic allows the standby database to fetch archive log gaps from the primary database through a dedicated network, on a different port from that used for DBA access, which goes through the regular network. Because redo log traffic goes through a special dedicated network connecting the two nodes, there is one more network interface and IP address on each of the two nodes. The two databases in this bidirectional environment share this dedicated NIC, which at any one time accommodates redo log traffic in each direction for each database. This special network is designated by different hostnames, resolved in the /etc/hosts file on each node. This listener is configured and always running on both nodes to allow for role reversal. Only the listener on the current primary node is utilized for redo log traffic, but the listener for the current standby remains running so that you won't need to start it if roles reverse.
SYSDBA connections must always be possible on both the primary and standby databases. So they cannot go through the listener for non-privileged database clients, which is only running on the node where the primary database is running.
TNSNAMES.ORA FILES
Build three types of tnsnames.ora files: for each local node, for non-privileged database clients, and for DBA access:
- The local tnsnames.ora file on each node contains an entry for the standby database on that node to fetch archive log gaps from the primary database on the other node. The net alias is the database setting for FAL_CLIENT on that node.
- The tnsnames.ora file for non-privileged clients contains aliases used to connect only to the primary database. Each alias contains two addresses, one for each node. As an example, for the PROD1 database, the alias looks like this:
PROD1 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1021))
      (ADDRESS=(PROTOCOL=tcp)(HOST=node2)(PORT=1021))
Proper Data Guard operation relies upon starting the listener only on the node in the primary database role, so only one of these two addresses can be used: the address defining HOST=primary_node. We're using connect-time failover, but with manual listener controls. Distributing a separate tnsnames.ora file to non-privileged clients is a good security practice, in that it only gives the required connectivity information, and nothing more. The tnsnames.ora file for DBAs can be used to connect to the database in either role on either node.
See Create Standby Redo Logs, If Necessary below for the syntax used to create standby redo logs.
The most common standby detection mechanism is to monitor the standby alert log for critical errors such as ORA-600 or ORA-1578, and to alert and react when the application detects a logical corruption like a missing table. But other, non-ORA errors in the alert log can indicate trouble. The following alert log entries exemplify this:
ARC0: Error 272 writing standby archive log file at host 'PROD1_node2'
ARC0: I/O error 272 archiving log 5 to 'PROD1_node2'
ARC0: Error 270 closing standby archive log file at host 'PROD1_node2'
ARC0: Error 270 Closing archive log file 'PROD1_node2'
Therefore, you may want to consider parsing the alert log file for words like error (case-insensitive). Despite best intentions, you may not be able to monitor for all possible errors in the alert log. And other errors will not appear in the alert log at all, such as unrecoverable changes due to NOLOGGING operations when not in FORCE LOGGING mode. If you are not managing your Data Guard environment with the OEM 9i Diagnostics Pack, you can monitor for these same event conditions by querying the dynamic performance (V$) views on the standby (and the primary). See Appendix 3 for scripts that check for archive log gaps between primary and standby, ensure the standby is in media recovery mode, and check for unrecoverable changes on the standby propagated by unrecoverable operations on the primary database.
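The alert log parsing suggested above might look like the following Python sketch. It is a minimal illustration, not the monitoring used at the customer site: the function name suspicious_lines is hypothetical, and the sample lines are adapted from alert log and SQL*Plus excerpts that appear in this paper.

```python
import re

# Match the word "error" in any case, or any ORA-nnnnn code.
ERROR_PATTERN = re.compile(r"error|ORA-\d+", re.IGNORECASE)


def suspicious_lines(lines):
    """Return the lines that mention an error or an ORA- code."""
    return [line.rstrip("\n") for line in lines if ERROR_PATTERN.search(line)]


sample = [
    "ARC0: Error 272 writing standby archive log file at host 'PROD1_node2'",
    "ARC0: I/O error 272 archiving log 5 to 'PROD1_node2'",
    "Media Recovery Waiting for thread 1 seq# 4869",
    "ORA-00308: cannot open archived log '/arch1/prod2/log_1_4868.arc'",
]
hits = suspicious_lines(sample)
for line in hits:
    print(line)
```

In practice you would feed the function the lines of the alert log file itself and forward any hits to your paging or e-mail mechanism.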
In summary, the Data Guard broker provides the following benefits:
Management - Provides primary and standby database management as one unified configuration. It allows you to configure and manage multiple sites from a single location.
Automation - Automates opening a primary database, mounting a physical standby database, opening a logical standby database, and starting log transport and log apply services. Automates switchover and failover operations, and provides a GUI for changing database states and Data Guard properties.
Monitoring - Provides monitoring of database health and other runtime parameters. Provides a unified status through the Data Guard configuration log. Provides a tie-in to Oracle Enterprise Manager Data Guard-related events.
and may require substantial DBA intervention, including the need to propagate unlogged operations manually. The FORCE LOGGING database mode will override any user transactions specifying NOLOGGING. FORCE LOGGING is not the default mode, so you will probably need to change it. To check whether the primary database is in FORCE LOGGING mode, issue the following query (9.2):
SELECT FORCE_LOGGING FROM V$DATABASE;
FORCE_LOGGING
-------------
NO
To place the primary database in FORCE LOGGING mode, specify the following as SYSDBA:
SQL> ALTER DATABASE FORCE LOGGING;
Database altered.
The FORCE LOGGING mode is a persistent attribute of the database. That is, if the database is shut down and restarted, it remains in the same logging mode state. However, if you recreate the control file, the database is not restarted in the FORCE LOGGING mode unless you specify the FORCE LOGGING clause in the CREATE CONTROLFILE statement. Because FORCE LOGGING mode is not available in Oracle Server Release 9.0, the best you can do in Oracle9.0 is to put all tablespaces in LOGGING mode (except temporary tablespaces, which are always set to NOLOGGING). To check if a tablespace is in LOGGING mode, execute the following in sqlplus (this works in 9.2 as well but is unnecessary if in FORCE LOGGING mode): (9.0):
SELECT TABLESPACE_NAME, LOGGING FROM DBA_TABLESPACES;
TABLESPACE_NAME                LOGGING
------------------------------ ---------
SYSTEM                         LOGGING
UNDOTBS1                       LOGGING
TEMP                           NOLOGGING
DRSYS                          LOGGING
INDX                           LOGGING
etc.
This does not prevent objects from being overridden by NOLOGGING if specified at the object level, or when doing direct load inserts.
O/S clocks - It is easy to overlook synchronizing the clocks of all servers in a Data Guard environment. Not doing so can make it very confusing later on when comparing log files (particularly alert logs) containing time stamps, because they will probably not dovetail. You may want to ask the System Administrator to implement enterprise-wide software to keep all server clocks synchronized all the time.
Hosts files - A working Data Guard environment requires that the /etc/hosts files contain all hostnames used in the network configuration files so that they can be resolved to IP addresses. Even if the /etc/hosts files on both servers in a bidirectional standby configuration already contain both server names, as is often the case, you may need to add new entries for any new NICs dedicated to redo traffic (see Identical Oracle and O/S Network Configurations above). These dedicated networks must be distinguished by different HOST names in the Oracle network configuration files, and resolved in the /etc/hosts files. Also, as recommended above under Use Identical Primary and Secondary Sites, you may need to install a new NIC to prevent a single NIC point of failure.
Environment variables and DBA scripts - Initialize environments on both primary and secondary sites (preferably with scripts) to host either database in either role. Make sure to include all required environment variables (ORACLE_SID, etc), aliases, functions, and the like. Place all primary DBA maintenance scripts on both sites. If you configured the SPFILE using the default name in the default directory as recommended above, then you can change all scripts that specify a PFILE after the database STARTUP command. The default SPFILE will now be used if nothing is specified after the STARTUP command within scripts. This makes these scripts more generic. But you must be careful to initialize the O/S environments using the correct values for ORACLE_SID, etc.
Directories - Create identical directories (including owner and permissions) on primary and secondary sites for datafiles, control files, redo logs, standby redo logs and archive logs, as follows:
- Create identical datafile directories on the standby as on the primary.
- Create identical control file directories on the standby as on the primary. Provided the online redo log is multiplexed, it is good practice to store a control file copy on every disk drive that stores members of online redo log groups. By storing control files in these locations, you minimize the risk that all control files and all groups of online redo logs will be lost in a single disk failure.
- Create identical redo log directories on the standby as on the primary.
- Create identical standby redo log directories on both primary and secondary sites.
- On both primary and secondary, create a local alternate archive log directory (LOG_ARCHIVE_DEST_3) on a separate device from LOG_ARCHIVE_DEST_1.
- Create STANDBY_ARCHIVE_DEST on the secondary in the same location as LOG_ARCHIVE_DEST_1 on the primary.
Password files. The Data Guard broker requires password file authentication, which was already the authentication method used here. However, if not using the broker, use operating system rather than password file authentication. Using password file authentication opens up the possibility after role reversal of mismatches in SYSDBA/SYSOPER password or privilege information (as reported by V$PWFILE_USERS) with that stored in the password file. This can occur when you add or remove SYSDBA or SYSOPER users, or change
passwords for these users on the primary, or if you ever need to rebuild the standby. SQL commands that change SYSOPER or SYSDBA user information on the primary database do propagate to the standby database, but not to the standby password file. If you cannot use OS authentication, but must use a password file, then create a password file on the primary (if not already done) and the standby as follows:
$ orapwd file=orapw<ORACLE_SID> password=passwd entries=max_users
Set REMOTE_LOGIN_PASSWORDFILE=EXCLUSIVE. Then connect SYS/password as SYSDBA, and grant the SYSDBA or SYSOPER system privilege to the same users on the standby as granted on the primary. If later you need to change passwords for SYS, SYSTEM, or any other SYSDBA or SYSOPER user, or if you add/remove any of these users, remember to do so on both primary and secondary sites.
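One way to catch the mismatches described above is to compare the contents of V$PWFILE_USERS from both sites after any change. The Python sketch below assumes you have already fetched the USERNAME, SYSDBA and SYSOPER columns from each database into dictionaries; the function name pwfile_mismatches and the user DBA1 are hypothetical.

```python
def pwfile_mismatches(primary_users, standby_users):
    """Compare two {username: (sysdba, sysoper)} maps taken from V$PWFILE_USERS.

    Returns a list of human-readable differences; empty means the sites agree.
    """
    problems = []
    for user in sorted(set(primary_users) | set(standby_users)):
        on_primary = primary_users.get(user)
        on_standby = standby_users.get(user)
        if on_primary is None:
            problems.append(f"{user}: only on standby")
        elif on_standby is None:
            problems.append(f"{user}: only on primary")
        elif on_primary != on_standby:
            problems.append(f"{user}: privileges differ ({on_primary} vs {on_standby})")
    return problems


# Hypothetical data: DBA1 was granted SYSDBA on the primary only.
primary = {"SYS": ("TRUE", "TRUE"), "DBA1": ("TRUE", "FALSE")}
standby = {"SYS": ("TRUE", "TRUE")}
print(pwfile_mismatches(primary, standby))
```

Note that this compares privilege listings only; it cannot detect that a password itself differs between the two password files.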
ORACLE_SID.ora
If not null, then SPFILE is being used. Alternatively, you can check as follows:
SQL> SELECT COUNT(*) FROM V$SPPARAMETER WHERE VALUE IS NOT NULL;
  COUNT(*)
----------
        31
If count >0 then you are using SPFILE. To switch to SPFILE, do the following as SYSDBA:
SQL> CREATE SPFILE FROM PFILE='<full_path_of_init.ora>';
(The PFILE path can be omitted, leaving just FROM PFILE, if the PFILE has the default name and is in the default directory.) This creates an SPFILE called $ORACLE_HOME/dbs/spfile$ORACLE_SID.ora. You now need to bounce the database in order for it to use SPFILE:
SQL> SHUTDOWN IMMEDIATE;    (last shutdown unless doing cold backup later)
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP;
ORACLE instance started.

Total System Global Area  143727516 bytes
Fixed Size                   453532 bytes
Variable Size             109051904 bytes
Database Buffers           33554432 bytes
Redo Buffers                 667648 bytes
Database mounted.
Database opened.
SQL> EXIT;
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
Net configuration, including LOG_ARCHIVE_DEST_2, SERVICE_NAMES, FAL_SERVER, and FAL_CLIENT. For the parameter values used in this configuration, see Appendix 1 containing the two SPFILEs at each site. Appendix 2 lists all network configuration files for node1 and node2, including listener.ora and tnsnames.ora files, as well as tnsnames.ora files for DBA and non-privileged clients. The hostnames node1b and node2b correspond to the dedicated network for standby redo log traffic between the nodes. The /etc/hosts file contains the following entries for the two nodes (generic IP addresses are used for client security reasons):
192.168.0.1   node1    node1.domain.com
192.168.0.2   node2    node2.domain.com
10.10.0.1     node1b   node1b.domain.com
10.10.0.2     node2b   node2b.domain.com
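Because every HOST value in the Oracle network files must resolve through /etc/hosts, it can be worth checking this before starting the listeners. The following Python sketch is one possible check, under the assumption that name resolution relies on the hosts file alone; the function names are hypothetical, and node3 is a made-up hostname used only to show a failure.

```python
def parse_hosts(text):
    """Map every hostname and alias in hosts-file text to its IP address."""
    names = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2:
            ip, *aliases = fields
            for alias in aliases:
                names[alias] = ip
    return names


def unresolved(hostnames, hosts_text):
    """Return the hostnames that have no entry in the hosts-file text."""
    known = parse_hosts(hosts_text)
    return [h for h in hostnames if h not in known]


hosts_text = """
192.168.0.1   node1    node1.domain.com
192.168.0.2   node2    node2.domain.com
10.10.0.1     node1b   node1b.domain.com
10.10.0.2     node2b   node2b.domain.com
"""
# node3 is hypothetical and intentionally missing from the hosts file.
print(unresolved(["node1", "node2b", "node3"], hosts_text))
```

The list passed in would normally be collected from the HOST settings in listener.ora and tnsnames.ora on each node.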
These network configuration files abide by best practices as explained above in Recommended Oracle Network Configuration. Creating a bidirectional Data Guard environment requires defining four rather than two listeners: two for PROD1 and two for PROD2. Similarly, bidirectional environments call for two (not one) aliases in the local tnsnames.ora file for each node. All listener definitions utilize static SID lists. Table 3 below gives the listener names, the servers on which they are defined, the ports used, and a description of what these listeners service.
Listener Name   Server(s) Where Located   Port   Purpose
LSNRPROD1PUB    node1, node2              1021   Non-privileged client access to PROD1 primary
LSNRPROD2PUB    node1, node2              1022   Non-privileged client access to PROD2 primary
LSNRPROD1REDO   node1, node2              1121   Redo log traffic for PROD1 standby
                                          1221   DBA access to PROD1 in either role on either node
LSNRPROD2REDO   node1, node2              1122   Redo log traffic for PROD2 standby
                                          1222   DBA access to PROD2 in either role on either node
Following is a description of the network configuration for PROD1. The same network structure applies to PROD2; therefore, for simplicity, only the PROD1 network configuration is covered. Listener for non-privileged database clients (LSNRPROD1PUB): The listener is dedicated to non-privileged (other than DBA) clients to connect to the PROD1 primary database on non-default port 1021.
Proper Data Guard operation relies upon starting this dedicated client listener only on the node where the PROD1 primary database is running, which initially will be node1. You need to make sure you stop the corresponding listener on the secondary node, node2. To allow for transparent client connections to the database, clients use the same net service name (tnsnames.ora alias) regardless of which is currently the primary database. This is possible because this alias contains addresses for both primary and standby nodes; users can only connect to the primary node because only the primary listener will be running.
Listener for redo log traffic and DBA access (LSNRPROD1REDO): This listener allows the PROD1 standby database (initially on node2) to fetch archive log gaps from the PROD1 primary database (initially on node1) through port 1121. All redo log traffic goes through a special dedicated network connecting the two nodes (designated by node1b and node2b). Thus, there is one more network interface and IP address on each of the two nodes (10.10.0.1 and 10.10.0.2, respectively). This listener is always running on both nodes to allow for DBA intervention and role reversal. DBA clients can connect at any time through port 1221 on either node in either role. Only the listener on the current primary node (initially node1) is utilized for redo log traffic, but the listener for the standby node (node2) remains running so that if roles reverse, you won't need to start this listener.
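The manual listener control described above follows a simple rule that can be scripted. The Python sketch below only builds the lsnrctl command lines for each node given which node currently holds the primary role; it does not execute them. The function name listener_commands is hypothetical, and the listener names follow Table 3.

```python
def listener_commands(primary_node, nodes=("node1", "node2"), db="PROD1"):
    """Return {node: [commands]} for one database's two listeners.

    The public (client) listener runs only on the primary node; the
    redo/DBA listener runs on every node so roles can reverse at any time.
    """
    pub = f"LSNR{db}PUB"
    redo = f"LSNR{db}REDO"
    plan = {}
    for node in nodes:
        pub_action = "start" if node == primary_node else "stop"
        plan[node] = [f"lsnrctl {pub_action} {pub}", f"lsnrctl start {redo}"]
    return plan


plan = listener_commands("node1")
for node, commands in plan.items():
    print(node, commands)
```

After a switchover of PROD1 to node2, calling listener_commands("node2") yields the reversed plan: stop LSNRPROD1PUB on node1 and start it on node2.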
To enable automatic archiving, set the initialization parameter LOG_ARCHIVE_START = TRUE on the primary. (Do the same on the secondary in case of role reversal). To place the primary database in ARCHIVELOG mode, issue the following in sqlplus as SYSDBA:
SQL> SHUTDOWN IMMEDIATE;    (can defer)
Database closed.
Database dismounted.
ORACLE instance shut down.
Set the initialization parameter LOG_ARCHIVE_START=TRUE on both primary and secondary. Then execute the following on the primary:
SQL> STARTUP MOUNT EXCLUSIVE;
ORACLE instance started.

Total System Global Area  143727516 bytes
Fixed Size                   453532 bytes
Variable Size             109051904 bytes
Database Buffers           33554432 bytes
Redo Buffers                 667648 bytes
Database mounted.
SQL> ALTER DATABASE ARCHIVELOG;
Database altered.
SQL> ALTER DATABASE OPEN;
Database altered.
Changing the archiving mode updates the control file, so it requires a cold backup (the next step) of the primary database. Any previous backups are no longer usable because they were taken in NOARCHIVELOG mode.
The filename for the created standby control file must be different from the filename of the current control file of the primary database.
All archived redo logs to STANDBY_ARCHIVE_DEST on the secondary (=LOG_ARCHIVE_DEST_1 on the primary). The required archive logs are those created after the last cold backup, or from the beginning of the last hot backup. All online redo logs (recommended for switchover and failover operations)
If you configured dynamic service registration for non-privileged database clients, you need to register the databases with these new listeners because you started them after starting the primary and after mounting the standby. To register these listeners and confirm they were registered, do the following on both primary and standby databases:
SQL> ALTER SYSTEM REGISTER;
System altered.
SQL> EXIT;
$ lsnrctl status LSNRPROD1PUB
$ lsnrctl status LSNRPROD2PUB
The output of each of these commands should contain the following (sample output portion is from the first command):
Service "PROD1" has 1 instance(s).
  Instance "PROD1", status UNKNOWN, has 1 handler(s) for this service.
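If you want to script this confirmation, the service lines in the lsnrctl status output can be picked out with a regular expression. The Python sketch below is a minimal illustration that works on the sample output shown above; the function name registered_services is hypothetical, and capturing the command output itself is left to the caller.

```python
import re

# Matches lines such as: Service "PROD1" has 1 instance(s).
SERVICE_RE = re.compile(r'Service "([^"]+)" has (\d+) instance\(s\)')


def registered_services(status_output):
    """Return {service_name: instance_count} found in lsnrctl status output."""
    return {name: int(count) for name, count in SERVICE_RE.findall(status_output)}


sample = '''Service "PROD1" has 1 instance(s).
  Instance "PROD1", status UNKNOWN, has 1 handler(s) for this service.'''
services = registered_services(sample)
print(services)
```

A listener for which the expected service name is missing from the result has not had the database registered with it.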
Now you need to recover the standby database by applying required archive logs to the standby. To do so, issue the following command in sqlplus as SYSDBA:
SQL> RECOVER AUTOMATIC STANDBY DATABASE;
This will recover until no more logs are required, at which point you will get the following message:
ORA-00308: cannot open archived log '/arch1/prod2/log_1_4868.arc'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory.
This should return you to sqlplus. At this point all archived logs from production should have been applied to the standby.
DISCONNECT FROM SESSION allows log apply services to run as a detached background server process and immediately returns control to the user. It does not disconnect the current SQL session. This command should eventually return:
Media recovery complete.
The last line in the standby alert log should show the following:
If necessary, set the primary database to the desired mode, in this case, MAXIMUM PERFORMANCE mode as follows:
(9.2):
SQL> SHUTDOWN IMMEDIATE;    (can defer)
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP MOUNT EXCLUSIVE;
ORACLE instance started.

Total System Global Area  143727516 bytes
Fixed Size                   453532 bytes
Variable Size             109051904 bytes
Database Buffers           33554432 bytes
Redo Buffers                 667648 bytes
Database mounted.
SQL> ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE;
Database altered.
Oracle9.0 has only two database modes, PROTECTED and UNPROTECTED. UNPROTECTED mode in Oracle9.0 is the closest equivalent to the Oracle9.2 database mode, MAXIMUM PERFORMANCE. So the statement in 9.0 would be as follows: (9.0):
ALTER DATABASE SET STANDBY DATABASE UNPROTECTED;
Database altered.
If STATUS is VALID and ERROR is null as shown above, then automatic archiving from the primary is working. If not, possible causes are:
- The standby may not be in managed recovery mode.
- The primary may have switched a log file during the previous two steps, and skipped sending this log file to the standby.
- Windows platforms only: a bug with the ARCH process requires you to bounce the archive process as a workaround:
SQL> ALTER SYSTEM ARCHIVE LOG STOP;
System altered.
SQL> ALTER SYSTEM ARCHIVE LOG START;
System altered.
Confirm from the standby that the archive log from the previous step was copied to STANDBY_ARCHIVE_DEST. In addition, the alert log should show that the log was processed and that Media Recovery is waiting for the next log:
Media Recovery Waiting for thread 1 seq# 4868
Fri Aug 30 08:13:34 2002
Media Recovery Log /arch/prod1/log_1_4868.arc
Fri Aug 30 08:13:34 2002
Media Recovery Waiting for thread 1 seq# 4869
Then begin automatic standby database recovery again at Recover Standby Database above.
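The check of the standby alert log described above can also be scripted. The Python sketch below scans alert log lines for the "Media Recovery Waiting" message and reports the thread and sequence number that recovery is currently waiting for. The function name last_waiting_sequence is hypothetical, and the sample lines are the alert log excerpt shown above.

```python
import re

WAITING_RE = re.compile(r"Media Recovery Waiting for thread (\d+) seq# (\d+)")


def last_waiting_sequence(alert_lines):
    """Return (thread, sequence) from the last 'Waiting' message, or None."""
    result = None
    for line in alert_lines:
        match = WAITING_RE.search(line)
        if match:
            result = (int(match.group(1)), int(match.group(2)))
    return result


sample = [
    "Media Recovery Waiting for thread 1 seq# 4868",
    "Fri Aug 30 08:13:34 2002",
    "Media Recovery Log /arch/prod1/log_1_4868.arc",
    "Fri Aug 30 08:13:34 2002",
    "Media Recovery Waiting for thread 1 seq# 4869",
]
print(last_waiting_sequence(sample))
```

If the sequence reported here is one higher than the log just archived on the primary, the standby has applied that log and is waiting for the next one.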
APPENDIX 1: SPFILES
Below are exports of the four SPFILEs used in the standard bidirectional Data Guard environment presented in this paper. These files were created using the SQL statement CREATE PFILE. The first text initialization parameter file, for example, was created on node1 for the PROD1 database using the following statement:
CREATE PFILE='initPROD1.ora_fr_SPFILE_node1' FROM SPFILE;
SPFILE FOR PROD1 ON NODE1
*._OPTIM_ENHANCE_NNULL_DETECTION=FALSE
*.archive_lag_target=0
*.background_dump_dest='/usr/oracle/oradba/prod1/bdump'
*.compatible='9.2.0.1.0'
*.control_file_record_keep_time=14
*.control_files='/db01/oracle/oradata/prod1/control01.ctl','/db02/oracle/oradata/prod1/control02.ctl','/db03/oracle/oradata/prod1/control03.ctl'
*.cursor_space_for_time=TRUE
*.db_block_size=8192
*.db_cache_size=840M
*.db_create_online_log_dest_1=''
*.db_create_online_log_dest_2=''
*.db_create_online_log_dest_3=''
*.db_create_online_log_dest_4=''
*.db_create_online_log_dest_5=''
*.db_domain='domain.com'
*.db_file_multiblock_read_count=32
*.db_files=471
*.db_name='PROD1'
*.fal_client='PROD1_node2'
*.fal_server='PROD1_node1'
*.global_names=false
*.instance_name='PROD1'
*.log_archive_dest_1='location=/arch1/prod1 mandatory arch async noreopen max_failure=0 alternate=log_archive_dest_3'
*.log_archive_dest_2='service=PROD1_node2 optional lgwr async=20480 noaffirm reopen=15 max_failure=10 delay=30 net_timeout=30'
*.log_archive_dest_3='location=/arch2/prod1'
*.log_archive_dest_state_1='enable'
*.log_archive_dest_state_2='enable' #stby - set to defer when in stby role
*.log_archive_dest_state_3='alternate'
*.log_archive_dest=''
*.log_archive_duplex_dest=''
*.log_archive_format='log_%t_%s.arc'
*.log_archive_start=true
*.log_buffer=1048576
*.optimizer_mode='choose'
*.parallel_threads_per_cpu=4
*.pga_aggregate_target=840M
*.pre_page_sga=FALSE
*.processes=200
*.query_rewrite_enabled=true
*.query_rewrite_integrity='trusted'
*.remote_archive_enable=true
*.remote_login_passwordfile='EXCLUSIVE'
*.service_names='PROD1.domain.com'
*.shared_pool_size=200M
*.standby_archive_dest='/arch1/prod1'
*.standby_file_management='auto'
*.timed_statistics=TRUE
*.undo_management='auto'
*.user_dump_dest='/usr/oracle/oradba/prod1/udump'
*.utl_file_dir='/home/oracle/htmllogs'
*.workarea_size_policy='AUTO'
Paper #36226
SPFILE FOR PROD1 ON NODE2
*._OPTIM_ENHANCE_NNULL_DETECTION=FALSE
*.archive_lag_target=0
*.background_dump_dest='/usr/oracle/oradba/prod1/bdump'
*.compatible='9.2.0.1.0'
*.control_file_record_keep_time=14
*.control_files='/db01/oracle/oradata/prod1/control01.ctl','/db02/oracle/oradata/prod1/control02.ctl','/db03/oracle/oradata/prod1/control03.ctl'
*.cursor_space_for_time=TRUE
*.db_block_size=8192
*.db_cache_size=840M
*.db_create_online_log_dest_1=''
*.db_create_online_log_dest_2=''
*.db_create_online_log_dest_3=''
*.db_create_online_log_dest_4=''
*.db_create_online_log_dest_5=''
*.db_domain='domain.com'
*.db_file_multiblock_read_count=32
*.db_files=471
*.db_name='PROD1'
*.fal_client='PROD1_node1'
*.fal_server='PROD1_node2'
*.global_names=false
*.instance_name='PROD1'
*.log_archive_dest_1='location=/arch1/prod1 mandatory arch async noreopen max_failure=0 alternate=log_archive_dest_3'
*.log_archive_dest_2='service=PROD1_node1 optional lgwr async=20480 noaffirm reopen=15 max_failure=10 delay=30 net_timeout=30'
*.log_archive_dest_3='location=/arch2/prod1'
*.log_archive_dest_state_1='enable'
*.log_archive_dest_state_2='defer'#stby - set to enable when in pri role
*.log_archive_dest_state_3='alternate'
*.log_archive_dest=''
*.log_archive_duplex_dest=''
*.log_archive_format='log_%t_%s.arc'
*.log_archive_start=true
*.log_buffer=1048576
*.optimizer_mode='choose'
*.parallel_threads_per_cpu=4
*.pga_aggregate_target=840M
*.pre_page_sga=FALSE
*.processes=200
*.query_rewrite_enabled=true
*.query_rewrite_integrity='trusted'
*.remote_archive_enable=true
*.remote_login_passwordfile='EXCLUSIVE'
*.service_names='PROD1.domain.com'
*.shared_pool_size=200M
*.standby_archive_dest='/arch1/prod1'
*.standby_file_management='auto'
*.timed_statistics=TRUE
*.undo_management='auto'
*.user_dump_dest='/usr/oracle/oradba/prod1/udump'
*.utl_file_dir='/home/oracle/htmllogs'
*.workarea_size_policy='AUTO'
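The node1 and node2 exports for a given database differ only in a handful of role-dependent parameters (FAL_CLIENT, FAL_SERVER, the LOG_ARCHIVE_DEST_2 service, and its destination state). One way to make those differences visible is to compare the two text files directly. The following is a minimal sketch, not part of the original environment: the function name pfile_diff is hypothetical, and it assumes each file holds one parameter per line and that no value contains a "#" character.

```shell
#!/bin/sh
# pfile_diff FILE1 FILE2
# Print the parameters that differ between two text parameter files.
# Lines found only in FILE1 are prefixed "< ", lines only in FILE2 "> ".
# Assumption: one "*.name=value" setting per line, and no value contains
# a "#" (everything after "#" is treated as a trailing comment).
pfile_diff() {
    left=$(mktemp) || return 1
    right=$(mktemp) || { rm -f "$left"; return 1; }

    # Normalize: drop trailing comments, trailing blanks and empty lines,
    # then sort so the two files can be compared line by line.
    sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1" | sort > "$left"
    sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$2" | sort > "$right"

    # comm -3 suppresses lines common to both files; lines unique to the
    # second file are printed with a leading tab.
    comm -3 "$left" "$right" |
        awk '/^\t/ { sub(/^\t/, ""); print "> " $0; next } { print "< " $0 }'

    rm -f "$left" "$right"
}
```

It could be called as pfile_diff initPROD1.ora_fr_SPFILE_node1 initPROD1.ora_fr_SPFILE_node2, where the second file name is assumed to follow the same naming pattern as the first.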
SPFILE FOR PROD2 ON NODE1
*._OPTIM_ENHANCE_NNULL_DETECTION=FALSE
*.archive_lag_target=0
*.background_dump_dest='/usr/oracle/oradba/prod2/bdump'
*.compatible='9.2.0.1.0'
*.control_file_record_keep_time=14
*.control_files='/db01/oracle/oradata/prod2/control01.ctl','/db02/oracle/oradata/prod2/control02.ctl','/db03/oracle/oradata/prod2/control03.ctl'
*.cursor_space_for_time=TRUE
*.db_block_size=8192
*.db_cache_size=200M
*.db_create_online_log_dest_1=''
*.db_create_online_log_dest_2=''
*.db_create_online_log_dest_3=''
*.db_create_online_log_dest_4=''
*.db_create_online_log_dest_5=''
*.db_domain='domain.com'
*.db_file_multiblock_read_count=32
*.db_files=471
*.db_name='PROD2'
*.fal_client='PROD2_node2'
*.fal_server='PROD2_node1'
*.global_names=false
*.instance_name='PROD2'
*.log_archive_dest_1='location=/arch1/prod2 mandatory arch async noreopen max_failure=0 alternate=log_archive_dest_3'
*.log_archive_dest_2='service=PROD2_node2 optional lgwr async=20480 noaffirm reopen=15 max_failure=10 delay=30 net_timeout=30'
*.log_archive_dest_3='location=/arch2/prod2'
*.log_archive_dest_state_1='enable'
*.log_archive_dest_state_2='defer'#stby - set to enable when in pri role
*.log_archive_dest_state_3='alternate'
*.log_archive_dest=''
*.log_archive_duplex_dest=''
*.log_archive_format='log_%t_%s.arc'
*.log_archive_start=true
*.log_buffer=1048576
*.optimizer_mode='choose'
*.parallel_threads_per_cpu=2
*.pga_aggregate_target=100M
*.pre_page_sga=TRUE
*.processes=200
*.query_rewrite_enabled=true
*.query_rewrite_integrity='trusted'
*.remote_archive_enable=true
*.remote_login_passwordfile='EXCLUSIVE'
*.service_names='PROD2.domain.com'
*.shared_pool_size=200M
*.standby_archive_dest='/arch1/prod2'
*.standby_file_management='auto'
*.timed_statistics=TRUE
*.undo_management='auto'
*.user_dump_dest='/usr/oracle/oradba/prod2/udump'
*.utl_file_dir='/home/oracle/htmllogs'
*.workarea_size_policy='AUTO'
SPFILE FOR PROD2 ON NODE2
*._OPTIM_ENHANCE_NNULL_DETECTION=FALSE
*.archive_lag_target=0
*.background_dump_dest='/usr/oracle/oradba/prod2/bdump'
*.compatible='9.2.0.1.0'
*.control_file_record_keep_time=14
*.control_files='/db01/oracle/oradata/prod2/control01.ctl','/db02/oracle/oradata/prod2/control02.ctl','/db03/oracle/oradata/prod2/control03.ctl'
*.cursor_space_for_time=TRUE
*.db_block_size=8192
*.db_cache_size=200M
*.db_create_online_log_dest_1=''
*.db_create_online_log_dest_2=''
*.db_create_online_log_dest_3=''
*.db_create_online_log_dest_4=''
*.db_create_online_log_dest_5=''
*.db_domain='domain.com'
*.db_file_multiblock_read_count=32
*.db_files=471
*.db_name='PROD2'
*.fal_client='PROD2_node1'
*.fal_server='PROD2_node2'
*.global_names=false
*.instance_name='PROD2'
*.log_archive_dest_1='location=/arch1/prod2 mandatory arch async noreopen max_failure=0 alternate=log_archive_dest_3'
*.log_archive_dest_2='service=PROD2_node1 optional lgwr async=20480 noaffirm reopen=15 max_failure=10 delay=30 net_timeout=30'
*.log_archive_dest_3='location=/arch2/prod2'
*.log_archive_dest_state_1='enable'
*.log_archive_dest_state_2='enable'#stby - set to defer when in stby role
*.log_archive_dest_state_3='alternate'
*.log_archive_dest=''
*.log_archive_duplex_dest=''
*.log_archive_format='log_%t_%s.arc'
*.log_archive_start=true
*.log_buffer=1048576
*.optimizer_mode='choose'
*.parallel_threads_per_cpu=2
*.pga_aggregate_target=100M
*.pre_page_sga=TRUE
*.processes=200
*.query_rewrite_enabled=true
*.query_rewrite_integrity='trusted'
*.remote_archive_enable=true
*.remote_login_passwordfile='EXCLUSIVE'
*.service_names='PROD2.domain.com'
*.shared_pool_size=200M
*.standby_archive_dest='/arch1/prod2'
*.standby_file_management='auto'
*.timed_statistics=TRUE
*.undo_management='auto'
*.user_dump_dest='/usr/oracle/oradba/prod2/udump'
*.utl_file_dir='/home/oracle/htmllogs'
*.workarea_size_policy='AUTO'
LSNRPROD1PUB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1021)(SDU=32768))
    )
  )

# The SID_LIST below is required because we are using a static SID to
# allow Oracle Enterprise Manager to discover and manage the listener.
SID_LIST_LSNRPROD1PUB =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /usr/oracle/product/9.0.1)
      (SID_NAME = PROD1)
    )
  )

# Below is the PROD2 listener for non-privileged database clients. It
# listens on the network address for node1 on port 1022. It should be
# up when PROD2 is primary on node1 and down when PROD2 is primary
# on node2.
LSNRPROD2PUB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1022)(SDU=32768))
    )
  )

# The SID_LIST below is required because we are using a static SID to
# allow Oracle Enterprise Manager to discover and manage the listener.
SID_LIST_LSNRPROD2PUB =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /usr/oracle/product/9.0.1)
      (SID_NAME = PROD2)
    )
  )

# The following listener for DBA clients and redo log traffic should
# always be up. DBA clients connect through port 1222. The PROD2 standby
# on node2 fetches archive log gaps from the PROD2 primary on node1 through
# port 1122 and goes on the special network, denoted by the node1b hostname.
LSNRPROD2REDO =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1b)(PORT = 1122)(SDU=32768))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1222)(SDU=32768))
    )
  )

# The GLOBAL_DBNAME parameter below is used to allow Oracle Enterprise Manager
# to discover PROD2 as both standby on node1 and as primary on node2 where a
# different GLOBAL_DBNAME is used.
SID_LIST_LSNRPROD2REDO =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = PROD2_node1.domain.com)
      (ORACLE_HOME = /usr/oracle/product/9.0.1)
      (SID_NAME = PROD2)
    )
  )

# The following listener for DBA clients and redo log traffic should
# always be up. DBA clients connect through port 1221. The PROD1 standby
# on node2 fetches archive log gaps from the PROD1 primary on node1 through
# port 1121 and goes on the special network, denoted by the node1b hostname.
LSNRPROD1REDO =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1b)(PORT = 1121)(SDU=32768))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1221)(SDU=32768))
    )
  )

SID_LIST_LSNRPROD1REDO =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /usr/oracle/product/9.0.1)
      (SID_NAME = PROD1)
    )
  )
LSNRPROD1PUB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1021)(SDU=32768))
    )
  )

# The SID_LIST below is required because we are using a static SID to
# allow Oracle Enterprise Manager to discover and manage the listener.
SID_LIST_LSNRPROD1PUB =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /usr/oracle/product/9.0.1)
      (SID_NAME = PROD1)
    )
  )

# Below is the PROD2 listener for non-privileged database clients. It
# listens on the network address for node2 on port 1022. It should be
# up when PROD2 is primary on node2 and down when it is primary
# on node1.
LSNRPROD2PUB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1022)(SDU=32768))
    )
  )

# The SID_LIST below is required because we are using a static SID to
# allow Oracle Enterprise Manager to discover and manage the listener.
SID_LIST_LSNRPROD2PUB =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /usr/oracle/product/9.0.1)
      (SID_NAME = PROD2)
    )
  )

# The following listener for DBA clients and redo log traffic should
# always be up. DBA clients connect through port 1221. The PROD1 standby
# on node1 fetches archive log gaps from the PROD1 primary on node2 through
# port 1121 and goes on the special network, denoted by the node2b hostname.
LSNRPROD1REDO =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2b)(PORT = 1121)(SDU=32768))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1221)(SDU=32768))
    )
  )

SID_LIST_LSNRPROD1REDO =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /usr/oracle/product/9.0.1)
      (SID_NAME = PROD1)
    )
  )

# The following listener for DBA clients and redo log traffic should
# always be up. DBA clients connect through port 1222. The PROD2 standby
# on node1 fetches archive log gaps from the PROD2 primary on node2 through
# port 1122 and goes on the special network, denoted by the node2b hostname.
LSNRPROD2REDO =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2b)(PORT = 1122)(SDU=32768))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1222)(SDU=32768))
    )
  )

SID_LIST_LSNRPROD2REDO =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /usr/oracle/product/9.0.1)
      (SID_NAME = PROD2)
    )
  )
PROD2_node2 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2b)(PORT = 1122))
    )
    (CONNECT_DATA =
      (SID = PROD2)
    )
  )

# The following entry is for the PROD1 standby on node2 to fetch archive
# log gaps from the PROD1 primary on node1. PROD1_node2 is the PROD1
# setting for FAL_CLIENT on node2. This redo traffic goes through the
# special network connecting the two nodes.
PROD1_node2 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2b)(PORT = 1121))
    )
    (CONNECT_DATA =
      (SID = PROD1)
    )
  )
PROD2_node1 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1b)(PORT = 1122))
    )
    (CONNECT_DATA =
      (SID = PROD2)
    )
  )

# Following is the entry for the PROD1 standby on node2 to fetch archive
# log gaps from the PROD1 primary on node1. PROD1_node1 is the PROD1
# setting for FAL_CLIENT on node2. This redo traffic goes through the
# special network connecting the two nodes.
PROD1_node1 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1b)(PORT = 1121))
    )
    (CONNECT_DATA =
      (SID = PROD1)
    )
  )
TNSNAMES.ORA FOR NON-PRIVILEGED CLIENTS
# Below are the PROD1 and PROD2 aliases for non-privileged clients.
# These aliases are used to connect only to the primary databases
# PROD1 and PROD2. Proper operation relies upon starting the listener
# only on the node in the primary database role.
PROD1 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1021))
      (ADDRESS=(PROTOCOL=tcp)(HOST=node2)(PORT=1021))
    )
    (CONNECT_DATA =
      (SID = PROD1)
    )
  )

PROD2 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1022))
      (ADDRESS=(PROTOCOL=tcp)(HOST=node2)(PORT=1022))
    )
    (CONNECT_DATA =
      (SID = PROD2)
    )
  )
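Because the non-privileged aliases list both nodes, clients reach the primary only if the public listener is running on the primary node and stopped on the standby node. The following is a minimal sketch of that rule as a dry-run helper; the function names are hypothetical, and it assumes the role string is obtained separately, for example from the DATABASE_ROLE column of V$DATABASE. It only prints the lsnrctl command it would run.

```shell
#!/bin/sh
# listener_action ROLE
# Map a database role to the action for the public listener:
# "start" on the primary, "stop" on a physical standby.
listener_action() {
    case "$1" in
        PRIMARY)            echo start ;;
        "PHYSICAL STANDBY") echo stop ;;
        *)                  echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# sync_public_listener LISTENER ROLE
# Dry run: print the lsnrctl command that matches the role.
# Assumption: the caller decides whether to actually execute it.
sync_public_listener() {
    action=$(listener_action "$2") || return 1
    echo "lsnrctl $action $1"
}
```

For example, sync_public_listener LSNRPROD1PUB PRIMARY prints the start command for the PROD1 public listener; running the printed command after a role change keeps client connections pointed at the current primary.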
TNSNAMES.ORA FOR DBA CLIENTS
# Below are the PROD1 and PROD2 aliases for DBA clients.
# These aliases can be used to connect to PROD1 and PROD2 in
# either role on either node.
PROD1NODE1 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1221))
    )
    (CONNECT_DATA =
      (SID = PROD1)
    )
  )

PROD2NODE1 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1222))
    )
    (CONNECT_DATA =
      (SID = PROD2)
    )
  )

PROD1NODE2 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS=(PROTOCOL=tcp)(HOST=node2)(PORT=1221))
    )
    (CONNECT_DATA =
      (SID = PROD1)
    )
  )

PROD2NODE2 =
  (DESCRIPTION =
    (SDU=32768)
    (ADDRESS_LIST =
      (ADDRESS=(PROTOCOL=tcp)(HOST=node2)(PORT=1222))
    )
    (CONNECT_DATA =
      (SID = PROD2)
    )
  )
FOR STANDBY DATABASES
The following set of scripts has been implemented and tested at several customer sites. They have proven highly effective in managing Oracle8i standby database and Oracle9i Data Guard environments.[11]
CRONTAB
Below are the crontab entries for the two monitoring scripts also listed in this Appendix.
# DBA's crontab ....
#
#min  hour  day   month  dow
#0-59 0-23  1-31  1-12   0-6   command
#---------------------------------------------
#
#
#
0 9 * * * /export/home/dmsdba/tools/STBY_gap_check.ksh > /export/home/dmsdba/tmp/cron_STBY_gap_check_$$.log 2>&1
#
#
#
0 9 * * * /export/home/dmsdba/tools/STBY_integrity_check.ksh > /export/home/dmsdba/tmp/cron_STBY_check_$$.log 2>&1
#
STBY_GAP_CHECK.KSH
#!/bin/ksh
#
# File: STBY_gap_check.ksh
# ***************************************************************************
#
#
typeset -x PATH=/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/ccs/bin:$PATH
#
typeset -x PATH=/export/home/dmsdba/tools:$PATH
#
typeset -x ORACLE_PATH=$HOME/active:$HOME/tools/sql:$HOME/tools/sql/misclib:/ora/admin/scripts \
           ORACLE_BASE=/ora
#
. /usr/local/bin/setup_ora_STBY
#
#
tempfile=/export/home/dmsdba/tmp/STBY_gap_check_$$.dat
#
#
# Check STBY for Recovery Mode
#
echo " " > $tempfile
echo "Checking STBY for Recovery Mode ------------------" >> $tempfile
echo " " >> $tempfile
#
#
sqlplus -s /nolog << EOS >> $tempfile
connect system/passwd as sysdba
set pause off
set timing off
set feedback off
@recovery_mode_check.sql
EOS
#
#
#
# Check STBY for Gaps in Archive log application
#
echo " " >> $tempfile
echo "Checking STBY for Gaps in Archive log applications ------------------" >> $tempfile
echo " " >> $tempfile
#
#
sqlplus -s /nolog << EOS >> $tempfile
connect system/passwd as sysdba
set pause off
set timing off
@archive_log_gap_check.sql
EOS
#
#
mailx -s "STBY Archive Log Gap Check" dba.oncall@client.com < $tempfile
#
#
exit 0

[11] I am indebted to my colleague and mentor, Matthew Burke, Director of ThinkSparks Data Management Services (DMS) Group, for writing, implementing and testing all scripts in Appendix 3.
SETUP_ORA_STBY
# File: setup_ora_STBY
#
#
# This script must be "dot" executed in order to
# apply correctly. Consequently, this file should
# NOT be given execute privilege - hopefully we'll
# avoid misleading any users.
#
# NOTE: In order to avoid version mismatches, these scripts
# always reset certain environment variables
#
#
# Remove any possible setting for TWO_TASK
#
unset TWO_TASK
#
#
# Setup common environment
#
ORACLE_BASE=/ora
ORAENV_ASK=NO
TMPDIR=/tmp
TNS_ADMIN=/var/opt/oracle
#
export ORACLE_BASE ORAENV_ASK TMPDIR TNS_ADMIN
# Setup to Oracle STBY Database
#
NLS_LANG=AMERICAN_AMERICA.UTF8
NLS_DATE_FORMAT=DD-MON-RR
NLS_DATE_LANGUAGE=AMERICAN
NLS_NUMERIC_CHARACTERS=".,"
ORACLE_SID=STBY
EXPORTS=/ora/EXPORTS
#
export NLS_LANG NLS_DATE_FORMAT NLS_DATE_LANGUAGE NLS_NUMERIC_CHARACTERS ORACLE_SID EXPORTS
#
#
#
#
# Setup to Java Runtime Environment
#
JRE_DIR=/ora/jre
#
PATH=$JRE_DIR/jre1.1.6/bin:$PATH
CLASSPATH=$CLASSPATH:$JRE_DIR/jre1.1.6/lib/rt.jar
#
#
export PATH CLASSPATH JRE_DIR
#
#
# Custom settings for Client
#
HOSTNAME=be2
HOSTTYPE=sparc
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/ucblib:/lib:/ora/product/817/lib
#
RECOVERY_MODE_CHECK.SQL
rem
rem
rem File: recovery_mode_check.sql
rem
rem
rem
prompt The presence of lock type MR below indicates that the database is
prompt in media recovery mode - either manual or managed - not simply started
prompt and mounted.
rem
#column "Lock Type" format a12
rem
rem
select type "Lock Type", count(*) "Number of Locks"
from v$lock
group by type
/
rem
rem
rem
.eof
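The report produced by recovery_mode_check.sql is meant to be read by a person, but the same MR test can be applied mechanically. The following is a minimal sketch, not part of the original scripts: the function names mr_lock_count and in_media_recovery are hypothetical, and it assumes the report is fed on standard input with the lock type in the first column and the count in the second.

```shell
#!/bin/sh
# mr_lock_count
# Read recovery_mode_check.sql output on stdin and print the count
# reported for lock type MR, or 0 when no MR row is present.
# Assumption: column 1 is the lock type and column 2 the count
# (thousands separators, if any, are removed).
mr_lock_count() {
    awk '$1 == "MR" {
             c = $2
             gsub(/,/, "", c)
             if (c ~ /^[0-9]+$/) n = c
         }
         END { print n + 0 }'
}

# in_media_recovery
# Succeed (exit status 0) when the report on stdin shows MR locks.
in_media_recovery() {
    [ "$(mr_lock_count)" -gt 0 ]
}
```

A monitoring script could pipe the SQL*Plus output through in_media_recovery and raise an alert only when the test fails, rather than mailing the full report every day.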
ARCHIVE_LOG_GAP_CHECK.SQL
rem
rem
rem File: archive_log_gap_check.sql
rem
rem
rem
select high.thread#, "LogGap#", "HighGap#"
from
  ( select thread#, MIN(sequence#)-1 "HighGap#"
    from
      ( select a.thread#, a.sequence#
        from
          ( select * from v$archived_log ) a,
          ( select thread#, MAX(next_change#) gap1
            from v$log_history
            group by thread#
          ) b
        where a.thread# = b.thread#
        and a.next_change# > gap1
      )
    group by thread#
  ) high,
  ( select thread#, MIN(sequence#) "LogGap#"
    from
      ( select thread#, sequence#
        from v$log_history, v$datafile
        where checkpoint_change# <= next_change#
        and checkpoint_change# >= first_change#
      )
    group by thread#
  ) low
where low.thread# = high.thread#
/
.eof
STBY_INTEGRITY_CHECK.KSH
#!/bin/ksh
#
# File: STBY_integrity_check.ksh
# ***************************************************************************
#
#
typeset -x PATH=/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/ccs/bin:$PATH
#
typeset -x PATH=/export/home/dmsdba/tools:$PATH
#
typeset -x ORACLE_PATH=$HOME/active:$HOME/tools/sql:$HOME/tools/sql/misclib:/ora/admin/scripts \
           ORACLE_BASE=/ora
#
. /usr/local/bin/setup_ora_STBY
#
#
tempfile=/export/home/dmsdba/tmp/STBY_check_$$.dat
#
#
# Check STBY for Recovery Mode
#
echo " " > $tempfile
echo "Checking STBY Recovery Mode------------------" >> $tempfile
echo " " >> $tempfile
#
#
sqlplus -s /nolog << EOS >> $tempfile
connect system/passwd as sysdba
set pause off
set timing off
set feedback off
@recovery_mode_check.sql
EOS
#
#
#
# Check PRI for Unrecoverable Datafiles
#
# Check STBY for Unrecoverable Datafiles
#
echo " " >> $tempfile
echo "Checking STBY Datafiles------------------" >> $tempfile
echo " " >> $tempfile
#
#
sqlplus -s /nolog << EOS >> $tempfile
connect system/passwd as sysdba
set pause off
set timing off
set feedback off
@unrecoverable_change
EOS
#
#
#
# Check PRI for Unrecoverable Datafiles
#
#
echo " " >> $tempfile
echo "Checking PRI Datafiles------------------" >> $tempfile
echo " " >> $tempfile
#
#
sqlplus -s /nolog << EOS >> $tempfile
connect system/passwd@PRI
set pause off
set timing off
set feedback off
@unrecoverable_change
EOS
#
#
# Check STBY for any corrupted blocks
#
alrtscan $ORACLE_BASE/admin/STBY/bdump/alert_STBY.log | tail -300 >> $tempfile
#
#
mailx -s "STBY Integrity Check" dba.oncall@client.com < $tempfile
#
#
exit 0
UNRECOVERABLE_CHANGE.SQL
rem
rem
rem File: unrecoverable_change.sql
rem
rem Check for unrecoverable changes in the Standby database
rem by unrecoverable operations in the primary database.
rem
rem
select unrecoverable_change# as "Unrecoverable SCN",
       to_char(unrecoverable_time, 'mm-dd-yyyy hh24:mi:ss') as "Unrecoverable Timestamp",
       file# as "File Number",
       name as "File Name"
from v$datafile
where unrecoverable_time is not null
order by file#
/
rem
rem
rem
.eof
Checking STBY for Recovery Mode ------------------
Connected.
The presence of lock type MR below indicates that the database is
in media recovery mode - either manual or managed - not simply started
and mounted.

Lock Type       Number of Locks
------------ ------------------
FS                            1
IS                            1
MR                           27
RT                            1
WL                            1

Checking STBY for Gaps in Archive log applications ------------------
Connected.

           THREAD#            LogGap#           HighGap#
------------------ ------------------ ------------------
                 1             28,015             28,016

1 row selected.

Checking PRI Datafiles------------------
Connected.

 Unrecoverable SCN Unrecoverable Times        File Number File Name
------------------ ------------------- ------------------ ----------------------------------------------------
       917,984,600 02-19-2003 11:04:35                  7 /CL1/oradata/PRI/data/l1_ez_i01.dbf
       930,774,627 03-05-2003 10:04:21                 12 /CL2/oradata/PRI/data/l2_ez_intf_d_01.dbf
       930,774,668 03-05-2003 10:04:28                 13 /CL1/oradata/PRI/data/l1_ez_intf_i_01.dbf
       917,989,671 02-19-2003 11:12:24                 21 /CL1/oradata/PRI/data/l1_ez_i_02.dbf
       917,989,801 02-19-2003 11:12:39                 22 /CL1/oradata/PRI/data/l1_ez_i_03.dbf
       917,990,967 02-19-2003 11:15:20                 23 /CL1/oradata/PRI/data/l1_ez_i_04.dbf
Media Recovery Log /ora/STBYarch/PRI_28010_1.arc
Thu Aug 14 16:17:00 2003
Media Recovery Waiting for thread 1 seq# 28011
Thu Aug 14 18:41:17 2003
Media Recovery Log /ora/STBYarch/PRI_28011_1.arc
Thu Aug 14 18:42:28 2003
Media Recovery Waiting for thread 1 seq# 28012
Thu Aug 14 21:01:15 2003
Media Recovery Log /ora/STBYarch/PRI_28012_1.arc
Thu Aug 14 21:02:06 2003
Media Recovery Waiting for thread 1 seq# 28013
Thu Aug 14 21:03:51 2003
Media Recovery Log /ora/STBYarch/PRI_28013_1.arc
Media Recovery Waiting for thread 1 seq# 28014
Thu Aug 14 21:05:48 2003
Media Recovery Log /ora/STBYarch/PRI_28014_1.arc
Thu Aug 14 21:06:18 2003
Media Recovery Waiting for thread 1 seq# 28015
Thu Aug 14 21:07:48 2003
Media Recovery Log /ora/STBYarch/PRI_28015_1.arc
Thu Aug 14 21:08:13 2003
Media Recovery Waiting for thread 1 seq# 28016
Thu Aug 14 21:14:46 2003
Shutting down instance (immediate)
License high water mark = 5
Thu Aug 14 21:14:58 2003
Media Recovery failed with error 1089
ORA-283 signalled during: ALTER DATABASE RECOVER managed standby database
Thu Aug 14 21:15:20 2003
ALTER DATABASE CLOSE NORMAL
Thu Aug 14 21:15:20 2003
ORA-1109 signalled during: ALTER DATABASE CLOSE NORMAL...
Thu Aug 14 21:15:20 2003
ALTER DATABASE DISMOUNT
Completed: ALTER DATABASE DISMOUNT
archiving is disabled
Thu Aug 14 21:15:22 2003
ARCH shutting down
ARC0: Archival stopped
Thu Aug 14 21:39:36 2003
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
LICENSE_MAX_USERS = 0
Starting up ORACLE RDBMS Version: 9.2.0.1.0
. . .
System parameters with non-default values:
processes = 100
ARCH: STARTING ARCH PROCESSES
ARC0 started with pid=10
Thu Aug 14 21:39:40 2003
ARCH: STARTING ARCH PROCESSES COMPLETE
Thu Aug 14 21:39:40 2003
ARC0: Archival started
Thu Aug 14 21:39:41 2003
alter database mount standby database
Thu Aug 14 21:39:45 2003
Successful mount of redo thread 1, with mount id 2715592097.
Thu Aug 14 21:39:45 2003
Standby Database mounted.
Completed: alter database mount standby database
Thu Aug 14 21:39:45 2003
ALTER DATABASE RECOVER managed standby database
Media Recovery Start: Managed Standby Recovery
Media Recovery Log
Media Recovery Waiting for thread 1 seq# 28016
<EOF>
...
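When scanning an alert log tail like the one in the sample output, the line of most interest is usually the last "Media Recovery Waiting" entry, since it names the sequence the standby still needs. The following is a minimal sketch of a filter for it; the function name last_waiting_seq is hypothetical, and it assumes the alert log wording matches the sample shown in this appendix.

```shell
#!/bin/sh
# last_waiting_seq
# Read alert log text on stdin and print "<thread> <sequence>" taken from
# the last "Media Recovery Waiting for thread N seq# M" line.
# Prints nothing when no such line is present.
last_waiting_seq() {
    awk '/Media Recovery Waiting for thread/ {
             for (i = 1; i < NF; i++) {
                 if ($i == "thread") t = $(i + 1)
                 if ($i == "seq#")   s = $(i + 1)
             }
         }
         END { if (s != "") print t, s }'
}
```

Using the alert log path from STBY_integrity_check.ksh, it could be run as tail -300 $ORACLE_BASE/admin/STBY/bdump/alert_STBY.log | last_waiting_seq.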
The sample output above reports six unrecoverable datafiles on both the primary and standby sites. It also indicates that the database is in media recovery mode, but that the standby is still waiting for log sequence 28,016 from the primary database; the primary probably lost network connectivity with the standby. Data Guard should automatically detect and resolve this archive gap, fetching all logs on the primary from sequence# 28,016 onward. If necessary (because the network problem is not fixed in a timely manner, for example), you can manually copy these archive logs to the standby, recover the standby, and place it in managed recovery mode again. To manually recover the standby, execute the following on the standby:
$ sqlplus "/ as sysdba"
SQL> STARTUP NOMOUNT;
SQL> ALTER DATABASE MOUNT STANDBY DATABASE;
After mounting the standby, the primary should automatically resume copying archive logs to the standby. Apply the archive logs that are available so the standby can catch up with the primary, then cancel recovery and exit, as follows:
SQL> RECOVER AUTOMATIC STANDBY DATABASE;
SQL> CANCEL;
SQL> EXIT;
Then start a background process to place the standby in managed recovery mode by executing begin_managed_standby.ksh. This script is listed below along with the script it calls, recover_managed_standby.ksh:
BEGIN_MANAGED_STANDBY.KSH
#!/bin/ksh
#
# File: begin_managed_standby.ksh
# ***************************************************************************
#
#
typeset -x PATH=/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/ccs/bin:$PATH
#
#
#
. /usr/local/bin/setup_ora_STBY
#
#
#
#
unset ORACLE_PATH
#
nohup $ORACLE_BASE/admin/STBY/scripts/recover_managed_standby.ksh > $ORACLE_BASE/admin/STBY/script_logs/managed_recovery.log 2>&1 &
#
#
#
#
# .eof
RECOVER_MANAGED_STANDBY.KSH
#!/bin/ksh
#
# File: recover_managed_standby.ksh
#
#
# This script assumes the user's environment and the database
# are prepared for this command.
#
#
sqlplus "/ as sysdba" << EOS
set echo on
startup nomount;
alter database mount standby database;
recover managed standby database;
EOS
#
#
# .eof