Real Application Clusters (RAC)
5.1 Options for High Availability
Oracle has a Maximum Availability Architecture (MAA), which includes a combination of the options with Data Guard and RAC environments.
With RAC, you can apply rolling patches to eliminate downtime for patching. Additional nodes can be added to the cluster to provide more resources, since the nodes can use the CPUs and memory that are available on each server.
Data Guard also provides a way to test an application rollout or database upgrade by using a snapshot of production database on the standby server.
ASM, when used in the RAC environment, is part of a high-availability solution. ASM manages the disks available to databases and instances on a server. It simplifies the management of Oracle database files and provides a clustered file system.
Replication and Oracle Streams might not be considered part of a high availability solution for Oracle because RAC and Data Guard can provide the maximum availability without having to manage the replication processes. Including backups and flashback, further reduces the risks for unplanned failures and planned maintenance.
5.2 Clustering with RAC
Clustering is ideal for two or more servers that have shared resources, such as disks. In case of a hardware failure on one server in the cluster, the other servers can pick up the workload until that server can be brought back up.
Oracle RAC servers share a disk and have the same Oracle database but with different instances running on each node. If one node fails, the connections failover to the other node. The instances do not failover, because the instances are just the processes on each server that access the same data. The Oracle database is available from any of the nodes in the cluster.
The advantage of Oracle RAC is that the resources on both nodes are used by the database, and each node uses its own memory and CPU. Information is shared between nodes through the interconnect—the virtual private network.
RAC provides high availability because of the failover of connections in the event of a hardware failure or server connection failure. The RAC environment also provides high availability for patching with rolling upgrades (Oracle Database 11g). And you can easily add a new server with memory and CPU to the cluster, make new connections to the new node, and the workload will be rebalanced between all of the nodes.
- Configuring RAC
Configuring an RAC environment starts off similar to setting up a cluster of servers in a SQL Server environment. The servers need to have a private network between the machines and a set of disks that can be seen by all of the servers in the cluster. The disks will need space for the Oracle Cluster Registry (OCR) and voting disk, just as a SQL Server cluster needs a quorum disk for the cluster membership. After the network configuration and disk allocation, the Oracle Clusterware software can be installed. If the Clusterware software can see both nodes, then the database installation is available for an RAC database. The software will install on the available nodes in the cluster. The cluster name can be specified, and the node names will be visible, with each private and public IP address that is configured.
The Cluster Verification Utility (CVU) assists in the Clusterware setup and pre installation tasks, including the operating system and network settings. With Oracle Database 11g R2, the Grid Infrastructure software has the installation for Clusterware and ASM. Clusterware and ASM should be installed in a different Oracle home directory than the database.
The network configurations are key because you need a public IP and a private IP, for the interconnect and virtual IP (VIP). The network adapters need to be configured the same on all of the nodes of the clusters, so eth0 should be set to public network on all of the nodes, and eth1 set to the private network. For Linux environments, you can look at the /etc/hosts file to see the IP addresses and configurations.
Once the network IPs, kernel parameters, and operating system settings are configured, and storage is available to the servers in the cluster, the installation walks through the setup of the Clusterware software. An advanced installation of Clusterware provides opportunities to configure the storage and additional networking options.
After Clusterware is installed and the databases are created, the databases and cluster services need to be monitored, and stopped and started as needed. Use the cluster and service commands to check the status, and start and stop the instances and listeners on each node.
- Patching RAC
RAC environments also provide failover and increased uptime for planned maintenance as well as unplanned failures. With RAC environments, there are three ways to apply patches to all of the nodes of the cluster:
■ Patching RAC like a single-instance database. All of the instances and listeners will be down. Patching starts with the local node and continues with all the other nodes.
■ Patching RAC with minimum downtime. This method applies the patches to the local node, requests a subset of nodes to be patched first, and then applies the patches to other nodes. The downtime happens when the second subset is shut down for patching and the initial nodes are brought back online with the new patches.
■ Patching RAC with the rolling method. The patches are applied to one a node at time, so that at least one node in the cluster is available while the patching is rolling through the environment. There is no downtime with this method. The node can be brought up again after being patched while the other nodes are still up and available. Then the next node is patched.
Not all patches are available as rolling patches. The patch will indicate if it can be applied with this method. The Oracle patching method is to use OPATCH to apply the patches to Oracle homes. Using OPATCH, you can verify if the patch is a rolling patch.
- Deploying RAC
Adding another node to a cluster is an easy way to provide more resources to the RAC database. Using Oracle Grid Control or OEM, you can add a node with the same configuration and installation as the other nodes. Then the nodes are available for client connections.
An option pack is available for provisioning new Oracle servers. If you have several servers to manage or need to upgrade and patch a very large set of servers, these tools are useful for handling basic configuration and setup. They can use a golden copy or a template to verify the hardware installation, and then configure the operating system and database, which can be a stand-alone database server or Oracle Clusterware with an RAC database.
- Configuring and Monitoring RAC Instances
With an Oracle RAC environment, connections failover, and multiple instances are involved. There might even be multiple logs and trace files, depending on how the dump destination is configured for the instance. Each instance can have its own set of parameters that are different from those on the other instances in the database. For example, batch jobs, reporting, and backups can be set to go to one instance over another, but still have the ability to failover the connections if that node is not available. In the connection string, you might set FAILOVER=ON but LOAD_BALANCE=OFF to handle the connections to one instance.
The spfile and init.ora files can be shared by all of the instances in the RAC database, so the parameters will have a prefix of the instance SID if they are set for that instance. The view to see all of the parameters is gv$parameter, instead of v$parameter.
- Testing RAC
With Oracle RAC, you can test the failover and confirm that the setup and configuration are working properly. Failover testing includes the client, network, and storage connections from both servers.
Simply rebooting the servers is first on the checklist. Make sure that the Clusterware software is still configured as needed and settings are persistent (the server did not revert to older settings). You can run CVU at any time to verify the cluster that includes the networking settings.
Another test is to pull the interconnect so that servers do not have their private network. Then validate that one of the nodes accepts the new connections, and that the failover of connections to the surviving node runs the queries as it should. Next, test the connections from the application and from utilities like SQL*Plus.
This is not just validating that the users can connect, but also checking what happens if a server goes down. Connect to the database through the different applications, and then actually shut down a server. To verify, look at the sessions running on both nodes before the shutdown to confirm that there are connections to the node, and then look at the sessions on the node that is still running. If connections do not failover, double-check the tnsnames.ora file and connection strings to make sure that failover mode is in the string, as well as that the service name and virtual hostname are being used.
Using the global views makes it easier to see all of the processes running across the nodes. But monitoring RAC performance is basically the same as checking performance on a single instance. You can verify what is running and check that the statistics are up to date.
The interconnect can play a role in the performance, as memory blocks are swapped between the nodes. Oracle Database 11g has improved the Cache Fusion protocols to be more workload-aware to help reduce the messaging for read operations and improve performance.
5.3 Primary and Standby Databases
Oracle offers the option of a standby database with Oracle Data Guard as another type of failover. The primary and secondary database servers do not share any of the database files or disk. They can even be servers located in completely different data centers, which offers a disaster recovery option. The redo logs from the primary server are transported over to the secondary server depending on the protection mode, and then they are applied to the database on the secondary server.
Oracle Data Guard has different protection modes based on the data loss and downtime tolerance:
■ Maximum Protection provides for zero data loss, but the transactions must be applied synchronous to both the primary and secondary database servers. If there are issues applying the logs to the secondary server, the primary server will wait for the transaction to be completed on both servers to commit the change.
■ Maximum Availability has zero data loss as the goal, but if there is a connectivity issue or the transaction cannot be applied to the secondary server, the primary server will not wait. The primary server still has a record of what has been applied for verification, and the standby database might fall slightly behind, but it is more critical to have the primary database available.
■ Maximum Performance has the potential for minimal data loss. The transport of the logs is done asynchronously, and there is no checking back with the primary server about applying the logs and verifying the change has been completed.
- Using Active Standby Databases
The physical standby database is a copy of the primary database and is kept in sync with the primary database. With Oracle Database 11g, the standby database can also be an active database, which remains open for reading while the database is still being synchronized with the primary. This is the Active Data Guard option.
Another option that allows for use of the secondary server is a logical standby database. With this type of standby database, the changes are applied by SQL statements that are converted from the redo logs. This allows for some of the structures of the data to vary from the primary database, and the changes can still be applied through the SQL statements.
A third standby database option is a snapshot database configuration. The standby database can be converted to a read-write snapshot. It continues to receive the redo information from the primary database, but does not apply the changes until converted back to being only a standby database. While in read-write mode, the snapshot standby database can be used to test various changes, such as new application rollout, patches, or data changes. Then the snapshot is set back to before the changes were made, and the redo log will be applied. Having a copy of the production database for testing like this is extremely valuable for successful rollouts of changes.
With this setup, the disaster recovery plan is very simple: connect to the standby database and make it the primary database. The copies of the databases can also be used to offload work such as backups and read-only reporting. This takes advantage of the standby database, which would otherwise sit idle unless the primary database failed.
- Setting Up a Standby Database
An existing database can be configured to have a standby database. The first step is to install the Oracle software on the standby server. The database already exists on the primary server. The primary database will need some configuration with standby logs and parameters. Connections to the secondary database can be set up, and then using RMAN, the initial copy of the database can be set up on the standby server.
In summary, the basic steps are as follows:
- Install the software on the standby server.
- Configure the parameters on the primary server.
- Make the connections by updating tnsnames.ora and listener.
- Use RMAN to copy the database.
You can use the Data Guard broker and management tools to set up the automatic failover and manage the standby servers. The Data Guard broker needs to be running on both the primary and standby server. A listener entry for the Data Guard broker on the primary and standby servers will help with failover and avoiding TNS errors.
The Oracle RAC database can be a primary and a standby server. When the Maximum Protection option is chosen for the Data Guard configuration, having RAC set up on the standby database will reduce the risk for the logs to be applied.
5.4 ASM in an RAC Environment
In the Oracle RAC environment, there needs to be an ASM instance for every node in the cluster, but one ASM instance can support multiple instances on that node.
- Managing ASM Disk Groups
The ASM disk groups serve as containers for consolidating databases and file systems to be able to use the storage more efficiently and even share between databases. The ASM Configuration Assistant (ASMCA) helps you create and manage disk groups. As shown in Figure new disks can be added to the disk group here, and attributes of the disk group can be edited. Other ASMCA options allow you to manage the volumes and file system in a clustered environment.
ASM Configuration Parameters
The ASM instance is really a process and a bit of memory. Some parameters go into the spfile for configuration of this instance. These parameters provide the details about the type of instance and where the disks are located for creating the disk groups.
■ INSTANCE_TYPE Set to ASM (default is RDBMS)
■ ASM_DISKGROUPS Lists the disk groups that should be mounted
■ ASM_DISKSTRING A value that indicates where to discover the disks that are available to add to a disk group
■ ASM_POWER_LIMIT Maximum power for rebalancing operation, a value between 1 and 11 (higher number for faster rebalancing)
ASMLib is the support library for ASM. It is used for initialize the disk for usage with ASM. The Linux package for the ASMLib needs to be installed for usage.
- Viewing ASM Information
When connected to the ASM instance, some v$ views give information about the instances connected, disks that might not be part of a disk group, and files. For example, the v$asm_disk view shows the disks that are being used by that database instance, and when viewed from the ASM instance, it will show all of the disks that are discovered.
5.5 Streams and Advanced Replication
Replication provides copies of data to different servers, and it can be used to move data. While it isn’t a failover mechanism usually associated with high availability, it does help ensure that data is available and can provide a way to selectively pull out the important data. For replication, Oracle offers Oracle Streams and the Advanced Replication option.
- Oracle Streams
Oracle Streams, included as part of the Oracle database installation, captures data changes to distribute to another database. The phases of Streams are similar to the SQL Server publisher, distributor, and subscriber roles. A user needs to be created to manage the replication, and a tablespace is also required.
- Setting Up Oracle Streams
The Streams administrator user needs the DBA permissions and admin privilege on the DBMS_STREAMS_AUTH package. The parameter configurations for Streams setup are GLOBAL_NAMES= TRUE, JOB_QUEUE_PROCESS higher than 2, and STREAMS_POOL_SIZE at least 200MB.
Changes for data and objects are captured and replicated. Replication can be configured for the whole database, schemas, tables, or even tablespaces.You can set up Streams through OEM. Through OEM, you can also choose to set up downstream capture and create an advanced queue. Downstream capture collects streams on a remote database other than the source.
2. Using Oracle Streams
Oracle Streams uses logical change records (LCRs) for each row of a table modified. Each LCR has the name of the table changed, old and new values for any changed columns, and values for the key columns. This information can be applied to the rows at the destination sites and resolve conflicts if they arise.
- Advanced Replication
Along with Oracle Streams replication, Oracle offers an Advanced Replication option. This handles master replication with a single master or multiple masters. Multimaster replication is known as peer-to-peer, and any of the servers can be updated. Advanced Replication processing to multiple masters can be asynchronous and synchronous.
For this type of replication, you need to set up a replication admin user. Tables in the databases should have primary keys. The DBMS_REPCAT package provides routines for administering and updating the replication catalog.
Advanced Replication offers the option of replicating to non-Oracle databases. This allows a way to provide data to several different systems. Also, with this type of replication, the Oracle database version and platform do not need to be the same for replication. Advanced Replication may be suitable for distributed or data warehouse databases, to have copies available for other systems or to maintain the workload on different servers.