Oracle RAC Tutorial

Options for High Availability

Oracle has a Maximum Availability Architecture (MAA) that includes a combination of options with Data Guard and RAC environments.

With RAC, we can apply rolling patches to eliminate the downtime while patching. Additional nodes can be added to the cluster to provide more resources since the nodes can use the CPUs and memory that are available on each server.

Data Guard also provides a way to test an application rollout or database upgrade by using a snapshot of the production database on the standby server.

ASM, when used in the RAC environment, is part of a high-availability solution. It manages the disks available to databases and instances on a server. It simplifies the management of Oracle Database files and provides a clustered file system.

Replication and Oracle Streams might not be considered part of a high-availability solution for Oracle because RAC and Data Guard can provide the maximum availability without having to manage the replication processes. Along with that, backups and flashbacks further reduce the risks of unplanned failures and planned maintenance.

Clustering with Real Application Clusters

Clustering is ideal for two or more servers that have shared resources, such as disks. In case there is a hardware failure on one server in the cluster, the other servers can pick up the workload until the failed server is brought back up.

Oracle RAC servers share a disk and have the same Oracle Database but with different instances running on each node. If one node fails, the connections failover to the other nodes. The instances do not failover, because they are just the processes on each server that access the same data. Oracle Database is available from any of the nodes in the cluster.

The advantage of Oracle RAC is that resources on both nodes are used by the database, and each node uses its own memory and CPU. Information is shared between nodes through the interconnect—the virtual private network.

Real Application Clusters provide high availability because of the failover of connections in the event of a hardware failure or server connection failure. The RAC environment also provides high availability for patching with rolling upgrades (Oracle Database 11g). We can easily add a new server with memory and CPU to the cluster, make new connections to the new node, and the workload will be rebalanced among all of the nodes.

Configuring RAC

Configuring the RAC environment starts off similar to setting up a cluster of servers in a SQL Server environment. The servers need to have a private network among the machines and a set of disks that can be seen by all servers in the cluster. The disks will need space for the Oracle Cluster Registry (OCR) and a voting disk, just as a SQL Server cluster needs a quorum disk for cluster membership. After network configuration and disk allocation, the Oracle Clusterware software can be installed. If the Clusterware software can see both nodes, then the database installation is available for the RAC database. The software will install on the available nodes in the cluster. The cluster name can be specified, and the node names will be visible, with each private and public IP address that is configured.

Cluster Verification Utility (CVU)

Cluster Verification Utility (CVU) assists in the Clusterware setup and pre-installation tasks, including the operating system and network settings. With Oracle Database 11g R2, the Grid Infrastructure software has installation for Clusterware and ASM. Clusterware and ASM should be installed in a different Oracle home directory than the database.

The network configurations are key because we need a public IP and a private IP, for the interconnect and the virtual IP (VIP). The network adapters need to be configured the same on all the nodes of the cluster. So, eth0 should be set to the public network on all the nodes, and eth1 should be set to the private network. For Linux environments, we can look at the /etc/hosts file to see the IP addresses and configurations.

Once the network IPs, kernel parameters, and operating system settings are configured and storage is available to the servers in the cluster, the installation walks through the setup of the Clusterware software. An advanced installation of Clusterware provides opportunities to configure the storage and additional networking options.

After Clusterware is installed and the databases are created, the databases and the cluster services need to be monitored and stopped/started as needed. We can use the cluster and service commands to check the status, to start/stop the instances and to check the listeners on each node.

Patching RAC

RAC environments provide failovers and increased uptime for planned maintenance and for unplanned failures. With RAC environments, there are three ways to apply patches to the nodes in a cluster:

Patching RAC as a single-instance database: When all instances and listeners are down, we can do patching. Here, patching starts with the local node and continues with all the other nodes.
Patching RAC with minimum downtime: This method applies patches to the local node, requests for a subset of nodes to be patched first, and then applies the patches to the other nodes. The downtime happens when the second subset is shut down for patching, but soon the initial nodes are brought back online with the new patches.
Patching RAC with the rolling method: Here, patches are applied to one node at a time so that at least one node in the cluster will be available while patching is rolling through the environment. There is no downtime with this method as, by the time the node being patched up can be brought up again, the other nodes are still up and available. Only after one node gets patched up, the next node is patched.

However, not all patches are available as rolling patches. The patch itself will indicate if it can be applied with this method. The Oracle patching method uses OPATCH to apply patches to Oracle homes. Using OPATCH, we can verify if the patch is a rolling patch.

Deploying RAC

Adding a node to a cluster is an easy way to provide more resources to the RAC database. Using Oracle Grid Control or OEM, we can add a node with the same configuration and installation as the other nodes. Then, this new node will be available for client connections.

An option pack is available for provisioning new Oracle servers. If we have several servers to manage or we need to upgrade and patch a very large set of servers, these tools are useful for handling basic configuration and setup. They use a golden copy or a template to verify the hardware installation and then configure the operating system and database, which can be a stand-alone database server or Oracle Clusterware with RAC database.

Configuring and Monitoring RAC Instances

With an Oracle RAC environment, connections failover, and multiple instances are involved. There might even be multiple logs and trace files, depending on how the dump destination is configured for the instance. Each instance can have its own set of parameters that are different from those on the other instances in the database. For example, batch jobs, reporting, and backups can be set to go to one instance over another but still have the ability to failover the connections if that node is not available. In the connection string, we will set FAILOVER=ON but LOAD_BALANCE=OFF to bring the connections to one instance.

The spfile and init.ora files can be shared by all instances in the RAC database, so parameters will have the prefix of an instance SID if they are set for that instance. The view to see all the parameters is gv$parameter, instead of v$parameter.

Testing RAC

With Oracle RAC, we can test the failover and confirm that the setup and configuration are working properly. Failover testing includes the client, network, and storage connections from both servers.

Simply rebooting the servers is the first on the checklist. We have to make sure that the Clusterware software is still configured as needed and settings are persistent (and that the server has not reverted to the older settings). We can run CVU at any time to verify the cluster that includes the networking settings.

Another test is to pull the interconnect so that the servers do not have their private network. Then, we validate that one of the nodes has accepted the new connections and that the failover of connections to the surviving node runs the queries as it should. Next, we will test the connections from the application and from utilities like SQL*Plus.

This is not just validating that the users can connect but also checking what happens if a server goes down. We will connect to the database through different applications and then shut down the server. To verify, we will look at the sessions running on both nodes before the shutdown to confirm that there are connections to the node, and then look at the sessions on the node that is still running. If connections do not failover, we have to double-check the tnsnames.ora file and connection strings to make sure that failover mode is in the string and that the service name and the virtual hostname are being used.

dba

Using global views makes it easier to see all processes running across the nodes. But monitoring RAC performance is basically the same as checking performance on a single instance. We can verify what is running and check if the statistics are up to date.

The interconnect can play a role in performance as memory blocks are swapped between the nodes. Oracle Database 11g has improved the Cache Fusion protocols to be more workload-aware so as to help reduce the messaging for reading operations and to improve performance.

Primary and Standby Databases

Oracle offers the option of a standby database with Oracle Data Guard as another type of failover. The primary and secondary database servers do not share any of the database files or disk. They can even be servers located in completely different data centers, which offers a disaster recovery option. The redo logs from the primary server are transported over to the secondary server depending on the protection mode, and then they are applied to the database on the secondary server.

Oracle Data Guard has different protection modes based on data loss and downtime tolerance:

Maximum Protection: It provides zero data loss, but the transactions must be applied synchronously to both primary and secondary database servers. If there are issues applying the logs to the secondary server, the primary server will wait for the transaction to be completed on both servers to commit the change.
Maximum Availability: It has zero data loss as the goal, but if there is a connectivity issue or if the transaction cannot be applied to the secondary server, the primary server will not wait. The primary server still has a record of what has been applied for verification, and the standby database might fall slightly behind, but it is more critical to have the primary database available.
Maximum Performance: It has the potential for minimal data loss. The transport of the logs is done asynchronously, and there is no checking back with the primary server for applying the logs and verifying if the change has been completed.

Now, we will look into the standby databases in detail.

Using Active Standby Databases

The physical standby database is a copy of the primary database and is kept in sync with the primary database. With Oracle Database 11g, the standby database can also be an active database, which remains open for reading while the database is still being synchronized with the primary. This is the Active Data Guard option.

Another option that allows the use of a secondary server is a logical standby database. With this type of standby database, the changes are applied by SQL statements that are converted from the redo logs. This allows the variation of data structures from the primary database, and the changes can still be applied through the SQL statements.

A third standby database option is a snapshot database configuration. This standby database can be converted to a read-write snapshot. It continues to receive the redo information from the primary database but does not apply the changes until converted back to being only a standby database. While in read-write mode, the snapshot standby database can be used to test various changes, such as new application rollout, patches, or data changes. Then, the snapshot is set back to before the changes were made, and the redo log will be applied. Having a copy of the production database for testing like this is extremely valuable for successful rollouts of changes.

dba 2

With this setup, the disaster recovery plan is very simple: connect to the standby database and make it the primary database. The copies of the databases can also be used to offload work such as backups and read-only reporting. This takes advantage of the standby database, which would otherwise sit idle unless the primary database fails.

Setting Up a Standby Database

An existing database can be configured to have a standby database. The first step is to install the Oracle software on the standby server. The database already exists on the primary server. The primary database will need some configuration with the standby logs and parameters. Connections to the secondary database can be set up and then using RMAN, the initial copy of the database can be set up on the standby server.

In summary, the basic steps are as follows:

Install the software on the standby server
Configure the parameters on the primary server
Make the connections by updating tnsnames.ora and listener
Use RMAN to copy the database

We can use the Data Guard broker and management tools to set up automatic failover and manage standby servers. The Data Guard broker needs to be running on both the primary and standby servers. A listener’s entry for the Data Guard broker on the primary and standby servers will help with failover and avoid TNS errors.

The Oracle RAC database can be a primary or standby server. When the Maximum Protection option is chosen for the Data Guard configuration, having an RAC setup on the standby database will reduce the risk for the logs to be applied.

ASM in the RAC Environment

In the Oracle RAC environment, there needs to be an ASM instance for every node in the cluster, but one ASM instance can support multiple instances on the same node.

Managing ASM Disk Groups

ASM disk groups serve as containers for consolidating databases and file systems to be able to use the storage more efficiently and even share between databases. ASM Configuration Assistant (ASMCA) helps us create and manage disk groups. As shown in the figure below new disks can be added to the disk group here, and attributes of the disk group can be edited. Other ASMCA options allow us to manage volumes and file systems in a clustered environment.

ASM Configuration Parameters

The ASM instance is really a process and a bit of memory. Some parameters go into the spfile for configuration of this instance. These parameters provide details about the type of instance and where the disks are located for creating the disk groups.

INSTANCE_TYPE: Set to ASM (default is RDBMS)
ASM_DISKGROUPS: Lists the disk groups that should be mounted
ASM_DISKSTRING: A value that indicates where to discover the disks that are available to be added to a disk group
ASM_POWER_LIMIT: The maximum power for rebalancing operation; a value between 1 and 11 (higher the number faster the rebalancing)

ASMLib is the support library for ASM. It is used for initializing the disk to use with ASM. But, the Linux package for the ASMLib needs to be installed for this.

Viewing ASM Information

When connected to the ASM instance, some v$ views give information about the instances connected, disks that might not be part of a disk group, and files. For example, the v$asm_disk view shows the disks that are being used by the database instance, and when viewed from the ASM instance, it will show all of the disks that are discovered.

Oracle Streams and Advanced Replication

Replication provides copies of data to different servers, and it can be used to move data. While it isn’t a failover mechanism usually associated with high availability, it does help ensure that data is available and can provide a way to selectively pull out some important data. For replication, Oracle offers Oracle Streams and the Advanced Replication option.

Oracle Streams

Oracle Streams, included in Oracle Database installation, captures data changes to distribute to other databases. The phases of Streams are similar to the SQL Server publisher, distributor, and subscriber roles. A user needs to be created to manage the replication, and a tablespace is also required.

Setting Up Oracle Streams

The Streams Administrator user needs DBA permissions and admin privileges on the DBMS_STREAMS_AUTH package. The parameter configurations for Streams setup are as follows: GLOBAL_NAMES= TRUE, JOB_QUEUE_PROCESS higher than 2, and STREAMS_POOL_SIZE at least 200 MB.

Changes for data and objects are captured and replicated. Replication can be configured for the whole database, schemas, tables, or even for tablespaces. We can set up Streams through OEM. Through OEM, we can also choose to set up a downstream capture and create an advanced queue. The downstream capture collects streams on a remote database other than the source.

Using Oracle Streams

Oracle Streams uses logical change records (LCRs) for each row of a table modified. Each LCR has the name of the table changed, old and new values for any changed column, and values for the key columns. This information can be applied to the rows at the destination sites, and it can resolve conflicts if they arise.

Advanced Replication

Along with Oracle Streams replication, Oracle offers an Advanced Replication option. This handles master replication with a single master or multiple masters. The multi-master replication is known as peer-to-peer, and in this any of the servers can be updated. Advanced Replication processing to multiple masters can be asynchronous or synchronous.

For this type of replication, we need to set up a Replication Admin user. Tables in the databases should have primary keys. The DBMS_REPCAT package provides routines for administering and updating the replication catalog.

Advanced Replication offers the option of replicating to non-Oracle databases. This allows a way to provide data to several different systems. Also, for this type of replication, the Oracle Database version and platform do not need to be the same. Advanced Replication may be suitable for distributed or data warehouse databases to have copies available for other systems or to maintain the workload on different servers.