How Draining and Application Continuity Work for Maintenance with Oracle RAC

Introduction

Oracle RAC provides scalability and high availability for the Oracle Database. If one server (RAC node) fails or is taken offline for maintenance, the database is still accessible through the remaining nodes. However, what happens to client sessions that are in the middle of some work, whether reading or changing data, when maintenance begins? That work is interrupted and has to be executed again by the end-user or the application, unless you implement draining and enable Application Continuity or Transparent Application Continuity.

The Environment

Let’s take a 2-node Oracle RAC as an example environment and walk through the timeline of a maintenance event to understand how Fast Application Notification (FAN), draining, and Application Continuity make it transparent to the end-user. The application uses a connection pool that is configured to hold 30 sessions.

Normal Operation

During normal operation, both RAC nodes are up and running and serving the application. Depending on your load-balancing strategy, the sessions might be distributed equally or unequally across both nodes. To keep the example concrete, let’s assume 10 sessions on node 1 and 20 sessions on node 2.

The sessions can be idle (connected but doing nothing), or active (in the middle of a request). A request can be reading (SELECT) or manipulating (INSERT, DELETE, UPDATE) data.

Stop Service with Draining

At some point in time, we will need to do maintenance on the system. As we have a 2-node RAC, maintenance can be done in a rolling manner, one node after another (most of the time). Maintenance can be, for example, patching the operating system, the Grid Infrastructure, or the database.

To begin maintenance, we first stop the database services on node 2. However, we don’t stop them immediately. Instead, we delay the stop by 300 seconds (5 minutes), so that active sessions can finish their work before the service actually stops and the sessions are disconnected. This is called draining, here with a drain timeout of 5 minutes:

srvctl stop service -db RACCDB_fra -service acsrv -instance RACCDB1 -drain_timeout 300

At the time of stopping the service, there were 5 idle sessions and 15 active sessions on node 2.

Fast Application Notification (FAN) sends a stop event. The connection pool reacts to that notification and immediately closes the 5 idle sessions on node 2.

Further new connections will be established and opened on node 1 when requested by the application. Assuming the application requires 30 connections all the time, 5 new connections will be opened on node 1.
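
The pool’s reaction to the stop event can be modeled in a few lines of Python. This is only a toy model under stated assumptions: `Session` and `on_service_stop` are hypothetical names, not part of any Oracle driver or connection pool API, and the replacement sessions are opened eagerly here, whereas a real pool opens them when the application requests them.

```python
from dataclasses import dataclass


@dataclass
class Session:
    node: int     # RAC node the session is connected to
    active: bool  # True while the session is in the middle of a request


def on_service_stop(pool, stopped_node, target_node):
    """Toy model of a pool reacting to a FAN stop event (hypothetical, not a real API)."""
    kept = []
    closed_idle = 0
    for session in pool:
        if session.node == stopped_node and not session.active:
            closed_idle += 1       # idle sessions on the stopping node are closed at once
        else:
            kept.append(session)   # active sessions keep running and may drain later
    # Assumption: the application always needs the full pool size,
    # so every closed session is replaced by a new one on the target node.
    kept.extend(Session(node=target_node, active=False) for _ in range(closed_idle))
    return kept


# Numbers from the example: 10 sessions on node 1, 5 idle and 15 active on node 2.
pool = (
    [Session(1, False) for _ in range(10)]
    + [Session(2, False) for _ in range(5)]
    + [Session(2, True) for _ in range(15)]
)
after = on_service_stop(pool, stopped_node=2, target_node=1)
node1 = sum(1 for s in after if s.node == 1)
node2 = sum(1 for s in after if s.node == 2)
print(node1, node2)  # 15 15
```

Right after the stop event, the model shows 15 sessions on node 1 and the 15 still-active sessions on node 2.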

The remaining 15 active sessions on node 2 continue executing their work. Each session that finishes its work within the drain timeout (5 minutes) and returns its connection to the pool is closed, and a new one is established on node 1 when requested.

If you have an OLTP application with short transactions or you choose a drain timeout large enough for all active sessions to drain (= to finish their work), then you will encounter no errors and no interruption on the application side.

Here, let’s assume 13 out of the 15 active sessions drained within 5 minutes. The users of the remaining 2 sessions left their transactions open without committing.

We end up with 28 sessions on node 1 and 2 remaining sessions on node 2 right before the drain timeout expires.
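
The same arithmetic as a short Python sketch. The per-session durations below are invented for illustration; only the counts (15 active sessions, a 300-second drain timeout, 13 drained, 2 left over) come from the example above.

```python
DRAIN_TIMEOUT = 300  # seconds, matching -drain_timeout 300


def drain(remaining_seconds, timeout):
    """Split active sessions into those that finish within the timeout and those that do not."""
    drained = [t for t in remaining_seconds if t <= timeout]
    leftover = [t for t in remaining_seconds if t > timeout]
    return len(drained), len(leftover)


# Hypothetical time (in seconds) each of the 15 active sessions still needs.
# The last two stand for transactions left open without a commit.
work = [5, 12, 20, 30, 45, 60, 90, 120, 150, 200, 240, 280, 299, 900, 3600]

drained, leftover = drain(work, DRAIN_TIMEOUT)
node1 = 15 + drained  # 10 original + 5 replaced idle sessions + drained sessions
node2 = leftover      # sessions still connected when the timeout expires
print(drained, leftover, node1, node2)  # 13 2 28 2
```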

Application Continuity / Transparent Application Continuity

When the drain timeout expires, the remaining 2 sessions on node 2 are disconnected. Without Application Continuity in place, the end-user receives an error and has to handle it.

With Application Continuity configured, the sessions are reconnected to node 1 and the interrupted in-flight work is replayed, transparently to the end-user.
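
The idea of replay can be sketched as follows. This is a deliberately simplified model, not how the Oracle drivers implement it: `Node`, `RecoverableError`, and `run_with_replay` are hypothetical names, and the SQL strings are placeholders. The real feature also has to make sure the interrupted transaction was not already committed before replaying, which this sketch leaves out.

```python
class RecoverableError(Exception):
    """Stand-in for a lost session (hypothetical, not a driver exception)."""


class Node:
    def __init__(self, name, fail_after=None):
        self.name = name
        self.fail_after = fail_after  # calls accepted before the session is cut; None = never
        self.executed = []            # calls that ran on this node

    def execute(self, call):
        if self.fail_after is not None and len(self.executed) >= self.fail_after:
            raise RecoverableError(f"session on {self.name} was disconnected")
        self.executed.append(call)


def run_with_replay(calls, nodes):
    """Run a request; on a recoverable error, replay it from the start on the next node."""
    for node in nodes:
        try:
            for call in calls:
                node.execute(call)
            return node.name  # request completed on this node
        except RecoverableError:
            continue          # uncommitted work is lost with the session, so replay everything
    raise RuntimeError("no node could complete the request")


# Placeholder request: the session on node 2 is cut after the first call.
request = [
    "UPDATE accounts SET balance = balance - 10 WHERE id = 1",
    "INSERT INTO audit_log (account_id, delta) VALUES (1, -10)",
    "COMMIT",
]
node2 = Node("node2", fail_after=1)
node1 = Node("node1")
finished_on = run_with_replay(request, [node2, node1])
print(finished_on)  # node1
```

The caller of `run_with_replay` never sees the `RecoverableError`, which is the sense in which the interruption stays hidden from the end-user.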

With that, all 30 sessions are on node 1 now and node 2 is ready for maintenance.

Conclusion

Oracle RAC provides high availability for the Oracle Database. From the application perspective, Fast Application Notification (FAN) allows sessions to be relocated to the running node, draining enables active sessions to finish their requests within a predefined drain timeout, and Application Continuity replays interrupted requests for the sessions that did not drain (= did not finish executing their requests in time). All this is done transparently to the end-user and the application.
