How to Measure Network Latency for Oracle Data Guard Replication

Introduction

Oracle Data Guard replicates data from the primary to one or more standby databases for high availability and disaster recovery. Whether you transfer the redo synchronously or asynchronously depends on your recovery time objective (RTO) and recovery point objective (RPO). Synchronous redo transport provides zero data loss and lower recovery time, but your application has to wait until the redo is written to disk (SYNC) or received in memory (FASTSYNC) on the standby site before its commit is acknowledged. With FASTSYNC, the additional wait depends mainly on the network latency; with SYNC, the disk write time on the standby is added on top.

So the question is: how much latency can you tolerate with no or minimal impact on application performance for synchronous replication? The answer, what a surprise: it depends!

As described in Best Practices for Synchronous Redo Transport:

Each application will have a different tolerance for synchronous replication. Differences in application concurrency, number of sessions, the transaction size in bytes, how often sessions commit, and log switch frequency – result in differences in impact from one application to the next even if round-trip network latency (RTT), bandwidth and log file write i/o performance are all equal. In general Oracle sees customers having greater success with synchronous transport when round trip network latency is less than 5ms, than when latency is greater than 5ms. Testing is always recommended before drawing any specific conclusions on the impact of synchronous replication on your workloads.
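As a rough aid for applying the rule of thumb quoted above, here is a minimal sketch of a shell helper that compares a measured round-trip latency against a threshold. The function name latency_below is hypothetical, and the 5 ms default only mirrors the guidance above; it is not a hard limit.

```shell
# Succeed when a measured round-trip latency (in ms) is below a threshold (in ms).
# $1 = measured latency, $2 = optional threshold (defaults to 5, per the quote above).
# latency_below is an illustrative name, not part of any Oracle tool.
latency_below() {
    awk -v rtt="$1" -v limit="${2:-5}" 'BEGIN { exit !(rtt + 0 < limit + 0) }'
}

# Example:
#   latency_below 1.027 && echo "within the 5 ms rule of thumb"
```

Passing this check does not replace testing with your real workload.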

The question we want to answer in this blog post is how to measure the latency between your primary and standby database machines for Data Guard replication.

The Environment

I’m going to test the latency between the three Availability Domains in the OCI Frankfurt region, provisioning an IaaS VM in each AD (vmfraad1, vmfraad2, and vmfraad3) with 1 OCPU and 1 Gbps network bandwidth each.

PING

Ping is an easy-to-use utility that is available for virtually all operating systems and might be the first thing many people think about for measuring network latency:

[opc@vmfraad1 ~]$ ping -c 3 10.10.0.127
PING 10.10.0.127 (10.10.0.127) 56(84) bytes of data.
64 bytes from 10.10.0.127: icmp_seq=1 ttl=64 time=0.551 ms
64 bytes from 10.10.0.127: icmp_seq=2 ttl=64 time=0.557 ms
64 bytes from 10.10.0.127: icmp_seq=3 ttl=64 time=0.537 ms

--- 10.10.0.127 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2048ms
rtt min/avg/max/mdev = 0.537/0.548/0.557/0.020 ms

However, even though ping gives you an RTT, its primary use is to test the reachability of a host. Latency measured via ping is not representative for Data Guard replication, as the mechanism used is different: ping sends ICMP echo requests, while redo transport runs over TCP.
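If you still want to keep the ping figure as a baseline to compare with later measurements, the average RTT can be pulled out of the summary line. This is a small sketch that assumes the Linux ping summary format shown above; ping_avg_rtt is a hypothetical helper name.

```shell
# Read ping output on stdin and print the average RTT in ms.
# Assumes the summary format shown above:
#   rtt min/avg/max/mdev = 0.537/0.548/0.557/0.020 ms
ping_avg_rtt() {
    awk '/^rtt / { split($4, v, "/"); print v[2] }'
}

# Example:
#   ping -c 3 10.10.0.127 | ping_avg_rtt
```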

oratcptest

Oracle designed a dedicated utility called oratcptest to help customers assess the network resources used by Data Guard redo transport, GoldenGate, RMAN backup and restore, migration, Data Guard instantiation, and database remote clone.

Download

You can download the utility (jar file) from Oracle MOS Note 2064368.1, which includes a complete description of the utility. Scroll to the bottom of the MOS note, under Attachments, and click on oratcptest to download the oratcptest.jar file.

Save the file on both your primary and standby database machines. Oratcptest can be executed as any user; root privileges are not required. In my case, I'll just save it in /home/opc and run it as the opc user:

# primary
[opc@vmfraad1 ~]$ ls -l /home/opc/
-rw-r--r--. 1 opc opc 28038 Apr 22 09:53 oratcptest.jar

# standby
[opc@vmfraad2 ~]$ ls -l /home/opc/
-rw-r--r--. 1 opc opc 28038 Apr 22 09:53 oratcptest.jar

Preparation

If not already done, install Java on both machines so that you can run the jar file:

sudo yum install java -y
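To confirm that Java is actually available before moving on, a quick check like the following can be run on both machines. The helper name have_cmd is hypothetical; it only wraps the standard command -v lookup.

```shell
# Return success if the given command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    java -version          # prints the installed version
else
    echo "java not found, install it first"
fi
```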

Also, on both machines, open port 1521, or the port you are going to use for the Data Guard traffic:

## primary
[opc@vmfraad1 ~]$ sudo firewall-cmd --zone=public --add-port=1521/tcp --permanent
success
[opc@vmfraad1 ~]$ sudo firewall-cmd --reload
success
[opc@vmfraad1 ~]$ sudo firewall-cmd --list-all
public (active)
...
  ports: 1521/tcp
...

## standby
[opc@vmfraad2 ~]$ sudo firewall-cmd --zone=public --add-port=1521/tcp --permanent
success
[opc@vmfraad2 ~]$ sudo firewall-cmd --reload
success
[opc@vmfraad2 ~]$ sudo firewall-cmd --list-all
public (active)
...
  ports: 1521/tcp
...

If port 1521 is already being used on the hosts, e.g., you have Oracle databases running using port 1521, you need to choose another port for oratcptest.
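One way to see whether something is already listening on the port is to inspect the output of ss -ltn. The sketch below assumes the usual column layout with the state in column 1 and the local address and port in column 4; port_in_listen_list is a hypothetical helper name.

```shell
# Read `ss -ltn` output on stdin; succeed if a socket is listening on port $1.
# Assumes column 1 is the state and column 4 is "local address:port".
port_in_listen_list() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            n = split($4, parts, ":")      # the port is the last ":"-separated piece
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}

# Example:
#   ss -ltn | port_in_listen_list 1521 && echo "port 1521 is already in use"
```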

Run oratcptest

Oratcptest.jar can be run in client and server modes.

On the standby host (the server, vmfraad2), execute the following command:

java -jar oratcptest.jar -server <server_name_or_ip> -port=<port_number>

[opc@vmfraad2 ~]$ java -jar oratcptest.jar -server vmfraad2 -port=1521
OraTcpTest server started.

The server keeps running in the foreground and does not give you back the prompt.
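If you would rather keep your terminal, the server can also be started in the background. This is only a sketch: JAVA_BIN, LOG_FILE, and start_oratcptest_server are illustrative names of my own, not oratcptest options, and the only oratcptest arguments used are the ones shown above.

```shell
# Illustrative variables (assumptions, not oratcptest options).
JAVA_BIN="${JAVA_BIN:-java}"
LOG_FILE="${LOG_FILE:-oratcptest_server.log}"

# Start the oratcptest server in the background and print its PID.
# $1 = server name or IP, $2 = port
start_oratcptest_server() {
    nohup "$JAVA_BIN" -jar oratcptest.jar -server "$1" -port="$2" >"$LOG_FILE" 2>&1 &
    echo $!    # keep the PID so the server can be stopped later with kill
}

# Example:
#   pid=$(start_oratcptest_server vmfraad2 1521)
#   ... run the client tests ...
#   kill "$pid"
```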

On the primary host (the client, vmfraad1), you have three options:

1. -mode=async for ASYNC redo transport, measuring network throughput only, as latency to the standby does not impact the application performance:

## ASYNC
java -jar oratcptest.jar <server_name_or_ip> -mode=async -duration=100s -interval=20s -length=8k -port=1521

2. -mode=sync and -write for SYNC redo transport, measuring network throughput, latency, and write time to disk:

## SYNC
java -jar oratcptest.jar <server_name_or_ip> -mode=sync -duration=100s -interval=20s -length=8k -port=1521 -write

[opc@vmfraad1 ~]$ java -jar oratcptest.jar vmfraad2 -mode=sync -duration=100s -interval=20s -length=8k -port=1521 -write
[Requesting a test]
        Message payload        = 8 kbytes
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = (system default)
        Transport mode         = SYNC
        Disk write             = YES
        Statistics interval    = 20 seconds
        Test duration          = 100 seconds
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)

(12:10:30) The server is ready.
                    Throughput             Latency
(12:10:50)      7.544 Mbytes/s            1.037 ms   (disk-write 0.564 ms)
(12:11:10)      7.600 Mbytes/s            1.029 ms   (disk-write 0.568 ms)
(12:11:30)      7.641 Mbytes/s            1.024 ms   (disk-write 0.563 ms)
(12:11:50)      7.702 Mbytes/s            1.016 ms   (disk-write 0.557 ms)
(12:12:10)      7.600 Mbytes/s            1.029 ms   (disk-write 0.569 ms)
(12:12:10) Test finished.
               Socket send buffer = 166400 bytes
                  Avg. throughput = 7.617 Mbytes/s
                     Avg. latency = 1.027 ms (disk-write 0.564 ms)

3. -mode=sync without -write for FASTSYNC, measuring network throughput and latency, but without writing to disk on the standby, reducing the overall round trip and improving application performance:

## FastSync
java -jar oratcptest.jar <server_name_or_ip> -mode=sync -duration=100s -interval=20s -length=8k -port=1521
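The three variants differ only in the -mode value and the presence of -write, which the following sketch makes explicit. The function oratcptest_client_cmd is a hypothetical helper that prints the command line for a given transport mode; it uses only the options shown above.

```shell
# Print the oratcptest client command for a Data Guard transport mode.
# $1 = async | sync | fastsync, $2 = server name or IP, $3 = port
oratcptest_client_cmd() {
    common="-duration=100s -interval=20s -length=8k -port=$3"
    case "$1" in
        async)    echo "java -jar oratcptest.jar $2 -mode=async $common" ;;
        sync)     echo "java -jar oratcptest.jar $2 -mode=sync $common -write" ;;
        fastsync) echo "java -jar oratcptest.jar $2 -mode=sync $common" ;;
        *)        echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Example: print the commands for all three modes against vmfraad2.
#   for m in async sync fastsync; do oratcptest_client_cmd "$m" vmfraad2 1521; done
```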

Results

Here are all results from my testing in OCI Frankfurt. All average values in milliseconds:

These tests are only meant to give you a first impression. Please run your own tests in your own environment, whether on-premises, in Oracle Cloud, or in another cloud, using VMs sized adequately for your workloads before making any decision.
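When comparing your own SYNC runs, it can help to split the average latency into the disk-write share and the remainder. The sketch below parses the "Avg. latency" summary line in the format of the SYNC output shown earlier. It assumes that the reported latency already includes the disk-write time, which you should verify against the MOS note; latency_breakdown is a hypothetical helper name.

```shell
# Read oratcptest client output on stdin and split the average latency
# into disk-write time and the remainder (mostly network).
# Assumption: the reported latency includes the disk-write time.
# Expected input line:
#   Avg. latency = 1.027 ms (disk-write 0.564 ms)
latency_breakdown() {
    awk '/Avg\. latency/ {
        total = $4      # total average latency in ms
        disk  = $7      # disk-write portion in ms
        printf "total=%.3f disk=%.3f other=%.3f\n", total, disk, total - disk
    }'
}

# Example:
#   java -jar oratcptest.jar vmfraad2 -mode=sync -duration=100s -interval=20s \
#        -length=8k -port=1521 -write | latency_breakdown
```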

Conclusion

Network latency, mainly impacted by physical distance, is crucial for Data Guard synchronous redo transport. Oracle provides a dedicated, easy-to-use utility called oratcptest for measuring network latency and throughput. Use this utility for your testing instead of ping to get results that reflect what you will see when deploying your Data Guard environment.

Oratcptest provides many more options than described in this blog post. Please read the Oracle MOS note for a full description.
