AWS RDS Replica Lag | Blue Matador


Causes

High write workload
Network issues across regions or availability zones

Solutions

If replica lag is only occasionally high, and reads either never go to the read replicas or the application can tolerate stale reads, then replica lag may not be a concern.
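The judgment above can be sketched as a small check. This is only an illustration: the function name, the 5% cutoff, and the sample values are assumptions, not part of any AWS or Blue Matador API. The lag samples would typically come from CloudWatch (the ReplicaLag metric for RDS is reported in seconds; AuroraReplicaLag for Aurora is reported in milliseconds and would need converting).

```python
from typing import Sequence


def lag_is_a_concern(
    lag_seconds: Sequence[float],
    tolerated_staleness_seconds: float,
    reads_use_replicas: bool,
    max_fraction_over: float = 0.05,  # assumed cutoff for "only occasionally high"
) -> bool:
    """Return True if replica lag deserves attention.

    Lag is ignored when no reads go to the replicas. Otherwise it is a
    concern when more than `max_fraction_over` of the samples exceed the
    staleness the application can tolerate.
    """
    if not reads_use_replicas or not lag_seconds:
        return False
    over = sum(1 for lag in lag_seconds if lag > tolerated_staleness_seconds)
    return over / len(lag_seconds) > max_fraction_over


# Hypothetical samples: one spike in four readings, 5 s tolerated staleness.
samples = [1.0, 2.0, 30.0, 1.0]
print(lag_is_a_concern(samples, 5.0, reads_use_replicas=True))
print(lag_is_a_concern(samples, 5.0, reads_use_replicas=False))
```

The cutoff is a policy choice; an application that cannot tolerate any stale read would set it to 0.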

If reads need to be more consistent than the replica lag allows, connect to your Aurora cluster using the main cluster endpoint rather than the reader endpoint. Reads made through the cluster endpoint are served by the primary instance and are never subject to replica lag.
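One way to apply this in application code is to choose the endpoint per query, sending only lag-sensitive reads to the primary. The sketch below is a minimal illustration: the class, the function, and both hostnames are hypothetical placeholders (Aurora reader endpoints usually contain "cluster-ro" where the cluster endpoint contains "cluster", but copy the real values from your own cluster).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuroraEndpoints:
    cluster: str  # main cluster endpoint, resolves to the primary instance
    reader: str   # reader endpoint, load-balances across the replicas


def endpoint_for_read(endpoints: AuroraEndpoints, needs_fresh_data: bool) -> str:
    """Pick the host to connect to for a read query.

    Lag-sensitive reads go to the cluster endpoint (the primary);
    everything else can use the replicas.
    """
    return endpoints.cluster if needs_fresh_data else endpoints.reader


# Hypothetical hostnames, for illustration only.
endpoints = AuroraEndpoints(
    cluster="mydb.cluster-abc123example.us-east-1.rds.amazonaws.com",
    reader="mydb.cluster-ro-abc123example.us-east-1.rds.amazonaws.com",
)

print(endpoint_for_read(endpoints, needs_fresh_data=True))
print(endpoint_for_read(endpoints, needs_fresh_data=False))
```

Routing every read to the primary removes the benefit of having replicas, so it is usually worth limiting this to the queries that actually need up-to-date data.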

If high replica lag is caused by a high write workload, consider structuring the application so that writes can be queued and completed asynchronously, which avoids large spikes in write activity.
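A minimal sketch of smoothing out writes, assuming the application can defer them: writes are queued as they arrive and later drained in small batches with a pause between batches, so the replicas have time to keep up. The PacedWriter class, its parameters, and the write_batch callback are all hypothetical; in a real application the callback would execute the batch against the primary, and drain() would run in a background worker.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, List


class PacedWriter:
    """Queue writes and flush them in fixed-size batches with pauses."""

    def __init__(
        self,
        write_batch: Callable[[List[Any]], None],  # hypothetical DB write callback
        batch_size: int = 100,
        pause_seconds: float = 0.5,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._write_batch = write_batch
        self._batch_size = batch_size
        self._pause_seconds = pause_seconds
        self._sleep = sleep
        self._pending: Deque[Any] = deque()

    def submit(self, row: Any) -> None:
        """Accept a write without touching the database yet."""
        self._pending.append(row)

    def drain(self) -> int:
        """Write everything queued so far; return the number of batches."""
        batches = 0
        while self._pending:
            count = min(self._batch_size, len(self._pending))
            batch = [self._pending.popleft() for _ in range(count)]
            self._write_batch(batch)
            batches += 1
            if self._pending:
                # Pause between batches so the write rate stays flat.
                self._sleep(self._pause_seconds)
        return batches


# Usage with a stand-in callback that just records each batch.
written: List[List[int]] = []
writer = PacedWriter(written.append, batch_size=2, pause_seconds=0.01)
for row in range(5):
    writer.submit(row)
writer.drain()
print(written)
```

Batch size and pause length are tuning knobs: smaller batches and longer pauses lower the peak write rate at the cost of writes taking longer to land.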

If you can determine that the replica lag is caused by network issues, consider creating another replica in a different availability zone or region as a backup in case of a failure, and monitor the cluster closely. If you can find a time with low write activity, consider manually promoting one of the replicas in a known-good network to primary, to avoid the complications that a deteriorating network can cause.

Resources

Replication with Amazon Aurora (AWS Documentation)
Amazon RDS Metrics and Dimensions (AWS Documentation)