Replication lag measures how far the data on an Aurora replica trails the data on the primary instance. Aurora replicas typically lag by less than 100 ms.

Causes


  • High write workload
  • Network issues between Regions or Availability Zones

Solutions


If replica lag is high only occasionally, and either no reads are served by the read replicas or the application can tolerate stale reads, then replica lag may not be a concern.
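
To judge whether lag is only occasionally high, it can help to count how often it crosses a threshold. The following is a minimal sketch, assuming a boto3 CloudWatch client is passed in and that the replica reports the AuroraReplicaLag metric (in milliseconds) under the AWS/RDS namespace. The function name, the 100 ms threshold, and the 24-hour window are illustrative choices, not recommendations.

```python
from datetime import datetime, timedelta, timezone


def lag_exceedance_ratio(cloudwatch, instance_id, threshold_ms=100.0, hours=24):
    """Return the fraction of 5-minute windows whose peak lag exceeded threshold_ms."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="AuroraReplicaLag",  # reported in milliseconds
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=300,  # one datapoint per 5-minute window
        Statistics=["Maximum"],
    )
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        return 0.0  # no data in the window, so nothing exceeded the threshold
    over = sum(1 for point in datapoints if point["Maximum"] > threshold_ms)
    return over / len(datapoints)
```

A caller might pass boto3.client("cloudwatch") and a replica's instance identifier. A ratio near zero suggests the spikes are rare; what counts as acceptable depends on how stale the application's reads are allowed to be.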

If reads need to be more consistent than the replica lag allows, connect to your Aurora cluster through the cluster endpoint (also called the writer endpoint) rather than the reader endpoint. The cluster endpoint always points to the primary instance, so reads made through it are never subject to replica lag.
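
One way an application might apply this is to choose the endpoint per query, sending only lag-sensitive reads to the primary. The sketch below is illustrative: the AuroraEndpoints type, the pick_endpoint helper, and both hostnames are placeholders, not values from a real cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuroraEndpoints:
    writer: str  # cluster endpoint; resolves to the primary instance
    reader: str  # reader endpoint; spreads connections across replicas


def pick_endpoint(endpoints, needs_fresh_read):
    """Return the writer endpoint for lag-sensitive reads, the reader otherwise."""
    return endpoints.writer if needs_fresh_read else endpoints.reader


# Placeholder hostnames; substitute the endpoints listed for your own cluster.
ENDPOINTS = AuroraEndpoints(
    writer="mycluster.cluster-example.us-east-1.rds.amazonaws.com",
    reader="mycluster.cluster-ro-example.us-east-1.rds.amazonaws.com",
)
```

Routing every read to the writer removes lag entirely but also adds read load to the primary, so reserving it for reads that must see the latest writes is usually the gentler option.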

If high replica lag is caused by a high write workload, consider structuring the application so that writes are queued and completed asynchronously, which avoids large spikes in write activity.
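
As a rough illustration of smoothing writes, the sketch below buffers rows and releases them in bounded batches with a pause between batches. The WriteSmoother class and its apply_batch callback are hypothetical; apply_batch stands in for whatever code actually performs the database writes, and the batch size and pause are arbitrary starting points.

```python
import time
from collections import deque


class WriteSmoother:
    """Buffer writes and release them in bounded batches at a steady pace."""

    def __init__(self, apply_batch, batch_size=100, pause_seconds=0.5, sleep=time.sleep):
        self._apply_batch = apply_batch  # hypothetical callable that performs the writes
        self._batch_size = batch_size
        self._pause_seconds = pause_seconds
        self._sleep = sleep  # injectable so the pause can be skipped in tests
        self._pending = deque()

    def submit(self, row):
        """Queue a write instead of sending it to the database immediately."""
        self._pending.append(row)

    def drain(self):
        """Apply queued writes batch by batch; return the number of batches sent."""
        batches = 0
        while self._pending:
            size = min(self._batch_size, len(self._pending))
            batch = [self._pending.popleft() for _ in range(size)]
            self._apply_batch(batch)
            batches += 1
            if self._pending:
                # Pause only when more work remains, to spread out the load.
                self._sleep(self._pause_seconds)
        return batches
```

In practice drain would run from a background worker or a scheduled job. A durable queue would be needed if buffered writes must survive a process restart, which this in-memory sketch does not attempt.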

If you can determine that the replica lag is caused by network issues, consider creating another replica in a different Availability Zone or Region as a backup in case of failure, and monitor the cluster closely. If you can find a period of low write activity, consider manually promoting one of the replicas to primary (a manual failover) on a known-good network to avoid the complications that a deteriorating network can cause.
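
The manual promotion could be scripted along these lines. This is a sketch under stated assumptions: it expects a boto3 RDS client, uses its failover_db_cluster call, and applies only to a replica in the same cluster as the current primary. The failover_if_quiet name, the write-activity figure supplied by the caller, and the default limit of 50 are all illustrative.

```python
def failover_if_quiet(rds, cluster_id, target_instance_id,
                      current_write_iops, max_write_iops=50.0):
    """Fail over to target_instance_id only when write activity is at or below the limit."""
    if current_write_iops > max_write_iops:
        return False  # too busy; try again in a quieter window
    rds.failover_db_cluster(
        DBClusterIdentifier=cluster_id,
        TargetDBInstanceIdentifier=target_instance_id,
    )
    return True
```

The caller is responsible for measuring current write activity, for example from the cluster's own monitoring. Connections are interrupted briefly during a failover, so the application should be prepared to reconnect.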

 

Resources