In a DynamoDB table, items are stored across many partitions according to each item’s partition key. Each partition has a share of the table’s provisioned RCU (read capacity units) and WCU (write capacity units). When a request is made, it is routed to the correct partition for its data, and that partition’s capacity is used to determine if the request is allowed, or will be throttled (rejected). Some amount of throttling should be expected and handled by your application.

Excessive throttling is caused by:

  • Hot partitions: throttles are caused by a few partitions in the table that receive more requests than the average partition
  • Not enough capacity: throttles are caused by the table itself not having enough capacity to service requests on many partitions


EFFECTS


Excessive throttling can cause the following issues in your application:

  • Data can be lost if your application fails to retry throttled write requests
  • Processing will be slowed down by retrying throttled requests
  • Data can become out of date if writes are throttled but reads are not
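
To avoid losing throttled writes, the application can retry them with exponential backoff and jitter. The following is a minimal sketch rather than a definitive implementation: `ThrottledError`, `retry_with_backoff`, and the default delays are hypothetical names and values chosen for illustration. With boto3, a throttle typically surfaces as a `ClientError` whose error code is `ProvisionedThroughputExceededException`, and the AWS SDKs already retry a limited number of times on their own.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for the throttling error raised by a DynamoDB client."""


def retry_with_backoff(operation, max_attempts=5, base_delay=0.05, max_delay=2.0,
                       sleep=time.sleep):
    """Call operation(), retrying on ThrottledError with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the error instead of silently dropping the write.
                raise
            # Cap the exponential delay, then wait a random time below it (full jitter).
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

The `sleep` parameter exists only so that the wait can be replaced in tests. Re-raising after the last attempt lets the caller decide whether to queue the write for later or fail loudly.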

 

QUICK FIX


If your table’s consumed WCU or RCU is at or near the provisioned WCU or RCU, you can alleviate write and read throttles by slowly increasing the provisioned capacity. Be aware of how partitioning in DynamoDB works, and realize that if your application is already consuming 100% capacity, it may take several capacity increases to figure out how much is needed. Increasing capacity by a large amount is not recommended, and may cause throttling issues due to how partitioning works in tables and indexes.

If your table has any global secondary indexes, be sure to review their capacity too. A throttle on an index is also counted as a throttle on the table.

 

THOROUGH FIX


The more elusive issue with throttling occurs when the provisioned WCU and RCU on a table or index far exceed the consumed amount. It is possible to experience throttling on a table using only 10% of its provisioned capacity because of how partitioning works in DynamoDB. When this happens, it is highly likely that you have hot partitions.

Understanding partitions is critical for fixing your issue with throttling. A very detailed explanation can be found here. The important points to remember are:

  • A partition can accommodate only 3,000 RCU or 1,000 WCU
  • A partition can hold only 10 GB of data
  • Partitions are never deleted, even if capacity or stored data decreases
  • When a partition splits, its current throughput and data are split in two, creating two new partitions
  • Not all partitions will have the same provisioned throughput

If you are experiencing throttling on a table or index that has ever had more than 10 GB of data, or more than 3,000 RCU or 1,000 WCU, then it is guaranteed to have more than one partition, and throttling is likely caused by hot partitions. Increasing the capacity of the table or index may alleviate throttling, but may also cause partition splits, which can actually result in more throttling.
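
The limits above can be turned into a rough estimate of how many partitions a table has. This is only a sketch under stated assumptions: the function names are hypothetical, adding the RCU and WCU terms together is an assumed rule of thumb, and the real partition count is managed internally by DynamoDB. Because partitions are never deleted, the inputs should be the highest values the table has ever had, not the current ones.

```python
import math

# Per-partition limits as stated in this article.
MAX_RCU_PER_PARTITION = 3000
MAX_WCU_PER_PARTITION = 1000
MAX_GB_PER_PARTITION = 10


def estimate_partitions(rcu, wcu, size_gb):
    """Estimate the partition count from peak throughput and peak stored data."""
    # Assumed rule of thumb: read and write demand add up against the per-partition limits.
    by_throughput = math.ceil(rcu / MAX_RCU_PER_PARTITION + wcu / MAX_WCU_PER_PARTITION)
    by_size = math.ceil(size_gb / MAX_GB_PER_PARTITION)
    # A table always has at least one partition.
    return max(1, by_throughput, by_size)


def per_partition_share(total_units, partitions):
    """Capacity each partition would get if throughput were split evenly."""
    return total_units / partitions
```

For example, `estimate_partitions(rcu=1000, wcu=2000, size_gb=200)` gives 20 under these assumptions, so an even split would leave each partition about 100 WCU. Since not all partitions have the same provisioned throughput, treat that figure as an approximation.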

Take a look at the access patterns for your data. If you are querying an index where the cardinality of the partition key is low relative to the number of items, that can easily cause throttling if access is not distributed evenly across all keys. If your table uses a global secondary index, then any write to the table also writes to the index. If many writes are occurring on a single partition key for the index, the write to the table will be throttled too, regardless of how well the table's partition key is distributed. To get a very detailed look at how throttling is affecting your table, you can create a support request with Amazon to get more details about access patterns in your table.
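
Before opening a support request, you can get a first impression of skew from your own data. The sketch below assumes you already have a sample of the partition key values your application reads or writes, for example taken from application logs. The `hottest_keys` name and the sample values are hypothetical.

```python
from collections import Counter


def hottest_keys(partition_keys, top_n=3):
    """Return (key, share_of_requests) pairs for the most frequently used partition keys."""
    counts = Counter(partition_keys)
    total = sum(counts.values())
    # most_common() is empty for an empty sample, so total is never used as a zero divisor.
    return [(key, count / total) for key, count in counts.most_common(top_n)]


# Hypothetical sample: one key receives 8 of every 10 requests.
sample = ["us"] * 8 + ["eu"] + ["ap"]
print(hottest_keys(sample))
```

If a handful of keys account for most of the requests, the partitions holding those keys are likely candidates for hot partitions.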

If your table has lots of data, it will have lots of partitions, which will increase the chance of throttled requests since each partition will have very little capacity. A table with 200 GB of data and 2,000 WCU has at most 100 WCU per partition. To help control the size of growing tables, you can use the Time To Live (TTL) feature of DynamoDB. TTL lets you designate an attribute in the table that holds the expiration time of each item. After that time is reached, the item becomes eligible for deletion and is removed in the background. Deleting older data that is no longer relevant can help control tables that are partitioning based on size, which also helps with throttling.
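
As a small illustration of TTL, the sketch below stamps an item with an expiration time before it is written. DynamoDB expects the TTL attribute to be a Number holding a Unix epoch timestamp in seconds. The attribute name `expires_at` and the helper `with_expiry` are hypothetical; the name must match whatever attribute TTL is enabled for on your table.

```python
import time

# Hypothetical attribute name; it must match the attribute configured for TTL on the table.
TTL_ATTRIBUTE = "expires_at"


def with_expiry(item, ttl_seconds, now=None):
    """Return a copy of item that expires ttl_seconds from now (Unix epoch seconds)."""
    now = time.time() if now is None else now
    expiring = dict(item)  # copy so the caller's item is not modified
    expiring[TTL_ATTRIBUTE] = int(now) + ttl_seconds
    return expiring
```

The returned dictionary can then be passed to whatever write call your application already uses. Items are not removed at the exact expiration time, so queries that must never return expired data should still filter on the attribute.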

It is common when first using DynamoDB to try to force your existing schema into the table without recognizing how important the partition key is. If the chosen partition key for your table or index simply does not result in a uniform access pattern, then you may consider making a new table that is designed with throttling in mind. If your use case is write-heavy then choose a partition key with very high cardinality to avoid throttled writes. Consider using a lookup table in a relational database to handle querying, or using a cache layer like Amazon DynamoDB Accelerator (DAX) to help with reads.
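
One common way to raise the cardinality of a partition key is write sharding, where a suffix is appended to a low-cardinality key so that writes spread over several key values. This technique is not described above and is offered only as a sketch: the function names, the `#` separator, and the shard count of 10 are assumptions for illustration.

```python
import hashlib


def sharded_partition_key(base_key, item_id, shard_count=10):
    """Append a deterministic shard suffix so writes for one base_key spread over shard_count keys."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    # Use the first four bytes of the hash to pick a shard in a stable way.
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"{base_key}#{shard}"


def all_shard_keys(base_key, shard_count=10):
    """List every sharded key for base_key; a reader must query each one and merge the results."""
    return [f"{base_key}#{shard}" for shard in range(shard_count)]
```

The trade-off is on the read side: writes are spread out, but reading everything for one base key now takes one query per shard. That makes this a better fit for write-heavy use cases than for read-heavy ones.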


RESOURCES


 Note: Our system uses DynamoDB metrics in Amazon CloudWatch to detect possible issues with DynamoDB. Due to the API limitations of CloudWatch, there can be a delay of as many as 20 minutes before our system can detect these issues.