
Partitioning Schemes For Azure Cosmos DB

If you have read how to create a new container in a Cosmos DB account, you know that you need to specify a partition key when creating a new container. In this article, we will see the role the partition key plays in organizing data both logically and physically.

Partition Key

Every item in a Cosmos DB container is uniquely identified by the combination of its partition key value and its item ID. Together, these two values form the item's index within the container. Hence, choosing the right partition key is critical to your application's performance.
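To make the identification rule concrete, here is a toy in-memory model in Python. It is not the Cosmos DB SDK: the `Container` class and its `upsert` and `read` methods are hypothetical. It only shows that an item is addressed by the pair (partition key value, id), so the same `id` may appear in two different logical partitions.

```python
class Container:
    """Hypothetical in-memory stand-in for a container (not the real SDK)."""

    def __init__(self):
        # Items are keyed by the pair (partition key value, item id).
        self._items = {}

    def upsert(self, pk, item_id, body):
        self._items[(pk, item_id)] = body

    def read(self, pk, item_id):
        # A point read needs both the partition key value and the id.
        return self._items.get((pk, item_id))


c = Container()
c.upsert("Pune", "1", {"name": "Asha"})
c.upsert("Mumbai", "1", {"name": "Ravi"})  # same id, different partition key
print(c.read("Pune", "1"))    # {'name': 'Asha'}
print(c.read("Mumbai", "1"))  # {'name': 'Ravi'}
```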

You need to specify a partition key for every container, and every item has a value for that partition key. All the items that share the same partition key value form a logical partition.

For example, if you have a container that stores information about the residents of a country and you use the city name as the partition key, then all the people from the same city will be placed in the same logical partition.
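The residents example can be sketched in a few lines of Python. The sample records and the `logical_partitions` helper are hypothetical; the helper simply groups items by their partition key value, which is what a logical partition is.

```python
from collections import defaultdict

# Hypothetical sample data; "city" plays the role of the partition key.
residents = [
    {"id": "1", "name": "Asha", "city": "Pune"},
    {"id": "2", "name": "Ravi", "city": "Mumbai"},
    {"id": "3", "name": "Meera", "city": "Pune"},
]


def logical_partitions(items, key):
    """Group item ids by partition key value (one group = one logical partition)."""
    partitions = defaultdict(list)
    for item in items:
        partitions[item[key]].append(item["id"])
    return dict(partitions)


print(logical_partitions(residents, "city"))
# {'Pune': ['1', '3'], 'Mumbai': ['2']}
```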

Selecting a Partition Key

While selecting a partition key, there are a few limitations to keep in mind. Every logical partition has an upper size limit (10 GB at the time of writing). Hence, the partition key should be chosen so that there are many logical partitions, each of small size. While doing so, you also need to consider the nature of your queries.
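Before settling on a key, it can help to estimate how large each logical partition would grow. The sketch below is a hypothetical helper, not an Azure tool: it sums item sizes per partition key value and flags values over the limit. It assumes the 10 GB figure means 10 × 2^30 bytes and that you can supply a `size_of` function for your items.

```python
from collections import Counter

# 10 GB limit quoted above, interpreted here as 10 * 2**30 bytes (an assumption).
LOGICAL_PARTITION_LIMIT_BYTES = 10 * 2**30


def oversized_partitions(items, key, size_of):
    """Return the partition key values whose items together exceed the limit."""
    totals = Counter()
    for item in items:
        totals[item[key]] += size_of(item)  # accumulate size per logical partition
    return sorted(pk for pk, total in totals.items()
                  if total > LOGICAL_PARTITION_LIMIT_BYTES)


# Hypothetical data: "size" stands in for each item's stored size in bytes.
items = [
    {"city": "Pune", "size": 6 * 2**30},
    {"city": "Pune", "size": 5 * 2**30},
    {"city": "Goa", "size": 1 * 2**30},
]
print(oversized_partitions(items, "city", lambda i: i["size"]))  # ['Pune']
```

If a key value shows up here, a finer-grained or synthetic key is usually the remedy.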

A container with its own provisioned throughput has a minimum of 400 request units per second (RU/s). When throughput is provisioned on a database and shared by its containers, the minimum is 100 RU/s per container.

Requests to the same partition key can’t exceed the throughput that’s allocated to a partition. If requests exceed the allocated throughput, requests are rate-limited. So, it’s important to pick a partition key that doesn’t result in “hot spots” within your application.
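The effect of a hot spot can be illustrated with a simplified model in which each partition has a fixed RU budget per second and any request that would exceed it is rejected with HTTP 429 (Too Many Requests), the status code Cosmos DB uses for rate-limited requests. The `simulate_second` function and the numbers are hypothetical; the service's real accounting is internal.

```python
def simulate_second(requests, ru_budget_per_partition):
    """Process one second of requests under a fixed per-partition RU budget.

    requests: list of (partition_id, ru_cost) tuples in arrival order.
    Returns a list of status codes: 200 if served, 429 if rate-limited.
    """
    used = {}
    statuses = []
    for partition_id, cost in requests:
        spent = used.get(partition_id, 0)
        if spent + cost > ru_budget_per_partition:
            statuses.append(429)  # this partition's budget is exhausted
        else:
            used[partition_id] = spent + cost
            statuses.append(200)
    return statuses


# Hypothetical numbers: each partition may spend 100 RU in this second.
hot = [("p0", 10)] * 12                     # 120 RU aimed at one partition
print(simulate_second(hot, 100))            # ten 200s, then two 429s
spread = [("p0", 10)] * 6 + [("p1", 10)] * 6
print(simulate_second(spread, 100))         # all 200
```

The same 120 RU of work succeeds once it is spread over two partitions, which is why even distribution matters.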

While choosing a partition key, consider the attributes that you expect to use most frequently as filters when querying the data, so that those queries can be served from a single partition instead of fanning out across all of them.
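A simplified way to see why query patterns matter: a query that filters on the partition key can be routed to one partition, while a query that does not must check every partition. The `partitions_to_query` helper below is hypothetical, handles equality filters only, and reasons about logical partitions; the real service fans out over physical partitions.

```python
def partitions_to_query(filters, partition_key, all_partition_values):
    """Return the logical partitions a query must touch.

    filters: dict of property -> required value (equality filters only).
    """
    if partition_key in filters:
        return [filters[partition_key]]   # single-partition query
    return list(all_partition_values)     # cross-partition (fan-out) query


cities = ["Pune", "Mumbai", "Delhi"]
print(partitions_to_query({"city": "Pune", "name": "Asha"}, "city", cities))  # ['Pune']
print(partitions_to_query({"name": "Asha"}, "city", cities))  # ['Pune', 'Mumbai', 'Delhi']
```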

Synthetic Partition Key

Choosing the right partition key is always critical for the performance of your application. It is a best practice to use a partition key with many distinct values, such as hundreds or thousands.

The goal is to distribute your data and workload evenly across the items associated with these partition key values. If no such property exists in your data, you can construct a synthetic partition key.

There are several strategies for constructing a synthetic partition key:

  • Concatenate multiple properties of an item
  • Append a random suffix to a property value
  • Append a pre-calculated suffix to a property value
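The three strategies can be sketched as small Python functions. The function names, the `-` and `.` separators, and the modulo rule used for the pre-calculated suffix are illustrative assumptions, not a prescribed format; the resulting string would be stored in a property that serves as the container's partition key.

```python
import random


def concatenated_key(item, props):
    """Strategy 1: join several property values into one key, e.g. 'Pune-2018'."""
    return "-".join(str(item[p]) for p in props)


def random_suffix_key(value, buckets, rng=random):
    """Strategy 2: append a random suffix in [1, buckets] to spread writes."""
    return f"{value}.{rng.randint(1, buckets)}"


def precalculated_suffix_key(value, stable_field, buckets):
    """Strategy 3: derive the suffix from another property so readers can recompute it.

    Assumes stable_field is integer-like (for example an order number).
    """
    return f"{value}.{int(stable_field) % buckets + 1}"


item = {"city": "Pune", "year": 2018, "order_no": 1234}
print(concatenated_key(item, ["city", "year"]))                       # Pune-2018
print(random_suffix_key("2018-08-09", 400))                           # random suffix 1..400
print(precalculated_suffix_key("2018-08-09", item["order_no"], 400))  # 2018-08-09.35
```

The trade-off between the last two: a random suffix spreads writes well, but reading everything for one value means querying all suffixes, whereas a pre-calculated suffix lets a reader that knows the order number compute the exact key.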

Physical Partitions

Azure Cosmos DB transparently and automatically manages the placement of logical partitions on physical partitions. It does this to satisfy the performance and scalability needs of your application.

If your storage or throughput requirements grow in the future, Cosmos DB redistributes the logical partitions across a greater number of physical partitions.
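Cosmos DB uses hash-based partitioning: the partition key value is hashed and each physical partition owns a range of the hash space. The sketch below is a simplified model of that idea. SHA-256 and equal-width ranges are assumptions chosen for illustration, since the service's actual hash function and range boundaries are internal. It shows that when the ranges are split (2 to 4 here), each logical partition moves as a whole into one half of its old range.

```python
import hashlib


def physical_partition(pk_value, num_physical):
    """Map a partition key value to a physical partition index (simplified model).

    Hashes the key into a 64-bit integer and splits the hash space into
    num_physical equal contiguous ranges.
    """
    digest = hashlib.sha256(str(pk_value).encode("utf-8")).digest()
    h = int.from_bytes(digest[:8], "big")   # value in [0, 2**64)
    return (h * num_physical) >> 64         # range index in [0, num_physical)


cities = ["Pune", "Mumbai", "Delhi", "Goa", "Nagpur", "Surat"]
before = {c: physical_partition(c, 2) for c in cities}
after = {c: physical_partition(c, 4) for c in cities}

# Every item with the same city hashes identically, so a logical partition is
# never split; after doubling the ranges, each key lands inside its old range.
assert all(after[c] // 2 == before[c] for c in cities)
```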

Physical partitions are an internal implementation detail of Azure, and you cannot control their size or any other aspect of them.

The throughput of a container is distributed evenly among its physical partitions. So if your partition key does not distribute the data evenly across physical partitions, some partitions may become "hot" and requests to them may be rate-limited (i.e., throttled).
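A small worked example of the even split: with 1,000 RU/s provisioned across 4 physical partitions, each partition gets 250 RU/s. The hypothetical `hot_partitions` helper below flags partitions whose demand exceeds that share. In the sample numbers, total demand (950 RU/s) is below the container's throughput, yet partition 0 would still be throttled.

```python
def hot_partitions(container_rus, num_physical, load_by_partition):
    """Return physical partitions whose demand exceeds their even share of throughput.

    container_rus: provisioned RU/s for the container.
    load_by_partition: dict of physical partition index -> demanded RU/s.
    """
    share = container_rus / num_physical  # throughput is split evenly
    return sorted(p for p, load in load_by_partition.items() if load > share)


# Hypothetical numbers: 1,000 RU/s over 4 physical partitions = 250 RU/s each.
demand = {0: 600, 1: 100, 2: 150, 3: 100}
print(hot_partitions(1000, 4, demand))  # [0]
```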

I hope this article helps you understand how the partition key affects the performance and scalability of your application. Let me know your thoughts.
