Azure Cosmos DB

Ainos I 4:50 pm, 25th October

Today's applications face more constraints than ever. As the number of users grows, applications need to be resilient and to ensure accessibility and data consistency across the globe.

In recent years, microservices-oriented architectures have gained popularity. They provide ways to achieve resiliency, accessibility and self-healing. With this type of architecture, applications can react dynamically: they scale out when the workload increases and scale in when resources are not used. This approach is both performant and cost-effective.

In distributed systems, however, one of the most challenging aspects is managing data. Even when the application is built as a set of hundreds or thousands of services able to handle millions of requests, storage remains one of the most critical components of the system. In some situations, storage can represent a single point of failure. Solutions have been developed to provide redundancy and data replication across many regions, but they require heavy administrative work to keep the behaviour of the subsystem transparent to developers.

Cloud services have been developed to answer this problem. This is what Cosmos DB is all about: a fully managed, globally distributed NoSQL database.


NoSQL 

Cosmos DB is a schema-agnostic NoSQL database. This means documents can be stored in the database without having to honour a strict schema, as is the case with relational databases. Documents are stored as JSON objects and can have any structure. JSON is widely adopted across web standards, which makes it a well-suited format for data.

Here is an example of how data is stored in a container.
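As a minimal sketch of what schema-agnostic storage looks like, the snippet below builds two items with different structures and serialises them to JSON, the format in which documents are stored. The item contents and the helper name `to_documents` are hypothetical and chosen for illustration only; the one property both items share is `id`, which every Cosmos DB item carries.

```python
import json

# Two hypothetical items living in the same container.
# They share an "id" property but otherwise have different structures.
items = [
    {"id": "1", "type": "customer", "name": "Alice", "country": "BE"},
    {
        "id": "2",
        "type": "order",
        "customerId": "1",
        "lines": [{"sku": "A-42", "quantity": 2}],
        "total": 59.9,
    },
]


def to_documents(items):
    """Serialise each item to the JSON text that would be sent to the database."""
    return [json.dumps(item, indent=2) for item in items]


for document in to_documents(items):
    print(document)
```

No schema has to be declared before storing the second item, even though it introduces a nested list that the first item does not have.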

NoSQL versus relational databases is a vast topic that deserves its own dedicated article. However, comparisons commonly come down to consistency versus performance, and this is probably the most important concept to keep in mind. Relational databases ensure data consistency through ACID semantics. However, this consistency comes with heavy trade-offs in performance.

NoSQL, on the other hand, requires more investment from developers to preserve data consistency and accuracy, but can lead to great performance improvements. And that is what a globally distributed database aims for: very high performance!


APIs 

One of the most distinctive characteristics of Cosmos DB is probably the wide choice of APIs it supports. This allows developers to treat Cosmos DB as if it were another technology, while benefiting from the distributed ecosystem behind the scenes. This makes migration work easier, as little or no code change is required.

Supported APIs are listed below: 

    - NoSQL (native API) 

    - MongoDB 

    - PostgreSQL 

    - Cassandra 

    - Gremlin 

    - Table 

The choice of the API is important and will depend on different factors: 

    - Is the storage used for a new application? 

    - Does the team already have experience with another DB? 

    - … 

The decision tree below describes which factors to take into consideration when choosing the API. 


Partitioning 

Cosmos DB is horizontally scalable. This means that instead of scaling up with larger VMs, Cosmos DB scales out by adding servers. 

This leads to one of the fundamental concepts of Cosmos DB: partitions. 

Cosmos DB distinguishes two kinds of partitions: physical and logical. 


Physical partitions 

Cosmos DB scales out by distributing containers across physical partitions. Physical partitions can be viewed as the physical servers used to store the data. A single physical partition can contain many logical partitions. 

The way Cosmos DB manages physical partitions is an internal implementation detail and should not be a concern for developers. Developers should instead focus on logical partitions by identifying the best partition key for the container.


Logical partitions 

In a container, all items are divided into subsets called logical partitions. Logical partitions are formed by using the item’s partition key and all items in a logical partition have the same partition key. 

The placement of logical partitions in a physical partition is entirely managed by Cosmos DB and is completely transparent to developers. The distribution of logical partitions across physical partitions is the key element that allows Cosmos DB to spread the load across servers.
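The relationship between partition keys, logical partitions and physical partitions can be sketched as follows. This is a simplified illustration, not Cosmos DB's actual placement algorithm: the function names, the use of SHA-256 and the modulo step are all assumptions made for the example. The only property it aims to show is that all items with the same partition key end up in the same logical partition, and that each logical partition lives in exactly one physical partition.

```python
import hashlib
from collections import defaultdict


def physical_partition_for(partition_key: str, physical_partition_count: int) -> int:
    """Map a partition key value to a physical partition index (illustrative hashing only)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_partition_count


def place_items(items, key_property, physical_partition_count):
    """Group item ids as: physical partition -> logical partition (key value) -> ids."""
    placement = defaultdict(lambda: defaultdict(list))
    for item in items:
        key = item[key_property]
        physical_id = physical_partition_for(key, physical_partition_count)
        placement[physical_id][key].append(item["id"])
    return placement


# Hypothetical items partitioned by "country".
sample_items = [
    {"id": "1", "country": "BE"},
    {"id": "2", "country": "FR"},
    {"id": "3", "country": "JP"},
    {"id": "4", "country": "BE"},
    {"id": "5", "country": "FR"},
]

for physical_id, logical_partitions in sorted(place_items(sample_items, "country", 4).items()):
    print(physical_id, dict(logical_partitions))
```

Because the mapping depends only on the key value, a physical partition may host several logical partitions, but a logical partition is never split across physical partitions.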

During the analysis phase, identifying the most suitable partition key is critical to ensure performance. Here are a few rules to follow in order to select the best partition key:

    - The partition key must be a property whose value never changes.

    - The partition key should be a string.

    - The partition key must have a high cardinality.

    - The partition key must spread request unit consumption and data storage evenly.
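These rules can be checked roughly against a sample of data before committing to a key. The sketch below is one possible way to do that, under the assumption that a representative sample of items is available; the function `partition_key_report`, its output fields and the sample data are hypothetical. It only measures cardinality and storage skew, not request patterns.

```python
from collections import Counter


def partition_key_report(items, key_property):
    """Summarise how a candidate partition key would split a sample of items."""
    counts = Counter(item[key_property] for item in items)
    total = sum(counts.values())
    return {
        # High cardinality: many distinct values means many logical partitions.
        "distinct_values": len(counts),
        # Even spread: share of items held by the largest logical partition.
        "largest_share": max(counts.values()) / total,
        # String check on the candidate key values.
        "all_strings": all(isinstance(value, str) for value in counts),
    }


# Hypothetical sample: 8 items, 6 of them in the same country.
sample_items = [
    {"id": str(i), "customerId": f"c{i}", "country": "BE" if i < 6 else "FR"}
    for i in range(8)
]

print("country   ", partition_key_report(sample_items, "country"))
print("customerId", partition_key_report(sample_items, "customerId"))
```

In this sample, `country` produces only two logical partitions with one of them holding three quarters of the items, whereas `customerId` spreads the items evenly, which makes it the better candidate by the rules above.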


Throughput 

Cosmos DB expresses the cost of an operation in Request Units (RU). 1 RU is defined as the cost of a point read (fetching by id and partition key) of a 1 KB item.

The RU consumption of an operation is deterministic: the same operation on the same data always costs the same number of RUs.

Numerous factors affect the RU cost of a request: the size of the items you fetch, the way the indexes are defined, the number of indexed properties (expensive for write operations), the complexity of the queries, the type of read (point read or not), and so on.

When working with Cosmos DB, it is essential to understand what kind of operations you expect to perform on the container, as the behaviour of the database depends heavily on the cost of the operations and on the provisioned throughput.
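To make this concrete, a rough throughput budget can be derived from the expected mix of operations. The sketch below is only an estimate under stated assumptions: the point-read cost of 1 RU for a 1 KB item comes from the definition above, while the write and query costs and all request rates are hypothetical placeholders that would have to be measured on a real container.

```python
def required_rus_per_second(workload):
    """Sum the RU/s needed by each kind of operation in the workload."""
    return sum(
        operation["ru_per_operation"] * operation["operations_per_second"]
        for operation in workload
    )


workload = [
    # 1 RU per point read of a 1 KB item, as defined above.
    {"name": "point read", "ru_per_operation": 1.0, "operations_per_second": 500},
    # Hypothetical costs: measure these on your own data and indexes.
    {"name": "write", "ru_per_operation": 6.0, "operations_per_second": 50},
    {"name": "query", "ru_per_operation": 12.0, "operations_per_second": 20},
]

print(required_rus_per_second(workload))  # 500 + 300 + 240 = 1040.0
```

Even with far fewer requests per second, the writes and queries in this example account for more than half of the budget, which is why knowing the operation mix matters more than knowing the raw request count.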

Cosmos DB offers three modes that define how you get charged: provisioned throughput, serverless and autoscale. Each of these modes defines how Cosmos DB reacts to a given volume of requests. Whichever mode you choose, the main metric used for pricing is the number of request units provisioned for or consumed by the database.
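The difference between the three modes can be sketched by looking at which quantity drives the bill in each case. This is a simplified model, not the official pricing formula: the function name and parameters are hypothetical, no prices are included, and the autoscale rule used here (billing on the highest throughput reached, with a floor at one tenth of the configured maximum) is an assumption to be checked against the current Azure documentation.

```python
def billable_request_units(
    mode,
    *,
    provisioned_rus_per_second=0,
    consumed_rus=0,
    peak_rus_per_second=0,
    autoscale_max_rus_per_second=0,
):
    """Return (unit, amount) describing what drives the bill in each mode (simplified)."""
    if mode == "provisioned":
        # Billed on what was reserved, whether or not it was used.
        return ("RU/s", provisioned_rus_per_second)
    if mode == "serverless":
        # Billed on the request units actually consumed.
        return ("RU", consumed_rus)
    if mode == "autoscale":
        # Assumed rule: peak throughput, clamped between 10% of the maximum and the maximum.
        floor = autoscale_max_rus_per_second / 10
        amount = min(max(peak_rus_per_second, floor), autoscale_max_rus_per_second)
        return ("RU/s", amount)
    raise ValueError(f"unknown mode: {mode}")


print(billable_request_units("provisioned", provisioned_rus_per_second=1000))
print(billable_request_units("serverless", consumed_rus=250_000))
print(billable_request_units("autoscale", peak_rus_per_second=200, autoscale_max_rus_per_second=4000))
```

In this model, a mostly idle workload is charged for its reserved capacity in provisioned mode, for its floor in autoscale mode, and only for what it consumed in serverless mode.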


Conclusion 

Cosmos DB is a modern technology that tries to handle modern challenges. 

Development teams need to be productive. Migrations and upskilling are time-consuming tasks that often make the adoption of new products difficult. Cosmos DB addresses this problem by offering a set of APIs that allow near-transparent migrations. In the best case, the only task left to developers is as simple as changing a connection string.

NoSQL databases also follow this logic of productive development. Modern practices tend to avoid heavy, long-term upfront analysis. As a result, software evolves continuously, and a fixed schema becomes difficult to maintain as new features are implemented. NoSQL brings flexibility and allows teams to deliver applications at a higher rate.

Being fast, however, is not enough to meet the new constraints of application development.

Applications tend to be more and more distributed and try to break interdependencies. This is where Cosmos DB steps in. Cosmos DB offers you the guarantee that, eventually, your data will be globally available.

