What is database concurrency ?
Database Concurrency refers to the situation where multiple processes or multiple users access or insert or update some data from database, at the same time.
Consider a module within a bank application which is responsible for showing and updating bank balances for users. Let’s say, there are two customers trying to deposit amount in the same account, but from different branches of that bank, at the same time. Both of those counters read the data for purpose of updating. The read operation happens at same point of time. But, update operation happens one after the other.
What should happen in case both of the counters read the data at same time, but the updates are happening one after other ?
Obviously, it should not override the amount that was added by first counter otherwise balance of account would be incorrect. There are some mechanisms which are applied to ensure consistency of data in such cases.
Concurrency control refers to the mechanism which is used to ensure consistency of data in presence of concurrent modifications. If no concurrency control is applied, then default is, every update is successful. So in case of concurrent read and updates, last update wins.
Some applications might not need concurrency controls because of the way it uses data. For such applications, concurrency control implementation may not be required. So, the decision of whether to implement a concurrency control mechanism should be carefully taken.
Let’s have a look at different concurrency controls that can be applied.
Pessimistic Concurrency (or Locking)
This is stricter concurrency control. It can be used when an application does not want to lose any updated version of data at all. One way to achieve this is by the use of locks. This type of mechanism of concurrency control is called as pessimistic concurrency
When a process or user needs to read the data, the caller requests an exclusive lock on the data. Once the lock is granted, the caller can then read and update data. Other callers who need to perform update on the data, would also request locks. But they will not be granted an exclusive lock until previous lock is released. So other callers would have to wait until lock is released.
Locks can represent either a read-only access or they can be given for update access. When a read-only lock is requested by a caller, the caller cannot perform update operation. Also, if a row has read-only lock, system can decide to grant read-only locks to multiple callers if all existing locks on the data are of read-only type.
If a caller requests update access, then other callers cannot acquire lock on that data until update access lock is released.
This type of concurrency control might be exhaustive as it uses resources to manage locks. It is difficult to implement a program that manages locking. Also, if number of concurrent users grow, then it might also affect performance of the system.
EF Core does not have in-built support pessimistic concurrency control.
This type of mechanism does not use locks. As the name suggests, this approach is optimistic about database concurrency and conflicts. It allows concurrent updates to data and then optionally provides a way to handle conflicts.
One of the way to avoid conflicts is to use a concurrency token to decide if the version of the data that is being updated is older than the version of data in database. This token can be then used to decide if the update should be allowed or not.
A concurrency conflict is the situation when one user reads the data in order to update it, but second user updates the data in database before first user could update it. This is a conflict, because second update operation causes conflict between three values:
- value which was read by first user
- value which was updated by second user
- value which is being updated by first user
Different applications may have different strategies to resolve these conflicts. There are three possible ways to handle such conflicts:
- Last update wins, means the last updated value would be written to database column. This is generally default because it does not require any additional configurations or code to implement this strategy.
- First update wins, all subsequent update operations throw errors. All to those users or services will have to read the updated version to perform new update. This requires implementing some kind of concurrency control.
- Merging updated values, this can be rarely used for related entities. concurrent updates merge the collection of related data if the only change between updates is in the associations. This may be very rare and it is not conflict resolution but a strategy to avoid conflicts and overwriting of associations.
- Manual conflict resolution, this can be an approach which can be used in some critical applications. All conflicted versions are displayed to an admin and then the admin decides which version of data should be stored. This requires custom implementation to identify conflicts and store all conflicted versions in an intermediate store until they are resolved by an admin.
There are different mechanisms that can be applied for handling concurrency conflicts. Let’s have a look at common concurrency controls.