What is CAP Theorem

In theoretical computer science, the CAP theorem, also named as Brewer's theorem after computer scientist Eric Brewer, states that it is impossible for a distributed data store or distributed system to simultaneously provide more than two out of the following three guarantees:

  • Consistency: Every read receives the most recent write or an error. A guarantee that every node in a distributed cluster returns the same, most recent, successful write. Consistency refers to every client having the same view of the data. For this to happen, whenever data is written to one node, it must be instantly forwarded or replicated to all the other nodes in the system before the write is deemed successful to client. Consistency in CAP refers to sequential consistency, a very strong form of consistency. There are various types of consistency models like eventual consistency and strong consistency. 
  • Availability: Every request receives a (non-error) response, without the guarantee that it contains the most recent write. Every non-failing node returns a response for all read and write requests in a reasonable amount of time. The key word here is every. To be available, every node on (either side of a network partition) must be able to respond in a reasonable amount of time whether it is a read or write.
  • Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes. The system continues to function and upholds its consistency guarantees in spite of network partitions. Network partitions are a fact of life. Distributed systems guaranteeing partition tolerance can gracefully recover from partitions once the partition heals.

Distributed system is a system of two or more machines that collaboratively work together to achieve a common goal that is to support application load (read and write requests).

Today, distributed data systems are classified based on the two CAP characteristics they support:

  • CP database: A CP database delivers consistency and partition tolerance at the expense of availability. When a partition occurs between any two nodes, the system has to shut down the non-consistent node (i.e., make it unavailable) until the partition is resolved. When consistency is chosen over availability, the system will return an error or time-out if particular information cannot be updated to other nodes due to network partition or failures.
  • AP database: An AP database delivers availability and partition tolerance at the expense of consistency. When a partition occurs, all nodes remain available but those at the wrong end of a partition might return an older version of data than others. (When the partition is resolved, the AP databases typically re-sync the nodes to repair all inconsistencies in the system.). In other words in this case write conflict resolution has to be done, I guess. When availability is chosen over consistency, the system is will always process the client request and try to return the most recent available version of the information even if it cannot guarantee it is up to date due to network partitioning.
  • CA database: A CA database delivers consistency and availability across all nodes. It can’t do this if there is a partition between any two nodes in the system, however, and therefore can’t deliver fault tolerance.

The CAP theorem implies that in the presence of a network partition, one has to choose between consistency and availability. Note that consistency as defined in the CAP theorem is quite different from the consistency guaranteed in ACID database transactions.



The above diagram is shown as starting point for understanding the CAP theorem. But actually the right scenario is that you have to deal with partition tolerance at all cost.



The part where all three sections intersect is white because it is impossible to have all three properties in networked shared-data systems. A Venn diagram or a triangle is an incorrect visualization of the CAP. Any CAP theorem visualization such as a triangle or a Venn diagram is misleading. The correct way to think about CAP is that in case of a network partition (a rare occurrence) one needs to choose between availability and consistency.

When a network partition failure happens should we decide to:

  • Cancel the operation and thus decrease the availability but ensure consistency.
  • Proceed with the operation and thus provide availability but risk inconsistency.
To prove that:

For example, suppose we have a very simple distributed system with only two server S1 and S2 and assume our distributed system is Consistent, Available and Partition tolerant all at the same time.


Suppose there is network failure, since our system is partition tolerant it suppose to work. During the network failure client sends a write request to server S1. S1 will receive the request and process it.

Our system is consistent, so S1 must update the value in S2 before confirming to the client but S1 cannot update S2 because there is a network failure. In this case, the request to S1 will timeout which means our system is not available.

If our system Available, S1 will respond to the client without waiting for an update in S2. If any client makes a read request to S2 for the same information it will receive older value not the result of recent write. This means our system cannot be consistent when it is available.

This proves CAP theorem.

All distributed system needs partition tolerance because no distributed system is safe from network failure. In presence of a partition tolerance, we can select one out of two options: Consistency and Availability.

In any networked shared-data systems partition tolerance is a must. Network partitions and dropped messages are a fact of life and must be handled appropriately. Consequently, system designers must choose between consistency and availability. Simplistically speaking, a network partition forces designers to either choose perfect consistency or perfect availability. Picking consistency means not being able to answer a client's query as the system cannot guarantee to return the most recent write. This sacrifices availability.

Network partition forces nonfailing nodes to reject clients' requests as these nodes cannot guarantee consistent data. At the opposite end of the spectrum, being available means being able to respond to a client's request but the system cannot guarantee consistency, i.e., the most recent value written. Available systems provide the best possible answer under the given circumstance.

During normal operation (lack of network partition) the CAP theorem does not impose constraints on availability or consistency.

The CAP theorem is responsible for instigating the discussion about the various trade offs in a distributed shared data system. It has played a pivotal role in increasing our understanding of shared data systems. Nonetheless, the CAP theorem is criticized for being too simplistic and often misleading. Over a decade after the release of the CAP theorem, Brewer acknowledges that the CAP theorem oversimplified the choices available in the event of a network partition.

According to Brewer, the CAP theorem prohibits only a “tiny part of the design space: perfect availability and consistency in the presence of partitions, which are rare." System designers have a broad range of options for dealing and recovering from network partitions. The goal of every system must be to "maximize combinations of consistency and availability that make sense for the specific application."

No distributed system is safe from network failures, thus network partitioning generally has to be tolerated. In the presence of a partition, one is then left with two options: consistency or availability. When choosing consistency over availability, the system will return an error or a time out if particular information cannot be guaranteed to be up to date due to network partitioning. When choosing availability over consistency, the system will always process the query and try to return the most recent available version of the information, even if it cannot guarantee it is up to date due to network partitioning.

This is important to understand because as the user demand patterns has changed in the past few decades, it is more and more required to break yourself from traditionally serving the request from single server system towards distributed and well connected as well as communicated system. Communicated here means that all systems know the state of each other to take a collective decision based on some configuration.

But then it also becomes important to understand the pros and cons of distributed system. Please note that we have discussed only the basic idea of CAP theorem here.

References are taken from medium.com, Wikipedia etc.

Comments

Back To Top

Popular posts from this blog

error 18 at 0 depth lookup: self signed certificate

How to check fragmentation in MySQL tables

How to Drop or Remove or Decommission a Database in Oracle