Understand Database Replication: The Good and the Ugly

Nader Medhat
6 min read · Jan 23, 2021

What Is Database Replication?

Data replication is the process of storing the same data on multiple nodes that are connected via a network. The goal is to improve data availability and accessibility, and to make the system more resilient and reliable.

Advantages of replication

Improved reliability and availability

If one node goes down due to faulty hardware, malware attack, or another problem, the data can be accessed from a different node.

Improved network performance

Having the same data on multiple nodes can lower data access latency, since the required data can be retrieved from a node closer to where the transaction is executing. This also lets you keep data geographically close to your users.

Increased data analytics support

Replicating data to a data warehouse empowers distributed analytics teams to work on common projects for business intelligence.

Replication types

Leader-follower replication

How does this model work?

  • One of the nodes is designated as the leader (also known as master or primary), and the other nodes are known as followers (read replicas, slaves, or secondaries).
  • When clients want to write to the database, they must send their requests to the leader, which first writes the new data to its local storage.
  • Whenever the leader writes new data to its local storage, it also sends the data change to all of its followers as part of a replication log or change stream.
  • Each follower takes the log from the leader and updates its local copy of the data.
  • When a client wants to read from the database, it can query either the leader or any of the followers. However, writes are only accepted on the leader.

This mode of replication is a built-in feature of many relational databases, such as PostgreSQL, MySQL, and SQL Server.
It is also used in some nonrelational databases, including MongoDB, RethinkDB, and Espresso.
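
To make the write path concrete, here is a minimal in-memory sketch of the leader-follower flow described above. The `Leader` and `Follower` classes are hypothetical names made up for illustration, and replication here is a plain synchronous method call; a real database would ship the log over the network.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    """Hypothetical follower: applies log entries sent by the leader."""
    data: dict = field(default_factory=dict)

    def apply(self, entry):
        key, value = entry
        self.data[key] = value


@dataclass
class Leader:
    """Hypothetical leader: the only node that accepts writes."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)
    followers: list = field(default_factory=list)

    def write(self, key, value):
        # 1. Write the new data to the leader's local storage.
        self.data[key] = value
        # 2. Record the change in the replication log.
        entry = (key, value)
        self.log.append(entry)
        # 3. Send the change to every follower.
        for follower in self.followers:
            follower.apply(entry)


followers = [Follower(), Follower()]
leader = Leader(followers=followers)
leader.write("user:1", "Alice")

# Reads can be served by the leader or by any follower.
print(leader.data["user:1"], followers[0].data["user:1"])
```

Note that the followers expose no `write` method at all, which mirrors the rule that writes are only accepted on the leader.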

Problems with leader-follower replication

Leader failure

Handling a failure of the leader is trickier: one of the followers needs to be promoted to be the new leader, clients need to be reconfigured to send their writes to the new leader, and the other followers need to start consuming data changes from the new leader.
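
The promotion step can be sketched roughly as follows. This is only an illustration: the `Node` class and `fail_over` function are hypothetical, and picking the follower with the highest log position is one common choice assumed here, not something every database does.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    log_position: int       # number of changes applied so far
    role: str = "follower"
    leader_name: str = ""   # which node this one replicates from


def fail_over(followers):
    """Promote the most up-to-date follower and repoint the others."""
    new_leader = max(followers, key=lambda n: n.log_position)
    new_leader.role = "leader"
    new_leader.leader_name = ""
    for node in followers:
        if node is not new_leader:
            # Remaining followers now consume changes from the new leader.
            node.leader_name = new_leader.name
    return new_leader


# The old leader "a" has failed; "b" and "c" were its followers.
nodes = [Node("b", 41, leader_name="a"), Node("c", 42, leader_name="a")]
new_leader = fail_over(nodes)
print(new_leader.name)
```

Clients would then need to send their writes to whatever `fail_over` returned, which is the reconfiguration step mentioned above.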

Follower failure

On its local disk, each follower keeps a log of the data changes it has received from the leader. If a follower crashes and is restarted, or if the network between the leader and the follower is temporarily interrupted, the follower can recover quite easily: from its log, it knows the last transaction that was processed before the fault occurred. The follower can then connect to the leader and request all the data changes that occurred while it was disconnected. When it has applied these changes, it has caught up to the leader and can continue receiving a stream of data changes as before.
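
Here is a small sketch of that catch-up recovery. The class and method names (`changes_since`, `catch_up`) are invented for this example, and the follower's position is kept in memory here, whereas the text above assumes it survives on disk.

```python
class Leader:
    def __init__(self):
        self.log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        self.log.append((key, value))

    def changes_since(self, position):
        """Return every change after the given log position."""
        return self.log[position:]


class Follower:
    def __init__(self):
        self.data = {}
        self.position = 0  # how many leader log entries have been applied

    def catch_up(self, leader):
        # Ask the leader only for the changes this follower has not seen.
        for key, value in leader.changes_since(self.position):
            self.data[key] = value
            self.position += 1


leader, follower = Leader(), Follower()
leader.write("x", 1)
follower.catch_up(leader)   # follower is in sync

leader.write("x", 2)        # follower is disconnected during these writes
leader.write("y", 3)

follower.catch_up(leader)   # follower reconnects and requests what it missed
print(follower.data, follower.position)
```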

Multi-leader replication

How does this model work?

  • You have a database with replica nodes in several different datacenters (perhaps so that you can tolerate the failure of an entire datacenter, or perhaps in order to be closer to your users).
  • In a normal leader-based replication setup, the leader has to be in one of the datacenters, and all writes must go through that datacenter. In a multi-leader configuration, you can have a leader in each datacenter.
  • When a client sends a write request to any datacenter, the local leader sends the data change to all of its followers as part of a replication log or change stream, and also to the leaders in all the other datacenters.

Some Use Cases for Multi-Leader Replication

Clients with offline operation

Multi-leader replication is a good fit if you have an application that needs to continue working while it is disconnected from the internet.
For example, consider the calendar apps on your mobile phone, your laptop, and other devices. You need to be able to see your meetings (make read requests) and enter new meetings (make write requests) at any time, regardless of whether your device currently has an internet connection. If you make any changes while you are offline, they need to be synced with a server and your other devices when the device is next online.

Collaborative editing

Real-time collaborative editing applications allow several people to edit a document simultaneously. For example, Etherpad and Google Docs allow multiple people to concurrently edit a text document or spreadsheet.

Problems with multi-leader replication

Write Conflicts

The biggest problem with multi-leader replication is that write conflicts can occur, which means that conflict resolution is required.
For example, consider a wiki page that is simultaneously being edited by two users. User 1 changes the title of the page from A to B, and user 2 changes the title from A to C at the same time. Each user’s change is successfully applied to their local leader. However, when the changes are asynchronously replicated, a conflict is detected. This problem does not occur in a single-leader database.
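
The wiki example can be reproduced with a tiny sketch. Everything here is an assumption for illustration: the `Leader` class is hypothetical, and detecting a conflict by comparing the "old" value carried with each change is just one simple approach. The sketch only detects the conflict; how to resolve it is a separate decision.

```python
class Leader:
    def __init__(self, name, title):
        self.name = name
        self.title = title
        self.outbox = []  # changes waiting to be replicated asynchronously

    def local_write(self, new_title):
        # Remember which value the write was based on, so peers can spot conflicts.
        self.outbox.append((self.title, new_title))
        self.title = new_title

    def receive(self, change):
        """Apply a replicated change; return True if it conflicts."""
        old_title, new_title = change
        if self.title != old_title and self.title != new_title:
            return True  # the title was changed concurrently on this leader
        self.title = new_title
        return False


dc1, dc2 = Leader("dc1", "A"), Leader("dc2", "A")
dc1.local_write("B")   # user 1: A -> B, accepted by the local leader
dc2.local_write("C")   # user 2: A -> C, accepted at the same time

# Asynchronous replication happens later, and both sides see a conflict.
conflict_at_dc2 = dc2.receive(dc1.outbox.pop())
conflict_at_dc1 = dc1.receive(dc2.outbox.pop())
print(conflict_at_dc1, conflict_at_dc2)
```

With a single leader, the second write would have been applied on top of the first one, so there would be nothing to detect.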

Leaderless replication

How does this model work?

  • You have many replica nodes, and none of them is a leader.
  • The client sends its writes directly to several replicas. The available replicas accept the write, and any unavailable replica misses it.
  • When a client reads from the database, it doesn’t just send its request to one replica: read requests are also sent to several nodes in parallel. The client may get different responses from different nodes.
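
Since a client may get different responses from different nodes, it needs some way to pick one. A rough sketch, assuming each value is stored with a version number and the client keeps the highest one (the version numbers, the `Replica` class, and `read_latest` are assumptions added for this example, and the reads run sequentially here rather than in parallel):

```python
class Replica:
    def __init__(self):
        self.store = {}        # key -> (version, value)
        self.available = True

    def write(self, key, version, value):
        if not self.available:
            return False       # an unavailable replica misses the write
        self.store[key] = (version, value)
        return True

    def read(self, key):
        if not self.available:
            return None
        return self.store.get(key)


def read_latest(replicas, key):
    """Ask every replica and keep the response with the highest version."""
    responses = [r.read(key) for r in replicas]
    responses = [resp for resp in responses if resp is not None]
    if not responses:
        return None
    return max(responses, key=lambda resp: resp[0])[1]


replicas = [Replica(), Replica(), Replica()]
for r in replicas:
    r.write("k", 1, "old")

replicas[2].available = False      # third replica goes down...
for r in replicas:
    r.write("k", 2, "new")         # ...and misses this write
replicas[2].available = True       # it comes back with stale data

print(read_latest(replicas, "k"))
```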

Problems with leaderless replication

Writing to the Database When a Node Is Down

Imagine you have a database with three replicas, and one of the replicas is currently unavailable; perhaps it is being rebooted to install a system update. In a leader-based configuration, if you want to continue processing writes, you may need to choose a new node to be promoted to leader.
In a leaderless configuration, on the other hand, failover does not exist.
For example, the client (user 1234) sends the write to all three replicas in parallel. The two available replicas accept the write, but the unavailable replica misses it.

Let’s say that it’s sufficient for two out of three replicas to acknowledge the write. After user 1234 has received two OK responses, we consider the write to be successful.
The client simply ignores the fact that one of the replicas missed the write.
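
The two-out-of-three rule fits in a few lines. This is a simplified sketch: `quorum_write` and the dictionary layout of each replica are hypothetical, and the replicas are contacted one after another instead of in parallel.

```python
def quorum_write(replicas, key, value, required_acks=2):
    """Send the write to every replica; succeed if enough of them acknowledge."""
    acks = 0
    for replica in replicas:
        if replica["up"]:
            replica["data"][key] = value
            acks += 1
        # An unavailable replica simply misses the write.
    return acks >= required_acks


replicas = [
    {"up": True, "data": {}},
    {"up": True, "data": {}},
    {"up": False, "data": {}},  # e.g. rebooting to install a system update
]

ok = quorum_write(replicas, "user:1234", "new value")
print(ok)
```

The write counts as successful with two acknowledgements, even though the third replica still holds no data for that key.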

Disadvantages of replication

We’ve seen that data replication has a good number of advantages, but it also has disadvantages that organizations may face.

One of the most common challenges with data replication can stem from data lag or service interruptions while data is being transferred or backed up.

Additionally, as the distance between the replicated data systems and the original copy increases, the process of data replication can become more taxing:

  • Keeping all data current can be a challenge. The more locations you store your data in, the more complex the systems you need to keep track of what’s what.
  • You’ll need more storage space as your data continues to grow. This space can cost you a good chunk of your team budget.
  • Keeping replicas in several locations, perhaps even a dozen, can lead to your organization spending more money on processing and storage.
  • Someone has to be in charge of the backup process. Integrating data replication into an organization’s backup process takes time for the dedicated team to perfect.
  • Keeping all data copies consistent requires an overhaul of procedures and increases network traffic, potentially slowing down work.

Conclusion

To wrap up this article, here are some resources for further reading:

Database Replication (ScienceDirect)

Data replication in the distribution system

Designing Data-Intensive Applications (chapter 5)
