MySQL replication enables data synchronization across databases, powering read scaling and even some complex distributed architectures. At the core lies the binary log (binlog), which is the authoritative record of all data modifications on a source server.
MySQL replication enables data synchronization across databases, powering read scaling and even some complex distributed architectures. At the core lies the binary log (binlog), which is the authoritative record of all data modifications on a source server.
Getting a hang of how binlog replication works is not only fun, but it also builds an intuition as to how other databases would be doing it. More importantly, if you face any replication related issues in production, you know what could have gone wrong.
This essay covers internals along with looking at how events are structured, how threads handle replication, the different modes, and tips for keeping it running smoothly at scale. So, here we go…
MySQL’s binary log keeps a record of every change to the database, essentially serving as the source of truth for what happened and when. Because of this, it supports several key features:
First, it allows read scaling by replicating data to replica servers that can handle read queries. Second, it provides high availability by maintaining standby servers ready to take over if the primary fails. Third, it enables point-in-time recovery by replaying binary logs to restore a database to any moment in time. Fourth, it supports change data capture (CDC) by allowing external systems to consume database changes in real time.
Binary Log
The binary log is a sequence of files that record changes to the database. Each file has a base name (configurable via the log_bin option) followed by a numeric suffix that increments with each new file. An index file tracks all current binary log files.
Plain text
Binary log files roll over either when they reach a certain size (controlled by max_binlog_size, usually 1GB) or whenever the server restarts. The index file is just a plain text list pointing to all the current log files.
Each binary log file begins with a magic number (0xfe626963, which spells “binlog” in ASCII) followed by a format description event that defines how subsequent events should be interpreted. This format description contains versioning information that allows different MySQL versions to understand each other’s binary logs during replication.
Binary Log Event Structure
Every change recorded in the binary log is stored as an event. Events have a 19-byte header followed by event-specific data. Something like this…
Timestamp (4B): when the event occurred
Type code (1B): the event type
Server ID (4B): id of the server that generated the event
Event length (4B): size of the event, including the header
Next position (4B): Position of the next event in the binlog
Flags (2B): Various flags controlling event behavior
Plain text
Common Event Types
MySQL uses several event types, but some important ones are …
FORMAT_DESCRIPTION_EVENT
is the first event in every binary log file. It describes the server version and provides metadata needed to interpret all subsequent events.
QUERY_EVENT
contains SQL statements that were executed on the source. The event includes the database context, the SQL statement text, and various execution metadata.
WRITE_ROWS_EVENT
, UPDATE_ROWS_EVENT
, and DELETE_ROWS_EVENT
contain the actual row data for inserts, updates, and deletes in row-based replication. Update events include both before and after images of modified rows.
4. GTID_LOG_EVENT
contains the global transaction identifier for GTID-based replication. This event appears at the start of each transaction and contains the unique identifier along with logical timestamps used for parallel replication.
5. XID_EVENT
marks the end of a transaction that was committed. The XID
(transaction ID) links the binary log entry to the storage engine’s internal transaction tracking.
6. ROTATE_EVENT
indicates that the server is switching to a new binary log file. This event is appended as the last event before rotating to a new file.
Binary Log Formats
MySQL supports three binary logging formats, each with different tradeoffs for space efficiency, safety, and compatibility.
Statement-based Logging
Statement-based logging (SBR) records the actual SQL statements that were executed. This format is compact because a single statement like UPDATE users SET status = 'active' WHERE created_at < '2024-01-01' is stored as one event regardless of how many rows it affects.
Plain text
However, SBR has limitations. Non-deterministic functions like NOW(), UUID(), or RAND() can produce different results when replayed on a replica. Statements using LIMIT without ORDER BY are also problematic because row ordering may differ between servers.
Row-based Logging
Row-based logging (RBR) records the actual row changes rather than the statements that produced them. Each modified row is stored with its before and after images for updates, or just the relevant image for inserts and deletes.
Plain text
RBR guarantees that replicas receive exactly the same row changes that occurred on the source. The tradeoff is larger binary logs when statements affect many rows. A single UPDATE touching a million rows generates a million row events.
Mixed Logging
Mixed logging (MBR) combines both approaches. MySQL uses statement-based logging by default but automatically switches to row-based logging for statements that would be unsafe to replicate as statements. This balances compactness and safety.
By the way, row-based logging is the default in MySQL 8.0 and is recommended for most production workloads due to its reliability and deterministic behavior.
Replication Architecture
MySQL replication uses an asynchronous, pull-based architecture. The replica initiates connections to the source and requests binary log events, which are then applied locally. This makes replicas to operate independently and recover gracefully from network interruptions.
On the source server:
Binary log: Stores all changes as events
Binlog dump thread: Sends binary log events to connected replicas
On the replica server:
I/O thread (receiver thread): Connects to the source and receives events
Relay log: Local storage for received events
SQL thread (applier thread): Reads relay log and applies events
Worker threads: Apply events in parallel when multi-threaded replication is enabled
Binlog Dump Thread
When a replica connects to a source for replication, the source creates a dedicated binlog dump thread to serve that replica. This thread is responsible for reading events from the binary log and sending them over the network.
The dump thread maintains the binary log position requested by the replica and sends events as they become available. When the source is idle with no new changes, the dump thread waits (not busy-wait) for new events to be written to the binary log.
By the way, you can observe binlog dump threads on the source server using SHOW PROCESSLIST
Plain text
Each connected replica gets its own binlog dump thread. If you have five replicas, the source maintains five dump threads. This is generally lightweight because the threads spend most of their time waiting for new events.
The I/O thread (receiver thread)
The replica’s I/O thread is responsible for connecting to the source server and receiving binary log events. When you start replication with START REPLICA, the I/O thread sets up a connection to the source specified in the CHANGE REPLICATION SOURCE TO command.
Once connected, the I/O thread requests binary log events starting from the position recorded in the replica’s connection metadata. As events arrive, the thread writes them to the local relay log without processing them. This separation of receiving and applying events is an important design decision.
First, it allows the replica to buffer events locally even when the applier is slow or stopped. Second, it enables the replica to catch up quickly after being offline because the I/O thread can pull events as fast as the network allows. Third, it provides a checkpoint mechanism where the relay log position can be used to resume replication after interruptions.
The Relay Log
The relay log (which is on a replica) is structurally identical to the binary log. It consists of numbered files containing the same event format as binary logs. The key difference is that relay logs contain events received from a source server rather than events generated locally.
Plain text
Relay log files are created as the I/O thread receives events from the source. The SQL thread reads from relay logs to apply changes. Once events have been applied, relay log files are automatically deleted to free disk space.
The relay log serves as a buffer between network reception and the local application. This buffering is crucial for handling situations where the applier falls behind the receiver. Without relay logs, a slow application would create backpressure all the way to the source server.
The SQL Thread (applier)
The SQL thread reads events from the relay log and applies them to the local database. In single-threaded mode, this thread processes events sequentially in the exact order they appear in the relay log.
For each event, the SQL thread performs appropriate actions:
QUERY_EVENT
: Executes the SQL statement
WRITE_ROWS_EVENT
: Inserts the specified rows
UPDATE_ROWS_EVENT
: Updates rows matching the before-image to the after-image
DELETE_ROWS_EVENT
: Deletes rows matching the specified image
After successfully applying an event, the SQL thread updates the applier metadata to record its progress. This metadata includes the relay log position and the corresponding source binary log position.
The SQL thread is often the bottleneck in replication because it must apply changes sequentially in single-threaded mode. Even if the source executed many transactions concurrently, the replica applies them one at a time. This is why multi-threaded replication was introduced.
Multi-threaded Replication (MTS)
With multi-threaded replication, a replica can handle multiple transactions at once, because of several worker threads. This really helps when the source is busy with lots of concurrent activity.
Enable multi-threaded replication by setting replica_parallel_workers (or slave_parallel_workers in older versions):
Plain text
When MTS is enabled, the SQL thread becomes a coordinator that reads transactions from the relay log and assigns them to worker threads for execution. The coordinator ensures proper ordering while maximizing parallelism.
MySQL 5.7 introduced logical clock parallelism. The logical clock approach recognizes that transactions committed together in the same group commit on the source are independent and can be applied in parallel. This is based on a key insight: if transactions could commit together on the source without conflicts, they can be applied in parallel on the replica.
The source server adds logical timestamps to each transaction in the binary log. These timestamps indicate the “commit parent” of each transaction, essentially marking which transactions were in the same commit group. The replica coordinator uses these timestamps to schedule parallel execution.
Plain text
MySQL 8 introduced WRITESET-based parallelism, which analyzes the actual rows modified by each transaction. Transactions modifying non-overlapping rows can execute in parallel regardless of commit timing. This provides the highest level of parallelism but requires additional memory for tracking.
Plain text
Semi-synchronous replication
By default, MySQL replication is fully asynchronous. The source commits transactions without waiting for any replica to receive them. This provides the best performance but creates a window where committed transactions could be lost if the source crashes before replicas receive them.
Semi-synchronous replication makes the source pause until at least one replica confirms it got the transaction. So, it gives you a better safety for your data, though it can slow things down a bit.
The semi-synchronous flow works as follows:
Client sends COMMIT
to source
Source writes transaction to binary log
Source waits for replica acknowledgment
Replica receives transaction and writes to relay log
Replica sends ACK
back to source
Source commits in storage engine and returns to client
Semi-synchronous replication includes a timeout mechanism. If no replica acknowledges within the timeout (default 10 seconds), the source falls back to asynchronous replication to prevent blocking indefinitely:
Plain text
You can also require acknowledgment from multiple replicas:
Plain text
Replication filters
MySQL supports filtering which databases, tables, or events are replicated. Filters can be applied on either the source or replica side.
Source-side filtering controls what is written to the binary log:
Plain text
Replica-side filtering controls what is applied from the relay log:
Plain text
Note, if a statement references multiple databases and filtering is in effect, results may be unexpected. Row-based replication with replica-side filtering is generally safer than statement-based replication.
Monitoring Replication Lag
Replication lag is the delay between when a transaction commits on the source and when it is applied on the replica. Some lag is inherent in asynchronous replication, but excessive lag can cause consistency issues for applications reading from replicas.
The traditional method checks Seconds_Behind_Master in SHOW REPLICA STATUS:
Plain text
This metric compares timestamps between the relay log event currently being executed and the most recently received event.
Best Practices for Replication in Production
Deploy replicas with equal or greater resources than the source. A replica with fewer CPU cores, less memory, or slower disks will inevitably lag.
Use row-based replication for safety and predictability. While statement-based logging is more compact, the risk of non-deterministic behavior outweighs the space savings.
Configure crash-safe replication with table-based metadata repositories. This prevents inconsistencies after replica crashes.
Use semi-synchronous replication when durability is critical. Accept the latency overhead to guarantee that committed transactions exist on at least one replica.
Monitor replication lag continuously and alert when it exceeds acceptable thresholds. Integrate lag monitoring into your application health checks.
Keep primary keys on all replicated tables. This is the single most impactful optimization for row-based replication performance.
Test failover procedures regularly. Replication is only valuable for high availability if you can actually promote a replica when needed.
Replication Topology Patterns
MySQL replication supports various topology patterns beyond simple source-to-replica setups.
Chain replication connects replicas in sequence: A replicates to B, B replicates to C. This reduces load on the primary but increases lag for downstream replicas. Enable log_replica_updates on intermediate servers so they write received transactions to their own binary logs.
Plain text
Multi-source replication allows a single replica to receive changes from multiple sources. Each source connection is managed through a replication channel:
Plain text
This pattern is useful for aggregating data from multiple shards into a central reporting database. Each channel has independent I/O and SQL threads.
Binary Log Disk Management
Binary logs can eat up a lot of disk space on busy servers. It’s important to make sure you don’t run out of room, while still keeping enough history for recovery or setting up new replicas.
By the way, we can purge old binary logs based on time or space with the following options:
Plain text
Manual purging can target specific files:
Plain text
We can also monitor binary log disk usage:
Plain text
Footnote
MySQL binlog replication has a pull-based architecture where replicas request events from sources, storing them in relay logs before applying.
The replication supports statement and row-based logging, with GTIDs providing transaction tracking and multi-threaded appliers enabling parallel execution. Semi-synchronous replication offers stronger durability guarantees when needed.
The binary log isn’t only about replication, rather, it is also the ‘backbone’ for things like disaster recovery, real-time data capture, and broader database architecture.
Replication workflow is different in different databases, and nuances are what make them special.