Database deadlocks are one of the most challenging concurrency issues encountered in production systems. Understanding why they occur and how databases handle them is important in building robust applications.
Database deadlocks are one of the most challenging concurrency issues encountered in production systems. Understanding why they occur and how databases handle them is important in building robust applications.
Let’s dig deeper and explore the mechanics of deadlocks, their underlying causes, detection mechanisms, and potential resolution strategies.
Database Deadlocks
A deadlock occurs when two or more transactions are waiting indefinitely for each other to release locks, creating a circular dependency that prevents any of the transactions from proceeding.
Txn A holds a lock on resource X and waits for a lock on resource Y
Txn B holds a lock on resource Y and waits for a lock on resource X
Neither transaction can proceed, creating an infinite wait condition, making them wait forever.
The Classic Example
Consider this scenario with two transactions executing simultaneously
Plain text
Txn 1 locks account 1 and waits for account 2, while Txn 2 locks account 2 and waits for account 1. This creates a circular wait condition - a classic example of a deadlock.
Why Deadlocks Occur
Lock Ordering Issues
The most common cause of deadlocks is inconsistent lock ordering across transactions. When different transactions acquire locks on the same resources in different orders, circular dependencies become inevitable with sufficient concurrency.
Real-world example: In an e-commerce system, one transaction might update inventory first, then the user account, while another updates the user account first, then inventory. Under high load, these transactions will eventually deadlock.
Transaction Design and Scope
Long-running transactions increase deadlock probability by holding locks for extended periods. The longer a transaction runs, the more likely it is to conflict with other transactions.
Problematic pattern
Plain text
Index and Query Plan Variations
Database optimizers can choose different execution plans for similar queries, leading to different lock acquisition orders. This is particularly problematic with range queries and complex joins.
Plain text
Lock Escalation
Lock escalation—where databases convert many fine-grained locks into fewer coarse-grained locks for memory efficiency—can create unexpected deadlock scenarios. A row-level deadlock might escalate to a table-level deadlock, affecting entirely different transactions.
Foreign Key Constraints
Foreign key relationships can create hidden lock dependencies. When you update a parent table, the database may need to check or lock related child table records, creating unexpected lock chains.
Deadlock Detection
Databases employ sophisticated algorithms to detect deadlocks automatically. The most common approach is the wait-for-graph algorithm.
Wait-for Graph Algorithm
The database maintains a directed graph where
Nodes represent active transactions
Edges represent wait relationships (Transaction A waits for Transaction B)
A cycle in the graph indicates a deadlock
Detection process
Periodically (typically every few seconds), the database examines all waiting transactions
Constructs a wait-for graph based on current lock requests and holdings
Searches for cycles using graph traversal algorithms
When a cycle is found, identify it as a deadlock
Some databases also do deadlock prevention, which means all the above steps, but right before granting a lock to a transaction. The lock manager typically kills the transaction that asked for the lock that could have led to the deadlock.
Detection Frequency and Performance
Deadlock detection isn’t free; it requires CPU cycles and can impact performance. Databases balance detection frequency with system overhead
PostgreSQL: Checks for deadlocks every 1 second by default (deadlock_timeout
)
MySQL: Uses immediate deadlock detection for some scenarios, periodic checking for others
SQL Server: Runs deadlock detection every 5 seconds, with more frequent checks under high contention
Deadlock Resolution Strategies
When a deadlock is detected via a periodic check, databases must break the cycle by terminating one or more transactions. This process involves victim selection and recovery mechanisms.
Victim Selection Algorithms
Databases use various criteria to choose which transaction to abort
Transaction That Started Detection
The transaction that started deadlock detection (usually newer) is the one that gets killed.
Work-based Selection
Abort the transaction that has done the least work
Minimizes the amount of rollback required
Measured by number of log records, CPU time, or rows modified
Time-based Selection
Abort the newest transaction (assuming older transactions are more important)
Or abort the oldest transaction (to prevent starvation)
Priority-based Selection
Some databases allow setting transaction priorities
Lower-priority transactions are chosen as victims
Useful for distinguishing between critical and background operations
Rollback Cost Estimation
Calculate the cost of rolling back each transaction
Consider factors like transaction size, complexity, and resource usage
Choose the transaction with the lowest rollback cost
Database-Specific Deadlock Handling
PostgreSQL
PostgreSQL prefers lazy detection i.e., it checks for deadlocks after a timeout period deadlock_timeout (default 1s) and at that time builds a full wait-for graph.
Plain text
Thus, we see that PostgreSQL runs with an assumption that most lock waits resolve quickly, so it doesn’t waste CPU on immediate checks.
Advantages:
Lower CPU overhead during normal operation (lazy detection)
Better performance when lock waits are typically short
More predictable behavior under high concurrency
Disadvantages:
Minimum 1-second delay before deadlock detection
Can lead to longer overall wait times in deadlock scenarios
Example
Plain text
PostgreSQL Output
Plain text
MySQL (InnoDB) Approach
MySQL uses a more aggressive approach where, for a simple two-transaction deadlock, detection is instantaneous, but for complex scenarios, it runs detection every few seconds (~5s) i.e., it falls back to timeout-based resolution (innodb_lock_wait_timeout)
Plain text
Thus, MySQL runs with the assumption that deadlocks should be resolved as quickly as possible
Advantages:
Immediate detection for simple deadlocks
Faster resolution when deadlocks occur
Better for applications that can’t tolerate long waits
Disadvantages:
Higher CPU overhead from constant monitoring
A more complex detection algorithm can have edge cases
Example
Plain text
MySQL Output
Plain text
Prevention Strategies
Consistent Lock Ordering
Establish and enforce consistent ordering of resource access across all transactions:
Plain text
Minimize Transaction Scope
Keep transactions as short as possible:
Plain text
Retry Logic
Build retry mechanisms for deadlock errors
Plain text
Optimize Query Performance
Faster queries hold locks for shorter periods:
Plain text
Monitoring Deadlocks
Logging
Enable comprehensive deadlock logging
Plain text
Refer Slow Query Logs
Regularly analyze slow and problematic queries, because most queries prone to deadlocks are the ones that tend to wait for other transactions to complete.
Hence, monitoring slow queries and periodically analyzing them is the best chance to proactively spot potential deadlocks.
Plain text
Performance Impact and Trade-offs
Detection Overhead
More frequent detection = faster resolution but higher CPU usage
Less frequent detection = lower overhead but longer wait times
Lock Granularity
Fine-grained locks = better concurrency, but more deadlock potential and high overhead
Coarse-grained locks = fewer deadlocks but reduced parallelism
Footnotes
Database deadlocks are an inevitable consequence of concurrent data access in multi-user systems. While they cannot be eliminated, understanding their causes and applying proper prevention, detection, and resolution strategies can minimize their impact on application performance and user experience.
Remember: deadlocks are not bugs to be fixed but a fundamental challenge of concurrent systems. It is therefore our responsibility, and it is honest fun to understand how transactions interact and why they may lead to deadlocks.