InkdownInkdown
Start writing

Study

59 files·8 subfolders

Shared Workspace

Study
core

08-replication-ha

Shared from "Study" on Inkdown

08 - Replication & High Availability

Why Replication?

Replication serves multiple purposes:

  1. High Availability: If primary fails, replica takes over
  2. Read Scaling: Distribute read load across replicas
  3. Disaster Recovery: Geographic redundancy
  4. Backups: Run backups without impacting primary
  5. Analytics: Run reports on replica, not production

Types of Replication

1. Physical (Binary) Replication

Replicates the exact disk blocks from primary to replica.

Plain text
┌─────────────────────────────────────────────────────────────┐
│              Physical Replication                            │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Primary                     Replica                         │
│  ┌─────────┐               ┌─────────┐                       │
│  │ Data    │◄─────────────►│  Copy   │  Identical bytes     │
│  │ Files   │   WAL stream │  of     │                       │
│  │ WAL ────┼──────────────►│  Data   │  Block-level         │
│  └─────────┘               └─────────┘                       │
│                                                              │
│  How it works (PostgreSQL Streaming Replication):           │
│  1. Primary writes to Write-Ahead Log (WAL)                 │
│  2. WAL records streamed to replica in real-time            │
│  3. Replica applies WAL to its copy of data                │
│                                                              │
│  Characteristics:                                           │
│  - Exact copy (same data files)                             │
│  - All databases replicated                                 │
│  - Replica is read-only (hot standby)                       │
│  - Low lag (milliseconds to seconds)                         │
│                                                              │
└─────────────────────────────────────────────────────────────┘
programming-language-concepts.md
zero-language-explanation.md
DB
01-introduction.md
02-relational-databases.md
03-database-design.md
04-indexing.md
05-transactions-acid.md
06-nosql-databases.md
07-query-optimization.md
08-replication-ha.md
09-sharding-partitioning.md
10-caching-strategies.md
11-cap-theorem.md
12-connection-pooling.md
13-backup-recovery.md
14-monitoring.md
15-database-selection.md
README.md
JS
Event loop
Merlin Backend
01-Orchestration.md
02-DeepResearch.md
03-Search.md
04-Scraping.md
05-Streaming.md
06-MultiProviderLLM.md
07-MemoryAndContext.md
08-ErrorHandling.md
09-RateLimiting.md
10-TaskQueue.md
11-SecurityAndAuth.md
Orchestration-2nd-draft
OpenAI Agents Python
00_OVERVIEW.md
01_AGENT_SYSTEM.md
02_RUNNER_SYSTEM.md
03_TOOL_SYSTEM.md
04_ITEMS_SYSTEM.md
05_GUARDRAILS.md
06_HANDOFFS.md
07_MEMORY_SESSIONS.md
08_MODEL_PROVIDERS.md
09_SANDBOX_SYSTEM.md
10_TRACING.md
11_RUN_STATE.md
12_CONTEXT.md
13_LIFECYCLE_HOOKS.md
14_CONFIGURATION.md
15_ERROR_HANDLING.md
16_STREAMING.md
17_EXTENSIONS.md
18_MCP_INTEGRATION.md
19_BEST_PRACTICES.md
20_ARCHITECTURE_PATTERNS.md
opencode-study
context-handling
core
Python
Alembic
Basics
sqlalchemy - fastapi
SQLAlchemy overview
tweets
system_design_for_agentic_apps.md

PostgreSQL Streaming Replication Setup:

Bash
# Primary configuration (postgresql.conf)
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB
max_replication_slots = 10
hot_standby = on

# Create replication user
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secret';

# pg_hba.conf - allow replica connections
host replication replicator replica_ip/32 scram-sha-256

# Replica setup
pg_basebackup -h primary -D /var/lib/postgresql/data -U replicator -P -v -R
# -R creates standby.signal (marks as replica)
# replica starts and streams WAL from primary
2. Logical Replication

Replicates changes at the SQL level (INSERT, UPDATE, DELETE).

Plain text
┌─────────────────────────────────────────────────────────────┐
│              Logical Replication                             │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Primary                     Replica                         │
│  ┌─────────┐               ┌─────────┐                       │
│  │ INSERT  │               │ INSERT  │                       │
│  │ INTO    │──────────────►│ INTO    │  SQL statements      │
│  │ users   │   Logical     │ users   │  (or row changes)     │
│  │ VALUES..│   Decoding    │ VALUES..│                       │
│  └─────────┘               └─────────┘                       │
│                                                              │
│  Advantages over physical:                                   │
│  - Selective replication (specific tables)                  │
│  - Cross-version replication                                 │
│  - Can write to replica (multi-master possible)             │
│  - Different schema on replica                              │
│                                                              │
│  Use cases:                                                  │
│  - Upgrades (PG 14 → 15)                                     │
│  - Aggregating data from multiple sources                   │
│  - Selective replication to data warehouse                  │
│                                                              │
└─────────────────────────────────────────────────────────────┘

PostgreSQL Logical Replication:

Sql
-- Primary: Create publication
CREATE PUBLICATION mypub FOR TABLE users, orders;

-- Replica: Create subscription
CREATE SUBSCRIPTION mysub 
    CONNECTION 'host=primary dbname=mydb user=replicator'
    PUBLICATION mypub;

-- Check replication status
SELECT * FROM pg_stat_subscription;
3. Statement-Based Replication (MySQL)
Sql
-- MySQL can replicate the actual SQL statements
-- Binary logging of statements

-- Pros: Less data to transfer
-- Cons: Non-deterministic functions (NOW(), RAND()) can cause divergence

-- Row-based is more common now (safer)
-- Mixed mode: statement when safe, row when not

Replication Topologies

1. Primary-Replica (Single Primary)
Plain text
┌─────────────────────────────────────────────────────────────┐
│              Primary-Replica                                 │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│              ┌─────────┐                                    │
│    Writes ───► Primary │                                    │
│              │  (R/W)  │                                    │
│              └────┬─────┘                                    │
│                   │ Replication                             │
│          ┌────────┼────────┐                                │
│          ▼        ▼        ▼                                │
│      ┌───────┐ ┌───────┐ ┌───────┐                        │
│      │Replica│ │Replica│ │Replica│  ◄─── Reads             │
│      │ (RO)  │ │ (RO)  │ │ (RO)  │                        │
│      └───────┘ └───────┘ └───────┘                        │
│                                                              │
│  Read replica lag: Typically milliseconds to seconds         │
│                                                              │
└─────────────────────────────────────────────────────────────┘
2. Multi-Primary (Multi-Master)
Plain text
┌─────────────────────────────────────────────────────────────┐
│              Multi-Primary                                   │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│         ┌───────────┐                                       │
│    ─────► Primary 1 │◄─────────────────────────┐             │
│   │     │  (US)     │                        │             │
│   │     └─────┬─────┘                        │             │
│   │           │                              │             │
│   │     Replication                          │             │
│   │           │                              │             │
│   │     ┌─────▼─────┐                        │             │
│   └─────┤  Primary 2 │◄───────────────────────┤             │
│         │  (EU)     │                        │             │
│         └───────────┘                        │             │
│                                              │             │
│  Challenges:                                  │             │
│  - Write conflicts (same row updated on both)              │
│  - Auto-increment coordination                               │
│  - Split-brain scenarios                                     │
│                                                              │
│  Solutions: Conflict resolution strategies                    │
│  - Last-write-wins (timestamp)                             │
│  - Application-level conflict handling                       │
│                                                              │
└─────────────────────────────────────────────────────────────┘
3. Cascading Replication
Plain text
┌─────────────────────────────────────────────────────────────┐
│              Cascading Replication                           │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│              ┌─────────┐                                    │
│              │ Primary │                                    │
│              └────┬─────┘                                    │
│                   │                                         │
│              ┌────▼────┐                                     │
│              │ Replica │  (cascade source)                  │
│              │   1     │                                     │
│              └────┬────┘                                     │
│          ┌────────┼────────┐                                │
│          ▼        ▼        ▼                                │
│      ┌───────┐ ┌───────┐ ┌───────┐                        │
│      │Rep 2A │ │Rep 2B │ │Rep 2C │                        │
│      └───────┘ └───────┘ └───────┘                        │
│                                                              │
│  Benefits:                                                   │
│  - Reduces load on primary                                   │
│  - Geographic distribution                                   │
│                                                              │
└─────────────────────────────────────────────────────────────┘
4. Ring Topology
Plain text
┌─────────────────────────────────────────────────────────────┐
│              Ring Replication                                │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│         ┌──────────►┌──────────►┌──────────►┐               │
│         │  Node 1  │  Node 2  │  Node 3   │               │
│         │   (US)   │   (EU)   │   (Asia)  │               │
│         └──────────┴──────────┴───────────┘               │
│              ▲                              │               │
│              └───────────────────────────────┘               │
│                                                              │
│  Each node is primary for its shard/region                  │
│  Replicates to next node in ring                            │
│                                                              │
└─────────────────────────────────────────────────────────────┘

High Availability (HA)

Measuring Availability
Plain text
Availability = Uptime / (Uptime + Downtime)

"9s" of availability:
┌─────────────────┬──────────────────────┐
│ Availability    │ Downtime per year    │
├─────────────────┼──────────────────────┤
│ 99% (2 nines)   │ 3.65 days            │
│ 99.9% (3 nines) │ 8.76 hours           │
│ 99.99% (4 nines)│ 52.6 minutes         │
│ 99.999% (5 nines│ 5.26 minutes         │
│ 99.9999% (6 n)  │ 31.5 seconds         │
└─────────────────┴──────────────────────┘
Failover Strategies
Manual Failover
Bash
# 1. Stop primary (if not already down)
# 2. Promote replica
pg_ctl promote -D /var/lib/postgresql/data

# 3. Update application connection strings
# 4. Rebuild old primary as replica
Automatic Failover with Patroni
YAML
# Patroni configuration
scope: my_cluster
namespace: /db/
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: node1:8008

etcd:
  hosts: etcd1:2379,etcd2:2379,etcd3:2379

postgresql:
  listen: 0.0.0.0:5432
  connect_address: node1:5432
  data_dir: /var/lib/postgresql/data
  
  # Automatic failover happens when primary fails
  # Leader election via etcd consensus
Connection Handling During Failover
Plain text
┌─────────────────────────────────────────────────────────────┐
│              Connection Routing                              │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────┐                                            │
│  │ Application │                                            │
│  └──────┬──────┘                                            │
│         │                                                    │
│         │ Connects to                                        │
│         ▼                                                    │
│  ┌─────────────────┐                                         │
│  │  PgBouncer /    │  Connection pooler                     │
│  │  HAProxy /      │  Routes to current primary             │
│  │  Virtual IP     │                                         │
│  └────────┬────────┘                                         │
│           │                                                  │
│     ┌─────┴─────┐                                            │
│     ▼           ▼                                            │
│ ┌───────┐   ┌───────┐                                        │
│ │Primary│   │Replica│  ◄── On failover, VIP switches           │
│ │(old)  │   │(new)  │      Apps see no change                │
│ └───┬───┘   └───────┘                                        │
│     │                                                        │
│   Fails                                                      │
│     │                                                        │
│     ▼                                                        │
│  ┌───────┐                                                   │
│  │Replica│  Promoted to Primary                              │
│  │(new)  │                                                   │
│  └───────┘                                                   │
│                                                              │
└─────────────────────────────────────────────────────────────┘
Synchronous vs Asynchronous Replication
Plain text
┌─────────────────────────────────────────────────────────────┐
│              Synchronous Replication                         │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Primary              Replica                                │
│  ┌─────────┐          ┌─────────┐                           │
│  │ COMMIT  │─────────►│ Apply   │                           │
│  │ Wait... │          │ WAL     │                           │
│  │         │◄─────────│ ACK     │                           │
│  │ Return  │          │         │                           │
│  └─────────┘          └─────────┘                           │
│                                                              │
│  No data loss on primary failure                             │
│  But: Higher latency (round trip time added to every commit)  │
│                                                              │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│              Asynchronous Replication                        │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Primary              Replica                                │
│  ┌─────────┐          ┌─────────┐                           │
│  │ COMMIT  │─────────►│ Receive│                           │
│  │ Return  │  (async) │ (later) │                           │
│  │         │          │ Apply   │                           │
│  └─────────┘          └─────────┘                           │
│                                                              │
│  Low latency for writes                                      │
│  Risk: Some data loss if primary fails (unreplicated WAL)    │
│  Window: typically milliseconds to seconds                   │
│                                                              │
└─────────────────────────────────────────────────────────────┘

PostgreSQL Configuration:

Ini
# Asynchronous (default)
synchronous_commit = off  # or local

# Synchronous
synchronous_commit = remote_apply
synchronous_standby_names = 'replica1, replica2'
# Wait for at least 1 of these to confirm

Replication Lag Monitoring

Sql
-- PostgreSQL: Check replication lag
-- On replica:
SELECT 
    EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds;

-- Or using pg_stat_wal_receiver
SELECT 
    conninfo,
    received_lsn,
    latest_end_lsn,
    EXTRACT(EPOCH FROM (now() - latest_end_time)) AS lag_seconds
FROM pg_stat_wal_receiver;

-- On primary: Check connected replicas
SELECT 
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    write_lag,
    flush_lag,
    replay_lag
FROM pg_stat_replication;

Common HA Architectures

1. Cloud Managed (AWS RDS, GCP Cloud SQL, Azure)
Plain text
┌─────────────────────────────────────────────────────────────┐
│              Cloud Managed Database                          │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  You get:                                                    │
│  - Automatic failover                                        │
│  - Read replicas                                             │
│  - Automated backups                                         │
│  - Monitoring                                                │
│  - Patch management                                          │
│                                                              │
│  Trade-off: Less control, higher cost                       │
│                                                              │
└─────────────────────────────────────────────────────────────┘
2. Self-Managed with Patroni + etcd
Plain text
┌─────────────────────────────────────────────────────────────┐
│              Patroni Cluster                                 │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   ┌─────────┐   ┌─────────┐   ┌─────────┐                  │
│   │  etcd1  │   │  etcd2  │   │  etcd3  │  Consensus store   │
│   └────┬────┘   └────┬────┘   └────┬────┘                  │
│        └─────────────┼─────────────┘                       │
│                      │                                       │
│   ┌──────────────────┼──────────────────┐                 │
│   │        ┌─────────▼─────────┐          │                 │
│   │        │     Leader       │◄─────────┼─── Writes      │
│   │        │   (Primary PG)   │          │                 │
│   │        └─────────┬─────────┘          │                 │
│   │                  │ Replication        │                 │
│   │        ┌─────────▼─────────┐          │                 │
│   │        │    Replica       │          │                 │
│   │        │   (Hot Standby)  │◄─────────┼─── Reads       │
│   │        └─────────────────┘          │                 │
│   │           Patroni/HAProxy            │                 │
│   └───────────────────────────────────────┘                 │
│                                                              │
│  Automatic failover, leader election                         │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Split-Brain Problem

Plain text
┌─────────────────────────────────────────────────────────────┐
│              Split-Brain Scenario                            │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Normal:                                                     │
│  ┌─────────┐              ┌─────────┐                      │
│  │Primary  │◄────────────►│Replica  │                      │
│  │Node A   │  Connection  │Node B   │                      │
│  └─────────┘              └─────────┘                      │
│                                                              │
│  Network partition:                                         │
│  ┌─────────┐     X     ┌─────────┐                          │
│  │Primary  │◄──lost──►│Replica  │                          │
│  │Node A   │ connection│Node B   │                          │
│  └─────────┘              └─────────┘                          │
│       │                       │                              │
│       ▼                       ▼                              │
│  A thinks: "I'm primary"    B thinks: "A is dead,            │
│                             I should become primary!"       │
│                                                              │
│  Result: TWO primaries! Data diverges...                    │
│                                                              │
│  Solutions:                                                  │
│  1. Quorum-based (need majority of nodes to elect leader)    │
│  2. STONITH (Shoot The Other Node In The Head)               │
│  3. Manual intervention for reconciliation                   │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Next: Sharding & Partitioning