InkdownInkdown
Start writing

Study

59 files·8 subfolders

Shared Workspace

Study
core

13-backup-recovery

Shared from "Study" on Inkdown

13 - Backup & Recovery

Why Backups Matter

Plain text
┌─────────────────────────────────────────────────────────────┐
│              Real-World Data Loss Scenarios                  │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  1. Accidental DELETE without WHERE                         │
│     DELETE FROM users;  -- Oops, no WHERE!                    │
│     100,000 users gone in 1 second                          │
│                                                              │
│  2. DROP TABLE in production                                │
│     DROP TABLE orders;  -- Wrong terminal!                   │
│     Table structure + all data gone                           │
│                                                              │
│  3. Ransomware/Malware                                        │
│     Database encrypted by attacker                          │
│     Must restore from clean backup                           │
│                                                              │
│  4. Hardware failure                                          │
│     Disk corruption, RAID failure                             │
│     Primary and replica affected                              │
│                                                              │
│  5. Application bug                                           │
│     Bug updates wrong rows                                  │
│     Data corruption spread over days                         │
│                                                              │
│  6. Natural disaster                                          │
│     Data center flood/fire                                  │
│     Need off-site backup                                      │
│                                                              │
│  Recovery Time Objective (RTO): How fast to recover?         │
│  Recovery Point Objective (RPO): How much data loss?         │
│                                                              │
└─────────────────────────────────────────────────────────────┘
programming-language-concepts.md
zero-language-explanation.md
DB
01-introduction.md
02-relational-databases.md
03-database-design.md
04-indexing.md
05-transactions-acid.md
06-nosql-databases.md
07-query-optimization.md
08-replication-ha.md
09-sharding-partitioning.md
10-caching-strategies.md
11-cap-theorem.md
12-connection-pooling.md
13-backup-recovery.md
14-monitoring.md
15-database-selection.md
README.md
JS
Event loop
Merlin Backend
01-Orchestration.md
02-DeepResearch.md
03-Search.md
04-Scraping.md
05-Streaming.md
06-MultiProviderLLM.md
07-MemoryAndContext.md
08-ErrorHandling.md
09-RateLimiting.md
10-TaskQueue.md
11-SecurityAndAuth.md
Orchestration-2nd-draft
OpenAI Agents Python
00_OVERVIEW.md
01_AGENT_SYSTEM.md
02_RUNNER_SYSTEM.md
03_TOOL_SYSTEM.md
04_ITEMS_SYSTEM.md
05_GUARDRAILS.md
06_HANDOFFS.md
07_MEMORY_SESSIONS.md
08_MODEL_PROVIDERS.md
09_SANDBOX_SYSTEM.md
10_TRACING.md
11_RUN_STATE.md
12_CONTEXT.md
13_LIFECYCLE_HOOKS.md
14_CONFIGURATION.md
15_ERROR_HANDLING.md
16_STREAMING.md
17_EXTENSIONS.md
18_MCP_INTEGRATION.md
19_BEST_PRACTICES.md
20_ARCHITECTURE_PATTERNS.md
opencode-study
context-handling
core
Python
Alembic
Basics
sqlalchemy - fastapi
SQLAlchemy overview
tweets
system_design_for_agentic_apps.md

Types of Backups

1. Physical (Binary) Backups
Plain text
┌─────────────────────────────────────────────────────────────┐
│              Physical Backup                                 │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  What: Copy actual database files                           │
│  - Data files                                                 │
│  - WAL (Write-Ahead Log) files                              │
│  - Configuration files                                        │
│                                                              │
│  Tools:                                                      │
│  - PostgreSQL: pg_basebackup, pgBackRest, Barman            │
│  - MySQL: XtraBackup, mysqldump (for small DBs)             │
│  - Filesystem: LVM snapshots, ZFS snapshots                 │
│                                                              │
│  Pros:                                                       │
│  - Fast (filesystem-level copy)                             │
│  - Exact replica (including indexes, etc.)                   │
│  - Point-in-time recovery possible                          │
│  - Can use for replication setup                            │
│                                                              │
│  Cons:                                                       │
│  - Database must be running (usually)                         │
│  - Platform-specific (can't restore Linux backup to Windows)│
│  - Storage size = database size                               │
│                                                              │
└─────────────────────────────────────────────────────────────┘
2. Logical (SQL) Backups
Plain text
┌─────────────────────────────────────────────────────────────┐
│              Logical Backup                                  │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  What: Export as SQL statements                               │
│  - CREATE TABLE statements                                    │
│  - INSERT statements for data                                │
│  - Schema definitions                                         │
│                                                              │
│  Tools:                                                      │
│  - PostgreSQL: pg_dump, pg_dumpall                            │
│  - MySQL: mysqldump, mydumper                                │
│  - MongoDB: mongodump                                         │
│                                                              │
│  Pros:                                                       │
│  - Human-readable (somewhat)                                 │
│  - Cross-platform compatible                                  │
│  - Can restore to different database version                  │
│  - Can restore subset of tables                               │
│  - Compressed size often smaller                             │
│                                                              │
│  Cons:                                                       │
│  - Slow for large databases (millions of rows)               │
│  - Restore is slow (re-executes every INSERT)               │
│  - No point-in-time recovery                                  │
│  - Can miss data between dump start and end                  │
│                                                              │
└─────────────────────────────────────────────────────────────┘
3. Continuous Archiving (WAL Archiving)
Plain text
┌─────────────────────────────────────────────────────────────┐
│              Continuous Archiving                            │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  How it works:                                               │
│                                                              │
│  Database ──► WAL Files ──► Archive Storage                 │
│  writes     (sequential     (S3, NFS,                      │
│  changes     logs)           remote server)                 │
│                                                              │
│  Base Backup + WAL Archive = Point-in-Time Recovery          │
│                                                              │
│  PostgreSQL configuration:                                   │
│  wal_level = replica                                         │
│  archive_mode = on                                           │
│  archive_command = 'cp %p /backup/wal/%f'                   │
│  # Or use wal-g, pgBackRest for cloud                        │
│                                                              │
│  Recovery:                                                   │
│  1. Restore base backup                                       │
│  2. Replay WAL files to desired point in time               │
│  3. Can recover to any moment (up to latest)                  │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Backup Commands

PostgreSQL
Bash
# Logical backup - single database
pg_dump -h localhost -U myuser -d mydb > mydb_backup.sql

# Compressed logical backup
pg_dump -h localhost -U myuser -d mydb | gzip > mydb_backup.sql.gz

# Custom format (compressed, selective restore)
pg_dump -h localhost -U myuser -d mydb -Fc > mydb_backup.dump

# Backup specific tables only
pg_dump -h localhost -U myuser -d mydb -t 'public.users' -t 'public.orders' > partial_backup.sql

# Backup schema only (no data)
pg_dump -h localhost -U myuser -d mydb --schema-only > schema.sql

# Backup data only (no schema)
pg_dump -h localhost -U myuser -d mydb --data-only > data.sql

# Backup all databases
pg_dumpall -h localhost -U postgres > all_databases.sql

# Physical backup - base backup for PITR
pg_basebackup -h localhost -D /backup/base -Ft -z -P
# -Ft: tar format
# -z: compress
# -P: progress

# Using pgBackRest (recommended for production)
pgbackrest --stanza=mydb backup
pgbackrest --stanza=mydb info
MySQL
Bash
# Logical backup
mysqldump -h localhost -u myuser -p mydb > mydb_backup.sql

# Backup all databases
mysqldump -h localhost -u root -p --all-databases > all_databases.sql

# Single transaction (consistent for InnoDB)
mysqldump -h localhost -u myuser -p --single-transaction mydb > mydb_backup.sql

# Specific tables
mysqldump -h localhost -u myuser -p mydb users orders > tables_backup.sql

# Physical backup with XtraBackup
xtrabackup --backup --target-dir=/backup/base
xtrabackup --prepare --target-dir=/backup/base

Restore Procedures

PostgreSQL Restore
Bash
# Restore from SQL dump
# 1. Create database
createdb -U postgres mydb_restored

# 2. Restore
psql -U postgres -d mydb_restored < mydb_backup.sql

# Or in one command
psql -U postgres -c "CREATE DATABASE mydb_restored;"
psql -U postgres mydb_restored < mydb_backup.sql

# Restore from custom format dump
pg_restore -U postgres -d mydb_restored mydb_backup.dump

# Restore specific table only
pg_restore -U postgres -d mydb_restored -t users mydb_backup.dump

# Point-in-Time Recovery (PITR)
# 1. Stop PostgreSQL
pg_ctl stop -D /var/lib/postgresql/data

# 2. Clear data directory
rm -rf /var/lib/postgresql/data/*

# 3. Extract base backup
tar -xzf /backup/base.tar.gz -C /var/lib/postgresql/data

# 4. Create recovery configuration
cat > /var/lib/postgresql/data/postgresql.auto.conf << EOF
restore_command = 'cp /backup/wal/%f %p'
recovery_target_time = '2024-01-15 14:30:00'
recovery_target_action = 'promote'
EOF

# 5. Create recovery signal
touch /var/lib/postgresql/data/recovery.signal

# 6. Start PostgreSQL - it will replay WAL to target time
pg_ctl start -D /var/lib/postgresql/data
MySQL Restore
Bash
# Restore from SQL dump
mysql -u root -p -e "CREATE DATABASE mydb_restored;"
mysql -u root -p mydb_restored < mydb_backup.sql

# Restore XtraBackup
# 1. Prepare backup (if not already done)
xtrabackup --prepare --target-dir=/backup/base

# 2. Stop MySQL
systemctl stop mysql

# 3. Clear data directory
rm -rf /var/lib/mysql/*

# 4. Copy backup
xtrabackup --copy-back --target-dir=/backup/base

# 5. Fix permissions
chown -R mysql:mysql /var/lib/mysql

# 6. Start MySQL
systemctl start mysql

Backup Strategies

3-2-1 Rule
Plain text
┌─────────────────────────────────────────────────────────────┐
│              The 3-2-1 Backup Rule                           │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  3 - Keep 3 copies of data                                  │
│      - Primary database                                      │
│      - Local backup                                          │
│      - Off-site/cloud backup                                 │
│                                                              │
│  2 - Use 2 different media types                             │
│      - Disk (fast, local)                                    │
│      - Tape/Cloud (slow, durable)                           │
│                                                              │
│  1 - Keep 1 copy off-site                                    │
│      - Different geographical location                       │
│      - Protects against fire, flood, disaster               │
│                                                              │
│  Example Implementation:                                     │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐   │
│  │  Primary   │     │  Local NAS  │     │   S3/AWS    │   │
│  │  Database  │────►│  (nightly)  │────►│  (hourly)   │   │
│  └─────────────┘     └─────────────┘     └─────────────┘   │
│                                                              │
└─────────────────────────────────────────────────────────────┘
Backup Schedule Example
Plain text
┌─────────────────────────────────────────────────────────────┐
│              Production Backup Schedule                        │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Continuous:                                                 │
│  - WAL archiving (real-time)                                 │
│  - Streaming replication to hot standby                     │
│                                                              │
│  Hourly:                                                     │
│  - Incremental WAL backups (pgBackRest)                     │
│  - Logical backup of critical tables                        │
│                                                              │
│  Daily:                                                      │
│  - Full base backup (pg_basebackup)                         │
│  - Logical backup (pg_dump) to version control               │
│                                                              │
│  Weekly:                                                     │
│  - Test restore procedure (automated)                        │
│  - Archive old backups (move to cold storage)                │
│                                                              │
│  Monthly:                                                    │
│  - Full system backup (OS + database)                      │
│  - Disaster recovery drill                                   │
│                                                              │
│  Retention:                                                  │
│  - Keep 7 daily backups                                     │
│  - Keep 4 weekly backups                                      │
│  - Keep 12 monthly backups                                    │
│  - Keep yearly backups indefinitely                          │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Cloud Backup Solutions

AWS
Bash
# RDS automated backups
# - Enabled by default (retention 7 days)
# - Point-in-time recovery available

# Manual RDS snapshot
aws rds create-db-snapshot \
    --db-instance-identifier mydb \
    --db-snapshot-identifier mydb-snapshot-20240115

# S3 backup with wal-g
wal-g backup-push /var/lib/postgresql/data
wal-g wal-push /var/lib/postgresql/data/pg_wal/wal_file

# Restore from S3
wal-g backup-fetch /var/lib/postgresql/data LATEST
PostgreSQL + S3 with wal-g
Bash
# Configure wal-g
export WALG_S3_PREFIX="s3://mybucket/db-backups"
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"

# Full backup
wal-g backup-push /var/lib/postgresql/data

# Continuous WAL archiving
# In postgresql.conf:
archive_mode = on
archive_command = 'wal-g wal-push %p'

# List backups
wal-g backup-list

# Restore
# 1. Stop PostgreSQL
# 2. Clear data directory
# 3. Fetch backup
wal-g backup-fetch /var/lib/postgresql/data base_000000010000000000000004
# 4. Create recovery.conf for PITR
# 5. Start PostgreSQL

Testing Backups

Plain text
┌─────────────────────────────────────────────────────────────┐
│              Backup Testing is CRITICAL                       │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Untested backup = NO BACKUP                                 │
│                                                              │
│  Automated Testing:                                           │
│  1. Weekly restore test                                       │
│     - Restore backup to test server                         │
│     - Run consistency checks                                │
│     - Verify row counts                                     │
│     - Run application smoke tests                           │
│                                                              │
│  2. Automated restore pipeline                             │
│     - Spin up container/VM                                  │
│     - Restore latest backup                                 │
│     - Run test queries                                        │
│     - Tear down                                               │
│     - Report success/failure                                  │
│                                                              │
│  Manual Testing (Monthly):                                    │
│  - Full disaster recovery drill                              │
│  - Document issues and timing                               │
│  - Update runbooks                                            │
│                                                              │
└─────────────────────────────────────────────────────────────┘
Bash
# Automated backup test script example
#!/bin/bash

BACKUP_FILE=$1
TEST_DB="backup_test_$(date +%s)"

echo "Testing backup: $BACKUP_FILE"

# 1. Create test database
psql -U postgres -c "CREATE DATABASE $TEST_DB;"

# 2. Restore
pg_restore -U postgres -d $TEST_DB $BACKUP_FILE
RESTORE_STATUS=$?

# 3. Run checks
if [ $RESTORE_STATUS -eq 0 ]; then
    # Check row counts match expected
    USER_COUNT=$(psql -U postgres -d $TEST_DB -t -c "SELECT COUNT(*) FROM users;")
    echo "Users: $USER_COUNT"
    
    # Check for corruption
    psql -U postgres -d $TEST_DB -c "SELECT pg_catalog.pg_relation_check_pages('users');"
    
    # Run test queries
    psql -U postgres -d $TEST_DB -f /tests/smoke_tests.sql
    
    echo "✓ Backup test PASSED"
else
    echo "✗ Backup test FAILED"
fi

# 4. Cleanup
psql -U postgres -c "DROP DATABASE $TEST_DB;"

Special Recovery Scenarios

Recovering from DROP TABLE
Sql
-- If you have PITR (Point-in-Time Recovery):
-- 1. Note exact time of DROP
-- 2. Restore to just before that time on separate server
-- 3. Export the dropped table
-- 4. Import to production

-- If you have logical backup:
-- Simply restore the table from dump
pg_restore -t dropped_table backup.dump

-- Prevention: Use event triggers
CREATE OR REPLACE FUNCTION prevent_drop()
RETURNS event_trigger AS $$
BEGIN
    RAISE EXCEPTION 'DROP operations are logged and require approval';
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER prevent_drop_trigger
    ON sql_drop
    EXECUTE FUNCTION prevent_drop();
Recovering from Bad UPDATE
Sql
-- Scenario: UPDATE without WHERE clause
-- UPDATE users SET status = 'inactive';  -- All users!

-- With PITR:
-- 1. Stop application
-- 2. Restore to just before UPDATE on separate server
-- 3. Identify affected rows
-- 4. Generate corrective UPDATE statements
-- 5. Apply to production

-- Without PITR but with triggers/logging:
-- If you have audit tables:
SELECT * FROM users_audit 
WHERE changed_at BETWEEN '2024-01-15 10:00' AND '2024-01-15 10:05';

-- Reconstruct original values and UPDATE back

Next: Monitoring & Observability