14 - Monitoring & Observability
Why Monitor Databases?
┌─────────────────────────────────────────────────────────────┐
│                      Monitoring Goals                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Availability: Is the database up?                       │
│                                                             │
│  2. Performance: Are queries fast enough?                   │
│     - Response time percentiles (p50, p95, p99)             │
│     - Throughput (queries per second, QPS)                  │
│                                                             │
│  3. Capacity Planning: When will we run out?                │
│     - Storage growth                                        │
│     - Connection usage                                      │
│                                                             │
│  4. Early Warning: Catch problems before an outage          │
│     - Rising slow-query rate                                │
│     - Lock contention                                       │
│     - Replication lag                                       │
│                                                             │
│  5. Debugging: What went wrong?                             │
│     - Query patterns                                        │
│     - Error logs                                            │
│     - Wait events                                           │
│                                                             │
└─────────────────────────────────────────────────────────────┘
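The performance goal is usually tracked with latency percentiles rather than an average, because a small fraction of slow queries barely moves the mean. The sketch below is a minimal illustration, assuming you have raw per-query latencies for one fixed observation window. The function names and sample data are hypothetical, and it uses the simple nearest-rank percentile definition; real monitoring systems often approximate percentiles from histograms instead.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not sorted_values:
        raise ValueError("no samples")
    # Multiplying before dividing keeps the rank exact for integer p.
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(latencies_ms, window_seconds):
    """Latency percentiles and throughput (QPS) for one observation window."""
    ordered = sorted(latencies_ms)
    return {
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
        "qps": len(latencies_ms) / window_seconds,
    }


if __name__ == "__main__":
    # Hypothetical window: 100 queries observed over 10 seconds.
    samples = [5] * 90 + [40] * 8 + [250] * 2
    print(summarize(samples, window_seconds=10))
    # {'p50': 5, 'p95': 40, 'p99': 250, 'qps': 10.0}
```

For this sample the mean latency is 12.7 ms, which hides the fact that 1% of queries take 250 ms; the p99 value exposes that tail.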
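For capacity planning, "when will we run out?" can be answered roughly by fitting a trend line to recent usage and extrapolating it to the limit. This is a minimal sketch assuming one storage sample per day and approximately linear growth; `days_until_full` and the sample history are hypothetical, and the technique shown is an ordinary least-squares line fit.

```python
def days_until_full(daily_used_gb, capacity_gb):
    """Estimate days remaining until storage reaches capacity.

    Fits a least-squares line to one usage sample per day (oldest first)
    and extrapolates it forward. Returns None when usage is flat or shrinking.
    """
    n = len(daily_used_gb)
    if n < 2:
        raise ValueError("need at least two daily samples")
    mean_x = (n - 1) / 2  # x values are day indices 0..n-1
    mean_y = sum(daily_used_gb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_used_gb))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var  # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity_gb - intercept) / slope  # day index where the line hits capacity
    return max(0.0, day_full - (n - 1))  # measured from the latest sample


if __name__ == "__main__":
    # Hypothetical history: 2 GB/day growth on a 200 GB volume.
    usage = [100, 102, 104, 106, 108]
    print(days_until_full(usage, capacity_gb=200))  # 46.0
```

The same projection works for connection usage measured against the server's configured connection limit. Growth is often not linear (seasonal traffic, bulk loads), so the result is an estimate to revisit as new samples arrive.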
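Early-warning signals such as replication lag are noisy, so one common approach is to alert only when a threshold is exceeded for several consecutive samples rather than on a single spike. The sketch below assumes lag is polled at a fixed interval and reported in seconds; the function name, threshold, and window length are hypothetical illustrations, not recommended values.

```python
def sustained_breach(samples, threshold, min_consecutive):
    """Return True if the last `min_consecutive` samples all exceed `threshold`.

    Requiring several consecutive breaches filters out one-off spikes
    that would otherwise trigger noisy alerts.
    """
    if min_consecutive <= 0:
        raise ValueError("min_consecutive must be positive")
    if len(samples) < min_consecutive:
        return False  # not enough history to decide yet
    return all(s > threshold for s in samples[-min_consecutive:])


if __name__ == "__main__":
    # Hypothetical replication-lag samples in seconds, one per poll interval.
    lag_steady_rise = [0.2, 0.5, 12.0, 15.0, 18.0]
    lag_single_spike = [0.2, 0.3, 14.0, 0.4, 0.3]
    print(sustained_breach(lag_steady_rise, threshold=10.0, min_consecutive=3))   # True
    print(sustained_breach(lag_single_spike, threshold=10.0, min_consecutive=3))  # False
```

The same check applies to other early-warning metrics in the list, such as a slow-query rate or lock-wait count sampled per interval.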