databases-deep-dive


Databases — A Senior Engineer's End-to-End Field Guide

A practical, in-depth walkthrough of database fundamentals → design trade-offs → how a senior engineer actually chooses. Written to be read top-to-bottom, with simplified Mermaid diagrams for the concepts that are easier seen than read.

How to read this: Each section builds on the previous one. The vocabulary (Part 1) feeds the trade-offs (Part 2), which feed the decision framework (Part 3). Real-world examples are called out in Example blocks.

The Mental Model
Foundations: tables, rows, schema, keys

Normalization vs. Denormalization

Aspect	Normalization	Denormalization
Idea	Each fact lives in exactly one place	Deliberately duplicate data
Reads	Slower (need joins)	Faster (everything in one place)
Writes	Clean (update once)	Painful (update many copies)
Storage	Less	More
Risk	—	Data drifts out of sync
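The write-side trade-off in the table above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a recommended schema: the table and column names (`customers`, `orders`, `orders_denorm`, `customer_email`) are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: the customer's email lives in exactly one row.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id));

    -- Denormalized: the email is copied onto every order row.
    CREATE TABLE orders_denorm (id INTEGER PRIMARY KEY,
                                customer_id INTEGER,
                                customer_email TEXT);

    INSERT INTO customers VALUES (1, 'old@x.com');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 1);
    INSERT INTO orders_denorm VALUES
        (10, 1, 'old@x.com'), (11, 1, 'old@x.com'), (12, 1, 'old@x.com');
""")

# Normalized write: the fact is stored once, so one row changes.
norm_rows = conn.execute(
    "UPDATE customers SET email = 'new@x.com' WHERE id = 1").rowcount

# Denormalized write: every copy has to change, or the data drifts.
denorm_rows = conn.execute(
    "UPDATE orders_denorm SET customer_email = 'new@x.com' "
    "WHERE customer_id = 1").rowcount

# Normalized read: needs a join to bring the email back to each order.
norm_read = conn.execute("""
    SELECT o.id, c.email
    FROM orders o JOIN customers c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

# Denormalized read: everything is already on the row.
denorm_read = conn.execute(
    "SELECT id, customer_email FROM orders_denorm ORDER BY id").fetchall()

print(norm_rows, denorm_rows)
print(norm_read == denorm_read)
```

Both reads return the same rows. The difference is where the work lands: the normalized design pays on reads (the join), while the denormalized design pays on writes (three rows updated instead of one).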

What an index actually is

Indexed value (sorted)	Pointer
`aaron@x.com`	→ row @ block 91
`beth@x.com`	→ row @ block 12
`keto@shubho.tech`	→ row @ block 57
`zara@x.com`	→ row @ block 33
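As a rough sketch of why the sorted order matters, the entries above can be modelled as a sorted list and searched with Python's `bisect` module. The `lookup` helper is hypothetical, and a real database would use a B-tree on disk rather than an in-memory list. The principle is the same: each comparison discards half of the remaining entries.

```python
from bisect import bisect_left

# (indexed value, block number) pairs, kept sorted by value.
index = [
    ("aaron@x.com", 91),
    ("beth@x.com", 12),
    ("keto@shubho.tech", 57),
    ("zara@x.com", 33),
]
keys = [value for value, _ in index]

def lookup(email):
    """Return the block holding `email`, or None if it is not indexed."""
    i = bisect_left(keys, email)          # binary search: O(log n) comparisons
    if i < len(keys) and keys[i] == email:
        return index[i][1]
    return None

print(lookup("keto@shubho.tech"))
print(lookup("nobody@x.com"))
```

Without the index, finding a row means checking every row in turn. With it, a lookup among a million sorted entries needs about 20 comparisons, followed by one jump to the block the pointer names.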

Flavors of indexes

Type	What it is	Use when
Clustered	The table itself is physically sorted by this key. Only one per table (usually the PK); the leaf is the row.	Primary key
Non-clustered / secondary	Separate structure with pointers back to the row. Many allowed.	Any other searched column
Composite	Multi-column, e.g. `(last_name, first_name)`. Sorted by first, then second — order matters (like a phone book).	Multi-column filters
Unique	Also forbids duplicates	Emails, usernames
Partial	Indexes only some rows, e.g. `WHERE status='active'`	You only query a subset
Covering	Includes the columns a query needs, so it answers from the index alone (skips the row fetch)	Hot queries
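Several of these flavors can be tried out with SQLite through Python's `sqlite3` module. This is a sketch under assumptions: the `users` table, its columns, and the index names are invented for illustration, and the exact wording of `EXPLAIN QUERY PLAN` output varies between SQLite versions, so the code only looks for index names in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,  -- alias for SQLite's rowid; the table is stored in this order
        email TEXT, last_name TEXT, first_name TEXT, status TEXT
    );
    -- Unique: a secondary index that also forbids duplicates.
    CREATE UNIQUE INDEX idx_users_email ON users(email);
    -- Composite: sorted by last_name, then first_name.
    CREATE INDEX idx_users_name ON users(last_name, first_name);
    -- Partial: only rows with status = 'active' are indexed.
    CREATE INDEX idx_users_active ON users(first_name) WHERE status = 'active';

    INSERT INTO users VALUES (1, 'ana@x.com', 'Lee', 'Ana', 'active');
""")

def plan(sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Leading column present: the composite index applies, and since it holds
# every column the query needs, the row itself is never fetched (covering).
covering_plan = plan("SELECT first_name FROM users WHERE last_name = 'Lee'")

# Leading column missing: the composite index cannot seek on first_name alone.
skipped_plan = plan("SELECT email FROM users WHERE first_name = 'Ana'")

# Query restricted to the indexed subset: the partial index applies.
partial_plan = plan(
    "SELECT id FROM users WHERE status = 'active' AND first_name = 'Ana'")

# The unique index rejects a second row with the same email.
try:
    conn.execute(
        "INSERT INTO users VALUES (2, 'ana@x.com', 'Lee', 'Bea', 'active')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

for label, text in [("covering", covering_plan), ("skipped", skipped_plan),
                    ("partial", partial_plan)]:
    print(label, "->", text)
print("duplicate rejected:", duplicate_rejected)
```

The second query shows the phone-book rule in action: an index on `(last_name, first_name)` does not help a search by first name alone, just as a phone book sorted by surname does not help you find everyone called Ana.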

The Measurements

Metric	What it means	Why it matters
QPS / TPS	Queries / transactions per second	Raw load; watch the peak-to-average ratio
p50 / p95 / p99 latency	Median vs. the slow tail	Optimize p99 — the tail is what users feel
Data volume & growth	10 GB vs 10 TB vs 10 PB	Determines the class of tool you need; project growth 1 to 3 years out
Working set	Hot data that must fit in RAM/cache	Drives memory sizing
RPO	How much data you can afford to lose	Backup/replication strategy
RTO	How fast you must recover	Failover design
SLA	Uptime promise	99.9% ≈ 8.8 h/yr of downtime; 99.99% ≈ 53 min/yr
Cost	$ per query, per GB stored, per GB transferred, plus operations effort	The bill and the human cost
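Two of these metrics reduce to simple arithmetic, sketched below. The `percentile` helper uses the nearest-rank definition, which is one of several common conventions. The latency sample is made up to show how a small slow tail hides from p50 and p95 but shows up in p99.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def downtime_minutes_per_year(sla_percent):
    """Downtime allowed by an uptime promise, over a 365-day year."""
    minutes_per_year = 365 * 24 * 60      # 525,600
    return (100 - sla_percent) / 100 * minutes_per_year

# Hypothetical sample: 97 fast requests and 3 slow ones (milliseconds).
latencies_ms = [10] * 97 + [500] * 3

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(p50, p95, p99)

print(round(downtime_minutes_per_year(99.9) / 60, 2), "hours per year")
print(round(downtime_minutes_per_year(99.99), 2), "minutes per year")
```

In this sample the median and p95 both look healthy at 10 ms, while p99 is 500 ms. That gap is the reason to watch the tail rather than the average. The downtime function shows the cost of each extra nine: going from 99.9% to 99.99% cuts the allowed downtime by a factor of ten.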

Glossary

Term	One-line meaning
Schema	The defined structure & types of your tables
Primary key	Unique identifier for a row
Foreign key	A pointer to another table's primary key
Join	Recombine rows across tables by a matching column
Normalization	Store each fact once (clean writes, slower reads)
Denormalization	Duplicate data (fast reads, messy writes)
Index	Sorted lookup structure for fast finds
B-tree	Balanced tree index; read-friendly, updates data in place
LSM-tree	Write-friendly structure; buffers writes in memory, flushes them to sorted files, and compacts those files in the background
Clustered index	Table physically sorted by this key (one per table)
Covering index	Index that answers a query without fetching the row
ACID	Atomicity, Consistency, Isolation, Durability
Transaction	A group of operations treated as one unit
CAP	Under a partition, pick Consistency or Availability
PACELC	Even without a partition, trade Latency vs. Consistency
Eventual consistency	Replicas briefly disagree, then converge
Replication	Copies of the same data on multiple machines (read scaling)
Sharding	Different data split across machines (write scaling)
Shard key	The column that decides how data is partitioned
Hotspot	One shard overloaded due to a bad shard key
QPS / TPS	Queries / transactions per second
p99	The latency that 99% of requests come in under; the remaining 1% is the slow tail
RPO / RTO	How much data you can lose / how fast you recover
SLA	Uptime promise (e.g. 99.99%)
OLTP	Operational, row-oriented, many small ops
OLAP	Analytical, column-oriented, big aggregations
Working set	Hot data that should fit in memory
Polyglot persistence	Using several databases, each for its strength

Table of Contents

0. The Mental Model

1. Foundations

2. Joins

3. Normalization vs. Denormalization

4. Indexing (deep dive)

4.1 The problem

4.2 What an index actually is

4.3 Why sorted wins — binary search

4.4 The real structure — a B-tree

4.5 The trade-off

4.6 Flavors of indexes

5. ACID & Transactions

6. CAP & PACELC

7. Scaling

7.1 Replication — copies of the same data

7.2 Sharding — split different data across machines

8. The Measurements

9. OLTP vs. OLAP

10. Storage Engines

11. The Decision Framework

12. Worked Case Studies

Case A — Analytics dashboard

Case B — Banking ledger

13. Senior-Level Wisdom

14. Glossary