InkdownInkdown
Start writing

Study

69 files·11 subfolders

Shared Workspace

Study
core
Revision w/ Whiteboard

databases-deep-dive

Shared from "Study" on Inkdown

Databases — A Senior Engineer's End-to-End Field Guide

A practical, in-depth walkthrough of database fundamentals → design trade-offs → how a senior engineer actually chooses. Written to be read top-to-bottom, with simplified Mermaid diagrams for the concepts that are easier seen than read.

How to read this: Each section builds on the previous one. The vocabulary (Part 1) feeds the trade-offs (Part 2), which feed the decision framework (Part 3). Real-world examples are called out in Example blocks.


Table of Contents

  1. The Mental Model
  2. Foundations: tables, rows, schema, keys
CN Basics - 1
CN Basics - 2
DNS
Event loop
programming-language-concepts.md
zero-language-explanation.md
DB
Quick
databases-deep-dive.md
01-introduction.md
02-relational-databases.md
03-database-design.md
04-indexing.md
05-transactions-acid.md
06-nosql-databases.md
07-query-optimization.md
08-replication-ha.md
09-sharding-partitioning.md
10-caching-strategies.md
11-cap-theorem.md
12-connection-pooling.md
13-backup-recovery.md
14-monitoring.md
15-database-selection.md
README.md
JS
core topics
Event loop
Merlin Backend
01-Orchestration.md
02-DeepResearch.md
03-Search.md
04-Scraping.md
05-Streaming.md
06-MultiProviderLLM.md
07-MemoryAndContext.md
08-ErrorHandling.md
09-RateLimiting.md
10-TaskQueue.md
11-SecurityAndAuth.md
Orchestration-2nd-draft
Mobile
Build Alternative
Bundling
metro-bundler-deep-dive.md
OpenAI Agents Python
00_OVERVIEW.md
01_AGENT_SYSTEM.md
02_RUNNER_SYSTEM.md
03_TOOL_SYSTEM.md
04_ITEMS_SYSTEM.md
05_GUARDRAILS.md
06_HANDOFFS.md
07_MEMORY_SESSIONS.md
08_MODEL_PROVIDERS.md
09_SANDBOX_SYSTEM.md
10_TRACING.md
11_RUN_STATE.md
12_CONTEXT.md
13_LIFECYCLE_HOOKS.md
14_CONFIGURATION.md
15_ERROR_HANDLING.md
16_STREAMING.md
17_EXTENSIONS.md
18_MCP_INTEGRATION.md
19_BEST_PRACTICES.md
20_ARCHITECTURE_PATTERNS.md
opencode-study
context-handling
core
Python
Alembic
Basics
sqlalchemy - fastapi
SQLAlchemy overview
tweets
system_design_for_agentic_apps.md
Agent Loop
Joins
  • Normalization vs. Denormalization
  • Indexing (deep dive)
  • ACID & Transactions
  • CAP & PACELC
  • Scaling: replication & sharding
  • The Measurements
  • OLTP vs. OLAP
  • Storage Engines: B-tree vs. LSM-tree
  • The Decision Framework
  • Worked Case Studies
  • Senior-Level Wisdom
  • Glossary

  • 0. The Mental Model

    The single most important idea in this whole document:

    You don't pick a database. You describe a workload, attach real numbers to it, and the right database falls out.

    The flow a senior engineer follows:

    Rendering diagram…

    Everything below is the toolkit that makes each box in that diagram concrete.


    1. Foundations

    A relational database stores data in tables — think spreadsheets with strict rules.

    • Row = one record (one user, one order).
    • Column = one field (email, created_at).
    • Schema = the defined structure: which tables exist, their columns, and each column's type. A strong schema (Postgres, MySQL) enforces the rules at write time; a schemaless store (MongoDB) lets each record differ.
    • Primary key (PK) = a column that uniquely identifies each row (e.g. user_id). No two rows share it.
    • Foreign key (FK) = a column that points to another table's PK. This is how tables relate.
    Rendering diagram…

    Example — An e-commerce app stores customers in USERS and purchases in ORDERS. Each order carries a user_id FK so you always know who bought what, without copying the customer's name and address onto every single order.


    2. Joins

    Because relational data is split across tables, you often need to recombine it. A join stitches rows from two tables together using a matching column.

    Rendering diagram…
    Sql

    Why it matters: joins avoid duplicated data, but they are computationally expensive at scale — the engine matches rows across potentially huge tables. That cost is exactly why, at scale, people sometimes duplicate data to avoid joins (next section).


    3. Normalization vs. Denormalization

    This is the classic write-simplicity vs. read-speed trade-off.

    NormalizationDenormalization
    IdeaEach fact lives in exactly one placeDeliberately duplicate data
    ReadsSlower (need joins)Faster (everything in one place)
    WritesClean (update once)Painful (update many copies)
    StorageLessMore
    Risk—Data drifts out of sync
    Rendering diagram…

    Example — A banking system normalizes aggressively (correctness is sacred; an address must never disagree with itself). A product catalog read 10,000×/sec denormalizes the product name and price onto the listing row so a page load doesn't trigger five joins.


    4. Indexing (deep dive)

    4.1 The problem

    Without an index, finding a row = full table scan: read row 1, row 2, … all 10 million, checking each. Painfully slow.

    An index is a separate, pre-sorted lookup structure — like the index at the back of a textbook.

    4.2 What an index actually is

    A mini-table holding only (the indexed value, kept sorted) → (a pointer to the full row):

    Indexed value (sorted)Pointer
    aaron@x.com→ row @ block 91
    beth@x.com→ row @ block 12
    keto@shubho.tech→ row @ block 57
    zara@x.com→ row @ block 33

    Two properties make it powerful: it's small (one column + a pointer) and it's sorted.

    4.3 Why sorted wins — binary search
    Rendering diagram…

    Linear scan: up to 10,000,000 steps. Binary search: ~24 steps. That's the difference between "slow" and "instant."

    4.4 The real structure — a B-tree

    Real databases use a B-tree (balanced tree), because inserting into the middle of a giant flat list is expensive.

    Rendering diagram…
    • Root / branch nodes are signposts ("before M go left").
    • Leaf nodes hold the actual sorted values + row pointers, and are linked in order (so range scans just walk sideways).
    • Stays balanced: every lookup costs roughly the tree's depth — a handful of steps even at billions of rows. Inserts slot into the right leaf and rebalance locally.
    4.5 The trade-off
    Rendering diagram…

    Rule of thumb: index the columns you frequently search by, join on, or sort by — and resist indexing everything. A write-heavy table with too many indexes grinds.

    4.6 Flavors of indexes
    TypeWhat it isUse when
    ClusteredThe table itself is physically sorted by this key. Only one per table (usually the PK); the leaf is the row.Primary key
    Non-clustered / secondarySeparate structure with pointers back to the row. Many allowed.Any other searched column
    CompositeMulti-column, e.g. (last_name, first_name). Sorted by first, then second — order matters (like a phone book).Multi-column filters
    UniqueAlso forbids duplicatesEmails, usernames
    PartialIndexes only some rows, e.g. WHERE status='active'You only query a subset
    CoveringIncludes the columns a query needs, so it answers from the index alone (skips the row fetch)Hot queries

    Example — A phone book is a composite index on (last_name, first_name). Finding "Smith, John" is instant. Finding everyone named "John" (any last name) is useless — the book isn't sorted that way. Same rule applies to composite DB indexes: leading column first.


    5. ACID & Transactions

    A transaction groups operations into one unit. ACID are its guarantees:

    Rendering diagram…

    Example — A money transfer: "debit A by $100" and "credit B by $100" must both happen or neither (Atomicity). Two transfers hitting the same account at once must not corrupt the balance (Isolation). Once the bank says "done," a power cut can't undo it (Durability).

    Strong ACID is why you pick a relational DB for a banking ledger. Many NoSQL systems relax some of these for speed/scale.


    6. CAP & PACELC

    For distributed databases (data spread across machines): during a network partition (nodes can't talk), you can guarantee only two of three — so really you choose between C and A.

    Rendering diagram…
    • C — Consistency: every read sees the latest write.
    • A — Availability: every request gets a response.
    • P — Partition tolerance: keeps working despite network breaks (non-negotiable in distributed systems).

    PACELC extends it: if Partition → choose A or C; Else (normal operation) → trade Latency vs. Consistency. Keeping replicas perfectly in sync costs time.

    Eventual consistency is the AP compromise: replicas may briefly disagree but converge.

    Example — A YouTube view count stuck at "301" for a while is eventual consistency in the wild: the system chose availability + low latency over perfectly synchronized counters.


    7. Scaling

    Rendering diagram…
    7.1 Replication — copies of the same data

    Great for read scaling and fault tolerance.

    Rendering diagram…

    Key question: synchronous replicas (safe, slower) or asynchronous (fast, can be briefly stale)?

    7.2 Sharding — split different data across machines

    This is how you scale writes — the genuinely hard problem.

    Rendering diagram…

    The shard key is the highest-stakes, hardest-to-reverse decision in a distributed DB. A bad key creates a hotspot — e.g. sharding by country when 80% of users are in one country dumps the load on a single shard.


    8. The Measurements

    These are the numbers a senior attaches to a workload.

    MetricWhat it meansWhy it matters
    QPS / TPSQueries / transactions per secondRaw load; watch the peak-to-average ratio
    p50 / p95 / p99 latencyMedian vs. the slow tailOptimize p99 — the tail is what users feel
    Data volume & growth10 GB vs 10 TB vs 10 PBPicks the whole tool class; project 1–3 yrs out
    Working setHot data that must fit in RAM/cacheDrives memory sizing
    RPOHow much data you can afford to loseBackup/replication strategy
    RTOHow fast you must recoverFailover design
    SLAUptime promise99.9% ≈ 8.7 h/yr down; 99.99% ≈ 52 min/yr
    Cost$ / query, / GB stored, / GB transferred + opsThe bill and the human cost
    Rendering diagram…

    Example — An API averaging "30 ms" can still feel broken if its p99 is 3 seconds: 1 in 100 requests stalls, and a page making 100 calls will almost always hit one. Averages lie; percentiles tell the truth.


    9. OLTP vs. OLAP

    Rendering diagram…
    • OLTP: "place an order," "update a profile." Row-oriented (a whole row sits together).
    • OLAP: "total revenue by region last quarter." Column-oriented — only reads the columns it needs, so it crushes scan-heavy aggregation.

    Example — Your app writes orders to Postgres (OLTP) all day. Overnight, those orders stream into BigQuery (OLAP) where analysts run "revenue by region" without ever slowing down the live store.


    10. Storage Engines

    Rendering diagram…
    • B-tree: updates data in place; balanced read/write; the default for SQL databases.
    • LSM-tree (Log-Structured Merge): buffers writes in memory, flushes sequential immutable files, merges them in the background (compaction). Very fast writes; reads may check several files. Used by Cassandra, RocksDB, ScyllaDB.

    Example — An IoT fleet writing 500k sensor readings/sec leans LSM (Cassandra): writes are sequential and cheap. A CRM doing balanced reads/writes with rich queries leans B-tree (Postgres).


    11. The Decision Framework

    Rendering diagram…

    Steps:

    1. Characterize the workload — read/write ratio, access patterns, query predictability, data shape, consistency needs.
    2. Attach real numbers — QPS, data size + growth, p99 target, SLA, budget.
    3. Let the numbers eliminate options — don't argue taste; let constraints decide.
    4. Default to boring — Postgres until measured evidence says otherwise.
    5. Scale the bottleneck — reads → replicas/cache; writes → shard or specialized store.

    12. Worked Case Studies

    Case A — Analytics dashboard

    50k writes/sec of event data · queries are big time-range aggregations · p95 read < 2s · grows ~1 TB/month · stale-by-a-minute is fine.

    Rendering diagram…
    Case B — Banking ledger

    2k TPS · every transaction must be correct & durable · ad-hoc queries by support staff.

    Rendering diagram…

    Same engineer, opposite decisions — because the measurements differed. That's the entire point.


    13. Senior-Level Wisdom

    • Boring is good. Default to Postgres until you have measured evidence it can't keep up. It scales much further than people assume.
    • Scaling reads ≠ scaling writes. Reads scale easily (replicas, caching). Writes are the hard problem (sharding is mostly a one-way door).
    • The shard key is the most consequential, least reversible choice in a distributed DB. Choose to spread load evenly; avoid hotspots.
    • Polyglot persistence is normal. Postgres (transactions) + Redis (cache) + Elasticsearch (search) + warehouse (analytics). The right answer is often "more than one."
    • Operational maturity matters. "Can my team run this at 3am?" A theoretically-better DB nobody can operate is the wrong choice.
    • Design for 10×, not 1000×. Architecting today for traffic you won't see for five years is over-engineering. Leave room to evolve; don't build the final form on day one.
    • Measure before you migrate. "We need NoSQL for scale" is, more often than not, a missing index or an N+1 query.

    14. Glossary

    TermOne-line meaning
    SchemaThe defined structure & types of your tables
    Primary keyUnique identifier for a row
    Foreign keyA pointer to another table's primary key
    JoinRecombine rows across tables by a matching column
    NormalizationStore each fact once (clean writes, slower reads)
    DenormalizationDuplicate data (fast reads, messy writes)
    IndexSorted lookup structure for fast finds
    B-treeBalanced tree index; read-friendly, in-place
    LSM-treeWrite-friendly index; buffer + flush + compaction
    Clustered indexTable physically sorted by this key (one per table)
    Covering indexIndex that answers a query without fetching the row
    ACIDAtomicity, Consistency, Isolation, Durability
    TransactionA group of operations treated as one unit
    CAPUnder a partition, pick Consistency or Availability
    PACELCEven without a partition, trade Latency vs. Consistency
    Eventual consistencyReplicas briefly disagree, then converge
    ReplicationCopies of the same data on multiple machines (read scaling)
    ShardingDifferent data split across machines (write scaling)
    Shard keyThe column that decides how data is partitioned
    HotspotOne shard overloaded due to a bad shard key
    QPS / TPSQueries / transactions per second
    p99The latency 99% of requests beat (the slow tail)
    RPO / RTOHow much data you can lose / how fast you recover
    SLAUptime promise (e.g. 99.99%)
    OLTPOperational, row-oriented, many small ops
    OLAPAnalytical, column-oriented, big aggregations
    Working setHot data that should fit in memory
    Polyglot persistenceUsing several databases, each for its strength

    Built as a study reference — pair it with the in-workspace Notion page for quick lookups, and use this file for the diagrams and offline reading.