InkdownInkdown
Start writing

Study

59 files·8 subfolders

Shared Workspace

Study
core

03-database-design

Shared from "Study" on Inkdown

03 - Database Design & Normalization

The Goal of Database Design

Good database design ensures:

  • Data Integrity: No duplication, no inconsistencies
  • Efficiency: Fast queries, minimal storage
  • Scalability: Can grow without redesign
  • Maintainability: Easy to understand and modify

The Design Process

Plain text
1. Requirements Gathering
   ↓
2. Conceptual Design (ER Diagrams)
   ↓
3. Logical Design (Tables, Columns, Relations)
   ↓
4. Normalization (Apply rules to remove redundancy)
   ↓
5. Physical Design (Indexes, Partitioning)
   ↓
6. Implementation

Entity-Relationship (ER) Diagrams

programming-language-concepts.md
zero-language-explanation.md
DB
01-introduction.md
02-relational-databases.md
03-database-design.md
04-indexing.md
05-transactions-acid.md
06-nosql-databases.md
07-query-optimization.md
08-replication-ha.md
09-sharding-partitioning.md
10-caching-strategies.md
11-cap-theorem.md
12-connection-pooling.md
13-backup-recovery.md
14-monitoring.md
15-database-selection.md
README.md
JS
Event loop
Merlin Backend
01-Orchestration.md
02-DeepResearch.md
03-Search.md
04-Scraping.md
05-Streaming.md
06-MultiProviderLLM.md
07-MemoryAndContext.md
08-ErrorHandling.md
09-RateLimiting.md
10-TaskQueue.md
11-SecurityAndAuth.md
Orchestration-2nd-draft
OpenAI Agents Python
00_OVERVIEW.md
01_AGENT_SYSTEM.md
02_RUNNER_SYSTEM.md
03_TOOL_SYSTEM.md
04_ITEMS_SYSTEM.md
05_GUARDRAILS.md
06_HANDOFFS.md
07_MEMORY_SESSIONS.md
08_MODEL_PROVIDERS.md
09_SANDBOX_SYSTEM.md
10_TRACING.md
11_RUN_STATE.md
12_CONTEXT.md
13_LIFECYCLE_HOOKS.md
14_CONFIGURATION.md
15_ERROR_HANDLING.md
16_STREAMING.md
17_EXTENSIONS.md
18_MCP_INTEGRATION.md
19_BEST_PRACTICES.md
20_ARCHITECTURE_PATTERNS.md
opencode-study
context-handling
core
Python
Alembic
Basics
sqlalchemy - fastapi
SQLAlchemy overview
tweets
system_design_for_agentic_apps.md

Visual representation of data structure:

Plain text
┌─────────────────────────────────────────────────────────────────┐
│                     E-COMMERCE DATABASE                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│   ┌──────────┐         ┌──────────┐         ┌──────────┐      │
│   │  USER    │         │  ORDER   │         │ PRODUCT  │      │
│   ├──────────┤         ├──────────┤         ├──────────┤      │
│   │ PK id    │◄────┐   │ PK id    │   ┌────►│ PK id    │      │
│   │    name  │     │   │ FK user_id│   │     │    name  │      │
│   │    email │     └───│    total  │   │     │    price │      │
│   │    phone │         │    status │   │     │    stock │      │
│   └──────────┘         └────┬─────┘   │     └──────────┘      │
│        │                    │         │           ▲            │
│        │ 1:M                │ M:N     │ M:1       │            │
│        ▼                    ▼         │           │            │
│   ┌──────────┐         ┌──────────┐   │     ┌──────────┐      │
│   │  ADDRESS │         │ORDER_ITEM│   │     │ CATEGORY │      │
│   ├──────────┤         ├──────────┤   │     ├──────────┤      │
│   │ PK id    │         │ PK id    │   │     │ PK id    │      │
│   │ FK user_id│         │ FK order_id│   │     │    name  │      │
│   │    street│         │ FK prod_id│───┘     └──────────┘      │
│   │    city  │         │    qty   │                          │
│   └──────────┘         │    price │                          │
│                        └──────────┘                          │
│                                                                  │
│  Legend: PK = Primary Key, FK = Foreign Key                    │
│          1:M = One-to-Many, M:N = Many-to-Many                 │
└─────────────────────────────────────────────────────────────────┘

Relationships Explained

1. One-to-One (1:1)

Each record in Table A relates to exactly one record in Table B.

Sql
-- Example: User and UserProfile
-- Users: id, email, name
-- UserProfiles: user_id (PK & FK), bio, avatar_url

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE user_profiles (
    user_id INTEGER PRIMARY KEY,  -- Also the FK
    bio TEXT,
    avatar_url VARCHAR(500),
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

-- When to use: Rarely needed. Consider if:
-- - Optional data (not all users have profiles)
-- - Security (separate sensitive data)
-- - Performance (rarely accessed data in separate table)
2. One-to-Many (1:M)

Each record in Table A relates to many records in Table B.

Sql
-- Example: User has many Orders
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    total DECIMAL(10,2),
    FOREIGN KEY (user_id) REFERENCES users(id)
);

-- Foreign key always goes on the "many" side
3. Many-to-Many (M:N)

Records in Table A relate to many in Table B, and vice versa.

Sql
-- Example: Students and Courses
-- A student takes many courses
-- A course has many students

CREATE TABLE students (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE courses (
    id SERIAL PRIMARY KEY,
    title VARCHAR(200)
);

-- Junction/Link/Associate Table
CREATE TABLE enrollments (
    student_id INTEGER REFERENCES students(id),
    course_id INTEGER REFERENCES courses(id),
    enrolled_at TIMESTAMP DEFAULT NOW(),
    grade VARCHAR(2),
    PRIMARY KEY (student_id, course_id)  -- Composite PK
);

-- Query: Get all courses for a student
SELECT c.* 
FROM courses c
JOIN enrollments e ON c.id = e.course_id
WHERE e.student_id = 1;

Normalization

Normalization is the process of organizing data to minimize redundancy and dependency.

Normal Forms Hierarchy
Plain text
┌─────────────────────────────────────┐
│           5NF (Perfect)             │
├─────────────────────────────────────┤
│   4NF (No multi-valued dependencies)│
├─────────────────────────────────────┤
│   3NF (No transitive dependencies)  │ ◄── Target for most apps
├─────────────────────────────────────┤
│   2NF (No partial dependencies)     │
├─────────────────────────────────────┤
│   1NF (Atomic values, no repeating) │
├─────────────────────────────────────┤
│         Unnormalized (Mess)         │
└─────────────────────────────────────┘
0NF - Unnormalized (The Mess)
Plain text
orders table:
┌────┬───────────┬────────────────────────────────────────────┬──────────┐
│ id │ customer  │ items                                      │ total    │
├────┼───────────┼────────────────────────────────────────────┼──────────┤
│ 1  │ John, NYC │ Laptop: $1000 x 1, Mouse: $25 x 2         │ $1050    │
│ 2  │ Jane, LA  │ Keyboard: $75 x 1, Monitor: $300 x 1      │ $375     │
└────┴───────────┴────────────────────────────────────────────┴──────────┘

Problems:
- Items stored as text blob (can't query individual items)
- Multiple values in customer column
- Can't calculate "total spent on laptops"
- Can't find all NYC customers easily
1NF - First Normal Form

Rules:

  1. Atomic values (no multi-valued attributes)
  2. No repeating groups
  3. Each row uniquely identifiable
Plain text
After 1NF:

orders:
┌────┬──────────┬───────┬──────────┐
│ id │ cust_name│ city  │ total    │
├────┼──────────┼───────┼──────────┤
│ 1  │ John     │ NYC   │ $1050    │
│ 2  │ Jane     │ LA    │ $375     │
└────┴──────────┴───────┴──────────┘

order_items:
┌─────────┬───────────┬───────┬─────┬─────────┐
│ order_id│ item_name │ price │ qty │ line_total│
├─────────┼───────────┼───────┼─────┼─────────┤
│ 1       │ Laptop    │ 1000  │ 1   │ 1000    │
│ 1       │ Mouse     │ 25    │ 2   │ 50      │
│ 2       │ Keyboard  │ 75    │ 1   │ 75      │
│ 2       │ Monitor   │ 300   │ 1   │ 300     │
└─────────┴───────────┴───────┴─────┴─────────┘

Now we can:
- Query "how many laptops sold" easily
- Sum line_total to verify order total
- Separate customer info if needed
2NF - Second Normal Form

Rules:

  1. Must be in 1NF
  2. No partial dependencies (non-key attributes must depend on entire key)

When needed: Only for tables with composite keys.

Plain text
Before 2NF (with composite key):
enrollments:
┌────────────┬──────────┬──────────────┬─────────────┬──────────────┐
│ student_id │ course_id│ student_name │ course_name │ enrolled_date│
├────────────┼──────────┼──────────────┼─────────────┼──────────────┤
│ 1          │ 101      │ John         │ Math 101    │ 2024-01-15   │
│ 1          │ 102      │ John         │ Physics     │ 2024-01-16   │
│ 2          │ 101      │ Jane         │ Math 101    │ 2024-01-15   │
└────────────┴──────────┴──────────────┴─────────────┴──────────────┘
PK: (student_id, course_id)

Problems:
- student_name depends only on student_id (part of key)
- course_name depends only on course_id (part of key)
- If John changes name, update multiple rows
- Redundancy: "Math 101" stored for every enrollment

After 2NF:

students:
┌────┬──────────┐
│ id │ name     │
├────┼──────────┤
│ 1  │ John     │
│ 2  │ Jane     │
└────┴──────────┘

courses:
┌────┬───────────┐
│ id │ name      │
├────┼───────────┤
│ 101│ Math 101  │
│ 102│ Physics   │
└────┴───────────┘

enrollments:
┌────────────┬──────────┬──────────────┐
│ student_id │ course_id│ enrolled_date│
├────────────┼──────────┼──────────────┤
│ 1          │ 101      │ 2024-01-15   │
│ 1          │ 102      │ 2024-01-16   │
│ 2          │ 101      │ 2024-01-15   │
└────────────┴──────────┴──────────────┘
3NF - Third Normal Form

Rules:

  1. Must be in 2NF
  2. No transitive dependencies (non-key attributes depend only on key)
Plain text
Before 3NF:
orders:
┌────┬──────────┬───────────┬─────────────┬────────────┐
│ id │ cust_id  │ cust_name │ cust_city   │ total      │
├────┼──────────┼───────────┼─────────────┼────────────┤
│ 1  │ 100      │ John      │ NYC         │ 1000       │
│ 2  │ 100      │ John      │ NYC         │ 500        │
│ 3  │ 101      │ Jane      │ LA          │ 750        │
└────┴──────────┴───────────┴─────────────┴────────────┘

Problems:
- cust_name depends on cust_id, not directly on order id (transitive)
- cust_city depends on cust_id (transitive)
- John's info repeated for every order
- If John moves, need to update all his orders

After 3NF:

orders:
┌────┬──────────┬────────┐
│ id │ cust_id  │ total  │
├────┼──────────┼────────┤
│ 1  │ 100      │ 1000   │
│ 2  │ 100      │ 500    │
│ 3  │ 101      │ 750    │
└────┴──────────┴────────┘

customers:
┌────┬───────────┬───────────┐
│ id │ name      │ city      │
├────┼───────────┼───────────┤
│ 100│ John      │ NYC       │
│ 101│ Jane      │ LA        │
└────┴───────────┴───────────┘

Now:
- Update customer info in one place
- No redundancy
- Can add customer fields without touching orders
Higher Normal Forms (Briefly)

BCNF (Boyce-Codd Normal Form): Stricter version of 3NF

  • Every determinant is a candidate key

4NF: No multi-valued dependencies

Plain text
Before 4NF:
employee_skills_languages:
┌─────────────┬───────────┬───────────┐
│ employee_id │ skill     │ language  │
├─────────────┼───────────┼───────────┤
│ 1           │ Python    │ English   │
│ 1           │ Python    │ Spanish   │
│ 1           │ Java      │ English   │
│ 1           │ Java      │ Spanish   │
└─────────────┴───────────┴───────────┘

Problem: Skills and languages are independent multi-valued attributes
Need 4 rows to express: Employee 1 knows {Python, Java} and {English, Spanish}

After 4NF:
employee_skills:
┌─────────────┬───────────┐
│ employee_id │ skill     │
├─────────────┼───────────┤
│ 1           │ Python    │
│ 1           │ Java      │
└─────────────┴───────────┘

employee_languages:
┌─────────────┬───────────┐
│ employee_id │ language  │
├─────────────┼───────────┤
│ 1           │ English   │
│ 1           │ Spanish   │
└─────────────┴───────────┘

5NF: No join dependencies (very rare to need)

Denormalization: When to Break the Rules

Sometimes you intentionally add redundancy for performance.

When to Denormalize:
  1. Read-heavy, rarely updated data
  2. Complex joins are too slow
  3. Aggregation queries are frequent
  4. Real-time requirements
Common Denormalization Patterns:
Sql
-- 1. Counter Cache
-- Instead of counting every time
SELECT COUNT(*) FROM orders WHERE user_id = 1;

-- Add counter to users table
ALTER TABLE users ADD COLUMN orders_count INTEGER DEFAULT 0;

-- Update via trigger or application
UPDATE users SET orders_count = orders_count + 1 WHERE id = 1;

-- 2. Storing Aggregates
CREATE TABLE daily_stats (
    date DATE PRIMARY KEY,
    total_orders INTEGER,
    total_revenue DECIMAL(12,2),
    unique_customers INTEGER
);

-- 3. Materialized Path (for trees)
-- Instead of recursive queries
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    path LTREE  -- PostgreSQL ltree extension: '1.5.12.3'
);

-- Query all subcategories of 'Electronics' easily
SELECT * FROM categories WHERE path <@ '1.5';

-- 4. JSON for flexible attributes
-- When attributes vary by product type
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200),
    base_price DECIMAL(10,2),
    attributes JSONB  -- { "color": "red", "size": "XL", "material": "cotton" }
);

Design Patterns

1. Soft Deletes
Sql
-- Instead of DELETE, mark as deleted
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP;

-- Query only active
SELECT * FROM users WHERE deleted_at IS NULL;

-- Can restore if needed
UPDATE users SET deleted_at = NULL WHERE id = 1;
2. Audit Logging
Sql
-- Track who changed what and when
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW(),
    created_by INTEGER,
    updated_at TIMESTAMP DEFAULT NOW(),
    updated_by INTEGER
);

-- Or separate audit table
CREATE TABLE users_audit (
    id INTEGER,
    name VARCHAR(100),
    operation VARCHAR(10),  -- INSERT, UPDATE, DELETE
    changed_at TIMESTAMP,
    changed_by INTEGER
);
3. Multi-tenancy Patterns

Separate Databases:

Plain text
Each tenant gets their own database
Pros: Complete isolation, easy backup per tenant
Cons: Harder to manage, cross-tenant queries difficult

Separate Schemas:

Sql
-- Same database, different schemas
CREATE SCHEMA tenant_abc;
CREATE SCHEMA tenant_xyz;

-- Tables: tenant_abc.users, tenant_xyz.users
-- Set search_path per connection
SET search_path = tenant_abc;

Shared Schema with Tenant ID:

Sql
CREATE TABLE users (
    tenant_id INTEGER NOT NULL,
    id SERIAL,
    name VARCHAR(100),
    PRIMARY KEY (tenant_id, id)
);

-- Every query must include tenant_id
SELECT * FROM users WHERE tenant_id = 123 AND id = 1;

-- Use Row Level Security
CREATE POLICY tenant_isolation ON users
    USING (tenant_id = current_setting('app.tenant_id')::INTEGER);
4. Handling Time Zones
Sql
-- Store in UTC, display in local time
CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200),
    start_time TIMESTAMPTZ,  -- Stored as UTC
    timezone VARCHAR(50)     -- For reference
);

-- Convert on retrieval
SELECT 
    name,
    start_time AT TIME ZONE 'America/New_York' as local_start
FROM events;
5. Handling Large Text
Sql
-- Store large content separately or use TEXT
CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    title VARCHAR(200),
    summary VARCHAR(500),    -- For list views
    content TEXT,            -- Full content
    content_storage VARCHAR(20) DEFAULT 'inline',
    -- For really large content, store in S3 and keep reference
    s3_key VARCHAR(200)
);

Design Checklist

Before finalizing your schema:

  • All tables have primary keys
  • Foreign keys have indexes
  • Appropriate data types chosen
  • Constraints defined (NOT NULL, CHECK, etc.)
  • Normalized to at least 3NF
  • Identified denormalization needs
  • Soft deletes implemented if needed
  • Audit fields added
  • Multi-tenancy considered
  • Timezone handling planned
  • Growth estimates made (will BIGINT be needed?)

Next: Indexing Deep Dive