InkdownInkdown
Start writing

Study

59 files·8 subfolders

Shared Workspace

Study
core

02-relational-databases

Shared from "Study" on Inkdown

02 - Relational Databases & SQL

What is a Relational Database?

A relational database organizes data into tables (relations) with predefined schemas. Tables relate to each other through keys, enabling powerful queries across multiple tables.

The Relational Model
Plain text
┌─────────────────────────────────────────────────────────────┐
│                      DATABASE                               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐         ┌──────────────┐                 │
│  │    USERS     │         │   ORDERS     │                 │
│  ├──────────────┤         ├──────────────┤                 │
│  │ id (PK)      │◄────────│ user_id (FK) │                 │
│  │ name         │    1:M  │ id (PK)      │                 │
│  │ email        │         │ total        │                 │
│  │ created_at   │         │ status       │                 │
│  └──────────────┘         └──────────────┘                 │
│         │                                                   │
│         │ 1:1                                               │
│         ▼                                                   │
│  ┌──────────────┐                                           │
│  │ USER_PROFILES│                                           │
│  ├──────────────┤                                           │
│  │ user_id (FK)│                                           │
│  │ bio          │                                           │
│  │ avatar_url   │                                           │
│  └──────────────┘                                           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

PK = Primary Key (unique identifier)
FK = Foreign Key (reference to another table)
1:M = One-to-Many relationship
1:1 = One-to-One relationship
M:N = Many-to-Many (requires junction table)
programming-language-concepts.md
zero-language-explanation.md
DB
01-introduction.md
02-relational-databases.md
03-database-design.md
04-indexing.md
05-transactions-acid.md
06-nosql-databases.md
07-query-optimization.md
08-replication-ha.md
09-sharding-partitioning.md
10-caching-strategies.md
11-cap-theorem.md
12-connection-pooling.md
13-backup-recovery.md
14-monitoring.md
15-database-selection.md
README.md
JS
Event loop
Merlin Backend
01-Orchestration.md
02-DeepResearch.md
03-Search.md
04-Scraping.md
05-Streaming.md
06-MultiProviderLLM.md
07-MemoryAndContext.md
08-ErrorHandling.md
09-RateLimiting.md
10-TaskQueue.md
11-SecurityAndAuth.md
Orchestration-2nd-draft
OpenAI Agents Python
00_OVERVIEW.md
01_AGENT_SYSTEM.md
02_RUNNER_SYSTEM.md
03_TOOL_SYSTEM.md
04_ITEMS_SYSTEM.md
05_GUARDRAILS.md
06_HANDOFFS.md
07_MEMORY_SESSIONS.md
08_MODEL_PROVIDERS.md
09_SANDBOX_SYSTEM.md
10_TRACING.md
11_RUN_STATE.md
12_CONTEXT.md
13_LIFECYCLE_HOOKS.md
14_CONFIGURATION.md
15_ERROR_HANDLING.md
16_STREAMING.md
17_EXTENSIONS.md
18_MCP_INTEGRATION.md
19_BEST_PRACTICES.md
20_ARCHITECTURE_PATTERNS.md
opencode-study
context-handling
core
Python
Alembic
Basics
sqlalchemy - fastapi
SQLAlchemy overview
tweets
system_design_for_agentic_apps.md

SQL: Structured Query Language

SQL is the standard language for relational databases. It has several sub-languages:

1. DDL (Data Definition Language) - Structure
Sql
-- CREATE: Make new tables
CREATE TABLE users (
    id SERIAL PRIMARY KEY,           -- Auto-incrementing integer PK
    email VARCHAR(255) NOT NULL UNIQUE, -- Variable string, required, unique
    name VARCHAR(100) NOT NULL,
    age INTEGER CHECK (age >= 0),      -- Constraint: age can't be negative
    status VARCHAR(20) DEFAULT 'active',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- ALTER: Modify existing tables
ALTER TABLE users 
    ADD COLUMN phone VARCHAR(20),
    DROP COLUMN age,
    ALTER COLUMN name SET NOT NULL;

-- DROP: Remove tables (DANGEROUS!)
DROP TABLE users;  -- Deletes table and all data
DROP TABLE IF EXISTS users;  -- Safe version

-- TRUNCATE: Empty table but keep structure (faster than DELETE)
TRUNCATE TABLE users;
2. DML (Data Manipulation Language) - Data
Sql
-- INSERT: Add new rows
INSERT INTO users (email, name, age) 
VALUES ('john@example.com', 'John Doe', 30);

-- Insert multiple rows
INSERT INTO users (email, name) 
VALUES 
    ('jane@example.com', 'Jane Smith'),
    ('bob@example.com', 'Bob Wilson');

-- UPDATE: Modify existing rows
UPDATE users 
SET name = 'John Updated', age = 31 
WHERE id = 1;

-- DANGEROUS: UPDATE without WHERE updates ALL rows!
UPDATE users SET status = 'inactive';  -- ALL users become inactive!

-- DELETE: Remove rows
DELETE FROM users WHERE id = 1;

-- DANGEROUS: DELETE without WHERE deletes ALL rows!
DELETE FROM users;  -- Empty table!
3. DQL (Data Query Language) - Retrieval
Sql
-- Basic SELECT
SELECT * FROM users;
SELECT name, email FROM users;

-- Filtering with WHERE
SELECT * FROM users WHERE age > 25;
SELECT * FROM users WHERE name LIKE 'J%';  -- Starts with J
SELECT * FROM users WHERE email LIKE '%@gmail.com';  -- Ends with @gmail.com

-- Comparison operators
-- =, != or <>, >, <, >=, <=
-- BETWEEN, IN, NOT IN, IS NULL, IS NOT NULL

SELECT * FROM users 
WHERE age BETWEEN 25 AND 35 
  AND status IN ('active', 'pending')
  AND email IS NOT NULL;

-- Sorting
SELECT * FROM users ORDER BY created_at DESC;
SELECT * FROM users ORDER BY age ASC, name DESC;

-- Limiting results
SELECT * FROM users LIMIT 10;
SELECT * FROM users LIMIT 10 OFFSET 20;  -- Pagination: page 3

-- Aggregation functions
SELECT 
    COUNT(*) as total_users,
    AVG(age) as average_age,
    MAX(age) as oldest,
    MIN(age) as youngest,
    SUM(salary) as total_salary
FROM users;

-- GROUP BY: Aggregate by category
SELECT 
    status,
    COUNT(*) as count,
    AVG(age) as avg_age
FROM users
GROUP BY status;

-- HAVING: Filter aggregated results
SELECT status, COUNT(*) as count
FROM users
GROUP BY status
HAVING COUNT(*) > 5;  -- Only groups with 5+ users
4. JOINs: Combining Tables
Sql
-- Tables structure:
-- users: id, name, email
-- orders: id, user_id, amount, status

-- INNER JOIN: Only matching rows from both tables
SELECT u.name, o.amount
FROM users u
INNER JOIN orders o ON u.id = o.user_id;
-- Result: Only users who have orders

-- LEFT JOIN: All from left, matching from right
SELECT u.name, o.amount
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
-- Result: All users, NULL for users without orders

-- RIGHT JOIN: All from right, matching from left (rarely used)
SELECT u.name, o.amount
FROM users u
RIGHT JOIN orders o ON u.id = o.user_id;

-- FULL OUTER JOIN: All from both, NULL where no match
SELECT u.name, o.amount
FROM users u
FULL OUTER JOIN orders o ON u.id = o.user_id;

-- Multiple JOINs
SELECT u.name, o.amount, p.product_name
FROM users u
INNER JOIN orders o ON u.id = o.user_id
INNER JOIN order_items oi ON o.id = oi.order_id
INNER JOIN products p ON oi.product_id = p.id;

-- Self JOIN: Table joined to itself
-- employees: id, name, manager_id
SELECT e.name as employee, m.name as manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
5. Subqueries & CTEs
Sql
-- Subquery in WHERE
SELECT * FROM users 
WHERE id IN (SELECT user_id FROM orders WHERE amount > 100);

-- Subquery in FROM
SELECT avg_age.avg_value
FROM (SELECT AVG(age) as avg_value FROM users) as avg_age;

-- Correlated subquery (runs once per row)
SELECT name,
    (SELECT COUNT(*) FROM orders WHERE user_id = users.id) as order_count
FROM users;

-- CTE (Common Table Expression): Cleaner subqueries
WITH active_users AS (
    SELECT * FROM users WHERE status = 'active'
),
user_orders AS (
    SELECT user_id, COUNT(*) as order_count
    FROM orders
    GROUP BY user_id
)
SELECT au.name, COALESCE(uo.order_count, 0) as orders
FROM active_users au
LEFT JOIN user_orders uo ON au.id = uo.user_id;
6. Window Functions (Advanced)
Sql
-- ROW_NUMBER: Rank rows
SELECT 
    name,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) as rank
FROM employees;

-- PARTITION BY: Rank within groups
SELECT 
    department,
    name,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;

-- Running totals
SELECT 
    date,
    amount,
    SUM(amount) OVER (ORDER BY date) as running_total
FROM sales;

-- LAG/LEAD: Access previous/next rows
SELECT 
    date,
    amount,
    LAG(amount) OVER (ORDER BY date) as previous_day,
    amount - LAG(amount) OVER (ORDER BY date) as day_change
FROM sales;

Data Types Deep Dive

PostgreSQL Types
Sql
-- Numeric
INTEGER          -- 4 bytes, -2B to +2B
BIGINT           -- 8 bytes, huge numbers
SERIAL           -- Auto-incrementing integer
BIGSERIAL        -- Auto-incrementing bigint
NUMERIC(10,2)    -- Exact decimal, 10 digits, 2 after decimal
REAL / FLOAT     -- Approximate, 6 decimal digits precision
DOUBLE PRECISION -- Approximate, 15 decimal digits

-- String
VARCHAR(n)       -- Variable length, max n chars
CHAR(n)          -- Fixed length, padded with spaces
TEXT             -- Unlimited length (prefer this)

-- Date/Time
DATE             -- Just date: 2024-01-15
TIME             -- Just time: 14:30:00
TIMESTAMP        -- Date + time: 2024-01-15 14:30:00
TIMESTAMPTZ      -- With timezone: 2024-01-15 14:30:00+00
INTERVAL         -- Duration: '1 day 2 hours'

-- Boolean
BOOLEAN          -- TRUE, FALSE, NULL

-- JSON (PostgreSQL)
JSON             -- Stored as text, parsed each time
JSONB            -- Binary, indexed, preferred

-- Arrays
INTEGER[]        -- Array of integers
TEXT[][]         -- 2D array

-- Special
UUID             -- Universally unique identifier
INET             -- IP address
CIDR             -- IP network
MACADDR          -- MAC address
GEOMETRY         -- PostGIS extension for geographic data

Constraints & Data Integrity

Sql
-- PRIMARY KEY: Unique identifier
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    -- or: PRIMARY KEY (id)
    -- or composite: PRIMARY KEY (user_id, order_number)
);

-- FOREIGN KEY: Referential integrity
CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id),
    -- or with actions:
    order_id INTEGER REFERENCES orders(id) 
        ON DELETE CASCADE      -- Delete items when order deleted
        ON UPDATE CASCADE,      -- Update items when order id changes
);

-- UNIQUE: No duplicates allowed
CREATE TABLE users (
    email VARCHAR(255) UNIQUE,
    -- or as table constraint:
    CONSTRAINT unique_email UNIQUE (email)
);

-- CHECK: Custom validation
CREATE TABLE products (
    price DECIMAL CHECK (price > 0),
    quantity INTEGER CHECK (quantity >= 0),
    -- complex check:
    CONSTRAINT valid_dates CHECK (end_date > start_date)
);

-- NOT NULL: Required field
CREATE TABLE users (
    name VARCHAR(100) NOT NULL
);

-- DEFAULT: Auto-fill if not provided
CREATE TABLE users (
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    status VARCHAR(20) DEFAULT 'pending'
);

Views & Materialized Views

Sql
-- Regular View: Stored query, computed on access
CREATE VIEW active_users AS
SELECT id, name, email
FROM users
WHERE status = 'active';

-- Use it like a table
SELECT * FROM active_users WHERE name LIKE 'J%';

-- Materialized View: Precomputed, stored on disk
CREATE MATERIALIZED VIEW daily_sales AS
SELECT 
    DATE(created_at) as date,
    SUM(amount) as total
FROM orders
GROUP BY DATE(created_at);

-- Refresh when needed
REFRESH MATERIALIZED VIEW daily_sales;
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;  -- Without locking

-- With indexes on materialized views
CREATE INDEX idx_daily_sales_date ON daily_sales(date);

Stored Procedures & Functions

Sql
-- Simple function
CREATE OR REPLACE FUNCTION get_user_orders(user_id INTEGER)
RETURNS TABLE(order_id INTEGER, amount DECIMAL) AS $$
BEGIN
    RETURN QUERY
    SELECT id, total
    FROM orders
    WHERE orders.user_id = get_user_orders.user_id;
END;
$$ LANGUAGE plpgsql;

-- Usage
SELECT * FROM get_user_orders(1);

-- Function with logic
CREATE OR REPLACE FUNCTION update_user_status()
RETURNS TRIGGER AS $$
BEGIN
    IF NEW.login_count > 100 THEN
        NEW.status := 'vip';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Trigger: Auto-execute on events
CREATE TRIGGER check_vip_status
    BEFORE UPDATE ON users
    FOR EACH ROW
    EXECUTE FUNCTION update_user_status();

Advanced SQL Patterns

1. UPSERT (Insert or Update)
Sql
-- PostgreSQL
INSERT INTO users (id, name, email)
VALUES (1, 'John', 'john@new.com')
ON CONFLICT (id) DO UPDATE
SET name = EXCLUDED.name,
    email = EXCLUDED.email;

-- MySQL
INSERT INTO users (id, name, email)
VALUES (1, 'John', 'john@new.com')
ON DUPLICATE KEY UPDATE
    name = VALUES(name),
    email = VALUES(email);
2. Conditional Aggregation
Sql
SELECT 
    DATE(created_at) as date,
    COUNT(*) as total_orders,
    COUNT(*) FILTER (WHERE status = 'completed') as completed,
    COUNT(*) FILTER (WHERE status = 'cancelled') as cancelled,
    SUM(amount) FILTER (WHERE status = 'completed') as revenue
FROM orders
GROUP BY DATE(created_at);
3. Recursive CTEs (Hierarchical Data)
Sql
-- Employee hierarchy
WITH RECURSIVE hierarchy AS (
    -- Anchor: Start with top-level employees
    SELECT id, name, manager_id, 0 as level
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive: Find subordinates
    SELECT e.id, e.name, e.manager_id, h.level + 1
    FROM employees e
    INNER JOIN hierarchy h ON e.manager_id = h.id
)
SELECT * FROM hierarchy;

Best Practices

1. Query Writing
Sql
-- ✓ Good: Explicit columns
SELECT id, name, email FROM users;

-- ✗ Bad: SELECT * (slow, breaks when schema changes)
SELECT * FROM users;

-- ✓ Good: Meaningful aliases
SELECT u.name as user_name, o.total as order_total
FROM users u
JOIN orders o ON u.id = o.user_id;

-- ✓ Good: Use EXPLAIN before optimizing
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';
2. Schema Design
Sql
-- ✓ Good: Use appropriate types
CREATE TABLE products (
    id BIGSERIAL PRIMARY KEY,        -- Might need billions of products
    name VARCHAR(255) NOT NULL,
    price NUMERIC(19,4) NOT NULL,    -- Financial precision
    description TEXT,                -- Unlimited, no length worry
    metadata JSONB,                  -- Flexible schema
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- ✓ Good: Indexes on foreign keys
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- ✓ Good: Comments
COMMENT ON TABLE users IS 'Application users';
COMMENT ON COLUMN users.email IS 'Unique email for login';
3. Security
Sql
-- Create roles
CREATE ROLE app_read;
CREATE ROLE app_write;
CREATE ROLE app_admin;

-- Grant permissions
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_read;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA public TO app_write;

-- Row Level Security (RLS)
ALTER TABLE users ENABLE ROW LEVEL SECURITY;

CREATE POLICY user_isolation ON users
    FOR ALL
    TO app_users
    USING (id = current_setting('app.current_user_id')::INTEGER);

Common Pitfalls

  1. N+1 Query Problem: Loading related data in a loop
  2. Missing Indexes: Full table scans on large tables
  3. No Pagination: Loading millions of rows at once
  4. String Concatenation in Queries: SQL Injection risk
  5. Not Using Transactions: Data inconsistency on failures
  6. Long-running Transactions: Lock contention
  7. Over-normalization: Too many joins hurt performance

Next: Database Design & Normalization