Shared from "Arpit Bhayani Blogs" on Inkdown

Databases Were Not Designed For This

Source: https://arpitbhayani.me/blogs/defensive-databases
Date: 2026-03-08

There is an implicit contract at the foundation of every database architecture decision you have ever made. You probably never wrote it down. Nobody does. It just… existed.



The contract goes something like this: the caller is a human-authored application, running deterministic code, issuing predictable queries, reviewed by a developer before deployment. Writes are intentional. Connections are brief. When something goes wrong, a human notices. The database can be dumb and fast because the application layer is smart and careful.

For forty years, this contract held. It shaped how we designed schemas, sized connection pools, granted permissions, and thought about failure modes. It worked because the assumption was correct.


It is no longer correct. Agentic AI systems violate this contract at every layer simultaneously.

In this article, I break down exactly which assumptions are failing, why they matter, and what to do about it - with concrete patterns and code. Let’s dig right in…

Assumption - Deterministic Caller

In every application you have deployed before agents, the queries hitting your database were authored by a human.

  • A developer wrote the SQL.
  • A developer code-reviewed it.
  • A developer tested it and deployed it.

This assumption runs so deep that the tooling reflects it automatically: indexes are chosen for the query patterns you have observed, caching layers warm up on repeated queries, and connection pools are tuned around the expected number of concurrent queries of a known complexity.

Agents work differently; they reason their way to queries. Different reasoning paths produce different queries against the same tables.

An agent working on a customer analytics task might issue a join across five tables that has never been issued before, hold the connection while it thinks about the result, then issue a completely different follow-up. Your indexes cover the happy path. Your connection pool is sized for your observed peak. Neither of those holds when the agent can build any query depending on the data it needs.

Statement Timeouts

Statement timeouts are your first line of defense. A human-authored query that takes 30 seconds is a bug that someone will notice. An agent query that takes 30 seconds might be a reasoning loop that no one is watching.

So, set timeouts at the role level, not just the application level.

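A minimal sketch of role-level timeouts, assuming a hypothetical role named `agent_analytics` and illustrative limits of 30 and 60 seconds. The helper only renders the Postgres `ALTER ROLE ... SET` statements; how you apply them (migration tool, `psql`, and so on) is up to you.

```python
def role_timeout_statements(role: str, statement_ms: int, idle_in_tx_ms: int) -> list[str]:
    """Render Postgres statements that pin timeouts to a role.

    Bare integer values are interpreted by Postgres as milliseconds.
    """
    # Role names are interpolated into SQL, so accept plain identifiers only.
    if not role.isidentifier():
        raise ValueError(f"unsafe role name: {role!r}")
    return [
        f"ALTER ROLE {role} SET statement_timeout = {int(statement_ms)};",
        f"ALTER ROLE {role} SET idle_in_transaction_session_timeout = {int(idle_in_tx_ms)};",
    ]


# Hypothetical role and limits, chosen only for illustration.
for stmt in role_timeout_statements("agent_analytics", 30_000, 60_000):
    print(stmt)
```

Settings attached to a role take effect for sessions that role opens afterwards, so the limits hold even if an agent's client code forgets to set them.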

The idle_in_transaction_session_timeout is especially important. An agent can legitimately pause mid-reasoning while holding an open transaction, and this setting caps how long such a session may sit idle before Postgres terminates it.

Assumption - Writes are Intentional

The most dangerous assumption in database architecture is that every write was reviewed by a human before it happened. This was basically true for your entire career, but not anymore.

Agents write autonomously. They write based on their current understanding of the task, which may be wrong. Agents write in loops when their tools return unexpected results. Agents write on retries when a transient network error makes them ‘think’ the first attempt failed. Agents can even write thousands of rows in the time it takes you to get a Slack notification that something looks off.

Here’s a real documented failure pattern - an agent calling a legacy API receives HTTP 200 with an empty result set. The API failed silently because the database connection pool was exhausted downstream. The agent interprets “no data” as “no problem” and proceeds to process 500 transactions with incomplete data. No exception was raised. No alert fired. The log showed “decision: approved” on every record.

The core fix here is to design your write paths assuming the caller might be wrong, might retry, and might not be watching the results.

Soft Deletes Everywhere

Never let an agent hard-delete anything. Use soft deletes as a baseline for any table an agent can write to.

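A small sketch of the soft-delete pattern. It uses Python's built-in `sqlite3` as a stand-in for Postgres so it runs anywhere; the `orders` table, its columns other than `deleted_by`, and the function names are assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE orders (
               id         INTEGER PRIMARY KEY,
               item       TEXT NOT NULL,
               deleted_at TEXT,   -- NULL means the row is live
               deleted_by TEXT    -- identity of the agent that deleted it
           )"""
    )
    return conn


def soft_delete(conn: sqlite3.Connection, order_id: int, agent_id: str) -> int:
    """Mark a live row as deleted; returns the number of rows changed (0 or 1)."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "UPDATE orders SET deleted_at = ?, deleted_by = ? "
        "WHERE id = ? AND deleted_at IS NULL",
        (now, agent_id, order_id),
    )
    return cur.rowcount


def deleted_by_agent(conn: sqlite3.Connection, agent_id: str) -> list[int]:
    """The 'show me everything agent X deleted' query."""
    rows = conn.execute(
        "SELECT id FROM orders WHERE deleted_by = ? ORDER BY id", (agent_id,)
    )
    return [row[0] for row in rows]
```

Because the row is never physically removed, undoing an agent's mistake is an `UPDATE` that clears the two columns rather than a restore from backup.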

The deleted_by column is more important than it looks. When you are debugging what happened two hours ago, “show me everything agent X deleted” is a query you will want to run.

Append-only Event Logs

For operations where the stakes are higher - financial records, inventory changes, user state mutations - consider going further and making the table append-only. The agent never issues UPDATE or DELETE. It issues INSERT with a new state and a reason:

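A sketch of an append-only table, again using `sqlite3` as a runnable stand-in for Postgres. The `inventory_events` table and its columns are hypothetical, and the triggers that reject `UPDATE` and `DELETE` are an assumed enforcement mechanism (in Postgres you might simply not grant those privileges).

```python
import sqlite3

SCHEMA = """
CREATE TABLE inventory_events (
    event_id INTEGER PRIMARY KEY AUTOINCREMENT,
    sku      TEXT NOT NULL,
    quantity INTEGER NOT NULL,  -- the full new state, not a delta
    reason   TEXT NOT NULL,
    agent_id TEXT NOT NULL
);
-- Assumed enforcement: any attempt to rewrite history is aborted.
CREATE TRIGGER inventory_events_no_update BEFORE UPDATE ON inventory_events
BEGIN SELECT RAISE(ABORT, 'inventory_events is append-only'); END;
CREATE TRIGGER inventory_events_no_delete BEFORE DELETE ON inventory_events
BEGIN SELECT RAISE(ABORT, 'inventory_events is append-only'); END;
"""


def open_log() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record(conn, sku: str, quantity: int, reason: str, agent_id: str) -> None:
    """The only write an agent performs: INSERT a new state with a reason."""
    conn.execute(
        "INSERT INTO inventory_events (sku, quantity, reason, agent_id) "
        "VALUES (?, ?, ?, ?)",
        (sku, quantity, reason, agent_id),
    )


def quantity_at(conn, sku: str, steps_back: int = 0):
    """Latest state is steps_back=0; 'undo' is the same projection with an offset."""
    row = conn.execute(
        "SELECT quantity FROM inventory_events WHERE sku = ? "
        "ORDER BY event_id DESC LIMIT 1 OFFSET ?",
        (sku, steps_back),
    ).fetchone()
    return None if row is None else row[0]
```

Reading the state "one step back" never mutates anything, which is what makes undo a query rather than a repair job.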

This is the event sourcing pattern applied at the table level. A single append-only log table for your most sensitive entities gives you a complete audit trail and makes “undo” a projection query.

Idempotency Keys Are Not Optional

Agents retry, and this is by design. Most orchestration frameworks operate on at-least-once delivery semantics. If a step fails, it runs again. Your write paths need to be designed for this.

An idempotency key is a stable identifier that an agent includes with every write. A unique constraint on the key lets the database detect duplicates, and the write path silently ignores them. The agent gets a successful response either way. Running the operation twice produces the same result as running it once.

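A sketch of an idempotent write path, using `sqlite3` as a runnable stand-in. The `refunds` table and `issue_refund` function are hypothetical; the `ON CONFLICT ... DO NOTHING` clause shown here is also accepted by Postgres.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE refunds (
               id              INTEGER PRIMARY KEY,
               idempotency_key TEXT NOT NULL UNIQUE,
               order_id        TEXT NOT NULL,
               amount_cents    INTEGER NOT NULL
           )"""
    )
    return conn


def issue_refund(conn, idempotency_key: str, order_id: str, amount_cents: int) -> bool:
    """Insert at most once per key.

    A retry with the same key is a silent no-op rather than an error.
    Returns True only when this call actually wrote the row.
    """
    cur = conn.execute(
        "INSERT INTO refunds (idempotency_key, order_id, amount_cents) "
        "VALUES (?, ?, ?) "
        "ON CONFLICT (idempotency_key) DO NOTHING",
        (idempotency_key, order_id, amount_cents),
    )
    return cur.rowcount == 1
```

The caller can treat both outcomes as success; the boolean is only there for logging how often retries actually happen.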

In practice, the agent constructs the key like this:

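One possible way to build such a key, assuming the orchestration layer exposes a stable task ID and that a step name plus an operation label identify the write within the task. The function name and the choice of SHA-256 over a JSON-encoded list are illustrative assumptions, not the article's exact recipe.

```python
import hashlib
import json


def idempotency_key(task_id: str, step: str, operation: str) -> str:
    """Derive a deterministic key from values that do not change across retries.

    JSON-encoding the parts as a list avoids ambiguity between, say,
    ("a:b", "c") and ("a", "b:c") that naive string joining would create.
    """
    payload = json.dumps([task_id, step, operation], separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical identifiers, for illustration only.
first_try = idempotency_key("task-abc-123", "check-inventory", "reserve-stock")
retry = idempotency_key("task-abc-123", "check-inventory", "reserve-stock")
print(first_try == retry)
```

Anything that varies between attempts (timestamps, attempt counters, random IDs) must stay out of the key, otherwise each retry looks like a new operation.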

The task ID comes from the orchestration layer and is stable across retries of the same logical task. This means the agent can retry as many times as it needs to, and your database sees exactly one write per logical operation.

Assumption - Connections are Brief

Traditional connection pool sizing follows a straightforward mental model. Your application handles N concurrent requests. Each request needs one database connection for a brief period. You size your pool to slightly above your expected concurrency peak, add a little headroom, and you are done.

Agents break this model in three ways.

  1. Agents hold connections longer

A multi-step reasoning task may issue a query, pause to process the result with the LLM, issue another query, pause again, and repeat. Each pause holds the connection open. The connection time per task is no longer “query execution time” - it is “(query execution time + LLM inference time) x reasoning steps.”

  2. Agents fan out

A single high-level agent task often spawns sub-agents to work in parallel. One task becomes five simultaneous database sessions. This can exhaust connections: concurrent agent workflows hold db.session open across long IO waits until Postgres runs out of connection slots.

  3. Agents multiply unexpectedly

In development, you had three agents. In production, you have thirty. Nobody updated the connection pool configuration.

The fix is a dedicated connection pool for agent workloads, sized independently from your human-facing transactional application traffic.

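To make the fail-fast behaviour concrete, here is a standard-library sketch of a bounded pool whose `pool_timeout` plays the same role as the setting discussed here. `FailFastPool`, `PoolTimeout`, and `acquire_with_backoff` are hypothetical names; a real deployment would normally rely on the pooling built into its database library rather than on this toy.

```python
import queue
import random
import time


class PoolTimeout(Exception):
    """Raised when no connection frees up within pool_timeout seconds."""


class FailFastPool:
    def __init__(self, connections, pool_timeout: float = 3.0):
        self._free = queue.Queue()
        for conn in connections:
            self._free.put(conn)
        self.pool_timeout = pool_timeout

    def acquire(self):
        try:
            # Wait a bounded time instead of queueing indefinitely.
            return self._free.get(timeout=self.pool_timeout)
        except queue.Empty:
            raise PoolTimeout("agent pool saturated") from None

    def release(self, conn) -> None:
        self._free.put(conn)


def acquire_with_backoff(pool, attempts: int = 3, base_delay: float = 0.1,
                         sleep=time.sleep):
    """Retry acquisition with exponential backoff plus jitter, then give up."""
    for attempt in range(attempts):
        try:
            return pool.acquire()
        except PoolTimeout:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

Keeping this pool separate from the one serving human-facing traffic means a burst of agents can only starve other agents.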

The pool_timeout=3 is deliberate. When an agent cannot get a connection within 3 seconds, it should fail fast and retry with backoff, not queue indefinitely. Queuing requests behind a saturated pool is how you get cascading failures.

For systems running many agents concurrently, add PgBouncer between your agents and Postgres. Run it in transaction pooling mode, in which it returns a connection to the pool immediately after each transaction rather than holding it for the entire session. This is a significant multiplier on your effective connection capacity for agentic workloads.

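A sketch that renders the relevant part of a `pgbouncer.ini`, using `configparser` only to keep the example in Python. The three setting names are PgBouncer's own; the values 500 and 20 mirror the figures used in this section, and a real file would also need a `[databases]` section, omitted here because its connection details are site-specific.

```python
import configparser
import io


def render_pgbouncer_section(max_clients: int, server_pool_size: int) -> str:
    """Return the [pgbouncer] section for transaction pooling as INI text."""
    cfg = configparser.ConfigParser()
    cfg["pgbouncer"] = {
        # Hand the server connection back after every transaction.
        "pool_mode": "transaction",
        # How many agent-side connections PgBouncer will accept.
        "max_client_conn": str(max_clients),
        # How many real Postgres connections back each database/user pair.
        "default_pool_size": str(server_pool_size),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(render_pgbouncer_section(max_clients=500, server_pool_size=20))
```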

In transaction pooling mode, 20 actual Postgres connections can serve 500 agent connections, because each agent only holds a Postgres connection for the duration of a single transaction, not the entire multi-step task.

Assumption - Bad Queries Fail Loudly

In a human-operated system, a slow or incorrect query surfaces quickly. The dashboard loads slowly. The API times out. An engineer runs EXPLAIN ANALYZE and finds the problem. The feedback loop is tight.

Agents break that feedback loop. An agent that gets a slow query result just uses the result. An agent that gets an empty result set does not know whether the data genuinely does not exist or whether the query was wrong. It continues with its task, potentially writing decisions based on a bad read.

This is a different class of failure from application errors. An exception is observable. A semantically wrong query that returns rows is not.

The mitigation is building agent-specific observability into your database access layer. Standard slow query logs are not enough. You need to know which agent, which task, and which reasoning step produced a query. The most practical way to do this in Postgres is query comments.

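A minimal sketch of query tagging, assuming the access layer wraps every statement before sending it. `tag_query` is a hypothetical helper; the tag names follow the agent ID, task ID, and step described in this section.

```python
import re

# Anything outside this character set is replaced, so a tag value can never
# contain "*/" and terminate the comment early.
_UNSAFE = re.compile(r"[^A-Za-z0-9_.\-]")


def tag_query(sql: str, agent_id: str, task_id: str, step: str) -> str:
    """Prefix a SQL statement with a comment identifying its origin."""
    tags = {"agent_id": agent_id, "task_id": task_id, "step": step}
    comment = ", ".join(f"{key}={_UNSAFE.sub('_', value)}" for key, value in tags.items())
    return f"/* {comment} */ {sql}"


print(tag_query(
    "SELECT quantity FROM inventory WHERE sku = %s",
    agent_id="fulfillment-v3",
    task_id="task-abc-123",
    step="check-inventory",
))
```

The comment travels with the statement text, so it needs no extra columns, session variables, or driver support.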

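These comments appear in pg_stat_activity, pg_stat_statements, and your slow query logs. One caveat: pg_stat_statements groups statements by their normalized form and keeps a single representative text per group, so per-task tags are most reliable in pg_stat_activity and the logs. A query that appears in your slow query log tagged agent_id=fulfillment-v3, task_id=task-abc-123, step=check-inventory is immediately actionable. Without this, you are doing archaeology.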

Build a monitoring view that surfaces queries grouped by agent:

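The aggregation itself can be sketched in a few lines. This version works on `(query_text, total_ms)` pairs that you would pull from your logs or statistics views; the function name, the input shape, and the `untagged` bucket are assumptions for illustration, and it is plain Python rather than a database view.

```python
import re
from collections import defaultdict

# Pull the agent_id tag out of a leading /* ... */ comment.
_AGENT_TAG = re.compile(r"/\*.*?agent_id=([A-Za-z0-9_.\-]+).*?\*/", re.S)


def db_time_share_by_agent(samples):
    """Return {agent_id: fraction of total database time}.

    samples: iterable of (query_text, total_ms) pairs.
    Queries without an agent_id tag are grouped under "untagged".
    """
    totals = defaultdict(float)
    for query_text, total_ms in samples:
        match = _AGENT_TAG.search(query_text)
        totals[match.group(1) if match else "untagged"] += total_ms
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {agent: ms / grand_total for agent, ms in totals.items()}
```

Sorting the result by share, largest first, gives the on-call engineer the one number that matters during an incident.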

When you see a single agent type accounting for 60% of total database time, you know where to look.

Assumption - Schema is a Contract With Engineers

This is the assumption that most teams never think about until it breaks. Your schema was designed for developer ergonomics - named to make sense to the engineers, structured for query convenience, with nullable columns that “mean something” only if you read the original migration comment.

When an agent can see your schema - through Text-to-SQL, through tool definitions, through an MCP server wrapping your database - the schema becomes a contract with a language model. Column names, table structure, and nullability now affect whether the LLM generates correct queries or confident-sounding nonsense.

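Consider the difference between two column definitions for the same data: one with terse names and unexplained codes, the other with descriptive names and explicit constraints.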

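As an illustration of the contrast, here is a hypothetical pair of definitions for the same data, run through `sqlite3` so the effect of the constraints is visible. Both tables and every column name are invented for this example.

```python
import sqlite3

# Hypothetical "first" schema: terse names and magic integers whose meaning
# lives only in someone's head or an old migration comment.
OPAQUE_DDL = "CREATE TABLE ord (id INTEGER PRIMARY KEY, st INTEGER, flg INTEGER)"

# Hypothetical "second" schema: names say what the column holds, and the
# allowed values are spelled out where a model (or a person) can read them.
LEGIBLE_DDL = """
CREATE TABLE orders (
    id           INTEGER PRIMARY KEY,
    order_status TEXT NOT NULL
                 CHECK (order_status IN ('pending', 'shipped', 'cancelled')),
    is_gift      INTEGER NOT NULL DEFAULT 0 CHECK (is_gift IN (0, 1))
)
"""


def build_schemas() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(OPAQUE_DDL)
    conn.execute(LEGIBLE_DDL)
    return conn
```

The opaque table accepts any integer in `st`, so a wrong guess by the model is stored silently; the legible table rejects an unknown status at write time.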

The second schema generates correct LLM queries almost automatically. The first schema requires extensive prompt engineering to compensate for what should have been done at the schema level.

For schemas you cannot rename (legacy systems, high-migration-cost tables), build an agent-facing view layer.

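A sketch of such a view layer over a legacy table, using `sqlite3` as a runnable stand-in. The legacy table `cust_mst`, its status codes, and the view `agent_customers` are all hypothetical.

```python
import sqlite3

SCHEMA = """
-- Legacy table: cannot be renamed without a costly migration.
CREATE TABLE cust_mst (
    c_id  INTEGER PRIMARY KEY,
    c_nm  TEXT,
    st_cd INTEGER   -- 1 = active, 2 = suspended (assumed codes)
);

-- Agent-facing view: readable names, codes decoded into words.
CREATE VIEW agent_customers AS
SELECT c_id AS customer_id,
       c_nm AS customer_name,
       CASE st_cd
           WHEN 1 THEN 'active'
           WHEN 2 THEN 'suspended'
           ELSE 'unknown'
       END  AS account_status
FROM cust_mst;
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = open_db()
conn.execute("INSERT INTO cust_mst (c_id, c_nm, st_cd) VALUES (1, 'Ada', 1)")
print(conn.execute("SELECT * FROM agent_customers").fetchall())
```

The agent is shown only the view's column names, so the legacy naming never enters its context.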

Write column comments as if they are docstrings - because for Text-to-SQL agents, they are:

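A small helper that renders Postgres `COMMENT ON COLUMN` statements, as a sketch of what docstring-style comments might look like. The table, column, and comment text are invented for illustration, and the quote escaping assumes Postgres's default standard-conforming strings.

```python
def column_comment(table: str, column: str, doc: str) -> str:
    """Render a Postgres COMMENT ON COLUMN statement with the text safely quoted."""
    for name in (table, column):
        # Names are interpolated into SQL, so accept plain identifiers only.
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    escaped = doc.replace("'", "''")  # double single quotes inside the literal
    return f"COMMENT ON COLUMN {table}.{column} IS '{escaped}';"


# Hypothetical column; the comment states meaning, units, and allowed values.
print(column_comment(
    "orders",
    "order_status",
    "Lifecycle state of the order. One of: pending, shipped, cancelled. "
    "A cancelled order is never shipped.",
))
```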

Scoping Blast Radius

There is one more failure mode worth treating separately, because it cuts across all the assumptions above: the blast radius of a misbehaving agent is determined by the access it was granted.

Traditional applications share a database role, or at best have a few roles for different services. The assumption was that the application code was the guard rail. If the code only allowed users to update their own records, the database role did not need to enforce that - the application layer handled it.

Agents make this assumption dangerous. An agent that reasons itself into an incorrect state can issue queries that the application developers never anticipated. The agent is not a known, finite set of code paths - it is a general-purpose reasoner with access to a database connection. Application-layer guardrails do not bind it the way they bind deterministic code.

The fix is role-per-agent-type access, with the minimum necessary privileges defined at the database level:

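A sketch that renders the statements for one agent role, assuming a hypothetical fulfillment agent that reads through a view and writes only to an append-only table. The role and object names are invented; note that nothing here grants `UPDATE` or `DELETE`.

```python
def agent_role_statements(role: str, read_views: list[str],
                          insert_tables: list[str]) -> list[str]:
    """Render Postgres statements for a least-privilege agent role."""
    names = [role, *read_views, *insert_tables]
    # Names are interpolated into SQL, so accept plain identifiers only.
    if not all(name.isidentifier() for name in names):
        raise ValueError("role and object names must be plain identifiers")
    statements = [f"CREATE ROLE {role} LOGIN;"]
    # Read access goes through agent-facing views, not base tables.
    statements += [f"GRANT SELECT ON {view} TO {role};" for view in read_views]
    # Write access is INSERT only, matching the append-only pattern.
    statements += [f"GRANT INSERT ON {table} TO {role};" for table in insert_tables]
    return statements


# Hypothetical role and objects, for illustration only.
for stmt in agent_role_statements(
    "agent_fulfillment",
    read_views=["agent_orders"],
    insert_tables=["inventory_events"],
):
    print(stmt)
```

A new role in Postgres starts with no privileges on your tables, so whatever is not listed here stays out of reach regardless of what the agent reasons its way into.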

The question to ask in your access design review is not “what does this agent need?” but “what is the worst case if this agent’s reasoning goes wrong, or if its credentials are compromised?” Reduce that blast radius at the database level, where it cannot be reasoned around.

Defensively Designed Data Layer

Pulling this together, here is what the data layer looks like for a team that has internalized these failure modes. None of it is exotic. All of it exists in battle-tested database tooling.

Every agent type has its own database role with the minimum necessary privileges, enforced at the database level with role-level timeouts. Agents connect through a dedicated connection pool, sized for agentic workload patterns and separated from human-facing traffic. PgBouncer runs in transaction pooling mode between agents and Postgres.

Tables that agents can write to use soft deletes with a deleted_by column that captures agent identity. High-stakes write paths use append-only event log tables with idempotency key constraints. Every write carries an agent ID and task ID so the audit trail is always traversable.

Schema objects that agents can see are named for legibility, not legacy convenience. A maintained view layer translates legacy column names to meaningful ones. Column comments are written as docstrings. Agents are granted access to views, not directly to underlying tables.

Every query issued by an agent carries a comment with the agent ID, task ID, and reasoning step. A monitoring dashboard aggregates this data so the on-call engineer can see “agent X consumed 40% of database time in the last hour” in real time.

The circuit breakers are defined: max writes per task enforced in the orchestration layer, max rows affected per statement enforced via statement complexity checks, max task duration enforced with a watchdog process that terminates stalled agent sessions.
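
The first of those breakers, a cap on writes per task, might be sketched like this in the orchestration layer. `WriteBudget` and `WriteBudgetExceeded` are hypothetical names, and the limit of 100 is an arbitrary example value.

```python
class WriteBudgetExceeded(RuntimeError):
    """Raised when a task tries to write more rows than it is allowed."""


class WriteBudget:
    def __init__(self, max_writes_per_task: int):
        self.max_writes_per_task = max_writes_per_task
        self._used: dict[str, int] = {}

    def charge(self, task_id: str, rows: int = 1) -> int:
        """Account for `rows` writes before they are issued.

        Returns the remaining budget, or raises without recording anything
        if the write would take the task over its limit.
        """
        used = self._used.get(task_id, 0) + rows
        if used > self.max_writes_per_task:
            raise WriteBudgetExceeded(
                f"task {task_id} would reach {used} writes "
                f"(limit {self.max_writes_per_task})"
            )
        self._used[task_id] = used
        return self.max_writes_per_task - used


# Hypothetical usage: the orchestrator calls charge() before each write tool call.
budget = WriteBudget(max_writes_per_task=100)
print(budget.charge("task-abc-123", rows=40))
```

Charging before the write, rather than counting afterwards, means a looping agent is stopped at the limit instead of slightly past it.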

None of this is new technology. Soft deletes, append-only logs, least-privilege roles, row-level security, idempotency keys, query tagging - these are patterns that have existed for years. The shift that agents force is that these patterns go from “best practice we keep meaning to implement” to “load-bearing infrastructure.” Agents do not give you the luxury of deferring them.

The database was not designed for this caller. But the tools to make it safe are already there.

Conclusion and Footnote

Traditional database architecture rests on assumptions that agentic AI workloads systematically violate: deterministic callers, intentional writes, brief connections, loud failures, and schema as a developer contract.

Each of these assumptions held because a human was always somewhere in the loop. Agents remove that guarantee. The result is that patterns long treated as optional best practice - soft deletes, append-only logs, idempotency keys, least-privilege roles, query tagging - become load-bearing infrastructure.

None of this requires new technology. It requires treating the database as a defensive layer that assumes the caller might be wrong, might retry, and might not be watching the results.