003-rag-production

What Matters in Production RAG

Source: https://arpitbhayani.me/blogs/rag-production Date: 2026-05-15

Most of us build RAG the same way: follow a tutorial that embeds a handful of PDFs, stores the vectors in a local Chroma instance, and chains everything together with LangChain (if that's still a thing). The demo works. The answer looks reasonable. Then you take it to production and it falls apart in quiet, hard-to-diagnose ways.

RAG Basics

Chunking

Embedding Models and the Model-Lock Problem

RAG Indexing Pipelines

Chunk Identity

Avoiding Unnecessary Re-Embedding

Index Versioning and No-Downtime Updates

Embedding Model Upgrades

Observability and Retrieval Tracing

The Span Architecture

Logging the “Why”

Retrieval Quality vs Answer Quality

Index Version Attribution in Traces

Footnote