TLDR: How LLMs Really Work
Date: 2026-05-12 Source: https://arpitbhayani.me/blogs/how-llms-work
Overview
If you have used ChatGPT, Gemini, or Claude, you have already formed an intuition about what these systems do. You type something in, and text comes back that feels coherent, knowledgeable, and sometimes eerily human. But the machinery underneath is simultaneously simpler and stranger than most people expect.
Key Points
- Next-token Prediction Machine: A large language model (LLM) is, at its most fundamental level, a function that takes a sequence of tokens as input and outputs a probability distribution over its entire vocabulary for what the next token should be.
- What Training Actually Does: The model learns to produce these probability distributions by training on a massive corpus of text - essentially a large fraction of the written internet, books, code, and academic papers.
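
To make the two technical points above concrete, here is a minimal sketch in Python. It assumes the model's last step produces one raw score (a "logit") per vocabulary token and that a softmax turns those scores into probabilities; the training signal shown is the standard cross-entropy loss, which the summary does not name explicitly. The four-word vocabulary, the logit values, and the function names are all made up for illustration and do not come from the source article.

```python
import math

# Hypothetical toy vocabulary; real models use tens of thousands of tokens.
VOCAB = ["the", "cat", "sat", "mat"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_distribution(logits):
    """Map each vocabulary token to its predicted probability."""
    return dict(zip(VOCAB, softmax(logits)))


def cross_entropy(logits, target_index):
    """Loss for one position: negative log-probability of the true next token."""
    return -math.log(softmax(logits)[target_index])


# Made-up scores a model might emit for the token following some context.
logits = [1.0, 3.0, 0.5, 0.2]

probs = next_token_distribution(logits)
best = max(probs, key=probs.get)  # greedy choice: the most probable token

# Assumed training example: the token that actually came next was "cat".
loss = cross_entropy(logits, VOCAB.index("cat"))

print(probs)
print("most likely next token:", best)
print("loss if the true token is 'cat':", loss)
```

Under these assumptions, training amounts to adjusting the model's parameters so that this loss shrinks across the whole corpus, which is the same as raising the probability assigned to the token that really came next.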