TLDR: How LLM Inference Works
Date: 2025-11-22 Source: https://arpitbhayani.me/blogs/how-llm-inference-works
Overview
When you enter a prompt into an LLM, the model converts your text into numbers, processes them, and returns a response one token at a time. This article follows that journey step by step to show how LLM inference works.
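The text-to-numbers, one-token-at-a-time loop described above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the word-level vocabulary, the `NEXT` table, and `toy_model` are hypothetical stand-ins for a real tokenizer and transformer. The loop uses greedy decoding (always pick the highest-scoring token), which is only one of several decoding strategies.

```python
from typing import List

# Hypothetical word-level vocabulary; real tokenizers use subword units.
VOCAB = ["<eos>", "the", "cat", "sat", "on", "a", "mat"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}
EOS_ID = TOKEN_TO_ID["<eos>"]

# Hypothetical stand-in for the neural network: which token follows the last one.
# A real LLM conditions on the whole context, not just the last token.
NEXT = {"the": "cat", "cat": "sat", "sat": "on", "on": "a", "a": "mat", "mat": "<eos>"}


def toy_model(token_ids: List[int]) -> List[float]:
    """Return one score (logit) per vocabulary entry for the next token."""
    last = VOCAB[token_ids[-1]]
    scores = [0.0] * len(VOCAB)
    scores[TOKEN_TO_ID[NEXT.get(last, "<eos>")]] = 1.0
    return scores


def generate(prompt: str, max_new_tokens: int = 10) -> str:
    # 1. Text -> numbers (token IDs).
    ids = [TOKEN_TO_ID[word] for word in prompt.split()]
    generated: List[int] = []
    for _ in range(max_new_tokens):
        # 2. Run the model on the prompt plus everything generated so far.
        scores = toy_model(ids + generated)
        # 3. Greedy decoding: pick the highest-scoring token ID.
        next_id = max(range(len(scores)), key=lambda i: scores[i])
        if next_id == EOS_ID:
            break
        # 4. Append and repeat: the response grows one token at a time.
        generated.append(next_id)
    # 5. Numbers -> text.
    return " ".join(VOCAB[i] for i in generated)


print(generate("the cat"))  # sat on a mat
```

The model call sits inside the loop because each new token becomes part of the input for the next step.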
Key Points
- What are Large Language Models? LLMs are neural networks built on the transformer architecture.
- Tokenization: Before any computation happens, a tokenizer splits the input text into tokens (words, subwords, or characters) and maps each token to an integer ID from the model's vocabulary.
- Token Embeddings: Once the text becomes tokens, each discrete token ID is looked up in a learned embedding table and replaced by a continuous vector that the neural network can process.
- The Transformer Architecture: The transformer passes the embedding vectors through a stack of layers, each combining self-attention with a feed-forward network.
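The tokenization and embedding steps above can be sketched together. This is a minimal sketch, assuming a tiny hypothetical subword vocabulary and a greedy longest-match tokenizer. Real tokenizers learn far larger vocabularies, for example with byte-pair encoding. The embedding table here is random (seeded), while in a real model it is learned during training.

```python
import random
from typing import Dict, List

# Hypothetical toy subword vocabulary (illustration only).
VOCAB: Dict[str, int] = {
    "<unk>": 0, "un": 1, "believ": 2, "able": 3, "token": 4, "ize": 5, " ": 6,
}
UNK_ID = VOCAB["<unk>"]


def tokenize(text: str) -> List[int]:
    """Greedy longest-match tokenization: text -> list of integer token IDs."""
    ids: List[int] = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = end
                break
        else:
            # No entry matches: emit the unknown token and skip one character.
            ids.append(UNK_ID)
            i += 1
    return ids


EMBED_DIM = 4  # hypothetical; real models use hundreds to thousands of dimensions
_rng = random.Random(0)  # fixed seed so the example is reproducible
# Embedding table: row t is the vector for token ID t (random here, learned in practice).
EMBEDDINGS: List[List[float]] = [
    [_rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(len(VOCAB))
]


def embed(token_ids: List[int]) -> List[List[float]]:
    """Look up one continuous vector per discrete token ID."""
    return [EMBEDDINGS[t] for t in token_ids]


ids = tokenize("unbelievable tokenize")
print(ids)  # [1, 2, 3, 6, 4, 5]
vectors = embed(ids)
print(len(vectors), len(vectors[0]))  # 6 4
```

The same token ID always maps to the same vector, so the embedding step itself is a plain table lookup. Context-dependent meaning is added later by the transformer layers.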
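The transformer step can be illustrated with one heavily simplified layer. This is a sketch and does not reproduce any particular model's implementation. It assumes a single attention head, caller-supplied weight matrices (`wq`, `wk`, `wv`, `w1`, `w2` are hypothetical names), and a causal mask as used in decoder-only LLMs. It uses ReLU as in the original transformer (many LLMs use GELU or SwiGLU instead) and omits layer normalization and multi-head splitting.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with a causal mask."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(k[0])
    out = []
    for i, qi in enumerate(q):
        # Causal mask: position i only attends to positions 0..i.
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d) for j in range(i + 1)]
        weights = softmax(scores)
        # Weighted average of the value vectors of the visible positions.
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))])
    return out


def feed_forward(x: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Position-wise two-layer network with a ReLU in between."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(a, b)]


def transformer_layer(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix,
                      w1: Matrix, w2: Matrix) -> Matrix:
    # Residual connection around each sublayer; layer norm omitted for brevity.
    x = add(x, causal_self_attention(x, wq, wk, wv))
    return add(x, feed_forward(x, w1, w2))


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three 2-dimensional embeddings
eye = [[1.0, 0.0], [0.0, 1.0]]
w1 = [[0.0] * 4 for _ in range(2)]
w2 = [[0.0] * 2 for _ in range(4)]
print(transformer_layer(x, eye, eye, eye, w1, w2)[0])  # [2.0, 0.0]
```

Because of the causal mask, each position's output depends only on that position and earlier ones: changing the last input vector leaves the outputs at earlier positions unchanged.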