RAG — Fieldtested

Retrieval-Augmented Generation — a pattern where relevant external documents are retrieved at query time and added to the prompt to ground the model's answer.

RAG was introduced by Lewis et al. in 2020 and has become the default architecture for LLM applications that need to answer questions over private or up-to-date knowledge. The flow has four steps: convert the user’s query to an embedding, search a vector database for similar document chunks, inject the retrieved chunks into the prompt, then let the model generate an answer grounded in them.
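The four steps above can be sketched in a few lines. This is a toy illustration, not a production setup: the bag-of-words `embed` function stands in for a real embedding model, the in-memory `CHUNKS` list stands in for a vector database, and the final model call is left out so the sketch only builds the prompt. All function names and sample texts here are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (stand-in for a vector search)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, retrieved: list[str]) -> str:
    """Inject the retrieved chunks into the prompt ahead of the question."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Hypothetical document chunks, invented for this example.
CHUNKS = [
    "Refunds are issued within 14 days of a return request.",
    "The office is closed on public holidays.",
    "Return requests must include the original order number.",
]

if __name__ == "__main__":
    query = "How long do refunds take?"
    prompt = build_prompt(query, retrieve(query, CHUNKS, k=2))
    print(prompt)  # in a real system this prompt would be sent to the model
```

In a real deployment the similarity search would run inside the vector database rather than re-embedding every chunk per query; the sketch recomputes everything only to stay self-contained.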

Classical RAG is fixed-pipeline: one retrieval, one generation. Agentic RAG turns retrieval into a tool the agent calls when it decides it needs more context, possibly multiple times with refined queries.
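The difference shows up in the control flow. The sketch below is a minimal illustration, assuming a hypothetical `search_tool` and a rule-based `toy_agent_policy` that stands in for the model's decision about whether to retrieve again. In a real agent that decision would come from the LLM itself, typically through a tool-calling interface.

```python
from typing import Callable, Optional

# Hypothetical corpus, keyed by the keyword that retrieves each entry.
DOCS = {
    "vpn": "VPN access requires a hardware token issued by IT.",
    "token": "Hardware tokens are requested through the IT service portal.",
    "lunch": "The cafeteria serves lunch from 11:30 to 14:00.",
}

def search_tool(query: str) -> list[str]:
    """Hypothetical retrieval tool: keyword lookup over the tiny corpus."""
    words = query.lower().split()
    return [text for key, text in DOCS.items() if key in words]

def toy_agent_policy(question: str, context: list[str]) -> Optional[str]:
    """Stand-in for the model: return the next query, or None when it has enough."""
    if not context:
        return "vpn"  # first retrieval
    mentions_token = any("token" in c for c in context)
    explains_how = any("portal" in c for c in context)
    if mentions_token and not explains_how:
        return "token"  # refined follow-up query
    return None  # enough context gathered

def agentic_rag(
    question: str,
    policy: Callable[[str, list[str]], Optional[str]],
    tool: Callable[[str], list[str]],
    max_steps: int = 4,
) -> tuple[list[str], list[str]]:
    """Let the policy call the retrieval tool until it stops or the step budget runs out."""
    queries: list[str] = []
    context: list[str] = []
    for _ in range(max_steps):
        query = policy(question, context)
        if query is None:
            break
        queries.append(query)
        for chunk in tool(query):
            if chunk not in context:
                context.append(chunk)
    return queries, context

if __name__ == "__main__":
    queries, context = agentic_rag("How do I get VPN access?", toy_agent_policy, search_tool)
    print(queries)
    print(context)
```

The `max_steps` cap is a design choice worth keeping in any real version: without it, an agent that never decides it has enough context would retrieve indefinitely.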

RAG’s strength is grounding: the model can cite or quote retrieved source material instead of relying only on what it memorized, which reduces hallucination. Its weaknesses are chunk-boundary problems (relevant information split across chunks), embedding-similarity failures (matches that are semantically similar but contextually wrong), and ranking quality (top-K retrieval does not always surface the right material). As of 2026, hybrid retrieval (keyword search plus vector search, followed by reranking) handles most of the production cases that pure vector RAG misses.
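One common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The text above does not name a fusion method, so treat RRF as an assumption chosen for illustration. The document IDs and the two input rankings are invented, and a separate reranking model, which would normally rescore the fused list, is not shown.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each document scores sum(1 / (k + rank)) over the lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results from two retrievers for the same query.
keyword_ranking = ["doc_a", "doc_b", "doc_d"]  # e.g. from a keyword (BM25-style) index
vector_ranking = ["doc_c", "doc_a", "doc_b"]   # e.g. from an embedding search

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])

if __name__ == "__main__":
    print(fused)
```

Here `doc_c` is the top vector hit but appears in only one list, so it ends up below `doc_a` and `doc_b`, which both retrievers agree on. The constant `k = 60` is a conventional default for RRF, not a tuned value.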