Agentic RAG

Retrieval-Augmented Generation where the agent dynamically decides what to retrieve, when to retrieve it, and how to integrate the result.

Classical RAG retrieves a fixed set of documents once, at query time, and stuffs them into the prompt. Agentic RAG turns retrieval into a tool the agent calls whenever it decides it needs more context, possibly multiple times, with progressively refined queries, across multiple sources.
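
A minimal sketch of retrieval as a tool the agent calls in a loop, in Python. Everything named here is hypothetical: CORPUS is a toy stand-in for a real document store, retrieve is an exact-match lookup rather than a real search, and scripted_policy replaces the LLM that would normally decide the next step. It only illustrates the control flow, not any particular framework's API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical toy corpus; a real system would query a vector store or search API.
CORPUS = {
    "acme contract": "The Acme contract is governed by the 2019 Master Agreement.",
    "2019 master agreement": "The 2019 Master Agreement sets a 30-day termination notice.",
}


def retrieve(query: str) -> str:
    """Stand-in retriever: exact-key lookup, empty string on a miss."""
    return CORPUS.get(query.lower(), "")


@dataclass
class Step:
    action: str  # "retrieve" or "answer"
    text: str    # the query to run, or the final answer


def scripted_policy(question: str, notes: list[str]) -> Step:
    """Stand-in for the LLM's decision (assumption: a real agent would
    generate this step from the question and the notes gathered so far)."""
    if not notes:
        return Step("retrieve", "acme contract")
    if len(notes) == 1 and "2019 Master Agreement" in notes[-1]:
        # The second query is derived from what the first retrieval found.
        return Step("retrieve", "2019 master agreement")
    return Step("answer", notes[-1])


def agentic_rag(
    question: str,
    policy: Callable[[str, list[str]], Step],
    max_steps: int = 4,
) -> tuple[str, list[str]]:
    """Let the policy choose between retrieving again and answering,
    up to a fixed step budget."""
    notes: list[str] = []
    for _ in range(max_steps):
        step = policy(question, notes)
        if step.action == "answer":
            return step.text, notes
        notes.append(retrieve(step.text))
    return "", notes  # budget exhausted without an answer


answer, notes = agentic_rag(
    "What is the termination notice for the Acme contract?", scripted_policy
)
```

The step budget (max_steps) is one simple way to bound the extra LLM calls; classical RAG corresponds to a policy that retrieves exactly once and then answers.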

The tradeoff is higher latency and cost (more LLM calls) in exchange for better accuracy and adaptivity. Agentic RAG works best on multi-hop questions, where the right next query depends on partial findings; this is typical of legal research, reconstructing a customer's support history, or technical troubleshooting. Single-shot RAG remains the better choice for high-volume, low-complexity lookups.
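
To make the latency and cost side of the tradeoff concrete, here is a rough back-of-the-envelope model in Python. The call-counting rule and the millisecond figures are illustrative assumptions, not measurements: it assumes single-shot RAG makes one LLM call per question, and that an agentic run makes one decision call before each retrieval plus one final call to answer.

```python
def llm_calls(retrievals: int, agentic: bool) -> int:
    """Count LLM calls under the assumed model.

    Single-shot: one generation call, however many documents are fetched.
    Agentic: one decision call before each retrieval, plus one final answer call.
    """
    return retrievals + 1 if agentic else 1


def estimated_latency_ms(
    retrievals: int,
    agentic: bool,
    llm_ms: int = 1500,       # hypothetical per-call LLM latency
    retrieval_ms: int = 200,  # hypothetical per-retrieval latency
) -> int:
    """Sequential latency estimate: LLM calls plus retrievals, no parallelism."""
    return llm_calls(retrievals, agentic) * llm_ms + retrievals * retrieval_ms


single_shot = estimated_latency_ms(retrievals=1, agentic=False)  # 1700
three_hop = estimated_latency_ms(retrievals=3, agentic=True)     # 6600
```

Under these assumed numbers a three-hop agentic run takes roughly four times as long as a single-shot lookup, which is why the extra hops only pay off when the question actually needs them.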