GLOSSARY

AI agents glossary

Agent Evaluation: The process of measuring agent performance — accuracy, reliability, cost, latency — against defined benchmarks or production data.
Agent Fallback: The mechanism by which an agent yields to a human or alternate path when it cannot proceed safely or confidently.
Agent Handoff: The transfer of an in-progress task from one agent (or human) to another, with enough context to continue without restart.
Agent Harness: The execution scaffold around an LLM that manages the agent loop, tool invocation, memory, and safety controls.
Agent Loop: The repeating cycle where an agent observes, thinks, acts, and re-observes until its goal is achieved or a stop condition triggers.
Agent Memory: The mechanism by which an agent retains information across turns, sessions, or long-running tasks beyond the context window.
Agent Observability: The instrumentation and tooling that makes agent runs inspectable — traces, logs, metrics, replays — so failures can be debugged and improvements measured.
Agent Orchestration: The discipline of coordinating multiple agents — routing tasks, managing handoffs, sharing context, and resolving conflicts.
Agent Policy: The set of rules and constraints that govern what an agent may and may not do, including authentication, rate limits, and forbidden actions.
Agentic AI: AI systems that autonomously plan, act, and use tools to complete multi-step tasks.
Agentic RAG: Retrieval-Augmented Generation where the agent dynamically decides what to retrieve, when to retrieve it, and how to integrate the result.
AI Agent: A software system that uses an LLM to perceive context, decide actions, invoke tools, and complete tasks toward a goal.
Autonomous Agent: An agent that operates without step-by-step human input, deciding its own actions to reach a stated goal.
Browser Agent: An agent specialized in operating a web browser to research, fill forms, scrape data, and complete web-based tasks.
Chain of Thought: A prompting technique where the model is encouraged to produce step-by-step reasoning before answering, improving accuracy on complex tasks.
Computer Use: An agent capability to operate a computer interface directly — clicking, typing, reading screens — instead of calling APIs.
Context Window: The maximum amount of text — measured in tokens — that an LLM can process in a single inference call, including both input and generated output.
Embeddings: Numerical vector representations of text (or other content) where semantic similarity maps to vector distance, enabling search by meaning rather than keyword.
Function Calling: The protocol by which an LLM emits a structured function invocation that a runtime then executes — synonymous with tool calling.
Guardrails: Safety controls layered around an LLM or agent to prevent harmful, off-policy, or non-compliant outputs and actions.
Hallucination: A confident but incorrect output from an LLM — invented facts, fabricated citations, or nonexistent functions — produced as if it were grounded.
iPaaS: Integration Platform as a Service — a category of tools that connect SaaS applications via workflows, increasingly with AI agent capabilities embedded.
MCP: Model Context Protocol — an open standard from Anthropic for connecting LLMs to external tools and data sources via a uniform interface.
Model Context Protocol: The full name of MCP — an open standard for LLM-tool interoperability published by Anthropic in November 2024.
Multi-Agent System: A system of multiple specialized agents that collaborate, hand off tasks, and coordinate to solve problems larger than any single agent.
Prompt Injection: A security vulnerability where attacker-controlled text causes an LLM to follow instructions outside its intended scope, bypassing system rules.
RAG: Retrieval-Augmented Generation — a pattern where relevant external documents are retrieved at query time and added to the prompt to ground the model's answer.
ReAct: A reasoning pattern where an agent alternates explicit thought ("Reason") and action ("Act") steps until the task completes.
Tool Calling: An LLM capability to invoke external functions or APIs as part of a response, used by agents to act on the world.
Vector Database: A database optimized for storing and searching high-dimensional vectors — the embeddings used in semantic search and RAG.