Definition
An LLM agent is a system where an LLM acts as the reasoning engine (the "brain") that dynamically plans actions, selects tools to use, executes those tools, observes results, and continues reasoning until a goal is achieved. Unlike workflows (fixed scripts), agents decide their own next steps based on the current state.
The Core Agent Loop
```
[Goal] → [LLM: think + decide action] → [Execute action/tool] → [Observe result]
                      ↑                                                |
                      └───────────── [Continue until done] ←───────────┘
```
This is called the ReAct loop (Reason + Act):
1. Reason: LLM thinks about what to do next
2. Act: Execute a tool or action
3. Observe: Receive the result
4. Repeat until the goal is reached or a termination condition is met
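The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: `scripted_llm` is a hypothetical stand-in for a real model call, `TOOLS` is an assumed registry with a single fake `search` tool, and the step format (`thought`, `action`, `input`, `final`) is invented for this example.

```python
from typing import Callable

# Hypothetical tool registry; the tool name and canned result are illustrative.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: "Paris: 18 C, partly cloudy",
}

def scripted_llm(history: list[str]) -> dict:
    """Stand-in for a real model call: picks the next step from the history."""
    if not any(line.startswith("Observation:") for line in history):
        return {"thought": "I need the weather.",
                "action": "search", "input": "Paris weather today"}
    # An observation exists, so answer with the most recent one.
    return {"thought": "I have the answer.",
            "final": history[-1].removeprefix("Observation: ")}

def run_agent(goal: str, llm=scripted_llm, max_iterations: int = 20) -> str:
    history = [f"Goal: {goal}"]
    for _ in range(max_iterations):                    # 4. Repeat (bounded)
        step = llm(history)                            # 1. Reason
        history.append(f"Thought: {step['thought']}")
        if "final" in step:                            # goal reached
            return step["final"]
        result = TOOLS[step["action"]](step["input"])  # 2. Act
        history.append(f"Observation: {result}")       # 3. Observe
    return "Stopped: step limit reached"               # termination condition

print(run_agent("What is the weather in Paris?"))
```

The `max_iterations` cap is the termination condition from step 4: with a limit of 1 the loop exits before the model has a chance to produce a final answer.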
Agent Components
1. The LLM (Brain)
- Receives: goal + current state + available tools + history
- Produces: reasoning + next action to take
- Examples: GPT-4o, Claude 3.5 Sonnet, Llama 3.1 (with tool use)
2. Tools
External capabilities the agent can invoke:
| Tool Type | Examples |
|-----------|---------|
| Web search | Bing API, SerpAPI, Brave Search |
| Code execution | Python REPL, Jupyter |
| File I/O | Read/write files, directory listing |
| Database | SQL query, vector DB search |
| APIs | Weather, stock prices, CRM |
| Browsers | Playwright, Selenium for web automation |
| Memory | Vector DB for long-term memory |
| Other LLMs | Spawning sub-agents |
3. Memory
- Short-term: the current conversation / context window (active state)
- Long-term: external storage (vector DB) persisted across sessions
- Episodic: memory of past actions/outcomes
4. Planner
- Some agents have an explicit planning step before acting
- Plan → decompose into sub-tasks → execute sub-tasks
Safeguards
- Verification steps: have the agent check its own work
- Human-in-the-loop: require approval for irreversible actions
- Sandboxed execution: code runs in an isolated environment
- Structured outputs: enforce JSON for tool calls to prevent parsing errors
- Step limits: `max_iterations = 20` to prevent runaway agents
Related terms: Workflow, Tool Calling, ReAct, Multi-Agent, LangChain, LangGraph, Orchestration, Memory
Agent Architectures
ReAct (Reason + Act)
```
Thought: I need to find the current weather in Paris.
Action: search("Paris weather today")
Observation: "Paris: 18°C, partly cloudy"
Thought: I have the answer.
Final Answer: It is currently 18°C and partly cloudy in Paris.
```
Plan-and-Execute
1. Plan: generate a full plan of steps upfront
2. Execute: execute each step (with another LLM call or tool)
3. Replan: if needed, update the plan based on results
Reflexion
- After completing a task, the agent reflects on what went wrong
- Updates its approach for the next attempt
- Improves over iterations without weight updates
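A Reflexion-style retry loop can be sketched as follows: the agent attempts a task, an evaluator judges the result, and on failure a short textual lesson is stored and fed into the next attempt. Everything here is a toy under stated assumptions: `attempt`, `evaluate`, and `reflect` are hypothetical stand-ins for an actor model, a checker, and a self-critique call.

```python
def attempt(task: str, lessons: list[str]) -> str:
    """Hypothetical actor: stands in for an LLM call that sees past lessons."""
    # Toy behaviour: the first try omits the unit; later tries apply the lesson.
    return "18 C" if "include the unit" in lessons else "18"

def evaluate(answer: str) -> bool:
    """Hypothetical evaluator (could be a test suite or a grader model)."""
    return answer.endswith("C")

def reflect(answer: str) -> str:
    """Hypothetical self-critique: turns a failure into a short lesson."""
    return "include the unit"

def reflexion_loop(task: str, max_trials: int = 3) -> tuple[str, list[str]]:
    lessons: list[str] = []              # reflections carried between trials
    answer = ""
    for _ in range(max_trials):
        answer = attempt(task, lessons)
        if evaluate(answer):
            break
        lessons.append(reflect(answer))  # improvement via text, not weights
    return answer, lessons

print(reflexion_loop("Report the temperature in Paris"))
```

The model's weights never change; the only thing that improves between trials is the list of lessons placed in its context.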
Multi-Agent (Multi-Agent Systems)
Multiple specialized agents collaborate:
```
Orchestrator Agent
├── Research Agent (searches the web)
├── Code Agent (writes and runs code)
├── Review Agent (checks for errors)
└── Writer Agent (produces final output)
```
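The hierarchy above can be approximated with plain functions standing in for agents. This sketch assumes a fixed hand-off order for brevity; a real orchestrator would more typically let a model choose which specialist to call next. All function names and return strings are hypothetical.

```python
from typing import Callable

# Each "agent" is a plain function here; in practice each would wrap
# its own LLM prompt and tool set.
def research_agent(task: str) -> str:
    return f"notes on {task}"

def code_agent(notes: str) -> str:
    return f"code based on [{notes}]"

def review_agent(code: str) -> str:
    return f"reviewed {code}"

def writer_agent(material: str) -> str:
    return f"REPORT: {material}"

# Assumed hand-off order, mirroring the tree from top to bottom.
PIPELINE: list[tuple[str, Callable[[str], str]]] = [
    ("research", research_agent),
    ("code", code_agent),
    ("review", review_agent),
    ("writer", writer_agent),
]

def orchestrate(task: str) -> tuple[str, list[str]]:
    """Pass the task through each specialist and record who handled it."""
    payload, trace = task, []
    for name, agent in PIPELINE:
        payload = agent(payload)
        trace.append(name)
    return payload, trace

print(orchestrate("llm agents"))
```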
Tool Calling / Function Calling
Modern LLMs support structured tool calling: the model emits a structured request instead of free text, and the developer executes the tool and returns the result to the model. An illustrative model output (exact field names vary by API):
```json
{
  "tool": "search",
  "parameters": {"query": "latest iPhone price"},
  "reasoning": "I need to find the current price"
}
```
APIs: OpenAI Function Calling, Anthropic Tool Use, Bedrock Tool Use
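On the developer side, handling such a call usually means parsing the JSON, validating it against the known tools, and only then executing. The sketch below assumes the illustrative `tool` / `parameters` shape used in this section rather than any specific provider's wire format; `search` and `REGISTRY` are hypothetical.

```python
import json

def search(query: str) -> str:
    """Hypothetical tool; a real one would call a search API."""
    return f"results for {query!r}"

# Registry maps each tool name to (callable, required parameter names).
REGISTRY = {"search": (search, {"query"})}

def dispatch(raw_model_output: str) -> str:
    """Parse a model's JSON tool call, validate it, then run the tool."""
    try:
        call = json.loads(raw_model_output)
    except json.JSONDecodeError as exc:
        return f"error: invalid JSON ({exc.msg})"
    if not isinstance(call, dict):
        return "error: tool call must be a JSON object"
    name = call.get("tool")
    if name not in REGISTRY:
        return f"error: unknown tool {name!r}"
    func, required = REGISTRY[name]
    params = call.get("parameters", {})
    if not isinstance(params, dict) or set(params) != required:
        return f"error: expected parameters {sorted(required)}"
    return func(**params)

print(dispatch('{"tool": "search", "parameters": {"query": "latest iPhone price"}}'))
```

Returning error strings instead of raising lets the error be passed back to the model as an observation, so it can correct the call on the next step.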
Agent Execution Frameworks
| Framework | Description |
|-----------|-------------|
| LangChain Agents | ReAct and OpenAI Functions agents |
| LangGraph | Graph-based stateful multi-agent systems |
| AutoGen (Microsoft) | Multi-agent conversation framework |
| CrewAI | Role-based multi-agent teams |
| AutoGen Studio | Visual multi-agent builder |
| Claude Code | Developer agent with full tool access |
| OpenAI Swarm | Lightweight multi-agent handoff |
Agent vs. Workflow Summary
| Dimension | Agent | Workflow |
|-----------|-------|----------|
| Control flow | Model decides | Developer defines |
| Flexibility | High | Low |
| Predictability | Lower | High |
| Failure modes | Can loop, go off-track | Fail at defined points |
| Cost | Variable (number of calls not known in advance) | Predictable |
| Best for | Open-ended tasks | Structured, repeatable tasks |
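The control-flow row is the core difference, and a toy comparison makes it concrete. In this sketch the same three steps are run once as a fixed workflow and once under a decision function; `toy_policy` is a hypothetical stand-in for the model's choice of next step, and all step names are invented.

```python
def fetch(state: dict) -> dict:
    return {**state, "data": [3, 1, 2]}

def sort_data(state: dict) -> dict:
    return {**state, "data": sorted(state["data"])}

def report(state: dict) -> dict:
    return {**state, "report": f"min={state['data'][0]}"}

STEPS = {"fetch": fetch, "sort": sort_data, "report": report}

def run_workflow(state: dict) -> dict:
    # Developer-defined control flow: always the same three calls.
    for name in ("fetch", "sort", "report"):
        state = STEPS[name](state)
    return state

def toy_policy(state: dict):
    """Stand-in for the model: picks the next step from the current state."""
    if "data" not in state:
        return "fetch"
    if state["data"] != sorted(state["data"]):
        return "sort"
    if "report" not in state:
        return "report"
    return None  # nothing left to do

def run_agent_style(state: dict, policy=toy_policy, max_steps: int = 10):
    calls = 0
    while calls < max_steps:
        name = policy(state)
        if name is None:
            break
        state = STEPS[name](state)
        calls += 1
    return state, calls

print(run_workflow({}))
print(run_agent_style({}))
print(run_agent_style({"data": [1, 2]}))
```

The workflow always makes three calls, while the agent-style loop makes as many as the state requires, which is why its cost is harder to predict.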
Challenges and Failure Modes
| Challenge | Description | Mitigation |
|-----------|-------------|-----------|
| Infinite loops | Agent keeps taking actions without terminating | Max step limits |
| Tool misuse | Calls wrong tool or with wrong parameters | Strict tool schemas |
| Hallucinating tool results | Makes up what a tool returned | Require actual execution |
| Lost context | Forgets earlier observations in long tasks | Summarization, memory |
| Cost explosion | Too many LLM calls | Budget limits, step caps |
| Security risks | Agent executes dangerous code | Sandboxing, approval steps |
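Several mitigations from the table (step caps, budget limits, approval steps) can be combined in one small check that runs before every action. This is a sketch under assumptions: the `Guard` class, its default limits, and the `approve` callback are hypothetical, and real cost accounting would come from the provider's usage data.

```python
from typing import Callable

class BudgetExceeded(RuntimeError):
    """Raised when a run exceeds its step or cost budget."""

class Guard:
    def __init__(self, max_steps: int = 20, max_cost: float = 1.0,
                 approve: Callable[[str], bool] = lambda action: False):
        self.max_steps = max_steps
        self.max_cost = max_cost
        self.approve = approve   # human-in-the-loop hook; denies by default
        self.steps = 0
        self.cost = 0.0

    def check(self, action: str, cost: float, irreversible: bool = False) -> None:
        """Call before each action; raises if a limit would be violated."""
        self.steps += 1
        self.cost += cost
        if self.steps > self.max_steps:
            raise BudgetExceeded("step limit reached")
        if self.cost > self.max_cost:
            raise BudgetExceeded("cost budget exceeded")
        if irreversible and not self.approve(action):
            raise PermissionError(f"approval required for {action!r}")

guard = Guard(max_steps=2)
guard.check("search", cost=0.1)
guard.check("search", cost=0.1)
try:
    guard.check("search", cost=0.1)   # third step exceeds max_steps=2
except BudgetExceeded as exc:
    print("halted:", exc)
```

Denying irreversible actions by default means a missing approval hook fails safe instead of silently allowing the action.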