Introduction to the Focused and Diffuse Modes
The diffuse (as opposed to the focused) mode of thinking:

The diffuse mode could be thought of as a flashlight set so that it casts its light very broadly, but not very strongly in any one area. (As opposed to the focused mode, which casts its light very strongly in a single area, but very weakly everywhere else.) It is the type of thinking you need when you are trying to understand something new.
When you learn something new, many new synapses (connections) are formed on the dendrites of neurons. This formation of new synapses is one of the brain's ways to adapt and store new information, reflecting the dynamic nature of neural connectivity.
Procrastination
Key Concepts on Procrastination
- Universal but Variable: Everyone procrastinates to some degree because focusing on one task means not working on many others. However, some people struggle with it more than others.
- Neurological Basis:
  - When confronting unpleasant tasks, the brain's pain centers activate
  - The brain naturally tries to avoid this discomfort by switching attention
  - Research shows this discomfort actually disappears shortly after beginning the task
- Procrastination Cycle:
  - Cue → Discomfort → Attention shift → Temporary relief
  - This reinforces the avoidance behavior
- The Pomodoro Technique:
  - Simple method: 25 minutes of focused work followed by a short break
  - Steps: Set timer, eliminate distractions, focus completely, reward yourself after
Unit 1 - Introduction to Agents
Note that Actions are not the same as Tools; a single Action may, for instance, involve the use of multiple Tools to complete.
An Agent is a system that uses an AI Model (typically an LLM) as its core reasoning engine to:
Understand natural language: Interpret and respond to human instructions in a meaningful way.
Reason and plan: Analyze information, make decisions, and devise strategies to solve problems.
Interact with its environment: Gather information, take actions, and observe the results of those actions.
Chat Templates
As mentioned, chat templates are essential for structuring conversations between language models and users. They guide how message exchanges are formatted into a single prompt.
Base Models vs. Instruct Models
Another point we need to understand is the difference between a Base Model vs. an Instruct Model:
A Base Model is trained on raw text data to predict the next token.
An Instruct Model is fine-tuned specifically to follow instructions and engage in conversations. For example, SmolLM2-135M is a base model, while SmolLM2-135M-Instruct is its instruction-tuned variant.
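As a rough illustration of how a chat template gets applied in practice, here is a minimal sketch using the transformers tokenizer of the instruct variant above (the repo id is assumed to be HuggingFaceTB/SmolLM2-135M-Instruct, and the messages are placeholders):

```python
from transformers import AutoTokenizer

# The instruction-tuned checkpoint ships with a chat template in its tokenizer config.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

# apply_chat_template turns the list of messages into the single prompt string
# the model was fine-tuned to expect (role markers, special tokens, etc.).
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```

Running the same messages through a base model's tokenizer would not add these role markers, which is why pairing the right template with the right model matters.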
The Core Components
Agents work in a continuous cycle of: thinking (Thought) → acting (Act) → observing (Observe).
Let’s break down these actions together:
- Thought: The LLM part of the Agent decides what the next step should be.
- Action: The agent takes an action by calling the tools with the associated arguments.
- Observation: The model reflects on the response from the tool.
Actions are the concrete steps an AI agent takes to interact with its environment.
Actions bridge an agent's internal reasoning and its real-world interactions by executing clear, structured tasks, whether through JSON, code, or function calls.
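For instance, an Action emitted as JSON might look roughly like this (the tool name and arguments are purely illustrative):

```json
{
  "action": "get_weather",
  "action_input": {
    "location": "Paris"
  }
}
```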
Observations are how an Agent perceives the consequences of its actions.
In the observation phase, the agent:
- Collects Feedback: Receives data or confirmation that its action was successful (or not).
- Appends Results: Integrates the new information into its existing context, effectively updating its memory.
- Adapts its Strategy: Uses this updated context to refine subsequent thoughts and actions.
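A minimal sketch of the full Thought → Action → Observation loop, assuming a hypothetical `llm` callable that returns JSON tool calls and plain Python functions as tools (not any particular framework's API):

```python
import json

def get_weather(location: str) -> str:
    """Stand-in tool; a real one would call a weather API."""
    return f"The weather in {location} is sunny, 22°C."

TOOLS = {"get_weather": get_weather}

def run_agent(llm, task: str, max_steps: int = 5) -> str:
    context = task
    for _ in range(max_steps):
        # Thought + Action: the LLM decides the next step and emits a JSON tool call.
        reply = llm(context)
        call = json.loads(reply)
        if call["action"] == "final_answer":
            return call["action_input"]
        # Act: execute the chosen tool with its arguments.
        result = TOOLS[call["action"]](**call["action_input"])
        # Observe: append the result to the context so the next Thought can use it.
        context += f"\nObservation: {result}"
    return "Stopped after reaching the step limit."
```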
Unit 2 - Introduction to Agentic Frameworks
When to Use an Agentic Framework
Agentic frameworks aren't always necessary for LLM applications. Here's a more detailed breakdown:
Simple Use Cases (No Framework Needed)
- Direct question-answering
- Content generation with fixed inputs
- Straightforward classification tasks
- Single-turn interactions
In these cases, a simple prompt template and direct API call to an LLM is often sufficient. The overhead of an agentic framework might be unnecessary.
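A rough sketch of the no-framework path: a prompt template plus a single chat-completion call (the OpenAI client and model name here are just one possible choice):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": "Classify the sentiment of the user's message as positive, negative, or neutral."},
        {"role": "user", "content": "The workshop was far better than I expected!"},
    ],
)
print(response.choices[0].message.content)
```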
Complex Use Cases (Framework Beneficial)
- Multi-step reasoning processes
- Tasks requiring external tool usage
- Iterative problem-solving
- Autonomous decision-making based on dynamic contexts
- Systems requiring persistent memory across interactions
Framework Comparisons
Each framework has different strengths:
- smolagents: Lightweight, minimal abstractions, good for educational purposes
- LlamaIndex: Strong data connection capabilities, retrieval-focused
- LangGraph: Combines LLMs with state machines, enabling complex workflows
Smolagents
smolagents is a simple yet powerful framework for building AI agents. It provides LLMs with the agency to interact with the real world, such as searching or generating images.
1️⃣ Why Use smolagents
- smolagents is an open-source agent framework, similar to LlamaIndex and LangGraph.
- It has specific advantages and drawbacks, making it suitable for certain use cases.
- Choosing the right framework depends on project requirements.
2️⃣ CodeAgents
- Primary agent type in smolagents.
- Generates Python code instead of JSON or text.
- Used for performing automated actions with executable scripts.
3️⃣ ToolCallingAgents
- Second type of agent in smolagents.
- Generates JSON/text that the system must interpret to execute actions.
- Key differences from CodeAgents and use cases are explored.
4️⃣ Tools
- Tools are functions that an LLM can use in an agentic system.
- They are essential for defining agent behavior.
- Implementation methods include the Tool class and the @tool decorator (a minimal example follows this overview).
- Includes how to create, share, and load tools.
5️⃣ Retrieval Agents
- Used for searching, synthesizing, and retrieving information.
- Leverage vector stores and RAG (Retrieval-Augmented Generation) patterns.
- Useful for integrating web search and knowledge bases while maintaining context.
6️⃣ Multi-Agent Systems
- Combining multiple agents enhances functionality and efficiency.
- Example: A web search agent working alongside a code execution agent.
- Focuses on designing, implementing, and managing multi-agent workflows.
7️⃣ Vision and Browser Agents
- Vision agents use Vision-Language Models (VLMs) for image-based reasoning.
- Can analyze visual data and enable multimodal interactions.
- Browser agents can extract information from the web using vision capabilities.
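A minimal sketch of the @tool decorator mentioned in the Tools item above, assuming smolagents' convention of type hints plus a docstring with an Args section (the weather report is fabricated):

```python
from smolagents import tool

@tool
def get_local_weather(city: str) -> str:
    """Returns a short, made-up weather report for a city.

    Args:
        city: Name of the city to look up.
    """
    # A real tool would call a weather API here; this stub just fabricates an answer.
    return f"The weather in {city} is sunny with a light breeze."
```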
Introduction to Tool Calling Agents
Tool Calling Agents represent the second type of agent available in the smolagents library. Unlike Code Agents that execute Python snippets, these agents leverage the built-in tool-calling capabilities of LLM providers to generate tool calls as structured JSON objects. This approach has become the standard method used by major AI providers including OpenAI, Anthropic, and many others.
Code Agents vs. Tool Calling Agents: A Comparison
To understand the difference, let's examine how each agent type would handle a request to search for catering services and party ideas:
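A hedged sketch of that comparison, assuming smolagents' CodeAgent, ToolCallingAgent, and DuckDuckGoSearchTool classes (the model wrapper name has changed between versions, so treat InferenceClientModel as a placeholder):

```python
from smolagents import CodeAgent, ToolCallingAgent, DuckDuckGoSearchTool, InferenceClientModel

model = InferenceClientModel()  # default Hugging Face Inference model; swap in your own

# CodeAgent: the LLM writes Python snippets that call the tools directly.
code_agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)
code_agent.run("Search for catering services and superhero party ideas.")

# ToolCallingAgent: the LLM emits JSON tool calls that the framework parses and executes.
tool_agent = ToolCallingAgent(tools=[DuckDuckGoSearchTool()], model=model)
tool_agent.run("Search for catering services and superhero party ideas.")
```

The practical difference shows up in the intermediate steps: the CodeAgent's steps contain generated Python, while the ToolCallingAgent's steps contain structured tool-call records (a tool name plus arguments).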
Default Toolbox:
- PythonInterpreterTool
- FinalAnswerTool
- UserInputTool
- DuckDuckGoSearchTool
- GoogleSearchTool
- VisitWebpageTool
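A hedged one-liner for enabling these defaults, assuming the add_base_tools constructor flag (check the current smolagents docs for the exact name and which tools it registers):

```python
from smolagents import CodeAgent, InferenceClientModel

# add_base_tools=True is assumed to register the default toolbox listed above.
agent = CodeAgent(tools=[], model=InferenceClientModel(), add_base_tools=True)
```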
RAG
Key ideas to understand about Agentic RAG (Retrieval-Augmented Generation) systems:
- Traditional RAG vs. Agentic RAG:
  - Traditional RAG simply passes the user query to a search step and uses the retrieved results, together with the query, to generate the model's response
  - Agentic RAG adds autonomous control over both the retrieval and generation processes
- Key limitations of traditional RAG:
  - Relies on a single retrieval step
  - Focuses only on direct semantic similarity to the query
  - May miss relevant information
- Agentic RAG advantages:
  - Autonomous formulation of search queries
  - Ability to critique retrieved results
  - Multiple retrieval steps for more comprehensive outputs
- Core components shown in the examples (a minimal retriever sketch follows this list):
  - Search tools (like DuckDuckGo integration)
  - Custom knowledge bases (using vector databases)
  - Document splitting for more efficient retrieval
  - BM25 retrieval as a fast, keyword-based alternative to embedding-based semantic search
- Enhanced retrieval strategies:
  - Query reformulation: crafting optimized search terms
  - Multi-step retrieval: using initial results to inform subsequent queries
  - Source integration: combining information from multiple sources
  - Result validation: analyzing content for relevance before inclusion
- Implementation considerations:
  - Tool selection based on query type and context
  - Memory systems to maintain conversation history
  - Fallback strategies when primary retrieval methods fail
  - Validation steps for accuracy and relevance
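A minimal sketch of the custom-knowledge-base component, assuming langchain_community's BM25Retriever and the smolagents Tool base class; the class name, documents, and k value are placeholders:

```python
from langchain.docstore.document import Document
from langchain_community.retrievers import BM25Retriever
from smolagents import Tool

class KnowledgeBaseRetrieverTool(Tool):  # hypothetical tool name
    name = "knowledge_base_retriever"
    description = "Retrieves passages from the local knowledge base that are relevant to the query."
    inputs = {"query": {"type": "string", "description": "The search query."}}
    output_type = "string"

    def __init__(self, docs, **kwargs):
        super().__init__(**kwargs)
        # BM25 is keyword-based, so no embedding model is needed for this step.
        self.retriever = BM25Retriever.from_documents(docs, k=3)

    def forward(self, query: str) -> str:
        results = self.retriever.invoke(query)
        return "\n\n".join(doc.page_content for doc in results)

docs = [Document(page_content="Our venue seats 120 guests and offers in-house catering.")]
retriever_tool = KnowledgeBaseRetrieverTool(docs)
```

Passing retriever_tool to an agent alongside a web search tool is what turns plain RAG into the agentic version described above: the agent can decide when to query the knowledge base, when to search the web, and when to retrieve again.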
Links:
Going deeper:
LangSmith - the tool the guy from the co-working space used