The Personal Site of Lalo Morales


Smolagents: Hugging Face’s New Agentic Framework

huggingface - smolagents blog post

Introduction

It’s often said that some years bring massive breakthroughs in AI and machine learning. Judging by all the agent frameworks popping up, 2025 is shaping up to be the Year of Agents. From the early days of AutoGPT and BabyAGI to more refined solutions like LangGraph, CrewAI, Phi AI, and even Hugging Face’s previous “Transformers Agent,” everyone seems to be unveiling their own take on making large language model (LLM) “agents.”

In the past year or so, the AI community has been wrestling with questions about how to best give LLMs the ability to act autonomously, integrate with diverse tools, and handle intricate workflows. Early attempts at fully “unleashed” agents often burned through tokens, frequently became confused, or simply performed unpredictably. More recent frameworks tried to limit an agent’s “agency,” introducing safer, more reliable, but sometimes less flexible approaches.

Now, Hugging Face has revisited the concept with a brand-new release called smolagents. The name is playful, hinting at a goal: keep the framework small and simple while leveraging powerful coding capabilities under the hood. In this blog post, we’ll dive into:

  1. What smolagents is and how it emerged from previous lessons learned.
  2. The differences between “code agents” and “tool-calling agents” in smolagents.
  3. Hugging Face’s plans for model hosting and usage, including how to integrate “pro” models or external LLM providers.
  4. The significance of code-based reasoning and how smolagents extends beyond text-based ReAct flows.
  5. Hands-on examples with smolagents, including tips for custom tools, changing system prompts, and debugging.

Whether you’re simply curious about the next wave of AI agents or looking to build your own advanced multi-agent systems, smolagents could be an intriguing addition to your LLM toolbelt.


From Transformers Agent to Smolagents

A Look Back

This isn’t Hugging Face’s first attempt at an agentic library. If you’ve been around the LLM scene for a while, you might recall Transformers Agent, initially released back in May of 2023 (with subsequent versions arriving later). It never fully took off, however, due to several factors:

  • Complex or limited toolset: The tool integrations weren’t as flexible as some developers needed.
  • Unclear instructions or usage: People weren’t entirely sure how to incorporate Transformers Agent into everyday workflows.
  • Rapidly evolving agent landscape: With the agentic scene exploding in complexity, older frameworks needed heavy redesigns to keep up.

Over time, Hugging Face recognized that agent frameworks shouldn’t just be a novelty; they needed to scale, be robust, and seamlessly integrate with real-world code.

Rethinking With Smolagents

Fast forward to 2025, and Hugging Face is unveiling a new library—smolagents—that attempts to fill those gaps. This library is more than a reboot of Transformers Agent. It is, in fact, described as the “successor” to Transformers Agent and provides a simplified yet powerful approach to building “agentic” applications on top of LLMs.

Smolagents embraces two ideas:

  1. Use code as a core mechanism for decision-making (the so-called “code agent”).
  2. Provide an optional tool-calling agent for simpler tasks.

By combining these two approaches, Hugging Face hopes to serve both advanced, sandboxed Python workflows and simpler text-based logic. Throughout the rest of this post, you’ll see references to the idea that letting an agent “think in code” can be more powerful (and sometimes more reliable) than having it only pass text or JSON around.


What Does “Agency” Mean in Smolagents?

It’s easy to get lost in terms like “agents,” “agency,” or “multi-step decision making,” so let’s break them down:

  • Agency: The ability of an LLM to perform actions—like calling a tool, writing code, or making a web request—rather than just returning text.
  • Multi-step Agents: Agents that work through a loop, taking an action, checking the result, then deciding on their next move.
  • Tool-Calling Agents: Agents that route queries to external functions or APIs (tools) to gather additional information or perform specific tasks.

In the earliest agent frameworks, like BabyAGI or AutoGPT, the LLM had almost too much autonomy. It could run in circles, make random calls, or burn tokens. Meanwhile, heavily restricted frameworks provided safer rails but less flexibility. Smolagents aims to give you a sweet spot of controlled but flexible autonomy.


Key Differentiators of Smolagents

From the get-go, Hugging Face’s smolagents stands out for a few reasons:

  1. Harnessing the Hugging Face Hub
    By virtue of being a Hugging Face project, smolagents integrates seamlessly with thousands of open-source models on the Hugging Face Hub. You can switch among coder models, instruction-tuned models, or specialized domain models.
  2. Code Agents by Default
    Smolagents emphasizes code-based reasoning. The official blog post references multiple papers, including the “Executable Code Actions” work and older “Program-Aided Language Model (PAL)” ideas from 2022–2023. The gist: if your LLM can rely on Python code, it can parse data, do math, or manipulate complex data structures with fewer hallucinations.
  3. Sandboxed Python Execution
    By default, your code agent can run Python in a restricted environment. Hugging Face has integrated E2B for sandboxing, meaning you can safely let the agent run certain (authorized) imports. You get to specify which Python libraries are available, reducing the risk of malicious or uncontrolled code execution.
  4. Flexible Integration with Proprietary Models
    Although smolagents is built with HF Hub models in mind, it also supports LiteLLM to tie in external providers like OpenAI, Anthropic (Claude), or even AWS Bedrock.
  5. Tooling Ecosystem
    Building and using custom tools follows a simple pattern: define them in Python, set I/O constraints, and let smolagents do the rest. Hugging Face is also encouraging a community “tools library” on the Hub, so that you can share or reuse specialized tools.

When (and When Not) To Use an Agent

Even the official smolagents blog urges caution: not every scenario calls for an agent. Sometimes a single prompt with the right instructions is enough to solve your problem. Agents truly shine in these situations:

  • Dynamic and branching workflows: Where a user’s question might lead down many different paths.
  • Complex or multi-step reasoning: If the logic depends on external data sources (web searches, custom APIs, sandbox code).
  • Flexible use of external tools: When you want to manage a variety of APIs or function calls and you’re not sure which one the user will need next.

Avoid agents if your pipeline can be thoroughly expressed in a straightforward function or a well-defined set of steps. A direct, “manual” approach might be simpler, cheaper, and more predictable.


Getting Started: Simple Examples

Let’s walk through the typical workflow for creating a simple code agent in smolagents. Below is a high-level outline of how you might set up a new environment (e.g., in a Jupyter notebook or Google Colab):

  1. Install smolagents
    pip install smolagents
  2. (Optional) Install LiteLLM
    If you plan to use proprietary models like GPT-4 or Claude, you’ll need:
    pip install litellm
  3. Authentication
    • If you’re using Hugging Face Hub models, set your HF token or log in with huggingface-cli login.
    • For OpenAI or Anthropic, set the respective API keys in your environment variables.
  4. Import the Essentials
    from smolagents import CodeAgent, DuckDuckGoSearchTool
    from smolagents.models import HfApiModel
  5. Create and Run an Agent
    model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")  # Example HF model
    tools = [DuckDuckGoSearchTool()]
    agent = CodeAgent(model=model, tools=tools)
    response = agent.run("What is the cube root of 27?")
    print(response)
    The agent uses code execution under the hood, calling Python to compute the cube root of 27 and returning “3” as the final answer.

Searching with DuckDuckGo

If you need external data, smolagents can automatically do a web search. For instance:

question = "How long does it take to drive from Melbourne to Sydney?"
answer = agent.run(question)
print(answer)
  • Step 1: The LLM decides it needs real-world data, calls the DuckDuckGo tool with a relevant search query.
  • Step 2: The tool fetches results.
  • Step 3: The agent reads them, does any necessary analysis in Python, and replies with an approximate driving time (e.g., 8–9 hours).

Challenges and Debugging

Import Errors in the Sandbox

One of the defining features of smolagents is that Python execution is sandboxed. By default, the agent can only use a small subset of Python libraries, such as math, random, or datetime. If your code agent tries to run import requests or import json without permission, you’ll see an error:


import requests is not allowed. Authorized imports: [...]

You can fix this by adding the missing libraries to the agent’s list of authorized imports. Once authorized, the agent can fetch URLs, parse JSON, or scrape HTML using Beautiful Soup.

Token Usage

Like other advanced agents, smolagents can consume a lot of tokens. Each step in the multi-step chain includes the LLM’s reasoning plus any tool interaction. As your agent tries multiple solutions or code snippets, it racks up input and output tokens. Keep an eye on costs if you’re using models that charge per token.

Sandbox vs. External Tools

If the agent keeps failing to solve a problem in Python, consider building a custom tool. For example, if you often retrieve historical Bitcoin prices, define a BitcoinPriceTool that interacts with a known API and returns data in a direct, easy-to-parse format. Agents are less likely to get lost or throw 20 different Python errors if you streamline common tasks into specialized tools.


Using Proprietary Models via LiteLLM

While the Qwen 2.5 Coder model (or other Hugging Face-hosted models) might be sufficient for many tasks, you can easily integrate smolagents with GPT-4, Claude, or other proprietary solutions. Just swap the Hugging Face model wrapper for smolagents’ LiteLLM-backed model class.

This is particularly handy if you already have an enterprise contract with OpenAI or if you trust GPT-4 more for coding tasks. Smolagents doesn’t limit you to a single provider.


Tool-Calling Agents vs. Code Agents

Traditional JSON-Based Tools

Not every workflow requires Python code generation. In older ReAct-style agents, the model might simply pass a JSON snippet describing the tool call, then parse a JSON response. Smolagents also supports this approach via a tool-calling agent. You define tools with input and output schemas, and the agent knows how to route a user’s query appropriately.

For instance:

Here, the agent simply calls the WeatherTool. It’s the same principle as code-based reasoning, except the agent is not generating or interpreting Python code in each step.

Why Code Agents Might Be Better

The big advantage of code-based agents (like CodeAgent) is that you can tap into Python’s ecosystem. If the model needs to parse HTML, filter results in pandas, or plot charts in matplotlib, it can do so within the sandbox. This drastically expands the capabilities of your agent without forcing you to write elaborate custom tools every time.


Custom Tools and the Hugging Face Hub

If you do want to build your own specialized tools, smolagents makes it straightforward:

You can then push this tool to the Hugging Face Hub for others to discover. Over time, we might see a large library of user-created tools, covering everything from advanced geospatial queries to specialized financial data lookups.


Memory and Logging

One of the more intriguing features of smolagents is a logging and memory system. Each agent call can generate a chain of:

  • System messages: The system prompt that includes internal instructions and constraints.
  • User messages: The user’s original question or query.
  • Assistant messages: The agent’s step-by-step reasoning or final answer.

By default, smolagents keeps track of logs and the final chain-of-thought. If you want to see them:

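A sketch of inspecting that history — the attribute name depends on your smolagents version (early releases exposed agent.logs; newer ones move it to agent.memory.steps), and agent here is assumed to be one created and run as in the earlier examples:

```python
# `agent` is assumed to exist and to have completed at least one run.
for step in agent.logs:  # on newer versions: agent.memory.steps
    print(step)         # one entry per system/user/assistant step
```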
You can also add lines to the agent’s memory, letting it build up knowledge or remember important states. This is particularly relevant if you’re exploring advanced multi-agent systems where one agent delegates tasks to others and might need to “remember” partial steps or relevant data.


Limitations and Ongoing Questions

As promising as smolagents is, it still has a few kinks and potential downsides:

  1. Frequent Errors in Code Generation
    Depending on the model or the system prompt, code agents can attempt invalid imports or cause runtime errors. They often correct themselves, but each failed step burns tokens and time.
  2. Max Iterations
    In many examples, the agent hits a maximum iteration limit (e.g., 5 or 10 attempts) if it repeatedly fails. This prevents infinite loops but can be frustrating if your scenario requires many trial-and-error steps.
  3. Model Compatibility
    Some advanced Hugging Face models require a Pro subscription. If you only have a free tier, you might be limited in your choice of large coder or instruct models. Alternatively, you can switch to a pay-as-you-go arrangement with OpenAI or Anthropic.
  4. Memory Persistence
    Smolagents offers logs and optional memory, but it’s not as advanced as more specialized memory frameworks (e.g., tools that store long-term user data or retrieve it with embeddings). This might change with future updates.

Despite these caveats, smolagents has potential to streamline tasks that call for flexible, multi-step problem-solving with code.


Multi-Agent Orchestration

One of the more advanced features Hugging Face briefly teased is multi-agent workflows within smolagents. This concept might ring a bell if you’ve explored CrewAI or other orchestrators that have a “manager agent” and “worker agents.” In a multi-agent scenario, you might:

  • Have a ManagerAgent that interprets the user’s request.
  • Delegate tasks to specialized sub-agents (e.g., one for data scraping, another for summarizing text).
  • Consolidate results and produce a final coherent response.

While the official docs mention this is possible in smolagents, there isn’t an abundance of examples yet. Expect more updates and examples as the community begins building elaborate multi-agent systems on top of smolagents.


Potential Use Cases

If you’re thinking about how to make the most of smolagents, here are some promising use cases:

  1. Data Analysis or ETL: Let an LLM ingest CSVs, filter data with pandas, then generate visualizations.
  2. RAG (Retrieval-Augmented Generation): Combine smolagents with a vector database. Let the code agent parse and rank relevant context before forming an answer.
  3. Knowledge Graph Queries: If you have a specialized tool that queries a graph database, a code agent might dynamically parse results, join them to other data, or run further computations.
  4. Web Automation: With smolagents, you could create a tool that logs into websites, extracts info, or triggers actions, all within a sandboxed environment. (Though you’d need to be mindful of security implications.)

Final Thoughts: Prospects for Smolagents

Will smolagents become the next big thing in the agentic AI ecosystem? It has a good shot, given Hugging Face’s track record and the popularity of their Hub. The library encourages:

  • Lightweight setup: Two or three lines of Python code can get you an agent.
  • Powerful code-based reasoning: Agents can do more than just return text or simple JSON.
  • Community-driven expansions: A potential “tools marketplace” on the Hugging Face Hub could supercharge its adoption.

Still, some open questions remain:

  • How robust is the sandboxing in large-scale production?
  • How effectively can smolagents store and leverage memory over time?
  • Will enough developers create and share advanced tools to make it stand out from other frameworks?

Regardless, smolagents is well worth investigating if you’re exploring new ways to harness LLMs for dynamic, multi-step tasks. It blends the best of earlier agentic approaches—like ReAct patterns, code-based reasoning from PAL, and a modern ecosystem of specialized tools and memory logging—into a single cohesive platform. For those of us who love coding, the prospect of letting an LLM generate and execute Python in a sandbox is very compelling indeed.


Conclusion

In an era where AI agents are everywhere—some too rigid, others too freewheeling—smolagents offers a promising balance of flexibility and simplicity. It’s an evolution from Hugging Face’s previous attempts, incorporating lessons learned from the entire community about how to keep large language models on track, safe, and truly helpful.

  • If you want quick, text-based solutions that gather straightforward data, tool-calling agents may suffice.
  • If you require deeper integration—such as Python-based logic, data analysis, or multi-step planning—code agents are a game-changer.

It’s exciting to watch how smolagents will evolve, especially with features like multi-agent management, memory, and a well-curated library of community tools. For now, if you’re curious about agentic LLM frameworks, give smolagents a whirl. It’s surprisingly easy to set up, refreshingly simple to debug, and fully open to the advanced code-based workflows that are increasingly popular in 2025.

Have fun exploring smolagents, and don’t forget to share your own experiences, tips, or any custom tools you create. With an entire ecosystem of developers building on top of Hugging Face’s new framework, this next generation of agents might indeed live up to the hype.

Thanks for reading—and here’s to a future with more robust, safe, and creative AI agents that help us solve the toughest challenges and daily tasks alike.
