New framework simplifies complex agentic AI landscape



With the ecosystem of agentic tools and frameworks exploding in size, it is becoming increasingly difficult to navigate the many options for building AI systems, leaving developers confused and paralyzed when choosing the right tools and models for their applications.

In a new study, researchers from multiple institutions present a comprehensive framework for untangling this complex landscape. They classify agentic frameworks by focus area and trade-offs, providing a practical guide for developers choosing the right tools and strategies for their applications.

For enterprise teams, this reframes agentic AI from a model selection problem to an architectural decision about where to spend training budget, how much modularity to preserve, and what tradeoffs they are willing to make between cost, flexibility, and risk.

Agent vs tool adaptation

The researchers divide the landscape into two main dimensions: agent adaptation and tool adaptation.

Agent adaptation involves modifying the base model that underlies the agent system. This is done by updating the agent’s internal parameters or policies via methods such as fine-tuning or reinforcement learning to better align with specific tasks.

Tool adaptation, on the other hand, shifts attention to the environment surrounding the agent. Instead of retraining the large, expensive base model, developers optimize external tools such as retrievers, memory modules or subagents. In this strategy, the main agent remains "frozen" (unchanged). This approach allows the system to evolve without the massive computational cost of retraining the base model.

The study breaks these two dimensions down into four distinct strategies:

A1: Tool-execution signaled: In this strategy, the agent learns by doing. It is optimized using verifiable feedback that comes directly from running a tool, such as a compiler executing a script or a database returning query results. This teaches the agent the "mechanics" of using a tool correctly.

A great example is DeepSeek-R1, where the model was trained by reinforcement learning with verifiable rewards to generate code that runs successfully in a sandbox. The feedback signal is binary and objective (did the code execute or did it fail?). This method develops strong low-level skills in stable, verifiable areas like coding or SQL.
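To make the A1 idea concrete, here is a minimal sketch of a verifiable, binary execution reward: model-generated code is run in a subprocess and scored purely on whether it completes. The function names and the bare-bones "sandbox" are illustrative assumptions, not DeepSeek-R1's actual training code.

```python
# Minimal sketch of an A1-style verifiable reward: run generated code in a
# subprocess "sandbox" and return a binary signal. Names are illustrative.
import subprocess
import sys
import tempfile

def execution_reward(generated_code: str, timeout_s: float = 5.0) -> float:
    """Return 1.0 if the code runs to completion, 0.0 otherwise."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout_s
        )
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0

# An RL loop (e.g. PPO/GRPO over sampled completions) would then reinforce
# completions whose reward is 1.0.
print(execution_reward("print(sum(range(10)))"))    # -> 1.0
print(execution_reward("raise ValueError('bad')"))  # -> 0.0
```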

A2: Agent-output signaled: Here, the agent is optimized based on the quality of its final response, regardless of the intermediate steps or the number of tool calls it makes. This teaches the agent how to orchestrate various tools to arrive at a correct conclusion.

An example is Search-R1, an agent that performs multi-step retrieval to answer questions. The model only receives a reward if the final answer is correct, implicitly forcing it to learn better search and reasoning strategies to maximize that reward. A2 is ideal for system-level orchestration, enabling agents to manage complex workflows.
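Below is a minimal sketch of the A2 signal: the agent can take any number of intermediate search steps, but only the final answer is scored against a reference. `run_agent_episode` and the toy search function are hypothetical stand-ins, not Search-R1's implementation.

```python
# Minimal sketch of an A2-style outcome reward: intermediate tool calls are free;
# only the final answer is scored. All components here are hypothetical stubs.
from typing import Callable, List

def outcome_reward(final_answer: str, gold_answer: str) -> float:
    """Exact-match reward on the final answer; intermediate steps are not scored."""
    return 1.0 if final_answer.strip().lower() == gold_answer.strip().lower() else 0.0

def run_agent_episode(question: str, search_tool: Callable[[str], List[str]]) -> str:
    # Placeholder loop: a real agent would interleave reasoning and several
    # search_tool calls before committing to an answer.
    docs = search_tool(question)
    return docs[0] if docs else "unknown"

# Toy usage: the number of search calls never changes the reward signal.
fake_search = lambda q: ["paris"]
answer = run_agent_episode("Capital of France?", fake_search)
print(outcome_reward(answer, "Paris"))  # -> 1.0
```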

T1: Agent-agnostic tools: In this category, tools are trained independently on large datasets, then plugged into a frozen agent. Think of the classic dense retrievers used in RAG systems: a standard retrieval model is trained on generic search data, and a powerful frozen LLM can use this retriever to find information, even though the retriever was not designed specifically for that LLM.
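A rough sketch of the T1 pattern, with a TF-IDF index standing in for an independently trained dense retriever and a stubbed `frozen_llm` call standing in for a hosted model that is never fine-tuned:

```python
# Minimal T1-style sketch: a retriever built independently of the agent is plugged
# into a frozen LLM. TF-IDF stands in for a pre-trained dense retriever; frozen_llm
# is a hypothetical stub for a call to a hosted model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "The Eiffel Tower is in Paris.",
    "The Colosseum is in Rome.",
    "The Brandenburg Gate is in Berlin.",
]
vectorizer = TfidfVectorizer().fit(corpus)
doc_matrix = vectorizer.transform(corpus)

def retrieve(query: str, k: int = 1) -> list[str]:
    # Generic retriever: nothing here is tailored to the downstream agent.
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    return [corpus[i] for i in scores.argsort()[::-1][:k]]

def frozen_llm(prompt: str) -> str:
    # Stand-in for an API call to a frozen model; it is never retrained here.
    return f"[model answer based on: {prompt[:60]}...]"

question = "Where is the Eiffel Tower?"
context = "\n".join(retrieve(question))
print(frozen_llm(f"Context:\n{context}\n\nQuestion: {question}"))
```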

T2: Agent-supervised tools: This strategy involves training tools specifically to serve a frozen agent. The supervision signal comes from the agent’s own outputs, creating a symbiotic relationship in which the tool learns to provide exactly what the agent needs.

For example, the s3 framework trains a small "searcher" model to retrieve documents. This small model is rewarded based on whether a "reasoner" (a large frozen LLM) can answer the question correctly using the retrieved documents. The tool effectively adapts to fill the main agent’s specific knowledge gaps.
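A minimal sketch of this T2 feedback loop follows, with hypothetical stubs for the trainable searcher and the frozen reasoner; the key point is that the searcher's reward is downstream answer correctness rather than any retrieval-specific metric. This is in the spirit of s3, not its actual code.

```python
# Minimal T2-style sketch: a small trainable searcher is rewarded only by whether
# a frozen reasoner answers correctly with what was retrieved. All stubs are
# hypothetical; the searcher here is random, a real one would be a small model.
import random

def small_searcher(question: str, corpus: list[str]) -> str:
    return random.choice(corpus)  # trainable component (toy version)

def frozen_reasoner(question: str, document: str) -> str:
    # Frozen large model (stub): answers using only the provided document.
    return "paris" if "Paris" in document else "unknown"

def searcher_reward(question: str, gold: str, corpus: list[str]) -> float:
    doc = small_searcher(question, corpus)
    answer = frozen_reasoner(question, doc)
    # Reward is downstream answer correctness, not retrieval similarity.
    return 1.0 if answer == gold else 0.0

corpus = ["The Eiffel Tower is in Paris.", "The Colosseum is in Rome."]
# Stochastic in this toy sketch: prints 1.0 or 0.0 depending on the random pick.
print(searcher_reward("Where is the Eiffel Tower?", "paris", corpus))
```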

Complex AI systems could use a combination of these adaptation paradigms. For example, a deep search system might use T1-style retrieval tools (pre-trained dense retrievers), T2-style adaptive search agents (trained via frozen LLM feedback), and A1-style reasoning agents (fine-tuned with run-time feedback) in a larger orchestrated system.
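A toy composition sketch of such a pipeline, with all three components reduced to hypothetical stubs, just to show how the paradigms slot together in one orchestrator:

```python
# Illustrative composition: a deep-research pipeline mixing the paradigms above.
# Every component is a hypothetical stub standing in for a pre-trained retriever
# (T1), an agent-supervised searcher (T2), and an execution-tuned coder (A1).
def t1_retriever(query: str) -> list[str]:
    return [f"generic doc about {query}"]                    # trained independently, reused as-is

def t2_searcher(query: str) -> list[str]:
    return [f"doc tailored for the frozen agent: {query}"]   # trained on agent feedback

def a1_coder(task: str) -> str:
    return f"# code for: {task}"                             # fine-tuned on execution feedback

def orchestrate(question: str) -> str:
    evidence = t1_retriever(question) + t2_searcher(question)
    analysis = a1_coder(f"analyze evidence for '{question}'")
    return f"Synthesized answer from {len(evidence)} documents plus generated analysis code."

print(orchestrate("market share of agentic AI frameworks"))
```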

Hidden costs and trade-offs

For business decision-makers, the choice between these strategies often comes down to three factors: cost, generalization, and modularity.

Cost vs flexibility: Agent adaptation (A1/A2) provides maximum flexibility because you rewire the agent’s brain. However, the costs are high. For example, Search-R1 (an A2 system) required training on 170,000 examples to internalize search capabilities. This requires massive compute and specialized datasets. On the other hand, the resulting models can be much more efficient at inference time because they are much smaller than general-purpose models.

Tool adaptation (T1/T2), by contrast, is much more cost-efficient. The s3 (T2) system trained a lightweight searcher using only 2,400 examples (about 70 times less data than Search-R1) while achieving comparable performance. By optimizing the ecosystem rather than the agent, businesses can achieve high performance at lower cost. However, this comes with inference-time overhead, since s3 requires coordination with a larger frozen model.

Generalization: Methods A1 and A2 risk "overfitting," where an agent becomes so specialized in a task that it loses its general capabilities. The study found that while Search-R1 excelled at its training tasks, it struggled with specialized medical question answering, achieving only 71.8% accuracy. This isn’t a problem when your agent is designed to perform a very specific set of tasks.

Conversely, the s3 (T2) system, which used a general-purpose frozen agent assisted by a trained tool, generalized better, achieving 76.6% accuracy on the same medical tasks. The frozen agent retained its vast knowledge of the world, while the tool handled the specific retrieval mechanics. However, T1/T2 systems rely on the frozen agent’s knowledge, and if the underlying model cannot handle the task, the tools cannot compensate.

Modularity: T1/T2 strategies allow "hot swapping." You can upgrade a memory module or retriever without touching the core reasoning engine. For example, Memento optimizes a memory module to retrieve past cases; if requirements change, you update the module, not the core agent.
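A short sketch of what this hot swapping looks like in practice: tools sit behind a small interface, so a memory module can be replaced without retraining the frozen agent. The class and method names are illustrative, not Memento's actual API.

```python
# Minimal sketch of the "hot swap" idea behind T1/T2 modularity: tools implement a
# shared interface, so modules can be replaced without touching the frozen agent.
from typing import Protocol

class MemoryModule(Protocol):
    def recall(self, query: str) -> list[str]: ...

class KeywordMemory:
    """Simple swappable memory: recalls past cases by keyword overlap."""
    def __init__(self, cases: list[str]):
        self.cases = cases
    def recall(self, query: str) -> list[str]:
        words = query.lower().split()
        return [c for c in self.cases if any(w in c.lower() for w in words)]

class FrozenAgent:
    def __init__(self, memory: MemoryModule):
        self.memory = memory  # swappable dependency; the agent itself never retrains
    def answer(self, query: str) -> str:
        past = self.memory.recall(query)
        return f"Answering '{query}' using {len(past)} recalled case(s)."

agent = FrozenAgent(KeywordMemory(["refund ticket #123", "login failure ticket #456"]))
print(agent.answer("refund request"))
# Upgrading memory later is a one-line swap: agent.memory = BetterMemory(...)
```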

A1 and A2 systems are monolithic. Teaching an agent a new skill (like coding) through fine-tuning can result in "catastrophic forgetting," where it degrades previously learned skills (like math) because its internal weights are overwritten.

A strategic framework for business adoption

Based on the study, developers should view these strategies as a progressive ladder, moving from low-risk, modular solutions to high-resource customization.

Start with T1 (agent-agnostic tools): Equip a powerful frozen model (like Gemini or Claude) with off-the-shelf tools like a dense retriever or an MCP connector. This requires no training and is perfect for prototyping and general applications. It’s the low-hanging fruit that can take you very far for most tasks.

Switch to T2 (agent-supervised tools): If the agent has trouble using generic tools, don’t retrain the main model. Instead, train a small, specialized subagent (like a searcher or memory manager) to filter and format data exactly the way the main agent needs it. This is very data-efficient and suitable for proprietary enterprise data and large-scale, cost-sensitive applications.

Use A1 (tool-execution signaled) for specialization: If the agent fundamentally fails at technical tasks (for example, writing non-functional code or making incorrect API calls), you need to retrain its understanding of the tool’s "mechanics." A1 is ideal for creating specialists in verifiable areas like SQL, Python, or your proprietary tools. For example, you can fine-tune a small model for your specific toolset, then use it as a T1-style plugin for a general model.

Reserve A2 (agent-output signaled) as the "nuclear option": Train a monolithic agent end-to-end only if you need it to internalize complex strategy and self-correction. This is resource-intensive and rarely necessary for standard enterprise applications; in reality, you rarely need to train your own model.

As the AI landscape evolves, the focus is shifting from building a giant, perfect model to building an intelligent ecosystem of specialized tools around a stable core. For most companies, the most effective path to agentic AI is not to build a bigger brain, but to give it better tools.
