Alibaba’s AI agent framework cuts tool tokens by 99% for complex tasks

Enterprise AI agents often struggle when tasked with complex workflows that require coordinating dozens—or even hundreds—of specialized tools. Traditional approaches expose the entire tool library to the model, forcing it to sift through irrelevant options and consume excessive tokens. Now, researchers at Alibaba have developed SkillWeaver, a framework designed to streamline this process by dynamically constructing execution plans tailored to user queries.

The system introduces Skill-Aware Decomposition (SAD), a feedback-driven technique that iteratively refines task breakdowns to match available tools. Unlike one-shot routing methods that select tools in isolation, SkillWeaver treats multi-step workflows as compositional problems, breaking prompts into sub-tasks and assembling them into executable sequences. In tests using a benchmark of 300 multi-step queries, the framework reduced token consumption by over 99% compared to naive approaches, while maintaining or improving accuracy.

The hidden cost of brute-force tool routing

Modern AI agents rely on skills—structured, reusable tool specifications written in natural language—to extend functionality. However, exposing an entire skill library to a model introduces critical inefficiencies:

Context overload: Feeding hundreds of tool descriptions rapidly exhausts the model’s context window, leading to errors or truncated responses.
Token hemorrhage: Each token processed consumes computational resources, driving up latency and costs in production environments.
Ambiguity amplification: Generic user queries often fail to align with the technical vocabulary of available skills, forcing the model to guess rather than act.

Most existing frameworks address this by pre-filtering tools or organizing them hierarchically. Yet these solutions still treat routing as a single-step selection problem, incapable of handling the interconnected nature of real-world tasks. For example, a request like "Pull sales data from the API, clean the file, and generate a dashboard" demands a sequence of distinct tools working in harmony—not a single monolithic function.

How SkillWeaver reimagines multi-tool orchestration

SkillWeaver tackles compositional routing through a three-phase pipeline: Decompose, Retrieve, and Compose. Each stage refines the workflow until it’s both feasible and efficient.

1. Decompose: Breaking the problem into atomic steps

A lightweight large language model (LLM) acts as the task decomposer, dissecting complex prompts into granular sub-tasks. For instance, the query "Analyze customer churn and create a regression model" might split into:

Fetch customer data via an API client.
Preprocess the raw dataset to handle missing values.
Train a logistic regression model.
Generate visualizations comparing churn rates.

Critically, the decomposer’s output must use terminology that matches the actual skill library—a challenge SkillWeaver solves with SAD.

2. Retrieve: Matching sub-tasks to precise tools

Using a semantic search retriever (powered by MiniLM with a FAISS index), the system scans the skill library to identify the best candidates for each sub-task. Instead of returning all possible matches, it surfaces a shortlist of top candidates based on relevance, reducing noise in downstream planning.
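The retrieval step can be illustrated with a small, dependency-free sketch. It is not SkillWeaver's retriever: the article describes MiniLM embeddings with a FAISS index, whereas this example swaps in a toy bag-of-words vector and a linear scan so it runs anywhere. The skill names, descriptions, and the retrieve function are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical skill library; names and descriptions are invented for illustration.
SKILLS = {
    "http_fetch": "fetch data from a REST API endpoint",
    "csv_clean": "clean a CSV dataset and handle missing values",
    "logreg_train": "train a logistic regression model on tabular data",
    "chart_plot": "generate charts and visualizations from a dataset",
}

def embed(text):
    """Toy bag-of-words vector; a stand-in for a sentence-embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(sub_task, skills, k=2):
    """Return up to k skill names, most similar first, skipping zero-similarity matches."""
    query = embed(sub_task)
    scored = [(cosine(query, embed(desc)), name) for name, desc in skills.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, name breaks ties
    return [name for score, name in scored[:k] if score > 0]
```

Calling retrieve("Train a logistic regression model", SKILLS, k=1) returns ["logreg_train"]. The point of the shortlist is that only k descriptions per sub-task reach the planner, rather than the whole library.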

3. Compose: Building a flawless execution plan

A planner evaluates the retrieved tools to construct a directed acyclic graph (DAG) that maps dependencies. The planner ensures:

Compatibility: Outputs from one tool seamlessly feed inputs into the next.
Parallelism: Independent tasks execute concurrently where possible.
Robustness: The plan accounts for edge cases, such as retry logic for failed API calls.

For example, if the planner detects that a data-cleaning tool requires CSV-formatted input but the API returns JSON, it inserts a conversion step between the two, so the mismatch is resolved at planning time instead of surfacing as a silent failure at run time.
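A minimal sketch of the compose step, under stated assumptions: the plan is a map from each step to the steps it depends on, and each skill declares one input and one output format. The step names, the FORMATS table, and both helper functions are hypothetical; the article does not describe how SkillWeaver's planner represents any of this. The ordering uses graphlib from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical skills: (input format, output format). None means no input needed.
FORMATS = {
    "fetch_sales": (None, "json"),
    "clean_data": ("csv", "csv"),
    "build_dashboard": ("csv", "html"),
    "export_report": ("csv", "pdf"),
}

# Each step maps to the set of steps it depends on.
DEPENDS_ON = {
    "fetch_sales": set(),
    "clean_data": {"fetch_sales"},
    "build_dashboard": {"clean_data"},
    "export_report": {"clean_data"},
}

def insert_converters(depends_on, formats):
    """Return a new dependency map with a conversion step on every mismatched edge."""
    patched = {step: set() for step in depends_on}
    for step, parents in depends_on.items():
        for parent in parents:
            produced, needed = formats[parent][1], formats[step][0]
            if produced == needed:
                patched[step].add(parent)
            else:
                # One converter per mismatched edge, named after its source step.
                converter = f"{parent}.{produced}_to_{needed}"
                patched.setdefault(converter, set()).add(parent)
                patched[step].add(converter)
    return patched

def execution_waves(depends_on):
    """Group steps into waves; steps within one wave are independent of each other."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the plan is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

With the sample data, insert_converters adds a "fetch_sales.json_to_csv" step ahead of clean_data, and execution_waves places build_dashboard and export_report in the same final wave, which is where concurrent execution would be possible.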

SAD: The secret sauce for precision matching

The most distinctive element of SkillWeaver is Skill-Aware Decomposition (SAD), an iterative technique that closes the loop between decomposition and tool retrieval. SAD works in cycles:

The LLM drafts an initial task breakdown.
The system retrieves a preliminary set of skills based on that draft.
The retrieved skills are fed back into the LLM as contextual hints, guiding it to refine its decomposition.
The loop repeats until the sub-tasks align with the available tools.

This feedback mechanism ensures the model’s vocabulary and granularity match the actual skill library, drastically reducing mismatches. In Alibaba’s experiments, SAD improved tool-matching accuracy by 18% compared to one-shot decomposition methods.
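The cycle above can be sketched as a short loop. This is an illustration, not the published algorithm: the stopping rule (stop when a refinement pass changes nothing, or after max_rounds) is an assumption, and toy_decompose and toy_retrieve are hypothetical stand-ins for the LLM call and the semantic retriever.

```python
def sad_loop(query, decompose, retrieve, max_rounds=3):
    """Alternate decomposition and retrieval until the sub-tasks stop changing."""
    hints = []
    sub_tasks = decompose(query, hints)  # first draft, no hints yet
    for _ in range(max_rounds):
        # Collect the skills retrieved for the current draft.
        hints = sorted({skill for task in sub_tasks for skill in retrieve(task)})
        # Feed them back so the decomposer can adopt the library's vocabulary.
        refined = decompose(query, hints)
        if refined == sub_tasks:  # assumed stopping rule: no further change
            break
        sub_tasks = refined
    return sub_tasks, hints

# Hypothetical keyword index standing in for the semantic retriever.
TOY_INDEX = {
    "api": "http_fetch",
    "file": "csv_clean",
    "csv": "csv_clean",
    "dashboard": "chart_plot",
    "charts": "chart_plot",
}

def toy_retrieve(task):
    """Return skills whose keyword appears as a word in the sub-task."""
    words = task.split()
    return [skill for word, skill in TOY_INDEX.items() if word in words]

def toy_decompose(query, hints):
    """Stand-in for the LLM: echoes the user's wording until it sees skill hints."""
    if "csv_clean" in hints:
        return ["fetch data from api", "clean csv dataset", "generate charts"]
    return ["pull sales data from api", "tidy the file", "make a dashboard"]
```

Running sad_loop("Pull sales data from the API, clean the file, and generate a dashboard", toy_decompose, toy_retrieve) settles on the second phrasing after one refinement, since a further pass produces no change.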

Benchmarking SkillWeaver against real-world demands

To validate SkillWeaver, researchers created CompSkillBench, a custom benchmark featuring 300 multi-step queries of varying complexity. The test suite leveraged a library of 2,209 real-world skills from the public Model Context Protocol (MCP) ecosystem, spanning 24 categories such as cloud infrastructure, finance, and databases.

The framework was evaluated against three baselines:

LLM-Direct: Stuffing all tool names into the prompt and letting the model choose.
API retrieval: Using traditional API-based routing without decomposition.
Hierarchical routing: Organizing tools into categories and selecting step-by-step.

SkillWeaver outperformed all baselines in efficiency and accuracy, with its token savings translating to lower latency and reduced costs in large-scale deployments. The results suggest that granular task decomposition—not sheer model size—is the key to scalable AI agent systems.
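A back-of-the-envelope calculation shows why shortlisting can shrink the tool portion of a prompt so sharply. Only the library size (2,209 skills) comes from the benchmark description; the tokens per skill, the number of sub-tasks, and the shortlist size are assumptions chosen for illustration, and the result is not the measurement reported by the researchers.

```python
def prompt_tokens(num_skills, tokens_per_skill):
    """Tokens spent describing skills in a single prompt."""
    return num_skills * tokens_per_skill

LIBRARY_SIZE = 2209     # from the benchmark description
TOKENS_PER_SKILL = 40   # assumption: average length of one skill description
SUB_TASKS = 4           # assumption: sub-tasks in a typical query
TOP_K = 5               # assumption: candidates shortlisted per sub-task

full = prompt_tokens(LIBRARY_SIZE, TOKENS_PER_SKILL)             # whole library exposed
shortlist = prompt_tokens(SUB_TASKS * TOP_K, TOKENS_PER_SKILL)   # shortlisted skills only
reduction = 1 - shortlist / full

print(f"full library: {full} tokens, shortlist: {shortlist} tokens")
print(f"reduction in tool tokens: {reduction:.2%}")
```

Under these assumptions the reduction depends only on the ratio of shortlisted skills to library size (20 of 2,209), not on description length. The sketch ignores the tokens spent by the decomposer and by repeated SAD rounds, which a real comparison would have to include.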

What this means for AI practitioners

For teams building AI agents, SkillWeaver offers a blueprint for scalable, cost-effective multi-tool orchestration. By prioritizing smart decomposition over brute-force exposure, it addresses the core bottlenecks in enterprise AI workflows. The framework’s emphasis on iterative refinement and compatibility checks also paves the way for more reliable automation in domains like data analytics, DevOps, and financial modeling.

As AI agents grow more sophisticated, the ability to dynamically assemble toolchains will separate the experimental from the enterprise-ready. SkillWeaver’s approach—bridging the gap between natural language and executable workflows—could redefine how businesses integrate AI into their operations.

AI summary

SkillWeaver, developed by Alibaba, is a framework that lets AI agents automatically decompose tasks and cut unnecessary token consumption by up to 99%. Learn the details.