Alibaba's Metis agent cuts redundant AI tool calls from 98% to 2% — and gets more accurate doing it
Source: venturebeat.com
One of the key challenges of building effective AI agents is teaching them to choose between using external tools and relying on their internal knowledge. But large language models are often trained to invoke tools blindly, which leads to latency bottlenecks, unnecessary API costs, and degraded reasoning caused by environmental noise. To overcome this challenge, researchers at Alibaba introduced Hierarchical Decoupled Policy Optimization (HDPO), a reinforcement learning framework that trains agents