Notes on LLM-based Autonomous Agents: Hype vs. Reality, May 2024.
While general-purpose LLM agents promise flexibility, developers are finding them too unreliable for production applications.
There has been a lot of hype around LLM-based autonomous agent workflows. By mid-2024, all major LLMs support tool use (function calling), which lets a model carry out a sequence of tasks autonomously.
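To make "tool use" concrete, here is a minimal sketch of the loop such agents run. It assumes a model that, at each step, either requests a tool call or returns a final answer. The `scripted_model` stand-in, the `ToolCall`/`FinalAnswer` types, and the two toy tools are all hypothetical and do not correspond to any provider's function-calling API. A real agent would send the history to an LLM at each step.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    name: str   # which tool the model wants to run
    args: dict  # keyword arguments for that tool


@dataclass
class FinalAnswer:
    text: str


# Hypothetical toy tools; a real system would register these with the
# LLM provider's function-calling interface.
TOOLS: dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(a + b),
    "upper": lambda s: s.upper(),
}


def scripted_model(history):
    """Hypothetical stand-in for an LLM that follows a fixed script.

    A real model would choose the next action from the history;
    this one just counts tool results so the loop runs offline."""
    n_tool_results = sum(1 for role, _ in history if role == "tool")
    if n_tool_results == 0:
        return ToolCall("add", {"a": 2, "b": 3})
    if n_tool_results == 1:
        return ToolCall("upper", {"s": "sum is " + history[-1][1]})
    return FinalAnswer(history[-1][1])


def run_agent(task, model, tools, max_steps=5):
    """Alternate between asking the model and executing its tool calls."""
    history = [("user", task)]
    for _ in range(max_steps):
        action = model(history)
        if isinstance(action, FinalAnswer):
            return action.text, history
        tool = tools.get(action.name)
        if tool is None:
            # Feed the error back so the model can try something else.
            history.append(("tool", f"error: unknown tool {action.name}"))
            continue
        history.append(("tool", tool(**action.args)))
    return None, history  # step budget exhausted without an answer
```

Usage: `run_agent("Add 2 and 3, then shout it", scripted_model, TOOLS)` returns `"SUM IS 5"` after two tool calls. The `max_steps` budget is a simple safeguard so a model that keeps choosing unhelpful actions cannot loop forever.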
But reality is proving more challenging than anticipated.
The WebArena leaderboard, which benchmarks LLM agents on realistic web-based tasks, shows that even the best-performing models succeed on only 35.8% of tasks.