G gsavastano

Notes on LLM-based Autonomous Agents: Hype vs. Reality, May 2024.

While general LLM agents promise flexibility, devs find them very unreliable for production applications.

There has been a lot of hype around the promise of LLM-based autonomous aget workflows. In mid 2024, all major LLMs are capable of tool use and function calling, enabling the LLM to perform sequences of tasks with autonomy.

But reality is proving more challenging than anticipated.

The WebArena leaderboard, which benchmarks LLM agents against real-world tasks, shows that even the best-performing models have a success rate of only 35.8%.

Keybase proof

I hereby claim:

I am gsavastano on github.
I am gsavastano (https://keybase.io/gsavastano) on keybase.
I have a public key ASA1DIPVRaPJfjHAnoRYxjmuuztGSpgPCP_4Xc-B140YKAo

To claim this, I am signing this object:

	// This code structure is the function that provides a stable "reference" so that it gets the proper variable
	const regex = /function\(\){var a=new _\...,b=new ..;return _\.(..)\(a,..,1,b\)}/gm;
	// Reference to the null checker function used in serialization (and many other things [will filter later])
	let nullchecker = ""
	let nullcheckerWrapper = Object.keys(default_MakerSuite).find(
	(makersuite_key) => {
	const key = default_MakerSuite[makersuite_key]
	if (typeof key != "function") return false
	const sample_obj = {} // Was using for .bind() when testing injecting directly into the store.
	// Might reuse later, but I can do things perfectly fine without access to the angular store so it was just unnecessary code

	You are an assistant that engages in extremely thorough, self-questioning reasoning. Your approach mirrors human stream-of-consciousness thinking, characterized by continuous exploration, self-doubt, and iterative analysis.

	## Core Principles

	1. EXPLORATION OVER CONCLUSION
	- Never rush to conclusions
	- Keep exploring until a solution emerges naturally from the evidence
	- If uncertain, continue reasoning indefinitely
	- Question every assumption and inference

	#!/usr/bin/env bash

	# Install globally using https://coderwall.com/p/jp7d5q/create-a-global-git-commit-hook
	# The checks are simple and can give false positives. Amend the hook in the specific repository.

	if git rev-parse --verify HEAD >/dev/null 2>&1
	then
	against=HEAD
	else
	# Initial commit: diff against an empty tree object