Over the past week, developers around the world have begun building “autonomous agents” that work with large language models (LLMs) such as OpenAI’s GPT-4 to solve complex problems. While still very new, such agents could represent a major milestone in the productive application of LLMs.
Normally, we interact with GPT-4 by typing carefully worded prompts into ChatGPT’s text window until the model generates the output we want. But most of us lack the skill and patience to sit and write prompt after prompt, guiding the LLM toward answering a complex question, such as “What is the optimal business plan for capturing 20% of the fingernail-polish market?” Quite naturally, developers have been thinking of ways to automate much of that process. That’s where autonomous agents come in.
In general terms, autonomous agents generate a systematic sequence of tasks that the LLM works through until it has satisfied a predetermined “goal.” Autonomous agents can already perform tasks as varied as conducting web research, writing code, and creating to-do lists.
Agents effectively add a traditional software interface to the front of a large language model. That interface can use well-known software practices (such as loops and functions) to guide the language model toward completing a general objective (such as “find all YouTube videos about the Great Recession and distill the key points”). Some people call them “recursive” agents because they run in a loop, asking the LLM questions, each one based on the result of the last, until the model produces a full answer.
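To make the loop idea concrete, here is a minimal sketch in Python. It is an illustration only, not the code of any particular agent: the function names, the prompt wording, and the “DONE” stopping signal are all assumptions, and `fake_llm` is a canned stand-in for a real model call so the example runs without an API key.

```python
from typing import Callable, List


def run_agent_loop(
    objective: str,
    ask_llm: Callable[[str], str],
    max_steps: int = 5,
) -> List[str]:
    """Prompt the model repeatedly, basing each prompt on the last answer."""
    results: List[str] = []
    previous = ""
    for _ in range(max_steps):
        # Each new prompt carries the objective plus the previous result.
        prompt = (
            f"Objective: {objective}\n"
            f"Previous result: {previous or 'none'}\n"
            "What is the next step?"
        )
        answer = ask_llm(prompt)
        results.append(answer)
        # Hypothetical convention: the model signals completion with "DONE".
        if answer.startswith("DONE"):
            break
        previous = answer
    return results


def fake_llm(prompt: str) -> str:
    """Canned replies standing in for a real LLM call (assumption)."""
    if "Previous result: none" in prompt:
        return "Found 3 videos"
    if "Found 3 videos" in prompt:
        return "Summarized each video"
    return "DONE: key points distilled"


steps = run_agent_loop(
    "Find YouTube videos about the Great Recession and distill the key points",
    fake_llm,
)
for step in steps:
    print(step)
```

The `max_steps` cap is a design choice worth noting: without some limit, a loop that depends on the model declaring itself finished could run indefinitely.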
BabyAGI
The seminal autonomous agent BabyAGI was created by Yohei Nakajima, a VC and habitual coder and experimenter. He describes BabyAGI as an “autonomous AI agent that contains an AI task manager.”
Nakajima, a partner at the small VC firm Untapped Capital, says he originally set out to build an agent that would automate some of the tasks he routinely performs as a VC (researching new technologies and companies, and so on) by replicating his own workflow. “I wake up in the morning and tackle the first thing on the list, and throughout the day I add new tasks, and then at night I review my tasks and reprioritize them, then decide what to do the next day,” he says. BabyAGI works the same way: It systematically completes, adds, and reprioritizes tasks for the GPT-4 language model to work on.
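The complete, add, reprioritize cycle described above can be sketched in a few lines of Python. This is not Nakajima’s code: every function name here is hypothetical, and the three stub functions return canned values where a real agent would presumably call the language model at each step.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def run_task_manager(
    objective: str,
    first_task: str,
    execute: Callable[[str, str], str],
    create: Callable[[str, str, str], List[str]],
    prioritize: Callable[[str, List[str]], List[str]],
    max_iterations: int = 10,
) -> List[Tuple[str, str]]:
    """Work through a task list: complete one, add new ones, reprioritize."""
    tasks: Deque[str] = deque([first_task])
    completed: List[Tuple[str, str]] = []
    for _ in range(max_iterations):
        if not tasks:
            break
        # 1. Complete the task at the top of the list.
        task = tasks.popleft()
        result = execute(objective, task)
        completed.append((task, result))
        # 2. Add follow-up tasks, skipping ones already queued or finished.
        done = {finished for finished, _ in completed}
        for new_task in create(objective, task, result):
            if new_task not in tasks and new_task not in done:
                tasks.append(new_task)
        # 3. Reprioritize whatever remains.
        tasks = deque(prioritize(objective, list(tasks)))
    return completed


# Stubs standing in for LLM calls (assumptions, for illustration only).
def execute_stub(objective: str, task: str) -> str:
    return f"result of '{task}'"


def create_stub(objective: str, task: str, result: str) -> List[str]:
    follow_ups = {"list competitors": ["research pricing", "draft summary"]}
    return follow_ups.get(task, [])


def prioritize_stub(objective: str, tasks: List[str]) -> List[str]:
    # Alphabetical order stands in for a model's ranking.
    return sorted(tasks)


history = run_task_manager(
    "Research a new technology market",
    "list competitors",
    execute_stub,
    create_stub,
    prioritize_stub,
)
for task, result in history:
    print(f"{task}: {result}")
```

Passing the three steps in as functions keeps the loop itself independent of any particular model or API, which is one plausible way a bare-bones agent could be adapted to other objectives.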
Realizing that his creation could be applied to all sorts of other objectives, Nakajima stripped the agent down to its bare bones (105 lines of code) and uploaded it to GitHub for others to use as a foundation for their own, more specialized agents.