Artificial intelligence chatbots such as OpenAI LP’s ChatGPT have reached a fever pitch of popularity recently not just for their ability to hold humanlike conversations, but because they can perform knowledge tasks such as research, searches and content generation.
Now there’s a new contender taking social media by storm that extends the capabilities of OpenAI’s offering by automating them even further: Auto-GPT. It’s part of a new class of AI tools called “autonomous AI agents” that use GPT-3.5 and GPT-4, the generative AI technologies behind ChatGPT, to approach a task, build on their own knowledge, and connect apps and services to automate tasks and perform actions on behalf of users.
ChatGPT might seem magical to users for its ability to answer questions and produce content based on user prompts, such as summarizing large documents, generating poems and stories, or writing computer code. However, it’s limited in what it can do because it’s capable of doing only one task at a time. During a session with ChatGPT, a user can prompt the AI with only one question at a time, and refining those prompts or questions can be a slow and tedious journey.
Auto-GPT, created by game developer Toran Bruce Richards, takes away these limitations by allowing users to give the AI an objective and a set of goals to meet. Then it spawns a bot that acts like a person would, using OpenAI’s GPT model to perform AI prompts in order to approach that goal. Along the way, it learns to refine its prompts and questions in order to get better results with every iteration.
It also has internet connectivity in order to gather additional information from searches. Moreover, it has short- and long-term memory through database connections so that it can keep track of sub-tasks. And it uses GPT-4 to produce content such as text or code when required. Auto-GPT is also capable of challenging itself when a task is incomplete and filling in the gaps by changing its own prompts to get better results.
According to Richards, although current AI chatbots are extremely powerful, their inability to refine their own prompts on the fly and automate tasks is a bottleneck. “This inspiration led me to develop Auto-GPT, which can apply GPT-4’s reasoning to broader, more complex problems that require long-term planning and multiple steps,” he told Vice.
Auto-GPT is available as open source on GitHub. It requires an application programming interface key from OpenAI to access GPT-4. To use it, people will need to install Python and a development environment such as Docker or VS Code with a Dev Container extension. As a result, it might take a little bit of technical know-how to get going, though there’s extensive setup documentation.
How does it work?
In a text interface, Auto-GPT asks the user to give the AI a name, a role, an objective and up to five goals that it should reach. Each of these defines how the AI agent will approach the action the user wants and how it will deliver the final product.
First, the user sets a name for the AI, such as “RestaurantMappingApp-GPT,” and then sets a role, such as “Develop a web app that will provide interactive maps for nearby restaurants.” The user can then set a series of goals, such as “Write a back-end in Python” and “Program a front end in HTML,” or “Offer links to menus if available” and “Link to delivery apps.”
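To make the shape of that input concrete, here is a minimal sketch in Python of how a name, a role and a list of goals might be held and turned into a text preamble. The `AgentConfig` class and its `to_prompt` method are hypothetical names invented for illustration; they are not taken from Auto-GPT’s source code, and the prompt wording is an assumption.

```python
from dataclasses import dataclass, field

MAX_GOALS = 5  # the article says the user may give up to five goals


@dataclass
class AgentConfig:
    """Hypothetical container for the name, role and goals the user types in."""
    name: str
    role: str
    goals: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject more goals than the stated limit.
        if len(self.goals) > MAX_GOALS:
            raise ValueError(f"at most {MAX_GOALS} goals are allowed")

    def to_prompt(self) -> str:
        """Render the configuration as a plain-text preamble (wording is assumed)."""
        numbered = "\n".join(f"{i}. {g}" for i, g in enumerate(self.goals, start=1))
        return f"You are {self.name}. Role: {self.role}\nGoals:\n{numbered}"


config = AgentConfig(
    name="RestaurantMappingApp-GPT",
    role="Develop a web app that will provide interactive maps for nearby restaurants.",
    goals=[
        "Write a back-end in Python",
        "Program a front end in HTML",
        "Offer links to menus if available",
        "Link to delivery apps",
    ],
)
print(config.to_prompt())
```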
Auto-GPT then works toward those goals by breaking the overall job into smaller tasks and working on each one, and it uses a primary monitoring AI bot that acts as a “manager” to make sure the tasks stay coordinated. This particular prompt asks the bot to build a somewhat complex app that could go awry if it doesn’t keep track of a number of different moving parts, so it might take a large number of steps to get there.
With each step, each AI instance will “narrate” what it’s doing and even criticize itself in order to refine its prompts depending on its approach toward the given goal. Once it reaches a particular goal, each instance will finalize its process and return its answer to the main management task.
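The manager-and-worker pattern described above can be sketched roughly as follows. This is an illustration of the general idea, not Auto-GPT’s actual implementation: `fake_model`, `run_worker` and `run_manager` are hypothetical names, the model call is a stub that only echoes its prompt so the sketch runs offline, and the fixed step count stands in for whatever stopping rule a real agent would use.

```python
from typing import Callable


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a language model call; it only echoes the prompt."""
    return f"[model output for: {prompt}]"


def run_worker(task: str, model: Callable[[str], str], max_steps: int = 3) -> str:
    """One worker: attempt the task, critique the attempt, refine the prompt."""
    prompt = task
    answer = ""
    for step in range(1, max_steps + 1):
        print(f"step {step}: working on {task!r}")  # the "narration"
        answer = model(prompt)
        critique = model(f"Criticize this answer: {answer}")
        # Fold the critique into the next prompt; this is the refinement step.
        prompt = f"{task}\nPrevious attempt: {answer}\nCritique: {critique}"
    return answer


def run_manager(goals: list[str], model: Callable[[str], str]) -> dict[str, str]:
    """The "manager": hand each goal to a worker and collect the answers."""
    return {goal: run_worker(goal, model) for goal in goals}


results = run_manager(
    ["Write a back-end in Python", "Program a front end in HTML"],
    fake_model,
)
```

Swapping `fake_model` for a function that calls a real model would leave the control flow unchanged, which is the point of passing the model in as a parameter.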
Trying to get ChatGPT or even the more advanced, subscription-based GPT-4 to do this without supervision would take a large number of manual steps that would have to be attended to by a human being. Auto-GPT does them on its own.
The capabilities of Auto-GPT are beneficial for neophyte developers looking to get ahead in the game, Brandon Jung, vice president of ecosystem at AI-code completion tool provider Tabnine Ltd., told SiliconANGLE.
“One benefit is that it’s a good introduction for those that are new to coding, and it allows for quick prototyping,” Jung said. “For use cases that don’t require exactness or have security concerns, it could speed up the creation process without having to be part of a broader system that includes an expert for review.”
Being able to build apps rapidly, including all the code at once, from a simple series of text prompts would put a lot of new code templates into the hands of developers, essentially providing them with rapid solutions and foundations to build on. However, that code would have to go through a thorough review before being put into production.
What kind of applications can Auto-GPT be used for?
That’s just one example of what Auto-GPT can do. Its capabilities open up wide-reaching possibilities that are currently being explored by developers, project managers, AI researchers and anyone else who can download its source code.
“There are numerous examples of people using Auto-GPT to do market research, create business plans, create apps, automate complex tasks in pursuit of a goal, such as planning a meal, identifying recipes and ordering all the ingredients, and even execute transactions on behalf of the user,” Sheldon Monteiro, chief product officer at the digital business transformation firm Publicis Sapient, told SiliconANGLE.
With its ability to search the internet, Auto-GPT can be tasked with quick market research such as “Find me five gaming keyboards under $200 and list their pros and cons.” With its ability to break a task up into multiple subtasks, the autonomous AI could then rapidly search multiple review sites, produce a market research report and come back with a list of gaming keyboards that come in under that amount and supply their prices as well as information about them.
A Twitter user named MOE created an Auto-GPT bot named “Isabella” that can autonomously analyze market data and outsource to other AIs. It does so by using the AI framework LangChain to gather data autonomously and do sentiment analysis on different markets.
Auto-GPT isn’t the only autonomous agent AI currently available. Another that has come into vogue is BabyAGI, which was created by Yohei Nakajima, a venture capitalist and artificial intelligence researcher. AGI refers to “artificial general intelligence,” a hypothetical type of AI that would have the ability to perform any intellectual task, though no existing AI is anywhere close. Like Auto-GPT, BabyAGI uses the OpenAI API. It is a Python-based task management system that builds and prioritizes new tasks toward an objective.
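A task management loop of that kind, one that executes a task, builds new tasks from the result and reprioritizes the list, might look something like the sketch below. It is a simplified illustration under stated assumptions rather than BabyAGI’s code: `run_task_loop` and the three callables are hypothetical, and the stubs at the bottom replace what would be model calls in a real system.

```python
from collections import deque
from typing import Callable


def run_task_loop(
    objective: str,
    first_task: str,
    execute: Callable[[str, str], str],
    create_tasks: Callable[[str, str, str], list[str]],
    prioritize: Callable[[str, list[str]], list[str]],
    max_iterations: int = 5,
) -> list[tuple[str, str]]:
    """Run tasks one at a time, adding and reordering tasks after each result."""
    queue = deque([first_task])
    completed: list[tuple[str, str]] = []
    for _ in range(max_iterations):
        if not queue:
            break
        task = queue.popleft()
        result = execute(objective, task)
        completed.append((task, result))
        # Build follow-up tasks from the result, then reorder everything pending.
        queue.extend(create_tasks(objective, task, result))
        queue = deque(prioritize(objective, list(queue)))
    return completed


# Stubs so the sketch runs without a model (all behavior here is made up).
def execute(objective: str, task: str) -> str:
    return f"done: {task}"


def create_tasks(objective: str, task: str, result: str) -> list[str]:
    return ["Research keyboards", "Compare prices"] if task == "Draft a plan" else []


def prioritize(objective: str, tasks: list[str]) -> list[str]:
    return sorted(tasks)  # placeholder ordering; a real agent would ask a model


completed = run_task_loop(
    "Find five gaming keyboards under $200",
    "Draft a plan",
    execute,
    create_tasks,
    prioritize,
)
```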
There are also AgentGPT and GodMode, which are much more user-friendly in that they run in a web interface instead of needing to be installed on a computer, so they can be accessed as a service. These services lower the barrier to entry because they don’t require any technical knowledge to use, and they will perform tasks similar to Auto-GPT’s, such as generating code, answering questions and doing research. However, they can’t write documents to the computer or install software.
Autonomous agents are powerful but experimental
These tools do have drawbacks, however, Monteiro warned. The examples on the internet are cherry-picked and paint the technology in a glowing light. For all the successes, there are a lot of issues that can happen when using it.
“It can get stuck in task loops and get confused,” Monteiro said. “And those task loops can get pretty expensive, very fast with the costs of GPT-4 API calls. Even when it does work as intended, it might take a fairly lengthy sequence of reasoning steps, each of which eats up expensive GPT-4 tokens.”
Access to GPT-4 is billed according to how many tokens are used. Tokens are words or pieces of words in the text sent to and returned by the model. Charges range from three cents per 1,000 tokens for prompts to six cents per 1,000 tokens for results. That means an Auto-GPT run that works through a complex project or gets stuck in a loop unattended could end up costing a few dollars.
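As a rough worked example of that arithmetic, the Python sketch below applies the two per-token prices quoted above to a run of many steps. The prices come from the article; the step count and the token counts per step are made-up figures chosen only to show the calculation.

```python
PROMPT_PRICE_PER_1K = 0.03   # dollars per 1,000 prompt tokens (from the article)
RESULT_PRICE_PER_1K = 0.06   # dollars per 1,000 result tokens (from the article)


def call_cost(prompt_tokens: int, result_tokens: int) -> float:
    """Cost in dollars of a single model call at the prices above."""
    return (
        prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
        + result_tokens / 1000 * RESULT_PRICE_PER_1K
    )


# Hypothetical run: 50 reasoning steps, each sending 3,000 prompt tokens
# and getting back 500 result tokens.
steps = 50
total = steps * call_cost(prompt_tokens=3000, result_tokens=500)
print(f"Estimated cost: ${total:.2f}")
```

Under those assumed numbers each step costs about 12 cents, so 50 steps come to roughly $6, which is how an unattended loop can add up quickly.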
At the same time, GPT-4 can be prone to errors, known as “hallucinations,” which could spell trouble during the process. It could come up with totally incorrect or erroneous actions or, worse, produce insecure or disastrously bad code when asked to create an application.
“[Auto-GPT] has the ability to execute on previous output, even if it gets something wrong it keeps going on,” said Bern Elliot, a distinguished vice president analyst at Gartner. “It needs strong controls to avoid it going off the rails and keeping on going. I expect misuse without proper guardrails will cause some damaging unexpected and unintended outcomes.”
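One simple form such controls could take is a set of hard limits checked before every step: a cap on the number of steps, a cap on spending and a check for the same action repeating. The sketch below is a hypothetical illustration in Python, not a feature of Auto-GPT; the `Guardrails` class, its thresholds and the per-step cost are all assumptions.

```python
class GuardrailTripped(Exception):
    """Raised when a run exceeds one of its limits."""


class Guardrails:
    """Hypothetical limits for an agent run: steps, spend and repeated actions."""

    def __init__(self, max_steps: int, max_cost_usd: float, max_repeats: int = 3) -> None:
        self.max_steps = max_steps
        self.max_cost_usd = max_cost_usd
        self.max_repeats = max_repeats
        self.steps = 0
        self.cost = 0.0
        self.last_action = None
        self.repeats = 0

    def check(self, action: str, step_cost_usd: float) -> None:
        """Record one step and raise if any limit has been reached."""
        self.steps += 1
        self.cost += step_cost_usd
        # Count how many times in a row the same action has been requested.
        self.repeats = self.repeats + 1 if action == self.last_action else 1
        self.last_action = action
        if self.steps > self.max_steps:
            raise GuardrailTripped(f"more than {self.max_steps} steps")
        if self.cost > self.max_cost_usd:
            raise GuardrailTripped(f"spend exceeded ${self.max_cost_usd:.2f}")
        if self.repeats >= self.max_repeats:
            raise GuardrailTripped(f"same action repeated {self.repeats} times")


guard = Guardrails(max_steps=20, max_cost_usd=1.00)
stop_reason = None
for action in ["search keyboards", "search keyboards", "search keyboards", "write report"]:
    try:
        guard.check(action, step_cost_usd=0.12)  # assumed cost per step
    except GuardrailTripped as exc:
        stop_reason = str(exc)
        break
print(stop_reason)
```

Limits like these only stop a runaway loop. They do nothing to catch a wrong answer that the agent keeps building on, which still calls for human review.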
The software development side could be equally problematic. Even if Auto-GPT doesn’t make a mistake that causes it to produce broken code, which would cause the software to simply fail, it could create an application riddled with security issues.
“Auto-GPT is not part of a full software development lifecycle — testing, security, et cetera — nor is it integrated into an IDE,” Jung said, warning about the potential issues that could arise from the misuse of the tool. “Abstracting complexity is fine if you are building on a strong foundation. However, these tools are by definition not building strong code and are encouraging bad and insecure code to be pushed into production.”
The future of Auto-GPT and other autonomous agents
Tools such as Auto-GPT, BabyAGI, AgentGPT and GodMode are still experimental, but there are broader implications in how they could be used to replace routine tasks such as vacation planning or shopping, explained Monteiro.
Microsoft has already developed simple examples of a plugin for Bing Chat. It allows users to ask for dinner suggestions, and its AI, which is powered by GPT-4, will roll up a list of ingredients and then launch Instacart to have them prepared for delivery. Although this is a step in the direction of automation, bots such as Auto-GPT are edging toward a potential future of all-out autonomous behaviors.
A user could ask Auto-GPT to look through local stores, prepare lists of ingredients, compare prices and quality, set up a shopping cart and even complete orders autonomously. At this experimental point, many users may not be willing to let the bot go all the way and use their credit card to place and deliver orders on its own, for fear that it could go haywire and send them several hundred bunches of basil.
A similar future in which an AI built on Auto-GPT does the work of a travel agent may not be far away. “Give it your parameters (beach, four-hour max travel, hotel class) and your budget, and it will happily do all the web browsing for you, comparing options in quest of your goal,” said Monteiro. “When it is done, it will present you with its findings, and you can also see how it got there.”
As these tools begin to mature, they have a real chance of providing a way for people to automate away mundane step-by-step tasks that happen on the internet. That could have some interesting implications, especially in e-commerce.
“How will companies adapt when these agents are browsing sites and eliminating your product from the consideration set before a human even sees the brand?” said Monteiro. “From an e-commerce standpoint, if people start using Auto-GPT tools to buy goods and services online, retailers will have to adapt their customer experience.”