Here Come Agents
Remember Clippy? If you opened a blank document in Word, PowerPoint, or Excel on your computer in 1996, chances are this cute little character popped up on your screen and offered to help. It was meant to be a friendly, task-focused program that one could use to speed up their work. It seemed like a good idea at the time. But what followed was nothing short of a colossal backfire. Not only did Clippy fail to take off as Microsoft had hoped, it became an annoyance, often attracting harsh criticism from users. This was one of the more infamous cases of an effort to build agents going wrong.
What is an agent, you ask? Let’s just say that it’s an autonomous entity or program that takes preferences, instructions, or other forms of input from a user to accomplish specific tasks on their behalf. Clippy was meant to do this in the context of digital office work such as writing a letter, compiling numbers on a spreadsheet, or preparing a presentation. Technically, an agent can be anything that allows one to automate things, even if that automation is not that smart or fancy. For decades, background processes in UNIX systems known as daemons have done a host of automated tasks on behalf of users without those users even knowing about them. As office software became a staple on most desktop computers, developers realized that users were performing the same tasks over and over. To help execute those tasks more efficiently, they offered macros. A macro is a set of steps that can be carried out to accomplish a task. Essentially, these are the steps a user was repeating, but now a macro captured them in sequence, allowing them to be executed automatically at the push of a button.
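The macro idea can be sketched in a few lines of Python. This is purely illustrative; the `Macro` class and the text-cleanup steps below are made up for this sketch, not any real office suite’s macro API.

```python
# A minimal sketch of the macro idea: record a sequence of steps once,
# then replay them all at the push of a button. Everything here is
# illustrative, not a real office suite's API.

class Macro:
    """Captures a sequence of steps so they can be replayed on demand."""

    def __init__(self):
        self.steps = []

    def record(self, step):
        """Add one step (a function) to the recorded sequence."""
        self.steps.append(step)
        return self  # allow chaining while "recording"

    def run(self, document):
        """Replay every recorded step, in order, on the document."""
        for step in self.steps:
            document = step(document)
        return document

# Record the steps a user was repeating by hand.
cleanup = (
    Macro()
    .record(str.strip)                 # trim stray whitespace
    .record(str.title)                 # normalize the heading's case
    .record(lambda text: text + "\n")  # end with a newline
)

# One "button push" now replays all three steps.
print(cleanup.run("  quarterly report  "))
```

The point of the sketch is the shape of the abstraction: the user’s repeated actions become data (a list of steps), and execution becomes a single call.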
Your thermostat is a kind of agent. You have set thresholds for your comfort level, and the thermostat calls for heat or cooling on your behalf based on those thresholds. Nothing smart or fancy, but it’s an example of delegating and automating a task. That’s an agent at play.
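That kind of delegation fits in a few lines. The thresholds and temperature readings below are made-up values for illustration, assuming degrees Fahrenheit.

```python
# A minimal sketch of a thermostat as an agent: the user delegates by
# setting comfort thresholds; the program then acts on their behalf.
# Thresholds and readings are illustrative values in degrees Fahrenheit.

def thermostat(temperature, heat_below=68.0, cool_above=76.0):
    """Decide what to do for one temperature reading, given the user's thresholds."""
    if temperature < heat_below:
        return "heat on"
    if temperature > cool_above:
        return "cooling on"
    return "idle"

# The user set the thresholds once; the agent decides on every reading.
for reading in (62.0, 72.0, 80.0):
    print(reading, "->", thermostat(reading))
```

The user’s only involvement is stating preferences once; every subsequent decision is delegated, which is the essence of agency even in this unglamorous form.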
Of course, when people talk about agents today (and there are a lot of people and companies talking about them), they mean ‘AI agents’. That means we are talking about an entity that is not just autonomous, but also makes decisions typically made by humans using their cognitive and intellectual abilities. Driving a car is one example of a task that requires a lot of decision-making, and yes, there are plenty of efforts to build agents for driving cars. But there are also numerous other projects to build agents of all kinds. We have already seen agents from big tech companies, startups, and individual developers that can do shopping, browsing, website development, travel planning, and a host of other activities and tasks for the user. Currently, almost all of them are in beta, or should be labeled as such, since in most cases they are not ready for real-life applications. These agents lack the sophistication, security, and reliability we need for them to replace human effort and be meaningful assistants to us.
And many tech companies have gone all in on these efforts to bring agents into our lives. They believe that after LLMs, this is the next level and a new frontier they need to conquer. They are at least partially right (not fully, because there are always unpredictable and unforeseen issues with AI in general). They are certainly putting their money where their mouth is, investing billions in building and deploying agents. Many agents are currently offered for free, but eventually users will need to start paying for them. Some services are already charging anywhere from $10 to $30 a month for their agents. The question is: are they worth it?
Evaluating agents is one of the trickiest problems in research and in practice. Would you pay $20 a month for an agent that can do all your travel planning? It depends on how much time and effort that agent saves you and how much that is worth to you. But it’s not that simple. Imagine that agent making a mistake and booking you an overpriced flight or a red-eye you despise. Who pays for that mistake? And this is a simpler case on the spectrum of all the uses we want to get out of agents. Think about financial transactions and healthcare decisions, and then think about the cost of mistakes there. In other words, it’s not all about saving time and effort; it’s also about trust, assurance, and accountability.
Elsewhere I have written about how and why agents are not enough for the next level in AI. I have nothing against agents; in fact, a lot of my recent work has been focused on building and evaluating them. But I also know that simply building capable agents is not going to cut it. We need to think about the whole ecosystem around agents, one that puts users, their tasks, and their contexts at the center. Unfortunately, most of the efforts I see today lack that, because developers have been hyped up about what GenAI technologies could do without thinking enough about what they should do. In the excitement over the generation and reasoning capabilities of LLMs, we often forget what problem we are trying to solve. Until we reverse this trend, we risk once again producing glorified Clippys.
Author
Dr. Chirag Shah is a Professor in the Information School, an Adjunct Professor in the Paul G. Allen School of Computer Science & Engineering, and an Adjunct Professor in Human Centered Design & Engineering (HCDE) at the University of Washington (UW). He is the Founding Director of the InfoSeeking Lab and the Founding Co-Director of RAISE, a Center for Responsible AI. He is also the Founding Editor-in-Chief of Information Matters. His research revolves around intelligent systems. On one hand, he is trying to make search and recommendation systems smart, proactive, and integrated. On the other, he is investigating how such systems can be made fair, transparent, and ethical. The former area is Search/Recommendation and the latter falls under Responsible AI. Together they create an interesting synergy, resulting in Human-Centered ML/AI.