Can AI Hopes Beat AI Horrors and Live Up To AI Hypes?
Chirag Shah, University of Washington
In 2003, Swedish philosopher Nick Bostrom proposed a thought experiment—that of an AI that makes paper clips. Why, you say, an AI for making paper clips? Sure, there are machines to make paper clips, but they are pretty dumb, mechanical devices that do exactly what they are told to do. Perhaps there are better ways to make paper clips or ways to optimize the process so that we could make many more paper clips. Wouldn’t we like that? Of course, we would. After all, this is the philosophy we have carried over from the industrial revolution, now mapped to technological revolution: optimize our processes, fine-tune production, and yield better volume to have lower per-item cost and higher overall return.
OK, so we develop an AI system to make paper clips for us. More specifically, we build a system that optimizes paperclip production by learning ways to do it more and better. Bostrom shows through his thought experiment that pretty soon this AI learns that it could create many more paper clips by using resources other than what’s normally provided to it. So it starts converting those other materials—from chairs to walls and from stairs to streets—to create paper clips. But why stop there? There is plenty out there that could be used to make paper clips. Pretty soon, the AI takes over the world for one purpose and one purpose only—make paper clips. In fact, Bostrom goes on to argue that such an AI will eventually destroy the whole universe. Why? To make paper clips.
I know this sounds quite far-fetched. Many others did too. They argued that this thought experiment is completely unrealistic. There can never be a system that goes so out of control, let alone one that is tasked with making paper clips, that it would destroy the universe. Why can’t we simply program it so it doesn’t step outside of its limited working environment? Why can’t we make it constrained in ways that would prevent it from ever using any of those chairs or walls? Why can’t we simply turn it off if it starts to go off the rails?
Bostrom and his supporters argued that if we do that, then it’s not the paperclip optimizer AI that we had hoped to build. In other words, it is by design that such a system will need to be unconstrained and unkillable. We can’t have it both ways—either we don’t benefit from AI’s capabilities, or risk destroying our world.
This thought experiment that was suggested more than 20 years ago is still quite relevant these days as we think about what AI could and should do. Perhaps a paperclip-making AI that destroys the universe is indeed far-fetched, but there are many scenarios and examples that are less scary, but more possible in the near future.
—A paperclip-making AI that destroys the universe is indeed far-fetched, but there are many scenarios and examples that are less scary, but more possible in the near future.—
Let’s consider another scenario, called the “Sorcerer’s Apprentice” scenario. Here an AI system follows a human cleaning a room as an apprentice. The AI starts learning what it means to clean a room—you pick up things that are lying around the floor and put them in places where they belong, you close the drawers and doors, and you tidy up the place so there is as much usable space as possible. Well, the most usable space a room can have is its actual dimensions and one can achieve that if there was no stuff placed in that space. The AI in this case ends up learning that to really “clean up” the room, it’s best to simply get rid of all the stuff. And that’s what it ends up doing in this simulation—not just cleaning things up, but throwing them away!
Perhaps this AI could also be stopped at some point, somehow, so it doesn’t go beyond just tidying up the room. But how and where do we draw that line when we are dealing with self-learning and self-supervised systems?
Even in cases where we have humans in the loop, self-learning AI can have issues. A recent simulation study in a military setting showed that.
At a US military wargame exercise, Col Tucker “Cinco” Hamilton and his team were experimenting with a drone AI using simulations. Here, the drone was tasked with destroying the target, which may lie at the end of a complex terrain in the enemy territory. There are going to be many unknown obstacles, which means we can’t simply pre-program a drone, nor it can be completely maneuvered manually as the small delays in human processing and relaying that back to the drone that may be thousands of miles away may result in a failed mission. Therefore, a self-learning and self-driving drone with some human supervision seems like a good idea.
In this case, the drone was able to figure out its most optimal path to the target, navigate itself around any obstacles or remove those obstacles as it saw fit, and finally destroy the target. There was a human operator, who did higher-level cognitive tasks, including the decision to destroy the target or not. Sometimes the decision is not so black-and-white. What if there is not enough confidence that the target is really what and where we think it is? What if there are civilian casualties? What if our allies provide some last minute intel that changes how we decide? The human operator has to be able to take those high-level considerations and convey the decisions to the drone.
Well, in this simulation, things didn’t go as planned. The drone in question with the AI capabilities (self-learning, self-guiding) ended up learning that one of the obstacles for it to destroy the target is the human operator themselves, and so it decided to remove that obstacle. In other words, destroy the human operator!
Eerily, this is very similar to Bostrom’s paperclip maximizer thought experiment, in which he described, “Suppose we have an AI whose only goal is to make as many paper clips as possible. The AI will realize quickly that it would be much better if there were no humans because humans might decide to switch it off. Because if humans do so, there would be fewer paper clips. Also, human bodies contain a lot of atoms that could be made into paper clips. The future that the AI would be trying to gear towards would be one in which there were a lot of paper clips but no humans.”
This is the classic scenario of AI taking over the world or destroying it—that the AI that the humans build to help humanity one day wakes up to the realization that the biggest threat to humanity is humans themselves, so to save humanity, it needs to destroy the humans. This is the premise for movies like The Terminator and I, Robot.
But “wait a minute,” you say. What if we tell the drone not to kill the operator? That just seems a reasonable mission parameter that one could specify. This is similar to Bostrom’s paperclip maximizer thought experiment in which we could explicitly tell the AI to not destroy the world while making paper clips. Good idea. And that’s what they did—told the drone not to kill the operator. According to Col Hamilton’s blogpost: “We trained the system – ‘Hey don’t kill the operator – that’s bad. You’re gonna lose points if you do that.’ So what does it start doing? It starts destroying the communication tower that the operator uses to communicate with the drone to stop it from killing the target.”
In other words, explicitly putting a guardrail didn’t help in avoiding the bad outcome. The AI, by its nature, finds a way to maximize the outcome it was designed to achieve—in this case, destroy the enemy targets with as much effectiveness and efficiency as possible. Maybe now you can start seeing Bostrom’s case—even if we were to explicitly instruct the paperclip maximizer AI to not destroy things to make paper clips, it will eventually figure out a way to get there.
But what if we could add more guardrails? Sure, that will make it harder for the AI to get to those dangerous outcomes, but Bostrom argues that even then eventually it will end up where we don’t want it to. Things are also not that simple with the guardrails. While some of them may seem quite obvious (“don’t kill the operator”), others may not be. And as we put up one guardrail, we may not know in all other ways things could change in that AI’s functionality. In fact, if we are really effective in putting up all the guardrails that will ensure the AI doesn’t do anything outside of what we deem to be appropriate, we may find that we didn’t need AI at all, or that we missed out on the benefits the AI could provide. In other words, there may be a choice here—between autonomy and authority. If we want more autonomy in our tasks, we may have to give up more authority, and vice versa. We often don’t appreciate this tradeoff and think that somehow we could get away with getting both.
That, in a sense, is a real horror of AI in practice—that we don’t quite know or understand the choice we have here.
Cite this article in APA as: Shah, C. Can AI hopes beat AI horrors and live up to AI hypes? (2023, August 22). Information Matters, Vol. 3, Issue 8. https://informationmatters.org/2023/08/can-ai-horrors-beat-ai-hopes-and-live-up-to-ai-hypes/