
Are We Ok With “Overviews” For Our Information Needs?

Chirag Shah, University of Washington

During the summer of 2021, in response to Google’s ongoing push for LaMDA, my colleague Emily M. Bender and I wrote about why that push was not a good idea for the future of information access. We didn’t simply write it as an opinion piece on a blog or a straight-to-arXiv paper. After having it reviewed by some of the distinguished scholars in the field, addressing their criticisms, and incorporating their feedback, we submitted the article to a peer-reviewed ACM conference, where it went through the standard double-blind reviewing process. It was accepted and published in the ACM Digital Library in March 2022. Since then, it has been available for free.

Why am I saying all this? Two reasons. First, to emphasize that we went through a rigorous scientific process to produce and publish this work, which is openly accessible to all. And second, while we got pushback from some folks back then and in the months and years that followed, our predictions from that pre-ChatGPT article have become even truer and our recommendations even more important. For support of the latter point, look no further than Google’s recent announcement of a new core search feature called AI Overviews.

Before getting into the benefits and costs of AI Overviews, let’s first understand what it is. At this point, I’m sure you have heard enough about generative AI (GenAI) and how large language models, or LLMs, sometimes make stuff up or “hallucinate.” While you may want a GenAI tool to “hallucinate” or be creative when you ask it to write a poem with verses that don’t rhyme, you don’t want that behavior when you are looking for facts while writing a history report. Fear not, because some smart people have come up with a better technique called RAG, or Retrieval Augmented Generation.

The basic idea behind RAG is simple: take the question or query as input, run a regular search and retrieval first, and then use that small set of results as the basis or context for generating an answer. This way, what gets generated is grounded in what’s actually out there. Proponents of this method would argue that this is basically what humans do: we run a web search, find a few top results, dig through them, and generate our answers.
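To make the pipeline concrete, here is a minimal sketch of the RAG pattern in Python. Everything in it is an illustrative assumption on my part: the toy corpus, the keyword-overlap retriever, and the generate_answer placeholder are not how Google or any particular product implements this, and a real system would use a proper search index and an actual LLM for the generation step.

# Minimal sketch of Retrieval Augmented Generation (RAG).
# The corpus, the retriever, and generate_answer() are illustrative
# placeholders, not any production system's actual components.

CORPUS = [
    "Cheese sticks to pizza better if the sauce is reduced slightly.",
    "Satirical post: add glue to your pizza sauce for extra tackiness.",
    "A turn signal that stops working is usually a burned-out bulb or a bad relay.",
]

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap and return the top k."""
    q_terms = set(query.lower().split())
    return sorted(
        corpus,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )[:k]

def generate_answer(query: str, context: list[str]) -> str:
    """Placeholder for the generation step: a real system would prompt an
    LLM with the retrieved passages as grounding context."""
    joined = "\n".join(context)
    return (
        "Answer the question using only this context:\n"
        f"{joined}\n\nQuestion: {query}"
    )  # stand-in: returns the prompt instead of calling a model

query = "Why is the cheese not sticking to my pizza?"
print(generate_answer(query, retrieve(query, CORPUS)))

Even this toy version shows where things can go wrong: whatever the retrieval step hands over, satirical post included, becomes the raw material that the generation step faithfully builds on.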

But there is one big, fundamental difference: between that retrieval and generation, we have something called common sense. So when you want to find out why the cheese on your pizza is not sticking and you see one of the top results telling you to use glue, you know that result is either satirical or at least questionable. Why? Because of that common sense. But Google ended up using just such a result, from an old Reddit post, to generate an answer telling the user to add glue. Or when you are looking for suggestions to fix a car blinker that is not working and come across top results suggesting you replace the “blinker fluid,” you may know that there is no such thing as “blinker fluid.” But once again, Google didn’t have that common sense. And while you may have never wondered about the health benefits of running with scissors, Google will tell you that it’s a great exercise to do.

We know that in a RAG framework, both the retrieval and the generation components need to work well for the whole thing to work well. If the former retrieves bad results, as in the cases above, the latter will faithfully generate bad answers from them. This doesn’t happen for humans, however, because we are not simply connecting whatever we see to arrive at an answer. Well, at least not most of us and not most of the time. Sure, humans also make mistakes, and it is well documented how people often blindly trust the results they get from Google. That was already bad, and now, with the retrieved results pushed aside in favor of a generated answer, things could be even worse, because we don’t even get the full context and diversity of results that would give us a fighting chance to catch bad answers and decisions.

On top of that, getting straight to the answer takes away our ability to learn. As Pickens discusses in his article on “information proprioception,” there is great value in having some amount of friction on the way to the right information. That friction, or effort, allows us to question and validate what we are receiving, and, more importantly, it allows us to learn. So even when such AI-generated answers are right, we lose those opportunities to learn and to develop new perspectives.

So the question is: are we ok with such “overviews,” or pre-digested information? Is the cost of getting to them, which includes not only potentially bad or wrong answers but also the loss of our agency and our ability to learn and grow, worth the potential benefit of saving time and effort? While I may have hinted at my own answer here, I encourage you to find yours. After all, why should we take something just because someone offers it for “free”?

Cite this article in APA as: Shah, C. (2024, June 6). Are we ok with “overviews” for our information needs? Information Matters, 4(6). https://informationmatters.org/2024/06/are-we-ok-with-overviews-for-our-information-needs/

Author

  • Chirag Shah

    Dr. Chirag Shah is a Professor in the Information School, an Adjunct Professor in the Paul G. Allen School of Computer Science & Engineering, and an Adjunct Professor in Human Centered Design & Engineering (HCDE) at the University of Washington (UW). He is the Founding Director of the InfoSeeking Lab and the Founding Co-Director of RAISE, a Center for Responsible AI. He is also the Founding Editor-in-Chief of Information Matters. His research revolves around intelligent systems. On one hand, he is trying to make search and recommendation systems smart, proactive, and integrated. On the other hand, he is investigating how such systems can be made fair, transparent, and ethical. The former area is Search/Recommendation and the latter falls under Responsible AI. Together they create an interesting synergy, resulting in Human-Centered ML/AI.
