Problems With Archiving and Replaying Web Advertisements
Travis Reid, Alex H. Poole, Hyung Wook Choi, Christopher Rauch, Mat Kelly, Michael L. Nelson, and Michele C. Weigle
Advertisements are an integral part of our cultural heritage, and this extends to online web advertisements. Unlike print ads, web ads are dynamic and interactive, which makes them difficult to archive and replay (i.e., load archived web resources in a web browser) successfully. To explore these challenges, we created a dataset of 279 archived web ads. During this study, we identified five major problems with archiving and replaying web ads, which we discuss in detail in our article “Problems With Archiving and Replaying Current Web Advertisements”.
Despite the importance of web advertisements, there has been limited effort to create web archive collections from them. This gap is primarily due to the technical challenges involved in archiving web ads, as well as the common attitude that ads are a nuisance, to be avoided or blocked. Our research addresses a key question: what are the main obstacles users face in archiving and replaying web ads?
Creating a Dataset of Archived Web Ads
Archiving 17 web pages from SimilarWeb.com’s top websites resulted in the collection of 279 advertisements. To archive these web ads, we used four web archiving services (Internet Archive’s Save Page Now, Arquivo.pt, archive.today, and Conifer) and three browser-based tools (ArchiveWeb.page, Browsertrix Crawler, and Brozzler). Each of the four web archiving services archived two web pages, ArchiveWeb.page and Browsertrix Crawler archived four web pages each, and Brozzler archived one web page. We successfully archived nearly all of these ads (273 of 279), meaning that all required resources were captured. Of these 273 successfully archived ads, 111 (40.66%) always replayed in the containing web page (the web page that loaded the ad during a web crawling session), 113 (41.39%) sometimes failed to replay depending upon the version of the replay system, and 49 (17.95%) never replayed in the containing web page. These archived ads were identified by either replaying the archived web page (224 of 279) or by using ReplayWeb.page’s URL search feature and our Display Archived Ads tool (55 of 279). To replay the archived advertisements, we used four web archiving services (Internet Archive’s Wayback Machine, Arquivo.pt, archive.today, and Conifer) and three other replay systems (ReplayWeb.page, pywb, and OpenWayback).
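The counts in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures reported above; the variable and function names are ours and purely illustrative.

```python
# Figures reported in the text above.
TOTAL_ADS = 279
ARCHIVED_ADS = 273
replay_counts = {"always": 111, "sometimes": 113, "never": 49}
identified_by = {"replaying the page": 224, "URL search and Display Archived Ads": 55}


def replay_shares(counts, denominator):
    """Return each count as a percentage of the denominator, rounded to 2 places."""
    return {name: round(100 * n / denominator, 2) for name, n in counts.items()}


# Percentages are taken over the 273 successfully archived ads, not all 279.
shares = replay_shares(replay_counts, ARCHIVED_ADS)

# Four services x 2 pages, two tools x 4 pages, and Brozzler x 1 page.
pages_archived = 4 * 2 + 2 * 4 + 1

print(shares)
print(pages_archived, sum(identified_by.values()))
```

The three replay outcomes add up to the 273 archived ads, and the two identification methods add up to all 279 ads in the dataset.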
We organized our 279 ads into five categories: image, video, embedded web page, text-only, and combination. The first three types are associated with one web resource. Text-only ads are displayed in the containing web page. The combination category comprises ads that rely upon multiple resources and are constructed inside of the containing web page or ad iframe.
We also created a web page (https://savingads.github.io/themed_ad_collections.html) to display the archived ads from our dataset. This web page has three views: themes, collection, and ad details. The Themes view shows all ad themes. The Collection view shows ad previews for ads from the same theme (Shopping collection example: https://savingads.github.io/themed_ad_collections.html?collection=Shopping). The Ad Details view shows the archived ad and all of its information from the dataset (Samsung Neo QLED TV ad example: https://savingads.github.io/themed_ad_collections.html?ad=-62639933).
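The example links above suggest that the view is selected by a query parameter: `collection` for the Collection view and `ad` for the Ad Details view. The sketch below builds such links under that assumption. The helper names are hypothetical, and the page's actual routing logic may differ.

```python
from urllib.parse import urlencode

BASE_URL = "https://savingads.github.io/themed_ad_collections.html"


def themes_url():
    """Themes view: the page with no query parameters."""
    return BASE_URL


def collection_url(theme):
    """Collection view: assumes the theme name goes in the `collection` parameter."""
    return f"{BASE_URL}?{urlencode({'collection': theme})}"


def ad_details_url(ad_id):
    """Ad Details view: assumes the ad identifier goes in the `ad` parameter."""
    return f"{BASE_URL}?{urlencode({'ad': ad_id})}"


print(collection_url("Shopping"))
print(ad_details_url("-62639933"))
```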

Archiving and Replaying Problems Identified
The process of archiving and replaying these 279 web ads revealed five key findings. First, Internet Archive’s Save Page Now feature blocked many web ads from being archived. After we communicated with Wayback Machine staff about this problem, a new “Disable ad blocker” option was created for logged-in users, which enables archiving ads. Second, Brozzler was incompatible with versions of Google Chrome released after March 2023, which prevented web pages from loading during the crawl. After we reported this incompatibility, the problem was resolved during 2024. Third, Google’s and Amazon’s ad scripts generated different random values during the crawling and replay sessions. As a result, the replay session requested a URL that had not been archived, which prevented the ad from loading (examples: Google ad and Amazon ad). Fourth, the JavaScript for Flashtalking’s ad service prevented the replay of embedded web page ads outside of an ad iframe, because the ad script dynamically generated an incorrect URL that did not exist on the live web. Fifth, some web ads did not load during replay in certain web browsers, because service worker implementations can differ between browsers. Chromium had a bug that prevented service workers from accessing resources loaded in an “about:blank” iframe, which prevented the replay of a successfully archived ad. After encountering this problem with ReplayWeb.page, we reported it and a workaround was created.
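The third problem is easier to see in a toy model. The sketch below is not Google’s or Amazon’s ad code: the script, the `ads.example` URL, and the dictionary standing in for a web archive are all hypothetical. It only illustrates why a random value in a request URL breaks replay when the archive is looked up by exact URL.

```python
import random


def build_ad_url(rng):
    """Hypothetical ad script: embeds a random value in the request URL."""
    return f"https://ads.example/creative?rnd={rng.randrange(10**9)}"


def crawl(rng):
    """Archive whatever URL the script requested during the crawl."""
    url = build_ad_url(rng)
    return {url: "<ad markup>"}


def replay(archive, rng):
    """Re-run the script at replay time and look up the URL it requests."""
    return archive.get(build_ad_url(rng))  # None means the ad fails to load


archive = crawl(random.Random(1))

# A different random value at replay produces a URL that was never archived.
missed = replay(archive, random.Random(2))

# If the script produced the same value again, the lookup would succeed.
matched = replay(archive, random.Random(1))

print(missed, matched)
```

In this model the ad itself was captured correctly; the failure comes entirely from the replay session asking for a different URL than the one recorded during the crawl.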
Summary
We explored the process of creating a dataset of 279 archived web ads and identified problems with archiving and replaying these ads. This dataset was created by archiving 17 web pages from SimilarWeb’s top websites with four web archiving services (Internet Archive’s Save Page Now, Arquivo.pt, archive.today, and Conifer) and three browser-based web archive crawlers (ArchiveWeb.page, Browsertrix Crawler, and Brozzler).
We identified five key problems while creating this dataset. First, Internet Archive’s Save Page Now feature excluded web ads from being archived. Second, Brozzler became incompatible with Google Chrome during 2023. Third, Google’s and Amazon’s ad scripts generated random values in URLs that differed between crawling and replay, which prevented ads from loading. Fourth, Flashtalking’s ad script prevented the replay of web page ads outside of an ad iframe. Fifth, a Chromium bug prevented replay systems that use service workers from accessing “about:blank” iframes, which prevented the replay of ads. After we identified and reported these issues, three of the five problems were resolved. These improvements will help enhance the experience for users archiving and replaying dynamic web resources such as web ads.
Cite this article in APA as: Reid, T., Poole, A. H., Choi, H. W., et al. (2025, December 10). Problems with archiving and replaying web advertisements. Information Matters. https://informationmatters.org/2025/12/problems-with-archiving-and-replaying-web-advertisements/
Author
Travis Reid is a Ph.D. student in Computer Science at Old Dominion University and a member of the Web Science and Digital Libraries (WSDL) research group. His recent research has focused on identifying challenges in archiving and replaying web advertisements, developing a web archiving livestream tool that gamifies the web archiving process and enhances transparency, and creating a gaming livestream tool that integrates web archiving with video games.