Looking beyond the stars: A critique of the scrape-and-report online reviews scholarship

Philip Fei Wu

According to one estimate, 93% of online shoppers read customer reviews before making a purchase (Kaemingk, 2022). Hundreds of millions of online reviews posted on popular sites such as Amazon and TripAdvisor are now an integral part of e-commerce. The important role of online reviews in consumer decision-making and the “Big Data” of these reviews have spurred a surge of online reviews research in the past 10-15 years. Boasting large datasets, many recent works follow a “scrape-and-report” approach—that is, reporting data analytics based on a set of online reviews scraped from various e-commerce platforms.


While collecting a set of online review characteristics (e.g., star ratings) to predict review “helpfulness” or product sales might have been an innovative idea 15 years ago, this line of research is now approaching a saturation point. More importantly, the perils of the scrape-and-report approach have not been critically appraised in the online reviews scholarship. On the scraping front, I argue that scraped review data are neither “exhaustive” nor “organic,” contrary to what some Big Data enthusiasts proclaim. For example, most review platforms require users to create an account before posting a review, which introduces a potential sampling bias. Rampant on all major e-commerce platforms, fake reviews are not just noise in themselves but can also distort subsequent review generation through herding and other social dynamics.

On the reporting front, with a “let the data speak” mindset, it is only natural for online reviews researchers to play with a wide range of variables in search of “interesting” hypotheses to write about. This practice of “hypothesizing after the results are known” (HARKing) tends to foreground “good stories” with statistically significant results and bury the non-significant ones. Hypothesizing and reporting only the “noteworthy” relationships bypasses the important stage of theory building, which is fundamental to new discoveries. In addition, modelling large numbers of data points and parameters is likely to produce statistically significant but spurious relationships. In the words of Lin et al. (2013), this type of HARKing research is “too big to fail.”
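Lin et al.'s (2013) point can be illustrated with a small simulation (my sketch, not from their article; the sample size and effect size below are illustrative assumptions): with a million observations, even a practically negligible correlation comes out “statistically significant” at conventional thresholds.

```python
import math
import random

random.seed(42)

n = 1_000_000        # a "Big Data" sample size
true_effect = 0.005  # a practically negligible correlation

# Simulate two variables with an almost-zero true relationship
x = [random.gauss(0, 1) for _ in range(n)]
y = [true_effect * xi + random.gauss(0, 1) for xi in x]

# Sample Pearson correlation
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
r = sxy / math.sqrt(sxx * syy)

# Large-sample test of H0: no correlation (normal approximation)
z = r * math.sqrt(n - 2)
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

print(f"r = {r:.4f} (tiny effect), p = {p:.2e} (yet 'significant')")
```

The effect explains a vanishing fraction of the variance, yet the p-value falls far below 0.05—exactly the “p-value problem” that makes large scraped samples so hospitable to HARKing.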

I think it is time to look beyond the stars (of product rating) to examine the sociomaterial life of online reviews. By sociomaterial life I mean the entire journey of a review’s creation and circulation, as well as its influence on people and businesses, who in turn dynamically reshape the social and material conditions of review creation and circulation in the first place. We should ask questions such as: How are reviews produced and managed by multiple stakeholders, including consumers, fake review production companies, and e-commerce platforms? How do consumers and business owners act upon online reviews that have a material impact on their lives? A rare example of such inquiries is Orlikowski and Scott (2014), which shows that TripAdvisor reviews are reconfiguring service provision and consumption in the hotel industry by altering both travellers’ and hoteliers’ behaviors and their interactions.

Compared to scrape-and-report, such a practice-based, sociomaterial approach to online reviews research requires social theory, deep contextual knowledge, and methodological pluralism. A return to the strong practices of doing social science research might help offset many of the problems with scrape-and-report; for instance, researchers need to carefully define the population and the sample of reviews, control validity threats to the data, and reflect on the boundaries and constraints of the data collection. In doing so, we should adopt both Big and small data in our research, recognising that “small data” generated through traditional methods such as experiments, surveys, and interviews can reveal latent constructs and establish convincing causalities informed by social theories. In short, I encourage online reviews researchers to look at Big Data with a critical eye, embrace methodological pluralism, and see online reviews as a sociotechnical “thing” produced within the fabric of sociomaterial life.

This is a translation of the article “Veni, vidi, vici? On the rise of scrape-and-report scholarship in online reviews research.”


Kaemingk, D. (2022). Online reviews statistics to know in 2022. Qualtrics.

Lin, M., Lucas, H. C., & Shmueli, G. (2013). Too big to fail: Large samples and the p-value problem. Information Systems Research, 24(4), 906–917.

Orlikowski, W. J., & Scott, S. V. (2014). What happens when evaluation goes online? Exploring apparatuses of valuation in the travel sector. Organization Science, 25(3), 868–891.

Cite this article in APA as: Wu, P. F. (2023, March 1). Looking beyond the stars: A critique of the scrape-and-report online reviews scholarship. Information Matters, Vol. 3, Issue 3.

Philip Fei Wu

Dr Philip Fei Wu is a Senior Lecturer (Associate Professor) in information management in the School of Business and Management, Royal Holloway, University of London, UK. He holds a PhD from the iSchool at the University of Maryland, College Park, USA. Broadly speaking, Dr Wu's research looks at human behaviour in technology-mediated environments from socio-psychological perspectives.