
On the Current State of Query Formulation for Book Search

Irfan Ullah et al.

Abstract

Formulating a well-defined search query is one of the main factors in getting relevant results from any search system. Various approaches have been proposed in the literature, including using topic fields in different combinations as a search query, query reduction, and query expansion. These approaches have been adopted and tested in various domain-specific retrieval platforms, search engines, and digital libraries. However, no recent review or survey in the literature studies query formulation from the perspective of book retrieval. The authors fill this gap in the literature by studying query formulation in light of research publications from 2007-22. To this end, a rigorous search and selection methodology was adopted to select the most relevant publications, and their findings were compared and reported. In addition, current trends and possible future directions were identified.

Introduction

Searching for relevant books is a popular activity among book lovers on book search platforms, online catalogs, social book web applications, and digital libraries. Query formulation, i.e., producing a well-crafted query, is key to a successful book search. Various approaches to query formulation have been proposed over time. These include stemming, removing stop words, combining topic fields, reducing verbose queries to their essential terms, weighting query terms, and expanding search queries.
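To make a few of these approaches concrete, the following is a minimal sketch in Python, not the method of any surveyed study. It combines topic fields, removes stop words, and applies a crude form of query reduction. The topic structure, the field names, the tiny stop-word list, and the function `formulate_query` are all assumptions made for illustration; stemming and term weighting are left out for brevity.

```python
import re

# Tiny illustrative stop-word list (an assumption; real systems use far larger lists).
STOP_WORDS = {"a", "am", "an", "and", "about", "for", "i", "in", "is",
              "me", "of", "on", "the", "to", "with"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def formulate_query(topic, fields, max_terms=None):
    """Combine the chosen topic fields, drop stop words, keep unique terms in order."""
    terms = []
    for field in fields:
        for token in tokenize(topic.get(field, "")):
            if token not in STOP_WORDS and token not in terms:
                terms.append(token)
    # Crude query reduction: keep only the first max_terms terms when a limit is given.
    return terms if max_terms is None else terms[:max_terms]

# Hypothetical topic with a short title and a longer narrative field.
topic = {
    "title": "Books about the history of cryptography",
    "narrative": "I am looking for an accessible history of codes and ciphers.",
}

title_query = formulate_query(topic, ["title"])
combined_query = formulate_query(topic, ["title", "narrative"])
reduced_query = formulate_query(topic, ["title", "narrative"], max_terms=2)
```

Here `title_query` uses the title field alone, `combined_query` adds the narrative terms that are not already present, and `reduced_query` truncates the combined query to its first two terms.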

Query formulation has been studied in several review and survey articles. In the domain of book search, however, a recent study [1] reported on query formulation only as part of a larger study on the various aspects of book retrieval and recommendation published during 2007-18. Another study [2] reported on different aspects of social book websites in light of literature published during 2011-16. Neither study comprehensively covers query formulation in book search. The authors fill this gap in the literature by studying query formulation in light of research publications from 2007-22. To this end, a rigorous search and selection methodology was adopted to select the most relevant publications, and the findings were reported through a well-crafted theoretical framework. The key findings of this study are summarized in the next section.

Summary of Findings

The following are a few key findings of this study.

  • The main concern of the selected publications was book search, followed by book recommendation, and then book search-recommendation hybrid solutions.
  • Most studies used topic fields in various combinations as their search query or applied query expansion, i.e., expanding the search query with the most essential words from the first set of retrieved search results.
  • It was challenging to decide which query formulation approach performs best. This is because relevance is also affected by how documents are represented in the search index and how they are weighted against the search query.
  • Using the title and narrative fields of the topic as the search query, in any form, gave the best results.
  • Most studies used well-known datasets, such as the Amazon/LibraryThing dataset, which has 2.8 million book records along with the required topic sets and relevance judgments, to perform query formulation experiments.
  • The widely used evaluation metrics were NDCG@10, Precision@10, Recall@1000, Mean Average Precision, and Mean Reciprocal Rank.
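Three of the metrics named above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the evaluation code used in the surveyed studies. The function names, the book identifiers, and the relevance grades are invented for the example, and the NDCG variant shown uses linear gains (some evaluations use an exponential gain instead). Mean Reciprocal Rank is the average of the reciprocal rank over all topics.

```python
import math

def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k positions occupied by relevant books."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant book, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, gains, k=10):
    """NDCG@k with linear gains; `gains` maps a book id to its graded relevance."""
    dcg = sum(gains.get(doc, 0) / math.log2(i + 2)
              for i, doc in enumerate(ranked[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical ranked list for one topic and its graded relevance judgments.
ranked = ["b3", "b1", "b7", "b2"]
gains = {"b1": 2, "b2": 1}
relevant = set(gains)

p_at_4 = precision_at_k(ranked, relevant, k=4)
rr = reciprocal_rank(ranked, relevant)
ndcg = ndcg_at_k(ranked, gains, k=4)
```

In this example two of the four retrieved books are relevant and the first relevant book sits at rank 2, so both Precision@4 and the reciprocal rank are 0.5. The NDCG value is below 1 because the most relevant book is not ranked first.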

Conclusions and Future Directions

Information Science and Retrieval has a key role in finding relevant information about books, given the colossal size of book collections and the complementary social and bibliographic metadata on the Web. A successful book search solution has three essential components: (i) the search index, i.e., how books are represented to make them searchable upon request, (ii) query formulation, i.e., how to get a well-crafted query so that the most relevant books are retrieved, and (iii) the weighting scheme or retrieval model, i.e., how the documents should be weighted against the search query so that the most relevant books are displayed first in ranked order. This study focused on query formulation in the book search domain by reporting works published during 2007-22, comparing them performance-wise, and identifying current trends.
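The interplay of the three components can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the three book descriptions, the bag-of-words index, and the simple TF-IDF sum used as the weighting scheme. Real book search systems use much richer document representations and retrieval models.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()

def build_index(books):
    """Component (i): represent each book as a bag of terms."""
    return {book_id: Counter(tokenize(text)) for book_id, text in books.items()}

def rank(index, query_terms):
    """Component (iii): score books with a simple TF-IDF sum, best first."""
    n_docs = len(index)
    scores = {}
    for term in query_terms:
        df = sum(1 for tf in index.values() if term in tf)
        if df == 0:
            continue  # the term occurs in no book, so it cannot affect the ranking
        idf = math.log(1 + n_docs / df)
        for book_id, tf in index.items():
            if term in tf:
                scores[book_id] = scores.get(book_id, 0.0) + tf[term] * idf
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical three-book collection.
books = {
    "b1": "a history of codes and ciphers",
    "b2": "cooking with seasonal vegetables",
    "b3": "modern cryptography and the history of secret codes",
}

index = build_index(books)
query = tokenize("history codes ciphers")  # component (ii): an already formulated query
results = rank(index, query)
```

With this query, `b1` ranks above `b3` because it also matches the rarer term "ciphers", and `b2` is not retrieved at all. Changing either the index representation or the weighting function changes the ranking even when the query stays the same, which is why comparing query formulation approaches in isolation is difficult.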

Datasets are of immense importance to the development of any research area. For example, the massive success of Social Book Search was due to the Amazon/LibraryThing dataset. One possible future direction is to enrich this dataset, or create similar ones, by drawing on the semantic structure and logically connected content of books along with the complementary social metadata available on various socio-cataloging book web applications, including Amazon, LibraryThing, OpenLibrary, and GoodReads [3]. Developing such a dataset will attract researchers to advance the book search and recommendation field further. This, in turn, will benefit information science and retrieval platforms such as library catalogs and digital library search and exploration solutions.

The Semantic Web and ontologies have a key role in improving search relevance. A comprehensive book ontology that draws on the structural, content, social, and bibliographic aspects of books would improve query formulation as well as book representation, description, searching, recommendation, and exploration [3]. In addition, Linked and Open Data can make the bibliographic and social data about books available, discoverable, and consumable on the Web [4]. Such data is another good source for query formulation, especially for expanding search queries.

References

[1] Ullah, I., Khusro, S. Social book search: the impact of the social web on book retrieval and recommendation. Multimedia Tools and Applications 79, 8011–8060 (2020). https://doi.org/10.1007/s11042-019-08591-0

[2] Kumar, R., Pamula, R. Social Book Search: a survey. Artificial Intelligence Review 53, 95–139 (2020). https://doi.org/10.1007/s10462-018-9647-x

[3] Ullah, I., Khusro, S. & Ahmad, I. Improving social book search using structure semantics, bibliographic descriptions and social metadata. Multimedia Tools and Applications 80, 5131–5172 (2021). https://doi.org/10.1007/s11042-020-09811-8

[4] Ullah, I., Khusro, S., Ullah, A., Naeem, M. An Overview of the Current State of Linked and Open Data in Cataloging. Information Technology and Libraries 37(4), 47–80 (2018). https://doi.org/10.6017/ital.v37i4.10432

This article is a translation of the original:

Ullah, I., Alam, S., Ali, Z., Khan, M., Jabeen, F., and Khusro, S. On the current state of query formulation for book search. Artificial Intelligence Review (2023). https://doi.org/10.1007/s10462-023-10483-7

Cite this article in APA as: Ullah, I., Alam, S., Ali, Z., Khan, M., Jabeen, F., & Khusro, S. On the current state of query formulation for book search. (2023, July 11). Information Matters, Vol. 3, Issue 7. https://informationmatters.org/2023/07/translation-on-the-current-state-of-query-formulation-for-book-search/

Author

  • Irfan Ullah

    Dr. Irfan Ullah is an Assistant Professor in the Department of Computer Science, Shaheed Benazir Bhutto University, Sheringal, Pakistan. He received his PhD and MS degrees in Computer Science, specializing in Web Engineering, from the Department of Computer Science, University of Peshawar, Pakistan.
