Fireside Chat with Dan Atkins—Evolution of Computing Infrastructure for Research: Is AI the Future of Research?
From World Brain to Wiring the World
Recently (May 10, 2022), the National Academies of Sciences, Engineering, and Medicine released a report advocating Automated Research Workflows (ARWs) to accelerate discovery by closing the knowledge discovery loop. ARWs aim to integrate computation, laboratory automation, and artificial intelligence tools throughout the research process: designing experiments, observations, and simulations; collecting and analyzing data; and learning from the results to inform further experiments, observations, and simulations. The goal is to accelerate scientific knowledge generation by orders of magnitude while achieving greater control and reproducibility in the scientific process. The report is the outcome of a study by the Committee on Realizing Opportunities for Advanced and Automated Workflows in Scientific Research, chaired by Daniel E. Atkins III, Emeritus W.K. Kellogg Professor of Information and Professor of Electrical Engineering and Computer Science at the University of Michigan (UM), Ann Arbor, and sponsored by Schmidt Futures.
Envisioning and creating systems to advance science is not new. In a series of talks and essays in 1937, for example, H. G. Wells proselytized the idea of a "World Brain" on a planetary scale: a repository of scientifically established knowledge that "pulls the mind of the world together" and "will be a supplementary and co-ordinating addition to their educational activities."
For centuries, the scientific community has striven to accelerate discovery by communicating reproducible results and data. "If I have seen further, it is by standing on the shoulders of giants," attributed to Newton, has become the credo of the Republic of Science.
Science, Scientific Method, and Science Communication
The word "science" derives from the Latin scientia via Old French, meaning "knowledge, learning, application" (twelfth century); in English (fourteenth century) it referred to "book learning" and "experiential knowledge," and later to "collective human knowledge" gained through experiment, systematic observation, and reasoning. Peters & Besley (2019) note that the word has a long history and has always referred at its core to a socially embedded activity. During the era of the founding of the Royal Society, it took on its modern meaning with the Baconian method. Francis Bacon (1561–1626) is regarded as the father of empiricism, responsible for a "new science" grounded in inductive reasoning and observation that he developed in his Novum Organum Scientiarum (1620), the "new instrument of science." Bacon became the inspiration for both "science" and the "scientific method." Baconian science, popularized by Robert Boyle, broadened the term to encompass the state of knowledge, a branch of systematic knowledge, and a system of knowledge based on general truths and laws obtained and tested through the scientific method. Bacon's New Atlantis, an unfinished utopian novel published in 1627, envisioned a fictional institution, Salomon's House (the inspiration for the founding of the Royal Society), dedicated to the future of human discovery and knowledge, and offered perhaps the first defense of science as a public good.
Since the birth of modern science in the sixteenth century, marked by the publication of Copernicus's De revolutionibus orbium coelestium libri VI in 1543, every new paradigm in science has engendered new scholarly communication infrastructures. The scientific revolution set in motion by Copernicus, and advanced by Kepler, Galileo, and later Newton and others, led to the emergence of scientific societies such as the Royal Society and scientific journals such as the Philosophical Transactions as critical elements of the Republic of Science. Since then, the alliance between the scholarly communication system and science has grown steadily and transformed in varied ways. With the emergence of computational and data-intensive research leading to eScience, often referred to as the fourth paradigm of science, and the open access movement leading to open science, this old alliance is being reimagined.
Computing and Communication Technologies and Science
The African proverb "If you want to go fast, go alone. If you want to go far, go together" captures the essence of research collaboration. The advent of computing and communication technologies speeds up communication and enables and enhances collaboration by dismantling geographic barriers and crossing disciplinary boundaries. Visionaries like Licklider envisaged a tight coupling of human brains and computers, a partnership to help us think and process data in unimaginable ways. His seminal paper "Man-Computer Symbiosis," published in 1960, outlined foundational concepts about the future uses of computers, artificial intelligence, and their relationship with humankind. He hoped to facilitate this symbiosis by building an "Intergalactic Computer Network." Licklider's vision became a reality with the development of the Web in the 1990s.
As Tony Hey (2005) observes, it is no coincidence that Tim Berners-Lee invented the World Wide Web at CERN, the particle physics accelerator laboratory in Geneva. Given the distributed nature of the multi-institute collaborations required for modern particle physics experiments, researchers desperately needed a tool for exchanging information. A decade later, these computing and communication technologies led to the envisioning of collaboratories through cyberinfrastructure.
eScience/Open Science and Cyberinfrastructure
eScience is the convergence of trends and technologies that have radically transformed scientific methods and the conduct of science. Its bedrock is the infrastructure and processes of data management: the mining, extraction, curation, and analysis of massive quantities of data from distributed systems, along with the ability to share the ideas and results of that analysis to help discover patterns and trends that advance science. Technologies that buttress eScience include prearranged networks of distributed computing and data archives, referred to as grids, together with facilities for curating and preserving data.
eScience promotes innovation in collaborative, computational, and data-intensive research across all disciplines and throughout the research lifecycle. As Jim Gray envisioned, this fourth paradigm, born of distributed computing and the data deluge, fundamentally transforms the practice of science.
The book Scientific Collaboration on the Internet, edited by Gary M. Olson, Ann Zimmerman, and Nathan Bos provides views of how new technology enables novel kinds of science and engineering collaboration, offering commentary from experts in the field and case studies of large-scale collaborative projects.
Open Science aims to improve the quality of research in terms of transparency and reproducibility and to serve as a growth engine for industry and society. According to UNESCO (2021), open science is an inclusive construct that combines various movements and practices aiming to make multilingual scientific knowledge openly available, accessible, and reusable for everyone, in order to increase scientific collaboration and the sharing of information for the benefit of science and society.
Open Science, Open Data, and COVID-19: Consolidating the Public Good Concept of Science
One of the few bright spots of the pandemic gloom has been the speed of scientific progress in understanding and treating COVID-19. Several effective vaccines were launched in less than a year, and rapid large-scale trials identified cheap and effective drugs, saving thousands of lives.
The global scientific community has also carried out "genomic surveillance" (sequencing the virus's genome to track how it evolves and spreads) at an unprecedented level: the public genome database holds more than 5.5m genomes. The great value of that surveillance rests on a commitment by all countries to rapid, open, near-real-time sharing of the data, a cornerstone of open science. In addition, surveillance requires a remarkable amount of cooperation among scientists to build compatible laboratory protocols, software systems, and databases.
This commitment to rapid data sharing has deep roots in genomics. At a 1996 summit in Bermuda, the leaders of the Human Genome Project established a set of principles requiring new DNA sequences to be released to public databases within 24 hours. Sir John Sulston, founding director of the Wellcome Sanger Institute, said: "All of this [genome data] should be in the public domain… We need a public social welfare attitude to use this information."
That thinking has prevailed worldwide, as evidenced by the Sanger Institute's rapid sharing of more than 1m SARS-CoV-2 sequences since March 2020.
From Centralized Computing Systems to Collaboratories to ARW: Dan Atkins
Listen to Dan Atkins in this episode of InfoFire to get a ringside view of the evolution of computing and cyberinfrastructure for scientific collaboration to accelerate discovery. Dan Atkins takes us through the fascinating history of computers from the days of vacuum tubes and computers the size of football fields to the initiatives and cyberinfrastructures for eScience and Open Science.
Atkins identifies watershed moments such as the emergence of packet-switched networks sponsored by DARPA, which resulted in the TCP/IP protocol. The NSF recognized this significant trend and created NSFNET to support research and make networking widely available. The uptake of the Internet was so rapid and profound that it wiped out all the other proprietary network protocols. High-end computing power and a robust network shaped the academic research community's ability to adopt the computational model of inquiry. Thus, the next major milestone was the emergence of computational methods for scientific research. With the digitization of everything, with scientific instruments becoming digital and being used remotely, and with new methods of extracting knowledge through data mining, came the idea of eScience, the fourth paradigm of science.
Soon it became clear that the supercomputer-centered model of the day was inadequate. So, says Atkins, we started thinking about how to take computing, networking, data repositories, online instruments, and real-time collaboration technology and bring them all together as the research computing infrastructure needed for people to work together in real time. The idea of a science collaboratory was mooted at an NSF invitational workshop at Rockefeller University, New York, to which Atkins was invited; the workshop was sponsored by Joshua Lederberg, Nobel Laureate and president of Rockefeller, and led by William Wulf. Collaboratory is a coined word, defined as a "center without walls, in which the nation's researchers can perform their research without regard to the physical location, interacting with colleagues, accessing instrumentation, sharing data and computational resources, [and] accessing information in digital libraries." The collaboratory movement was the next big thing, says Atkins.
The Science of Collaboratories (SOC) project, funded by the NSF, explored the potential of extensive collaboration across different academic research fields. Since then, there have been several attempts to build computer-supported scientific collaboration environments.
Eric Schmidt, former CEO and Executive Chairman of Google, founded Schmidt Futures with the mission to connect the brightest minds everywhere with opportunities to solve the world's most complex problems, with a focus on scientific discovery. In 2019, Schmidt Futures approached the National Academies of Sciences, Engineering, and Medicine to fund a study of "how AI and ML could be used to design and run scientific experiments iteratively to speed up research." Dan Atkins was asked to chair the committee of ten experts set up to examine ARWs to accelerate discovery. The committee ran a symposium on conducting science collaboratively, focused on accelerating discovery through ARWs and on making research data more open and sharable according to the FAIR (Findable, Accessible, Interoperable, Reusable) principles.
Despite the challenges of research data sharing and long-term archiving, Atkins is optimistic that scientists, the scientific community, and institutions will shoulder the responsibility of creating and stewarding a sustainable cyberinfrastructure for science.
Cite this article in APA as: Urs, S. (2022, May 25). Fireside chat with Dan Atkins—Evolution of computing infrastructure for research: Is AI the future of research? Information Matters, Vol. 2, Issue 5. https://informationmatters.org/2022/05/fireside-chat-with-dan-atkins-evolution-of-computing-infrastructure-for-research-is-ai-the-future-of-research/