Difficult to Know If You Can Rely on Or Use Your Data? Look for Paradata to Understand Better
Isto Huvila
These days, everything from our everyday choices to major scientific discoveries increasingly relies on data. But how often do we stop and ask: Can we trust this data? How was it created? Who has changed it and how? Why should we, or should we not, rely on it?
To really trust and use data well, it is not enough to just know what it is about. We also need to understand where it comes from, who made it, why, and how it has been handled or changed along the way. This extra information about the process behind the data is called paradata. Unfortunately, paradata is often missing or scattered across different types of documentation and the data itself, which makes it hard to assess the reliability of data.
We need to know the recipe of data
We often hear about “raw data,” but the truth is, data is never truly raw. As Geoffrey Bowker wrote a couple of decades ago, it is always “cooked” or prepared in some way by someone for a particular purpose. This is why, to understand data, we need to know its recipe: who cooked it, whether the recipe was followed, what changes were made, and why.
Imagine using a household thermometer. For deciding whether you need a jacket or whether it is picnic weather, it works just fine, even if a 5-year-old reads it. But if you are tracking global warming trends, you will need precise instruments and clear details about how and when the temperature was measured. How data is made, for what purpose, and by whom matters a lot.
Why is paradata important?
Paradata helps us understand the story behind data: how it was created, processed, and used. Without it, making informed decisions can be tricky. For example, archaeologists, who work with a wide range of data types, often rely on paradata to evaluate and reuse old datasets. Research shows that paradata exists in many forms, including notes, diagrams, workflows, and even videos. A recently published book, “Perspectives on paradata”, offers plenty of examples of where and how paradata can be found and how it can be helpful in a broad range of contexts. But the studies also show that paradata is often hard to find, scattered across different places, or incomplete.
Instead of always creating more paradata, the real challenge is to collect and organise the information that already exists. Talking to the people who made the data can help, but memories fade, and sometimes those people are no longer around. That is why documenting key details during the data creation process itself is crucial.
How to document data effectively
Paradata is information on data creation, processing and use. The European Research Council funded research project CApturing Paradata for documenTing data creation and Use for the REsearch of the future (CAPTURE) has investigated what information about the creation and use of research data, that is, paradata, is needed and how enough of that information could be captured to make datasets reusable in the future.
Compiling and keeping paradata that would comprehensively clarify every detail of how data was made, processed and used is impossible, so much so that it should not even be attempted. Still, there are some simple ways to create and preserve paradata better.
Remember to plan ahead. Before starting to collect or generate data, think about what information future users might need. Talk to potential users to decide what to document. Avoid overdoing it and focus on the essentials. Too much documentation can be overwhelming and waste time.
Remember also to leave traces. Keep early versions, working documents, or notes. These can provide clues later on about how the data was made. It is also useful to use the tools that are available. Techniques like “data nutrition labels” (simple summaries of data’s origins) or digital notebooks can make paradata easier to create and understand. Finally, if you are programming, save your code. As Richel Bilderbeek, a researcher in computational biology, reminds us, computer code is an important part of the data’s history.
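For those who work with data files directly, leaving traces can be as simple as saving a small note next to the data. The following is a minimal sketch in Python, not a standard or a CAPTURE project tool: the function names (`build_paradata`, `write_sidecar`), the field names, and the `.paradata.json` file suffix are all assumptions made for illustration. It records who made a file, for what purpose, which steps were taken, and which code was used, together with a checksum that lets a later user confirm the file has not changed since the note was written.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def build_paradata(data_path, creator, purpose, steps, code_files=()):
    """Return a dict describing how the file at data_path was made.

    All field names are illustrative assumptions, not a formal schema.
    """
    data_path = Path(data_path)
    # The checksum ties this note to one exact version of the file.
    checksum = hashlib.sha256(data_path.read_bytes()).hexdigest()
    return {
        "file": data_path.name,
        "sha256": checksum,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "creator": creator,
        "purpose": purpose,
        "steps": list(steps),  # the "recipe", in order
        "code_files": [str(p) for p in code_files],  # scripts worth keeping
    }


def write_sidecar(data_path, record):
    """Save the record next to the data as <name>.paradata.json."""
    data_path = Path(data_path)
    sidecar = data_path.with_name(data_path.name + ".paradata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    import tempfile

    # Hypothetical example data: one temperature reading in a CSV file.
    with tempfile.TemporaryDirectory() as folder:
        data_file = Path(folder) / "temperatures.csv"
        data_file.write_text("date,temp_c\n2024-11-21,4.5\n", encoding="utf-8")

        record = build_paradata(
            data_file,
            creator="Example Researcher",
            purpose="Deciding on picnic weather, not climate research",
            steps=["Read household thermometer at noon", "Typed value by hand"],
            code_files=["clean_temperatures.py"],
        )
        print(write_sidecar(data_file, record).name)
```

A note like this deliberately covers only the essentials. What belongs in it is best decided together with the likely future users of the data.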
Building Trust Through Paradata
While paradata is about understanding data-related practices, it is ultimately very often a question of trust. Sometimes we really need to know exactly what happened with the data, for example, to repeat the procedures and create more comparable data. Mostly, however, we need to know enough to say whether the data is good for the purposes we plan to use it for. Paradata helps us see the process behind data, so we can understand its limits and strengths.
In today’s world of misinformation, many people think they are “fact-checking” or “doing their own research” when they are actually trusting unreliable or deceptive data. By focusing on how data is made and asking questions about its origins, we can build trust and make better decisions. No one can know everything about data, and that’s okay. The key is to ask the right questions and to know when to rely on experts.
Focus on data-making: Paradata reveals the story behind your data
To use data responsibly, we need more than just numbers: we need the stories behind them. Paradata provides this context, showing us who made the data, how, and why. Rather than suggesting that data and data-about-data are enough, paradata shifts the perspective and highlights the practices that led to the making of the data, and how and why the data came to be what it is today. By creating and sharing paradata thoughtfully, we can bridge the gap between data creators and users, ensuring that we rely on trustworthy information.
Acknowledgement
This work has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 818210) as part of the project CApturing Paradata for documenTing data creation and Use for the REsearch of the future (CAPTURE).
Cite this article in APA as: Huvila, I. Difficult to know if you can rely on or use your data? Look for paradata to understand better. (2024, November 21). Information Matters, Vol. 4, Issue 11. https://informationmatters.org/2024/11/difficult-to-know-if-you-can-rely-on-or-use-your-data-look-for-paradata-to-understand-better/
Author
Isto Huvila is professor in information studies at the Department of ALM (Archival Studies, Library and Information Studies and Museums and Cultural Heritage Studies) at Uppsala University in Sweden.