Problem Statements

These are the problem statements provided by participants in the Data Cure Libraries+ Virtual Meeting. They are meant to get participants thinking about the different layers of the problem of backing up and sharing federal data.

"Lack of communication and coordination between open data community and data preservation/curation experts (librarians)."

"Not being sure if data has been taken down, moved, or on a closed part of the site. Are there other sources for the data that are free, and how do I find them?"

"Maintaining trust in the data after removing it from its original location and severing ties with the authoritative body."

"Our current government administration combined with the storage and time needed to devote to the large amounts of data."

"Locating and organizing it."

"Verifiable provenance and integrity of datasets; long-term storage and access to the rescued data."

"Knowing what has already been collected (in order to collect what has not been) AND knowing how to access and use what has been collected. Also, there seems to be confusion as to what 'data' is. Is it datasets, information from a web page, files, and all the extra multimedia from websites?"

"Uncertainty due to current administration's disregard for science."

"Coordinated catalog of what's been captured, where it is, when it was taken, and all associated documentation."

"Ensuring that data is not lost when funding or priorities change and that researchers can find data when it is migrated to a new location."

"Sustainability of funding for access to/preservation of data is my biggest concern. Setting up organizational/technical infrastructure for backing up access is a heavy lift, it's true, but without sustainability..."

"Versioning and authenticity: For datasets that are updated after copies have been made, which copies are updated and which are not, and how do we track and document that?"

"Discoverability, authority, metadata."

"Prioritization of the work within academic libraries."

"Loss of funding to existing data centers and loss of funding for the collection of new data going forward."

"The context lost when it is harvested from websites, rather than systematically archived in collaboration with agency staff. But that's where we are right now."

"Federal agencies deleting the data from their servers or not producing it in the first place."

"Political threats are immediate but there are many challenges related to making the data truly useful for the future (e.g., metadata, data quality)."
"A commitment on the part of all agencies to put resources and effort into creating a unified approach/system that is inclusive of many different and diverse data types and sources, and that would ensure access and preservation. Also to provide training to a national support system (e.g., librarians, information professionals, others in industry)."

"No laws surrounding data availability means that different administrations have too much power over what's available."

"I'm worried about funding. How do libraries take on the burden of caring for federal data while budgets are being simultaneously slashed? I would love to have conversations about organizational structures that could allow us to share this burden across a wider network of libraries and research institutions."

"Accountability for researchers to share research data; migration of data to newer formats and/or platforms."

"Changing administrations can choose what data is being represented online."

"A lack of sources for good, open data."

"Our administration. It also sounds like the data is not well-documented and the metadata not machine-readable. We need to grab as much data (with robust context) as possible before things get modified/disappear."

"I am concerned about adequately capturing provenance metadata and about registering datasets that have been successfully captured."

"Proper description for findability and proper preservation."

"Federal data is apparently at risk of being 'disappeared' by the current administration, and the situation needs to be monitored and the data preserved. I think the admirable and necessary Data Refuge movement is introducing new issues into the mix. How are copies of federal data sets downloaded by citizen activists being stored and backed up? Is there redundancy built into that system? If multiple copies are being stored on individual hard drives, how is provenance being captured, and how will we know which data set is the 'real' one if there are multiple differing versions out there?"

"Provenance and understanding in the future are the biggest problems to ensure continued access. Access is important, but if data that is accessible is not understandable, then it is of little use."

"Link rot and general landscape instability."

"Medical data reporting being stopped or altered to meet pseudo-science beliefs."

"I think that determining who bears the responsibility of ensuring long-term access to data will be a challenge. Determining who is responsible for storage, maintenance, etc. Are there formal succession plans in place for stewardship of the data should a repository no longer be able to maintain the data?"

"Infrastructure and funding within the agencies to enable them to meet their mission."

"The takedown of scientific data that President Trump and his associates disfavor."