Data Librarians & Managers Virtual Meeting about Libraries+
April 7, 2017, 2:00pm EDT

Summary of meeting & Notes

Jonathan Petters graciously took notes. Margaret Janz failed to clean them up, but they are available for comment. The recording will be sent to the DataCure listserv. If you didn't receive the recording but would like to view it, please email datarefuge@ppehlab.org.

The meeting started with some background on the metamorphosis from Data Refuge and Data Rescue events to the Libraries+ Network. In a nutshell, the Data Refuge team has long been thinking about how to create research-quality copies of federal data in a more sustainable and systematic way than can be achieved through Data Rescue events. Since libraries have always been the institutions responsible for providing access to information, it makes sense to us that they would be an obvious choice for providing "insurance policy" access to federal data. One way of thinking about this is a reboot of the FDLP (Federal Depository Library Program) that would make hosting these backup copies a partnership between libraries and federal agencies. There are currently a few libraries experimenting with different ideas for how to do this at their own institutions.

Please note that we have full faith in the backup and archiving practices within government data centers and agencies. The professionals who do that work are very talented experts. This proposal is only about ensuring that research-quality copies are held in ready-to-use states off of federal servers and outside the control of federal employees, in the unlikely case that something should cause the federal government to discontinue offering access.

Matt Mayernik from UCAR spoke about the document he co-wrote with others at ESIP, entitled "Stronger Together: the case for cross-sector collaboration in identifying and preserving at-risk data." This is a great piece on "rescuing" data from the perspective of data professionals at federal data centers, and I really recommend everyone read it. The authors bring up some excellent points for consideration in thinking through this problem. Many data centers are willing to provide copies of their data to potential data refuges and repositories - we just have to figure out better ways to communicate with each other and know who to ask.

After Matt brought in the perspective of potentially non-librarian data professionals, we opened the meeting for discussion. It was hard going at times due to general confusion and head-wrapping-arounds, but overall we brought up a lot of great problems to think about going forward. Hopefully the discussion also helped people move towards a better understanding of, or at least more comfort with, how many unknowns exist in this space.

Concerns
We identified a number of areas for concern when considering creating a non-federal archive of federal information. Listed here are a few that stood out from this call.

  • What is the value add to the government of our doing this?
  • How do we ensure copies of these data are authoritative and up to date?
  • How do we avoid duplication? Perhaps duplication is ok - like our other collections. But how does version control scale with multiple copies, and how do we reference/cite the other copies?
  • Conversely - how do we avoid missing things? How can we know what exists?
  • How do we convince library administrators to commit to doing this? Do we have the resources or will a great deal of new staff/money/infrastructure be needed?
  • How can we avoid taking traffic metrics away from data centers?
  • Should access be the goal? If we build a dark archive, how can we convince admins? If we have a "light switch" archive, how do we know when to turn access on?
  • If libraries are creating mirrors and access points to the data, what would be the incentive for the government to continue hosting any copies of the data?
  • Is there a way to move this forward and maintain the community engagement?
  • How can we help with legacy data issues?

Ideas
One really interesting idea that doesn't quite qualify as a problem or concern is the notion of creating risk assessment metrics specifically for federal data. This is certainly an area where data professionals could be of great service.


Meeting Recording
Available upon request. Email datarefuge@ppehlab.org


Invitation

To the data librarian and other data professional communities:

Many of you know about DataRefuge and some of you know that we’re looking to move towards a more sustainable and systematic way to back up federal data and information. We've been talking to many people in libraries, open data, government, and other areas, and starting to imagine what we're calling (intentionally blandly) the Libraries+ Network. To understand the problems of ensuring continued access to federal data, we’re having a multi-stakeholder meeting in May. In order to keep a good balance of stakeholders at the meeting, we’ve had to curate the invite list, but we want and need to have the larger community - especially data professionals - involved and engaged in the discussion.

In an effort to do that, we’re hosting a virtual meeting on Friday, April 7 from 2-3:30 pm EDT just for data librarians and other data professionals (there will be a meeting for a broader audience on April 19 - please direct your interested non-datalib-type colleagues to the April 19 meeting). The meeting will provide a brief overview of the goals as we currently see them for the Libraries+ Network and give attendees the opportunity to discuss together the problems and issues associated with this idea. The agenda is not to come up with solutions, but to get a better understanding of the problem, particularly from this community's perspective. We’ll also be discussing how best to bring this perspective to the May meeting and stay engaged going forward.

Please consider registering for this virtual meeting at https://goo.gl/forms/IWNNoEq3CBbvuFTF2. This form also includes a space to write a brief description of how you currently see the problem of ensuring continued access to federal data. Please respond to this question so that we can use the answers to find common themes or great divergences ahead of the meeting and form a more productive agenda based on what we hear. Answers will be made public so we can all learn from each other before the meeting. 

You’ll be able to join the meeting on April 7 from PC, Mac, Linux, iOS, or Android at https://www.zoom.us/j/6397098606 
 

Goals for this meeting: Create a cohesive understanding of the problem from this community's perspective that will be carried into and considered at the May Meeting.

Proposed Agenda:

Suggested Readings: