On the Preservation of and Access to NOAA’s Open Data

By Dr. Edward J. Kearns, NOAA Chief Data Officer
Ed.Kearns@noaa.gov

Recent articles in the popular press and across various social media platforms have raised concerns over the continued preservation and utilization of federal data holdings, particularly NOAA’s climate-related data.  These concerns have produced a number of coordinated efforts to download and store significant volumes of NOAA’s data outside of the federal data systems. While I do not share those same concerns about preservation, as NOAA’s new Chief Data Officer I recognize that the essential idea that enables these efforts --  easy public access to all of NOAA’s open data -- is a laudable one that NOAA’s data stewards are striving to achieve. Let’s talk about open data access first, and I’ll come back to those concerns related to preservation later.

NOAA employs many strategies to make its open data available to all users, as quickly and easily as possible. Data are served directly from NOAA’s federal data systems to consumers through a variety of technical methods, and some data are distributed by NOAA’s partners and cooperators, including those in the commercial weather enterprise and environmental data communities.  The demand for NOAA’s data often exceeds the government’s ability to provide them routinely at a sufficient scale and timeliness to meet that demand. And NOAA’s data holdings and the demand for them (see Figure 1) continue to grow at a rapid pace.

Figure 1. The annual volume and types of data delivered from NOAA’s archives at the National Centers for Environmental Information. This is just a subset of the total amount of data accessed from NOAA. (Figure courtesy of Tim Owen and Ken Casey, NOAA/NCEI)

How can NOAA find a scalable and affordable solution to this public open data access challenge? We are currently experimenting with new public-private partnerships and cloud-based access technologies. NOAA’s Big Data Project (BDP, see www.noaa.gov/big-data-project) was established in April 2015 through 3-year, extendable Cooperative Research and Development Agreements (CRADAs) between NOAA and Amazon Web Services (AWS), Google, IBM, Microsoft and the Open Commons Consortium. The project aims to:

●      discover ways for NOAA to “work smarter” through partnerships with industry and academia,

●      leverage the value inherent in NOAA’s data to broaden use and reduce costs,

●      unleash the power of industry’s modern cloud platforms and related technologies,

●      create opportunities to advance the US economy using federal data.

For the duration of these BDP CRADAs, each Collaborator has agreed to store the original data from NOAA and make them freely available to all. The Collaborators may also seek ways of monetizing those data, including the provision of new services and value-added information products. While all of NOAA’s open data are available to the Collaborators, they choose the particular datasets in which they wish to invest their time and resources, and will often partner with third parties that are interested as well. As you can imagine, the Collaborators’ cloud platforms offer significant advancements in scale, processing, analytics, and tools for the users of NOAA’s data.

While over a dozen datasets are at some level of delivery via the BDP, NOAA’s NEXRAD weather radar data were among the first to be made publicly available (see Ansari et al., in press, for details). NOAA transferred the complete NEXRAD Level II historical archive (approximately 300 TB) from its internal systems to those CRADA Collaborators that wished to receive it. AWS was the first to make those data freely available, and AWS and NOAA found after a year that:

●      weather radar data utilization has doubled by volume, compared to prior years,

●      thousands of distinct users per month are accessing NOAA data on AWS,

●      loads on NOAA’s internal data ordering systems have decreased by 50%,

●      and all of this has come at no net cost to the US taxpayer.

The costs of hosting the NOAA data on AWS are underwritten by those users who work with the data on the AWS platform, instead of simply downloading them to a different system. Because users can work with the data on AWS instead of having to extract them from the NOAA systems, the level of data services has significantly increased and the time required to develop new information products has drastically decreased. Other NOAA datasets under consideration for BDP delivery include fisheries catch data, integrated water resources information, numerical weather prediction model output, advanced severe weather products, marine genomics data, and new geostationary satellite data.

An upcoming challenge for NOAA is to take the lessons learned by industry and the federal government during these CRADA activities and develop a sustainable partnership model, with defined levels of service, on which both the federal government and industry can agree and depend. The ultimate goal is to provide full and open utilization of all of NOAA’s data, at a scale and rate that is largely determined by and underwritten by the needs of the user community, instead of solely by taxpayers’ funds.

Now that I’ve described briefly how NOAA is exploring better data access and utilization through these public-private partnerships, let’s go back to the question of preservation. Archive and long-term preservation are widely accepted as inherently governmental responsibilities, and NOAA follows laws, regulations, and policies related to archive and data management to uphold those responsibilities.  Throughout its history, NOAA has remained committed to the collection, preservation, and dissemination of environmental data in service to the Nation, in support of the US economy, and in cooperation with our international partners.

So, are NOAA’s data at greater risk for loss now? No. NOAA’s archive systems are well established, and NOAA’s data and data management practices are governed by federal laws and regulations. Oversight of federal data management is provided by the National Archives and Records Administration (NARA) and the Office of Management and Budget (OMB). A sampling of relevant laws and regulations, including the Federal Records Act, can be found at the end of this blog post. Executive orders and policies clarify how these laws should be carried out by NOAA and other agencies, and some of these are also listed.

I am sometimes asked if NOAA’s data in its archives can be easily deleted. No, they cannot: data may not be removed without significant effort and public deliberation. It is also unlawful to tamper with, damage, delete, vandalize, or in any way alter formal federal records, including NOAA’s environmental data and its archives. NOAA may propose to remove data from its archives only through data disposition schedules and defined NOAA processes, which prescribe public notice and comment periods and are intended to keep data preservation well executed and efficient. Such removal has been rare.

What about authentication? While anyone is welcome to download and copy NOAA’s open data, the uncoordinated proliferation of data stores may actually introduce future issues with the trust of those data. The trust of any data depends on the quality, stewardship, provenance and authority associated with them. The value of NOAA’s data archives includes not just the simple existence of the data themselves, but the continuous investment of NOAA’s experts’ efforts towards the sustained quality and usability of the data. The integrity and accuracy of data that are stored on non-federal systems and are not stewarded by NOAA’s scientists cannot always be easily verified beyond file-level distribution. NOAA is currently exploring best practices and technologies that may allow the authentication of its data throughout the wider data ecosystem, and welcomes interested parties in academia and industry to join in this exploration.
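To make "file-level" verification concrete, here is a minimal sketch in Python of one common approach: comparing copied files against a list of published checksums. This is an illustration only, not a description of how NOAA authenticates its data. The manifest, the file names, and the function names `sha256_of` and `verify_copies` are all hypothetical, and SHA-256 is assumed simply because it is widely used.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Compare local copies against digests from a (hypothetical) manifest.

    `manifest` maps a file name to its expected hex SHA-256 digest.
    Returns a status per file name: "ok", "mismatch", or "missing".
    """
    results = {}
    for name, expected in manifest.items():
        candidate = root / name
        if not candidate.is_file():
            results[name] = "missing"
        elif sha256_of(candidate) == expected.lower():
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

A matching digest shows only that the bytes of a copy are identical to the bytes that were published. It says nothing about provenance, stewardship, or later corrections to a dataset, which is why file-level checks alone do not settle the question of trust.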

With these challenges and opportunities facing NOAA, I am certainly excited to step into the role of NOAA’s CDO. I look forward to working with the wider open data community to discover new, more effective methods of making NOAA’s open data available for everyone’s use, while ensuring the integrity and preservation of those data.

Dr. Edward J. Kearns

A sampling of laws, regulations, and policies relevant to NOAA’s open data and data preservation:

1.     Federal Records Act of 1950, 44 U.S.C. §§ 2101 et seq., 3101 et seq., 3301 et seq.

The Federal Records Act establishes the framework for records management programs in Federal agencies. That framework includes the National Weather Records Center (NWRC), established in 1951 and now NOAA’s National Centers for Environmental Information (NCEI), which is charged with archiving and servicing U.S. weather and climate records.

The Act specifically amended the Federal Property and Administrative Services Act of 1949 to provide for Agency Records Centers run by the Archivist of the General Services Administration: “The Archivist may establish, maintain, and operate records centers and centralized microfilming services for Federal agencies.” 44 U.S.C. § 2907. It also allows for Records Centers run by federal agencies, following approval: “When the head of a Federal agency determines that such action may affect substantial economies or increased operating efficiency, [s]he shall provide for the transfer of records to a records center maintained and operated by the Archivist, or, when approved by the Archivist, to a center maintained and operated by the head of the Federal agency.” 44 U.S.C. § 3103.

2.     NARA, Records Management, 36 C.F.R. §§ 1220.1-1239.6.
The mission of the National Archives and Records Administration (NARA) is to safeguard and preserve the records of the U.S. government, ensuring that its citizens can discover, use, and learn from the country’s documentary heritage. The NARA regulations on records management specify policies for Federal agencies’ records management programs relating to proper records creation and maintenance, adequate documentation, and records disposition. They are the implementing authority for the Federal Records Act. NARA standards for Records Management apply to all federal records, regardless of where they are stored.

3.     Office of Management and Budget, Revision of Circular No. A-130, Transmittal 4,  Management of Federal Information Resources (Nov. 30, 2000)
Revised Circular No. A-130 provides uniform government-wide information resources management policies as required by the Paperwork Reduction Act of 1980, amended by the Paperwork Reduction Act of 1995, 44 U.S.C. § 3501 et seq. This Transmittal Memorandum contains updated guidance on the “Security of Federal Automated Information Systems.” Under the Circular, “Agencies must plan in an integrated manner for managing information throughout its life cycle.”

4.      Office of Management and Budget, Circular No. A-16 Revised, Coordination of Geographic Information and Related Spatial Data Activities (Aug. 10, 2002).
Revised Circular No. A-16 “provides direction for federal agencies that produce, maintain or use spatial data either directly or indirectly in the fulfillment of their mission. This Circular establishes a coordinated approach to electronically develop the National Spatial Data Infrastructure and establishes the Federal Geographic Data Committee.” Spatial data is defined as: “information about places or geography, and has traditionally been shown on maps.” The Circular also “describes the management and reporting requirements of Federal agencies in the acquisition, maintenance, distribution, use, and preservation of spatial data by the Federal Government” including the preparation, maintenance, publication and implementation of a strategy for advancing geographic information and related spatial data activities.

5.     Executive Office of the President, Office of Science and Technology Policy, Memorandum: Increasing Access to the Results of Federally Funded Scientific Research (Feb. 22, 2013).
This memorandum “directs each Federal agency with over $100 million in annual conduct of research and development expenditures to develop a plan to support increased public access to the results of research funded by the Federal Government. This includes any results published in peer-reviewed scholarly publications that are based on research that directly arises from Federal funds, as defined in relevant OMB circulars (e.g., A-21 and A-11). It is preferred that agencies work together, where appropriate, to develop these plans.”

6.     White House, M-13-13: Memorandum on Open Data Policy – Managing Information as an Asset (May 9, 2013)
White House Memorandum M-13-13 “establishes a framework to help institutionalize the principles of effective information management at each stage of the information’s life cycle to promote interoperability and openness” in accordance with Exec. Order 13642, Making Open and Machine Readable the New Default for Government Information. The Memorandum “requires agencies to collect or create information in a way that supports downstream information processing and dissemination activities” including “machine-readable and open formats, data standards, and common core and extensible metadata for all new information creation and collection efforts.”

7.     NOAA Administrative Order NAO 205-1: NOAA Records Management Program (2010)
NOAA Administrative Order (NAO) 205-1 enables NOAA “to carry out an effective records management program in compliance with the Federal Records Act and other relevant legal authorities[.]” Under the order “NOAA Program Officials have the primary responsibility for creating, maintaining, protecting, and disposing of records of their program area.” This duty includes, but is not limited to, documentation of the creation of records, implementation of record protection policies, establishment of a records management system, and cooperation with the NOAA Records Management Officer in requests for information regarding the management of records.

8.     NOAA Administrative Order NAO 212-15: Management of Environmental Data and Information (Issued: 1991, Effective: 2010)
NOAA Administrative Order (NAO) 212-15 establishes the Administration’s Environmental Data Management Policy. The “NAO applies to all NOAA environmental data and to the personnel and organizations that manage these data, unless exempted by statutory or regulatory authority.” Under the Order, data management “consists of two major activities conducted in coordination: data management services and data stewardship. They constitute a comprehensive end-to-end process including movement of data and information from the observing system sensors to the data user. This process includes the acquisition, quality control, metadata cataloging, validation, reprocessing, storage, retrieval, dissemination, and archival of data.” This end-to-end data management lifecycle helps to achieve the NOAA policy objective requiring that “[e]nvironmental data will be visible, accessible and independently understandable to users[.]”