Report and Recommendations of the

LWS Science Data System Planning Team

 

 

 

D. G. Sibeck and T. Kucera (Co-Chairs)

 

J. Byrnes, B. Fortner, S. Fung, B. Giles, J. Gurman, T. Herder,

G. Le, B. Labonte, W.D. Pesnell, and A. Szabo

 

January 2002

 

 

Outline

 

 

 

a.     Introduction and Objectives

 

b.     Current Status of Data Management within the SEC Discipline

 

c.     Principles Guiding the Development of an LWS Science Data System

 

d.     Responsibilities within the LWS Data System

 

e.     Items for Immediate Action


 

Introduction

 

In response to a request by LWS Senior Project Scientists Drs. Richard Fisher (GSFC) and Larry Zanetti (JHU/APL), the LWS Data System Planning Team held a series of biweekly meetings throughout 2001 to discuss the nature of the forthcoming LWS Science Data System.  This report (1) reviews current data management practices within the Sun-Earth Connection (SEC) discipline, (2) lists principles that should guide the development of an LWS data system, (3) outlines the distribution of responsibilities within an LWS data system, and (4) recommends immediate and near-term actions facilitating the formation of an LWS data system.

The LWS Science Data System Planning Team hopes that this summary may serve as a starting point for future workshops directed towards community consensus.


 

Current Status of Data Management within the SEC Discipline

 

Cost constraints and the need to provide services based on user needs dictate that any future LWS Data System will grow from existing services.  The LWS Data System Planning Team therefore began by surveying data management practices and resources in the various Sun-Earth Connection subdisciplines.  It then considered a set of common problems faced by each subdiscipline.

 

Solar Physics.  Numerous domestic (e.g., NASA, NOAA, NSF, and DoD) and foreign (e.g., ESA, ISAS) agencies support projects that provide solar observations.  Several WWW sites maintain comprehensive links to data sets held on-line by principal investigators and designated archives.  Projects within the solar community generally provide research quality observations to the scientific community and general public within minutes to hours after the observations have been made.  The solar physics community benefits from the widespread use of a single software analysis package (SolarSoft) to examine data exchanged in a single format (FITS).  The community maintains a software tree and sponsors the development of new tools within it, including tools that enable the importation of observations in other formats.

 

Heliospheric Physics.  Both NASA- and foreign-sponsored missions have provided and will provide important heliospheric observations.  No WWW site maintains comprehensive links to heliospheric data sets held within the research community and at designated archives.  Although the NSSDC's COHOWeb and OMNIWeb provide hourly-averaged plasma and magnetic field measurements from a variety of heliospheric spacecraft, many valuable data sets are held off line.  Lag times for new heliospheric observations to be processed, validated, and placed on line range from days to months. Some data sets are never made available on line. Heliospheric physicists use a number of analysis tools to inspect observations in a wide variety of formats, often proprietary.  Although the Solar & Heliospheric SR&T program has imposed a requirement that new proposals may only use publicly available data, the means of enforcing (or funding) and enabling this initiative have not been specified.

 

Magnetospheric Physics.  Several federal agencies (e.g., NASA, NOAA, DoE, and DoD) and foreign space agencies (ESA, ISAS, RSA) sponsor missions that provide magnetospheric observations.  No WWW site maintains comprehensive links to magnetospheric data sets held within the research community and at designated archives.  Although the SPDF's CDAWeb provides key parameter observations from a variety of spacecraft, numerous valuable data sets are held off line within the community or in archives.  Many are in danger of being lost permanently following the termination of the ISTP project.  Lag times for new magnetospheric observations to be placed on line range from minutes to months.  Magnetospheric physicists use a number of analysis tools to inspect observations in a wide variety of formats, often proprietary. The NSSDC's SSCWeb provides information such as the location of the spacecraft and conjugate points on the ground.  Satellite ephemeris data are needed for this service.

 

Ionospheric and Thermospheric Physics.  A variety of federal agencies (e.g., DoD, NSF, NOAA, and NASA), as well as numerous foreign governments, support projects that provide ionospheric observations.  No WWW site maintains comprehensive links to ionospheric data sets held within the research community and at designated archives.  However, NSF maintains a WWW site with extensive links to all the projects that it sponsors, and NOAA provides an archive for magnetometer data, solar indices, and other datasets related to solar-terrestrial connection research.  Nevertheless, the ability to integrate these observations into comprehensive views of the ionosphere remains absent.  Lag times for new observations to be placed on-line range from minutes to months.  As in the case of heliospheric and magnetospheric physics, ionospheric physicists use a number of self-written analysis tools to inspect observations stored in a wide variety of data formats, often proprietary.

 

Possible Approach to Problems Common to Each Subdiscipline.

            As discussed above, there is no single entry point on the WWW that comprehensively catalogues the various data sets currently available for LWS-type studies or provides the tools needed to conduct such studies.  Consequently, researchers must often search the WWW for required data sets, translate formats, and prepare both graphical and analytical software to achieve their research goals.  As many researchers have similar objectives, there is considerable duplication of effort.

            Thanks to their use of a single software tree and a single data format, data management practices are most advanced within the Solar Physics community.  Although this will greatly facilitate the solar community's transition to LWS-type studies, researchers within other subdisciplines will not be able to make use of SolarSoft without the preparation of further introductory material and simple web-based tools, e.g., a tool for inter-comparison of images from disparate instruments.

            Instead, most researchers within the heliospheric, magnetospheric, and ionospheric communities will desire software tools specifically adapted to their own research interests and more rapid access to validated data sets than has been the case to date.  This will require a paradigm shift towards rapid data availability, the use of a limited number of data exchange formats (or more common use of format converters), and the development of standard software analysis and display tools.  Among the tools currently available to these scientific disciplines, COHOWeb (for hourly-averaged heliospheric observations), OMNIWeb (for hourly-averaged near-Earth heliospheric and geomagnetic observations), CDAWeb (for higher time resolution magnetospheric and some ionospheric observations) and SSCWeb (for ephemeris) provide potential foundations for an incipient LWS data system.  These have the advantage of utilizing and building upon existing data management infrastructure that is already serving the space physics community by providing ISTP data sets relevant to LWS.  While some correlative tools may be developed at the request of the LWS data system, most should originate from and be developed by individual PIs, who understand their own observations best, and by other members of the scientific community.  The LWS data system can play an important role in disseminating these tools.

            Simulations and empirical models will play important roles in the LWS program.  Simulations test the degree to which we understand the underlying physics linking the Sun to the Earth, whereas data intensive empirical models help specify the space environment.  Both can help fill in gaps resulting from incomplete observational coverage.  Continuing advances in simulation techniques and data assimilation have brought the possibility of accurate space weather forecasts within sight.  For the models to be further improved, extensive comparisons with observations will be necessary.  With the notable exception of the CCMC, current data systems do not facilitate such comparisons.  The construction of accurate empirical specification models (such as the NASA AE/P-8 trapped radiation models and the International Reference Ionosphere [IRI] model) requires intensive data processing. The LWS data system must facilitate both model-data comparisons and the development of empirical models from its inception.
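As one illustration of the model-data comparisons discussed above, the following Python sketch computes a root-mean-square error between observed and modeled values sampled at the same times.  It is a minimal example under stated assumptions, not a description of any existing LWS or CCMC tool: the function name rms_error, the use of None to mark observational gaps, and the choice of RMS error as the comparison metric are all hypothetical choices made here for illustration.

```python
import math
from typing import Optional, Sequence


def rms_error(observed: Sequence[Optional[float]],
              modeled: Sequence[Optional[float]]) -> float:
    """Root-mean-square difference between paired observations and model values.

    Assumption for this sketch: both sequences are sampled at the same
    times, and a missing value (data gap) is represented by None.
    """
    if len(observed) != len(modeled):
        raise ValueError("observed and modeled must have the same length")

    # Keep only the time steps where both an observation and a model value exist.
    pairs = [(o, m) for o, m in zip(observed, modeled)
             if o is not None and m is not None]
    if not pairs:
        raise ValueError("no overlapping data points to compare")

    mean_square = sum((o - m) ** 2 for o, m in pairs) / len(pairs)
    return math.sqrt(mean_square)


# Hypothetical values; the middle observation is a data gap and is skipped.
score = rms_error([5.0, None, 7.0], [5.5, 6.0, 6.5])
```

Skipping gaps rather than interpolating them keeps the comparison restricted to times when measurements actually exist; a real service would also need to reconcile differing time resolutions before pairing values.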

            In summary, there is considerable room for improvement in data management policies, developing and disseminating correlative analysis tools, and cataloging and providing access to existing and future LWS-relevant data sets and simulation results.  Advocating a paradigm shift towards more effective data management will be an important first step in the development of an LWS data system.


 

Principles Guiding the Development of an LWS Data System

 

            The development of any data system begins with lessons learned from past experiences.  The LWS Data System Planning Team noted that (1) the imposition of overly ambitious comprehensive data systems can result in costly systems that do not address basic needs; (2) valuable data sets are currently in danger of being lost because their delivery to designated archives was neither required nor funded; (3) a combined effort of designated archives and dedicated PIs will be needed to extract the full scientific return from publicly available data sets; and (4) periodic competitions encourage innovations and help control costs.  Consequently, the team adopted the following guiding principles:

 

a.      The most useful and feasible LWS science data system would be a meta-system tying together many heterogeneous sets of data distributed among different institutions. Such a data system should identify and allow for access to essential data sets and model output from other NASA and non-NASA projects, sponsored by the US and other countries. The resulting LWS data system will very likely be distributed and virtual.

b.      The LWS data system design must not solidify too early or be imposed from outside the scientific community, but should initially be based on existing services and then evolve in response to clearly-identified user needs and project guidelines.

c.     The LWS data system must provide for end-to-end management of all research-quality data sets returned by LWS missions and models. 

d.     Both PI teams and designated archives should manage and maintain the usability of the data system.  Whereas the former provide expertise to ensure proper processing of individual full resolution data sets, the latter support the project by establishing data archiving and accessing protocols and developing and providing the services needed to locate and retrieve multiple data sets for correlative analysis.

e.     Peer-reviewed proposals in response to directed AOs provide the most cost-effective means for initiating and improving the LWS data system.

f.      The time to begin developing the metadata standards and access methods for an LWS data system is now, because this will afford an opportunity to identify the data sets and services needed, familiarize potential users with available tools and conventions, support the ongoing LWS TMDA program, and take advantage of technology developed with the support of NASA's OSS AISRP.
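The meta-system described in principle (a) can be pictured as a catalogue of metadata records that point to data sets held at different institutions, rather than a central store of the data themselves.  The Python sketch below is one possible illustration of that idea.  The class names DatasetRecord and Catalog, the metadata fields, and the example.org addresses are all hypothetical; the report does not specify a metadata standard, and defining one is among its recommended actions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata entry describing one remotely held data set."""
    name: str
    subdiscipline: str  # e.g. "solar" or "magnetospheric"
    holder: str         # PI site or designated archive serving the data
    data_format: str    # exchange format of the files
    url: str            # where the data can be retrieved (placeholder here)


class Catalog:
    """Single entry point over data sets that stay at their home institutions."""

    def __init__(self) -> None:
        self._records: List[DatasetRecord] = []

    def register(self, record: DatasetRecord) -> None:
        """Add a metadata record; the data themselves are not copied."""
        self._records.append(record)

    def find(self, **criteria: str) -> List[DatasetRecord]:
        """Return records whose fields equal every given criterion.

        An unknown field name raises AttributeError.
        """
        return [r for r in self._records
                if all(getattr(r, key) == value
                       for key, value in criteria.items())]


# Illustrative entries only; names and addresses are placeholders.
catalog = Catalog()
catalog.register(DatasetRecord("example_solar_images", "solar",
                               "Example PI Site", "FITS",
                               "https://example.org/solar"))
catalog.register(DatasetRecord("example_key_parameters", "magnetospheric",
                               "Example Archive", "CDF",
                               "https://example.org/kp"))
solar_sets = catalog.find(subdiscipline="solar")
```

Because the catalogue stores only descriptions and locations, each institution keeps control of its own holdings, which matches the distributed and virtual character anticipated in principle (a).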

 


Responsibilities within the LWS Data System

 

            Because the final set of spacecraft and instruments remains to be determined, it might be thought premature to begin designing an LWS data system.  Solutions adopted now may become outdated by the time the LWS missions are launched.  On the other hand, the LWS science data system is more than a traditional data system designed to serve only a single project.  It will serve all LWS data product providers and users, connect different LWS program elements, and provide coherence to the LWS program. Because of its paramount importance to the LWS program, it is not too early to begin identifying tasks that any data system must accomplish, allocating responsibilities, and reaching community consensus.  A prototype LWS data system can begin examining and testing possible solutions, salvaging relevant data sets, and supplying both current and heritage data sets to researchers, particularly those currently funded by the LWS Targeted Research and Technology Program.  This section describes our views of how the functions assigned to the LWS Data System should be distributed amongst NASA management, LWS Data System managers, the PIs, and designated data centers.

 

Role of LWS Project Management. Because the LWS program emphasizes cross-disciplinary and correlative studies, NASA management must provide adequate resources and expertise for end-to-end data management.  NASA must ensure that Announcements of Opportunity (AOs) for LWS missions include requirements for Project Data Management Plans (PDMPs) and that proposals submitted to the program include satisfactory responses.  PDMPs must be based on NASA's open data policy and LWS program objectives, which require timely delivery of scientifically meaningful observations together with metadata and supporting documentation to data users, relevant real-time observations to operational forecasters, and data products of general interest to the public.

            NASA must allocate sufficient funds to ensure the successful completion of data management tasks.  NASA managers must not allow instrument and spacecraft operations to terminate abruptly with these tasks left unfinished.  They may rely upon LWS data system managers and the scientific community to monitor progress.

 

Role of LWS Data System Management.  The LWS Data System will require a small management and administrative staff.  Working together with PIs, designated archives, and interested members of the scientific community, the Data System Managers will direct data system activities and ensure proper communication between NASA headquarters, members of the data system, users, and affiliated non-NASA data sources.  In particular, they should:

 

  1. Set data system requirements with an emphasis on cross-disciplinary tasks, but allow archives and PIs to find innovative ways to fulfill them.  This will require effective management to encourage ongoing consultations.
  2. Organize periodic competitions for the components of the data system (including its management and archival sites).
  3. Organize periodic competitions to develop WWW-based and stand-alone tools to locate, retrieve, translate, integrate, portray, and analyze both model results and observations.
  4. Ensure access to validated data sets and tools located in designated archives.
  5. Establish the metrics needed to evaluate service levels and allocate funding within the data system.
  6. Provide a WWW entry point linking the distributed sites within the LWS Data System.
  7. Work with the community to define metadata standards and data exchange formats.
  8. Negotiate routines for the delivery of relevant non-LWS data sets (including model output) into the data system.
  9. Ensure compatibility with relevant climate change and atmospheric data system partners to achieve LWS objectives.
  10. Sponsor the restoration of and provide access to heritage data sets.
  11. Support the LWS project by maintaining a library of data management plans.
  12. Sponsor the development of software trees for analysis routines and provide instructions on their use.
  13. Survey, answer, and incorporate community feedback on the LWS Data System.
  14. Survey, record, and advocate best practices within the community.

 

Role of Designated Archives.  With the exception of deep archiving, archive functions within the LWS Data System should be periodically competed.  Designated archives must maintain the data sets returned by the LWS project once individual missions have ended, and catalogue their holdings.  Furthermore, they must provide the various user communities (researchers, forecasters, educators, and the general public) with comprehensive and comprehensible WWW interfaces to LWS data sets.  It is likely that they will develop value-added services, data products, and functionality.

 

Role of PI teams.  PI teams will play a key role within the LWS data system.  They possess the unique knowledge required to interpret the observations and develop the software to interpret high-resolution observations.  They should be asked to accept a paradigm shift towards full and free access to their data following an initial brief validation period.  In contrast to the situation that has hitherto prevailed, they should

 

a.     Provide access to the most recent versions of research quality data, processing software, metadata, and documentation.

 

b.     Develop WWW-based tools that grant both team members and outsiders similar views and access to the data.

 

c.     Produce and deliver key parameters to the designated LWS data system archives.

 

d.     Respond to questions concerning data quality and interpretation.

 

e.     Carry out their responsibilities as stipulated in the PDMPs.

 

Research funding for PIs should be based substantially on how well they serve the larger community in these ways. The LWS Program Scientist should ensure (1) that all AOs include a clear statement indicating that PI status is a public trust, and (2) that Project Scientists receive the resources and power to reward the PIs who do the most for the community.

 


Action Items

 

1.     Establish a Prototype LWS Science Data System

  1. Appoint an LWS Data System science team (including LWS SDS project scientist(s) or deputy project scientist(s)) to lead this effort, ensure sufficient resources, and see to it that the scientific community's priorities are served in this effort.
  2. Provide links to relevant data sets, particularly those needed for projects supported by the LWS Targeted Research and Technology program.
  3. Identify, restore, and validate relevant legacy data sets.
  4. Sponsor the development of tools for correlative analysis.
  5. Identify essential non-LWS data sets (including model output).  Negotiate timetables, formats, parameters, and time resolutions for their routine incorporation into the LWS data system.
  6. Begin a dialogue with the research community concerning standards for metadata, documentation, and format for both observations and model output.
  7. Compile a database of exemplary Project Data Management Plans and best practices for data set management in the community.
  8. Define requirements for the system as forecast for 2005, 2010, and 2015, and work with program management at NASA HQ and GSFC to ensure that adequate resources are available and community involvement is built in.

 

2.      Require Project Data Management Plans (PDMPs)

a.     Instrument and spacecraft AOs must require end-to-end PDMPs.

b.     Review panels must explicitly evaluate proposal PDMPs.

c.     Routinely monitor and reward compliance with PDMPs.

d.     Taper (rather than cut) off funding at the end of projects so that they can properly archive their high-resolution scientific data sets.

e.     Do not rely on free help from interested scientists and archivists to preserve valuable data sets.

 

3.      Establish a firmer LWS WWW presence for the project and more specifically for the LWS Data System.

a.     Improve visibility with readable descriptions of LWS objectives.

b.     Provide frequent updates on LWS implementation plans.

c.     Receive community comments on plans.

d.     Provide contact points and describe roles of LWS program officials.

e.     Encourage joint research by describing the projects funded by the LWS Targeted Research and Technology program.

f.      Support the LWS Targeted Research and Technology program by establishing a prototype LWS Data System.

 


 

 

Acronym List

AISRP - Applied Information Systems Research Program

CCMC - Community Coordinated Modeling Center

CDAWeb - Coordinated Data Analysis Web

COHOWeb - COordinated Heliospheric Observations Web

FITS - Flexible Image Transport System

NSSDC - National Space Science Data Center

PDMP - Project Data Management Plan

SPDF - Space Physics Data Facility

SSCWeb - Satellite Situation Center Web