Metadata Tool
James Berger
National Oceanographic Data Center
We will build a web-form Metadata Tool that will allow scientists
to thoroughly document and QC their environmental data, regardless
of data type or format. Drop-down lists will narrow dynamically
after each selection, to increase accuracy and ease of use, and
will indicate the completeness of the documentation. New terms and
citations can be added to the Tool's dictionary as needed to
support any depth of documentation. The Tool's Metadata Data Base
(MDB) will hold such a wealth of scientific information, and offer
such ready access to data, that it will be a valuable asset at
every stage of a scientific project.
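One way the narrowing drop-down lists and the completeness indicator could work is sketched below in Python, assuming the dictionary can be treated as a flat table of valid term combinations. The field names (`discipline`, `parameter`, `unit`), the entries in `DICTIONARY`, and the functions `options` and `completeness` are hypothetical; the abstract does not describe the MDB schema.

```python
# Hypothetical dictionary rows: each row is one valid combination of terms.
# The real MDB schema is not specified; these fields are for illustration only.
DICTIONARY = [
    {"discipline": "physical", "parameter": "temperature", "unit": "degC"},
    {"discipline": "physical", "parameter": "temperature", "unit": "degF"},
    {"discipline": "physical", "parameter": "salinity", "unit": "PSU"},
    {"discipline": "chemical", "parameter": "dissolved oxygen", "unit": "ml/l"},
    {"discipline": "chemical", "parameter": "dissolved oxygen", "unit": "umol/kg"},
]
FIELDS = ("discipline", "parameter", "unit")


def options(field, selections):
    """Choices for `field` that are consistent with every selection made so far."""
    rows = [r for r in DICTIONARY
            if all(r.get(k) == v for k, v in selections.items())]
    return sorted({r[field] for r in rows})


def completeness(selections):
    """Fraction of the (hypothetical) FIELDS that have been filled in."""
    return sum(f in selections for f in FIELDS) / len(FIELDS)
```

For example, once the user picks `"chemical"` as the discipline, `options("parameter", {"discipline": "chemical"})` offers only `dissolved oxygen`, and `completeness` reports one of three fields filled.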
When data and documentation arrive at NODC, the Tool will verify
their accuracy and completeness. FGDC, NOAA Portal and customer
metadata requirements will be satisfied by reports generated
from the MDB.
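The arrival check can be pictured as comparing a submitted record against a list of required elements and against the controlled vocabulary in the dictionary. This is a minimal sketch: `REQUIRED`, `VOCABULARY` and `verify` are hypothetical names, and the real FGDC and NOAA Portal requirements are far more extensive.

```python
# Hypothetical required elements and controlled vocabulary; the actual
# metadata requirements and MDB dictionary would be much larger.
REQUIRED = ("title", "originator", "parameter", "unit")
VOCABULARY = {
    "parameter": {"temperature", "salinity", "dissolved oxygen"},
    "unit": {"degC", "degF", "PSU", "ml/l", "umol/kg"},
}


def verify(record):
    """List problems: missing required fields (completeness) and
    terms that are not in the controlled vocabulary (accuracy)."""
    problems = []
    for field in REQUIRED:
        if not record.get(field):
            problems.append(f"missing: {field}")
    for field, allowed in VOCABULARY.items():
        value = record.get(field)
        if value and value not in allowed:
            problems.append(f"unknown term for {field}: {value!r}")
    return problems
```

An empty list means the record passes both checks; each message names the field to fix.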
Problem - Oceanographic data covers every scientific discipline, so
it consists of a large number of diverse data sets in a wide range
of formats, usually from short-term projects. Historically, standard
formats covered few data types and imposed expensive, error-prone
reformatting requirements, which discouraged data submission and
delayed data access. Data types not covered by standard formats are
stored in the originator's format with minimal inventory or access.
The Solution starts with an ASCII dump of the originator's data into
a Working Archive. Build a web tool that reduces the documentation
process to selections from drop-down lists, continually evaluates
documentation completeness and accuracy, and accommodates all levels
of documentation. Build a Dictionary of terms to be used in the
Metadata Tool and allow all users to add terms. Provide QA routines
to evaluate data and documentation. Use the documentation to automate
the drudge work, and make data and documentation available to
customers immediately, providing real-time peer review.
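A QA routine of the kind described above might read the ASCII dump in the Working Archive and check each value against a valid range recorded in the documentation. The sketch assumes whitespace-delimited columns and a hypothetical `COLUMNS` description; neither the dump layout nor the range metadata is specified in the abstract.

```python
# Hypothetical documentation for one data set: column order and the valid
# range a QA routine should enforce for each column.
COLUMNS = [
    {"name": "depth_m", "min": 0.0, "max": 11000.0},
    {"name": "temp_degC", "min": -2.5, "max": 40.0},
]


def qa_ascii(lines, columns=COLUMNS):
    """Check whitespace-delimited ASCII rows against documented ranges.

    Returns (line_number, column_name, message) tuples; line numbers start at 1,
    and column_name is None when the whole row has the wrong field count.
    """
    flags = []
    for lineno, line in enumerate(lines, start=1):
        values = line.split()
        if len(values) != len(columns):
            flags.append((lineno, None,
                          f"expected {len(columns)} fields, got {len(values)}"))
            continue
        for col, text in zip(columns, values):
            try:
                x = float(text)
            except ValueError:
                flags.append((lineno, col["name"], f"not a number: {text!r}"))
                continue
            if not col["min"] <= x <= col["max"]:
                flags.append((lineno, col["name"], f"out of range: {x}"))
    return flags
```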
Reuse Existing Formats - After each selection, the Metadata Tool will
display a list of existing formats that match your selections so far.
You can use an existing format as is, or edit it.
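Listing the existing formats that match the selections so far amounts to filtering a catalogue of format descriptions, where any attribute the user has not yet chosen imposes no restriction. The catalogue entries in `FORMATS` and the function `matching_formats` are hypothetical illustrations, not the Tool's actual data.

```python
# Hypothetical catalogue of existing format statements, each described by
# attributes a user might select in the tool.
FORMATS = [
    {"id": "ctd-basic", "platform": "ship",
     "parameters": {"pressure", "temperature", "salinity"}},
    {"id": "xbt", "platform": "ship", "parameters": {"depth", "temperature"}},
    {"id": "mooring-current", "platform": "mooring",
     "parameters": {"depth", "current speed"}},
]


def matching_formats(platform=None, parameters=()):
    """IDs of formats consistent with the selections made so far.

    Unselected attributes (None or empty) do not restrict the list; selected
    parameters must all appear in a format for it to match.
    """
    wanted = set(parameters)
    return [f["id"] for f in FORMATS
            if (platform is None or f["platform"] == platform)
            and wanted <= f["parameters"]]
```

Each new selection is simply another argument, so the list can only shrink as the user proceeds.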
Dynamic Metadata - Automate data reformatting. If a target format
statement is entered into the Metadata Tool, its attributes can be
used to find and retrieve all candidate data sets. Since each
candidate data set has a format statement, each data column can be
mapped to the target format. The program can compare fields such as
units and character format, and call conversion subroutines as
needed. Inventories can be constructed on the fly, data transfer to
standard databases can be automated, and any number of data sets can
be merged into a commercial off-the-shelf (COTS) analysis-display
utility. FGDC and NOAA Portal requirements can be satisfied by a
standard report from the MDB.
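The column-mapping step can be sketched as building, for each column of the target format, the index of the matching source column and a conversion subroutine chosen by comparing units. This assumes a format statement can be reduced to an ordered list of (parameter, unit) pairs; `SOURCE_FORMAT`, `TARGET_FORMAT`, `CONVERSIONS`, `build_mapping` and `reformat_row` are hypothetical, and character-format conversion is omitted.

```python
# Hypothetical format statements: ordered (parameter, unit) pairs per column.
SOURCE_FORMAT = [("depth", "ft"), ("temperature", "degF")]
TARGET_FORMAT = [("temperature", "degC"), ("depth", "m")]

# Conversion subroutines keyed by (from_unit, to_unit); a real system would
# draw these from the dictionary rather than hard-code them.
CONVERSIONS = {
    ("ft", "m"): lambda x: x * 0.3048,
    ("degF", "degC"): lambda x: (x - 32.0) * 5.0 / 9.0,
}


def _same(x):
    """Identity conversion, used when source and target units agree."""
    return x


def build_mapping(source, target):
    """For each target column, return (source column index, converter)."""
    plan = []
    for param, unit in target:
        for i, (src_param, src_unit) in enumerate(source):
            if src_param == param:
                # KeyError here means no conversion subroutine is known.
                convert = _same if src_unit == unit else CONVERSIONS[(src_unit, unit)]
                plan.append((i, convert))
                break
        else:
            raise ValueError(f"source has no column for {param!r}")
    return plan


def reformat_row(row, plan):
    """Reorder and convert one source row into the target layout."""
    return [convert(row[i]) for i, convert in plan]
```

The mapping is computed once per data set from its format statement and then applied to every row, which is what makes merging many data sets into one target layout cheap.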
Open Source - We propose to use platform-independent, open-source
software, namely HTML, Perl/CGI, MySQL, Java and JavaScript, and the
same open-source policy that built Perl and Linux. This policy will
encourage participation from the worldwide environmental community,
tap free resources to design and build the system, avoid the costs
and lack of control of proprietary systems, and build a constituency
of interested users. We welcome expertise from everyone and will
recognize every contribution.
Auditorium - Paper
Wednesday, 10:30 - 10:50 A.M.