PETREL

Data Management and Sharing Pilot

Pilot Data Service for Researchers

Petrel is a pilot data management service that allows researchers to store large-scale datasets and share that data with collaborators. Researchers from the Argonne Leadership Computing Facility (ALCF) and Globus are developing the system collaboratively. Petrel combines the ALCF's storage and infrastructure with Globus's transfer and sharing services so that researchers can transfer data into the system, manage data on the filesystem, and share and transfer data to other locations. Globus provides authentication and identity management, so users can access Petrel with their campus or institutional federated login.

32 nodes with 1.7 PB of usable storage
GPFS and Globus
100 TB allocation per project
Transfer and share data with collaborators
Federated login
Self-managed by PIs

How Does Petrel Work?

Beamline scientists from Argonne's Advanced Photon Source (APS) use Petrel as a resource for their data management.

The scientists request a project allocation on Petrel. Upon approval, they have access to 100 TB of storage. They can also add other users and grant them rights to manage, read, and/or write to the space.

Once data is generated at APS, scientists transfer it to the project space on Petrel. They can then set up permissions to enable remote collaborators to access all or a subset of the data. Importantly, remote collaborators do not need Argonne accounts to access the data.
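The permission model described above (access granted to named collaborators on all or a subset of a project space) can be illustrated with a small sketch. This is a toy model written for this page, not the Globus or Petrel API: the `AccessRule` class, the `is_allowed` helper, the example identities, and the folder names are all hypothetical, and the sketch assumes that a rule on a folder covers everything beneath it.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class AccessRule:
    """One grant: who may do what under which folder (hypothetical model)."""
    identity: str      # federated login of the user the rule applies to
    path: str          # folder within the project space that the rule covers
    permissions: str   # "r" for read-only, "rw" for read and write


def is_allowed(rules, identity, target, mode):
    """Return True if any rule grants `mode` ("r" or "w") on `target`."""
    target_path = PurePosixPath(target)
    for rule in rules:
        if rule.identity != identity:
            continue
        rule_path = PurePosixPath(rule.path)
        # Assumption: a rule covers its own folder and everything below it.
        covers = target_path == rule_path or rule_path in target_path.parents
        if covers and mode in rule.permissions:
            return True
    return False


# Hypothetical project: the PI manages everything, while a remote
# collaborator may only read one month of scans.
rules = [
    AccessRule("pi@anl.example", "/", "rw"),
    AccessRule("collaborator@university.example", "/scans/2015-03/", "r"),
]

print(is_allowed(rules, "collaborator@university.example",
                 "/scans/2015-03/sample1.h5", "r"))   # True
print(is_allowed(rules, "collaborator@university.example",
                 "/scans/2015-04/sample1.h5", "r"))   # False
```

Scoping each rule to a folder is what lets a project owner expose only a subset of the data. The collaborator is identified by an institutional login rather than an Argonne account.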

Petrel users can stage all or some of their data to a compute resource for analysis and then move the results back to Petrel.

Case Studies

Tomography

A microtomography research team at Argonne's Advanced Photon Source (APS) collects 20-80 TB/month of raw data and expects to scale to about 100-200 TB/month in the near future. Microtomography can be carried out at a variety of energies suitable for 3D characterization of materials relevant to materials science, geoscience, energy storage, and biology. High-speed imaging permits ultra-short exposure times, enabling detailed study of transient material phenomena. Researchers in this group receive a beamline allocation lasting a few days and gather data that must be further processed and analyzed. Subsets of the raw data need to be moved to a diverse set of analysis and storage facilities for processing and long-term preservation. Users can leverage Petrel to help meet these requirements. APS beamline users would also like to use Petrel to track their study metadata and provenance, and to share raw and analyzed data with collaborators (some working remotely).
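To put these data rates in context, the short calculation below estimates how long a single 100 TB project allocation would last at the quoted collection rates. It is a back-of-the-envelope sketch that assumes all raw data lands in one allocation and nothing is moved out or deleted, which is a deliberate simplification; the function name `months_to_fill` is illustrative only.

```python
ALLOCATION_TB = 100  # per-project allocation quoted for Petrel


def months_to_fill(allocation_tb, rate_tb_per_month):
    """Months until the allocation is full at a constant ingest rate."""
    return allocation_tb / rate_tb_per_month


# Current range (20-80 TB/month) and projected range (100-200 TB/month).
for rate in (20, 80, 100, 200):
    months = months_to_fill(ALLOCATION_TB, rate)
    print(f"{rate:>3} TB/month -> {months:.2f} months")

# Prints:
#  20 TB/month -> 5.00 months
#  80 TB/month -> 1.25 months
# 100 TB/month -> 1.00 months
# 200 TB/month -> 0.50 months
```

Under these assumptions an allocation fills within months at current rates and within weeks at projected rates, which is consistent with the need to move subsets of the raw data on to other analysis and storage facilities.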

Materials Science

A group of scientists from Argonne's Materials Science Division (MSD) gathers experimental data from APS beamlines, with raw data volumes ranging from 60 to 100 TB/month and expected to double by 2016. These scientists require a flexible environment in which to implement end-to-end, experiment-time data analysis workflows that automate their analyses and leverage distributed computing resources. This functionality allows the researchers to compare their experimental data with simulation results drawn from high-performance computing resources, rapidly providing actionable feedback and data visualization. These scientists can leverage Petrel to help meet these requirements.

Through Petrel and Globus functionality, we provide intuitive interfaces for these researchers to share, bundle, and publish their datasets. After data is gathered, the scientists often want to share subsets of raw data, derived datasets, and analysis results with collaborators, and to track the metadata and provenance of that data. Eventually, these scientists may want to make their datasets publicly and persistently available via publication functionality, bundled with the associated metadata and assigned a persistent identifier to aid search, discovery, and data citability.