What is BigData Express? 

The Challenges

In DOE research communities, the emergence of distributed, extreme-scale science applications is generating significant challenges regarding data transfer. We believe that the data transfer challenges of the extreme-scale era are characterized by two relevant dimensions:

  • High-performance challenges. The DOE is working toward deploying terabit networks in support of extreme-scale science applications. To make full use of these networks, data transfer should ideally reach terabit-per-second throughput.
  • Time-constraint challenges. Scientific applications typically have explicit or implicit time constraints on data transfer. Providing real-time and deadline-bound data transfer is a challenging task in the extreme-scale era.

Although significant improvements have been made in the area of bulk data transfer, currently available data transfer tools and services will not be able to successfully meet these challenges, for the following reasons:

  • Existing data transfer tools and services lack a data-transfer-centric approach to seamlessly and effectively integrating and coordinating the various entities in an end-to-end data transfer loop.
  • Existing data transfer tools and services lack effective mechanisms to minimize cross-interference between data transfers.
  • Existing data transfer tools and services are oblivious to user (or user application) requirements (e.g., deadlines and QoS requirements).
  • Existing data transfer tools run inefficiently on data transfer nodes (DTNs).

These are common and fundamental problems for bulk data transfer in the extreme-scale era. In this proposal, we seek to address these problems.

BigData Express

To address these problems, DOE’s ASCR Network Research Program has funded Fermilab (FNAL) and Oak Ridge National Laboratory (ORNL) to collaborate on the BigData Express project. BigData Express aims to provide a schedulable, predictable, and high-performance data transfer service for DOE large-scale science computing facilities (LCF, NERSC, US-LHC computing facilities, etc.) and their collaborators.

Design principles:

  • Parallelism
  • Integration
  • Cooperation

Key features:

  • A data-transfer-centric architecture to seamlessly integrate and effectively coordinate the various resources in an end-to-end data transfer loop
  • Use of software-defined networking (SDN) and software-defined storage (SDS) to improve network and storage I/O performance
  • A time-constraint-based scheduler to schedule data transfer tasks
  • An admission control mechanism to provide guaranteed resources for admitted data transfer tasks
  • A rate control mechanism to improve data transfer schedulability and reduce cross-interference between data transfers
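The feature list does not specify how the scheduler or admission control work internally. As one illustration of how a deadline-driven admission check can operate, the following is a minimal sketch, assuming a single bottleneck link of known capacity: each request's deadline is converted into the minimum rate it needs, and the request is admitted only if the total reserved rate stays within capacity. The names `TransferRequest`, `AdmissionController`, and `try_admit`, and all numbers used, are hypothetical and not taken from BigData Express.

```python
from dataclasses import dataclass


@dataclass
class TransferRequest:
    # Hypothetical request model, not the BigData Express data structure.
    name: str
    size_gbit: float   # volume of data to move, in gigabits
    deadline_s: float  # time remaining until the deadline, in seconds


def required_rate(req: TransferRequest) -> float:
    """Minimum sustained rate (Gbit/s) needed to finish before the deadline."""
    if req.deadline_s <= 0:
        raise ValueError("deadline must be in the future")
    return req.size_gbit / req.deadline_s


class AdmissionController:
    """Sketch: admit a transfer only if its rate can be guaranteed.

    Assumes one shared bottleneck of fixed capacity; a real system would
    also account for storage I/O, DTN load, and multiple network paths.
    """

    def __init__(self, capacity_gbps: float) -> None:
        self.capacity_gbps = capacity_gbps
        self.admitted: dict[str, float] = {}  # request name -> reserved Gbit/s

    def reserved(self) -> float:
        """Total rate already promised to admitted transfers."""
        return sum(self.admitted.values())

    def try_admit(self, req: TransferRequest) -> bool:
        """Reserve the request's required rate if capacity allows it."""
        rate = required_rate(req)
        if self.reserved() + rate <= self.capacity_gbps:
            self.admitted[req.name] = rate
            return True
        return False  # rejecting keeps guarantees for already-admitted transfers


if __name__ == "__main__":
    ac = AdmissionController(capacity_gbps=100.0)
    for req in (
        TransferRequest("A", 36000.0, 600.0),  # needs 60 Gbit/s
        TransferRequest("B", 30000.0, 600.0),  # needs 50 Gbit/s
        TransferRequest("C", 12000.0, 600.0),  # needs 20 Gbit/s
    ):
        print(req.name, "admitted" if ac.try_admit(req) else "rejected")
```

In this simplified model, transfer B is rejected because admitting it would oversubscribe the link and break the guarantee already given to A, while the smaller transfer C still fits. The reserved rate of each admitted transfer could also serve as its pacing target, which is one way admission control and rate control might work together to limit cross-interference.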

Research Team

BigData Express is a joint research project between FNAL and ORNL. In addition, ESnet, as an unfunded project partner, will provide the underlying SDN-based WAN services the project requires.

FNAL (Lead institution)

  • Dr. Wenji Wu (PI), Email: wenji@fnal.gov
  • Mr. Phil DeMar (Co-PI), Email: demar@fnal.gov

ORNL

  • Dr. Gary Liu (Collaborating ORNL PI), Email: liuq@ornl.gov
  • Dr. Norbert Podhorszki (Co-PI), Email: pnorbert@ornl.gov


Last modified: 01/08/2016