What is the SciFlo Network?

"SciFlo" stands for Scientific Dataflow. SciFlo is a system for Scientific Knowledge Creation on the Grid using a Semantically-Enabled Dataflow Execution Environment. SciFlo leverages Simple Object Access Protocol (SOAP) Web Services and the Grid Computing standards (WS-* standards and the Globus Alliance toolkits), and enables scientists to do [:LargeScaleEarthScienceGoal:multi-instrument Earth Science] by assembling reusable SOAP Services, native executables, local command-line scripts, and python codes into a distributed computing flow (a graph of operators).


The SciFlo client & server engines optimize the execution of such distributed data flows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The scientist injects a distributed computation into the Grid by simply filling out an HTML form or directly authoring the underlying XML dataflow document, and results are returned directly to the scientist's desktop. A Visual Programming tool is also being developed, but it is not required. Once an analysis has been specified for a granule or day of data, it can be easily repeated with different control parameters and over months or years of data.
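
As a rough illustration of repeating one analysis over many days and several control parameters, the sketch below enumerates the individual runs such a sweep would contain. It is not SciFlo code: the function names, the `threshold` parameter, and the dates are hypothetical, chosen only to show the idea.

```python
from datetime import date, timedelta
from itertools import product


def date_range(start, end):
    """Yield each day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def build_runs(start, end, thresholds):
    """Pair every day with every control-parameter value.

    'threshold' is a made-up control parameter used for illustration only.
    """
    return [
        {"day": d.isoformat(), "threshold": t}
        for d, t in product(date_range(start, end), thresholds)
    ]


# Three days of data, two parameter settings: six independent runs.
runs = build_runs(date(2005, 1, 1), date(2005, 1, 3), [0.1, 0.5])
for run in runs:
    print(run)
```

Each dictionary stands for one execution of the same analysis with a different day and parameter value; in a real system each would be submitted as a separate flow.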

Goals

The goal of SciFlo is to enable large-scale, multi-instrument Earth science. The SciFlo Network is a Peer-to-Peer (P2P) Network of Grid workflow nodes. SciFlo exploits not just workflow, however, but multiple technology trends. Each SciFlo node serves many purposes and bundles together multiple open-source technologies: SOAP-based Web Services, the SciFlo dataflow engine, a file redirection/caching server, metadata stored in a relational database (MySQL), an XQuery-able XML document store (Sleepycat Berkeley DB XML), and a collaboration environment (shared wiki pages). The challenge is to integrate all of these technologies into a Grid workflow and collaboration environment that is lightweight, user-installable, and scalable; that runs on a range of hardware from Windows laptops to Linux clusters; and that supports distributed queries, declarative dataflow, visual programming, load-balanced parallel execution, publishable algorithms and analysis flows, and generated products with preserved lineage and added semantic annotations. In short, the goal is for each scientist to have a personal scientific notebook and a personal data center that is tied automatically into a P2P network which enables Grid computation and group collaboration, all with great ease of use.


All of the power of SciFlo is available through a web browser interface. To execute a SciFlo document, possibly shared by a friend, you simply provide the desired inputs by filling out an HTML form in your browser of choice. To author a dataflow, you start from a template and edit the XML document in outline form using a “smart” XML editor, or you use the visual programming tool. The distributed dataflow execution network then does the rest:

  • It choreographs parallel execution, potentially using many nodes.
  • It moves data & operators automatically.
  • Each node serves data & operators, executes SciFlo documents, and is a client of other nodes.
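
The idea of choreographing parallel execution over a graph of operators can be sketched in a few lines of Python. This is a minimal single-machine illustration using only the standard library (Python 3.9+), not the SciFlo engine: the `run_dataflow` function, the operator names, and the toy data are all assumptions made for the example, and a thread pool stands in for remote nodes.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dataflow(operators, dependencies, max_workers=4):
    """Run operators in dependency order; independent ones run in parallel.

    operators:    name -> callable taking a dict of upstream results
    dependencies: name -> set of upstream operator names
    """
    graph = {name: dependencies.get(name, set()) for name in operators}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph is not acyclic

    results = {}
    running = {}  # future -> operator name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every operator whose inputs are now all available.
            for name in sorter.get_ready():
                inputs = {dep: results[dep] for dep in graph[name]}
                running[pool.submit(operators[name], inputs)] = name
            # Block until at least one running operator finishes.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()
                sorter.done(name)  # unlocks downstream operators
    return results


# Hypothetical flow: two independent reads feed a merge, then a mean.
operators = {
    "read_a": lambda inputs: [1, 2, 3],
    "read_b": lambda inputs: [10, 20, 30],
    "merge": lambda inputs: [
        a + b for a, b in zip(inputs["read_a"], inputs["read_b"])
    ],
    "mean": lambda inputs: sum(inputs["merge"]) / len(inputs["merge"]),
}
dependencies = {"merge": {"read_a", "read_b"}, "mean": {"merge"}}

results = run_dataflow(operators, dependencies)
print(results["merge"], results["mean"])
```

Here `read_a` and `read_b` have no dependencies, so they are submitted together; `merge` starts only after both finish. A distributed engine would additionally decide where each operator runs and move data or operators between nodes, which this sketch does not attempt.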


SciFlo pervasively uses many XML-based technologies:

  • Metadata described in XML and XML Schema
  • Distributed computing via XML messaging (SOAP)
  • Service & operator interfaces described in XML (WSDL)
  • Services published in queryable catalogs (UDDI)
  • Operators and data typed using XML Schema and namespaces
  • Semantic “kind” annotations added to all products using Earth science ontologies (RDF/OWL)


To learn more about SciFlo, you can:



FrontPage (last edited 2009-08-26 19:21:05 by GeraldManipon)