Lawrence Berkeley National Lab

Overview

The Data Science and Technology (DST) Department, part of the Computational Research Division (CRD) at Berkeley Lab, conceives, develops, and applies leading-edge methods for solving data-intensive science problems. DST activities range from basic and applied research to deployment of software tools. Our projects span data management; data movement; statistical, topological, and geometric analysis/analytics; computer vision; visualization; user-interface design; usability; and end-to-end data-intensive system architecture and deployment. Our multidisciplinary teams are engaged on projects in the following primary mission areas:
  • Scientific workflows and data analysis algorithms and frameworks
  • Data synthesis, management, movement, and curation of large and complex datasets
  • User-centered design of interfaces and software
  • Exascale data analytics and visualization capabilities
Our collaborators come from across the science disciplines, ranging from theoretical astrophysicists to computational and experimental bioscientists. The capabilities we build are driven by the needs of contemporary computational, observational, and experimental science projects central to the mission of the DOE Office of Science. Our portfolio includes projects in basic and applied research, advanced software development, and deployment to the scientific community. The science challenges we are helping to address include understanding carbon interactions between the atmosphere and the biome, interpreting results from trillion-particle space weather simulations, detecting extreme weather events in climate models, locating halo particles in accelerator models, understanding organism function, and detecting blobs in fusion experiments while the data is in transit.

Latest News

DST's Sean Peisert to lead project developing new data analysis methods for power grid

The project, Threat Detection and Response with Data Analytics, is jointly led by LBNL and LLNL and is part of a $220 million, three-year Grid Modernization Initiative launched in January 2016 by the Department of Energy to support research and development in power grid modernization. The goal of this project is to develop technologies and methodologies to protect the grid from advanced cyber threats by collecting data from a range of sources and then using advanced analytics to identify threats and how best to respond to them. Specifically, the project team hopes to be able to distinguish between power grid failures caused by cyber attacks and failures caused by other means, including natural disasters, normal equipment failures and even physical attacks.

Follow the link to read the rest: CRD news story

DST's Deb Agarwal profiled in honor of Lesbian, Gay, Bisexual, and Transgender Pride Month in June

When Deb Agarwal [the head of the DST department] first joined Berkeley Lab 22 years ago, she loved seeing the rainbow flag fly every June. But a number of years back, she noticed that it hadn't flown for a couple of years. No one seemed to think it was a big deal, but Agarwal felt otherwise, so she took it upon herself to make sure it was up every June....

Follow the link to read the rest: Profile

DST's Oliver Ruebel is a developer of Berkeley Lab's OpenMSI, which has been licensed to ImaBiotech

Two years ago, Lawrence Berkeley National Laboratory researchers developed OpenMSI, the most advanced computational tool for analyzing and visualizing mass spectrometry imaging (MSI) data. Last year, this web-available tool was selected as one of the 100 most technologically significant new products of the year by R&D Magazine. Now, OpenMSI has been licensed to support ImaBiotech's Multimaging technology in the field of pharmaceutical and cosmetic research and development. The Multimaging platform essentially allows researchers to combine and overlay different image files that have been acquired from different imaging techniques, such as qualitative MALDI imaging, staining, and immune staining, to increase confidence in data sets.

With cutting-edge MSI technology, scientists can study tissues, cell cultures and bacterial colonies in unprecedented detail at the molecular level. This information can lead to the discovery of new drug targets, diagnostic tests and more effective drugs. Beyond healthcare, MSI can be applied to industrial biotechnology, plant agriculture, veterinary medicine, forensic investigation, environmental toxicology, combatting terrorism, and even exploring the universe. However, as MSI datasets have grown from gigabytes to terabytes, basic tasks like opening a file or plotting spectra and ion images have become insurmountable challenges for the average scientist.

To help researchers overcome these challenges, two Berkeley Lab researchers, Oliver Ruebel of the Computational Research Division (CRD) and Ben Bowen of the Environmental Genomics and Systems Biology (EGSB) Division, conceptualized and developed OpenMSI with support from the National Energy Research Scientific Computing Center (NERSC). This work was part of a larger effort at Berkeley Lab to extend advanced computational techniques to science areas (and scientists) that haven’t benefited from them in the past. The tool, which makes highly optimized computing technologies available to researchers via a user-friendly interface, was born from the Lab's Integrated Bio-imaging Initiative, and was initially developed with funds from NERSC and the Laboratory Directed Research and Development (LDRD) Program at Berkeley Lab. NERSC is a DOE Office of Science User Facility.

Because OpenMSI leverages NERSC's resources to process, analyze, store, and serve massive MSI datasets, users can work on their data at full-resolution and in real-time without any special hardware or software. They can also access their data on any device with an Internet connection.

"OpenMSI has really been a grassroots effort. We always believed that OpenMSI would be a transformational technology, so we worked with Berkeley Lab's Innovation and Partnerships Office early on to protect the Lab's intellectual property," says Ruebel.

"This licensing agreement is certainly a great achievement towards achieving our vision of transforming mass spectrometry imaging research and applications through computing, but we still have a ways to go to fulfill our vision and to fully develop OpenMSI and achieve broad adoption," adds Bowen.

ImaBiotech is a contract research organization that offers services in mass spectrometry imaging. Headquartered in Lille, France, the company also develops and implements new imaging technologies.

Open Positions in DST

DST has a number of open positions right now.

Scientific Data Management Postdoctoral Researcher

The Scientific Data Management Research Group has an immediate opening for a postdoctoral researcher to analyze storage systems and parallel I/O performance and to develop novel object-based storage technologies for the upcoming exascale era. Apply here

Usable Software Systems Group - User Research Postdoctoral Researcher

The Usable Software Systems Group is looking for a postdoc in user research with experience in ethnography, user studies, and user analytics. Apply here

Computer Security and Systems Data Analysis Postdoctoral Researcher

We are looking for a postdoctoral researcher to perform data engineering and analysis of security and systems-behavior data, including use of machine learning and/or graph-theoretic approaches, in multiple domains. The researcher will analyze systems behavior in scientific computing environments, as well as security-related issues in cyber-physical system environments. Apply here

Data Analysis and Visualization Open Positions

We have openings for a research scientist, a CSE, and a postdoc with expertise in scientific imaging, computer vision, and machine learning algorithm development to perform pattern recognition applied to problems in materials science, such as characterization of new composites and films for microelectronics.

Apply for Research scientist

Apply for CSE

Apply for Postdoc

Dani Ushizima Participated in Black Girls CODE Robot Expo

Last month, Dani Ushizima from the DST department and Laleh Cote (Workforce Development & Education) joined forces with Black Girls CODE to support 214 girls of color (ages 7-17) at the Robot Expo. The girls participated in hands-on robotics activities, heard from STEM professionals, and learned about applications and uses for robots.

ESnet, CENIC Announce Joint Cybersecurity Initiative; DST's Sean Peisert to Direct

ESnet and the Corporation for Education Network Initiatives in California (CENIC) recently announced a partnership in developing cybersecurity strategy and research. CENIC is a nonprofit organization that operates the California Research & Education Network (CalREN), a high-capacity network with over 20 million users. Sean Peisert of the Computational Research Division will be director of the new CENIC/ESnet Joint Cybersecurity Initiative. Peisert, who was also recently named as the chief cybersecurity strategist for CENIC, has worked extensively in computer security research and development. He will continue his work at Berkeley Lab and as an adjunct faculty member of the University of California at Davis.

FLUXNET2015 Global Carbon Flux Dataset Released

Today, eddy covariance measurements of carbon dioxide and water vapor exchange are being made routinely on all continents. The flux measurement sites are linked across a confederation of regional networks in North, Central and South America, Europe, Asia, Africa, and Australia, in a global network called FLUXNET. This global network includes more than eight hundred active and historic flux measurement sites, dispersed across most of the world’s climate space and representative biomes. The FLUXNET-Fluxdata website (http://fluxnet.fluxdata.org/) is hosted at the Lawrence Berkeley National Laboratory (USA). Here, the data that have been shared by the regional networks are processed and harmonized for sharing with the FLUXNET communities. In addition to data access, the Fluxdata website offers a number of tools, such as communication and idea-sharing platforms, documentation, and support for FLUXNET data users. The FLUXNET2015 dataset is the first new global FLUXNET dataset since the LaThuile dataset in 2007. The team involved in preparing the data is spread across the towers and regional networks that contributed data as well as UC Berkeley (Housen Chu and Dennis Baldocchi), University of Tuscia (Dario Papale and Carlo Trotta), University of Virginia (Marty Humphrey and Norm Beekwilder), and Berkeley Lab. The DST personnel on the team are Gilberto Pastorello (data processing and preparation), Megha Sandesh (FLUXNET data portal), You-Wei Cheah (metadata and data preparation), and Deb Agarwal (data lead). Image credit - Housen Chu, UC Berkeley.

OpenMSI wins R&D 100 Award

OpenMSI is the most advanced tool for analyzing and visualizing mass spectrometry imaging (MSI) data that is available via a web browser. MSI technology enables scientists to study tissues, cell cultures, and bacterial colonies in unprecedented detail at the molecular level. Nowadays, MSI datasets range from tens of gigabytes to several terabytes, so basic tasks like opening a file or plotting spectra and ion images become insurmountable challenges. OpenMSI overcomes these obstacles by making highly optimized computing technologies available via a user-friendly interface. Because OpenMSI leverages NERSC resources to process, analyze, store, and serve massive MSI datasets, users can now work on their data at full resolution and in real time without any special hardware or software. They can also access their data on any device with an internet connection. Ben Bowen of the Environmental Genomics and Systems Biology Division and Oliver Ruebel of the Computational Research Division led the development of the technology. Several other DST and NERSC personnel were also involved.

Sean Peisert Chairs Second ASCR Cybersecurity Workshop and Edits Subsequent Report

Sean Peisert chaired a second ASCR-sponsored workshop on the subject of cybersecurity research for scientific computing integrity in June 2015. The goal of this workshop was to define a long-term (10 to 20 year) fundamental research and development strategy and roadmap regarding scientific computing integrity for future high performance computing (HPC) and scientific user facilities. Peisert subsequently compiled and edited the resulting workshop report, which builds on the findings of a previous ASCR cybersecurity workshop to examine computer security research gaps and approaches for assuring scientific computing integrity specific to the mission of the DOE Office of Science.

Shreyas Cholia: Systems Engineer by Day, KALX DJ by Night

Shreyas Cholia is not just another software engineer. If you've tuned into KALX, UC Berkeley's radio station, lately, you may just have been enjoying the DJ sensibilities of DST's own Shreyas Cholia, a computational systems engineer who works jointly for CRD and NERSC. Cholia started as a volunteer at KALX in 2003, a year after joining the Lab, looking for a way to channel his love of music into an interesting extracurricular activity. Twelve years later, he's hosting his own KALX show every other Tuesday night. Read more in the recent article in Today at Berkeley Lab.

Dani Ushizima Receives DOE Early Career Research Award

Dani Ushizima of the Data Science and Technology Department has received a 2015 Early Career Research Program award from the Department of Energy's Office of Science. The award will fund research into new methods to help scientists extract more information from digital images produced by experiments studying materials such as ceramics and geological samples at Department of Energy (DOE) facilities. The work is important because facilities are deploying instruments that can produce digital images at much greater resolution and much more frequently than just a few years ago. Images are being generated so quickly that scientists are struggling to keep up and extract the information contained in this data modality. Read more in the recent article on the Computational Research Division website.

Sean Peisert Chairs IEEE Symposium on Security and Privacy

Sean Peisert is serving as general chair of the 36th IEEE Symposium on Security and Privacy, May 18–20, 2015 in San Jose, California. Since 1980, the IEEE Symposium on Security and Privacy has been the premier forum for presenting developments in computer security and electronic privacy, and for bringing together researchers and practitioners in the field. This year's program committee has selected 55 research papers covering a wide range of topics. An estimated 500 attendees from around the world are expected. As in past years, there will be a poster session, a short talks session, and several workshops that will take place alongside the symposium, including workshops on privacy engineering, genomic security and privacy, language theoretic security, mobile security, and Web 2.0 security and privacy. Also at the symposium is a NITRD panel, featuring panelists from NSF, DHS S&T and other government agencies involved in creating the 2015 Federal Cybersecurity R&D Strategic Plan, as well as several "Birds of a Feather" sessions including discussions on network integrity, the Federal Cybersecurity plan and privacy in affective computing.

Sean Peisert Chairs ASCR Cybersecurity Workshop and Edits Report

Sean Peisert chaired an ASCR-sponsored workshop on the subject of cybersecurity research for scientific computing integrity, a key interest of the DOE Office of Science and of ASCR in particular. Subsequently, he also compiled and edited the resulting workshop report.

IPython Featured in Nature News

IPython was recently featured in Nature News. The IPython team is led by Fernando Perez, who joined DST earlier this month. The IPython notebook makes data analysis easier to record, understand and reproduce. The article is available here and a live, interactive demo is available here.

Best Paper Award at the IEEE Visualization Large Data Analysis and Visualization Symposium

Alexy Agranovsky of UCD was the lead author on the paper "Improved Post Hoc Flow Analysis Via Lagrangian Representations," which won the Best Paper Award at the IEEE Visualization Large Data Analysis and Visualization Symposium. The basic idea is that flow field analysis can be done more accurately using a Lagrangian basis rather than an Eulerian basis, and that the work needed to produce the Lagrangian analysis can be done in situ, which yields not only more accurate analysis but also much lower I/O cost. Hank Childs of DST led the team. Additional contributors to the paper are David Camp, Christoph Garth, E. Wes Bethel, and Kenneth I. Joy.

More information about the symposium can be found here.

Agarwal Named as Inria International Chair

The Inria Research Center in Rennes, France has awarded Deb Agarwal an International Chair position. This position is part of the DALHIS Associated Team, which is a collaboration between Dr. Christine Morin’s Inria Myriads team and the Data Science and Technology Department.

CRD Reorganization creates Data Science and Technology Department

The Data Science and Technology Department was announced today. This department brings together groups working across the span of data science problems. The new department is made up of the Integrated Data Frameworks group led by Dan Gunter, the Scientific Data Management group led by John Wu, the Data Analytics and Visualization group led by Wes Bethel, and the Usable Software Systems group led by Lavanya Ramakrishnan. These groups have a long history of research and development in data science. This reorganization better aligns the groups to work together to address problems holistically. The Computational Science Department, formed at the same time, is composed of the groups focused more on a particular science and includes Craig Tull's Science Software Systems group. More details about the reorganization can be found here.

Inspiring Women in Computing

Women from Berkeley Lab's Computing Sciences area and DST delivered talks, volunteered as mentors and helped organize and energize this year's Grace Hopper Celebration of Women in Computing. More details available here.

SPOT Suite noted in NPR's Science Friday

NPR's Science Friday noted a NERSC, ESnet, CRD, and SLAC collaboration that uses the SPOT Suite developed by DST researchers. The mention was based on observations of interesting science submitted by listeners. More details available here.

Craig Tull Leading Multi-Disciplinary Team Enabling the SPOT Suite Transformation at ALS Beamlines

Earlier this year, the ALS became the first and only facility worldwide to fully automate GISAXS/GIWAXS measurements. This technique is primarily used to characterize the assembly and shape of nanoscopic objects at surfaces or buried interfaces in thin films, including materials like organic photovoltaics, fuel cell membranes or batteries. Combine this capability with SPOT Suite, and researchers can run experiments at this beamline from anywhere in the world, provided they have Internet access. Users can mount their samples onto barcoded sample holders at home and ship them to the ALS. At the facility, a robot arm transfers each new sample to the measurement stage, where it is automatically aligned into grazing incidence using the X-ray beam. A barcode reader informs the computer system which sample is mounted and how to run the sample. Data acquisition software then moves the sample to all angles pre-specified by the researcher and chooses the appropriate exposure time automatically for each image. As images are collected, SPOT Suite sends the data to NERSC via ESnet for scientists to access and view at any time. “This automated system represents a significant leap forward in terms of labor saving, ease of use and throughput,” says Alexander Hexemer, who manages the GISAXS/GIWAXS beamline at the ALS. Read more

Daniela Ushizima, Deb Agarwal, and Wes Bethel Named to the Berkeley Institute for Data Science

The Berkeley Institute for Data Science has introduced its senior fellows, including Lab researchers Deborah Agarwal, E. Wes Bethel, Peter Nugent, Saul Perlmutter, David Schlegel, James Sethian, Kimmen Sjolander, Kyle Barbary, Beth Reid, and David Culler. The funded science fellows were also named with Daniela Ushizima receiving the only award to an LBL researcher (congratulations Dani!). Go here to view the complete list.

Berkeley Lab Hosts Weeklong Discovery Workshop Covering Big Data Analytics

Today, the tools available to the scientific community are undergoing a major revolution, with a wide range of innovations enabling powerful new capabilities for knowledge discovery and analytics. The majority of these innovations are motivated by major large-scale science research initiatives. The foundation for these innovations consists of multiple new information technology architectures, tools, techniques and platforms. Berkeley Lab's Computational Research Division (CRD) and the University of California's AMPLab hosted a weeklong workshop on big data analytics from June 2-6.

This year, the workshop covered a wide range of topics including machine learning, graph processing, data security, and tools for big data analytics. "The workshop was very successful and allowed the participants to both understand the general landscape but also dive-in deep with hands-on tutorials. The key to success of the workshop was the expertise at Berkeley Lab in these areas," said organizer Lavanya Ramakrishnan of CRD's Advanced Computing for Science Department.

In addition to Berkeley Lab and UC Berkeley, the speakers were from Cloudera, Apple, Cisco, Yahoo!, UC Davis, Nebula Inc, Microsoft, Google, HP Labs, Adatao, Databricks, Hortonworks and Amazon.

Eugen Feller Completes Inria Postdoc Visit

Eugen Feller has spent the last year visiting LBNL as an Inria@SiliconValley postdoc as part of the DST Department/Inria Associated Team DALHIS. During his time at LBNL, Eugen contributed to several research projects including FRIEDA and AmeriFlux. To read more about Eugen's experiences, see the interview he gave after returning to Inria.

Taghrid Samak Highlighted in Scientific Computing

Taghrid Samak of Berkeley Lab's Computational Research Division admits with a laugh that she wasn't one of those kids who started programming on the home computer at age 10. And if she hadn't followed her father's advice, she might have ended up looking for political solutions to pressing problems, rather than working on computational approaches to scientific challenges. Read more.

DST Intern Amy Nesky Presents Poster at the SULI Poster Session

DST intern Amy Nesky presented a poster on the work she did for the Baryon Oscillation Spectroscopic Survey (BOSS) at the Fall SULI poster session. Amy's work involved designing and developing an interactive, visual analytics website to help track and analyze the progress of the survey. See the poster for more details.

DST Team Helping Usher in a New Era of Light Source Computational Science

DST's Craig Tull was recently featured in an article for DEIXIS Magazine. The project is an LDRD involving CRD, NERSC, ALS, and ESnet personnel. DST personnel involved in the project include Craig Tull (project lead), Abdelilah Essiari, and Lavanya Ramakrishnan.

DST Team Helping Materials Science

LBNL's Kristin Persson recently co-authored an article for Scientific American about the Materials Project. The project here at LBNL involves personnel from many divisions across the lab. DST personnel involved in the project include Dan Gunter and Miriam Brafman.

TechWomen Participants visit LBNL

On October 18, participants and mentors from the TechWomen program visited LBNL for a tour of its facilities, including the Advanced Light Source. After the tour, DST department head Deb Agarwal gave a presentation on how to be an exceptional leader. The visit was organized by DST's Taghrid Samak.

Craig Tull and Team Reimagining ALS Data Environment with SPOT Suite

DEIXIS Magazine Annual 2013 featured Craig Tull (DST), Dula Parkinson (ALS), Jack Deslippe (NERSC), and others in an article about the growing flow of data from light sources. The team is working to move beamline data in real time via ESnet to some of the nation's most powerful open-science computers at NERSC, where it is processed, analyzed and visualized on the fly. Tull and team are working with the Advanced Light Source (ALS) at the Lab, but the approach can be extended to work with other light sources. "We're trying to move the typical data-intensive beamline into a world where they can take advantage of leadership-class, high-performance computing abilities," says Tull, who leads the project. "In the final analysis what we're trying to do is drive a quantum leap in science productivity." Read more

Deb Agarwal Among Lab Women Honored for Contributions to Science, Education

Deborah Agarwal, head of CRD's Advanced Computing for Science Department, was among 15 women honored October 18 during the first annual Women@The Lab event. Sponsored by the lab's Diversity and Inclusion Office and the Women Scientists and Engineers Council, the event highlighted the women's contributions to science and technology as well as the lab's commitment to diversity and its support for the Science, Technology, Engineering and Mathematics (STEM) workforce.

DST members participated in the LBNL 2013 Runaround

DST Summer Students contributing on projects

As the summer starts to wind toward a new school year, we want to acknowledge the excellent work of our summer students. Here is a list of this year's summer students and a brief title for what they worked on while they were here.

  • Tonglin Li, Illinois Institute of Technology, Chicago. FRIEDA state management in cloud environments.
  • Zhao Zhang, University of Chicago. Next-generation infrastructure support for mixed workloads.
  • Morgan Hargrove, Louisiana State University. Materials Project workflow interface.
  • Ryan Rodriguez, University of California Santa Cruz. Tigres visual representation of workflows.
  • Ahmed el Hassany, Indiana University. Interoperability between ESnet lookup service and IU's lookup/topology service (UNIS) to avoid fragmentation of perfSONAR landscape going forward (joint with ESnet).
  • Karlyn Harrod, University of St. Thomas. Energy consumption models for distributed systems.
  • Jin Huang, University of Texas at Arlington. Machine learning for modeling network utilization.

Sarah Poon Organizes Talks and Tour for East Bay Consortium High School Students

Sarah Poon of DST along with Boun Khamnouane of the East Bay Consortium of Educational Institutions organized a visit of about 50 high school students from the East Bay to learn about careers in science, technology, engineering and mathematics. Read more

Taghrid Samak Works to Impact Social Development in Egypt

Since the Egyptian uprising that ultimately toppled the 30-year reign of Hosni Mubarak began on Jan. 25, 2011, Taghrid Samak of Berkeley Lab’s Computational Research Division has watched as the initial hope for her homeland has unraveled into a “messy” situation, as she puts it. But last month, Samak was at MIT, meeting with other Egyptian professionals to take concrete steps to address at least some of the pressing issues in the country that launched the Arab Spring. She chaired the 2013 EgyptNEGMA (Networking, Entrepreneurship, Growth, Mobilization, and Action) conference to review 10 finalist proposals for advancing social development in Egypt and choose the top three. Read more

DST Team Contributes to Developing Tools to Reduce Greenhouse Gases at the Source

Despite advances in alternative energy sources, the United States will continue to rely on coal-fired power plants to generate much of the nation's electricity for the next 20 years or more. While coal is an economically viable fuel, its environmental cost is high: in 2011, coal accounted for 34 percent of the energy-related carbon dioxide (CO2) emissions in the United States. This is why a U.S. Department of Energy project, called the Carbon Capture Simulation Initiative (CCSI), is bringing together America's national laboratories, industry and academic institutions to develop and deploy state-of-the-art computational modeling and simulation tools to accelerate the commercialization of carbon capture technologies in power plants. As part of this collaboration, computational researchers at Lawrence Berkeley National Laboratory (Berkeley Lab) are playing key roles in the development of the computational tools. Read more

In the context of CCSI, Joshua Boverhof of CRD's Advanced Computing for Science Department developed the Turbine Science Gateway (TSG), a code for running the AspenTech process simulation applications in parallel on cloud computing systems, clusters, or standalone machines. He recently won an honorable mention for TSG in a competition sponsored by Amazon. Read more
