Research & Development: Current Projects
Compelling Computing Research & Development from A to Z

Listed here are some of the many research projects and topics underway in Computation. Projects vary in size, scope, and duration, but they share a focus on developing tools and methods that help LLNL deliver on its missions to the nation and, more broadly, advance the state of the art in high-performance scientific computing.

Application-Level Resilience

Application-level resilience is emerging as an alternative to traditional fault-tolerance approaches because it provides fault tolerance at a lower cost. LLNL researchers are implementing application-level resilience in ddcMD, which can now reincorporate lost data into its workload and continue executing in the presence of most errors without restarting the entire application.

Focus Area:
Fault Tolerance

AutomaDeD: Diagnosing Performance and Correctness Faults

AutomaDeD is a tool that automatically diagnoses performance and correctness faults in MPI applications. It has two major functionalities: identifying abnormal MPI tasks and code regions and finding the least-progressed task. The tool produces a ranking of MPI processes by their abnormality degree and specifies the regions of code where faults are first manifested.

Focus Areas:
Parallel Software Development Tools | Performance Analysis Tools | Debugging and Correctness Tools

BLAST: High-Order Finite Element Hydrodynamics

Through research funded at LLNL, scientists have developed BLAST, a high-order finite element hydrodynamics research code that improves the accuracy of simulations, provides a path to extreme parallel computing and exascale architectures, and gives a high performance computing advantage since its greater FLOP/byte ratios result in more time spent on floating point operations relative to memory transfer.

Caliper: Application Introspection System

A comprehensive understanding of the performance behavior of large-scale simulations requires the ability to compile, analyze, and compare measurements and contexts from many independent sources. Caliper, a general-purpose application introspection system, makes that task easier by connecting various independent context annotations, measurement services, and data processing services.

Focus Areas:
Parallel Software Development Tools | Job Scheduling & Resource Management | Debugging and Correctness Tools

Co-design

The Department of Energy (DOE) has a long history of deploying leading-edge computing capability for science and national security.

Cram: Running Millions of Concurrent MPI Jobs

Cram lets you easily run many small MPI jobs within a single, large MPI job by splitting MPI_COMM_WORLD up into many small communicators to run each job in the cram file independently. A job comprises the pieces needed to run a parallel MPI program. Cram was created to allow automated test suites to pack more jobs into a BG/Q partition, and to run large ensembles on systems where the scheduler will not scale.
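
The core mechanism Cram builds on is standard MPI. The sketch below (not Cram's actual interface) splits MPI_COMM_WORLD into fixed-size sub-communicators, one per packed job; the RANKS_PER_JOB constant and the printed output are illustrative only.

```c
/* Minimal sketch (not Cram's actual API): splitting MPI_COMM_WORLD into
 * many small sub-communicators so independent "jobs" can run side by side.
 * Here each group of RANKS_PER_JOB consecutive ranks becomes one job. */
#include <mpi.h>
#include <stdio.h>

#define RANKS_PER_JOB 4   /* hypothetical per-job size */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Ranks with the same color land in the same sub-communicator. */
    int color = world_rank / RANKS_PER_JOB;
    MPI_Comm job_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &job_comm);

    int job_rank, job_size;
    MPI_Comm_rank(job_comm, &job_rank);
    MPI_Comm_size(job_comm, &job_size);
    printf("job %d: rank %d of %d\n", color, job_rank, job_size);

    /* A real launcher would now run one packed job per sub-communicator,
     * using job_comm everywhere the program would normally use
     * MPI_COMM_WORLD. */

    MPI_Comm_free(&job_comm);
    MPI_Finalize();
    return 0;
}
```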

Focus Areas:
Parallel Software Development Tools | Middleware for Parallel Performance Tools

Data-Intensive Computing Solutions

New platforms are improving big data computing on Livermore’s high performance computers.

Derived Field Generation Execution Strategies

Livermore computer scientists have helped create a flexible framework that aids programmers in creating source code that can be used effectively on multiple hardware architectures.

Enhancing Image Processing Methods

Researchers are developing enhanced computed tomography image processing methods for explosives identification and other national security applications.

ESGF: Supporting Climate Research Collaboration

The Earth System Grid Federation is a web-based tool set that powers most global climate change research.

ETHOS: Enabling Technologies for High-Order Simulations

The Enabling Technologies for High-Order Simulations (ETHOS) project performs research on fundamental mathematical technologies for next-generation high-order simulations.

Focus Area:
Mesh Management

ExReDi: Extreme Resilient Discretization

Because of the end of Dennard scaling, computing capability is increasing through more processing units, not faster clock speeds.

FGFS: Fast Global File Status

Fast Global File Status (FGFS) is an open-source package that provides scalable mechanisms and programming interfaces to retrieve global information of a file, including its degree of distribution or replication and consistency. It turns expensive, non-scalable file system calls into simple string comparison operations. Most FGFS file status queries complete in 272 milliseconds or faster at 32,768 MPI processes, with the most expensive operation clocking in at less than 7 seconds.
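
The idea of replacing collective file system queries with string comparisons can be illustrated with plain MPI. The sketch below is not the FGFS API; it is a toy check of whether every rank resolves a file to the same path, using one broadcast plus local strcmp calls.

```c
/* Illustrative sketch only (not the FGFS API): deciding whether all ranks
 * see the same resolved path for a file by broadcasting one string and
 * comparing locally, instead of issuing per-rank file system queries. */
#include <mpi.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const char *target = (argc > 1) ? argv[1] : "/usr/bin/env";

    /* Each rank resolves the path once, locally. */
    char local[PATH_MAX] = "";
    if (!realpath(target, local))
        snprintf(local, sizeof(local), "<unresolved>");

    /* Rank 0's answer becomes the reference string. */
    char reference[PATH_MAX];
    strncpy(reference, local, sizeof(reference));
    MPI_Bcast(reference, PATH_MAX, MPI_CHAR, 0, MPI_COMM_WORLD);

    /* From here on it is pure string comparison plus one small reduction. */
    int same_here = (strcmp(local, reference) == 0);
    int same_everywhere = 0;
    MPI_Allreduce(&same_here, &same_everywhere, 1, MPI_INT, MPI_LAND,
                  MPI_COMM_WORLD);

    if (rank == 0)
        printf("%s is %s on all ranks\n", target,
               same_everywhere ? "identical" : "not identical");

    MPI_Finalize();
    return 0;
}
```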

Focus Areas:
Parallel Software Development Tools | Middleware for Parallel Performance Tools

Flux: Building a Framework for Resource Management

Livermore researchers have developed a toolset for solving data center bottlenecks.

Focus Areas:
Parallel Software Development Tools | Job Scheduling & Resource Management | System Software | Resource Management

GLVis: Finite Element Visualization

GLVis is a lightweight OpenGL-based tool for accurate and flexible finite element visualization. It is based on MFEM, a finite element library developed at LLNL. GLVis provides interactive visualizations of general finite element meshes and solutions, both in serial and in parallel. It encodes a large amount of parallel finite element domain-specific knowledge; e.g., it allows the user to view parallel meshes as one piece, but it also gives them the ability to isolate each component and observe it individually. It provides support for arbitrary high-order and NURBS meshes (NURBS allow more accurate geometric representation) and accepts multiple socket connections so that the user may have multiple fully-functional visualizations open at one time. GLVis can also run a batch sequence, or a series of commands, which gives the user precise control over visualizations and enables them to easily generate animations.

GREMLINs: Emulating Exascale Conditions on Today's Platforms

To overcome the shortcomings of the analytical and architectural approaches to performance modeling and evaluation, we are developing techniques that emulate the behavior of anticipated future architectures on current machines. We are implementing our emulation approaches in what we call the GREMLIN framework. Using GREMLIN, we can emulate a combined effect of power limitations and reduced memory bandwidth and then measure the impact of the GREMLIN modifications.

Focus Areas:
Parallel Software Development Tools | Performance Analysis Tools

High-order Finite Volume Methods

High-resolution finite volume methods are being developed for solving problems in complex phase space geometries, motivated by kinetic models of fusion plasmas. Techniques being investigated include conservative, high-order methods based on the method of lines for hyperbolic problems, as well as coupling to implicit solvers for field equations. Mapped multiblock grids enable alignment of the grid coordinate directions to accommodate strong anisotropy. The algorithms developed will be broadly applicable to systems of equations with conservative formulations in mapped geometries.
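
For reference, the generic method-of-lines form behind such schemes is sketched below; this is standard textbook notation, not the project's specific discretization.

```latex
% Generic method-of-lines form for a 1D hyperbolic conservation law: the
% finite volume spatial discretization, with numerical fluxes \hat{F} at
% cell faces, yields a large ODE system that is then advanced in time with
% an explicit or implicit integrator.
\[
  \frac{d \bar{u}_i}{dt}
  = -\frac{1}{\Delta x}\left( \hat{F}_{i+1/2} - \hat{F}_{i-1/2} \right),
  \qquad
  \frac{d\mathbf{u}}{dt} = \mathcal{L}(\mathbf{u}),
\]
% where high-order accuracy comes from the reconstruction used to form the
% face fluxes, and stiff field equations can be coupled through an implicit
% solver applied to the same semi-discrete system.
```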

Focus Area:
Plasma Physics

HPC Code Performance: Challenges and Solutions

LLNL researchers are finding some factors are more important in determining HPC application performance than traditionally thought.

HPSS: Data Archiving in a Supercomputing Environment

At LLNL, data isn’t just data; it’s collateral, and the Laboratory’s high performance computing (HPC) users produce massive quantities of it.

Focus Area:
I/O, Networking, and Storage

HYPRE: Scalable Linear Solvers and Multigrid Methods

Livermore’s hypre library of solvers makes larger, more detailed simulations possible by solving problems faster than ever before. It offers one of the most comprehensive suites of scalable parallel linear solvers available for large-scale scientific simulation.  

InfiniBand: Improving Communications for Large-scale Computing

Livermore Computing staff is enhancing the high-speed InfiniBand data network used in many of its high-performance computing and file systems.

Inter-job Interference

Message passing can reduce throughput for massively parallel science simulation codes by 30% or more due to contention with other jobs for the network links. We investigated potential causes of performance variability. Reducing this variability could improve overall throughput at a computer center and save energy costs.

LibRom: POD-based Reduced Order Modeling

LibRom is a library designed to facilitate Proper Orthogonal Decomposition (POD) based Reduced Order Modeling (ROM). In POD, a low-dimensional basis is extracted from snapshots of high-fidelity simulations and used to project the governing equations onto a much smaller reduced system.
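
A brief sketch of the standard POD construction, included here as general background rather than libRom's specific implementation:

```latex
% Collect m solution snapshots as columns of X, take the (thin) singular
% value decomposition, and keep the leading k left singular vectors as the
% reduced basis.
\[
  X = [\, u_1 \;\; u_2 \;\; \cdots \;\; u_m \,] \in \mathbb{R}^{n \times m},
  \qquad
  X = U \Sigma V^{T},
  \qquad
  \Phi = [\, U_{:,1} \;\; \cdots \;\; U_{:,k} \,], \quad k \ll n,
\]
% so the full state is approximated as u \approx \Phi \hat{u} with
% \hat{u} \in \mathbb{R}^{k}, and the governing equations are projected
% onto the span of \Phi to obtain the reduced-order model.
```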

LMAT: Livermore Metagenomics Analysis Toolkit

The Livermore Metagenomics Analysis Toolkit (LMAT) is genome analysis software that helps accelerate the comparison of genetic fragments with reference genomes and improve the accuracy of the results as compared to previous technologies. It tracks approximately 25 billion short sequences and is currently being evaluated for potential operational use in global biosurveillance and microbial forensics by various federal agencies.

Focus Areas:
Bioinformatics | Computational Biology

Machine Learning: Strengthening Performance Predictions

LLNL computer scientists are using machine learning to model and characterize the performance and ultimately accelerate the development of adaptive applications.

Master Block List: Protecting Against Cyber Threats

Master Block List is a service and data aggregation tool that aids Department of Energy facilities in creating filters and blocks to prevent cyber attacks.

Mathematical Techniques for Data Mining Analysis

Newly developed mathematical techniques provide important tools for data mining analysis.

Focus Area:
Data Analytics and Management

Memory-Centric Architectures

The advent of many-core processors with a greatly reduced amount of per-core memory has shifted the bottleneck in computing from FLOPs to memory. A new, complex memory/storage hierarchy is emerging, with persistent memories offering greatly expanded capacity, augmented by DRAM/SRAM cache and scratchpads to mitigate latency. Non-volatile random access memory (NVRAM), Resistive RAM (RRAM), or Phase Change Memory (PCM) may be memory or I/O bus attached, and may utilize DRAM buffers to improve latency and reduce wear.

Our research program focuses on transforming the memory-storage interface with three complementary approaches:

* Active memory and storage in which processing is shared between CPU and in-memory/storage controllers,
* Efficient software cache and scratchpad management, enabling memory-mapped access to large, local persistent stores (see the sketch after this list),
* Algorithms and applications that provide a latency-tolerant, throughput-driven, massively concurrent computation model.
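
As a minimal illustration of the second item, memory-mapped access to a local persistent store, the following sketch uses plain mmap; it is not LLNL's implementation, and the file path is hypothetical.

```c
/* Minimal sketch: memory-mapping a file on a local persistent store so it
 * can be read and updated with ordinary loads and stores instead of
 * explicit I/O calls. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t bytes = 1 << 20;             /* 1 MiB region */
    const char *path = "/tmp/persistent.dat";  /* hypothetical store */

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0 || ftruncate(fd, bytes) != 0) { perror("open/ftruncate"); return 1; }

    /* MAP_SHARED makes stores visible to the backing file. */
    char *region = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (region == MAP_FAILED) { perror("mmap"); return 1; }

    /* Application code now treats the store like ordinary memory. */
    strcpy(region, "hello, persistent memory");
    msync(region, bytes, MS_SYNC);  /* flush dirty pages back to the store */

    munmap(region, bytes);
    close(fd);
    return 0;
}
```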

MFEM: Scalable Finite Element Discretization Library

Livermore’s open-source MFEM library enables application scientists to quickly prototype parallel physics application codes based on partial differential equations (PDEs) discretized with high-order finite elements. The MFEM library is designed to be lightweight, general and highly scalable, and conceptually can be viewed as a finite element toolkit that provides the building blocks for developing finite element algorithms in a manner similar to that of MATLAB for linear algebra methods. It has a number of unique features, including: support for arbitrary order finite element meshes and spaces with both conforming and nonconforming adaptive mesh refinement; advanced finite element spaces and discretizations, such as mixed methods, DG (discontinuous Galerkin), DPG (discontinuous Petrov-Galerkin) and Isogeometric Analysis (IGA) on NURBS (Non-Uniform Rational B-Splines) meshes; and native support for the high-performance Algebraic Multigrid (AMG) preconditioners from the HYPRE library.

Focus Area:
Numerical PDEs/High-Order Discretization Modeling

MPI_T: Tools for MPI 3.0

MPI_T is an interface for tools introduced in the 3.0 version of MPI. The interface provides mechanisms for tools to access and set performance and control variables that are exposed by an MPI implementation. The latest versions of major MPI implementations are already providing MPI_T functionality, making it widely accessible to users. We have developed a set of MPI_T tools, Gyan and VarList, to help tool writers with the new interface.
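
The sketch below uses the standard MPI_T calls for enumerating control variables; it is not Gyan or VarList, and which variables appear depends entirely on the MPI implementation.

```c
/* Listing the control variables an MPI implementation exposes through the
 * MPI_T interface introduced in MPI 3.0. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, ncvar;

    /* MPI_T may be initialized before (and finalized after) MPI itself. */
    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    MPI_Init(&argc, &argv);

    MPI_T_cvar_get_num(&ncvar);
    for (int i = 0; i < ncvar; i++) {
        char name[256], desc[1024];
        int name_len = sizeof(name), desc_len = sizeof(desc);
        int verbosity, bind, scope;
        MPI_Datatype dtype;
        MPI_T_enum enumtype;

        MPI_T_cvar_get_info(i, name, &name_len, &verbosity, &dtype,
                            &enumtype, desc, &desc_len, &bind, &scope);
        printf("cvar %d: %s\n", i, name);
    }

    MPI_Finalize();
    MPI_T_finalize();
    return 0;
}
```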

Focus Areas:
Parallel Software Development Tools | Middleware for Parallel Performance Tools

Network Modeling and Simulation

To ensure that the supercomputing power at our disposal is not wasted, we must ascertain that our applications can run at their peak performance; the amount of communication in an application will be the primary determinant of performance at those scales. Fast, scalable, and accurate modeling/simulation of an application’s communication is required to prepare parallel applications for exascale.

O(N) First Principles Molecular Dynamics

LLNL researchers are developing a truly scalable first-principles molecular dynamics algorithm with O(N) complexity and controllable accuracy, capable of simulating systems of sizes that were previously impossible with this degree of accuracy. By avoiding global communications, a practical computational scheme capable of extreme scalability has been implemented.

PAVE: Performance Analysis and Visualization at Exascale

Performance analysis of parallel scientific codes is becoming increasingly difficult, and existing tools fall short in revealing the root causes of performance problems. We have developed the HAC model, which allows us to directly compare the data across domains and use data visualization and analysis tools available in other domains.

Focus Areas:
Performance Analysis Tools | Parallel Software Development Tools

PDES: Modeling Complex, Asynchronous Systems

PDES focuses on models that can accurately and effectively simulate California’s large-scale electric grid.

Focus Areas:
Parallel Discrete Event Simulation | Cyber Security | Modeling and Simulation

Phase Field Modeling

Livermore researchers have developed an algorithm for the numerical solution of a phase-field model of microstructure evolution in polycrystalline materials. The system of equations includes a local order parameter, a quaternion representation of local orientation, and species composition. The approach is based on a finite volume discretization and an implicit time-stepping algorithm. Recent developments have focused on modeling solidification in binary alloys, coupled with the CALPHAD methodology.

Power Measurement

Modern processors offer a wide range of control and measurement features that are traditionally accessed through libraries like PAPI. However, some newer features no longer follow the traditional model of counters, and all of these features are controlled through Model Specific Registers (MSRs). libMSR provides a convenient interface to access MSRs and to allow tools to utilize their full functionality.
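
To illustrate the kind of low-level access such a library relies on, rather than libMSR's own API, the sketch below reads one MSR directly through Linux's msr driver. The register address shown is an assumption (often Intel's package energy counter) and varies by processor; access typically requires elevated privileges.

```c
/* Reading a model-specific register through /dev/cpu/N/msr. Each MSR is
 * read as 8 bytes at the file offset equal to its address. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const off_t MSR_PKG_ENERGY_STATUS = 0x611;  /* assumed RAPL register */
    int fd = open("/dev/cpu/0/msr", O_RDONLY);
    if (fd < 0) { perror("open msr"); return 1; }

    uint64_t value = 0;
    if (pread(fd, &value, sizeof(value), MSR_PKG_ENERGY_STATUS) != sizeof(value)) {
        perror("pread");
        close(fd);
        return 1;
    }

    printf("raw package energy counter: %llu\n", (unsigned long long)value);
    close(fd);
    return 0;
}
```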

Focus Areas:
Parallel Software Development Tools | Middleware for Parallel Performance Tools

Predictive Vascular Modeling

Livermore researchers are enhancing HARVEY, an open-source parallel fluid dynamics application designed to model blood flow in patient-specific geometries. Researchers will use HARVEY to achieve a better understanding of vascular diseases as well as cancer cell movement through the bloodstream. Establishment of a robust research platform could have direct impact on patient care. HARVEY is also an enabling capability for the BAASiC initiative.

Focus Area:
Computational Biology

PSUADE: Uncertainty Quantification

The growth of high-performance supercomputing technology and advances in numerical techniques have resulted in the emergence of the uncertainty quantification (UQ) discipline, whose goal is to enable scientists to make precise statements about the degree of confidence they have in their simulation-based predictions. Uncertainty quantification is defined as the identification, characterization, propagation, analysis, and reduction of all uncertainties in simulation models.
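
As a minimal, generic illustration of one UQ task named above, forward propagation of input uncertainty, the sketch below uses Monte Carlo sampling of a toy model. It is not PSUADE; the model and input distributions are invented for illustration.

```c
/* Forward uncertainty propagation by Monte Carlo sampling: draw uncertain
 * inputs, run a toy model, and report the mean and variance of the output. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical simulation stand-in: y = f(x1, x2). */
static double model(double x1, double x2)
{
    return x1 * x1 + sin(x2);
}

int main(void)
{
    const int nsamples = 100000;
    double sum = 0.0, sumsq = 0.0;
    srand(12345);

    for (int i = 0; i < nsamples; i++) {
        /* Uncertain inputs: x1 ~ U(0.9, 1.1), x2 ~ U(0, 3.14). */
        double x1 = 0.9 + 0.2 * rand() / (double)RAND_MAX;
        double x2 = 3.14 * rand() / (double)RAND_MAX;
        double y = model(x1, x2);
        sum += y;
        sumsq += y * y;
    }

    double mean = sum / nsamples;
    double var = sumsq / nsamples - mean * mean;
    printf("output mean = %.4f, variance = %.4f\n", mean, var);
    return 0;
}
```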

Focus Areas:
Uncertainty Quantification | Non-intrusive UQ Methods

P^nMPI: Low-overhead Wrapper Library

PMPI is a success story for HPC tools, but it has a number of shortcomings. LLNL researchers aimed to virtualize the PMPI interface, enable dynamic linking of multiple PMPI tools, create extensions for modularity, reuse existing binary PMPI tools, and allow dynamic tool chain selection. The result is PnMPI, a thin, low-overhead wrapper library that is automatically generated from the mpi.h file and that can be linked by default.
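
For context, the classic PMPI interposition pattern that PnMPI virtualizes looks like the following minimal sketch of one wrapped call (this is not PnMPI itself): a tool defines MPI_Send, does its bookkeeping, and forwards to PMPI_Send.

```c
/* A one-call PMPI wrapper tool: compile into a library and link it ahead
 * of (or preload over) the MPI library. */
#include <mpi.h>
#include <stdio.h>

static long send_count = 0;

int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    send_count++;                 /* tool-side bookkeeping */
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}

int MPI_Finalize(void)
{
    printf("MPI_Send was called %ld times\n", send_count);
    return PMPI_Finalize();
}
```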

Focus Areas:
Parallel Software Development Tools | Middleware for Parallel Performance Tools

Qbox: Computing Electronic Structures at the Quantum Level

LLNL’s version of Qbox, a first-principles molecular dynamics code, will let researchers accurately calculate bigger systems on supercomputers.

RAJA: Managing Application Portability for Next-Generation Platforms

A Livermore-developed programming approach helps software to run on different platforms without major disruption to the source code.

ROSE Compiler

ROSE, an open-source project maintained by Livermore researchers, provides easy access to complex, automated compiler technology and assistance.

Focus Areas:
Cyber Security | Secure Coding

SAMRAI: Structured Adaptive Mesh Refinement Application Infrastructure

The Center for Applied Scientific Computing (CASC) at Lawrence Livermore National Laboratory is developing algorithms and software technology to enable the application of structured adaptive mesh refinement (SAMR) to large-scale multi-physics problems relevant to U.S. Department of Energy programs. The SAMRAI (Structured Adaptive Mesh Refinement Application Infrastructure) library is the code base in CASC for exploring application, numerical, parallel computing, and software issues associated with SAMR.

Focus Area:
Mesh Management

Scalable Quantum Molecular Dynamics Simulations

LLNL researchers are developing a new algorithm for use with first-principles molecular dynamics (FPMD) codes that will enable the number of atoms simulated to be proportional to the number of processors available; with traditional algorithms, the size of simulations is much too small to model complex systems or realistic materials. The researchers have achieved excellent scaling on 100,000 cores of Vulcan with 100,000 atoms at a rate of about 4 minutes per time step.

Focus Area:
Algorithm Development at Extreme Scale

Scaling Up Transport Sweep Algorithms

LLNL researchers are testing and enhancing a neutral particle transport code and the algorithm on which the code relies to ensure that they successfully scale to larger and more complex computing systems.

SCR: Scalable Checkpoint/Restart for MPI

To evaluate the multilevel checkpoint approach in a large-scale, production system context, LLNL researchers developed the Scalable Checkpoint/Restart (SCR) library. With SCR, we have found that jobs run more efficiently, recover more work upon failure, and reduce load on critical shared resources. Research efforts now focus on reducing the overhead of writing checkpoints even further.
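
A checkpoint loop using SCR's long-standing C calls might look like the sketch below; the file names are illustrative, and the current SCR documentation should be consulted for the exact API and configuration.

```c
/* Sketch of a checkpoint loop with SCR: SCR decides when to checkpoint and
 * where each rank's file should be written. */
#include <mpi.h>
#include <scr.h>
#include <stdio.h>

static void write_state(const char *path, int step)
{
    FILE *f = fopen(path, "w");
    if (f) { fprintf(f, "step %d\n", step); fclose(f); }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    SCR_Init();

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int step = 0; step < 100; step++) {
        /* ... advance the simulation one step ... */

        int need = 0;
        SCR_Need_checkpoint(&need);      /* SCR decides when it is worthwhile */
        if (need) {
            SCR_Start_checkpoint();

            char name[256], path[SCR_MAX_FILENAME];
            snprintf(name, sizeof(name), "ckpt.%d.%d", step, rank);
            SCR_Route_file(name, path);   /* SCR picks the storage location */
            write_state(path, step);

            SCR_Complete_checkpoint(1);   /* 1 = this rank's file is valid */
        }
    }

    SCR_Finalize();
    MPI_Finalize();
    return 0;
}
```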

Focus Area:
Fault Tolerance

Serpentine Wave Propagation

The Serpentine project develops advanced finite difference methods for solving hyperbolic wave propagation problems. Our approach is based on solving the governing equations in second order differential formulation using difference operators that satisfy the summation by parts (SBP) principle. The SBP property of our finite difference operators guarantees stability of the scheme in an energy norm.
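
One common formulation of the SBP property referenced above, stated here as standard background:

```latex
% A first-derivative SBP operator D = H^{-1} Q on a grid with endpoints
% x_1 and x_N satisfies, with H symmetric positive definite (defining a
% discrete norm),
\[
  H D + (H D)^{T} = Q + Q^{T} = B := \operatorname{diag}(-1, 0, \dots, 0, 1),
\]
% so the discrete analogue of integration by parts,
\[
  u^{T} H (D v) + (D u)^{T} H v = u_N v_N - u_1 v_1 ,
\]
% holds exactly, which is what yields stability of the scheme in the energy
% norm induced by H.
```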

Spack: A Flexible Package Manager for HPC Software

High-performance computing (HPC) software is becoming increasingly complex, quickly outpacing the capabilities of existing software management tools.

Focus Area:
Middleware for Parallel Performance Tools

Spindle: Scalable Shared Library Loading

Spindle is a tool for improving the library-loading performance of dynamically linked HPC applications. It plugs into the system’s dynamic linker and intercepts its file operations so that only one process (or a small set of processes) performs the necessary file operations and shares the results with the other processes in the job.
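
The load-once-and-share idea can be sketched in a few lines of MPI; this is conceptual only, not Spindle's interposition code, and the library path is just an example.

```c
/* One rank reads a shared library from the file system and broadcasts its
 * contents, so the file is opened once instead of once per process. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const char *libpath = (argc > 1) ? argv[1] : "/usr/lib64/libm.so.6";
    long nbytes = 0;
    char *data = NULL;

    if (rank == 0) {                       /* only rank 0 touches the file */
        FILE *f = fopen(libpath, "rb");
        if (f) {
            fseek(f, 0, SEEK_END);
            nbytes = ftell(f);
            rewind(f);
            data = malloc(nbytes);
            if (fread(data, 1, nbytes, f) != (size_t)nbytes) nbytes = 0;
            fclose(f);
        }
    }

    /* Everyone else receives the bytes over the network instead of
     * hitting the parallel file system. */
    MPI_Bcast(&nbytes, 1, MPI_LONG, 0, MPI_COMM_WORLD);
    if (rank != 0) data = malloc(nbytes);
    MPI_Bcast(data, (int)nbytes, MPI_BYTE, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("distributed %ld bytes of %s to all ranks\n", nbytes, libpath);

    free(data);
    MPI_Finalize();
    return 0;
}
```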

Focus Areas:
Parallel Software Development Tools | Middleware for Parallel Performance Tools

StarSapphire: Data-driven Modeling and Analysis

StarSapphire is a collection of projects in the area of scientific data mining focusing on the analysis of data from scientific simulations, observations, and experiments.

STAT: Discovering Supercomputers' Code Errors

LLNL’s Stack Trace Analysis Tool helps users quickly identify errors in code running on today’s largest machines.

Focus Areas:
Parallel Software Development Tools | Debugging and Correctness Tools

SUNDIALS: SUite of Nonlinear and DIfferential/ALgebraic Equation Solvers

SUNDIALS is a SUite of Nonlinear and DIfferential/ALgebraic equation Solvers. It consists of the following six solvers: CVODE, which solves initial value problems for ordinary differential equation (ODE) systems; CVODES, which solves ODE systems and includes sensitivity analysis capabilities (forward and adjoint); ARKODE, which solves initial value ODE problems with additive Runge-Kutta methods, including support for implicit-explicit (IMEX) methods; IDA, which solves initial value problems for differential-algebraic equation (DAE) systems; IDAS, which solves DAE systems and includes sensitivity analysis capabilities (forward and adjoint); and KINSOL, which solves nonlinear algebraic systems.

Task Mapping

As processors have become faster over the years, the cost of communicating data has grown higher. It is imperative to maximize data locality and minimize data movement on-node and off-node. Using profiling tools, we can characterize different classes of applications and use specialized profilers to measure specific phases of an application in detail. We can also predict the performance benefits of intelligently mapping applications by combining a variety of network and system measurements.

TESSA: Tracking Space Debris

Testbed Environment for Space Situational Awareness software helps to track satellites and space debris and prevent collisions.

Topological Analysis: Charting Data’s Peaks and Valleys

LLNL and University of Utah researchers have developed an advanced, intuitive method for analyzing and visualizing complex data sets.

TOSS: Speeding Up Commodity Cluster Computing

Researchers have been developing a standardized and optimized operating system and software for deployment across a series of Linux clusters to enable high-performance computing at a reduced cost.

Veritas: Validating Proxy Apps

Veritas provides a method for validating proxy applications to ensure that they capture the intended characteristics of their parents. Previously, the validation process has been done mostly by manually matching algorithmic steps in proxy applications to the parent or by relying on the experience of the code developer. Veritas can identify and compare performance sinks in areas such as memory, cache utilization, and network utilization.

Focus Area:
Co-design

VPC: Variable Precision Computing

Decades ago, when memory was a scarce resource, computational scientists routinely worked in single precision and were more sophisticated in dealing with the pitfalls of finite-precision arithmetic.

XBraid: Parallel Time Integration with Multigrid

The scalable multigrid reduction in time (MGRIT) approach was developed by LLNL researchers in response to a bottleneck of traditional sequential time-marching algorithms caused by stagnant clock speeds. It constructs coarse time grids and uses each coarse time scale solution to improve the next finer-scale solution, ultimately yielding an iterative scheme that simultaneously updates in parallel a solution guess over the entire space-time domain.

Focus Areas:
Nonlinear Solvers | Multigrid and Multilevel Solvers

zfp & fpzip: Floating Point Compression

zfp is an open source C/C++ library for compressed floating-point arrays that support very high throughput read and write random access. It was designed to achieve high compression ratios and therefore uses lossy but optionally error-bounded compression. fpzip is a library for lossless or lossy compression of 2D or 3D floating-point scalar fields. It was primarily designed for lossless compression.
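
A small sketch of error-bounded compression of a 3D double array with zfp's C API, following the pattern in zfp's own documentation; the array contents, sizes, and tolerance are illustrative, and the current release should be checked for exact signatures.

```c
/* Lossy, error-bounded compression of a 3D field with zfp. */
#include <stdio.h>
#include <stdlib.h>
#include <zfp.h>

int main(void)
{
    const size_t nx = 64, ny = 64, nz = 64;
    double *data = malloc(nx * ny * nz * sizeof(double));
    for (size_t i = 0; i < nx * ny * nz; i++)
        data[i] = (double)i / (nx * ny * nz);   /* placeholder field */

    /* Describe the uncompressed array and the requested error bound. */
    zfp_field *field = zfp_field_3d(data, zfp_type_double, nx, ny, nz);
    zfp_stream *zfp = zfp_stream_open(NULL);
    zfp_stream_set_accuracy(zfp, 1e-6);         /* absolute error tolerance */

    /* Allocate a buffer large enough for the worst case and attach it. */
    size_t bufsize = zfp_stream_maximum_size(zfp, field);
    void *buffer = malloc(bufsize);
    bitstream *stream = stream_open(buffer, bufsize);
    zfp_stream_set_bit_stream(zfp, stream);
    zfp_stream_rewind(zfp);

    size_t compressed = zfp_compress(zfp, field);
    printf("compressed %zu bytes to %zu bytes\n",
           nx * ny * nz * sizeof(double), compressed);

    zfp_field_free(field);
    zfp_stream_close(zfp);
    stream_close(stream);
    free(buffer);
    free(data);
    return 0;
}
```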

Focus Area:
Compression Techniques

ZFS: Improving Lustre Efficiency

Livermore computer scientists are incorporating the Zettabyte File System into their high-performance parallel file systems for better performance and scalability.