Program

Thursday, August 25, 2016, 9:00am – 5:30pm

8:00am-9:00am 

Working Breakfast and Networking (Breakfast provided)
Agenda: Discussion of Panel Topics

9:00am-9:10am 

Opening Remarks
Neena Imam, Director of Research Collaboration, Computing and Computational Sciences Directorate, ORNL

9:10am-10:10am 

Keynote Talk
FITARA Data Center Optimization Initiative (DCOI) – How It Applies to HPC
Jake Wooley, Program Manager, DOE

Abstract: As required by the Federal Information Technology Acquisition Reform Act (FITARA), OMB has issued new guidance on federal data center consolidation and optimization. While the Data Center Optimization Initiative (DCOI) guidance continues to promote data center consolidation, it also includes new optimization goals that were initially included in Executive Order 13693 (Planning for Federal Sustainability in the Next Decade). Mr. Wooley will explain the performance goals and metrics in the new DCOI memo and how they specifically apply to HPC systems and data centers.
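
For reference, DCOI's headline efficiency metric is power usage effectiveness (PUE), the ratio of total facility energy to IT equipment energy (the DCOI memo sets PUE targets for existing data centers, with 1.5 the commonly cited threshold). A minimal sketch of the calculation, with illustrative numbers:

```python
# Illustrative PUE calculation; meter values below are hypothetical.
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness = total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kwh / it_equipment_kwh

# Example: 12,000 kWh total facility draw against 9,000 kWh of IT load.
print(f"PUE = {pue(12_000, 9_000):.2f}")  # -> PUE = 1.33, under a 1.5 target
```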

10:10am-10:25am

Coffee Break

10:25am-11:55am  

Session 1

Talk 1

HPC Data Center Power and Energy Monitoring at LRZ
Torsten Wilde, LRZ

Abstract: Reducing the power and energy consumption of HPC data centers is one of the essential areas of exascale research. The Leibniz Supercomputing Centre has been at the forefront of HPC energy efficiency research since 2011, when hot-water (chiller-less) direct liquid cooling was installed in its data center. This talk will introduce the 4 Pillar Framework, a common frame of reference for energy usage in a data center. It will also discuss how power- and energy-related data is collected at LRZ and what is done with the collected data. Since "computing under a power bound" is a major topic in the US HPC community, the talk will briefly cover LRZ's power contract and what a power bound could mean for the SuperMUC HPC system. It will conclude with an example of cooling-infrastructure data analytics that highlights the importance of advanced analytic tools for data center operators.
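
As a flavor of what collected power data supports, here is a minimal, hypothetical sketch of turning sampled system power into energy and checking it against a power bound (all values are illustrative, not LRZ's):

```python
import numpy as np

# Hypothetical sketch: convert sampled system power (W) into energy (kWh)
# and flag samples that exceed a facility power bound.
timestamps = np.array([0.0, 60.0, 120.0, 180.0])      # seconds
power_w    = np.array([2.1e6, 2.4e6, 2.3e6, 2.2e6])   # system power samples

energy_kwh = np.trapz(power_w, timestamps) / 3.6e6    # W*s -> kWh
power_bound_w = 2.35e6
over_bound = timestamps[power_w > power_bound_w]

print(f"energy over interval: {energy_kwh:.1f} kWh")
print(f"samples over bound at t = {over_bound} s")
```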

Talk 2

Preparing for Exascale Computing Power Demands Using a Data Driven Approach
Ghaleb Abdulla, LLNL

Abstract: Managing power demands is a priority at Lawrence Livermore National Laboratory. The current Sequoia machine draws up to 9 MW, and combined with other machines the power demand reaches approximately 20 MW. The HPC center at LLNL is concerned about power quality, cost, and environmental impact; in addition, we share our providers' interest in reducing energy costs and improving electrical grid reliability. We have implemented an extensive monitoring and data collection system to collect and analyze data from power meters, PMUs, and computer environmental and power sensors. We will describe our work on managing (storing and integrating), analyzing, and visualizing the collected data, and how the data has been used to achieve energy savings and to understand and correlate events. This effort helps prepare LLNL for the power demand challenges that exascale computing will introduce.
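
A minimal sketch of one ingredient of such a system: aligning meter and sensor streams on a common time grid so events can be correlated across sources (column names and values are invented, not LLNL's schema):

```python
import pandas as pd

# Hypothetical sketch: align facility meter data and machine sensor data on
# a common one-minute grid so events can be correlated across sources.
idx = pd.date_range("2016-08-25 09:00", periods=6, freq="20s")
meter = pd.DataFrame({"demand_mw": [18.2, 18.4, 19.9, 19.8, 18.5, 18.3]},
                     index=idx)
sensors = pd.DataFrame({"node_power_kw": [5.1, 5.2, 7.9, 7.8, 5.3, 5.2]},
                       index=idx)

joined = meter.resample("1min").mean().join(sensors.resample("1min").mean())
# A spike in facility demand can now be matched against node-level activity.
print(joined)
print(joined["demand_mw"].corr(joined["node_power_kw"]))
```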

Talk 3

Power Data for HPC: What Is It, How Is It Obtained, and What Use Is It?
Sean Wallace, Illinois Institute of Technology

Abstract: The high performance computing landscape is filled with diverse hardware components. A large part of understanding how these components compare comes from looking at their environmental characteristics, such as power consumption and temperature. Thankfully, hardware vendors have supported this by providing mechanisms to obtain such data. However, products commonly differ not only in how the data is obtained but also in what data is provided. And while advances in acquisition have made collecting the data relatively easy, using it for meaningful insight is still far from trivial. This talk will detail the types of data available, how the data is accessed, and what insights can be gained from it across a range of high performance computing systems and components.
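
One concrete example of such a vendor-provided mechanism is Intel's RAPL energy counters exposed through the Linux powercap interface. The sketch below reads the cumulative counter twice to estimate package power; paths and permissions vary by platform and kernel, and the counter wraps periodically:

```python
import time

# Intel RAPL counters via the Linux powercap interface (one mechanism among
# many; the path below varies by platform and may require elevated privileges).
RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_uj(path: str = RAPL) -> int:
    with open(path) as f:
        return int(f.read())

e0, t0 = read_uj(), time.time()
time.sleep(1.0)
e1, t1 = read_uj(), time.time()
# energy_uj is a cumulative microjoule counter; production code must also
# handle the counter wrapping back to zero.
print(f"package power ~ {(e1 - e0) / 1e6 / (t1 - t0):.1f} W")
```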

11:55am-12:55pm

Working Lunch (Provided)
Agenda: Feedback on Morning Sessions

12:55pm-2:25pm

Session 2

Talk 4

Using Streaming Analytics to Improve Operating Efficiency
Jim Rogers, ORNL

Abstract: ORNL is constructing a new warm-water energy plant that will provide up to 6,000 tons of cooling and deliver supply temperatures nearly thirty degrees warmer than the current central energy plant. The initial tenant for this facility is Summit, a hybrid computing system from IBM and NVIDIA that could generate demand of up to 20 MW. While the control systems for the mechanical plant are well understood, there is a significant gap between the information available from Summit and how that information can be used to improve the operational efficiency of the energy plant. To bridge this gap, ORNL is prototyping data collection and real-time streaming analytics techniques that can contribute to more efficient plant operation. There are significant challenges related to data volume, system scale, and integration with traditional SCADA controls. These challenges and the corresponding opportunities will be described.
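
A minimal sketch of one streaming building block: a sliding-window average that smooths noisy demand samples before they reach a plant controller (purely illustrative, not ORNL's pipeline):

```python
from collections import deque

# Hypothetical sketch of a streaming-analytics primitive: a sliding-window
# average of system power that a plant controller could consume.
class SlidingMean:
    def __init__(self, window: int):
        self.buf = deque(maxlen=window)

    def update(self, value: float) -> float:
        self.buf.append(value)
        return sum(self.buf) / len(self.buf)

smooth = SlidingMean(window=60)                 # e.g., one minute of 1 Hz samples
for sample_kw in (18500.0, 18900.0, 19400.0):   # illustrative readings
    setpoint_hint = smooth.update(sample_kw)
print(f"smoothed demand: {setpoint_hint:.0f} kW")
```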

Talk 5

Improving Efficiency with Dynamic Controls
David Grant, ORNL

Abstract: HPC system integrators provide the worst-case power consumption of a cabinet, and this figure is scaled up to the size of the system for the design of the power and cooling infrastructure. Rack count, resource management policies, the scheduler, and job types can all affect the final power usage profile; historically, average HPC power usage has been 60-75% of the worst case. The heat generated is often split between liquid and air. Together, these variables can require hybrid and/or multiple independent cooling systems, none of which is likely to operate as efficiently as it could. Good information from the start is critical, and a method of direct feedback from the HPC system may enable dynamic controls that increase the efficiency and utilization of the cooling infrastructure. This presentation will discuss what information could be shared between the facility's control systems and the HPC system, methods of communication, and where we are headed in looking toward an exascale system.
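
A toy sketch of what such a feedback loop could look like: a proportional adjustment of a cooling setpoint driven by HPC-reported load (all names and constants are hypothetical; real SCADA/PLC controls are far richer):

```python
# Hypothetical sketch: proportional adjustment of a cooling setpoint from
# HPC-reported load. Constants and the control law are illustrative only.
def next_setpoint(current_c: float, reported_load_mw: float,
                  design_load_mw: float = 20.0,
                  min_c: float = 18.0, max_c: float = 28.0) -> float:
    # A lightly loaded system tolerates warmer supply water, which saves energy.
    utilization = reported_load_mw / design_load_mw
    target = max_c - utilization * (max_c - min_c)
    # Move gently toward the target to avoid oscillation.
    return current_c + 0.25 * (target - current_c)

print(f"new setpoint: {next_setpoint(current_c=24.0, reported_load_mw=12.0):.2f} C")
```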

Talk 6

Power API Collaborations, Community, and What's Next
James Laros, SNL

Abstract: The "High Performance Computing – Power Application Programming Interface Specification" was originally released in 2014 and since then has undergone changes inspired by the community and collaborations. Portions of the specification are currently being implemented as part of the NNSA's first Advanced Technology System (Trinity) as part of a non-recurring engineering (NRE) project with Cray Inc. The specification is also playing an important role in the Trinity NRE collaboration with Adaptive Computing. Community involvement inspired by the Energy Efficient High Performance Computing (EEHPC) working group has led to collaborations with Intel to align the GEO API and Power API specification interfaces. Likewise, collaborations with the developers of REDFISH and the Power API specification are aligning their efforts wherever possible. This talk will present a brief update of changes to the Power API specification since its introduction and cover these and other ongoing collaborations. The question of how to move forward as a community will be posed and open for discussion.

2:25pm-2:40pm

Coffee Break

2:40pm-4:10pm

Session 3

Talk 7

Power Signatures of HPC Workloads
Suzanne Rivoire, Sonoma State University

Abstract: Workload-aware power management and scheduling techniques have the potential to save energy while minimizing negative impact on performance. The effectiveness of these techniques depends on the stability of a workload's power consumption pattern across different input data, resource allocations (e.g. number of cores), and hardware platforms. This talk will discuss techniques for summarizing the distinctive power consumption behavior of HPC workloads into signatures and using these signatures to identify known and never-before-seen tasks based only on traces of their power consumption.
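
A minimal, hypothetical sketch of the signature idea: compress a power trace into a small feature vector (mean, variation, coarse spectrum) and match unknown traces to the nearest known signature (workload names and data below are invented):

```python
import numpy as np

# Hypothetical sketch of a power "signature": compress a power trace into a
# small feature vector, then match unknown traces to the nearest known one.
def signature(trace: np.ndarray) -> np.ndarray:
    spectrum = np.abs(np.fft.rfft(trace - trace.mean()))
    dominant = spectrum[1:6]                 # coarse periodic structure
    return np.concatenate(([trace.mean(), trace.std()], dominant))

known = {
    "periodic_solver": signature(np.sin(np.linspace(0, 20, 200)) + 5.0),
    "steady_idle":     signature(np.full(200, 1.0)
                                 + 0.01 * np.random.randn(200)),
}
unknown = signature(np.sin(np.linspace(0, 20, 200)) + 5.1)
best = min(known, key=lambda k: np.linalg.norm(known[k] - unknown))
print(f"closest known workload: {best}")
```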

Talk 8

Prediction and Characterization of Application Power Use in a High Performance Computing Environment
Caleb Phillips, NREL

Abstract: The Energy Systems Integration Facility (ESIF) at the National Renewable Energy Laboratory (NREL) in Golden, Colorado houses one of the most efficient HPC data centers in the world through an innovative integration of the HPC system with the building and campus infrastructure. This integrated environment offers a testbed to explore trade-offs with respect to the power and energy constraints that we believe will typify future data centers. Power use in traditional data centers and high performance computing facilities has grown in tandem with increases in the size and number of these facilities. U.S. data center electrical energy consumption reached 91 billion kWh in 2013 and is expected to grow to 140 billion kWh by 2020. Meanwhile, the market for data center construction is projected to register a compound annual growth rate of 22%. Motivated by this observation, we have endeavored to better understand the underlying factors that drive energy consumption in an HPC environment by monitoring node-level power use on the Peregrine supercomputer using HP Integrated Lights-Out (iLO) sensors and a custom informatics system. By analyzing massive collections of detailed power time-series data and metadata about job submissions, we have been able to show that there is substantial variation between applications, that power use for many jobs has a strong periodic structure, and that ensemble machine learning methods can accurately forecast per-task and whole-system power use in real time. These results have immediate applications to power-aware scheduling software that aims to conserve energy and optimize power use during periods of peak load.
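
A toy sketch of the forecasting idea using an off-the-shelf ensemble regressor on job metadata (features and numbers are invented, not NREL's model):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical sketch of ensemble forecasting of per-job power from job
# metadata; features and values are illustrative only.
# Features: [node_count, requested_walltime_h, application_id]
X = np.array([[16, 2.0, 0], [64, 8.0, 1], [32, 4.0, 0], [128, 12.0, 2]])
y = np.array([5.1, 22.8, 10.3, 48.0])   # observed mean job power (kW)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(f"forecast: {model.predict([[64, 6.0, 0]])[0]:.1f} kW")
```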

Talk 9

Application Trapped Capacity and Energy Reporting
Mark O'Connor, Allinea

Abstract: We have spent the past year working with the Computational Research and Development Programs at Oak Ridge National Laboratory to develop application-specific trapped capacity reports. The motivation and results of this collaboration will be presented and discussed. We will also provide an update on our energy research elsewhere, including projects to track application energy usage per function and to export application-energy data to optimizing compilers and workload schedulers.
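
"Trapped capacity" generally refers to provisioned power and cooling that installed systems never actually draw; a toy illustration of the bookkeeping (numbers invented, not from the ORNL collaboration):

```python
# Hypothetical sketch: trapped capacity as the gap between provisioned power
# and what an application actually draws; numbers are illustrative.
provisioned_kw = 9000.0          # power/cooling built out for the worst case
observed_peak_kw = 6300.0        # measured peak during the application run

trapped_kw = provisioned_kw - observed_peak_kw
print(f"trapped capacity: {trapped_kw:.0f} kW "
      f"({100 * trapped_kw / provisioned_kw:.0f}% of provisioned)")
```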

4:10pm-4:20pm 

Coffee Break

4:20pm-5:20pm

Panel Discussion

Moderated by Natalie Bates, chair of the Energy Efficient HPC Working Group, the panel brings together specialists from Allinea, Cray, HPE, IBM, and Intel to foster a conversation between users and vendors on the challenges and solutions of knowledge discovery for HPC power management.

Panelists:
Allinea – Mark O'Connor
Cray – Steven J. Martin
HPE – Nicolas Dubé
IBM – Todd Rosedahl
Intel – Michael K. Patterson

5:20pm-5:30pm

Closing Remarks