Title: How to Optimize and Run MPI Workloads on AWS with Our Latest Services
Speakers: Linda S. Hedges and Raghu Raja (Amazon Web Services)
Come to a hands-on workshop designed to show and explain the essentials of optimizing HPC applications on AWS. The tutorial starts with an introduction to common workloads run in the cloud and a discussion of the AWS services, instance types, storage and networking options that target HPC workloads. A hands-on tutorial will walk through the setup and running of a common HPC workload that relies heavily on the network. Elastic Fabric Adapter (EFA) is an AWS network interface designed specifically for HPC applications requiring high levels of inter-instance communication, such as computational fluid dynamics, weather modeling, and reservoir simulation. It uses a custom-built operating-system bypass technique to enhance the performance of inter-instance communication, which is critical to scaling HPC applications. With EFA, HPC applications using popular HPC technologies like the Message Passing Interface (MPI) can scale to thousands of CPU cores. You'll learn how to implement EFA to get maximum scalability for your workloads.
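As a hedged illustration of the kind of launch workflow the hands-on portion covers, the following sketch checks for the EFA libfabric provider and launches an MPI job over it. The MCA flag names assume Open MPI 4.x with libfabric support; the rank count and the `./my_cfd_solver` binary are placeholders, not part of the tutorial material.

```shell
# Verify that the EFA libfabric provider is visible on this instance
fi_info -p efa

# Launch an MPI job over EFA using Open MPI's libfabric (OFI) transport.
# Rank count and solver binary below are illustrative placeholders.
mpirun -n 72 \
    --mca pml cm \
    --mca mtl ofi \
    --mca mtl_ofi_provider_include efa \
    ./my_cfd_solver
```

If `fi_info` does not list the `efa` provider, the EFA driver and libfabric stack are not installed or the instance type does not support EFA.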
Title: System Innovation in DCI Transport Networks
Speakers: Loukas Paraschis and Abhinava Shivakumar Sadasivarao (Infinera)
Traffic interconnecting data centers (DCI) has grown more than any other transport network traffic type, and has been projected to grow by at least two more orders of magnitude. The economics of this growth motivated the building of dedicated DCI networks, with some of the most spectrally efficient fiber deployments globally. It also motivated a new class of purpose-built, DCI-optimized routing and optical transport systems. Hence, DCI has been the most significant evolution in transport networking this decade, and arguably since the earlier major transitions from TDM to IP/MPLS and WDM.
This tutorial reviews the most important DCI innovations and their increasingly important role in transport networks more generally. Notably, it reviews the main DCI network requirements, the associated optimizations in routers that focus on maximizing throughput rather than routing scale, and the high-capacity, typically point-to-point, WDM systems that have been the first to employ state-of-the-art coherent transmission. DCI has also pioneered in transport the extensive adoption of software innovations in automation, programmability, management abstraction, and control-plane disaggregation, typically referred to collectively as "SDN", along with the associated "open" transport architectures. Moreover, DCI is driving significant emerging innovations, including 400GE coherent WDM "ZR" pluggables in DCI routers and the potential value of network optimization and traffic engineering based on network analytics. We discuss the value of these innovations and the associated trade-offs, along with future research topics and related emerging standards.
Title: The CODES/TraceR Framework for Continuing Innovation of HPC Interconnects
Speakers: Abhinav Bhatele (Lawrence Livermore National Laboratory) and Neil McGlohon (Rensselaer Polytechnic Institute)
With the frontier of exascale-level high-performance computing (HPC) upon us, it is becoming ever more crucial to obtain accurate and reliable predictions of prospective interconnect performance. The cost of building a new HPC system makes it risky to rely solely on analytical estimates and metrics. Full-scale simulation of network interconnects across a broad variety of workloads and configurations can grant crucial insight into the viability of prospective designs.
This tutorial will introduce CODES/TraceR, a flexible interconnect simulation framework built on top of the ROSS parallel discrete-event simulation (PDES) environment. We will present the capabilities of this framework and describe how they can be used to predict real-world interconnect viability and performance with minimal effort. Additionally, the tutorial will cover recent additions to the CODES framework, specifically support for Intel Scalable Workload Model (SWM) online workloads and Quality of Service features through traffic classes.
The tutorial will include a from-the-ground-up setup and execution procedure, and present case studies of recent work showing how the framework can be used to drive innovation in HPC system interconnects.
Title: HPC Meets Distributed Deep Learning
Speakers: D.K. Panda, Ammar Ahmed Awan, and Hari Subramoni (Ohio State University)
Recent advances in Deep Learning (DL) have led to many exciting challenges and opportunities for CS and AI researchers alike. Modern DL frameworks like TensorFlow, PyTorch, and several others offer the ease of use and flexibility to train and deploy various types of Deep Neural Networks (DNNs). In this tutorial, we will provide an overview of interesting trends in DNN design and of how cutting-edge hardware architectures and high-performance interconnects are playing a key role in moving the field forward. We will also present an overview of different DNN architectures and DL frameworks. Most DL frameworks started with a single-node design; however, approaches to parallelize the process of DNN training are being actively explored. The DL community has pursued different distributed training designs that exploit communication runtimes like gRPC, MPI, and NCCL. We highlight new challenges and opportunities for communication runtimes to exploit high-performance interconnects and efficiently support large-scale distributed DNN training. We also highlight some of our co-design efforts to utilize CUDA-aware MPI for large-scale DNN training on GPU clusters. Finally, we include hands-on exercises to enable attendees to gain first-hand experience running distributed DNN training experiments on a modern GPU cluster.
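To make the role of the communication runtime concrete, here is a hedged, single-process sketch of the ring all-reduce pattern that runtimes such as NCCL and MPI use to sum (and then average) gradients in data-parallel DNN training. Real implementations exchange chunks between ranks over the interconnect; here each "rank" is just a Python list, so only the communication schedule is illustrated, not actual network transfer.

```python
# Hedged sketch: simulate ring all-reduce for gradient summation.
# "Ranks" are lists in one process; only the schedule is real.

def ring_allreduce(grads):
    """Sum same-length gradient vectors across all ranks via a ring.

    grads: one gradient list per rank. Returns per-rank results; after
    the reduce-scatter and all-gather phases every rank holds the sum.
    """
    n = len(grads)
    length = len(grads[0])
    chunks = [list(g) for g in grads]
    # Partition indices into n contiguous chunks, one "owned" per rank.
    bounds = [(r * length // n, (r + 1) * length // n) for r in range(n)]

    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) mod n
    # to rank r+1, which adds it. Messages are snapshotted first so all
    # sends in a step happen "concurrently", as on a real ring.
    for s in range(n - 1):
        msgs = []
        for r in range(n):
            lo, hi = bounds[(r - s) % n]
            msgs.append(((r + 1) % n, (r - s) % n, chunks[r][lo:hi]))
        for dst, c, data in msgs:
            lo, hi = bounds[c]
            for j in range(hi - lo):
                chunks[dst][lo + j] += data[j]

    # Phase 2: all-gather. Rank r now holds the fully reduced chunk
    # (r + 1) mod n; circulate the reduced chunks around the ring.
    for s in range(n - 1):
        msgs = []
        for r in range(n):
            lo, hi = bounds[(r + 1 - s) % n]
            msgs.append(((r + 1) % n, (r + 1 - s) % n, chunks[r][lo:hi]))
        for dst, c, data in msgs:
            lo, hi = bounds[c]
            chunks[dst][lo:hi] = data
    return chunks


# Three "ranks", each with a 3-element gradient; all end with the sum.
out = ring_allreduce([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(out[0])  # [12.0, 15.0, 18.0]
```

Dividing the result by the rank count yields the averaged gradient. The design point this illustrates is why ring all-reduce scales: each rank transmits roughly 2(n-1)/n of the gradient size regardless of rank count, which is what makes the interconnect bandwidth, rather than a central parameter server, the limiting resource.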