BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20131118T153000Z
DTEND:20131119T000000Z
LOCATION:405
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: As High-Performance Computing trends toward increasingly heterogeneous systems, scientific developers face challenges in adapting their software to take advantage of them. For instance, many systems feature nodes that couple multi-core processors with GPU-based computational accelerators, such as the NVIDIA® Kepler, or with many-core coprocessors, such as the Intel® Xeon Phi coprocessor. To use these systems effectively, scientific programmers need to expose as much parallelism in their applications as possible. Developers also need to juggle technologies including MPI, OpenMP, CUDA, and OpenACC. Troubleshooting, debugging, and optimizing applications are an expected part of porting, but they become even more critical when so many technologies are combined.=0A=0AThis tutorial provides an introduction to parallel debugging and optimization. Debugging techniques covered include MPI and subset debugging, process and thread sets, reverse and comparative debugging, and techniques for debugging CUDA, OpenACC, and Intel Xeon Phi coprocessor code. Participants will have the opportunity to do hands-on CUDA and Intel Xeon Phi coprocessor debugging using TotalView on a cluster at RWTH Aachen University and on Keeneland and Beacon at NICS, so they should bring a network-capable laptop to the session. Optimization techniques will include profiling, tracing, and cache memory optimization. Examples will use ThreadSpotter and vendor-supplied tools.
SUMMARY:Debugging and Optimizing MPI and OpenMP Applications Running on CUDA, OpenACC®, and Intel® Xeon Phi Coprocessors with TotalView®
PRIORITY:3
END:VEVENT
END:VCALENDAR