SC13 SCHEDULE: NOV 16-22, 2013

Theory, Meet Practice: Challenges in Applying Failure Prediction on Large Systems

SESSION: ACM Student Research Competition Poster Reception

EVENT TYPE: ACM Student Research Competition Posters, ACM Student Research Competition

TIME: 5:15PM - 7:00PM

AUTHOR(S): Ana Gainaru

ROOM: Mile High Pre-Function

ABSTRACT:
As supercomputers grow, so does the probability that a single component fails within a given time frame. Checkpoint-restart, the classical method for surviving application failures, faces many challenges in the exascale era because of frequent and large rollbacks. A complementary approach is failure avoidance, in which the occurrence of a fault is predicted and proactive measures are taken. With the growing complexity of extreme-scale supercomputers, predicting failures in real time becomes cumbersome and presents challenges not encountered before. This work is nearly complete and presents key issues I have encountered when applying online failure prediction on the Blue Waters system. The overhead of combining fault prediction and checkpointing on both smaller and large-scale systems will also be reported. The results give insight into the challenges of achieving an effective fault prevention mechanism for current and future HPC systems.

Chair/Author Details:

Ana Gainaru - University of Illinois at Urbana-Champaign
