Showing 1–1 of 1 results for author: Repik, J

  1. arXiv:1907.01019 [pdf, other]

    cs.DC

    Understanding Fault Scenarios and Impacts through Fault Injection Experiments in Cielo

    Authors: Valerio Formicola, Saurabh Jha, Daniel Chen, Fei Deng, Amanda Bonnie, Mike Mason, Jim Brandt, Ann Gentile, Larry Kaplan, Jason Repik, Jeremy Enos, Mike Showerman, Annette Greiner, Zbigniew Kalbarczyk, Ravishankar K. Iyer, Bill Krammer

    Abstract: We present a set of fault injection experiments performed on the ACES (LANL/SNL) Cray XE supercomputer Cielo. We use this experimental campaign to improve the understanding of failure causes and propagation that we observed in the field failure data analysis of NCSA's Blue Waters. We use the data collected from the logs and from network performance counter data 1) to characterize the fault-error-f…

    Submitted 1 July, 2019; originally announced July 2019.

    Comments: Presented at Cray User Group 2017