Simulating the weak death of the neutron in a femtoscale universe with near-Exascale computing
Authors:
Evan Berkowitz,
M. A. Clark,
Arjun Gambhir,
Ken McElvain,
Amy Nicholson,
Enrico Rinaldi,
Pavlos Vranas,
André Walker-Loud,
Chia Cheng Chang,
Bálint Joó,
Thorsten Kurth,
Kostas Orginos
Abstract:
The fundamental particle theory called Quantum Chromodynamics (QCD) dictates everything about protons and neutrons, from their intrinsic properties to interactions that bind them into atomic nuclei. Quantities that cannot be fully resolved through experiment, such as the neutron lifetime (whose precise value is important for the existence of light-atomic elements that make the sun shine and life p…
▽ More
The fundamental particle theory called Quantum Chromodynamics (QCD) dictates everything about protons and neutrons, from their intrinsic properties to interactions that bind them into atomic nuclei. Quantities that cannot be fully resolved through experiment, such as the neutron lifetime (whose precise value is important for the existence of light-atomic elements that make the sun shine and life possible), may be understood through numerical solutions to QCD. We directly solve QCD using Lattice Gauge Theory and calculate nuclear observables such as neutron lifetime. We have developed an improved algorithm that exponentially decreases the time-to solution and applied it on the new CORAL supercomputers, Sierra and Summit. We use run-time autotuning to distribute GPU resources, achieving 20% performance at low node count. We also developed optimal application mapping through a job manager, which allows CPU and GPU jobs to be interleaved, yielding 15% of peak performance when deployed across large fractions of CORAL.
△ Less
Submitted 10 October, 2018; v1 submitted 3 October, 2018;
originally announced October 2018.
Job Management and Task Bundling
Authors:
Evan Berkowitz,
Gustav R. Jansen,
Kenneth McElvain,
André Walker-Loud
Abstract:
High Performance Computing is often performed on scarce and shared computing resources. To ensure computers are used to their full capacity, administrators often incentivize large workloads that are not possible on smaller systems. Measurements in Lattice QCD frequently do not scale to machine-size workloads. By bundling tasks together we can create large jobs suitable for gigantic partitions. We…
▽ More
High Performance Computing is often performed on scarce and shared computing resources. To ensure computers are used to their full capacity, administrators often incentivize large workloads that are not possible on smaller systems. Measurements in Lattice QCD frequently do not scale to machine-size workloads. By bundling tasks together we can create large jobs suitable for gigantic partitions. We discuss METAQ and mpi_jm, software developed to dynamically group computational tasks together, that can intelligently backfill to consume idle time without substantial changes to users' current workflows or executables.
△ Less
Submitted 5 October, 2017;
originally announced October 2017.