
SPLAIN: Augmenting Cybersecurity Warnings with Reasons and Data

Authors: Vera A. Kazakova, Jena D. Hwang, Bonnie J. Dorr, Yorick Wilks, J. Blake Gage, Alex Memory, Mark A. Clark

Abstract: Effective cyber threat recognition and prevention demand comprehensible forecasting systems, as prior approaches commonly offer limited and, ultimately, unconvincing information. We introduce Simplified Plaintext Language (SPLAIN), a natural language generator that converts warning data into user-friendly cyber threat explanations. SPLAIN is designed to generate clear, actionable outputs, incorporating hierarchically organized explanatory details about input data and system functionality. Given the inputs of individual sensor-induced forecasting signals and an overall warning from a fusion module, SPLAIN queries each signal for information on contributing sensors and data signals. This collected data is processed into a coherent English explanation, encompassing forecasting, sensing, and data elements for user review. SPLAIN's template-based approach ensures consistent warning structure and vocabulary. SPLAIN's hierarchical output structure allows each threat and its components to be expanded to reveal underlying explanations on demand. Our conclusions emphasize the need for designers to specify the "how" and "why" behind cyber warnings, advocate for simple structured templates in generating consistent explanations, and recognize that direct causal links in Machine Learning approaches may not always be identifiable, requiring some explanations to focus on general methodologies, such as model and training data.

Submitted 18 November, 2023; originally announced November 2023.

Comments: Presented at FLAIRS-2019 as poster (see ancillary files)

ACM Class: I.2

Journal ref: FLAIRS-2019

arXiv:2311.05341 [pdf, other]

doi 10.3847/1538-4365/acfef6

Accelerating Dedispersion using Many-Core Architectures

Authors: Jan Novotný, Karel Adámek, M. A. Clark, Mike Giles, Wesley Armour

Abstract: Astrophysical radio signals are excellent probes of extreme physical processes that emit them. However, to reach Earth, electromagnetic radiation passes through the ionised interstellar medium (ISM), introducing a frequency-dependent time delay (dispersion) to the emitted signal. Removing dispersion enables searches for transient signals like Fast Radio Bursts (FRB) or repeating signals from isolated pulsars or those in orbit around other compact objects. The sheer volume and high resolution of data that next generation radio telescopes will produce require High-Performance Computing (HPC) solutions and algorithms to be used in time-domain data processing pipelines to extract scientifically valuable results in real-time. This paper presents a state-of-the-art implementation of brute force incoherent dedispersion on NVIDIA GPUs, and on Intel and AMD CPUs. We show that our implementation is 4x faster (8-bit 8192 channels input) than other available solutions and demonstrate, using 11 existing telescopes, that our implementation is at least 20x faster than real-time. This work is part of the AstroAccelerate package.

Submitted 9 November, 2023; originally announced November 2023.

Journal ref: The Astrophysical Journal Supplement Series, Volume 269, Number 1, 2023

arXiv:2002.02530 [pdf, other]

doi 10.1021/acs.jmedchem.0c00452

Machine learning on DNA-encoded libraries: A new paradigm for hit-finding

Authors: Kevin McCloskey, Eric A. Sigel, Steven Kearnes, Ling Xue, Xia Tian, Dennis Moccia, Diana Gikunju, Sana Bazzaz, Betty Chan, Matthew A. Clark, John W. Cuozzo, Marie-Aude Guié, John P. Guilinger, Christelle Huguet, Christopher D. Hupp, Anthony D. Keefe, Christopher J. Mulhern, Ying Zhang, Patrick Riley

Abstract: DNA-encoded small molecule libraries (DELs) have enabled discovery of novel inhibitors for many distinct protein targets of therapeutic value through screening of libraries with up to billions of unique small molecules. We demonstrate a new approach applying machine learning to DEL selection data by identifying active molecules from a large commercial collection and a virtual library of easily synthesizable compounds. We train models using only DEL selection data and apply automated or automatable filters with chemist review restricted to the removal of molecules with potential for instability or reactivity. We validate this approach with a large prospective study (nearly 2000 compounds tested) across three diverse protein targets: sEH (a hydrolase), ERα (a nuclear receptor), and c-KIT (a kinase). The approach is effective, with an overall hit rate of ~30% at 30 µM and discovery of potent compounds (IC50 < 10 nM) for every target. The model makes useful predictions even for molecules dissimilar to the original DEL and the compounds identified are diverse, predominantly drug-like, and different from known ligands. Collectively, the quality and quantity of DEL selection data; the power of modern machine learning methods; and access to large, inexpensive, commercially-available libraries create a powerful new approach for hit finding.

Submitted 31 January, 2020; originally announced February 2020.

arXiv:1810.01609 [pdf, other]

doi 10.1109/SC.2018.00058

Simulating the weak death of the neutron in a femtoscale universe with near-Exascale computing

Authors: Evan Berkowitz, M. A. Clark, Arjun Gambhir, Ken McElvain, Amy Nicholson, Enrico Rinaldi, Pavlos Vranas, André Walker-Loud, Chia Cheng Chang, Bálint Joó, Thorsten Kurth, Kostas Orginos

Abstract: The fundamental particle theory called Quantum Chromodynamics (QCD) dictates everything about protons and neutrons, from their intrinsic properties to interactions that bind them into atomic nuclei. Quantities that cannot be fully resolved through experiment, such as the neutron lifetime (whose precise value is important for the existence of light-atomic elements that make the sun shine and life possible), may be understood through numerical solutions to QCD. We directly solve QCD using Lattice Gauge Theory and calculate nuclear observables such as neutron lifetime. We have developed an improved algorithm that exponentially decreases the time-to-solution and applied it on the new CORAL supercomputers, Sierra and Summit. We use run-time autotuning to distribute GPU resources, achieving 20% performance at low node count. We also developed optimal application mapping through a job manager, which allows CPU and GPU jobs to be interleaved, yielding 15% of peak performance when deployed across large fractions of CORAL.

Submitted 10 October, 2018; v1 submitted 3 October, 2018; originally announced October 2018.

Comments: 2018 Gordon Bell Finalist: 9 pages, 9 figures; v2: fixed 2 typos and appended acknowledgements

Report number: LLNL-JRNL-749850, RIKEN-iTHEMS-Report-18

ACM Class: C.1.4; D.1.3

Journal ref: Supercomputing 2018, pp. 697-705

arXiv:1710.09409 [pdf, other]

doi 10.1051/epjconf/201817509006

Performance Portability Strategies for Grid C++ Expression Templates

Authors: Peter A. Boyle, M. A. Clark, Carleton DeTar, Meifeng Lin, Verinder Rana, Alejandro Vaquero Avilés-Casco

Abstract: One of the key requirements for the Lattice QCD Application Development as part of the US Exascale Computing Project is performance portability across multiple architectures. Using the Grid C++ expression template as a starting point, we report on the progress made with regards to the Grid GPU offloading strategies. We present both the successes and issues encountered in using CUDA, OpenACC and Just-In-Time compilation. Experimentation and performance on GPUs with a SU(3)×SU(3) streaming test will be reported. We will also report on the challenges of using current OpenMP 4.x for GPU offloading in the same code.

Submitted 25 October, 2017; originally announced October 2017.

Comments: 8 pages, 4 figures. Talk presented at the 35th International Symposium on Lattice Field Theory, 18-24 June 2017, Granada, Spain

arXiv:1501.05992 [pdf, other]

doi 10.1017/pasa.2015.5

The Murchison Widefield Array Correlator

Authors: S. M. Ord, B. Crosse, D. Emrich, D. Pallot, R. B. Wayth, M. A. Clark, S. E. Tremblay, W. Arcus, D. Barnes, M. Bell, G. Bernardi, N. D. R. Bhat, J. D. Bowman, F. Briggs, J. D. Bunton, R. J. Cappallo, B. E. Corey, A. A. Deshpande, L. deSouza, A. Ewell-Wice, L. Feng, R. Goeke, L. J. Greenhill, B. J. Hazelton, D. Herne, et al. (42 additional authors not shown)

Abstract: The Murchison Widefield Array (MWA) is a Square Kilometre Array (SKA) Precursor. The telescope is located at the Murchison Radio-astronomy Observatory (MRO) in Western Australia (WA). The MWA consists of 4096 dipoles arranged into 128 dual polarisation aperture arrays forming a connected element interferometer that cross-correlates signals from all 256 inputs. A hybrid approach to the correlation task is employed, with some processing stages being performed by bespoke hardware, based on Field Programmable Gate Arrays (FPGAs), and others by Graphics Processing Units (GPUs) housed in general purpose rack mounted servers. The correlation capability required is approximately 8 TFLOPS (Tera FLoating point Operations Per Second). The MWA has commenced operations and the correlator is generating 8.3 TB/day of correlation products, that are subsequently transferred 700 km from the MRO to Perth (WA) in real-time for storage and offline processing. In this paper we outline the correlator design, signal path, and processing elements and present the data format for the internal and external interfaces.

Submitted 23 January, 2015; originally announced January 2015.

Comments: 17 pages, 9 figures. Accepted for publication in PASA. Some figures altered to meet astro-ph submission requirements

arXiv:1408.5925 [pdf, other]

doi 10.1109/IPDPS.2014.112

A Framework for Lattice QCD Calculations on GPUs

Authors: F. T. Winter, M. A. Clark, R. G. Edwards, B. Joó

Abstract: Computing platforms equipped with accelerators like GPUs have proven to provide great computational power. However, exploiting such platforms for existing scientific applications is not a trivial task. Current GPU programming frameworks such as CUDA C/C++ require low-level programming from the developer in order to achieve high performance code. As a result, porting of applications to GPUs is typically limited to time-dominant algorithms and routines, leaving the remainder not accelerated, which can open a serious Amdahl's law issue. The lattice QCD application Chroma allows us to explore a different porting strategy. The layered structure of the software architecture logically separates the data-parallel from the application layer. The QCD Data-Parallel software layer provides data types and expressions with stencil-like operations suitable for lattice field theory and Chroma implements algorithms in terms of this high-level interface. Thus by porting the low-level layer one can effectively move the whole application in one swing to a different platform. The QDP-JIT/PTX library, the reimplementation of the low-level layer, provides a framework for lattice QCD calculations for the CUDA architecture. The complete software interface is supported and thus applications can be run unaltered on GPU-based parallel computers. This reimplementation was possible due to the availability of a JIT compiler (part of the NVIDIA Linux kernel driver) which translates an assembly-like language (PTX) to GPU code. The expression template technique is used to build PTX code generators and a software cache manages the GPU memory. This reimplementation allows us to deploy an efficient implementation of the full gauge-generation program with dynamical fermions on large-scale GPU-based machines such as Titan and Blue Waters which accelerates the algorithm by more than an order of magnitude.

Submitted 25 August, 2014; originally announced August 2014.

Comments: 10 pages, 6 figures, as published in the proceedings of IPDPS '14

arXiv:1407.8116 [pdf, other]

doi 10.1007/s00450-015-0300-5

Optimizing performance per watt on GPUs in High Performance Computing: temperature, frequency and voltage effects

Authors: D. C. Price, M. A. Clark, B. R. Barsdell, R. Babich, L. J. Greenhill

Abstract: The magnitude of the real-time digital signal processing challenge attached to large radio astronomical antenna arrays motivates use of high performance computing (HPC) systems. The need for high power efficiency (performance per watt) at remote observatory sites parallels that in HPC broadly, where efficiency is an emerging critical metric. We investigate how the performance per watt of graphics processing units (GPUs) is affected by temperature, core clock frequency and voltage. Our results highlight how the underlying physical processes that govern transistor operation affect power efficiency. In particular, we show experimentally that GPU power consumption grows non-linearly with both temperature and supply voltage, as predicted by physical transistor models. We show lowering GPU supply voltage and increasing clock frequency while maintaining a low die temperature increases the power efficiency of an NVIDIA K20 GPU by up to 37-48% over default settings when running xGPU, a compute-bound code used in radio astronomy. We discuss how temperature-aware power models could be used to reduce power consumption for future HPC installations. Automatic temperature-aware and application-dependent voltage and frequency scaling (T-DVFS and A-DVFS) may provide a mechanism to achieve better power efficiency for a wider range of codes running on GPUs.

Submitted 20 October, 2015; v1 submitted 30 July, 2014; originally announced July 2014.

Comments: In Computer Science - Research and Development special issue on Energy-Aware High-Performance Computing. The final publication is available at Springer via http://dx.doi.org/10.1007/s00450-015-0300-5

arXiv:1107.4264 [pdf, other]

Accelerating Radio Astronomy Cross-Correlation with Graphics Processing Units

Authors: M. A. Clark, P. C. La Plante, L. J. Greenhill

Abstract: We present a highly parallel implementation of the cross-correlation of time-series data using graphics processing units (GPUs), which is scalable to hundreds of independent inputs and suitable for the processing of signals from "Large-N" arrays of many radio antennas. The computational part of the algorithm, the X-engine, is implemented efficiently on Nvidia's Fermi architecture, sustaining up to 79% of the peak single precision floating-point throughput. We compare performance obtained for hardware- and software-managed caches, observing significantly better performance for the latter. The high performance reported involves use of a multi-level data tiling strategy in memory and use of a pipelined algorithm with simultaneous computation and transfer of data from host to device memory. The speed of code development, flexibility, and low cost of the GPU implementations compared to ASIC and FPGA implementations have the potential to greatly shorten the cycle of correlator development and deployment, for cases where some power consumption penalty can be tolerated.

Submitted 1 August, 2011; v1 submitted 21 July, 2011; originally announced July 2011.

Comments: Submitted to the International Journal of High Performance Computing Applications (IJHPCA). 36 pages and 8 figures

Showing 1–9 of 9 results for author: Clark, M A