-
Representation Learning of Geometric Trees
Authors:
Zheng Zhang,
Allen Zhang,
Ruth Nelson,
Giorgio Ascoli,
Liang Zhao
Abstract:
Geometric trees are characterized by their tree-structured layout and spatially constrained nodes and edges, which significantly impacts their topological attributes. This inherent hierarchical structure plays a crucial role in domains such as neuron morphology and river geomorphology, but traditional graph representation methods often overlook these specific characteristics of tree structures. To…
▽ More
Geometric trees are characterized by their tree-structured layout and spatially constrained nodes and edges, which significantly impacts their topological attributes. This inherent hierarchical structure plays a crucial role in domains such as neuron morphology and river geomorphology, but traditional graph representation methods often overlook these specific characteristics of tree structures. To address this, we introduce a new representation learning framework tailored for geometric trees. It first features a unique message passing neural network, which is both provably geometrical structure-recoverable and rotation-translation invariant. To address the data label scarcity issue, our approach also includes two innovative training targets that reflect the hierarchical ordering and geometric structure of these geometric trees. This enables fully self-supervised learning without explicit labels. We validate our method's effectiveness on eight real-world datasets, demonstrating its capability to represent geometric trees.
△ Less
Submitted 16 August, 2024;
originally announced August 2024.
-
Automated MPI code generation for scalable finite-difference solvers
Authors:
George Bisbas,
Rhodri Nelson,
Mathias Louboutin,
Paul H. J. Kelly,
Fabio Luporini,
Gerard Gorman
Abstract:
Partial differential equations (PDEs) are crucial in modelling diverse phenomena across scientific disciplines, including seismic and medical imaging, computational fluid dynamics, image processing, and neural networks. Solving these PDEs on a large scale is an intricate and time-intensive process that demands careful tuning. This paper introduces automated code-generation techniques specifically…
▽ More
Partial differential equations (PDEs) are crucial in modelling diverse phenomena across scientific disciplines, including seismic and medical imaging, computational fluid dynamics, image processing, and neural networks. Solving these PDEs on a large scale is an intricate and time-intensive process that demands careful tuning. This paper introduces automated code-generation techniques specifically tailored for distributed memory parallelism (DMP) to solve explicit finite-difference (FD) stencils at scale, a fundamental challenge in numerous scientific applications. These techniques are implemented and integrated into the Devito DSL and compiler framework, a well-established solution for automating the generation of FD solvers based on a high-level symbolic math input. Users benefit from modelling simulations at a high-level symbolic abstraction and effortlessly harnessing HPC-ready distributed-memory parallelism without altering their source code. This results in drastic reductions both in execution time and developer effort. While the contributions of this work are implemented and integrated within the Devito framework, the DMP concepts and the techniques applied are generally applicable to any FD solvers. A comprehensive performance evaluation of Devito's DMP via MPI demonstrates highly competitive weak and strong scaling on the Archer2 supercomputer, demonstrating the effectiveness of the proposed approach in meeting the demands of large-scale scientific simulations.
△ Less
Submitted 7 May, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
A Novel Immersed Boundary Approach for Irregular Topography with Acoustic Wave Equations
Authors:
Edward Caunt,
Rhodri Nelson,
Fabio Luporini,
Gerard Gorman
Abstract:
Irregular terrain has a pronounced effect on the propagation of seismic and acoustic wavefields but is not straightforwardly reconciled with structured finite-difference (FD) methods used to model such phenomena. Methods currently detailed in the literature are generally limited in scope application-wise or non-trivial to apply to real-world geometries. With this in mind, a general immersed bounda…
▽ More
Irregular terrain has a pronounced effect on the propagation of seismic and acoustic wavefields but is not straightforwardly reconciled with structured finite-difference (FD) methods used to model such phenomena. Methods currently detailed in the literature are generally limited in scope application-wise or non-trivial to apply to real-world geometries. With this in mind, a general immersed boundary treatment capable of imposing a range of boundary conditions in a relatively equation-agnostic manner has been developed, alongside a framework implementing this approach, intending to complement emerging code-generation paradigms. The approach is distinguished by the use of N-dimensional Taylor-series extrapolants constrained by boundary conditions imposed at some suitably-distributed set of surface points. The extrapolation process is encapsulated in modified derivative stencils applied in the vicinity of the boundary, utilizing hyperspherical support regions. This method ensures boundary representation is consistent with the FD discretization: both must be considered in tandem. Furthermore, high-dimensional and vector boundary conditions can be applied without approximation prior to discretization. A consistent methodology can thus be applied across free and rigid surfaces with the first and second-order acoustic wave equation formulations. Application to both equations is demonstrated, and numerical examples based on analytic and real-world topography implementing free and rigid surfaces in 2D and 3D are presented.
△ Less
Submitted 7 September, 2023;
originally announced September 2023.
-
Data sharing and ontology use among agricultural genetics, genomics, and breeding databases and resources of the AgBioData Consortium
Authors:
Jennifer L. Clarke,
Laurel D. Cooper,
Monica F. Poelchau,
Tanya Z. Berardini,
Justin Elser,
Andrew D. Farmer,
Stephen Ficklin,
Sunita Kumari,
Marie-Angélique Laporte,
Rex T. Nelson,
Rie Sadohara,
Peter Selby,
Anne E. Thessen,
Brandon Whitehead,
Taner Z. Sen
Abstract:
Over the last several decades, there has been rapid growth in the number and scope of agricultural genetics, genomics and breeding (GGB) databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as 'databa…
▽ More
Over the last several decades, there has been rapid growth in the number and scope of agricultural genetics, genomics and breeding (GGB) databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as 'databases' throughout). One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of datasets which requires data sharing, along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies, conducted a survey to assess the status and future needs of the members in those areas. A total of 33 researchers responded to the survey, representing 37 databases. Results suggest that data sharing practices by AgBioData databases are in a healthy state, but it is not clear whether this is true for all metadata and data types across all databases; and that ontology use has not substantially changed since a similar survey was conducted in 2017. We recommend 1) providing training for database personnel in specific data sharing techniques, as well as in ontology use; 2) further study on what metadata is shared, and how well it is shared among databases; 3) promoting an understanding of data sharing and ontologies in the stakeholder community; 4) improving data sharing and ontologies for specific phenotypic data types and formats; and 5) lowering specific barriers to data sharing and ontology use, by identifying sustainability solutions, and the identification, promotion, or development of data standards. Combined, these improvements are likely to help AgBioData databases increase development efforts towards improved ontology use, and data sharing via programmatic means.
△ Less
Submitted 17 July, 2023;
originally announced July 2023.
-
Temporal blocking of finite-difference stencil operators with sparse "off-the-grid" sources
Authors:
George Bisbas,
Fabio Luporini,
Mathias Louboutin,
Rhodri Nelson,
Gerard Gorman,
Paul H. J. Kelly
Abstract:
Stencil kernels dominate a range of scientific applications, including seismic and medical imaging, image processing, and neural networks. Temporal blocking is a performance optimization that aims to reduce the required memory bandwidth of stencil computations by re-using data from the cache for multiple time steps. It has already been shown to be beneficial for this class of algorithms. However,…
▽ More
Stencil kernels dominate a range of scientific applications, including seismic and medical imaging, image processing, and neural networks. Temporal blocking is a performance optimization that aims to reduce the required memory bandwidth of stencil computations by re-using data from the cache for multiple time steps. It has already been shown to be beneficial for this class of algorithms. However, applying temporal blocking to practical applications' stencils remains challenging. These computations often consist of sparsely located operators not aligned with the computational grid ("off-the-grid"). Our work is motivated by modeling problems in which source injections result in wavefields that must then be measured at receivers by interpolation from the grided wavefield. The resulting data dependencies make the adoption of temporal blocking much more challenging. We propose a methodology to inspect these data dependencies and reorder the computation, leading to performance gains in stencil codes where temporal blocking has not been applicable. We implement this novel scheme in the Devito domain-specific compiler toolchain. Devito implements a domain-specific language embedded in Python to generate optimized partial differential equation solvers using the finite-difference method from high-level symbolic problem definitions. We evaluate our scheme using isotropic acoustic, anisotropic acoustic, and isotropic elastic wave propagators of industrial significance. After auto-tuning, performance evaluation shows that this enables substantial performance improvement through temporal blocking over highly-optimized vectorized spatially-blocked code of up to 1.6x.
△ Less
Submitted 25 February, 2021; v1 submitted 20 October, 2020;
originally announced October 2020.
-
Visualizing a Large Spatiotemporal Collection of Historic Photography with a Generous Interface
Authors:
Taylor Arnold,
Nathaniel Ayers,
Justin Madron,
Robert Nelson,
Lauren Tilton
Abstract:
Museums, libraries, and other cultural institutions continue to prioritize and build web-based visualization systems that increase access and discovery to digitized archives. Prominent examples exist that illustrate impressive visualizations of a particular feature of a collection. For example, interactive maps showing geographic spread or timelines capturing the temporal aspects of collections. B…
▽ More
Museums, libraries, and other cultural institutions continue to prioritize and build web-based visualization systems that increase access and discovery to digitized archives. Prominent examples exist that illustrate impressive visualizations of a particular feature of a collection. For example, interactive maps showing geographic spread or timelines capturing the temporal aspects of collections. By way of a case study, this paper presents a new web-based visualization system that allows users to simultaneously explore a large collection of images along several different dimensions---spatial, temporal, visual, textual, and through additional metadata fields including the photographer name---guided by the concept of generous interfaces. The case study is a complete redesign of a previously released digital, public humanities project called Photogrammar (2014). The paper highlights the redesign's interactive visualizations that are now possible by the affordances of newly available software. All of the code is open-source in order to allow for re-use of the codebase to other collections with a similar structure.
△ Less
Submitted 4 September, 2020;
originally announced September 2020.
-
Scaling through abstractions -- high-performance vectorial wave simulations for seismic inversion with Devito
Authors:
Mathias Louboutin,
Fabio Luporini,
Philipp Witte,
Rhodri Nelson,
George Bisbas,
Jan Thorbecke,
Felix J. Herrmann,
Gerard Gorman
Abstract:
[Devito] is an open-source Python project based on domain-specific language and compiler technology. Driven by the requirements of rapid HPC applications development in exploration seismology, the language and compiler have evolved significantly since inception. Sophisticated boundary conditions, tensor contractions, sparse operations and features such as staggered grids and sub-domains are all su…
▽ More
[Devito] is an open-source Python project based on domain-specific language and compiler technology. Driven by the requirements of rapid HPC applications development in exploration seismology, the language and compiler have evolved significantly since inception. Sophisticated boundary conditions, tensor contractions, sparse operations and features such as staggered grids and sub-domains are all supported; operators of essentially arbitrary complexity can be generated. To accommodate this flexibility whilst ensuring performance, data dependency analysis is utilized to schedule loops and detect computational-properties such as parallelism. In this article, the generation and simulation of MPI-parallel propagators (along with their adjoints) for the pseudo-acoustic wave-equation in tilted transverse isotropic media and the elastic wave-equation are presented. Simulations are carried out on industry scale synthetic models in a HPC Cloud system and reach a performance of 28TFLOP/s, hence demonstrating Devito's suitability for production-grade seismic inversion problems.
△ Less
Submitted 22 April, 2020;
originally announced April 2020.
-
Automated Weed Detection in Aerial Imagery with Context
Authors:
Delia Bullock,
Andrew Mangeni,
Tyr Wiesner-Hanks,
Chad DeChant,
Ethan L. Stewart,
Nicholas Kaczmar,
Judith M. Kolkman,
Rebecca J. Nelson,
Michael A. Gore,
Hod Lipson
Abstract:
In this paper, we demonstrate the ability to discriminate between cultivated maize plant and grass or grass-like weed image segments using the context surrounding the image segments. While convolutional neural networks have brought state of the art accuracies within object detection, errors arise when objects in different classes share similar features. This scenario often occurs when objects in i…
▽ More
In this paper, we demonstrate the ability to discriminate between cultivated maize plant and grass or grass-like weed image segments using the context surrounding the image segments. While convolutional neural networks have brought state of the art accuracies within object detection, errors arise when objects in different classes share similar features. This scenario often occurs when objects in images are viewed at too small of a scale to discern distinct differences in features, causing images to be incorrectly classified or localized. To solve this problem, we will explore using context when classifying image segments. This technique involves feeding a convolutional neural network a central square image along with a border of its direct surroundings at train and test times. This means that although images are labelled at a smaller scale to preserve accurate localization, the network classifies the images and learns features that include the wider context. We demonstrate the benefits of this context technique in the object detection task through a case study of grass (foxtail) and grass-like (yellow nutsedge) weed detection in maize fields. In this standard situation, adding context alone nearly halved the error of the neural network from 7.1% to 4.3%. After only one epoch with context, the network also achieved a higher accuracy than the network without context did after 50 epochs. The benefits of using the context technique are likely to particularly evident in agricultural contexts in which parts (such as leaves) of several plants may appear similar when not taking into account the context in which those parts appear.
△ Less
Submitted 19 November, 2019; v1 submitted 1 October, 2019;
originally announced October 2019.
-
SciPy 1.0--Fundamental Algorithms for Scientific Computing in Python
Authors:
Pauli Virtanen,
Ralf Gommers,
Travis E. Oliphant,
Matt Haberland,
Tyler Reddy,
David Cournapeau,
Evgeni Burovski,
Pearu Peterson,
Warren Weckesser,
Jonathan Bright,
Stéfan J. van der Walt,
Matthew Brett,
Joshua Wilson,
K. Jarrod Millman,
Nikolay Mayorov,
Andrew R. J. Nelson,
Eric Jones,
Robert Kern,
Eric Larson,
CJ Carey,
İlhan Polat,
Yu Feng,
Eric W. Moore,
Jake VanderPlas,
Denis Laxalde
, et al. (10 additional authors not shown)
Abstract:
SciPy is an open source scientific computing library for the Python programming language. SciPy 1.0 was released in late 2017, about 16 years after the original version 0.1 release. SciPy has become a de facto standard for leveraging scientific algorithms in the Python programming language, with more than 600 unique code contributors, thousands of dependent packages, over 100,000 dependent reposit…
▽ More
SciPy is an open source scientific computing library for the Python programming language. SciPy 1.0 was released in late 2017, about 16 years after the original version 0.1 release. SciPy has become a de facto standard for leveraging scientific algorithms in the Python programming language, with more than 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories, and millions of downloads per year. This includes usage of SciPy in almost half of all machine learning projects on GitHub, and usage by high profile projects including LIGO gravitational wave analysis and creation of the first-ever image of a black hole (M87). The library includes functionality spanning clustering, Fourier transforms, integration, interpolation, file I/O, linear algebra, image processing, orthogonal distance regression, minimization algorithms, signal processing, sparse matrix handling, computational geometry, and statistics. In this work, we provide an overview of the capabilities and development practices of the SciPy library and highlight some recent technical developments.
△ Less
Submitted 23 July, 2019;
originally announced July 2019.
-
Information Fusion to Estimate Resilience of Dense Urban Neighborhoods
Authors:
Anthony Palladino,
Elisa J. Bienenstock,
Bradley M. West,
Jake R. Nelson,
Tony H. Grubesic
Abstract:
Diverse sociocultural influences in rapidly growing dense urban areas may induce strain on civil services and reduce the resilience of those areas to exogenous and endogenous shocks. We present a novel approach with foundations in computer and social sciences, to estimate the resilience of dense urban areas at finer spatiotemporal scales compared to the state-of-the-art. We fuse multi-modal data s…
▽ More
Diverse sociocultural influences in rapidly growing dense urban areas may induce strain on civil services and reduce the resilience of those areas to exogenous and endogenous shocks. We present a novel approach with foundations in computer and social sciences, to estimate the resilience of dense urban areas at finer spatiotemporal scales compared to the state-of-the-art. We fuse multi-modal data sources to estimate resilience indicators from social science theory and leverage a structured ontology for factor combinations to enhance explainability. Estimates of destabilizing areas can improve the decision-making capabilities of civil governments by identifying critical areas needing increased social services.
△ Less
Submitted 27 March, 2019;
originally announced March 2019.
-
SNS Timing System
Authors:
B. oerter,
R. Nelson,
T. Shea,
C. Sibley
Abstract:
This poster describes the timing system being designed for Spallation Neutron Source being built at Oak Ridge National lab.
This poster describes the timing system being designed for Spallation Neutron Source being built at Oak Ridge National lab.
△ Less
Submitted 9 November, 2001;
originally announced November 2001.