-
Leveraging Visibility Graphs for Enhanced Arrhythmia Classification with Graph Convolutional Networks
Authors:
Rafael F. Oliveira,
Gladston J. P. Moreira,
Vander L. S. Freitas,
Eduardo J. S. Luz
Abstract:
Arrhythmias, detectable via electrocardiograms (ECGs), pose significant health risks, emphasizing the need for robust automated identification techniques. Although traditional deep learning methods have shown potential, recent advances in graph-based strategies are aimed at enhancing arrhythmia detection performance. However, effectively representing ECG signals as graphs remains a challenge. This study explores graph representations of ECG signals using Visibility Graph (VG) and Vector Visibility Graph (VVG), coupled with Graph Convolutional Networks (GCNs) for arrhythmia classification. Through experiments on the MIT-BIH dataset, we investigated various GCN architectures and preprocessing parameters. The results reveal that GCNs, when integrated with VG and VVG for signal graph mapping, can classify arrhythmias without the need for preprocessing or noise removal from ECG signals. While both VG and VVG methods show promise, VG is notably more efficient. The proposed approach was competitive compared to baseline methods, although classifying the S class remains challenging, especially under the inter-patient paradigm. Computational complexity, particularly with the VVG method, required data balancing and sophisticated implementation strategies. The source code is publicly available for further research and development at https://github.com/raffoliveira/VG_for_arrhythmia_classification_with_GCN.
Submitted 19 April, 2024;
originally announced April 2024.
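The visibility-graph mapping named in the abstract above can be illustrated with a minimal sketch. It assumes the standard natural visibility criterion (two samples are linked when the straight line between them passes above every sample in between) and is not taken from the authors' repository; the function name `natural_visibility_graph` is hypothetical, and the vector variant (VVG) is not covered.

```python
from itertools import combinations


def natural_visibility_graph(series):
    """Return the edge set of the natural visibility graph of `series`.

    Sample i is treated as the point (i, series[i]). Two samples a < b are
    connected when every sample c between them lies strictly below the
    straight line joining (a, series[a]) and (b, series[b]).
    This is a brute-force O(n^3) illustration, not an optimized routine.
    """
    edges = set()
    for a, b in combinations(range(len(series)), 2):
        ya, yb = series[a], series[b]
        visible = all(
            series[c] < yb + (ya - yb) * (b - c) / (b - a)
            for c in range(a + 1, b)
        )
        if visible:
            edges.add((a, b))
    return edges


if __name__ == "__main__":
    # Toy amplitude values standing in for a short ECG segment (assumption).
    print(sorted(natural_visibility_graph([3, 1, 2])))
```

Adjacent samples are always connected, so the resulting graph is connected; the edge set could then be handed to a graph library as the input of a GCN.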
-
A Study on Hyperparameters Configurations for an Efficient Human Activity Recognition System
Authors:
Paulo J. S. Ferreira,
João Mendes Moreira,
João M. P. Cardoso
Abstract:
Human Activity Recognition (HAR) has been a popular research field due to the widespread availability of devices with sensors and computational power (e.g., smartphones and smartwatches). Applications for HAR systems have been extensively researched in recent literature, mainly due to the benefits of improving quality of life in areas like health and fitness monitoring. However, since people have different motion patterns when performing physical activities, a HAR system must adapt to user characteristics to maintain or improve accuracy. Mobile devices, such as smartphones, used to implement HAR systems, have limited resources (e.g., battery life). They also have difficulty adapting to the device's constraints to work efficiently for long periods. In this work, we present a kNN-based HAR system and an extensive study of the influence of hyperparameters (window size, overlap, distance function, and the value of k) and parameters (sampling frequency) on the system accuracy, energy consumption, and inference time. We also study how hyperparameter configurations affect the model's user and activity performance. Experimental results show that adapting the hyperparameters makes it possible to adjust the system's behavior to the user, the device, and the target service. These results motivate the development of a HAR system capable of automatically adapting the hyperparameters for the user, the device, and the service.
Submitted 25 August, 2023;
originally announced August 2023.
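Two of the hyperparameters listed in the abstract above, window size and overlap, control how a sensor stream is cut into segments before classification. The sketch below shows one common way to do that; the function name `sliding_windows` and the convention that overlap is a fraction in [0, 1) are assumptions for illustration, not details from the paper.

```python
def sliding_windows(samples, window_size, overlap):
    """Split `samples` into fixed-size windows that share a fraction of samples.

    `overlap` is the fraction of a window shared with its successor,
    so 0.5 means consecutive windows share half of their samples.
    Trailing samples that do not fill a whole window are dropped.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, int(window_size * (1 - overlap)))
    last_start = len(samples) - window_size
    return [samples[s:s + window_size] for s in range(0, last_start + 1, step)]


if __name__ == "__main__":
    stream = list(range(10))  # stand-in for accelerometer readings
    for window in sliding_windows(stream, window_size=4, overlap=0.5):
        print(window)
```

A larger overlap produces more windows per second of signal, which may raise both the number of kNN queries and the energy cost; this is the kind of trade-off the abstract describes.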
-
Fast Matrix Multiplication via Compiler-only Layered Data Reorganization and Intrinsic Lowering
Authors:
Braedy Kuzma,
Ivan Korostelev,
João P. L. de Carvalho,
José E. Moreira,
Christopher Barton,
Guido Araujo,
José Nelson Amaral
Abstract:
The resurgence of machine learning has increased the demand for high-performance basic linear algebra subroutines (BLAS), which have long depended on libraries to achieve peak performance on commodity hardware. High-performance BLAS implementations rely on a layered approach that consists of tiling and packing layers, for data (re)organization, and micro kernels that perform the actual computations. The creation of high-performance micro kernels requires significant development effort to write tailored assembly code for each architecture. This hand optimization task is complicated by the recent introduction of matrix engines by IBM's POWER10 MMA, Intel AMX, and Arm ME to deliver high-performance matrix operations. This paper presents a compiler-only alternative to the use of high-performance libraries by incorporating, to the best of our knowledge and for the first time, the automatic generation of the layered approach into LLVM, a production compiler. Modular design of the algorithm, such as the use of LLVM's matrix-multiply intrinsic for a clear interface between the tiling and packing layers and the micro kernel, makes it easy to retarget the code generation to multiple accelerators. The use of intrinsics enables a comprehensive performance study. In processors without hardware matrix engines, the tiling and packing delivers performance up to 22x (Intel), for small matrices, and more than 6x (POWER9), for large matrices, faster than PLuTo, a widely used polyhedral optimizer. The performance also approaches high-performance libraries and is only 34% slower than OpenBLAS and on-par with Eigen for large matrices. With MMA in POWER10 this solution is, for large matrices, over 2.6x faster than the vector-extension solution, matches Eigen performance, and achieves up to 96% of BLAS peak performance.
Submitted 15 May, 2023;
originally announced May 2023.
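The layered approach described in the abstract above (tiling and packing around a micro kernel) can be sketched in plain Python. This is only a structural illustration under simplifying assumptions: the names `pack`, `micro_kernel`, and `tiled_matmul` are hypothetical, matrices are lists of lists, and nothing here reflects the LLVM implementation or its performance.

```python
def pack(matrix, r0, r1, c0, c1):
    """Copy the sub-block matrix[r0:r1][c0:c1] into a contiguous tile."""
    return [row[c0:c1] for row in matrix[r0:r1]]


def micro_kernel(a_tile, b_tile, c_tile):
    """Accumulate a_tile @ b_tile into c_tile (the innermost computation)."""
    for i in range(len(a_tile)):
        for k in range(len(b_tile)):
            a_ik = a_tile[i][k]
            for j in range(len(b_tile[0])):
                c_tile[i][j] += a_ik * b_tile[k][j]


def tiled_matmul(a, b, tile=2):
    """Multiply a (m x k) by b (k x n) one tile at a time."""
    m, k_dim, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        i1 = min(i0 + tile, m)
        for j0 in range(0, n, tile):
            j1 = min(j0 + tile, n)
            c_tile = [[0] * (j1 - j0) for _ in range(i1 - i0)]
            for k0 in range(0, k_dim, tile):
                k1 = min(k0 + tile, k_dim)
                # Packing layer: gather the operands for this tile.
                a_tile = pack(a, i0, i1, k0, k1)
                b_tile = pack(b, k0, k1, j0, j1)
                micro_kernel(a_tile, b_tile, c_tile)
            for i in range(i0, i1):
                c[i][j0:j1] = c_tile[i - i0]
    return c


if __name__ == "__main__":
    print(tiled_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]))
```

Keeping the micro kernel behind a narrow interface is what would let a code generator swap in a hardware-specific version without touching the tiling loops.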
-
Regularization Through Simultaneous Learning: A Case Study on Plant Classification
Authors:
Pedro Henrique Nascimento Castro,
Gabriel Cássia Fortuna,
Rafael Alves Bonfim de Queiroz,
Gladston Juliano Prates Moreira,
Eduardo José da Silva Luz
Abstract:
In response to the prevalent challenge of overfitting in deep neural networks, this paper introduces Simultaneous Learning, a regularization approach drawing on principles of Transfer Learning and Multi-task Learning. We leverage auxiliary datasets with the target dataset, the UFOP-HVD, to facilitate simultaneous classification guided by a customized loss function featuring an inter-group penalty. This experimental configuration allows for a detailed examination of model performance across similar (PlantNet) and dissimilar (ImageNet) domains, thereby enriching the generalizability of Convolutional Neural Network models. Remarkably, our approach demonstrates superior performance over models without regularization and those applying dropout regularization exclusively, enhancing accuracy by 5 to 22 percentage points. Moreover, when combined with dropout, the proposed approach improves generalization, securing state-of-the-art results for the UFOP-HVD challenge. The method also showcases efficiency with significantly smaller sample sizes, suggesting its broad applicability across a spectrum of related tasks. In addition, an interpretability approach is deployed to evaluate feature quality by analyzing class feature correlations within the network's convolutional layers. The findings of this study provide deeper insights into the efficacy of Simultaneous Learning, particularly concerning its interaction with the auxiliary and target datasets.
Submitted 20 June, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
FineIBT: Fine-grain Control-flow Enforcement with Indirect Branch Tracking
Authors:
Alexander J. Gaidis,
Joao Moreira,
Ke Sun,
Alyssa Milburn,
Vaggelis Atlidakis,
Vasileios P. Kemerlis
Abstract:
We present the design, implementation, and evaluation of FineIBT: a CFI enforcement mechanism that improves the precision of hardware-assisted CFI solutions, like Intel IBT, by instrumenting program code to reduce the valid/allowed targets of indirect forward-edge transfers. We study the design of FineIBT on the x86-64 architecture, and implement and evaluate it on Linux and the LLVM toolchain. We designed FineIBT's instrumentation to be compact, incurring low runtime and memory overheads, and generic, so as to support different CFI policies. Our prototype implementation incurs negligible runtime slowdowns ($\approx$0%-1.94% in SPEC CPU2017 and $\approx$0%-1.92% in real-world applications) outperforming Clang-CFI. Lastly, we investigate the effectiveness/security and compatibility of FineIBT using the ConFIRM CFI benchmarking suite, demonstrating that our instrumentation provides complete coverage in the presence of modern software features, while supporting a wide range of CFI policies with the same, predictable performance.
Submitted 13 September, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Advancing Direct Convolution using Convolution Slicing Optimization and ISA Extensions
Authors:
Victor Ferrari,
Rafael Sousa,
Marcio Pereira,
João P. L. de Carvalho,
José Nelson Amaral,
José Moreira,
Guido Araujo
Abstract:
Convolution is one of the most computationally intensive operations that must be performed for machine-learning model inference. A traditional approach to compute convolutions is known as the Im2Col + BLAS method. This paper proposes SConv: a direct-convolution algorithm based on an MLIR/LLVM code-generation toolchain that can be integrated into machine-learning compilers. This algorithm introduces: (a) Convolution Slicing Analysis (CSA) - a convolution-specific 3D cache-blocking analysis pass that focuses on tile reuse over the cache hierarchy; (b) Convolution Slicing Optimization (CSO) - a code-generation pass that uses CSA to generate a tiled direct-convolution macro-kernel; and (c) Vector-Based Packing (VBP) - an architecture-specific optimized input-tensor packing solution based on vector-register shift instructions for convolutions with unitary stride. Experiments conducted on 393 convolutions from full ONNX-MLIR machine-learning models indicate that the elimination of the Im2Col transformation and the use of fast packing routines result in a total packing time reduction, on full model inference, of 2.0x - 3.9x on Intel x86 and 3.6x - 7.2x on IBM POWER10. The speed-up over an Im2Col + BLAS method based on current BLAS implementations for end-to-end machine-learning model inference is in the range of 9% - 25% for Intel x86 and 10% - 42% for IBM POWER10 architectures. The total convolution speedup for model inference is 12% - 27% on Intel x86 and 26% - 46% on IBM POWER10. SConv also outperforms BLAS GEMM, when computing pointwise convolutions, in more than 83% of the 219 tested instances.
Submitted 8 March, 2023;
originally announced March 2023.
-
Automatic verification of transparency protocols (extended version)
Authors:
Vincent Cheval,
José Moreira,
Mark Ryan
Abstract:
Transparency protocols are protocols whose actions can be publicly monitored by observers (such observers may include regulators, rights advocacy groups, or the general public). The observed actions are typically usages of private keys such as decryptions, and signings. Examples of transparency protocols include certificate transparency, cryptocurrency, transparent decryption, and electronic voting. These protocols usually pose a challenge for automatic verification, because they involve sophisticated data types that have strong properties, such as Merkle trees, that allow compact proofs of data presence and tree extension.
We address this challenge by introducing new features in ProVerif, and a methodology for using them. With our methodology, it is possible to describe the data type quite abstractly, using ProVerif axioms, and prove the correctness of the protocol using those axioms as assumptions. Then, in separate steps, one can define one or more concrete implementations of the data type, and again use ProVerif to show that the implementations satisfy the assumptions that were coded as axioms. This helps make compositional proofs, splitting the proof burden into several manageable pieces. We illustrate the methodology and features by providing the first formal verification of the transparent decryption and certificate transparency protocols with a precise modelling of the Merkle tree data structure.
Submitted 16 April, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
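The "compact proofs of data presence" that the abstract above attributes to Merkle trees can be illustrated with a small sketch. It assumes SHA-256, a 0x00/0x01 prefix to separate leaf and interior hashes, and duplication of the last node on odd-sized levels; that last choice is a simplification and is not the tree shape used by Certificate Transparency. The function names are hypothetical, and the sketch says nothing about the ProVerif models in the paper.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, from hashed leaves up to the single root.

    `leaves` must be a non-empty list of bytes objects.
    """
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # duplicate last node (simplification)
        level = [_h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def inclusion_proof(levels, index):
    """Collect one sibling hash per level: O(log n) hashes in total."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # the node was paired with its own duplicate
        proof.append((level[sibling], index % 2 == 0))  # True: sibling on right
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from a leaf and its proof, then compare."""
    node = _h(b"\x00" + leaf)
    for sibling, sibling_is_right in proof:
        if sibling_is_right:
            node = _h(b"\x01" + node + sibling)
        else:
            node = _h(b"\x01" + sibling + node)
    return node == root


if __name__ == "__main__":
    entries = [b"cert-a", b"cert-b", b"cert-c"]  # illustrative data
    levels = build_levels(entries)
    root = levels[-1][0]
    print(verify(b"cert-c", inclusion_proof(levels, 2), root))
```

An observer who holds only the root can check presence of an entry from a logarithmic number of hashes, which is the property the abstract refers to.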
-
A general framework for multi-step ahead adaptive conformal heteroscedastic time series forecasting
Authors:
Martim Sousa,
Ana Maria Tomé,
José Moreira
Abstract:
This paper introduces a novel model-agnostic algorithm called adaptive ensemble batch multi-input multi-output conformalized quantile regression (AEnbMIMOCQR) that enables forecasters to generate multi-step ahead prediction intervals for a fixed pre-specified miscoverage rate in a distribution-free manner. Our method is grounded on conformal prediction principles; however, it does not require data splitting and provides close to exact coverage even when the data is not exchangeable. Moreover, the resulting prediction intervals, besides being empirically valid along the forecast horizon, do not neglect heteroscedasticity. AEnbMIMOCQR is designed to be robust to distribution shifts, which means that its prediction intervals remain reliable over an unlimited period of time, without entailing retraining or imposing unrealistic strict assumptions on the data-generating process. Through methodical experimentation, we demonstrate that our approach outperforms other competitive methods on both real-world and synthetic datasets. The code used in the experimental part and a tutorial on how to use AEnbMIMOCQR can be found at the following GitHub repository: https://github.com/Quilograma/AEnbMIMOCQR.
Submitted 11 October, 2023; v1 submitted 28 July, 2022;
originally announced July 2022.
-
Improved conformalized quantile regression
Authors:
Martim Sousa,
Ana Maria Tomé,
José Moreira
Abstract:
Conformalized quantile regression is a procedure that inherits the advantages of conformal prediction and quantile regression. That is, we use quantile regression to estimate the true conditional quantile and then apply a conformal step on a calibration set to ensure marginal coverage. In this way, we get adaptive prediction intervals that account for heteroscedasticity. However, the aforementioned conformal step lacks adaptiveness as described in (Romano et al., 2019). To overcome this limitation, instead of applying a single conformal step after estimating conditional quantiles with quantile regression, we propose to cluster the explanatory variables weighted by their permutation importance with an optimized k-means and apply k conformal steps. To show that this improved version outperforms the classic version of conformalized quantile regression and is more adaptive to heteroscedasticity, we extensively compare the prediction intervals of both in open datasets.
Submitted 6 November, 2022; v1 submitted 6 July, 2022;
originally announced July 2022.
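The single conformal step that the abstract above takes as its starting point can be sketched as follows. The sketch assumes the classic formulation in which the conformity score of a calibration point is the signed distance by which it falls outside the predicted quantile band; the function names are hypothetical, and the quantile predictions are taken as given rather than fitted here. The clustered variant proposed in the paper would, under that reading, compute one such offset per cluster.

```python
import math


def conformal_offset(q_lo, q_hi, y, alpha):
    """Compute the calibration offset for a target miscoverage rate `alpha`.

    q_lo, q_hi: predicted lower/upper quantiles on the calibration set.
    y: observed targets on the calibration set.
    Returns math.inf when the calibration set is too small for `alpha`.
    """
    scores = sorted(max(lo - yi, yi - hi) for lo, hi, yi in zip(q_lo, q_hi, y))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    return math.inf if k > n else scores[k - 1]


def conformalize(lo, hi, offset):
    """Widen (or shrink, if offset is negative) a predicted interval."""
    return lo - offset, hi + offset


if __name__ == "__main__":
    # Toy calibration data (assumption, not from the paper).
    offset = conformal_offset([0, 0, 0, 0], [1, 1, 1, 1],
                              [0.5, 1.25, -0.25, 2.0], alpha=0.5)
    print(conformalize(0.0, 1.0, offset))
```

Because one offset is shared by every test point, the adjustment itself does not vary with the inputs, which is the lack of adaptiveness the abstract sets out to address.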
-
${\tt simwave}$ -- A Finite Difference Simulator for Acoustic Waves Propagation
Authors:
Jaime Freire de Souza,
João Baptista Dias Moreira,
Keith Jared Roberts,
Roussian di Ramos Alves Gaioso,
Edson Satoshi Gomi,
Emílio Carlos Nelli Silva,
Hermes Senger
Abstract:
${\tt simwave}$ is an open-source Python package to perform wave simulations in 2D or 3D domains. It solves the constant and variable density acoustic wave equation with the finite difference method and has support for domain truncation techniques, several boundary conditions, and the modeling of sources and receivers given a user-defined acquisition geometry. The architecture of ${\tt simwave}$ is designed for applications with geophysical exploration in mind. Its Python front-end enables straightforward integration with many existing Python scientific libraries for the composition of more complex workflows and applications (e.g., migration and inversion problems). The back-end is implemented in C enabling performance portability across a range of computing hardware and compilers including both CPUs and GPUs.
Submitted 13 January, 2022;
originally announced January 2022.
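As a rough illustration of the finite difference method mentioned in the abstract above, the sketch below advances the 1D constant-density acoustic wave equation by one time step with a second-order scheme in time and space. It does not use or imitate the simwave API; the function name `wave_step`, the fixed (zero) boundary values, and the 1D setting are all assumptions made to keep the example short.

```python
def wave_step(u_prev, u_curr, courant):
    """Advance the 1D wave field by one time step.

    u_prev, u_curr: field values at time steps n-1 and n.
    courant: c * dt / dx; this explicit scheme needs courant <= 1 in 1D
    to remain stable.
    The two end points are held at zero (fixed boundary).
    """
    c2 = courant ** 2
    u_next = [0.0] * len(u_curr)
    for i in range(1, len(u_curr) - 1):
        laplacian = u_curr[i + 1] - 2 * u_curr[i] + u_curr[i - 1]
        u_next[i] = 2 * u_curr[i] - u_prev[i] + c2 * laplacian
    return u_next


if __name__ == "__main__":
    # A unit pulse at rest in the middle of a five-point grid.
    field = [0.0, 0.0, 1.0, 0.0, 0.0]
    print(wave_step(field, field, courant=0.5))
```

A production solver would add sources, receivers, absorbing boundaries, and a compiled back-end, which is the part the abstract says is implemented in C.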
-
Encrypted Data Processing
Authors:
Jessica Tseng,
Gianfranco Bilardi,
Kattamuri Ekanadham,
Manoj Kumar,
Jose Moreira,
P. C. Pattnaik
Abstract:
In this paper, we present a comprehensive architecture for confidential computing, which we show to be general purpose and quite efficient. It executes the application as is, without any added burden or discipline requirements from the application developers. Furthermore, it does not require the trust of system software at the computing server and does not impose any added burden on the communication subsystem. The proposed Encrypted Data Processing (EDAP) architecture accomplishes confidentiality, authenticity, and freshness of the key-based cryptographic data protection by adopting data encryption with a multi-level key protection scheme. It guarantees that the user data is visible only in non-privileged mode to a designated program trusted by the data owner on a designated hardware, thus protecting the data from an untrusted hardware, hypervisor, OS, or other users' applications. The cryptographic keys and protocols used for achieving these confidential computing requirements are described in a use case example. Encrypting and decrypting data in an EDAP-enabled processor can lead to performance degradation as it adds cycle time to the overall execution. However, our simulation result shows that the slowdown is only 6% on average across a collection of commercial workloads when the data encryption engine is placed between the L1 and L2 cache. We demonstrate that the EDAP architecture is valuable and practicable in the modern cloud environment for confidential computing. EDAP delivers a zero trust model of computing where the user software does not trust system software and vice versa.
Submitted 20 September, 2021;
originally announced September 2021.
-
A matrix math facility for Power ISA(TM) processors
Authors:
José E. Moreira,
Kit Barton,
Steven Battle,
Peter Bergner,
Ramon Bertran,
Puneeth Bhat,
Pedro Caldeira,
David Edelsohn,
Gordon Fossum,
Brad Frey,
Nemanja Ivanovic,
Chip Kerchner,
Vincent Lim,
Shakti Kapoor,
Tulio Machado Filho,
Silvia Melitta Mueller,
Brett Olsson,
Satish Sadasivam,
Baptiste Saleil,
Bill Schmidt,
Rajalakshmi Srinivasaraghavan,
Shricharan Srivatsan,
Brian Thompto,
Andreas Wagner,
Nelson Wu
Abstract:
Power ISA(TM) Version 3.1 has introduced a new family of matrix math instructions, collectively known as the Matrix-Multiply Assist (MMA) facility. The instructions in this facility implement numerical linear algebra operations on small matrices and are meant to accelerate computation-intensive kernels, such as matrix multiplication, convolution and discrete Fourier transform. These instructions have led to a power- and area-efficient implementation of a high throughput math engine in the future POWER10 processor. Performance per core is 4 times better, at constant frequency, than the previous generation POWER9 processor. We also advocate the use of compiler built-ins as the preferred way of leveraging these instructions, which we illustrate through case studies covering matrix multiplication and convolution.
Submitted 7 April, 2021;
originally announced April 2021.
-
Social Distancing and the Internet: What Can Network Performance Measurements Tell Us?
Authors:
Jessica De Oliveira Moreira,
Amey Praveen Pasarkar,
Wenjun Chen,
Wenkai Hu,
Jan Janak,
Henning Schulzrinne
Abstract:
The COVID-19 pandemic and related restrictions forced many to work, learn, and socialize from home over the internet. There appears to be consensus that internet infrastructure in the developed world handled the resulting traffic surge well. In this paper, we study network measurement data collected by the Federal Communications Commission's Measuring Broadband America program before and during the pandemic in the United States (US). We analyze the data to understand the impact of lockdown orders on the performance of fixed broadband internet infrastructure across the US, and also attempt to correlate internet usage patterns with the changing behavior of users during lockdown. We found the key metrics such as change in data usage to be generally consistent with the literature. Through additional analysis, we found differences between metro and rural areas, changes in weekday, weekend, and hourly internet usage patterns, and indications of network congestion for some users.
Submitted 13 January, 2021; v1 submitted 17 December, 2020;
originally announced December 2020.
-
SemifreddoNets: Partially Frozen Neural Networks for Efficient Computer Vision Systems
Authors:
Leo F Isikdogan,
Bhavin V Nayak,
Chyuan-Tyng Wu,
Joao Peralta Moreira,
Sushma Rao,
Gilad Michael
Abstract:
We propose a system comprised of fixed-topology neural networks having partially frozen weights, named SemifreddoNets. SemifreddoNets work as fully-pipelined hardware blocks that are optimized to have an efficient hardware implementation. Those blocks freeze a certain portion of the parameters at every layer and replace the corresponding multipliers with fixed scalers. Fixing the weights reduces the silicon area, logic delay, and memory requirements, leading to significant savings in cost and power consumption. Unlike traditional layer-wise freezing approaches, SemifreddoNets make a profitable trade between the cost and flexibility by having some of the weights configurable at different scales and levels of abstraction in the model. Although fixing the topology and some of the weights somewhat limits the flexibility, we argue that the efficiency benefits of this strategy outweigh the advantages of a fully configurable model for many use cases. Furthermore, our system uses repeatable blocks, therefore it has the flexibility to adjust model complexity without requiring any hardware change. The hardware implementation of SemifreddoNets provides up to an order of magnitude reduction in silicon area and power consumption as compared to their equivalent implementation on a general-purpose accelerator.
Submitted 11 June, 2020;
originally announced June 2020.
-
Distantly-Supervised Neural Relation Extraction with Side Information using BERT
Authors:
Johny Moreira,
Chaina Oliveira,
David Macêdo,
Cleber Zanchettin,
Luciano Barbosa
Abstract:
Relation extraction (RE) consists in categorizing the relationship between entities in a sentence. A recent paradigm to develop relation extractors is Distant Supervision (DS), which allows the automatic creation of new datasets by taking an alignment between a text corpus and a Knowledge Base (KB). KBs can sometimes also provide additional information to the RE task. One of the methods that adopt this strategy is the RESIDE model, which proposes a distantly-supervised neural relation extraction using side information from KBs. Considering that this method outperformed state-of-the-art baselines, in this paper, we propose a related approach to RESIDE also using additional side information, but simplifying the sentence encoding with BERT embeddings. Through experiments, we show the effectiveness of the proposed method in Google Distant Supervision and Riedel datasets concerning the BGWA and RESIDE baseline methods. Although Area Under the Curve is decreased because of unbalanced datasets, P@N results have shown that the use of BERT as sentence encoding allows superior performance to baseline methods.
Submitted 10 September, 2020; v1 submitted 29 April, 2020;
originally announced April 2020.
-
Bandwidth-Optimized Parallel Algorithms for Sparse Matrix-Matrix Multiplication using Propagation Blocking
Authors:
Zhixiang Gu,
Jose Moreira,
David Edelsohn,
Ariful Azad
Abstract:
Sparse matrix-matrix multiplication (SpGEMM) is a widely used kernel in various graph, scientific computing and machine learning algorithms. It is well known that SpGEMM is a memory-bound operation, and its peak performance is expected to be bound by the memory bandwidth. Yet, existing algorithms fail to saturate the memory bandwidth, resulting in suboptimal performance under the Roofline model. In this paper we characterize existing SpGEMM algorithms based on their memory access patterns and develop practical lower and upper bounds for SpGEMM performance. We then develop an SpGEMM algorithm based on outer product matrix multiplication. The newly developed algorithm called PB-SpGEMM saturates memory bandwidth by using the propagation blocking technique and by performing in-cache sorting and merging. For many practical matrices, PB-SpGEMM runs 20%-50% faster than the state-of-the-art heap and hash SpGEMM algorithms on modern multicore processors. Most importantly, PB-SpGEMM attains performance predicted by the Roofline model, and its performance remains stable with respect to matrix size and sparsity.
Submitted 25 February, 2020;
originally announced February 2020.
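The Roofline bound that the abstract above uses as its performance target can be written in a few lines. The sketch assumes the basic form of the model, where attainable throughput is the smaller of the compute peak and the product of memory bandwidth and arithmetic intensity; the function name and the numbers in the usage example are illustrative and do not come from the paper.

```python
def roofline_gflops(peak_gflops, bandwidth_gb_per_s, flops_per_byte):
    """Upper bound on throughput under the basic Roofline model.

    flops_per_byte is the arithmetic intensity of the kernel: floating-point
    operations performed per byte moved to or from memory.
    """
    return min(peak_gflops, bandwidth_gb_per_s * flops_per_byte)


if __name__ == "__main__":
    # Hypothetical machine: 100 GFLOP/s peak, 10 GB/s memory bandwidth.
    print(roofline_gflops(100, 10, 2))   # low intensity: memory-bound
    print(roofline_gflops(100, 10, 50))  # high intensity: compute-bound
```

For a memory-bound kernel such as SpGEMM, the bound is set by the bandwidth term, so an algorithm that saturates memory bandwidth sits on the sloped part of the roofline.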
-
Towards FAIR protocols and workflows: The OpenPREDICT case study
Authors:
Remzi Celebi,
Joao Rebelo Moreira,
Ahmed A. Hassan,
Sandeep Ayyar,
Lars Ridder,
Tobias Kuhn,
Michel Dumontier
Abstract:
It is essential for the advancement of science that scientists and researchers share, reuse and reproduce workflows and protocols used by others. The FAIR principles are a set of guidelines that aim to maximize the value and usefulness of research data, and emphasize a number of important points regarding the means by which digital objects are found and reused by others. The question of how to apply these principles not just to the static input and output data but also to the dynamic workflows and protocols that consume and produce them is still under debate and poses a number of challenges. In this paper we describe our inclusive and overarching approach to applying the FAIR principles to workflows and protocols and demonstrate its benefits. We apply and evaluate our approach on a case study that consists of making the PREDICT workflow, a highly cited drug repurposing workflow, open and FAIR. This includes FAIRification of the involved datasets, as well as applying semantic technologies to represent and store data about the detailed versions of the general protocol, of the concrete workflow instructions, and of their execution traces. A semantic model was proposed to better address these specific requirements and was evaluated by answering competency questions. This semantic model consists of classes and relations from a number of existing ontologies, including Workflow4ever, PROV, EDAM, and BPMN. This then allowed us to formulate and answer new kinds of competency questions. Our evaluation shows the high degree to which our FAIRified OpenPREDICT workflow now adheres to the FAIR principles and the practicality and usefulness of being able to answer our new competency questions.
Submitted 20 November, 2019;
originally announced November 2019.
-
Enabling Massive Deep Neural Networks with the GraphBLAS
Authors:
Jeremy Kepner,
Manoj Kumar,
José Moreira,
Pratap Pattnaik,
Mauricio Serrano,
Henry Tufo
Abstract:
Deep Neural Networks (DNNs) have emerged as a core tool for machine learning. The computations performed during DNN training and inference are dominated by operations on the weight matrices describing the DNN. As DNNs incorporate more stages and more nodes per stage, these weight matrices may be required to be sparse because of memory limitations. The GraphBLAS.org math library standard was developed to provide high performance manipulation of sparse weight matrices and input/output vectors. For sufficiently sparse matrices, a sparse matrix library requires significantly less memory than the corresponding dense matrix implementation. This paper provides a brief description of the mathematics underlying the GraphBLAS. In addition, the equations of a typical DNN are rewritten in a form designed to use the GraphBLAS. An implementation of the DNN is given using a preliminary GraphBLAS C library. The performance of the GraphBLAS implementation is measured relative to a standard dense linear algebra library implementation. For various sizes of DNN weight matrices, it is shown that the GraphBLAS sparse implementation outperforms a BLAS dense implementation as the weight matrix becomes sparser.
Submitted 8 August, 2017;
originally announced August 2017.
-
Non-locality of the meet levels of the Trotter-Weil Hierarchy
Authors:
João Daniel Moreira
Abstract:
We prove that the meet level $m$ of the Trotter-Weil hierarchy, $\mathsf{V}_m$, is not local for any $m \geq 1$, as conjectured in a paper by Kufleitner and Lauser. In order to show this, we explicitly provide a language whose syntactic semigroup is in $L \mathsf{V}_m$ and not in $\mathsf{V}_m*\mathsf{D}$.
Submitted 14 July, 2017;
originally announced July 2017.
-
Mathematical Foundations of the GraphBLAS
Authors:
Jeremy Kepner,
Peter Aaltonen,
David Bader,
Aydın Buluc,
Franz Franchetti,
John Gilbert,
Dylan Hutchison,
Manoj Kumar,
Andrew Lumsdaine,
Henning Meyerhenke,
Scott McMillan,
Jose Moreira,
John D. Owens,
Carl Yang,
Marcin Zalewski,
Timothy Mattson
Abstract:
The GraphBLAS standard (GraphBlas.org) is being developed to bring the potential of matrix-based graph algorithms to the broadest possible audience. Mathematically, the GraphBLAS defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the mathematics of the GraphBLAS. Graphs represent connections between vertices with edges. Matrices can represent a wide range of graphs using adjacency matrices or incidence matrices. Adjacency matrices are often easier to analyze while incidence matrices are often better for representing data. Fortunately, the two are easily connected by matrix multiplication. A key feature of matrix mathematics is that a very small number of matrix operations can be used to manipulate a very wide range of graphs. This composability of a small number of operations is the foundation of the GraphBLAS. A standard such as the GraphBLAS can only be effective if it has low performance overhead. Performance measurements of prototype GraphBLAS implementations indicate that the overhead is low.
Submitted 13 July, 2016; v1 submitted 18 June, 2016;
originally announced June 2016.
-
The Distribution of the Asymptotic Number of Citations to Sets of Publications by a Researcher or From an Academic Department Are Consistent With a Discrete Lognormal Model
Authors:
João A. G. Moreira,
Xiao Han T. Zeng,
Luís A. Nunes Amaral
Abstract:
How to quantify the impact of a researcher's or an institution's body of work is a matter of increasing importance to scientists, funding agencies, and hiring committees. The use of bibliometric indicators, such as the h-index or the Journal Impact Factor, has become widespread despite their known limitations. We argue that most existing bibliometric indicators are inconsistent, biased, and, worst of all, susceptible to manipulation. Here, we pursue a principled approach to the development of an indicator to quantify the scientific impact of both individual researchers and research institutions, grounded in the functional form of the distribution of the asymptotic number of citations. We validate our approach using the publication records of 1,283 researchers from seven scientific and engineering disciplines and the chemistry departments at the 106 U.S. research institutions classified as "very high research activity". Our approach has three distinct advantages. First, it accurately captures the overall scientific impact of researchers at all career stages, as measured by asymptotic citation counts. Second, unlike other measures, our indicator is resistant to manipulation and rewards publication quality over quantity. Third, our approach captures the time-evolution of the scientific impact of research institutions.
Submitted 2 November, 2015;
originally announced November 2015.
-
A conditional construction of restricted isometries
Authors:
Afonso S. Bandeira,
Dustin G. Mixon,
Joel Moreira
Abstract:
We study the restricted isometry property of a matrix that is built from the discrete Fourier transform matrix by collecting rows indexed by quadratic residues. We find an $ε>0$ such that, conditioned on a folklore conjecture in number theory, this matrix satisfies the restricted isometry property with sparsity parameter $K=Ω(M^{1/2+ε})$, where $M$ is the number of rows.
Submitted 23 October, 2014;
originally announced October 2014.
-
Derandomizing restricted isometries via the Legendre symbol
Authors:
Afonso S. Bandeira,
Matthew Fickus,
Dustin G. Mixon,
Joel Moreira
Abstract:
The restricted isometry property (RIP) is an important matrix condition in compressed sensing, but the best matrix constructions to date use randomness. This paper leverages pseudorandom properties of the Legendre symbol to reduce the number of random bits in an RIP matrix with Bernoulli entries. In this regard, the Legendre symbol is not special: our main result naturally generalizes to any small-bias sample space. We also conjecture that no random bits are necessary for our Legendre symbol-based construction.
Submitted 16 June, 2014;
originally announced June 2014.