Search | arXiv e-print repository

SH17: A Dataset for Human Safety and Personal Protective Equipment Detection in Manufacturing Industry

Authors: Hafiz Mughees Ahmad, Afshin Rahimi

Abstract: Workplace accidents continue to pose significant risks for human safety, particularly in industries such as construction and manufacturing, and the necessity for effective Personal Protective Equipment (PPE) compliance has become increasingly paramount. Our research focuses on the development of non-invasive techniques based on the Object Detection (OD) and Convolutional Neural Network (CNN) to de… ▽ More Workplace accidents continue to pose significant risks for human safety, particularly in industries such as construction and manufacturing, and the necessity for effective Personal Protective Equipment (PPE) compliance has become increasingly paramount. Our research focuses on the development of non-invasive techniques based on the Object Detection (OD) and Convolutional Neural Network (CNN) to detect and verify the proper use of various types of PPE such as helmets, safety glasses, masks, and protective clothing. This study proposes the SH17 Dataset, consisting of 8,099 annotated images containing 75,994 instances of 17 classes collected from diverse industrial environments, to train and validate the OD models. We have trained state-of-the-art OD models for benchmarking, and initial results demonstrate promising accuracy levels with You Only Look Once (YOLO)v9-e model variant exceeding 70.9% in PPE detection. The performance of the model validation on cross-domain datasets suggests that integrating these technologies can significantly improve safety management systems, providing a scalable and efficient solution for industries striving to meet human safety regulations and protect their workforce. The dataset is available at https://github.com/ahmadmughees/sh17dataset. △ Less

Submitted 5 July, 2024; originally announced July 2024.

ACM Class: I.2.10; I.4.8; I.4.9; I.5.1; I.5.4

arXiv:2407.02060 [pdf, other]

Terminating Differentiable Tree Experts

Authors: Jonathan Thomm, Michael Hersche, Giacomo Camposampiero, Aleksandar Terzić, Bernhard Schölkopf, Abbas Rahimi

Abstract: We advance the recently proposed neuro-symbolic Differentiable Tree Machine, which learns tree operations using a combination of transformers and Tensor Product Representations. We investigate the architecture and propose two key components. We first remove a series of different transformer layers that are used in every step by introducing a mixture of experts. This results in a Differentiable Tre… ▽ More We advance the recently proposed neuro-symbolic Differentiable Tree Machine, which learns tree operations using a combination of transformers and Tensor Product Representations. We investigate the architecture and propose two key components. We first remove a series of different transformer layers that are used in every step by introducing a mixture of experts. This results in a Differentiable Tree Experts model with a constant number of parameters for any arbitrary number of steps in the computation, compared to the previous method in the Differentiable Tree Machine with a linear growth. Given this flexibility in the number of steps, we additionally propose a new termination algorithm to provide the model the power to choose how many steps to make automatically. The resulting Terminating Differentiable Tree Experts model sluggishly learns to predict the number of steps without an oracle. It can do so while maintaining the learning capabilities of the model, converging to the optimal amount of steps. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted at the 18th International Conference on Neural-Symbolic Learning and Reasoning (NeSy) 2024

arXiv:2406.19121 [pdf, other]

Towards Learning Abductive Reasoning using VSA Distributed Representations

Authors: Giacomo Camposampiero, Michael Hersche, Aleksandar Terzić, Roger Wattenhofer, Abu Sebastian, Abbas Rahimi

Abstract: We introduce the Abductive Rule Learner with Context-awareness (ARLC), a model that solves abstract reasoning tasks based on Learn-VRF. ARLC features a novel and more broadly applicable training objective for abductive reasoning, resulting in better interpretability and higher accuracy when solving Raven's progressive matrices (RPM). ARLC allows both programming domain knowledge and learning the r… ▽ More We introduce the Abductive Rule Learner with Context-awareness (ARLC), a model that solves abstract reasoning tasks based on Learn-VRF. ARLC features a novel and more broadly applicable training objective for abductive reasoning, resulting in better interpretability and higher accuracy when solving Raven's progressive matrices (RPM). ARLC allows both programming domain knowledge and learning the rules underlying a data distribution. We evaluate ARLC on the I-RAVEN dataset, showcasing state-of-the-art accuracy across both in-distribution and out-of-distribution (unseen attribute-rule pairs) tests. ARLC surpasses neuro-symbolic and connectionist baselines, including large language models, despite having orders of magnitude fewer parameters. We show ARLC's robustness to post-programming training by incrementally learning from examples on top of programmed knowledge, which only improves its performance and does not result in catastrophic forgetting of the programmed solution. We validate ARLC's seamless transfer learning from a 2x2 RPM constellation to unseen constellations. Our code is available at https://github.com/IBM/abductive-rule-learner-with-context-awareness. △ Less

Submitted 30 August, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: Accepted at the 18th International Conference on Neural-Symbolic Learning and Reasoning (NeSy) 2024 [Spotlight]

arXiv:2403.10569 [pdf, other]

Achieving Pareto Optimality using Efficient Parameter Reduction for DNNs in Resource-Constrained Edge Environment

Authors: Atah Nuh Mih, Alireza Rahimi, Asfia Kawnine, Francis Palma, Monica Wachowicz, Rickey Dubay, Hung Cao

Abstract: This paper proposes an optimization of an existing Deep Neural Network (DNN) that improves its hardware utilization and facilitates on-device training for resource-constrained edge environments. We implement efficient parameter reduction strategies on Xception that shrink the model size without sacrificing accuracy, thus decreasing memory utilization during training. We evaluate our model in two e… ▽ More This paper proposes an optimization of an existing Deep Neural Network (DNN) that improves its hardware utilization and facilitates on-device training for resource-constrained edge environments. We implement efficient parameter reduction strategies on Xception that shrink the model size without sacrificing accuracy, thus decreasing memory utilization during training. We evaluate our model in two experiments: Caltech-101 image classification and PCB defect detection and compare its performance against the original Xception and lightweight models, EfficientNetV2B1 and MobileNetV2. The results of the Caltech-101 image classification show that our model has a better test accuracy (76.21%) than Xception (75.89%), uses less memory on average (847.9MB) than Xception (874.6MB), and has faster training and inference times. The lightweight models overfit with EfficientNetV2B1 having a 30.52% test accuracy and MobileNetV2 having a 58.11% test accuracy. Both lightweight models have better memory usage than our model and Xception. On the PCB defect detection, our model has the best test accuracy (90.30%), compared to Xception (88.10%), EfficientNetV2B1 (55.25%), and MobileNetV2 (50.50%). MobileNetV2 has the least average memory usage (849.4MB), followed by our model (865.8MB), then EfficientNetV2B1 (874.8MB), and Xception has the highest (893.6MB). We further experiment with pre-trained weights and observe that memory usage decreases thereby showing the benefits of transfer learning. A Pareto analysis of the models' performance shows that our optimized model architecture satisfies accuracy and low memory utilization objectives. △ Less

Submitted 14 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2401.05355

arXiv:2403.07851 [pdf, other]

12 mJ per Class On-Device Online Few-Shot Class-Incremental Learning

Authors: Yoga Esa Wibowo, Cristian Cioflan, Thorir Mar Ingolfsson, Michael Hersche, Leo Zhao, Abbas Rahimi, Luca Benini

Abstract: Few-Shot Class-Incremental Learning (FSCIL) enables machine learning systems to expand their inference capabilities to new classes using only a few labeled examples, without forgetting the previously learned classes. Classical backpropagation-based learning and its variants are often unsuitable for battery-powered, memory-constrained systems at the extreme edge. In this work, we introduce Online F… ▽ More Few-Shot Class-Incremental Learning (FSCIL) enables machine learning systems to expand their inference capabilities to new classes using only a few labeled examples, without forgetting the previously learned classes. Classical backpropagation-based learning and its variants are often unsuitable for battery-powered, memory-constrained systems at the extreme edge. In this work, we introduce Online Few-Shot Class-Incremental Learning (O-FSCIL), based on a lightweight model consisting of a pretrained and metalearned feature extractor and an expandable explicit memory storing the class prototypes. The architecture is pretrained with a novel feature orthogonality regularization and metalearned with a multi-margin loss. For learning a new class, our approach extends the explicit memory with novel class prototypes, while the remaining architecture is kept frozen. This allows learning previously unseen classes based on only a few examples with one single pass (hence online). O-FSCIL obtains an average accuracy of 68.62% on the FSCIL CIFAR100 benchmark, achieving state-of-the-art results. Tailored for ultra-low-power platforms, we implement O-FSCIL on the 60 mW GAP9 microcontroller, demonstrating online learning capabilities within just 12 mJ per new class. △ Less

Submitted 12 March, 2024; originally announced March 2024.

Comments: 6 pages, 4 tables, 3 figures. Accepted at IEEE DATE 2024

arXiv:2402.05785 [pdf, other]

Limits of Transformer Language Models on Learning to Compose Algorithms

Authors: Jonathan Thomm, Aleksandar Terzic, Giacomo Camposampiero, Michael Hersche, Bernhard Schölkopf, Abbas Rahimi

Abstract: We analyze the capabilities of Transformer language models in learning compositional discrete tasks. To this end, we evaluate training LLaMA models and prompting GPT-4 and Gemini on four tasks demanding to learn a composition of several discrete sub-tasks. On both training LLaMA models from scratch and prompting on GPT-4 and Gemini, we measure how well these models can reuse primitives observable… ▽ More We analyze the capabilities of Transformer language models in learning compositional discrete tasks. To this end, we evaluate training LLaMA models and prompting GPT-4 and Gemini on four tasks demanding to learn a composition of several discrete sub-tasks. On both training LLaMA models from scratch and prompting on GPT-4 and Gemini, we measure how well these models can reuse primitives observable in the sub-tasks to learn the composition task. Our results indicate that compositional learning in state-of-the-art Transformer language models is highly sample inefficient: LLaMA requires more data samples than relearning all sub-tasks from scratch to learn the compositional task; in-context prompting with few samples is unreliable and fails at executing the sub-tasks or correcting the errors in multi-round code generation. Further, by leveraging complexity theory, we support these findings with a theoretical analysis focused on the sample inefficiency of gradient descent in memorizing feedforward models. △ Less

Submitted 25 May, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.00243 [pdf, other]

Capacity Constraint Analysis Using Object Detection for Smart Manufacturing

Authors: Hafiz Mughees Ahmad, Afshin Rahimi, Khizer Hayat

Abstract: The increasing popularity of Deep Learning (DL) based Object Detection (OD) methods and their real-world applications have opened new venues in smart manufacturing. Traditional industries struck by capacity constraints after Coronavirus Disease (COVID-19) require non-invasive methods for in-depth operations' analysis to optimize and increase their revenue. In this study, we have initially develope… ▽ More The increasing popularity of Deep Learning (DL) based Object Detection (OD) methods and their real-world applications have opened new venues in smart manufacturing. Traditional industries struck by capacity constraints after Coronavirus Disease (COVID-19) require non-invasive methods for in-depth operations' analysis to optimize and increase their revenue. In this study, we have initially developed a Convolutional Neural Network (CNN) based OD model to tackle this issue. This model is trained to accurately identify the presence of chairs and individuals on the production floor. The identified objects are then passed to the CNN based tracker, which tracks them throughout their life cycle in the workstation. The extracted meta-data is further processed through a novel framework for the capacity constraint analysis. We identified that the Station C is only 70.6% productive through 6 months. Additionally, the time spent at each station is recorded and aggregated for each object. This data proves helpful in conducting annual audits and effectively managing labor and material over time. △ Less

Submitted 31 January, 2024; originally announced February 2024.

ACM Class: I.2.10; I.4.8; I.4.9; I.5.1; I.5.4

arXiv:2401.16876 [pdf, other]

Zero-shot Classification using Hyperdimensional Computing

Authors: Samuele Ruffino, Geethan Karunaratne, Michael Hersche, Luca Benini, Abu Sebastian, Abbas Rahimi

Abstract: Classification based on Zero-shot Learning (ZSL) is the ability of a model to classify inputs into novel classes on which the model has not previously seen any training examples. Providing an auxiliary descriptor in the form of a set of attributes describing the new classes involved in the ZSL-based classification is one of the favored approaches to solving this challenging task. In this work, ins… ▽ More Classification based on Zero-shot Learning (ZSL) is the ability of a model to classify inputs into novel classes on which the model has not previously seen any training examples. Providing an auxiliary descriptor in the form of a set of attributes describing the new classes involved in the ZSL-based classification is one of the favored approaches to solving this challenging task. In this work, inspired by Hyperdimensional Computing (HDC), we propose the use of stationary binary codebooks of symbol-like distributed representations inside an attribute encoder to compactly represent a computationally simple end-to-end trainable model, which we name Hyperdimensional Computing Zero-shot Classifier~(HDC-ZSC). It consists of a trainable image encoder, an attribute encoder based on HDC, and a similarity kernel. We show that HDC-ZSC can be used to first perform zero-shot attribute extraction tasks and, can later be repurposed for Zero-shot Classification tasks with minimal architectural changes and minimal model retraining. HDC-ZSC achieves Pareto optimal results with a 63.8% top-1 classification accuracy on the CUB-200 dataset by having only 26.6 million trainable parameters. Compared to two other state-of-the-art non-generative approaches, HDC-ZSC achieves 4.3% and 9.9% better accuracy, while they require more than 1.85x and 1.72x parameters compared to HDC-ZSC, respectively. △ Less

Submitted 30 January, 2024; originally announced January 2024.

Comments: This is the extended version of a paper accepted in the Design, Automation, and Test in Europe Conference (DATE), 2024

arXiv:2401.16024 [pdf, other]

Probabilistic Abduction for Visual Abstract Reasoning via Learning Rules in Vector-symbolic Architectures

Authors: Michael Hersche, Francesco di Stefano, Thomas Hofmann, Abu Sebastian, Abbas Rahimi

Abstract: Abstract reasoning is a cornerstone of human intelligence, and replicating it with artificial intelligence (AI) presents an ongoing challenge. This study focuses on efficiently solving Raven's progressive matrices (RPM), a visual test for assessing abstract reasoning abilities, by using distributed computation and operators provided by vector-symbolic architectures (VSA). Instead of hard-coding th… ▽ More Abstract reasoning is a cornerstone of human intelligence, and replicating it with artificial intelligence (AI) presents an ongoing challenge. This study focuses on efficiently solving Raven's progressive matrices (RPM), a visual test for assessing abstract reasoning abilities, by using distributed computation and operators provided by vector-symbolic architectures (VSA). Instead of hard-coding the rule formulations associated with RPMs, our approach can learn the VSA rule formulations (hence the name Learn-VRF) with just one pass through the training data. Yet, our approach, with compact parameters, remains transparent and interpretable. Learn-VRF yields accurate predictions on I-RAVEN's in-distribution data, and exhibits strong out-of-distribution capabilities concerning unseen attribute-rule pairs, significantly outperforming pure connectionist baselines including large language models. Our code is available at https://github.com/IBM/learn-vector-symbolic-architectures-rule-formulations. △ Less

Submitted 29 January, 2024; originally announced January 2024.

Comments: Accepted in NeurIPS 2023 Workshop on MATH-AI

arXiv:2312.05605 [pdf, other]

TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing

Authors: Aleksandar Terzic, Michael Hersche, Geethan Karunaratne, Luca Benini, Abu Sebastian, Abbas Rahimi

Abstract: MEGA is a recent transformer-based architecture, which utilizes a linear recurrent operator whose parallel computation, based on the FFT, scales as $O(LlogL)$, with $L$ being the sequence length. We build upon their approach by replacing the linear recurrence with a special temporal convolutional network which permits larger receptive field size with shallower networks, and reduces the computation… ▽ More MEGA is a recent transformer-based architecture, which utilizes a linear recurrent operator whose parallel computation, based on the FFT, scales as $O(LlogL)$, with $L$ being the sequence length. We build upon their approach by replacing the linear recurrence with a special temporal convolutional network which permits larger receptive field size with shallower networks, and reduces the computational complexity to $O(L)$. The resulting model is called TCNCA, a Temporal Convolutional Network with Chunked Attention. We evaluate TCNCA on EnWik8 language modeling, long-range-arena (LRA) sequence classification, as well as a synthetic reasoning benchmark associative recall. On EnWik8, TCNCA outperforms MEGA, reaching a lower loss with $1.37\times$/$1.24\times$ faster forward/backward pass during training. The dilated convolutions used in TCNCA are consistently and significantly faster operations than the FFT-based parallelized recurrence in GPUs, making them a scalable candidate for handling very large sequence lengths: they are up to $7.07\times$/$2.86\times$ faster in the forward/backward pass for sequences up to 131k. Further on LRA, TCNCA achieves, on average, $1.28\times$ speed-up during inference with similar accuracy to what MEGA achieves. On associative recall, we find that even a simplified version of TCNCA, without excessive multiplicative and additive interactions, remains superior or competitive to MEGA on a range of sequence lengths and vocabulary sizes. △ Less

Submitted 9 December, 2023; originally announced December 2023.

arXiv:2312.04540 [pdf, other]

Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations

Authors: Yuejiang Liu, Ahmad Rahimi, Po-Chien Luan, Frano Rajič, Alexandre Alahi

Abstract: Modeling spatial-temporal interactions among neighboring agents is at the heart of multi-agent problems such as motion forecasting and crowd navigation. Despite notable progress, it remains unclear to which extent modern representations can capture the causal relationships behind agent interactions. In this work, we take an in-depth look at the causal awareness of these representations, from compu… ▽ More Modeling spatial-temporal interactions among neighboring agents is at the heart of multi-agent problems such as motion forecasting and crowd navigation. Despite notable progress, it remains unclear to which extent modern representations can capture the causal relationships behind agent interactions. In this work, we take an in-depth look at the causal awareness of these representations, from computational formalism to real-world practice. First, we cast doubt on the notion of non-causal robustness studied in the recent CausalAgents benchmark. We show that recent representations are already partially resilient to perturbations of non-causal agents, and yet modeling indirect causal effects involving mediator agents remains challenging. To address this challenge, we introduce a metric learning approach that regularizes latent representations with causal annotations. Our controlled experiments show that this approach not only leads to higher degrees of causal awareness but also yields stronger out-of-distribution robustness. To further operationalize it in practice, we propose a sim-to-real causal transfer method via cross-domain multi-task learning. Experiments on pedestrian datasets show that our method can substantially boost generalization, even in the absence of real-world causal annotations. We hope our work provides a new perspective on the challenges and potential pathways towards causally-aware representations of multi-agent interactions. Our code is available at https://github.com/socialcausality. △ Less

Submitted 7 December, 2023; originally announced December 2023.

Comments: Preprint

arXiv:2312.02829 [pdf, other]

MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition

Authors: Nicolas Menet, Michael Hersche, Geethan Karunaratne, Luca Benini, Abu Sebastian, Abbas Rahimi

Abstract: With the advent of deep learning, progressively larger neural networks have been designed to solve complex tasks. We take advantage of these capacity-rich models to lower the cost of inference by exploiting computation in superposition. To reduce the computational burden per input, we propose Multiple-Input-Multiple-Output Neural Networks (MIMONets) capable of handling many inputs at once. MIMONet… ▽ More With the advent of deep learning, progressively larger neural networks have been designed to solve complex tasks. We take advantage of these capacity-rich models to lower the cost of inference by exploiting computation in superposition. To reduce the computational burden per input, we propose Multiple-Input-Multiple-Output Neural Networks (MIMONets) capable of handling many inputs at once. MIMONets augment various deep neural network architectures with variable binding mechanisms to represent an arbitrary number of inputs in a compositional data structure via fixed-width distributed representations. Accordingly, MIMONets adapt nonlinear neural transformations to process the data structure holistically, leading to a speedup nearly proportional to the number of superposed input items in the data structure. After processing in superposition, an unbinding mechanism recovers each transformed input of interest. MIMONets also provide a dynamic trade-off between accuracy and throughput by an instantaneous on-demand switching between a set of accuracy-throughput operating points, yet within a single set of fixed parameters. We apply the concept of MIMONets to both CNN and Transformer architectures resulting in MIMOConv and MIMOFormer, respectively. Empirical evaluations show that MIMOConv achieves about 2-4 x speedup at an accuracy delta within [+0.68, -3.18]% compared to WideResNet CNNs on CIFAR10 and CIFAR100. Similarly, MIMOFormer can handle 2-4 inputs at once while maintaining a high average accuracy within a [-1.07, -3.43]% delta on the long range arena benchmark. Finally, we provide mathematical bounds on the interference between superposition channels in MIMOFormer. Our code is available at https://github.com/IBM/multiple-input-multiple-output-nets. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: accepted in NeurIPS 2023

arXiv:2309.08798 [pdf, other]

D3: Data Diversity Design for Systematic Generalization in Visual Question Answering

Authors: Amir Rahimi, Vanessa D'Amario, Moyuru Yamada, Kentaro Takemoto, Tomotake Sasaki, Xavier Boix

Abstract: Systematic generalization is a crucial aspect of intelligence, which refers to the ability to generalize to novel tasks by combining known subtasks and concepts. One critical factor that has been shown to influence systematic generalization is the diversity of training data. However, diversity can be defined in various ways, as data have many factors of variation. A more granular understanding of… ▽ More Systematic generalization is a crucial aspect of intelligence, which refers to the ability to generalize to novel tasks by combining known subtasks and concepts. One critical factor that has been shown to influence systematic generalization is the diversity of training data. However, diversity can be defined in various ways, as data have many factors of variation. A more granular understanding of how different aspects of data diversity affect systematic generalization is lacking. We present new evidence in the problem of Visual Question Answering (VQA) that reveals that the diversity of simple tasks (i.e. tasks formed by a few subtasks and concepts) plays a key role in achieving systematic generalization. This implies that it may not be essential to gather a large and varied number of complex tasks, which could be costly to obtain. We demonstrate that this result is independent of the similarity between the training and testing data and applies to well-known families of neural network architectures for VQA (i.e. monolithic architectures and neural module networks). Additionally, we observe that neural module networks leverage all forms of data diversity we evaluated, while monolithic architectures require more extensive amounts of data to do so. These findings provide a first step towards understanding the interactions between data diversity design, neural network architectures, and systematic generalization capabilities. △ Less

Submitted 15 September, 2023; originally announced September 2023.

Comments: Under review, 15 pages

arXiv:2308.08496 [pdf, other]

Understanding User Intent Modeling for Conversational Recommender Systems: A Systematic Literature Review

Authors: Siamak Farshidi, Kiyan Rezaee, Sara Mazaheri, Amir Hossein Rahimi, Ali Dadashzadeh, Morteza Ziabakhsh, Sadegh Eskandari, Slinger Jansen

Abstract: Context: User intent modeling is a crucial process in Natural Language Processing that aims to identify the underlying purpose behind a user's request, enabling personalized responses. With a vast array of approaches introduced in the literature (over 13,000 papers in the last decade), understanding the related concepts and commonly used models in AI-based systems is essential. Method: We conducte… ▽ More Context: User intent modeling is a crucial process in Natural Language Processing that aims to identify the underlying purpose behind a user's request, enabling personalized responses. With a vast array of approaches introduced in the literature (over 13,000 papers in the last decade), understanding the related concepts and commonly used models in AI-based systems is essential. Method: We conducted a systematic literature review to gather data on models typically employed in designing conversational recommender systems. From the collected data, we developed a decision model to assist researchers in selecting the most suitable models for their systems. Additionally, we performed two case studies to evaluate the effectiveness of our proposed decision model. Results: Our study analyzed 59 distinct models and identified 74 commonly used features. We provided insights into potential model combinations, trends in model selection, quality concerns, evaluation measures, and frequently used datasets for training and evaluating these models. Contribution: Our study contributes practical insights and a comprehensive understanding of user intent modeling, empowering the development of more effective and personalized conversational recommender systems. With the Conversational Recommender System, researchers can perform a more systematic and efficient assessment of fitting intent modeling frameworks. △ Less

Submitted 5 August, 2023; originally announced August 2023.

arXiv:2307.04599 [pdf, other]

Bridging MDE and AI: A Systematic Review of Domain-Specific Languages and Model-Driven Practices in AI Software Systems Engineering

Authors: Simon Raedler, Luca Berardinelli, Karolin Winter, Abbas Rahimi, Stefanie Rinderle-Ma

Abstract: Background:Technical systems are growing in complexity with more components and functions across various disciplines. Model-Driven Engineering (MDE) helps manage this complexity by using models as key artifacts. Domain-Specific Languages (DSL) supported by MDE facilitate modeling. As data generation in product development increases, there's a growing demand for AI algorithms, which can be challeng… ▽ More Background:Technical systems are growing in complexity with more components and functions across various disciplines. Model-Driven Engineering (MDE) helps manage this complexity by using models as key artifacts. Domain-Specific Languages (DSL) supported by MDE facilitate modeling. As data generation in product development increases, there's a growing demand for AI algorithms, which can be challenging to implement. Integrating AI algorithms with DSL and MDE can streamline this process. Objective:This study aims to investigate the existing model-driven approaches relying on DSL in support of the engineering of AI software systems to sharpen future research further and define the current state of the art. Method:We conducted a Systemic Literature Review (SLR), collecting papers from five major databases resulting in 1335 candidate studies, eventually retaining 18 primary studies. Each primary study will be evaluated and discussed with respect to the adoption of MDE principles and practices and the phases of AI development support aligned with the stages of the CRISP-DM methodology. Results:The study's findings show that language workbenches are of paramount importance in dealing with all aspects of modeling language development and are leveraged to define DSL explicitly addressing AI concerns. The most prominent AI-related concerns are training and modeling of the AI algorithm, while minor emphasis is given to the time-consuming preparation of the data. Early project phases that support interdisciplinary communication of requirements, e.g., CRISP-DM Business Understanding phase, are rarely reflected. Conclusion:The study found that the use of MDE for AI is still in its early stages, and there is no single tool or method that is widely used. Additionally, current approaches tend to focus on specific stages of development rather than providing support for the entire development process. △ Less

Submitted 6 May, 2024; v1 submitted 10 July, 2023; originally announced July 2023.

Comments: 57 pages, 2 figures, 8 tables

ACM Class: A.1; H.1.0; I.2.4

arXiv:2304.04687 [pdf, other]

Learning to Detect Touches on Cluttered Tables

Authors: Norberto Adrian Goussies, Kenji Hata, Shruthi Prabhakara, Abhishek Amit, Tony Aube, Carl Cepress, Diana Chang, Li-Te Cheng, Horia Stefan Ciurdar, Mike Cleron, Chelsey Fleming, Ashwin Ganti, Divyansh Garg, Niloofar Gheissari, Petra Luna Grutzik, David Hendon, Daniel Iglesia, Jin Kim, Stuart Kyle, Chris LaRosa, Roman Lewkow, Peter F McDermott, Chris Melancon, Paru Nackeeran, Neal Norwitz , et al. (6 additional authors not shown)

Abstract: We present a novel self-contained camera-projector tabletop system with a lamp form-factor that brings digital intelligence to our tables. We propose a real-time, on-device, learning-based touch detection algorithm that makes any tabletop interactive. The top-down configuration and learning-based algorithm makes our method robust to the presence of clutter, a main limitation of existing camera-pro… ▽ More We present a novel self-contained camera-projector tabletop system with a lamp form-factor that brings digital intelligence to our tables. We propose a real-time, on-device, learning-based touch detection algorithm that makes any tabletop interactive. The top-down configuration and learning-based algorithm makes our method robust to the presence of clutter, a main limitation of existing camera-projector tabletop systems. Our research prototype enables a set of experiences that combine hand interactions and objects present on the table. A video can be found at https://youtu.be/hElC_c25Fg8. △ Less

Submitted 10 April, 2023; originally announced April 2023.

arXiv:2303.13957 [pdf, other]

Factorizers for Distributed Sparse Block Codes

Authors: Michael Hersche, Aleksandar Terzic, Geethan Karunaratne, Jovin Langenegger, Angéline Pouget, Giovanni Cherubini, Luca Benini, Abu Sebastian, Abbas Rahimi

Abstract: Distributed sparse block codes (SBCs) exhibit compact representations for encoding and manipulating symbolic data structures using fixed-width vectors. One major challenge however is to disentangle, or factorize, the distributed representation of data structures into their constituent elements without having to search through all possible combinations. This factorization becomes more challenging w… ▽ More Distributed sparse block codes (SBCs) exhibit compact representations for encoding and manipulating symbolic data structures using fixed-width vectors. One major challenge however is to disentangle, or factorize, the distributed representation of data structures into their constituent elements without having to search through all possible combinations. This factorization becomes more challenging when SBCs vectors are noisy due to perceptual uncertainty and approximations made by modern neural networks to generate the query SBCs vectors. To address these challenges, we first propose a fast and highly accurate method for factorizing a more flexible and hence generalized form of SBCs, dubbed GSBCs. Our iterative factorizer introduces a threshold-based nonlinear activation, conditional random sampling, and an $\ell_\infty$-based similarity metric. Secondly, the proposed factorizer maintains a high accuracy when queried by noisy product vectors generated using deep convolutional neural networks (CNNs). This facilitates its application in replacing the large fully connected layer (FCL) in CNNs, whereby $C$ trainable class vectors, or attribute combinations, can be implicitly represented by our factorizer having $F$-factor codebooks, each with $\sqrt[\leftroot{-2}\uproot{2}F]{C}$ fixed codevectors. We provide a methodology to flexibly integrate our factorizer in the classification layer of CNNs with a novel loss function. With this integration, the convolutional layers can generate a noisy product vector that our factorizer can still decode, whereby the decoded factors can have different interpretations based on downstream tasks. We demonstrate the feasibility of our method on four deep CNN architectures over CIFAR-100, ImageNet-1K, and RAVEN datasets. In all use cases, the number of parameters and operations are notably reduced compared to the FCL. △ Less

Submitted 28 May, 2024; v1 submitted 24 March, 2023; originally announced March 2023.

Comments: Accepted at Neurosymbolic Artificial Intelligence

arXiv:2303.08067 [pdf, other]

doi 10.1109/JETCAS.2023.3243064

WHYPE: A Scale-Out Architecture with Wireless Over-the-Air Majority for Scalable In-memory Hyperdimensional Computing

Authors: Robert Guirado, Abbas Rahimi, Geethan Karunaratne, Eduard Alarcón, Abu Sebastian, Sergi Abadal

Abstract: Hyperdimensional computing (HDC) is an emerging computing paradigm that represents, manipulates, and communicates data using long random vectors known as hypervectors. Among different hardware platforms capable of executing HDC algorithms, in-memory computing (IMC) has shown promise as it is very efficient in performing matrix-vector multiplications, which are common in the HDC algebra. Although H… ▽ More Hyperdimensional computing (HDC) is an emerging computing paradigm that represents, manipulates, and communicates data using long random vectors known as hypervectors. Among different hardware platforms capable of executing HDC algorithms, in-memory computing (IMC) has shown promise as it is very efficient in performing matrix-vector multiplications, which are common in the HDC algebra. Although HDC architectures based on IMC already exist, how to scale them remains a key challenge due to collective communication patterns that these architectures required and that traditional chip-scale networks were not designed for. To cope with this difficulty, we propose a scale-out HDC architecture called WHYPE, which uses wireless in-package communication technology to interconnect a large number of physically distributed IMC cores that either encode hypervectors or perform multiple similarity searches in parallel. In this context, the key enabler of WHYPE is the opportunistic use of the wireless network as a medium for over-the-air computation. WHYPE implements an optimized source coding that allows receivers to calculate the bit-wise majority of multiple hypervectors (a useful operation in HDC) being transmitted concurrently over the wireless channel. By doing so, we achieve a joint broadcast distribution and computation with a performance and efficiency unattainable with wired interconnects, which in turn enables massive parallelization of the architecture. Through evaluations at the on-chip network and complete architecture levels, we demonstrate that WHYPE can bundle and distribute hypervectors faster and more efficiently than a hypothetical wired implementation, and that it scales well to tens of receivers. We show that the average error rate of the majority computation is low, such that it has negligible impact on the accuracy of HDC classification tasks. △ Less

Submitted 4 February, 2023; originally announced March 2023.

Comments: Accepted at IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS). arXiv admin note: text overlap with arXiv:2205.10889

arXiv:2301.01682 [pdf, other]

DOT: A flexible multi-objective optimization framework for transferring features across single-cell and spatial omics

Authors: Arezou Rahimi, Luis A. Vale-Silva, Maria Faelth Savitski, Jovan Tanevski, Julio Saez-Rodriguez

Abstract: Single-cell RNA sequencing (scRNA-seq) and spatially-resolved imaging/sequencing technologies have revolutionized biomedical research. On one hand, scRNA-seq provides information about a large portion of the transcriptome for individual cells, but lacks the spatial context. On the other hand, spatially-resolved measurements come with a trade-off between resolution and gene coverage. Combining scRN… ▽ More Single-cell RNA sequencing (scRNA-seq) and spatially-resolved imaging/sequencing technologies have revolutionized biomedical research. On one hand, scRNA-seq provides information about a large portion of the transcriptome for individual cells, but lacks the spatial context. On the other hand, spatially-resolved measurements come with a trade-off between resolution and gene coverage. Combining scRNA-seq with different spatially-resolved technologies can thus provide a more complete map of tissues with enhanced cellular resolution and gene coverage. Here, we propose DOT, a novel multi-objective optimization framework for transferring cellular features across these data modalities. DOT is flexible and can be used to infer categorical (cell type or cell state) or continuous features (gene expression) in different types of spatial omics. Our optimization model combines practical aspects related to tissue composition, technical effects, and integration of prior knowledge, thereby providing flexibility to combine scRNA-seq and both low- and high-resolution spatial data. Our fast implementation based on the Frank-Wolfe algorithm achieves state-of-the-art or improved performance in localizing cell features in high- and low-resolution spatial data and estimating the expression of unmeasured genes in low-coverage spatial data across different tissues. DOT is freely available and can be deployed efficiently without large computational resources; typical cases-studies can be run on a laptop, facilitating its use. △ Less

Submitted 21 July, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

Comments: 36 pages, 6 figures

arXiv:2301.01221 [pdf, other]

Unlocking Metaverse-as-a-Service The three pillars to watch: Privacy and Security, Edge Computing, and Blockchain

Authors: Vesal Ahsani, Ali Rahimi, Mehdi Letafati, Babak Hossein Khalaj

Abstract: In this article, the authors provide a comprehensive overview on three core pillars of metaverse-as-a-service (MaaS) platforms; privacy and security, edge computing, and blockchain technology. The article starts by investigating security aspects for the wireless access to the metaverse. Then it goes through the privacy and security issues inside the metaverse from data-centric, learning-centric, a… ▽ More In this article, the authors provide a comprehensive overview on three core pillars of metaverse-as-a-service (MaaS) platforms; privacy and security, edge computing, and blockchain technology. The article starts by investigating security aspects for the wireless access to the metaverse. Then it goes through the privacy and security issues inside the metaverse from data-centric, learning-centric, and human-centric points-of-view. The authors address private and secure mechanisms for privatizing sensitive data attributes and securing machine learning algorithms running in a distributed manner within the metaverse platforms. Novel visions and less-investigated methods are reviewed to help mobile network operators and metaverse service providers facilitate the realization of secure and private MaaS through different layers of the metaverse, ranging from the access layer to the social interactions among clients. Later in the article, it has been explained how the paradigm of edge computing can strengthen different aspects of the metaverse. Along with that, the challenges of using edge computing in the metaverse have been comprehensively investigated. Additionally, the paper has comprehensively investigated and analyzed 10 main challenges of MaaS platforms and thoroughly discussed how blockchain technology provides solutions for these constraints. At the final, future vision and directions, such as content-centric security and zero-trust metaverse, some blockchain's unsolved challenges are also discussed to bring further insights for the network designers in the metaverse era. △ Less

Submitted 11 January, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

Comments: 21 pages, 4 figures, added references for section 3-A

arXiv:2211.15351 [pdf, other]

Testing the effectiveness of saliency-based explainability in NLP using randomized survey-based experiments

Authors: Adel Rahimi, Shaurya Jain

Abstract: As the applications of Natural Language Processing (NLP) in sensitive areas like Political Profiling, Review of Essays in Education, etc. proliferate, there is a great need for increasing transparency in NLP models to build trust with stakeholders and identify biases. A lot of work in Explainable AI has aimed to devise explanation methods that give humans insights into the workings and predictions… ▽ More As the applications of Natural Language Processing (NLP) in sensitive areas like Political Profiling, Review of Essays in Education, etc. proliferate, there is a great need for increasing transparency in NLP models to build trust with stakeholders and identify biases. A lot of work in Explainable AI has aimed to devise explanation methods that give humans insights into the workings and predictions of NLP models. While these methods distill predictions from complex models like Neural Networks into consumable explanations, how humans understand these explanations is still widely unexplored. Innate human tendencies and biases can handicap the understanding of these explanations in humans, and can also lead to them misjudging models and predictions as a result. We designed a randomized survey-based experiment to understand the effectiveness of saliency-based Post-hoc explainability methods in Natural Language Processing. The result of the experiment showed that humans have a tendency to accept explanations with a less critical view. △ Less

Submitted 25 November, 2022; originally announced November 2022.

arXiv:2211.05052 [pdf, other]

doi 10.1038/s41565-023-01357-8

In-memory factorization of holographic perceptual representations

Authors: Jovin Langenegger, Geethan Karunaratne, Michael Hersche, Luca Benini, Abu Sebastian, Abbas Rahimi

Abstract: Disentanglement of constituent factors of a sensory signal is central to perception and cognition and hence is a critical task for future artificial intelligence systems. In this paper, we present a compute engine capable of efficiently factorizing holographic perceptual representations by exploiting the computation-in-superposition capability of brain-inspired hyperdimensional computing and the i… ▽ More Disentanglement of constituent factors of a sensory signal is central to perception and cognition and hence is a critical task for future artificial intelligence systems. In this paper, we present a compute engine capable of efficiently factorizing holographic perceptual representations by exploiting the computation-in-superposition capability of brain-inspired hyperdimensional computing and the intrinsic stochasticity associated with analog in-memory computing based on nanoscale memristive devices. Such an iterative in-memory factorizer is shown to solve at least five orders of magnitude larger problems that cannot be solved otherwise, while also significantly lowering the computational time and space complexity. We present a large-scale experimental demonstration of the factorizer by employing two in-memory compute chips based on phase-change memristive devices. The dominant matrix-vector multiply operations are executed at O(1) thus reducing the computational time complexity to merely the number of iterations. Moreover, we experimentally demonstrate the ability to factorize visual perceptual representations reliably and efficiently. △ Less

Submitted 16 February, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

Comments: 23 pages, 4 figures, 1 extended data figure, 3 supplementary notes, 2 supplementary figures and 3 supplementary tables

arXiv:2207.06810 [pdf, other]

In-memory Realization of In-situ Few-shot Continual Learning with a Dynamically Evolving Explicit Memory

Authors: Geethan Karunaratne, Michael Hersche, Jovin Langenegger, Giovanni Cherubini, Manuel Le Gallo-Bourdeau, Urs Egger, Kevin Brew, Sam Choi, INJO OK, Mary Claire Silvestre, Ning Li, Nicole Saulnier, Victor Chan, Ishtiaq Ahsan, Vijay Narayanan, Luca Benini, Abu Sebastian, Abbas Rahimi

Abstract: Continually learning new classes from a few training examples without forgetting previous old classes demands a flexible architecture with an inevitably growing portion of storage, in which new examples and classes can be incrementally stored and efficiently retrieved. One viable architectural solution is to tightly couple a stationary deep neural network to a dynamically evolving explicit memory… ▽ More Continually learning new classes from a few training examples without forgetting previous old classes demands a flexible architecture with an inevitably growing portion of storage, in which new examples and classes can be incrementally stored and efficiently retrieved. One viable architectural solution is to tightly couple a stationary deep neural network to a dynamically evolving explicit memory (EM). As the centerpiece of this architecture, we propose an EM unit that leverages energy-efficient in-memory compute (IMC) cores during the course of continual learning operations. We demonstrate for the first time how the EM unit can physically superpose multiple training examples, expand to accommodate unseen classes, and perform similarity search during inference, using operations on an IMC core based on phase-change memory (PCM). Specifically, the physical superposition of a few encoded training examples is realized via in-situ progressive crystallization of PCM devices. The classification accuracy achieved on the IMC core remains within a range of 1.28%--2.5% compared to that of the state-of-the-art full-precision baseline software model on both the CIFAR-100 and miniImageNet datasets when continually learning 40 novel classes (from only five examples per class) on top of 60 old classes. △ Less

Submitted 14 July, 2022; originally announced July 2022.

Comments: Accepted at the European Solid-state Devices and Circuits Conference (ESSDERC), September 2022

arXiv:2205.10889 [pdf, other]

Wireless On-Chip Communications for Scalable In-memory Hyperdimensional Computing

Authors: Robert Guirado, Abbas Rahimi, Geethan Karunaratne, Eduard Alarcón, Abu Sebastian, Sergi Abadal

Abstract: Hyperdimensional computing (HDC) is an emerging computing paradigm that represents, manipulates, and communicates data using very long random vectors (aka hypervectors). Among different hardware platforms capable of executing HDC algorithms, in-memory computing (IMC) systems have been recently proved to be one of the most energy-efficient options, due to hypervector manipulations in the memory its… ▽ More Hyperdimensional computing (HDC) is an emerging computing paradigm that represents, manipulates, and communicates data using very long random vectors (aka hypervectors). Among different hardware platforms capable of executing HDC algorithms, in-memory computing (IMC) systems have been recently proved to be one of the most energy-efficient options, due to hypervector manipulations in the memory itself that reduces data movement. Although implementations of HDC on single IMC cores have been made, their parallelization is still unresolved due to the communication challenges that these novel architectures impose and that traditional Networks-on-Chip and Networks-in-Package were not designed for. To cope with this difficulty, we propose the use of wireless on-chip communication technology in unique ways. We are particularly interested in physically distributing a large number of IMC cores performing similarity search across a chip, and maintaining the classification accuracy when each of which is queried with a slightly different version of a bundled hypervector. To achieve it, we introduce a novel over-the-air computing that consists of defining different binary decision regions in the receivers so as to compute the logical majority operation (i.e., bundling, or superposition) required in HDC. It introduces moderate overheads of a single antenna and receiver per IMC core. By doing so, we achieve a joint broadcast distribution and computation with a performance and efficiency unattainable with wired interconnects, which in turn enables massive parallelization of the architecture. It is demonstrated that the proposed approach allows to both bundle at least three hypervectors and scale similarity search to 64 IMC cores seamlessly, while incurring an average bit error ratio of 0.01 without any impact in the accuracy of a generic HDC-based classifier working with 512-bit vectors. △ Less

Submitted 22 May, 2022; originally announced May 2022.

Comments: This paper has been accepted at 2022 IEEE International Joint Conference on Neural Networks (IJCNN)

arXiv:2203.16588 [pdf, other]

Constrained Few-shot Class-incremental Learning

Authors: Michael Hersche, Geethan Karunaratne, Giovanni Cherubini, Luca Benini, Abu Sebastian, Abbas Rahimi

Abstract: Continually learning new classes from fresh data without forgetting previous knowledge of old classes is a very challenging research problem. Moreover, it is imperative that such learning must respect certain memory and computational constraints such as (i) training samples are limited to only a few per class, (ii) the computational cost of learning a novel class remains constant, and (iii) the me… ▽ More Continually learning new classes from fresh data without forgetting previous knowledge of old classes is a very challenging research problem. Moreover, it is imperative that such learning must respect certain memory and computational constraints such as (i) training samples are limited to only a few per class, (ii) the computational cost of learning a novel class remains constant, and (iii) the memory footprint of the model grows at most linearly with the number of classes observed. To meet the above constraints, we propose C-FSCIL, which is architecturally composed of a frozen meta-learned feature extractor, a trainable fixed-size fully connected layer, and a rewritable dynamically growing memory that stores as many vectors as the number of encountered classes. C-FSCIL provides three update modes that offer a trade-off between accuracy and compute-memory cost of learning novel classes. C-FSCIL exploits hyperdimensional embedding that allows to continually express many more classes than the fixed dimensions in the vector space, with minimal interference. The quality of class vector representations is further improved by aligning them quasi-orthogonally to each other by means of novel loss functions. Experiments on the CIFAR100, miniImageNet, and Omniglot datasets show that C-FSCIL outperforms the baselines with remarkable accuracy and compression. It also scales up to the largest problem size ever tried in this few-shot setting by learning 423 novel classes on top of 1200 base classes with less than 1.6% accuracy drop. Our code is available at https://github.com/IBM/constrained-FSCIL. △ Less

Submitted 30 March, 2022; originally announced March 2022.

Comments: CVPR 2022 camera-ready version

arXiv:2203.06223 [pdf, other]

doi 10.1109/TNNLS.2022.3159445

Generalized Key-Value Memory to Flexibly Adjust Redundancy in Memory-Augmented Networks

Authors: Denis Kleyko, Geethan Karunaratne, Jan M. Rabaey, Abu Sebastian, Abbas Rahimi

Abstract: Memory-augmented neural networks enhance a neural network with an external key-value memory whose complexity is typically dominated by the number of support vectors in the key memory. We propose a generalized key-value memory that decouples its dimension from the number of support vectors by introducing a free parameter that can arbitrarily add or remove redundancy to the key memory representation… ▽ More Memory-augmented neural networks enhance a neural network with an external key-value memory whose complexity is typically dominated by the number of support vectors in the key memory. We propose a generalized key-value memory that decouples its dimension from the number of support vectors by introducing a free parameter that can arbitrarily add or remove redundancy to the key memory representation. In effect, it provides an additional degree of freedom to flexibly control the trade-off between robustness and the resources required to store and compute the generalized key-value memory. This is particularly useful for realizing the key memory on in-memory computing hardware where it exploits nonideal, but extremely efficient non-volatile memory devices for dense storage and computation. Experimental results show that adapting this parameter on demand effectively mitigates up to 44% nonidealities, at equal accuracy and number of devices, without any need for neural network retraining. △ Less

Submitted 11 March, 2022; originally announced March 2022.

Comments: 8 pages, 7 figures

Journal ref: IEEE Transactions on Neural Networks and Learning Systems, 2022

arXiv:2203.04571 [pdf, other]

A Neuro-vector-symbolic Architecture for Solving Raven's Progressive Matrices

Authors: Michael Hersche, Mustafa Zeqiri, Luca Benini, Abu Sebastian, Abbas Rahimi

Abstract: Neither deep neural networks nor symbolic AI alone has approached the kind of intelligence expressed in humans. This is mainly because neural networks are not able to decompose joint representations to obtain distinct objects (the so-called binding problem), while symbolic AI suffers from exhaustive rule searches, among other problems. These two problems are still pronounced in neuro-symbolic AI w… ▽ More Neither deep neural networks nor symbolic AI alone has approached the kind of intelligence expressed in humans. This is mainly because neural networks are not able to decompose joint representations to obtain distinct objects (the so-called binding problem), while symbolic AI suffers from exhaustive rule searches, among other problems. These two problems are still pronounced in neuro-symbolic AI which aims to combine the best of the two paradigms. Here, we show that the two problems can be addressed with our proposed neuro-vector-symbolic architecture (NVSA) by exploiting its powerful operators on high-dimensional distributed representations that serve as a common language between neural networks and symbolic AI. The efficacy of NVSA is demonstrated by solving the Raven's progressive matrices datasets. Compared to state-of-the-art deep neural network and neuro-symbolic approaches, end-to-end training of NVSA achieves a new record of 87.7% average accuracy in RAVEN, and 88.1% in I-RAVEN datasets. Moreover, compared to the symbolic reasoning within the neuro-symbolic approaches, the probabilistic reasoning of NVSA with less expensive operations on the distributed representations is two orders of magnitude faster. Our code is available at https://github.com/IBM/neuro-vector-symbolic-architectures. △ Less

Submitted 3 March, 2023; v1 submitted 9 March, 2022; originally announced March 2022.

Comments: Updated version with additional NVSA end-to-end training, generalization experiments, and PGM experiments

arXiv:2112.15424 [pdf, other]

doi 10.1145/3558000

A Survey on Hyperdimensional Computing aka Vector Symbolic Architectures, Part II: Applications, Cognitive Models, and Challenges

Authors: Denis Kleyko, Dmitri A. Rachkovskij, Evgeny Osipov, Abbas Rahimi

Abstract: This is Part II of the two-part comprehensive survey devoted to a computing framework most commonly known under the names Hyperdimensional Computing and Vector Symbolic Architectures (HDC/VSA). Both names refer to a family of computational models that use high-dimensional distributed representations and rely on the algebraic properties of their key operations to incorporate the advantages of struc… ▽ More This is Part II of the two-part comprehensive survey devoted to a computing framework most commonly known under the names Hyperdimensional Computing and Vector Symbolic Architectures (HDC/VSA). Both names refer to a family of computational models that use high-dimensional distributed representations and rely on the algebraic properties of their key operations to incorporate the advantages of structured symbolic representations and vector distributed representations. Holographic Reduced Representations is an influential HDC/VSA model that is well-known in the machine learning domain and often used to refer to the whole family. However, for the sake of consistency, we use HDC/VSA to refer to the field. Part I of this survey covered foundational aspects of the field, such as the historical context leading to the development of HDC/VSA, key elements of any HDC/VSA model, known HDC/VSA models, and the transformation of input data of various types into high-dimensional vectors suitable for HDC/VSA. This second part surveys existing applications, the role of HDC/VSA in cognitive computing and architectures, as well as directions for future work. Most of the applications lie within the Machine Learning/Artificial Intelligence domain, however, we also cover other applications to provide a complete picture. The survey is written to be useful for both newcomers and practitioners. △ Less

Submitted 1 August, 2023; v1 submitted 12 November, 2021; originally announced December 2021.

Comments: 37 pages

Journal ref: ACM Computing Surveys (2023), vol. 55, no. 9

arXiv:2112.03909 [pdf, other]

Vehicle trajectory prediction works, but not everywhere

Authors: Mohammadhossein Bahari, Saeed Saadatnejad, Ahmad Rahimi, Mohammad Shaverdikondori, Amir-Hossein Shahidzadeh, Seyed-Mohsen Moosavi-Dezfooli, Alexandre Alahi

Abstract: Vehicle trajectory prediction is nowadays a fundamental pillar of self-driving cars. Both the industry and research communities have acknowledged the need for such a pillar by providing public benchmarks. While state-of-the-art methods are impressive, i.e., they have no off-road prediction, their generalization to cities outside of the benchmark remains unexplored. In this work, we show that those… ▽ More Vehicle trajectory prediction is nowadays a fundamental pillar of self-driving cars. Both the industry and research communities have acknowledged the need for such a pillar by providing public benchmarks. While state-of-the-art methods are impressive, i.e., they have no off-road prediction, their generalization to cities outside of the benchmark remains unexplored. In this work, we show that those methods do not generalize to new scenes. We present a method that automatically generates realistic scenes causing state-of-the-art models to go off-road. We frame the problem through the lens of adversarial scene generation. The method is a simple yet effective generative model based on atomic scene generation functions along with physical constraints. Our experiments show that more than 60% of existing scenes from the current benchmarks can be modified in a way to make prediction methods fail (i.e., predicting off-road). We further show that the generated scenes (i) are realistic since they do exist in the real world, and (ii) can be used to make existing models more robust, yielding 30-40 reductions in the off-road rate. The code is available online: https://s-attack.github.io/. △ Less

Submitted 29 March, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

Comments: CVPR 2022

arXiv:2111.15199 [pdf, other]

Semi-Supervised 3D Hand Shape and Pose Estimation with Label Propagation

Authors: Samira Kaviani, Amir Rahimi, Richard Hartley

Abstract: To obtain 3D annotations, we are restricted to controlled environments or synthetic datasets, leading us to 3D datasets with less generalizability to real-world scenarios. To tackle this issue in the context of semi-supervised 3D hand shape and pose estimation, we propose the Pose Alignment network to propagate 3D annotations from labelled frames to nearby unlabelled frames in sparsely annotated v… ▽ More To obtain 3D annotations, we are restricted to controlled environments or synthetic datasets, leading us to 3D datasets with less generalizability to real-world scenarios. To tackle this issue in the context of semi-supervised 3D hand shape and pose estimation, we propose the Pose Alignment network to propagate 3D annotations from labelled frames to nearby unlabelled frames in sparsely annotated videos. We show that incorporating the alignment supervision on pairs of labelled-unlabelled frames allows us to improve the pose estimation accuracy. Besides, we show that the proposed Pose Alignment network can effectively propagate annotations on unseen sparsely labelled videos without fine-tuning. △ Less

Submitted 30 November, 2021; originally announced November 2021.

Comments: DICTA 2021

arXiv:2111.06077 [pdf, other]

doi 10.1145/3538531

A Survey on Hyperdimensional Computing aka Vector Symbolic Architectures, Part I: Models and Data Transformations

Authors: Denis Kleyko, Dmitri A. Rachkovskij, Evgeny Osipov, Abbas Rahimi

Abstract: This two-part comprehensive survey is devoted to a computing framework most commonly known under the names Hyperdimensional Computing and Vector Symbolic Architectures (HDC/VSA). Both names refer to a family of computational models that use high-dimensional distributed representations and rely on the algebraic properties of their key operations to incorporate the advantages of structured symbolic… ▽ More This two-part comprehensive survey is devoted to a computing framework most commonly known under the names Hyperdimensional Computing and Vector Symbolic Architectures (HDC/VSA). Both names refer to a family of computational models that use high-dimensional distributed representations and rely on the algebraic properties of their key operations to incorporate the advantages of structured symbolic representations and vector distributed representations. Notable models in the HDC/VSA family are Tensor Product Representations, Holographic Reduced Representations, Multiply-Add-Permute, Binary Spatter Codes, and Sparse Binary Distributed Representations but there are other models too. HDC/VSA is a highly interdisciplinary field with connections to computer science, electrical engineering, artificial intelligence, mathematics, and cognitive science. This fact makes it challenging to create a thorough overview of the field. However, due to a surge of new researchers joining the field in recent years, the necessity for a comprehensive survey of the field has become extremely important. Therefore, amongst other aspects of the field, this Part I surveys important aspects such as: known computational models of HDC/VSA and transformations of various input data types to high-dimensional distributed representations. Part II of this survey is devoted to applications, cognitive computing and architectures, as well as directions for future work. The survey is written to be useful for both newcomers and practitioners. △ Less

Submitted 31 July, 2023; v1 submitted 11 November, 2021; originally announced November 2021.

Comments: 31 pages

Journal ref: ACM Computing Surveys (2022), vol. 55, no. 6

arXiv:2109.10444 [pdf, other]

Fairness-aware Class Imbalanced Learning

Authors: Shivashankar Subramanian, Afshin Rahimi, Timothy Baldwin, Trevor Cohn, Lea Frermann

Abstract: Class imbalance is a common challenge in many NLP tasks, and has clear connections to bias, in that bias in training data often leads to higher accuracy for majority groups at the expense of minority groups. However there has traditionally been a disconnect between research on class-imbalanced learning and mitigating bias, and only recently have the two been looked at through a common lens. In thi… ▽ More Class imbalance is a common challenge in many NLP tasks, and has clear connections to bias, in that bias in training data often leads to higher accuracy for majority groups at the expense of minority groups. However there has traditionally been a disconnect between research on class-imbalanced learning and mitigating bias, and only recently have the two been looked at through a common lens. In this work we evaluate long-tail learning methods for tweet sentiment and occupation classification, and extend a margin-loss based approach with methods to enforce fairness. We empirically show through controlled experiments that the proposed approaches help mitigate both class imbalance and demographic biases. △ Less

Submitted 21 September, 2021; originally announced September 2021.

Comments: To appear in EMNLP 2021

arXiv:2107.10456 [pdf, other]

CogSense: A Cognitively Inspired Framework for Perception Adaptation

Authors: Hyukseong Kwon, Amir Rahimi, Kevin G. Lee, Amit Agarwal, Rajan Bhattacharyya

Abstract: This paper proposes the CogSense system, which is inspired by sense-making cognition and perception in the mammalian brain to perform perception error detection and perception parameter adaptation using probabilistic signal temporal logic. As a specific application, a contrast-based perception adaption method is presented and validated. The proposed method evaluates perception errors using heterog… ▽ More This paper proposes the CogSense system, which is inspired by sense-making cognition and perception in the mammalian brain to perform perception error detection and perception parameter adaptation using probabilistic signal temporal logic. As a specific application, a contrast-based perception adaption method is presented and validated. The proposed method evaluates perception errors using heterogeneous probe functions computed from the detected objects and subsequently solves a contrast optimization problem to correct perception errors. The CogSense probe functions utilize the characteristics of geometry, dynamics, and detected blob image quality of the objects to develop axioms in a probabilistic signal temporal logic framework. By evaluating these axioms, we can formally verify whether the detections are valid or erroneous. Further, using the CogSense axioms, we generate the probabilistic signal temporal logic-based constraints to finally solve the contrast-based optimization problem to reduce false positives and false negatives. △ Less

Submitted 22 July, 2021; originally announced July 2021.

arXiv:2106.11654 [pdf, other]

doi 10.1109/TCSII.2021.3068126

Energy Efficient In-memory Hyperdimensional Encoding for Spatio-temporal Signal Processing

Authors: Geethan Karunaratne, Manuel Le Gallo, Michael Hersche, Giovanni Cherubini, Luca Benini, Abu Sebastian, Abbas Rahimi

Abstract: The emerging brain-inspired computing paradigm known as hyperdimensional computing (HDC) has been proven to provide a lightweight learning framework for various cognitive tasks compared to the widely used deep learning-based approaches. Spatio-temporal (ST) signal processing, which encompasses biosignals such as electromyography (EMG) and electroencephalography (EEG), is one family of applications… ▽ More The emerging brain-inspired computing paradigm known as hyperdimensional computing (HDC) has been proven to provide a lightweight learning framework for various cognitive tasks compared to the widely used deep learning-based approaches. Spatio-temporal (ST) signal processing, which encompasses biosignals such as electromyography (EMG) and electroencephalography (EEG), is one family of applications that could benefit from an HDC-based learning framework. At the core of HDC lie manipulations and comparisons of large bit patterns, which are inherently ill-suited to conventional computing platforms based on the von-Neumann architecture. In this work, we propose an architecture for ST signal processing within the HDC framework using predominantly in-memory compute arrays. In particular, we introduce a methodology for the in-memory hyperdimensional encoding of ST data to be used together with an in-memory associative search module. We show that the in-memory HDC encoder for ST signals offers at least 1.80x energy efficiency gains, 3.36x area gains, as well as 9.74x throughput gains compared with a dedicated digital hardware implementation. At the same time it achieves a peak classification accuracy within 0.04% of that of the baseline HDC framework. △ Less

Submitted 22 June, 2021; originally announced June 2021.

Journal ref: IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 68, no. 5, pp. 1725-1729, May 2021

arXiv:2106.05268 [pdf, other]

doi 10.1109/JPROC.2022.3209104

Vector Symbolic Architectures as a Computing Framework for Emerging Hardware

Authors: Denis Kleyko, Mike Davies, E. Paxon Frady, Pentti Kanerva, Spencer J. Kent, Bruno A. Olshausen, Evgeny Osipov, Jan M. Rabaey, Dmitri A. Rachkovskij, Abbas Rahimi, Friedrich T. Sommer

Abstract: This article reviews recent progress in the development of the computing framework vector symbolic architectures (VSA) (also known as hyperdimensional computing). This framework is well suited for implementation in stochastic, emerging hardware, and it naturally expresses the types of cognitive operations required for artificial intelligence (AI). We demonstrate in this article that the field-like… ▽ More This article reviews recent progress in the development of the computing framework vector symbolic architectures (VSA) (also known as hyperdimensional computing). This framework is well suited for implementation in stochastic, emerging hardware, and it naturally expresses the types of cognitive operations required for artificial intelligence (AI). We demonstrate in this article that the field-like algebraic structure of VSA offers simple but powerful operations on high-dimensional vectors that can support all data structures and manipulations relevant to modern computing. In addition, we illustrate the distinguishing feature of VSA, "computing in superposition," which sets it apart from conventional computing. It also opens the door to efficient solutions to the difficult combinatorial search problems inherent in AI applications. We sketch ways of demonstrating that VSA are computationally universal. We see them acting as a framework for computing with distributed representations that can play a role of an abstraction layer for emerging computing hardware. This article serves as a reference for computer architects by illustrating the philosophy behind VSA, techniques of distributed computing with them, and their relevance to emerging computing hardware, such as neuromorphic computing. △ Less

Submitted 20 July, 2023; v1 submitted 9 June, 2021; originally announced June 2021.

Comments: 31 pages, 15 figures, 4 Tables

Journal ref: Proceedings of the IEEE (2022), vol. 110, no. 10

arXiv:2105.02771 [pdf]

doi 10.1088/1361-6560/ac176d

Saliency-Guided Deep Learning Network for Automatic Tumor Bed Volume Delineation in Post-operative Breast Irradiation

Authors: Mahdieh Kazemimoghadam, Weicheng Chi, Asal Rahimi, Nathan Kim, Prasanna Alluri, Chika Nwachukwu, Weiguo Lu, Xuejun Gu

Abstract: Efficient, reliable and reproducible target volume delineation is a key step in the effective planning of breast radiotherapy. However, post-operative breast target delineation is challenging as the contrast between the tumor bed volume (TBV) and normal breast tissue is relatively low in CT images. In this study, we propose to mimic the marker-guidance procedure in manual target delineation. We de… ▽ More Efficient, reliable and reproducible target volume delineation is a key step in the effective planning of breast radiotherapy. However, post-operative breast target delineation is challenging as the contrast between the tumor bed volume (TBV) and normal breast tissue is relatively low in CT images. In this study, we propose to mimic the marker-guidance procedure in manual target delineation. We developed a saliency-based deep learning segmentation (SDL-Seg) algorithm for accurate TBV segmentation in post-operative breast irradiation. The SDL-Seg algorithm incorporates saliency information in the form of markers' location cues into a U-Net model. The design forces the model to encode the location-related features, which underscores regions with high saliency levels and suppresses low saliency regions. The saliency maps were generated by identifying markers on CT images. Markers' locations were then converted to probability maps using a distance-transformation coupled with a Gaussian filter. Subsequently, the CT images and the corresponding saliency maps formed a multi-channel input for the SDL-Seg network. Our in-house dataset was comprised of 145 prone CT images from 29 post-operative breast cancer patients, who received 5-fraction partial breast irradiation (PBI) regimen on GammaPod. The performance of the proposed method was compared against basic U-Net. Our model achieved mean (standard deviation) of 76.4 %, 6.76 mm, and 1.9 mm for DSC, HD95, and ASD respectively on the test set with computation time of below 11 seconds per one CT volume. SDL-Seg showed superior performance relative to basic U-Net for all the evaluation metrics while preserving low computation cost. The findings demonstrate that SDL-Seg is a promising approach for improving the efficiency and accuracy of the on-line treatment planning procedure of PBI, such as GammaPod based PBI. △ Less

Submitted 26 July, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

Comments: https://iopscience.iop.org/article/10.1088/1361-6560/ac176d

Journal ref: Physics in Medicine & Biology 2021

arXiv:2104.11949 [pdf, other]

Automatic Diagnosis of COVID-19 from CT Images using CycleGAN and Transfer Learning

Authors: Navid Ghassemi, Afshin Shoeibi, Marjane Khodatars, Jonathan Heras, Alireza Rahimi, Assef Zare, Ram Bilas Pachori, J. Manuel Gorriz

Abstract: The outbreak of the corona virus disease (COVID-19) has changed the lives of most people on Earth. Given the high prevalence of this disease, its correct diagnosis in order to quarantine patients is of the utmost importance in steps of fighting this pandemic. Among the various modalities used for diagnosis, medical imaging, especially computed tomography (CT) imaging, has been the focus of many pr… ▽ More The outbreak of the corona virus disease (COVID-19) has changed the lives of most people on Earth. Given the high prevalence of this disease, its correct diagnosis in order to quarantine patients is of the utmost importance in steps of fighting this pandemic. Among the various modalities used for diagnosis, medical imaging, especially computed tomography (CT) imaging, has been the focus of many previous studies due to its accuracy and availability. In addition, automation of diagnostic methods can be of great help to physicians. In this paper, a method based on pre-trained deep neural networks is presented, which, by taking advantage of a cyclic generative adversarial net (CycleGAN) model for data augmentation, has reached state-of-the-art performance for the task at hand, i.e., 99.60% accuracy. Also, in order to evaluate the method, a dataset containing 3163 images from 189 patients has been collected and labeled by physicians. Unlike prior datasets, normal data have been collected from people suspected of having COVID-19 disease and not from data from other diseases, and this database is made available publicly. △ Less

Submitted 24 April, 2021; originally announced April 2021.

arXiv:2103.14162 [pdf, other]

Few-shot Weakly-Supervised Object Detection via Directional Statistics

Authors: Amirreza Shaban, Amir Rahimi, Thalaiyasingam Ajanthan, Byron Boots, Richard Hartley

Abstract: Detecting novel objects from few examples has become an emerging topic in computer vision recently. However, these methods need fully annotated training images to learn new object categories which limits their applicability in real world scenarios such as field robotics. In this work, we propose a probabilistic multiple instance learning approach for few-shot Common Object Localization (COL) and f… ▽ More Detecting novel objects from few examples has become an emerging topic in computer vision recently. However, these methods need fully annotated training images to learn new object categories which limits their applicability in real world scenarios such as field robotics. In this work, we propose a probabilistic multiple instance learning approach for few-shot Common Object Localization (COL) and few-shot Weakly Supervised Object Detection (WSOD). In these tasks, only image-level labels, which are much cheaper to acquire, are available. We find that operating on features extracted from the last layer of a pre-trained Faster-RCNN is more effective compared to previous episodic learning based few-shot COL methods. Our model simultaneously learns the distribution of the novel objects and localizes them via expectation-maximization steps. As a probabilistic model, we employ von Mises-Fisher (vMF) distribution which captures the semantic information better than Gaussian distribution when applied to the pre-trained embedding space. When the novel objects are localized, we utilize them to learn a linear appearance model to detect novel classes in new images. Our extensive experiments show that the proposed method, despite being simple, outperforms strong baselines in few-shot COL and WSOD, as well as large-scale WSOD tasks. △ Less

Submitted 25 March, 2021; originally announced March 2021.

arXiv:2103.01344 [pdf, ps, other]

Multi-Party Proof Generation in QAP-based zk-SNARKs

Authors: Ali Rahimi, Mohammad Ali Maddah-Ali

Abstract: Zero-knowledge succinct non-interactive argument of knowledge (zkSNARK) allows a party, known as the prover, to convince another party, known as the verifier, that he knows a private value $v$, without revealing it, such that $F(u,v)=y$ for some function $F$ and public values $u$ and $y$. There are various versions of zk-SNARK, among them, Quadratic Arithmetic Program (QAP)-based zk-SNARK has been… ▽ More Zero-knowledge succinct non-interactive argument of knowledge (zkSNARK) allows a party, known as the prover, to convince another party, known as the verifier, that he knows a private value $v$, without revealing it, such that $F(u,v)=y$ for some function $F$ and public values $u$ and $y$. There are various versions of zk-SNARK, among them, Quadratic Arithmetic Program (QAP)-based zk-SNARK has been widely used in practice, specially in Blockchain technology. This is attributed to two desirable features; its fixed-size proof and the very light computation load of the verifier. However, the computation load of the prover in QAP-based zkSNARKs, is very heavy, even-though it is designed to be very efficient. This load can be beyond the prover's computation power to handle, and has to be offloaded to some external servers. In the existing offloading solutions, either (i) the load of computation, offloaded to each sever, is a fraction of the prover's primary computation (e.g., DZIK), however the servers need to be trusted, (ii) the servers are not required to be trusted, but the computation complexity imposed to each one is the same as the prover's primary computation (e.g., Trinocchio). In this paper, we present a scheme, which has the benefits of both solutions. In particular, we propose a secure multi-party proof generation algorithm where the prover can delegate its task to $N $ servers, where (i) even if a group of $T \in \mathbb{N}$ servers, $T\le N$, collude, they cannot gain any information about the secret value $v$, (ii) the computation complexity of each server is less than $1/(N-T)$ of the prover's primary computation. The design is such that we don't lose the efficiency of the prover's algorithm in the process of delegating the tasks to external servers. △ Less

Submitted 1 March, 2021; originally announced March 2021.

Comments: 31 pages, 2 figures

arXiv:2102.09107 [pdf]

A Core of E-Commerce Customer Experience based on Conversational Data using Network Text Methodology

Authors: Andry Alamsyah, Nurlisa Laksmiani, Lies Anisa Rahimi

Abstract: E-commerce provides an efficient and effective way to exchange goods between sellers and customers. E-commerce has been a popular method for doing business, because of its simplicity of having commerce activity transparently available, including customer voice and opinion about their own experience. Those experiences can be a great benefit to understand customer experience comprehensively, both fo… ▽ More E-commerce provides an efficient and effective way to exchange goods between sellers and customers. E-commerce has been a popular method for doing business, because of its simplicity of having commerce activity transparently available, including customer voice and opinion about their own experience. Those experiences can be a great benefit to understand customer experience comprehensively, both for sellers and future customers. This paper applies to e-commerces and customers in Indonesia. Many Indonesian customers expressed their voice to open social network services such as Twitter and Facebook, where a large proportion of data is in the form of conversational data. By understanding customer behavior through open social network service, we can have descriptions about the e-commerce services level in Indonesia. Thus, it is related to the government's effort to improve the Indonesian digital economy ecosystem. A method for finding core topics in large-scale internet unstructured text data is needed, where the method should be fast but sufficiently accurate. Processing large-scale data is not a straightforward job, it often needs special skills of people and complex software and hardware computer system. We propose a fast methodology of text mining methods based on frequently appeared words and their word association to form network text methodology. This method is adapted from Social Network Analysis by the model relationships between words instead of actors. △ Less

Submitted 17 February, 2021; originally announced February 2021.

Comments: 9 pages, 1 figure, 4 tables

MSC Class: 00-XX ACM Class: K.4

Journal ref: International Journal of Business, 2018, 23(3)

arXiv:2102.02758 [pdf, other]

A 5 μW Standard Cell Memory-based Configurable Hyperdimensional Computing Accelerator for Always-on Smart Sensing

Authors: Manuel Eggimann, Abbas Rahimi, Luca Benini

Abstract: Hyperdimensional computing (HDC) is a brain-inspired computing paradigm based on high-dimensional holistic representations of vectors. It recently gained attention for embedded smart sensing due to its inherent error-resiliency and suitability to highly parallel hardware implementations. In this work, we propose a programmable all-digital CMOS implementation of a fully autonomous HDC accelerator f… ▽ More Hyperdimensional computing (HDC) is a brain-inspired computing paradigm based on high-dimensional holistic representations of vectors. It recently gained attention for embedded smart sensing due to its inherent error-resiliency and suitability to highly parallel hardware implementations. In this work, we propose a programmable all-digital CMOS implementation of a fully autonomous HDC accelerator for always-on classification in energy-constrained sensor nodes. By using energy-efficient standard cell memory (SCM), the design is easily cross-technology mappable. It achieves extremely low power, 5 $μW$ in typical applications, and an energy-efficiency improvement over the state-of-the-art (SoA) digital architectures of up to 3$\times$ in post-layout simulations for always-on wearable tasks such as EMG gesture recognition. As part of the accelerator's architecture, we introduce novel hardware-friendly embodiments of common HDC-algorithmic primitives, which results in 3.3$\times$ technology scaled area reduction over the SoA, achieving the same accuracy levels in all examined targets. The proposed architecture also has a fully configurable datapath using microcode optimized for HDC stored on an integrated SCM based configuration memory, making the design "general-purpose" in terms of HDC algorithm flexibility. This flexibility allows usage of the accelerator across novel HDC tasks, for instance, a newly designed HDC applied to the task of ball bearing fault detection. △ Less

Submitted 4 February, 2021; originally announced February 2021.

arXiv:2011.13115 [pdf, ps, other]

Learning Causal Bayesian Networks from Text

Authors: Farhad Moghimifar, Afshin Rahimi, Mahsa Baktashmotlagh, Xue Li

Abstract: Causal relationships form the basis for reasoning and decision-making in Artificial Intelligence systems. To exploit the large volume of textual data available today, the automatic discovery of causal relationships from text has emerged as a significant challenge in recent years. Existing approaches in this realm are limited to the extraction of low-level relations among individual events. To over… ▽ More Causal relationships form the basis for reasoning and decision-making in Artificial Intelligence systems. To exploit the large volume of textual data available today, the automatic discovery of causal relationships from text has emerged as a significant challenge in recent years. Existing approaches in this realm are limited to the extraction of low-level relations among individual events. To overcome the limitations of the existing approaches, in this paper, we propose a method for automatic inference of causal relationships from human written language at conceptual level. To this end, we leverage the characteristics of hierarchy of concepts and linguistic variables created from text, and represent the extracted causal relationships in the form of a Causal Bayesian Network. Our experiments demonstrate superiority of our approach over the existing approaches in inferring complex causal reasoning from the text. △ Less

Submitted 25 November, 2020; originally announced November 2020.

Comments: ALTA2020

arXiv:2011.00677 [pdf, other]

IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP

Authors: Fajri Koto, Afshin Rahimi, Jey Han Lau, Timothy Baldwin

Abstract: Although the Indonesian language is spoken by almost 200 million people and the 10th most spoken language in the world, it is under-represented in NLP research. Previous work on Indonesian has been hampered by a lack of annotated datasets, a sparsity of language resources, and a lack of resource standardization. In this work, we release the IndoLEM dataset comprising seven tasks for the Indonesian… ▽ More Although the Indonesian language is spoken by almost 200 million people and the 10th most spoken language in the world, it is under-represented in NLP research. Previous work on Indonesian has been hampered by a lack of annotated datasets, a sparsity of language resources, and a lack of resource standardization. In this work, we release the IndoLEM dataset comprising seven tasks for the Indonesian language, spanning morpho-syntax, semantics, and discourse. We additionally release IndoBERT, a new pre-trained language model for Indonesian, and evaluate it over IndoLEM, in addition to benchmarking it against existing resources. Our experiments show that IndoBERT achieves state-of-the-art performance over most of the tasks in IndoLEM. △ Less

Submitted 1 November, 2020; originally announced November 2020.

Comments: Accepted at COLING 2020 - The 28th International Conference on Computational Linguistics

arXiv:2010.11273 [pdf, other]

The Need for Standardized Explainability

Authors: Othman Benchekroun, Adel Rahimi, Qini Zhang, Tetiana Kodliuk

Abstract: Explainable AI (XAI) is paramount in industry-grade AI; however existing methods fail to address this necessity, in part due to a lack of standardisation of explainability methods. The purpose of this paper is to offer a perspective on the current state of the area of explainability, and to provide novel definitions for Explainability and Interpretability to begin standardising this area of resear… ▽ More Explainable AI (XAI) is paramount in industry-grade AI; however existing methods fail to address this necessity, in part due to a lack of standardisation of explainability methods. The purpose of this paper is to offer a perspective on the current state of the area of explainability, and to provide novel definitions for Explainability and Interpretability to begin standardising this area of research. To do so, we provide an overview of the literature on explainability, and of the existing methods that are already implemented. Finally, we offer a tentative taxonomy of the different explainability methods, opening the door to future research. △ Less

Submitted 22 October, 2020; v1 submitted 20 October, 2020; originally announced October 2020.

Comments: Accepted in 2nd ICML 2020 Workshop on Human in the Loop Learning

arXiv:2010.08232 [pdf, other]

WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets

Authors: Dat Quoc Nguyen, Thanh Vu, Afshin Rahimi, Mai Hoang Dao, Linh The Nguyen, Long Doan

Abstract: In this paper, we provide an overview of the WNUT-2020 shared task on the identification of informative COVID-19 English Tweets. We describe how we construct a corpus of 10K Tweets and organize the development and evaluation phases for this task. In addition, we also present a brief summary of results obtained from the final system evaluation submissions of 55 teams, finding that (i) many systems… ▽ More In this paper, we provide an overview of the WNUT-2020 shared task on the identification of informative COVID-19 English Tweets. We describe how we construct a corpus of 10K Tweets and organize the development and evaluation phases for this task. In addition, we also present a brief summary of results obtained from the final system evaluation submissions of 55 teams, finding that (i) many systems obtain very high performance, up to 0.91 F1 score, (ii) the majority of the submissions achieve substantially higher results than the baseline fastText (Joulin et al., 2017), and (iii) fine-tuning pre-trained language models on relevant language data followed by supervised training performs well in this task. △ Less

Submitted 16 October, 2020; originally announced October 2020.

Comments: In Proceedings of the 6th Workshop on Noisy User-generated Text

arXiv:2010.07004 [pdf, other]

Binarization Methods for Motor-Imagery Brain-Computer Interface Classification

Authors: Michael Hersche, Luca Benini, Abbas Rahimi

Abstract: Successful motor-imagery brain-computer interface (MI-BCI) algorithms either extract a large number of handcrafted features and train a classifier, or combine feature extraction and classification within deep convolutional neural networks (CNNs). Both approaches typically result in a set of real-valued weights, that pose challenges when targeting real-time execution on tightly resource-constrained… ▽ More Successful motor-imagery brain-computer interface (MI-BCI) algorithms either extract a large number of handcrafted features and train a classifier, or combine feature extraction and classification within deep convolutional neural networks (CNNs). Both approaches typically result in a set of real-valued weights, that pose challenges when targeting real-time execution on tightly resource-constrained devices. We propose methods for each of these approaches that allow transforming real-valued weights to binary numbers for efficient inference. Our first method, based on sparse bipolar random projection, projects a large number of real-valued Riemannian covariance features to a binary space, where a linear SVM classifier can be learned with binary weights too. By tuning the dimension of the binary embedding, we achieve almost the same accuracy in 4-class MI ($\leq$1.27% lower) compared to models with float16 weights, yet delivering a more compact model with simpler operations to execute. Second, we propose to use memory-augmented neural networks (MANNs) for MI-BCI such that the augmented memory is binarized. Our method replaces the fully connected layer of CNNs with a binary augmented memory using bipolar random projection, or learned projection. Our experimental results on EEGNet, an already compact CNN for MI-BCI, show that it can be compressed by 1.28x at iso-accuracy using the random projection. On the other hand, using the learned projection provides 3.89% higher accuracy but increases the memory size by 28.10x. △ Less

Submitted 14 October, 2020; originally announced October 2020.

arXiv:2010.01939 [pdf, other]

doi 10.1038/s41467-021-22364-0

Robust High-dimensional Memory-augmented Neural Networks

Authors: Geethan Karunaratne, Manuel Schmuck, Manuel Le Gallo, Giovanni Cherubini, Luca Benini, Abu Sebastian, Abbas Rahimi

Abstract: Traditional neural networks require enormous amounts of data to build their complex mappings during a slow training procedure that hinders their abilities for relearning and adapting to new data. Memory-augmented neural networks enhance neural networks with an explicit memory to overcome these issues. Access to this explicit memory, however, occurs via soft read and write operations involving ever… ▽ More Traditional neural networks require enormous amounts of data to build their complex mappings during a slow training procedure that hinders their abilities for relearning and adapting to new data. Memory-augmented neural networks enhance neural networks with an explicit memory to overcome these issues. Access to this explicit memory, however, occurs via soft read and write operations involving every individual memory entry, resulting in a bottleneck when implemented using the conventional von Neumann computer architecture. To overcome this bottleneck, we propose a robust architecture that employs a computational memory unit as the explicit memory performing analog in-memory computation on high-dimensional (HD) vectors, while closely matching 32-bit software-equivalent accuracy. This is achieved by a content-based attention mechanism that represents unrelated items in the computational memory with uncorrelated HD vectors, whose real-valued components can be readily approximated by binary, or bipolar components. Experimental results demonstrate the efficacy of our approach on few-shot image classification tasks on the Omniglot dataset using more than 256,000 phase-change memory devices. Our approach effectively merges the richness of deep neural network representations with HD computing that paves the way for robust vector-symbolic manipulations applicable in reasoning, fusion, and compression. △ Less

Submitted 19 March, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

Comments: This is a pre-print of an article accepted for publication in Nature Communications

Journal ref: Nature Communications volume 12, Article number: 2468 (2021)

arXiv:2010.00287 [pdf, ps, other]

Joint Persian Word Segmentation Correction and Zero-Width Non-Joiner Recognition Using BERT

Authors: Ehsan Doostmohammadi, Minoo Nassajian, Adel Rahimi

Abstract: Words are properly segmented in the Persian writing system; in practice, however, these writing rules are often neglected, resulting in single words being written disjointedly and multiple words written without any white spaces between them. This paper addresses the problems of word segmentation and zero-width non-joiner (ZWNJ) recognition in Persian, which we approach jointly as a sequence labeli… ▽ More Words are properly segmented in the Persian writing system; in practice, however, these writing rules are often neglected, resulting in single words being written disjointedly and multiple words written without any white spaces between them. This paper addresses the problems of word segmentation and zero-width non-joiner (ZWNJ) recognition in Persian, which we approach jointly as a sequence labeling problem. We achieved a macro-averaged F1-score of 92.40% on a carefully collected corpus of 500 sentences with a high level of difficulty. △ Less

Submitted 28 October, 2020; v1 submitted 1 October, 2020; originally announced October 2020.

arXiv:2009.09474 [pdf, other]

Persian Ezafe Recognition Using Transformers and Its Role in Part-Of-Speech Tagging

Authors: Ehsan Doostmohammadi, Minoo Nassajian, Adel Rahimi

Abstract: Ezafe is a grammatical particle in some Iranian languages that links two words together. Regardless of the important information it conveys, it is almost always not indicated in Persian script, resulting in mistakes in reading complex sentences and errors in natural language processing tasks. In this paper, we experiment with different machine learning methods to achieve state-of-the-art results i… ▽ More Ezafe is a grammatical particle in some Iranian languages that links two words together. Regardless of the important information it conveys, it is almost always not indicated in Persian script, resulting in mistakes in reading complex sentences and errors in natural language processing tasks. In this paper, we experiment with different machine learning methods to achieve state-of-the-art results in the task of ezafe recognition. Transformer-based methods, BERT and XLMRoBERTa, achieve the best results, the latter achieving 2.68% F1-score more than the previous state-of-the-art. We, moreover, use ezafe information to improve Persian part-of-speech tagging results and show that such information will not be useful to transformer-based methods and explain why that might be the case. △ Less

Submitted 4 October, 2020; v1 submitted 20 September, 2020; originally announced September 2020.

arXiv:2006.12807 [pdf, other]

Post-hoc Calibration of Neural Networks by g-Layers

Authors: Amir Rahimi, Thomas Mensink, Kartik Gupta, Thalaiyasingam Ajanthan, Cristian Sminchisescu, Richard Hartley

Abstract: Calibration of neural networks is a critical aspect to consider when incorporating machine learning models in real-world decision-making systems where the confidence of decisions are equally important as the decisions themselves. In recent years, there is a surge of research on neural network calibration and the majority of the works can be categorized into post-hoc calibration methods, defined as… ▽ More Calibration of neural networks is a critical aspect to consider when incorporating machine learning models in real-world decision-making systems where the confidence of decisions are equally important as the decisions themselves. In recent years, there is a surge of research on neural network calibration and the majority of the works can be categorized into post-hoc calibration methods, defined as methods that learn an additional function to calibrate an already trained base network. In this work, we intend to understand the post-hoc calibration methods from a theoretical point of view. Especially, it is known that minimizing Negative Log-Likelihood (NLL) will lead to a calibrated network on the training set if the global optimum is attained (Bishop, 1994). Nevertheless, it is not clear learning an additional function in a post-hoc manner would lead to calibration in the theoretical sense. To this end, we prove that even though the base network ($f$) does not lead to the global optimum of NLL, by adding additional layers ($g$) and minimizing NLL by optimizing the parameters of $g$ one can obtain a calibrated network $g \circ f$. This not only provides a less stringent condition to obtain a calibrated network but also provides a theoretical justification of post-hoc calibration methods. Our experiments on various image classification benchmarks confirm the theory. △ Less

Submitted 21 February, 2022; v1 submitted 23 June, 2020; originally announced June 2020.

Showing 1–50 of 78 results for author: Rahimi, A