-
Exploring the Frontiers of Energy Efficiency using Power Management at System Scale
Authors:
Ahmad Maroof Karimi,
Matthias Maiterth,
Woong Shin,
Naw Safrin Sattar,
Hao Lu,
Feiyi Wang
Abstract:
In the face of surging power demands for exascale HPC systems, this work tackles the critical challenge of understanding the impact of software-driven power management techniques like Dynamic Voltage and Frequency Scaling (DVFS) and Power Capping. These techniques have been actively developed over the past few decades. By combining insights from GPU benchmarking to understand application power pro…
▽ More
In the face of surging power demands for exascale HPC systems, this work tackles the critical challenge of understanding the impact of software-driven power management techniques like Dynamic Voltage and Frequency Scaling (DVFS) and Power Capping. These techniques have been actively developed over the past few decades. By combining insights from GPU benchmarking to understand application power profiles, we present a telemetry data-driven approach for deriving energy savings projections. This approach has been demonstrably applied to the Frontier supercomputer at scale. Our findings based on three months of telemetry data indicate that, for certain resource-constrained jobs, significant energy savings (up to 8.5%) can be achieved without compromising performance. This translates to a substantial cost reduction, equivalent to 1438 MWh of energy saved. The key contribution of this work lies in the methodology for establishing an upper limit for these best-case scenarios and its successful application. This work sheds light on potential energy savings and empowers HPC professionals to optimize the power-performance trade-off within constrained power budgets, not only for the exascale era but also beyond.
△ Less
Submitted 2 August, 2024;
originally announced August 2024.
-
Do Multilingual Large Language Models Mitigate Stereotype Bias?
Authors:
Shangrui Nie,
Michael Fromm,
Charles Welch,
Rebekka Görge,
Akbar Karimi,
Joan Plepi,
Nazia Afsan Mowmita,
Nicolas Flores-Herr,
Mehdi Ali,
Lucie Flek
Abstract:
While preliminary findings indicate that multilingual LLMs exhibit reduced bias compared to monolingual ones, a comprehensive understanding of the effect of multilingual training on bias mitigation, is lacking. This study addresses this gap by systematically training six LLMs of identical size (2.6B parameters) and architecture: five monolingual models (English, German, French, Italian, and Spanis…
▽ More
While preliminary findings indicate that multilingual LLMs exhibit reduced bias compared to monolingual ones, a comprehensive understanding of the effect of multilingual training on bias mitigation, is lacking. This study addresses this gap by systematically training six LLMs of identical size (2.6B parameters) and architecture: five monolingual models (English, German, French, Italian, and Spanish) and one multilingual model trained on an equal distribution of data across these languages, all using publicly available data. To ensure robust evaluation, standard bias benchmarks were automatically translated into the five target languages and verified for both translation quality and bias preservation by human annotators. Our results consistently demonstrate that multilingual training effectively mitigates bias. Moreover, we observe that multilingual models achieve not only lower bias but also superior prediction accuracy when compared to monolingual models with the same amount of training data, model architecture, and size.
△ Less
Submitted 9 July, 2024; v1 submitted 8 July, 2024;
originally announced July 2024.
-
Self-Balanced R-CNN for Instance Segmentation
Authors:
Leonardo Rossi,
Akbar Karimi,
Andrea Prati
Abstract:
Current state-of-the-art two-stage models on instance segmentation task suffer from several types of imbalances. In this paper, we address the Intersection over the Union (IoU) distribution imbalance of positive input Regions of Interest (RoIs) during the training of the second stage. Our Self-Balanced R-CNN (SBR-CNN), an evolved version of the Hybrid Task Cascade (HTC) model, brings brand new loo…
▽ More
Current state-of-the-art two-stage models on instance segmentation task suffer from several types of imbalances. In this paper, we address the Intersection over the Union (IoU) distribution imbalance of positive input Regions of Interest (RoIs) during the training of the second stage. Our Self-Balanced R-CNN (SBR-CNN), an evolved version of the Hybrid Task Cascade (HTC) model, brings brand new loop mechanisms of bounding box and mask refinements. With an improved Generic RoI Extraction (GRoIE), we also address the feature-level imbalance at the Feature Pyramid Network (FPN) level, originated by a non-uniform integration between low- and high-level features from the backbone layers. In addition, the redesign of the architecture heads toward a fully convolutional approach with FCC further reduces the number of parameters and obtains more clues to the connection between the task to solve and the layers used. Moreover, our SBR-CNN model shows the same or even better improvements if adopted in conjunction with other state-of-the-art models. In fact, with a lightweight ResNet-50 as backbone, evaluated on COCO minival 2017 dataset, our model reaches 45.3% and 41.5% AP for object detection and instance segmentation, with 12 epochs and without extra tricks. The code is available at https://github.com/IMPLabUniPr/mmdetection/tree/sbr_cnn
△ Less
Submitted 25 April, 2024;
originally announced April 2024.
-
Prospector Heads: Generalized Feature Attribution for Large Models & Data
Authors:
Gautam Machiraju,
Alexander Derry,
Arjun Desai,
Neel Guha,
Amir-Hossein Karimi,
James Zou,
Russ Altman,
Christopher Ré,
Parag Mallick
Abstract:
Feature attribution, the ability to localize regions of the input data that are relevant for classification, is an important capability for ML models in scientific and biomedical domains. Current methods for feature attribution, which rely on "explaining" the predictions of end-to-end classifiers, suffer from imprecise feature localization and are inadequate for use with small sample sizes and hig…
▽ More
Feature attribution, the ability to localize regions of the input data that are relevant for classification, is an important capability for ML models in scientific and biomedical domains. Current methods for feature attribution, which rely on "explaining" the predictions of end-to-end classifiers, suffer from imprecise feature localization and are inadequate for use with small sample sizes and high-dimensional datasets due to computational challenges. We introduce prospector heads, an efficient and interpretable alternative to explanation-based attribution methods that can be applied to any encoder and any data modality. Prospector heads generalize across modalities through experiments on sequences (text), images (pathology), and graphs (protein structures), outperforming baseline attribution methods by up to 26.3 points in mean localization AUPRC. We also demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in input data. Through their high performance, flexibility, and generalizability, prospectors provide a framework for improving trust and transparency for ML models in complex domains.
△ Less
Submitted 19 June, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
-
Profiling and Modeling of Power Characteristics of Leadership-Scale HPC System Workloads
Authors:
Ahmad Maroof Karimi,
Naw Safrin Sattar,
Woong Shin,
Feiyi Wang
Abstract:
In the exascale era in which application behavior has large power & energy footprints, per-application job-level awareness of such impression is crucial in taking steps towards achieving efficiency goals beyond performance, such as energy efficiency, and sustainability.
To achieve these goals, we have developed a novel low-latency job power profiling machine learning pipeline that can group job-…
▽ More
In the exascale era in which application behavior has large power & energy footprints, per-application job-level awareness of such impression is crucial in taking steps towards achieving efficiency goals beyond performance, such as energy efficiency, and sustainability.
To achieve these goals, we have developed a novel low-latency job power profiling machine learning pipeline that can group job-level power profiles based on their shapes as they complete. This pipeline leverages a comprehensive feature extraction and clustering pipeline powered by a generative adversarial network (GAN) model to handle the feature-rich time series of job-level power measurements. The output is then used to train a classification model that can predict whether an incoming job power profile is similar to a known group of profiles or is completely new. With extensive evaluations, we demonstrate the effectiveness of each component in our pipeline. Also, we provide a preliminary analysis of the resulting clusters that depict the power profile landscape of the Summit supercomputer from more than 60K jobs sampled from the year 2021.
△ Less
Submitted 1 February, 2024;
originally announced February 2024.
-
Whitepaper on Reusable Hybrid and Multi-Cloud Analytics Service Framework
Authors:
Gregor von Laszewski,
Wo Chang,
Russell Reinsch,
Olivera Kotevska,
Ali Karimi,
Abdul Rahman Sattar,
Garry Mazzaferro,
Geoffrey C. Fox
Abstract:
Over the last several years, the computation landscape for conducting data analytics has completely changed. While in the past, a lot of the activities have been undertaken in isolation by companies, and research institutions, today's infrastructure constitutes a wealth of services offered by a variety of providers that offer opportunities for reuse, and interactions while leveraging service colla…
▽ More
Over the last several years, the computation landscape for conducting data analytics has completely changed. While in the past, a lot of the activities have been undertaken in isolation by companies, and research institutions, today's infrastructure constitutes a wealth of services offered by a variety of providers that offer opportunities for reuse, and interactions while leveraging service collaboration, and service cooperation.
This document focuses on expanding analytics services to develop a framework for reusable hybrid multi-service data analytics. It includes (a) a short technology review that explicitly targets the intersection of hybrid multi-provider analytics services, (b) a small motivation based on use cases we looked at, (c) enhancing the concepts of services to showcase how hybrid, as well as multi-provider services can be integrated and reused via the proposed framework, (d) address analytics service composition, and (e) integrate container technologies to achieve state-of-the-art analytics service deployment
△ Less
Submitted 25 October, 2023;
originally announced October 2023.
-
Causal Adversarial Perturbations for Individual Fairness and Robustness in Heterogeneous Data Spaces
Authors:
Ahmad-Reza Ehyaei,
Kiarash Mohammadi,
Amir-Hossein Karimi,
Samira Samadi,
Golnoosh Farnadi
Abstract:
As responsible AI gains importance in machine learning algorithms, properties such as fairness, adversarial robustness, and causality have received considerable attention in recent years. However, despite their individual significance, there remains a critical gap in simultaneously exploring and integrating these properties. In this paper, we propose a novel approach that examines the relationship…
▽ More
As responsible AI gains importance in machine learning algorithms, properties such as fairness, adversarial robustness, and causality have received considerable attention in recent years. However, despite their individual significance, there remains a critical gap in simultaneously exploring and integrating these properties. In this paper, we propose a novel approach that examines the relationship between individual fairness, adversarial robustness, and structural causal models in heterogeneous data spaces, particularly when dealing with discrete sensitive attributes. We use causal structural models and sensitive attributes to create a fair metric and apply it to measure semantic similarity among individuals. By introducing a novel causal adversarial perturbation and applying adversarial training, we create a new regularizer that combines individual fairness, causality, and robustness in the classifier. Our method is evaluated on both real-world and synthetic datasets, demonstrating its effectiveness in achieving an accurate classifier that simultaneously exhibits fairness, adversarial robustness, and causal awareness.
△ Less
Submitted 17 August, 2023;
originally announced August 2023.
-
CAISA at SemEval-2023 Task 8: Counterfactual Data Augmentation for Mitigating Class Imbalance in Causal Claim Identification
Authors:
Akbar Karimi,
Lucie Flek
Abstract:
The class imbalance problem can cause machine learning models to produce an undesirable performance on the minority class as well as the whole dataset. Using data augmentation techniques to increase the number of samples is one way to tackle this problem. We introduce a novel counterfactual data augmentation by verb replacement for the identification of medical claims. In addition, we investigate…
▽ More
The class imbalance problem can cause machine learning models to produce an undesirable performance on the minority class as well as the whole dataset. Using data augmentation techniques to increase the number of samples is one way to tackle this problem. We introduce a novel counterfactual data augmentation by verb replacement for the identification of medical claims. In addition, we investigate the impact of this method and compare it with 3 other data augmentation techniques, showing that the proposed method can result in a significant (relative) improvement in the minority class.
△ Less
Submitted 1 June, 2023;
originally announced June 2023.
-
A Persian Benchmark for Joint Intent Detection and Slot Filling
Authors:
Masoud Akbari,
Amir Hossein Karimi,
Tayyebeh Saeedi,
Zeinab Saeidi,
Kiana Ghezelbash,
Fatemeh Shamsezat,
Mohammad Akbari,
Ali Mohades
Abstract:
Natural Language Understanding (NLU) is important in today's technology as it enables machines to comprehend and process human language, leading to improved human-computer interactions and advancements in fields such as virtual assistants, chatbots, and language-based AI systems. This paper highlights the significance of advancing the field of NLU for low-resource languages. With intent detection…
▽ More
Natural Language Understanding (NLU) is important in today's technology as it enables machines to comprehend and process human language, leading to improved human-computer interactions and advancements in fields such as virtual assistants, chatbots, and language-based AI systems. This paper highlights the significance of advancing the field of NLU for low-resource languages. With intent detection and slot filling being crucial tasks in NLU, the widely used datasets ATIS and SNIPS have been utilized in the past. However, these datasets only cater to the English language and do not support other languages. In this work, we aim to address this gap by creating a Persian benchmark for joint intent detection and slot filling based on the ATIS dataset. To evaluate the effectiveness of our benchmark, we employ state-of-the-art methods for intent detection and slot filling.
△ Less
Submitted 1 March, 2023;
originally announced March 2023.
-
Robustness Implies Fairness in Causal Algorithmic Recourse
Authors:
Ahmad-Reza Ehyaei,
Amir-Hossein Karimi,
Bernhard Schölkopf,
Setareh Maghsudi
Abstract:
Algorithmic recourse aims to disclose the inner workings of the black-box decision process in situations where decisions have significant consequences, by providing recommendations to empower beneficiaries to achieve a more favorable outcome. To ensure an effective remedy, suggested interventions must not only be low-cost but also robust and fair. This goal is accomplished by providing similar exp…
▽ More
Algorithmic recourse aims to disclose the inner workings of the black-box decision process in situations where decisions have significant consequences, by providing recommendations to empower beneficiaries to achieve a more favorable outcome. To ensure an effective remedy, suggested interventions must not only be low-cost but also robust and fair. This goal is accomplished by providing similar explanations to individuals who are alike. This study explores the concept of individual fairness and adversarial robustness in causal algorithmic recourse and addresses the challenge of achieving both. To resolve the challenges, we propose a new framework for defining adversarially robust recourse. The new setting views the protected feature as a pseudometric and demonstrates that individual fairness is a special case of adversarial robustness. Finally, we introduce the fair robust recourse problem to achieve both desirable properties and show how it can be satisfied both theoretically and empirically.
△ Less
Submitted 11 February, 2023; v1 submitted 7 February, 2023;
originally announced February 2023.
-
On the Relationship Between Explanation and Prediction: A Causal View
Authors:
Amir-Hossein Karimi,
Krikamol Muandet,
Simon Kornblith,
Bernhard Schölkopf,
Been Kim
Abstract:
Being able to provide explanations for a model's decision has become a central requirement for the development, deployment, and adoption of machine learning models. However, we are yet to understand what explanation methods can and cannot do. How do upstream factors such as data, model prediction, hyperparameters, and random initialization influence downstream explanations? While previous work rai…
▽ More
Being able to provide explanations for a model's decision has become a central requirement for the development, deployment, and adoption of machine learning models. However, we are yet to understand what explanation methods can and cannot do. How do upstream factors such as data, model prediction, hyperparameters, and random initialization influence downstream explanations? While previous work raised concerns that explanations (E) may have little relationship with the prediction (Y), there is a lack of conclusive study to quantify this relationship. Our work borrows tools from causal inference to systematically assay this relationship. More specifically, we study the relationship between E and Y by measuring the treatment effect when intervening on their causal ancestors, i.e., on hyperparameters and inputs used to generate saliency-based Es or Ys. Our results suggest that the relationships between E and Y is far from ideal. In fact, the gap between 'ideal' case only increase in higher-performing models -- models that are likely to be deployed. Our work is a promising first step towards providing a quantitative measure of the relationship between E and Y, which could also inform the future development of methods for E with a quantitative metric.
△ Less
Submitted 12 May, 2023; v1 submitted 13 December, 2022;
originally announced December 2022.
-
Dynamic Decision Frequency with Continuous Options
Authors:
Amirmohammad Karimi,
Jun Jin,
Jun Luo,
A. Rupam Mahmood,
Martin Jagersand,
Samuele Tosatto
Abstract:
In classic reinforcement learning algorithms, agents make decisions at discrete and fixed time intervals. The duration between decisions becomes a crucial hyperparameter, as setting it too short may increase the problem's difficulty by requiring the agent to make numerous decisions to achieve its goal while setting it too long can result in the agent losing control over the system. However, physic…
▽ More
In classic reinforcement learning algorithms, agents make decisions at discrete and fixed time intervals. The duration between decisions becomes a crucial hyperparameter, as setting it too short may increase the problem's difficulty by requiring the agent to make numerous decisions to achieve its goal while setting it too long can result in the agent losing control over the system. However, physical systems do not necessarily require a constant control frequency, and for learning agents, it is often preferable to operate with a low frequency when possible and a high frequency when necessary. We propose a framework called Continuous-Time Continuous-Options (CTCO), where the agent chooses options as sub-policies of variable durations. These options are time-continuous and can interact with the system at any desired frequency providing a smooth change of actions. We demonstrate the effectiveness of CTCO by comparing its performance to classical RL and temporal-abstraction RL methods on simulated continuous control tasks with various action-cycle times. We show that our algorithm's performance is not affected by the choice of environment interaction frequency. Furthermore, we demonstrate the efficacy of CTCO in facilitating exploration in a real-world visual reaching task for a 7 DOF robotic arm with sparse rewards.
△ Less
Submitted 25 October, 2023; v1 submitted 6 December, 2022;
originally announced December 2022.
-
Improving Localization for Semi-Supervised Object Detection
Authors:
Leonardo Rossi,
Akbar Karimi,
Andrea Prati
Abstract:
Nowadays, Semi-Supervised Object Detection (SSOD) is a hot topic, since, while it is rather easy to collect images for creating a new dataset, labeling them is still an expensive and time-consuming task. One of the successful methods to take advantage of raw images on a Semi-Supervised Learning (SSL) setting is the Mean Teacher technique, where the operations of pseudo-labeling by the Teacher and…
▽ More
Nowadays, Semi-Supervised Object Detection (SSOD) is a hot topic, since, while it is rather easy to collect images for creating a new dataset, labeling them is still an expensive and time-consuming task. One of the successful methods to take advantage of raw images on a Semi-Supervised Learning (SSL) setting is the Mean Teacher technique, where the operations of pseudo-labeling by the Teacher and the Knowledge Transfer from the Student to the Teacher take place simultaneously. However, the pseudo-labeling by thresholding is not the best solution since the confidence value is not strictly related to the prediction uncertainty, not permitting to safely filter predictions. In this paper, we introduce an additional classification task for bounding box localization to improve the filtering of the predicted bounding boxes and obtain higher quality on Student training. Furthermore, we empirically prove that bounding box regression on the unsupervised part can equally contribute to the training as much as category classification. Our experiments show that our IL-net (Improving Localization net) increases SSOD performance by 1.14% AP on COCO dataset in limited-annotation regime. The code is available at https://github.com/IMPLabUniPr/unbiased-teacher/tree/ilnet
△ Less
Submitted 21 June, 2022;
originally announced June 2022.
-
Introducing One Sided Margin Loss for Solving Classification Problems in Deep Networks
Authors:
Ali Karimi,
Zahra Mousavi Kouzehkanan,
Reshad Hosseini,
Hadi Asheri
Abstract:
This paper introduces a new loss function, OSM (One-Sided Margin), to solve maximum-margin classification problems effectively. Unlike the hinge loss, in OSM the margin is explicitly determined with corresponding hyperparameters and then the classification problem is solved. In experiments, we observe that using OSM loss leads to faster training speeds and better accuracies than binary and categor…
▽ More
This paper introduces a new loss function, OSM (One-Sided Margin), to solve maximum-margin classification problems effectively. Unlike the hinge loss, in OSM the margin is explicitly determined with corresponding hyperparameters and then the classification problem is solved. In experiments, we observe that using OSM loss leads to faster training speeds and better accuracies than binary and categorical cross-entropy in several commonly used deep models for classification and optical character recognition problems.
OSM has consistently shown better classification accuracies over cross-entropy and hinge losses for small to large neural networks. it has also led to a more efficient training procedure. We achieved state-of-the-art accuracies for small networks on several benchmark datasets of CIFAR10(98.82\%), CIFAR100(91.56\%), Flowers(98.04\%), Stanford Cars(93.91\%) with considerable improvements over other loss functions. Moreover, the accuracies are rather better than cross-entropy and hinge loss for large networks. Therefore, we strongly believe that OSM is a powerful alternative to hinge and cross-entropy losses to train deep neural networks on classification tasks.
△ Less
Submitted 2 June, 2022;
originally announced June 2022.
-
Multi-task recommendation system for scientific papers with high-way networks
Authors:
Aram Karimi,
Simon Dobnik
Abstract:
Finding and selecting the most relevant scientific papers from a large number of papers written in a research community is one of the key challenges for researchers these days. As we know, much information around research interest for scholars and academicians belongs to papers they read. Analysis and extracting contextual features from these papers could help us to suggest the most related paper…
▽ More
Finding and selecting the most relevant scientific papers from a large number of papers written in a research community is one of the key challenges for researchers these days. As we know, much information around research interest for scholars and academicians belongs to papers they read. Analysis and extracting contextual features from these papers could help us to suggest the most related paper to them. In this paper, we present a multi-task recommendation system (RS) that predicts a paper recommendation and generates its meta-data such as keywords. The system is implemented as a three-stage deep neural network encoder that tries to maps longer sequences of text to an embedding vector and learns simultaneously to predict the recommendation rate for a particular user and the paper's keywords. The motivation behind this approach is that the paper's topics expressed as keywords are a useful predictor of preferences of researchers. To achieve this goal, we use a system combination of RNNs, Highway and Convolutional Neural Networks to train end-to-end a context-aware collaborative matrix. Our application uses Highway networks to train the system very deep, combine the benefits of RNN and CNN to find the most important factor and make latent representation. Highway Networks allow us to enhance the traditional RNN and CNN pipeline by learning more sophisticated semantic structural representations. Using this method we can also overcome the cold start problem and learn latent features over large sequences of text.
△ Less
Submitted 21 April, 2022;
originally announced April 2022.
-
Learning Enhancement of CNNs via Separation Index Maximizing at the First Convolutional Layer
Authors:
Ali Karimi,
Ahmad Kalhor
Abstract:
In this paper, a straightforward enhancement learning algorithm based on Separation Index (SI) concept is proposed for Convolutional Neural Networks (CNNs). At first, the SI as a supervised complexity measure is explained its usage in better learning of CNNs for classification problems illustrate. Then, a learning strategy proposes through which the first layer of a CNN is optimized by maximizing…
▽ More
In this paper, a straightforward enhancement learning algorithm based on Separation Index (SI) concept is proposed for Convolutional Neural Networks (CNNs). At first, the SI as a supervised complexity measure is explained its usage in better learning of CNNs for classification problems illustrate. Then, a learning strategy proposes through which the first layer of a CNN is optimized by maximizing the SI, and the further layers are trained through the backpropagation algorithm to learn further layers. In order to maximize the SI at the first layer, A variant of ranking loss is optimized by using the quasi least square error technique. Applying such a learning strategy to some known CNNs and datasets, its enhancement impact in almost all cases is demonstrated.
△ Less
Submitted 13 January, 2022;
originally announced January 2022.
-
On the Adversarial Robustness of Causal Algorithmic Recourse
Authors:
Ricardo Dominguez-Olmedo,
Amir-Hossein Karimi,
Bernhard Schölkopf
Abstract:
Algorithmic recourse seeks to provide actionable recommendations for individuals to overcome unfavorable classification outcomes from automated decision-making systems. Recourse recommendations should ideally be robust to reasonably small uncertainty in the features of the individual seeking recourse. In this work, we formulate the adversarially robust recourse problem and show that recourse metho…
▽ More
Algorithmic recourse seeks to provide actionable recommendations for individuals to overcome unfavorable classification outcomes from automated decision-making systems. Recourse recommendations should ideally be robust to reasonably small uncertainty in the features of the individual seeking recourse. In this work, we formulate the adversarially robust recourse problem and show that recourse methods that offer minimally costly recourse fail to be robust. We then present methods for generating adversarially robust recourse for linear and for differentiable classifiers. Finally, we show that regularizing the decision-making classifier to behave locally linearly and to rely more strongly on actionable features facilitates the existence of adversarially robust recourse.
△ Less
Submitted 13 June, 2022; v1 submitted 21 December, 2021;
originally announced December 2021.
-
AEDA: An Easier Data Augmentation Technique for Text Classification
Authors:
Akbar Karimi,
Leonardo Rossi,
Andrea Prati
Abstract:
This paper proposes AEDA (An Easier Data Augmentation) technique to help improve the performance on text classification tasks. AEDA includes only random insertion of punctuation marks into the original text. This is an easier technique to implement for data augmentation than EDA method (Wei and Zou, 2019) with which we compare our results. In addition, it keeps the order of the words while changin…
▽ More
This paper proposes AEDA (An Easier Data Augmentation) technique to help improve the performance on text classification tasks. AEDA includes only random insertion of punctuation marks into the original text. This is an easier technique to implement for data augmentation than EDA method (Wei and Zou, 2019) with which we compare our results. In addition, it keeps the order of the words while changing their positions in the sentence leading to a better generalized performance. Furthermore, the deletion operation in EDA can cause loss of information which, in turn, misleads the network, whereas AEDA preserves all the input information. Following the baseline, we perform experiments on five different datasets for text classification. We show that using the AEDA-augmented data for training, the models show superior performance compared to using the EDA-augmented data in all five datasets. The source code is available for further study and reproduction of the results.
△ Less
Submitted 30 August, 2021;
originally announced August 2021.
-
Safety and progress proofs for a reactive planner and controller for autonomous driving
Authors:
Abolfazl Karimi,
Manish Goyal,
Parasara Sridhar Duggirala
Abstract:
In this paper, we perform safety and performance analysis of an autonomous vehicle that implements reactive planner and controller for navigating a race lap. Unlike traditional planning algorithms that have access to a map of the environment, reactive planner generates the plan purely based on the current input from sensors. Our reactive planner selects a waypoint on the local Voronoi diagram and…
▽ More
In this paper, we perform safety and performance analysis of an autonomous vehicle that implements reactive planner and controller for navigating a race lap. Unlike traditional planning algorithms that have access to a map of the environment, reactive planner generates the plan purely based on the current input from sensors. Our reactive planner selects a waypoint on the local Voronoi diagram and we use a pure-pursuit controller to navigate towards the waypoint. Our safety and performance analysis has two parts. The first part demonstrates that the reactive planner computes a plan that is locally consistent with the Voronoi plan computed with full map. The second part involves modeling of the evolution of vehicle navigating along the Voronoi diagram as a hybrid automata. For proving the safety and performance specification, we compute the reachable set of this hybrid automata and employ some enhancements that make this computation easier. We demonstrate that an autonomous vehicle implementing our reactive planner and controller is safe and successfully completes a lap for five different circuits. In addition, we have implemented our planner and controller in a simulation environment as well as a scaled down autonomous vehicle and demonstrate that our planner works well for a wide variety of circuits.
△ Less
Submitted 12 July, 2021;
originally announced July 2021.
-
Recursively Refined R-CNN: Instance Segmentation with Self-RoI Rebalancing
Authors:
Leonardo Rossi,
Akbar Karimi,
Andrea Prati
Abstract:
Within the field of instance segmentation, most of the state-of-the-art deep learning networks rely nowadays on cascade architectures, where multiple object detectors are trained sequentially, re-sampling the ground truth at each step. This offers a solution to the problem of exponentially vanishing positive samples. However, it also translates into an increase in network complexity in terms of th…
▽ More
Within the field of instance segmentation, most of the state-of-the-art deep learning networks rely nowadays on cascade architectures, where multiple object detectors are trained sequentially, re-sampling the ground truth at each step. This offers a solution to the problem of exponentially vanishing positive samples. However, it also translates into an increase in network complexity in terms of the number of parameters. To address this issue, we propose Recursively Refined R-CNN (R^3-CNN) which avoids duplicates by introducing a loop mechanism instead. At the same time, it achieves a quality boost using a recursive re-sampling technique, where a specific IoU quality is utilized in each recursion to eventually equally cover the positive spectrum. Our experiments highlight the specific encoding of the loop mechanism in the weights, requiring its usage at inference time. The R^3-CNN architecture is able to surpass the recently proposed HTC model, while reducing the number of parameters significantly. Experiments on COCO minival 2017 dataset show performance boost independently from the utilized baseline model. The code is available online at https://github.com/IMPLabUniPr/mmdetection/tree/r3_cnn.
△ Less
Submitted 2 August, 2021; v1 submitted 3 April, 2021;
originally announced April 2021.
-
UniParma at SemEval-2021 Task 5: Toxic Spans Detection Using CharacterBERT and Bag-of-Words Model
Authors:
Akbar Karimi,
Leonardo Rossi,
Andrea Prati
Abstract:
With the ever-increasing availability of digital information, toxic content is also on the rise. Therefore, the detection of this type of language is of paramount importance. We tackle this problem utilizing a combination of a state-of-the-art pre-trained language model (CharacterBERT) and a traditional bag-of-words technique. Since the content is full of toxic words that have not been written acc…
▽ More
With the ever-increasing availability of digital information, toxic content is also on the rise. Therefore, the detection of this type of language is of paramount importance. We tackle this problem utilizing a combination of a state-of-the-art pre-trained language model (CharacterBERT) and a traditional bag-of-words technique. Since the content is full of toxic words that have not been written according to their dictionary spelling, attendance to individual characters is crucial. Therefore, we use CharacterBERT to extract features based on the word characters. It consists of a CharacterCNN module that learns character embeddings from the context. These are, then, fed into the well-known BERT architecture. The bag-of-words method, on the other hand, further improves upon that by making sure that some frequently used toxic words get labeled accordingly. With a 4 percent difference from the first team, our system ranked 36th in the competition. The code is available for further re-search and reproduction of the results.
△ Less
Submitted 9 April, 2021; v1 submitted 17 March, 2021;
originally announced March 2021.
-
Soccer Event Detection Using Deep Learning
Authors:
Ali Karimi,
Ramin Toosi,
Mohammad Ali Akhaee
Abstract:
Event detection is an important step in extracting knowledge from the video. In this paper, we propose a deep learning approach to detect events in a soccer match emphasizing the distinction between images of red and yellow cards and the correct detection of the images of selected events from other images. This method includes the following three modules: i) the variational autoencoder (VAE) modul…
▽ More
Event detection is an important step in extracting knowledge from the video. In this paper, we propose a deep learning approach to detect events in a soccer match emphasizing the distinction between images of red and yellow cards and the correct detection of the images of selected events from other images. This method includes the following three modules: i) the variational autoencoder (VAE) module to differentiate between soccer images and others image, ii) the image classification module to classify the images of events, and iii) the fine-grain image classification module to classify the images of red and yellow cards. Additionally, a new dataset was introduced for soccer images classification that is employed to train the networks mentioned in the paper. In the final section, 10 UEFA Champions League matches are used to evaluate the networks' performance and precision in detecting the events. The experiments demonstrate that the proposed method achieves better performance than state-of-the-art methods.
△ Less
Submitted 8 February, 2021;
originally announced February 2021.
-
EfficientNet-Absolute Zero for Continuous Speech Keyword Spotting
Authors:
Amir Mohammad Rostami,
Ali Karimi,
Mohammad Ali Akhaee
Abstract:
Keyword spotting is a process of finding some specific words or phrases in recorded speeches by computers. Deep neural network algorithms, as a powerful engine, can handle this problem if they are trained over an appropriate dataset. To this end, the football keyword dataset (FKD), as a new keyword spotting dataset in Persian, is collected with crowdsourcing. This dataset contains nearly 31000 sam…
▽ More
Keyword spotting is a process of finding some specific words or phrases in recorded speeches by computers. Deep neural network algorithms, as a powerful engine, can handle this problem if they are trained over an appropriate dataset. To this end, the football keyword dataset (FKD), as a new keyword spotting dataset in Persian, is collected with crowdsourcing. This dataset contains nearly 31000 samples in 18 classes. The continuous speech synthesis method proposed to made FKD usable in the practical application which works with continuous speeches. Besides, we proposed a lightweight architecture called EfficientNet-A0 (absolute zero) by applying the compound scaling method on EfficientNet-B0 for keyword spotting task. Finally, the proposed architecture is evaluated with various models. It is realized that EfficientNet-A0 and Resnet models outperform other models on this dataset.
△ Less
Submitted 31 December, 2020;
originally announced December 2020.
-
Improving BERT Performance for Aspect-Based Sentiment Analysis
Authors:
Akbar Karimi,
Leonardo Rossi,
Andrea Prati
Abstract:
Aspect-Based Sentiment Analysis (ABSA) studies the consumer opinion on the market products. It involves examining the type of sentiments as well as sentiment targets expressed in product reviews. Analyzing the language used in a review is a difficult task that requires a deep understanding of the language. In recent years, deep language models, such as BERT \cite{devlin2019bert}, have shown great…
▽ More
Aspect-Based Sentiment Analysis (ABSA) studies the consumer opinion on the market products. It involves examining the type of sentiments as well as sentiment targets expressed in product reviews. Analyzing the language used in a review is a difficult task that requires a deep understanding of the language. In recent years, deep language models, such as BERT \cite{devlin2019bert}, have shown great progress in this regard. In this work, we propose two simple modules called Parallel Aggregation and Hierarchical Aggregation to be utilized on top of BERT for two main ABSA tasks namely Aspect Extraction (AE) and Aspect Sentiment Classification (ASC) in order to improve the model's performance. We show that applying the proposed models eliminates the need for further training of the BERT model. The source code is available on the Web for further research and reproduction of the results.
△ Less
Submitted 1 March, 2021; v1 submitted 22 October, 2020;
originally announced October 2020.
-
On the Fairness of Causal Algorithmic Recourse
Authors:
Julius von Kügelgen,
Amir-Hossein Karimi,
Umang Bhatt,
Isabel Valera,
Adrian Weller,
Bernhard Schölkopf
Abstract:
Algorithmic fairness is typically studied from the perspective of predictions. Instead, here we investigate fairness from the perspective of recourse actions suggested to individuals to remedy an unfavourable classification. We propose two new fairness criteria at the group and individual level, which -- unlike prior work on equalising the average group-wise distance from the decision boundary --…
▽ More
Algorithmic fairness is typically studied from the perspective of predictions. Instead, here we investigate fairness from the perspective of recourse actions suggested to individuals to remedy an unfavourable classification. We propose two new fairness criteria at the group and individual level, which -- unlike prior work on equalising the average group-wise distance from the decision boundary -- explicitly account for causal relationships between features, thereby capturing downstream effects of recourse actions performed in the physical world. We explore how our criteria relate to others, such as counterfactual fairness, and show that fairness of recourse is complementary to fairness of prediction. We study theoretically and empirically how to enforce fair causal recourse by altering the classifier and perform a case study on the Adult dataset. Finally, we discuss whether fairness violations in the data generating process revealed by our criteria may be better addressed by societal interventions as opposed to constraints on the classifier.
△ Less
Submitted 6 March, 2022; v1 submitted 13 October, 2020;
originally announced October 2020.
-
Scaling Guarantees for Nearest Counterfactual Explanations
Authors:
Kiarash Mohammadi,
Amir-Hossein Karimi,
Gilles Barthe,
Isabel Valera
Abstract:
Counterfactual explanations (CFE) are being widely used to explain algorithmic decisions, especially in consequential decision-making contexts (e.g., loan approval or pretrial bail). In this context, CFEs aim to provide individuals affected by an algorithmic decision with the most similar individual (i.e., nearest individual) with a different outcome. However, while an increasing number of works p…
▽ More
Counterfactual explanations (CFE) are being widely used to explain algorithmic decisions, especially in consequential decision-making contexts (e.g., loan approval or pretrial bail). In this context, CFEs aim to provide individuals affected by an algorithmic decision with the most similar individual (i.e., nearest individual) with a different outcome. However, while an increasing number of works propose algorithms to compute CFEs, such approaches either lack in optimality of distance (i.e., they do not return the nearest individual) and perfect coverage (i.e., they do not provide a CFE for all individuals); or they cannot handle complex models, such as neural networks. In this work, we provide a framework based on Mixed-Integer Programming (MIP) to compute nearest counterfactual explanations with provable guarantees and with runtimes comparable to gradient-based approaches. Our experiments on the Adult, COMPAS, and Credit datasets show that, in contrast with previous methods, our approach allows for efficiently computing diverse CFEs with both distance guarantees and perfect coverage.
△ Less
Submitted 8 February, 2021; v1 submitted 10 October, 2020;
originally announced October 2020.
-
A survey of algorithmic recourse: definitions, formulations, solutions, and prospects
Authors:
Amir-Hossein Karimi,
Gilles Barthe,
Bernhard Schölkopf,
Isabel Valera
Abstract:
Machine learning is increasingly used to inform decision-making in sensitive situations where decisions have consequential effects on individuals' lives. In these settings, in addition to requiring models to be accurate and robust, socially relevant values such as fairness, privacy, accountability, and explainability play an important role for the adoption and impact of said technologies. In this…
▽ More
Machine learning is increasingly used to inform decision-making in sensitive situations where decisions have consequential effects on individuals' lives. In these settings, in addition to requiring models to be accurate and robust, socially relevant values such as fairness, privacy, accountability, and explainability play an important role for the adoption and impact of said technologies. In this work, we focus on algorithmic recourse, which is concerned with providing explanations and recommendations to individuals who are unfavourably treated by automated decision-making systems. We first perform an extensive literature review, and align the efforts of many authors by presenting unified definitions, formulations, and solutions to recourse. Then, we provide an overview of the prospective research directions towards which the community may engage, challenging existing assumptions and making explicit connections to other ethical challenges such as security, privacy, and fairness.
△ Less
Submitted 1 March, 2021; v1 submitted 8 October, 2020;
originally announced October 2020.
-
Algorithmic recourse under imperfect causal knowledge: a probabilistic approach
Authors:
Amir-Hossein Karimi,
Julius von Kügelgen,
Bernhard Schölkopf,
Isabel Valera
Abstract:
Recent work has discussed the limitations of counterfactual explanations to recommend actions for algorithmic recourse, and argued for the need of taking causal relationships between features into consideration. Unfortunately, in practice, the true underlying structural causal model is generally unknown. In this work, we first show that it is impossible to guarantee recourse without access to the…
▽ More
Recent work has discussed the limitations of counterfactual explanations to recommend actions for algorithmic recourse, and argued for the need of taking causal relationships between features into consideration. Unfortunately, in practice, the true underlying structural causal model is generally unknown. In this work, we first show that it is impossible to guarantee recourse without access to the true structural equations. To address this limitation, we propose two probabilistic approaches to select optimal actions that achieve recourse with high probability given limited causal knowledge (e.g., only the causal graph). The first captures uncertainty over structural equations under additive Gaussian noise, and uses Bayesian model averaging to estimate the counterfactual distribution. The second removes any assumptions on the structural equations by instead computing the average effect of recourse actions on individuals similar to the person who seeks recourse, leading to a novel subpopulation-based interventional notion of recourse. We then derive a gradient-based procedure for selecting optimal recourse actions, and empirically show that the proposed approaches lead to more reliable recommendations under imperfect causal knowledge than non-probabilistic baselines.
△ Less
Submitted 23 October, 2020; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Restricted Chase Termination for Existential Rules: a Hierarchical Approach and Experimentation
Authors:
Arash Karimi,
Heng Zhang,
Jia-Huai You
Abstract:
The chase procedure for existential rules is an indispensable tool for several database applications, where its termination guarantees the decidability of these tasks. Most previous studies have focused on the skolem chase variant and its termination analysis. It is known that the restricted chase variant is a more powerful tool in termination analysis provided a database is given. But all-instanc…
▽ More
The chase procedure for existential rules is an indispensable tool for several database applications, where its termination guarantees the decidability of these tasks. Most previous studies have focused on the skolem chase variant and its termination analysis. It is known that the restricted chase variant is a more powerful tool in termination analysis provided a database is given. But all-instance termination presents a challenge since the critical database and similar techniques do not work. In this paper, we develop a novel technique to characterize the activeness of all possible cycles of a certain length for the restricted chase, which leads to the formulation of a parameterized class of the finite restricted chase, called $k$-$\mathsf{safe}(Φ)$. This approach applies to any class of finite skolem chase identified with a condition of acyclicity. More generally, we show that the approach can be applied to the hierarchy of bounded rule sets previously only defined for the skolem chase. Experiments on a collection of ontologies from the web show the applicability of the proposed methods on real-world ontologies. Under consideration in Theory and Practice of Logic Programming (TPLP).
△ Less
Submitted 11 May, 2020;
originally announced May 2020.
-
A novel Region of Interest Extraction Layer for Instance Segmentation
Authors:
Leonardo Rossi,
Akbar Karimi,
Andrea Prati
Abstract:
Given the wide diffusion of deep neural network architectures for computer vision tasks, several new applications are nowadays more and more feasible. Among them, a particular attention has been recently given to instance segmentation, by exploiting the results achievable by two-stage networks (such as Mask R-CNN or Faster R-CNN), derived from R-CNN. In these complex architectures, a crucial role…
▽ More
Given the wide diffusion of deep neural network architectures for computer vision tasks, several new applications are nowadays more and more feasible. Among them, a particular attention has been recently given to instance segmentation, by exploiting the results achievable by two-stage networks (such as Mask R-CNN or Faster R-CNN), derived from R-CNN. In these complex architectures, a crucial role is played by the Region of Interest (RoI) extraction layer, devoted to extracting a coherent subset of features from a single Feature Pyramid Network (FPN) layer attached on top of a backbone.
This paper is motivated by the need to overcome the limitations of existing RoI extractors which select only one (the best) layer from FPN. Our intuition is that all the layers of FPN retain useful information. Therefore, the proposed layer (called Generic RoI Extractor - GRoIE) introduces non-local building blocks and attention mechanisms to boost the performance.
A comprehensive ablation study at component level is conducted to find the best set of algorithms and parameters for the GRoIE layer. Moreover, GRoIE can be integrated seamlessly with every two-stage architecture for both object detection and instance segmentation tasks. Therefore, the improvements brought about by the use of GRoIE in different state-of-the-art architectures are also evaluated. The proposed layer leads up to gain a 1.1% AP improvement on bounding box detection and 1.7% AP improvement on instance segmentation.
The code is publicly available on GitHub repository at https://github.com/IMPLabUniPr/mmdetection/tree/groie_dev
△ Less
Submitted 1 October, 2020; v1 submitted 28 April, 2020;
originally announced April 2020.
-
Deep Multimodal Image-Text Embeddings for Automatic Cross-Media Retrieval
Authors:
Hadi Abdi Khojasteh,
Ebrahim Ansari,
Parvin Razzaghi,
Akbar Karimi
Abstract:
This paper considers the task of matching images and sentences by learning a visual-textual embedding space for cross-modal retrieval. Finding such a space is a challenging task since the features and representations of text and image are not comparable. In this work, we introduce an end-to-end deep multimodal convolutional-recurrent network for learning both vision and language representations si…
▽ More
This paper considers the task of matching images and sentences by learning a visual-textual embedding space for cross-modal retrieval. Finding such a space is a challenging task since the features and representations of text and image are not comparable. In this work, we introduce an end-to-end deep multimodal convolutional-recurrent network for learning both vision and language representations simultaneously to infer image-text similarity. The model learns which pairs are a match (positive) and which ones are a mismatch (negative) using a hinge-based triplet ranking. To learn about the joint representations, we leverage our newly extracted collection of tweets from Twitter. The main characteristic of our dataset is that the images and tweets are not standardized the same as the benchmarks. Furthermore, there can be a higher semantic correlation between the pictures and tweets contrary to benchmarks in which the descriptions are well-organized. Experimental results on MS-COCO benchmark dataset show that our model outperforms certain methods presented previously and has competitive performance compared to the state-of-the-art. The code and dataset have been made available publicly.
△ Less
Submitted 23 February, 2020;
originally announced February 2020.
-
Algorithmic Recourse: from Counterfactual Explanations to Interventions
Authors:
Amir-Hossein Karimi,
Bernhard Schölkopf,
Isabel Valera
Abstract:
As machine learning is increasingly used to inform consequential decision-making (e.g., pre-trial bail and loan approval), it becomes important to explain how the system arrived at its decision, and also suggest actions to achieve a favorable decision. Counterfactual explanations -- "how the world would have (had) to be different for a desirable outcome to occur" -- aim to satisfy these criteria.…
▽ More
As machine learning is increasingly used to inform consequential decision-making (e.g., pre-trial bail and loan approval), it becomes important to explain how the system arrived at its decision, and also suggest actions to achieve a favorable decision. Counterfactual explanations -- "how the world would have (had) to be different for a desirable outcome to occur" -- aim to satisfy these criteria. Existing works have primarily focused on designing algorithms to obtain counterfactual explanations for a wide range of settings. However, one of the main objectives of "explanations as a means to help a data-subject act rather than merely understand" has been overlooked. In layman's terms, counterfactual explanations inform an individual where they need to get to, but not how to get there. In this work, we rely on causal reasoning to caution against the use of counterfactual explanations as a recommendable set of actions for recourse. Instead, we propose a shift of paradigm from recourse via nearest counterfactual explanations to recourse through minimal interventions, moving the focus from explanations to recommendations. Finally, we provide the reader with an extensive discussion on how to realistically achieve recourse beyond structural interventions.
△ Less
Submitted 8 October, 2020; v1 submitted 14 February, 2020;
originally announced February 2020.
-
Adversarial Training for Aspect-Based Sentiment Analysis with BERT
Authors:
Akbar Karimi,
Leonardo Rossi,
Andrea Prati
Abstract:
Aspect-Based Sentiment Analysis (ABSA) deals with the extraction of sentiments and their targets. Collecting labeled data for this task in order to help neural networks generalize better can be laborious and time-consuming. As an alternative, similar data to the real-world examples can be produced artificially through an adversarial process which is carried out in the embedding space. Although the…
▽ More
Aspect-Based Sentiment Analysis (ABSA) deals with the extraction of sentiments and their targets. Collecting labeled data for this task in order to help neural networks generalize better can be laborious and time-consuming. As an alternative, similar data to the real-world examples can be produced artificially through an adversarial process which is carried out in the embedding space. Although these examples are not real sentences, they have been shown to act as a regularization method which can make neural networks more robust. In this work, we apply adversarial training, which was put forward by Goodfellow et al. (2014), to the post-trained BERT (BERT-PT) language model proposed by Xu et al. (2019) on the two major tasks of Aspect Extraction and Aspect Sentiment Classification in sentiment analysis. After improving the results of post-trained BERT by an ablation study, we propose a novel architecture called BERT Adversarial Training (BAT) to utilize adversarial training in ABSA. The proposed model outperforms post-trained BERT in both tasks. To the best of our knowledge, this is the first study on the application of adversarial training in ABSA.
△ Less
Submitted 23 October, 2020; v1 submitted 30 January, 2020;
originally announced January 2020.
-
Comparison of the P300 detection accuracy related to the BCI speller and image recognition scenarios
Authors:
S. A. Karimi,
A. M. Mijani,
M. T. Talebian,
S. Mirzakuchaki
Abstract:
There are several protocols in the Electroencephalography (EEG) recording scenarios which produce various types of event-related potentials (ERP). P300 pattern is a well-known ERP which produced by auditory and visual oddball paradigm and BCI speller system. In this study, P300 and non-P300 separability are investigated in two scenarios including image recognition paradigm and BCI speller. Image r…
▽ More
There are several protocols in the Electroencephalography (EEG) recording scenarios which produce various types of event-related potentials (ERP). P300 pattern is a well-known ERP which produced by auditory and visual oddball paradigm and BCI speller system. In this study, P300 and non-P300 separability are investigated in two scenarios including image recognition paradigm and BCI speller. Image recognition scenario is an experiment that examines the participants, knowledge about an image that shown to them before by analyzing the EEG signal recorded during the observing of that image as visual stimulation. To do this, three types of famous classifiers (SVM, Bayes LDA, and sparse logistic regression) were used to classify EEG recordings in six classes problem. Filtered and down-sampled (temporal samples) of EEG recording were considered as features in classification P300 pattern. Also, different sets of EEG recording including 4, 8 and 16 channels and different trial numbers were used to considering various situations in comparison. The accuracy was increased by increasing the number of trials and channels. The results prove that better accuracy is observed in the case of the image recognition scenario for the different sets of channels and by using the different number of trials. So it can be concluded that P300 pattern which produced in image recognition paradigm is more separable than BCI (matrix speller).
△ Less
Submitted 24 December, 2019;
originally announced December 2019.
-
Model-Agnostic Counterfactual Explanations for Consequential Decisions
Authors:
Amir-Hossein Karimi,
Gilles Barthe,
Borja Balle,
Isabel Valera
Abstract:
Predictive models are being increasingly used to support consequential decision making at the individual level in contexts such as pretrial bail and loan approval. As a result, there is increasing social and legal pressure to provide explanations that help the affected individuals not only to understand why a prediction was output, but also how to act to obtain a desired outcome. To this end, seve…
▽ More
Predictive models are being increasingly used to support consequential decision making at the individual level in contexts such as pretrial bail and loan approval. As a result, there is increasing social and legal pressure to provide explanations that help the affected individuals not only to understand why a prediction was output, but also how to act to obtain a desired outcome. To this end, several works have proposed optimization-based methods to generate nearest counterfactual explanations. However, these methods are often restricted to a particular subset of models (e.g., decision trees or linear models) and differentiable distance functions. In contrast, we build on standard theory and tools from formal verification and propose a novel algorithm that solves a sequence of satisfiability problems, where both the distance function (objective) and predictive model (constraints) are represented as logic formulae. As shown by our experiments on real-world data, our algorithm is: i) model-agnostic ({non-}linear, {non-}differentiable, {non-}convex); ii) data-type-agnostic (heterogeneous features); iii) distance-agnostic ($\ell_0, \ell_1, \ell_\infty$, and combinations thereof); iv) able to generate plausible and diverse counterfactuals for any sample (i.e., 100% coverage); and v) at provably optimal distances.
△ Less
Submitted 28 February, 2020; v1 submitted 27 May, 2019;
originally announced May 2019.
-
On the Resource Utilization of Multi-Connectivity Transmission for URLLC Services in 5G New Radio
Authors:
Nurul Huda Mahmood,
Ali Karimi,
Gilberto Berardinelli,
Klaus I. Pedersen,
Daniela Laselva
Abstract:
Multi-connectivity with packet duplication, where the same data packet is duplicated and transmitted from multiple transmitters, is proposed in 5G New Radio as a reliability enhancement feature. This paper presents an analytical study of the outage probability enhancement with multi-connectivity, and analyses its cost in terms of resource usage. The performance analysis is further compared against…
▽ More
Multi-connectivity with packet duplication, where the same data packet is duplicated and transmitted from multiple transmitters, is proposed in 5G New Radio as a reliability enhancement feature. This paper presents an analytical study of the outage probability enhancement with multi-connectivity, and analyses its cost in terms of resource usage. The performance analysis is further compared against conventional single-connectivity transmission. Our analysis shows that, for transmission with a given block error rate target, multi-connectivity results in more than an order of magnitude outage probability improvement over the baseline single-connectivity scheme. However, such gains are achieved at the cost of almost doubling the amount of radio resources used. Multi-connectivity should thus be selectively used such that its benefits can be harnessed for critical users, while the price to pay in terms of resource utilization is simultaneously minimized.
△ Less
Submitted 13 March, 2019;
originally announced April 2019.
-
Deep Variational Sufficient Dimensionality Reduction
Authors:
Ershad Banijamali,
Amir-Hossein Karimi,
Ali Ghodsi
Abstract:
We consider the problem of sufficient dimensionality reduction (SDR), where the high-dimensional observation is transformed to a low-dimensional sub-space in which the information of the observations regarding the label variable is preserved. We propose DVSDR, a deep variational approach for sufficient dimensionality reduction. The deep structure in our model has a bottleneck that represent the lo…
▽ More
We consider the problem of sufficient dimensionality reduction (SDR), where the high-dimensional observation is transformed to a low-dimensional sub-space in which the information of the observations regarding the label variable is preserved. We propose DVSDR, a deep variational approach for sufficient dimensionality reduction. The deep structure in our model has a bottleneck that represent the low-dimensional embedding of the data. We explain the SDR problem using graphical models and use the framework of variational autoencoders to maximize the lower bound of the log-likelihood of the joint distribution of the observation and label. We show that such a maximization problem can be interpreted as solving the SDR problem. DVSDR can be easily adopted to semi-supervised learning setting. In our experiment we show that DVSDR performs competitively on classification tasks while being able to generate novel data samples.
△ Less
Submitted 18 December, 2018;
originally announced December 2018.
-
SRP: Efficient class-aware embedding learning for large-scale data via supervised random projections
Authors:
Amir-Hossein Karimi,
Alexander Wong,
Ali Ghodsi
Abstract:
Supervised dimensionality reduction strategies have been of great interest. However, current supervised dimensionality reduction approaches are difficult to scale for situations characterized by large datasets given the high computational complexities associated with such methods. While stochastic approximation strategies have been explored for unsupervised dimensionality reduction to tackle this…
▽ More
Supervised dimensionality reduction strategies have been of great interest. However, current supervised dimensionality reduction approaches are difficult to scale for situations characterized by large datasets given the high computational complexities associated with such methods. While stochastic approximation strategies have been explored for unsupervised dimensionality reduction to tackle this challenge, such approaches are not well-suited for accelerating computational speed for supervised dimensionality reduction. Motivated to tackle this challenge, in this study we explore a novel direction of directly learning optimal class-aware embeddings in a supervised manner via the notion of supervised random projections (SRP). The key idea behind SRP is that, rather than performing spectral decomposition (or approximations thereof) which are computationally prohibitive for large-scale data, we instead perform a direct decomposition by leveraging kernel approximation theory and the symmetry of the Hilbert-Schmidt Independence Criterion (HSIC) measure of dependence between the embedded data and the labels. Experimental results on five different synthetic and real-world datasets demonstrate that the proposed SRP strategy for class-aware embedding learning can be very promising in producing embeddings that are highly competitive with existing supervised dimensionality reduction methods (e.g., SPCA and KSPCA) while achieving 1-2 orders of magnitude better computational performance. As such, such an efficient approach to learning embeddings for dimensionality reduction can be a powerful tool for large-scale data analysis and visualization.
△ Less
Submitted 7 November, 2018;
originally announced November 2018.
-
JADE: Joint Autoencoders for Dis-Entanglement
Authors:
Ershad Banijamali,
Amir-Hossein Karimi,
Alexander Wong,
Ali Ghodsi
Abstract:
The problem of feature disentanglement has been explored in the literature, for the purpose of image and video processing and text analysis. State-of-the-art methods for disentangling feature representations rely on the presence of many labeled samples. In this work, we present a novel method for disentangling factors of variation in data-scarce regimes. Specifically, we explore the application of…
▽ More
The problem of feature disentanglement has been explored in the literature, for the purpose of image and video processing and text analysis. State-of-the-art methods for disentangling feature representations rely on the presence of many labeled samples. In this work, we present a novel method for disentangling factors of variation in data-scarce regimes. Specifically, we explore the application of feature disentangling for the problem of supervised classification in a setting where few labeled samples exist, and there are no unlabeled samples for use in unsupervised training. Instead, a similar datasets exists which shares at least one direction of variation with the sample-constrained datasets. We train our model end-to-end using the framework of variational autoencoders and are able to experimentally demonstrate that using an auxiliary dataset with similar variation factors contribute positively to classification performance, yielding competitive results with the state-of-the-art in unsupervised learning.
△ Less
Submitted 24 November, 2017;
originally announced November 2017.
-
Extracting an English-Persian Parallel Corpus from Comparable Corpora
Authors:
Akbar Karimi,
Ebrahim Ansari,
Bahram Sadeghi Bigham
Abstract:
Parallel data are an important part of a reliable Statistical Machine Translation (SMT) system. The more of these data are available, the better the quality of the SMT system. However, for some language pairs such as Persian-English, parallel sources of this kind are scarce. In this paper, a bidirectional method is proposed to extract parallel sentences from English and Persian document aligned Wi…
▽ More
Parallel data are an important part of a reliable Statistical Machine Translation (SMT) system. The more of these data are available, the better the quality of the SMT system. However, for some language pairs such as Persian-English, parallel sources of this kind are scarce. In this paper, a bidirectional method is proposed to extract parallel sentences from English and Persian document aligned Wikipedia. Two machine translation systems are employed to translate from Persian to English and the reverse after which an IR system is used to measure the similarity of the translated sentences. Adding the extracted sentences to the training data of the existing SMT systems is shown to improve the quality of the translation. Furthermore, the proposed method slightly outperforms the one-directional approach. The extracted corpus consists of about 200,000 sentences which have been sorted by their degree of similarity calculated by the IR system and is freely available for public access on the Web.
△ Less
Submitted 31 March, 2019; v1 submitted 2 November, 2017;
originally announced November 2017.
-
Synthesizing Deep Neural Network Architectures using Biological Synaptic Strength Distributions
Authors:
A. H. Karimi,
M. J. Shafiee,
A. Ghodsi,
A. Wong
Abstract:
In this work, we perform an exploratory study on synthesizing deep neural networks using biological synaptic strength distributions, and the potential influence of different distributions on modelling performance particularly for the scenario associated with small data sets. Surprisingly, a CNN with convolutional layer synaptic strengths drawn from biologically-inspired distributions such as log-n…
▽ More
In this work, we perform an exploratory study on synthesizing deep neural networks using biological synaptic strength distributions, and the potential influence of different distributions on modelling performance particularly for the scenario associated with small data sets. Surprisingly, a CNN with convolutional layer synaptic strengths drawn from biologically-inspired distributions such as log-normal or correlated center-surround distributions performed relatively well suggesting a possibility for designing deep neural network architectures that do not require many data samples to learn, and can sidestep current training procedures while maintaining or boosting modelling performance.
△ Less
Submitted 30 June, 2017;
originally announced July 2017.
-
Key-Value Memory Networks for Directly Reading Documents
Authors:
Alexander Miller,
Adam Fisch,
Jesse Dodge,
Amir-Hossein Karimi,
Antoine Bordes,
Jason Weston
Abstract:
Directly reading documents and being able to answer questions from them is an unsolved challenge. To avoid its inherent difficulty, question answering (QA) has been directed towards using Knowledge Bases (KBs) instead, which has proven effective. Unfortunately KBs often suffer from being too restrictive, as the schema cannot support certain types of answers, and too sparse, e.g. Wikipedia contains…
▽ More
Directly reading documents and being able to answer questions from them is an unsolved challenge. To avoid its inherent difficulty, question answering (QA) has been directed towards using Knowledge Bases (KBs) instead, which has proven effective. Unfortunately KBs often suffer from being too restrictive, as the schema cannot support certain types of answers, and too sparse, e.g. Wikipedia contains much more information than Freebase. In this work we introduce a new method, Key-Value Memory Networks, that makes reading documents more viable by utilizing different encodings in the addressing and output stages of the memory read operation. To compare using KBs, information extraction or Wikipedia documents directly in a single framework we construct an analysis tool, WikiMovies, a QA dataset that contains raw text alongside a preprocessed KB, in the domain of movies. Our method reduces the gap between all three settings. It also achieves state-of-the-art results on the existing WikiQA benchmark.
△ Less
Submitted 10 October, 2016; v1 submitted 9 June, 2016;
originally announced June 2016.
-
Theoremizing Yablo's Paradox
Authors:
Ahmad Karimi,
Saeed Salehi
Abstract:
To counter a general belief that all the paradoxes stem from a kind of circularity (or involve some self--reference, or use a diagonal argument) Stephen Yablo designed a paradox in 1993 that seemingly avoided self--reference. We turn Yablo's paradox, the most challenging paradox in the recent years, into a genuine mathematical theorem in Linear Temporal Logic (LTL). Indeed, Yablo's paradox comes i…
▽ More
To counter a general belief that all the paradoxes stem from a kind of circularity (or involve some self--reference, or use a diagonal argument) Stephen Yablo designed a paradox in 1993 that seemingly avoided self--reference. We turn Yablo's paradox, the most challenging paradox in the recent years, into a genuine mathematical theorem in Linear Temporal Logic (LTL). Indeed, Yablo's paradox comes in several varieties; and he showed in 2004 that there are other versions that are equally paradoxical. Formalizing these versions of Yablo's paradox, we prove some theorems in LTL. This is the first time that Yablo's paradox(es) become new(ly discovered) theorems in mathematics and logic.
△ Less
Submitted 1 June, 2014;
originally announced June 2014.
-
Diagonalizing by Fixed-Points
Authors:
Ahmad Karimi,
Saeed Salehi
Abstract:
A universal schema for diagonalization was popularized by N. S. Yanofsky (2003) in which the existence of a (diagonolized-out and contradictory) object implies the existence of a fixed-point for a certain function. It was shown that many self-referential paradoxes and diagonally proved theorems can fit in that schema. Here, we fit more theorems in the universal schema of diagonalization, such as E…
▽ More
A universal schema for diagonalization was popularized by N. S. Yanofsky (2003) in which the existence of a (diagonolized-out and contradictory) object implies the existence of a fixed-point for a certain function. It was shown that many self-referential paradoxes and diagonally proved theorems can fit in that schema. Here, we fit more theorems in the universal schema of diagonalization, such as Euclid's theorem on the infinitude of the primes and new proofs of Boolos (1997) for Cantor's theorem on the non-equinumerosity of a set with its powerset. Then, in Linear Temporal Logic, we show the non-existence of a fixed-point in this logic whose proof resembles the argument of Yablo's paradox. Thus, Yablo's paradox turns for the first time into a genuine mathematico-logical theorem in the framework of Linear Temporal Logic. Again the diagonal schema of the paper is used in this proof, and also it is shown that G. Priest's inclosure schema (1997) can fit in our universal diagonal/fixed-point schema. We also show the existence of dominating (Ackermann-like) functions (which dominate a given countable set of functions---like primitive recursives) using the schema.
△ Less
Submitted 10 December, 2016; v1 submitted 4 March, 2013;
originally announced March 2013.
-
A New Fuzzy Approach for Dynamic Load Balancing Algorithm
Authors:
Abbas Karimi,
Faraneh Zarafshan,
Adznan. b. Jantan,
A. R Ramli,
M. Iqbal b. Saripan
Abstract:
Load balancing is the process of improving the Performance of a parallel and distributed system through is distribution of load among the processors [1-2]. Most of the previous work in load balancing and distributed decision making in general, do not effectively take into account the uncertainty and inconsistency in state information but in fuzzy logic, we have advantage of using crisps inputs.…
▽ More
Load balancing is the process of improving the Performance of a parallel and distributed system through is distribution of load among the processors [1-2]. Most of the previous work in load balancing and distributed decision making in general, do not effectively take into account the uncertainty and inconsistency in state information but in fuzzy logic, we have advantage of using crisps inputs. In this paper, we present a new approach for implementing dynamic load balancing algorithm with fuzzy logic, which can face to uncertainty and inconsistency of previous algorithms, further more our algorithm shows better response time than round robin and randomize algorithm respectively 30.84 percent and 45.45 percent.
△ Less
Submitted 1 October, 2009;
originally announced October 2009.