Search | arXiv e-print repository

Paraphrasing in Affirmative Terms Improves Negation Understanding

Authors: MohammadHossein Rezaei, Eduardo Blanco

Abstract: Negation is a common linguistic phenomenon. Yet language models face challenges with negation in many natural language understanding tasks such as question answering and natural language inference. In this paper, we experiment with seamless strategies that incorporate affirmative interpretations (i.e., paraphrases without negation) to make models more robust against negation. Crucially, our affirm… ▽ More Negation is a common linguistic phenomenon. Yet language models face challenges with negation in many natural language understanding tasks such as question answering and natural language inference. In this paper, we experiment with seamless strategies that incorporate affirmative interpretations (i.e., paraphrases without negation) to make models more robust against negation. Crucially, our affirmative interpretations are obtained automatically. We show improvements with CondaQA, a large corpus requiring reasoning with negation, and five natural language understanding tasks. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted to ACL 2024

arXiv:2405.18218 [pdf, other]

FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models

Authors: Yang Zhang, Yawei Li, Xinpeng Wang, Qianli Shen, Barbara Plank, Bernd Bischl, Mina Rezaei, Kenji Kawaguchi

Abstract: Overparametrized transformer networks are the state-of-the-art architecture for Large Language Models (LLMs). However, such models contain billions of parameters making large compute a necessity, while raising environmental concerns. To address these issues, we propose FinerCut, a new form of fine-grained layer pruning, which in contrast to prior work at the transformer block level, considers all… ▽ More Overparametrized transformer networks are the state-of-the-art architecture for Large Language Models (LLMs). However, such models contain billions of parameters making large compute a necessity, while raising environmental concerns. To address these issues, we propose FinerCut, a new form of fine-grained layer pruning, which in contrast to prior work at the transformer block level, considers all self-attention and feed-forward network (FFN) layers within blocks as individual pruning candidates. FinerCut prunes layers whose removal causes minimal alternation to the model's output -- contributing to a new, lean, interpretable, and task-agnostic pruning method. Tested across 9 benchmarks, our approach retains 90% performance of Llama3-8B with 25% layers removed, and 95% performance of Llama3-70B with 30% layers removed, all without fine-tuning or post-pruning reconstruction. Strikingly, we observe intriguing results with FinerCut: 42% (34 out of 80) of the self-attention layers in Llama3-70B can be removed while preserving 99% of its performance -- without additional fine-tuning after removal. Moreover, FinerCut provides a tool to inspect the types and locations of pruned layers, allowing to observe interesting pruning behaviors. For instance, we observe a preference for pruning self-attention layers, often at deeper consecutive decoder layers. We hope our insights inspire future efficient LLM architecture designs. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: 22 pages

arXiv:2405.11848 [pdf, other]

Alternators For Sequence Modeling

Authors: Mohammad Reza Rezaei, Adji Bousso Dieng

Abstract: This paper introduces alternators, a novel family of non-Markovian dynamical models for sequences. An alternator features two neural networks: the observation trajectory network (OTN) and the feature trajectory network (FTN). The OTN and the FTN work in conjunction, alternating between outputting samples in the observation space and some feature space, respectively, over a cycle. The parameters of… ▽ More This paper introduces alternators, a novel family of non-Markovian dynamical models for sequences. An alternator features two neural networks: the observation trajectory network (OTN) and the feature trajectory network (FTN). The OTN and the FTN work in conjunction, alternating between outputting samples in the observation space and some feature space, respectively, over a cycle. The parameters of the OTN and the FTN are not time-dependent and are learned via a minimum cross-entropy criterion over the trajectories. Alternators are versatile. They can be used as dynamical latent-variable generative models or as sequence-to-sequence predictors. When alternators are used as generative models, the FTN produces interpretable low-dimensional latent variables that capture the dynamics governing the observations. When alternators are used as sequence-to-sequence predictors, the FTN learns to predict the observed features. In both cases, the OTN learns to produce sequences that match the data. Alternators can uncover the latent dynamics underlying complex sequential data, accurately forecast and impute missing data, and sample new trajectories. We showcase the capabilities of alternators in three applications. We first used alternators to model the Lorenz equations, often used to describe chaotic behavior. We then applied alternators to Neuroscience, to map brain activity to physical activity. Finally, we applied alternators to Climate Science, focusing on sea-surface temperature forecasting. In all our experiments, we found alternators are stable to train, fast to sample from, yield high-quality generated samples and latent variables, and outperform strong baselines such as neural ODEs and diffusion models in the domains we studied. △ Less

Submitted 20 May, 2024; originally announced May 2024.

Comments: A new versatile family of sequence models that can be used for both generative modeling and supervised learning. The codebase will be made available upon publication. This paper is dedicated to Thomas Sankara

arXiv:2402.12810 [pdf, other]

PIP-Net: Pedestrian Intention Prediction in the Wild

Authors: Mohsen Azarmi, Mahdi Rezaei, He Wang, Sebastien Glaser

Abstract: Accurate pedestrian intention prediction (PIP) by Autonomous Vehicles (AVs) is one of the current research challenges in this field. In this article, we introduce PIP-Net, a novel framework designed to predict pedestrian crossing intentions by AVs in real-world urban scenarios. We offer two variants of PIP-Net designed for different camera mounts and setups. Leveraging both kinematic data and spat… ▽ More Accurate pedestrian intention prediction (PIP) by Autonomous Vehicles (AVs) is one of the current research challenges in this field. In this article, we introduce PIP-Net, a novel framework designed to predict pedestrian crossing intentions by AVs in real-world urban scenarios. We offer two variants of PIP-Net designed for different camera mounts and setups. Leveraging both kinematic data and spatial features from the driving scene, the proposed model employs a recurrent and temporal attention-based solution, outperforming state-of-the-art performance. To enhance the visual representation of road users and their proximity to the ego vehicle, we introduce a categorical depth feature map, combined with a local motion flow feature, providing rich insights into the scene dynamics. Additionally, we explore the impact of expanding the camera's field of view, from one to three cameras surrounding the ego vehicle, leading to enhancement in the model's contextual perception. Depending on the traffic scenario and road environment, the model excels in predicting pedestrian crossing intentions up to 4 seconds in advance which is a breakthrough in current research studies in pedestrian intention prediction. Finally, for the first time, we present the Urban-PIP dataset, a customised pedestrian intention prediction dataset, with multi-camera annotations in real-world automated driving scenarios. △ Less

Submitted 1 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

arXiv:2401.11284 [pdf, other]

Evaluating Driver Readiness in Conditionally Automated Vehicles from Eye-Tracking Data and Head Pose

Authors: Mostafa Kazemi, Mahdi Rezaei, Mohsen Azarmi

Abstract: As automated driving technology advances, the role of the driver to resume control of the vehicle in conditionally automated vehicles becomes increasingly critical. In the SAE Level 3 or partly automated vehicles, the driver needs to be available and ready to intervene when necessary. This makes it essential to evaluate their readiness accurately. This article presents a comprehensive analysis of… ▽ More As automated driving technology advances, the role of the driver to resume control of the vehicle in conditionally automated vehicles becomes increasingly critical. In the SAE Level 3 or partly automated vehicles, the driver needs to be available and ready to intervene when necessary. This makes it essential to evaluate their readiness accurately. This article presents a comprehensive analysis of driver readiness assessment by combining head pose features and eye-tracking data. The study explores the effectiveness of predictive models in evaluating driver readiness, addressing the challenges of dataset limitations and limited ground truth labels. Machine learning techniques, including LSTM architectures, are utilised to model driver readiness based on the Spatio-temporal status of the driver's head pose and eye gaze. The experiments in this article revealed that a Bidirectional LSTM architecture, combining both feature sets, achieves a mean absolute error of 0.363 on the DMD dataset, demonstrating superior performance in assessing driver readiness. The modular architecture of the proposed model also allows the integration of additional driver-specific features, such as steering wheel activity, enhancing its adaptability and real-world applicability. △ Less

Submitted 20 January, 2024; originally announced January 2024.

arXiv:2312.04427 [pdf, other]

Spheroidal Molecular Communication via Diffusion: Signaling Between Homogeneous Cell Aggregates

Authors: Mitra Rezaei, Hamidreza Arjmandi, Mohammad Zoofaghari, Kajsa Kanebratt, Liisa Vilen, David Janzen, Peter Gennemark, Adam Noel

Abstract: Recent molecular communication (MC) research has integrated more detailed computational models to capture the dynamics of practical biophysical systems. This research focuses on developing realistic models for MC transceivers inspired by spheroids - three-dimensional cell aggregates commonly used in organ-on-chip experimental systems. Potential applications that can be used or modeled with spheroi… ▽ More Recent molecular communication (MC) research has integrated more detailed computational models to capture the dynamics of practical biophysical systems. This research focuses on developing realistic models for MC transceivers inspired by spheroids - three-dimensional cell aggregates commonly used in organ-on-chip experimental systems. Potential applications that can be used or modeled with spheroids include nutrient transport in an organ-on-chip system, the release of biomarkers or reception of drug molecules by a cancerous tumor site, or transceiver nanomachines participating in information exchange. In this paper, a simple diffusive MC system is considered where a spheroidal transmitter and receiver are in an unbounded fluid environment. These spheroidal antennas are modeled as porous media for diffusive signaling molecules, then their boundary conditions and effective diffusion coefficients are characterized. Further, for either a point source or spheroidal transmitter, Green's function for concentration (GFC) outside and inside the receiving spheroid is analytically derived and formulated in terms of an infinite series and confirmed by a particle-based simulator (PBS). The provided GFCs enable computation of the transmitted and received signals in the spheroidal communication system. This study shows that the porous structure of the receiving spheroid amplifies diffusion signals but also disperses them, thus there is a trade-off between porosity and information transmission rate. Also, the results reveal that the porous arrangement of the transmitting spheroid not only disperses the received signal but also attenuates it. System performance is also evaluated in terms of bit error rate (BER). Decreasing the porosity of the receiving spheroid is shown to enhance system performance. Conversely, reducing the porosity of the transmitting spheroid can adversely affect system performance. △ Less

Submitted 9 February, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: 14 pages; 10 figures; accepted to appear in IEEE Transactions on Molecular, Biological, and Multi-Scale Communications. This work was presented in part at the 2023 IEEE International Conference on Communication arXiv:2302.09265

arXiv:2311.18645 [pdf, other]

Stochastic Vision Transformers with Wasserstein Distance-Aware Attention

Authors: Franciskus Xaverius Erick, Mina Rezaei, Johanna Paula Müller, Bernhard Kainz

Abstract: Self-supervised learning is one of the most promising approaches to acquiring knowledge from limited labeled data. Despite the substantial advancements made in recent years, self-supervised models have posed a challenge to practitioners, as they do not readily provide insight into the model's confidence and uncertainty. Tackling this issue is no simple feat, primarily due to the complexity involve… ▽ More Self-supervised learning is one of the most promising approaches to acquiring knowledge from limited labeled data. Despite the substantial advancements made in recent years, self-supervised models have posed a challenge to practitioners, as they do not readily provide insight into the model's confidence and uncertainty. Tackling this issue is no simple feat, primarily due to the complexity involved in implementing techniques that can make use of the latent representations learned during pre-training without relying on explicit labels. Motivated by this, we introduce a new stochastic vision transformer that integrates uncertainty and distance awareness into self-supervised learning (SSL) pipelines. Instead of the conventional deterministic vector embedding, our novel stochastic vision transformer encodes image patches into elliptical Gaussian distributional embeddings. Notably, the attention matrices of these stochastic representational embeddings are computed using Wasserstein distance-based attention, effectively capitalizing on the distributional nature of these embeddings. Additionally, we propose a regularization term based on Wasserstein distance for both pre-training and fine-tuning processes, thereby incorporating distance awareness into latent representations. We perform extensive experiments across different tasks such as in-distribution generalization, out-of-distribution detection, dataset corruption, semi-supervised settings, and transfer learning to other datasets and tasks. Our proposed method achieves superior accuracy and calibration, surpassing the self-supervised baseline in a wide range of experiments on a variety of datasets. △ Less

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2310.13290 [pdf, other]

Interpreting Indirect Answers to Yes-No Questions in Multiple Languages

Authors: Zijie Wang, Md Mosharaf Hossain, Shivam Mathur, Terry Cruz Melo, Kadir Bulut Ozler, Keun Hee Park, Jacob Quintero, MohammadHossein Rezaei, Shreya Nupur Shakya, Md Nayem Uddin, Eduardo Blanco

Abstract: Yes-no questions expect a yes or no for an answer, but people often skip polar keywords. Instead, they answer with long explanations that must be interpreted. In this paper, we focus on this challenging problem and release new benchmarks in eight languages. We present a distant supervision approach to collect training data. We also demonstrate that direct answers (i.e., with polar keywords) are us… ▽ More Yes-no questions expect a yes or no for an answer, but people often skip polar keywords. Instead, they answer with long explanations that must be interpreted. In this paper, we focus on this challenging problem and release new benchmarks in eight languages. We present a distant supervision approach to collect training data. We also demonstrate that direct answers (i.e., with polar keywords) are useful to train models to interpret indirect answers (i.e., without polar keywords). Experimental results demonstrate that monolingual fine-tuning is beneficial if training data can be obtained via distant supervision for the language of interest (5 languages). Additionally, we show that cross-lingual fine-tuning is always beneficial (8 languages). △ Less

Submitted 20 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP 2023 Findings

arXiv:2310.06514 [pdf, other]

AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments

Authors: Yang Zhang, Yawei Li, Hannah Brown, Mina Rezaei, Bernd Bischl, Philip Torr, Ashkan Khakzar, Kenji Kawaguchi

Abstract: Feature attribution explains neural network outputs by identifying relevant input features. The attribution has to be faithful, meaning that the attributed features must mirror the input features that influence the output. One recent trend to test faithfulness is to fit a model on designed data with known relevant features and then compare attributions with ground truth input features.This idea as… ▽ More Feature attribution explains neural network outputs by identifying relevant input features. The attribution has to be faithful, meaning that the attributed features must mirror the input features that influence the output. One recent trend to test faithfulness is to fit a model on designed data with known relevant features and then compare attributions with ground truth input features.This idea assumes that the model learns to use all and only these designed features, for which there is no guarantee. In this paper, we solve this issue by designing the network and manually setting its weights, along with designing data. The setup, AttributionLab, serves as a sanity check for faithfulness: If an attribution method is not faithful in a controlled environment, it can be unreliable in the wild. The environment is also a laboratory for controlled experiments by which we can analyze attribution methods and suggest improvements. △ Less

Submitted 14 February, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: Appear at NeurIPS 2023 Workshop XAIA

arXiv:2310.03174 [pdf, other]

Test Case Recommendations with Distributed Representation of Code Syntactic Features

Authors: Mosab Rezaei, Hamed Alhoori, Mona Rahimi

Abstract: Frequent modifications of unit test cases are inevitable due to software's continuous underlying changes in source code, design, and requirements. Since manually maintaining software test suites is tedious, timely, and costly, automating the process of generation and maintenance of test units will significantly impact the effectiveness and efficiency of software testing processes. To this end, w… ▽ More Frequent modifications of unit test cases are inevitable due to software's continuous underlying changes in source code, design, and requirements. Since manually maintaining software test suites is tedious, timely, and costly, automating the process of generation and maintenance of test units will significantly impact the effectiveness and efficiency of software testing processes. To this end, we propose an automated approach which exploits both structural and semantic properties of source code methods and test cases to recommend the most relevant and useful unit tests to the developers. The proposed approach initially trains a neural network to transform method-level source code, as well as unit tests, into distributed representations (embedded vectors) while preserving the importance of the structure in the code. Retrieving the semantic and structural properties of a given method, the approach computes cosine similarity between the method's embedding and the previously-embedded training instances. Further, according to the similarity scores between the embedding vectors, the model identifies the closest methods of embedding and the associated unit tests as the most similar recommendations. The results on the Methods2Test dataset showed that, while there is no guarantee to have similar relevant test cases for the group of similar methods, the proposed approach extracts the most similar existing test cases for a given method in the dataset, and evaluations show that recommended test cases decrease the developers' effort to generating expected test cases. △ Less

Submitted 4 October, 2023; originally announced October 2023.

Comments: 8 pages, 4 figures, 14th Workshop on Automating Test Case Design, Selection and Evaluation (A-TEST 2023) co-located with 38th IEEE/ACM International Conference on ASE 2023 conference

arXiv:2309.02048 [pdf, other]

Probabilistic Self-supervised Learning via Scoring Rules Minimization

Authors: Amirhossein Vahidi, Simon Schoßer, Lisa Wimmer, Yawei Li, Bernd Bischl, Eyke Hüllermeier, Mina Rezaei

Abstract: In this paper, we propose a novel probabilistic self-supervised learning via Scoring Rule Minimization (ProSMIN), which leverages the power of probabilistic models to enhance representation quality and mitigate collapsing representations. Our proposed approach involves two neural networks; the online network and the target network, which collaborate and learn the diverse distribution of representa… ▽ More In this paper, we propose a novel probabilistic self-supervised learning via Scoring Rule Minimization (ProSMIN), which leverages the power of probabilistic models to enhance representation quality and mitigate collapsing representations. Our proposed approach involves two neural networks; the online network and the target network, which collaborate and learn the diverse distribution of representations from each other through knowledge distillation. By presenting the input samples in two augmented formats, the online network is trained to predict the target network representation of the same sample under a different augmented view. The two networks are trained via our new loss function based on proper scoring rules. We provide a theoretical justification for ProSMIN's convergence, demonstrating the strict propriety of its modified scoring rule. This insight validates the method's optimization process and contributes to its robustness and effectiveness in improving representation quality. We evaluate our probabilistic model on various downstream tasks, such as in-distribution generalization, out-of-distribution detection, dataset corruption, low-shot learning, and transfer learning. Our method achieves superior accuracy and calibration, surpassing the self-supervised baseline in a wide range of experiments on large-scale datasets like ImageNet-O and ImageNet-C, ProSMIN demonstrates its scalability and real-world applicability. △ Less

Submitted 5 September, 2023; originally announced September 2023.

arXiv:2308.14705 [pdf, other]

Diversified Ensemble of Independent Sub-Networks for Robust Self-Supervised Representation Learning

Authors: Amirhossein Vahidi, Lisa Wimmer, Hüseyin Anil Gündüz, Bernd Bischl, Eyke Hüllermeier, Mina Rezaei

Abstract: Ensembling a neural network is a widely recognized approach to enhance model performance, estimate uncertainty, and improve robustness in deep supervised learning. However, deep ensembles often come with high computational costs and memory demands. In addition, the efficiency of a deep ensemble is related to diversity among the ensemble members which is challenging for large, over-parameterized de… ▽ More Ensembling a neural network is a widely recognized approach to enhance model performance, estimate uncertainty, and improve robustness in deep supervised learning. However, deep ensembles often come with high computational costs and memory demands. In addition, the efficiency of a deep ensemble is related to diversity among the ensemble members which is challenging for large, over-parameterized deep neural networks. Moreover, ensemble learning has not yet seen such widespread adoption, and it remains a challenging endeavor for self-supervised or unsupervised representation learning. Motivated by these challenges, we present a novel self-supervised training regime that leverages an ensemble of independent sub-networks, complemented by a new loss function designed to encourage diversity. Our method efficiently builds a sub-model ensemble with high diversity, leading to well-calibrated estimates of model uncertainty, all achieved with minimal computational overhead compared to traditional deep self-supervised ensembles. To evaluate the effectiveness of our approach, we conducted extensive experiments across various tasks, including in-distribution generalization, out-of-distribution detection, dataset corruption, and semi-supervised settings. The results demonstrate that our method significantly improves prediction reliability. Our approach not only achieves excellent accuracy but also enhances calibration, surpassing baseline performance across a wide range of self-supervised architectures in computer vision, natural language processing, and genomics data. △ Less

Submitted 1 September, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

arXiv:2308.10033 [pdf]

CRC-ICM: Colorectal Cancer Immune Cell Markers Pattern Dataset

Authors: Zahra Mokhtari, Elham Amjadi, Hamidreza Bolhasani, Zahra Faghih, AmirReza Dehghanian, Marzieh Rezaei

Abstract: Colorectal Cancer (CRC) is the second most common cause of cancer death in the world, ad can be identified by the location of the primary tumor in the large intestine: right and left colon, and rectum. Based on the location, CRC shows differences in chromosomal and molecular characteristics, microbiomes incidence, pathogenesis, and outcome. It has been shown that tumors on left and right sides als… ▽ More Colorectal Cancer (CRC) is the second most common cause of cancer death in the world, ad can be identified by the location of the primary tumor in the large intestine: right and left colon, and rectum. Based on the location, CRC shows differences in chromosomal and molecular characteristics, microbiomes incidence, pathogenesis, and outcome. It has been shown that tumors on left and right sides also have different immune landscape, so the prognosis may be different based on the primary tumor locations. It is widely accepted that immune components of the tumor microenvironment (TME) plays a critical role in tumor development. One of the critical regulatory molecules in the TME is immune checkpoints that as the gatekeepers of immune responses regulate the infiltrated immune cell functions. Inhibitory immune checkpoints such as PD-1, Tim3, and LAG3, as the main mechanism of immune suppression in TME overexpressed and result in further development of the tumor. The images of this dataset have been taken from colon tissues of patients with CRC, stained with specific antibodies for CD3, CD8, CD45RO, PD-1, LAG3 and Tim3. The name of this dataset is CRC-ICM and contains 1756 images related to 136 patients. The initial version of CRC-ICM is published on Elsevier Mendeley dataset portal, and the latest version is accessible via: https://databiox.com △ Less

Submitted 19 August, 2023; originally announced August 2023.

arXiv:2308.08949 [pdf, other]

A Dual-Perspective Approach to Evaluating Feature Attribution Methods

Authors: Yawei Li, Yang Zhang, Kenji Kawaguchi, Ashkan Khakzar, Bernd Bischl, Mina Rezaei

Abstract: Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through which we can evaluate attributions. One principal lens is to observe the effect of perturbing attributed features on the model's behavior (i.e., faithfulness). Wh… ▽ More Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through which we can evaluate attributions. One principal lens is to observe the effect of perturbing attributed features on the model's behavior (i.e., faithfulness). While providing useful insights, existing faithfulness evaluations suffer from shortcomings that we reveal in this paper. In this work, we propose two new perspectives within the faithfulness paradigm that reveal intuitive properties: soundness and completeness. Soundness assesses the degree to which attributed features are truly predictive features, while completeness examines how well the resulting attribution reveals all the predictive features. The two perspectives are based on a firm mathematical foundation and provide quantitative metrics that are computable through efficient algorithms. We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods. △ Less

Submitted 17 August, 2023; originally announced August 2023.

Comments: 16 pages, 14 figures

arXiv:2306.07077

Latent Dynamical Implicit Diffusion Processes

Authors: Mohammad R. Rezaei

Abstract: Latent dynamical models are commonly used to learn the distribution of a latent dynamical process that represents a sequence of noisy data samples. However, producing samples from such models with high fidelity is challenging due to the complexity and variability of latent and observation dynamics. Recent advances in diffusion-based generative models, such as DDPM and NCSN, have shown promising al… ▽ More Latent dynamical models are commonly used to learn the distribution of a latent dynamical process that represents a sequence of noisy data samples. However, producing samples from such models with high fidelity is challenging due to the complexity and variability of latent and observation dynamics. Recent advances in diffusion-based generative models, such as DDPM and NCSN, have shown promising alternatives to state-of-the-art latent generative models, such as Neural ODEs, RNNs, and Normalizing flow networks, for generating high-quality sequential samples from a prior distribution. However, their application in modeling sequential data with latent dynamical models is yet to be explored. Here, we propose a novel latent variable model named latent dynamical implicit diffusion processes (LDIDPs), which utilizes implicit diffusion processes to sample from dynamical latent processes and generate sequential observation samples accordingly. We tested LDIDPs on synthetic and simulated neural decoding problems. We demonstrate that LDIDPs can accurately learn the dynamics over latent dimensions. Furthermore, the implicit sampling method allows for the computationally efficient generation of high-quality sequential data samples from the latent and observation spaces. △ Less

Submitted 16 August, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

Comments: I request a withdrawal because there are no experiments with real-world datasets and also the method section requires major changes to look mathematically sounds

arXiv:2305.16031 [pdf, other]

Efficient Document Embeddings via Self-Contrastive Bregman Divergence Learning

Authors: Daniel Saggau, Mina Rezaei, Bernd Bischl, Ilias Chalkidis

Abstract: Learning quality document embeddings is a fundamental problem in natural language processing (NLP), information retrieval (IR), recommendation systems, and search engines. Despite recent advances in the development of transformer-based models that produce sentence embeddings with self-contrastive learning, the encoding of long documents (Ks of words) is still challenging with respect to both effic… ▽ More Learning quality document embeddings is a fundamental problem in natural language processing (NLP), information retrieval (IR), recommendation systems, and search engines. Despite recent advances in the development of transformer-based models that produce sentence embeddings with self-contrastive learning, the encoding of long documents (Ks of words) is still challenging with respect to both efficiency and quality considerations. Therefore, we train Longfomer-based document encoders using a state-of-the-art unsupervised contrastive learning method (SimCSE). Further on, we complement the baseline method -- siamese neural network -- with additional convex neural networks based on functional Bregman divergence aiming to enhance the quality of the output document representations. We show that overall the combination of a self-contrastive siamese network and our proposed neural Bregman network outperforms the baselines in two linear classification settings on three long document topic classification tasks from the legal and biomedical domains. △ Less

Submitted 26 March, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: 5 pages, short paper at Findings of ACL 2023

arXiv:2305.15421 [pdf]

Generative Adversarial Networks for Brain Images Synthesis: A Review

Authors: Firoozeh Shomal Zadeh, Sevda Molani, Maysam Orouskhani, Marziyeh Rezaei, Mehrzad Shafiei, Hossein Abbasi

Abstract: In medical imaging, image synthesis is the estimation process of one image (sequence, modality) from another image (sequence, modality). Since images with different modalities provide diverse biomarkers and capture various features, multi-modality imaging is crucial in medicine. While multi-screening is expensive, costly, and time-consuming to report by radiologists, image synthesis methods are ca… ▽ More In medical imaging, image synthesis is the estimation process of one image (sequence, modality) from another image (sequence, modality). Since images with different modalities provide diverse biomarkers and capture various features, multi-modality imaging is crucial in medicine. While multi-screening is expensive, costly, and time-consuming to report by radiologists, image synthesis methods are capable of artificially generating missing modalities. Deep learning models can automatically capture and extract the high dimensional features. Especially, generative adversarial network (GAN) as one of the most popular generative-based deep learning methods, uses convolutional networks as generators, and estimated images are discriminated as true or false based on a discriminator network. This review provides brain image synthesis via GANs. We summarized the recent developments of GANs for cross-modality brain image synthesis including CT to PET, CT to MRI, MRI to PET, and vice versa. △ Less

Submitted 16 May, 2023; originally announced May 2023.

Comments: 9 pages, 3 tabels, 4 figures

MSC Class: 68T07 ACM Class: I.2.m

arXiv:2305.10421 [pdf]

Evolving Tsukamoto Neuro Fuzzy Model for Multiclass Covid 19 Classification with Chest X Ray Images

Authors: Marziyeh Rezaei, Sevda Molani, Negar Firoozeh, Hossein Abbasi, Farzan Vahedifard, Maysam Orouskhani

Abstract: Du e to rapid population growth and the need to use artificial intelligence to make quick decisions, developing a machine learning-based disease detection model and abnormality identification system has greatly improved the level of medical diagnosis Since COVID-19 has become one of the most severe diseases in the world, developing an automatic COVID-19 detection framework helps medical doctors in… ▽ More Du e to rapid population growth and the need to use artificial intelligence to make quick decisions, developing a machine learning-based disease detection model and abnormality identification system has greatly improved the level of medical diagnosis Since COVID-19 has become one of the most severe diseases in the world, developing an automatic COVID-19 detection framework helps medical doctors in the diagnostic process of disease and provides correct and fast results. In this paper, we propose a machine lear ning based framework for the detection of Covid 19. The proposed model employs a Tsukamoto Neuro Fuzzy Inference network to identify and distinguish Covid 19 disease from normal and pneumonia cases. While the traditional training methods tune the parameters of the neuro-fuzzy model by gradient-based algorithms and recursive least square method, we use an evolutionary-based optimization, the Cat swarm algorithm to update the parameters. In addition, six texture features extracted from chest X-ray images are give n as input to the model. Finally, the proposed model is conducted on the chest X-ray dataset to detect Covid 19. The simulation results indicate that the proposed model achieves an accuracy of 98.51%, sensitivity of 98.35%, specificity of 98.08%, and F1 score of 98.17%. △ Less

Submitted 17 May, 2023; originally announced May 2023.

Comments: 14 pages, 5 figures, 3 tables

MSC Class: 68W50 ACM Class: I.5.0

arXiv:2305.01111 [pdf, other]

Local and Global Contextual Features Fusion for Pedestrian Intention Prediction

Authors: Mohsen Azarmi, Mahdi Rezaei, Tanveer Hussain, Chenghao Qian

Abstract: Autonomous vehicles (AVs) are becoming an indispensable part of future transportation. However, safety challenges and lack of reliability limit their real-world deployment. Towards boosting the appearance of AVs on the roads, the interaction of AVs with pedestrians including "prediction of the pedestrian crossing intention" deserves extensive research. This is a highly challenging task as involves… ▽ More Autonomous vehicles (AVs) are becoming an indispensable part of future transportation. However, safety challenges and lack of reliability limit their real-world deployment. Towards boosting the appearance of AVs on the roads, the interaction of AVs with pedestrians including "prediction of the pedestrian crossing intention" deserves extensive research. This is a highly challenging task as involves multiple non-linear parameters. In this direction, we extract and analyse spatio-temporal visual features of both pedestrian and traffic contexts. The pedestrian features include body pose and local context features that represent the pedestrian's behaviour. Additionally, to understand the global context, we utilise location, motion, and environmental information using scene parsing technology that represents the pedestrian's surroundings, and may affect the pedestrian's intention. Finally, these multi-modality features are intelligently fused for effective intention prediction learning. The experimental results of the proposed model on the JAAD dataset show a superior result on the combined AUC and F1-score compared to the state-of-the-art. △ Less

Submitted 1 May, 2023; originally announced May 2023.

arXiv:2305.01096 [pdf]

A Novel Model for Driver Lane Change Prediction in Cooperative Adaptive Cruise Control Systems

Authors: Armin Nejadhossein Qasemabadi, Saeed Mozaffari, Mahdi Rezaei, Majid Ahmadi, Shahpour Alirezaee

Abstract: Accurate lane change prediction can reduce potential accidents and contribute to higher road safety. Adaptive cruise control (ACC), lane departure avoidance (LDA), and lane keeping assistance (LKA) are some conventional modules in advanced driver assistance systems (ADAS). Thanks to vehicle-to-vehicle communication (V2V), vehicles can share traffic information with surrounding vehicles, enabling c… ▽ More Accurate lane change prediction can reduce potential accidents and contribute to higher road safety. Adaptive cruise control (ACC), lane departure avoidance (LDA), and lane keeping assistance (LKA) are some conventional modules in advanced driver assistance systems (ADAS). Thanks to vehicle-to-vehicle communication (V2V), vehicles can share traffic information with surrounding vehicles, enabling cooperative adaptive cruise control (CACC). While ACC relies on the vehicle's sensors to obtain the position and velocity of the leading vehicle, CACC also has access to the acceleration of multiple vehicles through V2V communication. This paper compares the type of information (position, velocity, acceleration) and the number of surrounding vehicles for driver lane change prediction. We trained an LSTM (Long Short-Term Memory) on the HighD dataset to predict lane change intention. Results indicate a significant improvement in accuracy with an increase in the number of surrounding vehicles and the information received from them. Specifically, the proposed model can predict the ego vehicle lane change with 59.15% and 92.43% accuracy in ACC and CACC scenarios, respectively. △ Less

Submitted 1 May, 2023; originally announced May 2023.

arXiv:2305.01095 [pdf]

LSTM-based Preceding Vehicle Behaviour Prediction during Aggressive Lane Change for ACC Application

Authors: Rajmeet Singh, Saeed Mozaffari, Mahdi Rezaei, Shahpour Alirezaee

Abstract: The development of Adaptive Cruise Control (ACC) systems aims to enhance the safety and comfort of vehicles by automatically regulating the speed of the vehicle to ensure a safe gap from the preceding vehicle. However, conventional ACC systems are unable to adapt themselves to changing driving conditions and drivers' behavior. To address this limitation, we propose a Long Short-Term Memory (LSTM)… ▽ More The development of Adaptive Cruise Control (ACC) systems aims to enhance the safety and comfort of vehicles by automatically regulating the speed of the vehicle to ensure a safe gap from the preceding vehicle. However, conventional ACC systems are unable to adapt themselves to changing driving conditions and drivers' behavior. To address this limitation, we propose a Long Short-Term Memory (LSTM) based ACC system that can learn from past driving experiences and adapt and predict new situations in real time. The model is constructed based on the real-world highD dataset, acquired from German highways with the assistance of camera-equipped drones. We evaluated the ACC system under aggressive lane changes when the side lane preceding vehicle cut off, forcing the targeted driver to reduce speed. To this end, the proposed system was assessed on a simulated driving environment and compared with a feedforward Artificial Neural Network (ANN) model and Model Predictive Control (MPC) model. The results show that the LSTM-based system is 19.25% more accurate than the ANN model and 5.9% more accurate than the MPC model in terms of predicting future values of subject vehicle acceleration. The simulation is done in Matlab/Simulink environment. △ Less

Submitted 5 May, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

arXiv:2304.08500 [pdf]

A comparison between Recurrent Neural Networks and classical machine learning approaches In Laser induced breakdown spectroscopy

Authors: Fatemeh Rezaei, Pouriya Khaliliyan, Mohsen Rezaei, Parvin Karimi, Behnam Ashrafkhani

Abstract: Recurrent Neural Networks are classes of Artificial Neural Networks that establish connections between different nodes form a directed or undirected graph for temporal dynamical analysis. In this research, the laser induced breakdown spectroscopy (LIBS) technique is used for quantitative analysis of aluminum alloys by different Recurrent Neural Network (RNN) architecture. The fundamental harmonic… ▽ More Recurrent Neural Networks are classes of Artificial Neural Networks that establish connections between different nodes form a directed or undirected graph for temporal dynamical analysis. In this research, the laser induced breakdown spectroscopy (LIBS) technique is used for quantitative analysis of aluminum alloys by different Recurrent Neural Network (RNN) architecture. The fundamental harmonic (1064 nm) of a nanosecond Nd:YAG laser pulse is employed to generate the LIBS plasma for the prediction of constituent concentrations of the aluminum standard samples. Here, Recurrent Neural Networks based on different networks, such as Long Short Term Memory (LSTM), Gated Recurrent Unit (GRU), Simple Recurrent Neural Network (Simple RNN), and as well as Recurrent Convolutional Networks comprising of Conv-SimpleRNN, Conv-LSTM and Conv-GRU are utilized for concentration prediction. Then a comparison is performed among prediction by classical machine learning methods of support vector regressor (SVR), the Multi Layer Perceptron (MLP), Decision Tree algorithm, Gradient Boosting Regression (GBR), Random Forest Regression (RFR), Linear Regression, and k-Nearest Neighbor (KNN) algorithm. Results showed that the machine learning tools based on Convolutional Recurrent Networks had the best efficiencies in prediction of the most of the elements among other multivariate methods. △ Less

Submitted 16 April, 2023; originally announced April 2023.

arXiv:2303.15147 [pdf, other]

Pushing the Envelope for Depth-Based Semi-Supervised 3D Hand Pose Estimation with Consistency Training

Authors: Mohammad Rezaei, Farnaz Farahanipad, Alex Dillhoff, Vassilis Athitsos

Abstract: Despite the significant progress that depth-based 3D hand pose estimation methods have made in recent years, they still require a large amount of labeled training data to achieve high accuracy. However, collecting such data is both costly and time-consuming. To tackle this issue, we propose a semi-supervised method to significantly reduce the dependence on labeled training data. The proposed metho… ▽ More Despite the significant progress that depth-based 3D hand pose estimation methods have made in recent years, they still require a large amount of labeled training data to achieve high accuracy. However, collecting such data is both costly and time-consuming. To tackle this issue, we propose a semi-supervised method to significantly reduce the dependence on labeled training data. The proposed method consists of two identical networks trained jointly: a teacher network and a student network. The teacher network is trained using both the available labeled and unlabeled samples. It leverages the unlabeled samples via a loss formulation that encourages estimation equivariance under a set of affine transformations. The student network is trained using the unlabeled samples with their pseudo-labels provided by the teacher network. For inference at test time, only the student network is used. Extensive experiments demonstrate that the proposed method outperforms the state-of-the-art semi-supervised methods by large margins. △ Less

Submitted 27 March, 2023; originally announced March 2023.

arXiv:2303.07541 [pdf, ps, other]

Young Humans Make Change, Young Users Click: Creating Youth-Centered Networked Social Movements

Authors: Mina Rezaei, Patsy Eubanks Owens

Abstract: From the urbanists' perspective, the everyday experience of young people, as an underrepresented group in the design of public spaces, includes tactics they use to challenge the strategies which rule over urban spaces. In this regard, youth led social movements are a set of collective tactics which groups of young people use to resist power structures. Social informational streams have revolutioni… ▽ More From the urbanists' perspective, the everyday experience of young people, as an underrepresented group in the design of public spaces, includes tactics they use to challenge the strategies which rule over urban spaces. In this regard, youth led social movements are a set of collective tactics which groups of young people use to resist power structures. Social informational streams have revolutionized the way youth organize and mobilize for social movements throughout the world, especially in urban areas. However, just like public spaces, these algorithm based platforms have been developed with a great power imbalance between the developers and users which results in the creation of non inclusive social informational streams for young activists. Social activism grows agency and confidence in youth which is critical to their development. This paper employs a youth centric lens, which is used in designing public spaces, for designing algorithmic spaces that can improve bottom up youth led movements. By reviewing the structure of these spaces and how young people interact with these structures in the different cultural contexts of Iran and the US, we propose a humanistic approach to designing social informational streams which can enhance youth activism. △ Less

Submitted 23 March, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

Journal ref: CHI 2023 Workshop titled "Supporting Social Movements Through HCI and Design"

arXiv:2302.09265 [pdf, other]

Diffusive Molecular Communication with a Spheroidal Receiver for Organ-on-Chip Systems

Authors: Hamidreza Arjmandi, Mohamad Zoofaghari, Mitra Rezaei, Kajsa Kanebratt, Liisa Vilen, David Janzen, Peter Gennemark, Adam Noel

Abstract: Realistic models of the components and processes are required for molecular communication (MC) systems. In this paper, a spheroidal receiver structure is proposed for MC that is inspired by the 3D cell cultures known as spheroids being widely used in organ-on-chip systems. A simple diffusive MC system is considered where the spheroidal receiver and a point source transmitter are in an unbounded fl… ▽ More Realistic models of the components and processes are required for molecular communication (MC) systems. In this paper, a spheroidal receiver structure is proposed for MC that is inspired by the 3D cell cultures known as spheroids being widely used in organ-on-chip systems. A simple diffusive MC system is considered where the spheroidal receiver and a point source transmitter are in an unbounded fluid environment. The spheroidal receiver is modeled as a porous medium for diffusive signaling molecules, then its boundary conditions and effective diffusion coefficient are characterized. It is revealed that the spheroid amplifies the diffusion signal, but also disperses the signal which reduces the information communication rate. Furthermore, we analytically formulate and derive the concentration Green's function inside and outside the spheroid in terms of infinite series-forms that are confirmed by a particle-based simulator (PBS). △ Less

Submitted 18 February, 2023; originally announced February 2023.

arXiv:2212.08568 [pdf, other]

Biomedical image analysis competitions: The state of current participation practice

Authors: Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali, Anubha Gupta, Jan Kybic, Alison Noble, Carlos Ortiz de Solórzano, Samiksha Pachade, Caroline Petitjean, Daniel Sage, Donglai Wei, Elizabeth Wilden, Deepak Alapatt, Vincent Andrearczyk, Ujjwal Baid, Spyridon Bakas, Niranjan Balu, Sophia Bano , et al. (331 additional authors not shown)

Abstract: The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis,… ▽ More The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps. △ Less

Submitted 12 September, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

arXiv:2212.07560 [pdf, other]

Multi-level and multi-modal feature fusion for accurate 3D object detection in Connected and Automated Vehicles

Authors: Yiming Hou, Mahdi Rezaei, Richard Romano

Abstract: Aiming at highly accurate object detection for connected and automated vehicles (CAVs), this paper presents a Deep Neural Network based 3D object detection model that leverages a three-stage feature extractor by developing a novel LIDAR-Camera fusion scheme. The proposed feature extractor extracts high-level features from two input sensory modalities and recovers the important features discarded d… ▽ More Aiming at highly accurate object detection for connected and automated vehicles (CAVs), this paper presents a Deep Neural Network based 3D object detection model that leverages a three-stage feature extractor by developing a novel LIDAR-Camera fusion scheme. The proposed feature extractor extracts high-level features from two input sensory modalities and recovers the important features discarded during the convolutional process. The novel fusion scheme effectively fuses features across sensory modalities and convolutional layers to find the best representative global features. The fused features are shared by a two-stage network: the region proposal network (RPN) and the detection head (DH). The RPN generates high-recall proposals, and the DH produces final detection results. The experimental results show the proposed model outperforms more recent research on the KITTI 2D and 3D detection benchmark, particularly for distant and highly occluded instances. △ Less

Submitted 19 December, 2022; v1 submitted 14 December, 2022; originally announced December 2022.

arXiv:2211.09979 [pdf]

Comparison between EM and FCM algorithms in skin tone extraction

Authors: Elham Ravanbakhsh, Mosab Rezaei, Ehsan Namjoo, Padideh Choobdar

Abstract: This study aims to investigate implementing EM and FCM algorithms for skin color extraction. The capabilities of three well-known color spaces, namely, RGB, HSV, and YCbCr for skin-tone extraction are assessed by using statistical modeling of skin tones using EM and FCM algorithms. The results show that utilizing a Gaussian mixture model for parametric modeling of skin tones using EM algorithm wor… ▽ More This study aims to investigate implementing EM and FCM algorithms for skin color extraction. The capabilities of three well-known color spaces, namely, RGB, HSV, and YCbCr for skin-tone extraction are assessed by using statistical modeling of skin tones using EM and FCM algorithms. The results show that utilizing a Gaussian mixture model for parametric modeling of skin tones using EM algorithm works well in HSV color space when all three components of the color vector are used. In spite of discarding the luminance components in YCbCr and HSV color spaces, EM algorithm provides the best results. The results of the detailed comparisons are explained in the conclusion. △ Less

Submitted 17 November, 2022; originally announced November 2022.

Comments: 2016 1st International Conference on New Research Achievements in Electrical and Computer Engineering (ICNRAECE)

arXiv:2210.15674 [pdf, other]

Reverse Survival Model (RSM): A Pipeline for Explaining Predictions of Deep Survival Models

Authors: Mohammad R. Rezaei, Reza Saadati Fard, Ebrahim Pourjafari, Navid Ziaei, Amir Sameizadeh, Mohammad Shafiee, Mohammad Alavinia, Mansour Abolghasemian, Nick Sajadi

Abstract: The aim of survival analysis in healthcare is to estimate the probability of occurrence of an event, such as a patient's death in an intensive care unit (ICU). Recent developments in deep neural networks (DNNs) for survival analysis show the superiority of these models in comparison with other well-known models in survival analysis applications. Ensuring the reliability and explainability of deep… ▽ More The aim of survival analysis in healthcare is to estimate the probability of occurrence of an event, such as a patient's death in an intensive care unit (ICU). Recent developments in deep neural networks (DNNs) for survival analysis show the superiority of these models in comparison with other well-known models in survival analysis applications. Ensuring the reliability and explainability of deep survival models deployed in healthcare is a necessity. Since DNN models often behave like a black box, their predictions might not be easily trusted by clinicians, especially when predictions are contrary to a physician's opinion. A deep survival model that explains and justifies its decision-making process could potentially gain the trust of clinicians. In this research, we propose the reverse survival model (RSM) framework that provides detailed insights into the decision-making process of survival models. For each patient of interest, RSM can extract similar patients from a dataset and rank them based on the most relevant features that deep survival models rely on for their predictions. △ Less

Submitted 26 October, 2022; originally announced October 2022.

arXiv:2209.06941 [pdf, other]

Joint Debiased Representation and Image Clustering Learning with Self-Supervision

Authors: Shunjie-Fabian Zheng, JaeEun Nam, Emilio Dorigatti, Bernd Bischl, Shekoofeh Azizi, Mina Rezaei

Abstract: Contrastive learning is among the most successful methods for visual representation learning, and its performance can be further improved by jointly performing clustering on the learned representations. However, existing methods for joint clustering and contrastive learning do not perform well on long-tailed data distributions, as majority classes overwhelm and distort the loss of minority classes… ▽ More Contrastive learning is among the most successful methods for visual representation learning, and its performance can be further improved by jointly performing clustering on the learned representations. However, existing methods for joint clustering and contrastive learning do not perform well on long-tailed data distributions, as majority classes overwhelm and distort the loss of minority classes, thus preventing meaningful representations to be learned. Motivated by this, we develop a novel joint clustering and contrastive learning framework by adapting the debiased contrastive loss to avoid under-clustering minority classes of imbalanced datasets. We show that our proposed modified debiased contrastive loss and divergence clustering loss improves the performance across multiple datasets and learning tasks. The source code is available at https://anonymous.4open.science/r/SSL-debiased-clustering △ Less

Submitted 14 September, 2022; originally announced September 2022.

arXiv:2209.05734 [pdf]

Evaluating User Experience in Literary and Film Geography-based Apps with a Cartographical User-Centered Design Lens

Authors: Mina Rezaei, Patsy Eubanks Owens, Darnel Degand

Abstract: Geography scholarship currently includes interdisciplinary approaches and theories and reflects shifts in research methodologies. Since the spatial turn in geographical thought and the emergence of geo-web technologies, geography scholarship has leaned more toward interdisciplinarity. In recent years geographical research methods have relied on various disciplines ranging from data science to arts… ▽ More Geography scholarship currently includes interdisciplinary approaches and theories and reflects shifts in research methodologies. Since the spatial turn in geographical thought and the emergence of geo-web technologies, geography scholarship has leaned more toward interdisciplinarity. In recent years geographical research methods have relied on various disciplines ranging from data science to arts and design. Literary geography and film geography are two subfields of geography that employ novels and films in exploring spatiality, respectively. In addition to geographical concepts, these courses include many aspects of relations in space, including human-human relations, human-environment relations, et cetera, which were barely addressed in traditional geography courses. However, a review of the employment of geoweb technologies in literary and film geography practices reveals that these practices have mostly remained limited to isolating geographical passages from novels or movies. This paper explores new opportunities for designing film and literary geography-based apps using a cartographical user-centered design framework. △ Less

Submitted 13 September, 2022; originally announced September 2022.

arXiv:2209.02459 [pdf, other]

Robust and Efficient Imbalanced Positive-Unlabeled Learning with Self-supervision

Authors: Emilio Dorigatti, Jonas Schweisthal, Bernd Bischl, Mina Rezaei

Abstract: Learning from positive and unlabeled (PU) data is a setting where the learner only has access to positive and unlabeled samples while having no information on negative examples. Such PU setting is of great importance in various tasks such as medical diagnosis, social network analysis, financial markets analysis, and knowledge base completion, which also tend to be intrinsically imbalanced, i.e., w… ▽ More Learning from positive and unlabeled (PU) data is a setting where the learner only has access to positive and unlabeled samples while having no information on negative examples. Such PU setting is of great importance in various tasks such as medical diagnosis, social network analysis, financial markets analysis, and knowledge base completion, which also tend to be intrinsically imbalanced, i.e., where most examples are actually negatives. Most existing approaches for PU learning, however, only consider artificially balanced datasets and it is unclear how well they perform in the realistic scenario of imbalanced and long-tail data distribution. This paper proposes to tackle this challenge via robust and efficient self-supervised pretraining. However, training conventional self-supervised learning methods when applied with highly imbalanced PU distribution needs better reformulation. In this paper, we present \textit{ImPULSeS}, a unified representation learning framework for \underline{Im}balanced \underline{P}ositive \underline{U}nlabeled \underline{L}earning leveraging \underline{Se}lf-\underline{S}upervised debiase pre-training. ImPULSeS uses a generic combination of large-scale unsupervised learning with debiased contrastive loss and additional reweighted PU loss. We performed different experiments across multiple datasets to show that ImPULSeS is able to halve the error rate of the previous state-of-the-art, even compared with previous methods that are given the true prior. Moreover, our method showed increased robustness to prior misspecification and superior performance even when pretraining was performed on an unrelated dataset. We anticipate such robustness and efficiency will make it much easier for practitioners to obtain excellent results on other PU datasets of interest. The source code is available at \url{https://github.com/JSchweisthal/ImPULSeS} △ Less

Submitted 6 September, 2022; originally announced September 2022.

arXiv:2208.01371 [pdf]

Multi-Module G2P Converter for Persian Focusing on Relations between Words

Authors: Mahdi Rezaei, Negar Nayeri, Saeed Farzi, Hossein Sameti

Abstract: In this paper, we investigate the application of end-to-end and multi-module frameworks for G2P conversion for the Persian language. The results demonstrate that our proposed multi-module G2P system outperforms our end-to-end systems in terms of accuracy and speed. The system consists of a pronunciation dictionary as our look-up table, along with separate models to handle homographs, OOVs and ezaf… ▽ More In this paper, we investigate the application of end-to-end and multi-module frameworks for G2P conversion for the Persian language. The results demonstrate that our proposed multi-module G2P system outperforms our end-to-end systems in terms of accuracy and speed. The system consists of a pronunciation dictionary as our look-up table, along with separate models to handle homographs, OOVs and ezafe in Persian created using GRU and Transformer architectures. The system is sequence-level rather than word-level, which allows it to effectively capture the unwritten relations between words (cross-word information) necessary for homograph disambiguation and ezafe recognition without the need for any pre-processing. After evaluation, our system achieved a 94.48% word-level accuracy, outperforming the previous G2P systems for Persian. △ Less

Submitted 2 August, 2022; originally announced August 2022.

Comments: 10 pages, 4 figures

ACM Class: I.2.7

arXiv:2206.07117 [pdf, other]

TriHorn-Net: A Model for Accurate Depth-Based 3D Hand Pose Estimation

Authors: Mohammad Rezaei, Razieh Rastgoo, Vassilis Athitsos

Abstract: 3D hand pose estimation methods have made significant progress recently. However, the estimation accuracy is often far from sufficient for specific real-world applications, and thus there is significant room for improvement. This paper proposes TriHorn-Net, a novel model that uses specific innovations to improve hand pose estimation accuracy on depth images. The first innovation is the decompositi… ▽ More 3D hand pose estimation methods have made significant progress recently. However, the estimation accuracy is often far from sufficient for specific real-world applications, and thus there is significant room for improvement. This paper proposes TriHorn-Net, a novel model that uses specific innovations to improve hand pose estimation accuracy on depth images. The first innovation is the decomposition of the 3D hand pose estimation into the estimation of 2D joint locations in the depth image space (UV), and the estimation of their corresponding depths aided by two complementary attention maps. This decomposition prevents depth estimation, which is a more difficult task, from interfering with the UV estimations at both the prediction and feature levels. The second innovation is PixDropout, which is, to the best of our knowledge, the first appearance-based data augmentation method for hand depth images. Experimental results demonstrate that the proposed model outperforms the state-of-the-art methods on three public benchmark datasets. Our implementation is available at https://github.com/mrezaei92/TriHorn-Net. △ Less

Submitted 26 June, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

arXiv:2206.00050 [pdf, other]

FiLM-Ensemble: Probabilistic Deep Learning via Feature-wise Linear Modulation

Authors: Mehmet Ozgur Turkoglu, Alexander Becker, Hüseyin Anil Gündüz, Mina Rezaei, Bernd Bischl, Rodrigo Caye Daudt, Stefano D'Aronco, Jan Dirk Wegner, Konrad Schindler

Abstract: The ability to estimate epistemic uncertainty is often crucial when deploying machine learning in the real world, but modern methods often produce overconfident, uncalibrated uncertainty predictions. A common approach to quantify epistemic uncertainty, usable across a wide class of prediction models, is to train a model ensemble. In a naive implementation, the ensemble approach has high computatio… ▽ More The ability to estimate epistemic uncertainty is often crucial when deploying machine learning in the real world, but modern methods often produce overconfident, uncalibrated uncertainty predictions. A common approach to quantify epistemic uncertainty, usable across a wide class of prediction models, is to train a model ensemble. In a naive implementation, the ensemble approach has high computational cost and high memory demand. This challenges in particular modern deep learning, where even a single deep network is already demanding in terms of compute and memory, and has given rise to a number of attempts to emulate the model ensemble without actually instantiating separate ensemble members. We introduce FiLM-Ensemble, a deep, implicit ensemble method based on the concept of Feature-wise Linear Modulation (FiLM). That technique was originally developed for multi-task learning, with the aim of decoupling different tasks. We show that the idea can be extended to uncertainty quantification: by modulating the network activations of a single deep network with FiLM, one obtains a model ensemble with high diversity, and consequently well-calibrated estimates of epistemic uncertainty, with low computational overhead in comparison. Empirically, FiLM-Ensemble outperforms other implicit ensemble methods, and it and comes very close to the upper bound of an explicit ensemble of networks (sometimes even beating it), at a fraction of the memory cost. △ Less

Submitted 19 December, 2022; v1 submitted 31 May, 2022; originally announced June 2022.

Comments: accepted at NeurIPS 2022

arXiv:2205.10947 [pdf, other]

Deep Direct Discriminative Decoders for High-dimensional Time-series Data Analysis

Authors: Mohammad R. Rezaei, Milos R. Popovic, Milad Lankarany, Ali Yousefi

Abstract: The state-space models (SSMs) are widely utilized in the analysis of time-series data. SSMs rely on an explicit definition of the state and observation processes. Characterizing these processes is not always easy and becomes a modeling challenge when the dimension of observed data grows or the observed data distribution deviates from the normal distribution. Here, we propose a new formulation of S… ▽ More The state-space models (SSMs) are widely utilized in the analysis of time-series data. SSMs rely on an explicit definition of the state and observation processes. Characterizing these processes is not always easy and becomes a modeling challenge when the dimension of observed data grows or the observed data distribution deviates from the normal distribution. Here, we propose a new formulation of SSM for high-dimensional observation processes. We call this solution the deep direct discriminative decoder (D4). The D4 brings deep neural networks' expressiveness and scalability to the SSM formulation letting us build a novel solution that efficiently estimates the underlying state processes through high-dimensional observation signal. We demonstrate the D4 solutions in simulated and real data such as Lorenz attractors, Langevin dynamics, random walk dynamics, and rat hippocampus spiking neural data and show that the D4 performs better than traditional SSMs and RNNs. The D4 can be applied to a broader class of time-series data where the connection between high-dimensional observation and the underlying latent process is hard to characterize. △ Less

Submitted 3 July, 2023; v1 submitted 22 May, 2022; originally announced May 2022.

arXiv:2204.04542 [pdf, other]

Survival Seq2Seq: A Survival Model based on Sequence to Sequence Architecture

Authors: Ebrahim Pourjafari, Navid Ziaei, Mohammad R. Rezaei, Amir Sameizadeh, Mohammad Shafiee, Mohammad Alavinia, Mansour Abolghasemian, Nick Sajadi

Abstract: This paper introduces a novel non-parametric deep model for estimating time-to-event (survival analysis) in presence of censored data and competing risks. The model is designed based on the sequence-to-sequence (Seq2Seq) architecture, therefore we name it Survival Seq2Seq. The first recurrent neural network (RNN) layer of the encoder of our model is made up of Gated Recurrent Unit with Decay (GRU-… ▽ More This paper introduces a novel non-parametric deep model for estimating time-to-event (survival analysis) in presence of censored data and competing risks. The model is designed based on the sequence-to-sequence (Seq2Seq) architecture, therefore we name it Survival Seq2Seq. The first recurrent neural network (RNN) layer of the encoder of our model is made up of Gated Recurrent Unit with Decay (GRU-D) cells. These cells have the ability to effectively impute not-missing-at-random values of longitudinal datasets with very high missing rates, such as electronic health records (EHRs). The decoder of Survival Seq2Seq generates a probability distribution function (PDF) for each competing risk without assuming any prior distribution for the risks. Taking advantage of RNN cells, the decoder is able to generate smooth and virtually spike-free PDFs. This is beyond the capability of existing non-parametric deep models for survival analysis. Training results on synthetic and medical datasets prove that Survival Seq2Seq surpasses other existing deep survival models in terms of the accuracy of predictions and the quality of generated PDFs. △ Less

Submitted 9 April, 2022; originally announced April 2022.

arXiv:2204.01729 [pdf, other]

Analyzing the Effects of Handling Data Imbalance on Learned Features from Medical Images by Looking Into the Models

Authors: Ashkan Khakzar, Yawei Li, Yang Zhang, Mirac Sanisoglu, Seong Tae Kim, Mina Rezaei, Bernd Bischl, Nassir Navab

Abstract: One challenging property lurking in medical datasets is the imbalanced data distribution, where the frequency of the samples between the different classes is not balanced. Training a model on an imbalanced dataset can introduce unique challenges to the learning problem where a model is biased towards the highly frequent class. Many methods are proposed to tackle the distributional differences and… ▽ More One challenging property lurking in medical datasets is the imbalanced data distribution, where the frequency of the samples between the different classes is not balanced. Training a model on an imbalanced dataset can introduce unique challenges to the learning problem where a model is biased towards the highly frequent class. Many methods are proposed to tackle the distributional differences and the imbalanced problem. However, the impact of these approaches on the learned features is not well studied. In this paper, we look deeper into the internal units of neural networks to observe how handling data imbalance affects the learned features. We study several popular cost-sensitive approaches for handling data imbalance and analyze the feature maps of the convolutional neural networks from multiple perspectives: analyzing the alignment of salient features with pathologies and analyzing the pathology-related concepts encoded by the networks. Our study reveals differences and insights regarding the trained models that are not reflected by quantitative metrics such as AUROC and AP and show up only by looking at the models through a lens. △ Less

Submitted 4 April, 2022; originally announced April 2022.

arXiv:2201.13192 [pdf, other]

Uncertainty-aware Pseudo-label Selection for Positive-Unlabeled Learning

Authors: Emilio Dorigatti, Jann Goschenhofer, Benjamin Schubert, Mina Rezaei, Bernd Bischl

Abstract: Positive-unlabeled learning (PUL) aims at learning a binary classifier from only positive and unlabeled training data. Even though real-world applications often involve imbalanced datasets where the majority of examples belong to one class, most contemporary approaches to PUL do not investigate performance in this setting, thus severely limiting their applicability in practice. In this work, we th… ▽ More Positive-unlabeled learning (PUL) aims at learning a binary classifier from only positive and unlabeled training data. Even though real-world applications often involve imbalanced datasets where the majority of examples belong to one class, most contemporary approaches to PUL do not investigate performance in this setting, thus severely limiting their applicability in practice. In this work, we thus propose to tackle the issues of imbalanced datasets and model calibration in a PUL setting through an uncertainty-aware pseudo-labeling procedure (PUUPL): by boosting the signal from the minority class, pseudo-labeling expands the labeled dataset with new samples from the unlabeled set, while explicit uncertainty quantification prevents the emergence of harmful confirmation bias leading to increased predictive performance. Within a series of experiments, PUUPL yields substantial performance gains in highly imbalanced settings while also showing strong performance in balanced PU scenarios across recent baselines. We furthermore provide ablations and sensitivity analyses to shed light on PUUPL's several ingredients. Finally, a real-world application with an imbalanced dataset confirms the advantage of our approach. △ Less

Submitted 10 March, 2024; v1 submitted 31 January, 2022; originally announced January 2022.

Comments: 25 pages, 4 figures

arXiv:2109.10777 [pdf, other]

Deep Variational Clustering Framework for Self-labeling of Large-scale Medical Images

Authors: Farzin Soleymani, Mohammad Eslami, Tobias Elze, Bernd Bischl, Mina Rezaei

Abstract: We propose a Deep Variational Clustering (DVC) framework for unsupervised representation learning and clustering of large-scale medical images. DVC simultaneously learns the multivariate Gaussian posterior through the probabilistic convolutional encoder and the likelihood distribution with the probabilistic convolutional decoder; and optimizes cluster labels assignment. Here, the learned multivari… ▽ More We propose a Deep Variational Clustering (DVC) framework for unsupervised representation learning and clustering of large-scale medical images. DVC simultaneously learns the multivariate Gaussian posterior through the probabilistic convolutional encoder and the likelihood distribution with the probabilistic convolutional decoder; and optimizes cluster labels assignment. Here, the learned multivariate Gaussian posterior captures the latent distribution of a large set of unlabeled images. Then, we perform unsupervised clustering on top of the variational latent space using a clustering loss. In this approach, the probabilistic decoder helps to prevent the distortion of data points in the latent space and to preserve the local structure of data generating distribution. The training process can be considered as a self-training process to refine the latent space and simultaneously optimizing cluster assignments iteratively. We evaluated our proposed framework on three public datasets that represented different medical imaging modalities. Our experimental results show that our proposed framework generalizes better across different datasets. It achieves compelling results on several medical imaging benchmarks. Thus, our approach offers potential advantages over conventional deep unsupervised learning in real-world applications. The source code of the method and all the experiments are available publicly at: https://github.com/csfarzin/DVC △ Less

Submitted 22 September, 2021; originally announced September 2021.

Comments: arXiv admin note: text overlap with arXiv:2109.05232

arXiv:2109.09435 [pdf]

Incremental Learning Techniques for Online Human Activity Recognition

Authors: Meysam Vakili, Masoumeh Rezaei

Abstract: Unobtrusive and smart recognition of human activities using smartphones inertial sensors is an interesting topic in the field of artificial intelligence acquired tremendous popularity among researchers, especially in recent years. A considerable challenge that needs more attention is the real-time detection of physical activities, since for many real-world applications such as health monitoring an… ▽ More Unobtrusive and smart recognition of human activities using smartphones inertial sensors is an interesting topic in the field of artificial intelligence acquired tremendous popularity among researchers, especially in recent years. A considerable challenge that needs more attention is the real-time detection of physical activities, since for many real-world applications such as health monitoring and elderly care, it is required to recognize users' activities immediately to prevent severe damages to individuals' wellness. In this paper, we propose a human activity recognition (HAR) approach for the online prediction of physical movements, benefiting from the capabilities of incremental learning algorithms. We develop a HAR system containing monitoring software and a mobile application that collects accelerometer and gyroscope data and send them to a remote server via the Internet for classification and recognition operations. Six incremental learning algorithms are employed and evaluated in this work and compared with several batch learning algorithms commonly used for developing offline HAR systems. The Final results indicated that considering all performance evaluation metrics, Incremental K-Nearest Neighbors and Incremental Naive Bayesian outperformed other algorithms, exceeding a recognition accuracy of 95% in real-time. △ Less

Submitted 20 September, 2021; originally announced September 2021.

Comments: 16 pages, 5 figures, 7 tables

arXiv:2109.09165 [pdf, other]

Traffic-Net: 3D Traffic Monitoring Using a Single Camera

Authors: Mahdi Rezaei, Mohsen Azarmi, Farzam Mohammad Pour Mir

Abstract: Computer Vision has played a major role in Intelligent Transportation Systems (ITS) and traffic surveillance. Along with the rapidly growing automated vehicles and crowded cities, the automated and advanced traffic management systems (ATMS) using video surveillance infrastructures have been evolved by the implementation of Deep Neural Networks. In this research, we provide a practical platform for… ▽ More Computer Vision has played a major role in Intelligent Transportation Systems (ITS) and traffic surveillance. Along with the rapidly growing automated vehicles and crowded cities, the automated and advanced traffic management systems (ATMS) using video surveillance infrastructures have been evolved by the implementation of Deep Neural Networks. In this research, we provide a practical platform for real-time traffic monitoring, including 3D vehicle/pedestrian detection, speed detection, trajectory estimation, congestion detection, as well as monitoring the interaction of vehicles and pedestrians, all using a single CCTV traffic camera. We adapt a custom YOLOv5 deep neural network model for vehicle/pedestrian detection and an enhanced SORT tracking algorithm. For the first time, a hybrid satellite-ground based inverse perspective mapping (SG-IPM) method for camera auto-calibration is also developed which leads to an accurate 3D object detection and visualisation. We also develop a hierarchical traffic modelling solution based on short- and long-term temporal video data stream to understand the traffic flow, bottlenecks, and risky spots for vulnerable road users. Several experiments on real-world scenarios and comparisons with state-of-the-art are conducted using various traffic monitoring datasets, including MIO-TCD, UA-DETRAC and GRAM-RTM collected from highways, intersections, and urban areas under different lighting and weather conditions. △ Less

Submitted 2 July, 2022; v1 submitted 19 September, 2021; originally announced September 2021.

arXiv:2109.07455 [pdf, other]

Deep Bregman Divergence for Contrastive Learning of Visual Representations

Authors: Mina Rezaei, Farzin Soleymani, Bernd Bischl, Shekoofeh Azizi

Abstract: Deep Bregman divergence measures divergence of data points using neural networks which is beyond Euclidean distance and capable of capturing divergence over distributions. In this paper, we propose deep Bregman divergences for contrastive learning of visual representation where we aim to enhance contrastive loss used in self-supervised learning by training additional networks based on functional B… ▽ More Deep Bregman divergence measures divergence of data points using neural networks which is beyond Euclidean distance and capable of capturing divergence over distributions. In this paper, we propose deep Bregman divergences for contrastive learning of visual representation where we aim to enhance contrastive loss used in self-supervised learning by training additional networks based on functional Bregman divergence. In contrast to the conventional contrastive learning methods which are solely based on divergences between single points, our framework can capture the divergence between distributions which improves the quality of learned representation. We show the combination of conventional contrastive loss and our proposed divergence loss outperforms baseline and most of the previous methods for self-supervised and semi-supervised learning on multiple classifications and object detection tasks and datasets. Moreover, the learned representations generalize well when transferred to the other datasets and tasks. The source code and our models are available in supplementary and will be released with paper. △ Less

Submitted 22 November, 2021; v1 submitted 15 September, 2021; originally announced September 2021.

arXiv:2109.05232 [pdf, other]

Joint Debiased Representation Learning and Imbalanced Data Clustering

Authors: Mina Rezaei, Emilio Dorigatti, David Ruegamer, Bernd Bischl

Abstract: One of the most promising approaches for unsupervised learning is combining deep representation learning and deep clustering. Some recent works propose to simultaneously learn representation using deep neural networks and perform clustering by defining a clustering loss on top of embedded features. However, these approaches are sensitive to imbalanced data and out-of-distribution samples. As a con… ▽ More One of the most promising approaches for unsupervised learning is combining deep representation learning and deep clustering. Some recent works propose to simultaneously learn representation using deep neural networks and perform clustering by defining a clustering loss on top of embedded features. However, these approaches are sensitive to imbalanced data and out-of-distribution samples. As a consequence, these methods optimize clustering by pushing data close to randomly initialized cluster centers. This is problematic when the number of instances varies largely in different classes or a cluster with few samples has less chance to be assigned a good centroid. To overcome these limitations, we introduce a new unsupervised framework for joint debiased representation learning and image clustering. We simultaneously train two deep learning models, a deep representation network that captures the data distribution, and a deep clustering network that learns embedded features and performs clustering. Specifically, the clustering network and learning representation network both take advantage of our proposed statistics pooling block that represents mean, variance, and cardinality to handle the out-of-distribution samples and class imbalance. Our experiments show that using these representations, one can considerably improve results on imbalanced image clustering across a variety of image datasets. Moreover, the learned representations generalize well when transferred to the out-of-distribution dataset. △ Less

Submitted 6 September, 2022; v1 submitted 11 September, 2021; originally announced September 2021.

arXiv:2105.14383 [pdf, other]

Gradient-Free Neural Network Training via Synaptic-Level Reinforcement Learning

Authors: Aman Bhargava, Mohammad R. Rezaei, Milad Lankarany

Abstract: An ongoing challenge in neural information processing is: how do neurons adjust their connectivity to improve task performance over time (i.e., actualize learning)? It is widely believed that there is a consistent, synaptic-level learning mechanism in specific brain regions that actualizes learning. However, the exact nature of this mechanism remains unclear. Here we propose an algorithm based on… ▽ More An ongoing challenge in neural information processing is: how do neurons adjust their connectivity to improve task performance over time (i.e., actualize learning)? It is widely believed that there is a consistent, synaptic-level learning mechanism in specific brain regions that actualizes learning. However, the exact nature of this mechanism remains unclear. Here we propose an algorithm based on reinforcement learning (RL) to generate and apply a simple synaptic-level learning policy for multi-layer perceptron (MLP) models. In this algorithm, the action space for each MLP synapse consists of a small increase, decrease, or null action on the synapse weight, and the state for each synapse consists of the last two actions and reward signals. A binary reward signal indicates improvement or deterioration in task performance. The static policy produces superior training relative to the adaptive policy and is agnostic to activation function, network shape, and task. Trained MLPs yield character recognition performance comparable to identically shaped networks trained with gradient descent. 0 hidden unit character recognition tests yielded an average validation accuracy of 88.28%, 1.86$\pm$0.47% higher than the same MLP trained with gradient descent. 32 hidden unit character recognition tests yielded an average validation accuracy of 88.45%, 1.11$\pm$0.79% lower than the same MLP trained with gradient descent. The robustness and lack of reliance on gradient computations opens the door for new techniques for training difficult-to-differentiate artificial neural networks such as spiking neural networks (SNNs) and recurrent neural networks (RNNs). Further, the method's simplicity provides a unique opportunity for further development of local rule-driven multi-agent connectionist models for machine intelligence analogous to cellular automata. △ Less

Submitted 29 May, 2021; originally announced May 2021.

Comments: 10 pages, 3 figures, submitted to NeurIPS 2021

MSC Class: 68T07 ACM Class: I.2.6

arXiv:2105.04881 [pdf]

doi 10.1016/j.compbiomed.2021.104697

Applications of Deep Learning Techniques for Automated Multiple Sclerosis Detection Using Magnetic Resonance Imaging: A Review

Authors: Afshin Shoeibi, Marjane Khodatars, Mahboobeh Jafari, Parisa Moridian, Mitra Rezaei, Roohallah Alizadehsani, Fahime Khozeimeh, Juan Manuel Gorriz, Jónathan Heras, Maryam Panahiazar, Saeid Nahavandi, U. Rajendra Acharya

Abstract: Multiple Sclerosis (MS) is a type of brain disease which causes visual, sensory, and motor problems for people with a detrimental effect on the functioning of the nervous system. In order to diagnose MS, multiple screening methods have been proposed so far; among them, magnetic resonance imaging (MRI) has received considerable attention among physicians. MRI modalities provide physicians with fund… ▽ More Multiple Sclerosis (MS) is a type of brain disease which causes visual, sensory, and motor problems for people with a detrimental effect on the functioning of the nervous system. In order to diagnose MS, multiple screening methods have been proposed so far; among them, magnetic resonance imaging (MRI) has received considerable attention among physicians. MRI modalities provide physicians with fundamental information about the structure and function of the brain, which is crucial for the rapid diagnosis of MS lesions. Diagnosing MS using MRI is time-consuming, tedious, and prone to manual errors. Hence, computer aided diagnosis systems (CADS) based on artificial intelligence (AI) methods have been proposed in recent years for accurate diagnosis of MS using MRI neuroimaging modalities. In the AI field, automated MS diagnosis is being conducted using (i) conventional machine learning and (ii) deep learning (DL) techniques. The conventional machine learning approach is based on feature extraction and selection by trial and error. In DL, these steps are performed by the DL model itself. In this paper, a complete review of automated MS diagnosis methods performed using DL techniques with MRI neuroimaging modalities are discussed. Also, each work is thoroughly reviewed and discussed. Finally, the most important challenges and future directions in the automated MS diagnosis using DL techniques coupled with MRI modalities are presented in detail. △ Less

Submitted 9 August, 2021; v1 submitted 11 May, 2021; originally announced May 2021.

Journal ref: Computers in Biology and Medicine,Volume 136,2021,104697

arXiv:2105.00499 [pdf, other]

Curious Exploration and Return-based Memory Restoration for Deep Reinforcement Learning

Authors: Saeed Tafazzol, Erfan Fathi, Mahdi Rezaei, Ehsan Asali

Abstract: Reward engineering and designing an incentive reward function are non-trivial tasks to train agents in complex environments. Furthermore, an inaccurate reward function may lead to a biased behaviour which is far from an efficient and optimised behaviour. In this paper, we focus on training a single agent to score goals with binary success/failure reward function in Half Field Offense domain. As th… ▽ More Reward engineering and designing an incentive reward function are non-trivial tasks to train agents in complex environments. Furthermore, an inaccurate reward function may lead to a biased behaviour which is far from an efficient and optimised behaviour. In this paper, we focus on training a single agent to score goals with binary success/failure reward function in Half Field Offense domain. As the major advantage of this research, the agent has no presumption about the environment which means it only follows the original formulation of reinforcement learning agents. The main challenge of using such a reward function is the high sparsity of positive reward signals. To address this problem, we use a simple prediction-based exploration strategy (called Curious Exploration) along with a Return-based Memory Restoration (RMR) technique which tends to remember more valuable memories. The proposed method can be utilized to train agents in environments with fairly complex state and action spaces. Our experimental results show that many recent solutions including our baseline method fail to learn and perform in complex soccer domain. However, the proposed method can converge easily to the nearly optimal behaviour. The video presenting the performance of our trained agent is available at http://bit.ly/HFO_Binary_Reward. △ Less

Submitted 2 May, 2021; originally announced May 2021.

arXiv:2102.04238 [pdf]

Amazon Product Recommender System

Authors: Mohammad R. Rezaei

Abstract: The number of reviews on Amazon has grown significantly over the years. Customers who made purchases on Amazon provide reviews by rating the product from 1 to 5 stars and sharing a text summary of their experience and opinion of the product. The ratings of a product are averaged to provide an overall product rating. We analyzed what ratings score customers give to a specific product (a music track… ▽ More The number of reviews on Amazon has grown significantly over the years. Customers who made purchases on Amazon provide reviews by rating the product from 1 to 5 stars and sharing a text summary of their experience and opinion of the product. The ratings of a product are averaged to provide an overall product rating. We analyzed what ratings score customers give to a specific product (a music track) in order to build a recommender model for digital music tracks on Amazon. We test various traditional models along with our proposed deep neural network (DNN) architecture to predict the reviews rating score. The Amazon review dataset contains 200,000 data samples; we train the models on 70% of the dataset and test the performance of the models on the remaining 30% of the dataset. △ Less

Submitted 30 January, 2021; originally announced February 2021.

arXiv:2011.13851 [pdf, other]

Real-time Active Vision for a Humanoid Soccer Robot Using Deep Reinforcement Learning

Authors: Soheil Khatibi, Meisam Teimouri, Mahdi Rezaei

Abstract: In this paper, we present an active vision method using a deep reinforcement learning approach for a humanoid soccer-playing robot. The proposed method adaptively optimises the viewpoint of the robot to acquire the most useful landmarks for self-localisation while keeping the ball into its viewpoint. Active vision is critical for humanoid decision-maker robots with a limited field of view. To deal… ▽ More In this paper, we present an active vision method using a deep reinforcement learning approach for a humanoid soccer-playing robot. The proposed method adaptively optimises the viewpoint of the robot to acquire the most useful landmarks for self-localisation while keeping the ball into its viewpoint. Active vision is critical for humanoid decision-maker robots with a limited field of view. To deal with an active vision problem, several probabilistic entropy-based approaches have previously been proposed which are highly dependent on the accuracy of the self-localisation model. However, in this research, we formulate the problem as an episodic reinforcement learning problem and employ a Deep Q-learning method to solve it. The proposed network only requires the raw images of the camera to move the robot's head toward the best viewpoint. The model shows a very competitive rate of 80% success rate in achieving the best viewpoint. We implemented the proposed method on a humanoid robot simulated in Webots simulator. Our evaluations and experimental results show that the proposed method outperforms the entropy-based methods in the RoboCup context, in cases with high self-localisation errors. △ Less

Submitted 27 November, 2020; originally announced November 2020.

Comments: The paper has been accepted in ICAART 2021

arXiv:2008.11672 [pdf, other]

doi 10.3390/app10217514

DeepSOCIAL: Social Distancing Monitoring and Infection Risk Assessment in COVID-19 Pandemic

Authors: Mahdi Rezaei, Mohsen Azarmi

Abstract: Social distancing is a recommended solution by the World Health Organisation (WHO) to minimise the spread of COVID-19 in public places. The majority of governments and national health authorities have set the 2-meter physical distancing as a mandatory safety measure in shopping centres, schools and other covered areas. In this research, we develop a hybrid Computer Vision and YOLOv4-based Deep Neu… ▽ More Social distancing is a recommended solution by the World Health Organisation (WHO) to minimise the spread of COVID-19 in public places. The majority of governments and national health authorities have set the 2-meter physical distancing as a mandatory safety measure in shopping centres, schools and other covered areas. In this research, we develop a hybrid Computer Vision and YOLOv4-based Deep Neural Network model for automated people detection in the crowd in indoor and outdoor environments using common CCTV security cameras. The proposed DNN model in combination with an adapted inverse perspective mapping (IPM) technique and SORT tracking algorithm leads to a robust people detection and social distancing monitoring. The model has been trained against two most comprehensive datasets by the time of the research the Microsoft Common Objects in Context (MS COCO) and Google Open Image datasets. The system has been evaluated against the Oxford Town Centre dataset with superior performance compared to three state-of-the-art methods. The evaluation has been conducted in challenging conditions, including occlusion, partial visibility, and under lighting variations with the mean average precision of 99.8% and the real-time speed of 24.1 fps. We also provide an online infection risk assessment scheme by statistical analysis of the Spatio-temporal data from people's moving trajectories and the rate of social distancing violations. The developed model is a generic and accurate people detection and tracking solution that can be applied in many other fields such as autonomous vehicles, human action recognition, anomaly detection, sports, crowd analysis, or any other research areas where the human detection is in the centre of attention. △ Less

Submitted 28 November, 2020; v1 submitted 26 August, 2020; originally announced August 2020.

Journal ref: Applied Sciences. 2020, 10, 7514

Showing 1–50 of 69 results for author: Rezaei, M