Search | arXiv e-print repository

Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data

Authors: Ritesh Mehta, Aleksandar Pramov, Shashank Verma

Abstract: Amyotrophic Lateral Sclerosis (ALS) is characterized as a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options in the realm of medical interventions and therapies. The disease showcases a diverse range of onset patterns and progression trajectories, emphasizing the critical importance of early detection of functional decline to enable tailored care… ▽ More Amyotrophic Lateral Sclerosis (ALS) is characterized as a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options in the realm of medical interventions and therapies. The disease showcases a diverse range of onset patterns and progression trajectories, emphasizing the critical importance of early detection of functional decline to enable tailored care strategies and timely therapeutic interventions. The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app. This data is used to construct various machine learning models specifically designed to forecast the advancement of the ALS Functional Rating Scale-Revised (ALSFRS-R) score, leveraging the dataset provided by the organizers. In our analysis, multiple predictive models were evaluated to determine their efficacy in handling ALS sensor data. The temporal aspect of the sensor data was compressed and amalgamated using statistical methods, thereby augmenting the interpretability and applicability of the gathered information for predictive modeling objectives. The models that demonstrated optimal performance were a naive baseline and ElasticNet regression. The naive model achieved a Mean Absolute Error (MAE) of 0.20 and a Root Mean Square Error (RMSE) of 0.49, slightly outperforming the ElasticNet model, which recorded an MAE of 0.22 and an RMSE of 0.50. Our comparative analysis suggests that while the naive approach yielded marginally better predictive accuracy, the ElasticNet model provides a robust framework for understanding feature contributions. △ Less

Submitted 10 July, 2024; originally announced July 2024.

Comments: Paper submitted to CLEF 2024 CEUR-WS

arXiv:2404.13527 [pdf, other]

On the structure of envy-free orientations on graphs

Authors: Jinghan A Zeng, Ruta Mehta

Abstract: Fair division is the problem of allocating a set of items among agents in a fair manner. One of the most sought-after fairness notions is envy-freeness (EF), requiring that no agent envies another's allocation. When items are indivisible, it ceases to exist, and envy-freeness up to any good (EFX) emerged as one of its strongest relaxations. The existence of EFX allocations is arguably the biggest… ▽ More Fair division is the problem of allocating a set of items among agents in a fair manner. One of the most sought-after fairness notions is envy-freeness (EF), requiring that no agent envies another's allocation. When items are indivisible, it ceases to exist, and envy-freeness up to any good (EFX) emerged as one of its strongest relaxations. The existence of EFX allocations is arguably the biggest open question within fair division. Recently, Christodoulou, Fiat, Koutsoupias, and Sgouritsa (EC 2023) introduced showed that EFX allocations exist for the case of graphical valuations where an instance is represented by a graph: nodes are agents, edges are goods, and each agent values only her incident edges. On the other hand, they showed NP-hardness for checking the existence of EFX orientation where every edge is allocated to one of its incident vertices, and asked for a characterization of graphs that exhibit EFX orientation regardless of the assigned valuations. In this paper, we make significant progress toward answering their question. We introduce the notion of strongly EFX orientable graphs -- graphs that have EFX orientations regardless of how much agents value the edges. We show a surprising connection between this property and the chromatic number of the graph, namely $χ(G)$ for graph $G$. In particular, we show that graphs with $χ(G)\le 2$ are strongly EFX orientable, and those with $χ(G)>3$ are not strongly EFX orientable. We provide examples of strongly EFX orientable and non-strongly EFX orientable graphs of $χ(G)=3$ to prove tightness. Finally, we give a complete characterization of strong EFX orientability when restricted to binary valuations. △ Less

Submitted 21 April, 2024; originally announced April 2024.

Comments: 12 pages, 4 figures

arXiv:2404.06948 [pdf, other]

MetaCheckGPT -- A Multi-task Hallucination Detector Using LLM Uncertainty and Meta-models

Authors: Rahul Mehta, Andrew Hoblitzell, Jack O'Keefe, Hyeju Jang, Vasudeva Varma

Abstract: Hallucinations in large language models (LLMs) have recently become a significant problem. A recent effort in this direction is a shared task at Semeval 2024 Task 6, SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes. This paper describes our winning solution ranked 1st and 2nd in the 2 sub-tasks of model agnostic and model aware tracks respectively. We propose… ▽ More Hallucinations in large language models (LLMs) have recently become a significant problem. A recent effort in this direction is a shared task at Semeval 2024 Task 6, SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes. This paper describes our winning solution ranked 1st and 2nd in the 2 sub-tasks of model agnostic and model aware tracks respectively. We propose a meta-regressor framework of LLMs for model evaluation and integration that achieves the highest scores on the leaderboard. We also experiment with various transformer-based models and black box methods like ChatGPT, Vectara, and others. In addition, we perform an error analysis comparing GPT4 against our best model which shows the limitations of the former. △ Less

Submitted 11 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: Entry for SemEval-2024 Shared Task 6: SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes

MSC Class: 68T07; 68T50 ACM Class: I.2.7

arXiv:2403.10763 [pdf, other]

A Primal-Dual Algorithm for Faster Distributionally Robust Optimization

Authors: Ronak Mehta, Jelena Diakonikolas, Zaid Harchaoui

Abstract: We consider the penalized distributionally robust optimization (DRO) problem with a closed, convex uncertainty set, a setting that encompasses the $f$-DRO, Wasserstein-DRO, and spectral/$L$-risk formulations used in practice. We present Drago, a stochastic primal-dual algorithm that achieves a state-of-the-art linear convergence rate on strongly convex-strongly concave DRO problems. The method com… ▽ More We consider the penalized distributionally robust optimization (DRO) problem with a closed, convex uncertainty set, a setting that encompasses the $f$-DRO, Wasserstein-DRO, and spectral/$L$-risk formulations used in practice. We present Drago, a stochastic primal-dual algorithm that achieves a state-of-the-art linear convergence rate on strongly convex-strongly concave DRO problems. The method combines both randomized and cyclic components with mini-batching, which effectively handles the unique asymmetric nature of the primal and dual problems in DRO. We support our theoretical results with numerical benchmarks in classification and regression. △ Less

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2402.10439 [pdf, other]

Competitive Equilibrium for Chores: from Dual Eisenberg-Gale to a Fast, Greedy, LP-based Algorithm

Authors: Bhaskar Ray Chaudhury, Christian Kroer, Ruta Mehta, Tianlong Nan

Abstract: We study the computation of competitive equilibrium for Fisher markets with $n$ agents and $m$ divisible chores. Prior work showed that competitive equilibria correspond to the nonzero KKT points of a non-convex analogue of the Eisenberg-Gale convex program. We introduce an analogue of the Eisenberg-Gale dual for chores: we show that all KKT points of this dual correspond to competitive equilibria… ▽ More We study the computation of competitive equilibrium for Fisher markets with $n$ agents and $m$ divisible chores. Prior work showed that competitive equilibria correspond to the nonzero KKT points of a non-convex analogue of the Eisenberg-Gale convex program. We introduce an analogue of the Eisenberg-Gale dual for chores: we show that all KKT points of this dual correspond to competitive equilibria, and while it is not a dual of the non-convex primal program in a formal sense, the objectives touch at all KKT points. Similar to the primal, the dual has problems from an optimization perspective: there are many feasible directions where the objective tends to positive infinity. We then derive a new constraint for the dual, which restricts optimization to a hyperplane that avoids all these directions. We show that restriction to this hyperplane retains all KKT points, and surprisingly, does not introduce any new ones. This allows, for the first time ever, application of iterative optimization methods over a convex region for computing competitive equilibria for chores. We next introduce a greedy Frank-Wolfe algorithm for optimization over our program and show a state-of-the-art convergence rate to competitive equilibrium. In the case of equal incomes, we show a $\mathcal{\tilde O}(n/ε^2)$ rate of convergence, which improves over the two prior state-of-the-art rates of $\mathcal{\tilde O}(n^3/ε^2)$ for an exterior-point method and $\mathcal{\tilde O}(nm/ε^2)$ for a combinatorial method. Moreover, our method is significantly simpler: each iteration of our method only requires solving a simple linear program. We show through numerical experiments on simulated data and a paper review bidding dataset that our method is extremely practical. This is the first highly practical method for solving competitive equilibrium for Fisher markets with chores. △ Less

Submitted 15 February, 2024; originally announced February 2024.

Comments: 25 pages, 17 figures

arXiv:2401.11645 [pdf, other]

doi 10.1109/SLT54892.2023.10022475

Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax

Authors: Aditya Patil, Vikas Joshi, Purvi Agrawal, Rupesh Mehta

Abstract: Even with several advancements in multilingual modeling, it is challenging to recognize multiple languages using a single neural model, without knowing the input language and most multilingual models assume the availability of the input language. In this work, we propose a novel bilingual end-to-end (E2E) modeling approach, where a single neural model can recognize both languages and also support… ▽ More Even with several advancements in multilingual modeling, it is challenging to recognize multiple languages using a single neural model, without knowing the input language and most multilingual models assume the availability of the input language. In this work, we propose a novel bilingual end-to-end (E2E) modeling approach, where a single neural model can recognize both languages and also support switching between the languages, without any language input from the user. The proposed model has shared encoder and prediction networks, with language-specific joint networks that are combined via a self-attention mechanism. As the language-specific posteriors are combined, it produces a single posterior probability over all the output symbols, enabling a single beam search decoding and also allowing dynamic switching between the languages. The proposed approach outperforms the conventional bilingual baseline with 13.3%, 8.23% and 1.3% word error rate relative reduction on Hindi, English and code-mixed test sets, respectively. △ Less

Submitted 21 January, 2024; originally announced January 2024.

Comments: Published in IEEE's Spoken Language Technology (SLT) 2022, 8 pages (6 + 2 for references), 5 figures

Journal ref: 2022 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar, 2023, pp. 252-259

arXiv:2312.08509 [pdf, ps, other]

Approximating APS under Submodular and XOS valuations with Binary Marginals

Authors: Pooja Kulkarni, Rucha Kulkarni, Ruta Mehta

Abstract: We study the problem of fairly dividing indivisible goods among a set of agents under the fairness notion of Any Price Share (APS). APS is known to dominate the widely studied Maximin share (MMS). Since an exact APS allocation may not exist, the focus has traditionally been on the computation of approximate APS allocations. Babaioff et al. studied the problem under additive valuations, and asked (… ▽ More We study the problem of fairly dividing indivisible goods among a set of agents under the fairness notion of Any Price Share (APS). APS is known to dominate the widely studied Maximin share (MMS). Since an exact APS allocation may not exist, the focus has traditionally been on the computation of approximate APS allocations. Babaioff et al. studied the problem under additive valuations, and asked (i) how large can the APS value be compared to the MMS value? and (ii) what guarantees can one achieve beyond additive functions. We partly answer these questions by considering valuations beyond additive, namely submodular and XOS functions, with binary marginals. For the submodular functions with binary marginals, also known as matroid rank functions (MRFs), we show that APS is exactly equal to MMS. Consequently, we get that an exact APS allocation exists and can be computed efficiently while maximizing the social welfare. Complementing this result, we show that it is NP-hard to compute the APS value within a factor of 5/6 for submodular valuations with three distinct marginals of {0, 1/2, 1}. We then consider binary XOS functions, which are immediate generalizations of binary submodular functions in the complement free hierarchy. In contrast to the MRFs setting, MMS and APS values are not equal under this case. Nevertheless, we show that under binary XOS valuations, $MMS \leq APS \leq 2 \cdot MMS + 1$. Further, we show that this is almost the tightest bound we can get using MMS, by giving an instance where $APS \geq 2 \cdot MMS$. The upper bound on APS, implies a ~0.1222-approximation for APS under binary XOS valuations. And the lower bound implies the non-existence of better than 0.5-APS even when agents have identical valuations, which is in sharp contrast to the guaranteed existence of exact MMS allocation when agent valuations are identical. △ Less

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.08504 [pdf, ps, other]

1/2 Approximate MMS Allocation for Separable Piecewise Linear Concave Valuations

Authors: Chandra Chekuri, Pooja Kulkarni, Rucha Kulkarni, Ruta Mehta

Abstract: We study fair distribution of a collection of m indivisible goods among a group of n agents, using the widely recognized fairness principles of Maximin Share (MMS) and Any Price Share (APS). These principles have undergone thorough investigation within the context of additive valuations. We explore these notions for valuations that extend beyond additivity. First, we study approximate MMS under… ▽ More We study fair distribution of a collection of m indivisible goods among a group of n agents, using the widely recognized fairness principles of Maximin Share (MMS) and Any Price Share (APS). These principles have undergone thorough investigation within the context of additive valuations. We explore these notions for valuations that extend beyond additivity. First, we study approximate MMS under the separable (piecewise-linear) concave (SPLC) valuations, an important class generalizing additive, where the best known factor was 1/3-MMS. We show that 1/2-MMS allocation exists and can be computed in polynomial time, significantly improving the state-of-the-art. We note that SPLC valuations introduce an elevated level of intricacy in contrast to additive. For instance, the MMS value of an agent can be as high as her value for the entire set of items. Further, the equilibrium computation problem, which is polynomial-time for additive valuations, becomes intractable for SPLC. We use a relax-and-round paradigm that goes through competitive equilibrium and LP relaxation. Our result extends to give (symmetric) 1/2-APS, a stronger guarantee than MMS. APS is a stronger notion that generalizes MMS by allowing agents with arbitrary entitlements. We study the approximation of APS under submodular valuation functions. We design and analyze a simple greedy algorithm using concave extensions of submodular functions. We prove that the algorithm gives a 1/3-APS allocation which matches the current best-known factor. Concave extensions are hard to compute in polynomial time and are, therefore, generally not used in approximation algorithms. Our approach shows a way to utilize it within analysis (while bypassing its computation), and might be of independent interest. △ Less

Submitted 13 December, 2023; originally announced December 2023.

Comments: To appear in AAAI Conference on Artificial Intelligence, 2024

arXiv:2310.13863 [pdf, other]

Distributionally Robust Optimization with Bias and Variance Reduction

Authors: Ronak Mehta, Vincent Roulet, Krishna Pillutla, Zaid Harchaoui

Abstract: We consider the distributionally robust optimization (DRO) problem with spectral risk-based uncertainty set and $f$-divergence penalty. This formulation includes common risk-sensitive learning objectives such as regularized condition value-at-risk (CVaR) and average top-$k$ loss. We present Prospect, a stochastic gradient-based algorithm that only requires tuning a single learning rate hyperparame… ▽ More We consider the distributionally robust optimization (DRO) problem with spectral risk-based uncertainty set and $f$-divergence penalty. This formulation includes common risk-sensitive learning objectives such as regularized condition value-at-risk (CVaR) and average top-$k$ loss. We present Prospect, a stochastic gradient-based algorithm that only requires tuning a single learning rate hyperparameter, and prove that it enjoys linear convergence for smooth regularized losses. This contrasts with previous algorithms that either require tuning multiple hyperparameters or potentially fail to converge due to biased gradient estimates or inadequate regularization. Empirically, we show that Prospect can converge 2-3$\times$ faster than baselines such as stochastic gradient and stochastic saddle-point methods on distribution shift and fairness benchmarks spanning tabular, vision, and language domains. △ Less

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2309.11097 [pdf]

Evaluating Mental Stress Among College Students Using Heart Rate and Hand Acceleration Data Collected from Wearable Sensors

Authors: Moein Razavi, Anthony McDonald, Ranjana Mehta, Farzan Sasangohar

Abstract: Stress is various mental health disorders including depression and anxiety among college students. Early stress diagnosis and intervention may lower the risk of developing mental illnesses. We examined a machine learning-based method for identification of stress using data collected in a naturalistic study utilizing self-reported stress as ground truth as well as physiological data such as heart r… ▽ More Stress is various mental health disorders including depression and anxiety among college students. Early stress diagnosis and intervention may lower the risk of developing mental illnesses. We examined a machine learning-based method for identification of stress using data collected in a naturalistic study utilizing self-reported stress as ground truth as well as physiological data such as heart rate and hand acceleration. The study involved 54 college students from a large campus who used wearable wrist-worn sensors and a mobile health (mHealth) application continuously for 40 days. The app gathered physiological data including heart rate and hand acceleration at one hertz frequency. The application also enabled users to self-report stress by tapping on the watch face, resulting in a time-stamped record of the self-reported stress. We created, evaluated, and analyzed machine learning algorithms for identifying stress episodes among college students using heart rate and accelerometer data. The XGBoost method was the most reliable model with an AUC of 0.64 and an accuracy of 84.5%. The standard deviation of hand acceleration, standard deviation of heart rate, and the minimum heart rate were the most important features for stress detection. This evidence may support the efficacy of identifying patterns in physiological reaction to stress using smartwatch sensors and may inform the design of future tools for real-time detection of stress. △ Less

Submitted 20 September, 2023; originally announced September 2023.

arXiv:2308.10984 [pdf, other]

Debiasing Counterfactuals In the Presence of Spurious Correlations

Authors: Amar Kumar, Nima Fathi, Raghav Mehta, Brennan Nichyporuk, Jean-Pierre R. Falet, Sotirios Tsaftaris, Tal Arbel

Abstract: Deep learning models can perform well in complex medical imaging classification tasks, even when basing their conclusions on spurious correlations (i.e. confounders), should they be prevalent in the training dataset, rather than on the causal image markers of interest. This would thereby limit their ability to generalize across the population. Explainability based on counterfactual image generatio… ▽ More Deep learning models can perform well in complex medical imaging classification tasks, even when basing their conclusions on spurious correlations (i.e. confounders), should they be prevalent in the training dataset, rather than on the causal image markers of interest. This would thereby limit their ability to generalize across the population. Explainability based on counterfactual image generation can be used to expose the confounders but does not provide a strategy to mitigate the bias. In this work, we introduce the first end-to-end training framework that integrates both (i) popular debiasing classifiers (e.g. distributionally robust optimization (DRO)) to avoid latching onto the spurious correlations and (ii) counterfactual image generation to unveil generalizable imaging markers of relevance to the task. Additionally, we propose a novel metric, Spurious Correlation Latching Score (SCLS), to quantify the extent of the classifier reliance on the spurious correlation as exposed by the counterfactual images. Through comprehensive experiments on two public datasets (with the simulated and real visual artifacts), we demonstrate that the debiasing method: (i) learns generalizable markers across the population, and (ii) successfully ignores spurious correlations and focuses on the underlying disease pathology. △ Less

Submitted 21 August, 2023; originally announced August 2023.

Comments: Accepted to the FAIMI (Fairness of AI in Medical Imaging) workshop at MICCAI 2023

arXiv:2307.11032 [pdf, other]

A Natural Language Processing Approach to Malware Classification

Authors: Ritik Mehta, Olha Jurečková, Mark Stamp

Abstract: Many different machine learning and deep learning techniques have been successfully employed for malware detection and classification. Examples of popular learning techniques in the malware domain include Hidden Markov Models (HMM), Random Forests (RF), Convolutional Neural Networks (CNN), Support Vector Machines (SVM), and Recurrent Neural Networks (RNN) such as Long Short-Term Memory (LSTM) netw… ▽ More Many different machine learning and deep learning techniques have been successfully employed for malware detection and classification. Examples of popular learning techniques in the malware domain include Hidden Markov Models (HMM), Random Forests (RF), Convolutional Neural Networks (CNN), Support Vector Machines (SVM), and Recurrent Neural Networks (RNN) such as Long Short-Term Memory (LSTM) networks. In this research, we consider a hybrid architecture, where HMMs are trained on opcode sequences, and the resulting hidden states of these trained HMMs are used as feature vectors in various classifiers. In this context, extracting the HMM hidden state sequences can be viewed as a form of feature engineering that is somewhat analogous to techniques that are commonly employed in Natural Language Processing (NLP). We find that this NLP-based approach outperforms other popular techniques on a challenging malware dataset, with an HMM-Random Forrest model yielding the best results. △ Less

Submitted 7 July, 2023; originally announced July 2023.

arXiv:2307.01738 [pdf, other]

Mitigating Calibration Bias Without Fixed Attribute Grouping for Improved Fairness in Medical Imaging Analysis

Authors: Changjian Shui, Justin Szeto, Raghav Mehta, Douglas L. Arnold, Tal Arbel

Abstract: Trustworthy deployment of deep learning medical imaging models into real-world clinical practice requires that they be calibrated. However, models that are well calibrated overall can still be poorly calibrated for a sub-population, potentially resulting in a clinician unwittingly making poor decisions for this group based on the recommendations of the model. Although methods have been shown to su… ▽ More Trustworthy deployment of deep learning medical imaging models into real-world clinical practice requires that they be calibrated. However, models that are well calibrated overall can still be poorly calibrated for a sub-population, potentially resulting in a clinician unwittingly making poor decisions for this group based on the recommendations of the model. Although methods have been shown to successfully mitigate biases across subgroups in terms of model accuracy, this work focuses on the open problem of mitigating calibration biases in the context of medical image analysis. Our method does not require subgroup attributes during training, permitting the flexibility to mitigate biases for different choices of sensitive attributes without re-training. To this end, we propose a novel two-stage method: Cluster-Focal to first identify poorly calibrated samples, cluster them into groups, and then introduce group-wise focal loss to improve calibration bias. We evaluate our method on skin lesion classification with the public HAM10000 dataset, and on predicting future lesional activity for multiple sclerosis (MS) patients. In addition to considering traditional sensitive attributes (e.g. age, sex) with demographic subgroups, we also consider biases among groups with different image-derived attributes, such as lesion load, which are required in medical image analysis. Our results demonstrate that our method effectively controls calibration error in the worst-performing subgroups while preserving prediction performance, and outperforming recent baselines. △ Less

Submitted 20 July, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

arXiv:2305.04788 [pdf, ps, other]

Fair and Efficient Allocation of Indivisible Chores with Surplus

Authors: Hannaneh Akrami, Bhaskar Ray Chaudhury, Jugal Garg, Kurt Mehlhorn, Ruta Mehta

Abstract: We study fair division of indivisible chores among $n$ agents with additive disutility functions. Two well-studied fairness notions for indivisible items are envy-freeness up to one/any item (EF1/EFX) and the standard notion of economic efficiency is Pareto optimality (PO). There is a noticeable gap between the results known for both EF1 and EFX in the goods and chores settings. The case of chores… ▽ More We study fair division of indivisible chores among $n$ agents with additive disutility functions. Two well-studied fairness notions for indivisible items are envy-freeness up to one/any item (EF1/EFX) and the standard notion of economic efficiency is Pareto optimality (PO). There is a noticeable gap between the results known for both EF1 and EFX in the goods and chores settings. The case of chores turns out to be much more challenging. We reduce this gap by providing slightly relaxed versions of the known results on goods for the chores setting. Interestingly, our algorithms run in polynomial time, unlike their analogous versions in the goods setting. We introduce the concept of $k$ surplus which means that up to $k$ more chores are allocated to the agents and each of them is a copy of an original chore. We present a polynomial-time algorithm which gives EF1 and PO allocations with $(n-1)$ surplus. We relax the notion of EFX slightly and define tEFX which requires that the envy from agent $i$ to agent $j$ is removed upon the transfer of any chore from the $i$'s bundle to $j$'s bundle. We give a polynomial-time algorithm that in the chores case for $3$ agents returns an allocation which is either proportional or tEFX. Note that proportionality is a very strong criterion in the case of indivisible items, and hence both notions we guarantee are desirable. △ Less

Submitted 22 May, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

arXiv:2305.03829 [pdf, other]

Improving Image-Based Precision Medicine with Uncertainty-Aware Causal Models

Authors: Joshua Durso-Finley, Jean-Pierre Falet, Raghav Mehta, Douglas L. Arnold, Nick Pawlowski, Tal Arbel

Abstract: Image-based precision medicine aims to personalize treatment decisions based on an individual's unique imaging features so as to improve their clinical outcome. Machine learning frameworks that integrate uncertainty estimation as part of their treatment recommendations would be safer and more reliable. However, little work has been done in adapting uncertainty estimation techniques and validation… ▽ More Image-based precision medicine aims to personalize treatment decisions based on an individual's unique imaging features so as to improve their clinical outcome. Machine learning frameworks that integrate uncertainty estimation as part of their treatment recommendations would be safer and more reliable. However, little work has been done in adapting uncertainty estimation techniques and validation metrics for precision medicine. In this paper, we use Bayesian deep learning for estimating the posterior distribution over factual and counterfactual outcomes on several treatments. This allows for estimating the uncertainty for each treatment option and for the individual treatment effects (ITE) between any two treatments. We train and evaluate this model to predict future new and enlarging T2 lesion counts on a large, multi-center dataset of MR brain images of patients with multiple sclerosis, exposed to several treatments during randomized controlled trials. We evaluate the correlation of the uncertainty estimate with the factual error, and, given the lack of ground truth counterfactual outcomes, demonstrate how uncertainty for the ITE prediction relates to bounds on the ITE error. Lastly, we demonstrate how knowledge of uncertainty could modify clinical decision-making to improve individual patient and clinical trial outcomes. △ Less

Submitted 10 August, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

arXiv:2305.03300 [pdf, other]

LLM-RM at SemEval-2023 Task 2: Multilingual Complex NER using XLM-RoBERTa

Authors: Rahul Mehta, Vasudeva Varma

Abstract: Named Entity Recognition(NER) is a task of recognizing entities at a token level in a sentence. This paper focuses on solving NER tasks in a multilingual setting for complex named entities. Our team, LLM-RM participated in the recently organized SemEval 2023 task, Task 2: MultiCoNER II,Multilingual Complex Named Entity Recognition. We approach the problem by leveraging cross-lingual representation… ▽ More Named Entity Recognition(NER) is a task of recognizing entities at a token level in a sentence. This paper focuses on solving NER tasks in a multilingual setting for complex named entities. Our team, LLM-RM participated in the recently organized SemEval 2023 task, Task 2: MultiCoNER II,Multilingual Complex Named Entity Recognition. We approach the problem by leveraging cross-lingual representation provided by fine-tuning XLM-Roberta base model on datasets of all of the 12 languages provided -- Bangla, Chinese, English, Farsi, French, German, Hindi, Italian, Portuguese, Spanish, Swedish and Ukrainian △ Less

Submitted 5 May, 2023; originally announced May 2023.

Comments: Submitted to SemEval-2023, The 17th International Workshop on Semantic Evaluation

arXiv:2303.05046 [pdf]

Unsupervised Language agnostic WER Standardization

Authors: Satarupa Guha, Rahul Ambavat, Ankur Gupta, Manish Gupta, Rupeshkumar Mehta

Abstract: Word error rate (WER) is a standard metric for the evaluation of Automated Speech Recognition (ASR) systems. However, WER fails to provide a fair evaluation of human perceived quality in presence of spelling variations, abbreviations, or compound words arising out of agglutination. Multiple spelling variations might be acceptable based on locale/geography, alternative abbreviations, borrowed words… ▽ More Word error rate (WER) is a standard metric for the evaluation of Automated Speech Recognition (ASR) systems. However, WER fails to provide a fair evaluation of human perceived quality in presence of spelling variations, abbreviations, or compound words arising out of agglutination. Multiple spelling variations might be acceptable based on locale/geography, alternative abbreviations, borrowed words, and transliteration of code-mixed words from a foreign language to the target language script. Similarly, in case of agglutination, often times the agglutinated, as well as the split forms, are acceptable. Previous work handled this problem by using manually identified normalization pairs and applying them to both the transcription and the hypothesis before computing WER. In this paper, we propose an automatic WER normalization system consisting of two modules: spelling normalization and segmentation normalization. The proposed system is unsupervised and language agnostic, and therefore scalable. Experiments with ASR on 35K utterances across four languages yielded an average WER reduction of 13.28%. Human judgements of these automatically identified normalization pairs show that our WER-normalized evaluation is highly consistent with the perceived quality of ASR output. △ Less

Submitted 9 March, 2023; originally announced March 2023.

arXiv:2303.03242 [pdf, other]

Evaluating the Fairness of Deep Learning Uncertainty Estimates in Medical Image Analysis

Authors: Raghav Mehta, Changjian Shui, Tal Arbel

Abstract: Although deep learning (DL) models have shown great success in many medical image analysis tasks, deployment of the resulting models into real clinical contexts requires: (1) that they exhibit robustness and fairness across different sub-populations, and (2) that the confidence in DL model predictions be accurately expressed in the form of uncertainties. Unfortunately, recent studies have indeed s… ▽ More Although deep learning (DL) models have shown great success in many medical image analysis tasks, deployment of the resulting models into real clinical contexts requires: (1) that they exhibit robustness and fairness across different sub-populations, and (2) that the confidence in DL model predictions be accurately expressed in the form of uncertainties. Unfortunately, recent studies have indeed shown significant biases in DL models across demographic subgroups (e.g., race, sex, age) in the context of medical image analysis, indicating a lack of fairness in the models. Although several methods have been proposed in the ML literature to mitigate a lack of fairness in DL models, they focus entirely on the absolute performance between groups without considering their effect on uncertainty estimation. In this work, we present the first exploration of the effect of popular fairness models on overcoming biases across subgroups in medical image analysis in terms of bottom-line performance, and their effects on uncertainty quantification. We perform extensive experiments on three different clinically relevant tasks: (i) skin lesion classification, (ii) brain tumour segmentation, and (iii) Alzheimer's disease clinical score regression. Our results indicate that popular ML methods, such as data-balancing and distributionally robust optimization, succeed in mitigating fairness issues in terms of the model performances for some of the tasks. However, this can come at the cost of poor uncertainty estimates associated with the model predictions. This tradeoff must be mitigated if fairness models are to be adopted in medical image analysis. △ Less

Submitted 6 March, 2023; originally announced March 2023.

Comments: Paper accepted at MIDL 2023

arXiv:2212.06254 [pdf, other]

You Only Need a Good Embeddings Extractor to Fix Spurious Correlations

Authors: Raghav Mehta, Vítor Albiero, Li Chen, Ivan Evtimov, Tamar Glaser, Zhiheng Li, Tal Hassner

Abstract: Spurious correlations in training data often lead to robustness issues since models learn to use them as shortcuts. For example, when predicting whether an object is a cow, a model might learn to rely on its green background, so it would do poorly on a cow on a sandy background. A standard dataset for measuring state-of-the-art on methods mitigating this problem is Waterbirds. The best method (Gro… ▽ More Spurious correlations in training data often lead to robustness issues since models learn to use them as shortcuts. For example, when predicting whether an object is a cow, a model might learn to rely on its green background, so it would do poorly on a cow on a sandy background. A standard dataset for measuring state-of-the-art on methods mitigating this problem is Waterbirds. The best method (Group Distributionally Robust Optimization - GroupDRO) currently achieves 89\% worst group accuracy and standard training from scratch on raw images only gets 72\%. GroupDRO requires training a model in an end-to-end manner with subgroup labels. In this paper, we show that we can achieve up to 90\% accuracy without using any sub-group information in the training set by simply using embeddings from a large pre-trained vision model extractor and training a linear classifier on top of it. With experiments on a wide range of pre-trained models and pre-training datasets, we show that the capacity of the pre-training model and the size of the pre-training dataset matters. Our experiments reveal that high capacity vision transformers perform better compared to high capacity convolutional neural networks, and larger pre-training dataset leads to better worst-group accuracy on the spurious correlation dataset. △ Less

Submitted 12 December, 2022; originally announced December 2022.

Comments: Accepted at ECCV 2022 workshop on Responsible Computer Vision (RCV)

arXiv:2212.05149 [pdf, other]

Stochastic Optimization for Spectral Risk Measures

Authors: Ronak Mehta, Vincent Roulet, Krishna Pillutla, Lang Liu, Zaid Harchaoui

Abstract: Spectral risk objectives - also called $L$-risks - allow for learning systems to interpolate between optimizing average-case performance (as in empirical risk minimization) and worst-case performance on a task. We develop stochastic algorithms to optimize these quantities by characterizing their subdifferential and addressing challenges such as biasedness of subgradient estimates and non-smoothnes… ▽ More Spectral risk objectives - also called $L$-risks - allow for learning systems to interpolate between optimizing average-case performance (as in empirical risk minimization) and worst-case performance on a task. We develop stochastic algorithms to optimize these quantities by characterizing their subdifferential and addressing challenges such as biasedness of subgradient estimates and non-smoothness of the objective. We show theoretically and experimentally that out-of-the-box approaches such as stochastic subgradient and dual averaging are hindered by bias and that our approach outperforms them. △ Less

Submitted 9 December, 2022; originally announced December 2022.

arXiv:2211.15836 [pdf, ps, other]

On the Envy-free Allocation of Chores

Authors: Lang Yin, Ruta Mehta

Abstract: We study the problem of allocating a set of indivisible chores to three agents, among whom two have additive cost functions, in a fair manner. Two fairness notions under consideration are envy-freeness up to any chore (EFX) and a relaxed notion, namely envy-freeness up to transferring any chore (tEFX). In contrast to the case of goods, the case of chores remain relatively unexplored. In particular… ▽ More We study the problem of allocating a set of indivisible chores to three agents, among whom two have additive cost functions, in a fair manner. Two fairness notions under consideration are envy-freeness up to any chore (EFX) and a relaxed notion, namely envy-freeness up to transferring any chore (tEFX). In contrast to the case of goods, the case of chores remain relatively unexplored. In particular, our results constructively prove the existence of a tEFX allocation for three agents if two of them have additive cost functions and the ratio of their highest and lowest costs is bounded by two. In addition, if those two cost functions have identical ordering (IDO) on the costs of chores, then an EFX allocation exists even if the condition on the ratio bound is slightly relaxed. Throughout our entire framework, the third agent is unrestricted besides having a monotone cost function. △ Less

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2211.02091 [pdf, ps, other]

Fairness in Federated Learning via Core-Stability

Authors: Bhaskar Ray Chaudhury, Linyi Li, Mintong Kang, Bo Li, Ruta Mehta

Abstract: Federated learning provides an effective paradigm to jointly optimize a model benefited from rich distributed data while protecting data privacy. Nonetheless, the heterogeneity nature of distributed data makes it challenging to define and ensure fairness among local agents. For instance, it is intuitively "unfair" for agents with data of high quality to sacrifice their performance due to other age… ▽ More Federated learning provides an effective paradigm to jointly optimize a model benefited from rich distributed data while protecting data privacy. Nonetheless, the heterogeneity nature of distributed data makes it challenging to define and ensure fairness among local agents. For instance, it is intuitively "unfair" for agents with data of high quality to sacrifice their performance due to other agents with low quality data. Currently popular egalitarian and weighted equity-based fairness measures suffer from the aforementioned pitfall. In this work, we aim to formally represent this problem and address these fairness issues using concepts from co-operative game theory and social choice theory. We model the task of learning a shared predictor in the federated setting as a fair public decision making problem, and then define the notion of core-stable fairness: Given $N$ agents, there is no subset of agents $S$ that can benefit significantly by forming a coalition among themselves based on their utilities $U_N$ and $U_S$ (i.e., $\frac{|S|}{N} U_S \geq U_N$). Core-stable predictors are robust to low quality local data from some agents, and additionally they satisfy Proportionality and Pareto-optimality, two well sought-after fairness and efficiency notions within social choice. We then propose an efficient federated learning protocol CoreFed to optimize a core stable predictor. CoreFed determines a core-stable predictor when the loss functions of the agents are convex. CoreFed also determines approximate core-stable predictors when the loss functions are not convex, like smooth neural networks. We further show the existence of core-stable predictors in more general settings using Kakutani's fixed point theorem. Finally, we empirically validate our analysis on two real-world datasets, and we show that CoreFed achieves higher core-stability fairness than FedAvg while having similar accuracy. △ Less

Submitted 3 November, 2022; originally announced November 2022.

Comments: NeurIPS 2022; code: https://openreview.net/attachment?id=lKULHf7oFDo&name=supplementary_material

arXiv:2210.17398 [pdf, other]

Rethinking Generalization: The Impact of Annotation Style on Medical Image Segmentation

Authors: Brennan Nichyporuk, Jillian Cardinell, Justin Szeto, Raghav Mehta, Jean-Pierre R. Falet, Douglas L. Arnold, Sotirios A. Tsaftaris, Tal Arbel

Abstract: Generalization is an important attribute of machine learning models, particularly for those that are to be deployed in a medical context, where unreliable predictions can have real world consequences. While the failure of models to generalize across datasets is typically attributed to a mismatch in the data distributions, performance gaps are often a consequence of biases in the 'ground-truth' lab… ▽ More Generalization is an important attribute of machine learning models, particularly for those that are to be deployed in a medical context, where unreliable predictions can have real world consequences. While the failure of models to generalize across datasets is typically attributed to a mismatch in the data distributions, performance gaps are often a consequence of biases in the 'ground-truth' label annotations. This is particularly important in the context of medical image segmentation of pathological structures (e.g. lesions), where the annotation process is much more subjective, and affected by a number underlying factors, including the annotation protocol, rater education/experience, and clinical aims, among others. In this paper, we show that modeling annotation biases, rather than ignoring them, poses a promising way of accounting for differences in annotation style across datasets. To this end, we propose a generalized conditioning framework to (1) learn and account for different annotation styles across multiple datasets using a single model, (2) identify similar annotation styles across different datasets in order to permit their effective aggregation, and (3) fine-tune a fully trained model to a new annotation style with just a few samples. Next, we present an image-conditioning approach to model annotation styles that correlate with specific image features, potentially enabling detection biases to be more easily identified. △ Less

Submitted 13 December, 2022; v1 submitted 31 October, 2022; originally announced October 2022.

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://www.melba-journal.org/papers/2022:029.html

Journal ref: Machine.Learning.for.Biomedical.Imaging. 1 (2022)

arXiv:2209.07988 [pdf, ps, other]

Prophet Inequalities for Cost Minimization

Authors: Vasilis Livanos, Ruta Mehta

Abstract: Prophet inequalities for rewards maximization are fundamental to optimal stopping theory with extensive applications to mechanism design and online optimization. We study the \emph{cost minimization} counterpart of the classical prophet inequality: a decision maker is facing a sequence of costs $X_1, X_2, \dots, X_n$ drawn from known distributions in an online manner and \emph{must} ``stop'' at so… ▽ More Prophet inequalities for rewards maximization are fundamental to optimal stopping theory with extensive applications to mechanism design and online optimization. We study the \emph{cost minimization} counterpart of the classical prophet inequality: a decision maker is facing a sequence of costs $X_1, X_2, \dots, X_n$ drawn from known distributions in an online manner and \emph{must} ``stop'' at some point and take the last cost seen. The goal is to compete with a ``prophet'' who can see the realizations of all $X_i$'s upfront and always select the minimum, obtaining a cost of $\mathbb{E}[\min_i X_i]$. If the $X_i$'s are not identically distributed, no strategy can achieve a bounded approximation, even for random arrival order and $n = 2$. This leads us to consider the case where the $X_i$'s are independent and identically distributed (I.I.D.). For the I.I.D. case, we show that if the distribution satisfies a mild condition, the optimal stopping strategy achieves a (distribution-dependent) constant-factor approximation to the prophet's cost. Moreover, for MHR distributions, this constant is at most $2$. All our results are tight. We also demonstrate an example distribution that does not satisfy the condition and for which the competitive ratio of any algorithm is infinite. Turning our attention to single-threshold strategies, we design a threshold that achieves a $O\left(polylog{n}\right)$-factor approximation, where the exponent in the logarithmic factor is a distribution-dependent constant, and we show a matching lower bound. Finally, we note that our results can be used to design approximately optimal posted price-style mechanisms for procurement auctions which may be of independent interest. Our techniques utilize the \emph{hazard rate} of the distribution in a novel way, allowing for a fine-grained analysis which could find further applications in prophet inequalities. △ Less

Submitted 23 February, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

Comments: 40 pages

arXiv:2208.00974 [pdf, other]

Information Gain Sampling for Active Learning in Medical Image Classification

Authors: Raghav Mehta, Changjian Shui, Brennan Nichyporuk, Tal Arbel

Abstract: Large, annotated datasets are not widely available in medical image analysis due to the prohibitive time, costs, and challenges associated with labelling large datasets. Unlabelled datasets are easier to obtain, and in many contexts, it would be feasible for an expert to provide labels for a small subset of images. This work presents an information-theoretic active learning framework that guides t… ▽ More Large, annotated datasets are not widely available in medical image analysis due to the prohibitive time, costs, and challenges associated with labelling large datasets. Unlabelled datasets are easier to obtain, and in many contexts, it would be feasible for an expert to provide labels for a small subset of images. This work presents an information-theoretic active learning framework that guides the optimal selection of images from the unlabelled pool to be labeled based on maximizing the expected information gain (EIG) on an evaluation dataset. Experiments are performed on two different medical image classification datasets: multi-class diabetic retinopathy disease scale classification and multi-class skin lesion classification. Results indicate that by adapting EIG to account for class-imbalances, our proposed Adapted Expected Information Gain (AEIG) outperforms several popular baselines including the diversity based CoreSet and uncertainty based maximum entropy sampling. Specifically, AEIG achieves ~95% of overall performance with only 19% of the training data, while other active learning approaches require around 25%. We show that, by careful design choices, our model can be integrated into existing deep learning classifiers. △ Less

Submitted 1 August, 2022; originally announced August 2022.

Comments: Paper accepted at UNSURE 2022 workshop at MICCAI 2022

arXiv:2205.11363 [pdf, other]

Competitive Equilibrium with Chores: Combinatorial Algorithm and Hardness

Authors: Bhaskar Ray Chaudhury, Jugal Garg, Peter McGlaughlin, Ruta Mehta

Abstract: We study the computational complexity of finding a competitive equilibrium (CE) with chores when agents have linear preferences. CE is one of the most preferred mechanisms for allocating a set of items among agents. CE with equal incomes (CEEI), Fisher, and Arrow-Debreu (exchange) are the fundamental economic models to study allocation problems, where CEEI is a special case of Fisher and Fisher is… ▽ More We study the computational complexity of finding a competitive equilibrium (CE) with chores when agents have linear preferences. CE is one of the most preferred mechanisms for allocating a set of items among agents. CE with equal incomes (CEEI), Fisher, and Arrow-Debreu (exchange) are the fundamental economic models to study allocation problems, where CEEI is a special case of Fisher and Fisher is a special case of exchange. When the items are goods (giving utility), the CE set is convex even in the exchange model, facilitating several combinatorial polynomial-time algorithms (starting with the seminal work of Devanur, Papadimitriou, Saberi and Vazirani) for all of these models. In sharp contrast, when the items are chores (giving disutility), the CE set is known to be non-convex and disconnected even in the CEEI model. Further, no combinatorial algorithms or hardness results are known for these models. In this paper, we give two main results for CE with chores: 1) A combinatorial algorithm to compute a $(1-\varepsilon)$-approximate CEEI in time $\tilde{\mathcal{O}}(n^4m^2 / \varepsilon^2)$, where $n$ is the number of agents and $m$ is the number of chores. 2) PPAD-hardness of finding a $(1-1/\mathit{poly}(n))$-approximate CE in the exchange model under a sufficient condition. To the best of our knowledge, these results show the first separation between the CEEI and exchange models when agents have linear preferences, assuming PPAD $\neq $ P. Finally, we show that our new insight implies a straightforward proof of the existence of an allocation that is both envy-free up to one chore (EF1) and Pareto optimal (PO) in the discrete setting when agents have factored bivalued preferences. △ Less

Submitted 23 May, 2022; originally announced May 2022.

Comments: Accepted in EC 2022. The PPAD hardness section also appeared in arXiv:2008.00285

MSC Class: Computer Science

arXiv:2205.07638 [pdf, other]

EFX Allocations: Simplifications and Improvements

Authors: Hannaneh Akrami, Noga Alon, Bhaskar Ray Chaudhury, Jugal Garg, Kurt Mehlhorn, Ruta Mehta

Abstract: The existence of EFX allocations is a fundamental open problem in discrete fair division. Given a set of agents and indivisible goods, the goal is to determine the existence of an allocation where no agent envies another following the removal of any single good from the other agent's bundle. Since the general problem has been illusive, progress is made on two fronts: $(i)$ proving existence when t… ▽ More The existence of EFX allocations is a fundamental open problem in discrete fair division. Given a set of agents and indivisible goods, the goal is to determine the existence of an allocation where no agent envies another following the removal of any single good from the other agent's bundle. Since the general problem has been illusive, progress is made on two fronts: $(i)$ proving existence when the number of agents is small, $(ii)$ proving existence of relaxations of EFX. In this paper, we improve results on both fronts (and simplify in one of the cases). We prove the existence of EFX allocations with three agents, restricting only one agent to have an MMS-feasible valuation function (a strict generalization of nice-cancelable valuation functions introduced by Berger et al. which subsumes additive, budget-additive and unit demand valuation functions). The other agents may have any monotone valuation functions. Our proof technique is significantly simpler and shorter than the proof by Chaudhury et al. on existence of EFX allocations when there are three agents with additive valuation functions and therefore more accessible. Secondly, we consider relaxations of EFX allocations, namely, approximate-EFX allocations and EFX allocations with few unallocated goods (charity). Chaudhury et al. showed the existence of $(1-ε)$-EFX allocation with $O((n/ε)^{\frac{4}{5}})$ charity by establishing a connection to a problem in extremal combinatorics. We improve their result and prove the existence of $(1-ε)$-EFX allocations with $\tilde{O}((n/ ε)^{\frac{1}{2}})$ charity. In fact, some of our techniques can be used to prove improved upper-bounds on a problem in zero-sum combinatorics introduced by Alon and Krivelevich. △ Less

Submitted 23 December, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

arXiv:2205.03987 [pdf]

Methodology to Create Analysis-Naive Holdout Records as well as Train and Test Records for Machine Learning Analyses in Healthcare

Authors: Michele Bennett, Mehdi Nekouei, Armand Prieditis Rajesh Mehta, Ewa Kleczyk, Karin Hayes

Abstract: It is common for researchers to holdout data from a study pool to be used for external validation as well as for future research, and the same desire is true to those using machine learning modeling research. For this discussion, the purpose of the holdout sample it is preserve data for research studies that will be analysis-naive and randomly selected from the full dataset. Analysis-naive are rec… ▽ More It is common for researchers to holdout data from a study pool to be used for external validation as well as for future research, and the same desire is true to those using machine learning modeling research. For this discussion, the purpose of the holdout sample it is preserve data for research studies that will be analysis-naive and randomly selected from the full dataset. Analysis-naive are records that are not used for testing or training machine learning (ML) models and records that do not participate in any aspect of the current machine learning study. The methodology suggested for creating holdouts is a modification of k-fold cross validation, which takes into account randomization and efficiently allows a three-way split (holdout, test and training) as part of the method without forcing. The paper also provides a working example using set of automated functions in Python and some scenarios for applicability in healthcare. △ Less

Submitted 8 May, 2022; originally announced May 2022.

Comments: 11 pages, 1 figure

arXiv:2204.07655 [pdf, other]

Deep Unlearning via Randomized Conditionally Independent Hessians

Authors: Ronak Mehta, Sourav Pal, Vikas Singh, Sathya N. Ravi

Abstract: Recent legislation has led to interest in machine unlearning, i.e., removing specific training samples from a predictive model as if they never existed in the training dataset. Unlearning may also be required due to corrupted/adversarial data or simply a user's updated privacy requirement. For models which require no training (k-NN), simply deleting the closest original sample can be effective. Bu… ▽ More Recent legislation has led to interest in machine unlearning, i.e., removing specific training samples from a predictive model as if they never existed in the training dataset. Unlearning may also be required due to corrupted/adversarial data or simply a user's updated privacy requirement. For models which require no training (k-NN), simply deleting the closest original sample can be effective. But this idea is inapplicable to models which learn richer representations. Recent ideas leveraging optimization-based updates scale poorly with the model dimension d, due to inverting the Hessian of the loss function. We use a variant of a new conditional independence coefficient, L-CODEC, to identify a subset of the model parameters with the most semantic overlap on an individual sample level. Our approach completely avoids the need to invert a (possibly) huge matrix. By utilizing a Markov blanket selection, we premise that L-CODEC is also suitable for deep unlearning, as well as other applications in vision. Compared to alternatives, L-CODEC makes approximate unlearning possible in settings that would otherwise be infeasible, including vision models used for face recognition, person re-identification and NLP models that may require unlearning samples identified for exclusion. Code can be found at https://github.com/vsingh-group/LCODEC-deep-unlearning/ △ Less

Submitted 13 July, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

Comments: CVPR 2022. Supplement appended to end of main paper (total 15 pages). Ronak Mehta and Sourav Pal equal contribution

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 10422-10431

arXiv:2204.00348 [pdf, other]

WavFT: Acoustic model finetuning with labelled and unlabelled data

Authors: Utkarsh Chauhan, Vikas Joshi, Rupesh R. Mehta

Abstract: Unsupervised and self-supervised learning methods have leveraged unlabelled data to improve the pretrained models. However, these methods need significantly large amount of unlabelled data and the computational cost of training models with such large amount of data can be prohibitively high. We address this issue by using unlabelled data during finetuning, instead of pretraining. We propose acoust… ▽ More Unsupervised and self-supervised learning methods have leveraged unlabelled data to improve the pretrained models. However, these methods need significantly large amount of unlabelled data and the computational cost of training models with such large amount of data can be prohibitively high. We address this issue by using unlabelled data during finetuning, instead of pretraining. We propose acoustic model finetuning (FT) using labelled and unlabelled data. The model is jointly trained to learn representations to classify senones, as well as learn contextual acoustic representations. Our training objective is a combination of cross entropy loss, suitable for classification task, and contrastive loss, suitable to learn acoustic representations. The proposed approach outperforms conventional finetuning with 11.2% and 9.19% word error rate relative (WERR) reduction on Gujarati and Bengali languages respectively. △ Less

Submitted 1 April, 2022; originally announced April 2022.

arXiv:2202.09478 [pdf, other]

Graph Reparameterizations for Enabling 1000+ Monte Carlo Iterations in Bayesian Deep Neural Networks

Authors: Jurijs Nazarovs, Ronak R. Mehta, Vishnu Suresh Lokhande, Vikas Singh

Abstract: Uncertainty estimation in deep models is essential in many real-world applications and has benefited from developments over the last several years. Recent evidence suggests that existing solutions dependent on simple Gaussian formulations may not be sufficient. However, moving to other distributions necessitates Monte Carlo (MC) sampling to estimate quantities such as the KL divergence: it could b… ▽ More Uncertainty estimation in deep models is essential in many real-world applications and has benefited from developments over the last several years. Recent evidence suggests that existing solutions dependent on simple Gaussian formulations may not be sufficient. However, moving to other distributions necessitates Monte Carlo (MC) sampling to estimate quantities such as the KL divergence: it could be expensive and scales poorly as the dimensions of both the input data and the model grow. This is directly related to the structure of the computation graph, which can grow linearly as a function of the number of MC samples needed. Here, we construct a framework to describe these computation graphs, and identify probability families where the graph size can be independent or only weakly dependent on the number of MC samples. These families correspond directly to large classes of distributions. Empirically, we can run a much larger number of iterations for MC approximations for larger architectures used in computer vision with gains in performance measured in confident accuracy, stability of training, memory and training time. △ Less

Submitted 18 February, 2022; originally announced February 2022.

Journal ref: Proceedings of Machine Learning Research; PMLR 161:118-128, 2021

arXiv:2202.02672 [pdf, ps, other]

(Almost) Envy-Free, Proportional and Efficient Allocations of an Indivisible Mixed Manna

Authors: Vasilis Livanos, Ruta Mehta, Aniket Murhekar

Abstract: We study the problem of finding fair and efficient allocations of a set of indivisible items to a set of agents, where each item may be a good (positively valued) for some agents and a bad (negatively valued) for others, i.e., a mixed manna. As fairness notions, we consider arguably the strongest possible relaxations of envy-freeness and proportionality, namely envy-free up to any item (EFX and EF… ▽ More We study the problem of finding fair and efficient allocations of a set of indivisible items to a set of agents, where each item may be a good (positively valued) for some agents and a bad (negatively valued) for others, i.e., a mixed manna. As fairness notions, we consider arguably the strongest possible relaxations of envy-freeness and proportionality, namely envy-free up to any item (EFX and EFX$_0$), and proportional up to the maximin good or any bad (PropMX and PropMX$_0$). Our efficiency notion is Pareto-optimality (PO). We study two types of instances: (i) Separable, where the item set can be partitioned into goods and bads, and (ii) Restricted mixed goods (RMG), where for each item $j$, every agent has either a non-positive value for $j$, or values $j$ at the same $v_j>0$. We obtain polynomial-time algorithms for the following: (i) Separable instances: PropMX$_0$ allocation. (ii) RMG instances: Let pure bads be the set of items that everyone values negatively. - PropMX allocation for general pure bads. - EFX+PropMX allocation for identically-ordered pure bads. - EFX+PropMX+PO allocation for identical pure bads. Finally, if the RMG instances are further restricted to binary mixed goods where all the $v_j$'s are the same, we strengthen the results to guarantee EFX$_0$ and PropMX$_0$ respectively. △ Less

Submitted 5 February, 2022; originally announced February 2022.

Comments: 23 pages

arXiv:2201.02469 [pdf]

Similarities and Differences between Machine Learning and Traditional Advanced Statistical Modeling in Healthcare Analytics

Authors: Michele Bennett, Karin Hayes, Ewa J. Kleczyk, Rajesh Mehta

Abstract: Data scientists and statisticians are often at odds when determining the best approach, machine learning or statistical modeling, to solve an analytics challenge. However, machine learning and statistical modeling are more cousins than adversaries on different sides of an analysis battleground. Choosing between the two approaches or in some cases using both is based on the problem to be solved and… ▽ More Data scientists and statisticians are often at odds when determining the best approach, machine learning or statistical modeling, to solve an analytics challenge. However, machine learning and statistical modeling are more cousins than adversaries on different sides of an analysis battleground. Choosing between the two approaches or in some cases using both is based on the problem to be solved and outcomes required as well as the data available for use and circumstances of the analysis. Machine learning and statistical modeling are complementary, based on similar mathematical principles, but simply using different tools in an overall analytics knowledge base. Determining the predominant approach should be based on the problem to be solved as well as empirical evidence, such as size and completeness of the data, number of variables, assumptions or lack thereof, and expected outcomes such as predictions or causality. Good analysts and data scientists should be well versed in both techniques and their proper application, thereby using the right tool for the right project to achieve the desired results. △ Less

Submitted 7 January, 2022; originally announced January 2022.

Comments: 16 pages, 2 figures

arXiv:2112.10074 [pdf, other]

doi 10.59275/j.melba.2022-354b

QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation - Analysis of Ranking Scores and Benchmarking Results

Authors: Raghav Mehta, Angelos Filos, Ujjwal Baid, Chiharu Sako, Richard McKinley, Michael Rebsamen, Katrin Datwyler, Raphael Meier, Piotr Radojewski, Gowtham Krishnan Murugesan, Sahil Nalawade, Chandan Ganesh, Ben Wagner, Fang F. Yu, Baowei Fei, Ananth J. Madhuranthakam, Joseph A. Maldjian, Laura Daza, Catalina Gomez, Pablo Arbelaez, Chengliang Dai, Shuo Wang, Hadrien Reynaud, Yuan-han Mo, Elsa Angelini , et al. (67 additional authors not shown)

Abstract: Deep learning (DL) models have provided state-of-the-art performance in various medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder translating DL models into clinical workflows. Quantifying… ▽ More Deep learning (DL) models have provided state-of-the-art performance in various medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder translating DL models into clinical workflows. Quantifying the reliability of DL model predictions in the form of uncertainties could enable clinical review of the most uncertain regions, thereby building trust and paving the way toward clinical translation. Several uncertainty estimation methods have recently been introduced for DL medical image segmentation tasks. Developing scores to evaluate and compare the performance of uncertainty measures will assist the end-user in making more informed decisions. In this study, we explore and evaluate a score developed during the BraTS 2019 and BraTS 2020 task on uncertainty quantification (QU-BraTS) and designed to assess and rank uncertainty estimates for brain tumor multi-compartment segmentation. This score (1) rewards uncertainty estimates that produce high confidence in correct assertions and those that assign low confidence levels at incorrect assertions, and (2) penalizes uncertainty measures that lead to a higher percentage of under-confident correct assertions. We further benchmark the segmentation uncertainties generated by 14 independent participating teams of QU-BraTS 2020, all of which also participated in the main BraTS segmentation task. Overall, our findings confirm the importance and complementary value that uncertainty estimates provide to segmentation algorithms, highlighting the need for uncertainty quantification in medical image analyses. Finally, in favor of transparency and reproducibility, our evaluation code is made publicly available at: https://github.com/RagMeh11/QU-BraTS. △ Less

Submitted 23 August, 2022; v1 submitted 19 December, 2021; originally announced December 2021.

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA): https://www.melba-journal.org/papers/2022:026.html

Journal ref: Machine.Learning.for.Biomedical.Imaging. 1 (2022)

arXiv:2111.04851 [pdf, other]

doi 10.3389/frobt.2022.888261

In-Situ Sensing and Dynamics Predictions for Electrothermally-Actuated Soft Robot Limbs

Authors: Andrew P. Sabelhaus, Rohan K. Mehta, Anthony T. Wertz, Carmel Majidi

Abstract: Untethered soft robots that locomote using electrothermally-responsive materials like shape memory alloy (SMA) face challenging design constraints for sensing actuator states. At the same time, modeling of actuator behaviors faces steep challenges, even with available sensor data, due to complex electrical-thermal-mechanical interactions and hysteresis. This article proposes a framework for in-sit… ▽ More Untethered soft robots that locomote using electrothermally-responsive materials like shape memory alloy (SMA) face challenging design constraints for sensing actuator states. At the same time, modeling of actuator behaviors faces steep challenges, even with available sensor data, due to complex electrical-thermal-mechanical interactions and hysteresis. This article proposes a framework for in-situ sensing and dynamics modeling of actuator states, particularly temperature of SMA wires, which is used to predict robot motions. A planar soft limb is developed, actuated by a pair of SMA coils, that includes compact and robust sensors for temperature and angular deflection. Data from these sensors are used to train a neural network based on the long short-term memory (LSTM) architecture to model both unidirectional (single SMA) and bidirectional (both SMAs) motion. Predictions from the model demonstrate that data from the temperature sensor, combined with control inputs, allow for dynamics predictions over extraordinarily long open-loop timescales (10 minutes) with little drift. Prediction errors are on the order of the soft deflection sensor's accuracy. This architecture allows for compact designs of electrothermally-actuated soft robots that include sensing sufficient for motion predictions, helping to bring these robots into practical application. △ Less

Submitted 7 March, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

Comments: 17 pages, 8 figures

Journal ref: Frontiers in Robotics and AI, 17 May 2022

arXiv:2111.01561 [pdf, other]

Sub-cortical structure segmentation database for young population

Authors: Jayanthi Sivaswamy, Alphin J Thottupattu, Mythri V, Raghav Mehta, R Sheelakumari, Chandrasekharan Kesavadas

Abstract: Segmentation of sub-cortical structures from MRI scans is of interest in many neurological diagnosis. Since this is a laborious task machine learning and specifically deep learning (DL) methods have become explored. The structural complexity of the brain demands a large, high quality segmentation dataset to develop good DL-based solutions for sub-cortical structure segmentation. Towards this, we a… ▽ More Segmentation of sub-cortical structures from MRI scans is of interest in many neurological diagnosis. Since this is a laborious task machine learning and specifically deep learning (DL) methods have become explored. The structural complexity of the brain demands a large, high quality segmentation dataset to develop good DL-based solutions for sub-cortical structure segmentation. Towards this, we are releasing a set of 114, 1.5 Tesla, T1 MRI scans with manual delineations for 14 sub-cortical structures. The scans in the dataset were acquired from healthy young (21-30 years) subjects ( 58 male and 56 female) and all the structures are manually delineated by experienced radiology experts. Segmentation experiments have been conducted with this dataset and results demonstrate that accurate results can be obtained with deep-learning methods. Our sub-cortical structure segmentation dataset, Indian Brain Segmentation Dataset (IBSD) is made openly available at \url{https://doi.org/10.5281/zenodo.5656776}. △ Less

Submitted 9 November, 2021; v1 submitted 1 November, 2021; originally announced November 2021.

arXiv:2109.14501 [pdf, other]

Towards a theory of out-of-distribution learning

Authors: Jayanta Dey, Ali Geisa, Ronak Mehta, Tyler M. Tomita, Hayden S. Helm, Haoyin Xu, Eric Eaton, Jeffery Dick, Carey E. Priebe, Joshua T. Vogelstein

Abstract: Learning is a process wherein a learning agent enhances its performance through exposure of experience or data. Throughout this journey, the agent may encounter diverse learning environments. For example, data may be presented to the leaner all at once, in multiple batches, or sequentially. Furthermore, the distribution of each data sample could be either identical and independent (iid) or non-iid… ▽ More Learning is a process wherein a learning agent enhances its performance through exposure of experience or data. Throughout this journey, the agent may encounter diverse learning environments. For example, data may be presented to the leaner all at once, in multiple batches, or sequentially. Furthermore, the distribution of each data sample could be either identical and independent (iid) or non-iid. Additionally, there may exist computational and space constraints for the deployment of the learning algorithms. The complexity of a learning task can vary significantly, depending on the learning setup and the constraints imposed upon it. However, it is worth noting that the current literature lacks formal definitions for many of the in-distribution and out-of-distribution learning paradigms. Establishing proper and universally agreed-upon definitions for these learning setups is essential for thoroughly exploring the evolution of ideas across different learning scenarios and deriving generalized mathematical bounds for these learners. In this paper, we aim to address this issue by proposing a chronological approach to defining different learning tasks using the provably approximately correct (PAC) learning framework. We will start with in-distribution learning and progress to recently proposed lifelong or continual learning. We employ consistent terminology and notation to demonstrate how each of these learning frameworks represents a specific instance of a broader, more generalized concept of learnability. Our hope is that this work will inspire a universally agreed-upon approach to quantifying different types of learning, fostering greater understanding and progress in the field. △ Less

Submitted 7 June, 2024; v1 submitted 29 September, 2021; originally announced September 2021.

arXiv:2108.00713 [pdf, other]

Cohort Bias Adaptation in Aggregated Datasets for Lesion Segmentation

Authors: Brennan Nichyporuk, Jillian Cardinell, Justin Szeto, Raghav Mehta, Sotirios Tsaftaris, Douglas L. Arnold, Tal Arbel

Abstract: Many automatic machine learning models developed for focal pathology (e.g. lesions, tumours) detection and segmentation perform well, but do not generalize as well to new patient cohorts, impeding their widespread adoption into real clinical contexts. One strategy to create a more diverse, generalizable training set is to naively pool datasets from different cohorts. Surprisingly, training on this… ▽ More Many automatic machine learning models developed for focal pathology (e.g. lesions, tumours) detection and segmentation perform well, but do not generalize as well to new patient cohorts, impeding their widespread adoption into real clinical contexts. One strategy to create a more diverse, generalizable training set is to naively pool datasets from different cohorts. Surprisingly, training on this \it{big data} does not necessarily increase, and may even reduce, overall performance and model generalizability, due to the existence of cohort biases that affect label distributions. In this paper, we propose a generalized affine conditioning framework to learn and account for cohort biases across multi-source datasets, which we call Source-Conditioned Instance Normalization (SCIN). Through extensive experimentation on three different, large scale, multi-scanner, multi-centre Multiple Sclerosis (MS) clinical trial MRI datasets, we show that our cohort bias adaptation method (1) improves performance of the network on pooled datasets relative to naively pooling datasets and (2) can quickly adapt to a new cohort by fine-tuning the instance normalization parameters, thus learning the new cohort bias with only 10 labelled samples. △ Less

Submitted 18 May, 2022; v1 submitted 2 August, 2021; originally announced August 2021.

Comments: Accepted at DART 2021

arXiv:2107.06649 [pdf, other]

Polynomial Time Algorithms to Find an Approximate Competitive Equilibrium for Chores

Authors: Shant Boodaghians, Bhaskar Ray Chaudhury, Ruta Mehta

Abstract: Competitive equilibrium with equal income (CEEI) is considered one of the best mechanisms to allocate a set of items among agents fairly and efficiently. In this paper, we study the computation of CEEI when items are chores that are disliked (negatively valued) by agents, under 1-homogeneous and concave utility functions which includes linear functions as a subcase. It is well-known that, even wit… ▽ More Competitive equilibrium with equal income (CEEI) is considered one of the best mechanisms to allocate a set of items among agents fairly and efficiently. In this paper, we study the computation of CEEI when items are chores that are disliked (negatively valued) by agents, under 1-homogeneous and concave utility functions which includes linear functions as a subcase. It is well-known that, even with linear utilities, the set of CEEI may be non-convex and disconnected, and the problem is PPAD-hard in the more general exchange model. In contrast to these negative results, we design FPTAS: A polynomial-time algorithm to compute $ε$-approximate CEEI where the running-time depends polynomially on $1/ε$. Our algorithm relies on the recent characterization due to Bogomolnaia et al.~(2017) of the CEEI set as exactly the KKT points of a non-convex minimization problem that have all coordinates non-zero. Due to this non-zero constraint, naive gradient-based methods fail to find the desired local minima as they are attracted towards zero. We develop an exterior-point method that alternates between guessing non-zero KKT points and maximizing the objective along supporting hyperplanes at these points. We show that this procedure must converge quickly to an approximate KKT point which then can be mapped to an approximate CEEI; this exterior point method may be of independent interest. When utility functions are linear, we give explicit procedures for finding the exact iterates, and as a result show that a stronger form of approximate CEEI can be found in polynomial time. Finally, we note that our algorithm extends to the setting of un-equal incomes (CE), and to mixed manna with linear utilities where each agent may like (positively value) some items and dislike (negatively value) others. △ Less

Submitted 17 July, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

arXiv:2103.16617 [pdf, other]

HAD-Net: A Hierarchical Adversarial Knowledge Distillation Network for Improved Enhanced Tumour Segmentation Without Post-Contrast Images

Authors: Saverio Vadacchino, Raghav Mehta, Nazanin Mohammadi Sepahvand, Brennan Nichyporuk, James J. Clark, Tal Arbel

Abstract: Segmentation of enhancing tumours or lesions from MRI is important for detecting new disease activity in many clinical contexts. However, accurate segmentation requires the inclusion of medical images (e.g., T1 post contrast MRI) acquired after injecting patients with a contrast agent (e.g., Gadolinium), a process no longer thought to be safe. Although a number of modality-agnostic segmentation ne… ▽ More Segmentation of enhancing tumours or lesions from MRI is important for detecting new disease activity in many clinical contexts. However, accurate segmentation requires the inclusion of medical images (e.g., T1 post contrast MRI) acquired after injecting patients with a contrast agent (e.g., Gadolinium), a process no longer thought to be safe. Although a number of modality-agnostic segmentation networks have been developed over the past few years, they have been met with limited success in the context of enhancing pathology segmentation. In this work, we present HAD-Net, a novel offline adversarial knowledge distillation (KD) technique, whereby a pre-trained teacher segmentation network, with access to all MRI sequences, teaches a student network, via hierarchical adversarial training, to better overcome the large domain shift presented when crucial images are absent during inference. In particular, we apply HAD-Net to the challenging task of enhancing tumour segmentation when access to post-contrast imaging is not available. The proposed network is trained and tested on the BraTS 2019 brain tumour segmentation challenge dataset, where it achieves performance improvements in the ranges of 16% - 26% over (a) recent modality-agnostic segmentation methods (U-HeMIS, U-HVED), (b) KD-Net adapted to this problem, (c) the pre-trained student network and (d) a non-hierarchical version of the network (AD-Net), in terms of Dice scores for enhancing tumour (ET). The network also shows improvements in tumour core (TC) Dice scores. Finally, the network outperforms both the baseline student network and AD-Net in terms of uncertainty quantification for enhancing tumour segmentation based on the BraTs 2019 uncertainty challenge metrics. Our code is publicly available at: https://github.com/SaverioVad/HAD_Net △ Less

Submitted 12 May, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

Comments: Accepted at Medical Imaging with Deep Learning (MIDL) 2021

arXiv:2103.01628 [pdf, other]

Improving EFX Guarantees through Rainbow Cycle Number

Authors: Bhaskar Ray Chaudhury, Jugal Garg, Kurt Mehlhorn, Ruta Mehta, Pranabendu Misra

Abstract: We study the problem of fairly allocating a set of indivisible goods among $n$ agents with additive valuations. Envy-freeness up to any good (EFX) is arguably the most compelling fairness notion in this context. However, the existence of EFX allocations has not been settled and is one of the most important problems in fair division. Towards resolving this problem, many impressive results show the… ▽ More We study the problem of fairly allocating a set of indivisible goods among $n$ agents with additive valuations. Envy-freeness up to any good (EFX) is arguably the most compelling fairness notion in this context. However, the existence of EFX allocations has not been settled and is one of the most important problems in fair division. Towards resolving this problem, many impressive results show the existence of its relaxations, e.g., the existence of $0.618$-EFX allocations, and the existence of EFX at most $n-1$ unallocated goods. The latter result was recently improved for three agents, in which the two unallocated goods are allocated through an involved procedure. Reducing the number of unallocated goods for arbitrary number of agents is a systematic way to settle the big question. In this paper, we develop a new approach, and show that for every $\varepsilon \in (0,1/2]$, there always exists a $(1-\varepsilon)$-EFX allocation with sublinear number of unallocated goods and high Nash welfare. For this, we reduce the EFX problem to a novel problem in extremal graph theory. We introduce the notion of rainbow cycle number $R(\cdot)$. For all $d \in \mathbb{N}$, $R(d)$ is the largest $k$ such that there exists a $k$-partite digraph $G =(\cup_{i \in [k]} V_i, E)$, in which 1) each part has at most $d$ vertices, i.e., $\lvert V_i \rvert \leq d$ for all $i \in [k]$, 2) for any two parts $V_i$ and $V_j$, each vertex in $V_i$ has an incoming edge from some vertex in $V_j$ and vice-versa, and 3) there exists no cycle in $G$ that contains at most one vertex from each part. We show that any upper bound on $R(d)$ directly translates to a sublinear bound on the number of unallocated goods. We establish a polynomial upper bound on $R(d)$, yielding our main result. Furthermore, our approach is constructive, which also gives a polynomial-time algorithm for finding such an allocation. △ Less

Submitted 2 March, 2021; originally announced March 2021.

arXiv:2011.06557 [pdf, other]

A partition-based similarity for classification distributions

Authors: Hayden S. Helm, Ronak D. Mehta, Brandon Duderstadt, Weiwei Yang, Christoper M. White, Ali Geisa, Joshua T. Vogelstein, Carey E. Priebe

Abstract: Herein we define a measure of similarity between classification distributions that is both principled from the perspective of statistical pattern recognition and useful from the perspective of machine learning practitioners. In particular, we propose a novel similarity on classification distributions, dubbed task similarity, that quantifies how an optimally-transformed optimal representation for a… ▽ More Herein we define a measure of similarity between classification distributions that is both principled from the perspective of statistical pattern recognition and useful from the perspective of machine learning practitioners. In particular, we propose a novel similarity on classification distributions, dubbed task similarity, that quantifies how an optimally-transformed optimal representation for a source distribution performs when applied to inference related to a target distribution. The definition of task similarity allows for natural definitions of adversarial and orthogonal distributions. We highlight limiting properties of representations induced by (universally) consistent decision rules and demonstrate in simulation that an empirical estimate of task similarity is a function of the decision rule deployed for inference. We demonstrate that for a given target distribution, both transfer efficiency and semantic similarity of candidate source distributions correlate with empirical task similarity. △ Less

Submitted 12 November, 2020; originally announced November 2020.

arXiv:2008.05086 [pdf, other]

Transfer Learning Approaches for Streaming End-to-End Speech Recognition System

Authors: Vikas Joshi, Rui Zhao, Rupesh R. Mehta, Kshitiz Kumar, Jinyu Li

Abstract: Transfer learning (TL) is widely used in conventional hybrid automatic speech recognition (ASR) system, to transfer the knowledge from source to target language. TL can be applied to end-to-end (E2E) ASR system such as recurrent neural network transducer (RNN-T) models, by initializing the encoder and/or prediction network of the target language with the pre-trained models from source language. In… ▽ More Transfer learning (TL) is widely used in conventional hybrid automatic speech recognition (ASR) system, to transfer the knowledge from source to target language. TL can be applied to end-to-end (E2E) ASR system such as recurrent neural network transducer (RNN-T) models, by initializing the encoder and/or prediction network of the target language with the pre-trained models from source language. In the hybrid ASR system, transfer learning is typically done by initializing the target language acoustic model (AM) with source language AM. Several transfer learning strategies exist in the case of the RNN-T framework, depending upon the choice of the initialization model for encoder and prediction networks. This paper presents a comparative study of four different TL methods for RNN-T framework. We show 17% relative word error rate reduction with different TL methods over randomly initialized RNN-T model. We also study the impact of TL with varying amount of training data ranging from 50 hours to 1000 hours and show the efficacy of TL for languages with small amount of training data. △ Less

Submitted 17 August, 2020; v1 submitted 11 August, 2020; originally announced August 2020.

arXiv:2008.02753 [pdf, other]

Competitive Allocation of a Mixed Manna

Authors: Bhaskar Ray Chaudhury, Jugal Garg, Peter McGlaughlin, Ruta Mehta

Abstract: We study the fair division problem of allocating a mixed manna under additively separable piecewise linear concave (SPLC) utilities. A mixed manna contains goods that everyone likes and bads that everyone dislikes, as well as items that some like and others dislike. The seminal work of Bogomolnaia et al. [Econometrica'17] argue why allocating a mixed manna is genuinely more complicated than a good… ▽ More We study the fair division problem of allocating a mixed manna under additively separable piecewise linear concave (SPLC) utilities. A mixed manna contains goods that everyone likes and bads that everyone dislikes, as well as items that some like and others dislike. The seminal work of Bogomolnaia et al. [Econometrica'17] argue why allocating a mixed manna is genuinely more complicated than a good or a bad manna, and why competitive equilibrium is the best mechanism. They also provide the existence of equilibrium and establish its peculiar properties (e.g., non-convex and disconnected set of equilibria even under linear utilities), but leave the problem of computing an equilibrium open. This problem remained unresolved even for only bad manna under linear utilities. Our main result is a simplex-like algorithm based on Lemke's scheme for computing a competitive allocation of a mixed manna under SPLC utilities, a strict generalization of linear. Experimental results on randomly generated instances suggest that our algorithm will be fast in practice. The problem is known to be PPAD-hard for the case of good manna, and we also show a similar result for the case of bad manna. Given these PPAD-hardness results, designing such an algorithm is the only non-brute-force (non-enumerative) option known, e.g., the classic Lemke-Howson algorithm (1964) for computing a Nash equilibrium in a 2-player game is still one of the most widely used algorithms in practice. Our algorithm also yields several new structural properties as simple corollaries. We obtain a (constructive) proof of existence for a far more general setting, membership of the problem in PPAD, rational-valued solution, and odd number of solutions property. The last property also settles the conjecture of Bogomolnaia et al. in the affirmative. △ Less

Submitted 6 August, 2020; originally announced August 2020.

arXiv:2008.00285 [pdf, other]

Dividing Bads is Harder than Dividing Goods: On the Complexity of Fair and Efficient Division of Chores

Authors: Bhaskar Ray Chaudhury, Jugal Garg, Peter McGlaughlin, Ruta Mehta

Abstract: We study the chore division problem where a set of agents needs to divide a set of chores (bads) among themselves fairly and efficiently. We assume that agents have linear disutility (cost) functions. Like for the case of goods, competitive division is known to be arguably the best mechanism for the bads as well. However, unlike goods, there are multiple competitive divisions with very different d… ▽ More We study the chore division problem where a set of agents needs to divide a set of chores (bads) among themselves fairly and efficiently. We assume that agents have linear disutility (cost) functions. Like for the case of goods, competitive division is known to be arguably the best mechanism for the bads as well. However, unlike goods, there are multiple competitive divisions with very different disutility value profiles in bads. Although all competitive divisions satisfy the standard notions of fairness and efficiency, some divisions are significantly fairer and efficient than the others. This raises two important natural questions: Does there exist a competitive division in which no agent is assigned a chore that she hugely dislikes? Are there simple sufficient conditions for the existence and polynomial-time algorithms assuming them? We investigate both these questions in this paper. We show that the first problem is strongly NP-hard. Further, we derive a simple sufficient condition for the existence, and we show that finding a competitive division is PPAD-hard assuming the condition. These results are in sharp contrast to the case of goods where both problems are strongly polynomial-time solvable. To the best of our knowledge, these are the first hardness results for the chore division problem, and, in fact, for any economic model under linear preferences. △ Less

Submitted 1 August, 2020; originally announced August 2020.

MSC Class: Computer Science

arXiv:2007.09133 [pdf, ps, other]

Indivisible Mixed Manna: On the Computability of MMS + PO Allocations

Authors: Rucha Kulkarni, Ruta Mehta, Setareh Taki

Abstract: In this paper we initiate the study of finding fair and efficient allocations of an indivisible mixed manna: Divide m indivisible items among n agents under the fairness notion of maximin share (MMS) and the efficiency notion of Pareto optimality (PO). A mixed manna allows an item to be a good for some agents and a chore for others. The problem of finding $α$-MMS allocation for the (near) best… ▽ More In this paper we initiate the study of finding fair and efficient allocations of an indivisible mixed manna: Divide m indivisible items among n agents under the fairness notion of maximin share (MMS) and the efficiency notion of Pareto optimality (PO). A mixed manna allows an item to be a good for some agents and a chore for others. The problem of finding $α$-MMS allocation for the (near) best $α\in(0,1]$ for which it exists, remains unresolved even for a goods manna with constantly many agents, while the problem of finding $α$-MMS+PO allocation is unexplored for any $α\in(0,1]$. We make significant progress on the above questions for a mixed manna. First, we show that for any $α>0$, an $α$-MMS allocation may not always exist, thus ruling out solving the problem for a fixed $α$. Second, towards computing $α$-MMS+PO allocation for the best possible $α$, we obtain a dichotomous result: We derive two conditions and show that the problem is tractable under these two conditions, while dropping either renders the problem intractable. The two conditions are: (i) number of agents is a constant, and (ii) for every agent, her absolute value for all the items is at least a constant factor of her total (absolute) value for all the goods or all the chores. In particular, first, for instances satisfying (i) and (ii) we design a PTAS - an efficient algorithm to find an $(α-ε)$-MMS and $γ$-PO allocation when given $ε,γ>0$, for the highest possible $α\in(0,1]$. Second, we show that if either condition is not satisfied then finding an $α$-MMS allocation for any $α\in(0,1]$ is NP-hard, even when a solution exists for $α=1$. To the best of our knowledge, ours is the first algorithm that ensures both approximate MMS and PO guarantees. △ Less

Submitted 5 April, 2021; v1 submitted 17 July, 2020; originally announced July 2020.

arXiv:2005.14262 [pdf, other]

Uncertainty Evaluation Metric for Brain Tumour Segmentation

Authors: Raghav Mehta, Angelos Filos, Yarin Gal, Tal Arbel

Abstract: In this paper, we develop a metric designed to assess and rank uncertainty measures for the task of brain tumour sub-tissue segmentation in the BraTS 2019 sub-challenge on uncertainty quantification. The metric is designed to: (1) reward uncertainty measures where high confidence is assigned to correct assertions, and where incorrect assertions are assigned low confidence and (2) penalize measures… ▽ More In this paper, we develop a metric designed to assess and rank uncertainty measures for the task of brain tumour sub-tissue segmentation in the BraTS 2019 sub-challenge on uncertainty quantification. The metric is designed to: (1) reward uncertainty measures where high confidence is assigned to correct assertions, and where incorrect assertions are assigned low confidence and (2) penalize measures that have higher percentages of under-confident correct assertions. Here, the workings of the components of the metric are explored based on a number of popular uncertainty measures evaluated on the BraTS 2019 dataset. △ Less

Submitted 28 May, 2020; originally announced May 2020.

Report number: MIDL/2019/ExtendedAbstract/H-PvDNIex

arXiv:2005.10919 [pdf, other]

Correlated Mixed Membership Modeling of Somatic Mutations

Authors: Rahul Mehta, Muge Karaman

Abstract: Recent studies of cancer somatic mutation profiles seek to identify mutations for targeted therapy in personalized medicine. Analysis of profiles, however, is not trivial, as each profile is heterogeneous and there are multiple confounding factors that influence the cause-and-effect relationships between cancer genes such as cancer (sub)type, biological processes, total number of mutations, and no… ▽ More Recent studies of cancer somatic mutation profiles seek to identify mutations for targeted therapy in personalized medicine. Analysis of profiles, however, is not trivial, as each profile is heterogeneous and there are multiple confounding factors that influence the cause-and-effect relationships between cancer genes such as cancer (sub)type, biological processes, total number of mutations, and non-linear mutation interactions. Moreover, cancer is biologically redundant, i.e., distinct mutations can result in the alteration of similar biological processes, so it is important to identify all possible combinatorial sets of mutations for effective patient treatment. To model this phenomena, we propose the correlated zero-inflated negative binomial process to infer the inherent structure of somatic mutation profiles through latent representations. This stochastic process takes into account different, yet correlated, co-occurring mutations using profile-specific negative binomial dispersion parameters that are mixed with a correlated beta-Bernoulli process and a probability parameter to model profile heterogeneity. These model parameters are inferred by iterative optimization via amortized and stochastic variational inference using the Pan Cancer dataset from The Cancer Genomic Archive (TCGA). By examining the the latent space, we identify biologically relevant correlations between somatic mutations. △ Less

Submitted 21 May, 2020; originally announced May 2020.

Comments: To be published in IJCNN 2020

arXiv:2005.06511 [pdf, other]

Fair and Efficient Allocations under Subadditive Valuations

Authors: Bhaskar Ray Chaudhury, Jugal Garg, Ruta Mehta

Abstract: We study the problem of allocating a set of indivisible goods among agents with subadditive valuations in a fair and efficient manner. Envy-Freeness up to any good (EFX) is the most compelling notion of fairness in the context of indivisible goods. Although the existence of EFX is not known beyond the simple case of two agents with subadditive valuations, some good approximations of EFX are known… ▽ More We study the problem of allocating a set of indivisible goods among agents with subadditive valuations in a fair and efficient manner. Envy-Freeness up to any good (EFX) is the most compelling notion of fairness in the context of indivisible goods. Although the existence of EFX is not known beyond the simple case of two agents with subadditive valuations, some good approximations of EFX are known to exist, namely $\tfrac{1}{2}$-EFX allocation and EFX allocations with bounded charity. Nash welfare (the geometric mean of agents' valuations) is one of the most commonly used measures of efficiency. In case of additive valuations, an allocation that maximizes Nash welfare also satisfies fairness properties like Envy-Free up to one good (EF1). Although there is substantial work on approximating Nash welfare when agents have additive valuations, very little is known when agents have subadditive valuations. In this paper, we design a polynomial-time algorithm that outputs an allocation that satisfies either of the two approximations of EFX as well as achieves an $\mathcal{O}(n)$ approximation to the Nash welfare. Our result also improves the current best-known approximation of $\mathcal{O}(n \log n)$ and $\mathcal{O}(m)$ to Nash welfare when agents have submodular and subadditive valuations, respectively. Furthermore, our technique also gives an $\mathcal{O}(n)$ approximation to a family of welfare measures, $p$-mean of valuations for $p\in (-\infty, 1]$, thereby also matching asymptotically the current best known approximation ratio for special cases like $p =-\infty$ while also retaining the fairness properties. △ Less

Submitted 16 August, 2020; v1 submitted 13 May, 2020; originally announced May 2020.

MSC Class: Computer Science

arXiv:2004.12908 [pdf, other]

A Simple Lifelong Learning Approach

Authors: Joshua T. Vogelstein, Jayanta Dey, Hayden S. Helm, Will LeVine, Ronak D. Mehta, Tyler M. Tomita, Haoyin Xu, Ali Geisa, Qingyang Wang, Gido M. van de Ven, Chenyu Gao, Weiwei Yang, Bryan Tower, Jonathan Larson, Christopher M. White, Carey E. Priebe

Abstract: In lifelong learning, data are used to improve performance not only on the present task, but also on past and future (unencountered) tasks. While typical transfer learning algorithms can improve performance on future tasks, their performance on prior tasks degrades upon learning new tasks (called forgetting). Many recent approaches for continual or lifelong learning have attempted to maintain perf… ▽ More In lifelong learning, data are used to improve performance not only on the present task, but also on past and future (unencountered) tasks. While typical transfer learning algorithms can improve performance on future tasks, their performance on prior tasks degrades upon learning new tasks (called forgetting). Many recent approaches for continual or lifelong learning have attempted to maintain performance on old tasks given new tasks. But striving to avoid forgetting sets the goal unnecessarily low. The goal of lifelong learning should be to use data to improve performance on both future tasks (forward transfer) and past tasks (backward transfer). In this paper, we show that a simple approach -- representation ensembling -- demonstrates both forward and backward transfer in a variety of simulated and benchmark data scenarios, including tabular, vision (CIFAR-100, 5-dataset, Split Mini-Imagenet, and Food1k), and speech (spoken digit), in contrast to various reference algorithms, which typically failed to transfer either forward or backward, or both. Moreover, our proposed approach can flexibly operate with or without a computational budget. △ Less

Submitted 11 June, 2024; v1 submitted 27 April, 2020; originally announced April 2020.

Showing 1–50 of 102 results for author: Mehta, R