-
Amazing Things Come From Having Many Good Models
Authors:
Cynthia Rudin,
Chudi Zhong,
Lesia Semenova,
Margo Seltzer,
Ronald Parr,
Jiachang Liu,
Srikar Katta,
Jon Donnelly,
Harry Chen,
Zachery Boner
Abstract:
The Rashomon Effect, coined by Leo Breiman, describes the phenomenon that there exist many equally good predictive models for the same dataset. This phenomenon happens for many real datasets and when it does, it sparks both magic and consternation, but mostly magic. In light of the Rashomon Effect, this perspective piece proposes reshaping the way we think about machine learning, particularly for tabular data problems in the nondeterministic (noisy) setting. We address how the Rashomon Effect impacts (1) the existence of simple-yet-accurate models, (2) flexibility to address user preferences, such as fairness and monotonicity, without losing performance, (3) uncertainty in predictions, fairness, and explanations, (4) reliable variable importance, (5) algorithm choice, specifically, providing advanced knowledge of which algorithms might be suitable for a given problem, and (6) public policy. We also discuss a theory of when the Rashomon Effect occurs and why. Our goal is to illustrate how the Rashomon Effect can have a massive impact on the use of machine learning for complex problems in society.
Submitted 9 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
Interpretable Causal Inference for Analyzing Wearable, Sensor, and Distributional Data
Authors:
Srikar Katta,
Harsh Parikh,
Cynthia Rudin,
Alexander Volfovsky
Abstract:
Many modern causal questions ask how treatments affect complex outcomes that are measured using wearable devices and sensors. Current analysis approaches require summarizing these data into scalar statistics (e.g., the mean), but these summaries can be misleading. For example, disparate distributions can have the same means, variances, and other statistics. Researchers can overcome the loss of information by instead representing the data as distributions. We develop an interpretable method for distributional data analysis that ensures trustworthy and robust decision-making: Analyzing Distributional Data via Matching After Learning to Stretch (ADD MALTS). We (i) provide analytical guarantees of the correctness of our estimation strategy, (ii) demonstrate via simulation that ADD MALTS outperforms other distributional data analysis methods at estimating treatment effects, and (iii) illustrate ADD MALTS' ability to verify whether there is enough cohesion between treatment and control units within subpopulations to trustworthily estimate treatment effects. We demonstrate ADD MALTS' utility by studying the effectiveness of continuous glucose monitors in mitigating diabetes risks.
Submitted 20 March, 2024; v1 submitted 16 December, 2023;
originally announced December 2023.
-
The Rashomon Importance Distribution: Getting RID of Unstable, Single Model-based Variable Importance
Authors:
Jon Donnelly,
Srikar Katta,
Cynthia Rudin,
Edward P. Browne
Abstract:
Quantifying variable importance is essential for answering high-stakes questions in fields like genetics, public policy, and medicine. Current methods generally calculate variable importance for a given model trained on a given dataset. However, for a given dataset, there may be many models that explain the target outcome equally well; without accounting for all possible explanations, different researchers may arrive at many conflicting yet equally valid conclusions given the same data. Additionally, even when accounting for all possible explanations for a given dataset, these insights may not generalize because not all good explanations are stable across reasonable data perturbations. We propose a new variable importance framework that quantifies the importance of a variable across the set of all good models and is stable across the data distribution. Our framework is extremely flexible and can be integrated with most existing model classes and global variable importance metrics. We demonstrate through experiments that our framework recovers variable importance rankings for complex simulation setups where other methods fail. Further, we show that our framework accurately estimates the true importance of a variable for the underlying data distribution. We provide theoretical guarantees on the consistency and finite sample error rates for our estimator. Finally, we demonstrate its utility with a real-world case study exploring which genes are important for predicting HIV load in persons with HIV, highlighting an important gene that has not previously been studied in connection with HIV. Code is available at https://github.com/jdonnelly36/Rashomon_Importance_Distribution.
Submitted 1 April, 2024; v1 submitted 24 September, 2023;
originally announced September 2023.
-
DECAR: Deep Clustering for learning general-purpose Audio Representations
Authors:
Sreyan Ghosh,
Sandesh V Katta,
Ashish Seth,
S. Umesh
Abstract:
We introduce DECAR, a self-supervised pre-training approach for learning general-purpose audio representations. Our system is based on clustering: it utilizes an offline clustering step to provide target labels that act as pseudo-labels for solving a prediction task. We develop on top of recent advances in self-supervised learning for computer vision and design a lightweight, easy-to-use self-supervised pre-training scheme. We pre-train DECAR embeddings on a balanced subset of the large-scale Audioset dataset and transfer those representations to 9 downstream classification tasks, including speech, music, animal sounds, and acoustic scenes. Furthermore, we conduct ablation studies identifying key design choices and also make all our code and pre-trained models publicly available.
Submitted 14 March, 2023; v1 submitted 17 October, 2021;
originally announced October 2021.
-
S-vectors and TESA: Speaker Embeddings and a Speaker Authenticator Based on Transformer Encoder
Authors:
N J Metilda Sagaya Mary,
S Umesh,
Sandesh V Katta
Abstract:
One of the most popular speaker embeddings is x-vectors, which are obtained from an architecture that gradually builds a larger temporal context with layers. In this paper, we propose to derive speaker embeddings from Transformer's encoder trained for speaker classification. Self-attention, on which Transformer's encoder is built, attends to all the features over the entire utterance and might be more suitable in capturing the speaker characteristics in an utterance. We refer to the speaker embeddings obtained from the proposed speaker classification model as s-vectors to emphasize that they are obtained from an architecture that heavily relies on self-attention. Through experiments, we demonstrate that s-vectors perform better than x-vectors. In addition to the s-vectors, we also propose a new architecture based on Transformer's encoder for speaker verification as a replacement for speaker verification based on conventional probabilistic linear discriminant analysis (PLDA). This architecture is inspired by the next sentence prediction task of bidirectional encoder representations from Transformers (BERT), and we feed the s-vectors of two utterances to verify whether they belong to the same speaker. We name this architecture the Transformer encoder speaker authenticator (TESA). Our experiments show that the performance of s-vectors with TESA is better than s-vectors with conventional PLDA-based speaker verification.
Submitted 12 December, 2021; v1 submitted 11 August, 2020;
originally announced August 2020.
-
DIF : Dataset of Perceived Intoxicated Faces for Drunk Person Identification
Authors:
Vineet Mehta,
Devendra Pratap Yadav,
Sai Srinadhu Katta,
Abhinav Dhall
Abstract:
Traffic accidents cause over a million deaths every year, of which a large fraction is attributed to drunk driving. An automated intoxicated driver detection system in vehicles will be useful in reducing accidents and related financial costs. Existing solutions require special equipment such as electrocardiograms, infrared cameras, or breathalyzers. In this work, we propose a new dataset called DIF (Dataset of perceived Intoxicated Faces), which contains audio-visual data of intoxicated and sober people obtained from online sources. To the best of our knowledge, this is the first work for automatic bimodal non-invasive intoxication detection. Convolutional Neural Networks (CNN) and Deep Neural Networks (DNN) are trained for computing the video and audio baselines, respectively. A 3D CNN is used to exploit the spatio-temporal changes in the video. A simple variation of the traditional 3D convolution block is proposed based on inducing non-linearity between the spatial and temporal channels. Extensive experiments are performed to validate the approach and baselines.
Submitted 8 September, 2019; v1 submitted 25 May, 2018;
originally announced May 2018.
-
Visual Secret Sharing Scheme using Grayscale Images
Authors:
Sandeep Katta
Abstract:
Pixel expansion and the quality of the reconstructed secret image have been major issues in visual secret sharing (VSS) schemes. A number of probabilistic VSS schemes with minimum pixel expansion have been proposed for black and white (binary) secret images. This paper presents a probabilistic (2, 3)-VSS scheme for grayscale images. Its pixel expansion is larger, but the quality of the reconstructed image is perfect. The construction of the shadow images (transparent shares) is based on the binary OR operation.
Submitted 30 June, 2011;
originally announced June 2011.
-
Recursive Information Hiding in Visual Cryptography
Authors:
Sandeep Katta
Abstract:
Visual Cryptography is a secret sharing scheme that uses the human visual system to perform computations. This paper presents a recursive hiding scheme for 3 out of 5 secret sharing. The idea used is to hide smaller secrets in the shares of a larger secret without an expansion in the size of the latter.
Submitted 13 May, 2010; v1 submitted 27 April, 2010;
originally announced April 2010.