-
Using Case Studies to Teach Responsible AI to Industry Practitioners
Authors:
Julia Stoyanovich,
Rodrigo Kreis de Paula,
Armanda Lewis,
Chloe Zheng
Abstract:
Responsible AI (RAI) is the science and the practice of making the design, development, and use of AI socially sustainable: of reaping the benefits of innovation while controlling the risks. Naturally, industry practitioners play a decisive role in our collective ability to achieve the goals of RAI. Unfortunately, we do not yet have consolidated educational materials and effective methodologies fo…
▽ More
Responsible AI (RAI) is the science and the practice of making the design, development, and use of AI socially sustainable: of reaping the benefits of innovation while controlling the risks. Naturally, industry practitioners play a decisive role in our collective ability to achieve the goals of RAI. Unfortunately, we do not yet have consolidated educational materials and effective methodologies for teaching RAI to practitioners. In this paper, we propose a novel stakeholder-first educational approach that uses interactive case studies to achieve organizational and practitioner -level engagement and advance learning of RAI. We discuss a partnership with Meta, an international technology company, to co-develop and deliver RAI workshops to a diverse audience within the company. Our assessment results indicate that participants found the workshops engaging and reported a positive shift in understanding and motivation to apply RAI to their work.
△ Less
Submitted 23 July, 2024; v1 submitted 19 July, 2024;
originally announced July 2024.
-
A Game-Theoretic Approach to Privacy-Utility Tradeoff in Sharing Genomic Summary Statistics
Authors:
Tao Zhang,
Rajagopal Venkatesaramani,
Rajat K. De,
Bradley A. Malin,
Yevgeniy Vorobeychik
Abstract:
The advent of online genomic data-sharing services has sought to enhance the accessibility of large genomic datasets by allowing queries about genetic variants, such as summary statistics, aiding care providers in distinguishing between spurious genomic variations and those with clinical significance. However, numerous studies have demonstrated that even sharing summary genomic information exposes…
▽ More
The advent of online genomic data-sharing services has sought to enhance the accessibility of large genomic datasets by allowing queries about genetic variants, such as summary statistics, aiding care providers in distinguishing between spurious genomic variations and those with clinical significance. However, numerous studies have demonstrated that even sharing summary genomic information exposes individual members of such datasets to a significant privacy risk due to membership inference attacks. While several approaches have emerged that reduce privacy risks by adding noise or reducing the amount of information shared, these typically assume non-adaptive attacks that use likelihood ratio test (LRT) statistics. We propose a Bayesian game-theoretic framework for optimal privacy-utility tradeoff in the sharing of genomic summary statistics. Our first contribution is to prove that a very general Bayesian attacker model that anchors our game-theoretic approach is more powerful than the conventional LRT-based threat models in that it induces worse privacy loss for the defender who is modeled as a von Neumann-Morgenstern (vNM) decision-maker. We show this to be true even when the attacker uses a non-informative subjective prior. Next, we present an analytically tractable approach to compare the Bayesian attacks with arbitrary subjective priors and the Neyman-Pearson optimal LRT attacks under the Gaussian mechanism common in differential privacy frameworks. Finally, we propose an approach for approximating Bayes-Nash equilibria of the game using deep neural network generators to implicitly represent player mixed strategies. Our experiments demonstrate that the proposed game-theoretic framework yields both stronger attacks and stronger defense strategies than the state of the art.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Input Guided Multiple Deconstruction Single Reconstruction neural network models for Matrix Factorization
Authors:
Prasun Dutta,
Rajat K. De
Abstract:
Referring back to the original text in the course of hierarchical learning is a common human trait that ensures the right direction of learning. The models developed based on the concept of Non-negative Matrix Factorization (NMF), in this paper are inspired by this idea. They aim to deal with high-dimensional data by discovering its low rank approximation by determining a unique pair of factor mat…
▽ More
Referring back to the original text in the course of hierarchical learning is a common human trait that ensures the right direction of learning. The models developed based on the concept of Non-negative Matrix Factorization (NMF), in this paper are inspired by this idea. They aim to deal with high-dimensional data by discovering its low rank approximation by determining a unique pair of factor matrices. The model, named Input Guided Multiple Deconstruction Single Reconstruction neural network for Non-negative Matrix Factorization (IG-MDSR-NMF), ensures the non-negativity constraints of both factors. Whereas Input Guided Multiple Deconstruction Single Reconstruction neural network for Relaxed Non-negative Matrix Factorization (IG-MDSR-RNMF) introduces a novel idea of factorization with only the basis matrix adhering to the non-negativity criteria. This relaxed version helps the model to learn more enriched low dimensional embedding of the original data matrix. The competency of preserving the local structure of data in its low rank embedding produced by both the models has been appropriately verified. The superiority of low dimensional embedding over that of the original data justifying the need for dimension reduction has been established. The primacy of both the models has also been validated by comparing their performances separately with that of nine other established dimension reduction algorithms on five popular datasets. Moreover, computational complexity of the models and convergence analysis have also been presented testifying to the supremacy of the models.
△ Less
Submitted 22 May, 2024;
originally announced May 2024.
-
NeuroDAVIS: A neural network model for data visualization
Authors:
Chayan Maitra,
Dibyendu B. Seal,
Rajat K. De
Abstract:
The task of dimensionality reduction and visualization of high-dimensional datasets remains a challenging problem since long. Modern high-throughput technologies produce newer high-dimensional datasets having multiple views with relatively new data types. Visualization of these datasets require proper methodology that can uncover hidden patterns in the data without affecting the local and global s…
▽ More
The task of dimensionality reduction and visualization of high-dimensional datasets remains a challenging problem since long. Modern high-throughput technologies produce newer high-dimensional datasets having multiple views with relatively new data types. Visualization of these datasets require proper methodology that can uncover hidden patterns in the data without affecting the local and global structures within the data. To this end, however, very few such methodology exist, which can realise this task. In this work, we have introduced a novel unsupervised deep neural network model, called NeuroDAVIS, for data visualization. NeuroDAVIS is capable of extracting important features from the data, without assuming any data distribution, and visualize effectively in lower dimension. It has been shown theoritically that neighbourhood relationship of the data in high dimension remains preserved in lower dimension. The performance of NeuroDAVIS has been evaluated on a wide variety of synthetic and real high-dimensional datasets including numeric, textual, image and biological data. NeuroDAVIS has been highly competitive against both t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) with respect to visualization quality, and preservation of data size, shape, and both local and global structure. It has outperformed Fast interpolation-based t-SNE (Fit-SNE), a variant of t-SNE, for most of the high-dimensional datasets as well. For the biological datasets, besides t-SNE, UMAP and Fit-SNE, NeuroDAVIS has also performed well compared to other state-of-the-art algorithms, like Potential of Heat-diffusion for Affinity-based Trajectory Embedding (PHATE) and the siamese neural network-based method, called IVIS. Downstream classification and clustering analyses have also revealed favourable results for NeuroDAVIS-generated embeddings.
△ Less
Submitted 1 April, 2023;
originally announced April 2023.