Showing 1–50 of 84 results for author: Steeg, G V

Searching in archive cs.
  1. arXiv:2407.08946  [pdf, other]

    cs.LG

    Your Diffusion Model is Secretly a Noise Classifier and Benefits from Contrastive Training

    Authors: Yunshu Wu, Yingtao Luo, Xianghao Kong, Evangelos E. Papalexakis, Greg Ver Steeg

    Abstract: Diffusion models learn to denoise data and the trained denoiser is then used to generate new samples from the data distribution. In this paper, we revisit the diffusion sampling process and identify a fundamental cause of sample quality degradation: the denoiser is poorly estimated in regions that are far Outside Of the training Distribution (OOD), and the sampling process inevitably evaluates in…

    Submitted 11 July, 2024; originally announced July 2024.

  2. arXiv:2402.15833  [pdf, other]

    cs.CL cs.LG

    Prompt Perturbation Consistency Learning for Robust Language Models

    Authors: Yao Qiang, Subhrangshu Nandi, Ninareh Mehrabi, Greg Ver Steeg, Anoop Kumar, Anna Rumshisky, Aram Galstyan

    Abstract: Large language models (LLMs) have demonstrated impressive performance on a number of natural language processing tasks, such as question answering and text summarization. However, their performance on sequence labeling tasks such as intent classification and slot filling (IC-SF), which is a central component in personal assistant systems, lags significantly behind discriminative models. Furthermor…

    Submitted 24 February, 2024; originally announced February 2024.

  3. arXiv:2402.08919  [pdf, other]

    cs.CV cs.LG

    Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding

    Authors: Alessandro Achille, Greg Ver Steeg, Tian Yu Liu, Matthew Trager, Carson Klingenberg, Stefano Soatto

    Abstract: Quantifying the degree of similarity between images is a key copyright issue for image-based machine learning. In legal doctrine however, determining the degree of similarity between works requires subjective analysis, and fact-finders (judges and juries) can demonstrate considerable variability in these subjective judgement calls. Images that are structurally similar can be deemed dissimilar, whe…

    Submitted 13 February, 2024; originally announced February 2024.

  4. arXiv:2312.14440  [pdf, other]

    cs.LG cs.CR

    Asymmetric Bias in Text-to-Image Generation with Adversarial Attacks

    Authors: Haz Sameen Shahgir, Xianghao Kong, Greg Ver Steeg, Yue Dong

    Abstract: The widespread use of Text-to-Image (T2I) models in content generation requires careful examination of their safety, including their robustness to adversarial attacks. Despite extensive research on adversarial attacks, the reasons for their effectiveness remain underexplored. This paper presents an empirical study on adversarial attacks against T2I models, focusing on analyzing factors associated…

    Submitted 17 July, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: camera-ready version

  5. arXiv:2310.07972  [pdf, other]

    cs.LG cs.AI cs.IT

    Interpretable Diffusion via Information Decomposition

    Authors: Xianghao Kong, Ollie Liu, Han Li, Dani Yogatama, Greg Ver Steeg

    Abstract: Denoising diffusion models enable conditional generation and density modeling of complex relationships like images and text. However, the nature of the learned relationships is opaque making it difficult to understand precisely what relationships between words and parts of an image are captured, or to predict the effect of an intervention. We illuminate the fine-grained relationships learned by di…

    Submitted 18 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 32 pages, 18 figures

  6. arXiv:2306.09520  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    Ensembled Prediction Intervals for Causal Outcomes Under Hidden Confounding

    Authors: Myrl G. Marmarelis, Greg Ver Steeg, Aram Galstyan, Fred Morstatter

    Abstract: Causal inference of exact individual treatment outcomes in the presence of hidden confounders is rarely possible. Recent work has extended prediction intervals with finite-sample guarantees to partially identifiable causal outcomes, by means of a sensitivity model for hidden confounding. In deep learning, predictors can exploit their inductive biases for better generalization out of sample. We arg…

    Submitted 1 November, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

  7. arXiv:2306.06302  [pdf, other]

    cs.IR cs.LG

    Multi-Task Knowledge Enhancement for Zero-Shot and Multi-Domain Recommendation in an AI Assistant Application

    Authors: Elan Markowitz, Ziyan Jiang, Fan Yang, Xing Fan, Tony Chen, Greg Ver Steeg, Aram Galstyan

    Abstract: Recommender systems have found significant commercial success but still struggle with integrating new users. Since users often interact with content in different domains, it is possible to leverage a user's interactions in previous domains to improve that user's recommendations in a new one (multi-domain recommendation). A separate research thread on knowledge graph enhancement uses external knowl…

    Submitted 9 June, 2023; originally announced June 2023.

  8. arXiv:2305.19264  [pdf, other]

    cs.CL cs.LG

    Jointly Reparametrized Multi-Layer Adaptation for Efficient and Private Tuning

    Authors: Umang Gupta, Aram Galstyan, Greg Ver Steeg

    Abstract: Efficient finetuning of pretrained language transformers is becoming increasingly prevalent for solving natural language processing tasks. While effective, it can still require a large number of tunable parameters. This can be a drawback for low-resource applications and training with differential-privacy constraints, where excessive noise may be introduced during finetuning. To this end, we propo…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: To appear in the Findings of ACL 2023. Code available at https://github.com/umgupta/jointly-reparametrized-finetuning

  9. arXiv:2305.16597  [pdf, other]

    cs.CL cs.AI cs.LG

    Neural Architecture Search for Parameter-Efficient Fine-tuning of Large Pre-trained Language Models

    Authors: Neal Lawton, Anoop Kumar, Govind Thattai, Aram Galstyan, Greg Ver Steeg

    Abstract: Parameter-efficient tuning (PET) methods fit pre-trained language models (PLMs) to downstream tasks by either computing a small compressed update for a subset of model parameters, or appending and fine-tuning a small number of new model parameters to the pre-trained network. Hand-designed PET architectures from the literature perform well in practice, but have the potential to be improved via auto…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: 8 pages, 3 figures, ACL 2023

    ACM Class: I.2.7

  10. arXiv:2305.10625  [pdf, other]

    cs.LG

    Measuring and Mitigating Local Instability in Deep Neural Networks

    Authors: Arghya Datta, Subhrangshu Nandi, Jingcheng Xu, Greg Ver Steeg, He Xie, Anoop Kumar, Aram Galstyan

    Abstract: Deep Neural Networks (DNNs) are becoming integral components of real world services relied upon by millions of users. Unfortunately, architects of these systems can find it difficult to ensure reliable performance as irrelevant details like random initialization can unexpectedly change the outputs of a trained system with potentially disastrous consequences. We formulate the model stability proble…

    Submitted 18 May, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: To be published in Findings of the Association for Computational Linguistics (ACL), 2023

  11. arXiv:2303.06992  [pdf, other]

    cs.LG stat.ML

    Improving Mutual Information Estimation with Annealed and Energy-Based Bounds

    Authors: Rob Brekelmans, Sicong Huang, Marzyeh Ghassemi, Greg Ver Steeg, Roger Grosse, Alireza Makhzani

    Abstract: Mutual information (MI) is a fundamental quantity in information theory and machine learning. However, direct estimation of MI is intractable, even if the true joint probability density for the variables of interest is known, as it involves estimating a potentially high-dimensional log partition function. In this work, we present a unifying view of existing MI bounds from the perspective of import…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: A shorter version appeared in the International Conference on Learning Representations (ICLR) 2022

    Journal ref: ICLR 2022 https://openreview.net/forum?id=T0B9AoM_bFg

  12. arXiv:2303.01491  [pdf, other]

    eess.IV cs.LG q-bio.QM

    Transferring Models Trained on Natural Images to 3D MRI via Position Encoded Slice Models

    Authors: Umang Gupta, Tamoghna Chattopadhyay, Nikhil Dhinagar, Paul M. Thompson, Greg Ver Steeg, The Alzheimer's Disease Neuroimaging Initiative

    Abstract: Transfer learning has remarkably improved computer vision. These advances also promise improvements in neuroimaging, where training set sizes are often small. However, various difficulties arise in directly applying models pretrained on natural images to radiologic images, such as MRIs. In particular, a mismatch in the input space (2D images vs. 3D MRIs) restricts the direct transfer of models, of…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: To appear at IEEE International Symposium on Biomedical Imaging 2023 (ISBI 2023). Code is available at https://github.com/umgupta/2d-slice-set-networks

  13. arXiv:2302.03792  [pdf, other]

    cs.LG cs.IT

    Information-Theoretic Diffusion

    Authors: Xianghao Kong, Rob Brekelmans, Greg Ver Steeg

    Abstract: Denoising diffusion models have spurred significant gains in density modeling and image generation, precipitating an industrial revolution in text-guided AI art generation. We introduce a new mathematical foundation for diffusion models inspired by classic results in information theory that connect Information with Minimum Mean Square Error regression, the so-called I-MMSE relations. We generalize…

    Submitted 7 February, 2023; originally announced February 2023.

    Comments: 26 pages, 7 figures, International Conference on Learning Representations (ICLR), 2023. Code is at http://github.com/kxh001/ITdiffusion and http://github.com/gregversteeg/InfoDiffusionSimple

  14. arXiv:2208.11669  [pdf, other]

    cs.LG cs.CR eess.IV q-bio.QM

    Towards Sparsified Federated Neuroimaging Models via Weight Pruning

    Authors: Dimitris Stripelis, Umang Gupta, Nikhil Dhinagar, Greg Ver Steeg, Paul Thompson, José Luis Ambite

    Abstract: Federated training of large deep neural networks can often be restrictive due to the increasing costs of communicating the updates with increasing model sizes. Various model pruning techniques have been designed in centralized settings to reduce inference times. Combining centralized pruning techniques with federated training seems intuitive for reducing communication costs -- by pruning the model…

    Submitted 24 August, 2022; originally announced August 2022.

    Comments: Accepted to 3rd MICCAI Workshop on Distributed, Collaborative and Federated Learning (DeCaF, 2022)

  15. Formal limitations of sample-wise information-theoretic generalization bounds

    Authors: Hrayr Harutyunyan, Greg Ver Steeg, Aram Galstyan

    Abstract: Some of the tightest information-theoretic generalization bounds depend on the average information between the learned hypothesis and a single training example. However, these sample-wise bounds were derived only for expected generalization gap. We show that even for expected squared generalization gap no such sample-wise information-theoretic bounds exist. The same is true for PAC-Bayes and singl…

    Submitted 13 December, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

    Comments: 2022 IEEE Information Theory Workshop

  16. arXiv:2205.05249  [pdf, other]

    cs.LG cs.CR cs.CV cs.DC

    Secure & Private Federated Neuroimaging

    Authors: Dimitris Stripelis, Umang Gupta, Hamza Saleem, Nikhil Dhinagar, Tanmay Ghai, Rafael Chrysovalantis Anastasiou, Armaghan Asghar, Greg Ver Steeg, Srivatsan Ravi, Muhammad Naveed, Paul M. Thompson, Jose Luis Ambite

    Abstract: The amount of biomedical data continues to grow rapidly. However, collecting data from multiple sites for joint analysis remains challenging due to security, privacy, and regulatory concerns. To overcome this challenge, we use Federated Learning, which enables distributed training of neural network models over multiple data sources without sharing data. Each site trains the neural network over its…

    Submitted 28 August, 2023; v1 submitted 10 May, 2022; originally announced May 2022.

    Comments: 18 pages, 13 figures, 2 tables

    ACM Class: I.2; I.5.1; J.3

  17. arXiv:2204.12430  [pdf, other]

    cs.LG

    Federated Progressive Sparsification (Purge, Merge, Tune)+

    Authors: Dimitris Stripelis, Umang Gupta, Greg Ver Steeg, Jose Luis Ambite

    Abstract: To improve federated training of neural networks, we develop FedSparsify, a sparsification strategy based on progressive weight magnitude pruning. Our method has several benefits. First, since the size of the network becomes increasingly smaller, computation and communication costs during training are reduced. Second, the models are incrementally constrained to a smaller set of parameters, which f…

    Submitted 15 May, 2023; v1 submitted 26 April, 2022; originally announced April 2022.

    Comments: Accepted at the Workshop on Federated Learning: Recent Advances and New Challenges, in Conjunction with NeurIPS 2022 (FL-NeurIPS'22) 23 pages, 12 figures, 1 algorithm, 2 Tables

    MSC Class: 68T07 ACM Class: I.2.m

  18. arXiv:2204.11206  [pdf, other]

    stat.ME cs.LG stat.ML

    Partial Identification of Dose Responses with Hidden Confounders

    Authors: Myrl G. Marmarelis, Elizabeth Haddad, Andrew Jesson, Neda Jahanshad, Aram Galstyan, Greg Ver Steeg

    Abstract: Inferring causal effects of continuous-valued treatments from observational data is a crucial task promising to better inform policy- and decision-makers. A critical assumption needed to identify these effects is that all confounding variables -- causal parents of both the treatment and the outcome -- are included as covariates. Unfortunately, given observational data alone, we cannot know with ce…

    Submitted 12 June, 2023; v1 submitted 24 April, 2022; originally announced April 2022.

  19. arXiv:2203.12574  [pdf, other]

    cs.CL cs.LG

    Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal

    Authors: Umang Gupta, Jwala Dhamala, Varun Kumar, Apurv Verma, Yada Pruksachatkun, Satyapriya Krishna, Rahul Gupta, Kai-Wei Chang, Greg Ver Steeg, Aram Galstyan

    Abstract: Language models excel at generating coherent text, and model compression techniques such as knowledge distillation have enabled their use in resource-constrained settings. However, these models can be biased in multiple ways, including the unfounded association of male and female genders with gender-neutral professions. Therefore, knowledge distillation without any fairness constraints may preserv…

    Submitted 23 March, 2022; originally announced March 2022.

    Comments: To appear in the Findings of ACL 2022

  20. arXiv:2203.10204  [pdf, other]

    cond-mat.mtrl-sci cond-mat.dis-nn cs.CV cs.LG

    Inferring topological transitions in pattern-forming processes with self-supervised learning

    Authors: Marcin Abram, Keith Burghardt, Greg Ver Steeg, Aram Galstyan, Remi Dingreville

    Abstract: The identification and classification of transitions in topological and microstructural regimes in pattern-forming processes are critical for understanding and fabricating microstructurally precise novel materials in many application domains. Unfortunately, relevant microstructure transitions may depend on process parameters in subtle and complex ways that are not captured by the classic theory of…

    Submitted 10 August, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

    Comments: 17 pages, 6 figures, 8 pages of supplementary information

    ACM Class: I.2.6; I.4.7; I.5.4; I.6.m; J.2

  21. arXiv:2111.13733  [pdf, other]

    cs.LG

    Failure Modes of Domain Generalization Algorithms

    Authors: Tigran Galstyan, Hrayr Harutyunyan, Hrant Khachatrian, Greg Ver Steeg, Aram Galstyan

    Abstract: Domain generalization algorithms use training data from multiple domains to learn models that generalize well to unseen domains. While recently proposed benchmarks demonstrate that most of the existing algorithms do not outperform simple baselines, the established evaluation methods fail to expose the impact of various factors that contribute to the poor performance. In this paper we propose an ev…

    Submitted 26 November, 2021; originally announced November 2021.

  22. arXiv:2111.06312  [pdf, other]

    cs.LG cs.AI cs.MS cs.SI

    Implicit SVD for Graph Representation Learning

    Authors: Sami Abu-El-Haija, Hesham Mostafa, Marcel Nassar, Valentino Crespi, Greg Ver Steeg, Aram Galstyan

    Abstract: Recent improvements in the performance of state-of-the-art (SOTA) methods for Graph Representational Learning (GRL) have come at the cost of significant computational resource requirements for training, e.g., for calculating gradients via backprop over many data epochs. Meanwhile, Singular Value Decomposition (SVD) can find closed-form solutions to convex problems, using merely a handful of epochs…

    Submitted 11 November, 2021; originally announced November 2021.

    Journal ref: Advances in Neural Information Processing Systems (NeurIPS) 2021

  23. arXiv:2111.02434  [pdf, other]

    cs.LG physics.comp-ph

    Hamiltonian Dynamics with Non-Newtonian Momentum for Rapid Sampling

    Authors: Greg Ver Steeg, Aram Galstyan

    Abstract: Sampling from an unnormalized probability distribution is a fundamental problem in machine learning with applications including Bayesian modeling, latent factor inference, and energy-based model training. After decades of research, variations of MCMC remain the default approach to sampling despite slow convergence. Auxiliary neural models can learn to speed up MCMC, but the overhead for training t…

    Submitted 29 December, 2021; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: 31 pages, 19 figures. Advances in Neural Information Processing Systems (NeurIPS), 2021. Animations at https://sites.google.com/view/esh-dynamics/home, code at https://github.com/gregversteeg/esh_dynamics

  24. arXiv:2110.01584  [pdf, other]

    cs.LG stat.ML

    Information-theoretic generalization bounds for black-box learning algorithms

    Authors: Hrayr Harutyunyan, Maxim Raginsky, Greg Ver Steeg, Aram Galstyan

    Abstract: We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms…

    Submitted 5 October, 2021; v1 submitted 4 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021

  25. arXiv:2109.03952  [pdf, other]

    cs.AI

    Attributing Fair Decisions with Attention Interventions

    Authors: Ninareh Mehrabi, Umang Gupta, Fred Morstatter, Greg Ver Steeg, Aram Galstyan

    Abstract: The widespread use of Artificial Intelligence (AI) in consequential domains, such as healthcare and parole decision-making systems, has drawn intense scrutiny on the fairness of these methods. However, ensuring fairness is often insufficient as the rationale for a contentious decision needs to be audited, understood, and defended. We propose that the attention mechanism can be used to ensure fair…

    Submitted 8 September, 2021; originally announced September 2021.

  26. arXiv:2108.03437  [pdf, other]

    cs.CR cs.LG

    Secure Neuroimaging Analysis using Federated Learning with Homomorphic Encryption

    Authors: Dimitris Stripelis, Hamza Saleem, Tanmay Ghai, Nikhil Dhinagar, Umang Gupta, Chrysovalantis Anastasiou, Greg Ver Steeg, Srivatsan Ravi, Muhammad Naveed, Paul M. Thompson, Jose Luis Ambite

    Abstract: Federated learning (FL) enables distributed computation of machine learning models over various disparate, remote data sources, without requiring to transfer any individual data to a centralized location. This results in an improved generalizability of models and efficient scaling of computation as more sources and larger datasets are added to the federation. Nevertheless, recent membership attack…

    Submitted 9 November, 2021; v1 submitted 7 August, 2021; originally announced August 2021.

    Comments: 9 pages, 3 figures, 1 algorithm

  27. arXiv:2107.00745  [pdf, other]

    cs.LG cs.AI stat.ML

    q-Paths: Generalizing the Geometric Annealing Path using Power Means

    Authors: Vaden Masrani, Rob Brekelmans, Thang Bui, Frank Nielsen, Aram Galstyan, Greg Ver Steeg, Frank Wood

    Abstract: Many common machine learning methods involve the geometric annealing path, a sequence of intermediate densities between two distributions of interest constructed using the geometric average. While alternatives such as the moment-averaging path have demonstrated performance gains in some settings, their practical applicability remains limited by exponential family endpoint assumptions and a lack of…

    Submitted 1 July, 2021; originally announced July 2021.

    Comments: arXiv admin note: text overlap with arXiv:2012.07823

  28. arXiv:2105.02866  [pdf, other]

    q-bio.QM cs.CR cs.LG eess.IV

    Membership Inference Attacks on Deep Regression Models for Neuroimaging

    Authors: Umang Gupta, Dimitris Stripelis, Pradeep K. Lam, Paul M. Thompson, José Luis Ambite, Greg Ver Steeg

    Abstract: Ensuring the privacy of research participants is vital, even more so in healthcare environments. Deep learning approaches to neuroimaging require large datasets, and this often necessitates sharing data between multiple sites, which is antithetical to the privacy objectives. Federated learning is a commonly proposed solution to this problem. It circumvents the need for data sharing by sharing para…

    Submitted 3 June, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

    Comments: To appear at Medical Imaging with Deep Learning 2021 (MIDL 2021)

  29. arXiv:2102.08530  [pdf, other]

    cs.LG cs.MS cs.SI

    Fast Graph Learning with Unique Optimal Solutions

    Authors: Sami Abu-El-Haija, Valentino Crespi, Greg Ver Steeg, Aram Galstyan

    Abstract: We consider two popular Graph Representation Learning (GRL) methods: message passing for node classification and network embedding for link prediction. For each, we pick a popular model that we: (i) linearize and (ii) switch its training objective to Frobenius norm error minimization. These simplifications can cast the training into finding the optimal parameters in closed-form. We program in…

    Submitted 22 April, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

    Journal ref: ICLR 2021 Workshop on Geometrical and Topological Representation Learning

  30. arXiv:2102.04438  [pdf, other]

    eess.IV cs.LG q-bio.QM

    Improved Brain Age Estimation with Slice-based Set Networks

    Authors: Umang Gupta, Pradeep K. Lam, Greg Ver Steeg, Paul M. Thompson

    Abstract: Deep Learning for neuroimaging data is a promising but challenging direction. The high dimensionality of 3D MRI scans makes this endeavor compute and data-intensive. Most conventional 3D neuroimaging methods use 3D-CNN-based architectures with a large number of parameters and require more time and data to train. Recently, 2D-slice-based models have received increasing attention as they have fewer…

    Submitted 9 February, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

    Comments: To appear at IEEE International Symposium on Biomedical Imaging 2021 (ISBI 2021). Code is available at https://git.io/JtazG

  31. arXiv:2102.04350  [pdf, other]

    cs.LG

    Graph Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning

    Authors: Elan Markowitz, Keshav Balasubramanian, Mehrnoosh Mirtaheri, Sami Abu-El-Haija, Bryan Perozzi, Greg Ver Steeg, Aram Galstyan

    Abstract: Graph Representation Learning (GRL) methods have impacted fields from chemistry to social science. However, their algorithmic implementations are specialized to specific use-cases, e.g., message passing methods are run differently from node embedding ones. Despite their apparent differences, all these methods utilize the graph structure, and therefore, their learning can be approximated with stochast…

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: To appear in ICLR 2021

  32. arXiv:2101.04108  [pdf, other]

    cs.LG stat.ML

    Controllable Guarantees for Fair Outcomes via Contrastive Information Estimation

    Authors: Umang Gupta, Aaron M Ferber, Bistra Dilkina, Greg Ver Steeg

    Abstract: Controlling bias in training datasets is vital for ensuring equal treatment, or parity, between different groups in downstream applications. A naive solution is to transform the data so that it is statistically independent of group membership, but this may throw away too much information when a reasonable compromise between fairness and accuracy is desired. Another common approach is to limit the…

    Submitted 3 June, 2021; v1 submitted 11 January, 2021; originally announced January 2021.

    Comments: This version fixes an error in Theorem 2 of the original manuscript that appeared at the Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI-21). Code is available at https://github.com/umgupta/fairness-via-contrastive-estimation

  33. arXiv:2012.15480  [pdf, other]

    cs.LG cs.IT stat.ML

    Likelihood Ratio Exponential Families

    Authors: Rob Brekelmans, Frank Nielsen, Alireza Makhzani, Aram Galstyan, Greg Ver Steeg

    Abstract: The exponential family is well known in machine learning and statistical physics as the maximum entropy distribution subject to a set of observed constraints, while the geometric mixture path is common in MCMC methods such as annealed importance sampling. Linking these two ideas, recent work has interpreted the geometric mixture path as an exponential family of distributions to analyze the thermod…

    Submitted 15 January, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: NeurIPS Workshop on Deep Learning through Information Geometry

  34. arXiv:2012.07823  [pdf, other]

    cs.LG

    Annealed Importance Sampling with q-Paths

    Authors: Rob Brekelmans, Vaden Masrani, Thang Bui, Frank Wood, Aram Galstyan, Greg Ver Steeg, Frank Nielsen

    Abstract: Annealed importance sampling (AIS) is the gold standard for estimating partition functions or marginal likelihoods, corresponding to importance sampling over a path of distributions between a tractable base and an unnormalized target. While AIS yields an unbiased estimator for any path, existing literature has been primarily limited to the geometric mixture or moment-averaged paths associated with…

    Submitted 14 December, 2020; originally announced December 2020.

    Comments: NeurIPS Workshop on Deep Learning through Information Geometry (Best Paper Award)

    Journal ref: Published at UAI 2021 https://arxiv.org/abs/2107.00745

  35. arXiv:2007.14917  [pdf, other]

    cs.LG stat.ML

    Compressing Deep Neural Networks via Layer Fusion

    Authors: James O' Neill, Greg Ver Steeg, Aram Galstyan

    Abstract: This paper proposes layer fusion, a model compression technique that discovers which weights to combine and then fuses weights of similar fully-connected, convolutional and attention layers. Layer fusion can significantly reduce the number of layers of the original network with little additional computation overhead, while maintaining competitive performance. From experiments on CIFAR-10…

    Submitted 29 July, 2020; originally announced July 2020.

  36. arXiv:2007.05335  [pdf, other]

    cs.LG stat.ML

    Robust Classification under Class-Dependent Domain Shift

    Authors: Tigran Galstyan, Hrant Khachatrian, Greg Ver Steeg, Aram Galstyan

    Abstract: Investigation of machine learning algorithms robust to changes between the training and test distributions is an active area of research. In this paper we explore a special type of dataset shift which we call class-dependent domain shift. It is characterized by the following features: the input data causally depends on the label, the shift in the data is fully explained by a known variable, the va…

    Submitted 10 July, 2020; originally announced July 2020.

    Comments: Accepted at ICML 2020 workshop on Uncertainty and Robustness in Deep Learning

  37. arXiv:2007.00642  [pdf, other]

    cs.LG stat.ML

    All in the Exponential Family: Bregman Duality in Thermodynamic Variational Inference

    Authors: Rob Brekelmans, Vaden Masrani, Frank Wood, Greg Ver Steeg, Aram Galstyan

    Abstract: The recently proposed Thermodynamic Variational Objective (TVO) leverages thermodynamic integration to provide a family of variational inference objectives, which both tighten and generalize the ubiquitous Evidence Lower Bound (ELBO). However, the tightness of TVO bounds was not previously known, an expensive grid search was used to choose a "schedule" of intermediate distributions, and model lear…

    Submitted 1 July, 2020; originally announced July 2020.

    Comments: ICML 2020

  38. arXiv:2006.00115  [pdf, other]

    q-bio.QM cs.CV cs.LG eess.IV

    Overview of Scanner Invariant Representations

    Authors: Daniel Moyer, Greg Ver Steeg, Paul M. Thompson

    Abstract: Pooled imaging data from multiple sources is subject to bias from each source. Studies that do not correct for these scanner/site biases at best lose statistical power, and at worst leave spurious correlations in their data. Estimation of the bias effects is non-trivial due to the paucity of data with correspondence across sites, so called "traveling phantom" data, which is expensive to collect. N…

    Submitted 29 May, 2020; originally announced June 2020.

    Comments: Accepted as a short paper in MIDL 2020. In accordance with the MIDL 2020 Call for Papers, this short paper is an overview of an already published work arXiv:1904.05375, and was submitted to MIDL in order to allow presentation and discussion at the meeting

    Report number: MIDL/2020/ExtendedAbstract/yqm9RD_XHT

  39. A Metric Space for Point Process Excitations

    Authors: Myrl G. Marmarelis, Greg Ver Steeg, Aram Galstyan

    Abstract: A multivariate Hawkes process enables self- and cross-excitations through a triggering matrix that behaves like an asymmetrical covariance structure, characterizing pairwise interactions between the event types. Full-rank estimation of all interactions is often infeasible in empirical settings. Models that specialize on a spatiotemporal application alleviate this obstacle by exploiting spatial loc…

    Submitted 23 April, 2022; v1 submitted 5 May, 2020; originally announced May 2020.

    Journal ref: Journal of Artificial Intelligence Research 73 (2022) 1323-1353

  40. arXiv:2002.07933  [pdf, other]

    cs.LG stat.ML

    Improving Generalization by Controlling Label-Noise Information in Neural Network Weights

    Authors: Hrayr Harutyunyan, Kyle Reing, Greg Ver Steeg, Aram Galstyan

    Abstract: In the presence of noisy or incorrect labels, neural networks have the undesirable tendency to memorize information about the noise. Standard regularization techniques such as dropout, weight decay or data augmentation sometimes help, but do not prevent this behavior. If one considers neural network weights as random variables that depend on the data and stochasticity of training, the amount of me…

    Submitted 20 November, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

    Comments: ICML, 2020

  41. arXiv:1912.00646  [pdf, other]

    cs.LG stat.ML

    Discovery and Separation of Features for Invariant Representation Learning

    Authors: Ayush Jaiswal, Rob Brekelmans, Daniel Moyer, Greg Ver Steeg, Wael AbdAlmageed, Premkumar Natarajan

    Abstract: Supervised machine learning models often associate irrelevant nuisance factors with the prediction target, which hurts generalization. We propose a framework for training robust neural networks that induces invariance to nuisances through learning to discover and separate predictive and nuisance factors of data. We present an information theoretic formulation of our approach, from which we derive…

    Submitted 2 December, 2019; originally announced December 2019.

    Comments: 10 pages, 3 figures

  42. arXiv:1911.04060  [pdf, other]

    cs.LG stat.ML

    Invariant Representations through Adversarial Forgetting

    Authors: Ayush Jaiswal, Daniel Moyer, Greg Ver Steeg, Wael AbdAlmageed, Premkumar Natarajan

    Abstract: We propose a novel approach to achieving invariance for deep neural networks in the form of inducing amnesia to unwanted factors of data through a new adversarial forgetting mechanism. We show that the forgetting mechanism serves as an information-bottleneck, which is manipulated by the adversarial training to learn invariance to unwanted factors. Empirical results show that the proposed framework…

    Submitted 20 November, 2019; v1 submitted 10 November, 2019; originally announced November 2019.

    Comments: To appear in Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI-20)

  43. arXiv:1909.03881  [pdf, other]

    cs.LG cs.AI cs.CL cs.IT stat.ML

    Nearly-Unsupervised Hashcode Representations for Relation Extraction

    Authors: Sahil Garg, Aram Galstyan, Greg Ver Steeg, Guillermo Cecchi

    Abstract: Recently, kernelized locality sensitive hashcodes have been successfully employed as representations of natural language text, especially showing high relevance to biomedical relation extraction tasks. In this paper, we propose to optimize the hashcode representations in a nearly unsupervised manner, in which we only use data points, but not their class labels, for learning. The optimized hashcode…

    Submitted 9 September, 2019; originally announced September 2019.

    Comments: Proceedings of EMNLP-19

  44. arXiv:1905.13276  [pdf, other]

    cs.LG stat.ML

    Efficient Covariance Estimation from Temporal Data

    Authors: Hrayr Harutyunyan, Daniel Moyer, Hrant Khachatrian, Greg Ver Steeg, Aram Galstyan

    Abstract: Estimating the covariance structure of multivariate time series is a fundamental problem with a wide range of real-world applications -- from financial modeling to fMRI analysis. Despite significant recent advances, current state-of-the-art methods are still severely limited in terms of scalability, and do not work well in high-dimensional undersampled regimes. In this work we propose a novel meth…

    Submitted 11 February, 2021; v1 submitted 30 May, 2019; originally announced May 2019.

  45. arXiv:1905.00067  [pdf, other]

    cs.LG cs.SI stat.ML

    MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing

    Authors: Sami Abu-El-Haija, Bryan Perozzi, Amol Kapoor, Nazanin Alipourfard, Kristina Lerman, Hrayr Harutyunyan, Greg Ver Steeg, Aram Galstyan

    Abstract: Existing popular methods for semi-supervised learning with Graph Neural Networks (such as the Graph Convolutional Network) provably cannot learn a general class of neighborhood mixing relationships. To address this weakness, we propose a new model, MixHop, that can learn these relationships, including difference operators, by repeatedly mixing feature representations of neighbors at various distan…

    Submitted 19 June, 2019; v1 submitted 30 April, 2019; originally announced May 2019.

  46. arXiv:1904.07199  [pdf, other]

    cs.LG cs.IT stat.ML

    Exact Rate-Distortion in Autoencoders via Echo Noise

    Authors: Rob Brekelmans, Daniel Moyer, Aram Galstyan, Greg Ver Steeg

    Abstract: Compression is at the heart of effective representation learning. However, lossy compression is typically achieved through simple parametric models like Gaussian noise to preserve analytic tractability, and the limitations this imposes on learning are largely unexplored. Further, the Gaussian prior assumptions in models such as variational autoencoders (VAEs) provide only an upper bound on the com…

    Submitted 14 November, 2019; v1 submitted 15 April, 2019; originally announced April 2019.

    Comments: NeurIPS 2019; updated Gaussian baseline results, added disentanglement

  47. arXiv:1904.05375  [pdf, other]

    q-bio.QM cs.LG eess.IV stat.AP stat.ML

    Scanner Invariant Representations for Diffusion MRI Harmonization

    Authors: Daniel Moyer, Greg Ver Steeg, Chantal M. W. Tax, Paul M. Thompson

    Abstract: Purpose: In the present work we describe the correction of diffusion-weighted MRI for site and scanner biases using a novel method based on invariant representation. Theory and Methods: Pooled imaging data from multiple sources are subject to variation between the sources. Correcting for these biases has become very important as imaging studies increase in size and multi-site cases become more c…

    Submitted 31 January, 2020; v1 submitted 10 April, 2019; originally announced April 2019.

  48. arXiv:1902.03110  [pdf, other]

    cs.SI cs.LG stat.ML

    Identifying and Analyzing Cryptocurrency Manipulations in Social Media

    Authors: Mehrnoosh Mirtaheri, Sami Abu-El-Haija, Fred Morstatter, Greg Ver Steeg, Aram Galstyan

    Abstract: Interest surrounding cryptocurrencies, digital or virtual currencies that are used as a medium for financial transactions, has grown tremendously in recent years. The anonymity surrounding these currencies makes investors particularly susceptible to fraud---such as "pump and dump" scams---where the goal is to artificially inflate the perceived worth of a currency, luring victims into investing bef…

    Submitted 17 December, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

    Comments: Section 4. Prediction tasks: The training setup and algorithm revised. The details of the training algorithm added. More features added to the feature set. Section 5. Botometer score added as the likelihood of a user being bot. More analysis added on bot activity in clusters

  49. arXiv:1811.10839  [pdf, other]

    cs.IT

    Maximizing Multivariate Information with Error-Correcting Codes

    Authors: Kyle Reing, Greg Ver Steeg, Aram Galstyan

    Abstract: Multivariate mutual information provides a conceptual framework for characterizing higher-order interactions in complex systems. Two well-known measures of multivariate information---total correlation and dual total correlation---admit a spectrum of measures with varying sensitivity to intermediate orders of dependence. Unfortunately, these intermediate measures have not received much attention du…

    Submitted 27 November, 2018; originally announced November 2018.

    Comments: 10 pages

  50. arXiv:1806.04634  [pdf, other]

    q-bio.QM cs.LG q-bio.TO stat.AP

    Measures of Tractography Convergence

    Authors: Daniel Moyer, Paul M. Thompson, Greg Ver Steeg

    Abstract: In the present work, we use information theory to understand the empirical convergence rate of tractography, a widely-used approach to reconstruct anatomical fiber pathways in the living brain. Based on diffusion MRI data, tractography is the starting point for many methods to study brain connectivity. Of the available methods to perform tractography, most reconstruct a finite set of streamlines,…

    Submitted 12 June, 2018; originally announced June 2018.

    Comments: 11 pages