-
Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance
Authors:
Chiraag Kaushik,
Ran Liu,
Chi-Heng Lin,
Amrit Khera,
Matthew Y Jin,
Wenrui Ma,
Vidya Muthukumar,
Eva L Dyer
Abstract:
Classification models are expected to perform equally well for different classes, yet in practice, there are often large gaps in their performance. This issue of class bias is widely studied in cases of datasets with sample imbalance, but is relatively overlooked in balanced datasets. In this work, we introduce the concept of spectral imbalance in features as a potential source for class dispariti…
▽ More
Classification models are expected to perform equally well for different classes, yet in practice, there are often large gaps in their performance. This issue of class bias is widely studied in cases of datasets with sample imbalance, but is relatively overlooked in balanced datasets. In this work, we introduce the concept of spectral imbalance in features as a potential source for class disparities and study the connections between spectral imbalance and class bias in both theory and practice. To build the connection between spectral imbalance and class gap, we develop a theoretical framework for studying class disparities and derive exact expressions for the per-class error in a high-dimensional mixture model setting. We then study this phenomenon in 11 different state-of-the-art pretrained encoders and show how our proposed framework can be used to compare the quality of encoders, as well as evaluate and combine data augmentation strategies to mitigate the issue. Our work sheds light on the class-dependent effects of learning, and provides new insights into how state-of-the-art pretrained features may have unknown biases that can be diagnosed through their spectra.
△ Less
Submitted 3 June, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
-
A Unified, Scalable Framework for Neural Population Decoding
Authors:
Mehdi Azabou,
Vinam Arora,
Venkataramana Ganesh,
Ximeng Mao,
Santosh Nachimuthu,
Michael J. Mendelson,
Blake Richards,
Matthew G. Perich,
Guillaume Lajoie,
Eva L. Dyer
Abstract:
Our ability to use deep learning approaches to decipher neural activity would likely benefit from greater scale, in terms of both model size and datasets. However, the integration of many neural recordings into one unified model is challenging, as each recording contains the activity of different neurons from different individual animals. In this paper, we introduce a training framework and archit…
▽ More
Our ability to use deep learning approaches to decipher neural activity would likely benefit from greater scale, in terms of both model size and datasets. However, the integration of many neural recordings into one unified model is challenging, as each recording contains the activity of different neurons from different individual animals. In this paper, we introduce a training framework and architecture designed to model the population dynamics of neural activity across diverse, large-scale neural recordings. Our method first tokenizes individual spikes within the dataset to build an efficient representation of neural events that captures the fine temporal structure of neural activity. We then employ cross-attention and a PerceiverIO backbone to further construct a latent tokenization of neural population activities. Utilizing this architecture and training framework, we construct a large-scale multi-session model trained on large datasets from seven nonhuman primates, spanning over 158 different sessions of recording from over 27,373 neural units and over 100 hours of recordings. In a number of different tasks, we demonstrate that our pretrained model can be rapidly adapted to new, unseen sessions with unspecified neuron correspondence, enabling few-shot performance with minimal labels. This work presents a powerful new approach for building deep learning tools to analyze neural data and stakes out a clear path to training at scale.
△ Less
Submitted 24 October, 2023;
originally announced October 2023.
-
LatentDR: Improving Model Generalization Through Sample-Aware Latent Degradation and Restoration
Authors:
Ran Liu,
Sahil Khose,
Jingyun Xiao,
Lakshmi Sathidevi,
Keerthan Ramnath,
Zsolt Kira,
Eva L. Dyer
Abstract:
Despite significant advances in deep learning, models often struggle to generalize well to new, unseen domains, especially when training data is limited. To address this challenge, we propose a novel approach for distribution-aware latent augmentation that leverages the relationships across samples to guide the augmentation procedure. Our approach first degrades the samples stochastically in the l…
▽ More
Despite significant advances in deep learning, models often struggle to generalize well to new, unseen domains, especially when training data is limited. To address this challenge, we propose a novel approach for distribution-aware latent augmentation that leverages the relationships across samples to guide the augmentation procedure. Our approach first degrades the samples stochastically in the latent space, mapping them to augmented labels, and then restores the samples from their corrupted versions during training. This process confuses the classifier in the degradation step and restores the overall class distribution of the original samples, promoting diverse intra-class/cross-domain variability. We extensively evaluate our approach on a diverse set of datasets and tasks, including domain generalization benchmarks and medical imaging datasets with strong domain shift, where we show our approach achieves significant improvements over existing methods for latent space augmentation. We further show that our method can be flexibly adapted to long-tail recognition tasks, demonstrating its versatility in building more generalizable models. Code is available at https://github.com/nerdslab/LatentDR.
△ Less
Submitted 28 August, 2023;
originally announced August 2023.
-
Half-Hop: A graph upsampling approach for slowing down message passing
Authors:
Mehdi Azabou,
Venkataramana Ganesh,
Shantanu Thakoor,
Chi-Heng Lin,
Lakshmi Sathidevi,
Ran Liu,
Michal Valko,
Petar Veličković,
Eva L. Dyer
Abstract:
Message passing neural networks have shown a lot of success on graph-structured data. However, there are many instances where message passing can lead to over-smoothing or fail when neighboring nodes belong to different classes. In this work, we introduce a simple yet general framework for improving learning in message passing neural networks. Our approach essentially upsamples edges in the origin…
▽ More
Message passing neural networks have shown a lot of success on graph-structured data. However, there are many instances where message passing can lead to over-smoothing or fail when neighboring nodes belong to different classes. In this work, we introduce a simple yet general framework for improving learning in message passing neural networks. Our approach essentially upsamples edges in the original graph by adding "slow nodes" at each edge that can mediate communication between a source and a target node. Our method only modifies the input graph, making it plug-and-play and easy to use with existing models. To understand the benefits of slowing down message passing, we provide theoretical and empirical analyses. We report results on several supervised and self-supervised benchmarks, and show improvements across the board, notably in heterophilic conditions where adjacent nodes are more likely to have different labels. Finally, we show how our approach can be used to generate augmentations for self-supervised learning, where slow nodes are randomly introduced into different edges in the graph to generate multi-scale views with variable path lengths.
△ Less
Submitted 17 August, 2023;
originally announced August 2023.
-
Relax, it doesn't matter how you get there: A new self-supervised approach for multi-timescale behavior analysis
Authors:
Mehdi Azabou,
Michael Mendelson,
Nauman Ahad,
Maks Sorokin,
Shantanu Thakoor,
Carolina Urzay,
Eva L. Dyer
Abstract:
Natural behavior consists of dynamics that are complex and unpredictable, especially when trying to predict many steps into the future. While some success has been found in building representations of behavior under constrained or simplified task-based conditions, many of these models cannot be applied to free and naturalistic settings where behavior becomes increasingly hard to model. In this wor…
▽ More
Natural behavior consists of dynamics that are complex and unpredictable, especially when trying to predict many steps into the future. While some success has been found in building representations of behavior under constrained or simplified task-based conditions, many of these models cannot be applied to free and naturalistic settings where behavior becomes increasingly hard to model. In this work, we develop a multi-task representation learning model for behavior that combines two novel components: (i) An action prediction objective that aims to predict the distribution of actions over future timesteps, and (ii) A multi-scale architecture that builds separate latent spaces to accommodate short- and long-term dynamics. After demonstrating the ability of the method to build representations of both local and global dynamics in realistic robots in varying environments and terrains, we apply our method to the MABe 2022 Multi-agent behavior challenge, where our model ranks 1st overall and on all global tasks, and 1st or 2nd on 7 out of 9 frame-level tasks. In all of these cases, we show that our model can build representations that capture the many different factors that drive behavior and solve a wide range of downstream tasks.
△ Less
Submitted 15 March, 2023;
originally announced March 2023.
-
Learning signatures of decision making from many individuals playing the same game
Authors:
Michael J Mendelson,
Mehdi Azabou,
Suma Jacob,
Nicola Grissom,
David Darrow,
Becket Ebitz,
Alexander Herman,
Eva L. Dyer
Abstract:
Human behavior is incredibly complex and the factors that drive decision making--from instinct, to strategy, to biases between individuals--often vary over multiple timescales. In this paper, we design a predictive framework that learns representations to encode an individual's 'behavioral style', i.e. long-term behavioral trends, while simultaneously predicting future actions and choices. The mod…
▽ More
Human behavior is incredibly complex and the factors that drive decision making--from instinct, to strategy, to biases between individuals--often vary over multiple timescales. In this paper, we design a predictive framework that learns representations to encode an individual's 'behavioral style', i.e. long-term behavioral trends, while simultaneously predicting future actions and choices. The model explicitly separates representations into three latent spaces: the recent past space, the short-term space, and the long-term space where we hope to capture individual differences. To simultaneously extract both global and local variables from complex human behavior, our method combines a multi-scale temporal convolutional network with latent prediction tasks, where we encourage embeddings across the entire sequence, as well as subsets of the sequence, to be mapped to similar points in the latent space. We develop and apply our method to a large-scale behavioral dataset from 1,000 humans playing a 3-armed bandit task, and analyze what our model's resulting embeddings reveal about the human decision making process. In addition to predicting future choices, we show that our model can learn rich representations of human behavior over multiple timescales and provide signatures of differences in individuals.
△ Less
Submitted 21 February, 2023;
originally announced February 2023.
-
MTNeuro: A Benchmark for Evaluating Representations of Brain Structure Across Multiple Levels of Abstraction
Authors:
Jorge Quesada,
Lakshmi Sathidevi,
Ran Liu,
Nauman Ahad,
Joy M. Jackson,
Mehdi Azabou,
Jingyun Xiao,
Christopher Liding,
Matthew Jin,
Carolina Urzay,
William Gray-Roncal,
Erik C. Johnson,
Eva L. Dyer
Abstract:
There are multiple scales of abstraction from which we can describe the same image, depending on whether we are focusing on fine-grained details or a more global attribute of the image. In brain mapping, learning to automatically parse images to build representations of both small-scale features (e.g., the presence of cells or blood vessels) and global properties of an image (e.g., which brain reg…
▽ More
There are multiple scales of abstraction from which we can describe the same image, depending on whether we are focusing on fine-grained details or a more global attribute of the image. In brain mapping, learning to automatically parse images to build representations of both small-scale features (e.g., the presence of cells or blood vessels) and global properties of an image (e.g., which brain region the image comes from) is a crucial and open challenge. However, most existing datasets and benchmarks for neuroanatomy consider only a single downstream task at a time. To bridge this gap, we introduce a new dataset, annotations, and multiple downstream tasks that provide diverse ways to readout information about brain structure and architecture from the same image. Our multi-task neuroimaging benchmark (MTNeuro) is built on volumetric, micrometer-resolution X-ray microtomography images spanning a large thalamocortical section of mouse brain, encompassing multiple cortical and subcortical regions. We generated a number of different prediction challenges and evaluated several supervised and self-supervised models for brain-region prediction and pixel-level semantic segmentation of microstructures. Our experiments not only highlight the rich heterogeneity of this dataset, but also provide insights into how self-supervised approaches can be used to learn representations that capture multiple attributes of a single image and perform well on a variety of downstream tasks. Datasets, code, and pre-trained baseline models are provided at: https://mtneuro.github.io/ .
△ Less
Submitted 31 December, 2022;
originally announced January 2023.
-
The good, the bad and the ugly sides of data augmentation: An implicit spectral regularization perspective
Authors:
Chi-Heng Lin,
Chiraag Kaushik,
Eva L. Dyer,
Vidya Muthukumar
Abstract:
Data augmentation (DA) is a powerful workhorse for bolstering performance in modern machine learning. Specific augmentations like translations and scaling in computer vision are traditionally believed to improve generalization by generating new (artificial) data from the same distribution. However, this traditional viewpoint does not explain the success of prevalent augmentations in modern machine…
▽ More
Data augmentation (DA) is a powerful workhorse for bolstering performance in modern machine learning. Specific augmentations like translations and scaling in computer vision are traditionally believed to improve generalization by generating new (artificial) data from the same distribution. However, this traditional viewpoint does not explain the success of prevalent augmentations in modern machine learning (e.g. randomized masking, cutout, mixup), that greatly alter the training data distribution. In this work, we develop a new theoretical framework to characterize the impact of a general class of DA on underparameterized and overparameterized linear model generalization. Our framework reveals that DA induces implicit spectral regularization through a combination of two distinct effects: a) manipulating the relative proportion of eigenvalues of the data covariance matrix in a training-data-dependent manner, and b) uniformly boosting the entire spectrum of the data covariance matrix through ridge regression. These effects, when applied to popular augmentations, give rise to a wide variety of phenomena, including discrepancies in generalization between over-parameterized and under-parameterized regimes and differences between regression and classification tasks. Our framework highlights the nuanced and sometimes surprising impacts of DA on generalization, and serves as a testbed for novel augmentation design.
△ Less
Submitted 27 February, 2024; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Learning Behavior Representations Through Multi-Timescale Bootstrapping
Authors:
Mehdi Azabou,
Michael Mendelson,
Maks Sorokin,
Shantanu Thakoor,
Nauman Ahad,
Carolina Urzay,
Eva L. Dyer
Abstract:
Natural behavior consists of dynamics that are both unpredictable, can switch suddenly, and unfold over many different timescales. While some success has been found in building representations of behavior under constrained or simplified task-based conditions, many of these models cannot be applied to free and naturalistic settings due to the fact that they assume a single scale of temporal dynamic…
▽ More
Natural behavior consists of dynamics that are both unpredictable, can switch suddenly, and unfold over many different timescales. While some success has been found in building representations of behavior under constrained or simplified task-based conditions, many of these models cannot be applied to free and naturalistic settings due to the fact that they assume a single scale of temporal dynamics. In this work, we introduce Bootstrap Across Multiple Scales (BAMS), a multi-scale representation learning model for behavior: we combine a pooling module that aggregates features extracted over encoders with different temporal receptive fields, and design a set of latent objectives to bootstrap the representations in each respective space to encourage disentanglement across different timescales. We first apply our method on a dataset of quadrupeds navigating in different terrain types, and show that our model captures the temporal complexity of behavior. We then apply our method to the MABe 2022 Multi-agent behavior challenge, where our model ranks 3rd overall and 1st on two subtasks, and show the importance of incorporating multi-timescales when analyzing behavior.
△ Less
Submitted 14 June, 2022;
originally announced June 2022.
-
Seeing the forest and the tree: Building representations of both individual and collective dynamics with transformers
Authors:
Ran Liu,
Mehdi Azabou,
Max Dabagia,
Jingyun Xiao,
Eva L. Dyer
Abstract:
Complex time-varying systems are often studied by abstracting away from the dynamics of individual components to build a model of the population-level dynamics from the start. However, when building a population-level description, it can be easy to lose sight of each individual and how they contribute to the larger picture. In this paper, we present a novel transformer architecture for learning fr…
▽ More
Complex time-varying systems are often studied by abstracting away from the dynamics of individual components to build a model of the population-level dynamics from the start. However, when building a population-level description, it can be easy to lose sight of each individual and how they contribute to the larger picture. In this paper, we present a novel transformer architecture for learning from time-varying data that builds descriptions of both the individual as well as the collective population dynamics. Rather than combining all of our data into our model at the onset, we develop a separable architecture that operates on individual time-series first before passing them forward; this induces a permutation-invariance property and can be used to transfer across systems of different size and order. After demonstrating that our model can be applied to successfully recover complex interactions and dynamics in many-body systems, we apply our approach to populations of neurons in the nervous system. On neural activity datasets, we show that our model not only yields robust decoding performance, but also provides impressive performance in transfer across recordings of different animals without any neuron-level correspondence. By enabling flexible pre-training that can be transferred to neural recordings of different size and order, our work provides a first step towards creating a foundation model for neural decoding.
△ Less
Submitted 20 October, 2022; v1 submitted 10 June, 2022;
originally announced June 2022.
-
Learning Sinkhorn divergences for supervised change point detection
Authors:
Nauman Ahad,
Eva L. Dyer,
Keith B. Hengen,
Yao Xie,
Mark A. Davenport
Abstract:
Many modern applications require detecting change points in complex sequential data. Most existing methods for change point detection are unsupervised and, as a consequence, lack any information regarding what kind of changes we want to detect or if some kinds of changes are safe to ignore. This often results in poor change detection performance. We present a novel change point detection framework…
▽ More
Many modern applications require detecting change points in complex sequential data. Most existing methods for change point detection are unsupervised and, as a consequence, lack any information regarding what kind of changes we want to detect or if some kinds of changes are safe to ignore. This often results in poor change detection performance. We present a novel change point detection framework that uses true change point instances as supervision for learning a ground metric such that Sinkhorn divergences can be then used in two-sample tests on sliding windows to detect change points in an online manner. Our method can be used to learn a sparse metric which can be useful for both feature selection and interpretation in high-dimensional change point detection settings. Experiments on simulated as well as real world sequences show that our proposed method can substantially improve change point detection performance over existing unsupervised change point detection methods using only few labeled change point instances.
△ Less
Submitted 10 February, 2022; v1 submitted 8 February, 2022;
originally announced February 2022.
-
Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity
Authors:
Ran Liu,
Mehdi Azabou,
Max Dabagia,
Chi-Heng Lin,
Mohammad Gheshlaghi Azar,
Keith B. Hengen,
Michal Valko,
Eva L. Dyer
Abstract:
Meaningful and simplified representations of neural activity can yield insights into how and what information is being processed within a neural circuit. However, without labels, finding representations that reveal the link between the brain and behavior can be challenging. Here, we introduce a novel unsupervised approach for learning disentangled representations of neural activity called Swap-VAE…
▽ More
Meaningful and simplified representations of neural activity can yield insights into how and what information is being processed within a neural circuit. However, without labels, finding representations that reveal the link between the brain and behavior can be challenging. Here, we introduce a novel unsupervised approach for learning disentangled representations of neural activity called Swap-VAE. Our approach combines a generative modeling framework with an instance-specific alignment loss that tries to maximize the representational similarity between transformed views of the input (brain state). These transformed (or augmented) views are created by dropping out neurons and jittering samples in time, which intuitively should lead the network to a representation that maintains both temporal consistency and invariance to the specific neurons used to represent the neural state. Through evaluations on both synthetic data and neural recordings from hundreds of neurons in different primate brains, we show that it is possible to build representations that disentangle neural datasets along relevant latent dimensions linked to behavior.
△ Less
Submitted 3 November, 2021;
originally announced November 2021.
-
Neural Latents Benchmark '21: Evaluating latent variable models of neural population activity
Authors:
Felix Pei,
Joel Ye,
David Zoltowski,
Anqi Wu,
Raeed H. Chowdhury,
Hansem Sohn,
Joseph E. O'Doherty,
Krishna V. Shenoy,
Matthew T. Kaufman,
Mark Churchland,
Mehrdad Jazayeri,
Lee E. Miller,
Jonathan Pillow,
Il Memming Park,
Eva L. Dyer,
Chethan Pandarinath
Abstract:
Advances in neural recording present increasing opportunities to study neural activity in unprecedented detail. Latent variable models (LVMs) are promising tools for analyzing this rich activity across diverse neural systems and behaviors, as LVMs do not depend on known relationships between the activity and external experimental variables. However, progress with LVMs for neuronal population activ…
▽ More
Advances in neural recording present increasing opportunities to study neural activity in unprecedented detail. Latent variable models (LVMs) are promising tools for analyzing this rich activity across diverse neural systems and behaviors, as LVMs do not depend on known relationships between the activity and external experimental variables. However, progress with LVMs for neuronal population activity is currently impeded by a lack of standardization, resulting in methods being developed and compared in an ad hoc manner. To coordinate these modeling efforts, we introduce a benchmark suite for latent variable modeling of neural population activity. We curate four datasets of neural spiking activity from cognitive, sensory, and motor areas to promote models that apply to the wide variety of activity seen across these areas. We identify unsupervised evaluation as a common framework for evaluating models across datasets, and apply several baselines that demonstrate benchmark diversity. We release this benchmark through EvalAI. http://neurallatents.github.io
△ Less
Submitted 17 January, 2022; v1 submitted 9 September, 2021;
originally announced September 2021.
-
Mine Your Own vieW: Self-Supervised Learning Through Across-Sample Prediction
Authors:
Mehdi Azabou,
Mohammad Gheshlaghi Azar,
Ran Liu,
Chi-Heng Lin,
Erik C. Johnson,
Kiran Bhaskaran-Nair,
Max Dabagia,
Bernardo Avila-Pires,
Lindsey Kitchell,
Keith B. Hengen,
William Gray-Roncal,
Michal Valko,
Eva L. Dyer
Abstract:
State-of-the-art methods for self-supervised learning (SSL) build representations by maximizing the similarity between different transformed "views" of a sample. Without sufficient diversity in the transformations used to create views, however, it can be difficult to overcome nuisance variables in the data and build rich representations. This motivates the use of the dataset itself to find similar…
▽ More
State-of-the-art methods for self-supervised learning (SSL) build representations by maximizing the similarity between different transformed "views" of a sample. Without sufficient diversity in the transformations used to create views, however, it can be difficult to overcome nuisance variables in the data and build rich representations. This motivates the use of the dataset itself to find similar, yet distinct, samples to serve as views for one another. In this paper, we introduce Mine Your Own vieW (MYOW), a new approach for self-supervised learning that looks within the dataset to define diverse targets for prediction. The idea behind our approach is to actively mine views, finding samples that are neighbors in the representation space of the network, and then predict, from one sample's latent representation, the representation of a nearby sample. After showing the promise of MYOW on benchmarks used in computer vision, we highlight the power of this idea in a novel application in neuroscience where SSL has yet to be applied. When tested on multi-unit neural recordings, we find that MYOW outperforms other self-supervised approaches in all examples (in some cases by more than 10%), and often surpasses the supervised baseline. With MYOW, we show that it is possible to harness the diversity of the data to build rich views and leverage self-supervision in new domains where augmentations are limited or unknown.
△ Less
Submitted 13 December, 2021; v1 submitted 19 February, 2021;
originally announced February 2021.
-
Large-Scale Representation Learning on Graphs via Bootstrapping
Authors:
Shantanu Thakoor,
Corentin Tallec,
Mohammad Gheshlaghi Azar,
Mehdi Azabou,
Eva L. Dyer,
Rémi Munos,
Petar Veličković,
Michal Valko
Abstract:
Self-supervised learning provides a promising path towards eliminating the need for costly label information in representation learning on graphs. However, to achieve state-of-the-art performance, methods often need large numbers of negative examples and rely on complex augmentations. This can be prohibitively expensive, especially for large graphs. To address these challenges, we introduce Bootst…
▽ More
Self-supervised learning provides a promising path towards eliminating the need for costly label information in representation learning on graphs. However, to achieve state-of-the-art performance, methods often need large numbers of negative examples and rely on complex augmentations. This can be prohibitively expensive, especially for large graphs. To address these challenges, we introduce Bootstrapped Graph Latents (BGRL) - a graph representation learning method that learns by predicting alternative augmentations of the input. BGRL uses only simple augmentations and alleviates the need for contrasting with negative examples, and is thus scalable by design. BGRL outperforms or matches prior methods on several established benchmarks, while achieving a 2-10x reduction in memory costs. Furthermore, we show that BGRL can be scaled up to extremely large graphs with hundreds of millions of nodes in the semi-supervised regime - achieving state-of-the-art performance and improving over supervised baselines where representations are shaped only through label information. In particular, our solution centered on BGRL constituted one of the winning entries to the Open Graph Benchmark - Large Scale Challenge at KDD Cup 2021, on a graph orders of magnitudes larger than all previously available benchmarks, thus demonstrating the scalability and effectiveness of our approach.
△ Less
Submitted 20 February, 2023; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Making transport more robust and interpretable by moving data through a small number of anchor points
Authors:
Chi-Heng Lin,
Mehdi Azabou,
Eva L. Dyer
Abstract:
Optimal transport (OT) is a widely used technique for distribution alignment, with applications throughout the machine learning, graphics, and vision communities. Without any additional structural assumptions on trans-port, however, OT can be fragile to outliers or noise, especially in high dimensions. Here, we introduce a new form of structured OT that simultaneously learns low-dimensional struct…
▽ More
Optimal transport (OT) is a widely used technique for distribution alignment, with applications throughout the machine learning, graphics, and vision communities. Without any additional structural assumptions on trans-port, however, OT can be fragile to outliers or noise, especially in high dimensions. Here, we introduce a new form of structured OT that simultaneously learns low-dimensional structure in data while leveraging this structure to solve the alignment task. Compared with OT, the resulting transport plan has better structural interpretability, highlighting the connections between individual data points and local geometry, and is more robust to noise and sampling. We apply the method to synthetic as well as real datasets, where we show that our method can facilitate alignment in noisy settings and can be used to both correct and interpret domain shift.
△ Less
Submitted 17 July, 2021; v1 submitted 21 December, 2020;
originally announced December 2020.
-
Bayesian optimization for modular black-box systems with switching costs
Authors:
Chi-Heng Lin,
Joseph D. Miano,
Eva L. Dyer
Abstract:
Most existing black-box optimization methods assume that all variables in the system being optimized have equal cost and can change freely at each iteration. However, in many real world systems, inputs are passed through a sequence of different operations or modules, making variables in earlier stages of processing more costly to update. Such structure imposes a cost on switching variables in earl…
▽ More
Most existing black-box optimization methods assume that all variables in the system being optimized have equal cost and can change freely at each iteration. However, in many real world systems, inputs are passed through a sequence of different operations or modules, making variables in earlier stages of processing more costly to update. Such structure imposes a cost on switching variables in early parts of a data processing pipeline. In this work, we propose a new algorithm for switch cost-aware optimization called Lazy Modular Bayesian Optimization (LaMBO). This method efficiently identifies the global optimum while minimizing cost through a passive change of variables in early modules. The method is theoretical grounded and achieves vanishing regret when augmented with switching cost. We apply LaMBO to multiple synthetic functions and a three-stage image segmentation pipeline used in a neuroscience application, where we obtain promising improvements over prevailing cost-aware Bayesian optimization algorithms. Our results demonstrate that LaMBO is an effective strategy for black-box optimization that is capable of minimizing switching costs in modular systems.
△ Less
Submitted 11 October, 2021; v1 submitted 3 June, 2020;
originally announced June 2020.
-
Hierarchical Optimal Transport for Multimodal Distribution Alignment
Authors:
John Lee,
Max Dabagia,
Eva L. Dyer,
Christopher J. Rozell
Abstract:
In many machine learning applications, it is necessary to meaningfully aggregate, through alignment, different but related datasets. Optimal transport (OT)-based approaches pose alignment as a divergence minimization problem: the aim is to transform a source dataset to match a target dataset using the Wasserstein distance as a divergence measure. We introduce a hierarchical formulation of OT which…
▽ More
In many machine learning applications, it is necessary to meaningfully aggregate, through alignment, different but related datasets. Optimal transport (OT)-based approaches pose alignment as a divergence minimization problem: the aim is to transform a source dataset to match a target dataset using the Wasserstein distance as a divergence measure. We introduce a hierarchical formulation of OT which leverages clustered structure in data to improve alignment in noisy, ambiguous, or multimodal settings. To solve this numerically, we propose a distributed ADMM algorithm that also exploits the Sinkhorn distance, thus it has an efficient computational complexity that scales quadratically with the size of the largest cluster. When the transformation between two datasets is unitary, we provide performance guarantees that describe when and how well aligned cluster correspondences can be recovered with our formulation, as well as provide worst-case dataset geometry for such a strategy. We apply this method to synthetic datasets that model data as mixtures of low-rank Gaussians and study the impact that different geometric properties of the data have on alignment. Next, we applied our approach to a neural decoding application where the goal is to predict movement directions and instantaneous velocities from populations of neurons in the macaque primary motor cortex. Our results demonstrate that when clustered structure exists in datasets, and is consistent across trials or time points, a hierarchical alignment strategy that leverages such structure can provide significant improvements in cross-domain alignment.
△ Less
Submitted 3 November, 2019; v1 submitted 27 June, 2019;
originally announced June 2019.
-
Quantifying mesoscale neuroanatomy using X-ray microtomography
Authors:
Eva L. Dyer,
William Gray Roncal,
Hugo L. Fernandes,
Doga Gürsoy,
Vincent De Andrade,
Rafael Vescovi,
Kamel Fezzaa,
Xianghui Xiao,
Joshua T. Vogelstein,
Chris Jacobsen,
Konrad P. Körding,
Narayanan Kasthuri
Abstract:
Methods for resolving the 3D microstructure of the brain typically start by thinly slicing and staining the brain, and then imaging each individual section with visible light photons or electrons. In contrast, X-rays can be used to image thick samples, providing a rapid approach for producing large 3D brain maps without sectioning. Here we demonstrate the use of synchrotron X-ray microtomography (…
▽ More
Methods for resolving the 3D microstructure of the brain typically start by thinly slicing and staining the brain, and then imaging each individual section with visible light photons or electrons. In contrast, X-rays can be used to image thick samples, providing a rapid approach for producing large 3D brain maps without sectioning. Here we demonstrate the use of synchrotron X-ray microtomography ($μ$CT) for producing mesoscale $(1~μm^3)$ resolution brain maps from millimeter-scale volumes of mouse brain. We introduce a pipeline for $μ$CT-based brain mapping that combines methods for sample preparation, imaging, automated segmentation of image volumes into cells and blood vessels, and statistical analysis of the resulting brain structures. Our results demonstrate that X-ray tomography promises rapid quantification of large brain volumes, complementing other brain mapping and connectomics efforts.
△ Less
Submitted 26 July, 2016; v1 submitted 12 April, 2016;
originally announced April 2016.
-
oASIS: Adaptive Column Sampling for Kernel Matrix Approximation
Authors:
Raajen Patel,
Thomas A. Goldstein,
Eva L. Dyer,
Azalia Mirhoseini,
Richard G. Baraniuk
Abstract:
Kernel matrices (e.g. Gram or similarity matrices) are essential for many state-of-the-art approaches to classification, clustering, and dimensionality reduction. For large datasets, the cost of forming and factoring such kernel matrices becomes intractable. To address this challenge, we introduce a new adaptive sampling algorithm called Accelerated Sequential Incoherence Selection (oASIS) that sa…
▽ More
Kernel matrices (e.g. Gram or similarity matrices) are essential for many state-of-the-art approaches to classification, clustering, and dimensionality reduction. For large datasets, the cost of forming and factoring such kernel matrices becomes intractable. To address this challenge, we introduce a new adaptive sampling algorithm called Accelerated Sequential Incoherence Selection (oASIS) that samples columns without explicitly computing the entire kernel matrix. We provide conditions under which oASIS is guaranteed to exactly recover the kernel matrix with an optimal number of columns selected. Numerical experiments on both synthetic and real-world datasets demonstrate that oASIS achieves performance comparable to state-of-the-art adaptive sampling methods at a fraction of the computational cost. The low runtime complexity of oASIS and its low memory footprint enable the solution of large problems that are simply intractable using other adaptive methods.
△ Less
Submitted 19 May, 2015;
originally announced May 2015.
-
Self-Expressive Decompositions for Matrix Approximation and Clustering
Authors:
Eva L. Dyer,
Tom A. Goldstein,
Raajen Patel,
Konrad P. Kording,
Richard G. Baraniuk
Abstract:
Data-aware methods for dimensionality reduction and matrix decomposition aim to find low-dimensional structure in a collection of data. Classical approaches discover such structure by learning a basis that can efficiently express the collection. Recently, "self expression", the idea of using a small subset of data vectors to represent the full collection, has been developed as an alternative to le…
▽ More
Data-aware methods for dimensionality reduction and matrix decomposition aim to find low-dimensional structure in a collection of data. Classical approaches discover such structure by learning a basis that can efficiently express the collection. Recently, "self expression", the idea of using a small subset of data vectors to represent the full collection, has been developed as an alternative to learning. Here, we introduce a scalable method for computing sparse SElf-Expressive Decompositions (SEED). SEED is a greedy method that constructs a basis by sequentially selecting incoherent vectors from the dataset. After forming a basis from a subset of vectors in the dataset, SEED then computes a sparse representation of the dataset with respect to this basis. We develop sufficient conditions under which SEED exactly represents low rank matrices and vectors sampled from a unions of independent subspaces. We show how SEED can be used in applications ranging from matrix approximation and denoising to clustering, and apply it to numerous real-world datasets. Our results demonstrate that SEED is an attractive low-complexity alternative to other sparse matrix factorization approaches such as sparse PCA and self-expressive methods for clustering.
△ Less
Submitted 4 May, 2015;
originally announced May 2015.
-
RankMap: A Platform-Aware Framework for Distributed Learning from Dense Datasets
Authors:
Azalia Mirhoseini,
Eva L. Dyer,
Ebrahim. M. Songhori,
Richard G. Baraniuk,
Farinaz Koushanfar
Abstract:
This paper introduces RankMap, a platform-aware end-to-end framework for efficient execution of a broad class of iterative learning algorithms for massive and dense datasets. Our framework exploits data structure to factorize it into an ensemble of lower rank subspaces. The factorization creates sparse low-dimensional representations of the data, a property which is leveraged to devise effective m…
▽ More
This paper introduces RankMap, a platform-aware end-to-end framework for efficient execution of a broad class of iterative learning algorithms for massive and dense datasets. Our framework exploits data structure to factorize it into an ensemble of lower rank subspaces. The factorization creates sparse low-dimensional representations of the data, a property which is leveraged to devise effective mapping and scheduling of iterative learning algorithms on the distributed computing machines. We provide two APIs, one matrix-based and one graph-based, which facilitate automated adoption of the framework for performing several contemporary learning applications. To demonstrate the utility of RankMap, we solve sparse recovery and power iteration problems on various real-world datasets with up to 1.8 billion non-zeros. Our evaluations are performed on Amazon EC2 and IBM iDataPlex servers using up to 244 cores. The results demonstrate up to two orders of magnitude improvements in memory usage, execution speed, and bandwidth compared with the best reported prior work, while achieving the same level of learning accuracy.
△ Less
Submitted 27 October, 2016; v1 submitted 27 March, 2015;
originally announced March 2015.
-
Greedy Feature Selection for Subspace Clustering
Authors:
Eva L. Dyer,
Aswin C. Sankaranarayanan,
Richard G. Baraniuk
Abstract:
Unions of subspaces provide a powerful generalization to linear subspace models for collections of high-dimensional data. To learn a union of subspaces from a collection of data, sets of signals in the collection that belong to the same subspace must be identified in order to obtain accurate estimates of the subspace structures present in the data. Recently, sparse recovery methods have been shown…
▽ More
Unions of subspaces provide a powerful generalization to linear subspace models for collections of high-dimensional data. To learn a union of subspaces from a collection of data, sets of signals in the collection that belong to the same subspace must be identified in order to obtain accurate estimates of the subspace structures present in the data. Recently, sparse recovery methods have been shown to provide a provable and robust strategy for exact feature selection (EFS)--recovering subsets of points from the ensemble that live in the same subspace. In parallel with recent studies of EFS with L1-minimization, in this paper, we develop sufficient conditions for EFS with a greedy method for sparse signal recovery known as orthogonal matching pursuit (OMP). Following our analysis, we provide an empirical study of feature selection strategies for signals living on unions of subspaces and characterize the gap between sparse recovery methods and nearest neighbor (NN)-based approaches. In particular, we demonstrate that sparse recovery methods provide significant advantages over NN methods and the gap between the two approaches is particularly pronounced when the sampling of subspaces in the dataset is sparse. Our results suggest that OMP may be employed to reliably recover exact feature sets in a number of regimes where NN approaches fail to reveal the subspace membership of points in the ensemble.
△ Less
Submitted 3 July, 2013; v1 submitted 19 March, 2013;
originally announced March 2013.