Search | arXiv e-print repository

Understanding Generative AI Content with Embedding Models

Authors: Max Vargas, Reilly Cannon, Andrew Engel, Anand D. Sarwate, Tony Chiang

Abstract: The construction of high-quality numerical features is critical to any quantitative data analysis. Feature engineering has been historically addressed by carefully hand-crafting data representations based on domain expertise. This work views the internal representations of modern deep neural networks (DNNs), called embeddings, as an automated form of traditional feature engineering. For trained DN… ▽ More The construction of high-quality numerical features is critical to any quantitative data analysis. Feature engineering has been historically addressed by carefully hand-crafting data representations based on domain expertise. This work views the internal representations of modern deep neural networks (DNNs), called embeddings, as an automated form of traditional feature engineering. For trained DNNs, we show that these embeddings can reveal interpretable, high-level concepts in unstructured sample data. We use these embeddings in natural language and computer vision tasks to uncover both inherent heterogeneity in the underlying data and human-understandable explanations for it. In particular, we find empirical evidence that there is inherent separability between real data and that generated from AI models. △ Less

Submitted 22 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

arXiv:2402.03535 [pdf, other]

Preliminary Report on Mantis Shrimp: a Multi-Survey Computer Vision Photometric Redshift Model

Authors: Andrew Engel, Gautham Narayan, Nell Byler

Abstract: The availability of large, public, multi-modal astronomical datasets presents an opportunity to execute novel research that straddles the line between science of AI and science of astronomy. Photometric redshift estimation is a well-established subfield of astronomy. Prior works show that computer vision models typically outperform catalog-based models, but these models face additional complexitie… ▽ More The availability of large, public, multi-modal astronomical datasets presents an opportunity to execute novel research that straddles the line between science of AI and science of astronomy. Photometric redshift estimation is a well-established subfield of astronomy. Prior works show that computer vision models typically outperform catalog-based models, but these models face additional complexities when incorporating images from more than one instrument or sensor. In this report, we detail our progress creating Mantis Shrimp, a multi-survey computer vision model for photometric redshift estimation that fuses ultra-violet (GALEX), optical (PanSTARRS), and infrared (UnWISE) imagery. We use deep learning interpretability diagnostics to measure how the model leverages information from the different inputs. We reason about the behavior of the CNNs from the interpretability metrics, specifically framing the result in terms of physically-grounded knowledge of galaxy properties. △ Less

Submitted 5 February, 2024; originally announced February 2024.

Comments: 4 pages, 1 figure, 1 table. Submitted to AI4Differential Equations in Science Workshop at ICLR24. Public repository unavailable while under institutional review

arXiv:2310.18612 [pdf, other]

Efficient kernel surrogates for neural network-based regression

Authors: Saad Qadeer, Andrew Engel, Amanda Howard, Adam Tsou, Max Vargas, Panos Stinis, Tony Chiang

Abstract: Despite their immense promise in performing a variety of learning tasks, a theoretical understanding of the limitations of Deep Neural Networks (DNNs) has so far eluded practitioners. This is partly due to the inability to determine the closed forms of the learned functions, making it harder to study their generalization properties on unseen datasets. Recent work has shown that randomly initialize… ▽ More Despite their immense promise in performing a variety of learning tasks, a theoretical understanding of the limitations of Deep Neural Networks (DNNs) has so far eluded practitioners. This is partly due to the inability to determine the closed forms of the learned functions, making it harder to study their generalization properties on unseen datasets. Recent work has shown that randomly initialized DNNs in the infinite width limit converge to kernel machines relying on a Neural Tangent Kernel (NTK) with known closed form. These results suggest, and experimental evidence corroborates, that empirical kernel machines can also act as surrogates for finite width DNNs. The high computational cost of assembling the full NTK, however, makes this approach infeasible in practice, motivating the need for low-cost approximations. In the current work, we study the performance of the Conjugate Kernel (CK), an efficient approximation to the NTK that has been observed to yield fairly similar results. For the regression problem of smooth functions and logistic regression classification, we show that the CK performance is only marginally worse than that of the NTK and, in certain cases, is shown to be superior. In particular, we establish bounds for the relative test losses, verify them with numerical tests, and identify the regularity of the kernel as the key determinant of performance. In addition to providing a theoretical grounding for using CKs instead of NTKs, our framework suggests a recipe for improving DNN accuracy inexpensively. We present a demonstration of this on the foundation model GPT-2 by comparing its performance on a classification task using a conventional approach and our prescription. We also show how our approach can be used to improve physics-informed operator network training for regression tasks as well as convolutional neural network training for vision classification tasks. △ Less

Submitted 24 January, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

Comments: 35 pages. software used to reach results available upon request, approved for release by Pacific Northwest National Laboratory

Report number: PNNL-SA-191858 MSC Class: 68T07; 65M99

arXiv:2310.13836 [pdf, other]

Foundation Model's Embedded Representations May Detect Distribution Shift

Authors: Max Vargas, Adam Tsou, Andrew Engel, Tony Chiang

Abstract: Sampling biases can cause distribution shifts between train and test datasets for supervised learning tasks, obscuring our ability to understand the generalization capacity of a model. This is especially important considering the wide adoption of pre-trained foundational neural networks -- whose behavior remains poorly understood -- for transfer learning (TL) tasks. We present a case study for TL… ▽ More Sampling biases can cause distribution shifts between train and test datasets for supervised learning tasks, obscuring our ability to understand the generalization capacity of a model. This is especially important considering the wide adoption of pre-trained foundational neural networks -- whose behavior remains poorly understood -- for transfer learning (TL) tasks. We present a case study for TL on the Sentiment140 dataset and show that many pre-trained foundation models encode different representations of Sentiment140's manually curated test set $M$ from the automatically labeled training set $P$, confirming that a distribution shift has occurred. We argue training on $P$ and measuring performance on $M$ is a biased measure of generalization. Experiments on pre-trained GPT-2 show that the features learnable from $P$ do not improve (and in fact hamper) performance on $M$. Linear probes on pre-trained GPT-2's representations are robust and may even outperform overall fine-tuning, implying a fundamental importance for discerning distribution shift in train/test splits for model interpretation. △ Less

Submitted 2 February, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

Comments: 17 pages, 8 figures, 5 tables

arXiv:2309.15328 [pdf, other]

Exploring Learned Representations of Neural Networks with Principal Component Analysis

Authors: Amit Harlev, Andrew Engel, Panos Stinis, Tony Chiang

Abstract: Understanding feature representation for deep neural networks (DNNs) remains an open question within the general field of explainable AI. We use principal component analysis (PCA) to study the performance of a k-nearest neighbors classifier (k-NN), nearest class-centers classifier (NCC), and support vector machines on the learned layer-wise representations of a ResNet-18 trained on CIFAR-10. We sh… ▽ More Understanding feature representation for deep neural networks (DNNs) remains an open question within the general field of explainable AI. We use principal component analysis (PCA) to study the performance of a k-nearest neighbors classifier (k-NN), nearest class-centers classifier (NCC), and support vector machines on the learned layer-wise representations of a ResNet-18 trained on CIFAR-10. We show that in certain layers, as little as 20% of the intermediate feature-space variance is necessary for high-accuracy classification and that across all layers, the first ~100 PCs completely determine the performance of the k-NN and NCC classifiers. We relate our findings to neural collapse and provide partial evidence for the related phenomenon of intermediate neural collapse. Our preliminary work provides three distinct yet interpretable surrogate models for feature representation with an affine linear model the best performing. We also show that leveraging several surrogate models affords us a clever method to estimate where neural collapse may initially occur within the DNN. △ Less

Submitted 26 September, 2023; originally announced September 2023.

Comments: 5 pages, 3 figures

arXiv:2305.14585 [pdf, other]

Faithful and Efficient Explanations for Neural Networks via Neural Tangent Kernel Surrogate Models

Authors: Andrew Engel, Zhichao Wang, Natalie S. Frank, Ioana Dumitriu, Sutanay Choudhury, Anand Sarwate, Tony Chiang

Abstract: A recent trend in explainable AI research has focused on surrogate modeling, where neural networks are approximated as simpler ML algorithms such as kernel machines. A second trend has been to utilize kernel functions in various explain-by-example or data attribution tasks. In this work, we combine these two trends to analyze approximate empirical neural tangent kernels (eNTK) for data attribution… ▽ More A recent trend in explainable AI research has focused on surrogate modeling, where neural networks are approximated as simpler ML algorithms such as kernel machines. A second trend has been to utilize kernel functions in various explain-by-example or data attribution tasks. In this work, we combine these two trends to analyze approximate empirical neural tangent kernels (eNTK) for data attribution. Approximation is critical for eNTK analysis due to the high computational cost to compute the eNTK. We define new approximate eNTK and perform novel analysis on how well the resulting kernel machine surrogate models correlate with the underlying neural network. We introduce two new random projection variants of approximate eNTK which allow users to tune the time and memory complexity of their calculation. We conclude that kernel machines using approximate neural tangent kernel as the kernel function are effective surrogate models, with the introduced trace NTK the most consistent performer. Open source software allowing users to efficiently calculate kernel functions in the PyTorch framework is available (https://github.com/pnnl/projection\_ntk). △ Less

Submitted 11 March, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: 9 pages, 2 figures, 3 tables Updated 3/11/2024 various additions/clarifications after ICLR review. Accepted as a Spotlight paper at ICLR 2024

arXiv:2302.02337 [pdf]

Regulating ChatGPT and other Large Generative AI Models

Authors: Philipp Hacker, Andreas Engel, Marco Mauer

Abstract: Large generative AI models (LGAIMs), such as ChatGPT, GPT-4 or Stable Diffusion, are rapidly transforming the way we communicate, illustrate, and create. However, AI regulation, in the EU and beyond, has primarily focused on conventional AI models, not LGAIMs. This paper will situate these new generative models in the current debate on trustworthy AI regulation, and ask how the law can be tailored… ▽ More Large generative AI models (LGAIMs), such as ChatGPT, GPT-4 or Stable Diffusion, are rapidly transforming the way we communicate, illustrate, and create. However, AI regulation, in the EU and beyond, has primarily focused on conventional AI models, not LGAIMs. This paper will situate these new generative models in the current debate on trustworthy AI regulation, and ask how the law can be tailored to their capabilities. After laying technical foundations, the legal part of the paper proceeds in four steps, covering (1) direct regulation, (2) data protection, (3) content moderation, and (4) policy proposals. It suggests a novel terminology to capture the AI value chain in LGAIM settings by differentiating between LGAIM developers, deployers, professional and non-professional users, as well as recipients of LGAIM output. We tailor regulatory duties to these different actors along the value chain and suggest strategies to ensure that LGAIMs are trustworthy and deployed for the benefit of society at large. Rules in the AI Act and other direct regulation must match the specificities of pre-trained models. The paper argues for three layers of obligations concerning LGAIMs (minimum standards for all LGAIMs; high-risk obligations for high-risk use cases; collaborations along the AI value chain). In general, regulation should focus on concrete high-risk applications, and not the pre-trained model itself, and should include (i) obligations regarding transparency and (ii) risk management. Non-discrimination provisions (iii) may, however, apply to LGAIM developers. Lastly, (iv) the core of the DSA content moderation rules should be expanded to cover LGAIMs. This includes notice and action mechanisms, and trusted flaggers. In all areas, regulators and lawmakers need to act fast to keep track with the dynamics of ChatGPT et al. △ Less

Submitted 12 May, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

Comments: FAccT '23, June 12-15, 2023, Chicago, IL, USA

ACM Class: I.2

arXiv:2211.06506 [pdf, other]

Spectral Evolution and Invariance in Linear-width Neural Networks

Authors: Zhichao Wang, Andrew Engel, Anand Sarwate, Ioana Dumitriu, Tony Chiang

Abstract: We investigate the spectral properties of linear-width feed-forward neural networks, where the sample size is asymptotically proportional to network width. Empirically, we show that the spectra of weight in this high dimensional regime are invariant when trained by gradient descent for small constant learning rates; we provide a theoretical justification for this observation and prove the invarian… ▽ More We investigate the spectral properties of linear-width feed-forward neural networks, where the sample size is asymptotically proportional to network width. Empirically, we show that the spectra of weight in this high dimensional regime are invariant when trained by gradient descent for small constant learning rates; we provide a theoretical justification for this observation and prove the invariance of the bulk spectra for both conjugate and neural tangent kernels. We demonstrate similar characteristics when training with stochastic gradient descent with small learning rates. When the learning rate is large, we exhibit the emergence of an outlier whose corresponding eigenvector is aligned with the training data structure. We also show that after adaptive gradient training, where a lower test error and feature learning emerge, both weight and kernel matrices exhibit heavy tail behavior. Simple examples are provided to explain when heavy tails can have better generalizations. We exhibit different spectral properties such as invariant bulk, spike, and heavy-tailed distribution from a two-layer neural network using different training strategies, and then correlate them to the feature learning. Analogous phenomena also appear when we train conventional neural networks with real-world data. We conclude that monitoring the evolution of the spectra during training is an essential step toward understanding the training dynamics and feature learning. △ Less

Submitted 7 November, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: Accepted by NeurIPS 2023

arXiv:2205.12372 [pdf, other]

TorchNTK: A Library for Calculation of Neural Tangent Kernels of PyTorch Models

Authors: Andrew Engel, Zhichao Wang, Anand D. Sarwate, Sutanay Choudhury, Tony Chiang

Abstract: We introduce torchNTK, a python library to calculate the empirical neural tangent kernel (NTK) of neural network models in the PyTorch framework. We provide an efficient method to calculate the NTK of multilayer perceptrons. We compare the explicit differentiation implementation against autodifferentiation implementations, which have the benefit of extending the utility of the library to any archi… ▽ More We introduce torchNTK, a python library to calculate the empirical neural tangent kernel (NTK) of neural network models in the PyTorch framework. We provide an efficient method to calculate the NTK of multilayer perceptrons. We compare the explicit differentiation implementation against autodifferentiation implementations, which have the benefit of extending the utility of the library to any architecture supported by PyTorch, such as convolutional networks. A feature of the library is that we expose the user to layerwise NTK components, and show that in some regimes a layerwise calculation is more memory efficient. We conduct preliminary experiments to demonstrate use cases for the software and probe the NTK. △ Less

Submitted 24 May, 2022; originally announced May 2022.

Comments: 19 pages, 5 figures

arXiv:2201.05242 [pdf, other]

Neural Circuit Architectural Priors for Embodied Control

Authors: Nikhil X. Bhattasali, Anthony M. Zador, Tatiana A. Engel

Abstract: Artificial neural networks for motor control usually adopt generic architectures like fully connected MLPs. While general, these tabula rasa architectures rely on large amounts of experience to learn, are not easily transferable to new bodies, and have internal dynamics that are difficult to interpret. In nature, animals are born with highly structured connectivity in their nervous systems shaped… ▽ More Artificial neural networks for motor control usually adopt generic architectures like fully connected MLPs. While general, these tabula rasa architectures rely on large amounts of experience to learn, are not easily transferable to new bodies, and have internal dynamics that are difficult to interpret. In nature, animals are born with highly structured connectivity in their nervous systems shaped by evolution; this innate circuitry acts synergistically with learning mechanisms to provide inductive biases that enable most animals to function well soon after birth and learn efficiently. Convolutional networks inspired by visual circuitry have encoded useful biases for vision. However, it is unknown the extent to which ANN architectures inspired by neural circuitry can yield useful biases for other AI domains. In this work, we ask what advantages biologically inspired ANN architecture can provide in the domain of motor control. Specifically, we translate C. elegans locomotion circuits into an ANN model controlling a simulated Swimmer agent. On a locomotion task, our architecture achieves good initial performance and asymptotic performance comparable with MLPs, while dramatically improving data efficiency and requiring orders of magnitude fewer parameters. Our architecture is interpretable and transfers to new body designs. An ablation analysis shows that constrained excitation/inhibition is crucial for learning, while weight initialization contributes to good initial performance. Our work demonstrates several advantages of biologically inspired ANN architecture and encourages future work in more complex embodied control. △ Less

Submitted 27 November, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

Comments: NeurIPS 2022

arXiv:2012.14944 [pdf, other]

doi 10.1038/s41467-021-26202-1

Learning non-stationary Langevin dynamics from stochastic observations of latent trajectories

Authors: Mikhail Genkin, Owen Hughes, Tatiana A. Engel

Abstract: Many complex systems operating far from the equilibrium exhibit stochastic dynamics that can be described by a Langevin equation. Inferring Langevin equations from data can reveal how transient dynamics of such systems give rise to their function. However, dynamics are often inaccessible directly and can be only gleaned through a stochastic observation process, which makes the inference challengin… ▽ More Many complex systems operating far from the equilibrium exhibit stochastic dynamics that can be described by a Langevin equation. Inferring Langevin equations from data can reveal how transient dynamics of such systems give rise to their function. However, dynamics are often inaccessible directly and can be only gleaned through a stochastic observation process, which makes the inference challenging. Here we present a non-parametric framework for inferring the Langevin equation, which explicitly models the stochastic observation process and non-stationary latent dynamics. The framework accounts for the non-equilibrium initial and final states of the observed system and for the possibility that the system's dynamics define the duration of observations. Omitting any of these non-stationary components results in incorrect inference, in which erroneous features arise in the dynamics due to non-stationary data distribution. We illustrate the framework using models of neural dynamics underlying decision making in the brain. △ Less

Submitted 29 December, 2020; originally announced December 2020.

Journal ref: Nat Commun 12, 5986 (2021)

arXiv:2012.12253 [pdf, other]

Improving Sample and Feature Selection with Principal Covariates Regression

Authors: Rose K. Cersonsky, Benjamin A. Helfrecht, Edgar A. Engel, Michele Ceriotti

Abstract: Selecting the most relevant features and samples out of a large set of candidates is a task that occurs very often in the context of automated data analysis, where it can be used to improve the computational performance, and also often the transferability, of a model. Here we focus on two popular sub-selection schemes which have been applied to this end: CUR decomposition, that is based on a low-r… ▽ More Selecting the most relevant features and samples out of a large set of candidates is a task that occurs very often in the context of automated data analysis, where it can be used to improve the computational performance, and also often the transferability, of a model. Here we focus on two popular sub-selection schemes which have been applied to this end: CUR decomposition, that is based on a low-rank approximation of the feature matrix and Farthest Point Sampling, that relies on the iterative identification of the most diverse samples and discriminating features. We modify these unsupervised approaches, incorporating a supervised component following the same spirit as the Principal Covariates Regression (PCovR) method. We show that incorporating target information provides selections that perform better in supervised tasks, which we demonstrate with ridge regression, kernel ridge regression, and sparse kernel regression. We also show that incorporating aspects of simple supervised learning models can improve the accuracy of more complex models, such as feed-forward neural networks. We present adjustments to minimize the impact that any subselection may incur when performing unsupervised tasks. We demonstrate the significant improvements associated with the use of PCov-CUR and PCov-FPS selections for applications to chemistry and materials science, typically reducing by a factor of two the number of features and samples which are required to achieve a given level of regression accuracy. △ Less

Submitted 22 December, 2020; originally announced December 2020.

arXiv:2011.08828 [pdf, other]

doi 10.1063/5.0036522

Uncertainty estimation for molecular dynamics and sampling

Authors: Giulio Imbalzano, Yongbin Zhuang, Venkat Kapil, Kevin Rossi, Edgar A. Engel, Federico Grasselli, Michele Ceriotti

Abstract: Machine learning models have emerged as a very effective strategy to sidestep time-consuming electronic-structure calculations, enabling accurate simulations of greater size, time scale and complexity. Given the interpolative nature of these models, the reliability of predictions depends on the position in phase space, and it is crucial to obtain an estimate of the error that derives from the fini… ▽ More Machine learning models have emerged as a very effective strategy to sidestep time-consuming electronic-structure calculations, enabling accurate simulations of greater size, time scale and complexity. Given the interpolative nature of these models, the reliability of predictions depends on the position in phase space, and it is crucial to obtain an estimate of the error that derives from the finite number of reference structures included during the training of the model. When using a machine-learning potential to sample a finite-temperature ensemble, the uncertainty on individual configurations translates into an error on thermodynamic averages, and provides an indication for the loss of accuracy when the simulation enters a previously unexplored region. Here we discuss how uncertainty quantification can be used, together with a baseline energy model, or a more robust although less accurate interatomic potential, to obtain more resilient simulations and to support active-learning strategies. Furthermore, we introduce an on-the-fly reweighing scheme that makes it possible to estimate the uncertainty in the thermodynamic averages extracted from long trajectories. We present examples covering different types of structural and thermodynamic properties, and systems as diverse as water and liquid gallium. △ Less

Submitted 14 January, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

Comments: 17 pages, 9 figures

arXiv:0809.1138 [pdf, other]

Derivation of evolutionary payoffs from observable behavior

Authors: Alexander Feigel, Avraham Englander, Assaf Engel

Abstract: Interpretation of animal behavior, especially as cooperative or selfish, is a challenge for evolutionary theory. Strategy of a competition should follow from corresponding Darwinian payoffs for the available behavioral options. The payoffs and decision making processes, however, are difficult to observe and quantify. Here we present a general method for the derivation of evolutionary payoffs fro… ▽ More Interpretation of animal behavior, especially as cooperative or selfish, is a challenge for evolutionary theory. Strategy of a competition should follow from corresponding Darwinian payoffs for the available behavioral options. The payoffs and decision making processes, however, are difficult to observe and quantify. Here we present a general method for the derivation of evolutionary payoffs from observable statistics of interactions. The method is applied to combat of male bowl and doily spiders, to predator inspection by sticklebacks and to territorial defense by lions, demonstrating animal behavior as a new type of game theoretical equilibrium. Games animals play may be derived unequivocally from their observable behavior, the reconstruction, however, can be subjected to fundamental limitations due to our inability to observe all information exchange mechanisms (communication). △ Less

Submitted 6 September, 2008; originally announced September 2008.

Comments: 9 pages, 3 figures

arXiv:0808.3203 [pdf, ps, other]

doi 10.1371/journal.pone.0006012

Sex is always well worth its two-fold cost

Authors: Alexander Feigel, Avraham Englander, Assaf Engel

Abstract: Sex is considered as an evolutionary paradox, since its evolutionary advantage does not necessarily overcome the two fold cost of sharing half of one's offspring's genome with another member of the population. Here we demonstrate that sexual reproduction can be evolutionary stable even when its Darwinian fitness is twice as low when compared to the fitness of asexual mutants. We also show that m… ▽ More Sex is considered as an evolutionary paradox, since its evolutionary advantage does not necessarily overcome the two fold cost of sharing half of one's offspring's genome with another member of the population. Here we demonstrate that sexual reproduction can be evolutionary stable even when its Darwinian fitness is twice as low when compared to the fitness of asexual mutants. We also show that more than two sexes are always evolutionary unstable. Our approach generalizes the evolutionary game theory to analyze species whose members are able to sense the sexual state of their conspecifics and to switch sexes consequently. The widespread emergence and maintenance of sex follows therefore from its co-evolution with even more widespread environmental sensing abilities. △ Less

Submitted 23 August, 2008; originally announced August 2008.

Comments: 8 pages, 3 figures

Showing 1–15 of 15 results for author: Engel, A