Showing 1–9 of 9 results for author: Mallinar, N

Searching in archive cs.
  1. arXiv:2407.20199  [pdf, other]

    stat.ML cs.LG

    Emergence in non-neural models: grokking modular arithmetic via average gradient outer product

    Authors: Neil Mallinar, Daniel Beaglehole, Libin Zhu, Adityanarayanan Radhakrishnan, Parthe Pandit, Mikhail Belkin

    Abstract: Neural networks trained to solve modular arithmetic tasks exhibit grokking, a phenomenon where the test accuracy starts improving long after the model achieves 100% training accuracy in the training process. It is often taken as an example of "emergence", where model ability manifests sharply through a phase transition. In this work, we show that the phenomenon of grokking is not specific to neura…

    Submitted 29 July, 2024; originally announced July 2024.

  2. arXiv:2404.00522  [pdf, other]

    cs.LG stat.ML

    Minimum-Norm Interpolation Under Covariate Shift

    Authors: Neil Mallinar, Austin Zane, Spencer Frei, Bin Yu

    Abstract: Transfer learning is a critical part of real-world machine learning deployments and has been extensively studied in experimental works with overparameterized neural networks. However, even in the simplest setting of linear regression a notable gap still exists in the theoretical understanding of transfer learning. In-distribution research on high-dimensional linear regression has led to the identi…

    Submitted 17 July, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: The Forty-first International Conference on Machine Learning (ICML 2024)

  3. arXiv:2210.01964  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    The Calibration Generalization Gap

    Authors: A. Michael Carrell, Neil Mallinar, James Lucas, Preetum Nakkiran

    Abstract: Calibration is a fundamental property of a good predictive model: it requires that the model predicts correctly in proportion to its confidence. Modern neural networks, however, provide no strong guarantees on their calibration -- and can be either poorly calibrated or well-calibrated depending on the setting. It is currently unclear which factors contribute to good calibration (architecture, data…

    Submitted 6 October, 2022; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: Appeared at ICML 2022 Workshop on Distribution-Free Uncertainty Quantification

  4. arXiv:2207.06569  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting

    Authors: Neil Mallinar, James B. Simon, Amirhesam Abedsoltan, Parthe Pandit, Mikhail Belkin, Preetum Nakkiran

    Abstract: The practical success of overparameterized neural networks has motivated the recent scientific study of interpolating methods, which perfectly fit their training data. Certain interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance, in defiance of standard intuitions from statistical learning theory. Aiming to explain this, a body…

    Submitted 15 July, 2024; v1 submitted 13 July, 2022; originally announced July 2022.

    Comments: NM and JS co-first authors

  5. arXiv:2002.01412  [pdf, other]

    cs.LG cs.CL

    Iterative Data Programming for Expanding Text Classification Corpora

    Authors: Neil Mallinar, Abhishek Shah, Tin Kam Ho, Rajendra Ugrani, Ayush Gupta

    Abstract: Real-world text classification tasks often require many labeled training examples that are expensive to obtain. Recent advancements in machine teaching, specifically the data programming paradigm, facilitate the creation of training data sets quickly via a general framework for building weak models, also known as labeling functions, and denoising them through ensemble learning techniques. We prese…

    Submitted 4 February, 2020; originally announced February 2020.

    Comments: 6 pages, 2 figures, In Proceedings of the AAAI Conference on Artificial Intelligence 2020 (IAAI Technical Track: Emerging Papers)

  6. arXiv:1907.13121  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Multi-Frame Cross-Entropy Training for Convolutional Neural Networks in Speech Recognition

    Authors: Tom Sercu, Neil Mallinar

    Abstract: We introduce Multi-Frame Cross-Entropy training (MFCE) for convolutional neural network acoustic models. Recognizing that similar to RNNs, CNNs are in nature sequence models that take variable length inputs, we propose to take as input to the CNN a part of an utterance long enough that multiple labels are predicted at once, therefore getting cross-entropy loss signal from multiple adjacent frames.…

    Submitted 29 July, 2019; originally announced July 2019.

  7. arXiv:1812.06176  [pdf, other]

    cs.AI cs.CL

    Bootstrapping Conversational Agents With Weak Supervision

    Authors: Neil Mallinar, Abhishek Shah, Rajendra Ugrani, Ayush Gupta, Manikandan Gurusankar, Tin Kam Ho, Q. Vera Liao, Yunfeng Zhang, Rachel K. E. Bellamy, Robert Yates, Chris Desmarais, Blake McGregor

    Abstract: Many conversational agents in the market today follow a standard bot development framework which requires training intent classifiers to recognize user input. The need to create a proper set of training examples is often the bottleneck in the development process. In many occasions agent developers have access to historical chat logs that can provide a good quantity as well as coverage of training…

    Submitted 14 December, 2018; originally announced December 2018.

    Comments: 6 pages, 3 figures, 1 table, Accepted for publication in IAAI 2019

  8. arXiv:1807.03848  [pdf, ps, other]

    cs.CV

    Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition

    Authors: Chun-Fu Chen, Quanfu Fan, Neil Mallinar, Tom Sercu, Rogerio Feris

    Abstract: In this paper, we propose a novel Convolutional Neural Network (CNN) architecture for learning multi-scale feature representations with good tradeoffs between speed and accuracy. This is achieved by using a multi-branch network, which has different computational complexity at different branches. Through frequent merging of features from branches at distinct scales, our model obtains multi-scale fe…

    Submitted 30 July, 2019; v1 submitted 10 July, 2018; originally announced July 2018.

    Comments: git repo: https://github.com/IBM/BigLittleNet

  9. arXiv:1801.05407  [pdf, other]

    stat.ML cs.LG

    Deep Canonically Correlated LSTMs

    Authors: Neil Mallinar, Corbin Rosset

    Abstract: We examine Deep Canonically Correlated LSTMs as a way to learn nonlinear transformations of variable length sequences and embed them into a correlated, fixed dimensional space. We use LSTMs to transform multi-view time-series data non-linearly while learning temporal relationships within the data. We then perform correlation analysis on the outputs of these neural networks to find a correlated sub…

    Submitted 16 January, 2018; originally announced January 2018.

    Comments: 8 pages, 3 figures, accepted as the undergraduate honors thesis for Neil Mallinar by The Johns Hopkins University