Showing 1–21 of 21 results for author: Shwartz-Ziv, R

Searching in archive cs.
  1. arXiv:2406.19314  [pdf, other]

    cs.CL cs.AI cs.LG

    LiveBench: A Challenging, Contamination-Free LLM Benchmark

    Authors: Colin White, Samuel Dooley, Manley Roberts, Arka Pal, Ben Feuer, Siddhartha Jain, Ravid Shwartz-Ziv, Neel Jain, Khalid Saifullah, Siddartha Naidu, Chinmay Hegde, Yann LeCun, Tom Goldstein, Willie Neiswanger, Micah Goldblum

    Abstract: Test set contamination, wherein test data from a benchmark ends up in a newer model's training set, is a well-documented obstacle for fair LLM evaluation and can quickly render benchmarks obsolete. To mitigate this, many recent benchmarks crowdsource new prompts and evaluations from human or LLM judges; however, these can introduce significant biases, and break down when scoring hard questions. In…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.14657  [pdf, other]

    cs.CL cs.AI cs.LG

    OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset

    Authors: Allen Roush, Yusuf Shabazz, Arvind Balaji, Peter Zhang, Stefano Mezza, Markus Zhang, Sanjay Basu, Sriram Vishwanath, Mehdi Fatemi, Ravid Shwartz-Ziv

    Abstract: We introduce OpenDebateEvidence, a comprehensive dataset for argument mining and summarization sourced from the American Competitive Debate community. This dataset includes over 3.5 million documents with rich metadata, making it one of the most extensive collections of debate evidence. OpenDebateEvidence captures the complexity of arguments in high school and college debates, providing valuable r…

    Submitted 5 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted for Publication to ARGMIN 2024 at ACL2024

  3. arXiv:2406.11463  [pdf, other]

    cs.LG stat.ML

    Just How Flexible are Neural Networks in Practice?

    Authors: Ravid Shwartz-Ziv, Micah Goldblum, Arpit Bansal, C. Bayan Bruss, Yann LeCun, Andrew Gordon Wilson

    Abstract: It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters, underpinning notions of overparameterized and underparameterized models. In practice, however, we only find solutions accessible via our training procedure, including the optimizer and regularizers, limiting flexibility. Moreover, the exact parameterization of the function c…

    Submitted 17 June, 2024; originally announced June 2024.

  4. arXiv:2406.09366  [pdf, other]

    cs.LG cs.CV q-bio.NC

    Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations

    Authors: Rylan Schaeffer, Victor Lecomte, Dhruv Bhandarkar Pai, Andres Carranza, Berivan Isik, Alyssa Unell, Mikail Khona, Thomas Yerxa, Yann LeCun, SueYeon Chung, Andrey Gromov, Ravid Shwartz-Ziv, Sanmi Koyejo

    Abstract: Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is intriguing because it does not fit neatly into any of the commonplace MVSSL lineages, instead originating from a statistical mechanical perspective on the linear separability of data manifolds. In this paper, we seek to impro…

    Submitted 13 June, 2024; originally announced June 2024.

  5. arXiv:2405.05012  [pdf, other]

    cs.CV

    The Entropy Enigma: Success and Failure of Entropy Minimization

    Authors: Ori Press, Ravid Shwartz-Ziv, Yann LeCun, Matthias Bethge

    Abstract: Entropy minimization (EM) is frequently used to increase the accuracy of classification models when they're faced with new data at test time. EM is a self-supervised learning method that optimizes classifiers to assign even higher probabilities to their top predicted classes. In this paper, we analyze why EM works when adapting a model for a few steps and why it eventually fails after adapting for…

    Submitted 12 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  6. arXiv:2312.02517  [pdf, other]

    cs.LG cs.AI

    Simplifying Neural Network Training Under Class Imbalance

    Authors: Ravid Shwartz-Ziv, Micah Goldblum, Yucen Lily Li, C. Bayan Bruss, Andrew Gordon Wilson

    Abstract: Real-world datasets are often highly class-imbalanced, which can adversely impact the performance of deep learning models. The majority of research on training neural networks under class imbalance has focused on specialized loss functions, sampling techniques, or two-stage training procedures. Notably, we demonstrate that simply tuning existing components of standard deep learning pipelines, such…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023. Code available at https://github.com/ravidziv/SimplifyingImbalancedTraining

  7. arXiv:2309.07311  [pdf, other]

    cs.CL

    Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs

    Authors: Angelica Chen, Ravid Shwartz-Ziv, Kyunghyun Cho, Matthew L. Leavitt, Naomi Saphra

    Abstract: Most interpretability research in NLP focuses on understanding the behavior and features of a fully trained model. However, certain insights into model behavior may only be accessible by observing the trajectory of the training process. We present a case study of syntax acquisition in masked language models (MLMs) that demonstrates how analyzing the evolution of interpretable artifacts throughout…

    Submitted 7 February, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: ICLR 2024 camera-ready

  8. arXiv:2306.13292  [pdf, other]

    cs.LG cs.AI cs.CV

    Variance-Covariance Regularization Improves Representation Learning

    Authors: Jiachen Zhu, Katrina Evtimova, Yubei Chen, Ravid Shwartz-Ziv, Yann LeCun

    Abstract: Transfer learning plays a key role in advancing machine learning models, yet conventional supervised pretraining often undermines feature transferability by prioritizing features that minimize the pretraining loss. In this work, we adapt a self-supervised learning regularization technique from the VICReg method to supervised learning contexts, introducing Variance-Covariance Regularization (VCReg)…

    Submitted 22 February, 2024; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: 165 pages, 5 figures

  9. arXiv:2305.15614  [pdf, other]

    cs.LG cs.AI

    Reverse Engineering Self-Supervised Learning

    Authors: Ido Ben-Shaul, Ravid Shwartz-Ziv, Tomer Galanti, Shai Dekel, Yann LeCun

    Abstract: Self-supervised learning (SSL) is a powerful tool in machine learning, but understanding the learned representations and their underlying mechanisms remains a challenge. This paper presents an in-depth empirical analysis of SSL-trained representations, encompassing diverse models, architectures, and hyperparameters. Our study reveals an intriguing aspect of the SSL training process: it inherently…

    Submitted 31 May, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  10. arXiv:2304.09355  [pdf, other]

    cs.LG cs.IT

    To Compress or Not to Compress- Self-Supervised Learning and Information Theory: A Review

    Authors: Ravid Shwartz-Ziv, Yann LeCun

    Abstract: Deep neural networks excel in supervised learning tasks but are constrained by the need for extensive labeled data. Self-supervised learning emerges as a promising alternative, allowing models to learn without explicit labels. Information theory, and notably the information bottleneck principle, has been pivotal in shaping deep neural networks. This principle focuses on optimizing the trade-off be…

    Submitted 21 November, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

  11. arXiv:2303.00633  [pdf, other]

    cs.IT cs.AI

    An Information-Theoretic Perspective on Variance-Invariance-Covariance Regularization

    Authors: Ravid Shwartz-Ziv, Randall Balestriero, Kenji Kawaguchi, Tim G. J. Rudner, Yann LeCun

    Abstract: Variance-Invariance-Covariance Regularization (VICReg) is a self-supervised learning (SSL) method that has shown promising results on a variety of tasks. However, the fundamental mechanisms underlying VICReg remain unexplored. In this paper, we present an information-theoretic perspective on the VICReg objective. We begin by deriving information-theoretic quantities for deterministic networks as a…

    Submitted 1 May, 2024; v1 submitted 1 March, 2023; originally announced March 2023.

  12. arXiv:2210.06441  [pdf, other]

    cs.LG cs.CV

    How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization

    Authors: Jonas Geiping, Micah Goldblum, Gowthami Somepalli, Ravid Shwartz-Ziv, Tom Goldstein, Andrew Gordon Wilson

    Abstract: Despite the clear performance benefits of data augmentations, little is known about why they are so effective. In this paper, we disentangle several key mechanisms through which data augmentations operate. Establishing an exchange rate between augmented and additional real data, we find that in out-of-distribution testing scenarios, augmentations which yield samples that are diverse, but inconsist…

    Submitted 30 March, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: 31 pages, 29 figures. To be presented at ICLR 2023. Code at https://github.com/JonasGeiping/dataaugs

  13. arXiv:2207.10081  [pdf, other]

    cs.LG cs.AI

    What Do We Maximize in Self-Supervised Learning?

    Authors: Ravid Shwartz-Ziv, Randall Balestriero, Yann LeCun

    Abstract: In this paper, we examine self-supervised learning methods, particularly VICReg, to provide an information-theoretical understanding of their construction. As a first step, we demonstrate how information-theoretic quantities can be obtained for a deterministic network, offering a possible alternative to prior work that relies on stochastic models. This enables us to demonstrate how VICReg can be (…

    Submitted 20 July, 2022; originally announced July 2022.

  14. arXiv:2205.10279  [pdf, other]

    cs.LG cs.CV

    Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors

    Authors: Ravid Shwartz-Ziv, Micah Goldblum, Hossein Souri, Sanyam Kapoor, Chen Zhu, Yann LeCun, Andrew Gordon Wilson

    Abstract: Deep learning is increasingly moving towards a transfer learning paradigm whereby large foundation models are fine-tuned on downstream tasks, starting from an initialization learned on the source task. But an initialization contains relatively little information about the source task. Instead, we show that we can learn highly informative posteriors from the source task, through supervised or self-…

    Submitted 20 May, 2022; originally announced May 2022.

    Comments: Code available at https://github.com/hsouri/BayesianTransferLearning

  15. arXiv:2202.06749  [pdf, other]

    cs.LG

    Information Flow in Deep Neural Networks

    Authors: Ravid Shwartz-Ziv

    Abstract: Although deep neural networks have been immensely successful, there is no comprehensive theoretical understanding of how they work or are structured. As a result, deep networks are often seen as black boxes with unclear interpretations and reliability. Understanding the performance of deep neural networks is one of the greatest scientific challenges. This work aims to apply principles and techniqu…

    Submitted 21 February, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

    Comments: PhD thesis

  16. arXiv:2106.03253  [pdf, other]

    cs.LG

    Tabular Data: Deep Learning is Not All You Need

    Authors: Ravid Shwartz-Ziv, Amitai Armon

    Abstract: A key element in solving real-life data science problems is selecting the types of models to use. Tree ensemble models (such as XGBoost) are usually recommended for classification and regression problems with tabular data. However, several deep learning models for tabular data have recently been proposed, claiming to outperform XGBoost for some use cases. This paper explores whether these deep mod…

    Submitted 23 November, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

  17. arXiv:2101.05304  [pdf, other]

    cs.LG

    Spatial-Temporal Convolutional Network for Spread Prediction of COVID-19

    Authors: Ravid Shwartz-Ziv, Itamar Ben Ari, Amitai Armon

    Abstract: In this work we present a spatial-temporal convolutional neural network for predicting future COVID-19 related symptoms severity among a population, per region, given its past reported symptoms. This can help approximate the number of future Covid-19 patients in each region, thus enabling a faster response, e.g., preparing the local hospital or declaring a local lockdown where necessary. Our model…

    Submitted 27 December, 2020; originally announced January 2021.

    Comments: IEEE BigData 2020

  18. arXiv:2006.04641  [pdf, other]

    cs.IT cs.LG

    The Dual Information Bottleneck

    Authors: Zoe Piran, Ravid Shwartz-Ziv, Naftali Tishby

    Abstract: The Information Bottleneck (IB) framework is a general characterization of optimal representations obtained using a principled approach for balancing accuracy and complexity. Here we present a new framework, the Dual Information Bottleneck (dualIB), which resolves some of the known drawbacks of the IB. We provide a theoretical analysis of the dualIB framework; (i) solving for the structure of its…

    Submitted 8 June, 2020; originally announced June 2020.

  19. arXiv:1911.09189  [pdf, other]

    cs.LG cs.IT stat.ML

    Information in Infinite Ensembles of Infinitely-Wide Neural Networks

    Authors: Ravid Shwartz-Ziv, Alexander A. Alemi

    Abstract: In this preliminary work, we study the generalization properties of infinite ensembles of infinitely-wide neural networks. Amazingly, this model family admits tractable calculations for many information-theoretic quantities. We report analytical and empirical investigations in the search for signals that correlate with generalization.

    Submitted 7 November, 2022; v1 submitted 20 November, 2019; originally announced November 2019.

    Comments: 2nd Symposium on Advances in Approximate Bayesian Inference, 2019

    Journal ref: Proceedings of The 2nd Symposium on Advances in Approximate Bayesian Inference, PMLR 118:1-17 2019

  20. arXiv:1811.10228  [pdf, other]

    cs.CV

    Attentioned Convolutional LSTM Inpainting Network for Anomaly Detection in Videos

    Authors: Itamar Ben-Ari, Ravid Shwartz-Ziv

    Abstract: We propose a semi-supervised model for detecting anomalies in videos inspired by the Video Pixel Network [van den Oord et al., 2016]. VPN is a probabilistic generative model based on a deep neural network that estimates the discrete joint distribution of raw pixels in video frames. Our model extends the Convolutional-LSTM video encoder part of the VPN with a novel convolutional based attention mechani…

    Submitted 26 November, 2018; originally announced November 2018.

  21. arXiv:1703.00810  [pdf, other]

    cs.LG

    Opening the Black Box of Deep Neural Networks via Information

    Authors: Ravid Shwartz-Ziv, Naftali Tishby

    Abstract: Despite their great success, there is still no comprehensive theoretical understanding of learning with Deep Neural Networks (DNNs) or their inner organization. Previous work proposed to analyze DNNs in the Information Plane; i.e., the plane of the Mutual Information values that each layer preserves on the input and output variables. They suggested that the goal of the network is to optim…

    Submitted 29 April, 2017; v1 submitted 2 March, 2017; originally announced March 2017.

    Comments: 19 pages, 8 figures