Showing 1–10 of 10 results for author: Khona, M

Searching in archive cs.
  1. arXiv:2406.14549  [pdf, other]

    cs.CV cs.LG q-bio.NC

    Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models

    Authors: Sunny Duan, Mikail Khona, Abhiram Iyer, Rylan Schaeffer, Ila R Fiete

    Abstract: The proliferation of large language models has revolutionized natural language processing tasks, yet it raises profound concerns regarding data privacy and security. Language models are trained on extensive corpora including potentially sensitive or proprietary information, and the risk of data leakage -- where the model response reveals pieces of such information -- remains inadequately understoo…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.12785  [pdf, other]

    cs.LG

    In-Context Learning of Energy Functions

    Authors: Rylan Schaeffer, Mikail Khona, Sanmi Koyejo

    Abstract: In-context learning is a powerful capability of certain machine learning models that arguably underpins the success of today's frontier AI models. However, in-context learning is critically limited to settings where the in-context distribution of interest $p_\theta^{ICL}(x|\mathcal{D})$ can be straightforwardly expressed and/or parameterized by the model; for instance, language modeling relies on expr…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Proceedings of the 1st Workshop on In-Context Learning at the 41st International Conference on Machine Learning, Vienna, Austria. 2024. arXiv admin note: text overlap with arXiv:2402.10202

  3. arXiv:2406.09366  [pdf, other]

    cs.LG cs.CV q-bio.NC

    Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations

    Authors: Rylan Schaeffer, Victor Lecomte, Dhruv Bhandarkar Pai, Andres Carranza, Berivan Isik, Alyssa Unell, Mikail Khona, Thomas Yerxa, Yann LeCun, SueYeon Chung, Andrey Gromov, Ravid Shwartz-Ziv, Sanmi Koyejo

    Abstract: Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is intriguing because it does not fit neatly into any of the commonplace MVSSL lineages, instead originating from a statistical mechanical perspective on the linear separability of data manifolds. In this paper, we seek to impro…

    Submitted 13 June, 2024; originally announced June 2024.

  4. arXiv:2403.03230  [pdf, other]

    q-bio.NC cs.AI

    Large language models surpass human experts in predicting neuroscience results

    Authors: Xiaoliang Luo, Akilles Rechardt, Guangzhi Sun, Kevin K. Nejad, Felipe Yáñez, Bati Yilmaz, Kangjoo Lee, Alexandra O. Cohen, Valentina Borghesani, Anton Pashkov, Daniele Marinazzo, Jonathan Nicholas, Alessandro Salatiello, Ilia Sucholutsky, Pasquale Minervini, Sepehr Razavi, Roberta Rocca, Elkhan Yusifov, Tereza Okalova, Nianlong Gu, Martin Ferianc, Mikail Khona, Kaustubh R. Patil, Pui-Shee Lee, Rui Mata, et al. (14 additional authors not shown)

    Abstract: Scientific discoveries often hinge on synthesizing decades of research, a task that potentially outstrips human information processing capacities. Large language models (LLMs) offer a solution. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. To evaluate this possibility, we created Brain…

    Submitted 21 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  5. arXiv:2402.10202  [pdf, other]

    cs.LG

    Bridging Associative Memory and Probabilistic Modeling

    Authors: Rylan Schaeffer, Nika Zahedi, Mikail Khona, Dhruv Pai, Sang Truong, Yilun Du, Mitchell Ostrow, Sarthak Chandra, Andres Carranza, Ila Rani Fiete, Andrey Gromov, Sanmi Koyejo

    Abstract: Associative memory and probabilistic modeling are two fundamental topics in artificial intelligence. The first studies recurrent neural networks designed to denoise, complete and retrieve data, whereas the second studies learning and sampling from probability distributions. Based on the observation that associative memory's energy functions can be seen as probabilistic modeling's negative log like…

    Submitted 13 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  6. arXiv:2402.07757  [pdf, other]

    cs.LG cs.AI

    Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model

    Authors: Mikail Khona, Maya Okawa, Jan Hula, Rahul Ramesh, Kento Nishi, Robert Dick, Ekdeep Singh Lubana, Hidenori Tanaka

    Abstract: Stepwise inference protocols, such as scratchpads and chain-of-thought, help language models solve complex problems by decomposing them into a sequence of simpler subproblems. Despite the significant gain in performance achieved via these protocols, the underlying mechanisms of stepwise inference have remained elusive. To address this, we propose to study autoregressive Transformer models on a syn…

    Submitted 12 February, 2024; originally announced February 2024.

  7. arXiv:2311.12997  [pdf, other]

    cs.LG

    Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks

    Authors: Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, Hidenori Tanaka

    Abstract: Transformers trained on huge text corpora exhibit a remarkable set of capabilities, e.g., performing basic arithmetic. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion of what operations it can perform on an input. Motivated by the above, we train autoregressive Transformer models on…

    Submitted 5 February, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

  8. arXiv:2311.02316  [pdf, other]

    cs.LG cs.NE

    Self-Supervised Learning of Representations for Space Generates Multi-Modular Grid Cells

    Authors: Rylan Schaeffer, Mikail Khona, Tzuhsuan Ma, Cristóbal Eyzaguirre, Sanmi Koyejo, Ila Rani Fiete

    Abstract: To solve the spatial problems of mapping, localization and navigation, the mammalian lineage has developed striking spatial representations. One important spatial representation is the Nobel-prize winning grid cells: neurons that represent self-location, a local and aperiodic quantity, with seemingly bizarre non-local and spatially periodic activity patterns of a few discrete periods. Why has the…

    Submitted 3 November, 2023; originally announced November 2023.

  9. arXiv:2310.07711  [pdf, other]

    q-bio.NC cs.AI cs.LG cs.NE

    Growing Brains: Co-emergence of Anatomical and Functional Modularity in Recurrent Neural Networks

    Authors: Ziming Liu, Mikail Khona, Ila R. Fiete, Max Tegmark

    Abstract: Recurrent neural networks (RNNs) trained on compositional tasks can exhibit functional modularity, in which neurons can be clustered by activity similarity and participation in shared computational subtasks. Unlike brains, these RNNs do not exhibit anatomical modularity, in which functional clustering is correlated with strong recurrent coupling and spatial localization of functional clusters. Con…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: 8 pages, 6 figures

  10. arXiv:2303.14151  [pdf, other]

    cs.LG stat.ML

    Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle

    Authors: Rylan Schaeffer, Mikail Khona, Zachary Robertson, Akhilan Boopathy, Kateryna Pistunova, Jason W. Rocks, Ila Rani Fiete, Oluwasanmi Koyejo

    Abstract: Double descent is a surprising phenomenon in machine learning, in which as the number of model parameters grows relative to the number of data, test error drops as models grow ever larger into the highly overparameterized (data undersampled) regime. This drop in test error flies against classical learning theory on overfitting and has arguably underpinned the success of large models in machine lea…

    Submitted 24 March, 2023; originally announced March 2023.