
Showing 1–5 of 5 results for author: Matena, M

  1. arXiv:2310.04649  [pdf, other]

    cs.LG

    NPEFF: Non-Negative Per-Example Fisher Factorization

    Authors: Michael Matena, Colin Raffel

    Abstract: As deep learning models are deployed in more and more settings, it becomes increasingly important to be able to understand why they produce a given prediction, but interpretation of these models remains a challenge. In this paper, we introduce a novel interpretability method called NPEFF that is readily applicable to any end-to-end differentiable model. It operates on the principle that processing…

    Submitted 6 October, 2023; originally announced October 2023.

  2. arXiv:2210.00176  [pdf, other]

    cs.LG

    A Combinatorial Perspective on the Optimization of Shallow ReLU Networks

    Authors: Michael Matena, Colin Raffel

    Abstract: The NP-hard problem of optimizing a shallow ReLU network can be characterized as a combinatorial search over each training example's activation pattern followed by a constrained convex problem given a fixed set of activation patterns. We explore the implications of this combinatorial aspect of ReLU optimization in this work. We show that it can be naturally modeled via a geometric and combinatoric…

    Submitted 30 September, 2022; originally announced October 2022.

  3. arXiv:2111.09832  [pdf, other]

    cs.LG

    Merging Models with Fisher-Weighted Averaging

    Authors: Michael Matena, Colin Raffel

    Abstract: Averaging the parameters of models that have the same architecture and initialization can provide a means of combining their respective capabilities. In this paper, we take the perspective that this "merging" operation can be seen as choosing parameters that approximately maximize the joint likelihood of the posteriors of the models' parameters. Computing a simple average of the models' parameters…

    Submitted 26 August, 2022; v1 submitted 18 November, 2021; originally announced November 2021.

  4. arXiv:2102.11972  [pdf, other]

    cs.LG cs.CL

    Do Transformer Modifications Transfer Across Implementations and Applications?

    Authors: Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel

    Abstract: The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption. In this paper, we comprehensively evaluate many of these modifications in a shared experimental setting that covers most of the common uses of the Transformer in natural language processing. Surprisingly, we f…

    Submitted 10 September, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: To appear at EMNLP 2021 as a conference paper

  5. arXiv:1910.10683  [pdf, other]

    cs.LG cs.CL stat.ML

    Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

    Authors: Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

    Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing…

    Submitted 19 September, 2023; v1 submitted 23 October, 2019; originally announced October 2019.