Showing 1–2 of 2 results for author: Twiton, M

Searching in archive cs.
  1. arXiv:2201.12091  [pdf, other]

    cs.LG cs.CL

    Linear Adversarial Concept Erasure

    Authors: Shauli Ravfogel, Michael Twiton, Yoav Goldberg, Ryan Cotterell

    Abstract: Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision. As these representations are increasingly being used in real-world applications, the inability to control their content becomes an increasingly important problem. We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept, in…

    Submitted 12 September, 2024; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: Accepted in ICML 2022; a revised version

  2. arXiv:2004.07667  [pdf, other]

    cs.CL cs.LG

    Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection

    Authors: Shauli Ravfogel, Yanai Elazar, Hila Gonen, Michael Twiton, Yoav Goldberg

    Abstract: The ability to control for the kinds of information encoded in neural representation has a variety of use cases, especially in light of the challenge of interpreting these models. We present Iterative Null-space Projection (INLP), a novel method for removing information from neural representations. Our method is based on repeated training of linear classifiers that predict a certain property we ai…

    Submitted 28 April, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: Accepted as a long paper in ACL 2020