
Showing 1–11 of 11 results for author: Chen, M F

  1. arXiv:2307.14430 [pdf, other]

    cs.CL cs.LG

    Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models

    Authors: Mayee F. Chen, Nicholas Roberts, Kush Bhatia, Jue Wang, Ce Zhang, Frederic Sala, Christopher Ré

    Abstract: The quality of training data impacts the performance of pre-trained large language models (LMs). Given a fixed budget of tokens, we study how to best select data that leads to good downstream model performance across tasks. We develop a new framework based on a simple hypothesis: just as humans acquire interdependent skills in a deliberate order, language models also follow a natural order when le…

    Submitted 26 July, 2023; originally announced July 2023.

  2. arXiv:2307.11031 [pdf, ps, other]

    cs.LG cs.CL

    Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification

    Authors: Neel Guha, Mayee F. Chen, Kush Bhatia, Azalia Mirhoseini, Frederic Sala, Christopher Ré

    Abstract: Recent work has shown that language models' (LMs) prompt-based learning capabilities make them well suited for automating data labeling in domains where manual annotation is expensive. The challenge is that while writing an initial prompt is cheap, improving a prompt is costly -- practitioners often require significant labeled data in order to evaluate the impact of prompt modifications. Our work…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: 38 pages, 22 figures, 8 tables

  3. arXiv:2212.10579 [pdf, other]

    hep-ph cs.LG hep-ex stat.ML

    Resonant Anomaly Detection with Multiple Reference Datasets

    Authors: Mayee F. Chen, Benjamin Nachman, Frederic Sala

    Abstract: An important class of techniques for resonant anomaly detection in high energy physics builds models that can distinguish between reference and target datasets, where only the latter has appreciable signal. Such techniques, including Classification Without Labels (CWoLa) and Simulation Assisted Likelihood-free Anomaly Detection (SALAD) rely on a single reference dataset. They cannot take advantage…

    Submitted 20 December, 2022; originally announced December 2022.

  4. arXiv:2210.02441 [pdf, other]

    cs.CL

    Ask Me Anything: A simple strategy for prompting language models

    Authors: Simran Arora, Avanika Narayan, Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Frederic Sala, Christopher Ré

    Abstract: Large language models (LLMs) transfer well to new tasks out-of-the-box simply given a natural language prompt that demonstrates how to perform the task and no additional training. Prompting is a brittle process wherein small modifications to the prompt can cause large variations in the model predictions, and therefore significant effort is dedicated towards designing a painstakingly "perfect promp…

    Submitted 19 November, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

  5. arXiv:2204.08173 [pdf, other]

    cs.CL cs.LG

    TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval

    Authors: Megan Leszczynski, Daniel Y. Fu, Mayee F. Chen, Christopher Ré

    Abstract: Entity retrieval--retrieving information about entity mentions in a query--is a key step in open-domain tasks, such as question answering or fact checking. However, state-of-the-art entity retrievers struggle to retrieve rare entities for ambiguous mentions due to biases towards popular entities. Incorporating knowledge graph types during training could help overcome popularity biases, but there a…

    Submitted 18 April, 2022; originally announced April 2022.

    Comments: Accepted to Findings of ACL 2022

  6. arXiv:2204.07596 [pdf, other]

    stat.ML cs.LG

    Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning

    Authors: Mayee F. Chen, Daniel Y. Fu, Avanika Narayan, Michael Zhang, Zhao Song, Kayvon Fatahalian, Christopher Ré

    Abstract: An ideal learned representation should display transferability and robustness. Supervised contrastive learning (SupCon) is a promising method for training accurate models, but produces representations that do not capture these properties due to class collapse -- when all points in a class map to the same representation. Recent work suggests that "spreading out" these representations improves them,…

    Submitted 13 July, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: ICML 2022 Camera Ready

  7. arXiv:2203.13270 [pdf, other]

    stat.ML cs.LG

    Shoring Up the Foundations: Fusing Model Embeddings and Weak Supervision

    Authors: Mayee F. Chen, Daniel Y. Fu, Dyah Adila, Michael Zhang, Frederic Sala, Kayvon Fatahalian, Christopher Ré

    Abstract: Foundation models offer an exciting new paradigm for constructing models with out-of-the-box embeddings and a few labeled examples. However, it is not clear how to best apply foundation models without labeled data. A potential approach is to fuse foundation models with weak supervision frameworks, which use weak label sources -- pre-trained models, heuristics, crowd-workers -- to construct pseudol…

    Submitted 1 August, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: UAI 2022 Camera Ready

  8. arXiv:2103.02761 [pdf, other]

    cs.LG stat.ML

    Comparing the Value of Labeled and Unlabeled Data in Method-of-Moments Latent Variable Estimation

    Authors: Mayee F. Chen, Benjamin Cohen-Wang, Stephen Mussmann, Frederic Sala, Christopher Ré

    Abstract: Labeling data for modern machine learning is expensive and time-consuming. Latent variable models can be used to infer labels from weaker, easier-to-acquire sources operating on unlabeled data. Such models can also be trained using labeled data, presenting a key question: should a user invest in few labeled or many unlabeled points? We answer this via a framework centered on model misspecification…

    Submitted 3 March, 2021; originally announced March 2021.

    Comments: To appear in AISTATS 2021

  9. arXiv:2006.15168 [pdf, other]

    stat.ML cs.LG

    Train and You'll Miss It: Interactive Model Iteration with Weak Supervision and Pre-Trained Embeddings

    Authors: Mayee F. Chen, Daniel Y. Fu, Frederic Sala, Sen Wu, Ravi Teja Mullapudi, Fait Poms, Kayvon Fatahalian, Christopher Ré

    Abstract: Our goal is to enable machine learning systems to be trained interactively. This requires models that perform well and train quickly, without large amounts of hand-labeled data. We take a step forward in this direction by borrowing from weak supervision (WS), wherein models can be trained with noisy sources of signal instead of hand-labeled data. But WS relies on training downstream deep networks…

    Submitted 26 June, 2020; originally announced June 2020.

  10. arXiv:2003.08377 [pdf, other]

    cs.SI cs.DS cs.GT physics.soc-ph

    Network disruption: maximizing disagreement and polarization in social networks

    Authors: Mayee F. Chen, Miklos Z. Racz

    Abstract: Recent years have seen a marked increase in the spread of misinformation, a phenomenon which has been accelerated and amplified by social media such as Facebook and Twitter. While some actors spread misinformation to push a specific agenda, it has also been widely documented that others aim to simply disrupt the network by increasing disagreement and polarization across the network and thereby des…

    Submitted 9 April, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

    Comments: 20 pages, 6 figures

  11. arXiv:2002.11955 [pdf, other]

    stat.ML cs.LG

    Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods

    Authors: Daniel Y. Fu, Mayee F. Chen, Frederic Sala, Sarah M. Hooper, Kayvon Fatahalian, Christopher Ré

    Abstract: Weak supervision is a popular method for building machine learning models without relying on ground truth annotations. Instead, it generates probabilistic training labels by estimating the accuracies of multiple noisy labeling sources (e.g., heuristics, crowd workers). Existing approaches use latent variable estimation to model the noisy sources, but these methods can be computationally expensive,…

    Submitted 15 July, 2020; v1 submitted 27 February, 2020; originally announced February 2020.