
Showing 1–50 of 86 results for author: Cho, K

Searching in archive stat.
  1. arXiv:2408.16218  [pdf, other]

    cs.LG stat.ML

    Targeted Cause Discovery with Data-Driven Learning

    Authors: Jang-Hyun Kim, Claudia Skok Gibbs, Sangdoo Yun, Hyun Oh Song, Kyunghyun Cho

    Abstract: We propose a novel machine learning approach for inferring causal variables of a target variable from observations. Our goal is to identify both direct and indirect causes within a system, thereby efficiently regulating the target variable when the difficulty and cost of intervening on each causal variable vary. Our method employs a neural network trained to identify causality through supervised l…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: preprint

  2. arXiv:2408.13430  [pdf, other]

    stat.AP cs.DL cs.GT cs.LG stat.ML

    Analysis of the ICML 2023 Ranking Data: Can Authors' Opinions of Their Own Papers Assist Peer Review in Machine Learning?

    Authors: Buxin Su, Jiayao Zhang, Natalie Collina, Yuling Yan, Didong Li, Kyunghyun Cho, Jianqing Fan, Aaron Roth, Weijie J. Su

    Abstract: We conducted an experiment during the review process of the 2023 International Conference on Machine Learning (ICML) that requested authors with multiple submissions to rank their own papers based on perceived quality. We received 1,342 rankings, each from a distinct author, pertaining to 2,592 submissions. In this paper, we present an empirical analysis of how author-provided rankings could be le…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: See more details about the experiment at https://openrank.cc/

  3. arXiv:2406.02585  [pdf, other]

    cs.LG cs.AI stat.ML

    Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task

    Authors: Siavash Golkar, Alberto Bietti, Mariel Pettee, Michael Eickenberg, Miles Cranmer, Keiya Hirashima, Geraud Krawezik, Nicholas Lourie, Michael McCabe, Rudy Morel, Ruben Ohana, Liam Holden Parker, Bruno Régaldo-Saint Blancard, Kyunghyun Cho, Shirley Ho

    Abstract: Transformers have revolutionized machine learning across diverse domains, yet understanding their behavior remains crucial, particularly in high-stakes applications. This paper introduces the contextual counting task, a novel toy problem aimed at enhancing our understanding of Transformers in quantitative and scientific contexts. This task requires precise localization and computation within datas…

    Submitted 30 May, 2024; originally announced June 2024.

  4. arXiv:2405.18075  [pdf, other]

    cs.LG stat.ML

    Implicitly Guided Design with PropEn: Match your Data to Follow the Gradient

    Authors: Nataša Tagasovska, Vladimir Gligorijević, Kyunghyun Cho, Andreas Loukas

    Abstract: Across scientific domains, generating new models or optimizing existing ones while meeting specific criteria is crucial. Traditional machine learning frameworks for guided design use a generative model and a surrogate model (discriminator), requiring large datasets. However, real-world scientific applications often have limited data and complex landscapes, making data-hungry models inefficient or…

    Submitted 28 May, 2024; originally announced May 2024.

  5. arXiv:2311.09480  [pdf, other]

    cs.CL cs.LG stat.ML

    Show Your Work with Confidence: Confidence Bands for Tuning Curves

    Authors: Nicholas Lourie, Kyunghyun Cho, He He

    Abstract: The choice of hyperparameters greatly impacts performance in natural language processing. Often, it is hard to tell if a method is better than another or just better tuned. Tuning curves fix this ambiguity by accounting for tuning effort. Specifically, they plot validation performance as a function of the number of hyperparameter choices tried so far. While several estimators exist for these curve…

    Submitted 8 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL 2024. 18 pages, 20 figures

  6. arXiv:2310.02994  [pdf, other]

    cs.LG cs.AI stat.ML

    Multiple Physics Pretraining for Physical Surrogate Models

    Authors: Michael McCabe, Bruno Régaldo-Saint Blancard, Liam Holden Parker, Ruben Ohana, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Siavash Golkar, Geraud Krawezik, Francois Lanusse, Mariel Pettee, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho

    Abstract: We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling. MPP involves training large surrogate models to predict the dynamics of multiple heterogeneous physical systems simultaneously by learning features that are broadly useful across diverse physical tasks. In order to learn effectively in this setting, we introduce a…

    Submitted 4 October, 2023; originally announced October 2023.

  7. arXiv:2310.02989  [pdf, other]

    stat.ML cs.AI cs.CL cs.LG

    xVal: A Continuous Number Encoding for Large Language Models

    Authors: Siavash Golkar, Mariel Pettee, Michael Eickenberg, Alberto Bietti, Miles Cranmer, Geraud Krawezik, Francois Lanusse, Michael McCabe, Ruben Ohana, Liam Parker, Bruno Régaldo-Saint Blancard, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho

    Abstract: Large Language Models have not yet been broadly adapted for the analysis of scientific datasets due in part to the unique difficulties of tokenizing numbers. We propose xVal, a numerical encoding scheme that represents any real number using just a single token. xVal represents a given real number by scaling a dedicated embedding vector by the number value. Combined with a modified number-inference…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: 10 pages 7 figures. Supplementary: 5 pages 2 figures

  8. arXiv:2308.09248  [pdf, other]

    cs.LG stat.ML

    Active and Passive Causal Inference Learning

    Authors: Daniel Jiwoong Im, Kyunghyun Cho

    Abstract: This paper serves as a starting point for machine learning researchers, engineers and students who are interested in but not yet familiar with causal inference. We start by laying out an important set of assumptions that are collectively needed for causal identification, such as exchangeability, positivity, consistency and the absence of interference. From these assumptions, we build out a set of…

    Submitted 25 August, 2024; v1 submitted 17 August, 2023; originally announced August 2023.

  9. arXiv:2308.05027  [pdf, other]

    q-bio.BM cs.LG stat.ML

    AbDiffuser: Full-Atom Generation of in vitro Functioning Antibodies

    Authors: Karolis Martinkus, Jan Ludwiczak, Kyunghyun Cho, Wei-Ching Liang, Julien Lafrance-Vanasse, Isidro Hotzel, Arvind Rajpal, Yan Wu, Richard Bonneau, Vladimir Gligorijevic, Andreas Loukas

    Abstract: We introduce AbDiffuser, an equivariant and physics-informed diffusion model for the joint generation of antibody 3D structures and sequences. AbDiffuser is built on top of a new representation of protein structure, relies on a novel architecture for aligned proteins, and utilizes strong diffusion priors to improve the denoising process. Our approach improves protein diffusion by taking advantage…

    Submitted 6 March, 2024; v1 submitted 28 July, 2023; originally announced August 2023.

    Comments: NeurIPS 2023

  10. arXiv:2306.00344  [pdf, other]

    cs.LG stat.ML

    BOtied: Multi-objective Bayesian optimization with tied multivariate ranks

    Authors: Ji Won Park, Nataša Tagasovska, Michael Maser, Stephen Ra, Kyunghyun Cho

    Abstract: Many scientific and industrial applications require the joint optimization of multiple, potentially competing objectives. Multi-objective Bayesian optimization (MOBO) is a sample-efficient framework for identifying Pareto-optimal solutions. At the heart of MOBO is the acquisition function, which determines the next candidate to evaluate by navigating the best compromises among the objectives. In t…

    Submitted 7 June, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: 12 pages (+9 appendix), 13 figures. Accepted at ICML 2024

  11. arXiv:2208.12590  [pdf, other]

    physics.chem-ph cs.LG physics.comp-ph stat.ML

    Ab-initio quantum chemistry with neural-network wavefunctions

    Authors: Jan Hermann, James Spencer, Kenny Choo, Antonio Mezzacapo, W. M. C. Foulkes, David Pfau, Giuseppe Carleo, Frank Noé

    Abstract: Machine learning and specifically deep-learning methods have outperformed human capabilities in many pattern recognition and data processing problems, in game playing, and now also play an increasingly important role in scientific discovery. A key application of machine learning in the molecular sciences is to learn potential energy surfaces or force fields from ab-initio solutions of the electron…

    Submitted 26 August, 2022; originally announced August 2022.

    Comments: review, 17 pages, 6 figures

    Journal ref: Nat Rev Chem 7, 692-709 (2023)

  12. arXiv:2207.02093  [pdf, other]

    cs.LG stat.ML

    Predicting Out-of-Domain Generalization with Neighborhood Invariance

    Authors: Nathan Ng, Neha Hulkund, Kyunghyun Cho, Marzyeh Ghassemi

    Abstract: Developing and deploying machine learning models safely depends on the ability to characterize and compare their abilities to generalize to new environments. Although recent work has proposed a variety of methods that can directly predict or theoretically bound the generalization capacity of a model, they rely on strong assumptions such as matching train/test distributions and access to model grad…

    Submitted 17 July, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

    Comments: 38 pages, 5 figures, 28 tables

  13. arXiv:2202.04136  [pdf, other]

    cs.LG cs.AI stat.ML

    Generative multitask learning mitigates target-causing confounding

    Authors: Taro Makino, Krzysztof J. Geras, Kyunghyun Cho

    Abstract: We propose generative multitask learning (GMTL), a simple and scalable approach to causal representation learning for multitask learning. Our approach makes a minor change to the conventional multitask inference objective, and improves robustness to target shift. Since GMTL only modifies the inference objective, it can be used with existing multitask learning methods without requiring additional t…

    Submitted 22 October, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

  14. arXiv:2112.09313  [pdf, other]

    stat.ME math.ST stat.AP

    Federated Adaptive Causal Estimation (FACE) of Target Treatment Effects

    Authors: Larry Han, Jue Hou, Kelly Cho, Rui Duan, Tianxi Cai

    Abstract: Federated learning of causal estimands may greatly improve estimation efficiency by leveraging data from multiple study sites, but robustness to heterogeneity and model misspecifications is vital for ensuring validity. We develop a Federated Adaptive Causal Estimation (FACE) framework to incorporate heterogeneous data from multiple sites to provide treatment effect estimation and inference for a f…

    Submitted 5 October, 2023; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: 59 pages

  15. arXiv:2110.09612  [pdf, ps, other]

    stat.ME

    Semi-supervised Approach to Event Time Annotation Using Longitudinal Electronic Health Records

    Authors: Liang Liang, Jue Hou, Hajime Uno, Kelly Cho, Yanyuan Ma, Tianxi Cai

    Abstract: Large clinical datasets derived from insurance claims and electronic health record (EHR) systems are valuable sources for precision medicine research. These datasets can be used to develop models for personalized prediction of risk or treatment response. Efficiently deriving prediction models using real world data, however, faces practical and methodological challenges. Precise information on impo…

    Submitted 18 October, 2021; originally announced October 2021.

  16. arXiv:2109.09339  [pdf, ps, other]

    stat.ME

    Improving the accuracy of estimating indexes in contingency tables using Bayesian estimators

    Authors: Tomotaka Momozaki, Koji Cho, Tomoyuki Nakagawa, Sadao Tomizawa

    Abstract: In contingency table analysis, one is interested in testing whether a model of interest (e.g., the independent or symmetry model) holds using goodness-of-fit tests. When the null hypothesis where the model is true is rejected, the interest turns to the degree to which the probability structure of the contingency table deviates from the model. Many indexes have been studied to measure the degree of…

    Submitted 25 October, 2023; v1 submitted 20 September, 2021; originally announced September 2021.

    Comments: 19 pages, 6 figures

  17. arXiv:2106.05459  [pdf, other]

    cs.LG stat.ML

    Mode recovery in neural autoregressive sequence modeling

    Authors: Ilia Kulikov, Sean Welleck, Kyunghyun Cho

    Abstract: Despite its wide use, recent studies have revealed unexpected and undesirable properties of neural autoregressive sequence models trained with maximum likelihood, such as an unreasonably high affinity to short sequences after training and to infinitely long sequences at decoding time. We propose to study these phenomena by investigating how the modes, or local maxima, of a distribution are maintai…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: ACL-IJCNLP 2021 5th Workshop on Structured Prediction for NLP

  18. arXiv:2105.11447  [pdf, other]

    cs.CL cs.LG stat.ML

    True Few-Shot Learning with Language Models

    Authors: Ethan Perez, Douwe Kiela, Kyunghyun Cho

    Abstract: Pretrained language models (LMs) perform well on many tasks even when learning from a few examples, but prior work uses many held-out examples to tune various aspects of learning, such as hyperparameters, training objectives, and natural language templates ("prompts"). Here, we evaluate the few-shot ability of LMs when such held-out examples are unavailable, a setting we call true few-shot learnin…

    Submitted 24 May, 2021; originally announced May 2021.

    Comments: Code at https://github.com/ethanjperez/true_few_shot

  19. arXiv:2103.03872  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Rissanen Data Analysis: Examining Dataset Characteristics via Description Length

    Authors: Ethan Perez, Douwe Kiela, Kyunghyun Cho

    Abstract: We introduce a method to determine if a certain capability helps to achieve an accurate model of given data. We view labels as being generated from the inputs by a program composed of subroutines with different capabilities, and we posit that a subroutine is useful if and only if the minimal program that invokes it is shorter than the one that does not. Since minimum program length is uncomputable…

    Submitted 5 March, 2021; originally announced March 2021.

    Comments: Code at https://github.com/ethanjperez/rda along with a script to run RDA on your own dataset

  20. arXiv:2012.14193  [pdf, other]

    cs.LG stat.ML

    Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization

    Authors: Stanislaw Jastrzebski, Devansh Arpit, Oliver Astrand, Giancarlo Kerg, Huan Wang, Caiming Xiong, Richard Socher, Kyunghyun Cho, Krzysztof Geras

    Abstract: The early phase of training a deep neural network has a dramatic effect on the local curvature of the loss function. For instance, using a small learning rate does not guarantee stable optimization because the optimization trajectory has a tendency to steer towards regions of the loss surface with increasing local curvature. We ask whether this tendency is connected to the widely observed phenomen…

    Submitted 11 June, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

    Comments: The last two authors contributed equally. Accepted to the International Conference on Machine Learning 2021

  21. arXiv:2009.10195  [pdf, other]

    cs.CL cs.LG stat.ML

    SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness

    Authors: Nathan Ng, Kyunghyun Cho, Marzyeh Ghassemi

    Abstract: Models that perform well on a training domain often fail to generalize to out-of-domain (OOD) examples. Data augmentation is a common method used to prevent overfitting and improve OOD generalization. However, in natural language, it is difficult to generate new examples that stay on the underlying data manifold. We introduce SSMBA, a data augmentation method for generating synthetic training exam…

    Submitted 4 October, 2020; v1 submitted 21 September, 2020; originally announced September 2020.

    Comments: 16 pages, 8 figures, to be published in EMNLP 2020

  22. arXiv:2009.07368  [pdf, other]

    cs.LG cs.AI stat.ML

    Evaluating representations by the complexity of learning low-loss predictors

    Authors: William F. Whitney, Min Jae Song, David Brandfonbrener, Jaan Altosaar, Kyunghyun Cho

    Abstract: We consider the problem of evaluating representations of data for use in solving a downstream task. We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest, and introduce two methods, surplus description length (SDL) and $\varepsilon$ sample complexity ($\varepsilon$SC). In contrast to…

    Submitted 5 February, 2021; v1 submitted 15 September, 2020; originally announced September 2020.

  23. A Parallel Evolutionary Multiple-Try Metropolis Markov Chain Monte Carlo Algorithm for Sampling Spatial Partitions

    Authors: Wendy K. Tam Cho, Yan Y. Liu

    Abstract: We develop an Evolutionary Markov Chain Monte Carlo (EMCMC) algorithm for sampling spatial partitions that lie within a large and complex spatial state space. Our algorithm combines the advantages of evolutionary algorithms (EAs) as optimization heuristics for state space traversal and the theoretical convergence properties of Markov Chain Monte Carlo algorithms for sampling from unknown distribut…

    Submitted 22 July, 2020; originally announced July 2020.

    Journal ref: Statistics and Computing 31, Article 10 (2021)

  24. arXiv:2006.11432  [pdf, other]

    cs.LG stat.ML

    Online Kernel based Generative Adversarial Networks

    Authors: Yeojoon Youn, Neil Thistlethwaite, Sang Keun Choe, Jacob Abernethy

    Abstract: One of the major breakthroughs in deep learning over the past five years has been the Generative Adversarial Network (GAN), a neural network-based generative model which aims to mimic some underlying distribution given a dataset of samples. In contrast to many supervised problems, where one tries to minimize a simple objective function of the parameters, GAN training is formulated as a min-max pro…

    Submitted 19 June, 2020; originally announced June 2020.

  25. arXiv:2006.03158  [pdf, other]

    cs.LG stat.ML

    MLE-guided parameter search for task loss minimization in neural sequence modeling

    Authors: Sean Welleck, Kyunghyun Cho

    Abstract: Neural autoregressive sequence models are used to generate sequences in a variety of natural language processing (NLP) tasks, where they are evaluated according to sequence-level task losses. These models are typically trained with maximum likelihood estimation, which ignores the task loss, yet empirically performs well as a surrogate objective. Typical approaches to directly optimizing the task l…

    Submitted 5 October, 2020; v1 submitted 4 June, 2020; originally announced June 2020.

  26. arXiv:2004.00816  [pdf, other]

    stat.ME

    Integrative High Dimensional Multiple Testing with Heterogeneity under Data Sharing Constraints

    Authors: Molei Liu, Yin Xia, Kelly Cho, Tianxi Cai

    Abstract: Identifying informative predictors in a high dimensional regression model is a critical step for association analysis and predictive modeling. Signal detection in the high dimensional setting often fails due to the limited sample size. One approach to improve power is through meta-analyzing multiple studies on the same scientific question. However, integrative analysis of high dimensional data fro…

    Submitted 7 April, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

  27. arXiv:2003.10041  [pdf, other]

    cs.LG cs.CV stat.ML

    Understanding the robustness of deep neural network classifiers for breast cancer screening

    Authors: Witold Oleszkiewicz, Taro Makino, Stanisław Jastrzębski, Tomasz Trzciński, Linda Moy, Kyunghyun Cho, Laura Heacock, Krzysztof J. Geras

    Abstract: Deep neural networks (DNNs) show promise in breast cancer screening, but their robustness to input perturbations must be better understood before they can be clinically implemented. There exists extensive literature on this subject in the context of natural images that can potentially be built upon. However, it cannot be assumed that conclusions about robustness will transfer from natural images t…

    Submitted 22 March, 2020; originally announced March 2020.

    Comments: Accepted as a workshop paper at AI4AH, ICLR 2020

  28. arXiv:2002.09572  [pdf, other]

    cs.LG stat.ML

    The Break-Even Point on Optimization Trajectories of Deep Neural Networks

    Authors: Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof Geras

    Abstract: The early phase of training of deep neural networks is critical for their final performance. In this work, we study how the hyperparameters of stochastic gradient descent (SGD) used in the early phase of training affect the rest of the optimization trajectory. We argue for the existence of the "break-even" point on this trajectory, beyond which the curvature of the loss surface and noise in the gr…

    Submitted 21 February, 2020; originally announced February 2020.

    Comments: Accepted as a spotlight at ICLR 2020. The last two authors contributed equally

  29. arXiv:2002.07613  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    An interpretable classifier for high-resolution breast cancer screening images utilizing weakly supervised localization

    Authors: Yiqiu Shen, Nan Wu, Jason Phang, Jungkyu Park, Kangning Liu, Sudarshini Tyagi, Laura Heacock, S. Gene Kim, Linda Moy, Kyunghyun Cho, Krzysztof J. Geras

    Abstract: Medical images differ from natural images in significantly higher resolutions and smaller regions of interest. Because of these differences, neural network architectures that work well for natural images might not be applicable to medical image analysis. In this work, we extend the globally-aware multiple instance classifier, a framework we proposed to address these unique properties of medical im…

    Submitted 13 February, 2020; originally announced February 2020.

  30. arXiv:2002.07233  [pdf, other]

    cs.LG stat.ML

    On the Discrepancy between Density Estimation and Sequence Generation

    Authors: Jason Lee, Dustin Tran, Orhan Firat, Kyunghyun Cho

    Abstract: Many sequence-to-sequence generation tasks, including machine translation and text-to-speech, can be posed as estimating the density of the output y given the input x: p(y|x). Given this interpretation, it is natural to evaluate sequence-to-sequence models using conditional log-likelihood on a test set. However, the goal of sequence-to-sequence generation (or structured prediction) is to find the…

    Submitted 17 February, 2020; originally announced February 2020.

  31. arXiv:2002.02492  [pdf, other]

    cs.LG cs.CL stat.ML

    Consistency of a Recurrent Language Model With Respect to Incomplete Decoding

    Authors: Sean Welleck, Ilia Kulikov, Jaedeok Kim, Richard Yuanzhe Pang, Kyunghyun Cho

    Abstract: Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition. We study the related issue of receiving infinite-length sequences from a recurrent language model when using common decoding algorithms. To analyze this issue, we first define inconsistency of a decoding algorithm,…

    Submitted 2 October, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

    Comments: EMNLP 2020

  32. arXiv:1910.11424  [pdf, other]

    cs.CL cs.AI cs.LG cs.MA stat.ML

    Capacity, Bandwidth, and Compositionality in Emergent Language Learning

    Authors: Cinjon Resnick, Abhinav Gupta, Jakob Foerster, Andrew M. Dai, Kyunghyun Cho

    Abstract: Many recent works have discussed the propensity, or lack thereof, for emergent languages to exhibit properties of natural languages. A favorite in the literature is learning compositionality. We note that most of those works have focused on communicative bandwidth as being of primary importance. While important, it is not the only contributing factor. In this paper, we investigate the learning bia…

    Submitted 15 April, 2020; v1 submitted 24 October, 2019; originally announced October 2019.

    Comments: The first two authors contributed equally. Accepted at AAMAS 2020

  33. arXiv:1910.01727  [pdf, other]

    cs.LG stat.ML

    Generalized Inner Loop Meta-Learning

    Authors: Edward Grefenstette, Brandon Amos, Denis Yarats, Phu Mon Htut, Artem Molchanov, Franziska Meier, Douwe Kiela, Kyunghyun Cho, Soumith Chintala

    Abstract: Many (but not all) approaches self-qualifying as "meta-learning" in deep learning and reinforcement learning fit a common pattern of approximating the solution to a nested optimization problem. In this paper, we give a formalization of this shared pattern, which we call GIMLI, prove its general requirements, and derive a general-purpose algorithm for implementing similar approaches. Based on this…

    Submitted 7 October, 2019; v1 submitted 3 October, 2019; originally announced October 2019.

    Comments: 17 pages, 3 figures, 1 algorithm

  34. arXiv:1909.11299  [pdf, other]

    cs.LG stat.ML

    Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models

    Authors: Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang

    Abstract: In natural language processing, it has been observed recently that generalization could be greatly improved by finetuning a large-scale language model pretrained on a large unlabeled corpus. Despite its recent success and wide adoption, finetuning a large pretrained language model on a downstream task is prone to degenerate performance when there are only a small number of training instances avail…

    Submitted 22 January, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

    Comments: Published as a conference paper at ICLR 2020

  35. arXiv:1908.09357  [pdf, other]

    cs.LG cs.AI stat.ML

    Dynamics-aware Embeddings

    Authors: William Whitney, Rajat Agarwal, Kyunghyun Cho, Abhinav Gupta

    Abstract: In this paper we consider self-supervised representation learning to improve sample efficiency in reinforcement learning (RL). We propose a forward prediction objective for simultaneously learning embeddings of states and action sequences. These embeddings capture the structure of the environment's dynamics, enabling efficient policy learning. We demonstrate that our action embeddings alone improv…

    Submitted 14 January, 2020; v1 submitted 25 August, 2019; originally announced August 2019.

    Comments: Published at ICLR 2020

  36. arXiv:1908.04319  [pdf, other]

    cs.LG cs.CL stat.ML

    Neural Text Generation with Unlikelihood Training

    Authors: Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, Jason Weston

    Abstract: Neural text generation is a key tool in natural language applications, but it is well known there are major problems at its core. In particular, standard likelihood training and decoding leads to dull and repetitive outputs. While some post-hoc fixes have been proposed, in particular top-$k$ and nucleus sampling, they do not address the fact that the token-level probabilities predicted by the mode…

    Submitted 26 September, 2019; v1 submitted 12 August, 2019; originally announced August 2019.

    Comments: Sean Welleck and Ilia Kulikov contributed equally

  37. arXiv:1908.00615  [pdf, other]

    eess.IV cs.CV stat.ML

    Improving localization-based approaches for breast cancer screening exam classification

    Authors: Thibault Févry, Jason Phang, Nan Wu, S. Gene Kim, Linda Moy, Kyunghyun Cho, Krzysztof J. Geras

    Abstract: We trained and evaluated a localization-based deep CNN for breast cancer screening exam classification on over 200,000 exams (over 1,000,000 images). Our model achieves an AUC of 0.919 in predicting malignancy in patients undergoing breast cancer screening, reducing the error rate of the baseline (Wu et al., 2019a) by 23%. In addition, the model generates bounding boxes for benign and malignant f…

    Submitted 1 August, 2019; originally announced August 2019.

    Comments: MIDL 2019 [arXiv:1907.08612]

    Report number: MIDL/2019/ExtendedAbstract/HyxoAR_AK4

  38. arXiv:1907.13057  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    Screening Mammogram Classification with Prior Exams

    Authors: Jungkyu Park, Jason Phang, Yiqiu Shen, Nan Wu, S. Gene Kim, Linda Moy, Kyunghyun Cho, Krzysztof J. Geras

    Abstract: Radiologists typically compare a patient's most recent breast cancer screening exam to their previous ones in making informed diagnoses. To reflect this practice, we propose new neural network models that compare pairs of screening mammograms from the same patient. We train and evaluate our proposed models on over 665,000 pairs of images (over 166,000 pairs of exams). Our best model achieves an AU…

    Submitted 30 July, 2019; originally announced July 2019.

    Comments: MIDL 2019 [arXiv:1907.08612]

    Report number: MIDL/2019/ExtendedAbstract/HkgCdUaMq4

  39. arXiv:1907.02649  [pdf, other]

    cs.LG cs.NE q-bio.NC stat.ML

    A Unified Framework of Online Learning Algorithms for Training Recurrent Neural Networks

    Authors: Owen Marschall, Kyunghyun Cho, Cristina Savin

    Abstract: We present a framework for compactly summarizing many recent results in efficient and/or biologically plausible online training of recurrent neural networks (RNN). The framework organizes algorithms according to several criteria: (a) past vs. future facing, (b) tensor structure, (c) stochastic vs. deterministic, and (d) closed form vs. numerical. These axes reveal latent conceptual connections amo…

    Submitted 4 July, 2019; originally announced July 2019.

    Comments: 29 pages, 9 figures

  40. arXiv:1906.02846  [pdf, other]

    cs.LG eess.IV stat.ML

    Globally-Aware Multiple Instance Classifier for Breast Cancer Screening

    Authors: Yiqiu Shen, Nan Wu, Jason Phang, Jungkyu Park, Gene Kim, Linda Moy, Kyunghyun Cho, Krzysztof J. Geras

    Abstract: Deep learning models designed for visual classification tasks on natural images have become prevalent in medical image analysis. However, medical images differ from typical natural images in many ways, such as significantly higher resolutions and smaller regions of interest. Moreover, both the global structure and local details play important roles in medical image analysis tasks. To address these…

    Submitted 19 August, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: Accepted to MLMI 2019

  41. arXiv:1905.12790  [pdf, other]

    cs.LG cs.CL stat.ML

    A Generalized Framework of Sequence Generation with Application to Undirected Sequence Models

    Authors: Elman Mansimov, Alex Wang, Sean Welleck, Kyunghyun Cho

    Abstract: Undirected neural sequence models such as BERT (Devlin et al., 2019) have received renewed interest due to their success on discriminative natural language understanding tasks such as question-answering and natural language inference. The problem of generating sequences directly from these models has received relatively little attention, in part because generating from undirected models departs si…

    Submitted 7 February, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

  42. arXiv:1905.10930  [pdf, other]

    cs.LG cs.CL stat.ML

    Sequential Graph Dependency Parser

    Authors: Sean Welleck, Kyunghyun Cho

    Abstract: We propose a method for non-projective dependency parsing by incrementally predicting a set of edges. Since the edges do not have a pre-specified order, we propose a set-based learning method. Our method blends graph, transition, and easy-first parsing, including a prior state of the parser as a special case. The proposed transition-based method successfully parses near the state of the art on bot…

    Submitted 23 October, 2019; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: RANLP 2019

  43. arXiv:1905.10345  [pdf, other]

    cs.LG stat.ML

    Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar

    Authors: Iddo Drori, Yamuna Krishnamurthy, Raoni Lourenco, Remi Rampin, Kyunghyun Cho, Claudio Silva, Juliana Freire

    Abstract: Automatic machine learning is an important problem in the forefront of machine learning. The strongest AutoML systems are based on neural networks, evolutionary algorithms, and Bayesian optimization. Recently AlphaD3M reached state-of-the-art results with an order of magnitude speedup using reinforcement learning with self-play. In this work we extend AlphaD3M by using a pipeline grammar and a pre…

    Submitted 24 May, 2019; originally announced May 2019.

    Comments: ICML Workshop on Automated Machine Learning

  44. arXiv:1905.05843  [pdf, other]

    cs.LG stat.ML

    Task-Driven Data Verification via Gradient Descent

    Authors: Siavash Golkar, Kyunghyun Cho

    Abstract: We introduce a novel algorithm for the detection of possible sample corruption such as mislabeled samples in a training dataset given a small clean validation set. We use a set of inclusion variables which determine whether or not any element of the noisy training set should be included in the training of a network. We compute these inclusion variables by optimizing the performance of the network…

    Submitted 14 May, 2019; originally announced May 2019.

    Comments: 10 pages, 6 figures

  45. arXiv:1904.12935  [pdf, ps, other]

    cs.LG stat.ML

    Advancing GraphSAGE with A Data-Driven Node Sampling

    Authors: Jihun Oh, Kyunghyun Cho, Joan Bruna

    Abstract: As an efficient and scalable graph neural network, GraphSAGE has enabled an inductive capability for inferring unseen nodes or graphs by aggregating subsampled local neighborhoods and by learning in a mini-batch gradient descent fashion. The neighborhood sampling used in GraphSAGE is effective in order to improve computing and memory efficiency when inferring a batch of target nodes with diverse d…

    Submitted 29 April, 2019; originally announced April 2019.

    Comments: 6 pages, 2 tables, ICLR 2019 workshop on Representation Learning on Graphs and Manifolds

  46. arXiv:1904.00314  [pdf, other]

    cs.LG physics.comp-ph stat.ML

    Molecular geometry prediction using a deep generative graph neural network

    Authors: Elman Mansimov, Omar Mahmood, Seokho Kang, Kyunghyun Cho

    Abstract: A molecule's geometry, also known as conformation, is one of a molecule's most important properties, determining the reactions it participates in, the bonds it forms, and the interactions it has with other molecules. Conventional conformation generation methods minimize hand-designed molecular force field energy functions that are often not well correlated with the true energy function of a molecu…

    Submitted 16 December, 2019; v1 submitted 30 March, 2019; originally announced April 2019.

    Comments: 15 pages, 6 figures

    Journal ref: Scientific Reports 9: 20381, 2019

  47. arXiv:1903.08297  [pdf, other]

    cs.LG cs.CV stat.ML

    Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening

    Authors: Nan Wu, Jason Phang, Jungkyu Park, Yiqiu Shen, Zhe Huang, Masha Zorin, Stanisław Jastrzębski, Thibault Févry, Joe Katsnelson, Eric Kim, Stacey Wolfson, Ujas Parikh, Sushma Gaddam, Leng Leng Young Lin, Kara Ho, Joshua D. Weinstein, Beatriu Reig, Yiming Gao, Hildegard Toth, Kristine Pysarenko, Alana Lewin, Jiyon Lee, Krystal Airola, Eralda Mema, Stephanie Chung, et al. (7 additional authors not shown)

    Abstract: We present a deep convolutional neural network for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images). Our network achieves an AUC of 0.895 in predicting whether there is a cancer in the breast, when tested on the screening population. We attribute the high accuracy of our model to a two-stage training procedure, which allows us to use…

    Submitted 19 March, 2019; originally announced March 2019.

    Comments: MIDL 2019 [arXiv:1907.08612]

    Report number: MIDL/2019/ExtendedAbstract/SkxYez76FE

  48. arXiv:1903.04476  [pdf, other]

    cs.LG cs.NE q-bio.NC stat.ML

    Continual Learning via Neural Pruning

    Authors: Siavash Golkar, Michael Kagan, Kyunghyun Cho

    Abstract: We introduce Continual Learning via Neural Pruning (CLNP), a new method aimed at lifelong learning in fixed capacity models based on neuronal model sparsification. In this method, subsequent tasks are trained using the inactive neurons and filters of the sparsified network and cause zero deterioration to the performance of previous tasks. In order to deal with the possible compromise between model…

    Submitted 11 March, 2019; originally announced March 2019.

    Comments: 12 pages, 5 figures, 3 tables

  49. arXiv:1902.02192  [pdf, other]

    cs.CL cs.LG stat.ML

    Non-Monotonic Sequential Text Generation

    Authors: Sean Welleck, Kianté Brantley, Hal Daumé III, Kyunghyun Cho

    Abstract: Standard sequential generation methods assume a pre-specified generation order, such as text generation methods which generate words from left to right. In this work, we propose a framework for training models of text generation that operate in non-monotonic orders; the model directly learns good orders, without any additional annotation. Our framework operates by generating a word at an arbitrary…

    Submitted 23 October, 2019; v1 submitted 5 February, 2019; originally announced February 2019.

    Comments: ICML 2019

  50. arXiv:1810.00150  [pdf, other]

    cs.LG stat.ML

    Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep learning

    Authors: Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang

    Abstract: Although stochastic gradient descent (SGD) is a driving force behind the recent success of deep learning, our understanding of its dynamics in a high-dimensional parameter space is limited. In recent years, some researchers have used the stochasticity of minibatch gradients, or the signal-to-noise ratio, to better characterize the learning dynamics of SGD. Inspired from these work, we here analyze…

    Submitted 28 November, 2018; v1 submitted 29 September, 2018; originally announced October 2018.

    Comments: 11 pages (+15 pages for references and supplemental material, total 26 pages), 12 figures, a single table