Showing 1–50 of 116 results for author: Chung, W

Searching in archive cs.
  1. arXiv:2408.10676  [pdf, other]

    cs.LG

    Representation Norm Amplification for Out-of-Distribution Detection in Long-Tail Learning

    Authors: Dong Geun Shin, Hye Won Chung

    Abstract: Detecting out-of-distribution (OOD) samples is a critical task for reliable machine learning. However, it becomes particularly challenging when the models are trained on long-tailed datasets, as the models often struggle to distinguish tail-class in-distribution samples from OOD samples. We examine the main challenges in this problem by identifying the trade-offs between OOD detection and in-distr…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 30 pages, 8 figures, 17 tables

  2. arXiv:2407.12604  [pdf, ps, other]

    cs.IT cs.DS cs.SI

    Exact Graph Matching in Correlated Gaussian-Attributed Erdős-Rényi Model

    Authors: Joonhyuk Yang, Hye Won Chung

    Abstract: Graph matching problem aims to identify node correspondence between two or more correlated graphs. Previous studies have primarily focused on models where only edge information is provided. However, in many social networks, not only the relationships between users, represented by edges, but also their personal information, represented by features, are present. In this paper, we address the challen…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: IEEE International Symposium on Information Theory (ISIT) 2024

  3. arXiv:2407.08991  [pdf]

    eess.AS cs.AI cs.CC

    Optimization of DNN-based speaker verification model through efficient quantization technique

    Authors: Yeona Hong, Woo-Jin Chung, Hong-Goo Kang

    Abstract: As Deep Neural Networks (DNNs) rapidly advance in various fields, including speech verification, they typically involve high computational costs and substantial memory consumption, which can be challenging to manage on mobile systems. Quantization of deep models offers a means to reduce both computational and memory expenses. Our research proposes an optimization framework for the quantization of…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: in Korean language, Accepted at Society of Electronic Engineers of Korea Conference 2024

  4. arXiv:2407.00568  [pdf, other]

    cs.LG cs.AI

    Divide And Conquer: Learning Chaotic Dynamical Systems With Multistep Penalty Neural Ordinary Differential Equations

    Authors: Dibyajyoti Chakraborty, Seung Whan Chung, Romit Maulik

    Abstract: Forecasting high-dimensional dynamical systems is a fundamental challenge in various fields, such as the geosciences and engineering. Neural Ordinary Differential Equations (NODEs), which combine the power of neural networks and numerical solvers, have emerged as a promising algorithm for forecasting complex nonlinear dynamical systems. However, classical techniques used for NODE training are inef…

    Submitted 1 July, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

    Comments: 20 pages, 10 Figures, submitted to Journal of Computational Physics

  5. arXiv:2406.18561  [pdf, other]

    cs.CV cs.LG

    SelMatch: Effectively Scaling Up Dataset Distillation via Selection-Based Initialization and Partial Updates by Trajectory Matching

    Authors: Yongmin Lee, Hye Won Chung

    Abstract: Dataset distillation aims to synthesize a small number of images per class (IPC) from a large dataset to approximate full dataset training with minimal performance loss. While effective in very small IPC ranges, many distillation methods become less effective, even underperforming random sample selection, as IPC increases. Our examination of state-of-the-art trajectory-matching based distillation…

    Submitted 28 May, 2024; originally announced June 2024.

    Comments: ICML 2024

  6. arXiv:2406.17329  [pdf, other]

    eess.SP cs.SD eess.AS physics.bio-ph

    Speaker-Independent Acoustic-to-Articulatory Inversion through Multi-Channel Attention Discriminator

    Authors: Woo-Jin Chung, Hong-Goo Kang

    Abstract: We present a novel speaker-independent acoustic-to-articulatory inversion (AAI) model, overcoming the limitations observed in conventional AAI models that rely on acoustic features derived from restricted datasets. To address these challenges, we leverage representations from a pre-trained self-supervised learning (SSL) model to more effectively estimate the global, local, and kinematic pattern in…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  7. arXiv:2406.03057  [pdf, other]

    cs.LG stat.ML

    BWS: Best Window Selection Based on Sample Scores for Data Pruning across Broad Ranges

    Authors: Hoyong Choi, Nohyun Ki, Hye Won Chung

    Abstract: Data subset selection aims to find a smaller yet informative subset of a large dataset that can approximate the full-dataset training, addressing challenges associated with training neural networks on large-scale datasets. However, existing methods tend to specialize in either high or low selection ratio regimes, lacking a universal approach that consistently achieves competitive performance acros…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  8. arXiv:2403.19180  [pdf]

    eess.SP cs.ET

    A Robust UWOC-assisted Multi-hop Topology for Underwater Sensor Network Nodes

    Authors: Maaz Salman, Javad Bolboli, Wan-Young Chung

    Abstract: Underwater environment is substantially less explored territory as compared to earth surface due to lack of robust underwater communication infrastructure. For Internet of Underwater things connectivity, underwater wireless optical communication can play a vital role, compared to conventional radio frequency communication, due to longer range, high data rate, low latency, and unregulated bandwidth…

    Submitted 31 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  9. arXiv:2403.07959  [pdf, other]

    cs.CR cs.AI

    An Interpretable Generalization Mechanism for Accurately Detecting Anomaly and Identifying Networking Intrusion Techniques

    Authors: Hao-Ting Pai, Yu-Hsuan Kang, Wen-Cheng Chung

    Abstract: Recent advancements in Intrusion Detection Systems (IDS), integrating Explainable AI (XAI) methodologies, have led to notable improvements in system performance via precise feature selection. However, a thorough understanding of cyber-attacks requires inherently explainable decision-making processes within IDS. In this paper, we present the Interpretable Generalization Mechanism (IG), poised to re…

    Submitted 12 March, 2024; originally announced March 2024.

  10. arXiv:2402.11223  [pdf, other]

    cs.LG

    HEAL: Brain-inspired Hyperdimensional Efficient Active Learning

    Authors: Yang Ni, Zhuowen Zou, Wenjun Huang, Hanning Chen, William Youngwoo Chung, Samuel Cho, Ranganath Krishnan, Pietro Mercati, Mohsen Imani

    Abstract: Drawing inspiration from the outstanding learning capability of our human brains, Hyperdimensional Computing (HDC) emerges as a novel computing paradigm, and it leverages high-dimensional vector presentation and operations for brain-like lightweight Machine Learning (ML). Practical deployments of HDC have significantly enhanced the learning efficiency compared to current deep ML methods on a broad…

    Submitted 17 February, 2024; originally announced February 2024.

  11. arXiv:2402.10482  [pdf, other]

    cs.LG stat.ML

    Understanding Self-Distillation and Partial Label Learning in Multi-Class Classification with Label Noise

    Authors: Hyeonsu Jeong, Hye Won Chung

    Abstract: Self-distillation (SD) is the process of training a student model using the outputs of a teacher model, with both models sharing the same architecture. Our study theoretically examines SD in multi-class classification with cross-entropy loss, exploring both multi-round SD and SD with refined teacher outputs, inspired by partial label learning (PLL). By deriving a closed-form solution for the stude…

    Submitted 16 February, 2024; originally announced February 2024.

  12. arXiv:2401.10245  [pdf, other]

    cs.CE physics.flu-dyn

    Train Small, Model Big: Scalable Physics Simulators via Reduced Order Modeling and Domain Decomposition

    Authors: Seung Whan Chung, Youngsoo Choi, Pratanu Roy, Thomas Moore, Thomas Roy, Tiras Y. Lin, Du Y. Nguyen, Christopher Hahn, Eric B. Duoss, Sarah E. Baker

    Abstract: Numerous cutting-edge scientific technologies originate at the laboratory scale, but transitioning them to practical industry applications is a formidable challenge. Traditional pilot projects at intermediate scales are costly and time-consuming. An alternative, the E-pilot, relies on high-fidelity numerical simulations, but even these simulations can be computationally prohibitive at larger scale…

    Submitted 5 December, 2023; originally announced January 2024.

    Comments: 40 pages, 12 figures. Submitted to Computer Methods in Applied Mechanics and Engineering

    Report number: LLNL-JRNL-857774 MSC Class: 65F55; 65N55 (primary) 76D07 (secondary)

  13. arXiv:2312.15320  [pdf]

    q-bio.QM cs.CV cs.LG cs.MM q-bio.GN

    GestaltMML: Enhancing Rare Genetic Disease Diagnosis through Multimodal Machine Learning Combining Facial Images and Clinical Texts

    Authors: Da Wu, Jingye Yang, Cong Liu, Tzung-Chien Hsieh, Elaine Marchi, Justin Blair, Peter Krawitz, Chunhua Weng, Wendy Chung, Gholson J. Lyon, Ian D. Krantz, Jennifer M. Kalish, Kai Wang

    Abstract: Individuals with suspected rare genetic disorders often undergo multiple clinical evaluations, imaging studies, laboratory tests and genetic tests, to find a possible answer over a prolonged period of time. Addressing this "diagnostic odyssey" thus has substantial clinical, psychosocial, and economic benefits. Many rare genetic diseases have distinctive facial features, which can be used by artifi…

    Submitted 21 April, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

    Comments: Significant revisions

  14. arXiv:2311.10327  [pdf, other]

    cs.RO eess.SY

    Dimensionality Reduction of Dynamics on Lie Manifolds via Structure-Aware Canonical Correlation Analysis

    Authors: Wooyoung Chung, Daniel Polani, Stas Tiomkin

    Abstract: Incorporating prior knowledge into a data-driven modeling problem can drastically improve performance, reliability, and generalization outside of the training sample. The stronger the structural properties, the more effective these improvements become. Manifolds are a powerful nonlinear generalization of Euclidean space for modeling finite dimensions. Structural impositions in constrained systems…

    Submitted 16 November, 2023; originally announced November 2023.

  15. arXiv:2311.00364  [pdf, other]

    eess.AS cs.SD physics.bio-ph

    C2C: Cough to COVID-19 Detection in BHI 2023 Data Challenge

    Authors: Woo-Jin Chung, Miseul Kim, Hong-Goo Kang

    Abstract: This report describes our submission to BHI 2023 Data Competition: Sensor challenge. Our Audio Alchemists team designed an acoustic-based COVID-19 diagnosis system, Cough to COVID-19 (C2C), and won the 1st place in the challenge. C2C involves three key contributions: pre-processing of input signals, cough-related representation extraction leveraging Wav2vec2.0, and data augmentation. Through exper…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: 1st place winning paper from the BHI 2023 Data Challenge Competition: Sensor Informatics

  16. arXiv:2310.12467  [pdf, other]

    cs.CL

    Contrastive Learning for Inference in Dialogue

    Authors: Etsuko Ishii, Yan Xu, Bryan Wilie, Ziwei Ji, Holy Lovenia, Willy Chung, Pascale Fung

    Abstract: Inference, especially those derived from inductive processes, is a crucial component in our conversation to complement the information implicitly or explicitly conveyed by a speaker. While recent large language models show remarkable advances in inference tasks, their performance in inductive reasoning, where not all information is present in the context, is far behind deductive reasoning. In this…

    Submitted 12 November, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP2023

  17. arXiv:2310.08885  [pdf, other]

    cs.CL

    InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems

    Authors: Willy Chung, Samuel Cahyawijaya, Bryan Wilie, Holy Lovenia, Pascale Fung

    Abstract: Large language models (LLMs) have been used for diverse tasks in natural language processing (NLP), yet remain under-explored for task-oriented dialogue systems (TODS), especially for end-to-end TODS. We present InstructTODS, a novel off-the-shelf framework for zero-shot end-to-end task-oriented dialogue systems that can adapt to diverse domains without fine-tuning. By leveraging LLMs, InstructTOD…

    Submitted 13 October, 2023; originally announced October 2023.

  18. arXiv:2309.13457  [pdf, other]

    cs.LG cs.CV physics.comp-ph physics.flu-dyn

    Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data

    Authors: Wai Tong Chung, Bassem Akoush, Pushan Sharma, Alex Tamkin, Ki Sung Jung, Jacqueline H. Chen, Jack Guo, Davy Brouzet, Mohsen Talei, Bruno Savard, Alexei Y. Poludnenko, Matthias Ihme

    Abstract: Analysis of compressible turbulent flows is essential for applications related to propulsion, energy generation, and the environment. Here, we present BLASTNet 2.0, a 2.2 TB network-of-datasets containing 744 full-domain samples from 34 high-fidelity direct numerical simulations, which addresses the current limited availability of 3D high-fidelity reacting and non-reacting compressible turbulent f…

    Submitted 27 October, 2023; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: Accepted in Adv. in Neural Information Processing Systems 36 (NeurIPS 2023). Link: https://nips.cc/virtual/2023/poster/73433 . 55 pages, 21 figures. Keywords: Super-resolution, 3D, Neural Scaling, Physics-informed Loss, Computational Fluid Dynamics, Partial Differential Equations, Turbulent Reacting Flows, Direct Numerical Simulation, Fluid Mechanics, Combustion, Computer Vision

  19. arXiv:2309.10413  [pdf, other]

    cs.CL

    PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems

    Authors: Bryan Wilie, Yan Xu, Willy Chung, Samuel Cahyawijaya, Holy Lovenia, Pascale Fung

    Abstract: Grounding dialogue response generation on external knowledge is proposed to produce informative and engaging responses. However, current knowledge-grounded dialogue (KGD) systems often fail to align the generated responses with human-preferred qualities due to several issues like hallucination and the lack of coherence. Upon analyzing multiple language model generations, we observe the presence of…

    Submitted 19 September, 2023; originally announced September 2023.

  20. arXiv:2309.05182  [pdf, ps, other]

    cs.IT cs.DS

    Graph Matching in Correlated Stochastic Block Models for Improved Graph Clustering

    Authors: Joonhyuk Yang, Hye Won Chung

    Abstract: We consider community detection from multiple correlated graphs sharing the same community structure. The correlated graphs are generated by independent subsampling of a parent graph sampled from the stochastic block model. The vertex correspondence between the correlated graphs is assumed to be unknown. We consider the two-step procedure where the vertex correspondence between the correlated grap…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: Allerton Conference 2023

  21. arXiv:2306.14517  [pdf, other]

    cs.CL cs.SD eess.AS

    Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition

    Authors: Samuel Cahyawijaya, Holy Lovenia, Willy Chung, Rita Frieske, Zihan Liu, Pascale Fung

    Abstract: Speech emotion recognition plays a crucial role in human-computer interactions. However, most speech emotion recognition research is biased toward English-speaking adults, which hinders its applicability to other demographic groups in different languages and age groups. In this work, we analyze the transferability of emotion recognition across three different languages--English, Mandarin Chinese,…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: Accepted in INTERSPEECH 2023

  22. arXiv:2306.01859  [pdf, other]

    cs.CV cs.AI

    Spatially Resolved Gene Expression Prediction from H&E Histology Images via Bi-modal Contrastive Learning

    Authors: Ronald Xie, Kuan Pang, Sai W. Chung, Catia T. Perciani, Sonya A. MacParland, Bo Wang, Gary D. Bader

    Abstract: Histology imaging is an important tool in medical diagnosis and research, enabling the examination of tissue structure and composition at the microscopic level. Understanding the underlying molecular mechanisms of tissue architecture is critical in uncovering disease mechanisms and developing effective treatments. Gene expression profiling provides insight into the molecular processes underlying t…

    Submitted 27 October, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

  23. arXiv:2305.19666  [pdf, other]

    cs.DS cs.LG cs.SI stat.ML

    Efficient Algorithms for Exact Graph Matching on Correlated Stochastic Block Models with Constant Correlation

    Authors: Joonhyuk Yang, Dongpil Shin, Hye Won Chung

    Abstract: We consider the problem of graph matching, or learning vertex correspondence, between two correlated stochastic block models (SBMs). The graph matching problem arises in various fields, including computer vision, natural language processing and bioinformatics, and in particular, matching graphs with inherent community structure has significance related to de-anonymization of correlated social netw…

    Submitted 2 June, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: ICML 2023

  24. arXiv:2305.14705  [pdf, other]

    cs.CL

    Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models

    Authors: Sheng Shen, Le Hou, Yanqi Zhou, Nan Du, Shayne Longpre, Jason Wei, Hyung Won Chung, Barret Zoph, William Fedus, Xinyun Chen, Tu Vu, Yuexin Wu, Wuyang Chen, Albert Webson, Yunxuan Li, Vincent Zhao, Hongkun Yu, Kurt Keutzer, Trevor Darrell, Denny Zhou

    Abstract: Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnable parameters to Large Language Models (LLMs) without increasing inference cost. Instruction tuning is a technique for training LLMs to follow instructions. We advocate combining these two approaches, as we find that MoE models benefit more from instruction tuning than dense models. In particular, we…

    Submitted 5 July, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Preprint

  25. arXiv:2305.13627  [pdf, other]

    cs.CL cs.AI

    InstructAlign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning

    Authors: Samuel Cahyawijaya, Holy Lovenia, Tiezheng Yu, Willy Chung, Pascale Fung

    Abstract: Large language models (LLMs) that are tuned with instructions have demonstrated remarkable capabilities in various tasks and languages. However, their ability to generalize to underrepresented languages is limited due to the scarcity of available data. Additionally, directly adapting new languages to instruction-tuned LLMs can result in catastrophic forgetting, which leads to the loss of multitask…

    Submitted 24 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  26. arXiv:2304.11220  [pdf, other]

    cs.CL

    Learn What NOT to Learn: Towards Generative Safety in Chatbots

    Authors: Leila Khalatbari, Yejin Bang, Dan Su, Willy Chung, Saeed Ghadimi, Hossein Sameti, Pascale Fung

    Abstract: Conversational models that are generative and open-domain are particularly susceptible to generating unsafe content since they are trained on web-based social data. Prior approaches to mitigating this issue have drawbacks, such as disrupting the flow of conversation, limited generalization to unseen toxic input contexts, and sacrificing the quality of the dialogue for the sake of safety. In this p…

    Submitted 25 April, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

    Comments: 9 pages, 3 tables, 3 figures

  27. arXiv:2304.09151  [pdf, other]

    cs.CL

    UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining

    Authors: Hyung Won Chung, Noah Constant, Xavier Garcia, Adam Roberts, Yi Tay, Sharan Narang, Orhan Firat

    Abstract: Pretrained multilingual large language models have typically used heuristic temperature-based sampling to balance between different languages. However previous work has not systematically evaluated the efficacy of different pretraining language distributions across model scales. In this paper, we propose a new sampling method, UniMax, that delivers more uniform coverage of head languages while mit…

    Submitted 18 April, 2023; originally announced April 2023.

  28. arXiv:2303.09664  [pdf, other]

    cs.LG cs.AI cs.HC

    Tribe or Not? Critical Inspection of Group Differences Using TribalGram

    Authors: Yongsu Ahn, Muheng Yan, Yu-Ru Lin, Wen-Ting Chung, Rebecca Hwa

    Abstract: With the rise of AI and data mining techniques, group profiling and group-level analysis have been increasingly used in many domains including policy making and direct marketing. In some cases, the statistics extracted from data may provide insights to a group's shared characteristics; in others, the group-level analysis can lead to problems including stereotyping and systematic oppression. How ca…

    Submitted 16 March, 2023; originally announced March 2023.

    ACM Class: H.5.2

    Journal ref: ACM Transactions on Interactive Intelligent Systems (TiiS) 12.1 (2022): 1-34

  29. arXiv:2303.08774  [pdf, other]

    cs.CL cs.AI

    GPT-4 Technical Report

    Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko , et al. (256 additional authors not shown)

    Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo…

    Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 100 pages; updated authors list; fixed author names and added citation

  30. arXiv:2302.04023  [pdf, other]

    cs.CL cs.AI

    A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

    Authors: Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, Quyet V. Do, Yan Xu, Pascale Fung

    Abstract: This paper proposes a framework for quantitatively evaluating interactive LLMs such as ChatGPT using publicly available data sets. We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks. We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset.…

    Submitted 28 November, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: 45 pages, AACL 2023

  31. arXiv:2301.13688  [pdf, other]

    cs.AI cs.CL cs.LG

    The Flan Collection: Designing Data and Methods for Effective Instruction Tuning

    Authors: Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay, Denny Zhou, Quoc V. Le, Barret Zoph, Jason Wei, Adam Roberts

    Abstract: We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022). Through careful ablation studies on the Flan Collection of tasks and methods, we tease apart the effect of design decisions which enable Flan-T5 to outperform prior work by 3-17%+ across evaluation settings. We find task balancing and enrichment techniqu…

    Submitted 14 February, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

  32. arXiv:2301.06276  [pdf, other]

    cs.LG cs.AI

    The Role of Baselines in Policy Gradient Optimization

    Authors: Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvari, Dale Schuurmans

    Abstract: We study the effect of baselines in on-policy stochastic policy gradient optimization, and close the gap between the theory and practice of policy optimization methods. Our first contribution is to show that the \emph{state value} baseline allows on-policy stochastic \emph{natural} policy gradient (NPG) to converge to a globally optimal policy at an $O(1/t)$ rate, which was not previously known. T…

    Submitted 16 January, 2023; originally announced January 2023.

    Comments: 55 pages; published at NeurIPS 2022

  33. arXiv:2301.05331  [pdf, other]

    math.ST cs.LG math.PR stat.ML

    Detection problems in the spiked matrix models

    Authors: Ji Hyung Jung, Hye Won Chung, Ji Oon Lee

    Abstract: We study the statistical decision process of detecting the low-rank signal from various signal-plus-noise type data matrices, known as the spiked random matrix models. We first show that the principal component analysis can be improved by entrywise pre-transforming the data matrix if the noise is non-Gaussian, generalizing the known results for the spiked random matrix models with rank-1 signals.…

    Submitted 16 January, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

    Comments: 80 pages, 6 figures. arXiv admin note: text overlap with arXiv:2104.13517

    MSC Class: 62H25; 62H15; 60B20

  34. arXiv:2301.00930  [pdf, other]

    cs.LG

    Data Valuation Without Training of a Model

    Authors: Nohyun Ki, Hoyong Choi, Hye Won Chung

    Abstract: Many recent works on understanding deep learning try to quantify how much individual data instances influence the optimization and generalization of a model. Such attempts reveal characteristics and importance of individual instances, which may provide useful information in diagnosing and improving deep learning. However, most of the existing works on data valuation require actual training of a mo…

    Submitted 7 March, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

    Comments: ICLR 2023

  35. arXiv:2301.00006  [pdf, other]

    cs.HC cs.IT cs.LG stat.ML

    Recovering Top-Two Answers and Confusion Probability in Multi-Choice Crowdsourcing

    Authors: Hyeonsu Jeong, Hye Won Chung

    Abstract: Crowdsourcing has emerged as an effective platform for labeling large amounts of data in a cost- and time-efficient manner. Most previous work has focused on designing an efficient algorithm to recover only the ground-truth labels of the data. In this paper, we consider multi-choice crowdsourcing tasks with the goal of recovering not only the ground truth, but also the most confusing answer and th…

    Submitted 31 May, 2023; v1 submitted 29 December, 2022; originally announced January 2023.

    Comments: ICML 2023

  36. arXiv:2212.13138  [pdf, other]

    cs.CL

    Large Language Models Encode Clinical Knowledge

    Authors: Karan Singhal, Shekoofeh Azizi, Tao Tu, S. Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, Perry Payne, Martin Seneviratne, Paul Gamble, Chris Kelly, Nathanael Schärli, Aakanksha Chowdhery, Philip Mansfield, Blaise Aguera y Arcas, Dale Webster, Greg S. Corrado, Yossi Matias, Katherine Chou, Juraj Gottweis, Nenad Tomasev, Yun Liu , et al. (5 additional authors not shown)

    Abstract: Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To a…

    Submitted 26 December, 2022; originally announced December 2022.

  37. arXiv:2212.09396  [pdf, other]

    stat.ML cs.IT cs.LG

    Rank-1 Matrix Completion with Gradient Descent and Small Random Initialization

    Authors: Daesung Kim, Hye Won Chung

    Abstract: The nonconvex formulation of matrix completion problem has received significant attention in recent years due to its affordable complexity compared to the convex formulation. Gradient descent (GD) is the simplest yet efficient baseline algorithm for solving nonconvex optimization problems. The success of GD has been witnessed in many different problems in both theory and practice when it is combin…

    Submitted 8 February, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

  38. arXiv:2211.05100  [pdf, other]

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop, :, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major , et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  39. arXiv:2210.11416  [pdf, other]

    cs.LG cs.CL

    Scaling Instruction-Finetuned Language Models

    Authors: Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang , et al. (10 additional authors not shown)

    Abstract: Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects d…

    Submitted 6 December, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: Public checkpoints: https://huggingface.co/docs/transformers/model_doc/flan-t5

  40. arXiv:2210.11399  [pdf, other]

    cs.CL cs.AI cs.LG

    Transcending Scaling Laws with 0.1% Extra Compute

    Authors: Yi Tay, Jason Wei, Hyung Won Chung, Vinh Q. Tran, David R. So, Siamak Shakeri, Xavier Garcia, Huaixiu Steven Zheng, Jinfeng Rao, Aakanksha Chowdhery, Denny Zhou, Donald Metzler, Slav Petrov, Neil Houlsby, Quoc V. Le, Mostafa Dehghani

    Abstract: Scaling language models improves performance but comes with significant computational costs. This paper proposes UL2R, a method that substantially improves existing language models and their scaling curves with a relatively tiny amount of extra compute. The key idea is to continue training a state-of-the-art large language model (e.g., PaLM) on a few more steps with UL2's mixture-of-denoiser objec…

    Submitted 16 November, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: V2 has updated references/related work

  41. arXiv:2210.09261  [pdf, other]

    cs.CL cs.AI

    Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

    Authors: Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, Jason Wei

    Abstract: BIG-Bench (Srivastava et al., 2022) is a diverse evaluation suite that focuses on tasks believed to be beyond the capabilities of current language models. Language models have already made good progress on this benchmark, with the best model in the BIG-Bench paper outperforming average reported human-rater results on 65% of the BIG-Bench tasks via few-shot prompting. But on what tasks do language…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: GitHub repository: https://github.com/suzgunmirac/BIG-Bench-Hard

  42. arXiv:2210.03057  [pdf, other]

    cs.CL cs.AI cs.LG

    Language Models are Multilingual Chain-of-Thought Reasoners

    Authors: Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason Wei

    Abstract: We evaluate the reasoning abilities of large language models in multilingual settings. We introduce the Multilingual Grade School Math (MGSM) benchmark, by manually translating 250 grade-school math problems from the GSM8K dataset (Cobbe et al., 2021) into ten typologically diverse languages. We find that the ability to solve MGSM problems via chain-of-thought prompting emerges with increasing mod…

    Submitted 6 October, 2022; originally announced October 2022.

  43. arXiv:2209.01638  [pdf, other]

    cs.CL

    Every picture tells a story: Image-grounded controllable stylistic story generation

    Authors: Holy Lovenia, Bryan Wilie, Romain Barraud, Samuel Cahyawijaya, Willy Chung, Pascale Fung

    Abstract: Generating a short story out of an image is arduous. Unlike image captioning, story generation from an image poses multiple challenges: preserving the story coherence, appropriately assessing the quality of the story, steering the generated story into a certain style, and addressing the scarcity of image-story pair reference datasets limiting supervision during training. In this work, we introduce…

    Submitted 11 September, 2022; v1 submitted 4 September, 2022; originally announced September 2022.

    Comments: Accepted in LaTeCH-CLfL 2022 (6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature), COLING 2022

  44. arXiv:2207.12546  [pdf, other]

    cs.LG physics.flu-dyn

    The Bearable Lightness of Big Data: Towards Massive Public Datasets in Scientific Machine Learning

    Authors: Wai Tong Chung, Ki Sung Jung, Jacqueline H. Chen, Matthias Ihme

    Abstract: In general, large datasets enable deep learning models to perform with good accuracy and generalizability. However, massive high-fidelity simulation datasets (from molecular chemistry, astrophysics, computational fluid dynamics (CFD), etc.) can be challenging to curate due to dimensionality and storage constraints. Lossy compression algorithms can help mitigate limitations from storage, as long as…

    Submitted 25 July, 2022; originally announced July 2022.

    Comments: Accepted in ICML 2022 2nd AI for Science Workshop. 10 pages, 8 figures

    Journal ref: ICML 2022 2nd AI for Science Workshop

  45. arXiv:2207.10792  [pdf, other]

    cs.CV cs.AI

    Test-Time Adaptation via Self-Training with Nearest Neighbor Information

    Authors: Minguk Jang, Sae-Young Chung, Hye Won Chung

    Abstract: Test-time adaptation (TTA) aims to adapt a trained classifier using online unlabeled test data only, without any information related to the training procedure. Most existing TTA methods adapt the trained classifier using the classifier's predictions on the test data as pseudo-labels. However, under test-time domain shift, the accuracy of the pseudo-labels cannot be guaranteed, and thus the TTA methods o…

    Submitted 27 February, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

  46. arXiv:2207.10551  [pdf, other]

    cs.LG cs.CL

    Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?

    Authors: Yi Tay, Mostafa Dehghani, Samira Abnar, Hyung Won Chung, William Fedus, Jinfeng Rao, Sharan Narang, Vinh Q. Tran, Dani Yogatama, Donald Metzler

    Abstract: There has been a lot of interest in the scaling properties of Transformer models. However, not much has been done to investigate how different inductive biases and model architectures affect scaling properties. Do model architectures scale differently? If so, how does inductive bias affect scaling behaviour? How does this influence upstream (pretraining) and downstream (trans…

    Submitted 21 July, 2022; originally announced July 2022.

  47. arXiv:2206.00773  [pdf, other]

    cs.CL cs.LG

    Assessing the trade-off between prediction accuracy and interpretability for topic modeling on energetic materials corpora

    Authors: Monica Puerto, Mason Kellett, Rodanthi Nikopoulou, Mark D. Fuge, Ruth Doherty, Peter W. Chung, Zois Boukouvalas

    Abstract: As the amount and variety of energetics research increases, machine aware topic identification is necessary to streamline future research pipelines. The makeup of an automatic topic identification process consists of creating document representations and performing classification. However, the implementation of these processes on energetics research imposes new challenges. Energetics datasets cont…

    Submitted 1 June, 2022; originally announced June 2022.

    Comments: Accepted for publication in the 25th International Seminar New Trends in Research of Energetic Materials (NTREM 2022 proceedings)

  48. arXiv:2205.05131  [pdf, other]

    cs.CL

    UL2: Unifying Language Learning Paradigms

    Authors: Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Siamak Shakeri, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, Donald Metzler

    Abstract: Existing pre-trained models are generally geared towards a particular class of problems. To date, there seems to be still no consensus on what the right architecture and pre-training setup should be. This paper presents a unified framework for pre-training models that are universally effective across datasets and setups. We begin by disentangling architectural archetypes with pre-training objectiv…

    Submitted 28 February, 2023; v1 submitted 10 May, 2022; originally announced May 2022.

    Comments: Updated Q1 2023 with Flan-UL2 20B release! :)

  49. arXiv:2204.05832  [pdf, other]

    cs.CL cs.LG stat.ML

    What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

    Authors: Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, Colin Raffel

    Abstract: Large pretrained Transformer language models have been shown to exhibit zero-shot generalization, i.e. they can perform a wide variety of tasks that they were not explicitly trained on. However, the architectures and pretraining objectives used across state-of-the-art models differ significantly, and there has been limited systematic comparison of these factors. In this work, we present a large-sc…

    Submitted 12 April, 2022; originally announced April 2022.

  50. arXiv:2204.02311  [pdf, other]

    cs.CL

    PaLM: Scaling Language Modeling with Pathways

    Authors: Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, et al. (42 additional authors not shown)

    Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Tran…

    Submitted 5 October, 2022; v1 submitted 5 April, 2022; originally announced April 2022.