
Showing 1–50 of 52 results for author: Sohn, J

  1. arXiv:2408.00359  [pdf, other]

    cs.LG stat.ML

    Memorization Capacity for Additive Fine-Tuning with Small ReLU Networks

    Authors: Jy-yong Sohn, Dohyun Kwon, Seoyeon An, Kangwook Lee

    Abstract: Fine-tuning large pre-trained models is a common practice in machine learning applications, yet its mathematical analysis remains largely unexplored. In this paper, we study fine-tuning through the lens of memorization capacity. Our new measure, the Fine-Tuning Capacity (FTC), is defined as the maximum number of samples a neural network can fine-tune, or equivalently, as the minimum number of neur…

    Submitted 19 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

    Comments: 10 pages, 9 figures, UAI 2024

  2. arXiv:2406.16521  [pdf, other]

    cs.CL cs.AI

    Carrot and Stick: Inducing Self-Motivation with Positive & Negative Feedback

    Authors: Jimin Sohn, Jeihee Cho, Junyong Lee, Songmu Heo, Ji-Eun Han, David R. Mortensen

    Abstract: Positive thinking is thought to be an important component of self-motivation in various practical fields such as education and the workplace. Previous work, including sentiment transfer and positive reframing, has focused on the positive side of language. However, self-motivation that drives people to reach their goals has not yet been studied from a computational perspective. Moreover, negative f…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 10 pages, 8 figures

  3. arXiv:2406.16030  [pdf, other]

    cs.CL cs.AI

    Zero-Shot Cross-Lingual NER Using Phonemic Representations for Low-Resource Languages

    Authors: Jimin Sohn, Haeji Jung, Alex Cheng, Jooeon Kang, Yilin Du, David R. Mortensen

    Abstract: Existing zero-shot cross-lingual NER approaches require substantial prior knowledge of the target language, which is impractical for low-resource languages. In this paper, we propose a novel approach to NER using phonemic representation based on the International Phonetic Alphabet (IPA) to bridge the gap between representations of different languages. Our experiments show that our method significa…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures, 5 tables

  4. arXiv:2405.16155  [pdf, other]

    cs.CL

    Improving Multi-lingual Alignment Through Soft Contrastive Learning

    Authors: Minsu Park, Seyeon Choi, Chanyeol Choi, Jun-Seong Kim, Jy-yong Sohn

    Abstract: Making decent multi-lingual sentence representations is critical to achieving high performance in cross-lingual downstream tasks. In this work, we propose a novel method to align multi-lingual embeddings based on the similarity of sentences measured by a pre-trained mono-lingual embedding model. Given translation sentence pairs, we train a multi-lingual model in a way that the similarity between cr…

    Submitted 28 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: 8 pages, 1 figure, Accepted at NAACL SRW 2024

  5. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  6. arXiv:2404.00376  [pdf, other]

    cs.CL

    Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks

    Authors: Hyunjae Kim, Hyeon Hwang, Jiwoo Lee, Sihyeon Park, Dain Kim, Taewhoo Lee, Chanwoong Yoon, Jiwoong Sohn, Donghee Choi, Jaewoo Kang

    Abstract: While recent advancements in commercial large language models (LM) have shown promising results in medical tasks, their closed-source nature poses significant privacy and security concerns, hindering their widespread use in the medical field. Despite efforts to create open-source models, their limited parameters often result in insufficient multi-step reasoning capabilities required for solving co…

    Submitted 30 June, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: Added new LLaMA-3-based models and experiments on NEJM case challenges

  7. arXiv:2403.17428  [pdf, other]

    cs.AI cs.CL

    Aligning Large Language Models for Enhancing Psychiatric Interviews through Symptom Delineation and Summarization

    Authors: Jae-hee So, Joonhwan Chang, Eunji Kim, Junho Na, JiYeon Choi, Jy-yong Sohn, Byung-Hoon Kim, Sang Hui Chu

    Abstract: Recent advancements in Large Language Models (LLMs) have accelerated their usage in various domains. Given the fact that psychiatric interviews are goal-oriented and structured dialogues between the professional interviewer and the interviewee, it is one of the most underexplored areas where LLMs can contribute substantial value. Here, we explore the use of LLMs for enhancing psychiatric interview…

    Submitted 26 March, 2024; originally announced March 2024.

  8. arXiv:2403.14255  [pdf, other]

    cs.CL cs.LG

    ERD: A Framework for Improving LLM Reasoning for Cognitive Distortion Classification

    Authors: Sehee Lim, Yejin Kim, Chi-Hyun Choi, Jy-yong Sohn, Byung-Hoon Kim

    Abstract: Improving the accessibility of psychotherapy with the aid of Large Language Models (LLMs) has garnered significant attention in recent years. Recognizing cognitive distortions from the interviewee's utterances can be an essential part of psychotherapy, especially for cognitive behavioral therapy. In this paper, we propose ERD, which improves LLM-based cognitive distortion classification performa…

    Submitted 21 March, 2024; originally announced March 2024.

  9. arXiv:2402.17097  [pdf, other]

    cs.CL cs.AI

    Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses

    Authors: Juyeon Kim, Jeongeun Lee, Yoonho Chang, Chanyeol Choi, Junseong Kim, Jy-yong Sohn

    Abstract: Mitigating hallucination issues is a key challenge that must be overcome to reliably deploy large language models (LLMs) in real-world scenarios. Recently, various methods have been proposed to detect and revise factual errors in LLM-generated texts, in order to reduce hallucination. In this paper, we propose Re-Ex, a method for post-editing LLM-generated responses. Re-Ex introduces a novel reason…

    Submitted 12 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: 16 pages

  10. arXiv:2402.14279  [pdf, other]

    cs.CL cs.AI

    Mitigating the Linguistic Gap with Phonemic Representations for Robust Multilingual Language Understanding

    Authors: Haeji Jung, Changdae Oh, Jooeon Kang, Jimin Sohn, Kyungwoo Song, Jinkyu Kim, David R. Mortensen

    Abstract: Approaches to improving multilingual language understanding often require multiple languages during the training phase, rely on complicated training techniques, and -- importantly -- struggle with significant performance gaps between high-resource and low-resource languages. We hypothesize that the performance gaps between languages are affected by linguistic gaps between those languages and provi…

    Submitted 21 February, 2024; originally announced February 2024.

  11. arXiv:2402.12613  [pdf, other]

    cs.LG

    Analysis of Using Sigmoid Loss for Contrastive Learning

    Authors: Chungpa Lee, Joonhwan Chang, Jy-yong Sohn

    Abstract: Contrastive learning has been a prominent branch of self-supervised learning for several years. In particular, CLIP, which applies contrastive learning to large sets of captioned images, has garnered significant attention. Recently, SigLIP, a variant of CLIP, has been proposed, which uses the sigmoid loss instead of the standard InfoNCE loss. SigLIP achieves performance comparable to CLIP i…

    Submitted 19 February, 2024; originally announced February 2024.

    Journal ref: Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024, Valencia, Spain

  12. arXiv:2402.10645  [pdf, other]

    cs.CL cs.AI

    Can Separators Improve Chain-of-Thought Prompting?

    Authors: Yoonjeong Park, Hyunjin Kim, Chanyeol Choi, Junseong Kim, Jy-yong Sohn

    Abstract: Chain-of-thought (CoT) prompting is a simple and effective method for improving the reasoning capabilities of Large Language Models (LLMs). The basic idea of CoT is to let LLMs break down their thought processes step-by-step by putting exemplars in the input prompt. However, the densely structured prompt exemplars of CoT may cause cognitive overload in LLMs. Inspired by human cognition, we int…

    Submitted 7 July, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  13. arXiv:2402.00234  [pdf, other]

    cs.HC cs.AI cs.CL cs.LG

    Are Generative AI systems Capable of Supporting Information Needs of Patients?

    Authors: Shreya Rajagopal, Subhashis Hazarika, Sookyung Kim, Yan-ming Chiou, Jae Ho Sohn, Hari Subramonyam, Shiwali Mohan

    Abstract: Patients managing a complex illness such as cancer face a complex information challenge where they not only must learn about their illness but also how to manage it. Close interaction with healthcare experts (radiologists, oncologists) can improve patient learning and thereby, their disease outcome. However, this approach is resource intensive and takes expert time away from other critical tasks.…

    Submitted 31 January, 2024; originally announced February 2024.

  14. arXiv:2401.15269  [pdf, other]

    cs.CL cs.AI cs.IR

    Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models

    Authors: Minbyul Jeong, Jiwoong Sohn, Mujeen Sung, Jaewoo Kang

    Abstract: Recent proprietary large language models (LLMs), such as GPT-4, have achieved a milestone in tackling diverse challenges in the biomedical domain, ranging from multiple-choice questions to long-form generations. To address challenges that still cannot be handled with the encoded knowledge of LLMs, various retrieval-augmented generation (RAG) methods have been developed by searching documents from…

    Submitted 17 June, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: ISMB 2024

  15. arXiv:2311.05866  [pdf, other]

    stat.ML cs.LG

    Fair Supervised Learning with A Simple Random Sampler of Sensitive Attributes

    Authors: Jinwon Sohn, Qifan Song, Guang Lin

    Abstract: As the data-driven decision process becomes dominating for industrial applications, fairness-aware machine learning arouses great attention in various areas. This work proposes fairness penalties learned by neural networks with a simple random sampler of sensitive attributes for non-discriminatory supervised learning. In contrast to many existing works that critically rely on the discreteness of s…

    Submitted 9 March, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

  16. arXiv:2310.03353  [pdf, other]

    cs.AI cs.LG

    Deep Geometric Learning with Monotonicity Constraints for Alzheimer's Disease Progression

    Authors: Seungwoo Jeong, Wonsik Jung, Junghyo Sohn, Heung-Il Suk

    Abstract: Alzheimer's disease (AD) is a devastating neurodegenerative condition that precedes progressive and irreversible dementia; thus, predicting its progression over time is vital for clinical diagnosis and treatment. Numerous studies have implemented structural magnetic resonance imaging (MRI) to model AD progression, focusing on three integral aspects: (i) temporal variability, (ii) incomplete observ…

    Submitted 5 October, 2023; originally announced October 2023.

  17. arXiv:2307.05906  [pdf, other]

    cs.LG

    Mini-Batch Optimization of Contrastive Loss

    Authors: Jaewoong Cho, Kartik Sreenivasan, Keon Lee, Kyunghoo Mun, Soheun Yi, Jeong-Gwan Lee, Anna Lee, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee

    Abstract: Contrastive learning has gained significant attention as a method for self-supervised learning. The contrastive loss function ensures that embeddings of positive sample pairs (e.g., different samples from the same class or different views of the same object) are similar, while embeddings of negative pairs are dissimilar. Practical constraints such as large memory requirements make it challenging t…

    Submitted 12 July, 2023; originally announced July 2023.

  18. arXiv:2306.10193  [pdf, other]

    cs.CL cs.LG

    Conformal Language Modeling

    Authors: Victor Quach, Adam Fisch, Tal Schuster, Adam Yala, Jae Ho Sohn, Tommi S. Jaakkola, Regina Barzilay

    Abstract: We propose a novel approach to conformal prediction for generative language models (LMs). Standard conformal prediction produces prediction sets -- in place of single predictions -- that have rigorous, statistical performance guarantees. LM responses are typically sampled from the model's predicted distribution over the large, combinatorial output space of natural language. Translating this proces…

    Submitted 1 June, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: ICLR 2024

  19. arXiv:2305.08592  [pdf, other]

    cs.SE

    Time-based Repair for Asynchronous Wait Flaky Tests in Web Testing

    Authors: Yu Pei, Jeongju Sohn, Sarra Habchi, Mike Papadakis

    Abstract: Asynchronous waits are one of the most prevalent root causes of flaky tests and a major time-influential factor of web application testing. To investigate the characteristics of asynchronous wait flaky tests and their fixes in web testing, we build a dataset of 49 reproducible flaky tests, from 26 open-source projects, caused by asynchronous waits, along with their corresponding developer-written…

    Submitted 19 May, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

  20. arXiv:2305.03609  [pdf, other]

    stat.ML cs.CG cs.CR cs.LG math.AT

    Differentially Private Topological Data Analysis

    Authors: Taegyu Kang, Sehwan Kim, Jinwon Sohn, Jordan Awan

    Abstract: This paper is the first to attempt differentially private (DP) topological data analysis (TDA), producing near-optimal private persistence diagrams. We analyze the sensitivity of persistence diagrams in terms of the bottleneck distance, and we show that the commonly used Čech complex has sensitivity that does not decrease as the sample size $n$ increases. This makes it challenging for the persiste…

    Submitted 3 November, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: 23 pages before references and appendices, 42 pages total, 8 figures

  21. arXiv:2301.13196  [pdf, other]

    cs.LG cs.AI

    Looped Transformers as Programmable Computers

    Authors: Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos

    Abstract: We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. Our input sequence acts as a punchcard, consisting of instructions and memory for data read/writes. We demonstrate that a constant number of encoder layers can emulate basic computing blocks, including embedding edit operations, non-linear functions, fu…

    Submitted 30 January, 2023; originally announced January 2023.

  22. arXiv:2212.08311  [pdf, other]

    cs.CV cs.LG

    Can We Find Strong Lottery Tickets in Generative Models?

    Authors: Sangyeop Yeo, Yoojin Jang, Jy-yong Sohn, Dongyoon Han, Jaejun Yoo

    Abstract: Yes. In this paper, we investigate strong lottery tickets in generative models, the subnetworks that achieve good generative performance without any weight update. Neural network pruning is considered the main cornerstone of model compression for reducing the costs of computation and memory. Unfortunately, pruning a generative model has not been extensively explored, and all existing pruning algor…

    Submitted 16 December, 2022; originally announced December 2022.

  23. arXiv:2210.06732  [pdf, other]

    cs.LG cs.CY

    Equal Improvability: A New Fairness Notion Considering the Long-term Impact

    Authors: Ozgur Guldogan, Yuchen Zeng, Jy-yong Sohn, Ramtin Pedarsani, Kangwook Lee

    Abstract: Devising a fair classifier that does not discriminate against different groups is an important problem in machine learning. Although researchers have proposed various ways of defining group fairness, most of them only focused on the immediate fairness, ignoring the long-term impact of a fair classifier under the dynamic scenario where each individual can improve its feature over time. Such dynamic…

    Submitted 9 April, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Codes are available in a GitHub repository, see https://github.com/guldoganozgur/ei_fairness. ICLR 2023 Poster. 31 pages, 10 figures, 6 tables

  24. arXiv:2209.05635  [pdf, other]

    cs.LG cs.AI

    Bending the Future: Autoregressive Modeling of Temporal Knowledge Graphs in Curvature-Variable Hyperbolic Spaces

    Authors: Jihoon Sohn, Mingyu Derek Ma, Muhao Chen

    Abstract: Recently, there has been increasing scholarly interest in time-varying knowledge graphs, or temporal knowledge graphs (TKG). Previous research suggests diverse approaches to TKG reasoning that use historical information. However, less attention has been given to the hierarchies within such information at different timestamps. Given that TKG is a sequence of knowledge graphs based on time, the chronol…

    Submitted 12 September, 2022; originally announced September 2022.

    Comments: 11 pages, 2 figures, In the 4th Conference on Automated Knowledge Base Construction (AKBC) 2022

  25. arXiv:2207.10143  [pdf, other]

    cs.SE

    What Made This Test Flake? Pinpointing Classes Responsible for Test Flakiness

    Authors: Sarra Habchi, Guillaume Haben, Jeongju Sohn, Adriano Franci, Mike Papadakis, Maxime Cordy, Yves Le Traon

    Abstract: Flaky tests are defined as tests that manifest non-deterministic behaviour by passing and failing intermittently for the same version of the code. These tests cripple continuous integration with false alerts that waste developers' time and break their trust in regression testing. To mitigate the effects of flakiness, both researchers and industrial experts proposed strategies and tools to detect a…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted at the 38th IEEE International Conference on Software Maintenance and Evolution (ICSME)

  26. arXiv:2206.06565  [pdf, other]

    cs.LG cs.CL

    LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks

    Authors: Tuan Dinh, Yuchen Zeng, Ruisu Zhang, Ziqian Lin, Michael Gira, Shashank Rajput, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee

    Abstract: Fine-tuning pretrained language models (LMs) without making any architectural changes has become a norm for learning various language downstream tasks. However, for non-language downstream tasks, a common practice is to employ task-specific designs for input, output layers, and loss functions. For instance, it is possible to fine-tune an LM into an MNIST classifier by replacing the word embedding…

    Submitted 30 October, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

    Comments: Accepted at NeurIPS 2022

  27. arXiv:2205.11616  [pdf, other]

    cs.CL cs.LG

    Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment

    Authors: Tuan Dinh, Jy-yong Sohn, Shashank Rajput, Timothy Ossowski, Yifei Ming, Junjie Hu, Dimitris Papailiopoulos, Kangwook Lee

    Abstract: Word translation without parallel corpora has become feasible, rivaling the performance of supervised methods. Recent findings have shown that the accuracy and robustness of unsupervised word translation (UWT) can be improved by making use of visual observations, which are universal representations across languages. In this work, we investigate the potential of using not only visual observations b…

    Submitted 7 November, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings)

  28. arXiv:2204.05472  [pdf, other]

    cs.LG cs.CR cs.CY

    Breaking Fair Binary Classification with Optimal Flipping Attacks

    Authors: Changhun Jo, Jy-yong Sohn, Kangwook Lee

    Abstract: Minimizing risk with fairness constraints is one of the popular approaches to learning a fair classifier. Recent works showed that this approach yields an unfair classifier if the training set is corrupted. In this work, we study the minimum amount of data corruption required for a successful flipping attack. First, we find lower/upper bounds on this quantity and show that these bounds are tight w…

    Submitted 9 May, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

  29. arXiv:2203.11343  [pdf, ps, other]

    cs.SE

    Using Evolutionary Coupling to Establish Relevance Links Between Tests and Code Units. A case study on fault localization

    Authors: Jeongju Sohn, Mike Papadakis

    Abstract: Many software engineering techniques, such as fault localization, operate based on relevance relationships between tests and code. These relationships are often inferred through the use of dynamic test execution information (test execution traces) that approximate the link between relevant code units and asserted, by the tests, program behaviour. Unfortunately, in practice dynamic information is n…

    Submitted 21 March, 2022; originally announced March 2022.

  30. arXiv:2202.12002  [pdf, other]

    cs.LG cs.AI cs.CV

    Rare Gems: Finding Lottery Tickets at Initialization

    Authors: Kartik Sreenivasan, Jy-yong Sohn, Liu Yang, Matthew Grinde, Alliot Nagle, Hongyi Wang, Eric Xing, Kangwook Lee, Dimitris Papailiopoulos

    Abstract: Large neural networks can be pruned to a small fraction of their original size, with little loss in accuracy, by following a time-consuming "train, prune, re-train" approach. Frankle & Carbin conjecture that we can avoid this by training "lottery tickets", i.e., special sparse subnetworks found at initialization, that can be trained to high accuracy. However, a subsequent line of work by Frankle e…

    Submitted 2 June, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

  31. arXiv:2201.02354  [pdf, other]

    cs.LG

    GenLabel: Mixup Relabeling using Generative Models

    Authors: Jy-yong Sohn, Liang Shang, Hongxu Chen, Jaekyun Moon, Dimitris Papailiopoulos, Kangwook Lee

    Abstract: Mixup is a data augmentation method that generates new data points by mixing a pair of input data. While mixup generally improves the prediction performance, it sometimes degrades the performance. In this paper, we first identify the main causes of this phenomenon by theoretically and empirically analyzing the mixup algorithm. To resolve this, we propose GenLabel, a simple yet effective relabeling…

    Submitted 7 January, 2022; originally announced January 2022.

  32. arXiv:2111.05096  [pdf, other]

    cs.CR

    E-voting System Using Homomorphic Encryption and Blockchain Technology to Encrypt Voter Data

    Authors: Hyunyeon Kim, Kyung Eun Kim, Soohan Park, Jongsoo Sohn

    Abstract: Homomorphic encryption and blockchain technology are regarded as two significant technologies for improving e-voting systems. In this paper, we suggest a novel e-voting system using homomorphic encryption and blockchain technology that is focused on encrypting voter data. By encrypting voter information rather than cast votes, the system enables various statistical analyses regarding the vote resu…

    Submitted 5 November, 2021; originally announced November 2021.

  33. arXiv:2110.08996  [pdf, other]

    cs.LG cs.AI

    Finding Everything within Random Binary Networks

    Authors: Kartik Sreenivasan, Shashank Rajput, Jy-yong Sohn, Dimitris Papailiopoulos

    Abstract: A recent work by Ramanujan et al. (2020) provides significant empirical evidence that sufficiently overparameterized, random neural networks contain untrained subnetworks that achieve state-of-the-art accuracy on several predictive tasks. A follow-up line of theoretical work provides justification of these findings by proving that slightly overparameterized neural networks, with commonly used cont…

    Submitted 22 October, 2021; v1 submitted 17 October, 2021; originally announced October 2021.

  34. arXiv:2104.04952  [pdf, other]

    cs.CV cs.AI

    Fine-Grained Attention for Weakly Supervised Object Localization

    Authors: Junghyo Sohn, Eunjin Jeon, Wonsik Jung, Eunsong Kang, Heung-Il Suk

    Abstract: Although recent advances in deep learning accelerated an improvement in a weakly supervised object localization (WSOL) task, there are still challenges to identify the entire body of an object, rather than only discriminative parts. In this paper, we propose a novel residual fine-grained attention (RFGA) module that autonomously excites the less activated regions of an object by utilizing informat…

    Submitted 11 April, 2021; originally announced April 2021.

    Comments: 16 pages, 11 figures

  35. arXiv:2102.13267  [pdf, other]

    cs.PL cs.LG

    LazyTensor: combining eager execution with domain-specific compilers

    Authors: Alex Suhan, Davide Libenzi, Ailing Zhang, Parker Schuh, Brennan Saeta, Jie Young Sohn, Denys Shabalin

    Abstract: Domain-specific optimizing compilers have demonstrated significant performance and portability benefits, but require programs to be represented in their specialized IRs. Existing frontends to these compilers suffer from the "language subset problem" where some host language features are unsupported in the subset of the user's program that interacts with the domain-specific compiler. By contrast, d…

    Submitted 25 February, 2021; originally announced February 2021.

  36. arXiv:2012.05433  [pdf, other]

    cs.LG cs.CR cs.IT

    Communication-Computation Efficient Secure Aggregation for Federated Learning

    Authors: Beongjun Choi, Jy-yong Sohn, Dong-Jun Han, Jaekyun Moon

    Abstract: Federated learning has been spotlighted as a way to train neural networks using distributed data with no need for individual nodes to share data. Unfortunately, it has also been shown that adversaries may be able to extract local data contents off model parameters transmitted during federated learning. A recent solution based on the secure aggregation primitive enabled privacy-preserving federated…

    Submitted 12 July, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

  37. arXiv:2007.05084  [pdf, other]

    cs.LG cs.CR cs.DC stat.ML

    Attack of the Tails: Yes, You Really Can Backdoor Federated Learning

    Authors: Hongyi Wang, Kartik Sreenivasan, Shashank Rajput, Harit Vishwakarma, Saurabh Agarwal, Jy-yong Sohn, Kangwook Lee, Dimitris Papailiopoulos

    Abstract: Due to its decentralized nature, Federated Learning (FL) lends itself to adversarial attacks in the form of backdoors during training. The goal of a backdoor is to corrupt the performance of the trained model on specific sub-tasks (e.g., by classifying green cars as frogs). A range of FL backdoor attacks have been introduced in the literature, but also methods to defend against them, and it is cur…

    Submitted 9 July, 2020; originally announced July 2020.

  38. Arachne: Search Based Repair of Deep Neural Networks

    Authors: Jeongju Sohn, Sungmin Kang, Shin Yoo

    Abstract: The rapid and widespread adoption of Deep Neural Networks (DNNs) has called for ways to test their behaviour, and many testing approaches have successfully revealed misbehaviour of DNNs. However, it is relatively unclear what one can do to correct such behaviour after revelation, as retraining involves costly data collection and does not guarantee to fix the underlying issue. This paper introduces…

    Submitted 16 August, 2022; v1 submitted 28 December, 2019; originally announced December 2019.

  39. arXiv:1910.06093  [pdf, other]

    cs.IT cs.DC cs.LG

    Election Coding for Distributed Learning: Protecting SignSGD against Byzantine Attacks

    Authors: Jy-yong Sohn, Dong-Jun Han, Beongjun Choi, Jaekyun Moon

    Abstract: Recent advances in large-scale distributed learning algorithms have enabled communication-efficient training via SignSGD. Unfortunately, a major issue continues to plague distributed learning: namely, Byzantine failures may incur serious degradation in learning accuracy. This paper proposes Election Coding, a coding-theoretic framework to guarantee Byzantine-robustness for SignSGD with Majority Vo…

    Submitted 24 October, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

    Comments: Accepted at NeurIPS 2020

  40. arXiv:1909.06326  [pdf, other]

    q-bio.QM cs.CV cs.LG eess.IV physics.med-ph

    Automatic Hip Fracture Identification and Functional Subclassification with Deep Learning

    Authors: Justin D Krogue, Kaiyang V Cheng, Kevin M Hwang, Paul Toogood, Eric G Meinberg, Erik J Geiger, Musa Zaid, Kevin C McGill, Rina Patel, Jae Ho Sohn, Alexandra Wright, Bryan F Darger, Kevin A Padrez, Eugene Ozhinsky, Sharmila Majumdar, Valentina Pedoia

    Abstract: Purpose: Hip fractures are a common cause of morbidity and mortality. Automatic identification and classification of hip fractures using deep learning may improve outcomes by reducing diagnostic errors and decreasing time to operation. Methods: Hip and pelvic radiographs from 1118 studies were reviewed and 3034 hips were labeled via bounding boxes and classified as normal, displaced femoral neck f…

    Submitted 10 September, 2019; originally announced September 2019.

    Comments: Presented at Orthopaedic Research Society, Austin, TX, Feb 2, 2019, currently in submission for publication

  41. arXiv:1901.05162  [pdf, other]

    cs.IT

    Coded Matrix Multiplication on a Group-Based Model

    Authors: Muah Kim, Jy-yong Sohn, Jaekyun Moon

    Abstract: Coded distributed computing has been considered as a promising technique which makes large-scale systems robust to the "straggler" workers. Yet, practical system models for distributed computing have not been available that reflect the clustered or grouped structure of real-world computing servers. Neither the large variations in the computing power and bandwidth capabilities across different serv…

    Submitted 16 January, 2019; originally announced January 2019.

    Comments: 9 pages, submitted to ISIT 2019

  42. arXiv:1901.03610  [pdf, other]

    cs.IT

    Coded Distributed Computing over Packet Erasure Channels

    Authors: Dong-Jun Han, Jy-yong Sohn, Jaekyun Moon

    Abstract: Coded computation is a framework that provides redundancy in distributed computing systems to speed up large-scale tasks. Although most existing works assume error-free scenarios in a master-worker setup, link failures are common in current wired/wireless networks. In this paper, we consider the straggler problem in coded distributed computing with link failures, by modeling the links betwe…

    Submitted 11 January, 2019; originally announced January 2019.

    Comments: 12 pages

  43. Joint Correction of Attenuation and Scatter Using Deep Convolutional Neural Networks (DCNN) for Time-of-Flight PET

    Authors: Jaewon Yang, Dookun Park, Jae Ho Sohn, Zhen Jane Wang, Grant T. Gullberg, Youngho Seo

    Abstract: Deep convolutional neural networks (DCNN) have demonstrated their capability to convert MR images to pseudo CT for PET attenuation correction in PET/MRI. Conventionally, attenuated events are corrected in sinogram space using attenuation maps derived from CT or MR-derived pseudo CT. Separately, scattered events are iteratively estimated by a 3D model-based simulation using down-sampled attenuation an…

    Submitted 28 November, 2018; originally announced November 2018.

    Comments: 4 pages, 7 figures, IEEE MIC 2018 conference

  44. arXiv:1801.04686  [pdf, other]

    cs.DC cs.IT

    Hierarchical Coding for Distributed Computing

    Authors: Hyegyeong Park, Kangwook Lee, Jy-yong Sohn, Changho Suh, Jaekyun Moon

    Abstract: Coding for distributed computing supports low-latency computation by relieving the burden of straggling workers. While most existing works assume a simple master-worker model, we consider a hierarchical computational structure consisting of groups of workers, motivated by the need to reflect the architectures of real-world distributed computing systems. In this work, we propose a hierarchical codi…

    Submitted 15 January, 2018; originally announced January 2018.

    Comments: 7 pages, part of the paper is submitted to ISIT2018

  45. arXiv:1801.02287  [pdf, other]

    cs.IT

    Explicit Constructions of MBR and MSR Codes for Clustered Distributed Storage

    Authors: Jy-yong Sohn, Beongjun Choi, Jaekyun Moon

    Abstract: This paper considers capacity-achieving coding for the clustered form of distributed storage that reflects practical storage networks. To reflect the clustered structure with limited cross-cluster communication bandwidths, nodes in the same cluster are set to communicate $β_I$ symbols, while nodes in other clusters can communicate $β_c \leq β_I$ symbols with one another. We provide two types of ex…

    Submitted 2 August, 2019; v1 submitted 7 January, 2018; originally announced January 2018.

    Comments: 25 pages, submitted to IEEE Transactions on Information Theory

  46. arXiv:1801.02014  [pdf, other]

    cs.IT

    A Class of MSR Codes for Clustered Distributed Storage

    Authors: Jy-yong Sohn, Beongjun Choi, Jaekyun Moon

    Abstract: Clustered distributed storage models real data centers where intra- and cross-cluster repair bandwidths are different. In this paper, exact-repair minimum-storage-regenerating (MSR) codes achieving the capacity of clustered distributed storage are designed. The focus is on two cases: $ε=0$ and $ε=1/(n-k)$, where $ε$ is the ratio of the available cross- and intra-cluster repair bandwidths, $n$ is the…

    Submitted 6 January, 2018; originally announced January 2018.

    Comments: 9 pages, a part of this paper is submitted to IEEE ISIT2018

  47. arXiv:1710.02821  [pdf, other]

    cs.IT

    Capacity of Clustered Distributed Storage

    Authors: Jy-yong Sohn, Beongjun Choi, Sung Whan Yoon, Jaekyun Moon

    Abstract: A new system model reflecting the clustered structure of distributed storage is suggested to investigate the interplay between storage overhead and repair bandwidth as storage node failures occur. Large data centers with multiple racks/disks or local networks of storage devices (e.g., sensor networks) are good applications of the suggested clustered model. In realistic scenarios involving clustered stor…

    Submitted 1 May, 2018; v1 submitted 8 October, 2017; originally announced October 2017.

    Comments: To appear in IEEE Transactions on Information Theory

  48. arXiv:1710.02811  [pdf, other]

    cs.IT

    On Reusing Pilots Among Interfering Cells in Massive MIMO

    Authors: Jy-yong Sohn, Sung Whan Yoon, Jaekyun Moon

    Abstract: Pilot contamination, caused by the reuse of pilots among interfering cells, remains a significant obstacle that limits the performance of massive multi-input multi-output antenna systems. To handle this problem, less aggressive reuse of pilots involving allocation of additional pilots for interfering users is closely examined in this paper. Hierarchical pilot reuse methods are proposed, which e…

    Submitted 8 October, 2017; originally announced October 2017.

    Comments: 13 pages, to appear in IEEE Transactions on Wireless Communications

  49. arXiv:1705.01061  [pdf, other]

    cs.IT

    Pilot Reuse Strategy Maximizing the Weighted-Sum-Rate in Massive MIMO Systems

    Authors: Jy-yong Sohn, Sung Whan Yoon, Jaekyun Moon

    Abstract: Pilot reuse in multi-cell massive multi-input multi-output (MIMO) systems is investigated where user groups with different priorities exist. A recent investigation of pilot reuse has revealed that when the ratio of the coherent time interval to the number of users is reasonably high, it is beneficial not to fully reuse pilots from interfering cells. This work finds the optimum pilot assignment strate…

    Submitted 2 May, 2017; originally announced May 2017.

    Comments: 13 pages, to appear in IEEE Journal on Selected Areas in Communications 2017

  50. arXiv:1702.07498  [pdf, other]

    cs.IT

    Secure Clustered Distributed Storage Against Eavesdroppers

    Authors: Beongjun Choi, Jy-yong Sohn, Sung Whan Yoon, Jaekyun Moon

    Abstract: This paper considers the security issue of practical distributed storage systems (DSSs) which consist of multiple clusters of storage nodes. Noticing that actual storage nodes constituting a DSS are distributed in multiple clusters, two novel eavesdropper models - the node-restricted model and the cluster-restricted model - are suggested which reflect the clustered nature of DSSs. In the node-rest…

    Submitted 24 February, 2017; originally announced February 2017.

    Comments: 6 pages, accepted at IEEE ICC 2017