Search | arXiv e-print repository

Memorization Capacity for Additive Fine-Tuning with Small ReLU Networks

Authors: Jy-yong Sohn, Dohyun Kwon, Seoyeon An, Kangwook Lee

Abstract: Fine-tuning large pre-trained models is a common practice in machine learning applications, yet its mathematical analysis remains largely unexplored. In this paper, we study fine-tuning through the lens of memorization capacity. Our new measure, the Fine-Tuning Capacity (FTC), is defined as the maximum number of samples a neural network can fine-tune, or equivalently, as the minimum number of neur… ▽ More Fine-tuning large pre-trained models is a common practice in machine learning applications, yet its mathematical analysis remains largely unexplored. In this paper, we study fine-tuning through the lens of memorization capacity. Our new measure, the Fine-Tuning Capacity (FTC), is defined as the maximum number of samples a neural network can fine-tune, or equivalently, as the minimum number of neurons ($m$) needed to arbitrarily change $N$ labels among $K$ samples considered in the fine-tuning process. In essence, FTC extends the memorization capacity concept to the fine-tuning scenario. We analyze FTC for the additive fine-tuning scenario where the fine-tuned network is defined as the summation of the frozen pre-trained network $f$ and a neural network $g$ (with $m$ neurons) designed for fine-tuning. When $g$ is a ReLU network with either 2 or 3 layers, we obtain tight upper and lower bounds on FTC; we show that $N$ samples can be fine-tuned with $m=Θ(N)$ neurons for 2-layer networks, and with $m=Θ(\sqrt{N})$ neurons for 3-layer networks, no matter how large $K$ is. Our results recover the known memorization capacity results when $N = K$ as a special case. △ Less

Submitted 19 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

Comments: 10 pages, 9 figures, UAI 2024

arXiv:2406.16521 [pdf, other]

Carrot and Stick: Inducing Self-Motivation with Positive & Negative Feedback

Authors: Jimin Sohn, Jeihee Cho, Junyong Lee, Songmu Heo, Ji-Eun Han, David R. Mortensen

Abstract: Positive thinking is thought to be an important component of self-motivation in various practical fields such as education and the workplace. Previous work, including sentiment transfer and positive reframing, has focused on the positive side of language. However, self-motivation that drives people to reach their goals has not yet been studied from a computational perspective. Moreover, negative f… ▽ More Positive thinking is thought to be an important component of self-motivation in various practical fields such as education and the workplace. Previous work, including sentiment transfer and positive reframing, has focused on the positive side of language. However, self-motivation that drives people to reach their goals has not yet been studied from a computational perspective. Moreover, negative feedback has not yet been explored, even though positive and negative feedback are both necessary to grow self-motivation. To facilitate self-motivation, we propose CArrot and STICk (CASTIC) dataset, consisting of 12,590 sentences with 5 different strategies for enhancing self-motivation. Our data and code are publicly available at here. △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: 10 pages, 8 figures

arXiv:2406.16030 [pdf, other]

Zero-Shot Cross-Lingual NER Using Phonemic Representations for Low-Resource Languages

Authors: Jimin Sohn, Haeji Jung, Alex Cheng, Jooeon Kang, Yilin Du, David R. Mortensen

Abstract: Existing zero-shot cross-lingual NER approaches require substantial prior knowledge of the target language, which is impractical for low-resource languages. In this paper, we propose a novel approach to NER using phonemic representation based on the International Phonetic Alphabet (IPA) to bridge the gap between representations of different languages. Our experiments show that our method significa… ▽ More Existing zero-shot cross-lingual NER approaches require substantial prior knowledge of the target language, which is impractical for low-resource languages. In this paper, we propose a novel approach to NER using phonemic representation based on the International Phonetic Alphabet (IPA) to bridge the gap between representations of different languages. Our experiments show that our method significantly outperforms baseline models in extremely low-resource languages, with the highest average F-1 score (46.38%) and lowest standard deviation (12.67), particularly demonstrating its robustness with non-Latin scripts. △ Less

Submitted 23 June, 2024; originally announced June 2024.

Comments: 7 pages, 5 figures, 5 tables

arXiv:2405.16155 [pdf, other]

Improving Multi-lingual Alignment Through Soft Contrastive Learning

Authors: Minsu Park, Seyeon Choi, Chanyeol Choi, Jun-Seong Kim, Jy-yong Sohn

Abstract: Making decent multi-lingual sentence representations is critical to achieve high performances in cross-lingual downstream tasks. In this work, we propose a novel method to align multi-lingual embeddings based on the similarity of sentences measured by a pre-trained mono-lingual embedding model. Given translation sentence pairs, we train a multi-lingual model in a way that the similarity between cr… ▽ More Making decent multi-lingual sentence representations is critical to achieve high performances in cross-lingual downstream tasks. In this work, we propose a novel method to align multi-lingual embeddings based on the similarity of sentences measured by a pre-trained mono-lingual embedding model. Given translation sentence pairs, we train a multi-lingual model in a way that the similarity between cross-lingual embeddings follows the similarity of sentences measured at the mono-lingual teacher model. Our method can be considered as contrastive learning with soft labels defined as the similarity between sentences. Our experimental results on five languages show that our contrastive loss with soft labels far outperforms conventional contrastive loss with hard labels in various benchmarks for bitext mining tasks and STS tasks. In addition, our method outperforms existing multi-lingual embeddings including LaBSE, for Tatoeba dataset. The code is available at https://github.com/YAI12xLinq-B/IMASCL △ Less

Submitted 28 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

Comments: 8 pages, 1 figures, Accepted at NAACL SRW 2024

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t… ▽ More We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs. △ Less

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.00376 [pdf, other]

Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks

Authors: Hyunjae Kim, Hyeon Hwang, Jiwoo Lee, Sihyeon Park, Dain Kim, Taewhoo Lee, Chanwoong Yoon, Jiwoong Sohn, Donghee Choi, Jaewoo Kang

Abstract: While recent advancements in commercial large language models (LM) have shown promising results in medical tasks, their closed-source nature poses significant privacy and security concerns, hindering their widespread use in the medical field. Despite efforts to create open-source models, their limited parameters often result in insufficient multi-step reasoning capabilities required for solving co… ▽ More While recent advancements in commercial large language models (LM) have shown promising results in medical tasks, their closed-source nature poses significant privacy and security concerns, hindering their widespread use in the medical field. Despite efforts to create open-source models, their limited parameters often result in insufficient multi-step reasoning capabilities required for solving complex medical problems. To address this, we introduce Meerkat, a new family of medical AI systems ranging from 7 to 70 billion parameters. The models were trained using our new synthetic dataset consisting of high-quality chain-of-thought reasoning paths sourced from 18 medical textbooks, along with diverse instruction-following datasets. Our systems achieved remarkable accuracy across six medical benchmarks, surpassing the previous best models such as MediTron and BioMistral, and GPT-3.5 by a large margin. Notably, Meerkat-7B surpassed the passing threshold of the United States Medical Licensing Examination (USMLE) for the first time for a 7B-parameter model, while Meerkat-70B outperformed GPT-4 by an average of 1.3%. Additionally, Meerkat-70B correctly diagnosed 21 out of 38 complex clinical cases, outperforming humans' 13.8 and closely matching GPT-4's 21.8. Our systems offered more detailed free-form responses to clinical queries compared to existing small models, approaching the performance level of large commercial models. This significantly narrows the performance gap with large LMs, showcasing its effectiveness in addressing complex medical challenges. △ Less

Submitted 30 June, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

Comments: Added new LLaMA-3-based models and experiments on NEJM case challenges

arXiv:2403.17428 [pdf, other]

Aligning Large Language Models for Enhancing Psychiatric Interviews through Symptom Delineation and Summarization

Authors: Jae-hee So, Joonhwan Chang, Eunji Kim, Junho Na, JiYeon Choi, Jy-yong Sohn, Byung-Hoon Kim, Sang Hui Chu

Abstract: Recent advancements in Large Language Models (LLMs) have accelerated their usage in various domains. Given the fact that psychiatric interviews are goal-oriented and structured dialogues between the professional interviewer and the interviewee, it is one of the most underexplored areas where LLMs can contribute substantial value. Here, we explore the use of LLMs for enhancing psychiatric interview… ▽ More Recent advancements in Large Language Models (LLMs) have accelerated their usage in various domains. Given the fact that psychiatric interviews are goal-oriented and structured dialogues between the professional interviewer and the interviewee, it is one of the most underexplored areas where LLMs can contribute substantial value. Here, we explore the use of LLMs for enhancing psychiatric interviews, by analyzing counseling data from North Korean defectors with traumatic events and mental health issues. Specifically, we investigate whether LLMs can (1) delineate the part of the conversation that suggests psychiatric symptoms and name the symptoms, and (2) summarize stressors and symptoms, based on the interview dialogue transcript. Here, the transcript data was labeled by mental health experts for training and evaluation of LLMs. Our experimental results show that appropriately prompted LLMs can achieve high performance on both the symptom delineation task and the summarization task. This research contributes to the nascent field of applying LLMs to psychiatric interview and demonstrates their potential effectiveness in aiding mental health practitioners. △ Less

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.14255 [pdf, other]

ERD: A Framework for Improving LLM Reasoning for Cognitive Distortion Classification

Authors: Sehee Lim, Yejin Kim, Chi-Hyun Choi, Jy-yong Sohn, Byung-Hoon Kim

Abstract: Improving the accessibility of psychotherapy with the aid of Large Language Models (LLMs) is garnering a significant attention in recent years. Recognizing cognitive distortions from the interviewee's utterances can be an essential part of psychotherapy, especially for cognitive behavioral therapy. In this paper, we propose ERD, which improves LLM-based cognitive distortion classification performa… ▽ More Improving the accessibility of psychotherapy with the aid of Large Language Models (LLMs) is garnering a significant attention in recent years. Recognizing cognitive distortions from the interviewee's utterances can be an essential part of psychotherapy, especially for cognitive behavioral therapy. In this paper, we propose ERD, which improves LLM-based cognitive distortion classification performance with the aid of additional modules of (1) extracting the parts related to cognitive distortion, and (2) debating the reasoning steps by multiple agents. Our experimental results on a public dataset show that ERD improves the multi-class F1 score as well as binary specificity score. Regarding the latter score, it turns out that our method is effective in debiasing the baseline method which has high false positive rate, especially when the summary of multi-agent debate is provided to LLMs. △ Less

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2402.17097 [pdf, other]

Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses

Authors: Juyeon Kim, Jeongeun Lee, Yoonho Chang, Chanyeol Choi, Junseong Kim, Jy-yong Sohn

Abstract: Mitigating hallucination issues is a key challenge that must be overcome to reliably deploy large language models (LLMs) in real-world scenarios. Recently, various methods have been proposed to detect and revise factual errors in LLM-generated texts, in order to reduce hallucination. In this paper, we propose Re-Ex, a method for post-editing LLM-generated responses. Re-Ex introduces a novel reason… ▽ More Mitigating hallucination issues is a key challenge that must be overcome to reliably deploy large language models (LLMs) in real-world scenarios. Recently, various methods have been proposed to detect and revise factual errors in LLM-generated texts, in order to reduce hallucination. In this paper, we propose Re-Ex, a method for post-editing LLM-generated responses. Re-Ex introduces a novel reasoning step dubbed as the factual error explanation step. Re-Ex revises the initial response of LLMs using 3-steps : first, external tools are used to retrieve the evidences of the factual errors in the initial LLM response; next, LLM is instructed to explain the problematic parts of the response based on the gathered evidence; finally, LLM revises the initial response using the explanations provided in the previous step. In addition to the explanation step, Re-Ex also incorporates new prompting techniques to reduce the token count and inference time required for the response revision process. Compared with existing methods including FacTool, CoVE, and RARR, Re-Ex provides better detection and revision performance with less inference time and fewer tokens in multiple benchmarks. △ Less

Submitted 12 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: 16 pages

arXiv:2402.14279 [pdf, other]

Mitigating the Linguistic Gap with Phonemic Representations for Robust Multilingual Language Understanding

Authors: Haeji Jung, Changdae Oh, Jooeon Kang, Jimin Sohn, Kyungwoo Song, Jinkyu Kim, David R. Mortensen

Abstract: Approaches to improving multilingual language understanding often require multiple languages during the training phase, rely on complicated training techniques, and -- importantly -- struggle with significant performance gaps between high-resource and low-resource languages. We hypothesize that the performance gaps between languages are affected by linguistic gaps between those languages and provi… ▽ More Approaches to improving multilingual language understanding often require multiple languages during the training phase, rely on complicated training techniques, and -- importantly -- struggle with significant performance gaps between high-resource and low-resource languages. We hypothesize that the performance gaps between languages are affected by linguistic gaps between those languages and provide a novel solution for robust multilingual language modeling by employing phonemic representations (specifically, using phonemes as input tokens to LMs rather than subwords). We present quantitative evidence from three cross-lingual tasks that demonstrate the effectiveness of phonemic representation, which is further justified by a theoretical analysis of the cross-lingual performance gap. △ Less

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.12613 [pdf, other]

Analysis of Using Sigmoid Loss for Contrastive Learning

Authors: Chungpa Lee, Joonhwan Chang, Jy-yong Sohn

Abstract: Contrastive learning has emerged as a prominent branch of self-supervised learning for several years. Especially, CLIP, which applies contrastive learning to large sets of captioned images, has garnered significant attention. Recently, SigLIP, a variant of CLIP, has been proposed, which uses the sigmoid loss instead of the standard InfoNCE loss. SigLIP achieves the performance comparable to CLIP i… ▽ More Contrastive learning has emerged as a prominent branch of self-supervised learning for several years. Especially, CLIP, which applies contrastive learning to large sets of captioned images, has garnered significant attention. Recently, SigLIP, a variant of CLIP, has been proposed, which uses the sigmoid loss instead of the standard InfoNCE loss. SigLIP achieves the performance comparable to CLIP in a more efficient manner by eliminating the need for a global view. However, theoretical understanding of using the sigmoid loss in contrastive learning is underexplored. In this paper, we provide a theoretical analysis of using the sigmoid loss in contrastive learning, in the perspective of the geometric structure of learned embeddings. First, we propose the double-Constant Embedding Model (CCEM), a framework for parameterizing various well-known embedding structures by a single variable. Interestingly, the proposed CCEM is proven to contain the optimal embedding with respect to the sigmoid loss. Second, we mathematically analyze the optimal embedding minimizing the sigmoid loss for contrastive learning. The optimal embedding ranges from simplex equiangular-tight-frame to antipodal structure, depending on the temperature parameter used in the sigmoid loss. Third, our experimental results on synthetic datasets coincide with the theoretical results on the optimal embedding structures. △ Less

Submitted 19 February, 2024; originally announced February 2024.

Journal ref: Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024, Valencia, Spain

arXiv:2402.10645 [pdf, other]

Can Separators Improve Chain-of-Thought Prompting?

Authors: Yoonjeong Park, Hyunjin Kim, Chanyeol Choi, Junseong Kim, Jy-yong Sohn

Abstract: Chain-of-thought (CoT) prompting is a simple and effective method for improving the reasoning capabilities of Large Language Models (LLMs). The basic idea of CoT is to let LLMs break down their thought processes step-by-step by putting exemplars in the input prompt. However, the densely structured prompt exemplars of CoT may cause the cognitive overload of LLMs. Inspired by human cognition, we int… ▽ More Chain-of-thought (CoT) prompting is a simple and effective method for improving the reasoning capabilities of Large Language Models (LLMs). The basic idea of CoT is to let LLMs break down their thought processes step-by-step by putting exemplars in the input prompt. However, the densely structured prompt exemplars of CoT may cause the cognitive overload of LLMs. Inspired by human cognition, we introduce COT-SEP, a method that strategically employs separators at the end of each exemplar in CoT prompting. These separators are designed to help the LLMs understand their thought processes better while reasoning. Interestingly, it turns out that COT-SEP significantly improves the LLMs' performances on complex reasoning tasks (e.g., GSM8K, AQuA, CSQA), compared with the vanilla CoT, which does not use separators. We also study the effects of the type and the location of separators tested on multiple LLMs, including GPT-3.5-Turbo, GPT-4, and LLaMA-2 7B. △ Less

Submitted 7 July, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.00234 [pdf, other]

Are Generative AI systems Capable of Supporting Information Needs of Patients?

Authors: Shreya Rajagopal, Subhashis Hazarika, Sookyung Kim, Yan-ming Chiou, Jae Ho Sohn, Hari Subramonyam, Shiwali Mohan

Abstract: Patients managing a complex illness such as cancer face a complex information challenge where they not only must learn about their illness but also how to manage it. Close interaction with healthcare experts (radiologists, oncologists) can improve patient learning and thereby, their disease outcome. However, this approach is resource intensive and takes expert time away from other critical tasks.… ▽ More Patients managing a complex illness such as cancer face a complex information challenge where they not only must learn about their illness but also how to manage it. Close interaction with healthcare experts (radiologists, oncologists) can improve patient learning and thereby, their disease outcome. However, this approach is resource intensive and takes expert time away from other critical tasks. Given the recent advancements in Generative AI models aimed at improving the healthcare system, our work investigates whether and how generative visual question answering systems can responsibly support patient information needs in the context of radiology imaging data. We conducted a formative need-finding study in which participants discussed chest computed tomography (CT) scans and associated radiology reports of a fictitious close relative with a cardiothoracic radiologist. Using thematic analysis of the conversation between participants and medical experts, we identified commonly occurring themes across interactions, including clarifying medical terminology, locating the problems mentioned in the report in the scanned image, understanding disease prognosis, discussing the next diagnostic steps, and comparing treatment options. Based on these themes, we evaluated two state-of-the-art generative visual language models against the radiologist's responses. Our results reveal variability in the quality of responses generated by the models across various themes. We highlight the importance of patient-facing generative AI systems to accommodate a diverse range of conversational themes, catering to the real-world informational needs of patients. △ Less

Submitted 31 January, 2024; originally announced February 2024.

arXiv:2401.15269 [pdf, other]

Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models

Authors: Minbyul Jeong, Jiwoong Sohn, Mujeen Sung, Jaewoo Kang

Abstract: Recent proprietary large language models (LLMs), such as GPT-4, have achieved a milestone in tackling diverse challenges in the biomedical domain, ranging from multiple-choice questions to long-form generations. To address challenges that still cannot be handled with the encoded knowledge of LLMs, various retrieval-augmented generation (RAG) methods have been developed by searching documents from… ▽ More Recent proprietary large language models (LLMs), such as GPT-4, have achieved a milestone in tackling diverse challenges in the biomedical domain, ranging from multiple-choice questions to long-form generations. To address challenges that still cannot be handled with the encoded knowledge of LLMs, various retrieval-augmented generation (RAG) methods have been developed by searching documents from the knowledge corpus and appending them unconditionally or selectively to the input of LLMs for generation. However, when applying existing methods to different domain-specific problems, poor generalization becomes apparent, leading to fetching incorrect documents or making inaccurate judgments. In this paper, we introduce Self-BioRAG, a framework reliable for biomedical text that specializes in generating explanations, retrieving domain-specific documents, and self-reflecting generated responses. We utilize 84k filtered biomedical instruction sets to train Self-BioRAG that can assess its generated explanations with customized reflective tokens. Our work proves that domain-specific components, such as a retriever, domain-related document corpus, and instruction sets are necessary for adhering to domain-related instructions. Using three major medical question-answering benchmark datasets, experimental results of Self-BioRAG demonstrate significant performance gains by achieving a 7.2% absolute improvement on average over the state-of-the-art open-foundation model with a parameter size of 7B or less. Overall, we analyze that Self-BioRAG finds the clues in the question, retrieves relevant documents if needed, and understands how to answer with information from retrieved documents and encoded knowledge as a medical expert does. We release our data and code for training our framework components and model weights (7B and 13B) to enhance capabilities in biomedical and clinical domains. △ Less

Submitted 17 June, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

Comments: ISMB 2024

arXiv:2311.05866 [pdf, other]

Fair Supervised Learning with A Simple Random Sampler of Sensitive Attributes

Authors: Jinwon Sohn, Qifan Song, Guang Lin

Abstract: As the data-driven decision process becomes dominating for industrial applications, fairness-aware machine learning arouses great attention in various areas. This work proposes fairness penalties learned by neural networks with a simple random sampler of sensitive attributes for non-discriminatory supervised learning. In contrast to many existing works that critically rely on the discreteness of s… ▽ More As the data-driven decision process becomes dominating for industrial applications, fairness-aware machine learning arouses great attention in various areas. This work proposes fairness penalties learned by neural networks with a simple random sampler of sensitive attributes for non-discriminatory supervised learning. In contrast to many existing works that critically rely on the discreteness of sensitive attributes and response variables, the proposed penalty is able to handle versatile formats of the sensitive attributes, so it is more extensively applicable in practice than many existing algorithms. This penalty enables us to build a computationally efficient group-level in-processing fairness-aware training framework. Empirical evidence shows that our framework enjoys better utility and fairness measures on popular benchmark data sets than competing methods. We also theoretically characterize estimation errors and loss of utility of the proposed neural-penalized risk minimization problem. △ Less

Submitted 9 March, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.03353 [pdf, other]

Deep Geometric Learning with Monotonicity Constraints for Alzheimer's Disease Progression

Authors: Seungwoo Jeong, Wonsik Jung, Junghyo Sohn, Heung-Il Suk

Abstract: Alzheimer's disease (AD) is a devastating neurodegenerative condition that precedes progressive and irreversible dementia; thus, predicting its progression over time is vital for clinical diagnosis and treatment. Numerous studies have implemented structural magnetic resonance imaging (MRI) to model AD progression, focusing on three integral aspects: (i) temporal variability, (ii) incomplete observ… ▽ More Alzheimer's disease (AD) is a devastating neurodegenerative condition that precedes progressive and irreversible dementia; thus, predicting its progression over time is vital for clinical diagnosis and treatment. Numerous studies have implemented structural magnetic resonance imaging (MRI) to model AD progression, focusing on three integral aspects: (i) temporal variability, (ii) incomplete observations, and (iii) temporal geometric characteristics. However, deep learning-based approaches regarding data variability and sparsity have yet to consider inherent geometrical properties sufficiently. The ordinary differential equation-based geometric modeling method (ODE-RGRU) has recently emerged as a promising strategy for modeling time-series data by intertwining a recurrent neural network and an ODE in Riemannian space. Despite its achievements, ODE-RGRU encounters limitations when extrapolating positive definite symmetric metrics from incomplete samples, leading to feature reverse occurrences that are particularly problematic, especially within the clinical facet. Therefore, this study proposes a novel geometric learning approach that models longitudinal MRI biomarkers and cognitive scores by combining three modules: topological space shift, ODE-RGRU, and trajectory estimation. We have also developed a training algorithm that integrates manifold mapping with monotonicity constraints to reflect measurement transition irreversibility. We verify our proposed method's efficacy by predicting clinical labels and cognitive scores over time in regular and irregular settings. Furthermore, we thoroughly analyze our proposed framework through an ablation study. △ Less

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2307.05906 [pdf, other]

Mini-Batch Optimization of Contrastive Loss

Authors: Jaewoong Cho, Kartik Sreenivasan, Keon Lee, Kyunghoo Mun, Soheun Yi, Jeong-Gwan Lee, Anna Lee, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee

Abstract: Contrastive learning has gained significant attention as a method for self-supervised learning. The contrastive loss function ensures that embeddings of positive sample pairs (e.g., different samples from the same class or different views of the same object) are similar, while embeddings of negative pairs are dissimilar. Practical constraints such as large memory requirements make it challenging t… ▽ More Contrastive learning has gained significant attention as a method for self-supervised learning. The contrastive loss function ensures that embeddings of positive sample pairs (e.g., different samples from the same class or different views of the same object) are similar, while embeddings of negative pairs are dissimilar. Practical constraints such as large memory requirements make it challenging to consider all possible positive and negative pairs, leading to the use of mini-batch optimization. In this paper, we investigate the theoretical aspects of mini-batch optimization in contrastive learning. We show that mini-batch optimization is equivalent to full-batch optimization if and only if all $\binom{N}{B}$ mini-batches are selected, while sub-optimality may arise when examining only a subset. We then demonstrate that utilizing high-loss mini-batches can speed up SGD convergence and propose a spectral clustering-based approach for identifying these high-loss mini-batches. Our experimental results validate our theoretical findings and demonstrate that our proposed algorithm outperforms vanilla SGD in practically relevant settings, providing a better understanding of mini-batch optimization in contrastive learning. △ Less

Submitted 12 July, 2023; originally announced July 2023.

arXiv:2306.10193 [pdf, other]

Conformal Language Modeling

Authors: Victor Quach, Adam Fisch, Tal Schuster, Adam Yala, Jae Ho Sohn, Tommi S. Jaakkola, Regina Barzilay

Abstract: We propose a novel approach to conformal prediction for generative language models (LMs). Standard conformal prediction produces prediction sets -- in place of single predictions -- that have rigorous, statistical performance guarantees. LM responses are typically sampled from the model's predicted distribution over the large, combinatorial output space of natural language. Translating this proces… ▽ More We propose a novel approach to conformal prediction for generative language models (LMs). Standard conformal prediction produces prediction sets -- in place of single predictions -- that have rigorous, statistical performance guarantees. LM responses are typically sampled from the model's predicted distribution over the large, combinatorial output space of natural language. Translating this process to conformal prediction, we calibrate a stopping rule for sampling different outputs from the LM that get added to a growing set of candidates until we are confident that the output set is sufficient. Since some samples may be low-quality, we also simultaneously calibrate and apply a rejection rule for removing candidates from the output set to reduce noise. Similar to conformal prediction, we prove that the sampled set returned by our procedure contains at least one acceptable answer with high probability, while still being empirically precise (i.e., small) on average. Furthermore, within this set of candidate responses, we show that we can also accurately identify subsets of individual components -- such as phrases or sentences -- that are each independently correct (e.g., that are not "hallucinations"), again with statistical guarantees. We demonstrate the promise of our approach on multiple tasks in open-domain question answering, text summarization, and radiology report generation using different LM variants. △ Less

Submitted 1 June, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: ICLR 2024

arXiv:2305.08592 [pdf, other]

Time-based Repair for Asynchronous Wait Flaky Tests in Web Testing

Authors: Yu Pei, Jeongju Sohn, Sarra Habchi, Mike Papadakis

Abstract: Asynchronous waits are one of the most prevalent root causes of flaky tests and a major time-influential factor of web application testing. To investigate the characteristics of asynchronous wait flaky tests and their fixes in web testing, we build a dataset of 49 reproducible flaky tests, from 26 open-source projects, caused by asynchronous waits, along with their corresponding developer-written… ▽ More Asynchronous waits are one of the most prevalent root causes of flaky tests and a major time-influential factor of web application testing. To investigate the characteristics of asynchronous wait flaky tests and their fixes in web testing, we build a dataset of 49 reproducible flaky tests, from 26 open-source projects, caused by asynchronous waits, along with their corresponding developer-written fixes. Our study of these flaky tests reveals that in approximately 63% of them (31 out of 49), developers addressed Asynchronous Wait flaky tests by adapting the wait time, even for cases where the root causes lie elsewhere. Based on this finding, we propose TRaf, an automated time-based repair method for asynchronous wait flaky tests in web applications. TRaf tackles the flakiness issues by suggesting a proper waiting time for each asynchronous call in a web application, using code similarity and past change history. The core insight is that as developers often make similar mistakes more than once, hints for the efficient wait time exist in the current or past codebase. Our analysis shows that TRaf can suggest a shorter wait time to resolve the test flakiness compared to developer-written fixes, reducing the test execution time by 11.1%. With additional dynamic tuning of the new wait time, TRaf further reduces the execution time by 20.2%. △ Less

Submitted 19 May, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

arXiv:2305.03609 [pdf, other]

Differentially Private Topological Data Analysis

Authors: Taegyu Kang, Sehwan Kim, Jinwon Sohn, Jordan Awan

Abstract: This paper is the first to attempt differentially private (DP) topological data analysis (TDA), producing near-optimal private persistence diagrams. We analyze the sensitivity of persistence diagrams in terms of the bottleneck distance, and we show that the commonly used Čech complex has sensitivity that does not decrease as the sample size $n$ increases. This makes it challenging for the persiste… ▽ More This paper is the first to attempt differentially private (DP) topological data analysis (TDA), producing near-optimal private persistence diagrams. We analyze the sensitivity of persistence diagrams in terms of the bottleneck distance, and we show that the commonly used Čech complex has sensitivity that does not decrease as the sample size $n$ increases. This makes it challenging for the persistence diagrams of Čech complexes to be privatized. As an alternative, we show that the persistence diagram obtained by the $L^1$-distance to measure (DTM) has sensitivity $O(1/n)$. Based on the sensitivity analysis, we propose using the exponential mechanism whose utility function is defined in terms of the bottleneck distance of the $L^1$-DTM persistence diagrams. We also derive upper and lower bounds of the accuracy of our privacy mechanism; the obtained bounds indicate that the privacy error of our mechanism is near-optimal. We demonstrate the performance of our privatized persistence diagrams through simulations as well as on a real dataset tracking human movement. △ Less

Submitted 3 November, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

Comments: 23 pages before references and appendices, 42 pages total, 8 figures

arXiv:2301.13196 [pdf, other]

Looped Transformers as Programmable Computers

Authors: Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos

Abstract: We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. Our input sequence acts as a punchcard, consisting of instructions and memory for data read/writes. We demonstrate that a constant number of encoder layers can emulate basic computing blocks, including embedding edit operations, non-linear functions, fu… ▽ More We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. Our input sequence acts as a punchcard, consisting of instructions and memory for data read/writes. We demonstrate that a constant number of encoder layers can emulate basic computing blocks, including embedding edit operations, non-linear functions, function calls, program counters, and conditional branches. Using these building blocks, we emulate a small instruction-set computer. This allows us to map iterative algorithms to programs that can be executed by a looped, 13-layer transformer. We show how this transformer, instructed by its input, can emulate a basic calculator, a basic linear algebra library, and in-context learning algorithms that employ backpropagation. Our work highlights the versatility of the attention mechanism, and demonstrates that even shallow transformers can execute full-fledged, general-purpose programs. △ Less

Submitted 30 January, 2023; originally announced January 2023.

arXiv:2212.08311 [pdf, other]

Can We Find Strong Lottery Tickets in Generative Models?

Authors: Sangyeop Yeo, Yoojin Jang, Jy-yong Sohn, Dongyoon Han, Jaejun Yoo

Abstract: Yes. In this paper, we investigate strong lottery tickets in generative models, the subnetworks that achieve good generative performance without any weight update. Neural network pruning is considered the main cornerstone of model compression for reducing the costs of computation and memory. Unfortunately, pruning a generative model has not been extensively explored, and all existing pruning algor… ▽ More Yes. In this paper, we investigate strong lottery tickets in generative models, the subnetworks that achieve good generative performance without any weight update. Neural network pruning is considered the main cornerstone of model compression for reducing the costs of computation and memory. Unfortunately, pruning a generative model has not been extensively explored, and all existing pruning algorithms suffer from excessive weight-training costs, performance degradation, limited generalizability, or complicated training. To address these problems, we propose to find a strong lottery ticket via moment-matching scores. Our experimental results show that the discovered subnetwork can perform similarly or better than the trained dense model even when only 10% of the weights remain. To the best of our knowledge, we are the first to show the existence of strong lottery tickets in generative models and provide an algorithm to find it stably. Our code and supplementary materials are publicly available. △ Less

Submitted 16 December, 2022; originally announced December 2022.

arXiv:2210.06732 [pdf, other]

Equal Improvability: A New Fairness Notion Considering the Long-term Impact

Authors: Ozgur Guldogan, Yuchen Zeng, Jy-yong Sohn, Ramtin Pedarsani, Kangwook Lee

Abstract: Devising a fair classifier that does not discriminate against different groups is an important problem in machine learning. Although researchers have proposed various ways of defining group fairness, most of them only focused on the immediate fairness, ignoring the long-term impact of a fair classifier under the dynamic scenario where each individual can improve its feature over time. Such dynamic… ▽ More Devising a fair classifier that does not discriminate against different groups is an important problem in machine learning. Although researchers have proposed various ways of defining group fairness, most of them only focused on the immediate fairness, ignoring the long-term impact of a fair classifier under the dynamic scenario where each individual can improve its feature over time. Such dynamic scenarios happen in real world, e.g., college admission and credit loaning, where each rejected sample makes effort to change its features to get accepted afterwards. In this dynamic setting, the long-term fairness should equalize the samples' feature distribution across different groups after the rejected samples make some effort to improve. In order to promote long-term fairness, we propose a new fairness notion called Equal Improvability (EI), which equalizes the potential acceptance rate of the rejected samples across different groups assuming a bounded level of effort will be spent by each rejected sample. We analyze the properties of EI and its connections with existing fairness notions. To find a classifier that satisfies the EI requirement, we propose and study three different approaches that solve EI-regularized optimization problems. Through experiments on both synthetic and real datasets, we demonstrate that the proposed EI-regularized algorithms encourage us to find a fair classifier in terms of EI. Finally, we provide experimental results on dynamic scenarios which highlight the advantages of our EI metric in achieving the long-term fairness. Codes are available in a GitHub repository, see https://github.com/guldoganozgur/ei_fairness. △ Less

Submitted 9 April, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: Codes are available in a GitHub repository, see https://github.com/guldoganozgur/ei_fairness. ICLR 2023 Poster. 31 pages, 10 figures, 6 tables

arXiv:2209.05635 [pdf, other]

Bending the Future: Autoregressive Modeling of Temporal Knowledge Graphs in Curvature-Variable Hyperbolic Spaces

Authors: Jihoon Sohn, Mingyu Derek Ma, Muhao Chen

Abstract: Recently there is an increasing scholarly interest in time-varying knowledge graphs, or temporal knowledge graphs (TKG). Previous research suggests diverse approaches to TKG reasoning that uses historical information. However, less attention has been given to the hierarchies within such information at different timestamps. Given that TKG is a sequence of knowledge graphs based on time, the chronol… ▽ More Recently there is an increasing scholarly interest in time-varying knowledge graphs, or temporal knowledge graphs (TKG). Previous research suggests diverse approaches to TKG reasoning that uses historical information. However, less attention has been given to the hierarchies within such information at different timestamps. Given that TKG is a sequence of knowledge graphs based on time, the chronology in the sequence derives hierarchies between the graphs. Furthermore, each knowledge graph has its hierarchical level which may differ from one another. To address these hierarchical characteristics in TKG, we propose HyperVC, which utilizes hyperbolic space that better encodes the hierarchies than Euclidean space. The chronological hierarchies between knowledge graphs at different timestamps are represented by embedding the knowledge graphs as vectors in a common hyperbolic space. Additionally, diverse hierarchical levels of knowledge graphs are represented by adjusting the curvatures of hyperbolic embeddings of their entities and relations. Experiments on four benchmark datasets show substantial improvements, especially on the datasets with higher hierarchical levels. △ Less

Submitted 12 September, 2022; originally announced September 2022.

Comments: 11 pages, 2 figures, In the 4th Conference on Automated Knowledge Base Construction (AKBC) 2022

arXiv:2207.10143 [pdf, other]

What Made This Test Flake? Pinpointing Classes Responsible for Test Flakiness

Authors: Sarra Habchi, Guillaume Haben, Jeongju Sohn, Adriano Franci, Mike Papadakis, Maxime Cordy, Yves Le Traon

Abstract: Flaky tests are defined as tests that manifest non-deterministic behaviour by passing and failing intermittently for the same version of the code. These tests cripple continuous integration with false alerts that waste developers' time and break their trust in regression testing. To mitigate the effects of flakiness, both researchers and industrial experts proposed strategies and tools to detect a… ▽ More Flaky tests are defined as tests that manifest non-deterministic behaviour by passing and failing intermittently for the same version of the code. These tests cripple continuous integration with false alerts that waste developers' time and break their trust in regression testing. To mitigate the effects of flakiness, both researchers and industrial experts proposed strategies and tools to detect and isolate flaky tests. However, flaky tests are rarely fixed as developers struggle to localise and understand their causes. Additionally, developers working with large codebases often need to know the sources of non-determinism to preserve code quality, i.e., avoid introducing technical debt linked with non-deterministic behaviour, and to avoid introducing new flaky tests. To aid with these tasks, we propose re-targeting Fault Localisation techniques to the flaky component localisation problem, i.e., pinpointing program classes that cause the non-deterministic behaviour of flaky tests. In particular, we employ Spectrum-Based Fault Localisation (SBFL), a coverage-based fault localisation technique commonly adopted for its simplicity and effectiveness. We also utilise other data sources, such as change history and static code metrics, to further improve the localisation. Our results show that augmenting SBFL with change and code metrics ranks flaky classes in the top-1 and top-5 suggestions, in 26% and 47% of the cases. Overall, we successfully reduced the average number of classes inspected to locate the first flaky class to 19% of the total number of classes covered by flaky tests. Our results also show that localisation methods are effective in major flakiness categories, such as concurrency and asynchronous waits, indicating their general ability to identify flaky components. △ Less

Submitted 20 July, 2022; originally announced July 2022.

Comments: Accepted at the 38th IEEE International Conference on Software Maintenance and Evolution (ICSME)

arXiv:2206.06565 [pdf, other]

LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks

Authors: Tuan Dinh, Yuchen Zeng, Ruisu Zhang, Ziqian Lin, Michael Gira, Shashank Rajput, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee

Abstract: Fine-tuning pretrained language models (LMs) without making any architectural changes has become a norm for learning various language downstream tasks. However, for non-language downstream tasks, a common practice is to employ task-specific designs for input, output layers, and loss functions. For instance, it is possible to fine-tune an LM into an MNIST classifier by replacing the word embedding… ▽ More Fine-tuning pretrained language models (LMs) without making any architectural changes has become a norm for learning various language downstream tasks. However, for non-language downstream tasks, a common practice is to employ task-specific designs for input, output layers, and loss functions. For instance, it is possible to fine-tune an LM into an MNIST classifier by replacing the word embedding layer with an image patch embedding layer, the word token output layer with a 10-way output layer, and the word prediction loss with a 10-way classification loss, respectively. A natural question arises: Can LM fine-tuning solve non-language downstream tasks without changing the model architecture or loss function? To answer this, we propose Language-Interfaced Fine-Tuning (LIFT) and study its efficacy and limitations by conducting an extensive empirical study on a suite of non-language classification and regression tasks. LIFT does not make any changes to the model architecture or loss function, and it solely relies on the natural language interface, enabling "no-code machine learning with LMs." We find that LIFT performs comparably well across a wide range of low-dimensional classification and regression tasks, matching the performances of the best baselines in many cases, especially for the classification tasks. We also report experimental results on the fundamental properties of LIFT, including inductive bias, robustness, and sample complexity. We also analyze the effect of pretraining on LIFT and a few properties/techniques specific to LIFT, e.g., context-aware learning via appropriate prompting, calibrated predictions, data generation, and two-stage fine-tuning. Our code is available at https://github.com/UW-Madison-Lee-Lab/LanguageInterfacedFineTuning. △ Less

Submitted 30 October, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: Accepted at NeurIPS 2022

arXiv:2205.11616 [pdf, other]

Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment

Authors: Tuan Dinh, Jy-yong Sohn, Shashank Rajput, Timothy Ossowski, Yifei Ming, Junjie Hu, Dimitris Papailiopoulos, Kangwook Lee

Abstract: Word translation without parallel corpora has become feasible, rivaling the performance of supervised methods. Recent findings have shown that the accuracy and robustness of unsupervised word translation (UWT) can be improved by making use of visual observations, which are universal representations across languages. In this work, we investigate the potential of using not only visual observations b… ▽ More Word translation without parallel corpora has become feasible, rivaling the performance of supervised methods. Recent findings have shown that the accuracy and robustness of unsupervised word translation (UWT) can be improved by making use of visual observations, which are universal representations across languages. In this work, we investigate the potential of using not only visual observations but also pretrained language-image models for enabling a more efficient and robust UWT. Specifically, we develop a novel UWT method dubbed Word Alignment using Language-Image Pretraining (WALIP), which leverages visual observations via the shared embedding space of images and texts provided by CLIP models (Radford et al., 2021). WALIP has a two-step procedure. First, we retrieve word pairs with high confidences of similarity, computed using our proposed image-based fingerprints, which define the initial pivot for the word alignment. Second, we apply our robust Procrustes algorithm to estimate the linear mapping between two embedding spaces, which iteratively corrects and refines the estimated alignment. Our extensive experiments show that WALIP improves upon the state-of-the-art performance of bilingual word alignment for a few language pairs across different word embeddings and displays great robustness to the dissimilarity of language pairs or training corpora for two word embeddings. △ Less

Submitted 7 November, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings)

arXiv:2204.05472 [pdf, other]

Breaking Fair Binary Classification with Optimal Flipping Attacks

Authors: Changhun Jo, Jy-yong Sohn, Kangwook Lee

Abstract: Minimizing risk with fairness constraints is one of the popular approaches to learning a fair classifier. Recent works showed that this approach yields an unfair classifier if the training set is corrupted. In this work, we study the minimum amount of data corruption required for a successful flipping attack. First, we find lower/upper bounds on this quantity and show that these bounds are tight w… ▽ More Minimizing risk with fairness constraints is one of the popular approaches to learning a fair classifier. Recent works showed that this approach yields an unfair classifier if the training set is corrupted. In this work, we study the minimum amount of data corruption required for a successful flipping attack. First, we find lower/upper bounds on this quantity and show that these bounds are tight when the target model is the unique unconstrained risk minimizer. Second, we propose a computationally efficient data poisoning attack algorithm that can compromise the performance of fair learning algorithms. △ Less

Submitted 9 May, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

arXiv:2203.11343 [pdf, ps, other]

Using Evolutionary Coupling to Establish Relevance Links Between Tests and Code Units. A case study on fault localization

Authors: Jeongju Sohn, Mike Papadakis

Abstract: Many software engineering techniques, such as fault localization, operate based on relevance relationships between tests and code. These relationships are often inferred through the use of dynamic test execution information (test execution traces) that approximate the link between relevant code units and asserted, by the tests, program behaviour. Unfortunately, in practice dynamic information is n… ▽ More Many software engineering techniques, such as fault localization, operate based on relevance relationships between tests and code. These relationships are often inferred through the use of dynamic test execution information (test execution traces) that approximate the link between relevant code units and asserted, by the tests, program behaviour. Unfortunately, in practice dynamic information is not always available due to the overheads introduced by the instrumentation or the nature of the production environments. To deal with this issue, we propose CEMENT, a static technique that automatically infers such test and code relationships given the projects' evolution. The key idea is that developers make relevant changes on test and code units at the same period of time, i.e., co-evolution of tests and code units reflects a probable link between them. We evaluate CEMENT on 15 open source projects and show that it indeed captures relevant links. Additionally, we perform a fault localization case study where we compare CEMENT with an existing Information Retrieval-based Fault Localization (IRFL) technique and show that it achieves comparable performance. A further analysis of our results reveals a small overlap between the faults successfully localized by the two approaches suggesting complementarity. In particular, out of the 39 successfully localized faults, two are common while CEMENT and IRFL localize 16 and 21. These results demonstrate that test and code evolutionary coupling can effectively support test and debugging activities. △ Less

Submitted 21 March, 2022; originally announced March 2022.

arXiv:2202.12002 [pdf, other]

Rare Gems: Finding Lottery Tickets at Initialization

Authors: Kartik Sreenivasan, Jy-yong Sohn, Liu Yang, Matthew Grinde, Alliot Nagle, Hongyi Wang, Eric Xing, Kangwook Lee, Dimitris Papailiopoulos

Abstract: Large neural networks can be pruned to a small fraction of their original size, with little loss in accuracy, by following a time-consuming "train, prune, re-train" approach. Frankle & Carbin conjecture that we can avoid this by training "lottery tickets", i.e., special sparse subnetworks found at initialization, that can be trained to high accuracy. However, a subsequent line of work by Frankle e… ▽ More Large neural networks can be pruned to a small fraction of their original size, with little loss in accuracy, by following a time-consuming "train, prune, re-train" approach. Frankle & Carbin conjecture that we can avoid this by training "lottery tickets", i.e., special sparse subnetworks found at initialization, that can be trained to high accuracy. However, a subsequent line of work by Frankle et al. and Su et al. presents concrete evidence that current algorithms for finding trainable networks at initialization, fail simple baseline comparisons, e.g., against training random sparse subnetworks. Finding lottery tickets that train to better accuracy compared to simple baselines remains an open problem. In this work, we resolve this open problem by proposing Gem-Miner which finds lottery tickets at initialization that beat current baselines. Gem-Miner finds lottery tickets trainable to accuracy competitive or better than Iterative Magnitude Pruning (IMP), and does so up to $19\times$ faster. △ Less

Submitted 2 June, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

arXiv:2201.02354 [pdf, other]

GenLabel: Mixup Relabeling using Generative Models

Authors: Jy-yong Sohn, Liang Shang, Hongxu Chen, Jaekyun Moon, Dimitris Papailiopoulos, Kangwook Lee

Abstract: Mixup is a data augmentation method that generates new data points by mixing a pair of input data. While mixup generally improves the prediction performance, it sometimes degrades the performance. In this paper, we first identify the main causes of this phenomenon by theoretically and empirically analyzing the mixup algorithm. To resolve this, we propose GenLabel, a simple yet effective relabeling… ▽ More Mixup is a data augmentation method that generates new data points by mixing a pair of input data. While mixup generally improves the prediction performance, it sometimes degrades the performance. In this paper, we first identify the main causes of this phenomenon by theoretically and empirically analyzing the mixup algorithm. To resolve this, we propose GenLabel, a simple yet effective relabeling algorithm designed for mixup. In particular, GenLabel helps the mixup algorithm correctly label mixup samples by learning the class-conditional data distribution using generative models. Via extensive theoretical and empirical analysis, we show that mixup, when used together with GenLabel, can effectively resolve the aforementioned phenomenon, improving the generalization performance and the adversarial robustness. △ Less

Submitted 7 January, 2022; originally announced January 2022.

arXiv:2111.05096 [pdf, other]

E-voting System Using Homomorphic Encryption and Blockchain Technology to Encrypt Voter Data

Authors: Hyunyeon Kim, Kyung Eun Kim, Soohan Park, Jongsoo Sohn

Abstract: Homomorphic encryption and blockchain technology are regarded as two significant technologies for improving e-voting systems. In this paper, we suggest a novel e-voting system using homomorphic encryption and blockchain technology that is focused on encrypting voter data. By encrypting voter information rather than cast votes, the system enables various statistical analyses regarding the vote resu… ▽ More Homomorphic encryption and blockchain technology are regarded as two significant technologies for improving e-voting systems. In this paper, we suggest a novel e-voting system using homomorphic encryption and blockchain technology that is focused on encrypting voter data. By encrypting voter information rather than cast votes, the system enables various statistical analyses regarding the vote result while securing the credibility, privacy, and verifiability of overall e-voting system. We checked the efficiency of the overall system by comparing the speed of the proposed system with the speed of other e-voting systems that use homomorphic encryption and blockchain technology. △ Less

Submitted 5 November, 2021; originally announced November 2021.

arXiv:2110.08996 [pdf, other]

Finding Everything within Random Binary Networks

Authors: Kartik Sreenivasan, Shashank Rajput, Jy-yong Sohn, Dimitris Papailiopoulos

Abstract: A recent work by Ramanujan et al. (2020) provides significant empirical evidence that sufficiently overparameterized, random neural networks contain untrained subnetworks that achieve state-of-the-art accuracy on several predictive tasks. A follow-up line of theoretical work provides justification of these findings by proving that slightly overparameterized neural networks, with commonly used cont… ▽ More A recent work by Ramanujan et al. (2020) provides significant empirical evidence that sufficiently overparameterized, random neural networks contain untrained subnetworks that achieve state-of-the-art accuracy on several predictive tasks. A follow-up line of theoretical work provides justification of these findings by proving that slightly overparameterized neural networks, with commonly used continuous-valued random initializations can indeed be pruned to approximate any target network. In this work, we show that the amplitude of those random weights does not even matter. We prove that any target network can be approximated up to arbitrary accuracy by simply pruning a random network of binary $\{\pm1\}$ weights that is only a polylogarithmic factor wider and deeper than the target network. △ Less

Submitted 22 October, 2021; v1 submitted 17 October, 2021; originally announced October 2021.

arXiv:2104.04952 [pdf, other]

Fine-Grained Attention for Weakly Supervised Object Localization

Authors: Junghyo Sohn, Eunjin Jeon, Wonsik Jung, Eunsong Kang, Heung-Il Suk

Abstract: Although recent advances in deep learning accelerated an improvement in a weakly supervised object localization (WSOL) task, there are still challenges to identify the entire body of an object, rather than only discriminative parts. In this paper, we propose a novel residual fine-grained attention (RFGA) module that autonomously excites the less activated regions of an object by utilizing informat… ▽ More Although recent advances in deep learning accelerated an improvement in a weakly supervised object localization (WSOL) task, there are still challenges to identify the entire body of an object, rather than only discriminative parts. In this paper, we propose a novel residual fine-grained attention (RFGA) module that autonomously excites the less activated regions of an object by utilizing information distributed over channels and locations within feature maps in combination with a residual operation. To be specific, we devise a series of mechanisms of triple-view attention representation, attention expansion, and feature calibration. Unlike other attention-based WSOL methods that learn a coarse attention map, having the same values across elements in feature maps, our proposed RFGA learns fine-grained values in an attention map by assigning different attention values for each of the elements. We validated the superiority of our proposed RFGA module by comparing it with the recent methods in the literature over three datasets. Further, we analyzed the effect of each mechanism in our RFGA and visualized attention maps to get insights. △ Less

Submitted 11 April, 2021; originally announced April 2021.

Comments: 16 pages, 11 figures

arXiv:2102.13267 [pdf, other]

LazyTensor: combining eager execution with domain-specific compilers

Authors: Alex Suhan, Davide Libenzi, Ailing Zhang, Parker Schuh, Brennan Saeta, Jie Young Sohn, Denys Shabalin

Abstract: Domain-specific optimizing compilers have demonstrated significant performance and portability benefits, but require programs to be represented in their specialized IRs. Existing frontends to these compilers suffer from the "language subset problem" where some host language features are unsupported in the subset of the user's program that interacts with the domain-specific compiler. By contrast, d… ▽ More Domain-specific optimizing compilers have demonstrated significant performance and portability benefits, but require programs to be represented in their specialized IRs. Existing frontends to these compilers suffer from the "language subset problem" where some host language features are unsupported in the subset of the user's program that interacts with the domain-specific compiler. By contrast, define-by-run ML frameworks-colloquially called "eager" mode-are popular due to their ease of use and expressivity, where the full power of the host programming language can be used. LazyTensor is a technique to target domain specific compilers without sacrificing define-by-run ergonomics. Initially developed to support PyTorch on Cloud TPUs, the technique, along with a substantially shared implementation, has been used by Swift for TensorFlow across CPUs, GPUs, and TPUs, demonstrating the generality of the approach across (1) Tensor implementations, (2) hardware accelerators, and (3) programming languages. △ Less

Submitted 25 February, 2021; originally announced February 2021.

arXiv:2012.05433 [pdf, other]

Communication-Computation Efficient Secure Aggregation for Federated Learning

Authors: Beongjun Choi, Jy-yong Sohn, Dong-Jun Han, Jaekyun Moon

Abstract: Federated learning has been spotlighted as a way to train neural networks using distributed data with no need for individual nodes to share data. Unfortunately, it has also been shown that adversaries may be able to extract local data contents off model parameters transmitted during federated learning. A recent solution based on the secure aggregation primitive enabled privacy-preserving federated… ▽ More Federated learning has been spotlighted as a way to train neural networks using distributed data with no need for individual nodes to share data. Unfortunately, it has also been shown that adversaries may be able to extract local data contents off model parameters transmitted during federated learning. A recent solution based on the secure aggregation primitive enabled privacy-preserving federated learning, but at the expense of significant extra communication/computational resources. In this paper, we propose a low-complexity scheme that provides data privacy using substantially reduced communication/computational resources relative to the existing secure solution. The key idea behind the suggested scheme is to design the topology of secret-sharing nodes as a sparse random graph instead of the complete graph corresponding to the existing solution. We first obtain the necessary and sufficient condition on the graph to guarantee both reliability and privacy. We then suggest using the Erdős-Rényi graph in particular and provide theoretical guarantees on the reliability/privacy of the proposed scheme. Through extensive real-world experiments, we demonstrate that our scheme, using only $20 \sim 30\%$ of the resources required in the conventional scheme, maintains virtually the same levels of reliability and data privacy in practical federated learning systems. △ Less

Submitted 12 July, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

arXiv:2007.05084 [pdf, other]

Attack of the Tails: Yes, You Really Can Backdoor Federated Learning

Authors: Hongyi Wang, Kartik Sreenivasan, Shashank Rajput, Harit Vishwakarma, Saurabh Agarwal, Jy-yong Sohn, Kangwook Lee, Dimitris Papailiopoulos

Abstract: Due to its decentralized nature, Federated Learning (FL) lends itself to adversarial attacks in the form of backdoors during training. The goal of a backdoor is to corrupt the performance of the trained model on specific sub-tasks (e.g., by classifying green cars as frogs). A range of FL backdoor attacks have been introduced in the literature, but also methods to defend against them, and it is cur… ▽ More Due to its decentralized nature, Federated Learning (FL) lends itself to adversarial attacks in the form of backdoors during training. The goal of a backdoor is to corrupt the performance of the trained model on specific sub-tasks (e.g., by classifying green cars as frogs). A range of FL backdoor attacks have been introduced in the literature, but also methods to defend against them, and it is currently an open question whether FL systems can be tailored to be robust against backdoors. In this work, we provide evidence to the contrary. We first establish that, in the general case, robustness to backdoors implies model robustness to adversarial examples, a major open problem in itself. Furthermore, detecting the presence of a backdoor in a FL model is unlikely assuming first order oracles or polynomial time. We couple our theoretical results with a new family of backdoor attacks, which we refer to as edge-case backdoors. An edge-case backdoor forces a model to misclassify on seemingly easy inputs that are however unlikely to be part of the training, or test data, i.e., they live on the tail of the input distribution. We explain how these edge-case backdoors can lead to unsavory failures and may have serious repercussions on fairness, and exhibit that with careful tuning at the side of the adversary, one can insert them across a range of machine learning tasks (e.g., image classification, OCR, text prediction, sentiment analysis). △ Less

Submitted 9 July, 2020; originally announced July 2020.

arXiv:1912.12463 [pdf, other]

doi 10.1145/3563210

Arachne: Search Based Repair of Deep Neural Networks

Authors: Jeongju Sohn, Sungmin Kang, Shin Yoo

Abstract: The rapid and widespread adoption of Deep Neural Networks (DNNs) has called for ways to test their behaviour, and many testing approaches have successfully revealed misbehaviour of DNNs. However, it is relatively unclear what one can do to correct such behaviour after revelation, as retraining involves costly data collection and does not guarantee to fix the underlying issue. This paper introduces… ▽ More The rapid and widespread adoption of Deep Neural Networks (DNNs) has called for ways to test their behaviour, and many testing approaches have successfully revealed misbehaviour of DNNs. However, it is relatively unclear what one can do to correct such behaviour after revelation, as retraining involves costly data collection and does not guarantee to fix the underlying issue. This paper introduces Arachne, a novel program repair technique for DNNs, which directly repairs DNNs using their input-output pairs as a specification. Arachne localises neural weights on which it can generate effective patches and uses Differential Evolution to optimise the localised weights and correct the misbehaviour. An empirical study using different benchmarks shows that Arachne can fix specific misclassifications of a DNN without reducing general accuracy significantly. On average, patches generated by Arachne generalise to 61.3% of unseen misbehaviour, whereas those by a state-of-the-art DNN repair technique generalise only to 10.2% and sometimes to none while taking tens of times more than Arachne. We also show that Arachne can address fairness issues by debiasing a gender classification model. Finally, we successfully apply Arachne to a text sentiment model to show that it generalises beyond Convolutional Neural Networks. △ Less

Submitted 16 August, 2022; v1 submitted 28 December, 2019; originally announced December 2019.

arXiv:1910.06093 [pdf, other]

Election Coding for Distributed Learning: Protecting SignSGD against Byzantine Attacks

Authors: Jy-yong Sohn, Dong-Jun Han, Beongjun Choi, Jaekyun Moon

Abstract: Recent advances in large-scale distributed learning algorithms have enabled communication-efficient training via SignSGD. Unfortunately, a major issue continues to plague distributed learning: namely, Byzantine failures may incur serious degradation in learning accuracy. This paper proposes Election Coding, a coding-theoretic framework to guarantee Byzantine-robustness for SignSGD with Majority Vo… ▽ More Recent advances in large-scale distributed learning algorithms have enabled communication-efficient training via SignSGD. Unfortunately, a major issue continues to plague distributed learning: namely, Byzantine failures may incur serious degradation in learning accuracy. This paper proposes Election Coding, a coding-theoretic framework to guarantee Byzantine-robustness for SignSGD with Majority Vote, which uses minimum worker-master communication in both directions. The suggested framework explores new information-theoretic limits of finding the majority opinion when some workers could be malicious, and paves the road to implement robust and efficient distributed learning algorithms. Under this framework, we construct two types of explicit codes, random Bernoulli codes and deterministic algebraic codes, that can tolerate Byzantine attacks with a controlled amount of computational redundancy. For the Bernoulli codes, we provide upper bounds on the error probability in estimating the majority opinion, which give useful insights into code design for tolerating Byzantine attacks. As for deterministic codes, we construct an explicit code which perfectly tolerates Byzantines, and provide tight upper/lower bounds on the minimum required computational redundancy. Finally, the Byzantine-tolerance of the suggested coding schemes is confirmed by deep learning experiments on Amazon EC2 using Python with MPI4py package. △ Less

Submitted 24 October, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

Comments: Accepted at NeurIPS 2020

arXiv:1909.06326 [pdf, other]

Automatic Hip Fracture Identification and Functional Subclassification with Deep Learning

Authors: Justin D Krogue, Kaiyang V Cheng, Kevin M Hwang, Paul Toogood, Eric G Meinberg, Erik J Geiger, Musa Zaid, Kevin C McGill, Rina Patel, Jae Ho Sohn, Alexandra Wright, Bryan F Darger, Kevin A Padrez, Eugene Ozhinsky, Sharmila Majumdar, Valentina Pedoia

Abstract: Purpose: Hip fractures are a common cause of morbidity and mortality. Automatic identification and classification of hip fractures using deep learning may improve outcomes by reducing diagnostic errors and decreasing time to operation. Methods: Hip and pelvic radiographs from 1118 studies were reviewed and 3034 hips were labeled via bounding boxes and classified as normal, displaced femoral neck f… ▽ More Purpose: Hip fractures are a common cause of morbidity and mortality. Automatic identification and classification of hip fractures using deep learning may improve outcomes by reducing diagnostic errors and decreasing time to operation. Methods: Hip and pelvic radiographs from 1118 studies were reviewed and 3034 hips were labeled via bounding boxes and classified as normal, displaced femoral neck fracture, nondisplaced femoral neck fracture, intertrochanteric fracture, previous ORIF, or previous arthroplasty. A deep learning-based object detection model was trained to automate the placement of the bounding boxes. A Densely Connected Convolutional Neural Network (DenseNet) was trained on a subset of the bounding box images, and its performance evaluated on a held out test set and by comparison on a 100-image subset to two groups of human observers: fellowship-trained radiologists and orthopaedists, and senior residents in emergency medicine, radiology, and orthopaedics. Results: The binary accuracy for fracture of our model was 93.8% (95% CI, 91.3-95.8%), with sensitivity of 92.7% (95% CI, 88.7-95.6%), and specificity 95.0% (95% CI, 91.5-97.3%). Multiclass classification accuracy was 90.4% (95% CI, 87.4-92.9%). When compared to human observers, our model achieved at least expert-level classification under all conditions. Additionally, when the model was used as an aid, human performance improved, with aided resident performance approximating unaided fellowship-trained expert performance. Conclusions: Our deep learning model identified and classified hip fractures with at least expert-level accuracy, and when used as an aid improved human performance, with aided resident performance approximating that of unaided fellowship-trained attendings. △ Less

Submitted 10 September, 2019; originally announced September 2019.

Comments: Presented at Orthopaedic Research Society, Austin, TX, Feb 2, 2019, currently in submission for publication

arXiv:1901.05162 [pdf, other]

Coded Matrix Multiplication on a Group-Based Model

Authors: Muah Kim, Jy-yong Sohn, Jaekyun Moon

Abstract: Coded distributed computing has been considered as a promising technique which makes large-scale systems robust to the "straggler" workers. Yet, practical system models for distributed computing have not been available that reflect the clustered or grouped structure of real-world computing servers. Neither the large variations in the computing power and bandwidth capabilities across different serv… ▽ More Coded distributed computing has been considered as a promising technique which makes large-scale systems robust to the "straggler" workers. Yet, practical system models for distributed computing have not been available that reflect the clustered or grouped structure of real-world computing servers. Neither the large variations in the computing power and bandwidth capabilities across different servers have been properly modeled. We suggest a group-based model to reflect practical conditions and develop an appropriate coding scheme for this model. The suggested code, called group code, employs parallel encoding for each group. We show that the suggested coding scheme can asymptotically achieve optimal computing time in regimes of infinite n, the number of workers. While theoretical analysis is conducted in the asymptotic regime, numerical results also show that the suggested scheme achieves near-optimal computing time for any finite but reasonably large n. Moreover, we demonstrate that the decoding complexity of the suggested scheme is significantly reduced by the virtue of parallel decoding. △ Less

Submitted 16 January, 2019; originally announced January 2019.

Comments: 9 pages, submitted to ISIT 2019

arXiv:1901.03610 [pdf, other]

Coded Distributed Computing over Packet Erasure Channels

Authors: Dong-Jun Han, Jy-yong Sohn, Jaekyun Moon

Abstract: Coded computation is a framework which provides redundancy in distributed computing systems to speed up largescale tasks. Although most existing works assume an error-free scenarios in a master-worker setup, the link failures are common in current wired/wireless networks. In this paper, we consider the straggler problem in coded distributed computing with link failures, by modeling the links betwe… ▽ More Coded computation is a framework which provides redundancy in distributed computing systems to speed up largescale tasks. Although most existing works assume an error-free scenarios in a master-worker setup, the link failures are common in current wired/wireless networks. In this paper, we consider the straggler problem in coded distributed computing with link failures, by modeling the links between the master node and worker nodes as packet erasure channels. When the master fails to detect the received signal, retransmission is required for each worker which increases the overall run-time to finish the task. We first investigate the expected overall run-time in this setting using an (n; k) maximum distance separable (MDS) code. We obtain the lower and upper bounds on the latency in closed-forms and give guidelines to design MDS code depending on the packet erasure probability. Finally, we consider a setup where the number of retransmissions is limited due to the bandwidth constraint. By formulating practical optimization problems related to latency, transmission bandwidth and probability of successful computation, we obtain achievable performance curves as a function of packet erasure probability. △ Less

Submitted 11 January, 2019; originally announced January 2019.

Comments: 12 pages

arXiv:1811.11852 [pdf]

doi 10.1088/1361-6560/ab0606

Joint Correction of Attenuation and Scatter Using Deep Convolutional Neural Networks (DCNN) for Time-of-Flight PET

Authors: Jaewon Yang, Dookun Park, Jae Ho Sohn, Zhen Jane Wang, Grant T. Gullberg, Youngho Seo

Abstract: Deep convolutional neural networks (DCNN) have demonstrated its capability to convert MR image to pseudo CT for PET attenuation correction in PET/MRI. Conventionally, attenuated events are corrected in sinogram space using attenuation maps derived from CT or MR-derived pseudo CT. Separately, scattered events are iteratively estimated by a 3D model-based simulation using down-sampled attenuation an… ▽ More Deep convolutional neural networks (DCNN) have demonstrated its capability to convert MR image to pseudo CT for PET attenuation correction in PET/MRI. Conventionally, attenuated events are corrected in sinogram space using attenuation maps derived from CT or MR-derived pseudo CT. Separately, scattered events are iteratively estimated by a 3D model-based simulation using down-sampled attenuation and emission sinograms. However, no studies have investigated joint correction of attenuation and scatter using DCNN in image space. Therefore, we aim to develop and optimize a DCNN model for attenuation and scatter correction (ASC) simultaneously in PET image space without additional anatomical imaging or time-consuming iterative scatter simulation. For the first time, we demonstrated the feasibility of directly producing PET images corrected for attenuation and scatter using DCNN (PET-DCNN) from noncorrected PET (PET-NC) images. △ Less

Submitted 28 November, 2018; originally announced November 2018.

Comments: 4 pages, 7 figures, IEEE MIC 2018 conference

arXiv:1801.04686 [pdf, other]

Hierarchical Coding for Distributed Computing

Authors: Hyegyeong Park, Kangwook Lee, Jy-yong Sohn, Changho Suh, Jaekyun Moon

Abstract: Coding for distributed computing supports low-latency computation by relieving the burden of straggling workers. While most existing works assume a simple master-worker model, we consider a hierarchical computational structure consisting of groups of workers, motivated by the need to reflect the architectures of real-world distributed computing systems. In this work, we propose a hierarchical codi… ▽ More Coding for distributed computing supports low-latency computation by relieving the burden of straggling workers. While most existing works assume a simple master-worker model, we consider a hierarchical computational structure consisting of groups of workers, motivated by the need to reflect the architectures of real-world distributed computing systems. In this work, we propose a hierarchical coding scheme for this model, as well as analyze its decoding cost and expected computation time. Specifically, we first provide upper and lower bounds on the expected computing time of the proposed scheme. We also show that our scheme enables efficient parallel decoding, thus reducing decoding costs by orders of magnitude over non-hierarchical schemes. When considering both decoding cost and computing time, the proposed hierarchical coding is shown to outperform existing schemes in many practical scenarios. △ Less

Submitted 15 January, 2018; originally announced January 2018.

Comments: 7 pages, part of the paper is submitted to ISIT2018

arXiv:1801.02287 [pdf, other]

Explicit Constructions of MBR and MSR Codes for Clustered Distributed Storage

Authors: Jy-yong Sohn, Beongjun Choi, Jaekyun Moon

Abstract: This paper considers capacity-achieving coding for the clustered form of distributed storage that reflects practical storage networks. To reflect the clustered structure with limited cross-cluster communication bandwidths, nodes in the same cluster are set to communicate $β_I$ symbols, while nodes in other clusters can communicate $β_c \leq β_I$ symbols with one another. We provide two types of ex… ▽ More This paper considers capacity-achieving coding for the clustered form of distributed storage that reflects practical storage networks. To reflect the clustered structure with limited cross-cluster communication bandwidths, nodes in the same cluster are set to communicate $β_I$ symbols, while nodes in other clusters can communicate $β_c \leq β_I$ symbols with one another. We provide two types of exact regenerating codes which achieve the capacity of clustered distributed storage: the minimum-bandwidth-regenerating (MBR) codes and the minimum-storage-regenerating (MSR) codes. First, we construct MBR codes for general parameter settings of clustered distributed storage. The suggested MBR code is a generalization of an existing code proposed by Rashmi et al., for scenarios where storage nodes are dispersed into L > 1 clusters. The proposed MBR code for the $β_c = 0$ case requires a much smaller field size compared to existing local MBR codes. Secondly, we devise MSR codes for clustered distributed storage. Focus is given on two important cases:$ε=0$ and $ε\in [1/(n-k), 1]$, where $ε=β_c/β_I$ is the ratio of the available cross- to intra-cluster repair bandwidths, n is the total number of distributed nodes and k is the number of contact nodes in data retrieval. The former represents the scenario where cross-cluster communication is not allowed, while the latter corresponds to the case of minimum node storage overhead. For $ε=0$, two existing locally repairable codes are proven to be MSR codes for the clustered model. For $ε\in [1/(n-k), 1]$, existing MSR codes for the non-clustered model are applicable to clustered scenarios with a simple modification. △ Less

Submitted 2 August, 2019; v1 submitted 7 January, 2018; originally announced January 2018.

Comments: 25 pages, submitted to IEEE Transactions on Information Theory

arXiv:1801.02014 [pdf, other]

A Class of MSR Codes for Clustered Distributed Storage

Authors: Jy-yong Sohn, Beongjun Choi, Jaekyun Moon

Abstract: Clustered distributed storage models real data centers where intra- and cross-cluster repair bandwidths are different. In this paper, exact-repair minimum-storage-regenerating (MSR) codes achieving capacity of clustered distributed storage are designed. Focus is given on two cases: $ε=0$ and $ε=1/(n-k)$, where $ε$ is the ratio of the available cross- and intra-cluster repair bandwidths, $n$ is the… ▽ More Clustered distributed storage models real data centers where intra- and cross-cluster repair bandwidths are different. In this paper, exact-repair minimum-storage-regenerating (MSR) codes achieving capacity of clustered distributed storage are designed. Focus is given on two cases: $ε=0$ and $ε=1/(n-k)$, where $ε$ is the ratio of the available cross- and intra-cluster repair bandwidths, $n$ is the total number of distributed nodes and $k$ is the number of contact nodes in data retrieval. The former represents the scenario where cross-cluster communication is not allowed, while the latter corresponds to the case of minimum cross-cluster bandwidth that is possible under the minimum storage overhead constraint. For the $ε=0$ case, two types of locally repairable codes are proven to achieve the MSR point. As for $ε=1/(n-k)$, an explicit MSR coding scheme is suggested for the two-cluster situation under the specific condition of $n = 2k$. △ Less

Submitted 6 January, 2018; originally announced January 2018.

Comments: 9 pages, a part of this paper is submitted to IEEE ISIT2018

arXiv:1710.02821 [pdf, other]

Capacity of Clustered Distributed Storage

Authors: Jy-yong Sohn, Beongjun Choi, Sung Whan Yoon, Jaekyun Moon

Abstract: A new system model reflecting the clustered structure of distributed storage is suggested to investigate interplay between storage overhead and repair bandwidth as storage node failures occur. Large data centers with multiple racks/disks or local networks of storage devices (e.g. sensor network) are good applications of the suggested clustered model. In realistic scenarios involving clustered stor… ▽ More A new system model reflecting the clustered structure of distributed storage is suggested to investigate interplay between storage overhead and repair bandwidth as storage node failures occur. Large data centers with multiple racks/disks or local networks of storage devices (e.g. sensor network) are good applications of the suggested clustered model. In realistic scenarios involving clustered storage structures, repairing storage nodes using intact nodes residing in other clusters is more bandwidth-consuming than restoring nodes based on information from intra-cluster nodes. Therefore, it is important to differentiate between intra-cluster repair bandwidth and cross-cluster repair bandwidth in modeling distributed storage. Capacity of the suggested model is obtained as a function of fundamental resources of distributed storage systems, namely, node storage capacity, intra-cluster repair bandwidth and cross-cluster repair bandwidth. The capacity is shown to be asymptotically equivalent to a monotonic decreasing function of number of clusters, as the number of storage nodes increases without bound. Based on the capacity expression, feasible sets of required resources which enable reliable storage are obtained in a closed-form solution. Specifically, it is shown that the cross-cluster traffic can be minimized to zero (i.e., intra-cluster local repair becomes possible) by allowing extra resources on storage capacity and intra-cluster repair bandwidth, according to the law specified in the closed-form. The network coding schemes with zero cross-cluster traffic are defined as intra-cluster repairable codes, which are shown to be a class of the previously developed locally repairable codes. △ Less

Submitted 1 May, 2018; v1 submitted 8 October, 2017; originally announced October 2017.

Comments: To Appear at IEEE Transactions on Information Theory

arXiv:1710.02811 [pdf, other]

On Reusing Pilots Among Interfering Cells in Massive MIMO

Authors: Jy-yong Sohn, Sung Whan Yoon, Jaekyun Moon

Abstract: Pilot contamination, caused by the reuse of pilots among interfering cells, remains as a significant obstacle that limits the performance of massive multi-input multi-output antenna systems. To handle this problem, less aggressive reuse of pilots involving allocation of additional pilots for interfering users is closely examined in this paper. Hierarchical pilot reuse methods are proposed, which e… ▽ More Pilot contamination, caused by the reuse of pilots among interfering cells, remains as a significant obstacle that limits the performance of massive multi-input multi-output antenna systems. To handle this problem, less aggressive reuse of pilots involving allocation of additional pilots for interfering users is closely examined in this paper. Hierarchical pilot reuse methods are proposed, which effectively mitigate pilot contamination and increase the net throughput of the system. Among the suggested hierarchical pilot reuse schemes, the optimal way of assigning pilots to different users is obtained in a closed-form solution which maximizes the net sum-rate in a given coherence time. Simulation results confirm that when the ratio of the channel coherence time to the number of users in each cell is sufficiently large, less aggressive reuse of pilots yields significant performance advantage relative to the case where all cells reuse the same pilot set. △ Less

Submitted 8 October, 2017; originally announced October 2017.

Comments: 13 pages, to be appear in IEEE Transactions on Wireless Communications

arXiv:1705.01061 [pdf, other]

Pilot Reuse Strategy Maximizing the Weighted-Sum-Rate in Massive MIMO Systems

Authors: Jy-yong Sohn, Sung Whan Yoon, Jaekyun Moon

Abstract: Pilot reuse in multi-cell massive multi-input multi-output (MIMO) system is investigated where user groups with different priorities exist. Recent investigation on pilot reuse has revealed that when the ratio of the coherent time interval to the number of users is reasonably high, it is beneficial not to fully reuse pilots from interfering cells. This work finds the optimum pilot assignment strate… ▽ More Pilot reuse in multi-cell massive multi-input multi-output (MIMO) system is investigated where user groups with different priorities exist. Recent investigation on pilot reuse has revealed that when the ratio of the coherent time interval to the number of users is reasonably high, it is beneficial not to fully reuse pilots from interfering cells. This work finds the optimum pilot assignment strategy that would maximize the weighted sum rate (WSR) given the user groups with different priorities. A closed-form solution for the optimal pilot assignment is derived and is shown to make intuitive sense. Performance comparison shows that under wide range of channel conditions, the optimal pilot assignment that uses extra set of pilots achieves better WSR performance than conventional full pilot reuse. △ Less

Submitted 2 May, 2017; originally announced May 2017.

Comments: 13 pages, to appear in IEEE Journal on Selected Areas in Communications 2017

arXiv:1702.07498 [pdf, other]

Secure Clustered Distributed Storage Against Eavesdroppers

Authors: Beongjun Choi, Jy-yong Sohn, Sung Whan Yoon, Jaekyun Moon

Abstract: This paper considers the security issue of practical distributed storage systems (DSSs) which consist of multiple clusters of storage nodes. Noticing that actual storage nodes constituting a DSS are distributed in multiple clusters, two novel eavesdropper models - the node-restricted model and the cluster-restricted model - are suggested which reflect the clustered nature of DSSs. In the node-rest… ▽ More This paper considers the security issue of practical distributed storage systems (DSSs) which consist of multiple clusters of storage nodes. Noticing that actual storage nodes constituting a DSS are distributed in multiple clusters, two novel eavesdropper models - the node-restricted model and the cluster-restricted model - are suggested which reflect the clustered nature of DSSs. In the node-restricted model, an eavesdropper cannot access the individual nodes, but can eavesdrop incoming/outgoing data for $L_c$ compromised clusters. In the cluster-restricted model, an eavesdropper can access a total of $l$ individual nodes but the number of accessible clusters is limited to $L_c$. We provide an upper bound on the securely storable data for each model, while a specific network coding scheme which achieves the upper bound is obtained for the node-restricted model, given some mild condition on the node storage size. △ Less

Submitted 24 February, 2017; originally announced February 2017.

Comments: 6 pages, accepted at IEEE ICC 2017

Showing 1–50 of 52 results for author: Sohn, J