Adversarial Style Augmentation via Large Language Model for Robust Fake News Detection

Authors: Sungwon Park, Sungwon Han, Meeyoung Cha

Abstract: The spread of fake news negatively impacts individuals and is regarded as a significant social challenge that needs to be addressed. A number of algorithmic and insightful features have been identified for detecting fake news. However, with the recent LLMs and their advanced generation capabilities, many of the detectable features (e.g., style-conversion attacks) can be altered, making it more challenging to distinguish from real news. This study proposes adversarial style augmentation, AdStyle, to train a fake news detector that remains robust against various style-conversion attacks. Our model's key mechanism is the careful use of LLMs to automatically generate a diverse yet coherent range of style-conversion attack prompts. This improves the generation of prompts that are particularly difficult for the detector to handle. Experiments show that our augmentation strategy improves robustness and detection performance when tested on fake news benchmark datasets.

Submitted 22 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: 8 pages

arXiv:2406.09799 [pdf, other]

GeoSEE: Regional Socio-Economic Estimation With a Large Language Model

Authors: Sungwon Han, Donghyun Ahn, Seungeon Lee, Minhyuk Song, Sungwon Park, Sangyoon Park, Jihee Kim, Meeyoung Cha

Abstract: Moving beyond traditional surveys, combining heterogeneous data sources with AI-driven inference models brings new opportunities to measure socio-economic conditions, such as poverty and population, over expansive geographic areas. The current research presents GeoSEE, a method that can estimate various socio-economic indicators using a unified pipeline powered by a large language model (LLM). Presented with a diverse set of information modules, including those pre-constructed from satellite imagery, GeoSEE selects which modules to use in estimation, for each indicator and country. This selection is guided by the LLM's prior socio-geographic knowledge, which functions similarly to the insights of a domain expert. The system then computes target indicators via in-context learning after aggregating results from selected modules in the format of natural language-based texts. Comprehensive evaluation across countries at various stages of development reveals that our method outperforms other predictive models in both unsupervised and low-shot contexts. This reliable performance under data-scarce setting in under-developed or developing countries, combined with its cost-effectiveness, underscores its potential to continuously support and monitor the progress of Sustainable Development Goals, such as poverty alleviation and equitable growth, on a global scale.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.08020 [pdf, other]

Generalizable Disaster Damage Assessment via Change Detection with Vision Foundation Model

Authors: Kyeongjin Ahn, Sungwon Han, Sungwon Park, Jihee Kim, Sangyoon Park, Meeyoung Cha

Abstract: The increasing frequency and intensity of natural disasters demand more sophisticated approaches for rapid and precise damage assessment. To tackle this issue, researchers have developed various methods on disaster benchmark datasets from satellite imagery to aid in detecting disaster damage. However, the diverse nature of geographical landscapes and disasters makes it challenging to apply existing methods to regions unseen during training. We present DAVI (Disaster Assessment with VIsion foundation model), which overcomes domain disparities and detects structural damage (e.g., building) without requiring ground-truth labels of the target region. DAVI integrates task-specific knowledge from a model trained on source regions with an image segmentation foundation model to generate pseudo labels of possible damage in the target region. It then employs a two-stage refinement process, targeting both the pixel and overall image, to more accurately pinpoint changes in disaster-struck areas based on before-and-after images. Comprehensive evaluations demonstrate that DAVI achieves exceptional performance across diverse terrains (e.g., USA and Mexico) and disaster types (e.g., wildfires, hurricanes, and earthquakes). This confirms its robustness in assessing disaster impact without dependence on ground-truth labels.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 9 pages, 4 figures, 2 tables

arXiv:2405.18986 [pdf, other]

Robust Optimization in Protein Fitness Landscapes Using Reinforcement Learning in Latent Space

Authors: Minji Lee, Luiz Felipe Vecchietti, Hyunkyu Jung, Hyun Joo Ro, Meeyoung Cha, Ho Min Kim

Abstract: Proteins are complex molecules responsible for different functions in nature. Enhancing the functionality of proteins and cellular fitness can significantly impact various industries. However, protein optimization using computational methods remains challenging, especially when starting from low-fitness sequences. We propose LatProtRL, an optimization method to efficiently traverse a latent space learned by an encoder-decoder leveraging a large protein language model. To escape local optima, our optimization is modeled as a Markov decision process using reinforcement learning acting directly in latent space. We evaluate our approach on two important fitness optimization tasks, demonstrating its ability to achieve comparable or superior fitness over baseline methods. Our findings and in vitro evaluation show that the generated sequences can reach high-fitness regions, suggesting a substantial potential of LatProtRL in lab-in-the-loop scenarios.

Submitted 29 May, 2024; originally announced May 2024.

Comments: ICML 2024

arXiv:2405.04990 [pdf, other]

Health Index Estimation Through Integration of General Knowledge with Unsupervised Learning

Authors: Kristupas Bajarunas, Marcia L. Baptista, Kai Goebel, Manuel A. Chao

Abstract: Accurately estimating a Health Index (HI) from condition monitoring data (CM) is essential for reliable and interpretable prognostics and health management (PHM) in complex systems. In most scenarios, complex systems operate under varying operating conditions and can exhibit different fault modes, making unsupervised inference of an HI from CM data a significant challenge. Hybrid models combining prior knowledge about degradation with deep learning models have been proposed to overcome this challenge. However, previously suggested hybrid models for HI estimation usually rely heavily on system-specific information, limiting their transferability to other systems. In this work, we propose an unsupervised hybrid method for HI estimation that integrates general knowledge about degradation into the convolutional autoencoder's model architecture and learning algorithm, enhancing its applicability across various systems. The effectiveness of the proposed method is demonstrated in two case studies from different domains: turbofan engines and lithium batteries. The results show that the proposed method outperforms other competitive alternatives, including residual-based methods, in terms of HI quality and their utility for Remaining Useful Life (RUL) predictions. The case studies also highlight the comparable performance of our proposed method with a supervised model trained with HI labels.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2404.11905 [pdf, other]

FedMID: A Data-Free Method for Using Intermediate Outputs as a Defense Mechanism Against Poisoning Attacks in Federated Learning

Authors: Sungwon Han, Hyeonho Song, Sungwon Park, Meeyoung Cha

Abstract: Federated learning combines local updates from clients to produce a global model, which is susceptible to poisoning attacks. Most previous defense strategies relied on vectors derived from projections of local updates on a Euclidean space; however, these methods fail to accurately represent the functionality and structure of local models, resulting in inconsistent performance. Here, we present a new paradigm to defend against poisoning attacks in federated learning using functional mappings of local models based on intermediate outputs. Experiments show that our mechanism is robust under a broad range of computing conditions and advanced attack scenarios, enabling safer collaboration among data-sensitive participants via federated learning.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2403.05861 [pdf, ps, other]

DeepVM: Integrating Spot and On-Demand VMs for Cost-Efficient Deep Learning Clusters in the Cloud

Authors: Yoochan Kim, Kihyun Kim, Yonghyeon Cho, Jinwoo Kim, Awais Khan, Ki-Dong Kang, Baik-Song An, Myung-Hoon Cha, Hong-Yeon Kim, Youngjae Kim

Abstract: Distributed Deep Learning (DDL), as a paradigm, dictates the use of GPU-based clusters as the optimal infrastructure for training large-scale Deep Neural Networks (DNNs). However, the high cost of such resources makes them inaccessible to many users. Public cloud services, particularly Spot Virtual Machines (VMs), offer a cost-effective alternative, but their unpredictable availability poses a significant challenge to the crucial checkpointing process in DDL. To address this, we introduce DeepVM, a novel solution that recommends cost-effective cluster configurations by intelligently balancing the use of Spot and On-Demand VMs. DeepVM leverages a four-stage process that analyzes instance performance using the FLOPP (FLoating-point Operations Per Price) metric, performs architecture-level analysis with linear programming, and identifies the optimal configuration for the user-specific needs. Extensive simulations and real-world deployments in the AWS environment demonstrate that DeepVM consistently outperforms other policies, reducing training costs and overall makespan. By enabling cost-effective checkpointing with Spot VMs, DeepVM opens up DDL to a wider range of users and facilitates a more efficient training of complex DNNs.

Submitted 14 March, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

Comments: 14 pages, 8 figures
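
The FLOPP metric named in the abstract above (floating-point operations per price) can be illustrated with a minimal sketch. The abstract does not give the exact formula or units, so the ratio below (throughput divided by hourly price) is an assumption, and the instance names, throughput figures, and prices are hypothetical placeholders rather than real AWS data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str              # hypothetical label, not a real instance type
    tflops: float          # assumed peak throughput in TFLOP/s
    price_per_hour: float  # assumed hourly price in USD


def flopp(inst: Instance) -> float:
    """Throughput per unit price (assumed form of the FLOPP ratio)."""
    if inst.price_per_hour <= 0:
        raise ValueError("price_per_hour must be positive")
    return inst.tflops / inst.price_per_hour


def rank_by_flopp(instances):
    """Return instances sorted from best to worst price-performance."""
    return sorted(instances, key=flopp, reverse=True)


# Made-up candidates mixing Spot and On-Demand offerings.
candidates = [
    Instance("spot-a", 100.0, 1.0),
    Instance("ondemand-a", 100.0, 3.0),
    Instance("spot-b", 40.0, 0.5),
]
ranked = [inst.name for inst in rank_by_flopp(candidates)]
print(ranked)
```

A ranking like this covers only the first stage the abstract mentions; the linear-programming analysis and the handling of Spot interruptions are not modelled here.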

arXiv:2402.10436 [pdf, other]

I Am Not Them: Fluid Identities and Persistent Out-group Bias in Large Language Models

Authors: Wenchao Dong, Assem Zhunis, Hyojin Chin, Jiyoung Han, Meeyoung Cha

Abstract: We explored cultural biases (individualism vs. collectivism) in ChatGPT across three Western languages (i.e., English, German, and French) and three Eastern languages (i.e., Chinese, Japanese, and Korean). When ChatGPT adopted an individualistic persona in Western languages, its collectivism scores (i.e., out-group values) exhibited a more negative trend, surpassing their positive orientation towards individualism (i.e., in-group values). Conversely, when a collectivistic persona was assigned to ChatGPT in Eastern languages, a similar pattern emerged with more negative responses toward individualism (i.e., out-group values) as compared to collectivism (i.e., in-group values). The results indicate that when imbued with a particular social identity, ChatGPT discerns in-group and out-group, embracing in-group values while eschewing out-group values. Notably, the negativity towards the out-group, from which prejudices and discrimination arise, exceeded the positivity towards the in-group. The experiment was replicated in the political domain, and the results remained consistent. Furthermore, this replication unveiled an intrinsic Democratic bias in Large Language Models (LLMs), aligning with earlier findings and providing integral insights into mitigating such bias through prompt engineering. Extensive robustness checks were performed using varying hyperparameter and persona setup methods, with or without social identity labels, across other popular language models.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2401.09466 [pdf, other]

Self Supervised Vision for Climate Downscaling

Authors: Karandeep Singh, Chaeyoon Jeong, Naufal Shidqi, Sungwon Park, Arjun Nellikkattil, Elke Zeller, Meeyoung Cha

Abstract: Climate change is one of the most critical challenges that our planet is facing today. Rising global temperatures are already bringing noticeable changes to Earth's weather and climate patterns with an increased frequency of unpredictable and extreme weather events. Future projections for climate change research are based on Earth System Models (ESMs), the computer models that simulate the Earth's climate system. ESMs provide a framework to integrate various physical systems, but their output is bound by the enormous computational resources required for running and archiving higher-resolution simulations. For a given resource budget, the ESMs are generally run on a coarser grid, followed by a computationally lighter downscaling process to obtain a finer-resolution output. In this work, we present a deep-learning model for downscaling ESM simulation data that does not require high-resolution ground truth data for model optimization. This is realized by leveraging salient data distribution patterns and the hidden dependencies between weather variables for an individual data point at runtime. Extensive evaluation with 2x, 3x, and 4x scaling factors demonstrates that the proposed model consistently obtains superior performance over that of various baselines. The improved downscaling performance and no dependence on high-resolution ground truth data make the proposed method a valuable tool for climate research and mark it as a promising direction for future research.

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.08939 [pdf, other]

Enhancing Campus Mobility: Achievements and Challenges of Autonomous Shuttle "Snow Lion"

Authors: Yingbing Chen, Jie Cheng, Sheng Wang, Hongji Liu, Xiaodong Mei, Xiaoyang Yan, Mingkai Tang, Ge Sun, Ya Wen, Junwei Cai, Xupeng Xie, Lu Gan, Mandan Chao, Ren Xin, Ming Liu, Jianhao Jiao, Kangcheng Liu, Lujia Wang

Abstract: The rapid evolution of autonomous vehicles (AVs) has significantly influenced global transportation systems. In this context, we present "Snow Lion", an autonomous shuttle meticulously designed to revolutionize on-campus transportation, offering a safer and more efficient mobility solution for students, faculty, and visitors. The primary objective of this research is to enhance campus mobility by providing a reliable, efficient, and eco-friendly transportation solution that seamlessly integrates with existing infrastructure and meets the diverse needs of a university setting. To achieve this goal, we delve into the intricacies of the system design, encompassing sensing, perception, localization, planning, and control aspects. We evaluate the autonomous shuttle's performance in real-world scenarios, involving a 1146-kilometer road haul and the transportation of 442 passengers over a two-month period. These experiments demonstrate the effectiveness of our system and offer valuable insights into the intricate process of integrating an autonomous vehicle within campus shuttle operations. Furthermore, a thorough analysis of the lessons derived from this experience furnishes a valuable real-world case study, accompanied by recommendations for future research and development in the field of autonomous driving.

Submitted 16 January, 2024; originally announced January 2024.

Comments: 9 pages, 9 figures

arXiv:2312.15166 [pdf, other]

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Authors: Dahyun Kim, Chanjun Park, Sanghoon Kim, Wonsung Lee, Wonho Song, Yunsu Kim, Hyeonwoo Kim, Yungi Kim, Hyeonju Lee, Jihoo Kim, Changbae Ahn, Seonghoon Yang, Sukyung Lee, Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee, Sunghun Kim

Abstract: We introduce SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. Inspired by recent efforts to efficiently up-scale LLMs, we present a method for scaling LLMs called depth up-scaling (DUS), which encompasses depthwise scaling and continued pretraining. In contrast to other LLM up-scaling methods that use mixture-of-experts, DUS does not require complex changes to train and inference efficiently. We show experimentally that DUS is simple yet effective in scaling up high-performance LLMs from small ones. Building on the DUS model, we additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction-following capabilities, surpassing Mixtral-8x7B-Instruct. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.

Submitted 3 April, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

Comments: accepted to NAACL 2024 Industry Track

arXiv:2311.10922 [pdf, other]

doi 10.8080/1020220078265

Explainable Product Classification for Customs

Authors: Eunji Lee, Sihyeon Kim, Sundong Kim, Soyeon Jung, Heeja Kim, Meeyoung Cha

Abstract: The task of assigning internationally accepted commodity codes (aka HS codes) to traded goods is a critical function of customs offices. Like court decisions made by judges, this task follows the doctrine of precedent and can be nontrivial even for experienced officers. Together with the Korea Customs Service (KCS), we propose a first-ever explainable decision supporting model that suggests the most likely subheadings (i.e., the first six digits) of the HS code. The model also provides reasoning for its suggestion in the form of a document that is interpretable by customs officers. We evaluated the model using 5,000 cases that recently received a classification request. The results showed that the top-3 suggestions made by our model had an accuracy of 93.9% when classifying 925 challenging subheadings. A user study with 32 customs experts further confirmed that our algorithmic suggestions accompanied by explainable reasonings, can substantially reduce the time and effort taken by customs officers for classification reviews.

Submitted 17 November, 2023; originally announced November 2023.

Comments: 24 pages, Accepted to ACM Transactions on Intelligent Systems and Technology
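
The top-3 accuracy reported in the abstract above counts a case as correct when the true subheading appears among the three highest-ranked suggestions. A minimal sketch of that metric follows; the function name, the six-digit codes, and the four example cases are hypothetical placeholders and are not taken from the paper's data.

```python
def top_k_accuracy(ranked_predictions, true_labels, k=3):
    """Fraction of cases whose true label is among the first k suggestions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("at least one case is required")
    hits = sum(
        1
        for preds, truth in zip(ranked_predictions, true_labels)
        if truth in preds[:k]
    )
    return hits / len(true_labels)


# Placeholder six-digit subheadings, best suggestion first.
predictions = [
    ["111111", "222222", "333333", "444444"],  # truth ranked 1st: hit
    ["222222", "111111", "333333", "444444"],  # truth ranked 3rd: hit
    ["444444", "333333", "222222", "111111"],  # truth ranked 4th: miss
    ["333333", "444444", "111111", "222222"],  # truth ranked 2nd: hit
]
truths = ["111111", "333333", "111111", "444444"]

score = top_k_accuracy(predictions, truths, k=3)
print(score)
```

With k=1 the same function reduces to ordinary accuracy, which is why top-3 figures are always at least as high as top-1 figures on the same cases.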

arXiv:2310.19635 [pdf, other]

Bidirectional Captioning for Clinically Accurate and Interpretable Models

Authors: Keegan Quigley, Miriam Cha, Josh Barua, Geeticka Chauhan, Seth Berkowitz, Steven Horng, Polina Golland

Abstract: Vision-language pretraining has been shown to produce high-quality visual encoders which transfer efficiently to downstream computer vision tasks. While generative language models have gained widespread attention, image captioning has thus far been mostly overlooked as a form of cross-modal pretraining in favor of contrastive learning, especially in medical image analysis. In this paper, we experiment with bidirectional captioning of radiology reports as a form of pretraining and compare the quality and utility of learned embeddings with those from contrastive pretraining methods. We optimize a CNN encoder, transformer decoder architecture named RadTex for the radiology domain. Results show that not only does captioning pretraining yield visual encoders that are competitive with contrastive pretraining (CheXpert competition multi-label AUC of 89.4%), but also that our transformer decoder is capable of generating clinically relevant reports (captioning macro-F1 score of 0.349 using CheXpert labeler) and responding to prompts with targeted, interactive outputs.

Submitted 30 October, 2023; originally announced October 2023.

Comments: 12 pages, 7 figures. Code release to follow

arXiv:2310.05189 [pdf, ps, other]

Factuality Challenges in the Era of Large Language Models

Authors: Isabelle Augenstein, Timothy Baldwin, Meeyoung Cha, Tanmoy Chakraborty, Giovanni Luca Ciampaglia, David Corney, Renee DiResta, Emilio Ferrara, Scott Hale, Alon Halevy, Eduard Hovy, Heng Ji, Filippo Menczer, Ruben Miguez, Preslav Nakov, Dietram Scheufele, Shivam Sharma, Giovanni Zagni

Abstract: The emergence of tools based on Large Language Models (LLMs), such as OpenAI's ChatGPT, Microsoft's Bing Chat, and Google's Bard, has garnered immense public attention. These incredibly useful, natural-sounding tools mark significant advances in natural language generation, yet they exhibit a propensity to generate false, erroneous, or misleading content -- commonly referred to as "hallucinations." Moreover, LLMs can be exploited for malicious applications, such as generating false but credible-sounding content and profiles at scale. This poses a significant challenge to society in terms of the potential deception of users and the increasing dissemination of inaccurate information. In light of these risks, we explore the kinds of technological innovations, regulatory reforms, and AI literacy initiatives needed from fact-checkers, news organizations, and the broader research and policy communities. By identifying the risks, the imminent threats, and some viable solutions, we seek to shed light on navigating various aspects of veracity in the era of generative AI.

Submitted 9 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

Comments: Our article offers a comprehensive examination of the challenges and risks associated with Large Language Models (LLMs), focusing on their potential impact on the veracity of information in today's digital landscape

arXiv:2309.00196 [pdf, other]

doi 10.1145/3583780.3615254

A Comparative Study of Reference Reliability in Multiple Language Editions of Wikipedia

Authors: Aitolkyn Baigutanova, Diego Saez-Trumper, Miriam Redi, Meeyoung Cha, Pablo Aragón

Abstract: Information presented in Wikipedia articles must be attributable to reliable published sources in the form of references. This study examines over 5 million Wikipedia articles to assess the reliability of references in multiple language editions. We quantify the cross-lingual patterns of the perennial sources list, a collection of reliability labels for web domains identified and collaboratively agreed upon by Wikipedia editors. We discover that some sources (or web domains) deemed untrustworthy in one language (i.e., English) continue to appear in articles in other languages. This trend is especially evident with sources tailored for smaller communities. Furthermore, non-authoritative sources found in the English version of a page tend to persist in other language versions of that page. We finally present a case study on the Chinese, Russian, and Swedish Wikipedias to demonstrate a discrepancy in reference reliability across cultures. Our finding highlights future challenges in coordinating global knowledge on source reliability.

Submitted 4 September, 2023; v1 submitted 31 August, 2023; originally announced September 2023.

Comments: Conference on Information & Knowledge Management (CIKM '23)

arXiv:2308.15979 [pdf, other]

doi 10.1145/3583780.3615226

Fine-Grained Socioeconomic Prediction from Satellite Images with Distributional Adjustment

Authors: Donghyun Ahn, Minhyuk Song, Seungeon Lee, Yubin Choi, Jihee Kim, Sangyoon Park, Hyunjoo Yang, Meeyoung Cha

Abstract: While measuring socioeconomic indicators is critical for local governments to make informed policy decisions, such measurements are often unavailable at fine-grained levels like municipality. This study employs deep learning-based predictions from satellite images to close the gap. We propose a method that assigns a socioeconomic score to each satellite image by capturing the distributional behavior observed in larger areas based on the ground truth. We train an ordinal regression scoring model and adjust the scores to follow the common power law within and across regions. Evaluation based on official statistics in South Korea shows that our method outperforms previous models in predicting population and employment size at both the municipality and grid levels. Our method also demonstrates robust performance in districts with uneven development, suggesting its potential use in developing countries where reliable, fine-grained data is scarce.

Submitted 4 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

ACM Class: J.4

arXiv:2308.09318 [pdf, other]

Towards Attack-tolerant Federated Learning via Critical Parameter Analysis

Authors: Sungwon Han, Sungwon Park, Fangzhao Wu, Sundong Kim, Bin Zhu, Xing Xie, Meeyoung Cha

Abstract: Federated learning is used to train a shared model in a decentralized way without clients sharing private data with each other. Federated learning systems are susceptible to poisoning attacks when malicious clients send false updates to the central server. Existing defense strategies are ineffective under non-IID data settings. This paper proposes a new defense strategy, FedCPA (Federated learning with Critical Parameter Analysis). Our attack-tolerant aggregation method is based on the observation that benign local models have similar sets of top-k and bottom-k critical parameters, whereas poisoned local models do not. Experiments with different attack scenarios on multiple datasets demonstrate that our model outperforms existing defense strategies in defending against poisoning attacks.

Submitted 18 August, 2023; originally announced August 2023.

Comments: ICCV'23 Accepted

arXiv:2307.09048 [pdf, other]

FedDefender: Client-Side Attack-Tolerant Federated Learning

Authors: Sungwon Park, Sungwon Han, Fangzhao Wu, Sundong Kim, Bin Zhu, Xing Xie, Meeyoung Cha

Abstract: Federated learning enables learning from decentralized data sources without compromising privacy, which makes it a crucial technique. However, it is vulnerable to model poisoning attacks, where malicious clients interfere with the training process. Previous defense mechanisms have focused on the server-side by using careful model aggregation, but this may not be effective when the data is not identically distributed or when attackers can access the information of benign clients. In this paper, we propose a new defense mechanism that focuses on the client-side, called FedDefender, to help benign clients train robust local models and avoid the adverse impact of malicious model updates from attackers, even when a server-side defense cannot identify or remove adversaries. Our method consists of two main components: (1) attack-tolerant local meta update and (2) attack-tolerant global knowledge distillation. These components are used to find noise-resilient model parameters while accurately extracting knowledge from a potentially corrupted global model. Our client-side defense strategy has a flexible structure and can work in conjunction with any existing server-side strategies. Evaluations of real-world scenarios across multiple datasets show that the proposed method enhances the robustness of federated learning against model poisoning attacks.

Submitted 18 July, 2023; originally announced July 2023.

Comments: KDD'23 research track accepted

arXiv:2306.06176 [pdf, other]

Quantitative Analysis of Cultural Dynamics Seen from an Event-based Social Network

Authors: Bayu Adhi Tama, Jaehong Kim, Jaehyuk Park, Lev Manovich, Meeyoung Cha

Abstract: Culture is a collection of connected and potentially interactive patterns that characterize a social group or a passed-on idea that people acquire as members of society. While offline activities can provide a better picture of the geographical association of cultural traits than online activities, gathering such data on a large scale has been challenging. Here, we use multi-decade longitudinal records of cultural events from Meetup.com, the largest event-based social networking service, to examine the landscape of offline cultural events. We analyze the temporal and categorical event dynamics driven by cultural diversity using over 2 million event logs collected over 17 years in 90 countries. Our results show that the national economic status explains 44.6 percent of the variance in total event count, while cultural characteristics such as individualism and long-term orientation explain 32.8 percent of the variance in topic categories. Furthermore, our analysis using hierarchical clustering reveals cultural proximity between the topics of socio-cultural activities (e.g., politics, leisure, health, technology). We expect that this work provides a landscape of social and cultural activities across the world, which allows us to better understand their dynamical patterns as well as their associations with cultural characteristics.

Submitted 9 June, 2023; originally announced June 2023.

arXiv:2306.04738 [pdf, other]

MultiEarth 2023 -- Multimodal Learning for Earth and Environment Workshop and Challenge

Authors: Miriam Cha, Gregory Angelides, Mark Hamilton, Andy Soszynski, Brandon Swenson, Nathaniel Maidel, Phillip Isola, Taylor Perron, Bill Freeman

Abstract: The Multimodal Learning for Earth and Environment Workshop (MultiEarth 2023) is the second annual CVPR workshop aimed at the monitoring and analysis of the health of Earth ecosystems by leveraging the vast amount of remote sensing data that is continuously being collected. The primary objective of this workshop is to bring together the Earth and environmental science communities as well as the multimodal representation learning communities to explore new ways of harnessing technological advancements in support of environmental monitoring. The MultiEarth Workshop also seeks to provide a common benchmark for processing multimodal remote sensing information by organizing public challenges focused on monitoring the Amazon rainforest. These challenges include estimating deforestation, detecting forest fires, translating synthetic aperture radar (SAR) images to the visible domain, and projecting environmental trends. This paper presents the challenge guidelines, datasets, and evaluation metrics. Our challenge website is available at https://sites.google.com/view/rainforest-challenge/multiearth-2023.

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2305.17696 [pdf, other]

SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration

Authors: Hwaran Lee, Seokhee Hong, Joonsuk Park, Takyoung Kim, Meeyoung Cha, Yejin Choi, Byoung Pil Kim, Gunhee Kim, Eun-Ju Lee, Yong Lim, Alice Oh, Sangchul Park, Jung-Woo Ha

Abstract: The potential social harms that large language models pose, such as generating offensive content and reinforcing biases, are steeply rising. Existing works focus on coping with this concern while interacting with ill-intentioned users, such as those who explicitly make hate speech or elicit harmful responses. However, discussions on sensitive issues can become toxic even if the users are well-intentioned. For safer models in such scenarios, we present the Sensitive Questions and Acceptable Response (SQuARe) dataset, a large-scale Korean dataset of 49k sensitive questions with 42k acceptable and 46k non-acceptable responses. The dataset was constructed leveraging HyperCLOVA in a human-in-the-loop manner based on real news headlines. Experiments show that acceptable response generation significantly improves for HyperCLOVA and GPT-3, demonstrating the efficacy of this dataset.

Submitted 28 May, 2023; originally announced May 2023.

Comments: 19 pages, 10 figures, ACL 2023

arXiv:2305.11377 [pdf, other]

GraphFC: Customs Fraud Detection with Label Scarcity

Authors: Karandeep Singh, Yu-Che Tsai, Cheng-Te Li, Meeyoung Cha, Shou-De Lin

Abstract: Customs officials across the world encounter huge volumes of transactions. With increased connectivity and globalization, customs transactions continue to grow every year. Associated with customs transactions is customs fraud: the intentional manipulation of goods declarations to avoid taxes and duties. With limited manpower, customs offices can only undertake manual inspection of a limited number of declarations. This necessitates automating customs fraud detection with machine learning (ML) techniques. Due to the limited manual inspection available for labeling newly incoming declarations, the ML approach should perform robustly when labeled data is scarce. However, current approaches for customs fraud detection are not well suited to or designed for this real-world setting. In this work, we propose GraphFC (Graph neural networks for Customs Fraud), a model-agnostic, domain-specific, semi-supervised graph neural network based customs fraud detection algorithm that has strong semi-supervised and inductive capabilities. With up to 252% relative increase in recall over the present state-of-the-art, extensive experimentation on real customs data from customs administrations of three different countries demonstrates that GraphFC consistently outperforms various baselines and the present state-of-the-art by a large margin.

Submitted 19 August, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

arXiv:2304.06237 [pdf, other]

doi 10.1371/journal.pone.0303178

Deep learning based ECG segmentation for delineation of diverse arrhythmias

Authors: Chankyu Joung, Mijin Kim, Taejin Paik, Seong-Ho Kong, Seung-Young Oh, Won Kyeong Jeon, Jae-hu Jeon, Joong-Sik Hong, Wan-Joong Kim, Woong Kook, Myung-Jin Cha, Otto van Koert

Abstract: Accurate delineation of key waveforms in an ECG is a critical step in extracting relevant features to support the diagnosis and treatment of heart conditions. Although deep learning based methods using segmentation models to locate P, QRS, and T waves have shown promising results, their ability to handle arrhythmias has not been studied in any detail. In this paper we investigate the effect of arrhythmias on delineation quality and develop strategies to improve performance in such cases. We introduce a U-Net-like segmentation model for ECG delineation with a particular focus on diverse arrhythmias. This is followed by a post-processing algorithm which removes noise and automatically determines the boundaries of P, QRS, and T waves. Our model has been trained on a diverse dataset and evaluated against the LUDB and QTDB datasets to show strong performance, with F1-scores exceeding 99% for QRS and T waves, and over 97% for P waves in the LUDB dataset. Furthermore, we assess various models across a wide array of arrhythmias and observe that models with a strong performance on standard benchmarks may still perform poorly on arrhythmias that are underrepresented in these benchmarks, such as tachycardias. We propose solutions to address this discrepancy.

Submitted 7 August, 2024; v1 submitted 12 April, 2023; originally announced April 2023.

Journal ref: PLoS ONE 19(6): e0303178 (2024)

arXiv:2304.02176 [pdf, other]

doi 10.1145/3544548.3580953

Blaming Humans and Machines: What Shapes People's Reactions to Algorithmic Harm

Authors: Gabriel Lima, Nina Grgić-Hlača, Meeyoung Cha

Abstract: Artificial intelligence (AI) systems can cause harm to people. This research examines how individuals react to such harm through the lens of blame. Building upon research suggesting that people blame AI systems, we investigated how several factors influence people's reactive attitudes towards machines, designers, and users. The results of three studies (N = 1,153) indicate differences in how blame is attributed to these actors. Whether AI systems were explainable did not impact blame directed at them, their developers, and their users. Considerations about fairness and harmfulness increased blame towards designers and users but had little to no effect on judgments of AI systems. Instead, what determined people's reactive attitudes towards machines was whether people thought blaming them would be a suitable response to algorithmic harm. We discuss implications, such as how future decisions about including AI systems in the social and moral spheres will shape laypeople's reactions to AI-caused harm.

Submitted 4 April, 2023; originally announced April 2023.

Comments: ACM CHI 2023

arXiv:2303.08403 [pdf, other]

DualFair: Fair Representation Learning at Both Group and Individual Levels via Contrastive Self-supervision

Authors: Sungwon Han, Seungeon Lee, Fangzhao Wu, Sundong Kim, Chuhan Wu, Xiting Wang, Xing Xie, Meeyoung Cha

Abstract: Algorithmic fairness has become an important machine learning problem, especially for mission-critical Web applications. This work presents a self-supervised model, called DualFair, that can debias sensitive attributes like gender and race from learned representations. Unlike existing models that target a single type of fairness, our model jointly optimizes for two fairness criteria - group fairness and counterfactual fairness - and hence makes fairer predictions at both the group and individual levels. Our model uses contrastive loss to generate embeddings that are indistinguishable for each protected group, while forcing the embeddings of counterfactual pairs to be similar. It then uses a self-knowledge distillation method to maintain the quality of representation for the downstream tasks. Extensive analysis over multiple datasets confirms the model's validity and further shows the synergy of jointly addressing two fairness criteria, suggesting the model's potential value in fair intelligent Web applications.

Submitted 15 March, 2023; originally announced March 2023.

Comments: Accepted and will be published at TheWebConf2023 (WWW2023)

arXiv:2303.05227 [pdf, other]

doi 10.1145/3543507.3583218

Longitudinal Assessment of Reference Quality on Wikipedia

Authors: Aitolkyn Baigutanova, Jaehyeon Myung, Diego Saez-Trumper, Ai-Jou Chou, Miriam Redi, Changwook Jung, Meeyoung Cha

Abstract: Wikipedia plays a crucial role in the integrity of the Web. This work analyzes the reliability of this global encyclopedia through the lens of its references. We operationalize the notion of reference quality by defining reference need (RN), i.e., the percentage of sentences missing a citation, and reference risk (RR), i.e., the proportion of non-authoritative references. We release Citation Detective, a tool for automatically calculating the RN score, and discover that the RN score has dropped by 20 percentage points in the last decade, with more than half of verifiable statements now accompanied by references. The RR score has remained below 1% over the years as a result of the efforts of the community to eliminate unreliable references. We propose pairing novice and experienced editors on the same Wikipedia article as a strategy to enhance reference quality. Our quasi-experiment indicates that such a co-editing experience can result in a lasting advantage in identifying unreliable sources in future edits. As Wikipedia is frequently used as the ground truth for numerous Web applications, our findings and suggestions on its reliability can have a far-reaching impact. We discuss the possibility of other Web services adopting Wiki-style user collaboration to eliminate unreliable content.

Submitted 9 March, 2023; originally announced March 2023.

Comments: Published at the Web Conference 2023 (WWW '23)

Journal ref: Proceedings of the ACM Web Conference 2023 (WWW '23), May 1-5, 2023, Austin, TX, USA. ACM

arXiv:2302.04730 [pdf, other]

A Benchmark on Uncertainty Quantification for Deep Learning Prognostics

Authors: Luis Basora, Arthur Viens, Manuel Arias Chao, Xavier Olive

Abstract: Reliable uncertainty quantification on RUL prediction is crucial for informative decision-making in predictive maintenance. In this context, we assess some of the latest developments in the field of uncertainty quantification for prognostics deep learning. This includes the state-of-the-art variational inference algorithms for Bayesian neural networks (BNN) as well as popular alternatives such as Monte Carlo Dropout (MCD), deep ensembles (DE) and heteroscedastic neural networks (HNN). All the inference techniques share the same inception deep learning architecture as a functional model. We performed hyperparameter search to optimize the main variational and learning parameters of the algorithms. The performance of the methods is evaluated on a subset of the large NASA NCMAPSS dataset for aircraft engines. The assessment includes RUL prediction accuracy, the quality of predictive uncertainty, and the possibility to break down the total predictive uncertainty into its aleatoric and epistemic parts. The results show no method clearly outperforms the others in all the situations. Although all methods are close in terms of accuracy, we find differences in the way they estimate uncertainty. Thus, DE and MCD generally provide more conservative predictive uncertainty than BNN. Surprisingly, HNN can achieve strong results without the added training complexity and extra parameters of the BNN. For tasks like active learning where a separation of epistemic and aleatoric uncertainty is required, radial BNN and MCD seem the best options.

Submitted 9 February, 2023; originally announced February 2023.

Comments: 29 pages, 16 figures

arXiv:2302.04465 [pdf, other]

Detecting Contextomized Quotes in News Headlines by Contrastive Learning

Authors: Seonyeong Song, Hyeonho Song, Kunwoo Park, Jiyoung Han, Meeyoung Cha

Abstract: Quotes are critical for establishing credibility in news articles. A direct quote enclosed in quotation marks has a strong visual appeal and is a sign of a reliable citation. Unfortunately, this journalistic practice is not strictly followed, and a quote in the headline is often "contextomized." Such a quote uses words out of context in a way that alters the speaker's intention so that there is no semantically matching quote in the body text. We present QuoteCSE, a contrastive learning framework that represents the embedding of news quotes based on domain-driven positive and negative samples to identify such an editorial strategy. The dataset and code are available at https://github.com/ssu-humane/contextomized-quote-contrastive.

Submitted 9 February, 2023; originally announced February 2023.

Comments: 8 pages, EACL 2023 (Findings)

arXiv:2211.02784 [pdf, other]

Waveguide Holography: Towards True 3D Holographic Glasses

Authors: Changwon Jang, Kiseung Bang, Minseok Chae, Byoungho Lee, Douglas Lanman

Abstract: We present a novel near-eye display concept which consists of a waveguide combiner, a spatial light modulator, and a laser light source. The proposed system can display true 3D holographic images through see-through pupil-replicating waveguide combiner as well as providing a large eye-box. By modeling the coherent light interaction inside of the waveguide combiner, we demonstrate that the output wavefront from the waveguide can be controlled by modulating the wavefront of input light using a spatial light modulator. This new possibility allows combining a holographic display, which is considered as the ultimate 3D display technology, with the state-of-the-art pupil replicating waveguides, enabling the path towards true 3D holographic augmented reality glasses.

Submitted 4 November, 2022; originally announced November 2022.

arXiv:2210.07024 [pdf, other]

Self-explaining deep models with logic rule reasoning

Authors: Seungeon Lee, Xiting Wang, Sungwon Han, Xiaoyuan Yi, Xing Xie, Meeyoung Cha

Abstract: We present SELOR, a framework for integrating self-explaining capabilities into a given deep model to achieve both high prediction performance and human precision. By "human precision", we refer to the degree to which humans agree with the reasons models provide for their predictions. Human precision affects user trust and allows users to collaborate closely with the model. We demonstrate that logic rule explanations naturally satisfy human precision with the expressive power required for good predictive performance. We then illustrate how to enable a deep model to predict and explain with logic rules. Our method does not require predefined logic rule sets or human annotations and can be learned efficiently and easily with widely-used deep learning modules in a differentiable way. Extensive experiments show that our method gives explanations closer to human decision logic than other methods while maintaining the performance of deep learning models.

Submitted 18 October, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: 26 pages including reference, checklist, and appendix. Accepted in NeurIPS 2022

arXiv:2208.03218 [pdf, other]

doi 10.1007/978-3-031-16876-5_3

RadTex: Learning Efficient Radiograph Representations from Text Reports

Authors: Keegan Quigley, Miriam Cha, Ruizhi Liao, Geeticka Chauhan, Steven Horng, Seth Berkowitz, Polina Golland

Abstract: Automated analysis of chest radiography using deep learning has tremendous potential to enhance the clinical diagnosis of diseases in patients. However, deep learning models typically require large amounts of annotated data to achieve high performance -- often an obstacle to medical domain adaptation. In this paper, we build a data-efficient learning framework that utilizes radiology reports to improve medical image classification performance with limited labeled data (fewer than 1000 examples). Specifically, we examine image-captioning pretraining to learn high-quality medical image representations that train on fewer examples. Following joint pretraining of a convolutional encoder and transformer decoder, we transfer the learned encoder to various classification tasks. Averaged over 9 pathologies, we find that our model achieves higher classification performance than ImageNet-supervised and in-domain supervised pretraining when labeled training data is limited.

Submitted 7 April, 2023; v1 submitted 5 August, 2022; originally announced August 2022.

Comments: Awarded Best Paper at Resource Efficient Medical Image Analysis (REMIA) Workshop, MICCAI 2022

arXiv:2207.13184 [pdf, other]

SAR-to-EO Image Translation with Multi-Conditional Adversarial Networks

Authors: Armando Cabrera, Miriam Cha, Prafull Sharma, Michael Newey

Abstract: This paper explores the use of multi-conditional adversarial networks for SAR-to-EO image translation. Previous methods condition adversarial networks only on the input SAR. We show that incorporating multiple complementary modalities such as Google maps and IR can further improve SAR-to-EO image translation, especially in preserving sharp edges of manmade objects. We demonstrate the effectiveness of our approach on a diverse set of datasets including SEN12MS, DFC2020, and SpaceNet6. Our experimental results suggest that additional information provided by complementary modalities improves the performance of SAR-to-EO image translation compared to the models trained on paired SAR and EO data only. To the best of our knowledge, our approach is the first to leverage multiple modalities for improving SAR-to-EO image translation performance.

Submitted 26 July, 2022; originally announced July 2022.

arXiv:2207.09158 [pdf, other]

FedX: Unsupervised Federated Learning with Cross Knowledge Distillation

Authors: Sungwon Han, Sungwon Park, Fangzhao Wu, Sundong Kim, Chuhan Wu, Xing Xie, Meeyoung Cha

Abstract: This paper presents FedX, an unsupervised federated learning framework. Our model learns unbiased representation from decentralized and heterogeneous local data. It employs a two-sided knowledge distillation with contrastive learning as a core component, allowing the federated system to function without requiring clients to share any data features. Furthermore, its adaptable architecture can be used as an add-on module for existing unsupervised algorithms in federated settings. Experiments show that our model improves performance significantly (1.58--5.52pp) on five unsupervised algorithms.

Submitted 19 July, 2022; originally announced July 2022.

Comments: Accepted and will be published at ECCV2022

arXiv:2207.07033 [pdf, other]

Developing a Series of AI Challenges for the United States Department of the Air Force

Authors: Vijay Gadepally, Gregory Angelides, Andrei Barbu, Andrew Bowne, Laura J. Brattain, Tamara Broderick, Armando Cabrera, Glenn Carl, Ronisha Carter, Miriam Cha, Emilie Cowen, Jesse Cummings, Bill Freeman, James Glass, Sam Goldberg, Mark Hamilton, Thomas Heldt, Kuan Wei Huang, Phillip Isola, Boris Katz, Jamie Koerner, Yen-Chen Lin, David Mayo, Kyle McAlpin, Taylor Perron, et al. (17 additional authors not shown)

Abstract: Through a series of federal initiatives and orders, the U.S. Government has been making a concerted effort to ensure American leadership in AI. These broad strategy documents have influenced organizations such as the United States Department of the Air Force (DAF). The DAF-MIT AI Accelerator is an initiative between the DAF and MIT to bridge the gap between AI researchers and DAF mission requirements. Several projects supported by the DAF-MIT AI Accelerator are developing public challenge problems that address numerous Federal AI research priorities. These challenges target priorities by making large, AI-ready datasets publicly available, incentivizing open-source solutions, and creating a demand signal for dual use technologies that can stimulate further research. In this article, we describe these public challenges being developed and how their application contributes to scientific advances.

Submitted 14 July, 2022; originally announced July 2022.

arXiv:2206.13246 [pdf, other]

Prediction of Football Player Value using Bayesian Ensemble Approach

Authors: Hansoo Lee, Bayu Adhi Tama, Meeyoung Cha

Abstract: The transfer fees of sports players have become astronomical. This is because bringing players of great future value to the club is essential for their survival. We present a case study on the key factors affecting the world's top soccer players' transfer fees based on the FIFA data analysis. To predict each player's market value, we propose an improved LightGBM model by optimizing its hyperparameter using a Tree-structured Parzen Estimator (TPE) algorithm. We identify prominent features by the SHapley Additive exPlanations (SHAP) algorithm. The proposed method has been compared against the baseline regression models (e.g., linear regression, lasso, elastic net, kernel ridge regression) and gradient boosting model without hyperparameter optimization. The optimized LightGBM model showed an excellent accuracy of approximately 3.8, 1.4, and 1.8 times on average compared to the regression baseline models, GBDT, and LightGBM model in terms of RMSE. Our model offers interpretability in deciding what attributes football clubs should consider in recruiting players in the future.

Submitted 24 June, 2022; originally announced June 2022.

Comments: 17 pages, 4 figures, 6 tables, will be published in Journal of Expert Systems with Applications

arXiv:2205.05306 [pdf, other]

doi 10.1145/3531146.3534628

The Conflict Between Explainable and Accountable Decision-Making Algorithms

Authors: Gabriel Lima, Nina Grgić-Hlača, Jin Keun Jeong, Meeyoung Cha

Abstract: Decision-making algorithms are being used in important decisions, such as who should be enrolled in health care programs and be hired. Even though these systems are currently deployed in high-stakes scenarios, many of them cannot explain their decisions. This limitation has prompted the Explainable Artificial Intelligence (XAI) initiative, which aims to make algorithms explainable to comply with legal requirements, promote trust, and maintain accountability. This paper questions whether and to what extent explainability can help solve the responsibility issues posed by autonomous AI systems. We suggest that XAI systems that provide post-hoc explanations could be seen as blameworthy agents, obscuring the responsibility of developers in the decision-making process. Furthermore, we argue that XAI could result in incorrect attributions of responsibility to vulnerable stakeholders, such as those who are subjected to algorithmic decisions (i.e., patients), due to a misguided perception that they have control over explainable algorithms. This conflict between explainability and accountability can be exacerbated if designers choose to use algorithms and patients as moral and legal scapegoats. We conclude with a set of recommendations for how to approach this tension in the socio-technical process of algorithmic decision-making and a defense of hard regulation to prevent designers from escaping responsibility.

Submitted 11 May, 2022; originally announced May 2022.

Comments: To appear in the FAccT 2022 proceedings

arXiv:2205.01472 [pdf, other]

Learning Economic Indicators by Aggregating Multi-Level Geospatial Information

Authors: Sungwon Park, Sungwon Han, Donghyun Ahn, Jaeyeon Kim, Jeasurk Yang, Susang Lee, Seunghoon Hong, Jihee Kim, Sangyoon Park, Hyunjoo Yang, Meeyoung Cha

Abstract: High-resolution daytime satellite imagery has become a promising source to study economic activities. These images display detailed terrain over large areas and allow zooming into smaller neighborhoods. Existing methods, however, have utilized images only in a single-level geographical unit. This research presents a deep learning model to predict economic indicators via aggregating traits observed from multiple levels of geographical units. The model first measures hyperlocal economy over small communities via ordinal regression. The next step extracts district-level features by summarizing interconnection among hyperlocal economies. In the final step, the model estimates economic indicators of districts via aggregating the hyperlocal and district information. Our new multi-level learning model substantially outperforms strong baselines in predicting key indicators such as population, purchasing power, and energy consumption. The model is also robust against data shortage; the trained features from one country can generalize to other countries when evaluated with data gathered from Malaysia, the Philippines, Thailand, and Vietnam. We discuss the multi-level model's implications for measuring inequality, which is the essential first step in policy and social science research on inequality and poverty.

Submitted 3 May, 2022; originally announced May 2022.

Comments: Accepted at AAAI2022

arXiv:2204.13508 [pdf, other]

Design of Blockchain-based Travel Rule Compliance System

Authors: Chaehyeon Lee, Changhoon Kang, Wonseok Choi, Jehoon Lee, Myunghun Cha, Jongsoo Woo, James Won-Ki Hong

Abstract: In accordance with the guidelines of the Financial Action Task Force (FATF), Virtual Asset Service Providers (VASPs) should comply with a 'travel rule', which requires them to exchange originator's and beneficiary's personal information when transferring virtual assets. In this paper, we propose a novel blockchain-based travel rule compliance system that supports fully-decentralized data exchange. The proposed system uses a permissioned blockchain, and thereby eliminates the possibility of leakage of personal information to third parties or even to travel rule service providers, and ensures that travel rule data can be managed securely.

Submitted 28 April, 2022; originally announced April 2022.

Comments: 3 pages, 1 figure, 1 table. Accepted to IEEE ICBC 2022 as a poster paper

arXiv:2204.11095 [pdf, ps, other]

EdgeKeeper: Resilient and Lightweight Coordination for Mobile Edge Computing Systems

Authors: S. Bhunia, R. Stoleru, M. Sagor, A. Haroon, A. Altaweel, M. Chao, M. Maurice, R. Blalock

Abstract: Mobile Edge Computing (MEC) has been gaining significant interest from first responders and tactical teams, primarily because they can employ handheld mobile devices to form a computing cluster (for computing tasks like face/scene recognition, virtual assistance) when connectivity to the cloud is not present or it is limited. High user mobility in first responder or tactical environments makes MEC challenging, as wireless links observe substantial fluctuations. Typical cloud-based coordination (e.g., ZooKeeper-based service discovery and coordination, device naming, security) needed by edge computing tasks cannot work in these environments. Driven by the need for a resilient and lightweight coordination service, in this paper, we design and implement EdgeKeeper to provide cloud-like coordination for MEC systems. It provides naming, network management, application coordination, and security to distributed edge computing applications. It maintains an edge cluster among devices and intelligently stores its data on a group of replicas to guard against node failure and disconnections. We provide a full-system implementation of EdgeKeeper for Android and Linux platforms. We have integrated EdgeKeeper with existing MEC applications and performed real-world performance evaluations in a wide-area search and rescue operation conducted by first responders, which proves it to be lightweight and suitable for mobile devices.

Submitted 23 April, 2022; originally announced April 2022.

Comments: 12 Pages

arXiv:2204.10823 [pdf, ps, other]

R-Drive: Resilient Data Storage and Sharing for Mobile Edge Computing Systems

Authors: M. Sagor, R. Stoleru, S. Bhunia, M. Chao, A. Haroon, A. Altaweel, M. Maurice, R. Blalock

Abstract: Mobile edge computing (MEC) systems (in which intensive computation and data storage tasks are performed locally, due to the absence of communication infrastructure for connectivity to the cloud) are currently being developed for disaster response applications and for tactical environments. MEC applications for these scenarios generate and process significant mission-critical and personal data that require resilient and secure storage and sharing. In this paper, we present the design, implementation, and evaluation of R-Drive, a resilient data storage and sharing framework for disaster response and tactical MEC applications. R-Drive employs erasure coding and data encryption, ensuring resilient and secure data storage against device failure. R-Drive adaptively chooses erasure coding parameters to ensure the highest data availability with a minimal storage cost. R-Drive's distributed directory service provides a resilient and secure namespace for files with rigorous access control management. R-Drive leverages opportunistic networking, allowing data storage and sharing in mobile and loosely connected edge computing environments. We implemented R-Drive on Android, and integrated it with existing MEC applications. Performance evaluation results show that R-Drive enables resilient and secure data storage and sharing.

Submitted 22 April, 2022; originally announced April 2022.

Comments: 13 pages

arXiv:2204.07649 [pdf, other]

MultiEarth 2022 -- Multimodal Learning for Earth and Environment Workshop and Challenge

Authors: Miriam Cha, Kuan Wei Huang, Morgan Schmidt, Gregory Angelides, Mark Hamilton, Sam Goldberg, Armando Cabrera, Phillip Isola, Taylor Perron, Bill Freeman, Yen-Chen Lin, Brandon Swenson, Jean Piou

Abstract: The Multimodal Learning for Earth and Environment Challenge (MultiEarth 2022) will be the first competition aimed at the monitoring and analysis of deforestation in the Amazon rainforest at any time and in any weather conditions. The goal of the Challenge is to provide a common benchmark for multimodal information processing and to bring together the earth and environmental science communities as well as multimodal representation learning communities to compare the relative merits of the various multimodal learning methods to deforestation estimation under well-defined and strictly comparable conditions. MultiEarth 2022 will have three sub-challenges: 1) matrix completion, 2) deforestation estimation, and 3) image-to-image translation. This paper presents the challenge guidelines, datasets, and evaluation metrics for the three sub-challenges. Our challenge website is available at https://sites.google.com/view/rainforest-challenge.

Submitted 31 May, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

arXiv:2201.06759 [pdf, other]

Knowledge Sharing via Domain Adaptation in Customs Fraud Detection

Authors: Sungwon Park, Sundong Kim, Meeyoung Cha

Abstract: Knowledge of the changing traffic is critical in risk management. Customs offices worldwide have traditionally relied on local resources to accumulate knowledge and detect tax fraud. This naturally poses countries with weak infrastructure to become tax havens of potentially illicit trades. The current paper proposes DAS, a memory bank platform to facilitate knowledge sharing across multi-national customs administrations to support each other. We propose a domain adaptation method to share transferable knowledge of frauds as prototypes while safeguarding the local trade information. Data encompassing over 8 million import declarations have been used to test the feasibility of this new system, which shows that participating countries may benefit up to 2-11 times in fraud detection with the help of shared knowledge. We discuss implications for substantial tax revenue potential and strengthened policy against illicit trades.

Submitted 18 January, 2022; originally announced January 2022.

Comments: AAAI2022, Special track on AI for Social Impact

arXiv:2111.01663 [pdf, other]

Classification of Goods Using Text Descriptions With Sentences Retrieval

Authors: Eunji Lee, Sundong Kim, Sihyun Kim, Sungwon Park, Meeyoung Cha, Soyeon Jung, Suyoung Yang, Yeonsoo Choi, Sungdae Ji, Minsoo Song, Heeja Kim

Abstract: The task of assigning and validating internationally accepted commodity code (HS code) to traded goods is one of the critical functions at the customs office. This decision is crucial to importers and exporters, as it determines the tariff rate. However, similar to court decisions made by judges, the task can be non-trivial even for experienced customs officers. The current paper proposes a deep learning model to assist this seemingly challenging HS code classification. Together with Korea Customs Service, we built a decision model based on KoELECTRA that suggests the most likely heading and subheadings (i.e., the first four and six digits) of the HS code. Evaluation on 129,084 past cases shows that the top-3 suggestions made by our model have an accuracy of 95.5% in classifying 265 subheadings. This promising result implies algorithms may reduce the time and effort taken by customs officers substantially by assisting the HS code classification task.

Submitted 2 November, 2021; originally announced November 2021.

arXiv:2106.10147 [pdf, other]

Evaluating the Robustness of Trigger Set-Based Watermarks Embedded in Deep Neural Networks

Authors: Suyoung Lee, Wonho Song, Suman Jana, Meeyoung Cha, Sooel Son

Abstract: Trigger set-based watermarking schemes have gained emerging attention as they provide a means to prove ownership for deep neural network model owners. In this paper, we argue that state-of-the-art trigger set-based watermarking algorithms do not achieve their designed goal of proving ownership. We posit that this impaired capability stems from two common experimental flaws that the existing research practice has committed when evaluating the robustness of watermarking algorithms: (1) incomplete adversarial evaluation and (2) overlooked adaptive attacks. We conduct a comprehensive adversarial evaluation of 11 representative watermarking schemes against six of the existing attacks and demonstrate that each of these watermarking schemes lacks robustness against at least two non-adaptive attacks. We also propose novel adaptive attacks that harness the adversary's knowledge of the underlying watermarking algorithm of a target model. We demonstrate that the proposed attacks effectively break all of the 11 watermarking schemes, consequently allowing adversaries to obscure the ownership of any watermarked model. We encourage follow-up studies to consider our guidelines when evaluating the robustness of their watermarking schemes via conducting comprehensive adversarial evaluation that includes our adaptive attacks to demonstrate a meaningful upper bound of watermark robustness.

Submitted 19 January, 2023; v1 submitted 18 June, 2021; originally announced June 2021.

Comments: 15 pages, accepted at IEEE TDSC

arXiv:2105.04046 [pdf, other]

A likelihood approach to nonparametric estimation of a singular distribution using deep generative models

Authors: Minwoo Chae, Dongha Kim, Yongdai Kim, Lizhen Lin

Abstract: We investigate statistical properties of a likelihood approach to nonparametric estimation of a singular distribution using deep generative models. More specifically, a deep generative model is used to model high-dimensional data that are assumed to concentrate around some low-dimensional structure. Estimating the distribution supported on this low-dimensional structure, such as a low-dimensional manifold, is challenging due to its singularity with respect to the Lebesgue measure in the ambient space. In the considered model, a usual likelihood approach can fail to estimate the target distribution consistently due to the singularity. We prove that a novel and effective solution exists by perturbing the data with an instance noise, which leads to consistent estimation of the underlying distribution with desirable convergence rates. We also characterize the class of distributions that can be efficiently estimated via deep generative models. This class is sufficiently general to contain various structured distributions such as product distributions, classically smooth distributions and distributions supported on a low-dimensional manifold. Our analysis provides some insights on how deep generative models can avoid the curse of dimensionality for nonparametric distribution estimation. We conduct a thorough simulation study and real data analysis to empirically demonstrate that the proposed data perturbation technique improves the estimation performance significantly.

Submitted 28 March, 2023; v1 submitted 9 May, 2021; originally announced May 2021.

Comments: 42 pages, 13 figures, 1 table

MSC Class: 62G05 (Primary); 62G20 (Secondary)

arXiv:2104.10864 [pdf, other]

doi 10.1371/journal.pone.0263381

Misinformation, Believability, and Vaccine Acceptance Over 40 Countries: Takeaways From the Initial Phase of The COVID-19 Infodemic

Authors: Karandeep Singh, Gabriel Lima, Meeyoung Cha, Chiyoung Cha, Juhi Kulshrestha, Yong-Yeol Ahn, Onur Varol

Abstract: The COVID-19 pandemic has been damaging to the lives of people all around the world. Accompanied by the pandemic is an infodemic, an abundant and uncontrolled spreading of potentially harmful misinformation. The infodemic may severely change the pandemic's course by interfering with public health interventions such as wearing masks, social distancing, and vaccination. In particular, the impact of the infodemic on vaccination is critical because it holds the key to reverting to pre-pandemic normalcy. This paper presents findings from a global survey on the extent of worldwide exposure to the COVID-19 infodemic, assesses different populations' susceptibility to false claims, and analyzes its association with vaccine acceptance. Based on responses gathered from over 18,400 individuals from 40 countries, we find a strong association between perceived believability of misinformation and vaccination hesitancy. Additionally, our study shows that only half of the online users exposed to rumors might have seen the fact-checked information. Moreover, depending on the country, between 6% and 37% of individuals considered these rumors believable. Our survey also shows that poorer regions are more susceptible to encountering and believing COVID-19 misinformation. We discuss implications of our findings on public campaigns that proactively spread accurate information to countries that are more susceptible to the infodemic. We also highlight fact-checking platforms' role in better identifying and prioritizing claims that are perceived to be believable and have wide exposure. Our findings give insights into better handling of risk communication during the initial phase of a future pandemic.

Submitted 22 April, 2021; originally announced April 2021.

arXiv:2104.03613 [pdf, other]

Uncertainty-aware Remaining Useful Life predictor

Authors: Luca Biggio, Alexander Wieland, Manuel Arias Chao, Iason Kastanis, Olga Fink

Abstract: Remaining Useful Life (RUL) estimation is the problem of inferring how long a certain industrial asset can be expected to operate within its defined specifications. Deploying successful RUL prediction methods in real-life applications is a prerequisite for the design of intelligent maintenance strategies with the potential of drastically reducing maintenance costs and machine downtimes. In light of their superior performance in a wide range of engineering fields, Machine Learning (ML) algorithms are natural candidates to tackle the challenges involved in the design of intelligent maintenance systems. In particular, given the potentially catastrophic consequences or substantial costs associated with maintenance decisions that are either too late or too early, it is desirable that ML algorithms provide uncertainty estimates alongside their predictions. However, standard data-driven methods used for uncertainty estimation in RUL problems do not scale well to large datasets or are not sufficiently expressive to model the high-dimensional mapping from raw sensor data to RUL estimates. In this work, we consider Deep Gaussian Processes (DGPs) as possible solutions to the aforementioned limitations. We perform a thorough evaluation and comparison of several variants of DGPs applied to RUL predictions. The performance of the algorithms is evaluated on the N-CMAPSS (New Commercial Modular Aero-Propulsion System Simulation) dataset from NASA for aircraft engines. The results show that the proposed methods are able to provide very accurate RUL predictions along with sensible uncertainty estimates, providing more reliable solutions for (safety-critical) real-life industrial applications.

Submitted 8 April, 2021; originally announced April 2021.

arXiv:2103.15296 [pdf, other]

Elsa: Energy-based learning for semi-supervised anomaly detection

Authors: Sungwon Han, Hyeonho Song, Seungeon Lee, Sungwon Park, Meeyoung Cha

Abstract: Anomaly detection aims at identifying deviant instances from the normal data distribution. Many advances have been made in the field, including the innovative use of unsupervised contrastive learning. However, existing methods generally assume clean training data and are limited when the data contain unknown anomalies. This paper presents Elsa, a novel semi-supervised anomaly detection approach that unifies the concept of energy-based models with unsupervised contrastive learning. Elsa instills robustness against any data contamination by a carefully designed fine-tuning step based on the new energy function that forces the normal data to be divided into classes of prototypes. Experiments on multiple contamination scenarios show the proposed model achieves SOTA performance. Extensive analyses also verify the contribution of each component in the proposed model. Beyond the experiments, we also offer a theoretical interpretation of why contrastive learning alone cannot detect anomalies under data contamination.

Submitted 3 January, 2022; v1 submitted 28 March, 2021; originally announced March 2021.

Comments: Accepted and published at BMVC2021

arXiv:2103.04537 [pdf, other]

doi 10.1007/978-3-030-87196-3_26

Multimodal Representation Learning via Maximization of Local Mutual Information

Authors: Ruizhi Liao, Daniel Moyer, Miriam Cha, Keegan Quigley, Seth Berkowitz, Steven Horng, Polina Golland, William M. Wells

Abstract: We propose and demonstrate a representation learning approach by maximizing the mutual information between local features of images and text. The goal of this approach is to learn useful image representations by taking advantage of the rich information contained in the free text that describes the findings in the image. Our method trains image and text encoders by encouraging the resulting representations to exhibit high local mutual information. We make use of recent advances in mutual information estimation with neural network discriminators. We argue that the sum of local mutual information is typically a lower bound on the global mutual information. Our experimental results in the downstream image classification tasks demonstrate the advantages of using local features for image-text representation learning.

Submitted 14 December, 2021; v1 submitted 7 March, 2021; originally announced March 2021.

Comments: In Proceedings of International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2021

Journal ref: In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 273-283. Springer, Cham, 2021

arXiv:2102.07650 [pdf, other]

Learning Student-Friendly Teacher Networks for Knowledge Distillation

Authors: Dae Young Park, Moon-Hyun Cha, Changwook Jeong, Dae Sin Kim, Bohyung Han

Abstract: We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student. Contrary to most of the existing methods that rely on effective training of student models given pretrained teachers, we aim to learn the teacher models that are friendly to students and, consequently, more appropriate for knowledge transfer. In other words, at the time of optimizing a teacher model, the proposed algorithm learns the student branches jointly to obtain student-friendly representations. Since the main goal of our approach lies in training teacher models and the subsequent knowledge distillation procedure is straightforward, most of the existing knowledge distillation methods can adopt this technique to improve the performance of diverse student models in terms of accuracy and convergence speed. The proposed algorithm demonstrates outstanding accuracy in several well-known knowledge distillation techniques with various combinations of teacher and student models even in the case that their architectures are heterogeneous and there is no prior knowledge about student models at the time of training teacher networks.

Submitted 23 January, 2022; v1 submitted 12 February, 2021; originally announced February 2021.

Comments: Accepted by NeurIPS 2021

Showing 1–50 of 93 results for author: Chao, M