-
Technical Report of NICE Challenge at CVPR 2024: Caption Re-ranking Evaluation Using Ensembled CLIP and Consensus Scores
Authors:
Kiyoon Jeong,
Woojun Lee,
Woongchan Nam,
Minjeong Ma,
Pilsung Kang
Abstract:
This report presents the ECO (Ensembled Clip score and cOnsensus score) pipeline from team DSBA LAB, which is a new framework used to evaluate and rank captions for a given image. ECO selects the most accurate caption describing image. It is made possible by combining an Ensembled CLIP score, which considers the semantic alignment between the image and captions, with a Consensus score that account…
▽ More
This report presents the ECO (Ensembled Clip score and cOnsensus score) pipeline from team DSBA LAB, which is a new framework used to evaluate and rank captions for a given image. ECO selects the most accurate caption describing image. It is made possible by combining an Ensembled CLIP score, which considers the semantic alignment between the image and captions, with a Consensus score that accounts for the essentialness of the captions. Using this framework, we achieved notable success in the CVPR 2024 Workshop Challenge on Caption Re-ranking Evaluation at the New Frontiers for Zero-Shot Image Captioning Evaluation (NICE). Specifically, we secured third place based on the CIDEr metric, second in both the SPICE and METEOR metrics, and first in the ROUGE-L and all BLEU Score metrics. The code and configuration for the ECO framework are available at https://github.com/DSBA-Lab/ECO .
△ Less
Submitted 13 June, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…
▽ More
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
△ Less
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Integrating Paralinguistics in Speech-Empowered Large Language Models for Natural Conversation
Authors:
Heeseung Kim,
Soonshin Seo,
Kyeongseok Jeong,
Ohsung Kwon,
Soyoon Kim,
Jungwhan Kim,
Jaehong Lee,
Eunwoo Song,
Myungwoo Oh,
Jung-Woo Ha,
Sungroh Yoon,
Kang Min Yoo
Abstract:
Recent work shows promising results in expanding the capabilities of large language models (LLM) to directly understand and synthesize speech. However, an LLM-based strategy for modeling spoken dialogs remains elusive, calling for further investigation. This paper introduces an extensive speech-text LLM framework, the Unified Spoken Dialog Model (USDM), designed to generate coherent spoken respons…
▽ More
Recent work shows promising results in expanding the capabilities of large language models (LLM) to directly understand and synthesize speech. However, an LLM-based strategy for modeling spoken dialogs remains elusive, calling for further investigation. This paper introduces an extensive speech-text LLM framework, the Unified Spoken Dialog Model (USDM), designed to generate coherent spoken responses with naturally occurring prosodic features relevant to the given input speech without relying on explicit automatic speech recognition (ASR) or text-to-speech (TTS) systems. We have verified the inclusion of prosody in speech tokens that predominantly contain semantic information and have used this foundation to construct a prosody-infused speech-text model. Additionally, we propose a generalized speech-text pretraining scheme that enhances the capture of cross-modal semantics. To construct USDM, we fine-tune our speech-text model on spoken dialog data using a multi-step spoken dialog template that stimulates the chain-of-reasoning capabilities exhibited by the underlying LLM. Automatic and human evaluations on the DailyTalk dataset demonstrate that our approach effectively generates natural-sounding spoken responses, surpassing previous and cascaded baselines. We will make our code and checkpoints publicly available.
△ Less
Submitted 26 August, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Which is better? Exploring Prompting Strategy For LLM-based Metrics
Authors:
Joonghoon Kim,
Saeran Park,
Kiyoon Jeong,
Sangmin Lee,
Seung Hun Han,
Jiyoon Lee,
Pilsung Kang
Abstract:
This paper describes the DSBA submissions to the Prompting Large Language Models as Explainable Metrics shared task, where systems were submitted to two tracks: small and large summarization tracks. With advanced Large Language Models (LLMs) such as GPT-4, evaluating the quality of Natural Language Generation (NLG) has become increasingly paramount. Traditional similarity-based metrics such as BLE…
▽ More
This paper describes the DSBA submissions to the Prompting Large Language Models as Explainable Metrics shared task, where systems were submitted to two tracks: small and large summarization tracks. With advanced Large Language Models (LLMs) such as GPT-4, evaluating the quality of Natural Language Generation (NLG) has become increasingly paramount. Traditional similarity-based metrics such as BLEU and ROUGE have shown to misalign with human evaluation and are ill-suited for open-ended generation tasks. To address this issue, we explore the potential capability of LLM-based metrics, especially leveraging open-source LLMs. In this study, wide range of prompts and prompting techniques are systematically analyzed with three approaches: prompting strategy, score aggregation, and explainability. Our research focuses on formulating effective prompt templates, determining the granularity of NLG quality scores and assessing the impact of in-context examples on LLM-based evaluation. Furthermore, three aggregation strategies are compared to identify the most reliable method for aggregating NLG quality scores. To examine explainability, we devise a strategy that generates rationales for the scores and analyzes the characteristics of the explanation produced by the open-source LLMs. Extensive experiments provide insights regarding evaluation capabilities of open-source LLMs and suggest effective prompting strategies.
△ Less
Submitted 7 November, 2023;
originally announced November 2023.
-
Mutual Information Maximizing Quantum Generative Adversarial Network and Its Applications in Finance
Authors:
Mingyu Lee,
Myeongjin Shin,
Junseo Lee,
Kabgyun Jeong
Abstract:
One of the most promising applications in the era of NISQ (Noisy Intermediate-Scale Quantum) computing is quantum machine learning. Quantum machine learning offers significant quantum advantages over classical machine learning across various domains. Specifically, generative adversarial networks have been recognized for their potential utility in diverse fields such as image generation, finance, a…
▽ More
One of the most promising applications in the era of NISQ (Noisy Intermediate-Scale Quantum) computing is quantum machine learning. Quantum machine learning offers significant quantum advantages over classical machine learning across various domains. Specifically, generative adversarial networks have been recognized for their potential utility in diverse fields such as image generation, finance, and probability distribution modeling. However, these networks necessitate solutions for inherent challenges like mode collapse. In this study, we capitalize on the concept that the estimation of mutual information between high-dimensional continuous random variables can be achieved through gradient descent using neural networks. We introduce a novel approach named InfoQGAN, which employs the Mutual Information Neural Estimator (MINE) within the framework of quantum generative adversarial networks to tackle the mode collapse issue. Furthermore, we elaborate on how this approach can be applied to a financial scenario, specifically addressing the problem of generating portfolio return distributions through dynamic asset allocation. This illustrates the potential practical applicability of InfoQGAN in real-world contexts.
△ Less
Submitted 4 September, 2023;
originally announced September 2023.
-
The Conflict Between Explainable and Accountable Decision-Making Algorithms
Authors:
Gabriel Lima,
Nina Grgić-Hlača,
Jin Keun Jeong,
Meeyoung Cha
Abstract:
Decision-making algorithms are being used in important decisions, such as who should be enrolled in health care programs and be hired. Even though these systems are currently deployed in high-stakes scenarios, many of them cannot explain their decisions. This limitation has prompted the Explainable Artificial Intelligence (XAI) initiative, which aims to make algorithms explainable to comply with l…
▽ More
Decision-making algorithms are being used in important decisions, such as who should be enrolled in health care programs and be hired. Even though these systems are currently deployed in high-stakes scenarios, many of them cannot explain their decisions. This limitation has prompted the Explainable Artificial Intelligence (XAI) initiative, which aims to make algorithms explainable to comply with legal requirements, promote trust, and maintain accountability. This paper questions whether and to what extent explainability can help solve the responsibility issues posed by autonomous AI systems. We suggest that XAI systems that provide post-hoc explanations could be seen as blameworthy agents, obscuring the responsibility of developers in the decision-making process. Furthermore, we argue that XAI could result in incorrect attributions of responsibility to vulnerable stakeholders, such as those who are subjected to algorithmic decisions (i.e., patients), due to a misguided perception that they have control over explainable algorithms. This conflict between explainability and accountability can be exacerbated if designers choose to use algorithms and patients as moral and legal scapegoats. We conclude with a set of recommendations for how to approach this tension in the socio-technical process of algorithmic decision-making and a defense of hard regulation to prevent designers from escaping responsibility.
△ Less
Submitted 11 May, 2022;
originally announced May 2022.
-
Two-Stage Deep Anomaly Detection with Heterogeneous Time Series Data
Authors:
Kyeong-Joong Jeong,
Jin-Duk Park,
Kyusoon Hwang,
Seong-Lyun Kim,
Won-Yong Shin
Abstract:
We introduce a data-driven anomaly detection framework using a manufacturing dataset collected from a factory assembly line. Given heterogeneous time series data consisting of operation cycle signals and sensor signals, we aim at discovering abnormal events. Motivated by our empirical findings that conventional single-stage benchmark approaches may not exhibit satisfactory performance under our ch…
▽ More
We introduce a data-driven anomaly detection framework using a manufacturing dataset collected from a factory assembly line. Given heterogeneous time series data consisting of operation cycle signals and sensor signals, we aim at discovering abnormal events. Motivated by our empirical findings that conventional single-stage benchmark approaches may not exhibit satisfactory performance under our challenging circumstances, we propose a two-stage deep anomaly detection (TDAD) framework in which two different unsupervised learning models are adopted depending on types of signals. In Stage I, we select anomaly candidates by using a model trained by operation cycle signals; in Stage II, we finally detect abnormal events out of the candidates by using another model, which is suitable for taking advantage of temporal continuity, trained by sensor signals. A distinguishable feature of our framework is that operation cycle signals are exploited first to find likely anomalous points, whereas sensor signals are leveraged to filter out unlikely anomalous points afterward. Our experiments comprehensively demonstrate the superiority over single-stage benchmark approaches, the model-agnostic property, and the robustness to difficult situations.
△ Less
Submitted 10 February, 2022;
originally announced February 2022.
-
Time-Series Anomaly Detection with Implicit Neural Representation
Authors:
Kyeong-Joong Jeong,
Yong-Min Shin
Abstract:
Detecting anomalies in multivariate time-series data is essential in many real-world applications. Recently, various deep learning-based approaches have shown considerable improvements in time-series anomaly detection. However, existing methods still have several limitations, such as long training time due to their complex model designs or costly tuning procedures to find optimal hyperparameters (…
▽ More
Detecting anomalies in multivariate time-series data is essential in many real-world applications. Recently, various deep learning-based approaches have shown considerable improvements in time-series anomaly detection. However, existing methods still have several limitations, such as long training time due to their complex model designs or costly tuning procedures to find optimal hyperparameters (e.g., sliding window length) for a given dataset. In our paper, we propose a novel method called Implicit Neural Representation-based Anomaly Detection (INRAD). Specifically, we train a simple multi-layer perceptron that takes time as input and outputs corresponding values at that time. Then we utilize the representation error as an anomaly score for detecting anomalies. Experiments on five real-world datasets demonstrate that our proposed method outperforms other state-of-the-art methods in performance, training speed, and robustness.
△ Less
Submitted 28 January, 2022;
originally announced January 2022.
-
Segmentation of Shoulder Muscle MRI Using a New Region and Edge based Deep Auto-Encoder
Authors:
Saddam Hussain Khan,
Asifullah Khan,
Yeon Soo Lee,
Mehdi Hassan,
Woong Kyo jeong
Abstract:
Automatic segmentation of shoulder muscle MRI is challenging due to the high variation in muscle size, shape, texture, and spatial position of tears. Manual segmentation of tear and muscle portion is hard, time-consuming, and subjective to pathological expertise. This work proposes a new Region and Edge-based Deep Auto-Encoder (RE-DAE) for shoulder muscle MRI segmentation. The proposed RE-DAE harm…
▽ More
Automatic segmentation of shoulder muscle MRI is challenging due to the high variation in muscle size, shape, texture, and spatial position of tears. Manual segmentation of tear and muscle portion is hard, time-consuming, and subjective to pathological expertise. This work proposes a new Region and Edge-based Deep Auto-Encoder (RE-DAE) for shoulder muscle MRI segmentation. The proposed RE-DAE harmoniously employs average and max-pooling operation in the encoder and decoder blocks of the Convolutional Neural Network (CNN). Region-based segmentation incorporated in the Deep Auto-Encoder (DAE) encourages the network to extract smooth and homogenous regions. In contrast, edge-based segmentation tries to learn the boundary and anatomical information. These two concepts, systematically combined in a DAE, generate a discriminative and sparse hybrid feature space (exploiting both region homogeneity and boundaries). Moreover, the concept of static attention is exploited in the proposed RE-DAE that helps in effectively learning the tear region. The performances of the proposed MRI segmentation based DAE architectures have been tested using a 3D MRI shoulder muscle dataset using the hold-out cross-validation technique. The MRI data has been collected from the Korea University Anam Hospital, Seoul, South Korea. Experimental comparisons have been conducted by employing innovative custom-made and existing pre-trained CNN architectures both using transfer learning and fine-tuning. Objective evaluation on the muscle datasets using the proposed SA-RE-DAE showed a dice similarity of 85.58% and 87.07%, an accuracy of 81.57% and 95.58% for tear and muscle regions, respectively. The high visual quality and the objective result suggest that the proposed SA-RE-DAE is able to correctly segment tear and muscle regions in shoulder muscle MRI for better clinical decisions.
△ Less
Submitted 26 August, 2021;
originally announced August 2021.
-
SiReN: Sign-Aware Recommendation Using Graph Neural Networks
Authors:
Changwon Seo,
Kyeong-Joong Jeong,
Sungsu Lim,
Won-Yong Shin
Abstract:
In recent years, many recommender systems using network embedding (NE) such as graph neural networks (GNNs) have been extensively studied in the sense of improving recommendation accuracy. However, such attempts have focused mostly on utilizing only the information of positive user-item interactions with high ratings. Thus, there is a challenge on how to make use of low rating scores for represent…
▽ More
In recent years, many recommender systems using network embedding (NE) such as graph neural networks (GNNs) have been extensively studied in the sense of improving recommendation accuracy. However, such attempts have focused mostly on utilizing only the information of positive user-item interactions with high ratings. Thus, there is a challenge on how to make use of low rating scores for representing users' preferences since low ratings can be still informative in designing NE-based recommender systems. In this study, we present SiReN, a new sign-aware recommender system based on GNN models. Specifically, SiReN has three key components: 1) constructing a signed bipartite graph for more precisely representing users' preferences, which is split into two edge-disjoint graphs with positive and negative edges each, 2) generating two embeddings for the partitioned graphs with positive and negative edges via a GNN model and a multi-layer perceptron (MLP), respectively, and then using an attention model to obtain the final embeddings, and 3) establishing a sign-aware Bayesian personalized ranking (BPR) loss function in the process of optimization. Through comprehensive experiments, we empirically demonstrate that SiReN consistently outperforms state-of-the-art NE-aided recommendation methods.
△ Less
Submitted 18 May, 2022; v1 submitted 19 August, 2021;
originally announced August 2021.
-
An Algorithm for Reversible Logic Circuit Synthesis Based on Tensor Decomposition
Authors:
Hochang Lee,
Kyung Chul Jeong,
Daewan Han,
Panjin Kim
Abstract:
An algorithm for reversible logic synthesis is proposed. The task is, for a given $n$-bit substitution map $P_n: \{0,1\}^n \rightarrow \{0,1\}^n$, to find a sequence of reversible logic gates that implements the map. The gate library adopted in this work consists of multiple-controlled Toffoli gates denoted by $C^m\!X$, where $m$ is the number of control bits that ranges from 0 to $n-1$. Controlle…
▽ More
An algorithm for reversible logic synthesis is proposed. The task is, for a given $n$-bit substitution map $P_n: \{0,1\}^n \rightarrow \{0,1\}^n$, to find a sequence of reversible logic gates that implements the map. The gate library adopted in this work consists of multiple-controlled Toffoli gates denoted by $C^m\!X$, where $m$ is the number of control bits that ranges from 0 to $n-1$. Controlled gates with large $m \,\,(>2)$ are then further decomposed into $C^0\!X$, $C^1\!X$, and $C^2\!X$ gates. A primary concern in designing the algorithm is to reduce the use of $C^2\!X$ gate (also known as Toffoli gate) which is known to be universal.
The main idea is to view an $n$-bit substitution map as a rank-$2n$ tensor and to transform it such that the resulting map can be written as a tensor product of a rank-($2n-2$) tensor and the $2\times 2$ identity matrix. Let $\mathcal{P}_n$ be a set of all $n$-bit substitution maps. What we try to find is a size reduction map $\mathcal{A}_{\rm red}: \mathcal{P}_n \rightarrow \{P_n: P_n = P_{n-1} \otimes I_2\}$. %, where $I_m$ is the $m\times m$ identity matrix. One can see that the output $P_{n-1} \otimes I_2$ acts nontrivially on $n-1$ bits only, meaning that the map to be synthesized becomes $P_{n-1}$. The size reduction process is iteratively applied until it reaches tensor product of only $2 \times 2$ matrices.
△ Less
Submitted 23 July, 2024; v1 submitted 9 July, 2021;
originally announced July 2021.
-
Graph-Aware Transformer: Is Attention All Graphs Need?
Authors:
Sanghyun Yoo,
Young-Seok Kim,
Kang Hyun Lee,
Kuhwan Jeong,
Junhwi Choi,
Hoshik Lee,
Young Sang Choi
Abstract:
Graphs are the natural data structure to represent relational and structural information in many domains. To cover the broad range of graph-data applications including graph classification as well as graph generation, it is desirable to have a general and flexible model consisting of an encoder and a decoder that can handle graph data. Although the representative encoder-decoder model, Transformer…
▽ More
Graphs are the natural data structure to represent relational and structural information in many domains. To cover the broad range of graph-data applications including graph classification as well as graph generation, it is desirable to have a general and flexible model consisting of an encoder and a decoder that can handle graph data. Although the representative encoder-decoder model, Transformer, shows superior performance in various tasks especially of natural language processing, it is not immediately available for graphs due to their non-sequential characteristics. To tackle this incompatibility, we propose GRaph-Aware Transformer (GRAT), the first Transformer-based model which can encode and decode whole graphs in end-to-end fashion. GRAT is featured with a self-attention mechanism adaptive to the edge information and an auto-regressive decoding mechanism based on the two-path approach consisting of sub-graph encoding path and node-and-edge generation path for each decoding step. We empirically evaluated GRAT on multiple setups including encoder-based tasks such as molecule property predictions on QM9 datasets and encoder-decoder-based tasks such as molecule graph generation in the organic molecule synthesis domain. GRAT has shown very promising results including state-of-the-art performance on 4 regression tasks in QM9 benchmark.
△ Less
Submitted 9 June, 2020;
originally announced June 2020.
-
Time-Space Complexity of Quantum Search Algorithms in Symmetric Cryptanalysis
Authors:
Panjin Kim,
Kyung Chul Jeong,
Daewan Han
Abstract:
Performance of cryptanalytic quantum search algorithms is mainly inferred from query complexity which hides overhead induced by an implementation. To shed light on quantitative complexity analysis removing hidden factors, we provide a framework for estimating time-space complexity, with carefully accounting for characteristics of target cryptographic functions. Processor and circuit parallelizatio…
▽ More
Performance of cryptanalytic quantum search algorithms is mainly inferred from query complexity which hides overhead induced by an implementation. To shed light on quantitative complexity analysis removing hidden factors, we provide a framework for estimating time-space complexity, with carefully accounting for characteristics of target cryptographic functions. Processor and circuit parallelization methods are taken into account, resulting in the time-space trade-offs curves in terms of depth and qubit. The method guides how to rank different circuit designs in order of their efficiency. The framework is applied to representative cryptosystems NIST referred to as a guideline for security parameters, reassessing the security strengths of AES and SHA-2.
△ Less
Submitted 14 May, 2018;
originally announced May 2018.
-
A Cyberinfrastructure-based Approach to Real Time Water Temperature Prediction
Authors:
Jounghyun Lee,
Keun Young Lee,
Karpjoo Jeong,
Meilan Jiang,
Bomchul Kim,
Suntae Hwang
Abstract:
The prediction of water temperature is crucial for aquatic ecosystem studies and management. In this paper, we raise challenging issues in supporting real time water temperature prediction and present a system called WT-Agabus to address those issues. The WT-Agabus system is designed to be a cyberinfrastructure and to support various prediction models in a uniform way. In addition, we present a ne…
▽ More
The prediction of water temperature is crucial for aquatic ecosystem studies and management. In this paper, we raise challenging issues in supporting real time water temperature prediction and present a system called WT-Agabus to address those issues. The WT-Agabus system is designed to be a cyberinfrastructure and to support various prediction models in a uniform way. In addition, we present a neural network-based water temperature prediction model to use only data available online from Korea Meteorological Administration (KMA). In this paper, we also show the current prototype implementation of the WT-Agabus system to support the prediction model
△ Less
Submitted 25 September, 2015;
originally announced September 2015.