-
CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean
Authors:
Eunsu Kim,
Juyoung Suk,
Philhoon Oh,
Haneul Yoo,
James Thorne,
Alice Oh
Abstract:
Despite the rapid development of large language models (LLMs) for the Korean language, there remains an obvious lack of benchmark datasets that test the requisite Korean cultural and linguistic knowledge. Because many existing Korean benchmark datasets are derived from the English counterparts through translation, they often overlook the different cultural contexts. For the few benchmark datasets that are sourced from Korean data capturing cultural knowledge, only narrow tasks such as bias and hate speech detection are offered. To address this gap, we introduce a benchmark of Cultural and Linguistic Intelligence in Korean (CLIcK), a dataset comprising 1,995 QA pairs. CLIcK sources its data from official Korean exams and textbooks, partitioning the questions into eleven categories under the two main categories of language and culture. For each instance in CLIcK, we provide fine-grained annotation of which cultural and linguistic knowledge is required to answer the question correctly. Using CLIcK, we test 13 language models to assess their performance. Our evaluation uncovers insights into their performances across the categories, as well as the diverse factors affecting their comprehension. CLIcK offers the first large-scale comprehensive Korean-centric analysis of LLMs' proficiency in Korean culture and language.
Submitted 4 July, 2024; v1 submitted 10 March, 2024;
originally announced March 2024.
-
eXplainable Bayesian Multi-Perspective Generative Retrieval
Authors:
EuiYul Song,
Philhoon Oh,
Sangryul Kim,
James Thorne
Abstract:
Modern deterministic retrieval pipelines prioritize achieving state-of-the-art performance but often lack interpretability in decision-making. These models face challenges in assessing uncertainty, leading to overconfident predictions. To overcome these limitations, we integrate uncertainty calibration and interpretability into a retrieval pipeline. Specifically, we introduce Bayesian methodologies and multi-perspective retrieval to calibrate uncertainty within a retrieval pipeline. We incorporate techniques such as LIME and SHAP to analyze the behavior of a black-box reranker model. The importance scores derived from these explanation methodologies serve as supplementary relevance scores to enhance the base reranker model. We evaluate the resulting performance enhancements achieved through uncertainty calibration and interpretable reranking on Question Answering and Fact Checking tasks. Our methods demonstrate substantial performance improvements across three KILT datasets.
Submitted 4 February, 2024;
originally announced February 2024.
-
Analysis and Perspectives on the ANA Avatar XPRIZE Competition
Authors:
Kris Hauser,
Eleanor Watson,
Joonbum Bae,
Josh Bankston,
Sven Behnke,
Bill Borgia,
Manuel G. Catalano,
Stefano Dafarra,
Jan B. F. van Erp,
Thomas Ferris,
Jeremy Fishel,
Guy Hoffman,
Serena Ivaldi,
Fumio Kanehiro,
Abderrahmane Kheddar,
Gaelle Lannuzel,
Jacqueline Ford Morie,
Patrick Naughton,
Steve NGuyen,
Paul Oh,
Taskin Padir,
Jim Pippine,
Jaeheung Park,
Daniele Pucci,
Jean Vaz, et al. (3 additional authors not shown)
Abstract:
The ANA Avatar XPRIZE was a four-year competition to develop a robotic "avatar" system to allow a human operator to sense, communicate, and act in a remote environment as though physically present. The competition featured a unique requirement that judges would operate the avatars after less than one hour of training on the human-machine interfaces, and avatar systems were judged on both objective and subjective scoring metrics. This paper presents a unified summary and analysis of the competition from technical, judging, and organizational perspectives. We study the use of telerobotics technologies and innovations pursued by the competing teams in their avatar systems, and correlate the use of these technologies with judges' task performance and subjective survey ratings. It also summarizes perspectives from team leads, judges, and organizers about the competition's execution and impact to inform the future development of telerobotics and telepresence.
Submitted 10 January, 2024;
originally announced January 2024.
-
Detrimental Contexts in Open-Domain Question Answering
Authors:
Philhoon Oh,
James Thorne
Abstract:
For knowledge-intensive NLP tasks, it has been widely accepted that accessing more information is a contributing factor to improvements in the model's end-to-end performance. However, counter-intuitively, too much context can have a negative impact on the model when evaluated on common question answering (QA) datasets. In this paper, we analyze how passages can have a detrimental effect on retrieve-then-read architectures used in question answering. Our empirical evidence indicates that the current read architecture does not fully leverage the retrieved passages, and that its performance degrades significantly when it uses all of the passages rather than subsets of them. Our findings demonstrate that model accuracy can be improved by 10% on two popular QA datasets by filtering out detrimental passages. Additionally, these outcomes are attained by utilizing existing retrieval methods without further training or data. We further highlight the challenges associated with identifying the detrimental passages. First, even with the correct context, the model can make an incorrect prediction, posing a challenge in determining which passages are most influential. Second, evaluation typically considers lexical matching, which is not robust to variations of correct answers. Despite these limitations, our experimental results underscore the pivotal role of identifying and removing these detrimental passages for the context-efficient retrieve-then-read pipeline. Code and data are available at https://github.com/xfactlab/emnlp2023-damaging-retrieval
Submitted 27 October, 2023;
originally announced October 2023.
-
Knowledge Corpus Error in Question Answering
Authors:
Yejoon Lee,
Philhoon Oh,
James Thorne
Abstract:
Recent works in open-domain question answering (QA) have explored generating context passages from large language models (LLMs), replacing the traditional retrieval step in the QA pipeline. However, it is not well understood why generated passages can be more effective than retrieved ones. This study revisits the conventional formulation of QA and introduces the concept of knowledge corpus error. This error arises when the knowledge corpus used for retrieval is only a subset of the entire string space, potentially excluding more helpful passages that exist outside the corpus. LLMs may mitigate this shortcoming by generating passages in a larger space. We design an experiment that paraphrases human-annotated gold context using LLMs to observe knowledge corpus error empirically. Our results across three QA benchmarks reveal an increased performance (10% - 13%) when using paraphrased passages, indicating a signal for the existence of knowledge corpus error. Our code is available at https://github.com/xfactlab/emnlp2023-knowledge-corpus-error
Submitted 27 October, 2023;
originally announced October 2023.
-
Antenna Selection in Polarization Reconfigurable MIMO (PR-MIMO) Communication Systems
Authors:
Paul S. Oh,
Sean S. Kwon,
Andreas F. Molisch
Abstract:
Adaptation of a wireless system to the polarization state of the propagation channel can improve reliability and throughput. This paper in particular considers polarization reconfigurable multiple input multiple output (PR-MIMO) systems, where both transmitter and receiver can change the (linear) polarization orientation at each element of their antenna arrays. We first introduce joint polarization pre-post coding to maximize bounds on the capacity and the maximum eigenvalue of the channel matrix. For this we first derive approximate closed form equations of optimal polarization vectors at one link end, and then use iterative joint polarization pre-post coding to pursue joint optimal polarization vectors at both link ends. Next we investigate the combination of PR-MIMO with hybrid antenna selection / maximum ratio transmission (PR-HS/MRT), which can achieve a remarkable improvement of channel capacity and symbol error rate (SER). Further, two novel schemes of element wise and global polarization reconfiguration are presented for PR-HS/MRT. Comprehensive simulation results indicate that the proposed schemes provide 3 to 5 dB SNR gain in PR-MIMO spatial multiplexing and approximately 3 dB SNR gain in PR-HS/MRT, with concomitant improvements of channel capacity and SER.
Submitted 2 April, 2024; v1 submitted 1 December, 2021;
originally announced December 2021.
-
Economic Theoretic LEO Satellite Coverage Control: An Auction-based Framework
Authors:
Junghyun Kim,
Thong D. Ngo,
Paul S. Oh,
Sean S. -C. Kwon,
Changhee Han,
Joongheon Kim
Abstract:
Recently, ultra-dense low earth orbit (LEO) satellite constellations over high-frequency bands have been considered as one of the promising solutions to supply coverage all over the world. Given satellite constellations, efficient beam coverage schemes should be employed at satellites to provide seamless services and full-view coverage. In LEO systems, hybrid wide and spot beam coverage schemes are generally used, where the LEO provides a wide beam for large area coverage and several additional steering spot beams for high speed data access. In this given setting, scheduling multiple spot beams is essentially required. In order to achieve this goal, a Vickrey-Clarke-Groves (VCG) auction-based trustful algorithm is proposed in this paper for scheduling multiple spot beams for more efficient seamless services and full-view coverage.
Submitted 21 September, 2020;
originally announced September 2020.
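
As a rough illustration of the Vickrey-Clarke-Groves (VCG) mechanism named in the abstract above, the sketch below handles the simplest case only: a number of identical spot beams, each bidder wanting at most one beam. Under that assumption the VCG outcome gives the beams to the highest bidders, and each winner pays the highest losing bid (or nothing if there is no losing bid). The function name `vcg_unit_demand`, the bidder labels, and the bid values are hypothetical and are not taken from the paper, whose actual scheduling algorithm may differ.

```python
from typing import Dict, List, Tuple


def vcg_unit_demand(
    bids: Dict[str, float], num_beams: int
) -> Tuple[List[str], Dict[str, float]]:
    """VCG outcome for identical items with unit-demand bidders (illustrative).

    Each winner pays the externality it imposes on the others, which in this
    special case equals the highest losing bid.
    """
    if num_beams <= 0 or not bids:
        return [], {}

    # Rank bidders by bid, highest first; break ties by name for determinism.
    ranked = sorted(bids.items(), key=lambda item: (-item[1], item[0]))
    winners = [name for name, _ in ranked[:num_beams]]

    # The highest losing bid sets the price; it is zero when every bidder wins.
    price = ranked[num_beams][1] if len(ranked) > num_beams else 0.0
    payments = {name: price for name in winners}
    return winners, payments


if __name__ == "__main__":
    # Hypothetical bids from four ground regions competing for two spot beams.
    example_bids = {"region_a": 10.0, "region_b": 7.0, "region_c": 4.0, "region_d": 1.0}
    print(vcg_unit_demand(example_bids, num_beams=2))
```

Because a winner's payment does not depend on its own bid, reporting the true value is a dominant strategy in this setting, which is the property that makes VCG attractive for scheduling.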