Search | arXiv e-print repository

Form and meaning co-determine the realization of tone in Taiwan Mandarin spontaneous speech: the case of Tone 3 sandhi

Authors: Yuxin Lu, Yu-Ying Chuang, R. Harald Baayen

Abstract: In Standard Chinese, Tone 3 (the dipping tone) becomes Tone 2 (rising tone) when followed by another Tone 3. Previous studies have noted that this sandhi process may be incomplete, in the sense that the assimilated Tone 3 is still distinct from a true Tone 2. While Mandarin Tone 3 sandhi is widely studied using carefully controlled laboratory speech (Xu, 1997) and more formal registers of Beijing… ▽ More In Standard Chinese, Tone 3 (the dipping tone) becomes Tone 2 (rising tone) when followed by another Tone 3. Previous studies have noted that this sandhi process may be incomplete, in the sense that the assimilated Tone 3 is still distinct from a true Tone 2. While Mandarin Tone 3 sandhi is widely studied using carefully controlled laboratory speech (Xu, 1997) and more formal registers of Beijing Mandarin (Yuan and Chen, 2014), less is known about its realization in spontaneous speech, and about the effect of contextual factors on tonal realization. The present study investigates the pitch contours of two-character words with T2-T3 and T3-T3 tone patterns in spontaneous Taiwan Mandarin conversations. Our analysis makes use of the Generative Additive Mixed Model (GAMM, Wood, 2017) to examine fundamental frequency (f0) contours as a function of normalized time. We consider various factors known to influence pitch contours, including gender, speaking rate, speaker, neighboring tones, word position, bigram probability, and also novel predictors, word and word sense (Chuang et al., 2024). Our analyses revealed that in spontaneous Taiwan Mandarin, T3-T3 words become indistinguishable from T2-T3 words, indicating complete sandhi, once the strong effect of word (or word sense) is taken into account. For our data, the shape of f0 contours is not co-determined by word frequency. In contrast, the effect of word meaning on f0 contours is robust, as strong as the effect of adjacent tones, and is present for both T2-T3 and T3-T3 words. △ Less

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2408.13704 [pdf, other]

DHP Benchmark: Are LLMs Good NLG Evaluators?

Authors: Yicheng Wang, Jiayi Yuan, Yu-Neng Chuang, Zhuoer Wang, Yingchi Liu, Mark Cusick, Param Kulkarni, Zhengping Ji, Yasser Ibrahim, Xia Hu

Abstract: Large Language Models (LLMs) are increasingly serving as evaluators in Natural Language Generation (NLG) tasks. However, the capabilities of LLMs in scoring NLG quality remain inadequately explored. Current studies depend on human assessments and simple metrics that fail to capture the discernment of LLMs across diverse NLG tasks. To address this gap, we propose the Discernment of Hierarchical Per… ▽ More Large Language Models (LLMs) are increasingly serving as evaluators in Natural Language Generation (NLG) tasks. However, the capabilities of LLMs in scoring NLG quality remain inadequately explored. Current studies depend on human assessments and simple metrics that fail to capture the discernment of LLMs across diverse NLG tasks. To address this gap, we propose the Discernment of Hierarchical Perturbation (DHP) benchmarking framework, which provides quantitative discernment scores for LLMs utilizing hierarchically perturbed text data and statistical tests to measure the NLG evaluation capabilities of LLMs systematically. We have re-established six evaluation datasets for this benchmark, covering four NLG tasks: Summarization, Story Completion, Question Answering, and Translation. Our comprehensive benchmarking of five major LLM series provides critical insight into their strengths and limitations as NLG evaluators. △ Less

Submitted 24 August, 2024; originally announced August 2024.

arXiv:2408.08422 [pdf, other]

Assessing and Enhancing Large Language Models in Rare Disease Question-answering

Authors: Guanchu Wang, Junhao Ran, Ruixiang Tang, Chia-Yuan Chang, Chia-Yuan Chang, Yu-Neng Chuang, Zirui Liu, Vladimir Braverman, Zhandong Liu, Xia Hu

Abstract: Despite the impressive capabilities of Large Language Models (LLMs) in general medical domains, questions remain about their performance in diagnosing rare diseases. To answer this question, we aim to assess the diagnostic performance of LLMs in rare diseases, and explore methods to enhance their effectiveness in this area. In this work, we introduce a rare disease question-answering (ReDis-QA) da… ▽ More Despite the impressive capabilities of Large Language Models (LLMs) in general medical domains, questions remain about their performance in diagnosing rare diseases. To answer this question, we aim to assess the diagnostic performance of LLMs in rare diseases, and explore methods to enhance their effectiveness in this area. In this work, we introduce a rare disease question-answering (ReDis-QA) dataset to evaluate the performance of LLMs in diagnosing rare diseases. Specifically, we collected 1360 high-quality question-answer pairs within the ReDis-QA dataset, covering 205 rare diseases. Additionally, we annotated meta-data for each question, facilitating the extraction of subsets specific to any given disease and its property. Based on the ReDis-QA dataset, we benchmarked several open-source LLMs, revealing that diagnosing rare diseases remains a significant challenge for these models. To facilitate retrieval augmentation generation for rare disease diagnosis, we collect the first rare diseases corpus (ReCOP), sourced from the National Organization for Rare Disorders (NORD) database. Specifically, we split the report of each rare disease into multiple chunks, each representing a different property of the disease, including their overview, symptoms, causes, effects, related disorders, diagnosis, and standard therapies. This structure ensures that the information within each chunk aligns consistently with a question. Experiment results demonstrate that ReCOP can effectively improve the accuracy of LLMs on the ReDis-QA dataset by an average of 8%. Moreover, it significantly guides LLMs to generate trustworthy answers and explanations that can be traced back to existing literature. △ Less

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2407.21050 [pdf]

Artificial Intelligence in Extracting Diagnostic Data from Dental Records

Authors: Yao-Shun Chuang, Chun-Teh Lee, Oluwabunmi Tokede, Guo-Hao Lin, Ryan Brandon, Trung Duong Tran, Xiaoqian Jiang, Muhammad F. Walji

Abstract: This research addresses the issue of missing structured data in dental records by extracting diagnostic information from unstructured text. The updated periodontology classification system's complexity has increased incomplete or missing structured diagnoses. To tackle this, we use advanced AI and NLP methods, leveraging GPT-4 to generate synthetic notes for fine-tuning a RoBERTa model. This signi… ▽ More This research addresses the issue of missing structured data in dental records by extracting diagnostic information from unstructured text. The updated periodontology classification system's complexity has increased incomplete or missing structured diagnoses. To tackle this, we use advanced AI and NLP methods, leveraging GPT-4 to generate synthetic notes for fine-tuning a RoBERTa model. This significantly enhances the model's ability to understand medical and dental language. We evaluated the model using 120 randomly selected clinical notes from two datasets, demonstrating its improved diagnostic extraction accuracy. The results showed high accuracy in diagnosing periodontal status, stage, and grade, with Site 1 scoring 0.99 and Site 2 scoring 0.98. In the subtype category, Site 2 achieved perfect scores, outperforming Site 1. This method enhances extraction accuracy and broadens its use across dental contexts. The study underscores AI and NLP's transformative impact on healthcare delivery and management. Integrating AI and NLP technologies enhances documentation and simplifies administrative tasks by precisely extracting complex clinical information. This approach effectively addresses challenges in dental diagnostics. Using synthetic training data from LLMs optimizes the training process, improving accuracy and efficiency in identifying periodontal diagnoses from clinical notes. This innovative method holds promise for broader healthcare applications, potentially improving patient care quality. △ Less

Submitted 12 August, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

Comments: 11 pages, 2 tables, 3 figures, under review

arXiv:2407.21017 [pdf, other]

doi 10.1145/3641519.3657519

Matting by Generation

Authors: Zhixiang Wang, Baiang Li, Jian Wang, Yu-Lun Liu, Jinwei Gu, Yung-Yu Chuang, Shin'ichi Satoh

Abstract: This paper introduces an innovative approach for image matting that redefines the traditional regression-based task as a generative modeling challenge. Our method harnesses the capabilities of latent diffusion models, enriched with extensive pre-trained knowledge, to regularize the matting process. We present novel architectural innovations that empower our model to produce mattes with superior re… ▽ More This paper introduces an innovative approach for image matting that redefines the traditional regression-based task as a generative modeling challenge. Our method harnesses the capabilities of latent diffusion models, enriched with extensive pre-trained knowledge, to regularize the matting process. We present novel architectural innovations that empower our model to produce mattes with superior resolution and detail. The proposed method is versatile and can perform both guidance-free and guidance-based image matting, accommodating a variety of additional cues. Our comprehensive evaluation across three benchmark datasets demonstrates the superior performance of our approach, both quantitatively and qualitatively. The results not only reflect our method's robust effectiveness but also highlight its ability to generate visually compelling mattes that approach photorealistic quality. The project page for this paper is available at https://lightchaserx.github.io/matting-by-generation/ △ Less

Submitted 30 July, 2024; originally announced July 2024.

Comments: SIGGRAPH'24, Project page: https://lightchaserx.github.io/matting-by-generation/

arXiv:2407.16166 [pdf]

Robust Privacy Amidst Innovation with Large Language Models Through a Critical Assessment of the Risks

Authors: Yao-Shun Chuang, Atiquer Rahman Sarkar, Noman Mohammed, Xiaoqian Jiang

Abstract: This study examines integrating EHRs and NLP with large language models (LLMs) to improve healthcare data management and patient care. It focuses on using advanced models to create secure, HIPAA-compliant synthetic patient notes for biomedical research. The study used de-identified and re-identified MIMIC III datasets with GPT-3.5, GPT-4, and Mistral 7B to generate synthetic notes. Text generation… ▽ More This study examines integrating EHRs and NLP with large language models (LLMs) to improve healthcare data management and patient care. It focuses on using advanced models to create secure, HIPAA-compliant synthetic patient notes for biomedical research. The study used de-identified and re-identified MIMIC III datasets with GPT-3.5, GPT-4, and Mistral 7B to generate synthetic notes. Text generation employed templates and keyword extraction for contextually relevant notes, with one-shot generation for comparison. Privacy assessment checked PHI occurrence, while text utility was tested using an ICD-9 coding task. Text quality was evaluated with ROUGE and cosine similarity metrics to measure semantic similarity with source notes. Analysis of PHI occurrence and text utility via the ICD-9 coding task showed that the keyword-based method had low risk and good performance. One-shot generation showed the highest PHI exposure and PHI co-occurrence, especially in geographic location and date categories. The Normalized One-shot method achieved the highest classification accuracy. Privacy analysis revealed a critical balance between data utility and privacy protection, influencing future data use and sharing. Re-identified data consistently outperformed de-identified data. This study demonstrates the effectiveness of keyword-based methods in generating privacy-protecting synthetic clinical notes that retain data usability, potentially transforming clinical data-sharing practices. The superior performance of re-identified over de-identified data suggests a shift towards methods that enhance utility and privacy by using dummy PHIs to perplex privacy attacks. △ Less

Submitted 23 July, 2024; originally announced July 2024.

Comments: 13 pages, 4 figures, 1 table, 1 supplementary, under review

arXiv:2407.07071 [pdf, other]

Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

Authors: Yung-Sung Chuang, Linlu Qiu, Cheng-Yu Hsieh, Ranjay Krishna, Yoon Kim, James Glass

Abstract: When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple approach for detecting such contextual hallucinations. We hypothesize that contextual hallucinations are related to the extent to which an LLM attends… ▽ More When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple approach for detecting such contextual hallucinations. We hypothesize that contextual hallucinations are related to the extent to which an LLM attends to information in the provided context versus its own generations. Based on this intuition, we propose a simple hallucination detection model whose input features are given by the ratio of attention weights on the context versus newly generated tokens (for each attention head). We find that a linear classifier based on these lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model. The lookback ratio-based detector -- Lookback Lens -- is found to transfer across tasks and even models, allowing a detector that is trained on a 7B model to be applied (without retraining) to a larger 13B model. We further apply this detector to mitigate contextual hallucinations, and find that a simple classifier-guided decoding approach is able to reduce the amount of hallucination, for example by 9.6% in the XSum summarization task. △ Less

Submitted 9 July, 2024; originally announced July 2024.

Comments: The source code is available at https://github.com/voidism/Lookback-Lens

arXiv:2407.01881 [pdf, other]

Spectral evidence for NiPS3 as a Mott-Hubbard insulator

Authors: Yifeng Cao, Nicholas Russo, Qishuo Tan, Xi Ling, Jinghua Guo, Yi-de Chuang, Kevin E. Smith

Abstract: The layered van der Waals trichalcogenide NiPS3 has attracted widespread attention due to its unique optical, magnetic, and electronic properties. The complexity of NiPS3 itself, however, has also led to ongoing debates regarding its characteristics such as the existence of self-doped ligand holes. In this study, X-ray absorption spectroscopy and resonant inelastic X-ray scattering have been appli… ▽ More The layered van der Waals trichalcogenide NiPS3 has attracted widespread attention due to its unique optical, magnetic, and electronic properties. The complexity of NiPS3 itself, however, has also led to ongoing debates regarding its characteristics such as the existence of self-doped ligand holes. In this study, X-ray absorption spectroscopy and resonant inelastic X-ray scattering have been applied to investigate the electronic structure of NiPS3. With the aid of theoretical calculations using the charge-transfer multiplet model, we provide experimental evidence for NiPS3 being a Mott-Hubbard insulator rather than a charge-transfer insulator. Moreover, we explain why some previous XAS studies have concluded that NiPS3 is a charge-transfer insulator by comparing surface and bulk sensitive spectra. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: 6 figures

arXiv:2407.01527 [pdf, other]

KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches

Authors: Jiayi Yuan, Hongyi Liu, Shaochen, Zhong, Yu-Neng Chuang, Songchen Li, Guanchu Wang, Duy Le, Hongye Jin, Vipin Chaudhary, Zhaozhuo Xu, Zirui Liu, Xia Hu

Abstract: Long context capability is a crucial competency for large language models (LLMs) as it mitigates the human struggle to digest long-form texts. This capability enables complex task-solving scenarios such as book summarization, code assistance, and many more tasks that are traditionally manpower-intensive. However, transformer-based LLMs face significant challenges with long context input due to the… ▽ More Long context capability is a crucial competency for large language models (LLMs) as it mitigates the human struggle to digest long-form texts. This capability enables complex task-solving scenarios such as book summarization, code assistance, and many more tasks that are traditionally manpower-intensive. However, transformer-based LLMs face significant challenges with long context input due to the growing size of the KV cache and the intrinsic complexity of attending to extended inputs; where multiple schools of efficiency-driven approaches -- such as KV cache quantization, token dropping, prompt compression, linear-time sequence models, and hybrid architectures -- have been proposed to produce efficient yet long context-capable models. Despite these advancements, no existing work has comprehensively benchmarked these methods in a reasonably aligned environment. In this work, we fill this gap by providing a taxonomy of current methods and evaluating 10+ state-of-the-art approaches across seven categories of long context tasks. Our work reveals numerous previously unknown phenomena and offers insights -- as well as a friendly workbench -- for the future development of long context-capable LLMs. The source code will be available at https://github.com/henryzhongsc/longctx_bench △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.17232 [pdf, other]

Beyond Demographics: Aligning Role-playing LLM-based Agents Using Human Belief Networks

Authors: Yun-Shiuan Chuang, Zach Studdiford, Krirk Nirunwiroj, Agam Goyal, Vincent V. Frigo, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers

Abstract: Creating human-like large language model (LLM) agents is crucial for faithful social simulation. Having LLMs role-play based on demographic information sometimes improves human likeness but often does not. This study assessed whether LLM alignment with human behavior can be improved by integrating information from empirically-derived human belief networks. Using data from a human survey, we estima… ▽ More Creating human-like large language model (LLM) agents is crucial for faithful social simulation. Having LLMs role-play based on demographic information sometimes improves human likeness but often does not. This study assessed whether LLM alignment with human behavior can be improved by integrating information from empirically-derived human belief networks. Using data from a human survey, we estimated a belief network encompassing 18 topics loading on two non-overlapping latent factors. We then seeded LLM-based agents with an opinion on one topic, and assessed the alignment of its expressed opinions on remaining test topics with corresponding human data. Role-playing based on demographic information alone did not align LLM and human opinions, but seeding the agent with a single belief greatly improved alignment for topics related in the belief network, and not for topics outside the network. These results suggest a novel path for human-LLM belief alignment in work seeking to simulate and understand patterns of belief distributions in society. △ Less

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16008 [pdf, other]

Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

Authors: Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long T. Le, Abhishek Kumar, James Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister

Abstract: Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon has been known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between… ▽ More Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon has been known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between lost-in-the-middle to LLMs' intrinsic attention bias: LLMs exhibit a U-shaped attention bias where the tokens at the beginning and at the end of its input receive higher attention, regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, that allows the model to attend to contexts faithfully according to their relevance, even though when they are in the middle. Third, we show found-in-the-middle not only achieves better performance in locating relevant information within a long context, but also eventually leads to improved retrieval-augmented generation (RAG) performance across various tasks, outperforming existing methods by up to 15 percentage points. These findings open up future directions in understanding LLM attention bias and its potential consequences. △ Less

Submitted 3 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

Comments: ACL Findings 2024

arXiv:2406.14045 [pdf, other]

Understanding Different Design Choices in Training Large Time Series Models

Authors: Yu-Neng Chuang, Songchen Li, Jiayi Yuan, Guanchu Wang, Kwei-Herng Lai, Leisheng Yu, Sirui Ding, Chia-Yuan Chang, Qiaoyu Tan, Daochen Zha, Xia Hu

Abstract: Inspired by Large Language Models (LLMs), Time Series Forecasting (TSF), a long-standing task in time series analysis, is undergoing a transition towards Large Time Series Models (LTSMs), aiming to train universal transformer-based models for TSF. However, training LTSMs on heterogeneous time series data poses unique challenges, including diverse frequencies, dimensions, and patterns across datase… ▽ More Inspired by Large Language Models (LLMs), Time Series Forecasting (TSF), a long-standing task in time series analysis, is undergoing a transition towards Large Time Series Models (LTSMs), aiming to train universal transformer-based models for TSF. However, training LTSMs on heterogeneous time series data poses unique challenges, including diverse frequencies, dimensions, and patterns across datasets. Recent endeavors have studied and evaluated various design choices aimed at enhancing LTSM training and generalization capabilities, spanning pre-processing techniques, model configurations, and dataset configurations. In this work, we comprehensively analyze these design choices and aim to identify the best practices for training LTSM. Moreover, we propose \emph{time series prompt}, a novel statistical prompting strategy tailored to time series data. Furthermore, based on the observations in our analysis, we introduce \texttt{LTSM-bundle}, which bundles the best design choices we have identified. Empirical results demonstrate that \texttt{LTSM-bundle} achieves superior zero-shot and few-shot performances compared to state-of-the-art LSTMs and traditional TSF methods on benchmark datasets. △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.08310 [pdf, other]

GraphFM: A Comprehensive Benchmark for Graph Foundation Model

Authors: Yuhao Xu, Xinqi Liu, Keyu Duan, Yi Fang, Yu-Neng Chuang, Daochen Zha, Qiaoyu Tan

Abstract: Foundation Models (FMs) serve as a general class for the development of artificial intelligence systems, offering broad potential for generalization across a spectrum of downstream tasks. Despite extensive research into self-supervised learning as the cornerstone of FMs, several outstanding issues persist in Graph Foundation Models that rely on graph self-supervised learning, namely: 1) Homogeniza… ▽ More Foundation Models (FMs) serve as a general class for the development of artificial intelligence systems, offering broad potential for generalization across a spectrum of downstream tasks. Despite extensive research into self-supervised learning as the cornerstone of FMs, several outstanding issues persist in Graph Foundation Models that rely on graph self-supervised learning, namely: 1) Homogenization. The extent of generalization capability on downstream tasks remains unclear. 2) Scalability. It is unknown how effectively these models can scale to large datasets. 3) Efficiency. The training time and memory usage of these models require evaluation. 4) Training Stop Criteria. Determining the optimal stopping strategy for pre-training across multiple tasks to maximize performance on downstream tasks. To address these questions, we have constructed a rigorous benchmark that thoroughly analyzes and studies the generalization and scalability of self-supervised Graph Neural Network (GNN) models. Regarding generalization, we have implemented and compared the performance of various self-supervised GNN models, trained to generate node representations, across tasks such as node classification, link prediction, and node clustering. For scalability, we have compared the performance of various models after training using full-batch and mini-batch strategies. Additionally, we have assessed the training efficiency of these models by conducting experiments to test their GPU memory usage and throughput. Through these experiments, we aim to provide insights to motivate future research. The code for this benchmark is publicly available at https://github.com/NYUSHCS/GraphFM. △ Less

Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

arXiv:2405.07006 [pdf, other]

Word-specific tonal realizations in Mandarin

Authors: Yu-Ying Chuang, Melanie J. Bell, Yu-Hsiang Tseng, R. Harald Baayen

Abstract: The pitch contours of Mandarin two-character words are generally understood as being shaped by the underlying tones of the constituent single-character words, in interaction with articulatory constraints imposed by factors such as speech rate, co-articulation with adjacent tones, segmental make-up, and predictability. This study shows that tonal realization is also partially determined by words' m… ▽ More The pitch contours of Mandarin two-character words are generally understood as being shaped by the underlying tones of the constituent single-character words, in interaction with articulatory constraints imposed by factors such as speech rate, co-articulation with adjacent tones, segmental make-up, and predictability. This study shows that tonal realization is also partially determined by words' meanings. We first show, on the basis of a Taiwan corpus of spontaneous conversations, using the generalized additive regression model, and focusing on the rise-fall tone pattern, that after controlling for effects of speaker and context, word type is a stronger predictor of pitch realization than all the previously established word-form related predictors combined. Importantly, the addition of information about meaning in context improves prediction accuracy even further. We then proceed to show, using computational modeling with context-specific word embeddings, that token-specific pitch contours predict word type with 50% accuracy on held-out data, and that context-sensitive, token-specific embeddings can predict the shape of pitch contours with 30% accuracy. These accuracies, which are an order of magnitude above chance level, suggest that the relation between words' pitch contours and their meanings are sufficiently strong to be functional for language users. The theoretical implications of these empirical findings are discussed. △ Less

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.01880 [pdf, other]

doi 10.1016/j.optcom.2024.130367

Change of polarization degree of light beams on propagation in curved space

Authors: You-Lin Chuang, Himanshu Parihar

Abstract: Even in free space, which is commonly considered of as a flat space-time in most settings, the degree of polarization of a partially spatially coherent light beam changes as it travels. Similarly, the polarization degree would change when a partially spatially coherent light beam propagates in a curved space-time. The difference of the polarization degree between the curved space and flat space ca… ▽ More Even in free space, which is commonly considered of as a flat space-time in most settings, the degree of polarization of a partially spatially coherent light beam changes as it travels. Similarly, the polarization degree would change when a partially spatially coherent light beam propagates in a curved space-time. The difference of the polarization degree between the curved space and flat space can reveal the essential structure of the curved space. In this work, we consider a simplest case of curved space known as Schwarzschild spacetime. We can simulate the Schwarzschild space-time as an optical material with an effective refractive index. The difference of the polarization degree of a light beam propagating in curved space and flat space can be achieved up to $ 5\% $, which is detectable in practical measurement. In addition, we have found that the partially spatially coherent light source is necessary for obtaining significant changes in polarization degree. Our results provide an alternative method to estimate the Schwarzschild radius of a massive object with the optical polarization degree measurement. △ Less

Submitted 3 May, 2024; originally announced May 2024.

Comments: 14 pages, 4 figures

Journal ref: Optics Communications 558 (2024) 130367

arXiv:2404.17022 [pdf]

Investigating differences in lab-quality and remote recording methods with dynamic acoustic measures

Authors: Cong Zhang, Kathleen Jepson, Yu-Ying Chuang

Abstract: Increasingly, phonetic research utilizes data collected from participants who record themselves on readily available devices. Though such recordings are convenient, their suitability for acoustic analysis remains an open question, especially regarding how the individual methods affect acoustic measures over time. We used Quantile Generalized Additive Mixed Models (QGAMMs) to analyze measures of F0… ▽ More Increasingly, phonetic research utilizes data collected from participants who record themselves on readily available devices. Though such recordings are convenient, their suitability for acoustic analysis remains an open question, especially regarding how the individual methods affect acoustic measures over time. We used Quantile Generalized Additive Mixed Models (QGAMMs) to analyze measures of F0, intensity, and the first and second formants, comparing files recorded using a laboratory-standard recording method (Zoom H6 Recorder with an external microphone), to three remote recording methods, (1) the Awesome Voice Recorder application on a smartphone (AVR), (2) the Zoom meeting application with default settings (Zoom-default), and (3) the Zoom meeting application with the "Turn on Original Sound" setting (Zoom-raw). A linear temporal alignment issue was observed for the Zoom methods over the course of the long, recording session files. However, the difference was not significant for utterance-length files. F0 was reliably measured using all methods. Intensity and formants presented non-linear differences across methods that could not be corrected for simply. Overall, the AVR files were most similar to the H6's, and so AVR is deemed to be a more reliable recording method than either Zoom-default or Zoom-raw. △ Less

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.09385 [pdf, other]

A Large-Scale Evaluation of Speech Foundation Models

Authors: Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee

Abstract: The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work,… ▽ More The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work, we establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the paradigm for speech. We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads. Combining our results with community submissions, we verify that the foundation model paradigm is promising for speech, and our multi-tasking framework is simple yet effective, as the best-performing foundation model shows competitive generalizability across most SUPERB tasks. For reproducibility and extensibility, we have developed a long-term maintained platform that enables deterministic benchmarking, allows for result sharing via an online leaderboard, and promotes collaboration through a community-driven benchmark database to support new development cycles. Finally, we conduct a series of analyses to offer an in-depth understanding of SUPERB and speech foundation models, including information flows across tasks inside the models, the correctness of the weighted-sum benchmarking protocol and the statistical significance and robustness of the benchmark. △ Less

Submitted 29 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

Comments: The extended journal version for SUPERB and SUPERB-SG. Published in IEEE/ACM TASLP. The Arxiv version is preferred

arXiv:2404.04231 [pdf, other]

Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation

Authors: Ji-Jia Wu, Andy Chia-Hao Chang, Chieh-Yu Chuang, Chun-Pei Chen, Yu-Lun Liu, Min-Hung Chen, Hou-Ning Hu, Yung-Yu Chuang, Yen-Yu Lin

Abstract: This paper addresses text-supervised semantic segmentation, aiming to learn a model capable of segmenting arbitrary visual concepts within images by using only image-text pairs without dense annotations. Existing methods have demonstrated that contrastive learning on image-text pairs effectively aligns visual segments with the meanings of texts. We notice that there is a discrepancy between text a… ▽ More This paper addresses text-supervised semantic segmentation, aiming to learn a model capable of segmenting arbitrary visual concepts within images by using only image-text pairs without dense annotations. Existing methods have demonstrated that contrastive learning on image-text pairs effectively aligns visual segments with the meanings of texts. We notice that there is a discrepancy between text alignment and semantic segmentation: A text often consists of multiple semantic concepts, whereas semantic segmentation strives to create semantically homogeneous segments. To address this issue, we propose a novel framework, Image-Text Co-Decomposition (CoDe), where the paired image and text are jointly decomposed into a set of image regions and a set of word segments, respectively, and contrastive learning is developed to enforce region-word alignment. To work with a vision-language model, we present a prompt learning mechanism that derives an extra representation to highlight an image segment or a word segment of interest, with which more effective features can be extracted from that segment. Comprehensive experimental results demonstrate that our method performs favorably against existing text-supervised semantic segmentation methods on six benchmark datasets. △ Less

Submitted 5 April, 2024; originally announced April 2024.

Comments: CVPR 2024

arXiv:2404.02963 [pdf, other]

Unraveling the Mn $L_3$-edge RIXS spectrum of lightly manganese doped Sr$_{3}$Ru$_{2}$O$_{7}$

Authors: Wei-Yang Chen, Shih-Wen Huang, Yi Tseng, Wenliang Zhang, Eugenio Paris, Teguh Citra Asmara, Jenn-Min Lee, Thorsten Schmitt, Yu-Cheng Shao, Yi-De Chuang, Byron Freelon, Dao-Xin Yao, Trinanjan Datta

Abstract: Resonant inelastic x-ray scattering (RIXS) experiment was performed at the Mn $L_3$ edge. A 10 $\%$ Mn-doped Sr$_{3}$Ru$_{2}$O$_{7}$ compound, where the Mn$^{3+}$ ions are in the 3$d^4$ state, were probed for $dd$ excitations. The dilute doping concentration allows one to treat the dopant Mn$^{3+}$ ions as effectively free in the host ruthenium compound. The local nature of $dd$ RIXS spectroscopy… ▽ More Resonant inelastic x-ray scattering (RIXS) experiment was performed at the Mn $L_3$ edge. A 10 $\%$ Mn-doped Sr$_{3}$Ru$_{2}$O$_{7}$ compound, where the Mn$^{3+}$ ions are in the 3$d^4$ state, were probed for $dd$ excitations. The dilute doping concentration allows one to treat the dopant Mn$^{3+}$ ions as effectively free in the host ruthenium compound. The local nature of $dd$ RIXS spectroscopy permits one to use a single-site model to simulate the experimental spectra. The simulated spectra reproduces the in-plane [100] experimental RIXS spectrum. We also predict the intensity for the in-plane [110] direction and the out-of-plane spin orientation configuration [001]. Based on our single-ion model we were able to fit the experimental data to obtain the crystal field parameters, the 10Dq value, and the intra-orbital spin-flip energy 2$\mathcal{J}$(or $3J_{H}$, where $J_{H}$ is the Hund's energy) of the Mn$^{3+}$ ion. Utilizing our computed RIXS quantum transition amplitudes between the various $d$ orbitals of the Mn$^{3+}$ ion, the expression for the Kramers-Heisenberg cross section, and a self-consistent fitting procedure we also identify the energy boundaries of the non-spin-flip and spin-flip $dd$ excitations present in the experimental data. From our fitting procedure we obtain $2\mathcal{J} (3J_{H})=2.06$ eV, a value which is in excellent agreement with that computed from the free ion Racah parameters. We also identified the charge transfer boundary. In addition to predicting the microscopic parameters, we find a quantum spin-flip transition in the non-cross ($σ_{in}-σ_{out}$, $π_{in}-π_{out}$) x-ray polarization channels of the $dd$ RIXS spectra. A similar transition, was previously predicted to occur in the $π-π$ channel of the magnon spectrum in the non-collinear non-coplanar Kagome compound composed of Cu$^{2+}$ 3d$^{9}$ ion. △ Less

Submitted 3 April, 2024; originally announced April 2024.

Comments: 12 pages, 7 figures, see PDF text for full abstract info

arXiv:2403.00108 [pdf, other]

LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario

Authors: Hongyi Liu, Zirui Liu, Ruixiang Tang, Jiayi Yuan, Shaochen Zhong, Yu-Neng Chuang, Li Li, Rui Chen, Xia Hu

Abstract: Fine-tuning LLMs is crucial to enhancing their task-specific performance and ensuring model behaviors are aligned with human preferences. Among various fine-tuning methods, LoRA is popular for its efficiency and ease to use, allowing end-users to easily post and adopt lightweight LoRA modules on open-source platforms to tailor their model for different customization. However, such a handy share-an… ▽ More Fine-tuning LLMs is crucial to enhancing their task-specific performance and ensuring model behaviors are aligned with human preferences. Among various fine-tuning methods, LoRA is popular for its efficiency and ease to use, allowing end-users to easily post and adopt lightweight LoRA modules on open-source platforms to tailor their model for different customization. However, such a handy share-and-play setting opens up new attack surfaces, that the attacker can render LoRA as an attacker, such as backdoor injection, and widely distribute the adversarial LoRA to the community easily. This can result in detrimental outcomes. Despite the huge potential risks of sharing LoRA modules, this aspect however has not been fully explored. To fill the gap, in this study we thoroughly investigate the attack opportunities enabled in the growing share-and-play scenario. Specifically, we study how to inject backdoor into the LoRA module and dive deeper into LoRA's infection mechanisms. We found that training-free mechanism is possible in LoRA backdoor injection. We also discover the impact of backdoor attacks with the presence of multiple LoRA adaptions concurrently as well as LoRA based backdoor transferability. Our aim is to raise awareness of the potential risks under the emerging share-and-play scenario, so as to proactively prevent potential consequences caused by LoRA-as-an-Attack. Warning: the paper contains potential offensive content generated by models. △ Less

Submitted 29 February, 2024; originally announced March 2024.

arXiv:2402.19464 [pdf, other]

Curiosity-driven Red-teaming for Large Language Models

Authors: Zhang-Wei Hong, Idan Shenfeld, Tsun-Hsuan Wang, Yung-Sung Chuang, Aldo Pareja, James Glass, Akash Srivastava, Pulkit Agrawal

Abstract: Large language models (LLMs) hold great potential for many natural language applications but risk generating incorrect or toxic content. To probe when an LLM generates unwanted content, the current paradigm is to recruit a \textit{red team} of human testers to design input prompts (i.e., test cases) that elicit undesirable responses from LLMs. However, relying solely on human testers is expensive… ▽ More Large language models (LLMs) hold great potential for many natural language applications but risk generating incorrect or toxic content. To probe when an LLM generates unwanted content, the current paradigm is to recruit a \textit{red team} of human testers to design input prompts (i.e., test cases) that elicit undesirable responses from LLMs. However, relying solely on human testers is expensive and time-consuming. Recent works automate red teaming by training a separate red team LLM with reinforcement learning (RL) to generate test cases that maximize the chance of eliciting undesirable responses from the target LLM. However, current RL methods are only able to generate a small number of effective test cases resulting in a low coverage of the span of prompts that elicit undesirable responses from the target LLM. To overcome this limitation, we draw a connection between the problem of increasing the coverage of generated test cases and the well-studied approach of curiosity-driven exploration that optimizes for novelty. Our method of curiosity-driven red teaming (CRT) achieves greater coverage of test cases while mantaining or increasing their effectiveness compared to existing methods. Our method, CRT successfully provokes toxic responses from LLaMA2 model that has been heavily fine-tuned using human preferences to avoid toxic outputs. Code is available at \url{https://github.com/Improbable-AI/curiosity_redteam} △ Less

Submitted 29 February, 2024; originally announced February 2024.

Comments: Published at ICLR 2024

arXiv:2402.18700 [pdf, other]

Learning to Compress Prompt in Natural Language Formats

Authors: Yu-Neng Chuang, Tianwei Xing, Chia-Yuan Chang, Zirui Liu, Xun Chen, Xia Hu

Abstract: Large language models (LLMs) are great at processing multiple natural language processing tasks, but their abilities are constrained by inferior performance with long context, slow inference speed, and the high cost of computing the results. Deploying LLMs with precise and informative context helps users process large-scale datasets more effectively and cost-efficiently. Existing works rely on com… ▽ More Large language models (LLMs) are great at processing multiple natural language processing tasks, but their abilities are constrained by inferior performance with long context, slow inference speed, and the high cost of computing the results. Deploying LLMs with precise and informative context helps users process large-scale datasets more effectively and cost-efficiently. Existing works rely on compressing long prompt contexts into soft prompts. However, soft prompt compression encounters limitations in transferability across different LLMs, especially API-based LLMs. To this end, this work aims to compress lengthy prompts in the form of natural language with LLM transferability. This poses two challenges: (i) Natural Language (NL) prompts are incompatible with back-propagation, and (ii) NL prompts lack flexibility in imposing length constraints. In this work, we propose a Natural Language Prompt Encapsulation (Nano-Capsulator) framework compressing original prompts into NL formatted Capsule Prompt while maintaining the prompt utility and transferability. Specifically, to tackle the first challenge, the Nano-Capsulator is optimized by a reward function that interacts with the proposed semantics preserving loss. To address the second question, the Nano-Capsulator is optimized by a reward function featuring length constraints. Experimental results demonstrate that the Capsule Prompt can reduce 81.4% of the original length, decrease inference latency up to 4.5x, and save 80.1% of budget overheads while providing transferability across diverse LLMs and different datasets. △ Less

Submitted 1 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.15515 [pdf]

Feasibility of Identifying Factors Related to Alzheimer's Disease and Related Dementia in Real-World Data

Authors: Aokun Chen, Qian Li, Yu Huang, Yongqiu Li, Yu-neng Chuang, Xia Hu, Serena Guo, Yonghui Wu, Yi Guo, Jiang Bian

Abstract: A comprehensive view of factors associated with AD/ADRD will significantly aid in studies to develop new treatments for AD/ADRD and identify high-risk populations and patients for prevention efforts. In our study, we summarized the risk factors for AD/ADRD by reviewing existing meta-analyses and review articles on risk and preventive factors for AD/ADRD. In total, we extracted 477 risk factors in… ▽ More A comprehensive view of factors associated with AD/ADRD will significantly aid in studies to develop new treatments for AD/ADRD and identify high-risk populations and patients for prevention efforts. In our study, we summarized the risk factors for AD/ADRD by reviewing existing meta-analyses and review articles on risk and preventive factors for AD/ADRD. In total, we extracted 477 risk factors in 10 categories from 537 studies. We constructed an interactive knowledge map to disseminate our study results. Most of the risk factors are accessible from structured Electronic Health Records (EHRs), and clinical narratives show promise as information sources. However, evaluating genomic risk factors using RWD remains a challenge, as genetic testing for AD/ADRD is still not a common practice and is poorly documented in both structured and unstructured EHRs. Considering the constantly evolving research on AD/ADRD risk factors, literature mining via NLP methods offers a solution to automatically update our knowledge map. △ Less

Submitted 3 February, 2024; originally announced February 2024.

arXiv:2402.14721 [pdf, other]

Anomalous Giant Superradiance in Molecular Aggregates Coupled to Polaritons

Authors: Yi-Ting Chuang, Liang-Yan Hsu

Abstract: In this study, we unveil an eccentric superradiance phenomenon in molecular aggregates coupled to surface plasmon polaritons. Through the quantization of electromagnetic fields in media, we demonstrate that superradiance can be significantly enhanced by polaritons and its behavior distinguishably surpasses the Dick's $N$ scaling law. To understand the mechanism of this anomalous phenomenon, we der… ▽ More In this study, we unveil an eccentric superradiance phenomenon in molecular aggregates coupled to surface plasmon polaritons. Through the quantization of electromagnetic fields in media, we demonstrate that superradiance can be significantly enhanced by polaritons and its behavior distinguishably surpasses the Dick's $N$ scaling law. To understand the mechanism of this anomalous phenomenon, we derive an analytical expression of the superradiance rate, which is general for molecular aggregates in arbitrary dispersive and absorbing media. Furthermore, we show the importance of intermolecular distance for this extraordinary superradiance. △ Less

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.13927 [pdf, other]

The Delusional Hedge Algorithm as a Model of Human Learning from Diverse Opinions

Authors: Yun-Shiuan Chuang, Jerry Zhu, Timothy T. Rogers

Abstract: Whereas cognitive models of learning often assume direct experience with both the features of an event and with a true label or outcome, much of everyday learning arises from hearing the opinions of others, without direct access to either the experience or the ground truth outcome. We consider how people can learn which opinions to trust in such scenarios by extending the hedge algorithm: a classi… ▽ More Whereas cognitive models of learning often assume direct experience with both the features of an event and with a true label or outcome, much of everyday learning arises from hearing the opinions of others, without direct access to either the experience or the ground truth outcome. We consider how people can learn which opinions to trust in such scenarios by extending the hedge algorithm: a classic solution for learning from diverse information sources. We first introduce a semi-supervised variant we call the delusional hedge capable of learning from both supervised and unsupervised experiences. In two experiments, we examine the alignment between human judgments and predictions from the standard hedge, the delusional hedge, and a heuristic baseline model. Results indicate that humans effectively incorporate both labeled and unlabeled information in a manner consistent with the delusional hedge algorithm -- suggesting that human learners not only gauge the accuracy of information sources but also their consistency with other reliable sources. The findings advance our understanding of human learning from diverse opinions, with implications for the development of algorithms that better capture how people learn to weigh conflicting information sources. △ Less

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.05728 [pdf, other]

CTGAN: Semantic-guided Conditional Texture Generator for 3D Shapes

Authors: Yi-Ting Pan, Chai-Rong Lee, Shu-Ho Fan, Jheng-Wei Su, Jia-Bin Huang, Yung-Yu Chuang, Hung-Kuo Chu

Abstract: The entertainment industry relies on 3D visual content to create immersive experiences, but traditional methods for creating textured 3D models can be time-consuming and subjective. Generative networks such as StyleGAN have advanced image synthesis, but generating 3D objects with high-fidelity textures is still not well explored, and existing methods have limitations. We propose the Semantic-guide… ▽ More The entertainment industry relies on 3D visual content to create immersive experiences, but traditional methods for creating textured 3D models can be time-consuming and subjective. Generative networks such as StyleGAN have advanced image synthesis, but generating 3D objects with high-fidelity textures is still not well explored, and existing methods have limitations. We propose the Semantic-guided Conditional Texture Generator (CTGAN), producing high-quality textures for 3D shapes that are consistent with the viewing angle while respecting shape semantics. CTGAN utilizes the disentangled nature of StyleGAN to finely manipulate the input latent codes, enabling explicit control over both the style and structure of the generated textures. A coarse-to-fine encoder architecture is introduced to enhance control over the structure of the resulting textures via input segmentation. Experimental results show that CTGAN outperforms existing methods on multiple quality metrics and achieves state-of-the-art performance on texture generation in both conditional and unconditional settings. △ Less

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.04678 [pdf, other]

FaithLM: Towards Faithful Explanations for Large Language Models

Authors: Yu-Neng Chuang, Guanchu Wang, Chia-Yuan Chang, Ruixiang Tang, Shaochen Zhong, Fan Yang, Mengnan Du, Xuanting Cai, Xia Hu

Abstract: Large Language Models (LLMs) have become proficient in addressing complex tasks by leveraging their extensive internal knowledge and reasoning capabilities. However, the black-box nature of these models complicates the task of explaining their decision-making processes. While recent advancements demonstrate the potential of leveraging LLMs to self-explain their predictions through natural language… ▽ More Large Language Models (LLMs) have become proficient in addressing complex tasks by leveraging their extensive internal knowledge and reasoning capabilities. However, the black-box nature of these models complicates the task of explaining their decision-making processes. While recent advancements demonstrate the potential of leveraging LLMs to self-explain their predictions through natural language (NL) explanations, their explanations may not accurately reflect the LLMs' decision-making process due to a lack of fidelity optimization on the derived explanations. Measuring the fidelity of NL explanations is a challenging issue, as it is difficult to manipulate the input context to mask the semantics of these explanations. To this end, we introduce FaithLM to explain the decision of LLMs with NL explanations. Specifically, FaithLM designs a method for evaluating the fidelity of NL explanations by incorporating the contrary explanations to the query process. Moreover, FaithLM conducts an iterative process to improve the fidelity of derived explanations. Experiment results on three datasets from multiple domains demonstrate that FaithLM can significantly improve the fidelity of derived explanations, which also provides a better alignment with the ground-truth explanations. △ Less

Submitted 26 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.00179 [pdf, other]

De-identification is not always enough

Authors: Atiquer Rahman Sarkar, Yao-Shun Chuang, Noman Mohammed, Xiaoqian Jiang

Abstract: For sharing privacy-sensitive data, de-identification is commonly regarded as adequate for safeguarding privacy. Synthetic data is also being considered as a privacy-preserving alternative. Recent successes with numerical and tabular data generative models and the breakthroughs in large generative language models raise the question of whether synthetically generated clinical notes could be a viabl… ▽ More For sharing privacy-sensitive data, de-identification is commonly regarded as adequate for safeguarding privacy. Synthetic data is also being considered as a privacy-preserving alternative. Recent successes with numerical and tabular data generative models and the breakthroughs in large generative language models raise the question of whether synthetically generated clinical notes could be a viable alternative to real notes for research purposes. In this work, we demonstrated that (i) de-identification of real clinical notes does not protect records against a membership inference attack, (ii) proposed a novel approach to generate synthetic clinical notes using the current state-of-the-art large language models, (iii) evaluated the performance of the synthetically generated notes in a clinical domain task, and (iv) proposed a way to mount a membership inference attack where the target model is trained with synthetic data. We observed that when synthetically generated notes closely match the performance of real data, they also exhibit similar privacy concerns to the real data. Whether other approaches to synthetically generated clinical notes could offer better trade-offs and become a better alternative to sensitive real notes warrants further investigation. △ Less

Submitted 31 January, 2024; originally announced February 2024.

arXiv:2401.13463 [pdf, other]

SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering

Authors: Chyi-Jiunn Lin, Guan-Ting Lin, Yung-Sung Chuang, Wei-Lun Wu, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Lin-shan Lee

Abstract: Spoken Question Answering (SQA) is essential for machines to reply to user's question by finding the answer span within a given spoken passage. SQA has been previously achieved without ASR to avoid recognition errors and Out-of-Vocabulary (OOV) problems. However, the real-world problem of Open-domain SQA (openSQA), in which the machine needs to first retrieve passages that possibly contain the ans… ▽ More Spoken Question Answering (SQA) is essential for machines to reply to user's question by finding the answer span within a given spoken passage. SQA has been previously achieved without ASR to avoid recognition errors and Out-of-Vocabulary (OOV) problems. However, the real-world problem of Open-domain SQA (openSQA), in which the machine needs to first retrieve passages that possibly contain the answer from a spoken archive in addition, was never considered. This paper proposes the first known end-to-end framework, Speech Dense Passage Retriever (SpeechDPR), for the retrieval component of the openSQA problem. SpeechDPR learns a sentence-level semantic representation by distilling knowledge from the cascading model of unsupervised ASR (UASR) and text dense retriever (TDR). No manually transcribed speech data is needed. Initial experiments showed performance comparable to the cascading model of UASR and TDR, and significantly better when UASR was poor, verifying this approach is more robust to speech recognition errors. △ Less

Submitted 24 August, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: Accepted at ICASSP 2024

arXiv:2312.15359 [pdf, other]

TVE: Learning Meta-attribution for Transferable Vision Explainer

Authors: Guanchu Wang, Yu-Neng Chuang, Fan Yang, Mengnan Du, Chia-Yuan Chang, Shaochen Zhong, Zirui Liu, Zhaozhuo Xu, Kaixiong Zhou, Xuanting Cai, Xia Hu

Abstract: Explainable machine learning significantly improves the transparency of deep neural networks. However, existing work is constrained to explaining the behavior of individual model predictions, and lacks the ability to transfer the explanation across various models and tasks. This limitation results in explaining various tasks being time- and resource-consuming. To address this problem, we introduce… ▽ More Explainable machine learning significantly improves the transparency of deep neural networks. However, existing work is constrained to explaining the behavior of individual model predictions, and lacks the ability to transfer the explanation across various models and tasks. This limitation results in explaining various tasks being time- and resource-consuming. To address this problem, we introduce a Transferable Vision Explainer (TVE) that can effectively explain various vision models in downstream tasks. Specifically, the transferability of TVE is realized through a pre-training process on large-scale datasets towards learning the meta-attribution. This meta-attribution leverages the versatility of generic backbone encoders to comprehensively encode the attribution knowledge for the input instance, which enables TVE to seamlessly transfer to explain various downstream tasks, without the need for training on task-specific data. Empirical studies involve explaining three different architectures of vision models across three diverse downstream datasets. The experimental results indicate TVE is effective in explaining these tasks without the need for additional training on downstream data. △ Less

Submitted 15 July, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

arXiv:2312.13063 [pdf, other]

Microscopic theory of exciton-polariton model involving multiple molecules: Macroscopic quantum electrodynamics formulation and essence of direct intermolecular interactions

Authors: Yi-Ting Chuang, Liang-Yan Hsu

Abstract: Cavity quantum electrodynamics (CQED) and its extensions are widely used for the description of exciton-polariton systems. However, the exciton-polariton models based on CQED vary greatly within different contexts. One of the most significant discrepancies among these CQED models is whether one should include direct intermolecular interactions in the CQED Hamiltonian. To answer this question, in t… ▽ More Cavity quantum electrodynamics (CQED) and its extensions are widely used for the description of exciton-polariton systems. However, the exciton-polariton models based on CQED vary greatly within different contexts. One of the most significant discrepancies among these CQED models is whether one should include direct intermolecular interactions in the CQED Hamiltonian. To answer this question, in this article, we derive an effective dissipative CQED model including free-space dipole-dipole interactions (CQED-DDI) from a microscopic Hamiltonian based on macroscopic quantum electrodynamics. Dissipative CQED-DDI successfully captures the nature of vacuum fluctuations in dielectric media and separates it into the free-space effects and the dielectric-induced effects. The former include spontaneous emissions, dephasings and dipole-dipole interactions in free space; the latter include exciton-polariton interactions and photonic losses due to dielectric media. We apply dissipative CQED-DDI to investigate the exciton-polariton dynamics (the population dynamics of molecules above a plasmonic surface) and compare the results with those based on the methods proposed by several previous studies. We find that direct intermolecular interactions are a crucial element when employing CQED-like models to study exciton-polariton systems involving multiple molecules. △ Less

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2311.10810 [pdf]

Use GPT-J Prompt Generation with RoBERTa for NER Models on Diagnosis Extraction of Periodontal Diagnosis from Electronic Dental Records

Authors: Yao-Shun Chuang, Xiaoqian Jiang, Chun-Teh Lee, Ryan Brandon, Duong Tran, Oluwabunmi Tokede, Muhammad F. Walji

Abstract: This study explored the usability of prompt generation on named entity recognition (NER) tasks and the performance in different settings of the prompt. The prompt generation by GPT-J models was utilized to directly test the gold standard as well as to generate the seed and further fed to the RoBERTa model with the spaCy package. In the direct test, a lower ratio of negative examples with higher nu… ▽ More This study explored the usability of prompt generation on named entity recognition (NER) tasks and the performance in different settings of the prompt. The prompt generation by GPT-J models was utilized to directly test the gold standard as well as to generate the seed and further fed to the RoBERTa model with the spaCy package. In the direct test, a lower ratio of negative examples with higher numbers of examples in prompt achieved the best results with a F1 score of 0.72. The performance revealed consistency, 0.92-0.97 in the F1 score, in all settings after training with the RoBERTa model. The study highlighted the importance of seed quality rather than quantity in feeding NER models. This research reports on an efficient and accurate way to mine clinical notes for periodontal diagnoses, allowing researchers to easily and quickly build a NER model with the prompt generation approach. △ Less

Submitted 17 November, 2023; originally announced November 2023.

Comments: 2023 AMIA Annual Symposium, see https://amia.org/education-events/amia-2023-annual-symposium

arXiv:2311.10809 [pdf]

Extracting periodontitis diagnosis in clinical notes with RoBERTa and regular expression

Authors: Yao-Shun Chuang, Chun-Teh Lee, Ryan Brandon, Trung Duong Tran, Oluwabunmi Tokede, Muhammad F. Walji, Xiaoqian Jiang

Abstract: This study aimed to utilize text processing and natural language processing (NLP) models to mine clinical notes for the diagnosis of periodontitis and to evaluate the performance of a named entity recognition (NER) model on different regular expression (RE) methods. Two complexity levels of RE methods were used to extract and generate the training data. The SpaCy package and RoBERTa transformer mo… ▽ More This study aimed to utilize text processing and natural language processing (NLP) models to mine clinical notes for the diagnosis of periodontitis and to evaluate the performance of a named entity recognition (NER) model on different regular expression (RE) methods. Two complexity levels of RE methods were used to extract and generate the training data. The SpaCy package and RoBERTa transformer models were used to build the NER model and evaluate its performance with the manual-labeled gold standards. The comparison of the RE methods with the gold standard showed that as the complexity increased in the RE algorithms, the F1 score increased from 0.3-0.4 to around 0.9. The NER models demonstrated excellent predictions, with the simple RE method showing 0.84-0.92 in the evaluation metrics, and the advanced and combined RE method demonstrating 0.95-0.99 in the evaluation. This study provided an example of the benefit of combining NER methods and NLP models in extracting target information from free-text to structured data and fulfilling the need for missing diagnoses from unstructured notes. △ Less

Submitted 17 November, 2023; originally announced November 2023.

Comments: IEEE ICHI 2023, see https://ieeeichi.github.io/ICHI2023/program.html

arXiv:2311.10127 [pdf, other]

Learning interactions to boost human creativity with bandits and GPT-4

Authors: Ara Vartanian, Xiaoxi Sun, Yun-Shiuan Chuang, Siddharth Suresh, Xiaojin Zhu, Timothy T. Rogers

Abstract: This paper considers how interactions with AI algorithms can boost human creative thought. We employ a psychological task that demonstrates limits on human creativity, namely semantic feature generation: given a concept name, respondents must list as many of its features as possible. Human participants typically produce only a fraction of the features they know before getting "stuck." In experimen… ▽ More This paper considers how interactions with AI algorithms can boost human creative thought. We employ a psychological task that demonstrates limits on human creativity, namely semantic feature generation: given a concept name, respondents must list as many of its features as possible. Human participants typically produce only a fraction of the features they know before getting "stuck." In experiments with humans and with a language AI (GPT-4) we contrast behavior in the standard task versus a variant in which participants can ask for algorithmically-generated hints. Algorithm choice is administered by a multi-armed bandit whose reward indicates whether the hint helped generating more features. Humans and the AI show similar benefits from hints, and remarkably, bandits learning from AI responses prefer the same prompting strategy as those learning from human behavior. The results suggest that strategies for boosting human creativity via computer interactions can be learned by bandits run on groups of simulated participants. △ Less

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.09665 [pdf, other]

The Wisdom of Partisan Crowds: Comparing Collective Intelligence in Humans and LLM-based Agents

Authors: Yun-Shiuan Chuang, Siddharth Suresh, Nikunj Harlalka, Agam Goyal, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers

Abstract: Human groups are able to converge on more accurate beliefs through deliberation, even in the presence of polarization and partisan bias -- a phenomenon known as the "wisdom of partisan crowds." Generated agents powered by Large Language Models (LLMs) are increasingly used to simulate human collective behavior, yet few benchmarks exist for evaluating their dynamics against the behavior of human gro… ▽ More Human groups are able to converge on more accurate beliefs through deliberation, even in the presence of polarization and partisan bias -- a phenomenon known as the "wisdom of partisan crowds." Generated agents powered by Large Language Models (LLMs) are increasingly used to simulate human collective behavior, yet few benchmarks exist for evaluating their dynamics against the behavior of human groups. In this paper, we examine the extent to which the wisdom of partisan crowds emerges in groups of LLM-based agents that are prompted to role-play as partisan personas (e.g., Democrat or Republican). We find that they not only display human-like partisan biases, but also converge to more accurate beliefs through deliberation as humans do. We then identify several factors that interfere with convergence, including the use of chain-of-thought prompt and lack of details in personas. Conversely, fine-tuning on human data appears to enhance convergence. These findings show the potential and limitations of LLM-based agents as a model of human collective intelligence. △ Less

Submitted 16 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.09661 [pdf, other]

Evolving Domain Adaptation of Pretrained Language Models for Text Classification

Authors: Yun-Shiuan Chuang, Yi Wu, Dhruv Gupta, Rheeya Uppaal, Ananya Kumar, Luhang Sun, Makesh Narsimhan Sreedhar, Sijia Yang, Timothy T. Rogers, Junjie Hu

Abstract: Adapting pre-trained language models (PLMs) for time-series text classification amidst evolving domain shifts (EDS) is critical for maintaining accuracy in applications like stance detection. This study benchmarks the effectiveness of evolving domain adaptation (EDA) strategies, notably self-training, domain-adversarial training, and domain-adaptive pretraining, with a focus on an incremental self… ▽ More Adapting pre-trained language models (PLMs) for time-series text classification amidst evolving domain shifts (EDS) is critical for maintaining accuracy in applications like stance detection. This study benchmarks the effectiveness of evolving domain adaptation (EDA) strategies, notably self-training, domain-adversarial training, and domain-adaptive pretraining, with a focus on an incremental self-training method. Our analysis across various datasets reveals that this incremental method excels at adapting PLMs to EDS, outperforming traditional domain adaptation techniques. These findings highlight the importance of continually updating PLMs to ensure their effectiveness in real-world applications, paving the way for future research into PLM robustness against the natural temporal evolution of language. △ Less

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.09618 [pdf, other]

Simulating Opinion Dynamics with Networks of LLM-based Agents

Authors: Yun-Shiuan Chuang, Agam Goyal, Nikunj Harlalka, Siddharth Suresh, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers

Abstract: Accurately simulating human opinion dynamics is crucial for understanding a variety of societal phenomena, including polarization and the spread of misinformation. However, the agent-based models (ABMs) commonly used for such simulations often over-simplify human behavior. We propose a new approach to simulating opinion dynamics based on populations of Large Language Models (LLMs). Our findings re… ▽ More Accurately simulating human opinion dynamics is crucial for understanding a variety of societal phenomena, including polarization and the spread of misinformation. However, the agent-based models (ABMs) commonly used for such simulations often over-simplify human behavior. We propose a new approach to simulating opinion dynamics based on populations of Large Language Models (LLMs). Our findings reveal a strong inherent bias in LLM agents towards producing accurate information, leading simulated agents to consensus in line with scientific reality. This bias limits their utility for understanding resistance to consensus views on issues like climate change. After inducing confirmation bias through prompt engineering, however, we observed opinion fragmentation in line with existing agent-based modeling and opinion dynamics research. These insights highlight the promise and limitations of LLM agents in this domain and suggest a path forward: refining LLMs with real-world discourse to better simulate the evolution of human beliefs. △ Less

Submitted 31 March, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.05477 [pdf, other]

Using ResNet to Utilize 4-class T2-FLAIR Slice Classification Based on the Cholinergic Pathways Hyperintensities Scale for Pathological Aging

Authors: Wei-Chun Kevin Tsai, Yi-Chien Liu, Ming-Chun Yu, Chia-Ju Chou, Sui-Hing Yan, Yang-Teng Fan, Yan-Hsiang Huang, Yen-Ling Chiu, Yi-Fang Chuang, Ran-Zan Wang, Yao-Chia Shih

Abstract: The Cholinergic Pathways Hyperintensities Scale (CHIPS) is a visual rating scale used to assess the extent of cholinergic white matter hyperintensities in T2-FLAIR images, serving as an indicator of dementia severity. However, the manual selection of four specific slices for rating throughout the entire brain is a time-consuming process. Our goal was to develop a deep learning-based model capable… ▽ More The Cholinergic Pathways Hyperintensities Scale (CHIPS) is a visual rating scale used to assess the extent of cholinergic white matter hyperintensities in T2-FLAIR images, serving as an indicator of dementia severity. However, the manual selection of four specific slices for rating throughout the entire brain is a time-consuming process. Our goal was to develop a deep learning-based model capable of automatically identifying the four slices relevant to CHIPS. To achieve this, we trained a 4-class slice classification model (BSCA) using the ADNI T2-FLAIR dataset (N=150) with the assistance of ResNet. Subsequently, we tested the model's performance on a local dataset (N=30). The results demonstrated the efficacy of our model, with an accuracy of 99.82% and an F1-score of 99.83%. This achievement highlights the potential impact of BSCA as an automatic screening tool, streamlining the selection of four specific T2-FLAIR slices that encompass white matter landmarks along the cholinergic pathways. Clinicians can leverage this tool to assess the risk of clinical dementia development efficiently. △ Less

Submitted 11 September, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

Comments: 8 pages, 2 figures, 2 tables

arXiv:2310.12817 [pdf, other]

2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision

Authors: Cheng-Kun Yang, Min-Hung Chen, Yung-Yu Chuang, Yen-Yu Lin

Abstract: We present a Multimodal Interlaced Transformer (MIT) that jointly considers 2D and 3D data for weakly supervised point cloud segmentation. Research studies have shown that 2D and 3D features are complementary for point cloud segmentation. However, existing methods require extra 2D annotations to achieve 2D-3D information fusion. Considering the high annotation cost of point clouds, effective 2D an… ▽ More We present a Multimodal Interlaced Transformer (MIT) that jointly considers 2D and 3D data for weakly supervised point cloud segmentation. Research studies have shown that 2D and 3D features are complementary for point cloud segmentation. However, existing methods require extra 2D annotations to achieve 2D-3D information fusion. Considering the high annotation cost of point clouds, effective 2D and 3D feature fusion based on weakly supervised learning is in great demand. To this end, we propose a transformer model with two encoders and one decoder for weakly supervised point cloud segmentation using only scene-level class tags. Specifically, the two encoders compute the self-attended features for 3D point clouds and 2D multi-view images, respectively. The decoder implements interlaced 2D-3D cross-attention and carries out implicit 2D and 3D feature fusion. We alternately switch the roles of queries and key-value pairs in the decoder layers. It turns out that the 2D and 3D features are iteratively enriched by each other. Experiments show that it performs favorably against existing weakly supervised point cloud segmentation methods by a large margin on the S3DIS and ScanNet benchmarks. The project page will be available at https://jimmy15923.github.io/mit_web/. △ Less

Submitted 22 January, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: ICCV 2023 (main + supp). Website: https://jimmy15923.github.io/mit_web/

arXiv:2310.07654 [pdf, other]

Audio-Visual Neural Syntax Acquisition

Authors: Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass

Abstract: We study phrase structure induction from visually-grounded speech. The core idea is to first segment the speech waveform into sequences of word segments, and subsequently induce phrase structure using the inferred segment-level continuous representations. We present the Audio-Visual Neural Syntax Learner (AV-NSL) that learns phrase structure by listening to audio and looking at images, without eve… ▽ More We study phrase structure induction from visually-grounded speech. The core idea is to first segment the speech waveform into sequences of word segments, and subsequently induce phrase structure using the inferred segment-level continuous representations. We present the Audio-Visual Neural Syntax Learner (AV-NSL) that learns phrase structure by listening to audio and looking at images, without ever being exposed to text. By training on paired images and spoken captions, AV-NSL exhibits the capability to infer meaningful phrase structures that are comparable to those derived by naturally-supervised text parsers, for both English and German. Our findings extend prior work in unsupervised language acquisition from speech and grounded grammar induction, and present one approach to bridge the gap between the two topics. △ Less

Submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.03991 [pdf, other]

SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation

Authors: Abe Bohan Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, Yulia Tsvetkov

Abstract: Existing watermarking algorithms are vulnerable to paraphrase attacks because of their token-level design. To address this issue, we propose SemStamp, a robust sentence-level semantic watermarking algorithm based on locality-sensitive hashing (LSH), which partitions the semantic space of sentences. The algorithm encodes and LSH-hashes a candidate sentence generated by an LLM, and conducts sentence… ▽ More Existing watermarking algorithms are vulnerable to paraphrase attacks because of their token-level design. To address this issue, we propose SemStamp, a robust sentence-level semantic watermarking algorithm based on locality-sensitive hashing (LSH), which partitions the semantic space of sentences. The algorithm encodes and LSH-hashes a candidate sentence generated by an LLM, and conducts sentence-level rejection sampling until the sampled sentence falls in watermarked partitions in the semantic embedding space. A margin-based constraint is used to enhance its robustness. To show the advantages of our algorithm, we propose a "bigram" paraphrase attack using the paraphrase that has the fewest bigram overlaps with the original sentence. This attack is shown to be effective against the existing token-level watermarking method. Experimental results show that our novel semantic watermark algorithm is not only more robust than the previous state-of-the-art method on both common and bigram paraphrase attacks, but also is better at preserving the quality of generation. △ Less

Submitted 22 April, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: Accepted to NAACL 24 Main

arXiv:2310.01508 [pdf, other]

CODA: Temporal Domain Generalization via Concept Drift Simulator

Authors: Chia-Yuan Chang, Yu-Neng Chuang, Zhimeng Jiang, Kwei-Herng Lai, Anxiao Jiang, Na Zou

Abstract: In real-world applications, machine learning models often become obsolete due to shifts in the joint distribution arising from underlying temporal trends, a phenomenon known as the "concept drift". Existing works propose model-specific strategies to achieve temporal generalization in the near-future domain. However, the diverse characteristics of real-world datasets necessitate customized predicti… ▽ More In real-world applications, machine learning models often become obsolete due to shifts in the joint distribution arising from underlying temporal trends, a phenomenon known as the "concept drift". Existing works propose model-specific strategies to achieve temporal generalization in the near-future domain. However, the diverse characteristics of real-world datasets necessitate customized prediction model architectures. To this end, there is an urgent demand for a model-agnostic temporal domain generalization approach that maintains generality across diverse data modalities and architectures. In this work, we aim to address the concept drift problem from a data-centric perspective to bypass considering the interaction between data and model. Developing such a framework presents non-trivial challenges: (i) existing generative models struggle to generate out-of-distribution future data, and (ii) precisely capturing the temporal trends of joint distribution along chronological source domains is computationally infeasible. To tackle the challenges, we propose the COncept Drift simulAtor (CODA) framework incorporating a predicted feature correlation matrix to simulate future data for model training. Specifically, CODA leverages feature correlations to represent data characteristics at specific time points, thereby circumventing the daunting computational costs. Experimental results demonstrate that using CODA-generated data as training input effectively achieves temporal domain generalization across different model architectures. △ Less

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2309.12646 [pdf]

Decoding Emotional Experiences in Dyadic Conversations of Married Couples: Leveraging Semantic Similarity through Sentence Embedding

Authors: Chen-Wei Yu, Yun-Shiuan Chuang, Alexandros N. Lotsos, Claudia M. Haase

Abstract: Recent advancements in Natural Language Processing (NLP) have highlighted the potential of sentence embeddings in measuring semantic similarity (hereafter similarity). Yet, whether this approach can be used to analyze real-world dyadic interactions and predict people's emotional experiences in response to these interactions remains largely uncharted. To bridge this gap, the present study analyzes… ▽ More Recent advancements in Natural Language Processing (NLP) have highlighted the potential of sentence embeddings in measuring semantic similarity (hereafter similarity). Yet, whether this approach can be used to analyze real-world dyadic interactions and predict people's emotional experiences in response to these interactions remains largely uncharted. To bridge this gap, the present study analyzes verbal conversations of 50 married couples who engage in naturalistic 10-minute conflict and 10-minute positive conversations. Transformer-based model General Text Embeddings-Large is employed to obtain the embeddings of the utterances from each speaker. The overall similarity of the conversations is then quantified by the average cosine similarity between the embeddings of adjacent utterances. Results show that lower similarity is associated with greater positive emotional experiences in the positive (but not conflict) conversation. Follow-up analyses show that (a) findings remain stable when controlling for marital satisfaction and the number of utterance pairs and (b) the similarity measure is valid in capturing critical features of a dyadic conversation. The present study underscores the potency of sentence embeddings in understanding links between interpersonal dynamics and individual emotional experiences, paving the way for innovative applications of NLP tools in affective and relationship science. △ Less

Submitted 25 February, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.10814 [pdf, other]

Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning

Authors: Tianhua Zhang, Jiaxin Ge, Hongyin Luo, Yung-Sung Chuang, Mingye Gao, Yuan Gong, Xixin Wu, Yoon Kim, Helen Meng, James Glass

Abstract: How can we perform computations over natural language representations to solve tasks that require symbolic and numeric reasoning? We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks. Our approach prompts a language model to generate full Python programs that define funct… ▽ More How can we perform computations over natural language representations to solve tasks that require symbolic and numeric reasoning? We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks. Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge. A Python interpreter then executes the generated code and prints the output. Despite using a task-general prompt, we find that this approach can improve upon strong baselines across a range of different tasks including math and symbolic reasoning, text classification, question answering, and instruction following. We found that the generated programs are interpretable since they outline the exact reasoning process followed by the program interpreter. △ Less

Submitted 28 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: NAACL 2024

arXiv:2309.03883 [pdf, other]

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

Authors: Yung-Sung Chuang, Yujia Xie, Hongyin Luo, Yoon Kim, James Glass, Pengcheng He

Abstract: Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining. We propose a simple decoding strategy for reducing hallucinations with pretrained LLMs that does not require conditioning on retrieved external knowledge nor additional fine-tuning. Our approach obtains the next-token distributi… ▽ More Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining. We propose a simple decoding strategy for reducing hallucinations with pretrained LLMs that does not require conditioning on retrieved external knowledge nor additional fine-tuning. Our approach obtains the next-token distribution by contrasting the differences in logits obtained from projecting the later layers versus earlier layers to the vocabulary space, exploiting the fact that factual knowledge in an LLMs has generally been shown to be localized to particular transformer layers. We find that this Decoding by Contrasting Layers (DoLa) approach is able to better surface factual knowledge and reduce the generation of incorrect facts. DoLa consistently improves the truthfulness across multiple choices tasks and open-ended generation tasks, for example improving the performance of LLaMA family models on TruthfulQA by 12-17% absolute points, demonstrating its potential in making LLMs reliably generate truthful facts. △ Less

Submitted 10 March, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

Comments: ICLR 2024 main conference paper. The source code is available at https://github.com/voidism/DoLa

arXiv:2309.01808 [pdf, other]

DiscoverPath: A Knowledge Refinement and Retrieval System for Interdisciplinarity on Biomedical Research

Authors: Yu-Neng Chuang, Guanchu Wang, Chia-Yuan Chang, Kwei-Herng Lai, Daochen Zha, Ruixiang Tang, Fan Yang, Alfredo Costilla Reyes, Kaixiong Zhou, Xiaoqian Jiang, Xia Hu

Abstract: The exponential growth in scholarly publications necessitates advanced tools for efficient article retrieval, especially in interdisciplinary fields where diverse terminologies are used to describe similar research. Traditional keyword-based search engines often fall short in assisting users who may not be familiar with specific terminologies. To address this, we present a knowledge graph-based pa… ▽ More The exponential growth in scholarly publications necessitates advanced tools for efficient article retrieval, especially in interdisciplinary fields where diverse terminologies are used to describe similar research. Traditional keyword-based search engines often fall short in assisting users who may not be familiar with specific terminologies. To address this, we present a knowledge graph-based paper search engine for biomedical research to enhance the user experience in discovering relevant queries and articles. The system, dubbed DiscoverPath, employs Named Entity Recognition (NER) and part-of-speech (POS) tagging to extract terminologies and relationships from article abstracts to create a KG. To reduce information overload, DiscoverPath presents users with a focused subgraph containing the queried entity and its neighboring nodes and incorporates a query recommendation system, enabling users to iteratively refine their queries. The system is equipped with an accessible Graphical User Interface that provides an intuitive visualization of the KG, query recommendations, and detailed article information, enabling efficient article retrieval, thus fostering interdisciplinary knowledge exploration. DiscoverPath is open-sourced at https://github.com/ynchuang/DiscoverPath. △ Less

Submitted 10 October, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

arXiv:2307.15331 [pdf, other]

Tutorials on Stance Detection using Pre-trained Language Models: Fine-tuning BERT and Prompting Large Language Models

Authors: Yun-Shiuan Chuang

Abstract: This paper presents two self-contained tutorials on stance detection in Twitter data using BERT fine-tuning and prompting large language models (LLMs). The first tutorial explains BERT architecture and tokenization, guiding users through training, tuning, and evaluating standard and domain-specific BERT models with HuggingFace transformers. The second focuses on constructing prompts and few-shot e… ▽ More This paper presents two self-contained tutorials on stance detection in Twitter data using BERT fine-tuning and prompting large language models (LLMs). The first tutorial explains BERT architecture and tokenization, guiding users through training, tuning, and evaluating standard and domain-specific BERT models with HuggingFace transformers. The second focuses on constructing prompts and few-shot examples to elicit stances from ChatGPT and open-source FLAN-T5 without fine-tuning. Various prompting strategies are implemented and evaluated using confusion matrices and macro F1 scores. The tutorials provide code, visualizations, and insights revealing the strengths of few-shot ChatGPT and FLAN-T5 which outperform fine-tuned BERTs. By covering both model fine-tuning and prompting-based techniques in an accessible, hands-on manner, these tutorials enable learners to gain applied experience with cutting-edge methods for stance detection. △ Less

Submitted 28 July, 2023; originally announced July 2023.

arXiv:2307.14645 [pdf, other]

doi 10.1103/PhysRevA.109.013717

Quantum dynamics of molecular ensembles coupled with quantum light: Counter-rotating interactions as an essential component

Authors: Yi-Ting Chuang, Liang-Yan Hsu

Abstract: The rotating-wave approximation to light-matter interactions is widely used in the quantum electrodynamics Hamiltonian; however, its validity has long been a matter of debate. In this article, we explore the impact of the rotating-wave approximation on the quantum dynamics of multiple molecules in complex dielectric environments within the framework of macroscopic quantum electrodynamics. In gener… ▽ More The rotating-wave approximation to light-matter interactions is widely used in the quantum electrodynamics Hamiltonian; however, its validity has long been a matter of debate. In this article, we explore the impact of the rotating-wave approximation on the quantum dynamics of multiple molecules in complex dielectric environments within the framework of macroscopic quantum electrodynamics. In general, we find that the energy shifts of the molecules and the inter-molecule dipole-dipole interaction obtained in the weak coupling regime are correct only when the counter-rotating interactions are considered. Moreover, under the rotating-wave approximation, the energy shifts of the ground-state molecules and a portion of the inter-molecule interaction are discarded. Notably, in the near-field zone (short inter-molecular distance), the reduction of inter-molecule interaction can reach up to 50 percent. We also conduct a case study on the population dynamics of a pair of identical molecules above a plasmonic surface. Through analytical and numerical analysis, it is revealed that the rotating-wave approximation can profoundly affect the dynamics of the molecules in both strong and weak coupling regimes, emphasizing the need for careful consideration when making the rotating-wave approximation in a multiple-molecule system coupled with quantum light. △ Less

Submitted 27 July, 2023; originally announced July 2023.

Journal ref: Phys. Rev. A 109, 013717 (2024)

arXiv:2307.07181 [pdf, other]

DISPEL: Domain Generalization via Domain-Specific Liberating

Authors: Chia-Yuan Chang, Yu-Neng Chuang, Guanchu Wang, Mengnan Du, Na Zou

Abstract: Domain generalization aims to learn a generalization model that can perform well on unseen test domains by only training on limited source domains. However, existing domain generalization approaches often bring in prediction-irrelevant noise or require the collection of domain labels. To address these challenges, we consider the domain generalization problem from a different perspective by categor… ▽ More Domain generalization aims to learn a generalization model that can perform well on unseen test domains by only training on limited source domains. However, existing domain generalization approaches often bring in prediction-irrelevant noise or require the collection of domain labels. To address these challenges, we consider the domain generalization problem from a different perspective by categorizing underlying feature groups into domain-shared and domain-specific features. Nevertheless, the domain-specific features are difficult to be identified and distinguished from the input data. In this work, we propose DomaIn-SPEcific Liberating (DISPEL), a post-processing fine-grained masking approach that can filter out undefined and indistinguishable domain-specific features in the embedding space. Specifically, DISPEL utilizes a mask generator that produces a unique mask for each input data to filter domain-specific features. The DISPEL framework is highly flexible to be applied to any fine-tuned models. We derive a generalization error bound to guarantee the generalization performance by optimizing a designed objective loss. The experimental results on five benchmarks demonstrate DISPEL outperforms existing methods and can further generalize various algorithms. △ Less

Submitted 31 July, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

arXiv:2307.06486 [pdf, ps, other]

doi 10.1038/s41563-024-01797-0

Absence of $3a_0$ Charge Density Wave Order in the Infinite Layer Nickelates

Authors: C. T. Parzyck, N. K. Gupta, Y. Wu, V. Anil, L. Bhatt, M. Bouliane, R. Gong, B. Z. Gregory, A. Luo, R. Sutarto, F. He, Y. -D. Chuang, T. Zhou, G. Herranz, L. F. Kourkoutis, A. Singer, D. G. Schlom, D. G. Hawthorn, K. M. Shen

Abstract: A hallmark of many unconventional superconductors is the presence of many-body interactions which give rise to broken symmetry states intertwined with superconductivity. Recent resonant soft x-ray scattering experiments report commensurate $3a_0$ charge density wave order in the infinite layer nickelates, which has important implications regarding the universal interplay between charge order and s… ▽ More A hallmark of many unconventional superconductors is the presence of many-body interactions which give rise to broken symmetry states intertwined with superconductivity. Recent resonant soft x-ray scattering experiments report commensurate $3a_0$ charge density wave order in the infinite layer nickelates, which has important implications regarding the universal interplay between charge order and superconductivity in both the cuprates and nickelates. Here, we present x-ray scattering and spectroscopy measurements on a series of NdNiO$_{2+x}$ samples which reveal that the signatures of charge density wave order are absent in fully reduced, single-phase NdNiO$_2$. The $3a_0$ superlattice peak instead originates from a partially reduced impurity phase where excess apical oxygens form ordered rows with 3 unit cell periodicity. The absence of any observable charge density wave order in NdNiO$_2$ highlights a crucial difference between the phase diagrams of the cuprate and nickelate superconductors. △ Less

Submitted 12 July, 2023; originally announced July 2023.

Comments: Main Text: 8 pages, 4 figures. Supplemental: 12 pages, 12 figures

Showing 1–50 of 236 results for author: Chuang, Y