Search | arXiv e-print repository

Pseudorandomness, symmetry, smoothing: II

Authors: Harm Derksen, Peter Ivanov, Chin Ho Lee, Emanuele Viola

Abstract: We prove several new results on the Hamming weight of bounded uniform and small-bias distributions. We exhibit bounded-uniform distributions whose weight is anti-concentrated, matching existing concentration inequalities. This construction relies on a recent result in approximation theory due to Erdéyi (Acta Arithmetica 2016). In particular, we match the classical tail bounds, generalizing a res… ▽ More We prove several new results on the Hamming weight of bounded uniform and small-bias distributions. We exhibit bounded-uniform distributions whose weight is anti-concentrated, matching existing concentration inequalities. This construction relies on a recent result in approximation theory due to Erdéyi (Acta Arithmetica 2016). In particular, we match the classical tail bounds, generalizing a result by Bun and Steinke (RANDOM 2015). Also, we improve on a construction by Benjamini, Gurel-Gurevich, and Peled (2012). We give a generic transformation that converts any bounded uniform distribution to a small-bias distribution that almost preserves its weight distribution. Applying this transformation in conjunction with the above results and others, we construct small-bias distributions with various weight restrictions. In particular, we match the concentration that follows from that of bounded uniformity and the generic closeness of small-bias and bounded-uniform distributions, answering a question by Bun and Steinke (RANDOM 2015). Moreover, these distributions are supported on only a constant number of Hamming weights. We further extend the anti-concentration constructions to small-bias distributions perturbed with noise, a class that has received much attention recently in derandomization. Our results imply (but are not implied by) a recent result of the authors (CCC 2024), and are based on different techniques. In particular, we prove that the standard Gaussian distribution is far from any mixture of Gaussians with bounded variance. △ Less

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.11439 [pdf, other]

Repurformer: Transformers for Repurposing-Aware Molecule Generation

Authors: Changhun Lee, Gyumin Lee

Abstract: Generating as diverse molecules as possible with desired properties is crucial for drug discovery research, which invokes many approaches based on deep generative models today. Despite recent advancements in these models, particularly in variational autoencoders (VAEs), generative adversarial networks (GANs), Transformers, and diffusion models, a significant challenge known as \textit{the sample b… ▽ More Generating as diverse molecules as possible with desired properties is crucial for drug discovery research, which invokes many approaches based on deep generative models today. Despite recent advancements in these models, particularly in variational autoencoders (VAEs), generative adversarial networks (GANs), Transformers, and diffusion models, a significant challenge known as \textit{the sample bias problem} remains. This problem occurs when generated molecules targeting the same protein tend to be structurally similar, reducing the diversity of generation. To address this, we propose leveraging multi-hop relationships among proteins and compounds. Our model, Repurformer, integrates bi-directional pretraining with Fast Fourier Transform (FFT) and low-pass filtering (LPF) to capture complex interactions and generate diverse molecules. A series of experiments on BindingDB dataset confirm that Repurformer successfully creates substitutes for anchor compounds that resemble positive compounds, increasing diversity between the anchor and generated compounds. △ Less

Submitted 16 July, 2024; originally announced July 2024.

Comments: 12 pages, 8 figures, conference

arXiv:2407.11177 [pdf, ps, other]

Trace reconstruction from local statistical queries

Authors: Xi Chen, Anindya De, Chin Ho Lee, Rocco A. Servedio

Abstract: The goal of trace reconstruction is to reconstruct an unknown $n$-bit string $x$ given only independent random traces of $x$, where a random trace of $x$ is obtained by passing $x$ through a deletion channel. A Statistical Query (SQ) algorithm for trace reconstruction is an algorithm which can only access statistical information about the distribution of random traces of $x$ rather than individual… ▽ More The goal of trace reconstruction is to reconstruct an unknown $n$-bit string $x$ given only independent random traces of $x$, where a random trace of $x$ is obtained by passing $x$ through a deletion channel. A Statistical Query (SQ) algorithm for trace reconstruction is an algorithm which can only access statistical information about the distribution of random traces of $x$ rather than individual traces themselves. Such an algorithm is said to be $\ell$-local if each of its statistical queries corresponds to an $\ell$-junta function over some block of $\ell$ consecutive bits in the trace. Since several -- but not all -- known algorithms for trace reconstruction fall under the local statistical query paradigm, it is interesting to understand the abilities and limitations of local SQ algorithms for trace reconstruction. In this paper we establish nearly-matching upper and lower bounds on local Statistical Query algorithms for both worst-case and average-case trace reconstruction. For the worst-case problem, we show that there is an $\tilde{O}(n^{1/5})$-local SQ algorithm that makes all its queries with tolerance $τ\geq 2^{-\tilde{O}(n^{1/5})}$, and also that any $\tilde{O}(n^{1/5})$-local SQ algorithm must make some query with tolerance $τ\leq 2^{-\tildeΩ(n^{1/5})}$. For the average-case problem, we show that there is an $O(\log n)$-local SQ algorithm that makes all its queries with tolerance $τ\geq 1/\mathrm{poly}(n)$, and also that any $O(\log n)$-local SQ algorithm must make some query with tolerance $τ\leq 1/\mathrm{poly}(n).$ △ Less

Submitted 15 July, 2024; originally announced July 2024.

Comments: RANDOM 2024

arXiv:2407.10603 [pdf, other]

Leave No Knowledge Behind During Knowledge Distillation: Towards Practical and Effective Knowledge Distillation for Code-Switching ASR Using Realistic Data

Authors: Liang-Hsuan Tseng, Zih-Ching Chen, Wei-Shun Chang, Cheng-Kuang Lee, Tsung-Ren Huang, Hung-yi Lee

Abstract: Recent advances in automatic speech recognition (ASR) often rely on large speech foundation models for generating high-quality transcriptions. However, these models can be impractical due to limited computing resources. The situation is even more severe in terms of more realistic or difficult scenarios, such as code-switching ASR (CS-ASR). To address this, we present a framework for developing mor… ▽ More Recent advances in automatic speech recognition (ASR) often rely on large speech foundation models for generating high-quality transcriptions. However, these models can be impractical due to limited computing resources. The situation is even more severe in terms of more realistic or difficult scenarios, such as code-switching ASR (CS-ASR). To address this, we present a framework for developing more efficient models for CS-ASR through knowledge distillation using realistic speech-only data. Our proposed method, Leave No Knowledge Behind During Knowledge Distillation (K$^2$D), leverages both the teacher model's knowledge and additional insights from a small auxiliary model. We evaluate our approach on two in-domain and two out-domain datasets, demonstrating that K$^2$D is effective. By conducting K$^2$D on the unlabeled realistic data, we have successfully obtained a 2-time smaller model with 5-time faster generation speed while outperforming the baseline methods and the teacher model on all the testing sets. We have made our model publicly available on Hugging Face (https://huggingface.co/andybi7676/k2d-whisper.zh-en). △ Less

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10180 [pdf, other]

Defending Against Repetitive-based Backdoor Attacks on Semi-supervised Learning through Lens of Rate-Distortion-Perception Trade-off

Authors: Cheng-Yi Lee, Ching-Chia Kao, Cheng-Han Yeh, Chun-Shien Lu, Chia-Mu Yu, Chu-Song Chen

Abstract: Semi-supervised learning (SSL) has achieved remarkable performance with a small fraction of labeled data by leveraging vast amounts of unlabeled data from the Internet. However, this large pool of untrusted data is extremely vulnerable to data poisoning, leading to potential backdoor attacks. Current backdoor defenses are not yet effective against such a vulnerability in SSL. In this study, we pro… ▽ More Semi-supervised learning (SSL) has achieved remarkable performance with a small fraction of labeled data by leveraging vast amounts of unlabeled data from the Internet. However, this large pool of untrusted data is extremely vulnerable to data poisoning, leading to potential backdoor attacks. Current backdoor defenses are not yet effective against such a vulnerability in SSL. In this study, we propose a novel method, Unlabeled Data Purification (UPure), to disrupt the association between trigger patterns and target classes by introducing perturbations in the frequency domain. By leveraging the Rate- Distortion-Perception (RDP) trade-off, we further identify the frequency band, where the perturbations are added, and justify this selection. Notably, UPure purifies poisoned unlabeled data without the need of extra clean labeled data. Extensive experiments on four benchmark datasets and five SSL algorithms demonstrate that UPure effectively reduces the attack success rate from 99.78% to 0% while maintaining model accuracy △ Less

Submitted 14 July, 2024; originally announced July 2024.

Comments: under review

arXiv:2407.09189 [pdf, other]

doi 10.1016/j.neunet.2024.106367

Segmenting Medical Images with Limited Data

Authors: Zhaoshan Liua, Qiujie Lv, Chau Hung Lee, Lei Shen

Abstract: While computer vision has proven valuable for medical image segmentation, its application faces challenges such as limited dataset sizes and the complexity of effectively leveraging unlabeled images. To address these challenges, we present a novel semi-supervised, consistency-based approach termed the data-efficient medical segmenter (DEMS). The DEMS features an encoder-decoder architecture and in… ▽ More While computer vision has proven valuable for medical image segmentation, its application faces challenges such as limited dataset sizes and the complexity of effectively leveraging unlabeled images. To address these challenges, we present a novel semi-supervised, consistency-based approach termed the data-efficient medical segmenter (DEMS). The DEMS features an encoder-decoder architecture and incorporates the developed online automatic augmenter (OAA) and residual robustness enhancement (RRE) blocks. The OAA augments input data with various image transformations, thereby diversifying the dataset to improve the generalization ability. The RRE enriches feature diversity and introduces perturbations to create varied inputs for different decoders, thereby providing enhanced variability. Moreover, we introduce a sensitive loss to further enhance consistency across different decoders and stabilize the training process. Extensive experimental results on both our own and three public datasets affirm the effectiveness of DEMS. Under extreme data shortage scenarios, our DEMS achieves 16.85\% and 10.37\% improvement in dice score compared with the U-Net and top-performed state-of-the-art method, respectively. Given its superior data efficiency, DEMS could present significant advancements in medical segmentation under small data regimes. The project homepage can be accessed at https://github.com/NUS-Tim/DEMS. △ Less

Submitted 12 July, 2024; originally announced July 2024.

Comments: Neural Networks Accepted

arXiv:2407.08503 [pdf, other]

DIOR-ViT: Differential Ordinal Learning Vision Transformer for Cancer Classification in Pathology Images

Authors: Ju Cheon Lee, Keunho Byeon, Boram Song, Kyungeun Kim, Jin Tae Kwak

Abstract: In computational pathology, cancer grading has been mainly studied as a categorical classification problem, which does not utilize the ordering nature of cancer grades such as the higher the grade is, the worse the cancer is. To incorporate the ordering relationship among cancer grades, we introduce a differential ordinal learning problem in which we define and learn the degree of difference in th… ▽ More In computational pathology, cancer grading has been mainly studied as a categorical classification problem, which does not utilize the ordering nature of cancer grades such as the higher the grade is, the worse the cancer is. To incorporate the ordering relationship among cancer grades, we introduce a differential ordinal learning problem in which we define and learn the degree of difference in the categorical class labels between pairs of samples by using their differences in the feature space. To this end, we propose a transformer-based neural network that simultaneously conducts both categorical classification and differential ordinal classification for cancer grading. We also propose a tailored loss function for differential ordinal learning. Evaluating the proposed method on three different types of cancer datasets, we demonstrate that the adoption of differential ordinal learning can improve the accuracy and reliability of cancer grading, outperforming conventional cancer grading approaches. The proposed approach should be applicable to other diseases and problems as they involve ordinal relationship among class labels. △ Less

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.08223 [pdf, other]

Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting

Authors: Zilong Wang, Zifeng Wang, Long Le, Huaixiu Steven Zheng, Swaroop Mishra, Vincent Perot, Yuwei Zhang, Anush Mattapalli, Ankur Taly, Jingbo Shang, Chen-Yu Lee, Tomas Pfister

Abstract: Retrieval augmented generation (RAG) combines the generative abilities of large language models (LLMs) with external knowledge sources to provide more accurate and up-to-date responses. Recent RAG advancements focus on improving retrieval outcomes through iterative LLM refinement or self-critique capabilities acquired through additional instruction tuning of LLMs. In this work, we introduce Specul… ▽ More Retrieval augmented generation (RAG) combines the generative abilities of large language models (LLMs) with external knowledge sources to provide more accurate and up-to-date responses. Recent RAG advancements focus on improving retrieval outcomes through iterative LLM refinement or self-critique capabilities acquired through additional instruction tuning of LLMs. In this work, we introduce Speculative RAG - a framework that leverages a larger generalist LM to efficiently verify multiple RAG drafts produced in parallel by a smaller, distilled specialist LM. Each draft is generated from a distinct subset of retrieved documents, offering diverse perspectives on the evidence while reducing input token counts per draft. This approach enhances comprehension of each subset and mitigates potential position bias over long context. Our method accelerates RAG by delegating drafting to the smaller specialist LM, with the larger generalist LM performing a single verification pass over the drafts. Extensive experiments demonstrate that Speculative RAG achieves state-of-the-art performance with reduced latency on TriviaQA, MuSiQue, PubHealth, and ARC-Challenge benchmarks. It notably enhances accuracy by up to 12.97% while reducing latency by 51% compared to conventional RAG systems on PubHealth. △ Less

Submitted 11 July, 2024; originally announced July 2024.

Comments: Preprint

arXiv:2407.06174 [pdf, other]

The Tug-of-War Between Deepfake Generation and Detection

Authors: Hannah Lee, Changyeon Lee, Kevin Farhat, Lin Qiu, Steve Geluso, Aerin Kim, Oren Etzioni

Abstract: Multimodal generative models are rapidly evolving, leading to a surge in the generation of realistic video and audio that offers exciting possibilities but also serious risks. Deepfake videos, which can convincingly impersonate individuals, have particularly garnered attention due to their potential misuse in spreading misinformation and creating fraudulent content. This survey paper examines the… ▽ More Multimodal generative models are rapidly evolving, leading to a surge in the generation of realistic video and audio that offers exciting possibilities but also serious risks. Deepfake videos, which can convincingly impersonate individuals, have particularly garnered attention due to their potential misuse in spreading misinformation and creating fraudulent content. This survey paper examines the dual landscape of deepfake video generation and detection, emphasizing the need for effective countermeasures against potential abuses. We provide a comprehensive overview of current deepfake generation techniques, including face swapping, reenactment, and audio-driven animation, which leverage cutting-edge technologies like generative adversarial networks and diffusion models to produce highly realistic fake videos. Additionally, we analyze various detection approaches designed to differentiate authentic from altered videos, from detecting visual artifacts to deploying advanced algorithms that pinpoint inconsistencies across video and audio signals. The effectiveness of these detection methods heavily relies on the diversity and quality of datasets used for training and evaluation. We discuss the evolution of deepfake datasets, highlighting the importance of robust, diverse, and frequently updated collections to enhance the detection accuracy and generalizability. As deepfakes become increasingly indistinguishable from authentic content, developing advanced detection techniques that can keep pace with generation technologies is crucial. We advocate for a proactive approach in the "tug-of-war" between deepfake creators and detectors, emphasizing the need for continuous research collaboration, standardization of evaluation metrics, and the creation of comprehensive benchmarks. △ Less

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.04966 [pdf, other]

A Layer-Anchoring Strategy for Enhancing Cross-Lingual Speech Emotion Recognition

Authors: Shreya G. Upadhyay, Carlos Busso, Chi-Chun Lee

Abstract: Cross-lingual speech emotion recognition (SER) is important for a wide range of everyday applications. While recent SER research relies heavily on large pretrained models for emotion training, existing studies often concentrate solely on the final transformer layer of these models. However, given the task-specific nature and hierarchical architecture of these models, each transformer layer encapsu… ▽ More Cross-lingual speech emotion recognition (SER) is important for a wide range of everyday applications. While recent SER research relies heavily on large pretrained models for emotion training, existing studies often concentrate solely on the final transformer layer of these models. However, given the task-specific nature and hierarchical architecture of these models, each transformer layer encapsulates different levels of information. Leveraging this hierarchical structure, our study focuses on the information embedded across different layers. Through an examination of layer feature similarity across different languages, we propose a novel strategy called a layer-anchoring mechanism to facilitate emotion transfer in cross-lingual SER tasks. Our approach is evaluated using two distinct language affective corpora (MSP-Podcast and BIIC-Podcast), achieving a best UAR performance of 60.21% on the BIIC-podcast corpus. The analysis uncovers interesting insights into the behavior of popular pretrained models. △ Less

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.03741 [pdf, other]

A Unified Expression for Upper Bounds on the BLER of Spinal Codes over Fading Channels

Authors: Aimin Li, Xiaomeng Chen, Shaohua Wu, Gary C. F. Lee, Sumei Sun

Abstract: Performance evaluation of particular channel coding has been a significant topic in coding theory, often involving the use of bounding techniques. This paper focuses on the new family of capacity-achieving codes, Spinal codes, to provide a comprehensive analysis framework to tightly upper bound the block error rate (BLER) of Spinal codes in the finite block length (FBL) regime. First, we resort to… ▽ More Performance evaluation of particular channel coding has been a significant topic in coding theory, often involving the use of bounding techniques. This paper focuses on the new family of capacity-achieving codes, Spinal codes, to provide a comprehensive analysis framework to tightly upper bound the block error rate (BLER) of Spinal codes in the finite block length (FBL) regime. First, we resort to a variant of the Gallager random coding bound to upper bound the BLER of Spinal codes over the fading channel. Then, this paper derives a new bound without resorting to the use of Gallager random coding bound, achieving provable tightness over the wide range of signal-to-noise ratios (SNR). The derived BLER upper bounds in this paper are generalized, facilitating the performance evaluations of Spinal codes over different types of fast fading channels. Over the Rayleigh, Nakagami-m, and Rician fading channels, this paper explicitly derived the BLER upper bounds on Spinal codes as case studies. Based on the bounds, we theoretically reveal that the tail transmission pattern (TTP) for ML-decoded Spinal codes remains optimal in terms of reliability performance. Simulations verify the tightness of the bounds and the insights obtained. △ Less

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.02681 [pdf, other]

Uniform Transformation: Refining Latent Representation in Variational Autoencoders

Authors: Ye Shi, C. S. George Lee

Abstract: Irregular distribution in latent space causes posterior collapse, misalignment between posterior and prior, and ill-sampling problem in Variational Autoencoders (VAEs). In this paper, we introduce a novel adaptable three-stage Uniform Transformation (UT) module -- Gaussian Kernel Density Estimation (G-KDE) clustering, non-parametric Gaussian Mixture (GM) Modeling, and Probability Integral Transfor… ▽ More Irregular distribution in latent space causes posterior collapse, misalignment between posterior and prior, and ill-sampling problem in Variational Autoencoders (VAEs). In this paper, we introduce a novel adaptable three-stage Uniform Transformation (UT) module -- Gaussian Kernel Density Estimation (G-KDE) clustering, non-parametric Gaussian Mixture (GM) Modeling, and Probability Integral Transform (PIT) -- to address irregular latent distributions. By reconfiguring irregular distributions into a uniform distribution in the latent space, our approach significantly enhances the disentanglement and interpretability of latent representations, overcoming the limitation of traditional VAE models in capturing complex data structures. Empirical evaluations demonstrated the efficacy of our proposed UT module in improving disentanglement metrics across benchmark datasets -- dSprites and MNIST. Our findings suggest a promising direction for advancing representation learning techniques, with implication for future research in extending this framework to more sophisticated datasets and downstream tasks. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted by 2024 IEEE 20th International Conference on Automation Science and Engineering

arXiv:2407.01911 [pdf, other]

Investigating the Effects of Large-Scale Pseudo-Stereo Data and Different Speech Foundation Model on Dialogue Generative Spoken Language Model

Authors: Yu-Kuan Fu, Cheng-Kuang Lee, Hsiu-Hsuan Wang, Hung-yi Lee

Abstract: Recent efforts in Spoken Dialogue Modeling aim to synthesize spoken dialogue without the need for direct transcription, thereby preserving the wealth of non-textual information inherent in speech. However, this approach faces a challenge when speakers talk simultaneously, requiring stereo dialogue data with speakers recorded on separate channels, a notably scarce resource. To address this, we have… ▽ More Recent efforts in Spoken Dialogue Modeling aim to synthesize spoken dialogue without the need for direct transcription, thereby preserving the wealth of non-textual information inherent in speech. However, this approach faces a challenge when speakers talk simultaneously, requiring stereo dialogue data with speakers recorded on separate channels, a notably scarce resource. To address this, we have developed an innovative pipeline capable of transforming single-channel dialogue data into pseudo-stereo data. This expanded our training dataset from a mere 2,000 to an impressive 17,600 hours, significantly enriching the diversity and quality of the training examples available. The inclusion of this pseudo-stereo data has proven to be effective in improving the performance of spoken dialogue language models. Additionally, we explored the use of discrete units of different speech foundation models for spoken dialogue generation. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: submitted to interspeech 2024

arXiv:2407.00556 [pdf, other]

Revisiting Vision-Language Features Adaptation and Inconsistency for Social Media Popularity Prediction

Authors: Chih-Chung Hsu, Chia-Ming Lee, Yu-Fan Lin, Yi-Shiuan Chou, Chih-Yu Jian, Chi-Han Tsai

Abstract: Social media popularity (SMP) prediction is a complex task involving multi-modal data integration. While pre-trained vision-language models (VLMs) like CLIP have been widely adopted for this task, their effectiveness in capturing the unique characteristics of social media content remains unexplored. This paper critically examines the applicability of CLIP-based features in SMP prediction, focusing… ▽ More Social media popularity (SMP) prediction is a complex task involving multi-modal data integration. While pre-trained vision-language models (VLMs) like CLIP have been widely adopted for this task, their effectiveness in capturing the unique characteristics of social media content remains unexplored. This paper critically examines the applicability of CLIP-based features in SMP prediction, focusing on the overlooked phenomenon of semantic inconsistency between images and text in social media posts. Through extensive analysis, we demonstrate that this inconsistency increases with post popularity, challenging the conventional use of VLM features. We provide a comprehensive investigation of semantic inconsistency across different popularity intervals and analyze the impact of VLM feature adaptation on SMP tasks. Our experiments reveal that incorporating inconsistency measures and adapted text features significantly improves model performance, achieving an SRC of 0.729 and an MAE of 1.227. These findings not only enhance SMP prediction accuracy but also provide crucial insights for developing more targeted approaches in social media analysis. △ Less

Submitted 29 June, 2024; originally announced July 2024.

Comments: Submission of the 7th Social Media Prediction Challenge

arXiv:2406.19941 [pdf, other]

GRACE: Graph-Regularized Attentive Convolutional Entanglement with Laplacian Smoothing for Robust DeepFake Video Detection

Authors: Chih-Chung Hsu, Shao-Ning Chen, Mei-Hsuan Wu, Yi-Fang Wang, Chia-Ming Lee, Yi-Shiuan Chou

Abstract: As DeepFake video manipulation techniques escalate, posing profound threats, the urgent need to develop efficient detection strategies is underscored. However, one particular issue lies with facial images being mis-detected, often originating from degraded videos or adversarial attacks, leading to unexpected temporal artifacts that can undermine the efficacy of DeepFake video detection techniques.… ▽ More As DeepFake video manipulation techniques escalate, posing profound threats, the urgent need to develop efficient detection strategies is underscored. However, one particular issue lies with facial images being mis-detected, often originating from degraded videos or adversarial attacks, leading to unexpected temporal artifacts that can undermine the efficacy of DeepFake video detection techniques. This paper introduces a novel method for robust DeepFake video detection, harnessing the power of the proposed Graph-Regularized Attentive Convolutional Entanglement (GRACE) based on the graph convolutional network with graph Laplacian to address the aforementioned challenges. First, conventional Convolution Neural Networks are deployed to perform spatiotemporal features for the entire video. Then, the spatial and temporal features are mutually entangled by constructing a graph with sparse constraint, enforcing essential features of valid face images in the noisy face sequences remaining, thus augmenting stability and performance for DeepFake video detection. Furthermore, the Graph Laplacian prior is proposed in the graph convolutional network to remove the noise pattern in the feature space to further improve the performance. Comprehensive experiments are conducted to illustrate that our proposed method delivers state-of-the-art performance in DeepFake video detection under noisy face sequences. The source code is available at https://github.com/ming053l/GRACE. △ Less

Submitted 30 June, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

Comments: Submitted to TPAMI 2024

arXiv:2406.19666 [pdf, other]

CSAKD: Knowledge Distillation with Cross Self-Attention for Hyperspectral and Multispectral Image Fusion

Authors: Chih-Chung Hsu, Chih-Chien Ni, Chia-Ming Lee, Li-Wei Kang

Abstract: Hyperspectral imaging, capturing detailed spectral information for each pixel, is pivotal in diverse scientific and industrial applications. Yet, the acquisition of high-resolution (HR) hyperspectral images (HSIs) often needs to be addressed due to the hardware limitations of existing imaging systems. A prevalent workaround involves capturing both a high-resolution multispectral image (HR-MSI) and… ▽ More Hyperspectral imaging, capturing detailed spectral information for each pixel, is pivotal in diverse scientific and industrial applications. Yet, the acquisition of high-resolution (HR) hyperspectral images (HSIs) often needs to be addressed due to the hardware limitations of existing imaging systems. A prevalent workaround involves capturing both a high-resolution multispectral image (HR-MSI) and a low-resolution (LR) HSI, subsequently fusing them to yield the desired HR-HSI. Although deep learning-based methods have shown promising in HR-MSI/LR-HSI fusion and LR-HSI super-resolution (SR), their substantial model complexities hinder deployment on resource-constrained imaging devices. This paper introduces a novel knowledge distillation (KD) framework for HR-MSI/LR-HSI fusion to achieve SR of LR-HSI. Our KD framework integrates the proposed Cross-Layer Residual Aggregation (CLRA) block to enhance efficiency for constructing Dual Two-Streamed (DTS) network structure, designed to extract joint and distinct features from LR-HSI and HR-MSI simultaneously. To fully exploit the spatial and spectral feature representations of LR-HSI and HR-MSI, we propose a novel Cross Self-Attention (CSA) fusion module to adaptively fuse those features to improve the spatial and spectral quality of the reconstructed HR-HSI. Finally, the proposed KD-based joint loss function is employed to co-train the teacher and student networks. Our experimental results demonstrate that the student model not only achieves comparable or superior LR-HSI SR performance but also significantly reduces the model-size and computational requirements. This marks a substantial advancement over existing state-of-the-art methods. The source code is available at https://github.com/ming053l/CSAKD. △ Less

Submitted 28 June, 2024; originally announced June 2024.

Comments: Submitted to TIP 2024

arXiv:2406.19622 [pdf, other]

Data-Driven Lipschitz Continuity: A Cost-Effective Approach to Improve Adversarial Robustness

Authors: Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung, Che-Rung Lee

Abstract: The security and robustness of deep neural networks (DNNs) have become increasingly concerning. This paper aims to provide both a theoretical foundation and a practical solution to ensure the reliability of DNNs. We explore the concept of Lipschitz continuity to certify the robustness of DNNs against adversarial attacks, which aim to mislead the network with adding imperceptible perturbations into… ▽ More The security and robustness of deep neural networks (DNNs) have become increasingly concerning. This paper aims to provide both a theoretical foundation and a practical solution to ensure the reliability of DNNs. We explore the concept of Lipschitz continuity to certify the robustness of DNNs against adversarial attacks, which aim to mislead the network with adding imperceptible perturbations into inputs. We propose a novel algorithm that remaps the input domain into a constrained range, reducing the Lipschitz constant and potentially enhancing robustness. Unlike existing adversarially trained models, where robustness is enhanced by introducing additional examples from other datasets or generative models, our method is almost cost-free as it can be integrated with existing models without requiring re-training. Experimental results demonstrate the generalizability of our method, as it can be combined with various models and achieve enhancements in robustness. Furthermore, our method achieves the best robust accuracy for CIFAR10, CIFAR100, and ImageNet datasets on the RobustBench leaderboard. △ Less

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.16008 [pdf, other]

Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

Authors: Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long T. Le, Abhishek Kumar, James Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister

Abstract: Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon has been known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between… ▽ More Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon has been known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between lost-in-the-middle to LLMs' intrinsic attention bias: LLMs exhibit a U-shaped attention bias where the tokens at the beginning and at the end of its input receive higher attention, regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, that allows the model to attend to contexts faithfully according to their relevance, even though when they are in the middle. Third, we show found-in-the-middle not only achieves better performance in locating relevant information within a long context, but also eventually leads to improved retrieval-augmented generation (RAG) performance across various tasks, outperforming existing methods by up to 15 percentage points. These findings open up future directions in understanding LLM attention bias and its potential consequences. △ Less

Submitted 3 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

Comments: ACL Findings 2024

arXiv:2406.15959 [pdf, other]

A Nonoverlapping Domain Decomposition Method for Extreme Learning Machines: Elliptic Problems

Authors: Chang-Ock Lee, Youngkyu Lee, Byungeun Ryoo

Abstract: Extreme learning machine (ELM) is a methodology for solving partial differential equations (PDEs) using a single hidden layer feed-forward neural network. It presets the weight/bias coefficients in the hidden layer with random values, which remain fixed throughout the computation, and uses a linear least squares method for training the parameters of the output layer of the neural network. It is kn… ▽ More Extreme learning machine (ELM) is a methodology for solving partial differential equations (PDEs) using a single hidden layer feed-forward neural network. It presets the weight/bias coefficients in the hidden layer with random values, which remain fixed throughout the computation, and uses a linear least squares method for training the parameters of the output layer of the neural network. It is known to be much faster than Physics informed neural networks. However, classical ELM is still computationally expensive when a high level of representation is desired in the solution as this requires solving a large least squares system. In this paper, we propose a nonoverlapping domain decomposition method (DDM) for ELMs that not only reduces the training time of ELMs, but is also suitable for parallel computation. In numerical analysis, DDMs have been widely studied to reduce the time to obtain finite element solutions for elliptic PDEs through parallel computation. Among these approaches, nonoverlapping DDMs are attracting the most attention. Motivated by these methods, we introduce local neural networks, which are valid only at corresponding subdomains, and an auxiliary variable at the interface. We construct a system on the variable and the parameters of local neural networks. A Schur complement system on the interface can be derived by eliminating the parameters of the output layer. The auxiliary variable is then directly obtained by solving the reduced system after which the parameters for each local neural network are solved in parallel. A method for initializing the hidden layer parameters suitable for high approximation quality in large systems is also proposed. Numerical results that verify the acceleration performance of the proposed method with respect to the number of subdomains are presented. △ Less

Submitted 22 June, 2024; originally announced June 2024.

Comments: 18 pages, 4 figures, 7 tables

MSC Class: 65N55 (Primary); 35J25; 68T07 (Secondary)

arXiv:2406.15673 [pdf, other]

Large Language Models have Intrinsic Self-Correction Ability

Authors: Dancheng Liu, Amir Nassereldine, Ziming Yang, Chenhui Xu, Yuting Hu, Jiajie Li, Utkarsh Kumar, Changjae Lee, Jinjun Xiong

Abstract: Large language models (LLMs) have attracted significant attention for their remarkable abilities in various natural language processing tasks, but they suffer from hallucinations that will cause performance degradation. One promising solution to improve the LLMs' performance is to ask LLMs to revise their answer after generation, a technique known as self-correction. Among the two types of self-co… ▽ More Large language models (LLMs) have attracted significant attention for their remarkable abilities in various natural language processing tasks, but they suffer from hallucinations that will cause performance degradation. One promising solution to improve the LLMs' performance is to ask LLMs to revise their answer after generation, a technique known as self-correction. Among the two types of self-correction, intrinsic self-correction is considered a promising direction because it does not utilize external knowledge. However, recent works doubt the validity of LLM's ability to conduct intrinsic self-correction. In this paper, we present a novel perspective on the intrinsic self-correction capabilities of LLMs through theoretical analyses and empirical experiments. In addition, we identify two critical factors for successful self-correction: zero temperature and fair prompts. Leveraging these factors, we demonstrate that intrinsic self-correction ability is exhibited across multiple existing LLMs. Our findings offer insights into the fundamental theories underlying the self-correction behavior of LLMs and remark on the importance of unbiased prompts and zero temperature settings in harnessing their full potential. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: in submission

arXiv:2406.14541 [pdf, other]

Are LLMs Naturally Good at Synthetic Tabular Data Generation?

Authors: Shengzhe Xu, Cho-Ting Lee, Mandar Sharma, Raquib Bin Yousuf, Nikhil Muralidhar, Naren Ramakrishnan

Abstract: Large language models (LLMs) have demonstrated their prowess in generating synthetic text and images; however, their potential for generating tabular data -- arguably the most common data type in business and scientific applications -- is largely underexplored. This paper demonstrates that LLMs, used as-is, or after traditional fine-tuning, are severely inadequate as synthetic table generators. Du… ▽ More Large language models (LLMs) have demonstrated their prowess in generating synthetic text and images; however, their potential for generating tabular data -- arguably the most common data type in business and scientific applications -- is largely underexplored. This paper demonstrates that LLMs, used as-is, or after traditional fine-tuning, are severely inadequate as synthetic table generators. Due to the autoregressive nature of LLMs, fine-tuning with random order permutation runs counter to the importance of modeling functional dependencies, and renders LLMs unable to model conditional mixtures of distributions (key to capturing real world constraints). We showcase how LLMs can be made to overcome some of these deficiencies by making them permutation-aware. △ Less

Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13474 [pdf, other]

Attention-aware Post-training Quantization without Backpropagation

Authors: Junhan Kim, Ho-young Kim, Eulrang Cho, Chungman Lee, Joonyoung Kim, Yongkweon Jeon

Abstract: Quantization is a promising solution for deploying large-scale language models (LLMs) on resource-constrained devices. Existing quantization approaches, however, rely on gradient-based optimization, regardless of it being post-training quantization (PTQ) or quantization-aware training (QAT), which becomes problematic for hyper-scale LLMs with billions of parameters. This overhead can be alleviated… ▽ More Quantization is a promising solution for deploying large-scale language models (LLMs) on resource-constrained devices. Existing quantization approaches, however, rely on gradient-based optimization, regardless of it being post-training quantization (PTQ) or quantization-aware training (QAT), which becomes problematic for hyper-scale LLMs with billions of parameters. This overhead can be alleviated via recently proposed backpropagation-free PTQ methods; however, their performance is somewhat limited by their lack of consideration of inter-layer dependencies. In this paper, we thus propose a novel PTQ algorithm that considers inter-layer dependencies without relying on backpropagation. The fundamental concept involved is the development of attention-aware Hessian matrices, which facilitates the consideration of inter-layer dependencies within the attention module. Extensive experiments demonstrate that the proposed algorithm significantly outperforms conventional PTQ methods, particularly for low bit-widths. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: 20 pages, under review

arXiv:2406.12119 [pdf]

Deploying scalable traffic prediction models for efficient management in real-world large transportation networks during hurricane evacuations

Authors: Qinhua Jiang, Brian Yueshuai He, Changju Lee, Jiaqi Ma

Abstract: Accurate traffic prediction is vital for effective traffic management during hurricane evacuation. This paper proposes a predictive modeling system that integrates Multilayer Perceptron (MLP) and Long-Short Term Memory (LSTM) models to capture both long-term congestion patterns and short-term speed patterns. Leveraging various input variables, including archived traffic data, spatial-temporal road… ▽ More Accurate traffic prediction is vital for effective traffic management during hurricane evacuation. This paper proposes a predictive modeling system that integrates Multilayer Perceptron (MLP) and Long-Short Term Memory (LSTM) models to capture both long-term congestion patterns and short-term speed patterns. Leveraging various input variables, including archived traffic data, spatial-temporal road network information, and hurricane forecast data, the framework is designed to address challenges posed by heterogeneous human behaviors, limited evacuation data, and hurricane event uncertainties. Deployed in a real-world traffic prediction system in Louisiana, the model achieved an 82% accuracy in predicting long-term congestion states over a 6-hour period during a 7-day hurricane-impacted duration. The short-term speed prediction model exhibited Mean Absolute Percentage Errors (MAPEs) ranging from 7% to 13% across evacuation horizons from 1 to 6 hours. Evaluation results underscore the model's potential to enhance traffic management during hurricane evacuations, and real-world deployment highlights its adaptability and scalability in diverse hurricane scenarios within extensive transportation networks. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: Submitted to IEEE ITS Magazine and currently under review

arXiv:2406.11645 [pdf, other]

SeamPose: Repurposing Seams as Capacitive Sensors in a Shirt for Upper-Body Pose Tracking

Authors: Tianhong Catherine Yu, Manru, Zhang, Peter He, Chi-Jung Lee, Cassidy Cheesman, Saif Mahmud, Ruidong Zhang, François Guimbretière, Cheng Zhang

Abstract: Seams are areas of overlapping fabric formed by stitching two or more pieces of fabric together in the cut-and-sew apparel manufacturing process. In SeamPose, we repurposed seams as capacitive sensors in a shirt for continuous upper-body pose estimation. Compared to previous all-textile motion-capturing garments that place the electrodes on the surface of clothing, our solution leverages existing… ▽ More Seams are areas of overlapping fabric formed by stitching two or more pieces of fabric together in the cut-and-sew apparel manufacturing process. In SeamPose, we repurposed seams as capacitive sensors in a shirt for continuous upper-body pose estimation. Compared to previous all-textile motion-capturing garments that place the electrodes on the surface of clothing, our solution leverages existing seams inside of a shirt by machine-sewing insulated conductive threads over the seams. The unique invisibilities and placements of the seams afford the sensing shirt to look and wear the same as a conventional shirt while providing exciting pose-tracking capabilities. To validate this approach, we implemented a proof-of-concept untethered shirt. With eight capacitive sensing seams, our customized deep-learning pipeline accurately estimates the upper-body 3D joint positions relative to the pelvis. With a 12-participant user study, we demonstrated promising cross-user and cross-session tracking performance. SeamPose represents a step towards unobtrusive integration of smart clothing for everyday pose estimation. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.10631 [pdf, other]

Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms

Authors: Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng

Abstract: Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games, both in theory and practice. Particularly popular algorithms include optimistic multiplicative weights update (OMWU) and optimistic gradient-descent-ascent (OGDA). While both algorithms enjoy $O(1/T)$ ergodic convergence to Nash equilibrium in two-player zero-sum games, OMWU offers several adva… ▽ More Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games, both in theory and practice. Particularly popular algorithms include optimistic multiplicative weights update (OMWU) and optimistic gradient-descent-ascent (OGDA). While both algorithms enjoy $O(1/T)$ ergodic convergence to Nash equilibrium in two-player zero-sum games, OMWU offers several advantages including logarithmic dependence on the size of the payoff matrix and $\widetilde{O}(1/T)$ convergence to coarse correlated equilibria even in general-sum games. However, in terms of last-iterate convergence in two-player zero-sum games, an increasingly popular topic in this area, OGDA guarantees that the duality gap shrinks at a rate of $O(1/\sqrt{T})$, while the best existing last-iterate convergence for OMWU depends on some game-dependent constant that could be arbitrarily large. This begs the question: is this potentially slow last-iterate convergence an inherent disadvantage of OMWU, or is the current analysis too loose? Somewhat surprisingly, we show that the former is true. More generally, we prove that a broad class of algorithms that do not forget the past quickly all suffer the same issue: for any arbitrarily small $δ>0$, there exists a $2\times 2$ matrix game such that the algorithm admits a constant duality gap even after $1/δ$ rounds. This class of algorithms includes OMWU and other standard optimistic follow-the-regularized-leader algorithms. △ Less

Submitted 15 June, 2024; originally announced June 2024.

Comments: 27 pages, 4 figures

arXiv:2406.10304 [pdf, other]

Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design

Authors: Ming Gao, Hang Chen, Jun Du, Xin Xu, Hongxiao Guo, Hui Bu, Jianxing Yang, Ming Li, Chin-Hui Lee

Abstract: Smart home technology has gained widespread adoption, facilitating effortless control of devices through voice commands. However, individuals with dysarthria, a motor speech disorder, face challenges due to the variability of their speech. This paper addresses the wake-up word spotting (WWS) task for dysarthric individuals, aiming to integrate them into real-world applications. To support this, we… ▽ More Smart home technology has gained widespread adoption, facilitating effortless control of devices through voice commands. However, individuals with dysarthria, a motor speech disorder, face challenges due to the variability of their speech. This paper addresses the wake-up word spotting (WWS) task for dysarthric individuals, aiming to integrate them into real-world applications. To support this, we release the open-source Mandarin Dysarthria Speech Corpus (MDSC), a dataset designed for dysarthric individuals in home environments. MDSC encompasses information on age, gender, disease types, and intelligibility evaluations. Furthermore, we perform comprehensive experimental analysis on MDSC, highlighting the challenges encountered. We also develop a customized dysarthria WWS system that showcases robustness in handling intelligibility and achieving exceptional performance. MDSC will be released on https://www.aishelltech.com/AISHELL_6B. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: to be published in Interspeech 2024

arXiv:2406.10087 [pdf]

Biomarker based Cancer Classification using an Ensemble with Pre-trained Models

Authors: Chongmin Lee, Jihie Kim

Abstract: Certain cancer types, namely pancreatic cancer is difficult to detect at an early stage; sparking the importance of discovering the causal relationship between biomarkers and cancer to identify cancer efficiently. By allowing for the detection and monitoring of specific biomarkers through a non-invasive method, liquid biopsies enhance the precision and efficacy of medical interventions, advocating… ▽ More Certain cancer types, namely pancreatic cancer is difficult to detect at an early stage; sparking the importance of discovering the causal relationship between biomarkers and cancer to identify cancer efficiently. By allowing for the detection and monitoring of specific biomarkers through a non-invasive method, liquid biopsies enhance the precision and efficacy of medical interventions, advocating the move towards personalized healthcare. Several machine learning algorithms such as Random Forest, SVM are utilized for classification, yet causing inefficiency due to the need for conducting hyperparameter tuning. We leverage a meta-trained Hyperfast model for classifying cancer, accomplishing the highest AUC of 0.9929 and simultaneously achieving robustness especially on highly imbalanced datasets compared to other ML algorithms in several binary classification tasks (e.g. breast invasive carcinoma; BRCA vs. non-BRCA). We also propose a novel ensemble model combining pre-trained Hyperfast model, XGBoost, and LightGBM for multi-class classification tasks, achieving an incremental increase in accuracy (0.9464) while merely using 500 PCA features; distinguishable from previous studies where they used more than 2,000 features for similar results. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: Accepted to the AIAA Workshop at IJCAI 2024

arXiv:2406.09317 [pdf, other]

Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham , et al. (24 additional authors not shown)

Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources… ▽ More Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus disease. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered. △ Less

Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08796 [pdf, other]

Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning

Authors: Janghoon Han, Changho Lee, Joongbo Shin, Stanley Jungkyu Choi, Honglak Lee, Kynghoon Bae

Abstract: Instruction tuning has emerged as a powerful technique, significantly boosting zero-shot performance on unseen tasks. While recent work has explored cross-lingual generalization by applying instruction tuning to multilingual models, previous studies have primarily focused on English, with a limited exploration of non-English tasks. For an in-depth exploration of cross-lingual generalization in ins… ▽ More Instruction tuning has emerged as a powerful technique, significantly boosting zero-shot performance on unseen tasks. While recent work has explored cross-lingual generalization by applying instruction tuning to multilingual models, previous studies have primarily focused on English, with a limited exploration of non-English tasks. For an in-depth exploration of cross-lingual generalization in instruction tuning, we perform instruction tuning individually for two distinct language meta-datasets. Subsequently, we assess the performance on unseen tasks in a language different from the one used for training. To facilitate this investigation, we introduce a novel non-English meta-dataset named "KORANI" (Korean Natural Instruction), comprising 51 Korean benchmarks. Moreover, we design cross-lingual templates to mitigate discrepancies in language and instruction-format of the template between training and inference within the cross-lingual setting. Our experiments reveal consistent improvements through cross-lingual generalization in both English and Korean, outperforming baseline by average scores of 20.7\% and 13.6\%, respectively. Remarkably, these enhancements are comparable to those achieved by monolingual instruction tuning and even surpass them in some tasks. The result underscores the significance of relevant data acquisition across languages over linguistic congruence with unseen tasks during instruction tuning. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: Findings of ACL 2024 (Camera-ready), by Janghoon Han and Changho Lee, with equal contribution

arXiv:2406.06799 [pdf, other]

LLM-dCache: Improving Tool-Augmented LLMs with GPT-Driven Localized Data Caching

Authors: Simranjit Singh, Michael Fore, Andreas Karatzas, Chaehong Lee, Yanan Jian, Longfei Shangguan, Fuxun Yu, Iraklis Anagnostopoulos, Dimitrios Stamoulis

Abstract: As Large Language Models (LLMs) broaden their capabilities to manage thousands of API calls, they are confronted with complex data operations across vast datasets with significant overhead to the underlying system. In this work, we introduce LLM-dCache to optimize data accesses by treating cache operations as callable API functions exposed to the tool-augmented agent. We grant LLMs the autonomy to… ▽ More As Large Language Models (LLMs) broaden their capabilities to manage thousands of API calls, they are confronted with complex data operations across vast datasets with significant overhead to the underlying system. In this work, we introduce LLM-dCache to optimize data accesses by treating cache operations as callable API functions exposed to the tool-augmented agent. We grant LLMs the autonomy to manage cache decisions via prompting, seamlessly integrating with existing function-calling mechanisms. Tested on an industry-scale massively parallel platform that spans hundreds of GPT endpoints and terabytes of imagery, our method improves Copilot times by an average of 1.24x across various LLMs and prompting techniques. △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06474 [pdf, other]

Towards a Personal Health Large Language Model

Authors: Justin Cosentino, Anastasiya Belyaeva, Xin Liu, Nicholas A. Furlotte, Zhun Yang, Chace Lee, Erik Schenck, Yojan Patel, Jian Cui, Logan Douglas Schneider, Robby Bryant, Ryan G. Gomes, Allen Jiang, Roy Lee, Yun Liu, Javier Perez, Jameson K. Rogers, Cathy Speed, Shyam Tailor, Megan Walker, Jeffrey Yu, Tim Althoff, Conor Heneghan, John Hernandez, Mark Malhotra , et al. (9 additional authors not shown)

Abstract: In health, most large language model (LLM) research has focused on clinical tasks. However, mobile and wearable devices, which are rarely integrated into such tasks, provide rich, longitudinal data for personal health monitoring. Here we present Personal Health Large Language Model (PH-LLM), fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. We… ▽ More In health, most large language model (LLM) research has focused on clinical tasks. However, mobile and wearable devices, which are rarely integrated into such tasks, provide rich, longitudinal data for personal health monitoring. Here we present Personal Health Large Language Model (PH-LLM), fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. We created and curated three datasets that test 1) production of personalized insights and recommendations from sleep patterns, physical activity, and physiological responses, 2) expert domain knowledge, and 3) prediction of self-reported sleep outcomes. For the first task we designed 857 case studies in collaboration with domain experts to assess real-world scenarios in sleep and fitness. Through comprehensive evaluation of domain-specific rubrics, we observed that Gemini Ultra 1.0 and PH-LLM are not statistically different from expert performance in fitness and, while experts remain superior for sleep, fine-tuning PH-LLM provided significant improvements in using relevant domain knowledge and personalizing information for sleep insights. We evaluated PH-LLM domain knowledge using multiple choice sleep medicine and fitness examinations. PH-LLM achieved 79% on sleep and 88% on fitness, exceeding average scores from a sample of human experts. Finally, we trained PH-LLM to predict self-reported sleep quality outcomes from textual and multimodal encoding representations of wearable data, and demonstrate that multimodal encoding is required to match performance of specialized discriminative models. Although further development and evaluation are necessary in the safety-critical personal health domain, these results demonstrate both the broad knowledge and capabilities of Gemini models and the benefit of contextualizing physiological data for personal health applications as done with PH-LLM. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: 72 pages

arXiv:2406.05365 [pdf, other]

CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation

Authors: I-Hung Hsu, Zifeng Wang, Long T. Le, Lesly Miculicich, Nanyun Peng, Chen-Yu Lee, Tomas Pfister

Abstract: Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses by accurately citing verifiable sources. However, existing methods, by either feeding LMs with raw or preprocessed materials, remain prone to errors. To address this, we introduce CaLM, a novel verification framework. CaLM leverages the insight that a robust grounded response… ▽ More Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses by accurately citing verifiable sources. However, existing methods, by either feeding LMs with raw or preprocessed materials, remain prone to errors. To address this, we introduce CaLM, a novel verification framework. CaLM leverages the insight that a robust grounded response should be consistent with information derived solely from its cited sources. Our framework empowers smaller LMs, which rely less on parametric memory and excel at processing relevant information given a query, to validate the output of larger LMs. Larger LM responses that closely align with the smaller LMs' output, which relies exclusively on cited documents, are verified. Responses showing discrepancies are iteratively refined through a feedback loop. Experiments on three open-domain question-answering datasets demonstrate significant performance gains of 1.5% to 7% absolute average without any required model fine-tuning. △ Less

Submitted 24 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

Comments: ACL 2024 Camera Ready Version

arXiv:2406.03824 [pdf, other]

Predictability Analysis of Regression Problems via Conditional Entropy Estimations

Authors: Yu-Hsueh Fang, Chia-Yen Lee

Abstract: In the field of machine learning, regression problems are pivotal due to their ability to predict continuous outcomes. Traditional error metrics like mean squared error, mean absolute error, and coefficient of determination measure model accuracy. The model accuracy is the consequence of the selected model and the features, which blurs the analysis of contribution. Predictability, in the other han… ▽ More In the field of machine learning, regression problems are pivotal due to their ability to predict continuous outcomes. Traditional error metrics like mean squared error, mean absolute error, and coefficient of determination measure model accuracy. The model accuracy is the consequence of the selected model and the features, which blurs the analysis of contribution. Predictability, in the other hand, focus on the predictable level of a target variable given a set of features. This study introduces conditional entropy estimators to assess predictability in regression problems, bridging this gap. We enhance and develop reliable conditional entropy estimators, particularly the KNIFE-P estimator and LMC-P estimator, which offer under- and over-estimation, providing a practical framework for predictability analysis. Extensive experiments on synthesized and real-world datasets demonstrate the robustness and utility of these estimators. Additionally, we extend the analysis to the coefficient of determination $R^2 $, enhancing the interpretability of predictability. The results highlight the effectiveness of KNIFE-P and LMC-P in capturing the achievable performance and limitations of feature sets, providing valuable tools in the development of regression models. These indicators offer a robust framework for assessing the predictability for regression problems. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.02989 [pdf, other]

Learning Semantic Traversability with Egocentric Video and Automated Annotation Strategy

Authors: Yunho Kim, Jeong Hyun Lee, Choongin Lee, Juhyeok Mun, Donghoon Youm, Jeongsoo Park, Jemin Hwangbo

Abstract: For reliable autonomous robot navigation in urban settings, the robot must have the ability to identify semantically traversable terrains in the image based on the semantic understanding of the scene. This reasoning ability is based on semantic traversability, which is frequently achieved using semantic segmentation models fine-tuned on the testing domain. This fine-tuning process often involves m… ▽ More For reliable autonomous robot navigation in urban settings, the robot must have the ability to identify semantically traversable terrains in the image based on the semantic understanding of the scene. This reasoning ability is based on semantic traversability, which is frequently achieved using semantic segmentation models fine-tuned on the testing domain. This fine-tuning process often involves manual data collection with the target robot and annotation by human labelers which is prohibitively expensive and unscalable. In this work, we present an effective methodology for training a semantic traversability estimator using egocentric videos and an automated annotation process. Egocentric videos are collected from a camera mounted on a pedestrian's chest. The dataset for training the semantic traversability estimator is then automatically generated by extracting semantically traversable regions in each video frame using a recent foundation model in image segmentation and its prompting technique. Extensive experiments with videos taken across several countries and cities, covering diverse urban scenarios, demonstrate the high scalability and generalizability of the proposed annotation method. Furthermore, performance analysis and real-world deployment for autonomous robot navigation showcase that the trained semantic traversability estimator is highly accurate, able to handle diverse camera viewpoints, computationally light, and real-world applicable. The summary video is available at https://youtu.be/EUVoH-wA-lA. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: Submitted to IEEE Robotics and Automation Letters (RA-L), First two authors contributed equally

arXiv:2406.02488 [pdf, other]

Language-Universal Speech Attributes Modeling for Zero-Shot Multilingual Spoken Keyword Recognition

Authors: Hao Yen, Pin-Jui Ku, Sabato Marco Siniscalchi, Chin-Hui Lee

Abstract: We propose a novel language-universal approach to end-to-end automatic spoken keyword recognition (SKR) leveraging upon (i) a self-supervised pre-trained model, and (ii) a set of universal speech attributes (manner and place of articulation). Specifically, Wav2Vec2.0 is used to generate robust speech representations, followed by a linear output layer to produce attribute sequences. A non-trainable… ▽ More We propose a novel language-universal approach to end-to-end automatic spoken keyword recognition (SKR) leveraging upon (i) a self-supervised pre-trained model, and (ii) a set of universal speech attributes (manner and place of articulation). Specifically, Wav2Vec2.0 is used to generate robust speech representations, followed by a linear output layer to produce attribute sequences. A non-trainable pronunciation model then maps sequences of attributes into spoken keywords in a multilingual setting. Experiments on the Multilingual Spoken Words Corpus show comparable performances to character- and phoneme-based SKR in seen languages. The inclusion of domain adversarial training (DAT) improves the proposed framework, outperforming both character- and phoneme-based SKR approaches with 13.73% and 17.22% relative word error rate (WER) reduction in seen languages, and achieves 32.14% and 19.92% WER reduction for unseen languages in zero-shot settings. △ Less

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.01591 [pdf, other]

DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation

Authors: Chun-Hung Wu, Shih-Hong Chen, Chih-Yao Hu, Hsin-Yu Wu, Kai-Hsin Chen, Yu-You Chen, Chih-Hai Su, Chih-Kuo Lee, Yu-Lun Liu

Abstract: This paper presents Deformable Neural Vessel Representations (DeNVeR), an unsupervised approach for vessel segmentation in X-ray videos without annotated ground truth. DeNVeR uses optical flow and layer separation, enhancing segmentation accuracy and adaptability through test-time training. A key component of our research is the introduction of the XACV dataset, the first X-ray angiography coronar… ▽ More This paper presents Deformable Neural Vessel Representations (DeNVeR), an unsupervised approach for vessel segmentation in X-ray videos without annotated ground truth. DeNVeR uses optical flow and layer separation, enhancing segmentation accuracy and adaptability through test-time training. A key component of our research is the introduction of the XACV dataset, the first X-ray angiography coronary video dataset with high-quality, manually labeled segmentation ground truth. Our evaluation demonstrates that DeNVeR outperforms current state-of-the-art methods in vessel segmentation. This paper marks an advance in medical imaging, providing a robust, data-efficient tool for disease diagnosis and treatment planning and setting a new standard for future research in video vessel segmentation. See our project page for video results at https://kirito878.github.io/DeNVeR/. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: Project page: https://kirito878.github.io/DeNVeR/

arXiv:2406.01480 [pdf, other]

Towards Automating the Retrospective Generation of BIM Models: A Unified Framework for 3D Semantic Reconstruction of the Built Environment

Authors: Ka Lung Cheung, Chi Chung Lee

Abstract: The adoption of Building Information Modeling (BIM) is beneficial in construction projects. However, it faces challenges due to the lack of a unified and scalable framework for converting 3D model details into BIM. This paper introduces SRBIM, a unified semantic reconstruction architecture for BIM generation. Our approach's effectiveness is demonstrated through extensive qualitative and quantitati… ▽ More The adoption of Building Information Modeling (BIM) is beneficial in construction projects. However, it faces challenges due to the lack of a unified and scalable framework for converting 3D model details into BIM. This paper introduces SRBIM, a unified semantic reconstruction architecture for BIM generation. Our approach's effectiveness is demonstrated through extensive qualitative and quantitative evaluations, establishing a new paradigm for automated BIM modeling. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: CVPRW 2024, Oral

arXiv:2406.01337 [pdf, other]

ARCH2S: Dataset, Benchmark and Challenges for Learning Exterior Architectural Structures from Point Clouds

Authors: Ka Lung Cheung, Chi Chung Lee

Abstract: Precise segmentation of architectural structures provides detailed information about various building components, enhancing our understanding and interaction with our built environment. Nevertheless, existing outdoor 3D point cloud datasets have limited and detailed annotations on architectural exteriors due to privacy concerns and the expensive costs of data acquisition and annotation. To overcom… ▽ More Precise segmentation of architectural structures provides detailed information about various building components, enhancing our understanding and interaction with our built environment. Nevertheless, existing outdoor 3D point cloud datasets have limited and detailed annotations on architectural exteriors due to privacy concerns and the expensive costs of data acquisition and annotation. To overcome this shortfall, this paper introduces a semantically-enriched, photo-realistic 3D architectural models dataset and benchmark for semantic segmentation. It features 4 different building purposes of real-world buildings as well as an open architectural landscape in Hong Kong. Each point cloud is annotated into one of 14 semantic classes. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: CVPRW 2024 (Oral)

arXiv:2406.00856 [pdf, other]

DistilDIRE: A Small, Fast, Cheap and Lightweight Diffusion Synthesized Deepfake Detection

Authors: Yewon Lim, Changyeon Lee, Aerin Kim, Oren Etzioni

Abstract: A dramatic influx of diffusion-generated images has marked recent years, posing unique challenges to current detection technologies. While the task of identifying these images falls under binary classification, a seemingly straightforward category, the computational load is significant when employing the "reconstruction then compare" technique. This approach, known as DIRE (Diffusion Reconstructio… ▽ More A dramatic influx of diffusion-generated images has marked recent years, posing unique challenges to current detection technologies. While the task of identifying these images falls under binary classification, a seemingly straightforward category, the computational load is significant when employing the "reconstruction then compare" technique. This approach, known as DIRE (Diffusion Reconstruction Error), not only identifies diffusion-generated images but also detects those produced by GANs, highlighting the technique's broad applicability. To address the computational challenges and improve efficiency, we propose distilling the knowledge embedded in diffusion models to develop rapid deepfake detection models. Our approach, aimed at creating a small, fast, cheap, and lightweight diffusion synthesized deepfake detector, maintains robust performance while significantly reducing operational demands. Maintaining performance, our experimental results indicate an inference speed 3.2 times faster than the existing DIRE framework. This advance not only enhances the practicality of deploying these systems in real-world settings but also paves the way for future research endeavors that seek to leverage diffusion model knowledge. △ Less

Submitted 2 June, 2024; originally announced June 2024.

Comments: 6 pages, 1 figure

arXiv:2405.20513 [pdf, other]

Deep Modeling of Non-Gaussian Aleatoric Uncertainty

Authors: Aastha Acharya, Caleb Lee, Marissa D'Alonzo, Jared Shamwell, Nisar R. Ahmed, Rebecca Russell

Abstract: Deep learning offers promising new ways to accurately model aleatoric uncertainty in robotic estimation systems, particularly when the uncertainty distributions do not conform to traditional assumptions of being fixed and Gaussian. In this study, we formulate and evaluate three fundamental deep learning approaches for conditional probability density modeling to quantify non-Gaussian aleatoric unce… ▽ More Deep learning offers promising new ways to accurately model aleatoric uncertainty in robotic estimation systems, particularly when the uncertainty distributions do not conform to traditional assumptions of being fixed and Gaussian. In this study, we formulate and evaluate three fundamental deep learning approaches for conditional probability density modeling to quantify non-Gaussian aleatoric uncertainty: parametric, discretized, and generative modeling. We systematically compare the respective strengths and weaknesses of these three methods on simulated non-Gaussian densities as well as on real-world terrain-relative navigation data. Our results show that these deep learning methods can accurately capture complex uncertainty patterns, highlighting their potential for improving the reliability and robustness of estimation systems. △ Less

Submitted 30 May, 2024; originally announced May 2024.

Comments: 8 pages, 7 figures

arXiv:2405.19563 [pdf, other]

Unlearning Climate Misinformation in Large Language Models

Authors: Michael Fore, Simranjit Singh, Chaehong Lee, Amritanshu Pandey, Antonios Anastasopoulos, Dimitrios Stamoulis

Abstract: Misinformation regarding climate change is a key roadblock in addressing one of the most serious threats to humanity. This paper investigates factual accuracy in large language models (LLMs) regarding climate information. Using true/false labeled Q&A data for fine-tuning and evaluating LLMs on climate-related claims, we compare open-source models, assessing their ability to generate truthful respo… ▽ More Misinformation regarding climate change is a key roadblock in addressing one of the most serious threats to humanity. This paper investigates factual accuracy in large language models (LLMs) regarding climate information. Using true/false labeled Q&A data for fine-tuning and evaluating LLMs on climate-related claims, we compare open-source models, assessing their ability to generate truthful responses to climate change questions. We investigate the detectability of models intentionally poisoned with false climate information, finding that such poisoning may not affect the accuracy of a model's responses in other domains. Furthermore, we compare the effectiveness of unlearning algorithms, fine-tuning, and Retrieval-Augmented Generation (RAG) for factually grounding LLMs on climate change topics. Our evaluation reveals that unlearning algorithms can be effective for nuanced conceptual claims, despite previous findings suggesting their inefficacy in privacy contexts. These insights aim to guide the development of more factually reliable LLMs and highlight the need for additional work to secure LLMs against misinformation attacks. △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18028 [pdf, other]

Edinburgh Clinical NLP at MEDIQA-CORR 2024: Guiding Large Language Models with Hints

Authors: Aryo Pradipta Gema, Chaeeun Lee, Pasquale Minervini, Luke Daines, T. Ian Simpson, Beatrice Alex

Abstract: The MEDIQA-CORR 2024 shared task aims to assess the ability of Large Language Models (LLMs) to identify and correct medical errors in clinical notes. In this study, we evaluate the capability of general LLMs, specifically GPT-3.5 and GPT-4, to identify and correct medical errors with multiple prompting strategies. Recognising the limitation of LLMs in generating accurate corrections only via promp… ▽ More The MEDIQA-CORR 2024 shared task aims to assess the ability of Large Language Models (LLMs) to identify and correct medical errors in clinical notes. In this study, we evaluate the capability of general LLMs, specifically GPT-3.5 and GPT-4, to identify and correct medical errors with multiple prompting strategies. Recognising the limitation of LLMs in generating accurate corrections only via prompting strategies, we propose incorporating error-span predictions from a smaller, fine-tuned model in two ways: 1) by presenting it as a hint in the prompt and 2) by framing it as multiple-choice questions from which the LLM can choose the best correction. We found that our proposed prompting strategies significantly improve the LLM's ability to generate corrections. Our best-performing solution with 8-shot + CoT + hints ranked sixth in the shared task leaderboard. Additionally, our comprehensive analyses show the impact of the location of the error sentence, the prompted role, and the position of the multiple-choice option on the accuracy of the LLM. This prompts further questions about the readiness of LLM to be implemented in real-world clinical settings. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17428 [pdf, other]

NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

Authors: Chankyu Lee, Rajarshi Roy, Mengyao Xu, Jonathan Raiman, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping

Abstract: Decoder-only large language model (LLM)-based embedding models are beginning to outperform BERT or T5-based embedding models in general-purpose text embedding tasks, including dense vector-based retrieval. In this work, we introduce the NV-Embed model with a variety of architectural designs and training procedures to significantly enhance the performance of LLM as a versatile embedding model, whil… ▽ More Decoder-only large language model (LLM)-based embedding models are beginning to outperform BERT or T5-based embedding models in general-purpose text embedding tasks, including dense vector-based retrieval. In this work, we introduce the NV-Embed model with a variety of architectural designs and training procedures to significantly enhance the performance of LLM as a versatile embedding model, while maintaining its simplicity and reproducibility. For model architecture, we propose a latent attention layer to obtain pooled embeddings, which consistently improves retrieval and downstream task accuracy compared to mean pooling or using the last <EOS> token embedding from LLMs. To enhance representation learning, we remove the causal attention mask of LLMs during contrastive training. For model training, we introduce a two-stage contrastive instruction-tuning method. It first applies contrastive training with instructions on retrieval datasets, utilizing in-batch negatives and curated hard negative examples. At stage-2, it blends various non-retrieval datasets into instruction tuning, which not only enhances non-retrieval task accuracy but also improves retrieval performance. Combining these techniques, our NV-Embed model, using only publicly available data, has achieved a record-high score of 69.32, ranking No. 1 on the Massive Text Embedding Benchmark (MTEB) (as of May 24, 2024), with 56 tasks, encompassing retrieval, reranking, classification, clustering, and semantic textual similarity tasks. Notably, our model also attains the highest score of 59.36 on 15 retrieval tasks in the MTEB benchmark (also known as BEIR). We will open-source the model at: https://huggingface.co/nvidia/NV-Embed-v1. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.17083 [pdf, other]

F-3DGS: Factorized Coordinates and Representations for 3D Gaussian Splatting

Authors: Xiangyu Sun, Joo Chan Lee, Daniel Rho, Jong Hwan Ko, Usman Ali, Eunbyung Park

Abstract: The neural radiance field (NeRF) has made significant strides in representing 3D scenes and synthesizing novel views. Despite its advancements, the high computational costs of NeRF have posed challenges for its deployment in resource-constrained environments and real-time applications. As an alternative to NeRF-like neural rendering methods, 3D Gaussian Splatting (3DGS) offers rapid rendering spee… ▽ More The neural radiance field (NeRF) has made significant strides in representing 3D scenes and synthesizing novel views. Despite its advancements, the high computational costs of NeRF have posed challenges for its deployment in resource-constrained environments and real-time applications. As an alternative to NeRF-like neural rendering methods, 3D Gaussian Splatting (3DGS) offers rapid rendering speeds while maintaining excellent image quality. However, as it represents objects and scenes using a myriad of Gaussians, it requires substantial storage to achieve high-quality representation. To mitigate the storage overhead, we propose Factorized 3D Gaussian Splatting (F-3DGS), a novel approach that drastically reduces storage requirements while preserving image quality. Inspired by classical matrix and tensor factorization techniques, our method represents and approximates dense clusters of Gaussians with significantly fewer Gaussians through efficient factorization. We aim to efficiently represent dense 3D Gaussians by approximating them with a limited amount of information for each axis and their combinations. This method allows us to encode a substantially large number of Gaussians along with their essential attributes -- such as color, scale, and rotation -- necessary for rendering using a relatively small number of elements. Extensive experimental results demonstrate that F-3DGS achieves a significant reduction in storage costs while maintaining comparable quality in rendered images. △ Less

Submitted 28 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

Comments: Our project page including code is available at https://xiangyu1sun.github.io/Factorize-3DGS/

arXiv:2405.16711 [pdf, other]

doi 10.1145/3643834.3661576

The AI-DEC: A Card-based Design Method for User-centered AI Explanations

Authors: Christine P Lee, Min Kyung Lee, Bilge Mutlu

Abstract: Increasing evidence suggests that many deployed AI systems do not sufficiently support end-user interaction and information needs. Engaging end-users in the design of these systems can reveal user needs and expectations, yet effective ways of engaging end-users in the AI explanation design remain under-explored. To address this gap, we developed a design method, called AI-DEC, that defines four di… ▽ More Increasing evidence suggests that many deployed AI systems do not sufficiently support end-user interaction and information needs. Engaging end-users in the design of these systems can reveal user needs and expectations, yet effective ways of engaging end-users in the AI explanation design remain under-explored. To address this gap, we developed a design method, called AI-DEC, that defines four dimensions of AI explanations that are critical for the integration of AI systems -- communication content, modality, frequency, and direction -- and offers design examples for end-users to design AI explanations that meet their needs. We evaluated this method through co-design sessions with workers in healthcare, finance, and management industries who regularly use AI systems in their daily work. Findings indicate that the AI-DEC effectively supported workers in designing explanations that accommodated diverse levels of performance and autonomy needs, which varied depending on the AI system's workplace role and worker values. We discuss the implications of using the AI-DEC for the user-centered design of AI explanations in real-world systems. △ Less

Submitted 26 May, 2024; originally announced May 2024.

Journal ref: Designing Interactive Systems Conference, 2024, (DIS '24)

arXiv:2405.16710 [pdf, other]

doi 10.1145/3643834.3661559

REX: Designing User-centered Repair and Explanations to Address Robot Failures

Authors: Christine P Lee, Pragathi Praveena, Bilge Mutlu

Abstract: Robots in real-world environments continuously engage with multiple users and encounter changes that lead to unexpected conflicts in fulfilling user requests. Recent technical advancements (e.g., large-language models (LLMs), program synthesis) offer various methods for automatically generating repair plans that address such conflicts. In this work, we understand how automated repair and explanati… ▽ More Robots in real-world environments continuously engage with multiple users and encounter changes that lead to unexpected conflicts in fulfilling user requests. Recent technical advancements (e.g., large-language models (LLMs), program synthesis) offer various methods for automatically generating repair plans that address such conflicts. In this work, we understand how automated repair and explanations can be designed to improve user experience with robot failures through two user studies. In our first, online study ($n=162$), users expressed increased trust, satisfaction, and utility with the robot performing automated repair and explanations. However, we also identified risk factors -- safety, privacy, and complexity -- that require adaptive repair strategies. The second, in-person study ($n=24$) elucidated distinct repair and explanation strategies depending on the level of risk severity and type. Using a design-based approach, we explore automated repair with explanations as a solution for robots to handle conflicts and failures, complemented by adaptive strategies for risk factors. Finally, we discuss the implications of incorporating such strategies into robot designs to achieve seamless operation among changing user needs and environments. △ Less

Submitted 26 May, 2024; originally announced May 2024.

Journal ref: Designing Interactive Systems Conference, 2024, (DIS '24)

arXiv:2405.13858 [pdf, other]

Carbon Connect: An Ecosystem for Sustainable Computing

Authors: Benjamin C. Lee, David Brooks, Arthur van Benthem, Udit Gupta, Gage Hills, Vincent Liu, Benjamin Pierce, Christopher Stewart, Emma Strubell, Gu-Yeon Wei, Adam Wierman, Yuan Yao, Minlan Yu

Abstract: Computing is at a moment of profound opportunity. Emerging applications -- such as capable artificial intelligence, immersive virtual realities, and pervasive sensor systems -- drive unprecedented demand for computer. Despite recent advances toward net zero carbon emissions, the computing industry's gross energy usage continues to rise at an alarming rate, outpacing the growth of new energy instal… ▽ More Computing is at a moment of profound opportunity. Emerging applications -- such as capable artificial intelligence, immersive virtual realities, and pervasive sensor systems -- drive unprecedented demand for computer. Despite recent advances toward net zero carbon emissions, the computing industry's gross energy usage continues to rise at an alarming rate, outpacing the growth of new energy installations and renewable energy deployments. A shift towards sustainability is needed to spark a transformation in how computer systems are manufactured, allocated, and consumed. Carbon Connect envisions coordinated research thrusts that produce design and management strategies for sustainable, next-generation computer systems. These strategies must flatten and then reverse growth trajectories for computing power and carbon for society's most rapidly growing applications such as artificial intelligence and virtual spaces. We will require accurate models for carbon accounting in computing technology. For embodied carbon, we must re-think conventional design strategies -- over-provisioned monolithic servers, frequent hardware refresh cycles, custom silicon -- and adopt life-cycle design strategies that more effectively reduce, reuse and recycle hardware at scale. For operational carbon, we must not only embrace renewable energy but also design systems to use that energy more efficiently. Finally, new hardware design and management strategies must be cognizant of economic policy and regulatory landscape, aligning private initiatives with societal goals. Many of these broader goals will require computer scientists to develop deep, enduring collaborations with researchers in economics, law, and industrial ecology to spark change in broader practice. △ Less

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13629 [pdf, other]

Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow

Authors: Chen-Hao Chao, Chien Feng, Wei-Fang Sun, Cheng-Kuang Lee, Simon See, Chun-Yi Lee

Abstract: Existing Maximum-Entropy (MaxEnt) Reinforcement Learning (RL) methods for continuous action spaces are typically formulated based on actor-critic frameworks and optimized through alternating steps of policy evaluation and policy improvement. In the policy evaluation steps, the critic is updated to capture the soft Q-function. In the policy improvement steps, the actor is adjusted in accordance wit… ▽ More Existing Maximum-Entropy (MaxEnt) Reinforcement Learning (RL) methods for continuous action spaces are typically formulated based on actor-critic frameworks and optimized through alternating steps of policy evaluation and policy improvement. In the policy evaluation steps, the critic is updated to capture the soft Q-function. In the policy improvement steps, the actor is adjusted in accordance with the updated soft Q-function. In this paper, we introduce a new MaxEnt RL framework modeled using Energy-Based Normalizing Flows (EBFlow). This framework integrates the policy evaluation steps and the policy improvement steps, resulting in a single objective training process. Our method enables the calculation of the soft value function used in the policy evaluation target without Monte Carlo approximation. Moreover, this design supports the modeling of multi-modal action distributions while facilitating efficient action sampling. To evaluate the performance of our method, we conducted experiments on the MuJoCo benchmark suite and a number of high-dimensional robotic tasks simulated by Omniverse Isaac Gym. The evaluation results demonstrate that our method achieves superior performance compared to widely-adopted representative baselines. △ Less

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13143 [pdf, ps, other]

Pseudorandomness, symmetry, smoothing: I

Authors: Harm Derksen, Peter Ivanov, Chin Ho Lee, Emanuele Viola

Abstract: We prove several new results about bounded uniform and small-bias distributions. A main message is that, small-bias, even perturbed with noise, does not fool several classes of tests better than bounded uniformity. We prove this for threshold tests, small-space algorithms, and small-depth circuits. In particular, we obtain small-bias distributions that 1) achieve an optimal lower bound on their… ▽ More We prove several new results about bounded uniform and small-bias distributions. A main message is that, small-bias, even perturbed with noise, does not fool several classes of tests better than bounded uniformity. We prove this for threshold tests, small-space algorithms, and small-depth circuits. In particular, we obtain small-bias distributions that 1) achieve an optimal lower bound on their statistical distance to any bounded-uniform distribution. This closes a line of research initiated by Alon, Goldreich, and Mansour in 2003, and improves on a result by O'Donnell and Zhao. 2) have heavier tail mass than the uniform distribution. This answers a question posed by several researchers including Bun and Steinke. 3) rule out a popular paradigm for constructing pseudorandom generators, originating in a 1989 work by Ajtai and Wigderson. This again answers a question raised by several researchers. For branching programs, our result matches a bound by Forbes and Kelley. Our small-bias distributions above are symmetric. We show that the xor of any two symmetric small-bias distributions fools any bounded function. Hence our examples cannot be extended to the xor of two small-bias distributions, another popular paradigm whose power remains unknown. We also generalize and simplify the proof of a result of Bazzi. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: CCC 2024

arXiv:2405.03547 [pdf, other]

Position: Leverage Foundational Models for Black-Box Optimization

Authors: Xingyou Song, Yingtao Tian, Robert Tjarko Lange, Chansoo Lee, Yujin Tang, Yutian Chen

Abstract: Undeniably, Large Language Models (LLMs) have stirred an extraordinary wave of innovation in the machine learning research domain, resulting in substantial impact across diverse fields such as reinforcement learning, robotics, and computer vision. Their incorporation has been rapid and transformative, marking a significant paradigm shift in the field of machine learning research. However, the fiel… ▽ More Undeniably, Large Language Models (LLMs) have stirred an extraordinary wave of innovation in the machine learning research domain, resulting in substantial impact across diverse fields such as reinforcement learning, robotics, and computer vision. Their incorporation has been rapid and transformative, marking a significant paradigm shift in the field of machine learning research. However, the field of experimental design, grounded on black-box optimization, has been much less affected by such a paradigm shift, even though integrating LLMs with optimization presents a unique landscape ripe for exploration. In this position paper, we frame the field of black-box optimization around sequence-based foundation models and organize their relationship with previous literature. We discuss the most promising ways foundational language models can revolutionize optimization, which include harnessing the vast wealth of information encapsulated in free-form text to enrich task comprehension, utilizing highly flexible sequence models such as Transformers to engineer superior optimization strategies, and enhancing performance prediction over previously unseen search spaces. △ Less

Submitted 9 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: International Conference on Machine Learning (ICML) 2024

Showing 1–50 of 886 results for author: Lee, C