Search | arXiv e-print repository

arXiv:2407.19225 [pdf, other]

Magic3DSketch: Create Colorful 3D Models From Sketch-Based 3D Modeling Guided by Text and Language-Image Pre-Training

Authors: Ying Zang, Yidong Han, Chaotao Ding, Jianqi Zhang, Tianrun Chen

Abstract: The requirement for 3D content is growing as AR/VR application emerges. At the same time, 3D modelling is only available for skillful experts, because traditional methods like Computer-Aided Design (CAD) are often too labor-intensive and skill-demanding, making it challenging for novice users. Our proposed method, Magic3DSketch, employs a novel technique that encodes sketches to predict a 3D mesh,… ▽ More The requirement for 3D content is growing as AR/VR application emerges. At the same time, 3D modelling is only available for skillful experts, because traditional methods like Computer-Aided Design (CAD) are often too labor-intensive and skill-demanding, making it challenging for novice users. Our proposed method, Magic3DSketch, employs a novel technique that encodes sketches to predict a 3D mesh, guided by text descriptions and leveraging external prior knowledge obtained through text and language-image pre-training. The integration of language-image pre-trained neural networks complements the sparse and ambiguous nature of single-view sketch inputs. Our method is also more useful and offers higher degree of controllability compared to existing text-to-3D approaches, according to our user study. Moreover, Magic3DSketch achieves state-of-the-art performance in both synthetic and real dataset with the capability of producing more detailed structures and realistic shapes with the help of text input. Users are also more satisfied with models obtained by Magic3DSketch according to our user study. Additionally, we are also the first, to our knowledge, add color based on text description to the sketch-derived shapes. By combining sketches and text guidance with the help of language-image pretrained models, our Magic3DSketch can allow novice users to create custom 3D models with minimal effort and maximum creative freedom, with the potential to revolutionize future 3D modeling pipelines. △ Less

Submitted 27 July, 2024; originally announced July 2024.

arXiv:2407.18544 [pdf, other]

Utilising Explainable Techniques for Quality Prediction in a Complex Textiles Manufacturing Use Case

Authors: Briony Forsberg, Dr Henry Williams, Prof Bruce MacDonald, Tracy Chen, Dr Reza Hamzeh, Dr Kirstine Hulse

Abstract: This paper develops an approach to classify instances of product failure in a complex textiles manufacturing dataset using explainable techniques. The dataset used in this study was obtained from a New Zealand manufacturer of woollen carpets and rugs. In investigating the trade-off between accuracy and explainability, three different tree-based classification algorithms were evaluated: a Decision… ▽ More This paper develops an approach to classify instances of product failure in a complex textiles manufacturing dataset using explainable techniques. The dataset used in this study was obtained from a New Zealand manufacturer of woollen carpets and rugs. In investigating the trade-off between accuracy and explainability, three different tree-based classification algorithms were evaluated: a Decision Tree and two ensemble methods, Random Forest and XGBoost. Additionally, three feature selection methods were also evaluated: the SelectKBest method, using chi-squared as the scoring function, the Pearson Correlation Coefficient, and the Boruta algorithm. Not surprisingly, the ensemble methods typically produced better results than the Decision Tree model. The Random Forest model yielded the best results overall when combined with the Boruta feature selection technique. Finally, a tree ensemble explaining technique was used to extract rule lists to capture necessary and sufficient conditions for classification by a trained model that could be easily interpreted by a human. Notably, several features that were in the extracted rule lists were statistical features and calculated features that were added to the original dataset. This demonstrates the influence that bringing in additional information during the data preprocessing stages can have on the ultimate model performance. △ Less

Submitted 26 July, 2024; originally announced July 2024.

Comments: Accepted at the 2024 IEEE 20th International Conference on Automation Science and Engineering (CASE 2024), awaiting publication Contains seven pages and five figures

arXiv:2407.18479 [pdf, other]

Multi-turn Response Selection with Commonsense-enhanced Language Models

Authors: Yuandong Wang, Xuhui Ren, Tong Chen, Yuxiao Dong, Nguyen Quoc Viet Hung, Jie Tang

Abstract: As a branch of advanced artificial intelligence, dialogue systems are prospering. Multi-turn response selection is a general research problem in dialogue systems. With the assistance of background information and pre-trained language models, the performance of state-of-the-art methods on this problem gains impressive improvement. However, existing studies neglect the importance of external commons… ▽ More As a branch of advanced artificial intelligence, dialogue systems are prospering. Multi-turn response selection is a general research problem in dialogue systems. With the assistance of background information and pre-trained language models, the performance of state-of-the-art methods on this problem gains impressive improvement. However, existing studies neglect the importance of external commonsense knowledge. Hence, we design a Siamese network where a pre-trained Language model merges with a Graph neural network (SinLG). SinLG takes advantage of Pre-trained Language Models (PLMs) to catch the word correlations in the context and response candidates and utilizes a Graph Neural Network (GNN) to reason helpful common sense from an external knowledge graph. The GNN aims to assist the PLM in fine-tuning, and arousing its related memories to attain better performance. Specifically, we first extract related concepts as nodes from an external knowledge graph to construct a subgraph with the context response pair as a super node for each sample. Next, we learn two representations for the context response pair via both the PLM and GNN. A similarity loss between the two representations is utilized to transfer the commonsense knowledge from the GNN to the PLM. Then only the PLM is used to infer online so that efficiency can be guaranteed. Finally, we conduct extensive experiments on two variants of the PERSONA-CHAT dataset, which proves that our solution can not only improve the performance of the PLM but also achieve an efficient inference. △ Less

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.18450 [pdf, other]

Textile Anomaly Detection: Evaluation of the State-of-the-Art for Automated Quality Inspection of Carpet

Authors: Briony Forsberg, Dr Henry Williams, Prof Bruce MacDonald, Tracy Chen, Dr Kirstine Hulse

Abstract: In this study, state-of-the-art unsupervised detection models were evaluated for the purpose of automated anomaly inspection of wool carpets. A custom dataset of four unique types of carpet textures was created to thoroughly test the models and their robustness in detecting subtle anomalies in complex textures. Due to the requirements of an inline inspection system in a manufacturing use case, the… ▽ More In this study, state-of-the-art unsupervised detection models were evaluated for the purpose of automated anomaly inspection of wool carpets. A custom dataset of four unique types of carpet textures was created to thoroughly test the models and their robustness in detecting subtle anomalies in complex textures. Due to the requirements of an inline inspection system in a manufacturing use case, the metrics of importance in this study were accuracy in detecting anomalous areas, the number of false detections, and the inference times of each model for real-time performance. Of the evaluated models, the student-teacher network based methods were found on average to yield the highest detection accuracy and lowest false detection rates. When trained on a multi-class dataset the models were found to yield comparable if not better results than single-class training. Finally, in terms of detection speed, with exception to the generative model, all other evaluated models were found to have comparable inference times on a GPU, with an average of 0.16s per image. On a CPU, most of these models typically produced results between 1.5 to 2 times the respective GPU inference times. △ Less

Submitted 25 July, 2024; originally announced July 2024.

Comments: Accepted at the 2023 Australasian Conference on Robotics and Automation (ACRA 2023) Publication url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85184380272&partnerID=40&md5=74fde263f4a24a1bff75d6560b423994 ISSN: 14482053 Contains 10 pages and three figures

arXiv:2407.17983 [pdf, other]

Explain EEG-based End-to-end Deep Learning Models in the Frequency Domain

Authors: Hanqi Wang, Kun Yang, Jingyu Zhang, Tao Chen, Liang Song

Abstract: The recent rise of EEG-based end-to-end deep learning models presents a significant challenge in elucidating how these models process raw EEG signals and generate predictions in the frequency domain. This challenge limits the transparency and credibility of EEG-based end-to-end models, hindering their application in security-sensitive areas. To address this issue, we propose a mask perturbation me… ▽ More The recent rise of EEG-based end-to-end deep learning models presents a significant challenge in elucidating how these models process raw EEG signals and generate predictions in the frequency domain. This challenge limits the transparency and credibility of EEG-based end-to-end models, hindering their application in security-sensitive areas. To address this issue, we propose a mask perturbation method to explain the behavior of end-to-end models in the frequency domain. Considering the characteristics of EEG data, we introduce a target alignment loss to mitigate the out-of-distribution problem associated with perturbation operations. Additionally, we develop a perturbation generator to define perturbation generation in the frequency domain. Our explanation method is validated through experiments on multiple representative end-to-end deep learning models in the EEG decoding field, using an established EEG benchmark dataset. The results demonstrate the effectiveness and superiority of our method, and highlight its potential to advance research in EEG-based end-to-end models. △ Less

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.17857 [pdf, other]

Mew: Multiplexed Immunofluorescence Image Analysis through an Efficient Multiplex Network

Authors: Sukwon Yun, Jie Peng, Alexandro E. Trevino, Chanyoung Park, Tianlong Chen

Abstract: Recent advancements in graph-based approaches for multiplexed immunofluorescence (mIF) images have significantly propelled the field forward, offering deeper insights into patient-level phenotyping. However, current graph-based methodologies encounter two primary challenges: (1) Cellular Heterogeneity, where existing approaches fail to adequately address the inductive biases inherent in graphs, pa… ▽ More Recent advancements in graph-based approaches for multiplexed immunofluorescence (mIF) images have significantly propelled the field forward, offering deeper insights into patient-level phenotyping. However, current graph-based methodologies encounter two primary challenges: (1) Cellular Heterogeneity, where existing approaches fail to adequately address the inductive biases inherent in graphs, particularly the homophily characteristic observed in cellular connectivity and; (2) Scalability, where handling cellular graphs from high-dimensional images faces difficulties in managing a high number of cells. To overcome these limitations, we introduce Mew, a novel framework designed to efficiently process mIF images through the lens of multiplex network. Mew innovatively constructs a multiplex network comprising two distinct layers: a Voronoi network for geometric information and a Cell-type network for capturing cell-wise homogeneity. This framework equips a scalable and efficient Graph Neural Network (GNN), capable of processing the entire graph during training. Furthermore, Mew integrates an interpretable attention module that autonomously identifies relevant layers for image classification. Extensive experiments on a real-world patient dataset from various institutions highlight Mew's remarkable efficacy and efficiency, marking a significant advancement in mIF image analysis. The source code of Mew can be found here: \url{https://github.com/UNITES-Lab/Mew} △ Less

Submitted 25 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2407.17412 [pdf, other]

(PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork

Authors: Tianjin Huang, Fang Meng, Li Shen, Fan Liu, Yulong Pei, Mykola Pechenizkiy, Shiwei Liu, Tianlong Chen

Abstract: Large-scale neural networks have demonstrated remarkable performance in different domains like vision and language processing, although at the cost of massive computation resources. As illustrated by compression literature, structural model pruning is a prominent algorithm to encourage model efficiency, thanks to its acceleration-friendly sparsity patterns. One of the key questions of structural p… ▽ More Large-scale neural networks have demonstrated remarkable performance in different domains like vision and language processing, although at the cost of massive computation resources. As illustrated by compression literature, structural model pruning is a prominent algorithm to encourage model efficiency, thanks to its acceleration-friendly sparsity patterns. One of the key questions of structural pruning is how to estimate the channel significance. In parallel, work on data-centric AI has shown that prompting-based techniques enable impressive generalization of large language models across diverse downstream tasks. In this paper, we investigate a charming possibility - \textit{leveraging visual prompts to capture the channel importance and derive high-quality structural sparsity}. To this end, we propose a novel algorithmic framework, namely \texttt{PASS}. It is a tailored hyper-network to take both visual prompts and network weight statistics as input, and output layer-wise channel sparsity in a recurrent manner. Such designs consider the intrinsic channel dependency between layers. Comprehensive experiments across multiple network architectures and six datasets demonstrate the superiority of \texttt{PASS} in locating good structural sparsity. For example, at the same FLOPs level, \texttt{PASS} subnetworks achieve $1\%\sim 3\%$ better accuracy on Food101 dataset; or with a similar performance of $80\%$ accuracy, \texttt{PASS} subnetworks obtain $0.35\times$ more speedup than the baselines. △ Less

Submitted 24 July, 2024; originally announced July 2024.

Comments: Under review

arXiv:2407.17193 [pdf, other]

doi 10.1145/3664647.3680560

Unpaired Photo-realistic Image Deraining with Energy-informed Diffusion Model

Authors: Yuanbo Wen, Tao Gao, Ting Chen

Abstract: Existing unpaired image deraining approaches face challenges in accurately capture the distinguishing characteristics between the rainy and clean domains, resulting in residual degradation and color distortion within the reconstructed images. To this end, we propose an energy-informed diffusion model for unpaired photo-realistic image deraining (UPID-EDM). Initially, we delve into the intricate vi… ▽ More Existing unpaired image deraining approaches face challenges in accurately capture the distinguishing characteristics between the rainy and clean domains, resulting in residual degradation and color distortion within the reconstructed images. To this end, we propose an energy-informed diffusion model for unpaired photo-realistic image deraining (UPID-EDM). Initially, we delve into the intricate visual-language priors embedded within the contrastive language-image pre-training model (CLIP), and demonstrate that the CLIP priors aid in the discrimination of rainy and clean images. Furthermore, we introduce a dual-consistent energy function (DEF) that retains the rain-irrelevant characteristics while eliminating the rain-relevant features. This energy function is trained by the non-corresponding rainy and clean images. In addition, we employ the rain-relevance discarding energy function (RDEF) and the rain-irrelevance preserving energy function (RPEF) to direct the reverse sampling procedure of a pre-trained diffusion model, effectively removing the rain streaks while preserving the image contents. Extensive experiments demonstrate that our energy-informed model surpasses the existing unpaired learning approaches in terms of both supervised and no-reference metrics. △ Less

Submitted 24 July, 2024; originally announced July 2024.

arXiv:2407.17184 [pdf, other]

Search for $η_{c}(2S)\to K^+ K^- η^{\prime}$ decay

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

Abstract: Using $(2.712\pm0.014)\times10^{9}$ $ψ(3686)$ events collected with the BESIII detector operating at the BEPCII, we find an evidence of the $η_{c}(2S)\to K^+ K^- η^{\prime}$ decay with a statistical significance of 3.1$σ$. Its decay branching fraction is measured to be $(12.24\pm4.60(\mathrm{stat.})\pm2.37(\mathrm{syst.})\pm4.68(\mathrm{extr.}))\times 10^{-4}$, where the first uncertainty is stati… ▽ More Using $(2.712\pm0.014)\times10^{9}$ $ψ(3686)$ events collected with the BESIII detector operating at the BEPCII, we find an evidence of the $η_{c}(2S)\to K^+ K^- η^{\prime}$ decay with a statistical significance of 3.1$σ$. Its decay branching fraction is measured to be $(12.24\pm4.60(\mathrm{stat.})\pm2.37(\mathrm{syst.})\pm4.68(\mathrm{extr.}))\times 10^{-4}$, where the first uncertainty is statistical, the second is systematic, and the third uncertainty is from the branching fraction of the $ψ(3686)\toγη_{c}(2S)$ decay. The upper limit on the product branching fraction $B[ψ(3686)\toγη_{c}(2S)] \times$ $B[η_{c}(2S)\to K^+ K^- η^{\prime}]$ is set to be $1.14 \times 10^{-6}$ at $90\%$ confidence level. In addition, the branching fractions of $χ_{c1}\to K^+ K^- η^{\prime}$ and $χ_{c2}\to K^+ K^- η^{\prime}$ are updated to be $(8.47\pm0.09(\mathrm{stat.})\pm0.47(\mathrm{syst.}))\times 10^{-4}$ and $(1.53\pm0.04(\mathrm{stat.})\pm0.08(\mathrm{syst.}))\times 10^{-4}$, respectively. The precision is improved by twofold. △ Less

Submitted 24 July, 2024; originally announced July 2024.

arXiv:2407.17126 [pdf]

SDoH-GPT: Using Large Language Models to Extract Social Determinants of Health (SDoH)

Authors: Bernardo Consoli, Xizhi Wu, Song Wang, Xinyu Zhao, Yanshan Wang, Justin Rousseau, Tom Hartvigsen, Li Shen, Huanmei Wu, Yifan Peng, Qi Long, Tianlong Chen, Ying Ding

Abstract: Extracting social determinants of health (SDoH) from unstructured medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. In this study we introduced SDoH-GPT, a simple and effective few-shot Large Language Model (LLM) method leveraging contrastive examples and concise instructions to extract SDoH without relying… ▽ More Extracting social determinants of health (SDoH) from unstructured medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. In this study we introduced SDoH-GPT, a simple and effective few-shot Large Language Model (LLM) method leveraging contrastive examples and concise instructions to extract SDoH without relying on extensive medical annotations or costly human intervention. It achieved tenfold and twentyfold reductions in time and cost respectively, and superior consistency with human annotators measured by Cohen's kappa of up to 0.92. The innovative combination of SDoH-GPT and XGBoost leverages the strengths of both, ensuring high accuracy and computational efficiency while consistently maintaining 0.90+ AUROC scores. Testing across three distinct datasets has confirmed its robustness and accuracy. This study highlights the potential of leveraging LLMs to revolutionize medical note classification, demonstrating their capability to achieve highly accurate classifications with significantly reduced time and cost. △ Less

Submitted 24 July, 2024; originally announced July 2024.

arXiv:2407.15595 [pdf, other]

Discrete Flow Matching

Authors: Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, Yaron Lipman

Abstract: Despite Flow Matching and diffusion models having emerged as powerful generative paradigms for continuous variables such as images and videos, their application to high-dimensional discrete data, such as language, is still limited. In this work, we present Discrete Flow Matching, a novel discrete flow paradigm designed specifically for generating discrete data. Discrete Flow Matching offers severa… ▽ More Despite Flow Matching and diffusion models having emerged as powerful generative paradigms for continuous variables such as images and videos, their application to high-dimensional discrete data, such as language, is still limited. In this work, we present Discrete Flow Matching, a novel discrete flow paradigm designed specifically for generating discrete data. Discrete Flow Matching offers several key contributions: (i) it works with a general family of probability paths interpolating between source and target distributions; (ii) it allows for a generic formula for sampling from these probability paths using learned posteriors such as the probability denoiser ($x$-prediction) and noise-prediction ($ε$-prediction); (iii) practically, focusing on specific probability paths defined with different schedulers considerably improves generative perplexity compared to previous discrete diffusion and flow models; and (iv) by scaling Discrete Flow Matching models up to 1.7B parameters, we reach 6.7% Pass@1 and 13.4% Pass@10 on HumanEval and 6.7% Pass@1 and 20.6% Pass@10 on 1-shot MBPP coding benchmarks. Our approach is capable of generating high-quality discrete data in a non-autoregressive fashion, significantly closing the gap between autoregressive models and discrete flow models. △ Less

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.15411 [pdf, other]

Scalable Dynamic Embedding Size Search for Streaming Recommendation

Authors: Yunke Qu, Liang Qu, Tong Chen, Xiangyu Zhao, Quoc Viet Hung Nguyen, Hongzhi Yin

Abstract: Recommender systems typically represent users and items by learning their embeddings, which are usually set to uniform dimensions and dominate the model parameters. However, real-world recommender systems often operate in streaming recommendation scenarios, where the number of users and items continues to grow, leading to substantial storage resource consumption for these embeddings. Although a fe… ▽ More Recommender systems typically represent users and items by learning their embeddings, which are usually set to uniform dimensions and dominate the model parameters. However, real-world recommender systems often operate in streaming recommendation scenarios, where the number of users and items continues to grow, leading to substantial storage resource consumption for these embeddings. Although a few methods attempt to mitigate this by employing embedding size search strategies to assign different embedding dimensions in streaming recommendations, they assume that the embedding size grows with the frequency of users/items, which eventually still exceeds the predefined memory budget over time. To address this issue, this paper proposes to learn Scalable Lightweight Embeddings for streaming recommendation, called SCALL, which can adaptively adjust the embedding sizes of users/items within a given memory budget over time. Specifically, we propose to sample embedding sizes from a probabilistic distribution, with the guarantee to meet any predefined memory budget. By fixing the memory budget, the proposed embedding size sampling strategy can increase and decrease the embedding sizes in accordance to the frequency of the corresponding users or items. Furthermore, we develop a reinforcement learning-based search paradigm that models each state with mean pooling to keep the length of the state vectors fixed, invariant to the changing number of users and items. As a result, the proposed method can provide embedding sizes to unseen users and items. Comprehensive empirical evaluations on two public datasets affirm the advantageous effectiveness of our proposed method. △ Less

Submitted 31 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

Comments: accepted to CIKM 2024

arXiv:2407.15182 [pdf, other]

Thermometry of Trapped Ions Based on Bichromatic Driving

Authors: Xie-Qian Li, Yi Tao, Ting Chen, Wei Wu, Yi Xie, Chun-Wang Wu, Ping-Xing Chen

Abstract: Accurate thermometry of laser-cooled ions is crucial for the performance of the trapped-ions quantum computing platform. However, most existing methods face a computational exponential bottleneck. Recently, a thermometry method based on bichromatic driving was theoretically proposed by Ivan Vybornyi et al. to overcome this obstacle, which allows the computational complexity to remain constant with… ▽ More Accurate thermometry of laser-cooled ions is crucial for the performance of the trapped-ions quantum computing platform. However, most existing methods face a computational exponential bottleneck. Recently, a thermometry method based on bichromatic driving was theoretically proposed by Ivan Vybornyi et al. to overcome this obstacle, which allows the computational complexity to remain constant with the increase of ion numbers. In this paper, we provide a detailed statistical analysis of this method and prove its robustness to several imperfect experimental conditions using Floquet theory. We then experimentally verify its good performance on a linear segmented surface-electrode ion trap platform for the first time. This method is proven to be effective from near the motional ground state to a few mean phonon numbers. Our theoretical analysis and experimental verification demonstrate that the scheme can accurately and efficiently measure the temperature in ion crystals. △ Less

Submitted 21 July, 2024; originally announced July 2024.

arXiv:2407.15128 [pdf, other]

A description of the integral depth-$r$ Bernstein center

Authors: Tsao-Hsien Chen, Sarbartha Bhattacharya

Abstract: In this paper we give a description of the depth-$r$ Bernstein center for non-negative integers $r$ of a reductive simply connected group $G$ over a non-archimedean local field as a limit of depth-$r$ standard parahoric Hecke algebras. Using the description, we construct maps from the algebra of stable functions on the $r$-th Moy-Prasad filtration quotient of hyperspecial parahorics to the depth-… ▽ More In this paper we give a description of the depth-$r$ Bernstein center for non-negative integers $r$ of a reductive simply connected group $G$ over a non-archimedean local field as a limit of depth-$r$ standard parahoric Hecke algebras. Using the description, we construct maps from the algebra of stable functions on the $r$-th Moy-Prasad filtration quotient of hyperspecial parahorics to the depth-$r$ Bernstein center and use them to attach to each depth-$r$ irreducible representation $π$ an invariant $θ(π)$, called the depth-$r$ Deligne-Lusztig parameter of $π$. We show that $θ(π)$ is equal to the semi-simple part of minimal $K$-types of $π$. △ Less

Submitted 21 July, 2024; originally announced July 2024.

Comments: 38 pages

arXiv:2407.14903 [pdf, other]

Automated Patient Positioning with Learned 3D Hand Gestures

Authors: Zhongpai Gao, Abhishek Sharma, Meng Zheng, Benjamin Planche, Terrence Chen, Ziyan Wu

Abstract: Positioning patients for scanning and interventional procedures is a critical task that requires high precision and accuracy. The conventional workflow involves manually adjusting the patient support to align the center of the target body part with the laser projector or other guiding devices. This process is not only time-consuming but also prone to inaccuracies. In this work, we propose an autom… ▽ More Positioning patients for scanning and interventional procedures is a critical task that requires high precision and accuracy. The conventional workflow involves manually adjusting the patient support to align the center of the target body part with the laser projector or other guiding devices. This process is not only time-consuming but also prone to inaccuracies. In this work, we propose an automated patient positioning system that utilizes a camera to detect specific hand gestures from technicians, allowing users to indicate the target patient region to the system and initiate automated positioning. Our approach relies on a novel multi-stage pipeline to recognize and interpret the technicians' gestures, translating them into precise motions of medical devices. We evaluate our proposed pipeline during actual MRI scanning procedures, using RGB-Depth cameras to capture the process. Results show that our system achieves accurate and precise patient positioning with minimal technician intervention. Furthermore, we validate our method on HaGRID, a large-scale hand gesture dataset, demonstrating its effectiveness in hand detection and gesture recognition. △ Less

Submitted 20 July, 2024; originally announced July 2024.

arXiv:2407.14829 [pdf, other]

Overview of AI-Debater 2023: The Challenges of Argument Generation Tasks

Authors: Jiayu Lin, Guanrong Chen, Bojun Jin, Chenyang Li, Shutong Jia, Wancong Lin, Yang Sun, Yuhang He, Caihua Yang, Jianzhu Bao, Jipeng Wu, Wen Su, Jinglu Chen, Xinyi Li, Tianyu Chen, Mingjie Han, Shuaiwen Du, Zijian Wang, Jiyin Li, Fuzhong Suo, Hao Wang, Nuanchen Lin, Xuanjing Huang, Changjian Jiang, RuiFeng Xu , et al. (4 additional authors not shown)

Abstract: In this paper we present the results of the AI-Debater 2023 Challenge held by the Chinese Conference on Affect Computing (CCAC 2023), and introduce the related datasets. We organize two tracks to handle the argumentative generation tasks in different scenarios, namely, Counter-Argument Generation (Track 1) and Claim-based Argument Generation (Track 2). Each track is equipped with its distinct data… ▽ More In this paper we present the results of the AI-Debater 2023 Challenge held by the Chinese Conference on Affect Computing (CCAC 2023), and introduce the related datasets. We organize two tracks to handle the argumentative generation tasks in different scenarios, namely, Counter-Argument Generation (Track 1) and Claim-based Argument Generation (Track 2). Each track is equipped with its distinct dataset and baseline model respectively. In total, 32 competing teams register for the challenge, from which we received 11 successful submissions. In this paper, we will present the results of the challenge and a summary of the systems, highlighting commonalities and innovations among participating systems. Datasets and baseline models of the AI-Debater 2023 Challenge have been already released and can be accessed through the official website of the challenge. △ Less

Submitted 24 July, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

arXiv:2407.13917 [pdf, other]

doi 10.5555/3618408.3619931

LinSATNet: The Positive Linear Satisfiability Neural Networks

Authors: Runzhong Wang, Yunhao Zhang, Ziao Guo, Tianyi Chen, Xiaokang Yang, Junchi Yan

Abstract: Encoding constraints into neural networks is attractive. This paper studies how to introduce the popular positive linear satisfiability to neural networks. We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions. We further theoretically characterize the convergence property of the… ▽ More Encoding constraints into neural networks is attractive. This paper studies how to introduce the popular positive linear satisfiability to neural networks. We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions. We further theoretically characterize the convergence property of the Sinkhorn algorithm for multiple marginals. In contrast to the sequential decision e.g.\ reinforcement learning-based solvers, we showcase our technique in solving constrained (specifically satisfiability) problems by one-shot neural networks, including i) a neural routing solver learned without supervision of optimal solutions; ii) a partial graph matching network handling graphs with unmatchable outliers on both sides; iii) a predictive network for financial portfolios with continuous constraints. To our knowledge, there exists no one-shot neural solver for these scenarios when they are formulated as satisfiability problems. Source code is available at https://github.com/Thinklab-SJTU/LinSATNet △ Less

Submitted 18 July, 2024; originally announced July 2024.

Comments: This is a revised version of our ICML'23 publication that fixes a minor issue in Eq (11). In Proceedings of the 40th International Conference on Machine Learning (ICML'23)

arXiv:2407.13605 [pdf, other]

Physics-guided Active Sample Reweighting for Urban Flow Prediction

Authors: Wei Jiang, Tong Chen, Guanhua Ye, Wentao Zhang, Lizhen Cui, Zi Huang, Hongzhi Yin

Abstract: Urban flow prediction is a spatio-temporal modeling task that estimates the throughput of transportation services like buses, taxis, and ride-sharing, where data-driven models have become the most popular solution in the past decade. Meanwhile, the implicitly learned mapping between historical observations to the prediction targets tend to over-simplify the dynamics of real-world urban flows, lead… ▽ More Urban flow prediction is a spatio-temporal modeling task that estimates the throughput of transportation services like buses, taxis, and ride-sharing, where data-driven models have become the most popular solution in the past decade. Meanwhile, the implicitly learned mapping between historical observations to the prediction targets tend to over-simplify the dynamics of real-world urban flows, leading to suboptimal predictions. Some recent spatio-temporal prediction solutions bring remedies with the notion of physics-guided machine learning (PGML), which describes spatio-temporal data with nuanced and principled physics laws, thus enhancing both the prediction accuracy and interpretability. However, these spatio-temporal PGML methods are built upon a strong assumption that the observed data fully conforms to the differential equations that define the physical system, which can quickly become ill-posed in urban flow prediction tasks. The observed urban flow data, especially when sliced into time-dependent snapshots to facilitate predictions, is typically incomplete and sparse, and prone to inherent noise incurred in the collection process. As a result, such physical inconsistency between the data and PGML model significantly limits the predictive power and robustness of the solution. Moreover, due to the interval-based predictions and intermittent nature of data filing in many transportation services, the instantaneous dynamics of urban flows can hardly be captured, rendering differential equation-based continuous modeling a loose fit for this setting. To overcome the challenges, we develop a discretized physics-guided network (PN), and propose a data-aware framework Physics-guided Active Sample Reweighting (P-GASR) to enhance PN. Experimental results in four real-world datasets demonstrate that our method achieves state-of-the-art performance with a demonstrable improvement in robustness. △ Less

Submitted 6 August, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

Comments: This paper is accepted by Proceedings of the 33nd ACM International Conference on Information and Knowledge Management (CIKM '24)

arXiv:2407.13322 [pdf, other]

Fully Test-Time rPPG Estimation via Synthetic Signal-Guided Feature Learning

Authors: Pei-Kai Huang, Tzu-Hsien Chen, Ya-Ting Chan, Kuan-Wen Chen, Chiou-Ting Hsu

Abstract: Many remote photoplethysmography (rPPG) estimation models have achieved promising performance in the training domain but often fail to accurately estimate physiological signals or heart rates (HR) in the target domains. Domain generalization (DG) or domain adaptation (DA) techniques are therefore adopted during the offline training stage to adapt the model to either unobserved or observed target d… ▽ More Many remote photoplethysmography (rPPG) estimation models have achieved promising performance in the training domain but often fail to accurately estimate physiological signals or heart rates (HR) in the target domains. Domain generalization (DG) or domain adaptation (DA) techniques are therefore adopted during the offline training stage to adapt the model to either unobserved or observed target domains by utilizing all available source domain data. However, in rPPG estimation problems, the adapted model usually encounters challenges in estimating target data with significant domain variation. In contrast, Test-Time Adaptation (TTA) enables the model to adaptively estimate rPPG signals in various unseen domains by online adapting to unlabeled target data without referring to any source data. In this paper, we first establish a new TTA-rPPG benchmark that encompasses various domain information and HR distributions to simulate the challenges encountered in real-world rPPG estimation. Next, we propose a novel synthetic signal-guided rPPG estimation framework to address the forgetting issue during the TTA stage and to enhance the adaptation capability of the pre-trained rPPG model. To this end, we develop a synthetic signal-guided feature learning method by synthesizing pseudo rPPG signals as pseudo ground truths to guide a conditional generator in generating latent rPPG features. In addition, we design an effective spectral-based entropy minimization technique to encourage the rPPG model to learn new target domain information. Both the generated rPPG features and synthesized rPPG signals prevent the rPPG model from overfitting to target data and forgetting previously acquired knowledge, while also broadly covering various heart rate (HR) distributions. Our extensive experiments on the TTA-rPPG benchmark show that the proposed method achieves superior performance. △ Less

Submitted 15 August, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.13271 [pdf, other]

doi 10.1145/3650212.3680353

Identifying Smart Contract Security Issues in Code Snippets from Stack Overflow

Authors: Jiachi Chen, Chong Chen, Jiang Hu, John Grundy, Yanlin Wang, Ting Chen, Zibin Zheng

Abstract: Smart contract developers frequently seek solutions to developmental challenges on Q&A platforms such as Stack Overflow (SO). Although community responses often provide viable solutions, the embedded code snippets can also contain hidden vulnerabilities. Integrating such code directly into smart contracts may make them susceptible to malicious attacks. We conducted an online survey and received 74… ▽ More Smart contract developers frequently seek solutions to developmental challenges on Q&A platforms such as Stack Overflow (SO). Although community responses often provide viable solutions, the embedded code snippets can also contain hidden vulnerabilities. Integrating such code directly into smart contracts may make them susceptible to malicious attacks. We conducted an online survey and received 74 responses from smart contract developers. The results of this survey indicate that the majority (86.4%) of participants do not sufficiently consider security when reusing SO code snippets. Despite the existence of various tools designed to detect vulnerabilities in smart contracts, these tools are typically developed for analyzing fully-completed smart contracts and thus are ineffective for analyzing typical code snippets as found on SO. We introduce SOChecker, the first tool designed to identify potential vulnerabilities in incomplete SO smart contract code snippets. SOChecker first leverages a fine-tuned Llama2 model for code completion, followed by the application of symbolic execution methods for vulnerability detection. Our experimental results, derived from a dataset comprising 897 code snippets collected from smart contract-related SO posts, demonstrate that SOChecker achieves an F1 score of 68.2%, greatly surpassing GPT-3.5 and GPT-4 (20.9% and 33.2% F1 Scores respectively). Our findings underscore the need to improve the security of code snippets from Q&A websites. △ Less

Submitted 23 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.13089 [pdf, other]

MetaSumPerceiver: Multimodal Multi-Document Evidence Summarization for Fact-Checking

Authors: Ting-Chih Chen, Chia-Wei Tang, Chris Thomas

Abstract: Fact-checking real-world claims often requires reviewing multiple multimodal documents to assess a claim's truthfulness, which is a highly laborious and time-consuming task. In this paper, we present a summarization model designed to generate claim-specific summaries useful for fact-checking from multimodal, multi-document datasets. The model takes inputs in the form of documents, images, and a cl… ▽ More Fact-checking real-world claims often requires reviewing multiple multimodal documents to assess a claim's truthfulness, which is a highly laborious and time-consuming task. In this paper, we present a summarization model designed to generate claim-specific summaries useful for fact-checking from multimodal, multi-document datasets. The model takes inputs in the form of documents, images, and a claim, with the objective of assisting in fact-checking tasks. We introduce a dynamic perceiver-based model that can handle inputs from multiple modalities of arbitrary lengths. To train our model, we leverage a novel reinforcement learning-based entailment objective to generate summaries that provide evidence distinguishing between different truthfulness labels. To assess the efficacy of our approach, we conduct experiments on both an existing benchmark and a new dataset of multi-document claims that we contribute. Our approach outperforms the SOTA approach by 4.6% in the claim verification task on the MOCHEG dataset and demonstrates strong performance on our new Multi-News-Fact-Checking dataset. △ Less

Submitted 17 July, 2024; originally announced July 2024.

Comments: 16 pages, 7 figures, The 62nd Annual Meeting of the Association for Computational Linguistics

arXiv:2407.12270 [pdf, other]

Observation of $Λ_c^+ \to Λa_0(980)^+$ and Evidence for $Σ(1380)^+$ in $Λ_c^+ \to Λπ^+ η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Based on $6.1~\mathrm{fb}^{-1}$ of $e^+e^-$ annihilation data collected at center-of-mass energies from 4.600~GeV to 4.843~GeV with the BESIII detector at the BEPCII collider, a partial wave analysis of $Λ_c^+\toΛπ^+η$ is performed, and branching fractions and decay asymmetry parameters of intermediate processes are determined. The process $Λ_c^+\toΛa_0(980)^+$ is observed for the first time, and… ▽ More Based on $6.1~\mathrm{fb}^{-1}$ of $e^+e^-$ annihilation data collected at center-of-mass energies from 4.600~GeV to 4.843~GeV with the BESIII detector at the BEPCII collider, a partial wave analysis of $Λ_c^+\toΛπ^+η$ is performed, and branching fractions and decay asymmetry parameters of intermediate processes are determined. The process $Λ_c^+\toΛa_0(980)^+$ is observed for the first time, and evidence for the pentaquark candidate $Σ(1380)^+$ decaying into $Λπ^+$ is found with statistical significance larger than $3σ$. The branching fraction product $\mathcal{B}(Λ_{c}^{+} \to Λa_0(980)^+) \; \mathcal{B}( a_0(980)^+ \to π^{+}η)$ is determined to be $(1.05 \pm 0.16_{\mathrm{stat}} \pm 0.05_{\mathrm{syst}} \pm 0.07_{\mathrm{ext}})\%$, which is larger than theoretical calculations by $1 - 2$ orders of magnitude. Here the third (external) systematic is from $\mathcal{B}(Λ_{c}^{+} \to Λπ^+ η)$. Finally, we precisely obtain the absolute branching fraction $\mathcal{B}(Λ_{c}^{+} \to Λπ^+ η) = (1.94 \pm 0.07_{\mathrm{stat}} \pm 0.11_{\mathrm{syst}})\%$. △ Less

Submitted 16 July, 2024; originally announced July 2024.

Comments: 16 pages, 8 figures

arXiv:2407.11727 [pdf, ps, other]

Measurement of the branching fraction of $D^+_s\to \ell^+ν_\ell$ via $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (634 additional authors not shown)

Abstract: Based on $10.64~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data taken at center-of-mass energies between 4.237 and 4.699 GeV with the BESIII detector, we study the leptonic $D^+_s$ decays using the $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$ process. The branching fractions of $D_s^+\to\ell^+ν_{\ell}\,(\ell=μ,τ)$ are measured to be $\mathcal{B}(D_s^+\toμ^+ν_μ)=(0.547\pm0.026_{\rm stat}\pm0.016_{\rm syst})\%$ a… ▽ More Based on $10.64~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data taken at center-of-mass energies between 4.237 and 4.699 GeV with the BESIII detector, we study the leptonic $D^+_s$ decays using the $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$ process. The branching fractions of $D_s^+\to\ell^+ν_{\ell}\,(\ell=μ,τ)$ are measured to be $\mathcal{B}(D_s^+\toμ^+ν_μ)=(0.547\pm0.026_{\rm stat}\pm0.016_{\rm syst})\%$ and $\mathcal{B}(D_s^+\toτ^+ν_τ)=(5.60\pm0.16_{\rm stat}\pm0.20_{\rm syst})\%$, respectively. The product of the decay constant and Cabibbo-Kobayashi-Maskawa matrix element $|V_{cs}|$ is determined to be $f_{D_s^+}|V_{cs}|=(246.5\pm5.9_{\rm stat}\pm3.6_{\rm syst}\pm0.5_{\rm input})_{μν}~\mathrm{MeV}$ and $f_{D_s^+}|V_{cs}|=(252.7\pm3.6_{\rm stat}\pm4.5_{\rm syst}\pm0.6_{\rm input}))_{τν}~\mathrm{MeV}$, respectively. Taking the value of $|V_{cs}|$ from a global fit in the Standard Model, we obtain ${f_{D^+_s}}=(252.8\pm6.0_{\rm stat}\pm3.7_{\rm syst}\pm0.6_{\rm input})_{μν}$ MeV and ${f_{D^+_s}}=(259.2\pm3.6_{\rm stat}\pm4.5_{\rm syst}\pm0.6_{\rm input})_{τν}$ MeV, respectively. Conversely, taking the value for $f_{D_s^+}$ from the latest lattice quantum chromodynamics calculation, we obtain $|V_{cs}| =(0.986\pm0.023_{\rm stat}\pm0.014_{\rm syst}\pm0.003_{\rm input})_{μν}$ and $|V_{cs}| = (1.011\pm0.014_{\rm stat}\pm0.018_{\rm syst}\pm0.003_{\rm input})_{τν}$, respectively. △ Less

Submitted 18 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

Comments: 27 pages, 13 figures

arXiv:2407.11572 [pdf, other]

doi 10.3847/2041-8213/ad5ffd

Discovery of an Extremely r-process-enhanced Thin-disk Star with [Eu/H] = +0.78

Authors: Xiao-Jin Xie, Jianrong Shi, Hong-Liang Yan, Tian-Yi Chen, Carlos Allende Prieto, Timothy C. Beers, Shuai Liu, Chun-Qian Li, Ming-Yi Ding, Yao-Jia Tang, Ruizhi Zhang, Renjing Xie

Abstract: Highly r-process-enhanced stars are rare and usually metal-poor ([Fe/H] < - 1.0), and mainly populate the Milky Way halo and dwarf galaxies. This study presents the discovery of a relatively bright (V = 12.72), highly r-process-enhanced (r-II) star ([Eu/Fe] = +1.32, [Ba/Eu] = - 0.95), LAMOST J020623.21 + 494127.9. This star was selected from the Large Sky Area Multi-Object Fiber Spectroscopic Tele… ▽ More Highly r-process-enhanced stars are rare and usually metal-poor ([Fe/H] < - 1.0), and mainly populate the Milky Way halo and dwarf galaxies. This study presents the discovery of a relatively bright (V = 12.72), highly r-process-enhanced (r-II) star ([Eu/Fe] = +1.32, [Ba/Eu] = - 0.95), LAMOST J020623.21 + 494127.9. This star was selected from the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) medium-resolution (R ~ 7500) spectroscopic survey; follow-up high-resolution (R ~ 25,000) observations were conducted with the High Optical Resolution Spectrograph (HORuS) installed on the Gran Telescopio Canarias (GTC). The stellar parameters (${T_{\rm eff}}$ = 4130 K, $\rm log\,g $ = 1.52, $ \rm[Fe/H] $ = $ - $0.54, $ξ$ = 1.80 $ \rm{km\,{s^{-1}}} $) have been inferred taking into account non-local thermodynamic equilibrium (NLTE) effects. The abundances of [Ce/Fe], [Pr/Fe], and [Nd/Fe] are +0.19, +0.65 and +0.64, respectively, relatively low compared to the Solar r-process pattern normalized to Eu. This star has a high metallicity ([Fe/H] = - 0.54) compared to most other highly r-process-enhanced stars, and has the highest measured abundance ratio of Eu to H ([Eu/H] = +0.78). It is classified as a thin-disk star based on its kinematics, and does not appear to belong to any known stream or dwarf galaxy. △ Less

Submitted 16 July, 2024; originally announced July 2024.

Comments: 5 figures, 3 tables

Journal ref: ApJL, 2024, Volume 970, Number 2, L30

arXiv:2407.11277 [pdf, other]

Target conversation extraction: Source separation using turn-taking dynamics

Authors: Tuochao Chen, Qirui Wang, Bohan Wu, Malek Itani, Sefik Emre Eskimez, Takuya Yoshioka, Shyamnath Gollakota

Abstract: Extracting the speech of participants in a conversation amidst interfering speakers and noise presents a challenging problem. In this paper, we introduce the novel task of target conversation extraction, where the goal is to extract the audio of a target conversation based on the speaker embedding of one of its participants. To accomplish this, we propose leveraging temporal patterns inherent in h… ▽ More Extracting the speech of participants in a conversation amidst interfering speakers and noise presents a challenging problem. In this paper, we introduce the novel task of target conversation extraction, where the goal is to extract the audio of a target conversation based on the speaker embedding of one of its participants. To accomplish this, we propose leveraging temporal patterns inherent in human conversations, particularly turn-taking dynamics, which uniquely characterize speakers engaged in conversation and distinguish them from interfering speakers and noise. Using neural networks, we show the feasibility of our approach on English and Mandarin conversation datasets. In the presence of interfering speakers, our results show an 8.19 dB improvement in signal-to-noise ratio for 2-speaker conversations and a 7.92 dB improvement for 2-4-speaker conversations. Code, dataset available at https://github.com/chentuochao/Target-Conversation-Extraction. △ Less

Submitted 29 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

Comments: Accepted by Interspeech 2024

arXiv:2407.11055 [pdf, other]

Knowledge boosting during low-latency inference

Authors: Vidya Srinivas, Malek Itani, Tuochao Chen, Sefik Emre Eskimez, Takuya Yoshioka, Shyamnath Gollakota

Abstract: Models for low-latency, streaming applications could benefit from the knowledge capacity of larger models, but edge devices cannot run these models due to resource constraints. A possible solution is to transfer hints during inference from a large model running remotely to a small model running on-device. However, this incurs a communication delay that breaks real-time requirements and does not gu… ▽ More Models for low-latency, streaming applications could benefit from the knowledge capacity of larger models, but edge devices cannot run these models due to resource constraints. A possible solution is to transfer hints during inference from a large model running remotely to a small model running on-device. However, this incurs a communication delay that breaks real-time requirements and does not guarantee that both models will operate on the same data at the same time. We propose knowledge boosting, a novel technique that allows a large model to operate on time-delayed input during inference, while still boosting small model performance. Using a streaming neural network that processes 8 ms chunks, we evaluate different speech separation and enhancement tasks with communication delays of up to six chunks or 48 ms. Our results show larger gains where the performance gap between the small and large models is wide, demonstrating a promising method for large-small model collaboration for low-latency applications. Code, dataset, and audio samples available at https://knowledgeboosting.cs.washington.edu/. △ Less

Submitted 25 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: Accepted by Interspeech 2024

arXiv:2407.11030 [pdf, other]

DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs

Authors: Zhen Tan, Daize Dong, Xinyu Zhao, Jie Peng, Yu Cheng, Tianlong Chen

Abstract: In this paper, we introduce Dynamic Layer Operations (DLO), a novel approach for vertically scaling transformer-based Large Language Models (LLMs) by dynamically expanding, activating, or skipping layers using a sophisticated routing policy based on layerwise feature similarity. Unlike traditional Mixture-of-Experts (MoE) methods that focus on extending the model width, our approach targets model… ▽ More In this paper, we introduce Dynamic Layer Operations (DLO), a novel approach for vertically scaling transformer-based Large Language Models (LLMs) by dynamically expanding, activating, or skipping layers using a sophisticated routing policy based on layerwise feature similarity. Unlike traditional Mixture-of-Experts (MoE) methods that focus on extending the model width, our approach targets model depth, addressing the redundancy observed across layer representations for various input samples. Our framework is integrated with the Supervised Fine-Tuning (SFT) stage, eliminating the need for resource-intensive Continual Pre-Training (CPT). Experimental results demonstrate that DLO not only outperforms the original unscaled models but also achieves comparable results to densely expanded models with significantly improved efficiency. Our work offers a promising direction for building efficient yet powerful LLMs. We will release our implementation and model weights upon acceptance. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.10691 [pdf, other]

$\texttt{MixGR}$: Enhancing Retriever Generalization for Scientific Domain through Complementary Granularity

Authors: Fengyu Cai, Xinran Zhao, Tong Chen, Sihao Chen, Hongming Zhang, Iryna Gurevych, Heinz Koeppl

Abstract: Recent studies show the growing significance of document retrieval in the generation of LLMs, i.e., RAG, within the scientific domain by bridging their knowledge gap. However, dense retrievers often struggle with domain-specific retrieval and complex query-document relationships, particularly when query segments correspond to various parts of a document. To alleviate such prevalent challenges, thi… ▽ More Recent studies show the growing significance of document retrieval in the generation of LLMs, i.e., RAG, within the scientific domain by bridging their knowledge gap. However, dense retrievers often struggle with domain-specific retrieval and complex query-document relationships, particularly when query segments correspond to various parts of a document. To alleviate such prevalent challenges, this paper introduces $\texttt{MixGR}$, which improves dense retrievers' awareness of query-document matching across various levels of granularity in queries and documents using a zero-shot approach. $\texttt{MixGR}$ fuses various metrics based on these granularities to a united score that reflects a comprehensive query-document similarity. Our experiments demonstrate that $\texttt{MixGR}$ outperforms previous document retrieval by 24.7% and 9.8% on nDCG@5 with unsupervised and supervised retrievers, respectively, averaged on queries containing multiple subqueries from five scientific retrieval datasets. Moreover, the efficacy of two downstream scientific question-answering tasks highlights the advantage of $\texttt{MixGR}$to boost the application of LLMs in the scientific domain. △ Less

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10613 [pdf, other]

Global destabilization of drift-tearing mode with coupling to discretized electron drift-wave instability

Authors: J. Bao, W. L. Zhang, Z. Lin, H. S. Cai, D. J. Liu, H. T. Chen, C. Dong, J. T. Cao, D. Li

Abstract: The global linear behaviors of 2/1 DTM in the collisional regime are investigated based on a concisely resistive drift-MHD model. Besides DTM, extra normal modes including EDW and SAW are coupled together and destabilized in different parameter regimes by considering resistivity in this system. The EVP approach is applied for solving the eigenstate spectra with the distribution of all unstable sol… ▽ More The global linear behaviors of 2/1 DTM in the collisional regime are investigated based on a concisely resistive drift-MHD model. Besides DTM, extra normal modes including EDW and SAW are coupled together and destabilized in different parameter regimes by considering resistivity in this system. The EVP approach is applied for solving the eigenstate spectra with the distribution of all unstable solutions. It is found that in the small EDD frequency (omega_*e) regime, DTM growth rate agrees well with local theory that is reduced with increasing omega_*e. However, when omega_*e exceeds a critical threshold omega_*crit, the strongly linear coupling between DTM and other discretized EDW instabilities happens so that the free energies from current and pressure channels can be released together and thus enhance the DTM, of which growth rate increases with increasing omega_*e and deviates from local theory results qualitatively. Correspondingly, a cross-scale mode structure forms with mixed polarization, namely, phi perturbation is dominated by electrostatic polarized short-wavelength oscillation as EDW instability character, and A_para perturbation remains typical tearing mode solution of Alfvenic polarized macroscopic structure. Within omega_*e > omega_*crit, the additional IDD causes phi oscillating structure to shift towards small density gradient domain, which cancels the extra drive from ion channel and thus DTM growth rate is insensitive to IDD frequency. Compared to EDD effects, the IDD effect alone with zero-omega_*e only leads to the stabilization of RTM that shows agreements between global simulation and local theory, which is no longer the condition for DTM regime. These results are useful for clarifying the DTM global properties with underlying physics mechanisms, which occurs in the regime of omega_*e >> gamma_c that is relevant to nowadays tokamak discharges with hot plasmas. △ Less

Submitted 15 July, 2024; originally announced July 2024.

Comments: 23 pages, 15 figues

arXiv:2407.10275 [pdf, other]

Cross-Lingual Multi-Hop Knowledge Editing -- Benchmarks, Analysis and a Simple Contrastive Learning based Approach

Authors: Aditi Khandelwal, Harman Singh, Hengrui Gu, Tianlong Chen, Kaixiong Zhou

Abstract: Large language models are often expected to constantly adapt to new sources of knowledge and knowledge editing techniques aim to efficiently patch the outdated model knowledge, with minimal modification. Most prior works focus on monolingual knowledge editing in English, even though new information can emerge in any language from any part of the world. We propose the Cross-Lingual Multi-Hop Knowle… ▽ More Large language models are often expected to constantly adapt to new sources of knowledge and knowledge editing techniques aim to efficiently patch the outdated model knowledge, with minimal modification. Most prior works focus on monolingual knowledge editing in English, even though new information can emerge in any language from any part of the world. We propose the Cross-Lingual Multi-Hop Knowledge Editing paradigm, for measuring and analyzing the performance of various SoTA knowledge editing techniques in a cross-lingual setup. Specifically, we create a parallel cross-lingual benchmark, CROLIN-MQUAKE for measuring the knowledge editing capabilities. Our extensive analysis over various knowledge editing techniques uncover significant gaps in performance between the cross-lingual and English-centric setting. Following this, we propose a significantly improved system for cross-lingual multi-hop knowledge editing, CLEVER-CKE. CLEVER-CKE is based on a retrieve, verify and generate knowledge editing framework, where a retriever is formulated to recall edited facts and support an LLM to adhere to knowledge edits. We develop language-aware and hard-negative based contrastive objectives for improving the cross-lingual and fine-grained fact retrieval and verification process used in this framework. Extensive experiments on three LLMs, eight languages, and two datasets show CLEVER-CKE's significant gains of up to 30% over prior methods. △ Less

Submitted 14 July, 2024; originally announced July 2024.

Comments: Paper on Cross-Lingual Multi-Hop Knowledge Editing

arXiv:2407.10267 [pdf, other]

RS-NeRF: Neural Radiance Fields from Rolling Shutter Images

Authors: Muyao Niu, Tong Chen, Yifan Zhan, Zhuoxiao Li, Xiang Ji, Yinqiang Zheng

Abstract: Neural Radiance Fields (NeRFs) have become increasingly popular because of their impressive ability for novel view synthesis. However, their effectiveness is hindered by the Rolling Shutter (RS) effects commonly found in most camera systems. To solve this, we present RS-NeRF, a method designed to synthesize normal images from novel views using input with RS distortions. This involves a physical mo… ▽ More Neural Radiance Fields (NeRFs) have become increasingly popular because of their impressive ability for novel view synthesis. However, their effectiveness is hindered by the Rolling Shutter (RS) effects commonly found in most camera systems. To solve this, we present RS-NeRF, a method designed to synthesize normal images from novel views using input with RS distortions. This involves a physical model that replicates the image formation process under RS conditions and jointly optimizes NeRF parameters and camera extrinsic for each image row. We further address the inherent shortcomings of the basic RS-NeRF model by delving into the RS characteristics and developing algorithms to enhance its functionality. First, we impose a smoothness regularization to better estimate trajectories and improve the synthesis quality, in line with the camera movement prior. We also identify and address a fundamental flaw in the vanilla RS model by introducing a multi-sampling algorithm. This new approach improves the model's performance by comprehensively exploiting the RGB data across different rows for each intermediate camera pose. Through rigorous experimentation, we demonstrate that RS-NeRF surpasses previous methods in both synthetic and real-world scenarios, proving its ability to correct RS-related distortions effectively. Codes and data available: https://github.com/MyNiuuu/RS-NeRF △ Less

Submitted 14 July, 2024; originally announced July 2024.

Comments: ECCV 2024 ; Codes and data: https://github.com/MyNiuuu/RS-NeRF

arXiv:2407.09860 [pdf, other]

Quantum Vicsek Model for Active Matter

Authors: Hong Yuan, L. X. Cui, L. T. Chen, C. P. Sun

Abstract: We propose a quantum analog of the Vicsek model, consisting of an ensemble of overdamped spin$-1/2$ particles with ferromagnetic couplings, driven by a uniformly polarized magnetic field. The spontaneous magnetization of the spin components breaks the $SO(3)$ (or $SO(2)$) symmetry, inducing an ordered phase of flocking. We derive the hydrodynamic equations, similar to those formulated by Toner and… ▽ More We propose a quantum analog of the Vicsek model, consisting of an ensemble of overdamped spin$-1/2$ particles with ferromagnetic couplings, driven by a uniformly polarized magnetic field. The spontaneous magnetization of the spin components breaks the $SO(3)$ (or $SO(2)$) symmetry, inducing an ordered phase of flocking. We derive the hydrodynamic equations, similar to those formulated by Toner and Tu, by applying a mean-field approximation to the quantum analog model up to the next leading order. Our investigation not only establishes a microscopic connection between the Vicsek model and the Toner-Tu hydrodynamics for active matter, but also aims to inspire further studies of active matter in the quantum regime. △ Less

Submitted 13 July, 2024; originally announced July 2024.

arXiv:2407.09694 [pdf, other]

Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images

Authors: Tianyu Luan, Zhongpai Gao, Luyuan Xie, Abhishek Sharma, Hao Ding, Benjamin Planche, Meng Zheng, Ange Lou, Terrence Chen, Junsong Yuan, Ziyan Wu

Abstract: We introduce a novel bottom-up approach for human body mesh reconstruction, specifically designed to address the challenges posed by partial visibility and occlusion in input images. Traditional top-down methods, relying on whole-body parametric models like SMPL, falter when only a small part of the human is visible, as they require visibility of most of the human body for accurate mesh reconstruc… ▽ More We introduce a novel bottom-up approach for human body mesh reconstruction, specifically designed to address the challenges posed by partial visibility and occlusion in input images. Traditional top-down methods, relying on whole-body parametric models like SMPL, falter when only a small part of the human is visible, as they require visibility of most of the human body for accurate mesh reconstruction. To overcome this limitation, our method employs a "Divide and Fuse (D&F)" strategy, reconstructing human body parts independently before fusing them, thereby ensuring robustness against occlusions. We design Human Part Parametric Models (HPPM) that independently reconstruct the mesh from a few shape and global-location parameters, without inter-part dependency. A specially designed fusion module then seamlessly integrates the reconstructed parts, even when only a few are visible. We harness a large volume of ground-truth SMPL data to train our parametric mesh models. To facilitate the training and evaluation of our method, we have established benchmark datasets featuring images of partially visible humans with HPPM annotations. Our experiments, conducted on these benchmark datasets, demonstrate the effectiveness of our D&F method, particularly in scenarios with substantial invisibility, where traditional approaches struggle to maintain reconstruction quality. △ Less

Submitted 12 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV2024

arXiv:2407.09292 [pdf, other]

Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models

Authors: Dong Shu, Mingyu Jin, Tianle Chen, Chong Zhang, Yongfeng Zhang

Abstract: This study sheds light on the imperative need to bolster safety and privacy measures in large language models (LLMs), such as GPT-4 and LLaMA-2, by identifying and mitigating their vulnerabilities through explainable analysis of prompt attacks. We propose Counterfactual Explainable Incremental Prompt Attack (CEIPA), a novel technique where we guide prompts in a specific manner to quantitatively me… ▽ More This study sheds light on the imperative need to bolster safety and privacy measures in large language models (LLMs), such as GPT-4 and LLaMA-2, by identifying and mitigating their vulnerabilities through explainable analysis of prompt attacks. We propose Counterfactual Explainable Incremental Prompt Attack (CEIPA), a novel technique where we guide prompts in a specific manner to quantitatively measure attack effectiveness and explore the embedded defense mechanisms in these models. Our approach is distinctive for its capacity to elucidate the reasons behind the generation of harmful responses by LLMs through an incremental counterfactual methodology. By organizing the prompt modification process into four incremental levels: (word, sentence, character, and a combination of character and word) we facilitate a thorough examination of the susceptibilities inherent to LLMs. The findings from our study not only provide counterfactual explanation insight but also demonstrate that our framework significantly enhances the effectiveness of attack prompts. △ Less

Submitted 17 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

Comments: 23 pages, 6 figures

arXiv:2407.09274 [pdf, other]

Unifying Sequences, Structures, and Descriptions for Any-to-Any Protein Generation with the Large Multimodal Model HelixProtX

Authors: Zhiyuan Chen, Tianhao Chen, Chenggang Xie, Yang Xue, Xiaonan Zhang, Jingbo Zhou, Xiaomin Fang

Abstract: Proteins are fundamental components of biological systems and can be represented through various modalities, including sequences, structures, and textual descriptions. Despite the advances in deep learning and scientific large language models (LLMs) for protein research, current methodologies predominantly focus on limited specialized tasks -- often predicting one protein modality from another. Th… ▽ More Proteins are fundamental components of biological systems and can be represented through various modalities, including sequences, structures, and textual descriptions. Despite the advances in deep learning and scientific large language models (LLMs) for protein research, current methodologies predominantly focus on limited specialized tasks -- often predicting one protein modality from another. These approaches restrict the understanding and generation of multimodal protein data. In contrast, large multimodal models have demonstrated potential capabilities in generating any-to-any content like text, images, and videos, thus enriching user interactions across various domains. Integrating these multimodal model technologies into protein research offers significant promise by potentially transforming how proteins are studied. To this end, we introduce HelixProtX, a system built upon the large multimodal model, aiming to offer a comprehensive solution to protein research by supporting any-to-any protein modality generation. Unlike existing methods, it allows for the transformation of any input protein modality into any desired protein modality. The experimental results affirm the advanced capabilities of HelixProtX, not only in generating functional descriptions from amino acid sequences but also in executing critical tasks such as designing protein sequences and structures from textual descriptions. Preliminary findings indicate that HelixProtX consistently achieves superior accuracy across a range of protein-related tasks, outperforming existing state-of-the-art models. By integrating multimodal large models into protein research, HelixProtX opens new avenues for understanding protein biology, thereby promising to accelerate scientific discovery. △ Less

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.08956 [pdf, other]

DeCE: Deceptive Cross-Entropy Loss Designed for Defending Backdoor Attacks

Authors: Guang Yang, Yu Zhou, Xiang Chen, Xiangyu Zhang, Terry Yue Zhuo, David Lo, Taolue Chen

Abstract: Code Language Models (CLMs), particularly those leveraging deep learning, have achieved significant success in code intelligence domain. However, the issue of security, particularly backdoor attacks, is often overlooked in this process. The previous research has focused on designing backdoor attacks for CLMs, but effective defenses have not been adequately addressed. In particular, existing defens… ▽ More Code Language Models (CLMs), particularly those leveraging deep learning, have achieved significant success in code intelligence domain. However, the issue of security, particularly backdoor attacks, is often overlooked in this process. The previous research has focused on designing backdoor attacks for CLMs, but effective defenses have not been adequately addressed. In particular, existing defense methods from natural language processing, when directly applied to CLMs, are not effective enough and lack generality, working well in some models and scenarios but failing in others, thus fall short in consistently mitigating backdoor attacks. To bridge this gap, we first confirm the phenomenon of ``early learning" as a general occurrence during the training of CLMs. This phenomenon refers to that a model initially focuses on the main features of training data but may become more sensitive to backdoor triggers over time, leading to overfitting and susceptibility to backdoor attacks. We then analyze that overfitting to backdoor triggers results from the use of the cross-entropy loss function, where the unboundedness of cross-entropy leads the model to increasingly concentrate on the features of the poisoned data. Based on this insight, we propose a general and effective loss function DeCE (Deceptive Cross-Entropy) by blending deceptive distributions and applying label smoothing to limit the gradient to be bounded, which prevents the model from overfitting to backdoor triggers and then enhances the security of CLMs against backdoor attacks. To verify the effectiveness of our defense method, we select code synthesis tasks as our experimental scenarios. Our experiments across various code synthesis datasets, models, and poisoning ratios demonstrate the applicability and effectiveness of DeCE in enhancing the security of CLMs. △ Less

Submitted 20 August, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

Comments: Under Review; Waiting for updates

arXiv:2407.08016 [pdf, other]

Source Tracing of Audio Deepfake Systems

Authors: Nicholas Klein, Tianxiang Chen, Hemlata Tak, Ricardo Casal, Elie Khoury

Abstract: Recent progress in generative AI technology has made audio deepfakes remarkably more realistic. While current research on anti-spoofing systems primarily focuses on assessing whether a given audio sample is fake or genuine, there has been limited attention on discerning the specific techniques to create the audio deepfakes. Algorithms commonly used in audio deepfake generation, like text-to-speech… ▽ More Recent progress in generative AI technology has made audio deepfakes remarkably more realistic. While current research on anti-spoofing systems primarily focuses on assessing whether a given audio sample is fake or genuine, there has been limited attention on discerning the specific techniques to create the audio deepfakes. Algorithms commonly used in audio deepfake generation, like text-to-speech (TTS) and voice conversion (VC), undergo distinct stages including input processing, acoustic modeling, and waveform generation. In this work, we introduce a system designed to classify various spoofing attributes, capturing the distinctive features of individual modules throughout the entire generation pipeline. We evaluate our system on two datasets: the ASVspoof 2019 Logical Access and the Multi-Language Audio Anti-Spoofing Dataset (MLAAD). Results from both experiments demonstrate the robustness of the system to identify the different spoofing attributes of deepfake generation systems. △ Less

Submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted by INTERSPEECH 2024

arXiv:2407.07884 [pdf, other]

Vegetable Peeling: A Case Study in Constrained Dexterous Manipulation

Authors: Tao Chen, Eric Cousineau, Naveen Kuppuswamy, Pulkit Agrawal

Abstract: Recent studies have made significant progress in addressing dexterous manipulation problems, particularly in in-hand object reorientation. However, there are few existing works that explore the potential utilization of developed dexterous manipulation controllers for downstream tasks. In this study, we focus on constrained dexterous manipulation for food peeling. Food peeling presents various cons… ▽ More Recent studies have made significant progress in addressing dexterous manipulation problems, particularly in in-hand object reorientation. However, there are few existing works that explore the potential utilization of developed dexterous manipulation controllers for downstream tasks. In this study, we focus on constrained dexterous manipulation for food peeling. Food peeling presents various constraints on the reorientation controller, such as the requirement for the hand to securely hold the object after reorientation for peeling. We propose a simple system for learning a reorientation controller that facilitates the subsequent peeling task. Videos are available at: https://taochenshh.github.io/projects/veg-peeling. △ Less

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07651 [pdf, other]

Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$

Authors: M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (645 additional authors not shown)

Abstract: The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be… ▽ More The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15σ$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes. △ Less

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07155 [pdf, other]

Across the soft gamma-ray regime: utilizing simultaneous detections in the Compton Spectrometer and Imager (COSI) and the Background and Transient Observer (BTO) to understand astrophysical transients

Authors: Hannah C. Gulick, Eliza Neights, Samer Al Nussirat, Claire Tianyi Chen, Kaylie Ching, Cassandra Dove, Alyson Joens, Carolyn Kierans, Hubert Liu, Israel Martinez, Tomas Mician, Shunsaku Nagasawa, Shreya Nandyala, Isabel Schmidtke, Derek Shah, Andreas Zoglauer, Kazuhiro Nakasawa, Tadayuki Takahashi, Juan-Carlos Martinez Oliveros, John A. Tomsick

Abstract: The Compton Spectrometer and Imager (COSI) is a NASA funded Small Explorer (SMEX) mission slated to launch in 2027. COSI will house a wide-field gamma-ray telescope designed to survey the entire sky in the 0.2--5 MeV range. Using germanium detectors, the instrument will provide imaging, spectroscopy, and polarimetry of astrophysical sources with excellent energy resolution and degree-scale localiz… ▽ More The Compton Spectrometer and Imager (COSI) is a NASA funded Small Explorer (SMEX) mission slated to launch in 2027. COSI will house a wide-field gamma-ray telescope designed to survey the entire sky in the 0.2--5 MeV range. Using germanium detectors, the instrument will provide imaging, spectroscopy, and polarimetry of astrophysical sources with excellent energy resolution and degree-scale localization capabilities. In addition to the main instrument, COSI will fly with a student collaboration project known as the Background and Transient Observer (BTO). BTO will extend the COSI bandpass to energies lower than 200 keV, thus enabling spectral analysis across the shared band of 30 keV--2 MeV range. The BTO instrument will consist of two NaI scintillators and student-designed readout electronics. Using spectral information from both the COSI and BTO instruments, physics such as the energy peak turnover in gamma-ray bursts, the characteristics of magnetar flares, and the event frequency of a range of transient phenomena will be constrained. In this paper, we present the expected science returnables from BTO and comment on the shared returnables from the COSI and BTO missions. We include simulations of gamma-ray bursts, magnetar giant flares, and terrestrial gamma-ray flashes using BTO's spectral response. Additionally, we estimate BTO's gamma-ray burst detection rate and find that BTO will detect ~150 gamma-ray bursts per year, with most of these events being long bursts. △ Less

Submitted 28 August, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: 12 pages of text with an additional 2 pages for acknowledgments and citations. 9 figures. 1 table

Journal ref: SPIE, 2024

arXiv:2407.07087 [pdf, other]

CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation

Authors: Tong Chen, Akari Asai, Niloofar Mireshghallah, Sewon Min, James Grimmelmann, Yejin Choi, Hannaneh Hajishirzi, Luke Zettlemoyer, Pang Wei Koh

Abstract: Evaluating the degree of reproduction of copyright-protected content by language models (LMs) is of significant interest to the AI and legal communities. Although both literal and non-literal similarities are considered by courts when assessing the degree of reproduction, prior research has focused only on literal similarities. To bridge this gap, we introduce CopyBench, a benchmark designed to me… ▽ More Evaluating the degree of reproduction of copyright-protected content by language models (LMs) is of significant interest to the AI and legal communities. Although both literal and non-literal similarities are considered by courts when assessing the degree of reproduction, prior research has focused only on literal similarities. To bridge this gap, we introduce CopyBench, a benchmark designed to measure both literal and non-literal copying in LM generations. Using copyrighted fiction books as text sources, we provide automatic evaluation protocols to assess literal and non-literal copying, balanced against the model utility in terms of the ability to recall facts from the copyrighted works and generate fluent completions. We find that, although literal copying is relatively rare, two types of non-literal copying -- event copying and character copying -- occur even in models as small as 7B parameters. Larger models demonstrate significantly more copying, with literal copying rates increasing from 0.2% to 10.5% and non-literal copying from 2.3% to 6.9% when comparing Llama3-8B and 70B models, respectively. We further evaluate the effectiveness of current strategies for mitigating copying and show that (1) training-time alignment can reduce literal copying but may increase non-literal copying, and (2) current inference-time mitigation methods primarily reduce literal but not non-literal copying. △ Less

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06844 [pdf, other]

Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration

Authors: Tianshui Chen, Weihang Wang, Tao Pu, Jinghui Qin, Zhijing Yang, Jie Liu, Liang Lin

Abstract: Modern visual recognition models often display overconfidence due to their reliance on complex deep neural networks and one-hot target supervision, resulting in unreliable confidence scores that necessitate calibration. While current confidence calibration techniques primarily address single-label scenarios, there is a lack of focus on more practical and generalizable multi-label contexts. This pa… ▽ More Modern visual recognition models often display overconfidence due to their reliance on complex deep neural networks and one-hot target supervision, resulting in unreliable confidence scores that necessitate calibration. While current confidence calibration techniques primarily address single-label scenarios, there is a lack of focus on more practical and generalizable multi-label contexts. This paper introduces the Multi-Label Confidence Calibration (MLCC) task, aiming to provide well-calibrated confidence scores in multi-label scenarios. Unlike single-label images, multi-label images contain multiple objects, leading to semantic confusion and further unreliability in confidence scores. Existing single-label calibration methods, based on label smoothing, fail to account for category correlations, which are crucial for addressing semantic confusion, thereby yielding sub-optimal performance. To overcome these limitations, we propose the Dynamic Correlation Learning and Regularization (DCLR) algorithm, which leverages multi-grained semantic correlations to better model semantic confusion for adaptive regularization. DCLR learns dynamic instance-level and prototype-level similarities specific to each category, using these to measure semantic correlations across different categories. With this understanding, we construct adaptive label vectors that assign higher values to categories with strong correlations, thereby facilitating more effective regularization. We establish an evaluation benchmark, re-implementing several advanced confidence calibration algorithms and applying them to leading multi-label recognition (MLR) models for fair comparison. Through extensive experiments, we demonstrate the superior performance of DCLR over existing methods in providing reliable confidence scores in multi-label scenarios. △ Less

Submitted 9 July, 2024; originally announced July 2024.

Comments: submitted to TIP

arXiv:2407.06503 [pdf, other]

Preference-Guided Reinforcement Learning for Efficient Exploration

Authors: Guojian Wang, Faguo Wu, Xiao Zhang, Tianyuan Chen, Xuyang Chen, Lin Zhao

Abstract: In this paper, we investigate preference-based reinforcement learning (PbRL) that allows reinforcement learning (RL) agents to learn from human feedback. This is particularly valuable when defining a fine-grain reward function is not feasible. However, this approach is inefficient and impractical for promoting deep exploration in hard-exploration tasks with long horizons and sparse rewards. To tac… ▽ More In this paper, we investigate preference-based reinforcement learning (PbRL) that allows reinforcement learning (RL) agents to learn from human feedback. This is particularly valuable when defining a fine-grain reward function is not feasible. However, this approach is inefficient and impractical for promoting deep exploration in hard-exploration tasks with long horizons and sparse rewards. To tackle this issue, we introduce LOPE: Learning Online with trajectory Preference guidancE, an end-to-end preference-guided RL framework that enhances exploration efficiency in hard-exploration tasks. Our intuition is that LOPE directly adjusts the focus of online exploration by considering human feedback as guidance, avoiding learning a separate reward model from preferences. Specifically, LOPE includes a two-step sequential policy optimization process consisting of trust-region-based policy improvement and preference guidance steps. We reformulate preference guidance as a novel trajectory-wise state marginal matching problem that minimizes the maximum mean discrepancy distance between the preferred trajectories and the learned policy. Furthermore, we provide a theoretical analysis to characterize the performance improvement bound and evaluate the LOPE's effectiveness. When assessed in various challenging hard-exploration environments, LOPE outperforms several state-of-the-art methods regarding convergence rate and overall performance. The code used in this study is available at \url{https://github.com/buaawgj/LOPE}. △ Less

Submitted 8 July, 2024; originally announced July 2024.

Comments: 13 pages, 17 figures

arXiv:2407.06483 [pdf, other]

Composable Interventions for Language Models

Authors: Arinbjorn Kolbeinsson, Kyle O'Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, Jonathan Richard Schwarz, Anurag Vaidya, Faisal Mahmood, Marinka Zitnik, Tianlong Chen, Thomas Hartvigsen

Abstract: Test-time interventions for language models can enhance factual accuracy, mitigate harmful outputs, and improve model efficiency without costly retraining. But despite a flood of new methods, different types of interventions are largely developing independently. In practice, multiple interventions must be applied sequentially to the same model, yet we lack standardized ways to study how interventi… ▽ More Test-time interventions for language models can enhance factual accuracy, mitigate harmful outputs, and improve model efficiency without costly retraining. But despite a flood of new methods, different types of interventions are largely developing independently. In practice, multiple interventions must be applied sequentially to the same model, yet we lack standardized ways to study how interventions interact. We fill this gap by introducing composable interventions, a framework to study the effects of using multiple interventions on the same language models, featuring new metrics and a unified codebase. Using our framework, we conduct extensive experiments and compose popular methods from three emerging intervention categories -- Knowledge Editing, Model Compression, and Machine Unlearning. Our results from 310 different compositions uncover meaningful interactions: compression hinders editing and unlearning, composing interventions hinges on their order of application, and popular general-purpose metrics are inadequate for assessing composability. Taken together, our findings showcase clear gaps in composability, suggesting a need for new multi-objective interventions. All of our code is public: https://github.com/hartvigsen-group/composable-interventions. △ Less

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.06304 [pdf, other]

VIMI: Grounding Video Generation through Multi-modal Instruction

Authors: Yuwei Fang, Willi Menapace, Aliaksandr Siarohin, Tsai-Shien Chen, Kuan-Chien Wang, Ivan Skorokhodov, Graham Neubig, Sergey Tulyakov

Abstract: Existing text-to-video diffusion models rely solely on text-only encoders for their pretraining. This limitation stems from the absence of large-scale multimodal prompt video datasets, resulting in a lack of visual grounding and restricting their versatility and application in multimodal integration. To address this, we construct a large-scale multimodal prompt dataset by employing retrieval metho… ▽ More Existing text-to-video diffusion models rely solely on text-only encoders for their pretraining. This limitation stems from the absence of large-scale multimodal prompt video datasets, resulting in a lack of visual grounding and restricting their versatility and application in multimodal integration. To address this, we construct a large-scale multimodal prompt dataset by employing retrieval methods to pair in-context examples with the given text prompts and then utilize a two-stage training strategy to enable diverse video generation tasks within the same model. In the first stage, we propose a multimodal conditional video generation framework for pretraining on these augmented datasets, establishing a foundational model for grounded video generation. Secondly, we finetune the model from the first stage on three video generation tasks, incorporating multi-modal instructions. This process further refines the model's ability to handle diverse inputs and tasks, ensuring seamless integration of multi-modal information. After this two-stage train-ing process, VIMI demonstrates multimodal understanding capabilities, producing contextually rich and personalized videos grounded in the provided inputs, as shown in Figure 1. Compared to previous visual grounded video generation methods, VIMI can synthesize consistent and temporally coherent videos with large motion while retaining the semantic control. Lastly, VIMI also achieves state-of-the-art text-to-video generation results on UCF101 benchmark. △ Less

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.06079 [pdf, other]

Layered Diffusion Model for One-Shot High Resolution Text-to-Image Synthesis

Authors: Emaad Khwaja, Abdullah Rashwan, Ting Chen, Oliver Wang, Suraj Kothawade, Yeqing Li

Abstract: We present a one-shot text-to-image diffusion model that can generate high-resolution images from natural language descriptions. Our model employs a layered U-Net architecture that simultaneously synthesizes images at multiple resolution scales. We show that this method outperforms the baseline of synthesizing images only at the target resolution, while reducing the computational cost per step. We… ▽ More We present a one-shot text-to-image diffusion model that can generate high-resolution images from natural language descriptions. Our model employs a layered U-Net architecture that simultaneously synthesizes images at multiple resolution scales. We show that this method outperforms the baseline of synthesizing images only at the target resolution, while reducing the computational cost per step. We demonstrate that higher resolution synthesis can be achieved by layering convolutions at additional resolution scales, in contrast to other methods which require additional models for super-resolution synthesis. △ Less

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05160 [pdf, other]

Relativistic Exact Two-Component Coupled-Cluster Study of Molecular Sensitivity Factors for Nuclear Schiff Moments

Authors: Tianxiang Chen, Chaoqun Zhang, Lan Cheng, Kia Boon Ng, Stephan Malbrunot-Ettenauer, Victor V. Flambaum, Zack Lasner, John M. Doyle, Phelan Yu, Chandler J. Conn, Chi Zhang, Nicholas R. Hutzler, Andrew M. Jayich, Benjamin Augenbraun, David Demille

Abstract: Relativistic exact two-component coupled-cluster calculations of molecular sensitivity factors for nuclear Schiff moments (NSMs) are reported. We focus on molecules containing heavy nuclei, especially octupole-deformed nuclei. Analytic relativistic coupled-cluster gradient techniques are used and serve as useful tools for identifying candidate molecules that sensitively probe for physics beyond th… ▽ More Relativistic exact two-component coupled-cluster calculations of molecular sensitivity factors for nuclear Schiff moments (NSMs) are reported. We focus on molecules containing heavy nuclei, especially octupole-deformed nuclei. Analytic relativistic coupled-cluster gradient techniques are used and serve as useful tools for identifying candidate molecules that sensitively probe for physics beyond the Standard Model in the hadronic sector. Notably, these tools enable straightforward ``black-box'' calculations. Two competing chemical mechanisms that contribute to the NSM are analyzed, illuminating the physics of ligand effects on NSM sensitivity factors. △ Less

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.04686 [pdf, other]

Near-optimal hierarchical matrix approximation from matrix-vector products

Authors: Tyler Chen, Feyza Duman Keles, Diana Halikias, Cameron Musco, Christopher Musco, David Persson

Abstract: We describe a randomized algorithm for producing a near-optimal hierarchical off-diagonal low-rank (HODLR) approximation to an $n\times n$ matrix $\mathbf{A}$, accessible only though matrix-vector products with $\mathbf{A}$ and $\mathbf{A}^{\mathsf{T}}$. We prove that, for the rank-$k$ HODLR approximation problem, our method achieves a $(1+β)^{\log(n)}$-optimal approximation in expected Frobenius… ▽ More We describe a randomized algorithm for producing a near-optimal hierarchical off-diagonal low-rank (HODLR) approximation to an $n\times n$ matrix $\mathbf{A}$, accessible only though matrix-vector products with $\mathbf{A}$ and $\mathbf{A}^{\mathsf{T}}$. We prove that, for the rank-$k$ HODLR approximation problem, our method achieves a $(1+β)^{\log(n)}$-optimal approximation in expected Frobenius norm using $O(k\log(n)/β^3)$ matrix-vector products. In particular, the algorithm obtains a $(1+\varepsilon)$-optimal approximation with $O(k\log^4(n)/\varepsilon^3)$ matrix-vector products, and for any constant $c$, an $n^c$-optimal approximation with $O(k \log(n))$ matrix-vector products. Apart from matrix-vector products, the additional computational cost of our method is just $O(n \operatorname{poly}(\log(n), k, β))$. We complement the upper bound with a lower bound, which shows that any matrix-vector query algorithm requires at least $Ω(k\log(n) + k/\varepsilon)$ queries to obtain a $(1+\varepsilon)$-optimal approximation. Our algorithm can be viewed as a robust version of widely used "peeling" methods for recovering HODLR matrices and is, to the best of our knowledge, the first matrix-vector query algorithm to enjoy theoretical worst-case guarantees for approximation by any hierarchical matrix class. To control the propagation of error between levels of hierarchical approximation, we introduce a new perturbation bound for low-rank approximation, which shows that the widely used Generalized Nyström method enjoys inherent stability when implemented with noisy matrix-vector products. We also introduced a novel randomly perforated matrix sketching method to further control the error in the peeling algorithm. △ Less

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.04338 [pdf, other]

Entanglement distribution based on quantum walk in arbitrary quantum networks

Authors: Tianen Chen, Yun Shang, Chitong Chen, Heng Fan

Abstract: In large-scale quantum networks, distributing the multi-particle entangled state among selected nodes is crucial for realizing long-distance and complicated quantum communication. Quantum repeaters provides an efficient method to generate entanglement between distant nodes. However, it is difficult to extend quantum repeater protocols to high-dimensional quantum states in existing experiments. Her… ▽ More In large-scale quantum networks, distributing the multi-particle entangled state among selected nodes is crucial for realizing long-distance and complicated quantum communication. Quantum repeaters provides an efficient method to generate entanglement between distant nodes. However, it is difficult to extend quantum repeater protocols to high-dimensional quantum states in existing experiments. Here we develop a series of scheme for generating high-dimensional entangled states via quantum walks with multiple coins or single coin by quantum repeaters, including $d$-dimensional Bell states, multi-particle high dimensional GHZ states etc.. Furthermore, we give entanglement distribution schemes on arbitrary quantum networks according to the above theoretical framework. As applications, we construct quantum fractal networks and multiparty quantum secret sharing protocols based on $d$-dimensional GHZ states. In the end, we give the experiment implementing of various 2-party or 3-party entanglement generation schemes based on repeaters. Our work can serve as a building block for constructing larger and more complex quantum networks. △ Less

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.04055 [pdf, other]

Benchmark on Drug Target Interaction Modeling from a Structure Perspective

Authors: Xinnan Zhang, Jialin Wu, Junyi Xie, Tianlong Chen, Kaixiong Zhou

Abstract: The prediction modeling of drug-target interactions is crucial to drug discovery and design, which has seen rapid advancements owing to deep learning technologies. Recently developed methods, such as those based on graph neural networks (GNNs) and Transformers, demonstrate exceptional performance across various datasets by effectively extracting structural information. However, the benchmarking of… ▽ More The prediction modeling of drug-target interactions is crucial to drug discovery and design, which has seen rapid advancements owing to deep learning technologies. Recently developed methods, such as those based on graph neural networks (GNNs) and Transformers, demonstrate exceptional performance across various datasets by effectively extracting structural information. However, the benchmarking of these novel methods often varies significantly in terms of hyperparameter settings and datasets, which limits algorithmic progress. In view of these, we conduct a comprehensive survey and benchmark for drug-target interaction modeling from a structure perspective, via integrating tens of explicit (i.e., GNN-based) and implicit (i.e., Transformer-based) structure learning algorithms. To this end, we first unify the hyperparameter setting within each class of structure learning methods. Moreover, we conduct a macroscopical comparison between these two classes of encoding strategies as well as the different featurization techniques that inform molecules' chemical and physical properties. We then carry out the microscopical comparison between all the integrated models across the six datasets, via comprehensively benchmarking their effectiveness and efficiency. Remarkably, the summarized insights from the benchmark studies lead to the design of model combos. We demonstrate that our combos can achieve new state-of-the-art performance on various datasets associated with cost-effective memory and computation. Our code is available at \hyperlink{https://github.com/justinwjl/GTB-DTI/tree/main}{https://github.com/justinwjl/GTB-DTI/tree/main}. △ Less

Submitted 4 July, 2024; originally announced July 2024.

Comments: Submitted to NIPS 2024 Dataset and Benchmark

Showing 51–100 of 3,069 results for author: Chen, T