Search | arXiv e-print repository

Active Open-Vocabulary Recognition: Let Intelligent Moving Mitigate CLIP Limitations

Authors: Lei Fan, Jianxiong Zhou, Xiaoying Xing, Ying Wu

Abstract: Active recognition, which allows intelligent agents to explore observations for better recognition performance, serves as a prerequisite for various embodied AI tasks, such as grasping, navigation and room arrangements. Given the evolving environment and the multitude of object classes, it is impractical to include all possible classes during the training stage. In this paper, we aim at advancing active open-vocabulary recognition, empowering embodied agents to actively perceive and classify arbitrary objects. However, directly adopting recent open-vocabulary classification models, like Contrastive Language Image Pretraining (CLIP), poses unique challenges. Specifically, we observe that CLIP's performance is heavily affected by the viewpoint and occlusions, compromising its reliability in unconstrained embodied perception scenarios. Further, the sequential nature of observations in agent-environment interactions necessitates an effective method for integrating features that maintains discriminative strength for open-vocabulary classification. To address these issues, we introduce a novel agent for active open-vocabulary recognition. The proposed method leverages inter-frame and inter-concept similarities to navigate agent movements and to fuse features, without relying on class-specific knowledge. Compared to the baseline CLIP model with 29.6% accuracy on the ShapeNet dataset, the proposed agent achieves 53.3% accuracy for open-vocabulary recognition, without any fine-tuning of the equipped CLIP model. Additional experiments conducted with the Habitat simulator further affirm the efficacy of our method.

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.13534 [pdf, other]

LM-Cocktail: Resilient Tuning of Language Models via Model Merging

Authors: Shitao Xiao, Zheng Liu, Peitian Zhang, Xingrun Xing

Abstract: Pre-trained language models are continually fine-tuned to better support downstream applications. However, this operation may result in significant performance degeneration on general tasks beyond the targeted domain. To overcome this problem, we propose LM-Cocktail, which enables the fine-tuned model to stay resilient in general perspectives. Our method is conducted in the form of model merging, where the fine-tuned language model is merged with the pre-trained base model or the peer models from other domains through weighted averaging. Despite its simplicity, LM-Cocktail is surprisingly effective: the resulting model is able to achieve strong empirical performance across the whole scope of general tasks while preserving a superior capacity in its targeted domain. We conduct comprehensive experiments with Llama and BGE models on popular benchmarks, including FLAN, MMLU, and MTEB; the results validate the efficacy of our proposed method. The code and checkpoints are available at https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail.

Submitted 8 December, 2023; v1 submitted 22 November, 2023; originally announced November 2023.

Comments: Work is in progress
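
The weighted-average merging described in the LM-Cocktail abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `merge_weights`, the use of plain Python lists as stand-ins for parameter tensors, and the 50/50 weighting are assumptions made only for illustration.

```python
def merge_weights(models, weights):
    """Merge several models by a weighted average of their parameters.

    models  -- list of dicts mapping parameter name -> list of floats
               (hypothetical stand-ins for real parameter tensors)
    weights -- one non-negative weight per model; normalised to sum to 1
    """
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    merged = {}
    for name in models[0]:
        length = len(models[0][name])
        # Average each scalar parameter across models, element by element.
        merged[name] = [
            sum((w / total) * m[name][i] for m, w in zip(models, weights))
            for i in range(length)
        ]
    return merged


# Toy example: a fine-tuned model merged with its base model, equal weights.
base = {"layer.weight": [0.0, 2.0]}
tuned = {"layer.weight": [1.0, 4.0]}
merged = merge_weights([tuned, base], [0.5, 0.5])
print(merged)
```

With real checkpoints the same element-wise average would be taken over tensors of matching shape; how the merging weights are chosen is left open here.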

arXiv:2311.11538 [pdf, other]

Assessing Prompt Injection Risks in 200+ Custom GPTs

Authors: Jiahao Yu, Yuhang Wu, Dong Shu, Mingyu Jin, Sabrina Yang, Xinyu Xing

Abstract: In the rapidly evolving landscape of artificial intelligence, ChatGPT has been widely used in various applications. A new feature, the customization of ChatGPT models by users to cater to specific needs, has opened new frontiers in AI utility. However, this study reveals a significant security vulnerability inherent in these user-customized GPTs: prompt injection attacks. Through comprehensive testing of over 200 user-designed GPT models via adversarial prompts, we demonstrate that these systems are susceptible to prompt injections. Through prompt injection, an adversary can not only extract the customized system prompts but also access the uploaded files. This paper provides a first-hand analysis of the prompt injection, alongside the evaluation of the possible mitigation of such attacks. Our findings underscore the urgent need for robust security frameworks in the design and deployment of customizable GPT models. The intent of this paper is to raise awareness and prompt action in the AI community, ensuring that the benefits of GPT customization do not come at the cost of compromised security and privacy.

Submitted 25 May, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

Comments: Accepted in ICLR 2024 Workshop on Secure and Trustworthy Large Language Models

arXiv:2311.06258 [pdf, other]

Post-COVID Highlights: Challenges and Solutions of AI Techniques for Swift Identification of COVID-19

Authors: Yingying Fang, Xiaodan Xing, Shiyi Wang, Simon Walsh, Guang Yang

Abstract: Since the onset of the COVID-19 pandemic in 2019, there has been a concerted effort to develop cost-effective, non-invasive, and rapid AI-based tools. These tools were intended to alleviate the burden on healthcare systems, control the rapid spread of the virus, and enhance intervention outcomes, all in response to this unprecedented global crisis. As we transition into a post-COVID era, we retrospectively evaluate these proposed studies and offer a review of the techniques employed in AI diagnostic models, with a focus on the solutions proposed for different challenges. This review endeavors to provide insights into the diverse solutions designed to address the multifaceted challenges that arose during the pandemic. By doing so, we aim to prepare the AI community for the development of AI tools tailored to address public health emergencies effectively.

Submitted 24 November, 2023; v1 submitted 24 September, 2023; originally announced November 2023.

arXiv:2311.01066 [pdf, other]

Dynamic Multimodal Information Bottleneck for Multimodality Classification

Authors: Yingying Fang, Shuang Wu, Sheng Zhang, Chaoyan Huang, Tieyong Zeng, Xiaodan Xing, Simon Walsh, Guang Yang

Abstract: Effectively leveraging multimodal data such as various images, laboratory tests and clinical information is gaining traction in a variety of AI-based medical diagnosis and prognosis tasks. Most existing multi-modal techniques only focus on enhancing their performance by leveraging the differences or shared features from various modalities and fusing features across different modalities. These approaches are generally not optimal for clinical settings, which pose the additional challenges of limited training data, as well as being rife with redundant data or noisy modality channels, leading to subpar performance. To address this gap, we study the robustness of existing methods to data redundancy and noise and propose a generalized dynamic multimodal information bottleneck framework for attaining a robust fused feature representation. Specifically, our information bottleneck module serves to filter out the task-irrelevant information and noise in the fused feature, and we further introduce a sufficiency loss to prevent dropping of task-relevant information, thus explicitly preserving the sufficiency of prediction information in the distilled feature. We validate our model on an in-house and a public COVID19 dataset for mortality prediction as well as two public biomedical datasets for diagnostic tasks. Extensive experiments show that our method surpasses the state-of-the-art and is significantly more robust, being the only method to maintain performance when large-scale noisy channels exist. Our code is publicly available at https://github.com/ayanglab/DMIB.

Submitted 25 November, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

Comments: WACV 2024

arXiv:2311.00273 [pdf, other]

SoulChat: Improving LLMs' Empathy, Listening, and Comfort Abilities through Fine-tuning with Multi-turn Empathy Conversations

Authors: Yirong Chen, Xiaofen Xing, Jingkai Lin, Huimin Zheng, Zhenyu Wang, Qi Liu, Xiangmin Xu

Abstract: Large language models (LLMs) have been widely applied in various fields due to their excellent capability for memorizing knowledge and chain of thought (CoT). When these language models are applied in the field of psychological counseling, they often rush to provide universal advice. However, when users seek psychological support, they need to gain empathy, trust, understanding and comfort, rather than just reasonable advice. To this end, we constructed a multi-turn empathetic conversation dataset of more than 2 million samples, in which the input is the multi-turn conversation context, and the target is empathetic responses that cover expressions such as questioning, comfort, recognition, listening, trust, emotional support, etc. Experiments have shown that the empathy ability of LLMs can be significantly enhanced when fine-tuning with multi-turn dialogue history and responses that are closer to the expression of a psychological consultant.

Submitted 31 October, 2023; originally announced November 2023.

Comments: Accepted to Findings of EMNLP 2023

arXiv:2310.15985 [pdf, other]

Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning

Authors: Xin Xing, Zhexiao Xiong, Abby Stylianou, Srikumar Sastry, Liyu Gong, Nathan Jacobs

Abstract: This paper presents a novel approach to Single-Positive Multi-label Learning. In general multi-label learning, a model learns to predict multiple labels or categories for a single input image. This is in contrast with standard multi-class image classification, where the task is predicting a single label from many possible labels for an image. Single-Positive Multi-label Learning (SPML) specifically considers learning to predict multiple labels when there is only a single annotation per image in the training data. Multi-label learning is in many ways a more realistic task than single-label learning as real-world data often involves instances belonging to multiple categories simultaneously; however, most common computer vision datasets predominantly contain single labels due to the inherent complexity and cost of collecting multiple high quality annotations for each instance. We propose a novel approach called Vision-Language Pseudo-Labeling (VLPL), which uses a vision-language model to suggest strong positive and negative pseudo-labels, and outperforms the current SOTA methods by 5.5% on Pascal VOC, 18.4% on MS-COCO, 15.2% on NUS-WIDE, and 8.4% on CUB-Birds. Our code and data are available at https://github.com/mvrl/VLPL.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.15896 [pdf, other]

BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT

Authors: Yirong Chen, Zhenyu Wang, Xiaofen Xing, Huimin Zheng, Zhipei Xu, Kai Fang, Junhong Wang, Sihang Li, Jieling Wu, Qi Liu, Xiangmin Xu

Abstract: Large language models (LLMs) have performed well in providing general and extensive health suggestions in single-turn conversations, exemplified by systems such as ChatGPT, ChatGLM, ChatDoctor, DoctorGLM, etc. However, the limited information provided by users during a single turn results in inadequate personalization and targeting of the generated suggestions, which requires users to independently select the useful part. This is mainly caused by the missing ability to engage in multi-turn questioning. In real-world medical consultations, doctors usually employ a series of iterative inquiries to comprehend the patient's condition thoroughly, enabling them to provide effective and personalized suggestions subsequently, which can be defined as chain of questioning (CoQ) for LLMs. To improve the CoQ of LLMs, we propose BianQue, a ChatGLM-based LLM finetuned with the self-constructed health conversation dataset BianQueCorpus, which consists of multiple turns of questioning and health suggestions polished by ChatGPT. Experimental results demonstrate that the proposed BianQue can simultaneously balance the capabilities of both questioning and health suggestions, which will help promote the research and application of LLMs in the field of proactive health.

Submitted 4 December, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.11295 [pdf, other]

CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation

Authors: Zhaojie Chu, Kailing Guo, Xiaofen Xing, Yilin Lan, Bolun Cai, Xiangmin Xu

Abstract: Speech-driven 3D facial animation is a challenging cross-modal task that has attracted growing research interest. During speaking activities, the mouth displays strong motions, while the other facial regions typically demonstrate comparatively weak activity levels. Existing approaches often simplify the process by directly mapping single-level speech features to the entire facial animation, which overlooks the differences in facial activity intensity and leads to overly smoothed facial movements. In this study, we propose a novel framework, CorrTalk, which effectively establishes the temporal correlation between hierarchical speech features and facial activities of different intensities across distinct regions. A novel facial activity intensity metric is defined to distinguish between strong and weak facial activity, obtained by computing the short-time Fourier transform of facial vertex displacements. Based on the variances in facial activity, we propose a dual-branch decoding framework to synchronously synthesize strong and weak facial activity, which guarantees facial animation synthesis over a wider range of intensities. Furthermore, a weighted hierarchical feature encoder is proposed to establish temporal correlation between hierarchical speech features and facial activity at different intensities, which ensures lip-sync and plausible facial expressions. Extensive qualitative and quantitative experiments as well as a user study indicate that our CorrTalk outperforms existing state-of-the-art methods. The source code and supplementary video are publicly available at: https://zjchu.github.io/projects/CorrTalk/

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.09758 [pdf]

Genome hybridization: A universal way for the origin and diversification of organelles as well as the origin and speciation of eukaryotes

Authors: Qing-lin Dong, Xiang-ying Xing

Abstract: The origin of organelles (mitochondrion, chloroplast and nucleus) remains enigmatic. The endosymbiotic hypothesis that chloroplasts, mitochondria and nuclei descend from the endosymbiotic cyanobacterium, bacterium and archaebacterium respectively is dominant yet uncompelling, while our discovery of de novo organelle biogenesis in the cyanobacterium TDX16 that had acquired the genome of its green algal host Haematococcus pluvialis overturns this hypothesis. In light of organelle biogenesis in the cyanobacterium TDX16 in combination with the relevant cellular and molecular evidence, we propose the genome hybridization hypothesis (GHH) that the origin of organelles and origin of eukaryotes as well as the diversification of organelles and speciation of eukaryotes are unified and achieved by genome hybridization: the endosymbiotic cyanobacteria or bacteria obtain genomes of their archaebacterial or eukaryotic hosts and hybridize them with their own, resulting in expanded genomes containing a mixture of hybrid prokaryotic genes and eukaryotic genes, and thus the cyanobacteria or bacteria have to compartmentalize to accommodate different genes for the specialized functions of photosynthesis (chloroplast), respiration (mitochondrion) and DNA preservation (nucleus), and consequently turn into photosynthetic or heterotrophic eukaryotes. Accordingly, eukaryotes and their organelles are of multiple origins, while the formation of cancer cells is the speciation of eukaryotes, as cancer cells are new species of unicellular eukaryotes arising from bacteria. Therefore, GHH provides a theoretical framework unifying evolutionary biology, cancer biology and cell biology and directing integrated multidisciplinary research.

Submitted 7 May, 2024; v1 submitted 15 October, 2023; originally announced October 2023.

Comments: 22 pages with two tables; added references for section 2; revised testable predictions for Section 5

arXiv:2310.05171 [pdf, other]

Multi-Ship Tracking by Robust Similarity metric

Authors: Hongyu Zhao, Gongming Wei, Yang Xiao, Xianglei Xing

Abstract: Multi-ship tracking (MST), as a core technology, has proven applicable to situational awareness at sea and to the development of navigational systems for autonomous ships. Despite impressive tracking outcomes achieved by multi-object tracking (MOT) algorithms on pedestrian and vehicle datasets, these models and techniques exhibit poor performance when applied to ship datasets. Intersection over Union (IoU) is the most popular metric for computing similarity in object tracking. The low frame rates and severe image shake caused by wave turbulence in ship datasets often result in minimal, or even zero, IoU between the predicted and detected bounding boxes. This issue contributes to frequent identity switches of tracked objects, undermining the tracking performance. In this paper, we address the weaknesses of IoU by incorporating the smallest convex shapes that enclose both the predicted and detected bounding boxes. The calculation of the tracking version of IoU (TIoU) metric considers not only the size of the overlapping area between the detection bounding box and the prediction box, but also the similarity of their shapes. Through the integration of the TIoU into state-of-the-art object tracking frameworks, such as DeepSort and ByteTrack, we consistently achieve improvements in the tracking performance of these frameworks.

Submitted 8 October, 2023; originally announced October 2023.
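
The multi-ship tracking abstract above notes that plain IoU collapses to zero for non-overlapping boxes and that the smallest enclosing shape can be used to compensate. The abstract does not give the TIoU formula, so the sketch below does not implement TIoU; it swaps in the well-known Generalized IoU (GIoU), which uses the smallest enclosing axis-aligned box, purely to illustrate the idea. Boxes are assumed to be `(x1, y1, x2, y2)` tuples with positive area, and the function name is hypothetical.

```python
def iou_and_giou(box_a, box_b):
    """Return (IoU, GIoU) for two axis-aligned boxes given as (x1, y1, x2, y2).

    GIoU = IoU - (enclosing_area - union_area) / enclosing_area.
    Assumes both boxes have positive area. Illustration only; this is
    not the TIoU metric from the abstract.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Overlap rectangle; width/height are clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Smallest axis-aligned box that encloses both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    iou = inter / union
    giou = iou - (enclose - union) / enclose
    return iou, giou


predicted = (0.0, 0.0, 2.0, 2.0)
near_detection = (3.0, 0.0, 5.0, 2.0)   # disjoint, but close
far_detection = (10.0, 0.0, 12.0, 2.0)  # disjoint and far away

print(iou_and_giou(predicted, near_detection))
print(iou_and_giou(predicted, far_detection))
```

Both detections have zero IoU with the prediction, yet the enclosing-box term still ranks the nearer one higher, which is the property a tracker needs when frame-to-frame displacement is large.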

arXiv:2309.16535 [pdf, other]

KLoB: a Benchmark for Assessing Knowledge Locating Methods in Language Models

Authors: Yiming Ju, Xingrun Xing, Zhixiong Zeng

Abstract: Recently, the Locate-Then-Edit paradigm has emerged as one of the main approaches to changing factual knowledge stored in language models. However, there is a lack of research on whether present locating methods can pinpoint the exact parameters embedding the desired knowledge. Moreover, although many researchers have questioned the validity of the locality hypothesis of factual knowledge, no method is provided to test the hypothesis for more in-depth discussion and research. Therefore, we introduce KLoB, a benchmark examining three essential properties that a reliable knowledge locating method should satisfy. KLoB can serve as a benchmark for evaluating existing locating methods in language models, and contributes a method for reassessing the validity of the locality hypothesis of factual knowledge. KLoB is publicly available at an anonymous GitHub: https://github.com/anon6662/KLoB.

Submitted 26 August, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.15485 [pdf, other]

Style Transfer and Self-Supervised Learning Powered Myocardium Infarction Super-Resolution Segmentation

Authors: Lichao Wang, Jiahao Huang, Xiaodan Xing, Yinzhe Wu, Ramyah Rajakulasingam, Andrew D. Scott, Pedro F Ferreira, Ranil De Silva, Sonia Nielles-Vallespin, Guang Yang

Abstract: This study proposes a pipeline that incorporates a novel style transfer model and a simultaneous super-resolution and segmentation model. The proposed pipeline aims to enhance diffusion tensor imaging (DTI) images by translating them into the late gadolinium enhancement (LGE) domain, which offers a larger amount of data with high resolution and distinct highlighting of myocardium infarction (MI) areas. Subsequently, the segmentation task is performed on the LGE-style image. An end-to-end super-resolution segmentation model is introduced to generate a high-resolution mask from a low-resolution LGE-style DTI image. Further, to enhance the performance of the model, a multi-task self-supervised learning strategy is employed to pre-train the super-resolution segmentation model, allowing it to acquire more representative knowledge and improve its segmentation performance after fine-tuning. https://github.com/wlc2424762917/Med_Img

Submitted 27 September, 2023; originally announced September 2023.

Comments: 6 pages, 8 figures, conference, accepted by SIPAIM2023

arXiv:2309.14157 [pdf, other]

LAPP: Layer Adaptive Progressive Pruning for Compressing CNNs from Scratch

Authors: Pucheng Zhai, Kailing Guo, Fang Liu, Xiaofen Xing, Xiangmin Xu

Abstract: Structured pruning is a commonly used convolutional neural network (CNN) compression approach. Pruning rate setting is a fundamental problem in structured pruning. Most existing works introduce too many additional learnable parameters to assign different pruning rates across different layers in a CNN, or cannot control the compression rate explicitly. Since a network that is too narrow blocks information flow for training, automatic pruning rate setting cannot explore a high pruning rate for a specific layer. To overcome these limitations, we propose a novel framework named Layer Adaptive Progressive Pruning (LAPP), which gradually compresses the network during initial training of a few epochs from scratch. In particular, LAPP designs an effective and efficient pruning strategy that introduces a learnable threshold for each layer and FLOPs constraints for the network. Guided by both task loss and FLOPs constraints, the learnable thresholds are dynamically and gradually updated to accommodate changes of importance scores during training. Therefore, the pruning strategy can gradually prune the network and automatically determine the appropriate pruning rates for each layer. Moreover, in order to maintain the expressive power of the pruned layer, before training starts, we introduce an additional lightweight bypass for each convolutional layer to be pruned, which adds only a relatively small additional burden. Our method demonstrates superior performance gains over previous compression methods on various datasets and backbone architectures. For example, on CIFAR-10, our method compresses ResNet-20 to 40.3% without an accuracy drop. 55.6% of the FLOPs of ResNet-18 are reduced with a 0.21% top-1 accuracy increase and a 0.40% top-5 accuracy increase on ImageNet.

Submitted 25 September, 2023; originally announced September 2023.

Comments: 12 pages, 8 tables, 3 figures

arXiv:2309.12691 [pdf]

Characterizing the temporally stable structure of community evolution in intra-urban origin-destination networks

Authors: Xiao-Jian Chen, Yuhui Zhao, Chaogui Kang, Xiaoyue Xing, Quanhua Dong, Yu Liu

Abstract: Intra-urban origin-destination (OD) network communities evolve throughout the day, indicating changing groups of closely connected regions. Under this variation, groups of regions with high consistency of community affiliation characterize the temporally stable structure of the evolution process, aiding in comprehending urban dynamics. However, how to quantify this consistency and identify these groups are open questions. In this study, we introduce the consensus OD network to quantify the consistency of community affiliation among regions. Furthermore, the temporally stable community decomposition method is proposed to identify groups of regions with high internal and low external consistency (named "stable groups"), where each group consists of temporally stable cores and attaching peripheries. Wuhan taxi data is used to verify our methods. On the hourly time scale, eleven stable groups containing 82.9% of regions are identified. This high percentage suggests that dynamic communities can be well organized via cores. Moreover, stable groups are spatially closed and more likely to distribute within a single district and separated by water bodies. Cores exhibit higher POI entropy and more healthcare and shopping services than peripheries. Our methods and empirical findings contribute to some practical issues, such as urban area division, polycentric evaluation and construction, and infectious disease control.

Submitted 22 September, 2023; originally announced September 2023.
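
The abstract above speaks of quantifying how consistently pairs of regions share a community across time slices. The paper's actual definition of the consensus OD network is not given in the listing, so the following is only one plausible reading, borrowed from consensus clustering: score each pair of regions by the fraction of time slices in which they fall in the same community. The function name, the region labels and the toy hourly partitions are all hypothetical.

```python
from itertools import combinations


def co_membership(partitions):
    """Fraction of time slices in which each pair of regions shares a community.

    partitions -- list of dicts, one per time slice, mapping region -> community id.
                  Community ids are only compared within a slice, so they do not
                  need to be aligned across slices.
    Returns a dict mapping (region_u, region_v) -> consistency in [0, 1].
    """
    n_slices = len(partitions)
    regions = sorted(partitions[0])
    consistency = {}
    for u, v in combinations(regions, 2):
        same = sum(1 for p in partitions if p[u] == p[v])
        consistency[(u, v)] = same / n_slices
    return consistency


# Toy data: three regions observed over four hourly community partitions.
hourly = [
    {"A": 0, "B": 0, "C": 1},
    {"A": 1, "B": 1, "C": 1},
    {"A": 0, "B": 0, "C": 2},
    {"A": 2, "B": 2, "C": 0},
]
scores = co_membership(hourly)
print(scores)
```

In this toy data regions A and B always travel together while C joins them only once, so a threshold on the pairwise score would separate a stable A-B core from a loosely attached C.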

arXiv:2309.10253 [pdf, other]

GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts

Authors: Jiahao Yu, Xingwei Lin, Zheng Yu, Xinyu Xing

Abstract: Large language models (LLMs) have recently experienced tremendous popularity and are widely used from casual conversations to AI-driven programming. However, despite their considerable success, LLMs are not entirely reliable and can give detailed guidance on how to conduct harmful or illegal activities. While safety measures can reduce the risk of such outputs, adversarial jailbreak attacks can still exploit LLMs to produce harmful content. These jailbreak templates are typically manually crafted, making large-scale testing challenging. In this paper, we introduce GPTFuzz, a novel black-box jailbreak fuzzing framework inspired by the AFL fuzzing framework. Instead of manual engineering, GPTFuzz automates the generation of jailbreak templates for red-teaming LLMs. At its core, GPTFuzz starts with human-written templates as initial seeds, then mutates them to produce new templates. We detail three key components of GPTFuzz: a seed selection strategy for balancing efficiency and variability, mutation operators for creating semantically equivalent or similar sentences, and a judgment model to assess the success of a jailbreak attack. We evaluate GPTFuzz against various commercial and open-source LLMs, including ChatGPT, Llama-2, and Vicuna, under diverse attack scenarios. Our results indicate that GPTFuzz consistently produces jailbreak templates with a high success rate, surpassing human-crafted templates. Remarkably, GPTFuzz achieves over 90% attack success rates against ChatGPT and Llama-2 models, even with suboptimal initial seed templates. We anticipate that GPTFuzz will be instrumental for researchers and practitioners in examining LLM robustness and will encourage further exploration into enhancing LLM safety.

Submitted 27 June, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

arXiv:2309.05217 [pdf, other]

Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis

Authors: Li Du, Yequan Wang, Xingrun Xing, Yiqun Ya, Xiang Li, Xin Jiang, Xuezhi Fang

Abstract: Although demonstrating superb performance on various NLP tasks, large language models (LLMs) still suffer from the hallucination problem, which threatens the reliability of LLMs. To measure the level of hallucination of LLMs, previous works first categorize the hallucination according to the phenomenon similarity, then quantify the proportion that model outputs contain hallucinatory contents. However, such hallucination rates could easily be distorted by confounders. Moreover, such hallucination rates could not reflect the reasons for the hallucination, as similar hallucinatory phenomena may originate from different sources. To address these issues, we propose to combine the hallucination level quantification and hallucination reason investigation through an association analysis, which builds the relationship between the hallucination rate of LLMs with a set of risk factors. In this way, we are able to observe the hallucination level under each value of each risk factor, examining the contribution and statistical significance of each risk factor, meanwhile excluding the confounding effect of other factors. Additionally, by recognizing the risk factors according to a taxonomy of model capability, we reveal a set of potential deficiencies in commonsense memorization, relational reasoning, and instruction following, which may further provide guidance for the pretraining and supervised fine-tuning process of LLMs to mitigate the hallucination.

Submitted 10 September, 2023; originally announced September 2023.

arXiv:2309.04190 [pdf, other]

SegmentAnything helps microscopy images based automatic and quantitative organoid detection and analysis

Authors: Xiaodan Xing, Chunling Tang, Yunzhe Guo, Nicholas Kurniawan, Guang Yang

Abstract: Organoids are self-organized 3D cell clusters that closely mimic the architecture and function of in vivo tissues and organs. Quantification of organoid morphology helps in studying organ development, drug discovery, and toxicity assessment. Recent microscopy techniques provide a potent tool to acquire organoid morphology features, but manual image analysis remains a labor and time-intensive process. Thus, this paper proposes a comprehensive pipeline for microscopy analysis that leverages the SegmentAnything to precisely demarcate individual organoids. Additionally, we introduce a set of morphological properties, including perimeter, area, radius, non-smoothness, and non-circularity, allowing researchers to analyze the organoid structures quantitatively and automatically. To validate the effectiveness of our approach, we conducted tests on bright-field images of human induced pluripotent stem cells (iPSCs) derived neural-epithelial (NE) organoids. The results obtained from our automatic pipeline closely align with manual organoid detection and measurement, showcasing the capability of our proposed method in accelerating organoids morphology analysis.

Submitted 8 April, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

Comments: Replace Figure 4 with the correct version. The original version is wrong due to a column name mismatch

arXiv:2309.03147 [pdf]

doi 10.1109/JBHI.2024.3370502

Real-Time Non-Invasive Imaging and Detection of Spreading Depolarizations through EEG: An Ultra-Light Explainable Deep Learning Approach

Authors: Yinzhe Wu, Sharon Jewell, Xiaodan Xing, Yang Nan, Anthony J. Strong, Guang Yang, Martyn G. Boutelle

Abstract: A core aim of neurocritical care is to prevent secondary brain injury. Spreading depolarizations (SDs) have been identified as an important independent cause of secondary brain injury. SDs are usually detected using invasive electrocorticography recorded at high sampling frequency. Recent pilot studies suggest a possible utility of scalp electrodes generated electroencephalogram (EEG) for non-invasive SD detection. However, noise and attenuation of EEG signals make this detection task extremely challenging. Previous methods focus on detecting temporal power change of EEG over a fixed high-density map of scalp electrodes, which is not always clinically feasible. Having a specialized spectrogram as an input to the automatic SD detection model, this study is the first to transform the SD identification problem from a detection task on a 1-D time-series wave to a task on a sequential 2-D rendered imaging. This study presented a novel ultra-light-weight multi-modal deep-learning network to fuse EEG spectrogram imaging and temporal power vectors to enhance SD identification accuracy over each single electrode, allowing flexible EEG map and paving the way for SD detection on ultra-low-density EEG with variable electrode positioning. Our proposed model has an ultra-fast processing speed (<0.3 sec). Compared to the conventional methods (2 hours), this is a huge advancement towards early SD detection and to facilitate instant brain injury prognosis. Seeing SDs with a new dimension - frequency on spectrograms, we demonstrated that such additional dimension could improve SD detection accuracy, providing preliminary evidence to support the hypothesis that SDs may show implicit features over the frequency profile.

Submitted 28 February, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

arXiv:2309.02964 [pdf]

Hierarchical-level rain image generative model based on GAN

Authors: Zhenyuan Liu, Tong Jia, Xingyu Xing, Jianfeng Wu, Junyi Chen

Abstract: Autonomous vehicles are exposed to various weather during operation, which is likely to trigger the performance limitations of the perception system, leading to the safety of the intended functionality (SOTIF) problems. To efficiently generate data for testing the performance of visual perception algorithms under various weather conditions, a hierarchical-level rain image generative model, rain conditional CycleGAN (RCCycleGAN), is constructed. RCCycleGAN is based on the generative adversarial network (GAN) and can generate images of light, medium, and heavy rain. Different rain intensities are introduced as labels in conditional GAN (CGAN). Meanwhile, the model structure is optimized and the training strategy is adjusted to alleviate the problem of mode collapse. In addition, natural rain images of different intensities are collected and processed for model training and validation. Compared with the two baseline models, CycleGAN and DerainCycleGAN, the peak signal-to-noise ratio (PSNR) of RCCycleGAN on the test dataset is improved by 2.58 dB and 0.74 dB, and the structural similarity (SSIM) is improved by 18% and 8%, respectively. The ablation experiments are also carried out to validate the effectiveness of the model tuning.

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2308.15767 [pdf, other]

United v.s. Divided, Deconfinement of Social Tension as a Topological Phase Transition

Authors: Chen Huang, Jun Wu, Xiangjun Xing

Abstract: The proverbs "the enemy of my enemy is my friend" and the like capture the essence of many-body correlations in social relations, whose violation leads to social tension. We study how rule-breakers, who disrespect these norms, affect the structure and dynamics of signed social networks which try to minimize social tension. We find two dynamic phases. A friendly society exhibits a "united phase" where insertion of a rule-breaker only leads to localized rearrangement. A hostile society exhibits a "divided phase", where insertion leads to macroscopic reorganization of social relations. In the divided phase, starting from the utopia state, where all relations are friendly, insertion of a separatist, a particular type of rule-breaker who makes friends with only half of its neighbors, leads to fragmentation, where the society breaks into many finite size, mutually antagonistic cliques. These phenomena are described by Ising lattice gauge theory, where social tension behaves as $Z_2$ topological defects, which are confined in the united phase and deconfined in the divided phase. We further show that the connection between social dynamics and Ising lattice gauge theory is viable independently of connectivity structure of the social network.

Submitted 30 August, 2023; originally announced August 2023.

Comments: 6 pages, 3 figures

arXiv:2308.15764 [pdf, other]

Stochastic Thermodynamics of Brownian motion in Temperature Gradient

Authors: Mingnan Ding, Jun Wu, Xiangjun Xing

Abstract: We study stochastic thermodynamics of a Brownian particle which is subjected to a temperature gradient and is confined by an external potential. We first formulate an over-damped Ito-Langevin theory in terms of local temperature, friction coefficient, and steady state distribution, all of which are experimentally measurable. We then study the associated stochastic thermodynamics theory. We analyze the excess entropy production (EP) both at trajectory level and at ensemble level, and derive the Clausius inequality as well as the transient fluctuation theorem (FT). We also use molecular dynamics to simulate a Brownian particle inside a Lennard-Jones fluid and verify the FT. Remarkably we find that the FT remains valid even in the under-damped regime. We explain the possible mechanism underlying this surprising result.

Submitted 21 February, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: 12 pages, 7 figures

arXiv:2308.07665 [pdf, other]

Inversion-by-Inversion: Exemplar-based Sketch-to-Photo Synthesis via Stochastic Differential Equations without Training

Authors: Ximing Xing, Chuang Wang, Haitao Zhou, Zhihao Hu, Chongxuan Li, Dong Xu, Qian Yu

Abstract: Exemplar-based sketch-to-photo synthesis allows users to generate photo-realistic images based on sketches. Recently, diffusion-based methods have achieved impressive performance on image generation tasks, enabling highly-flexible control through text-driven generation or energy functions. However, generating photo-realistic images with color and texture from sketch images remains challenging for diffusion models. Sketches typically consist of only a few strokes, with most regions left blank, making it difficult for diffusion-based methods to produce photo-realistic images. In this work, we propose a two-stage method named "Inversion-by-Inversion" for exemplar-based sketch-to-photo synthesis. This approach includes shape-enhancing inversion and full-control inversion. During the shape-enhancing inversion process, an uncolored photo is generated with the guidance of a shape-energy function. This step is essential to ensure control over the shape of the generated photo. In the full-control inversion process, we propose an appearance-energy function to control the color and texture of the final generated photo. Importantly, our Inversion-by-Inversion pipeline is training-free and can accept different types of exemplars for color and texture control. We conducted extensive experiments to evaluate our proposed method, and the results demonstrate its effectiveness. The code and project can be found at https://ximinng.github.io/inversion-by-inversion-project/.

Submitted 3 January, 2024; v1 submitted 15 August, 2023; originally announced August 2023.

Comments: 15 pages

arXiv:2308.05137 [pdf, other]

Discrepancy-based Active Learning for Weakly Supervised Bleeding Segmentation in Wireless Capsule Endoscopy Images

Authors: Fan Bai, Xiaohan Xing, Yutian Shen, Han Ma, Max Q. -H. Meng

Abstract: Weakly supervised methods, such as those based on class activation maps (CAM), have been applied to achieve bleeding segmentation with low annotation efforts in Wireless Capsule Endoscopy (WCE) images. However, the CAM labels tend to be extremely noisy, and there is an irreparable gap between CAM labels and ground truths for medical images. This paper proposes a new Discrepancy-basEd Active Learning (DEAL) approach to bridge the gap between CAMs and ground truths with a few annotations. Specifically, to liberate labor, we design a novel discrepancy decoder model and a CAMPUS (CAM, Pseudo-label and groUnd-truth Selection) criterion to replace the noisy CAMs with accurate model predictions and a few human labels. The discrepancy decoder model is trained with a unique scheme to generate standard, coarse and fine predictions. And the CAMPUS criterion is proposed to predict the gaps between CAMs and ground truths based on model divergence and CAM divergence. We evaluate our method on the WCE dataset and results show that our method outperforms the state-of-the-art active learning methods and reaches comparable performance to those trained with fully annotated datasets with only 10% of the training data labeled.

Submitted 9 August, 2023; originally announced August 2023.

Comments: accepted by MICCAI 2022

arXiv:2308.04724 [pdf, other]

Understanding Auto-Scheduling Optimizations for Model Deployment via Visualizations

Authors: Laixin Xie, Chenyang Zhang, Ruofei Ma, Xing Jiang, Xingxing Xing, Wei Wan, Quan Li

Abstract: After completing the design and training phases, deploying a deep learning model onto specific hardware is essential before practical implementation. Targeted optimizations are necessary to enhance the model's performance by reducing inference latency. Auto-scheduling, an automated technique offering various optimization options, proves to be a viable solution for large-scale auto-deployment. However, the low-level code generated by auto-scheduling resembles hardware coding, potentially hindering human comprehension and impeding manual optimization efforts. In this ongoing study, we aim to develop an enhanced visualization that effectively addresses the extensive profiling metrics associated with auto-scheduling. This visualization will illuminate the intricate scheduling process, enabling further advancements in latency optimization through insights derived from the schedule.

Submitted 9 August, 2023; originally announced August 2023.

Comments: Accepted by IEEE VIS 2023 Poster Track

arXiv:2307.10924 [pdf, other]

Intrinsic Image Decomposition Using Point Cloud Representation

Authors: Xiaoyan Xing, Konrad Groh, Sezer Karaoglu, Theo Gevers

Abstract: The purpose of intrinsic decomposition is to separate an image into its albedo (reflective properties) and shading components (illumination properties). This is challenging because it's an ill-posed problem. Conventional approaches primarily concentrate on 2D imagery and fail to fully exploit the capabilities of 3D data representation. 3D point clouds offer a more comprehensive format for representing scenes, as they combine geometric and color information effectively. To this end, in this paper, we introduce Point Intrinsic Net (PoInt-Net), which leverages 3D point cloud data to concurrently estimate albedo and shading maps. The merits of PoInt-Net include the following aspects. First, the model is efficient, achieving consistent performance across point clouds of any size with training only required on small-scale point clouds. Second, it exhibits remarkable robustness; even when trained exclusively on datasets comprising individual objects, PoInt-Net demonstrates strong generalization to unseen objects and scenes. Third, it delivers superior accuracy over conventional 2D approaches, demonstrating enhanced performance across various metrics on different datasets. (Code Released)

Submitted 28 March, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: Code: https://github.com/xyxingx/PoInt-Net

arXiv:2307.10757 [pdf, other]

Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition

Authors: Weidong Chen, Xiaofen Xing, Peihao Chen, Xiangmin Xu

Abstract: This paper presents a paradigm that adapts general large-scale pretrained models (PTMs) to the speech emotion recognition task. Although PTMs shed new light on artificial general intelligence, they are constructed with general tasks in mind, and thus, their efficacy for specific tasks can be further improved. Additionally, employing PTMs in practical applications can be challenging due to their considerable size. The above limitations spawn another research direction, namely, optimizing large-scale PTMs for specific tasks to generate task-specific PTMs that are both compact and effective. In this paper, we focus on the speech emotion recognition task and propose an improved emotion-specific pretrained encoder called Vesper. Vesper is pretrained on a speech dataset based on WavLM and takes into account emotional characteristics. To enhance sensitivity to emotional information, Vesper employs an emotion-guided masking strategy to identify the regions that need masking. Subsequently, Vesper employs hierarchical and cross-layer self-supervision to improve its ability to capture acoustic and semantic representations, both of which are crucial for emotion recognition. Experimental results on the IEMOCAP, MELD, and CREMA-D datasets demonstrate that Vesper with 4 layers outperforms WavLM Base with 12 layers, and the performance of Vesper with 12 layers surpasses that of WavLM Large with 24 layers.

Submitted 18 April, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: This paper was accepted by IEEE Transactions on Affective Computing 2024

arXiv:2307.10182 [pdf, other]

Enhancing Super-Resolution Networks through Realistic Thick-Slice CT Simulation

Authors: Zeyu Tang, Xiaodan Xing, Guang Yang

Abstract: Deep learning-based Generative Models have the potential to convert low-resolution CT images into high-resolution counterparts without long acquisition times and increased radiation exposure in thin-slice CT imaging. However, procuring appropriate training data for these Super-Resolution (SR) models is challenging. Previous SR research has simulated thick-slice CT images from thin-slice CT images to create training pairs. However, these methods rely either on simplistic interpolation techniques that lack realism or on sinogram reconstruction, which requires the release of raw data and complex reconstruction algorithms. Thus, we introduce a simple yet realistic method to generate thick CT images from thin-slice CT images, facilitating the creation of training pairs for SR algorithms. The training pairs produced by our method closely resemble real data distributions (PSNR=49.74 vs. 40.66, $p<0.05$). A multivariate Cox regression analysis involving thick slice CT images with lung fibrosis revealed that only the radiomics features extracted using our method demonstrated a significant correlation with mortality (HR=1.19 and HR=1.14, $p<0.005$). This paper represents the first to identify and address the challenge of generating appropriate paired training data for Deep Learning-based CT SR models, which enhances the efficacy and applicability of SR models in real-world scenarios.

Submitted 2 June, 2024; v1 submitted 2 July, 2023; originally announced July 2023.

Comments: 11 pages, 4 figures

arXiv:2306.14685 [pdf, other]

DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models

Authors: Ximing Xing, Chuang Wang, Haitao Zhou, Jing Zhang, Qian Yu, Dong Xu

Abstract: Even though trained mainly on images, we discover that pretrained diffusion models show impressive power in guiding sketch synthesis. In this paper, we present DiffSketcher, an innovative algorithm that creates vectorized free-hand sketches using natural language input. DiffSketcher is developed based on a pre-trained text-to-image diffusion model. It performs the task by directly optimizing a set of Bézier curves with an extended version of the score distillation sampling (SDS) loss, which allows us to use a raster-level diffusion model as a prior for optimizing a parametric vectorized sketch generator. Furthermore, we explore attention maps embedded in the diffusion model for effective stroke initialization to speed up the generation process. The generated sketches demonstrate multiple levels of abstraction while maintaining recognizability, underlying structure, and essential visual details of the subject drawn. Our experiments show that DiffSketcher achieves greater quality than prior work. The code and demo of DiffSketcher can be found at https://ximinng.github.io/DiffSketcher-project/.

Submitted 15 January, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: Accepted by NIPS 2023. Project page: https://ximinng.github.io/DiffSketcher-project/

arXiv:2305.18337 [pdf, other]

You Don't Have to Be Perfect to Be Amazing: Unveil the Utility of Synthetic Images

Authors: Xiaodan Xing, Federico Felder, Yang Nan, Giorgos Papanastasiou, Walsh Simon, Guang Yang

Abstract: Synthetic images generated from deep generative models have the potential to address data scarcity and data privacy issues. The selection of synthesis models is mostly based on image quality measurements, and most researchers favor synthetic images that produce realistic images, i.e., images with good fidelity scores, such as low Fréchet Inception Distance (FID) and high Peak Signal-To-Noise Ratio (PSNR). However, the quality of synthetic images is not limited to fidelity, and a wide spectrum of metrics should be evaluated to comprehensively measure the quality of synthetic images. In addition, quality metrics are not truthful predictors of the utility of synthetic images, and the relations between these evaluation metrics are not yet clear. In this work, we have established a comprehensive set of evaluators for synthetic images, including fidelity, variety, privacy, and utility. By analyzing more than 100k chest X-ray images and their synthetic copies, we have demonstrated that there is an inevitable trade-off between synthetic image fidelity, variety, and privacy. In addition, we have empirically demonstrated that the utility score does not require images with both high fidelity and high variety. For intra- and cross-task data augmentation, mode-collapsed images and low-fidelity images can still demonstrate high utility. Finally, our experiments have also shown that it is possible to produce images with both high utility and privacy, which can provide a strong rationale for the use of deep generative models in privacy-preserving applications. Our study can shore up comprehensive guidance for the evaluation of synthetic images and elicit further developments for utility-aware deep generative models in medical image synthesis.

Submitted 25 May, 2023; originally announced May 2023.

Comments: 10 pages, 4 figures, MICCAI Early Acceptance

arXiv:2305.09789 [pdf, other]

The Beauty or the Beast: Which Aspect of Synthetic Medical Images Deserves Our Focus?

Authors: Xiaodan Xing, Yang Nan, Federico Felder, Simon Walsh, Guang Yang

Abstract: Training medical AI algorithms requires large volumes of accurately labeled datasets, which are difficult to obtain in the real world. Synthetic images generated from deep generative models can help alleviate the data scarcity problem, but their effectiveness relies on their fidelity to real-world images. Typically, researchers select synthesis models based on image quality measurements, prioritizing synthetic images that appear realistic. However, our empirical analysis shows that high-fidelity and visually appealing synthetic images are not necessarily superior. In fact, we present a case where low-fidelity synthetic images outperformed their high-fidelity counterparts in downstream tasks. Our findings highlight the importance of comprehensive analysis before incorporating synthetic data into real-world applications. We hope our results will raise awareness among the research community of the value of low-fidelity synthetic images in medical AI algorithm training.

Submitted 14 June, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

Comments: CBMS 2023

arXiv:2304.06089 [pdf, other]

A hybrid quantum-classical algorithm for multichannel quantum scattering of atoms and molecules

Authors: Xiaodong Xing, Alejandro Gomez Cadavid, Artur F. Izmaylov, Timur V. Tscherbul

Abstract: We propose a hybrid quantum-classical algorithm for solving the time-independent Schrödinger equation for atomic and molecular collisions. The algorithm is based on the $S$-matrix version of the Kohn variational principle, which computes the fundamental scattering $S$-matrix by inverting the Hamiltonian matrix expressed in the basis of square-integrable functions. The computational bottleneck of the classical algorithm -- symmetric matrix inversion -- is addressed here using the variational quantum linear solver (VQLS), a recently developed noisy intermediate-scale quantum (NISQ) algorithm for solving systems of linear equations. We apply our algorithm to single and multichannel quantum scattering problems, obtaining accurate vibrational relaxation probabilities in collinear atom-molecule collisions. We also show how the algorithm could be scaled up to simulate collisions of large polyatomic molecules. Our results demonstrate that it is possible to calculate scattering cross sections and rates for complex molecular collisions on NISQ quantum processors, opening up the possibility of scalable digital quantum computation of gas-phase bimolecular collisions and reactions of relevance to astrochemistry and ultracold chemistry.

Submitted 12 April, 2023; originally announced April 2023.

Comments: 8 pages, 6 figures

arXiv:2304.03330 [pdf, ps, other]

doi 10.1103/PhysRevX.14.011059

Inverse Volume Scaling of Finite-Size Error in Periodic Coupled Cluster Theory

Authors: Xin Xing, Lin Lin

Abstract: Coupled cluster theory is one of the most popular post-Hartree-Fock methods for ab initio molecular quantum chemistry. The finite-size error of the correlation energy in periodic coupled cluster calculations for three-dimensional insulating systems has been observed to satisfy the inverse volume scaling, even in the absence of any correction schemes. This is surprising, as simpler theories that utilize only a subset of the coupled cluster diagrams exhibit much slower decay of the finite-size error, which scales inversely with the length of the system. In this study, we review the current understanding of finite-size error in quantum chemistry methods for periodic systems. We introduce new tools that elucidate the mechanisms behind this phenomenon in the context of coupled cluster doubles calculations. This reconciles some seemingly paradoxical statements related to finite-size scaling. Our findings also show that singularity subtraction can be a powerful method to effectively reduce finite-size errors in practical quantum chemistry calculations for periodic systems.

Submitted 31 March, 2024; v1 submitted 6 April, 2023; originally announced April 2023.

MSC Class: 65D32; 41A55; 81V70

arXiv:2303.17587 [pdf]

doi 10.1038/s41467-023-41777-7

Observation of non-superconducting phase changes in LuH$_{2\pm\text{x}}$N$_y$

Authors: Xiangzhuo Xing, Chao Wang, Linchao Yu, Jie Xu, Chutong Zhang, Mengge Zhang, Song Huang, Xiaoran Zhang, Bingchao Yang, Xin Chen, Yongsheng Zhang, Jian-gang Guo, Zhixiang Shi, Yanming Ma, Changfeng Chen, Xiaobing Liu

Abstract: The recent report of near-ambient superconductivity in nitrogen doped lutetium hydride has triggered a worldwide fanaticism and raised major questions about the latest claims. An intriguing phenomenon of color changes in pressurized samples from blue to pink to red was observed and correlated with the claimed superconducting transition, but the origin and underlying physics of these color changes have yet to be elucidated. Here we report synthesis and characterization of high-purity nitrogen doped lutetium hydride LuH$_{2\pm\text{x}}$N$_y$ with the same structure and composition as in the main phase of the near-ambient superconductor. We find a new purple phase of LuH$_{2\pm\text{x}}$N$_y$ between the blue and pink phases, and reveal that the sample color changes likely stem from pressure-driven redistribution of nitrogen and its interaction with the LuH$_2$ framework. No superconducting transition is found in any of the blue, purple, pink and red phases at temperatures 1.8-300 K and pressures 0-30 GPa. Instead, we identify a notable temperature-induced resistance anomaly of structural and/or electronic origin in LuH$_{2\pm\text{x}}$N$_y$, which is most pronounced in the pink phase and may have been erroneously interpreted as a sign of superconducting transition. This work establishes key benchmarks for nitrogen doped lutetium hydrides, allowing an in-depth understanding of the novel pressure-induced phase changes.

Submitted 4 April, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

Comments: 17 pages and 5 figures in the main text. 9 pages and 12 figures in the Supplementary Material. Any valuable comments and suggestions are warmly welcomed

Journal ref: Nat. Commun. 14, 5991 (2023)

arXiv:2303.12747 [pdf, other]

Less is More: Unsupervised Mask-guided Annotated CT Image Synthesis with Minimum Manual Segmentations

Authors: Xiaodan Xing, Giorgos Papanastasiou, Simon Walsh, Guang Yang

Abstract: As a pragmatic data augmentation tool, data synthesis has generally returned dividends in performance for deep learning based medical image analysis. However, generating corresponding segmentation masks for synthetic medical images is laborious and subjective. To obtain paired synthetic medical images and segmentations, conditional generative models that use segmentation masks as synthesis conditions were proposed. However, these segmentation mask-conditioned generative models still relied on large, varied, and labeled training datasets, and they could only provide limited constraints on human anatomical structures, leading to unrealistic image features. Moreover, the invariant pixel-level conditions could reduce the variety of synthetic lesions and thus reduce the efficacy of data augmentation. To address these issues, in this work, we propose a novel strategy for medical image synthesis, namely Unsupervised Mask (UM)-guided synthesis, to obtain both synthetic images and segmentations using limited manual segmentation labels. We first develop a superpixel based algorithm to generate unsupervised structural guidance and then design a conditional generative model to synthesize images and annotations simultaneously from those unsupervised masks in a semi-supervised multi-task setting. In addition, we devise a multi-scale multi-task Fréchet Inception Distance (MM-FID) and multi-scale multi-task standard deviation (MM-STD) to harness both fidelity and variety evaluations of synthetic CT images. With multiple analyses on different scales, we could produce stable image quality measurements with high reproducibility. Compared with the segmentation mask guided synthesis, our UM-guided synthesis provided high-quality synthetic images with significantly higher fidelity, variety, and utility ($p<0.05$ by Wilcoxon signed-rank test).

Submitted 19 March, 2023; originally announced March 2023.

Comments: 12 pages, 11 figures, accepted by IEEE TMI

arXiv:2303.11601 [pdf]

doi 10.1088/1361-6668/ac72cd

Significant enhancement of critical current density in H+-intercalated FeSe single crystal

Authors: Yan Meng, Wei Wei, Xiangzhuo Xing, Xiaolei Yi, Nan Zhou, Yufeng Zhang, Wenhui Liu, Yue Sun, Zhixiang Shi

Abstract: Superconducting transition temperature (Tc) and critical current density (Jc) are two key factors that are not only crucial for probing high-temperature superconducting mechanisms, but also for practical applications. The simple crystal structure of FeSe is very favorable for the fabrication of thin films and wires, but its application is limited by the relatively low Tc and small Jc. A previous study has found that the Tc of FeSe can be significantly enhanced over 40 K by using the protonation method. Here, we present a systematic study of Jc and vortex properties of H+-intercalated FeSe (Hx-FeSe) single crystals. The value of Jc for Hx-FeSe single crystal is significantly enhanced, exceeding $1.3\times10^{6}$ A/cm$^2$ at 4 K, which is more than two orders of magnitude larger than $1.1\times10^{4}$ A/cm$^2$ of pristine FeSe. The vortex pinning mechanism of Hx-FeSe is found to be surface pinning, which is different from the dominant strong point-like pinning in pristine FeSe. Moreover, the systematic study of the vortex phase transition and the underlying mechanism provides a wealth of information for the vortex phase diagram of Hx-FeSe single crystal. Our results confirm that the introduction of H+ intercalations into FeSe not only enhances the Tc, but also significantly increases the value of Jc, which is favorable for practical applications.

Submitted 21 March, 2023; originally announced March 2023.

Comments: 13 pages, 5 figures

Journal ref: Supercond. Sci. Technol. 35 (2022) 075012
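
The "more than two orders of magnitude" statement in the abstract above can be checked from the two quoted values. The following is only an arithmetic sketch; the variable names are illustrative and not taken from the paper, and the Hx-FeSe value is a lower bound ("exceeding") in the abstract.

```python
import math

# Values quoted in the abstract, in A/cm^2 at 4 K.
jc_intercalated = 1.3e6  # Hx-FeSe (stated as a lower bound: "exceeding")
jc_pristine = 1.1e4      # pristine FeSe

# Enhancement factor and its size in decades.
ratio = jc_intercalated / jc_pristine
orders_of_magnitude = math.log10(ratio)

print(f"enhancement factor: {ratio:.0f}x")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

The ratio is slightly above 100, so the quoted values are consistent with an enhancement of just over two orders of magnitude.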

arXiv:2303.11598 [pdf, other]

doi 10.1088/0256-307X/40/2/027401

Anomalous second magnetization peak in 12442-type RbCa$_2$Fe$_4$As$_4$F$_2$ superconductors

Authors: Xiaolei Yi, Xiangzhuo Xing, Yan Meng, Nan Zhou, Chunlei Wang, Yue Sun, Zhixiang Shi

Abstract: The second magnetization peak (SMP) appears in most superconductors and is crucial for the understanding of vortex physics as well as the application. Although it is well known that the SMP is related to the type and quantity of disorder/defects, the mechanism has not been universally understood. In this work, we selected three stoichiometric superconducting RbCa$_2$Fe$_4$As$_4$F$_2$ single crystals with identical superconducting critical temperature $T_c$ $\sim$ 31 K and similar self-field critical current density $J_c$, but with different amounts of disorder/defects, to study the SMP effect. It is found that only the sample S2, with a moderate amount of disorder/defects, shows a significant SMP effect. The evolution of the normalized pinning force density $f_p$ demonstrates that the dominant pinning mechanism changes from weak pinning at low temperatures to strong pinning at high temperatures. The microstructure study for sample S2 reveals some expanded Ca$_2$F$_2$ layers and dislocation defects in RbFe$_2$As$_2$ layers. The normalized magnetic relaxation results indicate that the SMP is strongly associated with the elastic to plastic (E-P) vortex transition. As temperature increases, the SMP gradually evolves into a step-like shape and then becomes a sharp peak near the irreversibility field similar to what is usually observed in low-temperature superconductors. Our findings connect the low field SMP of high-temperature superconductors and the high field peak of low-temperature superconductors, revealing the possible universal origin related to the E-P phase transition.

Submitted 21 March, 2023; originally announced March 2023.

Comments: 9 pages, 5 figures

Journal ref: Chinese Physics Letters 40, 027401 (2023) (cover story)

arXiv:2303.01694 [pdf, other]

DWFormer: Dynamic Window transFormer for Speech Emotion Recognition

Authors: Shuaiqi Chen, Xiaofen Xing, Weibin Zhang, Weidong Chen, Xiangmin Xu

Abstract: Speech emotion recognition is crucial to human-computer interaction. The temporal regions that represent different emotions scatter in different parts of the speech locally. Moreover, the temporal scales of important information may vary over a large range within and across speech segments. Although transformer-based models have made progress in this field, the existing models could not precisely locate important regions at different temporal scales. To address the issue, we propose Dynamic Window transFormer (DWFormer), a new architecture that leverages temporal importance by dynamically splitting samples into windows. Self-attention mechanism is applied within windows for capturing temporal important information locally in a fine-grained way. Cross-window information interaction is also taken into account for global communication. DWFormer is evaluated on both the IEMOCAP and the MELD datasets. Experimental results show that the proposed model achieves better performance than the previous state-of-the-art methods.

Submitted 2 March, 2023; originally announced March 2023.

Comments: 4 pages, 5 figures, 3 tables, accepted by 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP2023)

arXiv:2303.01276 [pdf, other]

Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation

Authors: Zicheng Wang, Zhen Zhao, Xiaoxia Xing, Dong Xu, Xiangyu Kong, Luping Zhou

Abstract: Semi-supervised semantic segmentation (SSS) has recently gained increasing research interest as it can reduce the requirement for large-scale fully-annotated training data. The current methods often suffer from the confirmation bias from the pseudo-labelling process, which can be alleviated by the co-training framework. The current co-training-based SSS methods rely on hand-crafted perturbations to prevent the different sub-nets from collapsing into each other, but these artificial perturbations cannot lead to the optimal solution. In this work, we propose a new conflict-based cross-view consistency (CCVC) method based on a two-branch co-training framework which aims at enforcing the two sub-nets to learn informative features from irrelevant views. In particular, we first propose a new cross-view consistency (CVC) strategy that encourages the two sub-nets to learn distinct features from the same input by introducing a feature discrepancy loss, while these distinct features are expected to generate consistent prediction scores of the input. The CVC strategy helps to prevent the two sub-nets from stepping into the collapse. In addition, we further propose a conflict-based pseudo-labelling (CPL) method to guarantee the model will learn more useful information from conflicting predictions, which will lead to a stable training process. We validate our new CCVC approach on the SSS benchmark datasets where our method achieves new state-of-the-art performance. Our code is available at https://github.com/xiaoyao3302/CCVC.

Submitted 25 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: accepted by CVPR2023

arXiv:2302.14638 [pdf, other]

doi 10.1109/TASLP.2023.3235194

SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing

Authors: Weidong Chen, Xiaofen Xing, Xiangmin Xu, Jianxin Pang, Lan Du

Abstract: Paralinguistic speech processing is important in addressing many issues, such as sentiment and neurocognitive disorder analyses. Recently, Transformer has achieved remarkable success in the natural language processing field and has demonstrated its adaptation to speech. However, previous works on Transformer in the speech field have not incorporated the properties of speech, leaving the full potential of Transformer unexplored. In this paper, we consider the characteristics of speech and propose a general structure-based framework, called SpeechFormer++, for paralinguistic speech processing. More concretely, following the component relationship in the speech signal, we design a unit encoder to model the intra- and inter-unit information (i.e., frames, phones, and words) efficiently. According to the hierarchical relationship, we utilize merging blocks to generate features at different granularities, which is consistent with the structural pattern in the speech signal. Moreover, a word encoder is introduced to integrate word-grained features into each unit encoder, which effectively balances fine-grained and coarse-grained information. SpeechFormer++ is evaluated on the speech emotion recognition (IEMOCAP & MELD), depression classification (DAIC-WOZ) and Alzheimer's disease detection (Pitt) tasks. The results show that SpeechFormer++ outperforms the standard Transformer while greatly reducing the computational cost. Furthermore, it delivers superior results compared to the state-of-the-art approaches.

Submitted 27 February, 2023; originally announced February 2023.

Comments: 14 pages, 7 figures, 14 tables, TASLP 2023 paper

arXiv:2302.14450 [pdf, other]

Swin Deformable Attention Hybrid U-Net for Medical Image Segmentation

Authors: Lichao Wang, Jiahao Huang, Xiaodan Xing, Guang Yang

Abstract: Medical image segmentation is a crucial task in the field of medical image analysis. Harmonizing the convolution and multi-head self-attention mechanism is a recent research focus in this field, with various combination methods proposed. However, the lack of interpretability of these hybrid models remains a common pitfall, limiting their practical application in clinical scenarios. To address this issue, we propose to incorporate the Shifted Window (Swin) Deformable Attention into a hybrid architecture to improve segmentation performance while ensuring explainability. Our proposed Swin Deformable Attention Hybrid UNet (SDAH-UNet) demonstrates state-of-the-art performance on both anatomical and lesion segmentation tasks. Moreover, we provide a direct and visual explanation of the model focalization and how the model forms it, enabling clinicians to better understand and trust the decision of the model. Our approach could be a promising solution to the challenge of developing accurate and interpretable medical image segmentation models.

Submitted 27 September, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

Comments: 10 pages, 5 figures, accepted by SIPAIM2023

arXiv:2302.13729 [pdf, other]

DST: Deformable Speech Transformer for Emotion Recognition

Authors: Weidong Chen, Xiaofen Xing, Xiangmin Xu, Jianxin Pang, Lan Du

Abstract: Enabled by multi-head self-attention, Transformer has exhibited remarkable results in speech emotion recognition (SER). Compared to the original full attention mechanism, window-based attention is more effective in learning fine-grained features while greatly reducing model redundancy. However, emotional cues are present in a multi-granularity manner such that the pre-defined fixed window can severely degrade the model flexibility. In addition, it is difficult to obtain the optimal window settings manually. In this paper, we propose a Deformable Speech Transformer, named DST, for SER task. DST determines the usage of window sizes conditioned on input speech via a light-weight decision network. Meanwhile, data-dependent offsets derived from acoustic features are utilized to adjust the positions of the attention windows, allowing DST to adaptively discover and attend to the valuable information embedded in the speech. Extensive experiments on IEMOCAP and MELD demonstrate the superiority of DST.

Submitted 27 February, 2023; originally announced February 2023.

Comments: 5 pages, 4 figures, 2 tables, accepted by ICASSP 2023

arXiv:2302.10272 [pdf, other]

Is Autoencoder Truly Applicable for 3D CT Super-Resolution?

Authors: Weixun Luo, Xiaodan Xing, Guang Yang

Abstract: Featured by a bottleneck structure, autoencoder (AE) and its variants have been largely applied in various medical image analysis tasks, such as segmentation, reconstruction and de-noising. Despite their promising performance in the aforementioned tasks, in this paper, we claim that AE models are not applicable to single image super-resolution (SISR) for 3D CT data. Our hypothesis is that the bottleneck architecture that resizes feature maps in AE models degrades the details of input images and thus can sabotage the performance of super-resolution. Although U-Net proposed skip connections that merge information from different levels, we claim that the degrading impact of feature resizing operations could hardly be removed by skip connections. By conducting large-scale ablation experiments and comparing the performance between models with and without the bottleneck design on a public CT lung dataset, we have discovered that AE models, including U-Net, have failed to achieve a comparable SISR result ($p<0.05$ by Student's t-test) compared to the baseline model. Our work is the first comparative study investigating the suitability of AE architecture for 3D CT SISR tasks and brings a rationale for researchers to re-think the choice of model architectures especially for 3D CT SISR tasks. The full implementation and trained models can be found at: https://github.com/Roldbach/Autoencoder-3D-CT-SISR

Submitted 31 March, 2023; v1 submitted 23 January, 2023; originally announced February 2023.

Comments: ISBI 2023

arXiv:2302.06043 [pdf, ps, other]

doi 10.1016/j.jcp.2024.112755

Finite-size effects in periodic coupled cluster calculations

Authors: Xin Xing, Lin Lin

Abstract: We provide the first rigorous study of the finite-size error in the simplest and representative coupled cluster theory, namely the coupled cluster doubles (CCD) theory, for gapped periodic systems. Assuming that the CCD equations are solved using exact Hartree-Fock orbitals and orbital energies, we prove that the convergence rate of finite-size error scales as $\mathscr{O}(N_\mathbf{k}^{-\frac13})$, where $N_{\mathbf{k}}$ is the number of discretization points in the Brillouin zone and characterizes the system size. Our analysis shows that the dominant error lies in the coupled cluster amplitude calculation, and the convergence of the finite-size error in energy calculations can be boosted to $\mathscr{O}(N_\mathbf{k}^{-1})$ with accurate amplitudes. This also provides the first proof of the scaling of the finite-size error in the third order Møller-Plesset perturbation theory (MP3) for periodic systems.

Submitted 12 February, 2023; originally announced February 2023.

MSC Class: 81Q99; 65G99; 65D32

Journal ref: Journal of Computational Physics (2024): 112755
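
To make the two rates quoted in the abstract above concrete, the following worked comparison may help. It is only a sketch: it assumes a uniform three-dimensional mesh with $n$ points per dimension (so $N_{\mathbf{k}} = n^3$, an assumption not stated in the abstract) and treats the asymptotic bounds as if they were attained.

```latex
% Assumption: uniform 3D k-point mesh with n points per dimension.
\[
N_{\mathbf{k}} = n^{3}
\;\Longrightarrow\;
N_{\mathbf{k}}^{-\frac13} = n^{-1},
\qquad
N_{\mathbf{k}}^{-1} = n^{-3}.
\]
% Effect of doubling the mesh in every dimension (N_k -> 8 N_k):
\[
n \to 2n:\quad
\frac{(8N_{\mathbf{k}})^{-\frac13}}{N_{\mathbf{k}}^{-\frac13}} = \frac12,
\qquad
\frac{(8N_{\mathbf{k}})^{-1}}{N_{\mathbf{k}}^{-1}} = \frac18.
\]
```

Under these assumptions, an eightfold increase in $N_{\mathbf{k}}$ only halves an error that decays as $N_{\mathbf{k}}^{-\frac13}$, whereas it reduces an error that decays as $N_{\mathbf{k}}^{-1}$ by a factor of eight.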

arXiv:2301.02875 [pdf, ps, other]

An iterative two-grid method for strongly nonlinear elliptic boundary value problems

Authors: Jiajun Zhan, Lei Yang, Xiaoqing Xing, Liuqiang Zhong

Abstract: We design and analyze an iterative two-grid algorithm for the finite element discretizations of strongly nonlinear elliptic boundary value problems in this paper. We propose an iterative two-grid algorithm, in which a nonlinear problem is first solved on the coarse space, and then a symmetric positive definite problem is solved on the fine space. The innovation of this paper lies in the establishment of a first convergence analysis, which requires simultaneous estimation of four interconnected error estimates. We also present some numerical experiments to confirm the efficiency of the proposed algorithm.

Submitted 3 May, 2023; v1 submitted 7 January, 2023; originally announced January 2023.

arXiv:2301.01563 [pdf, other]

A Posterior Error Estimator for Mixed Interior Penalty Discontinuous Galerkin Finite Element Method for the H(curl)-Elliptic Problems

Authors: Ming Tang, Xiaoqing Xing, Liuqiang Zhong

Abstract: In this paper, we design the first residual type a posteriori error estimator for mixed interior penalty discontinuous Galerkin method for the H(curl)-elliptic problems. Then we prove that our residual based a posteriori error indicator is both reliable and efficient. At last, we present some numerical experiments to validate the performance of the indicator within an adaptive mesh refinement procedure.

Submitted 4 January, 2023; originally announced January 2023.

arXiv:2301.01439 [pdf, other]

Convergence of Adaptive Mixed Interior Penalty Discontinuous Galerkin Methods for H(curl)-Elliptic Problems

Authors: K. Liu, M. Tang, X. Q. Xing, L. Q. Zhong

Abstract: In this paper, we study the convergence of adaptive mixed interior penalty discontinuous Galerkin method for H(curl)-elliptic problems. We first get the mixed model of H(curl)-elliptic problem by introducing a new intermediate variable. Then we discuss the continuous variational problem and discrete variational problem, which are based on interior penalty discontinuous Galerkin approximation. Next, we construct the corresponding a posteriori error indicator, and prove the contraction of the summation of the energy error and the scaled error indicator. At last, we confirm and illustrate the theoretical result through some numerical experiments.

Submitted 3 January, 2023; originally announced January 2023.

arXiv:2301.01426 [pdf, other]

Iterative two-level algorithm for nonsymmetric or indefinite elliptic problems

Authors: Ming Tang, Xiaoqing Xing, Ying Yang, Liuqiang Zhong

Abstract: In this paper, a new iterative two-level algorithm is presented for solving the finite element discretization for nonsymmetric or indefinite elliptic problems. The iterative two-level algorithm uses the same coarse space as the traditional two-grid algorithm, but its "fine space" uses the higher order finite element space under the coarse grid. Therefore, the iterative two-level algorithm only needs one grid, and the computational cost is much lower than the traditional iterative two-grid algorithm. Finally, compared with the traditional two-grid algorithm, numerical experiments show that the computational cost is lower to achieve the same convergence order.

Submitted 3 January, 2023; originally announced January 2023.

arXiv:2212.05363 [pdf, other]

doi 10.1021/acs.jpca.2c08646

Nuclear spin relaxation in cold atom-molecule collisions

Authors: Rebekah Hermsmeier, Xiaodong Xing, Timur V. Tscherbul

Abstract: We explore the quantum dynamics of nuclear spin relaxation in cold collisions of $^1\Sigma^+$ molecules with structureless atoms in an external magnetic field. To this end, we develop a rigorous coupled-channel methodology, which accounts for rotational and nuclear spin degrees of freedom of $^1\Sigma^+$ molecules, their interaction with an external magnetic field, as well as for anisotropic atom-molecule interactions. We apply the methodology to study collisional relaxation of the nuclear spin sublevels of $^{13}$CO molecules immersed in a cold buffer gas of $^4$He atoms. We find that nuclear spin relaxation in the ground rotational manifold of CO occurs extremely slowly due to the absence of direct couplings between the nuclear spin sublevels. The rates of collisional transitions between the $N=1$ nuclear spin states of CO are generally much higher due to the direct nuclear spin-rotation coupling between the states. These transitions obey selection rules, which depend on the values of space-fixed projections of rotational and nuclear spin angular momenta for the initial and final molecular states. For some initial states, we also observe a strong magnetic field dependence, which can be understood using the first Born approximation. We use our calculated nuclear spin relaxation rates to investigate the thermalization of a single nuclear spin state of CO$(N=0)$ immersed in a cold buffer gas of He. The calculated nuclear spin relaxation times ($T_1\simeq 0.5$ s at $T=1$ K) display a steep temperature dependence decreasing rapidly at elevated temperatures due to the increased population of rotationally excited states, which undergo nuclear spin relaxation at a much faster rate. Thus, long relaxation times of $N=0$ nuclear spin states in cold collisions with buffer gas atoms can only be maintained at sufficiently low temperatures ($kT\ll 2B_e$), where $B_e$ is the rotational constant.

Submitted 16 June, 2023; v1 submitted 10 December, 2022; originally announced December 2022.

Comments: 41 pages, 12 figures

Journal ref: J. Phys. Chem. A 127, 4511-4525 (2023)

arXiv:2210.08724 [pdf, other]

An Ontology-based Method to Identify Triggering Conditions for Perception Insufficiency of Autonomous Vehicles

Authors: Xingyu Xing, Tong Jia, Junyi Chen, Lu Xiong, Zhuoping Yu

Abstract: The autonomous vehicle (AV) is a safety-critical system relying on complex sensors and algorithms. The AV may confront risk conditions if these sensors and algorithms misunderstand the environment and situation, even though all components are fault-free. ISO 21448 defines the safety of the intended functionality (SOTIF), aiming to enhance the AV's safety by specifying the AV's development and validation process. As required by ISO 21448, the triggering conditions, which may lead to the vehicle's functional insufficiencies, should be analyzed and verified. However, there is not yet a method for comprehensive and systematic identification of triggering conditions. This paper proposes an analysis framework of triggering conditions for the perception system based on the propagation chain of events model, which consists of triggering source, influenced perception stage, and triggering effect. According to the analysis framework, ontologies of triggering source and perception stage were constructed, and the relationships between concepts in the ontologies were defined. According to these ontologies, triggering conditions can be generated comprehensively and systematically. The proposed method was applied to an L3 autonomous vehicle, and 20 of the 87 triggering conditions identified were tested in the field, among which eight triggering conditions triggered risky behaviors of the vehicle.

Submitted 16 October, 2022; originally announced October 2022.

Comments: 12 pages, 10 figures

Showing 51–100 of 285 results for author: Xing, X