Search | arXiv e-print repository

The moduli space of a rational map is Carathéodory hyperbolic

Abstract: Let $f$ be a rational map of degree $d\geq 2$. The moduli space $\mathcal{M}_f$, introduced by McMullen and Sullivan, is a complex analytic space consisting of all quasiconformal conjugacy classes of $f$. For $f$ that is not flexible Lattès, we show that there is a normal affine variety $X_f$ of dimension $2d-2$ and a holomorphic injection $i:\mathcal{M}_f\to X_f$ such that $i(\mathcal{M}_f)$ is precompact in $X_f$. In particular, $\mathcal{M}_f$ is Carathéodory hyperbolic (i.e., bounded holomorphic functions separate points in $\mathcal{M}_f$), provided that $f$ is not flexible Lattès. This solves a conjecture of McMullen. When $d\geq 4$, we give a concrete construction of $X_f$ as the normalization of the Zariski closure of the image of the reciprocal multiplier spectrum morphism.

Submitted 6 April, 2024; originally announced April 2024.

Comments: 10 pages

arXiv:2404.03302 [pdf, other]

How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?

Authors: Siye Wu, Jian Xie, Jiangjie Chen, Tinghui Zhu, Kai Zhang, Yanghua Xiao

Abstract: By leveraging the retrieval of information from external knowledge databases, Large Language Models (LLMs) exhibit enhanced capabilities for accomplishing many knowledge-intensive tasks. However, due to the inherent flaws of current retrieval systems, irrelevant information may exist within the retrieved top-ranked passages. In this work, we present a comprehensive investigation into the robustness of LLMs to different types of irrelevant information under various conditions. We first introduce a framework to construct high-quality irrelevant information that ranges from semantically unrelated to partially related to related to the questions. Furthermore, our analysis demonstrates that the constructed irrelevant information not only scores highly on similarity metrics, and is therefore readily retrieved by existing systems, but also bears semantic connections to the context. Our investigation reveals that current LLMs still struggle to discriminate highly semantically related information and can be easily distracted by this irrelevant yet misleading content. We also find that current solutions for handling irrelevant information have limited ability to improve the robustness of LLMs to such distractions. All the resources are available on GitHub at https://github.com/Di-viner/LLM-Robustness-to-Irrelevant-Information.

Submitted 24 July, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: COLM 2024

arXiv:2404.02747 [pdf, other]

Faster Diffusion via Temporal Attention Decomposition

Authors: Haozhe Liu, Wentian Zhang, Jinheng Xie, Francesco Faccio, Mengmeng Xu, Tao Xiang, Mike Zheng Shou, Juan-Manuel Perez-Rua, Jürgen Schmidhuber

Abstract: We explore the role of the attention mechanism during inference in text-conditional diffusion models. Empirical observations suggest that cross-attention outputs converge to a fixed point after several inference steps. The convergence time naturally divides the entire inference process into two phases: an initial phase for planning text-oriented visual semantics, which are then translated into images in a subsequent fidelity-improving phase. Cross-attention is essential in the initial phase but almost irrelevant thereafter. In contrast, self-attention initially plays a minor role but becomes crucial in the second phase. These findings yield a simple and training-free method known as temporally gating the attention (TGATE), which efficiently generates images by caching and reusing attention outputs at scheduled time steps. Experimental results show that, when applied to various existing text-conditional diffusion models, TGATE accelerates these models by 10%-50%. The code of TGATE is available at https://github.com/HaozheLiu-ST/T-GATE.

Submitted 17 July, 2024; v1 submitted 3 April, 2024; originally announced April 2024.
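The caching idea behind TGATE can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the released T-GATE code: the class `GatedCrossAttention`, the single-head attention, and the fixed `gate_step` are hypothetical simplifications. Cross-attention is computed for the first `gate_step` steps, and its most recent output is then reused for every later step, so the text-conditioned branch stops costing compute in the fidelity-improving phase.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    # Single-head scaled dot-product attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


class GatedCrossAttention:
    """Hypothetical sketch: compute cross-attention before `gate_step`,
    then return the cached output for all later steps."""

    def __init__(self, gate_step: int):
        self.gate_step = gate_step
        self.cache = None
        self.num_computed = 0  # how many times attention was actually evaluated

    def __call__(self, step, q, text_k, text_v):
        if step >= self.gate_step and self.cache is not None:
            return self.cache  # later phase: reuse the cached output
        out = attention(q, text_k, text_v)
        self.num_computed += 1
        self.cache = out  # keep the most recent output for later reuse
        return out


rng = np.random.default_rng(0)
layer = GatedCrossAttention(gate_step=4)
text_k = rng.normal(size=(7, 16))  # 7 hypothetical text tokens
text_v = rng.normal(size=(7, 16))
outputs = [layer(t, rng.normal(size=(5, 16)), text_k, text_v) for t in range(10)]
```

Over 10 denoising steps, attention is evaluated only 4 times; a real implementation would apply this gating inside each cross-attention layer of the denoiser, which is omitted here.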

arXiv:2404.01448 [pdf]

Prior Frequency Guided Diffusion Model for Limited Angle (LA)-CBCT Reconstruction

Authors: Jiacheng Xie, Hua-Chieh Shao, Yunxiang Li, You Zhang

Abstract: Cone-beam computed tomography (CBCT) is widely used in image-guided radiotherapy. Reconstructing CBCTs from limited-angle acquisitions (LA-CBCT) is highly desired for improved imaging efficiency, dose reduction, and better mechanical clearance. LA-CBCT reconstruction, however, suffers from severe under-sampling artifacts, making it a highly ill-posed inverse problem. Diffusion models can generate data/images by reversing a data-noising process through learned data distributions, and can be incorporated as a denoiser/regularizer in LA-CBCT reconstruction. In this study, we developed a diffusion model-based framework, the prior frequency-guided diffusion model (PFGDM), for robust and structure-preserving LA-CBCT reconstruction. PFGDM uses a conditioned diffusion model as a regularizer for LA-CBCT reconstruction, and the condition is based on high-frequency information extracted from patient-specific prior CT scans, which provides a strong anatomical prior for LA-CBCT reconstruction. Specifically, we developed two variants of PFGDM (PFGDM-A and PFGDM-B) with different conditioning schemes. PFGDM-A applies the high-frequency CT information condition until a pre-optimized iteration step, and drops it afterwards to enable both similar and differing CT/CBCT anatomies to be reconstructed. PFGDM-B, on the other hand, continuously applies the prior CT information condition in every reconstruction step, but with a decaying mechanism, to gradually phase out the reconstruction guidance from the prior CT scans.
The two variants of PFGDM were tested and compared with currently available LA-CBCT reconstruction solutions, via metrics including PSNR and SSIM. PFGDM outperformed all traditional and diffusion model-based methods. PFGDM reconstructs high-quality LA-CBCTs under very limited gantry angles, allowing faster and more flexible CBCT scans with dose reductions.

Submitted 8 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: 20 pages, 8 figures, submitted to Physics in Medicine & Biology

arXiv:2404.00672 [pdf, other]

A General and Efficient Training for Transformer via Token Expansion

Authors: Wenxuan Huang, Yunhang Shen, Jiao Xie, Baochang Zhang, Gaoqi He, Ke Li, Xing Sun, Shaohui Lin

Abstract: The remarkable performance of Vision Transformers (ViTs) typically requires an extremely large training cost. Existing methods have attempted to accelerate the training of ViTs, yet they typically sacrifice method universality and suffer accuracy drops. Meanwhile, they break the training consistency of the original transformers, including the consistency of hyper-parameters, architecture, and strategy, which prevents them from being widely applied to different Transformer networks. In this paper, we propose a novel token growth scheme, Token Expansion (termed ToE), to achieve consistent training acceleration for ViTs. We introduce an "initialization-expansion-merging" pipeline to maintain the integrity of the intermediate feature distribution of original transformers, preventing the loss of crucial learnable information in the training process. ToE can not only be seamlessly integrated into the training and fine-tuning process of transformers (e.g., DeiT and LV-ViT), but is also effective for efficient training frameworks (e.g., EfficientTrain), without altering the original training hyper-parameters or architecture and without introducing additional training strategies. Extensive experiments demonstrate that ToE trains ViTs about 1.3x faster in a lossless manner, or even with performance gains over the full-token training baselines. Code is available at https://github.com/Osilly/TokenExpansion.

Submitted 31 March, 2024; originally announced April 2024.

Comments: Accepted to CVPR 2024. Code is available at https://github.com/Osilly/TokenExpansion

arXiv:2404.00403 [pdf, other]

UniMEEC: Towards Unified Multimodal Emotion Recognition and Emotion Cause

Authors: Guimin Hu, Zhihong Zhu, Daniel Hershcovich, Hasti Seifi, Jiayuan Xie

Abstract: Multimodal emotion recognition in conversation (MERC) and multimodal emotion-cause pair extraction (MECPE) have recently garnered significant attention. Emotions are the expression of affect or feelings; responses to specific events, thoughts, or situations are known as emotion causes. The two are like two sides of a coin, collectively describing human behaviors and intents. However, most existing works treat MERC and MECPE as separate tasks, which may make it difficult to integrate emotion and cause in real-world applications. In this paper, we propose a Unified Multimodal Emotion recognition and Emotion-Cause analysis framework (UniMEEC) to explore the causality and complementarity between emotion and emotion cause. Concretely, UniMEEC reformulates the MERC and MECPE tasks as two mask prediction problems, enhancing the interaction between emotion and cause. Meanwhile, UniMEEC shares prompt learning among modalities to probe modality-specific knowledge from the pre-trained model. Furthermore, we propose a task-specific hierarchical context aggregation to control the information flow to the task. Experimental results on four public benchmark datasets verify the model's performance on MERC and MECPE tasks and show consistent improvements over state-of-the-art methods.

Submitted 30 March, 2024; originally announced April 2024.

arXiv:2403.19919 [pdf, other]

Diff-Reg v1: Diffusion Matching Model for Registration Problem

Authors: Qianliang Wu, Haobo Jiang, Lei Luo, Jun Li, Yaqing Ding, Jin Xie, Jian Yang

Abstract: Establishing reliable correspondences is essential for registration tasks such as 3D and 2D3D registration. Existing methods commonly leverage geometric or semantic point features to generate potential correspondences. However, these features may face challenges such as large deformation, scale inconsistency, and ambiguous matching problems (e.g., symmetry). Additionally, many previous methods, which rely on single-pass prediction, may struggle with local minima in complex scenarios. To mitigate these challenges, we introduce a diffusion matching model for robust correspondence construction. Our approach treats correspondence estimation as a denoising diffusion process within the doubly stochastic matrix space, which gradually denoises (refines) a doubly stochastic matching matrix to the ground-truth one for high-quality correspondence estimation. It involves a forward diffusion process that gradually introduces Gaussian noise into the ground-truth matching matrix and a reverse denoising process that iteratively refines the noisy matching matrix. In particular, feature extraction from the backbone occurs only once during the inference phase; our lightweight denoising module reuses the same features at each reverse sampling step. Evaluation of our method on both 3D and 2D3D registration tasks confirms its effectiveness. The code is available at https://github.com/wuqianliang/Diff-Reg.

Submitted 24 July, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2401.00436
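One standard way to keep a matching matrix inside the doubly stochastic space the abstract refers to is Sinkhorn normalization, which alternately normalizes rows and columns. The sketch below uses it as an assumed stand-in; the abstract does not say how Diff-Reg parameterizes this space, and `noisy_matching` with its `noise_std` and `scale` values is a hypothetical illustration of a forward noising step, not the paper's noise schedule.

```python
import numpy as np


def sinkhorn(logits: np.ndarray, n_iters: int = 100) -> np.ndarray:
    # Alternately normalize rows and columns of exp(logits); for a square,
    # strictly positive matrix this converges to a doubly stochastic matrix.
    m = np.exp(logits - logits.max())
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)
        m = m / m.sum(axis=0, keepdims=True)
    return m


def noisy_matching(gt: np.ndarray, noise_std: float, rng, scale: float = 3.0) -> np.ndarray:
    # Hypothetical forward step: perturb the scaled ground-truth permutation
    # matrix with Gaussian noise, then project back with Sinkhorn.
    logits = scale * gt + noise_std * rng.normal(size=gt.shape)
    return sinkhorn(logits)


rng = np.random.default_rng(0)
perm = rng.permutation(5)
gt = np.eye(5)[perm]  # gt[i, perm[i]] = 1: point i matches point perm[i]
soft = noisy_matching(gt, noise_std=0.3, rng=rng)
pred = soft.argmax(axis=1)  # hard correspondences read off the soft matrix
```

With mild noise, the row-wise argmax of the soft matrix still recovers the ground-truth permutation; a learned reverse process would be responsible for undoing much stronger noise.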

arXiv:2403.19710 [pdf, other]

STRUM-LLM: Attributed and Structured Contrastive Summarization

Authors: Beliz Gunel, James B. Wendt, Jing Xie, Yichao Zhou, Nguyen Vo, Zachary Fisher, Sandeep Tata

Abstract: Users often struggle to decide between two options (A vs B), as doing so usually requires time-consuming research across multiple web pages. We propose STRUM-LLM, which addresses this challenge by generating attributed, structured, and helpful contrastive summaries that highlight key differences between the two options. STRUM-LLM identifies helpful contrast: the specific attributes along which the two options differ significantly and which are most likely to influence the user's decision. Our technique is domain-agnostic and does not require any human-labeled data or fixed attribute list as supervision. STRUM-LLM attributes all extractions back to the input sources along with textual evidence, and it does not limit the length of the input sources it can process. STRUM-LLM Distilled has 100x more throughput than models with comparable performance while being 10x smaller. In this paper, we provide extensive evaluations of our method and lay out future directions for our currently deployed system.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.19627 [pdf, ps, other]

Four-dimensional gradient Ricci solitons with (half) nonnegative isotropic curvature

Authors: Huai-Dong Cao, Junming Xie

Abstract: This is a sequel to our paper [24], in which we investigated the geometry of 4-dimensional gradient shrinking Ricci solitons with half positive (nonnegative) isotropic curvature. In this paper, we mainly focus on 4-dimensional gradient steady Ricci solitons with nonnegative isotropic curvature (WPIC) or half nonnegative isotropic curvature (half WPIC). In particular, for 4D complete ancient solutions with WPIC, we are able to prove the nonnegativity of the Ricci curvature, $Rc\geq 0$, and bound the curvature tensor $Rm$ by $|Rm| \leq R$. For 4D gradient steady solitons with WPIC, we obtain a classification result. We also give a partial classification of 4D gradient steady Ricci solitons with half WPIC. Moreover, we obtain a preliminary classification result for 4D complete gradient expanding Ricci solitons with WPIC. Finally, motivated by the recent work [60], we improve our earlier results in [24] on 4D gradient shrinking Ricci solitons with half PIC or half WPIC, and also provide a characterization of complete gradient Kähler-Ricci shrinkers in complex dimension two among 4-dimensional gradient Ricci shrinkers.

Submitted 18 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

Comments: 21 pages; v.2: added Remark 1.3 & Remark 6.2

arXiv:2403.19521 [pdf, other]

Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models

Authors: Ang Lv, Yuhan Chen, Kaiyi Zhang, Yulong Wang, Lifeng Liu, Ji-Rong Wen, Jian Xie, Rui Yan

Abstract: In this paper, we delve into several mechanisms employed by Transformer-based language models (LLMs) for factual recall tasks. We outline a pipeline consisting of three major steps: (1) Given a prompt "The capital of France is," task-specific attention heads extract the topic token, such as "France," from the context and pass it to subsequent MLPs. (2) As attention heads' outputs are aggregated with equal weight and added to the residual stream, the subsequent MLP acts as an "activation," which either erases or amplifies the information originating from individual heads. As a result, the topic token "France" stands out in the residual stream. (3) A deep MLP takes "France" and generates a component that redirects the residual stream towards the direction of the correct answer, i.e., "Paris." This procedure is akin to applying an implicit function such as "get_capital($X$)," where the argument $X$ is the topic token information passed by attention heads. To achieve the above quantitative and qualitative analysis for MLPs, we propose a novel analytic method aimed at decomposing the outputs of the MLP into components understandable by humans. Additionally, we observe a universal anti-overconfidence mechanism in the final layer of models, which suppresses correct predictions. We mitigate this suppression by leveraging our interpretation to improve factual recall confidence.
The above interpretations are evaluated across diverse tasks spanning various domains of factual knowledge, using various language models from the GPT-2 family, 1.3B OPT, and up to 7B Llama-2, in both zero- and few-shot setups.

Submitted 24 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.16107 [pdf, other]

Designing Upper-Body Gesture Interaction with and for People with Spinal Muscular Atrophy in VR

Authors: Jingze Tian, Yingna Wang, Keye Yu, Liyi Xu, Junan Xie, Franklin Mingzhe Li, Yafeng Niu, Mingming Fan

Abstract: Recent research proposed gaze-assisted gestures to enhance interaction within virtual reality (VR), providing opportunities for people with motor impairments to experience VR. Compared to people with other motor impairments, those with Spinal Muscular Atrophy (SMA) exhibit enhanced distal limb mobility, providing them with more design space. However, it remains unknown what gaze-assisted upper-body gestures people with SMA would want and be able to perform. We conducted an elicitation study in which 12 VR-experienced people with SMA designed upper-body gestures for 26 VR commands, and collected 312 user-defined gestures. Participants predominantly favored creating gestures with their hands. The type of task and participants' abilities influenced their choice of body parts for gesture design. Participants tended to increase their body involvement and preferred gestures that required minimal physical effort and were aesthetically pleasing. Our research will contribute to creating better gesture-based input methods for people with motor impairments to interact with VR.

Submitted 24 March, 2024; originally announced March 2024.

Comments: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11-16, 2024, Honolulu, HI, USA

arXiv:2403.16053 [pdf, other]

Quantitatively predicting angle-resolved polarized Raman intensity of black phosphorus flakes

Authors: Tao Liu, Jia-Liang Xie, Yu-Chen Leng, Heng Wu, Jiahong Wang, Yang Li, Xue-Feng Yu, Miao-Ling Lin, Ping-Heng Tan

Abstract: In-plane anisotropic layered materials (ALMs), such as black phosphorus (BP), exhibit unique angle-resolved polarized Raman (ARPR) spectroscopy characteristics, attributed to birefringence, linear dichroism, and complex Raman tensors. Moreover, the ARPR intensity profiles of BP flakes deposited on multilayer dielectrics are notably sensitive to their thickness, owing to interference effects. These intricate anisotropic effects make it challenging to accurately predict the ARPR intensity of BP flakes. In this study, we propose a comprehensive strategy for predicting the ARPR intensity of BP flakes by explicitly considering optical anisotropy, encompassing birefringence, linear dichroism, and anisotropic cavity interference effects within multilayered structures. Through this approach, we have identified the intrinsic complex Raman tensors for phonon modes, independent of the BP flake thickness. By leveraging this methodology, we have elucidated the flake thickness-dependent effective complex Raman tensor elements, allowing for precise prediction of the observed ARPR intensity profile for the BP flake. This work provides a profound understanding of ARPR behaviors for ALM flakes.

Submitted 24 March, 2024; originally announced March 2024.

Comments: 6 pages, 4 figures
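For orientation, the simplest textbook model of ARPR intensity ignores exactly the birefringence, linear dichroism, and interference effects this paper adds. It takes an in-plane Raman tensor $R=\mathrm{diag}(a, c)$ with complex elements $a=|a|e^{i\phi_a}$ and $c=|c|e^{i\phi_c}$ and computes $I\propto|\mathbf{e}_s\cdot R\cdot\mathbf{e}_i|^2$. In the parallel configuration this gives $I_\parallel\propto|a|^2\cos^4\theta+|c|^2\sin^4\theta+2|a||c|\sin^2\theta\cos^2\theta\cos(\phi_c-\phi_a)$, and in the perpendicular configuration $I_\perp\propto|c-a|^2\sin^2\theta\cos^2\theta$. The sketch below evaluates this baseline model; the tensor values are hypothetical, and $\theta$ is measured from the crystal axis associated with $a$.

```python
import numpy as np


def arpr_intensity(a: complex, c: complex, theta: float, parallel: bool = True) -> float:
    # Incident polarization at angle theta from the axis of tensor element a.
    e_i = np.array([np.cos(theta), np.sin(theta)])
    # Scattered polarization: parallel (same as e_i) or perpendicular to it.
    e_s = e_i if parallel else np.array([-np.sin(theta), np.cos(theta)])
    raman_tensor = np.array([[a, 0], [0, c]], dtype=complex)
    amplitude = e_s @ raman_tensor @ e_i
    return float(abs(amplitude) ** 2)


def parallel_closed_form(a: complex, c: complex, theta: float) -> float:
    # Expanded form of |a cos^2(theta) + c sin^2(theta)|^2.
    cos2, sin2 = np.cos(theta) ** 2, np.sin(theta) ** 2
    phase = np.angle(c) - np.angle(a)
    return float(abs(a) ** 2 * cos2 ** 2 + abs(c) ** 2 * sin2 ** 2
                 + 2 * abs(a) * abs(c) * cos2 * sin2 * np.cos(phase))


a = 1.0 * np.exp(1j * 0.0)  # hypothetical tensor element
c = 0.6 * np.exp(1j * 1.2)  # hypothetical tensor element with a relative phase
angles = np.linspace(0, np.pi, 7)
profile = [arpr_intensity(a, c, t) for t in angles]
```

The relative phase $\phi_c-\phi_a$ only enters through the cross term, which is why a complex (rather than real) Raman tensor is needed to reproduce the measured polar profiles even before thickness-dependent effects are included.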

arXiv:2403.14983 [pdf, other]

Reconstructing the evolution history of networked complex systems

Authors: Junya Wang, Yi-Jiao Zhang, Cong Xu, Jiaze Li, Jiachen Sun, Jiarong Xie, Ling Feng, Tianshou Zhou, Yanqing Hu

Abstract: The evolution processes of complex systems carry key information about the systems' functional properties. Applying machine learning algorithms, we demonstrate that the historical formation process of various networked complex systems can be extracted, including protein-protein interaction, ecology, and social network systems. The recovered evolution process demonstrates immense scientific value, such as interpreting the evolution of protein-protein interaction networks, facilitating structure prediction, and, in particular, revealing key co-evolution features of network structures, such as preferential attachment, community structure, local clustering, and degree-degree correlation, that could not be explained collectively by previous theories. Intriguingly, we discover that for large networks, if the machine learning model performs only slightly better than a random guess on the pairwise order of links, reliable restoration of the overall network formation process can be achieved. This suggests that evolution history restoration is generally highly feasible on empirical networks.

Submitted 22 March, 2024; originally announced March 2024.
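The claim that a pairwise-order predictor only slightly better than chance can still recover the overall formation order of a large network can be illustrated with a toy simulation. This is not the paper's machine learning model: the comparator below is a hypothetical coin that orders each pair of links correctly with probability `p`, and links are ranked by how many pairwise comparisons say they came first (a Copeland-style vote count, chosen here as an assumption).

```python
import numpy as np


def recovered_order_correlation(n_links: int, p: float, seed: int = 0) -> float:
    """Spearman correlation between the true formation order (link k formed
    at time k) and the order recovered from noisy pairwise votes."""
    rng = np.random.default_rng(seed)
    upper = np.triu(np.ones((n_links, n_links), dtype=bool), k=1)
    # For each pair i < j (i truly earlier), the comparator is right with prob. p.
    correct = (rng.random((n_links, n_links)) < p) & upper
    wrong = ~correct & upper
    # Votes saying "k came first": pairs (k, j > k) judged correctly
    # plus pairs (i < k, k) judged incorrectly.
    first_votes = correct.sum(axis=1) + wrong.sum(axis=0)
    # More "came first" votes means an earlier estimated position.
    est_rank = np.argsort(np.argsort(-first_votes, kind="stable"))
    true_rank = np.arange(n_links)
    # Pearson correlation of ranks equals Spearman correlation.
    return float(np.corrcoef(est_rank, true_rank)[0, 1])


rho_chance = recovered_order_correlation(400, 0.5)
rho_weak = recovered_order_correlation(400, 0.6)
```

Each link takes part in $n-1$ comparisons, so the per-pair edge of $p-0.5$ accumulates linearly with $n$ while the vote noise grows only like $\sqrt{n}$; in this toy setting a 60%-accurate comparator already yields a strong rank correlation, whereas $p=0.5$ stays near zero.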

arXiv:2403.13660 [pdf]

ProMamba: Prompt-Mamba for polyp segmentation

Authors: Jianhao Xie, Ruofan Liao, Ziang Zhang, Sida Yi, Yuesheng Zhu, Guibo Luo

Abstract: Detecting polyps through colonoscopy is an important task in medical image segmentation, which provides significant assistance and reference value for clinical surgery. However, accurate segmentation of polyps is challenging for two main reasons. First, polyps exhibit various shapes and colors. Second, the boundaries between polyps and their normal surroundings are often unclear. Additionally, significant differences between datasets limit the generalization capabilities of existing methods. To address these issues, we propose a segmentation model based on Prompt-Mamba, which incorporates the latest Vision-Mamba and prompt technologies. Compared to previous models trained on the same dataset, our model not only maintains high segmentation accuracy on the validation part of the same dataset but also demonstrates superior accuracy on unseen datasets, exhibiting excellent generalization capabilities. Notably, we are the first to apply the Vision-Mamba architecture to polyp segmentation and the first to utilize prompt technology in a polyp segmentation model. Our model efficiently accomplishes segmentation tasks, surpassing previous state-of-the-art methods by an average of 5% across six datasets. Furthermore, we have developed multiple versions of our model with scaled parameter counts, achieving better performance than previous models even with fewer parameters. Our code and trained weights will be released soon.

Submitted 26 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Comments: 10 pages, 2 figures, 3 tables

arXiv:2403.12566 [pdf, other]

doi 10.1145/3589335.3648334

Context-based Fast Recommendation Strategy for Long User Behavior Sequence in Meituan Waimai

Authors: Zhichao Feng, Junjiie Xie, Kaiyuan Li, Yu Qin, Pengfei Wang, Qianzhong Li, Bin Yin, Xiang Li, Wei Lin, Shangguang Wang

Abstract: In the recommender system of Meituan Waimai, we are dealing with ever-lengthening user behavior sequences, which pose an increasing challenge to modeling user preferences effectively. Existing sequential recommendation models often fail to capture long-term dependencies or are too complex, complicating the fulfillment of Meituan Waimai's unique business needs. To better model user interests, we consider selecting relevant sub-sequences from users' extensive historical behaviors based on their preferences. In this specific scenario, we have noticed that the contexts in which users interact have a significant impact on their preferences. For this purpose, we introduce a novel method called Context-based Fast Recommendation Strategy to tackle the issue of long sequences. We first identify contexts that share similar user preferences with the target context and then locate the corresponding PoIs based on these identified contexts. This approach eliminates the need to select a sub-sequence for every candidate PoI, thereby avoiding high time complexity. Specifically, we implement a prototype-based approach to pinpoint contexts that reflect similar user preferences. To improve accuracy and interpretability, we employ the Jensen-Shannon (JS) divergence of PoI attributes such as categories and prices as a measure of similarity between contexts. A temporal graph integrating both prototype and context nodes helps incorporate temporal information. We then identify appropriate prototypes considering both target contexts and short-term user preferences.
Following this, we utilize contexts aligned with these prototypes to generate a sub-sequence, aimed at predicting CTR and CTCVR scores with target attention. Since its inception in 2023, this strategy has been adopted in Meituan Waimai's display recommender system, leading to a 4.6% increase in CTR and a 4.2% increase in GMV.

Submitted 19 March, 2024; originally announced March 2024.

Comments: 9 pages, accepted by WWW 2024 Industry Track
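The context-similarity measure mentioned in the abstract can be illustrated with a small sketch. Here each context is summarized as a histogram of the PoI categories a user ordered from; the context names, categories, and counts are all hypothetical, and the paper's actual attribute binning (including prices) and its prototype graph are not reproduced. The Jensen-Shannon divergence is computed with base-2 logarithms, so it lies in [0, 1], and lower values mean more similar contexts.

```python
import numpy as np


def _kl(p: np.ndarray, q: np.ndarray) -> float:
    # Kullback-Leibler divergence in bits; terms with p == 0 contribute 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def js_divergence(p, q) -> float:
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # normalize counts to distributions
    m = 0.5 * (p + q)  # mixture; m > 0 wherever p > 0 or q > 0
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def rank_contexts(target, contexts: dict) -> list:
    # Sort candidate context names from most to least similar to the target.
    return sorted(contexts, key=lambda name: js_divergence(target, contexts[name]))


# Hypothetical category histograms: [rice bowls, desserts, coffee]
target_context = [8, 1, 1]  # e.g. the current "weekday lunch" context
history = {
    "weekend_afternoon": [1, 6, 3],
    "weekday_dinner": [7, 2, 1],
    "late_night": [2, 1, 7],
}
ranking = rank_contexts(target_context, history)
```

Unlike KL divergence, the JS divergence is symmetric and finite even when one context has zero orders in a category, which makes it a convenient similarity score between sparse behavior histograms.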

arXiv:2403.12455 [pdf, other]

CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation

Authors: Wenqi Zhu, Jiale Cao, Jin Xie, Shuangming Yang, Yanwei Pang

Abstract: Open-vocabulary video instance segmentation strives to segment and track instances belonging to an open set of categories in a video. The vision-language model Contrastive Language-Image Pre-training (CLIP) has shown robust zero-shot classification ability in image-level open-vocabulary tasks. In this paper, we propose a simple encoder-decoder network, called CLIP-VIS, to adapt CLIP for open-vocabulary video instance segmentation. Our CLIP-VIS adopts a frozen CLIP image encoder and introduces three modules: class-agnostic mask generation, temporal topK-enhanced matching, and weighted open-vocabulary classification. Given a set of initial queries, class-agnostic mask generation employs a transformer decoder to predict query masks along with corresponding object scores and mask IoU scores. Then, temporal topK-enhanced matching performs query matching across frames by using the K best-matched frames. Finally, weighted open-vocabulary classification first generates query visual features with mask pooling, and then performs weighted classification using object scores and mask IoU scores. Our CLIP-VIS does not require annotations of instance categories and identities. Experiments on various video instance segmentation datasets demonstrate the effectiveness of our proposed method, especially on novel categories. When using ConvNeXt-B as the backbone, our CLIP-VIS achieves AP and APn scores of 32.2% and 40.2% on the validation set of the LV-VIS dataset, outperforming OV2Seg by 11.1% and 23.9%, respectively.
We will release the source code and models at https://github.com/zwq456/CLIP-VIS.git.

Submitted 7 June, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
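
The CLIP-VIS abstract says that query features are obtained by mask pooling and then classified with the help of object scores and mask IoU scores. A minimal NumPy sketch of that idea follows, assuming per-pixel features of shape (H, W, C) and one text embedding per class. The function names, the temperature, and the way the two scores are combined (a plain product) are hypothetical choices for illustration, not the paper's formula.

```python
import numpy as np


def mask_pool(features: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average per-pixel features (H, W, C) over the pixels where mask (H, W) is True."""
    weights = mask.astype(np.float64)
    total = weights.sum()
    if total == 0:
        return np.zeros(features.shape[-1])
    return (features * weights[..., None]).sum(axis=(0, 1)) / total


def classify_query(query_feat: np.ndarray, text_embeds: np.ndarray,
                   object_score: float, mask_iou: float,
                   temperature: float = 0.01) -> np.ndarray:
    """Score one query against K class text embeddings of shape (K, C).

    Cosine similarities become a softmax over classes; scaling by
    object_score * mask_iou is a HYPOTHETICAL weighting, not CLIP-VIS's formula.
    """
    q = query_feat / (np.linalg.norm(query_feat) + 1e-8)
    t = text_embeds / (np.linalg.norm(text_embeds, axis=1, keepdims=True) + 1e-8)
    logits = (t @ q) / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return probs * (object_score * mask_iou)
```

Because the class scores are scaled by the query's confidence, a query with a low object score or a poorly predicted mask is down-weighted for every class at once.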

arXiv:2403.11465 [pdf]

Ultra-Long Homochiral Graphene Nanoribbons Grown Within h-BN Stacks for High-Performance Electronics

Authors: Bosai Lyu, Jiajun Chen, Sen Wang, Shuo Lou, Peiyue Shen, Jingxu Xie, Lu Qiu, Izaac Mitchell, Can Li, Cheng Hu, Xianliang Zhou, Kenji Watanabe, Takashi Taniguchi, Xiaoqun Wang, Jinfeng Jia, Qi Liang, Guorui Chen, Tingxin Li, Shiyong Wang, Wengen Ouyang, Oded Hod, Feng Ding, Michael Urbakh, Zhiwen Shi

Abstract: Van der Waals encapsulation of two-dimensional materials within hexagonal boron nitride (h-BN) stacks has proven to be a promising way to create ultrahigh-performance electronic devices. However, contemporary approaches for achieving van der Waals encapsulation, which involve artificial layer stacking using mechanical transfer techniques, are difficult to control, prone to contamination, and unscalable. Here, we report on the transfer-free direct growth of high-quality graphene nanoribbons (GNRs) within h-BN stacks. The as-grown embedded GNRs exhibit highly desirable features: they are ultralong (up to 0.25 mm), ultranarrow (< 5 nm), and homochiral with zigzag edges. Our atomistic simulations reveal that the mechanism underlying the embedded growth involves ultralow GNR friction when sliding between AA'-stacked h-BN layers. Using the grown structures, we demonstrate the transfer-free fabrication of embedded GNR field-effect devices that exhibit excellent performance at room temperature, with mobilities of up to 4,600 $\mathrm{cm}^{2}\,\mathrm{V}^{-1}\,\mathrm{s}^{-1}$ and on-off ratios of up to $10^{6}$. This paves the way to the bottom-up fabrication of high-performance electronic devices based on embedded layered materials.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.10732 [pdf, other]

Variance-Dependent Regret Bounds for Non-stationary Linear Bandits

Authors: Zhiyong Wang, Jize Xie, Yi Chen, John C. S. Lui, Dongruo Zhou

Abstract: We investigate the non-stationary stochastic linear bandit problem, where the reward distribution evolves in each round. Existing algorithms characterize the non-stationarity by the total variation budget $B_K$, which is the sum of the changes between consecutive feature vectors of the linear bandit over $K$ rounds. However, this quantity only measures the non-stationarity of the expectation of the reward distribution, which makes existing algorithms sub-optimal in the general non-stationary distribution setting. In this work, we propose algorithms that utilize the variance of the reward distribution as well as $B_K$, and show that they achieve tighter regret upper bounds. Specifically, we introduce two novel algorithms: Restarted Weighted$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$. These algorithms address the cases where the variance information of the rewards is known and unknown, respectively. Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary stochastic linear bandits under different settings. Experimental evaluations further validate the superior performance of our proposed algorithms over existing works.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 30 pages
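
The variance-dependent bandit abstract defines $B_K$ and $V_K$ only in words. A minimal sketch of both quantities follows, assuming the Euclidean norm for the per-round change and assuming that $V_K$ is the sum of the per-round reward-noise variances $\sigma_k^2$; both are common conventions that we adopt here, not definitions taken from the abstract.

```python
import numpy as np


def total_variation_budget(thetas: np.ndarray) -> float:
    """B_K = sum_{k=1}^{K-1} ||theta_{k+1} - theta_k||_2 for a (K, d) array of per-round vectors."""
    diffs = np.diff(thetas, axis=0)
    return float(np.linalg.norm(diffs, axis=1).sum())


def total_variance(sigmas: np.ndarray) -> float:
    """V_K = sum_k sigma_k^2, the total variance of the reward noise (assumed definition)."""
    return float(np.sum(np.asarray(sigmas) ** 2))
```

For example, 100 rounds with noise standard deviation 0.1 give $V_K \approx 1$, which is much smaller than $K = 100$; this is the regime in which the abstract reports the largest gains.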

arXiv:2403.10574 [pdf, other]

Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers

Authors: Jinxia Xie, Bineng Zhong, Zhiyi Mo, Shengping Zhang, Liangtao Shi, Shuxiang Song, Rongrong Ji

Abstract: Rich spatio-temporal information is crucial for capturing complicated target appearance variations in visual tracking. However, most top-performing tracking algorithms rely on many hand-crafted components for spatio-temporal information aggregation. Consequently, spatio-temporal information is far from fully explored. To alleviate this issue, we propose an adaptive tracker with spatio-temporal transformers (named AQATrack), which adopts simple autoregressive queries to effectively learn spatio-temporal information without many hand-designed components. First, we introduce a set of learnable, autoregressive queries to capture instantaneous target appearance changes in a sliding-window fashion. Then, we design a novel attention mechanism for the interaction of existing queries to generate a new query in the current frame. Finally, based on the initial target template and the learnt autoregressive queries, a spatio-temporal information fusion module (STM) is designed for spatio-temporal information aggregation to locate the target object. Benefiting from the STM, we can effectively combine the static appearance and instantaneous changes to guide robust tracking. Extensive experiments show that our method significantly improves the tracker's performance on six popular tracking benchmarks: LaSOT, LaSOText, TrackingNet, GOT-10k, TNL2K, and UAV123.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.09181 [pdf, ps, other]

On the dynamical Mordell-Lang conjecture in positive characteristic

Authors: Junyi Xie, She Yang

Abstract: We disprove the original version of the dynamical Mordell-Lang conjecture in positive characteristic and propose an improved version of this pDML conjecture. We prove that this new version holds for bounded-degree self-maps of projective varieties. Moreover, we propose a geometric version of this pDML conjecture.

Submitted 13 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: 35 pages, most of the article is rewritten

arXiv:2403.08931 [pdf, ps, other]

Unleashing the True Power of Age-of-Information: Service Aggregation in Connected and Autonomous Vehicles

Authors: Anik Mallik, Dawei Chen, Kyungtae Han, Jiang Xie, Zhu Han

Abstract: Connected and autonomous vehicles (CAVs) rely heavily on time-sensitive information update services to ensure the safety of people and assets and to support satisfactory entertainment applications. Therefore, the freshness of information is a crucial performance metric for CAV services. However, information from roadside sensors and nearby vehicles can be delayed in transmission due to the high mobility of vehicles. Our research shows that a CAV's relative distance and speed play an essential role in determining the Age-of-Information (AoI). As the AoI increases, service aggregation issues caused by out-of-sequence information updates become more frequent, which hampers the performance of low-latency applications in CAVs. In this paper, we propose a novel AoI-based service aggregation method for CAVs, which processes information updates according to their update cycles. First, the AoI for sensors and vehicles is modeled, and a predictive AoI system is designed. Then, to reduce the overall service aggregation time and computational load, intervals are used for periodic AoI prediction, and information sources are clustered based on their AoI values. Finally, the system aggregates services for CAV applications using the predicted AoI. We evaluate the system performance based on the data sequencing success rate (DSSR) and overall system latency, and we compare our proposed system with three other state-of-the-art methods.
The evaluation and comparison results show that our proposed predictive AoI-based service aggregation system maintains satisfactory latency and DSSR for CAV applications and outperforms other existing methods.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 6 pages, 8 figures, to appear in the Proceedings of IEEE International Conference on Communications (IEEE ICC, 9-13 June 2024, Denver, CO, USA)
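
The AoI service-aggregation abstract relies on the usual Age-of-Information definition, $\Delta(t) = t - u(t)$, where $u(t)$ is the generation time of the freshest update received by time $t$. The sketch below computes that quantity, reorders out-of-sequence arrivals by generation time, and groups sources by AoI interval. The `Update` record and the fixed-width bucketing in `bucket_by_aoi` are hypothetical stand-ins; the paper's predictive AoI model and clustering method are not described in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Update:
    source: str
    generated_at: float  # time the source produced the update (s)
    received_at: float   # time the CAV received it (s)


def age_of_information(updates, source, now):
    """AoI(t) = t - u(t), with u(t) the generation time of the freshest update received by t."""
    received = [u.generated_at for u in updates
                if u.source == source and u.received_at <= now]
    if not received:
        return float("inf")  # nothing received yet from this source
    return now - max(received)


def in_generation_order(updates):
    """Reorder out-of-sequence arrivals by generation time before aggregating them."""
    return sorted(updates, key=lambda u: u.generated_at)


def bucket_by_aoi(aoi_by_source, interval):
    """HYPOTHETICAL grouping: sources whose AoI falls in the same interval share a bucket."""
    buckets = {}
    for src, aoi in aoi_by_source.items():
        buckets.setdefault(int(aoi // interval), []).append(src)
    return buckets
```

Note that an update arriving late (generated at 1.5 s but received after one generated at 2.0 s) does not lower the AoI, because the age is driven by the freshest generation time.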

arXiv:2403.08154 [pdf, other]

The Effect of Different Optimization Strategies to Physics-Constrained Deep Learning for Soil Moisture Estimation

Authors: Jianxin Xie, Bing Yao, Zheyu Jiang

Abstract: Soil moisture is a key hydrological parameter of great importance to human society and the environment. Accurate modeling and monitoring of soil moisture in crop fields, especially in the root zone (top 100 cm of soil), is essential for improving agricultural production and crop yield with the help of precision irrigation and farming tools. Realizing the full potential of sensor data depends greatly on advanced analytical and predictive domain-aware models. In this work, we propose a physics-constrained deep learning (P-DL) framework that integrates physics-based principles of water transport with water sensing signals for effective reconstruction of soil moisture dynamics. We adopt three different optimizers, namely Adam, RMSprop, and GD, to minimize the loss function of P-DL during training. In the illustrative case study, we demonstrate that the Adam optimizer achieves better empirical convergence than the other optimization methods in both mini-batch and full-batch training.

Submitted 12 March, 2024; originally announced March 2024.
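
The optimization-strategies abstract compares Adam, RMSprop, and plain gradient descent (GD) on the P-DL loss. The sketch below only illustrates the standard Adam update rule next to GD on a toy one-dimensional quadratic, $f(x) = (x - 3)^2$; the learning rate, step count, and test function are arbitrary choices of ours and say nothing about the P-DL case study.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a scalar function with the standard Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias corrections for zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def gd_minimize(grad, x0, lr=0.1, steps=1000):
    """Plain full-batch gradient descent with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x
```

Adam's per-parameter scaling by $\sqrt{\hat v}$ makes its step size roughly independent of the gradient's magnitude, whereas GD's step is proportional to it; on a well-conditioned toy problem like this one both converge, so any advantage of Adam on P-DL comes from the real loss landscape.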

arXiv:2403.07228 [pdf, other]

Physics-constrained Active Learning for Soil Moisture Estimation and Optimal Sensor Placement

Authors: Jianxin Xie, Bing Yao, Zheyu Jiang

Abstract: Soil moisture is a crucial hydrological state variable of great importance to the global environment and agriculture. Precise monitoring of soil moisture in crop fields is critical to reducing agricultural drought and improving crop yield. In-situ soil moisture sensors, which are buried at pre-determined depths and distributed across the field, are promising solutions for monitoring soil moisture. However, high-density sensor deployment is neither economically feasible nor practical. Thus, to achieve a higher spatial resolution of soil moisture dynamics using a limited number of sensors, we integrate a physics-based agro-hydrological model based on Richards' equation into a physics-constrained deep learning framework to accurately predict soil moisture dynamics in the root zone. This approach ensures that soil moisture estimates align well with sensor observations while also obeying physical laws. Furthermore, to strategically identify sensor placement locations, we introduce a novel active learning framework that combines space-filling design and physics residual-based sampling to maximize data acquisition potential with limited sensors. Our numerical results demonstrate that integrating Physics-constrained Deep Learning (P-DL) with an active learning strategy within a unified framework, named the Physics-constrained Active Learning (P-DAL) framework, significantly improves the predictive accuracy and effectiveness of field-scale soil moisture monitoring using in-situ sensors.

Submitted 11 March, 2024; originally announced March 2024.
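
Richards' equation is cited but not written out in the P-DAL abstract. For reference, a standard one-dimensional (vertical) form is given below, together with the physics residual $r$ that a physics-constrained network could penalize. We assume $z$ is the vertical coordinate (positive upward), $\theta$ the volumetric water content, $h$ the pressure head, $K(h)$ the unsaturated hydraulic conductivity, and $S$ a sink term such as root water uptake. The paper's exact form, boundary conditions, and loss weighting are not stated in the abstract, so the weight $\lambda$ and the composite loss $\mathcal{L}$ are illustrative.

```latex
\frac{\partial \theta}{\partial t}
  = \frac{\partial}{\partial z}\left[ K(h)\left( \frac{\partial h}{\partial z} + 1 \right) \right] - S,
\qquad
r = \frac{\partial \theta}{\partial t}
  - \frac{\partial}{\partial z}\left[ K(h)\left( \frac{\partial h}{\partial z} + 1 \right) \right] + S,
\qquad
\mathcal{L} = \mathcal{L}_{\text{data}} + \lambda \, \frac{1}{N_r} \sum_{j=1}^{N_r} r(z_j, t_j)^2
```

Under this reading, collocation points $(z_j, t_j)$ with large $|r|$ are natural candidates for the physics residual-based sampling the abstract mentions.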

arXiv:2403.06396 [pdf, ps, other]

A Segmentation Foundation Model for Diverse-type Tumors

Authors: Jianhao Xie, Ziang Zhang, Guibo Luo, Yuesheng Zhu

Abstract: Large pre-trained models with numerous parameters and extensive training datasets have shown excellent performance in various tasks. Many publicly available medical image datasets lack sufficient data, so there are few large-scale models in medical imaging. We propose a large-scale Tumor Segmentation Foundation Model (TSFM) with 1.6 billion parameters, using a Resblock backbone and a Transformer bottleneck, which has good transfer ability for downstream tasks. To make TSFM perform well in tumor segmentation, we make full use of the strong spatial correlation between tumors and organs in medical images and fuse 7 tumor datasets and 3 multi-organ datasets into a 3D medical dataset pool of 2779 cases with 300k medical images in total, which currently exceeds the size of many individual publicly available datasets. TSFM is a pre-trained model for medical image segmentation that can also be fine-tuned for multiple downstream tasks. The average performance of our pre-trained model is 2% higher than that of nnU-Net across various tumor types. In the transfer learning task, TSFM needs only 5% of nnU-Net's training epochs to achieve similar performance and can surpass nnU-Net by 2% on average with 10% of the training epochs. Pre-trained TSFM and its code will be released soon.

Submitted 10 March, 2024; originally announced March 2024.

Comments: 10 pages, 2 figures. About medical image segmentation and foundation models

ACM Class: I.4.6

arXiv:2403.01676 [pdf, ps, other]

Production of $X_b$ via radiative transition of $\Upsilon(10753)$

Authors: Shi-Dong Liu, Hao-Dong Cai, Zu-Xin Cai, Hong-Shuo Gao, Gang Li, Fan Wang, Ju-Jun Xie

Abstract: We studied the radiative transitions between the $\Upsilon(10753)$, the $S$-$D$ mixed state of the $\Upsilon(4S)$ and $\Upsilon_1(3\,{}^3D_1)$, and the $X_b$, the heavy quark flavor symmetry counterpart of the $X(3872)$ in the bottomonium sector. The radiative transition was assumed to occur through intermediate bottom mesons, including the $P$-wave $B_1^{(\prime)}$ mesons as well as the $S$-wave $B^{(*)}$ ones. Including the $B_1^{(\prime)}$ mesons makes the couplings $S$-wave, and hence enhances the contributions of the intermediate meson loops. The radiative decay width for $\Upsilon(10753)\to\gamma X_b$ is predicted to be of the order of $10~\mathrm{keV}$, corresponding to a branching fraction of $10^{-4}$. Based on these theoretical results, we strongly suggest searching for the $X_b$ in the process $e^+e^-\to\gamma X_b$ with $X_b\to\pi\pi\chi_{b1}$ near $\sqrt{s}=10.754~\mathrm{GeV}$, and we hope that these calculations can be tested by future Belle II experiments.

Submitted 9 May, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: 7 pages, 4 figures, accepted by PRD(20240510)

arXiv:2403.01456 [pdf]

Controlling Cloze-test Question Item Difficulty with PLM-based Surrogate Models for IRT Assessment

Authors: Jingshen Zhang, Jiajun Xie, Xinying Qiu

Abstract: Item difficulty plays a crucial role in adaptive testing. However, few works have focused on generating questions of varying difficulty levels, especially for multiple-choice (MC) cloze tests. We propose training pre-trained language models (PLMs) as surrogate models to enable item response theory (IRT) assessment, avoiding the need for human test subjects. We also propose two strategies to control the difficulty levels of both the gaps and the distractors using ranking rules to reduce invalid distractors. Experimentation on a benchmark dataset demonstrates that our proposed framework and methods can effectively control and evaluate the difficulty levels of MC cloze tests.

Submitted 3 March, 2024; originally announced March 2024.
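
The cloze-test abstract does not say which IRT model is used. A minimal sketch follows, assuming the common two-parameter logistic (2PL) model and a Rasch-style (discrimination fixed at 1) difficulty estimate for a single item. It treats the surrogate PLMs' abilities as already known, which is a simplification of ours; `estimate_difficulty` is a hypothetical helper, not the paper's procedure.

```python
import math


def p_correct(theta: float, difficulty: float, discrimination: float = 1.0) -> float:
    """2PL IRT: probability that a test taker with ability theta answers correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def estimate_difficulty(abilities, responses, lo=-6.0, hi=6.0, iters=60):
    """HYPOTHETICAL Rasch-style estimate of one item's difficulty b by bisection.

    abilities: known ability of each surrogate model; responses: 1 if it answered correctly.
    Finds b such that the expected number of correct answers equals the observed number.
    If every response is 0 (or 1), the estimate runs to the bracket edge hi (or lo).
    """
    observed = sum(responses)
    for _ in range(iters):
        mid = (lo + hi) / 2
        expected = sum(p_correct(a, mid) for a in abilities)
        if expected > observed:  # item looks too easy at this b, so increase b
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

With abilities $(-1, 0, 1)$ and two of three surrogates answering correctly, the estimate is negative, i.e., the item is easier than an average test taker, which matches intuition.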

arXiv:2403.00331 [pdf, other]

WindGP: Efficient Graph Partitioning on Heterogenous Machines

Authors: Li Zeng, Haohan Huang, Binfan Zheng, Kang Yang, Shengcheng Shao, Jinhua Zhou, Jun Xie, Rongqian Zhao, Xin Chen

Abstract: Graph partitioning is widely used in many real-world applications, such as fraud detection and social network analysis, to enable distributed graph computing on large graphs. However, existing works fail to balance computation cost and communication cost on machines with different capabilities (including computing power, network bandwidth, and memory size), as they only consider the replication factor and neglect the differences between machines in realistic data centers. In this paper, we propose a general graph partitioning algorithm, WindGP, which supports fast and high-quality edge partitioning on heterogeneous machines. WindGP designs novel preprocessing techniques to simplify the metric and balance the computation cost according to the characteristics of graphs and machines. Also, WindGP uses best-first search instead of BFS and DFS to generate clusters with high cohesion. Furthermore, WindGP adaptively tunes the partition results with sophisticated local search methods. Extensive experiments show that WindGP outperforms all state-of-the-art partition methods by 1.35 to 27 times on both dense and sparse distributed graph algorithms, and scales well with graph size and the number of machines.

Submitted 6 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

Comments: 19 pages, 15 figures, 18 tables
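
The WindGP abstract contrasts the replication factor, which existing partitioners optimize, with machine heterogeneity, which they ignore. The sketch below computes the standard replication factor of an edge partition (average number of partitions in which each vertex appears) and a capacity-normalized load imbalance. The imbalance metric is a hypothetical illustration of "heterogeneity-aware" balance, not WindGP's actual cost model.

```python
from collections import defaultdict


def replication_factor(edge_parts):
    """edge_parts: iterable of ((u, v), partition_id). Returns average #partitions per vertex."""
    replicas = defaultdict(set)
    for (u, v), p in edge_parts:
        replicas[u].add(p)
        replicas[v].add(p)
    return sum(len(ps) for ps in replicas.values()) / len(replicas)


def edge_loads(edge_parts, num_parts):
    """Number of edges assigned to each partition."""
    loads = [0] * num_parts
    for _, p in edge_parts:
        loads[p] += 1
    return loads


def weighted_imbalance(loads, capacities):
    """HYPOTHETICAL metric: worst load/capacity ratio relative to the ideal ratio.

    1.0 means perfectly balanced for the given machine capacities.
    """
    ideal = sum(loads) / sum(capacities)
    return max(l / c for l, c in zip(loads, capacities)) / ideal
```

The same edge assignment can be balanced on identical machines yet badly imbalanced on heterogeneous ones: loads of [2, 2] give an imbalance of 1.0 for capacities [1, 1] but 2.0 for capacities [3, 1].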

arXiv:2402.18035 [pdf, other]

doi 10.1088/1674-4527/ad2dbe

A study of 10 Rotating Radio Transients using Parkes radio telescope

Authors: Xinhui Ren, Jingbo Wang, Wenming Yan, Jintao Xie, Shuangqiang Wang, Yirong Wen, Yong Xia

Abstract: Rotating Radio Transients (RRATs) are a relatively new subclass of pulsars that emit detectable radio bursts sporadically. We analyzed 10 RRATs observed with the Parkes telescope, 8 of which were observed with the Ultra-Wideband Receiver. We measured the burst rate and produced integrated profiles spanning multiple frequency bands for 3 RRATs. We also conducted a spectral analysis of both the integrated pulses and the individual pulses of 3 RRATs. The spectra of their integrated pulses all follow a simple power law, consistent with the known range of pulsar spectral indices. Their average single-pulse spectral indices are -0.9, -1.2, and -1.0, respectively, which are within the known range of pulsar spectral indices. Additionally, we find that the spreads of single-pulse spectral indices for these RRATs (ranging from -3.5 to +0.5) are narrower than those observed in other RRATs (Shapiro-Albert et al. 2018; Xie et al. 2022). Notably, both the average spectral index and the scatter of single-pulse spectral indices are relatively small. For the remaining 5 RRATs observed with the UWL receiver, we also provide upper limits on fluence and flux density. In addition, we obtained the timing solution of PSR J1709-43. Our analysis shows that PSRs J1919+1745, J1709-43 and J1649-4653 are potentially nulling pulsars or weak pulsars with sparse strong pulses.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 16 pages, 8 figures, RAA accepted
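
The RRAT abstract reports power-law spectral indices. A minimal sketch of how such an index is usually obtained follows: a straight-line least-squares fit in log-log space to $S = S_0 (\nu/\nu_{\mathrm{ref}})^{\alpha}$. The unweighted fit and the geometric-mean reference frequency are simplifying assumptions of ours; real analyses typically weight each point by its flux uncertainty.

```python
import numpy as np


def fit_spectral_index(freqs_mhz, flux_mjy):
    """Fit S = S0 * (nu / nu_ref)**alpha by unweighted least squares in log-log space."""
    freqs = np.asarray(freqs_mhz, dtype=float)
    flux = np.asarray(flux_mjy, dtype=float)
    nu_ref = np.exp(np.mean(np.log(freqs)))  # geometric-mean reference frequency
    x = np.log(freqs / nu_ref)
    y = np.log(flux)
    alpha, log_s0 = np.polyfit(x, y, 1)      # slope = spectral index, intercept = ln S0
    return alpha, np.exp(log_s0), nu_ref
```

Because the Parkes Ultra-Wideband Receiver spans several GHz (roughly 0.7 to 4 GHz), a single observation gives a long lever arm in $\log\nu$, which is what makes per-pulse index estimates like those in the abstract feasible.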

arXiv:2402.17365 [pdf, ps, other]

doi 10.1088/0256-307X/41/2/029701

A Search for Radio Pulsars in Supernova Remnants Using FAST with One Pulsar Discovered

Authors: Zhen Zhang, Wen-Ming Yan, Jian-Ping Yuan, Na Wang, Jun-Tao Bai, Zhi-Gang Wen, Bao-Da Li, Jin-Tao Xie, De Zhao, Yu-Bin Wang, Nan-Nan Zhai

Abstract: We report on the results of a search for radio pulsars in five supernova remnants (SNRs) with FAST. The observations were made using the 19-beam receiver in the Snapshot mode, with an integration time of 10 min per pointing. We discovered a new pulsar, PSR J1845$-$0306, which has a spin period of 983.6 ms and a dispersion measure of 444.6$\pm$2.0 cm$^{-3}$ pc, in observations of SNR G29.6+0.1. Further verification is needed to determine whether the pulsar is associated with the SNR. We also re-detected some known pulsars in the data from SNRs G29.6+0.1 and G29.7$-$0.3. No pulsars were detected in observations of the other three SNRs.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 6 pages, 2 figures, 2 tables; published in CPL

Journal ref: Chin. Phys. Lett. 2024, 41 (2): 029701 February 2024
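
To put the reported dispersion measure of PSR J1845$-$0306 in context, the cold-plasma dispersion delay between two observing frequencies is $\Delta t \approx 4.149~\mathrm{ms} \times \mathrm{DM} \times (f_{\mathrm{lo}}^{-2} - f_{\mathrm{hi}}^{-2})$, with frequencies in GHz and DM in cm$^{-3}$ pc. The sketch below evaluates it; the 1.05 to 1.45 GHz band edges are our assumption for the FAST 19-beam receiver and are not given in the abstract.

```python
K_DM_MS = 4.148808  # dispersion constant in ms GHz^2 cm^3 pc^-1 (approximate)


def dispersion_delay_ms(dm: float, f_lo_ghz: float, f_hi_ghz: float) -> float:
    """Extra arrival delay (ms) at f_lo relative to f_hi for a dispersion measure dm (cm^-3 pc)."""
    return K_DM_MS * dm * (f_lo_ghz ** -2 - f_hi_ghz ** -2)


delay = dispersion_delay_ms(444.6, 1.05, 1.45)  # about 796 ms across the assumed band
```

Under these assumptions the sweep across the band is about 0.8 of the 983.6 ms spin period, so without dedispersion the pulse would be smeared over most of a rotation.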

arXiv:2402.17179 [pdf, other]

Dual-Space Optimization: Improved Molecule Sequence Design by Latent Prompt Transformer

Authors: Deqian Kong, Yuhao Huang, Jianwen Xie, Edouardo Honig, Ming Xu, Shuanghong Xue, Pei Lin, Sanping Zhou, Sheng Zhong, Nanning Zheng, Ying Nian Wu

Abstract: Designing molecules with desirable properties, such as drug-likeness and high binding affinities towards protein targets, is a challenging problem. In this paper, we propose the Dual-Space Optimization (DSO) method, which integrates latent space sampling and data space selection to solve this problem. DSO iteratively updates a latent space generative model and a synthetic dataset in an optimization process that gradually shifts the generative model and the synthetic data towards regions of desired property values. Our generative model takes the form of a Latent Prompt Transformer (LPT), where the latent vector serves as the prompt of a causal transformer. Our extensive experiments demonstrate the effectiveness of the proposed method, which sets new performance benchmarks across single-objective, multi-objective and constrained molecule design tasks.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.15116 [pdf, other]

Large Multimodal Agents: A Survey

Authors: Junlin Xie, Zhihong Chen, Ruifei Zhang, Xiang Wan, Guanbin Li

Abstract: Large language models (LLMs) have achieved superior performance in powering text-based AI agents, endowing them with human-like decision-making and reasoning abilities. Concurrently, there is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain. This extension enables AI agents to interpret and respond to diverse multimodal user queries, thereby handling more intricate and nuanced tasks. In this paper, we conduct a systematic review of LLM-driven multimodal agents, which we refer to as large multimodal agents (LMAs for short). First, we introduce the essential components involved in developing LMAs and categorize the current body of research into four distinct types. Subsequently, we review collaborative frameworks that integrate multiple LMAs to enhance collective efficacy. One of the critical challenges in this field is the diversity of evaluation methods used across existing studies, which hinders effective comparison among different LMAs. Therefore, we compile these evaluation methodologies and establish a comprehensive framework to bridge the gaps. This framework aims to standardize evaluations, facilitating more meaningful comparisons. Concluding our review, we highlight the extensive applications of LMAs and propose possible future research directions. Our discussion aims to provide valuable insights and guidelines for future research in this rapidly evolving field. An up-to-date resource list is available at https://github.com/jun0wanan/awesome-large-multimodal-agents.

Submitted 23 February, 2024; originally announced February 2024.

Comments: 15 pages, 4 figures

arXiv:2402.15069 [pdf, other]

Investigation of profile shifting and subpulse movement in PSR J0344-0901 with FAST

Authors: H. M. Tedila, R. Yuen, N. Wang, D. Li, Z. G. Wen, W. M. Yan, J. P. Yuan, X. H. Han, P. Wang, W. W. Zhu, S. J. Dang, S. Q. Wang, J. T. Xie, Q. D. Wu, Sh. Khasanov, FAST Collaboration

Abstract: We report two phenomena detected in PSR J0344$-$0901 from two observations centered at 1.25 GHz using the Five-hundred-meter Aperture Spherical radio Telescope (FAST). The first phenomenon is a shift of the pulse emission to later longitudinal phases, after which the emission gradually returns to its original location. The event lasts for about 216 pulse periods, with an average shift of about $0.7^\circ$ measured at the peak of the integrated profile. Changes in the polarization position angle (PPA) are detected around the trailing edge of the profile, together with an increase in the profile width. The second phenomenon is the apparent movement of subpulses, which produces different subpulse track patterns across the profile window. For the first time in this pulsar, we identify four emission modes, each with unique subpulse movement, and determine the pattern periods for three of them. Pulse nulling was not detected. Modeling the changes in the PPA with the rotating vector model gives an inclination angle of $75.12^\circ \pm 3.80^\circ$ and an impact parameter of $-3.17^\circ \pm 5.32^\circ$ for this pulsar. We speculate that the subpulse movement may be related to the shifting of the pulse emission.

Submitted 22 February, 2024; originally announced February 2024.
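
The PSR J0344$-$0901 abstract fits the PPA with the rotating vector model (RVM). A minimal sketch of one common form of the RVM follows, with inclination $\alpha$, impact parameter $\beta$, viewing angle $\zeta = \alpha + \beta$, and the result wrapped to $[-90^\circ, 90^\circ)$. Sign conventions for $\beta$ and for the position angle differ between papers, so this need not match the authors' convention; the default $\phi_0 = \psi_0 = 0$ is an assumption for illustration.

```python
import math


def rvm_ppa_deg(phi_deg, alpha_deg, beta_deg, phi0_deg=0.0, psi0_deg=0.0):
    """Rotating vector model position angle (deg), wrapped to [-90, 90).

    alpha: magnetic inclination; beta: impact parameter; zeta = alpha + beta.
    Uses one common sign convention; others flip the sign of the swing.
    """
    a = math.radians(alpha_deg)
    z = math.radians(alpha_deg + beta_deg)
    dphi = math.radians(phi_deg - phi0_deg)
    num = math.sin(a) * math.sin(dphi)
    den = math.sin(z) * math.cos(a) - math.cos(z) * math.sin(a) * math.cos(dphi)
    psi = psi0_deg + math.degrees(math.atan2(num, den))
    return (psi + 90.0) % 180.0 - 90.0  # position angles are defined modulo 180 deg


ppa = [rvm_ppa_deg(p, 75.12, -3.17) for p in range(-30, 31, 10)]
```

At the fiducial longitude $\phi_0$ the swing is steepest, with slope $d\psi/d\phi = \sin\alpha / \sin\beta$; for the fitted values this is roughly $-17.5$, which is why a small $|\beta|$ produces the rapid PPA swing that constrains the fit.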

arXiv:2402.14789 [pdf, other]

Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning

Authors: Johnathan Xie, Yoonho Lee, Annie S. Chen, Chelsea Finn

Abstract: Self-supervised learning excels at learning representations from large amounts of unlabeled data and has demonstrated success across multiple data modalities. Yet extending self-supervised learning to new modalities is non-trivial because the specifics of existing methods, such as domain-specific augmentations that reflect the invariances in the target task, are tailored to each domain. While masked modeling is promising as a domain-agnostic framework for self-supervised learning because it does not rely on input augmentations, its mask sampling procedure remains domain-specific. We present Self-guided Masked Autoencoders (SMA), a fully domain-agnostic masked modeling method. SMA trains an attention-based model with a masked modeling objective while learning the masks to sample, without any domain-specific assumptions. We evaluate SMA on three self-supervised learning benchmarks in protein biology, chemical property prediction, and particle physics. We find that SMA learns representations without domain-specific knowledge and achieves state-of-the-art performance on these three benchmarks.

Submitted 22 February, 2024; originally announced February 2024.

Comments: ICLR 2024

arXiv:2402.13598 [pdf, other]

User-LLM: Efficient LLM Contextualization with User Embeddings

Authors: Lin Ning, Luyang Liu, Jiaxing Wu, Neo Wu, Devora Berlowitz, Sushant Prakash, Bradley Green, Shawn O'Banion, Jun Xie

Abstract: Large language models (LLMs) have revolutionized natural language processing. However, effectively incorporating complex and potentially noisy user interaction data remains a challenge. To address this, we propose User-LLM, a novel framework that leverages user embeddings to contextualize LLMs. These embeddings, distilled from diverse user interactions using self-supervised pretraining, capture latent user preferences and their evolution over time. We integrate these user embeddings with LLMs through cross-attention and soft-prompting, enabling LLMs to dynamically adapt to user context. Our comprehensive experiments on MovieLens, Amazon Review, and Google Local Review datasets demonstrate significant performance gains across various tasks. Notably, our approach outperforms text-prompt-based contextualization on long sequence tasks and tasks that require deep user understanding while being computationally efficient. We further incorporate Perceiver layers to streamline the integration between user encoders and LLMs, reducing computational demands.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.12908 [pdf, other]

RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models

Authors: Xinchen Zhang, Ling Yang, Yaqi Cai, Zhaochen Yu, Kai-Ni Wang, Jiake Xie, Ye Tian, Minkai Xu, Yong Tang, Yujiu Yang, Bin Cui

Abstract: Diffusion models have achieved remarkable advances in text-to-image generation. However, existing models still struggle with multiple-object compositional generation. In this paper, we propose RealCompo, a new training-free and transfer-friendly text-to-image generation framework that leverages the respective advantages of text-to-image models and spatial-aware image diffusion models (e.g., layout, keypoints, and segmentation maps) to enhance both the realism and the compositionality of the generated images. An intuitive and novel balancer dynamically balances the strengths of the two models during the denoising process, allowing plug-and-play use of any model without extra training. Extensive experiments show that RealCompo consistently outperforms state-of-the-art text-to-image models and spatial-aware image diffusion models in multiple-object compositional generation while maintaining satisfactory realism and compositionality. Notably, RealCompo can be seamlessly extended with a wide range of spatial-aware image diffusion models and stylized diffusion models. Our code is available at: https://github.com/YangLing0818/RealCompo

Submitted 24 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: Project: https://github.com/YangLing0818/RealCompo

arXiv:2402.12678 [pdf, ps, other]

Algebraic dynamics and recursive inequalities

Authors: Junyi Xie

Abstract: We obtain three basic results in algebraic dynamics: (1) We give the first algorithm to compute dynamical degrees to arbitrary precision. (2) We prove that, for a family of dominant rational self-maps, the dynamical degrees are lower semi-continuous with respect to the Zariski topology. This implies a conjecture of Call and Silverman. (3) We prove that the set of periodic points of a cohomologically hyperbolic rational self-map is Zariski dense. Moreover, we show that, after a large iterate, every degree sequence grows at an almost uniform rate, a property that general submultiplicative sequences do not satisfy. Finally, we prove the Kawaguchi-Silverman conjecture for a class of self-maps of projective surfaces that includes all birational ones. In fact, for every dominant rational self-map, we find a family of recursive inequalities for some dynamically meaningful cycles. Our proofs are based on these inequalities.

Submitted 19 February, 2024; originally announced February 2024.

Comments: 45 pages

arXiv:2402.11428 [pdf, other]

Modelling The Radial Distribution of Pulsars in the Galaxy

Authors: J. T. Xie, J. B. Wang, N. Wang, R. Manchester, G. Hobbs

Abstract: The Parkes 20 cm Multibeam pulsar surveys have discovered nearly half of the known pulsars and revealed many distant pulsars with high dispersion measures. Using a sample of 1,301 pulsars from these surveys, we have explored the spatial distribution and birth rate of normal pulsars. The pulsar distances used to calculate the pulsar surface density are estimated from the YMW16 electron-density model. To estimate the impact of the Galactic background radiation on our survey, we projected pulsars in the Galaxy onto the Galactic plane, assuming that the flux density distribution of pulsars is uniform in all directions, and used the most up-to-date background temperature map. We also used an up-to-date version of the ATNF Pulsar Catalogue to model the distribution of pulsar flux densities at 1400 MHz. We derive an improved radial distribution for the pulsar surface density projected onto the Galactic plane, which peaks at $\sim$4 kpc from the Galactic Centre. We also derive the local surface density and birth rate of pulsars, obtaining 47 $\pm$ 5 $\mathrm{kpc^{-2}}$ and $\sim$ 4.7 $\pm$ 0.5 $\mathrm{kpc^{-2}\ Myr^{-1}}$, respectively. For the total number of potentially detectable pulsars in the Galaxy, we obtain (1.1 $\pm$ 0.2) $\times$ $10^{4}$ and (1.1 $\pm$ 0.2) $\times$ $10^{5}$ before and after applying the TM98 beaming correction model, respectively. The radial distribution function is used to estimate the proportion of pulsars in each spiral arm and in the Galactic Centre.

Submitted 22 February, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.10045 [pdf]

Short-Form Videos and Mental Health: A Knowledge-Guided Neural Topic Model

Authors: Jiaheng Xie, Ruicheng Liang, Yidong Chai, Yang Liu, Daniel Zeng

Abstract: As short-form videos reshape the entire social media landscape, experts are deeply worried about their depressive impacts on viewers, as evidenced by medical studies. To prevent widespread consequences, platforms are eager to predict these videos' impact on viewers' mental health. They can then take intervention measures, such as revising recommendation algorithms and displaying viewer discretion warnings. Yet existing predictive methods are not grounded in well-established medical knowledge, which outlines clinically proven external and environmental factors of depression. To account for such medical knowledge, we turn to an emerging methodological discipline, seeded Neural Topic Models (NTMs). However, existing seeded NTMs suffer from single-origin topics, unknown topic sources, unclear seed supervision, and suboptimal convergence. To address these challenges, we develop a novel Knowledge-guided Multimodal NTM to predict a short-form video's depressive impact on viewers. Extensive empirical analyses on TikTok and Douyin datasets show that our method outperforms state-of-the-art benchmarks. Our method also discovers medically relevant topics in videos that are linked to depressive impact. We contribute to IS with a novel video analytics method that generalizes to other video classification problems. Practically, our method can help platforms understand videos' mental health impacts and adjust recommendations and video topic disclosure accordingly.

Submitted 21 March, 2024; v1 submitted 10 January, 2024; originally announced February 2024.

arXiv:2402.05607 [pdf, other]

Internal Model Control design for systems learned by Control Affine Neural Nonlinear Autoregressive Exogenous Models

Authors: Jing Xie, Fabio Bonassi, Riccardo Scattolini

Abstract: This paper explores the use of Control Affine Neural Nonlinear AutoRegressive eXogenous (CA-NNARX) models for nonlinear system identification and model-based control design. The idea behind this architecture is to match the known control-affine structure of the system to achieve improved performance. In line with recent literature on neural networks for data-driven control, we first analyze the stability properties of CA-NNARX models, devising sufficient conditions for their incremental Input-to-State Stability ($\delta$ISS) that can be enforced at the model training stage. This stability property is then leveraged to design a stable Internal Model Control (IMC) architecture. The proposed control scheme is tested on a simulated Quadruple Tank benchmark system for the output reference tracking problem. The results show that (i) the modeling accuracy of CA-NNARX is superior to that of a standard NNARX model for a given weight size and number of training epochs, and (ii) the proposed IMC law provides performance comparable to that of a standard Model Predictive Controller (MPC) at a significantly lower computational burden.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.05606 [pdf, other]

A Learning-based Model Predictive Control Scheme with Application to Temperature Control Units

Authors: Jing Xie, Léo Simpson, Jonas Asprion, Riccardo Scattolini

Abstract: Temperature control is a complex task due to its often unknown dynamics and disturbances. This paper explores the use of Neural Nonlinear AutoRegressive eXogenous (NNARX) models for nonlinear system identification and model predictive control of a temperature control unit. First, the NNARX model is identified from input-output data collected from the real plant, and a state-space representation with known measurable states consisting of past input and output variables is formulated. Second, a tailored model predictive controller is designed based on the trained NNARX network. The proposed control architecture is experimentally tested on the temperature control units manufactured by Tool-Temp AG. The results achieved are compared with those obtained using a PI controller and a linear MPC. The findings illustrate that the proposed scheme achieves satisfactory tracking performance while incurring the lowest energy cost among the compared controllers.

Submitted 8 February, 2024; originally announced February 2024.
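The NNARX abstract above describes a state made of past input and output variables. A minimal sketch of how such state vectors can be assembled from logged plant data, assuming $n_y$ past outputs, $n_u$ past inputs, and a one-step-ahead target; the function name `narx_regressors` and the lag ordering are hypothetical choices, not the authors' code, and the neural network that maps each state to the next output is omitted.

```python
from typing import List, Sequence, Tuple


def narx_regressors(
    u: Sequence[float], y: Sequence[float], n_u: int, n_y: int
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: build NARX states and one-step-ahead targets.

    The state at time k is [y[k-1], ..., y[k-n_y], u[k-1], ..., u[k-n_u]]
    (most recent sample first); the paired target is y[k].
    """
    if len(u) != len(y):
        raise ValueError("u and y must have the same length")
    if n_u < 1 or n_y < 1:
        raise ValueError("lags must be at least 1")
    start = max(n_u, n_y)  # first k for which all lagged samples exist
    states: List[List[float]] = []
    targets: List[float] = []
    for k in range(start, len(y)):
        past_y = [y[k - i] for i in range(1, n_y + 1)]  # past outputs
        past_u = [u[k - i] for i in range(1, n_u + 1)]  # past inputs
        states.append(past_y + past_u)
        targets.append(y[k])
    return states, targets


# Toy data: 5 samples, 2 output lags, 1 input lag.
X, t = narx_regressors([0, 1, 2, 3, 4], [10, 11, 12, 13, 14], n_u=1, n_y=2)
print(X)  # [[11, 10, 1], [12, 11, 2], [13, 12, 3]]
print(t)  # [12, 13, 14]
```

Because every state component is a measured past input or output, such a state is directly available to a predictive controller without a separate state observer, which is the property the abstract refers to as known measurable states.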

arXiv:2402.04710 [pdf, other]

Incorporating Retrieval-based Causal Learning with Information Bottlenecks for Interpretable Graph Neural Networks

Authors: Jiahua Rao, Jiancong Xie, Hanjing Lin, Shuangjia Zheng, Zhen Wang, Yuedong Yang

Abstract: Graph Neural Networks (GNNs) have gained considerable traction for their ability to process topological data effectively, yet their interpretability remains a critical concern. Current interpretation methods are dominated by post-hoc explanations that aim to provide a transparent and intuitive understanding of GNNs. However, these methods are limited in interpreting complicated subgraphs and cannot use the explanation to improve GNN predictions. On the other hand, transparent GNN models have been proposed to capture critical subgraphs. While such methods can improve GNN predictions, they usually do not perform well on explanations. A new strategy is therefore needed to better couple GNN explanation and prediction. In this study, we have developed a novel interpretable causal GNN framework that incorporates retrieval-based causal learning with Graph Information Bottleneck (GIB) theory. The framework semi-parametrically retrieves crucial subgraphs detected by GIB and compresses the explanatory subgraphs via a causal module. The framework consistently outperformed state-of-the-art methods and achieved 32.71% higher precision on real-world explanation scenarios with diverse explanation types. More importantly, the learned explanations were also shown to improve GNN prediction performance.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.04647 [pdf, other]

Latent Plan Transformer: Planning as Latent Variable Inference

Authors: Deqian Kong, Dehong Xu, Minglu Zhao, Bo Pang, Jianwen Xie, Andrew Lizarraga, Yuhao Huang, Sirui Xie, Ying Nian Wu

Abstract: In tasks aiming for long-term returns, planning becomes essential. We study generative modeling for planning with datasets repurposed from offline reinforcement learning. Specifically, we identify temporal consistency in the absence of step-wise rewards as one key technical challenge. We introduce the Latent Plan Transformer (LPT), a novel model that leverages a latent space to connect a Transformer-based trajectory generator and the final return. LPT can be learned with maximum likelihood estimation on trajectory-return pairs. In learning, posterior sampling of the latent variable naturally integrates sub-trajectories to form a consistent abstraction despite the finite context. At test time, the latent variable is inferred from an expected return before policy execution, realizing the idea of planning as inference. Our experiments demonstrate that LPT can discover improved decisions from suboptimal trajectories, achieving competitive performance across several benchmarks, including Gym-Mujoco, Franka Kitchen, Maze2D, and Connect Four. It exhibits capabilities in nuanced credit assignments, trajectory stitching, and adaptation to environmental contingencies. These results validate that latent variable inference can be a strong alternative to step-wise reward prompting.

Submitted 28 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.04219 [pdf, ps, other]

A classification of nonzero skew immaculate functions

Authors: Sarah Mason, Jack Xie

Abstract: This article presents conditions under which skew immaculate noncommutative symmetric functions are nonzero. The work is motivated by the quest to determine when the matrix definition of a skew immaculate function aligns with the Hopf algebraic definition. We describe a necessary condition for a skew immaculate function to include a nonzero term, as well as a sufficient condition for at least one nonzero term to survive any cancellation. To prove our theorems, we draw on classical results such as the Pigeonhole Principle from combinatorics and Hall's Matching Theorem from graph theory.

Submitted 6 February, 2024; originally announced February 2024.

Comments: 20 pages, 3 figures

MSC Class: 05E05; 05C70

arXiv:2402.01622 [pdf, other]

TravelPlanner: A Benchmark for Real-World Planning with Language Agents

Authors: Jian Xie, Kai Zhang, Jiangjie Chen, Tinghui Zhu, Renze Lou, Yuandong Tian, Yanghua Xiao, Yu Su

Abstract: Planning has been a core pursuit of artificial intelligence since its conception, but earlier AI agents mostly focused on constrained settings because many of the cognitive substrates necessary for human-level planning were lacking. Recently, language agents powered by large language models (LLMs) have shown interesting capabilities such as tool use and reasoning. Are these language agents capable of planning in more complex settings that are out of reach of prior AI agents? To advance this investigation, we propose TravelPlanner, a new planning benchmark that focuses on travel planning, a common real-world planning scenario. It provides a rich sandbox environment, various tools for accessing nearly four million data records, and 1,225 meticulously curated planning intents and reference plans. Comprehensive evaluations show that current language agents are not yet capable of handling such complex planning tasks: even GPT-4 achieves a success rate of only 0.6%. Language agents struggle to stay on task, use the right tools to collect information, or keep track of multiple constraints. However, we note that the mere possibility of language agents tackling such a complex problem is in itself non-trivial progress. TravelPlanner provides a challenging yet meaningful testbed for future language agents.

Submitted 23 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: ICML 2024 (Spotlight)

arXiv:2402.00947 [pdf, other]

Performance of a coarsely pixelated LAPPD photosensor for the SoLID gas Cherenkov detectors

Authors: J. Xie, C. Peng, S. Joosten, Z. -E. Meziani, A. Camsonne, M. Jones, S. Malace, E. Kaczanowicz, M. Rehfuss, N. Sparveris, M. Paolone, M. Foley, M. Minot, M. Popecki, Z. W. Zhao

Abstract: The SoLID spectrometer's gas Cherenkov counters require photosensors that operate in a high luminosity and high background environment. The reference design features arrays of 9 or 16 tiled multi-anode photomultipliers (MaPMTs), distributed across 32 sectors, to serve the light-gas and heavy-gas Cherenkov counters, respectively. To assess the viability of a pixelated INCOM Large Area Picosecond Photodetector (LAPPD$^{\rm TM}$) as an alternative photosensor to replace MaPMT arrays in either detector, we evaluated its performance under realistic SoLID running conditions in Hall C at the Thomas Jefferson National Accelerator Facility (Jefferson Lab). The results of this test confirmed that the coarse-pixelated (2.5$\times$2.5 cm$^2$ pixel size) LAPPD is capable of handling the total projected signal and background rates of the three pillar SoLID experiments. The tested photosensor detected Cherenkov signals with the capability of separating single-electron events from pair production events while rejecting background. Although the design was not aimed at ring-imaging Cherenkov detectors, Cherenkov disk images were captured in two different gas radiators. Through a direct comparison with a GEANT4 simulation, we confirmed the experimental performance of the LAPPD.

Submitted 3 July, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: 11 pages, 8 figures

arXiv:2401.17686 [pdf, other]

Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning

Authors: Tinghui Zhu, Kai Zhang, Jian Xie, Yu Su

Abstract: Recent advances have significantly improved the reasoning capabilities of Large Language Models (LLMs) through various methods, especially chain-of-thought (CoT) reasoning. However, previous methods fail to address reasoning errors in intermediate steps, which leads to error accumulation. In this paper, we propose Deductive Beam Search (DBS), which seamlessly integrates CoT and deductive reasoning with step-wise beam search for LLMs. Our approach deploys a verifier that checks whether a reasoning step is deducible from its premises, thereby alleviating error accumulation. Furthermore, we introduce a scalable and labor-free data construction method to strengthen our model's verification capabilities. Extensive experiments demonstrate that our approach significantly improves the base performance of LLMs of various scales (7B, 13B, 70B, and ChatGPT) across 8 reasoning datasets from 3 diverse reasoning genres: arithmetic, commonsense, and symbolic. Moreover, our analysis shows that DBS can detect diverse and subtle reasoning errors and is robust across model scales.

Submitted 4 February, 2024; v1 submitted 31 January, 2024; originally announced January 2024.
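The step-wise beam search with a verifier described in the Deductive Beam Search abstract can be illustrated with a generic sketch. Here `propose` stands in for the LLM generating candidate next reasoning steps and `verify` for the deducibility verifier; both are hypothetical callables, the function name `verifier_beam_search` is hypothetical, and ranking chains by the product of per-step verifier scores is an assumption of this sketch, not necessarily how DBS scores its beams.

```python
from typing import Callable, List, Tuple

Chain = List[str]


def verifier_beam_search(
    question: str,
    propose: Callable[[str, Chain], List[str]],
    verify: Callable[[str, Chain, str], float],
    beam_width: int = 2,
    max_steps: int = 3,
) -> Chain:
    """Step-wise beam search that keeps the chains the verifier rates highest.

    `propose(question, chain)` returns candidate next steps; `verify(question,
    chain, step)` returns a score in [0, 1] for how well `step` follows from
    the question and the previous steps.
    """
    beams: List[Tuple[float, Chain]] = [(1.0, [])]
    for _ in range(max_steps):
        candidates: List[Tuple[float, Chain]] = []
        for score, chain in beams:
            for step in propose(question, chain):
                # Multiply scores so one poorly supported step drags the whole chain down.
                candidates.append((score * verify(question, chain, step), chain + [step]))
        if not candidates:
            break  # nothing left to expand
        # sorted() is stable, so tied candidates keep their generation order.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beams[0][1]


# Toy stand-ins: two candidate steps, one the verifier trusts and one it does not.
def toy_propose(question: str, chain: Chain) -> List[str]:
    return ["valid step", "invalid step"]


def toy_verify(question: str, chain: Chain, step: str) -> float:
    return 0.9 if step == "valid step" else 0.1


print(verifier_beam_search("toy question", toy_propose, toy_verify))
# ['valid step', 'valid step', 'valid step']
```

Scoring each step against the chain so far, rather than only the final answer, is what lets a low-confidence intermediate step be pruned before later steps build on it.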

arXiv:2401.17322 [pdf, other]

doi 10.1103/PhysRevD.110.014032

Unveiling the $a_0(1710)$ nature in the process $J/\psi\to {\bar{K}}^0K^+\rho^-$

Authors: Yan Ding, En Wang, De-Min Li, Li-Sheng Geng, Ju-Jun Xie

Abstract: We have investigated the process $J/\psi\to {\bar{K}}^0K^+\rho^-$ by taking into account the $S$-wave ${K^*\bar{K}^*}$, $\rho\omega$, and $\rho\phi$ final-state interactions, where the scalar meson $a_0(1710)$ is generated. In addition, we take into account the contributions from the scalar $a_0(980)(\to \bar{K}^0K^+)$ and the intermediate resonances $K_1(1270)^{-}(\to {\bar{K}}^0\rho^-)$ and $K_1(1270)^{0}(\to K^+\rho^-)$. Our results show that a clear peak structure around 1.8~GeV appears in the ${\bar{K}^0K^+}$ invariant mass distribution, which could be associated with the scalar $a_0(1710)$; however, no significant structure of the $a_0(980)$ is observed. On the other hand, clear $K_1(1270)$ peaks appear in the ${\bar{K}}^0\rho^-$ and $K^+\rho^-$ invariant mass distributions. Future precise measurements of this process by the BESIII and Belle II Collaborations, and at the planned Super Tau-Charm Facility (STCF), could shed light on the nature of the $a_0(1710)$.

Submitted 23 July, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: 12 pages, 13 figures, the version for PRD. arXiv admin note: substantial text overlap with arXiv:2306.15964

Journal ref: Phys. Rev. D 110, 014032 (2024)

arXiv:2401.15902 [pdf, other]

A Concise but High-performing Network for Image Guided Depth Completion in Autonomous Driving

Authors: Moyun Liu, Bing Chen, Youping Chen, Jingming Xie, Lei Yao, Yang Zhang, Joey Tianyi Zhou

Abstract: Depth completion is a crucial task in autonomous driving that aims to convert a sparse depth map into a dense depth prediction. Because of their potentially rich semantic information, RGB images are commonly fused to enhance completion. Image-guided depth completion involves three key challenges: 1) how to effectively fuse the two modalities; 2) how to better recover depth information; and 3) how to achieve real-time prediction for practical autonomous driving. To address these problems, we propose a concise but effective network, named CENet, that achieves high-performance depth completion with a simple and elegant structure. First, we use a fast guidance module to fuse the two sensor features, utilizing abundant auxiliary features extracted from the color space. Unlike other commonly used complicated guidance modules, our approach is intuitive and low-cost. In addition, we identify and analyze the optimization inconsistency problem between observed and unobserved positions, and propose a decoupled depth prediction head to alleviate it. The proposed decoupled head better predicts the depth of valid and invalid positions with very little extra inference time. Based on a simple dual-encoder, single-decoder structure, CENet achieves a superior balance between accuracy and efficiency. On the KITTI depth completion benchmark, CENet attains competitive performance and inference speed compared with state-of-the-art methods.
To validate the generalization of our method, we also evaluate it on the indoor NYUv2 dataset, where CENet still achieves impressive results. The code of this work will be available at https://github.com/lmomoy/CHNet.

Submitted 22 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.15372 [pdf, ps, other]

Infinitely many solutions for three quasilinear Laplacian systems on weighted graphs

Authors: Yan Pang, Junping Xie, Xingyong Zhang

Abstract: We investigate a generalized poly-Laplacian system with a parameter on weighted finite graphs, a generalized poly-Laplacian system with a parameter and Dirichlet boundary value on weighted locally finite graphs, and a $(p,q)$-Laplacian system with a parameter on weighted locally finite graphs. We use a critical points theorem established by Bonanno and Bisci [Bonanno, Bisci, and Regan, Math. Comput. Model. 2010, 52(1-2): 152-160], an abstract critical points theorem without a compactness condition, to show that these three systems have infinitely many nontrivial solutions with unbounded norm when the parameters lie in a well-determined range.

Submitted 27 January, 2024; originally announced January 2024.

arXiv:2401.14604 [pdf, other]

doi 10.1088/1741-4326/ad39d9

Effects of Magnetic Helicity on 3D Equilibria and Self-Organized States in KTX Reversed Field Pinch

Authors: Ke Liu, Guodong Yu, Yuhua Huang, Wenzhe Mao, Yidong Xie, Xianyi Nie, Hong Li, Tao Lan, Jinlin Xie, Weixing Ding, Wandong Liu, Ge Zhuang, Caoxiang Zhu

Abstract: The reversed field pinch (RFP) is a toroidal magnetic configuration in which plasmas can spontaneously transform into different self-organized states. Among these states, the quasi-single-helicity (QSH) state has a single dominant helical component in the magnetic field and significantly improves confinement. Many theoretical and experimental efforts have investigated the transitions among different states. This paper employs the multi-region relaxed magnetohydrodynamics (MRxMHD) model to study the properties of QSH and other states. The Stepped Pressure Equilibrium Code (SPEC) is used to compute MHD equilibria for KTX. The toroidal volume of KTX is partitioned into two subvolumes by an internal transport barrier. The geometry of this barrier is adjusted to achieve force balance across the interface, ensuring that the plasma in each subvolume is force-free and that magnetic helicity is conserved. By varying the parameters, we generate distinct self-organized states in KTX. Our findings highlight the crucial role of magnetic helicity in shaping these states. When the magnetic helicity is low in both subvolumes, the plasma is axisymmetric. With increasing core helicity, the plasma gradually transforms from an axisymmetric state to a double-axis helical state and finally to a single-helical-axis state. Higher core magnetic helicity leads to a more pronounced dominant mode of the boundary magnetic field and reduced core magnetic shear. This is consistent with previous experimental and numerical results in other RFP devices. We find a linear relationship between the plasma current and helicity in the different self-organized states.
Our findings suggest that KTX may enter the QSH state when the toroidal current reaches 0.72 MA. This study demonstrates that the stellarator equilibrium code SPEC reveals crucial RFP equilibrium properties, making it applicable to a broad range of RFP devices and other toroidal configurations.

Submitted 6 April, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Showing 101–150 of 1,129 results for author: Xie, J