Search | arXiv e-print repository

ADer: A Comprehensive Benchmark for Multi-class Visual Anomaly Detection

Authors: Jiangning Zhang, Haoyang He, Zhenye Gan, Qingdong He, Yuxuan Cai, Zhucun Xue, Yabiao Wang, Chengjie Wang, Lei Xie, Yong Liu

Abstract: Visual anomaly detection aims to identify anomalous regions in images through unsupervised learning paradigms, with increasing application demand and value in fields such as industrial inspection and medical lesion detection. Despite significant progress in recent years, there is a lack of comprehensive benchmarks to adequately evaluate the performance of various mainstream methods across different datasets under the practical multi-class setting. The absence of standardized experimental setups can lead to potential biases in training epochs, resolution, and metric results, resulting in erroneous conclusions. This paper addresses this issue by proposing a comprehensive visual anomaly detection benchmark, \textbf{\textit{ADer}}, which is a modular framework that is highly extensible for new methods. The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics. Additionally, we have open-sourced the GPU-assisted \href{https://pypi.org/project/ADEval}{ADEval} package to address the slow evaluation of time-consuming metrics such as mAU-PRO on large-scale data, reducing evaluation time by more than \textit{1000-fold}. Through extensive experimental results, we objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
We hope that \textbf{\textit{ADer}} will become a valuable resource for researchers and practitioners in the field, promoting the development of more robust and generalizable anomaly detection systems. The full code is included in the Appendix and open-sourced at \url{https://github.com/zhangzjn/ader}.

Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.03081 [pdf, other]

A Quantum Neural Network-Based Approach to Power Quality Disturbances Detection and Recognition

Authors: Guo-Dong Li, Hai-Yan He, Yue Li, Xin-Hao Li, Hao Liu, Qing-Le Wang, Long Cheng

Abstract: Power quality disturbances (PQDs) significantly impact the stability and reliability of power systems, necessitating accurate and efficient detection and recognition methods. While numerous classical algorithms for PQD detection and recognition have been extensively studied and applied, related work in the quantum domain is still in its infancy. In this paper, an improved quantum neural network (QNN) model for PQD detection and recognition is proposed. Specifically, the model constructs a quantum circuit comprising data qubits and ancilla qubits. Classical data is transformed into quantum data by embedding it into data qubits via the encoding layer. Subsequently, parametric quantum gates are utilized to form the variational layer, which facilitates qubit information transformation, thereby extracting essential feature information for detection and recognition. The expected value is obtained by measuring ancilla qubits, enabling the completion of disturbance classification based on this expected value. An analysis reveals that the runtime and space complexities of the QNN are $O(\mathrm{poly}(N))$ and $O(N)$, respectively. Extensive experiments validate the feasibility and superiority of the proposed model in PQD detection and recognition. The model achieves accuracies of 99.75\%, 97.85\% and 95.5\% in experiments involving the detection of disturbances, recognition of seven single disturbances, and recognition of ten mixed disturbances, respectively.
Additionally, noise simulation and comparative experiments demonstrate that the proposed model exhibits robust anti-noise capabilities, requires few training parameters, and maintains high accuracy.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.02578 [pdf, other]

Pretrained Mobility Transformer: A Foundation Model for Human Mobility

Authors: Xinhua Wu, Haoyu He, Yanchao Wang, Qi Wang

Abstract: Ubiquitous mobile devices are generating vast amounts of location-based service data that reveal how individuals navigate and utilize urban spaces in detail. In this study, we utilize these extensive, unlabeled sequences of user trajectories to develop a foundation model for understanding urban space and human mobility. We introduce the \textbf{P}retrained \textbf{M}obility \textbf{T}ransformer (PMT), which leverages the transformer architecture to process user trajectories in an autoregressive manner, converting geographical areas into tokens and embedding spatial and temporal information within these representations. Experiments conducted in three U.S. metropolitan areas over a two-month period demonstrate PMT's ability to capture underlying geographic and socio-demographic characteristics of regions. The proposed PMT excels across various downstream tasks, including next-location prediction, trajectory imputation, and trajectory generation. These results support PMT's capability and effectiveness in decoding complex patterns of human mobility, offering new insights into urban spatial functionality and individual mobility preferences.

Submitted 28 May, 2024; originally announced June 2024.

arXiv:2406.02213 [pdf, other]

Rectifying Reinforcement Learning for Reward Matching

Authors: Haoran He, Emmanuel Bengio, Qingpeng Cai, Ling Pan

Abstract: The Generative Flow Network (GFlowNet) is a probabilistic framework in which an agent learns a stochastic policy and flow functions to sample objects with probability proportional to an unnormalized reward function. GFlowNets share a strong resemblance to reinforcement learning (RL), which typically aims to maximize reward, due to their sequential decision-making processes. Recent works have studied connections between GFlowNets and maximum entropy (MaxEnt) RL, which modifies the standard objective of RL agents by learning an entropy-regularized objective. However, a critical theoretical gap persists: despite the apparent similarities in their sequential decision-making nature, a direct link between GFlowNets and standard RL has yet to be discovered, while bridging this gap could further unlock the potential of both fields. In this paper, we establish a new connection between GFlowNets and policy evaluation for a uniform policy. Surprisingly, we find that the resulting value function for the uniform policy has a close relationship to the flows in GFlowNets. Leveraging these insights, we further propose a novel rectified policy evaluation (RPE) algorithm, which achieves the same reward-matching effect as GFlowNets, offering a new perspective. We compare RPE, MaxEnt RL, and GFlowNets in a number of benchmarks, and show that RPE achieves competitive results compared to previous approaches. This work sheds light on the previously unexplored connection between (non-MaxEnt) RL and GFlowNets, potentially opening new avenues for future research in both fields.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.01150 [pdf, other]

Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets

Authors: Haoran He, Can Chang, Huazhe Xu, Ling Pan

Abstract: Generative Flow Networks (GFlowNets) are amortized sampling methods for learning a stochastic policy to sequentially generate compositional objects with probabilities proportional to their rewards. GFlowNets exhibit a remarkable ability to generate diverse sets of high-reward objects, in contrast to standard return maximization reinforcement learning approaches, which often converge to a single optimal solution. Recent works have arisen for learning goal-conditioned GFlowNets to acquire various useful properties, aiming to train a single GFlowNet capable of achieving different goals as the task specifies. However, training a goal-conditioned GFlowNet poses critical challenges due to extremely sparse rewards, which is further exacerbated in large state spaces. In this work, we propose a novel method named Retrospective Backward Synthesis (RBS) to address these challenges. Specifically, RBS synthesizes a new backward trajectory based on the backward policy in GFlowNets to enrich training trajectories with enhanced quality and diversity, thereby efficiently solving the sparse reward problem. Extensive empirical results show that our method improves sample efficiency by a large margin and outperforms strong baselines on various standard evaluation benchmarks.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.00050 [pdf, other]

An Empirical Analysis on Large Language Models in Debate Evaluation

Authors: Xinyi Liu, Pinxin Liu, Hangfeng He

Abstract: In this study, we investigate the capabilities and inherent biases of advanced large language models (LLMs) such as GPT-3.5 and GPT-4 in the context of debate evaluation. We discover that LLMs' performance exceeds that of humans and surpasses the performance of state-of-the-art methods fine-tuned on extensive datasets in debate evaluation. We additionally explore and analyze biases present in LLMs, including positional bias, lexical bias, and order bias, which may affect their evaluative judgments. Our findings reveal a consistent bias in both GPT-3.5 and GPT-4 towards the second candidate response presented, attributed to prompt design. We also uncover lexical biases in both GPT-3.5 and GPT-4, especially when label sets carry connotations such as numerical or sequential, highlighting the critical need for careful label verbalizer selection in prompt design. Additionally, our analysis indicates a tendency of both models to favor the debate's concluding side as the winner, suggesting an end-of-discussion bias.

Submitted 4 June, 2024; v1 submitted 28 May, 2024; originally announced June 2024.

Comments: Accepted to ACL 2024 main

arXiv:2405.20763 [pdf, other]

Improving Generalization and Convergence by Enhancing Implicit Regularization

Authors: Mingze Wang, Jinbo Wang, Haotian He, Zilin Wang, Guanhua Huang, Feiyu Xiong, Zhiyu Li, Weinan E, Lei Wu

Abstract: In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics of flat and sharp directions, which boosts the sharpness reduction along flat directions while maintaining the training stability in sharp directions. We show that IRE can be practically incorporated with {\em generic base optimizers} without introducing significant computational overhead. Experiments show that IRE consistently improves the generalization performance for image classification tasks across a variety of benchmark datasets (CIFAR-10/100, ImageNet) and models (ResNets and ViTs). Surprisingly, IRE also achieves a $2\times$ {\em speed-up} compared to AdamW in the pre-training of Llama models (of sizes ranging from 60M to 229M) on datasets including Wikitext-103, Minipile, and Openwebtext. Moreover, we provide theoretical guarantees, showing that IRE can substantially accelerate the convergence towards flat minima in Sharpness-aware Minimization (SAM).

Submitted 20 August, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

Comments: 35 pages

arXiv:2405.20600 [pdf, other]

Multi-label Class Incremental Emotion Decoding with Augmented Emotional Semantics Learning

Authors: Kaicheng Fu, Changde Du, Xiaoyu Chen, Jie Peng, Huiguang He

Abstract: Emotion decoding plays an important role in affective human-computer interaction. However, previous studies ignored the dynamic real-world scenario, where humans experience a blend of multiple emotions which are incrementally integrated into the model, leading to the multi-label class incremental learning (MLCIL) problem. Existing methods have difficulty solving the MLCIL problem due to notorious catastrophic forgetting caused by the partial label problem and inadequate label semantics mining. In this paper, we propose an augmented emotional semantics learning framework for multi-label class incremental emotion decoding. Specifically, we design an augmented emotional relation graph module with label disambiguation to handle the past-missing partial label problem. Then, we leverage domain knowledge from affective dimension space to alleviate the future-missing partial label problem by knowledge distillation. Besides, an emotional semantics learning module is constructed with a graph autoencoder to obtain emotion embeddings in order to guide the semantic-specific feature decoupling for better multi-label learning. Extensive experiments on three datasets show the superiority of our method for improving emotion decoding performance and mitigating forgetting on the MLCIL problem.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19678 [pdf, other]

View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields

Authors: Haodi He, Colton Stearns, Adam W. Harley, Leonidas J. Guibas

Abstract: Large-scale vision foundation models such as Segment Anything (SAM) demonstrate impressive performance in zero-shot image segmentation at multiple levels of granularity. However, these zero-shot predictions are rarely 3D-consistent. As the camera viewpoint changes in a scene, so do the segmentation predictions, as well as the characterizations of "coarse" or "fine" granularity. In this work, we address the challenging task of lifting multi-granular and view-inconsistent image segmentations into a hierarchical and 3D-consistent representation. We learn a novel feature field within a Neural Radiance Field (NeRF) representing a 3D scene, whose segmentation structure can be revealed at different scales by simply using different thresholds on feature distance. Our key idea is to learn an ultrametric feature space, which, unlike a Euclidean space, exhibits transitivity in distance-based grouping, naturally leading to a hierarchical clustering. Put together, our method takes view-inconsistent multi-granularity 2D segmentations as input and produces a hierarchy of 3D-consistent segmentations as output. We evaluate our method and several baselines on synthetic datasets with multi-view images and multi-granular segmentation, showcasing improved accuracy and viewpoint-consistency. We additionally provide qualitative examples of our model's 3D hierarchical segmentations in real-world scenes. The code and dataset are available at https://github.com/hardyho/ultrametric_feature_fields

Submitted 17 July, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19586 [pdf, other]

SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation

Authors: Junjie Zhang, Chenjia Bai, Haoran He, Wenke Xia, Zhigang Wang, Bin Zhao, Xiu Li, Xuelong Li

Abstract: Acquiring a multi-task imitation policy in 3D manipulation poses challenges in terms of scene understanding and action prediction. Current methods employ both 3D representation and multi-view 2D representation to predict the poses of the robot's end-effector. However, they still require a considerable amount of high-quality robot trajectories, and suffer from limited generalization in unseen tasks and inefficient execution in long-horizon reasoning. In this paper, we propose SAM-E, a novel architecture for robot manipulation by leveraging a vision-foundation model for generalizable scene understanding and sequence imitation for long-term action reasoning. Specifically, we adopt Segment Anything (SAM) pre-trained on a huge number of images and promptable masks as the foundation model for extracting task-relevant features, and employ parameter-efficient fine-tuning on robot data for a better understanding of embodied scenarios. To address long-horizon reasoning, we develop a novel multi-channel heatmap that enables the prediction of the action sequence in a single pass, notably enhancing execution efficiency. Experimental results from various instruction-following tasks demonstrate that SAM-E achieves superior performance with higher execution efficiency compared to the baselines, and also significantly improves generalization in few-shot adaptation to new tasks.

Submitted 29 May, 2024; originally announced May 2024.

Comments: ICML 2024. Project page: https://sam-embodied.github.io

arXiv:2405.18726 [pdf, other]

Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI

Authors: Che Liu, Changde Du, Xiaoyu Chen, Huiguang He

Abstract: Drawing inspiration from the hierarchical processing of the human auditory system, which transforms sound from low-level acoustic features to high-level semantic understanding, we introduce a novel coarse-to-fine audio reconstruction method. Leveraging non-invasive functional Magnetic Resonance Imaging (fMRI) data, our approach mimics the inverse pathway of auditory processing. Initially, we utilize CLAP to decode fMRI data coarsely into a low-dimensional semantic space, followed by a fine-grained decoding into the high-dimensional AudioMAE latent space guided by semantic features. These fine-grained neural features serve as conditions for audio reconstruction through a Latent Diffusion Model (LDM). Validation on three public fMRI datasets (Brain2Sound, Brain2Music, and Brain2Speech) underscores the superiority of our coarse-to-fine decoding method over stand-alone fine-grained approaches, showcasing state-of-the-art performance in metrics like FD, FAD, and KL. Moreover, by employing semantic prompts during decoding, we enhance the quality of reconstructed audio when semantic features are suboptimal. The demonstrated versatility of our model across diverse stimuli highlights its potential as a universal brain-to-audio framework. This research contributes to the comprehension of the human auditory system, pushing boundaries in neural decoding and audio reconstruction methodologies.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.18399 [pdf, ps, other]

A simple, randomized algorithm for diagonalizing normal matrices

Authors: Haoze He, Daniel Kressner

Abstract: We present and analyze a simple numerical method that diagonalizes a complex normal matrix A by diagonalizing the Hermitian matrix obtained from a random linear combination of the Hermitian and skew-Hermitian parts of A.

Submitted 28 May, 2024; originally announced May 2024.

MSC Class: 65F15; 15B57; 15A18
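
The idea in the abstract above can be illustrated with a short NumPy sketch. This is not the authors' code; it is a minimal reading of the abstract under stated assumptions: the function name `diagonalize_normal` is hypothetical, the random coefficients are drawn from a standard normal distribution, and the skew-Hermitian part is divided by i so that the combination is Hermitian. For a normal matrix A = H + iS, the Hermitian matrices H and S commute, so an eigenbasis of a generic combination of them also diagonalizes A.

```python
import numpy as np

def diagonalize_normal(A, rng=None):
    """Sketch: diagonalize a normal matrix A via one Hermitian eigenproblem.

    Assumption (not from the source): coefficients are standard normal.
    """
    rng = np.random.default_rng() if rng is None else rng
    H = (A + A.conj().T) / 2      # Hermitian part of A
    S = (A - A.conj().T) / 2j     # skew-Hermitian part divided by i (Hermitian)
    mu = rng.standard_normal(2)   # random real coefficients
    B = mu[0] * H + mu[1] * S     # random Hermitian combination
    _, U = np.linalg.eigh(B)      # unitary eigenvectors of B
    # For normal A, U^* A U is diagonal with probability one.
    eigvals = np.diag(U.conj().T @ A @ U)
    return U, eigvals
```

Calling `U, eigvals = diagonalize_normal(A)` on a normal matrix returns a unitary `U` and the diagonal of `U.conj().T @ A @ U`; the off-diagonal part of that product gives a quick check of how well the matrix was diagonalized.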

arXiv:2405.17976 [pdf]

Yuan 2.0-M32: Mixture of Experts with Attention Router

Authors: Shaohua Wu, Jiangang Luo, Xi Chen, Lingjun Li, Xudong Zhao, Tong Yu, Chao Wang, Yue Wang, Fei Wang, Weixu Qiao, Houbo He, Zeru Zhang, Zeyu Sun, Junxiong Mao, Chong Shen

Abstract: Yuan 2.0-M32, with a base architecture similar to Yuan-2.0 2B, uses a mixture-of-experts architecture with 32 experts of which 2 experts are active. A new router network, Attention Router, is proposed and adopted for a more efficient selection of experts, which improves the accuracy compared to the model with a classical router network. Yuan 2.0-M32 is trained with 2000B tokens from scratch, and the training computation consumption is only 9.25% of a dense model at the same parameter scale. Yuan 2.0-M32 demonstrates competitive capability on coding, math, and various domains of expertise, with only 3.7B active parameters of 40B in total, and 7.4 GFlops forward computation per token, both of which are only 1/19 of Llama3-70B. Yuan 2.0-M32 surpasses Llama3-70B on the MATH and ARC-Challenge benchmarks, with accuracies of 55.89 and 95.8 respectively. The models and source codes of Yuan 2.0-M32 are released on GitHub.

Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: 14 pages, 3 figures, 7 tables
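
To make the "32 experts of which 2 experts are active" setup concrete, here is a minimal NumPy sketch of classical top-2 softmax gating. It is explicitly not the Attention Router proposed in the paper, whose details the abstract does not give; the names `top2_route` and `moe_forward`, the linear gate, and the toy experts are all assumptions for illustration.

```python
import numpy as np

def top2_route(token, gate_weights):
    """Classical top-2 gating (assumed baseline, not the Attention Router)."""
    logits = gate_weights @ token           # one score per expert
    top2 = np.argsort(logits)[-2:][::-1]    # indices of the two largest scores
    z = logits[top2] - logits[top2].max()   # shift for numerical stability
    weights = np.exp(z) / np.exp(z).sum()   # softmax over the two chosen experts
    return top2, weights

def moe_forward(token, gate_weights, experts):
    """Combine the outputs of only the two selected experts."""
    idx, weights = top2_route(token, gate_weights)
    return sum(w * experts[i](token) for i, w in zip(idx, weights))
```

With 32 expert functions in `experts` and a gate matrix of shape (32, d), only two experts are evaluated per token, which is the source of the gap between active and total parameter counts quoted in the abstract.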

arXiv:2405.17414 [pdf, other]

Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

Authors: Zhengfei Kuang, Shengqu Cai, Hao He, Yinghao Xu, Hongsheng Li, Leonidas Guibas, Gordon Wetzstein

Abstract: Research on video generation has recently made tremendous progress, enabling high-quality videos to be generated from text prompts or images. Adding control to the video generation process is an important goal moving forward and recent approaches that condition video generation models on camera trajectories make strides towards it. Yet, it remains challenging to generate a video of the same scene from multiple different camera trajectories. Solutions to this multi-video generation problem could enable large-scale 3D scene generation with editable camera trajectories, among other applications. We introduce collaborative video diffusion (CVD) as an important step towards this vision. The CVD framework includes a novel cross-video synchronization module that promotes consistency between corresponding frames of the same video rendered from different camera poses using an epipolar attention mechanism. Trained on top of a state-of-the-art camera-control module for video generation, CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines, as shown in extensive experiments. Project page: https://collaborativevideodiffusion.github.io/.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16730 [pdf, other]

Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space

Authors: Peiyu Yu, Dinghuai Zhang, Hengzhi He, Xiaojian Ma, Ruiyao Miao, Yifan Lu, Yasi Zhang, Deqian Kong, Ruiqi Gao, Jianwen Xie, Guang Cheng, Ying Nian Wu

Abstract: Offline Black-Box Optimization (BBO) aims at optimizing a black-box function using the knowledge from a pre-collected offline dataset of function values and corresponding input designs. However, the high-dimensional and highly-multimodal input design space of the black-box function poses inherent challenges for most existing methods that model and operate directly upon input designs. These issues include but are not limited to high sample complexity, which relates to inaccurate approximation of the black-box function; and insufficient coverage and exploration of input design modes, which leads to suboptimal proposal of new input designs. In this work, we consider finding a latent space that serves as a compressed yet accurate representation of the design-value joint space, enabling effective latent exploration of high-value input design modes. To this end, we formulate a learnable energy-based latent space, and propose a Noise-intensified Telescoping density-Ratio Estimation (NTRE) scheme for variational learning of an accurate latent space model without costly Markov Chain Monte Carlo. The optimization process is then exploration of high-value designs guided by the learned energy-based model in the latent space, formulated as gradient-based sampling from a latent-variable-parameterized inverse model. We show that our particular parameterization encourages expanded exploration around high-value design modes, motivated by inversion thinking of a fundamental result of conditional covariance matrix typically used for variance reduction.
We observe that our method, backed by an accurately learned informative latent space and an expanding-exploration model design, yields significant improvements over strong previous methods on both synthetic and real-world datasets such as the design-bench suite.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.16289 [pdf]

Intensity adaptive optics

Authors: Zimo Zhao, Yifei Ma, Jacopo Antonello, Zipei Song, Jiahe Cui, Binguo Chen, Jingyu Wang, Bangshan Sun, Honghui He, Lin Luo, Julian A. J. Fells, Steve J. Elston, Martin J. Booth, Stephen M. Morris, Chao He

Abstract: Adaptive optics (AO) is a powerful tool used in a wide range of research areas spanning from aerospace to microscopy. To date, AO has largely been applied to optical phase aberration correction, with recent advances extending to include the vectorial properties of light. However, intensity errors widely exist in optical systems, yet their associated correction methods are still very much in their infancy. Here, we propose a new adaptive optics method that is termed intensity adaptive optics (I-AO), which features a dual-feedback loop for intensity aberration correction that addresses both intensity uniformity and the overall intensity. We demonstrate that I-AO can operate in both sensor-based and sensorless regimes and validate its feasibility by quantitatively analysing the quality of the focus of an aberrated optical system. This technique expands the AO toolkit, broadens its scope of application, and opens a new avenue for next-generation AO innovations.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.15525 [pdf, other]

Sparse Matrix in Large Language Model Fine-tuning

Authors: Haoze He, Juncheng Billy Li, Xuan Jiang, Heather Miller

Abstract: LoRA and its variants have become popular parameter-efficient fine-tuning (PEFT) methods due to their ability to avoid excessive computational costs. However, an accuracy gap often exists between PEFT methods and full fine-tuning (FT), and this gap has yet to be systematically studied. In this work, we introduce a method for selecting sparse sub-matrices that aims to minimize the performance gap between PEFT and full fine-tuning (FT) while also reducing both fine-tuning computational cost and memory cost. Our Sparse Matrix Tuning (SMT) method begins by identifying the most significant sub-matrices in the gradient update, updating only these blocks during the fine-tuning process. In our experiments, we demonstrate that SMT consistently surpasses other PEFT baselines (e.g., LoRA and DoRA) in fine-tuning popular large language models such as LLaMA across a broad spectrum of tasks, while reducing the GPU memory footprint by 67% compared to FT. We also examine how the performance of LoRA and DoRA tends to plateau and decline as the number of trainable parameters increases; in contrast, our SMT method does not suffer from this issue.

Submitted 29 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: 14 pages

arXiv:2405.15214 [pdf, other]

PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning

Authors: Qingdong He, Jiangning Zhang, Jinlong Peng, Haoyang He, Yabiao Wang, Chengjie Wang

Abstract: Transformers have revolutionized the point cloud learning task, but their quadratic complexity hinders extension to long sequences and places a burden on limited computational resources. The recent advent of RWKV, a fresh breed of deep sequence models, has shown immense potential for sequence modeling in NLP tasks. In this paper, we present PointRWKV, a model of linear complexity derived from the RWKV model in the NLP field with necessary modifications for point cloud learning tasks. Specifically, taking the embedded point patches as input, we first propose to explore the global processing capabilities within PointRWKV blocks using modified multi-headed matrix-valued states and a dynamic attention recurrence mechanism. To extract local geometric features simultaneously, we design a parallel branch to encode the point cloud efficiently in a fixed radius near-neighbors graph with a graph stabilizer. Furthermore, we design PointRWKV as a multi-scale framework for hierarchical feature learning of 3D point clouds, facilitating various downstream tasks. Extensive experiments on different point cloud learning tasks show that our proposed PointRWKV outperforms the transformer- and mamba-based counterparts, while saving about 46% of FLOPs, demonstrating a potential option for constructing foundational 3D models.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.14018 [pdf, other]

Watermarking Generative Tabular Data

Authors: Hengzhi He, Peiyu Yu, Junpeng Ren, Ying Nian Wu, Guang Cheng

Abstract: In this paper, we introduce a simple yet effective tabular data watermarking mechanism with statistical guarantees. We show theoretically that the proposed watermark can be effectively detected while faithfully preserving the data fidelity, and that it also demonstrates appealing robustness against additive noise attacks. The general idea is to achieve the watermarking through a strategic embedding based on simple data binning. Specifically, it divides the feature's value range into finely segmented intervals and embeds watermarks into selected "green list" intervals. To detect the watermarks, we develop a principled statistical hypothesis-testing framework with minimal assumptions: it remains valid as long as the underlying data distribution has a continuous density function. The watermarking efficacy is demonstrated through rigorous theoretical analysis and empirical validation, highlighting its utility in enhancing the security of synthetic and real-world datasets.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.11826 [pdf, other]

Data quality control system and long-term performance monitor of the LHAASO-KM2A

Authors: Zhen Cao, F. Aharonian, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen , et al. (263 additional authors not shown)

Abstract: The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.

Submitted 13 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 15 pages, 9 figures

arXiv:2405.11739 [pdf]

What Radio Waves Tell Us about Sleep

Authors: Hao He, Chao Li, Wolfgang Ganglberger, Kaileigh Gallagher, Rumen Hristov, Michail Ouroutzoglou, Haoqi Sun, Jimeng Sun, Brandon Westover, Dina Katabi

Abstract: The ability to assess sleep at home, capture sleep stages, and detect the occurrence of apnea (without on-body sensors) simply by analyzing the radio waves bouncing off people's bodies while they sleep is quite powerful. Such a capability would allow for longitudinal data collection in patients' homes, informing our understanding of sleep and its interaction with various diseases and their therapeutic responses, both in clinical trials and routine care. In this article, we develop an advanced machine learning algorithm for passively monitoring sleep and nocturnal breathing from radio waves reflected off people while asleep. Validation results in comparison with the gold standard (i.e., polysomnography) (n=849) demonstrate that the model captures the sleep hypnogram (with an accuracy of 81% for 30-second epochs categorized into Wake, Light Sleep, Deep Sleep, or REM), detects sleep apnea (AUROC = 0.88), and measures the patient's Apnea-Hypopnea Index (ICC=0.95; 95% CI = [0.93, 0.97]). Notably, the model exhibits equitable performance across race, sex, and age. Moreover, the model uncovers informative interactions between sleep stages and a range of diseases including neurological, psychiatric, cardiovascular, and immunological disorders. These findings not only hold promise for clinical practice and interventional trials but also underscore the significance of sleep as a fundamental component in understanding and managing various diseases.

Submitted 20 July, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

Comments: The first two authors contributed equally to this work

arXiv:2405.11389 [pdf, other]

Adjacent Leader Decentralized Stochastic Gradient Descent

Authors: Haoze He, Jing Wang, Anna Choromanska

Abstract: This work focuses on the decentralized deep learning optimization framework. We propose Adjacent Leader Decentralized Gradient Descent (AL-DSGD) for improving final model performance, accelerating convergence, and reducing the communication overhead of decentralized deep learning optimizers. AL-DSGD relies on two main ideas. Firstly, to increase the influence of the strongest learners on the learning system, it assigns weights to different neighbor workers according to both their performance and their degree when averaging among them, and it applies a corrective force on the workers dictated by both the currently best-performing neighbor and the neighbor with the maximal degree. Secondly, to alleviate the deterioration of the convergence speed and performance of the nodes with lower degrees, AL-DSGD relies on dynamic communication graphs, which effectively allow the workers to communicate with more nodes while keeping the degrees of the nodes low. Experiments demonstrate that AL-DSGD accelerates the convergence of the decentralized state-of-the-art techniques and improves their test performance, especially in communication-constrained environments. We also theoretically prove the convergence of the proposed scheme. Finally, we release to the community a highly general and concise PyTorch-based library for distributed training of deep learning models that supports easy implementation of any distributed deep learning approach ((a)synchronous, (de)centralized).

Submitted 19 August, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

Comments: 9 pages of main paper, and 12 pages of appendix

arXiv:2405.11021 [pdf, other]

Enhanced 3D Urban Scene Reconstruction and Point Cloud Densification using Gaussian Splatting and Google Earth Imagery

Authors: Kyle Gao, Dening Lu, Hongjie He, Linlin Xu, Jonathan Li

Abstract: 3D urban scene reconstruction and modelling is a crucial research area in remote sensing with numerous applications in academia, commerce, industry, and administration. Recent advancements in view synthesis models have facilitated photorealistic 3D reconstruction solely from 2D images. Leveraging Google Earth imagery, we construct a 3D Gaussian Splatting model of the Waterloo region centered on the University of Waterloo and are able to achieve view-synthesis results far exceeding previous 3D view-synthesis results based on neural radiance fields, which we demonstrate in our benchmark. Additionally, we retrieved the 3D geometry of the scene using the 3D point cloud extracted from the 3D Gaussian Splatting model, which we benchmarked against our Multi-View-Stereo dense reconstruction of the scene, thereby reconstructing both the 3D geometry and photorealistic lighting of the large-scale urban scene through 3D Gaussian Splatting.

Submitted 1 June, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

ACM Class: I.4; I.3

arXiv:2405.10895 [pdf, other]

doi 10.3847/2041-8213/ad638e

The unluckiest star: A spectroscopically confirmed repeated partial tidal disruption event AT 2022dbl

Authors: Zheyu Lin, Ning Jiang, Tinggui Wang, Xu Kong, Dongyue Li, Han He, Yibo Wang, Jiazheng Zhu, Wentao Li, Ji-an Jiang, Avinash Singh, Rishabh Singh Teja, D. K. Sahu, Chichuan Jin, Keiichi Maeda, Shifeng Huang

Abstract: The unluckiest star orbits a supermassive black hole elliptically. Every time it reaches the pericenter, it shallowly enters the tidal radius and gets partially tidally disrupted, producing a series of flares. Confirmation of a repeated partial tidal disruption event (pTDE) requires not only evidence to rule out other types of transients, but also proof that only one star is involved, as TDEs from multiple stars can also produce similar flares. In this letter, we report the discovery of a repeated pTDE, AT 2022dbl. In a quiescent galaxy at $z=0.0284$, two separate optical/UV flares have been observed in 2022 and 2024, with no bright X-ray, radio or mid-infrared counterparts. Compared to the first flare, the second flare has a similar blackbody temperature of ~26,000 K, slightly lower peak luminosity, and slower rise and fall phases. Compared to the ZTF TDEs, their blackbody parameters and light curve shapes are all similar. The spectra taken during the second flare show a steeper continuum than the late-time spectra of the previous flare, consistent with a newly risen flare. More importantly, the possibility of two independent TDEs can be largely ruled out because the optical spectra taken around the peak of the two flares exhibit highly similar broad Balmer, N III and possible He II emission lines, especially the extreme ~4100Å emission lines. This represents the first robust spectroscopic evidence for a repeated pTDE, which can soon be verified by observing the third flare, given its short orbital period.

Submitted 29 July, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: 17 pages, 10 figures, accepted by ApJ Letters on 2024 July 15

arXiv:2405.10775 [pdf, other]

doi 10.3847/2041-8213/ad4ce1

A Novel Model for the MeV Emission Line in GRB 221009A

Authors: Yu-Jia Wei, Jia Ren, Hao-Ning He, Yuan-Pei Yang, Da-Ming Wei, Zi-Gao Dai, B. Theodore Zhang

Abstract: Gamma-ray bursts (GRBs) have long been considered potential sources of ultra-high-energy cosmic rays (UHECRs; with energy $\gtrsim 10^{18} {\rm~eV}$). In this work, we propose a novel model generating MeV emission lines in GRBs, which can constrain the properties of heavy nuclei that potentially exist in GRB jets. Specifically, we find that relativistic hydrogen-like high-atomic-number ions originating from the $β$ decay of unstable nuclei and/or the recombination entrained in the GRB jet can generate narrow MeV emission lines through the de-excitation of excited electrons. This model can successfully explain the MeV emission line observed in the most luminous GRB ever recorded, GRB 221009A, with suitable parameters including a Lorentz factor $γ\sim 820-1700$ and a total mass of heavy nuclei $M_{\rm tot} \sim 10^{23} - 10^{26}$ g. In particular, the emission line broadening can be reasonably attributed to both the expansion of the jet shell and the thermal motion of nuclei, naturally resulting in a narrow width ($σ_{\rm line} / E_{\rm line} \lesssim 0.2$) consistent with the observation. Furthermore, we predict that different GRBs can exhibit lines in different bands with various evolving behaviors, which might be confirmed with further observations. Finally, our model provides indirect evidence that GRBs may be one of the sources of UHECRs.

Submitted 8 June, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: 13 pages, 4 figures; Published in ApJL, https://doi.org/10.3847/2041-8213/ad4ce1

arXiv:2405.10096 [pdf, other]

The Effect of Quantization in Federated Learning: A Rényi Differential Privacy Perspective

Authors: Tianqu Kang, Lumin Liu, Hengtao He, Jun Zhang, S. H. Song, Khaled B. Letaief

Abstract: Federated Learning (FL) is an emerging paradigm that holds great promise for privacy-preserving machine learning using distributed data. To enhance privacy, FL can be combined with Differential Privacy (DP), which involves adding Gaussian noise to the model weights. However, FL faces a significant challenge in terms of large communication overhead when transmitting these model weights. To address this issue, quantization is commonly employed. Nevertheless, the presence of quantized Gaussian noise introduces complexities in understanding privacy protection. This research paper investigates the impact of quantization on privacy in FL systems. We examine the privacy guarantees of quantized Gaussian mechanisms using Rényi Differential Privacy (RDP). By deriving the privacy budget of quantized Gaussian mechanisms, we demonstrate that lower quantization bit levels provide improved privacy protection. To validate our theoretical findings, we employ Membership Inference Attacks (MIA), which gauge the accuracy of privacy leakage. The numerical results align with our theoretical analysis, confirming that quantization can indeed enhance privacy protection. This study not only enhances our understanding of the correlation between privacy and communication in FL but also underscores the advantages of quantization in preserving privacy.

Submitted 16 May, 2024; originally announced May 2024.

Comments: 6 pages, 5 figures, submitted to 2024 IEEE MeditCom

arXiv:2405.10047 [pdf, other]

doi 10.1051/0004-6361/202348988

Stellar Chromospheric Activity Database of Solar-like Stars Based on the LAMOST Low-Resolution Spectroscopic Survey: II. the bolometric and photospheric calibration

Authors: Weitao Zhang, Jun Zhang, Han He, Ali Luo, Haotong Zhang

Abstract: The dependence of stellar magnetic activity on stellar parameters can be illuminated by chromospheric activity studies based on large-scale spectroscopic surveys. The Ca II H and K lines are employed to construct indicators for assessing and studying the chromospheric activity of solar-like stars. We investigate the widely used bolometric and photospheric calibrated chromospheric activity index $R'_{\rm HK}$, derived from the method in the classic literature ($R'_{\rm HK,classic}$) and the method based on the PHOENIX model ($R'_{\rm HK,PHOENIX}$). Since the detailed stellar atmospheric parameters, effective temperature ($T_{\rm eff}$), surface gravity ($\log\,g$), and metallicity ([Fe/H]), are available for LAMOST, we estimate the chromospheric activity index $R'_{\rm HK,PHOENIX}$, along with the corresponding bolometric calibrated index $R_{\rm HK,PHOENIX}$, taking these parameters into account. We provide the database of the derived chromospheric activity parameters for 1,122,495 LAMOST LRS spectra of solar-like stars. Our calculations show that $\log\,R'_{\rm HK,PHOENIX}$ is approximately linearly correlated with $\log\,R'_{\rm HK,classic}$. The results based on our extensive archive support the view that the dynamo mechanism of solar-like stars is generally consistent with that of the Sun, and that the value of the solar chromospheric activity index is located at the midpoint of the solar-like star sample. We further investigate the proportions of solar-like stars with different chromospheric activity levels (very active, active, inactive and very inactive). The investigation indicates that the occurrence rate of high levels of chromospheric activity is lower among the stars with effective temperatures between $5600$ and $5900 \,{\rm K}$.

Submitted 22 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

Comments: 18 pages, 20 figures, accepted for publication in A&A

Journal ref: A&A 688, A23 (2024)

arXiv:2405.09548 [pdf, other]

Efficient Bilevel Source Mask Optimization

Authors: Guojin Chen, Hongquan He, Peng Xu, Hao Geng, Bei Yu

Abstract: Resolution Enhancement Techniques (RETs) are critical to meet the demands of advanced technology nodes. Among RETs, Source Mask Optimization (SMO) is pivotal, concurrently optimizing both the source and the mask to expand the process window. Traditional SMO methods, however, are limited by sequential and alternating optimizations, leading to extended runtimes without performance guarantees. This paper introduces a unified SMO framework utilizing the accelerated Abbe forward imaging to enhance precision and efficiency. Further, we propose the BiSMO framework, which reformulates SMO through a bilevel optimization approach, and present three gradient-based methods to tackle the challenges of bilevel SMO. Our experimental results demonstrate that BiSMO achieves a 40% reduction in error metrics and an 8$\times$ increase in runtime efficiency, signifying a major leap forward in SMO.

Submitted 7 March, 2024; originally announced May 2024.

Comments: Accepted by Design Automation Conference (DAC) 2024

arXiv:2405.09514 [pdf, other]

Tackling Distribution Shifts in Task-Oriented Communication with Information Bottleneck

Authors: Hongru Li, Jiawei Shao, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

Abstract: Task-oriented communication aims to extract and transmit task-relevant information to significantly reduce the communication overhead and transmission latency. However, the unpredictable distribution shifts between training and test data, including domain shift and semantic shift, can dramatically undermine the system performance. In order to tackle these challenges, it is crucial to ensure that the encoded features can generalize to domain-shifted data and detect semantic-shifted data, while remaining compact for transmission. In this paper, we propose a novel approach based on the information bottleneck (IB) principle and invariant risk minimization (IRM) framework. The proposed method aims to extract compact and informative features that possess high capability for effective domain-shift generalization and accurate semantic-shift detection without any knowledge of the test data during training. Specifically, we propose an invariant feature encoding approach based on the IB principle and IRM framework for domain-shift generalization, which aims to find the causal relationship between the input data and task result by minimizing the complexity and domain dependence of the encoded feature. Furthermore, we enhance the task-oriented communication with the label-dependent feature encoding approach for semantic-shift detection, which achieves joint gains in IB optimization and detection performance. To avoid the intractable computation of the IB-based objective, we leverage variational approximation to derive a tractable upper bound for optimization. Extensive simulation results on image classification tasks demonstrate that the proposed scheme outperforms state-of-the-art approaches and achieves a better rate-distortion tradeoff.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 13 pages, 8 figures, submitted to IEEE for potential publication

arXiv:2405.07840 [pdf, other]

Open-vocabulary Auditory Neural Decoding Using fMRI-prompted LLM

Authors: Xiaoyu Chen, Changde Du, Che Liu, Yizhe Wang, Huiguang He

Abstract: Decoding language information from brain signals represents a vital research area within brain-computer interfaces, particularly in the context of deciphering the semantic information from the fMRI signal. However, many existing efforts concentrate on decoding small vocabulary sets, leaving space for the exploration of open vocabulary continuous text decoding. In this paper, we introduce a novel method, the Brain Prompt GPT (BP-GPT). By using the brain representation that is extracted from the fMRI as a prompt, our method can utilize GPT-2 to decode fMRI signals into stimulus text. Further, we introduce a text-to-text baseline and align the fMRI prompt to the text prompt. By introducing the text-to-text baseline, our BP-GPT can extract a more robust brain prompt and promote the decoding of the pre-trained LLM. We evaluate our BP-GPT on the open-source auditory semantic decoding dataset and achieve a significant improvement of up to $4.61\%$ on METEOR and $2.43\%$ on BERTScore across all the subjects compared to the state-of-the-art method. The experimental results demonstrate that using brain representation as a prompt to further drive LLM for auditory neural decoding is feasible and effective.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07691 [pdf, other]

Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (255 additional authors not shown)

Abstract: The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma ray source, 1LHAASO J1219+2915. In this paper a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variation analysis shows an indication of variability at the level of a few months in the TeV band, which is consistent with low frequency observations. Based on these observations, we report the detection of TeV $γ$-ray emissions from this low-luminosity AGN NGC 4278. The observations by LHAASO-WCDA during the active period have a significance level of 8.8\,$σ$ with best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 11 pages, 5 figures

arXiv:2405.06707 [pdf, other]

Hypothesis Testing Prompting Improves Deductive Reasoning in Large Language Models

Authors: Yitian Li, Jidong Tian, Hao He, Yaohui Jin

Abstract: Combining different forms of prompts with pre-trained large language models has yielded remarkable results on reasoning tasks (e.g., Chain-of-Thought prompting). However, when tested on more complex reasoning, these methods also expose problems such as invalid reasoning and fictional reasoning paths. In this paper, we develop Hypothesis Testing Prompting, which adds conclusion assumptions, backward reasoning, and fact verification during intermediate reasoning steps. Hypothesis Testing Prompting involves multiple assumptions and reverse validation of conclusions, leading to a unique correct answer. Experiments on two challenging deductive reasoning datasets, ProofWriter and RuleTaker, show that Hypothesis Testing Prompting not only significantly improves performance, but also generates a more reasonable and standardized reasoning process.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.04872 [pdf, other]

Logical Negation Augmenting and Debiasing for Prompt-based Methods

Authors: Yitian Li, Jidong Tian, Hao He, Yaohui Jin

Abstract: Prompt-based methods have gained increasing attention in NLP and shown validity on many downstream tasks. Many works have focused on mining these methods' potential for knowledge extraction, but few explore their ability to perform logical reasoning. In this work, we focus on the effectiveness of the prompt-based methods on first-order logical reasoning and find that the bottleneck lies in logical negation. Based on our analysis, logical negation tends to result in spurious correlations to negative answers, while propositions without logical negation correlate to positive answers. To solve the problem, we propose a simple but effective method, Negation Augmenting and Negation Debiasing (NAND), which introduces negative propositions to prompt-based methods without updating parameters. Specifically, these negative propositions can counteract spurious correlations by providing "not" for all instances so that models cannot make decisions only by whether expressions contain a logical negation. Experiments on three datasets show that NAND not only solves the problem of calibrating logical negation but also significantly enhances prompt-based methods of logical reasoning without model retraining.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.03324 [pdf]

Investigation of Galactic supernova remnants and their environment in 26.6° < l < 30.6°, $\vert b \vert \leq$ 1.25° using radio surveys

Authors: Tian-Xian Luo, Ping Zhou, Hao-Ning He

Abstract: The problem of missing Galactic supernova remnants (SNRs) refers to the issue that the currently known Galactic SNRs are significantly incomplete compared to the theoretical prediction. To expand the sample of Galactic SNRs, we use GLEAM and THOR+VGPS data across four wavebands ranging from 118 to 1420 MHz to derive a spectral index map covering the region within 26.6° < l < 30.6°, $\vert b \vert \leq$ 1.25°, where numerous SNR candidates were recently found. By using the spectral index map of the sky region and detailed analysis of the spectral indices of individual sources, we confirmed four SNR candidates, namely G26.75+0.73, G27.06+0.04, G28.36+0.21, and G28.78$-$0.44, as SNRs. Additionally, we discovered an expanding molecular superbubble located in this region, discussed pulsars associated with SNR candidates, and discovered a long H$α$ filament that spatially overlaps with the candidate G29.38+0.10. We suggest that the problem of missing Galactic SNRs not only arises from observation limitations, but also could be due to the low-density environments of some SNRs, and the different SN explosion properties.

Submitted 30 June, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: 13 pages, 6 figures; Published in AJ

arXiv:2405.03280 [pdf, other]

Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity

Authors: Yizhuo Lu, Changde Du, Chong Wang, Xuanliu Zhu, Liuyun Jiang, Huiguang He

Abstract: Reconstructing human dynamic vision from brain activity is a challenging task with great scientific significance. The difficulty stems from two primary issues: (1) vision-processing mechanisms in the brain are highly intricate and not fully revealed, making it challenging to directly learn a mapping between fMRI and video; (2) the temporal resolution of fMRI is significantly lower than that of natural videos. To overcome these issues, this paper proposes a two-stage model named Mind-Animator, which achieves state-of-the-art performance on three public datasets. Specifically, during the fMRI-to-feature stage, we decouple semantic, structural, and motion features from fMRI through fMRI-vision-language tri-modal contrastive learning and sparse causal attention. In the feature-to-video stage, these features are merged to videos by an inflated Stable Diffusion. We substantiate that the reconstructed video dynamics are indeed derived from fMRI, rather than hallucinations of the generative model, through permutation tests. Additionally, the visualization of voxel-wise and ROI-wise importance maps confirms the neurobiological interpretability of our model.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.19733 [pdf, other]

Iterative Reasoning Preference Optimization

Authors: Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar, Jason Weston

Abstract: Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks (Yuan et al., 2024, Chen et al., 2024). In this work we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoning steps that lead to the correct answer. We train using a modified DPO loss (Rafailov et al., 2023) with an additional negative log-likelihood term, which we find to be crucial. We show reasoning improves across repeated iterations of this scheme. While only relying on examples in the training set, our approach results in increasing accuracy on GSM8K, MATH, and ARC-Challenge for Llama-2-70B-Chat, outperforming other Llama-2-based models not relying on additionally sourced datasets. For example, we see a large improvement from 55.6% to 81.6% on GSM8K and an accuracy of 88.7% with majority voting out of 32 samples.

Submitted 25 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.16019 [pdf, other]

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

Authors: Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew Bean, Katerina Margatina, Juan Ciro, Rafael Mosquera, Max Bartolo, Adina Williams, He He, Bertie Vidgen, Scott A. Hale

Abstract: Human feedback plays a central role in the alignment of Large Language Models (LLMs). However, open questions remain about the methods (how), domains (where), people (who) and objectives (to what end) of human feedback collection. To navigate these questions, we introduce PRISM, a new dataset which maps the sociodemographics and stated preferences of 1,500 diverse participants from 75 countries, to their contextual preferences and fine-grained feedback in 8,011 live conversations with 21 LLMs. PRISM contributes (i) wide geographic and demographic participation in human feedback data; (ii) two census-representative samples for understanding collective welfare (UK and US); and (iii) individualised feedback where every rating is linked to a detailed participant profile, thus permitting exploration of personalisation and attribution of sample artefacts. We focus on collecting conversations that centre subjective and multicultural perspectives on value-laden and controversial topics, where we expect the most interpersonal and cross-cultural disagreement. We demonstrate the usefulness of PRISM via three case studies of dialogue diversity, preference diversity, and welfare outcomes, showing that it matters which humans set alignment norms. As well as offering a rich community resource, we advocate for broader participation in AI development and a more inclusive approach to technology design.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.15937 [pdf, other]

doi 10.15302/frontphys.2025.015201

Probing Neutral Triple Gauge Couplings via $\boldsymbol{Zγ\,(\ell^+\ell^-γ)}$ Production at $\boldsymbol{e^+e^-}$ Colliders

Authors: Danning Liu, Rui-Qing Xiao, Shu Li, John Ellis, Hong-Jian He, Rui Yuan

Abstract: Neutral triple gauge couplings (nTGCs) are absent in the Standard Model (SM) and at the dimension-6 level in the Standard Model Effective Field Theory (SMEFT), arising first from dimension-8 operators. As such, they provide a unique window for probing new physics beyond the SM. These dimension-8 operators can be mapped to nTGC form factors whose structure is consistent with the spontaneously-broken electroweak gauge symmetry of the SM. In this work, we study the probes of nTGCs in the reaction $e^+e^-\to Zγ$ with $Z\to\ell^+\ell^-\,(\ell =e,μ)$ at an $e^+e^-$ collider. We perform a detector-level simulation and analysis of this reaction at the Circular Electron Positron Collider (CEPC) with collision energy $\sqrt{s} = 240$ GeV and an integrated luminosity of 20 ab$^{-1}$. We present the sensitivity limits on probing the new physics scales of dimension-8 nTGC operators via measurements of the corresponding nTGC form factors.

Submitted 1 July, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

Comments: Frontiers of Physics (in Press), 22 pages, 10 Figs and 10 Tables

Report number: KCL-PH-TH/2024-18, CERN-TH-2024-046

Journal ref: Frontiers of Physics 20 (2025) 15201, no.1 [Cover Article]

arXiv:2404.13582 [pdf, other]

doi 10.1016/j.physletb.2024.138853

Light and hyper nuclei formation at $\sqrt{s_{\text{NN}}} =$ 3 GeV Au+Au collisions using Wigner coalescence approach

Authors: L. K. Liu, C. L. Hu, X. H. He, S. S. Shi, G. N. Xie

Abstract: The production of light nuclei and hyper-nuclei in heavy-ion collisions, particularly at high baryon density, is crucial for understanding the dynamical evolution of the collision system and exploring the internal state of nuclear matter of compact stellar objects. Despite being a topic of ongoing debate, an improved theoretical understanding is necessary. In this work, production of light nuclei ($d$, $t$, $^{3}$He, $^{4}$He) and hyper-nuclei ($^{3}_Λ$H, $^{4}_Λ$H) was investigated using the JAM microscopic transport model combined with an afterburner coalescence process at $\sqrt{s_{\text{NN}}} =$ 3 GeV Au+Au collisions. The formation of a specific nucleus during the coalescence process is determined by its Wigner function. The comparison of the calculations for $\mathrm{p_T}$ spectra, average $\mathrm{p_T}$, and rapidity distributions to the measurements from the STAR experiment was performed. We investigated the dynamic information carried by light nuclei and determined the averaged spatial distance $\langle ΔR \rangle$ and momentum difference $\langle ΔP \rangle$ of constituent nucleons ($Λ$) for each nucleus species.

Submitted 22 July, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

Journal ref: Phys. Lett. B 855 (2024) 138853

arXiv:2404.11209 [pdf, ps, other]

Prompt-Guided Generation of Structured Chest X-Ray Report Using a Pre-trained LLM

Authors: Hongzhao Li, Hongyu Wang, Xia Sun, Hua He, Jun Feng

Abstract: Medical report generation automates radiology descriptions from images, easing the burden on physicians and minimizing errors. However, current methods lack structured outputs and physician interactivity for clear, clinically relevant reports. Our method introduces a prompt-guided approach to generate structured chest X-ray reports using a pre-trained large language model (LLM). First, we identify anatomical regions in chest X-rays to generate focused sentences that center on key visual elements, thereby establishing a structured report foundation with anatomy-based sentences. We also convert the detected anatomy into textual prompts conveying anatomical comprehension to the LLM. Additionally, the clinical context prompts guide the LLM to emphasize interactivity and clinical requirements. By integrating anatomy-focused sentences and anatomy/clinical prompts, the pre-trained LLM can generate structured chest X-ray reports tailored to prompted anatomical regions and clinical contexts. We evaluate using language generation and clinical effectiveness metrics, demonstrating strong performance.

Submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted by IEEE Conference on Multimedia Expo 2024

arXiv:2404.10451 [pdf, other]

Ultrahigh Stability of O-Sublattice in $β$-Ga$_2$O$_3$

Authors: Ru He, Junlei Zhao, Jesper Byggmästar, Huan He, Flyura Djurabekova

Abstract: Recently reported remarkably high radiation tolerance of $γ$/$β$-Ga$_2$O$_3$ double-polymorphic structure brings this ultrawide bandgap semiconductor to the frontiers of power electronics applications that are able to operate in challenging environments. Understanding the mechanism of radiation tolerance is crucial for further material modification and tailoring of the desired properties. In this study, we employ machine-learning-enhanced atomistic simulations to assess the stability of both the gallium (Ga) and oxygen (O) sublattices under various levels of damage. Our study uncovers the remarkable resilience and stability of the O-sublattice, attributing this property to the strong tendency of recovery of the O defects, especially within the stronger disordered regions. Interestingly, we observe the opposite behavior of the Ga defects that display enhanced stability in the same regions of increased disorder. Moreover, we observe that highly defective $β$-Ga$_2$O$_3$ is able to transform into $γ$-Ga$_2$O$_3$ upon annealing due to preserved lattice organization of the O-sublattice. This result clearly manifests that the ultrahigh stability of the O-sublattice provides the backbone for the exceptional radiation tolerance of the $γ$/$β$ double-polymorphic structure. These computational insights closely align with experimental observations, opening avenues for further exploration of polymorphism in Ga$_2$O$_3$ and potentially in analogous polymorphic families spanning a broad range of diverse materials of complex polymorphic nature.

Submitted 18 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.09932 [pdf, other]

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Authors: Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, et al. (13 additional authors not shown)

Abstract: This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose $200+$ concrete research questions.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.08206 [pdf]

Non-uniform wave momentum bandgap in biaxial anisotropic photonic time crystals

Authors: Junhua Dong, Sihao Zhang, Huan He, Huanan Li, Jingjun Xu

Abstract: Photonic time crystals (PTCs) host momentum bandgaps enabling intriguing non-resonant light amplification in propagating waves, but opening substantial bandgaps demands refractive index changes too extreme for conventional nonlinear optics. Here, we introduce momentum bandgaps for non-uniform waves, including evanescent and ghost types, by extending PTCs to biaxial anisotropic photonic time crystals that periodically alternate between uniform biaxial anisotropy and isotropic media over time. We show that ghost waves, unlike evanescent waves, sustain only momentum bandgaps, opening wide bandgaps at even the smallest modulation depths. Moreover, we demonstrate momentum bandgap effects on non-uniform waves that can be amplified, or through decaying modes, selectively attenuated. We find that ghost wave momentum bandgaps uniquely boost refracted over reflected waves under one-way incidence, in stark contrast to balanced amplification seen in both propagating and evanescent waves. Our approach expands time-varying metamaterials by integrating wave characteristics, bridging the gap between conventional nonlinear optics and PTC momentum bandgaps, and shedding new light on extreme manipulation of surface polaritons.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.07443 [pdf]

1-bit Quantized On-chip Hybrid Diffraction Neural Network Enabled by Authentic All-optical Fully-connected Architecture

Authors: Yu Shao, Haiqi Gao, Yipeng Chen, Yujie liu, Junren Wen, Haidong He, Yuchuan Shao, Yueguang Zhang, Weidong Shen, Chenying Yang

Abstract: Optical Diffraction Neural Networks (DNNs), a subset of Optical Neural Networks (ONNs), show promise in mirroring the prowess of electronic networks. This study introduces the Hybrid Diffraction Neural Network (HDNN), a novel architecture that incorporates matrix multiplication into DNNs, synergizing the benefits of conventional ONNs with those of DNNs to surmount the modulation limitations inherent in optical diffraction neural networks. Utilizing a singular phase modulation layer and an amplitude modulation layer, the trained neural network demonstrated remarkable accuracies of 96.39% and 89% in digit recognition tasks in simulation and experiment, respectively. Additionally, we develop the Binning Design (BD) method, which effectively mitigates the constraints imposed by sampling intervals on diffraction units, substantially streamlining experimental procedures. Furthermore, we propose an on-chip HDNN that not only employs a beam-splitting phase modulation layer for enhanced integration level but also significantly relaxes device fabrication requirements, replacing metasurfaces with relief surfaces designed by 1-bit quantization. Besides, we conceptualized an all-optical HDNN-assisted lesion detection network, achieving detection outcomes that were 100% aligned with simulation predictions. This work not only advances the performance of DNNs but also streamlines the path towards industrial optical neural network production.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.06687 [pdf, other]

Fast and Accurate Relative Motion Tracking for Dual Industrial Robots

Authors: Honglu He, Chen-lung Lu, Glenn Saunders, Pinghai Yang, Jeffrey Schoonover, Leo Ajdelsztajn, John Wason, Santiago Paternain, Agung Julius, John T. Wen

Abstract: Industrial robotic applications such as spraying, welding, and additive manufacturing frequently require fast, accurate, and uniform motion along a 3D spatial curve. To increase process throughput, some manufacturers propose a dual-robot setup to overcome the speed limitation of a single robot. Industrial robot motion is programmed through waypoints connected by motion primitives (Cartesian linear and circular paths and linear joint paths at constant Cartesian speed). The actual robot motion is affected by the blending between these motion primitives and the pose of the robot (an outstretched/near-singularity pose tends to have larger path tracking errors). Choosing the waypoints and the speed along each motion segment to achieve the performance requirement is challenging. At present, there is no automated solution, and laborious manual tuning by robot experts is needed to approach the desired performance. In this paper, we present a systematic three-step approach to designing and programming a dual robot system to optimize system performance. The first step is to select the relative placement between the two robots based on the specified relative motion path. The second step is to select the relative waypoints and the motion primitives. The final step is to update the waypoints iteratively based on the actual measured relative motion. Waypoint iteration is first executed in simulation and then completed using the actual robots. For performance assessment, we use the mean path speed subject to the relative position and orientation constraints and the path speed uniformity constraint. We have demonstrated the effectiveness of this method on two systems, a physical testbed of two ABB robots and a simulation testbed of two FANUC robots, for two challenging test curves.

Submitted 14 August, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.06564 [pdf, other]

MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection

Authors: Haoyang He, Yuhu Bai, Jiangning Zhang, Qingdong He, Hongxu Chen, Zhenye Gan, Chengjie Wang, Xiangtai Li, Guanzhong Tian, Lei Xie

Abstract: Recent advancements in anomaly detection have seen the efficacy of CNN- and transformer-based approaches. However, CNNs struggle with long-range dependencies, while transformers are burdened by quadratic computational complexity. Mamba-based models, with their superior long-range modeling and linear efficiency, have garnered substantial attention. This study pioneers the application of Mamba to multi-class unsupervised anomaly detection, presenting MambaAD, which consists of a pre-trained encoder and a Mamba decoder featuring (Locality-Enhanced State Space) LSS modules at multi-scales. The proposed LSS module, integrating parallel cascaded (Hybrid State Space) HSS blocks and multi-kernel convolutions operations, effectively captures both long-range and local information. The HSS block, utilizing (Hybrid Scanning) HS encoders, encodes feature maps into five scanning methods and eight directions, thereby strengthening global connections through the (State Space Model) SSM. The use of Hilbert scanning and eight directions significantly improves feature sequence modeling. Comprehensive experiments on six diverse anomaly detection datasets and seven metrics demonstrate state-of-the-art performance, substantiating the method's effectiveness.

Submitted 14 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.05412 [pdf]

Valley edge states as bound states in the continuum

Authors: Shunda Yin, Liping Ye, Hailong He, Xueqin Huang, Manzhu Ke, Weiyin Deng, Jiuyang Lu, Zhengyou Liu

Abstract: Bound states in the continuum (BICs) are spatially localized states with energy embedded in the continuum spectrum of extended states. The combination of BICs physics and nontrivial band topology theory giving rise to topological BICs, which are robust against disorders and meanwhile of the merit of conventional BICs, is attracting wide attention recently. Here, we report valley edge states as topological BICs, which appear at domain wall between two distinct valley topological phases. The robustness of such BICs is demonstrated. The simulations and experiments show great agreement. Our findings of valley related topological BICs shed light on both BICs and valley physics, and may foster innovative applications of topological acoustic devices.

Submitted 8 April, 2024; originally announced April 2024.

Comments: A revised version has been accepted by Science Bulletin

arXiv:2404.04920 [pdf, other]

Regularized Conditional Diffusion Model for Multi-Task Preference Alignment

Authors: Xudong Yu, Chenjia Bai, Haoran He, Changhong Wang, Xuelong Li

Abstract: Sequential decision-making is desired to align with human intents and exhibit versatility across various tasks. Previous methods formulate it as a conditional generation process, utilizing return-conditioned diffusion models to directly model trajectory distributions. Nevertheless, the return-conditioned paradigm relies on pre-defined reward functions, facing challenges when applied in multi-task settings characterized by varying reward functions (versatility) and showing limited controllability concerning human preferences (alignment). In this work, we adopt multi-task preferences as a unified condition for both single- and multi-task decision-making, and propose preference representations aligned with preference labels. The learned representations are used to guide the conditional generation process of diffusion models, and we introduce an auxiliary objective to maximize the mutual information between representations and corresponding generated trajectories, improving alignment between trajectories and preferences. Extensive experiments in D4RL and Meta-World demonstrate that our method presents favorable performance in single- and multi-task scenarios, and exhibits superior alignment with preferences.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04801 [pdf, ps, other]

doi 10.1007/s41605-024-00467-8

LHAASO-KM2A detector simulation using Geant4

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (254 additional authors not shown)

Abstract: KM2A is one of the main sub-arrays of LHAASO, working on gamma ray astronomy and cosmic ray physics at energies above 10 TeV. Detector simulation is the important foundation for estimating detector performance and data analysis. It is a big challenge to simulate the KM2A detector in the framework of Geant4 due to the need to track numerous photons from a large number of detector units (>6000) with large altitude difference (30 m) and huge coverage (1.3 km^2). In this paper, the design of the KM2A simulation code G4KM2A based on Geant4 is introduced. The process of G4KM2A is optimized mainly in memory consumption to avoid memory overflow. Some simplifications are used to significantly speed up the execution of G4KM2A. The running time is reduced by at least 30 times compared to full detector simulation. The particle distributions and the core/angle resolution comparison between simulation and experimental data of the full KM2A array are also presented, which show good agreement.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04555 [pdf, other]

Cloud-Scale Molecular Gas Properties of the Antennae Merger: A Comparative Study with PHANGS-ALMA Galaxies and NGC 3256

Authors: Nathan Brunetti, Christine D. Wilson, Hao He, Jiayi Sun, Adam K. Leroy, Erik Rosolowsky, Ashley Bemis, Frank Bigiel, Brent Groves, Toshiki Saito, Eva Schinnerer

Abstract: We present observations of the central 9 kpc of the Antennae merger (NGC 4038/9) at 55 pc resolution in the CO 2-1 line obtained with the Atacama Large Millimeter/submillimeter Array (ALMA). We use a pixel-based analysis to compare the gas properties in the Antennae to those in 70 nearby spiral galaxies from the PHANGS-ALMA survey, as well as the merger and nearest luminous infrared galaxy NGC 3256. Compared to PHANGS galaxies at matched spatial resolution, the molecular gas in the Antennae exhibits some of the highest surface densities, velocity dispersions, peak brightness temperatures, and turbulent pressures. However, the virial parameters in the Antennae are consistent with many of the PHANGS galaxies. NGC 3256 has similar gas surface densities but higher nuclear velocity dispersions than the Antennae, as well as higher system-wide peak brightness temperatures and virial parameters. NGC 3256 is at a later stage in the merging process than the Antennae, which may result in more intense merger-driven gas flows that could drive up the turbulence in the gas. The high virial parameters in NGC 3256 may indicate that this increased turbulence is suppressing future star formation as NGC 3256 moves out of the starburst phase. In comparison, the relatively normal virial parameters in the Antennae may imply that it is about to undergo a new burst of star formation.

Submitted 6 April, 2024; originally announced April 2024.

Comments: 16 pages, 8 figures, accepted to MNRAS

Showing 51–100 of 1,330 results for author: He, H