Search | arXiv e-print repository

Hadronic cross section measurements with the DAMPE space mission using 20GeV-10TeV cosmic-ray protons and $^4$He

Authors: F. Alemanno, Q. An, P. Azzarello, F. C. T. Barbato, P. Bernardini, X. J. Bi, I. Cagnoli, M. S. Cai, E. Casilli, E. Catanzani, J. Chang, D. Y. Chen, J. L. Chen, Z. F. Chen, P. Coppin, M. Y. Cui, T. S. Cui, Y. X. Cui, H. T. Dai, A. De Benedittis, I. De Mitri, F. de Palma, A. Di Giovanni, Q. Ding, T. K. Dong, et al. (126 additional authors not shown)

Abstract: Precise direct cosmic-ray (CR) measurements provide an important probe to study the energetic particle sources in our Galaxy, and the interstellar environment through which these particles propagate. Uncertainties on hadronic models, ion-nucleon cross sections in particular, are currently the limiting factor towards obtaining more accurate CR ion flux measurements with calorimetric space-based experiments. We present an energy-dependent measurement of the inelastic cross section of protons and helium-4 nuclei (alpha particles) on a Bi$_4$Ge$_3$O$_{12}$ target, using 88 months of data collected by the DAMPE space mission. The kinetic energy range per nucleon of the measurement points ranges from 18 GeV to 9 TeV for protons, and from 5 GeV/n to 3 TeV/n for helium-4 nuclei. Our results lead to a significant improvement of the CR flux normalisation. In the case of helium-4, these results correspond to the first cross section measurements on a heavy target material at energies above 10 GeV/n.

Submitted 30 August, 2024; originally announced August 2024.

Comments: 17 pages, submitted to PRD

arXiv:2408.17014 [pdf, ps, other]

Channel Estimation for XL-IRS Assisted Wireless Systems with Double-sided Visibility Regions

Authors: Chao Zhou, Changsheng You, Shiqi Gong, Bin Lyu, Beixiong Zheng, Yi Gong

Abstract: In this paper, we study efficient channel estimation design for extremely large-scale intelligent reflecting surface (XL-IRS) assisted multi-user communication systems, where both the base station (BS) and users are located in the near-field region of the XL-IRS. Two unique channel characteristics of XL-IRS are considered, namely, the near-field spherical wavefronts and double-sided visibility regions (VRs) at the BS and users, which render the channel estimation for XL-IRS highly challenging. To address this issue, we propose in this paper an efficient three-step XL-IRS channel estimation method. Specifically, in the first step, an anchor node is delicately deployed near the XL-IRS to estimate the cascaded BS-IRS-anchor channel. Then, an efficient VR detection method is devised to estimate the VR information between the BS and XL-IRS. In this way, only the channels from the visible XL-IRS elements to the BS are estimated, thereby reducing the dimension of the cascaded BS-IRS-users channels to be estimated. Third, by leveraging the common BS-IRS channel, the cascaded channels for all users are consecutively estimated accounting for the VRs of the IRS-user channels. Finally, numerical results are provided to demonstrate the effectiveness of our proposed channel estimation scheme as compared to various benchmark schemes.

Submitted 30 August, 2024; originally announced August 2024.

Comments: 6 pages, 5 figures

arXiv:2408.16068 [pdf, other]

Identification of Prognostic Biomarkers for Stage III Non-Small Cell Lung Carcinoma in Female Nonsmokers Using Machine Learning

Authors: Huili Zheng, Qimin Zhang, Yiru Gong, Zheyan Liu, Shaohan Chen

Abstract: Lung cancer remains a leading cause of cancer-related deaths globally, with non-small cell lung cancer (NSCLC) being the most common subtype. This study aimed to identify key biomarkers associated with stage III NSCLC in non-smoking females using gene expression profiling from the GDS3837 dataset. Utilizing XGBoost, a machine learning algorithm, the analysis achieved a strong predictive performance with an AUC score of 0.835. The top biomarkers identified - CCAAT enhancer binding protein alpha (C/EBP-alpha), lactate dehydrogenase A4 (LDHA), UNC-45 myosin chaperone B (UNC-45B), checkpoint kinase 1 (CHK1), and hypoxia-inducible factor 1 subunit alpha (HIF-1-alpha) - have been validated in the literature as being significantly linked to lung cancer. These findings highlight the potential of these biomarkers for early diagnosis and personalized therapy, emphasizing the value of integrating machine learning with molecular profiling in cancer research.

Submitted 29 August, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

Comments: This paper has been accepted for publication in the IEEE ICBASE 2024 conference

arXiv:2408.15064 [pdf, other]

The constraint on modified black holes with extreme mass ratio inspirals

Authors: Chao Zhang, Guoyang Fu, Yungui Gong

Abstract: The low-energy effective action of String Theory introduces corrections to the dilaton-graviton sector, resulting in deformed black holes beyond general relativity. We analyze extreme mass-ratio inspiral systems (EMRIs), where a stellar-mass object spirals into a slowly rotating supermassive black hole including a distinct deviation parameter. This study examines the effects of this deformation on gravitational wave fluxes, orbital evolution, and phase dynamics, incorporating leading-order post-Newtonian corrections. With one-year observations of EMRIs, we employ the Fisher information matrix method to evaluate the potential for detecting deviations from general relativity through space-based gravitational wave detectors that utilize time-delay interferometry to suppress laser noise. The constraint on modified black holes, $Δα\preceq 10^{-5}$, is almost the same with and without the time-delay interferometry combination. This analysis enhances our understanding and underscores the crucial role of observations in advancing gravitational phenomena within String Theory.

Submitted 28 August, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

Comments: 19 pages, 4 figures; Added some references and revised some sentences; Comments are welcome

arXiv:2408.10672 [pdf, other]

Neural Exploratory Landscape Analysis

Authors: Zeyuan Ma, Jiacheng Chen, Hongshu Guo, Yue-Jiao Gong

Abstract: Recent research in Meta-Black-Box Optimization (MetaBBO) has shown that meta-trained neural networks can effectively guide the design of black-box optimizers, significantly reducing the need for expert tuning and delivering robust performance across complex problem distributions. Despite their success, a paradox remains: MetaBBO still relies on human-crafted Exploratory Landscape Analysis features to inform the meta-level agent about the low-level optimization progress. To address the gap, this paper proposes Neural Exploratory Landscape Analysis (NeurELA), a novel framework that dynamically profiles landscape features through a two-stage, attention-based neural network, executed in an entirely end-to-end fashion. NeurELA is pre-trained over a variety of MetaBBO algorithms using a multi-task neuroevolution strategy. Extensive experiments show that NeurELA achieves consistently superior performance when integrated into different and even unseen MetaBBO tasks and can be efficiently fine-tuned for further performance boost. This advancement marks a pivotal step in making MetaBBO algorithms more autonomous and broadly applicable.

Submitted 20 August, 2024; originally announced August 2024.

arXiv:2408.10571

Prompt-Agnostic Adversarial Perturbation for Customized Diffusion Models

Authors: Cong Wan, Yuhang He, Xiang Song, Yihong Gong

Abstract: Diffusion models have revolutionized customized text-to-image generation, allowing for efficient synthesis of photos from personal data with textual descriptions. However, these advancements bring forth risks including privacy breaches and unauthorized replication of artworks. Previous research primarily centers on using prompt-specific methods to generate adversarial examples to protect personal images, yet the effectiveness of existing methods is hindered by constrained adaptability to different prompts. In this paper, we introduce a Prompt-Agnostic Adversarial Perturbation (PAP) method for customized diffusion models. PAP first models the prompt distribution using a Laplace Approximation, and then produces prompt-agnostic perturbations by maximizing a disturbance expectation based on the modeled distribution. This approach effectively tackles the prompt-agnostic attacks, leading to improved defense stability. Extensive experiments in face privacy and artistic style protection demonstrate the superior generalization of our method in comparison to existing techniques.

Submitted 29 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

Comments: The experiments are insufficient and need to be completed

arXiv:2408.09664 [pdf]

3D-printed terahertz subwavelength dual-core fibers with dense channel-integration

Authors: Haiyuan Ge, Haisu Li, Lu Jie, Jianshuai Wang, Yang Cao, Shaghik Atakaramians, Yandong Gong, Guobin Ren, Li Pei

Abstract: Terahertz (THz) fiber that provides high-speed connections is an essential component in THz communication systems. The emerging space-division-multiplexing technology is expected to increase the transmission capacity of THz communications. A promising candidate to achieve that is integrating multiple channels in a compact THz multi-core fiber system. Here, we propose and experimentally demonstrate a THz subwavelength rectangular dielectric dual-core fiber structure, where two identical cores can be densely integrated, thanks to the polarization-maintaining feature of the rectangular fiber. Different configurations, including the placements, core-spacings, and polarization states of two fiber cores, are comprehensively investigated to improve channel isolation. Numerical simulations show that the fractional power in core of fiber mode has a dominant effect on inter-core coupling performance. Moreover, we design the core size (1 mm x 0.5 mm) slightly less than the WR5.1 waveguide (1.295 mm x 0.6475 mm) so that the fiber can be conveniently connected with the WR5.1 flange port with mode excitation efficiencies up to 62.8%. A cost-efficient dielectric 3D printing technique is employed for rapid fabrication of dual-core fibers and corresponding polymer flange structures that offer solid integration between the fiber samples and the WR5.1 port. Experimental measurements demonstrate that a 4-mm core-spacing (less than three times the operation wavelengths over 0.17-0.21 THz) supports robust dual-channel propagation with channel isolation values more than 15 dB, which are consistent with theoretical and numerical results. This work provides a densely integrated dual-core fiber system with low fabrication cost and practical connection to the WR5.1 flange, holding exciting potential for high-capacity THz space-division-multiplexing communication systems.

Submitted 18 August, 2024; originally announced August 2024.

Comments: 10 pages, 9 figures, 3 tables

arXiv:2408.08909 [pdf]

An Adaptive Differential Privacy Method Based on Federated Learning

Authors: Zhiqiang Wang, Xinyue Yu, Qianli Huang, Yongguang Gong

Abstract: Differential privacy is one of the methods to solve the problem of privacy protection in federated learning. Setting the same privacy budget for each round will result in reduced accuracy in training. The existing methods of the adjustment of privacy budget consider fewer influencing factors and tend to ignore the boundaries, resulting in unreasonable privacy budgets. Therefore, we propose an adaptive differential privacy method based on federated learning. The method sets the adjustment coefficient and scoring function according to accuracy, loss, training rounds, and the number of datasets and clients, and the privacy budget is adjusted based on them. Then the local model update is processed according to the scaling factor and the noise. Finally, the server aggregates the noised local model update and distributes the noised global model. The range of parameters and the privacy of the method are analyzed. Through the experimental evaluation, it can reduce the privacy budget by about 16%, while the accuracy remains roughly the same.

Submitted 13 August, 2024; originally announced August 2024.
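
Illustrative note: the abstract above mentions processing each local model update "according to the scaling factor and the noise". The listing does not give the authors' formulas, so the following is only a minimal sketch of the generic clip-and-add-Gaussian-noise step commonly used in differentially private federated learning. The function name `clip_and_noise` and the parameters `clip_norm` and `noise_std` are hypothetical, and the adaptive privacy-budget rule from the paper is not reproduced here.

```python
import math
import random


def clip_and_noise(update, clip_norm, noise_std, rng=None):
    """Scale an update to at most `clip_norm` in L2 norm, then add Gaussian noise.

    Generic sketch only; NOT the adaptive scheme described in the abstract.
    All names and parameters are illustrative assumptions.
    """
    rng = rng or random.Random()
    norm = math.sqrt(sum(v * v for v in update))
    # Scaling factor: shrink the update only if its norm exceeds the bound.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    # Add independent zero-mean Gaussian noise to each clipped coordinate.
    return [v * scale + rng.gauss(0.0, noise_std) for v in update]
```

In such a scheme, a larger `noise_std` corresponds to a smaller privacy budget per round; how the budget is chosen adaptively is exactly what the paper addresses and is left out of this sketch.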

arXiv:2408.08589 [pdf, other]

Cosmological Prediction of the Void and Galaxy Clustering Measurements in the CSST Spectroscopic Survey

Authors: Yingxiao Song, Qi Xiong, Yan Gong, Furen Deng, Kwan Chuen Chan, Xuelei Chen, Qi Guo, Guoliang Li, Ming Li, Yun Liu, Yu Luo, Wenxiang Pei, Chengliang Wei

Abstract: The void power spectrum is related to the clustering of low-density regions in the large-scale structure (LSS) of the Universe, and can be used as an effective cosmological probe to extract the information of the LSS. We generate the galaxy mock catalogs from the Jiutian simulation, and identify voids using the watershed algorithm for studying the cosmological constraint strength of the China Space Station Telescope (CSST) spectroscopic survey. The galaxy and void auto power spectra and void-galaxy cross power spectra at $z=0.3$, 0.6, and 0.9 are derived from the mock catalogs. To fit the full power spectra, we propose to use the void average effective radius at a given redshift to simplify the theoretical model, and adopt the Markov Chain Monte Carlo (MCMC) technique to implement the constraints on the cosmological and void parameters. The systematical parameters, such as galaxy and void biases, and noise terms in the power spectra are also included in the fitting process. We find that our theoretical model can correctly extract the cosmological information from the galaxy and void power spectra, which demonstrates its feasibility and effectiveness. The joint constraint accuracy of the cosmological parameters can be improved by $\sim20\%$ compared to that from the galaxy power spectrum only. The fitting results of the void density profile and systematical parameters are also well constrained and consistent with the expectation. This indicates that the void clustering measurement can be an effective complement to the galaxy clustering probe, especially for the next generation galaxy surveys.

Submitted 16 August, 2024; originally announced August 2024.

Comments: 11 pages, 5 figures, 2 tables

arXiv:2408.05363 [pdf, other]

AyE-Edge: Automated Deployment Space Search Empowering Accuracy yet Efficient Real-Time Object Detection on the Edge

Authors: Chao Wu, Yifan Gong, Liangkai Liu, Mengquan Li, Yushu Wu, Xuan Shen, Zhimin Li, Geng Yuan, Weisong Shi, Yanzhi Wang

Abstract: Object detection on the edge (Edge-OD) is in growing demand thanks to its ever-broad application prospects. However, the development of this field is rigorously restricted by the deployment dilemma of simultaneously achieving high accuracy, excellent power efficiency, and meeting strict real-time requirements. To tackle this dilemma, we propose AyE-Edge, the first-of-this-kind development tool that explores automated algorithm-device deployment space search to realize Accurate yet power-Efficient real-time object detection on the Edge. Through a collaborative exploration of keyframe selection, CPU-GPU configuration, and DNN pruning strategy, AyE-Edge excels in extensive real-world experiments conducted on a mobile device. The results consistently demonstrate AyE-Edge's effectiveness, realizing outstanding real-time performance, detection accuracy, and notably, a remarkable 96.7% reduction in power consumption, compared to state-of-the-art (SOTA) competitors.

Submitted 25 July, 2024; originally announced August 2024.

arXiv:2408.00297 [pdf, other]

EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head

Authors: Qianyun He, Xinya Ji, Yicheng Gong, Yuanxun Lu, Zhengyu Diao, Linjia Huang, Yao Yao, Siyu Zhu, Zhan Ma, Songcen Xu, Xiaofei Wu, Zixiao Zhang, Xun Cao, Hao Zhu

Abstract: We present a novel approach for synthesizing 3D talking heads with controllable emotion, featuring enhanced lip synchronization and rendering quality. Despite significant progress in the field, prior methods still suffer from multi-view consistency and a lack of emotional expressiveness. To address these issues, we collect the EmoTalk3D dataset with calibrated multi-view videos, emotional annotations, and per-frame 3D geometry. By training on the EmoTalk3D dataset, we propose a 'Speech-to-Geometry-to-Appearance' mapping framework that first predicts faithful 3D geometry sequence from the audio features, then the appearance of a 3D talking head represented by 4D Gaussians is synthesized from the predicted geometry. The appearance is further disentangled into canonical and dynamic Gaussians, learned from multi-view videos, and fused to render free-view talking head animation. Moreover, our model enables controllable emotion in the generated talking heads and can be rendered in wide-range views. Our method exhibits improved rendering quality and stability in lip motion generation while capturing dynamic facial details such as wrinkles and subtle expressions. Experiments demonstrate the effectiveness of our approach in generating high-fidelity and emotion-controllable 3D talking heads. The code and EmoTalk3D dataset are released at https://nju-3dv.github.io/projects/EmoTalk3D.

Submitted 1 August, 2024; originally announced August 2024.

Comments: ECCV 2024

arXiv:2407.17154 [pdf, other]

Forecasting Constraint on the $f(R)$ Theory with the CSST SN Ia and BAO Surveys

Authors: Jun-Hui Yan, Yan Gong, Minglin Wang, Haitao Miao, Xuelei Chen

Abstract: The $f(R)$ modified gravity theory can explain the accelerating expansion of the late Universe without introducing dark energy. In this study, we predict the constraint strength on the $f(R)$ theory using the mock data generated from the China Space Station Telescope (CSST) Ultra-Deep Field (UDF) Type Ia supernova (SN Ia) survey and wide-field slitless spectroscopic baryon acoustic oscillation (BAO) survey. We explore three popular $f(R)$ models, and introduce a parameter $b$ to characterize the deviation of the $f(R)$ theory from the $Λ$CDM theory. The Markov Chain Monte Carlo (MCMC) method is employed to constrain the parameters in the $f(R)$ models, and the nuisance parameters and systematical uncertainties are also considered in the model fitting process. Besides, we also perform model comparisons between the $f(R)$ models and the $Λ$CDM model. We find that the constraint accuracy using the CSST SN Ia+BAO dataset alone is comparable to or even better than the result given by the combination of the current relevant observations, and the CSST SN Ia+BAO survey can distinguish the $f(R)$ models from the $Λ$CDM model. This indicates that the CSST SN Ia and BAO surveys can effectively constrain and test the $f(R)$ theory.

Submitted 24 July, 2024; originally announced July 2024.

Comments: 15 pages, 3 figures, 2 tables

arXiv:2407.15092 [pdf, other]

PFWNN: A deep learning method for solving forward and inverse problems of phase-field models

Authors: Gang Bao, Chang Ma, Yuxuan Gong

Abstract: Phase-field models have been widely used to investigate the phase transformation phenomena. However, it is difficult to solve the problems numerically due to their strong nonlinearities and higher-order terms. This work is devoted to solving forward and inverse problems of the phase-field models by a novel deep learning framework named Phase-Field Weak-form Neural Networks (PFWNN), which is based on the weak forms of the phase-field equations. In this framework, the weak solutions are parameterized as deep neural networks with a periodic layer, while the test function space is constructed by functions compactly supported in small regions. The PFWNN can efficiently solve the phase-field equations characterizing the sharp transitions and identify the important parameters by employing the weak forms. It also allows local training in small regions, which significantly reduces the computational cost. Moreover, it can guarantee the residual descending along the time marching direction, enhancing the convergence of the method. Numerical examples are presented for several benchmark problems. The results validate the efficiency and accuracy of the PFWNN. This work also sheds light on solving the forward and inverse problems of general high-order time-dependent partial differential equations.

Submitted 21 July, 2024; originally announced July 2024.

arXiv:2407.13991 [pdf, other]

Accurately Estimating Redshifts from CSST Slitless Spectroscopic Survey using Deep Learning

Authors: Xingchen Zhou, Yan Gong, Xin Zhang, Nan Li, Xian-Min Meng, Xuelei Chen, Run Wen, Yunkun Han, Hu Zou, Xian Zhong Zheng, Xiaohu Yang, Hong Guo, Pengjie Zhang

Abstract: The China Space Station Telescope (CSST) has the capability to conduct a slitless spectroscopic survey simultaneously with a photometric survey. The spectroscopic survey will measure slitless spectra, potentially providing more accurate estimations of galaxy properties, particularly redshift, compared to broadband photometry. However, due to the low resolution and signal-to-noise ratio of slitless spectra, measurement of these properties is significantly challenging. In this study, we employ a Bayesian neural network (BNN) to assess the accuracy of redshift estimations from slitless spectra anticipated to be observed by CSST. The slitless spectra are simulated based on real data from the early data release of the Dark Energy Spectroscopic Instrument (DESI-EDR) and the 16th data release of the Baryon Oscillation Spectroscopic Survey (BOSS-DR16), combining the 9th data release of the DESI Legacy Survey (DESI LS DR9). The BNN provides redshift estimates along with corresponding uncertainties, achieving an accuracy of $σ_{\rm NMAD} = 0.00063$, outlier percentage $η=0.92\%$ and weighted mean uncertainty $\bar{E} = 0.00228$. These results successfully meet the requirement for cosmological studies using slitless spectra from CSST.

Submitted 18 July, 2024; originally announced July 2024.

Comments: 12 pages, 12 figures, submitted to ApJ, comments are welcome
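
Illustrative note: the abstract above quotes $σ_{\rm NMAD}$ and an outlier percentage $η$ without defining them. The sketch below uses the definition of the normalized median absolute deviation that is common in the redshift-estimation literature, $σ_{\rm NMAD} = 1.48 \times \mathrm{median}(|Δz - \mathrm{median}(Δz)| / (1 + z_{\rm true}))$ with $Δz = z_{\rm pred} - z_{\rm true}$. Whether the paper uses exactly this form is an assumption, and the outlier `threshold` default of 0.02 is a placeholder rather than a value taken from the source.

```python
from statistics import median


def sigma_nmad(z_pred, z_true):
    """Normalized median absolute deviation of redshift residuals.

    Uses the commonly quoted form 1.48 * median(|dz - median(dz)| / (1 + z_true)).
    """
    dz = [p - t for p, t in zip(z_pred, z_true)]
    med = median(dz)
    scaled = [abs(d - med) / (1.0 + t) for d, t in zip(dz, z_true)]
    return 1.48 * median(scaled)


def outlier_fraction(z_pred, z_true, threshold=0.02):
    """Fraction of objects with |dz| / (1 + z_true) above `threshold`.

    The default threshold is an illustrative assumption, not the paper's value.
    """
    flags = [abs(p - t) / (1.0 + t) > threshold for p, t in zip(z_pred, z_true)]
    return sum(flags) / len(flags)
```

Both functions take plain sequences of predicted and true redshifts of equal length; multiplying `outlier_fraction` by 100 gives a percentage comparable in form to $η$.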

arXiv:2407.13700 [pdf, other]

Cross-Task Attack: A Self-Supervision Generative Framework Based on Attention Shift

Authors: Qingyuan Zeng, Yunpeng Gong, Min Jiang

Abstract: Studying adversarial attacks on artificial intelligence (AI) systems helps discover model shortcomings, enabling the construction of a more robust system. Most existing adversarial attack methods only concentrate on single-task single-model or single-task cross-model scenarios, overlooking the multi-task characteristic of artificial intelligence systems. As a result, most of the existing attacks do not pose a practical threat to a comprehensive and collaborative AI system. However, implementing cross-task attacks is highly demanding and challenging due to the difficulty in obtaining the real labels of different tasks for the same picture and harmonizing the loss functions across different tasks. To address this issue, we propose a self-supervised Cross-Task Attack framework (CTA), which utilizes co-attention and anti-attention maps to generate cross-task adversarial perturbation. Specifically, the co-attention map reflects the area to which different visual task models pay attention, while the anti-attention map reflects the area that different visual task models neglect. CTA generates cross-task perturbations by shifting the attention area of samples away from the co-attention map and closer to the anti-attention map. We conduct extensive experiments on multiple vision tasks and the experimental results confirm the effectiveness of the proposed design for adversarial attacks.

Submitted 18 July, 2024; originally announced July 2024.

Comments: Has been accepted by IJCNN2024

arXiv:2407.13646 [pdf, other]

Beyond Dropout: Robust Convolutional Neural Networks Based on Local Feature Masking

Authors: Yunpeng Gong, Chuangliang Zhang, Yongjie Hou, Lifei Chen, Min Jiang

Abstract: In the contemporary era of deep learning, where models often grapple with the challenge of simultaneously achieving robustness against adversarial attacks and strong generalization capabilities, this study introduces an innovative Local Feature Masking (LFM) strategy aimed at fortifying the performance of Convolutional Neural Networks (CNNs) on both fronts. During the training phase, we strategically incorporate random feature masking in the shallow layers of CNNs, effectively alleviating overfitting issues, thereby enhancing the model's generalization ability and bolstering its resilience to adversarial attacks. LFM compels the network to adapt by leveraging remaining features to compensate for the absence of certain semantic features, nurturing a more elastic feature learning mechanism. The efficacy of LFM is substantiated through a series of quantitative and qualitative assessments, collectively showcasing a consistent and significant improvement in CNN's generalization ability and resistance against adversarial attacks--a phenomenon not observed in current and prior methodologies. The seamless integration of LFM into established CNN frameworks underscores its potential to advance both generalization and adversarial robustness within the deep learning paradigm. Through comprehensive experiments, including robust person re-identification baseline generalization experiments and adversarial attack experiments, we demonstrate the substantial enhancements offered by LFM in addressing the aforementioned challenges. This contribution represents a noteworthy stride in advancing robust neural network architectures.

Submitted 18 July, 2024; originally announced July 2024.

Comments: Accepted by IJCNN 2024

arXiv:2407.13640 [pdf, other]

Beyond Augmentation: Empowering Model Robustness under Extreme Capture Environments

Authors: Yunpeng Gong, Yongjie Hou, Chuangliang Zhang, Min Jiang

Abstract: Person Re-identification (re-ID) in computer vision aims to recognize and track individuals across different cameras. While previous research has mainly focused on challenges like pose variations and lighting changes, the impact of extreme capture conditions is often not adequately addressed. These extreme conditions, including varied lighting, camera styles, angles, and image distortions, can significantly affect data distribution and re-ID accuracy. Current research typically improves model generalization under normal shooting conditions through data augmentation techniques such as adjusting brightness and contrast. However, these methods pay less attention to the robustness of models under extreme shooting conditions. To tackle this, we propose a multi-mode synchronization learning (MMSL) strategy. This approach involves dividing images into grids, randomly selecting grid blocks, and applying data augmentation methods like contrast and brightness adjustments. This process introduces diverse transformations without altering the original image structure, helping the model adapt to extreme variations. This method improves the model's generalization under extreme conditions and enables learning diverse features, thus better addressing the challenges in re-ID. Extensive experiments on a simulated test set under extreme conditions have demonstrated the effectiveness of our method. This approach is crucial for enhancing model robustness and adaptability in real-world scenarios, supporting the future development of person re-identification technology.
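The grid-block step described in this abstract (divide the image into a grid, pick blocks at random, adjust contrast and brightness inside each picked block) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `grid_block_augment`, the 4x4 grid, the selection probability, and the contrast and brightness ranges are all assumptions chosen for illustration.

```python
import numpy as np


def grid_block_augment(image, grid=(4, 4), block_prob=0.5, rng=None):
    """Randomly perturb contrast and brightness inside randomly chosen grid blocks.

    Hypothetical helper: the grid size, probability, and parameter ranges
    are illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.astype(np.float32).copy()
    height, width = out.shape[:2]
    rows, cols = grid

    # Block boundaries; linspace copes with sizes not divisible by the grid.
    ys = np.linspace(0, height, rows + 1).astype(int)
    xs = np.linspace(0, width, cols + 1).astype(int)

    for i in range(rows):
        for j in range(cols):
            # Leave the block untouched unless it is selected.
            if rng.random() >= block_prob:
                continue
            block = out[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]  # view into `out`
            contrast = rng.uniform(0.5, 1.5)       # assumed range
            brightness = rng.uniform(-40.0, 40.0)  # assumed range
            mean = block.mean()
            # Scale around the block mean, then shift; written in place.
            block[...] = (block - mean) * contrast + mean + brightness

    # Pixel positions are unchanged; only intensities inside blocks differ.
    return np.clip(out, 0, 255).astype(np.uint8)
```

Because each block is modified in place and no pixels are moved, the spatial layout of the image is preserved, which matches the abstract's statement that the transformations do not alter the original image structure.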

Submitted 18 July, 2024; originally announced July 2024.

Comments: Accepted by IJCNN 2024

arXiv:2407.12919 [pdf, other]

doi 10.3847/1538-4357/ad615a

Nitrogen Abundance Distribution in the inner Milky Way

Authors: Jorge L. Pineda, Shinji Horiuchi, L. D. Anderson, Matteo Luisi, William D. Langer, Paul F. Goldsmith, Thomas B. H. Kuiper, Christian Fischer, Yan Gong, Andreas Brunthaler, Michael Rugel, Karl M. Menten

Abstract: We combine a new Galactic plane survey of Hydrogen Radio Recombination Lines (RRLs) with far-infrared (FIR) surveys of ionized Nitrogen, N+, to determine Nitrogen abundance across Galactic radius. RRLs were observed with NASA DSS-43 70m antenna and the Green Bank Telescope in 108 lines-of-sight spanning -135 degrees < l < 60 degrees, at b=0 degrees. These positions were also observed in [N II] 122 um and 205 um lines with the Herschel Space Observatory. Combining RRL and [N II] 122 um and 205 um observations in 41 of 108 samples with high signal-to-noise ratio, we studied ionized Nitrogen abundance distribution across Galactocentric distances of 0-8 kpc. Combined with existing Solar neighborhood and Outer galaxy N/H abundance determinations, we studied this quantity's distribution within the Milky Way's inner 17 kpc for the first time. We found a Nitrogen abundance gradient extending from Galactocentric radii of 4-17 kpc in the Galactic plane, while within 0-4 kpc, the N/H distribution remained flat. The gradient observed at large Galactocentric distances supports inside-out galaxy growth with the additional steepening resulting from variable star formation efficiency and/or radial flows in the Galactic disk, while the inner 4 kpc flattening, coinciding with the Galactic bar's onset, may be linked to radial flows induced by the bar potential. Using SOFIA/FIFI-LS and Herschel/PACS, we observed the [N III] 57 um line to trace doubly ionized gas contribution in a sub-sample of sightlines. We found negligible N++ contributions along these sightlines, suggesting mostly singly ionized Nitrogen originating from low ionization H II region outskirts.

Submitted 19 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

Comments: Accepted for publication at the Astrophysical Journal. 25 pages, 13 figures

arXiv:2407.12585 [pdf, other]

A global view on star formation: The GLOSTAR Galactic plane survey. XI. Radio source catalog IV: $2^\circ < \ell < 28^\circ$, $36^\circ < \ell < 60^\circ$ and $|b| < 1^\circ$

Authors: S. -N. X. Medina, S. A. Dzib, J. S. Urquhart, A. Y. Yang, A. Brunthaler, K. M. Menten, F. Wyrowski, W. D. Cotton, A. Cheema, R. Dokara, Y. Gong, S. Khan, H. Nguyen, G. N. Ortiz-Leon, M. R. Rugel, V. S. Veena, H. Beuther, T. Csengeri, J. D. Pandian, N. Roy

Abstract: The GLOSTAR survey studies star formation with the VLA and the Effelsberg 100m telescope in the Galactic plane (-2d<l<60d; |b|<1d) and the Cygnus X region with unprecedented sensitivity in both flux density (~50uJy/beam) and the capability of detecting emission with angular scales in the range from 1" to the largest radio structures in the Galaxy. We provide a complete GLOSTAR-VLA D-configuration radio source catalog for the covered part of the Galactic disk. A catalog for the pilot region (28d<l<36d) has been published in a previous paper and here we present the complementary catalog for the area within 2d<l<28d, 36d<l<60d and |b|<1d. Observations were taken with the VLA in a 4-8GHz band to image 100 degrees$^2$ of the inner Galactic disk at a reference frequency of 5.8GHz, using 260h of telescope time. We determined spectral indices inside the observed band and in the frequency range 1.4-5.8GHz by complementing our results with those from the THOR survey (1-2GHz). The final images have an angular resolution of 18" and an average sensitivity of 123uJy/beam. The sensitivity is better (~60uJy/beam) in areas free of extended emission. The Galactic disk catalog presented in this work consists of 11211 radio sources. Of these, 1965 are known large-scale structure sources such as star-forming region complexes, well-known SNRs, SNR candidates or parts thereof. The remaining 9227 are discrete individual sources. Source parameters, namely flux densities, sizes, spectral indices, and classifications are reported. We identify 769 HII region candidates, of which 359 are newly classified as such. The mean value of spectral indices of 225 HII regions is 0.14$\pm$0.02, consistent with most of them emitting optically thin thermal radio emission. Combining our results with the previously published catalog of the pilot region, the final GLOSTAR-VLA D-configuration catalog contains 12981 radio sources.

Submitted 8 August, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

Comments: 21 pages, 18 figures, 7 tables, accepted to be published in the Astronomy & Astrophysics journal. V2 Includes language editor corrections

arXiv:2407.11966 [pdf, other]

Efficient Training with Denoised Neural Weights

Authors: Yifan Gong, Zheng Zhan, Yanyu Li, Yerlan Idelbayev, Andrey Zharkov, Kfir Aberman, Sergey Tulyakov, Yanzhi Wang, Jian Ren

Abstract: Good weight initialization serves as an effective measure to reduce the training cost of a deep neural network (DNN) model. The choice of how to initialize parameters is challenging and may require manual tuning, which can be time-consuming and prone to human error. To overcome such limitations, this work takes a novel step towards building a weight generator to synthesize the neural weights for initialization. We use the image-to-image translation task with generative adversarial networks (GANs) as an example due to the ease of collecting model weights spanning a wide range. Specifically, we first collect a dataset with various image editing concepts and their corresponding trained weights, which are later used for the training of the weight generator. To address the different characteristics among layers and the substantial number of weights to be predicted, we divide the weights into equal-sized blocks and assign each block an index. Subsequently, a diffusion model is trained with such a dataset using both text conditions of the concept and the block indexes. By initializing the image translation model with the denoised weights predicted by our diffusion model, the training requires only 43.3 seconds. Compared to training from scratch (i.e., Pix2pix), we achieve a 15x training time acceleration for a new concept while obtaining even better image generation quality.

Submitted 16 July, 2024; originally announced July 2024.

Comments: ECCV 2024. Project Page: https://yifanfanfanfan.github.io/denoised-weights/

arXiv:2407.11772 [pdf, other]

User Behavior Analysis and Clustering in Peace Elite: Insights and Recommendations

Authors: Yang Qiu, Yuxin Gong, Guanliang Liu

Abstract: This study presents a comprehensive analysis of user behavior and clustering in Peace Elite, a popular mobile battle royale game, employing temporal and static data mining techniques to uncover distinct player segments. Our methodology encompasses time series K-means clustering, graph-based algorithms (DeepWalk and LINE), and static attribute clustering, visualized through innovative hybrid charts. Key findings reveal significant variations in player engagement, skill levels, and social interactions across five primary user segments, ranging from highly active and skilled players to inactive or new users. We also analyze the impact of external factors on user retention and the network structure within clusters, uncovering correlations between cluster cohesion and player activity levels. This research provides valuable insights for game developers and marketers, offering data-driven recommendations for personalized game experiences, targeted marketing strategies, and improved player retention in online gaming environments.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.11657 [pdf, other]

Hyperfine structure of the methanol molecule as traced by Class I methanol masers

Authors: I. I. Agafonova, O. S. Bayandina, Y. Gong, C. Henkel, Kee-Tae Kim, M. G. Kozlov, B. Lankhaar, S. A. Levshakov, K. M. Menten, W. Ubachs, I. E. Val'tts, W. Yang

Abstract: We present results on simultaneous observations of Class I methanol masers at 25, 36, and 44 GHz towards 22 Galactic targets carried out with the Effelsberg 100-m telescope. The study investigates relations between the hyperfine (HF) structure of the torsion-rotation transitions in CH3OH and maser activity. By analyzing the radial velocity shifts between different maser lines together with the patterns of the HF structure based on laboratory measurements and quantum-chemical calculations, we find that in any source only one specific HF transition forms the maser emission and that this transition changes from source to source. The physical conditions leading to this selective behavior are still unclear. Using accurate laboratory rest frequencies for the 25 GHz transitions, we have refined the centre frequencies for the HF multiplets at 36, 44, and 95 GHz: f_36 = (36169.2488 +/- 0.0002_stat +/- 0.0004_sys) MHz, f_44 = (44069.4176 +/- 0.0002_stat +/- 0.0004_sys) MHz, and f_95 = (95169.4414 +/- 0.0003_stat +/- 0.0004_sys) MHz. Comparison with previous observations of 44 GHz masers performed 6-10 years ago with a Korean 21-m KVN telescope towards the same targets confirms the kinematic stability of Class I maser line profiles during this time interval and reveals a systematic radial velocity shift of 0.013 +/- 0.005 km/s between the two telescopes.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 24 pages, 8 figures, 9 tables; accepted for publication in MNRAS

arXiv:2407.11320 [pdf]

A2E: Attribute-based Anonymity-Enhanced Authentication for Accessing Driverless Taxi Service

Authors: Yanwei Gong, Xiaolin Chang, Jelena Mišić, Vojislav B. Mišić

Abstract: The use of driverless vehicles as taxis is gaining more attention due to its potential to enhance urban transportation efficiency. However, both unforeseen incidents led by unsupervised physical users' driverless taxi (DT) rides and personalized needs of users when riding in a DT necessitate the authentication of user identity and attributes. Moreover, safeguarding user identity privacy and quickly tracing malicious users if necessary to enhance the adoption of DTs remains a challenge. This paper proposes a novel Attribute-based Anonymity Enhanced (A2E) authentication scheme for users to access DT service. From the security aspect, A2E has attribute verifiability, which is achieved by designing a user attribute credential based on redactable signature. Meanwhile, this attribute credential also satisfies unlinkability and unforgeability. In addition, A2E has enhanced anonymity, which is achieved by designing a decentralized credential issuance mechanism utilizing ring signature and secret sharing, safeguarding user attributes from association with anonymous identities. Moreover, this mechanism provides traceability and non-frameability to users. From the performance aspect, A2E causes low overhead when tracing malicious users and updating credentials. Besides, A2E is both scalable and lightweight, which contributes to its practicability. We conduct security analysis and performance evaluation to assess the security and performance capabilities of A2E.

Submitted 20 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10714 [pdf, other]

SEMINAR: Search Enhanced Multi-modal Interest Network and Approximate Retrieval for Lifelong Sequential Recommendation

Authors: Kaiming Shen, Xichen Ding, Zixiang Zheng, Yuqi Gong, Qianqian Li, Zhongyi Liu, Guannan Zhang

Abstract: The modeling of users' behaviors is crucial in modern recommendation systems. A lot of research focuses on modeling users' lifelong sequences, which can be extremely long and sometimes exceed thousands of items. These models use the target item to search for the most relevant items from the historical sequence. However, training lifelong sequences in click through rate (CTR) prediction or personalized search ranking (PSR) is extremely difficult due to the insufficient learning problem of ID embedding, especially when the IDs in the lifelong sequence features do not exist in the samples of training dataset. Additionally, existing target attention mechanisms struggle to learn the multi-modal representations of items in the sequence well. The distributions of the multi-modal embedding outputs (text, image and attributes) of a user's interacted items are not properly aligned, and there is divergence across modalities. We also observe that users' search query sequences and item browsing sequences can fully depict users' intents and benefit from each other. To address these challenges, we propose a unified lifelong multi-modal sequence model called SEMINAR: Search Enhanced Multi-Modal Interest Network and Approximate Retrieval. Specifically, a network called Pretraining Search Unit (PSU) learns the lifelong sequences of multi-modal query-item pairs in a pretraining-finetuning manner with multiple objectives: multi-modal alignment, next query-item pair prediction, query-item relevance prediction, etc. After pretraining, the downstream model restores the pretrained embedding as initialization and finetunes the network. To accelerate the online retrieval speed of multi-modal embedding, we propose a multi-modal codebook-based product quantization strategy to approximate the exact attention calculati

Submitted 15 July, 2024; originally announced July 2024.

Comments: 9 pages, code released

arXiv:2407.10281 [pdf, other]

Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning

Authors: Xinyuan Gao, Songlin Dong, Yuhang He, Qiang Wang, Yihong Gong

Abstract: The problem of Rehearsal-Free Continual Learning (RFCL) aims to continually learn new knowledge while preventing forgetting of the old knowledge, without storing any old samples and prototypes. The latest methods leverage large-scale pre-trained models as the backbone and use key-query matching to generate trainable prompts to learn new knowledge. However, the domain gap between the pre-training dataset and the downstream datasets can easily lead to inaccuracies in key-query matching prompt selection when directly generating queries using the pre-trained model, which hampers learning new knowledge. Thus, in this paper, we propose a beyond prompt learning approach to the RFCL task, called Continual Adapter (C-ADA). It mainly comprises a parameter-extensible continual adapter layer (CAL) and a scaling and shifting (S&S) module in parallel with the pre-trained model. C-ADA flexibly extends specific weights in CAL to learn new knowledge for each task and freezes old weights to preserve prior knowledge, thereby avoiding matching errors and operational inefficiencies introduced by key-query matching. To reduce the gap, C-ADA employs an S&S module to transfer the feature space from pre-trained datasets to downstream datasets. Moreover, we propose an orthogonal loss to mitigate the interaction between old and new knowledge. Our approach achieves significantly improved performance and training speed, outperforming the current state-of-the-art (SOTA) method. Additionally, we conduct experiments on domain-incremental learning, surpassing the SOTA, and demonstrating the generality of our approach in different settings.

Submitted 14 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2407.08489 [pdf, other]

Projecting Points to Axes: Oriented Object Detection via Point-Axis Representation

Authors: Zeyang Zhao, Qilong Xue, Yuhang He, Yifan Bai, Xing Wei, Yihong Gong

Abstract: This paper introduces the point-axis representation for oriented object detection, emphasizing its flexibility and geometrically intuitive nature with two key components: points and axes. 1) Points delineate the spatial extent and contours of objects, providing detailed shape descriptions. 2) Axes define the primary directionalities of objects, providing essential orientation cues crucial for precise detection. The point-axis representation decouples location and rotation, addressing the loss discontinuity issues commonly encountered in traditional bounding box-based approaches. For effective optimization without introducing additional annotations, we propose the max-projection loss to supervise point set learning and the cross-axis loss for robust axis representation learning. Further, leveraging this representation, we present the Oriented DETR model, seamlessly integrating the DETR framework for precise point-axis prediction and end-to-end detection. Experimental results demonstrate significant performance improvements in oriented object detection tasks.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 19 pages, 7 figures, accepted by ECCV24!

arXiv:2407.07449 [pdf, other]

Probing new fundamental fields with Extreme Mass Ratio Inspirals

Authors: Chao Zhang, Yungui Gong

Abstract: We examine extreme mass ratio inspirals (EMRIs), where a charged compact object spirals into a supermassive black hole, in modified gravity theories with additional scalar or vector fields. Using the Teukolsky and generalized Sasaki-Nakamura formalisms, we provide the post-Newtonian expansion of the energy flux of the vector waves up to $O(v^5)$ beyond the quadrupole formula in the weak field and numerically calculate the energy flux in the strong field for a charged particle moving in circular orbits. Our findings reveal a degeneracy in the scalar and vector charge parameters for weak-field, slow-motion orbits. However, for strong-field, fast-motion orbits close to the innermost stable circular orbit, we observe distinct behaviors between scalar and vector fields. We investigate the potential of using EMRIs detected by space-based gravitational-wave detectors, such as the Laser Interferometer Space Antenna, to identify whether a black hole carries a scalar or vector charge. The influence of scalar and vector flux on the orbital evolution and tensor GW phase cannot help us distinguish scalar and vector fields. However, extra polarizations emitted by the scalar or vector field can break the correlations between the scalar field and vector field and thereby help us distinguish the two.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 29 pages, 10 figures; comments are welcome

arXiv:2407.07406 [pdf, other]

Weakly-supervised Medical Image Segmentation with Gaze Annotations

Authors: Yuan Zhong, Chenhui Tang, Yumeng Yang, Ruoxi Qi, Kang Zhou, Yuqi Gong, Pheng Ann Heng, Janet H. Hsiao, Qi Dou

Abstract: Eye gaze that reveals human observational patterns has increasingly been incorporated into solutions for vision tasks. Despite recent explorations on leveraging gaze to aid deep networks, few studies exploit gaze as an efficient annotation approach for medical image segmentation which typically entails heavy annotating costs. In this paper, we propose to collect dense weak supervision for medical image segmentation with a gaze annotation scheme. To train with gaze, we propose a multi-level framework that trains multiple networks from discriminative human attention, simulated with a set of pseudo-masks derived by applying hierarchical thresholds on gaze heatmaps. Furthermore, to mitigate gaze noise, a cross-level consistency is exploited to regularize overfitting noisy labels, steering models toward clean patterns learned by peer networks. The proposed method is validated on two public medical datasets of polyp and prostate segmentation tasks. We contribute a high-quality gaze dataset entitled GazeMedSeg as an extension to the popular medical segmentation datasets. To the best of our knowledge, this is the first gaze dataset for medical image segmentation. Our experiments demonstrate that gaze annotation outperforms previous label-efficient annotation schemes in terms of both performance and annotation time. Our collected gaze data and code are available at: https://github.com/med-air/GazeMedSeg.

Submitted 10 July, 2024; originally announced July 2024.

Comments: MICCAI 2024

arXiv:2407.05770 [pdf, other]

A global view on star formation: The GLOSTAR Galactic plane survey X. Galactic HII region catalog using radio recombination lines

Authors: S. Khan, M. R. Rugel, A. Brunthaler, K. M. Menten, F. Wyrowski, J. S. Urquhart, Y. Gong, A. Y. Yang, H. Nguyen, R. Dokara, S. A. Dzib, S. -N. X. Medina, G. N. Ortiz-León, J. D. Pandian, H. Beuther, V. S. Veena, S. Neupane, A. Cheema, W. Reich, N. Roy

Abstract: Studies of Galactic HII regions are of crucial importance for studying star formation and the evolution of the interstellar medium. Gaining an insight into their physical characteristics contributes to a more comprehensive understanding of these phenomena. The GLOSTAR project aims to provide a GLObal view on STAR formation in the Milky Way by performing an unbiased and sensitive survey. This is achieved by using the extremely wideband (4-8 GHz) C-band receiver of the Karl G. Jansky Very Large Array and the Effelsberg 100 m telescope. Using radio recombination lines observed in the GLOSTAR survey with the VLA in D-configuration with a typical line sensitivity of 1σ $\sim$ 3.0 mJy beam$^{-1}$ at $\sim$ 5 km s$^{-1}$ and an angular resolution of 25", we cataloged 244 individual Galactic HII regions and derived their physical properties. We examined the mid-infrared (MIR) morphology of these HII regions and find that a significant portion of them exhibit a bubble-like morphology in the GLIMPSE 8 μm emission. We also searched for associations with the dust continuum and sources of methanol maser emission, other tracers of young stellar objects, and find that 48% and 14% of our HII regions, respectively, are coextensive with those. We measured the electron temperature for a large sample of HII regions within Galactocentric distances spanning from 1.6 to 13.1 kpc and derived the Galactic electron temperature gradient as $\sim$ 372 $\pm$ 28 K kpc$^{-1}$ with an intercept of 4248 $\pm$ 161 K, which is consistent with previous studies.
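Read as a straight-line fit, the gradient and intercept quoted in this abstract correspond to the relation below. The symbols $T_e$ and $R_{\rm gal}$ and the sample radius of 8 kpc are assumptions introduced here for illustration; only the two fitted numbers and the 1.6 to 13.1 kpc range come from the abstract, and the worked value ignores the quoted uncertainties.

```latex
% Linear relation implied by the quoted gradient and intercept
% (symbols T_e and R_gal are illustrative labels, not from the abstract)
T_e(R_{\rm gal}) \approx (4248 \pm 161)\,\mathrm{K}
  + (372 \pm 28)\,\mathrm{K\,kpc^{-1}} \times R_{\rm gal},
\qquad 1.6 \le R_{\rm gal} \le 13.1\ \mathrm{kpc}

% Worked example at an assumed radius of 8 kpc, central values only
T_e(8\,\mathrm{kpc}) \approx 4248 + 372 \times 8 = 7224\ \mathrm{K}
```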

Submitted 8 July, 2024; originally announced July 2024.

Comments: Accepted for publication in A&A

arXiv:2407.04082 [pdf, other]

DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners

Authors: Saurabhchand Bhati, Yuan Gong, Leonid Karlinsky, Hilde Kuehne, Rogerio Feris, James Glass

Abstract: State-space models (SSMs) have emerged as an alternative to Transformers for audio modeling due to their high computational efficiency with long inputs. While recent efforts on Audio SSMs have reported encouraging results, two main limitations remain: First, in 10-second short audio tagging tasks, Audio SSMs still underperform compared to Transformer-based models such as Audio Spectrogram Transformer (AST). Second, although Audio SSMs theoretically support long audio inputs, their actual performance with long audio has not been thoroughly evaluated. To address these limitations, in this paper, 1) We applied knowledge distillation in audio space model training, resulting in a model called Knowledge Distilled Audio SSM (DASS). To the best of our knowledge, it is the first SSM that outperforms the Transformers on AudioSet and achieves an mAP of 47.6; and 2) We designed a new test called Audio Needle In A Haystack (Audio NIAH). We find that DASS, trained with only 10-second audio clips, can retrieve sound events in audio recordings up to 2.5 hours long, while the AST model fails when the input is just 50 seconds, demonstrating SSMs are indeed more duration scalable.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.02061 [pdf, other]

LiDAR-based HD Map Localization using Semantic Generalized ICP with Road Marking Detection

Authors: Yansong Gong, Xinglian Zhang, Jingyi Feng, Xiao He, Dan Zhang

Abstract: In GPS-denied scenarios, a robust environmental perception and localization system becomes crucial for autonomous driving. In this paper, a LiDAR-based online localization system is developed, incorporating road marking detection and registration on a high-definition (HD) map. Within our system, a road marking detection approach is proposed with real-time performance, in which an adaptive segmentation technique is first introduced to isolate high-reflectance points correlated with road markings, enhancing real-time efficiency. Then, a spatio-temporal probabilistic local map is formed by aggregating historical LiDAR scans, providing a dense point cloud. Finally, a LiDAR bird's-eye view (LiBEV) image is generated, and an instance segmentation network is applied to accurately label the road markings. For road marking registration, a semantic generalized iterative closest point (SG-ICP) algorithm is designed. Linear road markings are modeled as 1-manifolds embedded in 2D space, mitigating the influence of constraints along the linear direction, addressing the under-constrained problem and achieving a higher localization accuracy on HD maps than ICP. Extensive experiments are conducted in real-world scenarios, demonstrating the effectiveness and robustness of our system.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.01541 [pdf]

Integration of Computer Networks and Artificial Neural Networks for an AI-based Network Operator

Authors: Binbin Wu, Jingyu Xu, Yifan Zhang, Bo Liu, Yulu Gong, Jiaxin Huang

Abstract: This paper proposes an integrated approach combining computer networks and artificial neural networks to construct an intelligent network operator, functioning as an AI model. State information from computer networks is transformed into embedded vectors, enabling the operator to efficiently recognize different pieces of information and accurately output appropriate operations for the computer network at each step. The operator has undergone comprehensive testing, achieving a 100% accuracy rate, thus eliminating operational risks. Furthermore, a novel algorithm is proposed to emphasize crucial training losses, aiming to enhance the efficiency of operator training. Additionally, a simple computer network simulator is created and encapsulated into training and testing environment components, enabling automation of the data collection, training, and testing processes. This abstract outlines the core contributions of the paper while highlighting the innovative methodology employed in the development and validation of the AI-based network operator.

Submitted 9 April, 2024; originally announced July 2024.

arXiv:2406.19740 [pdf, ps, other]

doi 10.3847/1538-4357/ad47a3

Spatial distribution of C4H and c-C3H2 in cold molecular cores

Authors: Yijia Liu, Junzhi Wang, Shu Liu, Ningyu Tang, Yan Gong, Yuqiang Li, Juan LI, Rui Luo, Yani Xu

Abstract: C$_4$H and $c$-C$_3$H$_2$, as unsaturated hydrocarbon molecules, are important for forming large organic molecules in the interstellar medium. We present mapping observations of C$_4$H ($N$=9$-$8) lines, $c$-C$_3$H$_2$ ($J_{Ka,Kb}$=2$_{1,2}$$-$1$_{0,1}$) at 85338.894 MHz, and H$^{13}$CO$^+$ ($J$=1$-$0) at 86754.2884 MHz toward 19 nearby cold molecular cores in the Milky Way with the IRAM 30m telescope. C$_4$H 9--8 was detected in 13 sources, while $c$-C$_3$H$_2$ was detected in 18 sources. The widely existing C$_4$H and $c$-C$_3$H$_2$ molecules in cold cores provide material to form large organic molecules. Different spatial distributions between C$_4$H 9--8 and $c$-C$_3$H$_2$ 2--1 were found. The relative abundances of these three molecules were obtained under the assumption of local thermodynamic equilibrium conditions with a fixed excitation temperature. The abundance ratio of C$_4$H to $c$-C$_3$H$_2$ ranged from 0.34 $\pm$ 0.09 in G032.93+02 to 4.65 $\pm$ 0.50 in G008.67+22. A weak correlation between C$_4$H/H$^{13}$CO$^+$ and $c$-C$_3$H$_2$/H$^{13}$CO$^+$ abundance ratios was found, with a correlation coefficient of 0.46, which indicates that there is no tight astrochemical connection between C$_4$H and $c$-C$_3$H$_2$ molecules.

Submitted 28 June, 2024; originally announced June 2024.

Comments: 17 pages, 2 figures

arXiv:2406.18625 [pdf, other]

Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer

Authors: Liming Wang, Yuan Gong, Nauman Dawalatabad, Marco Vilela, Katerina Placek, Brian Tracey, Yishu Gong, Alan Premasiri, Fernando Vieira, James Glass

Abstract: Automatic prediction of amyotrophic lateral sclerosis (ALS) disease progression provides a more efficient and objective alternative than manual approaches. We propose ALS longitudinal speech transformer (ALST), a neural network-based automatic predictor of ALS disease progression from longitudinal speech recordings of ALS patients. By taking advantage of high-quality pretrained speech features and longitudinal information in the recordings, our best model achieves 91.0\% AUC, improving upon the previous best model by 5.6\% relative on the ALS TDI dataset. Careful analysis reveals that ALST is capable of fine-grained and interpretable predictions of ALS progression, especially for distinguishing between rarer and more severe cases. Code is publicly available.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.17505 [pdf, ps, other]

Chebyshev Moment Method for Regular Graphs II: Discrete Trace Formula

Authors: Yulin Gong, Wenbo Li, Shiping Liu

Abstract: We establish discrete trace formulas on a regular graph to relate its spectrum and non-backtracking walks. Our approach is based on the Chebyshev-type polynomials and we refer to this treatment as Chebyshev moment method. A key fact is that Chebyshev-type polynomials form a complete orthogonal basis with respect to the Kesten-McKay distribution. Based on this method, we further apply Cauchy's integral formula for holomorphic functions to prove the pre-trace formula for all regular graphs and the trace formula for finite regular graphs. We further apply our results to study the resolvent, heat and Schrödinger equations, Ihara zeta functions, and combinatorial enumeration problems on regular graphs.

Submitted 25 June, 2024; originally announced June 2024.

MSC Class: 05C30; 05C31; 05C50; 05C62

arXiv:2406.16905 [pdf]

Optimising Random Forest Machine Learning Algorithms for User VR Experience Prediction Based on Iterative Local Search-Sparrow Search Algorithm

Authors: Xirui Tang, Feiyang Li, Zinan Cao, Qixuan Yu, Yulu Gong

Abstract: In this paper, an improved method for VR user experience prediction is investigated by introducing a sparrow search algorithm and a random forest algorithm improved by an iterative local search-optimised sparrow search algorithm. The study firstly conducted a statistical analysis of the data, and then trained and tested using the traditional random forest model, the random forest model improved by the sparrow search algorithm, and the random forest algorithm improved based on the iterative local search-sparrow search algorithm, respectively. The results show that the traditional random forest model has a prediction accuracy of 93% on the training set but only 73.3% on the test set, which is poor in generalisation; whereas the model improved by the sparrow search algorithm has a prediction accuracy of 94% on the test set, which is improved compared with the traditional model. What is more noteworthy is that the improved model based on the iterative local search-sparrow search algorithm achieves 100% accuracy on both the training and test sets, which is significantly better than the other two methods. These research results provide new ideas and methods for VR user experience prediction, especially the improved model based on the iterative local search-sparrow search algorithm performs well and is able to more accurately predict and classify the user's VR experience. In the future, the application of this method in other fields can be further explored, and its effectiveness can be verified through real cases to promote the development of AI technology in the field of user experience.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.16694 [pdf, other]

Task Oriented In-Domain Data Augmentation

Authors: Xiao Liang, Xinyu Hu, Simiao Zuo, Yeyun Gong, Qiang Lou, Yi Liu, Shao-Lun Huang, Jian Jiao

Abstract: Large Language Models (LLMs) have shown superior performance in various applications and fields. To achieve better performance on specialized domains such as law and advertisement, LLMs are often continually pre-trained on in-domain data. However, existing approaches suffer from two major issues. First, in-domain data are scarce compared with general domain-agnostic data. Second, data used for continual pre-training are not task-aware, such that they may not be helpful to downstream applications. We propose TRAIT, a task-oriented in-domain data augmentation framework. Our framework is divided into two parts: in-domain data selection and task-oriented synthetic passage generation. The data selection strategy identifies and selects a large amount of in-domain data from general corpora, and thus significantly enriches domain knowledge in the continual pre-training data. The synthetic passages contain guidance on how to use domain knowledge to answer questions about downstream tasks. By training on such passages, the model aligns with the needs of downstream applications. We adapt LLMs to two domains: advertisement and math. On average, TRAIT improves LLM performance by 8% in the advertisement domain and 7.5% in the math domain.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16274 [pdf, other]

Dynamical structures of misaligned circumbinary planets under hierarchical three-body systems

Authors: Hanlun Lei, Yanxiang Gong

Abstract: All circumbinary planets (CBPs) currently detected are located in almost co-planar configurations with respect to the binary orbit, due to the fact that CBPs with higher misalignment are more difficult to detect. However, observations of polar circumbinary gas and debris disks in recent years and long-term orbital stability of inclined planets indicate that it is possible to form misaligned CBPs around eccentric binaries (even polar CBPs). In this work we focus on the dynamical structures of CBPs in a wide range of parameters in order to provide guidance on the space where the binary can host planets for a long enough time. To this end, the dynamical model is approximated as a hierarchical three-body problem, and the secular approximation is formulated up to the hexadecapolar order in semimajor axis ratio. Dynamical maps show that there are complex structures in the parameter space. A web of secular resonances is produced in the entire parameter space and it can well explain those numerical structures arising in dynamical maps. Based on perturbative treatments, an adiabatic invariant is introduced and thus dynamical structures can be explored by analysing phase portraits. It is found that (a) the quadrupole-order resonance (nodal resonance) is responsible for the distribution of the V-shape region, and high-order and secondary resonances dominate those structures inside or outside the V-shape region, and (b) the secondary 1:1 resonance is the culprit causing symmetry breaking of dynamical structures inside the polar region.

Submitted 23 June, 2024; originally announced June 2024.

Comments: Accepted for publication in MNRAS. 20 pages, 15 figures

arXiv:2406.15330 [pdf, other]

Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance

Authors: Haoling Li, Xin Zhang, Xiao Liu, Yeyun Gong, Yifan Wang, Yujiu Yang, Qi Chen, Peng Cheng

Abstract: Large language models (LLMs) have revolutionized many fields of research. Although it is well-known that fine-tuning is essential for enhancing the capabilities of LLMs, existing research suggests that there is potential redundancy in the fine-tuning process and therefore proposes to update only a subset of parameters. However, these methods fail to leverage the task-specific information to identify important parameters during training. Based on the insight that gradients inherently contain information on task-specific data, we propose Gradient-Mask Tuning (GMT), a method that selectively updates parameters during training based on their gradient information. Specifically, we compute the absolute values of the gradients and apply masking to those with relatively smaller magnitudes. Our empirical results across various tasks demonstrate that GMT not only outperforms traditional fine-tuning methods but also elevates the upper limits of LLM performance. Further analysis indicates that GMT exhibits insensitivity to mask ratio and possesses computational efficiency comparable to vanilla SFT.
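The masking step described in this abstract can be illustrated with a minimal sketch. It assumes a single global threshold over a flat list of gradients and a plain SGD update; the function names, the `mask_ratio` parameter, and the update rule are illustrative assumptions, not the authors' implementation.

```python
def gradient_mask(grads, mask_ratio):
    """Zero the fraction `mask_ratio` of gradients with the smallest magnitude.

    Hypothetical helper: the abstract only says that gradients with
    relatively smaller absolute values are masked.
    """
    k = int(mask_ratio * len(grads))  # number of entries to mask
    # Indices ordered by increasing absolute gradient value.
    order = sorted(range(len(grads)), key=lambda i: abs(grads[i]))
    dropped = set(order[:k])
    return [0.0 if i in dropped else g for i, g in enumerate(grads)]


def masked_sgd_step(params, grads, lr=0.1, mask_ratio=0.5):
    """One SGD step in which masked gradients leave their parameters unchanged."""
    masked = gradient_mask(grads, mask_ratio)
    return [p - lr * g for p, g in zip(params, masked)]


if __name__ == "__main__":
    grads = [0.1, -2.0, 0.5, -0.05]
    print(gradient_mask(grads, 0.5))
    print(masked_sgd_step([1.0, 1.0, 1.0, 1.0], grads))
```

With a mask ratio of 0.5, the two smallest-magnitude gradients are zeroed, so only the second and third parameters move in the update.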

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14931 [pdf, other]

Multi-beam Training for Near-field Communications in High-frequency Bands

Authors: Cong Zhou, Changsheng You, Zixuan Huang, Shuo Shi, Yi Gong, Chan-Byoung Chae, Kaibin Huang

Abstract: In this paper, we study efficient multi-beam training design for near-field communications to reduce the beam training overhead of conventional single-beam training methods. In particular, the array-division based multi-beam training method, which is widely used in far-field communications, cannot be directly applied to the near-field scenario, since different sub-arrays may observe different user angles and there exist coverage holes in the angular domain. To address these issues, we first devise a new near-field multi-beam codebook by sparsely activating a portion of antennas to form a sparse linear array (SLA), hence generating multiple beams simultaneously by effectively exploiting the near-field grating lobes. Next, a two-stage near-field beam training method is proposed, in which several candidate user locations are first identified based on multi-beam sweeping over time, followed by a second stage that determines the true user location with a small number of single-beam sweeps. Finally, numerical results show that our proposed multi-beam training method significantly reduces the beam training overhead of conventional single-beam training methods, while achieving comparable rate performance in data transmission.

Submitted 21 June, 2024; originally announced June 2024.

Comments: In this paper, a novel near-field multi-beam training scheme is proposed by sparsely activating a portion of antennas to form a sparse linear array

arXiv:2406.13205 [pdf]

Application of Computer Deep Learning Model in Diagnosis of Pulmonary Nodules

Authors: Yutian Yang, Hongjie Qiu, Yulu Gong, Xiaoyi Liu, Yang Lin, Muqing Li

Abstract: The 3D simulation model of the lung was established by using the reconstruction method. A computer aided pulmonary nodule detection model was constructed. The process iterates over the images to refine the lung nodule recognition model based on neural networks. It is integrated with 3D virtual modeling technology to improve the interactivity of the system, so as to achieve intelligent recognition of lung nodules. A 3D RCNN (Region-based Convolutional Neural Network) was utilized for feature extraction and nodule identification. The LUNA16 large sample database was used as the research dataset. FROC (Free-response Receiver Operating Characteristic) analysis was applied to evaluate the model, calculating sensitivity at various false positive rates to derive the average FROC. Compared with conventional diagnostic methods, the recognition rate was significantly improved. This technique facilitates the detection of pulmonary abnormalities at an initial phase, which holds immense value for the prompt diagnosis of lung malignancies.

Submitted 19 June, 2024; originally announced June 2024.

MSC Class: 68T10; 92C50

arXiv:2406.11158 [pdf, other]

Dynamic Modeling and Control for an Offshore Semisubmersible Floating Wind Turbine

Authors: Yingjie Gong, Qinmin Yang, Hua Geng, Wenchao Meng, Lin Wang

Abstract: Floating wind turbines (FWTs) hold significant potential for the exploitation of offshore renewable energy resources. Nevertheless, prior to the construction of FWTs, it is imperative to tackle several critical challenges, especially the issue of performance degradation under combined wind and wave loads. This study begins with the development of a simplified nonlinear dynamical model for a semi-submersible FWT. In particular, both the rotor dynamics and the finite rotations of the platform are considered in the presented modeling approach, thereby effectively capturing the complex interplay between the platform, tower, nacelle, and rotor under combined wind and wave loads. Subsequently, based on the developed FWT model, a novel adaptive nonlinear pitch controller is formulated with the goal of striking a trade-off between regulating power generation and reducing platform motion. Notably, the proposed control strategy adopts a continuous control approach, which helps circumvent the chattering phenomenon commonly associated with sliding mode control. Furthermore, the controller integrates an online approximator and a robust integral of the sign of the tracking error, facilitating real-time learning of unknown system dynamics while compensating for bounded disturbances. Finally, both the accuracy of the established nonlinear FWT model in predicting key dynamics and the superiority of the presented pitch controller are validated through comprehensive comparative studies.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10082 [pdf, other]

Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

Authors: Andrew Rouditchenko, Yuan Gong, Samuel Thomas, Leonid Karlinsky, Hilde Kuehne, Rogerio Feris, James Glass

Abstract: Audio-Visual Speech Recognition (AVSR) uses lip-based video to improve performance in noise. Since videos are harder to obtain than audio, the video training data of AVSR models is usually limited to a few thousand hours. In contrast, speech models such as Whisper are trained with hundreds of thousands of hours of data, and thus learn a better speech-to-text decoder. The huge training data difference motivates us to adapt Whisper to handle video inputs. Inspired by Flamingo which injects visual features into language models, we propose Whisper-Flamingo which integrates visual features into the Whisper speech recognition and translation model with gated cross attention. Our audio-visual Whisper-Flamingo outperforms audio-only Whisper on English speech recognition and En-X translation for 6 languages in noisy conditions. Moreover, Whisper-Flamingo is a versatile model and conducts all of these tasks using one set of parameters, while prior methods are trained separately on each language.

Submitted 14 June, 2024; originally announced June 2024.

Comments: Interspeech 2024. Code https://github.com/roudimit/whisper-flamingo

arXiv:2406.09710 [pdf, other]

Fine-Grained Urban Flow Inference with Multi-scale Representation Learning

Authors: Shilu Yuan, Dongfeng Li, Wei Liu, Xinxin Zhang, Meng Chen, Junjie Zhang, Yongshun Gong

Abstract: Fine-grained urban flow inference (FUFI) is a crucial transportation service aimed at improving traffic efficiency and safety. FUFI can infer fine-grained urban traffic flows based solely on observed coarse-grained data. However, most existing methods focus on the influence of single-scale static geographic information on FUFI, neglecting the interactions and dynamic information between different-scale regions within the city. Different-scale geographical features can capture redundant information from the same spatial areas. In order to effectively learn multi-scale information across time and space, we propose an effective fine-grained urban flow inference model called UrbanMSR, which uses self-supervised contrastive learning to obtain dynamic multi-scale representations of neighborhood-level and city-level geographic information, and fuses multi-scale representations to improve fine-grained accuracy. We validate the performance through extensive experiments on three real-world datasets. Comparisons with state-of-the-art methods demonstrate the superiority of the proposed model.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09321 [pdf, other]

JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models

Authors: Delong Ran, Jinyuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang

Abstract: Jailbreak attacks aim to induce Large Language Models (LLMs) to generate harmful responses for forbidden instructions, presenting severe misuse threats to LLMs. Research into jailbreak attacks and defenses is emerging; however, there is (surprisingly) no consensus on how to evaluate whether a jailbreak attempt is successful. In other words, the methods to assess the harmfulness of an LLM's response are varied, such as manual annotation or prompting GPT-4 in specific ways. Each approach has its own set of strengths and weaknesses, impacting their alignment with human values, as well as the time and financial cost. This diversity in evaluation presents challenges for researchers in choosing suitable evaluation methods and conducting fair comparisons across different jailbreak attacks and defenses. In this paper, we conduct a comprehensive analysis of jailbreak evaluation methodologies, drawing from nearly ninety jailbreak studies released between May 2023 and April 2024. Our study introduces a systematic taxonomy of jailbreak evaluators, offering in-depth insights into their strengths and weaknesses, along with the current status of their adaptation. Moreover, to facilitate subsequent research, we propose JailbreakEval, a user-friendly toolkit focusing on the evaluation of jailbreak attempts. It includes various well-known evaluators out-of-the-box, so that users can obtain evaluation results with only a single command. JailbreakEval also allows users to customize their own evaluation workflow in a unified framework with the ease of development and comparison. In summary, we regard JailbreakEval to be a catalyst that simplifies the evaluation process in jailbreak research and fosters an inclusive standard for jailbreak evaluation within the community.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Our code is available at https://github.com/ThuCCSLab/JailbreakEval

arXiv:2406.06558 [pdf, other]

Enhancing Text Authenticity: A Novel Hybrid Approach for AI-Generated Text Detection

Authors: Ye Zhang, Qian Leng, Mengran Zhu, Rui Ding, Yue Wu, Jintong Song, Yulu Gong

Abstract: The rapid advancement of Large Language Models (LLMs) has ushered in an era where AI-generated text is increasingly indistinguishable from human-generated content. Detecting AI-generated text has become imperative to combat misinformation, ensure content authenticity, and safeguard against malicious uses of AI. In this paper, we propose a novel hybrid approach that combines traditional TF-IDF techniques with advanced machine learning models, including Bayesian classifiers, Stochastic Gradient Descent (SGD), Categorical Gradient Boosting (CatBoost), and 12 instances of Deberta-v3-large models. Our approach aims to address the challenges associated with detecting AI-generated text by leveraging the strengths of both traditional feature extraction methods and state-of-the-art deep learning models. Through extensive experiments on a comprehensive dataset, we demonstrate the effectiveness of our proposed method in accurately distinguishing between human and AI-generated text. Our approach achieves superior performance compared to existing methods. This research contributes to the advancement of AI-generated text detection techniques and lays the foundation for developing robust solutions to mitigate the challenges posed by AI-generated content.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2406.06007 [pdf, other]

CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

Authors: Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, Zongyuan Ge, Gang Li, James Zou, Huaxiu Yao

Abstract: Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehensively evaluate the Trustworthiness of Med-LVLMs across the medical domain. We assess the trustworthiness of Med-LVLMs across five dimensions, including trustfulness, fairness, safety, privacy, and robustness. CARES comprises about 41K question-answer pairs in both closed and open-ended formats, covering 16 medical image modalities and 27 anatomical regions. Our analysis reveals that the models consistently exhibit concerns regarding trustworthiness, often displaying factual inaccuracies and failing to maintain fairness across different demographic groups. Furthermore, they are vulnerable to attacks and demonstrate a lack of privacy awareness. We publicly release our benchmark and code in https://github.com/richard-peng-xia/CARES.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.05759 [pdf, ps, other]

Chebyshev Moment Method for Regular Graphs I: Kesten-McKay and Semicircle distributions

Authors: Yulin Gong, Wenbo Li, Shiping Liu

Abstract: We develop the Chebyshev moment method to study the spectrum of regular graphs, motivated by the work of Serré. By this method, we give an elementary proof of the weak convergence to the Kesten-McKay distribution for the normalized spectral measures of random $N$-lifts in probability as $N$ tends to infinity. For a sequence of random $(q_n+1)$-regular graphs $G_n$ with $n$ vertices, we show that if $q_n=n^{o(1)}$ and $q_n$ tends to infinity, then the normalized spectral measure converges in Wasserstein $p$-distance $W_{p}$ to the semicircle distribution for any $p \in [1,\infty)$ almost surely. This strengthens the result of Dumitriu and Pal.
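For reference, the two limiting laws named in this abstract have the following standard densities. The notation ($\rho_q$, $\rho_{\mathrm{sc}}$) and the scaling by $\sqrt{q}$ are assumptions made for illustration and need not match the paper's conventions.

```latex
% Kesten-McKay density for the adjacency spectrum of a (q+1)-regular graph
\rho_q(x) = \frac{(q+1)\sqrt{4q - x^2}}{2\pi\left((q+1)^2 - x^2\right)},
\qquad |x| \le 2\sqrt{q}.

% Wigner semicircle density
\rho_{\mathrm{sc}}(y) = \frac{1}{2\pi}\sqrt{4 - y^2}, \qquad |y| \le 2.

% Rescaling x = \sqrt{q}\, y shows how the first tends to the second as q grows
\sqrt{q}\,\rho_q(\sqrt{q}\, y)
  = \frac{q(q+1)\sqrt{4 - y^2}}{2\pi\left((q+1)^2 - q y^2\right)}
  \;\longrightarrow\; \rho_{\mathrm{sc}}(y)
  \quad \text{as } q \to \infty .
```

The last line is a pointwise limit of densities only. The abstract's result is stronger: almost-sure convergence in Wasserstein $p$-distance for random regular graphs whose degree grows with $n$.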

Submitted 9 June, 2024; originally announced June 2024.

MSC Class: 05C31; 05C50; 05C80; 60B20

arXiv:2406.01719 [pdf, other]

Imputation of Missing Photometric Data and Photometric Redshift Estimation for CSST

Authors: Zhijian Luo, Zhirui Tang, Zhu Chen, Liping Fu, Wei Du, Shaohua Zhang, Yan Gong, Chenggang Shu, Junhao Lu, Yicheng Li, Xian-Min Meng, Xingchen Zhou, Zuhui Fan

Abstract: Accurate photometric redshift (photo-$z$) estimation requires support from multi-band observational data. However, in the actual process of astronomical observations and data processing, some sources may have missing observational data in certain bands for various reasons. This could greatly affect the accuracy and reliability of photo-$z$ estimation for these sources, and even render some estimation methods unusable. The same situation may exist for the upcoming Chinese Space Station Telescope (CSST). In this study, we employ a deep learning method called Generative Adversarial Imputation Networks (GAIN) to impute the missing photometric data in CSST, aiming to reduce the impact of data missing on photo-$z$ estimation and improve estimation accuracy. Our results demonstrate that using the GAIN technique can effectively fill in the missing photometric data in CSST. Particularly, when the data missing rate is below 30\%, the imputation of photometric data exhibits high accuracy, with higher accuracy in the $g$, $r$, $i$, $z$, and $y$ bands compared to the $NUV$ and $u$ bands. After filling in the missing values, the quality of photo-$z$ estimation obtained by the widely used Easy and Accurate Zphot from Yale (EAZY) software is notably enhanced. Evaluation metrics for assessing the quality of photo-$z$ estimation, including the catastrophic outlier fraction ($f_{out}$), the normalized median absolute deviation ($\sigma_{\rm NMAD}$), and the bias of photometric redshift ($bias$), all show some degree of improvement. Our research will help maximize the utilization of observational data and provide a new method for handling sample missing values for applications that require complete photometry data to produce results.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.21045 [pdf]

An Attention-Based Multi-Context Convolutional Encoder-Decoder Neural Network for Work Zone Traffic Impact Prediction

Authors: Qinhua Jiang, Xishun Liao, Yaofa Gong, Jiaqi Ma

Abstract: Work zones are one of the major causes of non-recurrent traffic congestion and road incidents. Despite the significance of their impact, studies on predicting the traffic impact of work zones remain scarce. In this paper, we propose a data integration pipeline that enhances the utilization of work zone and traffic data from diversified platforms, and introduce a novel deep learning model to predict the traffic speed and incident likelihood during planned work zone events. The proposed model transforms traffic patterns into 2D space-time images for both model input and output and employs an attention-based multi-context convolutional encoder-decoder architecture to capture the spatial-temporal dependencies between work zone events and traffic variations. Trained and validated on four years of archived work zone traffic data from Maryland, USA, the model demonstrates superior performance over baseline models in predicting traffic speed, incident likelihood, and inferred traffic attributes such as queue length and congestion timings (i.e., start time and duration). Specifically, the proposed model outperforms the baseline models by reducing the prediction error of traffic speed by 5% to 34%, queue length by 11% to 29%, congestion timing by 6% to 17%, and increasing the accuracy of incident predictions by 5% to 7%. Consequently, this model offers substantial promise for enhancing the planning and traffic management of work zones.

Submitted 31 May, 2024; originally announced May 2024.

Showing 1–50 of 980 results for author: Gong, Y