Search | arXiv e-print repository

Pion Condensation and Pion Star from Holographic QCD

Authors: Yidian Chen, Mingshan Ding, Danning Li, Kazem Bitaghsir Fadafan, Mei Huang

Abstract: The properties of QCD matter at finite isospin densities are investigated employing holographic hard-wall and soft-wall AdS/QCD models. It is confirmed that at high enough isospin densities, charged pions start to condense and the pion superfluid phase appears in the system. It is shown that the chiral condensate and the pion condensate can be transformed to each other and form a 'chiral circle' in the superfluid phase. We derived the Equation of State (EoS) for pionic matter, calculated the normalized trace anomaly $Δ$ and $(ε-3p)/m_π^4$, and analyzed the sound speed and adiabatic index. Additionally, we provided data on the mass-radius relation and tidal deformability of pion stars. The results indicate that the holographic models align well with lattice QCD concerning isospin density, axial-vector condensation, EoS, and trace anomaly, though discrepancies in sound speed and adiabatic index emerge at higher isospin chemical potentials. The holographic models closely match those from chiral perturbation theory ($χ$PT), suggesting that they can be considered as a five-dimensional description of $χ$PT.

Submitted 30 August, 2024; originally announced August 2024.

Comments: 24 pages, 5 figures

arXiv:2408.16988 [pdf, other]

Periodic Coronal Rain Driven by Self-consistent Heating Process in a Radiative Magnetohydrodynamic Simulation

Authors: Zekun Lu, Feng Chen, J. H. Guo, M. D. Ding, Can Wang, Haocheng Yu, Y. W. Ni, Chun Xia

Abstract: The periodic coronal rain and in-phase radiative intensity pulsations have been observed in multiple wavelengths in recent years. However, due to the lack of three-dimensional coronal magnetic fields and thermodynamic data in observations, it remains challenging to quantify the coronal heating rate that drives the mass cycles. In this work, based on the MURaM code, we conduct a three-dimensional radiative magnetohydrodynamic simulation spanning from the convective zone to the corona, where the solar atmosphere is heated self-consistently through dissipation resulting from magneto-convection. For the first time, we model the periodic coronal rain in an active region. With a high spatial resolution, the simulation well resembles the observational features across different extreme ultraviolet wavelengths. These include the realistic interweaving coronal loops, periodic coronal rain and periodic intensity pulsations, with two periods of 3.0 h and 3.7 h identified within one loop system. Moreover, the simulation allows for a detailed three-dimensional depiction of coronal rain on small scales, revealing adjacent shower-like rain clumps $\sim500$ km in width and showcasing their multi-thermal internal structures. We further reveal that these periodic variations essentially reflect the cyclic energy evolution of the coronal loop under thermal non-equilibrium state. Importantly, as the driver of the mass circulation, the self-consistent coronal heating rate is considerably complex in time and space, with hour-level variations in one order of magnitude, minute-level bursts, and varying asymmetry reaching ten times between footpoints. This provides an instructive template for the ad hoc heating function, and further enhances our understanding of the coronal heating process.

Submitted 29 August, 2024; originally announced August 2024.

Comments: 14 Pages, 7 figures, accepted for publication in ApJL

arXiv:2408.16500 [pdf, other]

CogVLM2: Visual Language Models for Image and Video Understanding

Authors: Wenyi Hong, Weihan Wang, Ming Ding, Wenmeng Yu, Qingsong Lv, Yan Wang, Yean Cheng, Shiyu Huang, Junhui Ji, Zhao Xue, Lei Zhao, Zhuoyi Yang, Xiaotao Gu, Xiaohan Zhang, Guanyu Feng, Da Yin, Zihan Wang, Ji Qi, Xixuan Song, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Yuxiao Dong, Jie Tang

Abstract: Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications. Here we propose the CogVLM2 family, a new generation of visual language models for image and video understanding including CogVLM2, CogVLM2-Video and GLM-4V. As an image understanding model, CogVLM2 inherits the visual expert architecture with improved training recipes in both pre-training and post-training stages, supporting input resolution up to $1344 \times 1344$ pixels. As a video understanding model, CogVLM2-Video integrates multi-frame input with timestamps and proposes automated temporal grounding data construction. Notably, CogVLM2 family has achieved state-of-the-art results on benchmarks like MMBench, MM-Vet, TextVQA, MVBench and VCGBench. All models are open-sourced in https://github.com/THUDM/CogVLM2 and https://github.com/THUDM/GLM-4, contributing to the advancement of the field.

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2408.12222 [pdf]

Formation mechanism of the (2 x 1) reconstruction of calcite (104)

Authors: Haojun Zhou, Yingquan Chen, Mingyue Ding, Xiaoliang Zhong

Abstract: Calcite has recently attracted extensive research interest in fields ranging from geoscience to carbon dioxide removal. Although much effort has been made to study the (2x1) reconstruction of the most stable (104) surface, the origin of this reconstruction remains unclear. Here, we carefully investigate the atomic and electronic structures of calcite (104) via density functional theory methods with van der Waals corrections. The results unambiguously show that the driving force for this reconstruction is the intrinsic demands of surface atoms to increase the coordination numbers. On reconstructing, calcite (104) forms four additional Ca-O bonds per (2x1) unit cell. Besides, phonon spectrums indicate both unreconstructed and reconstructed surfaces are dynamically stable. Finally, by applying the climbing image nudged elastic band method, an energy barrier is predicted during the reconstructing. This work delivers a full picture for the formation of calcite (104)-(2x1) reconstruction and can greatly advance the understanding of surface science for calcite.

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.08911 [pdf, ps, other]

Determining internal topological structures and running cost of mean field games with partial boundary measurement

Authors: Ming-Hui Ding, Hongyu Liu, Guang-Hui Zheng

Abstract: This paper investigates the simultaneous reconstruction of the running cost function and the internal topological structure within the mean-field games (MFG) system utilizing partial boundary data. The inverse problem is notably challenging due to factors such as nonlinear coupling, the necessity for multi-parameter reconstruction, constraints on probability measures, and the limited availability of measurement information. To address these challenges, we propose an innovative approach grounded in a higher-order linearization method. This method is tailored for inverse problems in MFG systems that involve Dirichlet and Neumann boundary conditions. Initially, we present unique reconstruction results for the cost function and internal topological structure of the MFG system under various homogeneous boundary conditions. Subsequently, we extend these results to accommodate inhomogeneous boundary conditions. These findings greatly enhance our understanding of simultaneous reconstruction in complex MFG systems.

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2408.07833 [pdf, other]

Laboratory confirmation and improved Accuracy of 4f and 5d energy levels of Fe II previously identified from stellar spectra

Authors: M. Ding, H. Kozuki, F. Concepcion, G. Nave, J. C. Pickering

Abstract: Many energy levels of singly ionised iron (Fe II, $Z=26$) remain uncertain or experimentally unknown. Their identification and spectral line data are required in reliable astrophysical spectral analyses. In motivation for improving the atomic data of Fe II, we analysed emission spectra of a Fe-Ne plasma produced by a Penning discharge lamp recorded by high-resolution Fourier transform spectroscopy in the region 9000-27,000 cm$^{-1}$ (11,111-3704 Å). Semi-empirical transition probability calculations and stellar spectra of Fe II were used to guide the analysis. In total, 24 energy levels of the 3d$^6$4f and 3d$^6$5d configurations of Fe II lying between 122,351-127,881 cm$^{-1}$ were confirmed in the laboratory for the first time, in agreement with their identities proposed by previous investigations involving only stellar spectra. Level energy and wavelength uncertainties of the 24 levels are improved by up to an order-of-magnitude compared to previously published values. These results will enable more reliable application of Fe II in astrophysical spectroscopic analyses and support further investigations of the spectrum and energy levels of Fe II.

Submitted 14 August, 2024; originally announced August 2024.
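
Note on the spectral range quoted in the abstract above: the pairing of 9000-27,000 cm$^{-1}$ with 11,111-3704 Å follows from the reciprocal relation between wavenumber and wavelength. The short Python sketch below reproduces that arithmetic. It assumes vacuum wavelengths (no air refractive-index correction), and the function name `wavenumber_to_angstrom` is a hypothetical helper introduced only for illustration, not something taken from the paper.

```python
def wavenumber_to_angstrom(sigma_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 to a vacuum wavelength in Angstrom.

    Uses lambda = 1 / sigma with 1 cm = 1e8 Angstrom. Assumption: no
    correction for the refractive index of air is applied.
    """
    if sigma_cm1 <= 0:
        raise ValueError("wavenumber must be positive")
    return 1.0e8 / sigma_cm1


# Endpoints of the spectral region quoted in the abstract.
for sigma in (9000.0, 27000.0):
    wavelength = wavenumber_to_angstrom(sigma)
    print(f"{sigma:.0f} cm^-1 -> {wavelength:.0f} Angstrom")
```

The same conversion applied to 32,500 and 54,000 cm$^{-1}$ gives roughly 3077 and 1852 Å, matching the range quoted in the Nd III entry that follows.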

arXiv:2408.07830 [pdf, other]

Spectrum and energy levels of the high-lying singly excited configurations of Nd III

Authors: M. Ding, A. N. Ryabtsev, E. Y. Kononov, T. Ryabchikova, J. C. Pickering

Abstract: Fourier transform spectra of Nd Penning and hollow cathode discharge lamps were recorded within the region 32,500-54,000 cm$^{-1}$ (3077-1852 Å) and grating spectra of Nd vacuum sliding sparks were recorded within the regions 820-1159 Å and 1600-3250 Å. New energy levels were found using the observed wavelengths measured accurate to a few parts in $10^8$ in Fourier transform spectra and to a few parts in $10^7$ in grating spectra. Atomic structure and transition probability calculations of Nd III were made using the Cowan codes by adjusting energy parameters to fit all known Nd III levels. Nd-rich stellar spectra were also used to evaluate the new calculations. In total, 355 transitions were classified from observed spectra involving 116 previously experimentally unknown energy levels of the 4f$^3$7s, 4f$^3$6d, and 4f$^3$5f configurations of Nd III, all reported here for the first time. One newly identified level of the 4f$^3$5d configuration is also reported. Typical level energy uncertainties are 0.01 cm$^{-1}$ for the 4f$^3$7s and 4f$^3$6d levels and 0.3 cm$^{-1}$ for the 4f$^3$5f levels. In addition, calculated energy levels up to 130,936 cm$^{-1}$ are presented, including eigenvector composition and calculated level lifetimes. Calculated transition probabilities and wavelengths between 1900-50,000 Å are also presented. Using newly established levels of the 4f$^3$7s configuration and the recently established levels of the 4f$^3$6s configuration, the ionisation energy of Nd III was estimated at $178,090\pm330$ cm$^{-1}$, doubling the accuracy of the previously published value.

Submitted 14 August, 2024; originally announced August 2024.

arXiv:2408.06327 [pdf, other]

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Authors: Xiao Liu, Tianjie Zhang, Yu Gu, Iat Long Iong, Yifan Xu, Xixuan Song, Shudan Zhang, Hanyu Lai, Xinyi Liu, Hanlin Zhao, Jiadai Sun, Xinyue Yang, Yu Yang, Zehan Qi, Shuntian Yao, Xueqiao Sun, Siyi Cheng, Qinkai Zheng, Hao Yu, Hanchen Zhang, Wenyi Hong, Ming Ding, Lihang Pan, Xiaotao Gu, Aohan Zeng , et al. (5 additional authors not shown)

Abstract: Large Multimodal Models (LMMs) have ushered in a new era in artificial intelligence, merging capabilities in both language and vision to form highly capable Visual Foundation Agents. These agents are postulated to excel across a myriad of tasks, potentially approaching general artificial intelligence. However, existing benchmarks fail to sufficiently challenge or showcase the full potential of LMMs in complex, real-world environments. To address this gap, we introduce VisualAgentBench (VAB), a comprehensive and pioneering benchmark specifically designed to train and evaluate LMMs as visual foundation agents across diverse scenarios, including Embodied, Graphical User Interface, and Visual Design, with tasks formulated to probe the depth of LMMs' understanding and interaction capabilities. Through rigorous testing across nine proprietary LMM APIs and eight open models, we demonstrate the considerable yet still developing agent capabilities of these models. Additionally, VAB provides a trajectory training set constructed through hybrid methods including Program-based Solvers, LMM Agent Bootstrapping, and Human Demonstrations, promoting substantial performance improvements in LMMs through behavior cloning. Our work not only aims to benchmark existing models but also provides a solid foundation for future development into visual foundation agents. Code, train & test data, and part of fine-tuned open LMMs are available at https://github.com/THUDM/VisualAgentBench.

Submitted 12 August, 2024; originally announced August 2024.

arXiv:2408.06072 [pdf, other]

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Authors: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong, Jie Tang

Abstract: We introduce CogVideoX, a large-scale diffusion transformer model designed for generating videos based on text prompts. To efficiently model video data, we propose to leverage a 3D Variational Autoencoder (VAE) to compress videos along both spatial and temporal dimensions. To improve the text-video alignment, we propose an expert transformer with the expert adaptive LayerNorm to facilitate the deep fusion between the two modalities. By employing a progressive training technique, CogVideoX is adept at producing coherent, long-duration videos characterized by significant motions. In addition, we develop an effective text-video data processing pipeline that includes various data preprocessing strategies and a video captioning method. It significantly helps enhance the performance of CogVideoX, improving both generation quality and semantic alignment. Results show that CogVideoX demonstrates state-of-the-art performance across both multiple machine metrics and human evaluations. The model weights of both the 3D Causal VAE and CogVideoX are publicly available at https://github.com/THUDM/CogVideo.

Submitted 12 August, 2024; originally announced August 2024.

arXiv:2408.05725 [pdf, other]

Various Features of the X-class White-light Flares in Super Active Region NOAA 13664

Authors: Ying Li, Xiaofeng Liu, Zhichen Jing, Wei Chen, Qiao Li, Yang Su, De-Chao Song, M. D. Ding, Li Feng, Hui Li, Weiqun Gan

Abstract: Super active region NOAA 13664 produced 12 X-class flares (including the largest one, an occulted X8.7 flare, in solar cycle 25 so far) during 2024 May 8-15 and 11 of them are identified as white-light flares. Here we present various features of these X-class white-light flares observed by the White-light Solar Telescope (WST) on board the Advanced Space-based Solar Observatory and the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory. It is found that both the white-light emissions at WST 3600 Å (Balmer continuum) and HMI 6173 Å (Paschen continuum) show up in different regions of the sunspot group in these flares, including outside the sunspots and within the penumbra and umbra of the sunspots. They exhibit a point-, ribbon-, loop-, or ejecta-like shape, which can come from flare ribbons (or footpoints), flare loops, and plasma ejecta depending on the perspective view. The white-light duration and relative enhancement are measured and both parameters for 3600 Å emission have greater values than those for 6173 Å emission. It is also found that these white-light emissions are cospatial well with the hard X-ray (HXR) sources in the on-disk flares but have some offsets with the HXR emissions in the off-limb flares. In addition, it is interesting that the 3600 and 6173 Å emissions show different correlations with the peak HXR fluxes, with the former one more sensitive to the HXR emission. All these greatly help us understand the white-light flares of a large magnitude from a super active region on the Sun and also provide important insights into superflares on Sun-like stars.

Submitted 11 August, 2024; originally announced August 2024.

Comments: Accepted for publication in ApJL. Any comments are welcome

arXiv:2408.02687 [pdf, other]

Compositional Physical Reasoning of Objects and Events from Videos

Authors: Zhenfang Chen, Shilong Dong, Kexin Yi, Yunzhu Li, Mingyu Ding, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan

Abstract: Understanding and reasoning about objects' physical properties in the natural world is a fundamental challenge in artificial intelligence. While some properties like colors and shapes can be directly observed, others, such as mass and electric charge, are hidden from the objects' visual appearance. This paper addresses the unique challenge of inferring these hidden physical properties from objects' motion and interactions and predicting corresponding dynamics based on the inferred physical properties. We first introduce the Compositional Physical Reasoning (ComPhy) dataset. For a given set of objects, ComPhy includes limited videos of them moving and interacting under different initial conditions. The model is evaluated based on its capability to unravel the compositional hidden properties, such as mass and charge, and use this knowledge to answer a set of questions. Besides the synthetic videos from simulators, we also collect a real-world dataset to further test the physical reasoning abilities of different models. We evaluate state-of-the-art video reasoning models on ComPhy and reveal their limited ability to capture these hidden properties, which leads to inferior performance. We also propose a novel neuro-symbolic framework, Physical Concept Reasoner (PCR), that learns and reasons about both visible and hidden physical properties from question answering. After training, PCR demonstrates remarkable capabilities. It can detect and associate objects across frames, ground visible and hidden physical properties, make future and counterfactual predictions, and utilize these extracted representations to answer challenging questions.

Submitted 2 August, 2024; originally announced August 2024.

Comments: arXiv admin note: text overlap with arXiv:2205.01089

arXiv:2407.17685 [pdf, ps, other]

On the acyclic quantum cluster algebras with principle coefficients

Authors: Junyuan Huang, Xueqing Chen, Ming Ding, Fan Xu

Abstract: In this paper, we focus on a new lower bound quantum cluster algebra which is generated by the initial quantum cluster variables and the quantum projective cluster variables of an acyclic quantum cluster algebra with principle coefficients. We show that the new lower bound quantum cluster algebra coincides with the corresponding acyclic quantum cluster algebra. Moreover, we establish a class of formulas between these generators, and obtain the dual PBW basis of this algebra.

Submitted 24 July, 2024; originally announced July 2024.

Comments: 23 pages

arXiv:2407.15862 [pdf]

Performance Evaluation of Lightweight Open-source Large Language Models in Pediatric Consultations: A Comparative Analysis

Authors: Qiuhong Wei, Ying Cui, Mengwei Ding, Yanqin Wang, Lingling Xiang, Zhengxiong Yao, Ceran Chen, Ying Long, Zhezhen Jin, Ximing Xu

Abstract: Large language models (LLMs) have demonstrated potential applications in medicine, yet data privacy and computational burden limit their deployment in healthcare institutions. Open-source and lightweight versions of LLMs emerge as potential solutions, but their performance, particularly in pediatric settings, remains underexplored. In this cross-sectional study, 250 patient consultation questions were randomly selected from a public online medical forum, with 10 questions from each of 25 pediatric departments, spanning from December 1, 2022, to October 30, 2023. Two lightweight open-source LLMs, ChatGLM3-6B and Vicuna-7B, along with a larger-scale model, Vicuna-13B, and the widely-used proprietary ChatGPT-3.5, independently answered these questions in Chinese between November 1, 2023, and November 7, 2023. To assess reproducibility, each inquiry was replicated once. We found that ChatGLM3-6B demonstrated higher accuracy and completeness than Vicuna-13B and Vicuna-7B (P < .001), but all were outperformed by ChatGPT-3.5. ChatGPT-3.5 received the highest ratings in accuracy (65.2%) compared to ChatGLM3-6B (41.2%), Vicuna-13B (11.2%), and Vicuna-7B (4.4%). Similarly, in completeness, ChatGPT-3.5 led (78.4%), followed by ChatGLM3-6B (76.0%), Vicuna-13B (34.8%), and Vicuna-7B (22.0%) in highest ratings. ChatGLM3-6B matched ChatGPT-3.5 in readability, both outperforming Vicuna models (P < .001). In terms of empathy, ChatGPT-3.5 outperformed the lightweight LLMs (P < .001). In safety, all models performed comparably well (P > .05), with over 98.4% of responses being rated as safe. Repetition of inquiries confirmed these findings. In conclusion, lightweight LLMs demonstrate promising application in pediatric healthcare. However, the observed gap between lightweight and large-scale proprietary LLMs underscores the need for continued development efforts.

Submitted 15 July, 2024; originally announced July 2024.

Comments: 27 pages in total with 17 pages of main manuscript and 10 pages of supplementary materials; 4 figures in the main manuscript and 2 figures in supplementary material

MSC Class: 68M20 (Primary) 62G10 (Secondary)

arXiv:2407.15713 [pdf, other]

Inverse problems for coupled nonlocal nonlinear systems arising in mathematical biology

Authors: Ming-Hui Ding, Hongyu Liu, Catharine W. K. Lo

Abstract: In this paper, we propose and study several inverse problems of determining unknown parameters in nonlocal nonlinear coupled PDE systems, including the potentials, nonlinear interaction functions and time-fractional orders. In these coupled systems, we enforce non-negativity of the solutions, aligning with realistic scenarios in biology and ecology. There are several salient features of our inverse problem study: the drastic reduction in measurement/observation data due to averaging effects, the nonlinear coupling between multiple equations, and the nonlocality arising from fractional-type derivatives. These factors present significant challenges to our inverse problem, and such inverse problems have never been explored in previous literature. To address these challenges, we develop new and effective schemes. Our approach involves properly controlling the injection of different source terms to obtain multiple sets of mean flux data. This allows us to achieve unique identifiability results and accurately determine the unknown parameters. Finally, we establish a connection between our study and practical applications in biology, further highlighting the relevance of our work in real-world contexts.

Submitted 22 July, 2024; originally announced July 2024.

Comments: Keywords: inverse problems, partial data measurements, nonlocal coupled parabolic systems, fractional coupled diffusion systems, mathematical biology

MSC Class: 35R30; 35Q92; 35R11; 35K40

arXiv:2407.13921 [pdf, ps, other]

Optimality of the Bussgang Linear MMSE Channel Estimator for MIMO Systems with 1-Bit ADCs

Authors: Minhua Ding, Italo Atzeni, Antti Tölli, A. Lee Swindlehurst

Abstract: In this paper, we study the optimality of the Bussgang linear minimum mean squared error (BLMMSE) channel estimator for multiple-input multiple-output systems with 1-bit analog-to-digital converters. We compare the BLMMSE with the optimal minimum mean squared error (MMSE) channel estimator, which is generally non-linear, and we develop a novel framework based on the orthant probability of a multivariate normal distribution to compute the MMSE channel estimate. Then, we analyze the equivalence of the MMSE and BLMMSE channel estimators under specific assumptions on the channel correlation or pilot symbols. Interestingly, the BLMMSE channel estimator turns out to be optimal in several specific cases. Our study culminates with the presentation of a necessary and sufficient condition for the BLMMSE channel estimator to be optimal.

Submitted 18 July, 2024; originally announced July 2024.

Comments: Presented at the IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC) 2024

arXiv:2407.13134 [pdf]

doi 10.3847/1538-4365/ad5002

LAMA: LAMOST Medium-Resolution Spectral Analysis Pipeline

Authors: Chun-qian Li, Jian-rong Shi, Hong-liang Yan, Zhong-rui Bai, Jiang-tao Wang, Ming-yi Ding

Abstract: The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) has obtained more than 23 million spectra, opening an unprecedented opportunity to study stellar physics, as well as the formation and evolution of our Milky Way. In order to obtain the accurate stellar parameters, we develop a LAMOST Medium-Resolution Spectral Analysis Pipeline (LAMA), which estimates the stellar parameters from the LAMOST medium-resolution spectra, including the effective temperature (Teff), surface gravity (logg), metallicity ([Fe/H]), radial velocity, and rotational velocity (vsini). LAMA estimates these parameters utilizing the template-matching method. The comparison between our results and those from the high-resolution ones, including APOGEE, GALAH, and PASTEL, shows no obvious bias, indicating the reliability of our results. The accuracy of Teff and [Fe/H] can reach 75 K and 0.12 dex, respectively, for the LAMOST Medium-Resolution Spectroscopic Survey (MRS) spectra with a signal-to-noise ratio higher than 10. For dwarfs, the uncertainty of logg is around 0.17 dex, while, for giants, it ranges from 0.18 to 0.30 dex, with the errors decreasing as logg increases. Using LAMA for the LAMOST-MRS spectra, we estimate the stellar parameters of 497,412 stars. This sample will be very helpful for investigating the formation and evolution of our Galaxy.

Submitted 17 July, 2024; originally announced July 2024.

Comments: 18 pages, 21 figures, 4 tables

Journal ref: ApJS (2024), 273, 18

arXiv:2407.11572 [pdf, other]

doi 10.3847/2041-8213/ad5ffd

Discovery of an Extremely r-process-enhanced Thin-disk Star with [Eu/H] = +0.78

Authors: Xiao-Jin Xie, Jianrong Shi, Hong-Liang Yan, Tian-Yi Chen, Carlos Allende Prieto, Timothy C. Beers, Shuai Liu, Chun-Qian Li, Ming-Yi Ding, Yao-Jia Tang, Ruizhi Zhang, Renjing Xie

Abstract: Highly r-process-enhanced stars are rare and usually metal-poor ([Fe/H] < -1.0), and mainly populate the Milky Way halo and dwarf galaxies. This study presents the discovery of a relatively bright (V = 12.72), highly r-process-enhanced (r-II) star ([Eu/Fe] = +1.32, [Ba/Eu] = -0.95), LAMOST J020623.21+494127.9. This star was selected from the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) medium-resolution (R ~ 7500) spectroscopic survey; follow-up high-resolution (R ~ 25,000) observations were conducted with the High Optical Resolution Spectrograph (HORuS) installed on the Gran Telescopio Canarias (GTC). The stellar parameters ($T_{\rm eff}$ = 4130 K, $\log g$ = 1.52, [Fe/H] = $-$0.54, $ξ$ = 1.80 km s$^{-1}$) have been inferred taking into account non-local thermodynamic equilibrium (NLTE) effects. The abundances of [Ce/Fe], [Pr/Fe], and [Nd/Fe] are +0.19, +0.65 and +0.64, respectively, relatively low compared to the Solar r-process pattern normalized to Eu. This star has a high metallicity ([Fe/H] = -0.54) compared to most other highly r-process-enhanced stars, and has the highest measured abundance ratio of Eu to H ([Eu/H] = +0.78). It is classified as a thin-disk star based on its kinematics, and does not appear to belong to any known stream or dwarf galaxy.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 5 figures, 3 tables

Journal ref: ApJL, 2024, Volume 970, Number 2, L30

arXiv:2407.11214 [pdf, ps, other]

PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

Authors: George Tsoukalas, Jasper Lee, John Jennings, Jimmy Xin, Michelle Ding, Michael Jennings, Amitayush Thakur, Swarat Chaudhuri

Abstract: We present PutnamBench, a new multilingual benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1697 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the theorems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. Proving the theorems requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can only solve a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is available at https://github.com/trishullab/PutnamBench.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.04281 [pdf, other]

WOMD-Reasoning: A Large-Scale Language Dataset for Interaction and Driving Intentions Reasoning

Authors: Yiheng Li, Chongjian Ge, Chenran Li, Chenfeng Xu, Masayoshi Tomizuka, Chen Tang, Mingyu Ding, Wei Zhan

Abstract: We propose Waymo Open Motion Dataset-Reasoning (WOMD-Reasoning), a language annotation dataset built on WOMD, with a focus on describing and reasoning interactions and intentions in driving scenarios. Previous language datasets primarily captured interactions caused by close distances. However, interactions induced by traffic rules and human intentions, which can occur over long distances, are not yet sufficiently covered, despite being very common and more challenging for prediction or planning models to understand. Therefore, our WOMD-Reasoning focuses extensively on these interactions, providing a total of 409k Q&As for varying types of interactions. Additionally, WOMD-Reasoning presents by far the largest Q&A dataset on real-world driving scenarios, with around 3 million Q&As covering various topics of autonomous driving from map descriptions, motion status descriptions, to narratives and analyses of agents' interactions, behaviors, and intentions. This extensive textual information enables fine-tuning driving-related Large Language Models (LLMs) for a wide range of applications like scene description, prediction, planning, etc. By incorporating interaction and intention language from WOMD-Reasoning, we see significant enhancements in the performance of the state-of-the-art trajectory prediction model, Multipath++, with improvements of 10.14% in $MR_6$ and 6.90% in $minFDE_6$, proving the effectiveness of WOMD-Reasoning. We hope WOMD-Reasoning would empower LLMs in driving to offer better interaction understanding and behavioral reasoning. The dataset is available on https://waymo.com/open/download.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.01531 [pdf, other]

Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning

Authors: Yixiao Wang, Yifei Zhang, Mingxiao Huo, Ran Tian, Xiang Zhang, Yichen Xie, Chenfeng Xu, Pengliang Ji, Wei Zhan, Mingyu Ding, Masayoshi Tomizuka

Abstract: The increasing complexity of tasks in robotics demands efficient strategies for multitask and continual learning. Traditional models typically rely on a universal policy for all tasks, facing challenges such as high computational costs and catastrophic forgetting when learning new tasks. To address these issues, we introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP). By adopting Mixture of Experts (MoE) within a transformer-based diffusion policy, SDP selectively activates experts and skills, enabling efficient and task-specific learning without retraining the entire model. SDP not only reduces the burden of active parameters but also facilitates the seamless integration and reuse of experts across various tasks. Extensive experiments on diverse tasks in both simulations and real world show that SDP 1) excels in multitask scenarios with negligible increases in active parameters, 2) prevents forgetting in continual learning of new tasks, and 3) enables efficient task transfer, offering a promising solution for advanced robotic applications. Demos and codes can be found in https://forrest-110.github.io/sparse_diffusion_policy/.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01196 [pdf, other]

Implementation of a scalable universal two-qubit quantum processor with electron and nuclear spins in a trapped ion

Authors: Ji Bian, Teng Liu, Qifeng Lao, Min Ding, Huiyi Zhang, Xinxin Rao, Pengfei Lu, Le Luo

Abstract: Increasing the quantum information processing power with limited number of hosts is vital for achieving quantum advantage. Here we propose a novel scheme that achieves a scalable n-ion-2n-qubit quantum processor utilizing four internal levels of each ion, and experimentally implement a 1-ion-2-qubit universal processor using the valence electron spin and nuclear spin of a single 171Yb+ ion. Fidelities of single-qubit and two-qubit gates are around 0.98 obtained by quantum process tomography. Additionally, the Grover's algorithm is implemented with a successful rate exceeding 0.99. We provide explicit scaling-up protocols based on standard laser-less and laser-based frameworks, and further demonstrate that the electron/nuclear-spin scheme allows less demanding two-qubit entangling gates between different ions. The replacement of some inter-atomic gates by intra-atomic gates could increase the fidelity of some quantum circuits. Our work paves the way towards achieving 2n-times increase in the size of quantum computational Hilbert space with n ions.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.15575 [pdf, ps, other]

Sketch-GNN: Scalable Graph Neural Networks with Sublinear Training Complexity

Authors: Mucong Ding, Tahseen Rabbani, Bang An, Evan Z Wang, Furong Huang

Abstract: Graph Neural Networks (GNNs) are widely applied to graph learning problems such as node classification. When scaling up the underlying graphs of GNNs to a larger size, we are forced to either train on the complete graph and keep the full graph adjacency and node embeddings in memory (which is often infeasible) or mini-batch sample the graph (which results in exponentially growing computational complexities with respect to the number of GNN layers). Various sampling-based and historical-embedding-based methods are proposed to avoid this exponential growth of complexities. However, none of these solutions eliminates the linear dependence on graph size. This paper proposes a sketch-based algorithm whose training time and memory grow sublinearly with respect to graph size by training GNNs atop a few compact sketches of graph adjacency and node embeddings. Based on polynomial tensor-sketch (PTS) theory, our framework provides a novel protocol for sketching non-linear activations and graph convolution matrices in GNNs, as opposed to existing methods that sketch linear weights or gradients in neural networks. In addition, we develop a locality-sensitive hashing (LSH) technique that can be trained to improve the quality of sketches. Experiments on large-graph benchmarks demonstrate the scalability and competitive performance of our Sketch-GNNs versus their full-size GNN counterparts.

Submitted 21 June, 2024; originally announced June 2024.

Comments: NeurIPS 2022

arXiv:2406.15567 [pdf, other]

SAIL: Self-Improving Efficient Online Alignment of Large Language Models

Authors: Mucong Ding, Souradip Chakraborty, Vibhu Agrawal, Zora Che, Alec Koppel, Mengdi Wang, Amrit Bedi, Furong Huang

Abstract: Reinforcement Learning from Human Feedback (RLHF) is a key method for aligning large language models (LLMs) with human preferences. However, current offline alignment approaches like DPO, IPO, and SLiC rely heavily on fixed preference datasets, which can lead to sub-optimal performance. On the other hand, recent literature has focused on designing online RLHF methods but still lacks a unified conceptual formulation and suffers from distribution shift issues. To address this, we establish that online LLM alignment is underpinned by bilevel optimization. By reducing this formulation to an efficient single-level first-order method (using the reward-policy equivalence), our approach generates new samples and iteratively refines model alignment by exploring responses and regulating preference labels. In doing so, we permit alignment methods to operate in an online and self-improving manner, as well as generalize prior online RLHF methods as special cases. Compared to state-of-the-art iterative RLHF methods, our approach significantly improves alignment performance on open-sourced datasets with minimal computational overhead.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 24 pages, 6 figures, 3 tables

arXiv:2406.15073 [pdf, other]

KnobTree: Intelligent Database Parameter Configuration via Explainable Reinforcement Learning

Authors: Jiahan Chen, Shuhan Qi, Yifan Li, Zeyu Dong, Mingfeng Ding, Yulin Wu, Xuan Wang

Abstract: Databases are fundamental to contemporary information systems, yet traditional rule-based configuration methods struggle to manage the complexity of real-world applications with hundreds of tunable parameters. Deep reinforcement learning (DRL), which combines perception and decision-making, presents a potential solution for intelligent database configuration tuning. However, due to the black-box property of RL-based methods, the generated database tuning strategies still face the urgent problem of lacking explainability. Besides, the redundant parameters in large-scale databases make the strategy learning unstable. This paper proposes KnobTree, an interpretable framework designed for the optimization of database parameter configuration. In this framework, an interpretable database tuning algorithm based on an RL-based differentiable tree is proposed, which builds a transparent tree-based model to generate explainable database tuning strategies. To address the problem of large-scale parameters, we also introduce an explainable method for parameter importance assessment, utilizing Shapley Values to identify parameters that have significant impacts on database performance. Experiments conducted on MySQL and Gbase8s databases have verified the exceptional transparency and interpretability of the KnobTree model. This property means the generated strategies can offer practical guidance to algorithm designers and database administrators. Moreover, our approach also slightly outperforms the existing RL-based tuning algorithms in aspects such as throughput, latency, and processing time.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.09644 [pdf, other]

Bridging Electromagnetic and Gravitational Form Factors: Insights from LFHQCD

Authors: Xiaobin Wang, Zanbin Xing, Minghui Ding, Khépani Raya, Lei Chang

Abstract: We propose an efficacious approach to derive the generalized parton distributions for the pion and proton, based upon prior knowledge of their respective parton distribution functions (PDFs). Our method leverages on integral representations of the electromagnetic form factors derived from the light-front holographic QCD (LFHQCD) formalism, coupled with PDFs computed from continuum Schwinger functional methods at the hadronic scale. Using these techniques, we calculate gravitational form factors and associated mass distributions for each hadron. Remarkably, our calculations yield results that closely match recent lattice QCD simulations conducted near the physical pion mass. This work not only deepens our understanding of hadronic structure but also highlights the efficacy of the LFHQCD approach in modeling fundamental properties of hadrons.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 6 pages, 5 figures

arXiv:2406.09295 [pdf, other]

AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models

Authors: Yuhang Wu, Wenmeng Yu, Yean Cheng, Yan Wang, Xiaohan Zhang, Jiazheng Xu, Ming Ding, Yuxiao Dong

Abstract: Evaluating the alignment capabilities of large Vision-Language Models (VLMs) is essential for determining their effectiveness as helpful assistants. However, existing benchmarks primarily focus on basic abilities using nonverbal methods, such as yes-no and multiple-choice questions. In this paper, we address this gap by introducing AlignMMBench, a comprehensive alignment benchmark specifically designed for emerging Chinese VLMs. This benchmark is meticulously curated from real-world scenarios and Chinese Internet sources, encompassing thirteen specific tasks across three categories, and includes both single-turn and multi-turn dialogue scenarios. Incorporating a prompt rewrite strategy, AlignMMBench encompasses 1,054 images and 4,978 question-answer pairs. To facilitate the evaluation pipeline, we propose CritiqueVLM, a rule-calibrated evaluator that exceeds GPT-4's evaluation ability. Finally, we report the performance of representative VLMs on AlignMMBench, offering insights into the capabilities and limitations of different VLM architectures. All evaluation codes and data are available on https://alignmmbench.github.io.

Submitted 13 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08523 [pdf, other]

A Plug-and-Play Untrained Neural Network for Full Waveform Inversion in Reconstructing Sound Speed Images of Ultrasound Computed Tomography

Authors: Weicheng Yan, Qiude Zhang, Yun Wu, Zhaohui Liu, Liang Zhou, Mingyue Ding, Ming Yuchi, Wu Qiu

Abstract: Ultrasound computed tomography (USCT), as an emerging technology, can provide multiple quantitative parametric images of human tissue, such as sound speed and attenuation images, distinguishing it from conventional B-mode (reflection) ultrasound imaging. Full waveform inversion (FWI) is acknowledged as a technique with the greatest potential for reconstructing high-resolution sound speed images in USCT. However, traditional FWI for sound speed image reconstruction suffers from high sensitivity to the initial model caused by its strong non-convex nonlinearity, resulting in poor performance when ultrasound signals are at high frequencies. This limitation significantly restricts the application of FWI in the USCT imaging field. In this paper, we propose an untrained neural network (UNN) that can be integrated into the traditional iteration-based FWI framework as an implicit regularization prior. This integration allows for seamless deployment as a plug-and-play module within existing FWI algorithms or their variants. Notably, the proposed UNN method can be trained in an unsupervised fashion, a vital aspect in medical imaging where ground truth data is often unavailable. Evaluations of the numerical simulation and phantom experiment of the breast demonstrate that the proposed UNN improves the robustness of image reconstruction, reduces image artifacts, and achieves great image contrast. To the best of our knowledge, this study represents the first attempt to propose an implicit UNN for FWI in reconstructing sound speed images for USCT.

Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08035 [pdf, other]

LVBench: An Extreme Long Video Understanding Benchmark

Authors: Weihan Wang, Zehai He, Wenyi Hong, Yean Cheng, Xiaohan Zhang, Ji Qi, Shiyu Huang, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang

Abstract: Recent progress in multimodal large language models has markedly enhanced the understanding of short videos (typically under one minute), and several evaluation datasets have emerged accordingly. However, these advancements fall short of meeting the demands of real-world applications such as embodied intelligence for long-term decision-making, in-depth movie reviews and discussions, and live sports commentary, all of which require comprehension of long videos spanning several hours. To address this gap, we introduce LVBench, a benchmark specifically designed for long video understanding. Our dataset comprises publicly sourced videos and encompasses a diverse set of tasks aimed at long video comprehension and information extraction. LVBench is designed to challenge multimodal models to demonstrate long-term memory and extended comprehension capabilities. Our extensive evaluations reveal that current multimodal models still underperform on these demanding long video understanding tasks. Through LVBench, we aim to spur the development of more advanced models capable of tackling the complexities of long video comprehension. Our data and code are publicly available at: https://lvbench.github.io.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07982 [pdf, ps, other]

Quantitative analysis and its applications for Keller-Segel type systems

Authors: Mengyao Ding, Yuzhou Fang, Chao Zhang

Abstract: In this paper, we utilize the De Giorgi iteration to quantitatively analyze the upper bound of solutions for Keller-Segel type systems. The refined upper bound estimate presented here has broad applications in determining large time behaviours of weak solutions and improving the regularity for models involving the $p$-Laplace operator. To demonstrate the applicability of our findings, we investigate the asymptotic stability of a chemotaxis model with nonlinear signal production and a chemotaxis-Navier-Stokes model with a logistic source. Additionally, within the context of $p$-Laplacian diffusion, we establish Hölder continuity for a chemotaxis-haptotaxis model and a chemotaxis-Stokes model.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07973 [pdf, other]

Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey

Authors: Shang Wang, Tianqing Zhu, Bo Liu, Ming Ding, Xu Guo, Dayong Ye, Wanlei Zhou, Philip S. Yu

Abstract: With the rapid development of artificial intelligence, large language models (LLMs) have made remarkable advancements in natural language processing. These models are trained on vast datasets to exhibit powerful language understanding and generation capabilities across various applications, including machine translation, chatbots, and agents. However, LLMs have revealed a variety of privacy and security issues throughout their life cycle, drawing significant academic and industrial attention. Moreover, the risks faced by LLMs differ significantly from those encountered by traditional language models. Given that current surveys lack a clear taxonomy of unique threat models across diverse scenarios, we emphasize the unique privacy and security threats associated with five specific scenarios: pre-training, fine-tuning, retrieval-augmented generation systems, deployment, and LLM-based agents. Addressing the characteristics of each risk, this survey outlines potential threats and countermeasures. Research on attack and defense situations can offer feasible research directions, enabling more areas to benefit from LLMs.

Submitted 18 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.06580 [pdf, other]

Break the Chain: Large Language Models Can be Shortcut Reasoners

Authors: Mengru Ding, Hanmeng Liu, Zhizhang Fu, Jian Song, Wenbo Xie, Yue Zhang

Abstract: Recent advancements in Chain-of-Thought (CoT) reasoning utilize complex modules but are hampered by high token consumption, limited applicability, and challenges in reproducibility. This paper conducts a critical evaluation of CoT prompting, extending beyond arithmetic to include complex logical and commonsense reasoning tasks, areas where standard CoT methods fall short. We propose the integration of human-like heuristics and shortcuts into language models (LMs) through "break the chain" strategies. These strategies disrupt traditional CoT processes using controlled variables to assess their efficacy. Additionally, we develop innovative zero-shot prompting strategies that encourage the use of shortcuts, enabling LMs to quickly exploit reasoning clues and bypass detailed procedural steps. Our comprehensive experiments across various LMs, both commercial and open-source, reveal that LMs maintain effective performance with "break the chain" strategies. We also introduce ShortcutQA, a dataset specifically designed to evaluate reasoning through shortcuts, compiled from competitive tests optimized for heuristic reasoning tasks such as forward/backward reasoning and simplification. Our analysis confirms that ShortcutQA not only poses a robust challenge to LMs but also serves as an essential benchmark for enhancing reasoning efficiency in AI.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.03880 [pdf, other]

Memorization in deep learning: A survey

Authors: Jiaheng Wei, Yanjun Zhang, Leo Yu Zhang, Ming Ding, Chao Chen, Kok-Leong Ong, Jun Zhang, Yang Xiang

Abstract: Deep Learning (DL) powered by Deep Neural Networks (DNNs) has revolutionized various domains, yet understanding the intricacies of DNN decision-making and learning processes remains a significant challenge. Recent investigations have uncovered an interesting memorization phenomenon in which DNNs tend to memorize specific details from examples rather than learning general patterns, affecting model generalization, security, and privacy. This raises critical questions about the nature of generalization in DNNs and their susceptibility to security breaches. In this survey, we present a systematic framework to organize memorization definitions based on the generalization and security/privacy domains and summarize memorization evaluation methods at both the example and model levels. Through a comprehensive literature review, we explore DNN memorization behaviors and their impacts on security and privacy. We also introduce privacy vulnerabilities caused by memorization and the phenomenon of forgetting and explore its connection with memorization. Furthermore, we spotlight various applications leveraging memorization and forgetting mechanisms, including noisy label learning, privacy preservation, and model enhancement. This survey offers the first-in-kind understanding of memorization in DNNs, providing insights into its challenges and opportunities for enhancing AI development while addressing critical ethical concerns.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03758 [pdf]

Phonon heat conduction across slippery interfaces in twisted graphite

Authors: Fuwei Yang, Wenjiang Zhou, Zhibin Zhang, Xuanyu Huang, Jingwen Zhang, Nianjie Liang, Wujuan Yan, Yuxi Wang, Mingchao Ding, Quanlin Guo, Yu Han, Te-Huan Liu, Kaihui Liu, Quanshui Zheng, Bai Song

Abstract: Interlayer rotation in van der Waals (vdW) materials offers great potential for manipulating phonon dynamics and heat flow in advanced electronics with ever higher compactness and power density. However, despite extensive theoretical efforts in recent years, experimental measurements remain scarce especially due to the critical challenges of preparing single-crystalline twisted interfaces and probing interfacial thermal transport with sufficient resolution. Here, we exploited the intrinsic twisted interfaces in highly oriented pyrolytic graphite (HOPG). By developing novel experimental schemes based on microfabricated mesas, we managed to achieve simultaneous mechanical characterizations and thermal measurements. In particular, we pushed the HOPG mesas with a microprobe to identify and rotate single-crystalline intrinsic interfaces owing to their slippery nature as is well known in structural superlubricity. Remarkably, we observed over 30-fold suppression of thermal conductance for the slippery interfaces by using epitaxial graphite as a control. Nonetheless, the interfacial conductance remains around 600 $\mathrm{MWm^{-2}K^{-1}}$ which surpasses the highest values for artificially stacked vdW structures by more than five times. Further, atomic simulations revealed the predominant role of the transverse acoustic phonons. Together, our findings highlight a general physical picture that directly correlates interfacial thermal transport with sliding resistance, and lay the foundation for twist-enabled thermal management which are particularly beneficial to twistronics and slidetronics.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.02220 [pdf, other]

Stochastic Thermodynamics of Micromagnetics with Spin Torque

Authors: Mingnan Ding, Jun Wu, Xiangjun Xing

Abstract: In this work, we study the stochastic dynamics of micro-magnetics interacting with a spin-current torque. We extend the previously constructed stochastic Landau-Lifshitz equation to the case with spin-current torque, and verify the conditions of detailed balance. Then we construct various thermodynamic quantities such as work and heat, and prove the second law of thermodynamics. Due to the existence of spin-torque and the asymmetry of the kinetic matrix, a novel effect of entropy pumping shows up. As a consequence, the system may behave as a heat engine which constantly transforms heat into magnetic work. Finally, we derive a fluctuation theorem for the joint probability density function of the pumped entropy and the total work, and verify it using numerical simulations.

Submitted 5 August, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: 7 pages. arXiv admin note: text overlap with arXiv:2404.13612

arXiv:2405.17932 [pdf, ps, other]

Towards Communication-efficient Federated Learning via Sparse and Aligned Adaptive Optimization

Authors: Xiumei Deng, Jun Li, Kang Wei, Long Shi, Zeihui Xiong, Ming Ding, Wen Chen, Shi Jin, H. Vincent Poor

Abstract: Adaptive moment estimation (Adam), as a Stochastic Gradient Descent (SGD) variant, has gained widespread popularity in federated learning (FL) due to its fast convergence. However, federated Adam (FedAdam) algorithms suffer from a threefold increase in uplink communication overhead compared to federated SGD (FedSGD) algorithms, which arises from the necessity to transmit both local model updates and first and second moment estimates from distributed devices to the centralized server for aggregation. Driven by this issue, we propose a novel sparse FedAdam algorithm called FedAdam-SSM, wherein distributed devices sparsify the updates of local model parameters and moment estimates and subsequently upload the sparse representations to the centralized server. To further reduce the communication overhead, the updates of local model parameters and moment estimates incorporate a shared sparse mask (SSM) into the sparsification process, eliminating the need for three separate sparse masks. Theoretically, we develop an upper bound on the divergence between the local model trained by FedAdam-SSM and the desired model trained by centralized Adam, which is related to sparsification error and imbalanced data distribution. By minimizing the divergence bound between the model trained by FedAdam-SSM and centralized Adam, we optimize the SSM to mitigate the learning performance degradation caused by sparsification error. Additionally, we provide convergence bounds for FedAdam-SSM in both convex and non-convex objective function settings, and investigate the impact of local epoch, learning rate and sparsification ratio on the convergence rate of FedAdam-SSM. Experimental results show that FedAdam-SSM outperforms baselines in terms of convergence rate (over 1.1$\times$ faster than the sparse FedAdam baselines) and test accuracy (over 14.5\% ahead of the quantized FedAdam baselines).

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17914 [pdf, other]

Trustworthy DNN Partition for Blockchain-enabled Digital Twin in Wireless IIoT Networks

Authors: Xiumei Deng, Jun Li, Long Shi, Kang Wei, Ming Ding, Yumeng Shao, Wen Chen, Shi Jin

Abstract: Digital twin (DT) has emerged as a promising solution to enhance manufacturing efficiency in industrial Internet of Things (IIoT) networks. To promote the efficiency and trustworthiness of DT for wireless IIoT networks, we propose a blockchain-enabled DT (B-DT) framework that employs a deep neural network (DNN) partitioning technique and a reputation-based consensus mechanism, wherein the DTs maintained at the gateway side execute DNN inference tasks using the data collected from their associated IIoT devices. First, we employ the DNN partitioning technique to offload the top-layer DNN inference tasks to the access point (AP) side, which alleviates the computation burden at the gateway side and thereby improves the efficiency of DNN inference. Second, we propose a reputation-based consensus mechanism that integrates Proof of Work (PoW) and Proof of Stake (PoS). Specifically, the proposed consensus mechanism evaluates the off-chain reputation of each AP according to its computation resource contributions to the DNN inference tasks, and utilizes the off-chain reputation as a stake to adjust the block generation difficulty. Third, we formulate a stochastic optimization problem of communication resource (i.e., partition point) and computation resource allocation (i.e., computation frequency of APs for top-layer DNN inference and block generation) to minimize system latency under the time-varying channel state and long-term constraints of off-chain reputation, and solve the problem using the Lyapunov optimization method. Experimental results show that the proposed dynamic DNN partitioning and resource allocation (DPRA) algorithm outperforms the baselines in terms of reducing the overall latency while guaranteeing the trustworthiness of the B-DT system.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17583 [pdf, other]

Understanding Forgetting in Continual Learning with Linear Regression

Authors: Meng Ding, Kaiyi Ji, Di Wang, Jinhui Xu

Abstract: Continual learning, focused on sequentially learning multiple tasks, has gained significant attention recently. Despite the tremendous progress made in the past, the theoretical understanding, especially factors contributing to catastrophic forgetting, remains relatively unexplored. In this paper, we provide a general theoretical analysis of forgetting in the linear regression model via Stochastic Gradient Descent (SGD) applicable to both underparameterized and overparameterized regimes. Our theoretical framework reveals some interesting insights into the intricate relationship between task sequence and algorithmic parameters, an aspect not fully captured in previous studies due to their restrictive assumptions. Specifically, we demonstrate that, given a sufficiently large data size, the arrangement of tasks in a sequence, where tasks with larger eigenvalues in their population data covariance matrices are trained later, tends to result in increased forgetting. Additionally, our findings highlight that an appropriate choice of step size will help mitigate forgetting in both underparameterized and overparameterized settings. To validate our theoretical analysis, we conducted simulation experiments on both linear regression models and Deep Neural Networks (DNNs). Results from these simulations substantiate our theoretical findings.

Submitted 27 May, 2024; originally announced May 2024.

Comments: To be published in The 41st International Conference on Machine Learning

arXiv:2405.17535 [pdf, other]

Calibrated Dataset Condensation for Faster Hyperparameter Search

Authors: Mucong Ding, Yuancheng Xu, Tahseen Rabbani, Xiaoyu Liu, Brian Gravelle, Teresa Ranadive, Tai-Ching Tuan, Furong Huang

Abstract: Dataset condensation can be used to reduce the computational cost of training multiple models on a large dataset by condensing the training dataset into a small synthetic set. State-of-the-art approaches rely on matching the model gradients between the real and synthetic data. However, there is no theoretical guarantee of the generalizability of the condensed data: data condensation often generalizes poorly across hyperparameters/architectures in practice. This paper considers a different condensation objective specifically geared toward hyperparameter search. We aim to generate a synthetic validation dataset so that the validation-performance rankings of the models, with different hyperparameters, on the condensed and original datasets are comparable. We propose a novel hyperparameter-calibrated dataset condensation (HCDC) algorithm, which obtains the synthetic validation dataset by matching the hyperparameter gradients computed via implicit differentiation and efficient inverse Hessian approximation. Experiments demonstrate that the proposed framework effectively maintains the validation-performance rankings of models and speeds up hyperparameter/architecture search for tasks on both images and graphs.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.17404 [pdf, other]

Spectral Greedy Coresets for Graph Neural Networks

Authors: Mucong Ding, Yinhan He, Jundong Li, Furong Huang

Abstract: The ubiquity of large-scale graphs in node-classification tasks significantly hinders the real-world applications of Graph Neural Networks (GNNs). Node sampling, graph coarsening, and dataset condensation are effective strategies for enhancing data efficiency. However, owing to the interdependence of graph nodes, coreset selection, which selects subsets of the data examples, has not been successfully applied to speed up GNN training on large graphs, warranting special treatment. This paper studies graph coresets for GNNs and avoids the interdependence issue by selecting ego-graphs (i.e., neighborhood subgraphs around a node) based on their spectral embeddings. We decompose the coreset selection problem for GNNs into two phases: a coarse selection of widely spread ego graphs and a refined selection to diversify their topologies. We design a greedy algorithm that approximately optimizes both objectives. Our spectral greedy graph coreset (SGGC) scales to graphs with millions of nodes, obviates the need for model pre-training, and applies to low-homophily graphs. Extensive experiments on ten datasets demonstrate that SGGC outperforms other coreset methods by a wide margin, generalizes well across GNN architectures, and is much faster than graph condensation.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.13080 [pdf, other]

EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection

Authors: Yuwen Qian, Shuchi Wu, Kang Wei, Ming Ding, Di Xiao, Tao Xiang, Chuan Ma, Song Guo

Abstract: Federated self-supervised learning (FSSL) has recently emerged as a promising paradigm that enables the exploitation of clients' vast amounts of unlabeled data while preserving data privacy. While FSSL offers advantages, its susceptibility to backdoor attacks, a concern identified in traditional federated supervised learning (FSL), has not been investigated. To fill the research gap, we undertake a comprehensive investigation into a backdoor attack paradigm, where unscrupulous clients conspire to manipulate the global model, revealing the vulnerability of FSSL to such attacks. In FSL, backdoor attacks typically build a direct association between the backdoor trigger and the target label. In contrast, in FSSL, backdoor attacks aim to alter the global model's representation for images containing the attacker's specified trigger pattern in favor of the attacker's intended target class, which is less straightforward. In this sense, we demonstrate that existing defenses are insufficient to mitigate the investigated backdoor attacks in FSSL, so an effective defense mechanism is urgently needed. To tackle this issue, we dive into the fundamental mechanism of backdoor attacks on FSSL, proposing the Embedding Inspector (EmInspector) that detects malicious clients by inspecting the embedding space of local models. In particular, EmInspector assesses the similarity of embeddings from different local models using a small set of inspection images (e.g., ten images of CIFAR100) without specific requirements on sample distribution or labels. We discover that embeddings from backdoored models tend to cluster together in the embedding space for a given inspection image. Evaluation results show that EmInspector can effectively mitigate backdoor attacks on FSSL across various adversary settings. Our code is available at https://github.com/ShuchiWu/EmInspector.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 18 pages, 12 figures

arXiv:2405.11713 [pdf, other]

Decentralized Privacy Preservation for Critical Connections in Graphs

Authors: Conggai Li, Wei Ni, Ming Ding, Youyang Qu, Jianjun Chen, David Smith, Wenjie Zhang, Thierry Rakotoarivelo

Abstract: Many real-world interconnections among entities can be characterized as graphs. Collecting local graph information with balanced privacy and data utility has garnered notable interest recently. This paper delves into the problem of identifying and protecting critical information of entity connections for individual participants in a graph based on cohesive subgraph searches. This problem has not been addressed in the literature. To address the problem, we propose to extract the critical connections of a queried vertex using a fortress-like cohesive subgraph model known as $p$-cohesion. A user's connections within a fortress are obfuscated when being released, to protect critical information about the user. Novel merit and penalty score functions are designed to measure each participant's critical connections in the minimal $p$-cohesion, facilitating effective identification of the connections. We further propose to preserve the privacy of a vertex enquired by only protecting its critical connections when responding to queries raised by data collectors. We prove that, under the decentralized differential privacy (DDP) mechanism, one's response satisfies $(\varepsilon, \delta)$-DDP when its critical connections are protected while the rest remains unperturbed. The effectiveness of our proposed method is demonstrated through extensive experiments on real-life graph datasets.

Submitted 19 May, 2024; originally announced May 2024.

arXiv:2405.08542 [pdf, other]

Industrial Metaverse: Enabling Technologies, Open Problems, and Future Trends

Authors: Shiying Zhang, Jun Li, Long Shi, Ming Ding, Dinh C. Nguyen, Wen Chen, Zhu Han

Abstract: As an emerging technology that enables seamless integration between the physical and virtual worlds, the Metaverse has great potential to be deployed in the industrial production field with the development of extended reality (XR) and next-generation communication networks. This deployment, called the Industrial Metaverse, is used for product design, production operations, industrial quality inspection, and product testing. However, there is a lack of in-depth understanding of the enabling technologies associated with the Industrial Metaverse. This encompasses both the precise industrial scenarios targeted by each technology and the potential migration of technologies developed in other domains to the industrial sector. Driven by this issue, in this article, we conduct a comprehensive survey of the state-of-the-art literature on the Industrial Metaverse. Specifically, we first analyze the advantages of the Metaverse for industrial production. Then, we review a collection of key enabling technologies of the Industrial Metaverse, including blockchain (BC), digital twin (DT), 6G, XR, and artificial intelligence (AI), and analyze how these technologies can support different aspects of industrial production. Subsequently, we present numerous formidable challenges encountered within the Industrial Metaverse, including confidentiality and security concerns, resource limitations, and interoperability constraints. Furthermore, we investigate the extant solutions devised to address them. Finally, we briefly outline several open issues and future research directions of the Industrial Metaverse.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 26 pages, 8 figures

arXiv:2405.07483 [pdf, other]

A Class of Convex Optimization-Based Recursive Algorithms for Identification of Stochastic Systems

Authors: Mingxia Ding, Wenxiao Zhao, Tianshi Chen

Abstract: Focusing on identification, this paper develops a class of convex optimization-based criteria and correspondingly the recursive algorithms to estimate the parameter vector $θ^{*}$ of a stochastic dynamic system. Not only do the criteria include the classical least-squares estimator but also the $L_l=|\cdot|^l, l\geq 1$, the Huber, the Log-cosh, and the Quantile costs as special cases. First, we prove that the minimizers of the convex optimization-based criteria converge to $θ^{*}$ with probability one. Second, the recursive algorithms are proposed to find the estimates, which minimize the convex optimization-based criteria, and it is shown that these estimates also converge to the true parameter vector with probability one. Numerical examples are given, justifying the performance of the proposed algorithms including the strong consistency of the estimates, the robustness against outliers in the observations, and higher efficiency in online computation compared with the kernel-based regularization method due to the recursive nature.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.06993 [pdf, other]

Robust Model Aggregation for Heterogeneous Federated Learning: Analysis and Optimizations

Authors: Yumeng Shao, Jun Li, Long Shi, Kang Wei, Ming Ding, Qianmu Li, Zengxiang Li, Wen Chen, Shi Jin

Abstract: Conventional synchronous federated learning (SFL) frameworks suffer from performance degradation in heterogeneous systems due to imbalanced local data size and diverse computing power on the client side. To address this problem, asynchronous FL (AFL) and semi-asynchronous FL have been proposed to recover the performance loss by allowing asynchronous aggregation. However, asynchronous aggregation incurs a new problem of inconsistency between local updates and global updates. Motivated by the issues of conventional SFL and AFL, we first propose a time-driven SFL (T-SFL) framework for heterogeneous systems. The core idea of T-SFL is that the server aggregates the models from different clients, each with varying numbers of iterations, at regular time intervals. To evaluate the learning performance of T-SFL, we provide an upper bound on the global loss function. Further, we optimize the aggregation weights to minimize the developed upper bound. Then, we develop a discriminative model selection (DMS) algorithm that removes local models from clients whose number of iterations falls below a predetermined threshold. In particular, this algorithm ensures that each client's aggregation weight accurately reflects its true contribution to the global model update, thereby improving the efficiency and robustness of the system. To validate the effectiveness of T-SFL with the DMS algorithm, we conduct extensive experiments using several popular datasets including MNIST, CIFAR-10, Fashion-MNIST, and SVHN. The experimental results demonstrate that T-SFL with the DMS algorithm can reduce the latency of conventional SFL by 50\%, while achieving an average 3\% improvement in learning accuracy over state-of-the-art AFL algorithms.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.04312 [pdf, other]

Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer

Authors: Zhuoyi Yang, Heyang Jiang, Wenyi Hong, Jiayan Teng, Wendi Zheng, Yuxiao Dong, Ming Ding, Jie Tang

Abstract: Diffusion models have shown remarkable performance in image generation in recent years. However, due to a quadratic increase in memory when generating ultra-high-resolution images (e.g. 4096*4096), the resolution of generated images is often limited to 1024*1024. In this work, we propose a unidirectional block attention mechanism that can adaptively adjust the memory overhead during the inference process and handle global dependencies. Building on this module, we adopt the DiT structure for upsampling and develop an infinite super-resolution model capable of upsampling images of various shapes and resolutions. Comprehensive experiments show that our model achieves SOTA performance in generating ultra-high-resolution images in both machine and human evaluation. Compared to commonly used UNet structures, our model can save more than 5x memory when generating 4096*4096 images. The project URL is https://github.com/THUDM/Inf-DiT.

Submitted 8 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2404.13845 [pdf, other]

Stochastic thermodynamics of Brownian motion in a flowing fluid

Authors: Jun Wu, Mingnan Ding, Xiangjun Xing

Abstract: We study stochastic thermodynamics of over-damped Brownian motion in a flowing fluid. Unlike some previous works, we treat the effects of the flow field as a non-conservative driving force acting on the Brownian particle. This allows us to apply the theoretical formalism developed in a recent work for general non-conservative Langevin dynamics. We define heat and work both at the trajectory level and at the ensemble level, and prove the second law of thermodynamics explicitly. The entropy production (EP) is decomposed into a housekeeping part and an excess part, both of which are non-negative at the ensemble level. Fluctuation theorems are derived for the housekeeping work, the excess work, and the total work, which are further verified using numerical simulations. A comparison between our theory and an earlier theory by Speck et al. is also carried out.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 20 pages, 13 figures

arXiv:2404.13612 [pdf, other]

Stochastic Thermodynamics of Micromagnetics

Authors: Mingnan Ding, Jun Wu, Xiangjun Xing

Abstract: In this work, we study the stochastic thermodynamics of micro-magnetic systems. We first formulate the stochastic dynamics of micro-magnetic systems by incorporating noises into the Landau-Lifshitz (LL) equation, which describes the irreversible and deterministic dynamics of magnetic moments. The resulting stochastic Landau-Lifshitz (sLL) equation obeys detailed balance, which guarantees that, with the external field fixed, the system converges to thermodynamic equilibrium with vanishing entropy production and with non-vanishing probability current. We then discuss various thermodynamic variables both at the trajectory level and at the ensemble level, and further establish both the first and the second laws of thermodynamics. Finally, we establish fluctuation theorems, and verify them using numerical simulations.

Submitted 4 August, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

Comments: 8 pages

arXiv:2404.09391 [pdf, other]

Privacy at a Price: Exploring its Dual Impact on AI Fairness

Authors: Mengmeng Yang, Ming Ding, Youyang Qu, Wei Ni, David Smith, Thierry Rakotoarivelo

Abstract: The worldwide adoption of machine learning (ML) and deep learning models, particularly in critical sectors, such as healthcare and finance, presents substantial challenges in maintaining individual privacy and fairness. These two elements are vital to a trustworthy environment for learning systems. While numerous studies have concentrated on protecting individual privacy through differential privacy (DP) mechanisms, emerging research indicates that differential privacy in machine learning models can unequally impact separate demographic subgroups regarding prediction accuracy. This leads to a fairness concern, and manifests as biased performance. Although the prevailing view is that enhancing privacy intensifies fairness disparities, a smaller, yet significant, subset of research suggests the opposite view. In this article, with extensive evaluation results, we demonstrate that the impact of differential privacy on fairness is not monotonic. Instead, we observe that the accuracy disparity initially grows as more DP noise (enhanced privacy) is added to the ML process, but subsequently diminishes at higher privacy levels with even more noise. Moreover, implementing gradient clipping in the differentially private stochastic gradient descent ML method can mitigate the negative impact of DP noise on fairness. This mitigation is achieved by moderating the disparity growth through a lower clipping threshold.

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.08324 [pdf, other]

Communication-Efficient Model Aggregation with Layer Divergence Feedback in Federated Learning

Authors: Liwei Wang, Jun Li, Wen Chen, Qingqing Wu, Ming Ding

Abstract: Federated Learning (FL) facilitates collaborative machine learning by training models on local datasets, and subsequently aggregating these local models at a central server. However, the frequent exchange of model parameters between clients and the central server can result in significant communication overhead during the FL training process. To solve this problem, this paper proposes a novel FL framework, the Model Aggregation with Layer Divergence Feedback mechanism (FedLDF). Specifically, we calculate model divergence between the local model and the global model from the previous round. Then through model layer divergence feedback, the distinct layers of each client are uploaded and the amount of data transferred is reduced effectively. Moreover, the convergence bound reveals that the access ratio of clients has a positive correlation with model performance. Simulation results show that our algorithm uploads local models with reduced communication overhead while upholding a superior global model performance.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.06605 [pdf, other]

RoadBEV: Road Surface Reconstruction in Bird's Eye View

Authors: Tong Zhao, Lei Yang, Yichen Xie, Mingyu Ding, Masayoshi Tomizuka, Yintao Wei

Abstract: Road surface conditions, especially geometry profiles, enormously affect driving performance of autonomous vehicles. Vision-based online road reconstruction promisingly captures road information in advance. Existing solutions like monocular depth estimation and stereo matching suffer from modest performance. The recent technique of Bird's-Eye-View (BEV) perception provides immense potential to more reliable and accurate reconstruction. This paper uniformly proposes two simple yet effective models for road elevation reconstruction in BEV named RoadBEV-mono and RoadBEV-stereo, which estimate road elevation with monocular and stereo images, respectively. The former directly fits elevation values based on voxel features queried from image view, while the latter efficiently recognizes road elevation patterns based on BEV volume representing correlation between left and right voxel features. Insightful analyses reveal their consistence and difference with the perspective view. Experiments on real-world dataset verify the models' effectiveness and superiority. Elevation errors of RoadBEV-mono and RoadBEV-stereo achieve 1.83 cm and 0.50 cm, respectively. Our models are promising for practical road preview, providing essential information for promoting safety and comfort of autonomous vehicles. The code is released at https://github.com/ztsrxh/RoadBEV

Submitted 7 August, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: Accepted by IEEE TITS https://ieeexplore.ieee.org/document/10618926

Showing 1–50 of 733 results for author: Ding, M