Search | arXiv e-print repository

LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection

Authors: Mervat Abassy, Kareem Elozeiri, Alexander Aziz, Minh Ngoc Ta, Raj Vardhan Tomar, Bimarsha Adhikari, Saad El Dine Ahmed, Yuxia Wang, Osama Mohammed Afzal, Zhuohan Xie, Jonibek Mansurov, Ekaterina Artemova, Vladislav Mikhailov, Rui Xing, Jiahui Geng, Hasan Iqbal, Zain Muhammad Mujahid, Tarek Mahmoud, Akim Tsvigun, Alham Fikri Aji, Artem Shelmanov, Nizar Habash, Iryna Gurevych, Preslav Nakov

Abstract: The widespread accessibility of large language models (LLMs) to the general public has significantly amplified the dissemination of machine-generated texts (MGTs). Advancements in prompt manipulation have exacerbated the difficulty in discerning the origin of a text (human-authored vs machine-generated). This raises concerns regarding the potential misuse of MGTs, particularly within educational and academic domains. In this paper, we present $\textbf{LLM-DetectAIve}$ -- a system designed for fine-grained MGT detection. It is able to classify texts into four categories: human-written, machine-generated, machine-written machine-humanized, and human-written machine-polished. Contrary to previous MGT detectors that perform binary classification, introducing two additional categories in LLM-DetectAIve offers insights into the varying degrees of LLM intervention during the text creation. This might be useful in some domains like education, where any LLM intervention is usually prohibited. Experiments show that LLM-DetectAIve can effectively identify the authorship of textual content, proving its usefulness in enhancing integrity in education, academia, and other domains. LLM-DetectAIve is publicly accessible at https://huggingface.co/spaces/raj-tomar001/MGT-New. The video describing our system is available at https://youtu.be/E8eT_bE7k8c.

Submitted 8 August, 2024; originally announced August 2024.

arXiv:2407.20523 [pdf, other]

Wireless Multi-User Interactive Virtual Reality in Metaverse with Edge-Device Collaborative Computing

Authors: Caolu Xu, Zhiyong Chen, Meixia Tao, Wenjun Zhang

Abstract: The immersive nature of the metaverse presents significant challenges for wireless multi-user interactive virtual reality (VR), such as ultra-low latency, high throughput and intensive computing, which place substantial demands on the wireless bandwidth and rendering resources of mobile edge computing (MEC). In this paper, we propose a wireless multi-user interactive VR framework with edge-device collaborative computing to overcome the motion-to-photon (MTP) threshold bottleneck. Specifically, we model the serial-parallel task execution in queues within a foreground and background separation architecture. The rendering indices of background tiles within the prediction window are determined, and both the foreground and selected background tiles are loaded into respective processing queues based on the rendering locations. To minimize the age of sensor information and the power consumption of mobile devices, we optimize rendering decisions and MEC resource allocation subject to the MTP constraint. To address this optimization problem, we design a safe reinforcement learning (RL) algorithm, active queue management-constrained updated projection (AQM-CUP). AQM-CUP constructs an environment suitable for queues, incorporating expired tiles actively discarded in processing buffers into its state and reward system. Experimental results demonstrate that the proposed framework significantly enhances user immersion while reducing device power consumption, and that the proposed AQM-CUP algorithm outperforms conventional methods in terms of training convergence and performance metrics.

Submitted 29 July, 2024; originally announced July 2024.

Comments: submitted to IEEE journal

arXiv:2407.17739 [pdf, other]

doi 10.3847/2041-8213/ad60c7

Observational Evidence for Magnetic Field Amplification in SN 1006

Authors: Moeri Tao, Jun Kataoka, Takaaki Tanaka

Abstract: We report the first observational evidence for magnetic field amplification in the north-east/south-west (NE/SW) shells of supernova remnant SN 1006, one of the most promising sites of cosmic ray (CR) acceleration. In previous studies, the strength of magnetic fields in these shells was estimated to be $B_{\rm SED}$ $\simeq$ 25$μ$G from the spectral energy distribution, where the synchrotron emission from relativistic electrons accounted for radio to X-rays, along with the inverse Compton emission extending from the GeV to TeV energy bands. However, the analysis of broadband radio data, ranging from 1.37~GHz to 100~GHz, indicated that the radio spectrum steepened from $α_1 = 0.52 \pm 0.02$ to $α_2 = 1.34 \pm 0.21$ by $Δα$ = 0.85 $\pm$ 0.21. This is naturally interpreted as a cooling break under a strong magnetic field of $B_{\rm brk}$ $\ge$ 2~mG. Moreover, the high-resolution MeerKAT image indicated that the width of the radio NE/SW shells was broader than that of the X-ray shell by a factor of only 3$-$20, as measured by Chandra. Such narrow radio shells can be naturally explained if the magnetic field responsible for the radio emissions is $B_{\rm R}$ $\ge$ 2 mG. Assuming that the magnetic field is locally enhanced by a factor of approximately $a$ = 100 along the NE/SW shells, we argue that the filling factor, which is the volume ratio of such a magnetically enhanced region to that of the entire shell, must be as low as approximately $k$ = 2.5$\times$10$^{-5}$.

Submitted 24 July, 2024; originally announced July 2024.

Comments: 6 pages, 4 figures

Journal ref: The Astrophysical Journal Letters, 970:L27 (6pp), 2024 August 1

arXiv:2407.16936 [pdf, ps, other]

Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling

Authors: Wei Guo, Molei Tao, Yongxin Chen

Abstract: We address the outstanding problem of sampling from an unnormalized density that may be non-log-concave and multimodal. To enhance the performance of simple Markov chain Monte Carlo (MCMC) methods, techniques of annealing type have been widely used. However, quantitative theoretical guarantees of these techniques are under-explored. This study takes a first step toward providing a non-asymptotic analysis of annealed MCMC. Specifically, we establish, for the first time, an oracle complexity of $\widetilde{O}\left(\frac{dβ^2{\cal A}^2}{\varepsilon^6}\right)$ for a simple annealed Langevin Monte Carlo algorithm to achieve $\varepsilon^2$ accuracy in Kullback-Leibler divergence to the target distribution $π\propto{\rm e}^{-V}$ on $\mathbb{R}^d$ with $β$-smooth potential $V$. Here, ${\cal A}$ represents the action of a curve of probability measures interpolating the target distribution $π$ and a readily sampleable distribution.

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2407.16725 [pdf, other]

Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions

Authors: Kai Liu, Zhihang Fu, Chao Chen, Sheng Jin, Ze Chen, Mingyuan Tao, Rongxin Jiang, Jieping Ye

Abstract: The key to OOD detection has two aspects: generalized feature representation and precise category description. Recently, vision-language models such as CLIP have provided significant advances on both issues, but constructing precise category descriptions is still in its infancy due to the absence of unseen categories. This work introduces two hierarchical contexts, namely perceptual context and spurious context, to carefully describe the precise category boundary through automatic prompt tuning. Specifically, perceptual contexts perceive the inter-category difference (e.g., cats vs apples) for current classification tasks, while spurious contexts further identify spurious (similar but not identical) OOD samples for every single category (e.g., cats vs panthers, apples vs peaches). The two contexts hierarchically construct the precise description for a certain category: first roughly classifying a sample to the predicted category, and then delicately identifying whether it is truly an ID sample or actually OOD. Moreover, the precise descriptions for those categories within the vision-language framework present a novel application: CATegory-EXtensible OOD detection (CATEX). One can efficiently extend the set of recognizable categories by simply merging the hierarchical contexts learned under different sub-task settings. Extensive experiments are conducted to demonstrate CATEX's effectiveness, robustness, and category-extensibility. For instance, CATEX consistently surpasses the rivals by a large margin with several protocols on the challenging ImageNet-1K dataset. In addition, we offer new insights on how to efficiently scale up the prompt engineering in vision-language models to recognize thousands of object categories, as well as how to incorporate large language models (like GPT-3) to boost zero-shot applications. Code will be made public soon.

Submitted 23 July, 2024; originally announced July 2024.

Comments: Accepted by 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2407.08439 [pdf, other]

A fitted space-time finite element method for an advection-diffusion problem with moving interfaces

Authors: Quang Huy Nguyen, Van Chien Le, Phuong Cuc Hoang, Thi Thanh Mai Ta

Abstract: This paper presents a fitted space-time finite element method for solving a parabolic advection-diffusion problem with a nonstationary interface. The jumping diffusion coefficient gives rise to the discontinuity of the spatial gradient of solution across the interface. We use the Banach-Necas-Babuska theorem to show the well-posedness of the continuous variational problem. A fully discrete finite-element based scheme is analyzed using the Galerkin method and unstructured fitted meshes. An optimal error estimate is established in a discrete energy norm under appropriate globally low but locally high regularity conditions. Some numerical results corroborate our theoretical results.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 19 pages

arXiv:2407.05289 [pdf, other]

DM-MIMO: Diffusion Models for Robust Semantic Communications over MIMO Channels

Authors: Yiheng Duan, Tong Wu, Zhiyong Chen, Meixia Tao

Abstract: This paper investigates robust semantic communications over multiple-input multiple-output (MIMO) fading channels. Current semantic communications over MIMO channels mainly focus on channel adaptive encoding and decoding, which lacks exploration of signal distribution. To leverage the potential of signal distribution in signal space denoising, we develop a diffusion model over MIMO channels (DM-MIMO), a plugin module at the receiver side in conjunction with singular value decomposition (SVD) based precoding and equalization. Specifically, due to the significant variations in effective noise power over distinct sub-channels, we determine the effective sampling steps accordingly and devise a joint sampling algorithm. Utilizing a three-stage training algorithm, DM-MIMO learns the distribution of the encoded signal, which enables noise elimination over all sub-channels. Experimental results demonstrate that the DM-MIMO effectively reduces the mean square errors (MSE) of the equalized signal and the DM-MIMO semantic communication system (DM-MIMO-JSCC) outperforms the JSCC-based semantic communication system in image reconstruction.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.03994 [pdf, other]

Unlocking the Potential of Model Merging for Low-Resource Languages

Authors: Mingxu Tao, Chen Zhang, Quzhe Huang, Tianyao Ma, Songfang Huang, Dongyan Zhao, Yansong Feng

Abstract: Adapting large language models (LLMs) to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT). However, this CT-then-SFT approach struggles with limited data in the context of low-resource languages, failing to balance language modeling and task-solving capabilities. We thus propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training. We use model merging to develop task-solving LLMs for low-resource languages without SFT data in the target languages. Our experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data. Observing performance saturation in model merging with more training tokens, we further analyze the merging process and introduce a slack variable to the model merging algorithm to mitigate the loss of important parameters, thereby enhancing performance. We hope that model merging can benefit more human languages suffering from data scarcity with its higher data efficiency.

Submitted 9 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.02015 [pdf, other]

Robust First and Second-Order Differentiation for Regularized Optimal Transport

Authors: Xingjie Li, Fei Lu, Molei Tao, Felix X. -F. Ye

Abstract: Applications such as unbalanced and fully shuffled regression can be approached by optimizing regularized optimal transport (OT) distances, such as the entropic OT and Sinkhorn distances. A common approach for this optimization is to use a first-order optimizer, which requires the gradient of the OT distance. For faster convergence, one might also resort to a second-order optimizer, which additionally requires the Hessian. The computations of these derivatives are crucial for efficient and accurate optimization. However, they present significant challenges in terms of memory consumption and numerical instability, especially for large datasets and small regularization strengths. We circumvent these issues by analytically computing the gradients for OT distances and the Hessian for the entropic OT distance, which was not previously used due to intricate tensor-wise calculations and the complex dependency on parameters within the bi-level loss function. Through analytical derivation and spectral analysis, we identify and resolve the numerical instability caused by the singularity and ill-posedness of a key linear system. Consequently, we achieve scalable and stable computation of the Hessian, enabling the implementation of the stochastic gradient descent (SGD)-Newton methods. Tests on shuffled regression examples demonstrate that the second stage of the SGD-Newton method converges orders of magnitude faster than the gradient descent-only method while achieving significantly more accurate parameter estimations.

Submitted 2 July, 2024; originally announced July 2024.

MSC Class: 68Q25; 68R10; 68U05

arXiv:2406.17807 [pdf, other]

Enhancing Commentary Strategies for Imperfect Information Card Games: A Study of Large Language Models in Guandan Commentary

Authors: Meiling Tao, Xuechen Liang, Ziyi Wang, Yiling Tao, Tianyu Shi

Abstract: Recent advancements in large language models (LLMs) have unlocked the potential for generating high-quality game commentary. However, producing insightful and engaging commentary for complex games with incomplete information remains a significant challenge. In this paper, we introduce a novel commentary method that combines Reinforcement Learning (RL) and LLMs, tailored specifically for the Chinese card game \textit{Guandan}. Our system leverages RL to generate intricate card-playing scenarios and employs LLMs to generate corresponding commentary text, effectively emulating the strategic analysis and narrative prowess of professional commentators. The framework comprises a state commentary guide, a Theory of Mind (ToM)-based strategy analyzer, and a style retrieval module, which seamlessly collaborate to deliver detailed and context-relevant game commentary in the Chinese language environment. We empower LLMs with ToM capabilities and refine both retrieval and information filtering mechanisms. This facilitates the generation of personalized commentary content. Our experimental results showcase the substantial enhancement in performance achieved by the proposed commentary framework when applied to open-source LLMs, surpassing the performance of GPT-4 across multiple evaluation metrics.

Submitted 3 August, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.15354 [pdf, other]

Can Specific THz Fields Induce Collective Base-Flipping in DNA? A Stochastic Averaging and Resonant Enhancement Investigation Based on a New Mesoscopic Model

Authors: Wang Sang Koon, Houman Owhadi, Molei Tao, Tomohiro Yanao

Abstract: We study the metastability, internal frequencies, activation mechanism, energy transfer, and the collective base-flipping in a mesoscopic DNA via resonance with specific electric fields. Our new mesoscopic DNA model takes into account not only the issues of helicity and the coupling of an electric field with the base dipole moments, but also includes environmental effects such as fluid viscosity and thermal noise. All the parameter values are chosen to best represent the typical values for the opening and closing dynamics of a DNA. Our study shows that while the mesoscopic DNA is metastable and robust to environmental effects, it is vulnerable to certain frequencies that could be targeted by specific THz fields for triggering its collective base-flipping dynamics and causing large amplitude separation of base pairs. Based on applying the Freidlin-Wentzell method of stochastic averaging and the newly developed theory of resonant enhancement to our mesoscopic DNA model, our semi-analytic estimates show that the required fields should be THz fields with frequencies around 0.28 THz and with amplitudes on the order of 450 kV/cm. These estimates compare well with the experimental data of Titova et al., which have demonstrated that they could affect the function of DNA in human skin tissues by THz pulses with frequencies around 0.5 THz and with a peak electric field at 220 kV/cm. Moreover, our estimates also conform to a number of other experimental results which appeared in the last couple of years.

Submitted 18 March, 2024; originally announced June 2024.

Comments: 37 pages, 8 figures

arXiv:2406.12839 [pdf, other]

Evaluating the design space of diffusion-based generative models

Authors: Yuqing Wang, Ye He, Molei Tao

Abstract: Most existing theoretical investigations of the accuracy of diffusion models, albeit significant, assume the score function has been approximated to a certain accuracy, and then use this a priori bound to control the error of generation. This article instead provides a first quantitative understanding of the whole generation process, i.e., both training and sampling. More precisely, it conducts a non-asymptotic convergence analysis of denoising score matching under gradient descent. In addition, a refined sampling error analysis for variance exploding models is also provided. The combination of these two results yields a full error analysis, which elucidates (again, but this time theoretically) how to design the training and sampling processes for effective generation. For instance, our theory implies a preference toward noise distribution and loss weighting in training that qualitatively agree with the ones used in [Karras et al. 2022]. It also provides perspectives on the choices of time and variance schedules in sampling: when the score is well trained, the design in [Song et al. 2020] is preferable, but when it is less trained, the design in [Karras et al. 2022] becomes preferable.

Submitted 25 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

Comments: Comments are welcome. Out of admiration we titled our paper after EDM, and hoped theorists' humor is not too corny

arXiv:2406.10556 [pdf, other]

Multi-User Semantic Fusion for Semantic Communications over Degraded Broadcast Channels

Authors: Tong Wu, Zhiyong Chen, Meixia Tao, Bin Xia, Wenjun Zhang

Abstract: Degraded broadcast channels (DBC) are a typical multiuser communication scenario, but semantic communications over DBC still lack in-depth research. In this paper, we design a semantic communications approach based on multi-user semantic fusion for wireless image transmission over DBC. In the proposed method, the transmitter extracts semantic features for two users separately. It then effectively fuses these semantic features for broadcasting by leveraging semantic similarity. Unlike traditional allocation of time, power, or bandwidth, the semantic fusion scheme can dynamically control the weight of the semantic features of the two users to balance the performance between the two users. Considering the different channel state information (CSI) of both users over DBC, a DBC-Aware method is developed that embeds the CSI of both users into the joint source-channel coding encoder and fusion module to adapt to the channel. Experimental results show that the proposed system outperforms the traditional broadcasting schemes.

Submitted 15 June, 2024; originally announced June 2024.

Comments: accepted by China Communications

arXiv:2406.07915 [pdf, ps, other]

Aggregation Design for Personalized Federated Multi-Modal Learning over Wireless Networks

Authors: Benshun Yin, Zhiyong Chen, Meixia Tao

Abstract: Federated Multi-Modal Learning (FMML) is an emerging field that integrates information from different modalities in federated learning to improve the learning performance. In this letter, we develop a parameter scheduling scheme to improve personalized performance and communication efficiency in personalized FMML, considering the non-independent and non-identically distributed (non-IID) data along with the modality heterogeneity. Specifically, a learning-based approach is utilized to obtain the aggregation coefficients for parameters of different modalities on distinct devices. Based on the aggregation coefficients and channel state, a subset of parameters is scheduled to be uploaded to a server for each modality. Experimental results show that the proposed algorithm can effectively improve the personalized performance of FMML.

Submitted 12 June, 2024; originally announced June 2024.

Comments: accepted by IEEE Communications Letters

arXiv:2406.01937 [pdf, other]

Cramér-Rao Bound Analysis and Beamforming Design for Integrated Sensing and Communication with Extended Targets

Authors: Yiqiu Wang, Meixia Tao, Shu Sun

Abstract: This paper studies an integrated sensing and communication (ISAC) system, where a multi-antenna base station transmits beamformed signals for joint downlink multi-user communication and radar sensing of an extended target (ET). By considering echo signals as reflections from valid elements on the ET contour, a set of novel Cramér-Rao bounds (CRBs) is derived for parameter estimation of the ET, including central range, direction, and orientation. The ISAC transmit beamforming design is then formulated as an optimization problem, aiming to minimize the CRB associated with radar sensing, while satisfying a minimum signal-to-interference-plus-noise ratio requirement for each communication user, along with a 3-dB beam coverage constraint tailored for the ET. To solve this non-convex problem, we utilize semidefinite relaxation (SDR) and propose a rank-one solution extraction scheme for non-tight relaxation circumstances. To reduce the computation complexity, we further employ an efficient zero-forcing (ZF) based beamforming design, where the sensing task is performed in the null space of communication channels. Numerical results validate the effectiveness of the obtained CRB, revealing the diverse features of CRB for differently shaped ETs. The proposed SDR beamforming design outperforms benchmark designs with lower estimation error and CRB, while the ZF beamforming design greatly improves computation efficiency with minor sensing performance loss.

Submitted 3 June, 2024; originally announced June 2024.

Comments: Submitted to IEEE Transactions on Wireless Communications. arXiv admin note: text overlap with arXiv:2312.10641

arXiv:2405.21050 [pdf, other]

Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models

Authors: Xinxi Zhang, Song Wen, Ligong Han, Felix Juefei-Xu, Akash Srivastava, Junzhou Huang, Hao Wang, Molei Tao, Dimitris N. Metaxas

Abstract: Adapting large-scale pre-trained generative models in a parameter-efficient manner is gaining traction. Traditional methods like low rank adaptation achieve parameter efficiency by imposing constraints but may not be optimal for tasks requiring high representation capacity. We propose a novel spectrum-aware adaptation framework for generative models. Our method adjusts both singular values and their basis vectors of pretrained weights. Using the Kronecker product and efficient Stiefel optimizers, we achieve parameter-efficient adaptation of orthogonal matrices. We introduce Spectral Orthogonal Decomposition Adaptation (SODA), which balances computational efficiency and representation capacity. Extensive evaluations on text-to-image diffusion models demonstrate SODA's effectiveness, offering a spectrum-aware alternative to existing fine-tuning methods.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.20390 [pdf, other]

Quantitative Convergences of Lie Group Momentum Optimizers

Authors: Lingkai Kong, Molei Tao

Abstract: Explicit, momentum-based dynamics that optimize functions defined on Lie groups can be constructed via variational optimization and momentum trivialization. Structure preserving time discretizations can then turn these dynamics into optimization algorithms. This article investigates two types of discretization, Lie Heavy-Ball, which is a known splitting scheme, and Lie NAG-SC, which is newly proposed. Their convergence rates are explicitly quantified under $L$-smoothness and local strong convexity assumptions. Lie NAG-SC provides acceleration over the momentumless case, i.e. Riemannian gradient descent, but Lie Heavy-Ball does not. When compared to existing accelerated optimizers for general manifolds, both Lie Heavy-Ball and Lie NAG-SC are computationally cheaper and easier to implement, thanks to their utilization of group structure. Only a gradient oracle and the exponential map are required, but not the logarithm map or parallel transport, which are computationally costly.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.16381 [pdf, other]

Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups

Authors: Yuchen Zhu, Tianrong Chen, Lingkai Kong, Evangelos A. Theodorou, Molei Tao

Abstract: The generative modeling of data on manifolds is an important task, for which diffusion models in flat spaces typically need nontrivial adaptations. This article demonstrates how a technique called `trivialization' can transfer the effectiveness of diffusion models in Euclidean spaces to Lie groups. In particular, an auxiliary momentum variable was algorithmically introduced to help transport the position variable between data distribution and a fixed, easy-to-sample distribution. Normally, this would incur further difficulty for manifold data because momentum lives in a space that changes with the position. However, our trivialization technique creates a new momentum variable that stays in a simple $\textbf{fixed vector space}$. This design, together with a manifold preserving integrator, simplifies implementation and avoids inaccuracies created by approximations such as projections to tangent space and manifold, which were typically used in prior work, hence facilitating generation with high-fidelity and efficiency. The resulting method achieves state-of-the-art performance on protein and RNA torsion angle generation and sophisticated torus datasets. We also, arguably for the first time, tackle the generation of data on high-dimensional Special Orthogonal and Unitary groups, the latter essential for quantum problems.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.06105 [pdf, ps, other]

Can Perplexity Reflect Large Language Model's Ability in Long Text Understanding?

Authors: Yutong Hu, Quzhe Huang, Mingxu Tao, Chen Zhang, Yansong Feng

Abstract: Recent studies have shown that Large Language Models (LLMs) have the potential to process extremely long text. Many works only evaluate LLMs' long-text processing ability on the language modeling task, with perplexity (PPL) as the evaluation metric. However, in our study, we find that there is no correlation between PPL and LLMs' long-text understanding ability. Besides, PPL may only reflect the model's ability to model local information instead of catching long-range dependency. Therefore, only using PPL to prove the model could process long text is inappropriate. The local focus feature of PPL could also explain some existing phenomena, such as the great extrapolation ability of the position method ALiBi. When evaluating a model's ability in long text, we might pay more attention to PPL's limitation and avoid overly relying on it.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.03131 [pdf, other]

WDMoE: Wireless Distributed Large Language Models with Mixture of Experts

Authors: Nan Xue, Yaping Sun, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Liang Qian, Shuguang Cui, Ping Zhang

Abstract: Large Language Models (LLMs) have achieved significant success in various natural language processing tasks, but how wireless communications can support LLMs has not been extensively studied. In this paper, we propose a wireless distributed LLMs paradigm based on Mixture of Experts (MoE), named WDMoE, deploying LLMs collaboratively across edge servers of base station (BS) and mobile devices in the wireless communications system. Specifically, we decompose the MoE layer in LLMs by deploying the gating network and the preceding neural network layer at BS, while distributing the expert networks across the devices. This arrangement leverages the parallel capabilities of expert networks on distributed devices. Moreover, to overcome the instability of wireless communications, we design an expert selection policy by taking into account both the performance of the model and the end-to-end latency, which includes both transmission delay and inference delay. Evaluations conducted across various LLMs and multiple datasets demonstrate that WDMoE not only outperforms existing models, such as Llama 2 with 70 billion parameters, but also significantly reduces end-to-end latency.

Submitted 5 May, 2024; originally announced May 2024.

Comments: submitted to IEEE conference

arXiv:2405.03125 [pdf, other]

MambaJSCC: Deep Joint Source-Channel Coding with Visual State Space Model

Authors: Tong Wu, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Wenjun Zhang, Ping Zhang

Abstract: Lightweight and efficient deep joint source-channel coding (JSCC) is a key technology for semantic communications. In this paper, we design a novel JSCC scheme named MambaJSCC, which utilizes a visual state space model with channel adaptation (VSSM-CA) block as its backbone for transmitting images over wireless channels. The VSSM-CA block utilizes VSSM to integrate two-dimensional images with the state space, enabling feature extraction and encoding processes to operate with linear complexity. It also incorporates channel state information (CSI) via a newly proposed CSI embedding method. This method deploys a shared CSI encoding module within both the encoder and decoder to encode and inject the CSI into each VSSM-CA block, improving the adaptability of a single model to varying channel conditions. Experimental results show that MambaJSCC not only outperforms Swin Transformer based JSCC (SwinJSCC) but also significantly reduces parameter size, computational overhead, and inference delay (ID). For example, when employing an equal number of VSSM-CA blocks and Swin Transformer blocks, MambaJSCC achieves a 0.48 dB gain in peak-signal-to-noise ratio (PSNR) over SwinJSCC while requiring only 53.3% of the multiply-accumulate operations, 53.8% of the parameters, and 44.9% of the ID.

Submitted 5 May, 2024; originally announced May 2024.

Comments: submitted to IEEE conference

arXiv:2404.06336 [pdf, other]

Quantum State Generation with Structure-Preserving Diffusion Model

Authors: Yuchen Zhu, Tianrong Chen, Evangelos A. Theodorou, Xie Chen, Molei Tao

Abstract: This article considers the generative modeling of the (mixed) states of quantum systems, and an approach based on denoising diffusion model is proposed. The key contribution is an algorithmic innovation that respects the physical nature of quantum states. More precisely, the commonly used density matrix representation of mixed-state has to be complex-valued Hermitian, positive semi-definite, and trace one. Generic diffusion models, or other generative methods, may not be able to generate data that strictly satisfy these structural constraints, even if all training data do. To develop a machine learning algorithm that has physics hard-wired in, we leverage mirror diffusion and borrow the physical notion of von Neumann entropy to design a new map, for enabling strict structure-preserving generation. Both unconditional generation and conditional generation via classifier-free guidance are experimentally demonstrated to be efficacious, the latter enabling the design of new quantum states when generated on unseen labels.

Submitted 25 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.05979 [pdf, other]

StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion

Authors: Ming Tao, Bing-Kun Bao, Hao Tang, Yaowei Wang, Changsheng Xu

Abstract: Story visualization aims to generate a series of realistic and coherent images based on a storyline. Current models adopt a frame-by-frame architecture by transforming the pre-trained text-to-image model into an auto-regressive manner. Although these models have shown notable progress, there are still three flaws. 1) The unidirectional generation of auto-regressive manner restricts the usability in many scenarios. 2) The additional introduced story history encoders bring an extremely high computational cost. 3) The story visualization and continuation models are trained and inferred independently, which is not user-friendly. To these ends, we propose a bidirectional, unified, and efficient framework, namely StoryImager. The StoryImager enhances the storyboard generative ability inherited from the pre-trained text-to-image model for a bidirectional generation. Specifically, we introduce a Target Frame Masking Strategy to extend and unify different story image generation tasks. Furthermore, we propose a Frame-Story Cross Attention Module that decomposes the cross attention for local fidelity and global coherence. Moreover, we design a Contextual Feature Extractor to extract contextual information from the whole storyline. The extensive experimental results demonstrate the excellent performance of our StoryImager. The code is available at https://github.com/tobran/StoryImager.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 17 pages

arXiv:2404.01663 [pdf, other]

CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models

Authors: Xuechen Liang, Meiling Tao, Yinghui Xia, Tianyu Shi, Jun Wang, JingSong Yang

Abstract: Open large language models (LLMs) have significantly advanced the field of natural language processing, showcasing impressive performance across various tasks. Despite the significant advancements in LLMs, their effective operation still relies heavily on human input to accurately guide the dialogue flow, with agent tuning being a crucial optimization technique that involves human adjustments to the model for better response to such guidance. Addressing this dependency, our work introduces the TinyAgent model, trained on a meticulously curated high-quality dataset. We also present the Collaborative Multi-Agent Tuning (CMAT) framework, an innovative system designed to augment language agent capabilities through adaptive weight updates based on environmental feedback. This framework fosters collaborative learning and real-time adaptation among multiple intelligent agents, enhancing their context-awareness and long-term memory. In this research, we propose a new communication agent framework that integrates multi-agent systems with environmental feedback mechanisms, offering a scalable method to explore cooperative behaviors. Notably, our TinyAgent-7B model exhibits performance on par with GPT-3.5, despite having fewer parameters, signifying a substantial improvement in the efficiency and effectiveness of LLMs.

Submitted 26 August, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.12012 [pdf, other]

Convergence of Kinetic Langevin Monte Carlo on Lie groups

Authors: Lingkai Kong, Molei Tao

Abstract: Explicit, momentum-based dynamics for optimizing functions defined on Lie groups were recently constructed, based on techniques such as variational optimization and left trivialization. We appropriately add tractable noise to the optimization dynamics to turn it into a sampling dynamics, leveraging the advantageous feature that the trivialized momentum variable is Euclidean even though the potential function lives on a manifold. We then propose a Lie-group MCMC sampler, by delicately discretizing the resulting kinetic-Langevin-type sampling dynamics. The Lie group structure is exactly preserved by this discretization. Exponential convergence with explicit convergence rate for both the continuous dynamics and the discrete sampler is then proved under $W_2$ distance. Only compactness of the Lie group and geodesically $L$-smoothness of the potential function are needed. To the best of our knowledge, this is the first convergence result for kinetic Langevin on curved spaces, and also the first quantitative result that requires no convexity or, at least not explicitly, any common relaxation such as isoperimetry.

Submitted 17 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.07652 [pdf, other]

Harder Tasks Need More Experts: Dynamic Routing in MoE Models

Authors: Quzhe Huang, Zhenwei An, Nan Zhuang, Mingxu Tao, Chen Zhang, Yang Jin, Kun Xu, Kun Xu, Liwei Chen, Songfang Huang, Yansong Feng

Abstract: In this paper, we introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models, aiming to enhance computational efficiency and model performance by adjusting the number of activated experts based on input difficulty. Unlike traditional MoE approaches that rely on fixed Top-K routing, which activates a predetermined number of experts regardless of the input's complexity, our method dynamically selects experts based on the confidence level in expert selection for each input. This allows for a more efficient utilization of computational resources, activating more experts for complex tasks requiring advanced reasoning and fewer for simpler tasks. Through extensive evaluations, our dynamic routing method demonstrates substantial improvements over conventional Top-2 routing across various benchmarks, achieving an average improvement of 0.7% with less than 90% activated parameters. Further analysis shows our model dispatches more experts to tasks requiring complex reasoning skills, like BBH, confirming its ability to dynamically allocate computational resources in alignment with the input's complexity. Our findings also highlight a variation in the number of experts needed across different layers of the transformer model, offering insights into the potential for designing heterogeneous MoE frameworks. The code and models are available at https://github.com/ZhenweiAn/Dynamic_MoE.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2402.17886 [pdf, other]

Zeroth-Order Sampling Methods for Non-Log-Concave Distributions: Alleviating Metastability by Denoising Diffusion

Authors: Ye He, Kevin Rojas, Molei Tao

Abstract: This paper considers the problem of sampling from a non-log-concave distribution, based on queries of its unnormalized density. It first describes a framework, Diffusion Monte Carlo (DMC), based on the simulation of a denoising diffusion process with its score function approximated by a generic Monte Carlo estimator. DMC is an oracle-based meta-algorithm, where its oracle is the assumed access to samples that generate a Monte Carlo score estimator. Then we provide an implementation of this oracle, based on rejection sampling, and this turns DMC into a true algorithm, termed Zeroth-Order Diffusion Monte Carlo (ZOD-MC). We provide convergence analyses by first constructing a general framework, i.e. a performance guarantee for DMC, without assuming the target distribution to be log-concave or satisfying any isoperimetric inequality. Then we prove that ZOD-MC admits an inverse polynomial dependence on the desired sampling accuracy, albeit still suffering from the curse of dimensionality. Consequently, for low dimensional distributions, ZOD-MC is a very efficient sampler, with performance exceeding the latest samplers, including also-denoising-diffusion-based RDMC and RS-DMC. Last, we experimentally demonstrate the insensitivity of ZOD-MC to increasingly higher barriers between modes or discontinuity in non-convex potential.

Submitted 26 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.17304 [pdf, ps, other]

Probing Multimodal Large Language Models for Global and Local Semantic Representations

Authors: Mingxu Tao, Quzhe Huang, Kun Xu, Liwei Chen, Yansong Feng, Dongyan Zhao

Abstract: The advancement of Multimodal Large Language Models (MLLMs) has greatly accelerated the development of applications in understanding integrated texts and images. Recent works leverage image-caption datasets to train MLLMs, achieving state-of-the-art performance on image-to-text tasks. However, there are few studies exploring which layers of MLLMs devote the most effort to the global image information, which plays a vital role in multimodal comprehension and generation. In this study, we find that the intermediate layers of models can encode more global semantic information, whose representation vectors perform better on visual-language entailment tasks, rather than the topmost layers. We further probe models regarding local semantic representations through object recognition tasks. We find that the topmost layers may excessively focus on local information, leading to a diminished ability to encode global information. Our code and data are released via https://github.com/kobayashikanna01/probing_MLLM_rep.

Submitted 26 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted by LREC-COLING 2024 as a short paper (Camera Ready)

arXiv:2402.16313 [pdf, other]

Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering

Authors: Mingxu Tao, Dongyan Zhao, Yansong Feng

Abstract: Open-ended question answering requires models to find appropriate evidence to form well-reasoned, comprehensive and helpful answers. In practical applications, models also need to engage in extended discussions on potential scenarios closely relevant to the question. With the augmentation of a retrieval module, open-source Large Language Models (LLMs) can produce coherent answers often with different focuses, but are still sub-optimal in terms of reliable evidence selection and in-depth question analysis. In this paper, we propose a novel Chain-of-Discussion framework to leverage the synergy among multiple open-source LLMs aiming to provide more correct and more comprehensive answers for open-ended QA, although they are not strong enough individually. Our experiments show that discussions among multiple LLMs play a vital role in enhancing the quality of answers. We release our data and code at https://github.com/kobayashikanna01/Chain-of-Discussion.

Submitted 26 February, 2024; originally announced February 2024.

Comments: Under review

arXiv:2402.10062 [pdf, other]

Optimal Parameter and Neuron Pruning for Out-of-Distribution Detection

Authors: Chao Chen, Zhihang Fu, Kai Liu, Ze Chen, Mingyuan Tao, Jieping Ye

Abstract: For a machine learning model deployed in real world scenarios, the ability to detect out-of-distribution (OOD) samples is indispensable and challenging. Most existing OOD detection methods focused on exploring advanced training skills or training-free tricks to prevent the model from yielding overconfident scores for unknown samples. The training-based methods require expensive training cost and rely on OOD samples which are not always available, while most training-free methods cannot efficiently utilize the prior information from the training data. In this work, we propose an Optimal Parameter and Neuron Pruning (OPNP) approach, which aims to identify and remove those parameters and neurons that lead to over-fitting. The main method is divided into two steps. In the first step, we evaluate the sensitivity of the model parameters and neurons by averaging gradients over all training samples. In the second step, the parameters and neurons with exceptionally large or close to zero sensitivities are removed for prediction. Our proposal is training-free, compatible with other post-hoc methods, and exploits the information from all training data. Extensive experiments are performed on multiple OOD detection tasks and model architectures, showing that our proposed OPNP consistently outperforms the existing methods by a large margin.

Submitted 4 February, 2024; originally announced February 2024.

Comments: Accepted by NeurIPS 2023. 19 pages

Journal ref: NeurIPS 2023

arXiv:2402.03744 [pdf, other]

INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection

Authors: Chao Chen, Kai Liu, Ze Chen, Yi Gu, Yue Wu, Mingyuan Tao, Zhihang Fu, Jieping Ye

Abstract: Knowledge hallucinations have raised widespread concerns for the security and reliability of deployed LLMs. Previous efforts in detecting hallucinations have been employed at logit-level uncertainty estimation or language-level self-consistency evaluation, where the semantic information is inevitably lost during the token-decoding procedure. Thus, we propose to explore the dense semantic information retained within LLMs' INternal States for hallucInation DEtection (INSIDE). In particular, a simple yet effective EigenScore metric is proposed to better evaluate responses' self-consistency, which exploits the eigenvalues of responses' covariance matrix to measure the semantic consistency/diversity in the dense embedding space. Furthermore, from the perspective of self-consistent hallucination detection, a test time feature clipping approach is explored to truncate extreme activations in the internal states, which reduces overconfident generations and potentially benefits the detection of overconfident hallucinations. Extensive experiments and ablation studies are performed on several popular LLMs and question-answering (QA) benchmarks, showing the effectiveness of our proposal.

Submitted 6 February, 2024; originally announced February 2024.

Comments: Accepted by ICLR-2024

arXiv:2401.15614 [pdf, other]

Liouvillian skin effect in a one-dimensional open many-body quantum system with generalized boundary conditions

Authors: Liang Mao, Xuanpu Yang, Ming-Jie Tao, Haiping Hu, Lei Pan

Abstract: The non-Hermitian skin effect (NHSE), namely that eigenstates of non-Hermitian Hamiltonians are localized at one boundary in the open boundary condition, has attracted great interest recently. In this paper, we investigate the skin effect in one-dimensional dissipative quantum many-body systems, which we call the Liouvillian skin effect (LSE). We rigorously identify the existence of LSE for generalized boundary conditions by solving the Liouvillian superoperator of an exactly solvable model with the advantage of Bethe ansatz. The LSE is sensitive to boundary conditions where the signature is reflected in eigenfunctions of the system. We confirm that the LSE is fragile to a tiny co-flow boundary hopping with non-Hermitian current but can survive for a counter-flow boundary hopping in the thermodynamic limit. Our work provides a prototypical example of exactly solvable dissipative quantum many-body lattice systems exhibiting LSE for generalized boundary conditions. It can be further extended to other integrable open quantum many-body models.

Submitted 16 July, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

Comments: 15 pages, 5 figures. Comments are welcome

arXiv:2401.15405 [pdf, ps, other]

On Partly Smoothness, Activity Identification and Faster Algorithms of $L_1$ over $L_2$ Minimization

Authors: Min Tao, Xiao-Ping Zhang, Zi-Hao Xia

Abstract: The $L_1/L_2$ norm ratio arose as a sparseness measure and attracted a considerable amount of attention due to three merits: (i) sharper approximations of $L_0$ compared to the $L_1$; (ii) parameter-free and scale-invariant; (iii) more attractive than $L_1$ under highly-coherent matrices. In this paper, we first establish the partly smooth property of $L_1$ over $L_2$ minimization relative to an active manifold ${\cal M}$ and also demonstrate its prox-regularity property. Second, we reveal that ADMM$_p$ (or ADMM$^+_p$) can identify the active manifold within a finite number of iterations. This discovery contributes to a deeper understanding of the optimization landscape associated with $L_1$ over $L_2$ minimization. Third, we propose a novel heuristic algorithm framework that combines ADMM$_p$ (or ADMM$^+_p$) with a globalized semismooth Newton method tailored for the active manifold ${\cal M}$. This hybrid approach leverages the strengths of both methods to enhance convergence. Finally, through extensive numerical simulations, we showcase the superiority of our heuristic algorithm over existing state-of-the-art methods for sparse recovery.

Submitted 27 January, 2024; originally announced January 2024.

arXiv:2401.15344 [pdf, other]

IRS Aided Millimeter-Wave Sensing and Communication: Beam Scanning, Beam Splitting, and Performance Analysis

Authors: Renwang Li, Xiaodan Shao, Shu Sun, Meixia Tao, Rui Zhang

Abstract: Integrated sensing and communication (ISAC) has attracted growing interest for enabling future 6G wireless networks, due to its capability of sharing spectrum and hardware resources between communication and sensing systems. However, existing works on ISAC usually need to modify the communication protocol to cater for the new sensing performance requirement, which may be difficult to implement in practice. In this paper, we study a new intelligent reflecting surface (IRS) aided millimeter-wave (mmWave) ISAC system by exploiting the distinct beam scanning operation in mmWave communications to achieve efficient sensing at the same time. First, we propose a two-phase ISAC protocol aided by a semi-passive IRS, consisting of beam scanning and data transmission. Specifically, in the beam scanning phase, the IRS finds the optimal beam for reflecting signals from the base station to a communication user via its passive elements. Meanwhile, the IRS directly estimates the angle of a nearby target based on echo signals from the target using its equipped active sensing element. Then, in the data transmission phase, the sensing accuracy is further improved by leveraging the data signals via possible IRS beam splitting. Next, we derive the achievable rate of the communication user as well as the Cramér-Rao bound and the approximate mean square error of the target angle estimation. Finally, extensive simulation results are provided to verify our analysis as well as the effectiveness of the proposed scheme.

Submitted 27 January, 2024; originally announced January 2024.

Comments: submitted to IEEE TWC

arXiv:2401.09432 [pdf, other]

RoleCraft-GLM: Advancing Personalized Role-Playing in Large Language Models

Authors: Meiling Tao, Xuechen Liang, Tianyu Shi, Lei Yu, Yiting Xie

Abstract: This study presents RoleCraft-GLM, an innovative framework aimed at enhancing personalized role-playing with Large Language Models (LLMs). RoleCraft-GLM addresses the key issue of lacking personalized interactions in conversational AI, and offers a solution with detailed and emotionally nuanced character portrayals. We contribute a unique conversational dataset that shifts from conventional celebrity-centric characters to diverse, non-celebrity personas, thus enhancing the realism and complexity of language modeling interactions. Additionally, our approach includes meticulous character development, ensuring dialogues are both realistic and emotionally resonant. The effectiveness of RoleCraft-GLM is validated through various case studies, highlighting its versatility and skill in different scenarios. Our framework excels in generating dialogues that accurately reflect characters' personality traits and emotions, thereby boosting user engagement. In conclusion, RoleCraft-GLM marks a significant leap in personalized AI interactions, and paves the way for more authentic and immersive AI-assisted role-playing experiences by enabling more nuanced and emotionally rich dialogues.

Submitted 4 April, 2024; v1 submitted 17 December, 2023; originally announced January 2024.

arXiv:2401.06144 [pdf, other]

DFU: scale-robust diffusion model for zero-shot super-resolution image generation

Authors: Alex Havrilla, Kevin Rojas, Wenjing Liao, Molei Tao

Abstract: Diffusion generative models have achieved remarkable success in generating images with a fixed resolution. However, existing models have limited ability to generalize to different resolutions when training data at those resolutions are not available. Leveraging techniques from operator learning, we present a novel deep-learning architecture, Dual-FNO UNet (DFU), which approximates the score operator by combining both spatial and spectral information at multiple resolutions. Comparisons of DFU to baselines demonstrate its scalability: 1) simultaneously training on multiple resolutions improves FID over training at any single fixed resolution; 2) DFU generalizes beyond its training resolutions, allowing for coherent, high-fidelity generation at higher resolutions with the same model, i.e., zero-shot super-resolution image generation; 3) we propose a fine-tuning strategy to further enhance the zero-shot super-resolution image generation capability of our model, leading to a FID of 11.3 at 1.66 times the maximum training resolution on FFHQ, which no other method can come close to achieving.

Submitted 22 January, 2024; v1 submitted 30 November, 2023; originally announced January 2024.

arXiv:2401.04335 [pdf]

doi 10.1002/lpor.202301360

SiN-on-SOI Optical Phased Array LiDAR for Ultra-Wide Field of View and 4D Sensing

Authors: Baisong Chen, Yingzhi Li, Qijie Xie, Quanxin Na, Min Tao, Ziming Wang, Zihao Zhi, Heming Hu, Xuetong Li, Huan Qu, Yafang He, Xiaolong Hu, Guoqiang Lo, Junfeng Song

Abstract: Three-dimensional (3D) imaging techniques are helping autonomous vehicles build intelligent systems. Optical phased arrays (OPAs) featuring all-solid-state configurations are becoming a promising solution for 3D imaging. However, the majority of state-of-the-art OPAs suffer from severe power degradation at the edge of the field of view (FoV), resulting in a limited effective FoV and deteriorated 3D imaging quality. Here, we combine a chained grating antenna with the Vernier concept to design a novel OPA that realizes record-wide 160° FoV 3D imaging. By virtue of the chained antenna, the OPA exhibits less than 3-dB beam power variation within the 160° FoV. In addition, two OPAs with different pitches are integrated monolithically to form a quasi-coaxial Vernier OPA transceiver. With the aid of the flat beam power profile provided by the chained antennas, the OPA exhibits uniform beam quality at an arbitrary steering angle. The superior beam steering performance enables the OPA to accomplish 160° wide-FoV 3D imaging based on the frequency-modulated continuous-wave (FMCW) LiDAR scheme. The ranging accuracy is 5.5 mm. Moreover, the OPA is also applied to velocity measurement for 4D sensing. To the best of our knowledge, it is the first experimental implementation of a Vernier OPA LiDAR on 3D imaging to achieve a remarkable FoV.

Submitted 8 January, 2024; originally announced January 2024.

Comments: 18 pages with 13 figures

Journal ref: Laser Photonics Rev 2024, 2301360

arXiv:2401.03511 [pdf, other]

Automated construction of effective potential via algorithmic implicit bias

Authors: Xingjie Helen Li, Molei Tao

Abstract: We introduce a novel approach for decomposing and learning every scale of a given multiscale objective function in $\mathbb{R}^d$, where $d\ge 1$. This approach leverages a recently demonstrated implicit bias of the optimization method of gradient descent by Kong and Tao, which enables the automatic generation of data that nearly follow a Gibbs distribution with an effective potential at any desired scale. One application of this automated effective potential modeling is to construct reduced-order models. For instance, a deterministic surrogate Hamiltonian model can be developed to substantially soften the stiffness that bottlenecks the simulation, while maintaining the accuracy of phase portraits at the scale of interest. Similarly, a stochastic surrogate model can be constructed at a desired scale, such that both its equilibrium and out-of-equilibrium behaviors (characterized by auto-correlation function and mean path) align with those of a damped mechanical system with the original multiscale function being its potential. The robustness and efficiency of our proposed approach in multi-dimensional scenarios have been demonstrated through a series of numerical experiments. A by-product of our development is a method for anisotropic noise estimation and calibration. More precisely, the Langevin model of a stochastic mechanical system may not have isotropic noise in practice, and we provide a systematic algorithm to quantify its covariance matrix without directly measuring the noise. In this case, the system may not admit a closed-form expression of its invariant distribution either, but with this tool, we can design the friction matrix appropriately to calibrate the system so that its invariant distribution has a closed-form Gibbs expression.

Submitted 7 January, 2024; originally announced January 2024.

Comments: 10 Figures

arXiv:2401.01564 [pdf, other]

Deep Learning Based Superposition Coded Modulation for Hierarchical Semantic Communications over Broadcast Channels

Authors: Yufei Bo, Shuo Shao, Meixia Tao

Abstract: We consider multi-user semantic communications over broadcast channels. While most existing works consider that each receiver requires either the same or independent semantic information, this paper explores the scenario where the semantic information desired by different receivers is different but correlated. In particular, we investigate semantic communications over Gaussian broadcast channels where the transmitter has a common observable source but the receivers wish to recover hierarchical semantic information in adaptation to their channel conditions. Inspired by the capacity achieving property of superposition codes, we propose a deep learning based superposition coded modulation (DeepSCM) scheme. Specifically, the hierarchical semantic information is first extracted and encoded into basic and enhanced feature vectors. A linear minimum mean square error (LMMSE) decorrelator is then developed to obtain a refinement from the enhanced features that is uncorrelated with the basic features. Finally, the basic features and their refinement are superposed for broadcasting after probabilistic modulation. Experiments are conducted for two-receiver image semantic broadcasting with coarse and fine classification as hierarchical semantic tasks. DeepSCM outperforms the benchmarking coded-modulation scheme without a superposition structure, especially with large channel disparity and high order modulation. It also approaches the performance upper bound as if there were only one receiver.

Submitted 12 June, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

arXiv:2312.17428 [pdf, other]

ChangeNet: Multi-Temporal Asymmetric Change Detection Dataset

Authors: Deyi Ji, Siqi Gao, Mingyuan Tao, Hongtao Lu, Feng Zhao

Abstract: Change Detection (CD) has been attracting extensive interest with the availability of bi-temporal datasets. However, due to the huge cost of multi-temporal image acquisition and labeling, existing change detection datasets are small in quantity, short in temporal span, and low in practicability. Therefore, a large-scale practical-oriented dataset covering wide temporal phases is urgently needed to facilitate the community. To this end, the ChangeNet dataset is presented especially for multi-temporal change detection, along with the new task of "Asymmetric Change Detection". Specifically, ChangeNet consists of 31,000 multi-temporal image pairs, a wide range of complex scenes from 100 cities, and 6 pixel-level annotated categories, which is far superior to all the existing change detection datasets including LEVIR-CD, WHU Building CD, etc. In addition, ChangeNet contains many real-world perspective distortions in different temporal phases on the same areas, which is able to promote the practical application of change detection algorithms. The ChangeNet dataset is suitable for both binary change detection (BCD) and semantic change detection (SCD) tasks. Accordingly, we benchmark the ChangeNet dataset on six BCD methods and two SCD methods, and extensive experiments demonstrate its challenges and great significance. The dataset is available at https://github.com/jankyee/ChangeNet.

Submitted 11 April, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: Accepted to ICASSP 2024 Oral/Lecture

arXiv:2312.14341 [pdf, other]

A full splitting algorithm for fractional programs with structured numerators and denominators

Authors: Radu Ioan Boţ, Guoyin Li, Min Tao

Abstract: In this paper, we consider a class of nonconvex and nonsmooth fractional programming problems, which involve the sum of a convex, possibly nonsmooth function composed with a linear operator and a differentiable, possibly nonconvex function in the numerator and a convex, possibly nonsmooth function composed with a linear operator in the denominator. These problems have applications in various fields, including CT reconstruction and sparse signal recovery. We propose an adaptive full-splitting proximal subgradient algorithm with an extrapolated step that addresses the challenge of evaluating the composition in the numerator by decoupling the linear operator from the nonsmooth component. We specifically evaluate the nonsmooth function using its proximal operator, while the linear operator is assessed through forward evaluations. Furthermore, the smooth component in the numerator is evaluated through its gradient, the nonsmooth component in the denominator is managed using its subgradient, and the linear operator in the denominator is also assessed through forward evaluations. We demonstrate subsequential convergence toward an approximate lifted stationary point and ensure global convergence under the Kurdyka-Łojasiewicz property, all achieved without relying on any full-row rank assumptions regarding the linear operators. We further explain the reasoning behind aiming for an approximate lifted stationary point. This is exemplified by constructing a scenario illustrating that the algorithm could diverge when seeking exact solutions. Lastly, we present a practical iteration of the algorithm incorporating a nonmonotone line search, significantly enhancing its convergence performance. Our theoretical findings are validated through simulations involving limited-angle CT reconstruction and the robust Sharpe ratio minimization problem.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 27 pages, 4 figures

MSC Class: 90C26; 90C32; 49M27; 65K05

arXiv:2312.10641 [pdf, other]

Beamforming Design for Integrated Sensing and Communication with Extended Target

Authors: Yiqiu Wang, Meixia Tao, Shu Sun

Abstract: This paper studies transmit beamforming design in an integrated sensing and communication (ISAC) system, where a base station sends symbols to perform downlink multi-user communication and sense an extended target simultaneously. We first model the extended target contour with a truncated Fourier series. By considering echo signals as reflections from the valid elements on the target contour, a novel Cramér-Rao bound (CRB) on the direction estimation of the extended target is derived. We then formulate the transmit beamforming design as an optimization problem that minimizes the CRB of radar sensing while satisfying a minimum signal-to-interference-plus-noise ratio requirement for each communication user, as well as a 3-dB beam coverage requirement tailored for the extended sensing target, under a total transmit power constraint. In view of the non-convexity of the above problem, we employ the semidefinite relaxation (SDR) technique for convex relaxation, followed by a rank-one extraction scheme for non-tight relaxation circumstances. Numerical results show that the proposed SDR beamforming scheme outperforms benchmark beampattern design methods with lower CRBs for the circumstances considered.

Submitted 17 December, 2023; originally announced December 2023.

Comments: 8 pages, 3 figures, published to 8th Workshop on Integrated Sensing and Communications for Internet of Things in IEEE Global Communications Conference 2023

arXiv:2312.07817 [pdf, ps, other]

Appropriate State-Dependent Friction Coefficient Accelerates Kinetic Langevin Dynamics

Authors: Keunwoo Lim, Molei Tao

Abstract: We consider the convergence of kinetic Langevin dynamics to its ergodic invariant measure, which is the Gibbs distribution. Instead of the standard setup where the friction coefficient is a constant scalar, we investigate a position-dependent friction coefficient and the possible accelerated convergence it enables. We show that by choosing this coefficient matrix to be $2\sqrt{\text{Hess}V}$, convergence is accelerated in the sense that no constant scalar friction coefficient can lead to faster convergence for a large subset of (nonlinear) strongly-convex potential $V$'s. The speed of convergence is quantified in terms of chi-square divergence from the target distribution, and proved using a Lyapunov approach, based on viewing sampling as optimization in the infinite dimensional space of probability distributions.

Submitted 30 June, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.05786 [pdf, other]

Deep Learning for Joint Design of Pilot, Channel Feedback, and Hybrid Beamforming in FDD Massive MIMO-OFDM Systems

Authors: Junyi Yang, Weifeng Zhu, Shu Sun, Xiaofeng Li, Xingqin Lin, Meixia Tao

Abstract: This letter considers the transceiver design in frequency division duplex (FDD) massive multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems for high-quality data transmission. We propose a novel deep learning based framework where the procedures of pilot design, channel feedback, and hybrid beamforming are realized by carefully crafted deep neural networks. All the considered modules are jointly learned in an end-to-end manner, and a graph neural network is adopted to effectively capture interactions between beamformers based on the built graphical representation. Numerical results validate the effectiveness of our method.

Submitted 10 December, 2023; originally announced December 2023.

Comments: 5 pages, 4 figures, accepted by IEEE Communications Letters

arXiv:2311.08348 [pdf, other]

MC$^2$: Towards Transparent and Culturally-Aware NLP for Minority Languages in China

Authors: Chen Zhang, Mingxu Tao, Quzhe Huang, Jiuheng Lin, Zhibin Chen, Yansong Feng

Abstract: Current large language models demonstrate deficiencies in understanding low-resource languages, particularly the minority languages in China. This limitation stems from the scarcity of available pre-training data. To address this accessibility challenge, we present MC$^2$, a Multilingual Corpus of Minority Languages in China, which is the largest open-source corpus of its kind so far. MC$^2$ includes four underrepresented languages: Tibetan, Uyghur, Kazakh, and Mongolian. Notably, we focus on the less common writing systems of Kazakh and Mongolian, i.e., Kazakh Arabic script and traditional Mongolian script, respectively, which have been long neglected in previous corpus construction efforts. Recognizing the prevalence of language contamination within existing corpora, we adopt a quality-centric solution for collecting MC$^2$, prioritizing accuracy while enhancing diversity. Furthermore, we underscore the importance of attending to the multiplicity of writing systems, which is closely related to the cultural awareness of the resulting models. The MC$^2$ corpus and related models are made public to the community.

Submitted 13 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: ACL 2024 https://github.com/luciusssss/mc2_corpus

arXiv:2311.06500 [pdf, other]

Knowledge Distillation and Training Balance for Heterogeneous Decentralized Multi-Modal Learning over Wireless Networks

Authors: Benshun Yin, Zhiyong Chen, Meixia Tao

Abstract: Decentralized learning is widely employed for collaboratively training models using distributed data over wireless networks. Existing decentralized learning methods primarily focus on training single-modal networks. For decentralized multi-modal learning (DMML), the modality heterogeneity and the non-independent and non-identically distributed (non-IID) data across devices make it difficult for the training model to capture the correlated features across different modalities. Moreover, modality competition can result in training imbalance among different modalities, which can significantly impact the performance of DMML. To improve the training performance in the presence of non-IID data and modality heterogeneity, we propose a novel DMML with knowledge distillation (DMML-KD) framework, which decomposes the extracted feature into the modality-common and the modality-specific components. In the proposed DMML-KD, a generator is applied to learn the global conditional distribution of the modality-common features, thereby guiding the modality-common features of different devices towards the same distribution. Meanwhile, we propose to decrease the number of local iterations for the modalities with fast training speed in DMML-KD to address the imbalanced training. We design a balance metric based on the parameter variation to evaluate the training speed of different modalities in DMML-KD. Using this metric, we optimize the number of local iterations for different modalities on each device under the constraint of remaining energy on devices. Experimental results demonstrate that the proposed DMML-KD with training balance can effectively improve the training performance of DMML.

Submitted 11 November, 2023; originally announced November 2023.

Comments: submitted to IEEE Trans. on Mobile Computing

arXiv:2311.05103 [pdf, other]

PID-inspired Continuous-time Distributed Optimization

Authors: Meng Tao, Dongdong Yue, Jinde Cao

Abstract: This paper proposes two novel distributed continuous-time algorithms inspired by PID control to solve distributed optimization problems. The algorithms are referred to as first-order and second-order, respectively, depending on the intrinsic dynamics of the agents in the network. Sufficient conditions are derived so that both algorithms converge exponentially over undirected connected graphs. Finally, numerical simulations illustrate the effectiveness and efficiency of the proposed algorithms.

Submitted 8 November, 2023; originally announced November 2023.

Comments: 7 pages, 3 figures, The 49th Annual Conference of the IEEE Industrial Electronics Society

arXiv:2311.02958 [pdf, other]

Optimization of RIS Placement for Satellite-to-Ground Coverage Enhancement

Authors: Xingchen Liu, Liuxun Xue, Shu Sun, Meixia Tao

Abstract: In satellite-to-ground communication, ensuring reliable and efficient connectivity poses significant challenges. The reconfigurable intelligent surface (RIS) offers a promising solution due to its ability to manipulate wireless propagation environments and thus enhance communication performance. In this paper, we propose a method for optimizing the placement of RISs on building facets to improve satellite-to-ground communication coverage. We model satellite-to-ground communication with RIS assistance, considering the actual positions of buildings and ground users. The theoretical lower bound on the coverage enhancement in satellite-to-ground communication through large-scale RIS deployment is derived. Then a novel optimization framework for RIS placement is formulated, and a parallel genetic algorithm is employed to solve the problem. Simulation results demonstrate the superior performance of the proposed RIS deployment strategy in enhancing satellite communication coverage probability for non-line-of-sight users. The proposed framework can be applied to various architectural distributions, such as rural areas, towns, and cities, by adjusting parameter settings.

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2310.17087 [pdf, other]

Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult

Authors: Yuqing Wang, Zhenghao Xu, Tuo Zhao, Molei Tao

Abstract: Large learning rates, when applied to gradient descent for nonconvex optimization, yield various implicit biases including the edge of stability (Cohen et al., 2021), balancing (Wang et al., 2022), and catapult (Lewkowycz et al., 2020). These phenomena cannot be well explained by classical optimization theory. Though significant theoretical progress has been made in understanding these implicit biases, it remains unclear for which objective functions they would be more likely. This paper provides an initial step in answering this question and also shows that these implicit biases are in fact various tips of the same iceberg. To establish these results, we develop a global convergence theory under large learning rates, for a family of nonconvex functions without globally Lipschitz continuous gradient, which was typically assumed in existing convergence analysis. Specifically, these phenomena are more likely to occur when the optimization objective function has good regularity. This regularity, together with gradient descent using a large learning rate that favors flatter regions, results in these nontrivial dynamical behaviors. Another corollary is the first non-asymptotic convergence rate bound for large-learning-rate gradient descent optimization of nonconvex functions. Although our theory only applies to specific functions so far, the possibility of extrapolating it to neural networks is also experimentally validated, for which different choices of loss, activation functions, and other techniques such as batch normalization can all affect regularity significantly and lead to very different training dynamics.

Submitted 11 December, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.08233 [pdf, other]

The Impact of Time Step Frequency on the Realism of Robotic Manipulation Simulation for Objects of Different Scales

Authors: Minh Q. Ta, Holly Dinkel, Hameed Abdul-Rashid, Yangfei Dai, Jessica Myers, Tan Chen, Junyi Geng, Timothy Bretl

Abstract: This work evaluates the impact of time step frequency and component scale on robotic manipulation simulation accuracy. Increasing the time step frequency for small-scale objects is shown to improve simulation accuracy. This simulation, demonstrating pre-assembly part picking for two object geometries, serves as a starting point for discussing how to improve Sim2Real transfer in robotic assembly processes.

Submitted 12 October, 2023; originally announced October 2023.

Comments: 3 pages, 3 figures, Best Poster Finalist at the 2023 Robotics and AI in Future Factory Workshop at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Video presentation [https://www.youtube.com/watch?v=JOXrBpMmI0A]. Robotics and AI in Future Factory workshop [https://sites.google.com/view/robot-ai-future-factory/]

Showing 1–50 of 273 results for author: Tao, M