Search | arXiv e-print repository

Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags

Authors: Daiqing Qi, Handong Zhao, Zijun Wei, Sheng Li

Abstract: Despite recent advances in the general visual instruction-following ability of Multimodal Large Language Models (MLLMs), they still struggle with critical problems when required to provide a precise and detailed response to a visual instruction: (1) failure to identify novel objects or entities, (2) mention of non-existent objects, and (3) neglect of object's attributed details. Intuitive solutions include improving the size and quality of data or using larger foundation models. They show effectiveness in mitigating these issues, but at an expensive cost of collecting a vast amount of new data and introducing a significantly larger model. Standing at the intersection of these approaches, we examine the three object-oriented problems from the perspective of the image-to-text mapping process by the multimodal connector. In this paper, we first identify the limitations of multimodal connectors stemming from insufficient training data. Driven by this, we propose to enhance the mapping with retrieval-augmented tag tokens, which contain rich object-aware information such as object names and attributes. With our Tag-grounded visual instruction tuning with retrieval Augmentation (TUNA), we outperform baselines that share the same language model and training data on 12 benchmarks. Furthermore, we show the zero-shot capability of TUNA when provided with specific datastores.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 18 pages, 11 figures

arXiv:2406.05756 [pdf, other]

EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models

Authors: Mengfei Du, Binhao Wu, Zejun Li, Xuanjing Huang, Zhongyu Wei

Abstract: The recent rapid development of Large Vision-Language Models (LVLMs) has indicated their potential for embodied tasks. However, the critical skill of spatial understanding in embodied environments has not been thoroughly evaluated, leaving the gap between current LVLMs and qualified embodied intelligence unknown. Therefore, we construct EmbSpatial-Bench, a benchmark for evaluating embodied spatial understanding of LVLMs. The benchmark is automatically derived from embodied scenes and covers 6 spatial relationships from an egocentric perspective. Experiments expose the insufficient capacity of current LVLMs (even GPT-4V). We further present EmbSpatial-SFT, an instruction-tuning dataset designed to improve LVLMs' embodied spatial understanding.

Submitted 9 June, 2024; originally announced June 2024.

Comments: Accepted by ACL 2024 Main

arXiv:2406.05564 [pdf, other]

Automata Extraction from Transformers

Authors: Yihao Zhang, Zeming Wei, Meng Sun

Abstract: In modern machine learning (ML) systems, Transformer-based architectures have achieved milestone success across a broad spectrum of tasks, yet understanding their operational mechanisms remains an open problem. To improve the transparency of ML systems, automata extraction methods, which interpret stateful ML models as automata typically through formal languages, have proven effective for explaining the mechanism of recurrent neural networks (RNNs). However, few works have applied this paradigm to Transformer models. In particular, understanding their processing of formal languages and identifying their limitations in this area remains unexplored. In this paper, we propose an automata extraction algorithm specifically designed for Transformer models. Treating the Transformer model as a black-box system, we track the model through the transformation process of its internal latent representations during its operations, and then use classical pedagogical approaches like the L* algorithm to interpret them as deterministic finite-state automata (DFA). Overall, our study reveals how the Transformer model comprehends the structure of formal languages, which not only enhances the interpretability of Transformer-based ML systems but also marks a crucial step toward a deeper understanding of how ML systems process formal languages. Code and data are available at https://github.com/Zhang-Yihao/Transfomer2DFA.

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.03402 [pdf, other]

Mixed-Precision Over-The-Air Federated Learning via Approximated Computing

Authors: Jinsheng Yuan, Zhuangkun Wei, Weisi Guo

Abstract: Over-the-Air Federated Learning (OTA-FL) has been extensively investigated as a privacy-preserving distributed learning mechanism. Realistic systems will see FL clients with diverse size, weight, and power configurations. A critical research gap in existing OTA-FL research is the assumption of homogeneous client computational bit precision. Indeed, many clients may exploit approximate computing (AxC) where bit precisions are adjusted for energy and computational efficiency. The dynamic distribution of bit precision updates amongst FL clients poses an open challenge for OTA-FL, as it is incompatible in the wireless modulation superposition space. Here, we propose an AxC-based OTA-FL framework of clients with multiple precisions, demonstrating the following innovations: (i) optimize the quantization-performance trade-off for both server and clients within the constraints of varying edge computing capabilities and learning accuracy requirements, and (ii) develop heterogeneous gradient resolution OTA-FL modulation schemes to ensure compatibility with physical layer OTA aggregation. Our findings indicate that we can design modulation schemes that enable AxC-based OTA-FL, which can achieve 50% faster and smoother server convergence and a performance enhancement for the lowest precision clients compared to a homogeneous precision approach. This demonstrates the great potential of our AxC-based OTA-FL approach in heterogeneous edge computing environments.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.02430 [pdf, other]

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and subjective evaluations. With fine-tuning, we achieve even higher subjective scores across these metrics. Seed-TTS offers superior controllability over various speech attributes such as emotion and is capable of generating highly expressive and diverse speech for speakers in the wild. Furthermore, we propose a self-distillation method for speech factorization, as well as a reinforcement learning approach to enhance model robustness, speaker similarity, and controllability. We additionally present a non-autoregressive (NAR) variant of the Seed-TTS model, named $\text{Seed-TTS}_\text{DiT}$, which utilizes a fully diffusion-based architecture. Unlike previous NAR-based TTS systems, $\text{Seed-TTS}_\text{DiT}$ does not depend on pre-estimated phoneme durations and performs speech generation through end-to-end processing. We demonstrate that this variant achieves comparable performance to the language model-based variant and showcase its effectiveness in speech editing. We encourage readers to listen to demos at https://bytedancespeech.github.io/seedtts_tech_report.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.01195 [pdf, other]

C$^3$P-VoxelMap: Compact, Cumulative and Coalescible Probabilistic Voxel Mapping

Authors: Xu Yang, Wenhao Li, Qijie Ge, Lulu Suo, Weijie Tang, Zhengyu Wei, Longxiang Huang, Bo Wang

Abstract: This work presents a compact, cumulative and coalescible probabilistic voxel mapping method to enhance performance, accuracy and memory efficiency in LiDAR odometry. Probabilistic voxel mapping requires storing past point clouds and re-iterating on them to update the uncertainty every iteration, which consumes large memory space and CPU cycles. To solve this problem, we propose a two-fold strategy. First, we introduce a compact point-free representation for probabilistic voxels and derive a cumulative update of the planar uncertainty without caching original point clouds. Our voxel structure only keeps track of a predetermined set of statistics for points that lie inside it. This method reduces the runtime complexity from $O(MN)$ to $O(N)$ and the space complexity from $O(N)$ to $O(1)$ where $M$ is the number of iterations and $N$ is the number of points. Second, to further minimize memory usage and enhance mapping accuracy, we provide a strategy to dynamically merge voxels associated with the same physical planes by taking advantage of the geometric features in the real world. Rather than scanning for these coalescible voxels constantly at every iteration, our merging strategy accumulates voxels in a locality-sensitive hash and triggers merging lazily. On-demand merging not only reduces memory footprint with minimal computational overhead but also improves localization accuracy thanks to cross-voxel denoising. Experiments exhibit 20% higher accuracy, 20% faster performance and 70% lower memory consumption than the state-of-the-art.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.01027 [pdf, other]

PRICE: A Pretrained Model for Cross-Database Cardinality Estimation

Authors: Tianjing Zeng, Junwei Lan, Jiahong Ma, Wenqing Wei, Rong Zhu, Pengfei Li, Bolin Ding, Defu Lian, Zhewei Wei, Jingren Zhou

Abstract: Cardinality estimation (CardEst) is essential for optimizing query execution plans. Recent ML-based CardEst methods achieve high accuracy but face deployment challenges due to high preparation costs and lack of transferability across databases. In this paper, we propose PRICE, a PRetrained multI-table CardEst model, which addresses these limitations. PRICE takes low-level but transferable features w.r.t. data distributions and query information and elegantly applies self-attention models to learn meta-knowledge to compute cardinality in any database. It is generally applicable to any unseen new database to attain high estimation accuracy, while its preparation cost is as little as the basic one-dimensional histogram-based CardEst methods. Moreover, PRICE can be finetuned to further enhance its performance on any specific database. We pretrained PRICE using 30 diverse datasets, completing the process in about 5 hours with a resulting model size of only about 40MB. Evaluations show that PRICE consistently outperforms existing methods, achieving the highest estimation accuracy on several unseen databases and generating faster execution plans with lower overhead. After finetuning with a small volume of database-specific queries, PRICE could even find plans very close to the optimal ones. Meanwhile, PRICE is generally applicable to different settings such as data updates, data scaling, and query workload shifts. We have made all of our data and code publicly available at https://github.com/StCarmen/PRICE.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.20742 [pdf, ps, other]

Terahertz emission from mutually synchronized standalone Bi2Sr2CaCu2O8+x intrinsic-Josephson-junction stacks

Authors: Raphael Wieland, Olcay Kizilaslan, Nickolay Kinev, Eric Dorsch, Stefan Guénon, Ziyu Song, Zihan Wei, Huabing Wang, Peiheng Wu, Dieter Koelle, Valery P. Koshelets, Reinhold Kleiner

Abstract: Suitably patterned single crystals made of the cuprate superconductor Bi$_2$Sr$_2$CaCu$_2$O$_{8+x}$ (BSCCO), intrinsically forming a stack of Josephson junctions, can generate electromagnetic radiation in the lower terahertz regime. Due to Joule heating the emission power of single stacks seems to be limited to values below 100 $\mu$W. To increase the radiation power, mutually synchronized arrays situated on the same BSCCO base crystal have been studied. Mutual electromagnetic interactions via a connecting BSCCO base crystal have been considered essential for synchronization, but the approach still suffers from Joule heating, preventing the synchronization of more than three stacks. In the present paper we show, on the basis of two emitting stacks, that mutual synchronization can also be achieved by stand-alone stacks contacted by gold layers and sharing only a common gold layer. Compared to BSCCO base crystals, the gold layers have a much higher thermal conductivity and their patterning is not very problematic. We analyze our results in detail, showing that the two oscillators exhibit phase correlations over a range of $\pm$0.4 GHz relative to their center frequencies, which we mainly studied between 745 GHz and 765 GHz. However, we also find that strong phase gradients in the beams radiated from both the mutually locked stacks and the unlocked ones play an important role and, presumably, diminish the detected emission power due to destructive interference. We speculate that the effect arises from higher-order cavity modes which are excited in the individual stacks. Our main message is that the mutual interaction provided by a common gold layer may open new possibilities for relaxing the Joule-heating problem, allowing the synchronization of a higher number of stacks. Our findings may boost attempts to substantially increase the output power levels of the BSCCO terahertz oscillators.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.19788 [pdf]

Unidirectional charge orders induced by oxygen vacancies on SrTiO$_3$(001)

Authors: Cui Ding, Wenfeng Dong, Xiaotong Jiao, Zhiyu Zhang, Guanming Gong, Zhongxu Wei, Lili Wang, Jin-Feng Jia, Qi-Kun Xue

Abstract: The discovery of high-mobility two-dimensional electron gas and low carrier density superconductivity in multiple SrTiO$_3$-based heterostructures has stimulated intense interest in the surface properties of SrTiO$_3$. The recent discovery of high-T$_c$ superconductivity in monolayer FeSe/SrTiO$_3$ heightened this interest and underscored the need for atomic-precision probing of the surface structure. By performing atomically resolved cryogenic scanning tunneling microscopy/spectroscopy characterization on dual-TiO$_{2-\delta}$-terminated SrTiO$_3$(001) surfaces with ($\sqrt{13}$ $\times$ $\sqrt{13}$), c(4 $\times$ 2), mixed (2 $\times$ 1), and (2 $\times$ 2) reconstructions, we disclosed universally broken rotational symmetry and contrasting bias- and temperature-dependent electronic states for apical and equatorial oxygen sites. With the sequentially evolved surface reconstructions and simultaneously increasing equatorial oxygen vacancies, the surface anisotropy reduces, and the work function lowers. Intriguingly, unidirectional stripe orders appear on the c(4 $\times$ 2) surface, whereas local (4 $\times$ 4) order emerges and eventually forms long-range unidirectional c(4 $\times$ 4) charge order on the (2 $\times$ 2) surface. This work reveals robust unidirectional charge orders induced by oxygen vacancies due to strong and delicate electronic-lattice interaction under broken rotational symmetry, providing insights into understanding the complex behaviors in perovskite oxide-based heterostructures.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.18731 [pdf, other]

VBIM-Net: Variational Born Iterative Network for Inverse Scattering Problems

Authors: Ziqing Xing, Zhaoyang Zhang, Zirui Chen, Yusong Wang, Haoran Ma, Zhun Wei, Gang Bao

Abstract: Recently, studies have shown the potential of integrating field-type iterative methods with deep learning (DL) techniques in solving inverse scattering problems (ISPs). In this article, we propose a novel Variational Born Iterative Network, namely, VBIM-Net, to solve the full-wave ISPs with significantly improved flexibility and inversion quality. The proposed VBIM-Net emulates the alternating updates of the total electric field and the contrast in the variational Born iterative method (VBIM) by multiple layers of subnetworks. We embed the calculation of the contrast variation into each of the subnetworks, converting the scattered field residual into an approximate contrast variation and then enhancing it by a U-Net, thus avoiding the requirement of matched measurement dimension and grid resolution as in existing approaches. The total field and contrast of each layer's output are supervised in the loss function of VBIM-Net, which guarantees the physical interpretability of variables of the subnetworks. In addition, we design a training scheme with extra noise to enhance the model's stability. Extensive numerical results on synthetic and experimental data both verify the inversion quality, generalization ability, and robustness of the proposed VBIM-Net. This work may provide some new inspiration for the design of efficient field-type DL schemes.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 14 pages, 21 figures

arXiv:2405.18721 [pdf, other]

doi 10.1109/TPAMI.2024.3407759

Correctable Landmark Discovery via Large Models for Vision-Language Navigation

Authors: Bingqian Lin, Yunshuang Nie, Ziming Wei, Yi Zhu, Hang Xu, Shikui Ma, Jianzhuang Liu, Xiaodan Liang

Abstract: Vision-Language Navigation (VLN) requires the agent to follow language instructions to reach a target position. A key factor for successful navigation is to align the landmarks implied in the instruction with diverse visual observations. However, previous VLN agents fail to perform accurate modality alignment especially in unexplored scenes, since they learn from limited navigation data and lack sufficient open-world alignment knowledge. In this work, we propose a new VLN paradigm, called COrrectable LaNdmark DiScOvery via Large ModEls (CONSOLE). In CONSOLE, we cast VLN as an open-world sequential landmark discovery problem, by introducing a novel correctable landmark discovery scheme based on two large models, ChatGPT and CLIP. Specifically, we use ChatGPT to provide rich open-world landmark cooccurrence commonsense, and conduct CLIP-driven landmark discovery based on these commonsense priors. To mitigate the noise in the priors due to the lack of visual constraints, we introduce a learnable cooccurrence scoring module, which corrects the importance of each cooccurrence according to actual observations for accurate landmark discovery. We further design an observation enhancement strategy for an elegant combination of our framework with different VLN agents, where we utilize the corrected landmark features to obtain enhanced observation features for action decision. Extensive experimental results on multiple popular VLN benchmarks (R2R, REVERIE, R4R, RxR) show the significant superiority of CONSOLE over strong baselines. In particular, CONSOLE establishes new state-of-the-art results on R2R and R4R in unseen scenarios. Code is available at https://github.com/expectorlin/CONSOLE.

Submitted 5 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: Accepted by TPAMI 2024

arXiv:2405.18634 [pdf, other]

A Theoretical Understanding of Self-Correction through In-context Alignment

Authors: Yifei Wang, Yuyang Wu, Zeming Wei, Stefanie Jegelka, Yisen Wang

Abstract: Going beyond mimicking limited human experiences, recent studies show initial evidence that, like humans, large language models (LLMs) are capable of improving their abilities purely by self-correction, i.e., correcting previous responses through self-examination, in certain circumstances. Nevertheless, little is known about how such capabilities arise. In this work, based on a simplified setup akin to an alignment task, we theoretically analyze self-correction from an in-context learning perspective, showing that when LLMs give relatively accurate self-examinations as rewards, they are capable of refining responses in an in-context way. Notably, going beyond previous theories on over-simplified linear transformers, our theoretical construction underpins the roles of several key designs of realistic transformers for self-correction: softmax attention, multi-head attention, and the MLP block. We validate these findings extensively on synthetic datasets. Inspired by these findings, we also illustrate novel applications of self-correction, such as defending against LLM jailbreaks, where a simple self-correction step does make a large difference. We believe that these findings will inspire further research on understanding, exploiting, and enhancing self-correction for building better foundation models.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.16919 [pdf, other]

VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models

Authors: Zejun Li, Ruipu Luo, Jiwen Zhang, Minghui Qiu, Zhongyu Wei

Abstract: While large multi-modal models (LMMs) have exhibited impressive capabilities across diverse tasks, their effectiveness in handling complex tasks has been limited by the prevailing single-step reasoning paradigm. To this end, this paper proposes VoCoT, a multi-step Visually grounded object-centric Chain-of-Thought reasoning framework tailored for inference with LMMs. VoCoT is characterized by two key features: (1) object-centric reasoning paths that revolve around cross-modal shared object-level information, and (2) visually grounded representation of object concepts in a multi-modal interleaved and aligned manner, which effectively bridges the modality gap within LMMs during long-term generation. Additionally, we construct an instruction dataset to facilitate LMMs in adapting to reasoning with VoCoT. By introducing VoCoT into the prevalent open-source LMM architecture, we introduce VolCano. With only 7B parameters and limited input resolution, VolCano demonstrates excellent performance across various scenarios, surpassing SOTA models, including GPT-4V, in tasks requiring complex reasoning. Our code, data and model will be available at https://github.com/RupertLuo/VoCoT.

Submitted 28 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16405 [pdf, other]

Intruding with Words: Towards Understanding Graph Injection Attacks at the Text Level

Authors: Runlin Lei, Yuwei Hu, Yuchen Ren, Zhewei Wei

Abstract: Graph Neural Networks (GNNs) excel across various applications but remain vulnerable to adversarial attacks, particularly Graph Injection Attacks (GIAs), which inject malicious nodes into the original graph and pose realistic threats. Text-attributed graphs (TAGs), where nodes are associated with textual features, are crucial due to their prevalence in real-world applications and are commonly used to evaluate these vulnerabilities. However, existing research only focuses on embedding-level GIAs, which inject node embeddings rather than actual textual content, limiting their applicability and simplifying detection. In this paper, we pioneer the exploration of GIAs at the text level, presenting three novel attack designs that inject textual content into the graph. Through theoretical and empirical analysis, we demonstrate that text interpretability, a factor previously overlooked at the embedding level, plays a crucial role in attack strength. Among the designs we investigate, the Word-frequency-based Text-level GIA (WTGIA) is particularly notable for its balance between performance and interpretability. Despite the success of WTGIA, we discover that defenders can easily enhance their defenses with customized text embedding methods or large language model (LLM)-based predictors. These insights underscore the necessity for further research into the potential and practical significance of text-level GIAs.

Submitted 25 May, 2024; originally announced May 2024.

Comments: 29 pages

arXiv:2405.16398 [pdf, other]

Networked Integrated Sensing and Communications for 6G Wireless Systems

Authors: Jiapeng Li, Xiaodan Shao, Feng Chen, Shaohua Wan, Chang Liu, Zhiqiang Wei, Derrick Wing Kwan Ng

Abstract: Integrated sensing and communication (ISAC) is envisioned as a key pillar for enabling the upcoming sixth generation (6G) communication systems, requiring not only reliable communication functionalities but also highly accurate environmental sensing capabilities. In this paper, we design a novel networked ISAC framework to explore the collaboration among multiple users for environmental sensing. Specifically, multiple users can serve as powerful sensors, capturing backscattered signals from a target at various angles to facilitate reliable computational imaging. Centralized sensing approaches are extremely sensitive to the capability of the leader node because they require the leader node to process the signals sent by all the users. To this end, we propose a two-step distributed cooperative sensing algorithm that allows low-dimensional intermediate estimate exchange among neighboring users, thus eliminating the reliance on the centralized leader node and improving the robustness of sensing. This way, multiple users can cooperatively sense a target by exploiting the block-wise environment sparsity and the interference cancellation technique. Furthermore, we analyze the mean square error of the proposed distributed algorithm as a networked sensing performance metric and propose a beamforming design for the proposed networked ISAC scheme to maximize the networked sensing accuracy and communication performance subject to a transmit power constraint. Simulation results validate the effectiveness of the proposed algorithm compared with the state-of-the-art algorithms.

Submitted 25 May, 2024; originally announced May 2024.

Comments: Received by IEEE Internet of Things Journal

arXiv:2405.16357 [pdf, other]

Exploring the Enigma of Neural Dynamics Through A Scattering-Transform Mixer Landscape for Riemannian Manifold

Authors: Tingting Dan, Ziquan Wei, Won Hwa Kim, Guorong Wu

Abstract: The human brain is a complex inter-wired system that exhibits spontaneous functional fluctuations. In spite of tremendous success in the experimental neuroscience field, a system-level understanding of how brain anatomy supports various neural activities remains elusive. Capitalizing on the unprecedented amount of neuroimaging data, we present a physics-informed deep model to uncover the coupling mechanism between brain structure and function through the lens of data geometry that is rooted in the widespread wiring topology of connections between distant brain regions. Since deciphering the puzzle of self-organized patterns in functional fluctuations is the gateway to understanding the emergence of cognition and behavior, we devise a geometric deep model to uncover manifold mapping functions that characterize the intrinsic feature representations of evolving functional fluctuations on the Riemannian manifold. In lieu of learning unconstrained mapping functions, we introduce a set of graph-harmonic scattering transforms to impose the brain-wide geometry on top of manifold mapping functions, which allows us to cast the manifold-based deep learning into an architecture reminiscent of MLP-Mixer (in computer vision) for the Riemannian manifold. As a proof-of-concept approach, we explore a neural-manifold perspective to understand the relationship between (static) brain structure and (dynamic) function, challenging the prevailing notion in cognitive neuroscience by proposing that neural activities are essentially excited by brain-wide oscillation waves living on the geometry of human connectomes, instead of being confined to focal areas.

Submitted 25 May, 2024; originally announced May 2024.

Comments: 15 pages, 6 figures

MSC Class: 51H30 ACM Class: I.3.5

arXiv:2405.15349 [pdf, other]

UnKE: Unstructured Knowledge Editing in Large Language Models

Authors: Jingcheng Deng, Zihao Wei, Liang Pang, Hanxing Ding, Huawei Shen, Xueqi Cheng

Abstract: Recent knowledge editing methods have primarily focused on modifying structured knowledge in large language models, heavily relying on the assumption that structured knowledge is stored as key-value pairs locally in MLP layers or specific neurons. However, this task setting overlooks the fact that a significant portion of real-world knowledge is stored in an unstructured format, characterized by long-form content, noise, and a complex yet comprehensive nature. The "knowledge locating" and "term-driven optimization" techniques derived from the assumption used in previous methods (e.g., MEMIT) are ill-suited for unstructured knowledge. To address these challenges, we propose a novel unstructured knowledge editing method, namely UnKE, which extends previous assumptions in the layer dimension and token dimension. Firstly, in the layer dimension, we discard the "knowledge locating" step and treat the first few layers as the key, which expands knowledge storage through layers to break the "knowledge stored locally" assumption. Next, we replace "term-driven optimization" with "cause-driven optimization" across all inputted tokens in the token dimension, directly optimizing the last layer of the key generator to perform editing to generate the required key vectors. By utilizing key-value pairs at the layer level, UnKE effectively represents and edits complex and comprehensive unstructured knowledge, leveraging the potential of both the MLP and attention layers. Results on the newly proposed unstructured knowledge editing dataset (UnKEBench) and traditional structured datasets demonstrate that UnKE achieves remarkable performance, surpassing strong baselines.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.14195 [pdf, other]

Enhanced Object Tracking by Self-Supervised Auxiliary Depth Estimation Learning

Authors: Zhenyu Wei, Yujie He, Zhanchuan Cai

Abstract: RGB-D tracking significantly improves the accuracy of object tracking. However, its dependency on real depth inputs and the complexity involved in multi-modal fusion limit its applicability across various scenarios. The utilization of depth information in RGB-D tracking inspired us to propose a new method, named MDETrack, which trains a tracking network with an additional capability to understand the depth of scenes, through supervised or self-supervised auxiliary Monocular Depth Estimation learning. The outputs of MDETrack's unified feature extractor are fed to the side-by-side tracking head and auxiliary depth estimation head, respectively. The auxiliary module will be discarded in inference, thus keeping the same inference speed. We evaluated our models with various training strategies on multiple datasets, and the results show an improved tracking accuracy even without real depth. Through these findings we highlight the potential of depth estimation in enhancing object tracking performance.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14116 [pdf, other]

doi 10.1109/LRA.2024.3432352

Learning Multimodal Confidence for Intention Recognition in Human-Robot Interaction

Authors: Xiyuan Zhao, Huijun Li, Tianyuan Miao, Xianyi Zhu, Zhikai Wei, Aiguo Song

Abstract: The rapid development of collaborative robotics has provided a new possibility of helping elderly people who have difficulties in daily life, allowing robots to operate according to specific intentions. However, efficient human-robot cooperation requires natural, accurate and reliable intention recognition in shared environments. The current paramount challenge for this is reducing the uncertainty of the multimodal fused intention to be recognized and adaptively reasoning toward a more reliable result despite the current interactive condition. In this work we propose a novel learning-based multimodal fusion framework, Batch Multimodal Confidence Learning for Opinion Pool (BMCLOP). Our approach combines a Bayesian multimodal fusion method and a batch confidence learning algorithm to improve accuracy, uncertainty reduction and success rate given the interactive condition. In particular, the generic and practical multimodal intention recognition framework can be easily extended further. Our desired assistive scenarios consider three modalities: gestures, speech and gaze, all of which produce categorical distributions over all the finite intentions. The proposed method is validated with a six-DoF robot through extensive experiments and exhibits high performance compared to baselines.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.11936 [pdf, other]

UAV-VisLoc: A Large-scale Dataset for UAV Visual Localization

Authors: Wenjia Xu, Yaxuan Yao, Jiaqi Cao, Zhiwei Wei, Chunbo Liu, Jiuniu Wang, Mugen Peng

Abstract: The application of unmanned aerial vehicles (UAVs) has been widely extended recently. It is crucial to ensure accurate latitude and longitude coordinates for UAVs, especially when the global navigation satellite systems (GNSS) are disrupted and unreliable. Existing visual localization methods achieve autonomous visual localization without error accumulation by matching the ground-down view image of the UAV with the ortho satellite maps. However, collecting UAV ground-down view images across diverse locations is costly, leading to a scarcity of large-scale datasets for real-world scenarios. Existing datasets for UAV visual localization are often limited to small geographic areas or are focused only on urban regions with distinct textures. To address this, we define the UAV visual localization task by determining the UAV's real position coordinates on a large-scale satellite map based on the captured ground-down view. In this paper, we present a large-scale dataset, UAV-VisLoc, to facilitate the UAV visual localization task. This dataset comprises images from diverse drones across 11 locations in China, capturing a range of topographical features. The dataset features images from fixed-wing drones and multi-terrain drones, captured at different altitudes and orientations. Our dataset includes 6,742 drone images and 11 satellite maps, with metadata such as latitude, longitude, altitude, and capture date. Our dataset is tailored to support both the training and testing of models by providing diverse and extensive data.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.10606 [pdf, other]

Carrier Aggregation Enabled MIMO-OFDM Integrated Sensing and Communication

Authors: Haotian Liu, Zhiqing Wei, Jinghui Piao, Huici Wu, Xingwang Li, Zhiyong Feng

Abstract: In the evolution towards the forthcoming era of sixth-generation (6G) mobile communication systems characterized by ubiquitous intelligence, integrated sensing and communication (ISAC) is in a phase of burgeoning development. However, the capabilities of communication and sensing within a single frequency band fall short of meeting the escalating demands. To this end, this paper introduces a carrier aggregation (CA)-enabled multi-input multi-output orthogonal frequency division multiplexing (MIMO-OFDM) ISAC system fusing the sensing data on high and low-frequency bands by symbol-level fusion for ultimate communication experience and high-accuracy sensing. The challenges in sensing signal processing introduced by CA include the initial phase misalignment of the echo signals on high and low-frequency bands due to attenuation and radar cross section, and the fusion of the sensing data on high and low-frequency bands with different physical-layer parameters. To this end, the sensing signal processing is decomposed into two stages. In the first stage, the problem of initial phase misalignment of the echo signals on high and low-frequency bands is solved by the angle compensation, space-domain diversity and vector cross-correlation operations. In the second stage, this paper realizes symbol-level fusion of the sensing data on high and low-frequency bands through sensing vector rearrangement and cyclic prefix adjustment operations, thereby obtaining high-precision sensing performance. Then, the closed-form communication mutual information (MI) and sensing Cramer-Rao lower bound (CRLB) for the proposed ISAC system are derived to explore the theoretical performance bound with CA. Simulation results validate the feasibility and superiority of the proposed ISAC system.

Submitted 17 May, 2024; originally announced May 2024.

Comments: 13 pages, 9 figures, Submitted to IEEE Transactions on Wireless Communications

arXiv:2405.09179 [pdf, other]

Integrated Sensing and Communication Enabled Cooperative Passive Sensing Using Mobile Communication System

Authors: Zhiqing Wei, Haotian Liu, Hujun Li, Wangjun Jiang, Zhiyong Feng, Huici Wu, Ping Zhang

Abstract: Integrated sensing and communication (ISAC) is a potential technology of the sixth-generation (6G) mobile communication system, which equips the communication base station (BS) with sensing capability. However, the performance of single-BS sensing is limited, which can be overcome by multi-BS cooperative sensing. There are three types of multi-BS cooperative sensing, including cooperative active sensing, cooperative passive sensing, and cooperative active and passive sensing, where the multi-BS cooperative passive sensing has the advantages of low hardware modification cost and large sensing coverage. However, multi-BS cooperative passive sensing faces the challenges of synchronization offset mitigation and sensing information fusion. To address these challenges, a non-line of sight (NLoS) and line of sight (LoS) signal cross-correlation (NLCC) method is proposed to mitigate carrier frequency offset (CFO) and time offset (TO). Besides, a symbol-level multi-BS sensing information fusion method is proposed. The discrete samplings of echo signals from multiple BSs are matched independently and coherently accumulated to improve sensing accuracy. Moreover, a low-complexity joint angle-of-arrival (AoA) and angle-of-departure (AoD) estimation method is proposed to reduce the computational complexity. Simulation results show that the symbol-level multi-BS cooperative passive sensing scheme has an order of magnitude higher sensing accuracy than single-BS passive sensing. This work provides a reference for the research on multi-BS cooperative passive sensing.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 16 pages, 11 figures, Submitted to IEEE Transactions on Mobile Computing

arXiv:2405.09022 [pdf, other]

doi 10.1109/JIOT.2024.3413687

Multi-Objective Optimization-based Transmit Beamforming for Multi-Target and Multi-User MIMO-ISAC Systems

Authors: Chunwei Meng, Zhiqing Wei, Dingyou Ma, Wanli Ni, Liyan Su, Zhiyong Feng

Abstract: Integrated sensing and communication (ISAC) is an enabling technology for the sixth-generation mobile communications, which equips the wireless communication networks with sensing capabilities. In this paper, we investigate transmit beamforming design for multiple-input and multiple-output (MIMO)-ISAC systems in scenarios with multiple radar targets and communication users. A general form of multi-target sensing mutual information (MI) is derived, along with its upper bound, which can be interpreted as the sum of individual single-target sensing MI. Additionally, this upper bound can be achieved by suppressing the cross-correlation among reflected signals from different targets, which aligns with the principles of adaptive MIMO radar. Then, we propose a multi-objective optimization framework based on the signal-to-interference-plus-noise ratio of each user and the tight upper bound of sensing MI, introducing the Pareto boundary to characterize the achievable communication-sensing performance boundary of the proposed ISAC system. To achieve the Pareto boundary, the max-min system utility function method is employed, while considering the fairness between communication users and radar targets. Subsequently, the bisection search method is employed to find a specific Pareto optimal solution by solving a series of convex feasible problems. Finally, simulation results validate that the proposed method achieves a better tradeoff between multi-user communication and multi-target sensing performance. Additionally, utilizing the tight upper bound of sensing MI as a performance metric can enhance the multi-target resolution capability and angle estimation accuracy.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.08815 [pdf, other]

Efficient Vision-Language Pre-training by Cluster Masking

Authors: Zihao Wei, Zixuan Pan, Andrew Owens

Abstract: We propose a simple strategy for masking image patches during visual-language contrastive learning that improves the quality of the learned representations and the training speed. During each iteration of training, we randomly mask clusters of visually similar image patches, as measured by their raw pixel intensities. This provides an extra learning signal, beyond the contrastive training itself, since it forces a model to predict words for masked visual structures solely from context. It also speeds up training by reducing the amount of data used in each image. We evaluate the effectiveness of our model by pre-training on a number of benchmarks, finding that it outperforms other masking strategies, such as FLIP, on the quality of the learned representation.

Submitted 14 May, 2024; originally announced May 2024.

Comments: CVPR 2024, Project page: https://zxp46.github.io/cluster-masking/ , Code: https://github.com/Zi-hao-Wei/Efficient-Vision-Language-Pre-training-by-Cluster-Masking

arXiv:2405.07792 [pdf, other]

Optimal Matrix Sketching over Sliding Windows

Authors: Hanyan Yin, Dongxie Wen, Jiajun Li, Zhewei Wei, Xiao Zhang, Zengfeng Huang, Feifei Li

Abstract: Matrix sketching, aimed at approximating a matrix $\boldsymbol{A} \in \mathbb{R}^{N\times d}$ consisting of vector streams of length $N$ with a smaller sketching matrix $\boldsymbol{B} \in \mathbb{R}^{\ell\times d}, \ell \ll N$, has garnered increasing attention in fields such as large-scale data analytics and machine learning. A well-known deterministic matrix sketching method is the Frequent Directions algorithm, which achieves the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound and provides a covariance error guarantee of $\varepsilon = \lVert \boldsymbol{A}^\top \boldsymbol{A} - \boldsymbol{B}^\top \boldsymbol{B} \rVert_2/\lVert \boldsymbol{A} \rVert_F^2$. The matrix sketching problem becomes particularly interesting in the context of sliding windows, where the goal is to approximate the matrix $\boldsymbol{A}_W$, formed by input vectors over the most recent $N$ time units. However, despite recent efforts, whether achieving the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound on sliding windows is possible has remained an open question. In this paper, we introduce the DS-FD algorithm, which achieves the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound for matrix sketching over row-normalized, sequence-based sliding windows. We also present matching upper and lower space bounds for time-based and unnormalized sliding windows, demonstrating the generality and optimality of DS-FD across various sliding window models. This conclusively answers the open question regarding the optimal space bound for matrix sketching over sliding windows. Furthermore, we conduct extensive experiments with both synthetic and real-world datasets, validating our theoretical claims and thus confirming the correctness and effectiveness of our algorithm, both theoretically and empirically.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07668 [pdf, other]

CrossCert: A Cross-Checking Detection Approach to Patch Robustness Certification for Deep Learning Models

Authors: Qilin Zhou, Zhengyuan Wei, Haipeng Wang, Bo Jiang, W. K. Chan

Abstract: Patch robustness certification is an emerging kind of defense technique against adversarial patch attacks with provable guarantees. There are two research lines: certified recovery and certified detection. They aim to label malicious samples with provable guarantees correctly and issue warnings for malicious samples predicted to non-benign labels with provable guarantees, respectively. However, existing certified detection defenders suffer from protecting labels subject to manipulation, and existing certified recovery defenders cannot systematically warn samples about their labels. A certified defense that simultaneously offers robust labels and systematic warning protection against patch attacks is desirable. This paper proposes a novel certified defense technique called CrossCert. CrossCert formulates a novel approach by cross-checking two certified recovery defenders to provide unwavering certification and detection certification. Unwavering certification ensures that a certified sample, when subjected to a patched perturbation, will always be returned with a benign label without triggering any warnings with a provable guarantee. To our knowledge, CrossCert is the first certified detection technique to offer this guarantee. Our experiments show that, with a slightly lower performance than ViP and comparable performance with PatchCensor in terms of detection certification, CrossCert certifies a significant proportion of samples with the guarantee of unwavering certification.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 23 pages, 2 figures, accepted by FSE 2024 (The ACM International Conference on the Foundations of Software Engineering)

arXiv:2405.07469 [pdf, other]

Phase coding semi-quantum key distribution system based on the Single-state protocol

Authors: Qincheng Hou, Siying Huang, Naida Mo, Jindong Wang, Zhengjun Wei, Yafei Yu, Tianming Zhao, Zhiming Zhang

Abstract: Semi-quantum key distribution (SQKD) allows sharing random keys between a quantum user and a classical user. However, implementing classical user operations is challenging, posing a hurdle to achieving the Single-state protocol. By using the "selective modulation" method, the feasibility of SQKD is verified in principle. The proposal of the selective modulation method enables the realization of other protocols for SQKD. To advance experimental progress in SQKD, we propose and implement a phase-encoded semi-quantum key distribution system based on the Single-state protocol and the "selective modulation" method. The system operates at a frequency of 100 MHz and an average photon number of 0.1. The interference contrast reached 96.52%, the average quantum bit error rate was 1.19%, and the raw key rate reached 88 kbps. Our experimental results demonstrate the feasibility and stability of the proposed phase-encoded semi-quantum key distribution system. Furthermore, by leveraging the "selective modulation" scheme proposed in this paper, we develop a comprehensive theoretical description of selective modulation. Through an analysis of quantum state evolution, we assess the security of our system, ultimately demonstrating its resilience against attacks targeting quantum states. The classical user of our system requires only two optical devices, significantly reducing the equipment requirements and enhancing its application potential. This work validates the feasibility of semi-quantum key distribution experiments and provides ideas for future research on semi-quantum key distribution experiments and security studies.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.05663 [pdf, other]

RPBG: Towards Robust Neural Point-based Graphics in the Wild

Authors: Qingtian Zhu, Zizhuang Wei, Zhongtian Zheng, Yifan Zhan, Zhuyu Yao, Jiawang Zhang, Kejian Wu, Yinqiang Zheng

Abstract: Point-based representations have recently gained popularity in novel view synthesis, for their unique advantages, e.g., intuitive geometric representation, simple manipulation, and faster convergence. However, based on our observation, these point-based neural re-rendering methods are only expected to perform well under ideal conditions and suffer from noisy, patchy points and unbounded scenes, which are challenging to handle but de facto common in real applications. To this end, we revisit one such influential method, known as Neural Point-based Graphics (NPBG), as our baseline, and propose Robust Point-based Graphics (RPBG). We analyze in depth the factors that prevent NPBG from achieving satisfactory renderings on generic datasets, and accordingly reform the pipeline to make it more robust to varying datasets in-the-wild. Inspired by the practices in image restoration, we greatly enhance the neural renderer to enable the attention-based correction of point visibility and the inpainting of incomplete rasterization, with only acceptable overheads. We also seek a simple and lightweight alternative for environment modeling and an iterative method to alleviate the problem of poor geometry. By thorough evaluation on a wide range of datasets with different shooting conditions and camera trajectories, RPBG stably outperforms the baseline by a large margin, and exhibits its great robustness over state-of-the-art NeRF-based variants. Code available at https://github.com/QT-Zhu/RPBG.

Submitted 10 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: ECCV 2024

arXiv:2405.05486 [pdf, other]

doi 10.1103/PhysRevB.110.075306

Quantum Hall interferometry at finite bias with multiple edge channels

Authors: Zezhu Wei, D. E. Feldman, Bertrand I. Halperin

Abstract: In a quantum Hall interferometer, the dependence of the signal on source-drain voltage is controlled by details of the edge physics, such as the velocities of edge modes and the interaction between them and with screening layers. Such dependence of the signal has been seen in recent experiments at various integer and fractional filling factors, including $\nu=2$ and $\nu=2/5$, where two edge modes are present. Here we study theoretically the current-voltage curves for various values of the relative edge velocities, interaction strength, and the temperature, in a model containing two edge modes. We consider separate cases in which the inner mode or the outer mode is weakly backscattered at the tunneling contacts. When the inner mode is completely reflected and the outer mode is partially transmitted, we find striking features at very low temperatures related to resonance of excitations of the closed inner channel. Fluctuations in the charge of the closed inner mode, caused by sparse tunneling events, lead to an exponential suppression of the interference visibility at high voltages, in agreement with experiments.

Submitted 28 August, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

Comments: 26 pages, 13 figures, typos fixed and reference updated, published in PRB as an Editors' Suggestion

Journal ref: Phys. Rev. B 110, 075306 (2024)

arXiv:2405.03976 [pdf]

Anomalous Gate-tunable Capacitance in Graphene Moiré Heterostructures

Authors: Linshang Chen, Haoran Long, Heng Wu, Rui Mei, Zhengyu Su, Mengjie Feng, Jiang-Bin Wu, Kenji Watanabe, Takashi Taniguchi, Xuewei Cao, Zhongming Wei, Ping-Heng Tan, Yanmeng Shi

Abstract: Interface engineered ferroelectricity in van der Waals heterostructures is of broad interest both fundamentally and technologically for applications such as neuromorphic computing. In particular, the moiré ferroelectricity in graphene/hexagonal boron nitride (hBN) heterostructures driven by charge ordering instead of traditional lattice displacement has drawn considerable attention because of its fascinating properties and promising high-frequency programmable electrical polarization switching. Yet, the underlying mechanism of the electronic ferroelectricity is still under debate. On the other hand, combining the interface engineered ferroelectricity and strong correlations in moiré heterostructures could enable the realization of novel quantum states such as ferroelectric superconductivity and multiferroicity. Here we study the electronic transport properties of twisted double bilayer graphene (TDBLG), aligned with one of the neighbouring hBN. We observe a strong gating hysteresis and ferroelectric-like behaviour, as well as the electronic ratchet effect. We find that the top gate is anomalously screened. On the contrary, the back gate is anomalously doubly efficient in injecting charges into graphene, that is, the effective back gate capacitance is two times larger than its geometric capacitance. This unexpected gate-tunable capacitance causes a dramatic change of electric fields between forward and backward scans. The asymmetric gating behaviours and anomalous change in capacitance could be explained with a simple model involving a spontaneous electric polarization between top hBN and graphene. Our work provides more insights into the mysterious ferroelectricity in graphene/hBN moiré heterostructures and paves the way to the understanding of the underlying mechanism.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 20 pages, 13 figures

arXiv:2405.03755 [pdf, other]

Holographic Dual of Crosscap Conformal Field Theory

Authors: Zixia Wei

Abstract: We propose a holographic dual for 2D CFT defined on closed non-orientable manifolds, such as the real projective plane $\mathbb{RP}^2$ and the Klein bottle $\mathbb{K}^2$. Such CFT can be constructed by introducing antipodally identified cuttings, i.e. crosscaps, to a sphere and hence called crosscap CFT (XCFT). The gravity dual is AdS$_3$ spacetime with dS$_2$ end-of-the-world branes. In particular, the Lorentzian spacetime with a global dS$_2$ brane is dual to the unitary time evolution of a crosscap state in CFT, post-selected on the CFT ground state. We compute the holographic $\mathbb{RP}^2$ partition function (or the $p$-function), one-point function, and $\mathbb{K}^2$ partition function, and see that they successfully reproduce the XCFT results. We also show a holographic $p$-theorem as an application.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 15 pages + references, 4 figures

arXiv:2405.02873 [pdf, other]

Target Localization with Macro and Micro Base Stations Cooperative Sensing

Authors: Haotian Liu, Zhiqing Wei, Furong Yang, Huici Wu, Kaifeng Han, Zhiyong Feng

Abstract: Addressing the communication and sensing demands of sixth-generation (6G) mobile communication system, integrated sensing and communication (ISAC) has garnered traction in academia and industry. With the sensing limitation of single base station (BS), multi-BS cooperative sensing is regarded as a promising solution. The coexistence and overlapped coverage of macro BS (MBS) and micro BS (MiBS) are common in the development of 6G, making the cooperative sensing between MBS and MiBS feasible. Since MBS and MiBS work in low and high frequency bands, respectively, the challenges of MBS and MiBS cooperative sensing lie in the fusion method of the sensing information in high and low-frequency bands. To this end, this paper introduces a symbol-level fusion method and a grid-based three-dimensional discrete Fourier transform (3D-GDFT) algorithm to achieve precise localization of multiple targets with limited resources. Simulation results demonstrate that the proposed MBS and MiBS cooperative sensing scheme outperforms traditional single BS (MBS/MiBS) sensing scheme, showcasing superior sensing performance.

Submitted 5 May, 2024; originally announced May 2024.

Comments: 7 pages, 6 figures, submitted to 2024 IEEE GLOBECOM

arXiv:2405.01229 [pdf, ps, other]

Boosting Jailbreak Attack with Momentum

Authors: Yihao Zhang, Zeming Wei

Abstract: Large Language Models (LLMs) have achieved remarkable success across diverse tasks, yet they remain vulnerable to adversarial attacks, notably the well-documented \textit{jailbreak} attack. Recently, the Greedy Coordinate Gradient (GCG) attack has demonstrated efficacy in exploiting this vulnerability by optimizing adversarial prompts through a combination of gradient heuristics and greedy search. However, the efficiency of this attack has become a bottleneck in the attacking process. To mitigate this limitation, in this paper we rethink the generation of adversarial prompts through an optimization lens, aiming to stabilize the optimization process and harness more heuristic insights from previous iterations. Specifically, we introduce the \textbf{M}omentum \textbf{A}ccelerated G\textbf{C}G (\textbf{MAC}) attack, which incorporates a momentum term into the gradient heuristic. Experimental results showcase the notable enhancement achieved by MAC in gradient-based attacks on aligned language models. Our code is available at https://github.com/weizeming/momentum-attack-llm.

Submitted 2 May, 2024; originally announced May 2024.

Comments: ICLR 2024 Workshop on Reliable and Responsible Foundation Models

arXiv:2405.00820 [pdf, other]

HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond

Authors: Stefan Abi-Karam, Rishov Sarkar, Allison Seigler, Sean Lowe, Zhigang Wei, Hanqiu Chen, Nanditha Rao, Lizy John, Aman Arora, Cong Hao

Abstract: Machine learning (ML) techniques have been applied to high-level synthesis (HLS) flows for quality-of-result (QoR) prediction and design space exploration (DSE). Nevertheless, the scarcity of accessible high-quality HLS datasets and the complexity of building such datasets present challenges. Existing datasets have limitations in terms of benchmark coverage, design space enumeration, vendor extensibility, or lack of reproducible and extensible software for dataset construction. Many works also lack user-friendly ways to add more designs, limiting wider adoption of such datasets. In response to these challenges, we introduce HLSFactory, a comprehensive framework designed to facilitate the curation and generation of high-quality HLS design datasets. HLSFactory has three main stages: 1) a design space expansion stage to elaborate single HLS designs into large design spaces using various optimization directives across multiple vendor tools, 2) a design synthesis stage to execute HLS and FPGA tool flows concurrently across designs, and 3) a data aggregation stage for extracting standardized data into packaged datasets for ML usage. This tripartite architecture ensures broad design space coverage via design space expansion and supports multiple vendor tools. Users can contribute to each stage with their own HLS designs and synthesis results and extend the framework itself with custom frontends and tool flows. We also include an initial set of built-in designs from common HLS benchmarks curated open-source HLS designs. 
We showcase the versatility and multi-functionality of our framework through six case studies: I) Design space sampling; II) Fine-grained parallelism backend speedup; III) Targeting Intel's HLS flow; IV) Adding new auxiliary designs; V) Integrating published HLS data; VI) HLS tool version regression benchmarking. Code at https://github.com/sharc-lab/HLSFactory.

Submitted 17 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: Edit to "Section V.E" for proper attribution of open-source HLSyn, AutoDSE, and the Merlin compiler

arXiv:2405.00636 [pdf, other]

Robustness of graph embedding methods for community detection

Authors: Zhi-Feng Wei, Pablo Moriano, Ramakrishnan Kannan

Abstract: This study investigates the robustness of graph embedding methods for community detection in the face of network perturbations, specifically edge deletions. Graph embedding techniques, which represent nodes as low-dimensional vectors, are widely used for various graph machine learning tasks due to their ability to capture structural properties of networks effectively. However, the impact of perturbations on the performance of these methods remains relatively understudied. The research considers state-of-the-art graph embedding methods from two families: matrix factorization (e.g., LE, LLE, HOPE, M-NMF) and random walk-based (e.g., DeepWalk, LINE, node2vec). Through experiments conducted on both synthetic and real-world networks, the study reveals varying degrees of robustness within each family of graph embedding methods. The robustness is found to be influenced by factors such as network size, initial community partition strength, and the type of perturbation. Notably, node2vec and LLE consistently demonstrate higher robustness for community detection across different scenarios, including networks with degree and community size heterogeneity. These findings highlight the importance of selecting an appropriate graph embedding method based on the specific characteristics of the network and the task at hand, particularly in scenarios where robustness to perturbations is crucial.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 17 pages, 26 figures, 3 tables. Comments are welcome

arXiv:2405.00527 [pdf, other]

ChatBI: Towards Natural Language to Complex Business Intelligence SQL

Authors: Jinqing Lian, Xinyi Liu, Yingxia Shao, Yang Dong, Ming Wang, Zhang Wei, Tianqi Wan, Ming Dong, Hailin Yan

Abstract: The Natural Language to SQL (NL2SQL) technology provides non-expert users who are unfamiliar with databases the opportunity to use SQL for data analysis. Converting Natural Language to Business Intelligence (NL2BI) is a popular practical scenario for NL2SQL in actual production systems. Compared to NL2SQL, NL2BI introduces more challenges. In this paper, we propose ChatBI, a comprehensive and efficient technology for solving the NL2BI task. First, we analyze the interaction mode, an important module where NL2SQL and NL2BI differ in use, and design a smaller and cheaper model to match this interaction mode. In BI scenarios, tables contain a huge number of columns, making it impossible for existing NL2SQL methods that rely on Large Language Models (LLMs) for schema linking to proceed due to token limitations. The higher proportion of ambiguous columns in BI scenarios also makes schema linking difficult. ChatBI combines existing view technology in the database community to first decompose the schema linking problem into a Single View Selection problem and then uses a smaller and cheaper machine learning model to select the single view with a significantly reduced number of columns. The columns of this single view are then passed as the required columns for schema linking into the LLM. Finally, ChatBI proposes a phased process flow different from existing process flows, which allows ChatBI to generate SQL containing complex semantics and comparison relations more accurately. 
We have deployed ChatBI on Baidu's data platform and integrated it into multiple product lines for large-scale production task evaluation. The obtained results highlight its superiority in practicality, versatility, and efficiency. At the same time, compared with the current mainstream NL2SQL technology under our real BI scenario data tables and queries, it also achieved the best results.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2405.00417 [pdf, other]

Conformal Risk Control for Ordinal Classification

Authors: Yunpeng Xu, Wenge Guo, Zhi Wei

Abstract: As a natural extension to the standard conformal prediction method, several conformal risk control methods have been recently developed and applied to various learning problems. In this work, we seek to control the conformal risk in expectation for ordinal classification tasks, which have broad applications to many real problems. For this purpose, we firstly formulated the ordinal classification task in the conformal risk control framework, and provided theoretic risk bounds of the risk control method. Then we proposed two types of loss functions specially designed for ordinal classification tasks, and developed corresponding algorithms to determine the prediction set for each case to control their risks at a desired level. We demonstrated the effectiveness of our proposed methods, and analyzed the difference between the two types of risks on three different datasets, including a simulated dataset, the UTKFace dataset and the diabetic retinopathy detection dataset.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 17 pages, 8 figures, 2 tables; 1 supplementary page

Journal ref: In UAI 2023: The 39th Conference on Uncertainty in Artificial Intelligence

arXiv:2404.18644 [pdf, other]

Low-Overhead Defect-Adaptive Surface Code with Bandage-Like Super-Stabilizers

Authors: Zuolin Wei, Tan He, Yangsen Ye, Dachao Wu, Yiming Zhang, Youwei Zhao, Weiping Lin, He-Liang Huang, Xiaobo Zhu, Jian-Wei Pan

Abstract: To make practical quantum algorithms work, large-scale quantum processors protected by error-correcting codes are required to resist noise and ensure reliable computational outcomes. However, a major challenge arises from defects in processor fabrication, as well as occasional losses or cosmic rays during the computing process, all of which can lead to qubit malfunctions and disrupt error-correcting codes' normal operations. In this context, we introduce an automatic adapter to implement the surface code on defective lattices. Unlike previous approaches, this adapter leverages newly proposed bandage-like super-stabilizers to save more qubits when defects are clustered, thus enhancing the code distance and reducing super-stabilizer weight. For instance, in comparison with earlier methods, with a code size of 27 and a random defect rate of 2\%, the disabled qubits decrease by $1/3$, and the average preserved code distance increases by 63\%. This demonstrates a significant reduction in overhead when handling defects using our approach, and this advantage amplifies with increasing processor size and defect rates. Our work presents a low-overhead, automated solution to the challenge of adapting the surface code to defects, an essential step towards scaling up the construction of large-scale quantum computers for practical applications.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18211 [pdf, other]

A survey of dynamic graph neural networks

Authors: Yanping Zheng, Lu Yi, Zhewei Wei

Abstract: Graph neural networks (GNNs) have emerged as a powerful tool for effectively mining and learning from graph-structured data, with applications spanning numerous domains. However, most research focuses on static graphs, neglecting the dynamic nature of real-world networks where topologies and attributes evolve over time. By integrating sequence modeling modules into traditional GNN architectures, dynamic GNNs aim to bridge this gap, capturing the inherent temporal dependencies of dynamic graphs for a more authentic depiction of complex networks. This paper provides a comprehensive review of the fundamental concepts, key techniques, and state-of-the-art dynamic GNN models. We present the mainstream dynamic GNN models in detail and categorize models based on how temporal information is incorporated. We also discuss large-scale dynamic GNNs and pre-training techniques. Although dynamic GNNs have shown superior performance, challenges remain in scalability, handling heterogeneous information, and lack of diverse graph datasets. The paper also discusses possible future directions, such as adaptive and memory-enhanced models, inductive learning, and theoretical analysis.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.18191 [pdf, other]

Exploring the Robustness of In-Context Learning with Noisy Labels

Authors: Chen Cheng, Xinzhi Yu, Haodong Wen, Jingsong Sun, Guanzhang Yue, Yihao Zhang, Zeming Wei

Abstract: Recently, the mysterious In-Context Learning (ICL) ability exhibited by Transformer architectures, especially in large language models (LLMs), has sparked significant research interest. However, the resilience of Transformers' in-context learning capabilities in the presence of noisy samples, prevalent in both training corpora and prompt demonstrations, remains underexplored. In this paper, inspired by prior research that studies ICL ability using simple function classes, we take a closer look at this problem by investigating the robustness of Transformers against noisy labels. Specifically, we first conduct a thorough evaluation and analysis of the robustness of Transformers against noisy labels during in-context learning and show that they exhibit notable resilience against diverse types of noise in demonstration labels. Furthermore, we delve deeper into this problem by exploring whether introducing noise into the training set, akin to a form of data augmentation, enhances such robustness during inference, and find that such noise can indeed improve the robustness of ICL. Overall, our fruitful analysis and findings provide a comprehensive understanding of the resilience of Transformer models against label noises during ICL and provide valuable insights into the research on Transformers in natural language processing. Our code is available at https://github.com/InezYu0928/in-context-learning.

Submitted 1 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

Comments: ICLR 2024 Workshop on Reliable and Responsible Foundation Models

arXiv:2404.18041 [pdf, other]

Variational Optimization for Quantum Problems using Deep Generative Networks

Authors: Lingxia Zhang, Xiaodie Lin, Peidong Wang, Kaiyan Yang, Xiao Zeng, Zhaohui Wei, Zizhu Wang

Abstract: Optimization is one of the keystones of modern science and engineering. Its applications in quantum technology and machine learning helped nurture variational quantum algorithms and generative AI respectively. We propose a general approach to design variational optimization algorithms based on generative models: the Variational Generative Optimization Network (VGON). To demonstrate its broad applicability, we apply VGON to three quantum tasks: finding the best state in an entanglement-detection protocol, finding the ground state of a 1D quantum spin model with variational quantum circuits, and generating degenerate ground states of many-body quantum Hamiltonians. For the first task, VGON greatly reduces the optimization time compared to stochastic gradient descent while generating nearly optimal quantum states. For the second task, VGON alleviates the barren plateau problem in variational quantum circuits. For the final task, VGON can identify the degenerate ground state spaces after a single stage of training and generate a variety of states therein.

Submitted 27 April, 2024; originally announced April 2024.

Comments: 17 pages, 13 figures, comments welcome

arXiv:2404.17769 [pdf, other]

Conformal Ranked Retrieval

Authors: Yunpeng Xu, Wenge Guo, Zhi Wei

Abstract: Given the wide adoption of ranked retrieval techniques in various information systems that significantly impact our daily lives, there is an increasing need to assess and address the uncertainty inherent in their predictions. This paper introduces a novel method using the conformal risk control framework to quantitatively measure and manage risks in the context of ranked retrieval problems. Our research focuses on a typical two-stage ranked retrieval problem, where the retrieval stage generates candidates for subsequent ranking. By carefully formulating the conformal risk for each stage, we have developed algorithms to effectively control these risks within their specified bounds. The efficacy of our proposed methods has been demonstrated through comprehensive experiments on three large-scale public datasets for ranked retrieval tasks, including the MSLR-WEB dataset, the Yahoo LTRC dataset and the MS MARCO dataset.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 14 pages, 6 figures, 1 table; 7 supplementary pages, 12 supplementary figures, 2 supplementary tables

arXiv:2404.17462 [pdf, other]

Integrated Sensing and Communication Channel Modeling: A Survey

Authors: Zhiqing Wei, Jinzhu Jia, Yangyang Niu, Lin Wang, Huici Wu, Heng Yang, Zhiyong Feng

Abstract: Integrated sensing and communication (ISAC) is expected to play a crucial role in the sixth-generation (6G) mobile communication systems, offering potential applications in the scenarios of intelligent transportation, smart factories, etc. The performance of radar sensing in ISAC systems is closely related to the characteristics of radar sensing and communication channels. Therefore, ISAC channel modeling serves as a fundamental cornerstone for evaluating and optimizing ISAC systems. This article provides a comprehensive survey on the ISAC channel modeling methods. Furthermore, the methods of target radar cross section (RCS) modeling and clutter RCS modeling are summarized. Finally, we discuss the future research trends related to ISAC channel modeling in various scenarios.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.17343 [pdf, other]

A Bionic Natural Language Parser Equivalent to a Pushdown Automaton

Authors: Zhenghao Wei, Kehua Lin, Jianlin Feng

Abstract: Assembly Calculus (AC), proposed by Papadimitriou et al., aims to reproduce advanced cognitive functions through simulating neural activities, with several applications based on AC having been developed, including a natural language parser proposed by Mitropolsky et al. However, this parser lacks the ability to handle Kleene closures, preventing it from parsing all regular languages and rendering it weaker than Finite Automata (FA). In this paper, we propose a new bionic natural language parser (BNLP) that is based on AC and integrates two new biologically rational structures, Recurrent Circuit and Stack Circuit, which are inspired by RNN and short-term memory mechanism. In contrast to the original parser, the BNLP can fully handle all regular languages and Dyck languages. Therefore, leveraging the Chomsky-Schützenberger theorem, the BNLP which can parse all Context-Free Languages can be constructed. We also formally prove that for any PDA, a Parser Automaton corresponding to BNLP can always be formed, ensuring that BNLP has a description ability equal to that of PDA and addressing the deficiencies of the original parser.

Submitted 26 April, 2024; originally announced April 2024.

Comments: to be published in IJCNN 2024

arXiv:2404.16275 [pdf]

Spectrum Sharing Policy in the Asia-Pacific Region

Authors: Zhiyong Feng, Zhiqing Wei

Abstract: In this chapter, we investigate the spectrum measurement results in the Asia-Pacific region. Then the spectrum sharing policy in the Asia-Pacific region is reviewed in detail, where the national projects and strategies on spectrum refarming and spectrum sharing in China, Japan, Singapore, India, Korea and Australia are investigated. Then we introduce the spectrum sharing test-bed that is developed in China, which is a cognitive radio enabled TD-LTE test-bed utilizing TVWS. This chapter provides a brief introduction of the spectrum sharing mechanism and policy of the Asia-Pacific region.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 33 pages, 17 figures

arXiv:2404.15805 [pdf, other]

Beyond ESM2: Graph-Enhanced Protein Sequence Modeling with Efficient Clustering

Authors: Shujian Jiao, Bingxuan Li, Lei Wang, Xiaojin Zhang, Wei Chen, Jiajie Peng, Zhongyu Wei

Abstract: Proteins are essential to life's processes, underpinning evolution and diversity. Advances in sequencing technology have revealed millions of proteins, underscoring the need for sophisticated pre-trained protein models for biological analysis and AI development. Facebook's ESM2, the most advanced protein language model to date, leverages a masked prediction task for unsupervised learning, crafting amino acid representations with notable biochemical accuracy. Yet, it lacks in delivering functional protein insights, signaling an opportunity for enhancing representation quality. Our study addresses this gap by incorporating protein family classification into ESM2's training. This approach, augmented with Community Propagation-Based Clustering Algorithm, improves global protein representations, while a contextual prediction task fine-tunes local amino acid accuracy. Significantly, our model achieved state-of-the-art results in several downstream experiments, demonstrating the power of combining global and local methodologies to substantially boost protein representation quality.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.14862 [pdf, other]

Deep Learning Based Multi-Node ISAC 4D Environmental Reconstruction with Uplink-Downlink Cooperation

Authors: Bohao Lu, Zhiqing Wei, Huici Wu, Xinrui Zeng, Lin Wang, Xi Lu, Dongyang Mei, Zhiyong Feng

Abstract: Utilizing widely distributed communication nodes to achieve environmental reconstruction is one of the significant scenarios for Integrated Sensing and Communication (ISAC) and a crucial technology for 6G. To achieve this crucial functionality, we propose a deep learning based multi-node ISAC 4D environment reconstruction method with Uplink-Downlink (UL-DL) cooperation, which employs virtual aperture technology, Constant False Alarm Rate (CFAR) detection, and Multiple Signal Classification (MUSIC) algorithm to maximize the sensing capabilities of single sensing nodes. Simultaneously, it introduces a cooperative environmental reconstruction scheme involving multi-node cooperation and Uplink-Downlink (UL-DL) cooperation to overcome the limitations of single-node sensing caused by occlusion and limited viewpoints. Furthermore, the deep learning models Attention Gate Gridding Residual Neural Network (AGGRNN) and Multi-View Sensing Fusion Network (MVSFNet) are proposed to enhance the density of sparsely reconstructed point clouds, aiming to restore as many original environmental details as possible while preserving the spatial structure of the point cloud. Additionally, we propose a multi-level fusion strategy incorporating both data-level and feature-level fusion to fully leverage the advantages of multi-node cooperation. Experimental results demonstrate that the environmental reconstruction performance of this method significantly outperforms other comparative methods, enabling high-precision environmental reconstruction using ISAC system.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 13 pages, 21 figures, 4 tables

arXiv:2404.14444

Practical Battery Health Monitoring using Uncertainty-Aware Bayesian Neural Network

Authors: Yunyi Zhao, Zhang Wei, Qingyu Yan, Man-Fai Ng, B. Sivaneasan, Cheng Xiang

Abstract: Battery health monitoring and prediction are critically important in the era of electric mobility, with a huge impact on safety, sustainability, and economic aspects. Existing research often focuses on prediction accuracy but tends to neglect practical factors that may hinder the technology's deployment in real-world applications. In this paper, we address these practical considerations and develop models based on the Bayesian neural network for predicting battery end-of-life. Our models use sensor data related to battery health and apply distributions, rather than single-point values, to each parameter of the models. This allows the models to capture the inherent randomness and uncertainty of battery health, which leads to not only accurate predictions but also quantifiable uncertainty. We conducted an experimental study and demonstrated the effectiveness of our proposed models, with a prediction error rate averaging 13.9%, and as low as 2.9% for certain tested batteries. Additionally, all predictions include quantifiable certainty, which improved by 66% from the initial to the mid-life stage of the battery. This research has practical value for battery technologies and contributes to accelerating the technology's adoption in the industry.

Submitted 20 April, 2024; originally announced April 2024.

Comments: 6 pages

arXiv:2404.13752

Towards General Conceptual Model Editing via Adversarial Representation Engineering

Authors: Yihao Zhang, Zeming Wei, Jun Sun, Meng Sun

Abstract: Since the development of Large Language Models (LLMs) has achieved remarkable success, understanding and controlling their internal complex mechanisms has become an urgent problem. Recent research has attempted to interpret their behaviors through the lens of inner representation. However, developing practical and efficient methods for applying these representations for general and flexible model editing remains challenging. In this work, we explore how to use representation engineering methods to guide the editing of LLMs by deploying a representation sensor as an oracle. We first identify the importance of a robust and reliable sensor during editing, then propose an Adversarial Representation Engineering (ARE) framework to provide a unified and interpretable approach for conceptual model editing without compromising baseline performance. Experiments on multiple model editing paradigms demonstrate the effectiveness of ARE in various settings. Code and data are available at https://github.com/Zhang-Yihao/Adversarial-Representation-Engineering.

Submitted 23 May, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.13603

Beyond MMSE: Rank-1 Subspace Channel Estimator for Massive MIMO Systems

Authors: Bin Li, Ziping Wei, Shaoshi Yang, Yang Zhang, Jun Zhang, Chenglin Zhao, Sheng Chen

Abstract: To glean the benefits offered by massive multi-input multi-output (MIMO) systems, channel state information must be accurately acquired. Despite its high accuracy, the classical linear minimum mean squared error (MMSE) estimator has a computational complexity that becomes prohibitively high in the context of massive MIMO, while other low-complexity methods seriously degrade the estimation accuracy. In this paper, we develop a novel rank-1 subspace channel estimator to approximate the maximum likelihood (ML) estimator, which outperforms the linear MMSE estimator but incurs a surprisingly low computational complexity. Our method first acquires highly accurate angle-of-arrival (AoA) information via a constructed space-embedding matrix and the rank-1 subspace method. Then, it adopts post-reception beamforming to acquire an unbiased estimate of the channel gains. Furthermore, a fast method is designed to implement our new estimator. Theoretical analysis shows that the extra gain achieved by our method over the linear MMSE estimator grows as $O(\log_{10} M)$, while its computational complexity scales linearly with the number of antennas $M$. Numerical simulations also validate the theoretical results. Our new method substantially extends the accuracy-complexity region and constitutes a promising channel estimation solution for emerging massive MIMO communications.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 15 pages, 12 figures, accepted to appear in IEEE Transactions on Communications, Apr. 2024

Showing 51–100 of 998 results for author: Wei, Z