Search | arXiv e-print repository

arXiv:2403.19716 [pdf, other]

Capability-aware Prompt Reformulation Learning for Text-to-Image Generation

Authors: Jingtao Zhan, Qingyao Ai, Yiqun Liu, Jia Chen, Shaoping Ma

Abstract: Text-to-image generation systems have emerged as revolutionary tools in the realm of artistic creation, offering unprecedented ease in transforming textual prompts into visual art. However, the efficacy of these systems is intricately linked to the quality of user-provided prompts, which often poses a challenge to users unfamiliar with prompt crafting. This paper addresses this challenge by leveraging user reformulation data from interaction logs to develop an automatic prompt reformulation model. Our in-depth analysis of these logs reveals that user prompt reformulation is heavily dependent on the individual user's capability, resulting in significant variance in the quality of reformulation pairs. To effectively use this data for training, we introduce the Capability-aware Prompt Reformulation (CAPR) framework. CAPR innovatively integrates user capability into the reformulation process through two key components: the Conditional Reformulation Model (CRM) and Configurable Capability Features (CCF). CRM reformulates prompts according to a specified user capability, as represented by CCF. The CCF, in turn, offers the flexibility to tune and guide the CRM's behavior. This enables CAPR to effectively learn diverse reformulation strategies across various user capacities and to simulate high-capability user reformulation during inference. Extensive experiments on standard text-to-image generation benchmarks showcase CAPR's superior performance over existing baselines and its remarkable robustness on unseen systems. Furthermore, comprehensive analyses validate the effectiveness of different components. CAPR can facilitate user-friendly interaction with text-to-image systems and make advanced artistic creation more achievable for a broader range of users.

Submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted at SIGIR 2024

arXiv:2403.19251 [pdf, other]

Arbitrary State Transition of Open Qubit System Based on Switching Control

Authors: Guangpu Wu, Shibei Xue, Shan Ma, Sen Kuang, Daoyi Dong, Ian R. Petersen

Abstract: We present a switching control strategy based on Lyapunov control for arbitrary state transitions in open qubit systems. With coherent vector representation, we propose a switching control strategy, which can prevent the state of the qubit from entering invariant sets and singular value sets, effectively driving the system ultimately to a sufficiently small neighborhood of target states. In comparison to existing works, this control strategy relaxes the strict constraints on system models imposed by special target states. Furthermore, we identify conditions under which the open qubit system achieves finite-time stability (FTS) and finite-time contractive stability (FTCS), respectively. This represents a critical improvement in quantum state transitions, especially considering the asymptotic stability of arbitrary target states is unattainable in open quantum systems. The effectiveness of our proposed method is convincingly demonstrated through its application in a qubit system affected by various types of decoherence, including amplitude, dephasing and polarization decoherence.

Submitted 28 March, 2024; originally announced March 2024.

Comments: 12 pages, 7 figures

arXiv:2403.18405 [pdf, other]

Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval

Authors: Shengjie Ma, Chong Chen, Qi Chu, Jiaxin Mao

Abstract: Collecting relevant judgments for legal case retrieval is a challenging and time-consuming task. Accurately judging the relevance between two legal cases requires a considerable effort to read the lengthy text and a high level of domain expertise to extract Legal Facts and make juridical judgments. With the advent of advanced large language models, some recent studies have suggested that it is promising to use LLMs for relevance judgment. Nonetheless, the method of employing a general large language model for reliable relevance judgments in legal case retrieval is yet to be thoroughly explored. To fill this research gap, we devise a novel few-shot workflow tailored to the relevant judgment of legal cases. The proposed workflow breaks down the annotation process into a series of stages, imitating the process employed by human annotators and enabling a flexible integration of expert reasoning to enhance the accuracy of relevance judgments. By comparing the relevance judgments of LLMs and human experts, we empirically show that we can obtain reliable relevance judgments with the proposed workflow. Furthermore, we demonstrate the capacity to augment existing legal case retrieval models through the synthesis of data generated by the large language model.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.17827 [pdf, other]

DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions

Authors: Sammy Christen, Shreyas Hampali, Fadime Sener, Edoardo Remelli, Tomas Hodan, Eric Sauser, Shugao Ma, Bugra Tekin

Abstract: Generating natural hand-object interactions in 3D is challenging as the resulting hand and object motions are expected to be physically plausible and semantically meaningful. Furthermore, generalization to unseen objects is hindered by the limited scale of available hand-object interaction datasets. We propose DiffH2O, a novel method to synthesize realistic, one or two-handed object interactions from provided text prompts and geometry of the object. The method introduces three techniques that enable effective learning from limited data. First, we decompose the task into a grasping stage and a text-based interaction stage and use separate diffusion models for each. In the grasping stage, the model only generates hand motions, whereas in the interaction phase both hand and object poses are synthesized. Second, we propose a compact representation that tightly couples hand and object poses. Third, we propose two different guidance schemes to allow more control of the generated motions: grasp guidance and detailed textual guidance. Grasp guidance takes a single target grasping pose and guides the diffusion model to reach this grasp at the end of the grasping stage, which provides control over the grasping pose. Given a grasping motion from this stage, multiple different actions can be prompted in the interaction phase. For textual guidance, we contribute comprehensive text descriptions to the GRAB dataset and show that they enable our method to have more fine-grained control over hand-object interactions. Our quantitative and qualitative evaluation demonstrates that the proposed method outperforms baseline methods and leads to natural hand-object motions. Moreover, we demonstrate the practicality of our framework by utilizing a hand pose estimate from an off-the-shelf pose estimator for guidance, and then sampling multiple different actions in the interaction stage.

Submitted 26 March, 2024; originally announced March 2024.

Comments: Project Page: https://diffh2o.github.io/

arXiv:2403.17188 [pdf, other]

LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning

Authors: Siyuan Cheng, Guanhong Tao, Yingqi Liu, Guangyu Shen, Shengwei An, Shiwei Feng, Xiangzhe Xu, Kaiyuan Zhang, Shiqing Ma, Xiangyu Zhang

Abstract: Backdoor attack poses a significant security threat to Deep Learning applications. Existing attacks are often not evasive to established backdoor detection techniques. This susceptibility primarily stems from the fact that these attacks typically leverage a universal trigger pattern or transformation function, such that the trigger can cause misclassification for any input. In response to this, recent papers have introduced attacks using sample-specific invisible triggers crafted through special transformation functions. While these approaches manage to evade detection to some extent, they reveal vulnerability to existing backdoor mitigation techniques. To address and enhance both evasiveness and resilience, we introduce a novel backdoor attack LOTUS. Specifically, it leverages a secret function to separate samples in the victim class into a set of partitions and applies unique triggers to different partitions. Furthermore, LOTUS incorporates an effective trigger focusing mechanism, ensuring only the trigger corresponding to the partition can induce the backdoor behavior. Extensive experimental results show that LOTUS can achieve high attack success rate across 4 datasets and 7 model structures, and effectively evade 13 backdoor detection and mitigation techniques. The code is available at https://github.com/Megum1/LOTUS.

Submitted 25 March, 2024; originally announced March 2024.

Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)

arXiv:2403.17007 [pdf, other]

DreamLIP: Language-Image Pre-training with Long Captions

Authors: Kecheng Zheng, Yifei Zhang, Wei Wu, Fan Lu, Shuailei Ma, Xin Jin, Wei Chen, Yujun Shen

Abstract: Language-image pre-training largely relies on how precisely and thoroughly a text describes its paired image. In practice, however, the contents of an image can be so rich that well describing them requires lengthy captions (e.g., with 10 sentences), which are usually missing in existing datasets. Consequently, there is currently no clear evidence on whether and how language-image pre-training could benefit from long captions. To figure this out, we first re-caption 30M images with detailed descriptions using a pre-trained Multi-modality Large Language Model (MLLM), and then study the usage of the resulting captions under a contrastive learning framework. We observe that each sentence within a long caption is very likely to describe the image partially (e.g., an object). Motivated by this, we propose to dynamically sample sub-captions from the text label to construct multiple positive pairs, and introduce a grouping loss to match the embeddings of each sub-caption with its corresponding local image patches in a self-supervised manner. Experimental results on a wide range of downstream tasks demonstrate the consistent superiority of our method, termed DreamLIP, over previous alternatives, highlighting its fine-grained representational capacity. It is noteworthy that, on the tasks of image-text retrieval and semantic segmentation, our model trained with 30M image-text pairs achieves on par or even better performance than CLIP trained with 400M pairs. Project page is available at https://zyf0619sjtu.github.io/dream-lip.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16812 [pdf, other]

Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making

Authors: Shuai Ma, Qiaoyi Chen, Xinru Wang, Chengbo Zheng, Zhenhui Peng, Ming Yin, Xiaojuan Ma

Abstract: In AI-assisted decision-making, humans often passively review AI's suggestion and decide whether to accept or reject it as a whole. In such a paradigm, humans are found to rarely trigger analytical thinking and face difficulties in communicating the nuances of conflicting opinions to the AI when disagreements occur. To tackle this challenge, we propose Human-AI Deliberation, a novel framework to promote human reflection and discussion on conflicting human-AI opinions in decision-making. Based on theories in human deliberation, this framework engages humans and AI in dimension-level opinion elicitation, deliberative discussion, and decision updates. To empower AI with deliberative capabilities, we designed Deliberative AI, which leverages large language models (LLMs) as a bridge between humans and domain-specific models to enable flexible conversational interactions and faithful information provision. An exploratory evaluation on a graduate admissions task shows that Deliberative AI outperforms conventional explainable AI (XAI) assistants in improving humans' appropriate reliance and task performance. Based on a mixed-methods analysis of participant behavior, perception, user experience, and open-ended feedback, we draw implications for future AI-assisted decision tool design.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.15709 [pdf, other]

Contact-aware Human Motion Generation from Textual Descriptions

Authors: Sihan Ma, Qiong Cao, Jing Zhang, Dacheng Tao

Abstract: This paper addresses the problem of generating 3D interactive human motion from text. Given a textual description depicting the actions of different body parts in contact with objects, we synthesize sequences of 3D body poses that are visually natural and physically plausible. Yet, this task poses a significant challenge due to the inadequate consideration of interactions by physical contacts in both motion and textual descriptions, leading to unnatural and implausible sequences. To tackle this challenge, we create a novel dataset named RICH-CAT, representing "Contact-Aware Texts" constructed from the RICH dataset. RICH-CAT comprises high-quality motion, accurate human-object contact labels, and detailed textual descriptions, encompassing over 8,500 motion-text pairs across 26 indoor/outdoor actions. Leveraging RICH-CAT, we propose a novel approach named CATMO for text-driven interactive human motion synthesis that explicitly integrates human body contacts as evidence. We employ two VQ-VAE models to encode motion and body contact sequences into distinct yet complementary latent spaces and an intertwined GPT for generating human motions and contacts in a mutually conditioned manner. Additionally, we introduce a pre-trained text encoder to learn textual embeddings that better discriminate among various contact types, allowing for more precise control over synthesized motions and contacts. Our experiments demonstrate the superior performance of our approach compared to existing text-to-motion methods, producing stable, contact-aware motion sequences. Code and data will be available for research purposes.

Submitted 23 March, 2024; originally announced March 2024.

Comments: Project page: https://xymsh.github.io/RICH-CAT/

arXiv:2403.15285 [pdf, other]

Blockchain-based Pseudonym Management for Vehicle Twin Migrations in Vehicular Edge Metaverse

Authors: Jiawen Kang, Xiaofeng Luo, Jiangtian Nie, Tianhao Wu, Haibo Zhou, Yonghua Wang, Dusit Niyato, Shiwen Mao, Shengli Xie

Abstract: Driven by the great advances in metaverse and edge computing technologies, vehicular edge metaverses are expected to disrupt the current paradigm of intelligent transportation systems. As highly computerized avatars of Vehicular Metaverse Users (VMUs), the Vehicle Twins (VTs) deployed in edge servers can provide valuable metaverse services to improve driving safety and on-board satisfaction for their VMUs throughout journeys. To maintain uninterrupted metaverse experiences, VTs must be migrated among edge servers following the movements of vehicles. This can raise concerns about privacy breaches during the dynamic communications among vehicular edge metaverses. To address these concerns and safeguard location privacy, pseudonyms as temporary identifiers can be leveraged by both VMUs and VTs to realize anonymous communications in the physical space and virtual spaces. However, existing pseudonym management methods fall short in meeting the extensive pseudonym demands in vehicular edge metaverses, thus dramatically diminishing the performance of privacy preservation. To this end, we present a cross-metaverse empowered dual pseudonym management framework. We utilize cross-chain technology to enhance management efficiency and data security for pseudonyms. Furthermore, we propose a metric to assess the privacy level and employ a Multi-Agent Deep Reinforcement Learning (MADRL) approach to obtain an optimal pseudonym generating strategy. Numerical results demonstrate that our proposed schemes are high-efficiency and cost-effective, showcasing their promising applications in vehicular edge metaverses.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 14 pages, 9 figures

arXiv:2403.14899 [pdf, other]

doi 10.1080/01621459.2024.2335591

Statistical Inference For Noisy Matrix Completion Incorporating Auxiliary Information

Authors: Shujie Ma, Po-Yao Niu, Yichong Zhang, Yinchu Zhu

Abstract: This paper investigates statistical inference for noisy matrix completion in a semi-supervised model when auxiliary covariates are available. The model consists of two parts. One part is a low-rank matrix induced by unobserved latent factors; the other part models the effects of the observed covariates through a coefficient matrix which is composed of high-dimensional column vectors. We model the observational pattern of the responses through a logistic regression of the covariates, and allow its probability to go to zero as the sample size increases. We apply an iterative least squares (LS) estimation approach in our considered context. The iterative LS methods in general enjoy a low computational cost, but deriving the statistical properties of the resulting estimators is a challenging task. We show that our method only needs a few iterations, and the resulting entry-wise estimators of the low-rank matrix and the coefficient matrix are guaranteed to have asymptotic normal distributions. As a result, individual inference can be conducted for each entry of the unknown matrices. We also propose a simultaneous testing procedure with multiplier bootstrap for the high-dimensional coefficient matrix. This simultaneous inferential tool can help us further investigate the effects of covariates for the prediction of missing entries.

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.13500 [pdf, other]

doi 10.1051/0004-6361/202348993

The Galactic latitude dependency of Faraday complexity in the S-PASS/ATCA RM catalogue

Authors: S. Ranchod, S. A. Mao, R. Deane, S. S. Sridhar, A. Damas-Segovia, J. D. Livingston, Y. K. Ma

Abstract: The S-band Polarisation All Sky Survey (SPASS/ATCA) rotation measure (RM) catalogue is the largest broadband RM catalogue to date, increasing the RM density in the sparse southern sky. Through analysis of this catalogue, we report a latitude dependency of the Faraday complexity of polarised sources in this catalogue within 10$^\circ$ of the Galactic plane towards the inner Galaxy. In this study, we aim to investigate this trend with follow-up observations using the Australia Telescope Compact Array (ATCA). We observe 95 polarised sources from the SPASS/ATCA RM catalogue at 1.1 - 3.1 GHz with ATCA's 6 km configuration. We present Stokes QU fitting results and a comparative analysis with the SPASS/ATCA catalogue. We find an overall decrease in complexity in these sources with the higher angular resolution observations, with a complexity fraction of 42%, establishing that the majority of the complexity in the SPASS/ATCA sample is due to the mixing-in of diffuse Galactic emission at scales $θ > 2.8'$. Furthermore, we find a correlation between our observed small-scale complexity $θ < 2.8'$ and the Galactic spiral arms, which we interpret to be due to Galactic turbulence or small-scale polarised emission. These results emphasise the importance of considering the maximum angular scale to which the observations are sensitive in the classification of Faraday complexity; the effect of which can be more carefully investigated with SKA-precursor and pathfinder arrays (e.g. MeerKAT and ASKAP).

Submitted 20 March, 2024; originally announced March 2024.

Comments: 16 pages, 16 figures

Journal ref: A&A 686, A104 (2024)

arXiv:2403.12840 [pdf, other]

Optical properties of Euler-Heisenberg black hole in the Cold Dark Matter Halo

Authors: Lei You, Rui-bo Wang, Shi-Jie Ma, Jian-Bo Deng, Xian-Ru Hu

Abstract: The optical properties of an Euler-Heisenberg (EH) black hole (BH) surrounded by a Cold Dark Matter (CDM) halo are investigated. By changing the BH's parameters, we found that the radius of the horizon r_{h} and the radius of the photon sphere r_{ph} will transparently increase as the CDM halo parameters R and ρ increase. To show the influence of the CDM halo on the BH's optical characteristics, we took two sets of R and ρ with prominent differences and plotted the first four orders of images for a thin accretion disk with different inclination angles θ of the observer. The images with light intensity distributions using the Novikov-Thorne (N-T) model are also derived, as well as the effective potential and photon orbits. In particular, analysis of the intersection behaviors between photon trajectories with different impact parameters and circular time-like orbits in the accretion disk will help better understand the image of the thin accretion disk. Our results showed that the CDM halo makes the BH distinctly larger and dimmer.

Submitted 19 March, 2024; originally announced March 2024.

Comments: 42 pages, 16 figures, 4 tables

arXiv:2403.11417 [pdf, ps, other]

Positioning Using Wireless Networks: Applications, Recent Progress and Future Challenges

Authors: Yang Yang, Mingzhe Chen, Yufei Blankenship, Jemin Lee, Zabih Ghassemlooy, Julian Cheng, Shiwen Mao

Abstract: Positioning has recently received considerable attention as a key enabler in emerging applications such as extended reality, unmanned aerial vehicles and smart environments. These applications require both data communication and high-precision positioning, and thus they are particularly well-suited to be offered in wireless networks (WNs). The purpose of this paper is to provide a comprehensive overview of existing works and new trends in the field of positioning techniques from both the academic and industrial perspectives. The paper provides a comprehensive overview of positioning in WNs, covering the background, applications, measurements, state-of-the-art technologies and future challenges. The paper outlines the applications of positioning from the perspectives of public facilities, enterprises and individual users. We investigate the key performance indicators and measurements of positioning systems, followed by the review of the key enabler techniques such as artificial intelligence/large models and adaptive systems. Next, we discuss a number of typical wireless positioning technologies. We extend our overview beyond the academic progress, to include the standardization efforts, and finally, we provide insight into the challenges that remain. The comprehensive overview of existing efforts and new trends in the field of positioning from both the academic and industrial communities would be a useful reference to researchers in the field.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.11152 [pdf, other]

Evaluation Ethics of LLMs in Legal Domain

Authors: Ruizhe Zhang, Haitao Li, Yueyue Wu, Qingyao Ai, Yiqun Liu, Min Zhang, Shaoping Ma

Abstract: In recent years, the utilization of large language models for natural language dialogue has gained momentum, leading to their widespread adoption across various domains. However, their universal competence in addressing challenges specific to specialized fields such as law remains a subject of scrutiny. The incorporation of legal ethics into the model has been overlooked by researchers. We assert that rigorous ethics evaluation is essential to ensure the effective integration of large language models in legal domains, emphasizing the need to assess domain-specific proficiency and domain-specific ethics. To address this, we propose a novel evaluation methodology, utilizing authentic legal cases to evaluate the fundamental language abilities, specialized legal knowledge and legal robustness of large language models (LLMs). The findings from our comprehensive evaluation contribute significantly to the academic discourse surrounding the suitability and performance of large language models in legal domains.

Submitted 17 March, 2024; originally announced March 2024.

Comments: 10 pages, in processing of ACL 2024

arXiv:2403.11102 [pdf, other]

Jointly Optimizing Terahertz based Sensing and Communications in Vehicular Networks: A Dynamic Graph Neural Network Approach

Authors: Xuefei Li, Mingzhe Chen, Ye Hu, Zhilong Zhang, Danpu Liu, Shiwen Mao

Abstract: In this paper, the problem of vehicle service mode selection (sensing, communication, or both) and vehicle connections within terahertz (THz) enabled joint sensing and communications over vehicular networks is studied. The considered network consists of several service provider vehicles (SPVs) that can provide: 1) only sensing service, 2) only communication service, and 3) both services, sensing service request vehicles, and communication service request vehicles. Based on the vehicle network topology and their service accessibility, SPVs strategically select service request vehicles to provide sensing, communication, or both services. This problem is formulated as an optimization problem, aiming to maximize the number of successfully served vehicles by jointly determining the service mode of each SPV and its associated vehicles. To solve this problem, we propose a dynamic graph neural network (GNN) model that selects appropriate graph information aggregation functions according to the vehicle network topology, thus extracting more vehicle network information compared to traditional static GNNs that use fixed aggregation functions for different vehicle network topologies. Using the extracted vehicle network information, the service mode of each SPV and its served service request vehicles will be determined. Simulation results show that the proposed dynamic GNN based method can improve the number of successfully served vehicles by up to 17% and 28% compared to a GNN based algorithm with a fixed neural network model and a conventional optimization algorithm without using GNNs.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.10172 [pdf, other]

Unpacking ICT-supported Social Connections and Support of Late-life Migration: From the Lens of Social Convoys

Authors: Ying Lei, Shuai Ma, Yuling Sun

Abstract: Migration and aging-related dilemmas have limited the opportunities for late-life migrants to rebuild social connections and access support. While research on migrants has drawn increasing attention in HCI, limited attention has been paid to the increasing number of late-life migrants. This paper reports a qualitative study examining the social connections and support of late-life migrants. In particular, drawing on the social convoy model, we pay specific attention to the dynamic changes of late-life migrants' social convoy, the supporting roles each convoy plays, the functions ICT plays in the process, as well as the encountered challenges and expectations of late-life migrants regarding ICT-supported social convoys. Based on these findings, we deeply discuss the role of the social convoy in supporting more targeted social support for late-life migrants, as well as broader migrant communities. Finally, we offer late-life migrant-oriented design considerations.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.09805 [pdf, other]

On the Utility of 3D Hand Poses for Action Recognition

Authors: Md Salman Shamil, Dibyadip Chatterjee, Fadime Sener, Shugao Ma, Angela Yao

Abstract: 3D hand pose is an underexplored modality for action recognition. Poses are compact yet informative and can greatly benefit applications with limited compute budgets. However, poses alone offer an incomplete understanding of actions, as they cannot fully capture objects and environments with which humans interact. We propose HandFormer, a novel multimodal transformer, to efficiently model hand-object interactions. HandFormer combines 3D hand poses at a high temporal resolution for fine-grained motion modeling with sparsely sampled RGB frames for encoding scene semantics. Observing the unique characteristics of hand poses, we temporally factorize hand modeling and represent each joint by its short-term trajectories. This factorized pose representation combined with sparse RGB samples is remarkably efficient and highly accurate. Unimodal HandFormer with only hand poses outperforms existing skeleton-based methods at 5x fewer FLOPs. With RGB, we achieve new state-of-the-art performance on Assembly101 and H2O with significant improvements in egocentric action recognition.

Submitted 14 August, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: ECCV 2024; https://s-shamil.github.io/HandFormer/

arXiv:2403.09552 [pdf, other]

"Are You Really Sure?" Understanding the Effects of Human Self-Confidence Calibration in AI-Assisted Decision Making

Authors: Shuai Ma, Xinru Wang, Ying Lei, Chuhan Shi, Ming Yin, Xiaojuan Ma

Abstract: In AI-assisted decision-making, it is crucial but challenging for humans to achieve appropriate reliance on AI. This paper approaches this problem from a human-centered perspective, "human self-confidence calibration". We begin by proposing an analytical framework to highlight the importance of calibrated human self-confidence. In our first study, we explore the relationship between human self-confidence appropriateness and reliance appropriateness. Then in our second study, we propose three calibration mechanisms and compare their effects on humans' self-confidence and user experience. Subsequently, our third study investigates the effects of self-confidence calibration on AI-assisted decision-making. Results show that calibrating human self-confidence enhances human-AI team performance and encourages more rational reliance on AI (in some aspects) compared to uncalibrated baselines. Finally, we discuss our main findings and provide implications for designing future AI-assisted decision-making interfaces.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.07376 [pdf, other]

NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning

Authors: Bingqian Lin, Yunshuang Nie, Ziming Wei, Jiaqi Chen, Shikui Ma, Jianhua Han, Hang Xu, Xiaojun Chang, Xiaodan Liang

Abstract: Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions. Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability. However, their predominant use in an offline manner usually suffers from substantial domain gap between the VLN task and the LLM training corpus. This paper introduces a novel strategy called Navigational Chain-of-Thought (NavCoT), where we fulfill parameter-efficient in-domain training to enable self-guided navigational decision, leading to a significant mitigation of the domain gap in a cost-effective manner. Specifically, at each timestep, the LLM is prompted to forecast the navigational chain-of-thought by: 1) acting as a world model to imagine the next observation according to the instruction, 2) selecting the candidate observation that best aligns with the imagination, and 3) determining the action based on the reasoning from the prior steps. Through constructing formalized labels for training, the LLM can learn to generate desired and reasonable chain-of-thought outputs for improving the action decision. Experimental results across various training settings and popular VLN benchmarks (e.g., Room-to-Room (R2R), Room-across-Room (RxR), Room-for-Room (R4R)) show the significant superiority of NavCoT over the direct action prediction variants.
Through simple parameter-efficient finetuning, our NavCoT outperforms a recent GPT4-based approach with ~7% relative improvement on the R2R dataset. We believe that NavCoT will help unlock more task-adaptive and scalable LLM-based embodied agents, which are helpful for developing real-world robotics applications. Code is available at https://github.com/expectorlin/NavCoT.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.07354 [pdf, other]

BID: Boundary-Interior Decoding for Unsupervised Temporal Action Localization Pre-Training

Authors: Qihang Fang, Chengcheng Tang, Shugao Ma, Yanchao Yang

Abstract: Skeleton-based motion representations are robust for action localization and understanding for their invariance to perspective, lighting, and occlusion, compared with images. Yet, they are often ambiguous and incomplete when taken out of context, even for human annotators. As infants discern gestures before associating them with words, actions can be conceptualized before being grounded with labels. Therefore, we propose the first unsupervised pre-training framework, Boundary-Interior Decoding (BID), that partitions a skeleton-based motion sequence into discovered semantically meaningful pre-action segments. By fine-tuning our pre-training network with a small number of annotated data, we show results out-performing SOTA methods by a large margin.

Submitted 12 March, 2024; originally announced March 2024.

Comments: 18 pages, 8 figures

MSC Class: 68T45 ACM Class: I.4.8

arXiv:2403.07274 [pdf, other]

Achievable Rate Analysis and Optimization of Double-RIS Assisted Spatially Correlated MIMO with Statistical CSI

Authors: Kaizhe Xu, Jiajia Guo, Jun Zhang, Shi Jin, Shaodan Ma

Abstract: Reconfigurable intelligent surface (RIS) is a novel meta-material which can form a smart radio environment by dynamically altering reflection directions of the impinging electromagnetic waves. In the prior literature, the inter-RIS links which also contribute to the performance of the whole system are usually neglected when multiple RISs are deployed. In this paper we investigate a general double-RIS assisted multiple-input multiple-output (MIMO) wireless communication system under spatially correlated non line-of-sight propagation channels, where the cooperation of the double RISs is also considered. The design objective is to maximize the achievable ergodic rate based on full statistical channel state information (CSI). Specifically, we firstly present a closed-form asymptotic expression for the achievable ergodic rate by utilizing replica method from statistical physics. Then a full statistical CSI-enabled optimal design is proposed which avoids high pilot training overhead compared to instantaneous CSI-enabled design. To further reduce the signal processing overhead and lower the complexity for practical realization, a common-phase scheme is proposed to design the double RISs. Simulation results show that the derived asymptotic ergodic rate is quite accurate even for small-sized antenna arrays. And the proposed optimization algorithm can achieve substantial gain at the expense of a low overhead and complexity.
Furthermore, the cooperative double-RIS assisted MIMO framework is proven to achieve superior ergodic rate performance and high communication reliability under harsh propagation environment.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.06579 [pdf, other]

Edge Information Hub: Orchestrating Satellites, UAVs, MEC, Sensing and Communications for 6G Closed-Loop Controls

Authors: Chengleyang Lei, Wei Feng, Peng Wei, Yunfei Chen, Ning Ge, Shiwen Mao

Abstract: An increasing number of field robots would be used for mission-critical tasks in remote or post-disaster areas. Due to the limited individual abilities, these robots usually require an edge information hub (EIH), with not only communication but also sensing and computing functions. Such EIH could be deployed on a flexibly-dispatched unmanned aerial vehicle (UAV). Different from traditional aerial base stations or mobile edge computing (MEC), the EIH would direct the operations of robots via sensing-communication-computing-control ($\textbf{SC}^3$) closed-loop orchestration. This paper aims to optimize the closed-loop control performance of multiple $\textbf{SC}^3$ loops, with constraints on satellite-backhaul rate, computing capability, and on-board energy. Specifically, the linear quadratic regulator (LQR) control cost is used to measure the closed-loop utility, and a sum LQR cost minimization problem is formulated to jointly optimize the splitting of sensor data and allocation of communication and computing resources. We first derive the optimal splitting ratio of sensor data, and then recast the problem to a more tractable form. An iterative algorithm is finally proposed to provide a sub-optimal solution. Simulation results demonstrate the superiority of the proposed algorithm. We also uncover the influence of $\textbf{SC}^3$ parameters on closed-loop controls, highlighting more systematic understanding.

Submitted 24 August, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Comments: 16 pages, 11 figures

arXiv:2403.06259 [pdf, other]

Editing Conceptual Knowledge for Large Language Models

Authors: Xiaohan Wang, Shengyu Mao, Ningyu Zhang, Shumin Deng, Yunzhi Yao, Yue Shen, Lei Liang, Jinjie Gu, Huajun Chen

Abstract: Recently, there has been a growing interest in knowledge editing for Large Language Models (LLMs). Current approaches and evaluations merely explore the instance-level editing, while whether LLMs possess the capability to modify concepts remains unclear. This paper pioneers the investigation of editing conceptual knowledge for LLMs, by constructing a novel benchmark dataset ConceptEdit and establishing a suite of new metrics for evaluation. The experimental results reveal that, although existing editing methods can efficiently modify concept-level definition to some extent, they also have the potential to distort the related instantial knowledge in LLMs, leading to poor performance. We anticipate this can inspire further progress in better understanding LLMs. Our project homepage is available at https://zjunlp.github.io/project/ConceptEdit.

Submitted 10 March, 2024; originally announced March 2024.

Comments: Work in progress. Code: https://github.com/zjunlp/EasyEdit Dataset: https://huggingface.co/datasets/zjunlp/ConceptEdit

arXiv:2403.06224 [pdf, other]

doi 10.1103/PhysRevB.109.214311

Imaginary gap-closed points and dynamics in a class of dissipative systems

Authors: Shicheng Ma, Heng Lin, Jinghui Pi

Abstract: We investigate imaginary gap-closed (IGC) points and their associated dynamics in dissipative systems. In a general non-Hermitian model, we derive the equation governing the IGC points of the energy spectrum, establishing that these points are only determined by the Hermitian part of the Hamiltonian. Focusing on a class of one-dimensional dissipative chains, we explore quantum walks across different scenarios and various parameters, showing that IGC points induce a power-law decay scaling in bulk loss probability and trigger a boundary phenomenon referred to as "edge burst". This observation underscores the crucial role of IGC points under periodic boundary conditions (PBCs) in shaping quantum walk dynamics. Finally, we demonstrate that the damping matrices of these dissipative chains under PBCs possess Liouvillian gapless points, implying an algebraic convergence towards the steady state in long-time dynamics.

Submitted 2 July, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

Comments: 11 pages, 8 figures

Journal ref: Phys. Rev. B 109, 214311 (2024)

arXiv:2403.05987 [pdf, other]

ROME/REA: Three-year, Tri-color Timeseries Photometry of the Galactic Bulge

Authors: R. A. Street, E. Bachelet, Y. Tsapras, M. P. G. Hundertmark, V. Bozza, D. M. Bramich, A. Cassan, M. Dominik, R. Figuera Jaimes, K. Horne, S. Mao, A. Saha, J. Wambsganss, Weicheng Zang

Abstract: The ROME/REA (Robotic Observations of Microlensing Events/Reactive Event Assessment) Survey was a Key Project at Las Cumbres Observatory (hereafter LCO) which continuously monitored 20 selected fields (3.76 sq.deg.) in the Galactic Bulge throughout their seasonal visibility window over a three-year period, between March 2017 and March 2020. Observations were made in three optical passbands (SDSS-g', -r', -i'), and LCO's multi-site telescope network enabled the survey to achieve a typical cadence of ~10 hrs in i' and ~15 hrs in g' and r'. In addition, intervals of higher cadence (<1 hr) data were obtained during monitoring of key microlensing events within the fields. This paper describes the Difference Image Analysis data reduction pipeline developed to process these data, and the process for combining the photometry from LCO's three observing sites in the Southern Hemisphere. The full timeseries photometry for all 8 million stars, down to a limiting magnitude of i~18 mag is provided in the data release accompanying this paper, and samples of the data are presented for exemplar microlensing events, illustrating how the tri-band data are used to derive constraints on the microlensing source star parameters, a necessary step in determining the physical properties of the lensing object. The timeseries data also enables a wealth of additional science, for example in characterizing long-timescale stellar variability, and a few examples of the data for known variables are presented.

Submitted 9 March, 2024; originally announced March 2024.

Comments: Accepted for publication in PASP

arXiv:2403.05826 [pdf, other]

Cached Model-as-a-Resource: Provisioning Large Language Model Agents for Edge Intelligence in Space-air-ground Integrated Networks

Authors: Minrui Xu, Dusit Niyato, Hongliang Zhang, Jiawen Kang, Zehui Xiong, Shiwen Mao, Zhu Han

Abstract: Edge intelligence in space-air-ground integrated networks (SAGINs) can enable worldwide network coverage beyond geographical limitations for users to access ubiquitous and low-latency intelligence services. Facing global coverage and complex environments in SAGINs, edge intelligence can provision approximate large language models (LLMs) agents for users via edge servers at ground base stations (BSs) or cloud data centers relayed by satellites. As LLMs with billions of parameters are pre-trained on vast datasets, LLM agents have few-shot learning capabilities, e.g., chain-of-thought (CoT) prompting for complex tasks, which raises a new trade-off between resource consumption and performance in SAGINs. In this paper, we propose a joint caching and inference framework for edge intelligence to provision sustainable and ubiquitous LLM agents in SAGINs. We introduce "cached model-as-a-resource" for offering LLMs with limited context windows and propose a novel optimization framework, i.e., joint model caching and inference, to utilize cached model resources for provisioning LLM agent services along with communication, computing, and storage resources. We design "age of thought" (AoT) considering the CoT prompting of LLMs, and propose a least AoT cached model replacement algorithm for optimizing the provisioning cost. We propose a deep Q-network-based modified second-bid (DQMSB) auction to incentivize network operators, which can enhance allocation efficiency by 23% while guaranteeing strategy-proofness and free from adverse selection.

Submitted 31 May, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

arXiv:2403.05567 [pdf, other]

A Unified Framework for Underwater Metaverse with Optical Perception

Authors: Jingyang Cao, Mu Zhou, Jiacheng Wang, Guangyuan Liu, Dusit Niyato, Shiwen Mao, Zhu Han, Jiawen Kang

Abstract: With the advancement of AI technology and increasing attention to deep-sea exploration, the underwater Metaverse is gradually emerging. This paper explores the concept of underwater Metaverse, emerging virtual reality systems and services aimed at simulating and enhancing virtual experience of marine environments. First, we discuss potential applications of underwater Metaverse in underwater scientific research and marine conservation. Next, we present the architecture and supporting technologies of the underwater Metaverse, including high-resolution underwater imaging technologies and image processing technologies for rendering a realistic virtual world. Based on this, we present a use case for building a realistic underwater virtual world using underwater quantum imaging-generated artificial intelligence (QI-GAI) technology. The results demonstrate the effectiveness of the underwater Metaverse framework in simulating complex underwater environments, thus validating its potential in providing high-quality, interactive underwater virtual experiences. Finally, the paper examines the future development directions of underwater Metaverse, and provides new perspectives for marine science and conservation.

Submitted 20 February, 2024; originally announced March 2024.

arXiv:2403.04272 [pdf, other]

Active Generalized Category Discovery

Authors: Shijie Ma, Fei Zhu, Zhun Zhong, Xu-Yao Zhang, Cheng-Lin Liu

Abstract: Generalized Category Discovery (GCD) is a pragmatic and challenging open-world task, which endeavors to cluster unlabeled samples from both novel and old classes, leveraging some labeled data of old classes. Given that knowledge learned from old classes is not fully transferable to new classes, and that novel categories are fully unlabeled, GCD inherently faces intractable problems, including imbalanced classification performance and inconsistent confidence between old and new classes, especially in the low-labeling regime. Hence, some annotations of new classes are deemed necessary. However, labeling new classes is extremely costly. To address this issue, we take the spirit of active learning and propose a new setting called Active Generalized Category Discovery (AGCD). The goal is to improve the performance of GCD by actively selecting a limited amount of valuable samples for labeling from the oracle. To solve this problem, we devise an adaptive sampling strategy, which jointly considers novelty, informativeness and diversity to adaptively select novel samples with proper uncertainty. However, owing to the varied orderings of label indices caused by the clustering of novel classes, the queried labels are not directly applicable to subsequent training. To overcome this issue, we further propose a stable label mapping algorithm that transforms ground truth labels to the label space of the classifier, thereby ensuring consistent training across different active selection stages.
Our method achieves state-of-the-art performance on both generic and fine-grained datasets. Our code is available at https://github.com/mashijie1028/ActiveGCD

Submitted 7 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2403.04259 [pdf, other]

Decentralized and Equitable Optimal Transport

Authors: Ivan Lau, Shiqian Ma, César A. Uribe

Abstract: This paper considers the decentralized (discrete) optimal transport (D-OT) problem. In this setting, a network of agents seeks to design a transportation plan jointly, where the cost function is the sum of privately held costs for each agent. We reformulate the D-OT problem as a constraint-coupled optimization problem and propose a single-loop decentralized algorithm with an iteration complexity of O(1/ε) that matches existing centralized first-order approaches. Moreover, we propose the decentralized equitable optimal transport (DE-OT) problem. In DE-OT, in addition to cooperatively designing a transportation plan that minimizes transportation costs, agents seek to ensure equity in their individual costs. The iteration complexity of the proposed method to solve DE-OT is also O(1/ε). This rate improves existing centralized algorithms, where the best iteration complexity obtained is O(1/ε^2).

Submitted 12 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

Comments: Accepted to ACC 2024

arXiv:2403.03809 [pdf, ps, other]

Variational Bayesian Learning based Joint Localization and Path Loss Exponent with Distance-dependent Noise in Wireless Sensor Network

Authors: Yunfei Li, Yiting Luo, Weiqiang Tan, Chunguo Li, Shaodan Ma, Guanghua Yang

Abstract: This paper focuses on the challenge of jointly optimizing location and path loss exponent (PLE) in distance-dependent noise. Departing from the conventional independent noise model used in localization and path loss exponent estimation problems, we consider a more realistic model incorporating distance-dependent noise variance, as revealed in recent theoretical analyses and experimental results. The distance-dependent noise introduces a complex noise model with unknown noise power and PLE, resulting in an exceptionally challenging non-convex and nonlinear optimization problem. In this study, we address a joint localization and path loss exponent estimation problem encompassing distance-dependent noise, unknown parameters, and uncertainties in sensor node locations. To surmount the intractable nonlinear and non-convex objective function inherent in the problem, we introduce a variational Bayesian learning-based framework that enables the joint optimization of localization, path loss exponent, and reference noise parameters by leveraging an effective approximation to the true posterior distribution. Furthermore, the proposed joint learning algorithm provides an iterative closed-form solution and exhibits superior performance in terms of computational complexity compared to existing algorithms.
Computer simulation results demonstrate that the proposed algorithm approaches the performance of the Bayesian Cramer-Rao bound (BCRB), achieves localization performance comparable to the (maximum likelihood-Gaussian message passing) ML-GMP algorithm in some cases, and outperforms the other comparison algorithm in all cases.

Submitted 20 July, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.03736 [pdf, other]

Unifying Generation and Compression: Ultra-low bitrate Image Coding Via Multi-stage Transformer

Authors: Naifu Xue, Qi Mao, Zijian Wang, Yuan Zhang, Siwei Ma

Abstract: Recent progress in generative compression technology has significantly improved the perceptual quality of compressed data. However, these advancements primarily focus on producing high-frequency details, often overlooking the ability of generative models to capture the prior distribution of image content, thus impeding further bitrate reduction in extreme compression scenarios (<0.05 bpp). Motivated by the capabilities of predictive language models for lossless compression, this paper introduces a novel Unified Image Generation-Compression (UIGC) paradigm, merging the processes of generation and compression. A key feature of the UIGC framework is the adoption of vector-quantized (VQ) image models for tokenization, alongside a multi-stage transformer designed to exploit spatial contextual information for modeling the prior distribution. As such, the dual-purpose framework effectively utilizes the learned prior for entropy estimation and assists in the regeneration of lost tokens. Extensive experiments demonstrate the superiority of the proposed UIGC framework over existing codecs in perceptual quality and human perception, particularly in ultra-low bitrate scenarios (<=0.03 bpp), pioneering a new direction in generative compression.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.03145 [pdf, other]

Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization

Authors: Yuxin Guo, Shijie Ma, Hu Su, Zhiqing Wang, Yuhao Zhao, Wei Zou, Siyang Sun, Yun Zheng

Abstract: Audio-Visual Source Localization (AVSL) aims to locate sounding objects within video frames given the paired audio clips. Existing methods predominantly rely on self-supervised contrastive learning of audio-visual correspondence. Without any bounding-box annotations, they struggle to achieve precise localization, especially for small objects, and suffer from blurry boundaries and false positives. Moreover, the naive semi-supervised method is poor in fully leveraging the information of abundant unlabeled data. In this paper, we propose a novel semi-supervised learning framework for AVSL, namely Dual Mean-Teacher (DMT), comprising two teacher-student structures to circumvent the confirmation bias issue. Specifically, two teachers, pre-trained on limited labeled data, are employed to filter out noisy samples via the consensus between their predictions, and then generate high-quality pseudo-labels by intersecting their confidence maps. The sufficient utilization of both labeled and unlabeled data and the proposed unbiased framework enable DMT to outperform current state-of-the-art methods by a large margin, with CIoU of 90.4% and 48.8% on Flickr-SoundNet and VGG-Sound Source, obtaining 8.9%, 9.6% and 4.6%, 6.4% improvements over self- and semi-supervised methods respectively, given only 3% positional-annotations. We also extend our framework to some existing AVSL methods and consistently boost their performance.

Submitted 5 March, 2024; originally announced March 2024.

Comments: Accepted to NeurIPS2023

arXiv:2403.03095 [pdf, other]

Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization

Authors: Yuxin Guo, Shijie Ma, Yuhao Zhao, Hu Su, Wei Zou

Abstract: Audio-Visual Source Localization (AVSL) is the task of identifying specific sounding objects in the scene given audio cues. In our work, we focus on semi-supervised AVSL with pseudo-labeling. To address the issues with vanilla hard pseudo-labels including bias accumulation, noise sensitivity, and instability, we propose a novel method named Cross Pseudo-Labeling (XPL), wherein two models learn from each other with the cross-refine mechanism to avoid bias accumulation. We equip XPL with two effective components. Firstly, the soft pseudo-labels with sharpening and pseudo-label exponential moving average mechanisms enable models to achieve gradual self-improvement and ensure stable training. Secondly, the curriculum data selection module adaptively selects pseudo-labels with high quality during training to mitigate potential bias. Experimental results demonstrate that XPL significantly outperforms existing methods, achieving state-of-the-art performance while effectively mitigating confirmation bias and ensuring training stability.

Submitted 5 March, 2024; originally announced March 2024.

Comments: Accepted To ICASSP2024
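
The two soft pseudo-label mechanisms named in the abstract above (sharpening and an exponential moving average) can be illustrated with a minimal sketch. This is not the authors' implementation: the temperature-based sharpening form, the `momentum` value, and the function names `sharpen` and `ema_update` are all assumptions made for illustration, and the inputs are treated as plain probability lists rather than localization maps.

```python
def sharpen(probs, temperature=0.5):
    """Sharpen a probability vector by raising each entry to 1/temperature.

    Assumed form (temperature sharpening); temperature < 1 pushes mass
    toward the most confident entry.
    """
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def ema_update(previous, current, momentum=0.9):
    """Blend the previous pseudo-label with the current prediction.

    A high momentum keeps the pseudo-label stable across training steps.
    """
    return [momentum * a + (1.0 - momentum) * b for a, b in zip(previous, current)]


if __name__ == "__main__":
    soft = sharpen([0.6, 0.4])
    print(soft)                               # sharpened soft label
    print(ema_update([1.0, 0.0], [0.0, 1.0]))  # slowly moving pseudo-label
```

Sharpening makes a soft label more decisive without turning it into a hard label, while the moving average damps step-to-step noise; both are common ways to stabilize pseudo-label training.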

arXiv:2403.03004 [pdf, other]

Ultralight vector dark matter search using data from the KAGRA O3GK run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, I. Abouelfettouh, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi , et al. (1778 additional authors not shown)

Abstract: Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we present the result of a search for $U(1)_{B-L}$ gauge boson DM using the KAGRA data from auxiliary length channels during the first joint observation run together with GEO600. By applying our search pipeline, which takes into account the stochastic nature of ultralight DM, upper bounds on the coupling strength between the $U(1)_{B-L}$ gauge boson and ordinary matter are obtained for a range of DM masses. While our constraints are less stringent than those derived from previous experiments, this study demonstrates the applicability of our method to the lower-mass vector DM search, which is made difficult in this measurement by the short observation time compared to the auto-correlation time scale of DM.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 20 pages, 5 figures

Report number: LIGO-P2300250

arXiv:2403.02437 [pdf, other]

SoK: Challenges and Opportunities in Federated Unlearning

Authors: Hyejun Jeong, Shiqing Ma, Amir Houmansadr

Abstract: Federated learning (FL), introduced in 2017, facilitates collaborative learning between non-trusting parties with no need for the parties to explicitly share their data among themselves. This allows training models on user data while respecting privacy regulations such as GDPR and CPRA. However, emerging privacy requirements may mandate model owners to be able to \emph{forget} some learned data, e.g., when requested by data owners or law enforcement. This has given birth to an active field of research called \emph{machine unlearning}. In the context of FL, many techniques developed for unlearning in centralized settings are not trivially applicable! This is due to the unique differences between centralized and distributed learning, in particular, interactivity, stochasticity, heterogeneity, and limited accessibility in FL. In response, a recent line of work has focused on developing unlearning mechanisms tailored to FL. This SoK paper aims to take a deep look at the \emph{federated unlearning} literature, with the goal of identifying research trends and challenges in this emerging field. By carefully categorizing papers published on FL unlearning (since 2020), we aim to pinpoint the unique complexities of federated unlearning, highlighting limitations on directly applying centralized unlearning methods. We compare existing federated unlearning methods regarding influence removal and performance recovery, compare their threat models and assumptions, and discuss their implications and limitations. For instance, we analyze the experimental setup of FL unlearning studies from various perspectives, including data heterogeneity and its simulation, the datasets used for demonstration, and evaluation metrics. Our work aims to offer insights and suggestions for future research on federated unlearning.

Submitted 5 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.01791 [pdf, other]

Beyond Recommender: An Exploratory Study of the Effects of Different AI Roles in AI-Assisted Decision Making

Authors: Shuai Ma, Chenyi Zhang, Xinru Wang, Xiaojuan Ma, Ming Yin

Abstract: Artificial Intelligence (AI) is increasingly employed in various decision-making tasks, typically as a Recommender, providing recommendations that the AI deems correct. However, recent studies suggest this may diminish human analytical thinking and lead to humans' inappropriate reliance on AI, impairing the synergy in human-AI teams. In contrast, human advisors in group decision-making perform various roles, such as analyzing alternative options or criticizing decision-makers to encourage their critical thinking. This diversity of roles has not yet been empirically explored in AI assistance. In this paper, we examine three AI roles: Recommender, Analyzer, and Devil's Advocate, and evaluate their effects across two AI performance levels. Our results show each role's distinct strengths and limitations in task performance, reliance appropriateness, and user experience. Notably, the Recommender role is not always the most effective; when the AI performance level is low, the Analyzer role may be preferable. These insights offer valuable implications for designing AI assistants with adaptive functional roles according to different situations.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.01759 [pdf, other]

Open-world Machine Learning: A Review and New Outlooks

Authors: Fei Zhu, Shijie Ma, Zhen Cheng, Xu-Yao Zhang, Zhaoxiang Zhang, Cheng-Lin Liu

Abstract: Machine learning has achieved remarkable success in many applications. However, existing studies are largely based on the closed-world assumption, which assumes that the environment is stationary, and the model is fixed once deployed. In many real-world applications, this fundamental and rather naive assumption may not hold because an open environment is complex, dynamic, and full of unknowns. In such cases, rejecting unknowns, discovering novelties, and then incrementally learning them, could enable models to be safe and evolve continually as biological systems do. This paper provides a holistic view of open-world machine learning by investigating unknown rejection, novel class discovery, and class-incremental learning in a unified paradigm. The challenges, principles, and limitations of current methodologies are discussed in detail. Finally, we discuss several potential directions for future research. This paper aims to provide a comprehensive introduction to the emerging open-world machine learning paradigm, to help researchers build more powerful AI systems in their respective fields, and to promote the development of artificial general intelligence.

Submitted 14 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.01093 [pdf, other]

Variational Bayesian Learning Based Localization and Channel Reconstruction in RIS-aided Systems

Authors: Yunfei Li, Yiting Luo, Xianda Wu, Zheng Shi, Shaodan Ma, Guanghua Yang

Abstract: The emerging immersive and autonomous services have posed stringent requirements on both communications and localization. By considering the great potential of reconfigurable intelligent surface (RIS), this paper focuses on the joint channel estimation and localization for RIS-aided wireless systems. As opposed to existing works that treat channel estimation and localization independently, this paper exploits the intrinsic coupling and nonlinear relationships between the channel parameters and user location for enhancement of both localization and channel reconstruction. By noticing the non-convex, nonlinear objective function and the sparser angle pattern, a variational Bayesian learning-based framework is developed to jointly estimate the channel parameters and user location through leveraging an effective approximation of the posterior distribution. The proposed framework is capable of unifying near-field and far-field scenarios owing to exploitation of sparsity of the angular domain. Since the joint channel and location estimation problem has a closed-form solution in each iteration, our proposed iterative algorithm performs better than the conventional particle swarm optimization (PSO) and maximum likelihood (ML) based ones in terms of computational complexity. Simulations demonstrate that the proposed algorithm almost reaches the Bayesian Cramer-Rao bound (BCRB) and achieves superior estimation accuracy compared with the PSO and ML algorithms.

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.19193 [pdf, ps, other]

Magnetic catalysis and diamagnetism from pion fluctuations

Authors: Jie Mei, Rui Wen, Shijun Mao, Mei Huang, Kun Xu

Abstract: In the framework of Nambu--Jona-Lasinio model beyond mean field approximation, the effects of pion fluctuations on (inverse) magnetic catalysis and magnetic susceptibility are studied. The negative magnetic susceptibility at low temperature is observed when contributions from both neutral and charged pions are taken into account. In weak field approximation, it is observed that at finite temperature, the magnetic inhibition effect in the chiral limit, resulting from the difference between the transverse and longitudinal velocities of neutral pions, converts to weak magnetic catalysis when considering a non-zero current quark mass. Moreover, the magnetic catalysis is amplified by the charged pions.

Submitted 29 February, 2024; originally announced February 2024.

Comments: 14 pages, 8 figures

arXiv:2402.17764 [pdf, other]

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Authors: Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei

Abstract: Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.

Submitted 27 February, 2024; originally announced February 2024.

Comments: Work in progress
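
As a quick check on the "1.58 bits" in the title above: a weight restricted to the three values {-1, 0, 1} carries at most log2(3) ≈ 1.585 bits of information. The sketch below computes that figure and shows one hypothetical way to round real weights to ternary values. The mean-absolute-value scaling and the function name `ternarize` are assumptions for illustration; the abstract does not specify the quantization rule.

```python
import math


def bits_per_ternary_weight() -> float:
    """Information content of a symbol drawn from three equally likely values."""
    return math.log2(3)


def ternarize(weights, eps=1e-8):
    """Round weights to {-1, 0, 1} after scaling by their mean absolute value.

    Assumed scheme for illustration only; returns the ternary weights and
    the scale needed to approximately recover the original magnitudes.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


if __name__ == "__main__":
    print(round(bits_per_ternary_weight(), 3))
    print(ternarize([0.9, -0.05, -1.2, 0.4])[0])
```

With ternary weights, a matrix-vector product needs only additions and subtractions of activations, which is one reason such models can be cheaper to run than FP16 ones.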

arXiv:2402.16661 [pdf, other]

Penalized Generative Variable Selection

Authors: Tong Wang, Jian Huang, Shuangge Ma

Abstract: Deep networks are increasingly applied to a wide variety of data, including data with high-dimensional predictors. In such analysis, variable selection can be needed along with estimation/model building. Many of the existing deep network studies that incorporate variable selection have been limited to methodological and numerical developments. In this study, we consider modeling/estimation using the conditional Wasserstein Generative Adversarial networks. Group Lasso penalization is applied for variable selection, which may improve model estimation/prediction, interpretability, stability, etc. Significantly advancing from the existing literature, the analysis of censored survival data is also considered. We establish the convergence rate for variable selection while considering the approximation error, and obtain a more efficient distribution estimation. Simulations and the analysis of real experimental data demonstrate satisfactory practical utility of the proposed analysis.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.16366 [pdf, other]

SPC-NeRF: Spatial Predictive Compression for Voxel Based Radiance Field

Authors: Zetian Song, Wenhong Duan, Yuhuai Zhang, Shiqi Wang, Siwei Ma, Wen Gao

Abstract: Representing the Neural Radiance Field (NeRF) with the explicit voxel grid (EVG) is a promising direction for improving NeRFs. However, the EVG representation is not efficient for storage and transmission because of the terrific memory cost. Current methods for compressing EVG mainly inherit the methods designed for neural network compression, such as pruning and quantization, which do not take full advantage of the spatial correlation of voxels. Inspired by prosperous digital image compression techniques, this paper proposes SPC-NeRF, a novel framework applying spatial predictive coding in EVG compression. The proposed framework can remove spatial redundancy efficiently for better compression performance. Moreover, we model the bitrate and design a novel form of the loss function, where we can jointly optimize compression ratio and distortion to achieve higher coding efficiency. Extensive experiments demonstrate that our method can achieve 32% bit saving compared to the state-of-the-art method VQRF on multiple representative test datasets, with comparable training time.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.15713 [pdf, other]

Making Pre-trained Language Models Better Continual Few-Shot Relation Extractors

Authors: Shengkun Ma, Jiale Han, Yi Liang, Bo Cheng

Abstract: Continual Few-shot Relation Extraction (CFRE) is a practical problem that requires the model to continuously learn novel relations while avoiding forgetting old ones with few labeled training data. The primary challenges are catastrophic forgetting and overfitting. This paper harnesses prompt learning to explore the implicit capabilities of pre-trained language models to address the above two challenges, thereby making language models better continual few-shot relation extractors. Specifically, we propose a Contrastive Prompt Learning framework, which designs prompt representation to acquire more generalized knowledge that can be easily adapted to old and new categories, and margin-based contrastive learning to focus more on hard samples, therefore alleviating catastrophic forgetting and overfitting issues. To further remedy overfitting in low-resource scenarios, we introduce an effective memory augmentation strategy that employs well-crafted prompts to guide ChatGPT in generating diverse samples. Extensive experiments demonstrate that our method outperforms state-of-the-art methods by a large margin and significantly mitigates catastrophic forgetting and overfitting in low-resource scenarios.

Submitted 23 February, 2024; originally announced February 2024.

Comments: Accepted as COLING2024

arXiv:2402.15690 [pdf, other]

Foot In The Door: Understanding Large Language Model Jailbreaking via Cognitive Psychology

Authors: Zhenhua Wang, Wei Xie, Baosheng Wang, Enze Wang, Zhiwen Gui, Shuoyoucheng Ma, Kai Chen

Abstract: Large Language Models (LLMs) have gradually become the gateway for people to acquire new knowledge. However, attackers can break the model's security protection ("jail") to access restricted information, which is called "jailbreaking." Previous studies have shown the weakness of current LLMs when confronted with such jailbreaking attacks. Nevertheless, comprehension of the intrinsic decision-making mechanism within the LLMs upon receipt of jailbreak prompts is noticeably lacking. Our research provides a psychological explanation of the jailbreak prompts. Drawing on cognitive consistency theory, we argue that the key to jailbreak is guiding the LLM to achieve cognitive coordination in an erroneous direction. Further, we propose an automatic black-box jailbreaking method based on the Foot-in-the-Door (FITD) technique. This method progressively induces the model to answer harmful questions via multi-step incremental prompts. We instantiated a prototype system to evaluate the jailbreaking effectiveness on 8 advanced LLMs, yielding an average success rate of 83.9%. This study builds a psychological perspective on the explanatory insights into the intrinsic decision-making logic of LLMs.

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.13959 [pdf, other]

Retention Induced Biases in a Recommendation System with Heterogeneous Users

Authors: Shichao Ma

Abstract: I examine a conceptual model of a recommendation system (RS) with user inflow and churn dynamics. When inflow and churn balance out, the user distribution reaches a steady state. Changing the recommendation algorithm alters the steady state and creates a transition period. During this period, the RS behaves differently from its new steady state. In particular, A/B experiment metrics obtained in transition periods are biased indicators of the RS's long term performance. Scholars and practitioners, however, often conduct A/B tests shortly after introducing new algorithms to validate their effectiveness. This A/B experiment paradigm, widely regarded as the gold standard for assessing RS improvements, may consequently yield false conclusions. I also briefly discuss the data bias caused by the user retention dynamics.

Submitted 6 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.
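
The steady-state idea in the abstract above can be illustrated with a toy model. This is a sketch under strong assumptions that are not taken from the paper: a single homogeneous user pool, a constant per-period `inflow`, and a constant per-period `churn_rate`. Under those assumptions the pool follows N(t+1) = N(t)(1 - churn_rate) + inflow and settles at inflow / churn_rate.

```python
def simulate_users(inflow, churn_rate, steps, initial=0.0):
    """Iterate the user pool: survivors of churn plus a constant inflow."""
    users = initial
    history = []
    for _ in range(steps):
        users = users * (1.0 - churn_rate) + inflow
        history.append(users)
    return history


def steady_state(inflow, churn_rate):
    """Pool size at which per-period inflow equals per-period churn."""
    return inflow / churn_rate


if __name__ == "__main__":
    history = simulate_users(inflow=100, churn_rate=0.1, steps=200)
    print(history[9], history[-1], steady_state(100, 0.1))
```

In this toy setting a measurement taken after ten periods still sits well below the long-run level, which is the kind of transition-period gap that the abstract argues can bias early A/B metrics.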

arXiv:2402.12903 [pdf, ps, other]

Inverse problems for semilinear Schrödinger equations at large frequency via polynomial resolvent estimates on manifolds

Authors: Katya Krupchyk, Shiqi Ma, Suman Kumar Sahoo, Mikko Salo, Simon St-Amant

Abstract: We study inverse boundary problems for semilinear Schrödinger equations on smooth compact Riemannian manifolds of dimensions $\ge 2$ with smooth boundary, at a large fixed frequency. We show that certain classes of cubic nonlinearities are determined uniquely from the knowledge of the nonlinear Dirichlet--to--Neumann map at a large fixed frequency on quite general Riemannian manifolds. In particular, in contrast to the previous results available, here the manifolds need not satisfy any product structure, may have trapped geodesics, and the geodesic ray transform need not be injective. Only a mild assumption about the geometry of intersecting geodesics is required. We also establish a polynomial resolvent estimate for the Laplacian on an arbitrary smooth compact Riemannian manifold without boundary, valid for most frequencies. This estimate, along with the invariant construction of Gaussian beam quasimodes with uniform bounds for underlying constants and a stationary phase lemma with explicit control over all involved constants, constitutes the key elements in proving the uniqueness results for the considered inverse problems.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.12474 [pdf, other]

CGOLS V: Disk-wide Stellar Feedback and Observational Implications of the Cholla Galactic Wind Model

Authors: Evan E. Schneider, S. Alwin Mao

Abstract: We present the fifth simulation in the CGOLS project -- a set of isolated starburst galaxy simulations modeled over large scales ($10\,\mathrm{kpc}$) at uniformly high resolution ($\Delta x \approx 5\,\mathrm{pc}$). Supernova feedback in this simulation is implemented as a disk-wide distribution of clusters, and we assess the impact of this geometry on several features of the resulting outflow, including radial profiles of various phases; mass, momentum, and energy outflow rates; covering fraction of cool gas; mock absorption-line spectra; and X-ray surface brightness. In general, we find that the outflow generated by this model is cooler, slower, and contains more mass in the cool phase than a more centrally concentrated outflow driven by a similar number of supernovae. In addition, the energy loading factors in the hot phase are an order-of-magnitude lower, indicating much larger losses due to radiative cooling in the outflow. However, coupling between the hot and cool phases is more efficient than in the nuclear burst case, with almost 50% of the total outflowing energy flux carried by the cool phase at a radial distance of 5 kpc. These physical differences have corresponding signatures in observable quantities: the covering fraction of cool gas is much larger, and there is greater evidence of absorption in low and intermediate ionization-energy lines. Taken together, our simulations indicate that centrally-concentrated starbursts are more effective at driving hot, low-density outflows that will expand far into the halo, while galaxy-wide bursts may be more effective at removing cool gas from the disk.

Submitted 19 February, 2024; originally announced February 2024.

Comments: 22 pages, 13 figures, accepted in ApJ

arXiv:2402.11422 [pdf, other]

Mitigating Catastrophic Forgetting in Multi-domain Chinese Spelling Correction by Multi-stage Knowledge Transfer Framework

Authors: Peng Xing, Yinghui Li, Shirong Ma, Xinnian Liang, Haojing Huang, Yangning Li, Hai-Tao Zheng, Wenhao Jiang, Ying Shen

Abstract: Chinese Spelling Correction (CSC) aims to detect and correct spelling errors in given sentences. Recently, multi-domain CSC has gradually attracted the attention of researchers because it is more practicable. In this paper, we focus on the key flaw of the CSC model when adapting to multi-domain scenarios: the tendency to forget previously acquired knowledge upon learning new domain-specific knowledge (i.e., catastrophic forgetting). To address this, we propose a novel model-agnostic Multi-stage Knowledge Transfer (MKT) framework, which utilizes a continuously evolving teacher model for knowledge transfer in each domain, rather than focusing solely on new domain knowledge. It deserves to be mentioned that we are the first to apply continual learning methods to the multi-domain CSC task. Experiments prove the effectiveness of our proposed method, and further analyses demonstrate the importance of overcoming catastrophic forgetting for improving the model performance.

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.11420 [pdf, other]

Rethinking the Roles of Large Language Models in Chinese Grammatical Error Correction

Authors: Yinghui Li, Shang Qin, Jingheng Ye, Shirong Ma, Yangning Li, Libo Qin, Xuming Hu, Wenhao Jiang, Hai-Tao Zheng, Philip S. Yu

Abstract: Recently, Large Language Models (LLMs) have been widely studied by researchers for their roles in various downstream NLP tasks. As a fundamental task in the NLP field, Chinese Grammatical Error Correction (CGEC) aims to correct all potential grammatical errors in the input sentences. Previous studies have shown that LLMs' performance as correctors on CGEC remains unsatisfactory due to its challenging task focus. To promote the CGEC field to better adapt to the era of LLMs, we rethink the roles of LLMs in the CGEC task so that they can be better utilized and explored in CGEC. Considering the rich grammatical knowledge stored in LLMs and their powerful semantic understanding capabilities, we utilize LLMs as explainers to provide explanation information for the CGEC small models during error correction to enhance performance. We also use LLMs as evaluators to bring more reasonable CGEC evaluations, thus alleviating the troubles caused by the subjectivity of the CGEC task. In particular, our work is also an active exploration of how LLMs and small models better collaborate in downstream tasks. Extensive experiments and detailed analyses on widely used datasets verify the effectiveness of our thinking intuition and the proposed methods.

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.11100 [pdf, other]

When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models

Authors: Yinghui Li, Qingyu Zhou, Yuanzhen Luo, Shirong Ma, Yangning Li, Hai-Tao Zheng, Xuming Hu, Philip S. Yu

Abstract: Recently, Large Language Models (LLMs) make remarkable evolutions in language understanding and generation. Following this, various benchmarks for measuring all kinds of capabilities of LLMs have sprung up. In this paper, we challenge the reasoning and understanding abilities of LLMs by proposing a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp. Specifically, the cunning texts that FLUB focuses on mainly consist of the tricky, humorous, and misleading texts collected from the real internet environment. And we design three tasks with increasing difficulty in the FLUB benchmark to evaluate the fallacy understanding ability of LLMs. Based on FLUB, we investigate the performance of multiple representative and advanced LLMs, reflecting our FLUB is challenging and worthy of more future study. Interesting discoveries and valuable insights are achieved in our extensive experiments and detailed analyses. We hope that our benchmark can encourage the community to improve LLMs' ability to understand fallacies. Our data and codes are available at https://github.com/THUKElab/FLUB.

Submitted 9 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Showing 151–200 of 2,055 results for author: Ma, S