Showing 51–100 of 623 results for author: Qi, X

  1. arXiv:2405.11616  [pdf, other]

    cs.CV

    Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention

    Authors: Peng Li, Yuan Liu, Xiaoxiao Long, Feihu Zhang, Cheng Lin, Mengfei Li, Xingqun Qi, Shanghang Zhang, Wenhan Luo, Ping Tan, Wenping Wang, Qifeng Liu, Yike Guo

    Abstract: In this paper, we introduce Era3D, a novel multiview diffusion method that generates high-resolution multiview images from a single-view image. Despite significant advancements in multiview generation, existing methods still suffer from camera prior mismatch, inefficacy, and low resolution, resulting in poor-quality multiview images. Specifically, these methods assume that the input images should…

    Submitted 29 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

  2. arXiv:2405.04880  [pdf, other]

    cs.SD cs.AI eess.AS

    The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio

    Authors: Yuankun Xie, Yi Lu, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Jianhua Tao, Xin Qi, Xiaopeng Wang, Yukun Liu, Haonan Cheng, Long Ye, Yi Sun

    Abstract: With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for generalized detection methods. ALM-based deepfake audio currently exhibits widespread, high deception, and type versatility, posing a significant challenge to current audio deepfake detection (ADD) models trained solely on vocoded data. To effectively detect ALM-based deepfake audio, we focus on…

    Submitted 15 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  3. arXiv:2404.14047  [pdf, other]

    cs.LG

    An Empirical Study of LLaMA3 Quantization: From LLMs to MLLMs

    Authors: Wei Huang, Xingyu Zheng, Xudong Ma, Haotong Qin, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, Michele Magno

    Abstract: The LLaMA family has become one of the most powerful open-source Large Language Models (LLMs) and the popular LLM backbones of Multimodal Large Language Models (MLLMs), widely applied in Computer Vision (CV) and Natural Language Understanding (NLU) tasks. Notably, LLaMA3 models have recently been released and achieve impressive performance across various with super-large scale pre-training on over…

    Submitted 19 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  4. arXiv:2404.13013  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

    Authors: Chuofan Ma, Yi Jiang, Jiannan Wu, Zehuan Yuan, Xiaojuan Qi

    Abstract: We introduce Groma, a Multimodal Large Language Model (MLLM) with grounded and fine-grained visual perception ability. Beyond holistic image understanding, Groma is adept at region-level tasks such as region captioning and visual grounding. Such capabilities are built upon a localized visual tokenization mechanism, where an image input is decomposed into regions of interest and subsequently encode…

    Submitted 19 April, 2024; originally announced April 2024.

  5. arXiv:2404.11869  [pdf, other]

    cs.LG cs.SI

    Node-like as a Whole: Structure-aware Searching and Coarsening for Graph Classification

    Authors: Xiaorui Qi, Qijie Bai, Yanlong Wen, Haiwei Zhang, Xiaojie Yuan

    Abstract: Graph Transformers (GTs) have made remarkable achievements in graph-level tasks. However, most existing works regard graph structures as a form of guidance or bias for enhancing node representations, which focuses on node-central perspectives and lacks explicit representations of edges and structures. One natural question is, can we treat graph structures node-like as a whole to learn high-level f…

    Submitted 25 July, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  6. arXiv:2404.11525  [pdf, other]

    cs.CV eess.IV

    JointViT: Modeling Oxygen Saturation Levels with Joint Supervision on Long-Tailed OCTA

    Authors: Zeyu Zhang, Xuyin Qi, Mingxi Chen, Guangxi Li, Ryan Pham, Ayub Qassim, Ella Berry, Zhibin Liao, Owen Siggs, Robert Mclaughlin, Jamie Craig, Minh-Son To

    Abstract: The oxygen saturation level in the blood (SaO2) is crucial for health, particularly in relation to sleep-related breathing disorders. However, continuous monitoring of SaO2 is time-consuming and highly variable depending on patients' conditions. Recently, optical coherence tomography angiography (OCTA) has shown promising development in rapidly and effectively screening eye-related lesions, offeri…

    Submitted 28 July, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted to MIUA 2024 Oral

  7. arXiv:2404.09613  [pdf, other]

    cs.ET cs.AI cs.AR

    Efficient and accurate neural field reconstruction using resistive memory

    Authors: Yifei Yu, Shaocong Wang, Woyu Zhang, Xinyuan Zhang, Xiuzhe Wu, Yangu He, Jichang Yang, Yue Zhang, Ning Lin, Bo Wang, Xi Chen, Songqi Wang, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: Human beings construct perception of space by integrating sparse observations into massively interconnected synapses and neurons, offering a superior parallelism and efficiency. Replicating this capability in AI finds wide applications in medical imaging, AR/VR, and embodied AI, where input data is often sparse and computing resources are limited. However, traditional signal reconstruction methods…

    Submitted 15 April, 2024; originally announced April 2024.

  8. arXiv:2404.05648  [pdf, other]

    cs.AR cs.AI cs.ET cs.NE

    Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model

    Authors: Jichang Yang, Hegan Chen, Jia Chen, Songqi Wang, Shaocong Wang, Yifei Yu, Xi Chen, Bo Wang, Xinyuan Zhang, Binbin Cui, Yi Li, Ning Lin, Meng Xu, Yi Li, Xiaoxin Xu, Xiaojuan Qi, Zhongrui Wang, Xumeng Zhang, Dashan Shang, Han Wang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: Human brains image complicated scenes when reading a novel. Replicating this imagination is one of the ultimate goals of AI-Generated Content (AIGC). However, current AIGC methods, such as score-based diffusion, are still deficient in terms of rapidity and efficiency. This deficiency is rooted in the difference between the brain and digital computers. Digital computers have physically separated st…

    Submitted 8 April, 2024; originally announced April 2024.

  9. arXiv:2404.02026  [pdf]

    physics.optics cond-mat.mtrl-sci

    Infrared nanosensors of pico- to micro-newton forces

    Authors: Natalie Fardian-Melamed, Artiom Skripka, Changhwan Lee, Benedikt Ursprung, Thomas P. Darlington, Ayelet Teitelboim, Xiao Qi, Maoji Wang, Jordan M. Gerton, Bruce E. Cohen, Emory M. Chan, P. James Schuck

    Abstract: Mechanical force is an essential feature for many physical and biological processes.1-12 Remote measurement of mechanical signals with high sensitivity and spatial resolution is needed for diverse applications, including robotics,13 biophysics,14-20 energy storage,21-24 and medicine.25-27 Nanoscale luminescent force sensors excel at measuring piconewton forces,28-32 while larger sensors have prove…

    Submitted 2 April, 2024; originally announced April 2024.

  10. arXiv:2404.00409  [pdf, other]

    cs.CV cs.GR

    3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting

    Authors: Xiaoyang Lyu, Yang-Tian Sun, Yi-Hua Huang, Xiuzhe Wu, Ziyi Yang, Yilun Chen, Jiangmiao Pang, Xiaojuan Qi

    Abstract: In this paper, we present an implicit surface reconstruction method with 3D Gaussian Splatting (3DGS), namely 3DGSR, that allows for accurate 3D reconstruction with intricate details while inheriting the high efficiency and rendering quality of 3DGS. The key insight is incorporating an implicit signed distance field (SDF) within 3D Gaussians to enable them to be aligned and jointly optimized. Firs…

    Submitted 30 March, 2024; originally announced April 2024.

  11. arXiv:2403.19314  [pdf, other]

    cs.CV

    Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction

    Authors: Xiaoyang Lyu, Chirui Chang, Peng Dai, Yang-Tian Sun, Xiaojuan Qi

    Abstract: Scene reconstruction from multi-view images is a fundamental problem in computer vision and graphics. Recent neural implicit surface reconstruction methods have achieved high-quality results; however, editing and manipulating the 3D geometry of reconstructed scenes remains challenging due to the absence of naturally decomposed object entities and complex object/background compositions. In this pap…

    Submitted 30 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures, accepted by CVPR 2024

  12. arXiv:2403.14760  [pdf, other]

    cs.CV

    Can 3D Vision-Language Models Truly Understand Natural Language?

    Authors: Weipeng Deng, Jihan Yang, Runyu Ding, Jiahui Liu, Yijiang Li, Xiaojuan Qi, Edith Ngai

    Abstract: Rapid advancements in 3D vision-language (3D-VL) tasks have opened up new avenues for human interaction with embodied agents or robots using natural language. Despite this progress, we find a notable limitation: existing 3D-VL models exhibit sensitivity to the styles of language input, struggling to understand sentences with the same semantic meaning but written in different variants. This observa…

    Submitted 3 July, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: https://github.com/VincentDENGP/3D-LR

  13. arXiv:2403.12104  [pdf, other]

    cond-mat.mes-hall physics.optics

    Topology reconstruction for asymmetric systems by isomorphic mapping or perturbation approximation

    Authors: Yunlin Li, Jingguang Chen, Xingchao Qi, Langlang Xiong, Xianjun Wang, Yufu Liu, Fang Guan, Lei Shi, Xunya Jiang

    Abstract: The systems without symmetries, e.g. the spatial and chiral symmetries, are generally thought to be improper for topological study and no conventional integral topological invariant can be well defined. In this work, with multi-band asymmetric Rice-Mele-like systems as examples, for the first time we show that the topology of all gaps can be reconstructed by two general methods and topological ori…

    Submitted 24 March, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

  14. arXiv:2403.12035  [pdf, other]

    cs.CV

    CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

    Authors: Bojia Zi, Shihao Zhao, Xianbiao Qi, Jianan Wang, Yukai Shi, Qianyu Chen, Bin Liang, Kam-Fai Wong, Lei Zhang

    Abstract: Recent advancements in video generation have been remarkable, yet many existing methods struggle with issues of consistency and poor text-video alignment. Moreover, the field lacks effective techniques for text-guided video inpainting, a stark contrast to the well-explored domain of text-guided image inpainting. To this end, this paper proposes a novel text-guided video inpainting model that achie…

    Submitted 18 March, 2024; originally announced March 2024.

  15. arXiv:2403.10071  [pdf, other]

    cs.CV

    Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling

    Authors: Baoquan Zhang, Huaibin Wang, Luo Chuyao, Xutao Li, Liang Guotao, Yunming Ye, Xiaochen Qi, Yao He

    Abstract: Vector-Quantized Image Modeling (VQIM) is a fundamental research problem in image synthesis, which aims to represent an image with a discrete token sequence. Existing studies effectively address this problem by learning a discrete codebook from scratch and in a code-independent manner to quantize continuous representations into discrete tokens. However, learning a codebook from scratch and in a co…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  16. arXiv:2403.06384  [pdf, other]

    physics.atom-ph

    Precision Spectroscopy and Nuclear Structure Parameters in 7Li+ ion

    Authors: Hua Guan, Xiao-Qiu Qi, Peng-Peng Zhou, Wei Sun, Shao-Long Chen, Xu-Rui Chang, Yao Huang, Pei-Pei Zhang, Zong-Chao Yan, G. W. F. Drake, Ai-Xi Chen, Zhen-Xiang Zhong, Ting-Yun Shi, Ke-Lin Gao

    Abstract: The optical Ramsey technique is used to obtain precise measurements of the hyperfine splittings in the $2\,^3\!S_1$ and $2\,^3\!P_J$ states of $^7$Li$^+$. Together with bound-state quantum electrodynamic theory, the Zemach radius and quadrupole moment of the $^7$Li nucleus are determined to be $3.35(1)$~fm and $-3.86(5)$~fm$^2$ respectively, with the quadrupole moment deviating from the recommende…

    Submitted 10 March, 2024; originally announced March 2024.

  17. arXiv:2403.05895  [pdf, other]

    cs.CV

    DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos

    Authors: Xiuzhe Wu, Xiaoyang Lyu, Qihao Huang, Yong Liu, Yang Wu, Ying Shan, Xiaojuan Qi

    Abstract: Although considerable advancements have been attained in self-supervised depth estimation from monocular videos, most existing methods often treat all objects in a video as static entities, which however violates the dynamic nature of real-world scenes and fails to model the geometry and motion of moving objects. In this paper, we propose a self-supervised method to jointly learn 3D motion and dep…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: 24 pages, 14 figures, Tech Report

  18. arXiv:2403.04098  [pdf]

    physics.optics physics.app-ph

    Intrinsic Optical Bistability of Photon Avalanching Nanocrystals

    Authors: Artiom Skripka, Zhuolei Zhang, Xiao Qi, Benedikt Ursprung, Peter Ercius, Bruce E. Cohen, P. James Schuck, Daniel Jaque, Emory M. Chan

    Abstract: Optically bistable materials respond to a single input with two possible optical outputs, contingent upon excitation history. Such materials would be ideal for optical switching and memory, yet limited understanding of intrinsic optical bistability (IOB) prevents development of nanoscale IOB materials suitable for devices. Here, we demonstrate IOB in Nd3+-doped KPb2Cl5 avalanching nanoparticles (A…

    Submitted 6 March, 2024; originally announced March 2024.

  19. arXiv:2403.02579  [pdf, other]

    cond-mat.dis-nn cs.LG

    Geometric Dynamics of Signal Propagation Predict Trainability of Transformers

    Authors: Aditya Cowsik, Tamra Nebabu, Xiao-Liang Qi, Surya Ganguli

    Abstract: We investigate forward signal propagation and gradient back propagation in deep, randomly initialized transformers, yielding simple necessary and sufficient conditions on initialization hyperparameters that ensure trainability of deep transformers. Our approach treats the evolution of the representations of $n$ tokens as they propagate through the transformer layers in terms of a discrete time dyn…

    Submitted 4 March, 2024; originally announced March 2024.

  20. arXiv:2402.18133  [pdf, other]

    cs.LG cs.CV

    Classes Are Not Equal: An Empirical Study on Image Recognition Fairness

    Authors: Jiequan Cui, Beier Zhu, Xin Wen, Xiaojuan Qi, Bei Yu, Hanwang Zhang

    Abstract: In this paper, we present an empirical study on image recognition fairness, i.e., extreme class accuracy disparity on balanced data like ImageNet. We experimentally demonstrate that classes are not equal and the fairness issue is prevalent for image classification models across various datasets, network architectures, and model capacities. Moreover, several intriguing properties of fairness are id…

    Submitted 12 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: CVPR 2024

  21. arXiv:2402.15870  [pdf, other]

    cs.CV

    Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting

    Authors: Ziyi Yang, Xinyu Gao, Yangtian Sun, Yihua Huang, Xiaoyang Lyu, Wen Zhou, Shaohui Jiao, Xiaojuan Qi, Xiaogang Jin

    Abstract: The recent advancements in 3D Gaussian splatting (3D-GS) have not only facilitated real-time rendering through modern GPU rasterization pipelines but have also attained state-of-the-art rendering quality. Nevertheless, despite its exceptional rendering quality and performance on standard datasets, 3D-GS frequently encounters difficulties in accurately modeling specular and anisotropic components.…

    Submitted 24 February, 2024; originally announced February 2024.

  22. arXiv:2402.14968  [pdf, other]

    cs.CR cs.CL

    Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment

    Authors: Jiongxiao Wang, Jiazhao Li, Yiquan Li, Xiangyu Qi, Junjie Hu, Yixuan Li, Patrick McDaniel, Muhao Chen, Bo Li, Chaowei Xiao

    Abstract: Despite the general capabilities of Large Language Models (LLM), these models still request fine-tuning or adaptation with customized data when meeting specific business demands. However, this process inevitably introduces new threats, particularly against the Fine-tuning based Jailbreak Attack (FJAttack) under the setting of Language-Model-as-a-Service (LMaaS), where the model's safety has been s…

    Submitted 20 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  23. arXiv:2402.14577  [pdf, other]

    cs.CV

    Debiasing Text-to-Image Diffusion Models

    Authors: Ruifei He, Chuhui Xue, Haoru Tan, Wenqing Zhang, Yingchen Yu, Song Bai, Xiaojuan Qi

    Abstract: Learning-based Text-to-Image (TTI) models like Stable Diffusion have revolutionized the way visual content is generated in various domains. However, recent research has shown that nonnegligible social bias exists in current state-of-the-art TTI systems, which raises important concerns. In this work, we target resolving the social bias in TTI diffusion models. We begin by formalizing the problem se…

    Submitted 22 February, 2024; originally announced February 2024.

  24. arXiv:2402.12791  [pdf, other]

    physics.optics

    Dual-polarization huge photonic spin Hall shift and deep-subwavelength sensing based on topological singularities in one-dimensional photonic crystals

    Authors: Yufu Liu, Xianjun Wang, Yunlin Li, Haoran Zhang, Langlang Xiong, Xingchao Qi, Zhen Lai, Xuezhi Wang, Xunya Jiang

    Abstract: Although several efforts have been taken to enhance the photonic spin Hall shift in deep-subwavelength region, according to effective medium theory, the fundamental confliction between near-zero reflection coefficient and near-zero incident angle still hinders the further application. Here, we reveal a fundamental breakdown of effective medium theory due to the existing of topological singularity…

    Submitted 20 February, 2024; originally announced February 2024.

  25. arXiv:2402.05162  [pdf, other]

    cs.LG cs.AI cs.CL

    Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

    Authors: Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson

    Abstract: Large language models (LLMs) show inherent brittleness in their safety mechanisms, as evidenced by their susceptibility to jailbreaking and even non-malicious fine-tuning. This study explores this brittleness of safety alignment by leveraging pruning and low-rank modifications. We develop methods to identify critical regions that are vital for safety guardrails, and that are disentangled from util…

    Submitted 1 July, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: 22 pages, 9 figures. Project page is available at https://boyiwei.com/alignment-attribution/

  26. arXiv:2402.04291  [pdf, other]

    cs.LG cs.AI cs.CL

    BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

    Authors: Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang, Xianglong Liu, Michele Magno, Xiaojuan Qi

    Abstract: Pretrained large language models (LLMs) exhibit exceptional general language processing capabilities but come with significant demands on memory and computational resources. As a powerful compression technology, binarization can extremely reduce model weights to a mere 1 bit, lowering the expensive computation and memory requirements. However, existing quantization techniques fall short of maintai…

    Submitted 15 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: 19 pages

  27. arXiv:2402.03908  [pdf, other]

    cs.CV

    EscherNet: A Generative Model for Scalable View Synthesis

    Authors: Xin Kong, Shikun Liu, Xiaoyang Lyu, Marwan Taher, Xiaojuan Qi, Andrew J. Davison

    Abstract: We introduce EscherNet, a multi-view conditioned diffusion model for view synthesis. EscherNet learns implicit and generative 3D representations coupled with a specialised camera positional encoding, allowing precise and continuous relative control of the camera transformation between an arbitrary number of reference and target views. EscherNet offers exceptional generality, flexibility, and scala…

    Submitted 19 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: CVPR2024 Project Page: https://kxhit.github.io/EscherNet

  28. arXiv:2402.03310  [pdf, other]

    cs.AI cs.CV

    V-IRL: Grounding Virtual Intelligence in Real Life

    Authors: Jihan Yang, Runyu Ding, Ellis Brown, Xiaojuan Qi, Saining Xie

    Abstract: There is a sensory gulf between the Earth that humans inhabit and the digital realms in which modern AI agents are created. To develop AI agents that can sense, think, and act as flexibly as humans in real-world settings, it is imperative to bridge the realism gap between the digital and physical worlds. How can we embody agents in an environment as rich and diverse as the one we inhabit, without…

    Submitted 18 July, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Project page: https://virl-platform.github.io

  29. arXiv:2401.11372  [pdf, other]

    cs.RO cs.LG

    Back-stepping Experience Replay with Application to Model-free Reinforcement Learning for a Soft Snake Robot

    Authors: Xinda Qi, Dong Chen, Zhaojian Li, Xiaobo Tan

    Abstract: In this paper, we propose a novel technique, Back-stepping Experience Replay (BER), that is compatible with arbitrary off-policy reinforcement learning (RL) algorithms. BER aims to enhance learning efficiency in systems with approximate reversibility, reducing the need for complex reward shaping. The method constructs reversed trajectories using back-stepping transitions to reach random or fixed t…

    Submitted 20 January, 2024; originally announced January 2024.

    Comments: Submitted to the IEEE for possible publication

  30. arXiv:2401.06377  [pdf, other]

    cs.RO

    Design and Nonlinear Modeling of a Modular Cable Driven Soft Robotic Arm

    Authors: Xinda Qi, Yu Mei, Dong Chen, Zhaojian Li, Xiaobo Tan

    Abstract: We propose a novel multi-section cable-driven soft robotic arm inspired by octopus tentacles along with a new modeling approach. Each section of the modular manipulator is made of a soft tubing backbone, a soft silicon arm body, and two rigid endcaps, which connect adjacent sections and decouple the actuation cables of different sections. The soft robotic arm is made with casting after the rigid e…

    Submitted 15 May, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: The paper has been accepted by IEEE Transactions on Mechatronics

  31. arXiv:2401.05750  [pdf, other]

    cs.CV

    GO-NeRF: Generating Virtual Objects in Neural Radiance Fields

    Authors: Peng Dai, Feitong Tan, Xin Yu, Yinda Zhang, Xiaojuan Qi

    Abstract: Despite advances in 3D generation, the direct creation of 3D objects within an existing 3D scene represented as NeRF remains underexplored. This process requires not only high-quality 3D object generation but also seamless composition of the generated 3D content into the existing NeRF. To this end, we propose a new method, GO-NeRF, capable of utilizing scene context for high-quality and harmonious…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 12 pages

    MSC Class: ACM-class

  32. arXiv:2401.02475  [pdf, other]

    quant-ph hep-th

    Space-time generalization of mutual information

    Authors: Paolo Glorioso, Xiao-Liang Qi, Zhenbin Yang

    Abstract: The mutual information characterizes correlations between spatially separated regions of a system. Yet, in experiments we often measure dynamical correlations, which involve probing operators that are also separated in time. Here, we introduce a space-time generalization of mutual information which, by construction, satisfies several natural properties of the mutual information and at the same tim…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: 33 pages, 12 figures

  33. arXiv:2401.00149  [pdf, ps, other]

    quant-ph

    Properties of new even and odd nonlinear coherent states with different parameters

    Authors: Cheng Zhang, Rui-Jiao Miao, Xiao-Qiu Qi

    Abstract: We construct a class of nonlinear coherent states (NLCSs) by introducing a more general nonlinear function and study their non-classical properties, specifically the second-order correlation function $g^{(2)}(0)$, Mandel parameter $Q$, squeezing, amplitude squared squeezing and Wigner function of the optical field. The results indicate that the non-classical properties of the new types of even and…

    Submitted 30 December, 2023; originally announced January 2024.

  34. arXiv:2312.14937  [pdf, other]

    cs.CV cs.GR

    SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes

    Authors: Yi-Hua Huang, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi

    Abstract: Novel view synthesis for dynamic scenes is still a challenging problem in computer vision and graphics. Recently, Gaussian splatting has emerged as a robust technique to represent static scenes and enable high-quality and real-time novel view synthesis. Building upon this technique, we propose a new representation that explicitly decomposes the motion and appearance of dynamic scenes into sparse c…

    Submitted 31 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Code link: https://github.com/yihua7/SC-GS

  35. arXiv:2312.12824  [pdf, other]

    eess.IV cs.CV

    FedSODA: Federated Cross-assessment and Dynamic Aggregation for Histopathology Segmentation

    Authors: Yuan Zhang, Yaolei Qi, Xiaoming Qi, Lotfi Senhadji, Yongyue Wei, Feng Chen, Guanyu Yang

    Abstract: Federated learning (FL) for histopathology image segmentation involving multiple medical sites plays a crucial role in advancing the field of accurate disease diagnosis and treatment. However, it is still a task of great challenges due to the sample imbalance across clients and large data heterogeneity from disparate organs, variable segmentation tasks, and diverse distribution. Thus, we propose a…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP2024

  36. arXiv:2312.11960  [pdf, ps, other]

    cs.CC math.CO

    Offensive Alliances in Signed Graphs

    Authors: Zhidan Feng, Henning Fernau, Kevin Mann, Xingqin Qi

    Abstract: Signed graphs have been introduced to enrich graph structures expressing relationships between persons or general social entities, introducing edge signs to reflect the nature of the relationship, e.g., friendship or enmity. Independently, offensive alliances have been defined and studied for undirected, unsigned graphs. We join both lines of research and define offensive alliances in signed graph…

    Submitted 19 December, 2023; originally announced December 2023.

  37. arXiv:2312.11843  [pdf, other]

    cs.RO

    Enhancing Social Decision-Making of Autonomous Vehicles: A Mixed-Strategy Game Approach With Interaction Orientation Identification

    Authors: Jiaqi Liu, Xiao Qi, Peng Hang, Jian Sun

    Abstract: The integration of Autonomous Vehicles (AVs) into existing human-driven traffic systems poses considerable challenges, especially within environments where human and machine interactions are frequent and complex, such as at unsignalized intersections. To deal with these challenges, we introduce a novel framework predicated on dynamic and socially-aware decision-making game theory to augment the so…

    Submitted 4 April, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  38. arXiv:2312.11084  [pdf, other]

    cs.RO cs.MA

    Multi-Agent Reinforcement Learning for Connected and Automated Vehicles Control: Recent Advancements and Future Prospects

    Authors: Min Hua, Dong Chen, Xinda Qi, Kun Jiang, Zemin Eitan Liu, Quan Zhou, Hongming Xu

    Abstract: Connected and automated vehicles (CAVs) have emerged as a potential solution to the future challenges of developing safe, efficient, and eco-friendly transportation systems. However, CAV control presents significant challenges, given the complexity of interconnectivity and coordination required among the vehicles. To address this, multi-agent reinforcement learning (MARL), with its notable advance…

    Submitted 16 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  39. arXiv:2312.09424  [pdf, other]

    cs.CL cs.AI

    Open Domain Knowledge Extraction for Knowledge Graphs

    Authors: Kun Qian, Anton Belyi, Fei Wu, Samira Khorshidi, Azadeh Nikfarjam, Rahul Khot, Yisi Sang, Katherine Luna, Xianqi Chu, Eric Choi, Yash Govind, Chloe Seivwright, Yiwen Sun, Ahmed Fakhry, Theo Rekatsinas, Ihab Ilyas, Xiaoguang Qi, Yunyao Li

    Abstract: The quality of a knowledge graph directly impacts the quality of downstream applications (e.g. the number of answerable questions using the graph). One ongoing challenge when building a knowledge graph is to ensure completeness and freshness of the graph's entities and facts. In this paper, we introduce ODKE, a scalable and extensible framework that sources high-quality entities and facts from ope…

    Submitted 30 October, 2023; originally announced December 2023.

    Comments: 7 pages, 7 figures, 5 tables, preprint technical report, no code or data is released

    MSC Class: 68T30 (primary) ACM Class: F.4.1; I.2.4

  40. arXiv:2312.09262  [pdf, other]

    cs.LG cs.AR

    Random resistive memory-based deep extreme point learning machine for unified visual processing

    Authors: Shaocong Wang, Yizhao Gao, Yi Li, Woyu Zhang, Yifei Yu, Bo Wang, Ning Lin, Hegan Chen, Yue Zhang, Yang Jiang, Dingchen Wang, Jia Chen, Peng Dai, Hao Jiang, Peng Lin, Xumeng Zhang, Xiaojuan Qi, Xiaoxin Xu, Hayden So, Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: Visual sensors, including 3D LiDAR, neuromorphic DVS sensors, and conventional frame cameras, are increasingly integrated into edge-side intelligent machines. Realizing intensive multi-sensory data analysis directly on edge intelligent machines is crucial for numerous emerging edge applications, such as augmented and virtual reality and unmanned aerial vehicles, which necessitates unified data rep…

    Submitted 14 December, 2023; originally announced December 2023.

  41. arXiv:2312.08754  [pdf, other]

    cs.CV

    UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation

    Authors: Zexiang Liu, Yangguang Li, Youtian Lin, Xin Yu, Sida Peng, Yan-Pei Cao, Xiaojuan Qi, Xiaoshui Huang, Ding Liang, Wanli Ouyang

    Abstract: Recent advancements in text-to-3D generation technology have significantly advanced the conversion of textual descriptions into imaginative well-geometrical and finely textured 3D objects. Despite these developments, a prevalent limitation arises from the use of RGB data in diffusion or reconstruction models, which often results in models with inherent lighting and shadows effects that detract fro…

    Submitted 13 July, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted to ECCV 2024

  42. arXiv:2312.04044  [pdf, other]

    cs.CV

    Residual Graph Convolutional Network for Bird's-Eye-View Semantic Segmentation

    Authors: Qiuxiao Chen, Xiaojun Qi

    Abstract: Retrieving spatial information and understanding the semantic information of the surroundings are important for Bird's-Eye-View (BEV) semantic segmentation. In the application of autonomous driving, autonomous vehicles need to be aware of their surroundings to drive safely. However, current BEV semantic segmentation techniques, deep Convolutional Neural Networks (CNNs) and transformers, have diffi…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: 8 pages, 5 figures; accepted by and to be presented at WACV 2024

  43. arXiv:2311.17963  [pdf, other]

    cs.CV

    M$^{2}$Chat: Empowering VLM for Multimodal LLM Interleaved Text-Image Generation

    Authors: Xiaowei Chi, Rongyu Zhang, Zhengkai Jiang, Yijiang Liu, Yatian Wang, Xingqun Qi, Wenhan Luo, Peng Gao, Shanghang Zhang, Qifeng Liu, Yike Guo

    Abstract: While current LLM chatbots like GPT-4V bridge the gap between human instructions and visual representations to enable text-image generations, they still lack efficient alignment methods for high-fidelity performance on multiple downstream tasks. In this paper, we propose M$^{2}$Chat, a novel unified multimodal LLM framework for generating interleaved text-image conversation across various…

    Submitted 13 April, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

  44. arXiv:2311.17532  [pdf, other]

    cs.CV

    Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation

    Authors: Xingqun Qi, Jiahao Pan, Peng Li, Ruibin Yuan, Xiaowei Chi, Mengfei Li, Wenhan Luo, Wei Xue, Shanghang Zhang, Qifeng Liu, Yike Guo

    Abstract: Generating vivid and emotional 3D co-speech gestures is crucial for virtual avatar animation in human-machine interaction applications. While the existing methods enable generating the gestures to follow a single emotion label, they overlook that long gesture sequence modeling with emotion transition is more practical in real scenes. In addition, the lack of large-scale available datasets with emo…

    Submitted 27 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: Accepted by CVPR 2024

  45. arXiv:2311.13200  [pdf, other]

    cs.CV

    Self-guided Few-shot Semantic Segmentation for Remote Sensing Imagery Based on Large Vision Models

    Authors: Xiyu Qi, Yifan Wu, Yongqiang Mao, Wenhui Zhang, Yidan Zhang

    Abstract: The Segment Anything Model (SAM) exhibits remarkable versatility and zero-shot learning abilities, owing largely to its extensive training data (SA-1B). Recognizing SAM's dependency on manual guidance given its category-agnostic nature, we identified unexplored potential within few-shot semantic segmentation tasks for remote sensing imagery. This research introduces a structured framework designed…

    Submitted 22 November, 2023; originally announced November 2023.

  46. arXiv:2311.12315  [pdf, other]

    cs.CL

    AcademicGPT: Empowering Academic Research

    Authors: Shufa Wei, Xiaolong Xu, Xianbiao Qi, Xi Yin, Jun Xia, Jingyi Ren, Peijun Tang, Yuxiang Zhong, Yihao Chen, Xiaoqin Ren, Yuxin Liang, Liankai Huang, Kai Xie, Weikang Gui, Wei Tan, Shuanglong Sun, Yongquan Hu, Qinxian Liu, Nanjin Li, Chihao Dai, Lihua Wang, Xiaohui Liu, Lei Zhang, Yutao Xie

    Abstract: Large Language Models (LLMs) have demonstrated exceptional capabilities across various natural language processing tasks. Yet, many of these advanced LLMs are tailored for broad, general-purpose applications. In this technical report, we introduce AcademicGPT, designed specifically to empower academic research. AcademicGPT is a continual training model derived from LLaMA2-70B. Our training corpus…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: Technical Report. arXiv admin note: text overlap with arXiv:2310.12081, arXiv:2310.10053 by other authors

  47. arXiv:2311.08424  [pdf, other]

    cs.SE

    Exploring Multi-Programming-Language Commits and Their Impacts on Software Quality: An Empirical Study on Apache Projects

    Authors: Zengyang Li, Xiaoxiao Qi, Qinyi Yu, Peng Liang, Ran Mo, Chen Yang

    Abstract: Context: Modern software systems (e.g., Apache Spark) are usually written in multiple programming languages (PLs). There is little understanding on the phenomenon of multi-programming-language commits (MPLCs), which involve modified source files written in multiple PLs. Objective: This work aims to explore MPLCs and their impacts on development difficulty and software quality. Methods: We performe…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: Preprint accepted for publication in Journal of Systems and Software, 2022. arXiv admin note: substantial text overlap with arXiv:2103.11691

  48. arXiv:2311.07164  [pdf, other]

    cs.ET cs.AI cs.AR

    Pruning random resistive memory for optimizing analogue AI

    Authors: Yi Li, Songqi Wang, Yaping Zhao, Shaocong Wang, Woyu Zhang, Yangu He, Ning Lin, Binbin Cui, Xi Chen, Shiming Zhang, Hao Jiang, Peng Lin, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Xiaoxin Xu, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: The rapid advancement of artificial intelligence (AI) has been marked by the large language models exhibiting human-like intelligence. However, these models also present unprecedented challenges to energy consumption and environmental sustainability. One promising solution is to revisit analogue computing, a technique that predates digital computing and exploits emerging analogue electronic device…

    Submitted 13 November, 2023; originally announced November 2023.

  49. arXiv:2311.06956  [pdf, other]

    cs.CV

    SegReg: Segmenting OARs by Registering MR Images and CT Annotations

    Authors: Zeyu Zhang, Xuyin Qi, Bowen Zhang, Biao Wu, Hien Le, Bora Jeong, Zhibin Liao, Yunxiang Liu, Johan Verjans, Minh-Son To, Richard Hartley

    Abstract: Organ at risk (OAR) segmentation is a critical process in radiotherapy treatment planning such as head and neck tumors. Nevertheless, in clinical practice, radiation oncologists predominantly perform OAR segmentations manually on CT scans. This manual process is highly time-consuming and expensive, limiting the number of patients who can receive timely radiotherapy. Additionally, CT scans offer lo…

    Submitted 1 March, 2024; v1 submitted 12 November, 2023; originally announced November 2023.

    Comments: Accepted to ISBI 2024

  50. arXiv:2311.05470  [pdf, other]

    cs.CE

    Designing ship hull forms using generative adversarial networks

    Authors: Kazuo Yonekura, Kotaro Omori, Xinran Qi, Katsuyuki Suzuki

    Abstract: We proposed a GAN-based method to generate a ship hull form. Unlike mathematical hull forms that require geometrical parameters to generate ship hull forms, the proposed method requires desirable ship performance parameters, i.e., the drag coefficient and tonnage. The requirements of ship owners are generally focused on the ship performance and not the geometry itself. Hence, the proposed model is…

    Submitted 9 November, 2023; originally announced November 2023.