Showing 1–50 of 388 results for author: Zou, X

  1. arXiv:2408.09429  [pdf, other]

    cs.LG cs.CL cs.CV

    Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models

    Authors: Kening Zheng, Junkai Chen, Yibo Yan, Xin Zou, Xuming Hu

    Abstract: Hallucination issues persistently plague current multimodal large language models (MLLMs). Existing research primarily focuses on object-level or attribute-level hallucinations, sidelining the more sophisticated relation hallucinations that necessitate advanced reasoning abilities from MLLMs. Besides, recent benchmarks regarding relation hallucinations lack in-depth evaluation and effective…

    Submitted 18 August, 2024; originally announced August 2024.

  2. arXiv:2408.08802  [pdf, other]

    cs.CV

    PriorMapNet: Enhancing Online Vectorized HD Map Construction with Priors

    Authors: Rongxuan Wang, Xin Lu, Xiaoyang Liu, Xiaoyi Zou, Tongyi Cao, Ying Li

    Abstract: Online vectorized High-Definition (HD) map construction is crucial for subsequent prediction and planning tasks in autonomous driving. Following the MapTR paradigm, recent works have made noteworthy achievements. However, reference points are randomly initialized in mainstream methods, leading to unstable matching between predictions and ground truth. To address this issue, we introduce PriorMapNet to…

    Submitted 20 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

  3. arXiv:2408.02878  [pdf]

    physics.optics physics.app-ph

    Ultrahigh-speed thin-film lithium niobate optical coherent receiver

    Authors: Xiaojun Xie, Chao Wei, Xingchen He, Yake Chen, Chenghao Wang, Jihui Sun, Lin Jiang, Jia Ye, Xihua Zou, Wei Pan, Lianshan Yan

    Abstract: The rapid advancement of the thin-film lithium niobate platform has established it as a premier choice for high-performance photonics integration. High-speed optical coherent receivers are essential for supporting the large communication capacities required by data center interconnects. Although high-speed photodiodes have been demonstrated on the thin-film LiNbO3 platform, the development of an u…

    Submitted 5 August, 2024; originally announced August 2024.

  4. arXiv:2407.17062  [pdf, other]

    astro-ph.SR

    MK-like spectral classification for hot subdwarf stars with LAMOST spectra

    Authors: Xuan Zou, Zhenxin Lei

    Abstract: An MK-like spectral classification has been conducted for 1224 hot subdwarf stars with LAMOST DR9 low-resolution spectra. The whole sample was divided into four categories according to the spectral line characteristics: He-normal, He-weak, He-strong C and He-strong. Each selected spectrum was assigned a spectral class, a luminosity class and a helium class by comparing the line depth and width wi…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 12 pages, 10 figures, 2 tables, accepted for publication in PASJ

  5. arXiv:2407.15354  [pdf, other]

    cs.CV cs.RO

    Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection

    Authors: Zhili Chen, Shuangjie Xu, Maosheng Ye, Zian Qian, Xiaoyi Zou, Dit-Yan Yeung, Qifeng Chen

    Abstract: The Bird's-Eye-View (BEV) representation is a critical factor that directly impacts the 3D object detection performance, but the traditional BEV grid representation induces quadratic computational cost as the spatial resolution grows. To address this limitation, we present a new camera-based 3D object detector with high-resolution vector representation: VectorFormer. The presented high-resolution…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. Project page: https://github.com/zlichen/VectorFormer

  6. arXiv:2407.02534  [pdf, other]

    cs.CR cs.CV

    Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything

    Authors: Xiaotian Zou, Ke Li, Yongkang Chen

    Abstract: Large Visual Language Models (VLMs) such as GPT-4V have achieved remarkable success in generating comprehensive and nuanced responses. Researchers have proposed various benchmarks for evaluating the capabilities of VLMs. With the integration of visual and text inputs in VLMs, new security issues emerge, as malicious attackers can exploit multiple modalities to achieve their objectives. This…

    Submitted 26 August, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  7. arXiv:2406.18358  [pdf, other]

    physics.plasm-ph physics.app-ph

    Microscopic characteristics of SF6 partial discharge induced by a floating linear metal particle

    Authors: Zihao Feng, Yuanyuan Jiang, Liyang Zhang, Zhigang Liu, Kai Wang, Xinxin Wang, Xiaobing Zou, Haiyun Luo, Yangyang Fu

    Abstract: Direct current (DC) gas insulated transmission lines (GILs) have been widely used in power transmission, but might be threatened by partial discharge due to the presence of floating impurities (e.g., dust and metal particles) inside the sealed chamber. In this letter, by using a 2D fluid model we characterize the microscopic properties of the partial discharge induced by a floating linear metal pa…

    Submitted 20 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  8. arXiv:2406.18007  [pdf, other]

    cs.MM

    Deep Mamba Multi-modal Learning

    Authors: Jian Zhu, Xin Zou, Yu Cui, Zhangmin Huang, Chenshu Hu, Bo Lyu

    Abstract: Inspired by the excellent performance of Mamba networks, we propose a novel Deep Mamba Multi-modal Learning (DMML). It can be used to achieve the fusion of multi-modal features. We apply DMML to the field of multimedia retrieval and propose an innovative Deep Mamba Multi-modal Hashing (DMMH) method. It combines the advantages of algorithm accuracy and inference speed. We validated the effectivenes…

    Submitted 9 April, 2024; originally announced June 2024.

    Comments: Deep Mamba Multi-modal Learning; Deep Mamba Multi-modal Hashing

  9. arXiv:2406.17952  [pdf, other]

    cs.LG cs.CG

    LINSCAN -- A Linearity Based Clustering Algorithm

    Authors: Andrew Dennehy, Xiaoyu Zou, Shabnam J. Semnani, Yuri Fialko, Alexander Cloninger

    Abstract: DBSCAN and OPTICS are powerful algorithms for identifying clusters of points in domains where few assumptions can be made about the structure of the data. In this paper, we leverage these strengths and introduce a new algorithm, LINSCAN, designed to seek lineated clusters that are difficult to find and isolate with existing methods. In particular, by embedding points as normal distributions approx…

    Submitted 25 June, 2024; originally announced June 2024.

  10. arXiv:2406.16020  [pdf, other]

    cs.SD cs.CL eess.AS

    AudioBench: A Universal Benchmark for Audio Large Language Models

    Authors: Bin Wang, Xunlong Zou, Geyu Lin, Shuo Sun, Zhuohan Liu, Wenyu Zhang, Zhengyuan Liu, AiTi Aw, Nancy F. Chen

    Abstract: We introduce AudioBench, a new benchmark designed to evaluate audio large language models (AudioLLMs). AudioBench encompasses 8 distinct tasks and 26 carefully selected or newly curated datasets, focusing on speech understanding, voice interpretation, and audio scene understanding. Despite the rapid advancement of large language models, including multimodal versions, a significant gap exists in co…

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: 20 pages; v2 - typo update; Code: https://github.com/AudioLLMs/AudioBench

  11. arXiv:2406.14069  [pdf, other]

    eess.IV cs.CV

    Towards Multi-modality Fusion and Prototype-based Feature Refinement for Clinically Significant Prostate Cancer Classification in Transrectal Ultrasound

    Authors: Hong Wu, Juan Fu, Hongsheng Ye, Yuming Zhong, Xuebin Zou, Jianhua Zhou, Yi Wang

    Abstract: Prostate cancer is a highly prevalent cancer and ranks as the second leading cause of cancer-related deaths in men globally. Recently, the utilization of multi-modality transrectal ultrasound (TRUS) has gained significant traction as a valuable technique for guiding prostate biopsies. In this study, we propose a novel learning framework for clinically significant prostate cancer (csPCa) classifica…

    Submitted 20 June, 2024; originally announced June 2024.

  12. arXiv:2406.12943  [pdf]

    eess.IV

    A square cross-section FOV rotational CL (SC-CL) and its analytical reconstruction method

    Authors: Xiang Zou, Wuliang Shi, Muge Du, Yuxiang Xing

    Abstract: Rotational computed laminography (CL) has broad application potential in three-dimensional imaging of plate-like objects, as it only needs x-rays to pass through the tested object in the thickness direction during the imaging process. In this study, a square cross-section FOV rotational CL (SC-CL) was proposed. Then, the FDK-type analytical reconstruction algorithm applicable to the SC-CL was deriv…

    Submitted 18 June, 2024; originally announced June 2024.

  13. arXiv:2406.12018  [pdf, other]

    cs.CL

    CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling

    Authors: Yu Bai, Xiyuan Zou, Heyan Huang, Sanxing Chen, Marc-Antoine Rondeau, Yang Gao, Jackie Chi Kit Cheung

    Abstract: Long sequence modeling has gained broad interest as large language models (LLMs) continue to advance. Recent research has identified that a large portion of hidden states within the key-value caches of Transformer models can be discarded (also termed evicted) without affecting the perplexity performance in generating long sequences. However, we show that these methods, despite preserving perplexit…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Work in progress

  14. arXiv:2406.06887  [pdf, other]

    cs.CL cs.AI cs.LG cs.PL cs.SE

    PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models

    Authors: Dylan Zhang, Shizhe Diao, Xueyan Zou, Hao Peng

    Abstract: Instruction-finetuned code language models (LMs) have shown promise in various programming tasks. They are trained, using a language modeling objective, on natural language instructions and gold code snippet pairs. Recent evidence suggests that these models, never exposed to incorrect solutions during training, often struggle to distinguish between correct and incorrect solutions. This observation…

    Submitted 10 June, 2024; originally announced June 2024.

  15. arXiv:2405.19647  [pdf, other]

    cs.LG

    FTS: A Framework to Find a Faithful TimeSieve

    Authors: Songning Lai, Ninghui Feng, Jiechao Gao, Hao Wang, Haochen Sui, Xin Zou, Jiayu Yang, Wenshuo Chen, Hang Zhao, Xuming Hu, Yutao Yue

    Abstract: The field of time series forecasting has garnered significant attention in recent years, prompting the development of advanced models like TimeSieve, which demonstrates impressive performance. However, an analysis reveals certain unfaithfulness issues, including high sensitivity to random seeds, input and layer noise perturbations and parametric perturbations. Recognizing these challenges, we emba…

    Submitted 10 August, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Journal ref: IJCAI2024 workshop

  16. arXiv:2405.18991  [pdf, other]

    cs.CV cs.CL cs.MM

    EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

    Authors: Jiaqi Xu, Xinyi Zou, Kunzhe Huang, Yunkuo Chen, Bo Liu, MengLi Cheng, Xing Shi, Jun Huang

    Abstract: This paper presents EasyAnimate, an advanced method for video generation that leverages the power of transformer architecture for high-performance outcomes. We have expanded the DiT framework originally designed for 2D image synthesis to accommodate the complexities of 3D video generation by incorporating a motion module block. It is used to capture temporal dynamics, thereby ensuring the producti…

    Submitted 5 July, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: 8 pages, 6 figures

  17. arXiv:2405.14135  [pdf, other]

    cs.LG cs.AI

    Learning Geospatial Region Embedding with Heterogeneous Graph

    Authors: Xingchen Zou, Jiani Huang, Xixuan Hao, Yuhao Yang, Haomin Wen, Yibo Yan, Chao Huang, Yuxuan Liang

    Abstract: Learning effective geospatial embeddings is crucial for a series of geospatial applications such as city analytics and earth monitoring. However, learning comprehensive region representations presents two significant challenges: first, the deficiency of effective intra-region feature representation; and second, the difficulty of learning from intricate inter-region dependencies. In this paper, we…

    Submitted 22 May, 2024; originally announced May 2024.

  18. arXiv:2405.08327  [pdf, other]

    astro-ph.HE

    Multiband Simultaneous Photometry of Type II SN 2023ixf with Mephisto and the Twin 50-cm Telescopes

    Authors: Yuan-Pei Yang, Xiangkun Liu, Yu Pan, Xinzhong Er, Dezi Liu, Yuan Fang, Guowang Du, Yongzhi Cai, Xian Xu, Xinlei Chen, Xingzhu Zou, Helong Guo, Chenxu Liu, Yehao Cheng, Brajesh Kumar, Xiaowei Liu

    Abstract: SN 2023ixf, recently reported in the nearby galaxy M101 at a distance of $6.85~{\rm Mpc}$, was one of the closest and brightest core-collapse supernovae (CCSNe) in the last decade. In this work, we present multi-wavelength photometric observation of SN 2023ixf with the Multi-channel Photometric Survey Telescope (Mephisto) in $uvgr$ bands and with the twin 50-cm telescopes in $griz$ bands. We find…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 15 pages, 7 figures, 3 tables. Accepted for publication in ApJ. Comments welcome!

  19. arXiv:2405.07964  [pdf, other]

    astro-ph.HE

    Early-phase simultaneous multiband observations of the Type II supernova SN 2024ggi with Mephisto

    Authors: Xinlei Chen, Brajesh Kumar, Xinzhong Er, Helong Guo, Yuan-Pei Yang, Weikang Lin, Yuan Fang, Guowang Du, Chenxu Liu, Jiewei Zhao, Tianyu Zhang, Yuxi Bao, Xingzhu Zou, Yu Pan, Yu Wang, Xufeng Zhu, Kaushik Chatterjee, Xiangkun Liu, Dezi Liu, Edoardo P. Lagioia, Geeta Rangwal, Shiyan Zhong, Jinghua Zhang, Jianhui Lian, Yongzhi Cai, et al. (2 additional authors not shown)

    Abstract: We present early-phase good-cadence (hour-to-day) simultaneous multiband ($ugi$ and $vrz$ bands) imaging of the nearby supernova SN 2024ggi, which exploded in the nearby galaxy, NGC 3621. A quick follow-up was conducted within less than a day after the explosion and continued $\sim$23 days. The $uvg$ band light curves display a rapid rise ($\sim$1.4 mag day$^{-1}$) to maximum in $\sim$4 days and a…

    Submitted 2 August, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Pages 12, Table 1, Figures 7

    Journal ref: ApJL, 2024, 971:L2

  20. arXiv:2405.01204  [pdf, other]

    cs.CV cs.AI

    Towards Cross-Scale Attention and Surface Supervision for Fractured Bone Segmentation in CT

    Authors: Yu Zhou, Xiahao Zou, Yi Wang

    Abstract: Bone segmentation is an essential step for the preoperative planning of fracture trauma surgery. The automated segmentation of fractured bone from computed tomography (CT) scans remains challenging, due to the large differences of fractures in position and morphology, and also the inherent anatomical characteristics of different bone structures. To alleviate these issues, we propose a cross-scale…

    Submitted 2 May, 2024; originally announced May 2024.

  21. arXiv:2404.09569  [pdf]

    cond-mat.mtrl-sci cond-mat.mes-hall

    Surprising pressure-induced magnetic transformations from Helimagnetic order to Antiferromagnetic state in NiI2

    Authors: Qiye Liu, Wenjie Su, Yue Gu, Xi Zhang, Xiuquan Xia, Le Wang, Ke Xiao, Xiaodong Cui, Xiaolong Zou, Bin Xi, Jia-Wei Mei, Jun-Feng Dai

    Abstract: Interlayer magnetic interactions play a pivotal role in determining the magnetic arrangement within van der Waals (vdW) magnets, and the remarkable tunability of these interactions through applied pressure further enhances their significance. Here, we investigate NiI2 flakes, a representative vdW magnet, under hydrostatic pressures up to 11 GPa. We reveal a notable increase in magnetic transition…

    Submitted 15 April, 2024; originally announced April 2024.

  22. arXiv:2404.09433  [pdf, other]

    eess.IV

    MarsQE: Semantic-Informed Quality Enhancement for Compressed Martian Image

    Authors: Chengfeng Liu, Mai Xu, Qunliang Xing, Xin Zou

    Abstract: Lossy image compression is essential for Mars exploration missions, due to the limited bandwidth between Earth and Mars. However, the compression may introduce visual artifacts that complicate the geological analysis of the Martian surface. Existing quality enhancement approaches, primarily designed for Earth images, fall short for Martian images due to a lack of consideration for the unique Marti…

    Submitted 14 April, 2024; originally announced April 2024.

  23. arXiv:2404.07458  [pdf, other]

    physics.plasm-ph

    I-mode Plasma Confinement Improvement by Real-time Lithium Injection and its Classification on EAST Tokamak

    Authors: X. M. Zhong, X. L. Zou, A. D. Liu, Y. T. Song, G. Zhuang, H. Q. Liu, L. Q. Xu, E. Z. Li, B. Zhang, G. Z. Zuo, Z. Wang, C. Zhou, J. Zhang, W. X. Shi, L. T. Gao, S. F. Wang, W. Gao, T. Q. Jia, Q. Zang, H. L. Zhao, M. Wang, H. D. Xu, X. J. Wang, X. Gao, X. D. Lin, et al. (3 additional authors not shown)

    Abstract: I-mode is a promising regime for future fusion reactors due to the high energy confinement and the moderate particle confinement. However, the effect of lithium, which has been widely applied for particle recycling and impurity control, on I-mode plasma is still unclear. Recently, experiments of real-time lithium powder injection on I-mode plasma have been carried out in EAST Tokamak. It was found…

    Submitted 10 April, 2024; originally announced April 2024.

  24. arXiv:2404.00727  [pdf, other]

    cs.CL

    A Controlled Reevaluation of Coreference Resolution Models

    Authors: Ian Porada, Xiyuan Zou, Jackie Chi Kit Cheung

    Abstract: All state-of-the-art coreference resolution (CR) models involve finetuning a pretrained language model. Whether the superior performance of one CR model over another is due to the choice of language model or other factors, such as the task-specific architecture, is difficult or impossible to determine due to lack of a standardized experimental setup. To resolve this ambiguity, we systematically ev…

    Submitted 22 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: LREC-COLING 2024

  25. arXiv:2403.19980  [pdf, other]

    cs.CV

    A Parallel Attention Network for Cattle Face Recognition

    Authors: Jiayu Li, Xuechao Zou, Shiying Wang, Ben Chen, Junliang Xing, Pin Tao

    Abstract: Cattle face recognition holds paramount significance in domains such as animal husbandry and behavioral research. Despite significant progress in confined environments, applying these accomplishments in wild settings remains challenging. Thus, we create the first large-scale cattle face recognition dataset, ICRWE, for wild environments. It encompasses 483 cattle and 9,816 high-resolution image sam…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by ICME 2024

  26. Coverage-Guaranteed Prediction Sets for Out-of-Distribution Data

    Authors: Xin Zou, Weiwei Liu

    Abstract: Out-of-distribution (OOD) generalization has attracted increasing research attention in recent years, due to its promising experimental results in real-world applications. In this paper, we study the confidence set prediction problem in the OOD generalization setting. Split conformal prediction (SCP) is an efficient framework for handling the confidence set prediction problem. However, the validity…

    Submitted 28 March, 2024; originally announced March 2024.

    Journal ref: AAAI (2024) Vol. 38, No. 15, pages 17263-17270

  27. arXiv:2403.14135  [pdf, other]

    eess.IV cs.CV

    Powerful Lossy Compression for Noisy Images

    Authors: Shilv Cai, Xiaoguo Liang, Shuning Cao, Luxin Yan, Sheng Zhong, Liqun Chen, Xu Zou

    Abstract: Image compression and denoising represent fundamental challenges in image processing with many real-world applications. To address practical demands, current solutions can be categorized into two main strategies: 1) sequential method; and 2) joint method. However, sequential methods have the disadvantage of error accumulation as there is information loss between multiple individual models. Recentl…

    Submitted 26 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted by ICME 2024

  28. arXiv:2403.11373  [pdf, other]

    cs.CV

    Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaboration

    Authors: Shu Zhao, Xiaohan Zou, Tan Yu, Huijuan Xu

    Abstract: Pre-trained large multi-modal models (LMMs) exploit fine-tuning to adapt to diverse user applications. Nevertheless, fine-tuning may face challenges due to deactivated sensors (e.g., cameras turned off for privacy or technical issues), yielding modality-incomplete data and leading to inconsistency in training data and the data for inference. Additionally, continuous training leads to catastrophic for…

    Submitted 17 March, 2024; originally announced March 2024.

  29. arXiv:2403.10920  [pdf, other]

    cs.CR

    Batch-oriented Element-wise Approximate Activation for Privacy-Preserving Neural Networks

    Authors: Peng Zhang, Ao Duan, Xianglu Zou, Yuhong Liu

    Abstract: Privacy-Preserving Neural Networks (PPNN) are advanced to perform inference without breaching user privacy, which can serve as an essential tool for medical diagnosis to simultaneously achieve big data utility and privacy protection. As one of the key techniques to enable PPNN, Fully Homomorphic Encryption (FHE) is facing a great challenge that homomorphic operations cannot be easily adapted for n…

    Submitted 16 March, 2024; originally announced March 2024.

  30. arXiv:2403.08572  [pdf, other]

    cs.LG

    Caformer: Rethinking Time Series Analysis from Causal Perspective

    Authors: Kexuan Zhang, Xiaobei Zou, Yang Tang

    Abstract: Time series analysis is a vital task with broad applications in various domains. However, effectively capturing cross-dimension and cross-time dependencies in non-stationary time series poses significant challenges, particularly in the context of environmental factors. The spurious correlation induced by the environment confounds the causal relationships between cross-dimension and cross-time depe…

    Submitted 13 March, 2024; originally announced March 2024.

  31. arXiv:2403.02601  [pdf, other]

    eess.IV cs.CV

    Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning

    Authors: Haoyu Chen, Wenbo Li, Jinjin Gu, Jingjing Ren, Haoze Sun, Xueyi Zou, Zhensong Zhang, Youliang Yan, Lei Zhu

    Abstract: For image super-resolution (SR), bridging the gap between the performance on synthetic datasets and real-world degradation scenarios remains a challenge. This work introduces a novel "Low-Res Leads the Way" (LWay) training framework, merging Supervised Pre-training with Self-supervised Learning to enhance the adaptability of SR models to real-world images. Our approach utilizes a low-resolution (L…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  32. Deep Learning for Cross-Domain Data Fusion in Urban Computing: Taxonomy, Advances, and Outlook

    Authors: Xingchen Zou, Yibo Yan, Xixuan Hao, Yuehong Hu, Haomin Wen, Erdong Liu, Junbo Zhang, Yong Li, Tianrui Li, Yu Zheng, Yuxuan Liang

    Abstract: As cities continue to burgeon, Urban Computing emerges as a pivotal discipline for sustainable development by harnessing the power of cross-domain data fusion from diverse sources (e.g., geographical, traffic, social media, and environmental data) and modalities (e.g., spatio-temporal, visual, and textual modalities). Recently, we are witnessing a rising trend that utilizes various deep-learning m…

    Submitted 16 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Journal ref: Inform.Fusion.113(2025)102606

  33. arXiv:2402.18828  [pdf, other]

    cond-mat.str-el

    Strongly-tilted field induced Hamiltonian dimerization and nested quantum scars in the 1D spinless Fermi-Hubbard model

    Authors: Wei-Jie Huang, Yu-Biao Wu, Guang-Can Guo, Wu-Ming Liu, Xu-Bo Zou

    Abstract: We investigate the quantum dynamics of the 1D spinless Fermi-Hubbard model with a linear-tilted potential. Surprisingly, in a strong resonance regime, we show that the model can be described by the kinetically constrained effective Hamiltonian, and it can be spontaneously divided into two commuting parts dubbed Hamiltonian dimerization, which consist of a sum of constrained two-site hopping terms a…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 12 pages, 10 figures

  34. arXiv:2402.15758  [pdf, other]

    cs.CL cs.AI

    Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens

    Authors: Ziqian Zeng, Jiahong Yu, Qianshi Pang, Zihao Wang, Huiping Zhuang, Hongen Shao, Xiaofeng Zou

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across various tasks. However, their widespread application is hindered by the resource-intensive decoding process. To address this challenge, current approaches have incorporated additional decoding heads to enable parallel prediction of multiple subsequent tokens, thereby achieving inference acceleration. Nevertheless, the ac…

    Submitted 18 April, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  35. arXiv:2402.14857  [pdf, other]

    cs.CL cs.AI cs.CR

    Is the System Message Really Important to Jailbreaks in Large Language Models?

    Authors: Xiaotian Zou, Yongkang Chen, Ke Li

    Abstract: The rapid evolution of Large Language Models (LLMs) has rendered them indispensable in modern society. While security measures are typically applied to align LLMs with human values prior to release, recent studies have unveiled a concerning phenomenon named "Jailbreak". This term refers to the unexpected and potentially harmful responses generated by LLMs when prompted with malicious questions. Most exist…

    Submitted 18 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: 13 pages, 3 figures

  36. arXiv:2402.06096  [pdf, other]

    gr-qc astro-ph.EP astro-ph.IM physics.space-ph

    Doppler Tracking Data of Martian Mission Tianwen-I and Upper Limit of Stochastic Gravitational Wave Background

    Authors: Xiaoming Bi, Zhongkai Guo, Xiaobo Zou, Yong Huang, Peijia Li, Jianfeng Cao, Lue Chen, Wenlin Tang, Yun Kau Lau

    Abstract: Two way ranging data for spacecraft tracking of China's first Martian mission Tianwen-I is analysed. Shortly before the spacecraft entered the Mars parking orbit, the two way coherent microwave link between the spacecraft and the Earth resembles a long arm gravitational wave interferometer, with both the spacecraft and the Earth regarded as in an approximate free falling state. By carefully select…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 10 pages, 8 figures

  37. arXiv:2402.03767  [pdf]

    physics.app-ph cond-mat.mes-hall

    Magnetic Field Gated and Current Controlled Spintronic Mem-transistor Neuron-based Spiking Neural Networks

    Authors: Aijaz H. Lone, Meng Tang, Daniel N. Rahimi, Xuecui Zou, Dongxing Zheng, Hossein Fariborzi, Xixiang Zhang, Gianluca Setti

    Abstract: Spintronic devices, such as the domain walls and skyrmions, have shown significant potential for applications in energy-efficient data storage and beyond CMOS computing architectures. In recent years, spiking neural networks have shown more bio-plausibility. Based on the magnetic multilayer spintronic devices, we demonstrate the magnetic field-gated Leaky integrate and fire neuron characteristics…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 33 pages, 10 figures

  38. arXiv:2401.14427  [pdf, other]

    cs.SE cs.CR cs.LG

    Beimingwu: A Learnware Dock System

    Authors: Zhi-Hao Tan, Jian-Dong Liu, Xiao-Dong Bi, Peng Tan, Qin-Cheng Zheng, Hai-Tian Liu, Yi Xie, Xiao-Chuan Zou, Yang Yu, Zhi-Hua Zhou

    Abstract: The learnware paradigm proposed by Zhou [2016] aims to enable users to reuse numerous existing well-trained models instead of building machine learning models from scratch, with the hope of solving new user tasks even beyond models' original purposes. In this paradigm, developers worldwide can submit their high-performing models spontaneously to the learnware dock system (formerly known as learnwa…

    Submitted 24 January, 2024; originally announced January 2024.

  39. arXiv:2401.06715  [pdf, other]

    cs.CL cs.AI

    Reframing Tax Law Entailment as Analogical Reasoning

    Authors: Xinrui Zou, Ming Zhang, Nathaniel Weir, Benjamin Van Durme, Nils Holzenberger

    Abstract: Statutory reasoning refers to the application of legislative provisions to a series of case facts described in natural language. We re-frame statutory reasoning as an analogy task, where each instance of the analogy task involves a combination of two instances of statutory reasoning. This increases the dataset size by two orders of magnitude, and introduces an element of interpretability. We show…

    Submitted 12 January, 2024; originally announced January 2024.

  40. arXiv:2312.13637  [pdf]

    cond-mat.mes-hall cond-mat.str-el

    Layer-dependent evolution of electronic structures and correlations in rhombohedral multilayer graphene

    Authors: Yue-Ying Zhou, Yang Zhang, Shihao Zhang, Hao Cai, Ling-Hui Tong, Yuan Tian, Tongtong Chen, Qiwei Tian, Chen Zhang, Yiliu Wang, Xuming Zou, Xingqiang Liu, Yuanyuan Hu, Li Zhang, Lijie Zhang, Wen-Xiao Wang, Lei Liao, Zhihui Qin, Long-Jing Yin

    Abstract: The recent discovery of superconductivity and magnetism in trilayer rhombohedral graphene (RG) establishes an ideal, untwisted platform to study strong correlation electronic phenomena. However, the correlated effects in multilayer RG have received limited attention, and, particularly, the evolution of the correlations with increasing layer number remains an unresolved question. Here, we show the…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 21 pages, 4 figures

  41. arXiv:2312.12236  [pdf, ps, other]

    cs.LG cs.IT math.ST

    Generalization Analysis of Machine Learning Algorithms via the Worst-Case Data-Generating Probability Measure

    Authors: Xinying Zou, Samir M. Perlaza, Iñaki Esnaola, Eitan Altman

    Abstract: In this paper, the worst-case probability measure over the data is introduced as a tool for characterizing the generalization capabilities of machine learning algorithms. More specifically, the worst-case probability measure is a Gibbs probability measure and the unique solution to the maximization of the expected loss under a relative entropy constraint with respect to a reference probability mea…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: To appear in the Proceedings of the AAAI Conference on Artificial Intelligence (7 + 2 pages)

    Report number: INRIA Technical Report RR-9515

  42. arXiv:2312.07532  [pdf, other]

    cs.CV cs.AI cs.CL

    Interfacing Foundation Models' Embeddings

    Authors: Xueyan Zou, Linjie Li, Jianfeng Wang, Jianwei Yang, Mingyu Ding, Junyi Wei, Zhengyuan Yang, Feng Li, Hao Zhang, Shilong Liu, Arul Aravinthan, Yong Jae Lee, Lijuan Wang

    Abstract: Foundation models possess strong capabilities in reasoning and memorizing across modalities. To further unleash the power of foundation models, we present FIND, a generalized interface for aligning foundation models' embeddings with unified image and dataset-level understanding spanning modality and granularity. As shown in the teaser figure, a lightweight transformer interface without tuning any…

    Submitted 15 July, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: CODE: https://github.com/UX-Decoder/FIND

  43. arXiv:2312.07141  [pdf, other]

    cs.CL

    Multilingual large language models leak human stereotypes across language boundaries

    Authors: Yang Trista Cao, Anna Sotnikova, Jieyu Zhao, Linda X. Zou, Rachel Rudinger, Hal Daume III

    Abstract: Multilingual large language models have been increasingly popular for their proficiency in processing and generating text across various languages. Previous research has shown that the presence of stereotypes and biases in monolingual large language models can be attributed to the nature of their training data, which is collected from humans and reflects societal biases. Multilingual language mode…

    Submitted 8 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

  44. arXiv:2312.02949  [pdf, other]

    cs.CV

    LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

    Authors: Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang

    Abstract: With the recent significant advancements in large multi-modal models (LMMs), the importance of their grounding capability in visual chat is increasingly recognized. Despite recent efforts to enable LMMs to support grounding, their capabilities for grounding and chat are usually separate, and their chat performance drops dramatically when asked to ground. The problem is the lack of a dataset for gr…

    Submitted 5 December, 2023; originally announced December 2023.

  45. arXiv:2312.02646  [pdf, other]

    cs.LG cs.AI

    SAMSGL: Series-Aligned Multi-Scale Graph Learning for Spatio-Temporal Forecasting

    Authors: Xiaobei Zou, Luolin Xiong, Yang Tang, Jürgen Kurths

    Abstract: Spatio-temporal forecasting in various domains, like traffic prediction and weather forecasting, is a challenging endeavor, primarily due to the difficulties in modeling propagation dynamics and capturing high-dimensional interactions among nodes. Despite the significant strides made by graph-based networks in spatio-temporal forecasting, there remain two pivotal factors closely related to forecas…

    Submitted 27 May, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted by Chaos

  46. arXiv:2311.18327  [pdf]

    eess.SY

    Deep Reinforcement Learning Based Optimal Energy Management of Multi-energy Microgrids with Uncertainties

    Authors: Yang Cui, Yang Xu, Yang Li, Yijian Wang, Xinpeng Zou

    Abstract: Multi-energy microgrid (MEMG) offers an effective approach to deal with energy demand diversification and new energy consumption on the consumer side. In MEMG, it is critical to deploy an energy management system (EMS) for efficient utilization of energy and reliable operation of the system. To help EMS formulate optimal dispatching schemes, a deep reinforcement learning (DRL)-based MEMG energy ma…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: Accepted by CSEE Journal of Power and Energy Systems

  47. arXiv:2311.16512  [pdf, other]

    cs.CV cs.AI

    CoSeR: Bridging Image and Language for Cognitive Super-Resolution

    Authors: Haoze Sun, Wenbo Li, Jianzhuang Liu, Haoyu Chen, Renjing Pei, Xueyi Zou, Youliang Yan, Yujiu Yang

    Abstract: Existing super-resolution (SR) models primarily focus on restoring local texture details, often neglecting the global semantic information within the scene. This oversight can lead to the omission of crucial semantic details or the introduction of inaccurate textures during the recovery process. In our work, we introduce the Cognitive Super-Resolution (CoSeR) framework, empowering SR models with t…

    Submitted 20 December, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: Project page: https://coser-main.github.io ; GitHub repository: https://github.com/VINHYU/CoSeR

  48. arXiv:2311.13601  [pdf, other]

    cs.CV cs.AI cs.LG

    Visual In-Context Prompting

    Authors: Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao

    Abstract: In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain. Existing visual prompting methods focus on referring segmentation to segment the most relevant object, falling short of addressing many generic vision tasks like open-set segmentation and detection. In this paper, we introduce…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: technical report

  49. arXiv:2311.12083  [pdf, other]

    cs.CV eess.IV

    PanBench: Towards High-Resolution and High-Performance Pansharpening

    Authors: Shiying Wang, Xuechao Zou, Kai Li, Junliang Xing, Pin Tao

    Abstract: Pansharpening, a pivotal task in remote sensing, involves integrating low-resolution multispectral images with high-resolution panchromatic images to synthesize an image that is both high-resolution and retains multispectral information. These pansharpened images enhance precision in land cover classification, change detection, and environmental monitoring within remote sensing data analysis. Whil…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 10 pages, 5 figures

  50. arXiv:2311.05437  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

    Authors: Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chunyuan Li

    Abstract: LLaVA-Plus is a general-purpose multimodal assistant that expands the capabilities of large multimodal models. It maintains a skill repository of pre-trained vision and vision-language models and can activate relevant tools based on users' inputs to fulfill real-world tasks. LLaVA-Plus is trained on multimodal instruction-following data to acquire the ability to use tools, covering visual understa…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: 25 pages, 25M file size. Project Page: https://llava-vl.github.io/llava-plus/