Skip to main content

Showing 1–50 of 1,860 results for author: Liu, L

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.13255  [pdf, other

    cs.IT eess.SP

    Interleaved Block-Sparse Transform

    Authors: Lei Liu, Ming Wang, Shufeng Li, Yuhao Chi, Ning Wei, ZhaoYang Zhang

    Abstract: Low-complexity Bayes-optimal memory approximate message passing (MAMP) is an efficient signal estimation algorithm in compressed sensing and multicarrier modulation. However, achieving replica Bayes optimality with MAMP necessitates a large-scale right-unitarily invariant transformation, which is prohibitive in practical systems due to its high computational complexity and hardware costs. To solve… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Submitted to the IEEE Journal

  2. arXiv:2407.12264  [pdf, ps, other

    cs.IT eess.SP

    Hybrid Near-Far Field Channel Estimation for Holographic MIMO Communications

    Authors: Shaohua Yue, Shuhao Zeng, Liang Liu, Yonina C. Eldar, Boya Di

    Abstract: Holographic MIMO communications, enabled by large-scale antenna arrays with quasi-continuous apertures, is a potential technology for spectrum efficiency improvement. However, the increased antenna aperture size extends the range of the Fresnel region, leading to a hybrid near-far field communication mode. The users and scatterers randomly lie in near-field and far-field zones, and thus, conventio… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 13 pages, 15 figures

  3. arXiv:2407.11895  [pdf, other

    cs.CV

    OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces

    Authors: Zehan Wang, Ziang Zhang, Hang Zhang, Luping Liu, Rongjie Huang, Xize Cheng, Hengshuang Zhao, Zhou Zhao

    Abstract: Recently, human-computer interaction with various modalities has shown promising applications, like GPT-4o and Gemini. Given the foundational role of multimodal joint representation in understanding and generation pipelines, high-quality omni joint representations would be a step toward co-processing more diverse multimodal information. In this work, we present OmniBind, large-scale multimodal joi… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Homepage is http://omnibind.github.io

  4. arXiv:2407.10980  [pdf, ps, other

    cs.NI

    Learning-based Big Data Sharing Incentive in Mobile AIGC Networks

    Authors: Jinbo Wen, Yang Zhang, Yulin Chen, Weifeng Zhong, Xumin Huang, Lei Liu, Dusit Niyato

    Abstract: Rapid advancements in wireless communication have led to a dramatic upsurge in data volumes within mobile edge networks. These substantial data volumes offer opportunities for training Artificial Intelligence-Generated Content (AIGC) models to possess strong prediction and decision-making capabilities. AIGC represents an innovative approach that utilizes sophisticated generative AI algorithms to a… ▽ More

    Submitted 10 June, 2024; originally announced July 2024.

  5. arXiv:2407.10718  [pdf, other

    cs.AI cs.CL

    Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning

    Authors: Yulong Wang, Tianhao Shen, Lifeng Liu, Jian Xie

    Abstract: Existing agents based on large language models (LLMs) demonstrate robust problem-solving capabilities by integrating LLMs' inherent knowledge, strong in-context learning and zero-shot capabilities, and the use of tools combined with intricately designed LLM invocation workflows by humans. However, these agents still exhibit shortcomings in long-term reasoning and under-use the potential of existin… ▽ More

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Our code is available at https://github.com/Ag2S1/Sibyl-System

  6. arXiv:2407.10186  [pdf, other

    cs.NI

    Toward Explainable Reasoning in 6G: A Proof of Concept Study on Radio Resource Allocation

    Authors: Farhad Rezazadeh, Sergio Barrachina-Muñoz, Hatim Chergui, Josep Mangues, Mehdi Bennis, Dusit Niyato, Houbing Song, Lingjia Liu

    Abstract: The move toward artificial intelligence (AI)-native sixth-generation (6G) networks has put more emphasis on the importance of explainability and trustworthiness in network management operations, especially for mission-critical use-cases. Such desired trust transcends traditional post-hoc explainable AI (XAI) methods to using contextual explanations for guiding the learning process in an in-hoc way… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 21 pages, 11 Figures, 5 Tables

  7. arXiv:2407.08665  [pdf

    cond-mat.mtrl-sci cs.ET

    Superparamagnetic Tunnel Junctions for Reliable True Randomness

    Authors: Dooyong Koh, Qiuyuan Wang, Brooke C. McGoldrick, Luqiao Liu, Marc A. Baldo

    Abstract: Stochastic devices have the potential to disrupt computing, revolutionizing low-power machine learning acceleration, probabilistic computing, and hardware security. As implemented, however, superparamagnetic tunnel junctions (sMTJs) face significant challenges including the need for external magnetic fields, and poor reliability and scalability. Here, we present experimental demonstration of three… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  8. arXiv:2407.08428  [pdf, other

    cs.CV cs.AI

    A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights

    Authors: Wentao Lei, Jinting Wang, Fengji Ma, Guanjie Huang, Li Liu

    Abstract: Human video generation is a dynamic and rapidly evolving task that aims to synthesize 2D human body video sequences with generative models given control conditions such as text, audio, and pose. With the potential for wide-ranging applications in film, gaming, and virtual communication, the ability to generate natural and realistic human video is critical. Recent advancements in generative models… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  9. arXiv:2407.08265  [pdf, other

    cs.CV

    Enhancing Thermal Infrared Tracking with Natural Language Modeling and Coordinate Sequence Generation

    Authors: Miao Yan, Ping Zhang, Haofei Zhang, Ruqian Hao, Juanxiu Liu, Xiaoyang Wang, Lin Liu

    Abstract: Thermal infrared tracking is an essential topic in computer vision tasks because of its advantage of all-weather imaging. However, most conventional methods utilize only hand-crafted features, while deep learning-based correlation filtering methods are limited by simple correlation operations. Transformer-based methods ignore temporal and coordinate information, which is critical for TIR tracking… ▽ More

    Submitted 18 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  10. arXiv:2407.08187  [pdf, other

    cs.CV

    ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation

    Authors: Ruijie Zhu, Chuxin Wang, Ziyang Song, Li Liu, Tianzhu Zhang, Yongdong Zhang

    Abstract: Estimating depth from a single image is a challenging visual task. Compared to relative depth estimation, metric depth estimation attracts more attention due to its practical physical significance and critical applications in real-life scenarios. However, existing metric depth estimation methods are typically trained on specific datasets with similar scenes, facing challenges in generalizing acros… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 14 pages, 11 figure, 13 tables

  11. arXiv:2407.07930  [pdf

    q-bio.BM cs.LG

    Token-Mol 1.0: Tokenized drug design with large language model

    Authors: Jike Wang, Rui Qin, Mingyang Wang, Meijing Fang, Yangyang Zhang, Yuchen Zhu, Qun Su, Qiaolin Gou, Chao Shen, Odin Zhang, Zhenxing Wu, Dejun Jiang, Xujun Zhang, Huifeng Zhao, Xiaozhe Wan, Zhourui Wu, Liwei Liu, Yu Kang, Chang-Yu Hsieh, Tingjun Hou

    Abstract: Significant interests have recently risen in leveraging sequence-based large language models (LLMs) for drug design. However, most current applications of LLMs in drug discovery lack the ability to comprehend three-dimensional (3D) structures, thereby limiting their effectiveness in tasks that explicitly involve molecular conformations. In this study, we introduced Token-Mol, a token-only 3D drug… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  12. arXiv:2407.07791  [pdf, other

    cs.CL

    Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities

    Authors: Tianjie Ju, Yiting Wang, Xinbei Ma, Pengzhou Cheng, Haodong Zhao, Yulong Wang, Lifeng Liu, Jian Xie, Zhuosheng Zhang, Gongshen Liu

    Abstract: The rapid adoption of large language models (LLMs) in multi-agent systems has highlighted their impressive capabilities in various applications, such as collaborative problem-solving and autonomous negotiation. However, the security implications of these LLM-based multi-agent systems have not been thoroughly investigated, particularly concerning the spread of manipulated knowledge. In this paper,… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 18 Pages, working in progress

  13. arXiv:2407.07780  [pdf, other

    cs.CV

    Cross Domain Object Detection via Multi-Granularity Confidence Alignment based Mean Teacher

    Authors: Jiangming Chen, Li Liu, Wanxia Deng, Zhen Liu, Yu Liu, Yingmei Wei, Yongxiang Liu

    Abstract: Cross domain object detection learns an object detector for an unlabeled target domain by transferring knowledge from an annotated source domain. Promising results have been achieved via Mean Teacher, however, pseudo labeling which is the bottleneck of mutual learning remains to be further explored. In this study, we find that confidence misalignment of the predictions, including category-level ov… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  14. arXiv:2407.07744  [pdf, other

    cs.IT cs.AI eess.SP

    Belief Information based Deep Channel Estimation for Massive MIMO Systems

    Authors: Jialong Xu, Liu Liu, Xin Wang, Lan Chen

    Abstract: In the next generation wireless communication system, transmission rates should continue to rise to support emerging scenarios, e.g., the immersive communications. From the perspective of communication system evolution, multiple-input multiple-output (MIMO) technology remains pivotal for enhancing transmission rates. However, current MIMO systems rely on inserting pilot signals to achieve accurate… ▽ More

    Submitted 23 June, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  15. arXiv:2407.06545  [pdf, other

    cs.RO

    Visual-Geometry GP-based Navigable Space for Autonomous Navigation

    Authors: Mahmoud Ali, Durgkant Pushp, Zheng Chen, Lantao Liu

    Abstract: Autonomous navigation in unknown environments is challenging and demands the consideration of both geometric and semantic information in order to parse the navigability of the environment. In this work, we propose a novel space modeling framework, Visual-Geometry Sparse Gaussian Process (VG-SGP), that simultaneously considers semantics and geometry of the scene. Our proposed approach can overcome… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted for publication at 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems ( IROS 2024)

  16. arXiv:2407.06249  [pdf, other

    cs.CL cs.SE

    CodeUpdateArena: Benchmarking Knowledge Editing on API Updates

    Authors: Zeyu Leo Liu, Shrey Pandit, Xi Ye, Eunsol Choi, Greg Durrett

    Abstract: Large language models (LLMs) are increasingly being used to synthesize and reason about source code. However, the static nature of these models' knowledge does not reflect the fact that libraries and API functions they invoke are continuously evolving, with functionality being added or changing. While numerous benchmarks evaluate how LLMs can generate code, no prior work has studied how an LLMs' k… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Under Review

  17. arXiv:2407.06115  [pdf, other

    cs.CV cs.AI cs.CL

    Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline

    Authors: Qi Jia, Baoyu Fan, Cong Xu, Lu Liu, Liang Jin, Guoguang Du, Zhenhua Guo, Yaqian Zhao, Xuanjing Huang, Rengang Li

    Abstract: Existing video multi-modal sentiment analysis mainly focuses on the sentiment expression of people within the video, yet often neglects the induced sentiment of viewers while watching the videos. Induced sentiment of viewers is essential for inferring the public response to videos, has broad application in analyzing public societal sentiment, effectiveness of advertising and other areas. The micro… ▽ More

    Submitted 15 May, 2024; originally announced July 2024.

  18. arXiv:2407.05736  [pdf, other

    cs.AI cs.CV

    TransMA: an explainable multi-modal deep learning model for predicting properties of ionizable lipid nanoparticles in mRNA delivery

    Authors: Kun Wu, Zixu Wang, Xiulong Yang, Yangyang Chen, Zhenqi Han, Jialu Zhang, Lizhuang Liu

    Abstract: As the primary mRNA delivery vehicles, ionizable lipid nanoparticles (LNPs) exhibit excellent safety, high transfection efficiency, and strong immune response induction. However, the screening process for LNPs is time-consuming and costly. To expedite the identification of high-transfection-efficiency mRNA drug delivery systems, we propose an explainable LNPs transfection efficiency prediction mod… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 14 pages, 9 figures

  19. arXiv:2407.05361  [pdf, other

    eess.AS cs.CL

    Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

    Authors: Haorui He, Zengqiang Shang, Chaoren Wang, Xuyuan Li, Yicheng Gu, Hua Hua, Liwei Liu, Chen Yang, Jiaqi Li, Peiyang Shi, Yuancheng Wang, Kai Chen, Pengyuan Zhang, Zhizheng Wu

    Abstract: Recently, speech generation models have made significant progress by using large-scale training data. However, the research community struggle to produce highly spontaneous and human-like speech due to the lack of large-scale, diverse, and spontaneous speech data. This paper present Emilia, the first multilingual speech generation dataset from in-the-wild speech data, and Emilia-Pipe, the first op… ▽ More

    Submitted 12 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: Fix typos

  20. arXiv:2407.05319  [pdf, other

    cs.CL

    Rethinking Targeted Adversarial Attacks For Neural Machine Translation

    Authors: Junjie Wu, Lemao Liu, Wei Bi, Dit-Yan Yeung

    Abstract: Targeted adversarial attacks are widely used to evaluate the robustness of neural machine translation systems. Unfortunately, this paper first identifies a critical issue in the existing settings of NMT targeted adversarial attacks, where their attacking results are largely overestimated. To this end, this paper presents a new setting for NMT targeted adversarial attacks that could lead to reliabl… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 5 pages, 2 figures, accepted by ICASSP 2024

  21. arXiv:2407.04331  [pdf, other

    cs.SD cs.AI eess.AS

    MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss

    Authors: Yangyang Shu, Haiming Xu, Ziqin Zhou, Anton van den Hengel, Lingqiao Liu

    Abstract: Automatically generating symbolic music-music scores tailored to specific human needs-can be highly beneficial for musicians and enthusiasts. Recent studies have shown promising results using extensive datasets and advanced transformer architectures. However, these state-of-the-art models generally offer only basic control over aspects like tempo and style for the entire composition, lacking the a… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Demo is available at: https://ganperf.github.io/musebarcontrol.github.io/musebarcontrol/

  22. arXiv:2407.04242  [pdf, other

    cs.CV

    Fine-grained Context and Multi-modal Alignment for Freehand 3D Ultrasound Reconstruction

    Authors: Zhongnuo Yan, Xin Yang, Mingyuan Luo, Jiongquan Chen, Rusi Chen, Lian Liu, Dong Ni

    Abstract: Fine-grained spatio-temporal learning is crucial for freehand 3D ultrasound reconstruction. Previous works mainly resorted to the coarse-grained spatial features and the separated temporal dependency learning and struggles for fine-grained spatio-temporal learning. Mining spatio-temporal information in fine-grained scales is extremely challenging due to learning difficulties in long-range dependen… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted at MICCAI 2024. This is the submitted manuscript and the preprint has not undergone peer review (when applicable) or any post-submission improvements or corrections

  23. arXiv:2407.03898  [pdf, other

    cs.IT

    Overflow-Avoiding Memory AMP

    Authors: Shunqi Huang, Lei Liu, Brian M. Kurkoski

    Abstract: Approximate Message Passing (AMP) type algorithms are widely used for signal recovery in high-dimensional noisy linear systems. Recently, a principle called Memory AMP (MAMP) was proposed. Leveraging this principle, the gradient descent MAMP (GD-MAMP) algorithm was designed, inheriting the strengths of AMP and OAMP/VAMP. In this paper, we first provide an overflow-avoiding GD-MAMP (OA-GD-MAMP) to… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  24. arXiv:2407.01007  [pdf, other

    cs.CV

    GMT: A Robust Global Association Model for Multi-Target Multi-Camera Tracking

    Authors: Huijie Fan, Tinghui Zhao, Qiang Wang, Baojie Fan, Yandong Tang, LianQing Liu

    Abstract: In the task of multi-target multi-camera (MTMC) tracking of pedestrians, the data association problem is a key issue and main challenge, especially with complications arising from camera movements, lighting variations, and obstructions. However, most MTMC models adopt two-step approaches, thus heavily depending on the results of the first-step tracking in practical applications. Moreover, the same… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  25. arXiv:2407.00468  [pdf, other

    cs.CV cs.AI cs.CL

    MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation

    Authors: Jinsheng Huang, Liang Chen, Taian Guo, Fu Zeng, Yusheng Zhao, Bohan Wu, Ye Yuan, Haozhe Zhao, Zhihui Guo, Yichi Zhang, Jingyang Yuan, Wei Ju, Luchen Liu, Tianyu Liu, Baobao Chang, Ming Zhang

    Abstract: Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for such evaluations suffer from systematic biases. Remarkably, Large Language Models (LLMs) without any visual perception capabilities achieve non-trivial p… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 21 pages, code released at https://github.com/chenllliang/MMEvalPro, Homepage at https://mmevalpro.github.io/

  26. arXiv:2407.00297  [pdf

    eess.IV cs.CV

    UADSN: Uncertainty-Aware Dual-Stream Network for Facial Nerve Segmentation

    Authors: Guanghao Zhu, Lin Liu, Jing Zhang, Xiaohui Du, Ruqian Hao, Juanxiu Liu

    Abstract: Facial nerve segmentation is crucial for preoperative path planning in cochlear implantation surgery. Recently, researchers have proposed some segmentation methods, such as atlas-based and deep learning-based methods. However, since the facial nerve is a tubular organ with a diameter of only 1.0-1.5mm, it is challenging to locate and segment the facial nerve in CT scans. In this work, we propose a… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

  27. arXiv:2406.20076  [pdf, other

    cs.CV

    EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model

    Authors: Yuxuan Zhang, Tianheng Cheng, Rui Hu, Lei Liu, Heng Liu, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang

    Abstract: Segment Anything Model (SAM) has attracted widespread attention for its superior interactive segmentation capabilities with visual prompts while lacking further exploration of text prompts. In this paper, we empirically investigate what text prompt encoders (e.g., CLIP or LLM) are good for adapting SAM for referring expression segmentation and introduce the Early Vision-language Fusion-based SAM (… ▽ More

    Submitted 3 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: Preprint. Code and models are available at: https://github.com/hustvl/EVF-SAM

  28. arXiv:2406.19649  [pdf

    eess.IV cs.CV

    AstMatch: Adversarial Self-training Consistency Framework for Semi-Supervised Medical Image Segmentation

    Authors: Guanghao Zhu, Jing Zhang, Juanxiu Liu, Xiaohui Du, Ruqian Hao, Yong Liu, Lin Liu

    Abstract: Semi-supervised learning (SSL) has shown considerable potential in medical image segmentation, primarily leveraging consistency regularization and pseudo-labeling. However, many SSL approaches only pay attention to low-level consistency and overlook the significance of pseudo-label reliability. Therefore, in this work, we propose an adversarial self-training consistency framework (AstMatch). First… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  29. arXiv:2406.19544  [pdf, other

    cs.SE

    Where Are Large Language Models for Code Generation on GitHub?

    Authors: Xiao Yu, Lei Liu, Xing Hu, Jacky Wai Keung, Jin Liu, Xin Xia

    Abstract: The increasing use of Large Language Models (LLMs) in software development has garnered significant attention from researchers assessing the quality of the code they generate. However, much of the research focuses on controlled datasets such as HumanEval, which fail to adequately represent how developers actually utilize LLMs' code generation capabilities or clarify the characteristics of LLM-gene… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  30. arXiv:2406.19156  [pdf, other

    cs.LG

    Heterogeneous Causal Metapath Graph Neural Network for Gene-Microbe-Disease Association Prediction

    Authors: Kexin Zhang, Feng Huang, Luotao Liu, Zhankun Xiong, Hongyu Zhang, Yuan Quan, Wen Zhang

    Abstract: The recent focus on microbes in human medicine highlights their potential role in the genetic framework of diseases. To decode the complex interactions among genes, microbes, and diseases, computational predictions of gene-microbe-disease (GMD) associations are crucial. Existing methods primarily address gene-disease and microbe-disease associations, but the more intricate triple-wise GMD associat… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  31. arXiv:2406.18950  [pdf, other

    eess.IV cs.CV

    MMR-Mamba: Multi-Modal MRI Reconstruction with Mamba and Spatial-Frequency Information Fusion

    Authors: Jing Zou, Lanqing Liu, Qi Chen, Shujun Wang, Zhanli Hu, Xiaohan Xing, Jing Qin

    Abstract: Multi-modal MRI offers valuable complementary information for diagnosis and treatment; however, its utility is limited by prolonged scanning times. To accelerate the acquisition process, a practical approach is to reconstruct images of the target modality, which requires longer scanning times, from under-sampled k-space data using the fully-sampled reference modality with shorter scanning times as… ▽ More

    Submitted 7 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figure

  32. arXiv:2406.18574  [pdf, other

    cs.CV cs.AI cs.LG

    Unsupervised Few-Shot Continual Learning for Remote Sensing Image Scene Classification

    Authors: Muhammad Anwar Ma'sum, Mahardhika Pratama, Ramasamy Savitha, Lin Liu, Habibullah, Ryszard Kowalczyk

    Abstract: A continual learning (CL) model is desired for remote sensing image analysis because of varying camera parameters, spectral ranges, resolutions, etc. There exist some recent initiatives to develop CL techniques in this domain but they still depend on massive labelled samples which do not fully fit remote sensing applications because ground truths are often obtained via field-based surveys. This pa… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Under Review for Publication in IEEE TGRS

  33. arXiv:2406.18555  [pdf

    eess.IV cs.CV

    Using a Convolutional Neural Network and Explainable AI to Diagnose Dementia Based on MRI Scans

    Authors: Tyler Morris, Ziming Liu, Longjian Liu, Xiaopeng Zhao

    Abstract: As the number of dementia patients rises, the need for accurate diagnostic procedures rises as well. Current methods, like using an MRI scan, rely on human input, which can be inaccurate. However, the decision logic behind machine learning algorithms and their outputs cannot be explained, as most operate in black-box models. Therefore, to increase the accuracy of diagnosing dementia through MRIs,… ▽ More

    Submitted 25 May, 2024; originally announced June 2024.

    Comments: 4 pages, 4 figures

  34. arXiv:2406.17988  [pdf, other

    cs.CV

    DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image

    Authors: Qingxuan Wu, Zhiyang Dou, Sirui Xu, Soshi Shimada, Chen Wang, Zhengming Yu, Yuan Liu, Cheng Lin, Zeyu Cao, Taku Komura, Vladislav Golyanik, Christian Theobalt, Wenping Wang, Lingjie Liu

    Abstract: Reconstructing 3D hand-face interactions with deformations from a single image is a challenging yet crucial task with broad applications in AR, VR, and gaming. The challenges stem from self-occlusions during single-view hand-face interactions, diverse spatial relationships between hands and face, complex deformations, and the ambiguity of the single-view setting. The first and only method for hand… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 23 pages, 9 figures, 3 tables

  35. arXiv:2406.17777  [pdf, other

    cs.CV

    Text-Animator: Controllable Visual Text Video Generation

    Authors: Lin Liu, Quande Liu, Shengju Qian, Yuan Zhou, Wengang Zhou, Houqiang Li, Lingxi Xie, Qi Tian

    Abstract: Video generation is a challenging yet pivotal task in various industries, such as gaming, e-commerce, and advertising. One significant unresolved aspect within T2V is the effective visualization of text within generated videos. Despite the progress achieved in Text-to-Video~(T2V) generation, current methods still cannot effectively visualize texts in videos directly, as they mainly focus on summar… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Project Page: https://laulampaul.github.io/text-animator.html

  36. arXiv:2406.17538  [pdf, other

    cs.CV

    SKD-TSTSAN: Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition

    Authors: Guanghao Zhu, Lin Liu, Yuhao Hu, Haixin Sun, Fang Liu, Xiaohui Du, Ruqian Hao, Juanxiu Liu, Yong Liu, Hao Deng, Jing Zhang

    Abstract: Micro-expressions (MEs) are subtle facial movements that occur spontaneously when people try to conceal the real emotions. Micro-expression recognition (MER) is crucial in many fields, including criminal analysis and psychotherapy. However, MER is challenging since MEs have low intensity and ME datasets are small in size. To this end, a three-stream temporal-shift attention network based on self-k… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  37. arXiv:2406.17005  [pdf, other

    cs.CV

    PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

    Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo , et al. (12 additional authors not shown)

    Abstract: Pixel-level Video Understanding in the Wild Challenge (PVUW) focus on complex video understanding. In this CVPR 2024 workshop, we add two new tracks, Complex Video Object Segmentation Track based on MOSE dataset and Motion Expression guided Video Segmentation track based on MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

  38. arXiv:2406.16868  [pdf, other

    eess.SP cs.AI

    Neural Network-based Two-Dimensional Filtering for OTFS Symbol Detection

    Authors: Jiarui Xu, Karim Said, Lizhong Zheng, Lingjia Liu

    Abstract: Orthogonal time frequency space (OTFS) is a promising modulation scheme for wireless communication in high-mobility scenarios. Recently, a reservoir computing (RC) based approach has been introduced for online subframe-based symbol detection in the OTFS system, where only the limited over-the-air (OTA) pilot symbols are utilized for training. However, the previous RC-based approach does not design… ▽ More

    Submitted 8 March, 2024; originally announced June 2024.

    Comments: 6 pages, conference paper. arXiv admin note: substantial text overlap with arXiv:2311.08543

  39. arXiv:2406.16377  [pdf, other

    cs.CL cs.AI

    On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

    Authors: Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, Shuaiyi Li, Bowen Cao, Zhisong Zhang, Xinting Huang, Leyang Cui, Yan Wang, Lemao Liu, Taro Watanabe, Shuming Shi

    Abstract: Despite the general capabilities of pre-trained large language models (LLMs), they still need further adaptation to better serve practical applications. In this paper, we demonstrate the interchangeability of three popular and distinct adaptation tools: parameter updating, reward modeling, and in-context prompting. This interchangeability establishes a triangular framework with six transformation… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  40. arXiv:2406.16151  [pdf, other

    cs.AI cs.LG eess.SY

    Monte Carlo Planning for Stochastic Control on Constrained Markov Decision Processes

    Authors: Larkin Liu, Shiqi Liu, Matej Jusup

    Abstract: In the world of stochastic control, especially in economics and engineering, Markov Decision Processes (MDPs) can effectively model various stochastic decision processes, from asset management to transportation optimization. These underlying MDPs, upon closer examination, often reveal a specifically constrained causal structure concerning the transition and reward dynamics. By exploiting this stru… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Working manuscript

    ACM Class: C.4

  41. arXiv:2406.16066  [pdf, other

    cs.CE

    Constructing Boundary-identical Microstructures by Guided Diffusion for Fast Multiscale Designs

    Authors: Jingxuan Feng, Lili Wang, Xiaoya Zhai, Kai Chen, Wenming Wu, Ligang Liu, Xiao-Ming Fu

    Abstract: We propose a novel method to construct large-scale boundary-identical microstructure datasets with high attribute coverage for highly efficient multiscale design. Central to our technique is using a deep generative model to generate microstructures under the two conditions, including the specified boundary and homogenized elastic tensor. We achieve the desired dataset by alternately adding microst… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  42. arXiv:2406.14482  [pdf, other

    cs.CV

    Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines

    Authors: Xinyi Ying, Chao Xiao, Ruojing Li, Xu He, Boyang Li, Zhaoxu Li, Yingqian Wang, Mingyuan Hu, Qingyu Xu, Zaiping Lin, Miao Li, Shilin Zhou, Wei An, Weidong Sheng, Li Liu

    Abstract: Small object detection (SOD) has been a longstanding yet challenging task for decades, with numerous datasets and algorithms being developed. However, they mainly focus on either visible or thermal modality, while visible-thermal (RGBT) bimodality is rarely explored. Although some RGBT datasets have been developed recently, the insufficient quantity, limited category, misaligned images and large t… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  43. arXiv:2406.14186  [pdf, other

    eess.IV cs.CV

    CriDiff: Criss-cross Injection Diffusion Framework via Generative Pre-train for Prostate Segmentation

    Authors: Tingwei Liu, Miao Zhang, Leiye Liu, Jialong Zhong, Shuyao Wang, Yongri Piao, Huchuan Lu

    Abstract: Recently, the Diffusion Probabilistic Model (DPM)-based methods have achieved substantial success in the field of medical image segmentation. However, most of these methods fail to enable the diffusion model to learn edge features and non-edge features effectively and to inject them efficiently into the diffusion backbone. Additionally, the domain gap between the images features and the diffusion… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted in MICCAI 2024

  44. arXiv:2406.14090  [pdf, other

    cs.AI

    Personalized Music Recommendation with a Heterogeneity-aware Deep Bayesian Network

    Authors: Erkang Jing, Yezheng Liu, Yidong Chai, Shuo Yu, Longshun Liu, Yuanchun Jiang, Yang Wang

    Abstract: Music recommender systems are crucial in music streaming platforms, providing users with music they would enjoy. Recent studies have shown that user emotions can affect users' music mood preferences. However, existing emotion-aware music recommender systems (EMRSs) explicitly or implicitly assume that users' actual emotional states expressed by an identical emotion word are homogeneous. They also… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 34 pages, 19 figures

  45. arXiv:2406.13943  [pdf, ps, other

    cs.IT

    New QEC codes and EAQEC codes from repeated-root cyclic codes of length $2^rp^s$

    Authors: Lanqiang Li, Ziwen Cao, Tingting Wu, Li Liu

    Abstract: Let $p$ be an odd prime and $r,s,m$ be positive integers. In this study, we initiate our exploration by delving into the intricate structure of all repeated-root cyclic codes and their duals with a length of $2^rp^s$ over the finite field $\mathbb{F}_{p^m}$. Through the utilization of CSS and Steane's constructions, a series of new quantum error-correcting (QEC) codes are constructed with paramete… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    MSC Class: 94B15 (Primary) 94B05; 11T71(Secondary)

  46. arXiv:2406.12255  [pdf, other

    cs.CL cs.AI cs.HC cs.LG

    A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning

    Authors: Lijie Hu, Liang Liu, Shu Yang, Xin Chen, Hongru Xiao, Mengdi Li, Pan Zhou, Muhammad Asif Ali, Di Wang

    Abstract: Chain-of-Thought (CoT) holds a significant place in augmenting the reasoning performance for large language models (LLMs). While some studies focus on improving CoT accuracy through methods like retrieval enhancement, yet a rigorous explanation for why CoT achieves such success remains unclear. In this paper, we analyze CoT methods under two different settings by asking the following questions: (1… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 21 pages

  47. arXiv:2406.11519  [pdf, other

    cs.CV eess.IV

    HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

    Authors: Di Wang, Meiqi Hu, Yao Jin, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, Jing Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang

    Abstract: Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA,… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA

  48. arXiv:2406.11410  [pdf, other

    cs.CL cs.AI

    HARE: HumAn pRiors, a key to small language model Efficiency

    Authors: Lingyun Zhang, Bin jin, Gaojian Ge, Lunhui Liu, Xuewen Shen, Mingyong Wu, Houqian Zhang, Yongneng Jiang, Shiqi Chen, Shi Pu

    Abstract: Human priors play a crucial role in efficiently utilizing data in deep learning. However, with the development of large language models (LLMs), there is an increasing emphasis on scaling both model size and data volume, which often diminishes the importance of human priors in data construction. Influenced by these trends, existing Small Language Models (SLMs) mainly rely on web-scraped large-scale… ▽ More

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  49. arXiv:2406.10907  [pdf, other

    cs.CV

    SparseDet: A Simple and Effective Framework for Fully Sparse LiDAR-based 3D Object Detection

    Authors: Lin Liu, Ziying Song, Qiming Xia, Feiyang Jia, Caiyan Jia, Lei Yang, Hongyu Pan

    Abstract: LiDAR-based sparse 3D object detection plays a crucial role in autonomous driving applications due to its computational efficiency advantages. Existing methods either use the features of a single central voxel as an object proxy, or treat an aggregated cluster of foreground points as an object proxy. However, the former lacks the ability to aggregate contextual information, resulting in insufficie… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.02702

  50. arXiv:2406.08723  [pdf, other

    cs.CL

    ECBD: Evidence-Centered Benchmark Design for NLP

    Authors: Yu Lu Liu, Su Lin Blodgett, Jackie Chi Kit Cheung, Q. Vera Liao, Alexandra Olteanu, Ziang Xiao

    Abstract: Benchmarking is seen as critical to assessing progress in NLP. However, creating a benchmark involves many design decisions (e.g., which datasets to include, which metrics to use) that often rely on tacit, untested assumptions about what the benchmark is intended to measure or is actually measuring. There is currently no principled way of analyzing these decisions and how they impact the validity… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.