Showing 1–50 of 1,690 results for author: liu, T

Searching in archive cs.
  1. arXiv:2408.17334

    q-bio.NC cs.CE cs.SC q-bio.TO

    Role of Data-driven Regional Growth Model in Shaping Brain Folding Patterns

    Authors: Jixin Hou, Zhengwang Wu, Xianyan Chen, Dajiang Zhu, Tianming Liu, Gang Li, Xianqiao Wang

    Abstract: The surface morphology of the developing mammalian brain is crucial for understanding brain function and dysfunction. Computational modeling offers valuable insights into the underlying mechanisms for early brain folding. While previous studies generally assume uniform growth, recent findings indicate significant regional variations in brain tissue growth. However, the role of these variations in…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 43 pages, 16 figures

  2. arXiv:2408.15887

    eess.IV cs.CV

    SpineMamba: Enhancing 3D Spinal Segmentation in Clinical Imaging through Residual Visual Mamba Layers and Shape Priors

    Authors: Zhiqing Zhang, Tianyong Liu, Guojia Fan, Bin Li, Qianjin Feng, Shoujun Zhou

    Abstract: Accurate segmentation of 3D clinical medical images is critical in the diagnosis and treatment of spinal diseases. However, the inherent complexity of spinal anatomy and the uncertainty inherent in current imaging technologies pose significant challenges for semantic segmentation of spinal images. Although convolutional neural networks (CNNs) and Transformer-based models have made some progress in s…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 17 pages, 11 figures

  3. arXiv:2408.14467

    cs.CL

    Explicit Inductive Inference using Large Language Models

    Authors: Tianyang Liu, Tianyi Li, Liang Cheng, Mark Steedman

    Abstract: Large Language Models (LLMs) are reported to exhibit an undesirable attestation bias on inference tasks: when asked to predict whether a premise P entails a hypothesis H, instead of considering H's conditional truthfulness entailed by P, LLMs tend to use the out-of-context truth label of H as a fragile proxy. In this paper, we propose a pipeline that exploits this bias to do explicit inductive inference. Our…

    Submitted 26 August, 2024; originally announced August 2024.

  4. arXiv:2408.14418

    cs.CL cs.AI

    MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues

    Authors: Kuluhan Binici, Abhinav Ramesh Kashyap, Viktor Schlegel, Andy T. Liu, Vijay Prakash Dwivedi, Thanh-Tung Nguyen, Xiaoxue Gao, Nancy F. Chen, Stefan Winkler

    Abstract: Automatic Speech Recognition (ASR) systems are pivotal in transcribing speech into text, yet the errors they introduce can significantly degrade the performance of downstream tasks like summarization. This issue is particularly pronounced in clinical dialogue summarization, a low-resource domain where supervised data for fine-tuning is scarce, necessitating the use of ASR models as black-box solut…

    Submitted 26 August, 2024; originally announced August 2024.

  5. arXiv:2408.13858

    cs.CV cs.LG

    Draw Like an Artist: Complex Scene Generation with Diffusion Model via Composition, Painting, and Retouching

    Authors: Minghao Liu, Le Zhang, Yingjie Tian, Xiaochao Qu, Luoqi Liu, Ting Liu

    Abstract: Recent advances in text-to-image diffusion models have demonstrated impressive capabilities in image quality. However, complex scene generation remains relatively unexplored, and even the definition of "complex scene" itself remains unclear. In this paper, we address this gap by providing a precise definition of complex scenes and introducing a set of Complex Decomposition Criteria (CDC) based on…

    Submitted 25 August, 2024; originally announced August 2024.

  6. arXiv:2408.13529

    cs.RO

    Effects of fiber number and density on fiber jamming: Towards follow-the-leader deployment of a continuum robot

    Authors: Chen Qian, Tangyou Liu, Liao Wu

    Abstract: Fiber jamming modules (FJMs) offer flexibility and quick stiffness variation, making them suitable for follow-the-leader (FTL) motions in continuum robots, which is ideal for minimally invasive surgery (MIS). However, their potential has not been fully exploited, particularly in designing and manufacturing small-sized FJMs with high stiffness variation. Although existing research has focused on fa…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 6 pages, 6 figures, accepted by IROS2024

  7. arXiv:2408.12942

    cs.CL cs.AI

    Causal-Guided Active Learning for Debiasing Large Language Models

    Authors: Li Du, Zhouhao Sun, Xiao Ding, Yixuan Ma, Yang Zhao, Kaitao Qiu, Ting Liu, Bing Qin

    Abstract: Although achieving promising performance, recent analyses show that current generative large language models (LLMs) may still capture dataset biases and utilize them for generation, leading to poor generalizability and harmfulness of LLMs. However, due to the diversity of dataset biases and the over-optimization problem, previous prior-knowledge-based debiasing methods and fine-tuning-based debias…

    Submitted 30 August, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted to the ACL 2024 main conference & awarded Outstanding Paper

  8. arXiv:2408.12821

    cs.CV cs.AI

    Examining the Commitments and Difficulties Inherent in Multimodal Foundation Models for Street View Imagery

    Authors: Zhenyuan Yang, Xuhui Lin, Qinyi He, Ziye Huang, Zhengliang Liu, Hanqi Jiang, Peng Shu, Zihao Wu, Yiwei Li, Stephen Law, Gengchen Mai, Tianming Liu, Tao Yang

    Abstract: The emergence of Large Language Models (LLMs) and multimodal foundation models (FMs) has generated heightened interest in their applications that integrate vision and language. This paper investigates the capabilities of ChatGPT-4V and Gemini Pro for Street View Imagery, Built Environment, and Interior by evaluating their performance across various tasks. The assessments include street furniture i…

    Submitted 22 August, 2024; originally announced August 2024.

  9. arXiv:2408.12109

    cs.CV cs.CL

    RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data

    Authors: Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Murun Yang, Qiaozhi He, Tong Xiao, Chunliang Zhang, Tongran Liu, Quan Du, Di Yang, Jingbo Zhu

    Abstract: Large vision-language models (LVLMs) often fail to align with human preferences, leading to issues like generating misleading content without proper visual context (also known as hallucination). A promising solution to this problem is using human-preference alignment techniques, such as best-of-n sampling and reinforcement learning. However, these techniques face the difficulty arising from the sc…

    Submitted 21 August, 2024; originally announced August 2024.

  10. arXiv:2408.12095

    cs.CL cs.AI cs.LG

    uMedSum: A Unified Framework for Advancing Medical Abstractive Summarization

    Authors: Aishik Nagar, Yutong Liu, Andy T. Liu, Viktor Schlegel, Vijay Prakash Dwivedi, Arun-Kumar Kaliya-Perumal, Guna Pratheep Kalanchiam, Yili Tang, Robby T. Tan

    Abstract: Medical abstractive summarization faces the challenge of balancing faithfulness and informativeness. Current methods often sacrifice key information for faithfulness or introduce confabulations when prioritizing informativeness. While recent advancements in techniques like in-context learning (ICL) and fine-tuning have improved medical summarization, they often overlook crucial aspects such as fai…

    Submitted 25 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: 12 pages

  11. arXiv:2408.11850

    cs.CL

    Parallel Speculative Decoding with Adaptive Draft Length

    Authors: Tianyu Liu, Yun Li, Qitan Lv, Kai Liu, Jianchen Zhu, Winston Hu

    Abstract: Speculative decoding (SD), where an extra draft model is employed to provide multiple draft tokens first and then the original target model verifies these tokens in parallel, has shown great power for LLM inference acceleration. However, existing SD methods suffer from the mutual waiting problem, i.e., the target model gets stuck when the draft model is guessing tokens, and vice…

    Submitted 13 August, 2024; originally announced August 2024.

  12. arXiv:2408.11535

    cs.CV

    SAM-REF: Rethinking Image-Prompt Synergy for Refinement in Segment Anything

    Authors: Chongkai Yu, Anqi Li, Xiaochao Qu, Luoqi Liu, Ting Liu

    Abstract: The advent of the Segment Anything Model (SAM) marks a significant milestone for interactive segmentation using generalist models. As a late fusion model, SAM extracts image embeddings once and merges them with prompts in later interactions. This strategy limits the model's ability to extract detailed information from the prompted target zone. Current specialist models utilize the early fusion stra…

    Submitted 22 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  13. arXiv:2408.11431

    cs.CL cs.AI

    Diagnosing and Remedying Knowledge Deficiencies in LLMs via Label-free Curricular Meaningful Learning

    Authors: Kai Xiong, Xiao Ding, Li Du, Jiahao Ying, Ting Liu, Bing Qin, Yixin Cao

    Abstract: Large Language Models (LLMs) are versatile and demonstrate impressive generalization ability by mining and learning information from extensive unlabeled text. However, they still exhibit reasoning mistakes, often stemming from knowledge deficiencies, which can affect their trustworthiness and reliability. Although users can provide diverse and comprehensive queries, obtaining sufficient and effect…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Under Review

  14. arXiv:2408.10627

    cs.CV

    Rethinking Video Segmentation with Masked Video Consistency: Did the Model Learn as Intended?

    Authors: Chen Liang, Qiang Guo, Xiaochao Qu, Luoqi Liu, Ting Liu

    Abstract: Video segmentation aims at partitioning video sequences into meaningful segments based on objects or regions of interest within frames. Current video segmentation models are often derived from image segmentation techniques, which struggle to cope with small-scale or class-imbalanced video datasets. This leads to inconsistent segmentation results across frames. To address these issues, we propose a…

    Submitted 20 August, 2024; originally announced August 2024.

  15. arXiv:2408.10623

    cs.CV

    TextMastero: Mastering High-Quality Scene Text Editing in Diverse Languages and Styles

    Authors: Tong Wang, Xiaochao Qu, Ting Liu

    Abstract: Scene text editing aims to modify texts on images while keeping the style of the newly generated text similar to the original. Given an image, a target area, and target text, the task produces an output image with the target text in the selected area, replacing the original. This task has been studied extensively, with initial success using Generative Adversarial Networks (GANs) to balance text fi…

    Submitted 20 August, 2024; originally announced August 2024.

  16. arXiv:2408.09916

    cs.CV cs.CL

    Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit

    Authors: Qizhou Chen, Taolin Zhang, Chengyu Wang, Xiaofeng He, Dakan Wang, Tingting Liu

    Abstract: Model editing aims to correct outdated or erroneous knowledge in large models without costly retraining. Recent research discovered that the mid-layer representation of the subject's final token in a prompt has a strong influence on factual predictions, and developed Large Language Model (LLM) editing techniques based on this observation. However, for Vision-LLMs (VLLMs), how visual representation…

    Submitted 19 August, 2024; originally announced August 2024.

  17. arXiv:2408.09819

    cs.CL cs.AI

    CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models

    Authors: Linhao Yu, Yongqi Leng, Yufei Huang, Shang Wu, Haixin Liu, Xinmeng Ji, Jiahui Zhao, Jinwang Song, Tingting Cui, Xiaoqing Cheng, Tao Liu, Deyi Xiong

    Abstract: How would a large language model (LLM) respond in ethically relevant contexts? In this paper, we curate a large benchmark, CMoralEval, for morality evaluation of Chinese LLMs. The data sources of CMoralEval are two-fold: 1) a Chinese TV program discussing Chinese moral norms with stories from the society and 2) a collection of Chinese moral anomies from various newspapers and academic papers on mora…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Accepted by ACL 2024 (Findings)

  18. arXiv:2408.09465

    cs.CV cs.AI

    MedMAP: Promoting Incomplete Multi-modal Brain Tumor Segmentation with Alignment

    Authors: Tianyi Liu, Zhaorui Tan, Muyin Chen, Xi Yang, Haochuan Jiang, Kaizhu Huang

    Abstract: Brain tumor segmentation is often based on multiple magnetic resonance imaging (MRI) modalities. However, in clinical practice, certain MRI modalities may be missing, which presents a more difficult scenario. To cope with this challenge, Knowledge Distillation, Domain Adaptation, and Shared Latent Space have emerged as promising strategies. However, recent efforts typically overlook the modality ga…

    Submitted 18 August, 2024; originally announced August 2024.

  19. arXiv:2408.09198

    cs.RO

    Learning Based Toolpath Planner on Diverse Graphs for 3D Printing

    Authors: Yuming Huang, Yuhu Guo, Renbo Su, Xingjian Han, Junhao Ding, Tianyu Zhang, Tao Liu, Weiming Wang, Guoxin Fang, Xu Song, Emily Whiting, Charlie C. L. Wang

    Abstract: This paper presents a learning-based planner for computing optimized 3D printing toolpaths on prescribed graphs, the challenges of which include the varying graph structures on different models and the large scale of nodes & edges on a graph. We adopt an on-the-fly strategy to tackle these challenges, formulating the planner as a Deep Q-Network (DQN) based optimizer to decide the next "best" node…

    Submitted 17 August, 2024; originally announced August 2024.

  20. arXiv:2408.08323

    cs.HC

    Exploring Urban Comfort through Novel Wearables and Environmental Surveys

    Authors: Patrick Chwalek, Sailin Zhong, Nathan Perry, Tianqi Liu, Clayton Miller, Hamed Seiied Alavi, Denis Lalanne, Joseph A. Paradiso

    Abstract: This study presents a comprehensive dataset capturing indoor environmental parameters, physiological responses, and subjective perceptions across three global cities. Utilizing wearable sensors, including smart eyeglasses, and a modified Cozie app, environmental and physiological data were collected, along with pre-screening, onboarding, and recurring surveys. Peripheral cues facilitated participa…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Submitted to Nature Scientific Data

  21. arXiv:2408.07967

    cs.CV

    FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering

    Authors: Guofeng Feng, Siyan Chen, Rong Fu, Zimu Liao, Yi Wang, Tao Liu, Zhilin Pei, Hengjie Li, Xingcheng Zhang, Bo Dai

    Abstract: This work introduces FlashGS, an open-source CUDA Python library, designed to facilitate the efficient differentiable rasterization of 3D Gaussian Splatting through algorithmic and kernel-level optimizations. FlashGS is developed based on the observations from a comprehensive analysis of the rendering process to enhance computational efficiency and bring the technique to wide adoption. The paper i…

    Submitted 19 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  22. Learning Rule-Induced Subgraph Representations for Inductive Relation Prediction

    Authors: Tianyu Liu, Qitan Lv, Jie Wang, Shuling Yang, Hanzhu Chen

    Abstract: Inductive relation prediction (IRP), where entities can be different during training and inference, has shown great power for completing evolving knowledge graphs. Existing works mainly focus on using graph neural networks (GNNs) to learn the representation of the subgraph induced from the target link, which can be seen as an implicit rule-mining process to measure the plausibility of the targ…

    Submitted 20 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Journal ref: Advances in Neural Information Processing Systems 36 (2024)

  23. arXiv:2408.06072

    cs.CV

    CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

    Authors: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong, Jie Tang

    Abstract: We introduce CogVideoX, a large-scale diffusion transformer model designed for generating videos based on text prompts. To efficiently model video data, we propose to leverage a 3D Variational Autoencoder (VAE) to compress videos along both spatial and temporal dimensions. To improve the text-video alignment, we propose an expert transformer with the expert adaptive LayerNorm to facilitate the deep…

    Submitted 12 August, 2024; originally announced August 2024.

  24. arXiv:2408.04835

    cs.NI

    Next-Generation Wi-Fi Networks with Generative AI: Design and Insights

    Authors: Jingyu Wang, Xuming Fang, Dusit Niyato, Tie Liu

    Abstract: Generative artificial intelligence (GAI), known for its powerful capabilities in image and text processing, also holds significant promise for the design and performance enhancement of future wireless networks. In this article, we explore the transformative potential of GAI in next-generation Wi-Fi networks, exploiting its advanced capabilities to address key challenges and improve overall network…

    Submitted 8 August, 2024; originally announced August 2024.

  25. arXiv:2408.04583

    cs.LG cs.AI

    Unveiling the Power of Sparse Neural Networks for Feature Selection

    Authors: Zahra Atashgahi, Tennison Liu, Mykola Pechenizkiy, Raymond Veldhuis, Decebal Constantin Mocanu, Mihaela van der Schaar

    Abstract: Sparse Neural Networks (SNNs) have emerged as powerful tools for efficient feature selection. Leveraging the dynamic sparse training (DST) algorithms within SNNs has demonstrated promising feature selection capabilities while drastically reducing computational overheads. Despite these advancements, several critical aspects remain insufficiently explored for feature selection. Questions persist reg…

    Submitted 8 August, 2024; originally announced August 2024.

  26. arXiv:2408.04131

    cs.LG

    Heterogeneous Graph Sequence Neural Networks for Dynamic Traffic Assignment

    Authors: Tong Liu, Hadi Meidani

    Abstract: Traffic assignment and traffic flow prediction provide critical insights for urban planning, traffic management, and the development of intelligent transportation systems. An efficient model for calculating traffic flows over the entire transportation network could provide a more detailed and realistic understanding of traffic dynamics. However, existing traffic prediction approaches, such as thos…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 9 pages, 5 figures

  27. arXiv:2408.04034

    cs.CV

    Task-oriented Sequential Grounding in 3D Scenes

    Authors: Zhuofan Zhang, Ziyu Zhu, Pengxiang Li, Tengyu Liu, Xiaojian Ma, Yixin Chen, Baoxiong Jia, Siyuan Huang, Qing Li

    Abstract: Grounding natural language in physical 3D environments is essential for the advancement of embodied artificial intelligence. Current datasets and models for 3D visual grounding predominantly focus on identifying and localizing objects from static, object-centric descriptions. These approaches do not adequately address the dynamic and sequential nature of task-oriented grounding necessary for pract…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: website: https://sg-3d.github.io/

  28. AdapMTL: Adaptive Pruning Framework for Multitask Learning Model

    Authors: Mingcan Xiang, Steven Jiaxun Tang, Qizheng Yang, Hui Guan, Tongping Liu

    Abstract: In the domain of multimedia and multimodal processing, the efficient handling of diverse data streams such as images, video, and sensor data is paramount. Model compression and multitask learning (MTL) are crucial in this field, offering the potential to address the resource-intensive demands of processing and interpreting multiple forms of media simultaneously. However, effectively compressing a…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 13 pages, 9 figures, Published at ACM Multimedia (ACM MM) 2024

  29. arXiv:2408.03675

    cs.CL

    NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time

    Authors: Yilong Chen, Guoxia Wang, Junyuan Shang, Shiyao Cui, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun, Dianhai Yu, Hua Wu

    Abstract: Large Language Models (LLMs) have ignited an innovative surge of AI applications, marking a new era of exciting possibilities equipped with extended context windows. However, hosting these models is cost-prohibitive mainly due to the extensive memory consumption of KV Cache involving long-context modeling. Despite several works proposing to evict unnecessary tokens from the KV Cache, most of them…

    Submitted 7 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted by ACL 2024 (main conference, long paper)

  30. arXiv:2408.03482

    cs.CR

    Beyond App Markets: Demystifying Underground Mobile App Distribution Via Telegram

    Authors: Yanhui Guo, Dong Wang, Liu Wang, Yongsheng Fang, Chao Wang, Minghui Yang, Tianming Liu, Haoyu Wang

    Abstract: The thriving mobile app ecosystem encompasses a wide range of functionalities. However, within this ecosystem, a subset of apps provides illicit services such as gambling and pornography to pursue economic gains, collectively referred to as "underground economy apps". While previous studies have examined these apps' characteristics and identification methods, investigations into their distribution…

    Submitted 6 August, 2024; originally announced August 2024.

  31. arXiv:2408.03359

    cs.LG cs.AI cs.CL

    LAMPO: Large Language Models as Preference Machines for Few-shot Ordinal Classification

    Authors: Zhen Qin, Junru Wu, Jiaming Shen, Tianqi Liu, Xuanhui Wang

    Abstract: We introduce LAMPO, a novel paradigm that leverages Large Language Models (LLMs) for solving few-shot multi-class ordinal classification tasks. Unlike conventional methods, which concatenate all demonstration examples with the test instance and prompt LLMs to produce the pointwise prediction, our framework uses the LLM as a preference machine that makes a relative comparative decision between the…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: COLM 2024

  32. arXiv:2408.03286

    cs.CV

    Biomedical SAM 2: Segment Anything in Biomedical Images and Videos

    Authors: Zhiling Yan, Weixiang Sun, Rong Zhou, Zhengqing Yuan, Kai Zhang, Yiwei Li, Tianming Liu, Quanzheng Li, Xiang Li, Lifang He, Lichao Sun

    Abstract: Medical image segmentation and video object segmentation are essential for diagnosing and analyzing diseases by identifying and measuring biological structures. Recent advances in the natural domain have been driven by foundation models such as the Segment Anything Model 2 (SAM-2). To explore the performance of SAM-2 in biomedical applications, we designed three evaluation pipelines for single-frame 2D i…

    Submitted 17 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  33. arXiv:2408.03079

    cs.CL cs.AI

    Enhancing Complex Causality Extraction via Improved Subtask Interaction and Knowledge Fusion

    Authors: Jinglong Gao, Chen Lu, Xiao Ding, Zhongyang Li, Ting Liu, Bing Qin

    Abstract: Event Causality Extraction (ECE) aims at extracting causal event pairs from texts. Despite ChatGPT's recent success, fine-tuning small models remains the best approach for the ECE task. However, existing fine-tuning based ECE methods cannot address all three key challenges in ECE simultaneously: 1) Complex Causality Extraction, where multiple cause-effect pairs occur within a single sentence; 2)…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: NLPCC 2024 Oral

  34. arXiv:2408.01607

    cs.CV cs.LG

    Deep Learning Meets OBIA: Tasks, Challenges, Strategies, and Perspectives

    Authors: Lei Ma, Ziyun Yan, Mengmeng Li, Tao Liu, Liqin Tan, Xuan Wang, Weiqiang He, Ruikun Wang, Guangjun He, Heng Lu, Thomas Blaschke

    Abstract: Deep learning has gained significant attention in remote sensing, especially in pixel- or patch-level applications. Despite initial attempts to integrate deep learning into object-based image analysis (OBIA), its full potential remains largely unexplored. In this article, as OBIA usage becomes more widespread, we conducted a comprehensive review and expansion of its task subdomains, with or withou…

    Submitted 2 August, 2024; originally announced August 2024.

  35. arXiv:2408.01370

    cs.CV cs.RO

    EVIT: Event-based Visual-Inertial Tracking in Semi-Dense Maps Using Windowed Nonlinear Optimization

    Authors: Runze Yuan, Tao Liu, Zijia Dai, Yi-Fan Zuo, Laurent Kneip

    Abstract: Event cameras are interesting visual exteroceptive sensors that react to brightness changes rather than integrating absolute image intensities. Owing to this design, these sensors exhibit strong performance in situations of challenging dynamics and illumination conditions. While event-based simultaneous tracking and mapping remains a challenging problem, a number of recent works have pointed out…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 8 pages, 5 figures, 3 tables, International Conference on Intelligent Robots and Systems 2024

  36. arXiv:2408.01319

    cs.AI

    A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks

    Authors: Jiaqi Wang, Hanqi Jiang, Yiheng Liu, Chong Ma, Xu Zhang, Yi Pan, Mengyuan Liu, Peiran Gu, Sichen Xia, Wenjun Li, Yutong Zhang, Zihao Wu, Zhengliang Liu, Tianyang Zhong, Bao Ge, Tuo Zhang, Ning Qiang, Xintao Hu, Xi Jiang, Xin Zhang, Wei Zhang, Dinggang Shen, Tianming Liu, Shu Zhang

    Abstract: In an era defined by the explosive growth of data and rapid technological advancements, Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence (AI) systems. Designed to seamlessly integrate diverse data types, including text, images, videos, audio, and physiological sequences, MLLMs address the complexities of real-world applications far beyond the capabilities of…

    Submitted 2 August, 2024; originally announced August 2024.

  37. arXiv:2408.01276

    cs.CV

    Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement

    Authors: Wenbin Zou, Hongxia Gao, Weipeng Yang, Tongtong Liu

    Abstract: Ultra-high-definition (UHD) technology has attracted widespread attention due to its exceptional visual quality, but it also poses new challenges for low-light image enhancement (LLIE) techniques. UHD images inherently possess high computational complexity, leading existing UHD LLIE methods to employ high-magnification downsampling to reduce computational costs, which in turn results in informatio…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 10 pages, 8 figures, accepted at ACM MM 2024

  38. arXiv:2408.00278

    cs.LG cs.AI cs.NE

    High Performance Im2win and Direct Convolutions using Three Tensor Layouts on SIMD Architectures

    Authors: Xiang Fu, Xinpeng Zhang, Jixiang Ma, Peng Zhao, Shuai Lu, Xu T. Liu

    Abstract: Convolution is the core component of deep neural networks, and it is computationally intensive and time-consuming. Tensor data layouts significantly impact convolution operations in terms of memory access and computational efficiency. Yet, there is still a lack of comprehensive performance characterization of data layouts on SIMD architectures concerning convolution methods. This paper proposes…

    Submitted 1 August, 2024; originally announced August 2024.

  39. arXiv:2407.21057

    cs.CL cs.AI cs.LG

    Multi-group Uncertainty Quantification for Long-form Text Generation

    Authors: Terrance Liu, Zhiwei Steven Wu

    Abstract: While large language models are rapidly moving towards consumer-facing applications, they are often still prone to factual errors and hallucinations. In order to reduce the potential harms that may come from these errors, it is important for users to know to what extent they can trust an LLM when it makes a factual claim. To this end, we study the problem of uncertainty quantification of factual c…

    Submitted 24 July, 2024; originally announced July 2024.

  40. arXiv:2407.20522

    cs.HC

    Evaluating Fairness in Black-box Algorithmic Markets: A Case Study of Ride Sharing in Chicago

    Authors: Yuhan Liu, Yuhan Zheng, Siyuan Zhang, Lydia T. Liu

    Abstract: This study examines fairness within the rideshare industry, focusing on both drivers' wages and riders' trip fares. Through quantitative analysis, we found that drivers' hourly wages are significantly influenced by factors such as race/ethnicity, health insurance status, tenure on the platform, and working hours. Despite platforms' policies not intentionally embedding biases, disparities persist b…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted to the Humans, Algorithmic Decision-Making and Society: Modeling Interactions and Impact, co-located with the International Conference on Machine Learning, Vienna, Austria

  41. arXiv:2407.19625

    cs.CL cs.MM

    LoginMEA: Local-to-Global Interaction Network for Multi-modal Entity Alignment

    Authors: Taoyu Su, Xinghua Zhang, Jiawei Sheng, Zhenyu Zhang, Tingwen Liu

    Abstract: Multi-modal entity alignment (MMEA) aims to identify equivalent entities between two multi-modal knowledge graphs (MMKGs), whose entities can be associated with relational triples and related images. Most previous studies treat the graph structure as a special modality, and fuse different modality information with separate uni-modal encoders, neglecting valuable relational associations in modaliti…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted by ECAI 2024

  42. arXiv:2407.19389

    cs.DC cs.LG math.OC

    FIARSE: Model-Heterogeneous Federated Learning via Importance-Aware Submodel Extraction

    Authors: Feijie Wu, Xingchen Wang, Yaqing Wang, Tianci Liu, Lu Su, Jing Gao

    Abstract: In federated learning (FL), accommodating clients' varied computational capacities poses a challenge, often limiting the participation of those with constrained resources in global model training. To address this issue, the concept of model heterogeneity through submodel extraction has emerged, offering a tailored solution that aligns the model's complexity with each client's computational capacit…

    Submitted 28 July, 2024; originally announced July 2024.

  43. arXiv:2407.19302  [pdf, other]

    cs.CL cs.MM

    IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment

    Authors: Taoyu Su, Jiawei Sheng, Shicheng Wang, Xinghua Zhang, Hongbo Xu, Tingwen Liu

    Abstract: Multi-modal entity alignment (MMEA) aims to identify equivalent entities between multi-modal knowledge graphs (MMKGs), where the entities can be associated with related images. Most existing studies integrate multi-modal information by relying heavily on an automatically learned fusion module, and rarely suppress redundant information for MMEA explicitly. To this end, we explore variational infor…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  44. arXiv:2407.18523  [pdf, other]

    cs.LG

    DTFormer: A Transformer-Based Method for Discrete-Time Dynamic Graph Representation Learning

    Authors: Xi Chen, Yun Xiong, Siwei Zhang, Jiawei Zhang, Yao Zhang, Shiyang Zhou, Xixi Wu, Mingyang Zhang, Tengfei Liu, Weiqiang Wang

    Abstract: Discrete-Time Dynamic Graphs (DTDGs), which are prevalent in real-world implementations and notable for their ease of data acquisition, have garnered considerable attention from both academic researchers and industry practitioners. The representation learning of DTDGs has been extensively applied to model the dynamics of temporally changing entities and their evolving connections. Currently, DTDG…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures

  45. arXiv:2407.18064  [pdf, other]

    cs.HC

    ComPeer: A Generative Conversational Agent for Proactive Peer Support

    Authors: Tianjian Liu, Hongzheng Zhao, Yuheng Liu, Xingbo Wang, Zhenhui Peng

    Abstract: Conversational Agents (CAs) acting as peer supporters have been widely studied and demonstrated to be beneficial for people's mental health. However, previous peer support CAs are either user-initiated or follow predefined rules to initiate conversations, which may discourage users from engaging and building relationships with the CAs for long-term benefits. In this paper, we develop ComPeer, a generative…

    Submitted 5 August, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: To appear at the 2024 ACM Symposium on User Interface Software and Technology (UIST); 22 pages (7 figures, 7 tables)

  46. arXiv:2407.16008  [pdf, other]

    cs.CL

    Boosting Reward Model with Preference-Conditional Multi-Aspect Synthetic Data Generation

    Authors: Jiaming Shen, Ran Xu, Yennie Jun, Zhen Qin, Tianqi Liu, Carl Yang, Yi Liang, Simon Baumgartner, Michael Bendersky

    Abstract: Reward models (RMs) are crucial for aligning large language models (LLMs) with human preferences. They are trained using preference datasets where each example consists of one input prompt, two responses, and a preference label. As curating a high-quality human-labeled preference dataset is both time-consuming and expensive, people often rely on existing powerful LLMs for preference label generati…

    Submitted 22 July, 2024; originally announced July 2024.

  47. arXiv:2407.15975  [pdf, other]

    cs.CL

    Multilingual Fine-Grained News Headline Hallucination Detection

    Authors: Jiaming Shen, Tianqi Liu, Jialu Liu, Zhen Qin, Jay Pavagadhi, Simon Baumgartner, Michael Bendersky

    Abstract: The popularity of automated news headline generation has surged with advancements in pre-trained language models. However, these models often suffer from the "hallucination" problem, where the generated headline is not fully supported by its source article. Efforts to address this issue have predominantly focused on English, using overly simplistic classification schemes that overlook nuanced hall…

    Submitted 22 July, 2024; originally announced July 2024.

  48. arXiv:2407.15791  [pdf, other]

    cs.CV

    RADA: Robust and Accurate Feature Learning with Domain Adaptation

    Authors: Jingtai He, Gehao Zhang, Tingting Liu, Songlin Du

    Abstract: Recent advancements in keypoint detection and descriptor extraction have shown impressive performance in local feature learning tasks. However, existing methods generally exhibit suboptimal performance under extreme conditions such as significant appearance changes and domain shifts. In this study, we introduce a multi-level feature aggregation network that incorporates two pivotal components to f…

    Submitted 22 July, 2024; originally announced July 2024.

  49. arXiv:2407.15683  [pdf, other]

    cs.CV

    Enhancing Transferability of Targeted Adversarial Examples: A Self-Universal Perspective

    Authors: Bowen Peng, Li Liu, Tianpeng Liu, Zhen Liu, Yongxiang Liu

    Abstract: Transfer-based targeted adversarial attacks against black-box deep neural networks (DNNs) have been proven to be significantly more challenging than untargeted ones. The impressive transferability of the current state-of-the-art (SOTA) generative methods comes at the cost of requiring massive amounts of additional data and time-consuming training for each target label. This results in limited efficiency and flex…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 8 pages, 9 figures

  50. arXiv:2407.13642  [pdf, other]

    cs.CV

    Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

    Authors: Xiaoyu Zhu, Hao Zhou, Pengfei Xing, Long Zhao, Hao Xu, Junwei Liang, Alexander Hauptmann, Ting Liu, Andrew Gallagher

    Abstract: In this paper, we investigate the use of diffusion models pre-trained on large-scale image-caption pairs for open-vocabulary 3D semantic understanding. We propose a novel method, namely Diff2Scene, which leverages frozen representations from text-image generative models, along with salient-aware and geometric-aware masks, for open-vocabulary 3D semantic segmentation and visual grounding…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024