Showing 1–50 of 1,294 results for author: Mao, X

Searching in archive cs.
  1. arXiv:2407.13596  [pdf, other]

    cs.CV

    EarthMarker: A Visual Prompt Learning Framework for Region-level and Point-level Remote Sensing Imagery Comprehension

    Authors: Wei Zhang, Miaoxin Cai, Tong Zhang, Yin Zhuang, Xuerui Mao

    Abstract: Recent advances in visual prompting in the natural image area have allowed users to interact with artificial intelligence (AI) tools through various visual marks such as box, point, and free-form shapes. However, due to the significant difference between the natural and remote sensing (RS) images, existing visual prompting models face challenges in RS scenarios. Moreover, RS MLLMs mainly focus on…

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.13268  [pdf, other]

    cs.AI cs.LG

    Mixture of Experts based Multi-task Supervise Learning from Crowds

    Authors: Tao Han, Huaixuan Shi, Xinyi Ding, Xiao Ma, Huamao Gu, Yili Fang

    Abstract: Existing truth inference methods in crowdsourcing aim to map redundant labels and items to the ground truth. They treat the ground truth as hidden variables and use statistical or deep learning-based worker behavior models to infer the ground truth. However, worker behavior models that rely on ground truth hidden variables overlook workers' behavior at the item feature level, leading to imprecise…

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2407.12267  [pdf, other]

    cs.CV cs.GR

    Generating 3D House Wireframes with Semantics

    Authors: Xueqi Ma, Yilin Liu, Wenjun Zhou, Ruowei Wang, Hui Huang

    Abstract: We present a new approach for generating 3D house wireframes with semantic enrichment using an autoregressive model. Unlike conventional generative models that independently process vertices, edges, and faces, our approach employs a unified wire-based representation for improved coherence in learning 3D wireframe structures. By re-ordering wire sequences based on semantic meanings, we facilitate s…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: European Conference on Computer Vision (Proceedings of ECCV 2024); Project page: https://vcc.tech/research/2024/3DWire; GitHub repository: https://github.com/3d-house-wireframe/3d-house-wireframe-dataset

  4. arXiv:2407.12019  [pdf, other]

    cs.CL cs.AI

    DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model

    Authors: Shezheng Song, Shasha Li, Jie Yu, Shan Zhao, Xiaopeng Li, Jun Ma, Xiaodong Liu, Zhuo Li, Xiaoguang Mao

    Abstract: Our study delves into Multimodal Entity Linking, aligning the mention in multimodal information with entities in knowledge base. Existing methods are still facing challenges like ambiguous entity representations and limited image information utilization. Thus, we propose dynamic entity extraction using ChatGPT, which dynamically extracts entities and enhances datasets. We also propose a method: Dy…

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Published on PRCV24

  5. arXiv:2407.11626  [pdf]

    cs.LG cs.NE

    Dynamic Dimension Wrapping (DDW) Algorithm: A Novel Approach for Efficient Cross-Dimensional Search in Dynamic Multidimensional Spaces

    Authors: Dongnan Jin, Yali Liu, Qiuzhi Song, Xunju Ma, Yue Liu, Dehao Wu

    Abstract: In the real world, as the complexity of optimization problems continues to increase, there is an urgent need to research more efficient optimization methods. Current optimization algorithms excel in solving problems with a fixed number of dimensions. However, their efficiency in searching dynamic multi-dimensional spaces is unsatisfactory. In response to the challenge of cross-dimensional search i…

    Submitted 18 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

  6. arXiv:2407.10468  [pdf, other]

    cs.SD cs.AI eess.AS

    LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis

    Authors: Zhenxiong Tan, Xinyin Ma, Gongfan Fang, Xinchao Wang

    Abstract: Latent diffusion models have shown promising results in audio generation, making notable advancements over traditional methods. However, their performance, while impressive with short audio clips, faces challenges when extended to longer audio sequences. These challenges are due to the model's self-attention mechanism and training predominantly on 10-second clips, which complicates the extension to lo…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Interspeech 2024; Code: https://github.com/Yuanshi9815/LiteFocus

  7. arXiv:2407.10439  [pdf, other]

    cs.CV

    PolyRoom: Room-aware Transformer for Floorplan Reconstruction

    Authors: Yuzhou Liu, Lingjie Zhu, Xiaodong Ma, Hanqiao Ye, Xiang Gao, Xianwei Zheng, Shuhan Shen

    Abstract: Reconstructing geometry and topology structures from raw unstructured data has always been an important research topic in indoor mapping research. In this paper, we aim to reconstruct the floorplan with a vectorized representation from point clouds. Despite significant advancements achieved in recent years, current methods still encounter several challenges, such as missing corners or edges, inacc…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  8. arXiv:2407.10366  [pdf, other]

    cs.CV cs.AI cs.LG

    Accessing Vision Foundation Models at ImageNet-level Costs

    Authors: Yitian Zhang, Xu Ma, Yue Bai, Huan Wang, Yun Fu

    Abstract: Vision foundation models are renowned for their generalization ability due to massive training data. Nevertheless, they demand tremendous training resources, and the training data is often inaccessible, e.g., CLIP, DINOv2, posing great challenges to developing derivatives that could advance research in this field. In this work, we offer a very simple and general solution, named Proteus, to distill…

    Submitted 14 July, 2024; originally announced July 2024.

  9. arXiv:2407.09247  [pdf, other]

    cs.AI

    Constrained Intrinsic Motivation for Reinforcement Learning

    Authors: Xiang Zheng, Xingjun Ma, Chao Shen, Cong Wang

    Abstract: This paper investigates two fundamental problems that arise when utilizing Intrinsic Motivation (IM) for reinforcement learning in Reward-Free Pre-Training (RFPT) tasks and Exploration with Intrinsic Motivation (EIM) tasks: 1) how to design an effective intrinsic objective in RFPT tasks, and 2) how to reduce the bias introduced by the intrinsic objective in EIM tasks. Existing IM methods suffer fr…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by IJCAI 2024

  10. arXiv:2407.09096  [pdf, other]

    cs.LG cs.AI

    STD-LLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with LLMs

    Authors: Yiheng Huang, Xiaowei Mao, Shengnan Guo, Yubin Chen, Youfang Lin, Huaiyu Wan

    Abstract: Spatial-temporal forecasting and imputation are important for real-world dynamic systems such as intelligent transportation, urban planning, and public health. Most existing methods are tailored for individual forecasting or imputation tasks but are not designed for both. Additionally, they are less effective for zero-shot and few-shot learning. While large language models (LLMs) have exhibited st…

    Submitted 12 July, 2024; originally announced July 2024.

  11. arXiv:2407.09013  [pdf, ps, other]

    cs.AI cs.LG

    Procedural Content Generation via Generative Artificial Intelligence

    Authors: Xinyu Mao, Wanli Yu, Kazunori D Yamada, Michael R. Zielewski

    Abstract: The attempt to utilize machine learning in PCG has been made in the past. In this survey paper, we investigate how generative artificial intelligence (AI), which saw a significant increase in interest in the mid-2010s, is being used for PCG. We review applications of generative AI for the creation of various types of content, including terrains, items, and even storylines. While generative AI is e…

    Submitted 12 July, 2024; originally announced July 2024.

  12. arXiv:2407.08978  [pdf, other]

    cs.CL cs.LG

    Towards Chapter-to-Chapter Context-Aware Literary Translation via Large Language Models

    Authors: Linghao Jin, Li An, Xuezhe Ma

    Abstract: Discourse phenomena in existing document-level translation datasets are sparse, which has been a fundamental obstacle in the development of context-aware machine translation models. Moreover, most existing document-level corpora and context-aware machine translation methods rely on an unrealistic assumption on sentence-level alignments. To mitigate these issues, we first curate a novel dataset of…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Preprint

  13. arXiv:2407.08528  [pdf, other]

    eess.IV cs.CV cs.MM

    Enhancing octree-based context models for point cloud geometry compression with attention-based child node number prediction

    Authors: Chang Sun, Hui Yuan, Xiaolong Mao, Xin Lu, Raouf Hamzaoui

    Abstract: In point cloud geometry compression, most octree-based context models use the cross-entropy between the one-hot encoding of node occupancy and the probability distribution predicted by the context model as the loss. This approach converts the problem of predicting the number (a regression problem) and the position (a classification problem) of occupied child nodes into a 255-dimensional classificati…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 2 figures and 2 tables

    Journal ref: IEEE Signal Processing Letters, 2024

  14. arXiv:2407.07868  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Green Screen Augmentation Enables Scene Generalisation in Robotic Manipulation

    Authors: Eugene Teoh, Sumit Patidar, Xiao Ma, Stephen James

    Abstract: Generalising vision-based manipulation policies to novel environments remains a challenging area with limited exploration. Current practices involve collecting data in one location, training imitation learning or reinforcement learning policies with this data, and deploying the policy in the same location. However, this approach lacks scalability as it necessitates data collection in multiple loca…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Project website: https://greenaug.github.io/

  15. arXiv:2407.07791  [pdf, other]

    cs.CL

    Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities

    Authors: Tianjie Ju, Yiting Wang, Xinbei Ma, Pengzhou Cheng, Haodong Zhao, Yulong Wang, Lifeng Liu, Jian Xie, Zhuosheng Zhang, Gongshen Liu

    Abstract: The rapid adoption of large language models (LLMs) in multi-agent systems has highlighted their impressive capabilities in various applications, such as collaborative problem-solving and autonomous negotiation. However, the security implications of these LLM-based multi-agent systems have not been thoroughly investigated, particularly concerning the spread of manipulated knowledge. In this paper,…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 18 pages, work in progress

  16. arXiv:2407.07788  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark

    Authors: Nikita Chernyadev, Nicholas Backshall, Xiao Ma, Yunfan Lu, Younggyo Seo, Stephen James

    Abstract: We introduce BiGym, a new benchmark and learning environment for mobile bi-manual demo-driven robotic manipulation. BiGym features 40 diverse tasks set in home environments, ranging from simple target reaching to complex kitchen cleaning. To capture the real-world performance accurately, we provide human-collected demonstrations for each task, reflecting the diverse modalities found in real-world…

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: Project webpage: https://chernyadev.github.io/bigym/

  17. arXiv:2407.07325  [pdf, other]

    cs.CV cs.CL cs.MM eess.IV

    HiLight: Technical Report on the Motern AI Video Language Model

    Authors: Zhiting Wang, Qiangong Zhou, Kangjie Yang, Zongyang Liu, Xin Mao

    Abstract: This technical report presents the implementation of a state-of-the-art video encoder for video-text modal alignment and a video conversation framework called HiLight, which features dual visual towers. The work is divided into two main parts: 1. alignment of video and text modalities; 2. a convenient and efficient way to interact with users. Our goal is to address the task of video comprehension in t…

    Submitted 11 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  18. arXiv:2407.06841  [pdf, other]

    cs.CV

    HTD-Mamba: Efficient Hyperspectral Target Detection with Pyramid State Space Model

    Authors: Dunbin Shen, Xuanbing Zhu, Jiacheng Tian, Jianjun Liu, Zhenrong Du, Hongyu Wang, Xiaorui Ma

    Abstract: Hyperspectral target detection (HTD) identifies objects of interest from complex backgrounds at the pixel level, playing a vital role in Earth observation. However, HTD faces challenges due to limited prior knowledge and spectral variation, leading to underfitting models and unreliable performance. To address these challenges, this paper proposes an efficient self-supervised HTD method with a pyra…

    Submitted 17 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: 13 pages, 6 figures, 5 tables

  19. arXiv:2407.05282  [pdf, other]

    cs.CV

    UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

    Authors: Haozhe Zhao, Xiaojian Ma, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li, Baobao Chang

    Abstract: This paper presents UltraEdit, a large-scale (approximately 4 million editing samples), automatically generated dataset for instruction-based image editing. Our key idea is to address the drawbacks in existing image editing datasets like InstructPix2Pix and MagicBrush, and provide a systematic approach to producing massive and high-quality image editing samples. UltraEdit offers several distinct a…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 32 pages, 14 figures

  20. arXiv:2407.05010  [pdf, other]

    cs.CV

    PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference

    Authors: Ye Li, Chen Tang, Yuan Meng, Jiajun Fan, Zenghao Chai, Xinzhu Ma, Zhi Wang, Wenwu Zhu

    Abstract: We introduce PRANCE, a Vision Transformer compression framework that jointly optimizes the activated channels and reduces tokens, based on the characteristics of inputs. Specifically, PRANCE leverages adaptive token optimization strategies for a certain computational budget, aiming to accelerate ViTs' inference from a unified data and architectural perspective. However, the joint framework poses…

    Submitted 6 July, 2024; originally announced July 2024.

  21. arXiv:2407.04616  [pdf, other]

    cs.CV cs.AI cs.LG

    Isomorphic Pruning for Vision Models

    Authors: Gongfan Fang, Xinyin Ma, Michael Bi Mi, Xinchao Wang

    Abstract: Structured pruning reduces the computational overhead of deep neural networks by removing redundant sub-structures. However, assessing the relative importance of different sub-structures remains a significant challenge, particularly in advanced vision models featuring novel mechanisms and architectures like self-attention, depth-wise convolutions, or residual connections. These heterogeneous subst…

    Submitted 5 July, 2024; originally announced July 2024.

  22. arXiv:2407.03695  [pdf, other]

    cs.CV

    M^3:Manipulation Mask Manufacturer for Arbitrary-Scale Super-Resolution Mask

    Authors: Xinyu Yang, Xiaochen Ma, Xuekang Zhu, Bo Du, Lei Su, Bingkui Tong, Zeyu Lei, Jizhe Zhou

    Abstract: In the field of image manipulation localization (IML), the small quantity and poor quality of existing datasets have always been major issues. A dataset containing various types of manipulations will greatly help improve the accuracy of IML models. Images on the internet (such as those on Baidu Tieba's PS Bar) are manipulated using various techniques, and creating a dataset from these images will…

    Submitted 4 July, 2024; originally announced July 2024.

  23. arXiv:2407.02813  [pdf, other]

    cs.CV cs.AI cs.LG

    Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design

    Authors: Gen Li, Zhihao Shu, Jie Ji, Minghai Qin, Fatemeh Afghah, Wei Niu, Xiaolong Ma

    Abstract: Deep neural networks (DNNs) are frequently employed in a variety of computer vision applications. Nowadays, an emerging trend in the current video distribution system is to take advantage of DNN's overfitting properties to perform video resolution upscaling. By splitting videos into chunks and applying a super-resolution (SR) model to overfit each chunk, this scheme of SR models plus video chunks…

    Submitted 11 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  24. arXiv:2407.02716  [pdf, other]

    cs.CV cs.LG

    Light-weight Fine-tuning Method for Defending Adversarial Noise in Pre-trained Medical Vision-Language Models

    Authors: Xu Han, Linghao Jin, Xuezhe Ma, Xiaofeng Liu

    Abstract: Fine-tuning pre-trained Vision-Language Models (VLMs) has shown remarkable capabilities in medical image and textual depiction synergy. Nevertheless, many pre-training datasets are restricted by patient privacy concerns, potentially containing noise that can adversely affect downstream performance. Moreover, the growing reliance on multi-modal generation exacerbates this issue because of its susce…

    Submitted 2 July, 2024; originally announced July 2024.

  25. Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation

    Authors: Xinyu Mao, Shengyao Zhuang, Bevan Koopman, Guido Zuccon

    Abstract: The goal of screening prioritisation in systematic reviews is to identify relevant documents with high recall and rank them in early positions for review. This saves reviewing effort if paired with a stopping criterion, and speeds up review completion if performed alongside downstream tasks. Recent studies have shown that neural models have good potential on this task, but their time-consuming fin…

    Submitted 17 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted at SIGIR 2024; typos corrected

  26. arXiv:2407.00114  [pdf, other]

    cs.LG cs.AI cs.CL

    OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

    Authors: Zihao Wang, Shaofei Cai, Zhancun Mu, Haowei Lin, Ceyao Zhang, Xuejie Liu, Qing Li, Anji Liu, Xiaojian Ma, Yitao Liang

    Abstract: We present OmniJARVIS, a novel Vision-Language-Action (VLA) model for open-world instruction-following agents in open-world Minecraft. Compared to prior works that either emit textual goals to separate controllers or produce the control command directly, OmniJARVIS seeks a different path to ensure both strong reasoning and efficient decision-making capabilities via unified tokenization of multimod…

    Submitted 27 June, 2024; originally announced July 2024.

  27. arXiv:2406.19964  [pdf, other]

    cs.CR

    Secure Outsourced Decryption for FHE-based Privacy-preserving Cloud Computing

    Authors: Xirong Ma, Chuan Li, Yuchang Hu, Yunting Tao, Yali Jiang, Yanbin Li, Fanyu Kong, Chunpeng Ge

    Abstract: The demand for processing vast volumes of data has surged dramatically due to the advancement of machine learning technology. Large-scale data processing necessitates substantial computational resources, prompting individuals and enterprises to turn to cloud services. Accompanying this trend is a growing concern regarding data leakage and misuse. Homomorphic encryption (HE) is one solution for saf…

    Submitted 9 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: content and title updated

  28. arXiv:2406.19711  [pdf, other]

    cs.LG

    CHASE: A Causal Heterogeneous Graph based Framework for Root Cause Analysis in Multimodal Microservice Systems

    Authors: Ziming Zhao, Tiehua Zhang, Zhishu Shen, Hai Dong, Xingjun Ma, Xianhui Liu, Yun Yang

    Abstract: In recent years, the widespread adoption of distributed microservice architectures within the industry has significantly increased the demand for enhanced system availability and robustness. Due to the complex service invocation paths and dependencies at enterprise-level microservice systems, it is challenging to locate the anomalies promptly during service invocations, thus causing intractable is…

    Submitted 28 June, 2024; originally announced June 2024.

  29. arXiv:2406.17923  [pdf, other]

    cs.CL

    PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning

    Authors: Shiva Kumar Pentyala, Zhichao Wang, Bin Bi, Kiran Ramnath, Xiang-Bo Mao, Regunathan Radhakrishnan, Sitaram Asur, Na Cheng

    Abstract: Large language models (LLMs) have shown remarkable abilities in diverse natural language processing (NLP) tasks. The LLMs generally undergo supervised fine-tuning (SFT) followed by preference alignment to be usable in downstream applications. However, this sequential training pipeline leads to alignment tax that degrades the LLM performance. This paper introduces PAFT, a new PArallel training pa…

    Submitted 25 June, 2024; originally announced June 2024.

  30. arXiv:2406.17608  [pdf, other]

    cs.CV

    Test-Time Generative Augmentation for Medical Image Segmentation

    Authors: Xiao Ma, Yuhui Tao, Yuhan Zhang, Zexuan Ji, Yizhe Zhang, Qiang Chen

    Abstract: In this paper, we propose a novel approach to enhance medical image segmentation during test time. Instead of employing hand-crafted transforms or functions on the input test image to create multiple views for test-time augmentation, we advocate for the utilization of an advanced domain-fine-tuned generative model (GM), e.g., stable diffusion (SD), for test-time augmentation. Given that the GM has…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 12 pages, 2 figures

  31. arXiv:2406.17343  [pdf, other]

    cs.CV cs.AI

    Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers

    Authors: Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jingyan Jiang, Xin Wang, Zhi Wang, Wenwu Zhu

    Abstract: Recent advancements in diffusion models, particularly the trend of architectural transformation from UNet-based Diffusion to Diffusion Transformer (DiT), have significantly improved the quality and scalability of image synthesis. Despite the incredible generative quality, the large computational requirements of these large-scale models significantly hinder the deployments in real-world scenarios.…

    Submitted 25 June, 2024; originally announced June 2024.

  32. arXiv:2406.16571  [pdf, other]

    math.OC cs.AI cs.LG eess.SY

    Differentiable Distributionally Robust Optimization Layers

    Authors: Xutao Ma, Chao Ning, Wenli Du

    Abstract: In recent years, there has been a growing research interest in decision-focused learning, which embeds optimization problems as a layer in learning pipelines and demonstrates a superior performance than the prediction-focused approach. However, for distributionally robust optimization (DRO), a popular paradigm for decision-making under uncertainty, it is still unknown how to embed it as a layer, i…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: In Forty-first International Conference on Machine Learning (2024)

  33. arXiv:2406.16502  [pdf, other]

    cs.CV

    LOGCAN++: Adaptive Local-global class-aware network for semantic segmentation of remote sensing imagery

    Authors: Xiaowen Ma, Rongrong Lian, Zhenkai Wu, Hongbo Guo, Mengting Ma, Sensen Wu, Zhenhong Du, Siyang Song, Wei Zhang

    Abstract: Remote sensing images are usually characterized by complex backgrounds, scale and orientation variations, and large intra-class variance. General semantic segmentation methods usually fail to fully investigate the above issues, and thus their performances on remote sensing image segmentation are limited. In this paper, we propose our LOGCAN++, a semantic segmentation model customized for remote sensin…

    Submitted 1 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: Under Review

  34. Placing Timely Refreshing Services at the Network Edge

    Authors: Xishuo Li, Shan Zhang, Hongbin Luo, Xiao Ma, Junyi He

    Abstract: Accommodating services at the network edge is favorable for time-sensitive applications. However, maintaining service usability is resource-consuming in terms of pulling service images to the edge, synchronizing databases of service containers, and hot updates of service modules. Accordingly, it is critical to determine which service to place based on the received user requests and service refresh…

    Submitted 23 June, 2024; originally announced June 2024.

  35. arXiv:2406.15686  [pdf, other]

    cs.CR cs.NI

    The Case for Transport-Level Encryption in Datacenter Networks

    Authors: Tianyi Gao, Xinshu Ma, Suhas Narreddy, Eugenio Luo, Steven W. D. Chien, Michio Honda

    Abstract: Cloud applications need network data encryption to isolate from other tenants and protect their data from potential eavesdroppers in the network infrastructure. This paper presents SDP, a protocol design for emerging datacenter transport protocols, such as pHost, NDP, and Homa, to integrate data encryption with the use of existing NIC offloading of cryptographic operations designed for TLS over TC…

    Submitted 21 June, 2024; originally announced June 2024.

  36. arXiv:2406.15320  [pdf, other]

    cs.CV

    Rethinking Remote Sensing Change Detection With A Mask View

    Authors: Xiaowen Ma, Zhenkai Wu, Rongrong Lian, Wei Zhang, Siyang Song

    Abstract: Remote sensing change detection aims to compare two or more images recorded for the same area but taken at different time stamps to quantitatively and qualitatively assess changes in geographical entities and environmental factors. Mainstream models are usually built on pixel-by-pixel change detection paradigms, which cannot tolerate the diversity of changes due to complex scenes and variation in imag…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Under review

  37. arXiv:2406.15319  [pdf, other]

    cs.CL cs.AI

    LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs

    Authors: Ziyan Jiang, Xueguang Ma, Wenhu Chen

    Abstract: In the traditional RAG framework, the basic retrieval units are normally short. The common retrievers like DPR normally work with 100-word Wikipedia paragraphs. Such a design forces the retriever to search over a large corpus to find the 'needle' unit. In contrast, the readers only need to extract answers from the short retrieved units. Such an imbalanced 'heavy' retriever and 'light' reader design ca…

    Submitted 30 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Technical Report

  38. Towards Timely Video Analytics Services at the Network Edge

    Authors: Xishuo Li, Shan Zhang, Yuejiao Huang, Xiao Ma, Zhiyuan Wang, Hongbin Luo

    Abstract: Real-time video analytics services aim to provide users with accurate recognition results timely. However, existing studies usually fall into the dilemma between reducing delay and improving accuracy. The edge computing scenario imposes strict transmission and computation resource constraints, making balancing these conflicting metrics under dynamic network conditions difficult. In this regard, we…

    Submitted 20 June, 2024; originally announced June 2024.

  39. arXiv:2406.14555  [pdf, other]

    cs.CV

    A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models

    Authors: Xincheng Shuai, Henghui Ding, Xingjun Ma, Rongcheng Tu, Yu-Gang Jiang, Dacheng Tao

    Abstract: Image editing aims to edit the given synthetic or real image to meet the specific requirements from users. It has been widely studied in recent years as a promising and challenging field of Artificial Intelligence Generative Content (AIGC). Recent significant advancement in this field is based on the development of text-to-image (T2I) diffusion models, which generate images according to text prompts. Th…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Project Page: https://github.com/xinchengshuai/Awesome-Image-Editing

  40. arXiv:2406.14075  [pdf, other]

    cs.CL

    EXCEEDS: Extracting Complex Events as Connecting the Dots to Graphs in Scientific Domain

    Authors: Yi-Fan Lu, Xian-Ling Mao, Bo Wang, Xiao Liu, Heyan Huang

    Abstract: It is crucial to utilize events to understand a specific domain. There is a lot of research on event extraction in many domains such as news, finance and biology. However, the scientific domain still lacks event extraction research, including comprehensive datasets and corresponding methods. Compared to other domains, the scientific domain presents two characteristics: denser nuggets and more compl…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: This paper is a work in progress

  41. arXiv:2406.13606  [pdf, other]

    cs.CV

    DDLNet: Boosting Remote Sensing Change Detection with Dual-Domain Learning

    Authors: Xiaowen Ma, Jiawei Yang, Rui Che, Huanting Zhang, Wei Zhang

    Abstract: Remote sensing change detection (RSCD) aims to identify the changes of interest in a region by analyzing multi-temporal remote sensing images, and has an outstanding value for local development monitoring. Existing RSCD methods are devoted to contextual modeling in the spatial domain to enhance the changes of interest. Despite the satisfactory performance achieved, the lack of knowledge in the fre…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: ICME 2024 Oral

  42. arXiv:2406.12928  [pdf, other]

    cs.LG cs.AI cs.CL

    Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox

    Authors: Yijun Liu, Yuan Meng, Fang Wu, Shenhao Peng, Hang Yao, Chaoyu Guan, Chen Tang, Xinzhu Ma, Zhi Wang, Wenwu Zhu

    Abstract: Large language models (LLMs) have exhibited exciting progress in multiple scenarios, while the huge computational demands hinder their deployments in lots of real-world applications. As an effective means to reduce memory footprint and inference cost, quantization also faces challenges in performance degradation at low bit-widths. Understanding the impact of quantization on LLM capabilities, espec…

    Submitted 15 June, 2024; originally announced June 2024.

  43. arXiv:2406.12752  [pdf, other]

    cs.CR cs.CV cs.LG

    Extracting Training Data from Unconditional Diffusion Models

    Authors: Yunhao Chen, Xingjun Ma, Difan Zou, Yu-Gang Jiang

    Abstract: As diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI), the study of their memorization of the raw training data has attracted growing attention. Existing works in this direction aim to establish an understanding of whether or to what extent DPMs learn by memorization. Such an understanding is crucial for identifying potential r…

    Submitted 18 June, 2024; originally announced June 2024.

  44. arXiv:2406.11523  [pdf]

    cs.RO

    Guiding the Last Centimeter: Novel Anatomy-Aware Probe Servoing for Standardized Imaging Plane Navigation in Robotic Lung Ultrasound

    Authors: Xihan Ma, Mingjie Zeng, Jeffrey C. Hill, Beatrice Hoffmann, Ziming Zhang, Haichong K. Zhang

    Abstract: Navigating the ultrasound (US) probe to the standardized imaging plane (SIP) for image acquisition is a critical but operator-dependent task in conventional freehand diagnostic US. Robotic US systems (RUSS) offer the potential to enhance imaging consistency by leveraging real-time US image feedback to optimize the probe pose, thereby reducing reliance on operator expertise. However, determining th…

    Submitted 17 June, 2024; originally announced June 2024.

  45. arXiv:2406.11263  [pdf, other]

    cs.CL cs.AI

    The Fall of ROME: Understanding the Collapse of LLMs in Model Editing

    Authors: Wanli Yang, Fei Sun, Jiajun Tan, Xinyu Ma, Du Su, Dawei Yin, Huawei Shen

    Abstract: Despite significant progress in model editing methods, their application in real-world scenarios remains challenging as they often cause large language models (LLMs) to collapse. Among them, ROME is particularly concerning, as it could disrupt LLMs with only a single edit. In this paper, we study the root causes of such collapse. Through extensive analysis, we identify two primary factors that con…

    Submitted 17 June, 2024; originally announced June 2024.

  46. arXiv:2406.11251  [pdf, other]

    cs.IR

    Unifying Multimodal Retrieval via Document Screenshot Embedding

    Authors: Xueguang Ma, Sheng-Chieh Lin, Minghan Li, Wenhu Chen, Jimmy Lin

    Abstract: In the real world, documents are organized in different formats and varied modalities. Traditional retrieval pipelines require tailored document parsing techniques and content extraction modules to prepare input for indexing. This process is tedious, prone to errors, and has information loss. To this end, we propose Document Screenshot Embedding (DSE), a novel retrieval paradigm that regards docu…

    Submitted 17 June, 2024; originally announced June 2024.

  47. arXiv:2406.10797  [pdf, other]

    cs.CV

    STAR: Scale-wise Text-to-image generation via Auto-Regressive representations

    Authors: Xiaoxiao Ma, Mohan Zhou, Tao Liang, Yalong Bai, Tiejun Zhao, Huaian Chen, Yi Jin

    Abstract: We present STAR, a text-to-image model that employs scale-wise auto-regressive paradigm. Unlike VAR, which is limited to class-conditioned synthesis within a fixed set of predetermined categories, our STAR enables text-driven open-set generation through three key designs: To boost diversity and generalizability with unseen combinations of objects and concepts, we introduce a pre-trained text encod…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures

  48. arXiv:2406.10580  [pdf, other]

    cs.CV

    IMDL-BenCo: A Comprehensive Benchmark and Codebase for Image Manipulation Detection & Localization

    Authors: Xiaochen Ma, Xuekang Zhu, Lei Su, Bo Du, Zhuohang Jiang, Bingkui Tong, Zeyu Lei, Xinyu Yang, Chi-Man Pun, Jiancheng Lv, Jizhe Zhou

    Abstract: A comprehensive benchmark is yet to be established in the Image Manipulation Detection & Localization (IMDL) field. The absence of such a benchmark leads to insufficient and misleading model evaluations, severely undermining the development of this field. However, the scarcity of open-sourced baseline models and inconsistent training and evaluation protocols make conducting rigorous experiments a…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Technical report

  49. arXiv:2406.10391  [pdf, other]

    q-bio.QM cs.LG

    BEACON: Benchmark for Comprehensive RNA Tasks and Language Models

    Authors: Yuchen Ren, Zhiyuan Chen, Lifeng Qiao, Hongtai Jing, Yuchen Cai, Sheng Xu, Peng Ye, Xinzhu Ma, Siqi Sun, Hongliang Yan, Dong Yuan, Wanli Ouyang, Xihui Liu

    Abstract: RNA plays a pivotal role in translating genetic instructions into functional outcomes, underscoring its importance in biological processes and disease mechanisms. Despite the emergence of numerous deep learning approaches for RNA, particularly universal RNA language models, there remains a significant lack of standardized benchmarks to assess the effectiveness of these methods. In this study, we i…

    Submitted 14 June, 2024; originally announced June 2024.

  50. arXiv:2406.09897  [pdf, other]

    cs.CL

    3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding

    Authors: Xindian Ma, Wenyuan Liu, Peng Zhang, Nan Xu

    Abstract: Inspired by the Bloch Sphere representation, we propose a novel rotary position encoding on a three-dimensional sphere, named 3D Rotary Position Encoding (3D-RPE). 3D-RPE is an advanced version of the widely used 2D Rotary Position Encoding (RoPE), with two major advantages for modeling long contexts: controllable long-term decay and improved position resolution. For controllable long-term decay,…

    Submitted 14 June, 2024; originally announced June 2024.