Showing 1–50 of 804 results for author: Zhiqiang

  1. arXiv:2409.07972  [pdf, other]

    cs.CV

    Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction

    Authors: Yuan Wu, Zhiqiang Yan, Zhengxue Wang, Xiang Li, Le Hui, Jian Yang

    Abstract: The task of vision-based 3D occupancy prediction aims to reconstruct 3D geometry and estimate its semantic classes from 2D color images, where the 2D-to-3D view transformation is an indispensable step. Most previous methods conduct forward projection, such as BEVPooling and VoxelPooling, both of which map the 2D image features into 3D grids. However, the current grid representing features within a…

    Submitted 12 September, 2024; originally announced September 2024.

  2. arXiv:2409.07497  [pdf, other]

    cs.AI cs.CL cs.DB cs.IR cs.LG

    OneEdit: A Neural-Symbolic Collaboratively Knowledge Editing System

    Authors: Ningyu Zhang, Zekun Xi, Yujie Luo, Peng Wang, Bozhong Tian, Yunzhi Yao, Jintian Zhang, Shumin Deng, Mengshu Sun, Lei Liang, Zhiqiang Zhang, Xiaowei Zhu, Jun Zhou, Huajun Chen

    Abstract: Knowledge representation has been a central aim of AI since its inception. Symbolic Knowledge Graphs (KGs) and neural Large Language Models (LLMs) can both represent knowledge. KGs provide highly accurate and explicit knowledge representation, but face scalability issue; while LLMs offer expansive coverage of knowledge, but incur significant training costs and struggle with precise and reliable kn…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: LLM+KG@VLDB2024, code is available at https://github.com/zjunlp/OneEdit

  3. arXiv:2409.07253  [pdf, other]

    cs.LG cs.CV

    Alignment of Diffusion Models: Fundamentals, Challenges, and Future

    Authors: Buhua Liu, Shitong Shao, Bao Li, Lichen Bai, Zhiqiang Xu, Haoyi Xiong, James Kwok, Sumi Helal, Zeke Xie

    Abstract: Diffusion models have emerged as the leading paradigm in generative modeling, excelling in various applications. Despite their success, these models often misalign with human intentions, generating outputs that may not match text prompts or possess desired properties. Inspired by the success of alignment in tuning large language models, recent studies have investigated aligning diffusion models wi…

    Submitted 12 September, 2024; v1 submitted 11 September, 2024; originally announced September 2024.

    Comments: 35 pages, 5 figures, 3 tables

  4. arXiv:2409.06323  [pdf, other]

    cs.LG cs.AI cs.SI

    LAMP: Learnable Meta-Path Guided Adversarial Contrastive Learning for Heterogeneous Graphs

    Authors: Siqing Li, Jin-Duk Park, Wei Huang, Xin Cao, Won-Yong Shin, Zhiqiang Xu

    Abstract: Heterogeneous graph neural networks (HGNNs) have significantly propelled the information retrieval (IR) field. Still, the effectiveness of HGNNs heavily relies on high-quality labels, which are often expensive to acquire. This challenge has shifted attention towards Heterogeneous Graph Contrastive Learning (HGCL), which usually requires pre-defined meta-paths. However, our findings reveal that met…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: 19 pages, 7 figures

  5. arXiv:2409.05152  [pdf, other]

    cs.CL cs.AI cs.DB cs.IR cs.LG

    OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs

    Authors: Jintian Zhang, Cheng Peng, Mengshu Sun, Xiang Chen, Lei Liang, Zhiqiang Zhang, Jun Zhou, Huajun Chen, Ningyu Zhang

    Abstract: Despite the recent advancements in Large Language Models (LLMs), which have significantly enhanced the generative capabilities for various NLP tasks, LLMs still face limitations in directly handling retrieval tasks. However, many practical applications demand the seamless integration of both retrieval and generation. This paper introduces a novel and efficient One-pass Generation and retrieval fra…

    Submitted 8 September, 2024; originally announced September 2024.

    Comments: Work in progress; code is available at https://github.com/zjunlp/OneGen

  6. arXiv:2409.04961  [pdf, other]

    cs.RO

    Heterogeneous LiDAR Dataset for Benchmarking Robust Localization in Diverse Degenerate Scenarios

    Authors: Zhiqiang Chen, Yuhua Qi, Dapeng Feng, Xuebin Zhuang, Hongbo Chen, Xiangcheng Hu, Jin Wu, Kelin Peng, Peng Lu

    Abstract: The ability to estimate pose and generate maps using 3D LiDAR significantly enhances robotic system autonomy. However, existing open-source datasets lack representation of geometrically degenerate environments, limiting the development and benchmarking of robust LiDAR SLAM algorithms. To address this gap, we introduce GEODE, a comprehensive multi-LiDAR, multi-scenario dataset specifically designed…

    Submitted 10 September, 2024; v1 submitted 7 September, 2024; originally announced September 2024.

    Comments: 15 pages, 9 figures, 6 tables. Submitted for IJRR dataset paper

  7. arXiv:2409.04340  [pdf, other]

    cs.LG cs.AI cs.CL

    AGR: Age Group fairness Reward for Bias Mitigation in LLMs

    Authors: Shuirong Cao, Ruoxi Cheng, Zhiqiang Wang

    Abstract: LLMs can exhibit age biases, resulting in unequal treatment of individuals across age groups. While much research has addressed racial and gender biases, age bias remains little explored. The scarcity of instruction-tuning and preference datasets for age bias hampers its detection and measurement, and existing fine-tuning methods seldom address age-related fairness. In this paper, we construct age…

    Submitted 6 September, 2024; originally announced September 2024.

    Comments: The first two authors contributed equally to this work. Corresponding to Zhiqiang Wang. ACKNOWLEDGMENT: we would like to thank the computing resources support from the State Key Laboratory of New Computer Software Technologies at Nanjing University

  8. arXiv:2409.02920  [pdf, other]

    cs.RO cs.AI cs.CL

    RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version)

    Authors: Yao Mu, Tianxing Chen, Shijia Peng, Zanxin Chen, Zeyu Gao, Yude Zou, Lunkai Lin, Zhiqiang Xie, Ping Luo

    Abstract: Effective collaboration of dual-arm robots and their tool use capabilities are increasingly important areas in the advancement of robotics. These skills play a significant role in expanding robots' ability to operate in diverse real-world environments. However, progress is impeded by the scarcity of specialized training data. This paper introduces RoboTwin, a novel benchmark dataset combining real…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Project page: https://robotwin-benchmark.github.io/early-version/

  9. arXiv:2409.00694  [pdf, other]

    cs.CV

    IAFI-FCOS: Intra- and across-layer feature interaction FCOS model for lesion detection of CT images

    Authors: Qiu Guan, Mengjie Pan, Feng Chen, Zhiqiang Yang, Zhongwen Yu, Qianwei Zhou, Haigen Hu

    Abstract: Effective lesion detection in medical images relies not only on the features of the lesion region, but also depends heavily on the surrounding information. However, most current methods have not fully utilized it. What is more, the multi-scale feature fusion mechanisms of most traditional detectors are unable to transmit detail information without loss, which makes it hard to detect small and boundary ambiguous l…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 2024 IJCNN

  10. BaseMirror: Automatic Reverse Engineering of Baseband Commands from Android's Radio Interface Layer

    Authors: Wenqiang Li, Haohuang Wen, Zhiqiang Lin

    Abstract: In modern mobile devices, baseband is an integral component running on top of cellular processors to handle crucial radio communications. However, recent research reveals significant vulnerabilities in these basebands, posing serious security risks like remote code execution. Yet, effectively scrutinizing basebands remains a daunting task, as they run closed-source and proprietary software on vend…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: This is the extended version of the CCS 2024 paper with the same title

    Journal ref: The ACM Conference on Computer and Communications Security (CCS) 2024

  11. arXiv:2408.16659  [pdf, other]

    physics.med-ph cs.GR

    Motion-Driven Neural Optimizer for Prophylactic Braces Made by Distributed Microstructures

    Authors: Xingjian Han, Yu Jiang, Weiming Wang, Guoxin Fang, Simeon Gill, Zhiqiang Zhang, Shengfa Wang, Jun Saito, Deepak Kumar, Zhongxuan Luo, Emily Whiting, Charlie C. L. Wang

    Abstract: Joint injuries, and their long-term consequences, present a substantial global health burden. Wearable prophylactic braces are an attractive potential solution to reduce the incidence of joint injuries by limiting joint movements that are related to injury risk. Given human motion and ground reaction forces, we present a computational framework that enables the design of personalized braces by opt…

    Submitted 29 August, 2024; originally announced August 2024.

  12. arXiv:2408.16555  [pdf]

    cs.CR cs.LG

    Android Malware Detection Based on RGB Images and Multi-feature Fusion

    Authors: Zhiqiang Wang, Qiulong Yu, Sicheng Yuan

    Abstract: With the widespread adoption of smartphones, Android malware has become a significant challenge in the field of mobile device security. Current Android malware detection methods often rely on feature engineering to construct dynamic or static features, which are then used for learning. However, static feature-based methods struggle to counter code obfuscation, packing, and signing techniques, whil…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 9 pages, 10 figures

  13. arXiv:2408.15997  [pdf, other]

    cs.LG cs.AI

    Mamba or Transformer for Time Series Forecasting? Mixture of Universals (MoU) Is All You Need

    Authors: Sijia Peng, Yun Xiong, Yangyong Zhu, Zhiqiang Shen

    Abstract: Time series forecasting requires balancing short-term and long-term dependencies for accurate predictions. Existing methods mainly focus on long-term dependency modeling, neglecting the complexities of short-term dynamics, which may hinder performance. Transformers are superior in modeling long-term dependencies but are criticized for their quadratic computational cost. Mamba provides a near-linea…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Code at https://github.com/lunaaa95/mou/

  14. arXiv:2408.14087  [pdf, other]

    cs.CV

    LSM-YOLO: A Compact and Effective ROI Detector for Medical Detection

    Authors: Zhongwen Yu, Qiu Guan, Jianmin Yang, Zhiqiang Yang, Qianwei Zhou, Yang Chen, Feng Chen

    Abstract: In existing medical Region of Interest (ROI) detection, there lacks an algorithm that can simultaneously satisfy both real-time performance and accuracy, not meeting the growing demand for automatic detection in medicine. Although the basic YOLO framework ensures real-time detection due to its fast speed, it still faces challenges in maintaining precision concurrently. To alleviate the above probl…

    Submitted 26 August, 2024; originally announced August 2024.

  15. arXiv:2408.13367  [pdf]

    cs.CR cs.ET

    Generative Blockchain: Transforming Blockchain from Transaction Recording to Transaction Generation through Proof-of-Merit

    Authors: Haozhao Zhang, Zhe Zhang, Zhiqiang Zheng, Varghese Jacob

    Abstract: This paper proposes a new paradigm: generative blockchain, which aims to transform conventional blockchain technology by combining transaction generation and recording, rather than focusing solely on transaction recording. Central to our design is a novel consensus mechanism, Proof-of-Merit (PoM), specifically crafted for environments where businesses must solve complex problems before transaction…

    Submitted 23 August, 2024; originally announced August 2024.

  16. arXiv:2408.12454  [pdf, other]

    cs.CV cs.AI

    Relaxed Rotational Equivariance via $G$-Biases in Vision

    Authors: Zhiqiang Wu, Licheng Sun, Yingjie Liu, Jian Yang, Hanlin Dong, Shing-Ho J. Lin, Xuan Tang, Jinpeng Mi, Bo Jin, Xian Wei

    Abstract: Group Equivariant Convolution (GConv) can effectively handle rotational symmetry data. They assume uniform and strict rotational symmetry across all features, as the transformations under the specific group. However, real-world data rarely conforms to strict rotational symmetry commonly referred to as Rotational Symmetry-Breaking in the system or dataset, making GConv unable to adapt effectively t…

    Submitted 25 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

  17. arXiv:2408.12413  [pdf, other]

    q-bio.BM cs.AI

    Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures

    Authors: Ce Liu, Jun Wang, Zhiqiang Cai, Yingxu Wang, Huizhen Kuang, Kaihui Cheng, Liwei Zhang, Qingkun Su, Yining Tang, Fenglei Cao, Limei Han, Siyu Zhu, Yuan Qi

    Abstract: Despite significant progress in static protein structure collection and prediction, the dynamic behavior of proteins, one of their most vital characteristics, has been largely overlooked in prior research. This oversight can be attributed to the limited availability, diversity, and heterogeneity of dynamic protein datasets. To address this gap, we propose to enhance existing prestigious static 3D…

    Submitted 4 September, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

  18. arXiv:2408.11760  [pdf, other]

    cs.CV cs.AI

    SBDet: A Symmetry-Breaking Object Detector via Relaxed Rotation-Equivariance

    Authors: Zhiqiang Wu, Yingjie Liu, Hanlin Dong, Xuan Tang, Jian Yang, Bo Jin, Mingsong Chen, Xian Wei

    Abstract: Introducing Group Equivariant Convolution (GConv) empowers models to explore symmetries hidden in visual data, improving their performance. However, in real-world scenarios, objects or scenes often exhibit perturbations of a symmetric system, specifically a deviation from a symmetric architecture, which can be characterized by a non-trivial action of a symmetry group, known as Symmetry-Breaking. T…

    Submitted 21 August, 2024; originally announced August 2024.

  19. arXiv:2408.10899  [pdf, other]

    cs.RO

    All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents

    Authors: Zhiqiang Wang, Hao Zheng, Yunshuang Nie, Wenjun Xu, Qingwei Wang, Hua Ye, Zhe Li, Kaidong Zhang, Xuewen Cheng, Wanxi Dong, Chang Cai, Liang Lin, Feng Zheng, Xiaodan Liang

    Abstract: Embodied AI is transforming how AI systems interact with the physical world, yet existing datasets are inadequate for developing versatile, general-purpose agents. These limitations include a lack of standardized formats, insufficient data diversity, and inadequate data volume. To address these issues, we introduce ARIO (All Robots In One), a new data standard that enhances existing datasets by of…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Project website: https://imaei.github.io/project_pages/ario/

  20. arXiv:2408.08909  [pdf]

    cs.CR cs.AI cs.DC

    An Adaptive Differential Privacy Method Based on Federated Learning

    Authors: Zhiqiang Wang, Xinyue Yu, Qianli Huang, Yongguang Gong

    Abstract: Differential privacy is one of the methods to solve the problem of privacy protection in federated learning. Setting the same privacy budget for each round will result in reduced accuracy in training. The existing methods of the adjustment of privacy budget consider fewer influencing factors and tend to ignore the boundaries, resulting in unreasonable privacy budgets. Therefore, we proposed an ada…

    Submitted 13 August, 2024; originally announced August 2024.

  21. arXiv:2408.08588  [pdf, other]

    cs.IT eess.SP

    Movable Antenna for Wireless Communications: Prototyping and Experimental Results

    Authors: Zhenjun Dong, Zhiwen Zhou, Zhiqiang Xiao, Chaoyue Zhang, Xinrui Li, Hongqi Min, Yong Zeng, Shi Jin, Rui Zhang

    Abstract: Movable antenna (MA), which can flexibly change the position of antenna in three-dimensional (3D) continuous space, is an emerging technology for achieving full spatial performance gains. In this paper, a prototype of MA communication system with ultra-accurate movement control is presented to verify the performance gain of MA in practical environments. The prototype utilizes the feedback control…

    Submitted 16 August, 2024; originally announced August 2024.

  22. arXiv:2408.08570  [pdf, other]

    cs.CV

    EraW-Net: Enhance-Refine-Align W-Net for Scene-Associated Driver Attention Estimation

    Authors: Jun Zhou, Chunsheng Liu, Faliang Chang, Wenqian Wang, Penghui Hao, Yiming Huang, Zhiqiang Yang

    Abstract: Associating driver attention with driving scene across two fields of views (FOVs) is a hard cross-domain perception problem, which requires comprehensive consideration of cross-view mapping, dynamic driving scene analysis, and driver status tracking. Previous methods typically focus on a single view or map attention to the scene via estimated gaze, failing to exploit the implicit connection betwee…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 13 pages, 9 figures

  23. arXiv:2408.07966  [pdf, other]

    cs.LG cs.DC

    Addressing Skewed Heterogeneity via Federated Prototype Rectification with Personalization

    Authors: Shunxin Guo, Hongsong Wang, Shuxia Lin, Zhiqiang Kou, Xin Geng

    Abstract: Federated learning is an efficient framework designed to facilitate collaborative model training across multiple distributed devices while preserving user data privacy. A significant challenge of federated learning is data-level heterogeneity, i.e., skewed or long-tailed distribution of private data. Although various methods have been proposed to address this challenge, most of them assume that th…

    Submitted 22 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  24. arXiv:2408.07321  [pdf, other]

    cs.SE cs.CR

    LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions

    Authors: Yiran Cheng, Lwin Khin Shar, Ting Zhang, Shouguo Yang, Chaopeng Dong, David Lo, Shichao Lv, Zhiqiang Shi, Limin Sun

    Abstract: Open-source software (OSS) has experienced a surge in popularity, attributed to its collaborative development model and cost-effective nature. However, the adoption of specific software versions in development projects may introduce security risks when these versions bring along vulnerabilities. Current methods of identifying vulnerable versions typically analyze and trace the code involved in vul…

    Submitted 14 August, 2024; originally announced August 2024.

  25. arXiv:2408.03544  [pdf, other]

    cs.CL cs.AI

    Unlocking the Non-Native Language Context Limitation: Native Language Prompting Facilitates Knowledge Elicitation

    Authors: Baixuan Li, Yunlong Fan, Zhiqiang Gao

    Abstract: Multilingual large language models (MLLMs) struggle to answer questions posed in non-dominant languages, even though they have acquired the relevant knowledge from their dominant language corpus. In contrast, human multilinguals can overcome such non-native language context limitations through Positive Native Language Transfer (PNLT). Inspired by the process of PNLT, we analogize the dominant lang…

    Submitted 16 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

  26. arXiv:2408.02088  [pdf, other]

    cs.CV cs.AI

    KAN-RCBEVDepth: A multi-modal fusion algorithm in object detection for autonomous driving

    Authors: Zhihao Lai, Chuanhao Liu, Shihui Sheng, Zhiqiang Zhang

    Abstract: Accurate 3D object detection in autonomous driving is critical yet challenging due to occlusions, varying object sizes, and complex urban environments. This paper introduces the KAN-RCBEVDepth method, an innovative approach aimed at enhancing 3D object detection by fusing multimodal sensor data from cameras, LiDAR, and millimeter-wave radar. Our unique Bird's Eye View-based approach significantly…

    Submitted 27 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

  27. arXiv:2408.00491  [pdf, other]

    cs.CL cs.CV cs.MM

    GalleryGPT: Analyzing Paintings with Large Multimodal Models

    Authors: Yi Bin, Wenhao Shi, Yujuan Ding, Zhiqiang Hu, Zheng Wang, Yang Yang, See-Kiong Ng, Heng Tao Shen

    Abstract: Artwork analysis is an important and fundamental skill for art appreciation, which could enrich personal aesthetic sensibility and facilitate critical thinking ability. Understanding artworks is challenging due to its subjective nature, diverse interpretations, and complex visual elements, requiring expertise in art history, cultural background, and aesthetic theory. However, limited by the data…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted as Oral Presentation at ACM Multimedia 2024

  28. arXiv:2407.21586  [pdf, other]

    cs.CV

    Adaptive Mix for Semi-Supervised Medical Image Segmentation

    Authors: Zhiqiang Shen, Peng Cao, Junming Su, Jinzhu Yang, Osmar R. Zaiane

    Abstract: Mix-up is a key technique for consistency regularization-based semi-supervised learning methods, generating strong-perturbed samples for strong-weak pseudo-supervision. Existing mix-up operations are performed either randomly or with predefined rules, such as replacing low-confidence patches with high-confidence ones. The former lacks control over the perturbation degree, leading to overfitting on…

    Submitted 31 July, 2024; originally announced July 2024.

  29. arXiv:2407.20499  [pdf, other]

    cs.LG

    Optimizing Long-tailed Link Prediction in Graph Neural Networks through Structure Representation Enhancement

    Authors: Yakun Wang, Daixin Wang, Hongrui Liu, Binbin Hu, Yingcui Yan, Qiyang Zhang, Zhiqiang Zhang

    Abstract: Link prediction, as a fundamental task for graph neural networks (GNNs), has boasted significant progress in varied domains. Its success is typically influenced by the expressive power of node representation, but recent developments reveal the inferior performance of low-degree nodes owing to their sparse neighbor connections, known as the degree-based long-tailed problem. Will the degree-based lo…

    Submitted 29 July, 2024; originally announced July 2024.

  30. arXiv:2407.19672  [pdf, other]

    cs.CL

    SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

    Authors: Wenxuan Zhang, Hou Pong Chan, Yiran Zhao, Mahani Aljunied, Jianyu Wang, Chaoqun Liu, Yue Deng, Zhiqiang Hu, Weiwen Xu, Yew Ken Chia, Xin Li, Lidong Bing

    Abstract: Large Language Models (LLMs) have shown remarkable abilities across various tasks, yet their development has predominantly centered on high-resource languages like English and Chinese, leaving low-resource languages underserved. To address this disparity, we present SeaLLMs 3, the latest iteration of the SeaLLMs model family, tailored for Southeast Asian languages. This region, characterized by it…

    Submitted 28 July, 2024; originally announced July 2024.

  31. arXiv:2407.15568  [pdf, other]

    cs.SE cs.HC

    Empowering Agile-Based Generative Software Development through Human-AI Teamwork

    Authors: Sai Zhang, Zhenchang Xing, Ronghui Guo, Fangzhou Xu, Lei Chen, Zhaoyuan Zhang, Xiaowang Zhang, Zhiyong Feng, Zhiqiang Zhuang

    Abstract: In software development, the raw requirements proposed by users are frequently incomplete, which impedes the complete implementation of application functionalities. With the emergence of large language models, recent methods with the top-down waterfall model employ a questioning approach for requirement completion, attempting to explore further user requirements. However, users, constrained by the…

    Submitted 22 July, 2024; originally announced July 2024.

    ACM Class: K.6.3

  32. arXiv:2407.13101  [pdf, other]

    cs.CL cs.AI

    Retrieve, Summarize, Plan: Advancing Multi-hop Question Answering with an Iterative Approach

    Authors: Zhouyu Jiang, Mengshu Sun, Lei Liang, Zhiqiang Zhang

    Abstract: Multi-hop question answering is a challenging task with distinct industrial relevance, and Retrieval-Augmented Generation (RAG) methods based on large language models (LLMs) have become a popular approach to tackle this task. Owing to the potential inability to retrieve all necessary information in a single iteration, a series of iterative RAG methods has been recently developed, showing significa…

    Submitted 17 July, 2024; originally announced July 2024.

  33. arXiv:2407.12882  [pdf, other]

    cs.CL cs.AI cs.LG

    InstructAV: Instruction Fine-tuning Large Language Models for Authorship Verification

    Authors: Yujia Hu, Zhiqiang Hu, Chun-Wei Seah, Roy Ka-Wei Lee

    Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency in a wide range of NLP tasks. However, when it comes to authorship verification (AV) tasks, which involve determining whether two given texts share the same authorship, even advanced models like ChatGPT exhibit notable limitations. This paper introduces a novel approach, termed InstructAV, for authorship verification. This appro…

    Submitted 16 July, 2024; originally announced July 2024.

  34. arXiv:2407.11083  [pdf, other]

    cs.LG

    Empowering Graph Invariance Learning with Deep Spurious Infomax

    Authors: Tianjun Yao, Yongqiang Chen, Zhenhao Chen, Kai Hu, Zhiqiang Shen, Kun Zhang

    Abstract: Recently, there has been a surge of interest in developing graph neural networks that utilize the invariance principle on graphs to generalize the out-of-distribution (OOD) data. Due to the limited knowledge about OOD data, existing approaches often pose assumptions about the correlation strengths of the underlying spurious features and the target labels. However, this prior is often unavailable a…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: ICML2024 camera-ready version

    ACM Class: I.2.6

  35. arXiv:2407.08694  [pdf, other]

    cs.DC cs.AI cs.LG

    Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight

    Authors: Zhiqiang Xie, Yujia Zheng, Lizi Ottens, Kun Zhang, Christos Kozyrakis, Jonathan Mace

    Abstract: Runtime failure and performance degradation are commonplace in modern cloud systems. For cloud providers, automatically determining the root cause of incidents is paramount to ensuring high reliability and availability as prompt fault localization can enable faster diagnosis and triage for timely resolution. A compelling solution explored in recent work is causal reasoning using causal graphs to ca…

    Submitted 11 July, 2024; originally announced July 2024.

  36. arXiv:2407.08106  [pdf, other]

    cs.RO

    SGLC: Semantic Graph-Guided Coarse-Fine-Refine Full Loop Closing for LiDAR SLAM

    Authors: Neng Wang, Xieyuanli Chen, Chenghao Shi, Zhiqiang Zheng, Hongshan Yu, Huimin Lu

    Abstract: Loop closing is a crucial component in SLAM that helps eliminate accumulated errors through two main steps: loop detection and loop pose correction. The first step determines whether loop closing should be performed, while the second estimates the 6-DoF pose to correct odometry drift. Current methods mostly focus on developing robust descriptors for loop closure detection, often neglecting loop po…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures

  37. Distributed multi-robot potential-field-based exploration with submap-based mapping and noise-augmented strategy

    Authors: Khattiya Pongsirijinda, Zhiqiang Cao, Kaushik Bhowmik, Muhammad Shalihan, Billy Pik Lik Lau, Ran Liu, Chau Yuen, U-Xuan Tan

    Abstract: Multi-robot collaboration has become a needed component in unknown environment exploration due to its ability to accomplish various challenging situations. Potential-field-based methods are widely used for autonomous exploration because of their high efficiency and low travel cost. However, exploration speed and collaboration ability are still challenging topics. Therefore, we propose a Distribute…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by Robotics and Autonomous Systems

  38. arXiv:2407.07347  [pdf, other]

    cs.CV eess.IV

    MNeRV: A Multilayer Neural Representation for Videos

    Authors: Qingling Chang, Haohui Yu, Shuxuan Fu, Zhiqiang Zeng, Chuangquan Chen

    Abstract: As a novel video representation method, Neural Representations for Videos (NeRV) has shown great potential in the fields of video compression, video restoration, and video interpolation. In the process of representing videos using NeRV, each frame corresponds to an embedding, which is then reconstructed into a video frame sequence after passing through a small number of decoding layers (E-NeRV, HN…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 14 pages, 12 figures, 8 tables

  39. arXiv:2407.07093  [pdf, other]

    cs.CL cs.AI cs.LG

    FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation

    Authors: Liqun Ma, Mingjie Sun, Zhiqiang Shen

    Abstract: This work presents a Fully BInarized Large Language Model (FBI-LLM), demonstrating for the first time how to train a large-scale binary language model from scratch (not the partial binary or ternary LLM like BitNet b1.58) to match the performance of its full-precision counterparts (e.g., FP16 or BF16) in transformer-based LLMs. It achieves this by employing an autoregressive distillation (AD) loss…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Github at https://github.com/LiqunMa/FBI-LLM

  40. arXiv:2407.05248  [pdf, other]

    cs.CV

    Self-Paced Sample Selection for Barely-Supervised Medical Image Segmentation

    Authors: Junming Su, Zhiqiang Shen, Peng Cao, Jinzhu Yang, Osmar R. Zaiane

    Abstract: The existing barely-supervised medical image segmentation (BSS) methods, adopting a registration-segmentation paradigm, aim to learn from data with very few annotations to mitigate the extreme label scarcity problem. However, this paradigm poses a challenge: pseudo-labels generated by image registration come with significant noise. To address this issue, we propose a self-paced sample selection fr…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  41. arXiv:2407.04381  [pdf, other]

    cs.CV cs.AI

    Multi-Branch Auxiliary Fusion YOLO with Re-parameterization Heterogeneous Convolutional for accurate object detection

    Authors: Zhiqiang Yang, Qiu Guan, Keer Zhao, Jianmin Yang, Xinli Xu, Haixia Long, Ying Tang

    Abstract: Due to the effective performance of multi-scale feature fusion, Path Aggregation FPN (PAFPN) is widely employed in YOLO detectors. However, it cannot efficiently and adaptively integrate high-level semantic information with low-level spatial information simultaneously. We propose a new model named MAF-YOLO in this paper, which is a novel object detection framework with a versatile neck named Multi…

    Submitted 5 July, 2024; originally announced July 2024.

  42. arXiv:2407.03745  [pdf, other]

    cs.CR

    SRAS: Self-governed Remote Attestation Scheme for Multi-party Collaboration

    Authors: Linan Tian, Yunke Shen, Zhiqiang Li

    Abstract: Trusted Execution Environments (TEEs), such as Intel Software Guard Extensions (SGX), ensure the confidentiality and integrity of user applications when using cloud computing resources. However, in the multi-party cloud computing scenario, how to select a Relying Party to verify the TEE of each party and avoid leaking sensitive data to each other remains an open question. In this paper, we propose…

    Submitted 4 July, 2024; originally announced July 2024.

  43. arXiv:2407.02877  [pdf, other]

    cs.IT eess.SP

    Resource Allocation Design for Next-Generation Multiple Access: A Tutorial Overview

    Authors: Zhiqiang Wei, Dongfang Xu, Shuangyang Li, Shenghui Song, Derrick Wing Kwan Ng, Giuseppe Caire

    Abstract: Multiple access is the cornerstone technology for each generation of wireless cellular networks and resource allocation design plays a crucial role in multiple access. In this paper, we present a comprehensive tutorial overview for junior researchers in this field, aiming to offer a foundational guide for resource allocation design in the context of next-generation multiple access (NGMA). Initiall…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 69 pages, 10 figures, 5 tables

  44. arXiv:2407.02779  [pdf, other]

    cs.AI cs.LG

    Croppable Knowledge Graph Embedding

    Authors: Yushan Zhu, Wen Zhang, Zhiqiang Liu, Mingyang Chen, Lei Liang, Huajun Chen

    Abstract: Knowledge Graph Embedding (KGE) is a common method for Knowledge Graphs (KGs) to serve various artificial intelligence tasks. The suitable dimensions of the embeddings depend on the storage and computing conditions of the specific application scenarios. Once a new dimension is required, a new KGE model needs to be trained from scratch, which greatly increases the training cost and limits the effic…

    Submitted 2 July, 2024; originally announced July 2024.

  45. arXiv:2407.02005  [pdf, other]

    cs.CL cs.SD eess.AS

    An End-to-End Speech Summarization Using Large Language Model

    Authors: Hengchao Shang, Zongyao Li, Jiaxin Guo, Shaojun Li, Zhiqiang Rao, Yuanchang Luo, Daimeng Wei, Hao Yang

    Abstract: Abstractive Speech Summarization (SSum) aims to generate human-like text summaries from spoken content. It encounters difficulties in handling long speech input and capturing the intricate cross-modal mapping between long speech inputs and short text summaries. Research on large language models (LLMs) and multimodal information fusion has provided new insights for addressing these challenges. In t…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: InterSpeech 2024

  46. arXiv:2407.01511  [pdf, other]

    cs.AI

    CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

    Authors: Tianqi Xu, Linyao Chen, Dai-Jie Wu, Yanjun Chen, Zecheng Zhang, Xiang Yao, Zhiqiang Xie, Yongchao Chen, Shilong Liu, Bochen Qian, Philip Torr, Bernard Ghanem, Guohao Li

    Abstract: The development of autonomous agents increasingly relies on Multimodal Language Models (MLMs) to perform tasks described in natural language with GUI environments, such as websites, desktop computers, or mobile phones. Existing benchmarks for MLM agents in interactive environments are limited by their focus on a single environment, lack of detailed and generalized evaluation methods, and the compl…

    Submitted 1 July, 2024; originally announced July 2024.

  47. arXiv:2407.01496  [pdf, other]

    math.NA cs.LG

    Fast Iterative Solver For Neural Network Method: II. 1D Diffusion-Reaction Problems And Data Fitting

    Authors: Zhiqiang Cai, Anastassia Doktorova, Robert D. Falgout, César Herrera

    Abstract: This paper expands the damped block Newton (dBN) method introduced recently in [4] for 1D diffusion-reaction equations and least-squares data fitting problems. To determine the linear parameters (the weights and bias of the output layer) of the neural network (NN), the dBN method requires solving systems of linear equations involving the mass matrix. While the mass matrix for local hat basis funct…

    Submitted 1 July, 2024; originally announced July 2024.

    MSC Class: 65K10; 65F05

  48. arXiv:2407.00352  [pdf, other]

    cs.CV cs.AI

    PhyTracker: An Online Tracker for Phytoplankton

    Authors: Yang Yu, Qingxuan Lv, Yuezun Li, Zhiqiang Wei, Junyu Dong

    Abstract: Phytoplankton, a crucial component of aquatic ecosystems, requires efficient monitoring to understand marine ecological processes and environmental conditions. Traditional phytoplankton monitoring methods, relying on non-in situ observations, are time-consuming and resource-intensive, limiting timely analysis. To address these limitations, we introduce PhyTracker, an intelligent in situ tracking f…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 13 pages, 11 figures

  49. arXiv:2406.20098  [pdf, other]

    cs.CV cs.AI cs.CL

    Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

    Authors: Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen

    Abstract: Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding webpage screenshots and generating their corresponding HTML code. To address this problem, we propose Web2Code, a benchmark consisting of a new large-scale webpage-t…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Website at https://mbzuai-llm.github.io/webpage2code/

  50. arXiv:2406.19973  [pdf, other]

    cs.CV cs.LG

    STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical

    Authors: Guohao Sun, Can Qin, Huazhu Fu, Linwei Wang, Zhiqiang Tao

    Abstract: Large Vision-Language Models (LVLMs) have shown significant potential in assisting medical diagnosis by leveraging extensive biomedical datasets. However, the advancement of medical image understanding and reasoning critically depends on building high-quality visual instruction data, which is costly and labor-intensive to obtain, particularly in the medical domain. To mitigate this data-starving i…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 10 pages, 6 figures