
Showing 1–50 of 278 results for author: Shao, J

Searching in archive cs.
  1. arXiv:2407.13237 [pdf, other]

    cs.AI

    LLM-Empowered State Representation for Reinforcement Learning

    Authors: Boyuan Wang, Yun Qu, Yuhang Jiang, Jianzhun Shao, Chang Liu, Wenming Yang, Xiangyang Ji

    Abstract: Conventional state representations in reinforcement learning often omit critical task-related details, presenting a significant challenge for value networks in establishing accurate mappings from states to task rewards. Traditional methods typically depend on extensive sample learning to enrich state representations with task-specific information, which leads to low sample efficiency and high time…

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.12344 [pdf, other]

    cs.CL cs.CY

    The Better Angels of Machine Personality: How Personality Relates to LLM Safety

    Authors: Jie Zhang, Dongrui Liu, Chen Qian, Ziyue Gan, Yong Liu, Yu Qiao, Jing Shao

    Abstract: Personality psychologists have analyzed the relationship between personality and safety behaviors in human society. Although Large Language Models (LLMs) demonstrate personality traits, the relationship between personality traits and safety abilities in LLMs still remains a mystery. In this paper, we discover that LLMs' personality traits are closely related to their safety abilities, i.e., toxici…

    Submitted 17 July, 2024; originally announced July 2024.

  3. arXiv:2407.10632 [pdf, other]

    eess.IV cs.AI cs.CV

    Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model

    Authors: Zhening Liu, Xinjie Zhang, Jiawei Shao, Zehong Lin, Jun Zhang

    Abstract: With the rapid advancement of stereo vision technologies, stereo image compression has emerged as a crucial field that continues to draw significant attention. Previous approaches have primarily employed a unidirectional paradigm, where the compression of one view is dependent on the other, resulting in imbalanced compression. To address this issue, we introduce a symmetric bidirectional stereo im…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  4. arXiv:2407.06043 [pdf, other]

    cs.CV

    Test-time adaptation for geospatial point cloud semantic segmentation with distinct domain shifts

    Authors: Puzuo Wang, Wei Yao, Jie Shao, Zhiyi He

    Abstract: Domain adaptation (DA) techniques help deep learning models generalize across data shifts for point cloud semantic segmentation (PCSS). Test-time adaptation (TTA) allows direct adaptation of a pre-trained model to unlabeled data during the inference stage without access to source data or additional training, avoiding privacy issues and large computational resources. We address TTA for geospatial PCSS…

    Submitted 8 July, 2024; originally announced July 2024.

  5. arXiv:2407.05540 [pdf, other]

    cs.CV

    GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation

    Authors: Chenxin Li, Xinyu Liu, Cheng Wang, Yifan Liu, Weihao Yu, Jing Shao, Yixuan Yuan

    Abstract: Recent advances in learning multi-modal representation have witnessed the success in biomedical domains. While established techniques enable handling multi-modal information, the challenges are posed when extended to various clinical modalities and practical modality-missing setting due to the inherent modality gaps. To tackle these, we propose an innovative Modality-prompted Heterogeneous Graph fo…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  6. A Pairwise DomMix Attentive Adversarial Network for Unsupervised Domain Adaptive Object Detection

    Authors: Jie Shao, Jiacheng Wu, Wenzhong Shen, Cheng Yang

    Abstract: Unsupervised Domain Adaptive Object Detection (DAOD) could adapt a model trained on a source domain to an unlabeled target domain for object detection. Existing unsupervised DAOD methods usually perform feature alignments from the target to the source. Unidirectional domain transfer would omit information about the target samples and result in suboptimal adaptation when there are large domain shif…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Published in IEEE Signal Processing Letters, 2023

  7. Style Alignment based Dynamic Observation Method for UAV-View Geo-localization

    Authors: Jie Shao, LingHao Jiang

    Abstract: The task of UAV-view geo-localization is to estimate the localization of a query satellite/drone image by matching it against a reference dataset consisting of drone/satellite images. Though tremendous strides have been made in feature alignment between satellite and drone views, vast differences in both inter and intra-class due to changes in viewpoint, altitude, and lighting remain a huge challe…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Published in IEEE Transactions on Geoscience and Remote Sensing, 2023

  8. arXiv:2407.00949 [pdf, ps, other]

    cs.CV eess.IV

    SpectralKAN: Kolmogorov-Arnold Network for Hyperspectral Images Change Detection

    Authors: Yanheng Wang, Xiaohan Yu, Yongsheng Gao, Jianjun Sha, Jian Wang, Lianru Gao, Yonggang Zhang, Xianhui Rong

    Abstract: It has been verified that deep learning methods, including convolutional neural networks (CNNs), graph neural networks (GNNs), and transformers, can accurately extract features from hyperspectral images (HSIs). These algorithms perform exceptionally well on HSIs change detection (HSIs-CD). However, the downside of these impressive results is the enormous number of parameters, FLOPs, GPU memory, tr…

    Submitted 1 July, 2024; originally announced July 2024.

  9. arXiv:2407.00600 [pdf, other]

    cs.CV cs.AI

    GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing

    Authors: Yisong Xiao, Aishan Liu, QianJia Cheng, Zhenfei Yin, Siyuan Liang, Jiapeng Li, Jing Shao, Xianglong Liu, Dacheng Tao

    Abstract: Large Vision-Language Models (LVLMs) have been widely adopted in various applications; however, they exhibit significant gender biases. Existing benchmarks primarily evaluate gender bias at the demographic group level, neglecting individual fairness, which emphasizes equal treatment of similar individuals. This research gap limits the detection of discriminatory behaviors, as individual fairness o…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 9 pages, 4 figures

  10. arXiv:2406.12550 [pdf, other]

    cs.LG cs.AI

    Offline Imitation Learning with Model-based Reverse Augmentation

    Authors: Jie-Jing Shao, Hao-Sen Shi, Lan-Zhe Guo, Yu-Feng Li

    Abstract: In offline Imitation Learning (IL), one of the main challenges is the covariate shift between the expert observations and the actual distribution encountered by the agent, because it is difficult to determine what action an agent should take when outside the state distribution of the expert demonstrations. Recently, the model-free solutions introduce the supplementary data and identify th…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024

  11. arXiv:2406.12072 [pdf, other]

    cs.AI cs.LG

    DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs

    Authors: Jiasheng Zhang, Jialin Chen, Menglin Yang, Aosong Feng, Shuang Liang, Jie Shao, Rex Ying

    Abstract: Dynamic text-attributed graphs (DyTAGs) are prevalent in various real-world scenarios, where each node and edge are associated with text descriptions, and both the graph structure and text descriptions evolve over time. Despite their broad applicability, there is a notable scarcity of benchmark datasets tailored to DyTAGs, which hinders the potential advancement in many research fields. To address…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 28 pages, 13 figures

  12. arXiv:2406.12030 [pdf, other]

    cs.CV cs.AI cs.CL

    SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model

    Authors: Yongting Zhang, Lu Chen, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, Jing Shao

    Abstract: The emergence of Vision Language Models (VLMs) has brought unprecedented advances in understanding multimodal information. The combination of textual and visual semantics in VLMs is highly complex and diverse, making the safety alignment of these models challenging. Furthermore, due to the limited study on the safety alignment of VLMs, there is a lack of large-scale, high-quality datasets. To addr…

    Submitted 17 June, 2024; originally announced June 2024.

  13. arXiv:2406.06594 [pdf, other]

    q-fin.CP cs.AI cs.LG

    Stock Movement Prediction with Multimodal Stable Fusion via Gated Cross-Attention Mechanism

    Authors: Chang Zong, Jian Shao, Weiming Lu, Yueting Zhuang

    Abstract: The accurate prediction of stock movements is crucial for investment strategies. Stock prices are subject to the influence of various forms of information, including financial indicators, sentiment analysis, news documents, and relational structures. Predominant analytical approaches, however, tend to address only unimodal or bimodal sources, neglecting the complexity of multimodal data. Further c…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 29 pages, 10 figures

    MSC Class: 68T07

    ACM Class: I.2.6; J.4

  14. arXiv:2406.06572 [pdf, other]

    cs.CL cs.AI cs.IR

    Graph Neural Network Enhanced Retrieval for Question Answering of LLMs

    Authors: Zijian Li, Qingyan Guo, Jiawei Shao, Lei Song, Jiang Bian, Jun Zhang, Rui Wang

    Abstract: Retrieval augmented generation has revolutionized large language model (LLM) outputs by providing factual supports. Nevertheless, it struggles to capture all the necessary knowledge for complex reasoning questions. Existing retrieval methods typically divide reference documents into passages, treating them in isolation. These passages, however, are often interrelated, such as passages that are con…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Under review

  15. arXiv:2406.06107 [pdf, other]

    cs.AI

    EXPIL: Explanatory Predicate Invention for Learning in Games

    Authors: Jingyuan Sha, Hikaru Shindo, Quentin Delfosse, Kristian Kersting, Devendra Singh Dhami

    Abstract: Reinforcement learning (RL) has proven to be a powerful tool for training agents that excel in various games. However, the black-box nature of neural network models often hinders our ability to understand the reasoning behind the agent's actions. Recent research has attempted to address this issue by using the guidance of pretrained neural agents to encode logic-based policies, allowing for interp…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 9 pages, 2 pages references, 8 figures, 3 tables

  16. arXiv:2406.03474 [pdf, other]

    cs.CV

    AD-H: Autonomous Driving with Hierarchical Agents

    Authors: Zaibin Zhang, Shiyu Tang, Yuanhang Zhang, Talas Fu, Yifan Wang, Yang Liu, Dong Wang, Jing Shao, Lijun Wang, Huchuan Lu

    Abstract: Due to the impressive capabilities of multimodal large language models (MLLMs), recent works have focused on employing MLLM-based agents for autonomous driving in large-scale and dynamic environments. However, prevalent approaches often directly translate high-level instructions into low-level vehicle control signals, which deviates from the inherent language generation paradigm of MLLMs and fails…

    Submitted 5 June, 2024; originally announced June 2024.

  17. arXiv:2406.01493 [pdf, other]

    cs.CV

    Learning Temporally Consistent Video Depth from Video Diffusion Priors

    Authors: Jiahao Shao, Yuanbo Yang, Hongyu Zhou, Youmin Zhang, Yujun Shen, Matteo Poggi, Yiyi Liao

    Abstract: This work addresses the challenge of video depth estimation, which expects not only per-frame accuracy but, more importantly, cross-frame consistency. Instead of directly developing a depth estimator from scratch, we reformulate the prediction task into a conditional generation problem. This allows us to leverage the prior knowledge embedded in existing video generation models, thereby reducing le…

    Submitted 3 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  18. arXiv:2405.18044 [pdf, other]

    cs.MA cs.AI

    Cognitive Insights and Stable Coalition Matching for Fostering Multi-Agent Cooperation

    Authors: Jiaqi Shao, Tianjun Yuan, Tao Lin, Xuanyu Cao, Bing Luo

    Abstract: Cognitive abilities, such as Theory of Mind (ToM), play a vital role in facilitating cooperation in human social interactions. However, our study reveals that agents with higher ToM abilities may not necessarily exhibit better cooperative behavior compared to those with lower ToM abilities. To address this challenge, we propose a novel matching coalition mechanism that leverages the strengths of a…

    Submitted 28 May, 2024; originally announced May 2024.

  19. arXiv:2405.17053 [pdf, other]

    cs.NI cs.AI cs.LG

    WirelessLLM: Empowering Large Language Models Towards Wireless Intelligence

    Authors: Jiawei Shao, Jingwen Tong, Qiong Wu, Wei Guo, Zijian Li, Zehong Lin, Jun Zhang

    Abstract: The rapid evolution of wireless technologies and the growing complexity of network infrastructures necessitate a paradigm shift in how communication networks are designed, configured, and managed. Recent advancements in Large Language Models (LLMs) have sparked interest in their potential to revolutionize wireless communication systems. However, existing studies on LLMs for wireless systems are li…

    Submitted 15 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  20. arXiv:2405.14365 [pdf, other]

    cs.CL cs.AI

    JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models

    Authors: Kun Zhou, Beichen Zhang, Jiapeng Wang, Zhipeng Chen, Wayne Xin Zhao, Jing Sha, Zhichao Sheng, Shijin Wang, Ji-Rong Wen

    Abstract: Mathematical reasoning is an important capability of large language models (LLMs) for real-world applications. To enhance this capability, existing work either collects large-scale math-related texts for pre-training, or relies on stronger LLMs (e.g., GPT-4) to synthesize massive math problems. Both types of work generally lead to large costs in training or synthesis. To reduce the cost, based on op…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 28 pages, SOTA math LLM using Well-trained Data Synthesis LLM

  21. arXiv:2405.09514 [pdf, other]

    eess.SP cs.IT cs.LG

    Tackling Distribution Shifts in Task-Oriented Communication with Information Bottleneck

    Authors: Hongru Li, Jiawei Shao, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

    Abstract: Task-oriented communication aims to extract and transmit task-relevant information to significantly reduce the communication overhead and transmission latency. However, the unpredictable distribution shifts between training and test data, including domain shift and semantic shift, can dramatically undermine the system performance. In order to tackle these challenges, it is crucial to ensure that t…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 13 pages, 8 figures, submitted to IEEE for potential publication

  22. arXiv:2405.06232 [pdf, other]

    cs.AI

    Learning to Solve Geometry Problems via Simulating Human Dual-Reasoning Process

    Authors: Tong Xiao, Jiayu Liu, Zhenya Huang, Jinze Wu, Jing Sha, Shijin Wang, Enhong Chen

    Abstract: Geometry Problem Solving (GPS), which is a classic and challenging math problem, has attracted much attention in recent years. It requires a solver to comprehensively understand both text and diagram, master essential geometry knowledge, and appropriately apply it in reasoning. However, existing works follow a paradigm of neural machine translation and only focus on enhancing the capability of enc…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Accepted at IJCAI 2024

  23. arXiv:2405.06004 [pdf, other]

    physics.ao-ph cs.AI cs.LG

    EWMoE: An effective model for global weather forecasting with mixture-of-experts

    Authors: Lihao Gan, Xin Man, Chenghong Zhang, Jie Shao

    Abstract: Weather forecasting is a crucial task for meteorologic research, with direct social and economic impacts. Recently, data-driven weather forecasting models based on deep learning have shown great potential, achieving superior performance compared with traditional numerical weather prediction methods. However, these models often require massive training data and computational resources. In this pape…

    Submitted 9 May, 2024; originally announced May 2024.

  24. arXiv:2404.14368 [pdf, other]

    cs.CV cs.AI cs.CL

    Graphic Design with Large Multimodal Model

    Authors: Yutao Cheng, Zhao Zhang, Maoke Yang, Hui Nie, Chunyuan Li, Xinglong Wu, Jie Shao

    Abstract: In the field of graphic design, automating the integration of design elements into a cohesive multi-layered artwork not only boosts productivity but also paves the way for the democratization of graphic design. One existing practice is Graphic Layout Generation (GLG), which aims to lay out sequential design elements. It has been constrained by the necessity for a predefined correct sequence of laye…

    Submitted 22 April, 2024; originally announced April 2024.

  25. arXiv:2404.14248 [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  26. arXiv:2404.13692 [pdf, other]

    cs.CV

    A sustainable development perspective on urban-scale roof greening priorities and benefits

    Authors: Jie Shao, Wei Yao, Lei Luo, Linzhou Zeng, Zhiyi He, Puzuo Wang, Huadong Guo

    Abstract: Greenspaces are tightly linked to human well-being. Yet, rapid urbanization has exacerbated greenspace exposure inequality and declining human life quality. Roof greening has been recognized as an effective strategy to mitigate these negative impacts. Understanding priorities and benefits is crucial to promoting green roofs. Here, using geospatial big data, we conduct an urban-scale assessment of…

    Submitted 21 April, 2024; originally announced April 2024.

  27. arXiv:2403.19622 [pdf, other]

    cs.RO cs.CV

    RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

    Authors: Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Hao Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng

    Abstract: The ultimate goal of robotic learning is to acquire a comprehensive and generalizable robotic system capable of performing both seen skills within the training distribution and unseen skills in novel environments. Recent progress in utilizing language models as high-level planners has demonstrated that the complexity of tasks can be reduced through decomposing them into primitive-level plans, mak…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 24 pages, 12 figures, 6 tables

  28. arXiv:2403.18760 [pdf, other]

    cs.RO

    MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model

    Authors: Yike Wu, Jiatao Zhang, Nan Hu, LanLing Tang, Guilin Qi, Jun Shao, Jie Ren, Wei Song

    Abstract: In the realm of data-driven AI technology, the application of open-source large language models (LLMs) in robotic task planning represents a significant milestone. Recent robotic task planning methods based on open-source LLMs typically leverage vast task planning datasets to enhance models' planning abilities. While these methods show promise, they struggle with complex long-horizon tasks, which…

    Submitted 1 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  29. arXiv:2403.17830 [pdf, other]

    cs.CV

    Assessment of Multimodal Large Language Models in Alignment with Human Values

    Authors: Zhelun Shi, Zhipin Wang, Hongxing Fan, Zaibin Zhang, Lijun Li, Yongting Zhang, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao

    Abstract: Large Language Models (LLMs) aim to serve as versatile assistants aligned with human values, as defined by the principles of being helpful, honest, and harmless (hhh). However, in terms of Multimodal Large Language Models (MLLMs), despite their commendable performance in perception and reasoning tasks, their alignment with human values remains largely unexplored, given the complexity of defining h…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.02692

  30. arXiv:2403.12722 [pdf, other]

    cs.CV

    HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting

    Authors: Hongyu Zhou, Jiahao Shao, Lu Xu, Dongfeng Bai, Weichao Qiu, Bingbing Liu, Yue Wang, Andreas Geiger, Yiyi Liao

    Abstract: Holistic understanding of urban scenes based on RGB images is a challenging yet important problem. It encompasses understanding both the geometry and appearance to enable novel view synthesis, parsing semantic labels, and tracking moving objects. Despite considerable progress, existing approaches often focus on specific aspects of this task and require additional inputs such as LiDAR scans or manu…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Our project page is at https://xdimlab.github.io/hugs_website

  31. arXiv:2403.12171 [pdf, other]

    cs.CL cs.AI

    EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models

    Authors: Weikang Zhou, Xiao Wang, Limao Xiong, Han Xia, Yingshuang Gu, Mingxu Chai, Fukang Zhu, Caishuang Huang, Shihan Dou, Zhiheng Xi, Rui Zheng, Songyang Gao, Yicheng Zou, Hang Yan, Yifan Le, Ruohui Wang, Lijun Li, Jing Shao, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: Jailbreak attacks are crucial for identifying and mitigating the security vulnerabilities of Large Language Models (LLMs). They are designed to bypass safeguards and elicit prohibited outputs. However, due to significant differences among various jailbreak methods, there is no standard implementation framework available for the community, which limits comprehensive security evaluations. This paper…

    Submitted 18 March, 2024; originally announced March 2024.

  32. arXiv:2403.12037 [pdf, other]

    cs.CV

    MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control

    Authors: Enshen Zhou, Yiran Qin, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang, Lu Sheng, Yu Qiao, Jing Shao

    Abstract: It is a long-lasting goal to design a generalist-embodied agent that can follow diverse instructions in human-like ways. However, existing approaches often fail to steadily follow instructions due to difficulties in understanding abstract and sequential natural language instructions. To this end, we introduce MineDreamer, an open-ended embodied agent built upon the challenging Minecraft simulator…

    Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Project page: https://sites.google.com/view/minedreamer/main

  33. arXiv:2403.11050 [pdf, other]

    cs.CV

    Endora: Video Generation Models as Endoscopy Simulators

    Authors: Chenxin Li, Hengyu Liu, Yifan Liu, Brandon Y. Feng, Wuyang Li, Xinyu Liu, Zhen Chen, Jing Shao, Yixuan Yuan

    Abstract: Generative models hold promise for revolutionizing medical education, robot-assisted surgery, and data augmentation for machine learning. Despite progress in generating 2D medical images, the complex domain of clinical video generation has largely remained untapped. This paper introduces Endora, an innovative approach to generate medical videos that simulate clinical endoscopy scenes. We present a…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: Project page: https://endora-medvidgen.github.io/

  34. arXiv:2403.09131 [pdf, other]

    cs.CL cs.AI

    ProSwitch: Knowledge-Guided Instruction Tuning to Generate Professional and Non-Professional Styled Text

    Authors: Chang Zong, Yuyan Chen, Weiming Lu, Jian Shao, Yueting Zhuang

    Abstract: Large Language Models (LLMs) have demonstrated efficacy in various linguistic applications, including text summarization and controlled text generation. However, studies into their capacity of switching between styles via fine-tuning remain underexplored. This study concentrates on textual professionalism and introduces a novel methodology, named ProSwitch, which equips a language model with the a…

    Submitted 15 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 8 pages

    MSC Class: 68T50

    ACM Class: I.2.7

  35. arXiv:2403.08505 [pdf, other]

    eess.IV cs.AI cs.CV cs.MM

    Content-aware Masked Image Modeling Transformer for Stereo Image Compression

    Authors: Xinjie Zhang, Shenyuan Gao, Zhening Liu, Jiawei Shao, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Jun Zhang

    Abstract: Existing learning-based stereo image codecs adopt sophisticated transformations with simple entropy models derived from single image codecs to encode latent representations. However, those entropy models struggle to effectively capture the spatial-disparity characteristics inherent in stereo images, which leads to suboptimal rate-distortion results. In this paper, we propose a stereo image compressi…

    Submitted 19 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  36. arXiv:2403.07865 [pdf, other]

    cs.CL cs.AI cs.CR cs.LG cs.SE

    CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion

    Authors: Qibing Ren, Chang Gao, Jing Shao, Junchi Yan, Xin Tan, Wai Lam, Lizhuang Ma

    Abstract: The rapid advancement of Large Language Models (LLMs) has brought about remarkable generative capabilities but also raised concerns about their potential misuse. While strategies like supervised fine-tuning and reinforcement learning from human feedback have enhanced their safety, these methods primarily focus on natural languages, which may not generalize to other domains. This paper introduces C…

    Submitted 9 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: ACL Findings 2024, Code is available at https://github.com/renqibing/CodeAttack

  37. arXiv:2403.07608 [pdf, other]

    cs.DB cs.AI cs.LG

    Couler: Unified Machine Learning Workflow Optimization in Cloud

    Authors: Xiaoda Wang, Yuan Tang, Tengda Guo, Bo Sang, Jingji Wu, Jian Sha, Ke Zhang, Jiang Qian, Mingjie Tang

    Abstract: Machine Learning (ML) has become ubiquitous, fueling data-driven applications across various organizations. Contrary to the traditional perception of ML in research, ML workflows can be complex, resource-intensive, and time-consuming. Expanding an ML workflow to encompass a wider range of data infrastructure and data types may lead to larger workloads and increased deployment costs. Currently, num…

    Submitted 12 March, 2024; originally announced March 2024.

  38. arXiv:2403.06071 [pdf, other]

    cs.CV cs.IR

    Bit-mask Robust Contrastive Knowledge Distillation for Unsupervised Semantic Hashing

    Authors: Liyang He, Zhenya Huang, Jiayu Liu, Enhong Chen, Fei Wang, Jing Sha, Shijin Wang

    Abstract: Unsupervised semantic hashing has emerged as an indispensable technique for fast image search, which aims to convert images into binary hash codes without relying on labels. Recent advancements in the field demonstrate that employing large-scale backbones (e.g., ViT) in unsupervised semantic hashing models can yield substantial improvements. However, the inference delay has become increasingly dif…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: 12 pages, 19 figures, Proceedings of the ACM Web Conference 2024 (WWW '24)

  39. arXiv:2403.05170 [pdf, other]

    cs.CV

    DiffuLT: How to Make Diffusion Model Useful for Long-tail Recognition

    Authors: Jie Shao, Ke Zhu, Hanxiao Zhang, Jianxin Wu

    Abstract: This paper proposes a new pipeline for long-tail (LT) recognition. Instead of re-weighting or re-sampling, we utilize the long-tailed dataset itself to generate a balanced proxy that can be optimized through cross-entropy (CE). Specifically, a randomly initialized diffusion model, trained exclusively on the long-tailed dataset, is employed to synthesize new samples for underrepresented classes. Th…

    Submitted 8 March, 2024; originally announced March 2024.

  40. arXiv:2403.02528 [pdf, other]

    cs.CL cs.AI

    DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation

    Authors: Xueqing Wu, Rui Zheng, Jingzhen Sha, Te-Lin Wu, Hanyu Zhou, Mohan Tang, Kai-Wei Chang, Nanyun Peng, Haoran Huang

    Abstract: Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights to comprehensively answer a given user query for tabular data. In this work, we aim to propose new resources and benchmarks to inspire future research on this crucial yet challenging and under-explored task. However, collecting data analysis annotations curated by experts can be prohibitively expensi…

    Submitted 4 March, 2024; originally announced March 2024.

  41. arXiv:2403.02132 [pdf, other]

    cs.CV

    UB-FineNet: Urban Building Fine-grained Classification Network for Open-access Satellite Images

    Authors: Zhiyi He, Wei Yao, Jie Shao, Puzuo Wang

    Abstract: Fine classification of city-scale buildings from satellite remote sensing imagery is a crucial research area with significant implications for urban planning, infrastructure development, and population distribution analysis. However, the task faces big challenges due to low-resolution overhead images acquired from high altitude space-borne platforms and the long-tail sample distribution of fine-gr…

    Submitted 4 March, 2024; originally announced March 2024.

  42. arXiv:2402.19465 [pdf, other]

    cs.CL cs.AI

    Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models

    Authors: Chen Qian, Jie Zhang, Wei Yao, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu, Jing Shao

    Abstract: Ensuring the trustworthiness of large language models (LLMs) is crucial. Most studies concentrate on fully pre-trained LLMs to better understand and improve LLMs' trustworthiness. In this paper, to reveal the untapped potential of pre-training, we pioneer the exploration of LLMs' trustworthiness during this period, focusing on five key dimensions: reliability, privacy, toxicity, fairness, and robu…

    Submitted 29 February, 2024; originally announced February 2024.

  43. arXiv:2402.14320

    cs.CL cs.AI

    Triad: A Framework Leveraging a Multi-Role LLM-based Agent to Solve Knowledge Base Question Answering

    Authors: Chang Zong, Yuchen Yan, Weiming Lu, Jian Shao, Eliot Huang, Heng Chang, Yueting Zhuang

    Abstract: Recent progress with LLM-based agents has shown promising results across various tasks. However, their use in answering questions from knowledge bases remains largely unexplored. Implementing a KBQA system using traditional methods is challenging due to the shortage of task-specific training data and the complexity of creating task-focused model structures. In this paper, we present Triad, a unifi…

    Submitted 15 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 8 pages

    MSC Class: 68T50

    ACM Class: I.2.7

  44. arXiv:2402.10464

    cs.LG cs.NI

    FedKit: Enabling Cross-Platform Federated Learning for Android and iOS

    Authors: Sichang He, Beilong Tang, Boyan Zhang, Jiaoqi Shao, Xiaomin Ouyang, Daniel Nata Nugraha, Bing Luo

    Abstract: We present FedKit, a federated learning (FL) system tailored for cross-platform FL research on Android and iOS devices. FedKit pipelines cross-platform FL development by enabling model conversion, hardware-accelerated training, and cross-platform model aggregation. Our FL workflow supports flexible machine learning operations (MLOps) in production, facilitating continuous model delivery and traini…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: This work has been accepted for demonstration on IEEE International Conference on Computer Communications (INFOCOM) 2024

  45. arXiv:2402.09283

    cs.CL cs.AI cs.CY cs.LG

    Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey

    Authors: Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao

    Abstract: Large Language Models (LLMs) are now commonplace in conversation applications. However, their risks of misuse for generating harmful responses have raised serious societal concerns and spurred recent research on LLM conversation safety. Therefore, in this survey, we provide a comprehensive overview of recent studies, covering three critical aspects of LLM conversation safety: attacks, defenses, an…

    Submitted 27 March, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted to NAACL 2024

  46. arXiv:2402.05044

    cs.CL cs.AI cs.CR cs.LG

    SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models

    Authors: Lijun Li, Bowen Dong, Ruohui Wang, Xuhao Hu, Wangmeng Zuo, Dahua Lin, Yu Qiao, Jing Shao

    Abstract: In the rapidly evolving landscape of Large Language Models (LLMs), ensuring robust safety measures is paramount. To meet this crucial need, we propose SALAD-Bench, a safety benchmark specifically designed for evaluating LLMs, attack, and defense methods. Distinguished by its breadth, SALAD-Bench transcends conventional benchmarks through its large scale, rich diversity, intricate taxonomy s…

    Submitted 7 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024 Findings

  47. arXiv:2402.04546

    cs.RO

    LiDAR-Forest Dataset: LiDAR Point Cloud Simulation Dataset for Forestry Application

    Authors: Yawen Lu, Zhuoyang Sun, Jinyuan Shao, Qianyu Guo, Yunhan Huang, Songlin Fei, Yingjie Chen

    Abstract: The popularity of LiDAR devices and sensor technology has gradually empowered users from autonomous driving to forest monitoring, and research on 3D LiDAR has made remarkable progress over the years. Unlike 2D images, whose focused area is visible and rich in texture information, understanding the point distribution can help companies and researchers find better ways to develop point-based 3D appl…

    Submitted 15 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: 5 pages

  48. arXiv:2402.01276

    cs.AI

    Federated Unlearning: a Perspective of Stability and Fairness

    Authors: Jiaqi Shao, Tao Lin, Xuanyu Cao, Bing Luo

    Abstract: This paper explores the multifaceted consequences of federated unlearning (FU) with data heterogeneity. We introduce key metrics for FU assessment, concentrating on verification, global stability, and local fairness, and investigate the inherent trade-offs. Furthermore, we formulate the unlearning process with data heterogeneity through an optimization framework. Our key contribution lies in a com…

    Submitted 1 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  49. arXiv:2401.15885

    cs.CV

    Rectify the Regression Bias in Long-Tailed Object Detection

    Authors: Ke Zhu, Minghao Fu, Jie Shao, Tianyu Liu, Jianxin Wu

    Abstract: Long-tailed object detection faces great challenges because of its extremely imbalanced class distribution. Recent methods mainly focus on the classification bias and its loss function design, while ignoring the subtle influence of the regression branch. This paper shows that the regression bias exists and does adversely and seriously impact the detection accuracy. While existing methods fail to h…

    Submitted 31 January, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  50. arXiv:2401.15071

    cs.CV

    From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

    Authors: Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, Limin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He, et al. (11 additional authors not shown)

    Abstract: Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses with respect to multi-modal contents. However, there is still a wide gap between the performance of recent MLLM-based applications and the expectation of the broad public, even though the most powerful OpenAI's GPT-4 and Google's Gemini have been deployed. This paper strives to enhance unde…

    Submitted 29 January, 2024; v1 submitted 26 January, 2024; originally announced January 2024.