Skip to main content

Showing 1–50 of 761 results for author: Zhao, W

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.11569  [pdf, other

    cs.CV

    SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds

    Authors: Yanbo Wang, Wentao Zhao, Chuan Cao, Tianchen Deng, Jingchuan Wang, Weidong Chen

    Abstract: Although LiDAR semantic segmentation advances rapidly, state-of-the-art methods often incorporate specifically designed inductive bias derived from benchmarks originating from mechanical spinning LiDAR. This can limit model generalizability to other kinds of LiDAR technologies and make hyperparameter tuning more complex. To tackle these issues, we propose a generalized framework to accommodate var… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  2. arXiv:2407.10804  [pdf, other

    cs.CL

    Mix-CPT: A Domain Adaptation Framework via Decoupling Knowledge Learning and Format Alignment

    Authors: Jinhao Jiang, Junyi Li, Wayne Xin Zhao, Yang Song, Tao Zhang, Ji-Rong Wen

    Abstract: Adapting general large language models (LLMs) to specialized domains presents great challenges due to varied data distributions. This adaptation typically requires continual pre-training on massive domain-specific corpora to facilitate knowledge memorization, followed by training to apply this knowledge following human instructions and preferences. However, this method may result in inefficient kn… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: LLM, CPT, knowledge learning, format alignment; work in progress

  3. GeoMix: Towards Geometry-Aware Data Augmentation

    Authors: Wentao Zhao, Qitian Wu, Chenxiao Yang, Junchi Yan

    Abstract: Mixup has shown considerable success in mitigating the challenges posed by limited labeled data in image classification. By synthesizing samples through the interpolation of features and labels, Mixup effectively addresses the issue of data scarcity. However, it has rarely been explored in graph learning tasks due to the irregularity and connectivity of graph data. Specifically, in node classifica… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Published as a conference paper at KDD 2024

  4. arXiv:2407.10255  [pdf, other

    cs.SD eess.AS

    CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR

    Authors: Wenbo Zhao, Ziwei Li, Chuan Yu, Zhijian Ou

    Abstract: Streaming automatic speech recognition (ASR) is very important for many real-world ASR applications. However, a notable challenge for streaming ASR systems lies in balancing operational performance against latency constraint. Recently, a method of chunking, simulating future context and decoding, called CUSIDE, has been proposed for connectionist temporal classification (CTC) based streaming ASR,… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  5. arXiv:2407.09829  [pdf, other

    cs.RO

    VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation

    Authors: Wentao Zhao, Jiaming Chen, Ziyu Meng, Donghui Mao, Ran Song, Wei Zhang

    Abstract: Although Model Predictive Control (MPC) can effectively predict the future states of a system and thus is widely used in robotic manipulation tasks, it does not have the capability of environmental perception, leading to the failure in some complex scenarios. To address this issue, we introduce Vision-Language Model Predictive Control (VLMPC), a robotic manipulation framework which takes advantage… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by RSS2024

  6. arXiv:2407.08010  [pdf

    cs.LG cs.NE

    A New Self-organizing Interval Type-2 Fuzzy Neural Network for Multi-Step Time Series Prediction

    Authors: Fulong Yao, Wanqing Zhao, Matthew Forshaw, Yang Song

    Abstract: This paper proposes a new self-organizing interval type-2 fuzzy neural network with multiple outputs (SOIT2FNN-MO) for multi-step time series prediction. Differing from the traditional six-layer IT2FNN, a nine-layer network is developed to improve prediction accuracy, uncertainty handling and model interpretability. First, a new co-antecedent layer and a modified consequent layer are devised to im… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  7. arXiv:2407.05563  [pdf, other

    cs.CL

    LLMBox: A Comprehensive Library for Large Language Models

    Authors: Tianyi Tang, Yiwen Hu, Bingqian Li, Wenyang Luo, Zijing Qin, Haoxiang Sun, Jiapeng Wang, Shiyi Xu, Xiaoxue Cheng, Geyang Guo, Han Peng, Bowen Zheng, Yiru Tang, Yingqian Min, Yushuo Chen, Jie Chen, Yuanqian Zhao, Luran Ding, Yuhao Wang, Zican Dong, Chunxuan Xia, Junyi Li, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen

    Abstract: To facilitate the research on large language models (LLMs), this paper presents a comprehensive and unified library, LLMBox, to ease the development, use, and evaluation of LLMs. This library is featured with three main merits: (1) a unified data interface that supports the flexible implementation of various training strategies, (2) a comprehensive evaluation that covers extensive tasks, datasets,… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by ACL 2024 Demo

  8. arXiv:2407.05092  [pdf, other

    cs.CL

    Exploring Sound Change Over Time: A Review of Computational and Human Perception

    Authors: Siqi He, Wei Zhao

    Abstract: Computational and human perception are often considered separate approaches for studying sound changes over time; few works have touched on the intersection of both. To fill this research gap, we provide a pioneering review contrasting computational with human perception from the perspectives of methods and tasks. Overall, computational approaches rely on computer-driven models to perceive histori… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: LChange24 Camera Ready

  9. arXiv:2407.04010  [pdf, other

    cs.CL

    Exploring Diachronic and Diatopic Changes in Dialect Continua: Tasks, Datasets and Challenges

    Authors: Melis Çelikkol, Lydia Körber, Wei Zhao

    Abstract: Everlasting contact between language communities leads to constant changes in languages over time, and gives rise to language varieties and dialects. However, the communities speaking non-standard language are often overlooked by non-inclusive NLP technologies. Recently, there has been a surge of interest in studying diatopic and diachronic changes in dialect NLP, but there is currently no researc… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: LChange24 Camera Ready

  10. arXiv:2407.00834  [pdf, other

    cs.IR

    Prediction of Sentinel-2 multi-band imagery with attention BiLSTM for continuous earth surface monitoring

    Authors: Weiying Zhao, Natalia Efremova

    Abstract: Continuous monitoring of crops and forecasting crop conditions through time series analysis is crucial for effective agricultural management. This study proposes a framework based on an attention Bidirectional Long Short-Term Memory (BiLSTM) network for predicting multiband images. Our model can forecast target images on user-defined dates, including future dates and periods characterized by persi… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  11. arXiv:2406.19853  [pdf, other

    cs.CL cs.AI

    YuLan: An Open-source Large Language Model

    Authors: Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou , et al. (13 additional authors not shown)

    Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with $12$ billi… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  12. arXiv:2406.18118  [pdf, other

    cs.CR cs.CL

    SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance

    Authors: Caishuang Huang, Wanxu Zhao, Rui Zheng, Huijie Lv, Shihan Dou, Sixian Li, Xiao Wang, Enyu Zhou, Junjie Ye, Yuming Yang, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: As the development of large language models (LLMs) rapidly advances, securing these models effectively without compromising their utility has become a pivotal area of research. However, current defense strategies against jailbreak attacks (i.e., efforts to bypass security protocols) often suffer from limited adaptability, restricted general capability, and high cost. To address these challenges, w… ▽ More

    Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  13. arXiv:2406.16845  [pdf, other

    cs.CL

    RaTEScore: A Metric for Radiology Report Generation

    Authors: Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: This paper introduces a novel, entity-aware metric, termed as Radiological Report (Text) Evaluation (RaTEScore), to assess the quality of medical reports generated by AI models. RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, and is robust against complex medical synonyms and sensitive to negation expressions. Technically, we developed a comprehens… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  14. arXiv:2406.16347  [pdf, other

    cs.CR cs.SE

    VulZoo: A Comprehensive Vulnerability Intelligence Dataset

    Authors: Bonan Ruan, Jiahao Liu, Weibo Zhao, Zhenkai Liang

    Abstract: Software vulnerabilities pose critical security and risk concerns for many software systems. Many techniques have been proposed to effectively assess and prioritize these vulnerabilities before they cause serious consequences. To evaluate their performance, these solutions often craft their own experimental datasets from limited information sources, such as MITRE CVE and NVD, lacking a global over… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  15. arXiv:2406.16253  [pdf, other

    cs.CL

    LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

    Authors: Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo , et al. (15 additional authors not shown)

    Abstract: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as th… ▽ More

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  16. arXiv:2406.15718  [pdf, other

    cs.CL

    Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models

    Authors: Xinrong Zhang, Yingfa Chen, Shengding Hu, Xu Han, Zihang Xu, Yuanwei Xu, Weilin Zhao, Maosong Sun, Zhiyuan Liu

    Abstract: As large language models (LLMs) increasingly permeate daily lives, there is a growing demand for real-time interactions that mirror human conversations. Traditional turn-based chat systems driven by LLMs prevent users from verbally interacting with the system while it is generating responses. To overcome these limitations, we adapt existing LLMs to \textit{duplex models} so that these LLMs can lis… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  17. arXiv:2406.15222  [pdf

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan , et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed… ▽ More

    Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: under peer review

  18. arXiv:2406.14129  [pdf, other

    cs.CV cs.CL cs.MM

    Towards Event-oriented Long Video Understanding

    Authors: Yifan Du, Kun Zhou, Yuqi Huo, Yifan Li, Wayne Xin Zhao, Haoyu Lu, Zijia Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen

    Abstract: With the rapid development of video Multimodal Large Language Models (MLLMs), numerous benchmarks have been proposed to assess their video understanding capability. However, due to the lack of rich events in the videos, these datasets may suffer from the short-cut bias that the answers can be deduced from a few frames, without the need to watch the entire video. To address this issue, we introduce… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Work on progress

  19. arXiv:2406.14022  [pdf, other

    cs.LG cs.CL

    Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning

    Authors: Xiaolei Wang, Xinyu Tang, Wayne Xin Zhao, Ji-Rong Wen

    Abstract: The emergence of in-context learning (ICL) is potentially attributed to two major abilities: task recognition (TR) for recognizing the task from demonstrations and utilizing pre-trained priors, and task learning (TL) for learning from demonstrations. However, relationships between the two abilities and how such relationships affect the emergence of ICL is unclear. In this paper, we take the first… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: work in progress

  20. arXiv:2406.13975  [pdf, other

    cs.CL cs.AI

    MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

    Authors: Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia

    Abstract: Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, it has been increasingly challenging to evaluate the reasoning capability of LLMs. Concretely, existing outcome-based benchmarks begin to saturate and become less sufficient to monitor the progress. To this end, we pr… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  21. arXiv:2406.13890  [pdf, other

    cs.CL cs.AI

    ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World

    Authors: Weixiang Yan, Haitian Liu, Tengxiao Wu, Qian Chen, Wen Wang, Haoyuan Chai, Jiayi Wang, Weishan Zhao, Yixin Zhang, Renjun Zhang, Li Zhu

    Abstract: LLMs have achieved significant performance progress in various NLP applications. However, LLMs still struggle to meet the strict requirements for accuracy and reliability in the medical field and face many challenges in clinical applications. Existing clinical diagnostic evaluation benchmarks for evaluating medical agents powered by LLMs have severe limitations. Firstly, most existing medical eval… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  22. arXiv:2406.13445  [pdf, other

    cs.CV cs.AI

    Lost in UNet: Improving Infrared Small Target Detection by Underappreciated Local Features

    Authors: Wuzhou Quan, Wei Zhao, Weiming Wang, Haoran Xie, Fu Lee Wang, Mingqiang Wei

    Abstract: Many targets are often very small in infrared images due to the long-distance imaging meachnism. UNet and its variants, as popular detection backbone networks, downsample the local features early and cause the irreversible loss of these local features, leading to both the missed and false detection of small targets in infrared images. We propose HintU, a novel network to recover the local features… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  23. arXiv:2406.13381  [pdf, other

    cs.CL

    CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration

    Authors: Xinming Hou, Mingming Yang, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Wayne Xin Zhao

    Abstract: Existing LLMs exhibit remarkable performance on various NLP tasks, but still struggle with complex real-world tasks, even equipped with advanced strategies like CoT and ReAct. In this work, we propose the CoAct framework, which transfers the hierarchical planning and collaboration patterns in human society to LLM systems. Specifically, our CoAct framework involves two agents: (1) A global planning… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 9 pages, 4 figures

  24. arXiv:2406.12824  [pdf, other

    cs.CL cs.AI

    From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries

    Authors: Hitesh Wadhwa, Rahul Seetharaman, Somyaa Aggarwal, Reshmi Ghosh, Samyadeep Basu, Soundararajan Srinivasan, Wenlong Zhao, Shreyas Chaudhari, Ehsan Aghazadeh

    Abstract: Retrieval Augmented Generation (RAG) enriches the ability of language models to reason using external context to augment responses for a given user prompt. This approach has risen in popularity due to practical applications in various applications of language models in search, question/answering, and chat-bots. However, the exact nature of how this approach works isn't clearly understood. In this… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  25. arXiv:2406.12793  [pdf, other

    cs.CL

    ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

    Authors: Team GLM, :, Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang , et al. (32 additional authors not shown)

    Abstract: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  26. arXiv:2406.12606  [pdf, other

    cs.CL

    Low-Redundant Optimization for Large Language Model Alignment

    Authors: Zhipeng Chen, Kun Zhou, Wayne Xin Zhao, Jingyuan Wang, Ji-Rong Wen

    Abstract: Large language models (LLMs) are still struggling in aligning with human preference in complex tasks and scenarios. They are prone to overfit into the unexpected patterns or superficial styles in the training data. We conduct an empirical study that only selects the top-10\% most updated parameters in LLMs for alignment training, and see improvements in the convergence process and final performanc… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 14 pages, working in progress

  27. arXiv:2406.12397  [pdf, other

    cs.CL

    Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models

    Authors: Jie Chen, Yupeng Zhang, Bingning Wang, Wayne Xin Zhao, Ji-Rong Wen, Weipeng Chen

    Abstract: Synthetic data has been proposed as a solution to address the issue of high-quality data scarcity in the training of large language models (LLMs). Studies have shown that synthetic data can effectively improve the performance of LLMs on downstream benchmarks. However, despite its potential benefits, our analysis suggests that there may be inherent flaws in synthetic data. The uniform format of syn… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 15 pages

  28. arXiv:2406.11434  [pdf, other

    cs.DB

    DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models

    Authors: Fan Zhou, Siqiao Xue, Danrui Qi, Wenhui Shi, Wang Zhao, Ganglin Wei, Hongyang Zhang, Caigai Jiang, Gangwei Jiang, Zhixuan Chu, Faqiang Chen

    Abstract: Large language models (LLMs) becomes the dominant paradigm for the challenging task of text-to-SQL. LLM-empowered text-to-SQL methods are typically categorized into prompting-based and tuning approaches. Compared to prompting-based methods, benchmarking fine-tuned LLMs for text-to-SQL is important yet under-explored, partially attributed to the prohibitively high computational cost. In this paper,… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  29. arXiv:2406.11277  [pdf, other

    cs.CL

    Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector

    Authors: Xiaoxue Cheng, Junyi Li, Wayne Xin Zhao, Hongzhi Zhang, Fuzheng Zhang, Di Zhang, Kun Gai, Ji-Rong Wen

    Abstract: Hallucination detection is a challenging task for large language models (LLMs), and existing studies heavily rely on powerful closed-source LLMs such as GPT-4. In this paper, we propose an autonomous LLM-based agent framework, called HaluAgent, which enables relatively smaller LLMs (e.g. Baichuan2-Chat 7B) to actively select suitable tools for detecting multiple hallucination types such as text, c… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  30. arXiv:2406.11192  [pdf, other

    cs.CL

    Beyond Boundaries: Learning a Universal Entity Taxonomy across Datasets and Languages for Open Named Entity Recognition

    Authors: Yuming Yang, Wantong Zhao, Caishuang Huang, Junjie Ye, Xiao Wang, Huiyuan Zheng, Yang Nan, Yuran Wang, Xueying Xu, Kaixin Huang, Yunke Zhang, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: Open Named Entity Recognition (NER), which involves identifying arbitrary types of entities from arbitrary domains, remains challenging for Large Language Models (LLMs). Recent studies suggest that fine-tuning LLMs on extensive NER data can boost their performance. However, training directly on existing datasets faces issues due to inconsistent entity definitions and redundant data, limiting LLMs… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 20 pages. Project page: https://github.com/UmeanNever/B2NER

  31. arXiv:2406.11172  [pdf, other

    cs.CL

    Enhancing Criminal Case Matching through Diverse Legal Factors

    Authors: Jie Zhao, Ziyu Guan, Wei Zhao, Yue Jiang

    Abstract: Criminal case matching endeavors to determine the relevance between different criminal cases. Conventional methods predict the relevance solely based on instance-level semantic features and neglect the diverse legal factors (LFs), which are associated with diverse court judgments. Consequently, comprehensively representing a criminal case remains a challenge for these approaches. Moreover, extract… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  32. arXiv:2406.10954  [pdf, other

    cs.LG cs.CR

    Towards Efficient Target-Level Machine Unlearning Based on Essential Graph

    Authors: Heng Xu, Tianqing Zhu, Lefeng Zhang, Wanlei Zhou, Wei Zhao

    Abstract: Machine unlearning is an emerging technology that has come to attract widespread attention. A number of factors, including regulations and laws, privacy, and usability concerns, have resulted in this need to allow a trained model to forget some of its training data. Existing studies of machine unlearning mainly focus on unlearning requests that forget a cluster of instances or all instances from o… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  33. arXiv:2406.10951  [pdf, other

    cs.CR

    Don't Forget Too Much: Towards Machine Unlearning on Feature Level

    Authors: Heng Xu, Tianqing Zhu, Wanlei Zhou, Wei Zhao

    Abstract: Machine unlearning enables pre-trained models to remove the effect of certain portions of training data. Previous machine unlearning schemes have mainly focused on unlearning a cluster of instances or all instances belonging to a specific class. These types of unlearning might have a significant impact on the model utility; and they may be inadequate for situations where we only need to unlearn fe… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  34. arXiv:2406.10501  [pdf, other

    cs.CV

    Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition

    Authors: Weichao Zhao, Wengang Zhou, Hezhen Hu, Min Wang, Houqiang Li

    Abstract: Recently, there have been efforts to improve the performance in sign language recognition by designing self-supervised learning methods. However, these methods capture limited information from sign pose data in a frame-wise learning manner, leading to sub-optimal solutions. To this end, we propose a simple yet effective self-supervised contrastive learning framework to excavate rich context via sp… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted by TIP2023

  35. arXiv:2406.09884  [pdf, other

    cs.MM cs.CL cs.SI

    Enhancing Fake News Detection in Social Media via Label Propagation on Cross-modal Tweet Graph

    Authors: Wanqing Zhao, Yuta Nakashima, Haiyuan Chen, Noboru Babaguchi

    Abstract: Fake news detection in social media has become increasingly important due to the rapid proliferation of personal media channels and the consequential dissemination of misleading information. Existing methods, which primarily rely on multimodal features and graph-based techniques, have shown promising performance in detecting fake news. However, they still face a limitation, i.e., sparsity in graph… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 9 pages

  36. arXiv:2406.09812  [pdf, other

    cs.IR

    Soil nitrogen forecasting from environmental variables provided by multisensor remote sensing images

    Authors: Weiying Zhao, Ganzorig Chuluunbat, Aleksei Unagaev, Natalia Efremova

    Abstract: This study introduces a framework for forecasting soil nitrogen content, leveraging multi-modal data, including multi-sensor remote sensing images and advanced machine learning methods. We integrate the Land Use/Land Cover Area Frame Survey (LUCAS) database, which covers European and UK territory, with environmental variables from satellite sensors to create a dataset of novel features. We further… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  37. arXiv:2406.07094  [pdf, other

    cs.IR

    Grapevine Disease Prediction Using Climate Variables from Multi-Sensor Remote Sensing Imagery via a Transformer Model

    Authors: Weiying Zhao, Natalia Efremova

    Abstract: Early detection and management of grapevine diseases are important in pursuing sustainable viticulture. This paper introduces a novel framework leveraging the TabPFN model to forecast blockwise grapevine diseases using climate variables from multi-sensor remote sensing imagery. By integrating advanced machine learning techniques with detailed environmental data, our approach significantly enhances… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  38. arXiv:2406.07017  [pdf, other

    cs.LG cs.CL

    MoreauPruner: Robust Pruning of Large Language Models against Weight Perturbations

    Authors: Zixiao Wang, Jingwei Zhang, Wenqian Zhao, Farzan Farnia, Bei Yu

    Abstract: Few-shot gradient methods have been extensively utilized in existing model pruning methods, where the model weights are regarded as static values and the effects of potential weight perturbations are not considered. However, the widely used large language models (LLMs) have several billion model parameters, which could increase the fragility of few-shot gradient pruning. In this work, we experimen… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  39. arXiv:2406.06953  [pdf, other

    cs.CV

    Stepwise Regression and Pre-trained Edge for Robust Stereo Matching

    Authors: Weiqing Xiao, Wei Zhao

    Abstract: Due to the difficulty in obtaining real samples and ground truth, the generalization performance and the fine-tuned performance are critical for the feasibility of stereo matching methods in real-world applications. However, the presence of substantial disparity distributions and density variations across different datasets presents significant challenges for the generalization and fine-tuning of… ▽ More

    Submitted 16 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  40. arXiv:2406.04578  [pdf, other

    cs.CL

    SC2: Towards Enhancing Content Preservation and Style Consistency in Long Text Style Transfer

    Authors: Jie Zhao, Ziyu Guan, Cai Xu, Wei Zhao, Yue Jiang

    Abstract: Text style transfer (TST) aims to vary the style polarity of text while preserving the semantic content. Although recent advancements have demonstrated remarkable progress in short TST, it remains a relatively straightforward task with limited practical applications. The more comprehensive long TST task presents two challenges: (1) existing methods encounter difficulties in accurately evaluating c… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  41. arXiv:2406.03902  [pdf, other

    eess.IV cs.CV

    C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction

    Authors: Yiqun Lin, Jiewen Yang, Hualiang Wang, Xinpeng Ding, Wei Zhao, Xiaomeng Li

    Abstract: Cone beam computed tomography (CBCT) is an important imaging technology widely used in medical scenarios, such as diagnosis and preoperative planning. Using fewer projection views to reconstruct CT, also known as sparse-view reconstruction, can reduce ionizing radiation and further benefit interventional radiology. Compared with sparse-view reconstruction for traditional parallel/fan-beam CT, CBCT… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024

  42. arXiv:2406.03488  [pdf, other

    cs.DC

    Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training

    Authors: Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun

    Abstract: The emergence of large language models (LLMs) relies heavily on distributed training strategies, among which pipeline parallelism plays a crucial role. As LLMs' training sequence length extends to 32k or even 128k, the current pipeline parallel methods face severe bottlenecks, including high memory footprints and substantial pipeline bubbles, greatly hindering model scalability and training throug… ▽ More

    Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures, 6 tables

  43. arXiv:2406.03035  [pdf, other

    cs.CV

    Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control

    Authors: Jingyun Xue, Hongfa Wang, Qi Tian, Yue Ma, Andong Wang, Zhiyuan Zhao, Shaobo Min, Wenzhe Zhao, Kaihao Zhang, Heung-Yeung Shum, Wei Liu, Mengyang Liu, Wenhan Luo

    Abstract: Pose-controllable character video generation is in high demand with extensive applications for fields such as automatic advertising and content creation on social media platforms. While existing character image animation methods using pose sequences and reference images have shown promising performance, they tend to struggle with incoherent animation in complex scenarios, such as multiple characte… ▽ More

    Submitted 12 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  44. arXiv:2406.02438  [pdf, other

    eess.AS cs.MM cs.SD

    CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection

    Authors: Yongyi Zang, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu, Wenxiao Zhao, Jing Guo, Tomoki Toda, Zhiyao Duan

    Abstract: Recent singing voice synthesis and conversion advancements necessitate robust singing voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to limited controllability, diversity in deepfake methods, and licensing restrictions. Addressing these gaps, we introduce CtrSVDD, a large-scale, diverse collection of bonafide and deepfake singing vocals. These vocals are synthesi… ▽ More

    Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  45. arXiv:2406.02240  [pdf, other

    cs.NI

    Quantum Computing in Wireless Communications and Networking: A Tutorial-cum-Survey

    Authors: Wei Zhao, Tangjie Weng, Yue Ruan, Zhi Liu, Xuangou Wu, Xiao Zheng, Nei Kato

    Abstract: Owing to its outstanding parallel computing capabilities, quantum computing (QC) has been a subject of continuous attention. With the gradual maturation of QC platforms, it has increasingly played a significant role in various fields such as transportation, pharmaceuticals, and industrial manufacturing,achieving unprecedented milestones. In modern society, wireless communication stands as an indis… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  46. arXiv:2406.02166  [pdf, other

    cs.SD cs.CL eess.AS

    Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision

    Authors: Saierdaer Yusuyin, Te Ma, Hao Huang, Wenbo Zhao, Zhijian Ou

    Abstract: There exist three approaches for multilingual and crosslingual automatic speech recognition (MCL-ASR) - supervised pre-training with phonetic or graphemic transcription, and self-supervised pre-training. We find that pre-training with phonetic supervision has been underappreciated so far for MCL-ASR, while conceptually it is more advantageous for information sharing between different languages. Th… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  47. arXiv:2406.02013  [pdf, other

    cs.LG

    Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning

    Authors: Jiahang Cao, Qiang Zhang, Ziqing Wang, Jiaxu Wang, Hao Cheng, Yecheng Shao, Wen Zhao, Gang Han, Yijie Guo, Renjing Xu

    Abstract: Sequential modeling has demonstrated remarkable capabilities in offline reinforcement learning (RL), with Decision Transformer (DT) being one of the most notable representatives, achieving significant success. However, RL trajectories possess unique properties to be distinguished from the conventional sequence (e.g., text or audio): (1) local correlation, where the next states in RL are theoretica… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 16 pages, 5 figures

  48. arXiv:2406.01894  [pdf, other

    cs.CV

    SVASTIN: Sparse Video Adversarial Attack via Spatio-Temporal Invertible Neural Networks

    Authors: Yi Pan, Jun-Jie Huang, Zihan Chen, Wentao Zhao, Ziyue Wang

    Abstract: Robust and imperceptible adversarial video attack is challenging due to the spatial and temporal characteristics of videos. The existing video adversarial attack methods mainly take a gradient-based approach and generate adversarial videos with noticeable perturbations. In this paper, we propose a novel Sparse Adversarial Video Attack via Spatio-Temporal Invertible Neural Networks (SVASTIN) to gen… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  49. arXiv:2406.01326  [pdf, other

    cs.CV

    TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

    Authors: Weichao Zhao, Hao Feng, Qi Liu, Jingqun Tang, Shu Wei, Binghong Wu, Lei Liao, Yongjie Ye, Hao Liu, Houqiang Li, Can Huang

    Abstract: Tables contain factual and quantitative data accompanied by various structures and contents that pose challenges for machine comprehension. Previous methods generally design task-specific architectures and objectives for individual tasks, resulting in modal isolation and intricate workflows. In this paper, we present a novel large vision-language model, TabPedia, equipped with a concept synergy me… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 20 pages, 8 figures

  50. arXiv:2406.00656  [pdf, other

    cs.CL

    Presence or Absence: Are Unknown Word Usages in Dictionaries?

    Authors: Xianghe Ma, Dominik Schlechtweg, Wei Zhao

    Abstract: There has been a surge of interest in computational modeling of semantic change. The foci of previous works are on detecting and interpreting word senses gained over time; however, it remains unclear whether the gained senses are covered by dictionaries. In this work, we aim to fill this research gap by comparing detected word senses with dictionary sense inventories in order to bridge between the… ▽ More

    Submitted 4 July, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

    Comments: LChange24 Camera Ready