Showing 1–50 of 206 results for author: Guo, T

Searching in archive cs.
  1. arXiv:2407.13221  [pdf, other]

    cs.CV

    Multimodal Label Relevance Ranking via Reinforcement Learning

    Authors: Taian Guo, Taolin Zhang, Haoqian Wu, Hanjun Li, Ruizhi Qiao, Xing Sun

    Abstract: Conventional multi-label recognition methods often focus on label confidence, frequently overlooking the pivotal role of partial order relations consistent with human preference. To resolve these issues, we introduce a novel method for multimodal label relevance ranking, named Label Relevance Ranking with Proximal Policy Optimization (LR²PPO), which effectively discerns partial o…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV2024

  2. arXiv:2407.09057  [pdf, other]

    cs.CV

    PersonificationNet: Making customized subject act like a person

    Authors: Tianchu Guo, Pengyu Li, Biao Wang, Xiansheng Hua

    Abstract: Recently customized generation has significant potential, which uses as few as 3-5 user-provided images to train a model to synthesize new images of a specified subject. Though subsequent applications enhance the flexibility and diversity of customized generation, fine-grained control over the given subject acting like the person's pose is still lack of study. In this paper, we propose a Personifi…

    Submitted 12 July, 2024; originally announced July 2024.

  3. arXiv:2407.02301  [pdf, other]

    cs.CL

    CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models

    Authors: Ying Nie, Binwei Yan, Tianyu Guo, Hao Liu, Haoyu Wang, Wei He, Binfan Zheng, Weihao Wang, Qiang Li, Weijian Sun, Yunhe Wang, Dacheng Tao

    Abstract: Large language models (LLMs) have achieved remarkable performance on various NLP tasks, yet their potential in more challenging and domain-specific task, such as finance, has not been fully explored. In this paper, we present CFinBench: a meticulously crafted, the most comprehensive evaluation benchmark to date, for assessing the financial knowledge of LLMs under Chinese context. In practice, to b…

    Submitted 2 July, 2024; originally announced July 2024.

  4. arXiv:2407.00468  [pdf, other]

    cs.CV cs.AI cs.CL

    MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation

    Authors: Jinsheng Huang, Liang Chen, Taian Guo, Fu Zeng, Yusheng Zhao, Bohan Wu, Ye Yuan, Haozhe Zhao, Zhihui Guo, Yichi Zhang, Jingyang Yuan, Wei Ju, Luchen Liu, Tianyu Liu, Baobao Chang, Ming Zhang

    Abstract: Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for such evaluations suffer from systematic biases. Remarkably, Large Language Models (LLMs) without any visual perception capabilities achieve non-trivial p…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 21 pages, code released at https://github.com/chenllliang/MMEvalPro, Homepage at https://mmevalpro.github.io/

  5. arXiv:2407.00466  [pdf, other]

    cs.CL cs.AI

    BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science

    Authors: Xinna Lin, Siqi Ma, Junjie Shan, Xiaojing Zhang, Shell Xu Hu, Tiannan Guo, Stan Z. Li, Kaicheng Yu

    Abstract: Pursuing artificial intelligence for biomedical science, a.k.a. AI Scientist, draws increasing attention, where one common approach is to build a copilot agent driven by Large Language Models (LLMs). However, to evaluate such systems, people either rely on direct Question-Answering (QA) to the LLM itself, or in a biomedical experimental manner. How to precisely benchmark biomedical agents from an…

    Submitted 29 June, 2024; originally announced July 2024.

  6. arXiv:2406.17431  [pdf, other]

    cs.SE

    A Large-scale Investigation of Semantically Incompatible APIs behind Compatibility Issues in Android Apps

    Authors: Shidong Pan, Tianchen Guo, Lihong Zhang, Pei Liu, Zhenchang Xing, Xiaoyu Sun

    Abstract: Application Programming Interface (API) incompatibility is a long-standing issue in Android application development. The rapid evolution of Android APIs results in a significant number of API additions, removals, and changes between adjacent versions. Unfortunately, this high frequency of alterations may lead to compatibility issues, often without adequate notification to developers regarding thes…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  7. arXiv:2406.01451  [pdf, other]

    cs.CV cs.MM

    SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation

    Authors: Danni Yang, Jiayi Ji, Yiwei Ma, Tianyu Guo, Haowei Wang, Xiaoshuai Sun, Rongrong Ji

    Abstract: In this paper, we introduce SemiRES, a semi-supervised framework that effectively leverages a combination of labeled and unlabeled data to perform RES. A significant hurdle in applying semi-supervised techniques to RES is the prevalence of noisy pseudo-labels, particularly at the boundaries of objects. SemiRES incorporates the Segment Anything Model (SAM), renowned for its precise boundary demarca…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML2024

  8. arXiv:2406.01414  [pdf, other]

    cs.LG eess.SP

    CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework

    Authors: Yiyang Zhao, Yunzhuo Liu, Bo Jiang, Tian Guo

    Abstract: This work presents a novel approach to neural architecture search (NAS) that aims to increase carbon efficiency for the model design process. The proposed framework CE-NAS addresses the key challenge of high carbon cost associated with NAS by exploring the carbon emission variations of energy and energy differences of different NAS algorithms. At the high level, CE-NAS leverages a reinforcement-le…

    Submitted 17 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2307.04131

  9. arXiv:2406.00291  [pdf, other]

    cs.LG cs.AI

    Multi-Objective Neural Architecture Search by Learning Search Space Partitions

    Authors: Yiyang Zhao, Linnan Wang, Tian Guo

    Abstract: Deploying deep learning models requires taking into consideration neural network metrics such as model size, inference latency, and #FLOPs, aside from inference accuracy. This results in deep learning model designers leveraging multi-objective optimization to design effective deep neural networks in multiple criteria. However, applying multi-objective optimizations to neural architecture search (N…

    Submitted 17 July, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

    Journal ref: Journal of Machine Learning Research 25 (2024) 1-41

  10. arXiv:2405.00437  [pdf, other]

    cs.CE

    Reduced-order modeling for second-order computational homogenization with applications to geometrically parameterized elastomeric metamaterials

    Authors: T. Guo, V. G. Kouznetsova, M. G. D. Geers, K. Veroy, O. Rokoš

    Abstract: The structural properties of mechanical metamaterials are typically studied with two-scale methods based on computational homogenization. Because such materials have a complex microstructure, enriched schemes such as second-order computational homogenization are required to fully capture their non-linear behavior, which arises from non-local interactions due to the buckling or patterning of the mi…

    Submitted 1 May, 2024; originally announced May 2024.

  11. arXiv:2404.19383  [pdf, other]

    cs.CV

    Cross-Block Fine-Grained Semantic Cascade for Skeleton-Based Sports Action Recognition

    Authors: Zhendong Liu, Haifeng Xia, Tong Guo, Libo Sun, Ming Shao, Siyu Xia

    Abstract: Human action video recognition has recently attracted more attention in applications such as video security and sports posture correction. Popular solutions, including graph convolutional networks (GCNs) that model the human skeleton as a spatiotemporal graph, have proven very effective. GCNs-based methods with stacked blocks usually utilize top-layer semantics for classification/annotation purpos…

    Submitted 30 April, 2024; originally announced April 2024.

  12. arXiv:2404.15746  [pdf, other]

    stat.ML cs.CR cs.LG

    Collaborative Heterogeneous Causal Inference Beyond Meta-analysis

    Authors: Tianyu Guo, Sai Praneeth Karimireddy, Michael I. Jordan

    Abstract: Collaboration between different data centers is often challenged by heterogeneity across sites. To account for the heterogeneity, the state-of-the-art method is to re-weight the covariate distributions in each site to match the distribution of the target population. Nevertheless, this method could easily fail when a certain site couldn't cover the entire population. Moreover, it still relies on th…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: submitted to ICML

  13. arXiv:2404.05105  [pdf, other]

    cs.CV

    VMambaMorph: a Multi-Modality Deformable Image Registration Framework based on Visual State Space Model with Cross-Scan Module

    Authors: Ziyang Wang, Jian-Qing Zheng, Chao Ma, Tao Guo

    Abstract: Image registration, a critical process in medical imaging, involves aligning different sets of medical imaging data into a single unified coordinate system. Deep learning networks, such as the Convolutional Neural Network (CNN)-based VoxelMorph, Vision Transformer (ViT)-based TransMorph, and State Space Model (SSM)-based MambaMorph, have demonstrated effective performance in this domain. The recen…

    Submitted 14 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

  14. arXiv:2404.00924  [pdf, other]

    cs.CV

    BadPart: Unified Black-box Adversarial Patch Attacks against Pixel-wise Regression Tasks

    Authors: Zhiyuan Cheng, Zhaoyi Liu, Tengda Guo, Shiwei Feng, Dongfang Liu, Mingjie Tang, Xiangyu Zhang

    Abstract: Pixel-wise regression tasks (e.g., monocular depth estimation (MDE) and optical flow estimation (OFE)) have been widely involved in our daily life in applications like autonomous driving, augmented reality and video composition. Although certain applications are security-critical or bear societal significance, the adversarial robustness of such models are not sufficiently studied, especially in th…

    Submitted 24 May, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Paper accepted at ICML 2024

  15. arXiv:2403.10825  [pdf, other]

    cs.CV

    Affective Behaviour Analysis via Integrating Multi-Modal Knowledge

    Authors: Wei Zhang, Feng Qiu, Chen Liu, Lincheng Li, Heming Du, Tiancheng Guo, Xin Yu

    Abstract: Affective Behavior Analysis aims to facilitate technology emotionally smart, creating a world where devices can understand and react to our emotions as humans do. To comprehensively evaluate the authenticity and applicability of emotional behavior analysis techniques in natural environments, the 6th competition on Affective Behavior Analysis in-the-wild (ABAW) utilizes the Aff-Wild2, Hume-Vidmimic…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: 11 pages, 1 figure

  16. arXiv:2403.07608  [pdf, other]

    cs.DB cs.AI cs.LG

    Couler: Unified Machine Learning Workflow Optimization in Cloud

    Authors: Xiaoda Wang, Yuan Tang, Tengda Guo, Bo Sang, Jingji Wu, Jian Sha, Ke Zhang, Jiang Qian, Mingjie Tang

    Abstract: Machine Learning (ML) has become ubiquitous, fueling data-driven applications across various organizations. Contrary to the traditional perception of ML in research, ML workflows can be complex, resource-intensive, and time-consuming. Expanding an ML workflow to encompass a wider range of data infrastructure and data types may lead to larger workloads and increased deployment costs. Currently, num…

    Submitted 12 March, 2024; originally announced March 2024.

  17. arXiv:2403.06243  [pdf, other]

    cs.CV

    BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering

    Authors: Xinmin Qiu, Congying Han, Zicheng Zhang, Bonan Li, Tiande Guo, Pingyu Wang, Xuecheng Nie

    Abstract: Developing blind video deflickering (BVD) algorithms to enhance video temporal consistency, is gaining importance amid the flourish of image processing and video generation. However, the intricate nature of video data complicates the training of deep learning methods, leading to high resource consumption and instability, notably under severe lighting flicker. This underscores the critical need for…

    Submitted 10 March, 2024; originally announced March 2024.

  18. arXiv:2403.01960  [pdf, other]

    cs.SD eess.AS

    A robust audio deepfake detection system via multi-view feature

    Authors: Yujie Yang, Haochen Qin, Hang Zhou, Chengcheng Wang, Tianyu Guo, Kai Han, Yunhe Wang

    Abstract: With the advancement of generative modeling techniques, synthetic human speech becomes increasingly indistinguishable from real, and tricky challenges are elicited for the audio deepfake detection (ADD) system. In this paper, we exploit audio features to improve the generalizability of ADD systems. Investigation of the ADD task performance is conducted over a broad range of audio features, includi…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 5 pages, 2 figures

  19. arXiv:2403.00818  [pdf, other]

    cs.CL cs.LG

    DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models

    Authors: Wei He, Kai Han, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo, Yunhe Wang

    Abstract: Large language models (LLMs) face a daunting challenge due to the excessive computational and memory requirements of the commonly used Transformer architecture. While state space model (SSM) is a new type of foundational network architecture offering lower computational complexity, their performance has yet to fully rival that of Transformers. This paper introduces DenseSSM, a novel approach to en…

    Submitted 5 March, 2024; v1 submitted 26 February, 2024; originally announced March 2024.

  20. arXiv:2402.18679  [pdf, other]

    cs.AI cs.LG

    Data Interpreter: An LLM Agent For Data Science

    Authors: Sirui Hong, Yizhang Lin, Bang Liu, Bangbang Liu, Binhao Wu, Danyang Li, Jiaqi Chen, Jiayi Zhang, Jinlin Wang, Li Zhang, Lingyao Zhang, Min Yang, Mingchen Zhuge, Taicheng Guo, Tuo Zhou, Wei Tao, Wenyi Wang, Xiangru Tang, Xiangtao Lu, Xiawu Zheng, Xinbing Liang, Yaying Fei, Yuheng Cheng, Zongze Xu, Chenglin Wu

    Abstract: Large Language Model (LLM)-based agents have demonstrated remarkable effectiveness. However, their performance can be compromised in data science scenarios that require real-time data adjustment, expertise in optimization due to complex dependencies among various tasks, and the ability to identify logical errors for precise reasoning. In this study, we introduce the Data Interpreter, a solution de…

    Submitted 12 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  21. arXiv:2402.17487  [pdf, other]

    cs.CV cs.LG eess.IV

    Bit Rate Matching Algorithm Optimization in JPEG-AI Verification Model

    Authors: Panqi Jia, A. Burakhan Koyuncu, Jue Mao, Ze Cui, Yi Ma, Tiansheng Guo, Timofey Solovyev, Alexander Karabutov, Yin Zhao, Jing Wang, Elena Alshina, Andre Kaup

    Abstract: The research on neural network (NN) based image compression has shown superior performance compared to classical compression frameworks. Unlike the hand-engineered transforms in the classical frameworks, NN-based models learn the non-linear transforms providing more compact bit representations, and achieve faster coding speed on parallel devices over their classical counterparts. Those properties…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted at (IEEE) PCS 2024; 6 pages

  22. arXiv:2402.17364  [pdf, other]

    cs.CV

    Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis

    Authors: Zicheng Zhang, Ruobing Zheng, Ziwen Liu, Congying Han, Tianqi Li, Meng Wang, Tiande Guo, Jingdong Chen, Bonan Li, Ming Yang

    Abstract: Recent works in implicit representations, such as Neural Radiance Fields (NeRF), have advanced the generation of realistic and animatable head avatars from video sequences. These implicit methods are still confronted by visual artifacts and jitters, since the lack of explicit geometric constraints poses a fundamental challenge in accurately modeling complex facial deformations. In this paper, we i…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: CVPR 2024

  23. arXiv:2402.15326  [pdf, other]

    cs.LG

    Understanding Oversmoothing in Diffusion-Based GNNs From the Perspective of Operator Semigroup Theory

    Authors: Weichen Zhao, Chenguang Wang, Xinyan Wang, Congying Han, Tiande Guo, Tianshu Yu

    Abstract: This paper presents a novel study of the oversmoothing issue in diffusion-based Graph Neural Networks (GNNs). Diverging from extant approaches grounded in random walk analysis or particle systems, we approach this problem through operator semigroup theory. This theoretical framework allows us to rigorously prove that oversmoothing is intrinsically linked to the ergodicity of the diffusion operator…

    Submitted 23 February, 2024; originally announced February 2024.

  24. arXiv:2402.14359  [pdf, other]

    cs.CL

    Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark

    Authors: Xiuying Chen, Tairan Wang, Qingqing Zhu, Taicheng Guo, Shen Gao, Zhiyong Lu, Xin Gao, Xiangliang Zhang

    Abstract: The summarization capabilities of pretrained and large language models (LLMs) have been widely validated in general areas, but their use in scientific corpus, which involves complex sentences and specialized knowledge, has been less assessed. This paper presents conceptual and experimental analyses of scientific summarization, highlighting the inadequacies of traditional evaluation methods, such a…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 14 pages

  25. arXiv:2402.11768  [pdf, other]

    cs.RO

    Targeted Parallelization of Conflict-Based Search for Multi-Robot Path Planning

    Authors: Teng Guo, Jingjin Yu

    Abstract: Multi-Robot Path Planning (MRPP) on graphs, equivalently known as Multi-Agent Path Finding (MAPF), is a well-established NP-hard problem with critically important applications. As serial computation in (near)-optimally solving MRPP approaches the computation efficiency limit, parallelization offers a promising route to push the limit further, especially in handling hard or large MRPP instances. In…

    Submitted 15 March, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: Submitted to IROS

  26. arXiv:2402.11767  [pdf, other]

    cs.RO

    Decentralized Lifelong Path Planning for Multiple Ackerman Car-Like Robots

    Authors: Teng Guo, Jingjin Yu

    Abstract: Path planning for multiple non-holonomic robots in continuous domains constitutes a difficult robotics challenge with many applications. Despite significant recent progress on the topic, computationally efficient and high-quality solutions are lacking, especially in lifelong settings where robots must continuously take on new tasks. In this work, we make it possible to extend key ideas enabling st…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: ICRA 2024

  27. arXiv:2402.11766  [pdf, other]

    cs.RO

    Well-Connected Set and Its Application to Multi-Robot Path Planning

    Authors: Teng Guo, Jingjin Yu

    Abstract: Parking lots and autonomous warehouses for accommodating many vehicles/robots adopt designs in which the underlying graphs are well-connected to simplify planning and reduce congestion. In this study, we formulate and delve into the largest well-connected set (LWCS) problem and explore its applications in layout design for multi-robot path planning. Roughly speaking, a well-connected…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: ICRA 2024

  28. arXiv:2402.05138  [pdf, other]

    cs.AI cs.CL

    SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark

    Authors: Zhenwen Liang, Kehan Guo, Gang Liu, Taicheng Guo, Yujun Zhou, Tianyu Yang, Jiajun Jiao, Renjie Pi, Jipeng Zhang, Xiangliang Zhang

    Abstract: The paper introduces SceMQA, a novel benchmark for scientific multimodal question answering at the college entrance level. It addresses a critical educational phase often overlooked in existing benchmarks, spanning high school to pre-college levels. SceMQA focuses on core science subjects including Mathematics, Physics, Chemistry, and Biology. It features a blend of multiple-choice and free-respon…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Work in progress

  29. arXiv:2402.02165  [pdf, other]

    cs.LG

    Towards Optimal Adversarial Robust Q-learning with Bellman Infinity-error

    Authors: Haoran Li, Zicheng Zhang, Wang Luo, Congying Han, Yudong Hu, Tiande Guo, Shichen Liao

    Abstract: Establishing robust policies is essential to counter attacks or disturbances affecting deep reinforcement learning (DRL) agents. Recent studies explore state-adversarial robustness and suggest the potential lack of an optimal robust policy (ORP), posing challenges in setting strict robustness constraints. This work further investigates ORP: At first, we introduce a consistency assumption of policy…

    Submitted 19 May, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Journal ref: ICML 2024 Oral

  30. arXiv:2402.01680  [pdf, other]

    cs.CL cs.AI cs.MA

    Large Language Model based Multi-Agents: A Survey of Progress and Challenges

    Authors: Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, Xiangliang Zhang

    Abstract: Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks. Due to the impressive planning and reasoning abilities of LLMs, they have been used as autonomous agents to do many tasks automatically. Recently, based on the development of using one LLM as a single planning or decision-making agent, LLM-based multi-agent systems have achieved considerable progress in com…

    Submitted 18 April, 2024; v1 submitted 21 January, 2024; originally announced February 2024.

    Comments: This work is ongoing and we welcome your contribution!

  31. arXiv:2401.14723  [pdf, ps, other]

    cs.IT

    Sliding Secure Symmetric Multilevel Diversity Coding

    Authors: Tao Guo, Laigang Guo, Yinfei Xu, Congduan Li, Shi Jin, Raymond Yeung

    Abstract: Symmetric multilevel diversity coding (SMDC) is a source coding problem where the independent sources are ordered according to their importance. It was shown that separately encoding independent sources (referred to as "superposition coding") is optimal. In this paper, we consider an $(L,s)$ sliding secure SMDC problem with security priority, where each source…

    Submitted 26 January, 2024; originally announced January 2024.

  32. arXiv:2401.13934  [pdf, other]

    cs.CV

    MambaMorph: a Mamba-based Framework for Medical MR-CT Deformable Registration

    Authors: Tao Guo, Yinuo Wang, Shihao Shu, Diansheng Chen, Zhouping Tang, Cai Meng, Xiangzhi Bai

    Abstract: Capturing voxel-wise spatial correspondence across distinct modalities is crucial for medical image analysis. However, current registration approaches are not practical enough in terms of registration accuracy and clinical applicability. In this paper, we introduce MambaMorph, a novel multi-modality deformable registration framework. Specifically, MambaMorph utilizes a Mamba-based registration mod…

    Submitted 12 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  33. arXiv:2401.06327  [pdf, other]

    cs.CL

    Learning from Semi-Factuals: A Debiased and Semantic-Aware Framework for Generalized Relation Discovery

    Authors: Jiaxin Wang, Lingling Zhang, Jun Liu, Tianlin Guo, Wenjun Wu

    Abstract: We introduce a novel task, called Generalized Relation Discovery (GRD), for open-world relation extraction. GRD aims to identify unlabeled instances in existing pre-defined relations or discover novel relations by assigning instances to clusters as well as providing specific meanings for these clusters. The key challenges of GRD are how to mitigate the serious model biases caused by labeled pre-de…

    Submitted 11 January, 2024; originally announced January 2024.

  34. arXiv:2312.17276  [pdf, other]

    cs.CL cs.LG

    PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation

    Authors: Yunhe Wang, Hanting Chen, Yehui Tang, Tianyu Guo, Kai Han, Ying Nie, Xutao Wang, Hailin Hu, Zheyuan Bai, Yun Wang, Fangcheng Liu, Zhicheng Liu, Jianyuan Guo, Sinan Zeng, Yinchen Zhang, Qinghua Xu, Qun Liu, Jun Yao, Chao Xu, Dacheng Tao

    Abstract: The recent trend of large language models (LLMs) is to increase the scale of both model size (a.k.a. the number of parameters) and dataset to achieve better generative ability, which is definitely proved by a lot of work such as the famous GPT and Llama. However, large models often involve massive computational costs, and practical applications cannot afford such high prices. However, the method of…

    Submitted 27 December, 2023; originally announced December 2023.

  35. arXiv:2312.16423  [pdf, other]

    cs.AI math.OC

    General Method for Solving Four Types of SAT Problems

    Authors: Anqi Li, Congying Han, Tiande Guo, Haoran Li, Bonan Li

    Abstract: Existing methods provide varying algorithms for different types of Boolean satisfiability problems (SAT), lacking a general solution framework. Accordingly, this study proposes a unified framework DCSAT based on integer programming and reinforcement learning (RL) algorithm to solve different types of SAT problems such as MaxSAT, Weighted MaxSAT, PMS, WPMS. Specifically, we first construct a consol…

    Submitted 27 December, 2023; originally announced December 2023.

    Comments: 34 pages

  36. arXiv:2312.15880  [pdf, other]

    cs.CL

    KnowledgeNavigator: Leveraging Large Language Models for Enhanced Reasoning over Knowledge Graph

    Authors: Tiezheng Guo, Qingwen Yang, Chen Wang, Yanyi Liu, Pan Li, Jiawei Tang, Dapeng Li, Yingyou Wen

    Abstract: Large language model (LLM) has achieved outstanding performance on various downstream tasks with its powerful natural language understanding and zero-shot capability, but LLM still suffers from knowledge limitation. Especially in scenarios that require long logical chains or complex reasoning, the hallucination and knowledge limitation of LLM limit its performance in question answering (QA). In th…

    Submitted 19 January, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

  37. arXiv:2312.15412  [pdf, other]

    cs.LG cs.MA

    CARSS: Cooperative Attention-guided Reinforcement Subpath Synthesis for Solving Traveling Salesman Problem

    Authors: Yuchen Shi, Congying Han, Tiande Guo

    Abstract: This paper introduces CARSS (Cooperative Attention-guided Reinforcement Subpath Synthesis), a novel approach to address the Traveling Salesman Problem (TSP) by leveraging cooperative Multi-Agent Reinforcement Learning (MARL). CARSS decomposes the TSP solving process into two distinct yet synergistic steps: "subpath generation" and "subpath merging." In the former, a cooperative MARL framework is e…

    Submitted 24 December, 2023; originally announced December 2023.

  38. Two-Stage Adaptive Network for Semi-Supervised Cross-Domain Crater Detection under Varying Scenario Distributions

    Authors: Yifan Liu, Tiecheng Song, Chengye Xian, Ruiyuan Chen, Yi Zhao, Rui Li, Tan Guo

    Abstract: Crater detection can provide valuable information for humans to explore the topography and understand the history of extraterrestrial planets. Due to the significantly varying scenario distributions, existing detection models trained on known labelled crater datasets are hardly effective when applied to new unlabelled planets. To address this issue, we propose a two-stage adaptive network (TAN) fo…

    Submitted 10 June, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Journal ref: Liu, Y.; Song, T.; Xian, C.; Chen, R.; Zhao, Y.; Li, R.; Guo, T. Two-Stage Adaptive Network for Semi-Supervised Cross-Domain Crater Detection under Varying Scenario Distributions. Remote Sens. 2024, 16, 2024

  39. arXiv:2312.06049  [pdf, other]

    cs.CV

    SSPNet: Scale and Spatial Priors Guided Generalizable and Interpretable Pedestrian Attribute Recognition

    Authors: Jifeng Shen, Teng Guo, Xin Zuo, Heng Fan, Wankou Yang

    Abstract: Global feature based Pedestrian Attribute Recognition (PAR) models are often poorly localized when using Grad-CAM for attribute response analysis, which has a significant impact on the interpretability, generalizability and performance. Previous researches have attempted to improve generalization and interpretation through meticulous model design, yet they often have neglected or underutilized eff…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: 39 pages, 11 figures, Accepted by Pattern Recognition

  40. arXiv:2312.04141  [pdf, ps, other]

    cs.IT

    Distributed Approximate Computing with Constant Locality

    Authors: Deheng Yuan, Tao Guo, Zhongyi Huang, Shi Jin

    Abstract: Consider a distributed coding for computing problem with constant decoding locality, i.e., with a vanishing error probability, any single sample of the function can be approximately recovered by probing only constant number of compressed bits. We establish an achievable rate region by designing an efficient layered coding scheme, where the coding rate is reduced by introducing auxiliary random var…

    Submitted 29 February, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

  41. arXiv:2312.03441  [pdf, other]

    cs.CV

    UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity

    Authors: Jialong Zuo, Hanyu Zhou, Ying Nie, Feng Zhang, Tianyu Guo, Nong Sang, Yunhe Wang, Changxin Gao

    Abstract: Existing text-based person retrieval datasets often have relatively coarse-grained text annotations. This hinders the model to comprehend the fine-grained semantics of query texts in real scenarios. To address this problem, we contribute a new benchmark named UFineBench for text-based person retrieval with ultra-fine granularity. Firstly, we construct a new dataset named UFine6…

    Submitted 6 June, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

  42. arXiv:2312.00674  [pdf, other]

    cs.CV

    LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models

    Authors: Ying Nie, Wei He, Kai Han, Yehui Tang, Tianyu Guo, Fanyi Du, Yunhe Wang

    Abstract: Vision-language pre-training like CLIP has shown promising performance on various downstream tasks such as zero-shot image classification and image-text retrieval. Most of the existing CLIP-alike works usually adopt relatively large image encoders like ResNet50 and ViT, while the lightweight counterparts are rarely discussed. In this paper, we propose a multi-level interaction paradigm for trainin…

    Submitted 1 December, 2023; originally announced December 2023.

  43. arXiv:2311.17493  [pdf, other]

    cs.CV

    Towards Higher Ranks via Adversarial Weight Pruning

    Authors: Yuchuan Tian, Hanting Chen, Tianyu Guo, Chao Xu, Yunhe Wang

    Abstract: Convolutional Neural Networks (CNNs) are hard to deploy on edge devices due to their high computation and storage complexities. As a common practice for model compression, network pruning consists of two major categories: unstructured and structured pruning, where unstructured pruning constantly performs better. However, unstructured pruning presents a structured pattern at high pruning rates, which…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023 Accepted

  44. arXiv:2311.01689  [pdf, other]

    cs.CL cs.AI

    Data-Free Distillation of Language Model by Text-to-Text Transfer

    Authors: Zheyuan Bai, Xinduo Liu, Hailin Hu, Tianyu Guo, Qinghua Zhang, Yunhe Wang

    Abstract: Data-Free Knowledge Distillation (DFKD) plays a vital role in compressing the model when original training data is unavailable. Previous works for DFKD in NLP mainly focus on distilling encoder-only structures like BERT on classification tasks, which overlook the notable progress of generative language modeling. In this work, we propose a novel DFKD framework, namely DFKD-T$^{3}$, where the pretra…

    Submitted 2 November, 2023; originally announced November 2023.

  45. arXiv:2311.00241  [pdf, other]

    cs.CV

    1DFormer: a Transformer Architecture Learning 1D Landmark Representations for Facial Landmark Tracking

    Authors: Shi Yin, Shijie Huan, Shangfei Wang, Jinshui Hu, Tao Guo, Bing Yin, Baocai Yin, Cong Liu

    Abstract: Recently, heatmap regression methods based on 1D landmark representations have shown prominent performance on locating facial landmarks. However, previous methods have not deeply explored the potential of 1D landmark representations for sequential and structural modeling of multiple landmarks to track facial landmarks. To address this limitation, we propose a Transformer architec…

    Submitted 1 February, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

  46. arXiv:2310.17753  [pdf, other]

    cs.RO

    Bin Assignment and Decentralized Path Planning for Multi-Robot Parcel Sorting

    Authors: Teng Guo, Jingjin Yu

    Abstract: At modern warehouses, mobile robots transport packages and drop them into collection bins/chutes based on shipping destinations grouped by, e.g., the ZIP code. System throughput, measured as the number of packages sorted per unit of time, determines the efficiency of the warehouse. This research develops a scalable, high-throughput multi-robot parcel sorting solution, decomposing the task into two…

    Submitted 26 October, 2023; originally announced October 2023.

  47. arXiv:2310.14437  [pdf, other]

    cs.CV

    Mobile AR Depth Estimation: Challenges & Prospects -- Extended Version

    Authors: Ashkan Ganj, Yiqin Zhao, Hang Su, Tian Guo

    Abstract: Metric depth estimation plays an important role in mobile augmented reality (AR). With accurate metric depth, we can achieve more realistic user interactions such as object placement and occlusion detection. While specialized hardware like LiDAR demonstrates its promise, its restricted availability, i.e., only on selected high-end mobile devices, and performance limitations such as range and sensi…

    Submitted 22 October, 2023; originally announced October 2023.

  48. arXiv:2310.10975  [pdf, other]

    cs.CV

    NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning

    Authors: Haowei Wang, Jiayi Ji, Tianyu Guo, Yilong Yang, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji

    Abstract: Panoptic Narrative Detection (PND) and Segmentation (PNS) are two challenging tasks that involve identifying and locating multiple targets in an image according to a long narrative description. In this paper, we propose a unified and effective framework called NICE that can jointly learn these two panoptic narrative recognition tasks. Existing visual grounding tasks use a two-branch paradigm, but…

    Submitted 23 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: 18 pages, 9 figures, 9 tables

  49. arXiv:2310.10821  [pdf, other]

    cs.DC

    Get-A-Sense: Designing Spatial Context Awareness for Mobile AR Environment Understanding

    Authors: Yiqin Zhao, Ashkan Ganj, Tian Guo

    Abstract: Physical environment understanding is vital in delivering immersive and interactive mobile augmented reality (AR) user experiences. Recently, we have witnessed a transition in the design of environment understanding systems, from a focus on visual data to a focus on the concept of spatial context, including user, device, and environment information. Even though spatial context can benefit the envir…

    Submitted 16 October, 2023; originally announced October 2023.

  50. arXiv:2310.10616  [pdf, other]

    cs.LG

    How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations

    Authors: Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai

    Abstract: While large language models based on the transformer architecture have demonstrated remarkable in-context learning (ICL) capabilities, understanding of such capabilities is still at an early stage, where existing theory and mechanistic understanding focus mostly on simple scenarios such as learning simple function classes. This paper takes initial steps toward understanding ICL in more complex scena…

    Submitted 16 October, 2023; originally announced October 2023.