Showing 1–50 of 159 results for author: Qi, G

  1. arXiv:2408.03695  [pdf, other]

    cs.CV

    Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling

    Authors: Zilyu Ye, Jinxiu Liu, Ruotian Peng, Jinjin Cao, Zhiyang Chen, Yiyang Zhang, Ziwei Xuan, Mingyuan Zhou, Xiaoqian Shen, Mohamed Elhoseiny, Qi Liu, Guo-Jun Qi

    Abstract: Recent image generation models excel at creating high-quality images from brief captions. However, they fail to maintain consistency of multiple instances across images when encountering lengthy contexts. This inconsistency is largely due to the absence of granular instance feature labeling in existing training datasets. To tackle these issues, we introduce Openstory+…

    Submitted 7 August, 2024; originally announced August 2024.

  2. arXiv:2408.00803  [pdf, other]

    cs.SE cs.AI cs.CE

    A Comprehensive Survey on Root Cause Analysis in (Micro) Services: Methodologies, Challenges, and Trends

    Authors: Tingting Wang, Guilin Qi

    Abstract: The complex dependencies and propagative faults inherent in microservices, characterized by a dense network of interconnected services, pose significant challenges in identifying the underlying causes of issues. Prompt identification and resolution of disruptive problems are crucial to ensure rapid recovery and maintain system stability. Numerous methodologies have emerged to address this challeng…

    Submitted 23 July, 2024; originally announced August 2024.

  3. arXiv:2406.18957  [pdf, other]

    cs.DC cs.GT

    A Treatment of EIP-1559: Enhancing Transaction Fee Mechanism through Nth-Price Auction

    Authors: Kun Li, Guangpeng Qi, Guangyong Shang, Wanli Deng, Minghui Xu, Xiuzhen Cheng

    Abstract: With the widespread adoption of blockchain technology, the transaction fee mechanism (TFM) in blockchain systems has become a prominent research topic. An ideal TFM should satisfy user incentive compatibility (UIC), miner incentive compatibility (MIC), and miner-user side contract proofness ($c$-SCP). However, state-of-the-art works either fail to meet these three properties simultaneously or only…

    Submitted 27 June, 2024; originally announced June 2024.

  4. arXiv:2406.17532  [pdf, other]

    cs.AI cs.CL cs.LO

    Can Large Language Models Understand DL-Lite Ontologies? An Empirical Study

    Authors: Keyu Wang, Guilin Qi, Jiaqi Li, Songlin Zhai

    Abstract: Large language models (LLMs) have shown significant achievements in solving a wide range of tasks. Recently, LLMs' capability to store, retrieve and infer with symbolic knowledge has drawn a great deal of attention, showing their potential to understand structured information. However, it is not yet known whether LLMs can understand Description Logic (DL) ontologies. In this work, we empirically a…

    Submitted 25 June, 2024; originally announced June 2024.

  5. arXiv:2405.18700  [pdf, other]

    cs.CV

    Multi-Condition Latent Diffusion Network for Scene-Aware Neural Human Motion Prediction

    Authors: Xuehao Gao, Yang Yang, Yang Wu, Shaoyi Du, Guo-Jun Qi

    Abstract: Inferring 3D human motion is fundamental in many applications, including understanding human activity and analyzing one's intention. While many fruitful efforts have been made toward human motion prediction, most approaches focus on pose-driven prediction and inferring human motion in isolation from the contextual environment, thus leaving the body location movement in the scene behind. However, real-…

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Transactions on Image Processing

  6. arXiv:2405.18483  [pdf, other]

    cs.CV

    Towards Open Domain Text-Driven Synthesis of Multi-Person Motions

    Authors: Mengyi Shan, Lu Dong, Yutao Han, Yuan Yao, Tao Liu, Ifeoma Nwogu, Guo-Jun Qi, Mitch Hill

    Abstract: This work aims to generate natural and diverse group motions of multiple humans from textual descriptions. While single-person text-to-motion generation is extensively studied, it remains challenging to synthesize motions for more than one or two subjects from in-the-wild prompts, mainly due to the lack of available datasets. In this work, we curate human pose and motion datasets by estimating pos…

    Submitted 15 July, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: ECCV 2024. Project page: https://shanmy.github.io/Multi-Motion/

  7. arXiv:2405.12523  [pdf, other]

    cs.CV cs.AI

    Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models

    Authors: Jiaqi Li, Qianshan Wei, Chuanyi Zhang, Guilin Qi, Miaozeng Du, Yongrui Chen, Sheng Bi

    Abstract: Machine unlearning (MU) empowers individuals with the 'right to be forgotten' by removing their private or sensitive information encoded in machine learning models. However, it remains uncertain whether MU can be effectively applied to Multimodal Large Language Models (MLLMs), particularly in scenarios of forgetting the leaked visual data of concepts. To overcome the challenge, we propose an efficient…

    Submitted 29 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  8. arXiv:2404.13680  [pdf, other]

    cs.CV cs.AI

    Zero-shot High-fidelity and Pose-controllable Character Animation

    Authors: Bingwen Zhu, Fanyi Wang, Tianyi Lu, Peng Liu, Jingwen Su, Jinxiu Liu, Yanhao Zhang, Zuxuan Wu, Guo-Jun Qi, Yu-Gang Jiang

    Abstract: Image-to-video (I2V) generation aims to create a video sequence from a single image, which requires high temporal coherence and visual fidelity. However, existing approaches suffer from inconsistency of character appearances and poor preservation of fine details. Moreover, they require a large amount of video data for training, which can be computationally demanding. To address these limitations,…

    Submitted 5 June, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: 10 pages, 5 figures

  9. arXiv:2404.13289  [pdf, other]

    cs.CL cs.MM cs.SD eess.AS

    Double Mixture: Towards Continual Event Detection from Speech

    Authors: Jingqi Kang, Tongtong Wu, Jinming Zhao, Guitao Wang, Yinwei Wei, Hao Yang, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari

    Abstract: Speech event detection is crucial for multimedia retrieval, involving the tagging of both semantic and acoustic events. Traditional ASR systems often overlook the interplay between these events, focusing solely on content, even though the interpretation of dialogue can vary with environmental context. This paper tackles two primary challenges in speech event detection: the continual integration of…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: The first two authors contributed equally to this work

  10. arXiv:2403.19723  [pdf, other]

    cs.CL cs.AI cs.DB cs.MM

    HGT: Leveraging Heterogeneous Graph-enhanced Large Language Models for Few-shot Complex Table Understanding

    Authors: Rihui Jin, Yu Li, Guilin Qi, Nan Hu, Yuan-Fang Li, Jiaoyan Chen, Jianan Wang, Yongrui Chen, Dehai Min

    Abstract: Table understanding (TU) has achieved promising advancements, but it faces the challenges of the scarcity of manually labeled tables and the presence of complex table structures. To address these challenges, we propose HGT, a framework with a heterogeneous graph (HG)-enhanced large language model (LLM) to tackle few-shot TU tasks. It leverages the LLM by aligning the table semantics with the LLM's p…

    Submitted 27 March, 2024; originally announced March 2024.

  11. arXiv:2403.19305  [pdf, other]

    cs.CL cs.AI

    MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation

    Authors: Yu Li, Shenyu Zhang, Rui Wu, Xiutian Huang, Yongrui Chen, Wenhao Xu, Guilin Qi, Dehai Min

    Abstract: Recent advancements in generative Large Language Models (LLMs) have been remarkable; however, the quality of the text generated by these models often reveals persistent issues. Evaluating the quality of text generated by these models, especially in open-ended text, has consistently presented a significant challenge. Addressing this, recent work has explored the possibility of using LLMs as evaluato…

    Submitted 15 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted as a long paper presentation by DASFAA 2024 Industrial Track

  12. arXiv:2403.18760  [pdf, other]

    cs.RO

    MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model

    Authors: Yike Wu, Jiatao Zhang, Nan Hu, LanLing Tang, Guilin Qi, Jun Shao, Jie Ren, Wei Song

    Abstract: In the realm of data-driven AI technology, the application of open-source large language models (LLMs) in robotic task planning represents a significant milestone. Recent robotic task planning methods based on open-source LLMs typically leverage vast task planning datasets to enhance models' planning abilities. While these methods show promise, they struggle with complex long-horizon tasks, which…

    Submitted 1 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  13. arXiv:2403.13270  [pdf]

    cs.CE

    Canonical Descriptors for Periodic Lattice Truss Materials

    Authors: Ge Qi, Huai-Liang Zheng, Chen-xi Liu, Li MA, Kai-Uwe Schröder

    Abstract: For decades, aspects of the topological architecture, and of the mechanical as well as other physical behaviors of periodic lattice truss materials (PLTMs) have been extensively studied. Their approximately infinite design space presents a double-edged sword, implying on one hand dramatic designability in fulfilling the requirement of various performance, but on the other hand unexpected intractabilit…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 57 pages, 7 figures, 3 tables

    ACM Class: I.1.1

  14. arXiv:2403.11509  [pdf, other]

    cs.CL

    DEE: Dual-stage Explainable Evaluation Method for Text Generation

    Authors: Shenyu Zhang, Yu Li, Rui Wu, Xiutian Huang, Yongrui Chen, Wenhao Xu, Guilin Qi

    Abstract: Automatic methods for evaluating machine-generated texts hold significant importance due to the expanding applications of generative systems. Conventional methods tend to grapple with a lack of explainability, issuing a solitary numerical score to signify the assessment outcome. Recent advancements have sought to mitigate this limitation by incorporating large language models (LLMs) to offer more…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted by DASFAA 2024

  15. arXiv:2402.14835  [pdf, other]

    cs.CL cs.AI cs.LG

    MIKE: A New Benchmark for Fine-grained Multimodal Entity Knowledge Editing

    Authors: Jiaqi Li, Miaozeng Du, Chuanyi Zhang, Yongrui Chen, Nan Hu, Guilin Qi, Haiyun Jiang, Siyuan Cheng, Bozhong Tian

    Abstract: Multimodal knowledge editing represents a critical advancement in enhancing the capabilities of Multimodal Large Language Models (MLLMs). Despite its potential, current benchmarks predominantly focus on coarse-grained knowledge, leaving the intricacies of fine-grained (FG) multimodal entity knowledge largely unexplored. This gap presents a notable challenge, as FG entity recognition is pivotal for…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: 8 pages

  16. arXiv:2402.14596  [pdf]

    cs.AI

    The Role of LLMs in Sustainable Smart Cities: Applications, Challenges, and Future Directions

    Authors: Amin Ullah, Guilin Qi, Saddam Hussain, Irfan Ullah, Zafar Ali

    Abstract: Smart cities stand as pivotal components in the ongoing pursuit of elevating urban living standards, facilitating the rapid expansion of urban areas while efficiently managing resources through sustainable and scalable innovations. In this regard, as emerging technologies like Artificial Intelligence (AI), the Internet of Things (IoT), big data analytics, and fog and edge computing have become inc…

    Submitted 7 February, 2024; originally announced February 2024.

  17. arXiv:2402.13264  [pdf, other]

    cs.AI

    KGroot: Enhancing Root Cause Analysis through Knowledge Graphs and Graph Convolutional Neural Networks

    Authors: Tingting Wang, Guilin Qi, Tianxing Wu

    Abstract: Fault localization is challenging in online micro-services due to the wide variety of monitoring data volume, types, events and complex interdependencies in service and components. Fault events in services are propagative and can trigger a cascade of alerts in a short period of time. In the industry, fault localization is typically conducted manually by experienced personnel. This reliance on expe…

    Submitted 11 February, 2024; originally announced February 2024.

  18. arXiv:2402.12869  [pdf, other]

    cs.CL

    Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data

    Authors: Dehai Min, Nan Hu, Rihui Jin, Nuo Lin, Jiaoyan Chen, Yongrui Chen, Yu Li, Guilin Qi, Yun Li, Nijun Li, Qianren Wang

    Abstract: Augmenting Large Language Models (LLMs) for Question Answering (QA) with domain specific data has attracted wide attention. However, domain data often exists in a hybrid format, including text and semi-structured tables, posing challenges for the seamless integration of information. Table-to-Text Generation is a promising solution by facilitating the transformation of hybrid data into a uniformly…

    Submitted 9 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted to NAACL 2024 Industry Track Paper

  19. arXiv:2402.11542  [pdf, other]

    cs.CL cs.AI

    Question Answering Over Spatio-Temporal Knowledge Graph

    Authors: Xinbang Dai, Huiying Li, Guilin Qi

    Abstract: Spatio-temporal knowledge graphs (STKGs) extend the concept of knowledge graphs (KGs) by incorporating time and location information. While the research community has focused on Knowledge Graph Question Answering (KGQA), the field of answering questions incorporating both spatio-temporal information based on STKGs remains largely unexplored. Furthermore, a lack of comprehensive datasets has also hinde…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: 11 pages, 4 figures

    ACM Class: I.2.4; I.2.7

  20. arXiv:2402.11541  [pdf, other]

    cs.CL cs.AI

    Large Language Models Can Better Understand Knowledge Graphs Than We Thought

    Authors: Xinbang Dai, Yuncheng Hua, Tongtong Wu, Yang Sheng, Qiu Ji, Guilin Qi

    Abstract: As the parameter scale of large language models (LLMs) grows, jointly training knowledge graph (KG) embeddings with model parameters to enhance LLM capabilities becomes increasingly costly. Consequently, the community has shown interest in developing prompt strategies that effectively integrate KG information into LLMs. However, the format for incorporating KGs into LLMs lacks standardization; for…

    Submitted 16 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: 15 pages

    ACM Class: I.2.4; I.2.7

  21. arXiv:2402.05712  [pdf, other]

    cs.CV cs.AI

    DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer

    Authors: Zhiyuan Ma, Xiangyu Zhu, Guojun Qi, Chen Qian, Zhaoxiang Zhang, Zhen Lei

    Abstract: Speech-driven 3D facial animation is important for many multimedia applications. Recent work has shown promise in using either Diffusion models or Transformer architectures for this task. However, their mere aggregation does not lead to improved performance. We suspect this is due to a shortage of paired audio-4D data, which is crucial for the Transformer to effectively perform as a denoiser withi…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 9 pages, 5 figures. Code is available at https://github.com/theEricMa/DiffSpeaker

  22. arXiv:2402.01677  [pdf, other]

    cs.AI cs.CL

    Embedding Ontologies via Incorporating Extensional and Intensional Knowledge

    Authors: Keyu Wang, Guilin Qi, Jiaoyan Chen, Yi Huang, Tianxing Wu

    Abstract: Ontologies contain rich knowledge within a domain, which can be divided into two categories, namely extensional knowledge and intensional knowledge. Extensional knowledge provides information about the concrete instances that belong to specific concepts in the ontology, while intensional knowledge details inherent properties, characteristics, and semantic associations among concepts. However, existi…

    Submitted 25 June, 2024; v1 submitted 20 January, 2024; originally announced February 2024.

  23. arXiv:2401.15385  [pdf, other]

    cs.CL cs.MM

    Towards Event Extraction from Speech with Contextual Clues

    Authors: Jingqi Kang, Tongtong Wu, Jinming Zhao, Guitao Wang, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari

    Abstract: While text-based event extraction has been an active research area and has seen successful application in many domains, extracting semantic events from speech directly is an under-explored problem. In this paper, we introduce the Speech Event Extraction (SpeechEE) task and construct three synthetic training sets and one human-spoken test set. Compared to event extraction from text, SpeechEE poses…

    Submitted 27 January, 2024; originally announced January 2024.

    Comments: Under Review

  24. arXiv:2401.14640  [pdf, other]

    cs.CL

    Benchmarking Large Language Models in Complex Question Answering Attribution using Knowledge Graphs

    Authors: Nan Hu, Jiaoyan Chen, Yike Wu, Guilin Qi, Sheng Bi, Tongtong Wu, Jeff Z. Pan

    Abstract: The attribution of question answering is to provide citations for supporting generated statements, and has attracted wide research attention. The current methods for automatically evaluating the attribution, which are often based on Large Language Models (LLMs), are still inadequate, particularly in recognizing subtle differences between attributions, and complex relationships between citations an…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: 13 pages, 5 figures

  25. arXiv:2401.11078  [pdf, other]

    cs.CV

    UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures

    Authors: Mingyuan Zhou, Rakib Hyder, Ziwei Xuan, Guojun Qi

    Abstract: Recent advances in 3D avatar generation have gained significant attention. These breakthroughs aim to produce more realistic animatable avatars, narrowing the gap between virtual and real-world experiences. Most existing works employ Score Distillation Sampling (SDS) loss, combined with a differentiable renderer and text condition, to guide a diffusion model in generating 3D avatars. However,…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: The project page is at http://usrc-sea.github.io/UltrAvatar/

  26. arXiv:2312.14970  [pdf, ps, other]

    physics.soc-ph cond-mat.dis-nn nlin.AO q-bio.PE stat.ML

    Optimal coordination in Minority Game: A solution from reinforcement learning

    Authors: Guozhong Zheng, Weiran Cai, Guanxiao Qi, Jiqiang Zhang, Li Chen

    Abstract: Efficient allocation is important in nature and human society where individuals often compete for finite resources. The Minority Game is perhaps the simplest model that provides deep insights into how humans coordinate to maximize resource utilization. However, this model assumes static strategies that are provided a priori, failing to capture their adaptive nature. Here, we turn to the par…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 10 pages, 7 figures, 1 table. A working paper, comments are welcome

  27. arXiv:2312.07100  [pdf, other]

    cs.CV

    Lightweight high-resolution Subject Matting in the Real World

    Authors: Peng Liu, Fanyi Wang, Jingwen Su, Yanhao Zhang, Guojun Qi

    Abstract: Existing saliency object detection (SOD) methods struggle to satisfy fast inference and accurate results simultaneously in high resolution scenes. They are limited by the quality of public datasets and efficient network modules for high-resolution images. To alleviate these issues, we propose to construct a saliency object matting dataset HRSOM and a lightweight network PSUNet. Considering efficie…

    Submitted 12 December, 2023; originally announced December 2023.

  28. arXiv:2312.05482  [pdf, other]

    cs.CV cs.AI

    BARET : Balanced Attention based Real image Editing driven by Target-text Inversion

    Authors: Yuming Qiao, Fanyi Wang, Jingwen Su, Yanhao Zhang, Yunjie Yu, Siyu Wu, Guo-Jun Qi

    Abstract: Image editing approaches with diffusion models have been rapidly developed, yet their applicability is subject to requirements such as specific editing types (e.g., foreground or background object editing, style transfer), multiple conditions (e.g., mask, sketch, caption), and time-consuming fine-tuning of diffusion models. For alleviating these limitations and realizing efficient real image edit…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI2024

  29. arXiv:2311.18303  [pdf, other]

    cs.CV

    OmniMotionGPT: Animal Motion Generation with Limited Data

    Authors: Zhangsihao Yang, Mingyuan Zhou, Mengyi Shan, Bingbing Wen, Ziwei Xuan, Mitch Hill, Junjie Bai, Guo-Jun Qi, Yalin Wang

    Abstract: Our paper aims to generate diverse and realistic animal motion sequences from textual descriptions, without a large-scale animal text-motion dataset. While the task of text-driven human motion synthesis is already extensively studied and benchmarked, it remains challenging to transfer this success to other skeleton structures with limited data. In this work, we design a model architecture that imi…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: The project page is at https://zshyang.github.io/omgpt-website/

  30. arXiv:2310.18378  [pdf, other]

    cs.AI

    Ontology Revision based on Pre-trained Language Models

    Authors: Qiu Ji, Guilin Qi, Yuxin Ye, Jiaye Li, Site Li, Jianjie Ren, Songtao Lu

    Abstract: Ontology revision aims to seamlessly incorporate a new ontology into an existing ontology and plays a crucial role in tasks such as ontology evolution, ontology maintenance, and ontology alignment. Similar to repairing single ontologies, resolving logical incoherence in the task of ontology revision is also important and meaningful, because incoherence is a main potential factor to cause inconsistenc…

    Submitted 26 December, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

  31. arXiv:2310.08032  [pdf, other]

    cs.AI

    Incorporating Domain Knowledge Graph into Multimodal Movie Genre Classification with Self-Supervised Attention and Contrastive Learning

    Authors: Jiaqi Li, Guilin Qi, Chuanyi Zhang, Yongrui Chen, Yiming Tan, Chenlong Xia, Ye Tian

    Abstract: Multimodal movie genre classification has always been regarded as a demanding multi-label classification task due to the diversity of multimodal data such as posters, plot summaries, trailers and metadata. Although existing works have made great progress in modeling and combining each modality, they still face three issues: 1) unutilized group relations in metadata, 2) unreliable attention allocat…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: Accepted by ACM MM 2023

  32. arXiv:2310.04801  [pdf, other]

    cs.CL

    Parameterizing Context: Unleashing the Power of Parameter-Efficient Fine-Tuning and In-Context Tuning for Continual Table Semantic Parsing

    Authors: Yongrui Chen, Shenyu Zhang, Guilin Qi, Xinnan Guo

    Abstract: Continual table semantic parsing aims to train a parser on a sequence of tasks, where each task requires the parser to translate natural language into SQL based on task-specific tables but only offers limited training examples. Conventional methods tend to suffer from overfitting with limited supervision, as well as catastrophic forgetting due to parameter updates. Despite recent advancements that…

    Submitted 7 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS-2023 (Poster)

  33. arXiv:2309.11206  [pdf, other]

    cs.CL cs.AI

    Retrieve-Rewrite-Answer: A KG-to-Text Enhanced LLMs Framework for Knowledge Graph Question Answering

    Authors: Yike Wu, Nan Hu, Sheng Bi, Guilin Qi, Jie Ren, Anhuan Xie, Wei Song

    Abstract: Despite their competitive performance on knowledge-intensive tasks, large language models (LLMs) still have limitations in memorizing all world knowledge especially long tail knowledge. In this paper, we study the KG-augmented language model approach for solving the knowledge graph question answering (KGQA) task that requires rich world knowledge. Existing work has shown that retrieving KG knowled…

    Submitted 21 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

  34. arXiv:2309.05447  [pdf, other]

    cs.CL

    DoG-Instruct: Towards Premium Instruction-Tuning Data via Text-Grounded Instruction Wrapping

    Authors: Yongrui Chen, Haiyun Jiang, Xinting Huang, Shuming Shi, Guilin Qi

    Abstract: The improvement of LLMs' instruction-following capabilities relies heavily on the availability of high-quality instruction-response pairs. Unfortunately, the current methods used to collect the pairs suffer from either unaffordable labor costs or severe hallucinations in the self-generation of LLM. To tackle these challenges, this paper proposes a scalable solution. It involves training LLMs to ge…

    Submitted 25 May, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Accepted in NAACL 2024

  35. arXiv:2309.00938  [pdf, other]

    cs.CV

    Exploring the Robustness of Human Parsers Towards Common Corruptions

    Authors: Sanyi Zhang, Xiaochun Cao, Rui Wang, Guo-Jun Qi, Jie Zhou

    Abstract: Human parsing aims to segment each pixel of the human image with fine-grained semantic categories. However, current human parsers trained with clean data are easily confused by numerous image corruptions such as blur and noise. To improve the robustness of human parsers, in this paper, we construct three corruption robustness benchmarks, termed LIP-C, ATR-C, and Pascal-Person-Part-C, to assist us…

    Submitted 6 September, 2023; v1 submitted 2 September, 2023; originally announced September 2023.

    Comments: Accepted by IEEE Transactions on Image Processing (TIP)

  36. arXiv:2309.00013  [pdf, other]

    cs.CV

    Model Inversion Attack via Dynamic Memory Learning

    Authors: Gege Qi, YueFeng Chen, Xiaofeng Mao, Binyuan Hui, Xiaodan Li, Rong Zhang, Hui Xue

    Abstract: Model Inversion (MI) attacks aim to recover the private training data from the target model, which has raised security concerns about the deployment of DNNs in practice. Recent advances in generative adversarial models have rendered them particularly effective in MI attacks, primarily due to their ability to generate high-fidelity and perceptually realistic images that closely resemble the target…

    Submitted 23 August, 2023; originally announced September 2023.

  37. arXiv:2307.12498  [pdf, other]

    cs.SD cs.CL eess.AS

    Robust Automatic Speech Recognition via WavAugment Guided Phoneme Adversarial Training

    Authors: Gege Qi, Yuefeng Chen, Xiaofeng Mao, Xiaojun Jia, Ranjie Duan, Rong Zhang, Hui Xue

    Abstract: Developing a practically robust automatic speech recognition (ASR) system is challenging since the model should not only maintain the original performance on clean samples, but also achieve consistent efficacy under small volume perturbations and large domain shifts. To address this problem, we propose a novel WavAugment Guided Phoneme Adversarial Training (wapat). wapat uses adversarial examples in phone…

    Submitted 23 July, 2023; originally announced July 2023.

  38. arXiv:2307.06533  [pdf, other]

    cs.CV

    Domain-adaptive Person Re-identification without Cross-camera Paired Samples

    Authors: Huafeng Li, Yanmei Mao, Yafei Zhang, Guanqiu Qi, Zhengtao Yu

    Abstract: Existing person re-identification (re-ID) research mainly focuses on pedestrian identity matching across cameras in adjacent areas. However, in reality, it is inevitable to face the problem of pedestrian identity matching across long-distance scenes. The cross-camera pedestrian samples collected from long-distance scenes often have no positive samples. It is extremely challenging to use cross-came…

    Submitted 15 July, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

    Comments: 13 pages, 7 figures

  39. LatentAvatar: Learning Latent Expression Code for Expressive Neural Head Avatar

    Authors: Yuelang Xu, Hongwen Zhang, Lizhen Wang, Xiaochen Zhao, Han Huang, Guojun Qi, Yebin Liu

    Abstract: Existing approaches to animatable NeRF-based head avatars are either built upon face templates or use the expression coefficients of templates as the driving signal. Despite the promising progress, their performances are heavily bound by the expression power and the tracking accuracy of the templates. In this work, we present LatentAvatar, an expressive neural head avatar driven by latent expressi…

    Submitted 3 May, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

    Comments: Accepted by SIGGRAPH 2023

  40. arXiv:2304.03903  [pdf, other]

    cs.CV cs.AI

    High-Fidelity Clothed Avatar Reconstruction from a Single Image

    Authors: Tingting Liao, Xiaomei Zhang, Yuliang Xiu, Hongwei Yi, Xudong Liu, Guo-Jun Qi, Yong Zhang, Xuan Wang, Xiangyu Zhu, Zhen Lei

    Abstract: This paper presents a framework for efficient 3D clothed avatar reconstruction. By combining the advantages of the high accuracy of optimization-based methods and the efficiency of learning-based methods, we propose a coarse-to-fine way to realize a high-fidelity clothed avatar reconstruction (CAR) from a single image. At the first stage, we use an implicit model to learn the general shape in the…

    Submitted 8 April, 2023; originally announced April 2023.

  41. arXiv:2304.01664  [pdf, other]

    cs.AI

    An Embedding-based Approach to Inconsistency-tolerant Reasoning with Inconsistent Ontologies

    Authors: Keyu Wang, Site Li, Jiaye Li, Guilin Qi, Qiu Ji

    Abstract: Inconsistency handling is an important issue in knowledge management. Especially in ontology engineering, logical inconsistencies may occur during ontology construction. A natural way to reason with an inconsistent ontology is to utilize the maximal consistent subsets of the ontology. However, previous studies on selecting maximum consistent subsets have rarely considered the semantics of the axio…

    Submitted 26 November, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: 9 pages, 1 figure

  42. arXiv:2304.01289  [pdf, other]

    cs.CV

    Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver

    Authors: Xianpeng Liu, Ce Zheng, Kelvin Cheng, Nan Xue, Guo-Jun Qi, Tianfu Wu

    Abstract: The main challenge of monocular 3D object detection is the accurate localization of 3D center. Motivated by a new and strong observation that this challenge can be remedied by a 3D-space local-grid search scheme in an ideal case, we propose a stage-wise approach, which combines the information flow from 2D-to-3D (3D bounding box proposal generation with a single 2D image) and 3D-to-2D (proposal ve…

    Submitted 3 April, 2023; originally announced April 2023.

  43. arXiv:2303.15954  [pdf]

    cs.LG cs.AI

    TraffNet: Learning Causality of Traffic Generation for What-if Prediction

    Authors: Ming Xu, Qiang Ai, Ruimin Li, Yunyi Ma, Geqi Qi, Xiangfu Meng, Haibo Jin

    Abstract: Real-time what-if traffic prediction is crucial for decision making in intelligent traffic management and control. Although current deep learning methods demonstrate significant advantages in traffic prediction, they are powerless in what-if traffic prediction due to their correlation-based nature. Here, we present a simple deep learning framework called TraffNet that learns the mechanisms of…

    Submitted 22 June, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

  44. arXiv:2303.14662  [pdf, other]

    cs.CV cs.AI

    OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering

    Authors: Zhiyuan Ma, Xiangyu Zhu, Guojun Qi, Zhen Lei, Lei Zhang

    Abstract: Controllability, generalizability and efficiency are the major objectives of constructing face avatars represented by a neural implicit field. However, existing methods have not managed to accommodate the three requirements simultaneously. They either focus on static portraits, restricting the representation ability to a specific subject, or suffer from substantial computational cost, limiting their…

    Submitted 26 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023. The code is available at https://github.com/theEricMa/OTAvatar

  45. arXiv:2303.13357  [pdf, other]

    cs.CV cs.AI cs.MM

    POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery

    Authors: Ce Zheng, Xianpeng Liu, Guo-Jun Qi, Chen Chen

    Abstract: Transformer architectures have achieved SOTA performance on human mesh recovery (HMR) from monocular images. However, the performance gain has come at the cost of substantial memory and computational overhead. A lightweight and efficient model to reconstruct accurate human mesh is needed for real-world applications. In this paper, we propose a pure transformer architecture named POoling aTtent…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  46. arXiv:2303.10368  [pdf, other]

    cs.CL

    An Empirical Study of Pre-trained Language Models in Simple Knowledge Graph Question Answering

    Authors: Nan Hu, Yike Wu, Guilin Qi, Dehai Min, Jiaoyan Chen, Jeff Z. Pan, Zafar Ali

    Abstract: Large-scale pre-trained language models (PLMs) such as BERT have recently achieved great success and become a milestone in natural language processing (NLP). It is now the consensus of the NLP community to adopt PLMs as the backbone for downstream tasks. In recent works on knowledge graph question answering (KGQA), BERT or its variants have become necessary in their KGQA models. However, there is…

    Submitted 18 March, 2023; originally announced March 2023.

    Comments: Accepted by World Wide Web Journal

  47. arXiv:2303.07992  [pdf, other]

    cs.CL

    Can ChatGPT Replace Traditional KBQA Models? An In-depth Analysis of the Question Answering Performance of the GPT LLM Family

    Authors: Yiming Tan, Dehai Min, Yu Li, Wenbo Li, Nan Hu, Yongrui Chen, Guilin Qi

    Abstract: ChatGPT is a powerful large language model (LLM) that covers knowledge resources such as Wikipedia and supports natural language question answering using its own knowledge. Therefore, there is growing interest in exploring whether ChatGPT can replace traditional knowledge-based question answering (KBQA) models. Although there have been some works analyzing the question answering performance of Cha…

    Submitted 20 September, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: To be published in Proceedings of ISWC 2023, 22nd International Semantic Web Conference

  48. arXiv:2303.07598  [pdf, other]

    cs.CV cs.AI cs.LG

    AdPE: Adversarial Positional Embeddings for Pretraining Vision Transformers via MAE+

    Authors: Xiao Wang, Ying Wang, Ziwei Xuan, Guo-Jun Qi

    Abstract: Unsupervised learning of vision transformers seeks to pretrain an encoder via pretext tasks without labels. Among them is the Masked Image Modeling (MIM) aligned with pretraining of language transformers by predicting masked patches as a pretext task. A criterion in unsupervised pretraining is that the pretext task needs to be sufficiently hard to prevent the transformer encoder from learning trivial l…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: 9 pages, 5 figures

  49. arXiv:2302.14236  [pdf]

    physics.plasm-ph physics.ins-det

    Vibration and jitter of free-flowing thin liquid sheets as target for high-repetition-rate laser-ion acceleration

    Authors: Zhengxuan Cao, Ziyang Peng, Yinren Shou, Jiarui Zhao, Shiyou Chen, Ying Gao, Jianbo Liu, Pengjie Wang, Zhusong Mei, Zhuo Pan, Defeng Kong, Guijun Qi, Shirui Xu, Zhipeng Liu, Yulan Liang, Shengxuan Xu, Tan Song, Xun Chen, Qingfan Wu, Xuan Liu, Wenjun Ma

    Abstract: Very thin free-flowing liquid sheets are promising targets for high-repetition-rate laser-ion acceleration. In this work, we report the generation of micrometer-thin free-flowing liquid sheets from the collision of two liquid jets, and study the vibration and jitter in their surface normal direction. The dependence of their motion amplitudes on the generation parameters is studied in detail. The o…

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: 13 pages, 5 figures, Original Research

  50. arXiv:2302.12986  [pdf, other]

    cs.CV

    Self-similarity Driven Scale-invariant Learning for Weakly Supervised Person Search

    Authors: Benzhi Wang, Yang Yang, Jinlin Wu, Guo-jun Qi, Zhen Lei

    Abstract: Weakly supervised person search aims to jointly detect and match persons with only bounding box annotations. Existing approaches typically focus on improving the features by exploring relations of persons. However, the scale variation problem, in which a person often appears in images at different scales (resolutions), is a more severe and under-studied obstacle. On the one hand, small-scale images contain less…

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: 10 pages, 7 figures

    Journal ref: ICCV 2023