
Showing 51–100 of 233 results for author: Qin, B

  1. arXiv:2312.17044  [pdf, other]

    cs.CL

    Length Extrapolation of Transformers: A Survey from the Perspective of Positional Encoding

    Authors: Liang Zhao, Xiaocheng Feng, Xiachong Feng, Dongliang Xu, Qing Yang, Hongtao Liu, Bing Qin, Ting Liu

    Abstract: Transformer has taken the field of natural language processing (NLP) by storm since its birth. Further, Large language models (LLMs) built upon it have captured worldwide attention due to its superior abilities. Nevertheless, all Transformer-based models including these powerful LLMs suffer from a preset length limit and can hardly generalize from short training sequences to longer inference ones,…

    Submitted 2 April, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Work in progress

  2. arXiv:2312.14988  [pdf, other]

    cs.CV

    Emage: Non-Autoregressive Text-to-Image Generation

    Authors: Zhangyin Feng, Runyi Hu, Liangxin Liu, Fan Zhang, Duyu Tang, Yong Dai, Xiaocheng Feng, Jiwei Li, Bing Qin, Shuming Shi

    Abstract: Autoregressive and diffusion models drive the recent breakthroughs on text-to-image generation. Despite their huge success of generating high-realistic images, a common shortcoming of these models is their high inference latency - autoregressive models run more than a thousand times successively to produce image tokens and diffusion models convert Gaussian noise into images with many hundreds of d…

    Submitted 22 December, 2023; originally announced December 2023.

  3. arXiv:2312.04889  [pdf, other]

    cs.AI cs.CL cs.LG

    KwaiAgents: Generalized Information-seeking Agent System with Large Language Models

    Authors: Haojie Pan, Zepeng Zhai, Hao Yuan, Yaojia Lv, Ruiji Fu, Ming Liu, Zhongyuan Wang, Bing Qin

    Abstract: Driven by curiosity, humans have continually sought to explore and understand the world around them, leading to the invention of various tools to satiate this inquisitiveness. Despite not having the capacity to process and memorize vast amounts of information in their brains, humans excel in critical thinking, planning, reflection, and harnessing available tools to interact with and interpret the…

    Submitted 10 January, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

  4. arXiv:2312.04127  [pdf, other]

    cs.CL

    Analyzing the Inherent Response Tendency of LLMs: Real-World Instructions-Driven Jailbreak

    Authors: Yanrui Du, Sendong Zhao, Ming Ma, Yuhan Chen, Bing Qin

    Abstract: Extensive work has been devoted to improving the safety mechanism of Large Language Models (LLMs). However, LLMs still tend to generate harmful responses when faced with malicious instructions, a phenomenon referred to as "Jailbreak Attack". In our research, we introduce a novel automatic jailbreak method RADIAL, which bypasses the security mechanism by amplifying the potential of LLMs to generate…

    Submitted 23 February, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

  5. arXiv:2311.17667  [pdf, other]

    cs.CL cs.AI

    TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models

    Authors: Zheng Chu, Jingchang Chen, Qianglong Chen, Weijiang Yu, Haotian Wang, Ming Liu, Bing Qin

    Abstract: Grasping the concept of time is a fundamental facet of human cognition, indispensable for truly comprehending the intricacies of the world. Previous studies typically focus on specific aspects of time, lacking a comprehensive temporal reasoning benchmark. To address this, we propose TimeBench, a comprehensive hierarchical temporal reasoning benchmark that covers a broad spectrum of temporal reason…

    Submitted 28 June, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: Accepted to ACL 2024

  6. arXiv:2311.13614  [pdf, other]

    cs.CV cs.AI

    HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

    Authors: Qifan Yu, Juncheng Li, Longhui Wei, Liang Pang, Wentao Ye, Bosheng Qin, Siliang Tang, Qi Tian, Yueting Zhuang

    Abstract: Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks. However, the hallucinations inherent in machine-generated data, which could lead to hallucinatory outputs in MLLMs, remain under-explored. This work aims to investigate various hallucinations (i.e., objec…

    Submitted 24 March, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: Accepted by CVPR 2024

  7. arXiv:2311.05876  [pdf, other]

    cs.CL

    Trends in Integration of Knowledge and Large Language Models: A Survey and Taxonomy of Methods, Benchmarks, and Applications

    Authors: Zhangyin Feng, Weitao Ma, Weijiang Yu, Lei Huang, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, Ting Liu

    Abstract: Large language models (LLMs) exhibit superior performance on various natural language tasks, but they are susceptible to issues stemming from outdated data and domain-specific limitations. In order to address these challenges, researchers have pursued two primary strategies, knowledge editing and retrieval augmentation, to enhance LLMs by incorporating external information from different aspects.…

    Submitted 7 December, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

    Comments: Work in progress; 22 pages. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  8. arXiv:2311.05232  [pdf, other]

    cs.CL

    A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

    Authors: Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, Ting Liu

    Abstract: The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), leading to remarkable advancements in text understanding and generation. Nevertheless, alongside these strides, LLMs exhibit a critical tendency to produce hallucinations, resulting in content that is inconsistent with real-world facts or user inputs. This phenomenon poses subs…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: Work in progress; 49 pages

  9. arXiv:2311.04816  [pdf, other]

    cs.CL cs.AI

    MTGER: Multi-view Temporal Graph Enhanced Temporal Reasoning over Time-Involved Document

    Authors: Zheng Chu, Zekun Wang, Jiafeng Liang, Ming Liu, Bing Qin

    Abstract: The facts and time in the document are intricately intertwined, making temporal reasoning over documents challenging. Previous work models time implicitly, making it difficult to handle such complex relationships. To address this issue, we propose MTGER, a novel Multi-view Temporal Graph Enhanced Temporal Reasoning framework for temporal reasoning over time-involved documents. Concretely, MTGER ex…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: Findings of EMNLP 2023, long paper

  10. arXiv:2310.16534  [pdf, other]

    cs.CL cs.CV

    An Early Evaluation of GPT-4V(ision)

    Authors: Yang Wu, Shilong Wang, Hao Yang, Tian Zheng, Hongbo Zhang, Yanyan Zhao, Bing Qin

    Abstract: In this paper, we evaluate different abilities of GPT-4V including visual understanding, language understanding, visual puzzle solving, and understanding of other modalities such as depth, thermal, video, and audio. To estimate GPT-4V's performance, we manually construct 656 test instances and carefully evaluate the results of GPT-4V. The highlights of our findings are as follows: (1) GPT-4V exhib…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Technical Report. Data are available at https://github.com/albertwy/GPT-4V-Evaluation

  11. arXiv:2310.14790  [pdf, other]

    cs.LG cs.AI

    Weighted Joint Maximum Mean Discrepancy Enabled Multi-Source-Multi-Target Unsupervised Domain Adaptation Fault Diagnosis

    Authors: Zixuan Wang, Haoran Tang, Haibo Wang, Bo Qin, Mark D. Butala, Weiming Shen, Hongwei Wang

    Abstract: Despite the remarkable results that can be achieved by data-driven intelligent fault diagnosis techniques, they presuppose the same distribution of training and test data as well as sufficient labeled data. Various operating states often exist in practical scenarios, leading to the problem of domain shift that hinders the effectiveness of fault diagnosis. While recent unsupervised domain adaptatio…

    Submitted 23 November, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

  12. arXiv:2310.13610  [pdf, other]

    cs.CL cs.AI

    Make Your Decision Convincing! A Unified Two-Stage Framework: Self-Attribution and Decision-Making

    Authors: Yanrui Du, Sendong Zhao, Haochun Wang, Yuhan Chen, Rui Bai, Zewen Qiang, Muzhen Cai, Bing Qin

    Abstract: Explaining black-box model behavior with natural language has achieved impressive results in various NLP tasks. Recent research has explored the utilization of subsequences from the input text as a rationale, providing users with evidence to support the model decision. Although existing frameworks excel in generating high-quality rationales while achieving high task performance, they neglect to ac…

    Submitted 20 October, 2023; originally announced October 2023.

  13. arXiv:2310.05149  [pdf, other]

    cs.CL

    Retrieval-Generation Synergy Augmented Large Language Models

    Authors: Zhangyin Feng, Xiaocheng Feng, Dezhi Zhao, Maojin Yang, Bing Qin

    Abstract: Large language models augmented with task-relevant documents have demonstrated impressive performance on knowledge-intensive tasks. However, regarding how to obtain effective documents, the existing methods are mainly divided into two categories. One is to retrieve from an external knowledge base, and the other is to utilize large language models to generate documents. We propose an iterative retr…

    Submitted 8 October, 2023; originally announced October 2023.

  14. arXiv:2309.15402  [pdf, other]

    cs.CL cs.AI

    Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future

    Authors: Zheng Chu, Jingchang Chen, Qianglong Chen, Weijiang Yu, Tao He, Haotian Wang, Weihua Peng, Ming Liu, Bing Qin, Ting Liu

    Abstract: Reasoning, a fundamental cognitive process integral to human intelligence, has garnered substantial interest within artificial intelligence. Notably, recent studies have revealed that chain-of-thought prompting significantly enhances LLM's reasoning capabilities, which attracts widespread attention from both academics and industry. In this paper, we systematically investigate relevant research, su…

    Submitted 5 June, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted to ACL 2024

  15. arXiv:2309.05203  [pdf, other]

    cs.CL

    From Artificially Real to Real: Leveraging Pseudo Data from Large Language Models for Low-Resource Molecule Discovery

    Authors: Yuhan Chen, Nuwa Xi, Yanrui Du, Haochun Wang, Jianyu Chen, Sendong Zhao, Bing Qin

    Abstract: Molecule discovery serves as a cornerstone in numerous scientific domains, fueling the development of new materials and innovative drug designs. Recent developments of in-silico molecule discovery have highlighted the promising results of cross-modal techniques, which bridge molecular structures with their descriptive annotations. However, these cross-modal methods frequently encounter the issue o…

    Submitted 5 March, 2024; v1 submitted 10 September, 2023; originally announced September 2023.

    Comments: AAAI2024

  16. arXiv:2309.04198  [pdf, other]

    cs.CL

    Don't Ignore Dual Logic Ability of LLMs while Privatizing: A Data-Intensive Analysis in Medical Domain

    Authors: Yanrui Du, Sendong Zhao, Muzhen Cai, Ming Ma, Danyang Zhao, Jiawei Cao, Bing Qin

    Abstract: Extensive studies have been devoted to privatizing general-domain Large Language Models (LLMs) as Domain-Specific LLMs via feeding specific-domain data. However, these privatization efforts often ignored a critical aspect: Dual Logic Ability, which is a core reasoning ability for LLMs. The dual logic ability of LLMs ensures that they can maintain a consistent stance when confronted with both posit…

    Submitted 23 February, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

  17. arXiv:2309.04175  [pdf, other]

    cs.CL cs.AI

    Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese

    Authors: Haochun Wang, Sendong Zhao, Zewen Qiang, Zijian Li, Nuwa Xi, Yanrui Du, MuZhen Cai, Haoqiang Guo, Yuhan Chen, Haoming Xu, Bing Qin, Ting Liu

    Abstract: Large Language Models (LLMs) have demonstrated remarkable success in diverse natural language processing (NLP) tasks in general domains. However, LLMs sometimes generate responses with the hallucination about medical facts due to limited domain knowledge. Such shortcomings pose potential risks in the utilization of LLMs within medical contexts. To address this challenge, we propose knowledge-tunin…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 11 pages, 5 figures

  18. arXiv:2309.04174  [pdf, other]

    cs.CL cs.AI

    Manifold-based Verbalizer Space Re-embedding for Tuning-free Prompt-based Classification

    Authors: Haochun Wang, Sendong Zhao, Chi Liu, Nuwa Xi, Muzhen Cai, Bing Qin, Ting Liu

    Abstract: Prompt-based classification adapts tasks to a cloze question format utilizing the [MASK] token and the filled tokens are then mapped to labels through pre-defined verbalizers. Recent studies have explored the use of verbalizer embeddings to reduce labor in this process. However, all existing studies require a tuning process for either the pre-trained models or additional trainable embeddings. Mean…

    Submitted 29 January, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

    Comments: Accepted by AAAI 2024, 11 pages, 3 figures

  19. arXiv:2309.04162  [pdf, other]

    cs.CL

    GLS-CSC: A Simple but Effective Strategy to Mitigate Chinese STM Models' Over-Reliance on Superficial Clue

    Authors: Yanrui Du, Sendong Zhao, Yuhan Chen, Rai Bai, Jing Liu, Hua Wu, Haifeng Wang, Bing Qin

    Abstract: Pre-trained models have achieved success in Chinese Short Text Matching (STM) tasks, but they often rely on superficial clues, leading to a lack of robust predictions. To address this issue, it is crucial to analyze and mitigate the influence of superficial clues on STM models. Our study aims to investigate their over-reliance on the edit distance feature, commonly used to measure the semantic sim…

    Submitted 8 September, 2023; originally announced September 2023.

  20. arXiv:2309.03852  [pdf, other]

    cs.CL cs.AI

    FLM-101B: An Open LLM and How to Train It with $100K Budget

    Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Xuying Meng, Siqi Fan, Peng Han, Jing Li, Li Du, Bowen Qin, Zheng Zhang, Aixin Sun, Yequan Wang

    Abstract: Large language models (LLMs) have achieved remarkable success in NLP and multimodal tasks, among others. Despite these successes, two main challenges remain in developing LLMs: (i) high computational cost, and (ii) fair and objective evaluations. In this paper, we report a solution to significantly reduce LLM training cost through a growth strategy. We demonstrate that a 101B-parameter LLM with 0.…

    Submitted 17 September, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

  21. arXiv:2308.07749  [pdf, other]

    cs.CV cs.AI

    Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model

    Authors: Bosheng Qin, Wentao Ye, Qifan Yu, Siliang Tang, Yueting Zhuang

    Abstract: The rising demand for creating lifelike avatars in the digital realm has led to an increased need for generating high-quality human videos guided by textual descriptions and poses. We propose Dancing Avatar, designed to fabricate human motion videos driven by poses and textual cues. Our approach employs a pretrained T2I diffusion model to generate each video frame in an autoregressive fashion. The…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: 11 pages, 3 figures

  22. arXiv:2308.05282  [pdf, other]

    cs.CR

    Decentralized Finance (DeFi): A Survey

    Authors: Erya Jiang, Bo Qin, Qin Wang, Zhipeng Wang, Qianhong Wu, Jian Weng, Xinyu Li, Chenyang Wang, Yuhang Ding, Yanran Zhang

    Abstract: Decentralized Finance (DeFi) is a new paradigm in the creation, distribution, and utilization of financial services via the integration of blockchain technology. Our research conducts a comprehensive introduction and meticulous classification of various DeFi applications. Beyond that, we thoroughly analyze these risks from both technical and economic perspectives, spanning multiple layers. We poin…

    Submitted 30 November, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

  23. arXiv:2308.03275  [pdf, other]

    cs.CL

    Adapter-based Selective Knowledge Distillation for Federated Multi-domain Meeting Summarization

    Authors: Xiachong Feng, Xiaocheng Feng, Xiyuan Du, Min-Yen Kan, Bing Qin

    Abstract: Meeting summarization has emerged as a promising technique for providing users with condensed summaries. However, existing work has focused on training models on centralized data, neglecting real-world scenarios where meeting data are infeasible to collect centrally, due to their sensitive nature. This gap motivates us to explore federated learning for meeting summarization. Two critical challenge…

    Submitted 6 August, 2023; originally announced August 2023.

    Comments: This work has been submitted to the IEEE TASLP for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  24. arXiv:2307.05355  [pdf, other]

    eess.SP cs.CL

    UniCoRN: Unified Cognitive Signal ReconstructioN bridging cognitive signals and human language

    Authors: Nuwa Xi, Sendong Zhao, Haochun Wang, Chi Liu, Bing Qin, Ting Liu

    Abstract: Decoding text stimuli from cognitive signals (e.g. fMRI) enhances our understanding of the human language system, paving the way for building versatile Brain-Computer Interface. However, existing studies largely focus on decoding individual word-level fMRI volumes from a restricted vocabulary, which is far too idealized for real-world application. In this paper, we propose fMRI2text, the first ope…

    Submitted 6 July, 2023; originally announced July 2023.

    Comments: the 61st Annual Meeting of the Association for Computational Linguistics

  25. arXiv:2306.17034  [pdf, other]

    cs.AI cs.CL

    Exploring & Exploiting High-Order Graph Structure for Sparse Knowledge Graph Completion

    Authors: Tao He, Ming Liu, Yixin Cao, Zekun Wang, Zihao Zheng, Zheng Chu, Bing Qin

    Abstract: Sparse knowledge graph (KG) scenarios pose a challenge for previous Knowledge Graph Completion (KGC) methods, that is, the completion performance decreases rapidly with the increase of graph sparsity. This problem is also exacerbated because of the widespread existence of sparse KGs in practical applications. To alleviate this challenge, we present a novel framework, LR-GCN, that is able to automa…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: 12 pages, 5 figures

  26. arXiv:2306.16176  [pdf, other]

    cs.CL

    SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills

    Authors: Zhangyin Feng, Yong Dai, Fan Zhang, Duyu Tang, Xiaocheng Feng, Shuangzhi Wu, Bing Qin, Yunbo Cao, Shuming Shi

    Abstract: Traditional multitask learning methods basically can only exploit common knowledge in task- or language-wise, which lose either cross-language or cross-task knowledge. This paper proposes a general multilingual multitask model, named SkillNet-X, which enables a single model to tackle many different tasks from different languages. To this end, we define several language-specific skills and task-spe…

    Submitted 28 June, 2023; originally announced June 2023.

  27. arXiv:2306.14701  [pdf, other]

    cs.LG cs.AI

    Hard Sample Mining Enabled Supervised Contrastive Feature Learning for Wind Turbine Pitch System Fault Diagnosis

    Authors: Zixuan Wang, Bo Qin, Mengxuan Li, Chenlu Zhan, Mark D. Butala, Peng Peng, Hongwei Wang

    Abstract: The efficient utilization of wind power by wind turbines relies on the ability of their pitch systems to adjust blade pitch angles in response to varying wind speeds. However, the presence of multiple health conditions in the pitch system due to the long-term wear and tear poses challenges in accurately classifying them, thus increasing the maintenance cost of wind turbines or even damaging them.…

    Submitted 10 August, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

  28. arXiv:2305.17458  [pdf, other]

    cs.CL

    A Diffusion Model for Event Skeleton Generation

    Authors: Fangqi Zhu, Lin Zhang, Jun Gao, Bing Qin, Ruifeng Xu, Haiqin Yang

    Abstract: Event skeleton generation, aiming to induce an event schema skeleton graph with abstracted event nodes and their temporal relations from a set of event instance graphs, is a critical step in the temporal complex event schema induction task. Existing methods effectively address this task from a graph generation perspective but suffer from noise-sensitive and error accumulation, e.g., the inability…

    Submitted 27 May, 2023; originally announced May 2023.

  29. arXiv:2305.16811  [pdf, other]

    cs.CV cs.CL

    Improved Visual Story Generation with Adaptive Context Modeling

    Authors: Zhangyin Feng, Yuchen Ren, Xinmiao Yu, Xiaocheng Feng, Duyu Tang, Shuming Shi, Bing Qin

    Abstract: Diffusion models developed on top of powerful text-to-image generation models like Stable Diffusion achieve remarkable success in visual story generation. However, the best-performing approach considers historically generated results as flattened memory cells, ignoring the fact that not all preceding images contribute equally to the generation of the characters and scenes at the current stage. To…

    Submitted 26 May, 2023; originally announced May 2023.

  30. arXiv:2305.15718  [pdf, other]

    cs.CL

    Towards Higher Pareto Frontier in Multilingual Machine Translation

    Authors: Yichong Huang, Xiaocheng Feng, Xinwei Geng, Baohang Li, Bing Qin

    Abstract: Multilingual neural machine translation has witnessed remarkable progress in recent years. However, the long-tailed distribution of multilingual corpora poses a challenge of Pareto optimization, i.e., optimizing for some languages may come at the cost of degrading the performance of others. Existing balancing training strategies are equivalent to a series of Pareto optimal solutions, which trade o…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted by ACL2023

  31. arXiv:2305.15033  [pdf, other]

    cs.CL

    SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models

    Authors: Zekun Wang, Jingchang Chen, Wangchunshu Zhou, Haichao Zhu, Jiafeng Liang, Liping Shan, Ming Liu, Dongliang Xu, Qing Yang, Bing Qin

    Abstract: Despite achieving remarkable performance on various vision-language tasks, Transformer-based Vision-Language Models (VLMs) suffer from redundancy in inputs and parameters, significantly hampering their efficiency in real-world applications. Moreover, the degree of redundancy in token representations and model parameters, such as attention heads, varies significantly for different inputs. In light…

    Submitted 26 February, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: COLING-LREC 2024

  32. arXiv:2305.13697  [pdf, other]

    cs.CL

    UNIMO-3: Multi-granularity Interaction for Vision-Language Representation Learning

    Authors: Hao Yang, Can Gao, Hao Líu, Xinyan Xiao, Yanyan Zhao, Bing Qin

    Abstract: Vision-and-language (VL) pre-training, which aims to learn a general representation of image-text pairs that can be transferred to various vision-and-language tasks. Compared with modeling uni-modal data, the main challenge of the VL model is: how to learn the cross-modal interaction from multimodal data, especially the fine-grained interaction. Existing works have shown that fully transformer-bas…

    Submitted 23 May, 2023; originally announced May 2023.

  33. arXiv:2305.12328  [pdf, other]

    cs.CV cs.AI cs.MM

    InstructVid2Vid: Controllable Video Editing with Natural Language Instructions

    Authors: Bosheng Qin, Juncheng Li, Siliang Tang, Tat-Seng Chua, Yueting Zhuang

    Abstract: We introduce InstructVid2Vid, an end-to-end diffusion-based methodology for video editing guided by human language instructions. Our approach empowers video manipulation guided by natural language directives, eliminating the need for per-example fine-tuning or inversion. The proposed InstructVid2Vid model modifies a pretrained image generation model, Stable Diffusion, to generate a time-dependent…

    Submitted 29 May, 2024; v1 submitted 20 May, 2023; originally announced May 2023.

    Comments: Accepted by ICME 2024

  34. arXiv:2305.11595  [pdf, other]

    cs.CL cs.AI

    Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate

    Authors: Kai Xiong, Xiao Ding, Yixin Cao, Ting Liu, Bing Qin

    Abstract: Large Language Models (LLMs) have shown impressive capabilities in various applications, but they still face various inconsistency issues. Existing works primarily focus on the inconsistency issues within a single LLM, while we complementarily explore the inter-consistency among multiple LLMs for collaboration. To examine whether LLMs can collaborate effectively to achieve a consensus for a shared…

    Submitted 18 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 Findings Camera Ready Version

  35. arXiv:2305.10709  [pdf, other]

    cs.CL cs.AI

    NoisywikiHow: A Benchmark for Learning with Real-world Noisy Labels in Natural Language Processing

    Authors: Tingting Wu, Xiao Ding, Minji Tang, Hao Zhang, Bing Qin, Ting Liu

    Abstract: Large-scale datasets in the real world inevitably involve label noise. Deep models can gradually overfit noisy labels and thus degrade model generalization. To mitigate the effects of label noise, learning with noisy labels (LNL) methods are designed to achieve better generalization performance. Due to the lack of suitable datasets, previous studies have frequently employed synthetic label noise t…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: The paper has been accepted by ACL 2023 Findings. Dataset and code are available in this https URL

  36. arXiv:2305.07375  [pdf, other]

    cs.CL cs.AI

    Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation

    Authors: Jinglong Gao, Xiao Ding, Bing Qin, Ting Liu

    Abstract: Causal reasoning ability is crucial for numerous NLP applications. Despite the impressive emerging ability of ChatGPT in various NLP tasks, it is unclear how well ChatGPT performs in causal reasoning. In this paper, we conduct the first comprehensive evaluation of the ChatGPT's causal reasoning capabilities. Experiments show that ChatGPT is not a good causal reasoner, but a good causal explainer.…

    Submitted 12 October, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of EMNLP 2023

  37. arXiv:2305.04429  [pdf, other]

    cs.CL

    Improving Cross-Task Generalization with Step-by-Step Instructions

    Authors: Yang Wu, Yanyan Zhao, Zhongyang Li, Bing Qin, Kai Xiong

    Abstract: Instruction tuning has been shown to be able to improve cross-task generalization of language models. However, it is still challenging for language models to complete the target tasks following the instructions, as the instructions are general and lack intermediate steps. To address this problem, we propose to incorporate the step-by-step instructions to help language models to decompose the tasks…

    Submitted 7 May, 2023; originally announced May 2023.

  38. arXiv:2305.03296  [pdf, other]

    cs.CL

    TransESC: Smoothing Emotional Support Conversation via Turn-Level State Transition

    Authors: Weixiang Zhao, Yanyan Zhao, Shilong Wang, Bing Qin

    Abstract: Emotion Support Conversation (ESC) is an emerging and challenging task with the goal of reducing the emotional distress of people. Previous attempts fail to maintain smooth transitions between utterances in ESC because they ignore to grasp the fine-grained transition information at each dialogue turn. To solve this problem, we propose to take into account turn-level state \textbf{Trans}itions of \…

    Submitted 5 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of ACL 2023

  39. arXiv:2305.03111  [pdf, other]

    cs.CL

    Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs

    Authors: Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Rongyu Cao, Ruiying Geng, Nan Huo, Xuanhe Zhou, Chenhao Ma, Guoliang Li, Kevin C. C. Chang, Fei Huang, Reynold Cheng, Yongbin Li

    Abstract: Text-to-SQL parsing, which aims at converting natural language instructions into executable SQLs, has gained increasing attention in recent years. In particular, Codex and ChatGPT have shown impressive results in this task. However, most of the prevalent benchmarks, i.e., Spider, and WikiSQL, focus on database schema with few rows of database contents leaving the gap between academic study and rea…

    Submitted 14 November, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  40. arXiv:2305.01253  [pdf, other]

    cs.CL

    The Role of Summarization in Generative Agents: A Preliminary Perspective

    Authors: Xiachong Feng, Xiaocheng Feng, Bing Qin

    Abstract: Generative agents that simulate human society show tremendous potential for further research and practical applications. Specifically, the generative agent architecture comprising several meticulously designed modules constitutes the most critical component. To facilitate progress in this research, this report presents our integrated perspective on comprehending generative agents through summariza…

    Submitted 2 May, 2023; originally announced May 2023.

  41. arXiv:2304.09582  [pdf, other]

    cs.CL

    Is ChatGPT Equipped with Emotional Dialogue Capabilities?

    Authors: Weixiang Zhao, Yanyan Zhao, Xin Lu, Shilong Wang, Yanpeng Tong, Bing Qin

    Abstract: This report presents a study on the emotional dialogue capability of ChatGPT, an advanced language model developed by OpenAI. The study evaluates the performance of ChatGPT on emotional dialogue understanding and generation through a series of experiments on several downstream tasks. Our findings indicate that while ChatGPT's performance on emotional dialogue understanding may still lag behind tha…

    Submitted 19 April, 2023; originally announced April 2023.

  42. arXiv:2304.06975  [pdf, other]

    cs.CL

    HuaTuo: Tuning LLaMA Model with Chinese Medical Knowledge

    Authors: Haochun Wang, Chi Liu, Nuwa Xi, Zewen Qiang, Sendong Zhao, Bing Qin, Ting Liu

    Abstract: Large Language Models (LLMs), such as the LLaMA model, have demonstrated their effectiveness in various general-domain natural language processing (NLP) tasks. Nevertheless, LLMs have not yet performed optimally in biomedical domain tasks due to the need for medical expertise in the responses. In response to this challenge, we propose HuaTuo, a LLaMA-based model that has been supervised-fine-tuned…

    Submitted 14 April, 2023; originally announced April 2023.

    Comments: LLaMA-based Chinese Medical model - HuaTuo. Model, code and training data are available at https://github.com/SCIR-HI/Huatuo-Llama-Med-Chinese

  43. arXiv:2304.05642  [pdf, other]

    cs.CL

    Global Prompt Cell: A Portable Control Module for Effective Prompt Tuning

    Authors: Chi Liu, Haochun Wang, Nuwa Xi, Sendong Zhao, Bing Qin

    Abstract: As a novel approach to tuning pre-trained models, prompt tuning involves freezing the parameters in downstream tasks while inserting trainable embeddings into inputs in the first layer. However, previous methods have mainly focused on the initialization of prompt embeddings. The strategy of training and utilizing prompt embeddings in a reasonable way has become a limiting factor in the effectivene…

    Submitted 13 May, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

  44. arXiv:2304.03512  [pdf, other]

    cs.CL

    Hierarchical Catalogue Generation for Literature Review: A Benchmark

    Authors: Kun Zhu, Xiaocheng Feng, Xiachong Feng, Yingsheng Wu, Bing Qin

    Abstract: Scientific literature review generation aims to extract and organize important information from an abundant collection of reference papers and produces corresponding reviews while lacking a clear and logical hierarchy. We observe that a high-quality catalogue-guided generation process can effectively alleviate this problem. Therefore, we present an atomic and challenging task named Hierarchical Ca…

    Submitted 16 November, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

    Comments: EMNLP 2023 findings

  45. Novel Method to Reliably Determine the QCD Coupling from $R_{\rm uds}$ Measurements and its effects to Muon $g-2$ and $α(M_Z^2)$ within the Tau-Charm Energy Region

    Authors: Jian-Ming Shen, Bing-Hai Qin, Jiang Yan, Sheng-Quan Wang, Xing-Gang Wu

    Abstract: We present a novel method for precisely determining the QCD running coupling from $R_{\rm uds}$ measurements in electron-positron annihilation. When calculating the fixed-order perturbative QCD (pQCD) approximant of $R_{\rm uds}$, its effective coupling constant $α_s(Q_*^2)$ is determined by using the principle of maximum conformality, a systematic scale-setting method for gauge theories, whose re…

    Submitted 5 July, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: 22 pages, 8 figures. To be published in JHEP

    Journal ref: JHEP07(2023)109

  46. arXiv:2303.06531  [pdf, other]

    cs.RO cs.AI

    Towards Practical Multi-Robot Hybrid Tasks Allocation for Autonomous Cleaning

    Authors: Yabin Wang, Xiaopeng Hong, Zhiheng Ma, Tiedong Ma, Baoxing Qin, Zhou Su

    Abstract: Task allocation plays a vital role in multi-robot autonomous cleaning systems, where multiple robots work together to clean a large area. However, most current studies mainly focus on deterministic, single-task allocation for cleaning robots, without considering hybrid tasks in uncertain working environments. Moreover, there is a lack of datasets and benchmarks for relevant research. In this paper…

    Submitted 4 April, 2023; v1 submitted 11 March, 2023; originally announced March 2023.

  47. STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training

    Authors: Weihong Zhong, Mao Zheng, Duyu Tang, Xuan Luo, Heng Gong, Xiaocheng Feng, Bing Qin

    Abstract: Although large-scale video-language pre-training models, which usually build a global alignment between the video and the text, have achieved remarkable progress on various downstream tasks, the idea of adopting fine-grained information during the pre-training stage is not well explored. In this work, we propose STOA-VLP, a pre-training framework that jointly models object and action information a…

    Submitted 23 May, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

    Comments: AAAI 2023, 7 pages, 3 figures

  48. arXiv:2301.09237  [pdf, other]

    cs.HC cs.CL

    Semantic-aware Contrastive Learning for Electroencephalography-to-Text Generation with Curriculum Learning

    Authors: Xiachong Feng, Xiaocheng Feng, Bing Qin

    Abstract: Electroencephalography-to-Text generation (EEG-to-Text), which aims to directly generate natural text from EEG signals has drawn increasing attention in recent years due to the enormous potential for Brain-computer interfaces (BCIs). However, the remarkable discrepancy between the subject-dependent EEG representation and the semantic-dependent text representation poses a great challenge to this ta…

    Submitted 22 January, 2023; originally announced January 2023.

  49. arXiv:2301.07507  [pdf, other]

    cs.CL cs.DB

    Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing

    Authors: Jinyang Li, Binyuan Hui, Reynold Cheng, Bowen Qin, Chenhao Ma, Nan Huo, Fei Huang, Wenyu Du, Luo Si, Yongbin Li

    Abstract: The task of text-to-SQL parsing, which aims at converting natural language questions into executable SQL queries, has garnered increasing attention in recent years, as it can assist end users in efficiently extracting vital information from databases without the need for technical background. One of the major challenges in text-to-SQL parsing is domain generalization, i.e., how to generalize well…

    Submitted 18 January, 2023; originally announced January 2023.

    Comments: Accepted to AAAI 2023 main conference (oral)

  50. arXiv:2212.10392  [pdf, other]

    cs.CL

    Debiasing Stance Detection Models with Counterfactual Reasoning and Adversarial Bias Learning

    Authors: Jianhua Yuan, Yanyan Zhao, Bing Qin

    Abstract: Stance detection models may tend to rely on dataset bias in the text part as a shortcut and thus fail to sufficiently learn the interaction between the targets and texts. Recent debiasing methods usually treated features learned by small models or big models at earlier steps as bias features and proposed to exclude the branch learning those bias features during inference. However, most of these me…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: Work in Progress