Search | arXiv e-print repository

ReCo: Reliable Causal Chain Reasoning via Structural Causal Recurrent Neural Networks

Authors: Kai Xiong, Xiao Ding, Zhongyang Li, Li Du, Bing Qin, Yi Zheng, Baoxing Huai

Abstract: Causal chain reasoning (CCR) is an essential ability for many decision-making AI systems, which requires the model to build reliable causal chains by connecting causal pairs. However, CCR suffers from two main transitive problems: threshold effect and scene drift. In other words, the causal pairs to be spliced may have a conflicting threshold boundary or scenario. To address these issues, we propo… ▽ More Causal chain reasoning (CCR) is an essential ability for many decision-making AI systems, which requires the model to build reliable causal chains by connecting causal pairs. However, CCR suffers from two main transitive problems: threshold effect and scene drift. In other words, the causal pairs to be spliced may have a conflicting threshold boundary or scenario. To address these issues, we propose a novel Reliable Causal chain reasoning framework~(ReCo), which introduces exogenous variables to represent the threshold and scene factors of each causal pair within the causal chain, and estimates the threshold and scene contradictions across exogenous variables via structural causal recurrent neural networks~(SRNN). Experiments show that ReCo outperforms a series of strong baselines on both Chinese and English CCR datasets. Moreover, by injecting reliable causal chain knowledge distilled by ReCo, BERT can achieve better performances on four downstream causal-related tasks than BERT models enhanced by other kinds of knowledge. △ Less

Submitted 16 December, 2022; originally announced December 2022.

Comments: Accepted by EMNLP 2022

arXiv:2212.08307 [pdf, other]

Controllable Text Generation via Probability Density Estimation in the Latent Space

Authors: Yuxuan Gu, Xiaocheng Feng, Sicheng Ma, Lingyuan Zhang, Heng Gong, Weihong Zhong, Bing Qin

Abstract: Previous work on controllable text generation has explored the idea of control from the latent space, such as optimizing a representation with attribute-related classifiers or sampling a representation from relevant discrete samples. However, they are not effective enough in modeling both the latent space and the control, leaving controlled text with low quality and diversity. In this work, we pro… ▽ More Previous work on controllable text generation has explored the idea of control from the latent space, such as optimizing a representation with attribute-related classifiers or sampling a representation from relevant discrete samples. However, they are not effective enough in modeling both the latent space and the control, leaving controlled text with low quality and diversity. In this work, we propose a novel control framework using probability density estimation in the latent space. Our method utilizes an invertible transformation function, the Normalizing Flow, that maps the complex distributions in the latent space to simple Gaussian distributions in the prior space. Thus, we can perform sophisticated and flexible control in the prior space and feed the control effects back into the latent space owing to the one-one-mapping property of invertible transformations. Experiments on single-attribute controls and multi-attribute control reveal that our method outperforms several strong baselines on attribute relevance and text quality and achieves the SOTA. Further analysis of control strength adjustment demonstrates the flexibility of our control strategy. △ Less

Submitted 24 May, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

Comments: 25 pages, 9 figures, Accepted to ACL2023

arXiv:2212.02995 [pdf, other]

Knowledge-Bridged Causal Interaction Network for Causal Emotion Entailment

Authors: Weixiang Zhao, Yanyan Zhao, Zhuojun Li, Bing Qin

Abstract: Causal Emotion Entailment aims to identify causal utterances that are responsible for the target utterance with a non-neutral emotion in conversations. Previous works are limited in thorough understanding of the conversational context and accurate reasoning of the emotion cause. To this end, we propose Knowledge-Bridged Causal Interaction Network (KBCIN) with commonsense knowledge (CSK) leveraged… ▽ More Causal Emotion Entailment aims to identify causal utterances that are responsible for the target utterance with a non-neutral emotion in conversations. Previous works are limited in thorough understanding of the conversational context and accurate reasoning of the emotion cause. To this end, we propose Knowledge-Bridged Causal Interaction Network (KBCIN) with commonsense knowledge (CSK) leveraged as three bridges. Specifically, we construct a conversational graph for each conversation and leverage the event-centered CSK as the semantics-level bridge (S-bridge) to capture the deep inter-utterance dependencies in the conversational context via the CSK-Enhanced Graph Attention module. Moreover, social-interaction CSK serves as emotion-level bridge (E-bridge) and action-level bridge (A-bridge) to connect candidate utterances with the target one, which provides explicit causal clues for the Emotional Interaction module and Actional Interaction module to reason the target emotion. Experimental results show that our model achieves better performance over most baseline models. Our source code is publicly available at https://github.com/circle-hit/KBCIN. △ Less

Submitted 6 December, 2022; originally announced December 2022.

Comments: Accepted by AAAI 2023

arXiv:2212.01543 [pdf, other]

The RoyalFlush System for the WMT 2022 Efficiency Task

Authors: Bo Qin, Aixin Jia, Qiang Wang, Jianning Lu, Shuqin Pan, Haibo Wang, Ming Chen

Abstract: This paper describes the submission of the RoyalFlush neural machine translation system for the WMT 2022 translation efficiency task. Unlike the commonly used autoregressive translation system, we adopted a two-stage translation paradigm called Hybrid Regression Translation (HRT) to combine the advantages of autoregressive and non-autoregressive translation. Specifically, HRT first autoregressivel… ▽ More This paper describes the submission of the RoyalFlush neural machine translation system for the WMT 2022 translation efficiency task. Unlike the commonly used autoregressive translation system, we adopted a two-stage translation paradigm called Hybrid Regression Translation (HRT) to combine the advantages of autoregressive and non-autoregressive translation. Specifically, HRT first autoregressively generates a discontinuous sequence (e.g., make a prediction every $k$ tokens, $k>1$) and then fills in all previously skipped tokens at once in a non-autoregressive manner. Thus, we can easily trade off the translation quality and speed by adjusting $k$. In addition, by integrating other modeling techniques (e.g., sequence-level knowledge distillation and deep-encoder-shallow-decoder layer allocation strategy) and a mass of engineering efforts, HRT improves 80\% inference speed and achieves equivalent translation performance with the same-capacity AT counterpart. Our fastest system reaches 6k+ words/second on the GPU latency setting, estimated to be about 3.1x faster than the last year's winner. △ Less

Submitted 3 December, 2022; originally announced December 2022.

Comments: Accepted by WMT 2022. arXiv admin note: text overlap with arXiv:2210.10416

arXiv:2211.16368 [pdf, other]

DBA: Efficient Transformer with Dynamic Bilinear Low-Rank Attention

Authors: Bosheng Qin, Juncheng Li, Siliang Tang, Yueting Zhuang

Abstract: Many studies have been conducted to improve the efficiency of Transformer from quadric to linear. Among them, the low-rank-based methods aim to learn the projection matrices to compress the sequence length. However, the projection matrices are fixed once they have been learned, which compress sequence length with dedicated coefficients for tokens in the same position. Adopting such input-invariant… ▽ More Many studies have been conducted to improve the efficiency of Transformer from quadric to linear. Among them, the low-rank-based methods aim to learn the projection matrices to compress the sequence length. However, the projection matrices are fixed once they have been learned, which compress sequence length with dedicated coefficients for tokens in the same position. Adopting such input-invariant projections ignores the fact that the most informative part of a sequence varies from sequence to sequence, thus failing to preserve the most useful information that lies in varied positions. In addition, previous efficient Transformers only focus on the influence of sequence length while neglecting the effect of hidden state dimension. To address the aforementioned problems, we present an efficient yet effective attention mechanism, namely the Dynamic Bilinear Low-Rank Attention (DBA), which compresses the sequence length by input-sensitive dynamic projection matrices and achieves linear time and space complexity by jointly optimizing the sequence length and hidden state dimension while maintaining state-of-the-art performance. Specifically, we first theoretically demonstrate that the sequence length can be compressed non-destructively from a novel perspective of information theory, with compression matrices dynamically determined by the input sequence. Furthermore, we show that the hidden state dimension can be approximated by extending the Johnson-Lindenstrauss lemma, optimizing the attention in bilinear form. Theoretical analysis shows that DBA is proficient in capturing high-order relations in cross-attention problems. Experiments over tasks with diverse sequence length conditions show that DBA achieves state-of-the-art performance compared with various strong baselines while maintaining less memory consumption with higher speed. △ Less

Submitted 23 November, 2022; originally announced November 2022.

Comments: 19 pages, 4 figures

arXiv:2211.10001 [pdf, other]

doi 10.1007/978-981-99-7356-9_38

BDTS: Blockchain-based Data Trading System

Authors: Erya Jiang, Bo Qin, Qin Wang, Qianhong Wu, Sanxi Li, Wenchang Shi, Yingxin Bi, Wenyi Tang

Abstract: Trading data through blockchain platforms is hard to achieve \textit{fair exchange}. Reasons come from two folds: Firstly, guaranteeing fairness between sellers and consumers is a challenging task as the deception of any participating parties is risk-free. This leads to the second issue where judging the behavior of data executors (such as cloud service providers) among distrustful parties is impr… ▽ More Trading data through blockchain platforms is hard to achieve \textit{fair exchange}. Reasons come from two folds: Firstly, guaranteeing fairness between sellers and consumers is a challenging task as the deception of any participating parties is risk-free. This leads to the second issue where judging the behavior of data executors (such as cloud service providers) among distrustful parties is impractical in the context of traditional trading protocols. To fill the gaps, in this paper, we present a \underline{b}lockchain-based \underline{d}ata \underline{t}rading \underline{s}ystem, named BDTS. BDTS implements a fair-exchange protocol in which benign behaviors can get rewarded while dishonest behaviors will be punished. Our scheme requires the seller to provide consumers with the correct encryption keys for proper execution and encourage a rational data executor to behave faithfully for maximum benefits from rewards. We analyze the strategies of consumers, sellers, and dealers in the trading game and point out that everyone should be honest about their interests so that the game will reach Nash equilibrium. Evaluations prove efficiency and practicability. △ Less

Submitted 31 October, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

Comments: ICICS 2023 (Best Paper Award)

Journal ref: International Conference on Information and Communications Security, pp. 645-664. Singapore: Springer Nature Singapore, 2023

arXiv:2211.03612 [pdf]

BigCilin: An Automatic Chinese Open-domain Knowledge Graph with Fine-grained Hypernym-Hyponym Relations

Authors: Ming Liu, Yaojia LV, Jingrun Zhang, Ruiji Fu, Bing Qin

Abstract: This paper presents BigCilin, the first Chinese open-domain knowledge graph with fine-grained hypernym-hyponym re-lations which are extracted automatically from multiple sources for Chinese named entities. With the fine-grained hypernym-hyponym relations, BigCilin owns flexible semantic hierarchical structure. Since the hypernym-hyponym paths are automati-cally generated and one entity may have se… ▽ More This paper presents BigCilin, the first Chinese open-domain knowledge graph with fine-grained hypernym-hyponym re-lations which are extracted automatically from multiple sources for Chinese named entities. With the fine-grained hypernym-hyponym relations, BigCilin owns flexible semantic hierarchical structure. Since the hypernym-hyponym paths are automati-cally generated and one entity may have several senses, we provide a path disambi-guation solution to map a hypernym-hyponym path of one entity to its one sense on the condition that the path and the sense express the same meaning. In order to conveniently access our BigCilin Knowle-dge graph, we provide web interface in two ways. One is that it supports querying any Chinese named entity and browsing the extracted hypernym-hyponym paths surro-unding the query entity. The other is that it gives a top-down browsing view to illust-rate the overall hierarchical structure of our BigCilin knowledge graph over some sam-pled entities. △ Less

Submitted 7 November, 2022; originally announced November 2022.

Comments: 5 pages, 3 figures

arXiv:2211.00732 [pdf, other]

Kuaipedia: a Large-scale Multi-modal Short-video Encyclopedia

Authors: Haojie Pan, Zepeng Zhai, Yuzhou Zhang, Ruiji Fu, Ming Liu, Yangqiu Song, Zhongyuan Wang, Bing Qin

Abstract: Online encyclopedias, such as Wikipedia, have been well-developed and researched in the last two decades. One can find any attributes or other information of a wiki item on a wiki page edited by a community of volunteers. However, the traditional text, images and tables can hardly express some aspects of an wiki item. For example, when we talk about ``Shiba Inu'', one may care more about ``How to… ▽ More Online encyclopedias, such as Wikipedia, have been well-developed and researched in the last two decades. One can find any attributes or other information of a wiki item on a wiki page edited by a community of volunteers. However, the traditional text, images and tables can hardly express some aspects of an wiki item. For example, when we talk about ``Shiba Inu'', one may care more about ``How to feed it'' or ``How to train it not to protect its food''. Currently, short-video platforms have become a hallmark in the online world. Whether you're on TikTok, Instagram, Kuaishou, or YouTube Shorts, short-video apps have changed how we consume and create content today. Except for producing short videos for entertainment, we can find more and more authors sharing insightful knowledge widely across all walks of life. These short videos, which we call knowledge videos, can easily express any aspects (e.g. hair or how-to-feed) consumers want to know about an item (e.g. Shiba Inu), and they can be systematically analyzed and organized like an online encyclopedia. In this paper, we propose Kuaipedia, a large-scale multi-modal encyclopedia consisting of items, aspects, and short videos lined to them, which was extracted from billions of videos of Kuaishou (Kwai), a well-known short-video platform in China. We first collected items from multiple sources and mined user-centered aspects from millions of users' queries to build an item-aspect tree. Then we propose a new task called ``multi-modal item-aspect linking'' as an expansion of ``entity linking'' to link short videos into item-aspect pairs and build the whole short-video encyclopedia. Intrinsic evaluations show that our encyclopedia is of large scale and highly accurate. We also conduct sufficient extrinsic experiments to show how Kuaipedia can help fundamental applications such as entity typing and entity linking. △ Less

Submitted 11 August, 2023; v1 submitted 28 October, 2022; originally announced November 2022.

arXiv:2210.03884 [pdf, other]

Don't Lose Yourself! Empathetic Response Generation via Explicit Self-Other Awareness

Authors: Weixiang Zhao, Yanyan Zhao, Xin Lu, Bing Qin

Abstract: As a critical step to achieve human-like chatbots, empathetic response generation has attained increasing interests. Previous attempts are incomplete and not sufficient enough to elicit empathy because they only focus on the initial aspect of empathy to automatically mimic the feelings and thoughts of the user via other-awareness. However, they ignore to maintain and take the own views of the syst… ▽ More As a critical step to achieve human-like chatbots, empathetic response generation has attained increasing interests. Previous attempts are incomplete and not sufficient enough to elicit empathy because they only focus on the initial aspect of empathy to automatically mimic the feelings and thoughts of the user via other-awareness. However, they ignore to maintain and take the own views of the system into account, which is a crucial process to achieve the empathy called self-other awareness. To this end, we propose to generate Empathetic response with explicit Self-Other Awareness (EmpSOA). Specifically, three stages, self-other differentiation, self-other modulation and self-other generation, are devised to clearly maintain, regulate and inject the self-other aware information into the process of empathetic response generation. Both automatic and human evaluations on the benchmark dataset demonstrate the superiority of EmpSOA to generate more empathetic responses. △ Less

Submitted 5 May, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

Comments: Accepted to Findings of ACL 2023

arXiv:2210.02889 [pdf, other]

A Distributional Lens for Multi-Aspect Controllable Text Generation

Authors: Yuxuan Gu, Xiaocheng Feng, Sicheng Ma, Lingyuan Zhang, Heng Gong, Bing Qin

Abstract: Multi-aspect controllable text generation is a more challenging and practical task than single-aspect control. Existing methods achieve complex multi-aspect control by fusing multiple controllers learned from single-aspect, but suffer from attribute degeneration caused by the mutual interference of these controllers. To address this, we provide observations on attribute fusion from a distributiona… ▽ More Multi-aspect controllable text generation is a more challenging and practical task than single-aspect control. Existing methods achieve complex multi-aspect control by fusing multiple controllers learned from single-aspect, but suffer from attribute degeneration caused by the mutual interference of these controllers. To address this, we provide observations on attribute fusion from a distributional perspective and propose to directly search for the intersection areas of multiple attribute distributions as their combination for generation. Our method first estimates the attribute space with an autoencoder structure. Afterward, we iteratively approach the intersections by jointly minimizing distances to points representing different attributes. Finally, we map them to attribute-relevant sentences with a prefix-tuning-based decoder. Experiments on the three-aspect control task, including sentiment, topic, and detoxification aspects, reveal that our method outperforms several strong baselines on attribute relevance and text quality and achieves the SOTA. Further analysis also supplies some explanatory support for the effectiveness of our approach. △ Less

Submitted 19 October, 2022; v1 submitted 6 October, 2022; originally announced October 2022.

Comments: 21pages, 21figures, EMNLP2022

arXiv:2209.09768 [pdf, other]

An Efficient End-to-End Transformer with Progressive Tri-modal Attention for Multi-modal Emotion Recognition

Authors: Yang Wu, Pai Peng, Zhenyu Zhang, Yanyan Zhao, Bing Qin

Abstract: Recent works on multi-modal emotion recognition move towards end-to-end models, which can extract the task-specific features supervised by the target task compared with the two-phase pipeline. However, previous methods only model the feature interactions between the textual and either acoustic and visual modalities, ignoring capturing the feature interactions between the acoustic and visual modali… ▽ More Recent works on multi-modal emotion recognition move towards end-to-end models, which can extract the task-specific features supervised by the target task compared with the two-phase pipeline. However, previous methods only model the feature interactions between the textual and either acoustic and visual modalities, ignoring capturing the feature interactions between the acoustic and visual modalities. In this paper, we propose the multi-modal end-to-end transformer (ME2ET), which can effectively model the tri-modal features interaction among the textual, acoustic, and visual modalities at the low-level and high-level. At the low-level, we propose the progressive tri-modal attention, which can model the tri-modal feature interactions by adopting a two-pass strategy and can further leverage such interactions to significantly reduce the computation and memory complexity through reducing the input token length. At the high-level, we introduce the tri-modal feature fusion layer to explicitly aggregate the semantic representations of three modalities. The experimental results on the CMU-MOSEI and IEMOCAP datasets show that ME2ET achieves the state-of-the-art performance. The further in-depth analysis demonstrates the effectiveness, efficiency, and interpretability of the proposed progressive tri-modal attention, which can help our model to achieve better performance while significantly reducing the computation and memory cost. Our code will be publicly available. △ Less

Submitted 20 September, 2022; originally announced September 2022.

arXiv:2209.07248 [pdf, other]

doi 10.1103/PhysRevFluids.7.110511

Mixing in chaotic flows with swimming bacteria

Authors: Ranjiangshang Ran, Quentin Brosseau, Brendan C. Blackwell, Boyang Qin, Rebecca L. Winter, Paulo E. Arratia

Abstract: This is a manuscript accepted for publication on Physical Review Fluids, Gallery of Fluid Motion special issue. The manuscript is associated with a poster winner of the 39th Annual Gallery of Fluid Motion Award, for work presented at the 74th Annual Meeting of the American Physical Society's Division of Fluid Dynamics (Phoenix, AZ, USA 2021). This is a manuscript accepted for publication on Physical Review Fluids, Gallery of Fluid Motion special issue. The manuscript is associated with a poster winner of the 39th Annual Gallery of Fluid Motion Award, for work presented at the 74th Annual Meeting of the American Physical Society's Division of Fluid Dynamics (Phoenix, AZ, USA 2021). △ Less

Submitted 24 August, 2022; originally announced September 2022.

Comments: This is a manuscript accepted for publication on Physical Review Fluids, Gallery of Fluid Motion special issue

Journal ref: Phys.Rev.Fluids 7 (2022) 110511

arXiv:2209.06453 [pdf, other]

Prompt Combines Paraphrase: Teaching Pre-trained Models to Understand Rare Biomedical Words

Authors: Haochun Wang, Chi Liu, Nuwa Xi, Sendong Zhao, Meizhi Ju, Shiwei Zhang, Ziheng Zhang, Yefeng Zheng, Bing Qin, Ting Liu

Abstract: Prompt-based fine-tuning for pre-trained models has proven effective for many natural language processing tasks under few-shot settings in general domain. However, tuning with prompt in biomedical domain has not been investigated thoroughly. Biomedical words are often rare in general domain, but quite ubiquitous in biomedical contexts, which dramatically deteriorates the performance of pre-trained… ▽ More Prompt-based fine-tuning for pre-trained models has proven effective for many natural language processing tasks under few-shot settings in general domain. However, tuning with prompt in biomedical domain has not been investigated thoroughly. Biomedical words are often rare in general domain, but quite ubiquitous in biomedical contexts, which dramatically deteriorates the performance of pre-trained models on downstream biomedical applications even after fine-tuning, especially in low-resource scenarios. We propose a simple yet effective approach to helping models learn rare biomedical words during tuning with prompt. Experimental results show that our method can achieve up to 6% improvement in biomedical natural language inference task without any extra parameters or training steps using few-shot vanilla prompt settings. △ Less

Submitted 14 September, 2022; originally announced September 2022.

Comments: Accepted to COLING 2022

arXiv:2209.06442 [pdf, other]

SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers

Authors: Bowen Qin, Lihan Wang, Binyuan Hui, Bowen Li, Xiangpeng Wei, Binhua Li, Fei Huang, Luo Si, Min Yang, Yongbin Li

Abstract: This paper aims to improve the performance of text-to-SQL parsing by exploring the intrinsic uncertainties in the neural network based approaches (called SUN). From the data uncertainty perspective, it is indisputable that a single SQL can be learned from multiple semantically-equivalent questions.Different from previous methods that are limited to one-to-one mapping, we propose a data uncertainty… ▽ More This paper aims to improve the performance of text-to-SQL parsing by exploring the intrinsic uncertainties in the neural network based approaches (called SUN). From the data uncertainty perspective, it is indisputable that a single SQL can be learned from multiple semantically-equivalent questions.Different from previous methods that are limited to one-to-one mapping, we propose a data uncertainty constraint to explore the underlying complementary semantic information among multiple semantically-equivalent questions (many-to-one) and learn the robust feature representations with reduced spurious associations. In this way, we can reduce the sensitivity of the learned representations and improve the robustness of the parser. From the model uncertainty perspective, there is often structural information (dependence) among the weights of neural networks. To improve the generalizability and stability of neural text-to-SQL parsers, we propose a model uncertainty constraint to refine the query representations by enforcing the output representations of different perturbed encoding networks to be consistent with each other. Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms strong competitors and achieves new state-of-the-art results. For reproducibility, we release our code and data at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/sunsql. △ Less

Submitted 28 October, 2022; v1 submitted 14 September, 2022; originally announced September 2022.

Comments: Accepted at COLING 2022

arXiv:2209.02276 [pdf, other]

Zero-shot Aspect-level Sentiment Classification via Explicit Utilization of Aspect-to-Document Sentiment Composition

Authors: Pengfei Deng, Jianhua Yuan, Yanyan Zhao, Bing Qin

Abstract: As aspect-level sentiment labels are expensive and labor-intensive to acquire, zero-shot aspect-level sentiment classification is proposed to learn classifiers applicable to new domains without using any annotated aspect-level data. In contrast, document-level sentiment data with ratings are more easily accessible. In this work, we achieve zero-shot aspect-level sentiment classification by only us… ▽ More As aspect-level sentiment labels are expensive and labor-intensive to acquire, zero-shot aspect-level sentiment classification is proposed to learn classifiers applicable to new domains without using any annotated aspect-level data. In contrast, document-level sentiment data with ratings are more easily accessible. In this work, we achieve zero-shot aspect-level sentiment classification by only using document-level reviews. Our key intuition is that the sentiment representation of a document is composed of the sentiment representations of all the aspects of that document. Based on this, we propose the AF-DSC method to explicitly model such sentiment composition in reviews. AF-DSC first learns sentiment representations for all potential aspects and then aggregates aspect-level sentiments into a document-level one to perform document-level sentiment classification. In this way, we obtain the aspect-level sentiment classifier as the by-product of the document-level sentiment classifier. Experimental results on aspect-level sentiment classification benchmarks demonstrate the effectiveness of explicit utilization of sentiment composition in document-level sentiment classification. Our model with only 30k training data outperforms previous work utilizing millions of data. △ Less

Submitted 6 September, 2022; originally announced September 2022.

arXiv:2208.13629 [pdf, other]

A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions

Authors: Bowen Qin, Binyuan Hui, Lihan Wang, Min Yang, Jinyang Li, Binhua Li, Ruiying Geng, Rongyu Cao, Jian Sun, Luo Si, Fei Huang, Yongbin Li

Abstract: Text-to-SQL parsing is an essential and challenging task. The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) based on the evidences provided by relational databases. Early text-to-SQL parsing systems from the database community achieved a noticeable progress with the cost of heavy human engineering and user interactio… ▽ More Text-to-SQL parsing is an essential and challenging task. The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) based on the evidences provided by relational databases. Early text-to-SQL parsing systems from the database community achieved a noticeable progress with the cost of heavy human engineering and user interactions with the systems. In recent years, deep neural networks have significantly advanced this task by neural generation models, which automatically learn a mapping function from an input NL question to an output SQL query. Subsequently, the large pre-trained language models have taken the state-of-the-art of the text-to-SQL parsing task to a new level. In this survey, we present a comprehensive review on deep learning approaches for text-to-SQL parsing. First, we introduce the text-to-SQL parsing corpora which can be categorized as single-turn and multi-turn. Second, we provide a systematical overview of pre-trained language models and existing methods for text-to-SQL parsing. Third, we present readers with the challenges faced by text-to-SQL parsing and explore some potential future directions in this field. △ Less

Submitted 29 August, 2022; originally announced August 2022.

arXiv:2208.09884 [pdf, other]

DiscrimLoss: A Universal Loss for Hard Samples and Incorrect Samples Discrimination

Authors: Tingting Wu, Xiao Ding, Hao Zhang, Jinglong Gao, Li Du, Bing Qin, Ting Liu

Abstract: Given data with label noise (i.e., incorrect data), deep neural networks would gradually memorize the label noise and impair model performance. To relieve this issue, curriculum learning is proposed to improve model performance and generalization by ordering training samples in a meaningful (e.g., easy to hard) sequence. Previous work takes incorrect samples as generic hard ones without discrimina… ▽ More Given data with label noise (i.e., incorrect data), deep neural networks would gradually memorize the label noise and impair model performance. To relieve this issue, curriculum learning is proposed to improve model performance and generalization by ordering training samples in a meaningful (e.g., easy to hard) sequence. Previous work takes incorrect samples as generic hard ones without discriminating between hard samples (i.e., hard samples in correct data) and incorrect samples. Indeed, a model should learn from hard samples to promote generalization rather than overfit to incorrect ones. In this paper, we address this problem by appending a novel loss function DiscrimLoss, on top of the existing task loss. Its main effect is to automatically and stably estimate the importance of easy samples and difficult samples (including hard and incorrect samples) at the early stages of training to improve the model performance. Then, during the following stages, DiscrimLoss is dedicated to discriminating between hard and incorrect samples to improve the model generalization. Such a training strategy can be formulated dynamically in a self-supervised manner, effectively mimicking the main principle of curriculum learning. Experiments on image classification, image regression, text sequence regression, and event relation reasoning demonstrate the versatility and effectiveness of our method, particularly in the presence of diversified noise levels. △ Less

Submitted 21 August, 2022; originally announced August 2022.

arXiv:2207.13005 [pdf, other]

Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark

Authors: Zhenran Xu, Zifei Shan, Yuxin Li, Baotian Hu, Bing Qin

Abstract: Modern Entity Linking (EL) systems entrench a popularity bias, yet there is no dataset focusing on tail and emerging entities in languages other than English. We present Hansel, a new benchmark in Chinese that fills the vacancy of non-English few-shot and zero-shot EL challenges. The test set of Hansel is human annotated and reviewed, created with a novel method for collecting zero-shot EL dataset… ▽ More Modern Entity Linking (EL) systems entrench a popularity bias, yet there is no dataset focusing on tail and emerging entities in languages other than English. We present Hansel, a new benchmark in Chinese that fills the vacancy of non-English few-shot and zero-shot EL challenges. The test set of Hansel is human annotated and reviewed, created with a novel method for collecting zero-shot EL datasets. It covers 10K diverse documents in news, social media posts and other web articles, with Wikidata as its target Knowledge Base. We demonstrate that the existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot). We then establish a strong baseline that scores a R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset. We also show that our baseline achieves competitive results on TAC-KBP2015 Chinese Entity Linking task. △ Less

Submitted 29 October, 2023; v1 submitted 26 July, 2022; originally announced July 2022.

Comments: WSDM 2023

arXiv:2207.01528 [pdf, other]

VEM$^2$L: A Plug-and-play Framework for Fusing Text and Structure Knowledge on Sparse Knowledge Graph Completion

Authors: Tao He, Ming Liu, Yixin Cao, Tianwen Jiang, Zihao Zheng, Jingrun Zhang, Sendong Zhao, Bing Qin

Abstract: Knowledge Graph Completion (KGC) aims to reason over known facts and infer missing links but achieves weak performances on those sparse Knowledge Graphs (KGs). Recent works introduce text information as auxiliary features or apply graph densification to alleviate this challenge, but suffer from problems of ineffectively incorporating structure features and injecting noisy triples. In this paper, w… ▽ More Knowledge Graph Completion (KGC) aims to reason over known facts and infer missing links but achieves weak performances on those sparse Knowledge Graphs (KGs). Recent works introduce text information as auxiliary features or apply graph densification to alleviate this challenge, but suffer from problems of ineffectively incorporating structure features and injecting noisy triples. In this paper, we solve the sparse KGC from these two motivations simultaneously and handle their respective drawbacks further, and propose a plug-and-play unified framework VEM$^2$L over sparse KGs. The basic idea of VEM$^2$L is to motivate a text-based KGC model and a structure-based KGC model to learn with each other to fuse respective knowledge into unity. To exploit text and structure features together in depth, we partition knowledge within models into two nonoverlapping parts: expressiveness ability on the training set and generalization ability upon unobserved queries. For the former, we motivate these two text-based and structure-based models to learn from each other on the training sets. And for the generalization ability, we propose a novel knowledge fusion strategy derived by the Variational EM (VEM) algorithm, during which we also apply a graph densification operation to alleviate the sparse graph problem further. Our graph densification is derived by VEM algorithm. Due to the convergence of EM algorithm, we guarantee the increase of likelihood function theoretically with less being impacted by noisy injected triples heavily. By combining these two fusion methods and graph densification, we propose the VEM$^2$L framework finally. Both detailed theoretical evidence, as well as qualitative experiments, demonstrates the effectiveness of our proposed framework. △ Less

Submitted 15 August, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

Comments: 12 pages, 5 figures

arXiv:2206.14017 [pdf, other]

Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing

Authors: Lihan Wang, Bowen Qin, Binyuan Hui, Bowen Li, Min Yang, Bailin Wang, Binhua Li, Fei Huang, Luo Si, Yongbin Li

Abstract: The importance of building text-to-SQL parsers which can be applied to new databases has long been acknowledged, and a critical step to achieve this goal is schema linking, i.e., properly recognizing mentions of unseen columns or tables when generating SQLs. In this work, we propose a novel framework to elicit relational structures from large-scale pre-trained language models (PLMs) via a probing… ▽ More The importance of building text-to-SQL parsers which can be applied to new databases has long been acknowledged, and a critical step to achieve this goal is schema linking, i.e., properly recognizing mentions of unseen columns or tables when generating SQLs. In this work, we propose a novel framework to elicit relational structures from large-scale pre-trained language models (PLMs) via a probing procedure based on Poincaré distance metric, and use the induced relations to augment current graph-based parsers for better schema linking. Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences, even when surface forms of mentions and entities differ. Moreover, our probing procedure is entirely unsupervised and requires no additional parameters. Extensive experiments show that our framework sets new state-of-the-art performance on three benchmarks. We empirically verify that our probing procedure can indeed find desired relational structures through qualitative analysis. Our code can be found at https://github.com/AlibabaResearch/DAMO-ConvAI. △ Less

Submitted 6 August, 2022; v1 submitted 28 June, 2022; originally announced June 2022.

Comments: Accepted at KDD 2022

arXiv:2206.13969 [pdf, other]

MACSA: A Multimodal Aspect-Category Sentiment Analysis Dataset with Multimodal Fine-grained Aligned Annotations

Authors: Hao Yang, Yanyan Zhao, Jianwei Liu, Yang Wu, Bing Qin

Abstract: Multimodal fine-grained sentiment analysis has recently attracted increasing attention due to its broad applications. However, the existing multimodal fine-grained sentiment datasets most focus on annotating the fine-grained elements in text but ignore those in images, which leads to the fine-grained elements in visual content not receiving the full attention they deserve. In this paper, we propos… ▽ More Multimodal fine-grained sentiment analysis has recently attracted increasing attention due to its broad applications. However, the existing multimodal fine-grained sentiment datasets most focus on annotating the fine-grained elements in text but ignore those in images, which leads to the fine-grained elements in visual content not receiving the full attention they deserve. In this paper, we propose a new dataset, the Multimodal Aspect-Category Sentiment Analysis (MACSA) dataset, which contains more than 21K text-image pairs. The dataset provides fine-grained annotations for both textual and visual content and firstly uses the aspect category as the pivot to align the fine-grained elements between the two modalities. Based on our dataset, we propose the Multimodal ACSA task and a multimodal graph-based aligned model (MGAM), which adopts a fine-grained cross-modal fusion method. Experimental results show that our method can facilitate the baseline comparison for future research on this corpus. We will make the dataset and code publicly available. △ Less

Submitted 28 June, 2022; originally announced June 2022.

arXiv:2206.12742 [pdf, other]

A Planning-free Longitudinal Controller Design for Vehicles in Dynamic Traffic Environments

Authors: Wubing B. Qin

Abstract: This paper investigates the longitudinal control problem in a dynamic traffic environment where driving scenarios change between free-driving scenarios and car-following scenarios. A comprehensive longitudinal controller is proposed to ensure reasonable transient response and steady-state response in scenarios changes, which is independent of planning algorithms. This design takes into account pas… ▽ More This paper investigates the longitudinal control problem in a dynamic traffic environment where driving scenarios change between free-driving scenarios and car-following scenarios. A comprehensive longitudinal controller is proposed to ensure reasonable transient response and steady-state response in scenarios changes, which is independent of planning algorithms. This design takes into account passenger comfort, safety concerns and disturbance rejections, and attempts to meet the requirement of lower cost, faster response, increased comfort, enhanced safety and elevated extendability from the automated vehicle industry. Design insights and intuitions are provided in detail. Comprehensive simulations are conducted to demonstrate the efficacy of the proposed controller in different driving scenarios. △ Less

Submitted 25 June, 2022; originally announced June 2022.

Comments: 11 pages, 8 figures, 1 table

arXiv:2206.04592 [pdf, other]

Representing Lanes as Arc-length-based Parametric Curves to Facilitate Estimation in Vehicle Control

Authors: Wubing B. Qin

Abstract: This paper revisits the fundamental mathematics of Taylor series to approximate curves with function representation and arc-length-based parametric representation. Parametric representation is shown to preserve its form in coordinate transformation and parameter shifting. These preservations can significantly facilitate lane estimation in vehicle control since lanes perceived by cameras are typica… ▽ More This paper revisits the fundamental mathematics of Taylor series to approximate curves with function representation and arc-length-based parametric representation. Parametric representation is shown to preserve its form in coordinate transformation and parameter shifting. These preservations can significantly facilitate lane estimation in vehicle control since lanes perceived by cameras are typically represented in vehicle body-fixed frames which are translating and rotating. Then we derived the transformation from function representation to arc-length-based parametric representation and its inverse. We applied the transformation to lane estimation in vehicle control problem, and derived the evolution of coefficients for parametric representation that can be used for prediction. We come up with a procedure to simulate the whole process with perception, lane estimation and control for the path-following problem. Simulations are performed to demonstrate the efficacy of the proposed lane estimation algorithm using parametric representation. The results indicate that the proposed technique ensures that vehicle control can achieve reasonably good performance at very low perception updating rate. △ Less

Submitted 9 June, 2022; originally announced June 2022.

Comments: 14 pages, 8 figures, currently submitted and under review

arXiv:2205.12593 [pdf, other]

Less Learn Shortcut: Analyzing and Mitigating Learning of Spurious Feature-Label Correlation

Authors: Yanrui Du, Jing Yan, Yan Chen, Jing Liu, Sendong Zhao, Qiaoqiao She, Hua Wu, Haifeng Wang, Bing Qin

Abstract: Recent research has revealed that deep neural networks often take dataset biases as a shortcut to make decisions rather than understand tasks, leading to failures in real-world applications. In this study, we focus on the spurious correlation between word features and labels that models learn from the biased data distribution of training data. In particular, we define the word highly co-occurring… ▽ More Recent research has revealed that deep neural networks often take dataset biases as a shortcut to make decisions rather than understand tasks, leading to failures in real-world applications. In this study, we focus on the spurious correlation between word features and labels that models learn from the biased data distribution of training data. In particular, we define the word highly co-occurring with a specific label as biased word, and the example containing biased word as biased example. Our analysis shows that biased examples are easier for models to learn, while at the time of prediction, biased words make a significantly higher contribution to the models' predictions, and models tend to assign predicted labels over-relying on the spurious correlation between words and labels. To mitigate models' over-reliance on the shortcut (i.e. spurious correlation), we propose a training strategy Less-Learn-Shortcut (LLS): our strategy quantifies the biased degree of the biased examples and down-weights them accordingly. Experimental results on Question Matching, Natural Language Inference and Sentiment Analysis tasks show that LLS is a task-agnostic strategy and can improve the model performance on adversarial data while maintaining good performance on in-domain data. △ Less

Submitted 22 June, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

arXiv:2205.10822 [pdf, other]

A Graph Enhanced BERT Model for Event Prediction

Authors: Li Du, Xiao Ding, Yue Zhang, Kai Xiong, Ting Liu, Bing Qin

Abstract: Predicting the subsequent event for an existing event context is an important but challenging task, as it requires understanding the underlying relationship between events. Previous methods propose to retrieve relational features from event graph to enhance the modeling of event correlation. However, the sparsity of event graph may restrict the acquisition of relevant graph information, and hence… ▽ More Predicting the subsequent event for an existing event context is an important but challenging task, as it requires understanding the underlying relationship between events. Previous methods propose to retrieve relational features from event graph to enhance the modeling of event correlation. However, the sparsity of event graph may restrict the acquisition of relevant graph information, and hence influence the model performance. To address this issue, we consider automatically building of event graph using a BERT model. To this end, we incorporate an additional structured variable into BERT to learn to predict the event connections in the training process. Hence, in the test process, the connection relationship for unseen events can be predicted by the structured variable. Results on two event prediction tasks: script event prediction and story ending prediction, show that our approach can outperform state-of-the-art baseline methods. △ Less

Submitted 22 May, 2022; originally announced May 2022.

arXiv:2205.07762 [pdf, other]

A Nonlinear Lateral Controller Design for Vehicle Path-following with an Arbitrary Sensor Location

Authors: Wubing B. Qin, Zhaojian Li

Abstract: This paper investigates the lateral control problem in vehicular path-following when the feedback sensor(s) are mounted at an arbitrary location in the longitudinal symmetric axis. We point out that some existing literature has abused the kinematic bicycle model describing the motion of rear axle center for other locations, which may lead to poor performance in practical implementations. A new non… ▽ More This paper investigates the lateral control problem in vehicular path-following when the feedback sensor(s) are mounted at an arbitrary location in the longitudinal symmetric axis. We point out that some existing literature has abused the kinematic bicycle model describing the motion of rear axle center for other locations, which may lead to poor performance in practical implementations. A new nonlinear controller with low-complexity and high-maneuverability is then designed that takes into account senor mounting location, driving comfort and transient response with large initial errors. Design insights and intuitions are also provided in detail. Furthermore, analysis on stability and tracking performance for the closed-loop system are studied, and conditions and guidelines are provided on the selection of control parameters. Comprehensive simulations are performed to demonstrate the efficacy of the proposed nonlinear controller for arbitrary sensor locations. Meanwhile, we also show that designing controllers ignoring the sensor location may lead to unexpected vehicular sway motion in non-straight paths. △ Less

Submitted 16 May, 2022; originally announced May 2022.

Comments: 11 pages, 9 figures, 1 table, submitted to IEEE Transactions on Intelligent Vehicles

arXiv:2205.05849 [pdf, other]

e-CARE: a New Dataset for Exploring Explainable Causal Reasoning

Authors: Li Du, Xiao Ding, Kai Xiong, Ting Liu, Bing Qin

Abstract: Understanding causality has vital importance for various Natural Language Processing (NLP) applications. Beyond the labeled instances, conceptual explanations of the causality can provide deep understanding of the causal facts to facilitate the causal reasoning process. However, such explanation information still remains absent in existing causal reasoning resources. In this paper, we fill this ga… ▽ More Understanding causality has vital importance for various Natural Language Processing (NLP) applications. Beyond the labeled instances, conceptual explanations of the causality can provide deep understanding of the causal facts to facilitate the causal reasoning process. However, such explanation information still remains absent in existing causal reasoning resources. In this paper, we fill this gap by presenting a human-annotated explainable CAusal REasoning dataset (e-CARE), which contains over 21K causal reasoning questions, together with natural language formed explanations of the causal questions. Experimental results show that generating valid explanations for causal facts still remains especially challenging for the state-of-the-art models, and the explanation information can be helpful for promoting the accuracy and stability of causal reasoning models. △ Less

Submitted 11 May, 2022; originally announced May 2022.

arXiv:2205.01879 [pdf, other]

doi 10.1109/TVT.2022.3175746

A Nonlinear Car-following Controller Design Inspired By Human-driving Behaviors to Increase Comfort and Enhance Safety

Authors: Wubing B. Qin

Abstract: This paper investigates the car-following problem and proposes a nonlinear controller that considers driving comfort, safety concerns, steady-state response and transient response. This controller is designed based on the demands of lower cost, faster response, increased comfort, enhanced safety and elevated extendability from the automotive industry. Design insights and intuitions are provided in… ▽ More This paper investigates the car-following problem and proposes a nonlinear controller that considers driving comfort, safety concerns, steady-state response and transient response. This controller is designed based on the demands of lower cost, faster response, increased comfort, enhanced safety and elevated extendability from the automotive industry. Design insights and intuitions are provided in detail. Also, theoretical analysis are performed on plant stability, string stability and tracking performance of the closed-loop system. Conditions and guidelines are provided on the selection of control parameters. Comprehensive simulations are conducted to demonstrate the efficacy of the proposed controller in different driving scenarios. △ Less

Submitted 3 May, 2022; originally announced May 2022.

Comments: 13 pages, 10 figures, submitted to IEEE Transactions on Vehicular Technology

Journal ref: TVT.2022.3175746

arXiv:2205.01620 [pdf, other]

Unifying the Convergences in Multilingual Neural Machine Translation

Authors: Yichong Huang, Xiaocheng Feng, Xinwei Geng, Bing Qin

Abstract: Although all-in-one-model multilingual neural machine translation (multilingual NMT) has achieved remarkable progress, the convergence inconsistency in the joint training is ignored, i.e., different language pairs reaching convergence in different epochs. This leads to the trained MNMT model over-fitting low-resource language translations while under-fitting high-resource ones. In this paper, we p… ▽ More Although all-in-one-model multilingual neural machine translation (multilingual NMT) has achieved remarkable progress, the convergence inconsistency in the joint training is ignored, i.e., different language pairs reaching convergence in different epochs. This leads to the trained MNMT model over-fitting low-resource language translations while under-fitting high-resource ones. In this paper, we propose a novel training strategy named LSSD (Language-Specific Self-Distillation), which can alleviate the convergence inconsistency and help MNMT models achieve the best performance on each language pair simultaneously. Specifically, LSSD picks up language-specific best checkpoints for each language pair to teach the current model on the fly. Furthermore, we systematically explore three sample-level manipulations of knowledge transferring. Experimental results on three datasets show that LSSD obtains consistent improvements towards all language pairs and achieves the state-of-the-art. △ Less

Submitted 19 October, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

Comments: EMNLP2022

arXiv:2204.10105 [pdf, other]

Working memory inspired hierarchical video decomposition with transformative representations

Authors: Binjie Qin, Haohao Mao, Ruipeng Zhang, Yueqi Zhu, Song Ding, Xu Chen

Abstract: Video decomposition is very important to extract moving foreground objects from complex backgrounds in computer vision, machine learning, and medical imaging, e.g., extracting moving contrast-filled vessels from the complex and noisy backgrounds of X-ray coronary angiography (XCA). However, the challenges caused by dynamic backgrounds, overlapping heterogeneous environments and complex noises stil… ▽ More Video decomposition is very important to extract moving foreground objects from complex backgrounds in computer vision, machine learning, and medical imaging, e.g., extracting moving contrast-filled vessels from the complex and noisy backgrounds of X-ray coronary angiography (XCA). However, the challenges caused by dynamic backgrounds, overlapping heterogeneous environments and complex noises still exist in video decomposition. To solve these problems, this study is the first to introduce a flexible visual working memory model in video decomposition tasks to provide interpretable and high-performance hierarchical deep architecture, integrating the transformative representations between sensory and control layers from the perspective of visual and cognitive neuroscience. Specifically, robust PCA unrolling networks acting as a structure-regularized sensor layer decompose XCA into sparse/low-rank structured representations to separate moving contrast-filled vessels from noisy and complex backgrounds. Then, patch recurrent convolutional LSTM networks with a backprojection module embody unstructured random representations of the control layer in working memory, recurrently projecting spatiotemporally decomposed nonlocal patches into orthogonal subspaces for heterogeneous vessel retrieval and interference suppression. This video decomposition deep architecture effectively restores the heterogeneous profiles of intensity and the geometries of moving objects against the complex background interferences. Experiments show that the proposed method significantly outperforms state-of-the-art methods in accurate moving contrast-filled vessel extraction with excellent flexibility and computational efficiency. △ Less

Submitted 5 May, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

arXiv:2204.08466 [pdf, other]

Robust PCA Unrolling Network for Super-resolution Vessel Extraction in X-ray Coronary Angiography

Authors: Binjie Qin, Haohao Mao, Yiming Liu, Jun Zhao, Yisong Lv, Yueqi Zhu, Song Ding, Xu Chen

Abstract: Although robust PCA has been increasingly adopted to extract vessels from X-ray coronary angiography (XCA) images, challenging problems such as inefficient vessel-sparsity modelling, noisy and dynamic background artefacts, and high computational cost still remain unsolved. Therefore, we propose a novel robust PCA unrolling network with sparse feature selection for super-resolution XCA vessel imagi… ▽ More Although robust PCA has been increasingly adopted to extract vessels from X-ray coronary angiography (XCA) images, challenging problems such as inefficient vessel-sparsity modelling, noisy and dynamic background artefacts, and high computational cost still remain unsolved. Therefore, we propose a novel robust PCA unrolling network with sparse feature selection for super-resolution XCA vessel imaging. Being embedded within a patch-wise spatiotemporal super-resolution framework that is built upon a pooling layer and a convolutional long short-term memory network, the proposed network can not only gradually prune complex vessel-like artefacts and noisy backgrounds in XCA during network training but also iteratively learn and select the high-level spatiotemporal semantic information of moving contrast agents flowing in the XCA-imaged vessels. The experimental results show that the proposed method significantly outperforms state-of-the-art methods, especially in the imaging of the vessel network and its distal vessels, by restoring the intensity and geometry profiles of heterogeneous vessels against complex and dynamic backgrounds. △ Less

Submitted 23 April, 2022; v1 submitted 16 April, 2022; originally announced April 2022.

arXiv:2203.16373 [pdf, other]

Slow-varying Dynamics Assisted Temporal Capsule Network for Machinery Remaining Useful Life Estimation

Authors: Yan Qin, Chau Yuen, Yimin Shao, Bo Qin, Xiaoli Li

Abstract: Capsule network (CapsNet) acts as a promising alternative to the typical convolutional neural network, which is the dominant network to develop the remaining useful life (RUL) estimation models for mechanical equipment. Although CapsNet comes with an impressive ability to represent the entities' hierarchical relationships through a high-dimensional vector embedding, it fails to capture the long-te… ▽ More Capsule network (CapsNet) acts as a promising alternative to the typical convolutional neural network, which is the dominant network to develop the remaining useful life (RUL) estimation models for mechanical equipment. Although CapsNet comes with an impressive ability to represent the entities' hierarchical relationships through a high-dimensional vector embedding, it fails to capture the long-term temporal correlation of run-to-failure time series measured from degraded mechanical equipment. On the other hand, the slow-varying dynamics, which reveals the low-frequency information hidden in mechanical dynamical behaviour, is overlooked in the existing RUL estimation models, limiting the utmost ability of advanced networks. To address the aforementioned concerns, we propose a Slow-varying Dynamics assisted Temporal CapsNet (SD-TemCapsNet) to simultaneously learn the slow-varying dynamics and temporal dynamics from measurements for accurate RUL estimation. First, in light of the sensitivity of fault evolution, slow-varying features are decomposed from normal raw data to convey the low-frequency components corresponding to the system dynamics. Next, the long short-term memory (LSTM) mechanism is introduced into CapsNet to capture the temporal correlation of time series. To this end, experiments conducted on an aircraft engine and a milling machine verify that the proposed SD-TemCapsNet outperforms the mainstream methods. In comparison with CapsNet, the estimation accuracy of the aircraft engine with four different scenarios has been improved by 10.17%, 24.97%, 3.25%, and 13.03% concerning the index root mean squared error, respectively. Similarly, the estimation accuracy of the milling machine has been improved by 23.57% compared to LSTM and 19.54% compared to CapsNet. △ Less

Submitted 30 March, 2022; originally announced March 2022.

Comments: This paper has been accepted by IEEE Transactions on Cybernetics

arXiv:2203.06958 [pdf, other]

S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers

Authors: Binyuan Hui, Ruiying Geng, Lihan Wang, Bowen Qin, Bowen Li, Jian Sun, Yongbin Li

Abstract: The task of converting a natural language question into an executable SQL query, known as text-to-SQL, is an important branch of semantic parsing. The state-of-the-art graph-based encoder has been successfully used in this task but does not model the question syntax well. In this paper, we propose S$^2$SQL, injecting Syntax to question-Schema graph encoder for Text-to-SQL parsers, which effectivel… ▽ More The task of converting a natural language question into an executable SQL query, known as text-to-SQL, is an important branch of semantic parsing. The state-of-the-art graph-based encoder has been successfully used in this task but does not model the question syntax well. In this paper, we propose S$^2$SQL, injecting Syntax to question-Schema graph encoder for Text-to-SQL parsers, which effectively leverages the syntactic dependency information of questions in text-to-SQL to improve the performance. We also employ the decoupling constraint to induce diverse relational edge embedding, which further improves the network's performance. Experiments on the Spider and robustness setting Spider-Syn demonstrate that the proposed approach outperforms all existing methods when pre-training models are used, resulting in a performance ranks first on the Spider leaderboard. △ Less

Submitted 14 March, 2022; originally announced March 2022.

Comments: Accepted at ACL 2022 Findings

arXiv:2203.00257 [pdf, other]

Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors

Authors: Yang Wu, Yanyan Zhao, Hao Yang, Song Chen, Bing Qin, Xiaohuan Cao, Wenting Zhao

Abstract: Multimodal sentiment analysis has attracted increasing attention and lots of models have been proposed. However, the performance of the state-of-the-art models decreases sharply when they are deployed in the real world. We find that the main reason is that real-world applications can only access the text outputs by the automatic speech recognition (ASR) models, which may be with errors because of… ▽ More Multimodal sentiment analysis has attracted increasing attention and lots of models have been proposed. However, the performance of the state-of-the-art models decreases sharply when they are deployed in the real world. We find that the main reason is that real-world applications can only access the text outputs by the automatic speech recognition (ASR) models, which may be with errors because of the limitation of model capacity. Through further analysis of the ASR outputs, we find that in some cases the sentiment words, the key sentiment elements in the textual modality, are recognized as other words, which makes the sentiment of the text change and hurts the performance of multimodal sentiment models directly. To address this problem, we propose the sentiment word aware multimodal refinement model (SWRM), which can dynamically refine the erroneous sentiment words by leveraging multimodal sentiment clues. Specifically, we first use the sentiment word position detection module to obtain the most possible position of the sentiment word in the text and then utilize the multimodal sentiment word refinement module to dynamically refine the sentiment word embeddings. The refined embeddings are taken as the textual inputs of the multimodal feature fusion module to predict the sentiment labels. We conduct extensive experiments on the real-world datasets including MOSI-Speechbrain, MOSI-IBM, and MOSI-iFlytek and the results demonstrate the effectiveness of our model, which surpasses the current state-of-the-art models on three datasets. Furthermore, our approach can be adapted for other multimodal feature fusion models easily. Data and code are available at https://github.com/albertwy/SWRM. △ Less

Submitted 1 March, 2022; originally announced March 2022.

Comments: Findings of ACL 2022

arXiv:2202.12142 [pdf, other]

Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words

Authors: Zhangyin Feng, Duyu Tang, Cong Zhou, Junwei Liao, Shuangzhi Wu, Xiaocheng Feng, Bing Qin, Yunbo Cao, Shuming Shi

Abstract: The standard BERT adopts subword-based tokenization, which may break a word into two or more wordpieces (e.g., converting "lossless" to "loss" and "less"). This will bring inconvenience in following situations: (1) what is the best way to obtain the contextual vector of a word that is divided into multiple wordpieces? (2) how to predict a word via cloze test without knowing the number of wordpiece… ▽ More The standard BERT adopts subword-based tokenization, which may break a word into two or more wordpieces (e.g., converting "lossless" to "loss" and "less"). This will bring inconvenience in following situations: (1) what is the best way to obtain the contextual vector of a word that is divided into multiple wordpieces? (2) how to predict a word via cloze test without knowing the number of wordpieces in advance? In this work, we explore the possibility of developing BERT-style pretrained model over a vocabulary of words instead of wordpieces. We call such word-level BERT model as WordBERT. We train models with different vocabulary sizes, initialization configurations and languages. Results show that, compared to standard wordpiece-based BERT, WordBERT makes significant improvements on cloze test and machine reading comprehension. On many other natural language understanding tasks, including POS tagging, chunking and NER, WordBERT consistently performs better than BERT. Model analysis indicates that the major advantage of WordBERT over BERT lies in the understanding for low-frequency words and rare words. Furthermore, since the pipeline is language-independent, we train WordBERT for Chinese language and obtain significant gains on five natural language understanding datasets. Lastly, the analyse on inference speed illustrates WordBERT has comparable time cost to BERT in natural language understanding tasks. △ Less

Submitted 24 February, 2022; originally announced February 2022.

arXiv:2202.08576 [pdf, other]

Local Differential Privacy for Belief Functions

Authors: Qiyu Li, Chunlai Zhou, Biao Qin, Zhiqiang Xu

Abstract: In this paper, we propose two new definitions of local differential privacy for belief functions. One is based on Shafer's semantics of randomly coded messages and the other from the perspective of imprecise probabilities. We show that such basic properties as composition and post-processing also hold for our new definitions. Moreover, we provide a hypothesis testing framework for these definition… ▽ More In this paper, we propose two new definitions of local differential privacy for belief functions. One is based on Shafer's semantics of randomly coded messages and the other from the perspective of imprecise probabilities. We show that such basic properties as composition and post-processing also hold for our new definitions. Moreover, we provide a hypothesis testing framework for these definitions and study the effect of "don't know" in the trade-off between privacy and utility in discrete distribution estimation. △ Less

Submitted 17 February, 2022; originally announced February 2022.

arXiv:2112.08723 [pdf, other]

Distilled Dual-Encoder Model for Vision-Language Understanding

Authors: Zekun Wang, Wenhui Wang, Haichao Zhu, Ming Liu, Bing Qin, Furu Wei

Abstract: We propose a cross-modal attention distillation framework to train a dual-encoder model for vision-language understanding tasks, such as visual reasoning and visual question answering. Dual-encoder models have a faster inference speed than fusion-encoder models and enable the pre-computation of images and text during inference. However, the shallow interaction module used in dual-encoder models is… ▽ More We propose a cross-modal attention distillation framework to train a dual-encoder model for vision-language understanding tasks, such as visual reasoning and visual question answering. Dual-encoder models have a faster inference speed than fusion-encoder models and enable the pre-computation of images and text during inference. However, the shallow interaction module used in dual-encoder models is insufficient to handle complex vision-language understanding tasks. In order to learn deep interactions of images and text, we introduce cross-modal attention distillation, which uses the image-to-text and text-to-image attention distributions of a fusion-encoder model to guide the training of our dual-encoder model. In addition, we show that applying the cross-modal attention distillation for both pre-training and fine-tuning stages achieves further improvements. Experimental results demonstrate that the distilled dual-encoder model achieves competitive performance for visual reasoning, visual entailment and visual question answering tasks while enjoying a much faster inference speed than fusion-encoder models. Our code and models will be publicly available at https://github.com/kugwzk/Distilled-DualEncoder. △ Less

Submitted 17 October, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

Comments: EMNLP 2022

arXiv:2112.03603 [pdf, other]

Handwritten Mathematical Expression Recognition via Attention Aggregation based Bi-directional Mutual Learning

Authors: Xiaohang Bian, Bo Qin, Xiaozhe Xin, Jianwu Li, Xuefeng Su, Yanfeng Wang

Abstract: Handwritten mathematical expression recognition aims to automatically generate LaTeX sequences from given images. Currently, attention-based encoder-decoder models are widely used in this task. They typically generate target sequences in a left-to-right (L2R) manner, leaving the right-to-left (R2L) contexts unexploited. In this paper, we propose an Attention aggregation based Bi-directional Mutual… ▽ More Handwritten mathematical expression recognition aims to automatically generate LaTeX sequences from given images. Currently, attention-based encoder-decoder models are widely used in this task. They typically generate target sequences in a left-to-right (L2R) manner, leaving the right-to-left (R2L) contexts unexploited. In this paper, we propose an Attention aggregation based Bi-directional Mutual learning Network (ABM) which consists of one shared encoder and two parallel inverse decoders (L2R and R2L). The two decoders are enhanced via mutual distillation, which involves one-to-one knowledge transfer at each training step, making full use of the complementary information from two inverse directions. Moreover, in order to deal with mathematical symbols in diverse scales, an Attention Aggregation Module (AAM) is proposed to effectively integrate multi-scale coverage attentions. Notably, in the inference phase, given that the model already learns knowledge from two inverse directions, we only use the L2R branch for inference, keeping the original parameter size and inference speed. Extensive experiments demonstrate that our proposed approach achieves the recognition accuracy of 56.85 % on CROHME 2014, 52.92 % on CROHME 2016, and 53.96 % on CROHME 2019 without data augmentation and model ensembling, substantially outperforming the state-of-the-art methods. The source code is available in https://github.com/XH-B/ABM. △ Less

Submitted 23 February, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

Comments: 9 pages,5 figures, have been accepted in AAAI 2022 Oral

Journal ref: AAAI 2022

arXiv:2111.10946 [pdf, other]

A General Framework for Lifelong Localization and Mapping in Changing Environment

Authors: Min Zhao, Xin Guo, Le Song, Baoxing Qin, Xuesong Shi, Gim Hee Lee, Guanghui Sun

Abstract: The environment of most real-world scenarios such as malls and supermarkets changes at all times. A pre-built map that does not account for these changes becomes out-of-date easily. Therefore, it is necessary to have an up-to-date model of the environment to facilitate long-term operation of a robot. To this end, this paper presents a general lifelong simultaneous localization and mapping (SLAM) f… ▽ More The environment of most real-world scenarios such as malls and supermarkets changes at all times. A pre-built map that does not account for these changes becomes out-of-date easily. Therefore, it is necessary to have an up-to-date model of the environment to facilitate long-term operation of a robot. To this end, this paper presents a general lifelong simultaneous localization and mapping (SLAM) framework. Our framework uses a multiple session map representation, and exploits an efficient map updating strategy that includes map building, pose graph refinement and sparsification. To mitigate the unbounded increase of memory usage, we propose a map-trimming method based on the Chow-Liu maximum-mutual-information spanning tree. The proposed SLAM framework has been comprehensively validated by over a month of robot deployment in real supermarket environment. Furthermore, we release the dataset collected from the indoor and outdoor changing environment with the hope to accelerate lifelong SLAM research in the community. Our dataset is available at https://github.com/sanduan168/lifelong-SLAM-dataset. △ Less

Submitted 21 November, 2021; originally announced November 2021.

arXiv:2111.09486 [pdf, other]

Linking-Enhanced Pre-Training for Table Semantic Parsing

Authors: Bowen Qin, Lihan Wang, Binyuan Hui, Ruiying Geng, Zheng Cao, Min Yang, Jian Sun, Yongbin Li

Abstract: Recently pre-training models have significantly improved the performance of various NLP tasks by leveraging large-scale text corpora to improve the contextual representation ability of the neural network. The large pre-training language model has also been applied in the area of table semantic parsing. However, existing pre-training approaches have not carefully explored explicit interaction relat… ▽ More Recently pre-training models have significantly improved the performance of various NLP tasks by leveraging large-scale text corpora to improve the contextual representation ability of the neural network. The large pre-training language model has also been applied in the area of table semantic parsing. However, existing pre-training approaches have not carefully explored explicit interaction relationships between a question and the corresponding database schema, which is a key ingredient for uncovering their semantic and structural correspondence. Furthermore, the question-aware representation learning in the schema grounding context has received less attention in pre-training objective.To alleviate these issues, this paper designs two novel pre-training objectives to impose the desired inductive bias into the learned representations for table pre-training. We further propose a schema-aware curriculum learning approach to mitigate the impact of noise and learn effectively from the pre-training data in an easy-to-hard manner. We evaluate our pre-trained framework by fine-tuning it on two benchmarks, Spider and SQUALL. The results demonstrate the effectiveness of our pre-training objective and curriculum compared to a variety of baselines. △ Less

Submitted 14 February, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

arXiv:2108.09104 [pdf, other]

doi 10.1145/3459637.3481924

GEDIT: Geographic-Enhanced and Dependency-Guided Tagging for Joint POI and Accessibility Extraction at Baidu Maps

Authors: Yibo Sun, Jizhou Huang, Chunyuan Yuan, Miao Fan, Haifeng Wang, Ming Liu, Bing Qin

Abstract: Providing timely accessibility reminders of a point-of-interest (POI) plays a vital role in improving user satisfaction of finding places and making visiting decisions. However, it is difficult to keep the POI database in sync with the real-world counterparts due to the dynamic nature of business changes. To alleviate this problem, we formulate and present a practical solution that jointly extract… ▽ More Providing timely accessibility reminders of a point-of-interest (POI) plays a vital role in improving user satisfaction of finding places and making visiting decisions. However, it is difficult to keep the POI database in sync with the real-world counterparts due to the dynamic nature of business changes. To alleviate this problem, we formulate and present a practical solution that jointly extracts POI mentions and identifies their coupled accessibility labels from unstructured text. We approach this task as a sequence tagging problem, where the goal is to produce <POI name, accessibility label> pairs from unstructured text. This task is challenging because of two main issues: (1) POI names are often newly-coined words so as to successfully register new entities or brands and (2) there may exist multiple pairs in the text, which necessitates dealing with one-to-many or many-to-one mapping to make each POI coupled with its accessibility label. To this end, we propose a Geographic-Enhanced and Dependency-guIded sequence Tagging (GEDIT) model to concurrently address the two challenges. First, to alleviate challenge #1, we develop a geographic-enhanced pre-trained model to learn the text representations. Second, to mitigate challenge #2, we apply a relational graph convolutional network to learn the tree node representations from the parsed dependency tree. Finally, we construct a neural sequence tagging model by integrating and feeding the previously pre-learned representations into a CRF layer. Extensive experiments conducted on a real-world dataset demonstrate the superiority and effectiveness of GEDIT. In addition, it has already been deployed in production at Baidu Maps. Statistics show that the proposed solution can save significant human effort and labor costs to deal with the same amount of documents, which confirms that it is a practical way for POI accessibility maintenance. △ Less

Submitted 20 August, 2021; originally announced August 2021.

Comments: Accepted by CIKM'21

arXiv:2108.02230 [pdf, other]

doi 10.1007/s11071-022-07761-4

Nonholonomic dynamics and control of road vehicles: moving toward automation

Authors: Wubing B. Qin, Yiming Zhang, Dénes Takács, Gábor Stépán, Gábor Orosz

Abstract: Nonholonomic models of automobiles are developed by utilizing tools of analytical mechanics, in particular the Appellian approach that allows one to describe the vehicle dynamics with minimum number of time-dependent state variables. The models are categorized based on how they represent the wheel-ground contact, whether they incorporate the longitudinal dynamics, and whether they consider the ste… ▽ More Nonholonomic models of automobiles are developed by utilizing tools of analytical mechanics, in particular the Appellian approach that allows one to describe the vehicle dynamics with minimum number of time-dependent state variables. The models are categorized based on how they represent the wheel-ground contact, whether they incorporate the longitudinal dynamics, and whether they consider the steering dynamics. It is demonstrated that the developed models can be used to design low-complexity controllers that enable automated vehicles to execute a large variety of maneuvers with high precision. △ Less

Submitted 25 June, 2022; v1 submitted 4 August, 2021; originally announced August 2021.

Comments: 42 pages, 25 figures, 5 tables, accepted for inclusion in a future issue in Nonlinear Dynamics, Springer

Journal ref: Nonlinear Dynamics (2022)

arXiv:2108.01049 [pdf, other]

doi 10.1073/pnas.2108548118

Bacteria hinder large-scale transport and enhance small-scale mixing in time-periodic flows

Authors: Ranjiangshang Ran, Quentin Brosseau, Brendan C. Blackwell, Boyang Qin, Rebecca Winter, Paulo E. Arratia

Abstract: Understanding mixing and transport of passive scalars in active fluids is important to many natural (e.g. algal blooms) and industrial (e.g. biofuel, vaccine production) processes. Here, we study the mixing of a passive scalar (dye) in dilute suspensions of swimming Escherichia coli in experiments using a two-dimensional (2D) time-periodic flow and in a simple simulation. Results show that the pre… ▽ More Understanding mixing and transport of passive scalars in active fluids is important to many natural (e.g. algal blooms) and industrial (e.g. biofuel, vaccine production) processes. Here, we study the mixing of a passive scalar (dye) in dilute suspensions of swimming Escherichia coli in experiments using a two-dimensional (2D) time-periodic flow and in a simple simulation. Results show that the presence of bacteria hinders large scale transport and reduce overall mixing rate. Stretching fields, calculated from experimentally measured velocity fields, show that bacterial activity attenuates fluid stretching and lowers flow chaoticity. Simulations suggest that this attenuation may be attributed to a transient accumulation of bacteria along regions of high stretching. Spatial power spectra and correlation functions of dye concentration fields show that the transport of scalar variance across scales is also hindered by bacterial activity, resulting in an increase in average size and lifetime of structures. On the other hand, at small scales, activity seems to enhance local mixing. One piece of evidence is that the probability distribution of the spatial concentration gradients is nearly symmetric with a vanishing skewness. Overall, our results show that the coupling between activity and flow can lead to nontrivial effects on mixing and transport. △ Less

Submitted 22 August, 2021; v1 submitted 2 August, 2021; originally announced August 2021.

Comments: Supplementary Information added

arXiv:2107.09852 [pdf, other]

CausalBERT: Injecting Causal Knowledge Into Pre-trained Models with Minimal Supervision

Authors: Zhongyang Li, Xiao Ding, Kuo Liao, Bing Qin, Ting Liu

Abstract: Recent work has shown success in incorporating pre-trained models like BERT to improve NLP systems. However, existing pre-trained models lack of causal knowledge which prevents today's NLP systems from thinking like humans. In this paper, we investigate the problem of injecting causal knowledge into pre-trained models. There are two fundamental problems: 1) how to collect various granularities of… ▽ More Recent work has shown success in incorporating pre-trained models like BERT to improve NLP systems. However, existing pre-trained models lack of causal knowledge which prevents today's NLP systems from thinking like humans. In this paper, we investigate the problem of injecting causal knowledge into pre-trained models. There are two fundamental problems: 1) how to collect various granularities of causal pairs from unstructured texts; 2) how to effectively inject causal knowledge into pre-trained models. To address these issues, we extend the idea of CausalBERT from previous studies, and conduct experiments on various datasets to evaluate its effectiveness. In addition, we adopt a regularization-based method to preserve the already learned knowledge with an extra regularization term while injecting causal knowledge. Extensive experiments on 7 datasets, including four causal pair classification tasks, two causal QA tasks and a causal inference task, demonstrate that CausalBERT captures rich causal knowledge and outperforms all pre-trained models-based state-of-the-art methods, achieving a new causal inference benchmark. △ Less

Submitted 7 August, 2021; v1 submitted 20 July, 2021; originally announced July 2021.

arXiv:2107.03175 [pdf, ps, other]

A Survey on Dialogue Summarization: Recent Advances and New Frontiers

Authors: Xiachong Feng, Xiaocheng Feng, Bing Qin

Abstract: Dialogue summarization aims to condense the original dialogue into a shorter version covering salient information, which is a crucial way to reduce dialogue data overload. Recently, the promising achievements in both dialogue systems and natural language generation techniques drastically lead this task to a new landscape, which results in significant research attentions. However, there still remai… ▽ More Dialogue summarization aims to condense the original dialogue into a shorter version covering salient information, which is a crucial way to reduce dialogue data overload. Recently, the promising achievements in both dialogue systems and natural language generation techniques drastically lead this task to a new landscape, which results in significant research attentions. However, there still remains a lack of a comprehensive survey for this task. To this end, we take the first step and present a thorough review of this research field carefully and widely. In detail, we systematically organize the current works according to the characteristics of each domain, covering meeting, chat, email thread, customer service and medical dialogue. Additionally, we provide an overview of publicly available research datasets as well as organize two leaderboards under unified metrics. Furthermore, we discuss some future directions, including faithfulness, multi-modal, multi-domain and multi-lingual dialogue summarization, and give our thoughts respectively. We hope that this first survey of dialogue summarization can provide the community with a quick access and a general picture to this task and motivate future researches. △ Less

Submitted 27 April, 2022; v1 submitted 7 July, 2021; originally announced July 2021.

Comments: IJCAI 2022 Survey Track

arXiv:2106.09895 [pdf, other]

PRGC: Potential Relation and Global Correspondence Based Joint Relational Triple Extraction

Authors: Hengyi Zheng, Rui Wen, Xi Chen, Yifan Yang, Yunyan Zhang, Ziheng Zhang, Ningyu Zhang, Bin Qin, Ming Xu, Yefeng Zheng

Abstract: Joint extraction of entities and relations from unstructured texts is a crucial task in information extraction. Recent methods achieve considerable performance but still suffer from some inherent limitations, such as redundancy of relation prediction, poor generalization of span-based extraction and inefficiency. In this paper, we decompose this task into three subtasks, Relation Judgement, Entity… ▽ More Joint extraction of entities and relations from unstructured texts is a crucial task in information extraction. Recent methods achieve considerable performance but still suffer from some inherent limitations, such as redundancy of relation prediction, poor generalization of span-based extraction and inefficiency. In this paper, we decompose this task into three subtasks, Relation Judgement, Entity Extraction and Subject-object Alignment from a novel perspective and then propose a joint relational triple extraction framework based on Potential Relation and Global Correspondence (PRGC). Specifically, we design a component to predict potential relations, which constrains the following entity extraction to the predicted relation subset rather than all relations; then a relation-specific sequence tagging component is applied to handle the overlapping problem between subjects and objects; finally, a global correspondence component is designed to align the subject and object into a triple with low-complexity. Extensive experiments show that PRGC achieves state-of-the-art performance on public benchmarks with higher efficiency and delivers consistent performance gain on complex scenarios of overlapping triples. △ Less

Submitted 17 June, 2021; originally announced June 2021.

Comments: Accepted by ACL 2021

arXiv:2105.12544 [pdf, other]

Language Model as an Annotator: Exploring DialoGPT for Dialogue Summarization

Authors: Xiachong Feng, Xiaocheng Feng, Libo Qin, Bing Qin, Ting Liu

Abstract: Current dialogue summarization systems usually encode the text with a number of general semantic features (e.g., keywords and topics) to gain more powerful dialogue modeling capabilities. However, these features are obtained via open-domain toolkits that are dialog-agnostic or heavily relied on human annotations. In this paper, we show how DialoGPT, a pre-trained model for conversational response… ▽ More Current dialogue summarization systems usually encode the text with a number of general semantic features (e.g., keywords and topics) to gain more powerful dialogue modeling capabilities. However, these features are obtained via open-domain toolkits that are dialog-agnostic or heavily relied on human annotations. In this paper, we show how DialoGPT, a pre-trained model for conversational response generation, can be developed as an unsupervised dialogue annotator, which takes advantage of dialogue background knowledge encoded in DialoGPT. We apply DialoGPT to label three types of features on two dialogue summarization datasets, SAMSum and AMI, and employ pre-trained and non pre-trained models as our summarizes. Experimental results show that our proposed method can obtain remarkable improvements on both datasets and achieves new state-of-the-art performance on the SAMSum dataset. △ Less

Submitted 27 May, 2021; v1 submitted 26 May, 2021; originally announced May 2021.

Comments: ACL 2021

arXiv:2104.14839 [pdf, other]

The Factual Inconsistency Problem in Abstractive Text Summarization: A Survey

Authors: Yichong Huang, Xiachong Feng, Xiaocheng Feng, Bing Qin

Abstract: Recently, various neural encoder-decoder models pioneered by Seq2Seq framework have been proposed to achieve the goal of generating more abstractive summaries by learning to map input text to output text. At a high level, such neural models can freely generate summaries without any constraint on the words or phrases used. Moreover, their format is closer to human-edited summaries and output is mor… ▽ More Recently, various neural encoder-decoder models pioneered by Seq2Seq framework have been proposed to achieve the goal of generating more abstractive summaries by learning to map input text to output text. At a high level, such neural models can freely generate summaries without any constraint on the words or phrases used. Moreover, their format is closer to human-edited summaries and output is more readable and fluent. However, the neural model's abstraction ability is a double-edged sword. A commonly observed problem with the generated summaries is the distortion or fabrication of factual information in the article. This inconsistency between the original text and the summary has caused various concerns over its applicability, and the previous evaluation methods of text summarization are not suitable for this issue. In response to the above problems, the current research direction is predominantly divided into two categories, one is to design fact-aware evaluation metrics to select outputs without factual inconsistency errors, and the other is to develop new summarization systems towards factual consistency. In this survey, we focus on presenting a comprehensive review of these fact-specific evaluation methods and text summarization models. △ Less

Submitted 10 April, 2023; v1 submitted 30 April, 2021; originally announced April 2021.

Comments: 9 pages, 5 figures

arXiv:2104.12377 [pdf, other]

DADgraph: A Discourse-aware Dialogue Graph Neural Network for Multiparty Dialogue Machine Reading Comprehension

Authors: Jiaqi Li, Ming Liu, Zihao Zheng, Heng Zhang, Bing Qin, Min-Yen Kan, Ting Liu

Abstract: Multiparty Dialogue Machine Reading Comprehension (MRC) differs from traditional MRC as models must handle the complex dialogue discourse structure, previously unconsidered in traditional MRC. To fully exploit such discourse structure in multiparty dialogue, we present a discourse-aware dialogue graph neural network, DADgraph, which explicitly constructs the dialogue graph using discourse dependen… ▽ More Multiparty Dialogue Machine Reading Comprehension (MRC) differs from traditional MRC as models must handle the complex dialogue discourse structure, previously unconsidered in traditional MRC. To fully exploit such discourse structure in multiparty dialogue, we present a discourse-aware dialogue graph neural network, DADgraph, which explicitly constructs the dialogue graph using discourse dependency links and discourse relations. To validate our model, we perform experiments on the Molweni corpus, a large-scale MRC dataset built over multiparty dialogue annotated with discourse structure. Experiments on Molweni show that our discourse-aware model achieves statistically significant improvements compared against strong neural network MRC baselines. △ Less

Submitted 26 April, 2021; originally announced April 2021.

Comments: Accepted by IJCNN 2021

arXiv:2104.08480 [pdf, other]

Learning to Share by Masking the Non-shared for Multi-domain Sentiment Classification

Authors: Jianhua Yuan, Yanyan Zhao, Bing Qin, Ting Liu

Abstract: Multi-domain sentiment classification deals with the scenario where labeled data exists for multiple domains but insufficient for training effective sentiment classifiers that work across domains. Thus, fully exploiting sentiment knowledge shared across domains is crucial for real world applications. While many existing works try to extract domain-invariant features in high-dimensional space, such… ▽ More Multi-domain sentiment classification deals with the scenario where labeled data exists for multiple domains but insufficient for training effective sentiment classifiers that work across domains. Thus, fully exploiting sentiment knowledge shared across domains is crucial for real world applications. While many existing works try to extract domain-invariant features in high-dimensional space, such models fail to explicitly distinguish between shared and private features at text-level, which to some extent lacks interpretablity. Based on the assumption that removing domain-related tokens from texts would help improve their domain-invariance, we instead first transform original sentences to be domain-agnostic. To this end, we propose the BertMasker network which explicitly masks domain-related words from texts, learns domain-invariant sentiment features from these domain-agnostic texts, and uses those masked words to form domain-aware sentence representations. Empirical experiments on a well-adopted multiple domain sentiment classification dataset demonstrate the effectiveness of our proposed model on both multi-domain sentiment classification and cross-domain settings, by increasing the accuracy by 0.94% and 1.8% respectively. Further analysis on masking proves that removing those domain-related and sentiment irrelevant tokens decreases texts' domain distinction, resulting in the performance degradation of a BERT-based domain classifier by over 12%. △ Less

Submitted 17 April, 2021; originally announced April 2021.

Comments: 11 pages

Showing 101–150 of 233 results for author: Qin, B