Search | arXiv e-print repository

Solution-oriented Agent-based Models Generation with Verifier-assisted Iterative In-context Learning

Authors: Tong Niu, Weihao Zhang, Rong Zhao

Abstract: Agent-based models (ABMs) stand as an essential paradigm for proposing and validating hypothetical solutions or policies aimed at addressing challenges posed by complex systems and achieving various objectives. This process demands labor-intensive endeavors and multidisciplinary expertise. Large language models (LLMs) encapsulating cross-domain knowledge and programming proficiency could potential… ▽ More Agent-based models (ABMs) stand as an essential paradigm for proposing and validating hypothetical solutions or policies aimed at addressing challenges posed by complex systems and achieving various objectives. This process demands labor-intensive endeavors and multidisciplinary expertise. Large language models (LLMs) encapsulating cross-domain knowledge and programming proficiency could potentially alleviate the difficulty of this process. However, LLMs excel in handling sequential information, making it challenging for analyzing the intricate interactions and nonlinear dynamics inherent in ABMs. Additionally, due to the lack of self-evaluation capability of LLMs, relying solely on LLMs is insufficient to effectively accomplish this process. In this paper, we present SAGE, a general solution-oriented ABM generation framework designed for automatic modeling and generating solutions for targeted problems. Unlike approaches reliant on expert handcrafting or resource-intensive neural network training, SAGE establishes a verifier-assisted iterative in-context learning process employing large language models (LLMs) to leverages their inherent cross-domain knowledge for tackling intricate demands from diverse domain scenarios. In SAGE, we introduce an semi-structured conceptual representation expliciting the intricate structures of ABMs and an objective representation to guide LLMs in modeling scenarios and proposing hypothetical solutions through in-context learning. To ensure the model executability and solution feasibility, SAGE devises a two-level verifier with chain-of-thought prompting tailored to the complex interactions and non-linear dynamics of ABMs, driving the iterative generation optimization. Moreover, we construct an evaluation dataset of solution-oriented ABMs from open sources.It contains practical models across various domains. △ Less

Submitted 4 February, 2024; originally announced February 2024.

Journal ref: International Conference on Autonomous Agents and Multiagent Systems 2024

arXiv:2401.13945 [pdf]

doi 10.1007/s11633-024-1496-2

General Automatic Solution Generation of Social Problems

Authors: Tong Niu, Haoyu Huang, Yu Du, Weihao Zhang, Luping Shi, Rong Zhao

Abstract: Given the escalating intricacy and multifaceted nature of contemporary social systems, manually generating solutions to address pertinent social issues has become a formidable task. In response to this challenge, the rapid development of artificial intelligence has spurred the exploration of computational methodologies aimed at automatically generating solutions. However, current methods for auto-… ▽ More Given the escalating intricacy and multifaceted nature of contemporary social systems, manually generating solutions to address pertinent social issues has become a formidable task. In response to this challenge, the rapid development of artificial intelligence has spurred the exploration of computational methodologies aimed at automatically generating solutions. However, current methods for auto-generation of solutions mainly concentrate on local social regulations that pertain to specific scenarios. Here, we report an automatic social operating system (ASOS) designed for general social solution generation, which is built upon agent-based models, enabling both global and local analyses and regulations of social problems across spatial and temporal dimensions. ASOS adopts a hypergraph with extensible social semantics for a comprehensive and structured representation of social dynamics. It also incorporates a generalized protocol for standardized hypergraph operations and a symbolic hybrid framework that delivers interpretable solutions, yielding a balance between regulatory efficacy and function viability. To demonstrate the effectiveness of ASOS, we apply it to the domain of averting extreme events within international oil futures markets. By generating a new trading role supplemented by new mechanisms, ASOS can adeptly discern precarious market conditions and make front-running interventions for non-profit purposes. This study demonstrates that ASOS provides an efficient and systematic approach for generating solutions for enhancing our society. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Journal ref: Machine Intelligence Research 2024

arXiv:2401.06947 [pdf, other]

Parameter-Efficient Detoxification with Contrastive Decoding

Authors: Tong Niu, Caiming Xiong, Semih Yavuz, Yingbo Zhou

Abstract: The field of natural language generation has witnessed significant advancements in recent years, including the development of controllable text generation techniques. However, controlling the attributes of the generated text remains a challenge, especially when aiming to avoid undesirable behavior such as toxicity. In this work, we introduce Detoxification Generator (DETOXIGEN), an inference-time… ▽ More The field of natural language generation has witnessed significant advancements in recent years, including the development of controllable text generation techniques. However, controlling the attributes of the generated text remains a challenge, especially when aiming to avoid undesirable behavior such as toxicity. In this work, we introduce Detoxification Generator (DETOXIGEN), an inference-time algorithm that steers the generation away from unwanted styles. DETOXIGEN is an ensemble of a pre-trained language model (generator) and a detoxifier. The detoxifier is trained intentionally on the toxic data representative of the undesirable attribute, encouraging it to generate text in that style exclusively. During the actual generation, we use the trained detoxifier to produce undesirable tokens for the generator to contrast against at each decoding step. This approach directly informs the generator to avoid generating tokens that the detoxifier considers highly likely. We evaluate DETOXIGEN on the commonly used REALTOXICITYPROMPTS benchmark (Gehman et al., 2020) with various language models as generators. We find that it significantly outperforms previous approaches in detoxification metrics while not compromising on the generation quality. Moreover, the detoxifier is obtained by soft prompt-tuning using the same backbone language model as the generator. Hence, DETOXIGEN requires only a tiny amount of extra weights from the virtual tokens of the detoxifier to be loaded into GPU memory while decoding, making it a promising lightweight, practical, and parameter-efficient detoxification strategy. △ Less

Submitted 12 January, 2024; originally announced January 2024.

arXiv:2311.10952 [pdf, other]

NAS-ASDet: An Adaptive Design Method for Surface Defect Detection Network using Neural Architecture Search

Authors: Zhenrong Wang, Bin Li, Weifeng Li, Shuanlong Niu, Wang Miao, Tongzhi Niu

Abstract: Deep convolutional neural networks (CNNs) have been widely used in surface defect detection. However, no CNN architecture is suitable for all detection tasks and designing effective task-specific requires considerable effort. The neural architecture search (NAS) technology makes it possible to automatically generate adaptive data-driven networks. Here, we propose a new method called NAS-ASDet to a… ▽ More Deep convolutional neural networks (CNNs) have been widely used in surface defect detection. However, no CNN architecture is suitable for all detection tasks and designing effective task-specific requires considerable effort. The neural architecture search (NAS) technology makes it possible to automatically generate adaptive data-driven networks. Here, we propose a new method called NAS-ASDet to adaptively design network for surface defect detection. First, a refined and industry-appropriate search space that can adaptively adjust the feature distribution is designed, which consists of repeatedly stacked basic novel cells with searchable attention operations. Then, a progressive search strategy with a deep supervision mechanism is used to explore the search space faster and better. This method can design high-performance and lightweight defect detection networks with data scarcity in industrial scenarios. The experimental results on four datasets demonstrate that the proposed method achieves superior performance and a relatively lighter model size compared to other competitive methods, including both manual and NAS-based approaches. △ Less

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2310.20170 [pdf, other]

DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text

Authors: Wenting Zhao, Ye Liu, Tong Niu, Yao Wan, Philip S. Yu, Shafiq Joty, Yingbo Zhou, Semih Yavuz

Abstract: Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when solely relying on their internal knowledge, especially when answering questions that require less commonly known information. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge. Nonetheless, recent approaches have primarily emphasi… ▽ More Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when solely relying on their internal knowledge, especially when answering questions that require less commonly known information. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge. Nonetheless, recent approaches have primarily emphasized retrieval from unstructured text corpora, owing to its seamless integration into prompts. When using structured data such as knowledge graphs, most methods simplify it into natural text, neglecting the underlying structures. Moreover, a significant gap in the current landscape is the absence of a realistic benchmark for evaluating the effectiveness of grounding LLMs on heterogeneous knowledge sources (e.g., knowledge base and text). To fill this gap, we have curated a comprehensive dataset that poses two unique challenges: (1) Two-hop multi-source questions that require retrieving information from both open-domain structured and unstructured knowledge sources; retrieving information from structured knowledge sources is a critical component in correctly answering the questions. (2) The generation of symbolic queries (e.g., SPARQL for Wikidata) is a key requirement, which adds another layer of challenge. Our dataset is created using a combination of automatic generation through predefined reasoning chains and human annotation. We also introduce a novel approach that leverages multiple retrieval tools, including text passage retrieval and symbolic language-assisted retrieval. Our model outperforms previous approaches by a significant margin, demonstrating its effectiveness in addressing the above-mentioned reasoning challenges. △ Less

Submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.09843 [pdf]

CoCoFormer: A controllable feature-rich polyphonic music generation method

Authors: Jiuyang Zhou, Tengfei Niu, Hong Zhu, Xingping Wang

Abstract: This paper explores the modeling method of polyphonic music sequence. Due to the great potential of Transformer models in music generation, controllable music generation is receiving more attention. In the task of polyphonic music, current controllable generation research focuses on controlling the generation of chords, but lacks precise adjustment for the controllable generation of choral music t… ▽ More This paper explores the modeling method of polyphonic music sequence. Due to the great potential of Transformer models in music generation, controllable music generation is receiving more attention. In the task of polyphonic music, current controllable generation research focuses on controlling the generation of chords, but lacks precise adjustment for the controllable generation of choral music textures. This paper proposed Condition Choir Transformer (CoCoFormer) which controls the output of the model by controlling the chord and rhythm inputs at a fine-grained level. In this paper, the self-supervised method improves the loss function and performs joint training through conditional control input and unconditional input training. In order to alleviate the lack of diversity on generated samples caused by the teacher forcing training, this paper added an adversarial training method. CoCoFormer enhances model performance with explicit and implicit inputs to chords and rhythms. In this paper, the experiments proves that CoCoFormer has reached the current better level than current models. On the premise of specifying the polyphonic music texture, the same melody can also be generated in a variety of ways. △ Less

Submitted 27 November, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

arXiv:2309.03450 [pdf, other]

XGen-7B Technical Report

Authors: Erik Nijkamp, Tian Xie, Hiroaki Hayashi, Bo Pang, Congying Xia, Chen Xing, Jesse Vig, Semih Yavuz, Philippe Laban, Ben Krause, Senthil Purushwalkam, Tong Niu, Wojciech Kryściński, Lidiya Murakhovs'ka, Prafulla Kumar Choubey, Alex Fabbri, Ye Liu, Rui Meng, Lifu Tu, Meghana Bhat, Chien-Sheng Wu, Silvio Savarese, Yingbo Zhou, Shafiq Joty, Caiming Xiong

Abstract: Large Language Models (LLMs) have become ubiquitous across various domains, transforming the way we interact with information and conduct research. However, most high-performing LLMs remain confined behind proprietary walls, hindering scientific progress. Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, which is a key requirement for many t… ▽ More Large Language Models (LLMs) have become ubiquitous across various domains, transforming the way we interact with information and conduct research. However, most high-performing LLMs remain confined behind proprietary walls, hindering scientific progress. Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, which is a key requirement for many tasks that require inference over an input context. To address this, we have trained XGen, a series of 7B parameter models on up to 8K sequence length for up to 1.5T tokens. We have also finetuned the XGen models on public-domain instructional data, creating their instruction-tuned counterparts (XGen-Inst). We open-source our models for both research advancements and commercial applications. Our evaluation on standard benchmarks shows that XGen models achieve comparable or better results when compared with state-of-the-art open-source LLMs. Our targeted evaluation on long sequence modeling tasks shows the benefits of our 8K-sequence models over 2K-sequence open-source LLMs. △ Less

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2308.05655 [pdf]

Attention-based 3D CNN with Multi-layer Features for Alzheimer's Disease Diagnosis using Brain Images

Authors: Yanteng Zhang, Qizhi Teng, Xiaohai He, Tong Niu, Lipei Zhang, Yan Liu, Chao Ren

Abstract: Structural MRI and PET imaging play an important role in the diagnosis of Alzheimer's disease (AD), showing the morphological changes and glucose metabolism changes in the brain respectively. The manifestations in the brain image of some cognitive impairment patients are relatively inconspicuous, for example, it still has difficulties in achieving accurate diagnosis through sMRI in clinical practi… ▽ More Structural MRI and PET imaging play an important role in the diagnosis of Alzheimer's disease (AD), showing the morphological changes and glucose metabolism changes in the brain respectively. The manifestations in the brain image of some cognitive impairment patients are relatively inconspicuous, for example, it still has difficulties in achieving accurate diagnosis through sMRI in clinical practice. With the emergence of deep learning, convolutional neural network (CNN) has become a valuable method in AD-aided diagnosis, but some CNN methods cannot effectively learn the features of brain image, making the diagnosis of AD still presents some challenges. In this work, we propose an end-to-end 3D CNN framework for AD diagnosis based on ResNet, which integrates multi-layer features obtained under the effect of the attention mechanism to better capture subtle differences in brain images. The attention maps showed our model can focus on key brain regions related to the disease diagnosis. Our method was verified in ablation experiments with two modality images on 792 subjects from the ADNI database, where AD diagnostic accuracies of 89.71% and 91.18% were achieved based on sMRI and PET respectively, and also outperformed some state-of-the-art methods. △ Less

Submitted 10 August, 2023; originally announced August 2023.

Comments: 4 pages, 4 figures

Journal ref: The 45th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2023

arXiv:2302.10574 [pdf, other]

MulGT: Multi-task Graph-Transformer with Task-aware Knowledge Injection and Domain Knowledge-driven Pooling for Whole Slide Image Analysis

Authors: Weiqin Zhao, Shujun Wang, Maximus Yeung, Tianye Niu, Lequan Yu

Abstract: Whole slide image (WSI) has been widely used to assist automated diagnosis under the deep learning fields. However, most previous works only discuss the SINGLE task setting which is not aligned with real clinical setting, where pathologists often conduct multiple diagnosis tasks simultaneously. Also, it is commonly recognized that the multi-task learning paradigm can improve learning efficiency by… ▽ More Whole slide image (WSI) has been widely used to assist automated diagnosis under the deep learning fields. However, most previous works only discuss the SINGLE task setting which is not aligned with real clinical setting, where pathologists often conduct multiple diagnosis tasks simultaneously. Also, it is commonly recognized that the multi-task learning paradigm can improve learning efficiency by exploiting commonalities and differences across multiple tasks. To this end, we present a novel multi-task framework (i.e., MulGT) for WSI analysis by the specially designed Graph-Transformer equipped with Task-aware Knowledge Injection and Domain Knowledge-driven Graph Pooling modules. Basically, with the Graph Neural Network and Transformer as the building commons, our framework is able to learn task-agnostic low-level local information as well as task-specific high-level global representation. Considering that different tasks in WSI analysis depend on different features and properties, we also design a novel Task-aware Knowledge Injection module to transfer the task-shared graph embedding into task-specific feature spaces to learn more accurate representation for different tasks. Further, we elaborately design a novel Domain Knowledge-driven Graph Pooling module for each task to improve both the accuracy and robustness of different tasks by leveraging different diagnosis patterns of multiple tasks. We evaluated our method on two public WSI datasets from TCGA projects, i.e., esophageal carcinoma and kidney carcinoma. Experimental results show that our method outperforms single-task counterparts and the state-of-theart methods on both tumor typing and staging tasks. △ Less

Submitted 30 March, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: AAAI 2023

arXiv:2301.10441 [pdf, other]

Learning Trustworthy Model from Noisy Labels based on Rough Set for Surface Defect Detection

Authors: Tongzhi Niu, Bin Li, Kai Li, Yufeng Lin, Yuwei Li, Weifeng Li, Zhenrong Wang

Abstract: In the surface defect detection, there are some suspicious regions that cannot be uniquely classified as abnormal or normal. The annotating of suspicious regions is easily affected by factors such as workers' emotional fluctuations and judgment standard, resulting in noisy labels, which in turn leads to missing and false detections, and ultimately leads to inconsistent judgments of product quality… ▽ More In the surface defect detection, there are some suspicious regions that cannot be uniquely classified as abnormal or normal. The annotating of suspicious regions is easily affected by factors such as workers' emotional fluctuations and judgment standard, resulting in noisy labels, which in turn leads to missing and false detections, and ultimately leads to inconsistent judgments of product quality. Unlike the usual noisy labels, the ones used for surface defect detection appear to be inconsistent rather than mislabeled. The noise occurs in almost every label and is difficult to correct or evaluate. In this paper, we proposed a framework that learns trustworthy models from noisy labels for surface defect defection. At first, to avoid the negative impact of noisy labels on the model, we represent the suspicious regions with consistent and precise elements at the pixel-level and redesign the loss function. Secondly, without changing network structure and adding any extra labels, pluggable spatially correlated Bayesian module is proposed. Finally, the defect discrimination confidence is proposed to measure the uncertainty, with which anomalies can be identified as defects. Our results indicate not only the effectiveness of the proposed method in learning from noisy labels, but also robustness and real-time performance. △ Less

Submitted 25 January, 2023; originally announced January 2023.

Comments: 12 pages, 8figures

arXiv:2211.02256 [pdf]

ISA-Net: Improved spatial attention network for PET-CT tumor segmentation

Authors: Zhengyong Huang, Sijuan Zou, Guoshuai Wang, Zixiang Chen, Hao Shen, Haiyan Wang, Na Zhang, Lu Zhang, Fan Yang, Haining Wangg, Dong Liang, Tianye Niu, Xiaohua Zhuc, Zhanli Hua

Abstract: Achieving accurate and automated tumor segmentation plays an important role in both clinical practice and radiomics research. Segmentation in medicine is now often performed manually by experts, which is a laborious, expensive and error-prone task. Manual annotation relies heavily on the experience and knowledge of these experts. In addition, there is much intra- and interobserver variation. There… ▽ More Achieving accurate and automated tumor segmentation plays an important role in both clinical practice and radiomics research. Segmentation in medicine is now often performed manually by experts, which is a laborious, expensive and error-prone task. Manual annotation relies heavily on the experience and knowledge of these experts. In addition, there is much intra- and interobserver variation. Therefore, it is of great significance to develop a method that can automatically segment tumor target regions. In this paper, we propose a deep learning segmentation method based on multimodal positron emission tomography-computed tomography (PET-CT), which combines the high sensitivity of PET and the precise anatomical information of CT. We design an improved spatial attention network(ISA-Net) to increase the accuracy of PET or CT in detecting tumors, which uses multi-scale convolution operation to extract feature information and can highlight the tumor region location information and suppress the non-tumor region location information. In addition, our network uses dual-channel inputs in the coding stage and fuses them in the decoding stage, which can take advantage of the differences and complementarities between PET and CT. We validated the proposed ISA-Net method on two clinical datasets, a soft tissue sarcoma(STS) and a head and neck tumor(HECKTOR) dataset, and compared with other attention methods for tumor segmentation. The DSC score of 0.8378 on STS dataset and 0.8076 on HECKTOR dataset show that ISA-Net method achieves better segmentation performance and has better generalization. Conclusions: The method proposed in this paper is based on multi-modal medical image tumor segmentation, which can effectively utilize the difference and complementarity of different modes. The method can also be applied to other multi-modal data or single-modal data by proper adjustment. △ Less

Submitted 4 November, 2022; originally announced November 2022.

arXiv:2208.03879 [pdf, other]

Clear Memory-Augmented Auto-Encoder for Surface Defect Detection

Authors: Wei Luo, Tongzhi Niu, Lixin Tang, Wenyong Yu, Bin Li

Abstract: In surface defect detection, due to the extreme imbalance in the number of positive and negative samples, positive-samples-based anomaly detection methods have received more and more attention. Specifically, reconstruction-based methods are the most popular. However, existing methods are either difficult to repair abnormal foregrounds or reconstruct clear backgrounds. Therefore, we propose a clear… ▽ More In surface defect detection, due to the extreme imbalance in the number of positive and negative samples, positive-samples-based anomaly detection methods have received more and more attention. Specifically, reconstruction-based methods are the most popular. However, existing methods are either difficult to repair abnormal foregrounds or reconstruct clear backgrounds. Therefore, we propose a clear memory-augmented auto-encoder (CMA-AE). At first, we propose a novel clear memory-augmented module (CMAM), which combines the encoding and memoryencoding in a way of forgetting and inputting, thereby repairing abnormal foregrounds and preserving clear backgrounds. Secondly, a general artificial anomaly generation algorithm (GAAGA) is proposed to simulate anomalies that are as realistic and feature-rich as possible. At last, we propose a novel multi scale feature residual detection method (MSFR) for defect segmentation, which makes the defect location more accurate. Extensive comparison experiments demonstrate that CMA-AE achieves state-of-the-art detection accuracy and shows great potential in industrial applications. △ Less

Submitted 13 February, 2023; v1 submitted 7 August, 2022; originally announced August 2022.

Comments: 12 pages

arXiv:2207.10491 [pdf, ps, other]

doi 10.1016/j.ffa.2023.102187

More constructions of $n$-cycle permutations

Authors: Tailin Niu, Kangquan Li, Longjiang Qu, Bing Sun

Abstract: $n$-cycle permutations with small $n$ have the advantage that their compositional inverses are efficient in terms of implementation. They can be also used in constructing Bent functions and designing codes. Since the AGW Criterion was proposed, the permuting property of several forms of polynomials has been studied. In this paper, characterizations of several types of $n… ▽ More $n$-cycle permutations with small $n$ have the advantage that their compositional inverses are efficient in terms of implementation. They can be also used in constructing Bent functions and designing codes. Since the AGW Criterion was proposed, the permuting property of several forms of polynomials has been studied. In this paper, characterizations of several types of $n$-cycle permutations are investigated. Three criteria for $ n $-cycle permutations of the form $xh(λ(x))$, $ h(ψ(x)) \varphi(x)+g(ψ(x)) $ and $g\left( x^{q^i} -x +δ\right) +bx $ with general $n$ are provided. We demonstrate these criteria by providing explicit constructions. For the form of $x^rh(x^s)$, several new explicit triple-cycle permutations are also provided. Finally, we also consider triple-cycle permutations of the form $x^t + c\rm Tr_{q^m/q}(x^s)$ and provide one explicit construction. Many of our constructions are both new in the $n$-cycle property and the permutation property. △ Less

Submitted 5 August, 2022; v1 submitted 21 July, 2022; originally announced July 2022.

Comments: 20 pages. Typos were corrected and marked in blue. arXiv admin note: text overlap with arXiv:2007.14865 by other authors

MSC Class: 11T06

arXiv:2207.08401 [pdf]

Marvista: Exploring the Design of a Human-AI Collaborative News Reading Tool

Authors: Xiang 'Anthony' Chen, Chien-Sheng Wu, Lidiya Murakhovs'ka, Philippe Laban, Tong Niu, Wenhao Liu, Caiming Xiong

Abstract: We explore the design of Marvista -- a human-AI collaborative tool that employs a suite of natural language processing models to provide end-to-end support for reading online news articles. Before reading an article, Marvista helps a user plan what to read by filtering text based on how much time one can spend and what questions one is interested to find out from the article. During reading, Marvi… ▽ More We explore the design of Marvista -- a human-AI collaborative tool that employs a suite of natural language processing models to provide end-to-end support for reading online news articles. Before reading an article, Marvista helps a user plan what to read by filtering text based on how much time one can spend and what questions one is interested to find out from the article. During reading, Marvista helps the user reflect on their understanding of each paragraph with AI-generated questions. After reading, Marvista generates an explainable human-AI summary that combines both AI's processing of the text, the user's reading behavior, and user-generated data in the reading process. In contrast to prior work that offered (content-independent) interaction techniques or devices for reading, Marvista takes a human-AI collaborative approach that contributes text-specific guidance (content-aware) to support the entire reading process. △ Less

Submitted 23 June, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

arXiv:2206.08224 [pdf, other]

Multi scale Feature Extraction and Fusion for Online Knowledge Distillation

Authors: Panpan Zou, Yinglei Teng, Tao Niu

Abstract: Online knowledge distillation conducts knowledge transfer among all student models to alleviate the reliance on pre-trained models. However, existing online methods rely heavily on the prediction distributions and neglect the further exploration of the representational knowledge. In this paper, we propose a novel Multi-scale Feature Extraction and Fusion method (MFEF) for online knowledge distilla… ▽ More Online knowledge distillation conducts knowledge transfer among all student models to alleviate the reliance on pre-trained models. However, existing online methods rely heavily on the prediction distributions and neglect the further exploration of the representational knowledge. In this paper, we propose a novel Multi-scale Feature Extraction and Fusion method (MFEF) for online knowledge distillation, which comprises three key components: Multi-scale Feature Extraction, Dual-attention and Feature Fusion, towards generating more informative feature maps for distillation. The multiscale feature extraction exploiting divide-and-concatenate in channel dimension is proposed to improve the multi-scale representation ability of feature maps. To obtain more accurate information, we design a dual-attention to strengthen the important channel and spatial regions adaptively. Moreover, we aggregate and fuse the former processed feature maps via feature fusion to assist the training of student models. Extensive experiments on CIF AR-10, CIF AR-100, and CINIC-10 show that MFEF transfers more beneficial representational knowledge for distillation and outperforms alternative methods among various network architectures △ Less

Submitted 16 June, 2022; originally announced June 2022.

Comments: 12 pages, 3 figures

arXiv:2206.08186 [pdf, other]

Asymptotic Soft Cluster Pruning for Deep Neural Networks

Authors: Tao Niu, Yinglei Teng, Panpan Zou

Abstract: Filter pruning method introduces structural sparsity by removing selected filters and is thus particularly effective for reducing complexity. Previous works empirically prune networks from the point of view that filter with smaller norm contributes less to the final results. However, such criteria has been proven sensitive to the distribution of filters, and the accuracy may hard to recover since… ▽ More Filter pruning method introduces structural sparsity by removing selected filters and is thus particularly effective for reducing complexity. Previous works empirically prune networks from the point of view that filter with smaller norm contributes less to the final results. However, such criteria has been proven sensitive to the distribution of filters, and the accuracy may hard to recover since the capacity gap is fixed once pruned. In this paper, we propose a novel filter pruning method called Asymptotic Soft Cluster Pruning (ASCP), to identify the redundancy of network based on the similarity of filters. Each filter from over-parameterized network is first distinguished by clustering, and then reconstructed to manually introduce redundancy into it. Several guidelines of clustering are proposed to better preserve feature extraction ability. After reconstruction, filters are allowed to be updated to eliminate the effect caused by mistakenly selected. Besides, various decaying strategies of the pruning rate are adopted to stabilize the pruning process and improve the final performance as well. By gradually generating more identical filters within each cluster, ASCP can remove them through channel addition operation with almost no accuracy drop. Extensive experiments on CIFAR-10 and ImageNet datasets show that our method can achieve competitive results compared with many state-of-the-art algorithms. △ Less

Submitted 16 June, 2022; originally announced June 2022.

arXiv:2205.08605 [pdf, other]

OneAligner: Zero-shot Cross-lingual Transfer with One Rich-Resource Language Pair for Low-Resource Sentence Retrieval

Authors: Tong Niu, Kazuma Hashimoto, Yingbo Zhou, Caiming Xiong

Abstract: Aligning parallel sentences in multilingual corpora is essential to curating data for downstream applications such as Machine Translation. In this work, we present OneAligner, an alignment model specially designed for sentence retrieval tasks. This model is able to train on only one language pair and transfers, in a cross-lingual fashion, to low-resource language pairs with negligible degradation… ▽ More Aligning parallel sentences in multilingual corpora is essential to curating data for downstream applications such as Machine Translation. In this work, we present OneAligner, an alignment model specially designed for sentence retrieval tasks. This model is able to train on only one language pair and transfers, in a cross-lingual fashion, to low-resource language pairs with negligible degradation in performance. When trained with all language pairs of a large-scale parallel multilingual corpus (OPUS-100), this model achieves the state-of-the-art result on the Tateoba dataset, outperforming an equally-sized previous model by 8.0 points in accuracy while using less than 0.6% of their parallel data. When finetuned on a single rich-resource language pair, be it English-centered or not, our model is able to match the performance of the ones finetuned on all language pairs under the same data budget with less than 2.0 points decrease in accuracy. Furthermore, with the same setup, scaling up the number of rich-resource language pairs monotonically improves the performance, reaching a minimum of 0.4 points discrepancy in accuracy, making it less mandatory to collect any low-resource parallel data. Finally, we conclude through empirical results and analyses that the performance of the sentence alignment task depends mostly on the monolingual and parallel data size, up to a certain size threshold, rather than on what language pairs are used for training or evaluation. △ Less

Submitted 17 May, 2022; originally announced May 2022.

Comments: Accepted to Findings of ACL 2022

arXiv:2201.10290 [pdf, ps, other]

doi 10.1016/j.ffa.2022.102126

Characterizations and constructions of n-to-1 mappings over finite fields

Authors: Tailin Niu, Kangquan Li, Longjiang Qu, Chao Li

Abstract: $n$-to-$1$ mappings have wide applications in many areas, especially in cryptography, finite geometry, coding theory and combinatorial design. In this paper, many classes of $n$-to-$1$ mappings over finite fields are studied. First, we provide a characterization of general $n$-to-$1$ mappings over $\mathbb{F}_{p^m}$ by means of the Walsh transform. Then, we completely determine $3$-to-$1… ▽ More $n$-to-$1$ mappings have wide applications in many areas, especially in cryptography, finite geometry, coding theory and combinatorial design. In this paper, many classes of $n$-to-$1$ mappings over finite fields are studied. First, we provide a characterization of general $n$-to-$1$ mappings over $\mathbb{F}_{p^m}$ by means of the Walsh transform. Then, we completely determine $3$-to-$1$ polynomials with degree no more than $4$ over $\mathbb{F}_{p^{m}}$. Furthermore, we obtain an AGW-like criterion for characterizing an equivalent relationship between the $n$-to-$1$ property of a mapping over finite set $A$ and that of another mapping over a subset of $A$. Finally, we apply the AGW-like criterion into several forms of polynomials and obtain some explicit $n$-to-$1$ mappings. Especially, three explicit constructions of the form $x^rh\left( x^s \right) $ from the cyclotomic perspective, and several classes of $n$-to-$1$ mappings of the form $ g\left( x^{q^k} -x +δ\right) +cx$ are provided. △ Less

Submitted 25 January, 2022; originally announced January 2022.

Comments: 23 pages, submitted to a journel

MSC Class: 11T06

arXiv:2201.02968 [pdf, other]

An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic

Authors: Tao Niu, Yinglei Teng, Zhu Han, Panpan Zou

Abstract: Recently, the applications of deep neural network (DNN) have been very prominent in many fields such as computer vision (CV) and natural language processing (NLP) due to its superior feature extraction performance. However, the high-dimension parameter model and large-scale mathematical calculation restrict the execution efficiency, especially for Internet of Things (IoT) devices. Different from t… ▽ More Recently, the applications of deep neural network (DNN) have been very prominent in many fields such as computer vision (CV) and natural language processing (NLP) due to its superior feature extraction performance. However, the high-dimension parameter model and large-scale mathematical calculation restrict the execution efficiency, especially for Internet of Things (IoT) devices. Different from the previous cloud/edge-only pattern that brings huge pressure for uplink communication and device-only fashion that undertakes unaffordable calculation strength, we highlight the collaborative computation between the device and edge for DNN models, which can achieve a good balance between the communication load and execution accuracy. Specifically, a systematic on-demand co-inference framework is proposed to exploit the multi-branch structure, in which the pre-trained Alexnet is right-sized through \emph{early-exit} and partitioned at an intermediate DNN layer. The integer quantization is enforced to further compress transmission bits. As a result, we establish a new Deep Reinforcement Learning (DRL) optimizer-Soft Actor Critic for discrete (SAC-d), which generates the \emph{exit point}, \emph{partition point}, and \emph{compressing bits} by soft policy iterations. Based on the latency and accuracy aware reward design, such an optimizer can well adapt to the complex environment like dynamic wireless channel and arbitrary CPU processing, and is capable of supporting the 5G URLLC. Real-world experiment on Raspberry Pi 4 and PC shows the outperformance of the proposed solution. △ Less

Submitted 9 January, 2022; originally announced January 2022.

arXiv:2110.08175 [pdf, other]

MixQG: Neural Question Generation with Mixed Answer Types

Authors: Lidiya Murakhovs'ka, Chien-Sheng Wu, Philippe Laban, Tong Niu, Wenhao Liu, Caiming Xiong

Abstract: Asking good questions is an essential ability for both human and machine intelligence. However, existing neural question generation approaches mainly focus on the short factoid type of answers. In this paper, we propose a neural question generator, MixQG, to bridge this gap. We combine 9 question answering datasets with diverse answer types, including yes/no, multiple-choice, extractive, and abstr… ▽ More Asking good questions is an essential ability for both human and machine intelligence. However, existing neural question generation approaches mainly focus on the short factoid type of answers. In this paper, we propose a neural question generator, MixQG, to bridge this gap. We combine 9 question answering datasets with diverse answer types, including yes/no, multiple-choice, extractive, and abstractive answers, to train a single generative model. We show with empirical results that our model outperforms existing work in both seen and unseen domains and can generate questions with different cognitive levels when conditioned on different answer types. Our code is released and well-integrated with the Huggingface library to facilitate various downstream applications. △ Less

Submitted 31 May, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

Comments: camera-ready version

arXiv:2010.12885 [pdf, other]

Unsupervised Paraphrasing with Pretrained Language Models

Authors: Tong Niu, Semih Yavuz, Yingbo Zhou, Nitish Shirish Keskar, Huan Wang, Caiming Xiong

Abstract: Paraphrase generation has benefited extensively from recent progress in the designing of training objectives and model architectures. However, previous explorations have largely focused on supervised methods, which require a large amount of labeled data that is costly to collect. To address this drawback, we adopt a transfer learning approach and propose a training pipeline that enables pre-traine… ▽ More Paraphrase generation has benefited extensively from recent progress in the designing of training objectives and model architectures. However, previous explorations have largely focused on supervised methods, which require a large amount of labeled data that is costly to collect. To address this drawback, we adopt a transfer learning approach and propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting. Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking (DB). To enforce a surface form dissimilar from the input, whenever the language model emits a token contained in the source sequence, DB prevents the model from outputting the subsequent source token for the next generation step. We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair (QQP) and the ParaNMT datasets and is robust to domain shift between the two datasets of distinct distributions. We also demonstrate that our model transfers to paraphrasing in other languages without any additional finetuning. △ Less

Submitted 10 September, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

Comments: Accepted at EMNLP 2021 main conference

arXiv:2010.12850 [pdf, other]

CoCo: Controllable Counterfactuals for Evaluating Dialogue State Trackers

Authors: Shiyang Li, Semih Yavuz, Kazuma Hashimoto, Jia Li, Tong Niu, Nazneen Rajani, Xifeng Yan, Yingbo Zhou, Caiming Xiong

Abstract: Dialogue state trackers have made significant progress on benchmark datasets, but their generalization capability to novel and realistic scenarios beyond the held-out conversations is less understood. We propose controllable counterfactuals (CoCo) to bridge this gap and evaluate dialogue state tracking (DST) models on novel scenarios, i.e., would the system successfully tackle the request if the u… ▽ More Dialogue state trackers have made significant progress on benchmark datasets, but their generalization capability to novel and realistic scenarios beyond the held-out conversations is less understood. We propose controllable counterfactuals (CoCo) to bridge this gap and evaluate dialogue state tracking (DST) models on novel scenarios, i.e., would the system successfully tackle the request if the user responded differently but still consistently with the dialogue flow? CoCo leverages turn-level belief states as counterfactual conditionals to produce novel conversation scenarios in two steps: (i) counterfactual goal generation at turn-level by dropping and adding slots followed by replacing slot values, (ii) counterfactual conversation generation that is conditioned on (i) and consistent with the dialogue flow. Evaluating state-of-the-art DST models on MultiWOZ dataset with CoCo-generated counterfactuals results in a significant performance drop of up to 30.8% (from 49.4% to 18.6%) in absolute joint goal accuracy. In comparison, widely used techniques like paraphrasing only affect the accuracy by at most 2%. Human evaluations show that COCO-generated conversations perfectly reflect the underlying user goal with more than 95% accuracy and are as human-like as the original conversations, further strengthening its reliability and promise to be adopted as part of the robustness evaluation of DST models. △ Less

Submitted 26 March, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

Comments: ICLR 2021

arXiv:2010.12730 [pdf, other]

Char2Subword: Extending the Subword Embedding Space Using Robust Character Compositionality

Authors: Gustavo Aguilar, Bryan McCann, Tong Niu, Nazneen Rajani, Nitish Keskar, Thamar Solorio

Abstract: Byte-pair encoding (BPE) is a ubiquitous algorithm in the subword tokenization process of language models as it provides multiple benefits. However, this process is solely based on pre-training data statistics, making it hard for the tokenizer to handle infrequent spellings. On the other hand, though robust to misspellings, pure character-level models often lead to unreasonably long sequences and… ▽ More Byte-pair encoding (BPE) is a ubiquitous algorithm in the subword tokenization process of language models as it provides multiple benefits. However, this process is solely based on pre-training data statistics, making it hard for the tokenizer to handle infrequent spellings. On the other hand, though robust to misspellings, pure character-level models often lead to unreasonably long sequences and make it harder for the model to learn meaningful words. To alleviate these challenges, we propose a character-based subword module (char2subword) that learns the subword embedding table in pre-trained models like BERT. Our char2subword module builds representations from characters out of the subword vocabulary, and it can be used as a drop-in replacement of the subword embedding table. The module is robust to character-level alterations such as misspellings, word inflection, casing, and punctuation. We integrate it further with BERT through pre-training while keeping BERT transformer parameters fixed--and thus, providing a practical method. Finally, we show that incorporating our module to mBERT significantly improves the performance on the social media linguistic code-switching evaluation (LinCE) benchmark. △ Less

Submitted 23 September, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

Comments: Findings of EMNLP 2020

arXiv:2010.09030 [pdf, other]

Explaining and Improving Model Behavior with k Nearest Neighbor Representations

Authors: Nazneen Fatema Rajani, Ben Krause, Wengpeng Yin, Tong Niu, Richard Socher, Caiming Xiong

Abstract: Interpretability techniques in NLP have mainly focused on understanding individual predictions using attention visualization or gradient-based saliency maps over tokens. We propose using k nearest neighbor (kNN) representations to identify training examples responsible for a model's predictions and obtain a corpus-level understanding of the model's behavior. Apart from interpretability, we show th… ▽ More Interpretability techniques in NLP have mainly focused on understanding individual predictions using attention visualization or gradient-based saliency maps over tokens. We propose using k nearest neighbor (kNN) representations to identify training examples responsible for a model's predictions and obtain a corpus-level understanding of the model's behavior. Apart from interpretability, we show that kNN representations are effective at uncovering learned spurious associations, identifying mislabeled examples, and improving the fine-tuned model's performance. We focus on Natural Language Inference (NLI) as a case study and experiment with multiple datasets. Our method deploys backoff to kNN for BERT and RoBERTa on examples with low model confidence without any update to the model parameters. Our results indicate that the kNN approach makes the finetuned model more robust to adversarial inputs. △ Less

Submitted 18 October, 2020; originally announced October 2020.

arXiv:2004.12552 [pdf, ps, other]

doi 10.1109/TIT.2021.3089145

Finding compositional inverses of permutations from the AGW criterion

Authors: Tailin Niu, Kangquan Li, Longjiang Qu, Qiang Wang

Abstract: Permutation polynomials and their compositional inverses have wide applications in cryptography, coding theory, and combinatorial designs. Motivated by several previous results on finding compositional inverses of permutation polynomials of different forms, we propose a general method for finding these inverses of permutation polynomials constructed by the AGW criterion. As a result, we have reduc… ▽ More Permutation polynomials and their compositional inverses have wide applications in cryptography, coding theory, and combinatorial designs. Motivated by several previous results on finding compositional inverses of permutation polynomials of different forms, we propose a general method for finding these inverses of permutation polynomials constructed by the AGW criterion. As a result, we have reduced the problem of finding the compositional inverse of such a permutation polynomial over a finite field to that of finding the inverse of a bijection over a smaller set. We demonstrate our method by interpreting several recent known results, as well as by providing new explicit results on more classes of permutation polynomials in different types. In addition, we give new criteria for these permutation polynomials being involutions. Explicit constructions are also provided for all involutory criteria. △ Less

Submitted 1 April, 2021; v1 submitted 26 April, 2020; originally announced April 2020.

Comments: 24 pages. Revision submitted

MSC Class: 11T06

arXiv:2001.05467 [pdf, other]

AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses

Authors: Tong Niu, Mohit Bansal

Abstract: Many sequence-to-sequence dialogue models tend to generate safe, uninformative responses. There have been various useful efforts on trying to eliminate them. However, these approaches either improve decoding algorithms during inference, rely on hand-crafted features, or employ complex models. In our work, we build dialogue models that are dynamically aware of what utterances or tokens are dull wit… ▽ More Many sequence-to-sequence dialogue models tend to generate safe, uninformative responses. There have been various useful efforts on trying to eliminate them. However, these approaches either improve decoding algorithms during inference, rely on hand-crafted features, or employ complex models. In our work, we build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering. Specifically, we start with a simple yet effective automatic metric, AvgOut, which calculates the average output probability distribution of all time steps on the decoder side during training. This metric directly estimates which tokens are more likely to be generated, thus making it a faithful evaluation of the model diversity (i.e., for diverse models, the token probabilities should be more evenly distributed rather than peaked at a few dull tokens). We then leverage this novel metric to propose three models that promote diversity without losing relevance. The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch; the second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level; the third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal. Moreover, we experiment with a hybrid model by combining the loss terms of MinAvgOut and RL. All four models outperform their base LSTM-RNN model on both diversity and relevance by a large margin, and are comparable to or better than competitive baselines (also verified via human evaluation). Moreover, our approaches are orthogonal to the base model, making them applicable as an add-on to other emerging better dialogue models in the future. △ Less

Submitted 15 January, 2020; originally announced January 2020.

Comments: AAAI 2020 (8 pages)

arXiv:1909.12868 [pdf, other]

Automatically Learning Data Augmentation Policies for Dialogue Tasks

Authors: Tong Niu, Mohit Bansal

Abstract: Automatic data augmentation (AutoAugment) (Cubuk et al., 2019) searches for optimal perturbation policies via a controller trained using performance rewards of a sampled policy on the target task, hence reducing data-level model bias. While being a powerful algorithm, their work has focused on computer vision tasks, where it is comparatively easy to apply imperceptible perturbations without changi… ▽ More Automatic data augmentation (AutoAugment) (Cubuk et al., 2019) searches for optimal perturbation policies via a controller trained using performance rewards of a sampled policy on the target task, hence reducing data-level model bias. While being a powerful algorithm, their work has focused on computer vision tasks, where it is comparatively easy to apply imperceptible perturbations without changing an image's semantic meaning. In our work, we adapt AutoAugment to automatically discover effective perturbation policies for natural language processing (NLP) tasks such as dialogue generation. We start with a pool of atomic operations that apply subtle semantic-preserving perturbations to the source inputs of a dialogue task (e.g., different POS-tag types of stopword dropout, grammatical errors, and paraphrasing). Next, we allow the controller to learn more complex augmentation policies by searching over the space of the various combinations of these atomic operations. Moreover, we also explore conditioning the controller on the source inputs of the target task, since certain strategies may not apply to inputs that do not contain that strategy's required linguistic features. Empirically, we demonstrate that both our input-agnostic and input-aware controllers discover useful data augmentation policies, and achieve significant improvements over the previous state-of-the-art, including trained on manually-designed policies. △ Less

Submitted 27 September, 2019; originally announced September 2019.

Comments: 7 pages (EMNLP 2019)

arXiv:1909.03223 [pdf, other]

Deleter: Leveraging BERT to Perform Unsupervised Successive Text Compression

Authors: Tong Niu, Caiming Xiong, Richard Socher

Abstract: Text compression has diverse applications such as Summarization, Reading Comprehension and Text Editing. However, almost all existing approaches require either hand-crafted features, syntactic labels or parallel data. Even for one that achieves this task in an unsupervised setting, its architecture necessitates a task-specific autoencoder. Moreover, these models only generate one compressed senten… ▽ More Text compression has diverse applications such as Summarization, Reading Comprehension and Text Editing. However, almost all existing approaches require either hand-crafted features, syntactic labels or parallel data. Even for one that achieves this task in an unsupervised setting, its architecture necessitates a task-specific autoencoder. Moreover, these models only generate one compressed sentence for each source input, so that adapting to different style requirements (e.g. length) for the final output usually implies retraining the model from scratch. In this work, we propose a fully unsupervised model, Deleter, that is able to discover an "optimal deletion path" for an arbitrary sentence, where each intermediate sequence along the path is a coherent subsequence of the previous one. This approach relies exclusively on a pretrained bidirectional language model (BERT) to score each candidate deletion based on the average Perplexity of the resulting sentence and performs progressive greedy lookahead search to select the best deletion for each step. We apply Deleter to the task of extractive Sentence Compression, and found that our model is competitive with state-of-the-art supervised models trained on 1.02 million in-domain examples with similar compression ratio. Qualitative analysis, as well as automatic and human evaluations both verify that our model produces high-quality compression. △ Less

Submitted 7 September, 2019; originally announced September 2019.

Comments: 5 pages, 1 figure (presented @ WeCNLP)

arXiv:1905.13550 [pdf]

A novel hybrid model based on multi-objective Harris hawks optimization algorithm for daily PM2.5 and PM10 forecasting

Authors: Pei Du, Jianzhou Wang, Yan Hao, Tong Niu, Wendong Yang

Abstract: High levels of air pollution may seriously affect people's living environment and even endanger their lives. In order to reduce air pollution concentrations, and warn the public before the occurrence of hazardous air pollutants, it is urgent to design an accurate and reliable air pollutant forecasting model. However, most previous research have many deficiencies, such as ignoring the importance of… ▽ More High levels of air pollution may seriously affect people's living environment and even endanger their lives. In order to reduce air pollution concentrations, and warn the public before the occurrence of hazardous air pollutants, it is urgent to design an accurate and reliable air pollutant forecasting model. However, most previous research have many deficiencies, such as ignoring the importance of predictive stability, and poor initial parameters and so on, which have significantly effect on the performance of air pollution prediction. Therefore, to address these issues, a novel hybrid model is proposed in this study. Specifically, a powerful data preprocessing techniques is applied to decompose the original time series into different modes from low- frequency to high- frequency. Next, a new multi-objective algorithm called MOHHO is first developed in this study, which are introduced to tune the parameters of ELM model with high forecasting accuracy and stability for air pollution series prediction, simultaneously. And the optimized ELM model is used to perform the time series prediction. Finally, a scientific and robust evaluation system including several error criteria, benchmark models, and several experiments using six air pollutant concentrations time series from three cities in China is designed to perform a compressive assessment for the presented hybrid forecasting model. Experimental results indicate that the proposed hybrid model can guarantee a more stable and higher predictive performance compared to others, whose superior prediction ability may help to develop effective plans for air pollutant emissions and prevent health problems caused by air pollution. △ Less

Submitted 30 May, 2019; originally announced May 2019.

Comments: 24 pages, 4 figures

MSC Class: 68U20

arXiv:1809.02079 [pdf, ps, other]

Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models

Authors: Tong Niu, Mohit Bansal

Abstract: We present two categories of model-agnostic adversarial strategies that reveal the weaknesses of several generative, task-oriented dialogue models: Should-Not-Change strategies that evaluate over-sensitivity to small and semantics-preserving edits, as well as Should-Change strategies that test if a model is over-stable against subtle yet semantics-changing modifications. We next perform adversaria… ▽ More We present two categories of model-agnostic adversarial strategies that reveal the weaknesses of several generative, task-oriented dialogue models: Should-Not-Change strategies that evaluate over-sensitivity to small and semantics-preserving edits, as well as Should-Change strategies that test if a model is over-stable against subtle yet semantics-changing modifications. We next perform adversarial training with each strategy, employing a max-margin approach for negative generative examples. This not only makes the target dialogue model more robust to the adversarial inputs, but also helps it perform significantly better on the original inputs. Moreover, training on all strategies combined achieves further improvements, achieving a new state-of-the-art performance on the original task (also verified via human evaluation). In addition to adversarial training, we also address the robustness task at the model-level, by feeding it subword units as both inputs and outputs, and show that the resulting model is equally competitive, requires only 1/4 of the original vocabulary size, and is robust to one of the adversarial strategies (to which the original model is vulnerable) even without adversarial training. △ Less

Submitted 6 September, 2018; originally announced September 2018.

Comments: CoNLL 2018 (15 pages)

arXiv:1805.03162 [pdf, other]

Polite Dialogue Generation Without Parallel Data

Authors: Tong Niu, Mohit Bansal

Abstract: Stylistic dialogue response generation, with valuable applications in personality-based conversational agents, is a challenging task because the response needs to be fluent, contextually-relevant, as well as paralinguistically accurate. Moreover, parallel datasets for regular-to-stylistic pairs are usually unavailable. We present three weakly-supervised models that can generate diverse polite (or… ▽ More Stylistic dialogue response generation, with valuable applications in personality-based conversational agents, is a challenging task because the response needs to be fluent, contextually-relevant, as well as paralinguistically accurate. Moreover, parallel datasets for regular-to-stylistic pairs are usually unavailable. We present three weakly-supervised models that can generate diverse polite (or rude) dialogue responses without parallel data. Our late fusion model (Fusion) merges the decoder of an encoder-attention-decoder dialogue model with a language model trained on stand-alone polite utterances. Our label-fine-tuning (LFT) model prepends to each source sequence a politeness-score scaled label (predicted by our state-of-the-art politeness classifier) during training, and at test time is able to generate polite, neutral, and rude responses by simply scaling the label embedding by the corresponding score. Our reinforcement learning model (Polite-RL) encourages politeness generation by assigning rewards proportional to the politeness classifier score of the sampled response. We also present two retrieval-based polite dialogue model baselines. Human evaluation validates that while the Fusion and the retrieval-based models achieve politeness with poorer context-relevance, the LFT and Polite-RL models can produce significantly more polite responses without sacrificing dialogue quality. △ Less

Submitted 8 May, 2018; originally announced May 2018.

Comments: To Appear in TACL Journal (16 pages) (first submission cycle: Oct1 2017)

arXiv:1804.06440 [pdf, other]

Detecting Linguistic Characteristics of Alzheimer's Dementia by Interpreting Neural Models

Authors: Sweta Karlekar, Tong Niu, Mohit Bansal

Abstract: Alzheimer's disease (AD) is an irreversible and progressive brain disease that can be stopped or slowed down with medical treatment. Language changes serve as a sign that a patient's cognitive functions have been impacted, potentially leading to early diagnosis. In this work, we use NLP techniques to classify and analyze the linguistic characteristics of AD patients using the DementiaBank dataset.… ▽ More Alzheimer's disease (AD) is an irreversible and progressive brain disease that can be stopped or slowed down with medical treatment. Language changes serve as a sign that a patient's cognitive functions have been impacted, potentially leading to early diagnosis. In this work, we use NLP techniques to classify and analyze the linguistic characteristics of AD patients using the DementiaBank dataset. We apply three neural models based on CNNs, LSTM-RNNs, and their combination, to distinguish between language samples from AD and control patients. We achieve a new independent benchmark accuracy for the AD classification task. More importantly, we next interpret what these neural models have learned about the linguistic characteristics of AD patients, via analysis based on activation clustering and first-derivative saliency techniques. We then perform novel automatic pattern discovery inside activation clusters, and consolidate AD patients' distinctive grammar patterns. Additionally, we show that first derivative saliency can not only rediscover previous language patterns of AD patients, but also shed light on the limitations of neural models. Lastly, we also include analysis of gender-separated AD data. △ Less

Submitted 17 April, 2018; originally announced April 2018.

Comments: NAACL 2018 (7 pages)

arXiv:1212.5956 [pdf, ps, other]

Interoperability and Standardization of Intercloud Cloud Computing

Authors: Jingxin K. Wang, Jianrui Ding, Tian Niu

Abstract: Cloud computing is getting mature, and the interoperability and standardization of the clouds is still waiting to be solved. This paper discussed the interoperability among clouds about message transmission, data transmission and virtual machine transfer. Starting from IEEE Pioneering Cloud Computing Initiative, this paper discussed about standardization of the cloud computing, especially interclo… ▽ More Cloud computing is getting mature, and the interoperability and standardization of the clouds is still waiting to be solved. This paper discussed the interoperability among clouds about message transmission, data transmission and virtual machine transfer. Starting from IEEE Pioneering Cloud Computing Initiative, this paper discussed about standardization of the cloud computing, especially intercloud cloud computing. This paper also discussed the standardization from the market-oriented view. △ Less

Submitted 24 December, 2012; originally announced December 2012.

Showing 1–33 of 33 results for author: Niu, T