Search | arXiv e-print repository

DTN: Deep Multiple Task-specific Feature Interactions Network for Multi-Task Recommendation

Authors: Yaowen Bi, Yuteng Lian, Jie Cui, Jun Liu, Peijian Wang, Guanghui Li, Xuejun Chen, Jinglin Zhao, Hao Wen, Jing Zhang, Zhaoqi Zhang, Wenzhuo Song, Yang Sun, Weiwei Zhang, Mingchen Cai, Guanxing Zhang

Abstract: Neural-based multi-task learning (MTL) has been successfully applied to many recommendation applications. However, these MTL models (e.g., MMoE, PLE) did not consider feature interaction during the optimization, which is crucial for capturing complex high-order features and has been widely used in ranking models for real-world recommender systems. Moreover, through feature importance analysis acro… ▽ More Neural-based multi-task learning (MTL) has been successfully applied to many recommendation applications. However, these MTL models (e.g., MMoE, PLE) did not consider feature interaction during the optimization, which is crucial for capturing complex high-order features and has been widely used in ranking models for real-world recommender systems. Moreover, through feature importance analysis across various tasks in MTL, we have observed an interesting divergence phenomenon that the same feature can have significantly different importance across different tasks in MTL. To address these issues, we propose Deep Multiple Task-specific Feature Interactions Network (DTN) with a novel model structure design. DTN introduces multiple diversified task-specific feature interaction methods and task-sensitive network in MTL networks, enabling the model to learn task-specific diversified feature interaction representations, which improves the efficiency of joint representation learning in a general setup. We applied DTN to our company's real-world E-commerce recommendation dataset, which consisted of over 6.3 billion samples, the results demonstrated that DTN significantly outperformed state-of-the-art MTL models. Moreover, during online evaluation of DTN in a large-scale E-commerce recommender system, we observed a 3.28% in clicks, a 3.10% increase in orders and a 2.70% increase in GMV (Gross Merchandise Value) compared to the state-of-the-art MTL models. Finally, extensive offline experiments conducted on public benchmark datasets demonstrate that DTN can be applied to various scenarios beyond recommendations, enhancing the performance of ranking models. △ Less

Submitted 23 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

arXiv:2407.13999 [pdf, other]

NeLLCom-X: A Comprehensive Neural-Agent Framework to Simulate Language Learning and Group Communication

Authors: Yuchen Lian, Tessa Verhoef, Arianna Bisazza

Abstract: Recent advances in computational linguistics include simulating the emergence of human-like languages with interacting neural network agents, starting from sets of random symbols. The recently introduced NeLLCom framework (Lian et al., 2023) allows agents to first learn an artificial language and then use it to communicate, with the aim of studying the emergence of specific linguistics properties.… ▽ More Recent advances in computational linguistics include simulating the emergence of human-like languages with interacting neural network agents, starting from sets of random symbols. The recently introduced NeLLCom framework (Lian et al., 2023) allows agents to first learn an artificial language and then use it to communicate, with the aim of studying the emergence of specific linguistics properties. We extend this framework (NeLLCom-X) by introducing more realistic role-alternating agents and group communication in order to investigate the interplay between language learnability, communication pressures, and group size effects. We validate NeLLCom-X by replicating key findings from prior research simulating the emergence of a word-order/case-marking trade-off. Next, we investigate how interaction affects linguistic convergence and emergence of the trade-off. The novel framework facilitates future simulations of diverse linguistic aspects, emphasizing the importance of interaction and group dynamics in language evolution. △ Less

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2402.13035 [pdf, other]

Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models

Authors: Che Zhang, Zhenyang Xiao, Chengcheng Han, Yixin Lian, Yuejian Fang

Abstract: Self-correction has achieved impressive results in enhancing the style and security of the generated output from large language models (LLMs). However, recent studies suggest that self-correction might be limited or even counterproductive in reasoning tasks due to LLMs' difficulties in identifying logical mistakes. In this paper, we aim to enhance the self-checking capabilities of LLMs by constr… ▽ More Self-correction has achieved impressive results in enhancing the style and security of the generated output from large language models (LLMs). However, recent studies suggest that self-correction might be limited or even counterproductive in reasoning tasks due to LLMs' difficulties in identifying logical mistakes. In this paper, we aim to enhance the self-checking capabilities of LLMs by constructing training data for checking tasks. Specifically, we apply the Chain of Thought (CoT) methodology to self-checking tasks, utilizing fine-grained step-level analyses and explanations to assess the correctness of reasoning paths. We propose a specialized checking format called "Step CoT Check". Following this format, we construct a checking-correction dataset that includes detailed step-by-step analysis and checking. Then we fine-tune LLMs to enhance their error detection and correction abilities. Our experiments demonstrate that fine-tuning with the "Step CoT Check" format significantly improves the self-checking and self-correction abilities of LLMs across multiple benchmarks. This approach outperforms other formats, especially in locating the incorrect position, with greater benefits observed in larger models. For reproducibility, all the datasets and code are provided in https://github.com/bammt/Learn-to-check. △ Less

Submitted 17 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.11903 [pdf, other]

DiLA: Enhancing LLM Tool Learning with Differential Logic Layer

Authors: Yu Zhang, Hui-Ling Zhen, Zehua Pei, Yingzhao Lian, Lihao Yin, Mingxuan Yuan, Bei Yu

Abstract: Considering the challenges faced by large language models (LLMs) in logical reasoning and planning, prior efforts have sought to augment LLMs with access to external solvers. While progress has been made on simple reasoning problems, solving classical constraint satisfaction problems, such as the Boolean Satisfiability Problem (SAT) and Graph Coloring Problem (GCP), remains difficult for off-the-s… ▽ More Considering the challenges faced by large language models (LLMs) in logical reasoning and planning, prior efforts have sought to augment LLMs with access to external solvers. While progress has been made on simple reasoning problems, solving classical constraint satisfaction problems, such as the Boolean Satisfiability Problem (SAT) and Graph Coloring Problem (GCP), remains difficult for off-the-shelf solvers due to their intricate expressions and exponential search spaces. In this paper, we propose a novel differential logic layer-aided language modeling (DiLA) approach, where logical constraints are integrated into the forward and backward passes of a network layer, to provide another option for LLM tool learning. In DiLA, LLM aims to transform the language description to logic constraints and identify initial solutions of the highest quality, while the differential logic layer focuses on iteratively refining the LLM-prompted solution. Leveraging the logic layer as a bridge, DiLA enhances the logical reasoning ability of LLMs on a range of reasoning problems encoded by Boolean variables, guaranteeing the efficiency and correctness of the solution process. We evaluate the performance of DiLA on two classic reasoning problems and empirically demonstrate its consistent outperformance against existing prompt-based and solver-aided approaches. △ Less

Submitted 18 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: arXiv admin note: text overlap with arXiv:2305.12295 by other authors

arXiv:2311.16442 [pdf, other]

Fast and Efficient 2-bit LLM Inference on GPU: 2/4/16-bit in a Weight Matrix with Asynchronous Dequantization

Authors: Jinhao Li, Jiaming Xu, Shiyao Li, Shan Huang, Jun Liu, Yaoxiu Lian, Guohao Dai

Abstract: Large language models (LLMs) have demonstrated impressive abilities in various domains while the inference cost is expensive. Many previous studies exploit quantization methods to reduce LLM inference cost by reducing latency and memory consumption. Applying 2-bit single-precision weight quantization brings >3% accuracy loss, so the state-of-the-art methods use mixed-precision methods for LLMs (e.… ▽ More Large language models (LLMs) have demonstrated impressive abilities in various domains while the inference cost is expensive. Many previous studies exploit quantization methods to reduce LLM inference cost by reducing latency and memory consumption. Applying 2-bit single-precision weight quantization brings >3% accuracy loss, so the state-of-the-art methods use mixed-precision methods for LLMs (e.g. Llama2-7b, etc.) to improve the accuracy. However, challenges still exist: (1) Uneven distribution in weight matrix. (2) Large speed degradation by adding sparse outliers. (3) Time-consuming dequantization operations on GPUs. To tackle these challenges and enable fast and efficient LLM inference on GPUs, we propose the following techniques in this paper. (1) Intra-weight mixed-precision quantization. (2) Exclusive 2-bit sparse outlier with minimum speed degradation. (3) Asynchronous dequantization. We conduct extensive experiments on different model families (e.g. Llama3, etc.) and model sizes. We achieve 2.91-bit for each weight considering all scales/zeros for different models with negligible loss. As a result, with our 2/4/16 mixed-precision quantization for each weight matrix and asynchronous dequantization during inference, our design achieves an end-to-end speedup for Llama2-7b is 1.74x over the original model, and we reduce both runtime cost and total cost by up to 2.53x and 2.29x with less GPU requirements. △ Less

Submitted 1 July, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

arXiv:2310.05074 [pdf, other]

DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models

Authors: Chengcheng Han, Xiaowei Du, Che Zhang, Yixin Lian, Xiang Li, Ming Gao, Baoyuan Wang

Abstract: Chain-of-Thought (CoT) prompting has proven to be effective in enhancing the reasoning capabilities of Large Language Models (LLMs) with at least 100 billion parameters. However, it is ineffective or even detrimental when applied to reasoning tasks in Smaller Language Models (SLMs) with less than 10 billion parameters. To address this limitation, we introduce Dialogue-guided Chain-of-Thought (Dial… ▽ More Chain-of-Thought (CoT) prompting has proven to be effective in enhancing the reasoning capabilities of Large Language Models (LLMs) with at least 100 billion parameters. However, it is ineffective or even detrimental when applied to reasoning tasks in Smaller Language Models (SLMs) with less than 10 billion parameters. To address this limitation, we introduce Dialogue-guided Chain-of-Thought (DialCoT) which employs a dialogue format to generate intermediate reasoning steps, guiding the model toward the final answer. Additionally, we optimize the model's reasoning path selection using the Proximal Policy Optimization (PPO) algorithm, further enhancing its reasoning capabilities. Our method offers several advantages compared to previous approaches. Firstly, we transform the process of solving complex reasoning questions by breaking them down into a series of simpler sub-questions, significantly reducing the task difficulty and making it more suitable for SLMs. Secondly, we optimize the model's reasoning path selection through the PPO algorithm. We conduct comprehensive experiments on four arithmetic reasoning datasets, demonstrating that our method achieves significant performance improvements compared to state-of-the-art competitors. △ Less

Submitted 23 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP 2023

arXiv:2310.02629 [pdf, other]

BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition

Authors: Peikun Chen, Fan Yu, Yuhao Lian, Hongfei Xue, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie

Abstract: Mixture-of-experts based models, which use language experts to extract language-specific representations effectively, have been well applied in code-switching automatic speech recognition. However, there is still substantial space to improve as similar pronunciation across languages may result in ineffective multi-language modeling and inaccurate language boundary estimation. To eliminate these dr… ▽ More Mixture-of-experts based models, which use language experts to extract language-specific representations effectively, have been well applied in code-switching automatic speech recognition. However, there is still substantial space to improve as similar pronunciation across languages may result in ineffective multi-language modeling and inaccurate language boundary estimation. To eliminate these drawbacks, we propose a cross-layer language adapter and a boundary-aware training method, namely Boundary-Aware Mixture-of-Experts (BA-MoE). Specifically, we introduce language-specific adapters to separate language-specific representations and a unified gating layer to fuse representations within each encoder layer. Second, we compute language adaptation loss of the mean output of each language-specific adapter to improve the adapter module's language-specific representation learning. Besides, we utilize a boundary-aware predictor to learn boundary representations for dealing with language boundary confusion. Our approach achieves significant performance improvement, reducing the mixture error rate by 16.55\% compared to the baseline on the ASRU 2019 Mandarin-English code-switching challenge dataset. △ Less

Submitted 7 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: Accepted by ASRU2023

arXiv:2306.08401 [pdf, other]

LiveChat: A Large-Scale Personalized Dialogue Dataset Automatically Constructed from Live Streaming

Authors: Jingsheng Gao, Yixin Lian, Ziyi Zhou, Yuzhuo Fu, Baoyuan Wang

Abstract: Open-domain dialogue systems have made promising progress in recent years. While the state-of-the-art dialogue agents are built upon large-scale text-based social media data and large pre-trained models, there is no guarantee these agents could also perform well in fast-growing scenarios, such as live streaming, due to the bounded transferability of pre-trained models and biased distributions of p… ▽ More Open-domain dialogue systems have made promising progress in recent years. While the state-of-the-art dialogue agents are built upon large-scale text-based social media data and large pre-trained models, there is no guarantee these agents could also perform well in fast-growing scenarios, such as live streaming, due to the bounded transferability of pre-trained models and biased distributions of public datasets from Reddit and Weibo, etc. To improve the essential capability of responding and establish a benchmark in the live open-domain scenario, we introduce the LiveChat dataset, composed of 1.33 million real-life Chinese dialogues with almost 3800 average sessions across 351 personas and fine-grained profiles for each persona. LiveChat is automatically constructed by processing numerous live videos on the Internet and naturally falls within the scope of multi-party conversations, where the issues of Who says What to Whom should be considered. Therefore, we target two critical tasks of response modeling and addressee recognition and propose retrieval-based baselines grounded on advanced techniques. Experimental results have validated the positive effects of leveraging persona profiles and larger average sessions per persona. In addition, we also benchmark the transferability of advanced generation-based models on LiveChat and pose some future directions for current challenges. △ Less

Submitted 14 June, 2023; originally announced June 2023.

Comments: ACL 2023 Main Conference

arXiv:2305.16885 [pdf, other]

Hierarchical Verbalizer for Few-Shot Hierarchical Text Classification

Authors: Ke Ji, Yixin Lian, Jingsheng Gao, Baoyuan Wang

Abstract: Due to the complex label hierarchy and intensive labeling cost in practice, the hierarchical text classification (HTC) suffers a poor performance especially when low-resource or few-shot settings are considered. Recently, there is a growing trend of applying prompts on pre-trained language models (PLMs), which has exhibited effectiveness in the few-shot flat text classification tasks. However, lim… ▽ More Due to the complex label hierarchy and intensive labeling cost in practice, the hierarchical text classification (HTC) suffers a poor performance especially when low-resource or few-shot settings are considered. Recently, there is a growing trend of applying prompts on pre-trained language models (PLMs), which has exhibited effectiveness in the few-shot flat text classification tasks. However, limited work has studied the paradigm of prompt-based learning in the HTC problem when the training data is extremely scarce. In this work, we define a path-based few-shot setting and establish a strict path-based evaluation metric to further explore few-shot HTC tasks. To address the issue, we propose the hierarchical verbalizer ("HierVerb"), a multi-verbalizer framework treating HTC as a single- or multi-label classification problem at multiple layers and learning vectors as verbalizers constrained by hierarchical structure and hierarchical contrastive learning. In this manner, HierVerb fuses label hierarchy knowledge into verbalizers and remarkably outperforms those who inject hierarchy through graph encoders, maximizing the benefits of PLMs. Extensive experiments on three popular HTC datasets under the few-shot settings demonstrate that prompt with HierVerb significantly boosts the HTC performance, meanwhile indicating an elegant way to bridge the gap between the large pre-trained model and downstream hierarchical classification tasks. Our code and few-shot dataset are publicly available at https://github.com/1KE-JI/HierVerb. △ Less

Submitted 26 May, 2023; originally announced May 2023.

Comments: 14 pages, 8 figures, Accepted by ACL 2023

arXiv:2303.11138 [pdf, other]

doi 10.1109/LCSYS.2023.3287568

Fault Detection via Occupation Kernel Principal Component Analysis

Authors: Zachary Morrison, Benjamin P. Russo, Yingzhao Lian, Rushikesh Kamalapurkar

Abstract: The reliable operation of automatic systems is heavily dependent on the ability to detect faults in the underlying dynamical system. While traditional model-based methods have been widely used for fault detection, data-driven approaches have garnered increasing attention due to their ease of deployment and minimal need for expert knowledge. In this paper, we present a novel principal component ana… ▽ More The reliable operation of automatic systems is heavily dependent on the ability to detect faults in the underlying dynamical system. While traditional model-based methods have been widely used for fault detection, data-driven approaches have garnered increasing attention due to their ease of deployment and minimal need for expert knowledge. In this paper, we present a novel principal component analysis (PCA) method that uses occupation kernels. Occupation kernels result in feature maps that are tailored to the measured data, have inherent noise-robustness due to the use of integration, and can utilize irregularly sampled system trajectories of variable lengths for PCA. The occupation kernel PCA method is used to develop a reconstruction error approach to fault detection and its efficacy is validated using numerical simulations. △ Less

Submitted 26 June, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

arXiv:2301.13083 [pdf, other]

Communication Drives the Emergence of Language Universals in Neural Agents: Evidence from the Word-order/Case-marking Trade-off

Authors: Yuchen Lian, Arianna Bisazza, Tessa Verhoef

Abstract: Artificial learners often behave differently from human learners in the context of neural agent-based simulations of language emergence and change. A common explanation is the lack of appropriate cognitive biases in these learners. However, it has also been proposed that more naturalistic settings of language learning and use could lead to more human-like results. We investigate this latter accoun… ▽ More Artificial learners often behave differently from human learners in the context of neural agent-based simulations of language emergence and change. A common explanation is the lack of appropriate cognitive biases in these learners. However, it has also been proposed that more naturalistic settings of language learning and use could lead to more human-like results. We investigate this latter account focusing on the word-order/case-marking trade-off, a widely attested language universal that has proven particularly hard to simulate. We propose a new Neural-agent Language Learning and Communication framework (NeLLCom) where pairs of speaking and listening agents first learn a miniature language via supervised learning, and then optimize it for communication via reinforcement learning. Following closely the setup of earlier human experiments, we succeed in replicating the trade-off with the new framework without hard-coding specific biases in the agents. We see this as an essential step towards the investigation of language universals with neural learners. △ Less

Submitted 31 May, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

Comments: Accepted to TACL, pre-MIT Press publication version

arXiv:2301.05834 [pdf, ps, other]

On lattice tilings of $\mathbb{Z}^{n}$ by limited magnitude error balls $\mathcal{B}(n,2,1,1)$

Authors: Tao Zhang, Yanlu Lian, Gennian Ge

Abstract: Limited magnitude error model has applications in flash memory. In this model, a perfect code is equivalent to a tiling of $\mathbb{Z}^n$ by limited magnitude error balls. In this paper, we give a complete classification of lattice tilings of $\mathbb{Z}^n$ by limited magnitude error balls $\mathcal{B}(n,2,1,1)$. Limited magnitude error model has applications in flash memory. In this model, a perfect code is equivalent to a tiling of $\mathbb{Z}^n$ by limited magnitude error balls. In this paper, we give a complete classification of lattice tilings of $\mathbb{Z}^n$ by limited magnitude error balls $\mathcal{B}(n,2,1,1)$. △ Less

Submitted 14 January, 2023; originally announced January 2023.

Comments: 15 pages

arXiv:2205.15703 [pdf, other]

Lessons Learned from Data-Driven Building Control Experiments: Contrasting Gaussian Process-based MPC, Bilevel DeePC, and Deep Reinforcement Learning

Authors: Loris Di Natale, Yingzhao Lian, Emilio T. Maddalena, Jicheng Shi, Colin N. Jones

Abstract: This manuscript offers the perspective of experimentalists on a number of modern data-driven techniques: model predictive control relying on Gaussian processes, adaptive data-driven control based on behavioral theory, and deep reinforcement learning. These techniques are compared in terms of data requirements, ease of use, computational burden, and robustness in the context of real-world applicati… ▽ More This manuscript offers the perspective of experimentalists on a number of modern data-driven techniques: model predictive control relying on Gaussian processes, adaptive data-driven control based on behavioral theory, and deep reinforcement learning. These techniques are compared in terms of data requirements, ease of use, computational burden, and robustness in the context of real-world applications. Our remarks and observations stem from a number of experimental investigations carried out in the field of building control in diverse environments, from lecture halls and apartment spaces to a hospital surgery center. The final goal is to support others in identifying what technique is best suited to tackle their own problems. △ Less

Submitted 31 May, 2022; originally announced May 2022.

arXiv:2204.00376 [pdf, other]

Few-shot One-class Domain Adaptation Based on Frequency for Iris Presentation Attack Detection

Authors: Yachun Li, Ying Lian, Jingjing Wang, Yuhui Chen, Chunmao Wang, Shiliang Pu

Abstract: Iris presentation attack detection (PAD) has achieved remarkable success to ensure the reliability and security of iris recognition systems. Most existing methods exploit discriminative features in the spatial domain and report outstanding performance under intra-dataset settings. However, the degradation of performance is inevitable under cross-dataset settings, suffering from domain shift. In co… ▽ More Iris presentation attack detection (PAD) has achieved remarkable success to ensure the reliability and security of iris recognition systems. Most existing methods exploit discriminative features in the spatial domain and report outstanding performance under intra-dataset settings. However, the degradation of performance is inevitable under cross-dataset settings, suffering from domain shift. In consideration of real-world applications, a small number of bonafide samples are easily accessible. We thus define a new domain adaptation setting called Few-shot One-class Domain Adaptation (FODA), where adaptation only relies on a limited number of target bonafide samples. To address this problem, we propose a novel FODA framework based on the expressive power of frequency information. Specifically, our method integrates frequency-related information through two proposed modules. Frequency-based Attention Module (FAM) aggregates frequency information into spatial attention and explicitly emphasizes high-frequency fine-grained features. Frequency Mixing Module (FMM) mixes certain frequency components to generate large-scale target-style samples for adaptation with limited target bonafide samples. Extensive experiments on LivDet-Iris 2017 dataset demonstrate the proposed method achieves state-of-the-art or competitive performance under both cross-dataset and intra-dataset settings. △ Less

Submitted 1 April, 2022; originally announced April 2022.

Comments: Camera Ready, ICASSP 2022

arXiv:2105.12371 [pdf, other]

Quotient Space-Based Keyword Retrieval in Sponsored Search

Authors: Yijiang Lian, Shuang Li, Chaobing Feng, YanFeng Zhu

Abstract: Synonymous keyword retrieval has become an important problem for sponsored search ever since major search engines relax the exact match product's matching requirement to a synonymous level. Since the synonymous relations between queries and keywords are quite scarce, the traditional information retrieval framework is inefficient in this scenario. In this paper, we propose a novel quotient space-ba… ▽ More Synonymous keyword retrieval has become an important problem for sponsored search ever since major search engines relax the exact match product's matching requirement to a synonymous level. Since the synonymous relations between queries and keywords are quite scarce, the traditional information retrieval framework is inefficient in this scenario. In this paper, we propose a novel quotient space-based retrieval framework to address this problem. Considering the synonymy among keywords as a mathematical equivalence relation, we can compress the synonymous keywords into one representative, and the corresponding quotient space would greatly reduce the size of the keyword repository. Then an embedding-based retrieval is directly conducted between queries and the keyword representatives. To mitigate the semantic gap of the quotient space-based retrieval, a single semantic siamese model is utilized to detect both the keyword--keyword and query-keyword synonymous relations. The experiments show that with our quotient space-based retrieval method, the synonymous keyword retrieving performance can be greatly improved in terms of memory cost and recall efficiency. This method has been successfully implemented in Baidu's online sponsored search system and has yielded a significant improvement in revenue. △ Less

Submitted 26 May, 2021; originally announced May 2021.

arXiv:2104.07637 [pdf, other]

The Effect of Efficient Messaging and Input Variability on Neural-Agent Iterated Language Learning

Authors: Yuchen Lian, Arianna Bisazza, Tessa Verhoef

Abstract: Natural languages display a trade-off among different strategies to convey syntactic structure, such as word order or inflection. This trade-off, however, has not appeared in recent simulations of iterated language learning with neural network agents (Chaabouni et al., 2019b). We re-evaluate this result in light of three factors that play an important role in comparable experiments from the Langua… ▽ More Natural languages display a trade-off among different strategies to convey syntactic structure, such as word order or inflection. This trade-off, however, has not appeared in recent simulations of iterated language learning with neural network agents (Chaabouni et al., 2019b). We re-evaluate this result in light of three factors that play an important role in comparable experiments from the Language Evolution field: (i) speaker bias towards efficient messaging, (ii) non systematic input languages, and (iii) learning bottleneck. Our simulations show that neural agents mainly strive to maintain the utterance type distribution observed during learning, instead of developing a more efficient or systematic language. △ Less

Submitted 10 September, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

Comments: To appear at EMNLP 2021

arXiv:2102.10560 [pdf, other]

A Concept Knowledge-Driven Keywords Retrieval Framework for Sponsored Search

Authors: Yijiang Lian, Yubo Liu, Zhicong Ye, Liang Yuan, Yanfeng Zhu, Min Zhao, Jianyi Cheng, Xinwei Feng

Abstract: In sponsored search, retrieving synonymous keywords for exact match type is important for accurately targeted advertising. Data-driven deep learning-based method has been proposed to tackle this problem. An apparent disadvantage of this method is its poor generalization performance on entity-level long-tail instances, even though they might share similar concept-level patterns with frequent instan… ▽ More In sponsored search, retrieving synonymous keywords for exact match type is important for accurately targeted advertising. Data-driven deep learning-based method has been proposed to tackle this problem. An apparent disadvantage of this method is its poor generalization performance on entity-level long-tail instances, even though they might share similar concept-level patterns with frequent instances. With the help of a large knowledge base, we find that most commercial synonymous query-keyword pairs can be abstracted into meaningful conceptual patterns through concept tagging. Based on this fact, we propose a novel knowledge-driven conceptual retrieval framework to mitigate this problem, which consists of three parts: data conceptualization, matching via conceptual patterns and concept-augmented discrimination. Both offline and online experiments show that our method is very effective. This framework has been successfully applied to Baidu's sponsored search system, which yields a significant improvement in revenue. △ Less

Submitted 21 February, 2021; originally announced February 2021.

arXiv:2101.02392 [pdf, other]

Detecting Log Anomalies with Multi-Head Attention (LAMA)

Authors: Yicheng Guo, Yujin Wen, Congwei Jiang, Yixin Lian, Yi Wan

Abstract: Anomaly detection is a crucial and challenging subject that has been studied within diverse research areas. In this work, we explore the task of log anomaly detection (especially computer system logs and user behavior logs) by analyzing logs' sequential information. We propose LAMA, a multi-head attention based sequential model to process log streams as template activity (event) sequences. A nex… ▽ More Anomaly detection is a crucial and challenging subject that has been studied within diverse research areas. In this work, we explore the task of log anomaly detection (especially computer system logs and user behavior logs) by analyzing logs' sequential information. We propose LAMA, a multi-head attention based sequential model to process log streams as template activity (event) sequences. A next event prediction task is applied to train the model for anomaly detection. Extensive empirical studies demonstrate that our new model outperforms existing log anomaly detection methods including statistical and deep learning methodologies, which validate the effectiveness of our proposed method in learning sequence patterns of log data. △ Less

Submitted 7 January, 2021; originally announced January 2021.

arXiv:2010.05594 [pdf, other]

MultiWOZ 2.3: A multi-domain task-oriented dialogue dataset enhanced with annotation corrections and co-reference annotation

Authors: Ting Han, Ximing Liu, Ryuichi Takanobu, Yixin Lian, Chongxuan Huang, Dazhen Wan, Wei Peng, Minlie Huang

Abstract: Task-oriented dialogue systems have made unprecedented progress with multiple state-of-the-art (SOTA) models underpinned by a number of publicly available MultiWOZ datasets. Dialogue state annotations are error-prone, leading to sub-optimal performance. Various efforts have been put in rectifying the annotation errors presented in the original MultiWOZ dataset. In this paper, we introduce MultiWOZ… ▽ More Task-oriented dialogue systems have made unprecedented progress with multiple state-of-the-art (SOTA) models underpinned by a number of publicly available MultiWOZ datasets. Dialogue state annotations are error-prone, leading to sub-optimal performance. Various efforts have been put in rectifying the annotation errors presented in the original MultiWOZ dataset. In this paper, we introduce MultiWOZ 2.3, in which we differentiate incorrect annotations in dialogue acts from dialogue states, identifying a lack of co-reference when publishing the updated dataset. To ensure consistency between dialogue acts and dialogue states, we implement co-reference features and unify annotations of dialogue acts and dialogue states. We update the state of the art performance of natural language understanding and dialogue state tracking on MultiWOZ 2.3, where the results show significant improvements than on previous versions of MultiWOZ datasets (2.0-2.2). △ Less

Submitted 14 June, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

arXiv:2008.02014 [pdf, other]

Optimizing AD Pruning of Sponsored Search with Reinforcement Learning

Authors: Yijiang Lian, Zhijie Chen, Xin Pei, Shuang Li, Yifei Wang, Yuefeng Qiu, Zhiheng Zhang, Zhipeng Tao, Liang Yuan, Hanju Guan, Kefeng Zhang, Zhigang Li, Xiaochun Liu

Abstract: Industrial sponsored search system (SSS) can be logically divided into three modules: keywords matching, ad retrieving, and ranking. During ad retrieving, the ad candidates grow exponentially. A query with high commercial value might retrieve a great deal of ad candidates such that the ranking module could not afford. Due to limited latency and computing resources, the candidates have to be pruned… ▽ More Industrial sponsored search system (SSS) can be logically divided into three modules: keywords matching, ad retrieving, and ranking. During ad retrieving, the ad candidates grow exponentially. A query with high commercial value might retrieve a great deal of ad candidates such that the ranking module could not afford. Due to limited latency and computing resources, the candidates have to be pruned earlier. Suppose we set a pruning line to cut SSS into two parts: upstream and downstream. The problem we are going to address is: how to pick out the best $K$ items from $N$ candidates provided by the upstream to maximize the total system's revenue. Since the industrial downstream is very complicated and updated quickly, a crucial restriction in this problem is that the selection scheme should get adapted to the downstream. In this paper, we propose a novel model-free reinforcement learning approach to fixing this problem. Our approach considers downstream as a black-box environment, and the agent sequentially selects items and finally feeds into the downstream, where revenue would be estimated and used as a reward to improve the selection policy. To the best of our knowledge, this is first time to consider the system optimization from a downstream adaption view. It is also the first time to use reinforcement learning techniques to tackle this problem. The idea has been successfully realized in Baidu's sponsored search system, and online long time A/B test shows remarkable improvements on revenue. △ Less

Submitted 5 August, 2020; originally announced August 2020.

arXiv:2008.01969 [pdf, other]

Retrieve Synonymous keywords for Frequent Queries in Sponsored Search in a Data Augmentation Way

Authors: Yijiang Lian, Zhenjun You, Fan Wu, Wenqiang Liu, Jing Jia

Abstract: In sponsored search, retrieving synonymous keywords is of great importance for accurately targeted advertising. The semantic gap between queries and keywords and the extremely high precision requirements (>= 95\%) are two major challenges to this task. To the best of our knowledge, the problem has not been openly discussed. In an industrial sponsored search system, the retrieved keywords for frequ… ▽ More In sponsored search, retrieving synonymous keywords is of great importance for accurately targeted advertising. The semantic gap between queries and keywords and the extremely high precision requirements (>= 95\%) are two major challenges to this task. To the best of our knowledge, the problem has not been openly discussed. In an industrial sponsored search system, the retrieved keywords for frequent queries are usually done ahead of time and stored in a lookup table. Considering these results as a seed dataset, we propose a data-augmentation-like framework to improve the synonymous retrieval performance for these frequent queries. This framework comprises two steps: translation-based retrieval and discriminant-based filtering. Firstly, we devise a Trie-based translation model to make a data increment. In this phase, a Bag-of-Core-Words trick is conducted, which increased the data increment's volume 4.2 times while keeping the original precision. Then we use a BERT-based discriminant model to filter out nonsynonymous pairs, which exceeds the traditional feature-driven GBDT model with 11\% absolute AUC improvement. This method has been successfully applied to Baidu's sponsored search system, which has yielded a significant improvement in revenue. In addition, a commercial Chinese dataset containing 500K synonymous pairs with a precision of 95\% is released to the public for paraphrase study (http://ai.baidu.com/broad/subordinate?dataset=paraphrasing). △ Less

Submitted 5 August, 2020; originally announced August 2020.

arXiv:1902.00592 [pdf, other]

An end-to-end Generative Retrieval Method for Sponsored Search Engine --Decoding Efficiently into a Closed Target Domain

Authors: Yijiang Lian, Zhijie Chen, Jinlong Hu, Kefeng Zhang, Chunwei Yan, Muchenxuan Tong, Wenying Han, Hanju Guan, Ying Li, Ying Cao, Yang Yu, Zhigang Li, Xiaochun Liu, Yue Wang

Abstract: In this paper, we present a generative retrieval method for sponsored search engine, which uses neural machine translation (NMT) to generate keywords directly from query. This method is completely end-to-end, which skips query rewriting and relevance judging phases in traditional retrieval systems. Different from standard machine translation, the target space in the retrieval setting is a constrai… ▽ More In this paper, we present a generative retrieval method for sponsored search engine, which uses neural machine translation (NMT) to generate keywords directly from query. This method is completely end-to-end, which skips query rewriting and relevance judging phases in traditional retrieval systems. Different from standard machine translation, the target space in the retrieval setting is a constrained closed set, where only committed keywords should be generated. We present a Trie-based pruning technique in beam search to address this problem. The biggest challenge in deploying this method into a real industrial environment is the latency impact of running the decoder. Self-normalized training coupled with Trie-based dynamic pruning dramatically reduces the inference time, yielding a speedup of more than 20 times. We also devise an mixed online-offline serving architecture to reduce the latency and CPU consumption. To encourage the NMT to generate new keywords uncovered by the existing system, training data is carefully selected. This model has been successfully applied in Baidu's commercial search engine as a supplementary retrieval branch, which has brought a remarkable revenue improvement of more than 10 percents. △ Less

Submitted 18 March, 2019; v1 submitted 1 February, 2019; originally announced February 2019.

Comments: 8 pages, 8 figures, conference

arXiv:1812.06585 [pdf, other]

Generalizable Meta-Heuristic based on Temporal Estimation of Rewards for Large Scale Blackbox Optimization

Authors: Mingde Zhao, Hongwei Ge, Yi Lian, Kai Zhang

Abstract: The generalization abilities of heuristic optimizers may deteriorate with the increment of the search space dimensionality. To achieve generalized performance across Large Scale Blackbox Optimization (LSBO) tasks, it ispossible to ensemble several heuristics and devise a meta-heuristic to control their initiation. This paper first proposes a methodology of transforming LSBO problems into online de… ▽ More The generalization abilities of heuristic optimizers may deteriorate with the increment of the search space dimensionality. To achieve generalized performance across Large Scale Blackbox Optimization (LSBO) tasks, it ispossible to ensemble several heuristics and devise a meta-heuristic to control their initiation. This paper first proposes a methodology of transforming LSBO problems into online decision processes to maximize efficiency of resource utilization. Then, using the perspective of multi-armed bandits with non-stationary reward distributions, we propose a meta-heuristic based on Temporal Estimation of Rewards (TER) to address such decision process. TER uses a window for temporal credit assignment and Boltzmann exploration to balance the exploration-exploitation tradeoff. The prior-free TER generalizes across LSBO tasks with flexibility for different types of limited computational resources (e.g. time, money, etc.) and is easy to be adapted to new tasks for its simplicity and easy interface for heuristic articulation. Tests on the benchmarks validate the problem formulation and suggest significant effectiveness: when TER is articulated with three heuristics, competitive performance is reported across different sets of benchmark problems with search dimensions up to 10000. △ Less

Submitted 18 September, 2019; v1 submitted 16 December, 2018; originally announced December 2018.

Comments: 7 pages of contents, 1 page of references, 2 pages for appendix

arXiv:1810.09517 [pdf]

A High Accuracy and High Sensitivity System Architecture for Electrical Impedance Tomography System

Authors: Hui Li, Boxiao Liu, Yongfu Li, Guoxing Wang, Yong Lian

Abstract: A high accuracy and high sensitivity system architecture is proposed for the read-out circuit of electrical impedance tomography system-on-chip. The switched ratiometric technique is applied in the proposed architecture. The proposed system architecture minimizes the device noise by processing signals from both read-out electrodes and the stimulus. The quantized signals are post-processed in the d… ▽ More A high accuracy and high sensitivity system architecture is proposed for the read-out circuit of electrical impedance tomography system-on-chip. The switched ratiometric technique is applied in the proposed architecture. The proposed system architecture minimizes the device noise by processing signals from both read-out electrodes and the stimulus. The quantized signals are post-processed in the digital processing unit for proper signal demodulation and impedance ratio calculation. Our proposed architecture improves the sensitivity of the read-out circuit, cancels out the gain fluctuations in the system, and overcomes the effects of motion artifacts on measurements. △ Less

Submitted 2 October, 2018; originally announced October 2018.

arXiv:1808.03679 [pdf]

Machine Learning Promoting Extreme Simplification of Spectroscopy Equipment

Authors: Jianchao Lee, Qiannan Duan, Sifan Bi, Ruen Luo, Yachao Lian, Hanqiang Liu, Ruixing Tian, Jiayuan Chen, Guodong Ma, Jinhong Gao, Zhaoyi Xu

Abstract: The spectroscopy measurement is one of main pathways for exploring and understanding the nature. Today, it seems that racing artificial intelligence will remould its styles. The algorithms contained in huge neural networks are capable of substituting many of expensive and complex components of spectrum instruments. In this work, we presented a smart machine learning strategy on the measurement of… ▽ More The spectroscopy measurement is one of main pathways for exploring and understanding the nature. Today, it seems that racing artificial intelligence will remould its styles. The algorithms contained in huge neural networks are capable of substituting many of expensive and complex components of spectrum instruments. In this work, we presented a smart machine learning strategy on the measurement of absorbance curves, and also initially verified that an exceedingly-simplified equipment is sufficient to meet the needs for this strategy. Further, with its simplicity, the setup is expected to infiltrate into many scientific areas in versatile forms. △ Less

Submitted 13 September, 2019; v1 submitted 5 August, 2018; originally announced August 2018.

Comments: This is the second version. On pages 7 through 8, we have added a new case about the spectral properties of mixtures. Specifically, paragraph 1 on page 8 and Fig.7 is added

arXiv:1409.8021 [pdf]

doi 10.1109/BioCAS.2012.6418512

A Smart Cushion for Real-Time Heart Rate Monitoring

Authors: Chacko John Deepu, Zhihao Chen, Ju Teng Teo, Soon Huat Ng, Xiefeng Yang, Yong Lian

Abstract: This paper presents a smart cushion for real time heart rate monitoring. The cushion comprises of an integrated micro-bending fiber sensor, which records the BCG (Ballistocardiogram) signal without direct skin-electrode contact, and an optical transceiver that does signal amplification, digitization, and pre-filtering. To remove the artifacts and extract heart rate from BCG signal, a computational… ▽ More This paper presents a smart cushion for real time heart rate monitoring. The cushion comprises of an integrated micro-bending fiber sensor, which records the BCG (Ballistocardiogram) signal without direct skin-electrode contact, and an optical transceiver that does signal amplification, digitization, and pre-filtering. To remove the artifacts and extract heart rate from BCG signal, a computationally efficient heart rate detection algorithm is developed. The system doesn't require any pre-training and is highly responsive with the outputs updated every 3 sec and initial response within first 10 sec. Tests conducted on human subjects show the detected heart rate closely matches the one from a commercial SpO2 device. △ Less

Submitted 29 September, 2014; originally announced September 2014.

Comments: 2012 IEEE Biomedical Circuits and Systems Conference

arXiv:1409.8020 [pdf]

doi 10.1109/DELTA.2010.43

An ECG-on-Chip for Wearable Cardiac Monitoring Devices

Authors: C. J. Deepu, X. Y. Xu, X. D. Zou, L. B. Yao, Y. Lian

Abstract: This paper describes a highly integrated, low power chip solution for ECG signal processing in wearable devices. The chip contains an instrumentation amplifier with programmable gain, a band-pass filter, a 12-bit SAR ADC, a novel QRS detector, 8K on-chip SRAM, and relevant control circuitry and CPU interfaces. The analog front end circuits accurately senses and digitizes the raw ECG signal, which… ▽ More This paper describes a highly integrated, low power chip solution for ECG signal processing in wearable devices. The chip contains an instrumentation amplifier with programmable gain, a band-pass filter, a 12-bit SAR ADC, a novel QRS detector, 8K on-chip SRAM, and relevant control circuitry and CPU interfaces. The analog front end circuits accurately senses and digitizes the raw ECG signal, which is then filtered to extract the QRS. The sampling frequency used is 256 Hz. ECG samples are buffered locally on an asynchronous FIFO and is read out using a faster clock, as and when it is required by the host CPU via an SPI interface. The chip was designed and implemented in 0.35um standard CMOS process. The analog core operates at 1V while the digital circuits and SRAM operate at 3.3V. The chip total core area is 5.74 mm^2 and consumes 9.6uW. Small size and low power consumption make this design suitable for usage in wearable heart monitoring devices. △ Less

Submitted 29 September, 2014; originally announced September 2014.

Journal ref: 5th IEEE International Symposium on Electronic Design Test and Applications 2010

arXiv:1409.8018 [pdf]

doi 10.1109/JSSC.2014.2349994

An ECG-on-Chip with 535-nW/Channel Integrated Lossless Data Compressor for Wireless Sensors

Authors: C. J. Deepu, X. Zhang, W. -S. Liew, D. L. T. Wong, Y. Lian

Abstract: This paper presents a low-power ECG recording system-on-chip (SoC) with on-chip low-complexity lossless ECG compression for data reduction in wireless/ambulatory ECG sensor devices. The chip uses a linear slope predictor for data compression, and incorporates a novel low-complexity dynamic coding-packaging scheme to frame the prediction error into fixed-length 16-bit format. The proposed technique… ▽ More This paper presents a low-power ECG recording system-on-chip (SoC) with on-chip low-complexity lossless ECG compression for data reduction in wireless/ambulatory ECG sensor devices. The chip uses a linear slope predictor for data compression, and incorporates a novel low-complexity dynamic coding-packaging scheme to frame the prediction error into fixed-length 16-bit format. The proposed technique achieves an average compression ratio of 2.25x on MIT/BIH ECG database. Implemented in a standard 0.35 um process, the compressor uses 0.565K gates/channel occupying 0.4 mm2 for four channels, and consumes 535 nW/channel at 2.4 V for ECG sampled at 512 Hz. Small size and ultra-low power consumption makes the proposed technique suitable for wearable ECG sensor applications. △ Less

Submitted 29 September, 2014; originally announced September 2014.

Journal ref: IEEE Journal of Solid-State Circuits, Nov 2014

Showing 1–28 of 28 results for author: Lian, Y