Search | arXiv e-print repository

The Power of the Noisy Channel: Unsupervised End-to-End Task-Oriented Dialogue with LLMs

Abstract: Training task-oriented dialogue systems typically requires turn-level annotations for interacting with their APIs: e.g. a dialogue state and the system actions taken at each step. These annotations can be costly to produce, error-prone, and require both domain and annotation expertise. With advances in LLMs, we hypothesize unlabelled data and a schema definition are sufficient for building a working task-oriented dialogue system, completely unsupervised. Using only (1) a well-defined API schema (2) a set of unlabelled dialogues between a user and agent, we develop a novel approach for inferring turn-level annotations as latent variables using a noisy channel model. We iteratively improve these pseudo-labels with expectation-maximization (EM), and use the inferred labels to train an end-to-end dialogue agent. Evaluating our approach on the MultiWOZ benchmark, our method more than doubles the dialogue success rate of a strong GPT-3.5 baseline.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 16 Pages, 7 Figures

arXiv:2307.01453 [pdf, other]

Diverse Retrieval-Augmented In-Context Learning for Dialogue State Tracking

Authors: Brendan King, Jeffrey Flanigan

Abstract: There has been significant interest in zero and few-shot learning for dialogue state tracking (DST) due to the high cost of collecting and annotating task-oriented dialogues. Recent work has demonstrated that in-context learning requires very little data and zero parameter updates, and even outperforms trained methods in the few-shot setting (Hu et al. 2022). We propose RefPyDST, which advances the state of the art with three advancements to in-context learning for DST. First, we formulate DST as a Python programming task, explicitly modeling language coreference as variable reference in Python. Second, since in-context learning depends highly on the context examples, we propose a method to retrieve a diverse set of relevant examples to improve performance. Finally, we introduce a novel re-weighting method during decoding that takes into account probabilities of competing surface forms, and produces a more accurate dialogue state prediction. We evaluate our approach using MultiWOZ and achieve state-of-the-art multi-domain joint-goal accuracy in zero and few-shot settings.

Submitted 3 July, 2023; originally announced July 2023.

Comments: 14 pages, 2 figures, to appear in Findings of the ACL 2023

arXiv:2302.12944 [pdf, other]

Dependency Dialogue Acts -- Annotation Scheme and Case Study

Authors: Jon Z. Cai, Brendan King, Margaret Perkoff, Shiran Dudy, Jie Cao, Marie Grace, Natalia Wojarnik, Ananya Ganesh, James H. Martin, Martha Palmer, Marilyn Walker, Jeffrey Flanigan

Abstract: In this paper, we introduce Dependency Dialogue Acts (DDA), a novel framework for capturing the structure of speaker-intentions in multi-party dialogues. DDA combines and adapts features from existing dialogue annotation frameworks, and emphasizes the multi-relational response structure of dialogues in addition to the dialogue acts and rhetorical relations. It represents the functional, discourse, and response structure in multi-party multi-threaded conversations. A few key features distinguish DDA from existing dialogue annotation frameworks such as SWBD-DAMSL and the ISO 24617-2 standard. First, DDA prioritizes the relational structure of the dialogue units and the dialog context, annotating both dialog acts and rhetorical relations as response relations to particular utterances. Second, DDA embraces overloading in dialogues, encouraging annotators to specify multiple response relations and dialog acts for each dialog unit. Lastly, DDA places an emphasis on adequately capturing how a speaker is using the full dialog context to plan and organize their speech. With these features, DDA is highly expressive and recall-oriented with regard to conversation dynamics between multiple speakers. In what follows, we present the DDA annotation framework and case studies annotating DDA structures in multi-party, multi-threaded conversations.

Submitted 24 February, 2023; originally announced February 2023.

Comments: The 13th International Workshop on Spoken Dialogue Systems Technology

Journal ref: The 13th International Workshop on Spoken Dialogue Systems Technology 2023

arXiv:2207.11345 [pdf, ps, other]

doi 10.21437/Interspeech.2022-10816

Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities

Authors: Pranav Dheram, Murugesan Ramakrishnan, Anirudh Raju, I-Fan Chen, Brian King, Katherine Powell, Melissa Saboowala, Karan Shetty, Andreas Stolcke

Abstract: As for other forms of AI, speech recognition has recently been examined with respect to performance disparities across different user cohorts. One approach to achieve fairness in speech recognition is to (1) identify speaker cohorts that suffer from subpar performance and (2) apply fairness mitigation measures targeting the cohorts discovered. In this paper, we report on initial findings with both discovery and mitigation of performance disparities using data from a product-scale AI assistant speech recognition system. We compare cohort discovery based on geographic and demographic information to a more scalable method that groups speakers without human labels, using speaker embedding technology. For fairness mitigation, we find that oversampling of underrepresented cohorts, as well as modeling speaker cohort membership by additional input variables, reduces the gap between top- and bottom-performing cohorts, without deteriorating overall recognition accuracy.

Submitted 22 July, 2022; originally announced July 2022.

Comments: Proc. Interspeech 2022

Journal ref: Proc. Interspeech, Sept. 2022, pp. 1268-1272

arXiv:2207.07850 [pdf, other]

doi 10.21437/Interspeech.2022-11063

Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation

Authors: Viet Anh Trinh, Pegah Ghahremani, Brian King, Jasha Droppo, Andreas Stolcke, Roland Maas

Abstract: We present an approach to reduce the performance disparity between geographic regions without degrading performance on the overall user population for ASR. A popular approach is to fine-tune the model with data from regions where the ASR model has a higher word error rate (WER). However, when the ASR model is adapted to get better performance on these high-WER regions, its parameters wander from the previous optimal values, which can lead to worse performance in other regions. In our proposed method, we utilize the elastic weight consolidation (EWC) regularization loss to identify directions in parameter space along which the ASR weights can vary to improve for high-error regions, while still maintaining performance on the speaker population overall. Our results demonstrate that EWC can reduce the word error rate (WER) in the region with highest WER by 3.2% relative while reducing the overall WER by 1.3% relative. We also evaluate the role of language and acoustic models in ASR fairness and propose a clustering algorithm to identify WER disparities based on geographic region.

Submitted 16 July, 2022; originally announced July 2022.

Comments: Accepted for publication at Interspeech 2022

Journal ref: Proc. Interspeech, Sept. 2022, pp. 1298-1302
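
The elastic weight consolidation (EWC) regularization named in the abstract above is commonly written as a quadratic penalty that anchors each parameter to its previous value, weighted by an importance estimate. The following is a minimal sketch of that standard form only; the abstract does not give the paper's exact formulation, so the function names, the diagonal Fisher weights, and the `lam` strength are assumptions made for illustration.

```python
def ewc_penalty(params, anchor_params, fisher_diag, lam):
    """Standard EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    params        -- current parameter values (after adaptation)
    anchor_params -- parameter values before adaptation (theta*)
    fisher_diag   -- per-parameter importance weights (assumed diagonal Fisher)
    lam           -- hypothetical regularization strength
    """
    return 0.5 * lam * sum(
        f * (p - a) ** 2
        for p, a, f in zip(params, anchor_params, fisher_diag)
    )


def total_loss(task_loss, params, anchor_params, fisher_diag, lam=1.0):
    """Adaptation loss plus the EWC penalty."""
    return task_loss + ewc_penalty(params, anchor_params, fisher_diag, lam)
```

Parameters with a large importance weight are held close to their earlier values, while parameters with a small weight are free to move toward the new data, which is the trade-off the abstract describes.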

arXiv:2207.02393 [pdf, other]

Compute Cost Amortized Transformer for Streaming ASR

Authors: Yi Xie, Jonathan Macoskey, Martin Radfar, Feng-Ju Chang, Brian King, Ariya Rastrow, Athanasios Mouchtaris, Grant P. Strimel

Abstract: We present a streaming, Transformer-based end-to-end automatic speech recognition (ASR) architecture which achieves efficient neural inference through compute cost amortization. Our architecture creates sparse computation pathways dynamically at inference time, resulting in selective use of compute resources throughout decoding, enabling significant reductions in compute with minimal impact on accuracy. The fully differentiable architecture is trained end-to-end with an accompanying lightweight arbitrator mechanism operating at the frame-level to make dynamic decisions on each input while a tunable loss function is used to regularize the overall level of compute against predictive performance. We report empirical results from experiments using the compute amortized Transformer-Transducer (T-T) model conducted on LibriSpeech data. Our best model can achieve a 60% compute cost reduction with only a 3% relative word error rate (WER) increase.

Submitted 4 July, 2022; originally announced July 2022.

arXiv:2112.00350 [pdf, other]

Investigation of Training Label Error Impact on RNN-T

Authors: I-Fan Chen, Brian King, Jasha Droppo

Abstract: In this paper, we propose an approach to quantitatively analyze the impact of different training label errors on RNN-T based ASR models. The results show that deletion errors are more harmful than substitution and insertion label errors in RNN-T training data. We also examined label error impact mitigation approaches on RNN-T and found that, though all the methods mitigate the label-error-caused degradation to some extent, they could not remove the performance gap between the models trained with and without the presence of label errors. Based on the analysis results, we suggest designing data pipelines for RNN-T with higher priority on reducing deletion label errors. We also find that ensuring high-quality training labels remains important, despite the existence of the label error mitigation approaches.

Submitted 1 December, 2021; originally announced December 2021.

Comments: 6 pages

arXiv:2110.05543 [pdf, other]

Fallout: Distributed Systems Testing as a Service

Authors: Guy Bolton King, Sean McCarthy, Pushkala Pattabhiraman, Jake Luciani, Matt Fleming

Abstract: All modern distributed systems list performance and scalability as their core strengths. Given that optimal performance requires carefully selecting configuration options, and typical cluster sizes can range anywhere from 2 to 300 nodes, it is rare for any two clusters to be exactly the same. Validating the behavior and performance of distributed systems in this large configuration space is challenging without automation that stretches across the software stack. In this paper we present Fallout, an open-source distributed systems testing service that automatically provisions and configures distributed systems and clients, supports running a variety of workloads and benchmarks, and generates performance reports based on collected metrics for visual analysis. We have been running the Fallout service internally at DataStax for over 5 years and have recently open sourced it to support our work with Apache Cassandra, Pulsar, and other open source projects. We describe the architecture of Fallout along with the evolution of its design and the lessons we learned operating this service in a dynamic environment where teams work on different products and favor different benchmarking tools.

Submitted 11 October, 2021; originally announced October 2021.

Comments: Submitted to 2021 BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench'21)

arXiv:2106.07734 [pdf, other]

CoDERT: Distilling Encoder Representations with Co-learning for Transducer-based Speech Recognition

Authors: Rupak Vignesh Swaminathan, Brian King, Grant P. Strimel, Jasha Droppo, Athanasios Mouchtaris

Abstract: We propose a simple yet effective method to compress an RNN-Transducer (RNN-T) through the well-known knowledge distillation paradigm. We show that the transducer's encoder outputs naturally have a high entropy and contain rich information about acoustically similar word-piece confusions. This rich information is suppressed when combined with the lower entropy decoder outputs to produce the joint network logits. Consequently, we introduce an auxiliary loss to distill the encoder logits from a teacher transducer's encoder, and explore training strategies where this encoder distillation works effectively. We find that tandem training of teacher and student encoders with an inplace encoder distillation outperforms the use of a pre-trained and static teacher transducer. We also report an interesting phenomenon we refer to as implicit distillation, that occurs when the teacher and student encoders share the same decoder. Our experiments show 5.37-8.4% relative word error rate reductions (WERR) on in-house test sets, and 5.05-6.18% relative WERRs on LibriSpeech test sets.

Submitted 14 June, 2021; originally announced June 2021.

Comments: Accepted at InterSpeech 2021
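
Several abstracts in this listing report results as relative word error rate reduction (WERR). As a reading aid, the usual definition can be sketched as below; the function name and the sample WER values are hypothetical and are not taken from any of the papers.

```python
def relative_werr(baseline_wer, new_wer):
    """Relative WER reduction: (baseline - new) / baseline.

    A positive value means the new system makes fewer errors than the
    baseline; a negative value is a relative WER increase.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: a baseline at 10.0% WER improved to 9.0% WER
# is a 1.0 point absolute reduction, and a 10% relative reduction.
example = relative_werr(10.0, 9.0)
```

Because the measure is relative to the baseline, the same absolute gain counts for more on a system that already has a low error rate.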

arXiv:2106.02750 [pdf, other]

Do You Listen with One or Two Microphones? A Unified ASR Model for Single and Multi-Channel Audio

Authors: Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas

Abstract: Automatic speech recognition (ASR) models are typically designed to operate on a single input data type, e.g. a single or multi-channel audio streamed from a device. This design decision assumes the primary input data source does not change and if an additional (auxiliary) data source is occasionally available, it cannot be used. An ASR model that operates on both primary and auxiliary data can achieve better accuracy compared to a primary-only solution; and a model that can serve both primary-only (PO) and primary-plus-auxiliary (PPA) modes is highly desirable. In this work, we propose a unified ASR model that can serve both modes. We demonstrate its efficacy in a realistic scenario where a set of devices typically stream a single primary audio channel, and two additional auxiliary channels only when upload bandwidth allows it. The architecture enables a unique methodology that uses both types of input audio during training time. Our proposed approach achieves up to 12.5% relative word-error-rate reduction (WERR) compared to a PO baseline, and up to 16.0% relative WERR in low-SNR conditions. The unique training methodology achieves up to 2.5% relative WERR compared to a PPA baseline.

Submitted 28 June, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

arXiv:2105.05920 [pdf, ps, other]

Attention-based Neural Beamforming Layers for Multi-channel Speech Recognition

Authors: Bhargav Pulugundla, Yang Gao, Brian King, Gokce Keskin, Harish Mallidi, Minhua Wu, Jasha Droppo, Roland Maas

Abstract: Attention-based beamformers have recently been shown to be effective for multi-channel speech recognition. However, they are less capable at capturing local information. In this work, we propose a 2D Conv-Attention module which combines convolution neural networks with attention for beamforming. We apply self- and cross-attention to explicitly model the correlations within and between the input channels. The end-to-end 2D Conv-Attention model is compared with a multi-head self-attention and superdirective-based neural beamformers. We train and evaluate on an in-house multi-channel dataset. The results show a relative improvement of 3.8% in WER by the proposed model over the baseline neural beamformer.

Submitted 14 May, 2021; v1 submitted 12 May, 2021; originally announced May 2021.

arXiv:2102.03951 [pdf, other]

End-to-End Multi-Channel Transformer for Speech Recognition

Authors: Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Brian King, Siegfried Kunzmann

Abstract: Transformers are powerful neural architectures that allow integrating different modalities using attention mechanisms. In this paper, we leverage the neural transformer architectures for multi-channel speech recognition systems, where the spectral and spatial information collected from different microphones are integrated using attention layers. Our multi-channel transformer network mainly consists of three parts: channel-wise self attention layers (CSA), cross-channel attention layers (CCA), and multi-channel encoder-decoder attention layers (EDA). The CSA and CCA layers encode the contextual relationship within and between channels and across time, respectively. The channel-attended outputs from CSA and CCA are then fed into the EDA layers to help decode the next token given the preceding ones. The experiments show that in a far-field in-house dataset, our method outperforms the baseline single-channel transformer, as well as the super-directive and neural beamformers cascaded with the transformers.

Submitted 7 February, 2021; originally announced February 2021.

Comments: Accepted by 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)

arXiv:2102.01740 [pdf, other]

Reliability Analysis of Artificial Intelligence Systems Using Recurrent Events Data from Autonomous Vehicles

Authors: Yili Hong, Jie Min, Caleb B. King, William Q. Meeker

Abstract: Artificial intelligence (AI) systems have become increasingly common and the trend will continue. Examples of AI systems include autonomous vehicles (AV), computer vision, natural language processing, and AI medical experts. To allow for safe and effective deployment of AI systems, the reliability of such systems needs to be assessed. Traditionally, reliability assessment is based on reliability test data and the subsequent statistical modeling and analysis. The availability of reliability data for AI systems, however, is limited because such data are typically sensitive and proprietary. The California Department of Motor Vehicles (DMV) oversees and regulates an AV testing program, in which many AV manufacturers are conducting AV road tests. Manufacturers participating in the program are required to report recurrent disengagement events to California DMV. This information is being made available to the public. In this paper, we use recurrent disengagement events as a representation of the reliability of the AI system in AV, and propose a statistical framework for modeling and analyzing the recurrent events data from AV driving tests. We use traditional parametric models in software reliability and propose a new nonparametric model based on monotonic splines to describe the event process. We develop inference procedures for selecting the best models, quantifying uncertainty, and testing heterogeneity in the event process. We then analyze the recurrent events data from four AV manufacturers, and make inferences on the reliability of the AI systems in AV. We also describe how the proposed analysis can be applied to assess the reliability of other AI systems.

Submitted 2 February, 2021; originally announced February 2021.

Comments: 30 pages, 9 figures

arXiv:2009.09671 [pdf, other]

Towards application-specific query processing systems

Authors: Dimitrios Vasilas, Marc Shapiro, Bradley King, Sara Hamouda

Abstract: Database systems use query processing subsystems for enabling efficient query-based data retrieval. An essential aspect of designing any query-intensive application is tuning the query system to fit the application's requirements and workload characteristics. However, the configuration parameters provided by traditional database systems do not cover the design decisions and trade-offs that arise from the geo-distribution of users and data. In this paper, we present a vision towards a new type of query system architecture that addresses this challenge by enabling query systems to be designed and deployed on a per-use-case basis. We propose a distributed abstraction called Query Processing Unit that encapsulates primitive query processing tasks, and show how it can be used as a building block for assembling query systems. Using this approach, application architects can construct query systems specialized to their use cases, by controlling the query system's architecture and the placement of its state. We demonstrate the expressiveness of this approach by applying it to the design of a query system that can flexibly place its state in the data center or at the edge, and show that state placement decisions affect the trade-off between query response time and query result freshness.

Submitted 21 September, 2020; originally announced September 2020.

Journal ref: 36ème Conférence sur la Gestion de Données -- Principes, Technologies et Applications (BDA 2020), Oct 2020, Paris, France

arXiv:2007.00131 [pdf, other]

Multi-view Frequency LSTM: An Efficient Frontend for Automatic Speech Recognition

Authors: Maarten Van Segbroeck, Harish Mallidih, Brian King, I-Fan Chen, Gurpreet Chadha, Roland Maas

Abstract: Acoustic models in real-time speech recognition systems typically stack multiple unidirectional LSTM layers to process the acoustic frames over time. Performance improvements over vanilla LSTM architectures have been reported by prepending a stack of frequency-LSTM (FLSTM) layers to the time LSTM. These FLSTM layers can learn a more robust input feature to the time LSTM layers by modeling time-frequency correlations in the acoustic input signals. A drawback of FLSTM based architectures, however, is that they operate at a predefined, and tuned, window size and stride, referred to as 'view' in this paper. We present a simple and efficient modification by combining the outputs of multiple FLSTM stacks with different views, into a dimensionality reduced feature representation. The proposed multi-view FLSTM architecture allows modeling a wider range of time-frequency correlations compared to an FLSTM model with a single view. When trained on 50K hours of English far-field speech data with CTC loss followed by sMBR sequence training, we show that the multi-view FLSTM acoustic model provides relative Word Error Rate (WER) improvements of 3-7% for different speaker and acoustic environment scenarios over an optimized single FLSTM model, while retaining a similar computational footprint.

Submitted 30 June, 2020; originally announced July 2020.

arXiv:1803.04141 [pdf, other]

A Modular Design for Geo-Distributed Querying

Authors: Dimitrios Vasilas, Marc Shapiro, Bradley King

Abstract: Most distributed storage systems provide limited abilities for querying data by attributes other than their primary keys. Supporting efficient search on secondary attributes is challenging as applications pose varying requirements to query processing systems, and no single system design can be suitable for all needs. In this paper, we show how to overcome these challenges in order to extend distributed data stores to support queries on secondary attributes. We propose a modular architecture that is flexible and allows query processing systems to make trade-offs according to different use case requirements. We describe adaptive mechanisms that make use of this flexibility to enable query processing systems to dynamically adjust to query and write operation workloads.

Submitted 12 March, 2018; originally announced March 2018.

Comments: 5th Workshop on Principles and Practice of Consistency for Distributed Data, April 23-26, 2018, Porto, Portugal

arXiv:1712.08348 [pdf, other]

Towards Software Development For Social Robotics Systems

Authors: Chong Sun, Jiongyan Zhang, Cong Liu, Barry Chew Bao King, Yuwei Zhang, Matthew Galle, Maria Spichkova

Abstract: In this paper we introduce the core results of the project on software development for social robotics systems. The usability of maintenance and control features is crucial for many kinds of systems, but in the case of social robotics we also have to take into account that (1) the humanoid robot physically interacts with humans, and (2) the conversation with children might have different requirements in comparison to the conversation with adults. The results of our work were implemented for the humanoid PAL REEM robot, but their core ideas can be applied to other types of humanoid robots. We developed a web-based solution that supports the management of robot-guided tours, provides recommendations for the users, and allows for a visual analysis of the data on previous tours.

Submitted 22 December, 2017; originally announced December 2017.

arXiv:1603.08016 [pdf, other]

Classifying Syntactic Regularities for Hundreds of Languages

Authors: Reed Coke, Ben King, Dragomir Radev

Abstract: This paper presents a comparison of classification methods for linguistic typology for the purpose of expanding an extensive, but sparse language resource: the World Atlas of Language Structures (WALS) (Dryer and Haspelmath, 2013). We experimented with a variety of regression and nearest-neighbor methods for use in classification over a set of 325 languages and six syntactic rules drawn from WALS. To classify each rule, we consider the typological features of the other five rules; linguistic features extracted from a word-aligned Bible in each language; and genealogical features (genus and family) of each language. In general, we find that propagating the majority label among all languages of the same genus achieves the best accuracy in label prediction. Following this, a logistic regression model that combines typological and linguistic features offers the next best performance. Interestingly, this model actually outperforms the majority labels among all languages of the same family.

Submitted 27 April, 2016; v1 submitted 25 March, 2016; originally announced March 2016.
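
The best-performing baseline in the abstract above, propagating the majority label among languages of the same genus, can be illustrated with a short sketch. This is an assumed reading of that baseline and not the authors' code; the function names, the held-out handling, and the placeholder language and genus names are all hypothetical.

```python
from collections import Counter, defaultdict


def majority_by_group(labels, groups):
    """Map each group (e.g. genus) to its most common known label."""
    counts = defaultdict(Counter)
    for lang, label in labels.items():
        counts[groups[lang]][label] += 1
    return {g: c.most_common(1)[0][0] for g, c in counts.items()}


def predict(lang, labels, groups, default=None):
    """Predict a label for `lang` from the other languages in its group.

    The target language's own label, if any, is held out so that the
    prediction uses only its relatives. Returns `default` when no
    labelled relative exists.
    """
    known = {l: y for l, y in labels.items() if l != lang}
    return majority_by_group(known, groups).get(groups[lang], default)


# Placeholder data: language and genus names are invented for illustration.
groups = {"lang_a": "genus_1", "lang_b": "genus_1",
          "lang_c": "genus_1", "lang_d": "genus_2"}
labels = {"lang_a": "SVO", "lang_b": "SVO", "lang_d": "SOV"}
```

Swapping the `groups` mapping from genus to family gives the coarser family-level baseline that the abstract compares against.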

Showing 1–18 of 18 results for author: King, B