
doi 10.1145/3534678.3539173

Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems

Authors: Jack FitzGerald, Shankar Ananthakrishnan, Konstantine Arkoudas, Davide Bernardi, Abhishek Bhagia, Claudio Delli Bovi, Jin Cao, Rakesh Chada, Amit Chauhan, Luoxin Chen, Anurag Dwarakanath, Satyam Dwivedi, Turan Gojayev, Karthik Gopalakrishnan, Thomas Gueudre, Dilek Hakkani-Tur, Wael Hamza, Jonathan Hueser, Kevin Martin Jose, Haidar Khan, Beiye Liu, Jianhua Lu, Alessandro Manzotti, Pradeep Natarajan, Karolina Owczarzak, et al. (16 additional authors not shown)

Abstract: We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M to 170M parameters, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system. Though we train using 70% spoken-form data, our teacher models perform comparably to XLM-R and mT5 when evaluated on the written-form Cross-lingual Natural Language Inference (XNLI) corpus. We perform a second stage of pretraining on our teacher models using in-domain data from our system, improving error rates by 3.86% relative for intent classification and 7.01% relative for slot filling. We find that even a 170M-parameter model distilled from our Stage 2 teacher model has 2.88% better intent classification and 7.69% better slot filling error rates when compared to the 2.3B-parameter teacher trained only on public data (Stage 1), emphasizing the importance of in-domain data for pretraining. When evaluated offline using labeled NLU data, our 17M-parameter Stage 2 distilled model outperforms both XLM-R Base (85M params) and DistilBERT (42M params), by 4.23% and 6.14%, respectively. Finally, we present results from a full virtual assistant experimentation platform, where we find that models trained using our pretraining and distillation pipeline outperform models distilled from 85M-parameter teachers by 3.74% to 4.91% on an automatic measurement of full-system user dissatisfaction.

Submitted 15 June, 2022; originally announced June 2022.

Comments: KDD 2022

ACM Class: I.2.7

Journal ref: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '22), August 14-18, 2022, Washington, DC, USA
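The abstract above reports several gains as relative error-rate reductions. As a worked illustration only (the error rates below are hypothetical, not figures from the paper), a relative reduction divides the change in error by the baseline error, so a small absolute drop can correspond to a larger relative one:

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Relative reduction of new_error versus baseline_error, in percent."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error * 100.0


# Hypothetical example: an intent-classification error rate falling from
# 10.000% to 9.614% is a 0.386-point absolute drop but a 3.86% relative drop.
print(round(relative_error_reduction(10.0, 9.614), 2))  # 3.86
```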

arXiv:1907.06632

Metamorphic Testing of a Deep Learning based Forecaster

Authors: Anurag Dwarakanath, Manish Ahuja, Sanjay Podder, Silja Vinu, Arijit Naskar, Koushik MV

Abstract: In this paper, we present the Metamorphic Testing of an in-use deep learning based forecasting application. The application uses past data on system characteristics (e.g. 'memory allocation') to predict future outages. We focus on two statistical/machine learning components: (a) detection of correlation between system characteristics, and (b) estimation of the future value of a system characteristic using an LSTM (a deep learning architecture). In total, 19 Metamorphic Relations have been developed, and we provide proofs and algorithms where applicable. We evaluated our method in two settings. In the first, we executed the relations on the actual application and uncovered 8 previously unknown issues. In the second, we generated hypothetical bugs through Mutation Testing on a reference implementation of the LSTM-based forecaster and found that 65.9% of the bugs were caught by the relations.

Submitted 13 July, 2019; originally announced July 2019.

Comments: Paper published at the 2019 IEEE/ACM 4th International Workshop on Metamorphic Testing (MET)
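One metamorphic relation that fits the correlation-detection component described above is a textbook property of the Pearson coefficient: applying an affine transform $a x + b$ with $a \neq 0$ to one series must leave the magnitude of the correlation unchanged and flip its sign when $a < 0$. The sketch below assumes Pearson correlation (the abstract does not say which measure the application uses); `pearson`, `check_affine_relation`, and the synthetic series are hypothetical illustrations, not the paper's 19 relations.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def check_affine_relation(xs, ys, a, b, tol=1e-9):
    """Metamorphic relation: corr(a*x + b, y) == sign(a) * corr(x, y), a != 0."""
    source = pearson(xs, ys)                        # source execution
    follow_up = pearson([a * x + b for x in xs], ys)  # follow-up execution
    return math.isclose(follow_up, math.copysign(1.0, a) * source, abs_tol=tol)


random.seed(0)
memory = [random.uniform(0, 100) for _ in range(50)]      # e.g. memory allocation
latency = [0.5 * m + random.gauss(0, 5) for m in memory]  # a correlated metric
print(check_affine_relation(memory, latency, a=3.0, b=-7.0))  # True
print(check_affine_relation(memory, latency, a=-2.0, b=1.0))  # True
```

No ground-truth correlation value is needed: the relation compares two executions of the same implementation, which is what makes metamorphic testing useful when no test oracle exists.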

arXiv:1809.09477

Trustworthiness in Enterprise Crowdsourcing: a Taxonomy & evidence from data

Authors: Anurag Dwarakanath, Shrikanth N. C., Kumar Abhinav, Alex Kass

Abstract: In this paper we study the trustworthiness of the crowd for crowdsourced software development. Through the study of literature from various domains, we present the risks that impact the trustworthiness in an enterprise context. We survey known techniques to mitigate these risks. We also analyze key metrics from multiple years of empirical data of actual crowdsourced software development tasks from two leading vendors. We present the metrics around untrustworthy behavior and the performance of certain mitigation techniques. Our study and results can serve as guidelines for crowdsourced enterprise software development.

Submitted 25 September, 2018; originally announced September 2018.

Comments: Author's submitted version. Final version accepted at ICSE SEIP 2016. Published version at: https://dl.acm.org/citation.cfm?id=2889225

arXiv:1809.09455

Machines that test Software like Humans

Authors: Anurag Dwarakanath, Neville Dubash, Sanjay Podder

Abstract: Automated software testing involves the execution of test scripts by a machine instead of running them manually. This significantly reduces the manual time and effort needed and is thus of great interest to the software testing industry. Various tools have been developed to automate the testing of web applications (e.g. Selenium WebDriver); however, practical adoption of test automation is still minuscule. This is due to the complexity of creating and maintaining automation scripts. The key problem with existing methods is that automation test scripts require certain implementation specifics of the Application Under Test (AUT) (e.g. the HTML code of a web element, or an image of a web element). In manual testing, by contrast, the tester interprets the textual test scripts and interacts with the AUT purely based on what they perceive visually through the GUI. In this paper, we present an approach to build a machine that can mimic human behavior for software testing using recent advances in Computer Vision. We also present four use cases showing how this approach can significantly advance the test automation space, making test automation simple enough to be adopted in practice.

Submitted 25 September, 2018; originally announced September 2018.

arXiv:1809.08446

Minimum Number of Test Paths for Prime Path and other Structural Coverage Criteria

Authors: Anurag Dwarakanath, Aruna Jankiti

Abstract: The software system under test can be modeled as a graph comprising a set of vertices $V$ and a set of edges $E$. Test Cases are Test Paths over the graph that meet a particular test criterion. In this paper, we present a method to achieve the minimum number of Test Paths needed to cover different structural coverage criteria. Our method can accommodate Prime Path, Edge-Pair, Simple and Complete Round Trip, Edge, and Node coverage criteria. Our method obtains the optimal solution by transforming the graph into a flow graph and solving the minimum flow problem. We present an algorithm for the minimum flow problem that matches the best known solution complexity of $O(|V| |E|)$. Our method is evaluated through two sets of tests. In the first, we test against graphs representing actual software. In the second, we create random graphs of varying complexity. In each test we measure the number of Test Paths, the length of Test Paths, the lower bound on the minimum number of Test Paths, and the execution time.

Submitted 22 September, 2018; originally announced September 2018.

Comments: Author final version. Paper accepted at Testing Software and Systems. ICTSS 2014. Lecture Notes in Computer Science, vol 8763. Springer, Berlin, Heidelberg. Published paper available at: https://link.springer.com/chapter/10.1007/978-3-662-44857-1_5
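For the simplest criterion listed in the abstract above, Edge coverage on a DAG with a single entry $s$ and exit $t$, the flow formulation can be sketched as a minimum flow with a lower bound of 1 on every edge: start from any feasible flow (one s-t path routed through each edge), then cancel as much flow as possible by augmenting from $t$ back to $s$ in the residual graph. The resulting flow value is the minimum number of Test Paths. This is a hedged sketch using simple BFS augmentation, not the authors' $O(|V||E|)$ algorithm, and it omits the graph transformations needed for Prime Path, Edge-Pair, or Round Trip coverage; it assumes no parallel edges and that every edge lies on some s-t path. The function name is hypothetical.

```python
from collections import defaultdict, deque


def min_test_paths_edge_coverage(edges, s, t):
    """Minimum number of s-t Test Paths that together cover every edge.

    Sketch assumptions: DAG, no parallel edges, every edge on some s-t path.
    """
    edges = [tuple(e) for e in edges]
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    def bfs_path(start, goal):
        # Shortest forward path from start to goal, as a list of vertices.
        parent, queue = {start: None}, deque([start])
        while queue:
            x = queue.popleft()
            if x == goal:
                path = []
                while x is not None:
                    path.append(x)
                    x = parent[x]
                return path[::-1]
            for y in succ[x]:
                if y not in parent:
                    parent[y] = x
                    queue.append(y)
        raise ValueError(f"{goal!r} is not reachable from {start!r}")

    # Step 1: feasible flow, one s-t path through each edge (flow >= 1 everywhere).
    flow = {e: 0 for e in edges}
    for u, v in edges:
        path = bfs_path(s, u) + bfs_path(v, t)
        for e in zip(path, path[1:]):
            flow[e] += 1

    # Step 2: augment from t back to s in the residual graph. An edge may be
    # lowered only while it stays above its lower bound of 1; raising is unbounded.
    while True:
        parent, queue = {t: None}, deque([t])
        while queue and s not in parent:
            x = queue.popleft()
            for y in pred[x]:  # walk (y, x) backwards: lowers its flow
                if y not in parent and flow[(y, x)] > 1:
                    parent[y] = (x, (y, x), -1)
                    queue.append(y)
            for y in succ[x]:  # walk (x, y) forwards: raises its flow
                if y not in parent:
                    parent[y] = (x, (x, y), +1)
                    queue.append(y)
        if s not in parent:
            break  # no augmenting path left, so the flow is minimum
        delta, x = float("inf"), s
        while x != t:  # bottleneck over the lowered edges on the path
            x, e, sign = parent[x]
            if sign < 0:
                delta = min(delta, flow[e] - 1)
        x = s
        while x != t:  # apply the augmentation
            x, e, sign = parent[x]
            flow[e] += sign * delta

    # The flow value (number of Test Paths) is the total flow leaving s.
    return sum(flow[(s, v)] for v in succ[s])


diamond = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "t")]
print(min_test_paths_edge_coverage(diamond, "s", "t"))  # 2
branchy = [("s", "a"), ("a", "b"), ("a", "t"), ("b", "t"), ("s", "b")]
print(min_test_paths_edge_coverage(branchy, "s", "t"))  # 3
```

In the second graph, vertex `a` has two outgoing edges and `s` has a second edge to `b`, so flow conservation forces at least three paths (for example s-a-b-t, s-a-t, s-b-t), which the sketch recovers.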

arXiv:1809.08100

doi 10.1109/ICST.2017.52

Accelerating Test Automation through a Domain Specific Language

Authors: Anurag Dwarakanath, Dipin Era, Aditya Priyadarshi, Neville Dubash, Sanjay Podder

Abstract: Test automation involves the automatic execution of test scripts instead of running them manually. This significantly reduces the manual effort needed and is thus of great interest to the software testing industry. There are two key problems with existing tools and methods for test automation: (a) creating an automation test script is essentially a code development task, which most testers are not trained for; and (b) the automation test script is seldom readable, making maintenance an effort-intensive process. We present the Accelerating Test Automation Platform (ATAP), which aims to make test automation accessible to non-programmers. ATAP allows the creation of an automation test script through a domain specific language based on English. The English-like test scripts are automatically converted to machine-executable code using Selenium WebDriver. ATAP's English-like test scripts are easy for non-programmers to author. The functional flow of an ATAP script is also easy to understand, which simplifies maintenance (the flow of a test script remains understandable when it is revisited many months later). ATAP has been built around the Eclipse ecosystem and has been used in a real-life testing project. We present the details of the implementation of ATAP and the results from its usage in practice.

Submitted 21 September, 2018; originally announced September 2018.

Comments: Accepted at the 10th IEEE International Conference on Software Testing, Verification and Validation (ICST 2017)
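To make the idea of an English-like DSL from the abstract above concrete, the sketch below translates a few test-script sentences into structured actions that a driver layer (for example, one built on Selenium WebDriver) could then execute. The sentence patterns, the `Action` fields, the `translate` function, and the placeholder URL are all hypothetical; the abstract does not describe ATAP's actual grammar or code generator.

```python
import re
from typing import NamedTuple


class Action(NamedTuple):
    verb: str        # what the driver layer should do
    target: str      # visible label of the element (or a URL)
    value: str = ""  # text to type, if any


# Hypothetical English-like sentence patterns, each mapped to an action verb.
PATTERNS = [
    (re.compile(r"^open (?:the )?url '(?P<target>[^']+)'$", re.I), "open"),
    (re.compile(r"^click (?:on )?(?:the )?'(?P<target>[^']+)'(?: button| link)?$", re.I), "click"),
    (re.compile(r"^enter '(?P<value>[^']*)' in(?:to)? (?:the )?'(?P<target>[^']+)'(?: field)?$", re.I), "type"),
    (re.compile(r"^verify (?:that )?'(?P<target>[^']+)' is displayed$", re.I), "assert_visible"),
]


def translate(step: str) -> Action:
    """Translate one English-like test step into a structured Action."""
    step = step.strip().rstrip(".")
    for pattern, verb in PATTERNS:
        match = pattern.match(step)
        if match:
            groups = match.groupdict()
            return Action(verb, groups["target"], groups.get("value") or "")
    raise ValueError(f"Unrecognized test step: {step!r}")


script = """
Open URL 'https://example.com/login'.
Enter 'alice' into the 'Username' field.
Click on the 'Sign in' button.
Verify that 'Welcome' is displayed.
"""
for line in script.strip().splitlines():
    print(translate(line))
```

Keeping the translation step separate from execution is one way such a design can keep scripts readable: the tester edits sentences, while only the driver layer depends on element locators.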

arXiv:1808.05353

doi 10.1145/3213846.3213858

Identifying Implementation Bugs in Machine Learning based Image Classifiers using Metamorphic Testing

Authors: Anurag Dwarakanath, Manish Ahuja, Samarth Sikand, Raghotham M. Rao, R. P. Jagadeesh Chandra Bose, Neville Dubash, Sanjay Podder

Abstract: We have recently witnessed tremendous success of Machine Learning (ML) in practical applications. Computer vision, speech recognition and language translation have all reached near-human-level performance. We expect that, in the near future, most business applications will have some form of ML. However, testing such applications is extremely challenging and would be very expensive if we follow today's methodologies. In this work, we articulate the challenges in testing ML-based applications. We then present our solution approach, based on the concept of Metamorphic Testing, which aims to identify implementation bugs in ML-based image classifiers. We have developed metamorphic relations for an application based on a Support Vector Machine and for a Deep Learning based application. Empirical validation showed that our approach was able to catch 71% of the implementation bugs in the ML applications.

Submitted 16 August, 2018; originally announced August 2018.

Comments: Published at 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2018)
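A metamorphic relation of the kind described in the abstract above can be illustrated with a deliberately simple classifier. For a 1-nearest-neighbour classifier with Euclidean distance, applying the same permutation to the feature order (for image data, the same pixel shuffle) in both training and test data must not change any prediction, because every distance is unchanged. The classifier is a stand-in chosen for brevity, not the SVM or deep learning applications studied in the paper; `OneNN` and `permutation_relation_holds` are hypothetical names, and integer pixel values are used so distances are exact and ties break identically.

```python
import random
from typing import List, Sequence


class OneNN:
    """1-nearest-neighbour classifier with squared Euclidean distance."""

    def fit(self, X: Sequence[Sequence[int]], y: Sequence[int]) -> "OneNN":
        self.X, self.y = [list(x) for x in X], list(y)
        return self

    def predict(self, X: Sequence[Sequence[int]]) -> List[int]:
        def dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        # min() returns the first index on ties, so ties break deterministically.
        return [self.y[min(range(len(self.X)), key=lambda i: dist(self.X[i], x))] for x in X]


def permute(rows, perm):
    """Reorder the features of every row according to perm."""
    return [[row[j] for j in perm] for row in rows]


def permutation_relation_holds(model_cls, X_train, y_train, X_test, seed=0):
    """MR: the same feature permutation on train and test leaves predictions unchanged."""
    perm = list(range(len(X_train[0])))
    random.Random(seed).shuffle(perm)
    source = model_cls().fit(X_train, y_train).predict(X_test)
    follow_up = model_cls().fit(permute(X_train, perm), y_train).predict(permute(X_test, perm))
    return source == follow_up


rng = random.Random(42)
X_train = [[rng.randint(0, 255) for _ in range(16)] for _ in range(40)]  # 4x4 "images"
y_train = [rng.randint(0, 2) for _ in range(40)]
X_test = [[rng.randint(0, 255) for _ in range(16)] for _ in range(10)]
print(permutation_relation_holds(OneNN, X_train, y_train, X_test))  # True
```

A seeded implementation bug, such as a distance function that silently ignores some features, would typically make the source and follow-up predictions disagree, which is how a relation like this can flag bugs without knowing the correct labels.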

Showing 1–7 of 7 results for author: Dwarakanath, A