Showing 1–2 of 2 results for author: Maveli, N

Search v0.5.6 released 2020-02-24

arXiv:2110.02283 [pdf, other]

cs.CL cs.AI cs.LG

Co-training an Unsupervised Constituency Parser with Weak Supervision

Authors: Nickil Maveli, Shay B. Cohen

Abstract: We introduce a method for unsupervised parsing that relies on bootstrapping classifiers to identify if a node dominates a specific span in a sentence. There are two types of classifiers, an inside classifier that acts on a span, and an outside classifier that acts on everything outside of a given span. Through self-training and co-training with the two classifiers, we show that the interplay between them helps improve the accuracy of both, and as a result, effectively parse. A seed bootstrapping technique prepares the data to train these classifiers. Our analyses further validate that such an approach in conjunction with weak supervision using prior branching knowledge of a known language (left/right-branching) and minimal heuristics injects strong inductive bias into the parser, achieving 63.1 F$_1$ on the English (PTB) test set. In addition, we show the effectiveness of our architecture by evaluating on treebanks for Chinese (CTB) and Japanese (KTB) and achieve new state-of-the-art results. Our code and pre-trained models are available at https://github.com/Nickil21/weakly-supervised-parsing.

Submitted 18 March, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

Comments: Accepted to Findings of ACL 2022
arXiv:2009.06375 [pdf, other]

cs.CL cs.IR cs.LG cs.SI stat.ML

EdinburghNLP at WNUT-2020 Task 2: Leveraging Transformers with Generalized Augmentation for Identifying Informativeness in COVID-19 Tweets

Authors: Nickil Maveli

Abstract: Twitter and, in general, social media has become an indispensable communication channel in times of emergency. The ubiquitousness of smartphone gadgets enables people to declare an emergency observed in real-time. As a result, more agencies are interested in programmatically monitoring Twitter (disaster relief organizations and news agencies). Therefore, recognizing the informativeness of a Tweet can help filter noise from the large volumes of Tweets. In this paper, we present our submission for WNUT-2020 Task 2: Identification of informative COVID-19 English Tweets. Our most successful model is an ensemble of transformers, including RoBERTa, XLNet, and BERTweet trained in a Semi-Supervised Learning (SSL) setting. The proposed system achieves an F1 score of 0.9011 on the test set (ranking 7th on the leaderboard) and shows significant gains in performance compared to a baseline system using FastText embeddings.

Submitted 18 April, 2021; v1 submitted 6 September, 2020; originally announced September 2020.

Comments: Accepted at W-NUT workshop of EMNLP 2020 (7 pages, 6 figures, 3 tables)
