Showing 1–2 of 2 results for author: Maveli, N
-
Co-training an Unsupervised Constituency Parser with Weak Supervision
Authors:
Nickil Maveli,
Shay B. Cohen
Abstract:
We introduce a method for unsupervised parsing that relies on bootstrapping classifiers to identify whether a node dominates a specific span in a sentence. There are two types of classifiers: an inside classifier that acts on a span, and an outside classifier that acts on everything outside of a given span. Through self-training and co-training with the two classifiers, we show that the interplay between them helps improve the accuracy of both and, as a result, lets them parse effectively. A seed bootstrapping technique prepares the data to train these classifiers. Our analyses further validate that such an approach, in conjunction with weak supervision using prior branching knowledge of a known language (left/right-branching) and minimal heuristics, injects strong inductive bias into the parser, achieving 63.1 F$_1$ on the English (PTB) test set. In addition, we show the effectiveness of our architecture by evaluating on treebanks for Chinese (CTB) and Japanese (KTB), where we achieve new state-of-the-art results. Our code and pre-trained models are available at https://github.com/Nickil21/weakly-supervised-parsing.
Submitted 18 March, 2022; v1 submitted 5 October, 2021;
originally announced October 2021.
-
EdinburghNLP at WNUT-2020 Task 2: Leveraging Transformers with Generalized Augmentation for Identifying Informativeness in COVID-19 Tweets
Authors:
Nickil Maveli
Abstract:
Twitter and, more generally, social media have become an indispensable communication channel in times of emergency. The ubiquity of smartphones enables people to declare an emergency as they observe it in real time. As a result, more agencies, such as disaster relief organizations and news agencies, are interested in programmatically monitoring Twitter. Recognizing the informativeness of a Tweet can therefore help filter noise from large volumes of Tweets. In this paper, we present our submission for WNUT-2020 Task 2: Identification of informative COVID-19 English Tweets. Our most successful model is an ensemble of transformers, including RoBERTa, XLNet, and BERTweet, trained in a Semi-Supervised Learning (SSL) setting. The proposed system achieves an F1 score of 0.9011 on the test set (ranking 7th on the leaderboard) and shows significant gains in performance compared to a baseline system using FastText embeddings.
Submitted 18 April, 2021; v1 submitted 6 September, 2020;
originally announced September 2020.