1 FH Aachen – University of Applied Sciences, 52428 Jülich, Germany, {p.kohl,y.kraemer,kraft}@fh-aachen.de
2 University of Kassel, 34121 Kassel, Germany, [email protected]

Scoping Review of Active Learning Strategies and their Evaluation Environments for Entity Recognition Tasks

Philipp Kohl (1, ORCID 0000-0002-5972-8413), Yoka Krämer (1, ORCID 0009-0006-7326-3268), Claudia Fohry (2), Bodo Kraft (1)
Abstract

We conducted a scoping review for active learning in the domain of natural language processing (NLP), which we summarize in accordance with the PRISMA-ScR guidelines as follows:

Objective: Identify active learning strategies that were proposed for entity recognition and their evaluation environments (datasets, metrics, hardware, execution time).

Design: We used Scopus and ACM as our search engines. We compared the results with two literature surveys to assess the search quality. We included peer-reviewed English publications introducing or comparing active learning strategies for entity recognition.

Results: We analyzed 62 relevant papers and identified 106 active learning strategies. We grouped them into three categories: exploitation-based (60x), exploration-based (14x), and hybrid strategies (32x). We found that all studies used the F1-score as an evaluation metric. Information about hardware (6x) and execution time (13x) was only occasionally included. The 62 papers used 57 different datasets to evaluate their respective strategies. Most datasets contained newspaper articles or biomedical/medical data. Our analysis revealed that 26 out of 57 datasets are publicly accessible.

Conclusion: Numerous active learning strategies have been identified, along with significant open questions that still need to be addressed. Researchers and practitioners face difficulties when making data-driven decisions about which active learning strategy to adopt. Conducting comprehensive empirical comparisons using the evaluation environment proposed in this study could help establish best practices in the domain.

Keywords:
Scoping Review · Active Learning · Selective Sampling · Entity Recognition · Span Labeling · Annotation Effort · Annotation Costs · NLP

1 Introduction

Recent years showed significant advancements [98, 7, 19] in natural language processing (NLP): Large language models (LLMs) emerged [8], facilitating new methodologies by describing tasks in natural language without a strong formalism. Because resource-intensive LLMs are not always superior [112], smaller, supervised learning-based models are still highly relevant for specialized domains or use cases that require rapid inference or are constrained by hardware limitations (such as mobile devices or offline scenarios) [34].

One of these domains is entity recognition [73]. Entity recognition (ER) describes the task of assigning a label to a sequence of words (e.g., to extract a person, a date, or any other predefined label). To apply supervised learning to ER, data must be annotated. The manual annotation process, in which humans annotate data points with these predefined labels, is time-intensive and expensive [106]. Its output is an annotated dataset, which is also called a corpus (pl. corpora) in the NLP domain. We use the terms interchangeably in this paper.

Researchers have been exploring supporting methods to reduce the annotation effort, such as semi-supervised learning [93, 31], self-learning [94, 108], and active learning [4, 72]. Active learning (AL) is an approach to strategically or heuristically select data points for human annotation. This methodology can reduce the number of data points required to achieve competitive model performance compared to the classical annotation and training process. Thus, AL can decrease the time and cost of the annotation and training processes. However, the selection of an appropriate AL strategy is crucial. Selecting an inappropriate strategy can lead to lower performance compared to random data selection [10, 38].

Over the past two decades, researchers have developed many active learning strategies in the field of NLP for various scenarios. However, it is still challenging for researchers and practitioners to select a promising strategy for a given use case. While existing AL surveys provide taxonomies [72, 105, 106], there is still a lack of comprehensive performance analyses. Towards closing this gap, an overview of the domain can support researchers in conducting such analyses. Therefore, we executed a scoping review focusing on active learning strategies and their evaluation environments limited to the entity recognition task in NLP. We concentrated our review on model-agnostic strategies so researchers can use our results for a broad range of models. Our review answers the following review questions:

  1. Which model-agnostic AL strategies have been applied to ER?

  2. How did the researchers evaluate their strategies?

    (a) Which datasets did they use?

    (b) Which metrics did they use to compare AL strategies?

    (c) How much time do the AL strategies need for initialization, proposing new data points to annotators, and model retraining (in case of exploitation) depending on the hardware?

We chose the ER task due to its complexity in the annotation process [17] and AL [72]. The complexity results from the ER model, which makes decisions for every token (e.g., word). Many AL strategies (> 80) compute the relevance of a data point based on these individual decisions.

For our review, we selected the format of a scoping review [25, 61] because we give an overview of the domain by identifying the research field’s available strategies, datasets, metrics, execution times, and hardware used. We also identified research gaps and open questions, which enables other researchers to conduct a systematic review with precisely defined research questions in the field of AL and ER based on our work. (The preparation of systematic reviews is a strong hint for performing a scoping review.)

We adhere closely to the PRISMA-ScR [95] reporting schema and checklist. The schema provides detailed guidelines for conducting a scoping review (databases, criteria, search, data charting, …) and writing a paper with all necessary information. We publish our additional materials (exhaustive lists of all information regarding the review questions) publicly on GitHub (https://github.com/philipp-kohl/scoping-review-active-learning-er).

Section 2 defines the entity recognition task and active learning and outlines established taxonomies. Section 3 details our review process, ensuring reproducibility and extensibility. Section 4 presents the aggregated findings on AL strategies and their evaluation environments alongside open research questions. Section 5 discusses related work on reducing annotation efforts besides AL. Section 6 addresses ethical considerations, and we conclude by summarizing our findings and highlighting future directions in Section 7.

2 Definitions

Our scoping review focuses on the application of AL to ER. These terms are not used uniformly in the literature. To make our work reproducible and comprehensible, this section starts with precise definitions of both concepts concentrated on the NLP domain, as they are assumed throughout this paper. We also use the definitions as eligibility criteria (Subsection 3.3) for our scoping review.

2.1 Entity Recognition

Entity recognition (ER) describes the NLP task of using a machine learning model to find entities automatically (e.g., persons or organizations) in a text. ER works with an arbitrary predefined label set. A specialized type of ER is named entity recognition [75], which focuses on proper nouns with a label set of person, organization, location, and dates.

ER splits the text into tokens (in a simplified format into words) and assigns a label to each token. This type is called sequence labeling. The literature distinguishes between sequence [75] and span labeling approaches [84, 104]. For sequence labeling approaches, it is challenging to label overlapping entities because every token receives only one label. Span labeling closes this gap. Spans represent n consecutive tokens. Span labeling enumerates all spans with a length from 1 to n and classifies each span. Thus, span labeling has to make more decisions, which makes it more complex.
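To make the difference in the number of decisions concrete, the following minimal sketch counts the candidates of both approaches for one sentence. The function name `enumerate_spans`, the example sentence, and the maximum span length are illustrative assumptions and are not taken from the reviewed papers.

```python
def enumerate_spans(tokens, max_len):
    """Return all (start, end) index pairs of spans with 1..max_len tokens.

    `end` is exclusive, so tokens[start:end] yields the span's tokens.
    """
    spans = []
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            end = start + length
            if end <= len(tokens):
                spans.append((start, end))
    return spans


tokens = ["Angela", "Merkel", "visited", "Aachen"]

# Sequence labeling: one labeling decision per token.
sequence_decisions = len(tokens)

# Span labeling: one labeling decision per candidate span (here up to 3 tokens).
span_decisions = len(enumerate_spans(tokens, max_len=3))

print(sequence_decisions, span_decisions)
```

Because overlapping spans such as (0, 1) and (0, 2) are separate candidates, each can receive its own label, which a single label per token cannot express.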

In this paper, we use the term entity recognition for sequence and span labeling approaches with an arbitrary predefined label set.

2.2 Active Learning

Figure 1: Flow diagram for the pool-based active learning cycle following [72, 38].

Active learning (AL) reduces the annotation effort by selecting data points from an unannotated corpus with an AL strategy. Figure 1 shows the AL cycle: 1) We start with a pool of unlabeled data points. The AL strategy selects data points from the pool and passes them to the human annotator. 2) Once the annotator has enriched the data points with the labels, the new labeled batch is added to the already labeled dataset. 3) When the amount of newly added data points reaches a threshold, the (re-)training of the NLP model will be triggered. 4) A new annotation cycle is started if no stopping criterion (e.g., a desired model performance level or the unavailability of unlabeled data points) is fulfilled. 5) This cycle repeats until a stopping criterion is met.
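The cycle can be sketched as a short loop. This is a simplified illustration under stated assumptions, not an implementation from the reviewed papers: the function `run_active_learning` and its parameters are hypothetical, retraining is triggered after every batch (the threshold equals the batch size), and the only stopping criteria are an exhausted pool or a maximum number of rounds.

```python
import random


def run_active_learning(pool, select, annotate, train, batch_size, max_rounds):
    """Pool-based AL loop: select -> annotate -> extend labeled set -> retrain."""
    unlabeled = list(pool)
    labeled = []
    model = None
    for _ in range(max_rounds):
        if not unlabeled:  # stopping criterion: no unlabeled data points left
            break
        batch = select(model, unlabeled, batch_size)   # step 1: strategy picks data points
        for data_point in batch:
            unlabeled.remove(data_point)
            labeled.append((data_point, annotate(data_point)))  # step 2: human annotation
        model = train(labeled)                         # step 3: (re-)training
    return model, labeled


# Toy stand-ins (assumptions): random selection, a rule-based "annotator",
# and a "model" that merely memorizes the labeled examples.
rng = random.Random(0)


def select_random(model, unlabeled, k):
    return rng.sample(unlabeled, min(k, len(unlabeled)))


def annotate(text):
    return "PER" if text.istitle() else "O"


def train(labeled):
    return dict(labeled)


model, labeled = run_active_learning(
    ["Alice", "runs", "Bob", "sleeps", "Carol"],
    select_random, annotate, train, batch_size=2, max_rounds=10,
)
```

A real AL strategy would replace `select_random` and, in the exploitation case, use `model` to score the unlabeled data points.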

AL assumes that data points are not equally valuable for the amount of knowledge the NLP model gains [69]. AL algorithms pick the data points for annotating and, therefore, also for training to maximize the knowledge gained per annotated data point [69, 58, 110]. Ideally, this process reduces annotation effort and cost compared to sequential or random data selection [110, 69]. However, using an unsuitable strategy can even lower performance compared to random selection [10, 38].

We address pool-based AL, where a (large) unlabeled dataset is available and the strategy selects the most promising data points from it. An alternative approach is stream-based AL [53]: the data is passed to the strategy one data point at a time, and the strategy decides whether to propose that data point to the annotator without incorporating information from other data points. Users apply stream-based AL, e.g., when hardware resources are limited.

As a basis for our own work, we adopted the well-known taxonomies from surveys [56, 5, 106, 72] to categorize active learning strategies. Thereby, we modified the terminology to consistently use the same term for semantically similar concepts, improving our work’s clarity and readability. On the top level of our categorization of AL strategies, we follow the distinction into exploitation-based, exploration-based, and hybrid strategies:

Exploitation methods leverage feedback from the ER model to assess the potential value of a data point in the learning process [56]. These methods typically use uncertainty scores, disagreement among multiple weak learners, or performance predictions as their basis. The intuition behind exploitation strategies is to refine the decision boundaries, although they tend to focus on outliers [106]. The authors of [106] use the term informativeness, which is very similar to our understanding of exploitation. Hence, we merge the two terms to keep the naming consistent throughout this work. Informativeness strategies consider a single independent instance without assessing its relation to other data points. The authors do not explicitly state the need for model feedback, although all the strategies they list use model information. Thus, the definition aligns well with exploitation.
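As an illustration of uncertainty scores, the sketch below computes three heuristics that Table 2 names (least confidence, margin, and entropy) for the predicted label distribution of a single token. These are common textbook formulations and an assumption on our part; individual papers may define or normalize them differently.

```python
import math


def least_confidence(probs):
    """1 minus the probability of the most likely label (higher = more uncertain)."""
    return 1.0 - max(probs)


def margin(probs):
    """Gap between the two most likely labels (lower = more uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def entropy(probs):
    """Shannon entropy in nats (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical label distributions of two tokens over three labels.
confident = [0.90, 0.05, 0.05]
uncertain = [0.40, 0.35, 0.25]

for name, probs in [("confident", confident), ("uncertain", uncertain)]:
    print(name, least_confidence(probs), margin(probs), entropy(probs))
```

All three heuristics rank the second token as more uncertain, but note the opposite direction of the margin score.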

In the case of AL for ER, we have to consider that ER works on the token or span level. The model thus provides separate feedback for each token, which AL strategies have to interpret to measure the usefulness of the whole data point. For this purpose, different aggregation methods are known in the literature [71], e.g., the total sum, the average, or the score of the single most uncertain token.
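The choice of aggregation can change which data point is selected. The following sketch, with hypothetical token-level uncertainty scores and an illustrative `aggregate` helper, compares the three aggregation methods named above on a short and a long sentence.

```python
def aggregate(token_scores, method):
    """Collapse token-level uncertainty scores into one score per data point."""
    if method == "sum":
        return sum(token_scores)
    if method == "mean":
        return sum(token_scores) / len(token_scores)
    if method == "max":
        return max(token_scores)
    raise ValueError(f"unknown aggregation method: {method}")


# Hypothetical token-level uncertainties (higher = more uncertain).
short_sentence = [0.9, 0.1]   # one very uncertain token
long_sentence = [0.4] * 6     # many moderately uncertain tokens

choices = {}
for method in ("sum", "mean", "max"):
    short_score = aggregate(short_sentence, method)
    long_score = aggregate(long_sentence, method)
    choices[method] = "short" if short_score > long_score else "long"

print(choices)
```

In this toy setting, the sum favors the long sentence, whereas the average and the maximum favor the short one, which illustrates why the aggregation is part of a strategy's definition.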

Exploration methods select data points independently of model feedback. They use vector representations combined with clustering approaches based on density, diversity, and discriminative criteria to determine a data point’s relevance. Exploration strategies aim to cover the vector space holistically [56]. The authors of [106] use the term representativeness for strategies that include information on multiple data points to select a subset. Analogous to exploitation and informativeness, we use the term exploration in this work.
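A minimal sketch of one possible density-based exploration score follows. It is an assumption for illustration only: the score is the average cosine similarity of a data point's vector to all other vectors in the pool, the two-dimensional vectors are made up, and no ER model is involved.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def density_scores(vectors):
    """Average cosine similarity of each vector to all other vectors."""
    scores = []
    for i, u in enumerate(vectors):
        others = [cosine(u, v) for j, v in enumerate(vectors) if j != i]
        scores.append(sum(others) / len(others))
    return scores


# Hypothetical document embeddings: three similar documents and one outlier.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
scores = density_scores(vectors)
most_representative = max(range(len(vectors)), key=scores.__getitem__)
```

The outlier receives the lowest density score and is therefore not proposed first, in contrast to the outlier tendency of exploitation strategies.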

Hybrid methods combine exploitation and exploration methods [106]. They use several strategies sequentially, in parallel, or combine them in weighted aggregation. This way, strategies compensate for other strategies’ drawbacks (e.g., selecting outliers). Because hybrid strategies incorporate exploitation strategies, we must also consider aggregation methods.
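The weighted aggregation can be sketched as a convex combination of an exploitation and an exploration score. The weight `alpha`, the function `hybrid_scores`, and all score values are hypothetical; the reviewed papers use various combination schemes.

```python
def hybrid_scores(uncertainty, density, alpha=0.5):
    """Weighted combination: alpha * uncertainty + (1 - alpha) * density."""
    return [alpha * u + (1 - alpha) * d for u, d in zip(uncertainty, density)]


# Hypothetical scores for three data points; the third is highly uncertain
# but lies in a sparse region of the vector space (an outlier).
uncertainty = [0.2, 0.6, 0.9]
density = [0.8, 0.7, 0.1]

pure_choice = max(range(3), key=uncertainty.__getitem__)
combined = hybrid_scores(uncertainty, density, alpha=0.5)
hybrid_choice = max(range(3), key=combined.__getitem__)

print(pure_choice, hybrid_choice)
```

The pure uncertainty strategy selects the outlier, while the weighted combination prefers the second data point, which is both fairly uncertain and representative.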

We use this taxonomy to group the AL strategies of our screened papers in Subsection 4.1.

3 Methodology

Our scoping review follows the procedure proposed by PRISMA-ScR [95]. We have selected and analyzed papers introducing or modifying active learning strategies applied to ER. For these papers, we list and group the active learning strategies and evaluation datasets. We inspected the evaluation environment to see which metrics researchers use and if they describe the used hardware and give information about execution times.

3.1 Search Engines

We based the search engine selection on [26]. We chose Scopus (https://www.scopus.com/) as our primary search engine because of its literature coverage, advanced searching, and filter features. As a secondary search engine, we used ACM Digital Library (https://dl.acm.org/) to challenge Scopus and enhance the literature coverage.

Scopus is a multidisciplinary, international database and search engine (https://www.elsevier.com/products/scopus/content). It allows the downloading of search results in bulk and supports repeatable queries, guaranteeing reproducibility and maintainability [26]. Scopus was highlighted in [26, 9] as an appropriate choice due to its robust functionalities and extensive database containing more than 14,000 scientific journals. Regarding our specific area of computational linguistics, a manual search for 20 prominent conferences and journals (https://scholar.google.com/citations?view_op=top_venues&hl=en&vq=eng_computationallinguistics) confirmed that Scopus indexes all of them. The database encompasses 2689 sources (conference proceedings, journals, book series, …) in computer science and 364 in artificial intelligence, highlighting its broad scope.

We used advanced searching with Scopus and the following query:

    TITLE-ABS-KEY (
        ("Active Learning" OR "Selective Sampling") AND  # End Group 1
        ("Sequence Labeling" OR "Span Categorization" OR "Entity Classification" OR
         "Named Entity" OR "Entity Recognition" OR "Span Labeling" OR
         "Information Extraction" OR "Sequence Tagging")
    ) AND  # End Group 2
    ( LIMIT-TO ( LANGUAGE,"English" ) ) AND
    ( LIMIT-TO ( DOCTYPE,"cp" ) OR LIMIT-TO ( DOCTYPE,"ar" ) OR LIMIT-TO ( DOCTYPE,"ch" ) ) AND
    ( LIMIT-TO ( PUBSTAGE,"final" ) )  # End Group 3

This search query covers articles’ titles, abstracts, and keywords, looking for relevant matches. It is structured with three groups of terms linked by a logical 'AND', which are indicated with the comments End Group n. These groups include synonyms for active learning, entity recognition, and filtering criteria. With the filtering criteria, we included only papers written in English that are either book chapters (ch), articles (ar), or conference papers (cp). The terms in the second group are not strict synonyms. However, in various sources, they are commonly used to describe ER. The selection of these synonyms has evolved iteratively: beginning with 'Entity Recognition', we then expanded the list by analyzing keywords in articles found through Scopus and literature surveys such as [69, 106], adding relevant terms gradually.
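Since the synonym lists evolved iteratively, it can help to assemble the query string from term lists instead of editing it by hand. The sketch below only builds the string; it does not call any Scopus API, and the helper `or_group` and the variable names are assumptions for illustration.

```python
def or_group(terms):
    """Join quoted search terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


# Group 1: synonyms for active learning.
al_terms = ["Active Learning", "Selective Sampling"]

# Group 2: terms commonly used to describe entity recognition.
er_terms = [
    "Sequence Labeling", "Span Categorization", "Entity Classification",
    "Named Entity", "Entity Recognition", "Span Labeling",
    "Information Extraction", "Sequence Tagging",
]

# Group 3: filtering criteria, copied from the query in this section.
filters = (
    '( LIMIT-TO ( LANGUAGE,"English" ) ) AND '
    '( LIMIT-TO ( DOCTYPE,"cp" ) OR LIMIT-TO ( DOCTYPE,"ar" ) '
    'OR LIMIT-TO ( DOCTYPE,"ch" ) ) AND '
    '( LIMIT-TO ( PUBSTAGE,"final" ) )'
)

query = f"TITLE-ABS-KEY ( {or_group(al_terms)} AND {or_group(er_terms)} ) AND {filters}"
print(query)
```

Adding a newly discovered synonym then amounts to appending one entry to `er_terms` and re-running the search.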

ACM Digital Library serves as a secondary search engine to complement our primary database search with Scopus. Its ACM Guide to Computing Literature database indexes over 2.8 million records (https://libraries.acm.org/digital-library/acm-guide-to-computing-literature), emphasizing conference proceedings, a key source of current research.

ACM does not offer to search titles, abstracts and keywords as a whole. Thus, we adapted our Scopus search string and applied it only to abstracts, leaving out our Scopus-specific filtering criteria. We hypothesized that research papers that address AL and ER would likely include relevant keywords in their abstracts.

3.2 Review Process

Figure 2: Our review process followed the procedure proposed by [95]. It is divided into five stages, described in more detail in Subsection 3.2. The numbers of exclusion reasons listed for stages 2) to 4) do not always add up to the total number of excluded records because multiple exclusion criteria can apply to the same record. See the GitHub repository for a detailed list.

Figure 2 gives an overview of our review process: First, for the Identification of relevant papers (records), we entered the search strings stated in Subsection 3.1 in the advanced search fields on Scopus and ACM. We last updated the results of this search on the 12th of January, 2024. This search yielded 260 papers at Scopus and 105 papers at ACM. To further secure the comprehensiveness of our results, we manually compared the results with two AL literature surveys [69, 106]. Together, these two surveys yielded 121 papers: 103 papers from [69] and 18 papers from [106]. We focused our manual search on the relevant sections to avoid including papers not targeting our topic: we extracted references from Sections 3 and 4.2.4 of [69], and from [106], we took the sources listed in Appendix A for (named) ER.

We imported the results from Scopus, ACM, and the literature surveys (486 papers) to the screening and documentation tool rayyan.ai (https://www.rayyan.ai/). This tool facilitates documenting the results of the automatic Pre-Screening and the following manual Screening Rounds 1 and 2 by recording the reviewers’ decisions and exclusion reasons. During Pre-Screening, we excluded duplicates and conference proceedings. Then, in Screening Round 1, one reviewer screened all documents and excluded only those fulfilling at least one of the obvious exclusion criteria (Subsection 3.3). Thereby, the reviewer excluded 248 documents. Then, in Screening Round 2, two reviewers screened the remaining 124 papers independently and analyzed them regarding their fit to our detailed exclusion criteria (Subsection 3.3). In this stage, we excluded 2 of the 3 remaining papers from our manual search that had passed the screening process so far. The other 118 manually added papers had already been excluded in earlier stages: 25 during Pre-Screening and 93 during Screening Round 1. The resulting exclusion of more than 99% of the manually added records indicates a high coverage of our Scopus and ACM search results.
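The record counts in this subsection can be cross-checked with a few lines of arithmetic. The numbers below are taken from the text; the variable names are our own.

```python
# Records identified per source.
scopus, acm, surveys = 260, 105, 121
total_imported = scopus + acm + surveys

# Manually added survey records excluded per stage.
excluded_pre_screening = 25
excluded_round_1 = 93
excluded_round_2 = 2
excluded_manual = excluded_pre_screening + excluded_round_1 + excluded_round_2

share_excluded = excluded_manual / surveys

print(total_imported)           # total number of imported papers
print(f"{share_excluded:.1%}")  # share of manually added records that were excluded
```

Only one of the 121 manually added records remained, which is the basis for the coverage statement above.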

Finally, we analyzed the remaining 62 papers and created the results (Section 4): One reviewer extracted the AL strategies, the datasets, metrics, used hardware, and execution times. The second reviewer verified these results to improve the outcome’s quality and coverage. Overall, we identified 106 AL strategies applied to 57 datasets. See our GitHub repository for details.

The distinction between different AL strategies in the context of ER is not trivial. The scores are often calculated at the token level, which must be aggregated to select entire documents. We consider two AL strategies as different if they differ on at least one level: e.g. if the Least Confidence (LC) score is calculated on the token level, some authors average all token scores to a document-level LC score. Others use the value of the token with minimal confidence. This difference is represented in our analysis by identifying two separate AL strategies.
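This counting rule explains why the tables report both a number of strategies and a number of usages. The sketch below uses hypothetical paper data and identifies a strategy by the pair of heuristic and aggregation level; the actual extraction may consider further levels.

```python
from collections import Counter

# Hypothetical extraction result: one (heuristic, aggregation) pair per usage in a paper.
usages = [
    ("least_confidence", "mean_over_tokens"),
    ("least_confidence", "min_token_confidence"),
    ("least_confidence", "mean_over_tokens"),
    ("entropy", "sum_over_tokens"),
]

usage_counts = Counter(usages)

num_strategies = len(usage_counts)        # distinct (heuristic, aggregation) pairs
num_usages = sum(usage_counts.values())   # how often any strategy was applied

print(num_strategies, num_usages)
```

Here the two least confidence variants count as two strategies because they differ in the aggregation level, while the repeated usage of the averaged variant only increases the usage count.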

We categorize the identified strategies according to the taxonomy already described in Subsection 2.2.

3.3 Eligibility Criteria

We defined inclusion and exclusion criteria based on our review questions in Section 1. We used these criteria to perform our scoping review. Furthermore, they help other researchers reproduce or update this scoping review. We did not apply any restrictions on the papers’ publication year. All other criteria are listed below.

A paper had to match the following inclusion criteria holistically in order to be included in the review. The paper had to:

  • apply a pool-based AL method as defined in Subsection 2.2.

  • apply AL strategies to ER as defined in Subsection 2.1.

  • be written in English to ensure it addresses the global community.

  • be peer-reviewed, which represents a successful prior quality assessment.

  • use at least one model-agnostic AL strategy. We want to investigate strategies that can be applied to a broad spectrum of use cases and models.

We defined two groups of exclusion criteria to structure our review process (Figure 2): Obvious exclusion criteria contain more formal and less complex decisions that one reviewer can make based on the abstract and, if necessary, an additional short screening of the full text. Detailed exclusion criteria require a more detailed content analysis and were therefore assessed by two reviewers independently. In this case, both reviewers read the paper carefully and analyzed its contents to make a decision.

If a paper matched one of the following obvious exclusion criteria, we excluded it from the review:

  • The paper was a duplicate. Duplicates could occur because we used several search strategies and included all results in the first step.

  • A full-text version was not accessible via free access or via an IEEE or Scopus subscription.

  • The record was a complete conference proceeding. Conference proceedings were excluded because the relevant individual papers should also be contained in our search results.

  • The paper conducted a survey or a systematic/scoping review. We excluded them due to the same reason as conference proceedings.

  • The paper evaluated its AL strategies only on datasets in languages that do not use the Latin writing system. This restriction creates a language group with a common base, which is essential for the model selection [16].

  • The record did not report on an ER task.

  • The paper did not use AL.

  • AL was not applied to an ER task.

  • The paper used a stream-based AL procedure.

If a paper matched one of the detailed exclusion criteria, we excluded it from our review:

  • Used AL strategies were not identifiable. In that case, the paper does not focus on AL as a main topic, which does not align with our objective.

  • None of the AL strategies presented were model-agnostic.

  • AL was applied to multiple NLP tasks in a non-separable manner.

  • AL was combined with other methods to reduce the annotation costs, namely data augmentation [43], weak [24] and distant [40] supervision, proactive learning [42], over-labeling [59], semi-supervised learning [93], self-learning [62], self-training [108], multi-task AL [109], pre-tagging [54], cross-lingual transfer learning [11], and imitation learning [50]. We excluded the combination because these methods introduce different changes to the AL cycle (Subsection 2.2).

  • The paper’s entity recognition task did not match our definition from Subsection 2.1. Observed modifications were transfer knowledge (e.g., using a source corpus to transfer knowledge onto a target corpus [47, 86]) or selecting subsequences instead of whole samples [68, 52].

4 Results

Figure 3: Publication years of the 62 papers analyzed within this scoping review (all published after 2000).

The following sections discuss and summarize our results. For comprehensive lists of papers, AL strategies, corpora, and evaluation environments with detailed information, please consult our GitHub repository (https://github.com/philipp-kohl/scoping-review-active-learning-er). The analyses presented in the following answer our review questions from Section 1. Furthermore, we identify research gaps, which we provide after our observations.

All papers identified through the procedure described in Subsection 3.2 were published between 2004 and 2023 (compare Figure 3). As shown in the figure, the interest in AL for ER is on the rise. Almost half of the papers (30 out of 62) were published in the past five years.

When looking at our results, our eligibility criteria must be kept in mind. We selected papers presenting research on developing or modifying AL strategies for ER. This introduces a bias that hinders the transfer of our results outside of this scientific scope.

Table 1: Overview of the AL strategies identified divided into categories following [106, 73, 72]. More details concerning the concrete selection strategies can be found in the references and in our GitHub repository.

| AL method | Specification | # of AL strategies | # of usages | Papers |
|---|---|---|---|---|
| Exploitation | Uncertainty | 36 | 97 | [51, 44, 56, 103, 79, 1, 12, 91, 64, 92, 107, 71, 2, 17, 18, 107, 83, 39, 64, 20, 57, 55, 73, 85, 74, 29, 87, 63, 37, 45, 46, 97, 60, 80, 10, 111, 76, 77, 48, 81, 14, 70, 78, 36, 67, 77, 101, 13] |
| Exploitation | Disagreement | 14 | 23 | [103, 78, 80, 76, 10, 77, 28, 90, 66, 73, 17, 65, 27, 89] |
| Exploitation | Performance Prediction | 9 | 10 | [10, 29, 63, 48, 73] |
| Exploitation | Variance Reduction | 1 | 1 | [73] |
| Exploration | Density | 6 | 6 | [10, 13, 29, 111, 14, 97] |
| Exploration | Discriminative | 5 | 6 | [44, 10, 13] |
| Exploration | Density & Discriminative | 3 | 3 | [56, 36] |
| Hybrid | Uncertainty & Density | 14 | 18 | [37, 97, 48, 35, 12, 102, 73, 56, 14, 111, 100] |
| Hybrid | Uncertainty & Discriminative | 12 | 13 | [78, 10, 35, 36, 71] |
| Hybrid | Uncertainty & Other | 5 | 5 | [36, 87, 6] |
| Hybrid | Disagreement & Discriminative | 1 | 1 | [23] |

Table 2: Uncertainty-based AL strategies with their heuristic.

| Heuristic | # of strategies | # of usages |
|---|---|---|
| Least Confidence | 11 | 35 |
| Entropy | 9 | 22 |
| Margin | 4 | 14 |
| Count | 4 | 4 |
| Round Robin | 3 | 3 |
| Max. Norm. Log-Probability | 1 | 15 |
| Other | 4 | 4 |
| Sum | 36 | 97 |

4.1 Active Learning Strategies for ER (Review Question 1)

In total, we identified 106 AL strategies with ER applications in our scoping review. Table 1 lists the total number of AL strategies using the different methods (exploitation, exploration, and hybrid) and their specification following [106, 73, 72].

We list our observations of the results regarding the AL strategies in the following:

Focus on Exploitation-based Approaches

We identified 60 exploitation-based AL strategies, which were used 131 times in our 62 analyzed papers. Uncertainty-based AL strategies represent the majority (Table 1): 60% of the exploitation-based strategies belong to uncertainty-based approaches, which are used 74% of the time. Table 2 shows the number of strategies and their usages of the uncertainty-based AL strategies grouped by the different scoring approaches, which we call heuristics. Least confidence approaches were developed and used most.
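The totals and percentages in this paragraph follow directly from the counts in Table 1. The short check below re-derives them; the data structure and variable names are our own, and the numbers are copied from the table.

```python
# (number of strategies, number of usages) per specification, in the order of Table 1.
table1 = {
    "Exploitation": [(36, 97), (14, 23), (9, 10), (1, 1)],  # uncertainty listed first
    "Exploration": [(6, 6), (5, 6), (3, 3)],
    "Hybrid": [(14, 18), (12, 13), (5, 5), (1, 1)],
}

totals = {
    method: (sum(s for s, _ in rows), sum(u for _, u in rows))
    for method, rows in table1.items()
}

uncertainty_strategies, uncertainty_usages = table1["Exploitation"][0]
strategy_share = uncertainty_strategies / totals["Exploitation"][0]
usage_share = uncertainty_usages / totals["Exploitation"][1]

print(totals)
print(f"{strategy_share:.0%} of strategies, {usage_share:.0%} of usages")
```

The same totals yield the 15 isolated and 37 hybrid usages discussed for the exploration-based approaches.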

Infrequently Used Exploration-based Approaches

We identified 14 exploration-based strategies. They are applied less often in isolation (15 times) than in hybrid settings (37 times). This is noteworthy because the implementation of hybrid approaches is more complex. Exploitation-based strategies, in contrast, are used extensively on their own.

Distribution of AL Strategies Over Domains

As depicted in Table 3, the three most used domains are bio-medicine, medicine, and newspaper articles. In all three domains, considerably more exploitation than exploration approaches are applied. Interestingly, hybrid strategies are on par with exploitation strategies in the medical domain (17 each). In the other two domains, hybrid strategies are used less than half as often as exploitation strategies.

Table 3: Number of strategies applied to the three main domains.

| Approach | # of strategies | Medicine | Biomedicine | News-corpora |
|---|---|---|---|---|
| Exploitation | 60 | 17 | 29 | 44 |
| Exploration | 14 | 5 | 3 | 4 |
| Hybrid | 32 | 17 | 13 | 14 |
| Sum | 106 | 39 | 45 | 62 |

4.2 Corpora (Review Question 2a)

We identified 57 corpora from more than 9 domains. Most corpora belong to the domains of bio-medicine (12), medicine (9), and newspaper articles (7). The others hold three or fewer corpora: cybersecurity (3), scientific papers (3), Twitter posts (2), Wikipedia articles (3), instructions (2), and e-mail (2). We group the single-domain corpora into an "other" category (14). See the GitHub repository for an exhaustive list.

Figure 4 shows the corpora usage per domain. 35 out of 62 papers use newspaper articles to investigate AL performance. Second and third are bio-medicine and medicine, with 23 usages each. The bio-medicine and medicine domains have the highest number of corpora, but researchers use newspaper articles more frequently.

The CoNLL [88] corpora (2003 and 2002) based on newspaper articles were the most used corpora with 30 usages. Other often used corpora were i2b2/VA 2010 (medicine) [96] with 7 usages and JNLPBA (bio-medicine) [15] with 6 usages.

We investigated public access to corpora and prepared a list of accessible datasets for further research. We consider a corpus open access if the researchers provide a link to the dataset or to a reference that introduced and published the corpus. Also, we consulted the authors’ web pages to find corpora for their publications when necessary. If a corpus has to be requested and is only licensable for research and academic purposes, we consider it as open access. 26 out of 57 corpora follow the reproducible research requirement to publish their datasets with open access.

Researchers do not always publish their datasets or make their annotations freely available. 26 of 57 corpora are private or can be licensed against a fee. We classify them as not open access. For 5 corpora, we could not draw a reasonable decision due to invalid or moved internet resources. Thus, we consider them as not openly accessible.

We list our observations of the results regarding the corpora in the following.

High Usage of Newspaper Articles

Newspaper articles show the highest usage (Figure 4) across the domains. More than half of the identified papers evaluate their active learning strategies on newspaper articles, although the medicine and bio-medicine domains have more corpora. We hypothesize that newspaper articles are often freely available, and the annotation process does not need the same level of expertise as for (bio-)medicine.

Most Corpora for Bio-medicine and Medicine Domain

The annotation process for the bio-medicine and medicine corpora can be very costly due to highly educated staff. 7 out of 21 bio-medicine and medicine corpora are accessible under open-access licensing (see GitHub). We hypothesize that the number of corpora indicates that the domain sees great potential to reduce the annotation effort with AL. With their contribution of publicly available corpora, which were annotated by highly educated staff, they probably want to facilitate more research.

We provide a list of publicly accessible corpora designated for ER, enabling researchers to investigate and advance the development of ER and AL methods.

Figure 4: The figure shows how many times corpora from specific domains are used in our 62 papers, grouped by the corpus licensing. The majority of the papers use open access corpora for their experiments.

4.3 Metrics, Hardware, and Execution Times (Review Questions 2b, 2c)

Our examination reveals a uniformity in the metrics used across studies, suggesting a consensus on their effectiveness. Regarding the hardware, only 6 out of 62 papers detail the hardware used for experiments, which we consider a critical oversight given that the hardware can significantly influence the training and inference times. This impacts the annotation process: The time for retraining models and determining new data points for the annotators results in waiting times [97]. The hardware reported ranged from personal computers to small server instances and workstations. No information was found on the usage of distributed clusters. Our analysis of execution times identified 13 papers reporting on training duration, annotator wait times, inference speeds, and annotation time.

We made the following observations:

AL Performance Comparison Metrics

Due to the differences in implementations, parameters, and environments, a direct comparison of the performance of different AL strategies is not feasible. However, the records show which metrics are used to evaluate AL strategies. The most frequent are F1-score (60), precision (16), and recall (16). Accuracy, annotation time, and error rate appear rarely (< 4 times).
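For illustration, the three most frequently reported metrics can be computed from sets of entity spans as in the following minimal sketch. It assumes exact-match scoring over (start, end, label) triples; the reviewed papers may use other matching schemes, and the `Span` representation, the function name `prf1`, and the example spans are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start token, end token, label).
Span = Tuple[int, int, str]


def prf1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over entity spans."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Invented example: two of four predictions match the three gold entities.
gold = {(0, 2, "PER"), (5, 6, "ORG"), (9, 10, "LOC")}
pred = {(0, 2, "PER"), (5, 6, "LOC"), (9, 10, "LOC"), (12, 13, "ORG")}
precision, recall, f1 = prf1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this example, the span (5, 6) is predicted with the wrong label and therefore counts as both a false positive and a false negative under exact matching.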

AL Execution Times and Used Hardware

Little attention is paid to these aspects. Only 13 out of 62 papers reported any kind of timing information, and only 6 stated the hardware they used. Papers presenting real-world applications of AL to ER tasks [97] mention the relevance of short retraining times for the AL model because they correlate with the waiting time for annotators. We could not find information about the duration of the initialization of an AL strategy, nor about the time a strategy needs to propose new data points to the annotator.

4.4 Research Gaps

Based on our observations from the last sections, we formulate open topics as questions, which can guide future work in the field of AL and ER:

General

How do AL strategies compare across domains in terms of model performance and execution time under specified hardware conditions? Is there a universally effective active learning strategy independent of the use case?

Exploitation Approaches

What are the reasons for the quantitative dominance of exploitation-based, especially uncertainty-based, AL strategies? Are these strategies also outstanding qualitatively? Are they used as a solution for ER, or are they primarily used as baselines for comparison with other strategies?

Exploration and Hybrid Approaches

Is the isolated usage of exploration-based AL strategies less beneficial than solely exploitation-based applications? Do hybrid approaches outperform exploration but not exploitation-based strategies?

Domain Research

Why does medical research for ER focus equally on exploitation and hybrid approaches while other domains favor exploitation-based strategies? Does the intense focus on evaluating AL strategies on newspaper articles reveal well-performing strategies? Do newspaper articles function as a baseline? Do AL strategies perform on specialized domains such as medicine as well as for broad domains like newspaper articles? Do the corpora and label sets differ in number and complexity? Is there a universally effective active learning strategy, or are certain strategies more effective in specific domains?

4.5 Evaluation Environment

According to the results presented, we establish a set of criteria for evaluating the effectiveness of Active Learning (AL) strategies in future works. An evaluation framework like ALE [38] and a reasoned selection of AL strategies and datasets enable a fair comparison. The evaluation environment should consider the following aspects:

Strategies

The comparison of strategies should include at least one strategy of each specification and heuristic (see Table 1 and Table 2). To assess the impact of the aggregation method, the permutation should be considered for exploitation and hybrid strategies.
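The role of the aggregation method can be illustrated with a minimal sketch: token-level uncertainty scores are aggregated into one score per sentence, and the resulting ranking depends on the aggregation chosen. The least-confidence heuristic, the three aggregations (mean, max, sum), and all probabilities below are assumptions for illustration and are not taken from Table 1 or Table 2.

```python
from statistics import mean
from typing import Callable, Dict, List

# Invented per-token confidences of the most likely label for three sentences.
sentences: Dict[str, List[float]] = {
    "s1": [0.9, 0.9, 0.2],
    "s2": [0.6, 0.6, 0.6, 0.6],
    "s3": [0.95, 0.5],
}


def least_confidence(token_probs: List[float]) -> List[float]:
    """Token-level uncertainty: 1 minus the confidence of the best label."""
    return [1.0 - p for p in token_probs]


# Assumed aggregation methods that map token scores to one sentence score.
aggregations: Dict[str, Callable[[List[float]], float]] = {
    "mean": mean,
    "max": max,
    "sum": sum,
}


def rank(aggregate: Callable[[List[float]], float]) -> List[str]:
    """Sentence ids ordered from most to least uncertain."""
    scores = {sid: aggregate(least_confidence(p)) for sid, p in sentences.items()}
    return sorted(scores, key=scores.get, reverse=True)


rankings = {name: rank(aggregate) for name, aggregate in aggregations.items()}
for name, order in rankings.items():
    print(name, order)
```

With these numbers, max aggregation ranks s1 first because of its single very uncertain token, whereas mean and sum rank s2 first. Evaluating each heuristic with each aggregation makes such effects visible.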

Dataset

The strategies should be tested on a diverse set of corpora. This assesses each strategy's robustness and makes it possible to investigate whether an overall high-performing AL strategy exists. Based on the open access corpora (Figure 4), several domains can be tested (news, (bio-)medicine, scientific, and social media). The corpora may differ in size, language, and label complexity, which introduces different challenges.

Hardware and Execution Time

It is important to consider the time constraints of AL strategies as they can affect the annotation process [33]: The time required for proposing new data points and retraining the model can impact the waiting time for annotators. Therefore, it is essential to record the timings for initializing the AL strategy, proposing data points, and retraining the model. Because these timings depend strongly on the hardware, the hardware used must be reported as well.
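One possible way to record these three timings together with basic hardware information is sketched below using only the Python standard library. The phase names, the `timed` helper, and the placeholder workloads are hypothetical; a real experiment would wrap its own strategy and training calls, and GPU details would have to be collected separately.

```python
import os
import platform
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator, List

# Wall-clock durations in seconds, grouped by AL phase.
timings: Dict[str, List[float]] = defaultdict(list)


@contextmanager
def timed(phase: str) -> Iterator[None]:
    """Measure the wall-clock time of the enclosed block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase].append(time.perf_counter() - start)


def hardware_info() -> Dict[str, object]:
    """Basic CPU and OS information; GPU details are not covered here."""
    return {
        "system": platform.system(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "cpu_count": os.cpu_count(),
        "python": platform.python_version(),
    }


# Placeholder workloads standing in for a real strategy and model.
with timed("init_strategy"):
    pool = list(range(1000))

for _ in range(3):  # three simulated AL iterations
    with timed("propose"):
        batch = pool[:10]
    with timed("retrain"):
        sum(x * x for x in pool)

report = {"hardware": hardware_info(), "timings_s": dict(timings)}
print(report)
```

Storing the per-iteration durations rather than only their sum makes it possible to see how proposal and retraining times grow as the labeled set increases.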

Evaluation metric

Based on this scoping review, the F1-score should be among the reported metrics.

Bias

Tracking bias reinforcement can help identify strategies that mitigate bias instead of amplifying it (see Section 6).

5 Related Work

Besides AL, researchers developed other approaches to reduce the annotation effort. Semi-supervised [82] and weak supervision [49] techniques rely on an initial dataset from which they derive rules or heuristics, enabling the automatic annotation of a larger dataset with reduced manual effort. These approaches might introduce noise into the data due to less precise heuristics.

Distant supervision [32], on the other hand, leverages external resources to generate positive examples for specific tasks, which is especially useful when external knowledge bases can provide substantial input. Data augmentation [22] complements these methods by creating new instances from already labeled data through various linguistic transformations.

AL and the stated methods can be used together to further reduce the manual effort [93, 24, 40, 43].

The surveys [69, 106] are most closely related to our work. Of these, [69] considers deep learning techniques with AL in several areas (such as computer vision), while [106] focuses solely on NLP. Both surveys categorize the strategies found, many of which cannot be directly applied to ER.

6 Ethical Consideration

AL must be ethically scrutinized in the general context of NLP [41]. Specifically, AL as a data selection method can introduce or reinforce statistical bias [21, 30]. The authors of these works propose possible reasons for this unwanted effect, such as the distribution of data points in the seed and train dataset. Furthermore, AL errors can cause the model to become wrongly confident, making it difficult to correct the learned structure. As a result, data for the affected concept may no longer be proposed because the model does not show room for improvement [30]. Another bias may be transferred from transformer models [99] during the pre-training phase of the AL model [30].

These bias-introducing and bias-reinforcing effects are especially alarming considering the focus of AL research on the medical domain. Existing approaches optimize for fairness metrics [3] or vary the error rate in each iteration of adaptive clustering to reduce bias [30] for classification tasks. These methods should also be tested for ER.

7 Conclusion and Future Work

We conducted a scoping review to provide an overview of active learning strategies, metrics, datasets, execution times, and hardware used for the entity recognition task.

Comprehensive lists of our results can be found in the provided GitHub repository. We identified 106 AL strategies and 57 datasets in 62 papers. A large share of the strategies (60) follows the exploitation-based approach, and 36 of them use uncertainty-based sampling. Furthermore, we noted fewer exploration-based AL strategies than hybrid ones. For evaluation purposes, the F1-score is the dominant metric to demonstrate the performance of an AL strategy. Unfortunately, very few researchers report the execution time and the hardware used for their experiments. We examined the 57 datasets for their availability and found 26 publicly accessible corpora. The most frequently used corpora are from the newspaper, bio-medical, and medical domains. We created an evaluation environment based on our observations. Additionally, we have identified research gaps in the field, which researchers can use as an outline for further research.

We plan to conduct comprehensive performance tests on a subset of the AL strategies and datasets found in this scoping review based on our evaluation environment.

References

  • [1] Agrawal, A., Tripathi, S., Vardhan, M.: Active learning approach using a modified least confidence sampling strategy for named entity recognition. Progress in Artificial Intelligence 10(2), 113–128 (2021). https://doi.org/10.1007/s13748-021-00230-w
  • [2] Agrawal, A., Tripathi, S., Vardhan, M.: Multicore based least confidence query sampling strategy to speed up active learning approach for named entity recognition. Computing 105(5), 979–997 (2023). https://doi.org/10.1007/s00607-021-01000-1
  • [3] Anahideh, H., Asudeh, A., Thirumuruganathan, S.: Fair Active Learning. arXiv:2001.01796 [cs, stat] (Mar 2021)
  • [4] Arora, S., Agarwal, S., Students, M.: Active Learning for Natural Language Processing. Language Technologies Institute School of Computer Science Carnegie Mellon University 2 (2007)
  • [5] Bondu, A., Lemaire, V., Boullé, M.: Exploration vs. exploitation in active learning : A Bayesian approach. In: The 2010 International Joint Conference on Neural Networks (IJCNN). pp. 1–7 (Jul 2010). https://doi.org/10.1109/IJCNN.2010.5596815
  • [6] Brent, P., Green, N., Breimyer, P., Krishnamurthy, R., Samatova, N.: Systematic evaluation of convergence criteria in iterative training for NLP. In: Proceedings of the 22nd International Florida Artificial Intelligence Research Society Conference, FLAIRS-22. pp. 15–20 (2009)
  • [7] Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language Models are Few-Shot Learners (Jul 2020). https://doi.org/10.48550/arXiv.2005.14165
  • [8] Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language models are few-shot learners. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. pp. 1877–1901. NIPS’20, Curran Associates Inc., Red Hook, NY, USA (Dec 2020)
  • [9] Burnham, J.F.: Scopus database: A review. Biomedical Digital Libraries 3(1), 1–8 (Dec 2006). https://doi.org/10.1186/1742-5581-3-1
  • [10] Chang, H.S., Vembu, S., Mohan, S., Uppaal, R., McCallum, A.: Using error decay prediction to overcome practical issues of deep active learning for named entity recognition. Machine Learning 109(9-10), 1749–1778 (2020). https://doi.org/10.1007/s10994-020-05897-1
  • [11] Chaudhary, A., Xie, J., Sheikh, Z., Neubig, G., Carbonell, J.: A little annotation does a lot of good: A study in bootstrapping low-resource named entity recognizers. In: EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference. pp. 5164–5174 (2019)
  • [12] Chen, Y., Lask, T., Mei, Q., Chen, Q., Moon, S., Wang, J., Nguyen, K., Dawodu, T., Cohen, T., Denny, J., Xu, H.: An active learning-enabled annotation system for clinical named entity recognition. BMC Medical Informatics and Decision Making 17 (2017). https://doi.org/10.1186/s12911-017-0466-9
  • [13] Chen, Y., Lasko, T.A., Mei, Q., Denny, J.C., Xu, H.: A study of active learning methods for named entity recognition in clinical text. Journal of Biomedical Informatics 58, 11–18 (Dec 2015). https://doi.org/10.1016/j.jbi.2015.09.010
  • [14] Claveau, V., Kijak, E.: Strategies to select examples for active learning with conditional random fields. In: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 10761 LNCS, pp. 30–43 (2018). https://doi.org/10.1007/978-3-319-77113-7_3
  • [15] Collier, N., Ohta, T., Tsuruoka, Y., Tateisi, Y., Kim, J.D.: Introduction to the Bio-entity Recognition Task at JNLPBA. In: Collier, N., Ruch, P., Nazarenko, A. (eds.) Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and Its Applications (NLPBA/BioNLP). pp. 73–78. COLING, Geneva, Switzerland (Aug 2004)
  • [16] Conneau, A., Lample, G.: Cross-lingual language model pretraining. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 7059–7069. No. 634, Curran Associates Inc., Red Hook, NY, USA (Dec 2019)
  • [17] Culotta, A., Kristjansson, T., McCallum, A., Viola, P.: Corrective feedback and persistent learning for information extraction. Artificial Intelligence 170(14-15), 1101–1122 (2006). https://doi.org/10.1016/j.artint.2006.08.001
  • [18] Culotta, A., McCallum, A.: Reducing labeling effort for structured prediction tasks. In: Proceedings of the National Conference on Artificial Intelligence. vol. 2, pp. 746–751 (2005)
  • [19] Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805v2 (Oct 2018)
  • [20] Esuli, A., Marcheggiani, D., Sebastiani, F.: Sentence-based active learning strategies for information extraction. In: CEUR Workshop Proceedings. vol. 560, pp. 41–45 (2010)
  • [21] Farquhar, S., Gal, Y., Rainforth, T.: On Statistical Bias In Active Learning: How and When To Fix It (May 2021)
  • [22] Feng, S.Y., Gangal, V., Wei, J., Chandar, S., Vosoughi, S., Mitamura, T., Hovy, E.: A Survey of Data Augmentation Approaches for NLP (Dec 2021). https://doi.org/10.48550/arXiv.2105.03075
  • [23] Gao, N., Karampatziakis, N., Potharaju, R., Cucerzan, S.: Active entity recognition in low resource settings. In: International Conference on Information and Knowledge Management, Proceedings. pp. 2261–2264 (2019). https://doi.org/10.1145/3357384.3358109
  • [24] Gonsior, J., Thiele, M., Lehner, W.: WeakAL: Combining Active Learning and Weak Supervision. In: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 12323 LNAI, pp. 34–49 (2020). https://doi.org/10.1007/978-3-030-61527-7_3
  • [25] Grant, M.J., Booth, A.: A typology of reviews: An analysis of 14 review types and associated methodologies. Health Information & Libraries Journal 26(2), 91–108 (2009). https://doi.org/10.1111/j.1471-1842.2009.00848.x
  • [26] Gusenbauer, M., Haddaway, N.R.: Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Research Synthesis Methods 11(2), 181–217 (2020). https://doi.org/10.1002/jrsm.1378
  • [27] Hachey, B., Alex, B., Becker, M.: Investigating the effects of selective sampling on the annotation task. In: CoNLL 2005 - Proceedings of the Ninth Conference on Computational Natural Language Learning. pp. 144–151 (2005). https://doi.org/10.3115/1706543.1706569
  • [28] Hahn, U., Beisswanger, E., Buyko, E., Faessler, E.: Active Learning-based corpus annotation–the PathoJen experience. AMIA … Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 2012, 301–310 (2012)
  • [29] Han, X., Kwoh, C., Kim, J.J.: Clustering based active learning for biomedical Named Entity Recognition. In: Proceedings of the International Joint Conference on Neural Networks. vol. 2016-October, pp. 1253–1260 (2016). https://doi.org/10.1109/IJCNN.2016.7727341
  • [30] Hassan, S., Alikhani, M.: D-CALM: A Dynamic Clustering-based Active Learning Approach for Mitigating Bias
  • [31] Hassanzadeh, H., Keyvanpour, M.: A two-phase hybrid of semi-supervised and active learning approach for sequence labeling. Intelligent Data Analysis 17(2), 251–270 (2013). https://doi.org/10.3233/IDA-130577
  • [32] Hedderich, M.A., Lange, L., Klakow, D.: ANEA: Distant Supervision for Low-Resource Named Entity Recognition (Apr 2021). https://doi.org/10.48550/arXiv.2102.13129
  • [33] Herde, M., Huseljic, D., Sick, B., Calma, A.: A Survey on Cost Types, Interaction Schemes, and Annotator Performance Models in Selection Algorithms for Active Learning in Classification. IEEE Access 9, 166970–166989 (2021). https://doi.org/10.1109/ACCESS.2021.3135514
  • [34] Jayakumar, T., Farooqui, F., Farooqui, L.: Large Language Models are legal but they are not: Making the case for a powerful LegalLLM. In: Preoțiuc-Pietro, D., Goanta, C., Chalkidis, I., Barrett, L., Spanakis, G.J., Aletras, N. (eds.) Proceedings of the Natural Legal Language Processing Workshop 2023. pp. 223–229. Association for Computational Linguistics, Singapore (Dec 2023). https://doi.org/10.18653/v1/2023.nllp-1.22
  • [35] Kholghi, M., De Vine, L., Sitbon, L., Zuccon, G., Nguyen, A.: Clinical information extraction using small data: An active learning approach based on sequence representations and word embeddings. Journal of the Association for Information Science and Technology 68(11), 2543–2556 (2017). https://doi.org/10.1002/asi.23936
  • [36] Kholghi, M., Sitbon, L., Zuccon, G., Nguyen, A.: External knowledge and query strategies in active learning: A study in clinical information extraction. In: International Conference on Information and Knowledge Management, Proceedings. vol. 19-23-Oct-2015, pp. 143–152 (2015). https://doi.org/10.1145/2806416.2806550
  • [37] Kim, S., Song, Y., Kim, K., Cha, J.W., Lee, G.: MMR-based active machine learning for bio named entity recognition. In: HLT-NAACL 2006 - Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, Short Papers. pp. 69–72 (2006)
  • [38] Kohl, P., Freyer, N., Krämer, Y., Werth, H., Wolf, S., Kraft, B., Meinecke, M., Zündorf, A.: ALE: A Simulation-Based Active Learning Evaluation Framework for the Parameter-Driven Comparison of Query Strategies for NLP. In: Conte, D., Fred, A., Gusikhin, O., Sansone, C. (eds.) Deep Learning Theory and Applications. pp. 235–253. Communications in Computer and Information Science, Springer Nature Switzerland, Cham (2023). https://doi.org/10.1007/978-3-031-39059-3_16
  • [39] Laws, F., Scheible, C., Schütze, H.: Active learning with amazon mechanical turk. In: EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. pp. 1546–1556 (2011)
  • [40] Lee, S., Song, Y., Choi, M., Kim, H.: Bagging-based active learning model for named entity recognition with distant supervision. In: 2016 International Conference on Big Data and Smart Computing, BigComp 2016. pp. 321–324 (2016). https://doi.org/10.1109/BIGCOMP.2016.7425938
  • [41] Leidner, J.L., Plachouras, V.: Ethical by Design: Ethics Best Practices for Natural Language Processing. In: Proceedings of the First ACL Workshop on Ethics in Natural Language Processing. pp. 30–40. Association for Computational Linguistics, Valencia, Spain (2017). https://doi.org/10.18653/v1/W17-1604
  • [42] Li, M., Nguyen, N., Ananiadou, S.: Proactive Learning for Named Entity Recognition. In: BioNLP 2017 - SIGBioMed Workshop on Biomedical Natural Language Processing, Proceedings of the 16th BioNLP Workshop. pp. 117–125 (2017)
  • [43] Li, Q., Huang, Z., Dou, Y., Zhang, Z.: A Framework of Data Augmentation While Active Learning for Chinese Named Entity Recognition. In: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 12816 LNAI, pp. 88–100 (2021). https://doi.org/10.1007/978-3-030-82147-0_8
  • [44] Li, W., Du, Y., Li, X., Chen, X., Xie, C., Li, H., Li, X.: UD_BBC: Named entity recognition in social network combined BERT-BiLSTM-CRF with active learning. Engineering Applications of Artificial Intelligence (2022). https://doi.org/10.1016/j.engappai.2022.105460
  • [45] Li, Y., Yue, T., Zhenxin, W.: IEKM-MD: An intelligent platform for information extraction and knowledge mining in multi-domains. In: CEUR Workshop Proceedings. vol. 2658, pp. 73–78 (2020)
  • [46] Lin, B., Lee, D.H., Xu, F., Lan, O., Ren, X.: AlpacaTag: An active learning-based crowd annotation framework for sequence tagging. In: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of System Demonstrations. pp. 58–63 (2019)
  • [47] Lin, S., Gao, J., Zhang, S., He, X., Sheng, Y., Chen, J.: A continuous learning method for recognizing named entities by integrating domain contextual relevance measurement and Web farming mode of Web intelligence. World Wide Web 23(3), 1769–1790 (2020). https://doi.org/10.1007/s11280-019-00758-x
  • [48] Linh, L., Nguyen, M.T., Zuccon, G., Demartini, G.: Loss-based Active Learning for Named Entity Recognition. In: Proceedings of the International Joint Conference on Neural Networks. vol. 2021-July (2021). https://doi.org/10.1109/IJCNN52387.2021.9533675
  • [49] Lison, P., Barnes, J., Hubin, A.: Skweak: Weak Supervision Made Easy for NLP. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations. pp. 337–346 (2021). https://doi.org/10.18653/v1/2021.acl-demo.40
  • [50] Liu, M., Buntine, W., Haffari, G.: Learning how to actively learn: A deep imitation learning approach. In: ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). vol. 1, pp. 1874–1883 (2018). https://doi.org/10.18653/v1/p18-1174
  • [51] Liu, M., Tu, Z., Zhang, T., Su, T., Xu, X., Wang, Z.: LTP: A New Active Learning Strategy for CRF-Based Named Entity Recognition. Neural Processing Letters 54(3), 2433–2454 (2022). https://doi.org/10.1007/s11063-021-10737-x
  • [52] Liu, Y., Hu, J., Chen, Z., Wan, X., Chang, T.H.: EASAL: Entity-Aware Subsequence-Based Active Learning for Named Entity Recognition. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023. vol. 37, pp. 8897–8905 (2023)
  • [53] Loy, C.C., Hospedales, T.M., Xiang, T., Gong, S.: Stream-based joint exploration-exploitation active learning. 2012 IEEE Conference on Computer Vision and Pattern Recognition pp. 1560–1567 (Jun 2012). https://doi.org/10.1109/CVPR.2012.6247847
  • [54] Marcheggiani, D., Artières, T.: An experimental comparison of active learning strategies for partially labeled sequences. In: EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. pp. 898–906 (2014). https://doi.org/10.3115/v1/d14-1097
  • [55] Mejer, A., Crammer, K.: Confidence in structured-prediction using Confidence-Weighted models. In: EMNLP 2010 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. pp. 971–981 (2010)
  • [56] Mendonça, V., Sardinha, A., Coheur, L., Santos, A.L.: Query Strategies, Assemble! Active Learning with Expert Advice for Low-resource Natural Language Processing. In: 2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). pp. 1–8 (Jul 2020). https://doi.org/10.1109/FUZZ48607.2020.9177707
  • [57] Miller, S., Guinness, J., Zamanian, A.: Name tagging with word clusters and discriminative training. In: HLT-NAACL 2004 - Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference. pp. 337–342 (2004)
  • [58] Mirończuk, M.M., Protasiewicz, J.: A recent overview of the state-of-the-art elements of text classification. Expert Systems with Applications 106, 36–54 (2018)
  • [59] Mo, Y., Scott, S., Downey, D.: Learning hierarchically decomposable concepts with active over-labeling. In: Proceedings - IEEE International Conference on Data Mining, ICDM. pp. 340–349 (2017). https://doi.org/10.1109/ICDM.2016.165
  • [60] Moniz, J., Patra, B., Gormley, M.: On Efficiently Acquiring Annotations for Multilingual Models. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics. vol. 2, pp. 69–85 (2022)
  • [61] Munn, Z., Peters, M.D.J., Stern, C., Tufanaru, C., McArthur, A., Aromataris, E.: Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach. BMC Medical Research Methodology 18(1), 143 (Nov 2018). https://doi.org/10.1186/s12874-018-0611-x
  • [62] Neto, J., Faleiros, T.: Deep Active-Self Learning Applied to Named Entity Recognition. In: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 13074 LNAI, pp. 405–418 (2021). https://doi.org/10.1007/978-3-030-91699-2_28
  • [63] Nguyen, V., Lee, W., Ye, N., Chai, K., Chieu, H.: Active learning for probabilistic hypotheses using the maximum Gibbs error criterion. In: Advances in Neural Information Processing Systems (2013)
  • [64] Ni, J., Delaney, B., Florian, R.: Fast Model Adaptation for Automated Section Classification in Electronic Medical Records. In: Studies in Health Technology and Informatics. vol. 216, pp. 35–39 (2015). https://doi.org/10.3233/978-1-61499-564-7-35
  • [65] Olsson, F.: On privacy preservation in text and document-based active learning for named entity recognition. In: International Conference on Information and Knowledge Management, Proceedings. pp. 53–60 (2009). https://doi.org/10.1145/1651449.1651460
  • [66] Olsson, F., Tomanek, K.: An intrinsic stopping criterion for committee-based active learning. In: CoNLL 2009 - Proceedings of the Thirteenth Conference on Computational Natural Language Learning. pp. 138–146 (2009). https://doi.org/10.3115/1596374.1596398
  • [67] Pradhan, A., Todi, K., Selvarasu, A., Sanyal, A.: Knowledge Graph Generation with Deep Active Learning. In: Proceedings of the International Joint Conference on Neural Networks (2020). https://doi.org/10.1109/IJCNN48605.2020.9207515
  • [68] Radmard, P., Fathullah, Y., Lipani, A.: Subsequence Based Deep Active Learning for Named Entity Recognition. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 4310–4321. Association for Computational Linguistics, Online (Aug 2021). https://doi.org/10.18653/v1/2021.acl-long.332
  • [69] Ren, P., Xiao, Y., Chang, X., Huang, P.Y., Li, Z., Gupta, B.B., Chen, X., Wang, X.: A Survey of Deep Active Learning. ACM Computing Surveys 54(9), 1–40 (Dec 2022). https://doi.org/10.1145/3472291
  • [70] Saha, S., Ekbal, A., Verma, M., Sikdar, U., Poesio, M.: Active learning technique for biomedical named entity extraction. In: ACM International Conference Proceeding Series. pp. 835–841 (2012). https://doi.org/10.1145/2345396.2345532
  • [71] Şapci, A., Kemik, H., Yeniterzi, R., Tastan, O.: Focusing on potential named entities during active label acquisition. Natural Language Engineering (2023). https://doi.org/10.1017/S1351324923000165
  • [72] Settles, B.: Active Learning Literature Survey p. 67 (2009)
  • [73] Settles, B., Craven, M.: An analysis of active learning strategies for sequence labeling tasks. In: Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing. pp. 1070–1079 (2008)
  • [74] Shardlow, M., Ju, M., Li, M., O’Reilly, C., Iavarone, E., McNaught, J., Ananiadou, S.: A Text Mining Pipeline Using Active and Deep Learning Aimed at Curating Information in Computational Neuroscience. Neuroinformatics 17(3), 391–406 (2019). https://doi.org/10.1007/s12021-018-9404-y
  • [75] Sharma, A., Amrita, Chakraborty, S., Kumar, S.: Named Entity Recognition in Natural Language Processing: A Systematic Review. In: Gupta, D., Khanna, A., Kansal, V., Fortino, G., Hassanien, A.E. (eds.) Proceedings of Second Doctoral Symposium on Computational Intelligence. pp. 817–828. Advances in Intelligent Systems and Computing, Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-3346-1_66
  • [76] Shelmanov, A., Puzyrev, D., Kupriyanova, L., Belyakov, D., Larionov, D., Khromov, N., Kozlova, O., Artemova, E., Dylov, D.V., Panchenko, A.: Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates (Feb 2021)
  • [77] Shen, Y., Yun, H., Lipton, Z., Kronrod, Y., Anandkumar, A.: Deep active learning for named entity recognition. In: Proceedings of the 2nd Workshop on Representation Learning for NLP, Rep4NLP 2017 at the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017. pp. 252–256 (2017)
  • [78] Shen, Y., Yun, H., Lipton, Z.C., Kronrod, Y., Anandkumar, A.: Deep Active Learning for Named Entity Recognition (Feb 2018)
  • [79] Shrivastava, A., Heer, J.: ISeqL. In: International Conference on Intelligent User Interfaces, Proceedings IUI. pp. 43–54 (2020). https://doi.org/10.1145/3377325.3377503
  • [80] Siddhant, A., Lipton, Z.C.: Deep Bayesian Active Learning for Natural Language Processing: Results of a Large-Scale Empirical Study (Sep 2018)
  • [81] Simpson, E., Gurevych, I.: A Bayesian approach for sequence tagging with crowds. In: EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference. pp. 1093–1104 (2019)
  • [82] Sintayehu, H., Lehal, G.S.: Named entity recognition: A semi-supervised learning approach. International Journal of Information Technology 13(4), 1659–1665 (Aug 2021). https://doi.org/10.1007/s41870-020-00470-4
  • [83] Skeppstedt, M., Rzepka, R., Araki, K., Kerren, A.: Visualising and evaluating the effects of combining active learning with word embedding features. In: Proceedings of the 15th Conference on Natural Language Processing, KONVENS 2019. pp. 91–100 (2020)
  • [84] Son, N.H., Yu, H.M., Nguyen, T.A.D., Nguyen, M.T.: Jointly Learning Span Extraction and Sequence Labeling for Information Extraction from Business Documents. In: 2022 International Joint Conference on Neural Networks (IJCNN). pp. 1–8 (Jul 2022). https://doi.org/10.1109/IJCNN55064.2022.9892779
  • [85] Tang, S., Liu, H., Almatared, M., Abudayyeh, O., Lei, Z., Fong, A.: Towards Automated Construction Quantity Take-Off: An Integrated Approach to Information Extraction from Work Descriptions. Buildings 12(3) (2022). https://doi.org/10.3390/buildings12030354
  • [86] Tang, X., Wu, S., Chen, G., Chen, K., Shou, L.: Learning to Label with Active Learning and Reinforcement Learning. In: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 12682 LNCS, pp. 549–557 (2021). https://doi.org/10.1007/978-3-030-73197-7_36
  • [87] Tchoua, R., Ajith, A., Hong, Z., Ward, L., Chard, K., Audus, D., Patel, S., De Pablo, J., Foster, I.: Active learning yields better training data for scientific named entity recognition. In: Proceedings - IEEE 15th International Conference on eScience, eScience 2019. pp. 126–135 (2019). https://doi.org/10.1109/eScience.2019.00021
  • [88] Tjong Kim Sang, E.F., De Meulder, F.: Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003. pp. 142–147 (2003)
  • [89] Tomanek, K., Hahn, U.: Approximating learning curves for active-learning-driven annotation. In: Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008. pp. 1319–1324 (2008)
  • [90] Tomanek, K., Hahn, U.: Reducing class imbalance during active learning for named entity annotation. In: K-CAP’09 - Proceedings of the 5th International Conference on Knowledge Capture. pp. 105–112 (2009). https://doi.org/10.1145/1597735.1597754
  • [91] Tomanek, K., Hahn, U.: Annotation time stamps - Temporal metadata from the linguistic annotation process. In: Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010. pp. 2516–2521 (2010)
  • [92] Tomanek, K., Laws, F., Hahn, U., Schütze, H.: On proper unit selection in active learning: Co-selection effects for named entity recognition. In: Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing. pp. 9–17. HLT ’09, Association for Computational Linguistics, USA (2009)
  • [93] Tran, V., Hoang, D., Nguyen, N., Hwang, D.: A hybrid method for named entity recognition on tweet streams. In: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 10191 LNAI, pp. 258–268 (2017). https://doi.org/10.1007/978-3-319-54472-4_25
  • [94] Tran, V., Nguyen, N., Fujita, H., Hoang, D., Hwang, D.: A combination of active learning and self-learning for named entity recognition on Twitter using conditional random fields. Knowledge-Based Systems 132, 179–187 (2017). https://doi.org/10.1016/j.knosys.2017.06.023
  • [95] Tricco, A.C., Lillie, E., Zarin, W., O’Brien, K.K., Colquhoun, H., Levac, D., Moher, D., Peters, M.D., Horsley, T., Weeks, L., Hempel, S., Akl, E.A., Chang, C., McGowan, J., Stewart, L., Hartling, L., Aldcroft, A., Wilson, M.G., Garritty, C., Lewin, S., Godfrey, C.M., Macdonald, M.T., Langlois, E.V., Soares-Weiser, K., Moriarty, J., Clifford, T., Tunçalp, Ö., Straus, S.E.: PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Annals of Internal Medicine 169(7), 467–473 (Oct 2018). https://doi.org/10.7326/M18-0850
  • [96] Uzuner, Ö., South, B.R., Shen, S., DuVall, S.L.: 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association: JAMIA 18(5), 552–556 (2011). https://doi.org/10.1136/amiajnl-2011-000203
  • [97] Van Nguyen, M., Ngo, N., Min, B., Nguyen, T.: FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction. In: NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Demonstrations Session. pp. 131–139 (2022)
  • [98] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention Is All You Need (Aug 2023). https://doi.org/10.48550/arXiv.1706.03762
  • [99] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention Is All You Need (Aug 2023). https://doi.org/10.48550/arXiv.1706.03762
  • [100] Veerasekharreddy, B., Rao, K., Koppula, N.: Named Entity Recognition using CRF with Active Learning Algorithm in English Texts. In: 6th International Conference on Electronics, Communication and Aerospace Technology, ICECA 2022 - Proceedings. pp. 1041–1044 (2022). https://doi.org/10.1109/ICECA55336.2022.10009592
  • [101] Verma, M., Sikdar, U., Saha, S., Ekbal, A.: Ensemble based active annotation for biomedical named entity recognition. In: Proceedings of the 2013 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2013. pp. 973–978 (2013). https://doi.org/10.1109/ICACCI.2013.6637308
  • [102] Wei, Q., Chen, Y., Salimi, M., Denny, J., Mei, Q., Lasko, T., Chen, Q., Wu, S., Franklin, A., Cohen, T., Xu, H.: Cost-aware active learning for named entity recognition in clinical text. Journal of the American Medical Informatics Association 26(11), 1314–1322 (2019). https://doi.org/10.1093/jamia/ocz102
  • [103] Yao, J., Dou, Z., Nie, J., Wen, J.: Looking Back on the Past: Active Learning with Historical Evaluation Results. IEEE Transactions on Knowledge and Data Engineering (2020). https://doi.org/10.1109/TKDE.2020.3045816
  • [104] Zaratiana, U., Tomeh, N., Holat, P., Charnois, T.: GNNer: Reducing Overlapping in Span-based NER Using Graph Neural Networks. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop. pp. 97–103. Association for Computational Linguistics, Dublin, Ireland (2022). https://doi.org/10.18653/v1/2022.acl-srw.9
  • [105] Zhan, X., Wang, Q., Huang, K.h., Xiong, H., Dou, D., Chan, A.B.: A Comparative Survey of Deep Active Learning (Jul 2022)
  • [106] Zhang, Z., Strubell, E., Hovy, E.: A survey of active learning for natural language processing. arXiv preprint arXiv:2210.10109 (2022)
  • [107] Zheng, G., Mukherjee, S., Dong, X., Li, F.: OpenTag: Open attribute value extraction from product profiles. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 1049–1058 (2018). https://doi.org/10.1145/3219819.3219839
  • [108] Zhong, Z., Liu, F., Wu, Y., Wu, J.: Chinese named entity recognition combined active learning with self-training. Guofang Keji Daxue Xuebao/Journal of National University of Defense Technology 36(4), 82–88 (2014). https://doi.org/10.11887/j.cn.201404015
  • [109] Zhou, B., Cai, X., Zhang, Y., Guo, W., Yuan, X.: MTAAL: Multi-Task Adversarial Active Learning for Medical Named Entity Recognition and Normalization. In: 35th AAAI Conference on Artificial Intelligence, AAAI 2021. vol. 16, pp. 14586–14593 (2021)
  • [110] Zhou, M., Duan, N., Liu, S., Shum, H.Y.: Progress in neural NLP: Modeling, learning, and reasoning. Engineering 6(3), 275–290 (2020)
  • [111] Zhou, S., Liang, S., Yang, Q., Jiang, W., He, Y., Li, Y.: Active Learning Based Labeling Method for Fault Disposal Pre-plans. In: Advances and Trends in Artificial Intelligence. Theory and Applications. pp. 377–382 (2023)
  • [112] Zhuo, T.Y., Huang, Y., Chen, C., Xing, Z.: Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity (May 2023). https://doi.org/10.48550/arXiv.2301.12867