Search | arXiv e-print repository

Automated Neural Patent Landscaping in the Small Data Regime

Authors: Tisa Islam Erana, Mark A. Finlayson

Abstract: Patent landscaping is the process of identifying all patents related to a particular technological area, and is important for assessing various aspects of the intellectual property context. Traditionally, constructing patent landscapes is intensely laborious and expensive, and the rapid expansion of patenting activity in recent decades has driven an increasing need for efficient and effective auto… ▽ More Patent landscaping is the process of identifying all patents related to a particular technological area, and is important for assessing various aspects of the intellectual property context. Traditionally, constructing patent landscapes is intensely laborious and expensive, and the rapid expansion of patenting activity in recent decades has driven an increasing need for efficient and effective automated patent landscaping approaches. In particular, it is critical that we be able to construct patent landscapes using a minimal number of labeled examples, as labeling patents for a narrow technology area requires highly specialized (and hence expensive) technical knowledge. We present an automated neural patent landscaping system that demonstrates significantly improved performance on difficult examples (0.69 $F_1$ on 'hard' examples, versus 0.6 for previously reported systems), and also significant improvements with much less training data (overall 0.75 $F_1$ on as few as 24 examples). Furthermore, in evaluating such automated landscaping systems, acquiring good data is challenge; we demonstrate a higher-quality training data generation procedure by merging Abood and Feltenberger's (2018) "seed/anti-seed" approach with active learning to collect difficult labeled examples near the decision boundary. Using this procedure we created a new dataset of labeled AI patents for training and testing. As in prior work we compare our approach with a number of baseline systems, and we release our code and data for others to build upon. △ Less

Submitted 10 July, 2024; originally announced July 2024.

Comments: 11 pages, 4 figures

arXiv:2406.16838 [pdf, other]

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

Authors: Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui

Abstract: One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during inference. This survey focuses on these inference-time approaches. We explore three areas under a unified mathematical formalism: token-level generation algorithms, m… ▽ More One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during inference. This survey focuses on these inference-time approaches. We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation. Token-level generation algorithms, often called decoding algorithms, operate by sampling a single token at a time or constructing a token-level search space and then selecting an output. These methods typically assume access to a language model's logits, next-token distributions, or probability scores. Meta-generation algorithms work on partial or full sequences, incorporating domain knowledge, enabling backtracking, and integrating external information. Efficient generation methods aim to reduce token costs and improve the speed of generation. Our survey unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems. △ Less

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.05265 [pdf, other]

TLEX: An Efficient Method for Extracting Exact Timelines from TimeML Temporal Graphs

Authors: Mustafa Ocal, Ning Xie, Mark Finlayson

Abstract: A timeline provides a total ordering of events and times, and is useful for a number of natural language understanding tasks. However, qualitative temporal graphs that can be derived directly from text -- such as TimeML annotations -- usually explicitly reveal only partial orderings of events and times. In this work, we apply prior work on solving point algebra problems to the task of extracting t… ▽ More A timeline provides a total ordering of events and times, and is useful for a number of natural language understanding tasks. However, qualitative temporal graphs that can be derived directly from text -- such as TimeML annotations -- usually explicitly reveal only partial orderings of events and times. In this work, we apply prior work on solving point algebra problems to the task of extracting timelines from TimeML annotated texts, and develop an exact, end-to-end solution which we call TLEX (TimeLine EXtraction). TLEX transforms TimeML annotations into a collection of timelines arranged in a trunk-and-branch structure. Like what has been done in prior work, TLEX checks the consistency of the temporal graph and solves it; however, it adds two novel functionalities. First, it identifies specific relations involved in an inconsistency (which could then be manually corrected) and, second, TLEX performs a novel identification of sections of the timelines that have indeterminate order, information critical for downstream tasks such as aligning events from different timelines. We provide detailed descriptions and analysis of the algorithmic components in TLEX, and conduct experimental evaluations by applying TLEX to 385 TimeML annotated texts from four corpora. We show that 123 of the texts are inconsistent, 181 of them have more than one ``real world'' or main timeline, and there are 2,541 indeterminate sections across all four corpora. A sampling evaluation showed that TLEX is 98--100% accurate with 95% confidence along five dimensions: the ordering of time-points, the number of main timelines, the placement of time-points on main versus subordinate timelines, the connecting point of branch timelines, and the location of the indeterminate sections. We provide a reference implementation of TLEX, the extracted timelines for all texts, and the manual corrections of the inconsistent texts. △ Less

Submitted 7 June, 2024; originally announced June 2024.

Comments: 25 pages, 9 figures

arXiv:2403.09539 [pdf, other]

Logits of API-Protected LLMs Leak Proprietary Information

Authors: Matthew Finlayson, Xiang Ren, Swabha Swayamdipta

Abstract: The commercialization of large language models (LLMs) has led to the common practice of high-level API-only access to proprietary models. In this work, we show that even with a conservative assumption about the model architecture, it is possible to learn a surprisingly large amount of non-public information about an API-protected LLM from a relatively small number of API queries (e.g., costing und… ▽ More The commercialization of large language models (LLMs) has led to the common practice of high-level API-only access to proprietary models. In this work, we show that even with a conservative assumption about the model architecture, it is possible to learn a surprisingly large amount of non-public information about an API-protected LLM from a relatively small number of API queries (e.g., costing under $1,000 for OpenAI's gpt-3.5-turbo). Our findings are centered on one key observation: most modern LLMs suffer from a softmax bottleneck, which restricts the model outputs to a linear subspace of the full output space. We show that this lends itself to a model image or a model signature which unlocks several capabilities with affordable cost: efficiently discovering the LLM's hidden size, obtaining full-vocabulary outputs, detecting and disambiguating different model updates, identifying the source LLM given a single full LLM output, and even estimating the output layer parameters. Our empirical investigations show the effectiveness of our methods, which allow us to estimate the embedding size of OpenAI's gpt-3.5-turbo to be about 4,096. Lastly, we discuss ways that LLM providers can guard against these attacks, as well as how these capabilities can be viewed as a feature (rather than a bug) by allowing for greater transparency and accountability. △ Less

Submitted 14 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2312.02810 [pdf]

doi 10.34703/gzx1-9v95/8PC8JY

Using the SP!CE Framework to Code Influence Campaign Activity on Social Media: Case Study on the 2022 Brazilian Presidential Election

Authors: Alexander Gocso, Claudia Perez Brito, Bryan Ruesca, Allen Mendes, Mark A. Finlayson

Abstract: We describe a case study in the use of the Structured Process for Information Campaign Enhancement (SP!CE, version 2.1) to evaluate influence campaigns present in the 2nd round of the Brazilian presidential election in 2022 October. SP!CE is a US-military focused framework for describing both friendly and adversary actions in influence campaigns, and is inter-operable with the Disinformation Analy… ▽ More We describe a case study in the use of the Structured Process for Information Campaign Enhancement (SP!CE, version 2.1) to evaluate influence campaigns present in the 2nd round of the Brazilian presidential election in 2022 October. SP!CE is a US-military focused framework for describing both friendly and adversary actions in influence campaigns, and is inter-operable with the Disinformation Analysis and Risk Management (DISARM) framework. The purpose of the case study is to demonstrate how SP!CE can be used to describe influence campaign behaviors. We selected the Brazilian election as the target of the case study as it is known that there were significant amounts of mis- and disinformation present on social media during the campaigns. Our goal was to demonstrate how SP!CE could be applied in such a context, showing how social media content could be aligned with information campaign behaviors and how such an alignment can be used to analyze which mis- and disinformation narratives were in play. Additionally, we aim to provide insights on best practices regarding how to apply the framework in further research. We release the coding and screenshots of the relevant social media posts to support future research. △ Less

Submitted 6 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: 23 pages, 34 figures, 1 table

arXiv:2310.01693 [pdf, other]

Closing the Curious Case of Neural Text Degeneration

Authors: Matthew Finlayson, John Hewitt, Alexander Koller, Swabha Swayamdipta, Ashish Sabharwal

Abstract: Despite their ubiquity in language generation, it remains unknown why truncation sampling heuristics like nucleus sampling are so effective. We provide a theoretical explanation for the effectiveness of the truncation sampling by proving that truncation methods that discard tokens below some probability threshold (the most common type of truncation) can guarantee that all sampled tokens have nonze… ▽ More Despite their ubiquity in language generation, it remains unknown why truncation sampling heuristics like nucleus sampling are so effective. We provide a theoretical explanation for the effectiveness of the truncation sampling by proving that truncation methods that discard tokens below some probability threshold (the most common type of truncation) can guarantee that all sampled tokens have nonzero true probability. However, thresholds are a coarse heuristic, and necessarily discard some tokens with nonzero true probability as well. In pursuit of a more precise sampling strategy, we show that we can leverage a known source of model errors, the softmax bottleneck, to prove that certain tokens have nonzero true probability, without relying on a threshold. Based on our findings, we develop an experimental truncation strategy and the present pilot studies demonstrating the promise of this type of algorithm. Our evaluations show that our method outperforms its threshold-based counterparts under automatic and human evaluation metrics for low-entropy (i.e., close to greedy) open-ended text generation. Our theoretical findings and pilot experiments provide both insight into why truncation sampling works, and make progress toward more expressive sampling algorithms that better surface the generative capabilities of large language models. △ Less

Submitted 2 October, 2023; originally announced October 2023.

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2305.14596 [pdf, other]

Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy

Authors: Sarah Wiegreffe, Matthew Finlayson, Oyvind Tafjord, Peter Clark, Ashish Sabharwal

Abstract: When pretrained language models (LMs) are applied to discriminative tasks such as multiple-choice questions, they place probability mass on vocabulary tokens that aren't among the given answer choices. Spreading probability mass across multiple surface forms with identical meaning (such as "bath" and "bathtub") is thought to cause an underestimation of a model's true performance, referred to as th… ▽ More When pretrained language models (LMs) are applied to discriminative tasks such as multiple-choice questions, they place probability mass on vocabulary tokens that aren't among the given answer choices. Spreading probability mass across multiple surface forms with identical meaning (such as "bath" and "bathtub") is thought to cause an underestimation of a model's true performance, referred to as the "surface form competition" (SFC) hypothesis. This has motivated the introduction of various probability normalization methods. However, many core questions remain unanswered. How do we measure SFC? Are there direct ways of reducing it, and does doing so improve task performance? We propose a mathematical formalism for SFC which allows us to quantify and bound its impact for the first time. We identify a simple method for reducing it -- namely, increasing probability mass on the given answer choices by a) including them in the prompt and b) using in-context learning with even just one example. We show this method eliminates the impact of SFC in the majority of instances. Our experiments on three diverse datasets and six LMs reveal several additional surprising findings. For example, both normalization and prompting methods for reducing SFC can be ineffective or even detrimental to task performance for some LMs. We conclude with practical insights for effectively prompting LMs for multiple-choice tasks. △ Less

Submitted 31 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: EMNLP 2023

arXiv:2210.17517 [pdf, other]

Lila: A Unified Benchmark for Mathematical Reasoning

Authors: Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, Ashwin Kalyan

Abstract: Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling. Towards evaluating and improving AI systems in this domain, we propose LILA, a unified mathematical reasoning benchmark consisting of 23 diverse tasks along four dimensions: (i) mathematical abilities e.g., arithmetic, calculus (ii) language format e.g., q… ▽ More Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling. Towards evaluating and improving AI systems in this domain, we propose LILA, a unified mathematical reasoning benchmark consisting of 23 diverse tasks along four dimensions: (i) mathematical abilities e.g., arithmetic, calculus (ii) language format e.g., question-answering, fill-in-the-blanks (iii) language diversity e.g., no language, simple language (iv) external knowledge e.g., commonsense, physics. We construct our benchmark by extending 20 datasets benchmark by collecting task instructions and solutions in the form of Python programs, thereby obtaining explainable solutions in addition to the correct answer. We additionally introduce two evaluation datasets to measure out-of-distribution performance and robustness to language perturbation. Finally, we introduce BHASKARA, a general-purpose mathematical reasoning model trained on LILA. Importantly, we find that multi-tasking leads to significant improvements (average relative improvement of 21.83% F1 score vs. single-task models), while the best performing model only obtains 60.40%, indicating the room for improvement in general mathematical reasoning and understanding. △ Less

Submitted 8 March, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

Comments: EMNLP 2022

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2210.02406 [pdf, other]

Decomposed Prompting: A Modular Approach for Solving Complex Tasks

Authors: Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal

Abstract: Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn, especially when embedded in more complex tasks. To address this, we propose Decomposed Prompting, a new approach to solve complex tasks by deco… ▽ More Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn, especially when embedded in more complex tasks. To address this, we propose Decomposed Prompting, a new approach to solve complex tasks by decomposing them (via prompting) into simpler sub-tasks that can be delegated to a library of prompting-based LLMs dedicated to these sub-tasks. This modular structure allows each prompt to be optimized for its specific sub-task, further decomposed if necessary, and even easily replaced with more effective prompts, trained models, or symbolic functions if desired. We show that the flexibility and modularity of Decomposed Prompting allows it to outperform prior work on few-shot prompting using GPT3. On symbolic reasoning tasks, we can further decompose sub-tasks that are hard for LLMs into even simpler solvable sub-tasks. When the complexity comes from the input length, we can recursively decompose the task into the same task but with smaller inputs. We also evaluate our approach on textual multi-step reasoning tasks: on long-context multi-hop QA task, we can more effectively teach the sub-tasks via our separate sub-tasks prompts; and on open-domain multi-hop QA, we can incorporate a symbolic information retrieval within our decomposition framework, leading to improved performance on both tasks. Datasets, Code and Prompts available at https://github.com/allenai/DecomP. △ Less

Submitted 11 April, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: ICLR'23 Camera Ready

arXiv:2204.09148 [pdf, other]

What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment

Authors: Matthew Finlayson, Kyle Richardson, Ashish Sabharwal, Peter Clark

Abstract: The instruction learning paradigm -- where a model learns to perform new tasks from task descriptions alone -- has become popular in general-purpose model research. The capabilities of large transformer models as instruction learners, however, remain poorly understood. We use a controlled synthetic environment to characterize such capabilities. Specifically, we use the task of deciding whether a g… ▽ More The instruction learning paradigm -- where a model learns to perform new tasks from task descriptions alone -- has become popular in general-purpose model research. The capabilities of large transformer models as instruction learners, however, remain poorly understood. We use a controlled synthetic environment to characterize such capabilities. Specifically, we use the task of deciding whether a given string matches a regular expression (viewed as an instruction) to identify properties of tasks, instructions, and instances that make instruction learning challenging. For instance, we find that our model, a fine-tuned T5-based text2text transformer, struggles with large regular languages, suggesting that less precise instructions are challenging for models. Additionally, instruction executions that require tracking longer contexts of prior steps are also more difficult. We use our findings to systematically construct a challenging instruction learning dataset, which we call Hard RegSet. Fine-tuning on Hard RegSet, our large transformer learns to correctly interpret only 65.6% of test instructions (with at least 90% accuracy), and 11%-24% of the instructions in out-of-distribution generalization settings. We propose Hard RegSet as a challenging instruction learning task, and a controlled environment for studying instruction learning. △ Less

Submitted 24 May, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

Comments: Typos corrected, rewordings

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2204.06085 [pdf, other]

Finding Trolls Under Bridges: Preliminary Work on a Motif Detector

Authors: W. Victor H. Yarlott, Armando Ochoa, Anurag Acharya, Laurel Bobrow, Diego Castro Estrada, Diana Gomez, Joan Zheng, David McDonald, Chris Miller, Mark A. Finlayson

Abstract: Motifs are distinctive recurring elements found in folklore that have significance as communicative devices in news, literature, press releases, and propaganda. Motifs concisely imply a large constellation of culturally-relevant information, and their broad usage suggests their cognitive importance as touchstones of cultural knowledge, making their detection a worthy step toward culturally-aware n… ▽ More Motifs are distinctive recurring elements found in folklore that have significance as communicative devices in news, literature, press releases, and propaganda. Motifs concisely imply a large constellation of culturally-relevant information, and their broad usage suggests their cognitive importance as touchstones of cultural knowledge, making their detection a worthy step toward culturally-aware natural language processing tasks. Until now, folklorists and others interested in motifs have only extracted motifs from narratives manually. We present a preliminary report on the development of a system for automatically detecting motifs. We briefly describe an annotation effort to produce data for training motif detection, which is on-going. We describe our in-progress architecture in detail, which aims to capture, in part, how people determine whether or not a motif candidate is being used in a motific way. This description includes a test of an off-the-shelf metaphor detector as a feature for motif detection, which achieves a F1 of 0.35 on motifs and a macro-average F1 of 0.21 across four categories which we assign to motif candidates. △ Less

Submitted 12 April, 2022; originally announced April 2022.

Comments: 13 pages, 2 figures, Presented at The Ninth Advances in Cognitive Systems (ACS) Conference 2021 (arXiv:2201.06134)

Report number: ACS2021/23

arXiv:2201.10618 [pdf, other]

The ABBE Corpus: Animate Beings Being Emotional

Authors: Samira Zad, Joshuan Jimenez, Mark A. Finlayson

Abstract: Emotion detection is an established NLP task of demonstrated utility for text understanding. However, basic emotion detection leaves out key information, namely, who is experiencing the emotion in question. For example, it may be the author, the narrator, or a character; or the emotion may correspond to something the audience is supposed to feel, or even be unattributable to a specific being, e.g.… ▽ More Emotion detection is an established NLP task of demonstrated utility for text understanding. However, basic emotion detection leaves out key information, namely, who is experiencing the emotion in question. For example, it may be the author, the narrator, or a character; or the emotion may correspond to something the audience is supposed to feel, or even be unattributable to a specific being, e.g., when emotions are being discussed per se. We provide the ABBE corpus -- Animate Beings Being Emotional -- a new double-annotated corpus of texts that captures this key information for one class of emotion experiencer, namely, animate beings in the world described by the text. Such a corpus is useful for developing systems that seek to model or understand this specific type of expressed emotion. Our corpus contains 30 chapters, comprising 134,513 words, drawn from the Corpus of English Novels, and contains 2,010 unique emotion expressions attributable to 2,227 animate beings. The emotion expressions are categorized according to Plutchik's 8-category emotion model, and the overall inter-annotator agreement for the annotations was 0.83 Cohen's Kappa, indicating excellent agreement. We describe in detail our annotation scheme and procedure, and also release the corpus for use by other researchers. △ Less

Submitted 15 February, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

Comments: 9 pages, 1 figure

arXiv:2106.06087 [pdf, other]

Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models

Authors: Matthew Finlayson, Aaron Mueller, Sebastian Gehrmann, Stuart Shieber, Tal Linzen, Yonatan Belinkov

Abstract: Targeted syntactic evaluations have demonstrated the ability of language models to perform subject-verb agreement given difficult contexts. To elucidate the mechanisms by which the models accomplish this behavior, this study applies causal mediation analysis to pre-trained neural language models. We investigate the magnitude of models' preferences for grammatical inflections, as well as whether ne… ▽ More Targeted syntactic evaluations have demonstrated the ability of language models to perform subject-verb agreement given difficult contexts. To elucidate the mechanisms by which the models accomplish this behavior, this study applies causal mediation analysis to pre-trained neural language models. We investigate the magnitude of models' preferences for grammatical inflections, as well as whether neurons process subject-verb agreement similarly across sentences with different syntactic structures. We uncover similarities and differences across architectures and model sizes -- notably, that larger models do not necessarily learn stronger preferences. We also observe two distinct mechanisms for producing subject-verb agreement depending on the syntactic structure of the input sentence. Finally, we find that language models rely on similar sets of neurons when given sentences with similar syntactic structure. △ Less

Submitted 22 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

Comments: Accepted to ACL-IJCNLP 2021

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2009.05664 [pdf, other]

Towards an Atlas of Cultural Commonsense for Machine Reasoning

Authors: Anurag Acharya, Kartik Talamadupula, Mark A Finlayson

Abstract: Existing commonsense reasoning datasets for AI and NLP tasks fail to address an important aspect of human life: cultural differences. We introduce an approach that extends prior work on crowdsourcing commonsense knowledge by incorporating differences in knowledge that are attributable to cultural or national groups. We demonstrate the technique by collecting commonsense knowledge that surrounds si… ▽ More Existing commonsense reasoning datasets for AI and NLP tasks fail to address an important aspect of human life: cultural differences. We introduce an approach that extends prior work on crowdsourcing commonsense knowledge by incorporating differences in knowledge that are attributable to cultural or national groups. We demonstrate the technique by collecting commonsense knowledge that surrounds six fairly universal rituals -- birth, coming-of-age, marriage, funerals, new year, and birthdays -- across two national groups: the United States and India. Our study expands the different types of relationships identified by existing work in the field of commonsense reasoning for commonplace events, and uses these new types to gather information that distinguish the identity of the groups providing the knowledge. It also moves us a step closer towards building a machine that doesn't assume a rigid framework of universal (and likely Western-biased) commonsense knowledge, but rather has the ability to reason in a contextually and culturally sensitive way. Our hope is that cultural knowledge of this sort will lead to more human-like performance in NLP tasks such as question answering (QA) and text understanding and generation. △ Less

Submitted 18 December, 2020; v1 submitted 11 September, 2020; originally announced September 2020.

Comments: 9 pages, 9 figures

ACM Class: I.2.6; I.2.7

arXiv:1602.05753 [pdf]

Overview of Annotation Creation: Processes & Tools

Authors: Mark A. Finlayson, Tomaž Erjavec

Abstract: Creating linguistic annotations requires more than just a reliable annotation scheme. Annotation can be a complex endeavour potentially involving many people, stages, and tools. This chapter outlines the process of creating end-to-end linguistic annotations, identifying specific tasks that researchers often perform. Because tool support is so central to achieving high quality, reusable annotations… ▽ More Creating linguistic annotations requires more than just a reliable annotation scheme. Annotation can be a complex endeavour potentially involving many people, stages, and tools. This chapter outlines the process of creating end-to-end linguistic annotations, identifying specific tasks that researchers often perform. Because tool support is so central to achieving high quality, reusable annotations with low cost, the focus is on identifying capabilities that are necessary or useful for annotation tools, as well as common problems these tools present that reduce their utility. Although examples of specific tools are provided in many cases, this chapter concentrates more on abstract capabilities and problems because new tools appear continuously, while old tools disappear into disuse or disrepair. The two core capabilities tools must have are support for the chosen annotation scheme and the ability to work on the language under study. Additional capabilities are organized into three categories: those that are widely provided; those that often useful but found in only a few tools; and those that have as yet little or no available tool support. △ Less

Submitted 18 February, 2016; originally announced February 2016.

Comments: To appear in: James Pustejovsky and Nancy Ide (eds.) "Handbook of Linguistic Annotation." 2016. New York: Springer

Showing 1–15 of 15 results for author: Finlayson, M