Search | arXiv e-print repository

End-to-End Trainable Soft Retriever for Low-resource Relation Extraction

Authors: Kohei Makino, Makoto Miwa, Yutaka Sasaki

Abstract: This study addresses a crucial challenge in instance-based relation extraction using text generation models: end-to-end training in target relation extraction task is not applicable to retrievers due to the non-differentiable nature of instance selection. We propose a novel End-to-end TRAinable Soft K-nearest neighbor retriever (ETRASK) by the neural prompting method that utilizes a soft, differen… ▽ More This study addresses a crucial challenge in instance-based relation extraction using text generation models: end-to-end training in target relation extraction task is not applicable to retrievers due to the non-differentiable nature of instance selection. We propose a novel End-to-end TRAinable Soft K-nearest neighbor retriever (ETRASK) by the neural prompting method that utilizes a soft, differentiable selection of the $k$ nearest instances. This approach enables the end-to-end training of retrievers in target tasks. On the TACRED benchmark dataset with a low-resource setting where the training data was reduced to 10\%, our method achieved a state-of-the-art F1 score of 71.5\%. Moreover, ETRASK consistently improved the baseline model by adding instances for all settings. These results highlight the efficacy of our approach in enhancing relation extraction performance, especially in resource-constrained environments. Our findings offer a promising direction for future research with extraction and the broader application of text generation in natural language processing. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: preprint

arXiv:2302.05392 [pdf, other]

Span-based Named Entity Recognition by Generating and Compressing Information

Authors: Nhung T. H. Nguyen, Makoto Miwa, Sophia Ananiadou

Abstract: The information bottleneck (IB) principle has been proven effective in various NLP applications. The existing work, however, only used either generative or information compression models to improve the performance of the target task. In this paper, we propose to combine the two types of IB models into one system to enhance Named Entity Recognition (NER). For one type of IB model, we incorporate tw… ▽ More The information bottleneck (IB) principle has been proven effective in various NLP applications. The existing work, however, only used either generative or information compression models to improve the performance of the target task. In this paper, we propose to combine the two types of IB models into one system to enhance Named Entity Recognition (NER). For one type of IB model, we incorporate two unsupervised generative components, span reconstruction and synonym generation, into a span-based NER system. The span reconstruction ensures that the contextualised span representation keeps the span information, while the synonym generation makes synonyms have similar representations even in different contexts. For the other type of IB model, we add a supervised IB layer that performs information compression into the system to preserve useful features for NER in the resulting span representations. Experiments on five different corpora indicate that jointly training both generative and information compression models can enhance the performance of the baseline span-based NER system. Our source code is publicly available at https://github.com/nguyennth/joint-ib-models. △ Less

Submitted 10 February, 2023; originally announced February 2023.

Comments: The paper has 13 pages but the main content is in 9 pages. There are two figures and 9 tables. The paper is accepted as a long paper at EACL 2023

arXiv:2204.00511 [pdf, other]

Learning Disentangled Representations of Negation and Uncertainty

Authors: Jake Vasilakes, Chrysoula Zerva, Makoto Miwa, Sophia Ananiadou

Abstract: Negation and uncertainty modeling are long-standing tasks in natural language processing. Linguistic theory postulates that expressions of negation and uncertainty are semantically independent from each other and the content they modify. However, previous works on representation learning do not explicitly model this independence. We therefore attempt to disentangle the representations of negation,… ▽ More Negation and uncertainty modeling are long-standing tasks in natural language processing. Linguistic theory postulates that expressions of negation and uncertainty are semantically independent from each other and the content they modify. However, previous works on representation learning do not explicitly model this independence. We therefore attempt to disentangle the representations of negation, uncertainty, and content using a Variational Autoencoder. We find that simply supervising the latent representations results in good disentanglement, but auxiliary objectives based on adversarial learning and mutual information minimization can provide additional disentanglement gains. △ Less

Submitted 1 April, 2022; originally announced April 2022.

Comments: Accepted to ACL 2022. 18 pages, 7 figures. Code and data are available at https://github.com/jvasilakes/disentanglement-vae

arXiv:2110.04077 [pdf, ps, other]

Physical Context and Timing Aware Sequence Generating GANs

Authors: Hayato Futase, Tomoki Tsujimura, Tetsuya Kajimoto, Hajime Kawarazaki, Toshiyuki Suzuki, Makoto Miwa, Yutaka Sasaki

Abstract: Generative Adversarial Networks (GANs) have shown remarkable successes in generating realistic images and interpolating changes between images. Existing models, however, do not take into account physical contexts behind images in generating the images, which may cause unrealistic changes. Furthermore, it is difficult to generate the changes at a specific timing and they often do not match with act… ▽ More Generative Adversarial Networks (GANs) have shown remarkable successes in generating realistic images and interpolating changes between images. Existing models, however, do not take into account physical contexts behind images in generating the images, which may cause unrealistic changes. Furthermore, it is difficult to generate the changes at a specific timing and they often do not match with actual changes. This paper proposes a novel GAN, named Physical Context and Timing aware sequence generating GANs (PCTGAN), that generates an image in a sequence at a specific timing between two images with considering physical contexts behind them. Our method consists of three components: an encoder, a generator, and a discriminator. The encoder estimates latent vectors from the beginning and ending images, their timings, and a target timing. The generator generates images and the physical contexts at the beginning, ending, and target timing from the corresponding latent vectors. The discriminator discriminates whether the generated images and contexts are real or not. In the experiments, PCTGAN is applied to a data set of sequential changes of shapes in die forging processes. We show that both timing and physical contexts are effective in generating sequential images. △ Less

Submitted 28 September, 2021; originally announced October 2021.

arXiv:2106.14157 [pdf, other]

Analyzing Research Trends in Inorganic Materials Literature Using NLP

Authors: Fusataka Kuniyoshi, Jun Ozawa, Makoto Miwa

Abstract: In the field of inorganic materials science, there is a growing demand to extract knowledge such as physical properties and synthesis processes of materials by machine-reading a large number of papers. This is because materials researchers refer to many papers in order to come up with promising terms of experiments for material synthesis. However, there are only a few systems that can extract mate… ▽ More In the field of inorganic materials science, there is a growing demand to extract knowledge such as physical properties and synthesis processes of materials by machine-reading a large number of papers. This is because materials researchers refer to many papers in order to come up with promising terms of experiments for material synthesis. However, there are only a few systems that can extract material names and their properties. This study proposes a large-scale natural language processing (NLP) pipeline for extracting material names and properties from materials science literature to enable the search and retrieval of results in materials science. Therefore, we propose a label definition for extracting material names and properties and accordingly build a corpus containing 836 annotated paragraphs extracted from 301 papers for training a named entity recognition (NER) model. Experimental results demonstrate the utility of this NER model; it achieves successful extraction with a micro-F1 score of 78.1%. To demonstrate the efficacy of our approach, we present a thorough evaluation on a real-world automatically annotated corpus by applying our trained NER model to 12,895 materials science papers. We analyze the trend in materials science by visualizing the outputs of the NLP pipeline. For example, the country-by-year analysis indicates that in recent years, the number of papers on "MoS2," a material used in perovskite solar cells, has been increasing rapidly in China but decreasing in the United States. Further, according to the conditions-by-year analysis, the processing temperature of the catalyst material "PEDOT:PSS" is shifting below 200 degree, and the number of reports with a processing time exceeding 5 h is increasing slightly. △ Less

Submitted 27 June, 2021; originally announced June 2021.

Comments: Accepted to ECML-PKDD2021. Preprint

arXiv:2106.09900 [pdf, other]

A Neural Edge-Editing Approach for Document-Level Relation Graph Extraction

Authors: Kohei Makino, Makoto Miwa, Yutaka Sasaki

Abstract: In this paper, we propose a novel edge-editing approach to extract relation information from a document. We treat the relations in a document as a relation graph among entities in this approach. The relation graph is iteratively constructed by editing edges of an initial graph, which might be a graph extracted by another system or an empty graph. The way to edit edges is to classify them in a clos… ▽ More In this paper, we propose a novel edge-editing approach to extract relation information from a document. We treat the relations in a document as a relation graph among entities in this approach. The relation graph is iteratively constructed by editing edges of an initial graph, which might be a graph extracted by another system or an empty graph. The way to edit edges is to classify them in a close-first manner using the document and temporally-constructed graph information; each edge is represented with a document context information by a pretrained transformer model and a graph context information by a graph convolutional neural network model. We evaluate our approach on the task to extract material synthesis procedures from materials science texts. The experimental results show the effectiveness of our approach in editing the graphs initialized by our in-house rule-based system and empty graphs. △ Less

Submitted 17 June, 2021; originally announced June 2021.

Comments: Accepted for publication at the Findings of the Association for Computational Linguistics (Findings-ACL2021), 2021. 10 pages, 6 figures, 8 tables

arXiv:2104.08225 [pdf, other]

Distantly Supervised Relation Extraction with Sentence Reconstruction and Knowledge Base Priors

Authors: Fenia Christopoulou, Makoto Miwa, Sophia Ananiadou

Abstract: We propose a multi-task, probabilistic approach to facilitate distantly supervised relation extraction by bringing closer the representations of sentences that contain the same Knowledge Base pairs. To achieve this, we bias the latent space of sentences via a Variational Autoencoder (VAE) that is trained jointly with a relation classifier. The latent code guides the pair representations and influe… ▽ More We propose a multi-task, probabilistic approach to facilitate distantly supervised relation extraction by bringing closer the representations of sentences that contain the same Knowledge Base pairs. To achieve this, we bias the latent space of sentences via a Variational Autoencoder (VAE) that is trained jointly with a relation classifier. The latent code guides the pair representations and influences sentence reconstruction. Experimental results on two datasets created via distant supervision indicate that multi-task learning results in performance benefits. Additional exploration of employing Knowledge Base priors into the VAE reveals that the sentence space can be shifted towards that of the Knowledge Base, offering interpretability and further improving results. △ Less

Submitted 16 April, 2021; originally announced April 2021.

Comments: 16 pages, 9 figures, Accepted as a long paper at NAACL 2021

arXiv:2002.07339 [pdf, other]

Annotating and Extracting Synthesis Process of All-Solid-State Batteries from Scientific Literature

Authors: Fusataka Kuniyoshi, Kohei Makino, Jun Ozawa, Makoto Miwa

Abstract: The synthesis process is essential for achieving computational experiment design in the field of inorganic materials chemistry. In this work, we present a novel corpus of the synthesis process for all-solid-state batteries and an automated machine reading system for extracting the synthesis processes buried in the scientific literature. We define the representation of the synthesis processes using… ▽ More The synthesis process is essential for achieving computational experiment design in the field of inorganic materials chemistry. In this work, we present a novel corpus of the synthesis process for all-solid-state batteries and an automated machine reading system for extracting the synthesis processes buried in the scientific literature. We define the representation of the synthesis processes using flow graphs, and create a corpus from the experimental sections of 243 papers. The automated machine-reading system is developed by a deep learning-based sequence tagger and simple heuristic rule-based relation extractor. Our experimental results demonstrate that the sequence tagger with the optimal setting can detect the entities with a macro-averaged F1 score of 0.826, while the rule-based relation extractor can achieve high performance with a macro-averaged F1 score of 0.887. △ Less

Submitted 17 February, 2020; originally announced February 2020.

Comments: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France

arXiv:1910.10281 [pdf, other]

A Search-based Neural Model for Biomedical Nested and Overlapping Event Detection

Authors: Kurt Espinosa, Makoto Miwa, Sophia Ananiadou

Abstract: We tackle the nested and overlapping event detection task and propose a novel search-based neural network (SBNN) structured prediction model that treats the task as a search problem on a relation graph of trigger-argument structures. Unlike existing structured prediction tasks such as dependency parsing, the task targets to detect DAG structures, which constitute events, from the relation graph. W… ▽ More We tackle the nested and overlapping event detection task and propose a novel search-based neural network (SBNN) structured prediction model that treats the task as a search problem on a relation graph of trigger-argument structures. Unlike existing structured prediction tasks such as dependency parsing, the task targets to detect DAG structures, which constitute events, from the relation graph. We define actions to construct events and use all the beams in a beam search to detect all event structures that may be overlapping and nested. The search process constructs events in a bottom-up manner while modelling the global properties for nested and overlapping structures simultaneously using neural networks. We show that the model achieves performance comparable to the state-of-the-art model Turku Event Extraction System (TEES) on the BioNLP Cancer Genetics (CG) Shared Task 2013 without the use of any syntactic and hand-engineered features. Further analyses on the development set show that our model is more computationally efficient while yielding higher F1-score performance. △ Less

Submitted 24 October, 2019; v1 submitted 22 October, 2019; originally announced October 2019.

Comments: Accepted at EMNLP-IJCNLP 2019

arXiv:1909.00228 [pdf, other]

Connecting the Dots: Document-level Neural Relation Extraction with Edge-oriented Graphs

Authors: Fenia Christopoulou, Makoto Miwa, Sophia Ananiadou

Abstract: Document-level relation extraction is a complex human process that requires logical inference to extract relationships between named entities in text. Existing approaches use graph-based neural models with words as nodes and edges as relations between them, to encode relations across sentences. These models are node-based, i.e., they form pair representations based solely on the two target node re… ▽ More Document-level relation extraction is a complex human process that requires logical inference to extract relationships between named entities in text. Existing approaches use graph-based neural models with words as nodes and edges as relations between them, to encode relations across sentences. These models are node-based, i.e., they form pair representations based solely on the two target node representations. However, entity relations can be better expressed through unique edge representations formed as paths between nodes. We thus propose an edge-oriented graph neural model for document-level relation extraction. The model utilises different types of nodes and edges to create a document-level graph. An inference mechanism on the graph edges enables to learn intra- and inter-sentence relations using multi-instance learning internally. Experiments on two document-level biomedical datasets for chemical-disease and gene-disease associations show the usefulness of the proposed edge-oriented approach. △ Less

Submitted 31 August, 2019; originally announced September 2019.

Comments: 12 pages, 5 figures, 6 tables. Accepted in EMNLP-IJCNLP 2019

arXiv:1906.04684 [pdf, other]

Inter-sentence Relation Extraction with Document-level Graph Convolutional Neural Network

Authors: Sunil Kumar Sahu, Fenia Christopoulou, Makoto Miwa, Sophia Ananiadou

Abstract: Inter-sentence relation extraction deals with a number of complex semantic relationships in documents, which require local, non-local, syntactic and semantic dependencies. Existing methods do not fully exploit such dependencies. We present a novel inter-sentence relation extraction model that builds a labelled edge graph convolutional neural network model on a document-level graph. The graph is co… ▽ More Inter-sentence relation extraction deals with a number of complex semantic relationships in documents, which require local, non-local, syntactic and semantic dependencies. Existing methods do not fully exploit such dependencies. We present a novel inter-sentence relation extraction model that builds a labelled edge graph convolutional neural network model on a document-level graph. The graph is constructed using various inter- and intra-sentence dependencies to capture local and non-local dependency information. In order to predict the relation of an entity pair, we utilise multi-instance learning with bi-affine pairwise scoring. Experimental results show that our model achieves comparable performance to the state-of-the-art neural models on two biochemistry datasets. Our analysis shows that all the types in the graph are effective for inter-sentence relation extraction. △ Less

Submitted 11 June, 2019; originally announced June 2019.

Comments: Accepted in Association for Computational Linguistics (ACL) 2019 8 pages, 3 figures, 3 tables

arXiv:1903.12650 [pdf, ps, other]

Yet Another Accelerated SGD: ResNet-50 Training on ImageNet in 74.7 seconds

Authors: Masafumi Yamazaki, Akihiko Kasagi, Akihiro Tabuchi, Takumi Honda, Masahiro Miwa, Naoto Fukumoto, Tsuguchika Tabaru, Atsushi Ike, Kohta Nakashima

Abstract: There has been a strong demand for algorithms that can execute machine learning as faster as possible and the speed of deep learning has accelerated by 30 times only in the past two years. Distributed deep learning using the large mini-batch is a key technology to address the demand and is a great challenge as it is difficult to achieve high scalability on large clusters without compromising accur… ▽ More There has been a strong demand for algorithms that can execute machine learning as faster as possible and the speed of deep learning has accelerated by 30 times only in the past two years. Distributed deep learning using the large mini-batch is a key technology to address the demand and is a great challenge as it is difficult to achieve high scalability on large clusters without compromising accuracy. In this paper, we introduce optimization methods which we applied to this challenge. We achieved the training time of 74.7 seconds using 2,048 GPUs on ABCI cluster applying these methods. The training throughput is over 1.73 million images/sec and the top-1 validation accuracy is 75.08%. △ Less

Submitted 29 March, 2019; originally announced March 2019.

arXiv:1902.07023 [pdf, other]

A Walk-based Model on Entity Graphs for Relation Extraction

Authors: Fenia Christopoulou, Makoto Miwa, Sophia Ananiadou

Abstract: We present a novel graph-based neural network model for relation extraction. Our model treats multiple pairs in a sentence simultaneously and considers interactions among them. All the entities in a sentence are placed as nodes in a fully-connected graph structure. The edges are represented with position-aware contexts around the entity pairs. In order to consider different relation paths between… ▽ More We present a novel graph-based neural network model for relation extraction. Our model treats multiple pairs in a sentence simultaneously and considers interactions among them. All the entities in a sentence are placed as nodes in a fully-connected graph structure. The edges are represented with position-aware contexts around the entity pairs. In order to consider different relation paths between two entities, we construct up to l-length walks between each pair. The resulting walks are merged and iteratively used to update the edge representations into longer walks representations. We show that the model achieves performance comparable to the state-of-the-art systems on the ACE 2005 dataset without using any external tools. △ Less

Submitted 13 March, 2020; v1 submitted 19 February, 2019; originally announced February 2019.

Comments: 8 pages, 2 figures, 2 tables

Journal ref: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2018, pages 81-88

arXiv:1805.05593 [pdf, other]

Enhancing Drug-Drug Interaction Extraction from Texts by Molecular Structure Information

Authors: Masaki Asada, Makoto Miwa, Yutaka Sasaki

Abstract: We propose a novel neural method to extract drug-drug interactions (DDIs) from texts using external drug molecular structure information. We encode textual drug pairs with convolutional neural networks and their molecular pairs with graph convolutional networks (GCNs), and then we concatenate the outputs of these two networks. In the experiments, we show that GCNs can predict DDIs from the molecul… ▽ More We propose a novel neural method to extract drug-drug interactions (DDIs) from texts using external drug molecular structure information. We encode textual drug pairs with convolutional neural networks and their molecular pairs with graph convolutional networks (GCNs), and then we concatenate the outputs of these two networks. In the experiments, we show that GCNs can predict DDIs from the molecular structures of drugs in high accuracy and the molecular information can enhance text-based DDI extraction by 2.39 percent points in the F-score on the DDIExtraction 2013 shared task data set. △ Less

Submitted 15 May, 2018; originally announced May 2018.

Comments: accepted as a short paper at ACL2018

arXiv:1706.05122 [pdf, ps, other]

Bib2vec: An Embedding-based Search System for Bibliographic Information

Authors: Takuma Yoneda, Koki Mori, Makoto Miwa, Yutaka Sasaki

Abstract: We propose a novel embedding model that represents relationships among several elements in bibliographic information with high representation ability and flexibility. Based on this model, we present a novel search system that shows the relationships among the elements in the ACL Anthology Reference Corpus. The evaluation results show that our model can achieve a high prediction ability and produce… ▽ More We propose a novel embedding model that represents relationships among several elements in bibliographic information with high representation ability and flexibility. Based on this model, we present a novel search system that shows the relationships among the elements in the ACL Anthology Reference Corpus. The evaluation results show that our model can achieve a high prediction ability and produce reasonable search results. △ Less

Submitted 5 April, 2018; v1 submitted 15 June, 2017; originally announced June 2017.

Comments: EACL2017 extended version. The demonstration is available at http://tti-coin.jp/demo/bib2vec/

Journal ref: Proceedings of the EACL 2017 Software Demonstrations, Valencia, Spain, April 3-7 2017, pages 112-115

arXiv:1601.00770 [pdf, other]

End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures

Authors: Makoto Miwa, Mohit Bansal

Abstract: We present a novel end-to-end neural model to extract entities and relations between them. Our recurrent neural network based model captures both word sequence and dependency tree substructure information by stacking bidirectional tree-structured LSTM-RNNs on bidirectional sequential LSTM-RNNs. This allows our model to jointly represent both entities and relations with shared parameters in a singl… ▽ More We present a novel end-to-end neural model to extract entities and relations between them. Our recurrent neural network based model captures both word sequence and dependency tree substructure information by stacking bidirectional tree-structured LSTM-RNNs on bidirectional sequential LSTM-RNNs. This allows our model to jointly represent both entities and relations with shared parameters in a single model. We further encourage detection of entities during training and use of entity information in relation extraction via entity pretraining and scheduled sampling. Our model improves over the state-of-the-art feature-based model on end-to-end relation extraction, achieving 12.1% and 5.7% relative error reductions in F1-score on ACE2005 and ACE2004, respectively. We also show that our LSTM-RNN based model compares favorably to the state-of-the-art CNN based model (in F1-score) on nominal relation classification (SemEval-2010 Task 8). Finally, we present an extensive ablation analysis of several model components. △ Less

Submitted 7 June, 2016; v1 submitted 5 January, 2016; originally announced January 2016.

Comments: Accepted for publication at the Association for Computational Linguistics (ACL), 2016. 13 pages, 1 figure, 6 tables

arXiv:1503.00095 [pdf, ps, other]

Task-Oriented Learning of Word Embeddings for Semantic Relation Classification

Authors: Kazuma Hashimoto, Pontus Stenetorp, Makoto Miwa, Yoshimasa Tsuruoka

Abstract: We present a novel learning method for word embeddings designed for relation classification. Our word embeddings are trained by predicting words between noun pairs using lexical relation-specific features on a large unlabeled corpus. This allows us to explicitly incorporate relation-specific information into the word embeddings. The learned word embeddings are then used to construct feature vector… ▽ More We present a novel learning method for word embeddings designed for relation classification. Our word embeddings are trained by predicting words between noun pairs using lexical relation-specific features on a large unlabeled corpus. This allows us to explicitly incorporate relation-specific information into the word embeddings. The learned word embeddings are then used to construct feature vectors for a relation classification model. On a well-established semantic relation classification task, our method significantly outperforms a baseline based on a previously introduced word embedding method, and compares favorably to previous state-of-the-art models that use syntactic information or manually constructed external resources. △ Less

Submitted 22 June, 2015; v1 submitted 28 February, 2015; originally announced March 2015.

Comments: The Nineteenth Conference on Computational Natural Language Learning (CoNLL 2015)

arXiv:0807.2701 [pdf, other]

A Cutting Plane Method based on Redundant Rows for Improving Fractional Distance

Authors: Makoto Miwa, Tadashi Wadayama, Ichi Takumi

Abstract: In this paper, an idea of the cutting plane method is employed to improve the fractional distance of a given binary parity check matrix. The fractional distance is the minimum weight (with respect to l1-distance) of vertices of the fundamental polytope. The cutting polytope is defined based on redundant rows of the parity check matrix and it plays a key role to eliminate unnecessary fractional v… ▽ More In this paper, an idea of the cutting plane method is employed to improve the fractional distance of a given binary parity check matrix. The fractional distance is the minimum weight (with respect to l1-distance) of vertices of the fundamental polytope. The cutting polytope is defined based on redundant rows of the parity check matrix and it plays a key role to eliminate unnecessary fractional vertices in the fundamental polytope. We propose a greedy algorithm and its efficient implementation for improving the fractional distance based on the cutting plane method. △ Less

Submitted 17 July, 2008; originally announced July 2008.

Comments: 8 pages, To be presented at Turbo Coding 2008

Showing 1–18 of 18 results for author: Miwa, M