Showing 1–38 of 38 results for author: Komachi, M

  1. arXiv:2403.17540  [pdf, other]

    cs.CL

    Large Language Models Are State-of-the-Art Evaluator for Grammatical Error Correction

    Authors: Masamune Kobayashi, Masato Mita, Mamoru Komachi

    Abstract: Large Language Models (LLMs) have been reported to outperform existing automatic evaluation metrics in some tasks, such as text summarization and machine translation. However, there has been a lack of research on LLMs as evaluators in grammatical error correction (GEC). In this study, we investigate the performance of LLMs in GEC evaluation by employing prompts designed to incorporate various eval…

    Submitted 26 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted to BEA workshop at NAACL 2024

  2. arXiv:2403.02674  [pdf, other]

    cs.CL

    Revisiting Meta-evaluation for Grammatical Error Correction

    Authors: Masamune Kobayashi, Masato Mita, Mamoru Komachi

    Abstract: Metrics are the foundation for automatic evaluation in grammatical error correction (GEC), with their evaluation of the metrics (meta-evaluation) relying on their correlation with human judgments. However, conventional meta-evaluations in English GEC encounter several challenges including biases caused by inconsistencies in evaluation granularity, and an outdated setup using classical systems. The…

    Submitted 26 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted to TACL; Presented at EMNLP 2024

  3. arXiv:2305.05928  [pdf, ps, other]

    cs.CL

    WikiSQE: A Large-Scale Dataset for Sentence Quality Estimation in Wikipedia

    Authors: Kenichiro Ando, Satoshi Sekine, Mamoru Komachi

    Abstract: Wikipedia can be edited by anyone and thus contains various quality sentences. Therefore, Wikipedia includes some poor-quality edits, which are often marked up by other editors. While editors' reviews enhance the credibility of Wikipedia, it is hard to check all edited text. Assisting in this process is very important, but a large and comprehensive dataset for studying it does not currently exist.…

    Submitted 29 December, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: AAAI 2024 Main Track Accepted

  4. Is In-hospital Meta-information Useful for Abstractive Discharge Summary Generation?

    Authors: Kenichiro Ando, Mamoru Komachi, Takashi Okumura, Hiromasa Horiguchi, Yuji Matsumoto

    Abstract: During the patient's hospitalization, the physician must record daily observations of the patient and summarize them into a brief document called "discharge summary" when the patient is discharged. Automated generation of discharge summary can greatly relieve the physicians' burden, and has been addressed recently in the research community. Most previous studies of discharge summary generation usi…

    Submitted 10 March, 2023; originally announced March 2023.

    Journal ref: International Conference on Technologies and Applications of Artificial Intelligence (TAAI). 2022;143-148

  5. Exploring Optimal Granularity for Extractive Summarization of Unstructured Health Records: Analysis of the Largest Multi-Institutional Archive of Health Records in Japan

    Authors: Kenichiro Ando, Takashi Okumura, Mamoru Komachi, Hiromasa Horiguchi, Yuji Matsumoto

    Abstract: Automated summarization of clinical texts can reduce the burden of medical professionals. "Discharge summaries" are one promising application of the summarization, because they can be generated from daily inpatient records. Our preliminary experiment suggests that 20-31% of the descriptions in discharge summaries overlap with the content of the inpatient records. However, it remains unclear how th…

    Submitted 20 December, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

    Journal ref: PLOS Digital Health. 2022;1(9):1-19

  6. arXiv:2201.11258  [pdf, other]

    cs.CL

    Learning How to Translate North Korean through South Korean

    Authors: Hwichan Kim, Sangwhan Moon, Naoaki Okazaki, Mamoru Komachi

    Abstract: South and North Korea both use the Korean language. However, Korean NLP research has focused on South Korean only, and existing NLP systems of the Korean language, such as neural machine translation (NMT) models, cannot properly handle North Korean inputs. Training a model using North Korean data is the most straightforward approach to solving this problem, but there is insufficient data to train…

    Submitted 26 January, 2022; originally announced January 2022.

    Comments: 8 pages, 1 figures, 8 tables

  7. arXiv:2201.08038  [pdf, ps, other]

    cs.CL

    Construction of a Quality Estimation Dataset for Automatic Evaluation of Japanese Grammatical Error Correction

    Authors: Daisuke Suzuki, Yujin Takahashi, Ikumi Yamashita, Taichi Aida, Tosho Hirasawa, Michitaka Nakatsuji, Masato Mita, Mamoru Komachi

    Abstract: In grammatical error correction (GEC), automatic evaluation is an important factor for research and development of GEC systems. Previous studies on automatic evaluation have demonstrated that quality estimation models built from datasets with manual evaluation can achieve high performance in automatic evaluation of English GEC without using reference sentences. However, quality estimation models…

    Submitted 20 January, 2022; originally announced January 2022.

    Comments: 8 pages (6 pages + references)

  8. arXiv:2201.06199  [pdf, other]

    cs.CL

    Proficiency Matters Quality Estimation in Grammatical Error Correction

    Authors: Yujin Takahashi, Masahiro Kaneko, Masato Mita, Mamoru Komachi

    Abstract: This study investigates how supervised quality estimation (QE) models of grammatical error correction (GEC) are affected by the learners' proficiency with the data. QE models for GEC evaluations in prior work have obtained a high correlation with manual evaluations. However, when functioning in a real-world context, the data used for the reported results have limitations because prior works were b…

    Submitted 16 January, 2022; originally announced January 2022.

    Comments: 6 pages (4 pages + references)

  9. arXiv:2106.06689  [pdf, other]

    cs.CL

    Neural Combinatory Constituency Parsing

    Authors: Zhousi Chen, Longtu Zhang, Aizhan Imankulova, Mamoru Komachi

    Abstract: We propose two fast neural combinatory models for constituency parsing: binary and multi-branching. Our models decompose the bottom-up parsing process into 1) classification of tags, labels, and binary orientations or chunks and 2) vector composition based on the computed orientations or chunks. These models have theoretical sub-quadratic complexity and empirical linear complexity. The binary mode…

    Submitted 12 June, 2021; originally announced June 2021.

    Comments: Findings of ACL 2021; 15 pages

  10. arXiv:2105.07316  [pdf, other]

    cs.CL

    From Masked Language Modeling to Translation: Non-English Auxiliary Tasks Improve Zero-shot Spoken Language Understanding

    Authors: Rob van der Goot, Ibrahim Sharaf, Aizhan Imankulova, Ahmet Üstün, Marija Stepanović, Alan Ramponi, Siti Oryza Khairunnisa, Mamoru Komachi, Barbara Plank

    Abstract: The lack of publicly available evaluation data for low-resource languages limits progress in Spoken Language Understanding (SLU). As key tasks like intent classification and slot filling require abundant training data, it is desirable to reuse existing data in high-resource languages to develop models for low-resource scenarios. We introduce xSID, a new benchmark for cross-lingual Slot and Intent…

    Submitted 15 May, 2021; originally announced May 2021.

    Comments: To appear in the proceedings of NAACL 2021

  11. arXiv:2104.08478  [pdf, other]

    cs.CL

    Sentence Concatenation Approach to Data Augmentation for Neural Machine Translation

    Authors: Seiichiro Kondo, Kengo Hotate, Masahiro Kaneko, Mamoru Komachi

    Abstract: Neural machine translation (NMT) has recently gained widespread attention because of its high translation accuracy. However, it shows poor performance in the translation of long sentences, which is a major issue in low-resource languages. It is assumed that this issue is caused by insufficient number of long sentences in the training data. Therefore, this study proposes a simple data augmentation…

    Submitted 17 April, 2021; originally announced April 2021.

    Comments: 7 pages; camera-ready for NAACL Student Research Workshop 2021

  12. arXiv:2104.07848  [pdf, other]

    cs.CL

    Comparison of Grammatical Error Correction Using Back-Translation Models

    Authors: Aomi Koyama, Kengo Hotate, Masahiro Kaneko, Mamoru Komachi

    Abstract: Grammatical error correction (GEC) suffers from a lack of sufficient parallel data. Therefore, GEC studies have developed various methods to generate pseudo data, which comprise pairs of grammatical and artificially produced ungrammatical sentences. Currently, a mainstream approach to generate pseudo data is back-translation (BT). Most previous GEC studies using BT have employed the same architect…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: 10 pages; camera-ready for NAACL Student Research Workshop 2021

  13. arXiv:2011.02093  [pdf, other]

    cs.CL

    Chinese Grammatical Correction Using BERT-based Pre-trained Model

    Authors: Hongfei Wang, Michiki Kurosawa, Satoru Katsumata, Mamoru Komachi

    Abstract: In recent years, pre-trained models have been extensively studied, and several downstream tasks have benefited from their utilization. In this study, we verify the effectiveness of two methods that incorporate a BERT-based pre-trained model developed by Cui et al. (2020) into an encoder-decoder model on Chinese grammatical error correction tasks. We also analyze the error type and conclude that se…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: 6 pages; AACL-IJCNLP 2020

  14. arXiv:2010.01793  [pdf, other]

    eess.AS cs.SD

    JSSS: free Japanese speech corpus for summarization and simplification

    Authors: Shinnosuke Takamichi, Mamoru Komachi, Naoko Tanji, Hiroshi Saruwatari

    Abstract: In this paper, we construct a new Japanese speech corpus for speech-based summarization and simplification, "JSSS" (pronounced "j-triple-s"). Given the success of reading-style speech synthesis from short-form sentences, we aim to design more difficult tasks for delivering information to humans. Our corpus contains voices recorded for two tasks that have a role in providing information under const…

    Submitted 5 October, 2020; originally announced October 2020.

  15. arXiv:2006.12799  [pdf, other]

    cs.CL

    Keyframe Segmentation and Positional Encoding for Video-guided Machine Translation Challenge 2020

    Authors: Tosho Hirasawa, Zhishen Yang, Mamoru Komachi, Naoaki Okazaki

    Abstract: Video-guided machine translation as one of multimodal neural machine translation tasks targeting on generating high-quality text translation by tangibly engaging both video and text. In this work, we presented our video-guided machine translation system in approaching the Video-guided Machine Translation Challenge 2020. This system employs keyframe-based video feature extractions along with the vi…

    Submitted 23 June, 2020; originally announced June 2020.

    Comments: 4 pages; First Workshop on Advances in Language and Vision Research (ALVR 2020)

  16. arXiv:2005.11849  [pdf, ps, other]

    cs.CL

    Stronger Baselines for Grammatical Error Correction Using Pretrained Encoder-Decoder Model

    Authors: Satoru Katsumata, Mamoru Komachi

    Abstract: Studies on grammatical error correction (GEC) have reported the effectiveness of pretraining a Seq2Seq model with a large amount of pseudodata. However, this approach requires time-consuming pretraining for GEC because of the size of the pseudodata. In this study, we explore the utility of bidirectional and auto-regressive transformers (BART) as a generic pretrained encoder-decoder model for GEC.…

    Submitted 29 September, 2020; v1 submitted 24 May, 2020; originally announced May 2020.

    Comments: 6 pages; AACL-IJCNLP 2020

  17. arXiv:2004.03180  [pdf, other]

    cs.CL

    Towards Multimodal Simultaneous Neural Machine Translation

    Authors: Aizhan Imankulova, Masahiro Kaneko, Tosho Hirasawa, Mamoru Komachi

    Abstract: Simultaneous translation involves translating a sentence before the speaker's utterance is completed in order to realize real-time understanding in multiple languages. This task is significantly more challenging than the general full sentence translation because of the shortage of input information during decoding. To alleviate this shortage, we propose multimodal simultaneous neural machine trans…

    Submitted 23 October, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

    Comments: 10 pages; WMT 2020

  18. arXiv:1909.04879  [pdf, other]

    cs.CL

    Dynamic Fusion: Attentional Language Model for Neural Machine Translation

    Authors: Michiki Kurosawa, Mamoru Komachi

    Abstract: Neural Machine Translation (NMT) can be used to generate fluent output. As such, language models have been investigated for incorporation with NMT. In prior investigations, two models have been used: a translation model and a language model. The translation model's predictions are weighted by the language model with a hand-crafted ratio in advance. However, these approaches fail to adopt the langu…

    Submitted 11 September, 2019; originally announced September 2019.

    Comments: 13 pages; PACLING 2019

  19. arXiv:1909.00531  [pdf, other]

    cs.CL

    Improving Context-aware Neural Machine Translation with Target-side Context

    Authors: Hayahide Yamagishi, Mamoru Komachi

    Abstract: In recent years, several studies on neural machine translation (NMT) have attempted to use document-level context by using a multi-encoder and two attention mechanisms to read the current and previous sentences to incorporate the context of the previous sentences. These studies concluded that the target-side context is less useful than the source-side context. However, we considered that the reaso…

    Submitted 2 September, 2019; originally announced September 2019.

    Comments: 12 pages; PACLING 2019

  20. arXiv:1907.12679  [pdf, ps, other]

    cs.CL

    Machine Translation Evaluation with BERT Regressor

    Authors: Hiroki Shimanaka, Tomoyuki Kajiwara, Mamoru Komachi

    Abstract: We introduce the metric using BERT (Bidirectional Encoder Representations from Transformers) (Devlin et al., 2019) for automatic machine translation evaluation. The experimental results of the WMT-2017 Metrics Shared Task dataset show that our metric achieves state-of-the-art performance in segment-level metrics task for all to-English language pairs.

    Submitted 29 July, 2019; originally announced July 2019.

    Comments: 6 pages

  21. arXiv:1907.09724  [pdf, ps, other]

    cs.CL

    Towards Unsupervised Grammatical Error Correction using Statistical Machine Translation with Synthetic Comparable Corpus

    Authors: Satoru Katsumata, Mamoru Komachi

    Abstract: We introduce unsupervised techniques based on phrase-based statistical machine translation for grammatical error correction (GEC) trained on a pseudo learner corpus created by Google Translation. We verified our GEC system through experiments on various GEC dataset, including a low resource track of the shared task at Building Educational Applications 2019 (BEA 2019). As a result, we achieved an…

    Submitted 23 July, 2019; originally announced July 2019.

    Comments: 7 pages; extended version of BEA 2019

  22. arXiv:1905.10464  [pdf, other]

    cs.CL

    Debiasing Word Embeddings Improves Multimodal Machine Translation

    Authors: Tosho Hirasawa, Mamoru Komachi

    Abstract: In recent years, pretrained word embeddings have proved useful for multimodal neural machine translation (NMT) models to address the shortage of available datasets. However, the integration of pretrained word embeddings has not yet been explored extensively. Further, pretrained word embeddings in high dimensional spaces have been reported to suffer from the hubness problem. Although some debiasing…

    Submitted 22 June, 2019; v1 submitted 24 May, 2019; originally announced May 2019.

    Comments: 11 pages; MT Summit 2019 (camera ready)

  23. arXiv:1904.07334  [pdf, other]

    cs.CL

    Multi-Head Multi-Layer Attention to Deep Language Representations for Grammatical Error Detection

    Authors: Masahiro Kaneko, Mamoru Komachi

    Abstract: It is known that a deep neural network model pre-trained with large-scale data greatly improves the accuracy of various tasks, especially when there are resource constraints. However, the information needed to solve a given task can vary, and simply using the output of the final layer is not necessarily sufficient. Moreover, to our knowledge, exploiting large language representation models to dete…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: 12 pages; CICLing 2019

  24. arXiv:1904.02244  [pdf, other]

    cs.CL

    Multi-task Learning for Japanese Predicate Argument Structure Analysis

    Authors: Hikaru Omori, Mamoru Komachi

    Abstract: An event-noun is a noun that has an argument structure similar to a predicate. Recent works, including those considered state-of-the-art, ignore event-nouns or build a single model for solving both Japanese predicate argument structure analysis (PASA) and event-noun argument structure analysis (ENASA). However, because there are interactions between predicates and event-nouns, it is not sufficient…

    Submitted 3 April, 2019; originally announced April 2019.

    Comments: 10 pages; NAACL 2019

  25. arXiv:1904.00639  [pdf, other]

    cs.CL

    Multimodal Machine Translation with Embedding Prediction

    Authors: Tosho Hirasawa, Hayahide Yamagishi, Yukio Matsumura, Mamoru Komachi

    Abstract: Multimodal machine translation is an attractive application of neural machine translation (NMT). It helps computers to deeply understand visual objects and their relations with natural languages. However, multimodal NMT systems suffer from a shortage of available training data, resulting in poor performance for translating rare words. In NMT, pretrained word embeddings have been shown to improve N…

    Submitted 1 April, 2019; originally announced April 2019.

    Comments: 6 pages; NAACL 2019 Student Research Workshop

  26. arXiv:1903.00149  [pdf]

    cs.CL

    Chinese-Japanese Unsupervised Neural Machine Translation Using Sub-character Level Information

    Authors: Longtu Zhang, Mamoru Komachi

    Abstract: Unsupervised neural machine translation (UNMT) requires only monolingual data of similar language pairs during training and can produce bi-directional translation models with relatively good performance on alphabetic languages (Lample et al., 2018). However, no research has been done to logographic language pairs. This study focuses on Chinese-Japanese UNMT trained by data containing sub-character…

    Submitted 28 February, 2019; originally announced March 2019.

    Comments: 5 pages

  27. arXiv:1901.10196  [pdf, ps, other]

    cs.CL

    Divide and Generate: Neural Generation of Complex Sentences

    Authors: Tomoya Ogata, Mamoru Komachi, Tomoya Takatani

    Abstract: We propose a task to generate a complex sentence from a simple sentence in order to amplify various kinds of responses in the database. We first divide a complex sentence into a main clause and a subordinate clause to learn a generator model of modifiers, and then use the model to generate a modifier clause to create a complex sentence from a simple sentence. We present an automatic evaluation met…

    Submitted 29 January, 2019; originally announced January 2019.

  28. arXiv:1809.10867  [pdf, other]

    cs.CL

    The Rule of Three: Abstractive Text Summarization in Three Bullet Points

    Authors: Tomonori Kodaira, Mamoru Komachi

    Abstract: Neural network-based approaches have become widespread for abstractive text summarization. Though previously proposed models for abstractive text summarization addressed the problem of repetition of the same contents in the summary, they did not explicitly consider its information structure. One of the reasons these previous models failed to account for information structure in the generated summa…

    Submitted 28 September, 2018; originally announced September 2018.

    Comments: 9 pages; PACLIC 2018

  29. arXiv:1809.02694  [pdf]

    cs.CL

    Neural Machine Translation of Logographic Languages Using Sub-character Level Information

    Authors: Longtu Zhang, Mamoru Komachi

    Abstract: Recent neural machine translation (NMT) systems have been greatly improved by encoder-decoder models with attention mechanisms and sub-word units. However, important differences between languages with logographic and alphabetic writing systems have long been overlooked. This study focuses on these differences and uses a simple approach to improve the performance of NMT systems utilizing decomposed…

    Submitted 7 September, 2018; originally announced September 2018.

    Comments: WMT 2018 (regular paper); 9 pages

  30. arXiv:1805.11189  [pdf, other]

    cs.CL

    Graph-based Filtering of Out-of-Vocabulary Words for Encoder-Decoder Models

    Authors: Satoru Katsumata, Yukio Matsumura, Hayahide Yamagishi, Mamoru Komachi

    Abstract: Encoder-decoder models typically only employ words that are frequently used in the training corpus to reduce the computational costs and exclude noise. However, this vocabulary set may still include words that interfere with learning in encoder-decoder models. This paper proposes a method for selecting more suitable words for learning encoders by utilizing not only frequency, but also co-occurrenc…

    Submitted 28 May, 2018; originally announced May 2018.

    Comments: 8 pages; 2018 ACL Student Research Workshop

  31. arXiv:1805.10047  [pdf, ps, other]

    cs.CL

    Japanese Predicate Conjugation for Neural Machine Translation

    Authors: Michiki Kurosawa, Yukio Matsumura, Hayahide Yamagishi, Mamoru Komachi

    Abstract: Neural machine translation (NMT) has a drawback in that it can generate only high-frequency words owing to the computational costs of the softmax function in the output layer. In Japanese-English NMT, Japanese predicate conjugation causes an increase in vocabulary size. For example, one verb can have as many as 19 surface varieties. In this research, we focus on predicate conjugation for compressin…

    Submitted 25 May, 2018; originally announced May 2018.

    Comments: 6 pages; NAACL 2018 Student Research Workshop

  32. arXiv:1805.07469  [pdf, ps, other]

    cs.CL

    Metric for Automatic Machine Translation Evaluation based on Universal Sentence Representations

    Authors: Hiroki Shimanaka, Tomoyuki Kajiwara, Mamoru Komachi

    Abstract: Sentence representations can capture a wide range of information that cannot be captured by local features based on character or word N-grams. This paper examines the usefulness of universal sentence representations for evaluating the quality of machine translation. Although it is difficult to train sentence representations using small-scale translation datasets with manual evaluation, sentence re…

    Submitted 18 May, 2018; originally announced May 2018.

    Comments: NAACL 2018 Student Research Workshop; 6 pages

  33. arXiv:1709.08011  [pdf, other]

    cs.CL

    Long Short-Term Memory for Japanese Word Segmentation

    Authors: Yoshiaki Kitagawa, Mamoru Komachi

    Abstract: This study presents a Long Short-Term Memory (LSTM) neural network approach to Japanese word segmentation (JWS). Previous studies on Chinese word segmentation (CWS) succeeded in using recurrent neural networks such as LSTM and gated recurrent units (GRU). However, in contrast to Chinese, Japanese includes several character types, such as hiragana, katakana, and kanji, that produce orthographic var…

    Submitted 26 September, 2018; v1 submitted 23 September, 2017; originally announced September 2017.

    Comments: 10 pages; PACLIC 2018

  34. arXiv:1706.08198  [pdf, other]

    cs.CL

    English-Japanese Neural Machine Translation with Encoder-Decoder-Reconstructor

    Authors: Yukio Matsumura, Takayuki Sato, Mamoru Komachi

    Abstract: Neural machine translation (NMT) has recently become popular in the field of machine translation. However, NMT suffers from the problem of repeating or missing words in the translation. To address this problem, Tu et al. (2017) proposed an encoder-decoder-reconstructor framework for NMT using back-translation. In this method, they selected the best forward translation model in the same manner as B…

    Submitted 25 June, 2017; originally announced June 2017.

    Comments: 8 pages

  35. arXiv:1704.00924  [pdf, other]

    cs.CL

    Japanese Sentiment Classification using a Tree-Structured Long Short-Term Memory with Attention

    Authors: Ryosuke Miyazaki, Mamoru Komachi

    Abstract: Previous approaches to training syntax-based sentiment classification models required phrase-level annotated corpora, which are not readily available in many languages other than English. Thus, we propose the use of tree-structured Long Short-Term Memory with an attention mechanism that pays attention to each subtree of the parse tree. Experimental results indicate that our model achieves the stat…

    Submitted 29 September, 2018; v1 submitted 4 April, 2017; originally announced April 2017.

    Comments: 10 pages; PACLIC 2018

  36. arXiv:1704.00380  [pdf, other]

    cs.CL

    Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings

    Authors: Junki Matsuo, Mamoru Komachi, Katsuhito Sudoh

    Abstract: One of the most important problems in machine translation (MT) evaluation is to evaluate the similarity between translation hypotheses with different surface forms from the reference, especially at the segment level. We propose to use word embeddings to perform word alignment for segment-level MT evaluation. We performed experiments with three types of alignment methods using word embeddings. We e…

    Submitted 2 April, 2017; originally announced April 2017.

    Comments: 5 pages

  37. arXiv:1703.05916  [pdf, other]

    cs.CL

    Construction of a Japanese Word Similarity Dataset

    Authors: Yuya Sakaizawa, Mamoru Komachi

    Abstract: An evaluation of distributed word representation is generally conducted using a word similarity task and/or a word analogy task. There are many datasets readily available for these tasks in English. However, evaluating distributed representation in languages that do not have such resources (e.g., Japanese) is difficult. Therefore, as a first step toward evaluating distributed representations in Ja…

    Submitted 22 February, 2018; v1 submitted 17 March, 2017; originally announced March 2017.

    Comments: LREC 2018; 4 pages

  38. arXiv:1703.04879  [pdf, other]

    cs.CL

    Sparse Named Entity Classification using Factorization Machines

    Authors: Ai Hirata, Mamoru Komachi

    Abstract: Named entity classification is the task of classifying text-based elements into various categories, including places, names, dates, times, and monetary values. A bottleneck in named entity classification, however, is the data problem of sparseness, because new named entities continually emerge, making it rather difficult to maintain a dictionary for named entity classification. Thus, in this paper…

    Submitted 14 March, 2017; originally announced March 2017.

    Comments: 4+1 pages