Showing 1–7 of 7 results for author: Loem, M

  1. arXiv:2404.17790  [pdf, other]

    cs.CL cs.AI

    Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities

    Authors: Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Hiroki Iida, Masanari Ohi, Kakeru Hattori, Hirai Shota, Sakae Mizuki, Rio Yokota, Naoaki Okazaki

    Abstract: Cross-lingual continual pre-training of large language models (LLMs) initially trained on English corpus allows us to leverage the vast amount of English language resources and reduce the pre-training cost. In this study, we constructed Swallow, an LLM with enhanced Japanese capability, by extending the vocabulary of Llama 2 to include Japanese characters and conducting continual pre-training on a…

    Submitted 27 April, 2024; originally announced April 2024.

  2. arXiv:2404.17733  [pdf, other]

    cs.CL cs.AI

    Building a Large Japanese Web Corpus for Large Language Models

    Authors: Naoaki Okazaki, Kakeru Hattori, Hirai Shota, Hiroki Iida, Masanari Ohi, Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Rio Yokota, Sakae Mizuki

    Abstract: Open Japanese large language models (LLMs) have been trained on the Japanese portions of corpora such as CC-100, mC4, and OSCAR. However, these corpora were not created for the quality of Japanese texts. This study builds a large Japanese web corpus by extracting and refining text from the Common Crawl archive (21 snapshots of approximately 63.4 billion pages crawled between 2020 and 2023). This c…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 17 pages

  3. arXiv:2402.15987  [pdf, other]

    cs.CL cs.AI

    Likelihood-based Mitigation of Evaluation Bias in Large Language Models

    Authors: Masanari Ohi, Masahiro Kaneko, Ryuto Koike, Mengsay Loem, Naoaki Okazaki

    Abstract: Large Language Models (LLMs) are widely used to evaluate natural language generation tasks as automated metrics. However, the likelihood, a measure of LLM's plausibility for a sentence, can vary due to superficial differences in sentences, such as word order and sentence structure. It is therefore possible that there might be a likelihood bias if LLMs are used for evaluation: they might overrate s…

    Submitted 1 March, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

    Comments: 4 main pages

  4. arXiv:2311.08107  [pdf, other]

    cs.CL

    SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training with Adversarial Remarks

    Authors: Mengsay Loem, Masahiro Kaneko, Naoaki Okazaki

    Abstract: Large Language Models (LLMs) can justify or critique their predictions through discussions with other models or humans, thereby enriching their intrinsic understanding of instances. While proactive discussions in the inference phase have been shown to boost performance, such interactions have not been extensively explored during the training phase. We hypothesize that incorporating interactive dis…

    Submitted 29 February, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  5. arXiv:2305.18156  [pdf, other]

    cs.CL cs.AI

    Exploring Effectiveness of GPT-3 in Grammatical Error Correction: A Study on Performance and Controllability in Prompt-Based Methods

    Authors: Mengsay Loem, Masahiro Kaneko, Sho Takase, Naoaki Okazaki

    Abstract: Large-scale pre-trained language models such as GPT-3 have shown remarkable performance across various natural language processing tasks. However, applying prompt-based methods with GPT-3 for Grammatical Error Correction (GEC) tasks and their controllability remains underexplored. Controllability in GEC is crucial for real-world applications, particularly in educational settings, where the ability…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: Accepted in BEA 2023

  6. arXiv:2207.13354  [pdf, other]

    cs.CL

    Are Neighbors Enough? Multi-Head Neural n-gram can be Alternative to Self-attention

    Authors: Mengsay Loem, Sho Takase, Masahiro Kaneko, Naoaki Okazaki

    Abstract: Impressive performance of Transformer has been attributed to self-attention, where dependencies between entire input in a sequence are considered at every position. In this work, we reform the neural $n$-gram model, which focuses on only several surrounding representations of each position, with the multi-head mechanism as in Vaswani et al.(2017). Through experiments on sequence-to-sequence tasks,…

    Submitted 27 July, 2022; originally announced July 2022.

  7. arXiv:2201.05313  [pdf, other]

    cs.CL

    ExtraPhrase: Efficient Data Augmentation for Abstractive Summarization

    Authors: Mengsay Loem, Sho Takase, Masahiro Kaneko, Naoaki Okazaki

    Abstract: Neural models trained with large amount of parallel data have achieved impressive performance in abstractive summarization tasks. However, large-scale parallel corpora are expensive and challenging to construct. In this work, we introduce a low-cost and effective strategy, ExtraPhrase, to augment training data for abstractive summarization tasks. ExtraPhrase constructs pseudo training data in two…

    Submitted 14 January, 2022; originally announced January 2022.