
Showing 1–7 of 7 results for author: Murawaki, Y

Searching in archive cs.
  1. arXiv:2407.03963

    cs.CL cs.AI

    LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

    Authors: LLM-jp: Akiko Aizawa, Eiji Aramaki, Bowen Chen, Fei Cheng, Hiroyuki Deguchi, Rintaro Enomoto, Kazuki Fujii, Kensuke Fukumoto, Takuya Fukushima, Namgi Han, Yuto Harada, Chikara Hashimoto, Tatsuya Hiraoka, Shohei Hisada, Sosuke Hosokawa, Lu Jie, Keisuke Kamata, Teruhito Kanazawa, Hiroki Kanezashi, Hiroshi Kataoka, Satoru Katsumata, Daisuke Kawahara, Seiya Kawano, et al. (57 additional authors not shown)

    Abstract: This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its…

    Submitted 4 July, 2024; originally announced July 2024.

  2. arXiv:2402.18877

    cs.CL

    Principal Component Analysis as a Sanity Check for Bayesian Phylolinguistic Reconstruction

    Authors: Yugo Murawaki

    Abstract: Bayesian approaches to reconstructing the evolutionary history of languages rely on the tree model, which assumes that these languages descended from a common ancestor and underwent modifications over time. However, this assumption can be violated to different extents due to contact and other factors. Understanding the degree to which this assumption is violated is crucial for validating the accur…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted at LREC-COLING 2024

  3. arXiv:2211.06662

    cs.CL

    Addressing Segmentation Ambiguity in Neural Linguistic Steganography

    Authors: Jumon Nozaki, Yugo Murawaki

    Abstract: Previous studies on neural linguistic steganography, except Ueoka et al. (2021), overlook the fact that the sender must detokenize cover texts to avoid arousing the eavesdropper's suspicion. In this paper, we demonstrate that segmentation ambiguity indeed causes occasional decoding failures at the receiver's side. With the near-ubiquity of subwords, this problem now affects any language. We propos…

    Submitted 12 November, 2022; originally announced November 2022.

    Comments: Accepted at AACL-IJCNLP2022

  4. arXiv:2104.09833

    cs.CL

    Frustratingly Easy Edit-based Linguistic Steganography with a Masked Language Model

    Authors: Honai Ueoka, Yugo Murawaki, Sadao Kurohashi

    Abstract: With advances in neural language models, the focus of linguistic steganography has shifted from edit-based approaches to generation-based ones. While the latter's payload capacity is impressive, generating genuine-looking texts remains challenging. In this paper, we revisit edit-based linguistic steganography, with the idea that a masked language model offers an off-the-shelf solution. The propose…

    Submitted 20 April, 2021; originally announced April 2021.

    Comments: 7 pages, 4 figures

  5. arXiv:2008.01523

    cs.CL

    A System for Worldwide COVID-19 Information Aggregation

    Authors: Akiko Aizawa, Frederic Bergeron, Junjie Chen, Fei Cheng, Katsuhiko Hayashi, Kentaro Inui, Hiroyoshi Ito, Daisuke Kawahara, Masaru Kitsuregawa, Hirokazu Kiyomaru, Masaki Kobayashi, Takashi Kodama, Sadao Kurohashi, Qianying Liu, Masaki Matsubara, Yusuke Miyao, Atsuyuki Morishima, Yugo Murawaki, Kazumasa Omura, Haiyue Song, Eiichiro Sumita, Shinji Suzuki, Ribeka Tanaka, Yu Tanaka, Masashi Toyoda , et al. (4 additional authors not shown)

    Abstract: The global pandemic of COVID-19 has made the public pay close attention to related news, covering various domains, such as sanitation, treatment, and effects on education. Meanwhile, the COVID-19 condition is very different among the countries (e.g., policies and development of the epidemic), and thus citizens would be interested in news in foreign countries. We build a system for worldwide COVID-…

    Submitted 11 October, 2020; v1 submitted 27 July, 2020; originally announced August 2020.

    Comments: Accepted to EMNLP 2020 Workshop NLP-COVID

  6. arXiv:1909.00694

    cs.CL

    Minimally Supervised Learning of Affective Events Using Discourse Relations

    Authors: Jun Saito, Yugo Murawaki, Sadao Kurohashi

    Abstract: Recognizing affective events that trigger positive or negative sentiment has a wide range of natural language processing applications but remains a challenging problem mainly because the polarity of an event is not necessarily predictable from its constituent words. In this paper, we propose to propagate affective polarity using discourse relations. Our method is simple and only requires a very sm…

    Submitted 28 December, 2019; v1 submitted 2 September, 2019; originally announced September 2019.

    Comments: 8 pages, 1 figure. EMNLP2019 (short paper)

  7. arXiv:1906.09719

    cs.CL

    On the Definition of Japanese Word

    Authors: Yugo Murawaki

    Abstract: The annotation guidelines for Universal Dependencies (UD) stipulate that the basic units of dependency annotation are syntactic words, but it is not clear what syntactic words are in Japanese. Departing from the long tradition of using phrasal units called bunsetsu for dependency parsing, the current UD Japanese treebanks adopt the Short Unit Words. However, we argue that they are not syntactic wo…

    Submitted 24 June, 2019; originally announced June 2019.