Showing 1–3 of 3 results for author: Sawatphol, J

Searching in archive cs.
  1. arXiv:2407.19164  [pdf, other]

    cs.CL

    Addressing Topic Leakage in Cross-Topic Evaluation for Authorship Verification

    Authors: Jitkapat Sawatphol, Can Udomcharoenchaikit, Sarana Nutanong

    Abstract: Authorship verification (AV) aims to identify whether a pair of texts has the same author. We address the challenge of evaluating AV models' robustness against topic shifts. The conventional evaluation assumes minimal topic overlap between training and test data. However, we argue that there can still be topic leakage in test data, causing misleading model performance and unstable rankings. To add…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: Accepted for publication in Transactions of the Association for Computational Linguistics

  2. arXiv:2406.06000  [pdf]

    cs.CL

    ThaiCoref: Thai Coreference Resolution Dataset

    Authors: Pontakorn Trakuekul, Wei Qi Leong, Charin Polpanumas, Jitkapat Sawatphol, William Chandra Tjhi, Attapol T. Rutherford

    Abstract: While coreference resolution is a well-established research area in Natural Language Processing (NLP), research focusing on Thai language remains limited due to the lack of large annotated corpora. In this work, we introduce ThaiCoref, a dataset for Thai coreference resolution. Our dataset comprises 777,271 tokens, 44,082 mentions and 10,429 entities across four text genres: university essays, new…

    Submitted 9 June, 2024; originally announced June 2024.

  3. arXiv:2403.16127  [pdf, other]

    cs.CL cs.AI

    WangchanLion and WangchanX MRC Eval

    Authors: Wannaphong Phatthiyaphaibun, Surapon Nonesung, Patomporn Payoungkhamdee, Peerat Limkonchotiwat, Can Udomcharoenchaikit, Jitkapat Sawatphol, Chompakorn Chaksangchaichot, Ekapol Chuangsuwanich, Sarana Nutanong

    Abstract: This technical report describes the development of WangchanLion, an instruction fine-tuned model focusing on Machine Reading Comprehension (MRC) in the Thai language. Our model is based on SEA-LION and a collection of instruction following datasets. To promote open research and reproducibility, we publicly release all training data, code, and the final model weights under the Apache-2 license. To…

    Submitted 23 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.