
Showing 1–1 of 1 results for author: J, K N

Searching in archive cs.
  1. arXiv:2407.06331  [pdf, other]

    cs.CL

    CharSS: Character-Level Transformer Model for Sanskrit Word Segmentation

    Authors: Krishnakant Bhatt, Karthika N J, Ganesh Ramakrishnan, Preethi Jyothi

    Abstract: Subword tokens in Indian languages inherently carry meaning, and isolating them can enhance NLP tasks, making sub-word segmentation a crucial process. Segmenting Sanskrit and other Indian languages into subtokens is not straightforward, as it may include sandhi, which may lead to changes in the word boundaries. We propose a new approach of utilizing a Character-level Transformer model for Sanskrit…

    Submitted 8 July, 2024; originally announced July 2024.