
Showing 1–2 of 2 results for author: Tumrani, S

Searching in archive cs.
  1. arXiv:2408.15720  [pdf, other]

    cs.CL

    An Evaluation of Sindhi Word Embedding in Semantic Analogies and Downstream Tasks

    Authors: Wazir Ali, Saifullah Tumrani, Jay Kumar, Tariq Rahim Soomro

    Abstract: In this paper, we propose a new word embedding based corpus consisting of more than 61 million words crawled from multiple web resources. We design a preprocessing pipeline for the filtration of unwanted text from crawled data. Afterwards, the cleaned vocabulary is fed to state-of-the-art continuous-bag-of-words, skip-gram, and GloVe word embedding algorithms. For the evaluation of pretrained embe…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:1911.12579

  2. arXiv:2012.15079  [pdf, other]

    cs.CL cs.LG

    Enhancing Sindhi Word Segmentation using Subword Representation Learning and Position-aware Self-attention

    Authors: Wazir Ali, Jay Kumar, Saifullah Tumrani, Redhwan Nour, Adeeb Noor, Zenglin Xu

    Abstract: Sindhi word segmentation is a challenging task due to space omission and insertion issues. The Sindhi language itself adds to this complexity. It's cursive and consists of characters with inherent joining and non-joining properties, independent of word boundaries. Existing Sindhi word segmentation methods rely on designing and combining hand-crafted features. However, these methods have limitation…

    Submitted 4 September, 2024; v1 submitted 30 December, 2020; originally announced December 2020.

    Comments: Journal Paper, 14 pages