Showing 1–5 of 5 results for author: Dorkin, A

Searching in archive cs.
  1. arXiv:2407.03861

    cs.CL

    TartuNLP @ AXOLOTL-24: Leveraging Classifier Output for New Sense Detection in Lexical Semantics

    Authors: Aleksei Dorkin, Kairit Sirts

    Abstract: We present our submission to the AXOLOTL-24 shared task. The shared task comprises two subtasks: identifying new senses that words gain with time (when comparing newer and older time periods) and producing the definitions for the identified new senses. We implemented a conceptually simple and computationally inexpensive solution to both subtasks. We trained adapter-based binary classification mode…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to the 5th International Workshop on Computational Approaches to Historical Language Change 2024 (LChange'24)

  2. arXiv:2405.01159

    cs.CL

    TartuNLP at EvaLatin 2024: Emotion Polarity Detection

    Authors: Aleksei Dorkin, Kairit Sirts

    Abstract: This paper presents the TartuNLP team's submission to the EvaLatin 2024 shared task on emotion polarity detection for historical Latin texts. Our system relies on two distinct approaches to annotating training data for supervised learning: 1) creating heuristics-based labels by adopting the polarity lexicon provided by the organizers and 2) generating labels with GPT4. We employed parameter efficien…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted to The Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA 2024)

  3. arXiv:2404.19430

    cs.CL

    Sõnajaht: Definition Embeddings and Semantic Search for Reverse Dictionary Creation

    Authors: Aleksei Dorkin, Kairit Sirts

    Abstract: We present an information-retrieval-based reverse dictionary system using modern pre-trained language models and approximate nearest neighbor search algorithms. The proposed approach is applied to an existing Estonian language lexicon resource, Sõnaveeb (word web), with the purpose of enhancing and enriching it by introducing cross-lingual reverse dictionary functionality powered by semantic sear…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted to *SEM 2024

  4. arXiv:2404.15003

    cs.CL

    Comparison of Current Approaches to Lemmatization: A Case Study in Estonian

    Authors: Aleksei Dorkin, Kairit Sirts

    Abstract: This study evaluates three different approaches to lemmatization in Estonian: Generative character-level models, Pattern-based word-level classification models, and rule-based morphological analysis. According to our experiments, a significantly smaller Generative model consistently outperforms the Pattern-based classification model based on EstBERT. Additionally, we observe a relatively small over…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 6 pages, 2 figures

    Journal ref: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pp. 280-285, May 2023

  5. arXiv:2404.12845

    cs.CL

    TartuNLP @ SIGTYP 2024 Shared Task: Adapting XLM-RoBERTa for Ancient and Historical Languages

    Authors: Aleksei Dorkin, Kairit Sirts

    Abstract: We present our submission to the unconstrained subtask of the SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages, which covers morphological annotation, POS-tagging, lemmatization, and character- and word-level gap-filling. We developed a simple, uniform, and computationally lightweight approach based on the adapters framework using parameter-efficient fine-tuning. We appl…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 11 pages, 3 figures

    Journal ref: Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pp. 120-130, March 2024