Showing 1–11 of 11 results for author: Sandhan, J

  1. arXiv:2310.09501  [pdf, other]

    cs.CL

    DepNeCTI: Dependency-based Nested Compound Type Identification for Sanskrit

    Authors: Jivnesh Sandhan, Yaswanth Narsupalli, Sreevatsa Muppirala, Sriram Krishnan, Pavankumar Satuluri, Amba Kulkarni, Pawan Goyal

    Abstract: Multi-component compounding is a prevalent phenomenon in Sanskrit, and understanding the implicit structure of a compound's components is crucial for deciphering its meaning. Earlier approaches in Sanskrit have focused on binary compounds and neglected the multi-component compound setting. This work introduces the novel task of nested compound type identification (NeCTI), which aims to identify ne…

    Submitted 14 October, 2023; originally announced October 2023.

    Comments: 9 Pages, Camera-ready version accepted at EMNLP23 (Findings)

  2. arXiv:2308.08807  [pdf, other]

    cs.CL

    Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit

    Authors: Jivnesh Sandhan

    Abstract: The primary focus of this thesis is to make Sanskrit manuscripts more accessible to the end-users through natural language technologies. The morphological richness, compounding, free word orderliness, and low-resource nature of Sanskrit pose significant challenges for developing deep learning solutions. We identify four fundamental tasks, which are crucial for developing a robust NLP technology fo…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Ph.D. dissertation

  3. arXiv:2308.07081  [pdf, other]

    cs.CL

    Aesthetics of Sanskrit Poetry from the Perspective of Computational Linguistics: A Case Study Analysis on Siksastaka

    Authors: Jivnesh Sandhan, Amruta Barbadikar, Malay Maity, Pavankumar Satuluri, Tushar Sandhan, Ravi M. Gupta, Pawan Goyal, Laxmidhar Behera

    Abstract: Sanskrit poetry has played a significant role in shaping the literary and cultural landscape of the Indian subcontinent for centuries. However, not much attention has been devoted to uncovering the hidden beauty of Sanskrit poetry in computational linguistics. This article explores the intersection of Sanskrit poetry and computational linguistics by proposing a roadmap of an interpretable framewor…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: 15 pages

  4. arXiv:2302.09527  [pdf, other]

    cs.CL

    SanskritShala: A Neural Sanskrit NLP Toolkit with Web-Based Interface for Pedagogical and Annotation Purposes

    Authors: Jivnesh Sandhan, Anshul Agarwal, Laxmidhar Behera, Tushar Sandhan, Pawan Goyal

    Abstract: We present a neural Sanskrit Natural Language Processing (NLP) toolkit named SanskritShala (a school of Sanskrit) to facilitate computational linguistic analyses for several tasks such as word segmentation, morphological tagging, dependency parsing, and compound type identification. Our systems currently report state-of-the-art performance on available benchmark datasets for all tasks. SanskritSha…

    Submitted 29 May, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

    Comments: 7 pages, Accepted at ACL23 (Demo track) to be held at Toronto, Canada

  5. arXiv:2210.11753  [pdf, other]

    cs.CL

    TransLIST: A Transformer-Based Linguistically Informed Sanskrit Tokenizer

    Authors: Jivnesh Sandhan, Rathin Singha, Narein Rao, Suvendu Samanta, Laxmidhar Behera, Pawan Goyal

    Abstract: Sanskrit Word Segmentation (SWS) is essential in making digitized texts available and in deploying downstream tasks. It is, however, non-trivial because of the sandhi phenomenon that modifies the characters at the word boundaries, and needs special treatment. Existing lexicon driven approaches for SWS make use of Sanskrit Heritage Reader, a lexicon-driven shallow parser, to generate the complete c…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: Accepted at EMNLP22 (Findings)

  6. arXiv:2208.10310  [pdf, other]

    cs.CL

    A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in Sanskrit

    Authors: Jivnesh Sandhan, Ashish Gupta, Hrishikesh Terdalkar, Tushar Sandhan, Suvendu Samanta, Laxmidhar Behera, Pawan Goyal

    Abstract: The phenomenon of compounding is ubiquitous in Sanskrit. It serves for achieving brevity in expressing thoughts, while simultaneously enriching the lexical and structural formation of the language. In this work, we focus on the Sanskrit Compound Type Identification (SaCTI) task, where we consider the problem of identifying semantic relations between the components of a compound word. Earlier appro…

    Submitted 11 September, 2022; v1 submitted 22 August, 2022; originally announced August 2022.

    Comments: The work is accepted at COLING22, Gyeongju, Republic of Korea

  7. arXiv:2201.11391  [pdf, other]

    cs.CL

    Prabhupadavani: A Code-mixed Speech Translation Data for 25 Languages

    Authors: Jivnesh Sandhan, Ayush Daksh, Om Adideva Paranjay, Laxmidhar Behera, Pawan Goyal

    Abstract: Nowadays, the interest in code-mixing has become ubiquitous in Natural Language Processing (NLP); however, not much attention has been given to address this phenomenon for Speech Translation (ST) task. This can be solely attributed to the lack of code-mixed ST task labelled data. Thus, we introduce Prabhupadavani, which is a multilingual code-mixed ST dataset for 25 languages. It is multi-domain,…

    Submitted 4 September, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

    Comments: The work is accepted at COLING22-SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

  8. arXiv:2201.11374  [pdf, other]

    cs.CL

    Systematic Investigation of Strategies Tailored for Low-Resource Settings for Low-Resource Dependency Parsing

    Authors: Jivnesh Sandhan, Laxmidhar Behera, Pawan Goyal

    Abstract: In this work, we focus on low-resource dependency parsing for multiple languages. Several strategies are tailored to enhance performance in low-resource scenarios. While these are well-known to the community, it is not trivial to select the best-performing combination of these strategies for a low-resource language that we are interested in, and not much attention has been given to measuring the e…

    Submitted 29 January, 2023; v1 submitted 27 January, 2022; originally announced January 2022.

    Comments: Accepted at EACL2023 to be held in Croatia Europe

  9. arXiv:2104.00270  [pdf, other]

    cs.CL

    Evaluating Neural Word Embeddings for Sanskrit

    Authors: Jivnesh Sandhan, Om Adideva, Digumarthi Komal, Laxmidhar Behera, Pawan Goyal

    Abstract: Recently, the supervised learning paradigm's surprisingly remarkable performance has garnered considerable attention from Sanskrit Computational Linguists. As a result, the Sanskrit community has put laudable efforts to build task-specific labeled data for various downstream Natural Language Processing (NLP) tasks. The primary component of these approaches comes from representations of word embedd…

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: 14 pages, The work is submitted at WSC 2022, Canberra, Australia

  10. arXiv:2102.06551  [pdf, other]

    cs.CL

    A Little Pretraining Goes a Long Way: A Case Study on Dependency Parsing Task for Low-resource Morphologically Rich Languages

    Authors: Jivnesh Sandhan, Amrith Krishna, Ashim Gupta, Laxmidhar Behera, Pawan Goyal

    Abstract: Neural dependency parsing has achieved remarkable performance for many domains and languages. The bottleneck of massive labeled data limits the effectiveness of these approaches for low resource languages. In this work, we focus on dependency parsing for morphological rich languages (MRLs) in a low-resource setting. Although morphological information is essential for the dependency parsing task, t…

    Submitted 12 April, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: 6 pages, The work is accepted at EACL-SRW, 2021, Kyiv, Ukraine. Typos corrected in Section 3.2

  11. arXiv:2004.08076  [pdf]

    cs.CL

    Neural Approaches for Data Driven Dependency Parsing in Sanskrit

    Authors: Amrith Krishna, Ashim Gupta, Deepak Garasangi, Jivnesh Sandhan, Pavankumar Satuluri, Pawan Goyal

    Abstract: Data-driven approaches for dependency parsing have been of great interest in Natural Language Processing for the past couple of decades. However, Sanskrit still lacks a robust purely data-driven dependency parser, probably with an exception to Krishna (2019). This can primarily be attributed to the lack of availability of task-specific labelled data and the morphologically rich nature of the langu…

    Submitted 17 April, 2020; originally announced April 2020.

    Comments: submitted to WSC 2021