Showing 1–5 of 5 results for author: Sannigrahi, S

Searching in archive cs.
  1. arXiv:2406.06729

    cs.IR cs.AI cs.CL

    Synthetic Query Generation using Large Language Models for Virtual Assistants

    Authors: Sonal Sannigrahi, Thiago Fraga-Silva, Youssef Oualil, Christophe Van Gysel

    Abstract: Virtual Assistants (VAs) are important Information Retrieval platforms that help users accomplish various tasks through spoken commands. The speech recognition system (speech-to-text) uses query priors, trained solely on text, to distinguish between phonetically confusing alternatives. Hence, the generation of synthetic queries that are similar to existing VA usage can greatly improve upon the VA'…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: SIGIR '24. The 47th International ACM SIGIR Conference on Research & Development in Information Retrieval

  2. arXiv:2305.03207

    cs.CL cs.AI

    Investigating Lexical Sharing in Multilingual Machine Translation for Indian Languages

    Authors: Sonal Sannigrahi, Rachel Bawden

    Abstract: Multilingual language models have shown impressive cross-lingual transfer ability across a diverse set of languages and tasks. To improve the cross-lingual ability of these models, some strategies include transliteration and finer-grained segmentation into characters as opposed to subwords. In this work, we investigate lexical sharing in multilingual machine translation (MT) from Hindi, Gujarati,…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: EAMT main conference

  3. arXiv:2304.14796

    cs.CL cs.IR

    Are the Best Multilingual Document Embeddings simply Based on Sentence Embeddings?

    Authors: Sonal Sannigrahi, Josef van Genabith, Cristina Espana-Bonet

    Abstract: Dense vector representations for textual data are crucial in modern NLP. Word embeddings and sentence embeddings estimated from raw texts are key in achieving state-of-the-art results in various tasks requiring semantic understanding. However, obtaining embeddings at the document level is challenging due to computational requirements and lack of appropriate data. Instead, most approaches fall back…

    Submitted 28 April, 2023; originally announced April 2023.

    Comments: EACL 2023 Findings paper, to present at LoResMT

  4. arXiv:2203.14632

    cs.CL

    Isomorphic Cross-lingual Embeddings for Low-Resource Languages

    Authors: Sonal Sannigrahi, Jesse Read

    Abstract: Cross-Lingual Word Embeddings (CLWEs) are a key component to transfer linguistic information learnt from higher-resource settings into lower-resource ones. Recent research in cross-lingual representation learning has focused on offline mapping approaches due to their simplicity, computational efficacy, and ability to work with minimal parallel resources. However, they crucially depend on the assum…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: Accepted non-archival Repl4NLP, ACL 2022

  5. arXiv:2106.03694

    cs.CV cs.LG

    Detection of marine floating plastic using Sentinel-2 imagery and machine learning models

    Authors: Srikanta Sannigrahi, Bidroha Basu, Arunima Sarkar Basu, Francesco Pilla

    Abstract: The increasing level of marine plastic pollution poses severe threats to the marine ecosystem and biodiversity. The present study attempted to explore the full functionality of open Sentinel satellite data and ML models for detecting and classifying floating plastic debris in Mytilene (Greece), Limassol (Cyprus), Calabria (Italy), and Beirut (Lebanon). Two ML models, i.e. Support Vector Machine (S…

    Submitted 8 June, 2021; v1 submitted 27 May, 2021; originally announced June 2021.

    Comments: 30 pages