Showing 1–6 of 6 results for author: M, G R

  1. arXiv:2301.10015

    cs.SD cs.AI eess.AS

    Deep Attention-Based Alignment Network for Melody Generation from Incomplete Lyrics

    Authors: Gurunath Reddy M, Zhe Zhang, Yi Yu, Florian Harscoet, Simon Canales, Suhua Tang

    Abstract: We propose a deep attention-based alignment network, which aims to automatically predict lyrics and melody with given incomplete lyrics as input in a way similar to the music creation of humans. Most importantly, a deep neural lyrics-to-melody net is trained in an encoder-decoder way to predict possible pairs of lyrics-melody when given incomplete lyrics (few keywords). The attention mechanism is…

    Submitted 22 January, 2023; originally announced January 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2011.06380

  2. arXiv:2202.01078

    cs.SD eess.AS

    Melody Extraction from Polyphonic Music by Deep Learning Approaches: A Review

    Authors: Gurunath Reddy M, K. Sreenivasa Rao, Partha Pratim Das

    Abstract: Melody extraction is a vital music information retrieval task among music researchers for its potential applications in education pedagogy and the music industry. Melody extraction is a notoriously challenging task due to the presence of background instruments. Also, often melodic source exhibits similar characteristics to that of the other instruments. The interfering background accompaniment wit…

    Submitted 2 February, 2022; originally announced February 2022.

    Comments: 72 pages

  3. arXiv:2011.04297

    cs.SD cs.CL cs.LG eess.AS

    Knowledge Distillation for Singing Voice Detection

    Authors: Soumava Paul, Gurunath Reddy M, K Sreenivasa Rao, Partha Pratim Das

    Abstract: Singing Voice Detection (SVD) has been an active area of research in music information retrieval (MIR). Currently, two deep neural network-based methods, one based on CNN and the other on RNN, exist in literature that learn optimized features for the voice detection (VD) task and achieve state-of-the-art performance on common datasets. Both these models have a huge number of parameters (1.4M for C…

    Submitted 19 August, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

    Comments: Accepted at INTERSPEECH 2021. 5 pages, 3 figures

  4. arXiv:2006.00782

    eess.AS cs.CL cs.SD

    Learning to Recognize Code-switched Speech Without Forgetting Monolingual Speech Recognition

    Authors: Sanket Shah, Basil Abraham, Gurunath Reddy M, Sunayana Sitaram, Vikas Joshi

    Abstract: Recently, there has been significant progress made in Automatic Speech Recognition (ASR) of code-switched speech, leading to gains in accuracy on code-switched datasets in many language pairs. Code-switched speech co-occurs with monolingual speech in one or both languages being mixed. In this work, we show that fine-tuning ASR models on code-switched speech harms performance on monolingual speech.…

    Submitted 1 June, 2020; originally announced June 2020.

    Comments: 5 pages (4 pages + 1 page references), 5 tables, 1 figure, 1 algorithm, 16 references

  5. arXiv:1904.09765

    cs.SD cs.LG eess.AS stat.ML

    hf0: A hybrid pitch extraction method for multimodal voice

    Authors: Pradeep Rengaswamy, Gurunath Reddy M, Krothapalli Sreenivasa Rao

    Abstract: Pitch or fundamental frequency (f0) extraction is a fundamental problem studied extensively for its potential applications in speech and clinical applications. In literature, explicit mode specific (modal speech or singing voice or emotional/ expressive speech or noisy speech) signal processing and deep learning f0 extraction methods that exploit the quasi periodic nature of the signal in time, ha…

    Submitted 22 April, 2019; originally announced April 2019.

    Comments: 5 pages, 5 figures. Keywords: pitch extraction, F0 extraction, harmonic signals, speech, monophonic songs, convolutional neural network

  6. arXiv:1811.09956

    cs.SD cs.LG eess.AS stat.ML

    Glottal Closure Instants Detection From Pathological Acoustic Speech Signal Using Deep Learning

    Authors: Gurunath Reddy M, Tanumay Mandal, Krothapalli Sreenivasa Rao

    Abstract: In this paper, we propose a classification based glottal closure instants (GCI) detection from pathological acoustic speech signal, which finds many applications in vocal disorder analysis. Till date, GCI for pathological disorder is extracted from laryngeal (glottal source) signal recorded from Electroglottograph, a dedicated device designed to measure the vocal folds vibration around the larynx.…

    Submitted 25 November, 2018; originally announced November 2018.

    Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 (workshop proceedings: arXiv:1811.07216)

    Report number: ML4H/2018/39