Showing 1–4 of 4 results for author: Eren, A Ö

Searching in archive cs.
  1. arXiv:2204.08567

    cs.SD eess.AS

    Automated Audio Captioning using Audio Event Clues

    Authors: Ayşegül Özkaya Eren, Mustafa Sert

    Abstract: Audio captioning is an important research area that aims to generate meaningful descriptions for audio clips. Most of the existing research extracts acoustic features of audio clips as input to encoder-decoder and transformer architectures to produce the captions in a sequence-to-sequence manner. Due to data insufficiency and the architecture's inadequate learning capacity, additional information…

    Submitted 18 April, 2022; originally announced April 2022.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  2. arXiv:2110.01210

    cs.SD eess.AS

    Audio Captioning Using Sound Event Detection

    Authors: Ayşegül Özkaya Eren, Mustafa Sert

    Abstract: This technical report proposes an audio captioning system for DCASE 2021 Task 6 audio captioning challenge. Our proposed model is based on an encoder-decoder architecture with bi-directional Gated Recurrent Units (BiGRU) using pretrained audio features and sound event detection. A pretrained neural network (PANN) is used to extract audio features and Word2Vec is selected with the aim of extracting…

    Submitted 7 October, 2021; v1 submitted 4 October, 2021; originally announced October 2021.

    Comments: Submitted to DCASE 2021 Challenge

  3. arXiv:2105.06355

    cs.SD cs.LG eess.AS

    Audio Captioning with Composition of Acoustic and Semantic Information

    Authors: Ayşegül Özkaya Eren, Mustafa Sert

    Abstract: Generating audio captions is a new research area that combines audio and natural language processing to create meaningful textual descriptions for audio clips. To address this problem, previous studies mostly use the encoder-decoder based models without considering semantic information. To fill this gap, we present a novel encoder-decoder architecture using bi-directional Gated Recurrent Units (Bi…

    Submitted 13 May, 2021; originally announced May 2021.

    Comments: Accepted for publication in International Journal of Semantic Computing. arXiv admin note: substantial text overlap with arXiv:2006.03391

  4. arXiv:2006.03391

    cs.SD cs.LG eess.AS

    Audio Captioning using Gated Recurrent Units

    Authors: Ayşegül Özkaya Eren, Mustafa Sert

    Abstract: Audio captioning is a recently proposed task for automatically generating a textual description of a given audio clip. In this study, a novel deep network architecture with audio embeddings is presented to predict audio captions. Within the aim of extracting audio features in addition to log Mel energies, VGGish audio embedding model is used to explore the usability of audio embeddings in the audi…

    Submitted 3 January, 2021; v1 submitted 5 June, 2020; originally announced June 2020.