
Showing 1–2 of 2 results for author: Shimada, D

Searching in archive cs.
  1. arXiv:2403.16276

    cs.CV cs.AI

    Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding

    Authors: Yunlong Tang, Daiki Shimada, Jing Bi, Mingqian Feng, Hang Hua, Chenliang Xu

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in natural language and multimodal domains. By fine-tuning multimodal LLMs with temporal annotations from well-annotated datasets, e.g., dense video captioning datasets, their temporal understanding capacity in video-language tasks can be obtained. However, there is a notable lack of untrimmed audio-visual video datasets with p…

    Submitted 20 August, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  2. arXiv:2106.00180

    cs.CV cs.SD eess.AS

    Dual Normalization Multitasking for Audio-Visual Sounding Object Localization

    Authors: Tokuhiro Nishikawa, Daiki Shimada, Jerry Jun Yokono

    Abstract: Although several research works have been reported on audio-visual sound source localization in unconstrained videos, no datasets and metrics have been proposed in the literature to quantitatively evaluate its performance. Defining the ground truth for sound source localization is difficult, because the location where the sound is produced is not limited to the range of the source object, but the…

    Submitted 31 May, 2021; originally announced June 2021.

    Comments: 10 pages, 6 figures