Showing 1–16 of 16 results for author: Kurita, S

  1. arXiv:2407.03963

    cs.CL cs.AI

    LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

    Authors: LLM-jp: Akiko Aizawa, Eiji Aramaki, Bowen Chen, Fei Cheng, Hiroyuki Deguchi, Rintaro Enomoto, Kazuki Fujii, Kensuke Fukumoto, Takuya Fukushima, Namgi Han, Yuto Harada, Chikara Hashimoto, Tatsuya Hiraoka, Shohei Hisada, Sosuke Hosokawa, Lu Jie, Keisuke Kamata, Teruhito Kanazawa, Hiroki Kanezashi, Hiroshi Kataoka, Satoru Katsumata, Daisuke Kawahara, Seiya Kawano, et al. (57 additional authors not shown)

    Abstract: This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its…

    Submitted 4 July, 2024; originally announced July 2024.

  2. arXiv:2406.14240

    cs.CV cs.AI

    CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information

    Authors: Jungdae Lee, Taiki Miyanishi, Shuhei Kurita, Koya Sakamoto, Daichi Azuma, Yutaka Matsuo, Nakamasa Inoue

    Abstract: Vision-and-language navigation (VLN) aims to guide autonomous agents through real-world environments by integrating visual and linguistic cues. While substantial progress has been made in understanding these interactive modalities in ground-level navigation, aerial navigation remains largely underexplored. This is primarily due to the scarcity of resources suitable for real-world, city-scale aeria…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: The first two authors are equally contributed

  3. arXiv:2405.16559

    cs.RO cs.CV

    Map-based Modular Approach for Zero-shot Embodied Question Answering

    Authors: Koya Sakamoto, Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Motoaki Kawanabe

    Abstract: Building robots capable of interacting with humans through natural language in the visual world presents a significant challenge in the field of robotics. To overcome this challenge, Embodied Question Answering (EQA) has been proposed as a benchmark task to measure the ability to identify an object navigating through a previously unseen environment in response to human-posed questions. Although so…

    Submitted 26 May, 2024; originally announced May 2024.

  4. arXiv:2404.02523

    cs.CV cs.AI

    Text-driven Affordance Learning from Egocentric Vision

    Authors: Tomoya Yoshida, Shuhei Kurita, Taichi Nishimura, Shinsuke Mori

    Abstract: Visual affordance learning is a key component for robots to understand how to interact with objects. Conventional approaches in this field rely on pre-defined objects and actions, falling short of capturing diverse interactions in realworld scenarios. The key idea of our approach is employing textual instruction, targeting various affordances for a wide range of objects. This approach covers both…

    Submitted 3 April, 2024; originally announced April 2024.

  5. arXiv:2403.19454

    cs.CL

    JDocQA: Japanese Document Question Answering Dataset for Generative Language Models

    Authors: Eri Onami, Shuhei Kurita, Taiki Miyanishi, Taro Watanabe

    Abstract: Document question answering is a task of question answering on given documents such as reports, slides, pamphlets, and websites, and it is a truly demanding task as paper and electronic forms of documents are so common in our society. This is known as a quite challenging task because it requires not only text understanding but also understanding of figures and tables, and hence visual question ans…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: LREC-COLING2024

  6. arXiv:2402.17969

    cs.CV cs.AI

    Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction

    Authors: Koki Maeda, Shuhei Kurita, Taiki Miyanishi, Naoaki Okazaki

    Abstract: Given the accelerating progress of vision and language modeling, accurate evaluation of machine-generated image captions remains critical. In order to evaluate captions more closely to human preferences, metrics need to discriminate between captions of varying quality and content. However, conventional metrics fail short of comparing beyond superficial matches of words or embedding similarities; t…

    Submitted 27 February, 2024; originally announced February 2024.

  7. arXiv:2401.09759

    cs.CV cs.MM cs.SD eess.AS

    SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition

    Authors: Hao Wang, Shuhei Kurita, Shuichiro Shimizu, Daisuke Kawahara

    Abstract: Audio-visual speech recognition (AVSR) is a multimodal extension of automatic speech recognition (ASR), using video as a complement to audio. In AVSR, considerable efforts have been directed at datasets for facial features such as lip-readings, while they often fall short in evaluating the image comprehension capabilities in broader contexts. In this paper, we construct SlideAVSR, an AVSR dataset…

    Submitted 2 July, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: 3rd Workshop on Advances in Language and Vision Research (ALVR 2024)

  8. arXiv:2310.18773

    cs.CV

    CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale Point Cloud Data

    Authors: Taiki Miyanishi, Fumiya Kitamori, Shuhei Kurita, Jungdae Lee, Motoaki Kawanabe, Nakamasa Inoue

    Abstract: City-scale 3D point cloud is a promising way to express detailed and complicated outdoor structures. It encompasses both the appearance and geometry features of segmented city components, including cars, streets, and buildings, that can be utilized for attractive applications such as user-interactive navigation of autonomous vehicles and drones. However, compared to the extensive text annotations…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: NeurIPS D&B 2023. The first two authors are equally contributed

  9. arXiv:2309.10430

    cs.CV

    Predicate Classification Using Optimal Transport Loss in Scene Graph Generation

    Authors: Sorachi Kurita, Satoshi Oyama, Itsuki Noda

    Abstract: In scene graph generation (SGG), learning with cross-entropy loss yields biased predictions owing to the severe imbalance in the distribution of the relationship labels in the dataset. Thus, this study proposes a method to generate scene graphs using optimal transport as a measure for comparing two probability distributions. We apply learning with the optimal transport loss, which reflects the sim…

    Submitted 19 September, 2023; originally announced September 2023.

  10. arXiv:2308.12035

    cs.CV

    RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D

    Authors: Shuhei Kurita, Naoki Katsura, Eri Onami

    Abstract: Grounding textual expressions on scene objects from first-person views is a truly demanding capability in developing agents that are aware of their surroundings and behave following intuitive text instructions. Such capability is of necessity for glass-devices or autonomous robots to localize referred objects in the real-world. In the conventional referring expression comprehension tasks of images…

    Submitted 30 October, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: 15 pages, 11 figures. ICCV2023. Codes are available at https://github.com/shuheikurita/RefEgo

  11. arXiv:2305.13876

    cs.CV

    Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans

    Authors: Taiki Miyanishi, Daichi Azuma, Shuhei Kurita, Motoaki Kawanabe

    Abstract: We present a novel task for cross-dataset visual grounding in 3D scenes (Cross3DVG), which overcomes limitations of existing 3D visual grounding models, specifically their restricted 3D resources and consequent tendencies of overfitting a specific 3D dataset. We created RIORefer, a large-scale 3D visual grounding dataset, to facilitate Cross3DVG. It includes more than 63k diverse descriptions of 3…

    Submitted 7 February, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 3DV 2024

  12. arXiv:2209.05840

    cs.CL cs.AI

    Visual Recipe Flow: A Dataset for Learning Visual State Changes of Objects with Recipe Flows

    Authors: Keisuke Shirai, Atsushi Hashimoto, Taichi Nishimura, Hirotaka Kameko, Shuhei Kurita, Yoshitaka Ushiku, Shinsuke Mori

    Abstract: We present a new multimodal dataset called Visual Recipe Flow, which enables us to learn each cooking action result in a recipe text. The dataset consists of object state changes and the workflow of the recipe text. The state change is represented as an image pair, while the workflow is represented as a recipe flow graph (r-FG). The image pairs are grounded in the r-FG, which provides the cross-mo…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: COLING 2022

  13. arXiv:2112.10482

    cs.CV

    ScanQA: 3D Question Answering for Spatial Scene Understanding

    Authors: Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Motoaki Kawanabe

    Abstract: We propose a new 3D spatial understanding task of 3D Question Answering (3D-QA). In the 3D-QA task, models receive visual information from the entire 3D scene of the rich RGB-D indoor scan and answer the given textual questions about the 3D scene. Unlike the 2D-question answering of VQA, the conventional 2D-QA models suffer from problems with spatial understanding of object alignment and direction…

    Submitted 7 May, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: CVPR2022. The first three authors are equally contributed. Project page: https://github.com/ATR-DBI/ScanQA

  14. arXiv:2009.07783

    cs.CL

    Generative Language-Grounded Policy in Vision-and-Language Navigation with Bayes' Rule

    Authors: Shuhei Kurita, Kyunghyun Cho

    Abstract: Vision-and-language navigation (VLN) is a task in which an agent is embodied in a realistic 3D environment and follows an instruction to reach the goal node. While most of the previous studies have built and investigated a discriminative approach, we notice that there are in fact two possible approaches to building such a VLN agent: discriminative and generative. In this paper, we design…

    Submitted 8 October, 2020; v1 submitted 16 September, 2020; originally announced September 2020.

    Comments: 13 pages, 8 figures

  15. arXiv:1906.01239

    cs.CL

    Multi-Task Semantic Dependency Parsing with Policy Gradient for Learning Easy-First Strategies

    Authors: Shuhei Kurita, Anders Søgaard

    Abstract: In Semantic Dependency Parsing (SDP), semantic relations form directed acyclic graphs, rather than trees. We propose a new iterative predicate selection (IPS) algorithm for SDP. Our IPS algorithm combines the graph-based and transition-based parsing approaches in order to handle multiple semantic head words. We train the IPS model using a combination of multi-task learning and task-specific policy…

    Submitted 4 June, 2019; originally announced June 2019.

    Comments: ACL2019 Long accepted. 9 pages for the paper and the additional 2 pages for the supplemental material

  16. arXiv:1806.00971

    cs.CL

    Neural Adversarial Training for Semi-supervised Japanese Predicate-argument Structure Analysis

    Authors: Shuhei Kurita, Daisuke Kawahara, Sadao Kurohashi

    Abstract: Japanese predicate-argument structure (PAS) analysis involves zero anaphora resolution, which is notoriously difficult. To improve the performance of Japanese PAS analysis, it is straightforward to increase the size of corpora annotated with PAS. However, since it is prohibitively expensive, it is promising to take advantage of a large amount of raw corpora. In this paper, we propose a novel Japan…

    Submitted 4 June, 2018; v1 submitted 4 June, 2018; originally announced June 2018.

    Comments: Accepted by ACL-2018. 9 pages, 3 figures