Showing 1–29 of 29 results for author: Yeo, H

Searching in archive cs.
  1. MicroCam: Leveraging Smartphone Microscope Camera for Context-Aware Contact Surface Sensing

    Authors: Yongquan Hu, Hui-Shyong Yeo, Mingyue Yuan, Haoran Fan, Don Samitha Elvitigala, Wen Hu, Aaron Quigley

    Abstract: The primary focus of this research is the discreet and subtle everyday contact interactions between mobile phones and their surrounding surfaces. Such interactions are anticipated to facilitate mobile context awareness, encompassing aspects such as dispensing medication updates, intelligently switching modes (e.g., silent mode), or initiating commands (e.g., deactivating an alarm). We introduce Mi…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 28 pages

    Journal ref: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7.3 (2023): 1-28

  2. arXiv:2406.07867

    cs.CV cs.AI cs.HC

    Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

    Authors: Se Jin Park, Chae Won Kim, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeong Hun Yeo, Yong Man Ro

    Abstract: In this paper, we introduce a novel Face-to-Face spoken dialogue model. It processes audio-visual speech from user input and generates audio-visual speech as the response, marking the initial step towards creating an avatar chatbot system without relying on intermediate text. To this end, we newly introduce MultiDialog, the first large-scale multimodal (i.e., audio and visual) spoken dialogue corp…

    Submitted 2 August, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 (Oral)

  3. arXiv:2404.15635

    cs.CV cs.LG

    A Real-time Evaluation Framework for Pedestrian's Potential Risk at Non-Signalized Intersections Based on Predicted Post-Encroachment Time

    Authors: Tengfeng Lin, Zhixiong Jin, Seongjin Choi, Hwasoo Yeo

    Abstract: Addressing pedestrian safety at intersections is one of the paramount concerns in the field of transportation research, driven by the urgency of reducing traffic-related injuries and fatalities. With advances in computer vision technologies and predictive models, the pursuit of developing real-time proactive protection systems is increasingly recognized as vital to improving pedestrian safety at i…

    Submitted 24 April, 2024; originally announced April 2024.

  4. arXiv:2402.15151

    cs.CV cs.CL eess.AS eess.IV

    Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

    Authors: Jeong Hun Yeo, Seunghee Han, Minsu Kim, Yong Man Ro

    Abstract: In visual speech processing, context modeling capability is one of the most important requirements due to the ambiguous nature of lip movements. For example, homophenes, words that share identical lip movements but produce different sounds, can be distinguished by considering the context. In this paper, we propose a novel framework, namely Visual Speech Processing incorporated with LLMs (VSP-LLM),…

    Submitted 13 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: An Erratum was added on the last page of this paper

  5. arXiv:2401.10314

    cs.SE cs.AI cs.LG cs.RO

    LangProp: A code optimization framework using Large Language Models applied to driving

    Authors: Shu Ishida, Gianluca Corrado, George Fedoseev, Hudson Yeo, Lloyd Russell, Jamie Shotton, João F. Henriques, Anthony Hu

    Abstract: We propose LangProp, a framework for iteratively optimizing code generated by large language models (LLMs), in both supervised and reinforcement learning settings. While LLMs can generate sensible coding solutions zero-shot, they are often sub-optimal. Especially for code generation tasks, it is likely that the initial code will fail on certain edge cases. LangProp automatically evaluates the code…

    Submitted 3 May, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  6. arXiv:2401.09802

    eess.AS cs.CV cs.SD

    Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation

    Authors: Minsu Kim, Jeong Hun Yeo, Se Jin Park, Hyeongseop Rha, Yong Man Ro

    Abstract: This paper explores sentence-level multilingual Visual Speech Recognition (VSR) that can recognize different languages with a single trained model. As the massive multilingual modeling of visual data requires huge computational costs, we propose a novel training strategy, processing with visual speech units. Motivated by the recent success of the audio speech unit, we propose to use a visual speec…

    Submitted 18 July, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: ACMMM 2024

  7. arXiv:2309.17080

    cs.CV cs.AI cs.RO

    GAIA-1: A Generative World Model for Autonomous Driving

    Authors: Anthony Hu, Lloyd Russell, Hudson Yeo, Zak Murez, George Fedoseev, Alex Kendall, Jamie Shotton, Gianluca Corrado

    Abstract: Autonomous driving promises transformative improvements to transportation, but building systems capable of safely navigating the unstructured complexity of real-world scenarios remains challenging. A critical problem lies in effectively predicting the various potential outcomes that may emerge in response to the vehicle's actions as the world evolves. To address this challenge, we introduce GAIA…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: Technical Report

  8. arXiv:2309.08535

    cs.CV cs.AI eess.AS

    Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper

    Authors: Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Yong Man Ro

    Abstract: This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple languages, especially for low-resource languages that have a limited number of labeled data. Different from previous methods that tried to improve the VSR performance for the target language by using knowledge learned from other languages, we explore whether we can increase the amount of training data itself for the…

    Submitted 12 January, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted at ICASSP 2024

  9. arXiv:2309.08531

    cs.CV cs.CL eess.AS eess.IV

    Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens

    Authors: Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro

    Abstract: In this paper, we propose methods to build a powerful and efficient Image-to-Speech captioning (Im2Sp) model. To this end, we start with importing the rich knowledge related to image comprehension and language modeling from a large-scale pre-trained vision-language model into Im2Sp. We set the output of the proposed Im2Sp as discretized speech units, i.e., the quantized speech features of a self-s…

    Submitted 15 September, 2023; originally announced September 2023.

  10. arXiv:2308.09311

    cs.CV cs.CL cs.SD eess.AS eess.IV

    Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge

    Authors: Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Yong Man Ro

    Abstract: This paper proposes a novel lip reading framework, especially for low-resource languages, which has not been well addressed in the previous literature. Since low-resource languages do not have enough video-text paired data to train the model to have sufficient power to model lip movements and language, it is regarded as challenging to develop lip reading models for low-resource languages. In order…

    Submitted 12 January, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV 2023

  11. arXiv:2308.07593

    cs.CV cs.MM eess.AS eess.IV

    AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model

    Authors: Jeong Hun Yeo, Minsu Kim, Jeongsoo Choi, Dae Hoe Kim, Yong Man Ro

    Abstract: Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip movements. VSR is regarded as a challenging task because of the insufficient information on lip movements. In this paper, we propose an Audio Knowledge empowered Visual Speech Recognition framework (AKVSR) to complement the insufficient speech information of visual modality by using audio modality. Different fro…

    Submitted 11 January, 2024; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: Accepted by IEEE Transactions on Multimedia

  12. arXiv:2305.04542

    cs.CV eess.AS

    Multi-Temporal Lip-Audio Memory for Visual Speech Recognition

    Authors: Jeong Hun Yeo, Minsu Kim, Yong Man Ro

    Abstract: Visual Speech Recognition (VSR) is a task to predict a sentence or word from lip movements. Some works have been recently presented which use audio signals to supplement visual information. However, existing methods utilize only limited information such as phoneme-level features and soft labels of Automatic Speech Recognition (ASR) networks. In this paper, we present a Multi-Temporal Lip-Audio Mem…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Presented at ICASSP 2023

  13. arXiv:2303.15826

    eess.IV cs.AI cs.CV

    MS-MT: Multi-Scale Mean Teacher with Contrastive Unpaired Translation for Cross-Modality Vestibular Schwannoma and Cochlea Segmentation

    Authors: Ziyuan Zhao, Kaixin Xu, Huai Zhe Yeo, Xulei Yang, Cuntai Guan

    Abstract: Domain shift has been a long-standing issue for medical image segmentation. Recently, unsupervised domain adaptation (UDA) methods have achieved promising cross-modality segmentation performance by distilling knowledge from a label-rich source domain to a target domain without labels. In this work, we propose a multi-scale self-ensembling based UDA framework for automatic segmentation of two key b…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: Accepted by BrainLes MICCAI proceedings (5th solution for MICCAI 2022 Cross-Modality Domain Adaptation (crossMoDA) Challenge)

  14. arXiv:2210.07729

    cs.CV cs.AI cs.RO

    Model-Based Imitation Learning for Urban Driving

    Authors: Anthony Hu, Gianluca Corrado, Nicolas Griffiths, Zak Murez, Corina Gurau, Hudson Yeo, Alex Kendall, Roberto Cipolla, Jamie Shotton

    Abstract: An accurate model of the environment and the dynamic agents acting in it offers great potential for improving motion planning. We present MILE: a Model-based Imitation LEarning approach to jointly learn a model of the world and a policy for autonomous driving. Our method leverages 3D geometry as an inductive bias and learns a highly compact latent space directly from high-resolution videos of expe…

    Submitted 3 November, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  15. arXiv:2204.02181

    cs.CV

    Vision Transformer Equipped with Neural Resizer on Facial Expression Recognition Task

    Authors: Hyeonbin Hwang, Soyeon Kim, Wei-Jin Park, Jiho Seo, Kyungtae Ko, Hyeon Yeo

    Abstract: When it comes to wild conditions, Facial Expression Recognition is often challenged with low-quality data and imbalanced, ambiguous labels. This field has much benefited from CNN based approaches; however, CNN models have structural limitation to see the facial regions in distant. As a remedy, Transformer has been introduced to vision fields with global receptive field, but requires adjusting inpu…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: Accepted to IEEE ICASSP 2022

  16. arXiv:2204.01725

    cs.CV

    Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading

    Authors: Minsu Kim, Jeong Hun Yeo, Yong Man Ro

    Abstract: Recognizing speech from silent lip movement, which is called lip reading, is a challenging task due to 1) the inherent information insufficiency of lip movement to fully represent the speech, and 2) the existence of homophenes that have similar lip movement with different pronunciations. In this paper, we try to alleviate the aforementioned two challenges in lip reading by proposing a Multi-head V…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Published at AAAI 2022

  17. arXiv:2201.05877

    cs.LG cs.AI

    A Framework for Pedestrian Sub-classification and Arrival Time Prediction at Signalized Intersection Using Preprocessed Lidar Data

    Authors: Tengfeng Lin, Zhixiong Jin, Seongjin Choi, Hwasoo Yeo

    Abstract: The mortality rate for pedestrians using wheelchairs was 36% higher than the overall population pedestrian mortality rate. However, there is no data to clarify the pedestrians' categories in both fatal and nonfatal accidents, since police reports often do not keep a record of whether a victim was using a wheelchair or has a disability. Currently, real-time detection of vulnerable road users using…

    Submitted 15 January, 2022; originally announced January 2022.

    Comments: 15 pages, 11 figures, 4 tables

  18. Transformer-based Map Matching Model with Limited Ground-Truth Data using Transfer-Learning Approach

    Authors: Zhixiong Jin, Jiwon Kim, Hwasoo Yeo, Seongjin Choi

    Abstract: In many spatial trajectory-based applications, it is necessary to map raw trajectory data points onto road networks in digital maps, which is commonly referred to as a map-matching process. While most previous map-matching methods have focused on using rule-based algorithms to deal with the map-matching problems, in this paper, we consider the map-matching task from the data-driven perspective, pr…

    Submitted 7 October, 2021; v1 submitted 1 August, 2021; originally announced August 2021.

    Comments: 25 pages, 9 figures, 4 tables

  19. arXiv:2107.12507

    cs.CV cs.CY

    Analyzing vehicle pedestrian interactions combining data cube structure and predictive collision risk estimation model

    Authors: Byeongjoon Noh, Hansaem Park, Hwasoo Yeo

    Abstract: Traffic accidents are a threat to human lives, particularly pedestrians causing premature deaths. Therefore, it is necessary to devise systems to prevent accidents in advance and respond proactively, using potential risky situations as one of the surrogate safety measurements. This study introduces a new concept of a pedestrian safety system that combines the field and the centralized processes. T…

    Submitted 26 July, 2021; originally announced July 2021.

    Comments: 33 pages, 19 figures

  20. arXiv:2107.03554

    cs.CV cs.CY

    Automated Object Behavioral Feature Extraction for Potential Risk Analysis based on Video Sensor

    Authors: Byeongjoon Noh, Dongho Ka, Wonjun Noh, Hwasoo Yeo

    Abstract: Pedestrians are exposed to risk of death or serious injuries on roads, especially unsignalized crosswalks, for a variety of reasons. To date, an extensive variety of studies have reported on vision based traffic safety system. However, many studies required manual inspection of the volumes of traffic video to reliably obtain traffic related objects behavioral factors. In this paper, we propose an…

    Submitted 25 October, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: 6 pages, 9 figures

  21. arXiv:2105.02582

    cs.CV

    Vision based Pedestrian Potential Risk Analysis based on Automated Behavior Feature Extraction for Smart and Safe City

    Authors: Byeongjoon Noh, Dongho Ka, David Lee, Hwasoo Yeo

    Abstract: Despite recent advances in vehicle safety technologies, road traffic accidents still pose a severe threat to human lives and have become a leading cause of premature deaths. In particular, crosswalks present a major threat to pedestrians, but we lack dense behavioral data to investigate the risks they face. Therefore, we propose a comprehensive analytical model for pedestrian potential risk using…

    Submitted 27 May, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

    Comments: 26 pages, 15 figures, 5 tables

  22. arXiv:2105.02572

    cs.CV cs.AI

    A novel method of predictive collision risk area estimation for proactive pedestrian accident prevention system in urban surveillance infrastructure

    Authors: Byeongjoon Noh, Hwasoo Yeo

    Abstract: Road traffic accidents, especially vehicle pedestrian collisions in crosswalk, globally pose a severe threat to human lives and have become a leading cause of premature deaths. In order to protect such vulnerable road users from collisions, it is necessary to recognize possible conflict in advance and warn to road users, not post facto. A breakthrough for proactively preventing pedestrian collisio…

    Submitted 6 May, 2021; originally announced May 2021.

    Comments: 26 pages, 17 figures, 5 tables

  23. arXiv:2104.14285

    cs.RO

    Hybrid tracker based optimal path tracking system for complex road environments for autonomous driving

    Authors: Eunbin Seo, Seunggi Lee, Gwanjun Shin, Hoyeong Yeo, Yongseob Lim, Gyeungho Choi

    Abstract: Path tracking system plays a key technology in autonomous driving. The system should be driven accurately along the lane and be careful not to cause any inconvenience to passengers. To address such tasks, this paper proposes hybrid tracker based optimal path tracking system. By applying a deep learning based lane detection algorithm and a designated fast lane fitting algorithm, this paper develope…

    Submitted 29 April, 2021; originally announced April 2021.

    Comments: Submitted to IEEE Access. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

  24. arXiv:2009.10868

    cs.CV

    A Real-Time Predictive Pedestrian Collision Warning Service for Cooperative Intelligent Transportation Systems Using 3D Pose Estimation

    Authors: Ue-Hwan Kim, Dongho Ka, Hwasoo Yeo, Jong-Hwan Kim

    Abstract: Minimizing traffic accidents between vehicles and pedestrians is one of the primary research goals in intelligent transportation systems. To achieve the goal, pedestrian orientation recognition and prediction of pedestrian's crossing or not-crossing intention play a central role. Contemporary approaches do not guarantee satisfactory performance due to limited field-of-view, lack of generalization,…

    Submitted 21 February, 2022; v1 submitted 22 September, 2020; originally announced September 2020.

    Comments: 12 pages, 8 figures, 4 tables

  25. TrajGAIL: Generating Urban Vehicle Trajectories using Generative Adversarial Imitation Learning

    Authors: Seongjin Choi, Jiwon Kim, Hwasoo Yeo

    Abstract: Recently, an abundant amount of urban vehicle trajectory data has been collected in road networks. Many studies have used machine learning algorithms to analyze patterns in vehicle trajectories to predict location sequences of individual travelers. Unlike the previous studies that used a discriminative modeling approach, this research suggests a generative modeling approach to learn the underlying…

    Submitted 15 January, 2021; v1 submitted 28 July, 2020; originally announced July 2020.

    Comments: 25 pages, 10 figures, 2 tables

    Journal ref: Transportation Research Part C: Emerging Technologies Volume 128, July 2021, 103091

  26. arXiv:2002.04406

    physics.soc-ph cs.LG eess.SP stat.ML

    Traffic Data Imputation using Deep Convolutional Neural Networks

    Authors: Ouafa Benkraouda, Bilal Thonnam Thodi, Hwasoo Yeo, Monica Menendez, Saif Eddin Jabari

    Abstract: We propose a statistical learning-based traffic speed estimation method that uses sparse vehicle trajectory information. Using a convolutional encoder-decoder based architecture, we show that a well trained neural network can learn spatio-temporal traffic speed dynamics from time-space diagrams. We demonstrate this for a homogeneous road section using simulated vehicle trajectories and then valida…

    Submitted 21 January, 2020; originally announced February 2020.

    Journal ref: IEEE Access, 8, 2020, pp. 104740-104752

  27. arXiv:1812.07151

    cs.LG cs.AI

    Attention-based Recurrent Neural Network for Urban Vehicle Trajectory Prediction

    Authors: Seongjin Choi, Jiwon Kim, Hwasoo Yeo

    Abstract: With the increasing deployment of diverse positioning devices and location-based services, a huge amount of spatial and temporal information has been collected and accumulated as trajectory data. Among many applications, trajectory-based location prediction is gaining increasing attention because of its potential to improve the performance of many applications in multiple domains. This research fo…

    Submitted 3 December, 2019; v1 submitted 17 December, 2018; originally announced December 2018.

  28. AdaM: Adapting Multi-User Interfaces for Collaborative Environments in Real-Time

    Authors: Seonwook Park, Christoph Gebhardt, Roman Rädle, Anna Feit, Hana Vrzakova, Niraj Dayama, Hui-Shyong Yeo, Clemens Klokmose, Aaron Quigley, Antti Oulasvirta, Otmar Hilliges

    Abstract: Developing cross-device multi-user interfaces (UIs) is a challenging problem. There are numerous ways in which content and interactivity can be distributed. However, good solutions must consider multiple users, their roles, their preferences and access rights, as well as device capabilities. Manual and rule-based solutions are tedious to create and do not scale to larger problems nor do they adapt…

    Submitted 29 March, 2018; v1 submitted 3 March, 2018; originally announced March 2018.

    Comments: formatting tweaks

    ACM Class: H.5.m

  29. arXiv:1708.07174

    cs.CR cs.NI

    Understanding Modern Intrusion Detection Systems: A Survey

    Authors: Liu Hua Yeo, Xiangdong Che, Shalini Lakkaraju

    Abstract: Intrusion detection systems (IDS) help detect unauthorized activities or intrusions that may compromise the confidentiality, integrity or availability of a resource. This paper presents a general overview of IDSs, the way they are classified, and the different algorithms used to detect anomalous activities. It attempts to compare the various methods of intrusion techniques. It also describes the v…

    Submitted 30 November, 2017; v1 submitted 23 August, 2017; originally announced August 2017.

    Comments: 9 pages, 5 figures