Search | arXiv e-print repository

arXiv:2407.19719 [pdf, other]

Revolutionizing Urban Safety Perception Assessments: Integrating Multimodal Large Language Models with Street View Images

Authors: Jiaxin Zhang, Yunqin Li, Tomohiro Fukuda, Bowen Wang

Abstract: Measuring urban safety perception is an important and complex task that traditionally relies heavily on human resources. This process often involves extensive field surveys, manual data collection, and subjective assessments, which can be time-consuming, costly, and sometimes inconsistent. Street View Images (SVIs), along with deep learning methods, provide a way to realize large-scale urban safety detection. However, achieving this goal often requires extensive human annotation to train safety ranking models, and the architectural differences between cities hinder the transferability of these models. Thus, a fully automated method for conducting safety evaluations is essential. Recent advances in multimodal large language models (MLLMs) have demonstrated powerful reasoning and analytical capabilities. Cutting-edge models, e.g., GPT-4, have shown surprising performance in many tasks. We employed these models for urban safety ranking on a human-annotated anchor set and validated that the results from MLLMs align closely with human perceptions. Additionally, we proposed a method based on the pre-trained Contrastive Language-Image Pre-training (CLIP) feature and K-Nearest Neighbors (K-NN) retrieval to quickly assess the safety index of the entire city. Experimental results show that our method outperforms existing deep learning approaches that require training, achieving efficient and accurate urban safety evaluations. The proposed automation for urban safety perception assessment is a valuable tool for city planners, policymakers, and researchers aiming to improve urban environments.

Submitted 5 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

Comments: 13 pages, 9 figures
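
The K-NN retrieval step described in the abstract above can be illustrated with a minimal sketch. It assumes each anchor image already has a feature vector and a safety score; the two-dimensional vectors, the scores, the cosine similarity measure, and the names `cosine` and `knn_safety_score` are hypothetical placeholders, not the authors' implementation, and real CLIP features would come from a pre-trained encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_safety_score(query, anchors, k=3):
    """Average the safety scores of the k anchors most similar to the query.

    anchors: list of (feature_vector, safety_score) pairs.
    Plain averaging is an assumption; a weighted vote would also be possible.
    """
    ranked = sorted(anchors, key=lambda item: cosine(query, item[0]), reverse=True)
    top = ranked[:k]
    return sum(score for _, score in top) / len(top)


# Toy anchor set: placeholder features standing in for CLIP embeddings.
anchors = [
    ([1.0, 0.0], 8.0),
    ([0.9, 0.1], 7.0),
    ([0.0, 1.0], 2.0),
    ([0.1, 0.9], 3.0),
]
score = knn_safety_score([1.0, 0.05], anchors, k=2)
print(score)
```

In this toy setup the query lies closest to the first two anchors, so its estimated score is the mean of their scores.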

arXiv:2402.17310 [pdf, other]

Method of Tracking and Analysis of Fluorescent-Labeled Cells Using Automatic Thresholding and Labeling

Authors: Mizuki Fukasawa, Tomokazu Fukuda, Takuya Akashi

Abstract: High-throughput screening using cell images is an efficient method for screening new candidates for pharmaceutical drugs. To complete the screening process, it is essential to have an efficient process for analyzing cell images. This paper presents a new method for efficiently tracking cells and quantitatively detecting the signal ratio between cytoplasm and nuclei. Existing methods include those that use image processing techniques and those that utilize artificial intelligence (AI). However, these methods do not consider the correspondence of cells between images, or require a significant amount of new learning data to train AI. Therefore, our method uses automatic thresholding and labeling algorithms to compare the position of each cell between images, and continuously measure and analyze the signal ratio of cells. This paper describes the algorithm of our method. Using the method, we experimented to investigate the effect of the number of opening and closing operations during the binarization process on the tracking of the cells. Through the experiment, we determined the appropriate number of opening and closing processes.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 5 pages, 7 figures
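
As a rough illustration of the thresholding, labeling, and position-comparison pipeline the abstract above describes, the following sketch binarizes two tiny frames, labels connected regions, and matches each region to the nearest one in the next frame. The abstract does not name the thresholding algorithm, so Otsu's method is an assumption here, as are 4-connectivity, nearest-centroid matching, and all function names; the opening and closing operations are omitted for brevity.

```python
from collections import deque


def otsu_threshold(pixels, levels=256):
    """Pick the gray level that maximizes between-class variance (Otsu)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def segment(frame):
    """Binarize a frame and return the centroid of each 4-connected region."""
    t = otsu_threshold([p for row in frame for p in row])
    h, w = len(frame), len(frame[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for y in range(h):
        for x in range(w):
            if frame[y][x] <= t or seen[y][x]:
                continue
            seen[y][x] = True
            queue, ys, xs = deque([(y, x)]), [], []
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                ys.append(cy)
                xs.append(cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and frame[ny][nx] > t and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            centroids.append((sum(ys) / len(ys), sum(xs) / len(xs)))
    return centroids


def match_cells(prev, curr):
    """For each previous centroid, return the index of the nearest current one."""
    return [
        min(range(len(curr)), key=lambda j: (py - curr[j][0]) ** 2 + (px - curr[j][1]) ** 2)
        for py, px in prev
    ]


# Two toy frames: two bright "cells" that each shift by one pixel.
frame1 = [[10, 10, 10, 10, 10], [10, 200, 10, 10, 10], [10, 10, 10, 10, 10], [10, 10, 10, 200, 200]]
frame2 = [[10, 10, 10, 10, 10], [10, 10, 200, 10, 10], [10, 10, 10, 10, 10], [10, 10, 200, 200, 10]]
matches = match_cells(segment(frame1), segment(frame2))
print(matches)
```

Nearest-centroid matching is only one way to establish correspondence between frames; it can fail when cells move farther than their spacing.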

arXiv:2311.13090 [pdf, other]

On the Limitation of Diffusion Models for Synthesizing Training Datasets

Authors: Shin'ya Yamaguchi, Takuma Fukuda

Abstract: Synthetic samples from diffusion models are promising for leveraging in training discriminative models as replications of real training datasets. However, we found that the synthetic datasets degrade classification performance over real datasets even when using state-of-the-art diffusion models. This means that modern diffusion models do not perfectly represent the data distribution for the purpose of replicating datasets for training discriminative tasks. This paper investigates the gap between synthetic and real samples by analyzing the synthetic samples reconstructed from real samples through the diffusion and reverse process. By varying the time steps starting the reverse process in the reconstruction, we can control the trade-off between the information in the original real data and the information added by diffusion models. Through assessing the reconstructed samples and trained models, we found that the synthetic data are concentrated in modes of the training data distribution as the reverse step increases, and thus, they struggle to cover the outer edges of the distribution. Our findings imply that modern diffusion models are insufficient to replicate training data distribution perfectly, and there is room for the improvement of generative modeling in the replication of training datasets.

Submitted 21 November, 2023; originally announced November 2023.

Comments: NeurIPS 2023 SyntheticData4ML Workshop

arXiv:2211.02524 [pdf]

doi 10.1109/MCOM.002.2200003

Open Multi-Access Network Platform with Dynamic Task Offloading and Intelligent Resource Monitoring

Authors: Takuji Tachibana, Kazuki Sawada, Hiroyuki Fujii, Ryo Maruyama, Tomonori Yamada, Masaaki Fujii, Toshimichi Fukuda

Abstract: We constructed an open multi-access network platform using open-source hardware and software. The open multi-access network platform is characterized by the flexible utilization of network functions, integral management and control of wired and wireless access networks, zero-touch provisioning, intelligent resource monitoring, and dynamic task offloading. We also propose an application-driven dynamic task offloading that utilizes intelligent resource monitoring to ensure effective task processing in edge and cloud servers. For this purpose, we developed a mobile application and server applications for the open multi-access network platform. To investigate the feasibility and availability of our developed platform, we experimentally and analytically evaluated the effectiveness of application-driven dynamic task offloading and intelligent resource monitoring. The experimental results demonstrated that application-driven dynamic task offloading could reduce real-time task response time and traffic over metro and core networks.

Submitted 4 November, 2022; originally announced November 2022.

Journal ref: IEEE Communications Magazine, Vol. 60, Issue 8, pp. 52-58, August 2022

arXiv:2207.03870 [pdf, other]

BlindSpotNet: Seeing Where We Cannot See

Authors: Taichi Fukuda, Kotaro Hasegawa, Shinya Ishizaki, Shohei Nobuhara, Ko Nishino

Abstract: We introduce 2D blind spot estimation as a critical visual task for road scene understanding. By automatically detecting road regions that are occluded from the vehicle's vantage point, we can proactively alert a manual driver or a self-driving system to potential causes of accidents (e.g., draw attention to a road region from which a child may spring out). Detecting blind spots in full 3D would be challenging, as 3D reasoning on the fly even if the car is equipped with LiDAR would be prohibitively expensive and error prone. We instead propose to learn to estimate blind spots in 2D, just from a monocular camera. We achieve this in two steps. We first introduce an automatic method for generating "ground-truth" blind spot training data for arbitrary driving videos by leveraging monocular depth estimation, semantic segmentation, and SLAM. The key idea is to reason in 3D but from 2D images by defining blind spots as those road regions that are currently invisible but become visible in the near future. We construct a large-scale dataset with this automatic offline blind spot estimation, which we refer to as Road Blind Spot (RBS) dataset. Next, we introduce BlindSpotNet (BSN), a simple network that fully leverages this dataset for fully automatic estimation of frame-wise blind spot probability maps for arbitrary driving videos. Extensive experimental results demonstrate the validity of our RBS Dataset and the effectiveness of our BSN.

Submitted 8 July, 2022; originally announced July 2022.
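
The blind spot definition in the abstract above (road regions that are currently invisible but become visible in the near future) can be written as a simple mask operation. This is a heavily simplified sketch: it assumes the road regions seen in future frames have already been projected into the current frame's image coordinates, which in the paper relies on depth estimation, segmentation, and SLAM and is not reproduced here. The function name and the boolean-grid representation are hypothetical.

```python
def blind_spot_mask(road_visible_now, road_visible_future):
    """Mark pixels that are road in a future view but not visible as road now.

    Both inputs are same-shaped grids of booleans, assumed to be aligned to
    the current frame (the alignment step itself is out of scope here).
    """
    return [
        [future and not now for now, future in zip(row_now, row_future)]
        for row_now, row_future in zip(road_visible_now, road_visible_future)
    ]


# Toy 2x2 example: the right-hand column only becomes visible later.
now = [[True, False], [False, False]]
future = [[True, True], [False, True]]
mask = blind_spot_mask(now, future)
print(mask)
```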

arXiv:2203.15176 [pdf, other]

Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing

Authors: Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata

Abstract: We introduce two techniques, length perturbation and n-best based label smoothing, to improve generalization of deep neural network (DNN) acoustic models for automatic speech recognition (ASR). Length perturbation is a data augmentation algorithm that randomly drops and inserts frames of an utterance to alter the length of the speech feature sequence. N-best based label smoothing randomly injects noise to ground truth labels during training in order to avoid overfitting, where the noisy labels are generated from n-best hypotheses. We evaluate these two techniques extensively on the 300-hour Switchboard (SWB300) dataset and an in-house 500-hour Japanese (JPN500) dataset using recurrent neural network transducer (RNNT) acoustic models for ASR. We show that both techniques improve the generalization of RNNT models individually and they can also be complementary. In particular, they yield good improvements over a strong SWB300 baseline and give state-of-the-art performance on SWB300 using RNNT models.

Submitted 28 March, 2022; originally announced March 2022.

Comments: Submitted to Interspeech 2022
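
Length perturbation, as summarized in the abstract above, randomly drops and inserts frames to change the length of a feature sequence. The sketch below shows one possible reading of that idea. The per-frame probabilities, the choice to insert all-zero frames, and the name `length_perturb` are assumptions made for illustration; the abstract does not specify these details.

```python
import random


def length_perturb(frames, drop_prob=0.1, insert_prob=0.1, rng=None):
    """Randomly drop frames and insert extra frames into a feature sequence.

    frames: list of feature vectors (lists of floats).
    Inserted frames are all-zero vectors here, which is an assumption.
    """
    rng = rng or random.Random()
    dim = len(frames[0]) if frames else 0
    out = []
    for frame in frames:
        if rng.random() < drop_prob:
            continue  # drop this frame
        out.append(frame)
        if rng.random() < insert_prob:
            out.append([0.0] * dim)  # insert a placeholder frame after it
    return out


# Toy utterance of five 2-dimensional frames.
utterance = [[float(i), float(i) + 0.5] for i in range(5)]
augmented = length_perturb(utterance, drop_prob=0.2, insert_prob=0.2, rng=random.Random(0))
print(len(utterance), len(augmented))
```

Passing a seeded `random.Random` keeps the augmentation reproducible, which helps when comparing training runs.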

arXiv:2112.08878 [pdf, other]

Knowledge Distillation Leveraging Alternative Soft Targets from Non-Parallel Qualified Speech Data

Authors: Tohru Nagano, Takashi Fukuda, Gakuto Kurata

Abstract: This paper describes a novel knowledge distillation framework that leverages acoustically qualified speech data included in an existing training data pool as privileged information. In our proposed framework, a student network is trained with multiple soft targets for each utterance that consist of main soft targets from original speakers' utterance and alternative targets from other speakers' utterances spoken under better acoustic conditions as a secondary view. These qualified utterances from other speakers, used to generate better soft targets, are collected from a qualified data pool by using strict constraints in terms of word/phone/state durations. Our proposed method is a form of target-side data augmentation that creates multiple copies of data with corresponding better soft targets obtained from a qualified data pool. We show in our experiments under acoustic model adaptation settings that the proposed method, exploiting better soft targets obtained from various speakers, can further improve recognition accuracy compared with conventional methods using only soft targets from original speakers.

Submitted 16 December, 2021; originally announced December 2021.
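
Training a student against several soft targets per input, as the abstract above describes, can be sketched as a cross-entropy loss evaluated once per target. The example assumes the main and alternative soft targets are given as probability vectors over the same output classes and that their losses are averaged with equal weight; that weighting and the function names are assumptions, not the paper's stated formulation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(student_logits, soft_target):
    """Cross-entropy between a soft target distribution and the student output."""
    probs = softmax(student_logits)
    return -sum(t * math.log(p) for t, p in zip(soft_target, probs))


def multi_target_loss(student_logits, main_target, alt_targets):
    """Average the loss over the main target and every alternative target.

    Equal weighting is an assumption; it mirrors treating each target as a
    separate copy of the same training example.
    """
    targets = [main_target] + list(alt_targets)
    return sum(soft_cross_entropy(student_logits, t) for t in targets) / len(targets)


# One frame, two classes: a main teacher target plus one alternative target.
loss = multi_target_loss([2.0, 0.5], [0.9, 0.1], [[0.8, 0.2]])
print(loss)
```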

Showing 1–7 of 7 results for author: Fukuda, T