-
Revolutionizing Urban Safety Perception Assessments: Integrating Multimodal Large Language Models with Street View Images
Authors:
Jiaxin Zhang,
Yunqin Li,
Tomohiro Fukuda,
Bowen Wang
Abstract:
Measuring urban safety perception is an important and complex task that traditionally relies heavily on human resources. This process often involves extensive field surveys, manual data collection, and subjective assessments, which can be time-consuming, costly, and sometimes inconsistent. Street View Images (SVIs), along with deep learning methods, provide a way to realize large-scale urban safety detection. However, achieving this goal often requires extensive human annotation to train safety ranking models, and the architectural differences between cities hinder the transferability of these models. Thus, a fully automated method for conducting safety evaluations is essential. Recent advances in multimodal large language models (MLLMs) have demonstrated powerful reasoning and analytical capabilities. Cutting-edge models, e.g., GPT-4, have shown surprising performance in many tasks. We employed these models for urban safety ranking on a human-annotated anchor set and validated that the results from MLLMs align closely with human perceptions. Additionally, we proposed a method based on the pre-trained Contrastive Language-Image Pre-training (CLIP) feature and K-Nearest Neighbors (K-NN) retrieval to quickly assess the safety index of the entire city. Experimental results show that our method outperforms existing deep learning approaches that require training, achieving efficient and accurate urban safety evaluations. The proposed automation for urban safety perception assessment is a valuable tool for city planners, policymakers, and researchers aiming to improve urban environments.
Submitted 5 August, 2024; v1 submitted 29 July, 2024;
originally announced July 2024.
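The K-NN retrieval step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of cosine similarity, the averaging of neighbour scores, and the toy two-dimensional vectors are all assumptions made for illustration. In practice the feature vectors would come from a pre-trained CLIP image encoder and the anchor scores from the MLLM-ranked anchor set.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def knn_safety_score(query, anchor_feats, anchor_scores, k=3):
    """Estimate a safety score for `query` as the mean score of its
    k most similar anchors (hypothetical aggregation rule)."""
    ranked = sorted(
        range(len(anchor_feats)),
        key=lambda i: cosine(query, anchor_feats[i]),
        reverse=True,
    )
    top = ranked[:k]
    return sum(anchor_scores[i] for i in top) / len(top)

# Toy stand-ins for CLIP features and anchor-set safety scores (assumed values).
anchors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
scores = [8.0, 6.0, 2.0, 4.0]

print(knn_safety_score([1.0, 0.05], anchors, scores, k=2))
print(knn_safety_score([0.05, 1.0], anchors, scores, k=2))
```

Because only a similarity search over fixed features is involved, no model training is needed once the anchor set has been scored, which is the property the abstract emphasises.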
-
Method of Tracking and Analysis of Fluorescent-Labeled Cells Using Automatic Thresholding and Labeling
Authors:
Mizuki Fukasawa,
Tomokazu Fukuda,
Takuya Akashi
Abstract:
High-throughput screening using cell images is an efficient method for screening new candidates for pharmaceutical drugs. To complete the screening process, it is essential to have an efficient process for analyzing cell images. This paper presents a new method for efficiently tracking cells and quantitatively detecting the signal ratio between cytoplasm and nuclei. Existing methods include those that use image processing techniques and those that utilize artificial intelligence (AI). However, these methods do not consider the correspondence of cells between images, or require a significant amount of new learning data to train AI. Therefore, our method uses automatic thresholding and labeling algorithms to compare the position of each cell between images, and continuously measure and analyze the signal ratio of cells. This paper describes the algorithm of our method. Using the method, we experimented to investigate the effect of the number of opening and closing operations during the binarization process on the tracking of the cells. Through the experiment, we determined the appropriate number of opening and closing processes.
Submitted 27 February, 2024;
originally announced February 2024.
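As a rough illustration of "automatic thresholding and labeling", the sketch below binarizes a tiny grayscale image with Otsu's method and then labels 4-connected regions, returning one centroid per region. The abstract does not say which thresholding or labeling algorithm the authors use, so Otsu's method, 4-connectivity, and every function name here are assumptions. Centroids are computed because comparing positions between frames is how the abstract describes cell correspondence.

```python
from collections import deque

def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance;
    pixels > t are treated as foreground."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def label_components(mask):
    """Label 4-connected foreground regions with integers 1..count."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

def centroids(labels, count):
    """Map each label to its (row, col) centroid."""
    sums = {i: [0, 0, 0] for i in range(1, count + 1)}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab:
                sums[lab][0] += r
                sums[lab][1] += c
                sums[lab][2] += 1
    return {i: (s[0] / s[2], s[1] / s[2]) for i, s in sums.items()}

# Toy 4x5 "fluorescence" frame with two bright regions (assumed data).
img = [
    [10, 10, 10, 10, 10],
    [10, 200, 200, 10, 10],
    [10, 200, 10, 10, 210],
    [10, 10, 10, 10, 210],
]
t = otsu_threshold([p for row in img for p in row])
mask = [[p > t for p in row] for row in img]
labels, n = label_components(mask)
cents = centroids(labels, n)
print(t, n, cents)
```

The opening and closing operations studied in the paper would be applied to `mask` before labeling; they are omitted here to keep the sketch short.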
-
On the Limitation of Diffusion Models for Synthesizing Training Datasets
Authors:
Shin'ya Yamaguchi,
Takuma Fukuda
Abstract:
Synthetic samples from diffusion models are promising for training discriminative models as replications of real training datasets. However, we found that the synthetic datasets degrade classification performance relative to real datasets even when using state-of-the-art diffusion models. This means that modern diffusion models do not perfectly represent the data distribution for the purpose of replicating datasets for training discriminative tasks. This paper investigates the gap between synthetic and real samples by analyzing the synthetic samples reconstructed from real samples through the diffusion and reverse process. By varying the time steps starting the reverse process in the reconstruction, we can control the trade-off between the information in the original real data and the information added by diffusion models. Through assessing the reconstructed samples and trained models, we found that the synthetic data are concentrated in modes of the training data distribution as the reverse step increases, and thus, they struggle to cover the outer edges of the distribution. Our findings imply that modern diffusion models are insufficient to replicate training data distribution perfectly, and there is room for improvement of generative modeling in the replication of training datasets.
Submitted 21 November, 2023;
originally announced November 2023.
-
Open Multi-Access Network Platform with Dynamic Task Offloading and Intelligent Resource Monitoring
Authors:
Takuji Tachibana,
Kazuki Sawada,
Hiroyuki Fujii,
Ryo Maruyama,
Tomonori Yamada,
Masaaki Fujii,
Toshimichi Fukuda
Abstract:
We constructed an open multi-access network platform using open-source hardware and software. The open multi-access network platform is characterized by the flexible utilization of network functions, integral management and control of wired and wireless access networks, zero-touch provisioning, intelligent resource monitoring, and dynamic task offloading. We also propose an application-driven dynamic task offloading that utilizes intelligent resource monitoring to ensure effective task processing in edge and cloud servers. For this purpose, we developed a mobile application and server applications for the open multi-access network platform. To investigate the feasibility and availability of our developed platform, we experimentally and analytically evaluated the effectiveness of application-driven dynamic task offloading and intelligent resource monitoring. The experimental results demonstrated that application-driven dynamic task offloading could reduce real-time task response time and traffic over metro and core networks.
Submitted 4 November, 2022;
originally announced November 2022.
-
BlindSpotNet: Seeing Where We Cannot See
Authors:
Taichi Fukuda,
Kotaro Hasegawa,
Shinya Ishizaki,
Shohei Nobuhara,
Ko Nishino
Abstract:
We introduce 2D blind spot estimation as a critical visual task for road scene understanding. By automatically detecting road regions that are occluded from the vehicle's vantage point, we can proactively alert a manual driver or a self-driving system to potential causes of accidents (e.g., draw attention to a road region from which a child may spring out). Detecting blind spots in full 3D would be challenging, as 3D reasoning on the fly, even if the car is equipped with LiDAR, would be prohibitively expensive and error-prone. We instead propose to learn to estimate blind spots in 2D, just from a monocular camera. We achieve this in two steps. We first introduce an automatic method for generating "ground-truth" blind spot training data for arbitrary driving videos by leveraging monocular depth estimation, semantic segmentation, and SLAM. The key idea is to reason in 3D but from 2D images by defining blind spots as those road regions that are currently invisible but become visible in the near future. We construct a large-scale dataset with this automatic offline blind spot estimation, which we refer to as the Road Blind Spot (RBS) dataset. Next, we introduce BlindSpotNet (BSN), a simple network that fully leverages this dataset for fully automatic estimation of frame-wise blind spot probability maps for arbitrary driving videos. Extensive experimental results demonstrate the validity of our RBS Dataset and the effectiveness of our BSN.
Submitted 8 July, 2022;
originally announced July 2022.
-
Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing
Authors:
Xiaodong Cui,
George Saon,
Tohru Nagano,
Masayuki Suzuki,
Takashi Fukuda,
Brian Kingsbury,
Gakuto Kurata
Abstract:
We introduce two techniques, length perturbation and n-best based label smoothing, to improve generalization of deep neural network (DNN) acoustic models for automatic speech recognition (ASR). Length perturbation is a data augmentation algorithm that randomly drops and inserts frames of an utterance to alter the length of the speech feature sequence. N-best based label smoothing randomly injects noise into ground-truth labels during training in order to avoid overfitting, where the noisy labels are generated from n-best hypotheses. We evaluate these two techniques extensively on the 300-hour Switchboard (SWB300) dataset and an in-house 500-hour Japanese (JPN500) dataset using recurrent neural network transducer (RNNT) acoustic models for ASR. We show that both techniques improve the generalization of RNNT models individually and they can also be complementary. In particular, they yield good improvements over a strong SWB300 baseline and give state-of-the-art performance on SWB300 using RNNT models.
Submitted 28 March, 2022;
originally announced March 2022.
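The length perturbation idea (randomly dropping and inserting frames) can be sketched as follows. This is an illustrative guess at the mechanics, not the paper's algorithm: the per-frame probabilities `p_drop` and `p_insert`, the choice to insert a duplicate of the current frame, and the function name are all assumptions, since the abstract does not specify what an inserted frame contains or how positions are sampled.

```python
import random

def length_perturb(frames, p_drop=0.1, p_insert=0.1, rng=None):
    """Return a copy of `frames` with frames randomly dropped and inserted.

    Assumption: an "inserted" frame is a duplicate of the frame it follows.
    """
    rng = rng or random.Random()
    out = []
    for frame in frames:
        if rng.random() < p_drop:
            continue  # drop this frame
        out.append(frame)
        if rng.random() < p_insert:
            out.append(frame)  # insert a duplicated frame (assumed behaviour)
    return out

# Each "frame" here is just an integer stand-in for a feature vector.
utterance = list(range(10))
perturbed = length_perturb(utterance, p_drop=0.2, p_insert=0.2, rng=random.Random(0))
print(len(utterance), len(perturbed))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which helps when comparing training runs.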
-
Knowledge Distillation Leveraging Alternative Soft Targets from Non-Parallel Qualified Speech Data
Authors:
Tohru Nagano,
Takashi Fukuda,
Gakuto Kurata
Abstract:
This paper describes a novel knowledge distillation framework that leverages acoustically qualified speech data included in an existing training data pool as privileged information. In our proposed framework, a student network is trained with multiple soft targets for each utterance that consist of main soft targets from original speakers' utterance and alternative targets from other speakers' utterances spoken under better acoustic conditions as a secondary view. These qualified utterances from other speakers, used to generate better soft targets, are collected from a qualified data pool by using strict constraints in terms of word/phone/state durations. Our proposed method is a form of target-side data augmentation that creates multiple copies of data with corresponding better soft targets obtained from a qualified data pool. We show in our experiments under acoustic model adaptation settings that the proposed method, exploiting better soft targets obtained from various speakers, can further improve recognition accuracy compared with conventional methods using only soft targets from original speakers.
Submitted 16 December, 2021;
originally announced December 2021.