Search | arXiv e-print repository

arXiv:2201.12047

Exploring Object-Aware Attention Guided Frame Association for RGB-D SLAM

Authors: Ali Caglayan, Nevrez Imamoglu, Oguzhan Guclu, Ali Osman Serhatoglu, Weimin Wang, Ahmet Burak Can, Ryosuke Nakamura

Abstract: Deep learning models as an emerging topic have shown great progress in various fields. Especially, visualization tools such as class activation mapping methods provided visual explanation on the reasoning of convolutional neural networks (CNNs). By using the gradients of the network layers, it is possible to demonstrate where the networks pay attention during a specific image recognition task. Mor… ▽ More Deep learning models as an emerging topic have shown great progress in various fields. Especially, visualization tools such as class activation mapping methods provided visual explanation on the reasoning of convolutional neural networks (CNNs). By using the gradients of the network layers, it is possible to demonstrate where the networks pay attention during a specific image recognition task. Moreover, these gradients can be integrated with CNN features for localizing more generalized task dependent attentive (salient) objects in scenes. Despite this progress, there is not much explicit usage of this gradient (network attention) information to integrate with CNN representations for object semantics. This can be very useful for visual tasks such as simultaneous localization and mapping (SLAM) where CNN representations of spatially attentive object locations may lead to improved performance. Therefore, in this work, we propose the use of task specific network attention for RGB-D indoor SLAM. To do so, we integrate layer-wise object attention information (layer gradients) with CNN layer representations to improve frame association performance in an RGB-D indoor SLAM method. Experiments show promising results with improved performance over the baseline. △ Less

Submitted 13 March, 2023; v1 submitted 28 January, 2022; originally announced January 2022.

Comments: This article has been removed by arXiv administrators because the submitter did not have the authority to grant the license at the time of submission

arXiv:2104.06653 [pdf, other]

ADNet: Temporal Anomaly Detection in Surveillance Videos

Authors: Halil İbrahim Öztürk, Ahmet Burak Can

Abstract: Anomaly detection in surveillance videos is an important research problem in computer vision. In this paper, we propose ADNet, an anomaly detection network, which utilizes temporal convolutions to localize anomalies in videos. The model works online by accepting consecutive windows consisting of fixed-number of video clips. Features extracted from video clips in a window are fed to ADNet, which al… ▽ More Anomaly detection in surveillance videos is an important research problem in computer vision. In this paper, we propose ADNet, an anomaly detection network, which utilizes temporal convolutions to localize anomalies in videos. The model works online by accepting consecutive windows consisting of fixed-number of video clips. Features extracted from video clips in a window are fed to ADNet, which allows to localize anomalies in videos effectively. We propose the AD Loss function to improve abnormal segment detection performance of ADNet. Additionally, we propose to use F1@k metric for temporal anomaly detection. F1@k is a better evaluation metric than AUC in terms of not penalizing minor shifts in temporal segments and punishing short false positive temporal segment predictions. Furthermore, we extend UCF Crime dataset by adding two more anomaly classes and providing temporal anomaly annotations for all classes. Finally, we thoroughly evaluate our model on the extended UCF Crime dataset. ADNet produces promising results with respect to F1@k metric. Dataset extensions and code will be publicly available upon publishing △ Less

Submitted 14 April, 2021; originally announced April 2021.

Comments: FGVRID workshop of ICPR conference, 15 pages

arXiv:2004.12349 [pdf, other]

When CNNs Meet Random RNNs: Towards Multi-Level Analysis for RGB-D Object and Scene Recognition

Authors: Ali Caglayan, Nevrez Imamoglu, Ahmet Burak Can, Ryosuke Nakamura

Abstract: Recognizing objects and scenes are two challenging but essential tasks in image understanding. In particular, the use of RGB-D sensors in handling these tasks has emerged as an important area of focus for better visual understanding. Meanwhile, deep neural networks, specifically convolutional neural networks (CNNs), have become widespread and have been applied to many visual tasks by replacing han… ▽ More Recognizing objects and scenes are two challenging but essential tasks in image understanding. In particular, the use of RGB-D sensors in handling these tasks has emerged as an important area of focus for better visual understanding. Meanwhile, deep neural networks, specifically convolutional neural networks (CNNs), have become widespread and have been applied to many visual tasks by replacing hand-crafted features with effective deep features. However, it is an open problem how to exploit deep features from a multi-layer CNN model effectively. In this paper, we propose a novel two-stage framework that extracts discriminative feature representations from multi-modal RGB-D images for object and scene recognition tasks. In the first stage, a pretrained CNN model has been employed as a backbone to extract visual features at multiple levels. The second stage maps these features into high level representations with a fully randomized structure of recursive neural networks (RNNs) efficiently. To cope with the high dimensionality of CNN activations, a random weighted pooling scheme has been proposed by extending the idea of randomness in RNNs. Multi-modal fusion has been performed through a soft voting approach by computing weights based on individual recognition confidences (i.e. SVM scores) of RGB and depth streams separately. This produces consistent class label estimation in final RGB-D classification performance. Extensive experiments verify that fully randomized structure in RNN stage encodes CNN activations to discriminative solid features successfully. Comparative experimental results on the popular Washington RGB-D Object and SUN RGB-D Scene datasets show that the proposed approach achieves superior or on-par performance compared to state-of-the-art methods both in object and scene recognition tasks. Code is available at https://github.com/acaglayan/CNN_randRNN. △ Less

Submitted 11 January, 2022; v1 submitted 26 April, 2020; originally announced April 2020.

Comments: 15 pages, 9 figures, 5 tables (To appear in Computer Vision and Image Understanding, Elsevier)

Showing 1–3 of 3 results for author: Can, A B