Search | arXiv e-print repository

Sea Ice Segmentation From SAR Data by Convolutional Transformer Networks

Authors: Nicolae-Catalin Ristea, Andrei Anghel, Mihai Datcu

Abstract: Sea ice is a crucial component of the Earth's climate system and is highly sensitive to changes in temperature and atmospheric conditions. Accurate and timely measurement of sea ice parameters is important for understanding and predicting the impacts of climate change. Nevertheless, the amount of satellite data acquired over ice areas is huge, making the subjective measurements ineffective. Theref… ▽ More Sea ice is a crucial component of the Earth's climate system and is highly sensitive to changes in temperature and atmospheric conditions. Accurate and timely measurement of sea ice parameters is important for understanding and predicting the impacts of climate change. Nevertheless, the amount of satellite data acquired over ice areas is huge, making the subjective measurements ineffective. Therefore, automated algorithms must be used in order to fully exploit the continuous data feeds coming from satellites. In this paper, we present a novel approach for sea ice segmentation based on SAR satellite imagery using hybrid convolutional transformer (ConvTr) networks. We show that our approach outperforms classical convolutional networks, while being considerably more efficient than pure transformer models. ConvTr obtained a mean intersection over union (mIoU) of 63.68% on the AI4Arctic data set, assuming an inference time of 120ms for a 400 x 400 squared km product. △ Less

Submitted 13 June, 2023; originally announced June 2023.

arXiv:2305.08413 [pdf, other]

Artificial intelligence to advance Earth observation: a perspective

Authors: Devis Tuia, Konrad Schindler, Begüm Demir, Gustau Camps-Valls, Xiao Xiang Zhu, Mrinalini Kochupillai, Sašo Džeroski, Jan N. van Rijn, Holger H. Hoos, Fabio Del Frate, Mihai Datcu, Jorge-Arnulfo Quiané-Ruiz, Volker Markl, Bertrand Le Saux, Rochelle Schneider

Abstract: Earth observation (EO) is a prime instrument for monitoring land and ocean processes, studying the dynamics at work, and taking the pulse of our planet. This article gives a bird's eye view of the essential scientific tools and approaches informing and supporting the transition from raw EO data to usable EO-based information. The promises, as well as the current challenges of these developments, a… ▽ More Earth observation (EO) is a prime instrument for monitoring land and ocean processes, studying the dynamics at work, and taking the pulse of our planet. This article gives a bird's eye view of the essential scientific tools and approaches informing and supporting the transition from raw EO data to usable EO-based information. The promises, as well as the current challenges of these developments, are highlighted under dedicated sections. Specifically, we cover the impact of (i) Computer vision; (ii) Machine learning; (iii) Advanced processing and computing; (iv) Knowledge-based AI; (v) Explainable AI and causal inference; (vi) Physics-aware models; (vii) User-centric approaches; and (viii) the much-needed discussion of ethical and societal issues related to the massive use of ML technologies in EO. △ Less

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2301.03589 [pdf, other]

Explainable, Physics Aware, Trustworthy AI Paradigm Shift for Synthetic Aperture Radar

Authors: Mihai Datcu, Zhongling Huang, Andrei Anghel, Juanping Zhao, Remus Cacoveanu

Abstract: The recognition or understanding of the scenes observed with a SAR system requires a broader range of cues, beyond the spatial context. These encompass but are not limited to: imaging geometry, imaging mode, properties of the Fourier spectrum of the images or the behavior of the polarimetric signatures. In this paper, we propose a change of paradigm for explainability in data science for the case… ▽ More The recognition or understanding of the scenes observed with a SAR system requires a broader range of cues, beyond the spatial context. These encompass but are not limited to: imaging geometry, imaging mode, properties of the Fourier spectrum of the images or the behavior of the polarimetric signatures. In this paper, we propose a change of paradigm for explainability in data science for the case of Synthetic Aperture Radar (SAR) data to ground the explainable AI for SAR. It aims to use explainable data transformations based on well-established models to generate inputs for AI methods, to provide knowledgeable feedback for training process, and to learn or improve high-complexity unknown or un-formalized models from the data. At first, we introduce a representation of the SAR system with physical layers: i) instrument and platform, ii) imaging formation, iii) scattering signatures and objects, that can be integrated with an AI model for hybrid modeling. Successively, some illustrative examples are presented to demonstrate how to achieve hybrid modeling for SAR image understanding. The perspective of trustworthy model and supplementary explanations are discussed later. Finally, we draw the conclusion and we deem the proposed concept has applicability to the entire class of coherent imaging sensors and other computational imaging systems. △ Less

Submitted 9 January, 2023; originally announced January 2023.

arXiv:2210.16038 [pdf, other]

Deep Learning-Based Anomaly Detection in Synthetic Aperture Radar Imaging

Authors: Max Muzeau, Chengfang Ren, Sébastien Angelliaume, Mihai Datcu, Jean-Philippe Ovarlez

Abstract: In this paper, we proposed to investigate unsupervised anomaly detection in Synthetic Aperture Radar (SAR) images. Our approach considers anomalies as abnormal patterns that deviate from their surroundings but without any prior knowledge of their characteristics. In the literature, most model-based algorithms face three main issues. First, the speckle noise corrupts the image and potentially leads… ▽ More In this paper, we proposed to investigate unsupervised anomaly detection in Synthetic Aperture Radar (SAR) images. Our approach considers anomalies as abnormal patterns that deviate from their surroundings but without any prior knowledge of their characteristics. In the literature, most model-based algorithms face three main issues. First, the speckle noise corrupts the image and potentially leads to numerous false detections. Second, statistical approaches may exhibit deficiencies in modeling spatial correlation in SAR images. Finally, neural networks based on supervised learning approaches are not recommended due to the lack of annotated SAR data, notably for the class of abnormal patterns. Our proposed method aims to address these issues through a self-supervised algorithm. The speckle is first removed through the deep learning SAR2SAR algorithm. Then, an adversarial autoencoder is trained to reconstruct an anomaly-free SAR image. Finally, a change detection processing step is applied between the input and the output to detect anomalies. Experiments are performed to show the advantages of our method compared to the conventional Reed-Xiaoli algorithm, highlighting the importance of an efficient despeckling pre-processing step. △ Less

Submitted 28 October, 2022; originally announced October 2022.

Comments: 9 pages, 10 figures

arXiv:2209.15034 [pdf, other]

doi 10.1109/TGRS.2023.3272279

Guided Unsupervised Learning by Subaperture Decomposition for Ocean SAR Image Retrieval

Authors: Nicolae-Cătălin Ristea, Andrei Anghel, Mihai Datcu, Bertrand Chapron

Abstract: Spaceborne synthetic aperture radar (SAR) can provide accurate images of the ocean surface roughness day-or-night in nearly all weather conditions, being an unique asset for many geophysical applications. Considering the huge amount of data daily acquired by satellites, automated techniques for physical features extraction are needed. Even if supervised deep learning methods attain state-of-the-ar… ▽ More Spaceborne synthetic aperture radar (SAR) can provide accurate images of the ocean surface roughness day-or-night in nearly all weather conditions, being an unique asset for many geophysical applications. Considering the huge amount of data daily acquired by satellites, automated techniques for physical features extraction are needed. Even if supervised deep learning methods attain state-of-the-art results, they require great amount of labeled data, which are difficult and excessively expensive to acquire for ocean SAR imagery. To this end, we use the subaperture decomposition (SD) algorithm to enhance the unsupervised learning retrieval on the ocean surface, empowering ocean researchers to search into large ocean databases. We empirically prove that SD improve the retrieval precision with over 20% for an unsupervised transformer auto-encoder network. Moreover, we show that SD brings important performance boost when Doppler centroid images are used as input data, leading the way to new unsupervised physics guided retrieval algorithms. △ Less

Submitted 29 September, 2022; originally announced September 2022.

arXiv:2209.07799 [pdf, other]

doi 10.1109/JSTARS.2023.3316306

Quantum Transfer Learning for Real-World, Small, and High-Dimensional Datasets

Authors: Soronzonbold Otgonbaatar, Gottfried Schwarz, Mihai Datcu, Dieter Kranzlmüller

Abstract: Quantum machine learning (QML) networks promise to have some computational (or quantum) advantage for classifying supervised datasets (e.g., satellite images) over some conventional deep learning (DL) techniques due to their expressive power via their local effective dimension. There are, however, two main challenges regardless of the promised quantum advantage: 1) Currently available quantum bits… ▽ More Quantum machine learning (QML) networks promise to have some computational (or quantum) advantage for classifying supervised datasets (e.g., satellite images) over some conventional deep learning (DL) techniques due to their expressive power via their local effective dimension. There are, however, two main challenges regardless of the promised quantum advantage: 1) Currently available quantum bits (qubits) are very small in number, while real-world datasets are characterized by hundreds of high-dimensional elements (i.e., features). Additionally, there is not a single unified approach for embedding real-world high-dimensional datasets in a limited number of qubits. 2) Some real-world datasets are too small for training intricate QML networks. Hence, to tackle these two challenges for benchmarking and validating QML networks on real-world, small, and high-dimensional datasets in one-go, we employ quantum transfer learning composed of a multi-qubit QML network, and a very deep convolutional network (a with VGG16 architecture) extracting informative features from any small, high-dimensional dataset. We use real-amplitude and strongly-entangling N-layer QML networks with and without data re-uploading layers as a multi-qubit QML network, and evaluate their expressive power quantified by using their local effective dimension; the lower the local effective dimension of a QML network, the better its performance on unseen data. Our numerical results show that the strongly-entangling N-layer QML network has a lower local effective dimension than the real-amplitude QML network and outperforms it on the hard-to-classify three-class labelling problem. In addition, quantum transfer learning helps tackle the two challenges mentioned above for benchmarking and validating QML networks on real-world, small, and high-dimensional datasets. △ Less

Submitted 20 September, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

Comments: This article is published in IEEE jstars: https://ieeexplore.ieee.org/document/10253962

Journal ref: IEEE jstars, 18 September 2023

arXiv:2204.04691 [pdf]

Coreset of Hyperspectral Images on Small Quantum Computer

Authors: Soronzonbold Otgonbaatar, Mihai Datcu, Begüm Demir

Abstract: Machine Learning (ML) techniques are employed to analyze and process big Remote Sensing (RS) data, and one well-known ML technique is a Support Vector Machine (SVM). An SVM is a quadratic programming (QP) problem, and a D-Wave quantum annealer (D-Wave QA) promises to solve this QP problem more efficiently than a conventional computer. However, the D-Wave QA cannot solve directly the SVM due to its… ▽ More Machine Learning (ML) techniques are employed to analyze and process big Remote Sensing (RS) data, and one well-known ML technique is a Support Vector Machine (SVM). An SVM is a quadratic programming (QP) problem, and a D-Wave quantum annealer (D-Wave QA) promises to solve this QP problem more efficiently than a conventional computer. However, the D-Wave QA cannot solve directly the SVM due to its very few input qubits. Hence, we use a coreset ("core of a dataset") of given EO data for training an SVM on this small D-Wave QA. The coreset is a small, representative weighted subset of an original dataset, and any training models generate competitive classes by using the coreset in contrast to by using its original dataset. We measured the closeness between an original dataset and its coreset by employing a Kullback-Leibler (KL) divergence measure. Moreover, we trained the SVM on the coreset data by using both a D-Wave QA and a conventional method. We conclude that the coreset characterizes the original dataset with very small KL divergence measure. In addition, we present our KL divergence results for demonstrating the closeness between our original data and its coreset. As practical RS data, we use Hyperspectral Image (HSI) of Indian Pine, USA. △ Less

Submitted 12 April, 2022; v1 submitted 10 April, 2022; originally announced April 2022.

Comments: Accepted to IGARSS2022. You may not be able to access this article after the publication in the conference

arXiv:2204.04438 [pdf, other]

Guided deep learning by subaperture decomposition: ocean patterns from SAR imagery

Authors: Nicolae-Catalin Ristea, Andrei Anghel, Mihai Datcu, Bertrand Chapron

Abstract: Spaceborne synthetic aperture radar can provide meters scale images of the ocean surface roughness day or night in nearly all weather conditions. This makes it a unique asset for many geophysical applications. Sentinel 1 SAR wave mode vignettes have made possible to capture many important oceanic and atmospheric phenomena since 2014. However, considering the amount of data provided, expanding appl… ▽ More Spaceborne synthetic aperture radar can provide meters scale images of the ocean surface roughness day or night in nearly all weather conditions. This makes it a unique asset for many geophysical applications. Sentinel 1 SAR wave mode vignettes have made possible to capture many important oceanic and atmospheric phenomena since 2014. However, considering the amount of data provided, expanding applications requires a strategy to automatically process and extract geophysical parameters. In this study, we propose to apply subaperture decomposition as a preprocessing stage for SAR deep learning models. Our data centring approach surpassed the baseline by 0.7, obtaining state of the art on the TenGeoPSARwv data set. In addition, we empirically showed that subaperture decomposition could bring additional information over the original vignette, by rising the number of clusters for an unsupervised segmentation method. Overall, we encourage the development of data centring approaches, showing that, data preprocessing could bring significant performance improvements over existing deep learning models. △ Less

Submitted 9 April, 2022; originally announced April 2022.

arXiv:2110.14144 [pdf, other]

doi 10.1016/j.isprsjprs.2022.05.008

Physically Explainable CNN for SAR Image Classification

Authors: Zhongling Huang, Xiwen Yao, Ying Liu, Corneliu Octavian Dumitru, Mihai Datcu, Junwei Han

Abstract: Integrating the special electromagnetic characteristics of Synthetic Aperture Radar (SAR) in deep neural networks is essential in order to enhance the explainability and physics awareness of deep learning. In this paper, we first propose a novel physically explainable convolutional neural network for SAR image classification, namely physics guided and injected learning (PGIL). It comprises three p… ▽ More Integrating the special electromagnetic characteristics of Synthetic Aperture Radar (SAR) in deep neural networks is essential in order to enhance the explainability and physics awareness of deep learning. In this paper, we first propose a novel physically explainable convolutional neural network for SAR image classification, namely physics guided and injected learning (PGIL). It comprises three parts: (1) explainable models (XM) to provide prior physics knowledge, (2) physics guided network (PGN) to encode the knowledge into physics-aware features, and (3) physics injected network (PIN) to adaptively introduce the physics-aware features into classification pipeline for label prediction. A hybrid Image-Physics SAR dataset format is proposed for evaluation, with both Sentinel-1 and Gaofen-3 SAR data being experimented. The results show that the proposed PGIL substantially improve the classification performance in case of limited labeled data compared with the counterpart data-driven CNN and other pre-training methods. Additionally, the physics explanations are discussed to indicate the interpretability and the physical consistency preserved in the predictions. We deem the proposed method would promote the development of physically explainable deep learning in SAR image interpretation field. △ Less

Submitted 2 June, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

Journal ref: ISPRS Journal of Photogrammetry and Remote Sensing Volume 190, August 2022, Pages 25-37

arXiv:2108.13246 [pdf, other]

LUAI Challenge 2021 on Learning to Understand Aerial Images

Authors: Gui-Song Xia, Jian Ding, Ming Qian, Nan Xue, Jiaming Han, Xiang Bai, Michael Ying Yang, Shengyang Li, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang, Qiang Zhou, Chao-hui Yu, Kaixuan Hu, Yingjia Bu, Wenming Tan, Zhe Yang, Wei Li, Shang Liu, Jiaxuan Zhao, Tianzhi Ma, Zi-han Gao, Lingqi Wang , et al. (11 additional authors not shown)

Abstract: This report summarizes the results of Learning to Understand Aerial Images (LUAI) 2021 challenge held on ICCV 2021, which focuses on object detection and semantic segmentation in aerial images. Using DOTA-v2.0 and GID-15 datasets, this challenge proposes three tasks for oriented object detection, horizontal object detection, and semantic segmentation of common categories in aerial images. This cha… ▽ More This report summarizes the results of Learning to Understand Aerial Images (LUAI) 2021 challenge held on ICCV 2021, which focuses on object detection and semantic segmentation in aerial images. Using DOTA-v2.0 and GID-15 datasets, this challenge proposes three tasks for oriented object detection, horizontal object detection, and semantic segmentation of common categories in aerial images. This challenge received a total of 146 registrations on the three tasks. Through the challenge, we hope to draw attention from a wide range of communities and call for more efforts on the problems of learning to understand aerial images. △ Less

Submitted 17 September, 2021; v1 submitted 30 August, 2021; originally announced August 2021.

Comments: 7 pages, 2 figures, accepted by ICCVW 2021

arXiv:2104.09918 [pdf, other]

doi 10.1016/j.imavis.2020.104003

CrossATNet - A Novel Cross-Attention Based Framework for Sketch-Based Image Retrieval

Authors: Ushasi Chaudhuri, Biplab Banerjee, Avik Bhattacharya, Mihai Datcu

Abstract: We propose a novel framework for cross-modal zero-shot learning (ZSL) in the context of sketch-based image retrieval (SBIR). Conventionally, the SBIR schema mainly considers simultaneous mappings among the two image views and the semantic side information. Therefore, it is desirable to consider fine-grained classes mainly in the sketch domain using highly discriminative and semantically rich featu… ▽ More We propose a novel framework for cross-modal zero-shot learning (ZSL) in the context of sketch-based image retrieval (SBIR). Conventionally, the SBIR schema mainly considers simultaneous mappings among the two image views and the semantic side information. Therefore, it is desirable to consider fine-grained classes mainly in the sketch domain using highly discriminative and semantically rich feature space. However, the existing deep generative modeling-based SBIR approaches majorly focus on bridging the gaps between the seen and unseen classes by generating pseudo-unseen-class samples. Besides, violating the ZSL protocol by not utilizing any unseen-class information during training, such techniques do not pay explicit attention to modeling the discriminative nature of the shared space. Also, we note that learning a unified feature space for both the multi-view visual data is a tedious task considering the significant domain difference between sketches and color images. In this respect, as a remedy, we introduce a novel framework for zero-shot SBIR. While we define a cross-modal triplet loss to ensure the discriminative nature of the shared space, an innovative cross-modal attention learning strategy is also proposed to guide feature extraction from the image domain exploiting information from the respective sketch counterpart. In order to preserve the semantic consistency of the shared space, we consider a graph CNN-based module that propagates the semantic class topology to the shared space. To ensure an improved response time during inference, we further explore the possibility of representing the shared space in terms of hash codes. Experimental results obtained on the benchmark TU-Berlin and the Sketchy datasets confirm the superiority of CrossATNet in yielding state-of-the-art results. △ Less

Submitted 20 April, 2021; originally announced April 2021.

Comments: Accepted in Journal of Image and Vision Computing

arXiv:2102.12219 [pdf, other]

doi 10.1109/TPAMI.2021.3117983

Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges

Authors: Jian Ding, Nan Xue, Gui-Song Xia, Xiang Bai, Wen Yang, Micheal Ying Yang, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang

Abstract: In the past decade, object detection has achieved significant progress in natural images but not in aerial images, due to the massive variations in the scale and orientation of objects caused by the bird's-eye view of aerial images. More importantly, the lack of large-scale benchmarks has become a major obstacle to the development of object detection in aerial images (ODAI). In this paper,we prese… ▽ More In the past decade, object detection has achieved significant progress in natural images but not in aerial images, due to the massive variations in the scale and orientation of objects caused by the bird's-eye view of aerial images. More importantly, the lack of large-scale benchmarks has become a major obstacle to the development of object detection in aerial images (ODAI). In this paper,we present a large-scale Dataset of Object deTection in Aerial images (DOTA) and comprehensive baselines for ODAI. The proposed DOTA dataset contains 1,793,658 object instances of 18 categories of oriented-bounding-box annotations collected from 11,268 aerial images. Based on this large-scale and well-annotated dataset, we build baselines covering 10 state-of-the-art algorithms with over 70 configurations, where the speed and accuracy performances of each model have been evaluated. Furthermore, we provide a code library for ODAI and build a website for evaluating different algorithms. Previous challenges run on DOTA have attracted more than 1300 teams worldwide. We believe that the expanded large-scale DOTA dataset, the extensive baselines, the code library and the challenges can facilitate the designs of robust algorithms and reproducible research on the problem of object detection in aerial images. △ Less

Submitted 4 December, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

Comments: Accepted to IEEE TPAMI

ACM Class: I.4.8

arXiv:2008.05225 [pdf, other]

doi 10.1109/LGRS.2021.3056392

A Zero-Shot Sketch-based Inter-Modal Object Retrieval Scheme for Remote Sensing Images

Authors: Ushasi Chaudhuri, Biplab Banerjee, Avik Bhattacharya, Mihai Datcu

Abstract: Conventional existing retrieval methods in remote sensing (RS) are often based on a uni-modal data retrieval framework. In this work, we propose a novel inter-modal triplet-based zero-shot retrieval scheme utilizing a sketch-based representation of RS data. The proposed scheme performs efficiently even when the sketch representations are marginally prototypical of the image. We conducted experimen… ▽ More Conventional existing retrieval methods in remote sensing (RS) are often based on a uni-modal data retrieval framework. In this work, we propose a novel inter-modal triplet-based zero-shot retrieval scheme utilizing a sketch-based representation of RS data. The proposed scheme performs efficiently even when the sketch representations are marginally prototypical of the image. We conducted experiments on a new bi-modal image-sketch dataset called Earth on Canvas (EoC) conceived during this study. We perform a thorough bench-marking of this dataset and demonstrate that the proposed network outperforms other state-of-the-art methods for zero-shot sketch-based retrieval framework in remote sensing. △ Less

Submitted 12 August, 2020; originally announced August 2020.

arXiv:2001.01425 [pdf, other]

doi 10.1109/LGRS.2020.2965558

Classification of Large-Scale High-Resolution SAR Images with Deep Transfer Learning

Authors: Zhongling Huang, Corneliu Octavian Dumitru, Zongxu Pan, Bin Lei, Mihai Datcu

Abstract: The classification of large-scale high-resolution SAR land cover images acquired by satellites is a challenging task, facing several difficulties such as semantic annotation with expertise, changing data characteristics due to varying imaging parameters or regional target area differences, and complex scattering mechanisms being different from optical imaging. Given a large-scale SAR land cover da… ▽ More The classification of large-scale high-resolution SAR land cover images acquired by satellites is a challenging task, facing several difficulties such as semantic annotation with expertise, changing data characteristics due to varying imaging parameters or regional target area differences, and complex scattering mechanisms being different from optical imaging. Given a large-scale SAR land cover dataset collected from TerraSAR-X images with a hierarchical three-level annotation of 150 categories and comprising more than 100,000 patches, three main challenges in automatically interpreting SAR images of highly imbalanced classes, geographic diversity, and label noise are addressed. In this letter, a deep transfer learning method is proposed based on a similarly annotated optical land cover dataset (NWPU-RESISC45). Besides, a top-2 smooth loss function with cost-sensitive parameters was introduced to tackle the label noise and imbalanced classes' problems. The proposed method shows high efficiency in transferring information from a similarly annotated remote sensing dataset, a robust performance on highly imbalanced classes, and is alleviating the over-fitting problem caused by label noise. What's more, the learned deep model has a good generalization for other SAR-specific tasks, such as MSTAR target recognition with a state-of-the-art classification accuracy of 99.46%. △ Less

Submitted 6 January, 2020; originally announced January 2020.

Journal ref: IEEE Geoscience and Remote Sensing Letters 2020

arXiv:1904.04794 [pdf, other]

doi 10.1016/j.patrec.2020.02.006

CMIR-NET : A Deep Learning Based Model For Cross-Modal Retrieval In Remote Sensing

Authors: Ushasi Chaudhuri, Biplab Banerjee, Avik Bhattacharya, Mihai Datcu

Abstract: We address the problem of cross-modal information retrieval in the domain of remote sensing. In particular, we are interested in two application scenarios: i) cross-modal retrieval between panchromatic (PAN) and multi-spectral imagery, and ii) multi-label image retrieval between very high resolution (VHR) images and speech based label annotations. Notice that these multi-modal retrieval scenarios… ▽ More We address the problem of cross-modal information retrieval in the domain of remote sensing. In particular, we are interested in two application scenarios: i) cross-modal retrieval between panchromatic (PAN) and multi-spectral imagery, and ii) multi-label image retrieval between very high resolution (VHR) images and speech based label annotations. Notice that these multi-modal retrieval scenarios are more challenging than the traditional uni-modal retrieval approaches given the inherent differences in distributions between the modalities. However, with the growing availability of multi-source remote sensing data and the scarcity of enough semantic annotations, the task of multi-modal retrieval has recently become extremely important. In this regard, we propose a novel deep neural network based architecture which is considered to learn a discriminative shared feature space for all the input modalities, suitable for semantically coherent information retrieval. Extensive experiments are carried out on the benchmark large-scale PAN - multi-spectral DSRSID dataset and the multi-label UC-Merced dataset. Together with the Merced dataset, we generate a corpus of speech signals corresponding to the labels. Superior performance with respect to the current state-of-the-art is observed in all the cases. △ Less

Submitted 24 May, 2019; v1 submitted 9 April, 2019; originally announced April 2019.

arXiv:1807.07778 [pdf]

Dialectical GAN for SAR Image Translation: From Sentinel-1 to TerraSAR-X

Authors: Dongyang Ao, Corneliu Octavian Dumitru, Gottfried Schwarz, Mihai Datcu

Abstract: Contrary to optical images, Synthetic Aperture Radar (SAR) images are in different electromagnetic spectrum where the human visual system is not accustomed to. Thus, with more and more SAR applications, the demand for enhanced high-quality SAR images has increased considerably. However, high-quality SAR images entail high costs due to the limitations of current SAR devices and their image processi… ▽ More Contrary to optical images, Synthetic Aperture Radar (SAR) images are in different electromagnetic spectrum where the human visual system is not accustomed to. Thus, with more and more SAR applications, the demand for enhanced high-quality SAR images has increased considerably. However, high-quality SAR images entail high costs due to the limitations of current SAR devices and their image processing resources. To improve the quality of SAR images and to reduce the costs of their generation, we propose a Dialectical Generative Adversarial Network (Dialectical GAN) to generate high-quality SAR images. This method is based on the analysis of hierarchical SAR information and the "dialectical" structure of GAN frameworks. As a demonstration, a typical example will be shown where a low-resolution SAR image (e.g., a Sentinel-1 image) with large ground coverage is translated into a high-resolution SAR image (e.g., a TerraSAR-X image). Three traditional algorithms are compared, and a new algorithm is proposed based on a network framework by combining conditional WGAN-GP (Wasserstein Generative Adversarial Network - Gradient Penalty) loss functions and Spatial Gram matrices under the rule of dialectics. Experimental results show that the SAR image translation works very well when we compare the results of our proposed method with the selected traditional methods. △ Less

Submitted 20 July, 2018; originally announced July 2018.

Comments: 22 pages, 15 figures

arXiv:1711.10398 [pdf, other]

DOTA: A Large-scale Dataset for Object Detection in Aerial Images

Authors: Gui-Song Xia, Xiang Bai, Jian Ding, Zhen Zhu, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang

Abstract: Object detection is an important and challenging problem in computer vision. Although the past decade has witnessed major advances in object detection in natural scenes, such successes have been slow to aerial imagery, not only because of the huge variation in the scale, orientation and shape of the object instances on the earth's surface, but also due to the scarcity of well-annotated datasets of… ▽ More Object detection is an important and challenging problem in computer vision. Although the past decade has witnessed major advances in object detection in natural scenes, such successes have been slow to aerial imagery, not only because of the huge variation in the scale, orientation and shape of the object instances on the earth's surface, but also due to the scarcity of well-annotated datasets of objects in aerial scenes. To advance object detection research in Earth Vision, also known as Earth Observation and Remote Sensing, we introduce a large-scale Dataset for Object deTection in Aerial images (DOTA). To this end, we collect $2806$ aerial images from different sensors and platforms. Each image is of the size about 4000-by-4000 pixels and contains objects exhibiting a wide variety of scales, orientations, and shapes. These DOTA images are then annotated by experts in aerial image interpretation using $15$ common object categories. The fully annotated DOTA images contains $188,282$ instances, each of which is labeled by an arbitrary (8 d.o.f.) quadrilateral To build a baseline for object detection in Earth Vision, we evaluate state-of-the-art object detection algorithms on DOTA. Experiments demonstrate that DOTA well represents real Earth Vision applications and are quite challenging. △ Less

Submitted 19 May, 2019; v1 submitted 28 November, 2017; originally announced November 2017.

Comments: Accepted to CVPR 2018

Journal ref: IEEE Conference on Computer Vision and Pattern Recognition, 2018

arXiv:1711.02010 [pdf, other]

Artificial Generation of Big Data for Improving Image Classification: A Generative Adversarial Network Approach on SAR Data

Authors: Dimitrios Marmanis, Wei Yao, Fathalrahman Adam, Mihai Datcu, Peter Reinartz, Konrad Schindler, Jan Dirk Wegner, Uwe Stilla

Abstract: Very High Spatial Resolution (VHSR) large-scale SAR image databases are still an unresolved issue in the Remote Sensing field. In this work, we propose such a dataset and use it to explore patch-based classification in urban and periurban areas, considering 7 distinct semantic classes. In this context, we investigate the accuracy of large CNN classification models and pre-trained networks for SAR… ▽ More Very High Spatial Resolution (VHSR) large-scale SAR image databases are still an unresolved issue in the Remote Sensing field. In this work, we propose such a dataset and use it to explore patch-based classification in urban and periurban areas, considering 7 distinct semantic classes. In this context, we investigate the accuracy of large CNN classification models and pre-trained networks for SAR imaging systems. Furthermore, we propose a Generative Adversarial Network (GAN) for SAR image generation and test, whether the synthetic data can actually improve classification accuracy. △ Less

Submitted 6 November, 2017; originally announced November 2017.

Comments: Submitted for review in "Big Data from Space 2017" conference

arXiv:1707.07321 [pdf, other]

Exploiting Deep Features for Remote Sensing Image Retrieval: A Systematic Investigation

Authors: Xin-Yi Tong, Gui-Song Xia, Fan Hu, Yanfei Zhong, Mihai Datcu, Liangpei Zhang

Abstract: Remote sensing (RS) image retrieval is of great significant for geological information mining. Over the past two decades, a large amount of research on this task has been carried out, which mainly focuses on the following three core issues: feature extraction, similarity metric and relevance feedback. Due to the complexity and multiformity of ground objects in high-resolution remote sensing (HRRS)… ▽ More Remote sensing (RS) image retrieval is of great significant for geological information mining. Over the past two decades, a large amount of research on this task has been carried out, which mainly focuses on the following three core issues: feature extraction, similarity metric and relevance feedback. Due to the complexity and multiformity of ground objects in high-resolution remote sensing (HRRS) images, there is still room for improvement in the current retrieval approaches. In this paper, we analyze the three core issues of RS image retrieval and provide a comprehensive review on existing methods. Furthermore, for the goal to advance the state-of-the-art in HRRS image retrieval, we focus on the feature extraction issue and delve how to use powerful deep representations to address this task. We conduct systematic investigation on evaluating correlative factors that may affect the performance of deep features. By optimizing each factor, we acquire remarkable retrieval results on publicly available HRRS datasets. Finally, we explain the experimental phenomenon in detail and draw conclusions according to our analysis. Our work can serve as a guiding role for the research of content-based RS image retrieval. △ Less

Submitted 20 November, 2019; v1 submitted 23 July, 2017; originally announced July 2017.

arXiv:1612.01337 [pdf, other]

Classification With an Edge: Improving Semantic Image Segmentation with Boundary Detection

Authors: Dimitrios Marmanis, Konrad Schindler, Jan Dirk Wegner, Silvano Galliani, Mihai Datcu, Uwe Stilla

Abstract: We present an end-to-end trainable deep convolutional neural network (DCNN) for semantic segmentation with built-in awareness of semantically meaningful boundaries. Semantic segmentation is a fundamental remote sensing task, and most state-of-the-art methods rely on DCNNs as their workhorse. A major reason for their success is that deep networks learn to accumulate contextual information over very… ▽ More We present an end-to-end trainable deep convolutional neural network (DCNN) for semantic segmentation with built-in awareness of semantically meaningful boundaries. Semantic segmentation is a fundamental remote sensing task, and most state-of-the-art methods rely on DCNNs as their workhorse. A major reason for their success is that deep networks learn to accumulate contextual information over very large windows (receptive fields). However, this success comes at a cost, since the associated loss of effecive spatial resolution washes out high-frequency details and leads to blurry object boundaries. Here, we propose to counter this effect by combining semantic segmentation with semantically informed edge detection, thus making class-boundaries explicit in the model, First, we construct a comparatively simple, memory-efficient model by adding boundary detection to the Segnet encoder-decoder architecture. Second, we also include boundary detection in FCN-type models and set up a high-end classifier ensemble. We show that boundary detection significantly improves semantic segmentation with CNNs. Our high-end ensemble achieves > 90% overall accuracy on the ISPRS Vaihingen benchmark. △ Less

Submitted 21 December, 2017; v1 submitted 5 December, 2016; originally announced December 2016.

Journal ref: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 135, January 2018, Pages 158-172, ISSN 0924-2716, https://doi.org/10.1016/j.isprsjprs.2017.11.009

arXiv:1402.3405 [pdf, ps, other]

doi 10.1016/j.patrec.2014.01.019

Authorship Analysis based on Data Compression

Authors: Daniele Cerra, Mihai Datcu, Peter Reinartz

Abstract: This paper proposes to perform authorship analysis using the Fast Compression Distance (FCD), a similarity measure based on compression with dictionaries directly extracted from the written texts. The FCD computes a similarity between two documents through an effective binary search on the intersection set between the two related dictionaries. In the reported experiments the proposed method is app… ▽ More This paper proposes to perform authorship analysis using the Fast Compression Distance (FCD), a similarity measure based on compression with dictionaries directly extracted from the written texts. The FCD computes a similarity between two documents through an effective binary search on the intersection set between the two related dictionaries. In the reported experiments the proposed method is applied to documents which are heterogeneous in style, written in five different languages and coming from different historical periods. Results are comparable to the state of the art and outperform traditional compression-based methods. △ Less

Submitted 14 February, 2014; originally announced February 2014.

arXiv:1307.1289 [pdf, other]

doi 10.1016/j.patrec.2013.05.025

Further results on dissimilarity spaces for hyperspectral images RF-CBIR

Authors: Miguel Angel Veganzones, Mihai Datcu, Manuel Graña

Abstract: Content-Based Image Retrieval (CBIR) systems are powerful search tools in image databases that have been little applied to hyperspectral images. Relevance feedback (RF) is an iterative process that uses machine learning techniques and user's feedback to improve the CBIR systems performance. We pursued to expand previous research in hyperspectral CBIR systems built on dissimilarity functions define… ▽ More Content-Based Image Retrieval (CBIR) systems are powerful search tools in image databases that have been little applied to hyperspectral images. Relevance feedback (RF) is an iterative process that uses machine learning techniques and user's feedback to improve the CBIR systems performance. We pursued to expand previous research in hyperspectral CBIR systems built on dissimilarity functions defined either on spectral and spatial features extracted by spectral unmixing techniques, or on dictionaries extracted by dictionary-based compressors. These dissimilarity functions were not suitable for direct application in common machine learning techniques. We propose to use a RF general approach based on dissimilarity spaces which is more appropriate for the application of machine learning algorithms to the hyperspectral RF-CBIR. We validate the proposed RF method for hyperspectral CBIR systems over a real hyperspectral dataset. △ Less

Submitted 4 July, 2013; originally announced July 2013.

Comments: In Pattern Recognition Letters (2013)

Report number: veganzones_PRL2013

Journal ref: Pattern Recognition Letters 34, 14 (2013) 1659-1668

arXiv:1210.0758 [pdf]

doi 10.1016/j.jvcir.2011.10.009

A fast compression-based similarity measure with applications to content-based image retrieval

Authors: Daniele Cerra, Mihai Datcu

Abstract: Compression-based similarity measures are effectively employed in applications on diverse data types with a basically parameter-free approach. Nevertheless, there are problems in applying these techniques to medium-to-large datasets which have been seldom addressed. This paper proposes a similarity measure based on compression with dictionaries, the Fast Compression Distance (FCD), which reduces t… ▽ More Compression-based similarity measures are effectively employed in applications on diverse data types with a basically parameter-free approach. Nevertheless, there are problems in applying these techniques to medium-to-large datasets which have been seldom addressed. This paper proposes a similarity measure based on compression with dictionaries, the Fast Compression Distance (FCD), which reduces the complexity of these methods, without degradations in performance. On its basis a content-based color image retrieval system is defined, which can be compared to state-of-the-art methods based on invariant color features. Through the FCD a better understanding of compression-based techniques is achieved, by performing experiments on datasets which are larger than the ones analyzed so far in literature. △ Less

Submitted 2 October, 2012; originally announced October 2012.

Comments: Pre-print

Journal ref: Journal of Visual Communication and Image Representation, vol. 23, no. 2, pp. 293-302, 2012

arXiv:0709.3013 [pdf, ps, other]

Supervised learning on graphs of spatio-temporal similarity in satellite image sequences

Authors: Patrick Héas, Mihai Datcu

Abstract: High resolution satellite image sequences are multidimensional signals composed of spatio-temporal patterns associated to numerous and various phenomena. Bayesian methods have been previously proposed in (Heas and Datcu, 2005) to code the information contained in satellite image sequences in a graph representation using Bayesian methods. Based on such a representation, this paper further present… ▽ More High resolution satellite image sequences are multidimensional signals composed of spatio-temporal patterns associated to numerous and various phenomena. Bayesian methods have been previously proposed in (Heas and Datcu, 2005) to code the information contained in satellite image sequences in a graph representation using Bayesian methods. Based on such a representation, this paper further presents a supervised learning methodology of semantics associated to spatio-temporal patterns occurring in satellite image sequences. It enables the recognition and the probabilistic retrieval of similar events. Indeed, graphs are attached to statistical models for spatio-temporal processes, which at their turn describe physical changes in the observed scene. Therefore, we adjust a parametric model evaluating similarity types between graph patterns in order to represent user-specific semantics attached to spatio-temporal phenomena. The learning step is performed by the incremental definition of similarity types via user-provided spatio-temporal pattern examples attached to positive or/and negative semantics. From these examples, probabilities are inferred using a Bayesian network and a Dirichlet model. This enables to links user interest to a specific similarity model between graph patterns. According to the current state of learning, semantic posterior probabilities are updated for all possible graph patterns so that similar spatio-temporal phenomena can be recognized and retrieved from the image sequence. Few experiments performed on a multi-spectral SPOT image sequence illustrate the proposed spatio-temporal recognition method. △ Less

Submitted 20 September, 2007; v1 submitted 19 September, 2007; originally announced September 2007.

Showing 1–24 of 24 results for author: Datcu, M