Deep-Like Hashing-in-Hash for Visual Retrieval: An Embarrassingly Simple Method

IEEE Trans Image Process. 2020 Jul 30:PP. doi: 10.1109/TIP.2020.3011796. Online ahead of print.

Abstract

Existing hashing methods have achieved strong performance in image and multimedia retrieval and can be broadly categorized into two groups: shallow hashing and deep hashing. However, both groups still suffer from intrinsic limitations. Shallow hashing generally adopts a one-step strategy to learn hashing codes that discover discriminative binary features, but the latent discriminative information contained in the learned codes is not well exploited. Deep hashing, built on deep neural networks, can learn highly discriminative and compact features, but it relies on large-scale data and heavy computation to tune numerous network parameters via back-propagation; training a deep hashing model from scratch on small-scale data is almost impossible. To develop an efficient yet effective learning-to-hash algorithm that depends only on small-scale data, we propose a novel non-neural-network, deep-like learning framework for image retrieval: a multi-level cascaded hashing (MCH) approach with a hierarchical learning strategy. The contributions are threefold. First, a hashing-in-hash architecture is designed in MCH, which inherits the excellent traits of traditional neural-network-based deep learning, so that discriminative binary features beneficial to image retrieval can be effectively captured. Second, at each level, the binary features of all preceding levels are cascaded with the visual appearance feature and fed as inputs to all subsequent levels for retraining, which fully exploits the implicit discriminative information. Third, a basic learning-to-hash (BLH) model with a label constraint is proposed for hierarchical learning. Without loss of generality, existing hashing models can be easily integrated into the MCH framework. Experiments on small- and large-scale visual retrieval tasks show that our method outperforms several state-of-the-art approaches.
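To make the cascading idea concrete, the sketch below illustrates the multi-level structure described above: each level learns binary codes from the original visual features concatenated with the codes of all preceding levels. The base hasher here (ridge regression to class-specific bipolar targets, binarized by sign) is only a hypothetical stand-in, not the paper's BLH model, and all function names and parameters are illustrative assumptions.

```python
import numpy as np

def fit_base_hasher(features, labels, n_bits, reg=1.0):
    """Stand-in for a basic learning-to-hash step with a (loose) label constraint:
    ridge regression from features to class-specific bipolar targets.
    This is NOT the authors' BLH model, only a placeholder."""
    n, d = features.shape
    rng = np.random.default_rng(0)
    # Samples of the same class regress onto the same random bipolar target code.
    class_codes = {c: rng.choice([-1.0, 1.0], size=n_bits) for c in np.unique(labels)}
    targets = np.stack([class_codes[c] for c in labels])
    # Closed-form ridge regression: W = (X^T X + reg*I)^{-1} X^T T
    W = np.linalg.solve(features.T @ features + reg * np.eye(d), features.T @ targets)
    return W

def mch_cascade(features, labels, n_levels=3, n_bits=32):
    """Multi-level cascade: at each level, concatenate the original visual
    features with the binary codes of all preceding levels and retrain."""
    projections, codes = [], []
    current_input = features
    for _ in range(n_levels):
        W = fit_base_hasher(current_input, labels, n_bits)
        B = np.sign(current_input @ W)            # binary codes of this level
        projections.append(W)
        codes.append(B)
        # Cascade: visual features plus all preceding binary codes feed the next level.
        current_input = np.hstack([features] + codes)
    return projections, np.hstack(codes)          # final code: concatenation of all levels

# Toy usage on random data (purely illustrative).
X = np.random.randn(200, 64)
y = np.random.randint(0, 10, size=200)
_, B = mch_cascade(X, y, n_levels=3, n_bits=16)
print(B.shape)  # (200, 48)
```

In this sketch the retrieval code is simply the concatenation of the per-level codes; any existing shallow hashing model could replace the placeholder base hasher, which mirrors the paper's claim that existing hashing models can be integrated into the framework.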