Extracting informative image features and learning effective approximate hashing functions are two crucial steps in image retrieval. Conventional methods often study these two steps separately, e.g., by learning hash functions from a predefined hand-crafted feature space. Moreover, most previous methods preset the bit length of the output hashing codes, which neglects the differing significance of individual bits and restricts practical flexibility. To address these issues, we propose a supervised learning framework that generates compact and bit-scalable hashing codes directly from raw images. We pose hashing learning as a problem of regularized similarity learning. Specifically, we organize the training images into a batch of triplet samples, each containing two images with the same label and one with a different label. With these triplet samples, we maximize the margin between matched pairs and mismatched pairs in the Hamming space. In addition, a regularization term is introduced to enforce adjacency consistency, i.e., images of similar appearance should have similar codes. A deep convolutional neural network is used to train the model in an end-to-end fashion, so that discriminative image features and hash functions are optimized simultaneously. Furthermore, each bit of our hashing codes is unequally weighted, so that we can adjust the code length by truncating the insignificant bits. Our framework outperforms state-of-the-art methods on public benchmarks for similar image search and also achieves promising results in person re-identification for surveillance. We also show that the generated bit-scalable hashing codes preserve their discriminative power well at shorter code lengths.
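The triplet idea in the abstract can be illustrated with a small sketch. It assumes codes with entries in {-1, +1} and uses a generic hinge-style triplet loss; the function names and the margin value are hypothetical, and this is not necessarily the exact objective optimized in the paper (which also includes a regularization term and is trained through a relaxed, differentiable form).

```python
def hamming_distance(a, b):
    """Number of positions where two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def triplet_hinge_loss(anchor, positive, negative, margin=2):
    """Zero once the mismatched pair is at least `margin` bits farther
    from the anchor than the matched pair; positive otherwise.
    The margin of 2 is an illustrative assumption."""
    d_pos = hamming_distance(anchor, positive)
    d_neg = hamming_distance(anchor, negative)
    return max(0, margin + d_pos - d_neg)


# Toy 8-bit codes (made up for illustration only).
anchor   = [ 1, -1,  1,  1, -1, -1,  1, -1]
positive = [ 1, -1,  1, -1, -1, -1,  1, -1]  # same label as anchor
negative = [-1,  1,  1,  1,  1, -1, -1, -1]  # different label

loss_satisfied = triplet_hinge_loss(anchor, positive, negative)
loss_violated = triplet_hinge_loss(anchor, negative, positive)  # roles swapped
print(loss_satisfied, loss_violated)
```

In the first call the matched pair is already closer than the mismatched pair by more than the margin, so the triplet contributes no loss; swapping the roles produces a positive penalty.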
The bit-scalable deep hashing learning framework. The bottom panel shows the deep architecture of the neural network, which takes raw images as inputs and produces the hashing code together with the weight matrix. The training stage is illustrated in the upper-left panel, where we train the network with triplet-based similarity learning. An example of hashing retrieval is presented in the upper-right panel, where similarity is measured by the Hamming affinity.
Fig. 1. The results on the MNIST dataset. (a) Precision curves within Hamming radius 2; (b) Precision curves with top 500 returned; (c) Precision curves with 64 hash bits. Note that DSCH and DRSCH are different versions of the proposed method.
Fig. 2. The results on the CIFAR-10 dataset. (a) Precision curves within Hamming radius 2; (b) Precision curves with top 500 returned; (c) Precision curves with 64 hash bits. Note that DSCH and DRSCH are different versions of the proposed method.
Fig. 3. The results on the NUS-WIDE dataset. (a) Precision curves within Hamming radius 2; (b) Precision curves with top 500 returned; (c) Precision curves with 64 hash bits. Note that DSCH and DRSCH are different versions of the proposed method.
Fig. 4. The results on the CIFAR-20 dataset. (a) Precision curves within Hamming radius 2; (b) Precision curves with top 500 returned; (c) Precision curves with 64 hash bits. Note that DSCH and DRSCH are different versions of the proposed method.
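The two retrieval metrics named in these captions can be sketched as follows. This is a minimal illustration that assumes single-label images and a database given as (code, label) pairs; the function names are hypothetical. Treating a query that retrieves nothing within the radius as zero precision is an assumed convention here, not something stated in the captions.

```python
def hamming_distance(a, b):
    """Number of positions where two equal-length codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def precision_within_radius(query_code, query_label, database, radius=2):
    """Fraction of items within `radius` bits of the query that share
    its label. Returns 0.0 if nothing is retrieved (assumed convention)."""
    labels = [label for code, label in database
              if hamming_distance(query_code, code) <= radius]
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == query_label) / len(labels)


def precision_at_k(query_code, query_label, database, k):
    """Fraction of the k nearest items (by Hamming ranking) sharing the label."""
    ranked = sorted(database,
                    key=lambda item: hamming_distance(query_code, item[0]))
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for _, label in top if label == query_label) / len(top)


# Toy 6-bit database (made up for illustration only).
query = [1, 1, 1, 1, 1, 1]
database = [
    ([1, 1, 1, 1, 1, 1], "cat"),         # distance 0
    ([1, 1, 1, 1, 1, -1], "cat"),        # distance 1
    ([1, 1, 1, 1, -1, -1], "dog"),       # distance 2
    ([1, 1, 1, -1, -1, -1], "cat"),      # distance 3
    ([-1, -1, -1, -1, -1, -1], "dog"),   # distance 6
]

p_radius = precision_within_radius(query, "cat", database, radius=2)
p_top4 = precision_at_k(query, "cat", database, k=4)
print(p_radius, p_top4)
```

For a multi-label dataset, the label comparison would need to be replaced by a test for shared labels; that variant is not shown here.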
Evaluation of Bit-Scalable Hashing
Fig. 5. Precision@500 vs. #bits. (a) MNIST dataset; (b) CIFAR-10 dataset; (c) NUS-WIDE dataset; (d) CIFAR-20 dataset. Note that BS-DRSCH is the bit-scalable version of our method.
Fig. 6. Visual comparison. Image retrieval results (top 10 returned images) for ten CIFAR-10 test images using Hamming ranking on 32-bit hash codes. The left column shows the query images. The middle 10 columns show the top images returned by the fixed-length hashing learning algorithm. The right 10 columns show the top images returned by the bit-scalable learning method. Red rectangles indicate mistakes. Note that for bit-scalable hashing, we train a neural network with 64-bit output and select the 32 bits with the largest weights for testing.
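The bit selection described in this caption (train with a longer code, then keep the bits with the largest weights) can be sketched as below. The helper names, the toy 8-bit code, and the weight values are hypothetical, and the weighted inner product is used only as an assumed stand-in for a weighted Hamming affinity; the paper's exact definition may differ.

```python
def top_weighted_bits(weights, n_keep):
    """Indices of the n_keep largest weights, returned in ascending order."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(order[:n_keep])


def truncate(values, keep):
    """Keep only the entries at the selected bit positions."""
    return [values[i] for i in keep]


def weighted_affinity(a, b, weights):
    """Weighted inner product of two {-1, +1} codes (assumed similarity)."""
    return sum(w * x * y for w, x, y in zip(weights, a, b))


# Toy example: shrink an 8-bit code to 4 bits (the caption uses 64 -> 32).
weights = [0.9, 0.1, 0.7, 0.05, 0.6, 0.2, 0.8, 0.3]
code = [1, -1, 1, 1, -1, -1, 1, -1]

keep = top_weighted_bits(weights, 4)
short_code = truncate(code, keep)
short_weights = truncate(weights, keep)
self_affinity = weighted_affinity(short_code, short_code, short_weights)
print(keep, short_code, self_affinity)
```

Because the same index set is applied to every database code and every query, the shorter codes can be compared directly without retraining the network.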
Application to Person Re-Identification
Table 1. Experimental results of person re-identification on the CUHK03 dataset using manually labeled pedestrian bounding boxes, evaluated with the CMC approach. The first two methods are ours with different numbers of hash bits. The next two are another deep hash learning algorithm with different code lengths. Methods V~X are three cascade hash learning methods applying CNN features. The last three are state-of-the-art algorithms for person re-identification.
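For readers unfamiliar with the CMC (cumulative matching characteristic) measure mentioned in the caption, a minimal sketch follows. It assumes a single-shot setting in which each query has exactly one true match in the gallery and its 1-based rank is already known; the function name and the rank values are hypothetical and are not results from the table.

```python
def cmc_curve(true_match_ranks, max_rank):
    """CMC value at rank k is the fraction of queries whose true match
    appears within the top k gallery results, for k = 1..max_rank."""
    n = len(true_match_ranks)
    if n == 0:
        raise ValueError("at least one query is required")
    return [sum(1 for r in true_match_ranks if r <= k) / n
            for k in range(1, max_rank + 1)]


# Made-up ranks of the true match for five queries.
ranks = [1, 3, 2, 1, 5]
curve = cmc_curve(ranks, max_rank=5)
print(curve)
```

The first entry of the curve is the rank-1 identification rate, and the curve is non-decreasing by construction.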
- BRE- B. Kulis and T. Darrell, “Learning to hash with binary reconstructive embeddings,” in NIPS, 2009.
- MLH- M. Norouzi and D. J. Fleet, “Minimal loss hashing for compact binary codes,” in ICML, 2011.
- KSH- W. Liu, J. Wang, R. Ji, Y. Jiang, and S. Chang, “Supervised hashing with kernels,” in CVPR, 2012.
- DSRH- F. Zhao, Y. Huang, L. Wang, and T. Tan, “Deep semantic ranking based hashing for multi-label image retrieval,” in CVPR, 2015.
- FPNN- W. Li, R. Zhao, T. Xiao, and X. Wang, “DeepReID: Deep filter pairing neural network for person re-identification,” in CVPR, 2014.
- KISSME- M. Köstinger, M. Hirzer, P. Wohlhart, P. M. Roth, and H. Bischof, “Large scale metric learning from equivalence constraints,” in CVPR, 2012.
- eSDC- R. Zhao, W. Ouyang, and X. Wang, “Unsupervised salience learning for person re-identification,” in CVPR, 2013.
- S. Ding, L. Lin, G. Wang, and H. Chao, “Deep feature learning with relative distance comparison for person re-identification,” Pattern Recognition, vol. 48, no. 10, pp. 2993–3003, 2015.