To avoid an exhaustive search over locations and scales, current state-of-the-art object detection systems usually rely on a crucial component that generates a batch of candidate object proposals from images. In this paper, we present a simple yet effective approach for segmenting object proposals via a deep architecture of recursive neural networks (RNNs), which hierarchically groups regions to detect object candidates over scales. Unlike traditional methods that mainly adopt fixed similarity measures for merging regions or finding object proposals, our approach adaptively learns the region merging similarity and the objectness measure during the process of hierarchical region grouping. Specifically, guided by a structured loss, the RNN model jointly optimizes the cross-region similarity metric together with the region merging process and the objectness prediction. During inference for object proposal generation, we introduce randomness into the greedy search to cope with the ambiguity of grouping regions. Extensive experiments on standard benchmarks, e.g., PASCAL VOC and ImageNet, show that our approach produces object proposals with high recall while well preserving object boundaries, and outperforms existing methods in both accuracy and efficiency.
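The inference procedure described above — greedily merging adjacent regions under a learned similarity metric, with randomness injected into the greedy choice, and ranking every intermediate grouping by predicted objectness — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function and parameter names (`generate_proposals`, `similarity`, `objectness`, `temperature`) are hypothetical, regions are modeled as frozensets of superpixel ids, and the learned RNN modules are stood in for by plain callables.

```python
import random

def generate_proposals(regions, similarity, objectness, temperature=1.0, seed=0):
    """Randomized greedy hierarchical region grouping (illustrative sketch).

    `regions` is a list of frozensets of superpixel ids; `similarity(a, b)`
    scores how mergeable two regions are; `objectness(r)` scores how
    object-like a (merged) region is.  In the paper these scores come from
    learned RNN modules; here they are arbitrary callables.
    """
    rng = random.Random(seed)
    active = list(regions)
    # Every region ever formed is a candidate proposal, scored by objectness.
    proposals = [(r, objectness(r)) for r in active]
    while len(active) > 1:
        # Score all candidate pairs with the (learned) similarity metric.
        pairs = [(similarity(a, b), a, b)
                 for i, a in enumerate(active) for b in active[i + 1:]]
        pairs.sort(reverse=True, key=lambda p: p[0])
        # Instead of always taking the arg-max pair, sample among the top
        # candidates -- this is the randomness added to the greedy search.
        top = pairs[:max(1, int(temperature * len(pairs) ** 0.5))]
        _, a, b = rng.choice(top)
        merged = a | b
        active = [r for r in active if r not in (a, b)] + [merged]
        proposals.append((merged, objectness(merged)))
    # Rank all groupings encountered during merging by predicted objectness.
    proposals.sort(key=lambda p: -p[1])
    return [r for r, _ in proposals]
```

Running the grouping several times with different seeds and pooling the results is one simple way such randomized search can diversify the proposal set.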
Fig. An overview of our proposed object proposal segmentation framework. The bottom shows local feature extraction, and the top illustrates the bottom-up recursive region grouping process. The four modules, Fs, Fc, Fm, and Fo, work cooperatively to group regions for generating object proposals.
Fig. Comparison of our proposed method and other state-of-the-art approaches on the PASCAL VOC 2007 test set.
Fig. Comparison of our proposed method and other state-of-the-art approaches on the PASCAL VOC 2012 validation set.
Fig. Comparison of our proposed method and other state-of-the-art approaches on the ImageNet 2015 validation set.
Fig. Qualitative examples of our object proposals. Ground-truth boxes are shown in green and red, with green indicating that the object is found and red indicating that it is not. The blue boxes are the object proposals with the highest IoU to each ground-truth box, and the blue silhouettes are the corresponding object contours. All samples are taken from the PASCAL VOC dataset.
SS – J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders, “Selective search for object recognition,” Int. J. Comput. Vis., vol. 104, no. 2, pp. 154–171, 2013.
RP – S. Manen, M. Guillaumin, and L. Van Gool, “Prime object proposals with randomized prim’s algorithm,” in Proc. IEEE Int. Conf. Comput. Vis. IEEE, Dec. 2013, pp. 2536–2543.
PRN – S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Proc. Adv. Neural Inf. Process. Syst., Dec. 2015.
EB – C. L. Zitnick and P. Dollár, “Edge boxes: Locating object proposals from edges,” in Proc. Eur. Conf. Comput. Vis. Springer, Sep. 2014, pp. 391–405.
BING – M.-M. Cheng, Z. Zhang, W.-Y. Lin, and P. Torr, “BING: Binarized normed gradients for objectness estimation at 300fps,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. IEEE, Jun. 2014, pp. 3286–3293.
CADM – Y. Xiao, C. Lu, E. Tsougenis, Y. Lu, and C.-K. Tang, “Complexity adaptive distance metric for object proposals generation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. IEEE, Jun. 2015, pp. 778–786.
GOP – P. Krähenbühl and V. Koltun, “Geodesic object proposals,” in Proc. Eur. Conf. Comput. Vis. Springer, Sep. 2014, pp. 725–739.
MHS – C. Wang, L. Zhao, S. Liang, L. Zhang, J. Jia, and Y. Wei, “Object proposal by multi-branch hierarchical segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. IEEE, Jun. 2015, pp. 3873–3881.
LPO – P. Krähenbühl and V. Koltun, “Learning to propose objects,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. IEEE, Jun. 2015, pp. 1574–1582.
MCG – J. Pont-Tuset, P. Arbelaez, J. T. Barron, F. Marques, and J. Malik, “Multiscale combinatorial grouping for image segmentation and object proposal generation,” arXiv preprint arXiv:1503.00848, 2015.