Image saliency detection has recently witnessed rapid progress due to deep convolutional neural networks. However, none of the existing methods is able to identify object instances within the detected salient regions. In this paper, we present a salient instance segmentation method that produces a saliency mask with distinct object instance labels for an input image. Our method consists of three steps: estimating a saliency map, detecting salient object contours, and identifying salient object instances. For the first two steps, we propose a multiscale saliency refinement network, which generates high-quality salient region masks and salient object contours. Once integrated with multiscale combinatorial grouping and a MAP-based subset optimization framework, our method generates very promising salient object instance segmentation results. To promote further research and evaluation of salient instance segmentation, we also construct a new database of 1000 images with pixelwise salient instance annotations. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on all public benchmarks for salient region detection as well as on our new dataset for salient instance segmentation.
As shown above, our method for salient instance segmentation consists of four cascaded components: salient region detection, salient object contour detection, salient instance generation, and salient instance refinement.
Specifically, we propose a deep multiscale refinement network and apply it to both salient region detection and salient object contour detection. Next, we generate a fixed number of salient object proposals on the basis of the detected salient object contours and apply a subset optimization method to further screen these proposals. Finally, the results of the previous three steps are integrated in a CRF model to generate the final salient instance segmentation.
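The cascade above can be summarized as a simple data-flow sketch. The function names and placeholder logic below are ours, not the paper's: in the actual system the first two stages are the multiscale refinement network, the third is MCG followed by MAP-based subset optimization, and the fourth is a CRF. Here each stage is replaced by a toy stand-in so that only the interfaces between stages are illustrated.

```python
import numpy as np

def detect_salient_regions(image):
    # Stage 1: per-pixel saliency in [0, 1] (an MSRNet branch in the paper);
    # here a trivial grayscale placeholder.
    return image.mean(axis=2)

def detect_salient_contours(image):
    # Stage 2: per-pixel contour probability (a second MSRNet branch in the
    # paper); here the gradient magnitude of the toy saliency map.
    gy, gx = np.gradient(detect_salient_regions(image))
    return np.hypot(gx, gy)

def generate_instance_proposals(contours, num_proposals=4):
    # Stage 3: a fixed number of salient object proposals derived from the
    # contours (MCG + subset optimization in the paper); here empty masks.
    h, w = contours.shape
    return [np.zeros((h, w), dtype=bool) for _ in range(num_proposals)]

def refine_instances(saliency, proposals):
    # Stage 4: fuse the saliency map with the screened proposals into an
    # instance label map (a CRF in the paper); label 0 is background.
    labels = np.zeros(saliency.shape, dtype=np.int32)
    for k, mask in enumerate(proposals, start=1):
        labels[np.logical_and(mask, saliency > 0.5)] = k
    return labels

def salient_instance_segmentation(image):
    saliency = detect_salient_regions(image)
    contours = detect_salient_contours(image)
    proposals = generate_instance_proposals(contours)
    return refine_instances(saliency, proposals)
```

The point of the sketch is the cascaded structure: each stage consumes only the outputs of earlier stages, so any single component can be swapped out without changing the others' interfaces.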
The architecture of our multiscale refinement network
As salient instance segmentation is a completely new problem, no suitable datasets exist. To promote the study of this problem, we have built a new dataset with pixelwise salient instance labels. We initially collected 1388 images. To reduce ambiguity in salient region detection results, these images were mostly selected from existing datasets for salient region detection, including the ECSSD, DUT-OMRON, HKU-IS, and MSO datasets.
Evaluation on Salient Region Detection
Comparison of quantitative results including maximum F-measure (larger is better) and MAE (smaller is better). The best three results on each dataset are shown in red, blue, and green, respectively. Note that the training set of DHSNet includes the testing sets of MSRA-B and DUT-OMRON, and the entire MSRA-B dataset is used as part of the training set of RFCN. The corresponding test results are excluded here.
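The two metrics in this table are standard in salient region detection and straightforward to implement. The sketch below is ours, not the paper's evaluation code; it assumes the saliency map and ground truth are both normalized to [0, 1], and uses the conventional weighting beta^2 = 0.3, which emphasizes precision over recall.

```python
import numpy as np

def mae(sal, gt):
    # Mean absolute error between a saliency map and the binary ground
    # truth, both assumed to lie in [0, 1].
    return float(np.mean(np.abs(sal - gt)))

def max_f_measure(sal, gt, beta2=0.3, num_thresholds=255):
    # Maximum F-measure over a sweep of binarization thresholds.
    # beta2 = 0.3 is the usual convention in the saliency literature.
    gt = gt.astype(bool)
    best = 0.0
    for t in np.linspace(0.0, 1.0, num_thresholds, endpoint=False):
        pred = sal > t
        tp = np.logical_and(pred, gt).sum()
        if tp == 0:
            continue
        precision = tp / pred.sum()
        recall = tp / gt.sum()
        f = (1 + beta2) * precision * recall / (beta2 * precision + recall)
        best = max(best, f)
    return best
```

A perfect binary prediction yields MAE of 0 and a maximum F-measure of 1; sweeping thresholds makes the F-measure independent of any single binarization choice.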
Visual comparison of saliency maps from state-of-the-art methods, including our MSRNet. The ground truth (GT) is shown in the last column. MSRNet consistently produces saliency maps closest to the ground truth.
Evaluation on Salient Instance Segmentation
Examples of salient instance segmentation results by our MSRNet based framework.
Quantitative benchmark results of salient object contour detection and salient instance segmentation on our new dataset.
Q. Yan, L. Xu, J. Shi, and J. Jia. Hierarchical saliency detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1155–1162, 2013.
C. Yang, L. Zhang, H. Lu, X. Ruan, and M.-H. Yang. Saliency detection via graph-based manifold ranking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3166–3173, 2013.
G. Li and Y. Yu. Visual saliency based on multiscale deep features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
J. Zhang, S. Sclaroff, Z. Lin, X. Shen, B. Price, and R. Mech. Unconstrained salient object detection via proposal subset optimization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.