This project investigates how to perform robust and efficient video segmentation while suppressing the effects of data noise and corruption. We propose a general algorithm, called Sub-Optimal Low-rank Decomposition (SOLD), which pursues a low-rank representation for video segmentation. Given the data matrix formed from the supervoxel representation of an observed video sequence, SOLD seeks a sub-optimal solution by fixing the matrix rank explicitly. In particular, the affinity matrix with fixed rank is decomposed into two low-rank submatrices, which we then optimize iteratively with closed-form solutions. Moreover, we incorporate a discriminative replication prior into SOLD, based on the observation that small-size video patterns tend to recur frequently within the same object. The Normalized-Cut (NCut) algorithm is then applied to the low-rank representation to segment the video into several spatio-temporal regions. The video is processed in a streaming fashion, i.e., by sequentially segmenting batches of frames, and we further design several temporal consistency constraints to improve robustness. Extensive experiments on two challenging public datasets, VSB100 and SegTrack, suggest that our framework outperforms other video segmentation approaches in both accuracy and efficiency.
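The two core steps described above, a fixed-rank factorization optimized by alternating closed-form updates followed by a normalized-cut partition, can be illustrated with a minimal NumPy sketch. This is not the SOLD objective itself: it assumes a generic ridge-regularized factorization A ≈ U Vᵀ, omits the replication prior and the temporal consistency constraints, and performs only a two-way split using the standard spectral relaxation of NCut. The function names `fixed_rank_factorize` and `ncut_bipartition`, the parameter `lam`, and the toy affinity matrix are all hypothetical and chosen for illustration.

```python
import numpy as np


def fixed_rank_factorize(A, rank, lam=1e-2, n_iter=50, seed=0):
    """Approximate A by U @ V.T with an explicitly fixed rank.

    Assumed objective (illustrative, not the SOLD objective):
        ||A - U V^T||_F^2 + lam * (||U||_F^2 + ||V||_F^2)
    Each factor has a closed-form ridge solution when the other is fixed.
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape
    U = rng.standard_normal((n, rank))
    V = rng.standard_normal((m, rank))
    reg = lam * np.eye(rank)
    history = []
    for _ in range(n_iter):
        # U = A V (V^T V + lam I)^{-1}, with V held fixed
        U = np.linalg.solve(V.T @ V + reg, V.T @ A.T).T
        # V = A^T U (U^T U + lam I)^{-1}, with U held fixed
        V = np.linalg.solve(U.T @ U + reg, U.T @ A).T
        resid = A - U @ V.T
        history.append(np.sum(resid ** 2) + lam * (np.sum(U ** 2) + np.sum(V ** 2)))
    return U, V, history


def ncut_bipartition(W):
    """Two-way normalized cut via the spectral relaxation.

    W must be a symmetric, non-negative affinity matrix with positive row sums.
    Returns a 0/1 label per node (which group gets which label is arbitrary).
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Eigenvector of the second-smallest eigenvalue, mapped back by D^{-1/2}
    indicator = d_inv_sqrt * vecs[:, 1]
    return (indicator > 0).astype(int)


# Toy data: 12 "supervoxels" in two groups, with a noisy block-structured affinity.
rng = np.random.default_rng(1)
clean = np.kron(np.eye(2), np.ones((6, 6)))
noisy = clean + 0.05 * rng.standard_normal(clean.shape)

U, V, history = fixed_rank_factorize(noisy, rank=2)
Z = U @ V.T                          # low-rank representation
W = 0.5 * (np.abs(Z) + np.abs(Z).T)  # symmetric, non-negative affinity
labels = ncut_bipartition(W)
```

Because each update exactly minimizes the assumed objective over one factor, the values in `history` never increase, which is the sense in which alternating closed-form steps reach a (generally sub-optimal) stationary solution. Segmenting into more than two regions would require several eigenvectors plus a clustering step, which this sketch leaves out.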
Fig. 1. Illustration of the unsupervised and interactive segmentation results of our method. Different colors indicate different regions. The first row shows representative frames from the video sequences; the dashed lines in the third column indicate user interactions, where red denotes the foreground and blue denotes the background. Our results and the corresponding ground truth are shown in the middle and last rows, respectively.
Fig. 2. Quantitative results of SOLD compared with other video segmentation methods, including Boundary Precision-Recall (BPR) curves, Volume Precision-Recall (VPR) curves, and a summary of the aggregate performance measures: Optimal Dataset Scale (ODS), Optimal Segmentation Scale (OSS), and Average Precision (AP) for both BPR and VPR.
Fig. 3. Sample unsupervised segmentation results generated by SOLD. Different colors indicate different regions.
Fig. 4. Sample interactive segmentation results generated by SOLD.
In this project, we have proposed a general algorithm for low-rank representation pursuit that decomposes the matrix with a fixed rank, and we have proved that a sub-optimal solution can be achieved by alternating closed-form optimization. Based on this algorithm, we have developed an effective and efficient framework that automatically segments streaming videos in both unsupervised and interactive modes. Extensive experiments on standard benchmarks have demonstrated the superior performance of our approach over other video segmentation methods. In future work, we will improve our video segmentation framework by introducing more robust video features and over-segmentation methods. Our low-rank decomposition algorithm can also be extended to other vision tasks such as multi-object tracking and saliency detection.