This project develops an integrated system for clothing co-parsing: given a database of clothing/human images, all unsegmented but annotated with tags, the goal is to jointly parse the images into semantic clothing configurations. To incorporate prior knowledge of clothing, we present a semantic template that organizes diverse clothing tags according to the spatial layout of the human body and garment co-occurrence. Guided by this semantic template, we propose a framework consisting of two optimization phases:
(i) image co-segmentation for extracting clothing regions: we first group regions within each image, then propagate and refine the segmentations jointly across all images by employing exemplar-SVMs;
(ii) region co-labeling for recognizing clothing components: we assign a garment tag to each region by modeling the problem as a multi-image graphical model.
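As a rough sketch of the propagation step in phase (i), an exemplar-SVM can be trained with a single segmented region as the only positive and regions from other images as negatives, then used to score candidate regions elsewhere. The feature representation, function names, `C_pos`, and the use of scikit-learn's `LinearSVC` below are illustrative assumptions, not the paper's exact training procedure:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_exemplar_svm(pos_feat, neg_feats, C_pos=1.0):
    """Train a linear SVM with a single positive exemplar region.

    pos_feat:  (d,) feature vector of the exemplar region (hypothetical features).
    neg_feats: (n, d) feature vectors of regions from other images.
    """
    X = np.vstack([pos_feat[None, :], neg_feats])
    y = np.array([1] + [0] * len(neg_feats))  # one positive vs. many negatives
    return LinearSVC(C=C_pos).fit(X, y)

def propagate(clf, cand_feats, thresh=0.0):
    """Score candidate regions in other images; candidates scoring above
    the threshold receive the exemplar's segmentation hypothesis."""
    scores = clf.decision_function(cand_feats)
    return np.where(scores > thresh)[0]
```

In this sketch, each confidently segmented region spawns one detector, so segmentation hypotheses spread from easy images to harder ones; the threshold trades propagation coverage against noise.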
Table: average recall of frequently occurring garment items on the Fashionista dataset.
- aPA: average Pixel Accuracy (%)
- mAGR: mean Average Garment Recall (%)
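The two metrics above can be computed from predicted and ground-truth label maps. The sketch below is a minimal interpretation: aPA averages per-image pixel accuracy, and mAGR averages per-garment recall pooled over images. The pooling choice is an assumption; the paper's evaluation protocol may differ in detail:

```python
import numpy as np

def average_pixel_accuracy(preds, gts):
    """aPA: fraction of pixels whose predicted label matches the ground
    truth, averaged over images."""
    return float(np.mean([np.mean(p == g) for p, g in zip(preds, gts)]))

def garment_recall(preds, gts, label):
    """Recall for one garment label: correctly labeled pixels of that
    garment over all ground-truth pixels of it, pooled over images
    (hypothetical pooling; per-image averaging is another option)."""
    tp = sum(int(np.sum((p == label) & (g == label))) for p, g in zip(preds, gts))
    total = sum(int(np.sum(g == label)) for g in gts)
    return tp / total if total else 0.0

def mean_average_garment_recall(preds, gts, labels):
    """mAGR: mean of the per-garment recalls over a set of garment labels."""
    return float(np.mean([garment_recall(preds, gts, l) for l in labels]))
```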
Some parsing results are shown in the following figure. Our method parses clothing accurately even under challenging illumination and against complex backgrounds. Moreover, it can parse small garments such as belts, purses, hats, and sunglasses. For ambiguous clothing patterns such as a dotted t-shirt or a colorful dress, our framework gives satisfactory results. In addition, the proposed method can parse several people in a single image simultaneously.
- K. Yamaguchi, H. Kiapour, L. E. Ortiz, and T. L. Berg. Parsing clothing in fashion photographs. In CVPR, 2012.
- X. Liu, B. Cheng, S. Yan, J. Tang, T. S. Chua, and H. Jin. Label to region by bi-layer sparsity priors. In ACM MM, 2009.
- J. Shotton, M. Johnson, and R. Cipolla. Semantic texton forests for image categorization and segmentation. In CVPR, 2008.