CVPR 2017
Attention-Aware Face Hallucination via Deep Reinforcement Learning
Qingxing Cao, Liang Lin, Yukai Shi, Xiaodan Liang, Guanbin Li


Face hallucination is a domain-specific super-resolution problem whose goal is to generate high-resolution (HR) faces from low-resolution (LR) input images. Existing methods often learn a single patch-to-patch mapping from LR to HR images and disregard the contextual interdependency between patches. In contrast, we propose a novel Attention-aware Face Hallucination (Attention-FH) framework that uses deep reinforcement learning to sequentially discover attended patches and then enhance each facial part by fully exploiting the global interdependency of the image. Specifically, at each time step, a recurrent policy network dynamically specifies a new attended region by incorporating what happened in the past. The state (i.e., the face hallucination result for the whole image) can thus be exploited and updated by the local enhancement network on the selected region. The Attention-FH approach jointly learns the recurrent policy network and the local enhancement network by maximizing the long-term reward that reflects the hallucination performance over the whole image. As a result, Attention-FH can adaptively personalize an optimal search path for each face image according to its own characteristics. Extensive experiments show that our approach significantly surpasses state-of-the-art methods on in-the-wild faces with large pose and illumination variations.




We propose an Attention-aware Face Hallucination (Attention-FH) framework that recurrently discovers facial parts and enhances them by fully exploiting the global interdependency of the image, as shown in Fig. 1. In particular, to account for the diverse characteristics of face images in blurriness, pose, illumination and facial appearance, we search for an optimal enhancement route tailored to each face. We use deep reinforcement learning (RL) to train the model, since this technique has proven effective at globally optimizing sequential models without supervision at every step.
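As a rough illustration of what "optimizing without supervision at every step" means, the standard score-function (REINFORCE) form of a policy-gradient objective is written out here. This is a textbook identity, not a formula taken from this summary. The symbols are assumptions for illustration: \theta are the policy parameters, a_t is the location chosen at step t, s_t is the state the policy sees at step t, T is the sequence length, and R is the single global reward obtained at the end of the sequence.

```latex
J(\theta) = \mathbb{E}_{a_{1:T} \sim \pi_\theta}\left[ R \right],
\qquad
\nabla_\theta J(\theta)
  = \mathbb{E}_{a_{1:T} \sim \pi_\theta}\left[ R \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right]
```

Only the final reward R appears in the gradient, so no per-step label for the "correct" attended region is needed.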


Figure 1: Sequentially discovering and enhancing facial parts in our Attention-FH framework. At each time step, our framework specifies an attended region based on the past hallucination results and enhances it by considering the global perspective of the whole face. The red solid bounding boxes indicate the latest perceived patch in each step and the blue dashed bounding boxes indicate all the previously enhanced regions. We adopt a global reward at the end of the sequence to drive the learning of the framework under the reinforcement learning paradigm.
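The attend-and-enhance loop described in the caption can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: images are nested lists of floats, `choose_location` and `enhance_patch` are hypothetical callables standing in for the recurrent policy network and the local enhancement network, the gray padding value of 0.5 is arbitrary, and the negative mean squared error reward is an assumed choice of global reward.

```python
def extract_patch(image, top, left, size, pad=0.5):
    """Crop a size x size patch; pixels outside the image get the pad (gray) value."""
    h, w = len(image), len(image[0])
    return [[image[r][c] if 0 <= r < h and 0 <= c < w else pad
             for c in range(left, left + size)]
            for r in range(top, top + size)]


def paste_patch(image, patch, top, left):
    """Return a copy of image with patch written back, clipped to the image borders."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for i, row in enumerate(patch):
        for j, value in enumerate(row):
            r, c = top + i, left + j
            if 0 <= r < h and 0 <= c < w:
                out[r][c] = value
    return out


def hallucinate(image, choose_location, enhance_patch, steps, size):
    """Run `steps` attend-and-enhance iterations; return the result and visited locations."""
    history = []
    for _ in range(steps):
        # Policy stand-in: picks the next region from the current result and past actions.
        top, left = choose_location(image, history)
        patch = extract_patch(image, top, left, size)
        # Enhancement stand-in: sees both the local patch and the whole current image.
        enhanced = enhance_patch(patch, image)
        image = paste_patch(image, enhanced, top, left)
        history.append((top, left))
    return image, history


def global_reward(result, target):
    """Assumed reward: negative mean squared error, computed once after the last step."""
    n = len(target) * len(target[0])
    return -sum((a - b) ** 2
                for ra, rb in zip(result, target)
                for a, b in zip(ra, rb)) / n
```

Because the reward is only evaluated on the final image, the order in which regions are visited is left entirely to the policy stand-in.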


Figure 2: Network architecture of our recurrent policy network and local enhancement network. At each time step, the recurrent policy network takes the current hallucination result I_{t-1} and the action history vector encoded by an LSTM (512 hidden states) as input, and outputs the action probabilities for all W * H locations, where W and H are the width and height of the input image. The policy network first encodes I_{t-1} with one fully-connected layer (256 neurons), and then fuses the encoded image and the action vector with an LSTM layer. Finally, a fully-connected linear layer is appended to generate the W * H-way probabilities. Given the probability map, we extract the local patch, then pass the patch and I_{t-1} into the local enhancement network to generate the enhanced patch. The local enhancement network consists of two fully-connected layers (each with 256 neurons) for encoding I_{t-1} and 8 cascaded convolutional layers for image patch enhancement. A new face hallucination result is then generated by replacing the local patch with the enhanced patch.
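The step from the W * H-way output to a concrete patch location might look like the sketch below. It is illustrative only: the softmax over the final linear layer's outputs, the row-major layout of the flat index, and centering the patch on the sampled location are all assumptions, and `sample_location` and `patch_origin` are hypothetical helper names.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a flat list of W * H scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_location(logits, width, rng=random):
    """Sample a flat index from the W * H-way distribution and map it to (row, col)."""
    probs = softmax(logits)
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Assumed row-major layout: index = row * width + col.
    return divmod(index, width), probs[index]


def patch_origin(center, size):
    """Top-left corner of a size x size patch centered at `center` (assumed convention)."""
    row, col = center
    return row - size // 2, col - size // 2
```

With a centered patch, a location near the border yields a negative origin, so part of the patch lies outside the image and has to be padded.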




Table 1: Comparison between our method and others in terms of the PSNR, SSIM and FSIM evaluation metrics. Bold marks the best result and underline marks the second best in each column.
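Of the three metrics, PSNR is the simplest to state: it is 10 * log10(peak^2 / MSE), in decibels. The sketch below computes it for two equally sized grayscale images given as nested lists. The 8-bit peak value of 255 is an assumption, since the evaluation protocol (color space, cropping, value range) is not specified here.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    diffs = [(a - b) ** 2
             for ra, rb in zip(reference, estimate)
             for a, b in zip(ra, rb)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

Higher is better; a lower mean squared error against the ground-truth HR image gives a higher PSNR.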


Figure 3: Qualitative results on LFW-funneled with a scaling factor of 8.

Figure 4: Example results of enhancement sequences and the corresponding patches selected by the agent. Gray in some patches indicates area outside the original image. Best viewed by zooming in on the electronic version.



