Pedestrian detection has arguably been treated as a special topic beyond general object detection. Although recent deep learning object detectors such as Fast/Faster R-CNN [1,2] have shown excellent performance for general object detection, they have had limited success in detecting pedestrians, and previous leading pedestrian detectors were generally hybrid methods combining hand-crafted and deep convolutional features. In this paper, we investigate issues involving Faster R-CNN for pedestrian detection. We discover that the Region Proposal Network (RPN) in Faster R-CNN indeed performs well as a stand-alone pedestrian detector, but, surprisingly, the downstream classifier degrades the results. We argue that two reasons account for the unsatisfactory accuracy: (i) insufficient resolution of the feature maps for handling small instances, and (ii) the lack of any bootstrapping strategy for mining hard negative examples. Driven by these observations, we propose a very simple but effective baseline for pedestrian detection, using an RPN followed by boosted forests on shared, high-resolution convolutional feature maps. We comprehensively evaluate this method on several benchmarks (Caltech, INRIA, ETH, and KITTI), presenting competitive accuracy and good speed. Code will be made publicly available.
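The bootstrapping idea in point (ii) can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper: the one-dimensional threshold classifier `fit_stump` is a toy stand-in for the boosted forest, the scalar inputs stand in for features pooled from RPN proposals, and the names `bootstrap`, `n_init`, `n_hard`, and `rounds` are assumptions chosen only for illustration.

```python
def fit_stump(pos, neg):
    """Toy stand-in for a boosted forest (assumption, not the paper's model):
    a 1-D threshold placed midway between the class means."""
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    return (mean_pos + mean_neg) / 2.0


def score(threshold, x):
    """Higher score means 'more pedestrian-like' under the toy model."""
    return x - threshold


def bootstrap(pos, neg_pool, rounds=3, n_init=20, n_hard=20):
    """Hard negative mining by bootstrapping.

    Start from the first n_init negatives, then repeatedly add the n_hard
    unused negatives that the current model scores highest, and retrain.
    """
    train_neg = list(neg_pool[:n_init])
    remaining = list(neg_pool[n_init:])
    model = fit_stump(pos, train_neg)
    for _ in range(rounds):
        if not remaining:
            break
        # Rank unused negatives: the highest-scoring ones are the hardest.
        remaining.sort(key=lambda x: score(model, x), reverse=True)
        hard, remaining = remaining[:n_hard], remaining[n_hard:]
        train_neg.extend(hard)
        model = fit_stump(pos, train_neg)  # retrain with the mined negatives
    return model, train_neg
```

In this toy setting each round pulls in the negatives closest to the positives, so the learned threshold shifts toward the positive class as harder negatives accumulate.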
Fig.1: Comparisons on the Caltech set (legends indicate MR).
Fig.2: Comparisons on the Caltech set using IoU > 0.7 to determine True Positives (legends indicate MR).
Fig.3: Comparisons on the Caltech-New set (legends indicate MR$_{-2}$ (MR$_{-4}$)).
Fig.4: Comparisons on the INRIA dataset (legends indicate MR).
Fig.5: Comparisons on the ETH dataset (legends indicate MR).
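The evaluation conventions named in the captions above can be sketched in Python. This is an illustrative sketch under stated assumptions, not the benchmarks' official evaluation code: it assumes that MR denotes the log-average miss rate sampled at reference false-positives-per-image (FPPI) values evenly spaced in log space, that boxes are `(x1, y1, x2, y2)` tuples, that the curve is given with FPPI in ascending order, and that a reference point lying before the start of the curve counts as a miss rate of 1. The function and parameter names are hypothetical.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(detection, ground_truth, threshold=0.7):
    """Stricter matching rule of Fig.2: a match requires IoU > threshold."""
    return iou(detection, ground_truth) > threshold


def log_average_miss_rate(fppi, miss_rate, lo_exp=-2, hi_exp=0, num_points=9):
    """Geometric mean of miss rates sampled at num_points FPPI references
    evenly spaced in log10 between 10**lo_exp and 10**hi_exp.

    fppi must be sorted in ascending order; miss_rate holds the matching
    miss rates. num_points must be at least 2.
    """
    refs = [
        10.0 ** (lo_exp + (hi_exp - lo_exp) * k / (num_points - 1))
        for k in range(num_points)
    ]
    log_sum = 0.0
    for ref in refs:
        sampled = 1.0  # assumption: no curve point at or below ref => miss rate 1
        for x, y in zip(fppi, miss_rate):
            if x <= ref:
                sampled = y  # keep the last point with FPPI <= ref
            else:
                break
        log_sum += math.log(max(sampled, 1e-10))  # floor avoids log(0)
    return math.exp(log_sum / num_points)
```

With the defaults the references span FPPI from 10^-2 to 10^0; passing `lo_exp=-4` widens the range to start at 10^-4, with the number of reference points left to the caller.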
Table 1: Comparisons on the KITTI dataset collected at the time of submission (Feb 2016). The timing records are collected from the KITTI leaderboard. †: region proposal running time ignored (estimated 2s).
[1] Ross Girshick. Fast R-CNN. In IEEE International Conference on Computer Vision (ICCV), 2015.
[2] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Neural Information Processing Systems (NIPS), 2015.