We propose a reconfigurable model to recognize and detect multiclass (or multiview) objects with large variation in appearance. Compared with well-established hierarchical models, we study two advanced capabilities of the hierarchy for object modeling: (i) “switch” variables (i.e., or-nodes) for specifying alternative compositions, and (ii) local classifiers (i.e., leaf-nodes) shared among different classes. These capabilities let us account for structural variability while keeping the model compact.
Fig.1 An example of the proposed 4-layer And-Or graph model for multiclass object recognition.
Our model, in the form of an And-Or Graph, comprises four layers: a batch of leaf-nodes with collaborative edges at the bottom for localizing object parts; or-nodes above them that activate their child leaf-nodes; and-nodes that classify objects as a whole; and one root-node at the top, itself an or-node, that switches among classes. For model training, we present an EM-type algorithm, namely dynamical structural optimization (DSO), that iteratively determines the structural configuration (e.g., which leaf-nodes are generated under each parent or-node and which are shared across classes) while optimizing the multi-layer parameters. The proposed method is validated on challenging databases, e.g., PASCAL VOC 2007 and UIUC-People, and achieves state-of-the-art performance.
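The four layers can be pictured with a small data-structure sketch. This is only an illustration under simplifying assumptions, not the released implementation: the class names (`LeafNode`, `OrNode`, `AndNode`, `RootNode`), the linear part scores, and the per-leaf feature dictionary are all hypothetical, and the collaborative edges between leaf-nodes are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical input format: one feature vector per leaf-node name.
Features = Dict[str, List[float]]


@dataclass
class LeafNode:
    """Layer 1: local part classifier (assumed linear for this sketch)."""
    name: str
    weights: List[float]

    def score(self, features: Features) -> float:
        x = features[self.name]
        return sum(w * v for w, v in zip(self.weights, x))


@dataclass
class OrNode:
    """Layer 2: switch variable that activates its best-scoring child."""
    children: List[LeafNode]

    def best(self, features: Features) -> Tuple[float, str]:
        return max((leaf.score(features), leaf.name) for leaf in self.children)


@dataclass
class AndNode:
    """Layer 3: classifies the object as a whole by combining its or-nodes."""
    label: str
    parts: List[OrNode]

    def score(self, features: Features) -> Tuple[float, List[str]]:
        picks = [part.best(features) for part in self.parts]
        return sum(s for s, _ in picks), [name for _, name in picks]


@dataclass
class RootNode:
    """Layer 4: top or-node that switches among classes."""
    classes: List[AndNode]

    def classify(self, features: Features) -> Tuple[str, float, List[str]]:
        scored = [(a.label, *a.score(features)) for a in self.classes]
        return max(scored, key=lambda item: item[1])


# Toy graph: the "leg" leaf-node is one object referenced by both classes.
leg = LeafNode("leg", [1.0, 1.0])
sheep_head_front = LeafNode("sheep_head_front", [1.0, 0.0])
sheep_head_side = LeafNode("sheep_head_side", [0.0, 1.0])
horse_head = LeafNode("horse_head", [1.0, 1.0])

root = RootNode([
    AndNode("sheep", [OrNode([sheep_head_front, sheep_head_side]), OrNode([leg])]),
    AndNode("horse", [OrNode([horse_head]), OrNode([leg])]),
])

features = {
    "leg": [0.3, 0.3],
    "sheep_head_front": [0.2, 0.0],
    "sheep_head_side": [0.0, 0.9],
    "horse_head": [0.1, 0.2],
}
label, score, active = root.classify(features)
print(label, active)
```

In this toy input the sheep and-node wins, and its head or-node activates the side-view leaf-node rather than the frontal one, which is the kind of alternative composition the or-nodes are meant to express.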
The main properties of our approach can be highlighted as:
- Model reconfigurability: Inspired by the And-Or graph models in [2,3], we develop “switch variables”, namely or-nodes, to specify alternative compositions in the hierarchy. It is worth mentioning that the association of or-nodes with their child leaf-nodes is determined automatically during model training. In Fig.1, the sheep head is localized by the leaf-node that is activated by its parent or-node.
- Model sharing: The leaf-nodes can be shared among different classes, which keeps the model compact when representing multiple object categories. For example, in Fig.1, the feet in the horse and sheep categories have similar appearances, and thus both can be detected by a leaf-node shared across the two classes.
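The compactness gained from sharing can be illustrated with a toy pooling routine. In the actual model, sharing is decided during training; the sketch below instead assumes a much simpler, hypothetical criterion (cosine similarity between part templates above a made-up threshold) purely to show how two classes can point at the same leaf-node.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_shared_pool(
    class_parts: Dict[str, Dict[str, List[float]]],
    threshold: float = 0.95,  # hypothetical similarity threshold
) -> Tuple[List[List[float]], Dict[Tuple[str, str], int]]:
    """Greedily reuse an existing leaf-node when a part template is similar enough.

    Returns the pool of leaf-node templates and, for every (class, part),
    the index of the leaf-node that serves it.
    """
    pool: List[List[float]] = []
    assignment: Dict[Tuple[str, str], int] = {}
    for cls, parts in class_parts.items():
        for part, template in parts.items():
            for idx, existing in enumerate(pool):
                if cosine(template, existing) >= threshold:
                    assignment[(cls, part)] = idx  # share the existing leaf-node
                    break
            else:
                pool.append(template)  # no similar leaf-node: create a new one
                assignment[(cls, part)] = len(pool) - 1
    return pool, assignment


# Made-up templates: the feet of horse and sheep look alike, the heads do not.
class_parts = {
    "horse": {"head": [1.0, 0.0, 0.0], "feet": [0.0, 1.0, 0.10]},
    "sheep": {"head": [0.0, 0.0, 1.0], "feet": [0.0, 1.0, 0.12]},
}
pool, assignment = build_shared_pool(class_parts)
unshared_count = sum(len(parts) for parts in class_parts.values())
print(len(pool), "leaf-nodes with sharing vs", unshared_count, "without")
```

With these made-up templates the two feet parts collapse into one leaf-node, so the pool holds three leaf-nodes instead of four.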
Dynamical Structural Optimization (DSO)
This learning algorithm is an EM-type procedure that incorporates structure reconfiguration and parameter estimation. It extends the concave-convex procedure (CCCP) [4]. During each iteration, the algorithm dynamically creates and removes leaf-nodes associated with their parent or-nodes, and shares leaf-nodes among classes.
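The alternation between structure reconfiguration and parameter estimation can be sketched with a deliberately simplified toy. The code below is not DSO and does not implement CCCP: it assumes 1-D "part appearances", treats each leaf-node as a single prototype value, and uses hypothetical thresholds (`create_dist`, `min_support`) to decide when a leaf-node is created or removed. Sharing across classes is omitted.

```python
from typing import Dict, List


def alternate_structure_and_params(
    samples: List[float],
    leaves: List[float],
    create_dist: float = 2.0,  # hypothetical: farther than this creates a new leaf-node
    min_support: int = 2,      # hypothetical: leaf-nodes with fewer samples are removed
    max_iters: int = 10,
) -> List[float]:
    """Toy alternating scheme over the leaf-node prototypes of one or-node."""
    leaves = list(leaves)
    for _ in range(max_iters):
        # Step 1 (structure): with parameters fixed, assign every sample to its
        # nearest leaf-node, creating a new leaf-node when none fits well.
        groups: Dict[int, List[float]] = {i: [] for i in range(len(leaves))}
        for x in samples:
            i = min(range(len(leaves)), key=lambda k: abs(x - leaves[k]))
            if abs(x - leaves[i]) > create_dist:
                leaves.append(x)
                i = len(leaves) - 1
                groups[i] = []
            groups[i].append(x)

        # Step 2 (parameters): with assignments fixed, re-estimate each
        # leaf-node and remove those with too little support.
        updated = [sum(g) / len(g) for g in groups.values() if len(g) >= min_support]
        if not updated or updated == leaves:
            break  # nothing left to update, or the configuration is stable
        leaves = updated
    return leaves


samples = [0.0, 0.2, 0.1, 5.0, 5.2, 5.1]
# Start with one useful leaf-node and one that matches no sample.
final_leaves = alternate_structure_and_params(samples, leaves=[0.0, 100.0])
print(final_leaves)
```

Starting from one useful and one unused prototype, the loop creates a leaf-node for the second cluster and removes the unused one, mirroring the create and remove moves described above in a far simpler setting.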
Fig.2 Dynamical Structural Optimization. (a) The model structure after the first iteration; (b) a new leaf-node is created to recognize the sheep head; (c) a leaf-node for the sheep leg is shared with the horse; (d) a leaf-node for the horse leg is removed.
We evaluate our method on two challenging datasets: UIUC people and PASCAL VOC 2007.
Fig.3 Visualization of the trained model on the UIUC people dataset. (a) shows parts of the model with two classes (views); (b) visualizes two detectors composed of 9 activated leaf-nodes.
We show the results of detection as follows.
Table.1 Detection accuracies on the UIUC people dataset. (Ours(full): our full system; Ours-3: And-Or Graph model without sharing leaf-nodes)
Table.2 Results on PASCAL VOC 2007. We also simplify the model by setting the parameters of the collaborative edges to zero. Two models are generated under this setting, denoted “Ours-1” and “Ours-2”, with leaf-node sharing among classes turned on and off, respectively.
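The compared variants can be summarized as a small configuration table. This is a reading aid, not part of the released code: the `VariantConfig` type and its field names are hypothetical, and the assumption that “Ours-3” keeps the collaborative edges is inferred from its description (it only states that sharing is disabled).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantConfig:
    """Hypothetical switches distinguishing the compared model variants."""
    collaborative_edges: bool  # False means edge parameters are set to zero
    share_leaf_nodes: bool     # whether leaf-nodes may be shared across classes


VARIANTS = {
    "Ours(full)": VariantConfig(collaborative_edges=True, share_leaf_nodes=True),
    "Ours-1": VariantConfig(collaborative_edges=False, share_leaf_nodes=True),
    "Ours-2": VariantConfig(collaborative_edges=False, share_leaf_nodes=False),
    # Assumption: Ours-3 keeps collaborative edges and only disables sharing.
    "Ours-3": VariantConfig(collaborative_edges=True, share_leaf_nodes=False),
}

for name, cfg in VARIANTS.items():
    print(f"{name}: edges={cfg.collaborative_edges}, sharing={cfg.share_leaf_nodes}")
```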
We also evaluate the benefits of sharing leaf-nodes.
Fig.4 (a) shows the APs on the UIUC people dataset; (b) shows the number of leaf-nodes as the number of object categories increases on the PASCAL VOC 2007 dataset. (Ours(full): our full system; Ours-3: And-Or Graph model without sharing leaf-nodes)
[1] Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection. X. Wang, L. Lin, L. Huang, and S. Yan, in CVPR 2013
[2] Latent Hierarchical Structural Learning for Object Detection. L. Zhu, Y. Chen, Y. Lu, C. Lin, and A. Yuille, in CVPR 2010
[3] Learning Hierarchical Poselets for Human Parsing. Y. Wang, D. Tran, and Z. Liao, in CVPR 2011
[4] The Concave-Convex Procedure (CCCP). A. Yuille and A. Rangarajan, in NIPS 2001