Multi-class Part Parsing with Joint Boundary-Semantic Awareness
Yifan Zhao, Jia Li*, Yu Zhang*, and Yonghong Tian
State Key Laboratory of Virtual Reality Technology and Systems, Beihang University
School of Electronics Engineering and Computer Science, Peking University
Object part parsing in the wild, which requires simultaneously detecting multiple object classes in the scene and accurately segmenting the semantic parts within each class, is challenging because class-level and part-level ambiguities are present jointly. Despite its importance, this problem has not been sufficiently explored in existing work. In this paper, we propose a joint parsing framework with boundary and semantic awareness to address it. To handle part-level ambiguity, a boundary awareness module makes mid-level features at multiple scales attend to part boundaries for accurate part localization; these features are then fused with high-level features for effective part recognition. For class-level ambiguity, we further present a semantic awareness module that selects discriminative part features relevant to a category, preventing irrelevant features from being merged together. The proposed modules are lightweight and implementation friendly, and they improve performance substantially when plugged into various baseline architectures. Our full model sets new state-of-the-art results on the Pascal-Part dataset in both the multi-class and the conventional single-class settings, while running substantially faster than recent high-performance approaches.
Our joint Boundary-Semantic Awareness Network (BSANet) framework is mainly composed of a boundary-aware spatial selection module and a semantic-aware channel selection module. The boundary awareness module aggregates local features near boundaries at the low level with semantic context at the high level, and is supervised by an edge regression loss. The semantic awareness module uses supervised semantic object context to enhance the expression of class-relevant feature channels.
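The two selection ideas named above can be illustrated with a minimal pure-Python sketch. It is not the BSANet implementation: the residual boundary weighting (1 + boundary probability), the sigmoid channel gate, and the names `spatial_select`, `channel_select`, `boundary_map`, and `class_logits` are all assumptions made for illustration, and the supervision losses are omitted.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def spatial_select(features, boundary_map):
    """Assumed spatial selection: scale every channel at each location
    by (1 + boundary probability), so boundary pixels are emphasized."""
    return [
        [[v * (1.0 + b) for v, b in zip(row, b_row)]
         for row, b_row in zip(channel, boundary_map)]
        for channel in features
    ]

def channel_select(features, class_logits):
    """Assumed channel selection: scale each channel by a sigmoid gate
    derived from a per-channel semantic score."""
    gates = [sigmoid(z) for z in class_logits]
    return [
        [[v * g for v in row] for row in channel]
        for channel, g in zip(features, gates)
    ]

# Toy input: 2 channels of size 2x2 (hypothetical values).
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
boundary_map = [[0.0, 1.0], [0.5, 0.0]]  # hypothetical predicted edge probabilities
class_logits = [0.0, 20.0]               # hypothetical per-channel semantic scores

attended = spatial_select(features, boundary_map)
selected = channel_select(attended, class_logits)
```

In this toy setup the first channel is halved by its gate while the second passes almost unchanged, which is the kind of class-relevant channel emphasis the description refers to.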
Differences and relations
Differences of three pyramid decoders. (a): Top-down pyramid decoder. (b): Top-down pyramid decoder with feature transfer. (c): Spatial aware feature pyramid.
Qualitative comparisons on the PASCAL-Part dataset. Our model generates superior results, with finer local details and better semantic understanding, compared to state-of-the-art models.
Segmentation Performance of mIoU on PASCAL-Part Benchmark. Avg.: the average per-object-class mIoU. mIoU: per-part class mIoU. *: uses a model pretrained on the MS-COCO dataset.
Segmentation Performance of mIoU on Pascal-Person-Part Benchmark. *: re-trained on the proposed dataset. Pose An.: learning with auxiliary pose annotation.
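As a rough illustration of the two aggregates named in the captions above, the sketch below computes per-part-class IoU on flattened label lists, then averages it either over all part classes (mIoU) or first within each object class and then across object classes (Avg.). This is one reading of the captions, not the official evaluation code; the `part_to_object` grouping and the toy labels are hypothetical, and details such as background handling are ignored.

```python
def per_class_iou(pred, target, num_classes):
    """IoU per class over flattened label lists; classes absent from
    both prediction and ground truth are skipped."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[t] += 1
    return {c: inter[c] / union[c] for c in range(num_classes) if union[c] > 0}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Toy flattened predictions and ground truth over 3 part classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
ious = per_class_iou(pred, target, 3)

# mIoU: mean over part classes.
miou = mean(ious.values())

# Avg.: mean within each object class, then mean across object classes.
part_to_object = {0: "a", 1: "b", 2: "b"}  # hypothetical part-to-object grouping
by_object = {}
for part, iou in ious.items():
    by_object.setdefault(part_to_object[part], []).append(iou)
avg = mean(mean(v) for v in by_object.values())
```

The two numbers differ whenever object classes have different numbers of parts, which is why the tables report them separately.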
2019/8: We have updated the modified annotations.
2019/12: We have updated the preliminary results. The detailed readme file as well as other resources will be updated soon.
Yifan Zhao, Jia Li*, Yu Zhang*, and Yonghong Tian. Multi-class Part Parsing with Joint Boundary-Semantic Awareness.