Multi-class Part Parsing with Joint Boundary-Semantic Awareness

Yifan Zhao,   Jia Li*,   Yu Zhang

State Key Laboratory of Virtual Reality Technology and Systems, Beihang University

Yonghong Tian

School of Electronics Engineering and Computer Science, Peking University

Object part parsing in the wild, which requires simultaneously detecting multiple object classes in a scene and accurately segmenting the semantic parts within each class, is challenging due to the joint presence of class-level and part-level ambiguities. Despite its importance, this problem has not been sufficiently explored in existing works. In this paper, we propose a joint parsing framework with boundary and semantic awareness to address this challenging problem. To handle part-level ambiguity, a boundary awareness module is proposed to make mid-level features at multiple scales attend to part boundaries for accurate part localization; these features are then fused with high-level features for effective part recognition. For class-level ambiguity, we further present a semantic awareness module that selects discriminative part features relevant to a category, preventing irrelevant features from being merged together. The proposed modules are lightweight and implementation-friendly, improving performance substantially when plugged into various baseline architectures. Our full model sets new state-of-the-art results on the PASCAL-Part dataset, in both the multi-class and the conventional single-class setting, while running substantially faster than recent high-performance approaches.

 Approach


Our joint Boundary-Semantic Awareness Network (BSANet) framework is mainly composed of a boundary-aware spatial selection module and a semantic-aware channel selection module. The boundary awareness module aggregates local features near part boundaries from low-level layers together with semantic context from high-level layers, and is supervised by an edge regression loss. The semantic awareness module uses supervised semantic object context to enhance the expression of class-relevant feature channels.
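The two selection mechanisms can be illustrated with a minimal NumPy sketch. This is only a conceptual illustration, not the paper's implementation: the function names, shapes, and the use of sigmoid gating are our assumptions; in the actual network both modules operate on learned convolutional features, and the boundary branch is trained with the edge regression loss mentioned above.

```python
import numpy as np

def boundary_spatial_selection(feat, boundary_logits):
    """Spatially reweight mid-level features by a predicted part-boundary map.

    feat:            (C, H, W) mid-level feature map
    boundary_logits: (H, W) raw boundary prediction (in the paper this branch
                     is supervised by an edge regression loss; sketch only)
    """
    attn = 1.0 / (1.0 + np.exp(-boundary_logits))   # sigmoid -> attention in [0, 1]
    return feat * attn[None, :, :]                  # broadcast over channels

def semantic_channel_selection(feat, class_context):
    """Gate feature channels with class-level context (SE-style channel gating).

    feat:          (C, H, W) part features
    class_context: (C,) vector summarizing object-class semantics
    """
    gate = 1.0 / (1.0 + np.exp(-class_context))     # per-channel weights in [0, 1]
    return feat * gate[:, None, None]               # broadcast over spatial dims
```

The intent is that pixels near part boundaries are emphasized for localization, while channels irrelevant to the detected object class are suppressed before part classification.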

 Differences and relations


Differences among three pyramid decoders. (a) Top-down pyramid decoder. (b) Top-down pyramid decoder with feature transfer. (c) Spatial-aware feature pyramid.
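For reference, variant (a), the plain top-down pyramid decoder, can be sketched in a few lines of NumPy. This is a generic FPN-style top-down pathway under our own simplifying assumptions (same channel count at every level, exact 2x spatial ratio between adjacent levels, nearest-neighbour upsampling, addition as the fusion); the spatial-aware variant (c) would additionally apply boundary-guided spatial selection before fusing.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour upsampling: (C, H, W) -> (C, 2H, 2W)
    return x.repeat(2, axis=1).repeat(2, axis=2)

def topdown_pyramid_decode(features):
    """Plain top-down pyramid decoder (variant (a), sketch only).

    features: list of (C, H_i, W_i) maps, finest first, coarsest last,
              with each level exactly half the spatial size of the previous.
    Returns fused maps, coarsest first: each level is the lateral feature
    plus the upsampled fused map from the level above.
    """
    fused = [features[-1]]                       # start from the coarsest level
    for lateral in reversed(features[:-1]):      # walk back toward fine levels
        fused.append(lateral + upsample2x(fused[-1]))
    return fused
```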

 Visualizations


Qualitative comparisons on the PASCAL-Part dataset. Our model generates superior results with finer local details and better semantic understanding compared to state-of-the-art models.

 Multi-class Benchmark


Segmentation performance (mIoU) on the PASCAL-Part benchmark. Avg.: average per-object-class mIoU. mIoU: per-part-class mIoU. *: uses a model pretrained on the MS-COCO dataset.

 PASCAL-Person Benchmark


Segmentation performance (mIoU) on the PASCAL-Person-Part benchmark. *: re-trained on the proposed dataset. Pose An.: learning with auxiliary pose annotations.

 Update logs


2019/8: We have updated the modified annotations.

2019/12: We have updated the preliminary results. The detailed readme file, as well as other resources, will be updated soon.

 Citation