Is Depth Really Necessary for Salient Object Detection?

**Jiawei Zhao, Yifan Zhao, Jia Li*, Xiaowu Chen**

State Key Laboratory of Virtual Reality Technology and Systems, Beihang University

Salient object detection (SOD) is a crucial preliminary task for many computer vision applications, and it has made considerable progress with deep CNNs. Most existing methods rely mainly on RGB information to distinguish salient objects, which causes difficulties in some complex scenarios. To address this, many recent RGBD-based networks adopt the depth map as an independent input and fuse its features with RGB information. Taking the advantages of both RGB and RGBD methods, we propose a novel depth-aware salient object detection framework with the following designs: 1) It does not rely on depth data in the testing phase. 2) It comprehensively optimizes SOD features with multi-level depth-aware regularizations. 3) The depth information also serves as an error-weighted map to correct the segmentation process. With these designs combined, we make the first attempt at realizing a unified depth-aware framework that takes only RGB information as input for inference. It not only surpasses the state-of-the-art performance on five public RGB SOD benchmarks, but also surpasses the RGBD-based methods on five benchmarks by a large margin, while using less information and a lightweight implementation.

Approach

The overall architecture of our model. Our depth-aware SOD framework (DASNet) is mainly composed of three parts, i.e., a salient object detection module, a depth awareness module, and an error-weighted correction module. ASPP denotes atrous spatial pyramid pooling. CAF denotes the proposed channel-aware fusion module. DEC denotes the proposed depth error-weighted correction. The dashed line denotes supervision.
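This page does not give the formula behind the depth error-weighted correction (DEC), so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the per-pixel error of the depth branch is used to up-weight the saliency loss at the same pixel. The function name `error_weighted_bce`, the weighting rule `1 + alpha * |error|`, and the parameter `alpha` are all hypothetical.

```python
import math


def error_weighted_bce(pred_sal, gt_sal, pred_depth, gt_depth, alpha=1.0, eps=1e-7):
    """Hypothetical error-weighted binary cross-entropy over flattened maps.

    Assumption (not from the paper): each pixel's BCE term is scaled by
    1 + alpha * |pred_depth - gt_depth|, so pixels where the depth branch
    is wrong contribute more to the saliency loss.
    """
    if not pred_sal:
        raise ValueError("maps must contain at least one pixel")
    if not (len(pred_sal) == len(gt_sal) == len(pred_depth) == len(gt_depth)):
        raise ValueError("all maps must have the same number of pixels")

    total, weight_sum = 0.0, 0.0
    for p, g, dp, dg in zip(pred_sal, gt_sal, pred_depth, gt_depth):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        bce = -(g * math.log(p) + (1.0 - g) * math.log(1.0 - p))
        w = 1.0 + alpha * abs(dp - dg)   # larger depth error -> larger weight
        total += w * bce
        weight_sum += w
    return total / weight_sum            # weighted mean over pixels


# Usage: the second pixel has a poor saliency prediction and a depth error,
# so the weighted loss is pulled toward that pixel's BCE term.
loss = error_weighted_bce(
    pred_sal=[0.9, 0.4], gt_sal=[1.0, 1.0],
    pred_depth=[0.2, 1.0], gt_depth=[0.2, 0.5],
)
```

Under this assumption the loss reduces to a plain mean BCE when the depth prediction is perfect. Because depth appears only inside the loss, such a term would be needed at training time only, which is consistent with the RGB-only inference described above.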

Relations and Discussions

Different types of SOD architecture. a) Typical RGB-based SOD network architecture. b) Typical RGBD-based SOD network architecture. c) Proposed depth-aware SOD network architecture.

RGB-D SOD Benchmark

Performance comparison with 9 state-of-the-art RGBD-based SOD methods on five benchmarks. The best results are highlighted in bold.

Qualitative comparison of the state-of-the-art RGBD-based methods and our approach. The saliency maps produced by our model are clearer and more accurate than those of the other methods in various challenging scenarios.

RGB SOD Benchmark

Performance comparison with 10 state-of-the-art RGB-based SOD methods on five benchmarks. The best results are highlighted in bold.

Qualitative comparison of the state-of-the-art RGB-based methods and our approach. The saliency maps produced by our model are clearer and more accurate than those of the other methods in various challenging scenarios.

Computational Efficiency

Complexity comparison with RGB-based models and RGBD-based models. Models ranking first and second are shown in bold and underlined, respectively.

Update logs

2020/08: We have updated the results.

Citation

Jiawei Zhao, Yifan Zhao, Jia Li*, Xiaowu Chen. Is Depth Really Necessary for Salient Object Detection?
- Paper: [PDF, 5.7MB]
- RGB-D Results: [NJU2K] [NLPR] [STERE] [DES] [SSD] [LFSD] [SIP]
- RGB Results: [ECSSD] [DUTS] [DUT-OMRON] [HKU-IS] [PASCAL-S]
- Resources: [TrainingSet] [EstimatedDepth]
- Training Code: [Code]
- Evaluation Code: [Evaluation]