Hierarchical Deep Co-segmentation of Primary Objects in Aerial Videos

Jia Li*   Pengcheng Yuan    Daxin Gu

State Key Laboratory of Virtual Reality Technology and Systems, Beihang University

Yonghong Tian*

School of Electronics Engineering and Computer Science, Peking University

Published in IEEE MultiMedia, July 2018

Primary object segmentation plays an important role in understanding videos generated by unmanned aerial vehicles. In this paper, we propose a large-scale dataset with 500 aerial videos and manually annotated primary objects. To the best of our knowledge, it is the largest dataset to date for primary object segmentation in aerial videos. From this dataset, we find that most aerial videos contain large-scale scenes, small primary objects, and constantly varying scales and viewpoints. Motivated by these observations, we propose a hierarchical deep co-segmentation approach that repeatedly divides a video into two sub-videos formed by its odd and even frames, respectively. In this manner, the primary objects shared by the sub-videos can be co-segmented by training two-stream CNNs and finally refined within neighborhood reversible flows. Experimental results show that our approach remarkably outperforms 17 state-of-the-art methods in segmenting primary objects in various types of aerial videos.


Frames and ground-truth masks from APD. (a) APD-Human (95 videos), (b) APD-Building (121 videos), (c) APD-Vehicle (56 videos), (d) APD-Boat (180 videos) and (e) APD-Other (48 videos).


As shown above, the framework of our approach consists of three major stages: 1) hierarchical temporal slicing of aerial videos, 2) mask initialization via video object co-segmentation, and 3) mask refinement within neighborhood reversible flows.
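The first stage, hierarchical temporal slicing, can be illustrated with a minimal Python sketch. This is only an illustration of the odd/even split described above, not the authors' implementation; the function name and the list-of-frame-indices representation are assumptions.

```python
def hierarchical_slice(frames, depth):
    """Recursively split a frame sequence into even- and odd-indexed
    sub-videos; `depth` rounds of splitting yield 2**depth leaf sub-videos."""
    if depth == 0:
        return [frames]
    even = frames[0::2]  # frames at positions 0, 2, 4, ...
    odd = frames[1::2]   # frames at positions 1, 3, 5, ...
    return hierarchical_slice(even, depth - 1) + hierarchical_slice(odd, depth - 1)

# Example: 8 frame indices sliced to depth 2 give 4 sub-videos.
subs = hierarchical_slice(list(range(8)), 2)
# subs == [[0, 4], [2, 6], [1, 5], [3, 7]]
```

Because each pair of sibling sub-videos is sampled from the same scene, they share the same primary objects, which is what allows the co-segmentation stage to treat them as a two-stream input.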


Performance benchmark of HDC and state-of-the-art models before being fine-tuned on VOS and APD. The best and second-best models are marked in bold and with underline, respectively.