Graph-based High-Order Relation Discovery for Fine-grained Recognition

Yifan Zhao, Ke Yan, Feiyue Huang, Jia Li*

State Key Laboratory of Virtual Reality Technology and Systems, Beihang University

Tencent Youtu Lab, Shanghai, China

Fine-grained object recognition aims to learn effective features that can identify the subtle differences between visually similar objects. Most existing works amplify discriminative part regions with attention mechanisms. Besides their unstable performance under complex backgrounds, these methods leave the intrinsic interrelationship between different semantic features largely unexplored. Toward this end, we propose an effective graph-based relation discovery approach to build a contextual understanding of high-order relationships. In our approach, a high-dimensional feature bank is first formed and jointly regularized with semantic- and positional-aware high-order constraints, endowing feature representations with rich attributes. Second, to overcome the curse of dimensionality, we propose a graph-based semantic grouping strategy to embed this high-order tensor bank into a low-dimensional space. Meanwhile, a group-wise learning strategy is proposed to regularize the features by focusing on the cluster embedding centers. With the collaborative learning of these three modules, our approach is able to grasp stronger contextual details of fine-grained objects. Experimental evidence demonstrates that our approach achieves new state-of-the-art results on four widely used fine-grained object recognition benchmarks.


The motivation of the proposed approach. Our approach first exploits the structural channel-aware relationships b) to form a high-dimensional graph embedding. These relation nodes are then grouped into a low-dimensional space d) with a semantic grouping strategy, forming the final grouped activations f).


The proposed graph-based relation discovery (GaRD) approach consists of three essential modules: the relation-discovery module, which extracts rich relation-aware high-dimensional features; the graph-based semantic grouping module, which finds low-dimensional feature embeddings; and the group-wise learning strategy, which updates the gradients using class centers.

Comparisons and Relations

Illustrations of different mutual attention methods. a) Bilinear pooling [28]: building channel-aware second-order relations, using vectorized features. b) Trilinear attention [49]: third-order channel relations, preserving the original feature shape. c) Our relation-discovery module: joint positiona
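To make the contrast between a) and b) concrete, the following is a minimal NumPy sketch of the two baseline operations as they are commonly described, not the implementation used in this work. The function names, the averaging over spatial positions, and the softmax normalization of the channel-relation matrix are assumptions made for illustration. The relation-discovery module in c) is not sketched here.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def bilinear_pooling(feat):
    """Second-order channel relations, returned as a vector (illustrative).

    feat: array of shape (C, H, W).
    Returns a vector of length C * C; the spatial layout is discarded.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)        # (C, HW)
    relation = x @ x.T / (h * w)      # (C, C) channel-to-channel relations
    return relation.reshape(-1)       # vectorized feature


def trilinear_attention(feat):
    """Third-order channel relations that keep the input shape (illustrative).

    feat: array of shape (C, H, W).
    Returns an array of shape (C, H, W).
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)              # (C, HW)
    relation = softmax(x @ x.T, axis=-1)    # (C, C), assumed softmax-normalized
    out = relation @ x                      # (C, HW): relations applied back to features
    return out.reshape(c, h, w)


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
print(bilinear_pooling(feat).shape)      # (64,)
print(trilinear_attention(feat).shape)   # (8, 4, 4)
```

The shapes show the difference noted in the caption: bilinear pooling collapses the map into a C*C vector, while the trilinear form returns a tensor with the original (C, H, W) shape.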


Performance comparisons on several representative datasets. More results can be found in our paper.