ACM MM 2018
Fine-Grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding
Tianshui Chen, Wenxi Wu, Yuefang Gao, Le Dong, Xiaonan Luo, Liang Lin

Introduction


In this work, we propose a hierarchical semantic embedding (HSE) framework that incorporates the category hierarchy to aid fine-grained image recognition. To evaluate the proposed framework, we organize the 200 bird species of the Caltech-UCSD Birds dataset into a four-level category hierarchy and construct a large-scale butterfly dataset (Butterfly-200) that also covers four-level categories. Extensive experiments on these two datasets and the newly released VegFru dataset demonstrate the superiority of our HSE framework over the baseline methods and existing competitors.

 

 

HSE Framework


Figure 1. The overall pipeline of our proposed hierarchical semantic embedding framework. It employs a trunk network to extract image features and then uses one branch network per level to predict the categories of that level. At each level, it incorporates the predicted score vector of the coarser level to guide the learning of finer-grained features and, during training, to regularize label prediction.
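To make the idea of regularizing a finer-level prediction with a coarser-level score vector more concrete, here is a minimal sketch in plain Python. It is not the loss used in the paper: it assumes a hypothetical `child_to_parent` index mapping, sums the fine-level probabilities up to the parent level, and measures the KL divergence between the coarse prediction and that aggregated distribution. All function names are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_to_parent(child_probs, child_to_parent, num_parents):
    """Sum child probabilities into their parent categories."""
    parent_probs = [0.0] * num_parents
    for child, p in enumerate(child_probs):
        parent_probs[child_to_parent[child]] += p
    return parent_probs

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def hierarchy_consistency_penalty(parent_logits, child_logits, child_to_parent):
    """Assumed regularizer: penalize fine-level predictions whose implied
    coarse-level distribution disagrees with the coarse-level prediction."""
    parent_probs = softmax(parent_logits)
    child_probs = softmax(child_logits)
    implied = aggregate_to_parent(child_probs, child_to_parent, len(parent_logits))
    return kl_divergence(parent_probs, implied)

# Toy hierarchy: children 0 and 1 belong to parent 0, child 2 to parent 1.
child_to_parent = [0, 0, 1]
parent_logits = [2.0, 0.0]  # the coarse level favours parent 0
agree = hierarchy_consistency_penalty(parent_logits, [1.0, 1.0, -5.0], child_to_parent)
clash = hierarchy_consistency_penalty(parent_logits, [-5.0, -5.0, 1.0], child_to_parent)
```

In this toy setting, `agree` is smaller than `clash`, because the second set of fine-level logits puts most of its mass on a child of the parent that the coarse level considers unlikely.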

 

 

Butterfly-200 dataset


Details

Butterfly-200 is a dataset with images from 200 common species of butterflies. The detailed information is presented as follows.

Image number: 25,279 images.

Category number: 200 species, 116 genera, 23 subfamilies, and 5 families.

Annotations: four-level categories.
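As a rough illustration of how four-level annotations might be handled once loaded, the sketch below counts the distinct categories at each level and checks that every category has exactly one parent. The record layout (one dictionary per image with `family`, `subfamily`, `genus`, and `species` keys) is an assumption, not the format of the released annotation files, and the three records are illustrative taxonomy examples rather than entries copied from the dataset.

```python
# Levels ordered from coarsest to finest (assumed field names).
LEVELS = ("family", "subfamily", "genus", "species")

# Illustrative records only; not taken from the Butterfly-200 annotation files.
records = [
    {"family": "Papilionidae", "subfamily": "Papilioninae",
     "genus": "Papilio", "species": "Papilio machaon"},
    {"family": "Papilionidae", "subfamily": "Papilioninae",
     "genus": "Papilio", "species": "Papilio xuthus"},
    {"family": "Nymphalidae", "subfamily": "Danainae",
     "genus": "Danaus", "species": "Danaus plexippus"},
]

def count_categories(records):
    """Return the number of distinct categories at each level."""
    return {level: len({r[level] for r in records}) for level in LEVELS}

def build_parent_map(records):
    """Map (level, name) to its parent name; raise if a category has two parents."""
    parent_of = {}
    for r in records:
        for parent_level, child_level in zip(LEVELS, LEVELS[1:]):
            key = (child_level, r[child_level])
            parent = r[parent_level]
            if parent_of.setdefault(key, parent) != parent:
                raise ValueError(f"{key} has conflicting parents")
    return parent_of

counts = count_categories(records)
parent_of = build_parent_map(records)
```

Run over the full annotation set, `count_categories` would be expected to report 200 species, 116 genera, 23 subfamilies, and 5 families, matching the figures listed above.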

Download

The images and corresponding annotations can be downloaded from Dropbox.

Sample images and corresponding annotations

 

 

Extended Caltech-UCSD Birds dataset


Details

The Extended Caltech-UCSD Birds dataset is an extension of the original Caltech-UCSD Birds-200-2011 dataset in which each image is annotated with four-level categories. The detailed information is presented as follows.

Image number: 11,788 images.

Category number: 200 species, 122 genera, 37 families, and 13 orders.

Annotations: four-level categories.

Download

The images can be downloaded from the original dataset page, and the corresponding hierarchical annotations can be downloaded from Dropbox.

Sample images and corresponding annotations

 

 

Experiment results


Table 1. Accuracy (in %) at all levels for our HSE framework, two baseline methods, and two variants of our framework, one without semantic embedding representation learning (Ours w/o SERL) and one without semantic guided label regularization (Ours w/o SGLR), on the CUB and Butterfly-200 test sets.

Table 2. Comparison of our HSE framework with existing state-of-the-art methods on recognizing finest-level categories on the CUB dataset. BA and PA denote bounding box annotations and part annotations, respectively. √ indicates that the corresponding annotations are used during training or testing.

Table 3. Comparison of accuracy of our HSE framework, existing state-of-the-art methods, and the baseline methods on the VegFru dataset.
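For readers who want to compute per-level accuracies of the kind reported in Table 1 on their own predictions, the following is a minimal sketch. It assumes each prediction and each ground-truth label is a tuple with one class index per hierarchy level, ordered from coarsest to finest. This layout and the function name are assumptions, not part of the released code.

```python
def per_level_accuracy(predictions, targets):
    """Return accuracy in % for each hierarchy level.

    predictions, targets: equal-length lists of tuples, one label per level.
    """
    if not targets or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and aligned")
    num_levels = len(targets[0])
    correct = [0] * num_levels
    for pred, true in zip(predictions, targets):
        for level in range(num_levels):
            if pred[level] == true[level]:
                correct[level] += 1
    return [100.0 * c / len(targets) for c in correct]

# Toy example with four levels and four images (made-up labels).
targets = [(0, 0, 0, 0), (0, 1, 2, 3), (1, 2, 3, 4), (1, 2, 3, 5)]
predictions = [(0, 0, 0, 0), (0, 1, 2, 9), (1, 2, 9, 9), (0, 9, 9, 9)]
accuracies = per_level_accuracy(predictions, targets)
```

In the toy example the accuracy drops from the coarsest to the finest level, which mirrors the general expectation that finer levels are harder to recognize.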

 

 

References


Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. The Caltech-UCSD Birds-200-2011 Dataset. California Institute of Technology, 2011.

Saihui Hou, Yushan Feng, and Zilei Wang. VegFru: A Domain-Specific Dataset for Fine-grained Visual Categorization. In ICCV, 2017.