Fine-Grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding

ACM MM 2018

Tianshui Chen, Wenxi Wu, Yuefang Gao, Le Dong, Xiaonan Luo, Liang Lin

Introduction

In this work, we propose a hierarchical semantic embedding (HSE) framework that incorporates the category hierarchy to aid fine-grained image recognition. To evaluate the proposed framework, we organize the 200 bird species of the Caltech-UCSD Birds dataset into a four-level category hierarchy and construct a large-scale butterfly dataset (Butterfly-200) that also covers four-level categories. Extensive experiments on these two datasets and the recently released VegFru dataset show that our HSE framework outperforms both the baseline methods and existing competitors.

HSE Framework

Figure 1. The overall pipeline of our proposed hierarchical semantic embedding framework. It employs a trunk network to extract image features and then uses a branch network to predict the categories of each level. At each level, it incorporates the predicted score vector to guide the learning of finer-grained features and simultaneously regularizes label prediction during training.
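The label-regularization idea in the caption can be illustrated with a minimal sketch. It is not the formulation used in the paper: it assumes that each fine category has exactly one coarse parent, that fine-level probabilities can be summed per parent, and that a KL divergence measures the disagreement between the two levels. All function names and the toy hierarchy are hypothetical.

```python
import math

def softmax(scores):
    """Convert a raw score vector into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_to_parents(fine_probs, parent_of, num_parents):
    """Sum fine-level probabilities over the children of each coarse category."""
    coarse = [0.0] * num_parents
    for child, prob in enumerate(fine_probs):
        coarse[parent_of[child]] += prob
    return coarse

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), with a small epsilon to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def consistency_penalty(coarse_scores, fine_scores, parent_of):
    """Assumed regularizer: penalize fine predictions that contradict the coarse ones."""
    coarse_probs = softmax(coarse_scores)
    implied = aggregate_to_parents(softmax(fine_scores), parent_of, len(coarse_scores))
    return kl_divergence(coarse_probs, implied)

# Hypothetical toy hierarchy: 4 fine classes under 2 coarse classes.
parent_of = [0, 0, 1, 1]
agree = consistency_penalty([2.0, 0.0], [3.0, 1.0, 0.0, -1.0], parent_of)
disagree = consistency_penalty([2.0, 0.0], [-1.0, 0.0, 1.0, 3.0], parent_of)
```

In this toy case the penalty is small when the fine-level scores favor children of the coarse class that the coarse level prefers, and large when they favor children of the other class.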

Butterfly-200 dataset

Details

Butterfly-200 is a dataset with images from 200 common species of butterflies. The detailed information is presented as follows.

Image number: 25,279 images.

Category number: 200 species, 116 genera, 23 subfamilies, and 5 families.

Annotations: four-level categories.
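As a rough illustration of how four-level annotations of this kind might be handled, the sketch below builds a child-to-parent map and counts the distinct categories per level. The tuple layout and the placeholder category names are assumptions made for the example; the actual annotation file format is not described here.

```python
# Assumed level order, coarsest to finest, following the list above.
LEVELS = ("family", "subfamily", "genus", "species")

def build_parent_map(annotations):
    """Map each (level, name) to its parent and reject inconsistent hierarchies."""
    parent = {}
    for labels in annotations:
        for depth in range(1, len(labels)):
            child = (LEVELS[depth], labels[depth])
            par = (LEVELS[depth - 1], labels[depth - 1])
            # setdefault stores the parent on first sight and returns the stored one after.
            if parent.setdefault(child, par) != par:
                raise ValueError(f"{child} has two different parents")
    return parent

def count_categories(annotations):
    """Count the distinct category names seen at each level."""
    seen = {level: set() for level in LEVELS}
    for labels in annotations:
        for level, name in zip(LEVELS, labels):
            seen[level].add(name)
    return {level: len(names) for level, names in seen.items()}

# Hypothetical placeholder annotations, one tuple per image.
annotations = [
    ("family_A", "subfamily_A1", "genus_A1a", "species_1"),
    ("family_A", "subfamily_A1", "genus_A1a", "species_2"),
    ("family_B", "subfamily_B1", "genus_B1a", "species_3"),
]
parent_map = build_parent_map(annotations)
counts = count_categories(annotations)
```

Run over the full Butterfly-200 annotations, a count like this would be expected to reproduce the figures listed above: 5 families, 23 subfamilies, 116 genera, and 200 species.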

Download

The images and corresponding annotations can be downloaded from Dropbox.

Sample images and corresponding annotations

Extended Caltech-UCSD Birds dataset

Details

The Extended Caltech-UCSD Birds dataset is an extension of the original dataset in which each image is annotated with four-level categories. The detailed information is presented as follows.

Image number: 11,788 images.

Category number: 200 species, 122 genera, 37 families, and 13 orders.

Annotations: four-level categories.

Download

The images can be downloaded from the original page, and the corresponding hierarchical annotations can be downloaded from Dropbox.

Sample images and corresponding annotations

Experiment results

Table 1. Comparison of the accuracy (in %) at all levels of our HSE framework, two baseline methods, and two variants of our framework, one that removes semantic embedding representation learning (Ours w/o SERL) and one that removes semantic guided label regularization (Ours w/o SGLR), on the CUB and Butterfly-200 test sets.
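Per-level accuracy of the kind reported in Table 1 can be computed as in the sketch below. It assumes each prediction and each ground-truth label is a tuple with one class index per hierarchy level; the function name, level names, and toy values are hypothetical and are not results from the paper.

```python
def per_level_accuracy(predictions, targets, levels):
    """Return the accuracy in % at each hierarchy level."""
    if not targets or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and equally long")
    correct = [0] * len(levels)
    for pred, true in zip(predictions, targets):
        for i, (p, t) in enumerate(zip(pred, true)):
            if p == t:
                correct[i] += 1
    return {level: 100.0 * c / len(targets) for level, c in zip(levels, correct)}

# Hypothetical toy data: four images, four levels each (coarsest first).
levels = ("order", "family", "genus", "species")
predictions = [(0, 0, 0, 0), (0, 1, 1, 2), (1, 2, 3, 5), (1, 2, 4, 7)]
targets = [(0, 0, 0, 0), (0, 1, 1, 3), (1, 2, 3, 5), (1, 3, 5, 8)]
accuracy = per_level_accuracy(predictions, targets, levels)
```

Each level is scored independently here, so a sample can count as correct at a coarse level even when its finest-level prediction is wrong.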

Table 2. Comparison of our HSE framework with existing state-of-the-art methods on recognizing finest-level categories on the CUB dataset. BA and PA denote bounding box annotations and part annotations, respectively. √ indicates that the corresponding annotations are used during training or testing.

Table 3. Comparison of accuracy of our HSE framework, existing state-of-the-art methods, and the baseline methods on the VegFru dataset.

References

Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. The Caltech-UCSD Birds-200-2011 Dataset. California Institute of Technology, 2011.

Saihui Hou, Yushan Feng, and Zilei Wang. VegFru: A Domain-Specific Dataset for Fine-grained Visual Categorization. In ICCV, 2017.
