Crowd Counting with Deep Structured Scale Integration Network
Lingbo Liu, Zhilin Qiu, Guanbin Li, Shufan Liu, Wanli Ouyang, and Liang Lin*
ICCV 2019

Abstract


    Automatic estimation of the number of people in unconstrained crowded scenes is a challenging task, and one major difficulty stems from the huge scale variation of people. In this paper, we propose a novel Deep Structured Scale Integration Network (DSSINet) for crowd counting, which addresses the scale variation of people through structured feature representation learning and hierarchically structured loss function optimization. Unlike conventional methods which directly fuse multiple features by weighted average or concatenation, we first introduce a Structured Feature Enhancement Module based on conditional random fields (CRFs) to refine multiscale features mutually with a message passing mechanism. In this module, each scale-specific feature is considered as a continuous random variable and passes complementary information to refine the features at other scales. Second, we utilize a Dilated Multiscale Structural Similarity loss to enforce our DSSINet to learn the local correlation of people's scales within regions of various sizes, thereby yielding high-quality density maps. Extensive experiments on four challenging benchmarks demonstrate the effectiveness of our method. Specifically, our DSSINet achieves a 9.5% error reduction on the ShanghaiTech dataset and a 24.9% error reduction on the UCF-QNRF dataset compared with state-of-the-art methods.


Framework


[Figure: overview of the proposed DSSINet architecture]
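
At the core of the framework is the Structured Feature Enhancement Module, in which multiscale features refine one another through CRF-style message passing. The snippet below is a minimal, hypothetical sketch of such mutual refinement, assuming a PyTorch-style implementation; the class, module, and parameter names are illustrative and do not correspond to the authors' released code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MutualFeatureRefinement(nn.Module):
    """Illustrative sketch of CRF-style mutual refinement of multiscale features."""
    def __init__(self, channels, num_scales=3, num_iters=2):
        super().__init__()
        self.num_iters = num_iters
        # one learned message function per ordered pair of scales (j -> i)
        self.message = nn.ModuleDict({
            f"m{j}to{i}": nn.Conv2d(channels, channels, 3, padding=1)
            for i in range(num_scales) for j in range(num_scales) if i != j
        })

    def forward(self, feats):
        # feats: list of tensors [B, C, H_s, W_s], one feature map per scale
        refined = list(feats)
        for _ in range(self.num_iters):
            updated = []
            for i, f_i in enumerate(feats):
                msg = 0
                for j, h_j in enumerate(refined):
                    if j == i:
                        continue
                    # resize the neighbouring scale's feature to the current resolution
                    h_j = F.interpolate(h_j, size=f_i.shape[-2:],
                                        mode='bilinear', align_corners=False)
                    msg = msg + self.message[f"m{j}to{i}"](h_j)
                # mean-field-style update: the scale's own feature plus incoming messages
                updated.append(torch.relu(f_i + msg))
            refined = updated
        return refined

In this reading, each scale-specific feature acts as the unary term of a continuous random variable and is updated for a few mean-field-like iterations with messages (learned convolutions) from the features at the other scales.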

Experiments


[Figures and tables: experimental results on the four benchmark datasets]

Conclusion


In this paper, we develop a Deep Structured Scale Integration Network for crowd counting, which handles the huge variation of people's scales from two aspects: structured feature representation learning and hierarchically structured loss function optimization. First, a Structured Feature Enhancement Module based on conditional random fields (CRFs) is proposed to mutually refine multiple features and boost their robustness. Second, we utilize a Dilated Multiscale Structural Similarity loss to force our network to learn the local correlation within regions of various sizes, thereby producing locally consistent estimation results. Extensive experiments on four benchmarks show that our method achieves superior performance compared with state-of-the-art methods.
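
As a rough illustration of the second component, the sketch below shows one plausible form of a dilated multiscale structural similarity loss, assuming PyTorch: SSIM between the predicted and ground-truth density maps is computed with a Gaussian window applied at several dilation rates, so that local consistency is enforced over regions of increasing size. The window size, constants, and the way the dilation rates are combined are assumptions for illustration, not the paper's exact formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_window(size=11, sigma=1.5):
    # separable 2D Gaussian window, normalized to sum to one
    coords = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = (g / g.sum()).unsqueeze(0)
    return (g.t() @ g).view(1, 1, size, size)

class DilatedMSSSIMLoss(nn.Module):
    """Illustrative sketch of an SSIM loss evaluated at several dilation rates."""
    def __init__(self, dilations=(1, 2, 3), c1=1e-4, c2=9e-4):
        super().__init__()
        self.dilations = dilations
        self.c1, self.c2 = c1, c2
        self.register_buffer("window", gaussian_window())

    def _ssim(self, x, y, dilation):
        # dilated Gaussian filtering keeps the output the same size as the input
        pad = (self.window.shape[-1] // 2) * dilation
        conv = lambda t: F.conv2d(t, self.window, padding=pad, dilation=dilation)
        mu_x, mu_y = conv(x), conv(y)
        var_x = conv(x * x) - mu_x ** 2
        var_y = conv(y * y) - mu_y ** 2
        cov = conv(x * y) - mu_x * mu_y
        ssim = ((2 * mu_x * mu_y + self.c1) * (2 * cov + self.c2)) / \
               ((mu_x ** 2 + mu_y ** 2 + self.c1) * (var_x + var_y + self.c2))
        return ssim.mean()

    def forward(self, pred, target):
        # pred, target: predicted / ground-truth density maps of shape [B, 1, H, W]
        loss = sum(1.0 - self._ssim(pred, target, d) for d in self.dilations)
        return loss / len(self.dilations)

Larger dilation rates enlarge the receptive field of the Gaussian window, so the same loss term penalizes structural mismatches over progressively larger local regions.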