Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust Road Extraction
Lingbo Liu, Zewei Yang, Guanbin Li, Kuo Wang, Tianshui Chen and Liang Lin
T-NNLS 2022

Abstract


Land remote-sensing analysis is a crucial research area in earth science. In this work, we focus on a challenging task of land analysis, i.e., the automatic extraction of traffic roads from remote-sensing data, which has widespread applications in urban development and expansion estimation. Nevertheless, conventional methods either utilize only the limited information of aerial images or simply fuse multimodal information (e.g., vehicle trajectories), and thus cannot recognize unconstrained roads well. To address this problem, we introduce a novel neural network framework, termed the cross-modal message propagation network (CMMPNet), which fully benefits from the complementarity of different modal data (i.e., aerial images and crowdsourced trajectories). Specifically, CMMPNet is composed of two deep autoencoders for modality-specific representation learning and a tailor-designed dual enhancement module for cross-modal representation refinement. In particular, the complementary information of each modality is comprehensively extracted and dynamically propagated to enhance the representation of the other modality. Extensive experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction by blending different modal data, using either image and trajectory data or image and light detection and ranging (LiDAR) data. The experimental results show that the proposed approach outperforms current state-of-the-art methods by large margins. Our source code is released on the project page: http://lingboliu.com/multimodal_road_extraction.html

Framework


We propose a novel Cross-Modal Message Propagation Network (CMMPNet) for multimodal road extraction. Specifically, our CMMPNet is composed of (i) two deep autoencoders that take an aerial image and a trajectory heat-map, respectively, to learn modality-specific features, and (ii) a Dual Enhancement Module (DEM) that dynamically propagates the non-local messages (NLMs, i.e., a local one and a global one) of each modality with gated functions to enhance the representation of the other modality. The final features of the image and trajectory heat-map are concatenated to generate a traffic road map.
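To make the trajectory input concrete, here is a minimal sketch of how crowdsourced GPS points could be rasterized into a trajectory heat-map aligned with an aerial image tile. The function name, grid size, and Gaussian smoothing are illustrative assumptions, not the paper's exact rendering procedure.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def trajectory_heatmap(points, bounds, size=(1024, 1024), sigma=2.0):
    """Rasterize GPS points into a heat-map (hypothetical preprocessing).

    points: iterable of (lon, lat) pairs from crowdsourced trajectories.
    bounds: (lon_min, lat_min, lon_max, lat_max) of the aerial image tile.
    """
    lon_min, lat_min, lon_max, lat_max = bounds
    h, w = size
    heat = np.zeros((h, w), dtype=np.float32)
    for lon, lat in points:
        # Map geographic coordinates to pixel coordinates (row 0 = north edge)
        x = int((lon - lon_min) / (lon_max - lon_min) * (w - 1))
        y = int((lat_max - lat) / (lat_max - lat_min) * (h - 1))
        if 0 <= x < w and 0 <= y < h:
            heat[y, x] += 1.0
    # Smooth the point mass so sparse GPS samples form continuous road ridges
    heat = gaussian_filter(heat, sigma=sigma)
    return heat / (heat.max() + 1e-6)  # normalize to [0, 1]
```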

Fig. 1. Architecture of the proposed CMMPNet for multimodal road extraction. Specifically, our CMMPNet is composed of 1) two deep autoencoders that take an aerial image and a trajectory heat-map, respectively, to learn modality-specific features and 2) a DEM that dynamically propagates the NLMs (i.e., a local one and a global one) of each modality with gated functions to enhance the representation of the other modality. The final features of the image and trajectory heat-map are concatenated to generate a traffic road map.
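To illustrate the dual enhancement idea, below is a minimal PyTorch sketch of a DEM-style block: each modality emits a local message (convolutional) and a global message (pooled context), and gated functions control how much of the other modality's message is injected. Layer shapes and gating details are simplified assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEnhancementModule(nn.Module):
    """Sketch of a DEM block: cross-modal gated message propagation."""

    def __init__(self, channels):
        super().__init__()
        # Local message extractors (one per modality)
        self.local_img = nn.Conv2d(channels, channels, 3, padding=1)
        self.local_traj = nn.Conv2d(channels, channels, 3, padding=1)
        # Global message extractors: pooled context re-projected per channel
        self.global_img = nn.Linear(channels, channels)
        self.global_traj = nn.Linear(channels, channels)
        # Gates decide how much of the other modality's message to accept
        self.gate_img = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())
        self.gate_traj = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def _messages(self, x, local_conv, global_fc):
        # Non-local message = local (conv) part + global (pooled) part
        local_msg = local_conv(x)                              # B x C x H x W
        ctx = F.adaptive_avg_pool2d(x, 1).flatten(1)           # B x C
        global_msg = global_fc(ctx)[:, :, None, None].expand_as(x)
        return local_msg + global_msg

    def forward(self, f_img, f_traj):
        msg_img = self._messages(f_img, self.local_img, self.global_img)
        msg_traj = self._messages(f_traj, self.local_traj, self.global_traj)
        # Each modality is enhanced by the gated message from the other one
        g_img = self.gate_img(torch.cat([f_img, msg_traj], dim=1))
        g_traj = self.gate_traj(torch.cat([f_traj, msg_img], dim=1))
        return f_img + g_img * msg_traj, f_traj + g_traj * msg_img

# Example: mutually enhance features from the two branches
# dem = DualEnhancementModule(channels=64)
# f_img, f_traj = dem(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
```

In the full network, such a block would sit between matching encoder stages of the two branches, and the final enhanced features of the image and trajectory streams are concatenated to predict the road map, as the caption above describes.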

 

Experiment

Conclusion


In this work, we investigate a challenging task of land remote-sensing analysis, i.e., how to robustly extract traffic roads using the complementary information of aerial images and crowdsourced vehicle trajectories. To this end, we introduce a novel CMMPNet, which explicitly learns modality-specific features with two individual autoencoders and mutually enhances these features with a tailor-designed DEM. Specifically, we comprehensively extract and dynamically propagate the complementary information of each modality to enhance the representation of the other modality. Extensive experiments conducted on three real-world benchmarks show that the proposed CMMPNet is not only effective for image + trajectory-based road extraction but also suitable for image + LiDAR-based road extraction.

Nevertheless, several issues remain worthy of further study. First, the connectivity of traffic roads has not been explicitly explored in conventional works. Intuitively, the temporal information of vehicle trajectories could be utilized to distinguish disconnected road regions (e.g., urban roads are usually separated by fences and green belts). However, existing image + trajectory datasets lack road connectivity annotations. To facilitate research in this field, we will construct a large-scale multimodal road extraction benchmark with rich connectivity annotations and propose a multimodal spatial-temporal framework to explicitly estimate road connectivity in future work. Second, some elevated roads at different heights overlap in aerial images, and the height information obtained from GPS devices is relatively coarse. Thus, in future work, we will also develop advanced approaches to effectively recognize roads at different heights using the coarse height information of crowdsourced trajectories.

Acknowledgement


This work was supported in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2020B1515020048; in part by the National Natural Science Foundation of China under Grant 61976250, Grant U1811463, and Grant 61836012; and in part by the Guangzhou Science and Technology Project under Grant 202102020633.