CVPR 2024
Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection
Jiacheng Zhang, Jiaming Li, Xiangru Lin, Wei Zhang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li


We delve into pseudo-labeling for semi-supervised monocular 3D object detection (SSM3OD) and discover two primary issues: a misalignment between the prediction quality of 3D and 2D attributes, and the tendency of depth supervision derived from pseudo-labels to be noisy, leading to significant optimization conflicts with other reliable forms of supervision. To tackle these issues, we introduce a novel decoupled pseudo-labeling (DPL) approach for SSM3OD. Our approach features a Decoupled Pseudo-label Generation (DPG) module, designed to efficiently generate pseudo-labels by separately processing 2D and 3D attributes. This module incorporates a unique homography-based method for identifying dependable pseudo-labels in Bird's Eye View (BEV) space, specifically for 3D attributes. Additionally, we present a Depth Gradient Projection (DGP) module to mitigate optimization conflicts caused by the noisy depth supervision of pseudo-labels, effectively decoupling the depth gradient and removing conflicting gradients. This dual decoupling strategy, applied at both the pseudo-label generation and gradient levels, significantly improves the utilization of pseudo-labels in SSM3OD. Our comprehensive experiments on the KITTI benchmark demonstrate the superiority of our method over existing approaches.











In this work, we introduced a decoupled pseudo-labeling approach for semi-supervised monocular 3D object detection (SSM3OD), designed to exploit pseudo-labels more effectively. This approach features a decoupled pseudo-label generation module, incorporating a homography-based pseudo-label mining algorithm to efficiently provide reliable pseudo-labels for both 2D and 3D attributes. Additionally, we developed a depth gradient projection module to mitigate the adverse effects of noisy depth supervision. Comprehensive evaluations on the KITTI benchmark validate the effectiveness of our proposed method, demonstrating its superior performance in SSM3OD.




This work was supported in part by the National Natural Science Foundation of China (No. 62322608), in part by the CAAI-MindSpore Open Fund, developed on the OpenI Community, and in part by the Open Project Program of the Key Laboratory of Artificial Intelligence for Perception and Understanding, Liaoning Province (AIPU, No. 20230003).