CVPR 2025
Boosting the Dual-Stream Architecture in Ultra-High Resolution Segmentation with Resolution-Biased Uncertainty Estimation
Abstract
Over the last decade, significant efforts have been dedicated to designing efficient models for the challenge of ultra-high resolution (UHR) semantic segmentation. These models mainly follow the dual-stream architecture and generally fall into three subcategories according to their improvement objectives, i.e., dual-stream ensemble, selective zoom, and complementary learning. However, most of them concentrate on crafting complex pipelines that pursue only one of these objectives, which limits model performance in both accuracy and inference consumption. In this paper, we suggest achieving all three objectives simultaneously by estimating resolution-biased uncertainties in the low-resolution stream. Here, resolution-biased uncertainty refers to the degree of prediction unreliability primarily caused by the resolution loss from down-sampling operations. Specifically, we propose a dual-stream UHR segmentation framework in which an estimator assesses resolution-biased uncertainties from the entropy map and the high-frequency feature residual. The framework also includes a selector, an ensembler, and a complementer, which use the obtained estimations to boost the model. They share the uncertainty estimations as weights to, respectively, choose difficult regions as inputs for the UHR stream, perform weighted fusion between the two streams, and enhance learning on important pixels. Experimental results demonstrate that our method achieves a satisfactory balance between accuracy and inference consumption compared with other state-of-the-art (SOTA) methods.
Framework
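The roles of the estimator, selector, and ensembler described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it uses only the entropy map as the uncertainty signal (the high-frequency feature residual is omitted), and the function names, the fixed patch grid, the top-k selection rule, and the convex-combination fusion are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_entropy(probs):
    """Shannon entropy divided by log(C), so the result lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

def entropy_map(logit_map):
    """Estimator (entropy part only): H x W grid of logit lists -> H x W uncertainties."""
    return [[normalized_entropy(softmax(px)) for px in row] for row in logit_map]

def select_uncertain_patches(unc_map, patch, k):
    """Selector (assumed rule): top-left corners of the k patches with the
    highest mean uncertainty on a non-overlapping patch grid."""
    h, w = len(unc_map), len(unc_map[0])
    scored = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            vals = [unc_map[i][j]
                    for i in range(r, r + patch)
                    for j in range(c, c + patch)]
            scored.append((sum(vals) / len(vals), (r, c)))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [pos for _, pos in scored[:k]]

def fuse(p_low, p_high, u):
    """Ensembler (assumed rule): per-pixel convex combination that trusts the
    UHR stream more where the low-resolution uncertainty u is high."""
    return [(1.0 - u) * a + u * b for a, b in zip(p_low, p_high)]
```

In this sketch a single uncertainty map drives both region selection and fusion, which mirrors the shared-weights idea in the abstract; the actual estimator and fusion rule in the paper may differ.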
Experiment
Conclusion
In this paper, we propose a dual-stream UHR segmentation framework in which an estimator first assesses resolution-biased uncertainties in the low-resolution stream from the entropy map and the high-frequency feature residual. In addition, an uncertainty-aware selector and ensembler select uncertain regions as inputs for the UHR stream and perform uncertainty-weighted fusion between the two streams. Moreover, an uncertainty-based complementer improves model optimization by applying uncertainty-based pixel-wise weighting to the task loss, which increases the importance of uncertain pixels in the learning process of the UHR stream. Experimental results demonstrate the superiority of our approach in terms of accuracy and inference consumption across various UHR segmentation datasets.
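The pixel-wise loss weighting performed by the complementer can be sketched as follows. This is a hedged illustration rather than the paper's loss: the weight form `1 + alpha * u` and the parameter `alpha` are assumptions, chosen so that zero uncertainty recovers plain cross-entropy.

```python
import math

def weighted_cross_entropy(probs, labels, uncertainties, alpha=1.0):
    """Mean over pixels of (1 + alpha * u_i) * -log p_i[y_i].

    probs: per-pixel class probabilities from the UHR stream.
    labels: per-pixel ground-truth class indices.
    uncertainties: per-pixel values in [0, 1] from the low-resolution stream.
    alpha: hypothetical strength of the uncertainty weighting.
    """
    total = 0.0
    for p, y, u in zip(probs, labels, uncertainties):
        weight = 1.0 + alpha * u
        # Clamp the probability to avoid log(0).
        total += weight * -math.log(max(p[y], 1e-12))
    return total / len(labels)
```

With all uncertainties at zero the function reduces to the standard mean cross-entropy, so the weighting only changes how much uncertain pixels contribute to the gradient.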