CVPR 2025
No Pains, More Gains: Recycling Sub-Salient Patches for Efficient High-Resolution Image Recognition
Abstract
Over the last decade, many notable methods have emerged to tackle the computational cost of high-resolution image recognition (HRIR). They typically identify and aggregate a few salient regions for classification, discarding sub-salient areas to keep training consumption low. However, many HRIR tasks require exploring wider regions to model objects and their contexts, so discarding these areas limits performance in such scenarios. To address this issue, we present a dual-buffer patch selection (DBPS) strategy that enables training with more patches at low consumption. Specifically, in addition to a fundamental buffer that stores the embeddings of the most salient patches, DBPS employs an auxiliary buffer to recycle the sub-salient ones. To avoid the computational cost of computing gradients for sub-salient patches, these patches are used primarily in the forward pass to provide sufficient information for classification, while only the gradients of the salient patches are back-propagated to update the entire network. Moreover, we design a Multiple Instance Learning (MIL) architecture that leverages aggregated information from salient patches to filter out uninformative background within sub-salient patches for better accuracy. In addition, we introduce a random patch drop strategy to accelerate the training process and uncover informative regions. Experimental results demonstrate the superiority of our method over other advanced methods in terms of both accuracy and training consumption.
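The dual-buffer idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the chunked streaming loop, the buffer sizes `m` and `n`, and the `drop_ratio` parameter are all assumptions made for illustration, and the buffers here hold patch indices with saliency scores rather than embeddings. In an actual training loop, one would presumably keep gradients only for the patches in the salient buffer and treat the sub-salient ones as forward-only inputs.

```python
import random


def update_buffers(salient, sub_salient, chunk, m, n):
    """Merge a chunk of (score, patch_id) pairs into two ranked buffers.

    After the merge, `salient` holds the m highest-scoring patches seen so
    far and `sub_salient` holds the next n. Everything else is discarded.
    """
    ranked = sorted(salient + sub_salient + chunk, key=lambda p: p[0], reverse=True)
    return ranked[:m], ranked[m:m + n]


def select_patches(scores, m, n, chunk_size, drop_ratio=0.0, seed=0):
    """Stream patches in chunks and return (salient_ids, sub_salient_ids).

    `drop_ratio` is a hypothetical knob for random patch drop: each patch is
    skipped with this probability before it is ever scored against the buffers.
    """
    rng = random.Random(seed)
    kept = [i for i in range(len(scores)) if rng.random() >= drop_ratio]
    salient, sub_salient = [], []
    for start in range(0, len(kept), chunk_size):
        chunk = [(scores[i], i) for i in kept[start:start + chunk_size]]
        salient, sub_salient = update_buffers(salient, sub_salient, chunk, m, n)
    return [i for _, i in salient], [i for _, i in sub_salient]


# Usage: six patches, two salient slots, two sub-salient slots.
toy_scores = [0.1, 0.9, 0.3, 0.8, 0.5, 0.7]
salient_ids, sub_ids = select_patches(toy_scores, m=2, n=2, chunk_size=2)
print(salient_ids, sub_ids)  # [1, 3] [5, 4]
```

Processing patches in chunks keeps the working set bounded by `m + n + chunk_size` entries regardless of how many patches the image yields, which is the property that makes a buffer-based selection attractive for very large images.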
Framework
Experiment
Conclusion
In this paper, we propose the dual-buffer patch selection (DBPS) method to increase the number of image patches used in training HRIR models while keeping computational resource consumption at a low level. To suppress the uninformative background information in the sub-salient image patches, we devise a dual-attention MIL architecture to generate a salient query for aggregating sub-salient patch embeddings. Additionally, we introduce an efficient random patch drop training strategy to uncover informative image regions while reducing both the training time and GPU memory usage. Experimental results demonstrate the effectiveness of our approach in terms of accuracy and training consumption across various HRIR tasks and datasets.
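As a rough illustration of how a salient query might aggregate sub-salient patch embeddings, the sketch below uses two attention steps. The details are assumptions rather than the paper's architecture: the scoring vector `w`, the scaled dot-product form, and all function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Weighted sum of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def salient_query(salient, w):
    """First attention: pool salient embeddings into a single query vector.

    `w` is a hypothetical learned scoring vector.
    """
    weights = softmax([dot(w, e) for e in salient])
    return weighted_sum(weights, salient)


def aggregate_sub_salient(query, sub_salient):
    """Second attention: weight sub-salient embeddings by similarity to the query.

    Embeddings that resemble the salient content receive higher weight, so
    background-like patches contribute less to the pooled result.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, e) / scale for e in sub_salient])
    return weighted_sum(weights, sub_salient), weights


# Usage: the first sub-salient patch aligns with the salient query, the second does not.
query = salient_query([[1.0, 0.0], [1.0, 0.0]], w=[0.0, 0.0])
pooled, attn = aggregate_sub_salient(query, [[5.0, 0.0], [0.0, 5.0]])
print(attn)
```

In this toy case the patch aligned with the query dominates the attention weights, which mirrors the stated goal of suppressing uninformative background in the sub-salient patches.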