Rethinking Query-based Transformer for Continual Image Segmentation
Yuchen Zhu, Cheng Shi, Dingyou Wang, Jiajin Tang, Zhengxuan Wei, Yu Wu, Guanbin Li, Sibei Yang
CVPR 2025

Abstract

Class-incremental/Continual image segmentation (CIS) aims to train an image segmenter in stages, where the set of available categories differs at each stage. To leverage the built-in objectness of query-based transformers, which mitigates catastrophic forgetting of mask proposals, current methods often decouple mask generation from the continual learning process. This study, however, identifies two key issues with decoupled frameworks: loss of plasticity and heavy reliance on input data order. To address these, we conduct an in-depth investigation of the built-in objectness and find that highly aggregated image features provide a shortcut for queries to generate masks through simple feature alignment. Based on this, we propose SimCIS, a simple yet powerful baseline for CIS. Its core idea is to directly select image features for query assignment, ensuring “perfect alignment” to preserve objectness, while simultaneously allowing queries to select new classes to promote plasticity. To further combat catastrophic forgetting of categories, we introduce cross-stage consistency in selection and an innovative “visual query”-based replay mechanism. Experiments demonstrate that SimCIS consistently outperforms state-of-the-art methods across various segmentation tasks, settings, splits, and input data orders.
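
To make the core idea concrete, below is a minimal PyTorch-style sketch of initializing decoder queries by directly selecting image features. The class name, the linear scoring head, the top-k selection rule, and the tensor shapes are illustrative assumptions, not the paper's actual implementation.

# Hedged sketch of the "select image features as queries" idea from the abstract.
# The scoring head, top-k selection, and shapes are assumptions for illustration.
import torch
import torch.nn as nn


class FeatureSelectedQueries(nn.Module):
    """Initialize decoder queries by picking pixel features directly,
    so queries stay aligned with the image evidence they will segment."""

    def __init__(self, dim: int = 256, num_queries: int = 100, num_classes: int = 20):
        super().__init__()
        self.score = nn.Linear(dim, 1)                      # per-feature selection score (assumed)
        self.classifier = nn.Linear(dim, num_classes + 1)   # +1 for "no object"
        self.num_queries = num_queries

    def forward(self, pixel_feats: torch.Tensor):
        # pixel_feats: (B, N, C) flattened features from the pixel decoder
        scores = self.score(pixel_feats).squeeze(-1)           # (B, N)
        topk = scores.topk(self.num_queries, dim=1).indices    # (B, Q)
        queries = torch.gather(
            pixel_feats, 1,
            topk.unsqueeze(-1).expand(-1, -1, pixel_feats.size(-1)),
        )                                                      # (B, Q, C)
        # Masks come from similarity between selected queries and all pixel features,
        # so each query is directly tied to the features it will segment.
        mask_logits = torch.einsum("bqc,bnc->bqn", queries, pixel_feats)
        class_logits = self.classifier(queries)                # (B, Q, K+1)
        return queries, mask_logits, class_logits

Because each query is literally one of the image features, the masks produced via feature similarity stay aligned with the image content, which is one way to read the “perfect alignment” property described above; the per-query classifier can then be extended with new classes at each stage to retain plasticity.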

Framework

Experiment

Conclusion


In this work, we present a novel class-incremental image segmentation (CIS) method called SimCIS, which addresses the challenges of catastrophic forgetting and background semantic shift. We first explore the emergence and diminishing of built-in objectness in query-based transformers, and then propose two novel modules, lazy query pre-alignment and consistent selection loss, to ensure built-in objectness both within and across stages. Additionally, we introduce virtual queries to mitigate catastrophic forgetting in class prediction. Comparisons with previous state-of-the-art CIS methods demonstrate the superiority of our model, and our ablation study verifies the contribution of each individual component, highlighting SimCIS's effectiveness in overcoming the key challenges of incremental learning.
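
As a rough illustration of the cross-stage consistency idea mentioned above, the sketch below penalizes the current model for drifting away from the previous stage's selection scores over pixel features. The KL-divergence form and the score shapes are assumptions, since the exact loss is not reproduced here.

# Hedged sketch of a cross-stage consistency term on selection scores:
# the current model is encouraged to keep selecting the features the frozen
# previous-stage model selected, so earlier objects are not dropped at the query level.
# The KL-divergence formulation is an assumption, not the paper's exact loss.
import torch
import torch.nn.functional as F


def selection_consistency_loss(curr_scores: torch.Tensor,
                               prev_scores: torch.Tensor) -> torch.Tensor:
    """curr_scores, prev_scores: (B, N) selection logits over pixel features,
    from the current model and the frozen previous-stage model."""
    prev_prob = F.softmax(prev_scores.detach(), dim=-1)   # previous-stage model is frozen
    curr_logprob = F.log_softmax(curr_scores, dim=-1)
    return F.kl_div(curr_logprob, prev_prob, reduction="batchmean")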