ICASSP 2025
Anima2: Cross-Species Animal Animation through Image-to-Video Synthesis with Subject Alignment
Y. Xu, Y. Chen, Z. Huang, Z. He, G. Wang and L. Lin

Abstract


Recent advances in video editing rely on accurate pose sequences to animate human actors. However, these methods are not suitable for cross-species animation because poses are misaligned between species (for example, the poses of a cat differ greatly from those of a pig due to their distinct body structures). In this paper, we present Anima2, a zero-shot diffusion-based video generator that addresses this issue, aiming to accurately ANIMAte ANIMAls while preserving the background. The key technique is a two-fold subject alignment. First, we improve appearance feature extraction by integrating a Laplacian detail booster and a prompt-tuning identity extractor, which together capture essential appearance information, including identity and fine details. Second, we align shape features and resolve conflicts between differing animals by introducing a scale-information remover and an adaptive rescaling module. Both enhance subject alignment for accurate cross-species animation. Additionally, we introduce two high-quality animal video datasets with diverse species to benchmark cross-species animation. Trained on these datasets, our model directly generates videos with accurate movements, consistent appearances, and high-fidelity frames, eliminating the need for test-time training. Extensive experiments demonstrate our method’s superiority in cross-species animation, showcasing robust adaptability and generality.
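To make the idea behind a Laplacian-based detail booster concrete, the sketch below applies a plain 4-neighbour Laplacian to a grayscale image and uses it to emphasize fine details. It is an illustration under stated assumptions, not the Anima2 implementation: the function names `laplacian_detail` and `boost_details`, the `strength` parameter, and the choice of kernel are all hypothetical, and the actual module presumably operates on learned features rather than raw pixels.

```python
def laplacian_detail(image):
    """4-neighbour Laplacian of a 2-D grayscale image (list of rows).

    Border pixels reuse the nearest edge value, so flat regions map to 0.
    Illustrative only; not the detail booster described in the paper.
    """
    h, w = len(image), len(image[0])

    def px(r, c):
        # Clamp coordinates to the image so borders replicate edge pixels.
        return image[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    return [
        [
            px(r - 1, c) + px(r + 1, c) + px(r, c - 1) + px(r, c + 1)
            - 4.0 * image[r][c]
            for c in range(w)
        ]
        for r in range(h)
    ]


def boost_details(image, strength=0.5):
    """Sharpen by subtracting a scaled Laplacian (hypothetical `strength`)."""
    lap = laplacian_detail(image)
    return [
        [image[r][c] - strength * lap[r][c] for c in range(len(image[0]))]
        for r in range(len(image))
    ]


if __name__ == "__main__":
    # A vertical step edge: the response is non-zero only next to the edge.
    step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    for row in laplacian_detail(step):
        print(row)
```

The Laplacian responds only where intensity changes, which is why it is a common choice for isolating fine texture such as fur or markings from smooth regions.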

 

 

Framework


 

Experiment


 

 

Conclusion


In this study, we introduce Anima2, a cross-species animation framework. Our approach performs subject alignment with four key components: the Laplacian detail booster (LDB) and the prompt-tuning identity extractor (DIDE) for appearance extraction, and the scale-information remover (SIR) and the adaptive rescaling module (ARM) to prevent shape information leakage, ensuring consistent training and precise action control. The framework also shows robust adaptability and generality.
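One simple way to see why removing scale information helps when the driving and target animals differ in size is to normalize a set of 2-D keypoints so that position and overall size no longer matter. The sketch below is a generic normalization under assumed inputs (a list of `(x, y)` keypoints); the function name `remove_scale` is hypothetical and this is not the SIR or ARM described in the paper.

```python
import math


def remove_scale(keypoints):
    """Centre 2-D keypoints and divide by their RMS distance to the centroid.

    After this step, two poses that differ only by translation and uniform
    scaling have the same representation. Illustrative assumption only.
    """
    n = len(keypoints)
    cx = sum(x for x, _ in keypoints) / n
    cy = sum(y for _, y in keypoints) / n
    centred = [(x - cx, y - cy) for x, y in keypoints]
    scale = math.sqrt(sum(x * x + y * y for x, y in centred) / n)
    if scale == 0.0:
        # Degenerate case: all keypoints coincide, nothing to rescale.
        return centred
    return [(x / scale, y / scale) for x, y in centred]


if __name__ == "__main__":
    small = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
    # Same shape, five times larger and shifted elsewhere in the frame.
    large = [(10.0 + 5.0 * x, 20.0 + 5.0 * y) for x, y in small]
    print(remove_scale(small))
    print(remove_scale(large))
```

Both inputs yield the same normalized coordinates up to floating-point error, which is the property a scale-agnostic shape representation needs before any rescaling to the target subject is applied.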