EasyControl: Adding Control to Video Diffusion for Controllable Video Generation and Interpolation
Cong Wang, Jiaxi Gu, Panwen Hu, Xiao Dong, Yuanfan Guo, Hang Xu, Xiaodan Liang
ICASSP 2025

Abstract


Diffusion models are widely used for either controllable video generation or video interpolation. Since each task comes with its own task-specific challenges, it is difficult to handle both with a single model. Moreover, most existing works support only image conditions and require redesigning the model structure to accommodate other types of conditions. Even then, they still suffer from frame flickering when the image is used as the condition, owing to the strong pixel-level alignment it imposes. To tackle these problems, in this work we are the first to propose a unified diffusion framework, EasyControl, for both controllable video generation and interpolation under different types of conditions. EasyControl introduces a condition adapter to extract condition features, which are then injected into an interchangeable fundamental text-to-video model to guide video generation. To alleviate frame flickering, we propose a module named VideoInit that integrates the low-frequency band of the input condition images, yielding smoother generation. Experimental results on four benchmarks show that our method outperforms previous methods on each task.
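The abstract describes VideoInit only at a high level. As a rough illustration of what "integrating the low-frequency band of the input condition images" could look like in practice, the PyTorch sketch below blends the low spatial frequencies of an encoded condition image into the initial video noise, so every frame starts from a shared low-frequency layout while keeping per-frame high-frequency randomness. The function name, tensor shapes, and the cutoff parameter are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def video_init(noise, cond_latent, cutoff=0.25):
    """Sketch of a VideoInit-style latent initialization (assumed interface).

    noise:       random initial latent, shape (B, C, T, H, W)
    cond_latent: encoded condition image broadcast over time, same shape
    cutoff:      fraction of the spatial spectrum treated as "low frequency"
    """
    # 2D FFT over the spatial dimensions of every frame, shifted so the
    # zero frequency sits at the center of the spectrum.
    noise_f = torch.fft.fftshift(torch.fft.fft2(noise, dim=(-2, -1)), dim=(-2, -1))
    cond_f = torch.fft.fftshift(torch.fft.fft2(cond_latent, dim=(-2, -1)), dim=(-2, -1))

    # Centered circular low-pass mask over the (H, W) plane.
    _, _, _, H, W = noise.shape
    yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, H, device=noise.device),
        torch.linspace(-1, 1, W, device=noise.device),
        indexing="ij",
    )
    lowpass = ((yy ** 2 + xx ** 2).sqrt() <= cutoff).to(noise_f.dtype)

    # Low frequencies come from the condition, high frequencies from the noise.
    mixed_f = cond_f * lowpass + noise_f * (1 - lowpass)

    mixed = torch.fft.ifft2(
        torch.fft.ifftshift(mixed_f, dim=(-2, -1)), dim=(-2, -1)
    ).real
    return mixed
```

In this sketch, the blended latent would replace the purely random noise fed to the denoiser, which is one plausible way a shared low-frequency component could reduce frame-to-frame flicker.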

 

 

Framework


 

 

Experiment


 

 

Conclusion


In this work, we propose EasyControl, which unifies controllable video generation and video interpolation in a single framework, enabling controllable video generation under various conditions and generating intermediate frames between given condition frames. The proposed VideoInit method further enhances the quality and smoothness of the generated videos. Our qualitative and quantitative experiments demonstrate that EasyControl outperforms previous methods on each task.