Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models

2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Sanoojan Baliah; Qinliang Lin; Shengcai Liao; Xiaodan Liang; Muhammad Haris Khan

Abstract

Despite promising progress in the face-swapping task, realistic swapped images remain elusive and are often marred by artifacts, particularly in scenarios involving high pose variation, color differences, and occlusion. To address these issues, we propose a novel approach that better harnesses diffusion models for face swapping by making the following core contributions. (a) We reframe the face-swapping task as a self-supervised, train-time inpainting problem, enhancing identity transfer while blending with the target image. (b) We introduce multi-step Denoising Diffusion Implicit Model (DDIM) sampling during training, reinforcing identity and perceptual similarities. (c) We introduce CLIP feature disentanglement to extract pose, expression, and lighting information from the target image, improving fidelity. (d) We introduce a mask shuffling technique during inpainting training, which allows us to create a so-called universal model for swapping, with the additional feature of head swapping. Our approach can swap hair and even accessories, going beyond traditional face swapping. Unlike prior works that rely on multiple off-the-shelf models, ours is a relatively unified approach and is therefore resilient to errors in other off-the-shelf models. Extensive experiments on the FFHQ and CelebA datasets validate the efficacy and robustness of our approach, showcasing high-fidelity, realistic face swapping with minimal inference time. Our code is available at REFace.
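For readers unfamiliar with the multi-step DDIM sampling mentioned in contribution (b), the following is a minimal sketch of the standard deterministic DDIM update (eta = 0), written in plain Python. It is not the authors' training code: the function names `ddim_step` and `ddim_sample`, the `predict_noise` callback, and the `alpha_bars` schedule are hypothetical placeholders chosen for illustration, and the identity and perceptual losses that the abstract says are reinforced during training are not shown.

```python
import math


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0), applied element-wise.

    x_t            : current noisy sample (list of floats)
    eps_pred       : noise predicted by the model for x_t
    alpha_bar_t    : cumulative alpha at the current timestep
    alpha_bar_prev : cumulative alpha at the next (less noisy) timestep
    """
    x_prev = []
    for x, e in zip(x_t, eps_pred):
        # Estimate the clean sample implied by the predicted noise.
        x0 = (x - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
        # Re-noise the estimate to the next timestep's noise level.
        x_prev.append(
            math.sqrt(alpha_bar_prev) * x0 + math.sqrt(1.0 - alpha_bar_prev) * e
        )
    return x_prev


def ddim_sample(x_T, predict_noise, alpha_bars):
    """Run a short multi-step DDIM trajectory.

    alpha_bars is assumed to be ordered from noisiest to cleanest and to
    end at 1.0 (no noise). predict_noise(x, alpha_bar) stands in for the
    denoising network and is a hypothetical callback.
    """
    x = list(x_T)
    for a_t, a_prev in zip(alpha_bars[:-1], alpha_bars[1:]):
        eps = predict_noise(x, a_t)
        x = ddim_step(x, eps, a_t, a_prev)
    return x
```

Because each step is deterministic, a trajectory with only a handful of timesteps already yields a usable clean estimate, which is what makes running it inside a training loop plausible in terms of cost.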

Framework

Experiment

Conclusion

We proposed a train-time diffusion-based inpainting pipeline for face swapping to obtain realistic swaps. Our disentangled CLIP feature further improves pose and expression preservation. Furthermore, we proposed a simple mask shuffling technique that also handles the head-swapping task. While our method significantly improves both performance (in qualitative and quantitative results) and efficiency (i.e., inference time and training cost), there is still room for improvement, especially under extreme pose and expression variations, which we leave for future work.

