DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model

ICCV 2025

Junjia Huang, Pengxiang Yan, Jinhang Cai, Jiyang Liu, Zhao Wang, Yitong Wang, Xinglong Wu, Guanbin Li

Abstract

Text-driven image generation with diffusion models has recently gained significant attention. To enable more flexible image manipulation and editing, recent research has expanded from single-image generation to transparent layer generation and multi-layer composition. However, existing approaches rarely explore multi-layer structure thoroughly, which leads to inconsistent inter-layer interactions such as occlusion relationships, spatial layout, and shadowing. In this paper, we introduce DreamLayer, a novel framework that enables coherent text-driven generation of multiple image layers by explicitly modeling the relationship between transparent foreground and background layers. DreamLayer incorporates three key components: Context-Aware Cross-Attention (CACA) for global-local information exchange, Layer-Shared Self-Attention (LSSA) for establishing robust inter-layer connections, and Information Retained Harmonization (IRH) for refining fusion details at the latent level. Using a coherent full-image context, DreamLayer builds inter-layer connections through attention mechanisms and applies a harmonization step to fuse the layers seamlessly. To facilitate research in multi-layer generation, we construct a high-quality, diverse multi-layer dataset of 400k samples. Extensive experiments and user studies demonstrate that DreamLayer generates more coherent and better-aligned layers and applies broadly, including to latent-space image editing and image-to-layer decomposition.
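The abstract states that LSSA builds inter-layer connections through attention, but this page does not give its formulation. The following is only a minimal NumPy sketch under one assumption: that "layer-shared" means the tokens of all layers are concatenated into a single sequence before self-attention, so each layer can attend to every other layer. The function name, the single-head setup, and the weight matrices `w_q`, `w_k`, `w_v` are hypothetical and are not DreamLayer's code.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax: subtract the row max before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_shared_self_attention(h, w_q, w_k, w_v):
    """Hypothetical single-head self-attention shared across layers.

    h: (L, T, d) hidden states, i.e. L layers with T tokens of width d each.
    All L*T tokens form one sequence, so every token can attend to tokens of
    every layer; the result is split back into per-layer outputs.
    """
    num_layers, num_tokens, d = h.shape
    x = h.reshape(num_layers * num_tokens, d)            # joint token sequence
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])              # (L*T, L*T)
    attn = softmax(scores, axis=-1)
    out = attn @ v
    return out.reshape(num_layers, num_tokens, -1), attn
```

Because the attention matrix spans all L×T tokens, changing one layer's tokens changes the outputs of the other layers. Restricting the scores with a block-diagonal mask would instead recover independent per-layer attention, which is the contrast this sketch is meant to illustrate.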

Framework

Experiment

Conclusion

In this paper, we introduce a large-scale, high-quality multi-layer dataset featuring diverse foreground objects and backgrounds. Building on this dataset, we propose DreamLayer, a framework for simultaneously generating multi-layer images. To keep the layout of the foreground layers consistent, we introduce Context-Aware Cross-Attention, which guides foreground generation using the harmonious layout of a global image. To strengthen inter-layer connections, we present Layer-Shared Self-Attention, which enables effective information exchange between layers. Finally, to generate a cohesive composite image, we propose Information Retained Harmonization, which merges the layers at the latent level to achieve seamless fusion. DreamLayer supports not only multi-layer generation but also image-to-layer decomposition via inversion, enabling flexible and harmonious editing within the latent space. Experimental results demonstrate the effectiveness of DreamLayer for multi-layer generation.
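The conclusion describes IRH as merging layers at the latent level. As a minimal sketch of that merging step only, the code below assumes standard back-to-front alpha ("over") compositing of per-layer latents, with pixel-space alpha masks average-pooled to latent resolution. The 8× downsampling factor is an assumed value typical of VAE-based latent diffusion, and both function names are hypothetical. The harmonization that IRH performs on top of the merge is not reproduced here.

```python
import numpy as np

def downsample_mask(mask: np.ndarray, factor: int = 8) -> np.ndarray:
    """Average-pool an (H, W) pixel-space alpha mask to latent resolution.

    Assumes H and W are divisible by `factor` (8 is an assumed VAE stride).
    """
    h, w = mask.shape
    blocks = mask.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def composite_latents(background, foregrounds, alphas):
    """Hypothetical back-to-front alpha compositing of layer latents.

    background:  (C, h, w) background latent.
    foregrounds: list of (C, h, w) foreground latents, ordered back to front.
    alphas:      list of (h, w) masks in [0, 1] at latent resolution.
    """
    out = np.array(background, dtype=float, copy=True)
    for fg, a in zip(foregrounds, alphas):
        a = np.clip(a, 0.0, 1.0)[None]      # add a channel axis for broadcasting
        out = a * fg + (1.0 - a) * out      # non-premultiplied "over" operator
    return out
```

Averaging the mask over 8×8 pixel blocks yields fractional alphas along object boundaries, so a naive latent merge like this one blends foreground and background features in those regions.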

Human Cyber Physical Intelligence Integration Lab, Sun Yat-sen University

hcp@sysu.edu.cn
No. 132 Waihuan East Road, Guangzhou Higher Education Mega Center, Guangzhou, China
