ControlNeXt: Powerful and Efficient Control for Image and Video Generation

Bohao Peng¹, Jian Wang¹, Yuechen Zhang¹, Wenbo Li¹, Ming-Chang Yang¹, Jiaya Jia¹,²

¹ The Chinese University of Hong Kong, ² SmartMore

Overview

TL;DR: This work proposes a lightweight control module for various base models (SD1.5, SDXL, SD3, SVD) and tasks (image and video generation with various conditions).

⚠️ To reduce file size for the webpage, we compressed the images and videos. Please refer to the original files for full quality.

Hover over a picture to see the overlaid condition, and click it to see the full view!

SVD + Pose

SDXL + Canny

SD1.5 + Canny

SD3 + SR

Video Generation via Stable Video Diffusion

User Input: First frame image & pose guidance sequence

If you can't load the videos because of network problems, you can also view them on BiliBili.

SVD + Pose

Image Generation with SDXL

We trained a Canny adapter on the SDXL model.

SDXL + Canny

Various Stylization & Editing

Condition

SDXL + Canny

Condition

SDXL + Canny

Condition

SDXL + Canny

Source Image

SDXL + Canny

Condition

SDXL + Canny

Image Generation with SD1.5

Our method is also compatible with the community's LoRA weights.

Pose Condition

SD1.5 + Warrior

SD1.5 + Genshin

SD1.5 + Chinese Painting

SD1.5 + Animation

We trained multiple adapters on the SD1.5 model.

SD1.5 + Pose + LoRA

SD1.5 + Pose

SD1.5 + Canny

SD1.5 + Mask

SD1.5 + Depth

Image Super-Resolution with SD3

We trained a Super-Resolution ControlNeXt on SD3 with degraded inputs.
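The page does not say how the degraded inputs were produced. As a purely illustrative sketch, assuming a simple degradation of box downsampling followed by additive Gaussian noise, a low-resolution input could be synthesized from a high-resolution image as below. The function name `degrade` and its parameters `factor`, `noise_std`, and `seed` are hypothetical and are not taken from the ControlNeXt code.

```python
import random


def degrade(image, factor=4, noise_std=5.0, seed=0):
    """Box-downsample a grayscale image and add Gaussian noise.

    `image` is a list of rows with pixel values in 0..255.
    This is an assumed degradation, not the authors' pipeline.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    out = []
    # Walk over non-overlapping factor x factor blocks, dropping any remainder.
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [
                image[y + dy][x + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            mean = sum(block) / len(block)
            noisy = mean + rng.gauss(0.0, noise_std)
            # Clamp back to the valid 8-bit range.
            row.append(min(255, max(0, round(noisy))))
        out.append(row)
    return out


# An 8x8 horizontal gradient stands in for a high-resolution image.
hr = [[x * 32 for x in range(8)] for _ in range(8)]
lr = degrade(hr, factor=4, noise_std=0.0)
```

With `noise_std=0.0` the result is just the block average, which makes the downsampling step easy to check by hand. In practice a degradation of this kind would be applied to each training image to form the low-resolution condition.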

SD3 + LR

SD3 + SR

SD3 + LR

SD3 + SR

SD3 + LR

SD3 + SR

SD3 + LR

SD3 + SR

Training convergence

Our method achieves significantly faster convergence during training.

The model starts to learn control within hundreds of training steps.

Training of SD1.5

Contact Us

Feel free to contact Bohao Peng at bhpeng22@cse.cuhk.edu.hk with any questions, or for cooperation and communication.

If you find this work useful, please consider citing:

@article{peng2024controlnext,
  title={ControlNeXt: Powerful and Efficient Control for Image and Video Generation},
  author={Peng, Bohao and Wang, Jian and Zhang, Yuechen and Li, Wenbo and Yang, Ming-Chang and Jia, Jiaya},
  journal={arXiv preprint arXiv:2408.06070},
  year={2024}
}

Thanks to UltraPixel for providing the project page template!