About ControlNet

ControlNet is a neural network structure that controls diffusion models by adding extra conditions, primarily to text-to-image diffusion models. It duplicates the weights of the neural network blocks into a “locked” copy and a “trainable” copy. The “trainable” copy learns the condition, while the “locked” copy preserves the original model.
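The locked/trainable split can be illustrated with a minimal sketch. It assumes a toy block stored as plain Python lists; the names `pretrained_block`, `sgd_step`, and `fake_grads` are hypothetical and used only for illustration. A real implementation would copy the modules of a deep learning framework instead.

```python
import copy

# Hypothetical stand-in for one pretrained network block: a single linear layer
# stored as plain Python lists (a real block would be a framework module).
pretrained_block = {"weight": [[0.5, -0.2], [0.1, 0.3]], "bias": [0.0, 0.1]}

# Duplicate the weights: one copy stays frozen, the other is trained.
locked = copy.deepcopy(pretrained_block)
trainable = copy.deepcopy(pretrained_block)


def sgd_step(block, grads, lr=0.1):
    """Apply one gradient-descent update in place to a block's parameters."""
    for r, row in enumerate(block["weight"]):
        for c in range(len(row)):
            row[c] -= lr * grads["weight"][r][c]
    for i in range(len(block["bias"])):
        block["bias"][i] -= lr * grads["bias"][i]


# Made-up gradients, used only to show that updates touch the trainable copy.
fake_grads = {"weight": [[1.0, 1.0], [1.0, 1.0]], "bias": [1.0, 1.0]}
sgd_step(trainable, fake_grads)  # the locked copy is never passed to the optimizer

locked_unchanged = locked == pretrained_block
trainable_changed = trainable != pretrained_block
```

Because only the trainable copy is ever updated, the locked copy still holds the original weights after any number of training steps.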

Here are four key features of ControlNet:

  1. Zero Convolution: ControlNet uses a “zero convolution”, a 1×1 convolution with both weight and bias initialized to zeros. Before training, all zero convolutions output zeros, so ControlNet causes no distortion. The original model is therefore left intact when training begins.
  2. Small-Scale Training: ControlNet can be trained on small-scale or even personal devices. This is possible because training with a small dataset of image pairs will not destroy the production-ready diffusion model.
  3. Model Compatibility: ControlNet is compatible with merging, replacing, or offsetting models, weights, blocks, and layers. This makes it a versatile tool that can be used in conjunction with various other models and structures.
  4. Stable Diffusion: By repeating the simple structure of ControlNet multiple times, it can control Stable Diffusion (SD) effectively. This allows ControlNet to reuse the SD encoder as a deep, robust backbone for learning diverse controls.
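The zero-convolution property from the first feature can be checked with a small sketch in pure Python. It assumes the output of the zero convolution is added to the output of the locked copy; the helper names (`conv1x1`, `zero_conv_params`, `add`) and all feature values are made up for illustration.

```python
def conv1x1(image, weight, bias):
    """1x1 convolution: apply the same channel-mixing linear map at every pixel.

    image:  H x W x C_in nested lists
    weight: C_out x C_in nested lists
    bias:   list of length C_out
    """
    return [
        [
            [sum(w * x for w, x in zip(w_row, pixel)) + b
             for w_row, b in zip(weight, bias)]
            for pixel in row
        ]
        for row in image
    ]


def zero_conv_params(c_in, c_out):
    """Weight and bias initialized to zeros, as in a zero convolution."""
    return [[0.0] * c_in for _ in range(c_out)], [0.0] * c_out


def add(a, b):
    """Element-wise sum of two H x W x C feature maps."""
    return [[[p + q for p, q in zip(pa, pb)] for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


# Toy 2x2 feature maps with 3 channels (made-up values).
locked_out = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
              [[7.0, 8.0, 9.0], [0.5, 1.5, 2.5]]]
trainable_out = [[[9.0, -3.0, 2.0], [1.0, 1.0, 1.0]],
                 [[-4.0, 0.0, 8.0], [2.0, 2.0, 2.0]]]

# Before training, the zero convolution maps any input to zeros ...
w, b = zero_conv_params(c_in=3, c_out=3)
control_signal = conv1x1(trainable_out, w, b)

# ... so adding it to the locked output (assumed combination) changes nothing.
combined = add(locked_out, control_signal)
```

Here `combined` equals `locked_out` regardless of what the trainable copy produces, which is why the untrained ControlNet does not distort the original model.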