4/30/2023

Promptdog latest release

Official implementation of Adding Conditional Control to Text-to-Image Diffusion Models.

ControlNet is a neural network structure to control diffusion models by adding extra conditions. It copies the weights of neural network blocks into a "locked" copy and a "trainable" copy. The "trainable" one learns your condition. Thanks to this, training with a small dataset of image pairs will not destroy the production-ready diffusion models.

The "zero convolution" is a 1×1 convolution with both weight and bias initialized to zeros. Before training, all zero convolutions output zeros, and ControlNet will not cause any distortion. This allows training on small-scale or even personal devices. It is also friendly to merging/replacement/offsetting of models/weights/blocks/layers.

Q: But wait, if the weight of a conv layer is zero, the gradient will also be zero, and the network will not learn anything. Why does "zero convolution" work?

A: This is not true. Although the gradient flowing *through* a zero weight to earlier layers is zero at first, the gradients with respect to the zero convolution's own weight and bias are not (they depend on its nonzero input features), so the weights move away from zero after the first optimization step and learning proceeds.

Stable Diffusion + ControlNet

By repeating the above simple structure 14 times, we can control Stable Diffusion. In this way, the ControlNet can reuse the SD encoder as a deep, strong, robust, and powerful backbone to learn diverse controls. Much evidence (like this and this) validates that the SD encoder is an excellent backbone. Note that the way we connect the layers is computationally efficient: the original SD encoder (the locked SD Encoder Blocks 1-4 and the Middle block) does not need to store gradients, so although many layers are added, the required GPU memory is not much larger than that of the original SD.

Features & News

0 - Implementation for non-prompt mode released. See also Guess Mode / Non-Prompt Mode.
1 - Low VRAM mode is added. Please use this mode if you are using 8GB GPU(s) or if you want a larger batch size.
2 - Now you can play with any community model by transferring the ControlNet. Great!
3 - We released a discussion - Precomputed ControlNet: Speed up ControlNet by 45%, but is it necessary?
6 - We released a blog - Ablation Study: Why ControlNets use deep encoder? What if it was lighter? Or even an MLP?

Right now in the app, the normal map is computed from the MiDaS depth map and a user-set threshold (which determines how much of the image is treated as background, with an identity normal facing the viewer; tune the "Normal background threshold" in the Gradio app to get a feeling for it). Compared to the depth model, this model seems to be a bit better at preserving geometry.

Prompt: "Plaster statue of Abraham Lincoln"

You can see that the hairstyle of the man in the input image is modified by the depth model but preserved by the normal model. Below is the depth result with the same inputs. This is intuitive: minor details are not salient in depth maps, but are salient in normal maps.

We also trained a relatively simple ControlNet for anime line drawings. This tool may be useful for artistic creations.
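The locked/trainable-copy structure and the zero convolution described above can be sketched in a few lines of PyTorch. This is a minimal illustration with assumed names (`ControlledBlock`, `zero_conv`), not the official implementation: a frozen copy of a pretrained block is summed with a trainable copy's output passed through a zero-initialized 1×1 convolution, so the combined block is an identity wrapper before training.

```python
# Minimal sketch of a ControlNet-style block (assumed names, not the official code).
import copy

import torch
import torch.nn as nn


def zero_conv(channels):
    """1x1 convolution with weight and bias initialized to zeros."""
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv


class ControlledBlock(nn.Module):
    def __init__(self, block, channels):
        super().__init__()
        self.trainable = copy.deepcopy(block)  # "trainable" copy: learns the condition
        self.locked = block                    # "locked" copy: pretrained, frozen
        for p in self.locked.parameters():
            p.requires_grad_(False)
        self.zero = zero_conv(channels)

    def forward(self, x, condition):
        out = self.locked(x)
        control = self.trainable(x + condition)
        # Zero conv outputs exactly zero before training, so `out` is undistorted;
        # its weight/bias gradients are nonzero, so it moves off zero and learns.
        return out + self.zero(control)
```

Note how this demonstrates the FAQ answer: at initialization the block's output equals the locked block's output, yet a backward pass still produces nonzero gradients on the zero convolution itself.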
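The normal-from-depth computation described above (depth gradients plus a background threshold that forces an identity normal facing the viewer) can be approximated as follows. This is a rough NumPy sketch under assumed conventions (`depth_to_normal` and the MiDaS-style "larger means closer" depth are assumptions), not the app's actual code:

```python
# Rough sketch: derive a normal map from a depth map with a background threshold.
import numpy as np


def depth_to_normal(depth, background_threshold=0.4):
    """depth: (H, W) array, MiDaS-style (larger values are closer to the camera)."""
    depth = depth.astype(np.float64)
    dz_dy, dz_dx = np.gradient(depth)
    # Surface normal from depth gradients, z axis pointing toward the viewer.
    normal = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    normal /= np.linalg.norm(normal, axis=-1, keepdims=True)
    # Pixels below the threshold are treated as background: identity normal (0, 0, 1),
    # i.e. a flat surface facing the viewer.
    normal[depth < background_threshold] = np.array([0.0, 0.0, 1.0])
    return normal
```

Raising the threshold marks more of the image as flat background, which is the knob the "Normal background threshold" slider exposes.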