SDXL parameters (Detailed)


Diagrams were drawn with ProcessOn. The images below can be saved and viewed at full resolution.

1 UNet

1.0 Introduction

The UNet is responsible for predicting the noise in the latent at each denoising step.

1.1 Detailed overall structure

1.2 Reduced version of the overall structure

1.3 Time step encoding

1.4 CrossAttnDownBlock2D

Each ResnetBlock2D has two inputs:

  1. The latent output from the previous layer.

  2. The output of the time step encoding module, time_embeds (tensor shape [2, 1280]; this shape is omitted below).
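
The two inputs above can be illustrated with a rough NumPy sketch of how a ResNet-style block might combine them: the time embedding is projected to one value per channel and added to the latent across all spatial positions. This shows the shape logic only, not the diffusers implementation. The convolutions and normalization layers are left out, and the function name, channel count (640), spatial size (8 x 8), and random weights are all assumptions chosen for illustration.

```python
import numpy as np

def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def inject_time_embedding(latent, time_embeds, proj_weight, proj_bias):
    """Add a projected time embedding to a latent feature map.

    latent:      [batch, channels, height, width]
    time_embeds: [batch, 1280]
    proj_weight: [1280, channels]  (hypothetical weights)
    proj_bias:   [channels]        (hypothetical bias)
    """
    # Project the time embedding to one value per channel: [batch, channels]
    t = silu(time_embeds) @ proj_weight + proj_bias
    # Broadcast over the spatial dimensions and add: [batch, channels, H, W]
    return latent + t[:, :, None, None]

rng = np.random.default_rng(0)
batch, channels, height, width = 2, 640, 8, 8  # illustrative sizes, assumed
latent = rng.standard_normal((batch, channels, height, width))
time_embeds = rng.standard_normal((batch, 1280))
proj_weight = rng.standard_normal((1280, channels)) * 0.02
proj_bias = np.zeros(channels)

out = inject_time_embedding(latent, time_embeds, proj_weight, proj_bias)
print(out.shape)  # (2, 640, 8, 8)
```

The output keeps the latent's shape. The time embedding only shifts each channel by a per-sample offset, so both inputs can feed one block even though their shapes differ.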

Each Transformer2DModel has two inputs:

  1. Output of the previous layer