Existing Model Information Records

Base model: SD 1.4

BK-SDM

https://github.com/Nota-NetsPresso/BK-SDM (data source: BK-SDM model card)

Training information

Training set: BK-SDM data

  • Training Data
  • Hardware: A single NVIDIA A100 80GB GPU
  • Gradient Accumulations: 4
  • Batch: 256 (=4×64)
  • Optimizer: AdamW
  • Learning Rate: a constant learning rate of 5e-5 for 50K-iteration pretraining
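The batch arithmetic above (256 = 4×64) can be sketched as follows. This is a minimal illustration, assuming that 64 is the per-step micro-batch, that 4 is the number of gradient accumulation steps, and that one "iteration" means one optimizer update; the function name `effective_batch_size` is hypothetical and not taken from the BK-SDM code.

```python
def effective_batch_size(micro_batch: int, grad_accum_steps: int, num_gpus: int = 1) -> int:
    """Number of samples contributing to one optimizer update."""
    return micro_batch * grad_accum_steps * num_gpus


# Values from the training notes above (single A100, so num_gpus defaults to 1).
MICRO_BATCH = 64      # assumed per-step micro-batch
GRAD_ACCUM = 4        # gradient accumulation steps
ITERATIONS = 50_000   # pretraining iterations

batch = effective_batch_size(MICRO_BATCH, GRAD_ACCUM)

# Assumption: each iteration is one optimizer update over the full effective batch.
pairs_seen = batch * ITERATIONS

print(f"effective batch: {batch}")
print(f"image-text pairs processed: {pairs_seen:,}")
```

Under these assumptions the run processes 12.8M image-text pairs in total, counting repeated passes over the same data.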

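Evaluation information

Copied and organized from the data source. Test set: MS COCO 30k, which can be downloaded with the BK-SDM scripts or via the file link.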

The following table shows the results on 30K samples from the MS-COCO validation split. After generating 512×512 images with the PNDM scheduler and 25 denoising steps, we downsampled them to 256×256 for evaluating generation scores.
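The generation setting described above (512×512 images, PNDM scheduler, 25 denoising steps, then downsampling to 256×256) could be reproduced roughly as sketched below with the Hugging Face diffusers library. This is not the BK-SDM evaluation script: the model ID `CompVis/stable-diffusion-v1-4`, the Lanczos resampling filter, the default guidance scale, and the function name `generate_for_eval` are all assumptions made for illustration.

```python
EVAL_STEPS = 25          # denoising steps, from the notes above
GEN_SIZE = 512           # generation resolution
OUT_SIZE = (256, 256)    # resolution used for scoring


def generate_for_eval(prompt, model_id="CompVis/stable-diffusion-v1-4", device="cuda"):
    """Generate one 512x512 image and downsample it to 256x256.

    model_id and the resampling filter are assumptions, not BK-SDM settings.
    """
    # Heavy dependencies are imported lazily so the module loads without them.
    import torch
    from diffusers import PNDMScheduler, StableDiffusionPipeline
    from PIL import Image

    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype)
    # Swap in the PNDM scheduler while keeping the checkpoint's scheduler config.
    pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)
    pipe = pipe.to(device)

    image = pipe(
        prompt,
        num_inference_steps=EVAL_STEPS,
        height=GEN_SIZE,
        width=GEN_SIZE,
    ).images[0]
    # Assumed filter; the source does not state which one was used.
    return image.resize(OUT_SIZE, Image.LANCZOS)
```

Calling `generate_for_eval("a photo of a cat")` downloads the checkpoint on first use, and in practice needs a GPU to run at a reasonable speed.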

Zero-shot MS-COCO 256×256 30K

  • Our models were drawn at the 50K-th training iteration.
Model                  | FID↓  | IS↑   | CLIP Score↑ (ViT-g/14) | # Params, U-Net | # Params, Whole SDM
Stable Diffusion v1.4  | 13.05 | 36.76 | 0.2958                 | 0.86B           | 1.04B
BK-SDM-Base (Ours)     | 15.76 | 33.79 | 0.2878                 | 0.58B           | 0.76B
BK-SDM-Base-2M (Ours)  | 14.81 | 34.17 | 0.2883                 | 0.58B           | 0.76B
BK-SDM-Small (Ours)    | 16.98 | 31.68 | 0.2677                 | 0.49B           | 0.66B
BK-SDM-Small-2M (Ours) | 17.05 | 33.10 | 0.2734                 | 0.49B           | 0.66B
BK-SDM-Tiny (Ours)     | 17.12 | 30.09 | 0.2653                 | 0.33B           | 0.50B
BK-SDM-Tiny-2M (Ours)  | 17.53 | 31.32 | 0.2690                 | 0.33B           | 0.50B

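To make the size differences in the results above easier to compare, the parameter reduction relative to Stable Diffusion v1.4 can be computed directly from the reported counts. The sketch below uses only the numbers listed above; the helper name `reduction_pct` is hypothetical, and because the published counts are rounded to two decimals the percentages are approximate.

```python
# Parameter counts in billions, copied from the results above.
SD_V14 = {"unet": 0.86, "whole": 1.04}
BK_SDM = {
    "BK-SDM-Base": {"unet": 0.58, "whole": 0.76},
    "BK-SDM-Small": {"unet": 0.49, "whole": 0.66},
    "BK-SDM-Tiny": {"unet": 0.33, "whole": 0.50},
}


def reduction_pct(small: float, large: float) -> float:
    """Percentage of parameters removed relative to the larger model."""
    return round(100.0 * (1.0 - small / large), 1)


reductions = {
    name: {part: reduction_pct(sizes[part], SD_V14[part]) for part in ("unet", "whole")}
    for name, sizes in BK_SDM.items()
}

for name, r in reductions.items():
    print(f"{name}: U-Net -{r['unet']}%, whole SDM -{r['whole']}%")
```

The 2M variants share the same architectures as their counterparts, so their reductions are identical.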
The following figure depicts synthesized images with some MS-COCO captions.

Effect of Different Data Sizes for Training BK-SDM-Small

Increasing the number of training pairs improves the IS and CLIP scores over training progress. The MS-COCO 256×256 30K benchmark was used for evaluation. Furthermore, with the growth in data volume, visual results become more favorable (e.g., better image-text alignment and clear distinction among objects).

Additional Visual Examples

Personalized Generation (Full Finetuning)

To show the applicability of our lightweight SD backbones, we use DreamBooth finetuning for personalized generation.

  • Each subject is marked as “a [identifier] [class noun]” (e.g., “a [V] dog”).
  • Our BK-SDMs can synthesize the input subjects in different backgrounds while preserving their appearance.

Base model: SD 1.5

segmind/tiny-sd

segmind/tiny-sd · Hugging Face

Training information

code: segmind/distill-sd: Segmind Distilled diffusion (github.com)

segmind/portrait-finetuned

segmind/portrait-finetuned · Hugging Face

Most of the information is the same as for segmind/tiny-sd, except that this model was finetuned on portrait images.

Training information

  • Base Model:  segmind/tiny-sd · Hugging Face
  • Training Data
    • Not described in detail; the source only says portrait images, 7k images
  • Steps: 131000
  • Gradient Accumulations: 4
  • Batch: 128 (=4×32)
  • Mixed-precision: fp16
  • Image resolution: 768
  • Learning Rate: 1e-4

Evaluation summary

Model                          | FID (MS-COCO 30k) | FID (MS-COCO 5k) | U-Net Params | Text Encoder Params | Image Decoder Params | Whole Params
runwayml/stable-diffusion-v1-5 | 13.8705           | 19.4507          | 0.860B       | 0.123B              | 0.05B                | 1.032B
segmind/tiny-sd                | 17.7266           | 23.0478          | 0.323B       | 0.123B              | 0.05B                | 0.495B
tiny-sd-lcm-6400               | 22.1493           | 27.4531          | 0.323B       | 0.123B              | 0.05B                | 0.495B
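
As a sanity check on the summary above, the whole-model parameter count should be roughly the sum of the U-Net, text encoder, and image decoder counts. The sketch below runs that check on the reported values; the tolerance of 0.002B is an assumption chosen to absorb rounding in the published figures, and `components_match` is a hypothetical helper name.

```python
# (unet, text_encoder, image_decoder, whole) in billions, from the summary above.
ROWS = {
    "runwayml/stable-diffusion-v1-5": (0.860, 0.123, 0.05, 1.032),
    "segmind/tiny-sd": (0.323, 0.123, 0.05, 0.495),
    "tiny-sd-lcm-6400": (0.323, 0.123, 0.05, 0.495),
}


def components_match(unet: float, text: float, decoder: float, whole: float,
                     tol: float = 0.002) -> bool:
    """True if the component counts add up to the whole within a rounding tolerance."""
    return abs((unet + text + decoder) - whole) <= tol


checks = {name: components_match(*row) for name, row in ROWS.items()}

for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'mismatch'}")
```

Each row sums to about 0.001B more than the reported whole, which is consistent with the component counts having been rounded individually.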