Add some benchmarks and information

Přemysl Eric Janouch 2024-01-18 18:16:18 +01:00
parent 8df76dbaab
commit 77e988365d
Signed by: p
GPG Key ID: A0420B94F92B9493
2 changed files with 163 additions and 7 deletions


@@ -2,24 +2,129 @@ deeptagger
==========
This is an automatic image tagger/classifier written in C++,
without using any Python, and primarily targets various anime models.
primarily targeting various anime models.
Unfortunately, you will still need Python and some luck to prepare the models,
achieved by running download.sh. You will need about 20 gigabytes of space.
Unfortunately, you will still need Python 3, as well as some luck, to prepare
the models, achieved by running download.sh. You will need about 20 gigabytes
of space for this operation.
Very little effort is made to make this work on non-Unix systems.
"WaifuDiffusion v1.4" models are officially distributed with ONNX model exports
that do not support symbolic batch sizes. The script attempts to fix this
by running custom exports.
Getting this to work
--------------------
You're invited to change things to suit your particular needs.
Getting it to work
------------------
To build the evaluator, install a C++ compiler, CMake, and development packages
of GraphicsMagick and ONNX Runtime.
Prebuilt ONNX Runtime can be most conveniently downloaded from
https://github.com/microsoft/onnxruntime/releases[GitHub releases].
Remember to install CUDA packages, such as _nvidia-cudnn_ on Debian,
Remember to also install CUDA packages, such as _nvidia-cudnn_ on Debian,
if you plan on using the GPU-enabled options.
$ cmake -DONNXRuntime_ROOT=/path/to/onnxruntime -B build
$ cmake --build build
$ ./download.sh
$ build/deeptagger models/deepdanbooru-v3-20211112-sgd-e28.model image.jpg
Very little effort is made to make the project compatible with non-POSIX
systems.
Options
-------
--batch 1::
This program makes use of batches by decoding and preparing multiple images
in parallel before sending them off to models.
Batching requires appropriate models.
--cpu::
Force CPU inference, which is usually extremely slow.
--debug::
Increase verbosity.
--options "CUDAExecutionProvider;device_id=0"::
Set various ONNX Runtime execution provider options.
--pipe::
Take input filenames from the standard input.
--threshold 0.1::
Output weight threshold. Needs to be set very high on ML-Danbooru models.
Model benchmarks
----------------
These were measured on a machine with GeForce RTX 4090 (24G),
and Ryzen 9 7950X3D (32 threads), on a sample of 704 images,
which took over eight hours.
There is room for further performance tuning.
GPU inference
~~~~~~~~~~~~~
[cols="<,>,>", options=header]
|===
|Model|Batch size|Time
|ML-Danbooru Caformer dec-5-97527|16|OOM
|WD v1.4 ViT v2 (batch)|16|19 s
|DeepDanbooru|16|21 s
|WD v1.4 SwinV2 v2 (batch)|16|21 s
|WD v1.4 ViT v2 (batch)|4|27 s
|WD v1.4 SwinV2 v2 (batch)|4|30 s
|DeepDanbooru|4|31 s
|ML-Danbooru TResNet-D 6-30000|16|31 s
|WD v1.4 MOAT v2 (batch)|16|31 s
|WD v1.4 ConvNeXT v2 (batch)|16|32 s
|WD v1.4 ConvNeXTV2 v2 (batch)|16|36 s
|ML-Danbooru TResNet-D 6-30000|4|39 s
|WD v1.4 ConvNeXT v2 (batch)|4|39 s
|WD v1.4 MOAT v2 (batch)|4|39 s
|WD v1.4 ConvNeXTV2 v2 (batch)|4|43 s
|WD v1.4 ViT v2|1|43 s
|WD v1.4 ViT v2 (batch)|1|43 s
|ML-Danbooru Caformer dec-5-97527|4|48 s
|DeepDanbooru|1|53 s
|WD v1.4 MOAT v2|1|53 s
|WD v1.4 ConvNeXT v2|1|54 s
|WD v1.4 MOAT v2 (batch)|1|54 s
|WD v1.4 SwinV2 v2|1|54 s
|WD v1.4 SwinV2 v2 (batch)|1|54 s
|WD v1.4 ConvNeXT v2 (batch)|1|56 s
|WD v1.4 ConvNeXTV2 v2|1|56 s
|ML-Danbooru TResNet-D 6-30000|1|58 s
|WD v1.4 ConvNeXTV2 v2 (batch)|1|58 s
|ML-Danbooru Caformer dec-5-97527|1|73 s
|===
CPU inference
~~~~~~~~~~~~~
[cols="<,>,>", options=header]
|===
|Model|Batch size|Time
|DeepDanbooru|16|45 s
|DeepDanbooru|4|54 s
|DeepDanbooru|1|88 s
|ML-Danbooru TResNet-D 6-30000|4|139 s
|ML-Danbooru TResNet-D 6-30000|16|162 s
|ML-Danbooru TResNet-D 6-30000|1|167 s
|WD v1.4 ConvNeXT v2|1|208 s
|WD v1.4 ConvNeXT v2 (batch)|4|226 s
|WD v1.4 ConvNeXT v2 (batch)|16|238 s
|WD v1.4 ConvNeXTV2 v2|1|245 s
|WD v1.4 ConvNeXTV2 v2 (batch)|4|268 s
|WD v1.4 ViT v2 (batch)|16|270 s
|WD v1.4 ConvNeXT v2 (batch)|1|272 s
|WD v1.4 SwinV2 v2 (batch)|4|277 s
|WD v1.4 ViT v2 (batch)|4|277 s
|WD v1.4 ConvNeXTV2 v2 (batch)|16|294 s
|WD v1.4 SwinV2 v2 (batch)|1|300 s
|WD v1.4 SwinV2 v2|1|302 s
|WD v1.4 SwinV2 v2 (batch)|16|305 s
|WD v1.4 MOAT v2 (batch)|4|307 s
|WD v1.4 ViT v2|1|308 s
|WD v1.4 ViT v2 (batch)|1|311 s
|WD v1.4 ConvNeXTV2 v2 (batch)|1|312 s
|WD v1.4 MOAT v2|1|332 s
|WD v1.4 MOAT v2 (batch)|16|335 s
|WD v1.4 MOAT v2 (batch)|1|339 s
|ML-Danbooru Caformer dec-5-97527|4|637 s
|ML-Danbooru Caformer dec-5-97527|16|689 s
|ML-Danbooru Caformer dec-5-97527|1|829 s
|===
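As a rough reading aid for the tables above, a time can be converted into throughput. This is a minimal sketch that assumes each Time value is the wall-clock time for one pass over the whole 704-image sample, which the text implies but does not state outright. The helper name `images_per_second` is hypothetical and not part of the project.

```sh
#!/bin/sh -e
# images_per_second IMAGES SECONDS
# Assumption: SECONDS covers one full pass over IMAGES images.
images_per_second() {
	awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", n / t }'
}

images_per_second 704 19   # fastest GPU row: prints 37.1
images_per_second 704 829  # slowest CPU row: prints 0.8
```

Under that assumption, the fastest and slowest configurations differ by a factor of roughly 44.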

51
deeptagger/bench-interpret.sh Executable file

@@ -0,0 +1,51 @@
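#!/bin/sh -e
# Reduce the benchmark log to the best time per (model, CPU flag, batch size).
# Input fields, tab-separated: name, path, CPU flag, batch size, time.
parse() {
	awk 'BEGIN {
		OFS = FS = "\t"
	} {
		name = $1
		path = $2
		cpu = $3 != ""
		batch = $4
		time = $5
		# Mark batch-capable exports; skip batched rows of other WD models.
		if (path ~ "/batch-")
			name = name " (batch)"
		else if (name ~ /^WD / && batch > 1)
			next
	} {
		# Rows of one group are assumed to be adjacent in the log.
		group = name FS cpu FS batch
		if (lastgroup != group) {
			if (lastgroup)
				print lastgroup, mintime
			lastgroup = group
			mintime = time
		} else {
			if (mintime > time)
				mintime = time
		}
	} END {
		print lastgroup, mintime
	}' "${BENCH_LOG:-bench.out}"
}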
cat <<END
GPU inference
~~~~~~~~~~~~~
[cols="<,>,>", options=header]
|===
|Model|Batch size|Time
$(parse | awk -F'\t' 'BEGIN { OFS = "|" }
!$2 { print "", $1, $3, $4 " s" }' | sort -t'|' -nk4)
|===
CPU inference
~~~~~~~~~~~~~
[cols="<,>,>", options=header]
|===
|Model|Batch size|Time
$(parse | awk -F'\t' 'BEGIN { OFS = "|" }
$2 { print "", $1, $3, $4 " s" }' | sort -t'|' -nk4)
|===
END
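The grouping step in `parse()` can be tried in isolation. The following is a simplified sketch, not the script itself: it omits the model-name rewriting, the function name `min_per_group` is hypothetical, and the sample rows are made up, with the log layout (name, path, CPU flag, batch size, time) inferred from the field assignments in `parse()`.

```sh
#!/bin/sh -e
# Keep the minimum time per (name, CPU flag, batch size) group.
# As in parse(), rows of one group must be adjacent in the input.
min_per_group() {
	awk 'BEGIN {
		OFS = FS = "\t"
	} {
		group = $1 FS ($3 != "") FS $4
		if (group != lastgroup) {
			if (lastgroup)
				print lastgroup, mintime
			lastgroup = group
			mintime = $5
		} else if (mintime > $5) {
			mintime = $5
		}
	} END {
		if (lastgroup)
			print lastgroup, mintime
	}'
}

# Hypothetical sample: two GPU runs and one CPU run of the same model.
result=$(printf '%s\t%s\t%s\t%s\t%s\n' \
	'DeepDanbooru' 'models/a.model' '' 16 23 \
	'DeepDanbooru' 'models/a.model' '' 16 21 \
	'DeepDanbooru' 'models/a.model' 'cpu' 16 45 | min_per_group)
printf '%s\n' "$result"
```

This should print one line per group: the GPU group with time 21, followed by the CPU group with time 45. Because only the previous group is remembered, an unsorted log would yield several lines for the same group, so this approach is only suitable when the log is written group by group.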