Add some benchmarks and information
parent 8df76dbaab
commit 77e988365d
@@ -2,24 +2,129 @@ deeptagger
 ==========
 
 This is an automatic image tagger/classifier written in C++,
-without using any Python, and primarily targets various anime models.
+primarily targeting various anime models.
 
-Unfortunately, you will still need Python and some luck to prepare the models,
-achieved by running download.sh. You will need about 20 gigabytes of space.
+Unfortunately, you will still need Python 3, as well as some luck, to prepare
+the models, achieved by running download.sh. You will need about 20 gigabytes
+of space for this operation.
 
-Very little effort is made to make this work on non-Unix systems.
+"WaifuDiffusion v1.4" models are officially distributed with ONNX model exports
+that do not support symbolic batch sizes. The script attempts to fix this
+by running custom exports.
 
-Getting this to work
---------------------
+You're invited to change things to suit your particular needs.
+
+Getting it to work
+------------------
 To build the evaluator, install a C++ compiler, CMake, and development packages
 of GraphicsMagick and ONNX Runtime.
 
 Prebuilt ONNX Runtime can be most conveniently downloaded from
 https://github.com/microsoft/onnxruntime/releases[GitHub releases].
-Remember to install CUDA packages, such as _nvidia-cudnn_ on Debian,
+Remember to also install CUDA packages, such as _nvidia-cudnn_ on Debian,
 if you plan on using the GPU-enabled options.
 
 $ cmake -DONNXRuntime_ROOT=/path/to/onnxruntime -B build
 $ cmake --build build
 $ ./download.sh
 $ build/deeptagger models/deepdanbooru-v3-20211112-sgd-e28.model image.jpg
+
+Very little effort is made to make the project compatible with non-POSIX
+systems.
+
+Options
+-------
+--batch 1::
+This program makes use of batches by decoding and preparing multiple images
+in parallel before sending them off to models.
+Batching requires appropriate models.
+--cpu::
+Force CPU inference, which is usually extremely slow.
+--debug::
+Increase verbosity.
+--options "CUDAExecutionProvider;device_id=0"::
+Set various ONNX Runtime execution provider options.
+--pipe::
+Take input filenames from the standard input.
+--threshold 0.1::
+Output weight threshold. Needs to be set very high on ML-Danbooru models.
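The options listed above can be combined in a single invocation. The following is only a sketch: the `tag_directory` helper is hypothetical, and it assumes that options go before the model path and that `--pipe` replaces the image arguments with filenames read from standard input, neither of which the README states explicitly.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of the project: tag every JPEG under
# a directory in one deeptagger run.
# Assumptions: options precede the model path, and --pipe makes the program
# read one filename per line from standard input instead of arguments.
DEEPTAGGER=${DEEPTAGGER:-build/deeptagger}
MODEL=${MODEL:-models/deepdanbooru-v3-20211112-sgd-e28.model}

tag_directory() {
  # $1 is the directory to scan for images.
  find "$1" -type f -name '*.jpg' |
    "$DEEPTAGGER" --pipe --batch 16 --threshold 0.1 "$MODEL"
}

# Example call, left commented out so that sourcing the file has no effect:
# tag_directory ~/Pictures
```

Filenames containing newlines would break this pipeline, since each line is taken as one path.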
+
+Model benchmarks
+----------------
+These were measured on a machine with GeForce RTX 4090 (24G),
+and Ryzen 9 7950X3D (32 threads), on a sample of 704 images,
+which took over eight hours.
+
+There is room for further performance tuning.
+
+GPU inference
+~~~~~~~~~~~~~
+[cols="<,>,>", options=header]
+|===
+|Model|Batch size|Time
+|ML-Danbooru Caformer dec-5-97527|16|OOM
+|WD v1.4 ViT v2 (batch)|16|19 s
+|DeepDanbooru|16|21 s
+|WD v1.4 SwinV2 v2 (batch)|16|21 s
+|WD v1.4 ViT v2 (batch)|4|27 s
+|WD v1.4 SwinV2 v2 (batch)|4|30 s
+|DeepDanbooru|4|31 s
+|ML-Danbooru TResNet-D 6-30000|16|31 s
+|WD v1.4 MOAT v2 (batch)|16|31 s
+|WD v1.4 ConvNeXT v2 (batch)|16|32 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|16|36 s
+|ML-Danbooru TResNet-D 6-30000|4|39 s
+|WD v1.4 ConvNeXT v2 (batch)|4|39 s
+|WD v1.4 MOAT v2 (batch)|4|39 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|4|43 s
+|WD v1.4 ViT v2|1|43 s
+|WD v1.4 ViT v2 (batch)|1|43 s
+|ML-Danbooru Caformer dec-5-97527|4|48 s
+|DeepDanbooru|1|53 s
+|WD v1.4 MOAT v2|1|53 s
+|WD v1.4 ConvNeXT v2|1|54 s
+|WD v1.4 MOAT v2 (batch)|1|54 s
+|WD v1.4 SwinV2 v2|1|54 s
+|WD v1.4 SwinV2 v2 (batch)|1|54 s
+|WD v1.4 ConvNeXT v2 (batch)|1|56 s
+|WD v1.4 ConvNeXTV2 v2|1|56 s
+|ML-Danbooru TResNet-D 6-30000|1|58 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|1|58 s
+|ML-Danbooru Caformer dec-5-97527|1|73 s
+|===
+
+CPU inference
+~~~~~~~~~~~~~
+[cols="<,>,>", options=header]
+|===
+|Model|Batch size|Time
+|DeepDanbooru|16|45 s
+|DeepDanbooru|4|54 s
+|DeepDanbooru|1|88 s
+|ML-Danbooru TResNet-D 6-30000|4|139 s
+|ML-Danbooru TResNet-D 6-30000|16|162 s
+|ML-Danbooru TResNet-D 6-30000|1|167 s
+|WD v1.4 ConvNeXT v2|1|208 s
+|WD v1.4 ConvNeXT v2 (batch)|4|226 s
+|WD v1.4 ConvNeXT v2 (batch)|16|238 s
+|WD v1.4 ConvNeXTV2 v2|1|245 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|4|268 s
+|WD v1.4 ViT v2 (batch)|16|270 s
+|WD v1.4 ConvNeXT v2 (batch)|1|272 s
+|WD v1.4 SwinV2 v2 (batch)|4|277 s
+|WD v1.4 ViT v2 (batch)|4|277 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|16|294 s
+|WD v1.4 SwinV2 v2 (batch)|1|300 s
+|WD v1.4 SwinV2 v2|1|302 s
+|WD v1.4 SwinV2 v2 (batch)|16|305 s
+|WD v1.4 MOAT v2 (batch)|4|307 s
+|WD v1.4 ViT v2|1|308 s
+|WD v1.4 ViT v2 (batch)|1|311 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|1|312 s
+|WD v1.4 MOAT v2|1|332 s
+|WD v1.4 MOAT v2 (batch)|16|335 s
+|WD v1.4 MOAT v2 (batch)|1|339 s
+|ML-Danbooru Caformer dec-5-97527|4|637 s
+|ML-Danbooru Caformer dec-5-97527|16|689 s
+|ML-Danbooru Caformer dec-5-97527|1|829 s
+|===
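The timings in the two tables can be turned into rough throughput figures. This sketch assumes that each Time value is the wall-clock time for the whole 704-image sample, which the README implies but does not state outright; the `images_per_second` helper is made up for illustration.

```shell
#!/bin/sh
# images_per_second SECONDS [IMAGES]
# Hypothetical helper: converts one benchmark row into images per second.
# Assumption: the Time column covers the entire sample (704 images by default).
images_per_second() {
  awk -v t="$1" -v n="${2:-704}" 'BEGIN { printf "%.1f\n", n / t }'
}

images_per_second 19   # WD v1.4 ViT v2 (batch), GPU, batch size 16: prints 37.1
images_per_second 45   # DeepDanbooru, CPU, batch size 16: prints 15.6
```

Under that assumption, the fastest GPU row processes roughly 37 images per second, and the fastest CPU row roughly 16.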
@@ -0,0 +1,51 @@
+#!/bin/sh -e
+parse() {
+  awk 'BEGIN {
+    OFS = FS = "\t"
+  } {
+    name = $1
+    path = $2
+    cpu = $3 != ""
+    batch = $4
+    time = $5
+
+    if (path ~ "/batch-")
+      name = name " (batch)"
+    else if (name ~ /^WD / && batch > 1)
+      next
+  } {
+    group = name FS cpu FS batch
+    if (lastgroup != group) {
+      if (lastgroup)
+        print lastgroup, mintime
+
+      lastgroup = group
+      mintime = time
+    } else {
+      if (mintime > time)
+        mintime = time
+    }
+  } END {
+    print lastgroup, mintime
+  }' "${BENCH_LOG:-bench.out}"
+}
+
+cat <<END
+GPU inference
+~~~~~~~~~~~~~
+[cols="<,>,>", options=header]
+|===
+|Model|Batch size|Time
+$(parse | awk -F'\t' 'BEGIN { OFS = "|" }
+  !$2 { print "", $1, $3, $4 " s" }' | sort -t'|' -nk4)
+|===
+
+CPU inference
+~~~~~~~~~~~~~
+[cols="<,>,>", options=header]
+|===
+|Model|Batch size|Time
+$(parse | awk -F'\t' 'BEGIN { OFS = "|" }
+  $2 { print "", $1, $3, $4 " s" }' | sort -t'|' -nk4)
+|===
+END
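The script above keeps the minimum time per model, device and batch size, then renders the rows for each table. This standalone sketch shows that reduction on a made-up log. The sample rows, the model path and the `--cpu` marker in the third field are assumptions: the real format of `bench.out` is not shown in this commit, only that a non-empty third field means CPU.

```shell
#!/bin/sh
# Hypothetical sample in the layout parse() reads: tab-separated
# name, model path, CPU marker (empty means GPU), batch size, seconds.
sample_log() {
  printf '%s\t%s\t%s\t%s\t%s\n' \
    'DeepDanbooru' 'models/example.model' '' 1 55 \
    'DeepDanbooru' 'models/example.model' '' 1 53 \
    'DeepDanbooru' 'models/example.model' '--cpu' 1 88 \
    'DeepDanbooru' 'models/example.model' '--cpu' 1 90
}

# Keep the fastest run of each consecutive (name, cpu, batch) group.
# Like the original, this only merges runs that are adjacent in the log.
min_per_group() {
  awk 'BEGIN { OFS = FS = "\t" } {
    group = $1 FS ($3 != "") FS $4
    if (group != lastgroup) {
      if (lastgroup) print lastgroup, mintime
      lastgroup = group
      mintime = $5 + 0
    } else if ($5 + 0 < mintime) {
      mintime = $5 + 0
    }
  } END { if (lastgroup) print lastgroup, mintime }'
}

# Render GPU rows (CPU flag 0) as AsciiDoc table rows, fastest first.
gpu_rows() {
  awk -F'\t' 'BEGIN { OFS = "|" } !$2 { print "", $1, $3, $4 " s" }' |
    sort -t'|' -nk4
}

sample_log | min_per_group | gpu_rows   # prints: |DeepDanbooru|1|53 s
```

Unlike the original `END` block, this sketch prints nothing for an empty log, which avoids emitting a stray row when no benchmark results exist.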