Add some benchmarks and information
parent 8df76dbaab
commit 77e988365d
@@ -2,24 +2,129 @@ deeptagger
 ==========
 
 This is an automatic image tagger/classifier written in C++,
-without using any Python, and primarily targets various anime models.
+primarily targeting various anime models.
 
-Unfortunately, you will still need Python and some luck to prepare the models,
-achieved by running download.sh. You will need about 20 gigabytes of space.
+Unfortunately, you will still need Python 3, as well as some luck, to prepare
+the models, achieved by running download.sh. You will need about 20 gigabytes
+of space for this operation.
 
-Very little effort is made to make this work on non-Unix systems.
+"WaifuDiffusion v1.4" models are officially distributed with ONNX model exports
+that do not support symbolic batch sizes. The script attempts to fix this
+by running custom exports.
 
-Getting this to work
---------------------
+You're invited to change things to suit your particular needs.
+
+Getting it to work
+------------------
 To build the evaluator, install a C++ compiler, CMake, and development packages
 of GraphicsMagick and ONNX Runtime.
 
 Prebuilt ONNX Runtime can be most conveniently downloaded from
 https://github.com/microsoft/onnxruntime/releases[GitHub releases].
-Remember to install CUDA packages, such as _nvidia-cudnn_ on Debian,
+Remember to also install CUDA packages, such as _nvidia-cudnn_ on Debian,
 if you plan on using the GPU-enabled options.
 
 $ cmake -DONNXRuntime_ROOT=/path/to/onnxruntime -B build
 $ cmake --build build
 $ ./download.sh
 $ build/deeptagger models/deepdanbooru-v3-20211112-sgd-e28.model image.jpg
+
+Very little effort is made to make the project compatible with non-POSIX
+systems.
+
+Options
+-------
+--batch 1::
+This program makes use of batches by decoding and preparing multiple images
+in parallel before sending them off to models.
+Batching requires appropriate models.
+--cpu::
+Force CPU inference, which is usually extremely slow.
+--debug::
+Increase verbosity.
+--options "CUDAExecutionProvider;device_id=0"::
+Set various ONNX Runtime execution provider options.
+--pipe::
+Take input filenames from the standard input.
+--threshold 0.1::
+Output weight threshold. Needs to be set very high on ML-Danbooru models.
+
+Model benchmarks
+----------------
+These were measured on a machine with GeForce RTX 4090 (24G),
+and Ryzen 9 7950X3D (32 threads), on a sample of 704 images,
+which took over eight hours.
+
+There is room for further performance tuning.
+
+GPU inference
+~~~~~~~~~~~~~
+[cols="<,>,>", options=header]
+|===
+|Model|Batch size|Time
+|ML-Danbooru Caformer dec-5-97527|16|OOM
+|WD v1.4 ViT v2 (batch)|16|19 s
+|DeepDanbooru|16|21 s
+|WD v1.4 SwinV2 v2 (batch)|16|21 s
+|WD v1.4 ViT v2 (batch)|4|27 s
+|WD v1.4 SwinV2 v2 (batch)|4|30 s
+|DeepDanbooru|4|31 s
+|ML-Danbooru TResNet-D 6-30000|16|31 s
+|WD v1.4 MOAT v2 (batch)|16|31 s
+|WD v1.4 ConvNeXT v2 (batch)|16|32 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|16|36 s
+|ML-Danbooru TResNet-D 6-30000|4|39 s
+|WD v1.4 ConvNeXT v2 (batch)|4|39 s
+|WD v1.4 MOAT v2 (batch)|4|39 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|4|43 s
+|WD v1.4 ViT v2|1|43 s
+|WD v1.4 ViT v2 (batch)|1|43 s
+|ML-Danbooru Caformer dec-5-97527|4|48 s
+|DeepDanbooru|1|53 s
+|WD v1.4 MOAT v2|1|53 s
+|WD v1.4 ConvNeXT v2|1|54 s
+|WD v1.4 MOAT v2 (batch)|1|54 s
+|WD v1.4 SwinV2 v2|1|54 s
+|WD v1.4 SwinV2 v2 (batch)|1|54 s
+|WD v1.4 ConvNeXT v2 (batch)|1|56 s
+|WD v1.4 ConvNeXTV2 v2|1|56 s
+|ML-Danbooru TResNet-D 6-30000|1|58 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|1|58 s
+|ML-Danbooru Caformer dec-5-97527|1|73 s
+|===
+
+CPU inference
+~~~~~~~~~~~~~
+[cols="<,>,>", options=header]
+|===
+|Model|Batch size|Time
+|DeepDanbooru|16|45 s
+|DeepDanbooru|4|54 s
+|DeepDanbooru|1|88 s
+|ML-Danbooru TResNet-D 6-30000|4|139 s
+|ML-Danbooru TResNet-D 6-30000|16|162 s
+|ML-Danbooru TResNet-D 6-30000|1|167 s
+|WD v1.4 ConvNeXT v2|1|208 s
+|WD v1.4 ConvNeXT v2 (batch)|4|226 s
+|WD v1.4 ConvNeXT v2 (batch)|16|238 s
+|WD v1.4 ConvNeXTV2 v2|1|245 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|4|268 s
+|WD v1.4 ViT v2 (batch)|16|270 s
+|WD v1.4 ConvNeXT v2 (batch)|1|272 s
+|WD v1.4 SwinV2 v2 (batch)|4|277 s
+|WD v1.4 ViT v2 (batch)|4|277 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|16|294 s
+|WD v1.4 SwinV2 v2 (batch)|1|300 s
+|WD v1.4 SwinV2 v2|1|302 s
+|WD v1.4 SwinV2 v2 (batch)|16|305 s
+|WD v1.4 MOAT v2 (batch)|4|307 s
+|WD v1.4 ViT v2|1|308 s
+|WD v1.4 ViT v2 (batch)|1|311 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|1|312 s
+|WD v1.4 MOAT v2|1|332 s
+|WD v1.4 MOAT v2 (batch)|16|335 s
+|WD v1.4 MOAT v2 (batch)|1|339 s
+|ML-Danbooru Caformer dec-5-97527|4|637 s
+|ML-Danbooru Caformer dec-5-97527|16|689 s
+|ML-Danbooru Caformer dec-5-97527|1|829 s
+|===
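For a rough sense of throughput, the benchmark times can be converted to images per second. This is only a sketch: it assumes that each Time value covers one pass over the whole 704-image sample, which the README text suggests but does not state outright. The `throughput` function and the `images` variable are hypothetical names, not part of the commit. Rows without a numeric time, such as the OOM entry, are skipped.

```shell
#!/bin/sh -e
# Assumption: every "Time" value is the wall time for all 704 sample images.
images=704

# Read AsciiDoc table rows of the form "|Model|Batch size|Time" on stdin
# and print model, batch size and images per second, tab-separated.
throughput() {
	awk -F'|' -v n="$images" '
		# Only rows whose last cell looks like "<number> s" are converted;
		# the header row and OOM rows fall through silently.
		$4 ~ /^[0-9]+ s$/ {
			printf "%s\t%s\t%.1f\n", $2, $3, n / ($4 + 0)
		}'
}

# Three rows copied from the GPU inference table.
rows='|ML-Danbooru Caformer dec-5-97527|16|OOM
|WD v1.4 ViT v2 (batch)|16|19 s
|DeepDanbooru|16|21 s'

result=$(printf '%s\n' "$rows" | throughput)
printf '%s\n' "$result"
```

Under that assumption, the fastest GPU configuration works out to roughly 37 images per second.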
51
deeptagger/bench-interpret.sh
Executable file
@@ -0,0 +1,51 @@
+#!/bin/sh -e
+parse() {
+	awk 'BEGIN {
+		OFS = FS = "\t"
+	} {
+		name = $1
+		path = $2
+		cpu = $3 != ""
+		batch = $4
+		time = $5
+
+		if (path ~ "/batch-")
+			name = name " (batch)"
+		else if (name ~ /^WD / && batch > 1)
+			next
+	} {
+		group = name FS cpu FS batch
+		if (lastgroup != group) {
+			if (lastgroup)
+				print lastgroup, mintime
+
+			lastgroup = group
+			mintime = time
+		} else {
+			if (mintime > time)
+				mintime = time
+		}
+	} END {
+		print lastgroup, mintime
+	}' "${BENCH_LOG:-bench.out}"
+}
+
+cat <<END
+GPU inference
+~~~~~~~~~~~~~
+[cols="<,>,>", options=header]
+|===
+|Model|Batch size|Time
+$(parse | awk -F'\t' 'BEGIN { OFS = "|" }
+!$2 { print "", $1, $3, $4 " s" }' | sort -t'|' -nk4)
+|===
+
+CPU inference
+~~~~~~~~~~~~~
+[cols="<,>,>", options=header]
+|===
+|Model|Batch size|Time
+$(parse | awk -F'\t' 'BEGIN { OFS = "|" }
+$2 { print "", $1, $3, $4 " s" }' | sort -t'|' -nk4)
+|===
+END
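The grouping step in `parse()` keeps the smallest time seen for each run of consecutive rows that share a model name, CPU flag and batch size. It can be tried in isolation with the sketch below. The layout of bench.out is not shown in this commit, so the five tab-separated columns (name, path, CPU flag, batch size, time) are an assumption inferred from the awk variable names. The sample rows, the model path and the `sample` and `min_per_group` helpers are made up for illustration. Unlike the committed script, this version also guards against empty input in the `END` block.

```shell
#!/bin/sh -e
# Hypothetical log rows; an empty third column is taken to mean GPU.
sample() {
	printf '%s\t%s\t%s\t%s\t%s\n' \
		'DeepDanbooru' 'models/example.model' '' 1 55 \
		'DeepDanbooru' 'models/example.model' '' 1 53 \
		'DeepDanbooru' 'models/example.model' '--cpu' 1 88
}

# Print one line per consecutive (name, cpu, batch) group with its minimum
# time. Like the original, this relies on repeated runs being adjacent.
min_per_group() {
	awk 'BEGIN { OFS = FS = "\t" } {
		cpu = $3 != ""
		group = $1 FS cpu FS $4
		if (group != lastgroup) {
			# A new group starts: flush the previous one, if any.
			if (lastgroup != "")
				print lastgroup, mintime
			lastgroup = group
			mintime = $5
		} else if (mintime > $5) {
			mintime = $5
		}
	} END {
		if (lastgroup != "")
			print lastgroup, mintime
	}'
}

result=$(sample | min_per_group)
printf '%s\n' "$result"
```

Taking the minimum rather than the mean is one common way to reduce noise from background load in repeated timing runs. Whether that was the author's motivation is not stated in the commit.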