deeptagger
==========

This is an automatic image tagger/classifier written in C++,
primarily targeting various anime models.

Unfortunately, you will still need Python 3, as well as some luck, to prepare
the models, which is done by running download.sh.  This operation needs about
20 gigabytes of space.

"WaifuDiffusion v1.4" models are officially distributed with ONNX model exports
that do not support symbolic batch sizes.  The script attempts to fix this
by running custom exports.

You're invited to change things to suit your particular needs.

Getting it to work
------------------
To build the evaluator, install a C++ compiler, CMake, and development packages
of GraphicsMagick and ONNX Runtime.

Prebuilt ONNX Runtime can be most conveniently downloaded from
https://github.com/microsoft/onnxruntime/releases[GitHub releases].
Remember to also install CUDA packages, such as _nvidia-cudnn_ on Debian,
if you plan on using the GPU-enabled options.

 $ cmake -DONNXRuntime_ROOT=/path/to/onnxruntime -B build
 $ cmake --build build
 $ ./download.sh
 $ build/deeptagger models/deepdanbooru-v3-20211112-sgd-e28.model image.jpg

The project requires a POSIX-compatible system to build.

Options
-------
--batch 1::
	This program makes use of batches by decoding and preparing multiple images
	in parallel before sending them off to models.
	Batching requires models exported with a symbolic batch size.
--cpu::
	Force CPU inference, which is usually extremely slow.
--debug::
	Increase verbosity.
--options "CUDAExecutionProvider;device_id=0"::
	Set various ONNX Runtime execution provider options.
--pipe::
	Take input filenames from the standard input.
--threshold 0.1::
	Output weight threshold.  Needs to be set higher on ML-Danbooru models.
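
In a wrapper script, the threshold could be picked from the model filename.
The helper below is a hypothetical sketch, not part of deeptagger.  It assumes
ML-Danbooru model files can be recognized by an `ml_` filename prefix, as with
the CAFormer file named in this README.  The value 0.5 is borrowed from the
gallery tagging example, and 0.1 from the option synopsis above.

```shell
# Hypothetical helper, not part of deeptagger: choose a --threshold value
# from the model filename.  Assumes ML-Danbooru files start with "ml_".
pick_threshold() {
	case "${1##*/}" in
	ml_*) echo 0.5 ;;  # value used with CAFormer in this README
	*)    echo 0.1 ;;  # value shown in the option synopsis
	esac
}

model=models/ml_caformer_m36_dec-5-97527.model
echo "would run: build/deeptagger --threshold $(pick_threshold "$model") $model image.jpg"
```

Suitable thresholds depend on the model and on how many tags you want,
so treat both values as starting points.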

Tagging galleries
-----------------
The appropriate invocation depends on your machine and the chosen model.
Unless you have a powerful machine or use a fast model, it may take forever.

 $ find "$GALLERY/images" -type f \
   | build/deeptagger --pipe -b 16 -t 0.5 \
     models/ml_caformer_m36_dec-5-97527.model \
   | sed 's|[^\t]*/||' \
   | gallery tag "$GALLERY" caformer "ML-Danbooru CAFormer"
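
To see what the sed step in this pipeline does, it can be tried on a made-up
line.  The sample assumes that deeptagger prints the image path first,
followed by tab-separated fields.  The path, tag, and weight are invented
for illustration, and `strip_dirs` is just a name for this sketch.

```shell
# Reduce the leading path to a bare filename; the remaining fields stay intact.
strip_dirs() {
	sed 's|[^\t]*/||'
}

# Invented sample line: path, tag, weight, separated by tabs.
printf '/srv/gallery/images/ab/cdef.jpg\t1girl\t0.98\n' | strip_dirs
```

This prints `cdef.jpg`, the tag, and the weight, still separated by tabs.
Treating `\t` as a tab inside a bracket expression is a GNU sed extension.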

Model benchmarks (Linux)
------------------------
These were measured with ORT 1.16.3 on a machine with GeForce RTX 4090 (24G),
and Ryzen 9 7950X3D (32 threads), on a sample of 704 images,
which took over eight hours.  Times include model loading.

There is room for further performance tuning.

GPU inference
~~~~~~~~~~~~~
[cols="<,>,>", options=header]
|===
|Model|Batch size|Time
|WD v1.4 ViT v2 (batch)|16|19 s
|DeepDanbooru|16|21 s
|WD v1.4 SwinV2 v2 (batch)|16|21 s
|ML-Danbooru CAFormer dec-5-97527|16|25 s
|WD v1.4 ViT v2 (batch)|4|27 s
|WD v1.4 SwinV2 v2 (batch)|4|30 s
|DeepDanbooru|4|31 s
|ML-Danbooru TResNet-D 6-30000|16|31 s
|WD v1.4 MOAT v2 (batch)|16|31 s
|WD v1.4 ConvNeXT v2 (batch)|16|32 s
|ML-Danbooru CAFormer dec-5-97527|4|32 s
|WD v1.4 ConvNeXTV2 v2 (batch)|16|36 s
|ML-Danbooru TResNet-D 6-30000|4|39 s
|WD v1.4 ConvNeXT v2 (batch)|4|39 s
|WD v1.4 MOAT v2 (batch)|4|39 s
|WD v1.4 ConvNeXTV2 v2 (batch)|4|43 s
|WD v1.4 ViT v2|1|43 s
|WD v1.4 ViT v2 (batch)|1|43 s
|ML-Danbooru CAFormer dec-5-97527|1|52 s
|DeepDanbooru|1|53 s
|WD v1.4 MOAT v2|1|53 s
|WD v1.4 ConvNeXT v2|1|54 s
|WD v1.4 MOAT v2 (batch)|1|54 s
|WD v1.4 SwinV2 v2|1|54 s
|WD v1.4 SwinV2 v2 (batch)|1|54 s
|WD v1.4 ConvNeXT v2 (batch)|1|56 s
|WD v1.4 ConvNeXTV2 v2|1|56 s
|ML-Danbooru TResNet-D 6-30000|1|58 s
|WD v1.4 ConvNeXTV2 v2 (batch)|1|58 s
|===
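
For a rough sense of scale, the times above can be turned into images per
second by dividing the 704-image sample by the elapsed time.  This is a small
sketch using awk, and `throughput` is a hypothetical helper name.  Because the
times include model loading, the results understate steady-state speed.

```shell
# throughput IMAGES SECONDS: print images per second with one decimal place.
throughput() {
	awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n / s }'
}

throughput 704 19  # WD v1.4 ViT v2 (batch), batch size 16
throughput 704 43  # WD v1.4 ViT v2 (batch), batch size 1
```

The two calls print 37.1 and 16.4, respectively.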

CPU inference
~~~~~~~~~~~~~
[cols="<,>,>", options=header]
|===
|Model|Batch size|Time
|DeepDanbooru|16|45 s
|DeepDanbooru|4|54 s
|DeepDanbooru|1|88 s
|ML-Danbooru TResNet-D 6-30000|4|139 s
|ML-Danbooru TResNet-D 6-30000|16|162 s
|ML-Danbooru TResNet-D 6-30000|1|167 s
|WD v1.4 ConvNeXT v2|1|208 s
|WD v1.4 ConvNeXT v2 (batch)|4|226 s
|WD v1.4 ConvNeXT v2 (batch)|16|238 s
|WD v1.4 ConvNeXTV2 v2|1|245 s
|WD v1.4 ConvNeXTV2 v2 (batch)|4|268 s
|WD v1.4 ViT v2 (batch)|16|270 s
|ML-Danbooru CAFormer dec-5-97527|4|270 s
|WD v1.4 ConvNeXT v2 (batch)|1|272 s
|WD v1.4 SwinV2 v2 (batch)|4|277 s
|WD v1.4 ViT v2 (batch)|4|277 s
|WD v1.4 ConvNeXTV2 v2 (batch)|16|294 s
|WD v1.4 SwinV2 v2 (batch)|1|300 s
|WD v1.4 SwinV2 v2|1|302 s
|WD v1.4 SwinV2 v2 (batch)|16|305 s
|ML-Danbooru CAFormer dec-5-97527|16|305 s
|WD v1.4 MOAT v2 (batch)|4|307 s
|WD v1.4 ViT v2|1|308 s
|WD v1.4 ViT v2 (batch)|1|311 s
|WD v1.4 ConvNeXTV2 v2 (batch)|1|312 s
|WD v1.4 MOAT v2|1|332 s
|WD v1.4 MOAT v2 (batch)|16|335 s
|WD v1.4 MOAT v2 (batch)|1|339 s
|ML-Danbooru CAFormer dec-5-97527|1|352 s
|===

Model benchmarks (macOS)
------------------------
These were measured with ORT 1.16.3 on a MacBook Pro, M1 Pro (16GB),
macOS Ventura 13.6.2, on a sample of 179 images.  Times include model loading.

There was often significant memory pressure and swapping,
which may explain some of the anomalies.  CoreML often makes things worse,
and generally consumes a lot more memory than pure CPU execution.

The kernel panic was repeatable.

GPU inference
~~~~~~~~~~~~~
[cols="<2,>1,>1", options=header]
|===
|Model|Batch size|Time
|DeepDanbooru|1|24 s
|DeepDanbooru|8|31 s
|DeepDanbooru|4|33 s
|WD v1.4 SwinV2 v2 (batch)|4|71 s
|WD v1.4 SwinV2 v2 (batch)|1|76 s
|WD v1.4 ViT v2 (batch)|4|97 s
|WD v1.4 ViT v2 (batch)|8|97 s
|ML-Danbooru TResNet-D 6-30000|8|100 s
|ML-Danbooru TResNet-D 6-30000|4|101 s
|WD v1.4 ViT v2 (batch)|1|105 s
|ML-Danbooru TResNet-D 6-30000|1|125 s
|WD v1.4 ConvNeXT v2 (batch)|8|126 s
|WD v1.4 SwinV2 v2 (batch)|8|127 s
|WD v1.4 ConvNeXT v2 (batch)|4|128 s
|WD v1.4 ConvNeXTV2 v2 (batch)|8|132 s
|WD v1.4 ConvNeXTV2 v2 (batch)|4|133 s
|WD v1.4 ViT v2|1|146 s
|WD v1.4 ConvNeXT v2 (batch)|1|149 s
|WD v1.4 ConvNeXTV2 v2 (batch)|1|160 s
|WD v1.4 MOAT v2 (batch)|1|165 s
|WD v1.4 SwinV2 v2|1|166 s
|ML-Danbooru CAFormer dec-5-97527|1|263 s
|WD v1.4 ConvNeXT v2|1|273 s
|WD v1.4 MOAT v2|1|273 s
|WD v1.4 ConvNeXTV2 v2|1|340 s
|ML-Danbooru CAFormer dec-5-97527|4|445 s
|ML-Danbooru CAFormer dec-5-97527|8|1790 s
|WD v1.4 MOAT v2 (batch)|4|kernel panic
|===

CPU inference
~~~~~~~~~~~~~
[cols="<2,>1,>1", options=header]
|===
|Model|Batch size|Time
|DeepDanbooru|8|54 s
|DeepDanbooru|4|55 s
|DeepDanbooru|1|75 s
|WD v1.4 SwinV2 v2 (batch)|8|93 s
|WD v1.4 SwinV2 v2 (batch)|4|94 s
|ML-Danbooru TResNet-D 6-30000|8|97 s
|WD v1.4 SwinV2 v2 (batch)|1|98 s
|ML-Danbooru TResNet-D 6-30000|4|99 s
|WD v1.4 SwinV2 v2|1|99 s
|ML-Danbooru CAFormer dec-5-97527|4|110 s
|ML-Danbooru CAFormer dec-5-97527|8|110 s
|WD v1.4 ViT v2 (batch)|4|111 s
|WD v1.4 ViT v2 (batch)|8|111 s
|WD v1.4 ViT v2 (batch)|1|113 s
|WD v1.4 ViT v2|1|113 s
|ML-Danbooru TResNet-D 6-30000|1|118 s
|ML-Danbooru CAFormer dec-5-97527|1|122 s
|WD v1.4 ConvNeXT v2 (batch)|8|124 s
|WD v1.4 ConvNeXT v2 (batch)|4|125 s
|WD v1.4 ConvNeXTV2 v2 (batch)|8|129 s
|WD v1.4 ConvNeXT v2|1|130 s
|WD v1.4 ConvNeXTV2 v2 (batch)|4|131 s
|WD v1.4 MOAT v2 (batch)|8|134 s
|WD v1.4 ConvNeXTV2 v2|1|136 s
|WD v1.4 MOAT v2 (batch)|4|136 s
|WD v1.4 ConvNeXT v2 (batch)|1|146 s
|WD v1.4 MOAT v2 (batch)|1|156 s
|WD v1.4 MOAT v2|1|156 s
|WD v1.4 ConvNeXTV2 v2 (batch)|1|157 s
|===

Comparison with WDMassTagger
----------------------------
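Using CUDA, on the same Linux computer as above, on a sample of 6352 images.
We're a bit slower, depending on the model.
Batch sizes of 16 and 32 give practically equivalent results for both.
Times are in minutes:seconds, and the ratio is the WDMassTagger time
divided by the deeptagger time.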

[cols="<,>,>,>", options="header,autowidth"]
|===
|Model|WDMassTagger|deeptagger (batch)|Ratio
|wd-v1-4-convnext-tagger-v2   |1:18 |1:55 |68 %
|wd-v1-4-convnextv2-tagger-v2 |1:20 |2:10 |62 %
|wd-v1-4-moat-tagger-v2       |1:22 |1:52 |73 %
|wd-v1-4-swinv2-tagger-v2     |1:28 |1:34 |94 %
|wd-v1-4-vit-tagger-v2        |1:16 |1:22 |93 %
|===
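
The ratios can be recomputed from the two time columns, read as
minutes:seconds, by dividing the WDMassTagger time by the deeptagger time.
This is a sketch using awk, and the helper names are hypothetical.

```shell
# to_seconds M:SS: convert a minutes:seconds time to seconds.
to_seconds() {
	echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# ratio THEIRS OURS: print THEIRS / OURS as a whole percentage.
ratio() {
	awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
		'BEGIN { printf "%.0f %%\n", 100 * a / b }'
}

ratio 1:18 1:55  # wd-v1-4-convnext-tagger-v2
ratio 1:16 1:22  # wd-v1-4-vit-tagger-v2
```

These print `68 %` and `93 %`, matching the table.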