# WIP on Vectorized interpolation

- Install Pillow-SIMD

```
pip uninstall -y pillow && CC="cc -mavx2" pip install --no-cache-dir --force-reinstall pillow-simd
```

## Run benchmarks: nightly vs PR

```
wget https://raw.githubusercontent.com/pytorch/vision/main/torchvision/transforms/functional_tensor.py -O torchvision_functional_tensor.py
```

```
python -u run_bench_interp.py "output/$(date "+%Y%m%d-%H%M%S")-pr.pkl" --tag=PR
```

## Output consistency with master pytorch

```
# On pytorch-nightly
python verif_interp2.py verif_expected --is_ref=True

# On PR
python verif_interp2.py verif_expected --is_ref=False
```
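The benchmark script `run_bench_interp.py` is not reproduced here; the sketch below only illustrates, under assumptions about what it measures, the kind of Pillow-vs-torch uint8 resize comparison reported in the tables that follow, using `torch.utils.benchmark`. Sizes, layouts and labels are illustrative, not the script's actual configuration.

```
# Minimal sketch (NOT run_bench_interp.py): time PIL bilinear resize against
# torch uint8 interpolation for a few of the shapes that appear in the tables.
import torch
import torch.utils.benchmark as benchmark
from PIL import Image

results = []
for in_size, out_size in [(256, 32), (520, 32), (270, 224)]:
    # Random RGB uint8 input in channels_last memory format, as in the tables
    t = torch.randint(0, 256, (1, 3, in_size, in_size), dtype=torch.uint8)
    t = t.contiguous(memory_format=torch.channels_last)
    pil_img = Image.fromarray(t[0].permute(1, 2, 0).contiguous().numpy())

    results.append(benchmark.Timer(
        stmt="img.resize((s, s), resample=Image.BILINEAR)",
        globals={"img": pil_img, "s": out_size, "Image": Image},
        label="Resize", sub_label=f"{in_size} -> {out_size}", description="Pillow",
    ).blocked_autorange(min_run_time=1))

    results.append(benchmark.Timer(
        # Needs a torch build with the uint8 bilinear path (what this PR adds);
        # on older builds, convert to float before calling interpolate.
        stmt="torch.nn.functional.interpolate(t, size=(s, s), mode='bilinear', antialias=True)",
        globals={"t": t, "s": out_size, "torch": torch},
        label="Resize", sub_label=f"{in_size} -> {out_size}", description="torch uint8",
    ).blocked_autorange(min_run_time=1))

benchmark.Compare(results).print()
```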
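Similarly, `verif_interp2.py` is not included here. The following is a hypothetical sketch of the save-reference/compare workflow that the `--is_ref` flag suggests: the nightly run stores its outputs as the reference, the PR run reloads them and checks agreement. File names, shapes and the ±1 tolerance are assumptions, not the script's actual behavior.

```
# Hypothetical sketch of the consistency check (NOT verif_interp2.py).
import argparse
import os
import torch

parser = argparse.ArgumentParser()
parser.add_argument("expected_dir")
parser.add_argument("--is_ref", type=lambda s: s.lower() == "true", default=False)
args = parser.parse_args()

os.makedirs(args.expected_dir, exist_ok=True)
torch.manual_seed(0)

for in_size, out_size in [(256, 32), (520, 32), (270, 224)]:
    t = torch.randint(0, 256, (1, 3, in_size, in_size), dtype=torch.uint8)
    out = torch.nn.functional.interpolate(
        t, size=(out_size, out_size), mode="bilinear", antialias=True
    )
    path = os.path.join(args.expected_dir, f"resize_{in_size}_{out_size}.pt")
    if args.is_ref:
        torch.save(out, path)          # nightly: store the reference output
    else:
        expected = torch.load(path)    # PR: compare against the stored reference
        diff = (out.int() - expected.int()).abs()
        # allow |diff| <= 1 for uint8 rounding differences
        assert diff.max() <= 1, f"max abs diff {diff.max()} for {in_size}->{out_size}"
print("done")
```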
## Some results

### 08/02/2023

```
PIL version: 9.0.0.post1
[--------------------------------------------------------------------- Resize --------------------------------------------------------------------]
                                                               | Pillow (9.0.0.post1) | torch (2.0.0a0+gite6bdca1) PR | torchvision resize
1 threads: ----------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=True | 38.074 (+-0.541) | 145.393 (+-1.645) | 368.952 (+-1.475)
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=False | | 112.422 (+-0.668) | 74.104 (+-0.135)
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=True | 112.459 (+-0.690) | 496.323 (+-3.041) | 1560.163 (+-5.492)
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=False | | 363.923 (+-1.618) | 186.801 (+-0.461)
      3 torch.uint8 channels_last bilinear 712 -> 32 aa=True | 184.901 (+-1.414) | 890.993 (+-2.717) | 2949.707 (+-95.723)
      3 torch.uint8 channels_last bilinear 712 -> 32 aa=False | | 647.951 (+-4.457) | 318.293 (+-0.674)
      3 torch.uint8 channels_last bilinear 270 -> 224 aa=True | 139.299 (+-0.729) | 329.859 (+-1.688) | 1242.137 (+-4.737)
      3 torch.uint8 channels_last bilinear 270 -> 224 aa=False | | 307.541 (+-2.314) | 908.356 (+-1.292)
      4 torch.uint8 channels_last bilinear 256 -> 32 aa=True | | 67.892 (+-0.188) | 473.954 (+-0.875)
      4 torch.uint8 channels_last bilinear 256 -> 32 aa=False | | 34.854 (+-0.124) | 87.333 (+-0.212)
      4 torch.uint8 channels_last bilinear 520 -> 32 aa=True | | 188.218 (+-1.294) | 2064.724 (+-7.161)
      4 torch.uint8 channels_last bilinear 520 -> 32 aa=False | | 55.389 (+-0.175) | 238.161 (+-0.517)
      4 torch.uint8 channels_last bilinear 712 -> 32 aa=True | | 316.895 (+-1.609) | 3929.540 (+-11.386)
      4 torch.uint8 channels_last bilinear 712 -> 32 aa=False | | 73.027 (+-0.227) | 424.204 (+-1.261)
      4 torch.uint8 channels_last bilinear 270 -> 224 aa=True | | 166.030 (+-0.901) | 1489.629 (+-5.092)
      4 torch.uint8 channels_last bilinear 270 -> 224 aa=False | | 143.489 (+-0.757) | 992.293 (+-1.604)
      3 torch.uint8 channels_first bilinear 256 -> 32 aa=True | 37.804 (+-0.178) | 145.445 (+-0.874) | 355.802 (+-1.438)
      3 torch.uint8 channels_first bilinear 256 -> 32 aa=False | | 112.183 (+-0.704) | 203.691 (+-0.742)
      3 torch.uint8 channels_first bilinear 520 -> 32 aa=True | 112.137 (+-0.763) | 496.563 (+-11.028) | 1549.939 (+-10.290)
      3 torch.uint8 channels_first bilinear 520 -> 32 aa=False | | 364.179 (+-2.418) | 678.691 (+-2.422)
      3 torch.uint8 channels_first bilinear 712 -> 32 aa=True | 184.557 (+-1.122) | 891.174 (+-3.050) | 2930.927 (+-11.987)
      3 torch.uint8 channels_first bilinear 712 -> 32 aa=False | | 647.634 (+-1.752) | 1287.492 (+-877.768)
      3 torch.uint8 channels_first bilinear 270 -> 224 aa=True | 139.091 (+-1.009) | 329.487 (+-1.858) | 818.238 (+-1.593)
      3 torch.uint8 channels_first bilinear 270 -> 224 aa=False | | 308.697 (+-2.485) | 1209.505 (+-4.367)
      4 torch.uint8 channels_first bilinear 256 -> 32 aa=True | | 87.350 (+-0.238) | 460.749 (+-1.384)
      4 torch.uint8 channels_first bilinear 256 -> 32 aa=False | | 53.891 (+-0.200) | 252.033 (+-0.734)
      4 torch.uint8 channels_first bilinear 520 -> 32 aa=True | | 257.106 (+-1.468) | 2052.175 (+-6.119)
      4 torch.uint8 channels_first bilinear 520 -> 32 aa=False | | 124.054 (+-0.658) | 909.929 (+-272.601)
      4 torch.uint8 channels_first bilinear 712 -> 32 aa=True | | 442.343 (+-2.452) | 3904.617 (+-11.139)
      4 torch.uint8 channels_first bilinear 712 -> 32 aa=False | | 199.340 (+-1.232) | 4785.501 (+-106.702)
      4 torch.uint8 channels_first bilinear 270 -> 224 aa=True | | 285.454 (+-3.306) | 1073.443 (+-6.514)
      4 torch.uint8 channels_first bilinear 270 -> 224 aa=False | | 264.808 (+-3.390) | 1157.429 (+-4.480)

Times are in microseconds (us).
```

### 07/02/2023

```
Num threads: 1
PIL version: 9.0.0.post1
[-------------------------------------------------------------------- Resize --------------------------------------------------------------------]
                                                               | Pillow (9.0.0.post1) | torch (2.0.0a0+gite6bdca1) PR | torchvision resize
1 threads: ---------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=True | 38.945 (+-0.222) | 130.363 (+-0.612) | 364.956 (+-2.782)
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=False | | 108.715 (+-0.305) | 72.821 (+-0.245)
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=True | 112.800 (+-0.394) | 439.170 (+-0.637) | 1596.292 (+-2.404)
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=False | | 360.557 (+-0.442) | 185.144 (+-0.231)
      3 torch.uint8 channels_last bilinear 712 -> 32 aa=True | 186.025 (+-0.873) | 781.784 (+-4.723) | 2941.418 (+-3.263)
      3 torch.uint8 channels_last bilinear 712 -> 32 aa=False | | 643.826 (+-2.035) | 316.546 (+-0.371)
      3 torch.uint8 channels_last bilinear 270 -> 224 aa=True | 139.784 (+-0.302) | 319.836 (+-1.226) | 1238.219 (+-1.816)
      3 torch.uint8 channels_last bilinear 270 -> 224 aa=False | | 297.607 (+-3.890) | 908.446 (+-1.849)
      4 torch.uint8 channels_last bilinear 256 -> 32 aa=True | | 52.814 (+-0.490) | 470.149 (+-7.399)
      4 torch.uint8 channels_last bilinear 256 -> 32 aa=False | | 31.900 (+-0.115) | 86.144 (+-0.203)
      4 torch.uint8 channels_last bilinear 520 -> 32 aa=True | | 131.809 (+-1.086) | 2099.700 (+-3.938)
      4 torch.uint8 channels_last bilinear 520 -> 32 aa=False | | 52.489 (+-0.074) | 236.924 (+-0.330)
      4 torch.uint8 channels_last bilinear 712 -> 32 aa=True | | 207.632 (+-1.031) | 3934.327 (+-4.734)
      4 torch.uint8 channels_last bilinear 712 -> 32 aa=False | | 69.291 (+-0.169) | 422.172 (+-0.420)
      4 torch.uint8 channels_last bilinear 270 -> 224 aa=True | | 149.362 (+-0.545) | 1484.460 (+-3.488)
      4 torch.uint8 channels_last bilinear 270 -> 224 aa=False | | 127.503 (+-0.296) | 992.280 (+-1.658)
      3 torch.uint8 channels_first bilinear 256 -> 32 aa=True | 38.934 (+-0.066) | 130.096 (+-0.442) | 352.259 (+-0.315)
      3 torch.uint8 channels_first bilinear 256 -> 32 aa=False | | 108.973 (+-0.755) | 201.381 (+-0.268)
      3 torch.uint8 channels_first bilinear 520 -> 32 aa=True | 112.429 (+-0.337) | 439.524 (+-2.111) | 1582.000 (+-2.005)
      3 torch.uint8 channels_first bilinear 520 -> 32 aa=False | | 360.747 (+-0.596) | 707.031 (+-4.563)
      3 torch.uint8 channels_first bilinear 712 -> 32 aa=True | 186.654 (+-0.514) | 781.276 (+-2.859) | 2929.456 (+-7.766)
      3 torch.uint8 channels_first bilinear 712 -> 32 aa=False | | 643.654 (+-2.010) | 1436.077 (+-45.418)
      3 torch.uint8 channels_first bilinear 270 -> 224 aa=True | 140.697 (+-0.760) | 318.995 (+-1.630) | 814.211 (+-2.972)
      3 torch.uint8 channels_first bilinear 270 -> 224 aa=False | | 295.188 (+-1.540) | 1208.328 (+-1.880)
      4 torch.uint8 channels_first bilinear 256 -> 32 aa=True | | 71.006 (+-0.246) | 456.845 (+-1.242)
      4 torch.uint8 channels_first bilinear 256 -> 32 aa=False | | 50.860 (+-0.104) | 249.063 (+-0.477)
      4 torch.uint8 channels_first bilinear 520 -> 32 aa=True | | 199.646 (+-0.859) | 2091.296 (+-2.892)
      4 torch.uint8 channels_first bilinear 520 -> 32 aa=False | | 120.245 (+-0.589) | 950.757 (+-17.017)
      4 torch.uint8 channels_first bilinear 712 -> 32 aa=True | | 330.196 (+-1.010) | 3908.203 (+-4.160)
      4 torch.uint8 channels_first bilinear 712 -> 32 aa=False | | 194.640 (+-0.230) | 4760.218 (+-9.555)
      4 torch.uint8 channels_first bilinear 270 -> 224 aa=True | | 266.951 (+-1.631) | 1069.409 (+-4.428)
      4 torch.uint8 channels_first bilinear 270 -> 224 aa=False | | 243.640 (+-0.885) | 1163.805 (+-1.581)

Times are in microseconds (us).
```

### 02/02/2023

- Removed pointer from upsample_avx_bilinear
- Avoid copy if num_channels=4 and channels_last

```
Num threads: 1
PIL version: 9.0.0.post1
[----------------------------------------------------------------------------- Resize ----------------------------------------------------------------------------]
                                                               | Pillow (9.0.0.post1) | torch (2.0.0a0+git7f72623) PR | torch (2.0.0a0+git7f72623) PR (float)
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=True | 38.6 | 395.9 | 360.7
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=False | | 363.4 | 68.2
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=True | 112.5 | 1530.6 | 1555.5
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=False | | 1369.7 | 179.9
      3 torch.uint8 channels_last bilinear 712 -> 32 aa=True | 186.0 | 2652.6 | 2935.8
      3 torch.uint8 channels_last bilinear 712 -> 32 aa=False | | 2507.9 | 309.7
      4 torch.uint8 channels_last bilinear 256 -> 32 aa=True | | 57.1 | 466.0
      4 torch.uint8 channels_last bilinear 256 -> 32 aa=False | | 37.6 | 81.1
      4 torch.uint8 channels_last bilinear 520 -> 32 aa=True | | 131.6 | 2093.5
      4 torch.uint8 channels_last bilinear 520 -> 32 aa=False | | 58.0 | 231.4
      4 torch.uint8 channels_last bilinear 712 -> 32 aa=True | | 204.7 | 3926.6
      4 torch.uint8 channels_last bilinear 712 -> 32 aa=False | | 74.7 | 418.0
      3 torch.uint8 channels_first bilinear 256 -> 32 aa=True | 38.7 | 397.9 | 348.7
      3 torch.uint8 channels_first bilinear 256 -> 32 aa=False | | 361.9 | 197.9
      3 torch.uint8 channels_first bilinear 520 -> 32 aa=True | 112.2 | 1448.7 | 1540.7
      3 torch.uint8 channels_first bilinear 520 -> 32 aa=False | | 1388.4 | 1493.0
      3 torch.uint8 channels_first bilinear 712 -> 32 aa=True | 186.0 | 2633.9 | 2923.4
      3 torch.uint8 channels_first bilinear 712 -> 32 aa=False | | 2585.5 | 1271.8
      4 torch.uint8 channels_first bilinear 256 -> 32 aa=True | | 208.1 | 453.3
      4 torch.uint8 channels_first bilinear 256 -> 32 aa=False | | 188.8 | 245.1
      4 torch.uint8 channels_first bilinear 520 -> 32 aa=True | | 748.0 | 2043.2
      4 torch.uint8 channels_first bilinear 520 -> 32 aa=False | | 673.7 | 864.4
      4 torch.uint8 channels_first bilinear 712 -> 32 aa=True | | 1362.0 | 3897.1
      4 torch.uint8 channels_first bilinear 712 -> 32 aa=False | | 1230.4 | 1837.6

Times are in microseconds (us).
```
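The rows above distinguish `channels_last` vs `channels_first` layouts and 3 vs 4 channels. As a reading aid (not taken from the benchmark script), the snippet below shows how such inputs differ in memory; the `num_channels=4` + `channels_last` case is the one targeted by the "avoid copy" note above.

```
# Illustrative only: same uint8 data, either channels_first (default NCHW
# contiguous) or channels_last (NHWC strides), with 3 (RGB) or 4 (RGBA) channels.
import torch

num_channels, size = 4, 520
t_cf = torch.randint(0, 256, (1, num_channels, size, size), dtype=torch.uint8)
t_cl = t_cf.contiguous(memory_format=torch.channels_last)

# channels_last keeps the per-pixel channels adjacent in memory (the interleaved
# layout PIL uses), so the vectorized uint8 path needs the fewest shuffles/copies.
print(t_cf.stride())  # (C*H*W, H*W, W, 1)
print(t_cl.stride())  # (H*W*C, 1, W*C, C)
```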
- recoded unpack rgb method

```
Num threads: 1
PIL version: 9.0.0.post1
[----------------------------------------------------------------------------- Resize ----------------------------------------------------------------------------]
                                                               | Pillow (9.0.0.post1) | torch (2.0.0a0+git7f72623) PR | torch (2.0.0a0+git7f72623) PR (float)
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=True | 38.9 | 132.6 | 360.5
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=False | | 111.6 | 68.2
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=True | 113.7 | 443.0 | 1554.3
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=False | | 362.6 | 180.1
      3 torch.uint8 channels_last bilinear 712 -> 32 aa=True | 187.5 | 784.7 | 2904.1
      3 torch.uint8 channels_last bilinear 712 -> 32 aa=False | | 645.3 | 309.0
      4 torch.uint8 channels_last bilinear 256 -> 32 aa=True | | 55.1 | 464.7
      4 torch.uint8 channels_last bilinear 256 -> 32 aa=False | | 33.5 | 80.9
      4 torch.uint8 channels_last bilinear 520 -> 32 aa=True | | 135.8 | 2065.8
      4 torch.uint8 channels_last bilinear 520 -> 32 aa=False | | 55.1 | 231.5
      4 torch.uint8 channels_last bilinear 712 -> 32 aa=True | | 209.8 | 3873.7
      4 torch.uint8 channels_last bilinear 712 -> 32 aa=False | | 71.3 | 411.1
      3 torch.uint8 channels_first bilinear 256 -> 32 aa=True | 39.2 | 132.5 | 348.6
      3 torch.uint8 channels_first bilinear 256 -> 32 aa=False | | 111.9 | 199.4
      3 torch.uint8 channels_first bilinear 520 -> 32 aa=True | 112.7 | 439.8 | 1542.1
      3 torch.uint8 channels_first bilinear 520 -> 32 aa=False | | 362.2 | 1569.3
      3 torch.uint8 channels_first bilinear 712 -> 32 aa=True | 185.4 | 779.8 | 2888.7
      3 torch.uint8 channels_first bilinear 712 -> 32 aa=False | | 645.0 | 1440.4
      4 torch.uint8 channels_first bilinear 256 -> 32 aa=True | | 73.9 | 453.4
      4 torch.uint8 channels_first bilinear 256 -> 32 aa=False | | 53.5 | 245.7
      4 torch.uint8 channels_first bilinear 520 -> 32 aa=True | | 200.3 | 2041.5
      4 torch.uint8 channels_first bilinear 520 -> 32 aa=False | | 122.8 | 933.2
      4 torch.uint8 channels_first bilinear 712 -> 32 aa=True | | 331.4 | 3852.1
      4 torch.uint8 channels_first bilinear 712 -> 32 aa=False | | 197.1 | 2046.7

Times are in microseconds (us).
```

### 01/02/2023

- Use tensor to allocate memory
- Compute weights once

```
cd /tmp/pth/interpolate_vec_uint8/ && python -u check_interp.py
```

```
Torch version: 2.0.0a0+git7f72623
Torch config: PyTorch built with:
  - GCC 9.4
  - C++ Version: 201703
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - CPU capability usage: AVX2
  - Build settings: BUILD_TYPE=Release, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=0, USE_CUDNN=OFF, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=0, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=0, USE_OPENMP=ON, USE_ROCM=OFF,

Num threads: 1
PIL version: 9.0.0.post1
[----------------------------------------------------------------------------- Resize ----------------------------------------------------------------------------]
                                                               | Pillow (9.0.0.post1) | torch (2.0.0a0+git7f72623) PR | torch (2.0.0a0+git7f72623) PR (float)
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=True | 38.6 | 346.7 | 361.0
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=False | | 327.8 | 67.5
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=True | 112.1 | 1321.2 | 1553.2
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=False | | 1248.8 | 179.6
      3 torch.uint8 channels_last bilinear 712 -> 32 aa=True | 184.9 | 2429.3 | 2910.2
      3 torch.uint8 channels_last bilinear 712 -> 32 aa=False | | 2306.4 | 309.6
      4 torch.uint8 channels_last bilinear 256 -> 32 aa=True | | 208.1 | 466.6
      4 torch.uint8 channels_last bilinear 256 -> 32 aa=False | | 189.6 | 80.7
      4 torch.uint8 channels_last bilinear 520 -> 32 aa=True | | 744.2 | 2053.8
      4 torch.uint8 channels_last bilinear 520 -> 32 aa=False | | 674.7 | 231.1
      4 torch.uint8 channels_last bilinear 712 -> 32 aa=True | | 1359.0 | 3886.2
      4 torch.uint8 channels_last bilinear 712 -> 32 aa=False | | 1230.8 | 412.0
      3 torch.uint8 channels_first bilinear 256 -> 32 aa=True | 38.3 | 346.6 | 349.3
      3 torch.uint8 channels_first bilinear 256 -> 32 aa=False | | 328.0 | 196.9
      3 torch.uint8 channels_first bilinear 520 -> 32 aa=True | 112.3 | 1321.6 | 1538.2
      3 torch.uint8 channels_first bilinear 520 -> 32 aa=False | | 1249.3 | 1515.5
      3 torch.uint8 channels_first bilinear 712 -> 32 aa=True | 185.0 | 2435.4 | 2887.2
      3 torch.uint8 channels_first bilinear 712 -> 32 aa=False | | 2312.7 | 2556.5
      4 torch.uint8 channels_first bilinear 256 -> 32 aa=True | | 209.3 | 453.2
      4 torch.uint8 channels_first bilinear 256 -> 32 aa=False | | 190.6 | 244.8
      4 torch.uint8 channels_first bilinear 520 -> 32 aa=True | | 745.8 | 2278.6
      4 torch.uint8 channels_first bilinear 520 -> 32 aa=False | | 730.4 | 1360.3
      4 torch.uint8 channels_first bilinear 712 -> 32 aa=True | | 1480.8 | 4110.3
      4 torch.uint8 channels_first bilinear 712 -> 32 aa=False | | 1311.5 | 3714.8

Times are in microseconds (us).
```

### 30/01/2023 (Repro current results)

```
cd /tmp/pth/interpolate_vec_uint8/ && python -u check_interp.py
```

```
Torch version: 2.0.0a0+git7f72623
Torch config: PyTorch built with:
  - GCC 9.4
  - C++ Version: 201703
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - CPU capability usage: AVX2
  - Build settings: BUILD_TYPE=Release, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=0, USE_CUDNN=OFF, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=0, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=0, USE_OPENMP=ON, USE_ROCM=OFF,

Num threads: 1
PIL version: 9.0.0.post1
[----------------------------------------------------------------------------- Resize ----------------------------------------------------------------------------]
                                                               | Pillow (9.0.0.post1) | torch (2.0.0a0+git7f72623) PR | torch (2.0.0a0+git7f72623) PR (float)
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=True | 41.3 | 395.4 | 377.4
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=False | | 368.4 | 67.8
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=True | 112.2 | 1456.1 | 1557.2
      3 torch.uint8 channels_last bilinear 520 -> 32 aa=False | | 1372.7 | 180.1
      3 torch.uint8 channels_first bilinear 256 -> 32 aa=True | 38.5 | 372.0 | 349.0
      3 torch.uint8 channels_first bilinear 256 -> 32 aa=False | | 356.8 | 196.9
      3 torch.uint8 channels_first bilinear 520 -> 32 aa=True | 112.8 | 1449.1 | 1543.7
      3 torch.uint8 channels_first bilinear 520 -> 32 aa=False | | 1379.2 | 1306.1

Times are in microseconds (us).
```

```
Num threads: 1
PIL version: 9.0.0.post1
[----------------------------------------------------------------------------- Resize -----------------------------------------------------------------------------]
                                                               | Pillow (9.0.0.post1) | torch (2.0.0a0+git7f72623) PR | torch (2.0.0a0+git7f72623) PR (float)
1 threads: ---------------------------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear 270 -> 224 aa=True | 148.4 | 628.7 | 1269.7
      3 torch.uint8 channels_last bilinear 270 -> 224 aa=False | | 608.8 | 917.8
      3 torch.uint8 channels_first bilinear 270 -> 224 aa=True | 149.7 | 598.4 | 772.5
      3 torch.uint8 channels_first bilinear 270 -> 224 aa=False | | 569.8 | 1300.6

Times are in microseconds (us).
```

```
Num threads: 1
PIL version: 9.4.0
check_interp.py:93: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:206.)
  expected_pil = torch.from_numpy(np.asarray(output_pil_img)).clone().permute(2, 0, 1).contiguous()
[------------------------------------------------------------------------- Resize ------------------------------------------------------------------------]
                                                               | Pillow (9.4.0) | torch (2.0.0a0+git7f72623) PR | torch (2.0.0a0+git7f72623) PR (float)
1 threads: ------------------------------------------------------------------------------------------------------------------------------------------------
      3 torch.uint8 channels_last bilinear 256 -> 32 aa=True | 223.3 | 382.2 | 361.8

Times are in microseconds (us).
```

### uint8 -> float -> resize -> uint8

```
Num threads: 1
[--------- Downsampling: torch.Size([3, 438, 906]) -> (320, 196) ---------]
                                 | PIL 9.0.0.post1 | 1.14.0a0+git943acd4
1 threads: ----------------------------------------------------------------
      channels_first contiguous | 345.0 | 2530.7

Times are in microseconds (us).

[--------- Downsampling: torch.Size([3, 438, 906]) -> (460, 220) ---------]
                                 | PIL 9.0.0.post1 | 1.14.0a0+git943acd4
1 threads: ----------------------------------------------------------------
      channels_first contiguous | 412.8 | 2947.4

Times are in microseconds (us).

[---------- Downsampling: torch.Size([3, 438, 906]) -> (120, 96) ---------]
                                 | PIL 9.0.0.post1 | 1.14.0a0+git943acd4
1 threads: ----------------------------------------------------------------
      channels_first contiguous | 214.0 | 2124.6

Times are in microseconds (us).

[--------- Downsampling: torch.Size([3, 438, 906]) -> (1200, 196) --------]
                                 | PIL 9.0.0.post1 | 1.14.0a0+git943acd4
1 threads: ----------------------------------------------------------------
      channels_first contiguous | 911.3 | 7560.4

Times are in microseconds (us).

[--------- Downsampling: torch.Size([3, 438, 906]) -> (120, 1200) --------]
                                 | PIL 9.0.0.post1 | 1.14.0a0+git943acd4
1 threads: ----------------------------------------------------------------
      channels_first contiguous | 291.0 | 2700.7

Times are in microseconds (us).
```

### 30/11/2022

- fallback uint8 implementation

```
Num threads: 1
[---------------------------------- Downsampling: torch.Size([3, 438, 906]) -> (320, 196) ----------------------------------]
                                 | PIL 9.0.0.post1 | 1.14.0a0+git7a3055e, using uint8 | 1.14.0a0+git7a3055e, using float
1 threads: ------------------------------------------------------------------------------------------------------------------
      channels_first contiguous | 348.8 | 3315.0 | 2578.3

Times are in microseconds (us).

[---------------------------------- Downsampling: torch.Size([3, 438, 906]) -> (460, 220) ----------------------------------]
                                 | PIL 9.0.0.post1 | 1.14.0a0+git7a3055e, using uint8 | 1.14.0a0+git7a3055e, using float
1 threads: ------------------------------------------------------------------------------------------------------------------
      channels_first contiguous | 412.5 | 4231.5 | 3004.9

Times are in microseconds (us).

[----------------------------------- Downsampling: torch.Size([3, 438, 906]) -> (120, 96) ----------------------------------]
                                 | PIL 9.0.0.post1 | 1.14.0a0+git7a3055e, using uint8 | 1.14.0a0+git7a3055e, using float
1 threads: ------------------------------------------------------------------------------------------------------------------
      channels_first contiguous | 216.4 | 1818.1 | 2286.3

Times are in microseconds (us).

[---------------------------------- Downsampling: torch.Size([3, 438, 906]) -> (1200, 196) ---------------------------------]
                                 | PIL 9.0.0.post1 | 1.14.0a0+git7a3055e, using uint8 | 1.14.0a0+git7a3055e, using float
1 threads: ------------------------------------------------------------------------------------------------------------------
      channels_first contiguous | 907.3 | 9095.5 | 5861.1

Times are in microseconds (us).

[---------------------------------- Downsampling: torch.Size([3, 438, 906]) -> (120, 1200) ---------------------------------]
                                 | PIL 9.0.0.post1 | 1.14.0a0+git7a3055e, using uint8 | 1.14.0a0+git7a3055e, using float
1 threads: ------------------------------------------------------------------------------------------------------------------
      channels_first contiguous | 298.1 | 2753.7 | 2865.6

Times are in microseconds (us).
```

```
PIL version: 9.0.0.post1
[-------------------- Resize measurements ---------------------]
                                 | Pillow image | torch tensor
1 threads: -----------------------------------------------------
      1200 -> 256, torch.uint8 | 57.0 | 295.1

Times are in microseconds (us).
```
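The `uint8 -> float -> resize -> uint8` and fallback measurements above time a round trip through float. Below is a minimal sketch of that path, assuming standard `torch.nn.functional.interpolate` semantics; rounding and clamping details may differ from the PR's actual fallback.

```
# Sketch of the uint8 -> float -> resize -> uint8 round trip (illustrative only).
import torch

def resize_via_float(img_uint8: torch.Tensor, out_hw, antialias: bool = True) -> torch.Tensor:
    # img_uint8: (N, C, H, W) uint8 tensor
    x = img_uint8.to(torch.float32)
    x = torch.nn.functional.interpolate(x, size=out_hw, mode="bilinear", antialias=antialias)
    # Round and clamp back into the uint8 range
    return x.round().clamp(0, 255).to(torch.uint8)

img = torch.randint(0, 256, (1, 3, 438, 906), dtype=torch.uint8)
out = resize_via_float(img, (320, 196))
print(out.shape, out.dtype)  # torch.Size([1, 3, 320, 196]) torch.uint8
```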