# Which GGUF is right for me? (Opinionated)

Good question! I am collecting human data on how quantization affects outputs. See here for more information: https://github.com/ggerganov/llama.cpp/discussions/5962

In the meantime, use the largest quantization that fully fits in your GPU. If you can comfortably fit Q4_K_S, try using a model with more parameters instead.

# llama.cpp feature matrix

See the wiki upstream: https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix

# KL-divergence statistics for Mistral-7B

* Last updated 2024-02-27 (added IQ4_XS).
* imatrix computed on wiki.train, 200 chunks × 512 tokens.
* KL-divergence measured on wiki.test.
* Commands to reproduce these measurements are sketched at the end of this gist.

![image](https://gist.github.com/assets/90720/ac93a0df-e308-458f-8ff8-04aed10627e4)

|             | **Bits per weight** | **KL-divergence median** | **KL-divergence q99** | **Top tokens differ** | **ln(PPL(Q)/PPL(base))** |
|-------------|---------------------|--------------------------|-----------------------|-----------------------|--------------------------|
| **IQ1_S**   | 1.78                | 0.5495                   | 5.5174                | 0.3840                | 0.9235                   |
| **IQ2_XXS** | 2.20                | 0.1751                   | 2.4983                | 0.2313                | 0.2988                   |
| **IQ2_XS**  | 2.43                | 0.1146                   | 1.7693                | 0.1943                | 0.2046                   |
| **IQ2_S**   | 2.55                | 0.0949                   | 1.6284                | 0.1806                | 0.1722                   |
| **IQ2_M**   | 2.76                | 0.0702                   | 1.0935                | 0.1557                | 0.1223                   |
| **Q2_K_S**  | 2.79                | 0.0829                   | 1.5111                | 0.1735                | 0.1600                   |
| **Q2_K**    | 3.00                | 0.0588                   | 1.0337                | 0.1492                | 0.1103                   |
| **IQ3_XXS** | 3.21                | 0.0330                   | 0.5492                | 0.1137                | 0.0589                   |
| **IQ3_XS**  | 3.32                | 0.0296                   | 0.4550                | 0.1071                | 0.0458                   |
| **Q3_K_S**  | 3.50                | 0.0304                   | 0.4481                | 0.1068                | 0.0511                   |
| **IQ3_S**   | 3.52                | 0.0205                   | 0.3018                | 0.0895                | 0.0306                   |
| **IQ3_M**   | 3.63                | 0.0186                   | 0.2740                | 0.0859                | 0.0268                   |
| **Q3_K_M**  | 3.89                | 0.0171                   | 0.2546                | 0.0839                | 0.0258                   |
| **Q3_K_L**  | 4.22                | 0.0152                   | 0.2202                | 0.0797                | 0.0205                   |
| **IQ4_XS**  | 4.32                | 0.0088                   | 0.1082                | 0.0606                | 0.0079                   |
| **IQ4_NL**  | 4.56                | 0.0085                   | 0.1077                | 0.0605                | 0.0074                   |
| **Q4_K_S**  | 4.57                | 0.0083                   | 0.1012                | 0.0600                | 0.0081                   |
| **Q4_K_M**  | 4.83                | 0.0075                   | 0.0885                | 0.0576                | 0.0060                   |
| **Q5_K_S**  | 5.52                | 0.0045                   | 0.0393                | 0.0454                | 0.0005                   |
| **Q5_K_M**  | 5.67                | 0.0043                   | 0.0368                | 0.0444                | 0.0005                   |
| **Q6_K**    | 6.57                | 0.0032                   | 0.0222                | 0.0394                | −0.0008                  |

# ROCm benchmarks for Mistral-7B

* Last updated 2024-03-15 (bench #6083).
* Throughput in tokens per second (t/s); `-ngl` is the number of layers offloaded to the GPU.

![image](https://gist.github.com/assets/90720/e53d9081-4a64-4ede-9531-0cfb97e0e964)

|            | **GiB** | **pp512 -ngl 99** | **tg128 -ngl 99** | **pp512 -ngl 0** | **tg128 -ngl 0** | **pp512 -ngl 0 #6083** |
|------------|---------|-------------------|-------------------|------------------|------------------|------------------------|
| **IQ1_S**  | 1.50    | 709.29            | 74.85             | 324.35           | 15.66            | 585.61                 |
| **IQ2_XS** | 2.05    | 704.52            | 58.44             | 316.10           | 15.11            | 557.68                 |
| **IQ3_XS** | 2.79    | 682.72            | 45.79             | 300.61           | 10.49            | 527.83                 |
| **IQ4_XS** | 3.64    | 712.96            | 64.17             | 292.36           | 11.06            | 495.92                 |
| **Q4_0**   | 3.83    | 870.44            | 63.42             | 310.94           | 10.44            | 554.56                 |
| **Q5_K**   | 4.78    | 691.40            | 46.52             | 273.83           | 8.54             | 453.58                 |
| **Q6_K**   | 5.53    | 661.98            | 47.57             | 261.16           | 7.34             | 415.22                 |
| **Q8_0**   | 7.17    | 881.95            | 39.74             | 270.70           | 5.74             | 440.44                 |
| **f16**    | 13.49   |                   |                   | 211.12           | 3.06             | 303.60                 |
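# Reproducing these numbers

The KL-divergence statistics can be gathered with llama.cpp's own tools. The sketch below assumes binaries built at the repo root and WikiText-2 raw files on disk; the file names (`mistral-7b-f16.gguf`, `wiki.train.raw`, `wiki.test.raw`, `imatrix.dat`, `logits.bin`) are placeholders, and binary locations and flags can vary between llama.cpp versions.

```sh
# 1) Build an importance matrix on wiki.train (200 chunks of 512 tokens each).
./imatrix -m mistral-7b-f16.gguf -f wiki.train.raw --chunks 200 -o imatrix.dat

# 2) Quantize using that imatrix (IQ4_XS shown; repeat for each quant type).
./quantize --imatrix imatrix.dat mistral-7b-f16.gguf mistral-7b-iq4_xs.gguf IQ4_XS

# 3) Save the base model's logits over wiki.test ...
./perplexity -m mistral-7b-f16.gguf -f wiki.test.raw --kl-divergence-base logits.bin

# 4) ... then score the quant against them. This reports KL-divergence
#    statistics (mean/median/quantiles), top-token agreement, and perplexity.
./perplexity -m mistral-7b-iq4_xs.gguf -f wiki.test.raw \
    --kl-divergence-base logits.bin --kl-divergence
```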
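The throughput columns match what `llama-bench` reports. A minimal invocation along these lines (model path again a placeholder) should produce comparable rows:

```sh
# pp512 = processing a 512-token prompt, tg128 = generating 128 tokens,
# both reported in t/s. -ngl sets the number of layers offloaded to the
# GPU: 99 offloads everything for a 7B model, 0 keeps all layers on CPU.
./llama-bench -m mistral-7b-q4_0.gguf -p 512 -n 128 -ngl 99,0
```

Passing a comma-separated list like `-ngl 99,0` makes `llama-bench` run each test at both offload settings and print one result row per combination.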