# Which GGUF is right for me? (Opinionated)

Good question! I am collecting human data on how quantization affects outputs. See here for more information: https://github.com/ggerganov/llama.cpp/discussions/5962

In the meantime, use the largest quantization that fully fits in your GPU. If you can comfortably fit Q4_K_S, try using a model with more parameters instead.

# llama.cpp feature matrix

See the wiki upstream: https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix

# KL-divergence statistics for Mistral-7B

* Last updated 2024-02-27 (added IQ4_XS).
* imatrix computed on wiki.train, 200 chunks × 512 tokens.
* KL-divergence measured on wiki.test.
* Commands to reproduce these measurements are sketched at the end of this gist.

![image](https://gist.github.com/assets/90720/ac93a0df-e308-458f-8ff8-04aed10627e4)

|             | **Bits per weight** | **KL-divergence median** | **KL-divergence q99** | **Top tokens differ** | **ln(PPL(Q)/PPL(base))** |
|-------------|---------------------|--------------------------|-----------------------|-----------------------|--------------------------|
| **IQ1_S**   | 1.78                | 0.5495                   | 5.5174                | 0.3840                | 0.9235                   |
| **IQ2_XXS** | 2.20                | 0.1751                   | 2.4983                | 0.2313                | 0.2988                   |
| **IQ2_XS**  | 2.43                | 0.1146                   | 1.7693                | 0.1943                | 0.2046                   |
| **IQ2_S**   | 2.55                | 0.0949                   | 1.6284                | 0.1806                | 0.1722                   |
| **IQ2_M**   | 2.76                | 0.0702                   | 1.0935                | 0.1557                | 0.1223                   |
| **Q2_K_S**  | 2.79                | 0.0829                   | 1.5111                | 0.1735                | 0.1600                   |
| **Q2_K**    | 3.00                | 0.0588                   | 1.0337                | 0.1492                | 0.1103                   |
| **IQ3_XXS** | 3.21                | 0.0330                   | 0.5492                | 0.1137                | 0.0589                   |
| **IQ3_XS**  | 3.32                | 0.0296                   | 0.4550                | 0.1071                | 0.0458                   |
| **Q3_K_S**  | 3.50                | 0.0304                   | 0.4481                | 0.1068                | 0.0511                   |
| **IQ3_S**   | 3.52                | 0.0205                   | 0.3018                | 0.0895                | 0.0306                   |
| **IQ3_M**   | 3.63                | 0.0186                   | 0.2740                | 0.0859                | 0.0268                   |
| **Q3_K_M**  | 3.89                | 0.0171                   | 0.2546                | 0.0839                | 0.0258                   |
| **Q3_K_L**  | 4.22                | 0.0152                   | 0.2202                | 0.0797                | 0.0205                   |
| **IQ4_XS**  | 4.32                | 0.0088                   | 0.1082                | 0.0606                | 0.0079                   |
| **IQ4_NL**  | 4.56                | 0.0085                   | 0.1077                | 0.0605                | 0.0074                   |
| **Q4_K_S**  | 4.57                | 0.0083                   | 0.1012                | 0.0600                | 0.0081                   |
| **Q4_K_M**  | 4.83                | 0.0075                   | 0.0885                | 0.0576                | 0.0060                   |
| **Q5_K_S**  | 5.52                | 0.0045                   | 0.0393                | 0.0454                | 0.0005                   |
| **Q5_K_M**  | 5.67                | 0.0043                   | 0.0368                | 0.0444                | 0.0005                   |
| **Q6_K**    | 6.57                | 0.0032                   | 0.0222                | 0.0394                | −0.0008                  |

# ROCm benchmarks for Mistral-7B

* Last updated 2024-03-15 (bench #6083).
* Throughput in tokens per second (t/s); `-ngl` is the number of layers offloaded to the GPU.

![image](https://gist.github.com/assets/90720/e53d9081-4a64-4ede-9531-0cfb97e0e964)

|            | **GiB** | **pp512 -ngl 99** | **tg128 -ngl 99** | **pp512 -ngl 0** | **tg128 -ngl 0** | **pp512 -ngl 0 #6083** |
|------------|---------|-------------------|-------------------|------------------|------------------|------------------------|
| **IQ1_S**  | 1.50    | 709.29            | 74.85             | 324.35           | 15.66            | 585.61                 |
| **IQ2_XS** | 2.05    | 704.52            | 58.44             | 316.10           | 15.11            | 557.68                 |
| **IQ3_XS** | 2.79    | 682.72            | 45.79             | 300.61           | 10.49            | 527.83                 |
| **IQ4_XS** | 3.64    | 712.96            | 64.17             | 292.36           | 11.06            | 495.92                 |
| **Q4_0**   | 3.83    | 870.44            | 63.42             | 310.94           | 10.44            | 554.56                 |
| **Q5_K**   | 4.78    | 691.40            | 46.52             | 273.83           | 8.54             | 453.58                 |
| **Q6_K**   | 5.53    | 661.98            | 47.57             | 261.16           | 7.34             | 415.22                 |
| **Q8_0**   | 7.17    | 881.95            | 39.74             | 270.70           | 5.74             | 440.44                 |
| **f16**    | 13.49   |                   |                   | 211.12           | 3.06             | 303.60                 |
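# Reproducing these numbers

The KL-divergence statistics can be gathered with llama.cpp's own tools. The sketch below assumes binaries built at the repo root and WikiText-2 raw files on disk; the file names (`mistral-7b-f16.gguf`, `wiki.train.raw`, `wiki.test.raw`, `imatrix.dat`, `logits.bin`) are placeholders, and binary locations and flags can vary between llama.cpp versions.

```sh
# 1) Build an importance matrix on wiki.train (200 chunks of 512 tokens each).
./imatrix -m mistral-7b-f16.gguf -f wiki.train.raw --chunks 200 -o imatrix.dat

# 2) Quantize using that imatrix (IQ4_XS shown; repeat for each quant type).
./quantize --imatrix imatrix.dat mistral-7b-f16.gguf mistral-7b-iq4_xs.gguf IQ4_XS

# 3) Save the base model's logits over wiki.test ...
./perplexity -m mistral-7b-f16.gguf -f wiki.test.raw --kl-divergence-base logits.bin

# 4) ... then score the quant against them. This reports KL-divergence
#    statistics (mean/median/quantiles), top-token agreement, and perplexity.
./perplexity -m mistral-7b-iq4_xs.gguf -f wiki.test.raw \
    --kl-divergence-base logits.bin --kl-divergence
```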
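The throughput columns match what `llama-bench` reports. A minimal invocation along these lines (model path again a placeholder) should produce comparable rows:

```sh
# pp512 = processing a 512-token prompt, tg128 = generating 128 tokens,
# both reported in t/s. -ngl sets the number of layers offloaded to the
# GPU: 99 offloads everything for a 7B model, 0 keeps all layers on CPU.
./llama-bench -m mistral-7b-q4_0.gguf -p 512 -n 128 -ngl 99,0
```

Passing a comma-separated list like `-ngl 99,0` makes `llama-bench` run each test at both offload settings and print one result row per combination.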