Skip to content

Instantly share code, notes, and snippets.

@wen-long
Last active August 23, 2024 18:14
Show Gist options
  • Save wen-long/856eeab4a589c8b19283ab81e7ba48b9 to your computer and use it in GitHub Desktop.
Save wen-long/856eeab4a589c8b19283ab81e7ba48b9 to your computer and use it in GitHub Desktop.

Revisions

  1. wen-long revised this gist Aug 23, 2024. 1 changed file with 3 additions and 0 deletions.
    3 changes: 3 additions & 0 deletions learn hash.txt
    Original file line number Diff line number Diff line change
    @@ -1,3 +1,6 @@
    hash 重要吗?
    Rust杂谈 HashMap性能比不上Java? https://www.bilibili.com/video/BV1RTiceCEr6/

    https://github.com/rurban/smhasher
    可以作为索引, 如果有专门讲 hash 的书没有提到 smhasher, 要么过时要么不专业
    务必 clone 代码自己在自己的机器上测试并得出结论, 不要过于关心其他人的结果
  2. wen-long revised this gist Aug 21, 2024. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions learn hash.txt
    Original file line number Diff line number Diff line change
    @@ -7,6 +7,8 @@ smhasher 的 Summary 竟然将 rapidhash 放在首位, 这与我的实际测试
    吹牛逼和傲慢在 hash 算法这个领域也有出现

    在我的机器上, xxh3 性能是 rapidhash 的两倍. 运行在单核虚拟机, 宿主机宣称为 AMD Ryzen 7950x
    homebrew 原提供的 xxhash -Os 参数不能提供最佳性能,已经修复为 -O3
    详见 https://github.com/Homebrew/homebrew-core/pull/181691

    ```
    $ xxhsum -b
  3. wen-long created this gist Aug 19, 2024.
    170 changes: 170 additions & 0 deletions learn hash.txt
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,170 @@
    https://github.com/rurban/smhasher
    可以作为索引, 如果有专门讲 hash 的书没有提到 smhasher, 要么过时要么不专业
    务必 clone 代码自己在自己的机器上测试并得出结论, 不要过于关心其他人的结果
    smhasher 的 Summary 竟然将 rapidhash 放在首位, 这与我的实际测试不符


    吹牛逼和傲慢在 hash 算法这个领域也有出现

    在我的机器上, xxh3 性能是 rapidhash 的两倍. 运行在单核虚拟机, 宿主机宣称为 AMD Ryzen 7950x

    ```
    $ xxhsum -b
    xxhsum 0.8.1 by Yann Collet
    compiled as 64-bit x86_64 autoVec little endian with GCC 11.2.0
    Sample of 100 KB...
    1#XXH32 : 102400 -> 99518 it/s ( 9718.6 MB/s)
    3#XXH64 : 102400 -> 143061 it/s (13970.8 MB/s)
    5#XXH3_64b : 102400 -> 610795 it/s (59647.9 MB/s)
    11#XXH128 : 102400 -> 604307 it/s (59014.3 MB/s)

    processor : 0
    vendor_id : AuthenticAMD
    cpu family : 25
    model : 1
    model name : AMD EPYC-Milan Processor
    stepping : 1
    microcode : 0x1000065
    cpu MHz : 4499.996
    cache size : 512 KB
    physical id : 0
    siblings : 1
    core id : 0
    cpu cores : 1
    apicid : 0
    initial apicid : 0
    fpu : yes
    fpu_exception : yes
    cpuid level : 13
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 clzero xsaveerptr wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm arch_capabilities
    bugs : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass srso
    bogomips : 8999.99
    TLB size : 1024 4K pages
    clflush size : 64
    cache_alignment : 64
    address sizes : 40 bits physical, 48 bits virtual
    power management:

    --- Testing xxh3 "xxHash v3, 64-bit" GOOD

    [[[ Speed Tests ]]]

    Bulk speed test - 262144-byte keys
    Alignment 7 - 14.657 bytes/cycle - 41934.22 MiB/sec @ 3 ghz
    Alignment 6 - 14.724 bytes/cycle - 42124.75 MiB/sec @ 3 ghz
    Alignment 5 - 14.683 bytes/cycle - 42008.81 MiB/sec @ 3 ghz
    Alignment 4 - 14.846 bytes/cycle - 42475.55 MiB/sec @ 3 ghz
    Alignment 3 - 14.635 bytes/cycle - 41871.10 MiB/sec @ 3 ghz
    Alignment 2 - 14.678 bytes/cycle - 41993.26 MiB/sec @ 3 ghz
    Alignment 1 - 14.573 bytes/cycle - 41692.77 MiB/sec @ 3 ghz
    Alignment 0 - 17.671 bytes/cycle - 50557.43 MiB/sec @ 3 ghz
    Average - 15.058 bytes/cycle - 43082.23 MiB/sec @ 3 ghz

    --- Testing rapidhash "rapidhash v1" GOOD

    [[[ Speed Tests ]]]

    Bulk speed test - 262144-byte keys
    Alignment 7 - 8.065 bytes/cycle - 23075.42 MiB/sec @ 3 ghz
    Alignment 6 - 8.061 bytes/cycle - 23062.59 MiB/sec @ 3 ghz
    Alignment 5 - 8.066 bytes/cycle - 23076.17 MiB/sec @ 3 ghz
    Alignment 4 - 8.107 bytes/cycle - 23194.39 MiB/sec @ 3 ghz
    Alignment 3 - 8.069 bytes/cycle - 23084.34 MiB/sec @ 3 ghz
    Alignment 2 - 8.072 bytes/cycle - 23094.01 MiB/sec @ 3 ghz
    Alignment 1 - 8.070 bytes/cycle - 23089.31 MiB/sec @ 3 ghz
    Alignment 0 - 8.101 bytes/cycle - 23177.19 MiB/sec @ 3 ghz
    Average - 8.076 bytes/cycle - 23106.68 MiB/sec @ 3 ghz
    ```

    另一台 Ryzen5 PRO 4650G

    ```
    ./SMHasher --test=Speed xxh3
    --- Testing xxh3 "xxHash v3, 64-bit" GOOD

    [[[ Speed Tests ]]]

    Bulk speed test - 262144-byte keys
    Alignment 7 - 13.483 bytes/cycle - 38574.27 MiB/sec @ 3 ghz
    Alignment 6 - 13.476 bytes/cycle - 38556.37 MiB/sec @ 3 ghz
    Alignment 5 - 13.406 bytes/cycle - 38354.21 MiB/sec @ 3 ghz
    Alignment 4 - 13.398 bytes/cycle - 38331.72 MiB/sec @ 3 ghz
    Alignment 3 - 13.452 bytes/cycle - 38486.57 MiB/sec @ 3 ghz
    Alignment 2 - 13.452 bytes/cycle - 38487.14 MiB/sec @ 3 ghz
    Alignment 1 - 13.427 bytes/cycle - 38414.08 MiB/sec @ 3 ghz
    Alignment 0 - 16.680 bytes/cycle - 47721.07 MiB/sec @ 3 ghz
    Average - 13.847 bytes/cycle - 39615.68 MiB/sec @ 3 ghz

    ./SMHasher --test=Speed rapidhash
    --- Testing rapidhash "rapidhash v1" GOOD

    [[[ Speed Tests ]]]

    Bulk speed test - 262144-byte keys
    Alignment 7 - 7.747 bytes/cycle - 22165.33 MiB/sec @ 3 ghz
    Alignment 6 - 7.742 bytes/cycle - 22150.55 MiB/sec @ 3 ghz
    Alignment 5 - 7.753 bytes/cycle - 22182.48 MiB/sec @ 3 ghz
    Alignment 4 - 7.767 bytes/cycle - 22222.63 MiB/sec @ 3 ghz
    Alignment 3 - 7.816 bytes/cycle - 22360.94 MiB/sec @ 3 ghz
    Alignment 2 - 7.808 bytes/cycle - 22338.03 MiB/sec @ 3 ghz
    Alignment 1 - 7.805 bytes/cycle - 22329.86 MiB/sec @ 3 ghz
    Alignment 0 - 7.819 bytes/cycle - 22371.62 MiB/sec @ 3 ghz
    Average - 7.782 bytes/cycle - 22265.18 MiB/sec @ 3 ghz
    ```


    学术用语 pseudorandom function (PRF)


    XXH3 诞生帖
    https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html
    https://news.ycombinator.com/item?id=19402602

    阅读比较新的 hash 算法作者本人发布的帖子
    被认可和广泛采纳的 hash 可能宣传该 hash 如何 beat 了其他 hash
    指令集
    对于大/小输入的区别
    hash 的吞吐和延迟

    追求“多快好省”的 hash 算法, 因为存在又慢又差的 hash 算法


    SipHash 诞生记, 从 Hash DoS 攻击讲起, 当时太多的软件基础不能抵抗该攻击
    [29C3: Hash-flooding DoS reloaded: attacks and defenses (EN)](https://www.youtube.com/watch?v=Vdrab3sB7MU)
    SipHash 早期文档 https://web.archive.org/web/20190207205346/https://www.131002.net/siphash/

    SipHash 没有被 go 代码维护者足够 like, golang 标准库不会提供 SipHash
    https://github.com/golang/go/issues/19659

    有人提议 golang map 使用 SipHash 以加强安全, 因维护者观点不同被拒绝

    > 1. golang map 的用户(开发者)是安全的第一责任人, key 如果使用用户输入是 map 用户的责任
    > 2. 当前没有对 golang map Hash-flooding 的 POC

    https://github.com/golang/go/issues/9365


    尽管上面两条情况没有发生变化, go 还是修改了 aeshash 实现来避免攻击, 是因为心虚吗? 既然要修改 aeshash, 为何不使用 SipHash?
    https://github.com/golang/go/commit/91059de095703ebc4ce6b8bad7a0a40dedeef7dc

    hash 实现可以充分使用高级指令集, golang map 内部 hash 算法会使用 aes 指令(不是用于加密)
    https://github.com/tildeleb/aeshash

    内部 hash 可以实现的很混乱, 毕竟 key 无需暴露和传递

    怎么就算 cryptographic hash function?
    linux 内核文档 https://docs.kernel.org/security/siphash.html 提到 SipHash is a cryptographically secure PRF

    不同于 golang 对 SipHash 无端的抵触, linux 内核决定用 SipHash 替换 "all of these hashing functions"
    https://lwn.net/Articles/711167/

    https://github.com/BLAKE3-team/BLAKE3

    hash 的演进方向
    1. beat 竞争对手, 加入 smhasher
    2. 发帖演讲宣传
    3. 公布正式文档
    4. 提供便于使用的二进制, 固定实现, fingerprint functions
    5. 采用高级指令集
    6. 并行化