Skip to content

Instantly share code, notes, and snippets.

@maaku
Last active November 17, 2020 21:53
Show Gist options
  • Select an option

  • Save maaku/41b0054de0731321d23e9da90ba4ee0a to your computer and use it in GitHub Desktop.

Select an option

Save maaku/41b0054de0731321d23e9da90ba4ee0a to your computer and use it in GitHub Desktop.

Revisions

  1. maaku revised this gist Oct 30, 2017. 9 changed files with 0 additions and 0 deletions.
    File renamed without changes.
    File renamed without changes.
    File renamed without changes
    File renamed without changes.
    File renamed without changes
    File renamed without changes.
  2. maaku revised this gist Oct 30, 2017. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -285,7 +285,7 @@ and a 'mergemerklebranch' RPC for unifying two or more fast Merkle tree inclusio

    This BIP is used by BIP116 (MERKLEBRANCHVERIFY)[4] to add Merkle inclusion proof verification to script by means of a soft-fork NOP expansion opcode.
    Deployment of MERKLEBRANCHVERIFY would make the contents of this BIP consensus critical.
    The deployment plan for BIP116 is covered in the text of that BIP..
    The deployment plan for BIP116 is covered in the text of that BIP.

    ==References==

    @@ -295,4 +295,4 @@ The deployment plan for BIP116 is covered in the text of that BIP..

    [3] [http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf Secure Hash Standard]

    [4] [https://github.com/bitcoin/bips/blob/master/bip-0116.mediawiki BIP 116] MERKLEBRANCHVERIFY
    [4] [https://github.com/bitcoin/bips/blob/master/bip-0116.mediawiki BIP 116 MERKLEBRANCHVERIFY]
  3. maaku revised this gist Oct 28, 2017. 10 changed files with 4 additions and 4 deletions.
    File renamed without changes.
    File renamed without changes.
    File renamed without changes
    File renamed without changes.
    File renamed without changes
    File renamed without changes.
    8 changes: 4 additions & 4 deletions bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -47,7 +47,7 @@ This BIP describes such a structure, and provides an example implementation.
    A Merkle hash-tree as defined by this BIP is an arbitrarily-balanced binary tree whose terminal/leaf nodes are labelled with the double-SHA256 hashes of data, whose format is outside the scope of this BIP, and inner nodes with labels constructed from the fast-SHA256 hash of its children's labels.
    The following image depicts an example unbalanced hash-tree:

    :: [[File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/6e6a42b4ec905207eab1b69401b6cc0fffa42b2f/unbalanced-hash-tree.png]]
    :: [[File:bip-0098/unbalanced-hash-tree.png]]
    '''A''', '''B''', and '''C''' are leaf labels, 32-byte double-SHA256 hashes of the data associated with the leaf.
    '''Node''' and '''Root''' are inner nodes, whose labels are fast-SHA256 (defined below) hashes of their respective children's labels.
    @@ -123,7 +123,7 @@ followed by the hashes themselves in the order previously traversed.

    There are eight possible configurations of internal nodes, as given in the following diagram:

    :: [[File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/b85c1c05aaec55598fa79770a17141fe86295002/node-variants.png]]
    :: [[File:bip-0098/node-variants.png]]
    In this diagram, DESCEND means the branch links to another internal node, as indicated by the its child graph elements labeled "...";
    SKIP means the branch contains a hash of an elided subtree or element, and the fast-SHA256 root hash of this subtree or double-SHA256 hash of the element is included in the proof structure; and
    @@ -187,7 +187,7 @@ and the number of SKIP hashes can be either 0 or 1.

    Consider the following Merkle tree structure:

    :: [[File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/b85c1c05aaec55598fa79770a17141fe86295002/traversal-example.png]]
    :: [[File:bip-0098/traversal-example.png]]
    There are six (6) internal nodes.
    The depth-first, left-to-right, pre-order traversal of the tree visits these nodes in the following order: A, B, D, F, C, then E.
    @@ -228,7 +228,7 @@ The resulting 101 byte proof, encoded in base64:.
    The 3-bit encoding for inner nodes allows encoding all relevant configurations of the nodes where the left and right branches can each be one of {DESCEND, SKIP, VERIFY}.
    The excluded 9th possibility would have both branches as SKIP:

    :: [[File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/b85c1c05aaec55598fa79770a17141fe86295002/skip-skip.png]]
    :: [[File:bip-0098/skip-skip.png]]
    This possibility is not allowed as for verification purposes it is entirely equivalent to the shorter proof where the branch to that node was SKIP'ed.
    Disallowing a node with two SKIP branches eliminates what would otherwise be a source of proof malleability.
  4. maaku revised this gist Oct 28, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -295,4 +295,4 @@ The deployment plan for BIP116 is covered in the text of that BIP..

    [3] [http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf Secure Hash Standard]

    [4] [https://github.com/bitcoin/bips/blob/master/bip-0116.mediawiki BIP 116] MERKLEBRANCHVERIFY
    [4] [https://github.com/bitcoin/bips/blob/master/bip-0116.mediawiki BIP 116] MERKLEBRANCHVERIFY
  5. maaku revised this gist Oct 28, 2017. 1 changed file with 8 additions and 0 deletions.
    8 changes: 8 additions & 0 deletions bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -281,10 +281,18 @@ An implementation of this BIP for extraction of Merkle branches and fast, log-sp
    Also included in this repo is a 'merklebranch' RPC for calculating root values and extracting inclusion proofs for both arbitrary trees and trees constructed from lists of values using the algorithm in this BIP,
    and a 'mergemerklebranch' RPC for unifying two or more fast Merkle tree inclusion proofs--replacing SKIP hashes in one proof with a subtree extracted from another.

    ==Deployment==

    This BIP is used by BIP116 (MERKLEBRANCHVERIFY)[4] to add Merkle inclusion proof verification to script by means of a soft-fork NOP expansion opcode.
    Deployment of MERKLEBRANCHVERIFY would make the contents of this BIP consensus critical.
    The deployment plan for BIP116 is covered in the text of that BIP..

    ==References==

    [1] [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2012-2459 National Vulnerability Database: CVE-2012-2459]

    [2] [https://github.com/sipa/bitcoin/tree/201709_dsha256_64 github.com:sipa/bitcoin 201709_dsha256_64] Pieter Wuille, September 2017, personal communication. By making use of knowledge that the inputs at each stage are fixed length, Mr. Wuille was able to achieve a 22.7% reduction in the time it takes to compute the double-SHA256 hash of 64 bytes of data, the hash aggregation function of the Satoshi Merkle tree construction.

    [3] [http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf Secure Hash Standard]

    [4] [https://github.com/bitcoin/bips/blob/master/bip-0116.mediawiki BIP 116] MERKLEBRANCHVERIFY
  6. maaku revised this gist Oct 28, 2017. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -3,6 +3,8 @@
    Layer: Consensus (soft fork)
    Title: Fast Merkle Trees
    Author: Mark Friedenbach <[email protected]>
    Kalle Alm <[email protected]>
    BtcDrak <[email protected]>
    Status: Draft
    Type: Standards Track
    Created: 2017-08-24
  7. maaku revised this gist Oct 28, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -1,5 +1,5 @@
    <pre>
    BIP: ???
    BIP: 98
    Layer: Consensus (soft fork)
    Title: Fast Merkle Trees
    Author: Mark Friedenbach <[email protected]>
  8. maaku revised this gist Oct 19, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -174,7 +174,7 @@ These excess bits must be zero.

    Note that the tree serialization is self-segmenting.
    By tracking tree structure a proof reader will know when the parser has reached the last internal node.
    The number of innder nodes serialized in the proof MUST equal the number of nodes inferred from the tree structure itself.
    The number of inner nodes serialized in the proof MUST equal the number of nodes inferred from the tree structure itself.
    Similarly, the number of SKIP hashes can also be inferred from the tree structure as serialized, and MUST equal the number of hashes provided within the proof.

    The single-hash proof has N=0 (the number of inner nodes),
  9. maaku revised this gist Oct 19, 2017. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -123,8 +123,8 @@ There are eight possible configurations of internal nodes, as given in the follo

    :: [[File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/b85c1c05aaec55598fa79770a17141fe86295002/node-variants.png]]
    In this diagram, DESCEND means the branch links to another internal node, as indicated by the its child graph elements labeled "..."
    SKIP means the branch contains a hash of an elided subtree or element, and the fast-SHA256 root hash of this subtree or double-SHA256 hash of the element is included in the proof structure.
    In this diagram, DESCEND means the branch links to another internal node, as indicated by the its child graph elements labeled "...";
    SKIP means the branch contains a hash of an elided subtree or element, and the fast-SHA256 root hash of this subtree or double-SHA256 hash of the element is included in the proof structure; and
    VERIFY means the branch contains an externally provided hash that is needed as witness for the verification of the proof.
    In tabular form, these code values are:

  10. maaku revised this gist Oct 19, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -123,7 +123,7 @@ There are eight possible configurations of internal nodes, as given in the follo

    :: [[File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/b85c1c05aaec55598fa79770a17141fe86295002/node-variants.png]]
    In this diagram, DESCEND means the branch links to another internal node, as indicated by the its child graph elements labeled "...".
    In this diagram, DESCEND means the branch links to another internal node, as indicated by the its child graph elements labeled "..."
    SKIP means the branch contains a hash of an elided subtree or element, and the fast-SHA256 root hash of this subtree or double-SHA256 hash of the element is included in the proof structure.
    VERIFY means the branch contains an externally provided hash that is needed as witness for the verification of the proof.
    In tabular form, these code values are:
  11. maaku revised this gist Oct 19, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -115,7 +115,7 @@ To prove that a set of hashes is contained within a Merkle tree with a given roo
    Typically the last two elements, the paths and the elided branch hashes, are lumped together and referred to as the ''proof''.

    Serialization begins with a variable-length integer (VarInt) used to encode N, the number of internal nodes in the proof.
    Next the structure of the tree is traversed using depth-first, left-to-right, pre-order algorithm to visit each internal nodes, which is serialized using a packed 3-bit representation for the configuration of each node, consuming <code>(3*N + 7) / 8</code> bytes.
    Next the structure of the tree is traversed using depth-first, left-to-right, pre-order algorithm to visit each internal nodes, which are serialized using a packed 3-bit representation for the configuration of each node, consuming <code>(3*N + 7) / 8</code> bytes.
    Then the number skipped hashes (those included in the proof, not verified by the proof) is serialized as a variable-length integer (VarInt),
    followed by the hashes themselves in the order previously traversed.

  12. maaku revised this gist Oct 19, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -89,7 +89,7 @@ so the sorts of attacks prevented by message padding and double-hashing do not a

    The 'initialization vector' for fast-SHA256 is changed in order to prevent a category of attacks on higher level protocols where a partial collision can serve as both a leaf hash and as an inner node commitment to another leaf hash.
    The IV is computed using standard SHA-256 plus midstate extraction so as to preserve compatability with cryptographic library interfaces that do not support custom IVs, at the cost of a 2x performance hit if neither custom IVs nor resuming from midstate are supported.
    The data to be hashed data is a nothing-up-my-sleeve number that is unlikely to have a known hash preimage.
    The data hashed is a nothing-up-my-sleeve number that is unlikely to have a known hash preimage.
    The prime 23 was chosen as the leading fractional bits of the first eight (8) primes, two (2) through nineteen (19), are constants used in the setup of SHA-256 itself.
    Using the next prime in sequence reduces the likelihood of introducing weakness due to reuse of a constant factor.

  13. maaku revised this gist Oct 19, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -91,7 +91,7 @@ The 'initialization vector' for fast-SHA256 is changed in order to prevent a cat
    The IV is computed using standard SHA-256 plus midstate extraction so as to preserve compatability with cryptographic library interfaces that do not support custom IVs, at the cost of a 2x performance hit if neither custom IVs nor resuming from midstate are supported.
    The data to be hashed data is a nothing-up-my-sleeve number that is unlikely to have a known hash preimage.
    The prime 23 was chosen as the leading fractional bits of the first eight (8) primes, two (2) through nineteen (19), are constants used in the setup of SHA-256 itself.
    Using the next prime in sequence reduces the risk of introducing weakness due to a reuse of the same constant factor.
    Using the next prime in sequence reduces the likelihood of introducing weakness due to reuse of a constant factor.

    The Merkle root hash of a single element tree is a simple pass-through of the leaf hash without modification so as to allow for chained validation of split proofs.
    This is particularly useful when the validation environment constrains proof sizes, such as push limits in Bitcoin script.
  14. maaku revised this gist Oct 19, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -79,7 +79,7 @@ and a Merkle tree with a single value has a root hash label equal to that self-s

    ===Rationale===

    The fast-SHA256 hash function can be calculated 2.32x faster than a specialized double-SHA256 implementation[2], or three (3) times faster than an implementation applying generic SHA-256 twice,
    The fast-SHA256 hash function can be calculated 2.32x faster than a specialized double-SHA256 implementation[2], or three (3) times faster than an implementation applying a generic SHA-256 primitive twice,
    as hashing 64 bytes of data with SHA-256 as specified by FIPS 180-4[3] takes two compression runs (because of message padding) and then a third compression run for the double-SHA256 construction.
    Validating a fast-SHA256 Merkle root is therefore more than twice as fast as the double-SHA256 construction used by Satoshi in bitcoin.
    Furthermore the fastest fast-SHA256 implementation ''is'' the generic SHA-256 implementation, enabling generic circuitry and code reuse without a cost to performance.
  15. maaku revised this gist Oct 19, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -27,7 +27,7 @@ Bitcoin uses a unique Merkle hash-tree construct invented by Satoshi for calcula
    While it would be convenient for new applications to make use of this same data structure so as to share implementation and maintenance costs, there are three principle drawbacks to reuse.

    First, Satoshi's Merkle hash-tree has a serious vulnerability[1] related to duplicate tree entries that can cause bugs in protocols that use it.
    While it is possible to secure protocols and implementations against exploit of this flaw, it requires foresight and a is a bit more tricky to design secure protocols that work around this vulnerability.
    While it is possible to secure protocols and implementations against exploit of this flaw, it requires foresight and it is a bit more tricky to design secure protocols that work around this vulnerability.
    Designers of new protocols ought avoid using the Satoshi Merkle hash-tree construct where at all possible in order to responsibly decrease the likelihood of downstream bugs in naïve implementations.

    Second, Satoshi's Merkle hash-tree performs an unnecessary number of cryptographic hash function compression rounds, resulting in construction and validation times that are approximately three (3) times more computation than is strictly necessary in a naïve implementation, or 2.32x more computation in an implementation specialized for this purpose only[2].
  16. maaku revised this gist Oct 19, 2017. 1 changed file with 6 additions and 6 deletions.
    12 changes: 6 additions & 6 deletions bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -42,21 +42,21 @@ This BIP describes such a structure, and provides an example implementation.

    ==Specification==

    A Merkle hash-tree as defined by this BIP is an arbitrarily-balanced binary tree whose terminal/leaf nodes are labelled with the double-SHA256 hashes of data, whose format is outside the scope of this BIP, and interior nodes with labels constructed from the fast-SHA256 hash of its children's labels.
    A Merkle hash-tree as defined by this BIP is an arbitrarily-balanced binary tree whose terminal/leaf nodes are labelled with the double-SHA256 hashes of data, whose format is outside the scope of this BIP, and inner nodes with labels constructed from the fast-SHA256 hash of its children's labels.
    The following image depicts an example unbalanced hash-tree:

    :: [[File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/6e6a42b4ec905207eab1b69401b6cc0fffa42b2f/unbalanced-hash-tree.png]]
    '''A''', '''B''', and '''C''' are leaf labels, 32-byte double-SHA256 hashes of the data associated with the leaf.
    '''Node''' and '''Root''' are interior nodes, whose labels are fast-SHA256 (defined below) hashes of their respective children's labels.
    '''Node''' and '''Root''' are inner nodes, whose labels are fast-SHA256 (defined below) hashes of their respective children's labels.
    '''Node''' is labelled with the fast-SHA256 hash of the concatination of '''B''' and '''C'''.
    '''Root''' is labelled with the fast-SHA256 hash of the concatination of '''A''' and '''Node''', and is the ''Merkle root'' of the tree.
    Nodes with single children are not allowed.

    The ''double-SHA256'' cryptographic hash function takes an arbitrary-length data as input and produces a 32-byte hash by running the data through the SHA-256 hash function as specified in FIPS 180-4[3], and then running the same hash function again on the 32-byte result, as a protection against length-extension attacks.

    The ''fast-SHA256'' cryptographic hash function takes two 32-byte hash values, concatenates these to produce a 64-byte buffer, and applies a single run of the SHA-256 hash function with a custom 'initialization vector' (IV) and without message paddding.
    The result is a 32-byte 'midstate' which is the combined hash value and the label of the interior node.
    The result is a 32-byte 'midstate' which is the combined hash value and the label of the inner node.
    The changed IV protects against path-length extension attacks (grinding to interpret a hash as both an inner node and a leaf).
    fast-SHA256 is only defined for two 32-byte inputs.
    The custom IV is the intermediate hash value generated after performing a standard SHA-256 of the following hex-encoded bytes and extracting the midstate:
    @@ -84,7 +84,7 @@ as hashing 64 bytes of data with SHA-256 as specified by FIPS 180-4[3] takes two
    Validating a fast-SHA256 Merkle root is therefore more than twice as fast as the double-SHA256 construction used by Satoshi in bitcoin.
    Furthermore the fastest fast-SHA256 implementation ''is'' the generic SHA-256 implementation, enabling generic circuitry and code reuse without a cost to performance.

    The application of fast-SHA256 to interior-node label updates is safe in this limited domain because the inputs are hash values and fixed in number and in length,
    The application of fast-SHA256 to inner node label updates is safe in this limited domain because the inputs are hash values and fixed in number and in length,
    so the sorts of attacks prevented by message padding and double-hashing do not apply.

    The 'initialization vector' for fast-SHA256 is changed in order to prevent a category of attacks on higher level protocols where a partial collision can serve as both a leaf hash and as an inner node commitment to another leaf hash.
    @@ -108,7 +108,7 @@ This section specifies a standard encoding for a multi-element inclusion proof.
    To prove that a set of hashes is contained within a Merkle tree with a given root requires four pieces of information:

    # The root hash of the Merkle tree;
    # The hash values to be verified, a set usually consisting of the double-SHA256 hash of data elements, but potentially the labels of interior nodes instead, or both;
    # The hash values to be verified, a set usually consisting of the double-SHA256 hash of data elements, but potentially the labels of inner nodes instead, or both;
    # The paths from the root to the nodes containing the values under consideration, expressed as a serialized binary tree structure; and
    # The hash values of branches not taken along those paths.
    @@ -267,7 +267,7 @@ It differs in a subtle but important way from the algorithm used by Satoshi so a
    # The last remaining item in the list is the Merkle root.
    This algorithm differs from Merkle lists used in bitcoin in two ways.
    First, fast-SHA256 is used instead of double-SHA256 for interior node labels.
    First, fast-SHA256 is used instead of double-SHA256 for inner node labels.
    Second, final entries on an odd-length list are not duplicated and hashed, which is the mistake that led to CVE-2012-2459[1].

    ==Implementation==
  17. maaku revised this gist Oct 19, 2017. 1 changed file with 8 additions and 8 deletions.
    16 changes: 8 additions & 8 deletions bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -13,17 +13,17 @@
    ==Abstract==

    In many applications it is useful to prove membership of a data element in a set without having to reveal the entire contents of that set.
    The Merkle hash tree, where inner/non-leaf nodes are labeled with the hash of the labels or values of its children, is a cryptographic tool that achieves this goal.
    Bitcoin uses a Merkle hash tree construct for committing the transactions of a block into the block header.
    The Merkle hash-tree, where inner/non-leaf nodes are labeled with the hash of the labels or values of its children, is a cryptographic tool that achieves this goal.
    Bitcoin uses a Merkle hash-tree construct for committing the transactions of a block into the block header.
    This particular design, created by Satoshi, suffers from a serious flaw related to duplicate entries documented in the National Vulnerability Database as CVE-2012-2459[1], and also suffers from less than optimal performance due to unnecessary double-hashing.

    This Bitcoin Improvement Proposal describes a more efficient Merkle hash tree construct that is not vulnerable to CVE-2012-2459
    and achieves an approximate 55% decrease in hash tree construction and validation times as compared with fully optimized implementations of the Satoshi Merkle hash tree construct.
    This Bitcoin Improvement Proposal describes a more efficient Merkle hash-tree construct that is not vulnerable to CVE-2012-2459
    and achieves an approximate 55% decrease in hash-tree construction and validation times as compared with fully optimized implementations of the Satoshi Merkle hash-tree construct.

    ==Motivation==

    A Merkle hash-tree is a directed acyclic graph data structure where all non-terminal nodes are labeled with the hash of combined labels or values of the node(s) it is connected to.
    Bitcoin uses a unique Merkle hash tree construct invented by Satoshi for calculating the block header commitment to the list of transactions in a block.
    Bitcoin uses a unique Merkle hash-tree construct invented by Satoshi for calculating the block header commitment to the list of transactions in a block.
    While it would be convenient for new applications to make use of this same data structure so as to share implementation and maintenance costs, there are three principle drawbacks to reuse.

    First, Satoshi's Merkle hash-tree has a serious vulnerability[1] related to duplicate tree entries that can cause bugs in protocols that use it.
    @@ -42,8 +42,8 @@ This BIP describes such a structure, and provides an example implementation.

    ==Specification==

    A Merkle hash tree as defined by this BIP is an arbitrarily-balanced binary tree whose terminal/leaf nodes are labelled with the double-SHA256 hashes of data, whose format is outside the scope of this BIP, and interior nodes with labels constructed from the fast-SHA256 hash of its children's labels.
    The following image depicts an example unbalanced hash tree:
    A Merkle hash-tree as defined by this BIP is an arbitrarily-balanced binary tree whose terminal/leaf nodes are labelled with the double-SHA256 hashes of data, whose format is outside the scope of this BIP, and interior nodes with labels constructed from the fast-SHA256 hash of its children's labels.
    The following image depicts an example unbalanced hash-tree:

    :: [[File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/6e6a42b4ec905207eab1b69401b6cc0fffa42b2f/unbalanced-hash-tree.png]]
    @@ -102,7 +102,7 @@ thereby allowing, for example, a fixed series of four (4) chained validations to

    ==Inclusion Proofs==

    An important use of Merkle hash trees is the ability to compactly prove membership with log-sized proofs.
    An important use of Merkle hash-trees is the ability to compactly prove membership with log-sized proofs.
    This section specifies a standard encoding for a multi-element inclusion proof.

    To prove that a set of hashes is contained within a Merkle tree with a given root requires four pieces of information:
  18. maaku revised this gist Oct 19, 2017. 1 changed file with 21 additions and 21 deletions.
    42 changes: 21 additions & 21 deletions bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -192,27 +192,27 @@ The depth-first, left-to-right, pre-order traversal of the tree visits these nod
    There are three (3) skipped hashes, visited in the following order: 0x00..., 0x66..., and 0x22...
    The remaining four (4) hashes are provided at runtime to be verified by the proof.

    {|
    | scope="col"|
    | scope="col"| Byte 1
    | scope="col"| Byte 2
    | scope="col"| Byte 3
    |-
    | scope="row"| Bits
    | 76543210
    | 76543210
    | 76543210
    |-
    | scope="row"| Nodes
    | AAABBBDD
    | DFFFCCCE
    | EE------
    |-
    | scope="row"| Code
    | 10111101
    | 10000100
    | 01000000
    |}
    {|
    | scope="col"|
    | scope="col"| Byte 1
    | scope="col"| Byte 2
    | scope="col"| Byte 3
    |-
    | scope="row"| Bits
    | 76543210
    | 76543210
    | 76543210
    |-
    | scope="row"| Nodes
    | AAABBBDD
    | DFFFCCCE
    | EE------
    |-
    | scope="row"| Code
    | 10111101
    | 10000100
    | 01000000
    |}

    The serialization begins with the VarInt encoded number of inner nodes, <code>0x06</code>, followed by the tree serialization itself, <code>0xbd8440</code>.
    Next the number of SKIP hashes is VarInt encoded, <code>0x03</code>, followed by the three (3) hashes in sequence.
  19. maaku revised this gist Oct 19, 2017. 1 changed file with 38 additions and 38 deletions.
    76 changes: 38 additions & 38 deletions bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -128,44 +128,44 @@ SKIP means the branch contains a hash of an elided subtree or element, and the f
    VERIFY means the branch contains an externally provided hash that is needed as witness for the verification of the proof.
    In tabular form, these code values are:

    {| class="wikitable"
    |-
    | scope="col"| Code
    | scope="col"| Left
    | scope="col"| Right
    |-
    | scope="row"| 000
    | VERIFY
    | SKIP
    |-
    | scope="row"| 001
    | VERIFY
    | VERIFY
    |-
    | scope="row"| 010
    | VERIFY
    | DESCEND
    |-
    | scope="row"| 011
    | DESCEND
    | SKIP
    |-
    | scope="row"| 100
    | DESCEND
    | VERIFY
    |-
    | scope="row"| 101
    | DESCEND
    | DESCEND
    |-
    | scope="row"| 110
    | SKIP
    | VERIFY
    |-
    | scope="row"| 111
    | SKIP
    | DESCEND
    |}
    {| class="wikitable"
    |-
    | scope="col"| Code
    | scope="col"| Left
    | scope="col"| Right
    |-
    | scope="row"| 000
    | VERIFY
    | SKIP
    |-
    | scope="row"| 001
    | VERIFY
    | VERIFY
    |-
    | scope="row"| 010
    | VERIFY
    | DESCEND
    |-
    | scope="row"| 011
    | DESCEND
    | SKIP
    |-
    | scope="row"| 100
    | DESCEND
    | VERIFY
    |-
    | scope="row"| 101
    | DESCEND
    | DESCEND
    |-
    | scope="row"| 110
    | SKIP
    | VERIFY
    |-
    | scope="row"| 111
    | SKIP
    | DESCEND
    |}

    These 3-bit codes are packed into a byte array such that eight (8) codes would fit in every three (3) bytes.
    The order of filling a byte begins with the most significant bit <code>0x80</code> and ends with the lest significant bit <code>0x01</code>.
  20. maaku revised this gist Oct 19, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -43,7 +43,7 @@ This BIP describes such a structure, and provides an example implementation.
    ==Specification==

    A Merkle hash tree as defined by this BIP is an arbitrarily-balanced binary tree whose terminal/leaf nodes are labelled with the double-SHA256 hashes of data, whose format is outside the scope of this BIP, and interior nodes with labels constructed from the fast-SHA256 hash of its children's labels.
    The following image depicts an unbalanced hash tree that will be used as an example in this specification:
    The following image depicts an example unbalanced hash tree:

    :: [[File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/6e6a42b4ec905207eab1b69401b6cc0fffa42b2f/unbalanced-hash-tree.png]]
  21. maaku revised this gist Oct 19, 2017. 1 changed file with 3 additions and 3 deletions.
    6 changes: 3 additions & 3 deletions bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -45,7 +45,7 @@ This BIP describes such a structure, and provides an example implementation.
    A Merkle hash tree as defined by this BIP is an arbitrarily-balanced binary tree whose terminal/leaf nodes are labelled with the double-SHA256 hashes of data, whose format is outside the scope of this BIP, and interior nodes with labels constructed from the fast-SHA256 hash of its children's labels.
    The following image depicts an unbalanced hash tree that will be used as an example in this specification:

    :: [[File:https://gist.githubusercontent.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/68cedce98f10d2ce7573e121463aa3fb8c9db9c4/unbalanced-hash-tree.png]]
    :: [[File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/6e6a42b4ec905207eab1b69401b6cc0fffa42b2f/unbalanced-hash-tree.png]]
    '''A''', '''B''', and '''C''' are leaf labels, 32-byte double-SHA256 hashes of the data associated with the leaf.
    '''Node''' and '''Root''' are interior nodes, whose labels are fast-SHA256 (defined below) hashes of their respective children's labels.
    @@ -185,7 +185,7 @@ and the number of SKIP hashes can be either 0 or 1.

    Consider the following Merkle tree structure:

    :: [File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/b85c1c05aaec55598fa79770a17141fe86295002/traversal-example.png]
    :: [[File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/b85c1c05aaec55598fa79770a17141fe86295002/traversal-example.png]]
    There are six (6) internal nodes.
    The depth-first, left-to-right, pre-order traversal of the tree visits these nodes in the following order: A, B, D, F, C, then E.
    @@ -226,7 +226,7 @@ The resulting 101 byte proof, encoded in base64:.
    The 3-bit encoding for inner nodes allows encoding all relevant configurations of the nodes where the left and right branches can each be one of {DESCEND, SKIP, VERIFY}.
    The excluded 9th possibility would have both branches as SKIP:

    :: [File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/b85c1c05aaec55598fa79770a17141fe86295002/skip-skip.png]
    :: [[File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/b85c1c05aaec55598fa79770a17141fe86295002/skip-skip.png]]
    This possibility is not allowed as for verification purposes it is entirely equivalent to the shorter proof where the branch to that node was SKIP'ed.
    Disallowing a node with two SKIP branches eliminates what would otherwise be a source of proof malleability.
  22. maaku revised this gist Oct 19, 2017. 1 changed file with 7 additions and 3 deletions.
    10 changes: 7 additions & 3 deletions bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -121,7 +121,7 @@ followed by the hashes themselves in the order previously traversed.

    There are eight possible configurations of internal nodes, as given in the following diagram:

    :: [[File:node-variants.png]]
    :: [[File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/b85c1c05aaec55598fa79770a17141fe86295002/node-variants.png]]
    In this diagram, DESCEND means the branch links to another internal node, as indicated by the its child graph elements labeled "...".
    SKIP means the branch contains a hash of an elided subtree or element, and the fast-SHA256 root hash of this subtree or double-SHA256 hash of the element is included in the proof structure.
    @@ -185,7 +185,7 @@ and the number of SKIP hashes can be either 0 or 1.

    Consider the following Merkle tree structure:

    :: [File:traversal-example.png]
    :: [File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/b85c1c05aaec55598fa79770a17141fe86295002/traversal-example.png]
    There are six (6) internal nodes.
    The depth-first, left-to-right, pre-order traversal of the tree visits these nodes in the following order: A, B, D, F, C, then E.
    @@ -224,7 +224,11 @@ The resulting 101 byte proof, encoded in base64:.
    ===Rationale===

    The 3-bit encoding for inner nodes allows encoding all relevant configurations of the nodes where the left and right branches can each be one of {DESCEND, SKIP, VERIFY}.
    The excluded 9th possibility would have both branches as SKIP--this possibility is not allowed as for verification purposes it is entirely equivalent to the shorter proof where the branch to that node was SKIP'ed.
    The excluded 9th possibility would have both branches as SKIP:

    :: [File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/b85c1c05aaec55598fa79770a17141fe86295002/skip-skip.png]
    This possibility is not allowed as for verification purposes it is entirely equivalent to the shorter proof where the branch to that node was SKIP'ed.
    Disallowing a node with two SKIP branches eliminates what would otherwise be a source of proof malleability.

    The number of hashing operations required to verify a proof is one less than the number of hashes (SKIP and VERIFY combined),
  23. maaku revised this gist Oct 19, 2017. 10 changed files with 308 additions and 54 deletions.
    230 changes: 177 additions & 53 deletions bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -17,7 +17,8 @@ The Merkle hash tree, where inner/non-leaf nodes are labeled with the hash of th
    Bitcoin uses a Merkle hash tree construct for committing the transactions of a block into the block header.
    This particular design, created by Satoshi, suffers from a serious flaw related to duplicate entries documented in the National Vulnerability Database as CVE-2012-2459[1], and also suffers from less than optimal performance due to unnecessary double-hashing.

    This Bitcoin Improvement Proposal describes a more efficient Merkle hash tree construct that is not vulnerable to CVE-2012-2459 and achieves an approximate 3x decrease in hash tree construction and validation times.
    This Bitcoin Improvement Proposal describes a more efficient Merkle hash tree construct that is not vulnerable to CVE-2012-2459
    and achieves an approximate 55% decrease in hash tree construction and validation times as compared with fully optimized implementations of the Satoshi Merkle hash tree construct.

    ==Motivation==

    @@ -29,7 +30,7 @@ First, Satoshi's Merkle hash-tree has a serious vulnerability[1] related to dupl
    While it is possible to secure protocols and implementations against exploit of this flaw, it requires foresight and a is a bit more tricky to design secure protocols that work around this vulnerability.
    Designers of new protocols ought avoid using the Satoshi Merkle hash-tree construct where at all possible in order to responsibly decrease the likelihood of downstream bugs in naïve implementations.

    Second, Satoshi's Merkle hash-tree performs an unnecessary number of cryptographic hash function compression rounds, resulting in approximately three (3) times more computation for construction and validation than is strictly necessary.
    Second, Satoshi's Merkle hash-tree performs an unnecessary number of cryptographic hash function compression rounds, resulting in construction and validation times that are approximately three (3) times more computation than is strictly necessary in a naïve implementation, or 2.32x more computation in an implementation specialized for this purpose only[2].
    New implementations that do not require backwards compatibility ought to consider hash-tree implementations that do not carry this unnecessary performance hit.

    Third, Satoshi's algorithm presumes construction of a tree index from an ordered list, and therefore is designed to support balanced trees with a uniform path length from root to leaf for all elements in the tree.
    @@ -52,85 +53,205 @@ The following image depicts an unbalanced hash tree that will be used as an exam
    '''Root''' is labelled with the fast-SHA256 hash of the concatination of '''A''' and '''Node''', and is the ''Merkle root'' of the tree.
    Nodes with single children are not allowed.

    The ''double-SHA256'' cryptographic hash function takes an arbitrary-length data as input and produces a 32-byte hash by running the data through the SHA-256 hash function as specified in FIPS 180-4[2], and then running the same hash function again on the result, as a protection against length-extension attacks.
    The ''double-SHA256'' cryptographic hash function takes an arbitrary-length data as input and produces a 32-byte hash by running the data through the SHA-256 hash function as specified in FIPS 180-4[3], and then running the same hash function again on the 32-byte result, as a protection against length-extension attacks.

    The ''fast-SHA256'' cryptographic hash function takes two 32-byte hash values, concatenates these to produce a 64-byte buffer, and single run of the SHA-256 hash function with a custom 'initialization vector' (IV) and without message paddding.
    The result is a 32-byte 'midstate' which is the combined hash value and label of the interior node, with the changed IV protecting against path-length extension attacks (grinding to interpret a hash as both an inner node and a leaf).
    The ''fast-SHA256'' cryptographic hash function takes two 32-byte hash values, concatenates these to produce a 64-byte buffer, and applies a single run of the SHA-256 hash function with a custom 'initialization vector' (IV) and without message paddding.
    The result is a 32-byte 'midstate' which is the combined hash value and the label of the interior node.
    The changed IV protects against path-length extension attacks (grinding to interpret a hash as both an inner node and a leaf).
    fast-SHA256 is only defined for two 32-byte inputs.
    The custom IV is the intermediate hash value generated after performing a standard SHA-256 of the following hex-encoded bytes and extracting the midstate:

    6a09e667f3bcc908 b2fb1366ea957d3e 3adec17512775099 da2f590b0667322a
    95f9060875714587 5163fcdfb907b672 1ee950bc8738f694 f0090e6c7bf44ed1
    cbbb9d5dc1059ed8 e7730eaff25e24a3 f367f2fc266a0373 fe7a4d34486d08ae
    d41670a136851f32 663914b66b4b3c23 1b9e3d7740a60887 63c11d86d446cb1c
    This data is the first 512 fractional bits of the square root of 2.
    This data is the first 512 fractional bits of the square root of 23, the 9th prime number.
    The resulting midstate is used as IV for the fast-SHA256 cryptographic hash function:

    static unsigned char _MidstateIV[32] =
    { 0x91, 0x06, 0x6c, 0x2b, 0x97, 0x5c, 0xc8, 0x32,
    0xe7, 0x6c, 0xd4, 0x01, 0x68, 0x21, 0x8d, 0x36,
    0x18, 0xd0, 0x9b, 0xe1, 0x9a, 0x7b, 0xff, 0xc0,
    0xec, 0x1d, 0xcc, 0xf0, 0x8f, 0x77, 0x5b, 0xbd };
    { 0x89, 0xcc, 0x59, 0xc6, 0xf7, 0xce, 0x43, 0xfc,
    0xf6, 0x12, 0x67, 0x0e, 0x78, 0xe9, 0x36, 0x2e,
    0x76, 0x8f, 0xd2, 0xc9, 0x18, 0xbd, 0x42, 0xed,
    0x0e, 0x0b, 0x9f, 0x79, 0xee, 0xf6, 0x8a, 0x24 };
    As fast-SHA256 is only defined for two hash inputs, there are necessarily two special cases:
    an empty Merkle tree has a root hash value of zero,
    and a Merkle tree with a single value has a root hash value equal to the label of that single node (a passthrough operation with no hashing).
    As fast-SHA256 is only defined for two (2) 32-byte hash inputs, there are necessarily two special cases:
    an empty Merkle tree is not allowed, nor is any root hash defined for such a "tree";
    and a Merkle tree with a single value has a root hash label equal to that self-same value of the leaf branch, the only node in the tree (a passthrough operation with no hashing).

    ==Rationale==
    ===Rationale===

    The fast-SHA256 hash function is roughly three times as fast as double-SHA256 in hash tree verification, as hashing 64 bytes of data with SHA-256 as specified by FIPS 180-4[2] takes two compression runs (because of message padding) and then a third compression run for the double-SHA256 construction.
    Validating a fast-SHA256 Merkle root is therefore 3x as fast as the double-SHA256 construction used by Satoshi in bitcoin.
    The fast-SHA256 hash function can be calculated 2.32x faster than a specialized double-SHA256 implementation[2], or three (3) times faster than an implementation applying generic SHA-256 twice,
    as hashing 64 bytes of data with SHA-256 as specified by FIPS 180-4[3] takes two compression runs (because of message padding) and then a third compression run for the double-SHA256 construction.
    Validating a fast-SHA256 Merkle root is therefore more than twice as fast as the double-SHA256 construction used by Satoshi in bitcoin.
    Furthermore the fastest fast-SHA256 implementation ''is'' the generic SHA-256 implementation, enabling generic circuitry and code reuse without a cost to performance.

    The application of fast-SHA256 to interior-node label updates is safe in this limited domain because the inputs are hash values and fixed in number and in length,
    so the sorts of attacks prevented by message padding and double-hashing do not apply.

    The 'initialization vector' for fast-SHA256 is changed in order to prevent the possible attack of grinding a hash value that can serve both as a leaf hash and as an inner node commitment to another leaf hash.
    The IV is computed using standard SHA-256 plus midstate extraction so as to preserve compatability with cryptographic library interfaces that do not support custom IVs, at the cost of a 2x performance hit in that case.
    The 'initialization vector' for fast-SHA256 is changed in order to prevent a category of attacks on higher level protocols where a partial collision can serve as both a leaf hash and as an inner node commitment to another leaf hash.
    The IV is computed using standard SHA-256 plus midstate extraction so as to preserve compatability with cryptographic library interfaces that do not support custom IVs, at the cost of a 2x performance hit if neither custom IVs nor resuming from midstate are supported.
    The data to be hashed data is a nothing-up-my-sleeve number that is unlikely to have a known hash preimage.
    The prime 23 was chosen as the leading fractional bits of the first eight (8) primes, two (2) through nineteen (19), are constants used in the setup of SHA-256 itself.
    Using the next prime in sequence reduces the risk of introducing weakness due to a reuse of the same constant factor.

    The Merkle root hash of a single element tree is a simple pass-through of the leaf hash without modification so as to allow for chained validation of split proofs.
    This is particularly useful when the validation environment constrains proof sizes, such as push limits in Bitcoin script.
    Chained validation allows a verifier to split one proof into two or more, where the leaf is shown to be under an inner node, and that inner node is shown to be under the root.
    Without pass-through hashing in a single-element tree, use of chained validation would unnecessarily introduce a minimum path length requirement equal to the number of chain links.
    Pass-through hashing of single elements allows instead for one or more of the chained validations to use a "NOP" proof consisting of a zero-length path, thereby allowing, for example, a series of 4 chained validations to verify a length 3 path.
    Pass-through hashing of single elements allows instead for one or more of the chained validations to use a "NOP" proof consisting of a zero-length path,
    thereby allowing, for example, a fixed series of four (4) chained validations to verify a length three (3) or shorter path.

    ==Inclusion Proofs==

    An important use of Merkle hash trees is the ability to compactly prove membership with log-sized proofs.
    This section specifies a standard encoding for a single-element inclusion proof.
    This section specifies a standard encoding for a multi-element inclusion proof.

    To prove that a hash is contained within a Merkle tree with a given root requires four pieces of information:
    To prove that a set of hashes is contained within a Merkle tree with a given root requires four pieces of information:

    # The root hash of the Merkle tree;
    # The hash of the node under consideration, usually the double-SHA256 of a data element, but potentially the label of an interior node instead;
    # The path from the root to the node under consideration, expressed as a series of left/right binary choices; and
    # The hash values of branches not taken along the path.
    Typically the last two elements, the path and the elided branch hashes, are lumped together and referred to as the ''proof''.

    The path is represented as an unsigned integer with a number of bits equal to the length of the path.
    A zero (0) indicates the left-branch is taken and the elided right-branch hash is provided, while a one (1) indicates a right-branch is taken and the elided left-branch hash is provided.
    The list of hashes for branches not taken is equal in size to the length of the path.
    The following algorithm validates a proof:

    # Begin with a root hash '''r''', a leaf hash '''v''', an integer path '''p''', and a list of branch hashes '''b'''.
    # Set '''h''' to the leaf value '''v'''.
    # For i in <nowiki>[0, path length):</nowiki>
    #* If <nowiki>p&(1<<i) [bit i is set in p]: h = fast-SHA256(b[i] || h)</nowiki>
    #* Else <nowiki>[bit i is not set in p]: h = fast-SHA256(h || b[i])</nowiki>
    # Check if '''h''' matches the root hash '''r'''.
    Note what this algorithm implies: lower-order path bits are closer to the element being proved, whereas higher-order path bits are closer to the root, and branch hashes are stored from bottom (sibling to the element) to top (child of the root).

    A proof is serialized by first serializing the path as a variable-length unsigned integer, using the VARINT serialization format, then the hashes are serialized from bottom to top.
    This serialization format implicitly encodes the length of the proof as the number of 32-byte hashes which can be extracted from the stream after the path has been deserialized.
    To prevent proof malleability, a path cannot have a bit set for which there is no corresponding branch hash, and such proofs should fail validation.

    Note that this serialization format is not self-segmenting.
    The deserializer must know how long the proof object is to determine how many hashes to read.
    # The hash values to be verified, a set usually consisting of the double-SHA256 hash of data elements, but potentially the labels of interior nodes instead, or both;
    # The paths from the root to the nodes containing the values under consideration, expressed as a serialized binary tree structure; and
    # The hash values of branches not taken along those paths.
    Typically the last two elements, the paths and the elided branch hashes, are lumped together and referred to as the ''proof''.

    Serialization begins with a variable-length integer (VarInt) used to encode N, the number of internal nodes in the proof.
    Next the structure of the tree is traversed using depth-first, left-to-right, pre-order algorithm to visit each internal nodes, which is serialized using a packed 3-bit representation for the configuration of each node, consuming <code>(3*N + 7) / 8</code> bytes.
    Then the number skipped hashes (those included in the proof, not verified by the proof) is serialized as a variable-length integer (VarInt),
    followed by the hashes themselves in the order previously traversed.

    There are eight possible configurations of internal nodes, as given in the following diagram:

    :: [[File:node-variants.png]]
    In this diagram, DESCEND means the branch links to another internal node, as indicated by the its child graph elements labeled "...".
    SKIP means the branch contains a hash of an elided subtree or element, and the fast-SHA256 root hash of this subtree or double-SHA256 hash of the element is included in the proof structure.
    VERIFY means the branch contains an externally provided hash that is needed as witness for the verification of the proof.
    In tabular form, these code values are:

    {| class="wikitable"
    |-
    | scope="col"| Code
    | scope="col"| Left
    | scope="col"| Right
    |-
    | scope="row"| 000
    | VERIFY
    | SKIP
    |-
    | scope="row"| 001
    | VERIFY
    | VERIFY
    |-
    | scope="row"| 010
    | VERIFY
    | DESCEND
    |-
    | scope="row"| 011
    | DESCEND
    | SKIP
    |-
    | scope="row"| 100
    | DESCEND
    | VERIFY
    |-
    | scope="row"| 101
    | DESCEND
    | DESCEND
    |-
    | scope="row"| 110
    | SKIP
    | VERIFY
    |-
    | scope="row"| 111
    | SKIP
    | DESCEND
    |}
    These 3-bit codes are packed into a byte array such that eight (8) codes would fit in every three (3) bytes.
    The order of filling a byte begins with the most significant bit <code>0x80</code> and ends with the lest significant bit <code>0x01</code>.
    Unless the number of inner nodes is a multiple of eight (8), there will be excess low-order bits in the final byte of serialization.
    These excess bits must be zero.

    Note that the tree serialization is self-segmenting.
    By tracking tree structure a proof reader will know when the parser has reached the last internal node.
    The number of innder nodes serialized in the proof MUST equal the number of nodes inferred from the tree structure itself.
    Similarly, the number of SKIP hashes can also be inferred from the tree structure as serialized, and MUST equal the number of hashes provided within the proof.

    The single-hash proof has N=0 (the number of inner nodes),
    the tree structure is not serialized (as there are no inner nodes),
    and the number of SKIP hashes can be either 0 or 1.

    ===Example===

    Consider the following Merkle tree structure:

    :: [File:traversal-example.png]
    There are six (6) internal nodes.
    The depth-first, left-to-right, pre-order traversal of the tree visits these nodes in the following order: A, B, D, F, C, then E.
    There are three (3) skipped hashes, visited in the following order: 0x00..., 0x66..., and 0x22...
    The remaining four (4) hashes are provided at runtime to be verified by the proof.

    {|
    | scope="col"|
    | scope="col"| Byte 1
    | scope="col"| Byte 2
    | scope="col"| Byte 3
    |-
    | scope="row"| Bits
    | 76543210
    | 76543210
    | 76543210
    |-
    | scope="row"| Nodes
    | AAABBBDD
    | DFFFCCCE
    | EE------
    |-
    | scope="row"| Code
    | 10111101
    | 10000100
    | 01000000
    |}
    The serialization begins with the VarInt encoded number of inner nodes, <code>0x06</code>, followed by the tree serialization itself, <code>0xbd8440</code>.
    Next the number of SKIP hashes is VarInt encoded, <code>0x03</code>, followed by the three (3) hashes in sequence.
    The resulting 101 byte proof, encoded in base64:.

    Br2EQAMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGZmZmZmZmZmZmZmZmZmZmZmZmZm
    ZmZmZmZmZmZmZmZmREREREREREREREREREREREREREREREREREREREREREQ=
    ===Rationale===

    The 3-bit encoding for inner nodes allows encoding all relevant configurations of the nodes where the left and right branches can each be one of {DESCEND, SKIP, VERIFY}.
    The excluded 9th possibility would have both branches as SKIP--this possibility is not allowed as for verification purposes it is entirely equivalent to the shorter proof where the branch to that node was SKIP'ed.
    Disallowing a node with two SKIP branches eliminates what would otherwise be a source of proof malleability.

    The number of hashing operations required to verify a proof is one less than the number of hashes (SKIP and VERIFY combined),
    and is exactly equal to the number of inner nodes serialized as the beginning of the proof as N.
    The variable-length integer encoding has the property that serialized integers, sorted lexigraphically, will also be sorted numerically.
    Since the first serialized item is the number of inner nodes, sorting proofs lexigraphically has the effect of sorting the proofs by the amount of work required to verify.

    The number of hashes required as input for verification of a proof is N+1 minus the number of SKIP hashses,
    and can be quickly calculated without parsing the tree structure.

    The coding and packing rules for the serialized tree structure were also chosen to make lexigraphical comparison useful (or at least not meaningless).
    If we consider a fully-expanded tree (no SKIP hashes, all VERIFY) to be encoding a list of elements in the order traversed depth-first from left-to-right,
    then we can extract proofs for subsets of the list by SKIP'ing the hashes of missing values and recursively pruning any resulting SKIP,SKIP nodes.
    Lexigraphically comparing the resulting serialized tree structures is the same as lexigraphically comparing lists of indices from the original list verified by the derived proof.

    Because the number of inner nodes and the number of SKIP hashes is extractible from the tree structure,
    both variable-length integers in the proof are redundant and could have been omitted.
    However that would require either construction and storage of the explicit tree in memory at deserialization time,
    or duplication of the relatively complicated tree parsing code in both the serialization and verification methods.
    For that reason (as well as to handle the single-hash edge case) the redundant inner node and SKIP hash counts are made explicit in the serialization,
    and the two values must match what is inferred from the tree structure for a proof to be valid.
    This makes deserialization trivial and defers tree construction until verification time,
    which has the additional benefit of enabling log-space verification algorithms.

    ==Fast Merkle Lists==

    Many applications use a Merkle tree to provide indexing of, or compact membership proofs of an element in a list.
    Many applications use a Merkle tree to provide indexing of, or compact membership proofs about the elements in a list.
    This ammendum specifies an algorithm that constructs a canonical balanced tree structure for lists of various lengths.
    It differs in a subtle but important way from the algorithm used by Satoshi so as to structurally prevent the vulnerability described in [1].

    @@ -147,14 +268,17 @@ Second, final entries on an odd-length list are not duplicated and hashed, which

    ==Implementation==

    An implementation of this BIP for extraction of Merkle branches and fast Merkle branch validation is available at the following Github repository:
    An implementation of this BIP for extraction of Merkle branches and fast, log-space Merkle branch validation is available at the following Github repository:

    [https://github.com/maaku/bitcoin/tree/fast-merkle-tree]

    Also included in this repo is a 'merklebranch' RPC for calculating root values and extracting inclusion proofs for both arbitrary trees and trees constructed from lists of values using the algorithm in this BIP.
    Also included in this repo is a 'merklebranch' RPC for calculating root values and extracting inclusion proofs for both arbitrary trees and trees constructed from lists of values using the algorithm in this BIP,
    and a 'mergemerklebranch' RPC for unifying two or more fast Merkle tree inclusion proofs--replacing SKIP hashes in one proof with a subtree extracted from another.

    ==References==

    [1] [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2012-2459 National Vulnerability Database: CVE-2012-2459]

    [2] [http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf Secure Hash Standard]
    [2] [https://github.com/sipa/bitcoin/tree/201709_dsha256_64 github.com:sipa/bitcoin 201709_dsha256_64] Pieter Wuille, September 2017, personal communication. By making use of knowledge that the inputs at each stage are fixed length, Mr. Wuille was able to achieve a 22.7% reduction in the time it takes to compute the double-SHA256 hash of 64 bytes of data, the hash aggregation function of the Satoshi Merkle tree construction.

    [3] [http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf Secure Hash Standard]
    6 changes: 6 additions & 0 deletions build.sh
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,6 @@
    #!/bin/sh

    dot -Tpng -o node-variants.png node-variants.dot
    dot -Tpng -o skip-skip.png skip-skip.dot
    dot -Tpng -o traversal-example.png traversal-example.dot
    dot -Tpng -o unbalanced-hash-tree.png unbalanced-hash-tree.dot
    85 changes: 85 additions & 0 deletions node-variants.dot
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,85 @@
    digraph G {
    row1 [shape=none, label=""]

    A [label="000"]
    A -> Al [label="L"]
    Al [label="VERIFY"]
    A -> Ar [label="R"]
    Ar [label="SKIP"]

    B [label="001"]
    B -> Bl [label="L"]
    Bl [label="VERIFY"]
    B -> Br [label="R"]
    Br [label="VERIFY"]

    { rank = same; row1; A; B; }

    C [label="010"]
    C -> Cl [label="L"]
    Cl [label="VERIFY"]
    C -> Cr [label="R"]
    Cr [label="DESCEND"]
    Cr -> Crl
    Crl [label="..."]
    Cr -> Crr
    Crr [label="..."]

    D [label="011"]
    D -> Dl [label="L"]
    Dl [label="DESCEND"]
    Dl -> Dll
    Dll [label="..."]
    Dl -> Dlr
    Dlr [label="..."]
    D -> Dr [label="R"]
    Dr [label="SKIP"]

    E [label="100"]
    E -> El [label="L"]
    El [label="DESCEND"]
    El -> Ell
    Ell [label="..."]
    El -> Elr
    Elr [label="..."]
    E -> Er [label="R"]
    Er [label="VERIFY"]

    row1 -> invis [style=invis]
    invis [shape=none, label=""]
    invis -> C [style=invis]
    { rank = same; C; D; E; }

    F [label="101"]
    F -> Fl [label="L"]
    Fl [label="DESCEND"]
    Fl -> Fll
    Fll [label="..."]
    Fl -> Flr
    Flr [label="..."]
    F -> Fr [label="R"]
    Fr [label="DESCEND"]
    Fr -> Frl
    Frl [label="..."]
    Fr -> Frr
    Frr [label="..."]

    G [label="110"]
    G -> Gl [label="L"]
    Gl [label="SKIP"]
    G -> Gr [label="R"]
    Gr [label="VERIFY"]

    H [label="111"]
    H -> Hl [label="L"]
    Hl [label="SKIP"]
    H -> Hr [label="R"]
    Hr [label="DESCEND"]
    Hr -> Hrl
    Hrl [label="..."]
    Hr -> Hrr
    Hrr [label="..."]

    Crl -> F [style=invis]
    { rank = same; F; G; H; }
    }
    Binary file added node-variants.png
    Loading
    Sorry, something went wrong. Reload?
    Sorry, we cannot display this file.
    Sorry, this file is invalid so it cannot be displayed.
    7 changes: 7 additions & 0 deletions skip-skip.dot
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,7 @@
    digraph G {
    A [label="???"]
    A -> Al [label="L"]
    Al [label="SKIP"]
    A -> Ar [label="R"]
    Ar [label="SKIP"]
    }
    Binary file added skip-skip.png
    Loading
    Sorry, something went wrong. Reload?
    Sorry, we cannot display this file.
    Sorry, this file is invalid so it cannot be displayed.
    32 changes: 32 additions & 0 deletions traversal-example.dot
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,32 @@
    digraph G {
    a [label="A\n101"]
    a -> b
    a -> c

    b [label="B\n111"]
    b -> s0
    s0 [label="SKIP\n0x00..."]
    b -> d

    d [label="D\n011"]
    d -> f
    d -> s1
    s1 [label="SKIP\n0x22..."]

    f [label="F\n000"]
    f -> v1
    v1 [label="VERIFY\n0x55..."]
    f -> s2
    s2 [label="SKIP\n0x66..."]

    c [label="C\n010"]
    c -> v2
    v2 [label="VERIFY\n0x11..."]
    c -> e

    e [label="E\n001"]
    e -> v3
    v3 [label="VERIFY\n0x33..."]
    e -> v4
    v4 [label="VERIFY\n0x44..."]
    }
    Binary file added traversal-example.png
    Loading
    Sorry, something went wrong. Reload?
    Sorry, we cannot display this file.
    Sorry, this file is invalid so it cannot be displayed.
    2 changes: 1 addition & 1 deletion unbalanced-hash-tree.dot
    Original file line number Diff line number Diff line change
    @@ -7,5 +7,5 @@ digraph G {
    1 -> B
    B [label="B\nskip"]
    1 -> C
    C [label="C\ninclude"]
    C [label="C\nverify"]
    }
    Binary file modified unbalanced-hash-tree.png
    Loading
    Sorry, something went wrong. Reload?
    Sorry, we cannot display this file.
    Sorry, this file is invalid so it cannot be displayed.
  24. maaku revised this gist Oct 10, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -44,7 +44,7 @@ This BIP describes such a structure, and provides an example implementation.
    A Merkle hash tree as defined by this BIP is an arbitrarily-balanced binary tree whose terminal/leaf nodes are labelled with the double-SHA256 hashes of data, whose format is outside the scope of this BIP, and interior nodes with labels constructed from the fast-SHA256 hash of its children's labels.
    The following image depicts an unbalanced hash tree that will be used as an example in this specification:

    [[File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/68cedce98f10d2ce7573e121463aa3fb8c9db9c4/unbalanced-hash-tree.png]]
    :: [[File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/68cedce98f10d2ce7573e121463aa3fb8c9db9c4/unbalanced-hash-tree.png]]
    '''A''', '''B''', and '''C''' are leaf labels, 32-byte double-SHA256 hashes of the data associated with the leaf.
    '''Node''' and '''Root''' are interior nodes, whose labels are fast-SHA256 (defined below) hashes of their respective children's labels.
  25. maaku revised this gist Oct 10, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -1,7 +1,7 @@
    <pre>
    BIP: ???
    Layer: Consensus (soft fork)
    Title: Fast Merkle Trees (Consensus layer)
    Title: Fast Merkle Trees
    Author: Mark Friedenbach <[email protected]>
    Status: Draft
    Type: Standards Track
  26. maaku revised this gist Oct 10, 2017. 4 changed files with 12 additions and 1 deletion.
    2 changes: 1 addition & 1 deletion bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -44,7 +44,7 @@ This BIP describes such a structure, and provides an example implementation.
    A Merkle hash tree as defined by this BIP is an arbitrarily-balanced binary tree whose terminal/leaf nodes are labelled with the double-SHA256 hashes of data, whose format is outside the scope of this BIP, and interior nodes with labels constructed from the fast-SHA256 hash of its children's labels.
    The following image depicts an unbalanced hash tree that will be used as an example in this specification:

    [[File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/7d9739431c8ee5a207f37ffde34d668bfce0a8c1/unbalanced-hash-tree.png]]
    [[File:https://gist.github.com/maaku/41b0054de0731321d23e9da90ba4ee0a/raw/68cedce98f10d2ce7573e121463aa3fb8c9db9c4/unbalanced-hash-tree.png]]

    '''A''', '''B''', and '''C''' are leaf labels, 32-byte double-SHA256 hashes of the data associated with the leaf.
    '''Node''' and '''Root''' are interior nodes, whose labels are fast-SHA256 (defined below) hashes of their respective children's labels.
    11 changes: 11 additions & 0 deletions unbalanced-hash-tree.dot
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,11 @@
    digraph G {
    0 [label="Root\nH(A || H(B || C))"]
    0 -> A
    A [label="A\nskip"]
    0 -> 1
    1 [label="Node\nH(B || C)"]
    1 -> B
    B [label="B\nskip"]
    1 -> C
    C [label="C\ninclude"]
    }
    Binary file removed unbalanced-hash-tree.graffle
    Binary file not shown.
    Binary file modified unbalanced-hash-tree.png
    Loading
    Sorry, something went wrong. Reload?
    Sorry, we cannot display this file.
    Sorry, this file is invalid so it cannot be displayed.
  27. maaku revised this gist Sep 21, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -83,7 +83,7 @@ Validating a fast-SHA256 Merkle root is therefore 3x as fast as the double-SHA25
    The application of fast-SHA256 to interior-node label updates is safe in this limited domain because the inputs are hash values and fixed in number and in length,
    so the sorts of attacks prevented by message padding and double-hashing do not apply.

    The 'initialization vector' for fast-SHA256 is changed in order to prevent the possible attack of grinding a hash value that can serve as a leaf hash and as an inner node commitment to another leaf hash.
    The 'initialization vector' for fast-SHA256 is changed in order to prevent the possible attack of grinding a hash value that can serve both as a leaf hash and as an inner node commitment to another leaf hash.
    The IV is computed using standard SHA-256 plus midstate extraction so as to preserve compatability with cryptographic library interfaces that do not support custom IVs, at the cost of a 2x performance hit in that case.
    The data to be hashed data is a nothing-up-my-sleeve number that is unlikely to have a known hash preimage.

  28. maaku revised this gist Sep 21, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -83,7 +83,7 @@ Validating a fast-SHA256 Merkle root is therefore 3x as fast as the double-SHA25
    The application of fast-SHA256 to interior-node label updates is safe in this limited domain because the inputs are hash values and fixed in number and in length,
    so the sorts of attacks prevented by message padding and double-hashing do not apply.

    The 'initialization vector' for fast-SHA256 is changed in order to prevent a possible attack of grinding a hash value that can serve as a leaf hash and as an inner node commitment to another leaf hash.
    The 'initialization vector' for fast-SHA256 is changed in order to prevent the possible attack of grinding a hash value that can serve as a leaf hash and as an inner node commitment to another leaf hash.
    The IV is computed using standard SHA-256 plus midstate extraction so as to preserve compatability with cryptographic library interfaces that do not support custom IVs, at the cost of a 2x performance hit in that case.
    The data to be hashed data is a nothing-up-my-sleeve number that is unlikely to have a known hash preimage.

  29. maaku revised this gist Sep 21, 2017. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -54,7 +54,7 @@ Nodes with single children are not allowed.

    The ''double-SHA256'' cryptographic hash function takes an arbitrary-length data as input and produces a 32-byte hash by running the data through the SHA-256 hash function as specified in FIPS 180-4[2], and then running the same hash function again on the result, as a protection against length-extension attacks.

    The ''fast-SHA256'' cryptographic hash function takes two 32-byte hash values, concatenates these to produce a 64-byte buffer, and single run of the SHA-256 hash function with a custom IV and without message paddding.
    The ''fast-SHA256'' cryptographic hash function takes two 32-byte hash values, concatenates these to produce a 64-byte buffer, and single run of the SHA-256 hash function with a custom 'initialization vector' (IV) and without message paddding.
    The result is a 32-byte 'midstate' which is the combined hash value and label of the interior node, with the changed IV protecting against path-length extension attacks (grinding to interpret a hash as both an inner node and a leaf).
    fast-SHA256 is only defined for two 32-byte inputs.
    The custom IV is the intermediate hash value generated after performing a standard SHA-256 of the following hex-encoded bytes and extracting the midstate:
    @@ -83,7 +83,7 @@ Validating a fast-SHA256 Merkle root is therefore 3x as fast as the double-SHA25
    The application of fast-SHA256 to interior-node label updates is safe in this limited domain because the inputs are hash values and fixed in number and in length,
    so the sorts of attacks prevented by message padding and double-hashing do not apply.

    The IV for fast-SHA256 is changed in order to prevent a possible attack of grinding a hash value that can serve as a leaf hash and as an inner node commitment to another leaf hash.
    The 'initialization vector' for fast-SHA256 is changed in order to prevent a possible attack of grinding a hash value that can serve as a leaf hash and as an inner node commitment to another leaf hash.
    The IV is computed using standard SHA-256 plus midstate extraction so as to preserve compatability with cryptographic library interfaces that do not support custom IVs, at the cost of a 2x performance hit in that case.
    The data to be hashed data is a nothing-up-my-sleeve number that is unlikely to have a known hash preimage.

  30. maaku revised this gist Sep 21, 2017. 1 changed file with 10 additions and 0 deletions.
    10 changes: 10 additions & 0 deletions bip-fast-merkle-tree.mediawiki
    Original file line number Diff line number Diff line change
    @@ -83,6 +83,16 @@ Validating a fast-SHA256 Merkle root is therefore 3x as fast as the double-SHA25
    The application of fast-SHA256 to interior-node label updates is safe in this limited domain because the inputs are hash values and fixed in number and in length,
    so the sorts of attacks prevented by message padding and double-hashing do not apply.

    The IV for fast-SHA256 is changed in order to prevent a possible attack of grinding a hash value that can serve as a leaf hash and as an inner node commitment to another leaf hash.
    The IV is computed using standard SHA-256 plus midstate extraction so as to preserve compatability with cryptographic library interfaces that do not support custom IVs, at the cost of a 2x performance hit in that case.
    The data to be hashed data is a nothing-up-my-sleeve number that is unlikely to have a known hash preimage.

    The Merkle root hash of a single element tree is a simple pass-through of the leaf hash without modification so as to allow for chained validation of split proofs.
    This is particularly useful when the validation environment constrains proof sizes, such as push limits in Bitcoin script.
    Chained validation allows a verifier to split one proof into two or more, where the leaf is shown to be under an inner node, and that inner node is shown to be under the root.
    Without pass-through hashing in a single-element tree, use of chained validation would unnecessarily introduce a minimum path length requirement equal to the number of chain links.
    Pass-through hashing of single elements allows instead for one or more of the chained validations to use a "NOP" proof consisting of a zero-length path, thereby allowing, for example, a series of 4 chained validations to verify a length 3 path.

    ==Inclusion Proofs==

    An important use of Merkle hash trees is the ability to compactly prove membership with log-sized proofs.