Skip to content

Instantly share code, notes, and snippets.

@LaurentMT
Last active April 29, 2024 07:45
Show Gist options
  • Select an option

  • Save LaurentMT/e758767ca4038ac40aaf to your computer and use it in GitHub Desktop.

Select an option

Save LaurentMT/e758767ca4038ac40aaf to your computer and use it in GitHub Desktop.

Revisions

  1. LaurentMT revised this gist Jun 23, 2017. 1 changed file with 6 additions and 4 deletions.
    10 changes: 6 additions & 4 deletions gistfile1.txt
    Original file line number Diff line number Diff line change
    @@ -80,11 +80,13 @@ Observations

    Per se, Max Entropy isn't very interesting but it becomes much more useful when used to compute the following ratios:

    Wallet efficiency = Intrinsic Entropy / Max Entropy
    This ratio may be a proxy for qualifying the efficiency of a wallet when it builds a transaction.
    Wallet efficiency = Intrinsic Entropy - Max Entropy (expressed in bits)
    This metrics may be a proxy for qualifying the efficiency of a wallet when it builds a coinjoin transaction.
    Wallet efficiency can also be expressed as a ratio: IntrinsicNumberOfCombinations(tx) / NumberOfCombinations(closest perfect coinjoin).

    Blockchain efficiency = Actual Entropy / Max Entropy
    This ratio may be a proxy for qualifying the efficiency of the whole ledger/ecosystem in term of protection of users privacy.
    Blockchain efficiency = Actual Entropy - Max Entropy (expressed in bits)
    This metrics may be a proxy for qualifying the efficiency of the whole ledger/ecosystem in term of protection of users privacy.
    Blockchain efficiency can also be expressed as a ratio: ActualNumberOfCombinations(tx) / NumberOfCombinations(closest perfect coinjoin)

    - Rule: Actual entropy of a bitcoin transaction is a dynamic value susceptible to decline over time. At best, it will stay constant. It will never increase.

  2. LaurentMT revised this gist Dec 21, 2015. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions gistfile1.txt
    Original file line number Diff line number Diff line change
    @@ -9,7 +9,7 @@ Attacks considered in the scope of these metrics are:
    - Merged Inputs Heuristic: methods identifying the inputs controlled by a same entity
    - Coinjoin Sudoku: methods identifying the links existing between the inputs and outputs of a transaction

    These metrics can be applied to all bitcoin transactions but are specifically useful for qualifying the "quality" of joint transactions (CoinJoin, SharedSend, ...).
    These metrics can be applied to all bitcoin transactions but are specifically useful for qualifying the "quality" of joint transactions (CoinJoin, SharedCoin, ...).



    @@ -137,7 +137,7 @@ TBH, I wasn't expecting such a large coverage ratio for this first round. I'm no

    Analysis of the unprocessed transactions

    It's almost certain that a subset of these transactions is composed of true Coinjoin/SharedSend transactions.
    It's almost certain that a subset of these transactions is composed of true Coinjoin/SharedCoin transactions.
    But it's also likely that another part is composed of large transactions with low entropy (classic payments with a large number of inputs and 2 outputs, ...).
    A better/smarter implementation may allow the refinement of these results. To be done...

  3. LaurentMT revised this gist Dec 20, 2015. 1 changed file with 11 additions and 11 deletions.
    22 changes: 11 additions & 11 deletions gistfile1.txt
    Original file line number Diff line number Diff line change
    @@ -119,18 +119,18 @@ Implementation
    I've developed a python implementation of an algorithm computing this metric.
    This implementation surely deserves the title of "worst possible solution": brute force algorithm, no parallelization, no optimization of the algorithm (memoization), ...
    The unique optimization used by this implementation is a reduction of the problem space thanks to information provided by external heuristics.
    As a direct consequence of this brute force approach, processing is limited to the most simple transactions (with low number of inputs/outputs).
    As a direct consequence of this brute force approach, processing is limited to the most simple transactions (with #inputs <= 10 & #outputs <= 10).
    Anyway, I was confident this implementation might crunch a good portion of the bitcoin ledger (60-70% of the transactions) so I've decided to keep it for a first round of tests.



    Tests & Results

    - Processing of transactions from block 1 to block 364409 (July 2015)
    - Processing of transactions from block 1 to block 388602 (December 2015)

    - #Txs (Total) = 75,107,329 (100%)
    - #Txs with entropy computed = 74,231,048 (98.83%)
    - #Txs not processed = 876,281 (1.16%)
    - #Txs (Total) = 97,977,912 (100,00%)
    - #Txs with entropy computed = 96,597,663 ( 98,59%)
    - #Txs not processed = 1,380,249 ( 1,41%)

    TBH, I wasn't expecting such a large coverage ratio for this first round. I'm not sure it's great news for privacy.

    @@ -144,17 +144,17 @@ A better/smarter implementation may allow the refinement of these results. To be

    Analysis of the transactions processed

    Cumulative distribution of txs per #combinations (loglog scale): http://i.imgur.com/ZvdtGam.png
    Cumulative distribution of txs per #combinations (loglog scale): http://i.imgur.com/jms8tpi.png
    This distribution measures the probability of a transaction with #combinations >= value

    Cumulative distribution of txs per entropy (loglog scale): http://i.imgur.com/FXEsuzd.png
    Cumulative distribution of txs per entropy (linlog scale): http://i.imgur.com/NlLBL5W.png
    This distribution measures the probability of a transaction with entropy >= value

    Around 90.55% of the transactions processed have a null entropy (i.e. they have inputs & outputs deterministically linked)
    Around 9.45% of the transactions processed have E(tx) >= 1 (i.e. they're as good or better than "ambiguous" transactions)
    Arounf 0.73% of the transactions processed have E(tx) >= 1.585 (i.e. they're as good or better than the most basic coinjoin transactions)
    Around 85.47% of the transactions processed have a null entropy (i.e. they have inputs & outputs deterministically linked)
    Around 14.52% of the transactions processed have E(tx) >= 1 (i.e. they're as good or better than "ambiguous" transactions)
    Around 1.89% of the transactions processed have E(tx) >= 1.585 (i.e. they're as good or better than the most basic coinjoin transactions)

    In best case scenario, we may have around 1.89% of the transactions with an entropy greater or equal to the entropy of a basic coinjoin but it's likely that the real value is somewhere between 1% and 2%.
    In best case scenario, we may have around 3.3% of the transactions with an entropy greater or equal to the entropy of a basic coinjoin but it's likely that the real value is somewhere between 2% and 3%.
    Note that it doesn't imply all these transactions have a coinjoin structure. They just produce higher entropies like coinjoin transactions.


  4. LaurentMT created this gist Sep 21, 2015.
    164 changes: 164 additions & 0 deletions gistfile1.txt
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,164 @@
    This document is an attempt to define metrics quantifying the degree of privacy provided by a bitcoin transaction.


    Objectives

    Definition of metrics measuring the resistance of a transaction to a set of attacks against users privacy.

    Attacks considered in the scope of these metrics are:
    - Merged Inputs Heuristic: methods identifying the inputs controlled by a same entity
    - Coinjoin Sudoku: methods identifying the links existing between the inputs and outputs of a transaction

    These metrics can be applied to all bitcoin transactions but are specifically useful for qualifying the "quality" of joint transactions (CoinJoin, SharedSend, ...).



    Entropy of a Transaction

    The idea of this metric comes from a discussion with Greg Maxwell (1).
    He was suggesting a notion of "Coinjoin Entropy" measuring how many possible mappings of inputs to outputs are possible given the values of the txos.

    Considering that the number of combinations may be very high (b/o the combinatorial explosion created by coinjoin transactions) it's wise to define the notion of entropy:

    E = log2(N)

    with:
    E = entropy of the transaction
    N = number of combinations (mappings of inputs to outputs)

    Basically, it's the Shannon Entropy with the hypothesis that all events have the same probability (i.e. no additional information giving a "preference" to specific combinations).



    Examples

    Example 1: Basic Payment Transaction
    * https://oxt.me/transaction/dcba20fdfe34fe240fa6eacccfb2e58468ba2feafcfff99706145800d09a09a6
    * N = 1
    * E = 0
    * The unique interpretation is: [(I1,I2,I3,I4,I5,I6)=>(O1,O2)]


    Example 2: Ambiguous Transaction
    * https://oxt.me/transaction/8c5feb901f3983b0f28d996f9606d895d75136dbe8d77ed1d6c7340a403a73bf
    * N = 2
    * E = 1
    * Interpretations are:
    - [(I1)=>(O2), (I2)=>(O1)]
    - [(I1,I2)=>(O1,O2)]


    Example 3: Basic Coinjoin Transaction (DarkWallet)
    * https://oxt.me/transaction/8e56317360a548e8ef28ec475878ef70d1371bee3526c017ac22ad61ae5740b8
    * N = 3
    * E = 1.585
    * Interpretations are:
    - [(I1)=>(O1,O2), (I2)=>(O3,O4)]
    - [(I1)=>(O3,O2), (I2)=>(O1,O4)]
    - [(I1,I2)=>(O1,O2,O3,O4)]
    * Note:
    O2 and O4 are the change outputs of the coinjoin transaction.
    For all combinations we have I1=>O2 and I2=>O4.
    We say that I1 is deterministically linked to O2 and I2 is deterministically linked to O4.




    Observations

    - Computation of the metric becomes quickly expensive when privacy tools like coinjoin are used to build the transaction (no surprise here).

    - It's possible to decrease the computational cost thanks to side-channel information allowing to reduce the size of the problem space (e.g. inputs controlled by a same entity can be merged in a single input).
    For example, these information can be provided by any heuristic allowing to cluster addresses (merged inputs, change address, ...).

    - From this notion of Entropy, we can derive several metrics qualifying a transaction:
    - Intrinsic Entropy: it's the value computed without any additional information, when the transaction is considered separately from the blockchain.
    - Actual Entropy: it's the value taking into account additional information. On my hand I've worked on this one because I think it's the one which matters the most to users.
    - Max Entropy: it's the value associated to a perfect coinjoin transaction having a structure similar or close to the evaluated transaction.
    A perfect coinjoin is a coinjoin with all inputs having the same amount and all outputs having the same amount.
    Here, we call structure the tuple (#inputs, #outputs)

    Per se, Max Entropy isn't very interesting but it becomes much more useful when used to compute the following ratios:

    Wallet efficiency = Intrinsic Entropy / Max Entropy
    This ratio may be a proxy for qualifying the efficiency of a wallet when it builds a transaction.

    Blockchain efficiency = Actual Entropy / Max Entropy
    This ratio may be a proxy for qualifying the efficiency of the whole ledger/ecosystem in term of protection of users privacy.

    - Rule: Actual entropy of a bitcoin transaction is a dynamic value susceptible to decline over time. At best, it will stay constant. It will never increase.



    Limitations

    - This metric is susceptible to be tricked by specific patterns of transactions like steganographic transactions.
    A steganographic transaction aims to hide the real payment inside a "fake" payment.
    Its fingerprint is similar to a classic payment but it differs from a classic payment by involving the payee in the payment process.

    Example: UserA must pay 1.5btc to UserB. UserB (the payee) collaborates to the process by providing an input.

    UserA 2btc -- -- 0.5btc UserA (change)
    \ T1 /
    |-------|
    / \
    UserB 1btc -- -- 2.5btc UserB

    This transaction is indistinguishable from a classic payment (User1 provides 2 inputs. 1 output goes to the payee, 1 output is change).
    According to our previous definition, we have E(T1)=0 instead of E(T1)=1 (i.e. classic payment + steganograhic transaction)


    - This metric is a first good proxy for measuring privacy at transaction level. A transaction with a high entropy is likely to provide better privacy to the users.
    But, as illustrated by the "Coinjoin Sudoku" attack (2), this metric fails to detect privacy leaks occuring at the lower level of specific inputs/outputs.
    Therefore, this metric shall be completed by additional metrics (see part 2)



    Implementation

    I've developed a python implementation of an algorithm computing this metric.
    This implementation surely deserves the title of "worst possible solution": brute force algorithm, no parallelization, no optimization of the algorithm (memoization), ...
    The unique optimization used by this implementation is a reduction of the problem space thanks to information provided by external heuristics.
    As a direct consequence of this brute force approach, processing is limited to the most simple transactions (with low number of inputs/outputs).
    Anyway, I was confident this implementation might crunch a good portion of the bitcoin ledger (60-70% of the transactions) so I've decided to keep it for a first round of tests.



    Tests & Results

    - Processing of transactions from block 1 to block 364409 (July 2015)

    - #Txs (Total) = 75,107,329 (100%)
    - #Txs with entropy computed = 74,231,048 (98.83%)
    - #Txs not processed = 876,281 (1.16%)

    TBH, I wasn't expecting such a large coverage ratio for this first round. I'm not sure it's great news for privacy.


    Analysis of the unprocessed transactions

    It's almost certain that a subset of these transactions is composed of true Coinjoin/SharedSend transactions.
    But it's also likely that another part is composed of large transactions with low entropy (classic payments with a large number of inputs and 2 outputs, ...).
    A better/smarter implementation may allow the refinement of these results. To be done...


    Analysis of the transactions processed

    Cumulative distribution of txs per #combinations (loglog scale): http://i.imgur.com/ZvdtGam.png
    This distribution measures the probability of a transaction with #combinations >= value

    Cumulative distribution of txs per entropy (loglog scale): http://i.imgur.com/FXEsuzd.png
    This distribution measures the probability of a transaction with entropy >= value

    Around 90.55% of the transactions processed have a null entropy (i.e. they have inputs & outputs deterministically linked)
    Around 9.45% of the transactions processed have E(tx) >= 1 (i.e. they're as good or better than "ambiguous" transactions)
    Arounf 0.73% of the transactions processed have E(tx) >= 1.585 (i.e. they're as good or better than the most basic coinjoin transactions)

    In best case scenario, we may have around 1.89% of the transactions with an entropy greater or equal to the entropy of a basic coinjoin but it's likely that the real value is somewhere between 1% and 2%.
    Note that it doesn't imply all these transactions have a coinjoin structure. They just produce higher entropies like coinjoin transactions.


    References

    (1): G.Maxwell - CoinJoin: Bitcoin privacy for the real world (https://bitcointalk.org/index.php?topic=279249.msg7117154#msg7117154)
    (2): K.Atlas - Coinjoin Sudoku (http://www.coinjoinsudoku.com/)