Skip to content

Instantly share code, notes, and snippets.

@travisdowns
Last active October 17, 2019 00:53
Show Gist options
  • Save travisdowns/4d80ee9a6b36d20590e846fa346797c1 to your computer and use it in GitHub Desktop.
Save travisdowns/4d80ee9a6b36d20590e846fa346797c1 to your computer and use it in GitHub Desktop.

Revisions

  1. travisdowns revised this gist Oct 3, 2019. 1 changed file with 6 additions and 2 deletions.
    8 changes: 6 additions & 2 deletions lcd-osaca.md
    Original file line number Diff line number Diff line change
    @@ -1,8 +1,8 @@
    In a [recent paper](https://arxiv.org/abs/1910.00214), a method is described for calculating the loop-carried dependencies (LCD), if any, of a assembly level loop, as follows:

    > OSACAcan detect LCDs by creating a DAG of a code comprising twoback-to-back copies of the loop body. It can thus analyze allpossible paths from each vertex of the first kernel section anddetect cyclic LCDs if there exists a dependency chain fromone instruction form to its corresponding duplicate.
    > OSACA can detect LCDs by creating a DAG of a code comprising two back-to-back copies of the loop body. It can thus analyze all possible paths from each vertex of the first kernel section and detect cyclic LCDs if there exists a dependency chain from one instruction form to its corresponding duplicate.
    However, I don't think this is sufficient to catch all loop carried dependencies. In particular, you can have a case where a dependency is loop carried, but where no vertex (instruction) in the second iteration depends on the _same_ vertex in the first. Rather, some other vertex in the second depends on the first, and in some subsequent iteration, the cycle is completed.
    However, I don't think this is sufficient to catch all loop carried dependencies. In particular, you can have a case where a dependency is loop carried, but where no vertex (instruction) in the second iteration depends on the _same_ vertex<sup>1</sup> in the first. Rather, some other vertex in the second depends on the first, and in some subsequent iteration, the cycle is completed.

    An example:

    @@ -22,3 +22,7 @@ Is there a loop carried dependecy here? Yes, there is: the chain of `add` instru
    We see then that there are two equivalent carried dependecy chains of length 3 (if we assume the `mov` are eliminated with zero latency), but each takes 2 iterations, so the total LCD length is 1.5. This also shows you can have non-integer LCD lengths when measured per iteration!

    I would show the OSACA generated graphs here, but I couldn't get it to run, so you'll have to use your imagination. If anyone wants to contribute ASCII art, I'll include it :).

    ---

    <sup>1</sup> By _same_ I mean "its corresponding dulicate" as in the quoted portion of the paper.
  2. travisdowns revised this gist Oct 3, 2019. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion lcd-osaca.md
    Original file line number Diff line number Diff line change
    @@ -1,4 +1,4 @@
    In a recent OSACA paper, a method is described for calculating the loop-carried dependencies (LCD), if any, of a assembly level loop, as follows:
    In a [recent paper](https://arxiv.org/abs/1910.00214), a method is described for calculating the loop-carried dependencies (LCD), if any, of a assembly level loop, as follows:

    > OSACAcan detect LCDs by creating a DAG of a code comprising twoback-to-back copies of the loop body. It can thus analyze allpossible paths from each vertex of the first kernel section anddetect cyclic LCDs if there exists a dependency chain fromone instruction form to its corresponding duplicate.
  3. travisdowns revised this gist Oct 3, 2019. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions lcd-osaca.md
    Original file line number Diff line number Diff line change
    @@ -20,3 +20,5 @@ mov ebx, ecx ; F (deps: D)
    Is there a loop carried dependecy here? Yes, there is: the chain of `add` instructions in all _even_ iterations depends on the result of the chain in the preceeding _even_ iteration, and the same is true for odd iterations. In particular, there is an intra-iteration chain `A->B->C->D->F`, but this chain only depends on `E` from the previous iteration which doens't appear in the chain. The other chain is simply `E`, which depends on `F` from the previous iteration. So the dependency is carried, but it takes _two_ iterations for any dependency to flow back to the _same_ vertex.

    We see then that there are two equivalent carried dependecy chains of length 3 (if we assume the `mov` are eliminated with zero latency), but each takes 2 iterations, so the total LCD length is 1.5. This also shows you can have non-integer LCD lengths when measured per iteration!

    I would show the OSACA generated graphs here, but I couldn't get it to run, so you'll have to use your imagination. If anyone wants to contribute ASCII art, I'll include it :).
  4. travisdowns revised this gist Oct 3, 2019. 1 changed file with 9 additions and 7 deletions.
    16 changes: 9 additions & 7 deletions lcd-osaca.md
    Original file line number Diff line number Diff line change
    @@ -7,14 +7,16 @@ However, I don't think this is sufficient to catch all loop carried dependencies
    An example:

    ```
    add eax, 1
    add eax, 1
    add eax, 1
    add eax, 1 ; A (deps: E-previous)
    add eax, 1 ; B (deps: A)
    add eax, 1 ; C (deps: B)
    ; swap eax, ebx (exploded version of xchg eax, ebx)
    mov ecx, eax
    mov eax, ebx
    mov ebx, ecx
    mov ecx, eax ; D (deps: C)
    mov eax, ebx ; E (deps: F-previous)
    mov ebx, ecx ; F (deps: D)
    ```

    Is there a loop carried dependecy here? Yes, there is: the chain of `add` instructions in all _even_ iterations depends on the result of the chain in the preceeding _even_ iteration, and the same is true for odd iterations. So there are two equivalent carried dependecy chains of length 3 (if we assume the `mov` are eliminated with zero latency), but each takes 2 iterations, so the total LCD length is 1.5. This shows you can have non-integer LCD lengths when measured per iteration!
    Is there a loop carried dependecy here? Yes, there is: the chain of `add` instructions in all _even_ iterations depends on the result of the chain in the preceeding _even_ iteration, and the same is true for odd iterations. In particular, there is an intra-iteration chain `A->B->C->D->F`, but this chain only depends on `E` from the previous iteration which doens't appear in the chain. The other chain is simply `E`, which depends on `F` from the previous iteration. So the dependency is carried, but it takes _two_ iterations for any dependency to flow back to the _same_ vertex.

    We see then that there are two equivalent carried dependecy chains of length 3 (if we assume the `mov` are eliminated with zero latency), but each takes 2 iterations, so the total LCD length is 1.5. This also shows you can have non-integer LCD lengths when measured per iteration!
  5. travisdowns created this gist Oct 3, 2019.
    20 changes: 20 additions & 0 deletions lcd-osaca.md
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,20 @@
    In a recent OSACA paper, a method is described for calculating the loop-carried dependencies (LCD), if any, of a assembly level loop, as follows:

    > OSACAcan detect LCDs by creating a DAG of a code comprising twoback-to-back copies of the loop body. It can thus analyze allpossible paths from each vertex of the first kernel section anddetect cyclic LCDs if there exists a dependency chain fromone instruction form to its corresponding duplicate.
    However, I don't think this is sufficient to catch all loop carried dependencies. In particular, you can have a case where a dependency is loop carried, but where no vertex (instruction) in the second iteration depends on the _same_ vertex in the first. Rather, some other vertex in the second depends on the first, and in some subsequent iteration, the cycle is completed.

    An example:

    ```
    add eax, 1
    add eax, 1
    add eax, 1
    ; swap eax, ebx (exploded version of xchg eax, ebx)
    mov ecx, eax
    mov eax, ebx
    mov ebx, ecx
    ```

    Is there a loop carried dependecy here? Yes, there is: the chain of `add` instructions in all _even_ iterations depends on the result of the chain in the preceeding _even_ iteration, and the same is true for odd iterations. So there are two equivalent carried dependecy chains of length 3 (if we assume the `mov` are eliminated with zero latency), but each takes 2 iterations, so the total LCD length is 1.5. This shows you can have non-integer LCD lengths when measured per iteration!