Last active
October 17, 2019 00:53
-
-
Save travisdowns/4d80ee9a6b36d20590e846fa346797c1 to your computer and use it in GitHub Desktop.
Revisions
-
travisdowns revised this gist
Oct 3, 2019 . 1 changed file with 6 additions and 2 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -1,8 +1,8 @@ In a [recent paper](https://arxiv.org/abs/1910.00214), a method is described for calculating the loop-carried dependencies (LCD), if any, of a assembly level loop, as follows: > OSACA can detect LCDs by creating a DAG of a code comprising two back-to-back copies of the loop body. It can thus analyze all possible paths from each vertex of the first kernel section and detect cyclic LCDs if there exists a dependency chain from one instruction form to its corresponding duplicate. However, I don't think this is sufficient to catch all loop carried dependencies. In particular, you can have a case where a dependency is loop carried, but where no vertex (instruction) in the second iteration depends on the _same_ vertex<sup>1</sup> in the first. Rather, some other vertex in the second depends on the first, and in some subsequent iteration, the cycle is completed. An example: @@ -22,3 +22,7 @@ Is there a loop carried dependecy here? Yes, there is: the chain of `add` instru We see then that there are two equivalent carried dependecy chains of length 3 (if we assume the `mov` are eliminated with zero latency), but each takes 2 iterations, so the total LCD length is 1.5. This also shows you can have non-integer LCD lengths when measured per iteration! I would show the OSACA generated graphs here, but I couldn't get it to run, so you'll have to use your imagination. If anyone wants to contribute ASCII art, I'll include it :). --- <sup>1</sup> By _same_ I mean "its corresponding dulicate" as in the quoted portion of the paper. -
travisdowns revised this gist
Oct 3, 2019 . 1 changed file with 1 addition and 1 deletion.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -1,4 +1,4 @@ In a [recent paper](https://arxiv.org/abs/1910.00214), a method is described for calculating the loop-carried dependencies (LCD), if any, of a assembly level loop, as follows: > OSACAcan detect LCDs by creating a DAG of a code comprising twoback-to-back copies of the loop body. It can thus analyze allpossible paths from each vertex of the first kernel section anddetect cyclic LCDs if there exists a dependency chain fromone instruction form to its corresponding duplicate. -
travisdowns revised this gist
Oct 3, 2019 . 1 changed file with 2 additions and 0 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -20,3 +20,5 @@ mov ebx, ecx ; F (deps: D) Is there a loop carried dependecy here? Yes, there is: the chain of `add` instructions in all _even_ iterations depends on the result of the chain in the preceeding _even_ iteration, and the same is true for odd iterations. In particular, there is an intra-iteration chain `A->B->C->D->F`, but this chain only depends on `E` from the previous iteration which doens't appear in the chain. The other chain is simply `E`, which depends on `F` from the previous iteration. So the dependency is carried, but it takes _two_ iterations for any dependency to flow back to the _same_ vertex. We see then that there are two equivalent carried dependecy chains of length 3 (if we assume the `mov` are eliminated with zero latency), but each takes 2 iterations, so the total LCD length is 1.5. This also shows you can have non-integer LCD lengths when measured per iteration! I would show the OSACA generated graphs here, but I couldn't get it to run, so you'll have to use your imagination. If anyone wants to contribute ASCII art, I'll include it :). -
travisdowns revised this gist
Oct 3, 2019 . 1 changed file with 9 additions and 7 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -7,14 +7,16 @@ However, I don't think this is sufficient to catch all loop carried dependencies An example: ``` add eax, 1 ; A (deps: E-previous) add eax, 1 ; B (deps: A) add eax, 1 ; C (deps: B) ; swap eax, ebx (exploded version of xchg eax, ebx) mov ecx, eax ; D (deps: C) mov eax, ebx ; E (deps: F-previous) mov ebx, ecx ; F (deps: D) ``` Is there a loop carried dependecy here? Yes, there is: the chain of `add` instructions in all _even_ iterations depends on the result of the chain in the preceeding _even_ iteration, and the same is true for odd iterations. In particular, there is an intra-iteration chain `A->B->C->D->F`, but this chain only depends on `E` from the previous iteration which doens't appear in the chain. The other chain is simply `E`, which depends on `F` from the previous iteration. So the dependency is carried, but it takes _two_ iterations for any dependency to flow back to the _same_ vertex. We see then that there are two equivalent carried dependecy chains of length 3 (if we assume the `mov` are eliminated with zero latency), but each takes 2 iterations, so the total LCD length is 1.5. This also shows you can have non-integer LCD lengths when measured per iteration! -
travisdowns created this gist
Oct 3, 2019 .There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -0,0 +1,20 @@ In a recent OSACA paper, a method is described for calculating the loop-carried dependencies (LCD), if any, of a assembly level loop, as follows: > OSACAcan detect LCDs by creating a DAG of a code comprising twoback-to-back copies of the loop body. It can thus analyze allpossible paths from each vertex of the first kernel section anddetect cyclic LCDs if there exists a dependency chain fromone instruction form to its corresponding duplicate. However, I don't think this is sufficient to catch all loop carried dependencies. In particular, you can have a case where a dependency is loop carried, but where no vertex (instruction) in the second iteration depends on the _same_ vertex in the first. Rather, some other vertex in the second depends on the first, and in some subsequent iteration, the cycle is completed. An example: ``` add eax, 1 add eax, 1 add eax, 1 ; swap eax, ebx (exploded version of xchg eax, ebx) mov ecx, eax mov eax, ebx mov ebx, ecx ``` Is there a loop carried dependecy here? Yes, there is: the chain of `add` instructions in all _even_ iterations depends on the result of the chain in the preceeding _even_ iteration, and the same is true for odd iterations. So there are two equivalent carried dependecy chains of length 3 (if we assume the `mov` are eliminated with zero latency), but each takes 2 iterations, so the total LCD length is 1.5. This shows you can have non-integer LCD lengths when measured per iteration!