In a recent OSACA paper, a method is described for calculating the loop-carried dependencies (LCD), if any, of a assembly level loop, as follows: > OSACAcan detect LCDs by creating a DAG of a code comprising twoback-to-back copies of the loop body. It can thus analyze allpossible paths from each vertex of the first kernel section anddetect cyclic LCDs if there exists a dependency chain fromone instruction form to its corresponding duplicate. However, I don't think this is sufficient to catch all loop carried dependencies. In particular, you can have a case where a dependency is loop carried, but where no vertex (instruction) in the second iteration depends on the _same_ vertex in the first. Rather, some other vertex in the second depends on the first, and in some subsequent iteration, the cycle is completed. An example: ``` add eax, 1 ; A (deps: E-previous) add eax, 1 ; B (deps: A) add eax, 1 ; C (deps: B) ; swap eax, ebx (exploded version of xchg eax, ebx) mov ecx, eax ; D (deps: C) mov eax, ebx ; E (deps: F-previous) mov ebx, ecx ; F (deps: D) ``` Is there a loop carried dependecy here? Yes, there is: the chain of `add` instructions in all _even_ iterations depends on the result of the chain in the preceeding _even_ iteration, and the same is true for odd iterations. In particular, there is an intra-iteration chain `A->B->C->D->F`, but this chain only depends on `E` from the previous iteration which doens't appear in the chain. The other chain is simply `E`, which depends on `F` from the previous iteration. So the dependency is carried, but it takes _two_ iterations for any dependency to flow back to the _same_ vertex. We see then that there are two equivalent carried dependecy chains of length 3 (if we assume the `mov` are eliminated with zero latency), but each takes 2 iterations, so the total LCD length is 1.5. This also shows you can have non-integer LCD lengths when measured per iteration!