Revisions

  1. AsmCoder110 revised this gist Apr 5, 2018. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion WhatIsStrictAliasingAndWhyDoWeCare.md
    Original file line number Diff line number Diff line change
    @@ -60,7 +60,7 @@ mov eax, 1
    ret
    ```

    The optimizer using Type-Based Alias Analysis (TBAA)<sup id="a6">[6](#f6)</sup> assumes **1** will be returned and directly moves the constant value into register **eax** which carries the return value. TBAA uses the languages rules about what types are allowed to alias to optimize loads and stores. In this case TBAA knows that a *float* can not alias and *int* and optimizes away the load of **i**.
    The optimizer using Type-Based Alias Analysis (TBAA)<sup id="a6">[6](#f6)</sup> assumes **1** will be returned and directly moves the constant value into register **eax** which carries the return value. TBAA uses the languages rules about what types are allowed to alias to optimize loads and stores. In this case TBAA knows that a *float* cannot alias an *int* and optimizes away the load of **i**.

    ## Now, to the Rule-Book

  2. @shafik shafik revised this gist Apr 1, 2018. 1 changed file with 14 additions and 14 deletions.
    28 changes: 14 additions & 14 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -5,13 +5,13 @@ What is strict aliasing? First we will describe what is aliasing and then we can

    In C and C++ aliasing has to do with what expression types we are allowed to access stored values through. In both C and C++ the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term *strict aliasing rule*. If we attempt to access a value using a type not allowed it is classified as [undefined behavior](http://en.cppreference.com/w/cpp/language/ub)(**UB**). Once we have undefined behavior all bets are off, the results of our program are no longer reliable.

    Unfortunately with strict aliasing violations, we will often obtain the results we expect, leaving the possibility the a future version of a compiler with a new optimization will break code we thought was valid. This is undesirable and it is a worth while goal to understand the strict aliasing rules and how to avoid violating them.
    Unfortunately with strict aliasing violations, we will often obtain the results we expect, leaving the possibility the a future version of a compiler with a new optimization will break code we thought was valid. This is undesirable and it is a worthwhile goal to understand the strict aliasing rules and how to avoid violating them.

    To understand more about why we care, we will discuss issues that come up when violating strict aliasing rules, type punning since common techniques used in type punning often violate strict aliasing rules and how to type pun correctly, along with some possible help from C++20 to make type punning simpler and less error prone. We will wrap up the discussion by going over some methods for catching strict aliasing violations.

    ### Prelminary examples
    ### Preliminary examples

    Let's look at some examples, then we can talk about exactly what the standard(s) say, examine some further examples and then see how to avoid strict aliasing and catch violations we missed. Here is na example that should not be surprising ([live example](https://wandbox.org/permlink/7sCJTAyrifZ0zfFA)):
    Let's look at some examples, then we can talk about exactly what the standard(s) say, examine some further examples and then see how to avoid strict aliasing and catch violations we missed. Here is an example that should not be surprising ([live example](https://wandbox.org/permlink/7sCJTAyrifZ0zfFA)):

    ```cpp
    int x = 10;
    @@ -50,7 +50,7 @@ In the function **foo** we take an *int\** and a *float\**, in this example we c
    1
    ```
    Which may not be expected but is perfectly valid since we have invoked undefined behavior. A *float* can not validly alias an *int* object. Therefore the optimizer can assume the *constant 1* stored when dereferecing **i** will be the return value since a store through **f** could not validly affect an *int* object. Plugging the code in Compiler Explorer shows this is exactly what is happening([live example](https://godbolt.org/g/yNV5aj)):
    Which may not be expected but is perfectly valid since we have invoked undefined behavior. A *float* can not validly alias an *int* object. Therefore the optimizer can assume the *constant 1* stored when dereferencing **i** will be the return value since a store through **f** could not validly affect an *int* object. Plugging the code in Compiler Explorer shows this is exactly what is happening([live example](https://godbolt.org/g/yNV5aj)):
    ```assembly
    foo(float*, int*): # @foo(float*, int*)
    @@ -249,15 +249,15 @@ float *fp = new (p) float{1.0f} ; // Dynamic type of *p is now float
    ## Are int8_t and uint8_t char types?
    Theoretically neither *int8_t* nor *uint8_t* have to be *char* types but practically they are implemented that way. This is important because if they are really *char* types then they also alias similar to *char* types. If you are unaware of can [lead to suprising performance impacts](https://stackoverflow.com/q/26295216/1708801). We can see that glibc typedefs [int8_t](https://github.com/lattera/glibc/blob/master/sysdeps/generic/stdint.h#L36) and [uint8_t](https://github.com/lattera/glibc/blob/master/sysdeps/generic/stdint.h#L48) to *singed char* and *unsigned char* respectively.
    Theoretically neither *int8_t* nor *uint8_t* have to be *char* types but practically they are implemented that way. This is important because if they are really *char* types then they also alias similar to *char* types. If you are unaware of this it can [lead to surprising performance impacts](https://stackoverflow.com/q/26295216/1708801). We can see that glibc typedefs [int8_t](https://github.com/lattera/glibc/blob/master/sysdeps/generic/stdint.h#L36) and [uint8_t](https://github.com/lattera/glibc/blob/master/sysdeps/generic/stdint.h#L48) to *signed char* and *unsigned char* respectively.
    This would be hard to change since for *C++* it would be an ABI break. This would change name mangling and would break any API using either of those types in their interface.
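
    A quick way to see how a particular implementation defines these types is a compile-time comparison. This is only a sketch: the names `uint8_is_char_type` and `int8_is_char_type` are made up for illustration, and it assumes C++17 for `std::is_same_v`.

    ```cpp
    #include <cstdint>
    #include <type_traits>

    // True when this implementation defines std::uint8_t as a character type
    // (hypothetical helper name, not part of any library).
    constexpr bool uint8_is_char_type =
        std::is_same_v<std::uint8_t, unsigned char> ||
        std::is_same_v<std::uint8_t, char>;

    // Same check for std::int8_t.
    constexpr bool int8_is_char_type =
        std::is_same_v<std::int8_t, signed char> ||
        std::is_same_v<std::int8_t, char>;
    ```

    With the glibc typedefs linked above we would expect both constants to be `true`, but the standard does not guarantee it, so the result may differ on other implementations.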
    ## What is Type Punning
    We have gotten to this point and we may be wondering, why would we want to alias for? The answer typically is to *type pun*, often the methods used violate strict aliasing rules.
    Sometimes we want to circumvent the type system and interpret an object as a different type. This is called *type punning*, to reinterpret a segment of memory as another type. *Type punning* is useful for tasks that want access to the underlying representation of an object to view, transport or manipulate. Typical areas we find type punning being used are compilers, serialization, networking code etc…
    Sometimes we want to circumvent the type system and interpret an object as a different type. This is called *type punning*, to reinterpret a segment of memory as another type. *Type punning* is useful for tasks that want access to the underlying representation of an object to view, transport or manipulate. Typical areas we find type punning being used are compilers, serialization, networking code, etc…
    Traditionally this has been accomplished by taking the address of the object, casting it to a pointer of the type we want to reinterpret it as and then accessing the value, or in other words by aliasing. For example:
    @@ -285,7 +285,7 @@ union u1
    union u1 u;
    u.f = 1.0f;

    printf( “”%d\n”, u.n ); // UB in C++ n is not the active member
    printf( "%d\n”, u.n ); // UB in C++ n is not the active member
    ```
    This is not valid in C++ and some consider the purpose of unions to be solely for implementing variant types and feel using unions for type punning is an abuse.
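
    As a hedged sketch of an alternative that is well defined in both C and C++, we can copy the object representation with **memcpy** instead of reading an inactive union member. The function name `float_bits` is hypothetical, and the sketch assumes *float* and `std::uint32_t` have the same size.

    ```cpp
    #include <cassert>  // used by the usage line shown after this block
    #include <cstdint>
    #include <cstring>

    // Returns the object representation of a float as a 32-bit unsigned value.
    // memcpy copies bytes, so no object is accessed through the wrong type.
    std::uint32_t float_bits(float f) {
        static_assert(sizeof(std::uint32_t) == sizeof(float),
                      "this sketch assumes a 32-bit float");
        std::uint32_t n;
        std::memcpy(&n, &f, sizeof n);
        return n;
    }
    ```

    On a platform using IEEE-754 single precision, `assert(float_bits(1.0f) == 0x3F800000u);` holds.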
    @@ -311,7 +311,7 @@ At a sufficient optimization level any decent modern compiler generates identica
    ### Type Punning Arrays
    But, what if we want to type pun an array of *unsigned char* into a series of *unsigned int* and then perform an operation on each *unsigned int* value? We can use **memcpy** to pun the *unsigned char array* into a temporary of type *unsinged int*. The optimizer will still manage to see through the **memcpy** and optimize away both the temporary and the copy and operate directly on the underlying data, [Live Compiler Explorer Example](https://godbolt.org/g/acjqjD):
    But, what if we want to type pun an array of *unsigned char* into a series of *unsigned ints* and then perform an operation on each *unsigned int* value? We can use **memcpy** to pun the *unsigned char array* into a temporary of type *unsinged int*. The optimizer will still manage to see through the **memcpy** and optimize away both the temporary and the copy and operate directly on the underlying data, [Live Compiler Explorer Example](https://godbolt.org/g/acjqjD):
    @@ -334,9 +334,9 @@ int bar( unsigned char *p, size_t len ) {
    }
    ```

    In the example, we take a *char\** **p**, assume it points to multiple chunks of **sizeof(unsigned int)** data, we type pun each chunk of data as an *unsigned int*, compute **foo()** on each chunck of type punned data and sum it into **result** and return the final value.
    In the example, we take a *char\** **p**, assume it points to multiple chunks of **sizeof(unsigned int)** data, we type pun each chunk of data as an *unsigned int*, compute **foo()** on each chunk of type punned data and sum it into **result** and return the final value.

    The assembly for the body of the loop shows the optimizer reduces the body into a direct access of the underlying *unsinged char array* as an *unsigned int*, adding it directly into **pop**:
    The assembly for the body of the loop shows the optimizer reduces the body into a direct access of the underlying *unsigned char array* as an *unsigned int*, adding it directly into **eax**:

    ```Assembly
    add eax, dword ptr [rdi + rcx]
    @@ -361,7 +361,7 @@ int bar( unsigned char *p, size_t len ) {
    ## C++20 and bit_cast
    In C++20 we may gain **bit_cast<>**<sup id="a14">[14](#f14)</sup> which gives a simple and safe way to type-pun as well as being usable in a constexpr context. It requires us to use an intermediate struct in the case where *To* and *From* types don't have the same size<sup id="a15">[15](#f15)</sup>. We will use a struct containing a **sizeof( unsigned int )** character array (*assumes 4 byte unsigned int*) to be the *From* type and *unsigned int* as the *To* type.:
    In C++20 we may gain **bit_cast**<sup id="a14">[14](#f14)</sup> which gives a simple and safe way to type-pun as well as being usable in a constexpr context. It requires us to use an intermediate struct in the case where *To* and *From* types don't have the same size<sup id="a15">[15](#f15)</sup>. We will use a struct containing a **sizeof( unsigned int )** character array (*assumes 4 byte unsigned int*) to be the *From* type and *unsigned int* as the *To* type.:
    ```cpp
    struct uint_chars {
    @@ -396,7 +396,7 @@ The C++17 draft standard in section *[basic.align] paragraph 1*:

    >Object types have alignment requirements (6.7.1, 6.7.2) which place restrictions on the addresses at which an object of that type may be allocated. An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated. An object type imposes an alignment requirement on every object of that type; stricter alignment can be requested using the alignment specifier (10.6.2).
    Both C99 and C11 are explict that a conversion that results in a unaligned pointer is undefined behavior, section *6.3.2.3 Pointers* says:
    Both C99 and C11 are explicit that a conversion that results in a unaligned pointer is undefined behavior, section *6.3.2.3 Pointers* says:

    >A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned<sup>57)</sup> for the pointed-to type, the behavior is undefined. ...
    @@ -414,7 +414,7 @@ So let's assume:
    Then type punning an array of char of size 4 as an *int* violates strict aliasing but may also violate alignment requirements if the array has an alignment of 1 or 2 bytes.

    ```cpp
    char arr[4] = { 0x0F, 0x0, 0x0, 0x00 }; // Could be allocated on a 1 or 2 byte boundry
    char arr[4] = { 0x0F, 0x0, 0x0, 0x00 }; // Could be allocated on a 1 or 2 byte boundary
    int x = *reinterpret_cast<int*>(arr); // Undefined behavior we have an unaligned pointer
    ```
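
    A minimal sketch of one way to sidestep both problems is to copy the bytes into a properly aligned *int* rather than casting the pointer. The helper name `load_int` is an assumption made for illustration; the resulting value still depends on the platform's byte order.

    ```cpp
    #include <cassert>
    #include <cstring>

    // Reads an int out of a byte buffer without ever forming an int*,
    // so neither the aliasing rules nor alignment requirements are violated.
    int load_int(const char *bytes) {
        assert(bytes != nullptr);
        int x;
        std::memcpy(&x, bytes, sizeof x);
        return x;
    }
    ```

    Because the copy goes through **memcpy**, the source buffer may sit at any address, including an odd one.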
    @@ -510,7 +510,7 @@ We have learned about aliasing rules in both C and C++, what it means that the c

    Optimizers are slowly getting better at type based aliasing analysis and already break some code that relies on strict aliasing violations. We can expect the optimizations will only get better and will break more code we have been used to just working.

    We have standard conformant methods for type punning and in release and sometimes debug builds these methods should be cost free abstractions. We have some tools for catching strict aliasing violations but For C++ they will only catch a small fraction of the cases and for C with tis-interpreter we should be able to catch most violations. Potentially
    We have standard conformant methods for type punning and in release and sometimes debug builds these methods should be cost free abstractions. We have some tools for catching strict aliasing violations but for C++ they will only catch a small fraction of the cases and for C with tis-interpreter we should be able to catch most violations.

    Thank you to those who provided feedback on this write-up: JF Bastien, Christopher Di Bella, Pascal Cuoq, Matt P. Dziubinski, Patrice Roy and Ólafur Waage

  3. @shafik shafik revised this gist Mar 31, 2018. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -514,7 +514,7 @@ We have standard conformant methods for type punning and in release and sometime

    Thank you to those who provided feedback on this write-up: JF Bastien, Christopher Di Bella, Pascal Cuoq, Matt P. Dziubinski, Patrice Roy and Ólafur Waage

    Of course in the end, all errors are the authors.
    Of course in the end, all errors are the author's.

    #### Footnotes

  4. @shafik shafik revised this gist Mar 30, 2018. 1 changed file with 3 additions and 0 deletions.
    3 changes: 3 additions & 0 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -512,6 +512,9 @@ Optimizers are slowly getting better at type based aliasing analysis and already

    We have standard conformant methods for type punning and in release and sometimes debug builds these methods should be cost free abstractions. We have some tools for catching strict aliasing violations but For C++ they will only catch a small fraction of the cases and for C with tis-interpreter we should be able to catch most violations. Potentially

    Thank you to those who provided feedback on this write-up: JF Bastien, Christopher Di Bella, Pascal Cuoq, Matt P. Dziubinski, Patrice Roy and Ólafur Waage

    Of course in the end, all errors are the authors.

    #### Footnotes

  5. @shafik shafik revised this gist Mar 30, 2018. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -562,7 +562,7 @@ We have standard conformant methods for type punning and in release and sometime
    <br>
    <b id="f23">23</b> The unaligned access example take from the Address Sanitizer Algorithm wiki https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#unaligned-accesses [](#a23)
    <br>
    <b id="f24">24</b> TrustInSoft tis-interpreter https://trust-in-soft.com/tis-interpreter/ , strict aliasing checks can be run by building tis-kernel https://github.com/TrustInSoft/tis-kernel without any disable flags [](#a24)
    <b id="f24">24</b> TrustInSoft tis-interpreter https://trust-in-soft.com/tis-interpreter/ , strict aliasing checks can be run by building tis-kernel https://github.com/TrustInSoft/tis-kernel [](#a24)
    <br>
    <b id="f25">25</b> Detecting Strict Aliasing Violations in the Wild https://trust-in-soft.com/wp-content/uploads/2017/01/vmcai.pdf a paper that covers dos and don't w.r.t to aliasing in C [](#a25)
    <br>
  6. @shafik shafik revised this gist Mar 29, 2018. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -1,4 +1,4 @@
    # What is Strict Aliasing Rule and Why do we care?
    # What is the Strict Aliasing Rule and Why do we care?
    ## (OR Type Punning, Undefined Behavior and Alignment, Oh My!)

    What is strict aliasing? First we will describe what is aliasing and then we can learn what being strict about it means.
  7. @shafik shafik revised this gist Mar 28, 2018. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -43,7 +43,7 @@ int main() {
    }
    ```
    In the function **foo** we take an *int\** and a *float\**, in this example we call **foo** and set both parameters to point to the same memory location which in this example contains an *int*. We may naively expect the result of the second **cout** to be **0** but with optimization enabled using **-O2** both gcc and clang produce the following result:
    In the function **foo** we take an *int\** and a *float\**, in this example we call **foo** and set both parameters to point to the same memory location which in this example contains an *int*. Note, the [reinterpret_cast](http://en.cppreference.com/w/cpp/language/reinterpret_cast) is telling the compiler to treat the the expression as if it had the type specificed by its template parameter. In this case we are telling it to treat the expression **&x** as if it had type *float\**. We may naively expect the result of the second **cout** to be **0** but with optimization enabled using **-O2** both gcc and clang produce the following result:
    ```
    0
    @@ -566,4 +566,4 @@ We have standard conformant methods for type punning and in release and sometime
    <br>
    <b id="f25">25</b> Detecting Strict Aliasing Violations in the Wild https://trust-in-soft.com/wp-content/uploads/2017/01/vmcai.pdf a paper that covers dos and don't w.r.t to aliasing in C [](#a25)
    <br>
    <b id="f26">26</b> TySan patches, clang: https://reviews.llvm.org/D32199 runtime: https://reviews.llvm.org/D32197 llvm: https://reviews.llvm.org/D32198 [](#a26)
    <b id="f26">26</b> TySan patches, clang: https://reviews.llvm.org/D32199 runtime: https://reviews.llvm.org/D32197 llvm: https://reviews.llvm.org/D32198 [](#a26)
  8. @shafik shafik revised this gist Mar 27, 2018. 1 changed file with 5 additions and 4 deletions.
    9 changes: 5 additions & 4 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -502,7 +502,7 @@ example1.c:15:[sa] warning: The pointer (short *)p has type short *. It violates
    ```

    Finally there is [TySan](https://www.youtube.com/watch?v=vAXJeN7k32Y)<sup id="a25">[25](#f25)</sup> which is currently in development. This sanitizer adds type checking information in a shadow memory segment and checks accesses to see if they violate aliasing rules. The tool potentially should be able to catch all aliasing violations but may have a large run-time overhead.
    Finally there is [TySan](https://www.youtube.com/watch?v=vAXJeN7k32Y)<sup id="a26">[26](#f26)</sup> which is currently in development. This sanitizer adds type checking information in a shadow memory segment and checks accesses to see if they violate aliasing rules. The tool potentially should be able to catch all aliasing violations but may have a large run-time overhead.

    ## Conclusion

    @@ -562,7 +562,8 @@ We have standard conformant methods for type punning and in release and sometime
    <br>
    <b id="f23">23</b> The unaligned access example take from the Address Sanitizer Algorithm wiki https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#unaligned-accesses [](#a23)
    <br>
    <b id="f24">24</b> TrustInSoft tis-interpreter https://trust-in-soft.com/tis-interpreter/ its implementation is discussed in
    Detecting Strict Aliasing Violations in the Wild https://trust-in-soft.com/wp-content/uploads/2017/01/vmcai.pdf and the strict aliasing checks can be run by building tis-kernel https://github.com/TrustInSoft/tis-kernel without any disable flags [](#a24)
    <b id="f24">24</b> TrustInSoft tis-interpreter https://trust-in-soft.com/tis-interpreter/ , strict aliasing checks can be run by building tis-kernel https://github.com/TrustInSoft/tis-kernel without any disable flags [](#a24)
    <br>
    <b id="f25">25</b> TySan patches, clang: https://reviews.llvm.org/D32199 runtime: https://reviews.llvm.org/D32197 llvm: https://reviews.llvm.org/D32198 [](#a25)
    <b id="f25">25</b> Detecting Strict Aliasing Violations in the Wild https://trust-in-soft.com/wp-content/uploads/2017/01/vmcai.pdf a paper that covers dos and don't w.r.t to aliasing in C [](#a25)
    <br>
    <b id="f26">26</b> TySan patches, clang: https://reviews.llvm.org/D32199 runtime: https://reviews.llvm.org/D32197 llvm: https://reviews.llvm.org/D32198 [](#a26)
  9. @shafik shafik revised this gist Mar 27, 2018. 1 changed file with 7 additions and 7 deletions.
    14 changes: 7 additions & 7 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -64,11 +64,11 @@ The optimizer using Type-Based Alias Analysis (TBAA)<sup id="a6">[6](#f6)</sup>

    ## Now, to the Rule-Book

    What exactly does the standard say we are allowed and not allowed to do? The standard language is not straight forward, so for each item I will try to provide code examples that demonstrates the meaning.
    What exactly does the standard say we are allowed and not allowed to do? The standard language is not straightforward, so for each item I will try to provide code examples that demonstrates the meaning.

    ### What does the C11 draft standard say?
    ### What does the C11 standard say?

    The **C11** draft standard<sup id="a2">[2](#f2)</sup> says the following in section *6.5 Expressions paragraph 7*:
    The **C11** standard<sup id="a2">[2](#f2)</sup> says the following in section *6.5 Expressions paragraph 7*:

    >An object shall have its stored value accessed only by an lvalue expression<sup id="a5">[5](#f5)</sup> that has one of the following types:<sup>88)</sup>
    > — a type compatible with the effective type of the object,
    @@ -225,7 +225,7 @@ int foo( std::byte &b, uint32_t &ui ) {
    }
    ```
    Worth noting *signed char* is not included in the list above, this is a notable difference from *C* which says *a cahracter type*.
    Worth noting *signed char* is not included in the list above, this is a notable difference from *C* which says *a character type*.
    ## Subtle Differences
    @@ -307,7 +307,7 @@ void func1( double d ) {
    //...
    ```
    At a sufficient optimization level this should generate identical code to the previously mentioned **reinterpret_cast** method or *union* method for *type punning*. Examining the generated code we see it uses just register mov ([live Compiler Explorer Example](https://godbolt.org/g/BfZGwX)).
    At a sufficient optimization level any decent modern compiler generates identical code to the previously mentioned **reinterpret_cast** method or *union* method for *type punning*. Examining the generated code we see it uses just register mov ([live Compiler Explorer Example](https://godbolt.org/g/BfZGwX)).
    ### Type Punning Arrays
    @@ -517,9 +517,9 @@ We have standard conformant methods for type punning and in release and sometime

    <b id="f1">1</b> Undefined behavior described on cppreference http://en.cppreference.com/w/cpp/language/ub [](#a1)
    <br>
    <b id="f2">2</b> Draft C11 standard http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf [](#a2)
    <b id="f2">2</b> Draft C11 standard is freely available http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf [](#a2)
    <br>
    <b id="f3">3</b> Draft C++17 standard https://github.com/cplusplus/draft/raw/master/papers/n4659.pdf [](#a3)
    <b id="f3">3</b> Draft C++17 standard is freely available https://github.com/cplusplus/draft/raw/master/papers/n4659.pdf [](#a3)
    <br>
    <b id="f4">4</b> Latest C++ draft standard can be found here: http://eel.is/c++draft/ [](#a4)
    <br>
  10. @shafik shafik revised this gist Mar 23, 2018. 1 changed file with 25 additions and 4 deletions.
    29 changes: 25 additions & 4 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -467,9 +467,27 @@ printf( "%d\n", *u ); // Access to range [6-9]
    The last tool I will recommend is C++ specific and not strictly a tool but a coding practice, don't allow C-style casts. Both gcc and clang will produce a diagnostic for C-style casts using **-Wold-style-cast**. This will force any undefined type puns to use reinterpret_cast, in general reinterpret_cast should be a flag for closer code review. It is also easiser to search your code base for reinterpret_cast to perform an audit.
    For C we have all the tools already covered and we also another good tool available tis-interpreter<sup id="a24">[24](#f24)</sup>, a static analyzer that exhaustively analyzes a program for a large subset of the C language. Given a C verions of the earlier example where gcc misses one case, tis-interpeter is able to catch all three ([see it live](https://wandbox.org/permlink/ebLBJ17Pg7TsnIgY)):
    For C we have all the tools already covered and we also have tis-interpreter<sup id="a24">[24](#f24)</sup>, a static analyzer that exhaustively analyzes a program for a large subset of the C language. Given a C verions of the earlier example where using **-fstrict-aliasing** misses one case ([see it live](https://wandbox.org/permlink/ebLBJ17Pg7TsnIgY))
    ```c
    int a = 1;
    short j;
    float f = 1.0 ;
    printf("%i\n", j = *((short*)&a));
    printf("%i\n", j = *((int*)&f));
    int *p;
    p=&a;
    printf("%i\n", j = *((short*)p));
    ```

    tis-interpeter is able to catch all three, the following example invokes tis-kernal as tis-interpreter (output is edited for brevity):

    ```
    ./bin/tis-kernel -sa example1.c
    ...
    example1.c:9:[sa] warning: The pointer (short *)(& a) has type short *. It violates strict aliasing
    rules by accessing a cell with effective type int.
    ...
    @@ -484,15 +502,15 @@ example1.c:15:[sa] warning: The pointer (short *)p has type short *. It violates
    ```

    Finally looking at tool that are in development and we may have in the near future [TBAA sanitizer](https://www.youtube.com/watch?v=vAXJeN7k32Y) looks promosing but currently is expensive resources-wise.
    Finally there is [TySan](https://www.youtube.com/watch?v=vAXJeN7k32Y)<sup id="a25">[25](#f25)</sup> which is currently in development. This sanitizer adds type checking information in a shadow memory segment and checks accesses to see if they violate aliasing rules. The tool potentially should be able to catch all aliasing violations but may have a large run-time overhead.

    ## Conclusion

    We have learned about aliasing rules in both C and C++, what it means that the compiler expects that we follow these rules strictly and the consequences of not doing so. We learned about some tools that will help us catch some misuses of aliasing. We have seen a common use for type aliasing is type punning and how to type pun correctly.

    Optimizers are slowly getting better at type based aliasing analysis and already break some code that relies on strict aliasing violations. We can expect the optimizations will only get better and will break more code we have been used to just working.

    We have standard conformant methods for type punning and in release and sometimes debug builds these methods should be cost free abstractions. We have some tools for catching strict aliasing violations but they will only catch a small fraction of the cases.
    We have standard conformant methods for type punning and in release and sometimes debug builds these methods should be cost free abstractions. We have some tools for catching strict aliasing violations but For C++ they will only catch a small fraction of the cases and for C with tis-interpreter we should be able to catch most violations. Potentially


    #### Footnotes
    @@ -543,5 +561,8 @@ We have standard conformant methods for type punning and in release and sometime
    <b id="f22">22</b> ASan documentation https://clang.llvm.org/docs/AddressSanitizer.html [](#a22)
    <br>
    <b id="f23">23</b> The unaligned access example take from the Address Sanitizer Algorithm wiki https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#unaligned-accesses [](#a23)
    <br>
    <b id="f24">24</b> TrustInSoft tis-interpreter https://trust-in-soft.com/tis-interpreter/ its implementation is discussed in
    Detecting Strict Aliasing Violations in the Wild https://trust-in-soft.com/wp-content/uploads/2017/01/vmcai.pdf and the strict aliasing checks can be run by building tis-kernel https://github.com/TrustInSoft/tis-kernel without any disable flags [↩](#a24)
    Detecting Strict Aliasing Violations in the Wild https://trust-in-soft.com/wp-content/uploads/2017/01/vmcai.pdf and the strict aliasing checks can be run by building tis-kernel https://github.com/TrustInSoft/tis-kernel without any disable flags [](#a24)
    <br>
    <b id="f25">25</b> TySan patches, clang: https://reviews.llvm.org/D32199 runtime: https://reviews.llvm.org/D32197 llvm: https://reviews.llvm.org/D32198 [](#a25)
  11. @shafik shafik revised this gist Mar 20, 2018. 1 changed file with 30 additions and 9 deletions.
    39 changes: 30 additions & 9 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -225,6 +225,8 @@ int foo( std::byte &b, uint32_t &ui ) {
    }
    ```
    Worth noting *signed char* is not included in the list above, this is a notable difference from *C* which says *a cahracter type*.
    ## Subtle Differences
    So although we can see that C and C++ say similar things about aliasing there are some differences that we should be aware of. C++ does not have C's concept of [effective type](http://en.cppreference.com/w/c/language/object#Effective_type) or [compatible type](http://en.cppreference.com/w/c/language/type#Compatible_types) and C does not have C++'s concept of [dynamic type](http://en.cppreference.com/w/cpp/language/type#Dynamic_type) or *similar type*. Although both have *lvalue* and *rvalue* expressions<sup id="a5">[5](#f5)</sup>, C++ also has *glvalue*, *prvalue* and *xvalue*<sup id="a9">[9](#f9)</sup> expressions. These differences are mostly out of scope for this article but one interesting example is how to create an object out of malloc'd memory. In C we can set the *effective type*<sup id="a10">[10](#f10)</sup> for example by writing to the memory through an *lvalue* or **memcpy**<sup id="a11">[11](#f11)</sup>.
    @@ -429,26 +431,27 @@ Another unexpected penalty to unaligned accesses is that it breaks atomics on so

    ## Catching Strict Aliasing Violations

    We don't have a lot of good tools for catching strict aliasing, the tools we have will catch some cases of strict aliasing violations and some cases of misaligned loads and stores.
    We don't have a lot of good tools for catching strict aliasing violations in C++; the tools we have will catch some cases of strict aliasing violations and some cases of misaligned loads and stores.

    gcc using the flag **-fstrict-aliasing** and **-Wstrict-aliasing**<sup id="a19">[19](#f19)</sup> can catch some cases although not without false positives/negatives. For example the following cases<sup id="a21">[21](#f21)</sup> will generate a warning in gcc ([see it live](https://wandbox.org/permlink/ERccUsWgS9hDpVqM)):
    gcc using the flag **-fstrict-aliasing** and **-Wstrict-aliasing**<sup id="a19">[19](#f19)</sup> can catch some cases although not without false positives/negatives. For example the following cases<sup id="a21">[21](#f21)</sup> will generate a warning in gcc ([see it live](https://wandbox.org/permlink/cfckjTgwNTYHDIry)):

    ```cpp
    int a = 1;
    short j;
    float f ;
    float f = 1.f; // Originally not initialized but tis-kernel caught
    // it was being accessed w/ an indeterminate value below

    printf("%i\n", j = *((short*)&a));
    printf("%i\n", j = *((int*)&f));
    printf("%i\n", j = *(reinterpret_cast<short*>(&a)));
    printf("%i\n", j = *(reinterpret_cast<int*>(&f)));
    ```
    although it will not catch this additional case [see it live](https://wandbox.org/permlink/r0uCFcFoJjtZVAyZ):
    although it will not catch this additional case ([see it live](https://wandbox.org/permlink/dwd9jhy53AF7a2D0)):
    ```cpp
    int *p;
    p=&a;
    printf("%i\n", j = *((short*)p));
    printf("%i\n", j = *(reinterpret_cast<short*>(p)));
    ```
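The reads shown in these snippets can be expressed without an aliasing violation by copying bytes instead of dereferencing a cast pointer. This is a minimal sketch, assuming we only want the first `sizeof(short)` bytes of the *int*'s object representation (which bytes those are depends on endianness). The name `leading_bytes_as_short` is hypothetical.

```cpp
#include <cstring>

// Hypothetical helper: reads the first sizeof(short) bytes of an int's
// object representation without aliasing it through a short*.
short leading_bytes_as_short(const int &a) {
    static_assert(sizeof(short) <= sizeof(int), "short must fit inside int");
    short j;
    std::memcpy(&j, &a, sizeof(j));
    return j;
}
```

Since no *short* lvalue is ever used to access the *int*, there is nothing here for **-Wstrict-aliasing** to diagnose.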

    Although clang allows these flags it apparently does not actually implement the warnings<sup id="a20">[20](#f20)</sup>.
    @@ -464,8 +467,24 @@ printf( "%d\n", *u ); // Access to range [6-9]
    The last tool I will recommend is C++ specific and not strictly a tool but a coding practice: don't allow C-style casts. Both gcc and clang will produce a diagnostic for C-style casts using **-Wold-style-cast**. This will force any undefined type puns to use reinterpret_cast; in general reinterpret_cast should be a flag for closer code review. It is also easier to search your code base for reinterpret_cast to perform an audit.
    For C we have all the tools already covered and we also have another good tool available, tis-interpreter<sup id="a24">[24](#f24)</sup>, a static analyzer that exhaustively analyzes a program for a large subset of the C language. Given a C version of the earlier example where gcc misses one case, tis-interpreter is able to catch all three ([see it live](https://wandbox.org/permlink/ebLBJ17Pg7TsnIgY)):
    ```
    example1.c:9:[sa] warning: The pointer (short *)(& a) has type short *. It violates strict aliasing
    rules by accessing a cell with effective type int.
    ...

    example1.c:10:[sa] warning: The pointer (int *)(& f) has type int *. It violates strict aliasing rules by
    accessing a cell with effective type float.
    Callstack: main
    ...

    example1.c:15:[sa] warning: The pointer (short *)p has type short *. It violates strict aliasing rules by
    accessing a cell with effective type int.

    ```
    In the future we may get a [TBAA sanitizer](https://www.youtube.com/watch?v=vAXJeN7k32Y) but this is still a work in progress.
    Finally, looking at tools that are in development and that we may have in the near future, the [TBAA sanitizer](https://www.youtube.com/watch?v=vAXJeN7k32Y) looks promising but is currently expensive resource-wise.
    ## Conclusion
    @@ -523,4 +542,6 @@ We have standard conformant methods for type punning and in release and sometime
    <br>
    <b id="f22">22</b> ASan documentation https://clang.llvm.org/docs/AddressSanitizer.html [↩](#a22)
    <br>
    <b id="f23">23</b> The unaligned access example take from the Address Sanitizer Algorithm wiki https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#unaligned-accesses [↩](#a23)
    <b id="f23">23</b> The unaligned access example is taken from the Address Sanitizer Algorithm wiki https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#unaligned-accesses [↩](#a23)
    <b id="f24">24</b> TrustInSoft tis-interpreter https://trust-in-soft.com/tis-interpreter/ its implementation is discussed in
    Detecting Strict Aliasing Violations in the Wild https://trust-in-soft.com/wp-content/uploads/2017/01/vmcai.pdf and the strict aliasing checks can be run by building tis-kernel https://github.com/TrustInSoft/tis-kernel without any disable flags [↩](#a24)
  12. @shafik shafik revised this gist Mar 18, 2018. 1 changed file with 5 additions and 1 deletion.
    6 changes: 5 additions & 1 deletion WhatIsStrictAliasingAndWhyDoWeCare.md
    Original file line number Diff line number Diff line change
    @@ -311,6 +311,8 @@ At a sufficient optimization level this should generate identical code to the pr
    But, what if we want to type pun an array of *unsigned char* into a series of *unsigned int* and then perform an operation on each *unsigned int* value? We can use **memcpy** to pun the *unsigned char array* into a temporary of type *unsigned int*. The optimizer will still manage to see through the **memcpy** and optimize away both the temporary and the copy and operate directly on the underlying data, [Live Compiler Explorer Example](https://godbolt.org/g/acjqjD):
    ```cpp
    // Simple operation just return the value back
    int foo( unsigned int x ) { return x ; }
    @@ -323,13 +325,15 @@ int bar( unsigned char *p, size_t len ) {
    unsigned int ui = 0;
    std::memcpy( &ui, &p[index], sizeof(unsigned int) );
    result += foo( ui ;
    result += foo( ui ) ;
    }
    return result;
    }
    ```

    In the example, we take an *unsigned char\** **p**, assume it points to multiple chunks of **sizeof(unsigned int)** data, type pun each chunk as an *unsigned int*, compute **foo()** on each chunk of type-punned data, sum it into **result** and return the final value.
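Since the diff only shows part of the function, here is a self-contained sketch of the same idea. The loop bounds are an assumption (the hunk above does not show them), and the return types are switched to *unsigned int* so that the sum cannot overflow a signed type. Everything else follows the snippet above.

```cpp
#include <cstddef>
#include <cstring>

// Simple operation: just return the value back.
unsigned int foo(unsigned int x) { return x; }

// Sums every complete sizeof(unsigned int) chunk of the buffer.
// A trailing partial chunk, if any, is ignored (an assumption of this sketch).
unsigned int bar(const unsigned char *p, std::size_t len) {
    unsigned int result = 0;
    for (std::size_t index = 0; index + sizeof(unsigned int) <= len;
         index += sizeof(unsigned int)) {
        unsigned int ui = 0;
        // memcpy is the well-defined way to read the bytes as an unsigned int.
        std::memcpy(&ui, &p[index], sizeof(unsigned int));
        result += foo(ui);
    }
    return result;
}
```

Because the bytes are copied rather than accessed through an *unsigned int\**, the buffer does not need to be aligned for *unsigned int* either.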

    The assembly for the body of the loop shows the optimizer reduces the body into a direct access of the underlying *unsigned char array* as an *unsigned int*, adding it directly into **pop**:

    ```Assembly
  13. @shafik shafik revised this gist Mar 17, 2018. 1 changed file with 3 additions and 0 deletions.
    3 changes: 3 additions & 0 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    Original file line number Diff line number Diff line change
    @@ -460,6 +460,9 @@ printf( "%d\n", *u ); // Access to range [6-9]
    The last tool I will recommend is C++ specific and not strictly a tool but a coding practice: don't allow C-style casts. Both gcc and clang will produce a diagnostic for C-style casts using **-Wold-style-cast**. This will force any undefined type puns to use reinterpret_cast; in general reinterpret_cast should be a flag for closer code review. It is also easier to search your code base for reinterpret_cast to perform an audit.
    In the future we may get a [TBAA sanitizer](https://www.youtube.com/watch?v=vAXJeN7k32Y) but this is still a work in progress.
    ## Conclusion
    We have learned about aliasing rules in both C and C++, what it means that the compiler expects that we follow these rules strictly and the consequences of not doing so. We learned about some tools that will help us catch some misuses of aliasing. We have seen a common use for type aliasing is type punning and how to type pun correctly.
  14. @shafik shafik revised this gist Mar 16, 2018. 1 changed file with 7 additions and 5 deletions.
    12 changes: 7 additions & 5 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    Original file line number Diff line number Diff line change
    @@ -253,9 +253,11 @@ This would be hard to change since for *C++* it would be an ABI break. This woul
    ## What is Type Punning
    Sometimes we want to circumvent the type system and interpret an object as a different type. This is called type punning, to reinterpret a segment of memory as another type. Type punning is useful to tasks that want access to the underlying representation of an object to view, transport or manipulate. Typical areas we find type punning being used are compilers, serialization, networking code etc…
    At this point we may be wondering why we would want to alias at all. The answer typically is to *type pun*, and the methods used often violate strict aliasing rules.
    Traditionally this has been accomplished by taking the address of the object, casting it to a pointer of the type we want to reinterpret it as and then accessing the value.
    Sometimes we want to circumvent the type system and interpret an object as a different type. This is called *type punning*, to reinterpret a segment of memory as another type. *Type punning* is useful for tasks that want access to the underlying representation of an object to view, transport or manipulate. Typical areas we find type punning being used are compilers, serialization, networking code etc…
    Traditionally this has been accomplished by taking the address of the object, casting it to a pointer of the type we want to reinterpret it as and then accessing the value, or in other words by aliasing. For example:
    ```cpp
    int x = 1 ;
    @@ -269,7 +271,7 @@ float *fp = reinterpret_cast<float*>(&x) ; // Not a valid aliasing
    printf( "%f\n", *fp ) ;
    ```

    As we have seen earlier this is not a valid aliasing, so we are invoking undefined behavior. But traditionally compiler did not take advantage of strict aliasing rules and this type of code usually just worked, developers have unfortunately gotten used to doing things this way. A common alternate method for type punning is through unions, which is valid in C but undefined behavior in C++ ([see live example](https://wandbox.org/permlink/oOf9bPlcWDYrYqPF)):
    As we have seen earlier this is not a valid aliasing, so we are invoking undefined behavior. But traditionally compilers did not take advantage of strict aliasing rules and this type of code usually just worked, so developers have unfortunately gotten used to doing things this way. A common alternate method for type punning is through unions, which is valid in C but *undefined behavior* in C++ ([see live example](https://wandbox.org/permlink/oOf9bPlcWDYrYqPF)):

    ```c
    union u1
    @@ -288,7 +290,7 @@ This is not valid in C++ and some consider the purpose of unions to be solely fo
    ### How do we Type Pun correctly?
    The standard blessed method for *type punning* in both C and C++ is **memcpy**. This may seem a little heavy handed but the optimizer should recognize the use of **memcpy** for type punning and optimize it away and generate register to register moves. For example if we know *int64_t* is the same size as *double*:
    The standard blessed method for *type punning* in both C and C++ is **memcpy**. This may seem a little heavy handed but the optimizer should recognize the use of **memcpy** for *type punning* and optimize it away and generate a register to register move. For example if we know *int64_t* is the same size as *double*:
    ```cpp
    static_assert( sizeof( double ) == sizeof( int64_t ) ); // C++17 does not require a message
    @@ -303,7 +305,7 @@ void func1( double d ) {
    //...
    ```
    At a sufficient optimization level this should generate identical code to the previously mentioned **reinterpret_cast** method and *union* method for *type punning*. Examining the generated code we see it uses just register moves ([live Compiler Explorer Example](https://godbolt.org/g/BfZGwX)).
    At a sufficient optimization level this should generate identical code to the previously mentioned **reinterpret_cast** method or *union* method for *type punning*. Examining the generated code we see it uses just register mov ([live Compiler Explorer Example](https://godbolt.org/g/BfZGwX)).
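As a self-contained sketch of the **memcpy** approach (the hunk above elides the function body), the following helper returns the bits of a *double* as an *int64_t*. The name `double_bits` and the choice to return the value are assumptions made for illustration, not the article's `func1`.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::int64_t),
              "double and int64_t must have the same size");

// Hypothetical helper: returns the object representation of d
// as a 64-bit integer without violating strict aliasing.
std::int64_t double_bits(double d) {
    std::int64_t n;
    std::memcpy(&n, &d, sizeof(d));
    return n;
}
```

The **static_assert** turns the size assumption into a compile-time check, so the copy can never read or write past either object.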
    ### Type Punning Arrays
  15. @shafik shafik revised this gist Mar 16, 2018. 1 changed file with 39 additions and 27 deletions.
    66 changes: 39 additions & 27 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    Original file line number Diff line number Diff line change
    @@ -251,24 +251,50 @@ Theoretically neither *int8_t* nor *uint8_t* have to be *char* types but practic
    This would be hard to change since for *C++* it would be an ABI break. This would change name mangling and would break any API using either of those types in their interface.
    ## How do we Type Pun correctly?
    ## What is Type Punning
    Sometimes we want to treat a piece of memory like it is bag of bits, circumvent the type system and interpret it as a different type. This is called *type punning*, to reinterpret a segment of memory as another type. The standard blessed method for *type punning* in both C and C++ is **memcpy**. This may seem a little heavy handed but the optimizer should recognize the use of **memcpy** for type punning and optimize it away to generate register to register moves. For example if we know *int64_t* is the same size as *double*:
    Sometimes we want to circumvent the type system and interpret an object as a different type. This is called type punning, to reinterpret a segment of memory as another type. Type punning is useful to tasks that want access to the underlying representation of an object to view, transport or manipulate. Typical areas we find type punning being used are compilers, serialization, networking code etc…
    Traditionally this has been accomplished by taking the address of the object, casting it to a pointer of the type we want to reinterpret it as and then accessing the value.
    ```cpp
    static_assert( sizeof( double ) == sizeof( int64_t ) ); // C++17 does not require a message
    int x = 1 ;
    // In C
    float *fp = (float*)&x ; // Not a valid aliasing
    // In C++
    float *fp = reinterpret_cast<float*>(&x) ; // Not a valid aliasing
    printf( "%f\n", *fp ) ;
    ```

    and we want to obtain the integer representation of a *double*. We could reinterpret the bits using **reinterpret_cast**, which violates strict aliasing rules:
    As we have seen earlier this is not a valid aliasing, so we are invoking undefined behavior. But traditionally compiler did not take advantage of strict aliasing rules and this type of code usually just worked, developers have unfortunately gotten used to doing things this way. A common alternate method for type punning is through unions, which is valid in C but undefined behavior in C++ ([see live example](https://wandbox.org/permlink/oOf9bPlcWDYrYqPF)):

    ```c
    union u1
    {
    int n;
    float f;
    } ;

    union u1 u;
    u.f = 1.0f;

    printf( "%d\n", u.n ); // UB in C++ n is not the active member
    ```
    This is not valid in C++ and some consider the purpose of unions to be solely for implementing variant types and feel using unions for type punning is an abuse.
    ### How do we Type Pun correctly?
    The standard blessed method for *type punning* in both C and C++ is **memcpy**. This may seem a little heavy handed but the optimizer should recognize the use of **memcpy** for type punning and optimize it away and generate register to register moves. For example if we know *int64_t* is the same size as *double*:
    ```cpp
    void func1( double d ) {
    std::int64_t n;
    n = *reinterpret_cast<std::int64_t *>(&d); // UB, int64_t is not allowed to alias double
    // ...
    static_assert( sizeof( double ) == sizeof( int64_t ) ); // C++17 does not require a message
    ```

    or we could use **memcpy**:
    we can use **memcpy**:

    ```cpp
    void func1( double d ) {
    @@ -277,22 +303,9 @@ void func1( double d ) {
    //...
    ```
    or we could use the old type punning trick via a union<sup id="a13">[13](#f13)</sup>(undefined behavior in C++):

    ```cpp
    union u1
    {
    std::int64_t n;
    double d;
    } ;

    u1 u;
    u.d = d;
    At a sufficient optimization level this should generate identical code to the previously mentioned **reinterpret_cast** method and *union* method for *type punning*. Examining the generated code we see it uses just register moves ([live Compiler Explorer Example](https://godbolt.org/g/BfZGwX)).
    printf( "%" PRId64 "\n", u.n ); // UB in C++ n is not the active member
    ```
    At a sufficient optimization level all three cases should generate identical code using just register moves ([live Compiler Explorer Example](https://godbolt.org/g/BfZGwX)).
    ### Type Punning Arrays
    But, what if we want to type pun an array of *unsigned char* into a series of *unsigned int* and then perform an operation on each *unsigned int* value? We can use **memcpy** to pun the *unsigned char array* into a temporary of type *unsigned int*. The optimizer will still manage to see through the **memcpy** and optimize away both the temporary and the copy and operate directly on the underlying data, [Live Compiler Explorer Example](https://godbolt.org/g/acjqjD):
    @@ -447,14 +460,13 @@ The last tool I will recommend is C++ specific and not strictly a tool but a cod
    ## Conclusion
    Type punning is a tool for treating a type like a bag of bits, which can be useful or even essential in many low level tasks. Traditionally compilers did not take advantage of optimization opportunities around strict aliasing violations and so software developers became used to using these methods to perform type punning and the vast majority of type punning code we will find online will violate the strict aliasing rule.
    We have learned about aliasing rules in both C and C++, what it means that the compiler expects that we follow these rules strictly and the consequences of not doing so. We learned about some tools that will help us catch some misuses of aliasing. We have seen a common use for type aliasing is type punning and how to type pun correctly.
    Optimizers are slowly getting better at type based aliasing analysis and already break some code that relies on strict aliasing violations. We can expect the optimizations will only get better and will break more code we have been used to just working.
    Optimizers are slowly getting better at type based aliasing analysis and already break some code that relies on strict aliasing violations. We can expect the optimizations will only get better and will break more code we have been used to just working.
    We have standard conformant methods for type punning and in release and sometimes debug builds these methods should be cost free abstractions. We have some tools for catching strict aliasing violations but they will only catch a small fraction of the cases.
    #### Footnotes
    <b id="f1">1</b> Undefined behavior described on cppreference http://en.cppreference.com/w/cpp/language/ub [↩](#a1)
  16. @shafik shafik revised this gist Mar 15, 2018. 1 changed file with 16 additions and 11 deletions.
    27 changes: 16 additions & 11 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    Original file line number Diff line number Diff line change
    @@ -340,7 +340,7 @@ int bar( unsigned char *p, size_t len ) {
    ## C++20 and bit_cast
    In C++20 we may gain **bit_cast<>**<sup id="a14">[14](#f14)</sup> which gives a simple and safe way to type-pun as well as being usable in constexpr context. It requires us to use an intermediate struct in the case where *To* and *From* types don't have the same size<sup id="a15">[15](#f15)</sup>. We will use a struct containing a four characater array(*assumes 4 byte unsigned int*) to be the *From* type and *unsigned int* as the *To* type.:
    In C++20 we may gain **bit_cast<>**<sup id="a14">[14](#f14)</sup> which gives a simple and safe way to type-pun as well as being usable in a constexpr context. It requires us to use an intermediate struct in the case where *To* and *From* types don't have the same size<sup id="a15">[15](#f15)</sup>. We will use a struct containing a **sizeof( unsigned int )** character array (*assumes 4 byte unsigned int*) to be the *From* type and *unsigned int* as the *To* type:
    ```cpp
    struct uint_chars {
    @@ -367,7 +367,7 @@ It is unfortunate that we need this intermediate type but that is the current co

    ## Alignment

    We have seen in previous examples violating strict aliasing rules can lead to stores being optimized away. Violating strict aliasing rules can also lead to violations of alignment requirement. Both the C and C++ standard state that objects have *alignment requirements* which restrict where in memory objects can be allocated and therefore accessed<sup id="a17">[17](#f17)</sup>. C11 section *6.2.8 Alignment of objects* says:
    We have seen in previous examples that violating strict aliasing rules can lead to stores being optimized away. Violating strict aliasing rules can also lead to violations of alignment requirements. Both the C and C++ standards state that objects have *alignment requirements* which restrict where objects can be allocated (*in memory*) and therefore accessed<sup id="a17">[17](#f17)</sup>. C11 section *6.2.8 Alignment of objects* says:

    >Complete object types have alignment requirements which place restrictions on the addresses at which objects of that type may be allocated. An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated. An object type imposes an alignment requirement on every object of that type: stricter alignment can be requested using the _Alignas keyword.
    @@ -385,14 +385,19 @@ Although C++ is not as explict I believe this sentence from *[basic.align] parag
    ### An Example

    So let's assume that **alignof(char)** and **alignof(int)** are 1 and 4 respectively and sizeof(int) is 4 then type punning an array of char of size 4:
    So let's assume:

    - **alignof(char)** and **alignof(int)** are 1 and 4 respectively
    - sizeof(int) is 4

    Then type punning an array of char of size 4 as an *int* violates strict aliasing but may also violate alignment requirements if the array has an alignment of 1 or 2 bytes.

    ```cpp
    char arr[4] = { 0x0F, 0x0, 0x0, 0x00 }; // Could be allocated on a 1 or 2 byte boundary
    int x = *reinterpret_cast<int*>(arr);
    int x = *reinterpret_cast<int*>(arr); // Undefined behavior we have an unaligned pointer
    ```
    as an int violates strict aliasing but may also violate alignment requirements if **arr** has an alignment of 1 or 2 bytes. Which could lead to reduced performance or a bus error<sup id="a18">[18](#f18)</sup> in some situations. Whereas using **alignas** to force the array to the same alignment of *int* would prevent violating alignment requirements:
    This could lead to reduced performance or a bus error<sup id="a18">[18](#f18)</sup> in some situations, whereas using **alignas** to force the array to the same alignment as *int* would prevent violating alignment requirements:
    ```cpp
    alignas(alignof(int)) char arr[4] = { 0x0F, 0x0, 0x0, 0x00 };
    @@ -401,13 +406,13 @@ int x = *reinterpret_cast<int*>(arr);

    ### Atomics

    Another unexpected penalty to unaligned accesses is that is breaks atomics on some architectures. Atomics stores may not appear atomic to other threads on x86 if they are misaligned<sup id="a7">[7](#f7)</sup>.
    Another unexpected penalty to unaligned accesses is that it breaks atomics on some architectures. Atomic stores may not appear atomic to other threads on x86 if they are misaligned<sup id="a7">[7](#f7)</sup>.

    ## Catching Strict Aliasing Violations

    We don't have a lot of good tools for catching strict aliasing, the tools we have will catch some cases of strict aliasing violations and some cases of misaligned loads and stores.

    gcc using the flag **-fstrict-aliasing** and **-Wstrict-aliasing**<sup id="a19">[19](#f19)</sup> can catch some cases although not without false positives/negatives. For example the following cases<sup id="a21">[21](#f21)</sup> will generate a warning in gcc [see it live](https://wandbox.org/permlink/ERccUsWgS9hDpVqM):
    gcc using the flag **-fstrict-aliasing** and **-Wstrict-aliasing**<sup id="a19">[19](#f19)</sup> can catch some cases although not without false positives/negatives. For example the following cases<sup id="a21">[21](#f21)</sup> will generate a warning in gcc ([see it live](https://wandbox.org/permlink/ERccUsWgS9hDpVqM)):

    ```cpp
    int a = 1;
    @@ -427,9 +432,9 @@ p=&a;
    printf("%i\n", j = *((short*)p));
    ```

    clang although it allows these flags apparently does not actually implement the warnings<sup id="a20">[20](#f20)</sup>.
    Although clang allows these flags it apparently does not actually implement the warnings<sup id="a20">[20](#f20)</sup>.

    Another tool we have available to us is dynamic analysis using ASan<sup id="a22">[22](#f22)</sup> we can catch misaligned loads and stores. Although these are not directly strict aliasing violations they are a common result of strict aliasing violations. For example the following cases<sup id="a23">[23](#f23)</sup> will generate runtime errors when built with clang using **-fsanitize=address**
    Another tool we have available to us is ASan<sup id="a22">[22](#f22)</sup> which can catch misaligned loads and stores. Although these are not directly strict aliasing violations they are a common result of strict aliasing violations. For example the following cases<sup id="a23">[23](#f23)</sup> will generate runtime errors when built with clang using **-fsanitize=address**

    ```cpp
    int *x = new int[2]; // 8 bytes: [0,7].
    @@ -442,9 +447,9 @@ The last tool I will recommend is C++ specific and not strictly a tool but a cod
    ## Conclusion
    Type punning is a tool for treating a type like a bag of bits, which can be useful or even essential in many low level tasks. Traditionally compilers did not take advantage of optimizations opportunities around strict aliasing violations and so software developers became used to using these methods to perform type punning and the vast majority of type punning code we will find online will violate the strict aliasing rule.
    Type punning is a tool for treating a type like a bag of bits, which can be useful or even essential in many low level tasks. Traditionally compilers did not take advantage of optimization opportunities around strict aliasing violations and so software developers became used to using these methods to perform type punning and the vast majority of type punning code we will find online will violate the strict aliasing rule.
    Optimizers are slowly getting better at type based aliasing analysis and already break some code that relies on strict aliasing violations and we can expect the optimizations will only get better and will break more and more code we have been used to just working.
    Optimizers are slowly getting better at type based aliasing analysis and already break some code that relies on strict aliasing violations. We can expect the optimizations will only get better and will break more code we have been used to just working.
    We have standard conformant methods for type punning and in release and sometimes debug builds these methods should be cost free abstractions. We have some tools for catching strict aliasing violations but they will only catch a small fraction of the cases.
  17. @shafik shafik revised this gist Mar 15, 2018. 1 changed file with 5 additions and 5 deletions.
    10 changes: 5 additions & 5 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    Original file line number Diff line number Diff line change
    @@ -227,7 +227,7 @@ int foo( std::byte &b, uint32_t &ui ) {
    ## Subtle Differences
    So although we can see that C and C++ say similar things about aliasing there are some differences that we should be aware of. C++ does not have C's concept of [effective type](http://en.cppreference.com/w/c/language/object#Effective_type) or [compatible type](http://en.cppreference.com/w/c/language/type#Compatible_types) and C does not have C++'s concept of [dynamic type](http://en.cppreference.com/w/cpp/language/type#Dynamic_type) or *similar type*. Although both have *lvalue* and *rvalue* expressions<sup id="a5">[5](#f5)</sup>, C++ also has *glvalue*, *prvalue* and *xvalue*<sup id="a9">[9](#f9)</sup> expressions. These differences are mostly out of scope for this article but one interesting example is how to create an object out of malloc'd memory. In C we can set the *effective type*<sup id="a10">[10](#f10)</sup> by for example writing to the memory through an *lvalue* or **memcpy**<sup id="a11">[11](#f11)</sup>.
    So although we can see that C and C++ say similar things about aliasing there are some differences that we should be aware of. C++ does not have C's concept of [effective type](http://en.cppreference.com/w/c/language/object#Effective_type) or [compatible type](http://en.cppreference.com/w/c/language/type#Compatible_types) and C does not have C++'s concept of [dynamic type](http://en.cppreference.com/w/cpp/language/type#Dynamic_type) or *similar type*. Although both have *lvalue* and *rvalue* expressions<sup id="a5">[5](#f5)</sup>, C++ also has *glvalue*, *prvalue* and *xvalue*<sup id="a9">[9](#f9)</sup> expressions. These differences are mostly out of scope for this article but one interesting example is how to create an object out of malloc'd memory. In C we can set the *effective type*<sup id="a10">[10](#f10)</sup> for example by writing to the memory through an *lvalue* or **memcpy**<sup id="a11">[11](#f11)</sup>.
    ```c
    // The following is valid C but not valid C++
    @@ -247,9 +247,9 @@ float *fp = new (p) float{1.0f} ; // Dynamic type of *p is now float
    ## Are int8_t and uint8_t char types?
    Theoretically neither *int8_t* nor *uint8_t* have to be *char* types but practically they are implemented that way. This is important because if they are really *char* types then they also alias like *char* types which is you are unaware of can [lead to suprising performance impacts](https://stackoverflow.com/q/26295216/1708801). We can see that glibc typedefs [int8_t](https://github.com/lattera/glibc/blob/master/sysdeps/generic/stdint.h#L36) and [uint8_t](https://github.com/lattera/glibc/blob/master/sysdeps/generic/stdint.h#L48) to *singed char* and *unsigned char* respectively.
    Theoretically neither *int8_t* nor *uint8_t* have to be *char* types but practically they are implemented that way. This is important because if they are really *char* types then they also alias similarly to *char* types. If you are unaware of this, it can [lead to surprising performance impacts](https://stackoverflow.com/q/26295216/1708801). We can see that glibc typedefs [int8_t](https://github.com/lattera/glibc/blob/master/sysdeps/generic/stdint.h#L36) and [uint8_t](https://github.com/lattera/glibc/blob/master/sysdeps/generic/stdint.h#L48) to *signed char* and *unsigned char* respectively.
    This would be hard to chance since at least for *C++* it would be an ABI break. Since this would change name mangling and would break any API using either of those types in their interface.
    This would be hard to change since for *C++* it would be an ABI break. This would change name mangling and would break any API using either of those types in their interface.
    ## How do we Type Pun correctly?
    @@ -292,9 +292,9 @@ u.d = d;
    printf( "%" PRId64 "\n", u.n ); // UB in C++ n is not the active member
    ```
    At a sufficient optimization level all three cases should generate identical code using just register moves [live Compiler Explorer Example](https://godbolt.org/g/BfZGwX).
    At a sufficient optimization level all three cases should generate identical code using just register moves ([live Compiler Explorer Example](https://godbolt.org/g/BfZGwX)).
    But, what if we want to type punning an array of *unsigned char* into a series of *unsigned int* and then perform an operation on each *unsigned int* value? We can use **memcpy** to pun the *unsigned char array* into a temporary of type *unsinged int* the optimizer will still manage to see through the **memcpy** and optimize away both the temporary and the copy and operate directly on the underlying data, [Live Compiler Explorer Example](https://godbolt.org/g/acjqjD):
    But what if we want to type pun an array of *unsigned char* into a series of *unsigned int* and then perform an operation on each *unsigned int* value? We can use **memcpy** to pun the *unsigned char* array into a temporary of type *unsigned int*. The optimizer will still manage to see through the **memcpy**, optimize away both the temporary and the copy, and operate directly on the underlying data ([Live Compiler Explorer Example](https://godbolt.org/g/acjqjD)):
    ```cpp
    // Simple operation just return the value back
  18. @shafik shafik revised this gist Mar 15, 2018. 1 changed file with 93 additions and 93 deletions.
    186 changes: 93 additions & 93 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -14,32 +14,32 @@ To understand more about why we care, we will discuss issues that come up when v
    Let's look at some examples, then we can talk about exactly what the standard(s) say, examine some further examples and then see how to avoid strict aliasing violations and catch the ones we missed. Here is an example that should not be surprising ([live example](https://wandbox.org/permlink/7sCJTAyrifZ0zfFA)):

    ```cpp
    int x = 10 ;
    int *ip = &x ;
    int x = 10;
    int *ip = &x;

    std::cout << *ip << "\n" ;
    *ip = 12 ;
    std::cout << x << "\n" ;
    std::cout << *ip << "\n";
    *ip = 12;
    std::cout << x << "\n";
    ```

    We have an *int\** pointing to memory occupied by an *int* and this is a valid aliasing. The optimizer must assume that assignments through **ip** could update the value stored in **x**.

    The next example shows an aliasing that leads to undefined behavior ([live example](https://wandbox.org/permlink/8qA8JyJRVHtS9LPf)):
    The next example shows aliasing that leads to undefined behavior ([live example](https://wandbox.org/permlink/8qA8JyJRVHtS9LPf)):

    ```cpp
    int foo( float *f, int *i ) {
    *i = 1 ;
    *f = 0.f ;
    *i = 1;
    *f = 0.f;

    return *i ;
    return *i;
    }

    int main() {
    int x = 0 ;
    int x = 0;

    std::cout << x << "\n" ; // Expect 0
    x = foo(reinterpret_cast<float*>(&x), &x ) ;
    std::cout << x << "\n" ; // Expect 0?
    std::cout << x << "\n"; // Expect 0
    x = foo(reinterpret_cast<float*>(&x), &x);
    std::cout << x << "\n"; // Expect 0?
    }
    ```
    @@ -64,7 +64,7 @@ The optimizer using Type-Based Alias Analysis (TBAA)<sup id="a6">[6](#f6)</sup>

    ## Now, to the Rule-Book

    What exactly does the standard say we are allowed and not allowed to do? The standard language is not straight forward, so for each item I will try to provide a code examples that demonstrates the meaning.
    What exactly does the standard say we are allowed and not allowed to do? The standard language is not straightforward, so for each item I will try to provide code examples that demonstrate the meaning.

    ### What does the C11 draft standard say?

    @@ -74,26 +74,26 @@ The **C11** draft standard<sup id="a2">[2](#f2)</sup> says the following in sect
    > — a type compatible with the effective type of the object,
    ```c
    int x = 1 ;
    int *p = &x ;
    printf("%d\n", *p ) ; // *p gives us an lvalue expression of type int which is compatible with int
    int x = 1;
    int *p = &x;
    printf("%d\n", *p); // *p gives us an lvalue expression of type int which is compatible with int
    ```
    > — a qualified version of a type compatible with the effective type of the object,
    ```c
    int x = 1;
    const int *p = &x ;
    printf("%d\n", *p ) ; // *p gives us an lvalue expression of type const int which is compatible with int
    const int *p = &x;
    printf("%d\n", *p); // *p gives us an lvalue expression of type const int which is compatible with int
    ```

    > — a type that is the signed or unsigned type corresponding to the effective type of the object,
    ```c
    int x = 1;
    unsigned int *p = (unsigned int*)&x ;
    printf("%u\n", *p ) ; // *p gives us an lvalue expression of type unsigned int which corresponds to
    // the effective type of the object
    unsigned int *p = (unsigned int*)&x;
    printf("%u\n", *p ); // *p gives us an lvalue expression of type unsigned int which corresponds to
    // the effective type of the object
    ```
    [See Footnote 12 for gcc/clang extension](#f12), that allows assigning *unsigned int\** to *int\** even though they are not compatible types.
    @@ -102,32 +102,32 @@ printf("%u\n", *p ) ; // *p gives us an lvalue expression of type unsigned int w
    ```c
    int x = 1;
    const unsigned int *p = (const unsigned int*)&x ;
    printf("%u\n", *p ) ; // *p gives us an lvalue expression of type const unsigned int which is a unsigned type
    // that corresponds with to a qualified verison of the effective type of the object
    const unsigned int *p = (const unsigned int*)&x;
    printf("%u\n", *p ); // *p gives us an lvalue expression of type const unsigned int which is an unsigned type
    // that corresponds to a qualified version of the effective type of the object
    ```

    > — an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
    ```c
    struct foo {
    int x ;
    } ;
    int x;
    };

    void foobar( struct foo *fp, int *ip ) ; // struct foo is an aggregate that includes int among its members so it can
    // can alias with *ip
    void foobar( struct foo *fp, int *ip ); // struct foo is an aggregate that includes int among its members so it
    // can alias with *ip

    foo f ;
    foobar( &f, &f.x ) ;
    struct foo f;
    foobar( &f, &f.x );
    ```
    > — a character type.
    ```c
    int x = 65;
    char *p = (char *)&x ;
    printf("%c\n", *p ) ; // *p gives us an lvalue expression of type char which is a character type.
    // The results are not portable due to endianness issues.
    char *p = (char *)&x;
    printf("%c\n", *p ); // *p gives us an lvalue expression of type char which is a character type.
    // The results are not portable due to endianness issues.
    ```
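
The character-type exception is what makes byte-wise inspection of any object legal, and C++ has the same allowance for *char*, *unsigned char* and *std::byte*. As a minimal sketch in C++, assuming a hypothetical helper named `count_nonzero_bytes` (not part of the original text) and an *unsigned int* without padding bits, the following counts the non-zero bytes in an object's representation. Unlike printing the first byte, the count does not depend on endianness:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (illustration only): walks the object representation
// one byte at a time. Reading through unsigned char is allowed to alias an
// object of any type, so this does not violate strict aliasing.
std::size_t count_nonzero_bytes(const void *object, std::size_t size) {
    assert(object != nullptr);
    const unsigned char *bytes = static_cast<const unsigned char *>(object);
    std::size_t count = 0;
    for (std::size_t i = 0; i < size; ++i) {
        if (bytes[i] != 0) {
            ++count;
        }
    }
    return count;
}
```

Calling it as `count_nonzero_bytes(&x, sizeof x)` on an *unsigned int* holding 1 should report a single non-zero byte, wherever that byte happens to sit in memory.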

    ### What does the C++17 draft standard say?
    @@ -138,19 +138,19 @@ The C++17 draft standard<sup id="a3">[3](#f3)</sup> in section *\[basic.lval\]
    > (11.1) — the dynamic type of the object,
    ```cpp
    void *p = malloc( sizeof(int) ) ; // We have allocated storage but not started the lifetime of an object
    int *ip = new (p) int{0} ; // Placement new changes the dynamic type of the object to int
    std::cout << *ip << "\n" ; // *ip gives us a glvalue expression of type int which matches the dynamic type
    void *p = malloc( sizeof(int) ); // We have allocated storage but not started the lifetime of an object
    int *ip = new (p) int{0}; // Placement new changes the dynamic type of the object to int
    std::cout << *ip << "\n"; // *ip gives us a glvalue expression of type int which matches the dynamic type
    // of the allocated object
    ```
    > (11.2) — a cv-qualified version of the dynamic type of the object,
    ```cpp
    int x = 1;
    const int *cip = &x ;
    std::cout << *cip << "\n" ; // *cip gives us a glvalue expression of type const int which is a cv-qualified
    // version of the dynamic type of x
    const int *cip = &x;
    std::cout << *cip << "\n"; // *cip gives us a glvalue expression of type const int which is a cv-qualified
    // version of the dynamic type of x
    ```

    > (11.3) — a type similar (as defined in 7.5) to the dynamic type of the object,
    @@ -167,76 +167,76 @@ std::cout << *cip << "\n" ; // *cip gives us a glvalue expression of type const
    // We can see from this godbolt(https://godbolt.org/g/KowGXB) the optimizer assumes aliasing.
    signed int foo( signed int &si, unsigned int &ui ) {
    si = 1;
    ui = 2 ;
    ui = 2;

    return si ;
    return si;
    }
    ```
    > (11.5) — a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
    ```cpp
    signed int foo( const signed int &si1, int &si2) ; // Hard to show this one assumes aliasing
    signed int foo( const signed int &si1, int &si2); // Hard to show this one assumes aliasing
    ```

    > (11.6) — an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
    ```cpp
    struct foo {
    int x ;
    } ;
    int x;
    };

    // Compiler Explorer example(https://godbolt.org/g/z2wJTC) shows aliasing assumption
    int foobar( foo &fp, int &ip ) {
    fp.x = 1 ;
    ip = 2 ;
    fp.x = 1;
    ip = 2;

    return fp.x ;
    return fp.x;
    }

    foo f ;
    foobar( f, f.x ) ;
    foo f;
    foobar( f, f.x );
    ```
    > (11.7) — a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
    ```cpp
    struct foo { int x ; } ;
    struct foo { int x ; };
    struct bar : public foo {} ;
    struct bar : public foo {};
    int foobar( foo &f, bar &b ) {
    f.x = 1 ;
    b.x = 2 ;
    f.x = 1;
    b.x = 2;
    return f.x ;
    return f.x;
    }
    ```

    > (11.8) — a char, unsigned char, or std::byte type.

    ```cpp
    int foo( std::byte &b, uint32_t &ui ) {
    b = static_cast<std::byte>('a') ;
    ui = 0xFFFFFFFF ;
    b = static_cast<std::byte>('a');
    ui = 0xFFFFFFFF;

    return std::to_integer<int>( b ) ; // b gives us a glvalue expression of type std::byte which can alias
    // an object of type uint32_t
    return std::to_integer<int>( b ); // b gives us a glvalue expression of type std::byte which can alias
    // an object of type uint32_t
    }
    ```
    ## Subtle Differences
    So although we can see that C and C++ say similar things about aliasing there are some differences that we should be aware of. C++ does not have C's concept of [effective type](http://en.cppreference.com/w/c/language/object#Effective_type) nor [compatible type](http://en.cppreference.com/w/c/language/type#Compatible_types) and C does not have C++'s concept of [dynamic type](http://en.cppreference.com/w/cpp/language/type#Dynamic_type) or *similar type*. Although both have *lvalue* and *rvalue* expressions<sup id="a5">[5](#f5)</sup>, C++ also has *glvalue*, *prvalue* and *xvalue*<sup id="a9">[9](#f9)</sup> expressions. These differences are mostly out of scope for this article but one interesting example is how to create an object out of malloc'ed memory. In C we can set the *effective type*<sup id="a10">[10](#f10)</sup> by for example writing to the memory through an *lvalue* or **memcpy**<sup id="a11">[11](#f11)</sup>.
    So although we can see that C and C++ say similar things about aliasing there are some differences that we should be aware of. C++ does not have C's concept of [effective type](http://en.cppreference.com/w/c/language/object#Effective_type) or [compatible type](http://en.cppreference.com/w/c/language/type#Compatible_types) and C does not have C++'s concept of [dynamic type](http://en.cppreference.com/w/cpp/language/type#Dynamic_type) or *similar type*. Although both have *lvalue* and *rvalue* expressions<sup id="a5">[5](#f5)</sup>, C++ also has *glvalue*, *prvalue* and *xvalue*<sup id="a9">[9](#f9)</sup> expressions. These differences are mostly out of scope for this article but one interesting example is how to create an object out of malloc'd memory. In C we can set the *effective type*<sup id="a10">[10](#f10)</sup> by for example writing to the memory through an *lvalue* or **memcpy**<sup id="a11">[11](#f11)</sup>.
    ```c
    // The following is valid C but not valid C++
    void *p = malloc(sizeof(float)) ;
    float f = 1.0f ;
    memcpy( p, &f, sizeof(float)) ; // Effective type of *p is float in C
    // Or
    float *fp = p ;
    *fp = 1.0f ; // Effective type of *p is float in C
    void *p = malloc(sizeof(float));
    float f = 1.0f;
    memcpy( p, &f, sizeof(float)); // Effective type of *p is float in C
    // Or
    float *fp = p;
    *fp = 1.0f; // Effective type of *p is float in C
    ```

    Neither of these methods is sufficient in C++ which requires **placement new**:
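
A minimal sketch of the placement new approach, assuming hypothetical helper names `make_float` and `destroy_float` that are not part of the original text. Storage comes from **malloc**, and placement new is what starts the lifetime of the *float* in it:

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical helper (illustration only): allocates raw storage and starts
// the lifetime of a float in it. malloc returns storage suitably aligned for
// any fundamental type, so alignment is not a concern here.
float *make_float(float value) {
    void *p = std::malloc(sizeof(float));
    if (p == nullptr) {
        return nullptr;
    }
    return new (p) float{value}; // The dynamic type of the object is now float
}

// float is trivially destructible, so releasing the storage is sufficient.
void destroy_float(float *fp) {
    assert(fp != nullptr && "expects a pointer obtained from make_float");
    std::free(fp);
}
```

The pointer returned by placement new is the one to use for later accesses, which is why the sketch returns it rather than casting the original `void *`.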
    @@ -256,15 +256,15 @@ This would be hard to chance since at least for *C++* it would be an ABI break.
    Sometimes we want to treat a piece of memory like it is a bag of bits, circumvent the type system and interpret it as a different type. This is called *type punning*: reinterpreting a segment of memory as another type. The standard-blessed method for *type punning* in both C and C++ is **memcpy**. This may seem a little heavy-handed but the optimizer should recognize the use of **memcpy** for type punning and optimize it away to generate register to register moves. For example if we know *int64_t* is the same size as *double*:
    ```cpp
    static_assert( sizeof( double ) == sizeof( int64_t ) ) ; // C++17 does not require a message
    static_assert( sizeof( double ) == sizeof( int64_t ) ); // C++17 does not require a message
    ```

    and we want to obtain the integer representation of a *double*. We could reinterpret the bits using **reinterpret_cast**, which violates strict aliasing rules:

    ```cpp
    void func1( double d ) {
    std::int64_t n ;
    n = *reinterpret_cast<std::int64_t *>(&d) ; // UB, int64_t is not allowed to alias double
    std::int64_t n;
    n = *reinterpret_cast<std::int64_t *>(&d); // UB, int64_t is not allowed to alias double
    // ...
    ```
    @@ -283,13 +283,13 @@ or we could use the old type punning trick via a union<sup id="a13">[13](#f13)</
    union u1
    {
    std::int64_t n;
    double d ;
    double d;
    } ;

    u1 u ;
    u.d = d ;
    u1 u;
    u.d = d;

    printf( "%" PRId64 "\n", u.n ) ; // UB in C++ n is not the active member
    printf( "%" PRId64 "\n", u.n ); // UB in C++ n is not the active member
    ```
    At a sufficient optimization level all three cases should generate identical code using just register moves [live Compiler Explorer Example](https://godbolt.org/g/BfZGwX).
    @@ -298,20 +298,20 @@ But, what if we want to type punning an array of *unsigned char* into a series o
    ```cpp
    // Simple operation just return the value back
    int foo(unsigned int x ) { return x ;}
    int foo( unsigned int x ) { return x ; }
    // Assume len is a multiple of sizeof(unsigned int)
    int bar( unsigned char *p, size_t len ) {
    int result = 0 ;
    int result = 0;
    for( size_t index = 0; index < len; index += sizeof(unsigned int) ) {
    unsigned int ui = 0;
    std::memcpy( &ui, &p[index], sizeof(unsigned int) ) ;
    std::memcpy( &ui, &p[index], sizeof(unsigned int) );
    result += foo( ui ) ;
    result += foo( ui );
    }
    return result ;
    return result;
    }
    ```

    @@ -326,15 +326,15 @@ Same code but using **reinterpret_cast** to type pun(violates strict aliasing):
    ```cpp
    // Assume len is a multiple of sizeof(unsigned int)
    int bar( unsigned char *p, size_t len ) {
    int result = 0 ;
    int result = 0;

    for( size_t index = 0; index < len; index += sizeof(unsigned int) ) {
    unsigned int ui = *reinterpret_cast<unsigned int*>(&p[index]) ;
    unsigned int ui = *reinterpret_cast<unsigned int*>(&p[index]);

    result += foo( ui ) ;
    result += foo( ui );
    }

    return result ;
    return result;
    }
    ```
    @@ -345,18 +345,18 @@ In C++20 we may gain **bit_cast<>**<sup id="a14">[14](#f14)</sup> which gives a
    ```cpp
    struct uint_chars {
    unsigned char arr[sizeof( unsigned int )] = {} ; // Assume sizeof( unsigned int ) == 4
    } ;
    };
    // Assume len is a multiple of 4
    int bar( unsigned char *p, size_t len ) {
    int result = 0 ;
    int result = 0;
    for( size_t index = 0; index < len; index += sizeof(unsigned int) ) {
    uint_chars f ;
    std::memcpy( f.arr, &p[index], sizeof(unsigned int)) ;
    unsigned int result = bit_cast<unsigned int>(f) ;
    uint_chars f;
    std::memcpy( f.arr, &p[index], sizeof(unsigned int));
    unsigned int ui = bit_cast<unsigned int>(f); // Named ui so it does not shadow the running total
    result += foo( result ) ;
    result += foo( ui );
    }
    return result ;
    @@ -388,15 +388,15 @@ Although C++ is not as explict I believe this sentence from *[basic.align] parag
    So let's assume that **alignof(char)** and **alignof(int)** are 1 and 4 respectively and sizeof(int) is 4 then type punning an array of char of size 4:

    ```cpp
    char arr[4] = { 0x0F, 0x0, 0x0, 0x00 } ; // Could be allocated on a 1 or 2 byte boundry
    int x = *reinterpret_cast<int*>(arr) ;
    char arr[4] = { 0x0F, 0x0, 0x0, 0x00 }; // Could be allocated on a 1 or 2 byte boundary
    int x = *reinterpret_cast<int*>(arr);
    ```
    as an int violates strict aliasing but may also violate alignment requirements if **arr** has an alignment of 1 or 2 bytes, which could lead to reduced performance or a bus error<sup id="a18">[18](#f18)</sup> in some situations. Using **alignas** to force the array to the same alignment as *int* would prevent violating alignment requirements:
    ```cpp
    alignas(alignof(int)) char arr[4] = { 0x0F, 0x0, 0x0, 0x00 } ;
    int x = *reinterpret_cast<int*>(arr) ;
    alignas(alignof(int)) char arr[4] = { 0x0F, 0x0, 0x0, 0x00 };
    int x = *reinterpret_cast<int*>(arr);
    ```

    ### Atomics
    @@ -435,7 +435,7 @@ Another tool we have available to us is dynamic analysis using ASan<sup id="a22"
    int *x = new int[2]; // 8 bytes: [0,7].
    int *u = (int*)((char*)x + 6); // regardless of alignment of x this will not be an aligned address
    *u = 1; // Access to range [6-9]
    printf( "%d\n", *u ) ; // Access to range [6-9]
    printf( "%d\n", *u ); // Access to range [6-9]
    ```
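
The misaligned store and load above can be avoided by copying bytes instead of dereferencing a cast pointer. A minimal sketch, assuming hypothetical helpers `store_int_at` and `load_int_at` (names not from the original) that also take the buffer size so the access can be checked to stay in bounds:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (illustration only): writes an int at any byte offset.
// memcpy has no alignment requirement, so offset 6 is as valid as offset 4.
void store_int_at(unsigned char *buf, std::size_t buf_size,
                  std::size_t offset, int value) {
    assert(offset + sizeof(int) <= buf_size);
    std::memcpy(buf + offset, &value, sizeof(int));
}

// Hypothetical helper (illustration only): reads an int back from any byte
// offset by copying into a properly aligned local.
int load_int_at(const unsigned char *buf, std::size_t buf_size,
                std::size_t offset) {
    assert(offset + sizeof(int) <= buf_size);
    int value;
    std::memcpy(&value, buf + offset, sizeof(int));
    return value;
}
```

As with the other **memcpy** based puns, an optimizing compiler is generally expected to turn these small fixed-size copies into plain loads and stores where the target allows unaligned access.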
    The last tool I will recommend is C++ specific and not strictly a tool but a coding practice: don't allow C-style casts. Both gcc and clang will produce a diagnostic for C-style casts using **-Wold-style-cast**. This will force any undefined type puns to use reinterpret_cast; in general reinterpret_cast should be a flag for closer code review. It is also easier to search your code base for reinterpret_cast to perform an audit.
  19. @shafik shafik revised this gist Mar 14, 2018. 1 changed file with 19 additions and 16 deletions.
    35 changes: 19 additions & 16 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -1,11 +1,15 @@
    # What is Strict Aliasing Rule and Why do we care?
    ## (OR Type Punning, Undefined Behavior and Alignment, Oh My!)


    What is strict aliasing? First we will describe what is aliasing and then we can learn what being strict about it means.
    What is strict aliasing? First we will describe what is aliasing and then we can learn what being strict about it means.

    In C and C++ aliasing has to do with what expression types we are allowed to access stored values through. In both C and C++ the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term *strict aliasing rule*. If we attempt to access a value using a type not allowed it is classified as [undefined behavior](http://en.cppreference.com/w/cpp/language/ub)(**UB**). Once we have undefined behavior all bets are off, the results of our program are no longer reliable.

    Unfortunately with strict aliasing violations, we will often obtain the results we expect, leaving the possibility the a future version of a compiler with a new optimization will break code we thought was valid. This is undesirable and it is a worth while goal to understand the strict aliasing rules and how to avoid violating them.
    Unfortunately with strict aliasing violations, we will often obtain the results we expect, leaving the possibility that a future version of a compiler with a new optimization will break code we thought was valid. This is undesirable and it is a worthwhile goal to understand the strict aliasing rules and how to avoid violating them.

    To understand more about why we care, we will discuss the issues that come up when violating strict aliasing rules, and type punning, since common techniques used in type punning often violate those rules. We will then cover how to type pun correctly, along with some possible help from C++20 to make type punning simpler and less error prone. We will wrap up the discussion by going over some methods for catching strict aliasing violations.

    ### Preliminary examples

    Let's look at some examples, then we can talk about exactly what the standard(s) say, examine some further examples and then see how to avoid strict aliasing violations and catch the ones we missed. Here is an example that should not be surprising ([live example](https://wandbox.org/permlink/7sCJTAyrifZ0zfFA)):

    @@ -252,7 +256,7 @@ This would be hard to chance since at least for *C++* it would be an ABI break.
    Sometimes we want to treat a piece of memory like it is a bag of bits, circumvent the type system and interpret it as a different type. This is called *type punning*: reinterpreting a segment of memory as another type. The standard-blessed method for *type punning* in both C and C++ is **memcpy**. This may seem a little heavy-handed but the optimizer should recognize the use of **memcpy** for type punning and optimize it away to generate register to register moves. For example if we know *int64_t* is the same size as *double*:
    ```cpp
    static_assert( sizeof( double ) == sizeof( int64_t ), "" ) ;
    static_assert( sizeof( double ) == sizeof( int64_t ) ) ; // C++17 does not require a message
    ```

    and we want to obtain the integer representation of a *double*. We could reinterpret the bits using **reinterpret_cast**, which violates strict aliasing rules:
    @@ -284,6 +288,8 @@ union u1

    u1 u ;
    u.d = d ;

    printf( "%" PRId64 "\n", u.n ) ; // UB in C++ n is not the active member
    ```
    At a sufficient optimization level all three cases should generate identical code using just register moves [live Compiler Explorer Example](https://godbolt.org/g/BfZGwX).
    @@ -337,20 +343,17 @@ int bar( unsigned char *p, size_t len ) {
    In C++20 we may gain **bit_cast<>**<sup id="a14">[14](#f14)</sup> which gives a simple and safe way to type-pun as well as being usable in a constexpr context. It requires us to use an intermediate struct in the case where *To* and *From* types don't have the same size<sup id="a15">[15](#f15)</sup>. We will use a struct containing a four character array (*assumes 4 byte unsigned int*) to be the *From* type and *unsigned int* as the *To* type:
    ```cpp
    // Asserting unsigned int is size 4
    static_assert( sizeof( unsigned int ) == 4, "" ) ;
    struct four_chars {
    unsigned char arr[4] = {} ;
    struct uint_chars {
    unsigned char arr[sizeof( unsigned int )] = {} ; // Assume sizeof( unsigned int ) == 4
    } ;
    // Assume len is a multiple of 4
    int bar( unsigned char *p, size_t len ) {
    int result = 0 ;
    for( size_t index = 0; index < len; index += sizeof(unsigned int) ) {
    four_chars f ;
    std::memcpy( f.arr, p, sizeof(unsigned int)) ;
    uint_chars f ;
    std::memcpy( f.arr, &p[index], sizeof(unsigned int)) ;
    unsigned int ui = bit_cast<unsigned int>(f) ; // Named ui so it does not shadow the running total
    result += foo( ui ) ;
    @@ -392,7 +395,7 @@ int x = *reinterpret_cast<int*>(arr) ;
    as an int violates strict aliasing but may also violate alignment requirements if **arr** has an alignment of 1 or 2 bytes, which could lead to reduced performance or a bus error<sup id="a18">[18](#f18)</sup> in some situations. Using **alignas** to force the array to the same alignment as *int* would prevent violating alignment requirements:
    ```cpp
    alignas(aligof(int)) char arr[4] = { 0x0F, 0x0, 0x0, 0x00 } ;
    alignas(alignof(int)) char arr[4] = { 0x0F, 0x0, 0x0, 0x00 } ;
    int x = *reinterpret_cast<int*>(arr) ;
    ```

    @@ -429,9 +432,9 @@ clang although it allows these flags apparently does not actually implement the
    Another tool we have available to us is dynamic analysis using ASan<sup id="a22">[22](#f22)</sup>, with which we can catch misaligned loads and stores. Although these are not directly strict aliasing violations they are a common result of strict aliasing violations. For example the following cases<sup id="a23">[23](#f23)</sup> will generate runtime errors when built with clang using **-fsanitize=address**:

    ```cpp
    int *x = new int[2]; // 8 bytes: [0,7].
    int *u = (int*)((char*)x + 6); // regardless of alignment of x this will not be an aligned address
    *u = 1; // Access to range [6-9]
    int *x = new int[2]; // 8 bytes: [0,7].
    int *u = (int*)((char*)x + 6); // regardless of alignment of x this will not be an aligned address
    *u = 1; // Access to range [6-9]
    printf( "%d\n", *u ) ; // Access to range [6-9]
    ```
    @@ -441,7 +444,7 @@ The last tool I will recommend is C++ specific and not strictly a tool but a cod
    Type punning is a tool for treating a type like a bag of bits, which can be useful or even essential in many low level tasks. Traditionally compilers did not take advantage of optimization opportunities around strict aliasing violations, so software developers became used to using these methods to perform type punning, and the vast majority of type punning code we will find online will violate the strict aliasing rule.
    Optimizer's are slowly getting better at type based aliasing analysis and already break some code that relies on strict aliasing violations and we can expect the optimizations will only get better and will break more and more code we have been used to just working.
    Optimizers are slowly getting better at type based aliasing analysis and already break some code that relies on strict aliasing violations and we can expect the optimizations will only get better and will break more and more code we have been used to just working.
    We have standard conformant methods for type punning and in release and sometimes debug builds these methods should be cost free abstractions. We have some tools for catching strict aliasing violations but they will only catch a small fraction of the cases.
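
As a recap of the standard conformant approach, here is a minimal sketch of a **memcpy** based pun wrapped in a reusable helper. The names `pun_cast` and `round_trip` are hypothetical and not part of the original text, and the sketch assumes both types are trivially copyable, the same size, and that the destination type is default constructible:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (illustration only): copies the object representation
// of `from` into a fresh To object. No pointer of the wrong type is ever
// dereferenced, so the strict aliasing rule is not violated.
template <typename To, typename From>
To pun_cast(const From &from) {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value,
                  "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value,
                  "From must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Usage: the bits of a double survive a round trip through a 64-bit integer.
double round_trip(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "assumes a 64-bit double");
    const std::uint64_t bits = pun_cast<std::uint64_t>(d);
    const double back = pun_cast<double>(bits);
    assert(std::memcmp(&back, &d, sizeof d) == 0);
    return back;
}
```

The static assertions turn a size or type mismatch into a compile-time error instead of silent undefined behavior, which is the main reason to prefer a helper over scattering raw **memcpy** calls.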
  20. @shafik shafik revised this gist Mar 14, 2018. 1 changed file with 14 additions and 8 deletions.
    22 changes: 14 additions & 8 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -1,9 +1,13 @@
    # What is Strict Aliasing Rule and Why do we care?


    What is strict aliasing? Well, first we will describe what is aliasing and then we can learn what being strict about it means. In C and C++ aliasing has to do with what expression types we are allowed to access stored values through. In both C and C++ the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term strict aliasing rule. If we attempt to access a value using a type not allowed it is [undefined behavior](http://en.cppreference.com/w/cpp/language/ub)(**UB**). Once we have undefined behavior all bets are off, the results of our program are no longer reliable. Unfortunately with strict aliasing violations, we will often obtain the results we expect, leaving the possibility the future versions of a compiler with a new optimization will break code we thought was valid. This is undesirable and it is a worth while goal to understand the strict aliasing rules and how to avoid violating them.
    What is strict aliasing? First we will describe what is aliasing and then we can learn what being strict about it means.

    Let's look at some examples, then we can talk about exactly what the standard(s) say, examine some further examples and then see how to avoid strict aliasing and catch violations we missed. The first example should not be surprising [live example](https://wandbox.org/permlink/7sCJTAyrifZ0zfFA):
    In C and C++ aliasing has to do with what expression types we are allowed to access stored values through. In both C and C++ the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term *strict aliasing rule*. If we attempt to access a value using a type not allowed it is classified as [undefined behavior](http://en.cppreference.com/w/cpp/language/ub)(**UB**). Once we have undefined behavior all bets are off, the results of our program are no longer reliable.

    Unfortunately with strict aliasing violations, we will often obtain the results we expect, leaving the possibility that a future version of a compiler with a new optimization will break code we thought was valid. This is undesirable and it is a worthwhile goal to understand the strict aliasing rules and how to avoid violating them.

    Let's look at some examples, then we can talk about exactly what the standard(s) say, examine some further examples and then see how to avoid strict aliasing violations and catch the ones we missed. Here is an example that should not be surprising ([live example](https://wandbox.org/permlink/7sCJTAyrifZ0zfFA)):

    ```cpp
    int x = 10 ;
    @@ -16,7 +20,7 @@ std::cout << x << "\n" ;

    We have an *int\** pointing to memory occupied by an *int* and this is a valid aliasing. The optimizer must assume that assignments through **ip** could update the value stored in **x**.

    The next example shows an aliasing that leads to undefined behavior([live example](https://wandbox.org/permlink/8qA8JyJRVHtS9LPf)):
    The next example shows an aliasing that leads to undefined behavior ([live example](https://wandbox.org/permlink/8qA8JyJRVHtS9LPf)):

    ```cpp
    int foo( float *f, int *i ) {
    @@ -52,9 +56,9 @@ mov eax, 1
    ret
    ```

    The optimizer using Type-Based Alias Analysis(TBAA)<sup id="a6">[6](#f6)</sup> assumes **1** will be returned and directly moves the constant value into register **eax** which carries the return value. TBAA uses the languages rules about what types are allowed to alias to optimize loads and stores. In this case TBAA knows that a *float* can not alias and *int* and optimizes away the load of **i**.
    The optimizer using Type-Based Alias Analysis (TBAA)<sup id="a6">[6](#f6)</sup> assumes **1** will be returned and directly moves the constant value into register **eax** which carries the return value. TBAA uses the language's rules about what types are allowed to alias to optimize loads and stores. In this case TBAA knows that a *float* cannot alias an *int* and optimizes away the load of **i**.

    ## Now to the Rule-Book
    ## Now, to the Rule-Book

    What exactly does the standard say we are allowed and not allowed to do? The standard language is not straightforward, so for each item I will try to provide code examples that demonstrate the meaning.

    @@ -118,7 +122,8 @@ foobar( &f, &f.x ) ;
    ```c
    int x = 65;
    char *p = (char *)&x ;
    printf("%c\n", *p ) ; // *p gives us an lvalue expression of type char which is a character type
    printf("%c\n", *p ) ; // *p gives us an lvalue expression of type char which is a character type.
    // The results are not portable due to endianness issues.
    ```

    ### What does the C++17 draft standard say?
    @@ -221,11 +226,12 @@ int foo( std::byte &b, uint32_t &ui ) {
    So although we can see that C and C++ say similar things about aliasing there are some differences that we should be aware of. C++ does not have C's concept of [effective type](http://en.cppreference.com/w/c/language/object#Effective_type) nor [compatible type](http://en.cppreference.com/w/c/language/type#Compatible_types) and C does not have C++'s concept of [dynamic type](http://en.cppreference.com/w/cpp/language/type#Dynamic_type) or *similar type*. Although both have *lvalue* and *rvalue* expressions<sup id="a5">[5](#f5)</sup>, C++ also has *glvalue*, *prvalue* and *xvalue*<sup id="a9">[9](#f9)</sup> expressions. These differences are mostly out of scope for this article but one interesting example is how to create an object out of malloc'ed memory. In C we can set the *effective type*<sup id="a10">[10](#f10)</sup> by for example writing to the memory through an *lvalue* or **memcpy**<sup id="a11">[11](#f11)</sup>.
    ```c
    // The following is valid C but not valid C++
    void *p = malloc(sizeof(float)) ;
    float f = 1.0f ;
    memcpy( *p, &f, sizeof(float)) ; // Effective type of *p is float in C
    memcpy( p, &f, sizeof(float)) ; // Effective type of *p is float in C
    // Or
    float *fp = p ;
    float *fp = p ;
    *fp = 1.0f ; // Effective type of *p is float in C
    ```
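
    As a minimal sketch of the corrected `memcpy( p, ... )` call in this revision, the following C helpers give malloc'ed storage the effective type *float* and then read its bytes back without aliasing through an incompatible pointer type. The names `make_float` and `float_bits` are hypothetical and not part of the gist, and the sketch assumes *float* and *uint32_t* have the same size, which the static assertion checks:

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Assumption: float and uint32_t have the same size on this platform. */
    static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

    /* Hypothetical helper: allocates storage and copies a float object into it.
       In C the memcpy gives the allocated object the effective type float.
       Returns NULL if the allocation fails. */
    float *make_float(float value) {
        void *p = malloc(sizeof(float));
        if (p == NULL) {
            return NULL;
        }
        memcpy(p, &value, sizeof(float)); /* pass p itself, not *p */
        return p;                         /* void* converts implicitly to float* in C */
    }

    /* Hypothetical helper: copies the object representation of a float into an
       integer instead of dereferencing a cast pointer, so no aliasing rule is broken. */
    uint32_t float_bits(const float *fp) {
        uint32_t bits;
        memcpy(&bits, fp, sizeof bits);
        return bits;
    }
    ```

    Dereferencing the pointer returned by `make_float` reads the value through an lvalue of the effective type, which C allows. The caller is responsible for calling `free` on it.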

  21. @shafik shafik revised this gist Mar 11, 2018. 1 changed file with 20 additions and 16 deletions.
    36 changes: 20 additions & 16 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -1,9 +1,9 @@
    # What is Strict Aliasing Rule and Why do we care?


    What is strict aliasing? Well, first of all what is aliasing and then we can talk about bring strict about it. In C and C++ aliasing has to do with what types of expressions we are allowed to access stored values through. In both C and C++ the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term strict aliasing rule. If we attempt to access a value using a type not allowed it is [undefined behavior](http://en.cppreference.com/w/cpp/language/ub). Once we have undefined behavior all bets are off, the results of our program are no longer reliable. Unfortunately with strict aliasing violations, we will often obtain the results we expect, leaving the possibility the future versions of a compiler with a new optimization will break code we thought was valid. This is obviously undesirable and so it is a worth while goal to understand strict aliasing and how to avoid violating it.
    What is strict aliasing? Well, first we will describe what is aliasing and then we can learn what being strict about it means. In C and C++ aliasing has to do with what expression types we are allowed to access stored values through. In both C and C++ the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term strict aliasing rule. If we attempt to access a value using a type not allowed it is [undefined behavior](http://en.cppreference.com/w/cpp/language/ub)(**UB**). Once we have undefined behavior all bets are off, the results of our program are no longer reliable. Unfortunately with strict aliasing violations, we will often obtain the results we expect, leaving the possibility the future versions of a compiler with a new optimization will break code we thought was valid. This is undesirable and it is a worth while goal to understand the strict aliasing rules and how to avoid violating them.

    Let's look at some examples and then we can talk about exactly what the standard(s) says, examine some further examples and then see how to avoid strict aliasing and catch violations we missed. The first example should not be surprising [live example](https://wandbox.org/permlink/7sCJTAyrifZ0zfFA):
    Let's look at some examples, then we can talk about exactly what the standard(s) say, examine some further examples and then see how to avoid strict aliasing and catch violations we missed. The first example should not be surprising [live example](https://wandbox.org/permlink/7sCJTAyrifZ0zfFA):

    ```cpp
    int x = 10 ;
    @@ -35,7 +35,7 @@ int main() {
    }
    ```
    In the function **foo** we take an *int\** and a *float\**, in this example we call **foo** and set both parameters to point to the same memory location which in this example contains an *int*. We may naively expect the result of the second **cout** to **0** but with optimization enabled using **-O2** both gcc and clang produce the following result:
    In the function **foo** we take an *int\** and a *float\**, in this example we call **foo** and set both parameters to point to the same memory location which in this example contains an *int*. We may naively expect the result of the second **cout** to be **0** but with optimization enabled using **-O2** both gcc and clang produce the following result:
    ```
    0
    @@ -52,7 +52,7 @@ mov eax, 1
    ret
    ```

    The optimizer using Type-Based Alias Analysis(TBAA)<sup id="a6">[6](#f6)</sup> assumes **1** will be returned and directly moves the constant value into register **eax** which carries the return value. TBAA uses the languages rules about what types are allowed to alias to optimize loads and stores. In this case knowing TBAA knows that a *float* can not alias and *int* and optimizes away the load of **i**.
    The optimizer using Type-Based Alias Analysis(TBAA)<sup id="a6">[6](#f6)</sup> assumes **1** will be returned and directly moves the constant value into register **eax** which carries the return value. TBAA uses the languages rules about what types are allowed to alias to optimize loads and stores. In this case TBAA knows that a *float* can not alias and *int* and optimizes away the load of **i**.

    ## Now to the Rule-Book

    @@ -218,15 +218,15 @@ int foo( std::byte &b, uint32_t &ui ) {
    ## Subtle Differences
    So although we can see that C and C++ say similar things about aliasing there are some significant differences that we should be aware of. C++ does not have C's concept of [effective type](http://en.cppreference.com/w/c/language/object#Effective_type) nor [compatible type](http://en.cppreference.com/w/c/language/type#Compatible_types) and C does not have C++'s concept of [dynamic type](http://en.cppreference.com/w/cpp/language/type#Dynamic_type) or *similar type*. Although both have *lvalue* and *rvalue* expressions<sup id="a5">[5](#f5)</sup>, C++ also has *glvalue*, *prvalue* and *xvalue*<sup id="a9">[9](#f9)</sup> expressions. These differences are mostly out of scope for this article but one interesting example is how to create an object out of malloc'ed memory. In C we can set the *effective type*<sup id="a10">[10](#f10)</sup> by for example writing to the memory through an *lvalue* or **memcpy**<sup id="a11">[11](#f11)</sup>.
    So although we can see that C and C++ say similar things about aliasing there are some differences that we should be aware of. C++ does not have C's concept of [effective type](http://en.cppreference.com/w/c/language/object#Effective_type) nor [compatible type](http://en.cppreference.com/w/c/language/type#Compatible_types) and C does not have C++'s concept of [dynamic type](http://en.cppreference.com/w/cpp/language/type#Dynamic_type) or *similar type*. Although both have *lvalue* and *rvalue* expressions<sup id="a5">[5](#f5)</sup>, C++ also has *glvalue*, *prvalue* and *xvalue*<sup id="a9">[9](#f9)</sup> expressions. These differences are mostly out of scope for this article but one interesting example is how to create an object out of malloc'ed memory. In C we can set the *effective type*<sup id="a10">[10](#f10)</sup> by for example writing to the memory through an *lvalue* or **memcpy**<sup id="a11">[11](#f11)</sup>.
    ```c
    void *p = malloc(sizeof(float)) ;
    float f = 1.0f ;
    memcpy( *p, &f, sizeof(float)) ; // Effective type of *p is float in C
    // Or
    // Or
    float *fp = p ;
    *fp = 1.0f ; // Effective type of *p is float in C
    *fp = 1.0f ; // Effective type of *p is float in C
    ```

    Neither of these methods is sufficient in C++ which requires **placement new**:
    @@ -237,13 +237,13 @@ float *fp = new (p) float{1.0f} ; // Dynamic type of *p is now float
    ## Are int8_t and uint8_t char types?
    Theoretically neither *int8_t* nor *uint8_t* have to be *char* types but practically they are implemented that way. We can see that glibc typedefs but [int8_t](https://github.com/lattera/glibc/blob/master/sysdeps/generic/stdint.h#L36) and [uint8_t](https://github.com/lattera/glibc/blob/master/sysdeps/generic/stdint.h#L48) to *singed char* and *unsigned char* respectively.
    Theoretically neither *int8_t* nor *uint8_t* have to be *char* types but practically they are implemented that way. This is important because if they are really *char* types then they also alias like *char* types which is you are unaware of can [lead to suprising performance impacts](https://stackoverflow.com/q/26295216/1708801). We can see that glibc typedefs [int8_t](https://github.com/lattera/glibc/blob/master/sysdeps/generic/stdint.h#L36) and [uint8_t](https://github.com/lattera/glibc/blob/master/sysdeps/generic/stdint.h#L48) to *singed char* and *unsigned char* respectively.
    At least for *C++* it would be an ABI break to change that since this would change name mangling and would break any API using either of those types in their interface.
    This would be hard to chance since at least for *C++* it would be an ABI break. Since this would change name mangling and would break any API using either of those types in their interface.
    ## How do we Type Pun correctly?
    Sometimes we want to treat a piece of memory like it is bag of bits and circumvent the type system and interpret it as a different type. This is called *type punning*, to reinterpret a segment of memory as another type. The standard blessed method for *type punning* in both C and C++ is **memcpy**. This may seem a little heavy handed but the optimizer should recognize the use of **memcpy** for type punning and optimize it away to generate register to register moves. For example if we know *int64_t* is the same size as *double*:
    Sometimes we want to treat a piece of memory like it is bag of bits, circumvent the type system and interpret it as a different type. This is called *type punning*, to reinterpret a segment of memory as another type. The standard blessed method for *type punning* in both C and C++ is **memcpy**. This may seem a little heavy handed but the optimizer should recognize the use of **memcpy** for type punning and optimize it away to generate register to register moves. For example if we know *int64_t* is the same size as *double*:
    ```cpp
    static_assert( sizeof( double ) == sizeof( int64_t ), "" ) ;
    @@ -252,15 +252,19 @@ static_assert( sizeof( double ) == sizeof( int64_t ), "" ) ;
    and we want to obtain the integer representation of a *double*. We could reinterpret the bits using **reinterpret_cast**, which violates strict aliasing rules:

    ```cpp
    std::int64_t n ;
    n = *reinterpret_cast<std::int64_t *>(&d) ;
    void func1( double d ) {
    std::int64_t n ;
    n = *reinterpret_cast<std::int64_t *>(&d) ; // UB, int64_t is not allowed to alias double
    // ...
    ```
    or we could use **memcpy**:
    ```cpp
    std::int64_t n;
    std::memcpy(&n, &d, sizeof d);
    void func1( double d ) {
    std::int64_t n;
    std::memcpy(&n, &d, sizeof d);
    //...
    ```

    or we could use the old type punning trick via a union<sup id="a13">[13](#f13)</sup>(undefined behavior in C++):
    @@ -278,7 +282,7 @@ u.d = d ;

    At a sufficient optimization level all three cases should generate identical code using just register moves [live Compiler Explorer Example](https://godbolt.org/g/BfZGwX).

    What if we want to type punning an array of *unsigned char* into a series of *unsigned int* and then perform an operation on each *unsigned int* value? We can use **memcpy** to pun the *unsigned char array* into a temporary of type *unsinged int* the optimizer will still manage to see through the **memcpy** and optimize away both the temporary and the copy and operate directly on the underlying data, [Live Compiler Explorer Example](https://godbolt.org/g/acjqjD):
    But, what if we want to type punning an array of *unsigned char* into a series of *unsigned int* and then perform an operation on each *unsigned int* value? We can use **memcpy** to pun the *unsigned char array* into a temporary of type *unsinged int* the optimizer will still manage to see through the **memcpy** and optimize away both the temporary and the copy and operate directly on the underlying data, [Live Compiler Explorer Example](https://godbolt.org/g/acjqjD):

    ```cpp
    // Simple operation just return the value back
    @@ -324,7 +328,7 @@ int bar( unsigned char *p, size_t len ) {
    ## C++20 and bit_cast
    In C++20 we may gain **bit_cast<>**<sup id="a14">[14](#f14)</sup> which gives a simple and safe way to type-pun as well as being usable in constexpr context. It requires us to use an intermediate struct in this case since the *To* and *From* type have to have the same size<sup id="a15">[15](#f15)</sup>. We will use a struct containing a four characater array(*assumes 4 byte unsigned int*) to be the *From* type and *unsigned int* as the *To* type.:
    In C++20 we may gain **bit_cast<>**<sup id="a14">[14](#f14)</sup> which gives a simple and safe way to type-pun as well as being usable in constexpr context. It requires us to use an intermediate struct in the case where *To* and *From* types don't have the same size<sup id="a15">[15](#f15)</sup>. We will use a struct containing a four characater array(*assumes 4 byte unsigned int*) to be the *From* type and *unsigned int* as the *To* type.:
    ```cpp
    // Asserting unsigned int is size 4
  22. @shafik shafik revised this gist Mar 8, 2018. 1 changed file with 6 additions and 1 deletion.
    7 changes: 6 additions & 1 deletion WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -235,6 +235,11 @@ Neither of these methods is sufficient in C++ which requires **placement new**:
    float *fp = new (p) float{1.0f} ; // Dynamic type of *p is now float
    ```
    ## Are int8_t and uint8_t char types?
    Theoretically neither *int8_t* nor *uint8_t* have to be *char* types but practically they are implemented that way. We can see that glibc typedefs but [int8_t](https://github.com/lattera/glibc/blob/master/sysdeps/generic/stdint.h#L36) and [uint8_t](https://github.com/lattera/glibc/blob/master/sysdeps/generic/stdint.h#L48) to *singed char* and *unsigned char* respectively.
    At least for *C++* it would be an ABI break to change that since this would change name mangling and would break any API using either of those types in their interface.
    ## How do we Type Pun correctly?
    @@ -448,7 +453,7 @@ We have standard conformant methods for type punning and in release and sometime
    <br>
    <b id="f7">7</b> Demonstrates torn loads for misaligned atomics https://gist.github.com/michaeljclark/31fc67fe41d233a83e9ec8e3702398e8 and tweet referencing this example https://twitter.com/corkmork/status/944421528829009925 [↩](#a7)
    <br>
    <b id="f8">8</b> Dynamic type as described on cppreference http://en.cppreference.com/w/cpp/language/type#Dynamic_type [↩](#a8)
    <b id="f8">8</b> Comment in gcc bug report explaining why changing int8_t and uint8_t to not be char types would be an ABI break for C++ https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110#c13 and twitter thread discussing the issue https://twitter.com/shafikyaghmour/status/822179548825468928 [↩](#a8)
    <br>
    <b id="f9">9</b> "New” Value Terminology which explains how glvalue, xvalue and prvalue came about http://www.stroustrup.com/terminology.pdf [↩](#a9)
    <br>
  23. @shafik shafik revised this gist Mar 8, 2018. 1 changed file with 4 additions and 1 deletion.
    5 changes: 4 additions & 1 deletion WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -381,6 +381,9 @@ alignas(aligof(int)) char arr[4] = { 0x0F, 0x0, 0x0, 0x00 } ;
    int x = *reinterpret_cast<int*>(arr) ;
    ```

    ### Atomics

    Another unexpected penalty of unaligned accesses is that they break atomics on some architectures. Atomic stores may not appear atomic to other threads on x86 if they are misaligned<sup id="a7">[7](#f7)</sup>.

    ## Catching Strict Aliasing Violations

    @@ -443,7 +446,7 @@ We have standard conformant methods for type punning and in release and sometime
    <br>
    <b id="f6">6</b> Type-Based Alias Analysis http://collaboration.cmc.ec.gc.ca/science/rpn/biblio/ddj/Website/articles/DDJ/2000/0010/0010d/0010d.htm [↩](#a6)
    <br>
    <b id="f7">7</b> Compatible type described on cppreferene http://en.cppreference.com/w/c/language/type#Compatible_types [↩](#a7)
    <b id="f7">7</b> Demonstrates torn loads for misaligned atomics https://gist.github.com/michaeljclark/31fc67fe41d233a83e9ec8e3702398e8 and tweet referencing this example https://twitter.com/corkmork/status/944421528829009925 [↩](#a7)
    <br>
    <b id="f8">8</b> Dynamic type as described on cppreference http://en.cppreference.com/w/cpp/language/type#Dynamic_type [↩](#a8)
    <br>
  24. @shafik shafik revised this gist Mar 8, 2018. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -52,7 +52,7 @@ mov eax, 1
    ret
    ```

    The optimizer assumes **1** will be returned and directly moves the constant value into register **eax** which carries the return value.
    The optimizer using Type-Based Alias Analysis(TBAA)<sup id="a6">[6](#f6)</sup> assumes **1** will be returned and directly moves the constant value into register **eax** which carries the return value. TBAA uses the languages rules about what types are allowed to alias to optimize loads and stores. In this case knowing TBAA knows that a *float* can not alias and *int* and optimizes away the load of **i**.

    ## Now to the Rule-Book

    @@ -441,7 +441,7 @@ We have standard conformant methods for type punning and in release and sometime
    <br>
    <b id="f5">5</b> Understanding lvalues and rvalues in C and C++ https://eli.thegreenplace.net/2011/12/15/understanding-lvalues-and-rvalues-in-c-and-c [↩](#a5)
    <br>
    <b id="f6">6</b> Effective type described on cppreference http://en.cppreference.com/w/c/language/object#Effective_type [↩](#a6)
    <b id="f6">6</b> Type-Based Alias Analysis http://collaboration.cmc.ec.gc.ca/science/rpn/biblio/ddj/Website/articles/DDJ/2000/0010/0010d/0010d.htm [↩](#a6)
    <br>
    <b id="f7">7</b> Compatible type described on cppreferene http://en.cppreference.com/w/c/language/type#Compatible_types [↩](#a7)
    <br>
  25. @shafik shafik revised this gist Mar 7, 2018. 1 changed file with 4 additions and 4 deletions.
    8 changes: 4 additions & 4 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -1,7 +1,7 @@
    # What is Strict Aliasing Rule and Why do we care?


    What is strict aliasing? Well, first of all what is aliasing and then we can talk about bring strict about it. In C and C++ aliasing has to do with what types of expressions we are allowed to access stored values through. In both C and C++ the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term strict aliasing rule. If we attempt to access a value using a type not allowed it is undefined behavior<sup id="a1">[1](#f1)</sup>. Once we have undefined behavior all bets are off, the results of our program are no longer reliable. Unfortunately with strict aliasing violations, we will often obtain the results we expect, leaving the possibility the future versions of a compiler with a new optimization will break code we thought was valid. This is obviously undesirable and so it is a worth while goal to understand strict aliasing and how to avoid violating it.
    What is strict aliasing? Well, first of all what is aliasing and then we can talk about bring strict about it. In C and C++ aliasing has to do with what types of expressions we are allowed to access stored values through. In both C and C++ the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term strict aliasing rule. If we attempt to access a value using a type not allowed it is [undefined behavior](http://en.cppreference.com/w/cpp/language/ub). Once we have undefined behavior all bets are off, the results of our program are no longer reliable. Unfortunately with strict aliasing violations, we will often obtain the results we expect, leaving the possibility the future versions of a compiler with a new optimization will break code we thought was valid. This is obviously undesirable and so it is a worth while goal to understand strict aliasing and how to avoid violating it.

    Let's look at some examples and then we can talk about exactly what the standard(s) says, examine some further examples and then see how to avoid strict aliasing and catch violations we missed. The first example should not be surprising [live example](https://wandbox.org/permlink/7sCJTAyrifZ0zfFA):

    @@ -218,7 +218,7 @@ int foo( std::byte &b, uint32_t &ui ) {
    ## Subtle Differences
    So although we can see that C and C++ say similar things about aliasing there are some significant differences that we should be aware of. C++ does not have C's concept of *effective type*<sup id="a6">[6](#f6)</sup> nor *compatible type*<sup id="a7">[7](#f7)</sup> and C does not have C++'s concept of *dynamic type*<sup id="a8">[8](#f8)</sup> or *similar type*. Although both have *lvalue* and *rvalue* expressions<sup id="a5">[5](#f5)</sup>, C++ also has *glvalue*, *prvalue* and *xvalue*<sup id="a9">[9](#f9)</sup> expressions. These differences are mostly out of scope for this article but one interesting example is how to create an object out of malloc'ed memory. In C we can set the *effective type*<sup id="a10">[10](#f10)</sup> by for example writing to the memory through an *lvalue* or **memcpy()**<sup id="a11">[11](#f11)</sup>.
    So although we can see that C and C++ say similar things about aliasing there are some significant differences that we should be aware of. C++ does not have C's concept of [effective type](http://en.cppreference.com/w/c/language/object#Effective_type) nor [compatible type](http://en.cppreference.com/w/c/language/type#Compatible_types) and C does not have C++'s concept of [dynamic type](http://en.cppreference.com/w/cpp/language/type#Dynamic_type) or *similar type*. Although both have *lvalue* and *rvalue* expressions<sup id="a5">[5](#f5)</sup>, C++ also has *glvalue*, *prvalue* and *xvalue*<sup id="a9">[9](#f9)</sup> expressions. These differences are mostly out of scope for this article but one interesting example is how to create an object out of malloc'ed memory. In C we can set the *effective type*<sup id="a10">[10](#f10)</sup> by for example writing to the memory through an *lvalue* or **memcpy**<sup id="a11">[11](#f11)</sup>.
    ```c
    void *p = malloc(sizeof(float)) ;
    @@ -238,7 +238,7 @@ float *fp = new (p) float{1.0f} ; // Dynamic type of *p is now float
    ## How do we Type Pun correctly?
    Sometimes we want to treat a piece of memory like it is bag of bits and circumvent the type system and interpret it as a different type. This is called *type punning*, to reinterpret a segment of memory as another type. The standard blessed method for *type punning* in both C and C++ is **memcpy()**. This may seem a little heavy handed but the optimizer should recognize the use of **memcpy()** for type punning and optimize it away to generate register to register moves. For example if we know *int64_t* is the same size as *double*:
    Sometimes we want to treat a piece of memory like it is bag of bits and circumvent the type system and interpret it as a different type. This is called *type punning*, to reinterpret a segment of memory as another type. The standard blessed method for *type punning* in both C and C++ is **memcpy**. This may seem a little heavy handed but the optimizer should recognize the use of **memcpy** for type punning and optimize it away to generate register to register moves. For example if we know *int64_t* is the same size as *double*:
    ```cpp
    static_assert( sizeof( double ) == sizeof( int64_t ), "" ) ;
    @@ -251,7 +251,7 @@ std::int64_t n ;
    n = *reinterpret_cast<std::int64_t *>(&d) ;
    ```

    or we could use **memcpy()**:
    or we could use **memcpy**:

    ```cpp
    std::int64_t n;
  26. @shafik shafik revised this gist Mar 6, 2018. 1 changed file with 26 additions and 18 deletions.
    44 changes: 26 additions & 18 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -273,22 +273,24 @@ u.d = d ;

    At a sufficient optimization level all three cases should generate identical code using just register moves [live Compiler Explorer Example](https://godbolt.org/g/BfZGwX).

    What if we want to type punning an array of *unsigned char* into a series of *unsigned int* and then perform an operation on each *unsigned int* value? We can use **memcpy** to pun the *unsigned char array* into a temporary of type *unsinged int* the optimizer will still manage to see through the **memcpy** and optimize away both the temporary and the copy and operate directly on the underlying data, [Live Compiler Explorer Example](https://godbolt.org/g/kCjkx2):
    What if we want to type punning an array of *unsigned char* into a series of *unsigned int* and then perform an operation on each *unsigned int* value? We can use **memcpy** to pun the *unsigned char array* into a temporary of type *unsinged int* the optimizer will still manage to see through the **memcpy** and optimize away both the temporary and the copy and operate directly on the underlying data, [Live Compiler Explorer Example](https://godbolt.org/g/acjqjD):

    ```cpp
    // Simple operation just return the value back
    int foo(unsigned int x ) { return x ;}

    int pop( unsigned char *p, size_t len ) {
    int pop = 0 ;
    // Assume len is a multiple of sizeof(unsigned int)
    int bar( unsigned char *p, size_t len ) {
    int result = 0 ;

    for( size_t index = 0; index < len; index += sizeof(unsigned int) ) {
    unsigned int ui = 0;
    std::memcpy( &ui, &p[index], sizeof(unsigned int) ) ;

    pop += foo( ui ) ;
    result += foo( ui ) ;
    }

    return pop ;
    return result ;
    }
    ```
    @@ -301,43 +303,50 @@ add eax, dword ptr [rdi + rcx]
    Same code but using **reinterpret_cast** to type pun(violates strict aliasing):

    ```cpp
    int pop( unsigned char *p, size_t len ) {
    int pop = 0 ;
    // Assume len is a multiple of sizeof(unsigned int)
    int bar( unsigned char *p, size_t len ) {
    int result = 0 ;

    for( size_t index = 0; index < len; index += sizeof(unsigned int) ) {
    unsigned int ui = *reinterpret_cast<unsigned int*>(&p[index]) ;
    unsigned int ui = *reinterpret_cast<unsigned int*>(&p[index]) ;

    pop += foo( ui ) ;
    result += foo( ui ) ;
    }

    return pop ;
    return result ;
    }
    ```
    ## C++20 and bit_cast
    In C++20 we may gain **bit_cast<>**<sup id="a14">[14](#f14)</sup> which requires using and intermediate struct since requires the *To* and *From* type to have the same size<sup id="a15">[15](#f15)</sup>:
    In C++20 we may gain **bit_cast<>**<sup id="a14">[14](#f14)</sup> which gives a simple and safe way to type-pun as well as being usable in constexpr context. It requires us to use an intermediate struct in this case since the *To* and *From* type have to have the same size<sup id="a15">[15](#f15)</sup>. We will use a struct containing a four characater array(*assumes 4 byte unsigned int*) to be the *From* type and *unsigned int* as the *To* type.:
    ```cpp
    // Asserting unsigned int is size 4
    static_assert( sizeof( unsigned int ) == 4, "" ) ;
    struct four_chars {
    unsigned char arr[4] = {} ;
    } ;
    int pop( unsigned char *p, size_t len ) {
    int pop = 0 ;
    // Assume len is a multiple of 4
    int bar( unsigned char *p, size_t len ) {
    int result = 0 ;
    for( size_t index = 0; index < len; index += sizeof(unsigned int) ) {
    four_chars f ;
    std::memcpy( f.arr, p, 4) ;
    std::memcpy( f.arr, p, sizeof(unsigned int)) ;
    unsigned int result = bit_cast<unsigned int>(f) ;
    pop += foo( result ) ;
    result += foo( result ) ;
    }
    return pop ;
    return result ;
    }
    ```

    It is unfortunate that we need this intermediate type but that is the current constraint of **bit_cast**.

    ## Alignment

    We have seen in previous examples violating strict aliasing rules can lead to stores being optimized away. Violating strict aliasing rules can also lead to violations of alignment requirement. Both the C and C++ standard state that objects have *alignment requirements* which restrict where in memory objects can be allocated and therefore accessed<sup id="a17">[17](#f17)</sup>. C11 section *6.2.8 Alignment of objects* says:
    @@ -449,8 +458,7 @@ We have standard conformant methods for type punning and in release and sometime
    <br>
    <b id="f13">13</b> Unions and memcpy and type punning https://stackoverflow.com/q/25664848/1708801 [↩](#a13)
    <br>
    <b id="f14">14</b> Revision two of the bit_cast<> proposal http://www.open-std.org/jtc1/sc22/wg21/docs/papers
    /2017/p0476r2.html [↩](#a14)
    <b id="f14">14</b> Revision two of the bit_cast<> proposal http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0476r2.html [↩](#a14)
    <br>
    <b id="f15">15</b> How to use bit_cast to type pun a unsigned char array https://gist.github.com/shafik/a956a17d00024b32b35634eeba1eb49e [↩](#a15)
    <br>
  27. @shafik shafik revised this gist Mar 6, 2018. 1 changed file with 11 additions and 9 deletions.
    20 changes: 11 additions & 9 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -87,6 +87,8 @@ unsigned int *p = (unsigned int*)&x ;
    printf("%u\n", *p ) ; // *p gives us an lvalue expression of type unsigned int which corresponds to
    // the effective type of the object
    ```
    [See Footnote 12 for gcc/clang extension](#f12), that allows assigning *unsigned int\** to *int\** even though they are not compatible types.
    > — a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
    @@ -216,7 +218,7 @@ int foo( std::byte &b, uint32_t &ui ) {
    ## Subtle Differences
    So although we can see that C and C++ say similar things about aliasing there are some significant differences that we should be aware of. C++ does not have C's concept of *effective type*<sup id="a6">[6](#f6)</sup> nor *compatible type*<sup id="a7">[7](#f7)</sup> and C does not have C++'s concept of *dynamic type*<sup id="a8">[8](#f8)</sup> or *similar type*. Although both have *lvalue* and *rvalue* expressions<sup id="a5">[5](#f5)</sup>, C++ also has *glvalue*, *prvalue* and *xvalue* expressions. These differences are mostly out of scope for this article but one interesting example is how to create an object out of malloc'ed memory. In C we can set the *effective type* by for example writing to the memory through an *lvalue* or **memcpy()**.
    So although we can see that C and C++ say similar things about aliasing there are some significant differences that we should be aware of. C++ does not have C's concept of *effective type*<sup id="a6">[6](#f6)</sup> nor *compatible type*<sup id="a7">[7](#f7)</sup> and C does not have C++'s concept of *dynamic type*<sup id="a8">[8](#f8)</sup> or *similar type*. Although both have *lvalue* and *rvalue* expressions<sup id="a5">[5](#f5)</sup>, C++ also has *glvalue*, *prvalue* and *xvalue*<sup id="a9">[9](#f9)</sup> expressions. These differences are mostly out of scope for this article but one interesting example is how to create an object out of malloc'ed memory. In C we can set the *effective type*<sup id="a10">[10](#f10)</sup> by for example writing to the memory through an *lvalue* or **memcpy()**<sup id="a11">[11](#f11)</sup>.
    ```c
    void *p = malloc(sizeof(float)) ;
    @@ -256,7 +258,7 @@ std::int64_t n;
    std::memcpy(&n, &d, sizeof d);
    ```
    or we could use the old type punning trick via a union(undefined behavior in C++):
    or we could use the old type punning trick via a union<sup id="a13">[13](#f13)</sup>(undefined behavior in C++):
    ```cpp
    union u1
    @@ -314,7 +316,7 @@ int pop( unsigned char *p, size_t len ) {
    ## C++20 and bit_cast
    In C++20 we may gain **bit_cast<>** which requires using an intermediate struct since it requires the *To* and *From* types to have the same size:
    In C++20 we may gain **bit_cast<>**<sup id="a14">[14](#f14)</sup> which requires using an intermediate struct since it requires the *To* and *From* types to have the same size<sup id="a15">[15](#f15)</sup>:
    ```cpp
    struct four_chars {
    @@ -338,7 +340,7 @@ int pop( unsigned char *p, size_t len ) {

    ## Alignment

    We have seen in previous examples that violating strict aliasing rules can lead to stores being optimized away. Violating strict aliasing rules can also lead to violations of alignment requirements. Both the C and C++ standards state that objects have *alignment requirements* which restrict where in memory objects can be allocated and therefore accessed. C11 section *6.2.8 Alignment of objects* says:
    We have seen in previous examples that violating strict aliasing rules can lead to stores being optimized away. Violating strict aliasing rules can also lead to violations of alignment requirements. Both the C and C++ standards state that objects have *alignment requirements* which restrict where in memory objects can be allocated and therefore accessed<sup id="a17">[17](#f17)</sup>. C11 section *6.2.8 Alignment of objects* says:

    >Complete object types have alignment requirements which place restrictions on the addresses at which objects of that type may be allocated. An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated. An object type imposes an alignment requirement on every object of that type: stricter alignment can be requested using the _Alignas keyword.
    @@ -363,10 +365,10 @@ char arr[4] = { 0x0F, 0x0, 0x0, 0x00 } ; // Could be allocated on a 1 or 2 byte
    int x = *reinterpret_cast<int*>(arr) ;
    ```
    as an int violates strict aliasing but may also violate alignment requirements if **arr** has an alignment of 1 or 2 bytes, which could lead to reduced performance or a bus error in some situations. In contrast, using **alignas** to force the array to the same alignment as *int* would prevent violating alignment requirements:
    as an int violates strict aliasing but may also violate alignment requirements if **arr** has an alignment of 1 or 2 bytes, which could lead to reduced performance or a bus error<sup id="a18">[18](#f18)</sup> in some situations. In contrast, using **alignas** to force the array to the same alignment as *int* would prevent violating alignment requirements:
    ```cpp
    alignas(alignof(int)) char arr[4] = { 0x0F, 0x0, 0x0, 0x00 } ; // Could be allocated on a 1 or 2 byte boundary
    alignas(alignof(int)) char arr[4] = { 0x0F, 0x0, 0x0, 0x00 } ;
    int x = *reinterpret_cast<int*>(arr) ;
    ```

    @@ -375,7 +377,7 @@ int x = *reinterpret_cast<int*>(arr) ;

    We don't have a lot of good tools for catching strict aliasing violations; the tools we have will catch some cases of strict aliasing violations and some cases of misaligned loads and stores.

    gcc using the flag **-fstrict-aliasing** and **-Wstrict-aliasing** can catch some cases although not without false positives/negatives. For example the following cases will generate a warning in gcc [see it live](https://wandbox.org/permlink/ERccUsWgS9hDpVqM):
    gcc using the flag **-fstrict-aliasing** and **-Wstrict-aliasing**<sup id="a19">[19](#f19)</sup> can catch some cases although not without false positives/negatives. For example the following cases<sup id="a21">[21](#f21)</sup> will generate a warning in gcc [see it live](https://wandbox.org/permlink/ERccUsWgS9hDpVqM):

    ```cpp
    int a = 1;
    @@ -395,9 +397,9 @@ p=&a;
    printf("%i\n", j = *((short*)p));
    ```

    clang although it allows these flags apparently does not actually implement the warnings.
    clang although it allows these flags apparently does not actually implement the warnings<sup id="a20">[20](#f20)</sup>.

    Another tool we have available to us is dynamic analysis using ASan we can catch misaligned loads and stores. Although these are not directly strict aliasing violations they are a common result of strict aliasing violations. For example the following cases will generate runtime errors when built with clang using **-fsanitize=address**
    Another tool we have available to us is dynamic analysis using ASan<sup id="a22">[22](#f22)</sup> we can catch misaligned loads and stores. Although these are not directly strict aliasing violations they are a common result of strict aliasing violations. For example the following cases<sup id="a23">[23](#f23)</sup> will generate runtime errors when built with clang using **-fsanitize=address**

    ```cpp
    int *x = new int[2]; // 8 bytes: [0,7].
  28. @shafik shafik revised this gist Mar 4, 2018. 1 changed file with 29 additions and 20 deletions.
    49 changes: 29 additions & 20 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -16,7 +16,7 @@ std::cout << x << "\n" ;

    We have a *int\** pointing to memory occupied by an *int* and this is a valid aliasing. The optimizer must assume that assignments through **ip** could update the value occupied by **x**.

    The next example shows an aliasing that leads to undefined behavior [live example](https://wandbox.org/permlink/8qA8JyJRVHtS9LPf):
    The next example shows an aliasing that leads to undefined behavior([live example](https://wandbox.org/permlink/8qA8JyJRVHtS9LPf)):

    ```cpp
    int foo( float *f, int *i ) {
    @@ -42,7 +42,7 @@ In the function **foo** we take an *int\** and a *float\**, in this example we c
    1
    ```
    Which may not be expected but is perfectly valid since we have invoked undefined behavior. A *float* cannot validly alias an *int* object. Therefore the optimizer can assume the *constant 1* stored when dereferencing **i** will be the return value since a store through **f** could not validly affect an *int* object. Plugging the code in Compiler Explorer shows this is exactly what is happening [live example](https://godbolt.org/g/yNV5aj):
    Which may not be expected but is perfectly valid since we have invoked undefined behavior. A *float* cannot validly alias an *int* object. Therefore the optimizer can assume the *constant 1* stored when dereferencing **i** will be the return value since a store through **f** could not validly affect an *int* object. Plugging the code in Compiler Explorer shows this is exactly what is happening([live example](https://godbolt.org/g/yNV5aj)):
    ```assembly
    foo(float*, int*): # @foo(float*, int*)
    @@ -62,35 +62,39 @@ What exactly does the standard say we are allowed and not allowed to do? The sta

    The **C11** draft standard<sup id="a2">[2](#f2)</sup> says the following in section *6.5 Expressions paragraph 7*:

    >An object shall have its stored value accessed only by an lvalue expression<sup id="a5">[5](#f5)</sup> that has one of the following types:88)
    >An object shall have its stored value accessed only by an lvalue expression<sup id="a5">[5](#f5)</sup> that has one of the following types:<sup>88)</sup>
    > — a type compatible with the effective type of the object,
    ```c
    int x ;
    int x = 1 ;
    int *p = &x ;
    printf("%d\n", *p ) ; // *p gives us an lvalue expression of type int which is compatible with int
    ```
    > — a qualified version of a type compatible with the effective type of the object,
    ```c
    int x ;
    int x = 1;
    const int *p = &x ;
    printf("%d\n", *p ) ; // *p gives us an lvalue expression of type const int which is compatible with int
    ```

    > — a type that is the signed or unsigned type corresponding to the effective type of the object,
    ```c
    int x ;
    int x = 1;
    unsigned int *p = (unsigned int*)&x ;
    printf("%u\n", *p ) ; // *p gives us an lvalue expression of type unsigned int which corresponds to
    // the effective type of the object
    ```
    > — a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
    ```c
    int x ;
    int x = 1;
    const unsigned int *p = (const unsigned int*)&x ;
    printf("%u\n", *p ) ; // *p gives us an lvalue expression of type const unsigned int which is an unsigned type
    // that corresponds to a qualified version of the effective type of the object
    ```

    > — an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
    @@ -100,7 +104,8 @@ struct foo {
    int x ;
    } ;

    void foobar( struct foo *fp, int *ip ) ;
    void foobar( struct foo *fp, int *ip ) ; // struct foo is an aggregate that includes int among its members so it
    // can alias with *ip

    struct foo f ;
    foobar( &f, &f.x ) ;
    @@ -109,8 +114,9 @@ foobar( &f, &f.x ) ;
    > — a character type.
    ```c
    int x ;
    char *p = &x ;
    int x = 65;
    char *p = (char *)&x ;
    printf("%c\n", *p ) ; // *p gives us an lvalue expression of type char which is a character type
    ```

    ### What the C++17 Draft Standard says
    @@ -121,18 +127,19 @@ The C++17 draft standard<sup id="a3">[3](#f3)</sup> in section *\[basic.lval\]
    > (11.1) — the dynamic type of the object,
    ```cpp
    int x ;
    int *ip = &x ;

    void *p = malloc( sizeof(int) ) ;
    int *ip = new (p) int{0} ;
    void *p = malloc( sizeof(int) ) ; // We have allocated storage but not started the lifetime of an object
    int *ip = new (p) int{0} ; // Placement new changes the dynamic type of the object to int
    std::cout << *ip << "\n" ; // *ip gives us a glvalue expression of type int which matches the dynamic type
    // of the allocated object
    ```
    > (11.2) — a cv-qualified version of the dynamic type of the object,
    ```cpp
    int x ;
    int x = 1;
    const int *cip = &x ;
    std::cout << *cip << "\n" ; // *cip gives us a glvalue expression of type const int which is a cv-qualified
    // version of the dynamic type of x
    ```

    > (11.3) — a type similar (as defined in 7.5) to the dynamic type of the object,
    @@ -145,6 +152,7 @@ const int *cip = &x ;
    > (11.4) — a type that is the signed or unsigned type corresponding to the dynamic type of the object,
    ```cpp
    // Both si and ui are signed or unsigned types corresponding to each other's dynamic types
    // We can see from this godbolt(https://godbolt.org/g/KowGXB) the optimizer assumes aliasing.
    signed int foo( signed int &si, unsigned int &ui ) {
    si = 1;
    @@ -197,11 +205,12 @@ int foobar( foo &f, bar &b ) {
    > (11.8) — a char, unsigned char, or std::byte type.

    ```cpp
    uint8_t foo( std::byte &b, uint32_t &ui ) {
    b = static_cast<std::byte>('a' ) ;
    ui = 0xFFFFFFFF ;
    int foo( std::byte &b, uint32_t &ui ) {
    b = static_cast<std::byte>('a') ;
    ui = 0xFFFFFFFF ;

    return static_cast<uint8_t>( b ) ;
    return std::to_integer<int>( b ) ; // b gives us a glvalue expression of type std::byte which can alias
    // an object of type uint32_t
    }
    ```
  29. @shafik shafik revised this gist Mar 3, 2018. 1 changed file with 18 additions and 12 deletions.
    30 changes: 18 additions & 12 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -227,20 +227,20 @@ float *fp = new (p) float{1.0f} ; // Dynamic type of *p is now float
    ## How do we Type Pun correctly?
    Sometimes we want to treat a piece of memory like it is a bag of bits and circumvent the type system and interpret it as a different type. This is called *type punning*, to reinterpret a segment of memory as another type. The standard-blessed method for *type punning* in both C and C++ is **memcpy()**. This may seem a little heavy-handed but the optimizer should recognize the use of **memcpy()** for type punning and optimize it away to generate register-to-register moves. For example we know *int64_t* is the same size as *double*:
    Sometimes we want to treat a piece of memory like it is a bag of bits and circumvent the type system and interpret it as a different type. This is called *type punning*, to reinterpret a segment of memory as another type. The standard-blessed method for *type punning* in both C and C++ is **memcpy()**. This may seem a little heavy-handed but the optimizer should recognize the use of **memcpy()** for type punning and optimize it away to generate register-to-register moves. For example if we know *int64_t* is the same size as *double*:
    ```cpp
    static_assert( sizeof( double ) == sizeof( int64_t ), "" ) ;
    ```

    we want to obtain the integer representation of a *double*. We could reinterpret the bits using **reinterpret_cast** which violates strict aliasing rules:
    and we want to obtain the integer representation of a *double*. We could reinterpret the bits using **reinterpret_cast**, which violates strict aliasing rules:

    ```cpp
    std::int64_t n ;
    n = *reinterpret_cast<std::int64_t *>(&d) ;
    ```

    Or we could use **memcpy()**:
    or we could use **memcpy()**:

    ```cpp
    std::int64_t n;
    @@ -250,19 +250,19 @@ std::memcpy(&n, &d, sizeof d);
    or we could use the old type punning trick via a union(undefined behavior in C++):
    ```cpp
    union u1
    {
    std::int64_t n;
    double d ;
    } ;
    union u1
    {
    std::int64_t n;
    double d ;
    } ;
    u1 u ;
    u.d = d ;
    u1 u ;
    u.d = d ;
    ```

    At a sufficient optimization level all three cases should generate identical code using just register moves [live Compiler Explorer Example](https://godbolt.org/g/BfZGwX).

    Even for more interesting code, for example type punning an array of *unsigned char* into a series of *unsigned int* and performing an operation on each value, the optimizer still manages to see through the memcpy [Live Compiler Explorer Example](https://godbolt.org/g/kCjkx2):
    What if we want to type pun an array of *unsigned char* into a series of *unsigned int* and then perform an operation on each *unsigned int* value? We can use **memcpy** to pun the *unsigned char array* into a temporary of type *unsigned int*. The optimizer will still manage to see through the **memcpy**, optimize away both the temporary and the copy, and operate directly on the underlying data ([Live Compiler Explorer Example](https://godbolt.org/g/kCjkx2)):

    ```cpp
    int foo(unsigned int x ) { return x ;}
    @@ -271,7 +271,7 @@ int pop( unsigned char *p, size_t len ) {
    int pop = 0 ;

    for( size_t index = 0; index < len; index += sizeof(unsigned int) ) {
    unsigned int ui = 0;
    unsigned int ui = 0;
    std::memcpy( &ui, &p[index], sizeof(unsigned int) ) ;

    pop += foo( ui ) ;
    @@ -281,6 +281,12 @@ int pop( unsigned char *p, size_t len ) {
    }
    ```
    The assembly for the body of the loop shows the optimizer reduces the body into a direct access of the underlying *unsigned char array* as an *unsigned int*, adding it directly into **pop**:
    ```Assembly
    add eax, dword ptr [rdi + rcx]
    ```

    Same code but using **reinterpret_cast** to type pun(violates strict aliasing):

    ```cpp
  30. @shafik shafik revised this gist Mar 3, 2018. 1 changed file with 5 additions and 5 deletions.
    10 changes: 5 additions & 5 deletions WhatIsStrictAliasingAndWhyDoWeCare.md
    @@ -19,9 +19,9 @@ We have a *int\** pointing to memory occupied by an *int* and this is a valid al
    The next example shows an aliasing that leads to undefined behavior [live example](https://wandbox.org/permlink/8qA8JyJRVHtS9LPf):

    ```cpp
    int foo( float *f, int *i ) {
    *i = 1 ;
    *f = 0.f ;
    int foo( float *f, int *i ) {
    *i = 1 ;
    *f = 0.f ;

    return *i ;
    }
    @@ -35,14 +35,14 @@ int main() {
    }
    ```
    In this case with optimization enabled using **-O2** both gcc and clang produce the following result:
    In the function **foo** we take an *int\** and a *float\**. In this example we call **foo** and set both parameters to point to the same memory location, which contains an *int*. We may naively expect the result of the second **cout** to be **0**, but with optimization enabled using **-O2** both gcc and clang produce the following result:
    ```
    0
    1
    ```
    Which some may not expect but is perfectly valid since we have invoked undefined behavior. In this case a *float* cannot validly alias an *int* object. Therefore the optimizer can assume the *constant 1* stored when dereferencing **i** will be the return value since a store through **f** could not validly affect an *int* object. Plugging the code in Compiler Explorer shows this is exactly what is happening [live example](https://godbolt.org/g/yNV5aj):
    Which may not be expected but is perfectly valid since we have invoked undefined behavior. A *float* cannot validly alias an *int* object. Therefore the optimizer can assume the *constant 1* stored when dereferencing **i** will be the return value since a store through **f** could not validly affect an *int* object. Plugging the code in Compiler Explorer shows this is exactly what is happening [live example](https://godbolt.org/g/yNV5aj):
    ```assembly
    foo(float*, int*): # @foo(float*, int*)