What is the Strict Aliasing Rule and Why do we care?

What is strict aliasing? First we will describe what aliasing is, and then we can learn what being strict about it means. In C and C++, aliasing has to do with which expression types we are allowed to access stored values through. In both C and C++ the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term strict aliasing rule. If we attempt to access a value using a type that is not allowed, it is undefined behavior¹ (UB). Once we have undefined behavior all bets are off, and the results of our program are no longer reliable. Unfortunately, with strict aliasing violations we will often obtain the results we expect, leaving open the possibility that a future version of a compiler with a new optimization will break code we thought was valid. This is undesirable, and it is a worthwhile goal to understand the strict aliasing rules and how to avoid violating them.

Let's look at some examples, then we can talk about exactly what the standard(s) say, examine some further examples, and then see how to avoid strict aliasing violations and catch the ones we missed. The first example should not be surprising (live example):

#include <iostream>

int main() {
  int x = 10 ;
  int *ip = &x ;

  std::cout << *ip << "\n" ;  // prints 10
  *ip = 12 ;
  std::cout << x << "\n" ;    // prints 12, the store through ip updated x
}

We have an int* pointing to memory occupied by an int, and this is valid aliasing. The optimizer must assume that assignments through ip could update the value of x.

The next example shows aliasing that leads to undefined behavior (live example):

#include <iostream>

int foo( float *f, int *i ) {
    *i = 1 ;
    *f = 0.f ;

    return *i ;
}

int main() {
    int x = 0 ;

    std::cout << x << "\n" ;   // Expect 0
    x = foo(reinterpret_cast<float*>(&x), &x ) ;  // UB: the float store aliases an int object
    std::cout << x << "\n" ;   // Expect 0?
}

The function foo takes a float* and an int*. In this example we call foo with both parameters pointing to the same memory location, which contains an int. We may naively expect the second cout to print 0, but with optimization enabled using -O2 both gcc and clang produce the following result:

0
1

This may not be what we expected, but it is perfectly valid since we have invoked undefined behavior. A float cannot validly alias an int object, so the optimizer can assume that the constant 1 stored through i will be the return value, since a store through f could not validly affect an int object. Plugging the code into Compiler Explorer shows this is exactly what is happening (live example):

foo(float*, int*): # @foo(float*, int*)
mov dword ptr [rsi], 1  
mov dword ptr [rdi], 0
mov eax, 1                       
ret

Using Type-Based Alias Analysis (TBAA)⁶, the optimizer assumes 1 will be returned and moves the constant value directly into register eax, which carries the return value. TBAA uses the language's rules about which types are allowed to alias to optimize loads and stores. In this case TBAA knows that a float cannot alias an int and optimizes away the load of i.
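For contrast, here is a sketch of foo in which both parameters have type int* (the name foo_int is hypothetical). Two int lvalues are allowed to alias, so TBAA cannot assume the store through j leaves *i untouched; the compiler has to load *i again, and calling it with the same pointer twice is well-defined:

```cpp
// Hypothetical variant of foo: both parameters have the same type, int*.
// Two int lvalues may alias, so after the store through j the compiler
// cannot assume *i still holds 1; it must return the value actually in memory.
int foo_int( int *j, int *i ) {
  *i = 1 ;
  *j = 0 ;

  return *i ;
}
```

With int x, foo_int(&x, &x) returns 0, while with two distinct objects a and b, foo_int(&a, &b) returns 1.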

Now to the Rule-Book

What exactly does the standard say we are allowed and not allowed to do? The standard language is not straightforward, so for each item I will try to provide code examples that demonstrate the meaning.

What does the C11 draft standard say?

The C11 draft standard² says the following in section 6.5 Expressions paragraph 7:

An object shall have its stored value accessed only by an lvalue expression⁵ that has one of the following types:⁸⁸⁾

— a type compatible with the effective type of the object,

int x = 1 ;
int *p = &x ;   
printf("%d\n", *p ) ; // *p gives us an lvalue expression of type int which is compatible with int

— a qualified version of a type compatible with the effective type of the object,

int x = 1;
const int *p = &x ;
printf("%d\n", *p ) ; // *p gives us an lvalue expression of type const int which is a qualified version of int

— a type that is the signed or unsigned type corresponding to the effective type of the object,

int x = 1;
unsigned int *p = (unsigned int*)&x ;
printf("%u\n", *p ) ; // *p gives us an lvalue expression of type unsigned int which corresponds to 
                      // the effective type of the object

See Footnote 12 for the gcc/clang extension that allows assigning an unsigned int* to an int* even though they are not compatible types.

— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,

int x = 1;
const unsigned int *p = (const unsigned int*)&x ;
printf("%u\n", *p ) ; // *p gives us an lvalue expression of type const unsigned int which is the unsigned type
                      // corresponding to a qualified version of the effective type of the object

— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

struct foo {
  int x ;
} ;

void foobar( struct foo *fp, int *ip ) ;  // struct foo is an aggregate that includes int among its members so it
                                          // can alias with *ip

struct foo f ;  // C requires the struct keyword here
foobar( &f, &f.x ) ;

— a character type.

int x = 65;
char *p = (char *)&x ;
printf("%c\n", *p ) ;  // *p gives us an lvalue expression of type char which is a character type

What does the C++17 Draft Standard say?

The C++17 draft standard³ in section [basic.lval] paragraph 11 says:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:⁶³

(11.1) — the dynamic type of the object,

void *p = malloc( sizeof(int) ) ; // We have allocated storage but not started the lifetime of an object
int *ip = new (p) int{0} ;        // Placement new changes the dynamic type of the object to int
std::cout << *ip << "\n" ;        // *ip gives us a glvalue expression of type int which matches the dynamic type 
                                  // of the allocated object

(11.2) — a cv-qualified version of the dynamic type of the object,

int x = 1;
const int *cip = &x ;
std::cout << *cip << "\n" ;  // *cip gives us a glvalue expression of type const int which is a cv-qualified 
                             // version of the dynamic type of x

(11.3) — a type similar (as defined in 7.5) to the dynamic type of the object,

// Need an example for this
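One possible example, offered as a sketch (the function name is hypothetical): int* and const int* const are similar types because they differ only in cv-qualification, so the int* object p can be read through a glvalue of type const int* const. The conversion from int** to const int* const* is an implicit qualification conversion, so no cast is needed:

```cpp
// Hypothetical helper: reads the int* object p through a glvalue of the
// similar type const int* const.
int read_via_similar_type( int *p ) {
  const int *const *pp = &p ;  // int** -> const int* const* (qualification conversion)
  const int *q = *pp ;         // *pp is a glvalue of type const int* const, similar to int*
  return *q ;
}
```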

(11.4) — a type that is the signed or unsigned type corresponding to the dynamic type of the object,

// Both si and ui are signed or unsigned types corresponding to each other's dynamic types
// We can see from this godbolt(https://godbolt.org/g/KowGXB) the optimizer assumes aliasing.
signed int foo( signed int &si, unsigned int &ui ) {
  si = 1;
  ui = 2 ;

  return si ;
}

(11.5) — a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,

signed int foo( const unsigned int &cui, signed int &si ) ; // Hard to show this one assumes aliasing
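A sketch that does expose the aliasing assumption, assuming int and unsigned int have the same size and the same representation for non-negative values (store_then_read is a hypothetical name): the object has dynamic type int and is read through a glvalue of type const unsigned int, the unsigned type corresponding to const int. Because the two may alias, the compiler has to reload cui after the store through si:

```cpp
// Hypothetical function: cui has type const unsigned int&, the unsigned type
// corresponding to a cv-qualified version of int, so it may alias si.
unsigned int store_then_read( const unsigned int &cui, int &si ) {
  si = 7 ;      // store through the int lvalue
  return cui ;  // must be reloaded: cui may refer to the same object as si
}
```

Calling it as store_then_read( reinterpret_cast<const unsigned int &>(x), x ) for some int x yields 7, because both references name the same object.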

(11.6) — an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union),

struct foo {
 int x ;
} ;

// Compiler Explorer example(https://godbolt.org/g/z2wJTC) shows aliasing assumption
int foobar( foo &fp, int &ip ) {
 fp.x = 1 ;
 ip = 2 ;

 return fp.x ;
}

foo f ; 
foobar( f, f.x ) ;

(11.7) — a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,

struct foo { int x ; } ;

struct bar : public foo {} ;

int foobar( foo &f, bar &b ) {
  f.x = 1 ;
  b.x = 2 ;

  return f.x ;
}

(11.8) — a char, unsigned char, or std::byte type.

#include <cstddef>   // std::byte, std::to_integer
#include <cstdint>   // std::uint32_t

int foo( std::byte &b, std::uint32_t &ui ) {
  b = static_cast<std::byte>('a') ;
  ui = 0xFFFFFFFF ;

  return std::to_integer<int>( b ) ;  // b gives us a glvalue expression of type std::byte which can alias
                                      // an object of type uint32_t
}
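The character-type exception only works in one direction: any object may be read through unsigned char (or char, or std::byte), but an array of unsigned char may not be read as some other type. A short sketch that relies on this direction to inspect an object's bytes without knowing its type (count_nonzero_bytes is a hypothetical name); the count depends only on the byte values, not their order, so it does not depend on endianness:

```cpp
#include <cstddef>

// Hypothetical helper: counts the non-zero bytes in the object representation
// of obj. Reading any object through an unsigned char glvalue is permitted.
template <class T>
std::size_t count_nonzero_bytes( const T &obj ) {
  const unsigned char *bytes = reinterpret_cast<const unsigned char *>( &obj ) ;
  std::size_t count = 0 ;
  for ( std::size_t i = 0; i < sizeof(T); ++i ) {
    if ( bytes[i] != 0 ) ++count ;
  }
  return count ;
}
```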

Subtle Differences

So although C and C++ say similar things about aliasing, there are some differences we should be aware of. C++ has neither C's concept of effective type nor its concept of compatible type, and C has neither C++'s concept of dynamic type nor its concept of similar type. Although both have lvalue and rvalue expressions⁵, C++ also has glvalue, prvalue and xvalue⁹ expressions. These differences are mostly out of scope for this article, but one interesting example is how to create an object out of malloc'ed memory. In C we can set the effective type¹⁰, for example, by writing to the memory through an lvalue or by using memcpy¹¹.

void *p = malloc(sizeof(float)) ;
float f = 1.0f ;
memcpy( p, &f, sizeof(float)) ;   // Effective type of *p is float in C
                                  // Or
float *fp = p ;
*fp = 1.0f ;                      // Effective type of *p is float in C

Neither of these methods is sufficient in C++, which requires placement new:

float *fp = new (p) float{1.0f} ;   // Dynamic type of *p is now float
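A complete version of the C++ approach, as a sketch (float_in_malloc_storage is a hypothetical name): placement new starts the lifetime of a float in the malloc'ed storage, after which the storage can be read through a float glvalue. Since float is trivially destructible, releasing the storage with free is enough:

```cpp
#include <cstdlib>
#include <new>

// Hypothetical helper: creates a float in malloc'ed storage with placement
// new, reads it back and releases the storage.
float float_in_malloc_storage( float value ) {
  void *p = std::malloc( sizeof(float) ) ;
  if ( p == nullptr ) throw std::bad_alloc{} ;

  float *fp = new (p) float{value} ;  // dynamic type of *p is now float
  float result = *fp ;                // access through a glvalue of the dynamic type

  // float is trivially destructible, so no explicit destructor call is needed.
  std::free( p ) ;
  return result ;
}
```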

Are int8_t and uint8_t char types?

Theoretically neither int8_t nor uint8_t has to be a char type, but in practice they are implemented that way. This is important because if they really are char types then they also alias like char types, which, if you are unaware of it, can lead to surprising performance impacts. We can see that glibc typedefs int8_t and uint8_t to signed char and unsigned char respectively.

This would be hard to change, since at least for C++ it would be an ABI break: it would change name mangling and break any API using either of those types in its interface⁸.
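To illustrate the performance point, here is a sketch assuming, as with glibc, that std::uint8_t is a typedef for unsigned char (the Buffer type and both functions are hypothetical). Every store through a uint8_t lvalue may modify any object, including b.size and b.data themselves, so in zero_slow the compiler generally has to reload both on each iteration, which can block optimizations such as vectorization. Copying them into locals first, as zero_fast does, removes that possibility:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical buffer type used only for this illustration.
struct Buffer {
  std::uint8_t *data ;
  std::size_t size ;
} ;

// The store to b.data[i] is a char-type store, so it may alias b.size and
// b.data; the optimizer must assume both can change inside the loop.
void zero_slow( Buffer &b ) {
  for ( std::size_t i = 0; i < b.size; ++i )
    b.data[i] = 0 ;
}

// With the members copied into locals, no store in the loop can affect the
// loop bound or the base pointer.
void zero_fast( Buffer &b ) {
  std::uint8_t *data = b.data ;
  const std::size_t size = b.size ;
  for ( std::size_t i = 0; i < size; ++i )
    data[i] = 0 ;
}
```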

How do we Type Pun correctly?

Sometimes we want to treat a piece of memory as a bag of bits, circumventing the type system to interpret it as a different type. This is called type punning: reinterpreting a segment of memory as another type. The standard-blessed method for type punning in both C and C++ is memcpy. This may seem a little heavy-handed, but the optimizer should recognize the use of memcpy for type punning and optimize it away, generating register-to-register moves. For example, if we know int64_t is the same size as double:

static_assert( sizeof( double ) == sizeof( int64_t ), "" ) ;

and we want to obtain the integer representation of a double. We could reinterpret the bits using reinterpret_cast, which violates strict aliasing rules:

void func1( double d ) {
  std::int64_t n ;
  n = *reinterpret_cast<std::int64_t *>(&d) ; // UB, int64_t is not allowed to alias double
  // ...
}

or we could use memcpy:

void func1( double d ) {
  std::int64_t n;
  std::memcpy(&n, &d, sizeof d); // copies the bytes of d into n, no aliasing violation
  //...
}

or we could use the old type punning trick via a union¹³ (undefined behavior in C++):

union u1
{
  std::int64_t n;
  double d ;
} ;

u1 u ;
u.d = d ;
std::int64_t n = u.n ;  // reads the inactive member n, UB in C++

At a sufficient optimization level all three cases should generate identical code using just register moves (live Compiler Explorer example).

But what if we want to type pun an array of unsigned char into a series of unsigned int values and then perform an operation on each unsigned int value? We can use memcpy to pun the unsigned char array into a temporary of type unsigned int, and the optimizer will still manage to see through the memcpy, optimize away both the temporary and the copy, and operate directly on the underlying data (live Compiler Explorer example):

// Simple operation just return the value back
int foo(unsigned int x ) { return x ;}

// Assume len is a multiple of sizeof(unsigned int) 
int bar( unsigned char *p, size_t len ) {
  int result = 0 ;

  for( size_t index = 0; index < len; index += sizeof(unsigned int) ) {
    unsigned int ui = 0;                                 
    std::memcpy( &ui, &p[index], sizeof(unsigned int) ) ;

    result += foo( ui ) ;
  }

  return result ;
}

The assembly for the body of the loop shows the optimizer reduces the body to a direct access of the underlying unsigned char array as an unsigned int, adding it directly into eax:

add     eax, dword ptr [rdi + rcx]

Same code but using reinterpret_cast to type pun (violates strict aliasing):

// Assume len is a multiple of sizeof(unsigned int) 
int bar( unsigned char *p, size_t len ) {
 int result = 0 ;

 for( size_t index = 0; index < len; index += sizeof(unsigned int) ) {
   unsigned int ui = *reinterpret_cast<unsigned int*>(&p[index]) ;

   result += foo( ui ) ;
 }

 return result ;
}

C++20 and bit_cast

C++20 adds std::bit_cast¹⁴ (declared in <bit>), which gives a simple and safe way to type pun and is also usable in a constexpr context. bit_cast requires the To and From types to have the same size¹⁵, so to pun part of a larger unsigned char array we need an intermediate struct of exactly the right size. We will use a struct containing a four-character array (assuming a 4-byte unsigned int) as the From type and unsigned int as the To type:

#include <bit>      // std::bit_cast (C++20)
#include <cstring>  // std::memcpy

// Asserting unsigned int is size 4
static_assert( sizeof( unsigned int ) == 4, "" ) ;

struct four_chars {
 unsigned char arr[4] = {} ;
} ;

// Assume len is a multiple of 4
int bar( unsigned char *p, size_t len ) {
 int result = 0 ;

 for( size_t index = 0; index < len; index += sizeof(unsigned int) ) {
   four_chars f ;
   std::memcpy( f.arr, &p[index], sizeof(unsigned int)) ;  // copy the current 4 bytes
   unsigned int ui = std::bit_cast<unsigned int>(f) ;

   result += foo( ui ) ;
 }

 return result ;
}

It is unfortunate that we need this intermediate type, but that is the current constraint of bit_cast.
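Before C++20, a commonly used approximation of bit_cast wraps the memcpy idiom in a function template. The following is a sketch under that assumption (bit_cast_sketch is a hypothetical name, and unlike std::bit_cast it is not usable in a constexpr context). It enforces the same-size and trivially-copyable preconditions with static_assert:

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical pre-C++20 stand-in for bit_cast built on memcpy.
template <class To, class From>
To bit_cast_sketch( const From &src ) noexcept {
  static_assert( sizeof(To) == sizeof(From), "To and From must have the same size" ) ;
  static_assert( std::is_trivially_copyable<From>::value, "From must be trivially copyable" ) ;
  static_assert( std::is_trivially_copyable<To>::value, "To must be trivially copyable" ) ;
  static_assert( std::is_trivially_default_constructible<To>::value,
                 "this sketch default-constructs a To before copying into it" ) ;

  To dst ;
  std::memcpy( &dst, &src, sizeof(To) ) ;
  return dst ;
}
```

For example, assuming IEEE 754 single precision and a 4-byte unsigned int, bit_cast_sketch<unsigned int>(1.0f) yields 0x3F800000.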

Alignment

We have seen in previous examples that violating strict aliasing rules can lead to loads being optimized away. Violating strict aliasing rules can also lead to violations of alignment requirements. Both the C and C++ standards state that objects have alignment requirements which restrict where in memory objects can be allocated and therefore accessed¹⁷. C11 section 6.2.8 Alignment of objects says:

Complete object types have alignment requirements which place restrictions on the addresses at which objects of that type may be allocated. An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated. An object type imposes an alignment requirement on every object of that type: stricter alignment can be requested using the _Alignas keyword.

The C++17 draft standard in section [basic.align] paragraph 1:

Object types have alignment requirements (6.7.1, 6.7.2) which place restrictions on the addresses at which an object of that type may be allocated. An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated. An object type imposes an alignment requirement on every object of that type; stricter alignment can be requested using the alignment specifier (10.6.2).

Both C99 and C11 are explicit that a conversion that results in an unaligned pointer is undefined behavior; section 6.3.2.3 Pointers says:

A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned⁵⁷⁾ for the pointed-to type, the behavior is undefined. ...

Although C++ is not as explicit, I believe this sentence from [basic.align] paragraph 1 is sufficient:

... An object type imposes an alignment requirement on every object of that type; ...

An Example

So let's assume that alignof(char) and alignof(int) are 1 and 4 respectively, and that sizeof(int) is 4. Then type punning an array of char of size 4:

char arr[4] = { 0x0F, 0x0, 0x0, 0x00 } ; // Could be allocated on a 1 or 2 byte boundary
int x = *reinterpret_cast<int*>(arr) ;

as an int violates strict aliasing, and may also violate alignment requirements if arr has an alignment of only 1 or 2 bytes. This could lead to reduced performance or a bus error¹⁸ in some situations. Using alignas to force the array to the same alignment as int would prevent violating alignment requirements, although the access still violates strict aliasing:

alignas(alignof(int)) char arr[4] = { 0x0F, 0x0, 0x0, 0x00 } ;
int x = *reinterpret_cast<int*>(arr) ;
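When the bytes come from a buffer whose alignment we do not control, one alternative, sketched here with hypothetical helper names, is to check alignment explicitly, or simply to copy with memcpy, which has no alignment requirement and does not violate strict aliasing. Converting a pointer to std::uintptr_t and testing the remainder is implementation-defined, though it behaves as expected on mainstream platforms:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if p meets the alignment requirement of T.
// Assumes the usual mapping of pointers to std::uintptr_t addresses,
// which is implementation-defined.
template <class T>
bool is_aligned_for( const void *p ) {
  return reinterpret_cast<std::uintptr_t>( p ) % alignof(T) == 0 ;
}

// Hypothetical helper: reads an int starting at any byte address.
// memcpy copies bytes, so it has no alignment requirement and does not
// violate strict aliasing.
int load_int( const unsigned char *bytes ) {
  int value ;
  std::memcpy( &value, bytes, sizeof value ) ;
  return value ;
}
```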

Atomics

Another unexpected penalty of unaligned accesses is that they break atomics on some architectures. Atomic stores may not appear atomic to other threads on x86 if they are misaligned⁷.

Catching Strict Aliasing Violations

We don't have many good tools for catching strict aliasing violations; the tools we have will catch some cases of strict aliasing violations and some cases of misaligned loads and stores.

gcc, using the flags -fstrict-aliasing and -Wstrict-aliasing¹⁹, can catch some cases, although not without false positives/negatives. For example, the following cases²¹ will generate a warning in gcc (see it live):

int a = 1;
short j;
float f ;

printf("%i\n", j = *((short*)&a));
printf("%i\n", j = *((int*)&f));

although it will not catch this additional case (see it live):

int *p;

p=&a;
printf("%i\n", j = *((short*)p));

Although clang accepts these flags, it apparently does not actually implement the warnings²⁰.

Another tool available to us is dynamic analysis: using ASan²² we can catch misaligned loads and stores. Although these are not directly strict aliasing violations, they are a common result of strict aliasing violations. For example, the following cases²³ will generate runtime errors when built with clang using -fsanitize=address:

int *x = new int[2];              // 8 bytes: [0,7].
int *u = (int*)((char*)x + 6);    // regardless of alignment of x this will not be an aligned address
*u = 1;                           // Access to range [6-9]
printf( "%d\n", *u ) ;            // Access to range [6-9]

The last tool I will recommend is C++-specific and is not strictly a tool but a coding practice: don't allow C-style casts. Both gcc and clang will produce a diagnostic for C-style casts using -Wold-style-cast. This will force any undefined type puns to use reinterpret_cast, and in general reinterpret_cast should be a flag for closer code review. It is also easier to search your code base for reinterpret_cast to perform an audit.

Conclusion

Type punning is a tool for treating a type like a bag of bits, which can be useful or even essential in many low-level tasks. Traditionally, compilers did not take advantage of optimization opportunities around strict aliasing violations, so software developers became used to using these methods to perform type punning, and the vast majority of type punning code we will find online violates the strict aliasing rule.

Optimizers are slowly getting better at type-based alias analysis and already break some code that relies on strict aliasing violations, and we can expect the optimizations to only get better and to break more and more code that we are used to seeing just work.

We have standard-conformant methods for type punning, and in release and sometimes debug builds these methods should be cost-free abstractions. We have some tools for catching strict aliasing violations, but they will only catch a small fraction of the cases.

Footnotes

1 Undefined behavior described on cppreference http://en.cppreference.com/w/cpp/language/ub
2 Draft C11 standard http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
3 Draft C++17 standard https://github.com/cplusplus/draft/raw/master/papers/n4659.pdf
4 Latest C++ draft standard can be found here: http://eel.is/c++draft/
5 Understanding lvalues and rvalues in C and C++ https://eli.thegreenplace.net/2011/12/15/understanding-lvalues-and-rvalues-in-c-and-c
6 Type-Based Alias Analysis http://collaboration.cmc.ec.gc.ca/science/rpn/biblio/ddj/Website/articles/DDJ/2000/0010/0010d/0010d.htm
7 Demonstrates torn loads for misaligned atomics https://gist.github.com/michaeljclark/31fc67fe41d233a83e9ec8e3702398e8 and tweet referencing this example https://twitter.com/corkmork/status/944421528829009925
8 Comment in gcc bug report explaining why changing int8_t and uint8_t to not be char types would be an ABI break for C++ https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110#c13 and Twitter thread discussing the issue https://twitter.com/shafikyaghmour/status/822179548825468928
9 “New” Value Terminology, which explains how glvalue, xvalue and prvalue came about http://www.stroustrup.com/terminology.pdf
10 Effective types and aliasing https://gustedt.wordpress.com/2016/08/17/effective-types-and-aliasing/
11 “constructing” a trivially-copyable object with memcpy https://stackoverflow.com/q/30114397/1708801
12 Why do gcc and clang allow assigning an unsigned int * to int * since they are not compatible types, although they may alias https://twitter.com/shafikyaghmour/status/957702383810658304 and https://gcc.gnu.org/ml/gcc/2003-10/msg00184.html
13 Unions and memcpy and type punning https://stackoverflow.com/q/25664848/1708801
14 Revision two of the bit_cast<> proposal http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0476r2.html
15 How to use bit_cast to type pun an unsigned char array https://gist.github.com/shafik/a956a17d00024b32b35634eeba1eb49e
16 bit_cast implementation of pop() https://godbolt.org/g/bXBie7
17 Unaligned access https://en.wikipedia.org/wiki/Bus_error#Unaligned_access
18 A bug story: data alignment on x86 http://pzemtsov.github.io/2016/11/06/bug-story-alignment-on-x86.html
19 gcc documentation for -Wstrict-aliasing https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wstrict-aliasing
20 Comments indicating clang does not implement -Wstrict-aliasing https://github.com/llvm-mirror/clang/blob/master/test/Misc/warning-flags-tree.c
21 Stack Overflow question the examples came from https://stackoverflow.com/q/25117826/1708801
22 ASan documentation https://clang.llvm.org/docs/AddressSanitizer.html
23 The unaligned access example taken from the Address Sanitizer Algorithm wiki https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#unaligned-accesses

AsmCoder110/WhatIsStrictAliasingAndWhyDoWeCare.md
