rygorous · October 22, 2024 05:58 · hailinzeng · Oct 22, 2024 · rygorous · Oct 22, 2024
diff --git a/gistfile1.txt b/gistfile1.txt
 Version a:

  uint32_t byteswap32(uint32_t x)
  {
      uint32_t y = (x >> 24) & 0xff;
      y |= (x >> 8) & 0xff00;
      y |= (x << 8) & 0xff0000;
      y |= (x << 24) & 0xff000000u;
      return y;
  }
  
  (or any reordering thereof, or substitute |= with +=)
  
 Version b:

  uint32_t byteswap32(uint32_t x)
  {
      uint32_t y = (x >> 24) & 0xff;
      y |= ((x >> 16) & 0xff) << 8;
      y |= ((x >> 8) & 0xff) << 16;
      y |= (x & 0xff) << 24;
      return y;
  }

  (or any reordering thereof, or substitute |= with +=)
  
 Version c:

  uint32_t byteswap32(uint32_t x)
  {
      // static_assert that sizeof(uint32_t) == 4 if you want.
      uint8_t bytes[4];
      uint32_t y;
      memcpy(bytes, &x, 4);
      std::swap(bytes[0], bytes[3]);
      std::swap(bytes[1], bytes[2]);
      memcpy(&y, bytes, 4);
      return y;
  }
  
  (again, up to reordering. Or use type punning through a union. Or use two buffers and copy
  with reordering instead of swapping in-place.)
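
 As a rough illustration of the "two buffers" variation mentioned above, here is a
 minimal sketch. The name byteswap32_two_buffers is made up for this example and is
 not from any particular codebase.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical name; one way to write the "two buffers" variant of version c.
inline uint32_t byteswap32_two_buffers(uint32_t x)
{
    static_assert(sizeof(uint32_t) == 4, "expects a 4-byte uint32_t");
    uint8_t in[4];
    uint8_t out[4];
    uint32_t y;
    std::memcpy(in, &x, 4);   // read the bytes in memory order
    out[0] = in[3];           // copy with reordering instead of swapping in place
    out[1] = in[2];
    out[2] = in[1];
    out[3] = in[0];
    std::memcpy(&y, out, 4);  // reassemble the reversed bytes
    return y;
}
```

 Reversing the bytes in memory order gives a byte swap regardless of the host's
 endianness, which is why this form needs no shifts or masks.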
  
 Version d:

  uint32_t byteswap32(uint32_t x)
  {
      // Cast before shifting: if byteswap16 returns uint16_t, the result promotes to int.
      return ((uint32_t)byteswap16(x & 0xffff) << 16) | byteswap16(x >> 16);
  }
 
  (up to reordering. With multiple possible implementations for byteswap16.)
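
 For concreteness, here is one of the many possible byteswap16 implementations, plus
 a version d built on it. This is only a sketch: the original leaves byteswap16
 unspecified, and the name byteswap32_via16 is invented for the example.

```cpp
#include <cstdint>

// One possible byteswap16; assumed here, since the text does not define it.
inline uint16_t byteswap16(uint16_t x)
{
    return static_cast<uint16_t>((x << 8) | (x >> 8));
}

// Version d, written with explicit widening (hypothetical name).
inline uint32_t byteswap32_via16(uint32_t x)
{
    // Widen to uint32_t before shifting: a uint16_t operand would promote to
    // int, and shifting a value >= 0x8000 left by 16 overflows a 32-bit int.
    uint32_t lo = byteswap16(static_cast<uint16_t>(x & 0xffff));
    uint32_t hi = byteswap16(static_cast<uint16_t>(x >> 16));
    return (lo << 16) | hi;
}
```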
  
 Version e:

  uint32_t byteswap32(uint32_t x)
  {
      // This looks strange but happens to map directly to 3 PowerPC instructions
      // (rlwinm, rlwimi, rlwimi) that form the standard byte reverse sequence on
      // that target.
      uint32_t y = (x << 24) | (x >> 8); // rlwinm
      y = (y & ~0x00ff0000u) | ((x <<  8) & 0x00ff0000u); // rlwimi
      y = (y & ~0x000000ffu) | ((x >> 24) & 0x000000ffu); // rlwimi
      return y;
  }
  
 I have seen all these basic variants (and many of the noted variations) in
 production code. That's why "just pattern matching during instruction selection"
 doesn't work: there is no canonical way this is always written. If you want to
 handle this, you can either:

  a) perform sufficient analysis to detect any of these patterns, or
  b) introduce a canonical way to write it, make that fast, and recommend people use it.
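
 Option b) might look roughly like the sketch below: one wrapper that forwards to
 whatever the compiler already provides. __builtin_bswap32 (GCC/Clang) and
 _byteswap_ulong (MSVC) are existing intrinsics; the wrapper name
 byteswap32_canonical is made up for this example. C++23 also offers std::byteswap
 in <bit>, which plays the same role where available.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <stdlib.h>  // _byteswap_ulong
#endif

// Hypothetical wrapper name; the point is that callers use exactly one spelling.
inline uint32_t byteswap32_canonical(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(x);
#elif defined(_MSC_VER)
    return _byteswap_ulong(x);
#else
    // Portable fallback: version a, written as a single expression.
    return (x >> 24) | ((x >> 8) & 0xff00u) | ((x << 8) & 0xff0000u) | (x << 24);
#endif
}
```

 With a single blessed entry point, the compiler does not have to recognize any of
 the hand-written variants above.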

 Now, I've argued elsewhere that exposing byte swaps directly is kind of unfortunate
 in the first place, since byteswaps mostly get used when loading data with a known
 endianness on a target architecture that has a different endianness. The preferable
 way to handle that is to state the intended endianness directly, not have logic to figure
 out whether to swap or not. But the same concern applies to other constructs such as,
 say, loading a little-endian value by doing

  bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24)
  
 It's great when you can get people to agree to always write it exactly that way, but there are
 many variants floating around, and many such cases to catch. You can make pure
 pattern matching work if you make clear from the outset that there is one blessed way
 to do, say, an unaligned little-endian load (say the code sequence above), and ensure that
 all compilers handle that correctly. But with C/C++ that ship has sailed; there are many
 variants in common use, and different compilers disagree on what the right thing to
 pattern-match is, if they implement it at all!
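
 To make "state the endianness directly" concrete, here is a sketch of load helpers
 that name the byte order of the data and never ask what the host is. The names
 load_le32 and load_be32 are hypothetical. The casts are there because a uint8_t
 operand promotes to int, and shifting a byte >= 0x80 left by 24 would overflow it.

```cpp
#include <cstdint>

// Hypothetical helper: read a 32-bit little-endian value from 4 bytes.
inline uint32_t load_le32(const uint8_t* bytes)
{
    return  static_cast<uint32_t>(bytes[0])
         | (static_cast<uint32_t>(bytes[1]) << 8)
         | (static_cast<uint32_t>(bytes[2]) << 16)
         | (static_cast<uint32_t>(bytes[3]) << 24);
}

// Hypothetical helper: read a 32-bit big-endian value from 4 bytes.
inline uint32_t load_be32(const uint8_t* bytes)
{
    return (static_cast<uint32_t>(bytes[0]) << 24)
         | (static_cast<uint32_t>(bytes[1]) << 16)
         | (static_cast<uint32_t>(bytes[2]) << 8)
         |  static_cast<uint32_t>(bytes[3]);
}
```

 Neither function contains a conditional swap: the caller says which byte order the
 data has, and the result is the same on any host.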

 Again, this is a lot simpler if there's a known construct that people are supposed to
 use, and making that an official part of the language is pretty much the only way you
 get both the compilers and the users to actually handle it well.