Artem Kharytoniuk kennyalive

Minimal D3D11 part 2

Follow-up to Minimal D3D11, adding instanced rendering. As before: An uncluttered Direct3D 11 setup & basic rendering primer / API familiarizer. Complete, runnable Windows application contained in a single function and laid out in a linear, step-by-step fashion. No modern C++ / OOP / obscuring cruft.

The main difference here is that the hollow cube is rendered using DrawIndexedInstanced (which saves a lot of vertices compared to the original, so model data is now small enough to be included in the source without being too much in the way), but also all trigonometry and matrix math is moved to the vertex shader, further simplifying the main code.

Each instance is merely this piece of geometry, consisting of 4 triangles:

..which is

I recently upgraded to Windows 11 and had to set up Visual Studio C++ again. A few things have changed in Windows 11 with the tiled window management and deeper Windows Terminal integration, so I ended up revisiting my usual VS setup and made a bunch of changes. I'm documenting them here for my own future benefit (yes, I do know about VS import/export settings) and on the off chance that anyone else might get some ideas for their own setup.

It should be mentioned that I'm a single monitor user (multimon gives me neck pain as I've gotten older) and the monitor I currently use is 25". A laptop screen can't really accommodate the 2x2 full screen layout I'm using as my default here, so when I'm on a laptop I only use the vertical layout out of necessity.

why doesn't radfft support AVX on PC?

So there's two separate issues here: using instructions added in AVX and using 256-bit wide vectors. The former turns out to be much easier than the latter for our use case.

Problem number 1 was that you positively need to put AVX code in a separate file with different compiler settings (/arch:AVX for VC++, -mavx for GCC/Clang) that make all SSE code emitted also use VEX encoding, and at the time radfft was written there was no way in CDep to set compiler flags for just one file, just for the overall build.

[There's the GCC "target" annotations on individual funcs, which in principle fix this, but I ran into nasty problems with this for several compiler versions, and VC++ has no equivalent, so we're not currently using that and just sticking with different compilation units.]

The other issue is to do with CPU power management.

What is the Strict Aliasing Rule and Why do we care?

(OR Type Punning, Undefined Behavior and Alignment, Oh My!)

What is strict aliasing? First we will describe what is aliasing and then we can learn what being strict about it means.

In C and C++ aliasing has to do with what expression types we are allowed to access stored values through. In both C and C++ the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term strict aliasing rule. If we attempt to access a value using a type not allowed it is classified as undefined behavior(UB). Once we have undefined behavior all bets are off, the results of our program are no longer reliable.

Unfortunately with strict aliasing violations, we will often obtain the results we expect, leaving the possibility the a future version of a compiler with a new optimization will break code we th

Commit Message Guidelines

Short (72 chars or less) summary

More detailed explanatory text. Wrap it to 72 characters. The blank
line separating the summary from the body is critical (unless you omit
the body entirely).

Write your commit message in the imperative: "Fix bug" and not "Fixed
bug" or "Fixes bug." This convention matches up with commit messages

	RWTexture2D<float3> tex : register(u0);

	// Algorithm "xor" from p. 4 of Marsaglia, "Xorshift RNGs", copied in from Wikipedia
	uint Xorshift(uint seed)
	{
	uint x = seed;
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return x;

	// The following code is licensed under the MIT license: https://gist.github.com/TheRealMJP/bc503b0b87b643d3505d41eab8b332ae

	// Samples a texture with Catmull-Rom filtering, using 9 texture fetches instead of 16.
	// See http://vec3.ca/bicubic-filtering-in-fewer-taps/ for more details
	float4 SampleTextureCatmullRom(in Texture2D<float4> tex, in SamplerState linearSampler, in float2 uv, in float2 texSize)
	{
	// We're going to sample a a 4x4 grid of texels surrounding the target UV coordinate. We'll do this by rounding
	// down the sample location to get the exact center of our "starting" texel. The starting texel will be at
	// location [1, 1] in the grid, where [0, 0] is the top left corner.
	float2 samplePos = uv * texSize;

	Latency Comparison Numbers (~2012)
	----------------------------------
	L1 cache reference 0.5 ns
	Branch mispredict 5 ns
	L2 cache reference 7 ns 14x L1 cache
	Mutex lock/unlock 25 ns
	Main memory reference 100 ns 20x L2 cache, 200x L1 cache
	Compress 1K bytes with Zippy 3,000 ns 3 us
	Send 1K bytes over 1 Gbps network 10,000 ns 10 us
	Read 4K randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD

Artem Kharytoniuk kennyalive

Minimal D3D11 part 2

What is the Strict Aliasing Rule and Why do we care?

(OR Type Punning, Undefined Behavior and Alignment, Oh My!)

Commit Message Guidelines

Orthodox C++

This article has been updated and is available here.