// aarch64 assumed, but reasonably general
// x0: address of the object to be initialised; x1, x2, x3: values to initialise its first three slots with
// we want to ensure no other thread ever sees the object at x0 uninitialised

// the dumb way: fence
str x1, [x0]
str x2, [x0, 8]
str x3, [x0, 16]
dmb ishst                // orders the stores above before any later store (e.g. the one that publishes x0)

// but fences are cringe. how can we do 'better'? data dependencies!
swp x1, x4, [x0]
swp x2, x5, [x0, 8]      // (for the pedants, this addressing mode doesn't exist; this is pseudocode)
swp x3, x6, [x0, 16]
add x4, x4, x5           // this could be anything (xor, sub, or, etc.)
add x4, x4, x6
sub x4, x4, x4           // produce a constant zero that is dependency-ordered after all three writes
add x0, x0, x4           // and ensure further accesses to the object are similarly dependency-ordered

// or ... control dependencies!
swp x1, x4, [x0]
swp x2, x5, [x0, 8]
swp x3, x6, [x0, 16]
add x4, x4, x5
add x4, x4, x6
cmp x4, x4
b.ne anywhere            // never taken, but the branch condition depends on all three swapped-out values

// or ... forbidden memory control dependency hacks!
swp x1, x4, [x0]
swp x2, x5, [x0, 8]
swp x3, x6, [x0, 16]
add x4, x4, x5
add x4, x4, x6
sub x4, x4, x4
ldr x4, [x4, x0]
// the dummy load can actually target any address; it just has to depend on x4, so e.g. this works just as well in its place
// (sp can only be the base register, never the index, hence [sp, x4] rather than [x4, sp])
ldr x4, [sp, x4]
// and it doesn't even have to be a load; it could be a store instead
// (note this really does write zero to [sp], so that slot must be free)
str xzr, [sp, x4]

// I don't know how widely the memory control dependency hack is known, if it's known at all, but I find it incredibly cute
// swp is likely too expensive for any of this to be worth it, sadly (awaiting benchmarks to the contrary!)
// but it would be really nice to have an str variant that also produces a dependency token you can use to enforce ordering
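
// where this fits, as a minimal sketch: after any of the variants above, the object pointer is
// published with an ordinary store, and the reader relies on an address dependency on its side.
// x7 (address of a shared slot that readers poll), x8, x9 and not_yet are hypothetical names
// introduced here for illustration; they are not part of the code above.
str x0, [x7]             // publish; the fence or one of the dependency tricks above is what orders this after the initialising stores
// reader thread (hypothetical):
ldr x8, [x7]             // load the published pointer
cbz x8, not_yet          // nothing published yet (assumes the slot starts out as zero)
ldr x9, [x8]             // address-dependent on x8, so ordered after the pointer load; assuming the writer's ordering holds, this sees the value stored from x1 (or a newer one)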