Created September 12, 2024 02:38
# Enumerated Unions

| Field           | Value                                                           |
|-----------------|-----------------------------------------------------------------|
| DIP:            | (number/id -- assigned by DIP Manager)                          |
| Author:         | Paul Backus ([email protected])                          |
| Implementation: | (links to implementation PR if any)                             |
| Status:         | Draft                                                           |

## Abstract

This DIP proposes a conservative design for sum types that aims to be consistent with existing D syntax and semantics. It does not discuss pattern matching.

## Contents

* [Rationale](#rationale)
* [Prior Work](#prior-work)
* [Description](#description)
* [Breaking Changes and Deprecations](#breaking-changes-and-deprecations)
* [Reference](#reference)
* [Copyright & License](#copyright--license)
* [History](#history)

## Rationale

Sum types have proven to be a useful and popular feature in many languages. In D, several library implementations are available, including Phobos's `std.variant` and `std.sumtype`, and vibe.d's `taggedalgebraic`.

Benefits to having sum types as a built-in language feature (rather than a library feature) would include nicer syntax, better error messages, and better compile-time performance.
## Prior Work

### In D

* [`std.variant.Algebraic`][algebraic]
* [`taggedalgebraic`][taggedalgebraic]
* [`std.sumtype`][std_sumtype]
* [Walter Bright's sumtype DIP][walter_dip]

[algebraic]: https://dlang.org/phobos/std_variant.html#Algebraic
[taggedalgebraic]: https://code.dlang.org/packages/taggedalgebraic
[std_sumtype]: https://dlang.org/phobos/std_sumtype.html
[walter_dip]: https://github.com/WalterBright/documents/blob/96bca2f9f3520cf53ed5c4dec8e5e2d855e64e66/sumtype.md

### In other languages

Other languages have taken a variety of approaches to implementing sum types. This list includes representative examples of several approaches:

* C++'s [std::variant][cpp_variant]
* TypeScript's [union types][ts_union]
* Rust's [enumerations][rust_enum]
* Standard ML's [datatypes][ml_datatype]
* Scala's [sealed traits and case classes][scala_sealed_trait]

[cpp_variant]: https://en.cppreference.com/w/cpp/utility/variant
[ts_union]: https://www.typescriptlang.org/docs/handbook/2/everyday-types.html#union-types
[rust_enum]: https://doc.rust-lang.org/reference/items/enumerations.html
[ml_datatype]: https://en.wikibooks.org/wiki/Standard_ML_Programming/Types#Datatype_declarations
[scala_sealed_trait]: https://docs.scala-lang.org/tour/pattern-matching.html#matching-on-case-classes

## Description

Enumerated unions are a specialized kind of union. **Except when otherwise specified, enumerated unions behave the same way as unions.**

### Syntax

An enumerated union is declared by using the keywords `enum union` instead of `union` in a union declaration.

**Example:**

```d
enum union WebAddress
{
    ubyte[4] ipv4;
    ubyte[16] ipv6;
    string url;
}
```

### Fields

[Anonymous struct and union fields][anonymous_fields] are not allowed in an enumerated union. This ensures that there is always exactly one active field in any `enum union` object.
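For comparison, the following sketch approximates the `WebAddress` example in current D by pairing an ordinary union with an explicit index that records the active field. It is only an illustration of the "exactly one active field" invariant, not a description of how the proposal must be implemented. The names `WebAddressFields`, `WebAddressEmu`, `tag`, and `set` are hypothetical; `tag` is used instead of `__tag` because identifiers beginning with two underscores are reserved in D.

```d
import std.traits : FieldNameTuple;

// Ordinary union with the same fields as the `WebAddress` example.
union WebAddressFields
{
    ubyte[4] ipv4;
    ubyte[16] ipv6;
    string url;
}

// Hypothetical emulation of `enum union WebAddress` in current D.
// `activeIndex` stands in for the hidden tag field.
struct WebAddressEmu
{
    // Default initialization sets `ipv4` (the first field) and index 0,
    // so the object starts out with exactly one active field.
    private WebAddressFields fields;
    private size_t activeIndex;

    // Plays the role of the proposed `__tag` property: returns the
    // index of the active field in `WebAddressFields.tupleof`.
    size_t tag() const { return activeIndex; }

    // Writes `value` to the field called `name` and records that field
    // as the active one, so exactly one field is active afterwards.
    void set(string name, T)(T value)
    {
        static assert(__traits(hasMember, WebAddressFields, name),
            "WebAddressFields has no field named " ~ name);

        static foreach (i, fieldName; FieldNameTuple!WebAddressFields)
        {
            static if (fieldName == name)
            {
                __traits(getMember, fields, name) = value;
                activeIndex = i;
            }
        }
    }
}
```

In this emulation the index is an ordinary member that the struct's own code must keep in sync by hand; under the proposal, the tag field is hidden and its representation is left to the compiler.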
[anonymous_fields]: https://dlang.org/spec/struct.html#anonymous

### `__tag` property

The `__tag` property is used to determine at runtime which field of an enumerated union is active. For any `enum union` expression `e`, the expression `e.__tag` is an rvalue of type `size_t` which evaluates to the index of the active field in `e.tupleof`.

**Example:** Using the `__tag` property to check if a field is active.

```d
bool has(string target)(ref WebAddress addr)
    if (target == "ipv4" || target == "ipv6" || target == "url")
{
    switch (addr.__tag)
    {
        // Generate one `case` per declared field, indexed as in `.tupleof`
        static foreach (i, field; WebAddress.tupleof)
        {
            case i:
            {
                // Field `i` is active; report whether it is `target`
                enum isTarget = __traits(identifier, field) == target;
                return isTarget;
            }
        }
        default:
            assert(0);
    }
}

unittest
{
    WebAddress a = { url: "https://dlang.org/" };
    assert( a.has!"url");
    assert(!a.has!"ipv4");
}
```

### Memory layout

In addition to its declared fields, an enumerated union may contain an additional hidden field called the *tag field*. The tag field is used to store any additional data necessary to keep track of the `enum union`'s active field at runtime. It may be omitted if the compiler determines that no additional data is needed (for example, if the `enum union` has only one declared field).

The tag field's storage does not overlap with any of the declared fields.

The type of the tag field must be a POD type, but is otherwise unspecified. The size, offset, and alignment of the tag field are unspecified.

If two `enum union` values are of the same type, and both have the same active field, then the values stored in their tag fields must have identical binary representations. Aside from this restriction, the values stored in an `enum union`'s tag field are unspecified.

It is undefined behavior to store any value in the tag field of an `enum union` object that was not read from the tag field of an object of the same type.

The tag field is not included in `.tupleof`.
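As a rough illustration of these layout rules, the sketch below places a one-byte tag after the declared fields and uses `static assert` to check that the two do not overlap. The struct `WebAddressLayout` and the choice of `ubyte` for the tag are assumptions made for illustration only, since the proposal leaves the tag's type, size, offset, and alignment unspecified. Unlike a real tag field, the `tag` member here does appear in `WebAddressLayout.tupleof`; only `WebAddressFields.tupleof` is limited to the declared fields.

```d
// Ordinary union holding the declared fields of the `WebAddress` example;
// all three fields share the same storage.
union WebAddressFields
{
    ubyte[4] ipv4;
    ubyte[16] ipv6;
    string url;
}

// Hypothetical layout for `enum union WebAddress`: the declared fields
// followed by a one-byte tag. Three fields need only the values 0 .. 2,
// so a `ubyte` is enough here.
struct WebAddressLayout
{
    WebAddressFields fields;
    ubyte tag;
}

// The tag starts after the last byte of the declared fields, so writing
// the tag can never clobber the active field, and vice versa.
static assert(WebAddressLayout.tag.offsetof >= WebAddressFields.sizeof);

// Only the declared fields appear in the union's `.tupleof`.
static assert(WebAddressFields.tupleof.length == 3);
```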
Unless otherwise specified, any reference to the "fields" of an `enum union` in this document refers only to the declared fields, and does not include the tag field.

### Special member functions

Unlike traditional unions, enumerated unions may have copy constructors, postblits, destructors, and invariants.

If an `enum union` does not have a copy constructor or a postblit, but one or more of its fields has elaborate copy semantics, a copy constructor is generated which performs the following steps:

1. Copy-initializes the active field from the active field of the original object. If the active field has a copy constructor or postblit, it is called during this step.
2. Copy-initializes the tag field (if any) from the tag field of the original object.

A type has elaborate copy semantics if it has a postblit or copy constructor, or if it directly embeds a type with elaborate copy semantics. This is the same definition used by [`std.traits.hasElaborateCopyConstructor`][elaborate_copy_constructor].

If necessary, the compiler should generate multiple copy constructor overloads to handle different combinations of type qualifiers on the new and original objects.

If an `enum union` does not have a destructor, but one or more of its fields has elaborate destruction semantics, a destructor is generated which performs the following steps:

1. If the active field has elaborate destruction semantics, destroys the active field.

A type has elaborate destruction semantics if

1. it has a destructor or directly embeds a type with elaborate destruction semantics; and
2. it is not a class type or a non-enumerated union type.

This is the same definition used by [`std.traits.hasElaborateDestructor`][elaborate_destructor].

[elaborate_copy_constructor]: https://dlang.org/phobos/std_traits.html#hasElaborateCopyConstructor
[elaborate_destructor]: https://dlang.org/phobos/std_traits.html#hasElaborateDestructor

### Equality

Enumerated union values of the same type can be compared for equality.
Two `enum union` values of the same type are equal if they have the same active field, and the values of their active fields are equal.

### Safety

Direct access to fields of an enumerated union is subject to the same safety restrictions as access to fields of a traditional union.

A value of an enumerated union type is a [safe value][safe_values] if

1. its `__tag` property evaluates to the index of the active field, and
2. the value of its active field is safe.

`@trusted` code may assume that the field indicated by the `__tag` property is the active field, and may rely on that assumption to allow access to the active field in `@safe` code.

**Example:**

```d
@trusted ref get(string target)(ref WebAddress addr)
    if (target == "ipv4" || target == "ipv6" || target == "url")
{
    switch (addr.__tag)
    {
        static foreach (i, field; WebAddress.tupleof)
        {
            case i:
            {
                enum isTarget = __traits(identifier, field) == target;
                // Requesting an inactive field is a logic error; otherwise
                // `__tag` guarantees that field `i` is active, so a
                // reference to it can be handed to @safe code.
                static if (!isTarget)
                    assert(0, "Active field is " ~ __traits(identifier, field)
                        ~ ", not " ~ target);
                else
                    return addr.tupleof[i];
            }
        }
        default:
            assert(0);
    }
}

@safe unittest
{
    WebAddress a1 = { url: "https://www.rust-lang.org/" };
    WebAddress a2 = { ipv4: [127, 0, 0, 1] };

    assert(a1.get!"url" == "https://www.rust-lang.org/");
    assert(a2.get!"ipv4" == [127, 0, 0, 1]);

    // `get` returns by `ref`, so the active field can be updated in place
    a1.get!"url" = "https://dlang.org/";
}
```

Writing to an `enum union` object is `@system` if the `enum union` has fields whose types have unsafe values, since doing so could invalidate existing pointers or references to the active field.

Access to the tag field of an `enum union`, if it exists, is always `@system`.

[safe_values]: https://dlang.org/spec/function.html#safe-values

### Reflection

A new *TypeSpecialization*, `enum union`, is added to the syntax for the `is()` expression.

`is(T == enum union)` evaluates to `true` if `T` is an enumerated union type.

`is(T : enum union)` evaluates to `true` if `T` is an enumerated union type, or implicitly converts to an enumerated union type.
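The new specialization parallels the `union` specialization that `is()` already accepts. The sketch below, using the illustrative names `U`, `S`, and `Wrapper`, shows how the existing forms behave in current D. It assumes, as a reading of the rules above rather than something the proposal spells out, that a type like `Wrapper` whose `alias this` refers to an enumerated union would satisfy `is(T : enum union)` but not `is(T == enum union)`.

```d
// `union` is already a TypeSpecialization for `is()` in current D.
union U
{
    int i;
    float f;
}

struct S
{
    int x;
}

// Hypothetical wrapper that implicitly converts to `U` via `alias this`.
struct Wrapper
{
    U u;
    alias u this;
}

static assert(is(U == union));        // U is a union type
static assert(!is(S == union));       // S is a struct, not a union
static assert(!is(Wrapper == union)); // `==` does not look through `alias this`
static assert(is(Wrapper : U));       // but `:` accepts implicit conversions
```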
## Breaking Changes and Deprecations

Currently, the syntax `enum union { /* ... */ }` is parsed by the D compiler as a union declaration with the `enum` storage class applied to it. Since the `enum` storage class has no effect in this context, it is unlikely that existing D projects will be affected if this syntax is given a new meaning. However, it is not impossible.

## Reference

* [Sum Types - first draft][walter_dip_thread] by Walter Bright.

[walter_dip_thread]: https://forum.dlang.org/thread/[email protected]

## Copyright & License

Copyright (c) 2024 by the D Language Foundation

Licensed under [Creative Commons Zero 1.0](https://creativecommons.org/publicdomain/zero/1.0/legalcode.txt)

## History

The DIP Manager will supplement this section with links to forum discussions and a summary of the formal assessment.