@pbackus
Created September 12, 2024 02:38

Revisions

  1. pbackus created this gist Sep 12, 2024.
    305 changes: 305 additions & 0 deletions enum-unions.md
    # Enumerated Unions

    | Field | Value |
    |-----------------|-----------------------------------------------------------------|
    | DIP: | (number/id -- assigned by DIP Manager) |
    | Author: | Paul Backus ([email protected]) |
    | Implementation: | (links to implementation PR if any) |
    | Status: | Draft |

    ## Abstract

    This DIP proposes a conservative design for sum types that aims to be
    consistent with existing D syntax and semantics. It does not discuss pattern
    matching.

    ## Contents
    * [Rationale](#rationale)
    * [Prior Work](#prior-work)
    * [Description](#description)
    * [Breaking Changes and Deprecations](#breaking-changes-and-deprecations)
    * [Reference](#reference)
    * [Copyright & License](#copyright--license)
    * [History](#history)

    ## Rationale

    Sum types have proven to be a useful and popular feature in many languages. In
    D, several library implementations are available, including Phobos's
    `std.variant` and `std.sumtype`, and vibe.d's `taggedalgebraic`.

    Making sum types a built-in language feature, rather than a library feature,
    would allow nicer syntax, better error messages, and faster compile times.

    ## Prior Work

    ### In D

    * [`std.variant.Algebraic`][algebraic]
    * [`taggedalgebraic`][taggedalgebraic]
    * [`std.sumtype`][std_sumtype]
    * [Walter Bright's sumtype DIP][walter_dip]

    [algebraic]: https://dlang.org/phobos/std_variant.html#Algebraic
    [taggedalgebraic]: https://code.dlang.org/packages/taggedalgebraic
    [std_sumtype]: https://dlang.org/phobos/std_sumtype.html
    [walter_dip]: https://github.com/WalterBright/documents/blob/96bca2f9f3520cf53ed5c4dec8e5e2d855e64e66/sumtype.md

    ### In other languages

    Other languages have taken a variety of different approaches to implementing
    sum types. This list includes representative examples of several approaches:

    * C++'s [std::variant][cpp_variant]
    * TypeScript's [union types][ts_union]
    * Rust's [enumerations][rust_enum]
    * Standard ML's [datatypes][ml_datatype]
    * Scala's [sealed traits and case classes][scala_sealed_trait]

    [cpp_variant]: https://en.cppreference.com/w/cpp/utility/variant
    [ts_union]: https://www.typescriptlang.org/docs/handbook/2/everyday-types.html#union-types
    [rust_enum]: https://doc.rust-lang.org/reference/items/enumerations.html
    [ml_datatype]: https://en.wikibooks.org/wiki/Standard_ML_Programming/Types#Datatype_declarations
    [scala_sealed_trait]: https://docs.scala-lang.org/tour/pattern-matching.html#matching-on-case-classes

    ## Description

    Enumerated unions are a specialized kind of union. **Except when otherwise
    specified, enumerated unions behave the same way as unions.**

    ### Syntax

    An enumerated union is declared by using the keywords `enum union` instead of
    `union` in a union declaration.

    **Example:**

    ```d
    enum union WebAddress
    {
        ubyte[4] ipv4;
        ubyte[16] ipv6;
        string url;
    }
    ```
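
    For comparison, the closest equivalent that compiles today uses
    [`std.sumtype`][std_sumtype]. The sketch below is illustrative only:
    `LibWebAddress` and `kindOf` are hypothetical names that do not appear in this
    proposal.

    ```d
    import std.sumtype;

    // Library-based counterpart of WebAddress. Members are identified by
    // their type, not by a field name such as `ipv4` or `url`.
    alias LibWebAddress = SumType!(ubyte[4], ubyte[16], string);

    // Hypothetical helper: reports which member is currently stored.
    string kindOf(LibWebAddress addr)
    {
        return addr.match!(
            (ubyte[4] ip) => "ipv4",
            (ubyte[16] ip) => "ipv6",
            (string url) => "url"
        );
    }

    unittest
    {
        ubyte[4] loopback = [127, 0, 0, 1];
        string site = "https://dlang.org/";

        assert(kindOf(LibWebAddress(loopback)) == "ipv4");
        assert(kindOf(LibWebAddress(site)) == "url");
    }
    ```

    Because `SumType` distinguishes members by type, two members of the same type
    cannot be given different names, which the field-based `enum union` syntax
    permits.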

    ### Fields

    [Anonymous struct and union fields][anonymous_fields] are not allowed in an
    enumerated union. This ensures that there is always exactly one active field in
    any `enum union` object.

    [anonymous_fields]: https://dlang.org/spec/struct.html#anonymous

    ### `__tag` property

    The `__tag` property is used to determine at runtime which field of an
    enumerated union is active.

    For any `enum union` expression `e`, the expression `e.__tag` is an rvalue of
    type `size_t` which evaluates to the index of the active field in `e.tupleof`.

    **Example:** Using the `__tag` property to check if a field is active.

    ```d
    bool has(string target)(ref WebAddress addr)
    if (target == "ipv4" || target == "ipv6" || target == "url")
    {
        switch (addr.__tag)
        {
            static foreach (i, field; WebAddress.tupleof)
            {
                case i:
                {
                    // True only in the case generated for the requested field.
                    enum isTarget = __traits(identifier, field) == target;
                    return isTarget;
                }
            }
            default:
                assert(0);
        }
    }

    unittest
    {
        WebAddress a = { url: "https://dlang.org/" };
        assert( a.has!"url");
        assert(!a.has!"ipv4");
    }
    ```
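
    The bookkeeping that `__tag` would take over can be written by hand in current
    D. The following is a minimal sketch of such an emulation, not the proposed
    implementation: `TaggedWebAddress`, `tag`, `fromIpv4`, and `fromUrl` are
    hypothetical names, and the tag is assumed to be a plain `size_t` index.

    ```d
    // Hand-written emulation of an enumerated union in current D.
    struct TaggedWebAddress
    {
        private union Payload
        {
            ubyte[4] ipv4;
            ubyte[16] ipv6;
            string url;
        }

        private Payload payload;
        private size_t tag_; // index of the active field within Payload

        // Plays the role of the proposed `__tag` property: a read-only rvalue.
        size_t tag() const { return tag_; }

        static TaggedWebAddress fromIpv4(ubyte[4] value)
        {
            TaggedWebAddress result;
            result.payload.ipv4 = value;
            result.tag_ = 0;
            return result;
        }

        static TaggedWebAddress fromUrl(string value)
        {
            TaggedWebAddress result;
            result.payload.url = value;
            result.tag_ = 2;
            return result;
        }
    }

    unittest
    {
        auto a = TaggedWebAddress.fromUrl("https://dlang.org/");
        auto b = TaggedWebAddress.fromIpv4([127, 0, 0, 1]);
        assert(a.tag == 2);
        assert(b.tag == 0);
    }
    ```

    In this emulation the programmer must keep `tag_` in sync with every write to
    `payload`; with an `enum union`, that responsibility would move to the
    compiler.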

    ### Memory layout

    In addition to its declared fields, an enumerated union may contain an
    additional hidden field called the *tag field.*

    The tag field is used to store any additional data necessary to keep track of
    the `enum union`'s active field at runtime. It may be omitted if the compiler
    determines that no additional data is needed (for example, if the `enum union`
    has only one declared field).

    The tag field's storage does not overlap with any of the declared fields.

    The type of the tag field must be a POD type, but is otherwise unspecified.

    The size, offset, and alignment of the tag field are unspecified.

    If two `enum union` values are of the same type, and both have the same active
    field, then the values stored in their tag fields must have identical binary
    representations.

    Aside from the restriction above, the values stored in an `enum union`'s tag
    field are unspecified.

    It is undefined behavior to store any value in the tag field of an `enum union`
    object that was not read from the tag field of an object of the same type.

    The tag field is not included in `.tupleof`.

    Unless otherwise specified, any reference to the "fields" of an `enum union` in
    this document refers only to the declared fields, and does not include the tag
    field.
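
    As a rough illustration of these layout rules, the sketch below builds a
    hand-written equivalent in current D and checks its offsets at compile time.
    `Payload` and `EmulatedLayout` are hypothetical names, and the choice of
    `ubyte` for the tag is an assumption; the proposal leaves the tag's type,
    size, and position unspecified.

    ```d
    // The declared fields, which all overlap one another.
    union Payload
    {
        ubyte[4] ipv4;
        ubyte[16] ipv6;
        string url;
    }

    struct EmulatedLayout
    {
        Payload payload;
        ubyte tag; // stand-in for the hidden tag field (type is an assumption)
    }

    // Every declared field starts at the same address.
    static assert(Payload.ipv4.offsetof == 0);
    static assert(Payload.ipv6.offsetof == 0);
    static assert(Payload.url.offsetof == 0);

    // The tag's storage lies outside the declared fields, so the object grows.
    static assert(EmulatedLayout.tag.offsetof >= Payload.sizeof);
    static assert(EmulatedLayout.sizeof > Payload.sizeof);
    ```

    A compiler is free to choose a different placement, or to omit the tag
    entirely when it is not needed.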

    ### Special member functions

    Unlike traditional unions, enumerated unions may have copy constructors,
    postblits, destructors, and invariants.

    If an `enum union` does not have a copy constructor or a postblit, but one or
    more of its fields has elaborate copy semantics, a copy constructor is
    generated which performs the following steps:

    1. Copy-initializes the active field from the active field of the original
    object. If the active field has a copy constructor or postblit, it is called
    during this step.
    2. Copy-initializes the tag field (if any) from the tag field of the original
    object.

    A type has elaborate copy semantics if it has a postblit or copy constructor,
    or if it directly embeds a type with elaborate copy semantics. This is the same
    definition used by
    [`std.traits.hasElaborateCopyConstructor`][elaborate_copy_constructor].

    If necessary, the compiler should generate multiple copy constructor overloads
    to handle different combinations of type qualifiers on the new and original
    objects.

    If an `enum union` does not have a destructor, but one or more of its fields
    has elaborate destruction semantics, a destructor is generated which performs
    the following steps:

    1. If the active field has elaborate destruction semantics, destroys the active
    field.

    A type has elaborate destruction semantics if

    1. it has a destructor or directly embeds a type with elaborate destruction
    semantics; and,
    2. it is not a class type or a non-enumerated union type.

    This is the same definition used by
    [`std.traits.hasElaborateDestructor`][elaborate_destructor].

    [elaborate_copy_constructor]: https://dlang.org/phobos/std_traits.html#hasElaborateCopyConstructor
    [elaborate_destructor]: https://dlang.org/phobos/std_traits.html#hasElaborateDestructor

    ### Equality

    Enumerated union values of the same type can be compared for equality.

    Two `enum union` values of the same type are equal if they have the same active
    field, and the values of their active fields are equal.
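
    This rule can be sketched in current D as a hand-written `opEquals` on an
    emulated tagged union. Everything named here (`EmulatedWebAddress`, `tag`, and
    the `from*` factory functions) is hypothetical and serves only to make the
    example self-contained.

    ```d
    struct EmulatedWebAddress
    {
        private union Payload
        {
            ubyte[4] ipv4;
            ubyte[16] ipv6;
            string url;
        }

        private Payload payload;
        private size_t tag; // index of the active field

        static EmulatedWebAddress fromIpv4(ubyte[4] value)
        {
            EmulatedWebAddress result;
            result.payload.ipv4 = value;
            result.tag = 0;
            return result;
        }

        static EmulatedWebAddress fromIpv6(ubyte[16] value)
        {
            EmulatedWebAddress result;
            result.payload.ipv6 = value;
            result.tag = 1;
            return result;
        }

        static EmulatedWebAddress fromUrl(string value)
        {
            EmulatedWebAddress result;
            result.payload.url = value;
            result.tag = 2;
            return result;
        }

        bool opEquals(const EmulatedWebAddress rhs) const
        {
            // Different active fields are never equal, whatever the bytes hold.
            if (tag != rhs.tag)
                return false;

            // Same active field: compare only that field's value.
            switch (tag)
            {
                case 0: return payload.ipv4 == rhs.payload.ipv4;
                case 1: return payload.ipv6 == rhs.payload.ipv6;
                case 2: return payload.url == rhs.payload.url;
                default: assert(0);
            }
        }
    }

    unittest
    {
        ubyte[4] zero4;
        ubyte[16] zero16;

        // All-zero payloads, but different active fields.
        assert(EmulatedWebAddress.fromIpv4(zero4) != EmulatedWebAddress.fromIpv6(zero16));

        // Same active field, equal contents stored in different memory.
        auto a = EmulatedWebAddress.fromUrl("https://dlang.org/");
        auto b = EmulatedWebAddress.fromUrl("https://dlang.org/".idup);
        assert(a == b);
    }
    ```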

    ### Safety

    Direct access to fields of an enumerated union is subject to the same safety
    restrictions as access to fields of a traditional union.

    A value of an enumerated union type is a [safe value][safe_values] if

    1. its `__tag` property evaluates to the index of the active field, and
    2. the value of its active field is safe.

    `@trusted` code may assume that the field indicated by the `__tag` property is
    the active field, and may rely on that assumption to allow access to the active
    field in `@safe` code.

    **Example:**

    ```d
    @trusted ref get(string target)(ref WebAddress addr)
    if (target == "ipv4" || target == "ipv6" || target == "url")
    {
        switch (addr.__tag)
        {
            static foreach (i, field; WebAddress.tupleof)
            {
                case i:
                {
                    // Name of the field whose index matches the current tag.
                    enum active = __traits(identifier, field);
                    enum isTarget = active == target;
                    static if (!isTarget)
                        assert(0, "Active field is " ~ active ~ ", not " ~ target);
                    else
                        return addr.tupleof[i];
                }
            }
            default:
                assert(0);
        }
    }

    @safe unittest
    {
        WebAddress a1 = { url: "https://www.rust-lang.org/" };
        WebAddress a2 = { ipv4: [127, 0, 0, 1] };
        assert(a1.get!"url" == "https://www.rust-lang.org/");
        assert(a2.get!"ipv4" == [127, 0, 0, 1]);
        a1.get!"url" = "https://dlang.org/";
    }
    ```

    Writing to an `enum union` object is `@system` if the `enum union` has fields
    whose types have unsafe values, since doing so could invalidate existing
    pointers or references to the active field.

    Access to the tag field of an `enum union`, if it exists, is always `@system`.

    [safe_values]: https://dlang.org/spec/function.html#safe-values

    ### Reflection

    A new *TypeSpecialization*, `enum union`, is added to the syntax for the `is()`
    expression.

    `is(T == enum union)` evaluates to `true` if `T` is an enumerated union type.

    `is(T : enum union)` evaluates to `true` if `T` is an enumerated union type, or
    implicitly converts to an enumerated union type.
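
    The new specialization would presumably be used the same way as the existing
    `union` specialization. The sketch below shows only that existing form, which
    compiles today; `PlainUnion`, `PlainStruct`, and `isUnion` are hypothetical
    names, and whether an enumerated union would also satisfy `is(T == union)` is
    not stated in this proposal.

    ```d
    union PlainUnion { int i; float f; }
    struct PlainStruct { int i; }

    // Hypothetical helper built on the existing `union` specialization.
    enum isUnion(T) = is(T == union);

    static assert(isUnion!PlainUnion);
    static assert(!isUnion!PlainStruct);
    ```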

    ## Breaking Changes and Deprecations

    Currently, the syntax `enum union { /* ... */ }` is parsed by the D compiler as
    a union declaration with the `enum` storage class applied to it.

    Since the `enum` storage class has no effect in this context, it is unlikely
    that existing D projects will be affected if this syntax is given a new
    meaning. However, it is not impossible.

    ## Reference

    * [Sum Types - first draft][walter_dip_thread] by Walter Bright.

    [walter_dip_thread]: https://forum.dlang.org/thread/[email protected]

    ## Copyright & License

    Copyright (c) 2024 by the D Language Foundation

    Licensed under [Creative Commons Zero 1.0](https://creativecommons.org/publicdomain/zero/1.0/legalcode.txt)

    ## History

    The DIP Manager will supplement this section with links to forum discussions and a summary of the formal assessment.