@stevedonovan
Created July 1, 2018 12:38

Revisions

  1. stevedonovan created this gist Jul 1, 2018.
    445 changes: 445 additions & 0 deletions common-rust-traits.md
    ## What is a Trait?

    In Rust, types containing data - structs, enums and any other 'aggregate'
    types like tuples and arrays - are dumb. They may have methods but that
    is just a convenience; they are just functions. Types have no
    relationship with each other.

    _Traits_ are the abstract mechanism for adding functionality to types
    and establishing relationships between them.

    ## Printing Out: Display

    For a value to be printed out using "{}", it must _implement_ the [Display](?) trait.

    If we're only interested in how a value displays itself, then there are
    two ways to define functions taking such values. In this example, we
    want to print out slices of references to displayable values.

    The first is _generic_ where the element
    type of the slice is _any_ type that implements `Display`:

    ```rust
    use std::fmt::Display;

    fn display_items_generic<T: Display>(items: &[&T]) {
        for item in items.iter() {
            println!("{}", item);
        }
    }

    display_items_generic(&[&10, &20]);
    ```
    Here the trait `Display` is acting as a constraint on a generic type.
    Separate code is generated for each distinct type `T`. There is no
    direct analogue in mainstream languages here - the closest would be C++
    [concepts](https://en.wikipedia.org/wiki/Concepts_(C%2B%2B)), which solve
    the "compile-time duck-typing" problem with C++ templates.

    The second is _polymorphic_, where the element type of the slice is a
    reference to `Display`.

    ```rust
    fn display_items_polymorphic(items: &[&dyn Display]) {
        for item in items.iter() {
            println!("{}", item);
        }
    }

    // `&"hello"` rather than `"hello"`: a trait object must point at a sized value
    display_items_polymorphic(&[&10, &"hello"]);
    ```

    Code is only generated once for `display_items_polymorphic`, but we invoke
    different code for each type dynamically. Note that the slice can now contain
    references to _any_ value that implements `Display`. Here `Display` is
    acting very much like what is called an _interface_ in Java.

    The conversion involved is interesting: a reference to a concrete type
    becomes a _trait object_. It's non-trivial because the trait object
    has two parts - the original reference and a 'virtual method table'
    containing the methods of the trait (a so-called "fat pointer").

    ```rust
    let d: &dyn Display = &10;
    ```

    (The `dyn` keyword makes trait objects explicit. Older code wrote simply
    `&Display`, which hid a little _too_ much magic; that bare form was
    deprecated and is an error from the 2021 edition onwards.)
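
    The "fat pointer" description can be checked directly. This is a minimal
    sketch using only the standard library; the helper names `thin_ref_size`,
    `trait_object_ref_size` and `show` are made up for this illustration.

    ```rust
    use std::fmt::Display;
    use std::mem::size_of;

    /// Size in bytes of a plain reference to a concrete, sized type.
    fn thin_ref_size() -> usize {
        size_of::<&i32>()
    }

    /// Size in bytes of a reference to a trait object (data pointer + vtable pointer).
    fn trait_object_ref_size() -> usize {
        size_of::<&dyn Display>()
    }

    /// Formats a value through a trait object; the call is dispatched via the vtable.
    fn show(d: &dyn Display) -> String {
        format!("{}", d)
    }
    ```

    On current compilers `trait_object_ref_size()` comes out as twice
    `thin_ref_size()`, and `show(&10)` and `show(&"hello")` both go through the
    same compiled function.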

    How to decide between generic and polymorphic? The second is more flexible,
    but involves going through _virtual methods_ which is slightly slower.
    Generic functions/structs can implement 'zero overhead abstractions'
    since the compiler can inline such functions. The only honest answer is
    "it depends". Bear in mind that the actual cost of using trait objects
    might be negligible compared to the other work done by a program. (It's hard
    to make engineering decisions based on micro-benchmarks.)

    Defining `Display` for your own types is straightforward but needs to be
    explicit, since the compiler cannot reasonably guess what the
    output format should be (unlike with [Debug](?)).

    ```rust
    use std::fmt;

    struct MyType {
        x: u32,
        y: u32
    }

    impl fmt::Display for MyType {
        // the single required method of `Display` is called `fmt`
        fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
            write!(f, "x={},y={}", self.x, self.y)
        }
    }
    ```

    Any type that implements `Display` _automatically_ implements [ToString](?), so
    `42.to_string()` and `"hello".to_string()` both work as expected.

    (Rust traits often operate in little groups like this.)
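
    To see the `Display`/`ToString` pairing with a user-defined type, here is a
    small sketch. `Celsius` and `describe` are hypothetical names invented for
    the example; only `fmt::Display` and `ToString` come from the standard library.

    ```rust
    use std::fmt;

    // Hypothetical type, used only for illustration.
    struct Celsius(f64);

    impl fmt::Display for Celsius {
        fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
            // one decimal place, followed by the unit
            write!(f, "{:.1} C", self.0)
        }
    }

    // Accepts anything with `to_string()`; every `Display` type qualifies
    // because the standard library implements `ToString` for all of them.
    fn describe<T: ToString>(value: T) -> String {
        value.to_string()
    }
    ```

    No `ToString` implementation was written for `Celsius`, yet
    `Celsius(21.5).to_string()` and `describe(Celsius(21.5))` are both available.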

    ## Conversion: From and Into

    An important pair of traits is `From/Into`. The [From](?) trait expresses the conversion
    of one value into another using the `from` method, so we have `String::from("hello")`.
    If `From` is implemented, then the [Into](?) trait is auto-implemented.

    Since `String` implements `From<&str>`, then `&str` automatically implements `Into<String>`.

    ```rust
    let s = String::from("hello"); // From
    let s: String = "hello".into(); // Into
    ```
    The [json](?) crate provides a nice example. A JSON object is indexed with strings,
    and new fields can be created by inserting `JsonValue` values:

    ```rust
    obj["surname"] = JsonValue::from("Smith"); // From
    obj["name"] = "Joe".into(); // Into
    obj["age"] = 35.into(); // Into
    ```
    Note how convenient it is to use `into` here, instead of `from`! We are doing
    a conversion which Rust will not do implicitly, but `into()` is a small word,
    easy to type and read.

    `From` expresses a conversion that _always_ succeeds. It may be relatively expensive, though:
    converting a string slice to a `String` will allocate a buffer and copy the bytes. The
    conversion always takes place by value.
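
    The same mechanism applies to your own types. In this sketch the unit types
    `Feet` and `Meters` and the function `is_tall` are hypothetical; the point is
    that implementing `From` once gives `into()` for free and lets functions
    accept `impl Into<...>`.

    ```rust
    // Hypothetical unit types for illustration.
    #[derive(Debug, PartialEq)]
    struct Feet(f64);

    #[derive(Debug, PartialEq)]
    struct Meters(f64);

    // The only conversion we write by hand.
    impl From<Feet> for Meters {
        fn from(f: Feet) -> Meters {
            Meters(f.0 * 0.3048)
        }
    }

    // Accepts anything convertible into `Meters`, including `Meters` itself,
    // since every type trivially implements `From<Self>`.
    fn is_tall(height: impl Into<Meters>) -> bool {
        height.into().0 > 1.8
    }
    ```

    Note that the conversion consumes its argument: after `let m: Meters = feet.into();`
    the original `feet` value has been moved.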

    `From/Into` has an intimate relationship with Rust error handling.

    This statement:

    ```rust
    let res = returns_some_result()?;
    ```
    is basically sugar for this:

    ```rust
    let res = match returns_some_result() {
        Ok(r) => r,
        Err(e) => return Err(e.into())
    };
    ```
    That is, any error type which can convert _into_ the returned error type works.

    A useful strategy for informal error handling is to make the function return
    `Result<T,Box<dyn Error>>`. Any type that implements `Error` can be converted
    into the trait object `Box<dyn Error>`.
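
    Both halves of this can be shown in one sketch: a custom error type that `?`
    converts into via `From`, and a boxed error that accepts any `Error`. The
    names `ConfigError`, `parse_port` and `parse_port_boxed` are hypothetical and
    exist only for this example.

    ```rust
    use std::error::Error;
    use std::fmt;
    use std::num::ParseIntError;

    // Hypothetical application error type.
    #[derive(Debug)]
    enum ConfigError {
        BadNumber(ParseIntError),
        OutOfRange(i64),
    }

    impl fmt::Display for ConfigError {
        fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
            match self {
                ConfigError::BadNumber(e) => write!(f, "not a number: {}", e),
                ConfigError::OutOfRange(n) => write!(f, "port {} is out of range", n),
            }
        }
    }

    impl Error for ConfigError {}

    // This impl is what lets `?` turn a ParseIntError into a ConfigError.
    impl From<ParseIntError> for ConfigError {
        fn from(e: ParseIntError) -> Self {
            ConfigError::BadNumber(e)
        }
    }

    fn parse_port(text: &str) -> Result<u16, ConfigError> {
        let n: i64 = text.trim().parse()?; // ParseIntError -> ConfigError
        if n < 1 || n > 65535 {
            return Err(ConfigError::OutOfRange(n));
        }
        Ok(n as u16)
    }

    // The informal style: any `Error` converts into `Box<dyn Error>`.
    fn parse_port_boxed(text: &str) -> Result<u16, Box<dyn Error>> {
        let port = parse_port(text)?; // ConfigError -> Box<dyn Error>
        Ok(port)
    }
    ```

    The boxed version needs no `From` impls of its own, which is why it suits
    small programs; the enum version lets callers match on the failure kind.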

    ## Making Copies: Clone and Copy

    `From` (and its mirror image `Into`) describe how distinct types are converted into
    each other. `Clone` describes how a new value of the same type can be created.
    Rust likes to make any potentially expensive operation obvious, so cloning is
    always an explicit call: `val.clone()`.

    This can simply involve moving some bits around ("bitwise copy").
    A number is just a bit pattern in memory.

    But `String` is different, since as well as size and capacity fields,
    it has dynamically-allocated string data. Cloning a string involves
    allocating a new buffer and copying the original bytes into it.

    Making your types cloneable is easy, as long as every type in a struct or enum
    implements `Clone`:

    ```rust
    #[derive(Debug,Clone)]
    struct Person {
        first_name: String,
        last_name: String,
    }
    ```

    `Copy` is a _marker trait_ (there are no methods to implement) which says that
    a type may be copied by just moving bits. You can define it for your own
    structs:

    ```rust
    #[derive(Debug,Clone,Copy)]
    struct Point {
        x: f32,
        y: f32,
        z: f32
    }
    ```
    Again, this is only possible if every field type implements `Copy`. You cannot
    sneak in a non-`Copy` type like `String` here!

    This trait interacts with a key Rust feature: moving. Moving a value is always
    done by simply moving bits around. If the value is `Copy`, then the original
    location remains valid.

    ```rust
    let n1 = 42;
    let n2 = n1;
    // n1 is still fine (i32 is Copy)
    let s1 = "hello".to_string();
    let s2 = s1;
    // value moved into s2, s1 can no longer be used!
    ```
    Bad things would happen if `s1` was still valid - both `s1` and `s2` would
    be dropped at the end of scope and their shared buffer would be deallocated twice!
    C++ handles this situation by always copying; in Rust you
    must say `s1.clone()`.
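
    A short sketch of the difference in practice. The `Point` and `Person` types
    here are cut-down, hypothetical versions of the ones above, and the two
    helper functions exist only to make the behaviour testable.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    struct Point {
        x: f32,
        y: f32,
    }

    #[derive(Debug, Clone, PartialEq)]
    struct Person {
        name: String,
    }

    fn copy_then_use_both() -> (Point, Point) {
        let p1 = Point { x: 1.0, y: 2.0 };
        let p2 = p1; // bitwise copy; p1 remains valid because Point is Copy
        (p1, p2)
    }

    fn clone_then_use_both() -> (Person, Person) {
        let a = Person { name: "Joe".to_string() };
        let b = a.clone(); // explicit clone; allocates a second string buffer
        (a, b)
    }
    ```

    Replacing `a.clone()` with plain `a` would move the value, and returning
    `(a, b)` would then fail to compile with a "use of moved value" error.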

    ## Fallible Conversions - FromStr

    If I have the integer `42`, then it is quite safe to convert this to an owned string,
    which is expressed by `ToString`. However, if I have the string "42" then in general
    the conversion into `i32` must be prepared to fail.

    Implementing [FromStr](?) takes two things: an implementation of the `from_str` method,
    and setting the associated type `Err` to the error type returned when the conversion fails.

    Usually it's used implicitly through the string `parse` method. This is a method with
    a generic output type, which needs to be tied down.

    E.g. using the so-called turbofish operator:

    ```rust
    let answer = match "42".parse::<i32>() {
        Ok(n) => n,
        // match arms are separated by commas, not semicolons
        Err(_) => panic!("'42' was not 42!"),
    };
    ```

    Or (more elegantly) in a function where we can use `?`:

    ```rust
    let answer: i32 = "42".parse()?;
    ```

    The Rust standard library defines `FromStr` for the numerical types and for network addresses.
    It is of course possible for external crates to define `FromStr` for their types and then
    they will work with `parse` as well. This is a cool thing about the standard traits - they
    are all open for further extension.
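
    As an illustration of that extension point, here is a sketch of `FromStr`
    for a user-defined type. `Version` and `VersionError` are hypothetical names,
    and the "major.minor" format is an assumption made for the example.

    ```rust
    use std::num::ParseIntError;
    use std::str::FromStr;

    // Hypothetical type parsed from text like "1.52".
    #[derive(Debug, PartialEq)]
    struct Version {
        major: u32,
        minor: u32,
    }

    #[derive(Debug, PartialEq)]
    enum VersionError {
        MissingDot,
        BadNumber(ParseIntError),
    }

    impl From<ParseIntError> for VersionError {
        fn from(e: ParseIntError) -> Self {
            VersionError::BadNumber(e)
        }
    }

    impl FromStr for Version {
        // the error type returned when the conversion fails
        type Err = VersionError;

        fn from_str(s: &str) -> Result<Self, Self::Err> {
            let (major, minor) = s.split_once('.').ok_or(VersionError::MissingDot)?;
            Ok(Version {
                major: major.parse()?, // reuses FromStr for u32
                minor: minor.parse()?,
            })
        }
    }
    ```

    With this in place, `"1.52".parse::<Version>()` works just like parsing a
    number, and `?` can be used on the result in the usual way.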

    ## Reference Conversions - AsRef

    [AsRef](?) expresses the situation where a cheap _reference_ conversion is possible
    between two types.

    The most common place you will see it in action is with `&Path`. In an ideal world,
    all file systems would enforce UTF-8 names and we could just use `String` to
    store them. However, we have not yet arrived at Utopia and Rust has a dedicated
    type `PathBuf` with specialized path handling methods, backed by `OsString`,
    which represents untrusted text from the OS. `&Path` is the borrowed counterpart
    to `PathBuf`. It is cheap to get a `&Path` reference from regular Rust strings
    so `AsRef` is appropriate:

    ```rust
    // asref.rs
    use std::path::{Path, PathBuf};

    fn exists(p: impl AsRef<Path>) -> bool {
        p.as_ref().exists()
    }

    assert!(exists("asref.rs"));
    assert!(exists(Path::new("asref.rs")));
    let ps = String::from("asref.rs");
    assert!(exists(&ps));
    assert!(exists(PathBuf::from("asref.rs")));
    ```

    This allows any function or method working with file system paths to be conveniently
    called with any type that implements `AsRef<Path>`. From the documentation:

    ```rust
    impl AsRef<Path> for Path
    impl AsRef<Path> for OsStr
    impl AsRef<Path> for OsString
    impl AsRef<Path> for str
    impl AsRef<Path> for String
    impl AsRef<Path> for PathBuf
    ```

    Follow this pattern when defining a public API, because people are accustomed to
    this little convenience.
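
    A sketch of what following the pattern might look like in your own API. The
    function `has_extension` is hypothetical; it touches no files, so it can be
    tried with any path-like argument.

    ```rust
    use std::path::{Path, PathBuf};

    // Hypothetical helper: true if the path's extension equals `ext`.
    fn has_extension(path: impl AsRef<Path>, ext: &str) -> bool {
        path.as_ref()
            .extension()               // Option<&OsStr>
            .and_then(|e| e.to_str())  // None if the extension is not valid UTF-8
            == Some(ext)
    }
    ```

    Callers can pass `"notes.md"`, a `String`, a `PathBuf` or a `&Path` without
    any explicit conversion at the call site.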

    `AsRef<str>` is implemented for `String`, so we can also say:

    ```rust
    fn is_hello(s: impl AsRef<str>) {
        assert_eq!("hello", s.as_ref());
    }

    is_hello("hello");
    is_hello(String::from("hello"));
    ```
    This seems attractive, but using this is very much a matter of taste. Idiomatic Rust code
    prefers to declare string arguments as `&str` and lean on _deref coercion_
    for convenient passing of `&String` references.

    ## Deref

    Many string methods in Rust are not actually defined on `String`. The methods
    explicitly defined typically _mutate_ the string, like `push` and `push_str`.
    But something like `starts_with` applies to string slices as well.

    At one point in Rust's history, this had to be done explicitly, so if you
    had a `String` called `s`, you would have to say `s.as_str().starts_with("hello")`.
    You will occasionally see `as_str()`, but mostly method resolution happens
    through the magic of _deref coercion_.

    The [Deref](?) trait is actually used to implement the "dereference" operator `*`.
    This has the same meaning as in C - extract the value which the reference is
    pointing to - although it doesn't appear explicitly as much. If `r` is a reference,
    then you say `r.foo()`, but if you did want the value, you have to say `*r`.
    (In this respect Rust references are more like C pointers than C++ references,
    which try to be indistinguishable from C++ values.)

    `String` implements `Deref<Target = str>`, so the type of `&*s` is `&str`.

    Deref coercion means that `&String` will implicitly convert into `&str`:

    ```rust
    let s: String = "hello".into();
    let rs: &str = &s;
    ```
    "Coercion" is a strong word, but this is one of the few places in Rust
    where type conversion happens silently. `&String` is a very
    different type to `&str`! I still remember my
    confusion when the compiler insisted that these types were distinct,
    especially with operators where the convenience of deref coercion
    does not happen. The match operator matches types explicitly
    and this is where `s.as_str()` is still necessary - `&s` would not work:

    ```rust
    let s = "hello".to_string();
    ...
    match s.as_str() {
        "hello" => {},
        "dolly" => {},
        ....
    }
    ```

    It's idiomatic to use string slices in function arguments, knowing that
    `&String` will convert to `&str`.

    Deref coercion is also used to resolve methods - if the method isn't defined
    on `String`, then we try `&str`.

    A similar relationship holds between `Vec<T>` and `&[T]`. Likewise, it's
    not idiomatic to have `&Vec<T>` as a function argument type, since `&[T]`
    is more flexible and `&Vec<T>` will convert to `&[T]`.
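
    `Deref` can also be implemented for your own wrapper types, which then get
    the same coercions. A minimal sketch: `Sentence`, `word_count` and `total`
    are hypothetical names chosen for the example.

    ```rust
    use std::ops::Deref;

    // Hypothetical wrapper around an owned string.
    struct Sentence {
        text: String,
    }

    impl Deref for Sentence {
        type Target = str;

        fn deref(&self) -> &str {
            &self.text // &String coerces to &str here as well
        }
    }

    // Idiomatic signatures: borrow the slice types, not the owning containers.
    fn word_count(s: &str) -> usize {
        s.split_whitespace().count()
    }

    fn total(values: &[i32]) -> i32 {
        values.iter().sum()
    }
    ```

    With this, `word_count(&sentence)` and `sentence.starts_with("hello")` both
    resolve through `deref`, just as `total(&vec![1, 2, 3])` resolves through
    the `Deref` implementation of `Vec<T>`.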

    ## Ownership: Borrow

    Ownership is an important concept in Rust; we have types like `String` that
    "own" their data, and types like `&str` that can "borrow" data from
    an owned type.

    The [Borrow](?) trait solves a sticky problem with associative maps and sets.
    Typically we would keep owned strings in a `HashSet` to avoid borrowing blues.
    But we really don't want to _create_ a `String` to query set membership!

    ```rust
    use std::collections::HashSet;

    let mut set = HashSet::new();
    set.insert("one".to_string());
    // set is now HashSet<String>
    if set.contains("two") {
        println!("got two!");
    }
    ```
    The borrowed type `&str` can be used instead of `&String` here!
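
    The same bound can be used in your own generic code. This sketch mirrors the
    shape of the standard `HashMap::get` signature; `lookup_or` is a hypothetical
    helper written for this example.

    ```rust
    use std::borrow::Borrow;
    use std::collections::HashMap;
    use std::hash::Hash;

    // Hypothetical helper: look up `key`, falling back to `default`.
    // `K: Borrow<Q>` means "a stored key can be viewed as a &Q", so a
    // HashMap<String, V> can be queried with a plain &str.
    fn lookup_or<'a, K, Q, V>(map: &'a HashMap<K, V>, key: &Q, default: &'a V) -> &'a V
    where
        K: Borrow<Q> + Hash + Eq,
        Q: Hash + Eq + ?Sized,
    {
        map.get(key).unwrap_or(default)
    }
    ```

    Calling `lookup_or(&ages, "joe", &0)` on a `HashMap<String, u32>` allocates
    no temporary `String` for the query.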

    ## Iteration: Iterator and IntoIterator

    The [Iterator](?) trait is interesting. You are only required to implement
    one method - `next()` - and all that method must do is return an
    `Option` value each time it's called. When that value is `None` we
    are finished.

    However, there are a lot of _provided_ methods which have default
    implementations in `Iterator`. You get `map`, `filter`, etc. for free.
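
    A minimal sketch of implementing the trait by hand. `Countdown` and
    `even_squares` are hypothetical names; only `next()` is written, and the
    provided methods come along automatically.

    ```rust
    // Hypothetical iterator that counts down from `remaining` to 1.
    struct Countdown {
        remaining: u32,
    }

    impl Iterator for Countdown {
        type Item = u32;

        // The only required method: `Some(value)` until exhausted, then `None`.
        fn next(&mut self) -> Option<u32> {
            if self.remaining == 0 {
                None
            } else {
                let current = self.remaining;
                self.remaining -= 1;
                Some(current)
            }
        }
    }

    // `filter`, `map` and `collect` are provided methods we did not write.
    fn even_squares(from: u32) -> Vec<u32> {
        Countdown { remaining: from }
            .filter(|n| n % 2 == 0)
            .map(|n| n * n)
            .collect()
    }
    ```

    For example, `even_squares(5)` walks 5, 4, 3, 2, 1, keeps 4 and 2, and
    yields their squares.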

    This is the verbose way to use an iterator:

    ```rust
    let mut iter = [10, 20, 30].iter();
    while let Some(n) = iter.next() {
        println!("got {}", n);
    }
    ```
    The `for` statement provides a shortcut:

    ```rust
    for n in [10, 20, 30].iter() {
        println!("got {}", n);
    }
    ```
    The expression here actually is _anything that can convert into an iterator_,
    which is expressed by `IntoIterator`. So `for n in &[10, 20, 30] {...}` works
    as well - a slice is definitely not an iterator, but it implements
    `IntoIterator`. Similarly, `for i in 0..10 {...}` involves a range expression
    implicitly converting into an iterator. Iterators implement `IntoIterator`
    (trivially).

    So the `for` statement in Rust is specifically tied to a single trait.
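
    Your own collection types can take part in `for` loops by implementing that
    trait. In this sketch `Playlist` and `total_title_length` are hypothetical;
    the two impls cover iterating by value and by reference, following the
    convention used by `Vec`.

    ```rust
    // Hypothetical collection type.
    struct Playlist {
        tracks: Vec<String>,
    }

    // By value: `for t in playlist` consumes the playlist and yields Strings.
    impl IntoIterator for Playlist {
        type Item = String;
        type IntoIter = std::vec::IntoIter<String>;

        fn into_iter(self) -> Self::IntoIter {
            self.tracks.into_iter()
        }
    }

    // By reference: `for t in &playlist` borrows and yields &String.
    impl<'a> IntoIterator for &'a Playlist {
        type Item = &'a String;
        type IntoIter = std::slice::Iter<'a, String>;

        fn into_iter(self) -> Self::IntoIter {
            self.tracks.iter()
        }
    }

    fn total_title_length(list: &Playlist) -> usize {
        let mut total = 0;
        for title in list {
            total += title.len();
        }
        total
    }
    ```

    Both impls simply delegate to the underlying `Vec`, which is usually all a
    wrapper type needs.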

    Iterators in Rust are a zero-overhead abstraction, which means that _usually_
    you do not pay a run-time penalty for using them. In fact, if you wrote out
    a loop over slice elements explicitly using indices, it could well be slower
    because of run-time index range checks.

    The most general way to pass a sequence of values to a function is
    to use `IntoIterator`. Just using `&[T]` is too limited and requires the caller
    to build up a buffer (which could be both awkward and expensive), while taking
    `Iterator<Item=T>` itself requires the caller to call `iter()` etc.

    ```rust
    fn sum(ii: impl IntoIterator<Item=i32>) -> i32 {
        ii.into_iter().sum()
    }

    println!("{}", sum(0..9));
    println!("{}", sum(vec![1,2,3]));
    // cloned() here makes an iterator over i32 from an iterator over &i32
    println!("{}", sum([1,2,3].iter().cloned()));
    ```

    ## Conclusion: Why are there So Many Ways to Create a String?

    ```rust
    let s = "hello".to_string(); // ToString
    let s = String::from("hello"); // From
    let s: String = "hello".into(); // Into
    let s = "hello".to_owned(); // ToOwned
    ```
    This is a common complaint at first - people like to have one idiomatic way of
    doing common operations. And curiously enough - none of these are actual
    `String` methods!

    But all these traits are needed, since they make truly generic programming possible;
    when you create strings in code, just pick one way and use it consistently.

    A consequence of Rust's dependence on traits is that it can take a while
    to [learn to read the documentation](https://stevedonovan.github.io/rust-gentle-intro/5-stdlib-containers.html).
    Knowing what methods can be called on a type depends on what traits are implemented for that type.

    However, Rust traits are not sneaky. They have to be brought into scope before they
    can be used. For instance, you need `use std::error::Error` before you can
    call `description()` (since deprecated in favour of `Display`) on a type
    implementing `Error`. A _lot_ of traits are brought in by default by the Rust
    prelude, however.