Newtype Index Pattern

https://ift.tt/2LkYmFd 
 
 
 
<div>
<div>
As in the <a href="https://matklad.github.io/2018/05/24/typed-key-pattern.html">previous post</a>, we will once again add types to Rust code that works perfectly fine without them. This time, we&rsquo;ll try to improve the pervasive pattern of using indexes to manage cyclic data structures.
<h1 id="the-problem">The problem</h1>
Often one wants to work with a data structure which contains a cycle of some form: object <code>foo</code> references <code>bar</code>, which references <code>baz</code>, which references <code>foo</code> again. The textbook example here is a graph of vertices and edges. In practice, however, true graphs are a rare encounter. Instead, you are more likely to see a tree with parent pointers, which contains a lot of trivial cycles. And sometimes cyclic graphs are implicit: an <code>Employee</code> can be the head of a <code>Department</code>, and a <code>Department</code> has a <code>Vec&lt;Employee&gt;</code> of personnel. This is sort-of a graph in disguise: in usual graphs, all vertices are of the same type, and here <code>Employee</code> and <code>Department</code> are different types.
Working with such data structures is hard in any language. To arrive at a situation where <code>A</code> points to <code>B</code>, which points back to <code>A</code>, some form of mutability is required. Indeed, either <code>A</code> or <code>B</code> must be created first, and so it cannot point to the other immediately after construction. You can paper over this mutability with <code>let rec</code>, as in OCaml, or with laziness, as in Haskell, but it is still there.
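To make the mutability argument concrete, here is a minimal sketch of a two-element cycle stored in a <code>Vec</code>. The <code>Node</code> type and the <code>build_cycle</code> function are hypothetical names used only for illustration.

```rust
// Hypothetical node type for illustration: `next` holds the index of
// another node in the same `Vec`, or `None` while the link is not set yet.
struct Node {
    name: &'static str,
    next: Option<usize>,
}

// Builds a two-node cycle: a -> b -> a.
fn build_cycle() -> Vec<Node> {
    let mut nodes = Vec::new();

    // `a` is created first, so it cannot point to `b` yet.
    nodes.push(Node { name: "a", next: None });
    let a = nodes.len() - 1;

    // `b` is created second and can point back to `a` right away.
    nodes.push(Node { name: "b", next: Some(a) });
    let b = nodes.len() - 1;

    // Closing the cycle requires mutating `a` after its construction.
    nodes[a].next = Some(b);
    nodes
}
```

Whichever node is pushed first, one of the two links has to be patched in afterwards.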
Rust tends to surface subtle problems in the form of compile-time errors, so implementing such graphs in Rust is challenging. The three usual approaches are:
<ul>
<li>reference counting, explanation by <a href="https://github.com/nrc/r4cppp/blob/master/graphs/README.md#rcrefcellnode">nrc</a>,</li>
<li>arena and real cyclic references, explanation by <a href="https://exyr.org/2018/rust-arenas-vs-dropck/">simonsapin</a> (this one is really neat!),</li>
<li>arena and integer indices, explanation by <a href="http://smallcultfollowing.com/babysteps/blog/2015/04/06/modeling-graphs-in-rust-using-vector-indices/">nikomatsakis</a>.</li>
</ul>
(apparently, rewriting a Haskell monad tutorial in Rust results in a graphs blog post).
I personally like the indexing approach the most. However, it presents an interesting readability challenge. With references, you have a <code>foo</code> of type <code>&amp;Foo</code>, and it is immediately clear what that <code>foo</code> is, and what you can do with it. With indexes, however, you have a <code>foo: usize</code>, and it is not obvious that you can somehow get a <code>Foo</code> from it. Even worse, if indexes are used for two types of objects, like <code>Foo</code> and <code>Bar</code>, you may end up with <code>thing: usize</code>. While writing the code with <code>usize</code> actually works pretty well (I don&rsquo;t think I&rsquo;ve ever used the wrong index type), reading it later is more complicated, because <code>usize</code> is much less suggestive of what you could do.
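As a rough illustration of the readability problem, consider the following sketch. The <code>Arena</code> layout, the <code>name</code> fields and both accessor functions are assumptions made up for this example.

```rust
// Hypothetical payload types; the `name` fields exist only so that
// there is something to read back.
struct Foo {
    name: &'static str,
}

struct Bar {
    name: &'static str,
}

struct Arena {
    foos: Vec<Foo>,
    bars: Vec<Bar>,
}

// Both accessors take a plain `usize`. Only the parameter name hints at
// which vector the index is meant for; the type says nothing about it.
fn foo_name(arena: &Arena, foo: usize) -> &'static str {
    arena.foos[foo].name
}

fn bar_name(arena: &Arena, bar: usize) -> &'static str {
    arena.bars[bar].name
}
```

A caller holding a <code>thing: usize</code> can pass it to either function and the code compiles both ways, so a reader has to trace where the index came from to know which call is the intended one.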
<h1 id="newtype-trick">Newtype trick</h1>
One way to ameliorate this problem is to introduce a newtype wrapper around <code>usize</code>:
<div>
<div>
<pre>
<code>struct Foo;

#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
struct FooIdx(usize);

struct Arena {
    foos: Vec&lt;Foo&gt;,
}

impl Arena {
    fn foo(&amp;self, foo: FooIdx) -&gt; &amp;Foo {
        &amp;self.foos[foo.0]
    }
}
</code>
</pre>
</div>
</div>
Here, &ldquo;one should use <code>FooIdx</code> to index into <code>Vec&lt;Foo&gt;</code>&rdquo; is still just a convention. A cool thing about Rust is that we can turn this convention into a property verified during type checking. By adding an appropriate impl, we should be able to index into <code>Vec&lt;Foo&gt;</code> with <code>FooIdx</code> directly:
<div>
<div>
<pre>
<code>#[test]
fn direct_indexing() {
    // A `#[test]` function cannot take arguments, so build the inputs here.
    let foos: Vec&lt;Foo&gt; = vec![Foo];
    let idx = FooIdx(0);
    let _foo: &amp;Foo = &amp;foos[idx];
}
</code>
</pre>
</div>
</div>
The impl would look like this:
<div>
<div>
<pre>
<code>use std::ops;

impl ops::Index&lt;FooIdx&gt; for Vec&lt;Foo&gt; {
    type Output = Foo;

    fn index(&amp;self, index: FooIdx) -&gt; &amp;Foo {
        &amp;self[index.0]
    }
}
</code>
</pre>
</div>
</div>
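To see how this fits the tree-with-parent-pointers case mentioned earlier, here is a hedged sketch that applies the same impl to a small tree. <code>NodeIdx</code>, <code>Node</code>, <code>add_node</code> and <code>depth</code> are hypothetical names, and the <code>IndexMut</code> impl is an addition that mirrors the <code>Index</code> one.

```rust
use std::ops;

#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
struct NodeIdx(usize);

// A tree node that refers to its parent and children by typed index.
struct Node {
    parent: Option<NodeIdx>,
    children: Vec<NodeIdx>,
}

impl ops::Index<NodeIdx> for Vec<Node> {
    type Output = Node;

    fn index(&self, index: NodeIdx) -> &Node {
        // Delegates to the built-in `usize` indexing of `Vec`.
        &self[index.0]
    }
}

impl ops::IndexMut<NodeIdx> for Vec<Node> {
    fn index_mut(&mut self, index: NodeIdx) -> &mut Node {
        &mut self[index.0]
    }
}

// Appends a node and registers it with its parent, if there is one.
fn add_node(nodes: &mut Vec<Node>, parent: Option<NodeIdx>) -> NodeIdx {
    let idx = NodeIdx(nodes.len());
    nodes.push(Node { parent, children: Vec::new() });
    if let Some(p) = parent {
        nodes[p].children.push(idx);
    }
    idx
}

// Counts the parent links between `node` and the root.
// Takes `&Vec<Node>` rather than a slice, because the impls above are
// written for `Vec<Node>`.
fn depth(nodes: &Vec<Node>, mut node: NodeIdx) -> usize {
    let mut depth = 0;
    while let Some(parent) = nodes[node].parent {
        depth += 1;
        node = parent;
    }
    depth
}
```

The signatures now say what the indices refer to: <code>parent: Option&lt;NodeIdx&gt;</code> can only be used to look up another <code>Node</code>.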
<h1 id="coherence">Coherence</h1>
It&rsquo;s insightful to study why this impl is allowed. In Rust, types, traits and impls are separate. This creates room for a problem: what if there are two impl blocks for a given (trait, type) pair? The obvious choice is to forbid having two impls in the first place, and this is what Rust does.
Actually enforcing this restriction is tricky! The simplest rule of &ldquo;error if a set of crates currently compiled contains duplicate impls&rdquo; has severe drawbacks. First of all, this is a global check, which requires knowledge of all compiled crates. This postpones the check until the later stages of compilation. It also plays awfully with dependencies, because two completely unrelated crates might fail to compile if present simultaneously. What&rsquo;s more, it doesn&rsquo;t actually solve the problem, because the compiler does not necessarily know the set of all crates beforehand. For example, you may load additional code at runtime via dynamic libraries, and bad things might silently happen if your program and a dynamic library have duplicate impls.
To be able to combine crates freely, we want a much stronger property: not only the set of crates currently compiled, but all existing and even future crates must not violate the one impl restriction. How on earth is it possible to check this? Should <code>cargo publish</code> look for conflicting impls across all of crates.io?
Luckily, and this is stunningly beautiful, it is possible to loosen this world-global property to a local one. In the simplest form, we can place a restriction that <code>impl Foo for Bar</code> can appear either in the crate that defines <code>Foo</code>, or in the one that defines <code>Bar</code>. Crucially, whichever one defines the impl has to use the other, which makes it possible to detect the conflict.
This is all really nifty, but we&rsquo;ve just defined an <code>Index</code> impl for <code>Vec</code>, and both <code>Index</code> and <code>Vec</code> are from the standard library! How is it possible? The trick is that <code>Index</code> has a type parameter: <code>trait Index&lt;Idx: ?Sized&gt;</code>. It is a template for a trait of sorts, and we get a &ldquo;real&rdquo; trait when we substitute a concrete type for the type parameter. Because <code>FooIdx</code> is a local type, the resulting <code>Index&lt;FooIdx&gt;</code> trait is also considered local. The precise rules here are quite tricky; <a href="https://github.com/rust-lang/rfcs/pull/2451">this RFC</a> explains them pretty well.
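The following sketch contrasts an impl that the orphan rule accepts with one that it rejects. Using <code>Vec&lt;String&gt;</code> as the container and the <code>first</code> helper are assumptions chosen to show that the local index type alone is enough.

```rust
use std::ops;

// A local type: it is defined in this crate.
#[derive(Debug, Copy, Clone, Eq, PartialEq)]
struct FooIdx(usize);

// Accepted: `Index` and `Vec<String>` are both foreign, but the type
// parameter of the trait is the local `FooIdx`.
impl ops::Index<FooIdx> for Vec<String> {
    type Output = String;

    fn index(&self, index: FooIdx) -> &String {
        &self[index.0]
    }
}

// Rejected by the orphan rule if uncommented: nothing in
// `Index<u32> for Vec<String>` is local to this crate.
//
// impl ops::Index<u32> for Vec<String> {
//     type Output = String;
//
//     fn index(&self, index: u32) -> &String {
//         &self[index as usize]
//     }
// }

// Hypothetical helper that uses the typed index.
fn first(names: &Vec<String>) -> &String {
    &names[FooIdx(0)]
}
```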
<h1 id="more-impls">More impls</h1>
Because <code>Index&lt;FooIdx&gt;</code> and <code>Index&lt;BarIdx&gt;</code> are different traits, one type can implement both of them. This is convenient for containers which hold distinct types:
<div>
<div>
<pre>
<code>struct Arena {
    foos: Vec&lt;Foo&gt;,
    bars: Vec&lt;Bar&gt;,
}

impl ops::Index&lt;FooIdx&gt; for Arena { ... }

impl ops::Index&lt;BarIdx&gt; for Arena { ... }
</code>
</pre>
</div>
</div>
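One possible way to fill in the two impl bodies is sketched below. The <code>name</code> and <code>size</code> fields are hypothetical and exist only so that the two lookups return visibly different types.

```rust
use std::ops;

// Hypothetical payload types for illustration.
struct Foo {
    name: &'static str,
}

struct Bar {
    size: u32,
}

#[derive(Debug, Copy, Clone, Eq, PartialEq)]
struct FooIdx(usize);

#[derive(Debug, Copy, Clone, Eq, PartialEq)]
struct BarIdx(usize);

struct Arena {
    foos: Vec<Foo>,
    bars: Vec<Bar>,
}

// `Index<FooIdx>` and `Index<BarIdx>` are distinct traits, so `Arena`
// can implement both, each with its own `Output` type.
impl ops::Index<FooIdx> for Arena {
    type Output = Foo;

    fn index(&self, index: FooIdx) -> &Foo {
        &self.foos[index.0]
    }
}

impl ops::Index<BarIdx> for Arena {
    type Output = Bar;

    fn index(&self, index: BarIdx) -> &Bar {
        &self.bars[index.0]
    }
}
```

With these impls, the type of <code>arena[idx]</code> is determined by the type of <code>idx</code>, so the same indexing syntax yields a <code>Foo</code> or a <code>Bar</code> as appropriate.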
It&rsquo;s also helpful to define arithmetic operations and conversions for the newtyped indexes. I&rsquo;ve put together a <a href="https://crates.io/crates/typed_index_derive"><code>typed_index_derive</code></a> crate to automate this boilerplate via a proc macro. The end result looks like this:
<div>
<div>
<pre>
<code>#[macro_use]
extern crate typed_index_derive;

struct Spam(String);

#[derive(
    // Usual derives for plain old data
    Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash,

    TypedIndex
)]
#[typed_index(Spam)] // index into `&amp;[Spam]`
struct SpamIdx(usize); // could be `u32` instead of `usize`

fn main() {
    let spams = vec![Spam("foo".into()), Spam("bar".into()), Spam("baz".into())];

    // Conversions between `usize` and `SpamIdx`
    let idx: SpamIdx = 1.into();
    assert_eq!(usize::from(idx), 1);

    // Indexing `Vec&lt;Spam&gt;` with `SpamIdx`, `IndexMut` works as well
    assert_eq!(&amp;spams[idx].0, "bar");

    // Indexing `Vec&lt;usize&gt;` is rightfully forbidden
    // vec![1, 2, 3][idx]
    // error: slice indices are of type `usize` or ranges of `usize`

    // It is possible to add/subtract `usize` from an index
    assert_eq!(&amp;spams[idx - 1].0, "foo");

    // The difference between two indices is `usize`
    assert_eq!(idx - idx, 0usize);
}
}
</code>
</pre>
</div>
</div>
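For a sense of the boilerplate involved, here is a hand-written sketch of the conversions and arithmetic for a single index type. It is an approximation written for this example, not the actual expansion of the derive.

```rust
use std::ops;

#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
struct SpamIdx(usize);

// Conversions between `usize` and `SpamIdx`.
impl From<usize> for SpamIdx {
    fn from(raw: usize) -> SpamIdx {
        SpamIdx(raw)
    }
}

impl From<SpamIdx> for usize {
    fn from(idx: SpamIdx) -> usize {
        idx.0
    }
}

// Index plus or minus an offset is still an index.
impl ops::Add<usize> for SpamIdx {
    type Output = SpamIdx;

    fn add(self, rhs: usize) -> SpamIdx {
        SpamIdx(self.0 + rhs)
    }
}

impl ops::Sub<usize> for SpamIdx {
    type Output = SpamIdx;

    fn sub(self, rhs: usize) -> SpamIdx {
        SpamIdx(self.0 - rhs)
    }
}

// The difference between two indices is a plain distance.
impl ops::Sub<SpamIdx> for SpamIdx {
    type Output = usize;

    fn sub(self, rhs: SpamIdx) -> usize {
        self.0 - rhs.0
    }
}
```

Note that there is deliberately no <code>Add&lt;SpamIdx&gt; for SpamIdx</code>: adding two positions has no obvious meaning, while subtracting them gives a distance.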
Discussion on <a href="https://www.reddit.com/r/rust/comments/8ohaj4/blog_post_newtype_index_pattern/">/r/rust</a>.
</div>
</div>
 
 
 
 
 
 
via matklad.github.io https://ift.tt/2yMSpNn 
 
November 2, 2018 at 11:36AM
