A First Parallel Program in Rust

Mar 6, 2013   #rustlang  #tutorial  #projects  #code 

Code in this tutorial runs under Rust 0.8 as of October 31, 2013.

Rust is a compiled, strongly-typed language with first-class functions and closures. It has shared-nothing light-weight thread-based concurrency with message-passing for communication. It has memory safety but with this complicated three-kinds-of-pointers arrangement. It has a neat modern type system with structs (records) and enums (variants) and Traits (interfaces/type-classes).

So, it’s got all the modern features and focuses on system-level stuff and pervasive parallelism. That “pervasive parallelism” is what I’m going to focus on here.

As my first original Rust program after reading the language tutorial, this is going to be a bit rough. I am quite sure that I don’t understand the pointer types well enough to use them properly. Closures confuse me a bit, especially there being three types that …capture differently?

Anyway, at least the compiler tells me when I get it wrong.

##The Plan I’m going to implement a solution to this challenge problem. This means a program that:

  1. Spawns 4 threads.
  2. Thread 1: multiplies 10 * 2, sends result to thread 4
  3. Thread 2: multiplies 20 * 2, sends result to thread 4
  4. Thread 3: adds 30 + 40, sends result to thread 4
  5. Thread 4: receives the three results, sends string to main of format: "%d + %d + %d = %d" % x y z (x+y+z)
  6. main receives string from thread 4 and prints it to stdout

##My Code I like getting the big picture to start with, so here’s the code in full:

use std::comm::SharedChan;

fn main() {
  let (out,input): (Port<int>, Chan<int>) = stream();
  let input = SharedChan::new(input);
  let (strout,strin): (Port<~str>, Chan<~str>) = stream();

  do spawn {
    let x = out.recv();
    let y = out.recv();
    let z = out.recv();
    let result = format!("{:d} + {:d} + {:d} = {:d}", x, y, z, x+y+z);
    strin.send(result);
  }

  let my_in = input.clone();
  do spawn {
    my_in.send(2 * 10);
  }

  let my_in = input.clone();
  do spawn {
    my_in.send(2 * 20);
  }

  let my_in = input.clone();
  do spawn {
    my_in.send(30 + 40);
  }

  let str = strout.recv();
  println(str);
}

##Imports

use std::comm::SharedChan;

This use statement pull modules/functions/types into the local context. That means that rather than saying std::comm::SharedChan, I just say SharedChan.

As a note, use statements can also appear at the beginning of function bodies (not just at the top of the file).

You can see what other functions are in the std::comm module by reading this official documentation.

##Main Function

fn main() {

The fn keyword is for creating a function. Functions are not closures. This means that they do not grab variables from the enclosing scope (you can only use the explicit arguments). The main function takes no arguments and does not define a return type (which means it returns a value of type (), whose only value is (), which is called nil or the empty tuple or unit).

As in most languages, main is the function that gets run automatically when you run an executable. A Rust file without a main function cannot be compiled to an executable, but could be compiled as a library.

##Setting Up Pipes

Our “threads” will actually be “tasks”, which is Rust-speak for a light-weight thread with no shared memory. The way to communicate between tasks is by sending “messages”, meaning values, across a pipe.

A pipe is a one-way stream of values, of a particular type. This means that the pipe has two ends, one that you can put things into and another that you can receive things on.

let (out,input): (Port<int>, Chan<int>) = stream();

You create the two ends of the pipe by calling stream. It returns a tuple, which is unpacked here into the variable input and out.

The out side is a Port<int>. A Port has a recv method, which you can use to try to get a value out of the pipe (it will wait for one if the pipe is empty).

The <int> part is Rust’s type-generic notation. It means that calling recv on this port will return an int

The in side is a Chan<int>. A Chan has a send method, which you can use to send a value of the proper type across the pipe. The <int> part means that you can only send ints.

Only one task can hold each end of the pipe. This means that a pipe can connect one unique sending task to one unique receiving task. However, we want to have three senders (the first three threads) and one receiver (the fourth thread).

let input = SharedChan(input);

This is a pretty common concurrency pattern, so there’s an easy built-in way to let multiple tasks send on a pipe. Running SharedChan(input) return a SharedChan<int> that will send its output to out, the Port that matches input.

After this, you won’t be allowed to use input anyway, so I just reused the name for the SharedChan. I’ll show you a little further on how you divide a SharedChan into the multiple write ends you need.

let (strout,strin): (Port<~str>, Chan<~str>) = stream();

Here, we’re just setting a pipe for the fourth thread to speak to the main thread. It sends strings (str) instead of ints. The ~ means that we’re sending “owned” pointers to strings, rather than copies of strings, between the tasks.

Remember that the tasks do not share memory. This means that a pointer to the task-local heap will be meaningless when sent across a pipe. Copying an entire string to send it across could also be very wasteful.

An owned pointer is a pointer to a common “exchange heap”. Only one task at a time can “own” an owned pointer. That means that once we sent a string across the pipe, we won’t be allowed to use it in the sending task anymore – because we gave it away.

Sending the pointer across is quick, and no data gets copied, since we’re just sending the pointer.

The weird pointer types in Rust seem generally really confusing to me. I just guessed at punctuation marks until the compiler was happy, but I think I get why it works now that I’m explaining it.

##Thread 4

Our goal for Thread 4 is to receive three values and then create and send a string back to main.

The way to create and run a new task is the spawn function. You give it the function/closure to run, and then it does so.

do spawn {
  let x = out.recv();
  let y = out.recv();
  let z = out.recv();
  let result = format!("{:d} + {:d} + {:d} = {:d}", x, y, z, x+y+z);
  strin.send(result);
}

Here, we’re doing a lot of stuff implicitly. The stuff in the curly braces is a closure. That’s like a function, but it gets to grab variables from the enclosing scope. In this case, we’re grabbing our Chan<~str>, strin and out Port<int>, out.

The compiler will not let you use strin or out subsequently in main. It understands that that those values now belong to our Thread/Task 4.

Also, while I call this “Thread 4”, in the code it’s an anonymous closure and (I guess) it’d be called an anonymous task too. There is no way for us to talk about this task in the code after this.

let x = out.recv();
let y = out.recv();
let z = out.recv();

The first thing out new task does is wait. It’s going to wait to receive three ints from the pipe.

Once we have the numbers, we need to build a string:

let result = format!("{:d} + {:d} + {:d} = {:d}", x, y, z, x+y+z);

The format! function is a special syntax extension called a macro. That’s why it has a ! in the name. It acts like a function that returns a string.

The idea is that it’s equivalent to sprintf in C or similar languages. However, rather than using %d, you use {:d}. There is an additional formatting thing, :?, which can print anything. The formating things, like :d, are type-checked at compile time. You can see fancier usages of this function in the documentation.

strin.send(result);

Once we send result to main, we can’t use it anymore. We gave it away. However, we don’t care about that here, since we’re all done anyway.

##Threads 1,2,3

These three threads are basically the same, so I’m just going to look at Thread 1 in detail.

Before we split of a new task, we need to make a Chan<int> for it to send on.

let my_in = input.clone();

The clone method of SharedChan is how we get a new Chan to give to our new task. We’ll still be able to use input in our main function after we give my_in away to the new task.

do spawn {
  my_in.send(2 * 10);
}

Here we spawn using another anonymous zero-argument closure. This one captures the my_in that we just made. All it does is multiply the two literals and send the value across the pipe.

As you can see by the fact that I reuse the name my_in further down, the compiler is not banning a name once you give it away to another task. It just cares about the value.

##Main Thread: Printing

let str = strout.recv();
println(str);

Back in main, after we create all those child tasks, we just wait for the final answer. We won’t be waiting long since there’s so little work to do. Once we have the string, we print it.

##Conclusion

As you can see if you run rustc on the file and then run the executable it produces, the output is the line:

20 + 70 + 40 = 130

Or the line:

20 + 40 + 70 = 130

Those are the two orderings I got from running it 8 times. The changing ordering tells us that Rust really is parallelizing our puny tasks. I’m pretty excited about that.

This was a successful first foray for me into writing in Rust, and writing parallel things in Rust. I hope that it helps you wade in too. :)