C// transpiler to Rust

Looking at this project again around when I posted it (many years after my initial project), I noticed that there is some similarity between the algorithms that I used and how Rust does things. The C// optimizer tries to ensure that if a function modifies a variable, then it has the only copy of that variable, which is the same guarantee that Rust tries to make, except that Rust requires you to explicitly specify when you have the only copy of a variable and when to make copies, whereas C// tries to infer that. There are even similarities in how they achieve this:

(I even use the term "borrowed" and "ownership" at one point in the code (Expr.java). I don't actually remember whether I was aware of Rust at the time; it's possible some vague knowledge of Rust inspired this design.)

That means that C// is kind of like a simplified Rust-like language where borrowing vs. ownership is inferred rather than having to be explicitly specified, so my thought is that maybe I could try transpiling this into actual Rust. That means that C// is not only a proof-of-concept for value semantics, but also for this inferred Rust-like language idea.

This transpiler not only transpiles into Rust, but is also written in Rust. It's also my first attempt at writing a real program in Rust.

Contents

Instructions

Dependencies

You will need rustc and cargo installed. This program has no dependencies on external crates other than std.

You will also need something that can extract gzipped tar files.

Compiling and testing the transpiler

The download link is at the bottom of the page.

cd into the directory, then type cargo build.

You can run the unit tests with cargo test, and run all the included test programs by typing ./do-test. You can run a specific test by typing ./do-test name-of-file (without the .cdd extension or the test-input/ prefix). This will run four tests per file with different options, putting the output into the test-output-* directories, and compare the files with each other and with test-output-expected/*.txt if the file exists. (For both this and the original version, this only tests for correctness, not optimizedness.)

Running the transpiler

To run, either type

cargo run -- options input-file.cdd > output-file.rs

or

target/debug/cdd-rs options input-file.cdd > output-file.rs

options includes:

--default-share
Changes the default mode to @share (see below).
--optimize
Actually apply the optimizations. If this is not specified, either every struct value will be owned (and copied even when not necessary), or, if --default-share is supplied, every struct value will be reference counted.
-O0 through -O4
-O0 is the default, and -O4 is the same as --optimize. Anything in between will perform only a subset of the optimization steps, which could be helpful if you're trying to figure out what they do or how they work.

Known language differences from the transpiler to C

In addition, the following behaviors, which are undefined or implementation-defined in the transpiler to C, happen to be defined in this version of the transpiler to Rust (but could change in future versions):

Information about the algorithm

Type modes

Every variable, argument, and return type is assigned a type mode. There are three possible type modes, and (unlike in the transpiler to C) you can explicitly specify one to use (syntax: @own struct struct_name var_name):

ModeTranslates to (in Rust)Equivalent in the transpiler to CNotes
@ownTmodifyable = truecheap to modify, assigning to anything but @borrow is expensive unless it's the last use
@shareRc<T>modifyable = false
isContained = false
expensive to modify, assigning to other @shares is cheap
@borrow&Tmodifyable = false
isContained = true
cannot modify

Some more notes:

If optimizations are disabled, every variable where the mode isn't specified is marked as @own if --default-share isn't specified on the command line, and @share if it is. If optimizations are enabled (≥ -O2), then --default-share only affects struct members; the mode of everything else is determined by the optimizer.

In addition, each use of a variable is marked as "copy" or "move". If optimizations are disabled, then every variable use is "copy", and there's no way to change that.

The algorithm

For ≥ -O2, all parameters start out @borrow and everything else (except struct members) starts out @share.

  1. Analyze liveness (≥ -O1): Goes through each function backwards, with a mutable set of live variables, which is initially empty. Each time it encounters a use of a non-live variable, it marks that use as "move" and adds it to the set of live variables. Each time it encounters an assignment to a variable or variable declaration, removes it from the list of live variables. This corresponds to the findUsedVariables and analyzeLiveness steps in the C version; however, this version assumes that expressions are evaluated left-to-right (so in f(x, x), the second x could be the last use of x), whereas the C version assumes that evaluation order is undefined (so in f(x, x), neither x is treated as the last use).
  2. Find modifications (≥ -O2):
  3. Corresponds to findModifiedVariables and findTransferOpportunities in the C version.

  4. Find share opportunities (≥ -O3):
  5. Kind of corresponds to setting isContained in the C version, except in that version it just used a simple heuristic (is the parameter's type different from the return type?).

  6. Find unneeded reference counts (≥ -O4):
  7. Doesn't have an equivalent in the C version. Since shared variables didn't have reference counts in the C version (since it just leaked memory), there aren't any reference counts to optimize away. (That said, a function that e.g. calls a constructor and just returns its result unmodified probably could benefit from this, e.g. make_return in constructor.cdd.)

  8. Fix borrows (≥ -O1): Goes through every function call, and every time an variable is borrowed, further uses of the same variable in the same call are marked as moves. This is because e.g. f(&x, x) could be generated by the liveness analyzer, but is invalid (since the borrow lasts for the duration of the call), so the second x becomes cloned. (This wasn't an issue for the C version, since it would have classified both x's as moves regardless of whether one was borrowed.)

Each of steps 2, 3, and 4 run in a loop until they reach a fixed point, going through the whole program each time. (Functions do only depend on functions they call, not functions that call them, but this means that I don't have to figure out what the actual call graph is like, and can also handle recursion.) Variables can only go from @borrow to @share to @own, and can't go back, which guarantees that this will eventually terminate.

In all cases, if a mode is explicitly specified, then the optimizer won't change it.

After all that, when generating the code, there's a huge match statement with a case for each combination of expected mode, mode of the expression, and whether moving is allowed:

@borrow = @share = @own =
= @borrow expr Rc::new(expr.clone()) expr.clone()
= @share Move &*expr expr unwrap_or_clone(expr)
Copy Rc::clone(&expr) (*expr).clone()
= @own Move &expr Rc::new(expr) expr
Copy Rc::new(expr.clone()) expr.clone()

For comparison, here's what the C implementation does:

!modifyable =modifyable =struct member =
isContained!isContained
= !modifyableno copy /*shared*/copycopy
= modifyableFunction return valno copy /*borrowed*/no copy /*return transfer*/
isTransferno copy /*transfer*/
!isTransfercopy
= struct membercopycopy

The differences:

Known limitations

Some known limitations with the algorithm as implemented, and things that could potentially be improved:

Download

Download cdd-rs.tgz (42K)