Looking at this project again around when I posted it (many years after my initial project), I noticed that there is some similarity between the algorithms that I used and how Rust does things. The C// optimizer tries to ensure that if a function modifies a variable, then it has the only copy of that variable, which is the same guarantee that Rust tries to make, except that Rust requires you to explicitly specify when you have the only copy of a variable and when to make copies, whereas C// tries to infer that. There are even similarities in how they achieve this:
truebehave similarly to owned values in Rust: both can be potentially modified, and both can be transferred to a different variable without copying as long as the original variable is no longer used.
(I even use the term "borrowed" and "ownership" at one point in the code (
Expr.java). I don't actually remember whether I was aware of Rust at the time; it's possible some vague knowledge of Rust inspired this design.)
That means that C// is kind of like a simplified Rust-like language where borrowing vs. ownership is inferred rather than having to be explicitly specified, so my thought is that maybe I could try transpiling this into actual Rust. That means that C// is not only a proof-of-concept for value semantics, but also for this inferred Rust-like language idea.
This transpiler not only transpiles into Rust, but is also written in Rust. It's also my first attempt at writing a real program in Rust.
You will need
cargo installed. This program has no dependencies on external crates other than
You will also need something that can extract gzipped tar files.
The download link is at the bottom of the page.
cd into the directory, then type
You can run the unit tests with
cargo test, and run all the included test programs by typing
./do-test. You can run a specific test by typing
./do-test name-of-file (without the
.cdd extension or the
test-input/ prefix). This will run four tests per file with different options, putting the output into the
test-output-* directories, and compare the files with each other and with
test-output-expected/*.txt if the file exists. (For both this and the original version, this only tests for correctness, not optimizedness.)
To run, either type
cargo run -- options input-file.cdd > output-file.rs
target/debug/cdd-rs options input-file.cdd > output-file.rs
structvalue will be owned (and copied even when not necessary), or, if
--default-shareis supplied, every
structvalue will be reference counted.
-O0is the default, and
-O4is the same as
--optimize. Anything in between will perform only a subset of the optimization steps, which could be helpful if you're trying to figure out what they do or how they work.
@borrowcan be specified before
structtypes (explained below).
structs be defined before they're used in function parameters and return types. Due to differences in how I decided to structure the code, this requirement isn't enforced by this transpiler.
structs still need to be defined before they're used as members in other
structs, though (this prevents recursive
In addition, the following behaviors, which are undefined or implementation-defined in the transpiler to C, happen to be defined in this version of the transpiler to Rust (but could change in future versions):
ints are 32-bit.
Every variable, argument, and return type is assigned a type mode. There are three possible type modes, and (unlike in the transpiler to C) you can explicitly specify one to use (syntax:
@own struct struct_name var_name):
|Mode||Translates to (in Rust)||Equivalent in the transpiler to C||Notes|
|cheap to modify, assigning to anything but
|expensive to modify, assigning to other
Some more notes:
@borrowcan only be used on function arguments, and variables declared
@borrowcannot be modified (either by reassigning them or by assigning to one of their members).
@sharevariables can be modified, but doing so will (potentially) make a copy.
@sharevariables have a reference count (unlike in the C implementation, where memory just leaks), there's an additional optimization where if the reference count is 1, it won't actually make a copy. This means that only using
@shareis effectively an alternative solution to the problem of implementing value semantics, namely using runtime reference counting rather than static analysis.
If optimizations are disabled, every variable where the mode isn't specified is marked as
--default-share isn't specified on the command line, and
@share if it is. If optimizations are enabled (≥
--default-share only affects
struct members; the mode of everything else is determined by the optimizer.
In addition, each use of a variable is marked as "copy" or "move". If optimizations are disabled, then every variable use is "copy", and there's no way to change that.
-O2, all parameters start out
@borrow and everything else (except
struct members) starts out
-O1): Goes through each function backwards, with a mutable set of live variables, which is initially empty. Each time it encounters a use of a non-live variable, it marks that use as "move" and adds it to the set of live variables. Each time it encounters an assignment to a variable or variable declaration, removes it from the list of live variables. This corresponds to the
analyzeLivenesssteps in the C version; however, this version assumes that expressions are evaluated left-to-right (so in
f(x, x), the second
xcould be the last use of
x), whereas the C version assumes that evaluation order is undefined (so in
f(x, x), neither
xis treated as the last use).
@sharemember of it) has a "move" use in a context that takes an
@ownvariable, then it's marked as
@borrowparameter is assigned to, it's upgraded to
@share. (This is why
-O2is required before defaulting to parameters to
@borrow; otherwise an implied
@borrowparameter could be assigned to.)
@own, then the return type is upgraded to
findTransferOpportunities in the C version.
@borrowvariable is used in a context where it takes a
@sharevariable, that variable is upgraded to
Kind of corresponds to setting
isContained in the C version, except in that version it just used a simple heuristic (is the parameter's type different from the return type?).
@sharevariable is used in a context that expects a
@share, or a
@sharevariable is assigned a value that comes from something else
@share, that variable is marked.
@sharevariables except function parameters are upgraded to
@own, so they don't need a reference count (since there's nowhere to take advantage of it).
@share, then its return type is upgraded to
Doesn't have an equivalent in the C version. Since shared variables didn't have reference counts in the C version (since it just leaked memory), there aren't any reference counts to optimize away. (That said, a function that e.g. calls a constructor and just returns its result unmodified probably could benefit from this, e.g.
-O1): Goes through every function call, and every time an variable is borrowed, further uses of the same variable in the same call are marked as moves. This is because e.g.
f(&x, x)could be generated by the liveness analyzer, but is invalid (since the borrow lasts for the duration of the call), so the second
xbecomes cloned. (This wasn't an issue for the C version, since it would have classified both
x's as moves regardless of whether one was borrowed.)
Each of steps 2, 3, and 4 run in a loop until they reach a fixed point, going through the whole program each time. (Functions do only depend on functions they call, not functions that call them, but this means that I don't have to figure out what the actual call graph is like, and can also handle recursion.) Variables can only go from
@own, and can't go back, which guarantees that this will eventually terminate.
In all cases, if a mode is explicitly specified, then the optimizer won't change it.
After all that, when generating the code, there's a huge
match statement with a case for each combination of expected mode, mode of the expression, and whether moving is allowed:
For comparison, here's what the C implementation does:
|…||no copy ||copy||copy|
|…||Function return val||no copy ||no copy
@share, and copies in the
@share = @borrowcase. The C version doesn't even have access to the information to make that distinction. In the C version, borrowed parameters and the equivalent of
@sharevariables can be mixed freely, as long as borrowed parameters can't escape the function; if they can escape the function at all, then they're marked as not borrowed.
@sharevariable, which the C version doesn't. In the
@share = @sharecase, this is used to determine whether to increment the reference count (which doesn't exist in the C version), and in the
@own = @shareversion this is used to potentially eliminate a copy if the reference count is 1 (which the C version can't do, since there's no reference count).
structmembers are not pointers, so they're always copied.
Some known limitations with the algorithm as implemented, and things that could potentially be improved:
structs are actually big lists), and this language probably isn't useful for taking actual measurements of how long things take (since the aforementioned optimization would always apply). It also means that I haven't written any code to deal with
Boxes, which would probably be necessary to support recursive structures.
Boxes, all the values here are probably directly on the stack, so moving might not actually be faster than copying anyways.)
@ownis helpful) and then shared to more than one other variable (
@shareis helpful), then either each share makes a copy, or each modification makes a copy; it would be helpful if the variable could start out
@ownand then switch to
@shareafter the last modification. The included tests
rec-read-return.cddare affected by this limitation.
struct. This means that I don't have to deal with lifetimes other than the lifetime of the current function and the lifetime of the function we're currently calling, but it means that some functions can't borrow when they should be able to.
rec-read-mod-return.cdd, the return value of
fis inferred as
@sharebecause of the recursive call to itself, which returns
@share). This seems like the sort of thing that could be reduced to some known problem (constraint something? something optimization? something graph theory?), but I'm not sure what.
structmembers (although simply allowing modes other than
@ownis an improvement over the transpiler to C). I'm not sure how it could go about that; that might need something more whole program optimization–y (since it's not possible to tell how a
structis used just by looking at its definition), or some way to be generic over the mode. Or maybe defaulting to
@share, combined with the fact that refcount = 1 doesn't require copying, is sufficient.
Download cdd-rs.tgz (42K)