Looking at this project again around when I posted it (many years after my initial project), I noticed that there is some similarity between the algorithms that I used and how Rust does things. The C// optimizer tries to ensure that if a function modifies a variable, then it has the only copy of that variable, which is the same guarantee that Rust tries to make, except that Rust requires you to explicitly specify when you have the only copy of a variable and when to make copies, whereas C// tries to infer that. There are even similarities in how they achieve this:
modifyable
is true
behave similarly to owned values in Rust: both can be potentially modified, and both can be transferred to a different variable without copying as long as the original variable is no longer used.(I even use the term "borrowed" and "ownership" at one point in the code (Expr.java
). I don't actually remember whether I was aware of Rust at the time; it's possible some vague knowledge of Rust inspired this design.)
That means that C// is kind of like a simplified Rust-like language where borrowing vs. ownership is inferred rather than having to be explicitly specified, so my thought is that maybe I could try transpiling this into actual Rust. That means that C// is not only a proof-of-concept for value semantics, but also for this inferred Rust-like language idea.
This transpiler not only transpiles into Rust, but is also written in Rust. It's also my first attempt at writing a real program in Rust.
You will need rustc
and cargo
installed. This program has no dependencies on external crates other than std
.
You will also need something that can extract gzipped tar files.
The download link is at the bottom of the page.
cd
into the directory, then type cargo build
.
You can run the unit tests with cargo test
, and run all the included test programs by typing ./do-test
. You can run a specific test by typing ./do-test name-of-file
(without the .cdd
extension or the test-input/
prefix). This will run four tests per file with different options, putting the output into the test-output-*
directories, and compare the files with each other and with test-output-expected/*.txt
if the file exists. (For both this and the original version, this only tests for correctness, not optimizedness.)
To run, either type
cargo run -- options input-file.cdd > output-file.rs
or
target/debug/cdd-rs options input-file.cdd > output-file.rs
options includes:
--default-share
@share
(see below).--optimize
struct
value will be owned (and copied even when not necessary), or, if --default-share
is supplied, every struct
value will be reference counted.-O0
through -O4
-O0
is the default, and -O4
is the same as --optimize
. Anything in between will perform only a subset of the optimization steps, which could be helpful if you're trying to figure out what they do or how they work.@share
, @own
, and @borrow
can be specified before struct
types (explained below).struct
s be defined before they're used in function parameters and return types. Due to differences in how I decided to structure the code, this requirement isn't enforced by this transpiler. struct
s still need to be defined before they're used as members in other struct
s, though (this prevents recursive struct
s).In addition, the following behaviors, which are undefined or implementation-defined in the transpiler to C, happen to be defined in this version of the transpiler to Rust (but could change in future versions):
int
s are 32-bit.Every variable, argument, and return type is assigned a type mode. There are three possible type modes, and (unlike in the transpiler to C) you can explicitly specify one to use (syntax: @own struct struct_name var_name
):
Mode | Translates to (in Rust) | Equivalent in the transpiler to C | Notes |
---|---|---|---|
@own | T | modifyable = true | cheap to modify, assigning to anything but @borrow is expensive unless it's the last use |
@share | Rc<T> | modifyable = false isContained = false | expensive to modify, assigning to other @share s is cheap |
@borrow | &T | modifyable = false isContained = true | cannot modify |
Some more notes:
@borrow
can only be used on function arguments, and variables declared @borrow
cannot be modified (either by reassigning them or by assigning to one of their members).@share
variables can be modified, but doing so will (potentially) make a copy.@share
variables have a reference count (unlike in the C implementation, where memory just leaks), there's an additional optimization where if the reference count is 1, it won't actually make a copy. This means that only using @share
is effectively an alternative solution to the problem of implementing value semantics, namely using runtime reference counting rather than static analysis.If optimizations are disabled, every variable where the mode isn't specified is marked as @own
if --default-share
isn't specified on the command line, and @share
if it is. If optimizations are enabled (≥ -O2
), then --default-share
only affects struct
members; the mode of everything else is determined by the optimizer.
In addition, each use of a variable is marked as "copy" or "move". If optimizations are disabled, then every variable use is "copy", and there's no way to change that.
For ≥ -O2
, all parameters start out @borrow
and everything else (except struct
members) starts out @share
.
-O1
): Goes through each function backwards, with a mutable set of live variables, which is initially empty. Each time it encounters a use of a non-live variable, it marks that use as "move" and adds it to the set of live variables. Each time it encounters an assignment to a variable or variable declaration, removes it from the list of live variables. This corresponds to the findUsedVariables
and analyzeLiveness
steps in the C version; however, this version assumes that expressions are evaluated left-to-right (so in f(x, x)
, the second x
could be the last use of x
), whereas the C version assumes that evaluation order is undefined (so in f(x, x)
, neither x
is treated as the last use).-O2
):@share
member of it) has a "move" use in a context that takes an @own
variable, then it's marked as @own
.@borrow
parameter is assigned to, it's upgraded to @share
. (This is why -O2
is required before defaulting to parameters to @borrow
; otherwise an implied @borrow
parameter could be assigned to.)@own
, then the return type is upgraded to @own
.Corresponds to findModifiedVariables
and findTransferOpportunities
in the C version.
-O3
):@borrow
variable is used in a context where it takes a @share
variable, that variable is upgraded to @share
.Kind of corresponds to setting isContained
in the C version, except in that version it just used a simple heuristic (is the parameter's type different from the return type?).
-O4
):@share
variable is used in a context that expects a @share
, or a @share
variable is assigned a value that comes from something else @share
, that variable is marked.@share
variables except function parameters are upgraded to @own
, so they don't need a reference count (since there's nowhere to take advantage of it).@share
, then its return type is upgraded to @own
.Doesn't have an equivalent in the C version. Since shared variables didn't have reference counts in the C version (since it just leaked memory), there aren't any reference counts to optimize away. (That said, a function that e.g. calls a constructor and just returns its result unmodified probably could benefit from this, e.g. make_return
in constructor.cdd
.)
-O1
): Goes through every function call, and every time an variable is borrowed, further uses of the same variable in the same call are marked as moves. This is because e.g. f(&x, x)
could be generated by the liveness analyzer, but is invalid (since the borrow lasts for the duration of the call), so the second x
becomes cloned. (This wasn't an issue for the C version, since it would have classified both x
's as moves regardless of whether one was borrowed.)Each of steps 2, 3, and 4 run in a loop until they reach a fixed point, going through the whole program each time. (Functions do only depend on functions they call, not functions that call them, but this means that I don't have to figure out what the actual call graph is like, and can also handle recursion.) Variables can only go from @borrow
to @share
to @own
, and can't go back, which guarantees that this will eventually terminate.
In all cases, if a mode is explicitly specified, then the optimizer won't change it.
After all that, when generating the code, there's a huge match
statement with a case for each combination of expected mode, mode of the expression, and whether moving is allowed:
@borrow = … |
@share = … |
@own = … |
||
---|---|---|---|---|
…= @borrow |
expr |
Rc::new(expr.clone()) |
expr.clone() |
|
…= @share |
Move | &*expr |
expr |
unwrap_or_clone(expr) |
Copy | Rc::clone(&expr) |
(*expr).clone() |
||
…= @own |
Move | &expr |
Rc::new(expr) |
expr |
Copy | Rc::new(expr.clone()) |
expr.clone() |
For comparison, here's what the C implementation does:
!modifyable = … | modifyable = … | struct member = … |
|||
---|---|---|---|---|---|
isContained | !isContained |
||||
…= !modifyable | no copy /*shared*/ | copy | copy | ||
…= modifyable | Function return val | no copy /*borrowed*/ | no copy /*return transfer*/ |
||
isTransfer | no copy /*transfer*/ |
||||
!isTransfer | copy | ||||
…= struct member | copy | copy |
The differences:
@borrow
and @share
, and copies in the @share = @borrow
case. The C version doesn't even have access to the information to make that distinction. In the C version, borrowed parameters and the equivalent of @share
variables can be mixed freely, as long as borrowed parameters can't escape the function; if they can escape the function at all, then they're marked as not borrowed.@share
variable, which the C version doesn't. In the @share = @share
case, this is used to determine whether to increment the reference count (which doesn't exist in the C version), and in the @own = @share
version this is used to potentially eliminate a copy if the reference count is 1 (which the C version can't do, since there's no reference count).struct
members are not pointers, so they're always copied.Some known limitations with the algorithm as implemented, and things that could potentially be improved:
int
struct
s are actually big lists), and this language probably isn't useful for taking actual measurements of how long things take (since the aforementioned optimization would always apply). It also means that I haven't written any code to deal with Box
es, which would probably be necessary to support recursive structures.(Speaking of Box
es, all the values here are probably directly on the stack, so moving might not actually be faster than copying anyways.)
@own
is helpful) and then shared to more than one other variable (@share
is helpful), then either each share makes a copy, or each modification makes a copy; it would be helpful if the variable could start out @own
and then switch to @share
after the last modification. The included tests choose-an-arg.cdd
and rec-read-return.cdd
are affected by this limitation.struct
. This means that I don't have to deal with lifetimes other than the lifetime of the current function and the lifetime of the function we're currently calling, but it means that some functions can't borrow when they should be able to.rec-read-mod-return.cdd
, the return value of f
is inferred as @share
because of the recursive call to itself, which returns @share
). This seems like the sort of thing that could be reduced to some known problem (constraint something? something optimization? something graph theory?), but I'm not sure what.struct
members (although simply allowing modes other than @own
is an improvement over the transpiler to C). I'm not sure how it could go about that; that might need something more whole program optimization–y (since it's not possible to tell how a struct
is used just by looking at its definition), or some way to be generic over the mode. Or maybe defaulting to @share
, combined with the fact that refcount = 1 doesn't require copying, is sufficient.Download cdd-rs.tgz (42K)