Optimizing a call-by-value language



In many programming languages, such as Java, JavaScript, Python, and Scheme, complex data types like objects and arrays are treated as reference types (i.e., call-by-pointer). If an object is passed to a method, changes the method makes to members of that object are also visible to the caller. However, there are other ways languages could handle such situations; in particular, they could treat all complex data types as values, where each variable has its own copy of the entire structure, and modifications to one copy do not affect other copies. (I will call this call-by-value, though under some definitions call-by-pointer is a type of call-by-value. C behaves this way with structure types.)

There are advantages and disadvantages to each approach; for instance, call-by-value can make code easier to reason about, because you know that local variables will not be modified behind your back. A disadvantage of call-by-value relative to call-by-pointer is that it uses more memory and incurs overhead for copying objects. However, it seems that a sufficiently smart optimizer could notice places where an object does not need to be copied and use call-by-reference there instead. My goal in this project is to show that such an optimizer is possible.

Cases that should be optimized

The following are some cases that I expect to be common that such an optimizer could optimize:

  1. Passing a complex object to a function that doesn't modify any part of the object and doesn't allow the object to escape. For instance, many functions that simply output the contents of the object (whether to the console, or to a GUI, or to a file or networked computer) fall into this category, as do many functions that compute a simple value based on the object (such as summing all elements of an array). The tests simple-readonly-call.cdd and more-readonly-calls.cdd test this case.
  2. Passing an object to a function that modifies the object and then returns it, and storing the return value in the original object (for example, array = sort(array)). This could be used in cases where pass-by-pointer or pass-by-reference languages would just destructively modify the object without returning it; therefore, it should be optimizable in many cases to do what pass-by-reference languages do. (An advantage to using call-by-value here would be that functions like these have a consistent and predictable interface; there's no need to guess whether a function like sort() in this language modifies the array/list or returns a copy of it.) The tests modify-and-return.cdd and sum-of-sums.cdd test this case.
  3. Initializing a new object, and then returning it. This could be used for constructor-like functions and for functions that calculate a value that happens to be of a complex type. The test constructor.cdd tests this case.

(The optimizer should also work on other cases, such as sharing variables within a function, but these are what I'm most focused on.)

Algorithm description

The compiler classifies each structure variable in a function as either modifiable or sharable, but never both. If a variable is modifiable, the compiler allows its members to change, but ensures that it is not aliased anywhere those changes could be observed (copying it if necessary). If a variable is sharable, it may be aliased to other variables, but its members cannot be modified. The compiler makes the simplifying assumption that if a function can modify a variable somewhere, it can modify it anywhere in that variable's scope (as long as it is live).

Whether a variable is modifiable is initially determined by whether the function modifies a member of the object. In other words, a variable x is modifiable if a statement like this appears somewhere in the function, and sharable otherwise:

x.y = something;

Assigning a modifiable variable to another variable, or assigning another variable to a modifiable one, usually requires a copy. One exception is passing a variable to a function that does not modify it and from which it cannot escape; in that case, the variable becomes temporarily immutable, and thus sharable, for the duration of the call. I use a simple test for whether a variable can escape: the only way the current language specification lets a variable escape is through a function's return value (also through struct fields, but those are not optimized), so I assume a variable can escape if and only if its type is the same as the function's return type.

The optimizer also performs liveness analysis: if a function uses a variable for the last time when calling another function (and the variable can't be shared with anything else), the optimizer can transfer ownership of the variable to the callee, which can then share or modify it as it sees fit. The optimizer looks for such opportunities to transfer ownership. It can also mark a variable as modifiable (and thus not sharable, potentially requiring whatever was assigned to that variable to be copied) if it sees that a previously-sharable variable is transferred to a modifiable one. This optimization handles case 2 in the "Cases that should be optimized" section above.

Because this can change whether some variables are modified, the optimizer looks for such opportunities in a loop, which ends when nothing has changed. Since each iteration can only add transfers and mark more variables as modifiable, never the reverse, the analysis is monotone over a finite set of variables and must terminate.

There's also a special case for function return values, which I'm not sure I'm completely happy with, but it seems to work well enough (and shouldn't produce wrong results, even if it sometimes copies when it doesn't have to).

Implementation notes

The modifyable [sic] attribute is stored as an instance variable in the object representing the variable's type. This variable is (perhaps somewhat ironically) modified through many references to this object; the object is passed around in such a way that each variable has its own copy, and expressions referring to that variable use the copy associated with the variable.

Subclasses of each main superclass (Statement, Expr, Decl, Type) are in the same file as the main class, except that Decl is a subclass of Statement.

I used an AST as an intermediate representation, with an additional node Expr.PointerExpr that means "dereference or copy if you have to". The final decision of whether to copy a value is made in Expr.PointerExpr.toC().

Some of the code relating to making the parser and lexer work was copied from my CS321 assignment, including from the templates provided in that class. I hope this is okay. This is in Makefile, lexer.lex, parser.cup, and ParserDriver.java. (The language and the representation of the AST is different, so most of the actual lexer and parser rules were not copied from there.)

I started to implement better escape analysis, but none of the code is actually called.

If a variable is used twice in one expression, I assume that neither use is the last use. This is because expressions translate fairly directly to C expressions, where the order of evaluation is unspecified. (I could have made a special case for && and ||, whose order is defined, but I didn't.)

When generating the C code, names of variables are prefixed with cdd_ in order to prevent conflicts with names used by C (reserved words and things in the standard library) and with names used internally by the compiler.

Potential downsides

The optimizer depends on facts it learns from a function's implementation. If the implementation of a function changes, other modules that use it (if the language supported separate modules) would need to be recompiled. (However, I think this is true of GHC, the Haskell compiler, as well; I'm fairly sure it allows library functions to be inlined into code that uses them.)

Using an optimizer like this may encourage people to write with the optimizer in mind (avoid certain constructs that are known to be slow, put extra effort into deciding whether to assign to a member variable or assign a whole object, etc.), thus cancelling out some of the advantages of having the compiler take care of it.

I did not get around to doing any of the optional stuff I said I might do if I had time.


Results

The algorithm seems to work for the programs I've tried; in particular, the tests mentioned in "Cases that should be optimized" do not use any copying at all.