T-Write

Programming language based closely on Turing machines with a Turing-complete macro language based on expression rewriting.

The main data type in T-Write is a combination between a dictionary and a function that can use pattern matching. A T-Write program is one of these dictionaries/functions, and there is an operator that allows looking things up in the main dictionary of the program. This by itself would be Turing-complete, but this dictionary is then used to construct the states for a Turing machine (or a finite-state automaton).

Contents

Lexy stuff

Whitespace between tokens is ignored.

Names consist of letters, digits, -, ., _, '. Like Haskell, whether a name starts with a capital letter determines how it's interpreted. For this purpose, _ is treated as lowercase, while -, ., ', and digits are treated as uppercase. The name _ is reserved for use as a wildcard.

Comments start with % and last until the end of the line. Also, in the newer version, if a comment at the beginning of the program starts with %% (two parentheses and a space), then the rest of the line will be printed when the program starts (this can be used to add a description of what the program does).

Types and values

Basic types include:

A tuple (value1, value2, ...) is equivalent to a dictionary with keys 0, 1, etc. A list is a linked list of (head, rest) pairs.

Expressions

Expressions include:

lowercase name
The value of a variable bound in the current scope
capitalized name
A symbol literal
{ key : value ; key : value }
A dictionary. There may be any number of semicolon-separated key : value pairs, including zero. A trailing semicolon is allowed and is ignored. The keys are patterns and the values are expressions that can refer to variables bound in the patterns. If a key pattern matches multiple values, then all relevant values are in the dictionary. If key patterns overlap, the first one listed takes priority.
( value , value )
A tuple (dictionary with keys 0, 1, etc.). There may be any number of comma-separated values, including zero or one. (Note that this means parentheses are not used for precedence disambiguation like in most languages; a single parenthesized expression always creates a 1-tuple.) A trailing comma is allowed and is ignored. The values are all expressions. This is simply a shortcut for the equivalent dictionary literal.
( : value1 , value2 : )
A linked list. Equivalent to (value1, (value2, ()))
( : value1 , value2 : last )
A linked list with last replacing the end-of-list marker (like Lisp's (x y . z)). Equivalent to (value1, (value2, last))
[ : value , integer : ]
A linked list with the specified number of copies of an item.
expression *
Rewrite the expression (explained below).
expression + dictionary literal
Replace or add entries to a dictionary. It is a type error if expression is not a dictionary.
* pattern
Pattern literal.
= expression
Same as expression (for consistency with patterns).

Patterns

Pattern syntax is a superset of expression syntax.

_
Wildcard; can match anything. Because patterns involving _ can match infinitely many things, there are some situations where _ is not allowed.
variable @ pattern
Matches pattern and binds variable. A variable cannot be bound twice in the same scope.
variable @
Shortcut for variable @ _
capitalized name
Matches an exact symbol.
{ key : value ; key : value }
Matches a dictionary. Keys and values are both patterns. The dictionary must contain all keys that match the patterns, and it may not contain more. If a key has a pattern that can match multiple things, then the corresponding value cannot bind variables. Variables bound in a key are only in scope for that entry. The key patterns may not involve _ wildcards. Note that a dictionary literal with overlapping keys might not match the equivalent pattern; e.g., {A:1; [A,B]:2} does not match {A:1; [A,B]:2} because that pattern would require A to be both 1 and 2.
{ key : value ; key : value ; _ }
Same as a dictionary pattern, except that it also allows the dictionary to contain more entries.
{ : variable @ optional pattern }
If the key in a dictionary pattern is omitted and the value binds a variable, the key is implied to be the variable name with a . in front of it. That is, {: foo@pattern} is equivalent to {.foo: foo@pattern}.
( pattern1 , pattern2 )
( : pattern1 , pattern2 : )
( : pattern1 , pattern2 : end pattern )
[ : pattern , integer : ]
These still expand to the same things as they do in expressions; patterns can be used as expected inside these constructs. (:items:_) can be used to match lists that start a certain way, without requiring that it's the whole list.
[ : pattern , integer # integer : ]
Matches a list of a particular type with a specified range of lengths. The lower bound may be omitted, in which case it is implied to be 0. The upper bound may not be omitted.
[ pattern , pattern ]
Union (match if any of the patterns matches). Comma-separated list of zero or more patterns. If a variable is bound in one pattern, it must be bound in all of the patterns.
optional integer # optional integer
An integer within the specified range, inclusive (symbols whose names are made of digits). # means "any integer". Note that integers are symbols, and there isn't any way for a symbol to exist that isn't in the source code (once all syntactic sugar is expanded), so in any given program, # can only match a finite number of things.
= expression
Evaluate the expression. If the expression results in a pattern, use that pattern. If the expression results in any other value, test that value for equality (e.g., can be used to escape dictionaries with overlapping keys).
expression *
expression + dictionary literal
Do the same thing as if there were = in front of the expression (e.g., foo* as a pattern means the same thing as =foo*).
lowercase name
Matches the value of a variable already bound in the current scope, acting the same as if = were in front of the variable name.
* pattern
Same as pattern by itself (for consistency)
expression ~ ( pattern1 , pattern2 )
Matches if expression matches against pattern1, and pattern2 matches the original input. (Intended for simple descriptions of Turing machines, where the output depends on both the current tape symbol and the state.)

Postfix operators have a higher precedence than prefix operators.

If an infinite dictionary (a dictionary with a key containing _) is matched for equality for another dictionary, the test will fail.

Patterns cannot match other patterns except using a wildcard (_).

Ranges

There is a shortcut for lists of similar identifiers, like Item1, Item2, Item3, or A, B, C, D, E. This can be used for capitalized identifiers in comma-separated lists (tuples, linked lists, unions). Include the common part, and replace the changing part with the range in angle brackets.

The following forms are allowed:

<upper>
A through Z
<lower>
a through z
<digit>
0 through 9
<udigit>, <ldigit>
Uppercase or lowercase (respectively) hexadecimal digits
<byte>
0 through 255
<sbyte>
-128 through 127
<ubyte>, <lbyte>
Uppercase or lowercase (respectively) pairs of hexadecimal digits
< start , end >
Start through end, where start and end are numbers, uppercase letters, or lowercase letters
< start , second , end >
Start through end, also specifying the second item (which the step is deduced from). May include numbers with decimal points.

For instance, ['<upper>', '<lower>'] is equivalent to ['A','B','C' ... 'Y','Z','a','b','c' ... 'z'], and (<0,2,20>) is equivalent to (0,2,4,6,8,10,12,14,16,18,20). (Note that the latter is also the "multiply by 2" function, with a limited domain; this is one potential use for this feature.)

You can also use them in a dictionary, in which case both the key and the value must be ranges; corresponding elements of each range are used.

Programs

A program is of the form capabilities_pattern : io_pattern , tape_pattern , state_pattern : dictionary_expression. The dictionary expression has the primary rules for the program; it maps states to actions, and the * postfix operator looks up values from this dictionary. The capabilities pattern (and its following colon) may be omitted. If the IO pattern is omitted, it is assumed to be the same as the tape pattern; otherwise, it must be a subset of the tape pattern.

The capabilities pattern determines the capabilities of the program. Implementations may reject programs that require capabilities that they don't support, or use the pattern to determine which implementation to choose. Capabilities are described as a dictionary that contains at least the following keys:

Mem: [Tape, Stack, State, Finite]
Determines the type of memory available. Tape may use Move: rules (Turing complete); Stack may use Stack: rules (push-down automaton); and State may have _ in its state, IO, and tape patterns (Turing complete). Finite can do none of these (finite state machine). Any of these may use Write: rules (all except Tape have a one-symbol tape). (The current version of the interpreter doesn't support Mem: Stack.)
Nondeterm: [True, False]
Determines if nondeterminism is allowed. (The current version of the interpreter doesn't support Nondeterm: True.)
In: [Tape, None, Interact, Both]
Determines how input is received. For Tape, input is on the tape when the program starts (for memory type Tape, the current position is at the start of the input; for other memory types, the input is one symbol); otherwise, the tape is initially empty. For None, no input is received. For Interact, IO:In operations may be used; if both input and output are set to Interact, IO is interactive. If In: Both is specified, the program accepts input both interactively and on its tape.
Out: [Tape, Bool, Interact, Both]
Determines how output is given; similar to input. For Tape, the output is the section of the tape including the current pointer, extending left and right to the first blank encountered (or the current symbol, for memory types other than Tape). For Bool, the output is true if the program uses Halt and false if it uses Reject. For Interact, IO:Out operations may be used; if only output is Interact, this can be used to generate infinite output and still let the user see it. If Out: Both is used, the program gives output both interactively and on its tape.

Implementations should support programs that only accept these keys. If the capabilities pattern does not contain _, options listed first in the program are preferred. Variables bound in the capabilities pattern are available throughout the program.

tape_pattern is the type of symbol on the tape. It must not contain _ (including implicitly). Variables bound here are bound throughout the program. io_pattern is the type of symbol used for input and output, whether interactive or through the tape; it may not contain _ or an unbounded #. Variables bound in io_pattern aren't available elsewhere; variables should be bound in tape_pattern instead. state_pattern is the type of state used by certain built-in rules; it may not bind any variables or contain _. Everything matched by the state_pattern must be matched by a key in the main dictionary.

The first possible value for tape_pattern is considered to be a blank value; parts of the tape that don't have input are filled with this value.

The variables io_pattern, tape_pattern, and state_pattern are bound throughout the code to refer to the patterns at the beginning of the program.

The initial state is Start. For each state, the current state is looked up in the dictionary literal, and the result is used as the next state, unless it matches one of the built-in rules:

Halt
Halt in an accepting state. Always supported.
Reject
Halt in a rejecting state. Always supported.
{Write: t@tape_pattern; Next: s@state_pattern}
Write symbol t, and go to state s. Always supported.
{Move: n@#; Next: s@state_pattern}
Move n spaces to the right (can be negative to move left), and go to state s. Only supported with Mem: Tape.
{Stack: which@[Push,Pop]; Next: s@state_pattern}
Moves an element between the current tape symbol and the stack. Requires Mem: Stack.
{IO: In; Next: s@state_pattern}
Reads a symbol of input into the current tape cell. Requires In: Interact.
{IO: Out; Next: s@state_pattern}
Outputs the current symbol on the tape. Requires Out: Interact.
{Write: t@tape_pattern; Move: n@#; Next: s@state_pattern}
{Write: t@tape_pattern; IO: Out; Next: s@state_pattern}
{Write: t@tape_pattern; Stack: Push; Next: s@state_pattern}
Writes a symbol to the tape, then performs the specified action. Has the same requirements as performing the action normally.
{Choose: (state1@state_pattern, state2@state_pattern)}
Fork a new instance of the program for state1 and state2. The entire machine accepts the input if any copy of it accepts. Only supported with Nondeterm: True.

Built-in rules take precedence over user-defined rules. Built-in rules are ignored by the * operator. Built-in rules that are not supported by an implementation are treated as ordinary states; this allows programs to work on multiple types of machines. It is a type error if the rewritten expression is not a key in the main dictionary.

Note that the rewriting may be done at compile time. If it does not terminate for some (state, tape) combination, the compiler is not guaranteed to terminate either.

By default, input is expected to be in the form of valid expressions. If you instead want character-based IO, you can define a rule in the main dictionary for CharCode which maps to a number (Unicode value) for each tape value; this is used to construct a lookup table between characters and values.

Translation from Brainfuck

Template:

current @ 0 # 255, #: {
	Incr: (<1,255>,0);
	Decr: (255,<0,254>);
	{Lookup:key@; In:{key:value@; _}}: value;
	{If:0; Then:_; Else:e@}: e;
	{If:_; Then:t@; Else:_}: t;
	{Modify:op@; Next:n@#}: {Write: {Lookup:current; In:op}*; Next:n};
	{LoopFrom:start@#; To:end@#}: {If:current; Then:start; Else:end};
	Start: 0;
	code goes here
	last opnum + 1: Halt
}

Commands:

-opnum: {Modify:Decr; Next:next opnum}
+opnum: {Modify:Incr; Next:next opnum}
<opnum: {Move:-1; Next:next opnum}
>opnum: {Move:1; Next:next opnum}
[ ... ][first opnum,second opnum]: {LoopFrom:first opnum + 1; To:second opnum + 1}
.{IO:Out; Next:next opnum}
,{IO:In; Next:next opnum}

Interpeter

Interpreter written in Haskell (updated 2020; source only, download and run; 56K compressed, 412KiB uncompressed; also includes TurExp)

Original interpreter written in Haskell (not compatible with modern versions of GHC; source only; download and run; 49K compressed; also includes TurExp)

Running the interpreter

The interpreter can either be run interactively, by simply typing ./twrite, or with command-line arguments, by typing ./twrite filename tape-args. Command-line arguments are used for the Tape input type; if the program uses the Interactive input type, input comes from standard input. The -d option can also be specified to show debugging information (note that debug mode ignores CharCode).