Turing Expressions (version 2)

The Turing Machine equivalent of regular expressions. A nondeterministic esoteric language.

The language runs on a nondeterministic Turing Machine, or a simulator of one. The machine has a tape, extending infinitely in both directions, with 256 possible symbols. The input is initially on the tape (the rest of the tape is null characters (\00)), and the output is whatever is left on the tape when the program terminates (up to the next null character in each direction). The tape has a current position, which is initially at the beginning of the input.

The code, input, and output are assumed to be in an 8-bit superset of ASCII (though implementations may choose to treat input and output as numbers instead). Whitespace in the code is ignored except in quoted mode and after a backslash. There is no built-in support for comments, though various no-ops such as ([]"comment goes here")? can be used instead.

If multiple paths finish successfully, the final result may be from any of the paths.

programcompiler-directive* instruction
A program is an instruction (typically a sequence instruction)

Complex instructions

instructionsimple-instruction
Described below. Read and/or write symbols and/or move the tape head.
instructioncompiler-directive
Described below. Changes the current mode.
instruction( instruction* )
Executes the specified instructions as a sequence. () is a no-op.
instruction[ instruction* ]
Forks the machine for each instruction, and executes that instruction as the next instruction. [] causes the current machine to fail (i.e., the current fork is not the one that produces the final output).
instructioninstruction *
Executes the specified instruction as many times as necessary.
instructioninstruction +
a+ is a shortcut for (aa*); i.e., executes the specified instruction at least once.
instructioninstruction ?
a? is a shortcut for [()a]; i.e., optionally executes the specified instruction.

Multiple *, +, ? operators cannot appear next to each other without parentheses. Compiler directives cannot come between the statement and the operator.

Compiler modes

The compiler has several modes, which affect the interpretation of various instructions (this is partly due to indecision on the part of the language designer). The compiler mode can be changed using compiler directives, each of which is a single character.

Directives affect any characters after them in the source file, regardless of whether they would be executed at runtime; furthermore, the directive does not count as an instruction. This version of TurExp eliminates run-time mode, which would change this behavior.

Direction compiler directives cannot be used in modeless mode.

Symbols

Symbols are used for reading and writing. The syntax for a symbol depends on the character mode.

For normal mode

symbolcharacter - reserved-character - whitespace | \ (character - hex-digit) | \ hex-digit hex-digit | .

reserved-character → one of . \ " { } ` ' # > < | ( ) [ ] * + ? ^

A normal character or a character preceded by a backslash stands for its ASCII value. A backslash followed by a two-digit hex literal is the value of the literal. A dot matches anything (regardless of inversion mode) or leaves the current cell unchanged.

For binary mode

symbolhex-digit hex-digit | \ character | .

Like normal mode, but the meaning of backslash is reversed. An unrecognized character is a syntax error.

For quoted mode

symbolcharacter - "

Each character stands for its ASCII value.

Simple instructions

The meaning of a simple instruction depends on the current RW mode.

In read mode

simple-instructionsymbol

Rejects if the current symbol is not the specified symbol xor the inversion mode is inverted; otherwise, moves in the current direction.

In write mode

simple-instructionsymbol

Writes the specified symbol, then moves in the specified direction.

In modeless mode

simple-instructionsymbol symbol direction

direction< | > | |

Reads the first symbol (possibly inverted), writes the second, and then moves in the specified direction. Compiler directives cannot be between the three parts of the instruction. The direction can be specified even in quoted mode, and it can be preceded by whitespace even in quoted mode.

Interpreters

Interpreter written in Haskell (source only; download and run; 49K compressed; also includes T-Write)