TurExp (esoteric programming language)

A Turing Machine equivalent of regular expressions. A nondeterministic esoteric programming language, nondeterministic in the computer science sense of always choosing the right option when there's a choice.

The language runs on a nondeterministic Turing Machine, or a simulator of one. The machine has a tape, extending infinitely in both directions, with 1112064 possible symbols (one per Unicode character, but see below) and a current position that's pointing to a symbol on the tape (not between symbols).

Whitespace in the code is ignored except in quoted mode and after a backslash. There is no built-in support for comments, though no-ops such as ([]"comment goes here")? can be used instead.

program → ((?v3))? compiler-directive* instruction: A program is an instruction (typically a sequence instruction). Programs can optionally start with (?v3) to indicate that they use TurExp version 3. (This is not necessary to use version 3–specific features; it's mostly there for the possibility of a version 4 or higher.)

Character encoding
Input and output
Complex instructions
- The (?s) instruction (version 3 only)
Compiler modes
Symbols
Simple instructions
Macros (version 3 only)
Full grammar
Version differences
- Version 1 (unimplemented) to 2
- Version 2 to 3
Translation from Brainfuck
Interpreter written in JavaScript (TurExp 3)
Interpreter written in Haskell (TurExp 2)
- Running the interpreter
- List of examples

Character encoding

Version 2 was documented as having 256 symbols, and the documentation said "The code, input, and output are assumed to be in an 8-bit superset of ASCII (though implementations may choose to treat input and output as numbers instead)". However, the actual implementation used the platform-default character encoding and actually seems to support Unicode just fine, except that characters outside Latin-1 can't be entered as hex codes.

Version 3 fully supports Unicode. There are 1112064 possible symbols, one for each Unicode character except the surrogate code points (d800-dfff), which TurExp doesn't consider to be valid characters.

In version 3, the (?u) and (?-u) compiler directives determine the encoding of the source file. (?u) indicates that everything from that point on is in UTF-8; (?-u) indicates that everything from that point on is Latin-1. Whichever one is in effect at the end of the program determines the encoding for input and output.

However, no implementation actually supports these directives, because the only version 3 interpreter is written in JavaScript, and its input and output are strings rather than binary, so it just ignores these directives. They're still specified in case there are any non-JavaScript implementations at some point, and recognized for compatibility with such implementations.

Input and output

Before the program starts, a block of text is read as input. This text is put onto the tape before the program starts, and the current position is set to the first character of this text. The rest of the tape is filled with null characters (\00).

When the program ends, the interpreter outputs whether the program got to the end (accepted) or if all branches failed (rejected). If the program accepted, the interpreter looks to the left and the right of the current position for the nearest null character, and outputs everything on the tape between those characters. If the current position is on a null character, then it starts looking one character to the left. This means that, e.g., if the tape has \00foo\00 on it (and the rest is filled with null characters), and the current position is anywhere between the f and the \00 directly after the word foo, inclusive, the program will output foo; otherwise, it will output nothing.

If the program forks into multiple copies and more than one complete successfully, one of them will arbitrarily chosen to be the output. In the current interpreter, the same output will always be chosen for the same program/input combination, but that's not guaranteed, and the output chosen can differ between interpreters and versions of an interpreter. If the "Show all results" checkbox is checked (or the -a option is given to the Haskell interpreter), then the interpreter will continue running until all copies of the program have succeeded or failed.

There is no way to do input while the program is running, and no good way to do output before the program finishes. For debugging purposes, in version 3, you can check "Trace" to show every command that's executed, and you can print the current state using the (?t) instruction, which is a no-op that's always traced even if "Trace" isn't checked. You can put text in the parentheses after the ?t, which will also be printed.

Complex instructions

instruction → simple-instruction: Described below. Read and/or write symbols and/or move the tape head.
instruction → compiler-directive: Described below. Changes the current mode. (Technically not an instruction, but can be used where an instruction can, except before *, +, or ?.)
instruction → ( instructions ): Executes the specified instructions as a sequence. () is a no-op.
instruction → [ instructions ]: Forks the machine into as many copies as their are instructions, and each copy executes a different instruction from the brackets. [] causes the current machine to fail (i.e., the current fork is not the one that produces the final output).
instruction → (?^ characters ) (version 3 only): Reads a single character and fails if it matches any of the specified characters. characters technically can be any instructions, but things other than characters and ranges might behave unexpectedly. See the section on modes for details.
instruction → (?? instructions ) (version 3 only): Chooses one of the instructions at random (equally likely) and executes it. Doesn't attempt to choose another option if it fails.
instruction → instruction *: Executes the specified instruction as many times as necessary.
instruction → instruction +: a+ is a shortcut for (aa*); i.e., executes the specified instruction at least once.
instruction → instruction ?: a? is a shortcut for [()a]; i.e., optionally executes the specified instruction.
instruction → (?$) (version 3 only): Halts the program and causes it to complete successfully.
instruction → (?t (char - " ( ) [ ] \ | " (char - ")* ")* ) (version 3 only): A no-op that causes the interpreter to print the current program state, as described above. The details are implementation-specific, and implementations can choose to ignore this instruction.

Multiple *, +, ? operators cannot appear next to each other without parentheses. Compiler directives cannot come between the statement and the operator.

The `(?s)` instruction (version 3 only)

There are two new instructions that work together to create a temporary variable:

instruction → (?S instructions ): Acts like a sequence instruction, but introduces a new scope. These instructions cannot nest.
instruction → (?s instructions ): Must be used within an (?S) instruction. This instruction acts like a [] instruction, except that every (?s) instruction within a single (?S) instruction always chooses the same index.

For instance, (?S(?sab)(?s12)) first matches either a or b, and then matches either 1 (but only if the first match was a) or 2 (but only if the first match was b). This is equivalent to [(a1)(b2)], but it can be useful if there are a lot of instructions between the first and second match, and it can be useful when used with (?r).

Compiler modes

The compiler has several modes, which affect the interpretation of various instructions. The compiler mode can be changed using compiler directives. More details on these modes are in later sections.

The character mode determines how characters are represented. It can be normal, quoted, or binary, and it is initially normal.
- " switches to quoted mode, where most special characters are treated like normal characters. In quoted mode, " switches to normal mode.
- { switches to binary mode, where symbols are entered as hex values.
- } switches to normal mode, where symbols are entered as characters.
The RW mode determines whether the program is currently reading from or writing to the tape. It can be read, write, or modeless. It is initially read.
- ` switches to write mode.
- ' switches to read mode.
- # switches to modeless mode, where each statement can read and write, and where the direction is specified per-instruction.
The direction mode determines which direction the cursor moves on the tape after reading or writing. It can be left, right, or stop. It is initially right, and is reset to right when exiting modeless mode.
- < switches to left mode.
- > switches to right mode.
- | switches to stop mode, where the cursor doesn't move.
The inversion mode determines whether to read characters that match or that don't match those specified in the code. It can be non-inverted or inverted. It is initially non-inverted.
- ^ toggles the inversion mode.
- If you want to match anything except a specific set of characters, something like [^abc] won't work, because that means (not a) or (not b) or (not c), and any character matches at least one of those options. Instead, use a sequence of reads in stop mode; for instance, ^|abc^ checks that the character isn't a, then checks if it isn't b, then checks if it isn't c, and the | means the cursor doesn't move so it checks the same character each time.
- Version 3 introduces the (?^characters) command, which does that for you. It's syntactic sugar for ('|^characters^>.), except the ^s always enter and exit inversion mode, respectively, and the last direction change depends on the current direction at the start of the instruction.
(Version 3 only.) The case mode determines whether to match case sensitively. It can be case-insensitive or case-sensitive, and is initially case-sensitive.
- (?i) switches to case-insensitive mode.
- (?-i) switches to case-sensitive mode.
(Version 3 only.) The digits mode determines how many hex digits are used per symbol in binary mode or after a \. It can only be changed when in binary mode (in normal mode, you can instead use {x}, {u}, and {U}, which temporarily switch to binary mode). It can be two-digit, four-digit, or six-digit, and is initially two-digit.
- x switches to two-digit mode.
- u switches to four-digit mode.
- U switches to six-digit mode.
(Version 3 only, ignored in the JavaScript interpreter.) The encoding mode determines the encoding of the source file. It can be UTF-8 or Latin-1, and its initial value is implementation-dependent. See the section on encoding.
- (?u) switches to UTF-8 mode.
- (?-u) switches to Latin-1 mode.

Directives run at compile-time and affect any characters after them in the source file, regardless of whether or when the directive would be executed at runtime; for instance, ([ab`]c) is equivalent to ([ab]`c) (both choose between two options, reading a or b, and then write c), and not equivalent to [(ac)(bc)(`c)]. Version 2 of TurExp eliminates run-time mode, which would change this behavior.

Direction compiler directives cannot be used in modeless mode.

Symbols

symbols → symbol | .

Symbols are used for reading and writing. The syntax for a symbol depends on the character mode.

For normal mode

symbol → character - reserved-character - whitespace | \ (character - hex-digit) | \ hex-digits

reserved-character → one of . \ " { } ` ' # > < | ( ) [ ] * + ? ^

A normal character or a character preceded by a backslash stands for its ASCII value. A backslash followed by a hex literal is the value of the literal. The number of digits in the hex literal is determined by the digits mode.

For binary mode

symbol → hex-digits | \ character

Like normal mode, but the meaning of backslash is reversed. An unrecognized character is a syntax error.

For quoted mode

symbol → character - "

Each character stands for its ASCII value.

Ranges (version 3 only)

symbols → (?r symbol symbol? symbol )

In any situation where a symbol is expected, a range can be used instead (except within another range, or in quoted mode). These are notated as (?r, the first character, an optional second character, and then the last character, and ). For instance, (?raz) is equivalent to abcdefghijklmnopqrstuvwxyz, and (?r028) is equivalent to 02468 (skipping the odd numbers because the second character is two characters after the first character). Characters use the same syntax that they normally would (so backslashes are necessary to escape reserved characters, and they're entered in hex when hex mode is active).

A range is equivalent to the corresponding sequence of characters, whatever that means in the context where it's used. For instance, ['(?r15)] matches any of the characters 1, 2, 3, 4, or 5, whereas (`(?r15)) writes the string 12345. Modeless mode treats ranges a bit differently.

A range cannot be directly followed by *, +, or ?, and can't be at the top level, without being in some sort of parentheses or brackets.

Simple instructions

The meaning of a simple instruction depends on the current RW mode.

In read mode

simple-instruction → symbols

Reads a symbol from the tape. If the inversion mode is non-inverted, the symbol on the tape must be the same as the one specified in the instruction; if the inversion mode is inverted, it must be different. Otherwise, the current fork fails. If the case mode is case-insensitive, then the match is case-insensitive (regardless of whether the symbol was specified as a character or a hex code). Afterwards, moves in the current direction.

If a dot (.) is specified instead of a symbol, skips over the current symbol without failing, regardless of inversion mode.

In write mode

simple-instruction → symbols

Writes the specified symbol, then moves in the current direction. If a dot is specified instead of a symbol, just moves in the current direction. Inversion mode and case mode are ignored.

In modeless mode

simple-instruction → symbols symbols direction

direction → < | > | |

Reads the first symbol (possibly inverted), writes the second, and then moves in the specified direction. Compiler directives cannot be between the three parts of the instruction. . can be specified for either or both symbols to skip reading or writing. The direction can be specified even in quoted mode, and it can be preceded by whitespace even in quoted mode.

If a range is specified for either symbol, the entire instruction is repeated. For instance, (?r13)4> is equivalent to 14>24>34>. If ranges are specified for both symbols, then the ranges must be the same length, and corresponding elements are used. For instance, [#(?r08)(?r19)>90>] will add 1 to the current digit, wrapping around if it's 9.

Macros (version 3 only)

macro-char → character - (" . ( ) [ ] \)

compiler-directive → (?M macro-char macro-char* . instruction* )

instruction → (?m macro-char instruction* )

Macros can be defined using (?M), which can appear anywhere a compiler directive can, and can be called with (?m).

For a simple macro with no arguments, it can be defined as (?M name . instructions) (notice there's a dot there), where name is any single character except quotation mark, backslash, dot, parentheses, or brackets. It can be used with (?m name).

For arguments, when defining the macro, put the parameter names between the macro name and the dot (parameter names must also be single characters). If there's a parameter named x, then you can use it with (?mx). To pass arguments, simply put them after the macro name when calling it. For instance, the macro (?M2x.(?mx)(?mx)) performs any instruction you give it as an argument twice; once this is defined, (?m2a) expands into aa.

If a macro is passed more arguments than it expects, then all of the remaining arguments go into whatever the last parameter is. If a macro is passed fewer arguments than it expects, then remaining parameters will expand to nothing. If a macro that doesn't expect any arguments is passed arguments, those arguments are ignored.

If a macro or a macro argument expands to multiple instructions, the interpretation of those multiple instructions depends on context; for instance, if it's in a sequence instruction, then the different instructions are executed in sequence, if it's in a fork instruction, then each instruction becomes a different fork, etc. If a macro expands to multiple instructions, then it's an error for the (?m) instruction to be immediately followed by a postfix operator or to be at the top level.

Compiler directives take effect before macro expansion. For instance, (?Mfx.'(?mx))`(?mfa) writes a rather than reading it, because write mode (`) is active where the a is. Compiler directives do not count as macro arguments, so (?mf`a) has one argument, not two.

Macros can only be called after they are fully defined. This implies, among other things, that they can't be recursive.

Version differences

Version 1 (unimplemented) to 2

Version 2 is not backwards-compatible with version 1. The differences:

*, +, and ? go before the instructions they modify in version 1, and after them in version 2.
Version 1 had an extra compiler directive, !, which was removed in version 2.

In version 1, there was a mode mode, which was initially compile-time but could be switched to runtime using the ! directive (but it couldn't be switched back). In runtime mode, all compiler directives would act like statements, affecting instructions that are executed after the directive executes, so for instance (['`]a) would either read a or write a (equivalent to (['a][`a])).

The main issue with this was that " was a compiler directive, which meant that which characters were quoted could change depending on how the program runs. For instance, if (foo"bar)? executes, then the ending parentheses and question mark don't end the instruction and are instead treated as normal characters, whereas if it doesn't, that's the end of the instruction and a question mark to indicate the instruction is optional.

This made things harder to implement, so I first changed the postfix operators to prefix so at least the compiler would know up front if there's a situation like the one I just described, and then got rid of runtime mode entirely (and changed the operators back to postfix), given that it's optional and I expected people (or at least me) to just choose compile-time mode anyways.

Version 2 to 3

Version 3 attempts to be backwards-compatible with version 2 (aside from some Unicode-related issues). The new instruction types generally start with (?, a sequence of characters that was always a syntax error in version 2: (?i) (case), (?u) (Unicode), (?M)/(?m) (macros), (?^) (negation), (??) (random), (?S)/(?s) (same), (?t) (trace), (?$) (accept).

Of those, (?M)/(?m) (macros), (?^) (negation), and (?S)/(?s) (same) are fairly easy to mechanically translate into version 2 code. (?$) is trickier, (?i) requires Unicode knowledge, and (?u) (Unicode), (??) (random), and (?t) (trace) do things that can't be done in version 2.

Also, in version 2, compiler directives can't come after the main instruction.

Translation from Brainfuck

This assumes a right-infinite tape (going to the left of the beginning of the tape is undefined behavior), and doesn't support input. Input is assumed to be empty; output is in hexadecimal. (This is because actual ASCII output was harder in version 2; the version 3 translation shown here also uses hexadecimal so you can see how version 3 instructions are shortcuts for version 2 instructions.)

Template:

(
#
..<\00.<\000<\000>
commands go here
'^\00*^\00
)

Commands:

Brainfuck	TurExp (v2)	TurExp (v3)
`+`	`[01\|12\|23\|34\|45\|56\|67\|78\|89\|9a\|ab\|bc\|cd\|de\|ef\| (f0<[01>12>23>34>45>56>67>78>89>9a>ab>bc>cd>de>ef>f0>])]`	`[(?r08)(?r19)\|9a\|(?rae)(?rbf)\| (f0<[(?r08)(?r19)>9a>(?rae)(?rbf)>f0>])]`
`-`	`[fe\|ed\|dc\|cb\|ba\|a9\|98\|87\|76\|65\|54\|43\|32\|21\|10\| (0f<[fe>ed>dc>cb>ba>a9>98>87>76>65>54>43>32>21>10>0f>])]`	`[(?rfb)(?rea)\|a9\|(?r91)(?r80)\| (0f<[(?rfb)(?rea)>a9>(?r91)(?r80)>0f>])]`
`<`	`..>..>`	`..>..>`
`>`	`..<..<[^\00.\|^(\000<.0>)]`	`..<..<[^\00.\|^(\000<.0>)]`
`[`	`([^0.\|^(0.<^0.>^)]`	`([^0.\|^(0.<^0.>^)]`
`]`	`)*0.<0.>`	`)*0.<0.>`
`.`	..< [(0!>'^\00^\00^\00^#\000<^!.<^!0>) (1!>'^\00^\00^\00^#\001<^!.<^!1>) (2!>'^\00^\00^\00^#\002<^!.<^!2>) (3!>'^\00^\00^\00^#\003<^!.<^!3>) (4!>'^\00^\00^\00^#\004<^!.<^!4>) (5!>'^\00^\00^\00^#\005<^!.<^!5>) (6!>'^\00^\00^\00^#\006<^!.<^!6>) (7!>'^\00^\00^\00^#\007<^!.<^!7>) (8!>'^\00^\00^\00^#\008<^!.<^!8>) (9!>'^\00^\00^\00^#\009<^!.<^!9>) (a!>'^\00^\00^\00^#\00a<^!.<^!a>) (b!>'^\00^\00^\00^#\00b<^!.<^!b>) (c!>'^\00^\00^\00^#\00c<^!.<^!c>) (d!>'^\00^\00^\00^#\00d<^!.<^!d>) (e!>'^\00^\00^\00^#\00e<^!.<^!e>) (f!>'^\00^\00^\00^#\00f<^!.<^!f>)] [(0!>'^\00^\00^\00^#\000<^!.<^!0\|) (1!>'^\00^\00^\00^#\001<^!.<^!1\|) (2!>'^\00^\00^\00^#\002<^!.<^!2\|) (3!>'^\00^\00^\00^#\003<^!.<^!3\|) (4!>'^\00^\00^\00^#\004<^!.<^!4\|) (5!>'^\00^\00^\00^#\005<^!.<^!5\|) (6!>'^\00^\00^\00^#\006<^!.<^!6\|) (7!>'^\00^\00^\00^#\007<^!.<^!7\|) (8!>'^\00^\00^\00^#\008<^!.<^!8\|) (9!>'^\00^\00^\00^#\009<^!.<^!9\|) (a!>'^\00^\00^\00^#\00a<^!.<^!a\|) (b!>'^\00^\00^\00^#\00b<^!.<^!b\|) (c!>'^\00^\00^\00^#\00c<^!.<^!c\|) (d!>'^\00^\00^\00^#\00d<^!.<^!d\|) (e!>'^\00^\00^\00^#\00e<^!.<^!e\|) (f!>'^\00^\00^\00^#\00f<^!.<^!f\|)]	`..< (?S(?s(?r09)!>(?raf)!>) '^\00^\00^\00^ #(?s\00(?r09)<\00(?raf)<) ^!.<^ (?s!(?r09)>!(?raf)>)) (?S(?s(?r09)!>(?raf)!>) '^\00^\00^\00^ #(?s\00(?r09)<\00(?raf)<) ^!.<^ (?s!(?r09)\|!(?raf)\|))`

Or, if you're using version 3, then you can define macros like this:

#
(?M+. [(?r08)(?r19)|9a|(?rae)(?rbf)|(f0<[(?r08)(?r19)>9a>(?rae)(?rbf)>f0>])])
(?M-. [(?rfb)(?rea)|a9|(?r91)(?r80)|(0f<[(?rfb)(?rea)>a9>(?r91)(?r80)>0f>])])
(?M<. ..>..>)
(?M>. ..<..<[^\00.|^(\000<.0>)])
(?MLi. ([^0.|^(0.<^0.>^)](?mi))*0.<0.>)
(?Mp. ..<
 (?S(?s(?r09)!>(?raf)!>)'^\00*^\00^\00*^#(?s\00(?r09)<\00(?raf)<)^!.<*^(?s!(?r09)>!(?raf)>))
 (?S(?s(?r09)!>(?raf)!>)'^\00*^\00^\00*^#(?s\00(?r09)<\00(?raf)<)^!.<*^(?s!(?r09)|!(?raf)|)))

(
..<\00.<\000<\000>
commands go here
'^\00*^\00
)

and then + is (?m+), - is (?m-), < is (?m<), > is (?m>), [ is (?mL, ] is ), and . is (?mp).

Note that this reverses left and right; this is to make it easier for output to be in the correct order (the memory extends infinitely to the left, and the output infinitely to the right, separated by a null character).

Interpreter written in JavaScript (TurExp 3)

See below for descriptions of the version 2 examples

Input:

Show all results

Trace

Interpreter written in Haskell (TurExp 2)

Interpreter written in Haskell (updated 2020; source only, download and run; 56K compressed, 412KiB uncompressed; also includes T-Write)

Original interpreter written in Haskell (not compatible with modern versions of GHC; source only; download and run; 49K compressed; also includes T-Write)

Running the interpreter

The interpreter can either be run interactively, by simply typing ./turexp from the command prompt, or with command-line arguments, by typing ./turexp filename input. If only filename is provided, the input will be empty. The -a option can also be used to show all possible outputs rather than just one.

List of examples

The following examples are included with the newer version of the interpreter:

add.turexp, add-n.turexp (these are the same, except for use of read and write mode vs. modeless mode): Adds two binary numbers. Input is of the form 101+100 and are interpreted in little-endian order (i.e., backwards from how we normally write numbers).
anbncn.turexp: Matches strings of the form aⁿbⁿcⁿ with n ≥ 1, i.e., if the input some number of a's followed by the same number of b's and then the same number of c's. This is an example of a problem that neither finite automata/regular expressions (in the computer science theory sense) nor push-down automata/context-free grammars can solve but Turing machines can.
brainf-hello.turexp: The hello world program from Wikipedia's article on Brainfuck, translated using the translation above. Expects no input, and outputs the hexadecimal ASCII values of the string "Hello World!".
hellow.turexp: Basic hello world program
invtest.turexp: Test for inversion mode. Matches the regular expression [^a]a[^abc]., i.e., a character that's not a, then the letter a, then a character that's not a, b, or c, then any character (but demonstrating that [^abc] doesn't work like it does in regular expressions).
invtest2.turexp: I think this just matches anything
invtest3.turexp: Matches anything except the name of a single-digit number
ksat.turexp, ksat-small.turexp: Determines if a boolean/logical formula in conjunctive normal form is satisfiable (k-sat). See the comment in ksat.turexp for info on the input format.
rot13.turexp: Encodes the input in ROT13.

TurExp (version 3)

Contents

Character encoding

Input and output

Complex instructions

The `(?s)` instruction (version 3 only)

Compiler modes

Symbols

For normal mode

For binary mode

For quoted mode

Ranges (version 3 only)

Simple instructions

In read mode

In write mode

In modeless mode

Macros (version 3 only)

Full grammar

Version differences

Version 1 (unimplemented) to 2

Version 2 to 3

Translation from Brainfuck

Interpreter written in JavaScript (TurExp 3)

Interpreter written in Haskell (TurExp 2)

Running the interpreter

List of examples

`reserved-character`	→	one of . \ " { } ` ' # > < \| ( ) [ ] * + ? ^
`symbol`	→	`character` - `reserved-character` - `whitespace`	normal mode
	\|	`\` (`character` - `hex-digit`)	normal mode
	\|	`\` `hex-digit`{`n`}	normal mode; `n` depends on the digits mode
	\|	`hex-digit`{`n`}	binary mode; `n` depends on the digits mode
	\|	`\` `character`	binary mode
	\|	`character` - `"`	quoted mode
`symbols`	→	`symbol`
	\|	`.`
	\|	`(?r` `symbol` `symbol`? `symbol` `)`	version 3
`simple-instruction`	→	`symbols`	in read or write mode
	\|	`symbols` `symbols` (`<` \| `>` \| `\|`)	in modeless mode
`macro-char`	→	`character` - (`"` `.` `(` `)` `[` `]` `\`)
`compiler-directive`	→	one of " { } ` ' # > < \| ^
	\|	one of `x u U`	version 3, binary mode
	\|	`(?` (`i` \| `u`)* (`-` (`i` \| `u`)+)? `)`	version 3, must have at least one `i` or `u`
	\|	`(?M` `macro-char` `macro-char`* `.` `instruction`* `)`	version 3
`instruction`	→	`simple-instruction`
	\|	`compiler-directive`
	\|	`(` `instruction`* `)`
	\|	`[` `instruction`* `]`
	\|	`instruction` (`*` \| `+` \| `?`)	limitations on `instruction`
	\|	`(?^` `instruction`* `)`	version 3
	\|	`(??` `instruction`* `)`	version 3
	\|	`(?S` `instruction`* `)`	version 3, not inside `(?S...)`
	\|	`(?s` `instruction`* `)`	version 3, only inside `(?S...)`
	\|	`(?m` `macro-char` `instruction`* `)`	version 3, `macro-char` must be defined
	\|	`(?t` (`char` - `"` `(` `)` `[` `]` `\` \| `"` (`char` - `"`)* `"`)* `)`	version 3
	\|	`(?$)`	version 3
`program`	→	`compiler-directive`* `instruction`	version 2
	\|	(`(?v3)`)? `instruction`*	version 3, must expand to 1 instruction

TurExp (version 3)

Contents

Character encoding

Input and output

Complex instructions

The (?s) instruction (version 3 only)

Compiler modes

Symbols

For normal mode

For binary mode

For quoted mode

Ranges (version 3 only)

Simple instructions

In read mode

In write mode

In modeless mode

Macros (version 3 only)

Full grammar

Version differences

Version 1 (unimplemented) to 2

Version 2 to 3

Translation from Brainfuck

Interpreter written in JavaScript (TurExp 3)

Interpreter written in Haskell (TurExp 2)

Running the interpreter

List of examples

The `(?s)` instruction (version 3 only)