Syntax Reference¶
This document helps readers understand how Temper code is parsed. Readers can use temper repl to get feedback on how the language interprets a piece of code. Especially useful is the describe REPL command, which lets you view a snapshot of the compilation state at various processing stages.
Temper's syntax should be familiar to users of “C-like” languages: languages that use {...} around blocks and in which semicolons (;) separate computational steps. It is most similar to TypeScript; types follow names (name: Type) with a colon in between.
But its syntax is distinct from JS/TS in details.
Some differences include:
- Temper uses let for named function declarations so that there is no confusion about when a named function's name is visible in the surrounding scope. Temper is a macro language, so this is important when macros can operate on declarations.
- Temper allows interpolation into any string, as in "chars${expr}". For backwards compatibility, JavaScript could only allow interpolation into back-tick strings (`chars${expr}`).
- Temper has substantially different syntax for function expressions (fn (x: Int): Int { x + 1 } instead of (x: Int): Int => x + 1) and function types (fn <T> (T): T instead of <T>(T) => T).
- Temper's import and export syntax, which allows connecting modules together, is different.
The grammar below explains the main syntactic categories. It's meant to be advisory, to help learners discover features by following grammatical threads.
It is not an exact grammar. Temper has a three-stage parse: lexical analysis, operator precedence grouping, tree building. This grammar is derived from the tree builder which operates on a stream of tokens after an operator precedence parser has inserted synthetic parentheses into the token stream and after some other token level rewriting operations.
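To make the operator precedence grouping stage more concrete, here is a minimal Python sketch of a pass that wraps each operator application in synthetic parentheses. The precedence table, the associativity set, and the function name group are assumptions made for illustration; they are not Temper's actual tables or implementation, and the sketch only handles well-formed streams of operands and infix operators.

```python
# Hypothetical precedence table (higher binds tighter); NOT Temper's real table.
PRECEDENCE = {"=": 1, "+": 2, "*": 3}
RIGHT_ASSOCIATIVE = {"="}


def group(tokens):
    """Return the tokens with synthetic parentheses around each infix application.

    Assumes a well-formed, alternating operand/operator token list.
    """
    pos = 0

    def parse(min_prec):
        nonlocal pos
        left = [tokens[pos]]  # an operand: any token that is not an operator
        pos += 1
        while pos < len(tokens):
            op = tokens[pos]
            prec = PRECEDENCE.get(op)
            if prec is None or prec < min_prec:
                break
            pos += 1
            # A right-associative operator lets the same precedence nest rightwards.
            next_min = prec if op in RIGHT_ASSOCIATIVE else prec + 1
            right = parse(next_min)
            left = ["("] + left + [op] + right + [")"]
        return left

    return parse(0)


print(" ".join(group("a = b + c * d".split())))
# prints ( a = ( b + ( c * d ) ) )
```

A later tree-building stage can then rely on the parentheses alone and does not need to know operator precedence, which is the division of labor the paragraph above describes.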
Since Temper is a macro language, some language features that would have separate syntactic paths in a non-macro language are instead implemented as macros; they parse as regular function calls, but those functions are macros that apply at a later compilation stage. For example, if is a macro, so there is no dedicated syntax for if statements below.
Structure of a file¶
Syntax for Root¶
The root of a Temper module is a sequence of top-levels followed by an end of file marker.
Syntax for TopLevels¶
Top-levels are separated by semicolons in a module body or block.
Syntax for TopLevel¶
A top-level is roughly a whole declaration, or expression. Temper is an expression language, so most statement-like constructs can also nest.
Syntax for Garbage¶
Syntax for TopLevelNoGarbage¶
Syntax for TrailingSemi¶
Semicolons (;) are allowed at the end of a block or module body.
An expression followed by a semicolon is not implicitly the result of
the containing block or module.
Trailing semicolons are never inserted.
Statements¶
Syntax for Stmt¶
Statements are not a large syntactic category, but include labeled statements (like someName: /* loop or block */), jumps (break, continue, return, etc.) which go to the end or beginning of a containing statement and which may refer to a label.
Besides those, any expression may appear in statement position.
Syntax for Nop¶
A semicolon used to separate statements. Since our parser is built around an operator precedence parser, and semicolon is a low precedence operator, this grammar consumes them, but does not require them.
Not all semicolons need to appear explicitly in program text.
Automatic Semicolon Insertion¶
Semicolons are inserted in the following places:
- After a } that ends a line, except before a close bracket or an operator token that is not prefix.
- Before a { that starts a line, except after an open bracket or an operator token that is not postfix.
This is more conservative than semicolon insertion in JavaScript, but still simplifies several things.
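The two insertion rules can be sketched in Python as a pass over lines of tokens. This is an illustration under stated assumptions, not Temper's lexer: the token classes below are hypothetical stand-ins for the real operator sets, comments are assumed to be stripped already, and the name insert_semicolons is invented for this example.

```python
# Hypothetical token classes for illustration; Temper's real operator sets differ.
OPEN_BRACKETS = {"(", "[", "{"}
CLOSE_BRACKETS = {")", "]", "}"}
NON_PREFIX_OPERATORS = {"=", "==", "*", "/", ",", ".", ":"}        # never prefix
NON_POSTFIX_OPERATORS = {"=", "==", "*", "/", ",", ".", ":", "!"}  # never postfix


def insert_semicolons(lines):
    """lines: one list of tokens per source line. Returns a flat token list."""
    lines = [line for line in lines if line]  # ignore blank lines
    out = []
    for i, line in enumerate(lines):
        out.extend(line)
        if i + 1 == len(lines):
            break  # nothing follows, and trailing semicolons are never inserted
        prev_tok, next_tok = line[-1], lines[i + 1][0]
        # Rule 1: after a } that ends a line.
        after_close = (prev_tok == "}"
                       and next_tok not in CLOSE_BRACKETS
                       and next_tok not in NON_PREFIX_OPERATORS)
        # Rule 2: before a { that starts a line.
        before_open = (next_tok == "{"
                       and prev_tok not in OPEN_BRACKETS
                       and prev_tok not in NON_POSTFIX_OPERATORS)
        if after_close or before_open:
            out.append(";")
    return out


one_statement = [["foo", "(", "x", ")", "{"], ["}", "bar", "(", "y", ")", "{"], ["}"]]
two_statements = [["foo", "(", "x", ")", "{"], ["}"], ["bar", "(", "y", ")", "{"], ["}"]]
print(" ".join(insert_semicolons(one_statement)))   # foo ( x ) { } bar ( y ) { }
print(" ".join(insert_semicolons(two_statements)))  # foo ( x ) { } ; bar ( y ) { }
```

Both rules look only at the tokens on either side of a line break, so no list of statement keywords is needed to decide where a statement ends.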
All adjacent statements are separated by semicolons¶
There's no need to have a set of special statements like if (...) stmt0 else stmt1 that do not need to be followed by a semicolon.
Productions for a series of statements and declarations can simply assume that semicolons appear before them.
No limited set of statement continuers¶
We don't need a special set of statement continuers like else so that we know that the token sequence } else { is part of one statement.
This lets us use common cues to allow new continuers like
foo(x) {
// Ruby-style block
} bar(y) {
// ruby-style block
}
// ⏸️
which de-sugars to a single statement
foo(x, fn { ... }, bar = fn (f) { f(y, fn { ... }) });
// ⏸️
vs something without a continuer
foo(x) {
// Ruby-style block
} // <-- Semicolon inserted here
bar(y) {
// Ruby-style-block
}
// ⏸️
which de-sugars to two statements
foo(x, fn { ... });
bar(y, fn { ... });
// ⏸️
Motivation¶
Developers of C-like languages are used to not following }s that end a statement with a semicolon.
The exception is class definitions in C++ which, unlike Java and more recent C-like languages, do need to be followed by semicolons.
That this trips me up every time I go back to C++ seems evidence that requiring semicolons after statements that end with something block-like would be a burden to developers.
Syntax for LabeledStmt¶
Declares a label and associates it with the beginning and end of a statement, so that breaks and continues within that statement may refer to it explicitly.
Unlike TypeScript, we do not allow labeling any statement.
This allows conveying property declarations like the p: T in
interface I {
p: T;
}
// ⏸️
to the Disambiguate stage with that property declaration treated as a (Call (Name ":") (Name "p") (Name "T")).
Otherwise, we would have to treat class bodies as a special syntactic category to avoid ambiguity with
do {
p: T;
}
// ⏸️
or the disambiguation would need to convert T's from statement context to expression context.
Syntax for LeftLabel¶
A label that can be jumped to, as by break and continue.
This is left in the left-hand-side sense: it is a declaration, not a use.
Syntax for Jump¶
A jump to a location within the same function body that does not skip over any necessary variable initializations.
Syntax for LabelOrHole¶
Syntax for Label¶
A label that can be jumped to, as by break and continue.
Syntax for AwaitReturnThrowYield¶
await, return, throw, and yield are operators which affect control flow and operate on expressions.
return(42); for example, looks like a function call but the parentheses are not required:
- return; is an application of an operator even though there are no parentheses.
- return 42; is an application of the operator to the arguments (42) even though there are no explicit parentheses.
Syntax for StmtBlock¶
A { ... } delimited block of statements.
Expressions¶
Syntax for Expr¶
An expression is evaluated to produce a result and/or a side effect.
Syntax for BooleanLiteral¶
Syntax for Float64Literal¶
Syntax for Call¶
The call expression includes simple function calls (f(args)) as well as calls with Ruby-style block functions (f { ... }) and flow-control like if (condition) { t } else { e } because if is a macro.
Syntax for New¶
Syntax for StringLiteral¶
Block Lambdas¶
Syntax for BlockLambda¶
A block lambda is a {...} block that specifies a function value and which can appear as part of a function call as below:
someFunction { ... }
A signature is optional; it is needed to specify argument names, and may specify the function type wholly or partially.
It may be followed by an extends clause that specifies marker interfaces that are super-types for the produced function value.
The signature is followed by the double-semicolon token (;;) which is distinct from two space-separated semicolons (; ;).
someFunction /* <- callee */ {
  (x): ReturnType           // <- Optional signature
  extends SomeInterfaceType // <- super types
  ;;                        // <- double semicolon separator
  body
}
// ⏸️
Syntax for BlockLambdaSignatureAndSupers¶
The signature of a block lambda explains the names of arguments visible within the body, optionally their types and return type.
The signature also includes other interfaces that the lambda must implement. For example, a function that might pause execution could use a signature line as below:
(x: Int): Int extends GeneratorFn;;
That describes a function that takes an integer x and which is also a sub-type of GeneratorFn.
The extends clause may be left off entirely if no super-types are desired, or multiple super-types may be specified: extends First & Second.
Unlike in a function type, when a name is specified for a block lambda argument, it is the name of the argument, not its type.
let f: fn (Int): Void;
//         ⇧⇧⇧ word is a type
let g(myLambda: fn (Int): Void): Void { myLambda(1); }
g { (x): Void;;
//   ⇧ word is an argument name.
//   In this case, the type is inferred from g's signature.
  doSomethingWith(x + 1);
}
// ⏸️
Syntax for BlockLambdaSignature¶
A block lambda signature line looks like (x: Int): ReturnType, or just (x) to take advantage of the more aggressive type inference for block lambdas than for declared functions.
This is often followed by ;; as it is part of BlockLambdaSignatureAndSupers Syntax.
These syntactic constructs are interpreted as if preceded by fn but the meaning is subtly different.
- (x: Int) is equivalent to fn (x: Int) where the return type must later be inferrable from the calling context and the body.
- (x) is equivalent to fn (x) where the argument and return type must later be inferrable.
- (x): ReturnType is equivalent to fn (x): ReturnType where only argument types must later be inferrable.
Syntax for BlockLambdaSupers¶
Syntax for BlockLambdaBody¶
Uncategorized¶
Syntax for Arg¶
Syntax for ArgNoInit¶
Syntax for Args¶
Syntax for Arrow¶
Syntax for ArrowHead¶
Syntax for CallArgs¶
Arguments to a function call.
A function call's arguments may be one of:
- a parenthesized, comma separated list of arguments like (a, b, c). See Args.
- a parenthesized, semicolon separated list of two or three arguments with a specific purpose, as in (let x = 1; x < 2; ++x), which is what the for loop macro expects.
- a string group as in a tagged string template like callee"foo ${ bar }".
Syntax for CallHead¶
The function called, its arguments, and any block lambda.
Syntax for CallJoiningWords¶
When the call continues with something like } else if (c) {...} we need to include \else_if = fn {...} as a final named parameter to the call that receives the block just closed, so that the called function can delegate its result to later segments.
This joins words like else if into the \else_if symbol which follows the call join symbol. A late parse stage pass finds those and groups everything following the joining words into a trailing block function so that the contents of the parentheses and brackets can match their own signature elements based on the joining words.
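The word-joining step on its own is small enough to sketch in Python. This is a hypothetical illustration: the function name join_words and the list-of-words input are assumptions, and only the else if to \else_if mapping is taken from the text above.

```python
def join_words(words):
    """Join the words that continue a call into a single symbol name."""
    return "\\" + "_".join(words)


print(join_words(["else", "if"]))  # prints \else_if
print(join_words(["else"]))        # prints \else (same assumed rule, one word)
```

Under this reading, the symbol names the parameter that receives the rest of the call as a trailing block function.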
Syntax for CallTail¶
Syntax for Callee¶
The callee in a function application is a tad complicated.
Our operator precedence parser (OPP) grammar covers many constructs that are bespoke constructs in many languages, so class C extends A, B { ... } is parsed as an application of a block lambda (later turned into a member block) like class(\word, C, \super, A, \super, B, fn { .... }).
This production desugars various parts into a combination of the callee class, and symbol/argument pairs.
The \word argument is also used in constructs like the function declaration let f<T>(arg: Type) { ... } where the let macro is what is invoked to build a named function declaration.
This production allows a callee to have:
- an expression specifying the called macro or function,
- an accompanying word,
- type parameters like <T, U> (whether the type parameters are actual parameters or formal parameters is determined by the Disambiguate stage),
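As a rough illustration of the symbol/argument pairing described in this section, the Python sketch below rebuilds the argument list for the class C extends A, B { ... } example. The function name desugar_class_header and the use of plain strings for tree nodes are assumptions for illustration, as is the choice to give each super type its own \super pair.

```python
def desugar_class_header(word, supers, body):
    """Build the argument list of the class(...) call as symbol/argument pairs."""
    args = ["\\word", word]        # the accompanying word, here the class name
    for super_type in supers:
        args += ["\\super", super_type]
    args.append(body)              # the trailing block lambda comes last
    return args


args = desugar_class_header("C", ["A", "B"], "fn { ... }")
print("class(" + ", ".join(args) + ")")
# prints class(\word, C, \super, A, \super, B, fn { ... })
```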
Syntax for CalleeAndArgs¶
Captures low precedence operators that may follow a parenthesized argument list.
- : ReturnType desugars to \outType, ReturnType.
- extends SuperType and implements SuperType desugar to \super, SuperType.
Syntax for CalleeAndRequiredArgs¶
This is like CalleeAndArgs but is used in contexts where we're not sure yet whether this is a call. A call requires at least one of
- Parenthesized arguments as in callee()
- Semi-ed arguments as in loopMacro (initialization; condition; increment)
- A template string as in callee"foo ${bar}"
- A trailing block as in callee {}
This production is entered where we may not have a trailing block, so the call must have one of the others.
Syntax for CommaEl¶
Syntax for CommaExpr¶
Syntax for CommaOp¶
Syntax for DeclDefault¶
Syntax for DeclInit¶
Syntax for DeclMulti¶
Syntax for DeclMultiNamed¶
Syntax for DeclMultiNested¶
Syntax for DeclName¶
Syntax for DeclType¶
Syntax for DeclTypeNested¶
Syntax for DecoratedLet¶
Syntax for DecoratedLetBody¶
Syntax for DecoratedTopLevel¶
Decorations transform declarations and function and type definitions at compile time. A decoration is written as @SomeName followed by an optional argument list.
When a let declaration declares multiple names, any decoration before the let applies to all the names, but decorations immediately before a declared name affect only that name.
Syntax for EmbeddedComment¶
Comments are not semantically significant, but neither are they filtered out entirely.
Temper tries to preserve them when translating documentation, and they are available to backends; for example, the Python backend turns autodoc comments before declarations into Python doc strings.
Syntax for EscapeSequence¶
Syntax for ForArgs¶
Syntax for ForCond¶
Syntax for ForIncr¶
Syntax for ForInit¶
Syntax for Formal¶
Syntax for FormalNoInit¶
Syntax for Formals¶
Syntax for Id¶
Syntax for Infix¶
Syntax for InfixOp¶
Syntax for Json¶
Syntax for JsonArray¶
Syntax for JsonBoolean¶
Truth values are represented using the keywords false and true.
Syntax for JsonNull¶
Syntax for JsonNumber¶
Syntax for JsonObject¶
Syntax for JsonProperty¶
Syntax for JsonString¶
Syntax for JsonValue¶
Syntax for Let¶
Syntax for LetArg¶
Syntax for LetBody¶
Syntax for LetNested¶
Syntax for LetRest¶
Syntax for List¶
Syntax for ListContent¶
Syntax for ListElement¶
Syntax for ListElements¶
Syntax for ListHole¶
Syntax for Literal¶
Syntax for MatchBranch¶
Relates a match case, e.g. a pattern, to a consequence of matching that pattern.
Syntax for MatchCase¶
There are two kinds of match cases: run-time type checks that use the keyword is, and a value to match.
Syntax for Member¶
Syntax for NoPropClass¶
Syntax for Obj¶
Syntax for Pattern¶
Syntax for Postfix¶
Syntax for PostfixOp¶
Syntax for Prefix¶
Syntax for PrefixOp¶
Syntax for Prop¶
Syntax for PropClass¶
Syntax for PropName¶
Syntax for Props¶
Syntax for QuasiAst¶
Syntax for QuasiHole¶
Syntax for QuasiInner¶
Syntax for QuasiLeaf¶
Syntax for QuasiTree¶
Syntax for Quasis¶
Syntax for RawBlock¶
Syntax for RawCommaOp¶
Syntax for RegExp¶
Syntax for RegularDot¶
Syntax for ReservedWord¶
Syntax for SpecialDot¶
Syntax for Specialize¶
Syntax for Spread¶
Syntax for StringGroup¶
Syntax for StringGroupTagged¶
Syntax for StringHole¶
Syntax for StringPart¶
String interpolation¶
Strings may contain embedded expressions. When a string contains a ${ followed by an expression, followed by a }, the resulting string value is the concatenation of the content before, content from the expression, and the content after.
"foo ${"bar"} baz"
== "foo bar baz"
// ✅
Interpolated values that aren't String instances have .toString() called on them automatically, which is convenient for types that define toString, such as Int.
let two = 2;
"one ${two} three"
== "one 2 three"
// ✅
An empty interpolation contributes no characters, which means it may be used to embed meta-characters.
"$${}{}" == "\$\{\}"
// ✅
(This mostly comes in handy with tagged strings to give fine-grained control over what the tag receives.)
Empty interpolations can also be used to wrap a long string across multiple lines.
"A very long string ${
// Breaking this string across multiple lines.
}that runs on and on"
== "A very long string that runs on and on"
// ✅
Empty interpolations also let you include spaces at the end of a line in a multi-quoted string.
"""
Line 1
Line 2 ${}
"""
== "Line 1\nLine 2 "
// ✅
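The concatenation rules in this section can be modeled with a short Python sketch. It is a model of the described behavior, not Temper's implementation: the part representation and the name interpolate are assumptions, and Python's str() stands in for the automatic .toString() call.

```python
def interpolate(parts):
    """Concatenate string parts.

    Each part is ("text", chars), ("hole", value), or ("hole",) for an
    empty interpolation.
    """
    out = []
    for part in parts:
        if part[0] == "text":
            out.append(part[1])
        elif len(part) == 2:
            value = part[1]
            # Non-string values are converted, mirroring the toString() rule.
            out.append(value if isinstance(value, str) else str(value))
        # An empty hole contributes no characters.
    return "".join(out)


print(interpolate([("text", "one "), ("hole", 2), ("text", " three")]))
# prints one 2 three
print(interpolate([("text", "$"), ("hole",), ("text", "{}")]))
# prints ${}
```

The second call mirrors the "$${}{}" example: the empty hole separates $ from { so the two characters are not read as the start of an interpolation.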
Syntax for StringPartRaw¶
Parallels [ProductionNames.StringPart] but emits a [ValuePart] instead of routing a string token to [lang.temper.lexer.unpackQuotedString] so that the tag expression gets string content without escape sequences decoded.
Syntax for SymbolLiteral¶
Syntax for SymbolValue¶
Syntax for TopLevelNoGarbageNoComment¶
Syntax for TopLevelsInSemi¶
Top levels in the context of a larger semicolon separated run.