<!--
 Copyright 2018 The CUE Authors

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

# The CUE Language Specification

## Introduction

This is a reference manual for the CUE data constraint language.
CUE, pronounced cue or Q, is a general-purpose and strongly typed
constraint-based language.
It can be used for data templating, data validation, code generation, scripting,
and many other applications involving structured data.
The CUE tooling, layered on top of CUE, provides
a general-purpose scripting language for creating scripts as well as
simple servers, also expressed in CUE.

CUE was designed with cloud configuration, and related systems, in mind,
but is not limited to this domain.
It derives its formalism from relational programming languages.
This formalism allows for managing and reasoning over large amounts of
data in a straightforward manner.

The grammar is compact and regular, allowing for easy analysis by automatic
tools such as integrated development environments.

This document is maintained by mpvl@golang.org.
CUE has a lot of similarities with the Go language. This document draws heavily
from the Go specification as a result.

CUE draws its influence from many languages.
Its main influences were BCL/GCL (internal to Google),
LKB (LinGO), Go, and JSON.
Others are Swift, TypeScript, JavaScript, Prolog, NCL (internal to Google),
Jsonnet, HCL, Flabbergast, Nix, JSONPath, Haskell, Objective-C, and Python.


## Notation

The syntax is specified using Extended Backus-Naur Form (EBNF):

```
Production  = production_name "=" [ Expression ] "." .
Expression  = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term        = production_name | token [ "…" token ] | Group | Option | Repetition .
Group       = "(" Expression ")" .
Option      = "[" Expression "]" .
Repetition  = "{" Expression "}" .
```

Productions are expressions constructed from terms and the following operators,
in increasing precedence:

```
|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)
```

Lower-case production names are used to identify lexical tokens. Non-terminals
are in CamelCase. Lexical tokens are enclosed in double quotes "" or back quotes
``.

The form a … b represents the set of characters from a through b as
alternatives. The horizontal ellipsis … is also used elsewhere in the spec to
informally denote various enumerations or code snippets that are not further
specified. The character … (as opposed to the three characters ...) is not a
token of the CUE language.


## Source code representation

Source code is Unicode text encoded in UTF-8.
Unless otherwise noted, the text is not canonicalized, so a single
accented code point is distinct from the same character constructed from
combining an accent and a letter; those are treated as two code points.
For simplicity, this document will use the unqualified term character to refer
to a Unicode code point in the source text.

Each code point is distinct; for instance, upper and lower case letters are
different characters.

Implementation restriction: For compatibility with other tools, a compiler may
disallow the NUL character (U+0000) in the source text.

Implementation restriction: For compatibility with other tools, a compiler may
ignore a UTF-8-encoded byte order mark (U+FEFF) if it is the first Unicode code
point in the source text. A byte order mark may be disallowed anywhere else in
the source.


### Characters

The following terms are used to denote specific Unicode character classes:

```
newline        = /* the Unicode code point U+000A */ .
unicode_char   = /* an arbitrary Unicode code point except newline */ .
unicode_letter = /* a Unicode code point classified as "Letter" */ .
unicode_digit  = /* a Unicode code point classified as "Number, decimal digit" */ .
```

In The Unicode Standard 8.0, Section 4.5 "General Category" defines a set of
character categories.
CUE treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo
as Unicode letters, and those in the Number category Nd as Unicode digits.


### Letters and digits

The underscore character _ (U+005F) is considered a letter.

```
letter        = unicode_letter | "_" .
decimal_digit = "0" … "9" .
octal_digit   = "0" … "7" .
hex_digit     = "0" … "9" | "A" … "F" | "a" … "f" .
```


## Lexical elements

### Comments

Comments serve as program documentation.
CUE supports line comments that start with the character sequence //
and stop at the end of the line.

A comment cannot start inside a string literal or inside a comment.
A comment acts like a newline.


### Tokens

Tokens form the vocabulary of the CUE language. There are four classes:
identifiers, keywords, operators and punctuation, and literals. White space,
formed from spaces (U+0020), horizontal tabs (U+0009), carriage returns
(U+000D), and newlines (U+000A), is ignored except as it separates tokens that
would otherwise combine into a single token. Also, a newline or end of file may
trigger the insertion of a comma. While breaking the input into tokens, the
next token is the longest sequence of characters that form a valid token.


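The longest-match rule can be sketched in Python. This is an illustration only: the function name `next_operator` is hypothetical, the operator list is copied from the "Operators and punctuation" section of this specification, and identifiers, keywords, and literals are left out.

```python
# Symbolic operators and punctuation listed in this specification.
OPERATORS = [
    "+", "-", "*", "/", "&&", "||", "&", "|", "==", "!=", "=~", "!~",
    "<", ">", "<=", ">=", "=", "::", ":", ".", "...", ",", "_|_", "!",
    "(", ")", "{", "}", "[", "]",
]


def next_operator(text, pos=0):
    """Return the longest operator that starts at text[pos], or None."""
    candidates = [op for op in OPERATORS if text.startswith(op, pos)]
    return max(candidates, key=len) if candidates else None


# "<=" is preferred over "<" followed by "=".
assert next_operator("<=1") == "<="
```

Under this rule the input `a::b` yields the single token `::` rather than two `:` tokens.
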
### Commas

The formal grammar uses commas "," as terminators in a number of productions.
CUE programs may omit most of these commas using the following rule:

When the input is broken into tokens, a comma is automatically inserted into
the token stream immediately after a line's final token if that token is

- an identifier
- null, true, false, bottom, or an integer, floating-point, or string literal
- one of the characters ), ], or }


Although commas are automatically inserted, the parser will require
explicit commas between two list elements.

To reflect idiomatic use, examples in this document elide commas using
this rule.


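The insertion rule can be sketched over a pre-tokenized input. The token kinds used here (`identifier`, `int`, `float`, `string`, `keyword`, `punct`) and the function name `insert_commas` are assumptions made for this illustration; a real scanner would use its own classification.

```python
# Hypothetical token kinds; a real scanner would have its own classification.
COMMA_AFTER_KINDS = {"identifier", "int", "float", "string"}
COMMA_AFTER_TEXT = {"null", "true", "false", "_|_", ")", "]", "}"}


def insert_commas(lines):
    """lines is a list of source lines, each a list of (kind, text) tokens.

    Returns the flat token stream with a comma token appended after every
    line whose final token triggers automatic insertion.
    """
    stream = []
    for tokens in lines:
        stream.extend(tokens)
        if not tokens:
            continue  # an empty line has no final token
        kind, text = tokens[-1]
        if kind in COMMA_AFTER_KINDS or text in COMMA_AFTER_TEXT:
            stream.append(("punct", ","))
    return stream
```

A line ending in `{` gets no comma, while lines ending in a literal or in `}` do.
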
### Identifiers

Identifiers name entities such as fields and aliases.
An identifier may be simple or quoted.
A simple identifier is a sequence of one or more letters (which includes `_`) and digits.
It may not be `_`.
The first character in an identifier must be a letter.
Any sequence of letters, digits or `-` enclosed in
backticks "`" makes an identifier.
The backticks are not part of the identifier.
This allows one to refer to fields that are labeled
with keywords or other identifiers that would
otherwise not be legal.

<!--
TODO: allow identifiers as defined in Unicode UAX #31
(https://unicode.org/reports/tr31/).

Identifiers are normalized using the NFC normal form.
-->

```
identifier        = simple_identifier | quoted_identifier .
simple_identifier = letter { letter | unicode_digit } .
quoted_identifier = "`" { letter | unicode_digit | "-" } "`" .
```
<!-- TODO: relax to allow other punctuation -->

```
a
_x9
fieldName
αβ
```

<!-- TODO: Allow Unicode identifiers TR 31 http://unicode.org/reports/tr31/ -->

Some identifiers are [predeclared](#predeclared-identifiers).


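The `simple_identifier` production can be checked with Python's `unicodedata` module. This is a sketch under two assumptions: the helper names are hypothetical, and Python's Unicode database may be a later version than the Unicode 8.0 referenced by this specification. Keywords are not excluded here.

```python
import unicodedata

# General categories treated as letters and digits by this specification.
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}


def is_letter(ch):
    """letter = unicode_letter | "_" ."""
    return ch == "_" or unicodedata.category(ch) in LETTER_CATEGORIES


def is_unicode_digit(ch):
    return unicodedata.category(ch) == "Nd"


def is_simple_identifier(s):
    """True if s matches simple_identifier and is not the lone underscore."""
    if not s or s == "_":
        return False
    if not is_letter(s[0]):
        return False
    return all(is_letter(c) or is_unicode_digit(c) for c in s[1:])
```

The four example identifiers above are accepted, while `_`, `9a`, and `a-b` are rejected; the last of these would need to be written as a quoted identifier.
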
### Keywords

CUE has a limited set of keywords.
In addition, CUE reserves all identifiers starting with `__` (double underscores)
as keywords.
These are typically targets of predeclared identifiers.

All keywords may be used as labels (field names).
They cannot, however, be used as identifiers to refer to the same name.


#### Values

The following keywords are values.

```
null      true      false
```

These can never be used to refer to a field of the same name.
This restriction is to ensure compatibility with JSON configuration files.


#### Preamble

The following keywords are used in the preamble of a CUE file.
After the preamble, they may be used as identifiers to refer to namesake fields.

```
package   import
```


#### Comprehension clauses

The following keywords are used in comprehensions.

```
for       in        if        let
```

The keywords `for`, `if` and `let` cannot be used as identifiers to
refer to fields.

<!--
TODO:
    reduce [to]
    order [by]
-->


#### Arithmetic

The following pseudo keywords can be used as operators in expressions.

```
div       mod       quo       rem
```

These may be used as identifiers to refer to fields in all other contexts.


### Operators and punctuation

The following character sequences represent operators and punctuation:

```
+     div   &&    ==    <     =     (     )
-     mod   ||    !=    >     ::    {     }
*     quo   &     =~    <=    :     [     ]
/     rem   |     !~    >=    .     ...   ,
            _|_   !
```
<!--
Free tokens: # ; ~ $ ^

// To be used:
  @   at: associative lists.

// Idea: use # instead of @ for attributes and allow them at declaration level.
// This will open up the possibility of defining #! at the start of a file
// without requiring special syntax. Although probably not quite.
 -->


### Integer literals

An integer literal is a sequence of digits representing an integer value.
An optional prefix sets a non-decimal base: 0 or 0o for octal,
0x or 0X for hexadecimal, and 0b for binary.
In hexadecimal literals, letters a-f and A-F represent values 10 through 15.
All integers allow interstitial underscores "_";
these have no meaning and are solely for readability.

Decimal integers may have an SI or IEC multiplier.
Multipliers can be used with fractional numbers.
When multiplying a fraction by a multiplier, the result is truncated
towards zero if it is not an integer.

```
int_lit     = decimal_lit | si_lit | octal_lit | binary_lit | hex_lit .
decimal_lit = ( "1" … "9" ) { [ "_" ] decimal_digit } .
decimals    = decimal_digit { [ "_" ] decimal_digit } .
si_lit      = decimals [ "." decimals ] multiplier |
              "." decimals multiplier .
binary_lit  = "0b" binary_digit { [ "_" ] binary_digit } .
hex_lit     = "0" ( "x" | "X" ) hex_digit { [ "_" ] hex_digit } .
octal_lit   = "0" [ "o" ] octal_digit { [ "_" ] octal_digit } .
multiplier  = ( "K" | "M" | "G" | "T" | "P" | "E" | "Y" | "Z" ) [ "i" ] .

float_lit   = decimals "." [ decimals ] [ exponent ] |
              decimals exponent |
              "." decimals [ exponent ] .
exponent    = ( "e" | "E" ) [ "+" | "-" ] decimals .
```

```
42
1.5Gi
170_141_183_460_469_231_731_687_303_715_884_105_727
0xBad_Face
0o755
0b0101_0001
```

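The effect of a multiplier, including truncation towards zero, can be sketched as follows. The function name `si_value` is hypothetical, and the sketch assumes the conventional meanings of the suffixes, which this section does not spell out: `K`, `M`, `G`, ... are powers of 1000 and the `i` variants are powers of 1024. Exact rational arithmetic is used so that the truncation step is not disturbed by binary floating-point rounding.

```python
from fractions import Fraction
import math

# Assumed exponents for each multiplier letter (SI order of magnitude).
POWERS = {"K": 1, "M": 2, "G": 3, "T": 4, "P": 5, "E": 6, "Z": 7, "Y": 8}


def si_value(lit):
    """Return the integer value of an si_lit such as "1.5Gi"."""
    digits = lit.replace("_", "")        # underscores carry no meaning
    base = 1000
    if digits.endswith("i"):             # IEC (binary) multiplier
        base, digits = 1024, digits[:-1]
    power = POWERS[digits[-1]]
    mantissa = digits[:-1]
    if mantissa.startswith("."):         # the form "." decimals multiplier
        mantissa = "0" + mantissa
    return math.trunc(Fraction(mantissa) * base ** power)
```

With these assumptions `1.5Gi` evaluates to 1610612736, and `1.0005K` evaluates to 1000 because the fractional result 1000.5 is truncated.
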
### Decimal floating-point literals

A decimal floating-point literal is a representation of
a decimal floating-point value (a _float_).
It has an integer part, a decimal point, a fractional part, and an
exponent part.
The integer and fractional parts comprise decimal digits; the
exponent part is an `e` or `E` followed by an optionally signed decimal exponent.
One of the integer part or the fractional part may be elided; one of the decimal
point or the exponent may be elided.

```
float_lit = decimals "." [ decimals ] [ exponent ] |
            decimals exponent |
            "." decimals [ exponent ] .
exponent  = ( "e" | "E" ) [ "+" | "-" ] decimals .
```

```
0.
72.40
072.40  // == 72.40
2.71828
1.e+0
6.67428e-11
1E6
.25
.12345E+5
```


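The floating-point grammar translates directly into a regular expression. The following sketch mirrors the three alternatives of the production; the names `FLOAT_LIT` and `is_float_lit` are hypothetical, and the pattern only recognizes the literal without computing its value.

```python
import re

# decimals = decimal_digit { [ "_" ] decimal_digit } .
DECIMALS = r"[0-9](?:_?[0-9])*"
# exponent = ( "e" | "E" ) [ "+" | "-" ] decimals .
EXPONENT = rf"[eE][+-]?{DECIMALS}"

FLOAT_LIT = re.compile(
    rf"{DECIMALS}\.(?:{DECIMALS})?(?:{EXPONENT})?"   # decimals "." [decimals] [exponent]
    rf"|{DECIMALS}{EXPONENT}"                        # decimals exponent
    rf"|\.{DECIMALS}(?:{EXPONENT})?"                 # "." decimals [exponent]
)


def is_float_lit(s):
    """True if the whole of s is a decimal floating-point literal."""
    return FLOAT_LIT.fullmatch(s) is not None
```

All of the example literals above match, while a plain integer such as `42` does not, since it has neither a decimal point nor an exponent.
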
### String and byte sequence literals

A string literal represents a string constant obtained from concatenating a
sequence of characters.
A byte sequence literal represents a sequence of bytes.

String and byte sequence literals are character sequences between,
respectively, double and single quotes, as in `"bar"` and `'bar'`.
Within the quotes, any character may appear except newline and,
respectively, unescaped double or single quote.
String literals must be valid UTF-8.
Byte sequences may contain any sequence of bytes.

Several escape sequences allow arbitrary values to be encoded as ASCII text.
An escape sequence starts with an _escape delimiter_, which is `\` by default.
The escape delimiter may be altered to be `\` plus a fixed number of
hash symbols `#`
by padding the start and end of a string or byte sequence literal
with this number of hash symbols.

There are four ways to represent the integer value as a numeric constant: `\x`
followed by exactly two hexadecimal digits; `\u` followed by exactly four
hexadecimal digits; `\U` followed by exactly eight hexadecimal digits; and a
plain backslash `\` followed by exactly three octal digits.
In each case the value of the literal is the value represented by the
digits in the corresponding base.
Hexadecimal and octal escapes are only allowed within byte sequences
(single quotes).

Although these representations all result in an integer, they have different
valid ranges.
Octal escapes must represent a value between 0 and 255 inclusive.
Hexadecimal escapes satisfy this condition by construction.
The escapes `\u` and `\U` represent Unicode code points so within them
some values are illegal, in particular those above `0x10FFFF`.
Surrogate halves are allowed,
but are translated into their non-surrogate equivalent internally.

The three-digit octal (`\nnn`) and two-digit hexadecimal (`\xnn`) escapes
represent individual bytes of the resulting string; all other escapes represent
the (possibly multi-byte) UTF-8 encoding of individual characters.
Thus inside a byte sequence literal `\377` and `\xFF` represent a single byte of
value `0xFF=255`, while `ÿ`, `\u00FF`, `\U000000FF` and `\xc3\xbf` represent
the two bytes `0xc3 0xbf` of the UTF-8
encoding of character `U+00FF`.

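The difference between byte escapes and code point escapes can be checked in Python, whose literals happen to accept the same `\nnn`, `\xnn`, `\u` and `\U` forms. This is an illustration of the byte counts only, not CUE syntax, and the variable names are chosen for this example.

```python
# One byte each: the octal escape \377 and the hexadecimal escape \xFF.
single_byte = [b"\377", b"\xFF"]

# Two bytes each: the UTF-8 encoding of code point U+00FF, written as the
# character itself, as \u and \U escapes, and as two explicit byte escapes.
two_bytes = [
    "ÿ".encode("utf-8"),
    "\u00FF".encode("utf-8"),
    "\U000000FF".encode("utf-8"),
    b"\xc3\xbf",
]

assert all(len(b) == 1 for b in single_byte)
assert all(b == b"\xc3\xbf" for b in two_bytes)
```
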
```
\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000B vertical tab
\/   U+002F slash (solidus)
\\   U+005C backslash
\'   U+0027 single quote  (valid escape only within single quoted literals)
\"   U+0022 double quote  (valid escape only within double quoted literals)
```

The escape `\(` is used as an escape for string interpolation.
A `\(` must be followed by a valid CUE Expression, followed by a `)`.

All other sequences starting with a backslash are illegal inside literals.

```
escaped_char     = `\` { `#` } ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | "/" | `\` | "'" | `"` ) .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = `\` octal_digit octal_digit octal_digit .
hex_byte_value   = `\` "x" hex_digit hex_digit .
little_u_value   = `\` "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = `\` "U" hex_digit hex_digit hex_digit hex_digit
                           hex_digit hex_digit hex_digit hex_digit .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
interpolation    = "\(" Expression ")" .

string_lit       = simple_string_lit |
                   multiline_string_lit |
                   simple_bytes_lit |
                   multiline_bytes_lit |
                   `#` string_lit `#` .

simple_string_lit    = `"` { unicode_value | interpolation } `"` .
simple_bytes_lit     = "'" { unicode_value | interpolation | byte_value } "'" .
multiline_string_lit = `"""` newline
                       { unicode_value | interpolation | newline }
                       newline `"""` .
multiline_bytes_lit  = "'''" newline
                       { unicode_value | interpolation | byte_value | newline }
                       newline "'''" .
```

Carriage return characters (`\r`) inside string literals are discarded from
the string value.

```
'a\000\xab'
'\007'
'\377'
'\xa'        // illegal: too few hexadecimal digits
"\n"
"\""
'Hello, world!\n'
"Hello, \( name )!"
"日本語"
"\u65e5本\U00008a9e"
"\xff\u00FF"
"\uD800"             // illegal: surrogate half (TODO: probably should allow)
"\U00110000"         // illegal: invalid Unicode code point

#"This is not an \(interpolation)"#
#"This is an \#(interpolation)"#
#"The sequence "\U0001F604" renders as \#U0001F604."#
```

These examples all represent the same string:

```
"日本語"                                 // UTF-8 input text
'日本語'                                 // UTF-8 input text as byte sequence
`日本語`                                 // UTF-8 input text as a raw literal
"\u65e5\u672c\u8a9e"                    // the explicit Unicode code points
"\U000065e5\U0000672c\U00008a9e"        // the explicit Unicode code points
"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e"  // the explicit UTF-8 bytes
```

If the source code represents a character as two code points, such as a
combining form involving an accent and a letter, the result will appear as two
code points if placed in a string literal.

Strings and byte sequences have a multiline equivalent.
Multiline strings are like their single-line equivalent,
but allow newline characters.

Multiline strings and byte sequences respectively start with
a triple double quote (`"""`) or triple single quote (`'''`),
immediately followed by a newline, which is discarded from the string contents.
The string is closed by a matching triple quote, which must be by itself
on a new line, preceded by optional whitespace.
The whitespace before a closing triple quote must appear before any non-empty
line after the opening quote and will be removed from each of these
lines in the string literal.
A closing triple quote may not appear in the string.
To include it, it suffices to escape one of the quotes.

```
"""
    lily:
    out of the water
    out of itself

    bass
    picking bugs
    off the moon
        — Nick Virgilio, Selected Haiku, 1988
    """
```

This represents the same string as:

```
"lily:\nout of the water\nout of itself\n\n" +
"bass\npicking bugs\noff the moon\n" +
"    — Nick Virgilio, Selected Haiku, 1988"
```

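The indentation rule for multiline strings can be sketched as follows. The function name `multiline_value` is hypothetical, the input is assumed to start exactly at the opening triple quote, and escape sequences and interpolations are not processed; only the removal of the opening newline and of the closing quote's leading whitespace is shown.

```python
def multiline_value(source):
    """Return the contents of a multiline string literal given as source text."""
    lines = source.split("\n")
    opening, body, closing = lines[0], lines[1:-1], lines[-1]
    if opening != '"""' or closing.strip() != '"""':
        raise ValueError("expected triple quotes on the first and last line")
    # Whitespace that precedes the closing quotes.
    indent = closing[: closing.index('"')]
    result = []
    for line in body:
        if line.strip() == "":
            result.append("")                  # empty lines need no indent
        elif line.startswith(indent):
            result.append(line[len(indent):])  # strip the shared indent
        else:
            raise ValueError("line is not indented like the closing quotes")
    return "\n".join(result)
```

Indentation beyond that of the closing quotes is preserved, which is why the attribution line in the example above keeps its extra leading spaces.
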
<!-- TODO: other values

Support for other values:
- Duration literals
- regular expressions: `re("[a-z]")`
-->


## Values

In addition to simple values like `"hello"` and `42.0`, CUE has _structs_.
A struct is a map from labels to values, like `{a: 42.0, b: "hello"}`.
Structs are CUE's only way of building up complex values;
lists, which we will see later,
are defined in terms of structs.

All possible values are ordered in a lattice,
a partial order where every two elements have a single greatest lower bound.
A value `a` is an _instance_ of a value `b`,
denoted `a ⊑ b`, if `b == a` or `b` is more general than `a`,
that is if `a` orders before `b` in the partial order
(`⊑` is _not_ a CUE operator).
We also say that `b` _subsumes_ `a` in this case.
In graphical terms, `b` is "above" `a` in the lattice.

At the top of the lattice is the single ancestor of all values, called
_top_, denoted `_` in CUE.
Every value is an instance of top.

At the bottom of the lattice is the value called _bottom_, denoted `_|_`.
A bottom value usually indicates an error.
Bottom is an instance of every value.

An _atom_ is any value whose only instances are itself and bottom.
Examples of atoms are `42.0`, `"hello"`, `true`, and `null`.

A value is _concrete_ if it is either an atom, or a struct all of whose
field values are themselves concrete, recursively.

CUE's values also include what we normally think of as types, like `string` and
`float`.
But CUE does not distinguish between types and values; only the
relationship of values in the lattice is important.
Each CUE "type" subsumes the concrete values that one would normally think
of as part of that type.
For example, `"hello"` is an instance of `string`, and `42.0` is an instance of
`float`.
In addition to `string` and `float`, CUE has `null`, `int`, `bool` and `bytes`.
We informally call these CUE's "basic types".


```
false ⊑ bool
true  ⊑ bool
true  ⊑ true
5.0   ⊑ float
bool  ⊑ _
_|_   ⊑ _
_|_   ⊑ _|_

_     ⋢ _|_
_     ⋢ bool
int   ⋢ bool
bool  ⋢ int
false ⋢ true
true  ⋢ false
float ⋢ 5.0
5     ⋢ 6
```


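One way to build intuition for the instance relation is to model each value as the set of atoms it admits. This is only a toy model over a small, made-up universe of atoms; it ignores structs and is not how CUE represents values. In this model `a ⊑ b` is the subset relation, top is the whole universe, and bottom is the empty set.

```python
# Toy universe of atoms, written as strings to keep 5 and 5.0 distinct.
ATOMS = frozenset({"true", "false", "5", "6", "5.0", "6.5"})

TOP = ATOMS                      # _   admits every atom
BOTTOM = frozenset()             # _|_ admits nothing
BOOL = frozenset({"true", "false"})
INT = frozenset({"5", "6"})
FLOAT = frozenset({"5.0", "6.5"})


def atom(name):
    """An atom admits only itself."""
    return frozenset({name})


def is_instance(a, b):
    """a ⊑ b: every atom admitted by a is also admitted by b."""
    return a <= b


assert is_instance(atom("false"), BOOL)
assert not is_instance(FLOAT, atom("5.0"))
```

In this model the only subsets of an atom are the atom itself and the empty set, which matches the definition of an atom given above.
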
### Unification

The _unification_ of values `a` and `b`
is defined as the greatest lower bound of `a` and `b`. (That is, the
value `u` such that `u ⊑ a` and `u ⊑ b`,
and for any other value `v` for which `v ⊑ a` and `v ⊑ b`
it holds that `v ⊑ u`.)
Since CUE values form a lattice, the unification of two CUE values is
always unique.

These all follow from the definition of unification:
- The unification of `a` with itself is always `a`.
- The unification of values `a` and `b` where `a ⊑ b` is always `a`.
- The unification of a value with bottom is always bottom.

Unification in CUE is a [binary expression](#Operands), written `a & b`.
It is commutative, associative, and idempotent.
As a consequence, order of evaluation is irrelevant, a property that is key
to many of the constructs in the CUE language as well as the tooling layered
on top of it.


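The listed properties can be checked in a toy model where a value is the set of atoms it admits, so that the greatest lower bound is set intersection. The model and the names below are assumptions made for illustration; CUE's own evaluation also has to handle structs, which this sketch omits.

```python
BOTTOM = frozenset()                   # _|_ admits nothing
BOOL = frozenset({"true", "false"})
TRUE = frozenset({"true"})
FALSE = frozenset({"false"})


def unify(a, b):
    """Greatest lower bound in the set model: the atoms admitted by both."""
    return a & b


assert unify(BOOL, BOOL) == BOOL       # a & a is a
assert unify(TRUE, BOOL) == TRUE       # a ⊑ b implies a & b is a
assert unify(TRUE, BOTTOM) == BOTTOM   # bottom absorbs everything
assert unify(TRUE, FALSE) == BOTTOM    # conflicting atoms yield bottom
```
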
<!-- TODO: explicitly mention that disjunction is not a binary operation
but a definition of a single value?-->


### Disjunction

The _disjunction_ of values `a` and `b`
is defined as the least upper bound of `a` and `b`.
(That is, the value `d` such that `a ⊑ d` and `b ⊑ d`,
and for any other value `e` for which `a ⊑ e` and `b ⊑ e`,
it holds that `d ⊑ e`.)
This style of disjunction is sometimes also referred to as sum types.
Since CUE values form a lattice, the disjunction of two CUE values is always unique.


These all follow from the definition of disjunction:
- The disjunction of `a` with itself is always `a`.
- The disjunction of values `a` and `b` where `a ⊑ b` is always `b`.
- The disjunction of a value `a` with bottom is always `a`.
- The disjunction of two bottom values is bottom.

Disjunction in CUE is a [binary expression](#Operands), written `a | b`.
It is commutative, associative, and idempotent.

The unification of a disjunction with another value is equal to the disjunction
composed of the unification of this value with all of the original elements
of the disjunction.
In other words, unification distributes over disjunction.

```
(a_0 | ... | a_n) & b ==> a_0&b | ... | a_n&b
```

```
Expression                Result
({a:1} | {b:2}) & {c:3}   {a:1, c:3} | {b:2, c:3}
(int | string) & "foo"    "foo"
("a" | "b") & "c"         _|_
```

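The distribution rule can be sketched with a small evaluator. Everything here is a simplification for illustration: the names `Type`, `unify`, and `unify_disjunction` are hypothetical, a disjunction is a Python list of alternatives, structs are dicts, and only the toy types `int` and `string` are modeled. Alternatives that unify to bottom are dropped, using the property that the disjunction of a value with bottom is that value.

```python
BOTTOM = object()  # stands in for _|_


class Type:
    """A toy basic type that subsumes the Python values of one class."""
    def __init__(self, name, pytype):
        self.name, self.pytype = name, pytype


INT = Type("int", int)
STRING = Type("string", str)


def unify(a, b):
    if a is BOTTOM or b is BOTTOM:
        return BOTTOM
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, value in b.items():
            out[key] = unify(out[key], value) if key in out else value
            if out[key] is BOTTOM:
                return BOTTOM  # a conflicting field makes the struct bottom
        return out
    if isinstance(a, Type) and isinstance(b, Type):
        return a if a is b else BOTTOM
    if isinstance(a, Type):
        return b if isinstance(b, a.pytype) else BOTTOM
    if isinstance(b, Type):
        return unify(b, a)
    return a if type(a) is type(b) and a == b else BOTTOM


def unify_disjunction(alternatives, b):
    """(a_0 | ... | a_n) & b, computed as a_0&b | ... | a_n&b."""
    live = [r for r in (unify(a, b) for a in alternatives) if r is not BOTTOM]
    if not live:
        return BOTTOM
    return live[0] if len(live) == 1 else live
```

Applied to the three rows of the table above, this sketch returns a two-element list of structs, the single value `"foo"`, and bottom respectively.
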
A disjunction is _normalized_ if there is no element
`a` for which there is another element `b` such that `a ⊑ b`.

<!--
Normalization is important, as we need to account for spurious elements.
For instance "tcp" | "tcp" should resolve to "tcp".

Also consider

    ({a:1} | {b:1}) & ({a:1} | {b:2}) -> {a:1} | {a:1,b:1} | {a:1,b:2},

in this case, elements {a:1,b:1} and {a:1,b:2} are subsumed by {a:1} and thus
this expression is logically equivalent to {a:1} and should therefore be
considered to be unambiguous and resolve to {a:1} if a concrete value is needed.

For instance, in

    x: ({a:1} | {b:1}) & ({a:1} | {b:2}) // -> {a:1} | {a:1,b:1} | {a:1,b:2}
    y: x.a // 1

y should resolve to 1, and not an error.

For comparison, in

    x: ({a:1, b:1} | {b:2}) & {a:1} // -> {a:1,b:1} | {a:1,b:2}
    y: x.a // _|_

y should be an error as x is still ambiguous before the selector is applied,
even though `a` resolves to 1 in all cases.
-->


#### Default values

Any element of a disjunction can be marked as a default
by prefixing it with an asterisk `*`.
Intuitively, when an expression needs to be resolved for an operation other
than unification or disjunction,
non-starred elements are dropped in favor of starred ones if the starred ones
do not resolve to bottom.

More precisely, any value `v` may be associated with a default value `d`,
denoted `(v, d)` (not CUE syntax),
where `d` must be an instance of `v` (`d ⊑ v`).
The rules for unifying and disjoining such values are as follows:

```
U1: (v1, d1) & v2       => (v1&v2, d1&v2)
U2: (v1, d1) & (v2, d2) => (v1&v2, d1&d2)

D1: (v1, d1) | v2       => (v1|v2, d1)
D2: (v1, d1) | (v2, d2) => (v1|v2, d1|d2)
```

Default values may be introduced within disjunctions
by _marking_ terms of a disjunction with an asterisk `*`
([a unary expression](#Operators)).
The default value of a disjunction with marked terms is the disjunction
of those marked terms, applying the following rules for marks:

```
M1: *v        => (v, v)
M2: *(v1, d1) => (v1, d1)
```

In general, any operation `f` in CUE involving default values proceeds along the
following lines:
```
O1: f((v1, d1), ..., (vn, dn)) => (f(v1, ..., vn), f(d1, ..., dn))
```
where, with the exception of disjunction, a value `v` without a default
value is promoted to `(v, v)`.


```
Expression               Value-default pair      Rules applied
*"tcp" | "udp"           ("tcp"|"udp", "tcp")    M1, D1
string | *"foo"          (string, "foo")         M1, D1

*1 | 2 | 3               (1|2|3, 1)              M1, D1

(*1|2|3) | (1|*2|3)      (1|2|3, 1|2)            M1, D1, D2
(*1|2|3) | *(1|*2|3)     (1|2|3, 1|2)            M1, D1, M2, D2
(*1|2|3) | (1|*2|3)&2    (1|2|3, 1|2)            M1, D1, U1, D2

(*1|2) & (1|*2)          (1|2, _|_)              M1, D1, U2

(*1|2) + (1|*2)          ((1|2)+(1|2), 3)        M1, D1, O1
```
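
The rules for value-default pairs can be traced with a small Python model. It is a sketch under stated assumptions rather than a reference algorithm: a disjunction of concrete atoms is modeled as a `frozenset`, so that unification becomes set intersection and disjunction becomes set union, the empty set plays the role of bottom, and the helper names `value`, `mark`, `disjoin`, `unify`, and `resolve` are hypothetical.

```python
# Hypothetical model: a disjunction of concrete atoms is a frozenset,
# the empty frozenset plays the role of bottom (_|_), and a value-default
# pair (v, d) is a tuple whose d is None when no default has been set.


def value(*atoms):
    """An unmarked disjunction of concrete atoms, without a default."""
    return (frozenset(atoms), None)


def mark(x):
    """M1: *v => (v, v);  M2: *(v, d) => (v, d)."""
    v, d = x
    return (v, v) if d is None else (v, d)


def disjoin(x, y):
    """D1 and D2: values are joined; only marked terms feed the default."""
    (v1, d1), (v2, d2) = x, y
    if d1 is None and d2 is None:
        return (v1 | v2, None)
    if d1 is None:
        return (v1 | v2, d2)
    if d2 is None:
        return (v1 | v2, d1)
    return (v1 | v2, d1 | d2)


def unify(x, y):
    """U1 and U2: a side without a default is unified as a plain value."""
    (v1, d1), (v2, d2) = x, y
    if d1 is None and d2 is None:
        return (v1 & v2, None)
    left = v1 if d1 is None else d1
    right = v2 if d2 is None else d2
    return (v1 & v2, left & right)


def resolve(x):
    """Prefer the default unless it is absent or bottom."""
    v, d = x
    return d if d else v
```

For example, `disjoin(mark(value(1)), value(2, 3))` models `*1 | 2 | 3` and yields the pair `({1, 2, 3}, {1})`. The set model only covers disjunctions of concrete atoms; types such as `string` and bounds would need a richer representation.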

The rules of subsumption for defaults can be derived from the above definitions
and are as follows.

```
(v2, d2) ⊑ (v1, d1)  if v2 ⊑ v1 and d2 ⊑ d1
(v1, d1) ⊑ v         if v1 ⊑ v
v ⊑ (v1, d1)         if v ⊑ d1
```

<!--
For the second rule, note that by definition d1 ⊑ v1, so d1 ⊑ v1 ⊑ v.

The last one is so restrictive as v could still be made more specific by
associating it with a default that is not subsumed by d1.

Proof:
   by definition for any d ⊑ v, it holds that (v, d) ⊑ v,
   where the most general value is (v, v).
   Given the subsumption rule for (v2, d2) ⊑ (v1, d1),
   from (v, v) ⊑ v ⊑ (v1, d1) it follows that v ⊑ d1
   exactly defines the boundary of this subsumption.
-->

<!--
(non-normalized entries could also be implicitly marked, allowing writing
int | 1, instead of int | *1, but that can be done in a backwards
compatible way later if really desirable, as long as we require that
disjunction literals be normalized).
-->


```
Expression                       Resolves to
"tcp" | "udp"                    "tcp" | "udp"
*"tcp" | "udp"                   "tcp"
float | *1                       1
*string | 1.0                    string

(*1|2|3) | (1|*2|3)              1|2
(*1|2|3) & (1|*2|3)              1|2|3 // default is _|_

(* >=5 | int) & (* <=5 | int)    5

(*"tcp"|"udp") & ("udp"|*"tcp")  "tcp"
(*"tcp"|"udp") & ("udp"|"tcp")   "tcp"
(*"tcp"|"udp") & "tcp"           "tcp"
(*"tcp"|"udp") & (*"udp"|"tcp")  "tcp" | "udp" // default is _|_

(*true | false) & bool           true
(*true | false) & (true | false) true

{a: 1} | {b: 1}                  {a: 1} | {b: 1}
{a: 1} | *{b: 1}                 {b:1}
*{a: 1} | *{b: 1}                {a: 1} | {b: 1}
({a: 1} | {b: 1}) & {a:1}        {a:1}  // after eliminating {a:1,b:1} by normalization
({a:1}|*{b:1}) & ({a:1}|*{b:1})  {b:1}  // after eliminating {a:1,b:1} by normalization
```


### Bottom and errors

Any evaluation error in CUE results in a bottom value, represented by
the token '_|_'.
Bottom is an instance of every other value.

Implementations may associate error strings with different instances of bottom;
logically they all remain the same value.


### Top

Top is represented by the underscore character '_', lexically an identifier.
Unifying any value `v` with top results in `v` itself.

```
Expr        Result
_ & 5       5
_ & _       _
_ & _|_     _|_
_ | _|_     _
```


### Null

The _null value_ is represented with the keyword `null`.
It has only one parent, top, and one child, bottom.
It is unordered with respect to any other value.

```
null_lit   = "null"
```

```
null & 8     _|_
null & _     null
null & _|_   _|_
```


### Boolean values

A _boolean type_ represents the set of Boolean truth values denoted by
the keywords `true` and `false`.
The predeclared boolean type is `bool`; it is a defined type and a separate
element in the lattice.

```
boolean_lit = "true" | "false"
```

```
bool & true          true
true & true          true
true & false         _|_
bool & (false|true)  false | true
bool & (true|false)  true | false
```


### Numeric values

The _integer type_ represents the set of all integral numbers.
The _decimal floating-point type_ represents the set of all decimal floating-point
numbers.
They are two distinct types.
Both are instances of a generic `number` type.

<!--
    number
   /      \
 int     float
-->

The predeclared number, integer, and decimal floating-point types are
`number`, `int` and `float`; they are defined types.
<!--
TODO: should we drop float? It is somewhat more precise and probably a good idea
to have it in the programmatic API, but it may be confusing to have to deal
with it in the language.
-->

A decimal floating-point literal always has type `float`;
it is not an instance of `int` even if it is an integral number.

Integer literals are always of type `int` and don't match type `float`.

Numeric literals are exact values of arbitrary precision.
If the operation permits it, numbers should be kept in arbitrary precision.

Implementation restriction: although numeric values have arbitrary precision
in the language, implementations may implement them using an internal
representation with limited precision.
That said, every implementation must:

- Represent integer values with at least 256 bits.
- Represent floating-point values with a mantissa of at least 256 bits and
a signed binary exponent of at least 16 bits.
- Give an error if unable to represent an integer value precisely.
- Give an error if unable to represent a floating-point value due to overflow.
- Round to the nearest representable value if unable to represent
a floating-point value due to limits on precision.

These requirements apply to the result of any expression except for builtin
functions for which an unusual loss of precision must be explicitly documented.


### Strings

The _string type_ represents the set of UTF-8 strings,
not allowing surrogates.
The predeclared string type is `string`; it is a defined type.

The length of a string `s` (its size in bytes) can be discovered using
the built-in function `len`.


### Bytes

The _bytes type_ represents the set of byte sequences.
A byte sequence value is a (possibly empty) sequence of bytes.
The number of bytes is called the length of the byte sequence
and is never negative.
The predeclared byte sequence type is `bytes`; it is a defined type.


### Bounds

A _bound_, syntactically a [unary expression](#Operands), defines
an infinite disjunction of concrete values that can be represented
as a single comparison.

For any [comparison operator](#Comparison-operators) `op` except `==`,
`op a` is the disjunction of every `x` such that `x op a`.

```
2 & >=2 & <=5           // 2, where 2 is either an int or float.
2.5 & >=1 & <=5         // 2.5
2 & >=1.0 & <3.0        // 2.0
2 & >1 & <3.0           // 2.0
2.5 & int & >1 & <5     // _|_
2.5 & float & >1 & <5   // 2.5
int & 2 & >1.0 & <3.0   // _|_
2.5 & >=(int & 1) & <5  // _|_
>=0 & <=7 & >=3 & <=10  // >=3 & <=7
!=null & 1              // 1
>=5 & <=5               // 5
```
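
Unifying several inclusive bounds amounts to intersecting intervals. The Python sketch below illustrates that idea only; it assumes inclusive bounds (`>=` and `<=`), ignores the distinction between `int` and `float` as well as strict bounds and `!=`, and uses the hypothetical names `at_least`, `at_most`, and `unify_bounds`.

```python
# Hypothetical model of inclusive numeric bounds: (lo, hi) stands for
# >=lo & <=hi, with None meaning "unbounded" on that side.
BOTTOM = "_|_"


def at_least(n):
    """Model of the bound >=n."""
    return (n, None)


def at_most(n):
    """Model of the bound <=n."""
    return (None, n)


def unify_bounds(*bounds):
    """Intersect the intervals; collapse to a single number or to bottom."""
    lows = [lo for lo, _ in bounds if lo is not None]
    highs = [hi for _, hi in bounds if hi is not None]
    lo = max(lows) if lows else None
    hi = min(highs) if highs else None
    if lo is not None and hi is not None:
        if lo > hi:
            return BOTTOM   # no value satisfies every bound
        if lo == hi:
            return lo       # >=5 & <=5 is the single value 5
    return (lo, hi)
```

With these helpers, `unify_bounds(at_least(0), at_most(7), at_least(3), at_most(10))` gives `(3, 7)`, the model of `>=3 & <=7`.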


### Structs

A _struct_ is a set of elements called _fields_, each of
which has a name, called a _label_, and a value.

We say a label is defined for a struct if the struct has a field with the
corresponding label.
The value for a label `f` of struct `a` is denoted `a.f`.
A struct `a` is an instance of `b`, or `a ⊑ b`, if for any label `f`
defined for `b`, label `f` is also defined for `a` and `a.f ⊑ b.f`.
Note that if `a` is an instance of `b` it may have fields with labels that
are not defined for `b`.

The (unique) struct with no fields, written `{}`, has every struct as an
instance. It can be considered the type of all structs.

```
{a: 1} ⊑ {}
{a: 1, b: 1} ⊑ {a: 1}
{a: 1} ⊑ {a: int}
{a: 1, b: 1.0} ⊑ {a: int, b: float}

{} ⋢ {a: 1}
{a: 2} ⋢ {a: 1}
{a: 1} ⋢ {b: 1}
```
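
The instance relation for structs can be checked mechanically. The following Python sketch illustrates the definition under simplifying assumptions: structs are dicts, concrete atoms are Python scalars, the Python types `int`, `float`, and `str` stand in for CUE's basic types, and `is_instance` is a hypothetical helper name. Bounds, disjunctions, and optional fields are not modeled.

```python
# Hypothetical model: structs are dicts, concrete atoms are Python scalars,
# and the Python types int, float, and str stand in for CUE's basic types.


def is_instance(a, b):
    """Report whether a ⊑ b in this simplified model."""
    if isinstance(b, type):
        if isinstance(a, type):
            return a is b
        # Exact type match, so that 1 is not a float and True is not an int.
        return type(a) is b
    if isinstance(a, dict) and isinstance(b, dict):
        # Every label defined for b must be defined for a, with a.f ⊑ b.f.
        return all(f in a and is_instance(a[f], b[f]) for f in b)
    if isinstance(a, dict) or isinstance(b, dict):
        return False
    return type(a) is type(b) and a == b
```

Because only the labels of `b` are inspected, `a` may carry additional fields, which is why every struct in this model is an instance of `{}`.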

A field may be required or optional.
The successful unification of structs `a` and `b` is a new struct `c` which
has all fields of both `a` and `b`, where
the value of a field `f` in `c` is `a.f & b.f` if `f` is in both `a` and `b`,
or just `a.f` or `b.f` if `f` is in just `a` or `b`, respectively.
If a field `f` is in both `a` and `b`, `c.f` is optional only if both
`a.f` and `b.f` are optional.
Any [references](#References) to `a` or `b`
in their respective field values need to be replaced with references to `c`.
The result of a unification is bottom (`_|_`) if any of its required
fields evaluates to bottom, recursively.
<!--NOTE: About bottom values for optional fields being okay.

The proposition ¬P is a close cousin of P → ⊥ and is often used
as an approximation to avoid the issues of using not.
Bottom (⊥) is also frequently used to mean undefined. This makes sense.
Consider `{a?: 2} & {a?: 3}`.
Both structs say `a` is optional; in other words, it may be omitted.
So we can still get a valid result by omitting `a`, even in
case of a conflict.

Granted, this definition may lead to confusing results, especially in
definitions, when tightening an optional field leads to unintentionally
discarding it.
It could be a role of vet checkers to identify such cases (and suggest users
to explicitly use `_|_` to discard a field, for instance).
-->

Syntactically, a struct literal may contain multiple fields with
the same label; the result is a single field whose properties are those of
the unification of the two fields, as when unifying two structs.

These examples illustrate required fields only. Examples with
optional fields follow below.

```
Expression                             Result (without optional fields)
{a: int, a: 1}                         {a: 1}
{a: int} & {a: 1}                      {a: 1}
{a: >=1 & <=7} & {a: >=5 & <=9}        {a: >=5 & <=7}
{a: >=1 & <=7, a: >=5 & <=9}           {a: >=5 & <=7}

{a: 1} & {b: 2}                        {a: 1, b: 2}
{a: 1, b: int} & {b: 2}                {a: 1, b: 2}

{a: 1} & {a: 2}                        _|_
```
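
Struct unification for required fields can be sketched as a recursive merge. The Python below is an illustration only and rests on several assumptions: structs are dicts, atoms are Python scalars, Python types such as `int` stand in for basic types, and `BOTTOM` and `unify` are hypothetical names. Optional fields, references, and bounds are deliberately left out.

```python
# Hypothetical model: structs are dicts, atoms are Python scalars, Python
# types such as int and str stand in for basic types, BOTTOM is _|_.
BOTTOM = object()


def unify(a, b):
    """Unify two values; a struct is bottom if any of its fields is."""
    if isinstance(a, dict) and isinstance(b, dict):
        c = {}
        for f in list(a) + [g for g in b if g not in a]:
            if f in a and f in b:
                c[f] = unify(a[f], b[f])         # c.f is a.f & b.f
            else:
                c[f] = a[f] if f in a else b[f]  # field from one side only
            if c[f] is BOTTOM:
                return BOTTOM
        return c
    if isinstance(a, type) and isinstance(b, type):
        return a if a is b else BOTTOM
    if isinstance(a, type):
        return b if type(b) is a else BOTTOM
    if isinstance(b, type):
        return a if type(a) is b else BOTTOM
    return a if type(a) is type(b) and a == b else BOTTOM
```

For instance, `unify({"a": 1, "b": int}, {"b": 2})` returns `{"a": 1, "b": 2}`, while `unify({"a": 1}, {"a": 2})` returns `BOTTOM` because the required field `a` conflicts.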

Syntactically, the labels of optional fields are followed by a
question mark `?`.
The question mark is not part of the field name.
Concrete field labels may be an identifier or string, the latter of which may be
interpolated.
Fields with identifier labels can be referred to within the scope in which they
are defined; fields with string labels cannot.
References within such interpolated strings are resolved within
the scope of the struct in which the label sequence is
defined and can reference concrete labels lexically preceding
the label within a label sequence.
<!-- We allow this so that rewriting a CUE file to collapse or expand
field sequences has no impact on semantics.
-->

<!--TODO: first implementation round will not yet have expression labels

An ExpressionLabel sets a collection of optional fields to a field value.
By default it defines this value for all possible string labels.
An optional expression limits this to the set of optional fields whose
labels match the expression.
-->
A Bind label, written `<identifier>`, is useful for capturing a label as a value
and for enforcing constraints on all fields of a struct.
In a field using a bind label, such as
```
{
    <id>: { name: id }
}
```
the label name is bound to the identifier for the scope of the field value, so
it can be used inside the value to denote the label.

A bind label matches every field of its enclosing struct, so
```
{
    <id>: { name: id }
    a: { value: 1 }
}
```
evaluates to

```
{
    a: { name: "a" }
    a: { value: 1 }
}
```
Since identical fields in a struct unify, this is equivalent to
```
{
    a: {
        name: "a"
        value: 1
    }
}
```

Because bind labels match every field in a struct, they can enforce constraints
on all fields. The struct

```
ints: {
    <_>: int
}
```
can only have integer field values:

```
ints & { a: 1 }     // ok
ints & { b: "two" } // _|_, because int & "two" == _|_.
```
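
The way a bind label applies its value to every field can be mimicked in a few lines of Python. This is a rough sketch under stated assumptions, not the evaluation strategy of any implementation: a struct is a dict whose field values are themselves dicts of concrete values, the bind-label field is modeled as a function from the label name to a dict, and `apply_bind_label` is a hypothetical helper.

```python
# Hypothetical model: a struct is a dict of label -> dict, and a field
# with a bind label is a function from the label name to a constraint dict.


def apply_bind_label(template, struct):
    """Unify template(label) into every field of struct."""
    result = {}
    for label, value in struct.items():
        merged = dict(value)
        for name, constraint in template(label).items():
            if name in merged and merged[name] != constraint:
                # A conflict corresponds to bottom (_|_) in CUE.
                raise ValueError(f"conflict in {label}.{name}")
            merged[name] = constraint
        result[label] = merged
    return result
```

Calling `apply_bind_label(lambda label: {"name": label}, {"a": {"value": 1}})` produces `{"a": {"value": 1, "name": "a"}}`, mirroring the unified result of the struct with the `<id>` field.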

The token `...` is a shorthand for `<_>: _`.
<!-- NOTE: if we allow ...Expr, as in list, it would mean something different. -->


<!-- NOTE:
A DefinitionDecl does not allow repeated labels. This is to avoid
any ambiguity or confusion about whether earlier path components
are to be interpreted as declarations or normal fields (they should
always be normal fields.)
-->

<!--NOTE:
The syntax has been deliberately restricted to allow for the following
future extensions and relaxations:
  - Allow omitting a "?" in an expression label to indicate a concrete
    string value (but maybe we want to use () for that).
  - Make the "?" in expression label optional if expression labels
    are always optional.
  - Or allow eliding the "?" if the expression has no references and
    is obviously not concrete (such as `[string]`).
  - The expression of an expression label may also indicate a struct with
    integer or even number labels
    (beware of imprecise computation in the latter).
      e.g. `{ [int]: string }` is a map of integers to strings.
  - Allow for associative lists (`foo [@.field]: {field: string}`)
  - The `...` notation can be extended analogously to that of a ListLit,
    by allowing it to follow with an expression for the remaining properties.
    In that case it is no longer a shorthand for `[string]: _`, but rather
    would define the value for any other value for which there is no field
    defined.
    Like the definition with List, this is somewhat odd, but it allows the
    encoding of JSON schema's and (non-structural) OpenAPI's
    additionalProperties and additionalItems.
-->

<!-- TODO: for next round of implementation, replace ExpressionLabel with:
ExpressionLabel = BindLabel | [ BindLabel ] "[" [ Expression ] "]" .
-->

<!-- TODO: strongly consider relaxing an embedding to be an Expression, instead
of Operand. This will tie in with using dots instead of spaces on the LHS,
comprehensions and the ability to generate good error messages, so tread
carefully.
-->
```
StructLit       = "{" { Declaration "," } [ "..." ] "}" .
Declaration     = FieldDecl | DefinitionDecl | AliasDecl | Comprehension | Embedding .
FieldDecl       = Label { Label } ":" Expression { attribute } .
DefinitionDecl  = Label "::" Expression { attribute } .
Embedding       = Expression .

AliasDecl       = Label "=" Expression .
BindLabel       = "<" identifier ">" .
ConcreteLabel   = identifier | simple_string_lit .
ExpressionLabel = BindLabel .
Label           = ConcreteLabel [ "?" ] | ExpressionLabel .

attribute       = "@" identifier "(" attr_elems ")" .
attr_elems      = attr_elem { "," attr_elem } .
attr_elem       = attr_string | attr_label | attr_nest .
attr_label      = identifier "=" attr_string .
attr_nest       = identifier "(" attr_elems ")" .
attr_string     = { attr_char } | string_lit .
attr_char       = /* an arbitrary Unicode code point except newline, ',', '"', `'`, '#', '=', '(', and ')' */ .
```


```
Expression                        Result (without optional fields)
a: { foo?: string }               {}
b: { foo: "bar" }                 { foo: "bar" }
c: { foo?: *"bar" | string }      {}

d: a & b                          { foo: "bar" }
e: b & c                          { foo: "bar" }
f: a & c                          {}
g: a & { foo?: number }           {}
h: b & { foo?: number }           _|_
i: c & { foo: string }            { foo: "bar" }
```

#### Closed structs

By default, structs are open to adding fields.
Instances of an open struct `p` may contain fields not defined in `p`.
This makes it easy to add fields, but can lead to bugs:

```
S: {
    field1: string
}

S1: S & { field2: "foo" }

// S1 is { field1: string, field2: "foo" }


A: {
    field1: string
    field2: string
}

A1: A & {
    feild1: "foo"  // "field1" was accidentally misspelled
}

// A1 is
//    { field1: string, field2: string, feild1: "foo" }
// not the intended
//    { field1: "foo", field2: string }
```

A _closed struct_ `c` is a struct whose instances may not have regular fields
not defined in `c`.
Closing a struct is equivalent to adding an optional field with value `_|_`
for all undefined fields.

Syntactically, closed structs can be explicitly created with the `close` builtin
or implicitly by [definitions](#Definitions).


```
A: close({
    field1: string
    field2: string
})

A1: A & {
    feild1: string
} // _|_ feild1 not defined for A

A2: A & {
    for k,v in { feild1: string } {
        k: v
    }
} // _|_ feild1 not defined for A

C: close({
    <_>: _
})

C2: C & {
    for k,v in { thisIsFine: string } {
        "\(k)": v
    }
}

D: close({
    // Values generated by comprehensions are treated as embeddings.
    for k,v in { x: string } {
        "\(k)": v
    }
})
```
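
The effect of closing a struct can be approximated by a label check performed before unification. The Python sketch below makes strong simplifying assumptions: structs are dicts, only top-level labels are considered, bind labels such as `<_>: _` and embeddings are not modeled, and the helper names `undefined_fields` and `allowed_in_closed` are hypothetical.

```python
# Hypothetical model: structs are dicts; a closed struct rejects any label
# it does not define. Bind labels and embeddings are not modeled here.


def undefined_fields(closed, other):
    """Labels of `other` that the closed struct `closed` does not define."""
    return sorted(label for label in other if label not in closed)


def allowed_in_closed(closed, other):
    """True if no field of `other` is ruled out by closing `closed`."""
    return not undefined_fields(closed, other)
```

With `A = {"field1": str, "field2": str}`, the call `undefined_fields(A, {"feild1": "foo"})` reports `["feild1"]`, the misspelled label that makes the unification with the closed struct fail.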
1275
Jonathan Amsterdam061bde12019-09-03 08:28:10 -04001276<!-- (jba) Somewhere it should be said that optional fields are only
1277 interesting inside closed structs. -->
Marcel van Lohuizen62658a82019-06-16 12:18:47 +02001278
1279#### Embedding
1280
Jonathan Amsterdam061bde12019-09-03 08:28:10 -04001281A struct may contain an _embedded value_, an operand used
Marcel van Lohuizen5134dee2019-07-21 14:41:44 +02001282as a declaration, which must evaluate to a struct.
1283An embedded value of type struct is unified with the struct in which it is
1284embedded, but disregarding the restrictions imposed by closed structs
1285for its top-level fields.
1286<!--TODO: consider relaxing it to the below.
Marcel van Lohuizen62658a82019-06-16 12:18:47 +02001287An embedded value of type struct is unified with the struct in which it is
1288embedded, but disregarding the restrictions imposed by closed structs.
Marcel van Lohuizen5134dee2019-07-21 14:41:44 +02001289
1290Note that in the above definition we cannot say that the fields of the
1291embedded struct are added: references within these fields referring to
1292the embedded struct should be rewired to reference the new struct.
1293This would not be the case with per-field definition.
1294-->
Marcel van Lohuizen62658a82019-06-16 12:18:47 +02001295A struct resulting from such a unification is closed if either of the involved
1296structs were closed.
1297
Marcel van Lohuizene53305e2019-09-13 10:10:31 +02001298Syntactically, embeddings may be any expression, except that `<`
1299is eagerly interpreted as a bind label.
Marcel van Lohuizen1f5a9032019-09-09 23:53:42 +02001300
```
S1: {
    a: 1
    b: 2
    {
        c: 3
    }
}
// S1 is { a: 1, b: 2, c: 3 }

S2: close({
    a: 1
    b: 2
    {
        c: 3
    }
})
// same as close(S1)

S3: {
    a: 1
    b: 2
    close({
        c: 3
    })
}
// same as S2
```


#### Definitions

A field of a struct may be declared as a regular field (using `:`)
or as a _definition_ (using `::`).
Definitions are not emitted as part of the model and are never required
to be concrete when emitting data.
It is illegal to have a regular field and a definition with the same name
within the same struct.
Literal structs that are part of a definition's value are implicitly closed.
This excludes literal structs in embeddings and aliases.
An ellipsis `...` in such literal structs keeps them open,
as it defines `_` for all labels.
<!--
Excluding embeddings from recursive closing allows comprehensions to be
interpreted as embeddings without some exception. For instance,
    if x > 2 {
        foo: string
    }
should not cause any failure. It is also consistent with embeddings being
opened when included in a closed struct.

Finally, excluding embeddings from recursive closing allows for
a mechanism to not recursively close, without needing an additional language
construct, such as a triple colon or something else:
foo :: {
    {
        // not recursively closed
    }
    ... // include this to not close outer struct
}

Including aliases from this exclusion, which are more a separate definition
than embedding seems sensible, and allows for an easy mechanism to avoid
closing, aside from embedding.
-->

```
// MyStruct is closed and as there is no expression label or `...`, we know
// this is the full definition.
MyStruct :: {
    field:    string
    enabled?: bool
}

// Without the `...`, this field would not unify with its previous declaration.
MyStruct :: {
    enabled: bool | *false
    ...
}

myValue: MyStruct & {
    feild:   2     // error, feild not defined in MyStruct
    enabled: true  // okay
}

D :: {
    OneOf

    c: int // adds this field.
}

OneOf :: { a: int } | { b: int }


D1: D & { a: 12, c: 22 }  // { a: 12, c: 22 }
D2: D & { a: 12, b: 33 }  // _|_ // cannot define both `a` and `b`
```


<!---
JSON fields are usually camelCase. Clashes can be avoided by adopting the
convention that definitions be TitleCase. Unexported definitions are still
subject to clashes, but those are likely easier to resolve because they are
package internal.
--->


1408#### Field attributes
1409
Marcel van Lohuizenb9b62d32019-03-14 23:50:15 +01001410Fields may be associated with attributes.
1411Attributes define additional information about a field,
Jonathan Amsterdam061bde12019-09-03 08:28:10 -04001412such as a mapping to a protocol buffer <!-- TODO: add link --> tag or alternative
Marcel van Lohuizenb9b62d32019-03-14 23:50:15 +01001413name of the field when mapping to a different language.
1414
Jonathan Amsterdam061bde12019-09-03 08:28:10 -04001415<!-- TODO define attribute syntax here, before getting into semantics. -->
1416
Marcel van Lohuizenb9b62d32019-03-14 23:50:15 +01001417If a field has multiple attributes their identifiers must be unique.
1418Attributes accumulate when unifying two fields, removing duplicate entries.
1419It is an error for the resulting field to have two different attributes
1420with the same identifier.
1421
1422Attributes are not directly part of the data model, but may be
1423accessed through the API or other means of reflection.
1424The interpretation of the attribute value
1425(a comma-separated list of attribute elements) depends on the attribute.
1426Interpolations are not allowed in attribute strings.
1427
1428The recommended convention, however, is to interpret the first
1429`n` arguments as positional arguments,
1430where duplicate conflicting entries are an error,
1431and the remaining arguments as a combination of flags
1432(an identifier) and key value pairs, separated by a `=`.
1433
1434```
Marcel van Lohuizen62658a82019-06-16 12:18:47 +02001435myStruct1: {
Marcel van Lohuizenb9b62d32019-03-14 23:50:15 +01001436 field: string @go(Field)
1437 attr: int @xml(,attr) @go(Attr)
1438}
1439
Marcel van Lohuizen62658a82019-06-16 12:18:47 +02001440myStruct2: {
Marcel van Lohuizenb9b62d32019-03-14 23:50:15 +01001441 field: string @go(Field)
1442 attr: int @xml(a1,attr) @go(Attr)
1443}
1444
Marcel van Lohuizen62658a82019-06-16 12:18:47 +02001445Combined: myStruct1 & myStruct2
Marcel van Lohuizenb9b62d32019-03-14 23:50:15 +01001446// field: string @go(Field)
1447// attr: int @xml(,attr) @xml(a1,attr) @go(Attr)
1448```

#### Aliases

In addition to fields, a struct literal may also define aliases.
Aliases name values that can be referred to
within the [scope](#declarations-and-scopes) of their
definition, but are not part of the struct: aliases are irrelevant to
the partial ordering of values and are not emitted as part of any
generated data.
The name of an alias must be unique within the struct literal.

<!-- TODO: explain the difference between aliases and definitions.
     Now that you have definitions, are aliases really necessary?
     Consider removing.
-->

```
// The empty struct.
{}

// A struct with 3 fields and 1 alias.
{
    alias = 3

    foo: 2
    bar: "a string"

    "not an ident": 4
}
```
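
As an illustration of how an alias might be used, the sketch below references one alias from two fields. The names `limit`, `max`, and `min` are made up for this example. Because aliases are not part of the struct, only `max` and `min` would appear in generated data.

```
{
    limit = 10       // alias: can be referenced, but is not a field

    max: limit       // 10
    min: limit - 5   // 5
}
```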

#### Shorthand notation for nested structs

A field whose value is a struct with a single field may be written as
a sequence of the two field names,
followed by a colon and the value of that single field.

```
job myTask replicas: 2
```
expands to
```
job: {
    myTask: {
        replicas: 2
    }
}
```

<!-- OPTIONAL FIELDS:

The optional marker solves the issue of having to print large amounts of
boilerplate when dealing with large types with many optional or default
values (such as Kubernetes).
Writing such optional values in terms of *null | value is tedious,
unpleasant to read, and as it is not well defined what can be dropped or not,
all null values have to be emitted from the output, even if the user
doesn't override them.
Part of the issue is how null is defined. We could adopt a Typescript-like
approach of introducing "void" or "undefined" to mean "not defined and not
part of the output". But having all of null, undefined, and void can be
confusing. If these ever are introduced anyway, the ? operator could be
expressed along the lines of
   foo?: bar
being a shorthand for
   foo: void | bar
where void is the default if no other default is given.

The current mechanical definition of "?" is straightforward, though, and
probably avoids the need for void, while solving a big issue.

Caveats:
[1] this definition requires explicitly defined fields to be emitted, even
if they could be elided (for instance if the explicit value is the default
value defined an optional field). This is probably a good thing.

[2] a default value may still need to be included in an output if it is not
the zero value for that field and it is not known if any outside system is
aware of defaults. For instance, which defaults are specified by the user
and which by the schema understood by the receiving system.
The use of "?" together with defaults should therefore be used carefully
in non-schema definitions.
Problematic cases should be easy to detect by a vet-like check, though.

[3] It should be considered how this affects the trim command.
Should values implied by optional fields be allowed to be removed?
Probably not. This restriction is unlikely to limit the usefulness of trim,
though.

[4] There should be an option to emit all concrete optional values.
-->

### Lists

A list literal defines a new value of type list.
A list may be open or closed.
An open list is indicated with a `...` at the end of an element list,
optionally followed by a value for the remaining elements.

The length of a closed list is the number of elements it contains.
The length of an open list has its number of elements as a lower bound
and an unlimited number of elements as its upper bound.

```
ListLit       = "[" [ ElementList [ "," [ "..." [ Expression ] ] ] ] "]" .
ElementList   = Expression { "," Expression } .
```
<!---
KeyedElement  = Element .
--->
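
A few list literals may make the distinction between open and closed lists more concrete. The field names are illustrative only, and the results in the comments are derived from the rules stated above rather than from any particular implementation.

```
a: [1, 2, 3]          // closed list of length 3
b: [1, 2, ...]        // open list with at least 2 elements
c: [1, 2, ...int]     // open list; any further elements must be int

d: b & [1, 2, 3]      // [1, 2, 3]
e: a & [1, 2, 3, 4]   // _|_ : a is closed and has length 3
f: c & [1, 2, "x"]    // _|_ : "x" is not an int
```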

Lists can be thought of as structs:

```
List: *null | {
    Elem: _
    Tail: List
}
```

For closed lists, `Tail` is `null` for the last element; for open lists it is
`*null | List`, defaulting to the shortest variant.
For instance, the open list `[ 1, 2, ... ]` can be represented as:
```
open: List & { Elem: 1, Tail: { Elem: 2 } }
```
and the closed version of this list, `[ 1, 2 ]`, as
```
closed: List & { Elem: 1, Tail: { Elem: 2, Tail: null } }
```

Using this representation, the subsumption rule for lists can
be derived from those of structs.
Implementations are not required to implement lists as structs.
The `Elem` and `Tail` fields are not special and `len` will not work as
expected in these cases.


## Declarations and Scopes


### Blocks

A _block_ is a possibly empty sequence of declarations.
The braces of a struct literal `{ ... }` form a block, but there are
others as well:

- The _universe block_ encompasses all CUE source text.
- Each [package](#modules-instances-and-packages) has a _package block_
  containing all CUE source text in that package.
- Each file has a _file block_ containing all CUE source text in that file.
- Each `for` and `let` clause in a [comprehension](#comprehensions)
  is considered to be its own implicit block.

Blocks nest and influence [scoping].


### Declarations and scope

A _declaration_ may bind an identifier to a field, alias, or package.
Every identifier in a program must be declared.
Other than for fields,
no identifier may be declared twice within the same block.
For fields an identifier may be declared more than once within the same block,
resulting in a field with a value that is the result of unifying the values
of all fields with the same identifier.
String labels do not bind an identifier to the respective field.

The _scope_ of a declared identifier is the extent of source text in which the
identifier denotes the specified field, alias, or package.

CUE is lexically scoped using blocks:

1. The scope of a [predeclared identifier](#predeclared-identifiers) is the universe block.
1. The scope of an identifier denoting a field or alias
   declared at top level (outside any struct literal) is the file block.
1. The scope of the package name of an imported package is the file block of the
   file containing the import declaration.
1. The scope of a field or alias identifier declared inside a struct literal
   is the innermost containing block.

An identifier declared in a block may be redeclared in an inner block.
While the identifier of the inner declaration is in scope, it denotes the entity
declared by the inner declaration.
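
The following sketch shows both rules at work: a field declared twice in the same block, and an identifier redeclared in an inner block. The field names are hypothetical, and the comments describe the outcome expected from the scoping rules above.

```
a: int
a: >=1           // same block: unified with the declaration above

b: {
    a: "inner"   // redeclares a in an inner block
    c: a         // "inner": the innermost declaration of a is b.a
}

d: a             // int & >=1: refers to the top-level a
```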

The package clause is not a declaration;
the package name does not appear in any scope.
Its purpose is to identify the files belonging to the same package
and to specify the default name for import declarations.


### Predeclared identifiers

CUE predefines a set of types and builtin functions.
For each of these there is a corresponding keyword which is the name
of the predefined identifier, prefixed with `__`.

```
Functions
len       required  close     open

Types
null      The null type and value
bool      All boolean values
int       All integral numbers
float     All decimal floating-point numbers
string    Any valid UTF-8 sequence
bytes     Any valid byte sequence

Derived   Value
number    int | float
uint      >=0
uint8     >=0 & <=255
int8      >=-128 & <=127
uint16    >=0 & <=65_535
int16     >=-32_768 & <=32_767
rune      >=0 & <=0x10FFFF
uint32    >=0 & <=4_294_967_295
int32     >=-2_147_483_648 & <=2_147_483_647
uint64    >=0 & <=18_446_744_073_709_551_615
int64     >=-9_223_372_036_854_775_808 & <=9_223_372_036_854_775_807
uint128   >=0 & <=340_282_366_920_938_463_463_374_607_431_768_211_455
int128    >=-170_141_183_460_469_231_731_687_303_715_884_105_728 &
           <=170_141_183_460_469_231_731_687_303_715_884_105_727
float32   >=-3.40282346638528859811704183484516925440e+38 &
          <=3.40282346638528859811704183484516925440e+38
float64   >=-1.797693134862315708145274237317043567981e+308 &
          <=1.797693134862315708145274237317043567981e+308
```
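
Because the derived types are ordinary constraints, they can be unified with values like any other expression. The field names below are made up for illustration; the results follow from the bounds in the table above.

```
port:  uint16 & 8080   // 8080
level: int8 & 200      // _|_ : 200 is outside >=-128 & <=127
ratio: number & 0.5    // 0.5
```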


### Exported identifiers

An identifier of a package may be exported to permit access to it
from another package.
An identifier is exported if
the first character of the identifier's name is a Unicode upper case letter
(Unicode class "Lu"); and
the identifier is declared in the file block.
All other top-level identifiers used for fields are not exported.

In addition, any definition declared anywhere within a package of which
the first character of the identifier's name is a Unicode upper case letter
(Unicode class "Lu") is visible outside this package.
Any other definition is not visible outside the package and resides
in a separate namespace from namesake identifiers of other packages.
This is in contrast to ordinary field declarations that do not begin with
an upper-case letter, which are visible outside the package.

```
package mypackage

foo:   string  // not visible outside mypackage

Foo :: {       // visible outside mypackage
    a:  1      // visible outside mypackage
    B:  2      // visible outside mypackage

    C :: {     // visible outside mypackage
        d: 4   // visible outside mypackage
    }
    e :: foo   // not visible outside mypackage
}
```


### Uniqueness of identifiers

Given a set of identifiers, an identifier is called unique if it is different
from every other in the set, after applying normalization following
Unicode Annex #31.
Two identifiers are different if they are spelled differently
or if they appear in different packages and are not exported.
Otherwise, they are the same.


### Field declarations

A field associates the value of an expression with a label within a struct.
If this label is an identifier, it binds the field to that identifier,
so the field's value can be referenced by writing the identifier.
String labels do not bind an identifier to the field.
```
a: {
    b: 2
    "s": 3

    c: b   // 2
    d: s   // _|_ unresolved identifier "s"
    e: a.s // 3
}
```

If an expression may result in a value associated with a default value
as described in [default values](#default-values), the field binds to this
value-default pair.


<!-- TODO: disallow creating identifiers starting with __
...and reserve them for builtin values.

The issue is with code generation. As no guarantee can be given that
a predeclared identifier is not overridden in one of the enclosing scopes,
code will have to handle detecting such cases and renaming them.
An alternative is to have the predeclared identifiers be aliases for namesake
equivalents starting with a double underscore (e.g. string -> __string),
allowing generated code (normal code would keep using `string`) to refer
to these directly.
-->


### Alias declarations

An alias declaration binds an identifier to the given expression.

Within the scope of the identifier, it serves as an _alias_ for that
expression.
The expression is evaluated in the scope in which it was declared.
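
The last rule matters when an inner block redeclares an identifier that the aliased expression uses. In the sketch below (all names are hypothetical), the alias `V` is declared in `a`, so the `v` in its expression is expected to keep referring to `a.v` even when `V` is used inside `b`.

```
a: {
    v: 1
    V = v        // alias for the expression v, declared in the scope of a

    b: {
        v: 2     // redeclares v inside b
        w: V     // 1: V stands for a.v, not b.v
    }
}
```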


## Expressions

An expression specifies the computation of a value by applying operators and
built-in functions to operands.

Expressions that require concrete values are called _incomplete_ if any of
their operands are not concrete, but define a value that would be legal for
that expression.
Incomplete expressions may be left unevaluated until a concrete value is
requested at the application level.

### Operands

Operands denote the elementary values in an expression.
An operand may be a literal, a (possibly qualified) identifier denoting
a field or alias, or a parenthesized expression.

```
Operand     = Literal | OperandName | ListComprehension | "(" Expression ")" .
Literal     = BasicLit | ListLit | StructLit .
BasicLit    = int_lit | float_lit | string_lit |
              null_lit | bool_lit | bottom_lit | top_lit .
OperandName = identifier | QualifiedIdent .
```

### Qualified identifiers

A qualified identifier is an identifier qualified with a package name prefix.

```
QualifiedIdent = PackageName "." identifier .
```

A qualified identifier accesses an identifier in a different package,
which must be [imported].
The identifier must be declared in the [package block] of that package.

```
math.Sin    // denotes the Sin function in package math
```

### References

An identifier operand refers to a field and is called a reference.
The value of a reference is a copy of the expression associated with the field
that it is bound to,
with any references within that expression bound to the respective copies of
the fields they were originally bound to.
Implementations may use a different mechanism to evaluate as long as
these semantics are maintained.

```
a: {
    place:    string
    greeting: "Hello, \(place)!"
}

b: a & { place: "world" }
c: a & { place: "you" }

d: b.greeting  // "Hello, world!"
e: c.greeting  // "Hello, you!"
```



### Primary expressions

Primary expressions are the operands for unary and binary expressions.


```

Slice: indices must be complete
([0, 1, 2, 3] | [2, 3])[0:2] => [0, 1] | [2, 3]

([0, 1, 2, 3] | *[2, 3])[0:2] => [0, 1] | [2, 3]
([0,1,2,3]|[2,3], [2,3])[0:2] => ([0,1]|[2,3], [2,3])

Index
a: (1|2, 1)
b: ([0,1,2,3]|[2,3], [2,3])[a] => ([0,1,2,3]|[2,3][a], 3)

Binary operation
A binary operation is only evaluated if its operands are complete.

Input          Maximum allowed evaluation
a: string      string
b: 2           2
c: a * b       a * 2

A struct is in error if the evaluation of any of its expressions results in
bottom, where an incomplete expression is not considered bottom.
```
<!-- TODO(mpvl)
    Conversion |
-->
```
PrimaryExpr =
    Operand |
    PrimaryExpr Selector |
    PrimaryExpr Index |
    PrimaryExpr Slice |
    PrimaryExpr Arguments .

Selector       = "." identifier .
Index          = "[" Expression "]" .
Slice          = "[" [ Expression ] ":" [ Expression ] "]" .
Argument       = Expression .
Arguments      = "(" [ ( Argument { "," Argument } ) [ "," ] ] ")" .
```
<!---
Argument       = Expression | ( identifier ":" Expression ).
--->

```
x
2
(s + ".txt")
f(3.1415, true)
m["foo"]
s[i : j + 1]
obj.color
f.p[i].x
```


### Selectors

For a [primary expression](#primary-expressions) `x` that is not a [package name](#package-clause),
the selector expression

```
x.f
```

denotes the field `f` of the value `x`.
The identifier `f` is called the field selector.
The type of the selector expression is the type of `f`.
If `x` is a package name, see the section on [qualified identifiers](#qualified-identifiers).

<!--
TODO: consider allowing this and also for selectors. It needs to be considered
how defaults are carried forward in cases like:

    x: { a: string | *"foo" } | *{ a: int | *4 }
    y: x.a & string

What is y in this case?
   (x.a & string, _|_)
   (string|"foo", _|_)
   (string|"foo", "foo")
If the latter, then why?

For a disjunction of the form `x1 | ... | xn`,
the selector is applied to each element `x1.f | ... | xn.f`.
-->

Otherwise, if `x` is not a struct, or if `f` does not exist in `x`,
the result of the expression is bottom (an error).
In the latter case the expression is incomplete.
The operand of a selector may be associated with a default.

```
T: {
    x:     int
    y:     3
}

a: T.x     // int
b: T.y     // 3
c: T.z     // _|_ // field 'z' not found in T

e: {a: 1|*2} | *{a: 3|*4}
f: e.a     // 4 (default value)
```

<!--
```
(v, d).f  =>  (v.f, d.f)

e: {a: 1|*2} | *{a: 3|*4}
f: e.a    // 4 after selecting default from (({a: 1|*2} | {a: 3|*4}).a, 4)

```
-->


### Index expressions

A primary expression of the form

```
a[x]
```

denotes the element of a list or struct `a` indexed by `x`.
The value `x` is called the index or field name, respectively.
The following rules apply:

If `a` is not a struct:

- `a` is a list (which need not be complete)
- the index `x` unified with `int` must be concrete.
- the index `x` is in range if `0 <= x < len(a)`, where only the
  explicitly defined values of an open-ended list are considered,
  otherwise it is out of range

The result of `a[x]` is

for `a` of list type:

- the list element at index `x`, if `x` is within range
- bottom (an error), otherwise


for `a` of struct type:

- the index `x` unified with `string` must be concrete.
- the value of the regular and non-optional field named `x` of struct `a`,
  if this field exists
- bottom (an error), otherwise


```
[ 1, 2 ][1]     // 2
[ 1, 2 ][2]     // _|_
[ 1, 2, ...][2] // _|_
```
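
The examples above only index lists. A corresponding sketch for structs, using made-up field names, might look as follows; the results are those expected from the rules for `a` of struct type.

```
s: {
    a:     1
    "b-c": 2
    d?:    int
}

v1: s["a"]    // 1
v2: s["b-c"]  // 2
v3: s["d"]    // _|_ : d is optional, not a regular field
v4: s["e"]    // _|_ : s has no field named "e"
```

Indexing with a string also reaches fields such as `"b-c"` whose labels are not identifiers and therefore cannot be written as a selector.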

Both the operand and index value may be a value-default pair.
```
va[vi]              =>  va[vi]
va[(vi, di)]        =>  (va[vi], va[di])
(va, da)[vi]        =>  (va[vi], da[vi])
(va, da)[(vi, di)]  =>  (va[vi], da[di])
```

```
Fields                  Result
x: [1, 2] | *[3, 4]     ([1,2]|[3,4], [3,4])
i: int | *1             (int, 1)

v: x[i]                 (x[i], 4)
```

### Operators

Operators combine operands into expressions.

```
Expression = UnaryExpr | Expression binary_op Expression .
UnaryExpr  = PrimaryExpr | unary_op UnaryExpr .

binary_op  = "|" | "&" | "||" | "&&" | "==" | rel_op | add_op | mul_op .
rel_op     = "!=" | "<" | "<=" | ">" | ">=" | "=~" | "!~" .
add_op     = "+" | "-" .
mul_op     = "*" | "/" | "div" | "mod" | "quo" | "rem" .
unary_op   = "+" | "-" | "!" | "*" | rel_op .
```

Comparisons are discussed [elsewhere](#comparison-operators).
For any binary operator, the operand types must unify.
<!-- TODO: durations
 unless the operation involves durations.

Except for duration operations, if one operand is an untyped [literal] and the
other operand is not, the constant is [converted] to the type of the other
operand.
-->

Operands of unary and binary expressions may be associated with a default,
in which case the operator is applied to the value and to the default separately:
<!--
```
O1: op (v1, d1)          => (op v1, op d1)

O2: (v1, d1) op (v2, d2) => (v1 op v2, d1 op d2)
and because v => (v, v)
O3: v1       op (v2, d2) => (v1 op v2, v1 op d2)
O4: (v1, d1) op v2       => (v1 op v2, d1 op v2)
```
-->

```
Field               Resulting Value-Default pair
a: *1|2             (1|2, 1)
b: -a               (-a, -1)

c: a + 2            (a+2, 3)
d: a + a            (a+a, 2)
```

#### Operator precedence

Unary operators have the highest precedence.

There are seven precedence levels for binary operators.
Multiplication operators bind strongest, followed by
addition operators, comparison operators,
`&&` (logical AND), `||` (logical OR), `&` (unification),
and finally `|` (disjunction):

```
Precedence    Operator
    7             *  /  div  mod  quo  rem
    6             +  -
    5             ==  !=  <  <=  >  >=  =~  !~
    4             &&
    3             ||
    2             &
    1             |
```

Binary operators of the same precedence associate from left to right.
For instance, `x / y * z` is the same as `(x / y) * z`.

```
+x
23 + 3*x[i]
x <= f()
f() || g()
x == y+1 && y == z-1
2 | int
{ a: 1 } & { b: 2 }
```
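
To make the grouping implied by the precedence table explicit, the following sketch annotates a few expressions with the way they are expected to be parsed. The field names are illustrative only.

```
a: 1 + 2 * 3            // 7, grouped as 1 + (2 * 3)
b: 1 < 2 && 2 < 3       // true, grouped as (1 < 2) && (2 < 3)
c: int & >=0 | string   // grouped as (int & (>=0)) | string
d: 8 / 4 * 2            // grouped as (8 / 4) * 2
```

Note that in `c` the unary `>=` binds before `&`, and `&` binds before `|`, so no parentheses are needed to write a disjunction of constrained types.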

#### Arithmetic operators

Arithmetic operators apply to numeric values and yield a result of the same type
as the first operand. Three of the four standard arithmetic operators
`(+, -, *)` apply to integer and decimal floating-point types;
`+` and `*` also apply to lists, strings, and bytes.
`/` only applies to decimal floating-point types and
`div`, `mod`, `quo`, and `rem` only apply to integer types.

```
+    sum                    integers, floats, lists, strings, bytes
-    difference             integers, floats
*    product                integers, floats, lists, strings, bytes
/    quotient               floats
div  division               integers
mod  modulo                 integers
quo  quotient               integers
rem  remainder              integers
```

For any operator that accepts operands of type `float`, any operand may be
of type `int` or `float`, in which case the result will be `float` if any
of the operands is `float` or `int` otherwise.
For `/` the result is always `float`.
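
A short sketch of how these typing rules are expected to play out for mixed `int` and `float` operands (the field names are hypothetical):

```
a: 1 + 2      // 3, an int: both operands are int
b: 1 + 2.5    // 3.5, a float: one operand is float
c: 3 / 2      // 1.5, a float: `/` always yields a float
d: 7 div 2    // 3: integer division is written with div (or quo)
```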


#### Integer operators

For two integer values `x` and `y`,
the integer quotient `q = x div y` and remainder `r = x mod y`
implement Euclidean division and
satisfy the following relationship:

```
r = x - y*q  with 0 <= r < |y|
```
where `|y|` denotes the absolute value of `y`.

```
 x     y   x div y   x mod y
 5     3       1         2
-5     3      -2         1
 5    -3      -1         2
-5    -3       2         1
```

For two integer values `x` and `y`,
the integer quotient `q = x quo y` and remainder `r = x rem y`
implement truncated division and
satisfy the following relationship:

```
x = q*y + r  and  |r| < |y|
```

with `x quo y` truncated towards zero.

```
 x     y   x quo y   x rem y
 5     3       1         2
-5     3      -1        -2
 5    -3      -1         2
-5    -3       1        -2
```

A zero divisor in either case results in bottom (an error).
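
The two families of operators differ only when an operand is negative. The following is a minimal sketch of the definitions above, using arbitrary field names; the values in the comments follow directly from the two relationships.

```
a: 7 div 2     // 3
b: 7 mod 2     // 1
c: -7 div 2    // -4: Euclidean division keeps the remainder non-negative
d: -7 mod 2    // 1
e: -7 quo 2    // -3: the quotient is truncated towards zero
f: -7 rem 2    // -1: the remainder takes the sign of the dividend
g: 7 div 0     // _|_: zero divisor
```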

For integer operands, the unary operators `+` and `-` are defined as follows:

```
+x                          is 0 + x
-x    negation              is 0 - x
```


#### Decimal floating-point operators

For decimal floating-point numbers, `+x` is the same as `x`,
while `-x` is the negation of `x`.
The result of a floating-point division by zero is bottom (an error).
<!-- TODO: consider making it +/- Inf -->

An implementation may combine multiple floating-point operations into a single
fused operation, possibly across statements, and produce a result that differs
from the value obtained by executing and rounding the instructions individually.


#### List operators

Lists can be concatenated using the `+` operator.
Open lists are closed to their default value beforehand.

```
[ 1, 2 ]      + [ 3, 4 ]       // [ 1, 2, 3, 4 ]
[ 1, 2, ... ] + [ 3, 4 ]       // [ 1, 2, 3, 4 ]
[ 1, 2 ]      + [ 3, 4, ... ]  // [ 1, 2, 3, 4 ]
```

Lists can be multiplied by a non-negative `int` using the `*` operator
to create a list that repeats the original the indicated number of times.
```
3*[1,2]         // [1, 2, 1, 2, 1, 2]
3*[1, 2, ...]   // [1, 2, 1, 2, 1, 2]
[byte]*4        // [byte, byte, byte, byte]
0*[1,2]         // []
```
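
The two list operators can be combined. The sketch below uses hypothetical field names and relies on `*` binding more tightly than `+`.

```
padded: [1, 2] + 2*[0]   // [1, 2, 0, 0]
rows:   2*[[0, 0]]       // [[0, 0], [0, 0]]
```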

<!-- TODO(mpvl): should we allow multiplication with a range?
If so, how does one specify a list with a range of possible lengths?

Suggestion from jba:
Multiplication should distribute over disjunction,
so int(1)..int(3) * [x] = [x] | [x, x] | [x, x, x].
The hard part is figuring out what (>=1 & <=3) * [x] means,
since >=1 & <=3 includes many floats.
(mpvl: could constrain arguments to parameter types, but needs to be
done consistently.)
-->


#### String operators

Strings can be concatenated using the `+` operator:
```
s: "hi " + name + " and good bye"
```
String addition creates a new string by concatenating the operands.

A string can be repeated by multiplying it:

```
s: "etc. "*3  // "etc. etc. etc. "
```
<!-- jba: Do these work for byte sequences? If not, why not? -->
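
A self-contained sketch combining both string operators follows. All field names are illustrative only.

```
name:     "CUE"
greeting: "hi " + name                     // "hi CUE"
rule:     "-"*5                            // "-----"
banner:   rule + " " + name + " " + rule   // "----- CUE -----"
```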


#### Comparison operators

Comparison operators compare two operands and yield an untyped boolean value.

```
==    equal
!=    not equal
<     less
<=    less or equal
>     greater
>=    greater or equal
=~    matches regular expression
!~    does not match regular expression
```
<!-- regular expression operator inspired by Bash, Perl, and Ruby. -->

In any comparison, the types of the two operands must unify or one of the
operands must be null.

The equality operators `==` and `!=` apply to operands that are comparable.
The ordering operators `<`, `<=`, `>`, and `>=` apply to operands that are ordered.
The matching operators `=~` and `!~` apply to a string and regular
expression operand.
These terms and the result of the comparisons are defined as follows:

- Null is comparable with itself and any other type.
  Two null values are always equal; null is unequal to anything else.
- Boolean values are comparable.
  Two boolean values are equal if they are either both true or both false.
- Integer values are comparable and ordered, in the usual way.
- Floating-point values are comparable and ordered, as per the definitions
  for binary coded decimals in the IEEE-754-2008 standard.
- Floating-point numbers may be compared with integers.
- String and bytes values are comparable and ordered lexically byte-wise.
- Structs are not comparable.
- Lists are not comparable.
- The regular expression syntax is the one accepted by RE2,
  described in https://github.com/google/re2/wiki/Syntax,
  except for `\C`.
- `s =~ r` is true if `s` matches the regular expression `r`.
- `s !~ r` is true if `s` does not match regular expression `r`.
<!--- TODO: consider the following
- For regular expression, named capture groups are interpreted as CUE references
  that must unify with the strings matching this capture group.
--->
<!-- TODO: Implementations should adopt an algorithm that runs in linear time? -->
<!-- Consider implementing Level 2 of Unicode regular expression. -->

```
3 < 4       // true
3 < 4.0     // true
null == 2   // false
null != {}  // true
{} == {}    // _|_: structs are not comparable against structs

"Wild cats" =~ "cat"   // true
"Wild cats" !~ "dog"   // true

"foo" =~ "^[a-z]{3}$"  // true
"foo" =~ "^[a-z]{4}$"  // false
```
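
A few further cases, given as a sketch of the byte-wise ordering and comparability rules listed above:

```
"abc" < "abd"   // true: the first differing byte decides
"ab" < "abc"    // true: a proper prefix orders first
"Z" < "a"       // true: byte 0x5A is less than byte 0x61
null == null    // true
[1] == [1]      // _|_: lists are not comparable
```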

<!-- jba
I think I know what `3 < a` should mean if

    a: >=1 & <=5

It should be a constraint on `a` that can be evaluated once `a`'s value is known more precisely.

But what does `3 < (>=1 & <=5)` mean? We'll never get more information, so it must have a definite value.
-->

#### Logical operators

Logical operators apply to boolean values and yield a result of the same type
as the operands. The right operand is evaluated conditionally.

```
&&    conditional AND    p && q  is  "if p then q else false"
||    conditional OR     p || q  is  "if p then true else q"
!     NOT                !p      is  "not p"
```
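
Applied to concrete boolean fields, the definitions above give the following results. The field names are arbitrary and used only for this sketch.

```
p: true
q: false

a: p && q    // false
b: p || q    // true
c: !p        // false
d: q || !q   // true
```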
2315
2316
2317<!--
2318### TODO TODO TODO
2319
23203.14 / 0.0 // illegal: division by zero
2321Illegal conversions always apply to CUE.
2322
2323Implementation restriction: A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa.
2324-->
2325
Marcel van Lohuizend340e8d2019-01-30 16:57:39 +01002326<!--- TODO(mpvl): conversions
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002327### Conversions
2328Conversions are expressions of the form `T(x)` where `T` and `x` are
2329expressions.
2330The result is always an instance of `T`.
2331
2332```
2333Conversion = Expression "(" Expression [ "," ] ")" .
2334```
Marcel van Lohuizend340e8d2019-01-30 16:57:39 +01002335--->
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002336<!---
2337
2338A literal value `x` can be converted to type T if `x` is representable by a
2339value of `T`.
2340
2341As a special case, an integer literal `x` can be converted to a string type
2342using the same rule as for non-constant x.
2343
2344Converting a literal yields a typed value as result.
2345
2346```
2347uint(iota) // iota value of type uint
2348float32(2.718281828) // 2.718281828 of type float32
2349complex128(1) // 1.0 + 0.0i of type complex128
2350float32(0.49999999) // 0.5 of type float32
2351float64(-1e-1000) // 0.0 of type float64
2352string('x') // "x" of type string
2353string(0x266c) // "♬" of type string
2354MyString("foo" + "bar") // "foobar" of type MyString
2355string([]byte{'a'}) // not a constant: []byte{'a'} is not a constant
2356(*int)(nil) // not a constant: nil is not a constant, *int is not a boolean, numeric, or string type
2357int(1.2) // illegal: 1.2 cannot be represented as an int
2358string(65.0) // illegal: 65.0 is not an integer constant
2359```
2360--->
Marcel van Lohuizend340e8d2019-01-30 16:57:39 +01002361<!---
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002362
Jonathan Amsterdame4790382019-01-20 10:29:29 -05002363A conversion is always allowed if `x` is an instance of `T`.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002364
Jonathan Amsterdame4790382019-01-20 10:29:29 -05002365If `T` and `x` of different underlying type, a conversion is allowed if
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002366`x` can be converted to a value `x'` of `T`'s type, and
2367`x'` is an instance of `T`.
2368A value `x` can be converted to the type of `T` in any of these cases:
2369
Marcel van Lohuizen45163fa2019-01-22 15:53:32 +01002370- `x` is a struct and is subsumed by `T`.
2371- `x` and `T` are both integer or floating points.
2372- `x` is an integer or a byte sequence and `T` is a string.
2373- `x` is a string and `T` is a byte sequence.
Jonathan Amsterdame4790382019-01-20 10:29:29 -05002374
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002375Specific rules apply to conversions between numeric types, structs,
2376or to and from a string type. These conversions may change the representation
2377of `x`.
2378All other conversions only change the type but not the representation of x.
2379
2380
2381#### Conversions between numeric ranges
2382For the conversion of numeric values, the following rules apply:
2383
Marcel van Lohuizen45163fa2019-01-22 15:53:32 +010023841. Any integer value can be converted into any other integer value
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002385 provided that it is within range.
23862. When converting a decimal floating-point number to an integer, the fraction
2387 is discarded (truncation towards zero). TODO: or disallow truncating?
2388
2389```
2390a: uint16(int(1000)) // uint16(1000)
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +01002391b: uint8(1000) // _|_ // overflow
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002392c: int(2.5) // 2 TODO: TBD
2393```
2394
2395
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002396#### Conversions to and from a string type
2397
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002398Converting a list of bytes to a string type yields a string whose successive
2399bytes are the elements of the slice.
2400Invalid UTF-8 is converted to `"\uFFFD"`.
2401
2402```
2403string('hell\xc3\xb8') // "hellø"
2404string(bytes([0x20])) // " "
2405```
2406
2407As string value is always convertible to a list of bytes.
2408
2409```
2410bytes("hellø") // 'hell\xc3\xb8'
2411bytes("") // ''
2412```
2413
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002414#### Conversions between list types
2415
2416Conversions between list types are possible only if `T` strictly subsumes `x`
2417and the result will be the unification of `T` and `x`.
2418
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002419If we introduce named types this would be different from IP & [10, ...]
2420
2421Consider removing this until it has a different meaning.
2422
2423```
2424IP: 4*[byte]
2425Private10: IP([10, ...]) // [10, byte, byte, byte]
2426```
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002427
Marcel van Lohuizen75cb0032019-01-11 12:10:48 +01002428#### Conversions between struct types
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002429
2430A conversion from `x` to `T`
2431is applied using the following rules:
2432
24331. `x` must be an instance of `T`,
24342. all fields defined for `x` that are not defined for `T` are removed from
2435 the result of the conversion, recursively.
2436
Jonathan Amsterdame4790382019-01-20 10:29:29 -05002437<!-- jba: I don't think you say anywhere that the matching fields are unified.
Marcel van Lohuizend340e8d2019-01-30 16:57:39 +01002438mpvl: they are not, x must be an instance of T, in which case x == T&x,
2439so unification would be unnecessary.
Jonathan Amsterdame4790382019-01-20 10:29:29 -05002440-->
Marcel van Lohuizena3f00972019-02-01 11:10:39 +01002441<!--
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002442```
2443T: {
2444 a: { b: 1..10 }
2445}
2446
2447x1: {
2448 a: { b: 8, c: 10 }
2449 d: 9
2450}
2451
2452c1: T(x1) // { a: { b: 8 } }
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +01002453c2: T({}) // _|_ // missing field 'a' in '{}'
2454c3: T({ a: {b: 0} }) // _|_ // field a.b does not unify (0 & 1..10)
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002455```
Marcel van Lohuizend340e8d2019-01-30 16:57:39 +01002456-->
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002457
### Calls

Calls can be made to core library functions, called builtins.
Given an expression `f` of function type `F`,
```
f(a1, a2, … an)
```
calls `f` with arguments a1, a2, … an. Arguments must be expressions
whose values are instances of the parameter types of `F`
and are evaluated before the function is called.

```
a: math.Atan2(x, y)
```

In a function call, the function value and arguments are evaluated in the usual
order.
After they are evaluated, the parameters of the call are passed by value
to the function and the called function begins execution.
The return parameters
of the function are passed by value back to the calling function when the
function returns.


### Comprehensions

Lists and fields can be constructed using comprehensions.

Each defines a clause sequence that consists of a sequence of `for`, `if`, and
`let` clauses, nesting from left to right.
The `for` and `let` clauses each define a new scope in which new values are
bound to be available for the next clause.

The `for` clause binds the defined identifiers, on each iteration, to the next
value of some iterable value in a new scope.
A `for` clause may bind one or two identifiers.
If there is one identifier, it binds it to the value of
a list element or struct field value.
If there are two identifiers, the first value will be the key or index,
if available, and the second will be the value.

For lists, `for` iterates over all elements in the list after closing it.
For structs, `for` iterates over all non-optional regular fields.

An `if` clause, or guard, specifies an expression that terminates the current
iteration if it evaluates to false.

The `let` clause binds the result of an expression to the defined identifier
in a new scope.

A current iteration is said to complete if the innermost block of the clause
sequence is reached.

_List comprehensions_ specify a single expression that is evaluated and included
in the list for each completed iteration.

_Field comprehensions_ follow a clause sequence with a struct literal,
where the struct literal is evaluated and embedded at the point of
declaration of the comprehension for each complete iteration.
As usual, fields in the struct may evaluate to the same label,
resulting in the unification of their values.

```
Comprehension       = Clauses StructLit .
ListComprehension   = "[" Expression Clauses "]" .

Clauses             = Clause { Clause } .
Clause              = ForClause | GuardClause | LetClause .
ForClause           = "for" identifier [ "," identifier ] "in" Expression .
GuardClause         = "if" Expression .
LetClause           = "let" identifier "=" Expression .
```

```
a: [1, 2, 3, 4]
b: [ x+1 for x in a if x > 1]  // [3, 4, 5]

c: {
    for x in a
    if x < 4
    let y = 1 {
        "\(x)": x + y
    }
}
d: { "1": 2, "2": 3, "3": 4 }
```
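
The two-identifier form of `for` is sketched below, using the comprehension syntax of this section. The names `ports`, `names`, `indexed`, and `secure` are hypothetical, and the order shown for `names` assumes fields are visited in declaration order, which this section does not state.

```
ports: { http: 80, https: 443 }

// Over a struct: k is bound to the field name, v to the field value.
names: [ k for k, v in ports ]                    // ["http", "https"]

// Over a list: i is bound to the index, v to the element.
indexed: [ "\(i):\(v)" for i, v in ["a", "b"] ]   // ["0:a", "1:b"]

// A field comprehension combining for, let, and if clauses.
secure: {
    for k, v in ports
    let n = v + 8000
    if v > 100 {
        "\(k)": n
    }
}
// secure evaluates to { https: 8443 }
```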


### String interpolation

String interpolation allows constructing strings by replacing placeholder
expressions with their string representation.
String interpolation may be used in single- and double-quoted strings, as well
as their multiline equivalent.

A placeholder consists of "\(" followed by an expression and a ")". The
expression is evaluated within the scope within which the string is defined.

```
a: "World"
b: "Hello \( a )!" // Hello World!
```
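
A placeholder may hold any expression, not only a reference. In the sketch below the field names are illustrative, and integers are assumed to interpolate as their decimal representation.

```
count: 3
item:  "apple"
msg:   "\(count) \(item)s cost \(count * 2) units"   // "3 apples cost 6 units"
```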


## Builtin Functions

Built-in functions are predeclared. They are called like any other function.


### `len`

The built-in function `len` takes arguments of various types and returns
a result of type int.

```
Argument type    Result

string           string length in bytes
bytes            length of byte sequence
list             list length, smallest length for an open list
struct           number of distinct data fields, including optional
```
<!-- TODO: consider not supporting len, but instead rely on more
precisely named builtin functions:
  - strings.RuneLen(x)
  - bytes.Len(x)  // x may be a string
  - struct.NumFooFields(x)
  - list.Len(x)
-->

```
Expression           Result
len("Hellø")         6
len([1, 2, 3])       3
len([1, 2, ...])     >=2
```
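
The remaining argument types from the table can be sketched the same way. The field names are arbitrary.

```
a: len('abc')           // 3: length of a byte sequence
b: len([])              // 0
c: len({x: 1, y: 2})    // 2: number of data fields
d: len("ø")             // 2: strings are measured in bytes, not characters
```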


### `close`

The built-in function `close` converts a partially defined, or open, struct
to a fully defined, or closed, struct.
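
A minimal sketch of the effect follows. It assumes, as the treatment of closed structs elsewhere in this specification implies, that a closed struct rejects fields it does not declare. The names `A`, `ok`, and `bad` are hypothetical.

```
A: close({ name: string })

ok:  A & { name: "x" }             // { name: "x" }
bad: A & { name: "x", extra: 1 }   // _|_: field extra is not allowed in A
```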


### `and`

The built-in function `and` takes a list and returns the result of applying
the `&` operator to all elements in the list.
It returns top for the empty list.

    Expression:   Result
    and([a, b])   a & b
    and([a])      a
    and([])       _
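
For example, a list of constraints can be folded into a single constraint. This is a sketch with hypothetical field names.

```
constraints: [int, >=0, <10]

digit: and(constraints)   // int & >=0 & <10
five:  digit & 5          // 5
bad:   digit & 12         // _|_: 12 does not satisfy <10
```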

### `or`

The built-in function `or` takes a list and returns the result of applying
the `|` operator to all elements in the list.
It returns bottom for the empty list.

```
Expression:   Result
or([a, b])    a | b
or([a])       a
or([])        _|_
```
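
A common use is to turn a list of allowed values into a disjunction. The sketch below uses hypothetical field names.

```
levels: ["debug", "info", "error"]

level:  or(levels)        // "debug" | "info" | "error"
chosen: level & "info"    // "info"
```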


## Cycles

Implementations are required to interpret or reject cycles encountered
during evaluation according to the rules in this section.


### Reference cycles

A _reference cycle_ occurs if a field references itself, either directly or
indirectly.

```
// x references itself
x: x

// indirect cycles
b: c
c: d
d: b
```

Implementations should report these as an error except in the following cases:


#### Expressions that unify an atom with an expression

An expression of the form `a & e`, where `a` is an atom
and `e` is an expression, always evaluates to `a` or bottom.
As it does not matter how we fail, we can assume the result to be `a`
and validate after the field in which the expression occurs has been evaluated
that `a == e`.

```
// Config            Evaluates to (requiring concrete values)
x: {                 x: {
    a: b + 100           a: _|_ // cycle detected
    b: a - 100           b: _|_ // cycle detected
}                    }

y: x & {             y: {
    a: 200               a: 200 // asserted that 200 == b + 100
                         b: 100
}                    }
```


#### Field values

A field value of the form `r & v`,
where `r` evaluates to a reference cycle and `v` is a value,
evaluates to `v`.
Unification is idempotent and unifying a value with itself ad infinitum,
which is what the cycle represents, results in this value.
Implementations should detect cycles of this kind, ignore `r`,
and take `v` as the result of unification.
<!-- Tomabechi's graph unification algorithm
can detect such cycles at near-zero cost. -->

```
Configuration    Evaluated
//      c        Cycles in nodes of type struct evaluate
//    ↙︎   ↖      to the fixed point of unifying their
//  a  →  b      values ad infinitum.

a: b & { x: 1 }  // a: { x: 1, y: 2, z: 3 }
b: c & { y: 2 }  // b: { x: 1, y: 2, z: 3 }
c: a & { z: 3 }  // c: { x: 1, y: 2, z: 3 }

// resolve a             b & {x:1}
// substitute b          c & {y:2} & {x:1}
// substitute c          a & {z:3} & {y:2} & {x:1}
// eliminate a (cycle)   {z:3} & {y:2} & {x:1}
// simplify              {x:1,y:2,z:3}
```

This rule also applies to field values that are disjunctions of unification
operations of the above form.

```
a: b&{x:1} | {y:1}  // {x:1,y:3,z:2} | {y:1}
b: {x:2} | c&{z:2}  // {x:2} | {x:1,y:3,z:2}
c: a&{y:3} | {z:3}  // {x:1,y:3,z:2} | {z:3}


// resolving a             b&{x:1} | {y:1}
// substitute b            ({x:2} | c&{z:2})&{x:1} | {y:1}
// simplify                c&{z:2}&{x:1} | {y:1}
// substitute c            (a&{y:3} | {z:3})&{z:2}&{x:1} | {y:1}
// simplify                a&{y:3}&{z:2}&{x:1} | {y:1}
// eliminate a (cycle)     {y:3}&{z:2}&{x:1} | {y:1}
// expand                  {x:1,y:3,z:2} | {y:1}
```

Note that all nodes that together form a reference cycle of struct values
evaluate to the same value.
If a field value is a disjunction, any element that is part of a cycle will
evaluate to this value.


### Structural cycles

CUE disallows infinite structures.
Implementations must report an error when encountering such declarations.

<!-- for instance using an occurs check -->

```
// Disallowed: a list of infinite length with all elements being 1.
list: {
    head: 1
    tail: list
}

// Disallowed: another infinite structure (a:{b:{d:{b:{d:{...}}}}}, ...).
a: {
    b: c
}
c: {
    d: a
}
```

It is allowed for a value to define an infinite set of possibilities
without evaluating to an infinite structure itself.

```
// List defines a list of arbitrary length (default null).
List: *null | {
    head: _
    tail: List
}
```
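
A sketch of how such a definition might be used follows. The field name `two` is hypothetical, and the result in the comment assumes defaults are resolved as described elsewhere in this specification: unifying with a struct eliminates the `null` alternative at that level, while the innermost `tail` falls back to its default.

```
List: *null | {
    head: _
    tail: List
}

// A finite instance of the recursive definition.
two: List & { head: 1, tail: { head: 2 } }
// with defaults applied: { head: 1, tail: { head: 2, tail: null } }
```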
2759
2760<!--
Marcel van Lohuizen7f48df72019-02-01 17:24:59 +01002761Consider banning any construct that makes CUE not having a linear
2762running time expressed in the number of nodes in the output.
2763
2764This would require restricting constructs like:
2765
2766(fib&{n:2}).out
2767
2768fib: {
2769 n: int
2770
2771 out: (fib&{n:n-2}).out + (fib&{n:n-1}).out if n >= 2
2772 out: fib({n:n-2}).out + fib({n:n-1}).out if n >= 2
2773 out: n if n < 2
2774}
2775
2776-->
2777<!--
Marcel van Lohuizen6713ae22019-01-26 14:42:25 +01002778### Unused fields
2779
2780TODO: rules for detection of unused fields
2781
27821. Any alias value must be used
2783-->


## Modules, instances, and packages

CUE configurations are constructed by combining _instances_.
An instance, in turn, is constructed from one or more source files belonging
to the same _package_ that together declare the data representation.
Elements of this data representation may be exported and used
in other instances.

### Source file organization

Each source file consists of an optional package clause defining the collection
of files to which it belongs,
followed by a possibly empty set of import declarations that declare
packages whose contents it wishes to use, followed by a possibly empty set of
declarations.

Like with a struct, a source file may contain embeddings.
Unlike with a struct, the embedded expressions may be any value.
If the result of the unification of all embedded values is not a struct,
it will be output instead of its enclosing file when exporting CUE
to a data format.

```
SourceFile = [ PackageClause "," ] { ImportDecl "," } { Declaration "," } .
```

```
"Hello \(place)!"

place: "world"

// Outputs "Hello world!"
```

### Package clause

A package clause is an optional clause that defines the package to which
a source file belongs.

```
PackageClause  = "package" PackageName .
PackageName    = identifier .
```

The PackageName must not be the blank identifier.

```
package math
```

### Modules and instances

A _module_ defines a tree of directories, rooted at the _module root_.

All source files within a module that declare the same package name belong
to the same package.
<!-- jba: I can't make sense of the above sentence. -->
A module may define multiple packages.

An _instance_ of a package is any subset of files belonging
to the same package.
<!-- jba: Are you saying that -->
<!-- if I have a package with files a, b and c, then there are 8 instances of -->
<!-- that package, some of which are {a, b}, {c}, {b, c}, and so on? What's the -->
<!-- purpose of that definition? -->
It is interpreted as the concatenation of these files.

An implementation may impose conventions on the layout of package files
to determine which files of a package belong to an instance.
For example, an instance may be defined as the subset of package files
belonging to a directory and all its ancestors.
<!-- jba: OK, that helps a little, but I still don't see what the purpose is. -->


Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002859### Import declarations
2860
2861An import declaration states that the source file containing the declaration
2862depends on definitions of the _imported_ package (§Program initialization and
2863execution) and enables access to exported identifiers of that package.
2864The import names an identifier (PackageName) to be used for access and an
2865ImportPath that specifies the package to be imported.
2866
2867```
Marcel van Lohuizen40178752019-08-25 19:17:56 +02002868ImportDecl = "import" ( ImportSpec | "(" { ImportSpec "," } ")" ) .
Marcel van Lohuizenfbab65d2019-08-13 16:51:15 +02002869ImportSpec = [ PackageName ] ImportPath .
Marcel van Lohuizen7414fae2019-08-13 17:26:35 +02002870ImportLocation = { unicode_value } .
2871ImportPath = `"` ImportLocation [ ":" identifier ] `"` .
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002872```
2873
The PackageName is used in qualified identifiers to access
exported identifiers of the package within the importing source file.
It is declared in the file block.
It defaults to the identifier specified in the package clause of the imported
package, which must match either the last path component of ImportLocation
or the identifier following it.

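The naming rule above can be sketched in Go as follows. This is an illustration under stated assumptions, not the behavior of any particular implementation: the helper names `splitImportPath`, `expectedPackage`, and `localName` are hypothetical, the import path is assumed to be already unquoted, and the default name is derived from the path alone, relying on the requirement that the package clause matches either the last path component or the identifier after the colon. A real implementation would read the package clause of the imported package and check it against this value.

```go
package main

import (
	"fmt"
	"strings"
)

// splitImportPath splits an unquoted ImportPath of the form
// location[:identifier] into its two parts. The identifier is empty
// when the path carries no ":" suffix. (Hypothetical helper.)
func splitImportPath(path string) (location, ident string) {
	if i := strings.LastIndex(path, ":"); i >= 0 {
		return path[:i], path[i+1:]
	}
	return path, ""
}

// expectedPackage returns the name that the package clause of the
// imported package must match: the identifier after ":" if present,
// otherwise the last path component of the location.
func expectedPackage(path string) string {
	location, ident := splitImportPath(path)
	if ident != "" {
		return ident
	}
	return location[strings.LastIndex(location, "/")+1:]
}

// localName returns the identifier declared in the file block:
// the explicit PackageName if one is given, otherwise the default.
func localName(packageName, path string) string {
	if packageName != "" {
		return packageName
	}
	return expectedPackage(path)
}

func main() {
	fmt.Println(localName("", "lib/math"))      // math
	fmt.Println(localName("", "lib/math:math")) // math
	fmt.Println(localName("m", "lib/math"))     // m
}
```

Keeping the explicit PackageName as a separate argument mirrors the grammar, where it is an optional prefix of ImportSpec rather than part of the ImportPath.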
<!--
Note: this deviates from the Go spec where there is no such restriction.
This restriction has the benefit of being able to determine the identifiers
for packages from within the file itself. But for CUE it has another benefit:
when using package hierarchies, one is more likely to want to include multiple
packages within the same directory structure. This mechanism allows
disambiguation in these cases.
-->

The interpretation of the ImportPath is implementation-dependent but it is
typically either the path of a builtin package or a fully qualifying location
of a package within a source code repository.

An ImportLocation must be a non-empty string using only characters belonging
to Unicode's L, M, N, P, and S general categories
(the Graphic characters without spaces)
and may not include the characters !"#$%&'()*,:;<=>?[\]^`{|}
or the Unicode replacement character U+FFFD.

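The character rules for an ImportLocation translate fairly directly into a check over the runes of a string. The following Go sketch is one possible reading of those rules, using the standard `unicode` range tables for the L, M, N, P, and S categories. The function name `validImportLocation` and the constant `excluded` are hypothetical and are not taken from any implementation. Note that U+FFFD falls into category S, so it needs its own test, and that ranging over a string with invalid UTF-8 yields U+FFFD, so such input is rejected as well.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// excluded lists the characters that may not appear in an ImportLocation
// even though they belong to the P or S categories.
const excluded = "!\"#$%&'()*,:;<=>?[\\]^`{|}"

// validImportLocation reports whether s is non-empty and consists only of
// permitted characters. (Hypothetical helper, for illustration only.)
func validImportLocation(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		// The replacement character is in category S, so reject it first.
		if r == unicode.ReplacementChar {
			return false
		}
		// Only letters, marks, numbers, punctuation, and symbols are allowed.
		if !unicode.In(r, unicode.L, unicode.M, unicode.N, unicode.P, unicode.S) {
			return false
		}
		if strings.ContainsRune(excluded, r) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(validImportLocation("lib/math"))      // true
	fmt.Println(validImportLocation("lib math"))      // false: contains a space
	fmt.Println(validImportLocation("lib/math:math")) // false: ":" is excluded
	fmt.Println(validImportLocation(""))              // false: empty
}
```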
Assume we have a package containing the package clause "package math",
which exports function Sin at the path identified by "lib/math".
This table illustrates how Sin is accessed in files
that import the package after the various types of import declaration.

```
Import declaration          Local name of Sin

import   "lib/math"         math.Sin
import   "lib/math:math"    math.Sin
import m "lib/math"         m.Sin
```

An import declaration declares a dependency relation between the importing and
imported package. It is illegal for a package to import itself, directly or
indirectly, or to directly import a package without referring to any of its
exported identifiers.

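The prohibition on a package importing itself, directly or indirectly, means the import graph must be acyclic. One common way to check this is a depth-first search that tracks which packages are currently being visited. The Go sketch below assumes the import graph is already available as a map from package path to directly imported paths. The function name `findImportCycle` and this representation are hypothetical, and the sketch does not cover the separate rule about unused imports.

```go
package main

import "fmt"

// findImportCycle searches the import graph depth-first and returns one
// cycle as a list of package paths (first and last element equal), or nil
// if the graph is acyclic. imports maps each package path to the paths it
// imports directly. (Hypothetical helper, for illustration only.)
func findImportCycle(imports map[string][]string) []string {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := map[string]int{}
	var stack []string

	var visit func(pkg string) []string
	visit = func(pkg string) []string {
		switch state[pkg] {
		case done:
			return nil
		case inProgress:
			// pkg is already on the stack: the cycle runs from its
			// position on the stack back to pkg.
			for i, p := range stack {
				if p == pkg {
					return append(append([]string{}, stack[i:]...), pkg)
				}
			}
		}
		state[pkg] = inProgress
		stack = append(stack, pkg)
		for _, dep := range imports[pkg] {
			if cycle := visit(dep); cycle != nil {
				return cycle
			}
		}
		stack = stack[:len(stack)-1]
		state[pkg] = done
		return nil
	}

	for pkg := range imports {
		if cycle := visit(pkg); cycle != nil {
			return cycle
		}
	}
	return nil
}

func main() {
	// A package importing itself directly.
	fmt.Println(findImportCycle(map[string][]string{"a": {"a"}})) // [a a]

	// An acyclic graph yields no cycle.
	acyclic := map[string][]string{"a": {"b", "c"}, "b": {"c"}}
	fmt.Println(findImportCycle(acyclic) == nil) // true
}
```

Because Go map iteration order is unspecified, the package at which a reported cycle starts may vary between runs when the cycle has more than one member; only the existence of the cycle is stable.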

### An example package

TODO