<!--
 Copyright 2018 The CUE Authors

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

# The CUE Language Specification

## Introduction

This is a reference manual for the CUE data constraint language.
CUE, pronounced cue or Q, is a general-purpose and strongly typed
constraint-based language.
It can be used for data templating, data validation, code generation, scripting,
and many other applications involving structured data.
The CUE tooling, layered on top of CUE, provides
a general-purpose scripting language for creating scripts as well as
simple servers, also expressed in CUE.

CUE was designed with cloud configuration, and related systems, in mind,
but is not limited to this domain.
It derives its formalism from relational programming languages.
This formalism allows for managing and reasoning over large amounts of
data in a straightforward manner.

The grammar is compact and regular, allowing for easy analysis by automatic
tools such as integrated development environments.

This document is maintained by mpvl@golang.org.
CUE has a lot of similarities with the Go language. This document draws heavily
from the Go specification as a result.

CUE draws its influence from many languages.
Its main influences were BCL/GCL (internal to Google),
LKB (LinGO), Go, and JSON.
Others are Swift, TypeScript, JavaScript, Prolog, NCL (internal to Google),
Jsonnet, HCL, Flabbergast, Nix, JSONPath, Haskell, Objective-C, and Python.


## Notation

The syntax is specified using Extended Backus-Naur Form (EBNF):

```
Production  = production_name "=" [ Expression ] "." .
Expression  = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term        = production_name | token [ "…" token ] | Group | Option | Repetition .
Group       = "(" Expression ")" .
Option      = "[" Expression "]" .
Repetition  = "{" Expression "}" .
```

Productions are expressions constructed from terms and the following operators,
in increasing precedence:

```
|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)
```

Lower-case production names are used to identify lexical tokens. Non-terminals
are in CamelCase. Lexical tokens are enclosed in double quotes "" or back quotes
``.

The form a … b represents the set of characters from a through b as
alternatives. The horizontal ellipsis … is also used elsewhere in the spec to
informally denote various enumerations or code snippets that are not further
specified. The character … (as opposed to the three characters ...) is not a
token of the CUE language.


## Source code representation

Source code is Unicode text encoded in UTF-8.
Unless otherwise noted, the text is not canonicalized, so a single
accented code point is distinct from the same character constructed from
combining an accent and a letter; those are treated as two code points.
For simplicity, this document will use the unqualified term character to refer
to a Unicode code point in the source text.

Each code point is distinct; for instance, upper and lower case letters are
different characters.

Implementation restriction: For compatibility with other tools, a compiler may
disallow the NUL character (U+0000) in the source text.

Implementation restriction: For compatibility with other tools, a compiler may
ignore a UTF-8-encoded byte order mark (U+FEFF) if it is the first Unicode code
point in the source text. A byte order mark may be disallowed anywhere else in
the source.


### Characters

The following terms are used to denote specific Unicode character classes:

```
newline        = /* the Unicode code point U+000A */ .
unicode_char   = /* an arbitrary Unicode code point except newline */ .
unicode_letter = /* a Unicode code point classified as "Letter" */ .
unicode_digit  = /* a Unicode code point classified as "Number, decimal digit" */ .
```

In The Unicode Standard 8.0, Section 4.5 "General Category" defines a set of
character categories.
CUE treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo
as Unicode letters, and those in the Number category Nd as Unicode digits.


### Letters and digits

The underscore character _ (U+005F) is considered a letter.

```
letter        = unicode_letter | "_" .
decimal_digit = "0" … "9" .
octal_digit   = "0" … "7" .
hex_digit     = "0" … "9" | "A" … "F" | "a" … "f" .
```


## Lexical elements

### Comments

Comments serve as program documentation.
CUE supports line comments that start with the character sequence //
and stop at the end of the line.

A comment cannot start inside a string literal or inside a comment.
A comment acts like a newline.
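
As a small illustration of these rules, the following sketch shows a comment on
its own line, a comment after other tokens, and a `//` sequence that is not a
comment because it sits inside a string literal. The field names `a` and `b`
are made up for the example.

```cue
// A line comment runs to the end of the line.
a: 1 // a comment may follow other tokens on the same line
b: "// inside a string literal this is text, not a comment"
```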


### Tokens

Tokens form the vocabulary of the CUE language. There are four classes:
identifiers, keywords, operators and punctuation, and literals. White space,
formed from spaces (U+0020), horizontal tabs (U+0009), carriage returns
(U+000D), and newlines (U+000A), is ignored except as it separates tokens that
would otherwise combine into a single token. Also, a newline or end of file may
trigger the insertion of a comma. While breaking the input into tokens, the
next token is the longest sequence of characters that form a valid token.


### Commas

The formal grammar uses commas "," as terminators in a number of productions.
CUE programs may omit most of these commas using the following rule:

When the input is broken into tokens, a comma is automatically inserted into
the token stream immediately after a line's final token if that token is

- an identifier
- null, true, false, bottom, or an integer, floating-point, or string literal
- one of the characters ), ], or }


Although commas are automatically inserted, the parser will require
explicit commas between two list elements.

To reflect idiomatic use, examples in this document elide commas using
these rules.
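
The effect of comma insertion can be sketched as follows. The field names
`explicit`, `elided`, and `list` are hypothetical and chosen only for this
illustration.

```cue
// Commas written out between fields on one line:
explicit: {x: 1, y: 2}

// The same struct with the commas elided: each of these lines ends in an
// integer literal, so a comma is inserted after it automatically.
elided: {
    x: 1
    y: 2
}

// Elements of a list are still separated by explicit commas.
list: [1, 2, 3]
```

Under the rule above, `explicit` and `elided` describe the same struct; only
the spelling of the separators differs.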


### Identifiers

Identifiers name entities such as fields and aliases.
An identifier may be simple or quoted.
A simple identifier is a sequence of one or more letters (which includes `_`) and digits.
It may not be `_`.
The first character in an identifier must be a letter.
Any sequence of letters, digits or `-` enclosed in
backticks "`" makes an identifier.
The backticks are not part of the identifier.
This allows one to refer to fields that are labeled
with keywords or other identifiers that would
otherwise not be legal.

<!--
TODO: allow identifiers as defined in Unicode UAX #31
(https://unicode.org/reports/tr31/).

Identifiers are normalized using the NFC normal form.
-->

```
identifier        = simple_identifier | quoted_identifier .
simple_identifier = letter { letter | unicode_digit } .
quoted_identifier = "`" { letter | unicode_digit | "-" } "`" .
```
<!-- TODO: relax to allow other punctuation -->

```
a
_x9
fieldName
αβ
```

<!-- TODO: Allow Unicode identifiers TR 31 http://unicode.org/reports/tr31/ -->

Some identifiers are [predeclared](#predeclared-identifiers).


### Keywords

CUE has a limited set of keywords.
In addition, CUE reserves all identifiers starting with `__` (double underscores)
as keywords.
These are typically targets of pre-declared identifiers.

All keywords may be used as labels (field names).
They cannot, however, be used as identifiers to refer to the same name.


#### Values

The following keywords are values.

```
null      true      false
```

These can never be used to refer to a field of the same name.
This restriction is to ensure compatibility with JSON configuration files.


#### Preamble

The following keywords are used in the preamble of a CUE file.
After the preamble, they may be used as identifiers to refer to namesake fields.

```
package   import
```


#### Comprehension clauses

The following keywords are used in comprehensions.

```
for       in        if        let
```

The keywords `for`, `if` and `let` cannot be used as identifiers to
refer to fields.

<!--
TODO:
    reduce [to]
    order [by]
-->


#### Arithmetic

The following pseudo keywords can be used as operators in expressions.

```
div       mod       quo       rem
```

These may be used as identifiers to refer to fields in all other contexts.


### Operators and punctuation

The following character sequences represent operators and punctuation:

```
+     div   &&    ==    <     =     (     )
-     mod   ||    !=    >     ::    {     }
*     quo   &     =~    <=    :     [     ]
/     rem   |     !~    >=    .     ...   ,
            _|_   !
```
<!--
Free tokens: # ; ~ $ ^

// To be used:
  @   at: associative lists.

// Idea: use # instead of @ for attributes and allow them at declaration level.
// This will open up the possibility of defining #! at the start of a file
// without requiring special syntax. Although probably not quite.
 -->


### Integer literals

An integer literal is a sequence of digits representing an integer value.
An optional prefix sets a non-decimal base: 0o for octal,
0x or 0X for hexadecimal, and 0b for binary.
In hexadecimal literals, letters a-f and A-F represent values 10 through 15.
All integers allow interstitial underscores "_";
these have no meaning and are solely for readability.

Decimal integers may have an SI or IEC multiplier.
Multipliers can be used with fractional numbers.
When multiplying a fraction by a multiplier, the result is truncated
towards zero if it is not an integer.

```
int_lit     = decimal_lit | si_lit | octal_lit | binary_lit | hex_lit .
decimal_lit = ( "1" … "9" ) { [ "_" ] decimal_digit } .
decimals    = decimal_digit { [ "_" ] decimal_digit } .
si_lit      = decimals [ "." decimals ] multiplier |
              "." decimals multiplier .
binary_lit  = "0b" binary_digit { binary_digit } .
hex_lit     = "0" ( "x" | "X" ) hex_digit { [ "_" ] hex_digit } .
octal_lit   = "0o" octal_digit { [ "_" ] octal_digit } .
multiplier  = ( "K" | "M" | "G" | "T" | "P" ) [ "i" ] .

float_lit   = decimals "." [ decimals ] [ exponent ] |
              decimals exponent |
              "." decimals [ exponent ] .
exponent    = ( "e" | "E" ) [ "+" | "-" ] decimals .
```
<!--
TODO: consider allowing Exo (and up), if not followed by a sign
or number. Alternatively one could only allow Ei, Yi, and Zi.
-->

```
42
1.5Gi
170_141_183_460_469_231_731_687_303_715_884_105_727
0xBad_Face
0o755
0b0101_0001
```
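
To make the bases and multipliers concrete, the sketch below pairs a few
literals with the integer each one denotes. The field names are illustrative
only, and the reading of `K` as a power of 10 and `Ki` as a power of 2 follows
the usual SI and IEC conventions.

```cue
dec:  1_000_000 // 1000000; the underscores are only for readability
hex:  0xFF      // 255
oct:  0o755     // 493
bin:  0b101     // 5
kilo: 1K        // 1000 (SI multiplier, power of 10)
kibi: 1Ki       // 1024 (IEC multiplier, power of 2)
gibi: 1.5Gi     // 1610612736, that is 1.5 * 2^30
```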

### Decimal floating-point literals

A decimal floating-point literal is a representation of
a decimal floating-point value (a _float_).
It has an integer part, a decimal point, a fractional part, and an
exponent part.
The integer and fractional parts comprise decimal digits; the
exponent part is an `e` or `E` followed by an optionally signed decimal exponent.
One of the integer part or the fractional part may be elided; one of the decimal
point or the exponent may be elided.

```
float_lit = decimals "." [ decimals ] [ exponent ] |
            decimals exponent |
            "." decimals [ exponent ] .
exponent  = ( "e" | "E" ) [ "+" | "-" ] decimals .
```

```
0.
72.40
072.40  // == 72.40
2.71828
1.e+0
6.67428e-11
1E6
.25
.12345E+5
```


### String and byte sequence literals

A string literal represents a string constant obtained from concatenating a
sequence of characters.
A byte sequence literal represents a sequence of bytes.

String and byte sequence literals are character sequences between,
respectively, double and single quotes, as in `"bar"` and `'bar'`.
Within the quotes, any character may appear except newline and,
respectively, unescaped double or single quote.
String literals may only be valid UTF-8.
Byte sequences may contain any sequence of bytes.

Several escape sequences allow arbitrary values to be encoded as ASCII text.
An escape sequence starts with an _escape delimiter_, which is `\` by default.
The escape delimiter may be altered to be `\` plus a fixed number of
hash symbols `#`
by padding the start and end of a string or byte sequence literal
with this number of hash symbols.

There are four ways to represent the integer value as a numeric constant: `\x`
followed by exactly two hexadecimal digits; `\u` followed by exactly four
hexadecimal digits; `\U` followed by exactly eight hexadecimal digits, and a
plain backslash `\` followed by exactly three octal digits.
In each case the value of the literal is the value represented by the
digits in the corresponding base.
Hexadecimal and octal escapes are only allowed within byte sequences
(single quotes).

Although these representations all result in an integer, they have different
valid ranges.
Octal escapes must represent a value between 0 and 255 inclusive.
Hexadecimal escapes satisfy this condition by construction.
The escapes `\u` and `\U` represent Unicode code points so within them
some values are illegal, in particular those above `0x10FFFF`.
Surrogate halves are allowed,
but are translated into their non-surrogate equivalent internally.

The three-digit octal (`\nnn`) and two-digit hexadecimal (`\xnn`) escapes
represent individual bytes of the resulting string; all other escapes represent
the (possibly multi-byte) UTF-8 encoding of individual characters.
Thus inside a byte sequence literal `\377` and `\xFF` represent a single byte of
value `0xFF=255`, while `ÿ`, `\u00FF`, `\U000000FF` and `\xc3\xbf` represent
the two bytes `0xc3 0xbf` of the UTF-8
encoding of character `U+00FF`.

```
\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000B vertical tab
\/   U+002F slash (solidus)
\\   U+005C backslash
\'   U+0027 single quote  (valid escape only within single quoted literals)
\"   U+0022 double quote  (valid escape only within double quoted literals)
```

The escape `\(` is used as an escape for string interpolation.
A `\(` must be followed by a valid CUE Expression, followed by a `)`.
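
A minimal sketch of interpolation, using made-up field names: the expression
between `\(` and `)` is evaluated and its value is spliced into the string.
The last field uses one hash on each side of the literal, which changes the
escape delimiter to `\#`, so a plain `\(` is no longer an interpolation.

```cue
name:     "world"
greeting: "Hello, \(name)!"     // "Hello, world!"
count:    3
summary:  "\(count + 1) items"  // "4 items"
raw:      #"Hello, \(name)!"#   // the text \(name) is kept as is
```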

All other sequences starting with a backslash are illegal inside literals.

```
escaped_char     = `\` { `#` } ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | `\` | "'" | `"` ) .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = `\` octal_digit octal_digit octal_digit .
hex_byte_value   = `\` "x" hex_digit hex_digit .
little_u_value   = `\` "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = `\` "U" hex_digit hex_digit hex_digit hex_digit
                           hex_digit hex_digit hex_digit hex_digit .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
interpolation    = "\(" Expression ")" .

string_lit       = simple_string_lit |
                   multiline_string_lit |
                   simple_bytes_lit |
                   multiline_bytes_lit |
                   `#` string_lit `#` .

simple_string_lit    = `"` { unicode_value | interpolation } `"` .
simple_bytes_lit     = "'" { unicode_value | interpolation | byte_value } "'" .
multiline_string_lit = `"""` newline
                             { unicode_value | interpolation | newline }
                             newline `"""` .
multiline_bytes_lit  = "'''" newline
                             { unicode_value | interpolation | byte_value | newline }
                             newline "'''" .
```

Carriage return characters (`\r`) inside string literals are discarded from
the string value.

```
'a\000\xab'
'\007'
'\377'
'\xa'        // illegal: too few hexadecimal digits
"\n"
"\""
'Hello, world!\n'
"Hello, \( name )!"
"日本語"
"\u65e5本\U00008a9e"
"\xff\u00FF"
"\uD800"             // illegal: surrogate half (TODO: probably should allow)
"\U00110000"         // illegal: invalid Unicode code point

#"This is not an \(interpolation)"#
#"This is an \#(interpolation)"#
#"The sequence "\U0001F604" renders as \#U0001F604."#
```

These examples all represent the same string:

```
"日本語"                                 // UTF-8 input text
'日本語'                                 // UTF-8 input text as byte sequence
`日本語`                                 // UTF-8 input text as a raw literal
"\u65e5\u672c\u8a9e"                    // the explicit Unicode code points
"\U000065e5\U0000672c\U00008a9e"        // the explicit Unicode code points
"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e"  // the explicit UTF-8 bytes
```

If the source code represents a character as two code points, such as a
combining form involving an accent and a letter, the result will appear as two
code points if placed in a string literal.

Strings and byte sequences have a multiline equivalent.
Multiline strings are like their single-line equivalent,
but allow newline characters.

Multiline strings and byte sequences respectively start with
a triple double quote (`"""`) or triple single quote (`'''`),
immediately followed by a newline, which is discarded from the string contents.
The string is closed by a matching triple quote, which must be on a line
by itself, preceded by optional whitespace.
The whitespace before a closing triple quote must appear before any non-empty
line after the opening quote and will be removed from each of these
lines in the string literal.
A closing triple quote may not appear in the string.
To include it, it suffices to escape one of the quotes.

```
"""
    lily:
    out of the water
    out of itself

    bass
    picking bugs
    off the moon
        — Nick Virgilio, Selected Haiku, 1988
    """
```

This represents the same string as:

```
"lily:\nout of the water\nout of itself\n\n" +
"bass\npicking bugs\noff the moon\n" +
"    — Nick Virgilio, Selected Haiku, 1988"
```
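
The whitespace rule can be seen in isolation in the following sketch, where
the field name `s` is hypothetical. The closing triple quote is indented by
four spaces, so four spaces are removed from every non-empty line; any
indentation beyond that is part of the string.

```cue
s: """
    one
      two

    three
    """
// s is the same string as "one\n  two\n\nthree"
```

Neither the newline after the opening quote nor the one before the closing
quote is part of the value.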

<!-- TODO: other values

Support for other values:
- Duration literals
- regular expressions: `re("[a-z]")`
-->


## Values

In addition to simple values like `"hello"` and `42.0`, CUE has _structs_.
A struct is a map from labels to values, like `{a: 42.0, b: "hello"}`.
Structs are CUE's only way of building up complex values;
lists, which we will see later,
are defined in terms of structs.

All possible values are ordered in a lattice,
a partial order where every two elements have a single greatest lower bound.
A value `a` is an _instance_ of a value `b`,
denoted `a ⊑ b`, if `b == a` or `b` is more general than `a`,
that is if `a` orders before `b` in the partial order
(`⊑` is _not_ a CUE operator).
We also say that `b` _subsumes_ `a` in this case.
In graphical terms, `b` is "above" `a` in the lattice.

At the top of the lattice is the single ancestor of all values, called
_top_, denoted `_` in CUE.
Every value is an instance of top.

At the bottom of the lattice is the value called _bottom_, denoted `_|_`.
A bottom value usually indicates an error.
Bottom is an instance of every value.

An _atom_ is any value whose only instances are itself and bottom.
Examples of atoms are `42.0`, `"hello"`, `true`, `null`.

A value is _concrete_ if it is either an atom, or a struct all of whose
field values are themselves concrete, recursively.

CUE's values also include what we normally think of as types, like `string` and
`float`.
But CUE does not distinguish between types and values; only the
relationship of values in the lattice is important.
Each CUE "type" subsumes the concrete values that one would normally think
of as part of that type.
For example, `"hello"` is an instance of `string`, and `42.0` is an instance of
`float`.
In addition to `string` and `float`, CUE has `null`, `int`, `bool` and `bytes`.
We informally call these CUE's "basic types".


```
false ⊑ bool
true  ⊑ bool
true  ⊑ true
5.0   ⊑ float
bool  ⊑ _
_|_   ⊑ _
_|_   ⊑ _|_

_     ⋢ _|_
_     ⋢ bool
int   ⋢ bool
bool  ⋢ int
false ⋢ true
true  ⋢ false
float ⋢ 5.0
5     ⋢ 6
```
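
The notions of atom, concrete value, and basic type introduced above can be
illustrated with a few fields. This is only a sketch; the field names are made
up, and the comments restate the definitions rather than the output of any
tool.

```cue
a: 42.0                  // an atom, and therefore concrete
b: {x: 1, y: "hello"}    // concrete: every field value is concrete
c: {x: int, y: "hello"}  // not concrete: int has many instances
d: string                // a basic type; "hello" ⊑ string
```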


### Unification

The _unification_ of values `a` and `b`
is defined as the greatest lower bound of `a` and `b`. (That is, the
value `u` such that `u ⊑ a` and `u ⊑ b`,
and for any other value `v` for which `v ⊑ a` and `v ⊑ b`
it holds that `v ⊑ u`.)
Since CUE values form a lattice, the unification of two CUE values is
always unique.

These all follow from the definition of unification:
- The unification of `a` with itself is always `a`.
- The unification of values `a` and `b` where `a ⊑ b` is always `a`.
- The unification of a value with bottom is always bottom.

Unification in CUE is a [binary expression](#Operands), written `a & b`.
It is commutative and associative.
As a consequence, order of evaluation is irrelevant, a property that is key
to many of the constructs in the CUE language as well as the tooling layered
on top of it.
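
The properties listed above can be read off a handful of small unifications.
The sketch below uses hypothetical field names; note that fields `e` and `f`
evaluate to bottom, so a tool would report them as errors.

```cue
a: int & 5       // 5, since 5 ⊑ int
b: 5 & int       // 5, unification is commutative
c: 5 & 5         // 5, a value unified with itself
d: _ & "foo"     // "foo", since every value is an instance of top
e: 5 & 6         // _|_, the only common instance of 5 and 6 is bottom
f: int & string  // _|_
```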



<!-- TODO: explicitly mention that disjunction is not a binary operation
but a definition of a single value?-->


### Disjunction

The _disjunction_ of values `a` and `b`
is defined as the least upper bound of `a` and `b`.
(That is, the value `d` such that `a ⊑ d` and `b ⊑ d`,
and for any other value `e` for which `a ⊑ e` and `b ⊑ e`,
it holds that `d ⊑ e`.)
This style of disjunctions is sometimes also referred to as sum types.
Since CUE values form a lattice, the disjunction of two CUE values is always unique.


These all follow from the definition of disjunction:
- The disjunction of `a` with itself is always `a`.
- The disjunction of values `a` and `b` where `a ⊑ b` is always `b`.
- The disjunction of a value `a` with bottom is always `a`.
- The disjunction of two bottom values is bottom.

Disjunction in CUE is a [binary expression](#Operands), written `a | b`.
It is commutative, associative, and idempotent.
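
The rules above can be sketched with a few small disjunctions. The field names
are hypothetical, and the comments give the value that follows from the
definition; an implementation may well display a disjunction in a less reduced
form.

```cue
a: "tcp" | "tcp"  // "tcp": a value disjoined with itself
b: 5 | int        // int, since 5 ⊑ int
c: "tcp" | _|_    // "tcp": bottom contributes nothing
d: "tcp" | "udp"  // neither is an instance of the other, so both remain
```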

The unification of a disjunction with another value is equal to the disjunction
composed of the unification of this value with all of the original elements
of the disjunction.
In other words, unification distributes over disjunction.

```
(a_0 | ... | a_n) & b ==> a_0&b | ... | a_n&b.
```

```
Expression                Result
({a:1} | {b:2}) & {c:3}   {a:1, c:3} | {b:2, c:3}
(int | string) & "foo"    "foo"
("a" | "b") & "c"         _|_
```

A disjunction is _normalized_ if there is no element
`a` for which there is another element `b` such that `a ⊑ b`.

<!--
Normalization is important, as we need to account for spurious elements.
For instance "tcp" | "tcp" should resolve to "tcp".

Also consider

    ({a:1} | {b:1}) & ({a:1} | {b:2}) -> {a:1} | {a:1,b:1} | {a:1,b:2},

in this case, elements {a:1,b:1} and {a:1,b:2} are subsumed by {a:1} and thus
this expression is logically equivalent to {a:1} and should therefore be
considered to be unambiguous and resolve to {a:1} if a concrete value is needed.

For instance, in

    x: ({a:1} | {b:1}) & ({a:1} | {b:2}) // -> {a:1} | {a:1,b:1} | {a:1,b:2}
    y: x.a // 1

y should resolve to 1, and not an error.

For comparison, in

    x: ({a:1, b:1} | {b:2}) & {a:1} // -> {a:1,b:1} | {a:1,b:2}
    y: x.a // _|_

y should be an error as x is still ambiguous before the selector is applied,
even though `a` resolves to 1 in all cases.
-->


#### Default values

Any element of a disjunction can be marked as a default
by prefixing it with an asterisk `*`.
Intuitively, when an expression needs to be resolved for an operation other
than unification or disjunction,
non-starred elements are dropped in favor of starred ones if the starred ones
do not resolve to bottom.

More precisely, any value `v` may be associated with a default value `d`,
denoted `(v, d)` (not CUE syntax),
where `d` must be an instance of `v` (`d ⊑ v`).
The rules for unifying and disjoining such values are as follows:

```
U1: (v1, d1) & v2       => (v1&v2, d1&v2)
U2: (v1, d1) & (v2, d2) => (v1&v2, d1&d2)

D1: (v1, d1) | v2       => (v1|v2, d1)
D2: (v1, d1) | (v2, d2) => (v1|v2, d1|d2)
```

Default values may be introduced within disjunctions
by _marking_ terms of a disjunction with an asterisk `*`
([a unary expression](#Operators)).
The default value of a disjunction with marked terms is the disjunction
of those marked terms, applying the following rules for marks:

```
M1: *v        => (v, v)
M2: *(v1, d1) => (v1, d1)
```

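To make the interplay of the mark and disjunction rules concrete, here is a step-by-step sketch for two small expressions. It uses the `(v, d)` notation introduced above, which is not CUE syntax; the expressions themselves are chosen for illustration.

```
*"tcp" | "udp"
    M1: *"tcp"                   => ("tcp", "tcp")
    D1: ("tcp", "tcp") | "udp"   => ("tcp"|"udp", "tcp")

*1 | *2
    M1: *1, *2                   => (1, 1), (2, 2)
    D2: (1, 1) | (2, 2)          => (1|2, 1|2)
```

In the first case the default is the single marked term. In the second, both terms are marked, so the default is itself a disjunction and the expression remains ambiguous.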
In general, any operation `f` in CUE involving default values proceeds along the
following lines:

```
O1: f((v1, d1), ..., (vn, dn)) => (f(v1, ..., vn), f(d1, ..., dn))
```

where, with the exception of disjunction, a value `v` without a default
value is promoted to `(v, v)`.

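As a sketch of how rule O1 combines with promotion, consider adding a plain value to a disjunction with a default. The expression is an assumption chosen for illustration, and `(v, d)` is again the notation of this section rather than CUE syntax.

```
(*1|2) + 3
    M1, D1:     *1 | 2              => (1|2, 1)
    promotion:  3                   => (3, 3)
    O1:         (1|2, 1) + (3, 3)   => ((1|2)+3, 1+3)
                                    => ((1|2)+3, 4)
```

The operation is applied separately to the values and to the defaults, so the default of the sum is `4`.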
```
Expression               Value-default pair      Rules applied
*"tcp" | "udp"           ("tcp"|"udp", "tcp")    M1, D1
string | *"foo"          (string, "foo")         M1, D1

*1 | 2 | 3               (1|2|3, 1)              M1, D1

(*1|2|3) | (1|*2|3)      (1|2|3, 1|2)            M1, D1, D2
(*1|2|3) | *(1|*2|3)     (1|2|3, 1|2)            M1, D1, M2, D2
(*1|2|3) | (1|*2|3)&2    (1|2|3, 1|2)            M1, D1, U1, D2

(*1|2) & (1|*2)          (1|2, _|_)              M1, D1, U2

(*1|2) + (1|*2)          ((1|2)+(1|2), 3)        M1, D1, O1
```

The rules of subsumption for defaults can be derived from the above definitions
and are as follows.

```
(v2, d2) ⊑ (v1, d1)  if v2 ⊑ v1 and d2 ⊑ d1
(v1, d1) ⊑ v         if v1 ⊑ v
v ⊑ (v1, d1)         if v ⊑ d1
```

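The three subsumption rules can be illustrated with a protocol-style disjunction. This is a sketch only: the operands are chosen for the example, and each line names the rule (in the order listed above) that justifies it.

```
*"tcp" | "udp"  ⊑  string | *"tcp"    // rule 1: "tcp"|"udp" ⊑ string and "tcp" ⊑ "tcp"
*"tcp" | "udp"  ⊑  string             // rule 2: "tcp"|"udp" ⊑ string
"tcp"           ⊑  string | *"tcp"    // rule 3: "tcp" ⊑ "tcp"
"udp"           ⋢  string | *"tcp"    // rule 3 does not apply: "udp" ⋢ "tcp"
```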
<!--
For the second rule, note that by definition d1 ⊑ v1, so d1 ⊑ v1 ⊑ v.

The last one is so restrictive as v could still be made more specific by
associating it with a default that is not subsumed by d1.

Proof:
  by definition for any d ⊑ v, it holds that (v, d) ⊑ v,
  where the most general value is (v, v).
  Given the subsumption rule for (v2, d2) ⊑ (v1, d1),
  from (v, v) ⊑ v ⊑ (v1, d1) it follows that v ⊑ d1
  exactly defines the boundary of this subsumption.
-->

<!--
(non-normalized entries could also be implicitly marked, allowing writing
int | 1, instead of int | *1, but that can be done in a backwards
compatible way later if really desirable, as long as we require that
disjunction literals be normalized).
-->

```
Expression                          Resolves to
"tcp" | "udp"                       "tcp" | "udp"
*"tcp" | "udp"                      "tcp"
float | *1                          1
*string | 1.0                       string

(*1|2|3) | (1|*2|3)                 1|2
(*1|2|3) & (1|*2|3)                 1|2|3 // default is _|_

(* >=5 | int) & (* <=5 | int)       5

(*"tcp"|"udp") & ("udp"|*"tcp")     "tcp"
(*"tcp"|"udp") & ("udp"|"tcp")      "tcp"
(*"tcp"|"udp") & "tcp"              "tcp"
(*"tcp"|"udp") & (*"udp"|"tcp")     "tcp" | "udp" // default is _|_

(*true | false) & bool              true
(*true | false) & (true | false)    true

{a: 1} | {b: 1}                     {a: 1} | {b: 1}
{a: 1} | *{b: 1}                    {b:1}
*{a: 1} | *{b: 1}                   {a: 1} | {b: 1}
({a: 1} | {b: 1}) & {a:1}           {a:1}  // after eliminating {a:1,b:1} by normalization
({a:1}|*{b:1}) & ({a:1}|*{b:1})     {b:1}  // after eliminating {a:1,b:1} by normalization
```

### Bottom and errors

Any evaluation error in CUE results in a bottom value, represented by
the token `_|_`.
Bottom is an instance of every other value.

Implementations may associate error strings with different instances of bottom;
logically they all remain the same value.

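A few small cases show how bottom arises and how it behaves under unification and disjunction. The table is an illustrative sketch; the operands are chosen for the example.

```
Expression    Result
1 & 2         _|_     // conflicting concrete values
"a" & int     _|_     // a string is not an instance of int
_|_ & 1       _|_     // bottom is an instance of every value
_|_ | 1       1       // disjunction with bottom leaves the other value
```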

### Top

Top is represented by the underscore character `_`, lexically an identifier.
Unifying any value `v` with top results in `v` itself.

```
Expr        Result
_ &  5      5
_ &  _      _
_ & _|_     _|_
_ | _|_     _
```

### Null

The _null value_ is represented with the keyword `null`.
It has only one parent, top, and one child, bottom.
It is unordered with respect to any other value.

```
null_lit = "null"
```

```
null & 8      _|_
null & _      null
null & _|_    _|_
```

### Boolean values

A _boolean type_ represents the set of Boolean truth values denoted by
the keywords `true` and `false`.
The predeclared boolean type is `bool`; it is a defined type and a separate
element in the lattice.

```
boolean_lit = "true" | "false"
```

```
bool & true            true
true & true            true
true & false           _|_
bool & (false|true)    false | true
bool & (true|false)    true | false
```

### Numeric values

The _integer type_ represents the set of all integral numbers.
The _decimal floating-point type_ represents the set of all decimal floating-point
numbers.
They are two distinct types.
Both are instances of a generic `number` type.

<!--
    number
   /      \
 int      float
-->

The predeclared number, integer, and decimal floating-point types are
`number`, `int` and `float`; they are defined types.
<!--
TODO: should we drop float? It is somewhat more precise and probably a good idea
to have it in the programmatic API, but it may be confusing to have to deal
with it in the language.
-->

A decimal floating-point literal always has type `float`;
it is not an instance of `int` even if it is an integral number.

Integer literals are always of type `int` and don't match type `float`.

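The relationship between literals and the predeclared numeric types can be summarized with a few unifications. This is a sketch derived from the statements above; the literals are chosen for illustration.

```
Expression      Result
1 & int         1
1 & number      1
1.0 & float     1.0
1.0 & number    1.0
1.0 & int       _|_     // a floating-point literal is never an instance of int
1 & float       _|_     // an integer literal does not match float
```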
Numeric literals are exact values of arbitrary precision.
If the operation permits it, numbers should be kept in arbitrary precision.

Implementation restriction: although numeric values have arbitrary precision
in the language, implementations may implement them using an internal
representation with limited precision.
That said, every implementation must:

- Represent integer values with at least 256 bits.
- Represent floating-point values with a mantissa of at least 256 bits and
a signed binary exponent of at least 16 bits.
- Give an error if unable to represent an integer value precisely.
- Give an error if unable to represent a floating-point value due to overflow.
- Round to the nearest representable value if unable to represent
a floating-point value due to limits on precision.

These requirements apply to the result of any expression except for builtin
functions, for which any unusual loss of precision must be explicitly documented.

### Strings

The _string type_ represents the set of UTF-8 strings,
not allowing surrogates.
The predeclared string type is `string`; it is a defined type.

The length of a string `s` (its size in bytes) can be discovered using
the built-in function `len`.

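Because the length is measured in bytes of the UTF-8 encoding rather than in characters, the two can differ for non-ASCII text. A small sketch, with strings chosen for illustration:

```
len("hello")    // 5
len("héllo")    // 6: "é" occupies two bytes in UTF-8
```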

### Bytes

The _bytes type_ represents the set of byte sequences.
A byte sequence value is a (possibly empty) sequence of bytes.
The number of bytes is called the length of the byte sequence
and is never negative.
The predeclared byte sequence type is `bytes`; it is a defined type.

### Bounds

A _bound_, syntactically a [unary expression](#Operands), defines
an infinite disjunction of concrete values that can be represented
as a single comparison.

For any [comparison operator](#Comparison-operators) `op` except `==`,
`op a` is the disjunction of every `x` such that `x op a`.

```
2 & >=2 & <=5           // 2, where 2 is either an int or float.
2.5 & >=1 & <=5         // 2.5
2 & >=1.0 & <3.0        // 2.0
2 & >1 & <3.0           // 2.0
2.5 & int & >1 & <5     // _|_
2.5 & float & >1 & <5   // 2.5
int & 2 & >1.0 & <3.0   // _|_
2.5 & >=(int & 1) & <5  // _|_
>=0 & <=7 & >=3 & <=10  // >=3 & <=7
!=null & 1              // 1
>=5 & <=5               // 5
```

### Structs

A _struct_ is a set of elements called _fields_, each of
which has a name, called a _label_, and a value.

We say a label is defined for a struct if the struct has a field with the
corresponding label.
The value for a label `f` of struct `a` is denoted `a.f`.
A struct `a` is an instance of `b`, or `a ⊑ b`, if for any label `f`
defined for `b`, label `f` is also defined for `a` and `a.f ⊑ b.f`.
Note that if `a` is an instance of `b` it may have fields with labels that
are not defined for `b`.

The (unique) struct with no fields, written `{}`, has every struct as an
instance. It can be considered the type of all structs.

```
{a: 1} ⊑ {}
{a: 1, b: 1} ⊑ {a: 1}
{a: 1} ⊑ {a: int}
{a: 1, b: 1.0} ⊑ {a: int, b: float}

{} ⋢ {a: 1}
{a: 2} ⋢ {a: 1}
{a: 1} ⋢ {b: 1}
```

A field may be required or optional.
The successful unification of structs `a` and `b` is a new struct `c` which
has all fields of both `a` and `b`, where
the value of a field `f` in `c` is `a.f & b.f` if `f` is in both `a` and `b`,
or just `a.f` or `b.f` if `f` is in just `a` or `b`, respectively.
If a field `f` is in both `a` and `b`, `c.f` is optional only if both
`a.f` and `b.f` are optional.
Any [references](#References) to `a` or `b`
in their respective field values need to be replaced with references to `c`.
The result of a unification is bottom (`_|_`) if any of its required
fields evaluates to bottom, recursively.
<!--NOTE: About bottom values for optional fields being okay.

The proposition ¬P is a close cousin of P → ⊥ and is often used
as an approximation to avoid the issues of using not.
Bottom (⊥) is also frequently used to mean undefined. This makes sense.
Consider `{a?: 2} & {a?: 3}`.
Both structs say `a` is optional; in other words, it may be omitted.
So we can still get a valid result by omitting `a`, even in
case of a conflict.

Granted, this definition may lead to confusing results, especially in
definitions, when tightening an optional field leads to unintentionally
discarding it.
It could be a role of vet checkers to identify such cases (and suggest users
to explicitly use `_|_` to discard a field, for instance).
-->

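The recursive nature of the bottom rule, and its restriction to required fields, can be sketched with nested and optional fields. The structs are chosen for illustration; results are shown without optional fields.

```
Expression                      Result (without optional fields)
{a: {b: 1}} & {a: {b: 2}}       _|_                  // required field a.b is 1 & 2
{a: {b: 1}} & {a: {c: 2}}       {a: {b: 1, c: 2}}
{a?: 2} & {a?: 3}               {}                   // the conflict is confined to an optional field
{a: 2} & {a?: 3}                _|_                  // a is required in the result
```

In the last row `a` is optional in only one operand, so it is required in the result and its conflicting value makes the whole struct bottom.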
Syntactically, a struct literal may contain multiple fields with
the same label, the result of which is a single field with the same properties
as the field that results from unifying two structs, each containing one of
these fields.

These examples illustrate required fields only. Examples with
optional fields follow below.

```
Expression                             Result (without optional fields)
{a: int, a: 1}                         {a: 1}
{a: int} & {a: 1}                      {a: 1}
{a: >=1 & <=7} & {a: >=5 & <=9}        {a: >=5 & <=7}
{a: >=1 & <=7, a: >=5 & <=9}           {a: >=5 & <=7}

{a: 1} & {b: 2}                        {a: 1, b: 2}
{a: 1, b: int} & {b: 2}                {a: 1, b: 2}

{a: 1} & {a: 2}                        _|_
```

Syntactically, the labels of optional fields are followed by a
question mark `?`.
The question mark is not part of the field name.
Concrete field labels may be an identifier or string, the latter of which may be
interpolated.
Fields with identifier labels can be referred to within the scope in which they
are defined; string labels cannot.
References within such interpolated strings are resolved within
the scope of the struct in which the label sequence is
defined and can reference concrete labels lexically preceding
the label within a label sequence.
<!-- We allow this so that rewriting a CUE file to collapse or expand
field sequences has no impact on semantics.
-->

<!--TODO: first implementation round will not yet have expression labels

An ExpressionLabel sets a collection of optional fields to a field value.
By default it defines this value for all possible string labels.
An optional expression limits this to the set of optional fields whose
labels match the expression.
-->
A bind label, written `<identifier>`, is useful for capturing a label as a value
and for enforcing constraints on all fields of a struct.
In a field using a bind label, such as

```
{
    <id>: { name: id }
}
```

the label name is bound to the identifier for the scope of the field value, so
it can be used inside the value to denote the label.

A bind label matches every field of its enclosing struct, so

```
{
    <id>: { name: id }
    a: { value: 1 }
}
```

evaluates to

```
{
    a: { name: "a" }
    a: { value: 1 }
}
```

Since identical fields in a struct unify, this is equivalent to

```
{
    a: {
        name:  "a"
        value: 1
    }
}
```

Because bind labels match every field in a struct, they can enforce constraints
on all fields. The struct

```
ints: {
    <_>: int
}
```

can only have integer field values:

```
ints & { a: 1 }     // ok
ints & { b: "two" } // _|_, because int & "two" == _|_.
```

The token `...` is a shorthand for `<_>: _`.
<!-- NOTE: if we allow ...Expr, as in list, it would mean something different. -->

<!-- NOTE:
A DefinitionDecl does not allow repeated labels. This is to avoid
any ambiguity or confusion about whether earlier path components
are to be interpreted as declarations or normal fields (they should
always be normal fields.)
-->

<!--NOTE:
The syntax has been deliberately restricted to allow for the following
future extensions and relaxations:
  - Allow omitting a "?" in an expression label to indicate a concrete
    string value (but maybe we want to use () for that).
  - Make the "?" in expression label optional if expression labels
    are always optional.
  - Or allow eliding the "?" if the expression has no references and
    is obviously not concrete (such as `[string]`).
  - The expression of an expression label may also indicate a struct with
    integer or even number labels
    (beware of imprecise computation in the latter).
      e.g. `{ [int]: string }` is a map of integers to strings.
  - Allow for associative lists (`foo [@.field]: {field: string}`)
  - The `...` notation can be extended analogously to that of a ListList,
    by allowing it to follow with an expression for the remaining properties.
    In that case it is no longer a shorthand for `[string]: _`, but rather
    would define the value for any other value for which there is no field
    defined.
    Like the definition with List, this is somewhat odd, but it allows the
    encoding of JSON schema's and (non-structural) OpenAPI's
    additionalProperties and additionalItems.
-->

<!-- TODO: for next round of implementation, replace ExpressionLabel with:
ExpressionLabel = BindLabel | [ BindLabel ] "[" [ Expression ] "]" .
-->

<!-- TODO: strongly consider relaxing an embedding to be an Expression, instead
of Operand. This will tie in with using dots instead of spaces on the LHS,
comprehensions and the ability to generate good error messages, so tread
carefully.
-->
```
StructLit       = "{" { Declaration "," } [ "..." ] "}" .
Declaration     = FieldDecl | DefinitionDecl | AliasDecl | Comprehension | Embedding .
FieldDecl       = Label { Label } ":" Expression { attribute } .
DefinitionDecl  = Label "::" Expression { attribute } .
Embedding       = Expression .

AliasDecl       = Label "=" Expression .
BindLabel       = "<" identifier ">" .
ConcreteLabel   = identifier | simple_string_lit .
ExpressionLabel = BindLabel .
Label           = ConcreteLabel [ "?" ] | ExpressionLabel .

attribute       = "@" identifier "(" attr_elems ")" .
attr_elems      = attr_elem { "," attr_elem } .
attr_elem       = attr_string | attr_label | attr_nest .
attr_label      = identifier "=" attr_string .
attr_nest       = identifier "(" attr_elems ")" .
attr_string     = { attr_char } | string_lit .
attr_char       = /* an arbitrary Unicode code point except newline, ',', '"', `'`, '#', '=', '(', and ')' */ .
```

```
Expression                             Result (without optional fields)
a: { foo?: string }                    {}
b: { foo: "bar" }                      { foo: "bar" }
c: { foo?: *"bar" | string }           {}

d: a & b                               { foo: "bar" }
e: b & c                               { foo: "bar" }
f: a & c                               {}
g: a & { foo?: number }                {}
h: b & { foo?: number }                _|_
i: c & { foo: string }                 { foo: "bar" }
```

#### Closed structs

By default, structs are open to adding fields.
Instances of an open struct `p` may contain fields not defined in `p`.
This makes it easy to add fields, but can lead to bugs:

```
S: {
    field1: string
}

S1: S & { field2: "foo" }

// S1 is { field1: string, field2: "foo" }


A: {
    field1: string
    field2: string
}

A1: A & {
    feild1: "foo" // "field1" was accidentally misspelled
}

// A1 is
//    { field1: string, field2: string, feild1: "foo" }
// not the intended
//    { field1: "foo", field2: string }
```

A _closed struct_ `c` is a struct whose instances may not have regular fields
not defined in `c`.
Closing a struct is equivalent to adding an optional field with value `_|_`
for all undefined fields.

Syntactically, closed structs can be explicitly created with the `close` builtin
or implicitly by [definitions](#Definitions).


```
A: close({
    field1: string
    field2: string
})

A1: A & {
    feild1: string
} // _|_ feild1 not defined for A

A2: A & {
    for k,v in { feild1: string } {
        k: v
    }
}  // _|_ feild1 not defined for A

C: close({
    <_>: _
})

C2: C & {
    for k,v in { thisIsFine: string } {
        "\(k)": v
    }
}

D: close({
    // Values generated by comprehensions are treated as embeddings.
    for k,v in { x: string } {
        "\(k)": v
    }
})
```

<!-- (jba) Somewhere it should be said that optional fields are only
     interesting inside closed structs. -->

#### Embedding

A struct may contain an _embedded value_, an operand used
as a declaration, which must evaluate to a struct.
An embedded value of type struct is unified with the struct in which it is
embedded, but disregarding the restrictions imposed by closed structs.
A struct resulting from such a unification is closed if either of the involved
structs was closed.

Syntactically, embeddings may be any expression, except that `<`
is eagerly interpreted as a bind label.

```
S1: {
    a: 1
    b: 2
    {
        c: 3
    }
}
// S1 is { a: 1, b: 2, c: 3 }

S2: close({
    a: 1
    b: 2
    {
        c: 3
    }
})
// same as close(S1)

S3: {
    a: 1
    b: 2
    close({
        c: 3
    })
}
// same as S2
```

#### Definitions

A field of a struct may be declared as a regular field (using `:`)
or as a _definition_ (using `::`).
Definitions are not emitted as part of the model and are never required
to be concrete when emitting data.
It is illegal to have a regular field and a definition with the same name
within the same struct.
Literal structs that are part of a definition's value are implicitly closed.
This excludes literal structs in embeddings and aliases.
An ellipsis `...` in such literal structs keeps them open,
as it defines `_` for all labels.
<!--
Excluding embeddings from recursive closing allows comprehensions to be
interpreted as embeddings without some exception. For instance,
    if x > 2 {
        foo: string
    }
should not cause any failure. It is also consistent with embeddings being
opened when included in a closed struct.

Finally, excluding embeddings from recursive closing allows for
a mechanism to not recursively close, without needing an additional language
construct, such as a triple colon or something else:
foo :: {
    {
        // not recursively closed
    }
    ... // include this to not close outer struct
}

Including aliases from this exclusion, which are more a separate definition
than embedding seems sensible, and allows for an easy mechanism to avoid
closing, aside from embedding.
-->

```
// MyStruct is closed and as there is no expression label or `...`, we know
// this is the full definition.
MyStruct :: {
    field:    string
    enabled?: bool
}

// Without the `...`, this field would not unify with its previous declaration.
MyStruct :: {
    enabled: bool | *false
    ...
}

myValue: MyStruct & {
    feild:   2     // error, feild not defined in MyStruct
    enabled: true  // okay
}

D :: {
    OneOf

    c: int // adds this field.
}

OneOf :: { a: int } | { b: int }


D1: D & { a: 12, c: 22 }  // { a: 12, c: 22 }
D2: D & { a: 12, b: 33 }  // _|_ // cannot define both `a` and `b`
```


<!---
JSON fields are usually camelCase. Clashes can be avoided by adopting the
convention that definitions be TitleCase. Unexported definitions are still
subject to clashes, but those are likely easier to resolve because they are
package internal.
--->


#### Field attributes

Fields may be associated with attributes.
Attributes define additional information about a field,
such as a mapping to a protocol buffer <!-- TODO: add link --> tag or an alternative
name of the field when mapping to a different language.

<!-- TODO define attribute syntax here, before getting into semantics. -->

If a field has multiple attributes, their identifiers must be unique.
Attributes accumulate when unifying two fields, removing duplicate entries.
It is an error for the resulting field to have two different attributes
with the same identifier.

Attributes are not directly part of the data model, but may be
accessed through the API or other means of reflection.
The interpretation of the attribute value
(a comma-separated list of attribute elements) depends on the attribute.
Interpolations are not allowed in attribute strings.

The recommended convention, however, is to interpret the first
`n` arguments as positional arguments,
where duplicate conflicting entries are an error,
and the remaining arguments as a combination of flags
(an identifier) and key-value pairs, separated by an `=`.

```
myStruct1: {
    field: string @go(Field)
    attr:  int    @xml(,attr) @go(Attr)
}

myStruct2: {
    field: string @go(Field)
    attr:  int    @xml(a1,attr) @go(Attr)
}

Combined: myStruct1 & myStruct2
// field: string @go(Field)
// attr:  int    @xml(,attr) @xml(a1,attr) @go(Attr)
```
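
As a minimal sketch of the recommended convention, the fragment below uses a
hypothetical attribute identifier `@myattr`, which is not defined by CUE or by
any tool and is an assumption made only for illustration.
The first struct shows one positional argument followed by a flag and a
key-value pair; the unification shows a duplicate `@go(Name)` entry being
removed while the other attribute accumulates.
The order in which accumulated attributes are reported is not specified here.

```
server: {
    // "addr" is positional, "required" is a flag,
    // and "format=hostport" is a key-value pair.
    address: string @myattr(addr,required,format=hostport)
}

a: { name: string @go(Name) }
b: { name: string @go(Name) @myattr(id) }

c: a & b
// name: string @go(Name) @myattr(id)
```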

#### Aliases

In addition to fields, a struct literal may also define aliases.
Aliases name values that can be referred to
within the [scope](#declarations-and-scopes) of their
definition, but are not part of the struct: aliases are irrelevant to
the partial ordering of values and are not emitted as part of any
generated data.
The name of an alias must be unique within the struct literal.

<!-- TODO: explain the difference between aliases and definitions.
     Now that you have definitions, are aliases really necessary?
     Consider removing.
-->

```
// The empty struct.
{}

// A struct with 3 fields and 1 alias.
{
    alias = 3

    foo: 2
    bar: "a string"

    "not an ident": 4
}
```
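
The following sketch, with made-up names, shows an alias being referenced by
fields of the same struct.
Because an alias is not part of the struct, only the two fields would appear
in any generated data.

```
{
    basePort = 8080       // alias, not emitted

    http:  basePort       // 8080
    admin: basePort + 1   // 8081
}
```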

#### Shorthand notation for nested structs

A field whose value is a struct with a single field may be written as
a sequence of the two field names,
followed by a colon and the value of that single field.
The shorthand may be applied repeatedly to nest more than two levels.

```
job myTask replicas: 2
```
expands to
```
job: {
    myTask: {
        replicas: 2
    }
}
```
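
Because a field may be declared more than once within a block, with the values
of all declarations unified, the shorthand is convenient for setting
individual nested values one at a time.
A small sketch, using illustrative field names:

```
job myTask replicas: 2
job myTask command: "run"
```
is equivalent to
```
job: {
    myTask: {
        replicas: 2
        command:  "run"
    }
}
```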

<!-- OPTIONAL FIELDS:

The optional marker solves the issue of having to print large amounts of
boilerplate when dealing with large types with many optional or default
values (such as Kubernetes).
Writing such optional values in terms of *null | value is tedious,
unpleasant to read, and as it is not well defined what can be dropped or not,
all null values have to be emitted from the output, even if the user
doesn't override them.
Part of the issue is how null is defined. We could adopt a TypeScript-like
approach of introducing "void" or "undefined" to mean "not defined and not
part of the output". But having all of null, undefined, and void can be
confusing. If these ever are introduced anyway, the ? operator could be
expressed along the lines of
   foo?: bar
being a shorthand for
   foo: void | bar
where void is the default if no other default is given.

The current mechanical definition of "?" is straightforward, though, and
probably avoids the need for void, while solving a big issue.

Caveats:
[1] this definition requires explicitly defined fields to be emitted, even
if they could be elided (for instance if the explicit value is the default
value defined for an optional field). This is probably a good thing.

[2] a default value may still need to be included in an output if it is not
the zero value for that field and it is not known if any outside system is
aware of defaults. For instance, which defaults are specified by the user
and which by the schema understood by the receiving system.
The use of "?" together with defaults should therefore be used carefully
in non-schema definitions.
Problematic cases should be easy to detect by a vet-like check, though.

[3] It should be considered how this affects the trim command.
Should values implied by optional fields be allowed to be removed?
Probably not. This restriction is unlikely to limit the usefulness of trim,
though.

[4] There should be an option to emit all concrete optional values.
```
-->

### Lists

A list literal defines a new value of type list.
A list may be open or closed.
An open list is indicated with a `...` at the end of an element list,
optionally followed by a value for the remaining elements.

The length of a closed list is the number of elements it contains.
The length of an open list has its number of elements as a lower bound
and an unlimited number of elements as its upper bound.

```
ListLit       = "[" [ ElementList [ "," [ "..." [ Expression ] ] ] ] "]" .
ElementList   = Expression { "," Expression } .
```
<!---
KeyedElement  = Element .
--->
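
For illustration, the following literals (the field names are arbitrary) show
a closed list, an open list, and an open list that constrains its remaining
elements:

```
closed: [1, 2]           // length is exactly 2
open:   [1, 2, ...]      // length is at least 2
typed:  [1, 2, ...int]   // at least 2 elements; any further elements must be int
```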

Lists can be thought of as structs:

```
List: *null | {
    Elem: _
    Tail: List
}
```

For closed lists, `Tail` is `null` for the last element; for open lists it is
`*null | List`, defaulting to the shortest variant.
For instance, the open list `[ 1, 2, ... ]` can be represented as:
```
open: List & { Elem: 1, Tail: { Elem: 2 } }
```
and the closed version of this list, `[ 1, 2 ]`, as
```
closed: List & { Elem: 1, Tail: { Elem: 2, Tail: null } }
```

Using this representation, the subsumption rule for lists can
be derived from those of structs.
Implementations are not required to implement lists as structs.
The `Elem` and `Tail` fields are not special and `len` will not work as
expected in these cases.


## Declarations and Scopes


### Blocks

A _block_ is a possibly empty sequence of declarations.
The braces of a struct literal `{ ... }` form a block, but there are
others as well:

- The _universe block_ encompasses all CUE source text.
- Each [package](#modules-instances-and-packages) has a _package block_
  containing all CUE source text in that package.
- Each file has a _file block_ containing all CUE source text in that file.
- Each `for` and `let` clause in a [comprehension](#comprehensions)
  is considered to be its own implicit block.

Blocks nest and influence [scoping].


### Declarations and scope

A _declaration_ may bind an identifier to a field, alias, or package.
Every identifier in a program must be declared.
Other than for fields,
no identifier may be declared twice within the same block.
For fields, an identifier may be declared more than once within the same block,
resulting in a field with a value that is the result of unifying the values
of all fields with the same identifier.
String labels do not bind an identifier to the respective field.

The _scope_ of a declared identifier is the extent of source text in which the
identifier denotes the specified field, alias, or package.

CUE is lexically scoped using blocks:

1. The scope of a [predeclared identifier](#predeclared-identifiers) is the universe block.
1. The scope of an identifier denoting a field or alias
   declared at top level (outside any struct literal) is the file block.
1. The scope of the package name of an imported package is the file block of the
   file containing the import declaration.
1. The scope of a field or alias identifier declared inside a struct literal
   is the innermost containing block.

An identifier declared in a block may be redeclared in an inner block.
While the identifier of the inner declaration is in scope, it denotes the entity
declared by the inner declaration.
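
The rules above can be sketched with a few illustrative fields.
The names are arbitrary; the comments give the values that follow from the
unification and scoping rules as stated in this section.

```
a: 1
a: int      // same block: unified with the declaration above, so a is 1

b: {
    a: 2    // redeclares a in an inner block
    c: a    // 2, refers to the innermost a
}

d: a        // 1, the inner a is not in scope here
```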

The package clause is not a declaration;
the package name does not appear in any scope.
Its purpose is to identify the files belonging to the same package
and to specify the default name for import declarations.


### Predeclared identifiers

CUE predefines a set of types and builtin functions.
For each of these there is a corresponding keyword which is the name
of the predefined identifier, prefixed with `__`.

```
Functions
len       required  close     open

Types
null      The null type and value
bool      All boolean values
int       All integral numbers
float     All decimal floating-point numbers
string    Any valid UTF-8 sequence
bytes     Any valid byte sequence

Derived   Value
number    int | float
uint      >=0
uint8     >=0 & <=255
int8      >=-128 & <=127
uint16    >=0 & <=65535
int16     >=-32_768 & <=32_767
rune      >=0 & <=0x10FFFF
uint32    >=0 & <=4_294_967_295
int32     >=-2_147_483_648 & <=2_147_483_647
uint64    >=0 & <=18_446_744_073_709_551_615
int64     >=-9_223_372_036_854_775_808 & <=9_223_372_036_854_775_807
uint128   >=0 & <=340_282_366_920_938_463_463_374_607_431_768_211_455
int128    >=-170_141_183_460_469_231_731_687_303_715_884_105_728 &
           <=170_141_183_460_469_231_731_687_303_715_884_105_727
float32   >=-3.40282346638528859811704183484516925440e+38 &
          <=3.40282346638528859811704183484516925440e+38
float64   >=-1.797693134862315708145274237317043567981e+308 &
          <=1.797693134862315708145274237317043567981e+308
```
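
Since the derived types are ordinary bounds, they can be unified with values
like any other constraint.
A brief sketch with arbitrary field names:

```
port:   uint16 & 8080   // 8080
level:  int8 & -128     // -128, the lower bound is inclusive
tooBig: uint8 & 256     // _|_, 256 is outside >=0 & <=255
```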


### Exported identifiers

An identifier of a package may be exported to permit access to it
from another package.
An identifier is exported if
the first character of the identifier's name is a Unicode upper case letter
(Unicode class "Lu"); and
the identifier is declared in the file block.
All other top-level identifiers used for fields are not exported.

In addition, any definition declared anywhere within a package of which
the first character of the identifier's name is a Unicode upper case letter
(Unicode class "Lu") is visible outside this package.
Any other definition is not visible outside the package and resides
in a separate namespace from namesake identifiers of other packages.
This is in contrast to ordinary field declarations that do not begin with
an upper-case letter, which are visible outside the package.

```
package mypackage

foo:   string  // not visible outside mypackage

Foo :: {       // visible outside mypackage
    a: 1       // visible outside mypackage
    B: 2       // visible outside mypackage

    C :: {     // visible outside mypackage
        d: 4   // visible outside mypackage
    }
    e :: foo   // not visible outside mypackage
}
```


### Uniqueness of identifiers

Given a set of identifiers, an identifier is called unique if it is different
from every other in the set, after applying normalization following
Unicode Annex #31.
Two identifiers are different if they are spelled differently
or if they appear in different packages and are not exported.
Otherwise, they are the same.


### Field declarations

A field associates the value of an expression with a label within a struct.
If this label is an identifier, it binds the field to that identifier,
so the field's value can be referenced by writing the identifier.
String labels are not bound to fields.
```
a: {
    b: 2
    "s": 3

    c: b   // 2
    d: s   // _|_ unresolved identifier "s"
    e: a.s // 3
}
```

If an expression may result in a value associated with a default value
as described in [default values](#default-values), the field binds to this
value-default pair.


<!-- TODO: disallow creating identifiers starting with __
...and reserve them for builtin values.

The issue is with code generation. As no guarantee can be given that
a predeclared identifier is not overridden in one of the enclosing scopes,
code will have to handle detecting such cases and renaming them.
An alternative is to have the predeclared identifiers be aliases for namesake
equivalents starting with a double underscore (e.g. string -> __string),
allowing generated code (normal code would keep using `string`) to refer
to these directly.
-->


### Alias declarations

An alias declaration binds an identifier to the given expression.

Within the scope of the identifier, it serves as an _alias_ for that
expression.
The expression is evaluated in the scope in which it was declared.


## Expressions

An expression specifies the computation of a value by applying operators and
built-in functions to operands.

Expressions that require concrete values are called _incomplete_ if any of
their operands are not concrete, but define a value that would be legal for
that expression.
Incomplete expressions may be left unevaluated until a concrete value is
requested at the application level.
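
As a small sketch (the field names are illustrative), the addition below is
incomplete inside `t`, because `x` is not concrete there, and becomes
evaluable once `t` is unified with a struct that supplies a concrete `x`:

```
t: {
    x: int
    y: x + 1     // incomplete: x is not concrete
}

u: t & { x: 2 }  // u.y is 3
```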

### Operands

Operands denote the elementary values in an expression.
An operand may be a literal, a (possibly qualified) identifier denoting
a field or alias, or a parenthesized expression.

```
Operand     = Literal | OperandName | ListComprehension | "(" Expression ")" .
Literal     = BasicLit | ListLit | StructLit .
BasicLit    = int_lit | float_lit | string_lit |
              null_lit | bool_lit | bottom_lit | top_lit .
OperandName = identifier | QualifiedIdent .
```

### Qualified identifiers

A qualified identifier is an identifier qualified with a package name prefix.

```
QualifiedIdent = PackageName "." identifier .
```

A qualified identifier accesses an identifier in a different package,
which must be [imported].
The identifier must be declared in the [package block] of that package.

```
math.Sin    // denotes the Sin function in package math
```

### References

An identifier operand refers to a field and is called a reference.
The value of a reference is a copy of the expression associated with the field
that it is bound to,
with any references within that expression bound to the respective copies of
the fields they were originally bound to.
Implementations may use a different mechanism to evaluate as long as
these semantics are maintained.

```
a: {
    place:    string
    greeting: "Hello, \(place)!"
}

b: a & { place: "world" }
c: a & { place: "you" }

d: b.greeting  // "Hello, world!"
e: c.greeting  // "Hello, you!"
```



### Primary expressions

Primary expressions are the operands for unary and binary expressions.


```

Slice: indices must be complete
([0, 1, 2, 3] | [2, 3])[0:2] => [0, 1] | [2, 3]

([0, 1, 2, 3] | *[2, 3])[0:2] => [0, 1] | [2, 3]
([0,1,2,3]|[2,3], [2,3])[0:2] => ([0,1]|[2,3], [2,3])

Index
a: (1|2, 1)
b: ([0,1,2,3]|[2,3], [2,3])[a] => ([0,1,2,3]|[2,3][a], 3)

Binary operation
A binary operation is only evaluated if its operands are complete.

Input        Maximum allowed evaluation
a: string    string
b: 2         2
c: a * b     a * 2

An error in a struct is if the evaluation of any expression results in
bottom, where an incomplete expression is not considered bottom.
```
<!-- TODO(mpvl)
    Conversion |
-->
```
PrimaryExpr =
    Operand |
    PrimaryExpr Selector |
    PrimaryExpr Index |
    PrimaryExpr Slice |
    PrimaryExpr Arguments .

Selector       = "." identifier .
Index          = "[" Expression "]" .
Slice          = "[" [ Expression ] ":" [ Expression ] "]" .
Argument       = Expression .
Arguments      = "(" [ ( Argument { "," Argument } ) [ "," ] ] ")" .
```
<!---
Argument       = Expression | ( identifier ":" Expression ).
--->

```
x
2
(s + ".txt")
f(3.1415, true)
m["foo"]
s[i : j + 1]
obj.color
f.p[i].x
```


### Selectors

For a [primary expression](#primary-expressions) `x` that is not a [package name](#package-clause),
the selector expression

```
x.f
```

denotes the field `f` of the value `x`.
The identifier `f` is called the field selector.
The type of the selector expression is the type of `f`.
If `x` is a package name, see the section on [qualified identifiers](#qualified-identifiers).

<!--
TODO: consider allowing this and also for selectors. It needs to be considered
how defaults are carried forward in cases like:

    x: { a: string | *"foo" } | *{ a: int | *4 }
    y: x.a & string

What is y in this case?
   (x.a & string, _|_)
   (string|"foo", _|_)
   (string|"foo", "foo")
If the latter, then why?

For a disjunction of the form `x1 | ... | xn`,
the selector is applied to each element `x1.f | ... | xn.f`.
-->

Otherwise, if `x` is not a struct, or if `f` does not exist in `x`,
the result of the expression is bottom (an error).
In the latter case the expression is incomplete.
The operand of a selector may be associated with a default.

```
T: {
    x: int
    y: 3
}

a: T.x  // int
b: T.y  // 3
c: T.z  // _|_ // field 'z' not found in T

e: {a: 1|*2} | *{a: 3|*4}
f: e.a  // 4 (default value)
```

<!--
```
(v, d).f  =>  (v.f, d.f)

e: {a: 1|*2} | *{a: 3|*4}
f: e.a  // 4 after selecting default from (({a: 1|*2} | {a: 3|*4}).a, 4)

```
-->


### Index expressions

A primary expression of the form

```
a[x]
```

denotes the element of a list or struct `a` indexed by `x`.
The value `x` is called the index or field name, respectively.
The following rules apply:

If `a` is not a struct:

- `a` is a list (which need not be complete)
- the index `x` unified with `int` must be concrete.
- the index `x` is in range if `0 <= x < len(a)`, where only the
  explicitly defined values of an open-ended list are considered,
  otherwise it is out of range

The result of `a[x]` is

for `a` of list type:

- the list element at index `x`, if `x` is within range
- bottom (an error), otherwise


for `a` of struct type:

- the index `x` unified with `string` must be concrete.
- the value of the regular and non-optional field named `x` of struct `a`,
  if this field exists
- bottom (an error), otherwise


```
[ 1, 2 ][1]     // 2
[ 1, 2 ][2]     // _|_
[ 1, 2, ...][2] // _|_
```
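
Indexing a struct follows the same pattern, with a concrete string as the
index.
A sketch with illustrative field names, applying the struct rules listed
above:

```
s: {
    a:        1
    "my key": 2
}

v1: s["a"]        // 1
v2: s["my key"]   // 2
v3: s["missing"]  // _|_
```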

Both the operand and index value may be a value-default pair.
```
va[vi]              =>  va[vi]
va[(vi, di)]        =>  (va[vi], va[di])
(va, da)[vi]        =>  (va[vi], da[vi])
(va, da)[(vi, di)]  =>  (va[vi], da[di])
```

```
Fields                  Result
x: [1, 2] | *[3, 4]     ([1,2]|[3,4], [3,4])
i: int | *1             (int, 1)

v: x[i]                 (x[i], 4)
```

### Operators

Operators combine operands into expressions.

```
Expression = UnaryExpr | Expression binary_op Expression .
UnaryExpr  = PrimaryExpr | unary_op UnaryExpr .

binary_op  = "|" | "&" | "||" | "&&" | "==" | rel_op | add_op | mul_op .
rel_op     = "!=" | "<" | "<=" | ">" | ">=" | "=~" | "!~" .
add_op     = "+" | "-" .
mul_op     = "*" | "/" | "div" | "mod" | "quo" | "rem" .
unary_op   = "+" | "-" | "!" | "*" | rel_op .
```

Comparisons are discussed [elsewhere](#comparison-operators).
For any binary operator, the operand types must unify.
<!-- TODO: durations
 unless the operation involves durations.

Except for duration operations, if one operand is an untyped [literal] and the
other operand is not, the constant is [converted] to the type of the other
operand.
-->

Operands of unary and binary expressions may be associated with a default,
as the following examples show:
<!--
```
O1: op (v1, d1)          => (op v1, op d1)

O2: (v1, d1) op (v2, d2) => (v1 op v2, d1 op d2)
and because v => (v, v)
O3: v1       op (v2, d2) => (v1 op v2, v1 op d2)
O4: (v1, d1) op v2       => (v1 op v2, d1 op v2)
```
-->

```
Field               Resulting Value-Default pair
a: *1|2             (1|2, 1)
b: -a               (-a, -1)

c: a + 2            (a+2, 3)
d: a + a            (a+a, 2)
```

#### Operator precedence

Unary operators have the highest precedence.

There are seven precedence levels for binary operators.
Multiplication operators bind strongest, followed by
addition operators, comparison operators,
`&&` (logical AND), `||` (logical OR), `&` (unification),
and finally `|` (disjunction):

```
Precedence    Operator
    7             *  /  div  mod  quo  rem
    6             +  -
    5             ==  !=  <  <=  >  >=  =~  !~
    4             &&
    3             ||
    2             &
    1             |
```

Binary operators of the same precedence associate from left to right.
For instance, `x / y * z` is the same as `(x / y) * z`.

```
+x
23 + 3*x[i]
x <= f()
f() || g()
x == y+1 && y == z-1
2 | int
{ a: 1 } & { b: 2 }
```
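
To make the table concrete, the sketch below (arbitrary field names) annotates
how a few expressions group according to the precedence levels and
left-to-right associativity described above:

```
a: 1 + 2 * 3               // 7, parsed as 1 + (2 * 3)
b: 10 - 4 - 3              // 3, parsed as (10 - 4) - 3
c: true || false && false  // true, parsed as true || (false && false)
d: int & 1 | string        // parsed as (int & 1) | string
```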

#### Arithmetic operators

Arithmetic operators apply to numeric values and yield a result of the same type
as the first operand. Three of the four standard arithmetic operators
(`+`, `-`, `*`) apply to integer and decimal floating-point types;
`+` and `*` also apply to lists, strings, and bytes.
`/` only applies to decimal floating-point types and
`div`, `mod`, `quo`, and `rem` only apply to integer types.

```
+    sum                    integers, floats, lists, strings, bytes
-    difference             integers, floats
*    product                integers, floats, lists, strings, bytes
/    quotient               floats
div  division               integers
mod  modulo                 integers
quo  quotient               integers
rem  remainder              integers
```

For any operator that accepts operands of type `float`, any operand may be
of type `int` or `float`, in which case the result will be `float` if any
of the operands is `float`, and `int` otherwise.
For `/` the result is always `float`.
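
A few illustrative expressions (the field names are arbitrary), with results
that follow from the rules above:

```
a: 1 + 2        // 3, an int
b: 1 + 2.5      // 3.5, a float because one operand is a float
c: 7 / 2        // 3.5, `/` always yields a float
d: 7 div 2      // 3
e: "ab" + "cd"  // "abcd"
```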
2115
2116
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002117#### Integer operators
2118
2119For two integer values `x` and `y`,
2120the integer quotient `q = x div y` and remainder `r = x mod y `
Marcel van Lohuizen75cb0032019-01-11 12:10:48 +01002121implement Euclidean division and
2122satisfy the following relationship:
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002123
2124```
2125r = x - y*q with 0 <= r < |y|
2126```
2127where `|y|` denotes the absolute value of `y`.
2128
2129```
2130 x y x div y x mod y
2131 5 3 1 2
2132-5 3 -2 1
2133 5 -3 -1 2
2134-5 -3 2 1
2135```

For two integer values `x` and `y`,
the integer quotient `q = x quo y` and remainder `r = x rem y`
implement truncated division and
satisfy the following relationship:

```
x = q*y + r  and  |r| < |y|
```

with `x quo y` truncated towards zero.

```
 x     y   x quo y   x rem y
 5     3      1         2
-5     3     -1        -2
 5    -3     -1         2
-5    -3      1        -2
```
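
The two division flavours can be checked against the tables above with a small Python sketch. The function names `div_mod` and `quo_rem` are hypothetical and only model the arithmetic on plain integers; they are not part of CUE or of any CUE implementation.

```python
from typing import Tuple


def div_mod(x: int, y: int) -> Tuple[int, int]:
    """Euclidean division (models `div`/`mod`): x == y*q + r with 0 <= r < |y|."""
    if y == 0:
        raise ZeroDivisionError("zero divisor (bottom in CUE)")
    r = x % abs(y)     # with a positive modulus, Python's % yields 0 <= r < |y|
    q = (x - r) // y   # exact, because x - r is a multiple of y
    return q, r


def quo_rem(x: int, y: int) -> Tuple[int, int]:
    """Truncated division (models `quo`/`rem`): x == q*y + r, q rounded towards zero."""
    if y == 0:
        raise ZeroDivisionError("zero divisor (bottom in CUE)")
    q = abs(x) // abs(y)
    if (x < 0) != (y < 0):  # operands of opposite sign give a negative quotient
        q = -q
    return q, x - q * y


for x, y in [(5, 3), (-5, 3), (5, -3), (-5, -3)]:
    print(x, y, div_mod(x, y), quo_rem(x, y))
```

The remainder of Euclidean division is never negative, whereas the remainder of truncated division takes the sign of `x`.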

A zero divisor in either case results in bottom (an error).

For integer operands, the unary operators `+` and `-` are defined as follows:

```
+x                          is 0 + x
-x    negation              is 0 - x
```


#### Decimal floating-point operators

For decimal floating-point numbers, `+x` is the same as `x`,
while `-x` is the negation of `x`.
The result of a floating-point division by zero is bottom (an error).
<!-- TODO: consider making it +/- Inf -->

An implementation may combine multiple floating-point operations into a single
fused operation, possibly across statements, and produce a result that differs
from the value obtained by executing and rounding the instructions individually.


#### List operators

Lists can be concatenated using the `+` operator.
Open lists are closed to their default value beforehand.

```
[ 1, 2 ]      + [ 3, 4 ]       // [ 1, 2, 3, 4 ]
[ 1, 2, ... ] + [ 3, 4 ]       // [ 1, 2, 3, 4 ]
[ 1, 2 ]      + [ 3, 4, ... ]  // [ 1, 2, 3, 4 ]
```

Lists can be multiplied with a non-negative `int` using the `*` operator
to create a list that repeats the original the indicated number of times.
```
3*[1,2]         // [1, 2, 1, 2, 1, 2]
3*[1, 2, ...]   // [1, 2, 1, 2, 1, 2]
[byte]*4        // [byte, byte, byte, byte]
0*[1,2]         // []
```
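
As a rough illustration of "closing before operating", the Python sketch below models a list as its explicit elements plus an `is_open` flag standing in for a trailing `...`. The `CueList` type and the helper names are assumptions made for this example only; element constraints such as `byte` are treated as opaque values.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class CueList:
    elems: List[Any]
    is_open: bool = False  # True models a trailing "..."


def close_default(lst: CueList) -> List[Any]:
    """Close an open list to its default: keep the listed elements, drop '...'."""
    return list(lst.elems)


def concat(a: CueList, b: CueList) -> List[Any]:
    """Model of `a + b`: both operands are closed first, then joined."""
    return close_default(a) + close_default(b)


def repeat(n: int, lst: CueList) -> List[Any]:
    """Model of `n * lst` for a non-negative integer n."""
    if not isinstance(n, int) or n < 0:
        raise ValueError("multiplier must be a non-negative int")
    return close_default(lst) * n


print(concat(CueList([1, 2], is_open=True), CueList([3, 4])))  # [1, 2, 3, 4]
print(repeat(3, CueList([1, 2], is_open=True)))                # [1, 2, 1, 2, 1, 2]
print(repeat(0, CueList([1, 2])))                              # []
```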

<!-- TODO(mpvl): should we allow multiplication with a range?
If so, how does one specify a list with a range of possible lengths?

Suggestion from jba:
Multiplication should distribute over disjunction,
so int(1)..int(3) * [x] = [x] | [x, x] | [x, x, x].
The hard part is figuring out what (>=1 & <=3) * [x] means,
since >=1 & <=3 includes many floats.
(mpvl: could constrain arguments to parameter types, but needs to be
done consistently.)
-->


#### String operators

Strings can be concatenated using the `+` operator:
```
s: "hi " + name + " and good bye"
```
String addition creates a new string by concatenating the operands.

A string can be repeated by multiplying it:

```
s: "etc. "*3  // "etc. etc. etc. "
```
<!-- jba: Do these work for byte sequences? If not, why not? -->


#### Comparison operators

Comparison operators compare two operands and yield an untyped boolean value.

```
==    equal
!=    not equal
<     less
<=    less or equal
>     greater
>=    greater or equal
=~    matches regular expression
!~    does not match regular expression
```
<!-- regular expression operator inspired by Bash, Perl, and Ruby. -->

In any comparison, the types of the two operands must unify or one of the
operands must be null.

The equality operators `==` and `!=` apply to operands that are comparable.
The ordering operators `<`, `<=`, `>`, and `>=` apply to operands that are ordered.
The matching operators `=~` and `!~` apply to a string and a regular
expression operand.
These terms and the result of the comparisons are defined as follows:

- Null is comparable with itself and any other type.
  Two null values are always equal; null is unequal to anything else.
- Boolean values are comparable.
  Two boolean values are equal if they are either both true or both false.
- Integer values are comparable and ordered, in the usual way.
- Floating-point values are comparable and ordered, as per the definitions
  for binary coded decimals in the IEEE-754-2008 standard.
- Floating-point numbers may be compared with integers.
- String and bytes values are comparable and ordered lexically byte-wise.
- Structs are not comparable.
- Lists are not comparable.
- The regular expression syntax is the one accepted by RE2,
  described in https://github.com/google/re2/wiki/Syntax,
  except for `\C`.
- `s =~ r` is true if `s` matches the regular expression `r`.
- `s !~ r` is true if `s` does not match the regular expression `r`.
<!--- TODO: consider the following
- For regular expression, named capture groups are interpreted as CUE references
  that must unify with the strings matching this capture group.
--->
<!-- TODO: Implementations should adopt an algorithm that runs in linear time? -->
<!-- Consider implementing Level 2 of Unicode regular expression. -->

```
3 < 4       // true
3 < 4.0     // true
null == 2   // false
null != {}  // true
{} == {}    // _|_: structs are not comparable against structs

"Wild cats" =~ "cat"   // true
"Wild cats" !~ "dog"   // true

"foo" =~ "^[a-z]{3}$"  // true
"foo" =~ "^[a-z]{4}$"  // false
```
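
The matching examples suggest that `=~` succeeds when the pattern matches anywhere in the string, unless the pattern is anchored. A minimal Python sketch of that reading follows. It assumes Python's `re` module as a stand-in for RE2 (the two syntaxes differ in places but agree on these patterns), and the helper names `matches` and `not_matches` are hypothetical.

```python
import re


def matches(s: str, r: str) -> bool:
    """Model of `s =~ r`: true if r matches some substring of s."""
    return re.search(r, s) is not None


def not_matches(s: str, r: str) -> bool:
    """Model of `s !~ r`: the negation of `s =~ r`."""
    return not matches(s, r)


print(matches("Wild cats", "cat"))       # True
print(not_matches("Wild cats", "dog"))   # True
print(matches("foo", "^[a-z]{3}$"))      # True
print(matches("foo", "^[a-z]{4}$"))      # False
```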

<!-- jba
I think I know what `3 < a` should mean if

    a: >=1 & <=5

It should be a constraint on `a` that can be evaluated once `a`'s value is known more precisely.

But what does `3 < (>=1 & <=5)` mean? We'll never get more information, so it must have a definite value.
-->

#### Logical operators

Logical operators apply to boolean values and yield a result of the same type
as the operands. The right operand is evaluated conditionally.

```
&&    conditional AND    p && q  is  "if p then q else false"
||    conditional OR     p || q  is  "if p then true else q"
!     NOT                !p      is  "not p"
```


<!--
### TODO TODO TODO

3.14 / 0.0   // illegal: division by zero
Illegal conversions always apply to CUE.

Implementation restriction: A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa.
-->

<!--- TODO(mpvl): conversions
### Conversions
Conversions are expressions of the form `T(x)` where `T` and `x` are
expressions.
The result is always an instance of `T`.

```
Conversion = Expression "(" Expression [ "," ] ")" .
```
--->
<!---

A literal value `x` can be converted to type T if `x` is representable by a
value of `T`.

As a special case, an integer literal `x` can be converted to a string type
using the same rule as for non-constant x.

Converting a literal yields a typed value as result.

```
uint(iota)               // iota value of type uint
float32(2.718281828)     // 2.718281828 of type float32
complex128(1)            // 1.0 + 0.0i of type complex128
float32(0.49999999)      // 0.5 of type float32
float64(-1e-1000)        // 0.0 of type float64
string('x')              // "x" of type string
string(0x266c)           // "♬" of type string
MyString("foo" + "bar")  // "foobar" of type MyString
string([]byte{'a'})      // not a constant: []byte{'a'} is not a constant
(*int)(nil)              // not a constant: nil is not a constant, *int is not a boolean, numeric, or string type
int(1.2)                 // illegal: 1.2 cannot be represented as an int
string(65.0)             // illegal: 65.0 is not an integer constant
```
--->
<!---

A conversion is always allowed if `x` is an instance of `T`.

If `T` and `x` are of different underlying type, a conversion is allowed if
`x` can be converted to a value `x'` of `T`'s type, and
`x'` is an instance of `T`.
A value `x` can be converted to the type of `T` in any of these cases:

- `x` is a struct and is subsumed by `T`.
- `x` and `T` are both integer or floating points.
- `x` is an integer or a byte sequence and `T` is a string.
- `x` is a string and `T` is a byte sequence.

Specific rules apply to conversions between numeric types, structs,
or to and from a string type. These conversions may change the representation
of `x`.
All other conversions only change the type but not the representation of x.


#### Conversions between numeric ranges
For the conversion of numeric values, the following rules apply:

1. Any integer value can be converted into any other integer value
   provided that it is within range.
2. When converting a decimal floating-point number to an integer, the fraction
   is discarded (truncation towards zero). TODO: or disallow truncating?

```
a: uint16(int(1000)) // uint16(1000)
b: uint8(1000)       // _|_ // overflow
c: int(2.5)          // 2  TODO: TBD
```


#### Conversions to and from a string type

Converting a list of bytes to a string type yields a string whose successive
bytes are the elements of the slice.
Invalid UTF-8 is converted to `"\uFFFD"`.

```
string('hell\xc3\xb8')   // "hellø"
string(bytes([0x20]))    // " "
```

A string value is always convertible to a list of bytes.

```
bytes("hellø")   // 'hell\xc3\xb8'
bytes("")        // ''
```

#### Conversions between list types

Conversions between list types are possible only if `T` strictly subsumes `x`
and the result will be the unification of `T` and `x`.

If we introduce named types this would be different from IP & [10, ...]

Consider removing this until it has a different meaning.

```
IP:        4*[byte]
Private10: IP([10, ...])  // [10, byte, byte, byte]
```

#### Conversions between struct types

A conversion from `x` to `T`
is applied using the following rules:

1. `x` must be an instance of `T`,
2. all fields defined for `x` that are not defined for `T` are removed from
   the result of the conversion, recursively.

<!-- jba: I don't think you say anywhere that the matching fields are unified.
mpvl: they are not, x must be an instance of T, in which case x == T&x,
so unification would be unnecessary.
-->
<!--
```
T: {
    a: { b: 1..10 }
}

x1: {
    a: { b: 8, c: 10 }
    d: 9
}

c1: T(x1)             // { a: { b: 8 } }
c2: T({})             // _|_  // missing field 'a' in '{}'
c3: T({ a: {b: 0} })  // _|_  // field a.b does not unify (0 & 1..10)
```
-->

### Calls

Calls can be made to core library functions, called builtins.
Given an expression `f` of function type `F`,
```
f(a1, a2, … an)
```
calls `f` with arguments `a1`, `a2`, … `an`. Arguments must be expressions
whose values are instances of the parameter types of `F`
and are evaluated before the function is called.

```
a: math.Atan2(x, y)
```

In a function call, the function value and arguments are evaluated in the usual
order.
After they are evaluated, the parameters of the call are passed by value
to the function and the called function begins execution.
The return parameters
of the function are passed by value back to the calling function when the
function returns.


### Comprehensions

Lists and fields can be constructed using comprehensions.

Each defines a clause sequence that consists of a sequence of `for`, `if`, and
`let` clauses, nesting from left to right.
The `for` and `let` clauses each define a new scope in which new values are
bound to be available for the next clause.

The `for` clause binds the defined identifiers, on each iteration, to the next
value of some iterable value in a new scope.
A `for` clause may bind one or two identifiers.
If there is one identifier, it is bound to the value of
a list element or struct field.
If there are two identifiers, the first will be bound to the key or index,
if available, and the second to the value.

For lists, `for` iterates over all elements in the list after closing it.
For structs, `for` iterates over all non-optional regular fields.

An `if` clause, or guard, specifies an expression that terminates the current
iteration if it evaluates to false.

The `let` clause binds the result of an expression to the defined identifier
in a new scope.

An iteration is said to complete if the innermost block of the clause
sequence is reached.

_List comprehensions_ specify a single expression that is evaluated and included
in the list for each completed iteration.

_Field comprehensions_ follow a clause sequence with a struct literal,
where the struct literal is evaluated and embedded at the point of
declaration of the comprehension for each completed iteration.
As usual, fields in the struct may evaluate to the same label,
resulting in the unification of their values.

```
Comprehension       = Clauses StructLit .
ListComprehension   = "[" Expression Clauses "]" .

Clauses             = Clause { Clause } .
Clause              = ForClause | GuardClause | LetClause .
ForClause           = "for" identifier [ "," identifier ] "in" Expression .
GuardClause         = "if" Expression .
LetClause           = "let" identifier "=" Expression .
```

```
a: [1, 2, 3, 4]
b: [ x+1 for x in a if x > 1]  // [3, 4, 5]

c: {
    for x in a
    if x < 4
    let y = 1 {
        "\(x)": x + y
    }
}
d: { "1": 2, "2": 3, "3": 4 }
```
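
The nesting of clauses can be pictured as a recursive walk in which each clause either extends the environment (`for`, `let`) or cuts the current iteration short (`if`). The Python sketch below is one possible model, not the evaluator of any implementation. The tuple encoding of clauses and the names `run_clauses` and `unify_fields` are assumptions made for this example, only single-identifier `for` clauses are handled, and unification is reduced to "equal values or conflict".

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Env = Dict[str, Any]


def run_clauses(clauses: List[Tuple], env: Env) -> Iterator[Env]:
    """Yield one environment per completed iteration of the clause sequence."""
    if not clauses:
        yield env  # innermost block reached: the iteration completes
        return
    kind, *args = clauses[0]
    rest = clauses[1:]
    if kind == "for":    # ("for", name, iterable_fn)
        name, iterable = args
        for value in iterable(env):
            yield from run_clauses(rest, {**env, name: value})
    elif kind == "if":   # ("if", predicate_fn): a false guard ends this iteration
        (pred,) = args
        if pred(env):
            yield from run_clauses(rest, env)
    elif kind == "let":  # ("let", name, expr_fn)
        name, expr = args
        yield from run_clauses(rest, {**env, name: expr(env)})
    else:
        raise ValueError(f"unknown clause kind: {kind}")


def unify_fields(structs: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Embed each struct; equal labels must carry equal (concrete) values."""
    out: Dict[str, Any] = {}
    for s in structs:
        for label, value in s.items():
            if label in out and out[label] != value:
                raise ValueError(f"conflicting values for {label!r}")
            out[label] = value
    return out


a = [1, 2, 3, 4]

# b: [ x+1 for x in a if x > 1 ]
b = [env["x"] + 1 for env in run_clauses(
    [("for", "x", lambda e: a), ("if", lambda e: e["x"] > 1)], {})]

# c: { for x in a if x < 4 let y = 1 { "\(x)": x + y } }
c = unify_fields(
    {str(env["x"]): env["x"] + env["y"]}
    for env in run_clauses(
        [("for", "x", lambda e: a),
         ("if", lambda e: e["x"] < 4),
         ("let", "y", lambda e: 1)], {}))

print(b)  # [3, 4, 5]
print(c)  # {'1': 2, '2': 3, '3': 4}
```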


### String interpolation

String interpolation allows constructing strings by replacing placeholder
expressions with their string representation.
String interpolation may be used in single- and double-quoted strings, as well
as their multiline equivalents.

A placeholder consists of `\(` followed by an expression and a `)`. The
expression is evaluated within the scope within which the string is defined.

```
a: "World"
b: "Hello \( a )!" // Hello World!
```
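
For intuition only, the Python sketch below substitutes placeholders that contain a single identifier, looking the name up in a dictionary that plays the role of the enclosing scope. Real placeholders may hold arbitrary expressions, which this sketch does not attempt to parse. The name `interpolate` and the use of `str` for the string representation are assumptions.

```python
import re

# A backslash, "(", optional spaces, one identifier, optional spaces, ")".
_PLACEHOLDER = re.compile(r"\\\(\s*([A-Za-z_][A-Za-z0-9_]*)\s*\)")


def interpolate(template: str, scope: dict) -> str:
    """Replace each single-identifier placeholder with str(scope[name])."""
    return _PLACEHOLDER.sub(lambda m: str(scope[m.group(1)]), template)


print(interpolate(r"Hello \( a )!", {"a": "World"}))  # Hello World!
```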


## Builtin Functions

Built-in functions are predeclared. They are called like any other function.


### `len`

The built-in function `len` takes arguments of various types and returns
a result of type int.

```
Argument type    Result

string           string length in bytes
bytes            length of byte sequence
list             list length, smallest length for an open list
struct           number of distinct data fields, including optional
```
<!-- TODO: consider not supporting len, but instead rely on more
precisely named builtin functions:
  - strings.RuneLen(x)
  - bytes.Len(x)  // x may be a string
  - struct.NumFooFields(x)
  - list.Len(x)
-->

```
Expression           Result
len("Hellø")         6
len([1, 2, 3])       3
len([1, 2, ...])     >=2
```
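
The result `6` for `len("Hellø")` follows from counting bytes rather than characters, since `ø` occupies two bytes in UTF-8. A short Python sketch of that rule for concrete values follows. The name `cue_len` is hypothetical, and structs and open lists are deliberately left out because they need a notion of constraints that plain Python values lack.

```python
def cue_len(value):
    """Toy model of `len` for concrete strings, bytes, and closed lists."""
    if isinstance(value, str):
        return len(value.encode("utf-8"))  # string length in bytes, not characters
    if isinstance(value, (bytes, list)):
        return len(value)
    raise TypeError(f"unsupported argument type: {type(value).__name__}")


print(cue_len("Hellø"))    # 6
print(cue_len([1, 2, 3]))  # 3
```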


### `close`

The built-in function `close` converts a partially defined, or open, struct
to a fully defined, or closed, struct.


### `and`

The built-in function `and` takes a list and returns the result of applying
the `&` operator to all elements in the list.
It returns top for the empty list.

```
Expression:       Result
and([a, b])       a & b
and([a])          a
and([])           _
```

### `or`

The built-in function `or` takes a list and returns the result of applying
the `|` operator to all elements in the list.
It returns bottom for the empty list.

```
Expression:       Result
or([a, b])        a | b
or([a])           a
or([])            _|_
```
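
The results for the empty list are the identity elements of the respective operators: unifying with top and taking a disjunction with bottom both leave a value unchanged. The Python sketch below illustrates this with a deliberately simplified model in which a value is the set of concrete values it admits, `&` is set intersection, and `|` is set union. The finite `UNIVERSE` and the names `cue_and` and `cue_or` are assumptions for this example, and defaults in disjunctions are not modelled.

```python
from functools import reduce

UNIVERSE = frozenset(range(10))  # hypothetical finite stand-in for top (_)
BOTTOM = frozenset()             # admits no value: stand-in for bottom (_|_)


def cue_and(values):
    """Fold `&` (modelled as intersection) over the list; top for []."""
    return reduce(lambda x, y: x & y, values, UNIVERSE)


def cue_or(values):
    """Fold `|` (modelled as union) over the list; bottom for []."""
    return reduce(lambda x, y: x | y, values, BOTTOM)


a = frozenset({1, 2, 3})
b = frozenset({2, 3, 4})

print(sorted(cue_and([a, b])))  # [2, 3]
print(sorted(cue_or([a, b])))   # [1, 2, 3, 4]
print(cue_and([]) == UNIVERSE)  # True
print(cue_or([]) == BOTTOM)     # True
```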


## Cycles

Implementations are required to interpret or reject cycles encountered
during evaluation according to the rules in this section.


### Reference cycles

A _reference cycle_ occurs if a field references itself, either directly or
indirectly.

```
// x references itself
x: x

// indirect cycles
b: c
c: d
d: b
```

Implementations should report these as an error except in the following cases:


#### Expressions that unify an atom with an expression

An expression of the form `a & e`, where `a` is an atom
and `e` is an expression, always evaluates to `a` or bottom.
As it does not matter how we fail, we can assume the result to be `a`
and validate, after the field in which the expression occurs has been
evaluated, that `a == e`.

```
// Config            Evaluates to (requiring concrete values)
x: {                 x: {
    a: b + 100           a: _|_ // cycle detected
    b: a - 100           b: _|_ // cycle detected
}                    }

y: x & {             y: {
    a: 200               a: 200 // asserted that 200 == b + 100
                         b: 100
}                    }
```


#### Field values

A field value of the form `r & v`,
where `r` evaluates to a reference cycle and `v` is a value,
evaluates to `v`.
Unification is idempotent and unifying a value with itself ad infinitum,
which is what the cycle represents, results in this value.
Implementations should detect cycles of this kind, ignore `r`,
and take `v` as the result of unification.
<!-- Tomabechi's graph unification algorithm
can detect such cycles at near-zero cost. -->

```
Configuration    Evaluated
//    c          Cycles in nodes of type struct evaluate
//  ↙︎   ↖        to the fixed point of unifying their
// a  →  b       values ad infinitum.

a: b & { x: 1 }  // a: { x: 1, y: 2, z: 3 }
b: c & { y: 2 }  // b: { x: 1, y: 2, z: 3 }
c: a & { z: 3 }  // c: { x: 1, y: 2, z: 3 }

// resolve a             b & {x:1}
// substitute b          c & {y:2} & {x:1}
// substitute c          a & {z:3} & {y:2} & {x:1}
// eliminate a (cycle)   {z:3} & {y:2} & {x:1}
// simplify              {x:1,y:2,z:3}
```
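
The resolve, substitute, and eliminate steps traced in the comments can be mimicked in a few lines of Python. This is a sketch under strong assumptions: every field has the shape `ref & struct`, structs are flat dictionaries of concrete values, and unification is "merge or conflict". The names `unify` and `resolve` and the encoding of fields as `(ref, struct)` pairs are made up for this example.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Each field is `ref & struct`: the referenced field name (or None) plus a literal struct.
Fields = Dict[str, Tuple[Optional[str], dict]]


def unify(a: dict, b: dict) -> dict:
    """Merge two flat structs; equal labels must carry equal values."""
    out = dict(a)
    for key, value in b.items():
        if key in out and out[key] != value:
            raise ValueError(f"conflict on {key!r}")
        out[key] = value
    return out


def resolve(name: str, fields: Fields,
            visiting: FrozenSet[str] = frozenset()) -> dict:
    """Evaluate a field, ignoring a reference that closes a cycle."""
    if name in visiting:
        return {}  # cycle: drop the reference; {} is the identity for unify
    ref, literal = fields[name]
    if ref is None:
        return dict(literal)
    return unify(resolve(ref, fields, visiting | {name}), literal)


fields: Fields = {
    "a": ("b", {"x": 1}),
    "b": ("c", {"y": 2}),
    "c": ("a", {"z": 3}),
}

for name in fields:
    print(name, resolve(name, fields))
```

Each of the three fields resolves to a struct with `x: 1`, `y: 2`, and `z: 3`, matching the evaluated column of the example.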

This rule also applies to field values that are disjunctions of unification
operations of the above form.

```
a: b&{x:1} | {y:1}  // {x:1,y:3,z:2} | {y:1}
b: {x:2} | c&{z:2}  // {x:2} | {x:1,y:3,z:2}
c: a&{y:3} | {z:3}  // {x:1,y:3,z:2} | {z:3}


// resolving a           b&{x:1} | {y:1}
// substitute b          ({x:2} | c&{z:2})&{x:1} | {y:1}
// simplify              c&{z:2}&{x:1} | {y:1}
// substitute c          (a&{y:3} | {z:3})&{z:2}&{x:1} | {y:1}
// simplify              a&{y:3}&{z:2}&{x:1} | {y:1}
// eliminate a (cycle)   {y:3}&{z:2}&{x:1} | {y:1}
// expand                {x:1,y:3,z:2} | {y:1}
```

Note that all nodes that form a reference cycle resulting in a struct will
evaluate to the same value.
If a field value is a disjunction, any element that is part of a cycle will
evaluate to this value.


### Structural cycles

CUE disallows infinite structures.
Implementations must report an error when encountering such declarations.

<!-- for instance using an occurs check -->

```
// Disallowed: a list of infinite length with all elements being 1.
list: {
    head: 1
    tail: list
}

// Disallowed: another infinite structure (a:{b:{d:{b:{d:{...}}}}}, ...).
a: {
    b: c
}
c: {
    d: a
}
```
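
One way to detect the two disallowed declarations is an occurs check: while expanding a field, any reference back to a field that is still being expanded signals an infinite structure. The Python sketch below applies that idea to a toy encoding in which structs are dictionaries and references are `Ref` objects. The encoding and the name `has_structural_cycle` are assumptions. The sketch does not model disjunctions (so it cannot express a `*null | {...}` declaration), and it does not separate plain reference cycles from structural ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    name: str  # reference to a top-level field


def has_structural_cycle(root: dict) -> bool:
    """Occurs check: does expanding any top-level field reach itself again?"""

    def visit(value, expanding: frozenset) -> bool:
        if isinstance(value, Ref):
            if value.name in expanding:
                return True  # the field would contain itself
            return visit(root[value.name], expanding | {value.name})
        if isinstance(value, dict):
            return any(visit(v, expanding) for v in value.values())
        return False  # atoms cannot introduce cycles

    return any(visit(Ref(name), frozenset()) for name in root)


infinite_list = {"list": {"head": 1, "tail": Ref("list")}}
mutual = {"a": {"b": Ref("c")}, "c": {"d": Ref("a")}}
finite = {"a": {"b": Ref("c")}, "c": {"d": 1}}

print(has_structural_cycle(infinite_list))  # True
print(has_structural_cycle(mutual))         # True
print(has_structural_cycle(finite))         # False
```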

It is allowed for a value to define an infinite set of possibilities
without evaluating to an infinite structure itself.

```
// List defines a list of arbitrary length (default null).
List: *null | {
    head: _
    tail: List
}
```

<!--
Consider banning any construct that makes CUE not having a linear
running time expressed in the number of nodes in the output.

This would require restricting constructs like:

(fib&{n:2}).out

fib: {
    n: int

    out: (fib&{n:n-2}).out + (fib&{n:n-1}).out if n >= 2
    out: fib({n:n-2}).out + fib({n:n-1}).out if n >= 2
    out: n if n < 2
}

-->
<!--
### Unused fields

TODO: rules for detection of unused fields

1. Any alias value must be used
-->


## Modules, instances, and packages

CUE configurations are constructed by combining _instances_.
An instance, in turn, is constructed from one or more source files belonging
to the same _package_ that together declare the data representation.
Elements of this data representation may be exported and used
in other instances.

### Source file organization

Each source file consists of an optional package clause defining the collection
of files to which it belongs,
followed by a possibly empty set of import declarations that declare
packages whose contents it wishes to use, followed by a possibly empty set of
declarations.

Like with a struct, a source file may contain embeddings.
Unlike with a struct, the embedded expressions may be any value.
If the result of the unification of all embedded values is not a struct,
it will be output instead of its enclosing file when exporting CUE
to a data format.

```
SourceFile = [ PackageClause "," ] { ImportDecl "," } { Declaration "," } .
```

```
"Hello \(place)!"

place: "world"

// Outputs "Hello world!"
```

### Package clause

A package clause is an optional clause that defines the package to which
a source file belongs.

```
PackageClause  = "package" PackageName .
PackageName    = identifier .
```

The PackageName must not be the blank identifier.

```
package math
```

### Modules and instances

A _module_ defines a tree of directories, rooted at the _module root_.

All source files within a module that declare the same package name belong
to the same package.
<!-- jba: I can't make sense of the above sentence. -->
A module may define multiple packages.

An _instance_ of a package is any subset of files belonging
to the same package.
<!-- jba: Are you saying that -->
<!-- if I have a package with files a, b and c, then there are 8 instances of -->
<!-- that package, some of which are {a, b}, {c}, {b, c}, and so on? What's the -->
<!-- purpose of that definition? -->
It is interpreted as the concatenation of these files.

An implementation may impose conventions on the layout of package files
to determine which files of a package belong to an instance.
For example, an instance may be defined as the subset of package files
belonging to a directory and all its ancestors.
<!-- jba: OK, that helps a little, but I still don't see what the purpose is. -->


### Import declarations

An import declaration states that the source file containing the declaration
depends on definitions of the _imported_ package (§Program initialization and
execution) and enables access to exported identifiers of that package.
The import names an identifier (PackageName) to be used for access and an
ImportPath that specifies the package to be imported.

```
ImportDecl       = "import" ( ImportSpec | "(" { ImportSpec "," } ")" ) .
ImportSpec       = [ PackageName ] ImportPath .
ImportLocation   = { unicode_value } .
ImportPath       = `"` ImportLocation [ ":" identifier ] `"` .
```

The PackageName is used in qualified identifiers to access
exported identifiers of the package within the importing source file.
It is declared in the file block.
It defaults to the identifier specified in the package clause of the imported
package, which must match either the last path component of ImportLocation
or the identifier following it.

<!--
Note: this deviates from the Go spec where there is no such restriction.
This restriction has the benefit of being able to determine the identifiers
for packages from within the file itself. But for CUE it has another benefit:
when using package hierarchies, one is more likely to want to include multiple
packages within the same directory structure. This mechanism allows
disambiguation in these cases.
-->

The interpretation of the ImportPath is implementation-dependent but it is
typically either the path of a builtin package or a fully qualifying location
of a package within a source code repository.

An ImportLocation must be a non-empty string using only characters belonging to
Unicode's L, M, N, P, and S general categories
(the Graphic characters without spaces)
and may not include the characters !"#$%&'()*,:;<=>?[\]^`{|}
or the Unicode replacement character U+FFFD.

Assume we have a package containing the package clause "package math",
which exports function Sin at the path identified by "lib/math".
This table illustrates how Sin is accessed in files
that import the package after the various types of import declaration.

```
Import declaration          Local name of Sin

import   "lib/math"         math.Sin
import   "lib/math:math"    math.Sin
import m "lib/math"         m.Sin
```
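
The character rule for ImportLocation and the naming rule shown in the table can be sketched in Python as follows. Both helper names are hypothetical. The name derivation assumes that path components are separated by `/`, which the specification leaves to the implementation, and it infers the package name from the import path alone rather than reading the package clause of the imported package.

```python
import unicodedata
from typing import Optional

# Characters excluded explicitly, plus the Unicode replacement character.
_FORBIDDEN = set("!\"#$%&'()*,:;<=>?[\\]^`{|}") | {"\ufffd"}


def valid_import_location(loc: str) -> bool:
    """Non-empty; only L, M, N, P, S category characters; none forbidden."""
    if not loc:
        return False
    return all(
        unicodedata.category(ch)[0] in "LMNPS" and ch not in _FORBIDDEN
        for ch in loc
    )


def local_package_name(import_path: str, alias: Optional[str] = None) -> str:
    """Name used in qualified identifiers for the given import."""
    if alias is not None:
        return alias                      # import m "lib/math"
    location, sep, ident = import_path.partition(":")
    if sep:
        return ident                      # import "lib/math:math"
    return location.rsplit("/", 1)[-1]    # import "lib/math"


print(local_package_name("lib/math"))             # math
print(local_package_name("lib/math:math"))        # math
print(local_package_name("lib/math", alias="m"))  # m
print(valid_import_location("lib/math"))          # True
print(valid_import_location("lib math"))          # False
```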

An import declaration declares a dependency relation between the importing and
imported package. It is illegal for a package to import itself, directly or
indirectly, or to directly import a package without referring to any of its
exported identifiers.


### An example package

TODO