<!--
 Copyright 2018 The CUE Authors

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

# The CUE Language Specification

## Introduction

This is a reference manual for the CUE data constraint language.
CUE, pronounced cue or Q, is a general-purpose and strongly typed
constraint-based language.
It can be used for data templating, data validation, code generation, scripting,
and many other applications involving structured data.
The CUE tooling, layered on top of CUE, provides
a general-purpose scripting language for creating scripts as well as
simple servers, also expressed in CUE.

CUE was designed with cloud configuration, and related systems, in mind,
but is not limited to this domain.
It derives its formalism from relational programming languages.
This formalism allows for managing and reasoning over large amounts of
data in a straightforward manner.

The grammar is compact and regular, allowing for easy analysis by automatic
tools such as integrated development environments.

This document is maintained by mpvl@golang.org.
CUE has a lot of similarities with the Go language. This document draws heavily
from the Go specification as a result.

CUE draws its influence from many languages.
Its main influences were BCL/GCL (internal to Google),
LKB (LinGO), Go, and JSON.
Others are Swift, TypeScript, JavaScript, Prolog, NCL (internal to Google),
Jsonnet, HCL, Flabbergast, Nix, JSONPath, Haskell, Objective-C, and Python.


## Notation

The syntax is specified using Extended Backus-Naur Form (EBNF):

```
Production  = production_name "=" [ Expression ] "." .
Expression  = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term        = production_name | token [ "…" token ] | Group | Option | Repetition .
Group       = "(" Expression ")" .
Option      = "[" Expression "]" .
Repetition  = "{" Expression "}" .
```

Productions are expressions constructed from terms and the following operators,
in increasing precedence:

```
|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)
```

Lower-case production names are used to identify lexical tokens. Non-terminals
are in CamelCase. Lexical tokens are enclosed in double quotes "" or back quotes
``.

The form a … b represents the set of characters from a through b as
alternatives. The horizontal ellipsis … is also used elsewhere in the spec to
informally denote various enumerations or code snippets that are not further
specified. The character … (as opposed to the three characters ...) is not a
token of the CUE language.


## Source code representation

Source code is Unicode text encoded in UTF-8.
Unless otherwise noted, the text is not canonicalized, so a single
accented code point is distinct from the same character constructed from
combining an accent and a letter; those are treated as two code points.
For simplicity, this document will use the unqualified term character to refer
to a Unicode code point in the source text.

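
The effect of not canonicalizing source text can be illustrated outside of CUE. The following Python sketch is only an illustration, not part of any CUE implementation; the helper `code_points` is a hypothetical name. It compares a precomposed accented character with its combining form.

```python
import unicodedata

# U+00E9 is the precomposed "é"; "e" followed by U+0301 (combining acute
# accent) renders the same way but is a different sequence of code points.
precomposed = "\u00e9"
combining = "e\u0301"


def code_points(text: str) -> list:
    """Return the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(precomposed))  # ['U+00E9']
print(code_points(combining))    # ['U+0065', 'U+0301']

# Without canonicalization the two spellings are distinct.
print(precomposed == combining)  # False

# NFC normalization would make them equal, but the rules above do not
# apply any such normalization to source text.
print(unicodedata.normalize("NFC", combining) == precomposed)  # True
```
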
Each code point is distinct; for instance, upper and lower case letters are
different characters.

Implementation restriction: For compatibility with other tools, a compiler may
disallow the NUL character (U+0000) in the source text.

Implementation restriction: For compatibility with other tools, a compiler may
ignore a UTF-8-encoded byte order mark (U+FEFF) if it is the first Unicode code
point in the source text. A byte order mark may be disallowed anywhere else in
the source.


### Characters

The following terms are used to denote specific Unicode character classes:

```
newline        = /* the Unicode code point U+000A */ .
unicode_char   = /* an arbitrary Unicode code point except newline */ .
unicode_letter = /* a Unicode code point classified as "Letter" */ .
unicode_digit  = /* a Unicode code point classified as "Number, decimal digit" */ .
```

In The Unicode Standard 8.0, Section 4.5 "General Category" defines a set of
character categories.
CUE treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo
as Unicode letters, and those in the Number category Nd as Unicode digits.


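
As a rough illustration of the category test described above, the following Python sketch classifies single characters by their General Category. The function names are hypothetical. Python's `unicodedata` module follows the Unicode version bundled with the interpreter, which may differ from 8.0, so results for recently added characters could differ.

```python
import unicodedata

# The Letter categories named in the text above.
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}


def is_unicode_letter(ch: str) -> bool:
    """Report whether a single character is in one of the Letter categories."""
    return unicodedata.category(ch) in LETTER_CATEGORIES


def is_unicode_digit(ch: str) -> bool:
    """Report whether a single character is in the category Nd."""
    return unicodedata.category(ch) == "Nd"


print(is_unicode_letter("α"))  # True  (Ll)
print(is_unicode_letter("日"))  # True  (Lo)
print(is_unicode_letter("_"))  # False (Pc; the underscore is added separately)
print(is_unicode_digit("5"))   # True  (Nd)
print(is_unicode_digit("²"))   # False (No, not a decimal digit)
```
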
### Letters and digits

The underscore character _ (U+005F) is considered a letter.

```
letter        = unicode_letter | "_" .
decimal_digit = "0" … "9" .
octal_digit   = "0" … "7" .
hex_digit     = "0" … "9" | "A" … "F" | "a" … "f" .
```


## Lexical elements

### Comments

Comments serve as program documentation.
CUE supports line comments that start with the character sequence //
and stop at the end of the line.

A comment cannot start inside a string literal or inside a comment.
A comment acts like a newline.


### Tokens

Tokens form the vocabulary of the CUE language. There are four classes:
identifiers, keywords, operators and punctuation, and literals. White space,
formed from spaces (U+0020), horizontal tabs (U+0009), carriage returns
(U+000D), and newlines (U+000A), is ignored except as it separates tokens that
would otherwise combine into a single token. Also, a newline or end of file may
trigger the insertion of a comma. While breaking the input into tokens, the
next token is the longest sequence of characters that form a valid token.


### Commas

The formal grammar uses commas "," as terminators in a number of productions.
CUE programs may omit most of these commas using the following rule:

When the input is broken into tokens, a comma is automatically inserted into
the token stream immediately after a line's final token if that token is

- an identifier
- null, true, false, bottom, or an integer, floating-point, or string literal
- one of the characters ), ], or }


Although commas are automatically inserted, the parser will require
explicit commas between two list elements.

To reflect idiomatic use, examples in this document elide commas using
this rule.


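
The comma insertion rule can be sketched as a post-processing step over already tokenized lines. The Python below is a minimal sketch under stated assumptions: the token kinds (`"identifier"`, `"keyword"`, `"int"`, `"float"`, `"string"`, `"bottom"`, `"punct"`) and both function names are hypothetical, and a real scanner would perform the insertion while tokenizing.

```python
# Hypothetical token kinds; a real scanner would define its own.
COMMA_AFTER_KINDS = {"identifier", "int", "float", "string", "bottom"}
COMMA_AFTER_KEYWORDS = {"null", "true", "false"}
COMMA_AFTER_PUNCT = {")", "]", "}"}


def needs_comma(kind: str, text: str) -> bool:
    """Report whether a comma is inserted after a line's final token."""
    if kind in COMMA_AFTER_KINDS:
        return True
    if kind == "keyword":
        return text in COMMA_AFTER_KEYWORDS
    if kind == "punct":
        return text in COMMA_AFTER_PUNCT
    return False


def insert_commas(lines):
    """Append a comma token to each line whose final token qualifies."""
    result = []
    for tokens in lines:
        tokens = list(tokens)
        if tokens and needs_comma(*tokens[-1]):
            tokens.append(("punct", ","))
        result.append(tokens)
    return result


# Tokens for the four source lines:   a: 1  /  b: {  /  c: "x"  /  }
lines = [
    [("identifier", "a"), ("punct", ":"), ("int", "1")],
    [("identifier", "b"), ("punct", ":"), ("punct", "{")],
    [("identifier", "c"), ("punct", ":"), ("string", '"x"')],
    [("punct", "}")],
]
for tokens in insert_commas(lines):
    print(" ".join(text for _, text in tokens))
```

Only the line ending in `{` is left without a trailing comma, since `{` is not in the list of qualifying tokens.
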
### Identifiers

Identifiers name entities such as fields and aliases.
An identifier is a sequence of one or more letters (which includes `_` and `$`)
and digits, optionally preceded by `#` or `_#`.
It may not be `_` or `$`.
The first character in an identifier, or after a `#` if it contains one,
must be a letter.
Identifiers starting with a `#` or `_` are reserved for definitions and hidden
fields.

<!--
TODO: allow identifiers as defined in Unicode UAX #31
(https://unicode.org/reports/tr31/).

Identifiers are normalized using the NFC normal form.
-->

```
identifier  = [ "#" | "_#" ] letter { letter | unicode_digit } .
```

```
a
_x9
fieldName
αβ
```

<!-- TODO: Allow Unicode identifiers TR 32 http://unicode.org/reports/tr31/ -->

Some identifiers are [predeclared](#predeclared-identifiers).


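
The `identifier` production can be checked mechanically. The following Python sketch is an illustration only, with hypothetical function names. It treats `$` as a letter because the prose above says so, even though the `letter` production shown earlier lists only `_`, and it does not exclude keywords.

```python
import unicodedata

LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}


def is_letter(ch: str) -> bool:
    """A Unicode letter, or "_" or "$" (the latter per the prose above)."""
    return ch in ("_", "$") or unicodedata.category(ch) in LETTER_CATEGORIES


def is_digit(ch: str) -> bool:
    """A Unicode decimal digit (category Nd)."""
    return unicodedata.category(ch) == "Nd"


def is_identifier(text: str) -> bool:
    """Check text against: [ "#" | "_#" ] letter { letter | unicode_digit }."""
    if text in ("_", "$"):
        return False  # explicitly excluded by the prose
    if text.startswith("_#"):
        rest = text[2:]
    elif text.startswith("#"):
        rest = text[1:]
    else:
        rest = text
    if not rest or not is_letter(rest[0]):
        return False
    return all(is_letter(ch) or is_digit(ch) for ch in rest[1:])


for candidate in ["a", "_x9", "fieldName", "αβ", "#Def", "_#hidden", "_", "9a"]:
    print(candidate, is_identifier(candidate))
```
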
### Keywords

CUE has a limited set of keywords.
In addition, CUE reserves all identifiers starting with `__` (double underscores)
as keywords.
These are typically targets of pre-declared identifiers.

All keywords may be used as labels (field names).
Unless noted otherwise, they can also be used as identifiers to refer to
the same name.


#### Values

The following keywords are values.

```
null      true      false
```

These can never be used to refer to a field of the same name.
This restriction is to ensure compatibility with JSON configuration files.


#### Preamble

The following keywords are used in the preamble of a CUE file.
After the preamble, they may be used as identifiers to refer to namesake fields.

```
package      import
```


#### Comprehension clauses

The following keywords are used in comprehensions.

```
for      in      if      let
```

<!--
TODO:
    reduce [to]
    order [by]
-->


#### Arithmetic

The following pseudo keywords can be used as operators in expressions.

```
div      mod      quo      rem
```

### Operators and punctuation

The following character sequences represent operators and punctuation:

```
+     div   &&    ==    <     =     (     )
-     mod   ||    !=    >     :     {     }
*     quo   &     =~    <=    ?     [     ]     ,
/     rem   |     !~    >=    !     _|_   ...   .
```
<!--
Free tokens: ; ~ ^
// To be used:
  @   at: associative lists.

// Idea: use # instead of @ for attributes and allow them at declaration level.
// This will open up the possibility of defining #! at the start of a file
// without requiring special syntax. Although probably not quite.
 -->


### Integer literals

An integer literal is a sequence of digits representing an integer value.
An optional prefix sets a non-decimal base: 0o for octal,
0x or 0X for hexadecimal, and 0b for binary.
In hexadecimal literals, letters a-f and A-F represent values 10 through 15.
All integers allow interstitial underscores "_";
these have no meaning and are solely for readability.

Decimal integers may have an SI or IEC multiplier.
Multipliers can be used with fractional numbers.
When multiplying a fraction by a multiplier, the result is truncated
towards zero if it is not an integer.

```
int_lit     = decimal_lit | si_lit | octal_lit | binary_lit | hex_lit .
decimal_lit = ( "1" … "9" ) { [ "_" ] decimal_digit } .
decimals    = decimal_digit { [ "_" ] decimal_digit } .
si_lit      = decimals [ "." decimals ] multiplier |
              "." decimals multiplier .
binary_lit  = "0b" binary_digit { binary_digit } .
hex_lit     = "0" ( "x" | "X" ) hex_digit { [ "_" ] hex_digit } .
octal_lit   = "0o" octal_digit { [ "_" ] octal_digit } .
multiplier  = ( "K" | "M" | "G" | "T" | "P" ) [ "i" ] .

float_lit   = decimals "." [ decimals ] [ exponent ] |
              decimals exponent |
              "." decimals [ exponent ] .
exponent    = ( "e" | "E" ) [ "+" | "-" ] decimals .
```
<!--
TODO: consider allowing Exo (and up), if not followed by a sign
or number. Alternatively one could only allow Ei, Yi, and Zi.
-->

```
42
1.5Gi
170_141_183_460_469_231_731_687_303_715_884_105_727
0xBad_Face
0o755
0b0101_0001
```

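
The multiplier rule, including truncation towards zero, can be worked through with exact rational arithmetic. The Python below is a sketch rather than CUE's implementation. The helper name `si_literal_value` is hypothetical, and the sketch assumes the usual meanings of the prefixes: powers of 1000 for SI and powers of 1024 for the IEC forms ending in `i`.

```python
import math
from fractions import Fraction

# Decimal (SI) factors; the IEC forms ("Ki", "Mi", ...) use powers of 1024.
SI_FACTORS = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15}
IEC_FACTORS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50}


def si_literal_value(literal: str) -> int:
    """Evaluate a literal such as "1.5Gi" or "42K" to an integer."""
    text = literal.replace("_", "")  # underscores carry no meaning
    if text[-2:] in IEC_FACTORS:
        mantissa, factor = text[:-2], IEC_FACTORS[text[-2:]]
    elif text[-1:] in SI_FACTORS:
        mantissa, factor = text[:-1], SI_FACTORS[text[-1:]]
    else:
        raise ValueError(f"no multiplier in {literal!r}")
    if mantissa.startswith("."):
        mantissa = "0" + mantissa  # ".5K" has no integer part
    # Exact rational arithmetic avoids binary floating-point rounding;
    # math.trunc rounds the product towards zero.
    return math.trunc(Fraction(mantissa) * factor)


print(si_literal_value("1.5Gi"))    # 1610612736
print(si_literal_value("42K"))      # 42000
print(si_literal_value("1.0005K"))  # 1000 (1000.5 truncated towards zero)
```
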
### Decimal floating-point literals

A decimal floating-point literal is a representation of
a decimal floating-point value (a _float_).
It has an integer part, a decimal point, a fractional part, and an
exponent part.
The integer and fractional parts comprise decimal digits; the
exponent part is an `e` or `E` followed by an optionally signed decimal exponent.
One of the integer part or the fractional part may be elided; one of the decimal
point or the exponent may be elided.

```
float_lit = decimals "." [ decimals ] [ exponent ] |
            decimals exponent |
            "." decimals [ exponent ] .
exponent  = ( "e" | "E" ) [ "+" | "-" ] decimals .
```

```
0.
72.40
072.40  // == 72.40
2.71828
1.e+0
6.67428e-11
1E6
.25
.12345E+5
```


### String and byte sequence literals

A string literal represents a string constant obtained from concatenating a
sequence of characters.
A byte sequence is a sequence of bytes.

String and byte sequence literals are character sequences between,
respectively, double and single quotes, as in `"bar"` and `'bar'`.
Within the quotes, any character may appear except newline and,
respectively, unescaped double or single quote.
String literals must be valid UTF-8.
Byte sequences may contain any sequence of bytes.

Several escape sequences allow arbitrary values to be encoded as ASCII text.
An escape sequence starts with an _escape delimiter_, which is `\` by default.
The escape delimiter may be altered to be `\` plus a fixed number of
hash symbols `#`
by padding the start and end of a string or byte sequence literal
with this number of hash symbols.

There are four ways to represent the integer value as a numeric constant: `\x`
followed by exactly two hexadecimal digits; `\u` followed by exactly four
hexadecimal digits; `\U` followed by exactly eight hexadecimal digits; and a
plain backslash `\` followed by exactly three octal digits.
In each case the value of the literal is the value represented by the
digits in the corresponding base.
Hexadecimal and octal escapes are only allowed within byte sequences
(single quotes).

Although these representations all result in an integer, they have different
valid ranges.
Octal escapes must represent a value between 0 and 255 inclusive.
Hexadecimal escapes satisfy this condition by construction.
The escapes `\u` and `\U` represent Unicode code points so within them
some values are illegal, in particular those above `0x10FFFF`.
Surrogate halves are allowed,
but are translated into their non-surrogate equivalent internally.

The three-digit octal (`\nnn`) and two-digit hexadecimal (`\xnn`) escapes
represent individual bytes of the resulting string; all other escapes represent
the (possibly multi-byte) UTF-8 encoding of individual characters.
Thus inside a string literal `\377` and `\xFF` represent a single byte of
value `0xFF=255`, while `ÿ`, `\u00FF`, `\U000000FF` and `\xc3\xbf` represent
the two bytes `0xc3 0xbf` of the UTF-8
encoding of character `U+00FF`.

After the escape delimiter, the following single-character escapes represent
special values:

```
\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000B vertical tab
\/   U+002F slash (solidus)
\\   U+005C backslash
\'   U+0027 single quote  (valid escape only within single quoted literals)
\"   U+0022 double quote  (valid escape only within double quoted literals)
```

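
The difference between byte escapes and code point escapes described above can be made concrete. The following Python sketch decodes individual escapes written with the default `\` delimiter. Both function names are hypothetical, surrogate halves are not handled, and the sketch is not meant to mirror how a CUE scanner is structured.

```python
def decode_byte_escape(escape: str) -> bytes:
    r"""Decode a three-digit octal (\nnn) or two-digit hex (\xnn) escape."""
    if len(escape) != 4 or not escape.startswith("\\"):
        raise ValueError(f"malformed byte escape: {escape!r}")
    if escape[1] == "x":
        value = int(escape[2:], 16)  # two hex digits always fit in a byte
    else:
        value = int(escape[1:], 8)   # three octal digits can reach 0o777
    if value > 255:
        raise ValueError(f"byte escape out of range: {escape!r}")
    return bytes([value])


def decode_unicode_escape(escape: str) -> bytes:
    r"""Decode a \uXXXX or \UXXXXXXXX escape to its UTF-8 encoding."""
    digits = {"\\u": 4, "\\U": 8}.get(escape[:2])
    if digits is None or len(escape) != 2 + digits:
        raise ValueError(f"malformed Unicode escape: {escape!r}")
    code_point = int(escape[2:], 16)
    if code_point > 0x10FFFF:
        raise ValueError(f"invalid Unicode code point: {escape!r}")
    return chr(code_point).encode("utf-8")


print(decode_byte_escape("\\377"))           # b'\xff'
print(decode_byte_escape("\\xFF"))           # b'\xff'
print(decode_unicode_escape("\\u00FF"))      # b'\xc3\xbf'
print(decode_unicode_escape("\\U000000FF"))  # b'\xc3\xbf'
```

The byte escapes yield the single byte `0xFF`, whereas the code point escapes yield the two-byte UTF-8 encoding of `U+00FF`.
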
The escape `\(` is used as an escape for string interpolation.
A `\(` must be followed by a valid CUE Expression, followed by a `)`.

All other sequences starting with a backslash are illegal inside literals.

```
escaped_char     = `\` { `#` } ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | "/" | `\` | "'" | `"` ) .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = `\` octal_digit octal_digit octal_digit .
hex_byte_value   = `\` "x" hex_digit hex_digit .
little_u_value   = `\` "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = `\` "U" hex_digit hex_digit hex_digit hex_digit
                           hex_digit hex_digit hex_digit hex_digit .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
interpolation    = "\(" Expression ")" .

string_lit       = simple_string_lit |
                   multiline_string_lit |
                   simple_bytes_lit |
                   multiline_bytes_lit |
                   `#` string_lit `#` .

simple_string_lit    = `"` { unicode_value | interpolation } `"` .
simple_bytes_lit     = `'` { unicode_value | interpolation | byte_value } `'` .
multiline_string_lit = `"""` newline
                             { unicode_value | interpolation | newline }
                             newline `"""` .
multiline_bytes_lit  = "'''" newline
                             { unicode_value | interpolation | byte_value | newline }
                             newline "'''" .
```

Carriage return characters (`\r`) inside string literals are discarded from
the string value.

```
'a\000\xab'
'\007'
'\377'
'\xa'        // illegal: too few hexadecimal digits
"\n"
"\""
'Hello, world!\n'
"Hello, \( name )!"
"日本語"
"\u65e5本\U00008a9e"
"\xff\u00FF"
"\uD800"             // illegal: surrogate half (TODO: probably should allow)
"\U00110000"         // illegal: invalid Unicode code point

#"This is not an \(interpolation)"#
#"This is an \#(interpolation)"#
#"The sequence "\U0001F604" renders as \#U0001F604."#
```

These examples all represent the same string:

```
"日本語"                                 // UTF-8 input text
'日本語'                                 // UTF-8 input text as byte sequence
`日本語`                                 // UTF-8 input text as a raw literal
"\u65e5\u672c\u8a9e"                    // the explicit Unicode code points
"\U000065e5\U0000672c\U00008a9e"        // the explicit Unicode code points
"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e"  // the explicit UTF-8 bytes
```

If the source code represents a character as two code points, such as a
combining form involving an accent and a letter, the result will appear as two
code points if placed in a string literal.

Strings and byte sequences have a multiline equivalent.
Multiline strings are like their single-line equivalent,
but allow newline characters.

Multiline strings and byte sequences respectively start with
a triple double quote (`"""`) or triple single quote (`'''`),
immediately followed by a newline, which is discarded from the string contents.
The string is closed by a matching triple quote, which must be by itself
on a new line, preceded by optional whitespace.
The newline preceding the closing quote is discarded from the string contents.
The whitespace before a closing triple quote must appear before any non-empty
line after the opening quote and will be removed from each of these
lines in the string literal.
A closing triple quote may not appear in the string.
To include it, it suffices to escape one of the quotes.

```
"""
    lily:
    out of the water
    out of itself

    bass
    picking bugs
    off the moon
        — Nick Virgilio, Selected Haiku, 1988
    """
```

This represents the same string as:

```
"lily:\nout of the water\nout of itself\n\n" +
"bass\npicking bugs\noff the moon\n" +
"    — Nick Virgilio, Selected Haiku, 1988"
```

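
The whitespace handling of multiline literals can be sketched as follows. This Python function is a simplified illustration with a hypothetical name: it takes the raw text between the newline after the opening quote and the closing quote, ignores escapes and interpolations, and treats only completely empty lines as exempt from the indentation requirement.

```python
def multiline_value(body: str) -> str:
    """Compute the contents of a multiline literal.

    body is the raw text after the opening quote's newline, up to but not
    including the closing triple quote. Its last line is therefore the
    whitespace that precedes the closing quote.
    """
    *lines, closing_indent = body.split("\n")
    if closing_indent.strip():
        raise ValueError("closing quote must be on a line by itself")
    stripped = []
    for line in lines:
        if line == "":
            stripped.append("")  # empty lines need not carry the indent
        elif line.startswith(closing_indent):
            stripped.append(line[len(closing_indent):])
        else:
            raise ValueError(f"insufficient indentation: {line!r}")
    # Joining without a trailing newline discards the newline that
    # precedes the closing quote.
    return "\n".join(stripped)


raw = "    a\n      b\n\n    c\n    "
print(repr(multiline_value(raw)))  # 'a\n  b\n\nc'
```

Indentation beyond that of the closing quote, as on the second line of `raw`, is kept as part of the string.
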
<!-- TODO: other values

Support for other values:
- Duration literals
- regular expressions: `re("[a-z]")`
-->


## Values

In addition to simple values like `"hello"` and `42.0`, CUE has _structs_.
A struct is a map from labels to values, like `{a: 42.0, b: "hello"}`.
Structs are CUE's only way of building up complex values;
lists, which we will see later,
are defined in terms of structs.

All possible values are ordered in a lattice,
a partial order where every two elements have a single greatest lower bound.
A value `a` is an _instance_ of a value `b`,
denoted `a ⊑ b`, if `b == a` or `b` is more general than `a`,
that is if `a` orders before `b` in the partial order
(`⊑` is _not_ a CUE operator).
We also say that `b` _subsumes_ `a` in this case.
In graphical terms, `b` is "above" `a` in the lattice.

At the top of the lattice is the single ancestor of all values, called
_top_, denoted `_` in CUE.
Every value is an instance of top.

At the bottom of the lattice is the value called _bottom_, denoted `_|_`.
A bottom value usually indicates an error.
Bottom is an instance of every value.

An _atom_ is any value whose only instances are itself and bottom.
Examples of atoms are `42.0`, `"hello"`, `true`, and `null`.

A value is _concrete_ if it is either an atom, or a struct all of whose
field values are themselves concrete, recursively.

CUE's values also include what we normally think of as types, like `string` and
`float`.
But CUE does not distinguish between types and values; only the
relationship of values in the lattice is important.
Each CUE "type" subsumes the concrete values that one would normally think
of as part of that type.
For example, `"hello"` is an instance of `string`, and `42.0` is an instance of
`float`.
In addition to `string` and `float`, CUE has `null`, `int`, `bool` and `bytes`.
We informally call these CUE's "basic types".


```
false ⊑ bool
true  ⊑ bool
true  ⊑ true
5.0   ⊑ float
bool  ⊑ _
_|_   ⊑ _
_|_   ⊑ _|_

_     ⋢ _|_
_     ⋢ bool
int   ⋢ bool
bool  ⋢ int
false ⋢ true
true  ⋢ false
float ⋢ 5.0
5     ⋢ 6
```


### Unification

The _unification_ of values `a` and `b`
is defined as the greatest lower bound of `a` and `b`. (That is, the
value `u` such that `u ⊑ a` and `u ⊑ b`,
and for any other value `v` for which `v ⊑ a` and `v ⊑ b`
it holds that `v ⊑ u`.)
Since CUE values form a lattice, the unification of two CUE values is
always unique.

These all follow from the definition of unification:
- The unification of `a` with itself is always `a`.
- The unification of values `a` and `b` where `a ⊑ b` is always `a`.
- The unification of a value with bottom is always bottom.

Unification in CUE is a [binary expression](#Operands), written `a & b`.
It is commutative and associative.
As a consequence, order of evaluation is irrelevant, a property that is key
to many of the constructs in the CUE language as well as the tooling layered
on top of it.


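
The properties of unification listed above can be checked on a small finite model. The Python sketch below is an assumption of this illustration, not how CUE evaluates values: each value is modeled as the set of concrete atoms it admits, so that the instance relation becomes the subset relation and unification becomes set intersection. All names (`TOP`, `BOOL`, `unify`, and so on) are hypothetical.

```python
# Toy finite model: a value is the set of concrete atoms it admits.
TOP = frozenset({"true", "false", "5", "6"})
BOTTOM = frozenset()
BOOL = frozenset({"true", "false"})
INT = frozenset({"5", "6"})


def atom(name: str) -> frozenset:
    """Return the value that admits exactly one concrete atom."""
    return frozenset({name})


def is_instance(a: frozenset, b: frozenset) -> bool:
    """a ⊑ b: every atom admitted by a is also admitted by b."""
    return a <= b


def unify(a: frozenset, b: frozenset) -> frozenset:
    """Greatest lower bound: the atoms admitted by both a and b."""
    return a & b


print(is_instance(atom("true"), BOOL))             # True
print(unify(BOOL, BOOL) == BOOL)                   # True: a & a is a
print(unify(atom("true"), BOOL) == atom("true"))   # True: a ⊑ b gives a
print(unify(INT, BOTTOM) == BOTTOM)                # True: bottom absorbs
print(unify(INT, BOOL) == BOTTOM)                  # True: nothing in common
print(unify(BOOL, INT) == unify(INT, BOOL))        # True: commutative
```
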
<!-- TODO: explicitly mention that disjunction is not a binary operation
but a definition of a single value?-->


### Disjunction

The _disjunction_ of values `a` and `b`
is defined as the least upper bound of `a` and `b`.
(That is, the value `d` such that `a ⊑ d` and `b ⊑ d`,
and for any other value `e` for which `a ⊑ e` and `b ⊑ e`,
it holds that `d ⊑ e`.)
This style of disjunction is sometimes also referred to as sum types.
Since CUE values form a lattice, the disjunction of two CUE values is always unique.


These all follow from the definition of disjunction:
- The disjunction of `a` with itself is always `a`.
- The disjunction of values `a` and `b` where `a ⊑ b` is always `b`.
- The disjunction of a value `a` with bottom is always `a`.
- The disjunction of two bottom values is bottom.

Disjunction in CUE is a [binary expression](#Operands), written `a | b`.
It is commutative, associative, and idempotent.

The unification of a disjunction with another value is equal to the disjunction
composed of the unification of this value with all of the original elements
of the disjunction.
In other words, unification distributes over disjunction.

```
(a_0 | ... | a_n) & b ==> a_0&b | ... | a_n&b.
```

```
Expression                Result
({a:1} | {b:2}) & {c:3}   {a:1, c:3} | {b:2, c:3}
(int | string) & "foo"    "foo"
("a" | "b") & "c"         _|_
```

A disjunction is _normalized_ if there is no element
`a` for which there is another element `b` such that `a ⊑ b`.

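
Distribution and normalization can both be exercised on a small finite model. In the Python sketch below, which is an illustrative assumption rather than CUE's evaluation strategy, a value is the set of concrete atoms it admits, unification is intersection, and disjunction is union. Structs are not modeled, and all names are hypothetical.

```python
from itertools import product

# Toy finite model: a value is the set of concrete atoms it admits.
BOTTOM = frozenset()
INT = frozenset({"1", "2"})
STRING = frozenset({'"a"', '"b"', '"c"', '"foo"'})


def atom(name: str) -> frozenset:
    """Return the value that admits exactly one concrete atom."""
    return frozenset({name})


def unify(a: frozenset, b: frozenset) -> frozenset:
    """Greatest lower bound in the set model."""
    return a & b


def disjoin(a: frozenset, b: frozenset) -> frozenset:
    """Least upper bound in the set model."""
    return a | b


def normalize(elements: list) -> list:
    """Drop duplicates and elements that are instances of another element."""
    unique = list(dict.fromkeys(elements))  # keeps first occurrences in order
    return [a for a in unique if not any(a < b for b in unique)]


# (int | string) & "foo" resolves to "foo"; ("a" | "b") & "c" is bottom.
print(unify(disjoin(INT, STRING), atom('"foo"')) == atom('"foo"'))        # True
print(unify(disjoin(atom('"a"'), atom('"b"')), atom('"c"')) == BOTTOM)    # True

# Unification distributes over disjunction for every combination tried.
values = [BOTTOM, INT, STRING, atom('"a"'), atom('"foo"')]
print(all(
    unify(disjoin(a, b), c) == disjoin(unify(a, c), unify(b, c))
    for a, b, c in product(values, repeat=3)
))  # True

# "a" is subsumed by string, and the repeated "a" is a spurious duplicate.
print(normalize([atom('"a"'), STRING, atom('"a"'), INT]) == [STRING, INT])  # True
```
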
668<!--
669Normalization is important, as we need to account for spurious elements
670For instance "tcp" | "tcp" should resolve to "tcp".
671
672Also consider
673
674 ({a:1} | {b:1}) & ({a:1} | {b:2}) -> {a:1} | {a:1,b:1} | {a:1,b:2},
675
676in this case, elements {a:1,b:1} and {a:1,b:2} are subsumed by {a:1} and thus
677this expression is logically equivalent to {a:1} and should therefore be
678considered to be unambiguous and resolve to {a:1} if a concrete value is needed.
679
680For instance, in
681
682 x: ({a:1} | {b:1}) & ({a:1} | {b:2}) // -> {a:1} | {a:1,b:1} | {a:1,b:2}
683 y: x.a // 1
684
685y should resolve to 1, and not an error.
686
687For comparison, in
688
689 x: ({a:1, b:1} | {b:2}) & {a:1} // -> {a:1,b:1} | {a:1,b:2}
690 y: x.a // _|_
691
692y should be an error as x is still ambiguous before the selector is applied,
693even though `a` resolves to 1 in all cases.
694-->
695
Jonathan Amsterdame4790382019-01-20 10:29:29 -0500696
#### Default values

Any element of a disjunction can be marked as a default
by prefixing it with an asterisk `*`.
Intuitively, when an expression needs to be resolved for an operation other
than unification or disjunction,
non-starred elements are dropped in favor of starred ones if the starred ones
do not resolve to bottom.

More precisely, any value `v` may be associated with a default value `d`,
denoted `(v, d)` (not CUE syntax),
where `d` must be an instance of `v` (`d ⊑ v`).
The rules for unifying and disjoining such values are as follows:

```
U1: (v1, d1) & v2       => (v1&v2, d1&v2)
U2: (v1, d1) & (v2, d2) => (v1&v2, d1&d2)

D1: (v1, d1) | v2       => (v1|v2, d1)
D2: (v1, d1) | (v2, d2) => (v1|v2, d1|d2)
```

Default values may be introduced within disjunctions
by _marking_ terms of a disjunction with an asterisk `*`
([a unary expression](#Operators)).
The default value of a disjunction with marked terms is the disjunction
of those marked terms, applying the following rules for marks:

```
M1: *v        => (v, v)
M2: *(v1, d1) => (v1, d1)
```

In general, any operation `f` in CUE involving default values proceeds along the
following lines:
```
O1: f((v1, d1), ..., (vn, dn)) => (f(v1, ..., vn), f(d1, ..., dn))
```
where, with the exception of disjunction, a value `v` without a default
value is promoted to `(v, v)`.


```
Expression               Value-default pair      Rules applied
*"tcp" | "udp"           ("tcp"|"udp", "tcp")    M1, D1
string | *"foo"          (string, "foo")         M1, D1

*1 | 2 | 3               (1|2|3, 1)              M1, D1

(*1|2|3) | (1|*2|3)      (1|2|3, 1|2)            M1, D1, D2
(*1|2|3) | *(1|*2|3)     (1|2|3, 1|2)            M1, D1, M2, D2
(*1|2|3) | (1|*2|3)&2    (1|2|3, 1|2)            M1, D1, U1, D2

(*1|2) & (1|*2)          (1|2, _|_)              M1, D1, U2

(*1|2) + (1|*2)          ((1|2)+(1|2), 3)        M1, D1, O1
```
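
The rules can be checked mechanically with a small Python model. It rests on a simplifying assumption: a value is a finite set of concrete alternatives, so unification is set intersection, disjunction is set union, and bottom is the empty set. The names `Val`, `lit`, `mark`, `unify`, `disjoin`, `add`, and `resolve` are hypothetical helpers for this sketch, not part of CUE.

```python
from typing import NamedTuple, Optional


class Val(NamedTuple):
    v: frozenset                   # all alternatives of the value
    d: Optional[frozenset] = None  # default alternatives; None means "no default"


def lit(*xs):
    return Val(frozenset(xs))


def mark(x):
    # M1: *v => (v, v);  M2: *(v1, d1) => (v1, d1)
    return x if x.d is not None else Val(x.v, x.v)


def unify(a, b):
    v = a.v & b.v
    if a.d is None and b.d is None:
        return Val(v)
    # U1/U2: a side without a default is promoted to (v, v)
    da = a.v if a.d is None else a.d
    db = b.v if b.d is None else b.d
    return Val(v, da & db)


def disjoin(a, b):
    v = a.v | b.v
    if a.d is None and b.d is None:
        return Val(v)
    # D1: an unmarked side adds nothing to the default; D2: union of defaults
    da = frozenset() if a.d is None else a.d
    db = frozenset() if b.d is None else b.d
    return Val(v, da | db)


def add(a, b):
    # O1 with f = +, applied to every combination of alternatives
    def lift(x, y):
        return frozenset(p + q for p in x for q in y)
    if a.d is None and b.d is None:
        return Val(lift(a.v, b.v))
    da = a.v if a.d is None else a.d
    db = b.v if b.d is None else b.d
    return Val(lift(a.v, b.v), lift(da, db))


def resolve(x):
    # Use the default unless it is absent or bottom (the empty set here).
    return x.d if x.d else x.v


one = disjoin(mark(lit(1)), lit(2, 3))  # *1 | 2 | 3
two = disjoin(lit(1, 3), mark(lit(2)))  # 1 | *2 | 3
print(sorted(resolve(disjoin(one, two))))  # [1, 2]
print(sorted(resolve(unify(one, two))))    # [1, 2, 3] because the default is bottom
```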

The rules of subsumption for defaults can be derived from the above definitions
and are as follows.

```
(v2, d2) ⊑ (v1, d1)  if v2 ⊑ v1 and d2 ⊑ d1
(v1, d1) ⊑ v         if v1 ⊑ v
v ⊑ (v1, d1)         if v ⊑ d1
```

<!--
For the second rule, note that by definition d1 ⊑ v1, so d1 ⊑ v1 ⊑ v.

The last one is so restrictive as v could still be made more specific by
associating it with a default that is not subsumed by d1.

Proof:
  by definition for any d ⊑ v, it holds that (v, d) ⊑ v,
  where the most general value is (v, v).
  Given the subsumption rule for (v2, d2) ⊑ (v1, d1),
  from (v, v) ⊑ v ⊑ (v1, d1) it follows that v ⊑ d1
  exactly defines the boundary of this subsumption.
-->

<!--
(non-normalized entries could also be implicitly marked, allowing writing
int | 1, instead of int | *1, but that can be done in a backwards
compatible way later if really desirable, as long as we require that
disjunction literals be normalized).
-->


```
Expression                        Resolves to
"tcp" | "udp"                     "tcp" | "udp"
*"tcp" | "udp"                    "tcp"
float | *1                        1
*string | 1.0                     string

(*1|2|3) | (1|*2|3)               1|2
(*1|2|3) & (1|*2|3)               1|2|3 // default is _|_

(* >=5 | int) & (* <=5 | int)     5

(*"tcp"|"udp") & ("udp"|*"tcp")   "tcp"
(*"tcp"|"udp") & ("udp"|"tcp")    "tcp"
(*"tcp"|"udp") & "tcp"            "tcp"
(*"tcp"|"udp") & (*"udp"|"tcp")   "tcp" | "udp" // default is _|_

(*true | false) & bool            true
(*true | false) & (true | false)  true

{a: 1} | {b: 1}                   {a: 1} | {b: 1}
{a: 1} | *{b: 1}                  {b:1}
*{a: 1} | *{b: 1}                 {a: 1} | {b: 1}
({a: 1} | {b: 1}) & {a:1}         {a:1}  // after eliminating {a:1,b:1} by normalization
({a:1}|*{b:1}) & ({a:1}|*{b:1})   {b:1}  // after eliminating {a:1,b:1} by normalization
```


### Bottom and errors

Any evaluation error in CUE results in a bottom value, represented by
the token `_|_`.
Bottom is an instance of every other value.

Implementations may associate error strings with different instances of bottom;
logically they all remain the same value.


### Top

Top is represented by the underscore character `_`, lexically an identifier.
Unifying any value `v` with top results in `v` itself.

```
Expr       Result
_ & 5      5
_ & _      _
_ & _|_    _|_
_ | _|_    _
```


### Null

The _null value_ is represented with the keyword `null`.
It has only one parent, top, and one child, bottom.
It is unordered with respect to any other value.

```
null_lit   = "null"
```

```
null & 8     _|_
null & _     null
null & _|_   _|_
```
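
The Top and Null tables can be reproduced with a very small Python model. This is a sketch under stated assumptions: only top, bottom, null, and concrete literals are modeled, types such as `int` are left out, and `TOP`, `BOTTOM`, `NULL`, `unify`, and `disjoin` are names invented for the illustration.

```python
class Sentinel:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


TOP = Sentinel("_")
BOTTOM = Sentinel("_|_")
NULL = Sentinel("null")


def unify(a, b):
    if a is BOTTOM or b is BOTTOM:
        return BOTTOM
    if a is TOP:
        return b
    if b is TOP:
        return a
    # Concrete values (including null) only unify with themselves.
    return a if a == b else BOTTOM


def disjoin(a, b):
    if a is BOTTOM:
        return b
    if b is BOTTOM:
        return a
    if a is TOP or b is TOP:
        return TOP
    return a if a == b else (a, b)  # left as an unevaluated pair in this sketch


print(unify(TOP, 5))         # 5
print(disjoin(TOP, BOTTOM))  # _
print(unify(NULL, 8))        # _|_
print(unify(NULL, TOP))      # null
```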


### Boolean values

A _boolean type_ represents the set of Boolean truth values denoted by
the keywords `true` and `false`.
The predeclared boolean type is `bool`; it is a defined type and a separate
element in the lattice.

```
boolean_lit = "true" | "false"
```

```
bool & true          true
true & true          true
true & false         _|_
bool & (false|true)  false | true
bool & (true|false)  true | false
```


### Numeric values

The _integer type_ represents the set of all integral numbers.
The _decimal floating-point type_ represents the set of all decimal floating-point
numbers.
They are two distinct types.
Both are instances of a generic `number` type.

<!--
                    number
                   /      \
                int      float
-->

The predeclared number, integer, and decimal floating-point types are
`number`, `int` and `float`; they are defined types.
<!--
TODO: should we drop float? It is somewhat more precise and probably a good idea
to have it in the programmatic API, but it may be confusing to have to deal
with it in the language.
-->

A decimal floating-point literal always has type `float`;
it is not an instance of `int` even if it is an integral number.

Integer literals are always of type `int` and don't match type `float`.

Numeric literals are exact values of arbitrary precision.
If the operation permits it, numbers should be kept in arbitrary precision.

Implementation restriction: although numeric values have arbitrary precision
in the language, implementations may implement them using an internal
representation with limited precision.
That said, every implementation must:

- Represent integer values with at least 256 bits.
- Represent floating-point values with a mantissa of at least 256 bits and
  a signed binary exponent of at least 16 bits.
- Give an error if unable to represent an integer value precisely.
- Give an error if unable to represent a floating-point value due to overflow.
- Round to the nearest representable value if unable to represent
  a floating-point value due to limits on precision.

These requirements apply to the result of any expression except for builtin
functions, for which an unusual loss of precision must be explicitly documented.


### Strings

The _string type_ represents the set of UTF-8 strings,
not allowing surrogates.
The predeclared string type is `string`; it is a defined type.

The length of a string `s` (its size in bytes) can be discovered using
the built-in function `len`.
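
To make "size in bytes" concrete, the Python sketch below measures a string by its UTF-8 encoding. The function name `cue_len` is made up for the illustration; it only mirrors the byte-counting behavior described above.

```python
def cue_len(s: str) -> int:
    """Length of a string measured in UTF-8 bytes."""
    return len(s.encode("utf-8"))


print(cue_len("hello"))       # 5
print(cue_len("h\u00e9llo"))  # 6: U+00E9 takes two bytes in UTF-8
print(len("h\u00e9llo"))      # 5: Python's own len counts code points instead
```

Python's UTF-8 encoder also refuses lone surrogates such as `"\ud800"`, which matches the restriction that strings may not contain surrogates.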


### Bytes

The _bytes type_ represents the set of byte sequences.
A byte sequence value is a (possibly empty) sequence of bytes.
The number of bytes is called the length of the byte sequence
and is never negative.
The predeclared byte sequence type is `bytes`; it is a defined type.


### Bounds

A _bound_, syntactically a [unary expression](#Operands), defines
an infinite disjunction of concrete values that can be represented
as a single comparison.

For any [comparison operator](#Comparison-operators) `op` except `==`,
`op a` is the disjunction of every `x` such that `x op a`.

```
2 & >=2 & <=5           // 2, where 2 is either an int or float.
2.5 & >=1 & <=5         // 2.5
2 & >=1.0 & <3.0        // 2.0
2 & >1 & <3.0           // 2.0
2.5 & int & >1 & <5     // _|_
2.5 & float & >1 & <5   // 2.5
int & 2 & >1.0 & <3.0   // _|_
2.5 & >=(int & 1) & <5  // _|_
>=0 & <=7 & >=3 & <=10  // >=3 & <=7
!=null & 1              // 1
>=5 & <=5               // 5
```
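
Two of the behaviors above can be sketched in Python: checking a concrete number against a set of bounds, and tightening a conjunction of `>=` and `<=` bounds. The sketch ignores the `int`/`float` distinction and `!=`, and the names `satisfies`, `unify_with_bounds`, and `tighten` are assumptions of this illustration. `tighten` expects at least one `>=` and one `<=` bound.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}
BOTTOM = "_|_"


def satisfies(x, bounds):
    """True if the concrete number x is an instance of every bound `op a`."""
    return all(OPS[op](x, a) for op, a in bounds)


def unify_with_bounds(x, bounds):
    return x if satisfies(x, bounds) else BOTTOM


def tighten(bounds):
    """Reduce a conjunction of >= and <= bounds to the narrowest interval."""
    lo = max(a for op, a in bounds if op == ">=")
    hi = min(a for op, a in bounds if op == "<=")
    if lo > hi:
        return BOTTOM  # no value satisfies both bounds
    if lo == hi:
        return lo      # exactly one value remains
    return [(">=", lo), ("<=", hi)]


print(unify_with_bounds(2.5, [(">=", 1), ("<=", 5)]))             # 2.5
print(tighten([(">=", 0), ("<=", 7), (">=", 3), ("<=", 10)]))  # [('>=', 3), ('<=', 7)]
print(tighten([(">=", 5), ("<=", 5)]))                           # 5
```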


### Structs

A _struct_ is a set of elements called _fields_, each of
which has a name, called a _label_, and a value.

We say a label is defined for a struct if the struct has a field with the
corresponding label.
The value for a label `f` of struct `a` is denoted `a.f`.
A struct `a` is an instance of `b`, or `a ⊑ b`, if for any label `f`
defined for `b`, label `f` is also defined for `a` and `a.f ⊑ b.f`.
Note that if `a` is an instance of `b` it may have fields with labels that
are not defined for `b`.

The (unique) struct with no fields, written `{}`, has every struct as an
instance. It can be considered the type of all structs.

```
{a: 1} ⊑ {}
{a: 1, b: 1} ⊑ {a: 1}
{a: 1} ⊑ {a: int}
{a: 1, b: 1} ⊑ {a: int, b: float}

{} ⋢ {a: 1}
{a: 2} ⋢ {a: 1}
{a: 1} ⋢ {b: 1}
```
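
The instance relation for structs translates almost directly into code. The Python sketch below assumes flat structs written as dicts, with field values that are either concrete literals or the Python types `int` and `str` standing in for CUE types. `value_instance_of` and `struct_instance_of` are hypothetical names, and booleans and floats are deliberately left out.

```python
def value_instance_of(a, b):
    """a ⊑ b for toy field values: b is a Python type or a concrete literal."""
    if isinstance(b, type):
        return a is b or (not isinstance(a, type) and isinstance(a, b))
    return a == b


def struct_instance_of(a, b):
    """a ⊑ b: every label defined for b is defined for a, with a.f ⊑ b.f."""
    return all(f in a and value_instance_of(a[f], b[f]) for f in b)


print(struct_instance_of({"a": 1}, {}))                # True
print(struct_instance_of({"a": 1, "b": 1}, {"a": 1}))  # True
print(struct_instance_of({"a": 1}, {"a": int}))        # True
print(struct_instance_of({}, {"a": 1}))                # False
print(struct_instance_of({"a": 2}, {"a": 1}))          # False
print(struct_instance_of({"a": 1}, {"b": 1}))          # False
```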

A field may be required or optional.
The successful unification of structs `a` and `b` is a new struct `c` which
has all fields of both `a` and `b`, where
the value of a field `f` in `c` is `a.f & b.f` if `f` is in both `a` and `b`,
or just `a.f` or `b.f` if `f` is in just `a` or `b`, respectively.
If a field `f` is in both `a` and `b`, `c.f` is optional only if both
`a.f` and `b.f` are optional.
Any [references](#References) to `a` or `b`
in their respective field values need to be replaced with references to `c`.
The result of a unification is bottom (`_|_`) if any of its non-optional
fields evaluates to bottom, recursively.

<!--NOTE: About bottom values for optional fields being okay.

The proposition ¬P is a close cousin of P → ⊥ and is often used
as an approximation to avoid the issues of using not.
Bottom (⊥) is also frequently used to mean undefined. This makes sense.
Consider `{a?: 2} & {a?: 3}`.
Both structs say `a` is optional; in other words, it may be omitted.
So we can still get a valid result by omitting `a`, even in
case of a conflict.

Granted, this definition may lead to confusing results, especially in
definitions, when tightening an optional field leads to unintentionally
discarding it.
It could be a role of vet checkers to identify such cases (and suggest users
to explicitly use `_|_` to discard a field, for instance).
-->

Syntactically, a field is marked as optional by following its label with a `?`.
The question mark is not part of the field name.
A struct literal may contain multiple fields with
the same label; the result is a single field, defined as if the fields had
been declared in separate structs that were then unified.

These examples illustrate required fields only.
Examples with optional fields follow below.

```
Expression                             Result (without optional fields)
{a: int, a: 1}                         {a: 1}
{a: int} & {a: 1}                      {a: 1}
{a: >=1 & <=7} & {a: >=5 & <=9}        {a: >=5 & <=7}
{a: >=1 & <=7, a: >=5 & <=9}           {a: >=5 & <=7}

{a: 1} & {b: 2}                        {a: 1, b: 2}
{a: 1, b: int} & {b: 2}                {a: 1, b: 2}

{a: 1} & {a: 2}                        _|_
```
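
For required fields, struct unification can be sketched in a few lines of Python. The model is deliberately small: structs are flat dicts, field values are concrete literals or the Python types `int` and `str`, and bounds, references, and nesting are not handled. `BOTTOM`, `unify_value`, and `unify_struct` are names chosen for this sketch.

```python
BOTTOM = object()  # stands in for _|_


def unify_value(a, b):
    if a == b:
        return a
    if isinstance(a, type) and not isinstance(b, type):
        return b if isinstance(b, a) else BOTTOM
    if isinstance(b, type) and not isinstance(a, type):
        return a if isinstance(a, b) else BOTTOM
    return BOTTOM


def unify_struct(a, b):
    """c has all fields of a and b; shared fields hold a.f & b.f."""
    c = {}
    for f in {**a, **b}:
        if f in a and f in b:
            c[f] = unify_value(a[f], b[f])
        else:
            c[f] = a[f] if f in a else b[f]
        if c[f] is BOTTOM:
            return BOTTOM  # a required field evaluated to bottom
    return c


print(unify_struct({"a": int}, {"a": 1}))            # {'a': 1}
print(unify_struct({"a": 1, "b": int}, {"b": 2}))    # {'a': 1, 'b': 2}
print(unify_struct({"a": 1}, {"a": 2}) is BOTTOM)    # True
```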

Using pattern or default constraints, a struct may define constraints that
apply to fields that are added when it is unified with another struct.

A _pattern constraint_, denoted `[pattern]: value`, defines a pattern, which
is a value of type string, and a value to unify with fields whose label
matches that pattern.
When unifying structs `a` and `b`,
a pattern constraint `[p]: v` declared in `a`
defines that the value `v` should unify with any field in the resulting struct `c`
whose label unifies with pattern `p` and for which there exists no
field in `a` with the same label.

Additionally, a _default constraint_, denoted `...value`, defines a value
to unify with any field for which there is no other declaration in a struct.
When unifying structs `a` and `b`,
a default constraint `...v` declared in `a`
defines that the value `v` should unify with any field in the resulting struct `c`
whose label does not unify with any of the patterns of the pattern
constraints defined for `a` _and_ for which there exists no field in `a`
with that label.
The token `...` is a shorthand for `..._`.


```
a: {
    foo:      string  // foo is a string
    [=~"^i"]: int     // all other fields starting with i are integers
    [=~"^b"]: bool    // all other fields starting with b are booleans
    ...string         // all other fields must be a string
}

b: a & {
    i3:    3
    bar:   true
    other: "a string"
}
```
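
The precedence between regular fields, pattern constraints, and the default constraint can be sketched in Python. The sketch hard-codes struct `a` from the example, reads the two patterns as the regular expressions `^i` and `^b` (as the comments in the example describe), and uses Python types in place of CUE types. `A_FIELDS`, `A_PATTERNS`, `A_DEFAULT`, `constraints_for`, and `accepts` are illustrative names only.

```python
import re

A_FIELDS = {"foo": str}                      # regular fields of `a`
A_PATTERNS = [(r"^i", int), (r"^b", bool)]   # pattern constraints
A_DEFAULT = str                              # the default constraint `...string`


def constraints_for(label):
    """Constraints from `a` that apply to a field with this label."""
    if label in A_FIELDS:
        return [A_FIELDS[label]]   # declared in `a`: patterns do not apply
    matched = [t for p, t in A_PATTERNS if re.search(p, label)]
    return matched or [A_DEFAULT]  # the default applies only if no pattern matched


def accepts(label, value):
    # `type(value) is t` avoids Python treating True as an int.
    return all(type(value) is t for t in constraints_for(label))


# The fields of `b` from the example all satisfy the constraints of `a`.
print(accepts("i3", 3))               # True
print(accepts("bar", True))           # True
print(accepts("other", "a string"))   # True
print(accepts("ip", "10.0.0.1"))      # False: labels starting with i must be int
```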

<!-- NOTE: pattern and default constraints can be made to apply to all
fields by embedding them as a struct:
    x: {
        a: 2
        b: 3
        {[string]: int}
    }
or by writing
    x: [string]: int
    x: {
        a: 2
        b: 3
    }
-->

Concrete field labels may be an identifier or string, the latter of which may be
interpolated.
Fields with identifier labels can be referred to within the scope in which
they are defined; fields with string labels cannot.
References within such interpolated strings are resolved within
the scope of the struct in which the label sequence is
defined and can reference concrete labels lexically preceding
the label within a label sequence.
<!-- We allow this so that rewriting a CUE file to collapse or expand
field sequences has no impact on semantics.
-->

<!--TODO: first implementation round will not yet have expression labels

An ExpressionLabel sets a collection of optional fields to a field value.
By default it defines this value for all possible string labels.
An optional expression limits this to the set of optional fields whose
labels match the expression.
-->


<!-- NOTE: if we allow ...Expr, as in list, it would mean something different. -->


<!-- NOTE:
A DefinitionDecl does not allow repeated labels. This is to avoid
any ambiguity or confusion about whether earlier path components
are to be interpreted as declarations or normal fields (they should
always be normal fields.)
-->

<!--NOTE:
The syntax has been deliberately restricted to allow for the following
future extensions and relaxations:
  - Allow omitting a "?" in an expression label to indicate a concrete
    string value (but maybe we want to use () for that).
  - Make the "?" in expression label optional if expression labels
    are always optional.
  - Or allow eliding the "?" if the expression has no references and
    is obviously not concrete (such as `[string]`).
  - The expression of an expression label may also indicate a struct with
    integer or even number labels
    (beware of imprecise computation in the latter).
      e.g. `{ [int]: string }` is a map of integers to strings.
  - Allow for associative lists (`foo [@.field]: {field: string}`)
  - The `...` notation can be extended analogously to that of a ListList,
    by allowing it to follow with an expression for the remaining properties.
    In that case it is no longer a shorthand for `[string]: _`, but rather
    would define the value for any other value for which there is no field
    defined.
    Like the definition with List, this is somewhat odd, but it allows the
    encoding of JSON schema's and (non-structural) OpenAPI's
    additionalProperties and additionalItems.
-->

```
StructLit       = "{" { Declaration "," } "}" .
Declaration     = Field | Ellipsis | Embedding | LetClause | attribute .
Ellipsis        = "..." [ Expression ] .
Embedding       = Comprehension | AliasExpr .
Field           = Label ":" { Label ":" } Expression { attribute } .
Label           = [ identifier "=" ] LabelExpr .
LabelExpr       = LabelName [ "?" ] | "[" AliasExpr "]" .
LabelName       = identifier | simple_string_lit .

attribute       = "@" identifier "(" attr_tokens ")" .
attr_tokens     = { attr_token |
                    "(" attr_tokens ")" |
                    "[" attr_tokens "]" |
                    "{" attr_tokens "}" } .
attr_token      = /* any token except '(', ')', '[', ']', '{', or '}' */
```

```
Expression                             Result (without optional fields)
a: { foo?: string }                    {}
b: { foo: "bar" }                      { foo: "bar" }
c: { foo?: *"bar" | string }           {}

d: a & b                               { foo: "bar" }
e: b & c                               { foo: "bar" }
f: a & c                               {}
g: a & { foo?: number }                {}
h: b & { foo?: number }                _|_
i: c & { foo: string }                 { foo: "bar" }

intMap: [string]: int
intMap: {
    t1: 43
    t2: 2.4  // error: 2.4 is not an integer
}

nameMap: [string]: {
    firstName: string
    nickName:  *firstName | string
}

nameMap: hank: { firstName: "Hank" }
```
The optional field set defined by `nameMap` matches every field,
in this case just `hank`, and unifies the associated constraint
with the matched field, resulting in:
```
nameMap: hank: {
    firstName: "Hank"
    nickName:  "Hank"
}
```
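
The treatment of optional fields in rows `d`, `g`, and `h` of the table above can be sketched in Python. Assumptions of the sketch: a struct is a dict from label to a `(value, optional)` pair, `str` and `numbers.Number` stand in for `string` and `number`, and defaults (rows involving `c`) are not modeled. All function names are hypothetical.

```python
import numbers

BOTTOM = object()  # stands in for _|_


def unify_value(a, b):
    if a == b:
        return a
    a_type, b_type = isinstance(a, type), isinstance(b, type)
    if a_type and b_type:
        if issubclass(a, b):
            return a
        return b if issubclass(b, a) else BOTTOM
    if a_type:
        return b if isinstance(b, a) else BOTTOM
    if b_type:
        return a if isinstance(a, b) else BOTTOM
    return BOTTOM


def unify_struct(a, b):
    c = {}
    for f in {**a, **b}:
        if f in a and f in b:
            (va, oa), (vb, ob) = a[f], b[f]
            v, opt = unify_value(va, vb), oa and ob  # optional only if both are
        else:
            v, opt = a[f] if f in a else b[f]
        if v is BOTTOM and not opt:
            return BOTTOM  # a non-optional field evaluated to bottom
        c[f] = (v, opt)
    return c


def required_fields(s):
    """The 'Result (without optional fields)' column."""
    if s is BOTTOM:
        return BOTTOM
    return {f: v for f, (v, opt) in s.items() if not opt}


a = {"foo": (str, True)}      # a: { foo?: string }
b = {"foo": ("bar", False)}   # b: { foo: "bar" }
num = {"foo": (numbers.Number, True)}

print(required_fields(unify_struct(a, b)))      # {'foo': 'bar'}
print(required_fields(unify_struct(a, num)))    # {}
print(unify_struct(b, num) is BOTTOM)           # True
```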


#### Closed structs

By default, structs are open to adding fields.
Instances of an open struct `p` may contain fields not defined in `p`.
This makes it easy to add fields, but can lead to bugs:

```
S: {
    field1: string
}

S1: S & { field2: "foo" }

// S1 is { field1: string, field2: "foo" }


A: {
    field1: string
    field2: string
}

A1: A & {
    feild1: "foo"  // "field1" was accidentally misspelled
}

// A1 is
//    { field1: string, field2: string, feild1: "foo" }
// not the intended
//    { field1: "foo", field2: string }
```

A _closed struct_ `c` is a struct whose instances may not declare any field
with a name that does not match the name of a regular or optional field,
or the pattern of a pattern constraint defined in `c`.
Hidden fields are excluded from this limitation.
A struct that is the result of unifying any struct with a [`...`](#Structs)
declaration is defined for all fields.
Recursively closing a struct is equivalent to adding `..._|_` to its root
and any of its substructures that are not defined for all fields.

Syntactically, structs are recursively closed explicitly with
the `close` builtin or implicitly by [definitions](#Definitions).


```
A: close({
    field1: string
    field2: string
})

A1: A & {
    feild1: string
} // _|_ feild1 not defined for A

A2: A & {
    for k,v in { feild1: string } {
        k: v
    }
}  // _|_ feild1 not defined for A

C: close({
    [_]: _
})

C2: C & {
    for k,v in { thisIsFine: string } {
        "\(k)": v
    }
}

D: close({
    // Values generated by comprehensions are treated as embeddings.
    for k,v in { x: string } {
        "\(k)": v
    }
})
```
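
The difference between open and closed structs for a misspelled field can be sketched in Python. The model assumes a flat schema that maps labels to Python types, concrete values on the other side, and no pattern constraints. `unify_flat` and `BOTTOM` are names made up for this sketch; the `closed` flag plays the role of `close`.

```python
BOTTOM = object()  # stands in for _|_


def unify_flat(schema, value, closed=False):
    """Unify a flat schema (label -> Python type) with concrete fields."""
    result = dict(schema)
    for label, v in value.items():
        declared = label in schema
        if closed and not declared and not label.startswith("_"):
            return BOTTOM  # closed structs reject undeclared, non-hidden fields
        if declared and not isinstance(v, schema[label]):
            return BOTTOM  # conflicting value
        result[label] = v
    return result


A = {"field1": str, "field2": str}
typo = {"feild1": "foo"}  # "field1" misspelled

print(sorted(unify_flat(A, typo)))                   # ['feild1', 'field1', 'field2']
print(unify_flat(A, typo, closed=True) is BOTTOM)    # True
```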

<!-- (jba) Somewhere it should be said that optional fields are only
     interesting inside closed structs. -->

<!-- TODO: move embedding section to above the previous one -->

#### Embedding

A struct may contain an _embedded value_, an operand used
as a declaration, which must evaluate to a struct.
An embedded value of type struct is unified with the struct in which it is
embedded, but disregarding the restrictions imposed by closed structs.
A struct resulting from such a unification is closed if either of the involved
structs was closed.

At the top level, an embedded value may be of any type.
In this case, a CUE program will evaluate to the embedded value
and the CUE program may not have top-level regular or optional
fields (definitions and aliases are allowed).

Syntactically, embeddings may be any expression.

```
S1: {
    a: 1
    b: 2
    {
        c: 3
    }
}
// S1 is { a: 1, b: 2, c: 3 }

S2: close({
    a: 1
    b: 2
    {
        c: 3
    }
})
// same as close(S1)

S3: {
    a: 1
    b: 2
    close({
        c: 3
    })
}
// same as S2
```


#### Definitions and hidden fields

A field is a _definition_ if its identifier starts with `#` or `_#`.
A field is _hidden_ if its identifier starts with `_`.
Definitions and hidden fields are not emitted when converting a CUE program
to data and are never required to be concrete.
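
The classification by prefix, and its effect on export, can be sketched in Python. The sketch treats every dict key as an identifier label and ignores nesting. `is_definition`, `is_hidden`, and `export` are hypothetical helper names.

```python
def is_definition(label):
    return label.startswith("#") or label.startswith("_#")


def is_hidden(label):
    return label.startswith("_")


def export(struct):
    """Drop definitions and hidden fields, as when converting to data."""
    return {k: v for k, v in struct.items()
            if not (is_definition(k) or is_hidden(k))}


program = {"#Schema": {}, "_#Internal": {}, "_tmp": 1, "name": "x"}
print(export(program))  # {'name': 'x'}
```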
Marcel van Lohuizen0d0b9ad2019-10-10 18:19:28 +02001341
Marcel van Lohuizen21ca3712020-06-04 11:59:12 +02001342Referencing a definition will implicitely [close](#ClosedStructs) it.
1343A struct that embeds a referenced definition will itself be closed
1344after first allowing any other fields or embedded structs to unify.
1345The result of `{ #A }` is `#A` for any `#A`.
Marcel van Lohuizen0d0b9ad2019-10-10 18:19:28 +02001346
Marcel van Lohuizen21ca3712020-06-04 11:59:12 +02001347If referencing a definition would always result in an error, implementations
1348may report this inconsistency at the point of its declaration.

```
#MyStruct: {
    sub: field: string
}

#MyStruct: {
    sub: enabled?: bool
}

myValue: #MyStruct & {
    sub: feild:   2     // error, feild not defined in #MyStruct
    sub: enabled: true  // okay
}

#D: {
    #OneOf

    c: int // adds this field.
}

#OneOf: { a: int } | { b: int }


D1: #D & { a: 12, c: 22 }  // { a: 12, c: 22 }
D2: #D & { a: 12, b: 33 }  // _|_ // cannot define both `a` and `b`
```
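As a further illustration of the rule that definitions and hidden fields are
not emitted, consider the following sketch.
The names `#Port`, `_defaultPort`, and `service` are hypothetical and chosen
only for this example.
Only the regular field is expected to remain when the configuration is
converted to data.

```
#Port:        int & >0 & <=65_535  // definition: constrains values, not emitted
_defaultPort: 8080                 // hidden field: usable here, not emitted

service: port: #Port & _defaultPort

// Converted to data, only the regular field remains:
//   service: port: 8080
```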


<!---
JSON fields are usually camelCase. Clashes can be avoided by adopting the
convention that definitions be TitleCase. Unexported definitions are still
subject to clashes, but those are likely easier to resolve because they are
package internal.
--->


#### Attributes

Attributes allow associating meta information with values.
Their primary purpose is to define mappings between CUE and
other representations.
Attributes do not influence the evaluation of CUE.

An attribute associates an identifier with a value, a balanced token sequence,
which is a sequence of CUE tokens with balanced brackets (`()`, `[]`, and `{}`).
The sequence may not contain interpolations.

Fields, structs and packages can be associated with a set of attributes.
Attributes accumulate during unification, but implementations may remove
duplicates that have the same source string representation.
The interpretation of an attribute, including the handling of multiple
attributes for a given identifier, is up to the consumer of the attribute.

Field attributes define additional information about a field,
such as a mapping to a protocol buffer <!-- TODO: add link --> tag or an alternative
name of the field when mapping to a different language.


```
// Package attribute
@protobuf(proto3)

myStruct1: {
    // Struct attribute:
    @jsonschema(id="https://example.org/mystruct1.json")

    // Field attributes
    field: string @go(Field)
    attr:  int    @xml(,attr) @go(Attr)
}

myStruct2: {
    field: string @go(Field)
    attr:  int    @xml(a1,attr) @go(Attr)
}

Combined: myStruct1 & myStruct2
// field: string @go(Field)
// attr:  int    @xml(,attr) @xml(a1,attr) @go(Attr)
```


#### Aliases

Aliases name values that can be referred to
within the [scope](#declarations-and-scopes) in which they are declared.
The name of an alias must be unique within its scope.

```
AliasExpr  = Expression | identifier "=" Expression .
```

Aliases can appear in several positions:

<!--- TODO: consider allowing this. It should be considered whether
having field aliases isn't already sufficient.

As a declaration in a struct (`X=value`):

- binds identifier `X` to a value embedded within the struct.
--->

In front of a label (`X=label: value`):

- binds the identifier to the same value as `label` would be bound
  to if it were a valid identifier.
- for optional fields (`foo?: bar` and `[foo]: bar`),
  the bound identifier is only visible within the field value (`bar`).

Inside a bracketed label (`[X=expr]: value`):

- binds the identifier to the concrete label that matches `expr`
  within the instances of the field value (`value`).

Before a list element (`[ X=value, X+1 ]`) (not yet implemented):

- binds the identifier to the list element it precedes within the scope of the
  list expression.

<!-- TODO: explain the difference between aliases and definitions.
     Now that you have definitions, are aliases really necessary?
     Consider removing.
-->

```
// An alias declaration
Alias = 3
a: Alias  // 3

// A field alias
foo: X  // 4
X="not an identifier": 4

// A label alias
[Y=string]: { name: Y }
foo: { value: 1 } // outputs: foo: { name: "foo", value: 1 }
```

<!-- TODO: also allow aliases as lists -->


#### Let declarations

_Let declarations_ bind an identifier to an expression.
The identifier is visible within the [scope](#declarations-and-scopes)
in which it is declared.
The identifier must be unique within its scope.

```
let x = expr

a: x + 1
b: x + 2
```

#### Shorthand notation for nested structs

A field whose value is a struct with a single field may be written as
a colon-separated sequence of the two field names,
followed by a colon and the value of that single field.

```
job: myTask: replicas: 2
```
expands to
```
job: {
    myTask: {
        replicas: 2
    }
}
```
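Because fields with the same label unify, the shorthand can be repeated to
add sibling fields to the same nested struct.
A small sketch; the `image` field is hypothetical and added only for
illustration:

```
job: myTask: replicas: 2
job: myTask: image:    "nginx"

// is equivalent to
job: {
    myTask: {
        replicas: 2
        image:    "nginx"
    }
}
```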

<!-- OPTIONAL FIELDS:

The optional marker solves the issue of having to print large amounts of
boilerplate when dealing with large types with many optional or default
values (such as Kubernetes).
Writing such optional values in terms of *null | value is tedious,
unpleasant to read, and as it is not well defined what can be dropped or not,
all null values have to be emitted from the output, even if the user
doesn't override them.
Part of the issue is how null is defined. We could adopt a TypeScript-like
approach of introducing "void" or "undefined" to mean "not defined and not
part of the output". But having all of null, undefined, and void can be
confusing. If these ever are introduced anyway, the ? operator could be
expressed along the lines of
   foo?: bar
being a shorthand for
   foo: void | bar
where void is the default if no other default is given.

The current mechanical definition of "?" is straightforward, though, and
probably avoids the need for void, while solving a big issue.

Caveats:
[1] this definition requires explicitly defined fields to be emitted, even
if they could be elided (for instance if the explicit value is the default
value defined for an optional field). This is probably a good thing.

[2] a default value may still need to be included in an output if it is not
the zero value for that field and it is not known if any outside system is
aware of defaults. For instance, which defaults are specified by the user
and which by the schema understood by the receiving system.
The use of "?" together with defaults should therefore be used carefully
in non-schema definitions.
Problematic cases should be easy to detect by a vet-like check, though.

[3] It should be considered how this affects the trim command.
Should values implied by optional fields be allowed to be removed?
Probably not. This restriction is unlikely to limit the usefulness of trim,
though.

[4] There should be an option to emit all concrete optional values.
-->

### Lists

A list literal defines a new value of type list.
A list may be open or closed.
An open list is indicated with a `...` at the end of an element list,
optionally followed by a value for the remaining elements.

The length of a closed list is the number of elements it contains.
The length of an open list is its number of elements as a lower bound
and an unlimited number of elements as its upper bound.

```
ListLit       = "[" [ ElementList [ "," [ Ellipsis ] ] [ "," ] ] "]" .
ElementList   = Embedding { "," Embedding } .
```
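The difference between open and closed lists shows up under unification.
The following sketch uses hypothetical field names and assumes the usual
rules for lists: elements unify position by position, and the value after
`...` constrains any additional elements.

```
closed: [1, 2]          // exactly two elements
open:   [1, 2, ...]     // at least two elements
ints:   [1, 2, ...int]  // additional elements must be ints

a: open & [1, 2, 3]     // [1, 2, 3]
b: ints & [1, 2, 3, 4]  // [1, 2, 3, 4]
c: closed & [1, 2, 3]   // _|_ (lengths are incompatible)
d: ints & [1, 2, "x"]   // _|_ ("x" is not an int)
```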

Lists can be thought of as structs:

```
List: *null | {
    Elem: _
    Tail: List
}
```

For closed lists, `Tail` is `null` for the last element; for open lists it is
`*null | List`, defaulting to the shortest variant.
For instance, the open list `[ 1, 2, ... ]` can be represented as:
```
open: List & { Elem: 1, Tail: { Elem: 2 } }
```
and the closed version of this list, `[ 1, 2 ]`, as
```
closed: List & { Elem: 1, Tail: { Elem: 2, Tail: null } }
```

Using this representation, the subsumption rule for lists can
be derived from those of structs.
Implementations are not required to implement lists as structs.
The `Elem` and `Tail` fields are not special and `len` will not work as
expected in these cases.


## Declarations and Scopes


### Blocks

A _block_ is a possibly empty sequence of declarations.
The braces of a struct literal `{ ... }` form a block, but there are
others as well:

- The _universe block_ encompasses all CUE source text.
- Each [package](#modules-instances-and-packages) has a _package block_
  containing all CUE source text in that package.
- Each file has a _file block_ containing all CUE source text in that file.
- Each `for` and `let` clause in a [comprehension](#comprehensions)
  is considered to be its own implicit block.

Blocks nest and influence [scoping].


### Declarations and scope

A _declaration_ may bind an identifier to a field, alias, or package.
Every identifier in a program must be declared.
Other than for fields,
no identifier may be declared twice within the same block.
For fields an identifier may be declared more than once within the same block,
resulting in a field with a value that is the result of unifying the values
of all fields with the same identifier.
String labels do not bind an identifier to the respective field.

The _scope_ of a declared identifier is the extent of source text in which the
identifier denotes the specified field, alias, or package.

CUE is lexically scoped using blocks:

1. The scope of a [predeclared identifier](#predeclared-identifiers) is the universe block.
1. The scope of an identifier denoting a field
   declared at top level (outside any struct literal) is the package block.
1. The scope of an identifier denoting an alias
   declared at top level (outside any struct literal) is the file block.
1. The scope of the package name of an imported package is the file block of the
   file containing the import declaration.
1. The scope of a field, alias or let identifier declared inside a struct
   literal is the innermost containing block.

An identifier declared in a block may be redeclared in an inner block.
While the identifier of the inner declaration is in scope, it denotes the entity
declared by the inner declaration.
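
The redeclaration rule can be sketched as follows.
The field names are hypothetical; the comments give the values that follow
from resolving each reference to the innermost declaration in scope.

```
x: 1       // scope of this `x`: the package block

a: {
    x: 2   // redeclares `x` in an inner block
    y: x   // 2: the inner declaration is in scope here
}

b: {
    y: x   // 1: only the top-level `x` is in scope here
}
```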

The package clause is not a declaration;
the package name does not appear in any scope.
Its purpose is to identify the files belonging to the same package
and to specify the default name for import declarations.


### Predeclared identifiers

CUE predefines a set of types and builtin functions.
For each of these there is a corresponding keyword which is the name
of the predefined identifier, prefixed with `__`.

```
Functions
len       required  close     open

Types
null      The null type and value
bool      All boolean values
int       All integral numbers
float     All decimal floating-point numbers
string    Any valid UTF-8 sequence
bytes     Any valid byte sequence

Derived   Value
number    int | float
uint      >=0
uint8     >=0 & <=255
int8      >=-128 & <=127
uint16    >=0 & <=65_535
int16     >=-32_768 & <=32_767
rune      >=0 & <=0x10FFFF
uint32    >=0 & <=4_294_967_295
int32     >=-2_147_483_648 & <=2_147_483_647
uint64    >=0 & <=18_446_744_073_709_551_615
int64     >=-9_223_372_036_854_775_808 & <=9_223_372_036_854_775_807
uint128   >=0 & <=340_282_366_920_938_463_463_374_607_431_768_211_455
int128    >=-170_141_183_460_469_231_731_687_303_715_884_105_728 &
           <=170_141_183_460_469_231_731_687_303_715_884_105_727
float32   >=-3.40282346638528859811704183484516925440e+38 &
          <=3.40282346638528859811704183484516925440e+38
float64   >=-1.797693134862315708145274237317043567981e+308 &
          <=1.797693134862315708145274237317043567981e+308
```
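Since the derived identifiers are ordinary constraints, they can be unified
with values like any other expression.
A brief sketch with hypothetical field names:

```
port:  uint16 & 8080  // 8080
level: int8 & 200     // _|_ (200 is greater than 127)
ratio: number & 0.5   // 0.5
```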


### Exported identifiers

<!-- move to a more logical spot -->

An identifier of a package may be exported to permit access to it
from another package.
All identifiers not starting with `_` (so all regular fields and definitions
starting with `#`) are exported.
Any identifier starting with `_` is not visible outside the package and resides
in a namespace separate from that of namesake identifiers of other packages.

```
package mypackage

foo:   string  // visible outside mypackage
"bar": string  // visible outside mypackage

#Foo: {      // visible outside mypackage
    a:  1    // visible outside mypackage
    _b: 2    // not visible outside mypackage

    #C: {    // visible outside mypackage
        d: 4 // visible outside mypackage
    }
    _#E: foo // not visible outside mypackage
}
```


### Uniqueness of identifiers

Given a set of identifiers, an identifier is called unique if it is different
from every other in the set, after applying normalization following
Unicode Annex #31.
Two identifiers are different if they are spelled differently
or if they appear in different packages and are not exported.
Otherwise, they are the same.


### Field declarations

A field associates the value of an expression with a label within a struct.
If this label is an identifier, it binds the field to that identifier,
so the field's value can be referenced by writing the identifier.
String labels are not bound to fields.
```
a: {
    b: 2
    "s": 3

    c: b   // 2
    d: s   // _|_ unresolved identifier "s"
    e: a.s // 3
}
```

If an expression may result in a value associated with a default value
as described in [default values](#default-values), the field binds to this
value-default pair.
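
A small sketch of a field bound to a value-default pair, using hypothetical
field names.
The comments assume the rules given in [default values](#default-values),
under which a default that conflicts with a unified value is dropped.

```
replicas: int | *1  // binds to the value-default pair (int, 1)

a: replicas         // also (int, 1); resolves to 1 if nothing else is specified
b: replicas & 3     // 3: the default 1 conflicts with 3 and is dropped
```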


<!-- TODO: disallow creating identifiers starting with __
...and reserve them for builtin values.

The issue is with code generation. As no guarantee can be given that
a predeclared identifier is not overridden in one of the enclosing scopes,
code will have to handle detecting such cases and renaming them.
An alternative is to have the predeclared identifiers be aliases for namesake
equivalents starting with a double underscore (e.g. string -> __string),
allowing generated code (normal code would keep using `string`) to refer
to these directly.
-->


### Let declarations

Within a struct, a let clause binds an identifier to the given expression.

Within the scope of the identifier, the identifier refers to the
_locally declared_ expression.
The expression is evaluated in the scope in which it was declared.
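
The effect of evaluating a let expression in its declaring scope can be
sketched as follows.
The names are hypothetical; the comments follow from lexical scoping, under
which `x` in the let expression resolves where the let is declared, not where
`y` is used.

```
x: 1
let y = x + 1  // `x` resolves to the field above

a: {
    x: 10      // redeclares `x` inside `a` only
    b: y       // 2, not 11
}
```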
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001786
1787
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001788## Expressions
1789
1790An expression specifies the computation of a value by applying operators and
Marcel van Lohuizen5fee32f2019-01-21 22:18:48 +01001791built-in functions to operands.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001792
Marcel van Lohuizenfe4abac2019-04-06 17:19:03 +02001793Expressions that require concrete values are called _incomplete_ if any of
1794their operands are not concrete, but define a value that would be legal for
1795that expression.
1796Incomplete expressions may be left unevaluated until a concrete value is
1797requested at the application level.
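
For instance, string concatenation requires concrete operands, so an
expression over a field that is only known to be a `string` stays incomplete
until that field is made concrete.
A sketch with hypothetical names:

```
greeting: {
    name: string
    text: "Hello, " + name  // incomplete: `name` is not yet concrete
}

world: greeting & { name: "world" }
// world.text is "Hello, world"
```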

### Operands

Operands denote the elementary values in an expression.
An operand may be a literal, a (possibly qualified) identifier denoting
a field, alias, or let declaration, or a parenthesized expression.

```
Operand     = Literal | OperandName | "(" Expression ")" .
Literal     = BasicLit | ListLit | StructLit .
BasicLit    = int_lit | float_lit | string_lit |
              null_lit | bool_lit | bottom_lit | top_lit .
OperandName = identifier | QualifiedIdent .
```

### Qualified identifiers

A qualified identifier is an identifier qualified with a package name prefix.

```
QualifiedIdent = PackageName "." identifier .
```

A qualified identifier accesses an identifier in a different package,
which must be [imported].
The identifier must be declared in the [package block] of that package.

```
math.Sin    // denotes the Sin function in package math
```
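A qualified identifier is typically used together with an import declaration.
The sketch below assumes CUE's builtin `strings` package and its `ToUpper`
function; the field name `shout` is hypothetical.

```
import "strings"

shout: strings.ToUpper("cue")  // "CUE"
```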

### References

An identifier operand refers to a field and is called a reference.
The value of a reference is a copy of the expression associated with the field
that it is bound to,
with any references within that expression bound to the respective copies of
the fields they were originally bound to.
Implementations may use a different mechanism to evaluate as long as
these semantics are maintained.

```
a: {
    place:    string
    greeting: "Hello, \(place)!"
}

b: a & { place: "world" }
c: a & { place: "you" }

d: b.greeting  // "Hello, world!"
e: c.greeting  // "Hello, you!"
```



### Primary expressions

Primary expressions are the operands for unary and binary expressions.


```

Slice: indices must be complete
([0, 1, 2, 3] | [2, 3])[0:2] => [0, 1] | [2, 3]

([0, 1, 2, 3] | *[2, 3])[0:2] => [0, 1] | [2, 3]
([0,1,2,3]|[2,3], [2,3])[0:2] => ([0,1]|[2,3], [2,3])

Index
a: (1|2, 1)
b: ([0,1,2,3]|[2,3], [2,3])[a] => ([0,1,2,3]|[2,3][a], 3)

Binary operation
A binary operation is only evaluated if its operands are complete.

Input      Maximum allowed evaluation
a: string  string
b: 2       2
c: a * b   a * 2

A struct is in error if the evaluation of any of its expressions results in
bottom; an incomplete expression is not considered bottom.
```
<!-- TODO(mpvl)
	Conversion |
-->
```
PrimaryExpr =
	Operand |
	PrimaryExpr Selector |
	PrimaryExpr Index |
	PrimaryExpr Slice |
	PrimaryExpr Arguments .

Selector       = "." (identifier | simple_string_lit) .
Index          = "[" Expression "]" .
Argument       = Expression .
Arguments      = "(" [ ( Argument { "," Argument } ) [ "," ] ] ")" .
```
<!---
TODO:
	PrimaryExpr Query |
Query          = "." Filters .
Filters        = Filter { Filter } .
Filter         = "[" [ "?" ] AliasExpr "]" .

TODO: maybe reintroduce slices, as they are useful in queries, probably this
time with Python semantics.
Slice          = "[" [ Expression ] ":" [ Expression ] [ ":" [Expression] ] "]" .

Argument       = Expression | ( identifier ":" Expression ).

// & expression type
// string_lit: same as label. Arguments is current node.
// If selector is applied to list, it performs the operation for each
// element.

TODO: consider allowing decimal_lit for selectors.
--->

```
x
2
(s + ".txt")
f(3.1415, true)
m["foo"]
s[i : j + 1]
obj.color
f.p[i].x
```


### Selectors

For a [primary expression](#primary-expressions) `x` that is not a [package name](#package-clause),
the selector expression

```
x.f
```

denotes the element of a <!--list or -->struct `x` identified by `f`.
<!--For structs, -->`f` must be an identifier or a string literal identifying
any definition or regular non-optional field.
The identifier `f` is called the field selector.

<!--
Allowing strings to be used as field selectors obviates the need for
backquoted identifiers. Note that some standards use names for structs that
are not standard identifiers (such as "Fn::Foo"). Note that indexing does not
allow access to identifiers.
-->

<!--
For lists, `f` must be an integer and follows the same lookup rules as
for the index operation.
The type of the selector expression is the type of `f`.
-->

If `x` is a package name, see the section on [qualified identifiers](#qualified-identifiers).

<!--
TODO: consider allowing this and also for selectors. It needs to be considered
how defaults are carried forward in cases like:

    x: { a: string | *"foo" } | *{ a: int | *4 }
    y: x.a & string

What is y in this case?
   (x.a & string, _|_)
   (string|"foo", _|_)
   (string|"foo", "foo")
If the latter, then why?

For a disjunction of the form `x1 | ... | xn`,
the selector is applied to each element `x1.f | ... | xn.f`.
-->

Otherwise, if `x` is not a <!--list or -->struct,
or if `f` does not exist in `x`,
the result of the expression is bottom (an error).
In the latter case the expression is incomplete.
The operand of a selector may be associated with a default.

```
T: {
    x:     int
    y:     3
    "x-y": 4
}

a: T.x     // int
b: T.y     // 3
c: T.z     // _|_ // field 'z' not found in T
d: T."x-y" // 4

e: {a: 1|*2} | *{a: 3|*4}
f: e.a  // 4 (default value)
```

<!--
```
(v, d).f  =>  (v.f, d.f)

e: {a: 1|*2} | *{a: 3|*4}
f: e.a  // 4 after selecting default from (({a: 1|*2} | {a: 3|*4}).a, 4)

```
-->
2008
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002009
2010### Index expressions
2011
2012A primary expression of the form
2013
2014```
2015a[x]
2016```
2017
Marcel van Lohuizen4108f802019-08-13 18:30:25 +02002018denotes the element of a list or struct `a` indexed by `x`.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002019The value `x` is called the index or field name, respectively.
2020The following rules apply:
2021
2022If `a` is not a struct:
2023
Marcel van Lohuizen4108f802019-08-13 18:30:25 +02002024- `a` is a list (which need not be complete)
Marcel van Lohuizenfe4abac2019-04-06 17:19:03 +02002025- the index `x` unified with `int` must be concrete.
2026- the index `x` is in range if `0 <= x < len(a)`, where only the
2027 explicitly defined values of an open-ended list are considered,
2028 otherwise it is out of range
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002029
2030The result of `a[x]` is
2031
Marcel van Lohuizen4108f802019-08-13 18:30:25 +02002032for `a` of list type:
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002033
Marcel van Lohuizen4108f802019-08-13 18:30:25 +02002034- the list element at index `x`, if `x` is within range
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002035- bottom (an error), otherwise
2036
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002037
2038for `a` of struct type:
2039
Marcel van Lohuizenfe4abac2019-04-06 17:19:03 +02002040- the index `x` unified with `string` must be concrete.
Marcel van Lohuizend2825532019-09-23 12:44:01 +01002041- the value of the regular and non-optional field named `x` of struct `a`,
2042 if this field exists
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002043- bottom (an error), otherwise
2044
Marcel van Lohuizenfe4abac2019-04-06 17:19:03 +02002045
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002046```
2047[ 1, 2 ][1] // 2
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +01002048[ 1, 2 ][2] // _|_
Marcel van Lohuizen5fee32f2019-01-21 22:18:48 +01002049[ 1, 2, ...][2] // _|_
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002050```
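
The examples above only index lists. As a complementary sketch of the struct
rules, the following indexes a struct with concrete strings. The names `s`,
`k`, and `v1` through `v4` are hypothetical and chosen for illustration only;
the results in the comments follow from the rules stated above.

```
s: {a: 1, "b-c": 2, opt?: 3}
k: "b-c"

v1: s["a"]   // 1
v2: s[k]     // 2, k evaluates to a concrete string
v3: s["z"]   // _|_, there is no field named "z"
v4: s["opt"] // _|_ under the rule above, as opt is optional
```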

Both the operand and index value may be a value-default pair.
```
va[vi]              =>  va[vi]
va[(vi, di)]        =>  (va[vi], va[di])
(va, da)[vi]        =>  (va[vi], da[vi])
(va, da)[(vi, di)]  =>  (va[vi], da[di])
```

```
Fields                  Result
x: [1, 2] | *[3, 4]     ([1,2]|[3,4], [3,4])
i: int | *1             (int, 1)

v: x[i]                 (x[i], 4)
```

### Operators

Operators combine operands into expressions.

```
Expression = UnaryExpr | Expression binary_op Expression .
UnaryExpr  = PrimaryExpr | unary_op UnaryExpr .

binary_op  = "|" | "&" | "||" | "&&" | "==" | rel_op | add_op | mul_op .
rel_op     = "!=" | "<" | "<=" | ">" | ">=" | "=~" | "!~" .
add_op     = "+" | "-" .
mul_op     = "*" | "/" | "div" | "mod" | "quo" | "rem" .
unary_op   = "+" | "-" | "!" | "*" | rel_op .
```

Comparisons are discussed [elsewhere](#Comparison-operators).
For any binary operator, the operand types must unify.

<!-- TODO: durations
 unless the operation involves durations.

Except for duration operations, if one operand is an untyped [literal] and the
other operand is not, the constant is [converted] to the type of the other
operand.
-->

Operands of unary and binary expressions may be associated with a default,
as the following examples show:

<!--
```
O1: op (v1, d1)          => (op v1, op d1)

O2: (v1, d1) op (v2, d2) => (v1 op v2, d1 op d2)
and because v => (v, v)
O3: v1       op (v2, d2) => (v1 op v2, v1 op d2)
O4: (v1, d1) op v2       => (v1 op v2, d1 op v2)
```
-->

```
Field               Resulting Value-Default pair
a: *1|2             (1|2, 1)
b: -a               (-a, -1)

c: a + 2            (a+2, 3)
d: a + a            (a+a, 2)
```

#### Operator precedence

Unary operators have the highest precedence.

There are seven precedence levels for binary operators.
Multiplication operators bind strongest, followed by
addition operators, comparison operators,
`&&` (logical AND), `||` (logical OR), `&` (unification),
and finally `|` (disjunction):

```
Precedence    Operator
    7             *  /  div  mod  quo  rem
    6             +  -
    5             ==  !=  <  <=  >  >=  =~  !~
    4             &&
    3             ||
    2             &
    1             |
```

Binary operators of the same precedence associate from left to right.
For instance, `x / y * z` is the same as `(x / y) * z`.

```
+x
23 + 3*x[i]
x <= f()
f() || g()
x == y+1 && y == z-1
2 | int
{ a: 1 } & { b: 2 }
```
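
As an illustration of how the precedence table groups operands, the sketch
below annotates a few expressions with the grouping implied by the table.
The field names are hypothetical; the groupings in the comments are derived
from the table and the left-to-right rule above.

```
a: 1 + 2*3 == 7           // true, parsed as (1 + (2*3)) == 7
b: 8 div 2 * 3            // 12, equal precedence, so (8 div 2) * 3
c: true || false && false // true, parsed as true || (false && false)
d: int & >0 | string      // parsed as (int & >0) | string
```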

#### Arithmetic operators

Arithmetic operators apply to numeric values and yield a result of the same type
as the first operand. Three of the four standard arithmetic operators
(`+`, `-`, `*`) apply to integer and decimal floating-point types;
`+` and `*` also apply to lists, strings, and bytes.
`/` only applies to decimal floating-point types and
`div`, `mod`, `quo`, and `rem` only apply to integer types.

```
+    sum                    integers, floats, lists, strings, bytes
-    difference             integers, floats
*    product                integers, floats, lists, strings, bytes
/    quotient               floats
div  division               integers
mod  modulo                 integers
quo  quotient               integers
rem  remainder              integers
```

For any operator that accepts operands of type `float`, any operand may be
of type `int` or `float`, in which case the result is `float` if any
of the operands is `float`, and `int` otherwise.
For `/` the result is always `float`.

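A minimal sketch of the `int`/`float` rule just described, with hypothetical
field names. The result types in the comments follow from the rule above;
the exact printed form of a float may vary between implementations.

```
a: 1 + 2   // 3, both operands are int
b: 1 + 2.0 // 3.0, one operand is float, so the result is float
c: 2 * 1.5 // 3.0
d: 6 / 3   // 2.0, `/` always yields a float
e: 7 div 2 // 3, integer division yields an int
```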

#### Integer operators

For two integer values `x` and `y`,
the integer quotient `q = x div y` and remainder `r = x mod y`
implement Euclidean division and
satisfy the following relationship:

```
r = x - y*q  with 0 <= r < |y|
```
where `|y|` denotes the absolute value of `y`.

```
 x     y   x div y   x mod y
 5     3      1         2
-5     3     -2         1
 5    -3     -1         2
-5    -3      2         1
```

For two integer values `x` and `y`,
the integer quotient `q = x quo y` and remainder `r = x rem y`
implement truncated division and
satisfy the following relationship:

```
x = q*y + r  and  |r| < |y|
```

with `x quo y` truncated towards zero.

```
 x     y   x quo y   x rem y
 5     3      1         2
-5     3     -1        -2
 5    -3     -1         2
-5    -3      1        -2
```

A zero divisor in either case results in bottom (an error).

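The two division flavors can be compared on the same operands. The sketch
below uses hypothetical field names and checks the relationships given above
for `x = -7` and `y = 2`; the values in the comments follow from the
definitions of Euclidean and truncated division.

```
x: -7
y: 2

q1:  x div y        // -4
r1:  x mod y        // 1, always in the range 0 <= r1 < |y|
ok1: x == y*q1 + r1 // true

q2:  x quo y        // -3, truncated towards zero
r2:  x rem y        // -1, takes the sign of x
ok2: x == q2*y + r2 // true
```
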
For integer operands, the unary operators `+` and `-` are defined as follows:

```
+x                          is 0 + x
-x    negation              is 0 - x
```


#### Decimal floating-point operators

For decimal floating-point numbers, `+x` is the same as `x`,
while `-x` is the negation of `x`.
The result of a floating-point division by zero is bottom (an error).

<!-- TODO: consider making it +/- Inf -->

An implementation may combine multiple floating-point operations into a single
fused operation, possibly across statements, and produce a result that differs
from the value obtained by executing and rounding the instructions individually.


#### List operators

Lists can be concatenated using the `+` operator.
Open lists are closed to their default value beforehand.

```
[ 1, 2 ]      + [ 3, 4 ]       // [ 1, 2, 3, 4 ]
[ 1, 2, ... ] + [ 3, 4 ]       // [ 1, 2, 3, 4 ]
[ 1, 2 ]      + [ 3, 4, ... ]  // [ 1, 2, 3, 4 ]
```

A list can be multiplied by a non-negative `int` using the `*` operator
to create a list that repeats it the indicated number of times.
```
3*[1,2]         // [1, 2, 1, 2, 1, 2]
3*[1, 2, ...]   // [1, 2, 1, 2, 1, 2]
[byte]*4        // [byte, byte, byte, byte]
0*[1,2]         // []
```

<!-- TODO(mpvl): should we allow multiplication with a range?
If so, how does one specify a list with a range of possible lengths?

Suggestion from jba:
Multiplication should distribute over disjunction,
so int(1)..int(3) * [x] = [x] | [x, x] | [x, x, x].
The hard part is figuring out what (>=1 & <=3) * [x] means,
since >=1 & <=3 includes many floats.
(mpvl: could constrain arguments to parameter types, but needs to be
done consistently.)
-->


#### String operators

Strings can be concatenated using the `+` operator:
```
s: "hi " + name + " and good bye"
```
String addition creates a new string by concatenating the operands.

A string can be repeated by multiplying it:

```
s: "etc. "*3  // "etc. etc. etc. "
```

<!-- jba: Do these work for byte sequences? If not, why not? -->


#### Comparison operators

Comparison operators compare two operands and yield an untyped boolean value.

```
==    equal
!=    not equal
<     less
<=    less or equal
>     greater
>=    greater or equal
=~    matches regular expression
!~    does not match regular expression
```

<!-- regular expression operator inspired by Bash, Perl, and Ruby. -->

In any comparison, the types of the two operands must unify or one of the
operands must be null.

The equality operators `==` and `!=` apply to operands that are comparable.
The ordering operators `<`, `<=`, `>`, and `>=` apply to operands that are ordered.
The matching operators `=~` and `!~` apply to a string and a regular
expression operand.
These terms and the result of the comparisons are defined as follows:

- Null is comparable with itself and any other type.
  Two null values are always equal, null is unequal with anything else.
- Boolean values are comparable.
  Two boolean values are equal if they are either both true or both false.
- Integer values are comparable and ordered, in the usual way.
- Floating-point values are comparable and ordered, as per the definitions
  for binary coded decimals in the IEEE-754-2008 standard.
- Floating-point numbers may be compared with integers.
- String and bytes values are comparable and ordered lexically byte-wise.
- Structs are not comparable.
- Lists are not comparable.
- The regular expression syntax is the one accepted by RE2,
  described in https://github.com/google/re2/wiki/Syntax,
  except for `\C`.
- `s =~ r` is true if `s` matches the regular expression `r`.
- `s !~ r` is true if `s` does not match regular expression `r`.

<!--- TODO: consider the following
- For regular expression, named capture groups are interpreted as CUE references
  that must unify with the strings matching this capture group.
--->
<!-- TODO: Implementations should adopt an algorithm that runs in linear time? -->
<!-- Consider implementing Level 2 of Unicode regular expression. -->

```
3 < 4       // true
3 < 4.0     // true
null == 2   // false
null != {}  // true
{} == {}    // _|_: structs are not comparable against structs

"Wild cats" =~ "cat"   // true
"Wild cats" !~ "dog"   // true

"foo" =~ "^[a-z]{3}$"  // true
"foo" =~ "^[a-z]{4}$"  // false
```

<!-- jba
I think I know what `3 < a` should mean if

    a: >=1 & <=5

It should be a constraint on `a` that can be evaluated once `a`'s value is known more precisely.

But what does `3 < (>=1 & <=5)` mean? We'll never get more information, so it must have a definite value.
-->

#### Logical operators

Logical operators apply to boolean values and yield a result of the same type
as the operands. The right operand is evaluated conditionally.

```
&&    conditional AND    p && q  is  "if p then q else false"
||    conditional OR     p || q  is  "if p then true else q"
!     NOT                !p      is  "not p"
```
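
A small sketch of the table above applied to concrete booleans. The field
names `p`, `q`, and `a` through `c` are hypothetical.

```
p: true
q: false

a: p && q // false, "if p then q else false"
b: p || q // true,  "if p then true else q"
c: !p     // false
```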


<!--
### TODO TODO TODO

3.14 / 0.0   // illegal: division by zero
Illegal conversions always apply to CUE.

Implementation restriction: A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa.
-->

<!--- TODO(mpvl): conversions
### Conversions
Conversions are expressions of the form `T(x)` where `T` and `x` are
expressions.
The result is always an instance of `T`.

```
Conversion = Expression "(" Expression [ "," ] ")" .
```
--->
<!---

A literal value `x` can be converted to type T if `x` is representable by a
value of `T`.

As a special case, an integer literal `x` can be converted to a string type
using the same rule as for non-constant x.

Converting a literal yields a typed value as result.

```
uint(iota)               // iota value of type uint
float32(2.718281828)     // 2.718281828 of type float32
complex128(1)            // 1.0 + 0.0i of type complex128
float32(0.49999999)      // 0.5 of type float32
float64(-1e-1000)        // 0.0 of type float64
string('x')              // "x" of type string
string(0x266c)           // "♬" of type string
MyString("foo" + "bar")  // "foobar" of type MyString
string([]byte{'a'})      // not a constant: []byte{'a'} is not a constant
(*int)(nil)              // not a constant: nil is not a constant, *int is not a boolean, numeric, or string type
int(1.2)                 // illegal: 1.2 cannot be represented as an int
string(65.0)             // illegal: 65.0 is not an integer constant
```
--->
<!---

A conversion is always allowed if `x` is an instance of `T`.

If `T` and `x` are of different underlying types, a conversion is allowed if
`x` can be converted to a value `x'` of `T`'s type, and
`x'` is an instance of `T`.
A value `x` can be converted to the type of `T` in any of these cases:

- `x` is a struct and is subsumed by `T`.
- `x` and `T` are both integer or floating points.
- `x` is an integer or a byte sequence and `T` is a string.
- `x` is a string and `T` is a byte sequence.

Specific rules apply to conversions between numeric types, structs,
or to and from a string type. These conversions may change the representation
of `x`.
All other conversions only change the type but not the representation of x.


#### Conversions between numeric ranges
For the conversion of numeric values, the following rules apply:

1. Any integer value can be converted into any other integer value
   provided that it is within range.
2. When converting a decimal floating-point number to an integer, the fraction
   is discarded (truncation towards zero). TODO: or disallow truncating?

```
a: uint16(int(1000)) // uint16(1000)
b: uint8(1000)       // _|_ // overflow
c: int(2.5)          // 2  TODO: TBD
```


#### Conversions to and from a string type

Converting a list of bytes to a string type yields a string whose successive
bytes are the elements of the slice.
Invalid UTF-8 is converted to `"\uFFFD"`.

```
string('hell\xc3\xb8')   // "hellø"
string(bytes([0x20]))    // " "
```

A string value is always convertible to a list of bytes.

```
bytes("hellø")   // 'hell\xc3\xb8'
bytes("")        // ''
```

#### Conversions between list types

Conversions between list types are possible only if `T` strictly subsumes `x`
and the result will be the unification of `T` and `x`.

If we introduce named types this would be different from IP & [10, ...]

Consider removing this until it has a different meaning.

```
IP:        4*[byte]
Private10: IP([10, ...])  // [10, byte, byte, byte]
```

#### Conversions between struct types

A conversion from `x` to `T`
is applied using the following rules:

1. `x` must be an instance of `T`,
2. all fields defined for `x` that are not defined for `T` are removed from
   the result of the conversion, recursively.

<!-- jba: I don't think you say anywhere that the matching fields are unified.
mpvl: they are not, x must be an instance of T, in which case x == T&x,
so unification would be unnecessary.
-->
<!--
```
T: {
    a: { b: 1..10 }
}

x1: {
    a: { b: 8, c: 10 }
    d: 9
}

c1: T(x1)             // { a: { b: 8 } }
c2: T({})             // _|_  // missing field 'a' in '{}'
c3: T({ a: {b: 0} })  // _|_  // field a.b does not unify (0 & 1..10)
```
-->

### Calls

Calls can be made to core library functions, called builtins.
Given an expression `f` of function type `F`,
```
f(a1, a2, … an)
```
calls `f` with arguments a1, a2, … an. Arguments must be expressions
whose values are instances of the parameter types of `F`;
they are evaluated before the function is called.

```
a: math.Atan2(x, y)
```

In a function call, the function value and arguments are evaluated in the usual
order.
After they are evaluated, the parameters of the call are passed by value
to the function and the called function begins execution.
The return parameters
of the function are passed by value back to the calling function when the
function returns.

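As a self-contained sketch of a call, the following imports the `strings`
package from the CUE standard library and passes a field value as the
argument. The field names are hypothetical; `strings.ToUpper` and the
built-in `len` are assumed to behave as their names suggest.

```
import "strings"

name:  "cue"
upper: strings.ToUpper(name) // "CUE"
size:  len(upper)            // 3
```
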

### Comprehensions

Lists and fields can be constructed using comprehensions.

Comprehensions define a clause sequence that consists of a sequence of
`for`, `if`, and `let` clauses, nesting from left to right.
The sequence must start with a `for` or `if` clause.
The `for` and `let` clauses each define a new scope in which new values are
bound to be available for the next clause.

The `for` clause binds the defined identifiers, on each iteration, to the next
value of some iterable value in a new scope.
A `for` clause may bind one or two identifiers.
If there is one identifier, it is bound to the value of
a list element or struct field.
If there are two identifiers, the first value will be the key or index,
if available, and the second will be the value.

For lists, `for` iterates over all elements in the list after closing it.
For structs, `for` iterates over all non-optional regular fields.

An `if` clause, or guard, specifies an expression that terminates the current
iteration if it evaluates to false.

The `let` clause binds the result of an expression to the defined identifier
in a new scope.

A current iteration is said to complete if the innermost block of the clause
sequence is reached.
Syntactically, the comprehension value is a struct.
A comprehension can generate non-struct values by embedding such values within
this struct.

Within lists, the values yielded by a comprehension are inserted in the list
at the position of the comprehension.
Within structs, the values yielded by a comprehension are embedded within the
struct.
Both structs and lists may contain multiple comprehensions.

```
Comprehension       = Clauses StructLit .

Clauses             = StartClause { [ "," ] Clause } .
StartClause         = ForClause | GuardClause .
Clause              = StartClause | LetClause .
ForClause           = "for" identifier [ "," identifier ] "in" Expression .
GuardClause         = "if" Expression .
LetClause           = "let" identifier "=" Expression .
```

```
a: [1, 2, 3, 4]
b: [ for x in a if x > 1 { x+1 } ]  // [3, 4, 5]

c: {
	for x in a
	if x < 4
	let y = 1 {
		"\(x)": x + y
	}
}
d: { "1": 2, "2": 3, "3": 4 }
```
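
The example above binds a single identifier per `for` clause. The sketch
below illustrates the two-identifier form for both a struct and a list.
All field names and values are hypothetical; the results in the comments
follow from the binding rules described above.

```
ports: {http: 80, https: 443, ssh: 22}

// For a struct, k is bound to the field name and v to the field value.
web: {
	for k, v in ports
	if v != 22 {
		"\(k)_url": "\(k)://localhost:\(v)"
	}
}
// web: { http_url: "http://localhost:80", https_url: "https://localhost:443" }

// For a list, the first identifier is bound to the index.
names:   ["a", "b"]
indexed: [ for i, n in names { "\(i):\(n)" } ] // ["0:a", "1:b"]
```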


### String interpolation

String interpolation allows constructing strings by replacing placeholder
expressions with their string representation.
String interpolation may be used in single- and double-quoted strings, as well
as their multiline equivalents.

A placeholder consists of "\(" followed by an expression and a ")".
The expression is evaluated in the scope within which the string is defined.

The result of the expression is substituted as follows:
- string: as is
- bool: the JSON representation of the bool
- number: a JSON representation of the number that preserves the
  precision of the underlying binary coded decimal
- bytes: as if substituted within single quotes or
  converted to valid UTF-8 replacing the
  maximal subpart of ill-formed subsequences with a single
  replacement character (W3C encoding standard) otherwise
- list: illegal
- struct: illegal


```
a: "World"
b: "Hello \( a )!" // Hello World!
```
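
As a further sketch, the substitution rules for non-string values can be
illustrated with a number, a bool, and an arithmetic expression inside
placeholders. The field names are hypothetical and the result in the comment
follows from the rules listed above.

```
n:  42
ok: true
s:  "n=\(n), ok=\(ok), next=\(n + 1)" // "n=42, ok=true, next=43"
```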


## Builtin Functions

Built-in functions are predeclared. They are called like any other function.


### `len`

The built-in function `len` takes arguments of various types and returns
a result of type `int`.

```
Argument type    Result

string           string length in bytes
bytes            length of byte sequence
list             list length, smallest length for an open list
struct           number of distinct data fields, excluding optional
```
<!-- TODO: consider not supporting len, but instead rely on more
precisely named builtin functions:
  - strings.RuneLen(x)
  - bytes.Len(x)  // x may be a string
  - struct.NumFooFields(x)
  - list.Len(x)
-->

```
Expression           Result
len("Hellø")         6
len([1, 2, 3])       3
len([1, 2, ...])     >=2
```


### `close`

The builtin function `close` converts a partially defined, or open, struct
to a fully defined, or closed, struct.

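A minimal sketch of the effect of `close`, using hypothetical field names.
It assumes, as described elsewhere in this specification, that a closed
struct does not admit fields beyond those it defines.

```
A: close({a: int})

ok:  A & {a: 1}       // {a: 1}
bad: A & {a: 1, b: 2} // _|_, field b is not allowed in the closed struct
```
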

### `and`

The built-in function `and` takes a list and returns the result of applying
the `&` operator to all elements in the list.
It returns top for the empty list.

```
Expression:       Result
and([a, b])       a & b
and([a])          a
and([])           _
```

### `or`

The built-in function `or` takes a list and returns the result of applying
the `|` operator to all elements in the list.
It returns bottom for the empty list.

```
Expression:      Result
or([a, b])       a | b
or([a])          a
or([])           _|_
```
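
One possible use of these builtins, sketched with hypothetical field names,
is to derive a constraint from a list of values: `or` turns a list of
allowed values into a disjunction, and `and` combines a list of constraints.

```
envs: ["dev", "staging", "prod"]
env:  or(envs) // "dev" | "staging" | "prod"
env:  "staging"

bounds: [int, >=0, <=100]
pct:    and(bounds) // int & >=0 & <=100
pct:    42
```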


## Cycles

Implementations are required to interpret or reject cycles encountered
during evaluation according to the rules in this section.


### Reference cycles

A _reference cycle_ occurs if a field references itself, either directly or
indirectly.

```
// x references itself
x: x

// indirect cycles
b: c
c: d
d: b
```

Implementations should treat these as `_`.
Two particular cases are discussed below.


#### Expressions that unify an atom with an expression

An expression of the form `a & e`, where `a` is an atom
and `e` is an expression, always evaluates to `a` or bottom.
As it does not matter how we fail, we can assume the result to be `a`
and postpone validating `a == e` until after all references
in `e` have been resolved.

```
// Config            Evaluates to (requiring concrete values)
x: {                    x: {
    a: b + 100              a: _|_ // cycle detected
    b: a - 100              b: _|_ // cycle detected
}                       }

y: x & {                y: {
    a: 200                  a: 200 // asserted that 200 == b + 100
                            b: 100
}                       }
```


#### Field values

A field value of the form `r & v`,
where `r` evaluates to a reference cycle and `v` is a concrete value,
evaluates to `v`.
Unification is idempotent and unifying a value with itself ad infinitum,
which is what the cycle represents, results in this value.
Implementations should detect cycles of this kind, ignore `r`,
and take `v` as the result of unification.

<!-- Tomabechi's graph unification algorithm
can detect such cycles at near-zero cost. -->

```
Configuration    Evaluated
//    c           Cycles in nodes of type struct evaluate
//  ↙︎   ↖         to the fixed point of unifying their
// a  →  b        values ad infinitum.

a: b & { x: 1 }   // a: { x: 1, y: 2, z: 3 }
b: c & { y: 2 }   // b: { x: 1, y: 2, z: 3 }
c: a & { z: 3 }   // c: { x: 1, y: 2, z: 3 }

// resolve a             b & {x:1}
// substitute b          c & {y:2} & {x:1}
// substitute c          a & {z:3} & {y:2} & {x:1}
// eliminate a (cycle)   {z:3} & {y:2} & {x:1}
// simplify              {x:1,y:2,z:3}
```
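
As a smaller illustration of the same rule, the sketch below uses scalar
values instead of structs.
The field names `p`, `q`, and `r` are hypothetical, and the results in the
comments are what the rule above implies rather than output captured from a
particular implementation.

```
// p references itself; the cycle is ignored and the concrete value remains.
p: p & 5    // p: 5

// An indirect cycle that passes through a concrete value.
q: r        // q: "ok"
r: q & "ok" // r: "ok"
```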

This rule also applies to field values that are disjunctions of unification
operations of the above form.

```
a: b&{x:1} | {y:1}  // {x:1,y:3,z:2} | {y:1}
b: {x:2} | c&{z:2}  // {x:2} | {x:1,y:3,z:2}
c: a&{y:3} | {z:3}  // {x:1,y:3,z:2} | {z:3}


// resolving a             b&{x:1} | {y:1}
// substitute b            ({x:2} | c&{z:2})&{x:1} | {y:1}
// simplify                c&{z:2}&{x:1} | {y:1}
// substitute c            (a&{y:3} | {z:3})&{z:2}&{x:1} | {y:1}
// simplify                a&{y:3}&{z:2}&{x:1} | {y:1}
// eliminate a (cycle)     {y:3}&{z:2}&{x:1} | {y:1}
// expand                  {x:1,y:3,z:2} | {y:1}
```

Note that all nodes that are part of a reference cycle forming a struct will
evaluate to the same value.
If a field value is a disjunction, any element that is part of a cycle will
evaluate to this value.


### Structural cycles

A _structural cycle_ occurs when a node references one of its ancestor nodes.
It is possible to construct a structural cycle by unifying two acyclic values:
```
// acyclic
y: {
    f: h: g
    g: _
}
// acyclic
x: {
    f: _
    g: f
}
// introduces structural cycle
z: x & y
```
Implementations should be able to detect such structural cycles dynamically.

A structural cycle can result in infinite structure or evaluation loops.
```
// infinite structure
a: b: a

// infinite evaluation
f: {
    n:   int
    out: n + (f & {n: 1}).out
}
```
CUE must allow or disallow structural cycles under certain circumstances.

If a node `a` references an ancestor node, we call it and any of its
field values `a.f` _cyclic_.
So if `a` is cyclic, all of its descendants are also regarded as cyclic.
A given node `x`, whose value is composed of the conjuncts `c1 & ... & cn`,
is valid if any of its conjuncts is not cyclic.

```
// Disallowed: a list of infinite length with all elements being 1.
#List: {
    head: 1
    tail: #List
}

// Disallowed: another infinite structure (a:{b:{d:{b:{d:{...}}}}}, ...).
a: {
    b: c
}
c: {
    d: a
}

// #List defines a list of arbitrary length. Because the recursive reference
// is part of a disjunction, this does not result in a structural cycle.
#List: {
    head: _
    tail: null | #List
}

// Usage of #List. The value of tail in the most deeply nested element will
// be `null`: as the value of the disjunct referring to list is the only
// conjunct, all conjuncts are cyclic and the value is invalid and so
// eliminated from the disjunction.
MyList: #List & { head: 1, tail: { head: 2 }}
```

<!--
### Unused fields

TODO: rules for detection of unused fields

1. Any alias value must be used
-->


## Modules, instances, and packages

CUE configurations are constructed by combining _instances_.
An instance, in turn, is constructed from one or more source files belonging
to the same _package_ that together declare the data representation.
Elements of this data representation may be exported and used
in other instances.

### Source file organization

Each source file consists of an optional package clause defining the
collection of files to which it belongs,
followed by a possibly empty set of import declarations that declare
packages whose contents it wishes to use, followed by a possibly empty set of
declarations.

As with a struct, a source file may contain embeddings.
Unlike with a struct, the embedded expressions may be any value.
If the result of the unification of all embedded values is not a struct,
it will be output instead of its enclosing file when exporting CUE
to a data format.

```
SourceFile = { attribute "," } [ PackageClause "," ] { ImportDecl "," } { Declaration "," } .
```

```
"Hello \(place)!"

place: "world"

// Outputs "Hello world!"
```

### Package clause

A package clause is an optional clause that defines the package to which
a source file belongs.

```
PackageClause  = "package" PackageName .
PackageName    = identifier .
```

The PackageName must not be the blank identifier or a definition identifier.

```
package math
```

### Modules and instances
A _module_ defines a tree of directories, rooted at the _module root_.

All source files within a module that declare the same package name belong to
the same package.
<!-- jba: I can't make sense of the above sentence. -->
A module may define multiple packages.

An _instance_ of a package is any subset of files belonging
to the same package.
<!-- jba: Are you saying that -->
<!-- if I have a package with files a, b and c, then there are 8 instances of -->
<!-- that package, some of which are {a, b}, {c}, {b, c}, and so on? What's the -->
<!-- purpose of that definition? -->
It is interpreted as the concatenation of these files.

An implementation may impose conventions on the layout of package files
to determine which files of a package belong to an instance.
For example, an instance may be defined as the subset of package files
belonging to a directory and all its ancestors.
<!-- jba: OK, that helps a little, but I still don't see what the purpose is. -->
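
To make the directory-based convention concrete, consider the hypothetical
layout sketched below.
The file names, the package name `config`, and the field names are assumptions
made for illustration only; the convention itself is just one that an
implementation may choose.

```
// File: base.cue (in the module root)
package config

replicas: int
region:   string

// File: prod/prod.cue
package config

replicas: 3
region:   "eu-west"
```

Under that convention, the instance for the `prod` directory consists of both
files, so `replicas` is the result of unifying `int` with `3`.
The instance for the module root consists of `base.cue` alone, leaving both
fields non-concrete.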


### Import declarations

An import declaration states that the source file containing the declaration
depends on definitions of the _imported_ package (§Program initialization and
execution) and enables access to exported identifiers of that package.
The import names an identifier (PackageName) to be used for access and an
ImportPath that specifies the package to be imported.

```
ImportDecl       = "import" ( ImportSpec | "(" { ImportSpec "," } ")" ) .
ImportSpec       = [ PackageName ] ImportPath .
ImportLocation   = { unicode_value } .
ImportPath       = `"` ImportLocation [ ":" identifier ] `"` .
```

The PackageName is used in qualified identifiers to access
exported identifiers of the package within the importing source file.
It is declared in the file block.
It defaults to the identifier specified in the package clause of the imported
package, which must match either the last path component of ImportLocation
or the identifier following it.

<!--
Note: this deviates from the Go spec where there is no such restriction.
This restriction has the benefit of being able to determine the identifiers
for packages from within the file itself. But for CUE it has another benefit:
when using package hierarchies, one is more likely to want to include multiple
packages within the same directory structure. This mechanism allows
disambiguation in these cases.
-->

The interpretation of the ImportPath is implementation-dependent but it is
typically either the path of a builtin package or a fully qualified location
of a package within a source code repository.

An ImportLocation must be a non-empty string using only characters belonging
to Unicode's L, M, N, P, and S general categories
(the Graphic characters without spaces)
and may not include the characters !"#$%&'()*,:;<=>?[\]^`{|}
or the Unicode replacement character U+FFFD.

Assume we have a package containing the package clause "package math",
which exports function Sin, at the path identified by "lib/math".
This table illustrates how Sin is accessed in files
that import the package after the various types of import declaration.

```
Import declaration          Local name of Sin

import   "lib/math"         math.Sin
import   "lib/math:math"    math.Sin
import m "lib/math"         m.Sin
```

An import declaration declares a dependency relation between the importing and
imported package. It is illegal for a package to import itself, directly or
indirectly, or to directly import a package without referring to any of its
exported identifiers.
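
The grouped form of the grammar above, together with an explicit PackageName,
can be sketched as follows.
This example assumes an implementation that provides builtin packages at the
import paths `strings` and `math`, as the CUE tooling does; the package name
`example` and the field names are hypothetical.

```
package example

import (
	"strings"
	m "math"
)

// Both imported packages are referred to, as required by the rule above.
shout: strings.ToUpper("hello")
tau:   2 * m.Pi
```

Removing the `tau` field without also removing the `m "math"` import would make
the file invalid, because the import would then be unused.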


### An example package

TODO