<!--
 Copyright 2018 The CUE Authors

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

# The CUE Language Specification

## Introduction

This is a reference manual for the CUE data constraint language.
CUE, pronounced cue or Q, is a general-purpose and strongly typed
constraint-based language.
It can be used for data templating, data validation, code generation, scripting,
and many other applications involving structured data.
The CUE tooling, layered on top of CUE, provides
a general-purpose scripting language for creating scripts as well as
simple servers, also expressed in CUE.

CUE was designed with cloud configuration, and related systems, in mind,
but is not limited to this domain.
It derives its formalism from relational programming languages.
This formalism allows for managing and reasoning over large amounts of
data in a straightforward manner.

The grammar is compact and regular, allowing for easy analysis by automatic
tools such as integrated development environments.

This document is maintained by mpvl@golang.org.
CUE has a lot of similarities with the Go language. This document draws heavily
from the Go specification as a result.

CUE draws its influence from many languages.
Its main influences were BCL/GCL (internal to Google),
LKB (LinGO), Go, and JSON.
Others are Swift, TypeScript, JavaScript, Prolog, NCL (internal to Google),
Jsonnet, HCL, Flabbergast, JSONPath, Haskell, Objective-C, and Python.


## Notation

The syntax is specified using Extended Backus-Naur Form (EBNF):

```
Production  = production_name "=" [ Expression ] "." .
Expression  = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term        = production_name | token [ "…" token ] | Group | Option | Repetition .
Group       = "(" Expression ")" .
Option      = "[" Expression "]" .
Repetition  = "{" Expression "}" .
```

Productions are expressions constructed from terms and the following operators,
in increasing precedence:

```
|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)
```

Lower-case production names are used to identify lexical tokens. Non-terminals
are in CamelCase. Lexical tokens are enclosed in double quotes "" or back quotes
``.

The form a … b represents the set of characters from a through b as
alternatives. The horizontal ellipsis … is also used elsewhere in the spec to
informally denote various enumerations or code snippets that are not further
specified. The character … (as opposed to the three characters ...) is not a
token of the CUE language.


## Source code representation

Source code is Unicode text encoded in UTF-8.
Unless otherwise noted, the text is not canonicalized, so a single
accented code point is distinct from the same character constructed from
combining an accent and a letter; those are treated as two code points.
For simplicity, this document will use the unqualified term character to refer
to a Unicode code point in the source text.

Each code point is distinct; for instance, upper and lower case letters are
different characters.

Implementation restriction: For compatibility with other tools, a compiler may
disallow the NUL character (U+0000) in the source text.

Implementation restriction: For compatibility with other tools, a compiler may
ignore a UTF-8-encoded byte order mark (U+FEFF) if it is the first Unicode code
point in the source text. A byte order mark may be disallowed anywhere else in
the source.


### Characters

The following terms are used to denote specific Unicode character classes:

```
newline        = /* the Unicode code point U+000A */ .
unicode_char   = /* an arbitrary Unicode code point except newline */ .
unicode_letter = /* a Unicode code point classified as "Letter" */ .
unicode_digit  = /* a Unicode code point classified as "Number, decimal digit" */ .
```

In The Unicode Standard 8.0, Section 4.5 "General Category" defines a set of
character categories.
CUE treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo
as Unicode letters, and those in the Number category Nd as Unicode digits.


### Letters and digits

The underscore character _ (U+005F) is considered a letter.

```
letter        = unicode_letter | "_" .
decimal_digit = "0" … "9" .
binary_digit  = "0" … "1" .
octal_digit   = "0" … "7" .
hex_digit     = "0" … "9" | "A" … "F" | "a" … "f" .
```


## Lexical elements

### Comments
Comments serve as program documentation. There are two forms:

1. Line comments start with the character sequence // and stop at the end of the line.
2. General comments start with the character sequence /* and stop with the first subsequent character sequence */.

A comment cannot start inside a string literal or inside a comment.
A general comment containing no newlines acts like a space.
Any other comment acts like a newline.
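
The effect of a comment on the token stream can be sketched as a small Python helper. This is only an illustration: the function name is hypothetical, and the input is assumed to be the full comment text including its delimiters.

```python
def comment_acts_like(comment: str) -> str:
    # Returns the white space the comment stands for: a space or a newline.
    is_general = comment.startswith("/*")
    if is_general and "\n" not in comment:
        return " "   # general comment contained on a single line
    return "\n"      # line comment, or general comment spanning lines
```

The distinction matters for the comma rule in the Commas section, since only a newline can end a line.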


### Tokens

Tokens form the vocabulary of the CUE language. There are four classes:
identifiers, keywords, operators and punctuation, and literals. White space,
formed from spaces (U+0020), horizontal tabs (U+0009), carriage returns
(U+000D), and newlines (U+000A), is ignored except as it separates tokens that
would otherwise combine into a single token. Also, a newline or end of file may
trigger the insertion of a comma. While breaking the input into tokens, the
next token is the longest sequence of characters that form a valid token.


### Commas

The formal grammar uses commas "," as terminators in a number of productions.
CUE programs may omit most of these commas using the following two rules:

When the input is broken into tokens, a comma is automatically inserted into
the token stream immediately after a line's final token if that token is

- an identifier
- null, true, false, bottom, or an integer, floating-point, or string literal
- one of the characters ), ], or }


Although commas are automatically inserted, the parser will require
explicit commas between two list elements.

To reflect idiomatic use, examples in this document elide commas using
these rules.
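
The insertion rule can be sketched in Python as follows. The sketch assumes the input has already been split into lines of `(kind, text)` tokens; the kind names and both function names are hypothetical and not part of CUE.

```python
# Token kinds and the (kind, text) tuple shape are assumptions of this sketch.
LITERAL_KINDS = {"ident", "int", "float", "string", "bottom"}
VALUE_KEYWORDS = {"null", "true", "false"}
CLOSERS = {")", "]", "}"}


def triggers_comma(token):
    """Report whether a line-final token causes a comma to be inserted."""
    kind, text = token
    if kind in LITERAL_KINDS:
        return True
    if kind == "keyword" and text in VALUE_KEYWORDS:
        return True
    return kind == "punct" and text in CLOSERS


def insert_commas(lines):
    """Flatten lines of tokens, adding a comma after each qualifying line."""
    out = []
    for tokens in lines:
        out.extend(tokens)
        if tokens and triggers_comma(tokens[-1]):
            out.append(("punct", ","))
    return out
```

For the three lines `a: 1`, `b: {`, and `}`, this yields the token texts `a : 1 , b : { } ,`; no comma follows the opening brace.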


### Identifiers

Identifiers name entities such as fields and aliases.
An identifier is a sequence of one or more letters and digits.
It may not be `_`.
The first character in an identifier must be a letter.

<!--
TODO: allow identifiers as defined in Unicode UAX #31
(https://unicode.org/reports/tr31/).

Identifiers are normalized using the NFC normal form.
-->

```
identifier  = letter { letter | unicode_digit } .
```

```
a
_x9
fieldName
αβ
```

<!-- TODO: Allow Unicode identifiers TR 31 http://unicode.org/reports/tr31/ -->

Some identifiers are [predeclared](#predeclared-identifiers).
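
A minimal Python sketch of the `identifier` production follows. It assumes Python's `unicodedata` tables as a stand-in for the Unicode categories named under Characters; those tables follow the interpreter's Unicode version, which is not necessarily 8.0. The function names are hypothetical.

```python
import unicodedata

# General Category values counted as letters in the Characters section.
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}


def is_letter(ch):
    """letter = unicode_letter | "_"."""
    return ch == "_" or unicodedata.category(ch) in LETTER_CATEGORIES


def is_identifier(s):
    """letter { letter | unicode_digit }, excluding the lone underscore."""
    if not s or s == "_":
        return False
    if not is_letter(s[0]):
        return False
    return all(is_letter(c) or unicodedata.category(c) == "Nd" for c in s[1:])
```

The check is purely lexical; it does not consider keywords or predeclared identifiers.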


### Keywords

CUE has a limited set of keywords.
All keywords may be used as labels (field names).
They cannot, however, be used as identifiers to refer to the same name.


#### Values

The following keywords are values.

```
null         true         false
```

These can never be used to refer to a field of the same name.
This restriction is to ensure compatibility with JSON configuration files.


#### Preamble

The following keywords are used in the preamble of a CUE file.
After the preamble, they may be used as identifiers to refer to namesake fields.

```
package      import
```


#### Comprehension clauses

The following keywords are used in comprehensions.

```
for          in           if           let
```

The keywords `for`, `if` and `let` cannot be used as identifiers to
refer to fields. All others can.

<!--
TODO:
    reduce [to]
    order [by]
-->


#### Arithmetic

The following pseudo keywords can be used as operators in expressions.

```
div          mod          quo          rem
```

These may be used as identifiers to refer to fields in all other contexts.


### Operators and punctuation

The following character sequences represent operators and punctuation:

```
+     div   &&    ==    <     .     (     )
-     mod   ||    !=    >     :     {     }
*     quo   &     =~    <=    =     [     ]
/     rem   |     !~    >=    <-    ...   ,
            _|_   !           ;
```
<!-- :: for "is-a" definitions -->


### Integer literals

An integer literal is a sequence of digits representing an integer value.
An optional prefix sets a non-decimal base: 0 or 0o for octal,
0x or 0X for hexadecimal, and 0b for binary.
In hexadecimal literals, letters a-f and A-F represent values 10 through 15.
All integers allow interstitial underscores "_";
these have no meaning and are solely for readability.

Decimal integers may have an SI or IEC multiplier.
Multipliers can be used with fractional numbers.
When multiplying a fraction by a multiplier, the result is truncated
towards zero if it is not an integer.

```
int_lit     = decimal_lit | si_lit | octal_lit | binary_lit | hex_lit .
decimal_lit = ( "1" … "9" ) { [ "_" ] decimal_digit } .
decimals    = decimal_digit { [ "_" ] decimal_digit } .
si_lit      = decimals [ "." decimals ] multiplier |
              "." decimals multiplier .
binary_lit  = "0b" binary_digit { [ "_" ] binary_digit } .
hex_lit     = "0" ( "x" | "X" ) hex_digit { [ "_" ] hex_digit } .
octal_lit   = "0" [ "o" ] octal_digit { [ "_" ] octal_digit } .
multiplier  = ( "K" | "M" | "G" | "T" | "P" | "E" | "Y" | "Z" ) [ "i" ] .

float_lit   = decimals "." [ decimals ] [ exponent ] |
              decimals exponent |
              "." decimals [ exponent ] .
exponent    = ( "e" | "E" ) [ "+" | "-" ] decimals .
```

```
42
1.5Gi
170_141_183_460_469_231_731_687_303_715_884_105_727
0xBad_Face
0o755
0b0101_0001
```
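
The truncation rule can be illustrated with a short Python sketch. It assumes the usual meanings of the prefixes (powers of 1000, or powers of 1024 when followed by `i`), which this section does not spell out; `si_value` is a hypothetical helper that expects a well-formed literal with a multiplier.

```python
from fractions import Fraction

# Power of the base per prefix letter; standard SI/IEC meanings are assumed.
EXPONENT = {"K": 1, "M": 2, "G": 3, "T": 4, "P": 5, "E": 6, "Z": 7, "Y": 8}


def si_value(lit: str) -> int:
    """Evaluate a literal such as "1.5Gi", truncating toward zero."""
    text = lit.replace("_", "")          # underscores carry no meaning
    base = 1000
    if text.endswith("i"):               # IEC form: powers of 1024
        base, text = 1024, text[:-1]
    number, prefix = text[:-1], text[-1]
    scaled = Fraction(number) * base ** EXPONENT[prefix]
    return int(scaled)                   # int() on a Fraction truncates
```

Under these assumptions `1.5Gi` evaluates to 1610612736, and `1.0001K` evaluates to 1000 because the fractional remainder is dropped.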

### Decimal floating-point literals

A decimal floating-point literal is a representation of
a decimal floating-point value (a _float_).
It has an integer part, a decimal point, a fractional part, and an
exponent part.
The integer and fractional parts comprise decimal digits; the
exponent part is an `e` or `E` followed by an optionally signed decimal exponent.
One of the integer part or the fractional part may be elided; one of the decimal
point or the exponent may be elided.

```
float_lit = decimals "." [ decimals ] [ exponent ] |
            decimals exponent |
            "." decimals [ exponent ] .
exponent  = ( "e" | "E" ) [ "+" | "-" ] decimals .
```

```
0.
72.40
072.40  // == 72.40
2.71828
1.e+0
6.67428e-11
1E6
.25
.12345E+5
```
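
The `float_lit` production translates fairly directly into a regular expression. The following Python sketch is one such translation; the names are hypothetical, and it only recognizes literals, it does not evaluate them.

```python
import re

# decimals = decimal_digit { [ "_" ] decimal_digit }
DECIMALS = r"[0-9](?:_?[0-9])*"
# exponent = ( "e" | "E" ) [ "+" | "-" ] decimals
EXPONENT = rf"[eE][+-]?{DECIMALS}"
# The three alternatives of float_lit, in grammar order.
FLOAT_LIT = re.compile(
    rf"(?:{DECIMALS}\.(?:{DECIMALS})?(?:{EXPONENT})?"
    rf"|{DECIMALS}{EXPONENT}"
    rf"|\.{DECIMALS}(?:{EXPONENT})?)"
)


def is_float_lit(s: str) -> bool:
    """Report whether s as a whole matches the float_lit production."""
    return FLOAT_LIT.fullmatch(s) is not None
```

A plain `42` is rejected because it has neither a decimal point nor an exponent, which is what separates floats from integers at the lexical level.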


### String and byte sequence literals

A string literal represents a string constant obtained from concatenating a
sequence of characters.
A byte sequence literal represents a sequence of bytes.

String and byte sequence literals are character sequences between,
respectively, double and single quotes, as in `"bar"` and `'bar'`.
Within the quotes, any character may appear except newline and,
respectively, unescaped double or single quote.
String literals must be valid UTF-8.
Byte sequences may contain any sequence of bytes.

Several escape sequences allow arbitrary values to be encoded as ASCII text.
An escape sequence starts with an _escape delimiter_, which is `\` by default.
The escape delimiter may be altered to be `\` plus a fixed number of
hash symbols `#`
by padding the start and end of a string or byte sequence literal
with this number of hash symbols.

There are four ways to represent the integer value as a numeric constant: `\x`
followed by exactly two hexadecimal digits; `\u` followed by exactly four
hexadecimal digits; `\U` followed by exactly eight hexadecimal digits; and a
plain backslash `\` followed by exactly three octal digits.
In each case the value of the literal is the value represented by the
digits in the corresponding base.
Hexadecimal and octal escapes are only allowed within byte sequences
(single quotes).

Although these representations all result in an integer, they have different
valid ranges.
Octal escapes must represent a value between 0 and 255 inclusive.
Hexadecimal escapes satisfy this condition by construction.
The escapes `\u` and `\U` represent Unicode code points so within them
some values are illegal, in particular those above `0x10FFFF`.
Surrogate halves are allowed,
but are translated into their non-surrogate equivalent internally.

The three-digit octal (`\nnn`) and two-digit hexadecimal (`\xnn`) escapes
represent individual bytes of the resulting string; all other escapes represent
the (possibly multi-byte) UTF-8 encoding of individual characters.
Thus inside a byte sequence literal `\377` and `\xFF` represent a single byte of
value `0xFF=255`, while `ÿ`, `\u00FF`, `\U000000FF` and `\xc3\xbf` represent
the two bytes `0xc3 0xbf` of the UTF-8
encoding of character `U+00FF`.

```
\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000b vertical tab
\/   U+002f slash (solidus)
\\   U+005c backslash
\'   U+0027 single quote  (valid escape only within single quoted literals)
\"   U+0022 double quote  (valid escape only within double quoted literals)
```

The escape `\(` is used as an escape for string interpolation.
A `\(` must be followed by a valid CUE Expression, followed by a `)`.

All other sequences starting with a backslash are illegal inside literals.

```
escaped_char     = `\` { `#` } ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | "/" | `\` | "'" | `"` ) .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = `\` octal_digit octal_digit octal_digit .
hex_byte_value   = `\` "x" hex_digit hex_digit .
little_u_value   = `\` "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = `\` "U" hex_digit hex_digit hex_digit hex_digit
                           hex_digit hex_digit hex_digit hex_digit .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
interpolation    = "\(" Expression ")" .

string_lit       = simple_string_lit |
                   multiline_string_lit |
                   simple_bytes_lit |
                   multiline_bytes_lit |
                   `#` string_lit `#` .

simple_string_lit    = `"` { unicode_value | interpolation } `"` .
simple_bytes_lit     = "'" { unicode_value | interpolation | byte_value } "'" .
multiline_string_lit = `"""` newline
                             { unicode_value | interpolation | newline }
                             newline `"""` .
multiline_bytes_lit  = "'''" newline
                             { unicode_value | interpolation | byte_value | newline }
                             newline "'''" .
```

Carriage return characters (`\r`) inside string literals are discarded from
the string value.

```
'a\000\xab'
'\007'
'\377'
'\xa'        // illegal: too few hexadecimal digits
"\n"
"\""
'Hello, world!\n'
"Hello, \( name )!"
"日本語"
"\u65e5本\U00008a9e"
"\xff\u00FF"
"\uD800"             // illegal: surrogate half (TODO: probably should allow)
"\U00110000"         // illegal: invalid Unicode code point

#"This is not an \(interpolation)"#
#"This is an \#(interpolation)"#
#"The sequence "\U0001F604" renders as \#U0001F604."#
```

These examples all represent the same string:

```
"日本語"                                 // UTF-8 input text
'日本語'                                 // UTF-8 input text as byte sequence
`日本語`                                 // UTF-8 input text as a raw literal
"\u65e5\u672c\u8a9e"                    // the explicit Unicode code points
"\U000065e5\U0000672c\U00008a9e"        // the explicit Unicode code points
"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e"  // the explicit UTF-8 bytes
```
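
The byte values in the last example can be checked with Python, whose `\u` and `\x` escapes happen to use the same digits here. This only verifies the UTF-8 encoding of the code points; it says nothing about how a CUE implementation stores strings.

```python
text = "\u65e5\u672c\u8a9e"      # the three code points for 日本語
utf8 = text.encode("utf-8")      # nine bytes, three per character

assert text == "日本語"
assert utf8 == b"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e"

# U+00FF takes two bytes in UTF-8, whereas the byte value 0xFF is one byte.
assert "\u00ff".encode("utf-8") == b"\xc3\xbf"
assert len(bytes([0xFF])) == 1
```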

If the source code represents a character as two code points, such as a
combining form involving an accent and a letter, the result will appear as two
code points if placed in a string literal.

Strings and byte sequences have a multiline equivalent.
Multiline strings are like their single-line equivalent,
but allow newline characters.

Multiline strings and byte sequences respectively start with
a triple double quote (`"""`) or triple single quote (`'''`),
immediately followed by a newline, which is discarded from the string contents.
The string is closed by a matching triple quote, which must be by itself
on a new line, preceded by optional whitespace.
The whitespace before a closing triple quote must appear before any non-empty
line after the opening quote and will be removed from each of these
lines in the string literal.
A closing triple quote may not appear in the string.
To include one, it suffices to escape one of the quotes.

```
"""
    lily:
    out of the water
    out of itself

    bass
    picking bugs
    off the moon
        — Nick Virgilio, Selected Haiku, 1988
    """
```

This represents the same string as:

```
"lily:\nout of the water\nout of itself\n\n" +
"bass\npicking bugs\noff the moon\n" +
"    — Nick Virgilio, Selected Haiku, 1988"
```
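
The whitespace rule can be sketched in Python. The sketch assumes the literal has already been split into the lines between the opening and closing quotes, and that the closing quote's leading whitespace is passed separately; `multiline_value` is a hypothetical name.

```python
def multiline_value(body_lines, closing_indent):
    """Strip the closing quote's leading whitespace from each non-empty line."""
    out = []
    for line in body_lines:
        if line == "":
            out.append("")                       # empty lines need no prefix
        elif line.startswith(closing_indent):
            out.append(line[len(closing_indent):])
        else:
            raise ValueError("line is not indented like the closing quote")
    return "\n".join(out)
```

For a closing quote indented by four spaces, the lines `"    a"`, `""`, and `"      b"` produce `"a\n\n  b"`: indentation beyond the closing quote's is kept.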

<!-- TODO: other values

Support for other values:
- Duration literals
- regular expressions: `re("[a-z]")`
-->


## Values

In addition to simple values like `"hello"` and `42.0`, CUE has _structs_.
A struct is a map from labels to values, like `{a: 42.0, b: "hello"}`.
Structs are CUE's only way of building up complex values;
lists, which we will see later,
are defined in terms of structs.

All possible values are ordered in a lattice,
a partial order where every two elements have a single greatest lower bound.
A value `a` is an _instance_ of a value `b`,
denoted `a ⊑ b`, if `b == a` or `b` is more general than `a`,
that is if `a` orders before `b` in the partial order
(`⊑` is _not_ a CUE operator).
We also say that `b` _subsumes_ `a` in this case.
In graphical terms, `b` is "above" `a` in the lattice.

At the top of the lattice is the single ancestor of all values, called
_top_, denoted `_` in CUE.
Every value is an instance of top.

At the bottom of the lattice is the value called _bottom_, denoted `_|_`.
A bottom value usually indicates an error.
Bottom is an instance of every value.

An _atom_ is any value whose only instances are itself and bottom.
Examples of atoms are `42.0`, `"hello"`, `true`, `null`.

A value is _concrete_ if it is either an atom, or a struct all of whose
field values are themselves concrete, recursively.

CUE's values also include what we normally think of as types, like `string` and
`float`.
But CUE does not distinguish between types and values; only the
relationship of values in the lattice is important.
Each CUE "type" subsumes the concrete values that one would normally think
of as part of that type.
For example, `"hello"` is an instance of `string`, and `42.0` is an instance of
`float`.
In addition to `string` and `float`, CUE has `null`, `int`, `bool` and `bytes`.
We informally call these CUE's "basic types".


```
false ⊑ bool
true  ⊑ bool
true  ⊑ true
5.0   ⊑ float
bool  ⊑ _
_|_   ⊑ _
_|_   ⊑ _|_

_     ⋢ _|_
_     ⋢ bool
int   ⋢ bool
bool  ⋢ int
false ⋢ true
true  ⋢ false
float ⋢ 5.0
5     ⋢ 6
```
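
The relations listed above can be reproduced with a toy model in Python. The model is an assumption made for illustration only: it covers top, bottom, the basic types, and atoms tagged with their type, and it ignores structs and lists. The names `atom` and `is_instance` are hypothetical.

```python
TOP, BOTTOM = "_", "_|_"


def atom(type_name, value):
    """Build an atom tagged with the basic type it belongs to."""
    return (type_name, value)


def is_instance(a, b):
    """Report whether a ⊑ b in this toy lattice."""
    if a == b or a == BOTTOM or b == TOP:
        return True
    # Otherwise only an atom can sit below something: its own basic type.
    return isinstance(a, tuple) and a[0] == b
```

Tagging atoms with their type keeps `5` and `5.0` apart, which plain Python equality would not.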


### Unification

The _unification_ of values `a` and `b`
is defined as the greatest lower bound of `a` and `b`. (That is, the
value `u` such that `u ⊑ a` and `u ⊑ b`,
and for any other value `v` for which `v ⊑ a` and `v ⊑ b`
it holds that `v ⊑ u`.)
Since CUE values form a lattice, the unification of two CUE values is
always unique.

These all follow from the definition of unification:
- The unification of `a` with itself is always `a`.
- The unification of values `a` and `b` where `a ⊑ b` is always `a`.
- The unification of a value with bottom is always bottom.

Unification in CUE is a [binary expression](#Operands), written `a & b`.
It is commutative and associative.
As a consequence, order of evaluation is irrelevant, a property that is key
to many of the constructs in the CUE language as well as the tooling layered
on top of it.
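
The properties above can be checked in a toy model, sketched here in Python. The model is an assumption of this example and not how CUE is implemented: a value is represented by the set of atoms it admits, so bottom is the empty set, `a ⊑ b` is the subset relation, and unification is set intersection.

```python
BOTTOM = frozenset()
INT_SAMPLE = frozenset({1, 2, 3})    # stand-in for a small "type"


def unify(a, b):
    """Greatest lower bound: the atoms admitted by both values."""
    return a & b


one = frozenset({1})
two = frozenset({2})

assert unify(one, one) == one                             # a & a is a
assert unify(one, INT_SAMPLE) == one                      # a ⊑ b gives a
assert unify(one, BOTTOM) == BOTTOM                       # bottom absorbs
assert unify(one, two) == BOTTOM                          # conflicting atoms
assert unify(one, INT_SAMPLE) == unify(INT_SAMPLE, one)   # commutative
```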


<!-- TODO: explicitly mention that disjunction is not a binary operation
but a definition of a single value?-->


### Disjunction

The _disjunction_ of values `a` and `b`
is defined as the least upper bound of `a` and `b`.
(That is, the value `d` such that `a ⊑ d` and `b ⊑ d`,
and for any other value `e` for which `a ⊑ e` and `b ⊑ e`,
it holds that `d ⊑ e`.)
This style of disjunctions is sometimes also referred to as sum types.
Since CUE values form a lattice, the disjunction of two CUE values is always unique.


These all follow from the definition of disjunction:
- The disjunction of `a` with itself is always `a`.
- The disjunction of values `a` and `b` where `a ⊑ b` is always `b`.
- The disjunction of a value `a` with bottom is always `a`.
- The disjunction of two bottom values is bottom.

Disjunction in CUE is a [binary expression](#Operands), written `a | b`.
It is commutative, associative, and idempotent.

The unification of a disjunction with another value is equal to the disjunction
composed of the unification of this value with all of the original elements
of the disjunction.
In other words, unification distributes over disjunction.

```
(a_0 | ... | a_n) & b ==> a_0&b | ... | a_n&b.
```

```
Expression                Result
({a:1} | {b:2}) & {c:3}   {a:1, c:3} | {b:2, c:3}
(int | string) & "foo"    "foo"
("a" | "b") & "c"         _|_
```
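
Distribution can be illustrated with a toy model in Python in which a value is the set of atoms it admits, so that disjunction is union and unification is intersection. This model and the function names are assumptions of the sketch; structs are not covered.

```python
def unify(a, b):
    """Unification as intersection of admitted atoms."""
    return a & b


def disjoin(*values):
    """Disjunction as union of admitted atoms."""
    return frozenset().union(*values)


def distribute(elements, b):
    """Compute (a_0 | ... | a_n) & b as a_0&b | ... | a_n&b."""
    return disjoin(*(unify(a, b) for a in elements))
```

With `elements` standing in for `int | string` and `b` for `"foo"`, `distribute` returns the same set as unifying the whole disjunction with `b` at once.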

A disjunction is _normalized_ if there is no element
`a` for which there is another element `b` such that `a ⊑ b`.

<!--
Normalization is important, as we need to account for spurious elements.
For instance "tcp" | "tcp" should resolve to "tcp".

Also consider

  ({a:1} | {b:1}) & ({a:1} | {b:2}) -> {a:1} | {a:1,b:1} | {a:1,b:2},

in this case, elements {a:1,b:1} and {a:1,b:2} are subsumed by {a:1} and thus
this expression is logically equivalent to {a:1} and should therefore be
considered to be unambiguous and resolve to {a:1} if a concrete value is needed.

For instance, in

  x: ({a:1} | {b:1}) & ({a:1} | {b:2}) // -> {a:1} | {a:1,b:1} | {a:1,b:2}
  y: x.a // 1

y should resolve to 1, and not an error.

For comparison, in

  x: ({a:1, b:1} | {b:2}) & {a:1} // -> {a:1,b:1} | {a:1,b:2}
  y: x.a // _|_

y should be an error as x is still ambiguous before the selector is applied,
even though `a` resolves to 1 in all cases.
-->
682
Jonathan Amsterdame4790382019-01-20 10:29:29 -0500683
Marcel van Lohuizen45163fa2019-01-22 15:53:32 +0100684#### Default values
Jonathan Amsterdame4790382019-01-20 10:29:29 -0500685
Marcel van Lohuizen6e5d9932019-03-14 15:52:48 +0100686Any element of a disjunction can be marked as a default
687by prefixing it with an asterisk '*'.
688Intuitively, when an expression needs to be resolved for an operation other
689than unification or disjunctions,
690non-starred elements are dropped in favor of starred ones if the starred ones
691do not resolve to bottom.
Jonathan Amsterdame4790382019-01-20 10:29:29 -0500692
More precisely, any value `v` may be associated with a default value `d`,
denoted `(v, d)` (not CUE syntax),
where `d` must be an instance of `v` (`d ⊑ v`).
The rules for unifying and disjoining such values are as follows:

```
U1: (v1, d1) & v2       => (v1&v2, d1&v2)
U2: (v1, d1) & (v2, d2) => (v1&v2, d1&d2)

D1: (v1, d1) | v2       => (v1|v2, d1)
D2: (v1, d1) | (v2, d2) => (v1|v2, d1|d2)
```
705
Default values may be introduced within disjunctions
by _marking_ terms of a disjunction with an asterisk `*`
([a unary expression](#Operators)).
The default value of a disjunction with marked terms is the disjunction
of those marked terms, applying the following rules for marks:

```
M1: *v        => (v, v)
M2: *(v1, d1) => (v1, d1)
```
716
In general, any operation `f` in CUE involving default values proceeds along the
following lines:
```
O1: f((v1, d1), ..., (vn, dn))  => (f(v1, ..., vn), f(d1, ..., dn))
```
where, with the exception of disjunction, a value `v` without a default
value is promoted to `(v, v)`.
724
725
```
Expression               Value-default pair      Rules applied
*"tcp" | "udp"           ("tcp"|"udp", "tcp")    M1, D1
string | *"foo"          (string, "foo")         M1, D1

*1 | 2 | 3               (1|2|3, 1)              M1, D1

(*1|2|3) | (1|*2|3)      (1|2|3, 1|2)            M1, D1, D2
(*1|2|3) | *(1|*2|3)     (1|2|3, 1|2)            M1, D1, M2, D2
(*1|2|3) | (1|*2|3)&2    (1|2|3, 1|2)            M1, D1, U1, D2

(*1|2) & (1|*2)          (1|2, _|_)              M1, D1, U2

(*1|2) + (1|*2)          ((1|2)+(1|2), 3)        M1, D1, O1
```
741
742The rules of subsumption for defaults can be derived from the above definitions
743and are as follows.
744
```
(v2, d2) ⊑ (v1, d1)    if v2 ⊑ v1 and d2 ⊑ d1
(v1, d1) ⊑ v           if v1 ⊑ v
v ⊑ (v1, d1)           if v ⊑ d1
```
750
751<!--
752For the second rule, note that by definition d1 ⊑ v1, so d1 ⊑ v1 ⊑ v.
753
754The last one is so restrictive as v could still be made more specific by
755associating it with a default that is not subsumed by d1.
756
757Proof:
758 by definition for any d ⊑ v, it holds that (v, d) ⊑ v,
759 where the most general value is (v, v).
760 Given the subsumption rule for (v2, d2) ⊑ (v1, d1),
761 from (v, v) ⊑ v ⊑ (v1, d1) it follows that v ⊑ d1
762 exactly defines the boundary of this subsumption.
763-->
Marcel van Lohuizen69139d62019-01-24 13:46:51 +0100764
765<!--
766(non-normalized entries could also be implicitly marked, allowing writing
767int | 1, instead of int | *1, but that can be done in a backwards
Marcel van Lohuizen6e5d9932019-03-14 15:52:48 +0100768compatible way later if really desirable, as long as we require that
769disjunction literals be normalized).
Jonathan Amsterdame4790382019-01-20 10:29:29 -0500770-->
771
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +0100772
```
Expression                        Resolves to
"tcp" | "udp"                     "tcp" | "udp"
*"tcp" | "udp"                    "tcp"
float | *1                        1
*string | 1.0                     string

(*1|2|3) | (1|*2|3)               1|2
(*1|2|3) & (1|*2|3)               1|2|3 // default is _|_

(* >=5 | int) & (* <=5 | int)     5

(*"tcp"|"udp") & ("udp"|*"tcp")   "tcp"
(*"tcp"|"udp") & ("udp"|"tcp")    "tcp"
(*"tcp"|"udp") & "tcp"            "tcp"
(*"tcp"|"udp") & (*"udp"|"tcp")   "tcp" | "udp" // default is _|_

(*true | false) & bool            true
(*true | false) & (true | false)  true

{a: 1} | {b: 1}                   {a: 1} | {b: 1}
{a: 1} | *{b: 1}                  {b:1}
*{a: 1} | *{b: 1}                 {a: 1} | {b: 1}
({a: 1} | {b: 1}) & {a:1}         {a:1}  // after eliminating {a:1,b:1} by normalization
({a:1}|*{b:1}) & ({a:1}|*{b:1})   {b:1}  // after eliminating {a:1,b:1} by normalization
```
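
The rules U1, U2, D1, D2, M1 and M2 can be tried out on a small model.
The following Python sketch is not CUE and not how any implementation works:
it assumes values are finite disjunctions of concrete elements, modeled as
sets, with the empty set standing in for bottom.
The names `Pair`, `lit`, `mark`, `unify`, `disjoin` and `resolve` are
hypothetical helpers introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Hashable, Optional

# A finite disjunction of concrete elements; the empty set models bottom (_|_).
Value = FrozenSet[Hashable]


@dataclass(frozen=True)
class Pair:
    """A value-default pair (v, d); d is None when no default is attached."""
    v: Value
    d: Optional[Value] = None


def lit(x: Hashable) -> Pair:
    """A single concrete element without a default."""
    return Pair(frozenset([x]))


def mark(p: Pair) -> Pair:
    # M1: *v => (v, v); M2: *(v1, d1) => (v1, d1)
    return p if p.d is not None else Pair(p.v, p.v)


def unify(a: Pair, b: Pair) -> Pair:
    v = a.v & b.v
    if a.d is None and b.d is None:
        return Pair(v)
    if b.d is None:
        return Pair(v, a.d & b.v)   # U1
    if a.d is None:
        return Pair(v, b.d & a.v)   # U1 with the operands swapped
    return Pair(v, a.d & b.d)       # U2


def disjoin(a: Pair, b: Pair) -> Pair:
    v = a.v | b.v
    if a.d is None and b.d is None:
        return Pair(v)
    if b.d is None:
        return Pair(v, a.d)         # D1
    if a.d is None:
        return Pair(v, b.d)         # D1 with the operands swapped
    return Pair(v, a.d | b.d)       # D2


def resolve(p: Pair) -> Value:
    # Starred elements win unless the default is absent or bottom.
    return p.d if p.d else p.v


tcp_first = disjoin(mark(lit("tcp")), lit("udp"))   # *"tcp" | "udp"
udp_first = disjoin(mark(lit("udp")), lit("tcp"))   # *"udp" | "tcp"
conflict = unify(tcp_first, udp_first)              # default becomes bottom
```

In this model `resolve(tcp_first)` keeps only `"tcp"`, while
`resolve(conflict)` falls back to the full disjunction because the two
defaults unify to bottom.
Infinite values such as `int` or `>=5` are outside the scope of the sketch.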
Jonathan Amsterdame4790382019-01-20 10:29:29 -0500799
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +0100800
### Bottom and errors

Any evaluation error in CUE results in a bottom value, represented by
the token `_|_`.
Bottom is an instance of every other value.
807
808Implementations may associate error strings with different instances of bottom;
Jonathan Amsterdame4790382019-01-20 10:29:29 -0500809logically they all remain the same value.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +0100810
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +0100811
### Top

Top is represented by the underscore character `_`, lexically an identifier.
Unifying any value `v` with top results in `v` itself.

```
Expr       Result
_ & 5      5
_ & _      _
_ & _|_    _|_
_ | _|_    _
```
824
825
### Null

The _null value_ is represented with the keyword `null`.
It has only one parent, top, and one child, bottom.
It is unordered with respect to any other value.

```
null_lit   = "null"
```

```
null & 8     _|_
null & _     null
null & _|_   _|_
```
841
842
### Boolean values

A _boolean type_ represents the set of Boolean truth values denoted by
the keywords `true` and `false`.
The predeclared boolean type is `bool`; it is a defined type and a separate
element in the lattice.

```
boolean_lit = "true" | "false"
```

```
bool & true          true
true & true          true
true & false         _|_
bool & (false|true)  false | true
bool & (true|false)  true | false
```
861
862
### Numeric values

The _integer type_ represents the set of all integral numbers.
The _decimal floating-point type_ represents the set of all decimal floating-point
numbers.
They are two distinct types.
Both are instances of a generic `number` type.
870
871<!--
872 number
873 / \
874 int float
875-->
876
The predeclared number, integer, and decimal floating-point types are
`number`, `int` and `float`; they are defined types.
879<!--
880TODO: should we drop float? It is somewhat preciser and probably a good idea
881to have it in the programmatic API, but it may be confusing to have to deal
882with it in the language.
883-->
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +0100884
A decimal floating-point literal always has type `float`;
it is not an instance of `int` even if it is an integral number.

Integer literals are always of type `int` and don't match type `float`.

Numeric literals are exact values of arbitrary precision.
If the operation permits it, numbers should be kept in arbitrary precision.

Implementation restriction: although numeric values have arbitrary precision
in the language, implementations may implement them using an internal
representation with limited precision.
That said, every implementation must:

- Represent integer values with at least 256 bits.
- Represent floating-point values with a mantissa of at least 256 bits and
a signed binary exponent of at least 16 bits.
- Give an error if unable to represent an integer value precisely.
- Give an error if unable to represent a floating-point value due to overflow.
- Round to the nearest representable value if unable to represent
a floating-point value due to limits on precision.

These requirements apply to the result of any expression except for builtin
functions for which an unusual loss of precision must be explicitly documented.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +0100907
908
### Strings

The _string type_ represents the set of all possible UTF-8 strings,
not allowing surrogates.
The predeclared string type is `string`; it is a defined type.

Strings are designed to be Unicode-safe.
Comparison is done using canonical forms ("é" == "e\u0301").
A string element is an
[extended grapheme cluster](https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries),
which is an approximation of a human-readable character.

The length of a string `s` (its size in bytes) can be discovered using
the built-in function `len`.
A string's extended grapheme cluster can be accessed by integer index
0 through `len(s)-1` for any byte that is part of that grapheme cluster.

To access the individual bytes of a string one should convert it to
a sequence of bytes first.
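
The difference between canonical comparison and byte length can be seen in
a short Python sketch.
It assumes that "canonical form" means Unicode normalization form NFC, which
the text above does not state explicitly; `canonical_equal` and `byte_len`
are hypothetical helper names.

```python
import unicodedata


def canonical_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (an assumption)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def byte_len(s: str) -> int:
    """Size of the string in bytes when encoded as UTF-8."""
    return len(s.encode("utf-8"))


precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # e followed by a combining acute accent
```

Here `canonical_equal(precomposed, decomposed)` holds even though the two
spellings differ in byte length (2 bytes versus 3 bytes in UTF-8).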
928
929
### Bounds

A _bound_, syntactically a [unary expression](#Operands), defines
an infinite disjunction of concrete values that can be represented
as a single comparison.

For any [comparison operator](#Comparison-operators) `op` except `==`,
`op a` is the disjunction of every `x` such that `x op a`.

```
2 & >=2 & <=5           // 2, where 2 is either an int or float.
2.5 & >=1 & <=5         // 2.5
2 & >=1.0 & <3.0        // 2.0
2 & >1 & <3.0           // 2.0
2.5 & int & >1 & <5     // _|_
2.5 & float & >1 & <5   // 2.5
int & 2 & >1.0 & <3.0   // _|_
2.5 & >=(int & 1) & <5  // _|_
>=0 & <=7 & >=3 & <=10  // >=3 & <=7
!=null & 1              // 1
>=5 & <=5               // 5
```
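
Unifying bounds amounts to intersecting the sets of values they describe.
The Python sketch below illustrates this for the inclusive bounds `>=` and
`<=` only, treating a concrete number `n` as the interval from `n` to `n`.
It deliberately ignores strict bounds, `!=`, and the distinction between
`int` and `float`; the helper names are hypothetical.

```python
import math
from typing import Optional, Tuple

# An inclusive interval (lo, hi); math.inf marks a missing bound on that side.
Interval = Tuple[float, float]


def at_least(n: float) -> Interval:   # models >=n
    return (n, math.inf)


def at_most(n: float) -> Interval:    # models <=n
    return (-math.inf, n)


def exactly(n: float) -> Interval:    # models the concrete value n
    return (n, n)


def unify(*parts: Interval) -> Optional[Interval]:
    """Intersect all intervals; None stands in for bottom."""
    lo = max(p[0] for p in parts)
    hi = min(p[1] for p in parts)
    return (lo, hi) if lo <= hi else None


def is_concrete(i: Interval) -> bool:
    """An interval with equal ends denotes a single value."""
    return i[0] == i[1]
```

For example, `unify(at_least(0), at_most(7), at_least(3), at_most(10))`
yields `(3, 7)`, and `unify(at_least(5), at_most(5))` collapses to the
single value 5.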
952
953
### Structs

A _struct_ is a set of elements called _fields_, each of
which has a name, called a _label_, and a value.

We say a label is defined for a struct if the struct has a field with the
corresponding label.
The value for a label `f` of struct `a` is denoted `a.f`.
A struct `a` is an instance of `b`, or `a ⊑ b`, if for any label `f`
defined for `b`, label `f` is also defined for `a` and `a.f ⊑ b.f`.
Note that if `a` is an instance of `b` it may have fields with labels that
are not defined for `b`.

The (unique) struct with no fields, written `{}`, has every struct as an
instance. It can be considered the type of all structs.

The successful unification of structs `a` and `b` is a new struct `c` which
has all fields of both `a` and `b`, where
the value of a field `f` in `c` is `a.f & b.f` if `f` is in both `a` and `b`,
or just `a.f` or `b.f` if `f` is in just `a` or `b`, respectively.
Any [references](#References) to `a` or `b`
in their respective field values need to be replaced with references to `c`.
The result of a unification is bottom (`_|_`) if any of its fields evaluates
to bottom, recursively.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +0100978
Marcel van Lohuizen5fee32f2019-01-21 22:18:48 +0100979A field name may also be an interpolated string.
980Identifiers used in such strings are evaluated within
981the scope of the struct in which the label is defined.
Jonathan Amsterdame4790382019-01-20 10:29:29 -0500982
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +0100983Syntactically, a struct literal may contain multiple fields with
984the same label, the result of which is a single field with a value
Jonathan Amsterdame4790382019-01-20 10:29:29 -0500985that is the unification of the values of those fields.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +0100986
Marcel van Lohuizen5fee32f2019-01-21 22:18:48 +0100987A TemplateLabel indicates a template value that is to be unified with
988the values of all fields within a struct.
989The identifier of a template label binds to the field name of each
990field and is visible within the template value.
991
```
StructLit       = "{" [ Declaration { "," Declaration } [ "," ] ] "}" .
Declaration     = FieldDecl | AliasDecl | ComprehensionDecl .
FieldDecl       = Label { Label } ":" Expression { attribute } .

AliasDecl       = Label "=" Expression .
TemplateLabel   = "<" identifier ">" .
ConcreteLabel   = identifier | simple_string_lit .
OptionalLabel   = ConcreteLabel "?" .
Label           = ConcreteLabel | OptionalLabel | TemplateLabel .

attribute       = "@" identifier "(" attr_elems ")" .
attr_elems      = attr_elem { "," attr_elem } .
attr_elem       = attr_string | attr_label | attr_nest .
attr_label      = identifier "=" attr_string .
attr_nest       = identifier "(" attr_elems ")" .
attr_string     = { attr_char } | string_lit .
attr_char       = /* an arbitrary Unicode code point except newline, ',', '"', `'`, '#', '=', '(', and ')' */ .
```
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001011
1012```
1013{a: 1} ⊑ {}
1014{a: 1, b: 1} ⊑ {a: 1}
1015{a: 1} ⊑ {a: int}
1016{a: 1, b: 1} ⊑ {a: int, b: float}
1017
1018{} ⋢ {a: 1}
1019{a: 2} ⋢ {a: 1}
1020{a: 1} ⋢ {b: 1}
1021```
1022
```
Expression                             Result
{a: int, a: 1}                         {a: int(1)}
{a: int} & {a: 1}                      {a: int(1)}
{a: >=1 & <=7} & {a: >=5 & <=9}        {a: >=5 & <=7}
{a: >=1 & <=7, a: >=5 & <=9}           {a: >=5 & <=7}

{a: 1} & {b: 2}                        {a: 1, b: 2}
{a: 1, b: int} & {b: 2}                {a: 1, b: int(2)}

{a: 1} & {a: 2}                        _|_
```
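
The instance relation and the unification of structs can be mimicked with
nested Python dictionaries.
This is only a sketch under simplifying assumptions: leaf values are either
concrete Python values or Python types such as `int` standing in for CUE
types, references between fields are not modeled, and floats are avoided
because Python treats `1 == 1.0` as true.
`BOTTOM`, `unify` and `is_instance` are hypothetical names.

```python
from typing import Any

BOTTOM = object()  # stands in for _|_


def unify(a: Any, b: Any) -> Any:
    """Unify two values; structs are dicts, leaves are values or Python types."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = {}
        for f in a.keys() | b.keys():
            if f in a and f in b:
                v = unify(a[f], b[f])   # field in both: unify the values
            elif f in a:
                v = a[f]                # field only in a
            else:
                v = b[f]                # field only in b
            if v is BOTTOM:
                return BOTTOM           # bottom propagates upward
            out[f] = v
        return out
    if isinstance(a, dict) or isinstance(b, dict):
        return BOTTOM                   # a struct never unifies with a leaf here
    if a == b:
        return a
    if isinstance(a, type) and isinstance(b, a):
        return b                        # a type unified with one of its instances
    if isinstance(b, type) and isinstance(a, b):
        return a
    return BOTTOM


def is_instance(a: Any, b: Any) -> bool:
    """Report whether a ⊑ b."""
    if isinstance(b, dict):
        return isinstance(a, dict) and all(
            f in a and is_instance(a[f], b[f]) for f in b
        )
    return not isinstance(a, dict) and unify(a, b) == a
```

With these definitions, `is_instance({"a": 1, "b": 1}, {"a": 1})` holds,
`is_instance({}, {"a": 1})` does not, and `unify({"a": 1}, {"a": 2})`
returns `BOTTOM`.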
1035
Marcel van Lohuizenb9b62d32019-03-14 23:50:15 +01001036Fields may be associated with attributes.
1037Attributes define additional information about a field,
1038such as a mapping to a protobuf tag or alternative
1039name of the field when mapping to a different language.
1040
1041If a field has multiple attributes their identifiers must be unique.
1042Attributes accumulate when unifying two fields, removing duplicate entries.
1043It is an error for the resulting field to have two different attributes
1044with the same identifier.
1045
1046Attributes are not directly part of the data model, but may be
1047accessed through the API or other means of reflection.
1048The interpretation of the attribute value
1049(a comma-separated list of attribute elements) depends on the attribute.
1050Interpolations are not allowed in attribute strings.
1051
1052The recommended convention, however, is to interpret the first
1053`n` arguments as positional arguments,
1054where duplicate conflicting entries are an error,
1055and the remaining arguments as a combination of flags
1056(an identifier) and key value pairs, separated by a `=`.
1057
1058```
1059MyStruct1: {
1060 field: string @go(Field)
1061 attr: int @xml(,attr) @go(Attr)
1062}
1063
1064MyStruct2: {
1065 field: string @go(Field)
1066 attr: int @xml(a1,attr) @go(Attr)
1067}
1068
1069Combined: MyStruct1 & MyStruct2
1070// field: string @go(Field)
1071// attr: int @xml(,attr) @xml(a1,attr) @go(Attr)
1072```
1073
Jonathan Amsterdame4790382019-01-20 10:29:29 -05001074In addition to fields, a struct literal may also define aliases.
Marcel van Lohuizen62b87272019-02-01 10:07:49 +01001075Aliases name values that can be referred to
1076within the [scope](#declarations-and-scopes) of their
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001077definition, but are not part of the struct: aliases are irrelevant to
1078the partial ordering of values and are not emitted as part of any
1079generated data.
1080The name of an alias must be unique within the struct literal.
1081
1082```
Jonathan Amsterdame4790382019-01-20 10:29:29 -05001083// The empty struct.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001084{}
1085
Jonathan Amsterdame4790382019-01-20 10:29:29 -05001086// A struct with 3 fields and 1 alias.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001087{
1088 alias = 3
1089
1090 foo: 2
1091 bar: "a string"
1092
1093 "not an ident": 4
1094}
1095```
1096
Jonathan Amsterdame4790382019-01-20 10:29:29 -05001097A field whose value is a struct with a single field may be written as
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001098a sequence of the two field names,
1099followed by a colon and the value of that single field.
1100
1101```
Marcel van Lohuizen5fee32f2019-01-21 22:18:48 +01001102job myTask replicas: 2
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001103```
Jonathan Amsterdame4790382019-01-20 10:29:29 -05001104expands to
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001105```
Jonathan Amsterdame4790382019-01-20 10:29:29 -05001106job: {
1107 myTask: {
1108 replicas: 2
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001109 }
1110}
1111```
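
The expansion is purely mechanical: each label in the sequence wraps the
value in one more struct.
A minimal Python sketch, using nested dictionaries for structs and a
hypothetical helper `expand`:

```python
from typing import Any, Sequence


def expand(labels: Sequence[str], value: Any) -> Any:
    """Wrap value in one single-field struct per label, innermost label last."""
    for label in reversed(labels):
        value = {label: value}
    return value


job = expand(["job", "myTask", "replicas"], 2)
```

The result `job` is the nested structure
`{"job": {"myTask": {"replicas": 2}}}`.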
1112
Marcel van Lohuizenfe4abac2019-04-06 17:19:03 +02001113
#### Optional fields

An identifier or string label may be followed by a question mark `?`
to indicate a field is optional.
The question mark is not part of the field name.
Constraints defined by an optional field should only be applied when
a field is present.
A field with such a marker may be omitted from output and should not cause
an error when emitting a concrete configuration, even if its value is
not concrete or bottom.
The result of unifying two fields only has an optional marker
if both fields have such a marker.
1126
1127<!--
1128The optional marker solves the issue of having to print large amounts of
1129boilerplate when dealing with large types with many optional or default
1130values (such as Kubernetes).
1131Writing such optional values in terms of *null | value is tedious,
1132unpleasant to read, and as it is not well defined what can be dropped or not,
1133all null values have to be emitted from the output, even if the user
1134doesn't override them.
1135Part of the issue is how null is defined. We could adopt a Typescript-like
1136approach of introducing "void" or "undefined" to mean "not defined and not
1137part of the output". But having all of null, undefined, and void can be
1138confusing. If these ever are introduced anyway, the ? operator could be
1139expressed along the lines of
1140 foo?: bar
1141being a shorthand for
1142 foo: void | bar
1143where void is the default if no other default is given.
1144
1145The current mechanical definition of "?" is straightforward, though, and
1146probably avoids the need for void, while solving a big issue.
1147
1148Caveats:
1149[1] this definition requires explicitly defined fields to be emitted, even
1150if they could be elided (for instance if the explicit value is the default
1151value defined an optional field). This is probably a good thing.
1152
1153[2] a default value may still need to be included in an output if it is not
1154the zero value for that field and it is not known if any outside system is
1155aware of defaults. For instance, which defaults are specified by the user
1156and which by the schema understood by the receiving system.
1157The use of "?" together with defaults should therefore be used carefully
1158in non-schema definitions.
1159Problematic cases should be easy to detect by a vet-like check, though.
1160
1161[3] It should be considered how this affects the trim command.
1162Should values implied by optional fields be allowed to be removed?
1163Probably not. This restriction is unlikely to limit the usefulness of trim,
1164though.
1165
1166[4] There should be an option to emit all concrete optional values.
1167```
1168-->
1169
```
Input                         Result
a: { foo?: string }           {}
b: { foo: "bar" }             { foo: "bar" }
c: { foo?: *"bar" | string }  {}

d: a & b                      { foo: "bar" }
e: b & c                      { foo: "bar" }
f: a & c                      {}
g: a & { foo?: number }       _|_
```
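
How the optional marker combines under unification, and how it affects
output, can be sketched in Python.
The model below covers only rows `a`, `b` and `d` of the table: it assumes a
toy value lattice in which a Python type such as `str` subsumes its
instances, and it does not model defaults or the treatment of bottom in
optional fields.
All names are hypothetical.

```python
from typing import Any, Dict, NamedTuple

BOTTOM = object()  # stands in for _|_


class Field(NamedTuple):
    value: Any
    optional: bool = False


def unify_value(x: Any, y: Any) -> Any:
    # Toy lattice: a Python type (e.g. str) subsumes its instances.
    if x == y:
        return x
    if isinstance(x, type) and isinstance(y, x):
        return y
    if isinstance(y, type) and isinstance(x, y):
        return x
    return BOTTOM


def unify_struct(a: Dict[str, Field], b: Dict[str, Field]) -> Dict[str, Field]:
    out = dict(a)
    for label, fb in b.items():
        if label in out:
            fa = out[label]
            # The marker survives only if both fields carry it.
            out[label] = Field(unify_value(fa.value, fb.value),
                               fa.optional and fb.optional)
        else:
            out[label] = fb
    return out


def emit(s: Dict[str, Field]) -> Dict[str, Any]:
    """Drop fields that are still optional when producing output."""
    return {label: f.value for label, f in s.items() if not f.optional}


a = {"foo": Field(str, optional=True)}   # a: { foo?: string }
b = {"foo": Field("bar")}                # b: { foo: "bar" }
d = unify_struct(a, b)                   # d: a & b
```

In this model `emit(a)` is empty, while `emit(d)` contains `foo` because
unifying with the regular field in `b` removes the optional marker.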
1181
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001182
### Lists

A list literal defines a new value of type list.
A list may be open or closed.
An open list is indicated with a `...` at the end of an element list,
optionally followed by a value for the remaining elements.

The length of a closed list is the number of elements it contains.
The length of an open list is its number of elements as a lower bound
and an unlimited number of elements as its upper bound.

```
ListLit       = "[" [ ElementList [ "," [ "..." [ Expression ] ] ] ] "]" .
ElementList   = Expression { "," Expression } .
```
1198<!---
1199KeyedElement = Element .
1200--->
1201
Marcel van Lohuizen5fee32f2019-01-21 22:18:48 +01001202Lists can be thought of as structs:
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001203
1204```
Marcel van Lohuizen08466f82019-02-01 09:09:09 +01001205List: *null | {
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001206 Elem: _
1207 Tail: List
1208}
1209```
1210
For closed lists, `Tail` is `null` for the last element; for open lists it is
`*null | List`, defaulting to the shortest variant.
For instance, the open list `[ 1, 2, ... ]` can be represented as:
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001214```
1215open: List & { Elem: 1, Tail: { Elem: 2 } }
1216```
1217and the closed version of this list, [ 1, 2 ], as
1218```
1219closed: List & { Elem: 1, Tail: { Elem: 2, Tail: null } }
1220```
1221
Marcel van Lohuizen5fee32f2019-01-21 22:18:48 +01001222Using this representation, the subsumption rule for lists can
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001223be derived from those of structs.
1224Implementations are not required to implement lists as structs.
Marcel van Lohuizen5fee32f2019-01-21 22:18:48 +01001225The `Elem` and `Tail` fields are not special and `len` will not work as
1226expected in these cases.
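
The struct view of a list can be built mechanically.
The Python sketch below uses nested dictionaries for structs and `None` for
`null`; the helper `list_as_struct` is hypothetical and, to avoid guessing
at edge cases the text does not cover, only handles non-empty lists.

```python
from typing import Any, Dict, Sequence


def list_as_struct(elems: Sequence[Any], is_open: bool = False) -> Dict[str, Any]:
    """Encode a non-empty list as nested Elem/Tail structs."""
    if not elems:
        raise ValueError("this sketch only handles non-empty lists")
    node: Dict[str, Any] = {"Elem": elems[-1]}
    if not is_open:
        node["Tail"] = None   # a closed list ends in an explicit null tail
    for e in reversed(elems[:-1]):
        node = {"Elem": e, "Tail": node}
    return node


open_list = list_as_struct([1, 2], is_open=True)
closed_list = list_as_struct([1, 2])
```

The only difference between `open_list` and `closed_list` is that the last
element of the closed variant carries `Tail: null`, which prevents further
elements from being unified in.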
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001227
1228
## Declarations and Scopes


### Blocks

A _block_ is a possibly empty sequence of declarations.
The braces of a struct literal `{ ... }` form a block, but there are
others as well:

- The _universe block_ encompasses all CUE source text.
- Each [package](#modules-instances-and-packages) has a _package block_
  containing all CUE source text in that package.
- Each file has a _file block_ containing all CUE source text in that file.
- Each `for` and `let` clause in a [comprehension](#comprehensions)
  is considered to be its own implicit block.

Blocks nest and influence [scoping](#declarations-and-scope).
1246
1247
### Declarations and scope
1249
Marcel van Lohuizen5fee32f2019-01-21 22:18:48 +01001250A _declaration_ binds an identifier to a field, alias, or package.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001251Every identifier in a program must be declared.
1252Other than for fields,
1253no identifier may be declared twice within the same block.
1254For fields an identifier may be declared more than once within the same block,
1255resulting in a field with a value that is the result of unifying the values
1256of all fields with the same identifier.
1257
1258```
1259TopLevelDecl = Declaration | Emit .
1260Emit = Operand .
1261```
1262
1263The _scope_ of a declared identifier is the extent of source text in which the
Marcel van Lohuizen5fee32f2019-01-21 22:18:48 +01001264identifier denotes the specified field, alias, or package.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001265
CUE is lexically scoped using blocks:

1. The scope of a [predeclared identifier](#predeclared-identifiers) is the universe block.
1. The scope of an identifier denoting a field or alias
   declared at top level (outside any struct literal) is the file block.
1. The scope of the package name of an imported package is the file block of the
   file containing the import declaration.
1. The scope of a field or alias identifier declared inside a struct literal
   is the innermost containing block.
1275
1276An identifier declared in a block may be redeclared in an inner block.
1277While the identifier of the inner declaration is in scope, it denotes the entity
1278declared by the inner declaration.
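
Lookup through nested blocks behaves like a chain of maps searched from the
innermost block outward.
The following Python sketch uses `collections.ChainMap` for this; the block
contents and identifier names are made up for illustration.

```python
from collections import ChainMap

# Hypothetical contents of three nested blocks, outermost first.
universe_block = {"int": "predeclared int", "string": "predeclared string"}
file_block = {"x": "file-level x", "y": "file-level y"}
struct_block = {"x": "inner x"}   # redeclares x in an inner block

# Lookups consult the innermost block first, then the enclosing ones.
inner_scope = ChainMap(struct_block, file_block, universe_block)
file_scope = ChainMap(file_block, universe_block)
```

Within `inner_scope` the identifier `x` denotes the inner declaration,
while `y` and `int` are still found in the enclosing blocks;
in `file_scope`, `x` denotes the file-level declaration.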
1279
1280The package clause is not a declaration;
Jonathan Amsterdame4790382019-01-20 10:29:29 -05001281the package name does not appear in any scope.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001282Its purpose is to identify the files belonging to the same package
Marcel van Lohuizen75cb0032019-01-11 12:10:48 +01001283and to specify the default name for import declarations.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001284
1285
### Predeclared identifiers

```
Functions
len       required  close     open

Types
null      The null type and value
bool      All boolean values
int       All integral numbers
float     All decimal floating-point numbers
string    Any valid UTF-8 sequence
bytes     Any valid byte sequence

Derived   Value
number    int | float
uint      >=0
uint8     >=0 & <=255
int8      >=-128 & <=127
uint16    >=0 & <=65_535
int16     >=-32_768 & <=32_767
rune      >=0 & <=0x10FFFF
uint32    >=0 & <=4_294_967_295
int32     >=-2_147_483_648 & <=2_147_483_647
uint64    >=0 & <=18_446_744_073_709_551_615
int64     >=-9_223_372_036_854_775_808 & <=9_223_372_036_854_775_807
uint128   >=0 & <=340_282_366_920_938_463_463_374_607_431_768_211_455
int128    >=-170_141_183_460_469_231_731_687_303_715_884_105_728 &
           <=170_141_183_460_469_231_731_687_303_715_884_105_727
float32   >=-3.40282346638528859811704183484516925440e+38 &
          <=3.40282346638528859811704183484516925440e+38
float64   >=-1.797693134862315708145274237317043567981e+308 &
          <=1.797693134862315708145274237317043567981e+308
```
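
The bounds of the sized integer types follow from their bit width:
an unsigned type of `n` bits spans 0 through 2^n - 1, and a signed
two's-complement type of `n` bits spans -2^(n-1) through 2^(n-1) - 1.
A short Python sketch with hypothetical helper names:

```python
from typing import Tuple


def uint_bounds(bits: int) -> Tuple[int, int]:
    """Inclusive (min, max) of an unsigned integer type with the given width."""
    return 0, 2**bits - 1


def int_bounds(bits: int) -> Tuple[int, int]:
    """Inclusive (min, max) of a signed two's-complement integer type."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
```

For instance, `uint_bounds(16)` gives `(0, 65535)` and `int_bounds(8)`
gives `(-128, 127)`.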
1320
1321
### Exported and manifested identifiers
1323
1324An identifier of a package may be exported to permit access to it
1325from another package.
1326An identifier is exported if both:
1327the first character of the identifier's name is not a Unicode lower case letter
1328(Unicode class "Ll") or the underscore "_"; and
1329the identifier is declared in the file block.
1330All other identifiers are not exported.
1331
1332An identifier that starts with the underscore "_" is not
1333emitted in any data output.
Jonathan Amsterdame4790382019-01-20 10:29:29 -05001334Quoted labels that start with an underscore are emitted, however.
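
The two conditions for export and the rule for hidden identifiers can be
written as simple predicates.
The Python sketch below assumes a non-empty identifier and takes the
"declared in the file block" condition as an input flag, since it depends on
where the identifier appears; the function names are hypothetical.

```python
import unicodedata


def may_be_exported(name: str, declared_in_file_block: bool) -> bool:
    """Apply both export conditions to a non-empty identifier."""
    first = name[0]
    starts_lower = unicodedata.category(first) == "Ll"   # Unicode class "Ll"
    return declared_in_file_block and not starts_lower and first != "_"


def is_hidden(name: str) -> bool:
    """Identifiers starting with an underscore are not emitted in data output."""
    return name.startswith("_")
```

Under these rules `Foo` declared in the file block is exported, whereas
`foo` and `_Foo` are not, and `_Foo` is additionally left out of any data
output.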
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001335
### Uniqueness of identifiers
1337
1338Given a set of identifiers, an identifier is called unique if it is different
1339from every other in the set, after applying normalization following
1340Unicode Annex #31.
1341Two identifiers are different if they are spelled differently.
1342<!--
1343or if they appear in different packages and are not exported.
1344--->
1345Otherwise, they are the same.
1346
1347
### Field declarations

A field declaration binds a label (the name of the field) to an expression.
The name for a quoted string used as label is the string it represents.
The name for an identifier used as a label is the identifier itself.
Quoted strings and identifiers can be used interchangeably, with the
exception of identifiers starting with an underscore `_`.
The latter represent hidden fields and are treated in a different namespace.
1356
Marcel van Lohuizenfe4abac2019-04-06 17:19:03 +02001357If an expression may result in a value associated with a default value
1358as described in [default values](#default-values), the field binds to this
1359value-default pair.
1360
Marcel van Lohuizenbcf832f2019-04-03 22:50:44 +02001361<!-- TODO: disallow creating identifiers starting with __
1362...and reserve them for builtin values.
1363
1364The issue is with code generation. As no guarantee can be given that
1365a predeclared identifier is not overridden in one of the enclosing scopes,
1366code will have to handle detecting such cases and renaming them.
1367An alternative is to have the predeclared identifiers be aliases for namesake
1368equivalents starting with a double underscore (e.g. string -> __string),
1369allowing generated code (normal code would keep using `string`) to refer
1370to these directly.
1371-->
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001372
Marcel van Lohuizenfe4abac2019-04-06 17:19:03 +02001373
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001374### Alias declarations
1375
1376An alias declaration binds an identifier to the given expression.
1377
1378Within the scope of the identifier, it serves as an _alias_ for that
1379expression.
1380The expression is evaluated in the scope as it was declared.


## Expressions

An expression specifies the computation of a value by applying operators and
built-in functions to operands.

Expressions that require concrete values are called _incomplete_ if any of
their operands are not concrete, but define a value that would be legal for
that expression.
Incomplete expressions may be left unevaluated until a concrete value is
requested at the application level.

### Operands

Operands denote the elementary values in an expression.
An operand may be a literal, a (possibly qualified) identifier denoting
a field or alias, or a parenthesized expression.

```
Operand     = Literal | OperandName | ListComprehension | "(" Expression ")" .
Literal     = BasicLit | ListLit | StructLit .
BasicLit    = int_lit | float_lit | string_lit |
              null_lit | bool_lit | bottom_lit | top_lit .
OperandName = identifier | QualifiedIdent .
```

### Qualified identifiers

A qualified identifier is an identifier qualified with a package name prefix.

```
QualifiedIdent = PackageName "." identifier .
```

A qualified identifier accesses an identifier in a different package,
which must be [imported].
The identifier must be declared in the [package block] of that package.

```
math.Sin    // denotes the Sin function in package math
```

### References

An identifier operand refers to a field and is called a reference.
The value of a reference is a copy of the expression associated with the field
that it is bound to,
with any references within that expression bound to the respective copies of
the fields they were originally bound to.
Implementations may use a different mechanism to evaluate as long as
these semantics are maintained.

```
a: {
    place:    string
    greeting: "Hello, \(place)!"
}

b: a & { place: "world" }
c: a & { place: "you" }

d: b.greeting  // "Hello, world!"
e: c.greeting  // "Hello, you!"
```


### Primary expressions

Primary expressions are the operands for unary and binary expressions.


```

Slice: indices must be complete
([0, 1, 2, 3] | [2, 3])[0:2] => [0, 1] | [2, 3]

([0, 1, 2, 3] | *[2, 3])[0:2] => [0, 1] | [2, 3]
([0,1,2,3]|[2,3], [2,3])[0:2] => ([0,1]|[2,3], [2,3])

Index
a: (1|2, 1)
b: ([0,1,2,3]|[2,3], [2,3])[a] => ([0,1,2,3]|[2,3][a], 3)

Binary operation
A binary operation is only evaluated if its operands are complete.

Input        Maximum allowed evaluation
a: string    string
b: 2         2
c: a * b     a * 2

An error in a struct is if the evaluation of any expression results in
bottom, where an incomplete expression is not considered bottom.
```
<!-- TODO(mpvl)
    Conversion |
-->
```
PrimaryExpr =
    Operand |
    PrimaryExpr Selector |
    PrimaryExpr Index |
    PrimaryExpr Slice |
    PrimaryExpr Arguments .

Selector       = "." identifier .
Index          = "[" Expression "]" .
Slice          = "[" [ Expression ] ":" [ Expression ] "]" .
Argument       = Expression .
Arguments      = "(" [ ( Argument { "," Argument } ) [ "..." ] [ "," ] ] ")" .
```
<!---
Argument       = Expression | ( identifier ":" Expression ).
--->

```
x
2
(s + ".txt")
f(3.1415, true)
m["foo"]
s[i : j + 1]
obj.color
f.p[i].x
```


### Selectors

For a [primary expression] `x` that is not a [package name],
the selector expression

```
x.f
```

denotes the field `f` of the value `x`.
The identifier `f` is called the field selector.
The type of the selector expression is the type of `f`.
If `x` is a package name, see the section on [qualified identifiers].

<!--
TODO: consider allowing this and also for selectors. It needs to be considered
how defaults are carried forward in cases like:

    x: { a: string | *"foo" } | *{ a: int | *4 }
    y: x.a & string

What is y in this case?
   (x.a & string, _|_)
   (string|"foo", _|_)
   (string|"foo", "foo")
If the latter, then why?

For a disjunction of the form `x1 | ... | xn`,
the selector is applied to each element `x1.f | ... | xn.f`.
-->

Otherwise, if `x` is not a struct, or if `f` does not exist in `x`,
the result of the expression is bottom (an error).
In the latter case the expression is incomplete.
The operand of a selector may be associated with a default.

```
T: {
    x: int
    y: 3
}

a: T.x  // int
b: T.y  // 3
c: T.z  // _|_ // field 'z' not found in T

e: {a: 1|*2} | *{a: 3|*4}
f: e.a  // 4 (default value)
```

<!--
```
(v, d).f  =>  (v.f, d.f)

e: {a: 1|*2} | *{a: 3|*4}
f: e.a   // 4 after selecting default from (({a: 1|*2} | {a: 3|*4}).a, 4)

```
-->


### Index expressions

A primary expression of the form

```
a[x]
```

denotes the element of the list, string, bytes, or struct `a` indexed by `x`.
The value `x` is called the index or field name, respectively.
The following rules apply:

If `a` is not a struct:

- `a` is a concrete string or bytes type or a list (which need not be complete)
- the index `x` unified with `int` must be concrete.
- the index `x` is in range if `0 <= x < len(a)`, where only the
  explicitly defined values of an open-ended list are considered,
  otherwise it is out of range

The result of `a[x]` is

for `a` of list or bytes type:

- the list or byte element at index `x`, if `x` is within range
- bottom (an error), otherwise

for `a` of string type:

- the grapheme cluster at the `x`th byte (type string), if `x` is within range
  where `x` may match any byte of the grapheme cluster
- bottom (an error), otherwise

for `a` of struct type:

- the index `x` unified with `string` must be concrete.
- the value of the field named `x` of struct `a`, if this field exists
- bottom (an error), otherwise


```
[ 1, 2 ][1]     // 2
[ 1, 2 ][2]     // _|_
[ 1, 2, ...][2] // _|_
"He\u0300?"[0]  // "H"
"He\u0300?"[1]  // "e\u0300"
"He\u0300?"[2]  // "e\u0300"
"He\u0300?"[3]  // "e\u0300"
"He\u0300?"[4]  // "?"
"He\u0300?"[5]  // _|_
```
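The string rule above can be modeled with a short Python sketch. It is illustrative only and rests on two assumptions: a grapheme cluster is approximated as a base character plus any combining marks that follow it (simpler than Unicode extended grapheme clusters), and `None` stands in for bottom. The names `clusters` and `index_string` are hypothetical.

```python
import unicodedata


def clusters(s: str):
    """Return (start_byte, end_byte, text) triples for approximate clusters.

    Byte offsets refer to the UTF-8 encoding of s. A combining mark is
    attached to the cluster that precedes it.
    """
    out = []
    pos = 0
    for ch in s:
        n = len(ch.encode("utf-8"))
        if out and unicodedata.combining(ch):
            start, _, text = out[-1]
            out[-1] = (start, pos + n, text + ch)
        else:
            out.append((pos, pos + n, ch))
        pos += n
    return out


def index_string(s: str, x: int):
    """Return the cluster covering byte x, or None (bottom) if out of range."""
    for start, end, text in clusters(s):
        if start <= x < end:
            return text
    return None
```

Applied to `"He\u0300?"`, byte offsets 1 through 3 all fall inside the cluster `"e\u0300"`, which is why the three middle examples above give the same result.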

Both the operand and index value may be a value-default pair.
```
va[vi]              =>  va[vi]
va[(vi, di)]        =>  (va[vi], va[di])
(va, da)[vi]        =>  (va[vi], da[vi])
(va, da)[(vi, di)]  =>  (va[vi], da[di])
```

```
Fields                  Result
x: [1, 2] | *[3, 4]     ([1,2]|[3,4], [3,4])
i: int | *1             (int, 1)

v: x[i]                 (x[i], 4)
```

### Slice expressions

Slice expressions construct a substring or slice from a string, bytes,
or list value.

For strings, bytes or lists, the primary expression
```
a[low : high]
```
constructs a substring or slice. The indices `low` and `high` must be
concrete integers and select
which elements of operand `a` appear in the result.
The result has indices starting at 0 and length equal to `high` - `low`.
After slicing the list `a`
<!-- TODO(jba): how does slicing open lists work? -->

<!-- TODO: consider this.
If `a` is a disjunction of the form `a1 | ... | an`, then the result is
`a1[low:high] | ... | an[low:high]` observing the above rules.
-->
```
a: [1, 2, 3, 4, 5]
s: a[1:4]
```
the list `s` has length 3 and elements
```
s[0] == 2
s[1] == 3
s[2] == 4
```
For convenience, any of the indices may be omitted.
A missing `low` index defaults to zero; a missing `high` index defaults
to the length of the sliced operand:
```
a[2:]  // same as a[2 : len(a)]
a[:3]  // same as a[0 : 3]
a[:]   // same as a[0 : len(a)]
```

Indices are in range if `0 <= low <= high <= len(a)`,
otherwise they are out of range.
For strings, each index selects the start of the extended grapheme cluster
at the byte position indicated by the index.
If any of the slice values is out of range or if `low > high`, the result of
a slice is bottom (an error).

```
"He\u0300?"[:2]  // "He\u0300"
"He\u0300?"[1:2] // "e\u0300"
"He\u0300?"[4:5] // "e\u0300?"
```


The result of a successful slice operation is a value of the same type
as the operand.
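For lists, the index defaults and the range rule can be sketched in Python as below. This is a model rather than an implementation: `slice_list` is a hypothetical helper, `None` as an argument means an omitted index, and `None` as a result stands in for bottom. String slicing with grapheme clusters is not modeled.

```python
def slice_list(a, low=None, high=None):
    """Model a[low:high] for a closed list a."""
    # A missing low defaults to zero, a missing high to len(a).
    low = 0 if low is None else low
    high = len(a) if high is None else high
    # Indices are in range only if 0 <= low <= high <= len(a).
    if not (0 <= low <= high <= len(a)):
        return None  # bottom
    return a[low:high]
```

Unlike Python's built-in slicing, which silently clamps out-of-range indices, this model reports them as errors, which is what the range rule requires.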

Both the slice operand and the slice indices may be associated with a default.

<!--
```
va[vs:ve]                   => va[vs:ve]
va[vs:(ve, de)]             => (va[vs:ve], va[vs:de])
va[(vs, ds):ve]             => (va[vs:ve], va[ds:ve])
va[(vs, ds):(ve, de)]       => (va[vs:ve], va[ds:de])
(va, da)[vs:ve]             => (va[vs:ve], da[vs:ve])
(va, da)[vs:(ve, de)]       => (va[vs:ve], da[vs:de])
(va, da)[(vs, ds):ve]       => (va[vs:ve], da[ds:ve])
(va, da)[(vs, ds):(ve, de)] => (va[vs:ve], da[ds:de])
```
-->

### Operators

Operators combine operands into expressions.

```
Expression = UnaryExpr | Expression binary_op Expression .
UnaryExpr  = PrimaryExpr | unary_op UnaryExpr .

binary_op  = "|" | "&" | "||" | "&&" | "==" | rel_op | add_op | mul_op .
rel_op     = "!=" | "<" | "<=" | ">" | ">=" | "=~" | "!~" .
add_op     = "+" | "-" .
mul_op     = "*" | "/" | "div" | "mod" | "quo" | "rem" .
unary_op   = "+" | "-" | "!" | "*" | rel_op .
```

Comparisons are discussed [elsewhere](#Comparison-operators).
For any binary operator, the operand types must unify.
<!-- TODO: durations
 unless the operation involves durations.

Except for duration operations, if one operand is an untyped [literal] and the
other operand is not, the constant is [converted] to the type of the other
operand.
-->

Operands of unary and binary expressions may be associated with a default,
as the following examples show.
<!--
```
O1: op (v1, d1)          => (op v1, op d1)

O2: (v1, d1) op (v2, d2) => (v1 op v2, d1 op d2)
and because v => (v, v)
O3: v1       op (v2, d2) => (v1 op v2, v1 op d2)
O4: (v1, d1) op v2       => (v1 op v2, d1 op v2)
```
-->

```
Field               Resulting Value-Default pair
a: *1|2             (1|2, 1)
b: -a               (-a, -1)

c: a + 2            (a+2, 3)
d: a + a            (a+a, 2)
```
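The results in this table are consistent with applying the operator separately to the value part and to the default part of each operand, treating a plain value `v` as the pair `(v, v)`. The Python sketch below models only that bookkeeping. `Pair`, `as_pair`, `unary`, and `binary` are hypothetical names, and the value part is kept as an unevaluated string because it may not be concrete.

```python
import operator
from typing import Any, NamedTuple


class Pair(NamedTuple):
    expr: str     # symbolic, unevaluated value part
    default: Any  # concrete default part


def as_pair(x) -> Pair:
    """Treat a plain value v as the pair (v, v)."""
    return x if isinstance(x, Pair) else Pair(repr(x), x)


def unary(symbol, fn, x) -> Pair:
    x = as_pair(x)
    return Pair(f"{symbol}{x.expr}", fn(x.default))


def binary(symbol, fn, x, y) -> Pair:
    x, y = as_pair(x), as_pair(y)
    return Pair(f"{x.expr}{symbol}{y.expr}", fn(x.default, y.default))


# a: *1|2 has value part 1|2 (referred to symbolically as "a") and default 1.
a = Pair("a", 1)
b = unary("-", operator.neg, a)
c = binary("+", operator.add, a, 2)
d = binary("+", operator.add, a, a)
```

Here `b`, `c`, and `d` carry the defaults `-1`, `3`, and `2`, matching the table.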

#### Operator precedence

Unary operators have the highest precedence.

There are seven precedence levels for binary operators.
Multiplication operators bind strongest, followed by
addition operators, comparison operators,
`&&` (logical AND), `||` (logical OR), `&` (unification),
and finally `|` (disjunction):

```
Precedence    Operator
    7             *  /  div mod quo rem
    6             +  -
    5             ==  !=  <  <=  >  >= =~ !~
    4             &&
    3             ||
    2             &
    1             |
```

Binary operators of the same precedence associate from left to right.
For instance, `x / y * z` is the same as `(x / y) * z`.
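To make the table concrete, the following Python sketch fully parenthesizes a binary expression using precedence climbing. It is a teaching aid under stated assumptions, not the CUE parser: operands are limited to identifiers, numbers, and parenthesized expressions, unary operators, selectors, indexing, and calls are not handled, and the names `PRECEDENCE`, `tokenize`, and `parenthesize` are hypothetical.

```python
import re

# Binary operator precedence, copied from the table above.
PRECEDENCE = {
    "*": 7, "/": 7, "div": 7, "mod": 7, "quo": 7, "rem": 7,
    "+": 6, "-": 6,
    "==": 5, "!=": 5, "<": 5, "<=": 5, ">": 5, ">=": 5, "=~": 5, "!~": 5,
    "&&": 4,
    "||": 3,
    "&": 2,
    "|": 1,
}

TOKEN = re.compile(r"\s*(==|!=|<=|>=|=~|!~|&&|\|\||[-+*/<>&|()]|\w+)")


def tokenize(src):
    return TOKEN.findall(src)


def parenthesize(src):
    """Return src with every binary operation explicitly parenthesized."""
    tokens = tokenize(src)
    pos = 0

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner = expr(1)
            pos += 1  # skip the closing ")"
            return inner
        return tok

    def expr(min_prec):
        nonlocal pos
        left = primary()
        # Keep consuming operators that bind at least as tightly as min_prec.
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            # prec + 1 on the right-hand side yields left associativity.
            right = expr(PRECEDENCE[op] + 1)
            left = f"({left} {op} {right})"
        return left

    return expr(1)
```

For example, `parenthesize("x / y * z")` groups the division first, and `parenthesize("a | b & c")` groups the unification before the disjunction.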

```
+x
23 + 3*x[i]
x <= f()
f() || g()
x == y+1 && y == z-1
2 | int
{ a: 1 } & { b: 2 }
```

#### Arithmetic operators

Arithmetic operators apply to numeric values and yield a result of the same type
as the first operand. Three of the four standard arithmetic operators
`(+, -, *)` apply to integer and decimal floating-point types;
`+` and `*` also apply to lists, strings, and bytes.
`/` only applies to decimal floating-point types and
`div`, `mod`, `quo`, and `rem` only apply to integer types.

```
+    sum                    integers, floats, lists, strings, bytes
-    difference             integers, floats
*    product                integers, floats, lists, strings, bytes
/    quotient               floats
div  division               integers
mod  modulo                 integers
quo  quotient               integers
rem  remainder              integers
```

For any operator that accepts operands of type `float`, any operand may be
of type `int` or `float`, in which case the result will be `float` if any
of the operands is `float` or `int` otherwise.
For `/` the result is always `float`.
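One reading of the numeric result-type rule can be written down as a small Python function. The sketch covers only `int` and `float` operands (lists, strings, and bytes are left out), takes the kinds as plain strings, and uses `None` for an operand combination that is not allowed. `result_kind` and `INTEGER_ONLY` are hypothetical names.

```python
INTEGER_ONLY = {"div", "mod", "quo", "rem"}


def result_kind(op, left, right):
    """Return "int" or "float" for numeric operand kinds, or None if invalid."""
    if op in INTEGER_ONLY:
        # These operators only apply to integer operands.
        return "int" if left == right == "int" else None
    if op == "/":
        # The result of / is always float.
        return "float"
    if op in {"+", "-", "*"}:
        # float if any operand is float, int otherwise.
        return "float" if "float" in (left, right) else "int"
    raise ValueError(f"not an arithmetic operator: {op}")
```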


#### Integer operators

For two integer values `x` and `y`,
the integer quotient `q = x div y` and remainder `r = x mod y`
implement Euclidean division and
satisfy the following relationship:

```
r = x - y*q  with 0 <= r < |y|
```
where `|y|` denotes the absolute value of `y`.

```
 x     y   x div y   x mod y
 5     3      1         2
-5     3     -2         1
 5    -3     -1         2
-5    -3      2         1
```

For two integer values `x` and `y`,
the integer quotient `q = x quo y` and remainder `r = x rem y`
implement truncated division and
satisfy the following relationship:

```
x = q*y + r  and  |r| < |y|
```

with `x quo y` truncated towards zero.

```
 x     y   x quo y   x rem y
 5     3      1         2
-5     3     -1        -2
 5    -3     -1         2
-5    -3      1        -2
```

A zero divisor in either case results in bottom (an error).
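Both pairs of operators can be modeled in a few lines of Python, which makes the difference between the two tables easy to check. In this sketch `div_mod` and `quo_rem` are hypothetical helper names and `None` stands in for bottom.

```python
def div_mod(x: int, y: int):
    """Euclidean division: returns (q, r) with r = x - y*q and 0 <= r < |y|."""
    if y == 0:
        return None  # bottom
    r = x % abs(y)     # Python's % with a positive modulus is non-negative
    q = (x - r) // y   # exact, since x - r is a multiple of y
    return q, r


def quo_rem(x: int, y: int):
    """Truncated division: returns (q, r) with x = q*y + r, q rounded to zero."""
    if y == 0:
        return None  # bottom
    q = abs(x) // abs(y)
    if (x < 0) != (y < 0):
        q = -q
    return q, x - q * y
```

The two definitions agree when both operands are positive and differ otherwise: `div_mod(-5, 3)` gives `(-2, 1)` while `quo_rem(-5, 3)` gives `(-1, -2)`.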

For integer operands, the unary operators `+` and `-` are defined as follows:

```
+x                          is 0 + x
-x    negation              is 0 - x
```


#### Decimal floating-point operators

For decimal floating-point numbers, `+x` is the same as `x`,
while `-x` is the negation of `x`.
The result of a floating-point division by zero is bottom (an error).
<!-- TODO: consider making it +/- Inf -->

An implementation may combine multiple floating-point operations into a single
fused operation, possibly across statements, and produce a result that differs
from the value obtained by executing and rounding the instructions individually.


#### List operators

Lists can be concatenated using the `+` operator.
Open lists are closed to their default value beforehand.

```
[ 1, 2 ]      + [ 3, 4 ]       // [ 1, 2, 3, 4 ]
[ 1, 2, ... ] + [ 3, 4 ]       // [ 1, 2, 3, 4 ]
[ 1, 2 ]      + [ 3, 4, ... ]  // [ 1, 2, 3, 4 ]
```

Lists can be multiplied by a non-negative `int` using the `*` operator
to create a list that repeats the original the indicated number of times.
```
3*[1,2]         // [1, 2, 1, 2, 1, 2]
3*[1, 2, ...]   // [1, 2, 1, 2, 1, 2]
[byte]*4        // [byte, byte, byte, byte]
0*[1,2]         // []
```
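A small Python model of these two list operators is sketched below. It assumes that closing an open list to its default simply drops the trailing `...` and keeps the explicitly listed elements, as the examples above suggest. `CueList`, `concat`, and `repeat` are hypothetical names, and `None` stands in for bottom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CueList:
    elems: tuple
    open: bool = False  # True models a trailing "..."

    def closed(self) -> "CueList":
        """Close an open list to its explicitly listed elements."""
        return CueList(self.elems, False)


def concat(a: CueList, b: CueList) -> CueList:
    """Model a + b: both operands are closed before concatenation."""
    return CueList(a.closed().elems + b.closed().elems)


def repeat(n: int, a: CueList):
    """Model n * a for a non-negative integer n."""
    if n < 0:
        return None  # bottom
    return CueList(a.closed().elems * n)
```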

<!-- TODO(mpvl): should we allow multiplication with a range?
If so, how does one specify a list with a range of possible lengths?

Suggestion from jba:
Multiplication should distribute over disjunction,
so int(1)..int(3) * [x] = [x] | [x, x] | [x, x, x].
The hard part is figuring out what (>=1 & <=3) * [x] means,
since >=1 & <=3 includes many floats.
(mpvl: could constrain arguments to parameter types, but needs to be
done consistently.)
-->


#### String operators

Strings can be concatenated using the `+` operator:
```
s: "hi " + name + " and good bye"
```
String addition creates a new string by concatenating the operands.

A string can be repeated by multiplying it:

```
s: "etc. "*3  // "etc. etc. etc. "
```
<!-- jba: Do these work for byte sequences?  If not, why not? -->


#### Comparison operators

Comparison operators compare two operands and yield an untyped boolean value.

```
==    equal
!=    not equal
<     less
<=    less or equal
>     greater
>=    greater or equal
=~    matches regular expression
!~    does not match regular expression
```
<!-- regular expression operator inspired by Bash, Perl, and Ruby. -->

In any comparison, the types of the two operands must unify or one of the
operands must be null.

The equality operators `==` and `!=` apply to operands that are comparable.
The ordering operators `<`, `<=`, `>`, and `>=` apply to operands that are ordered.
The matching operators `=~` and `!~` apply to a string and regular
expression operand.
These terms and the result of the comparisons are defined as follows:

- Null is comparable with itself and any other type.
  Two null values are always equal; null is unequal to anything else.
- Boolean values are comparable.
  Two boolean values are equal if they are either both true or both false.
- Integer values are comparable and ordered, in the usual way.
- Floating-point values are comparable and ordered, as per the definitions
  for binary coded decimals in the IEEE-754-2008 standard.
- Floating-point numbers may be compared with integers.
- String values are comparable and ordered, lexically byte-wise after
  normalization to Unicode normal form NFC.
- Structs are not comparable.
- Lists are not comparable.
- The regular expression syntax is the one accepted by RE2,
  described in https://github.com/google/re2/wiki/Syntax,
  except for `\C`.
- `s =~ r` is true if `s` matches the regular expression `r`.
- `s !~ r` is true if `s` does not match regular expression `r`.
<!-- TODO: Implementations should adopt an algorithm that runs in linear time? -->
<!-- Consider implementing Level 2 of Unicode regular expression. -->

```
3 < 4       // true
3 < 4.0     // true
null == 2   // false
null != {}  // true
{} == {}    // _|_: structs are not comparable against structs

"Wild cats" =~ "cat"   // true
"Wild cats" !~ "dog"   // true

"foo" =~ "^[a-z]{3}$"  // true
"foo" =~ "^[a-z]{4}$"  // false
```
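The string rules can be approximated in Python as a rough sketch. Two caveats apply: Python's `re` module is not RE2, so it serves here only as a stand-in for simple patterns like the ones above, and the helper names `sort_key`, `string_equal`, `string_less`, and `matches` are hypothetical.

```python
import re
import unicodedata


def sort_key(s: str) -> bytes:
    """Normalize to NFC, then compare by UTF-8 bytes."""
    return unicodedata.normalize("NFC", s).encode("utf-8")


def string_equal(a: str, b: str) -> bool:
    return sort_key(a) == sort_key(b)


def string_less(a: str, b: str) -> bool:
    return sort_key(a) < sort_key(b)


def matches(s: str, pattern: str) -> bool:
    """Model s =~ pattern as an unanchored search, as the examples suggest."""
    return re.search(pattern, s) is not None
```

Because of the normalization step, a precomposed `é` and an `e` followed by a combining acute accent compare as equal in this model.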

<!-- jba
I think I know what `3 < a` should mean if

    a: >=1 & <=5

It should be a constraint on `a` that can be evaluated once `a`'s value is known more precisely.

But what does `3 < (>=1 & <=5)` mean? We'll never get more information, so it must have a definite value.
-->

#### Logical operators

Logical operators apply to boolean values and yield a result of the same type
as the operands. The right operand is evaluated conditionally.

```
&&    conditional AND    p && q  is  "if p then q else false"
||    conditional OR     p || q  is  "if p then true else q"
!     NOT                !p      is  "not p"
```


<!--
### TODO TODO TODO

3.14 / 0.0   // illegal: division by zero
Illegal conversions always apply to CUE.

Implementation restriction: A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa.
-->

<!--- TODO(mpvl): conversions
### Conversions
Conversions are expressions of the form `T(x)` where `T` and `x` are
expressions.
The result is always an instance of `T`.

```
Conversion = Expression "(" Expression [ "," ] ")" .
```
--->
<!---

A literal value `x` can be converted to type T if `x` is representable by a
value of `T`.

As a special case, an integer literal `x` can be converted to a string type
using the same rule as for non-constant x.

Converting a literal yields a typed value as result.

```
uint(iota)               // iota value of type uint
float32(2.718281828)     // 2.718281828 of type float32
complex128(1)            // 1.0 + 0.0i of type complex128
float32(0.49999999)      // 0.5 of type float32
float64(-1e-1000)        // 0.0 of type float64
string('x')              // "x" of type string
string(0x266c)           // "♬" of type string
MyString("foo" + "bar")  // "foobar" of type MyString
string([]byte{'a'})      // not a constant: []byte{'a'} is not a constant
(*int)(nil)              // not a constant: nil is not a constant, *int is not a boolean, numeric, or string type
int(1.2)                 // illegal: 1.2 cannot be represented as an int
string(65.0)             // illegal: 65.0 is not an integer constant
```
--->
Marcel van Lohuizend340e8d2019-01-30 16:57:39 +01002052<!---
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002053
Jonathan Amsterdame4790382019-01-20 10:29:29 -05002054A conversion is always allowed if `x` is an instance of `T`.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002055
Jonathan Amsterdame4790382019-01-20 10:29:29 -05002056If `T` and `x` of different underlying type, a conversion is allowed if
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002057`x` can be converted to a value `x'` of `T`'s type, and
2058`x'` is an instance of `T`.
2059A value `x` can be converted to the type of `T` in any of these cases:
2060
Marcel van Lohuizen45163fa2019-01-22 15:53:32 +01002061- `x` is a struct and is subsumed by `T`.
2062- `x` and `T` are both integer or floating points.
2063- `x` is an integer or a byte sequence and `T` is a string.
2064- `x` is a string and `T` is a byte sequence.
Jonathan Amsterdame4790382019-01-20 10:29:29 -05002065
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002066Specific rules apply to conversions between numeric types, structs,
2067or to and from a string type. These conversions may change the representation
2068of `x`.
2069All other conversions only change the type but not the representation of x.
2070
2071
2072#### Conversions between numeric ranges
2073For the conversion of numeric values, the following rules apply:
2074
Marcel van Lohuizen45163fa2019-01-22 15:53:32 +010020751. Any integer value can be converted into any other integer value
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002076 provided that it is within range.
20772. When converting a decimal floating-point number to an integer, the fraction
2078 is discarded (truncation towards zero). TODO: or disallow truncating?
2079
2080```
2081a: uint16(int(1000)) // uint16(1000)
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +01002082b: uint8(1000) // _|_ // overflow
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002083c: int(2.5) // 2 TODO: TBD
2084```
2085
2086
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002087#### Conversions to and from a string type
2088
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002089Converting a list of bytes to a string type yields a string whose successive
2090bytes are the elements of the slice.
2091Invalid UTF-8 is converted to `"\uFFFD"`.
2092
2093```
2094string('hell\xc3\xb8') // "hellø"
2095string(bytes([0x20])) // " "
2096```
2097
2098As string value is always convertible to a list of bytes.
2099
2100```
2101bytes("hellø") // 'hell\xc3\xb8'
2102bytes("") // ''
2103```
2104
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002105#### Conversions between list types
2106
2107Conversions between list types are possible only if `T` strictly subsumes `x`
2108and the result will be the unification of `T` and `x`.
2109
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002110If we introduce named types this would be different from IP & [10, ...]
2111
2112Consider removing this until it has a different meaning.
2113
2114```
2115IP: 4*[byte]
2116Private10: IP([10, ...]) // [10, byte, byte, byte]
2117```

#### Conversions between struct types

A conversion from `x` to `T`
is applied using the following rules:

1. `x` must be an instance of `T`,
2. all fields defined for `x` that are not defined for `T` are removed from
   the result of the conversion, recursively.

<!-- jba: I don't think you say anywhere that the matching fields are unified.
mpvl: they are not, x must be an instance of T, in which case x == T&x,
so unification would be unnecessary.
-->
<!--
```
T: {
    a: { b: 1..10 }
}

x1: {
    a: { b: 8, c: 10 }
    d: 9
}

c1: T(x1)             // { a: { b: 8 } }
c2: T({})             // _|_  // missing field 'a' in '{}'
c3: T({ a: {b: 0} })  // _|_  // field a.b does not unify (0 & 1..10)
```
-->
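
The two rules can be illustrated with a small Python model. This is a sketch, not the CUE implementation: structs are modelled as dicts, leaf constraints as predicates, bottom as an exception, and every field of the template is assumed to be required. The names `convert`, `T` and `x1` are chosen for illustration.

```python
def convert(template, value, path="x"):
    """Keep only the fields of value that template defines, recursively.

    template is a dict of field templates (a struct) or a predicate that a
    leaf value must satisfy. Failing the instance check raises ValueError.
    """
    if isinstance(template, dict):
        if not isinstance(value, dict):
            raise ValueError(f"{path}: expected a struct")
        result = {}
        for name, sub in template.items():
            # Assumption: every field defined for the template is required.
            if name not in value:
                raise ValueError(f"{path}: missing field {name!r}")
            result[name] = convert(sub, value[name], f"{path}.{name}")
        return result  # fields not defined for the template are dropped
    if not template(value):
        raise ValueError(f"{path}: {value!r} is not an instance")
    return value

T = {"a": {"b": lambda v: isinstance(v, int) and 1 <= v <= 10}}
x1 = {"a": {"b": 8, "c": 10}, "d": 9}

print(convert(T, x1))  # {'a': {'b': 8}}
```

Fields `a.c` and `d` are removed because the template does not define them, while `convert(T, {})` and `convert(T, {"a": {"b": 0}})` fail the instance check.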

### Calls

Calls can be made to core library functions, called builtins.
Given an expression `f` of function type `F`,
```
f(a1, a2, … an)
```
calls `f` with arguments a1, a2, … an. Arguments must be expressions
whose values are instances of the parameter types of `F`;
they are evaluated before the function is called.

```
a: math.Atan2(x, y)
```

In a function call, the function value and arguments are evaluated in the usual
order.
After they are evaluated, the parameters of the call are passed by value
to the function and the called function begins execution.
The return parameters
of the function are passed by value back to the calling function when the
function returns.


### Comprehensions

Lists and fields can be constructed using comprehensions.

Each defines a clause sequence that consists of a sequence of `for`, `if`, and
`let` clauses, nesting from left to right.
The `for` and `let` clauses each define a new scope in which new values are
bound to be available for the next clause.

The `for` clause binds the defined identifiers, on each iteration, to the next
value of some iterable value in a new scope.
A `for` clause may bind one or two identifiers.
If there is one identifier, it is bound to the value, for instance
a list element, a struct field value or a range element.
If there are two identifiers, the first is bound to the key or index,
if available, and the second to the value.

An `if` clause, or guard, specifies an expression that terminates the current
iteration if it evaluates to false.

The `let` clause binds the result of an expression to the defined identifier
in a new scope.

The current iteration is said to complete if the innermost block of the clause
sequence is reached.

_List comprehensions_ specify a single expression that is evaluated and included
in the list for each completed iteration.

_Field comprehensions_ follow a `Field` with a clause sequence, where the
label and value of the field are evaluated for each iteration.
The label must be an identifier or `simple_string_lit`, where the
latter may be a string interpolation that refers to the identifiers defined
in the clauses.
Values of iterations that map to the same label unify into a single field.

<!--
TODO: consider allowing multiple labels for comprehensions
(current implementation). Generally it is better to define comprehensions
in the current scope, though, as it may prevent surprises given the
restrictions on comprehensions.
-->
```
ComprehensionDecl   = Label ":" Expression [ "<-" ] Clauses .
ListComprehension   = "[" Expression [ "<-" ] Clauses "]" .

Clauses             = Clause { Clause } .
Clause              = ForClause | GuardClause | LetClause .
ForClause           = "for" identifier [ "," identifier ] "in" Expression .
GuardClause         = "if" Expression .
LetClause           = "let" identifier "=" Expression .
```

```
a: [1, 2, 3, 4]
b: [ x+1 for x in a if x > 1]  // [3, 4, 5]

c: { "\(x)": x + y for x in a if x < 4 let y = 1 }
d: { "1": 2, "2": 3, "3": 4 }
```
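
To make the left-to-right nesting of clauses explicit, the example can be unrolled into plain loops. The Python sketch below is only a model of the evaluation order described in this section. It uses ordinary assignment for the field comprehension, so it does not model the unification of values that map to the same label.

```python
a = [1, 2, 3, 4]

# List comprehension: [ x+1 for x in a if x > 1 ]
b = []
for x in a:              # for clause: binds x for the clauses that follow
    if x > 1:            # guard: a false guard ends the current iteration
        b.append(x + 1)  # innermost block reached: the iteration completes

# Field comprehension: { "\(x)": x + y for x in a if x < 4 let y = 1 }
c = {}
for x in a:
    if x < 4:
        y = 1            # let clause: binds y in a new, nested scope
        c[str(x)] = x + y

print(b)  # [3, 4, 5]
print(c)  # {'1': 2, '2': 3, '3': 4}
```

The resulting `c` has the same fields as `d` in the example.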


### String interpolation

String interpolation allows constructing strings by replacing placeholder
expressions with their string representation.
String interpolation may be used in single- and double-quoted strings, as well
as their multiline equivalents.

A placeholder consists of `\(` followed by an expression and a `)`. The
expression is evaluated in the scope in which the string is defined.

```
a: "World"
b: "Hello \( a )!" // Hello World!
```


## Builtin Functions

Built-in functions are predeclared. They are called like any other function.


### `len`

The built-in function `len` takes arguments of various types and returns
a result of type int.

```
Argument type    Result

string           string length in bytes
bytes            length of byte sequence
list             list length, smallest length for an open list
struct           number of distinct data fields, including optional
```
<!-- TODO: consider not supporting len, but instead rely on more
precisely named builtin functions:
  - strings.RuneLen(x)
  - bytes.Len(x)  // x may be a string
  - struct.NumFooFields(x)
  - list.Len(x)
-->

```
Expression           Result
len("Hellø")         6
len([1, 2, 3])       3
len([1, 2, ...])     >=2
```
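
The byte-oriented result for strings is easy to misread, so here is a small Python model of the table. The name `cue_len` is hypothetical, dicts stand in for structs, and open lists and optional fields are not modelled.

```python
def cue_len(value):
    """Model of len for closed, concrete values only."""
    if isinstance(value, str):
        # Strings are measured in bytes of their UTF-8 encoding.
        return len(value.encode("utf-8"))
    if isinstance(value, (bytes, list, dict)):
        return len(value)
    raise TypeError(f"len is not defined for {type(value).__name__}")

print(cue_len("Hellø"))    # 6
print(cue_len([1, 2, 3]))  # 3
```

`"Hellø"` has five characters but six bytes, because `ø` takes two bytes in UTF-8.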

### `and`

The built-in function `and` takes a list and returns the result of applying
the `&` operator to all elements in the list.
It returns top for the empty list.

```
Expression:       Result
and([a, b])       a & b
and([a])          a
and([])           _
```

### `or`

The built-in function `or` takes a list and returns the result of applying
the `|` operator to all elements in the list.
It returns bottom for the empty list.

```
Expression:       Result
or([a, b])        a | b
or([a])           a
or([])            _|_
```
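
Top and bottom are the results for the empty list because they are the identity elements of `&` and `|`. The Python sketch below shows this with a deliberately simplified model in which a value is the set of concrete values it allows, so `&` is intersection and `|` is union. The finite `UNIVERSE` and the names `cue_and` and `cue_or` are assumptions made for the illustration. Defaults and structs are not modelled.

```python
from functools import reduce

UNIVERSE = frozenset(range(10))  # hypothetical finite stand-in for all values
TOP = UNIVERSE                   # `_`: every value is allowed
BOTTOM = frozenset()             # `_|_`: no value is allowed

def cue_and(values):
    """Fold & over the list, starting from top (the identity of &)."""
    return reduce(lambda x, y: x & y, values, TOP)

def cue_or(values):
    """Fold | over the list, starting from bottom (the identity of |)."""
    return reduce(lambda x, y: x | y, values, BOTTOM)

a = frozenset({1, 2, 3})
b = frozenset({2, 3, 4})
print(sorted(cue_and([a, b])))  # [2, 3]
print(sorted(cue_or([a, b])))   # [1, 2, 3, 4]
```

In this model `cue_and([])` is `TOP` and `cue_or([])` is `BOTTOM`, matching the last row of each table.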


## Cycles

Implementations are required to interpret or reject cycles encountered
during evaluation according to the rules in this section.


### Reference cycles

A _reference cycle_ occurs if a field references itself, either directly or
indirectly.

```
// x references itself
x: x

// indirect cycles
b: c
c: d
d: b
```

Implementations should report these as an error except in the following cases:


#### Expressions that unify an atom with an expression

An expression of the form `a & e`, where `a` is an atom
and `e` is an expression, always evaluates to `a` or bottom.
As it does not matter how we fail, we can assume the result to be `a`
and, after the field in which the expression occurs has been evaluated,
validate that `a == e`.

```
// Config            Evaluates to (requiring concrete values)
x: {                 x: {
    a: b + 100           a: _|_ // cycle detected
    b: a - 100           b: _|_ // cycle detected
}                    }

y: x & {             y: {
    a: 200               a: 200 // asserted that 200 == b + 100
                         b: 100
}                    }
```


#### Field values

A field value of the form `r & v`,
where `r` evaluates to a reference cycle and `v` is a value,
evaluates to `v`.
Unification is idempotent and unifying a value with itself ad infinitum,
which is what the cycle represents, results in this value.
Implementations should detect cycles of this kind, ignore `r`,
and take `v` as the result of unification.
<!-- Tomabechi's graph unification algorithm
can detect such cycles at near-zero cost. -->

```
Configuration    Evaluated
//    c          Cycles in nodes of type struct evaluate
//  ↙︎   ↖        to the fixed point of unifying their
// a  →  b       values ad infinitum.

a: b & { x: 1 }  // a: { x: 1, y: 2, z: 3 }
b: c & { y: 2 }  // b: { x: 1, y: 2, z: 3 }
c: a & { z: 3 }  // c: { x: 1, y: 2, z: 3 }

// resolve a             b & {x:1}
// substitute b          c & {y:2} & {x:1}
// substitute c          a & {z:3} & {y:2} & {x:1}
// eliminate a (cycle)   {z:3} & {y:2} & {x:1}
// simplify              {x:1,y:2,z:3}
```
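
The substitution steps in the example can be replayed with a short Python sketch. It is a toy model, not the algorithm an implementation must use: each field is a pair of a reference and a flat struct literal, the `unify` helper is hypothetical, and a reference that closes a cycle is simply ignored by letting it contribute an empty struct.

```python
# Each field is modelled as (reference, literal), e.g. a: b & { x: 1 }.
fields = {
    "a": ("b", {"x": 1}),
    "b": ("c", {"y": 2}),
    "c": ("a", {"z": 3}),
}

def unify(left, right):
    """Unify two flat structs; conflicting field values are an error."""
    result = dict(left)
    for key, val in right.items():
        if key in result and result[key] != val:
            raise ValueError(f"conflicting values for {key!r}")
        result[key] = val
    return result

def resolve(name, active=()):
    """Evaluate a field, eliminating a reference that closes a cycle."""
    if name in active:
        return {}  # assumption: ignoring r is modelled as an empty struct
    ref, literal = fields[name]
    return unify(resolve(ref, active + (name,)), literal)

for name in fields:
    print(name, resolve(name))
```

All three fields resolve to a struct with `x: 1`, `y: 2` and `z: 3`, as in the evaluated column of the example.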

This rule also applies to field values that are disjunctions of unification
operations of the above form.

```
a: b&{x:1} | {y:1}  // {x:1,y:3,z:2} | {y:1}
b: {x:2} | c&{z:2}  // {x:2} | {x:1,y:3,z:2}
c: a&{y:3} | {z:3}  // {x:1,y:3,z:2} | {z:3}


// resolving a             b&{x:1} | {y:1}
// substitute b            ({x:2} | c&{z:2})&{x:1} | {y:1}
// simplify                c&{z:2}&{x:1} | {y:1}
// substitute c            (a&{y:3} | {z:3})&{z:2}&{x:1} | {y:1}
// simplify                a&{y:3}&{z:2}&{x:1} | {y:1}
// eliminate a (cycle)     {y:3}&{z:2}&{x:1} | {y:1}
// expand                  {x:1,y:3,z:2} | {y:1}
```

Note that all nodes that are part of a reference cycle forming a struct will
evaluate to the same value.
If a field value is a disjunction, any element that is part of a cycle will
evaluate to this value.


### Structural cycles

CUE disallows infinite structures.
Implementations must report an error when encountering such declarations.

<!-- for instance using an occurs check -->

```
// Disallowed: a list of infinite length with all elements being 1.
list: {
    head: 1
    tail: list
}

// Disallowed: another infinite structure (a:{b:{d:{b:{d:{...}}}}}, ...).
a: {
    b: c
}
c: {
    d: a
}
```
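
One way to detect such declarations is an occurs check: while expanding a struct, report an error if the struct is reached again through its own fields. The Python sketch below illustrates the idea on a simplified model in which a field holds either a concrete value or a reference to a top-level struct. The `Ref` class and the function name are hypothetical, and disjunctions are not modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ref:
    """A field value that refers to another top-level struct by name."""
    name: str

def has_structural_cycle(defs, name, path=()):
    """Return True if expanding the struct `name` would never terminate."""
    if name in path:
        return True  # occurs check: the struct contains itself
    for value in defs[name].values():
        if isinstance(value, Ref) and has_structural_cycle(
            defs, value.name, path + (name,)
        ):
            return True
    return False

defs = {
    "list": {"head": 1, "tail": Ref("list")},
    "a": {"b": Ref("c")},
    "c": {"d": Ref("a")},
    "point": {"x": 1, "y": 2},
}

for name in defs:
    print(name, has_structural_cycle(defs, name))
```

Here `list`, `a` and `c` are reported as cyclic, while `point` is not.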

It is allowed for a value to define an infinite set of possibilities
without evaluating to an infinite structure itself.

```
// List defines a list of arbitrary length (default null).
List: *null | {
    head: _
    tail: List
}
```

<!--
Consider banning any construct that prevents CUE from having a linear
running time expressed in the number of nodes in the output.

This would require restricting constructs like:

(fib&{n:2}).out

fib: {
    n: int

    out: (fib&{n:n-2}).out + (fib&{n:n-1}).out if n >= 2
    out: fib({n:n-2}).out + fib({n:n-1}).out if n >= 2
    out: n if n < 2
}

-->
<!--
### Unused fields

TODO: rules for detection of unused fields

1. Any alias value must be used
-->


## Modules, instances, and packages

CUE configurations are constructed by combining _instances_.
An instance, in turn, is constructed from one or more source files belonging
to the same _package_ that together declare the data representation.
Elements of this data representation may be exported and used
in other instances.

### Source file organization

Each source file consists of an optional package clause defining the collection
of files to which it belongs,
followed by a possibly empty set of import declarations that declare
packages whose contents it wishes to use, followed by a possibly empty set of
declarations.


```
SourceFile = [ PackageClause "," ] { ImportDecl "," } { TopLevelDecl "," } .
```

### Package clause

A package clause is an optional clause that defines the package to which
a source file belongs.

```
PackageClause  = "package" PackageName .
PackageName    = identifier .
```

The PackageName must not be the blank identifier.

```
package math
```

### Modules and instances

A _module_ defines a tree of directories, rooted at the _module root_.

All source files within a module that declare the same package name belong to
the same package.
<!-- jba: I can't make sense of the above sentence. -->
A module may define multiple packages.

An _instance_ of a package is any subset of files belonging
to the same package.
<!-- jba: Are you saying that -->
<!-- if I have a package with files a, b and c, then there are 8 instances of -->
<!-- that package, some of which are {a, b}, {c}, {b, c}, and so on? What's the -->
<!-- purpose of that definition? -->
It is interpreted as the concatenation of these files.

An implementation may impose conventions on the layout of package files
to determine which files of a package belong to an instance.
For example, an instance may be defined as the subset of package files
belonging to a directory and all its ancestors.
<!-- jba: OK, that helps a little, but I still don't see what the purpose is. -->


### Import declarations

An import declaration states that the source file containing the declaration
depends on definitions of the _imported_ package (§Program initialization and
execution) and enables access to exported identifiers of that package.
The import names an identifier (PackageName) to be used for access and an
ImportPath that specifies the package to be imported.

```
ImportDecl       = "import" ( ImportSpec | "(" { ImportSpec ";" } ")" ) .
ImportSpec       = [ PackageName ] ImportPath .
ImportLocation   = { unicode_value } .
ImportPath       = `"` ImportLocation [ ":" identifier ] `"` .
```

The PackageName is used in qualified identifiers to access
exported identifiers of the package within the importing source file.
It is declared in the file block.
It defaults to the identifier specified in the package clause of the imported
package, which must match either the last path component of ImportLocation
or the identifier following it.

<!--
Note: this deviates from the Go spec where there is no such restriction.
This restriction has the benefit of being able to determine the identifiers
for packages from within the file itself. But for CUE it has another benefit:
when using package hierarchies, one is more likely to want to include multiple
packages within the same directory structure. This mechanism allows
disambiguation in these cases.
-->

The interpretation of the ImportPath is implementation-dependent but it is
typically either the path of a builtin package or a fully qualified location
of a package within a source code repository.

An ImportLocation must be a non-empty string using only characters belonging to
Unicode's L, M, N, P, and S general categories
(the Graphic characters without spaces)
and may not include the characters ``!"#$%&'()*,:;<=>?[\]^`{|}``
or the Unicode replacement character U+FFFD.
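
The character rules for an ImportLocation can be checked mechanically. The Python sketch below is one possible reading of the paragraph above. The function name is hypothetical, and it relies on the Unicode tables of Python's standard `unicodedata` module, whose version may differ from the one an implementation uses.

```python
import unicodedata

# Characters that are excluded even though they fall in the P or S categories.
FORBIDDEN = set("!\"#$%&'()*,:;<=>?[\\]^`{|}") | {"\uFFFD"}
ALLOWED_CATEGORIES = set("LMNPS")  # first letter of the general category

def valid_import_location(location: str) -> bool:
    """Report whether location satisfies the ImportLocation character rules."""
    if not location:
        return False  # must be non-empty
    for ch in location:
        if ch in FORBIDDEN:
            return False
        if unicodedata.category(ch)[0] not in ALLOWED_CATEGORIES:
            return False  # spaces, control characters, unassigned, ...
    return True

print(valid_import_location("lib/math"))  # True
print(valid_import_location("lib math"))  # False
```

Note that U+FFFD needs its own check, since it belongs to the S general category and would otherwise pass.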

Assume we have a package containing the package clause `package math`,
which exports function `Sin`, at the path identified by `"lib/math"`.
This table illustrates how `Sin` is accessed in files
that import the package after the various types of import declaration.

```
Import declaration          Local name of Sin

import   "lib/math"         math.Sin
import   "lib/math:math"    math.Sin
import m "lib/math"         m.Sin
```
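
The table follows from the defaulting rule for PackageName, which can be sketched in Python as follows. The function name is hypothetical, and treating `/` as the separator of path components is an assumption, since the interpretation of the ImportPath is implementation-dependent.

```python
def local_package_name(import_path, package_name=None):
    """Return the identifier under which an imported package is accessed."""
    if package_name is not None:
        return package_name  # explicit PackageName in the ImportSpec
    # An ImportLocation cannot contain ":", so the first ":" is the separator.
    location, sep, identifier = import_path.partition(":")
    if sep:
        return identifier  # identifier following the ImportLocation
    # Assumption: path components are separated by "/".
    return location.rsplit("/", 1)[-1]

print(local_package_name("lib/math"))       # math
print(local_package_name("lib/math:math"))  # math
print(local_package_name("lib/math", "m"))  # m
```

The sketch derives only the name. It does not check that the name matches the package clause of the imported package, which the rule above also requires.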

An import declaration declares a dependency relation between the importing and
imported package. It is illegal for a package to import itself, directly or
indirectly, or to directly import a package without referring to any of its
exported identifiers.


### An example package

TODO