<!--
 Copyright 2018 The CUE Authors

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

# The CUE Language Specification

## Introduction

This is a reference manual for the CUE data constraint language.
CUE, pronounced cue or Q, is a general-purpose and strongly typed
constraint-based language.
It can be used for data templating, data validation, code generation, scripting,
and many other applications involving structured data.
The CUE tooling, layered on top of CUE, provides
a general purpose scripting language for creating scripts as well as
simple servers, also expressed in CUE.

CUE was designed with cloud configuration, and related systems, in mind,
but is not limited to this domain.
It derives its formalism from relational programming languages.
This formalism allows for managing and reasoning over large amounts of
data in a straightforward manner.

The grammar is compact and regular, allowing for easy analysis by automatic
tools such as integrated development environments.

This document is maintained by mpvl@golang.org.
CUE has a lot of similarities with the Go language. This document draws heavily
from the Go specification as a result.

CUE draws its influence from many languages.
Its main influences were BCL/GCL (internal to Google),
LKB (LinGO), Go, and JSON.
Others are Swift, JavaScript, Prolog, NCL (internal to Google), Jsonnet, HCL,
Flabbergast, JSONPath, Haskell, Objective-C, and Python.


## Notation

The syntax is specified using Extended Backus-Naur Form (EBNF):

```
Production  = production_name "=" [ Expression ] "." .
Expression  = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term        = production_name | token [ "…" token ] | Group | Option | Repetition .
Group       = "(" Expression ")" .
Option      = "[" Expression "]" .
Repetition  = "{" Expression "}" .
```

Productions are expressions constructed from terms and the following operators,
in increasing precedence:

```
|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)
```

Lower-case production names are used to identify lexical tokens. Non-terminals
are in CamelCase. Lexical tokens are enclosed in double quotes "" or back quotes
``.

The form a … b represents the set of characters from a through b as
alternatives. The horizontal ellipsis … is also used elsewhere in the spec to
informally denote various enumerations or code snippets that are not further
specified. The character … (as opposed to the three characters ...) is not a
token of the CUE language.


## Source code representation

Source code is Unicode text encoded in UTF-8.
Unless otherwise noted, the text is not canonicalized, so a single
accented code point is distinct from the same character constructed from
combining an accent and a letter; those are treated as two code points.
For simplicity, this document will use the unqualified term character to refer
to a Unicode code point in the source text.

Each code point is distinct; for instance, upper and lower case letters are
different characters.

Implementation restriction: For compatibility with other tools, a compiler may
disallow the NUL character (U+0000) in the source text.

Implementation restriction: For compatibility with other tools, a compiler may
ignore a UTF-8-encoded byte order mark (U+FEFF) if it is the first Unicode code
point in the source text. A byte order mark may be disallowed anywhere else in
the source.


### Characters

The following terms are used to denote specific Unicode character classes:

```
newline        = /* the Unicode code point U+000A */ .
unicode_char   = /* an arbitrary Unicode code point except newline */ .
unicode_letter = /* a Unicode code point classified as "Letter" */ .
unicode_digit  = /* a Unicode code point classified as "Number, decimal digit" */ .
```

In The Unicode Standard 8.0, Section 4.5 "General Category" defines a set of
character categories.
CUE treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo
as Unicode letters, and those in the Number category Nd as Unicode digits.


### Letters and digits

The underscore character _ (U+005F) is considered a letter.

```
letter        = unicode_letter | "_" .
decimal_digit = "0" … "9" .
octal_digit   = "0" … "7" .
hex_digit     = "0" … "9" | "A" … "F" | "a" … "f" .
```


## Lexical elements

### Comments
Comments serve as program documentation. There are two forms:

1. Line comments start with the character sequence // and stop at the end of the line.
2. General comments start with the character sequence /* and stop with the first subsequent character sequence */.

A comment cannot start inside a string literal or inside a comment.
A general comment containing no newlines acts like a space.
Any other comment acts like a newline.


### Tokens

Tokens form the vocabulary of the CUE language. There are four classes:
identifiers, keywords, operators and punctuation, and literals. White space,
formed from spaces (U+0020), horizontal tabs (U+0009), carriage returns
(U+000D), and newlines (U+000A), is ignored except as it separates tokens that
would otherwise combine into a single token. Also, a newline or end of file may
trigger the insertion of a comma. While breaking the input into tokens, the
next token is the longest sequence of characters that form a valid token.
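
The longest-match rule can be illustrated with a small Python sketch. It is not CUE's scanner: it only handles the symbolic operators and punctuation listed in this specification, and the helper names `next_operator` and `split_operators` are hypothetical.

```python
# Symbolic operators and punctuation from this specification
# (the word operators div, mod, quo, and rem are left out of this sketch).
OPERATORS = [
    "+", "&&", "==", "!=", "(", ")",
    "-", "||", "<", "<=", "[", "]",
    "*", "!", ">", ">=", "{", "}",
    "/", "&", ":", "<-", ";", ",",
    "%", "_|_", "|", "=", "...", "..", ".",
]


def next_operator(text, pos=0):
    """Return the longest operator starting at text[pos], or None."""
    candidates = [op for op in OPERATORS if text.startswith(op, pos)]
    return max(candidates, key=len) if candidates else None


def split_operators(text):
    """Split a string made only of operators and spaces into tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos] == " ":
            pos += 1  # white space only separates tokens
            continue
        op = next_operator(text, pos)
        if op is None:
            raise ValueError(f"no operator at offset {pos}")
        tokens.append(op)
        pos += len(op)
    return tokens
```

With this sketch, `split_operators("<=")` yields one token, while `split_operators("< =")` yields two, because the space prevents the characters from combining.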


### Commas

The formal grammar uses commas "," as terminators in a number of productions.
CUE programs may omit most of these commas using the following two rules:

When the input is broken into tokens, a comma is automatically inserted into
the token stream immediately after a line's final token if that token is

- an identifier
- null, true, false, bottom, or an integer, floating-point, or string literal
- one of the characters ), ], or }


Although commas are automatically inserted, the parser will require
explicit commas between two list elements.

To reflect idiomatic use, examples in this document elide commas using
these rules.
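
The insertion rule above can be sketched in Python. The token model here is an assumption made for illustration: each token is a hypothetical `(kind, text)` pair, and the kind names are not CUE's real token classes.

```python
# Kinds and texts after which a comma is inserted at the end of a line.
# The kind names are assumptions of this sketch.
COMMA_AFTER_KINDS = {"ident", "int", "float", "string"}
COMMA_AFTER_TEXT = {"null", "true", "false", "_|_", ")", "]", "}"}


def insert_commas(lines):
    """Flatten per-line token lists, appending a comma after each line
    whose final token qualifies under the rule."""
    out = []
    for tokens in lines:
        out.extend(tokens)
        if not tokens:
            continue
        kind, text = tokens[-1]
        if kind in COMMA_AFTER_KINDS or text in COMMA_AFTER_TEXT:
            out.append(("punct", ","))
    return out


# Four source lines:  a: 1 / b: { / c: "x" / }
source = [
    [("ident", "a"), ("punct", ":"), ("int", "1")],
    [("ident", "b"), ("punct", ":"), ("punct", "{")],
    [("ident", "c"), ("punct", ":"), ("string", '"x"')],
    [("punct", "}")],
]
rendered = " ".join(text for _, text in insert_commas(source))
```

No comma follows the line ending in `{`, since an opening brace is not in the list.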


### Identifiers

Identifiers name entities such as fields and aliases.
An identifier is a sequence of one or more letters and digits.
It may not be `_`.
The first character in an identifier must be a letter.

<!--
TODO: allow identifiers as defined in Unicode UAX #31
(https://unicode.org/reports/tr31/).

Identifiers are normalized using the NFC normal form.
-->

```
identifier = letter { letter | unicode_digit } .
```

```
a
_x9
fieldName
αβ
```
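
As a rough check of the `identifier` production and the character classes defined earlier, the following Python sketch classifies characters with the standard `unicodedata` module. The function names are hypothetical. The Unicode version is whatever the Python runtime ships, which may differ from the version this specification cites.

```python
import unicodedata

# Letter categories named in the Characters section.
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}


def is_letter(ch):
    """letter = unicode_letter | "_" ."""
    return ch == "_" or unicodedata.category(ch) in LETTER_CATEGORIES


def is_unicode_digit(ch):
    """unicode_digit: category "Number, decimal digit" (Nd)."""
    return unicodedata.category(ch) == "Nd"


def is_identifier(s):
    """identifier = letter { letter | unicode_digit } ., excluding "_" itself."""
    if not s or s == "_":
        return False
    if not is_letter(s[0]):
        return False
    return all(is_letter(c) or is_unicode_digit(c) for c in s[1:])
```

The sketch does not exclude keywords; it only checks the lexical shape.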

<!-- TODO: Allow Unicode identifiers TR 31 http://unicode.org/reports/tr31/ -->

Some identifiers are [predeclared](#predeclared-identifiers).


### Keywords

CUE has a limited set of keywords.
All keywords may be used as labels (field names).
They cannot, however, be used as identifiers to refer to the same name.


#### Values

The following keywords are values.

```
null true false
```

These can never be used to refer to a field of the same name.
This restriction is to ensure compatibility with JSON configuration files.


#### Preamble

The following keywords are used in the preamble of a CUE file.
After the preamble, they may be used as identifiers to refer to namesake fields.

```
package import
```


#### Comprehension clauses

The following keywords are used in comprehensions.

```
for in if let
```

The keywords `for`, `if` and `let` cannot be used as identifiers to
refer to fields. All others can.

<!--
TODO:
 reduce [to]
 order [by]
-->


#### Arithmetic

The following pseudo keywords can be used as operators in expressions.

```
div mod quo rem
```

These may be used as identifiers to refer to fields in all other contexts.


### Operators and punctuation

The following character sequences represent operators and punctuation:

```
+     div   &&    ==    !=    (    )
-     mod   ||    <     <=    [    ]
*     quo   !     >     >=    {    }
/     rem   &     :     <-    ;    ,
%     _|_   |     =     ...   ..   .
```
<!-- :: for "is-a" definitions -->

### Integer literals

An integer literal is a sequence of digits representing an integer value.
An optional prefix sets a non-decimal base: 0 for octal,
0x or 0X for hexadecimal, and 0b for binary.
In hexadecimal literals, letters a-f and A-F represent values 10 through 15.
All integers allow interstitial underscores "_";
these have no meaning and are solely for readability.

Decimal integers may have an SI or IEC multiplier.
Multipliers can be used with fractional numbers.
When multiplying a fraction by a multiplier, the result is truncated
towards zero if it is not an integer.

```
int_lit     = decimal_lit | octal_lit | binary_lit | hex_lit .
decimals    = ( "0" … "9" ) { [ "_" ] decimal_digit } .
decimal_lit = ( "1" … "9" ) { [ "_" ] decimal_digit } [ [ "." decimals ] multiplier ] |
              "." decimals multiplier .
binary_lit  = "0b" binary_digit { binary_digit } .
hex_lit     = "0" ( "x" | "X" ) hex_digit { [ "_" ] hex_digit } .
multiplier  = ( "K" | "M" | "G" | "T" | "P" | "E" | "Y" | "Z" ) [ "i" ] .
```

<!--
octal_lit   = "0" octal_digit { [ "_" ] octal_digit } .
TODO: consider 0o766 notation for octal.
--->

```
42
1.5Gi
0600
0xBad_Face
170_141_183_460_469_231_731_687_303_715_884_105_727
```
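
The effect of a multiplier, including truncation towards zero, can be worked through in Python. This sketch assumes the usual meanings of the suffixes (SI suffixes as powers of 1000, and the `i` variants as powers of 1024), which this section does not spell out. The function name is hypothetical, and the input must end in a multiplier.

```python
from fractions import Fraction

# Suffixes in increasing order of magnitude: kilo, mega, giga, tera,
# peta, exa, zetta, yotta. The ordering is an assumption of this sketch.
SUFFIXES = "KMGTPEZY"


def decimal_with_multiplier(lit):
    """Evaluate a decimal literal that ends in an SI or IEC multiplier."""
    lit = lit.replace("_", "")  # underscores carry no meaning
    base = 1000
    if lit.endswith("i"):       # IEC form, e.g. "Gi"
        base = 1024
        lit = lit[:-1]
    power = SUFFIXES.index(lit[-1]) + 1
    value = Fraction(lit[:-1]) * base ** power  # exact arithmetic
    return int(value)           # int() on a Fraction truncates towards zero
```

For example, `decimal_with_multiplier("1.5Gi")` evaluates 1.5 × 1024³ exactly, and `decimal_with_multiplier("1.0005K")` truncates 1000.5 to 1000.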

### Decimal floating-point literals

A decimal floating-point literal is a representation of
a decimal floating-point value (a _float_).
It has an integer part, a decimal point, a fractional part, and an
exponent part.
The integer and fractional parts comprise decimal digits; the
exponent part is an `e` or `E` followed by an optionally signed decimal exponent.
One of the integer part or the fractional part may be elided; one of the decimal
point or the exponent may be elided.

```
decimal_lit = decimals "." [ decimals ] [ exponent ] |
              decimals exponent |
              "." decimals [ exponent ] .
exponent    = ( "e" | "E" ) [ "+" | "-" ] decimals .
```

```
0.
72.40
072.40  // == 72.40
2.71828
1.e+0
6.67428e-11
1E6
.25
.12345E+5
```


### String and byte sequence literals

A string literal represents a string constant obtained from concatenating a
sequence of characters.
A byte sequence literal represents a sequence of bytes.

String and byte sequence literals are character sequences between,
respectively, double and single quotes, as in `"bar"` and `'bar'`.
Within the quotes, any character may appear except newline and,
respectively, unescaped double or single quote.
String literals must be valid UTF-8.
Byte sequences may contain any sequence of bytes.

Several backslash escapes allow arbitrary values to be encoded as ASCII text
in interpreted strings.
There are four ways to represent the integer value as a numeric constant: `\x`
followed by exactly two hexadecimal digits; `\u` followed by exactly four
hexadecimal digits; `\U` followed by exactly eight hexadecimal digits, and a
plain backslash `\` followed by exactly three octal digits.
In each case the value of the literal is the value represented by the
digits in the corresponding base.
Hexadecimal and octal escapes are only allowed within byte sequences
(single quotes).

Although these representations all result in an integer, they have different
valid ranges.
Octal escapes must represent a value between 0 and 255 inclusive.
Hexadecimal escapes satisfy this condition by construction.
The escapes `\u` and `\U` represent Unicode code points so within them
some values are illegal, in particular those above `0x10FFFF`.
Surrogate halves are allowed for compatibility with JSON,
but are translated into their non-surrogate equivalent internally.

The three-digit octal (`\nnn`) and two-digit hexadecimal (`\xnn`) escapes
represent individual bytes of the resulting string; all other escapes represent
the (possibly multi-byte) UTF-8 encoding of individual characters.
Thus inside a byte sequence literal `\377` and `\xFF` represent a single byte of
value `0xFF=255`, while `ÿ`, `\u00FF`, `\U000000FF` and `\xc3\xbf` represent
the two bytes `0xc3 0xbf` of the UTF-8
encoding of character `U+00FF`.
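
The difference between byte escapes and code point escapes can be checked with Python, whose literals happen to use similar escape spellings. This is only an analogy: the snippet runs Python's escape rules, not CUE's, and the variable names are illustrative.

```python
# Byte escapes denote exactly one byte each.
octal_escape = b"\377"
hex_escape = b"\xff"

# A code point escape denotes the UTF-8 encoding of the character U+00FF,
# which takes two bytes.
code_point_bytes = "\u00ff".encode("utf-8")
explicit_bytes = b"\xc3\xbf"
```

Here `octal_escape` and `hex_escape` are the same single byte, while `code_point_bytes` equals the two-byte sequence in `explicit_bytes`.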

After a backslash, certain single-character escapes represent special values:

```
\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000b vertical tab
\/   U+002f slash (solidus)
\\   U+005c backslash
\'   U+0027 single quote  (valid escape only within single quoted literals)
\"   U+0022 double quote  (valid escape only within double quoted literals)
```

The escape `\(` is used as an escape for string interpolation.
A `\(` must be followed by a valid CUE Expression, followed by a `)`.

All other sequences starting with a backslash are illegal inside literals.

```
escaped_char     = `\` ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | `\` | "'" | `"` ) .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = `\` octal_digit octal_digit octal_digit .
hex_byte_value   = `\` "x" hex_digit hex_digit .
little_u_value   = `\` "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = `\` "U" hex_digit hex_digit hex_digit hex_digit
                           hex_digit hex_digit hex_digit hex_digit .

string_lit             = interpreted_string_lit |
                         interpreted_bytes_lit |
                         multiline_lit .

interpolation          = "\(" Expression ")" .
interpreted_string_lit = `"` { unicode_value | interpolation } `"` .
interpreted_bytes_lit  = "'" { unicode_value | interpolation | byte_value } "'" .
```

```
'a\000\xab'
'\007'
'\377'
'\xa'                // illegal: too few hexadecimal digits
"\n"
"\""
'Hello, world!\n'
"Hello, \( name )!"
"日本語"
"\u65e5本\U00008a9e"
"\xff\u00FF"
"\uD800"             // illegal: surrogate half (TODO: probably should allow)
"\U00110000"         // illegal: invalid Unicode code point
```

These examples all represent the same string:

```
"日本語"                                  // UTF-8 input text
'日本語'                                  // UTF-8 input text as byte sequence
`日本語`                                  // UTF-8 input text as a raw literal
"\u65e5\u672c\u8a9e"                    // the explicit Unicode code points
"\U000065e5\U0000672c\U00008a9e"        // the explicit Unicode code points
"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e"  // the explicit UTF-8 bytes
```

If the source code represents a character as two code points, such as a
combining form involving an accent and a letter, the result will appear as two
code points if placed in a string literal.

Each of the interpreted string variants has a multiline equivalent.
Multiline interpreted strings are like their single-line equivalent,
but allow newline characters.
Carriage return characters (`\r`) inside raw string literals are discarded from
the raw string value.

Multiline interpreted strings and byte sequences respectively start with
a triple double quote (`"""`) or triple single quote (`'''`),
immediately followed by a newline, which is discarded from the string contents.
The string is closed by a matching triple quote, which must be by itself
on a new line, preceded by optional whitespace.
The whitespace before a closing triple quote must appear before any non-empty
line after the opening quote and will be removed from each of these
lines in the string literal.
A closing triple quote may not appear in the string.
To include it, it suffices to escape one of the quotes.

```
multiline_lit        = multiline_string_lit | multiline_bytes_lit .
multiline_string_lit = `"""` newline
                           { unicode_char | interpolation | newline }
                           newline `"""` .
multiline_bytes_lit  = "'''" newline
                           { unicode_char | interpolation | newline | byte_value }
                           newline "'''" .
```

```
"""
    lily:
    out of the water
    out of itself

    bass
    picking bugs
    off the moon
     — Nick Virgilio, Selected Haiku, 1988
    """
```

This represents the same string as:

```
"lily:\nout of the water\nout of itself\n\n" +
"bass\npicking bugs\noff the moon\n" +
" — Nick Virgilio, Selected Haiku, 1988"
```
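
The whitespace-stripping rule for multiline literals can be sketched in Python. This is a simplified illustration, not CUE's implementation. It ignores escapes and interpolation, and it treats lines containing only white space as empty. The function name `multiline_value` is hypothetical.

```python
def multiline_value(literal):
    """Compute the contents of a triple-quoted literal given as source text.

    The first line holds the opening quotes; the last line holds the closing
    quotes, preceded by the white space to strip from every non-empty line.
    """
    lines = literal.split("\n")
    body, closing = lines[1:-1], lines[-1]
    indent = closing[: len(closing) - len(closing.lstrip())]
    out = []
    for line in body:
        if line.strip() == "":
            out.append("")  # empty lines are exempt from the indent rule
        elif line.startswith(indent):
            out.append(line[len(indent):])
        else:
            raise ValueError("line lacks the closing quote's leading white space")
    return "\n".join(out)


source = '"""\n    lily:\n    out of the water\n\n      bass\n    """'
value = multiline_value(source)
```

Indentation beyond that of the closing quote is kept, so the line holding `bass` retains two leading spaces in `value`.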

<!-- TODO: other values

Support for other values:
- Duration literals
- regular expressions: `re("[a-z]")`
-->


## Values

In addition to simple values like `"hello"` and `42.0`, CUE has _structs_.
A struct is a map from labels to values, like `{a: 42.0, b: "hello"}`.
Structs are CUE's only way of building up complex values;
lists, which we will see later,
are defined in terms of structs.

All possible values are ordered in a lattice,
a partial order where every two elements have a single greatest lower bound.
A value `a` is an _instance_ of a value `b`,
denoted `a ⊑ b`, if `b == a` or `b` is more general than `a`,
that is if `a` orders before `b` in the partial order
(`⊑` is _not_ a CUE operator).
We also say that `b` _subsumes_ `a` in this case.
In graphical terms, `b` is "above" `a` in the lattice.

At the top of the lattice is the single ancestor of all values, called
_top_, denoted `_` in CUE.
Every value is an instance of top.

At the bottom of the lattice is the value called _bottom_, denoted `_|_`.
A bottom value usually indicates an error.
Bottom is an instance of every value.

An _atom_ is any value whose only instances are itself and bottom.
Examples of atoms are `42.0`, `"hello"`, `true`, `null`.

A value is _concrete_ if it is either an atom, or a struct all of whose
field values are themselves concrete, recursively.

CUE's values also include what we normally think of as types, like `string` and
`float`.
But CUE does not distinguish between types and values; only the
relationship of values in the lattice is important.
Each CUE "type" subsumes the concrete values that one would normally think
of as part of that type.
For example, `"hello"` is an instance of `string`, and `42.0` is an instance of
`float`.
In addition to `string` and `float`, CUE has `null`, `int`, `bool` and `bytes`.
We informally call these CUE's "basic types".


```
false ⊑ bool
true  ⊑ bool
true  ⊑ true
5.0   ⊑ float
bool  ⊑ _
_|_   ⊑ _
_|_   ⊑ _|_

_     ⋢ _|_
_     ⋢ bool
int   ⋢ bool
bool  ⋢ int
false ⋢ true
true  ⋢ false
float ⋢ 5.0
5     ⋢ 6
```
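
The relations listed above can be reproduced with a toy model in Python. The model is an assumption made for illustration. It covers only top, bottom, the basic types, and atoms, with Python values standing in for CUE atoms, and it leaves out structs, lists, and disjunctions. The names `Kind`, `kind_of`, and `is_instance` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kind:
    """A non-atom element of the toy lattice: top, bottom, or a basic type."""
    name: str


TOP, BOTTOM = Kind("_"), Kind("_|_")
NULL, BOOL, INT, FLOAT, STRING, BYTES = (
    Kind(n) for n in ("null", "bool", "int", "float", "string", "bytes")
)


def kind_of(atom):
    """Map a Python value standing in for a CUE atom to its basic type."""
    if atom is None:
        return NULL
    if isinstance(atom, bool):  # must precede int: bool subclasses int in Python
        return BOOL
    for py_type, kind in ((int, INT), (float, FLOAT), (str, STRING), (bytes, BYTES)):
        if isinstance(atom, py_type):
            return kind
    raise TypeError(f"not an atom in this sketch: {atom!r}")


def is_instance(a, b):
    """Report whether a ⊑ b holds in the toy lattice."""
    if a == BOTTOM or b == TOP:
        return True
    if isinstance(a, Kind):
        return a == b  # a type (or top) is only an instance of itself and top
    if isinstance(b, Kind):
        return kind_of(a) == b
    return kind_of(a) == kind_of(b) and a == b
```

For example, `is_instance(False, BOOL)` holds while `is_instance(FLOAT, 5.0)` does not, matching the table.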


### Unification

The _unification_ of values `a` and `b`
is defined as the greatest lower bound of `a` and `b`. (That is, the
value `u` such that `u ⊑ a` and `u ⊑ b`,
and for any other value `v` for which `v ⊑ a` and `v ⊑ b`
it holds that `v ⊑ u`.)
Since CUE values form a lattice, the unification of two CUE values is
always unique.

These all follow from the definition of unification:
- The unification of `a` with itself is always `a`.
- The unification of values `a` and `b` where `a ⊑ b` is always `a`.
- The unification of a value with bottom is always bottom.

Unification in CUE is a [binary expression](#Operands), written `a & b`.
It is commutative and associative.
As a consequence, order of evaluation is irrelevant, a property that is key
to many of the constructs in the CUE language as well as the tooling layered
on top of it.
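
The definition can be applied literally to a small finite lattice. The Python sketch below is illustrative only: it computes the greatest lower bound by brute force over a hand-picked set of values written as strings. The set `VALUES` and the table `PARENT` are assumptions of the example, not part of CUE.

```python
# A tiny lattice: each value points to the value directly above it.
VALUES = ["_", "bool", "int", "true", "false", "1", "2", "_|_"]
PARENT = {"bool": "_", "int": "_", "true": "bool", "false": "bool",
          "1": "int", "2": "int"}


def is_instance(a, b):
    """a ⊑ b: bottom is below everything; otherwise walk up from a."""
    if a == "_|_" or a == b:
        return True
    while a in PARENT:
        a = PARENT[a]
        if a == b:
            return True
    return False


def unify(a, b):
    """Greatest lower bound of a and b, found straight from the definition."""
    lower = [v for v in VALUES if is_instance(v, a) and is_instance(v, b)]
    greatest = [u for u in lower if all(is_instance(v, u) for v in lower)]
    assert len(greatest) == 1, "a lattice has exactly one greatest lower bound"
    return greatest[0]
```

On this small lattice the listed properties, as well as commutativity and associativity, can be checked exhaustively by looping over `VALUES`.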



<!-- TODO: explicitly mention that disjunction is not a binary operation
but a definition of a single value?-->


### Disjunction

The _disjunction_ of values `a` and `b`
is defined as the least upper bound of `a` and `b`.
(That is, the value `d` such that `a ⊑ d` and `b ⊑ d`,
and for any other value `e` for which `a ⊑ e` and `b ⊑ e`,
it holds that `d ⊑ e`.)
This style of disjunctions is sometimes also referred to as sum types.
Since CUE values form a lattice, the disjunction of two CUE values is always unique.


These all follow from the definition of disjunction:
- The disjunction of `a` with itself is always `a`.
- The disjunction of values `a` and `b` where `a ⊑ b` is always `b`.
- The disjunction of a value `a` with bottom is always `a`.
- The disjunction of two bottom values is bottom.

Disjunction in CUE is a [binary expression](#Operands), written `a | b`.
It is commutative and associative.

The unification of a disjunction with another value is equal to the disjunction
composed of the unification of this value with all of the original elements
of the disjunction.
In other words, unification distributes over disjunction.

```
(a_0 | ... |a_n) & b ==> a_0&b | ... | a_n&b.
```

```
Expression                       Result
({a:1} | {b:2}) & {c:3}          {a:1, c:3} | {b:2, c:3}
(int | string) & "foo"           "foo"
("a" | "b") & "c"                _|_
```
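
The distribution rule can be sketched in Python under some simplifying assumptions. Atoms are Python values, structs are dicts whose unification merges fields (as the first table row suggests), a disjunction is a list of alternatives, and basic types such as `int` are not modeled. The string `BOTTOM` is a sentinel and all function names are hypothetical.

```python
BOTTOM = "_|_"  # sentinel for bottom; assumes no atom uses this string


def unify(a, b):
    """Unify two non-disjunction toy values: atoms or dict structs."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for label, value in b.items():
            result[label] = unify(result[label], value) if label in result else value
            if result[label] == BOTTOM:
                return BOTTOM  # a conflicting field makes the struct bottom
        return result
    if type(a) is type(b) and a == b:
        return a
    return BOTTOM


def unify_disjunction(alternatives, b):
    """(a_0 | ... | a_n) & b  ==>  a_0&b | ... | a_n&b, dropping bottoms."""
    remaining = [r for r in (unify(a, b) for a in alternatives) if r != BOTTOM]
    if not remaining:
        return BOTTOM
    return remaining[0] if len(remaining) == 1 else remaining
```

Dropping bottom alternatives relies on the property above that the disjunction of a value with bottom is that value.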


#### Default values

One or more values in a disjunction can be _marked_
by prefixing them with a `*` ([a unary expression](#Operators)).
A bottom value cannot be marked.
When a marked value is unified, the result is also marked.
(When unification results in a single value,
the mark is dropped, as single values cannot be marked.)

A disjunction is _normalized_ if there is no unmarked element
`a` for which there is an element `b` such that `a ⊑ b`
and no marked element `c` for which there is a marked element
`d` such that `c ⊑ d`.
A disjunction literal must be normalized.

<!--
(non-normalized entries could also be implicitly marked, allowing writing
int | 1, instead of int | *1, but that can be done in a backwards
compatible way later if really desirable).

Normalization is important, as we need to account for spurious elements.
For instance
"tcp" | "tcp", or
({a:1} | {b:1}) & ({a:1} | {b:2}) -> {a:1} | {a:1,b:1} | {a:1,b:2},

In the latter case, elements {a:1,b:1} and {a:1,b:2} are subsumed by {a:1}.
Note that without defaults, {a:1} | {a:1,b:1} | {a:1,b:2} is logically
identical to {a:1}.
More to the point, without normalization unifying {a:1} | {b:1} with {a:1,b:2}
results in a single value and thus resolves,
whereas unifying {a:1} | {a:1,b:1} | {a:1,b:2} with {a:1,b:2}
results in two values, and thus does not resolve.
With normalization:
({a:1} | {a:1,b:1} | {a:1,b:2}) & {a:1} {a:1}, instead of _|_,
({a:1} | {b:1}) & {a:1} {a:1} (instead of _|_), as {a:1,b:1} ⊑ {a:1}
-->

If a disjunction appears where a concrete value is required
(that is, as an operand or in a location where it will be emitted),
it is resolved as follows.
The disjunction is normalized and, if some elements are marked,
the non-marked elements are dropped.
If only a single value remains, that value is the result;
otherwise the result is bottom.

<!--
We treat remaining marked and unmarked elements the same to have fewer surprises:

Unifying {a:1}|{b:1} with *{}|string produces *{a:1}|*{b:1}. It would be
surprising to have a different default for {a:1}|{b:1} and *{a:1}|*{b:1} in
this case.

Similarly, we do not unify the remaining elements to minimize the difference
between using a disjunction in cases where concrete values are required
versus otherwise.
-->

<!-- TODO: is the above definition precise enough, or perhaps too abstract?
Previously:

A default value is chosen if the disjunction is not used
in a unification or disjunction operation.
This means that, in practice, a default is chosen for almost any expression
that does not involve `&` and `|`, including slices, indices, selectors,
and all but a few explicitly marked builtin functions. -->

```
Expression                              Default
"tcp" | "udp"                           _|_ // more than one element remaining
*"tcp" | "udp"                          "tcp"
float | *1                              1
*string | 1.0                           string

(*"tcp"|"udp") & ("udp"|*"tcp")         "tcp"
(*"tcp"|"udp") & ("udp"|"tcp")          "tcp"
(*"tcp"|"udp") & "tcp"                  "tcp"
(*"tcp"|"udp") & (*"udp"|"tcp")         _|_ // "tcp" & "udp"

(*true | false) & bool                  true
(*true | false) & (true | false)        true

{a: 1} | {b: 1}                         _|_ // more than one element remaining
{a: 1} | *{b: 1}                        {b:1}
*{a: 1} | *{b: 1}                       _|_ // more than one marked element remaining
({a: 1} | {b: 1}) & {a:1}               {a:1} // after eliminating {a:1,b:1}
({a:1}|*{b:1}) & ({a:1}|*{b:1})         {b:1} // after eliminating *{a:1,b:1}
```

A disjunction always evaluates to the same default value, regardless of
the context in which the value is used.
For instance, `[1, 2][*"a" | 1]` will result in an error, as `"a"` will be
selected as the default value.

```
[1, 2][*"a" | 1]          // _|_ // "a" is not an integer value
[1, 2][(*"a" | 1) & int]  // 2, as "a" is eliminated before choosing a default.
```


### Bottom and errors

Any evaluation error in CUE results in a bottom value, represented by
the token '_|_'.
Bottom is an instance of every other value.

Implementations may associate error strings with different instances of bottom;
logically they all remain the same value.
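
A few expressions that follow from these rules, written in the notation of the other tables in this section. This is an illustrative sketch derived from the definitions, not an exhaustive list:

```
Expr       Result
1 & 2      _|_    // conflicting concrete values
_|_ & 5    _|_    // unifying with bottom always yields bottom
_|_ | 5    5      // bottom is dropped from a disjunction
```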


### Top

Top is represented by the underscore character '_', lexically an identifier.
Unifying any value `v` with top results in `v` itself.

```
Expr       Result
_ & 5      5
_ & _      _
_ & _|_    _|_
_ | _|_    _
```


### Null

The _null value_ is represented with the keyword `null`.
It has only one parent, top, and one child, bottom.
It is unordered with respect to any other value.

```
null_lit   = "null"
```

```
null & 8     _|_
null & _     null
null & _|_   _|_
```


### Boolean values

A _boolean type_ represents the set of Boolean truth values denoted by
the keywords `true` and `false`.
The predeclared boolean type is `bool`; it is a defined type and a separate
element in the lattice.

```
bool_lit = "true" | "false"
```

```
bool & true          true
true & true          true
true & false         _|_
bool & (false|true)  false | true
bool & (true|false)  true | false
```


### Numeric values

The _integer type_ represents the set of all integral numbers.
The _decimal floating-point type_ represents the set of all decimal floating-point
numbers.
They are two distinct types.
The predeclared integer and decimal floating-point types are `int` and `float`;
they are defined types.

A decimal floating-point literal always has type `float`;
it is not an instance of `int` even if it is an integral number.

An integer literal has both type `int` and `float`, with the integer variant
being the default if no other constraints are applied.
Expressed in terms of disjunction and [type conversion](#conversions),
the literal `1`, for instance, is defined as `int(1) | float(1)`.
Hexadecimal, octal, and binary integer literals are always of type `int`.
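
The literal rules above can be illustrated as follows. This is a sketch in the notation used elsewhere in this section; the results are derived from the rules as stated:

```
1 & int       // int(1)
1 & float     // float(1)
1.0 & int     // _|_, a floating-point literal is never an int
0x10 & int    // 16
0x10 & float  // _|_, a hexadecimal literal is always an int
```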

Numeric literals are exact values of arbitrary precision.
If the operation permits it, numbers should be kept in arbitrary precision.

Implementation restriction: although numeric values have arbitrary precision
in the language, implementations may implement them using an internal
representation with limited precision.
That said, every implementation must:

- Represent integer values with at least 256 bits.
- Represent floating-point values with a mantissa of at least 256 bits and
  a signed binary exponent of at least 16 bits.
- Give an error if unable to represent an integer value precisely.
- Give an error if unable to represent a floating-point value due to overflow.
- Round to the nearest representable value if unable to represent
  a floating-point value due to limits on precision.

These requirements apply to the result of any expression except for builtin
functions, for which an unusual loss of precision must be explicitly documented.


### Strings

The _string type_ represents the set of all possible UTF-8 strings,
not allowing surrogates.
The predeclared string type is `string`; it is a defined type.

Strings are designed to be Unicode-safe.
Comparison is done using canonical forms ("é" == "e\u0301").
A string element is an
[extended grapheme cluster](https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries),
which is an approximation of a human-readable character.

The length of a string `s` (its size in bytes) can be discovered using
the built-in function `len`.
An extended grapheme cluster of a string can be accessed by the integer index,
in the range 0 through `len(s)-1`, of any byte that is part of that grapheme
cluster.

To access the individual bytes of a string one should convert it to
a sequence of bytes first.
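
For instance, assuming the usual UTF-8 encoding, in which U+0300 occupies two bytes (an illustrative sketch derived from the rules above):

```
len("He\u0300?")     // 5: one byte each for "H", "e", and "?", two for U+0300
"é" == "e\u0301"     // true: both sides have the same canonical form
```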


### Ranges

A _range type_, syntactically a [binary expression](#Operands), defines
a (possibly infinite) disjunction of concrete values that can be represented
as a contiguous range.
A concrete value `c` unifies with `a..b` if `a <= c` and `c <= b`.
Ranges can be defined on numbers and strings.

A range of numbers `a..b` defines an inclusive range for integers and
floating-point numbers.

Remember that an integer literal represents both an `int` and `float`:
```
2 & 1..5            // 2, where 2 is either an int or float.
2.5 & 1..5          // 2.5
2 & 1.0..3.0        // 2.0
2 & 1..3.0          // 2.0
2.5 & int & 1..5    // _|_
2.5 & float & 1..5  // 2.5
int & 2 & 1.0..3.0  // _|_
2.5 & (int & 1)..5  // _|_
0..7 & 3..10        // 3..7
"foo" & "a".."n"    // "foo"
```


### Structs

A _struct_ is a set of elements called _fields_, each of
which has a name, called a _label_, and a value.

We say a label is defined for a struct if the struct has a field with the
corresponding label.
The value for a label `f` of struct `a` is denoted `a.f`.
A struct `a` is an instance of `b`, or `a ⊑ b`, if for any label `f`
defined for `b`, label `f` is also defined for `a` and `a.f ⊑ b.f`.
Note that if `a` is an instance of `b` it may have fields with labels that
are not defined for `b`.

The (unique) struct with no fields, written `{}`, has every struct as an
instance. It can be considered the type of all structs.

The successful unification of structs `a` and `b` is a new struct `c` which
has all fields of both `a` and `b`, where
the value of a field `f` in `c` is `a.f & b.f` if `f` is in both `a` and `b`,
or just `a.f` or `b.f` if `f` is in just `a` or `b`, respectively.
Any [references](#References) to `a` or `b`
in their respective field values need to be replaced with references to `c`.
The result of a unification is bottom (`_|_`) if any of its fields evaluates
to bottom, recursively.

A field name may also be an interpolated string.
Identifiers used in such strings are evaluated within
the scope of the struct in which the label is defined.

Syntactically, a struct literal may contain multiple fields with
the same label, the result of which is a single field with a value
that is the unification of the values of those fields.

A TemplateLabel indicates a template value that is to be unified with
the values of all fields within a struct.
The identifier of a template label binds to the field name of each
field and is visible within the template value.

```
StructLit     = "{" [ { Declaration "," } Declaration ] "}" .
Declaration   = FieldDecl | AliasDecl | ComprehensionDecl .
FieldDecl     = Label { Label } ":" Expression .

AliasDecl     = Label "=" Expression .
Label         = identifier | interpreted_string_lit | TemplateLabel .
TemplateLabel = "<" identifier ">" .
Tag           = "#" identifier [ ":" json_string ] .
```
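
As an illustration of the TemplateLabel production, the following sketch applies a template to every field of a struct. The field names `job`, `nginx`, and `redis` and the identifier `Name` are hypothetical, and the results are derived from the description of template labels rather than taken from an implementation:

```
job: {
    // Name is bound to the label of each field of job.
    <Name>: { name: Name, replicas: int }

    nginx: { replicas: 2 }
    redis: { replicas: 1 }
}

// job.nginx evaluates to {name: "nginx", replicas: int(2)}
// job.redis evaluates to {name: "redis", replicas: int(1)}
```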

```
{a: 1} ⊑ {}
{a: 1, b: 1} ⊑ {a: 1}
{a: 1} ⊑ {a: int}
{a: 1, b: 1} ⊑ {a: int, b: float}

{} ⋢ {a: 1}
{a: 2} ⋢ {a: 1}
{a: 1} ⋢ {b: 1}
```

```
Expression                Result
{a: int, a: 1}            {a: int(1)}
{a: int} & {a: 1}         {a: int(1)}
{a: 1..7} & {a: 5..9}     {a: 5..7}
{a: 1..7, a: 5..9}        {a: 5..7}

{a: 1} & {b: 2}           {a: 1, b: 2}
{a: 1, b: int} & {b: 2}   {a: 1, b: int(2)}

{a: 1} & {a: 2}           _|_
```
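
Because struct unification is defined recursively, the same kind of results hold for nested structs. The following sketch is derived from the definition of struct unification in this section:

```
Expression                       Result
{a: {b: 1}} & {a: {c: 2}}        {a: {b: 1, c: 2}}
{a: {b: int}} & {a: {b: 1}}      {a: {b: int(1)}}
{a: {b: 1}} & {a: {b: 2}}        _|_ // a.b is 1 & 2
```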

In addition to fields, a struct literal may also define aliases.
Aliases name values that can be referred to within the [scope](#declarations-and-scopes) of their
definition, but are not part of the struct: aliases are irrelevant to
the partial ordering of values and are not emitted as part of any
generated data.
The name of an alias must be unique within the struct literal.

```
// The empty struct.
{}

// A struct with 3 fields and 1 alias.
{
    alias = 3

    foo: 2
    bar: "a string"

    "not an ident": 4
}
```

A field whose value is a struct with a single field may be written as
a sequence of the two field names,
followed by a colon and the value of that single field.

```
job myTask replicas: 2
```
expands to
```
job: {
    myTask: {
        replicas: 2
    }
}
```


### Lists

A list literal defines a new value of type list.
A list may be open or closed.
An open list is indicated with a `...` at the end of an element list,
optionally followed by a value for the remaining elements.

The length of a closed list is the number of elements it contains.
The length of an open list is its number of elements as a lower bound
and an unlimited number of elements as its upper bound.

```
ListLit       = "[" [ ElementList [ "," [ "..." [ Element ] ] ] ] "]" .
ElementList   = Element { "," Element } .
Element       = Expression | LiteralValue .
```
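
A few list literals that this production admits (an illustrative sketch):

```
[]              // closed list of length 0
[1, 2]          // closed list of length 2
[1, 2, ...]     // open list with at least 2 elements
[1, 2, ...int]  // open list; any further elements must be of type int
```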
<!---
KeyedElement  = Element .
--->

Lists can be thought of as structs:

```
List: *null | {
    Elem: _
    Tail: List
}
```

For closed lists, `Tail` is `null` for the last element; for open lists it is
`*null | List`, defaulting to the shortest variant.
For instance, the open list `[ 1, 2, ... ]` can be represented as:
```
open: List & { Elem: 1, Tail: { Elem: 2 } }
```
and the closed version of this list, `[ 1, 2 ]`, as
```
closed: List & { Elem: 1, Tail: { Elem: 2, Tail: null } }
```

Using this representation, the subsumption rule for lists can
be derived from that of structs.
Implementations are not required to implement lists as structs.
The `Elem` and `Tail` fields are not special and `len` will not work as
expected in these cases.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001043
1044
1045## Declarations and Scopes
1046
1047
1048### Blocks
1049
1050A _block_ is a possibly empty sequence of declarations.
Marcel van Lohuizen5fee32f2019-01-21 22:18:48 +01001051The braces of a struct literal `{ ... }` form a block, but there are
Jonathan Amsterdame4790382019-01-20 10:29:29 -05001052others as well:
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001053
Marcel van Lohuizen75cb0032019-01-11 12:10:48 +01001054- The _universe block_ encompasses all CUE source text.
Marcel van Lohuizen5fee32f2019-01-21 22:18:48 +01001055- Each [package](#modules-instances-and-packages) has a _package block_
1056 containing all CUE source text in that package.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001057- Each file has a _file block_ containing all CUE source text in that file.
Marcel van Lohuizen5fee32f2019-01-21 22:18:48 +01001058- Each `for` and `let` clause in a [comprehension](#comprehensions)
1059 is considered to be its own implicit block.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001060
1061Blocks nest and influence [scoping].


### Declarations and scope

A _declaration_ binds an identifier to a field, alias, or package.
Every identifier in a program must be declared.
Other than for fields,
no identifier may be declared twice within the same block.
For fields an identifier may be declared more than once within the same block,
resulting in a field with a value that is the result of unifying the values
of all fields with the same identifier.

```
TopLevelDecl   = Declaration | Emit .
Emit           = Operand .
```

The _scope_ of a declared identifier is the extent of source text in which the
identifier denotes the specified field, alias, or package.

CUE is lexically scoped using blocks:

1. The scope of a [predeclared identifier](#predeclared-identifiers) is the universe block.
1. The scope of an identifier denoting a field or alias
   declared at top level (outside any struct literal) is the file block.
1. The scope of the package name of an imported package is the file block of the
   file containing the import declaration.
1. The scope of a field or alias identifier declared inside a struct literal
   is the innermost containing block.

An identifier declared in a block may be redeclared in an inner block.
While the identifier of the inner declaration is in scope, it denotes the entity
declared by the inner declaration.
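
The redeclaration rule can be sketched as follows. The field names are hypothetical:

```
x: 1

a: {
    x: 2
    y: x    // 2: refers to the x declared in the enclosing struct literal
}

b: {
    y: x    // 1: refers to the x declared at top level
}
```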

The package clause is not a declaration;
the package name does not appear in any scope.
Its purpose is to identify the files belonging to the same package
and to specify the default name for import declarations.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001100
1101
1102### Predeclared identifiers
1103
1104```
1105Functions
1106len required close open
1107
1108Types
1109null The null type and value
1110bool All boolean values
1111int All integral numbers
1112float All decimal floating-point numbers
1113string Any valid UTF-8 sequence
Jonathan Amsterdame4790382019-01-20 10:29:29 -05001114bytes Any vallid byte sequence
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001115
1116Derived Value
1117number int | float
1118uint 0..int
1119uint8 0..255
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001120int8 -128..127
1121uint16 0..65536
1122int16 -32_768...32_767
1123rune 0..0x10FFFF
1124uint32 0..4_294_967_296
1125int32 -2_147_483_648..2_147_483_647
1126uint64 0..18_446_744_073_709_551_615
1127int64 -9_223_372_036_854_775_808..9_223_372_036_854_775_807
Marcel van Lohuizen5fee32f2019-01-21 22:18:48 +01001128uint128 340_282_366_920_938_463_463_374_607_431_768_211_455
Marcel van Lohuizen1e0fe9c2018-12-21 00:17:06 +01001129int128 -170_141_183_460_469_231_731_687_303_715_884_105_728..
Marcel van Lohuizen5fee32f2019-01-21 22:18:48 +01001130 170_141_183_460_469_231_731_687_303_715_884_105_727
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001131```
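
Since the derived types are plain ranges and disjunctions, unifying a value with one of them behaves like any other unification. The following sketch is derived from the table above:

```
uint8 & 255     // 255
uint8 & 256     // _|_, outside 0..255
int8 & -128     // -128
number & 1.5    // 1.5
```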


### Exported and manifested identifiers

An identifier of a package may be exported to permit access to it
from another package.
An identifier is exported if both:
the first character of the identifier's name is neither a Unicode lower case
letter (Unicode class "Ll") nor the underscore "_"; and
the identifier is declared in the file block.
All other identifiers are not exported.

An identifier that starts with the underscore "_" is not
emitted in any data output.
Quoted labels that start with an underscore are emitted, however.
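
For example, for declarations in the file block of a package (a sketch with hypothetical names):

```
Port:   8080    // exported: starts with an upper case letter
port:   8080    // not exported: starts with a lower case letter
_port:  8080    // not exported, and not emitted in any data output
"_p":   8080    // a quoted label: emitted despite the leading underscore
```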

### Uniqueness of identifiers

Given a set of identifiers, an identifier is called unique if it is different
from every other in the set, after applying normalization following
Unicode Annex #31.
Two identifiers are different if they are spelled differently.
<!--
or if they appear in different packages and are not exported.
--->
Otherwise, they are the same.


### Field declarations

A field declaration binds a label (the name of the field) to an expression.
The name for a quoted string used as label is the string it represents.
The name for an identifier used as a label is the identifier itself.
Quoted strings and identifiers can be used interchangeably, with the
exception of identifiers starting with an underscore '_'.
The latter represent hidden fields and are treated in a different namespace.
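
A sketch of what this means in practice. The field names are hypothetical, and the comments restate the rules above rather than the output of an implementation:

```
a: {
    foo:    int
    "foo":  1     // same field as foo, so a.foo is int(1)

    _bar:   2     // hidden field
    "_bar": 3     // regular field, distinct from the hidden field _bar
}
```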


### Alias declarations

An alias declaration binds an identifier to the given expression.

Within the scope of the identifier, it serves as an _alias_ for that
expression.
The expression is evaluated in the scope in which it was declared.
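
For instance, an alias can name a value that several fields refer to. This is a sketch; `Base` is a hypothetical alias name:

```
{
    Base = 8000

    http:  Base + 80     // 8080
    https: Base + 443    // 8443
}
```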


## Expressions

An expression specifies the computation of a value by applying operators and
built-in functions to operands.


### Operands

Operands denote the elementary values in an expression.
An operand may be a literal, a (possibly qualified) identifier denoting
a field or alias, or a parenthesized expression.

```
Operand     = Literal | OperandName | ListComprehension | "(" Expression ")" .
Literal     = BasicLit | ListLit | StructLit .
BasicLit    = int_lit | float_lit | string_lit |
              null_lit | bool_lit | bottom_lit | top_lit .
OperandName = identifier | QualifiedIdent .
```

### Qualified identifiers

A qualified identifier is an identifier qualified with a package name prefix.

```
QualifiedIdent = PackageName "." identifier .
```

A qualified identifier accesses an identifier in a different package,
which must be [imported].
The identifier must be declared in the [package block] of that package.

```
math.Sin    // denotes the Sin function in package math
```


### Primary expressions

Primary expressions are the operands for unary and binary expressions.
A default expression is only valid as an operand to a disjunction.

<!-- TODO(mpvl)
    Conversion |
-->
```
PrimaryExpr =
    Operand |
    PrimaryExpr Selector |
    PrimaryExpr Index |
    PrimaryExpr Slice |
    PrimaryExpr Arguments .

Selector       = "." identifier .
Index          = "[" Expression "]" .
Slice          = "[" [ Expression ] ":" [ Expression ] "]" .
Argument       = Expression .
Arguments      = "(" [ ( Argument { "," Argument } ) [ "..." ] [ "," ] ] ")" .
```
<!---
Argument       = Expression | ( identifier ":" Expression ).
--->

```
x
2
(s + ".txt")
f(3.1415, true)
m["foo"]
s[i : j + 1]
obj.color
f.p[i].x
```


### Selectors

For a [primary expression] `x` that is not a [package name],
the selector expression

```
x.f
```

denotes the field `f` of the value `x`.
The identifier `f` is called the field selector.
The type of the selector expression is the type of `f`.
If `x` is a package name, see the section on [qualified identifiers].

Otherwise, if `x` is not a struct, or if `f` does not exist in `x`,
the result of the expression is bottom (an error).

```
T: {
    x: int
    y: 3
}

a: T.x  // int
b: T.y  // 3
c: T.z  // _|_ // field 'z' not found in T
```


### Index expressions

A primary expression of the form

```
a[x]
```

denotes the element of the list, string, bytes, or struct `a` indexed by `x`.
The value `x` is called the index or field name, respectively.
The following rules apply:

If `a` is not a struct:

- the index `x` must be a concrete integer.
  If `x` is a disjunction, the default, if any, will be selected without unifying
  `x` with `int` beforehand.
- the index `x` is in range if `0 <= x < len(a)`, otherwise it is out of range

The result of `a[x]` is

for `a` of list type (including single quoted strings, which are lists of bytes):

- the list element at index `x`, if `x` is within range, where only the
  explicitly defined values of an open-ended list are considered
- bottom (an error), otherwise

for `a` of string type:

- the grapheme cluster at the `x`th byte (type string), if `x` is within range,
  where `x` may match any byte of the grapheme cluster
- bottom (an error), otherwise

for `a` of struct type:

- the value of the field named `x` of struct `a`, if this field exists
- bottom (an error), otherwise

```
[ 1, 2 ][1]     // 2
[ 1, 2 ][2]     // _|_
[ 1, 2, ...][2] // _|_
"He\u0300?"[0]  // "H"
"He\u0300?"[1]  // "e\u0300"
"He\u0300?"[2]  // "e\u0300"
"He\u0300?"[3]  // "e\u0300"
"He\u0300?"[4]  // "?"
"He\u0300?"[5]  // _|_
```
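
The list and string cases are illustrated above; following the rules for `a` of struct type, indexing a struct can be sketched as:

```
{a: 1, b: 2}["a"]    // 1
{a: 1, b: 2}["c"]    // _|_, there is no field named c
```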


### Slice expressions

Slice expressions construct a substring or slice from a string or list.

For strings or lists, the primary expression
```
a[low : high]
```
constructs a substring or slice. The indices `low` and `high` must be
concrete integers and select
which elements of operand `a` appear in the result.
The result has indices starting at 0 and length equal to `high` - `low`.
After slicing the list `a`
<!-- TODO(jba): how does slicing open lists work? -->

```
a: [1, 2, 3, 4, 5]
s: a[1:4]
```
the list `s` has length 3 and elements
```
s[0] == 2
s[1] == 3
s[2] == 4
```
For convenience, any of the indices may be omitted.
A missing `low` index defaults to zero; a missing `high` index defaults
to the length of the sliced operand:
```
a[2:]  // same as a[2 : len(a)]
a[:3]  // same as a[0 : 3]
a[:]   // same as a[0 : len(a)]
```

Indices are in range if `0 <= low <= high <= len(a)`,
otherwise they are out of range.
For strings, the indices select the start of the extended grapheme cluster
at the byte position indicated by the index.
If any of the slice values is out of range or if `low > high`, the result of
a slice is bottom (error).

```
"He\u0300?"[:2]  // "He\u0300"
"He\u0300?"[1:2] // "e\u0300"
"He\u0300?"[4:5] // "e\u0300?"
```


The result of a successful slice operation is a value of the same type
as the operand.
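
Applied to lists, the range rules give the following results. This is a sketch derived from the definitions in this section:

```
[1, 2, 3, 4, 5][1:4]    // [2, 3, 4]
[1, 2, 3, 4, 5][:2]     // [1, 2]
[1, 2, 3, 4, 5][3:]     // [4, 5]
[1, 2, 3][2:1]          // _|_, low > high
[1, 2, 3][1:4]          // _|_, high is out of range
```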


### Operators

Operators combine operands into expressions.

```
Expression = UnaryExpr | Expression binary_op Expression .
UnaryExpr  = PrimaryExpr | unary_op UnaryExpr .

binary_op  = "|" | "&" | "||" | "&&" | rel_op | add_op | mul_op | ".." .
rel_op     = "==" | "!=" | "<" | "<=" | ">" | ">=" .
add_op     = "+" | "-" .
mul_op     = "*" | "/" | "%" | "div" | "mod" | "quo" | "rem" .

unary_op   = "+" | "-" | "!" | "*" .
```
<!-- TODO: consider adding unary_op: "<" | "<=" | ">" | ">=" -->

Comparisons are discussed [elsewhere](#Comparison-operators).
For other binary operators, the operand
types must unify.
<!-- TODO: durations
 unless the operation involves durations.

Except for duration operations, if one operand is an untyped [literal] and the
other operand is not, the constant is [converted] to the type of the other
operand.
-->
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001413
1414
#### Operator precedence

Unary operators have the highest precedence.

There are eight precedence levels for binary operators.
The `..` operator (range) binds strongest, followed by
multiplication operators, addition operators, comparison operators,
`&&` (logical AND), `||` (logical OR), `&` (unification),
and finally `|` (disjunction):

```
Precedence    Operator
    8             ..
    7             *  /  %  div  mod  quo  rem
    6             +  -
    5             ==  !=  <  <=  >  >=
    4             &&
    3             ||
    2             &
    1             |
```

Binary operators of the same precedence associate from left to right.
For instance, `x / y * z` is the same as `(x / y) * z`.

```
+x
23 + 3*x[i]
x <= f()
f() || g()
x == y+1 && y == z-1
2 | int
{ a: 1 } & { b: 2 }
```
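
The effect of the precedence table and left associativity can be made
explicit with a small precedence-climbing sketch in Python.
This is an illustration only, not how any CUE implementation is known to
parse: the function name `parenthesize` is hypothetical, operands are treated
as opaque whitespace-separated tokens, and unary operators are not handled.

```python
# Binary operator precedence, copied from the table above.
PRECEDENCE = {
    "..": 8,
    "*": 7, "/": 7, "%": 7, "div": 7, "mod": 7, "quo": 7, "rem": 7,
    "+": 6, "-": 6,
    "==": 5, "!=": 5, "<": 5, "<=": 5, ">": 5, ">=": 5,
    "&&": 4,
    "||": 3,
    "&": 2,
    "|": 1,
}


def parenthesize(source):
    """Fully parenthesize a well-formed, whitespace-separated expression."""
    tokens = source.split()
    pos = 0

    def parse(min_prec):
        nonlocal pos
        left = tokens[pos]              # an operand, kept as an opaque token
        pos += 1
        while pos < len(tokens) and PRECEDENCE[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            # Left associativity: the right operand may only absorb
            # operators that bind strictly tighter than `op`.
            right = parse(PRECEDENCE[op] + 1)
            left = f"({left} {op} {right})"
        return left

    return parse(1)
```

For instance, `parenthesize("a | b & c")` yields `(a | (b & c))`, because
unification binds more tightly than disjunction.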

#### Arithmetic operators

Arithmetic operators apply to numeric values and yield a result of the same type
as the first operand. Three of the four standard arithmetic operators
(`+`, `-`, `*`) apply to integer and decimal floating-point types;
`+` and `*` also apply to lists, strings, and bytes.
`/` and `%` only apply to decimal floating-point types and
`div`, `mod`, `quo`, and `rem` only apply to integer types.

```
+    sum                    integers, floats, lists, strings, bytes
-    difference             integers, floats
*    product                integers, floats, lists, strings, bytes
/    quotient               floats
%    remainder              floats
div  division               integers
mod  modulo                 integers
quo  quotient               integers
rem  remainder              integers
```


#### Integer operators

For two integer values `x` and `y`,
the integer quotient `q = x div y` and remainder `r = x mod y`
implement Euclidean division and
satisfy the following relationship:

```
r = x - y*q  with 0 <= r < |y|
```
where `|y|` denotes the absolute value of `y`.

```
 x     y   x div y   x mod y
 5     3      1         2
-5     3     -2         1
 5    -3     -1         2
-5    -3      2         1
```

For two integer values `x` and `y`,
the integer quotient `q = x quo y` and remainder `r = x rem y`
implement truncated division and
satisfy the following relationship:

```
x = q*y + r  and  |r| < |y|
```

with `x quo y` truncated towards zero.

```
 x     y   x quo y   x rem y
 5     3      1         2
-5     3     -1        -2
 5    -3     -1         2
-5    -3      1        -2
```

A zero divisor in either case results in bottom (an error).
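
The two division flavors can be compared side by side with a small Python
sketch. Python is used only as neutral notation; the function names
`euclid_div_mod` and `trunc_quo_rem` are illustrative and not part of CUE,
and a raised `ZeroDivisionError` stands in for bottom.

```python
def euclid_div_mod(x, y):
    """Model `x div y` and `x mod y`: Euclidean division, 0 <= r < |y|."""
    if y == 0:
        raise ZeroDivisionError("zero divisor")  # stands in for bottom
    r = x % abs(y)      # Python's % with a positive divisor is never negative
    q = (x - r) // y    # exact, because x - r is a multiple of y
    return q, r


def trunc_quo_rem(x, y):
    """Model `x quo y` and `x rem y`: quotient truncated towards zero."""
    if y == 0:
        raise ZeroDivisionError("zero divisor")  # stands in for bottom
    q = abs(x) // abs(y)
    if (x < 0) != (y < 0):
        q = -q          # operands of opposite sign give a negative quotient
    return q, x - q * y
```

For example, `euclid_div_mod(-5, 3)` gives `(-2, 1)` while
`trunc_quo_rem(-5, 3)` gives `(-1, -2)`, matching the rows for `x = -5`,
`y = 3` in the two tables.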

For integer operands, the unary operators `+` and `-` are defined as follows:

```
+x                          is 0 + x
-x    negation              is 0 - x
```


#### Decimal floating-point operators

For decimal floating-point numbers, `+x` is the same as `x`,
while `-x` is the negation of `x`.
The result of a floating-point division by zero is bottom (an error).
<!-- TODO: consider making it +/- Inf -->

An implementation may combine multiple floating-point operations into a single
fused operation, possibly across statements, and produce a result that differs
from the value obtained by executing and rounding the instructions individually.


#### List operators

Lists can be concatenated using the `+` operator.
For lists `a` and `b`,
```
a + b
```
will produce an open list if `b` is open.
If list `a` is open, its default value, the shortest variant, is selected.

```
[ 1, 2 ]      + [ 3, 4 ]       // [ 1, 2, 3, 4 ]
[ 1, 2, ... ] + [ 3, 4 ]       // [ 1, 2, 3, 4 ]
[ 1, 2 ]      + [ 3, 4, ... ]  // [ 1, 2, 3, 4, ... ]
```

Lists can be multiplied with a positive `int` using the `*` operator
to repeat the list the indicated number of times.
```
3*[1,2]         // [1, 2, 1, 2, 1, 2]
3*[1, 2, ...]   // [1, 2, 1, 2, 1, 2]
[byte]*4        // [byte, byte, byte, byte]
```
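
The open/closed behaviour in the examples above can be modeled with a short
Python sketch. The `CueList` type and the `concat` and `repeat` helpers are
hypothetical names introduced for illustration. Raising `ValueError` for a
non-positive count is an assumption of the sketch, since the text only
defines multiplication by a positive `int`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CueList:
    elems: tuple
    is_open: bool = False   # True models a trailing "..."


def concat(a, b):
    """Model a + b: the result is open only if b is open."""
    # An open `a` contributes its default, the shortest variant,
    # which is just its explicitly listed elements.
    return CueList(a.elems + b.elems, is_open=b.is_open)


def repeat(n, a):
    """Model n * a for a positive int n; the result is closed."""
    if n <= 0:
        raise ValueError("count must be positive")  # assumption, see lead-in
    return CueList(a.elems * n)
```

With this model, `concat(CueList((1, 2), True), CueList((3, 4)))` is the
closed list `[1, 2, 3, 4]`, as in the second concatenation example.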

<!-- TODO(mpvl): should we allow multiplication with a range?
If so, how does one specify a list with a range of possible lengths?

Suggestion from jba:
Multiplication should distribute over disjunction,
so int(1)..int(3) * [x] = [x] | [x, x] | [x, x, x].
The hard part is figuring out what 1..3 * [x] means,
since 1..3 includes many floats.
(mpvl: could constrain arguments to parameter types, but needs to be
done consistently.)
-->


#### String operators

Strings can be concatenated using the `+` operator:
```
s: "hi " + name + " and good bye"
```
String addition creates a new string by concatenating the operands.

A string can be repeated by multiplying it:

```
s: "etc. "*3  // "etc. etc. etc. "
```
<!-- jba: Do these work for byte sequences? If not, why not? -->

#### Comparison operators

Comparison operators compare two operands and yield an untyped boolean value.

```
==    equal
!=    not equal
<     less
<=    less or equal
>     greater
>=    greater or equal
```

In any comparison, the types of the two operands must unify.

The equality operators `==` and `!=` apply to operands that are comparable.
The ordering operators `<`, `<=`, `>`, and `>=` apply to operands that are ordered.
These terms and the result of the comparisons are defined as follows:

- Boolean values are comparable.
  Two boolean values are equal if they are either both true or both false.
- Integer values are comparable and ordered, in the usual way.
- Floating-point values are comparable and ordered, as per the definitions
  for binary coded decimals in the IEEE-754-2008 standard.
- String values are comparable and ordered, lexically byte-wise after
  normalization to Unicode normal form NFC.
- Structs are not comparable.
  Two struct values are equal if their corresponding non-blank fields are equal.
- Lists are comparable.
  Two list values are equal if their corresponding elements are equal.

```
c: 3 < 4

x: int
y: int

b3: x == y // b3 has type bool
```
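
The string rule, byte-wise comparison after NFC normalization, can be
sketched in Python with the standard `unicodedata` module.
The helper names are hypothetical, and comparing the UTF-8 encoding is an
assumption of this sketch about which bytes are meant.

```python
import unicodedata


def _comparison_key(s):
    # Normalize to NFC first, then compare the resulting bytes.
    return unicodedata.normalize("NFC", s).encode("utf-8")


def string_equal(a, b):
    """Model `a == b` for strings."""
    return _comparison_key(a) == _comparison_key(b)


def string_less(a, b):
    """Model `a < b` for strings: lexical, byte-wise ordering."""
    return _comparison_key(a) < _comparison_key(b)
```

Under this model `"e\u0301"` (e followed by a combining acute accent) and
`"\u00e9"` (precomposed é) compare equal, although their code point
sequences differ.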

<!-- jba
I think I know what `3 < a` should mean if

    a: 1..5

It should be a constraint on `a` that can be evaluated once `a`'s value is known more precisely.

But what does `3 < 1..5` mean? We'll never get more information, so it must have a definite value.
-->

#### Logical operators

Logical operators apply to boolean values and yield a result of the same type
as the operands. The right operand is evaluated conditionally.

```
&&    conditional AND    p && q  is  "if p then q else false"
||    conditional OR     p || q  is  "if p then true else q"
!     NOT                !p      is  "not p"
```


<!--
### TODO TODO TODO

3.14 / 0.0 // illegal: division by zero
Illegal conversions always apply to CUE.

Implementation restriction: A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa.
-->

<!--- TODO(mpvl): conversions
### Conversions
Conversions are expressions of the form `T(x)` where `T` and `x` are
expressions.
The result is always an instance of `T`.

```
Conversion = Expression "(" Expression [ "," ] ")" .
```
--->
<!---

A literal value `x` can be converted to type T if `x` is representable by a
value of `T`.

As a special case, an integer literal `x` can be converted to a string type
using the same rule as for non-constant x.

Converting a literal yields a typed value as result.

```
uint(iota)               // iota value of type uint
float32(2.718281828)     // 2.718281828 of type float32
complex128(1)            // 1.0 + 0.0i of type complex128
float32(0.49999999)      // 0.5 of type float32
float64(-1e-1000)        // 0.0 of type float64
string('x')              // "x" of type string
string(0x266c)           // "♬" of type string
MyString("foo" + "bar")  // "foobar" of type MyString
string([]byte{'a'})      // not a constant: []byte{'a'} is not a constant
(*int)(nil)              // not a constant: nil is not a constant, *int is not a boolean, numeric, or string type
int(1.2)                 // illegal: 1.2 cannot be represented as an int
string(65.0)             // illegal: 65.0 is not an integer constant
```
--->
<!---

A conversion is always allowed if `x` is an instance of `T`.

If `T` and `x` are of different underlying type, a conversion is allowed if
`x` can be converted to a value `x'` of `T`'s type, and
`x'` is an instance of `T`.
A value `x` can be converted to the type of `T` in any of these cases:

- `x` is a struct and is subsumed by `T`.
- `x` and `T` are both integer or floating points.
- `x` is an integer or a byte sequence and `T` is a string.
- `x` is a string and `T` is a byte sequence.

Specific rules apply to conversions between numeric types, structs,
or to and from a string type. These conversions may change the representation
of `x`.
All other conversions only change the type but not the representation of `x`.


#### Conversions between numeric ranges
For the conversion of numeric values, the following rules apply:

1. Any integer value can be converted into any other integer value
   provided that it is within range.
2. When converting a decimal floating-point number to an integer, the fraction
   is discarded (truncation towards zero). TODO: or disallow truncating?

```
a: uint16(int(1000)) // uint16(1000)
b: uint8(1000)       // _|_ // overflow
c: int(2.5)          // 2  TODO: TBD
```


#### Conversions to and from a string type

Converting a list of bytes to a string type yields a string whose successive
bytes are the elements of the slice.
Invalid UTF-8 is converted to `"\uFFFD"`.

```
string('hell\xc3\xb8') // "hellø"
string(bytes([0x20]))  // " "
```

A string value is always convertible to a list of bytes.

```
bytes("hellø") // 'hell\xc3\xb8'
bytes("")      // ''
```

#### Conversions between list types

Conversions between list types are possible only if `T` strictly subsumes `x`
and the result will be the unification of `T` and `x`.

If we introduce named types this would be different from IP & [10, ...]

Consider removing this until it has a different meaning.

```
IP: 4*[byte]
Private10: IP([10, ...]) // [10, byte, byte, byte]
```

#### Conversions between struct types

A conversion from `x` to `T`
is applied using the following rules:

1. `x` must be an instance of `T`,
2. all fields defined for `x` that are not defined for `T` are removed from
   the result of the conversion, recursively.

<!-- jba: I don't think you say anywhere that the matching fields are unified.
mpvl: they are not, x must be an instance of T, in which case x == T&x,
so unification would be unnecessary.
-->
<!--
```
T: {
    a: { b: 1..10 }
}

x1: {
    a: { b: 8, c: 10 }
    d: 9
}

c1: T(x1)             // { a: { b: 8 } }
c2: T({})             // _|_  // missing field 'a' in '{}'
c3: T({ a: {b: 0} })  // _|_  // field a.b does not unify (0 & 1..10)
```
-->

### Calls

Calls can be made to core library functions, called builtins.
Given an expression `f` of function type `F`,
```
f(a1, a2, … an)
```
calls `f` with arguments `a1, a2, … an`. Arguments must be expressions
whose values are instances of the parameter types of `F`;
they are evaluated before the function is called.

```
a: math.Atan2(x, y)
```

In a function call, the function value and arguments are evaluated in the usual
order.
After they are evaluated, the parameters of the call are passed by value
to the function and the called function begins execution.
The return parameters
of the function are passed by value back to the calling function when the
function returns.


### Comprehensions

Lists and fields can be constructed using comprehensions.

Each defines a clause sequence that consists of a sequence of `for`, `if`, and
`let` clauses, nesting from left to right.
The `for` and `let` clauses each define a new scope in which new values are
bound to be available for the next clause.

The `for` clause binds the defined identifiers, on each iteration, to the next
value of some iterable value in a new scope.
A `for` clause may bind one or two identifiers.
If there is one identifier, it binds it to the value, for instance
a list element, a struct field value or a range element.
If there are two identifiers, the first value will be the key or index,
if available, and the second will be the value.

An `if` clause, or guard, specifies an expression that terminates the current
iteration if it evaluates to false.

The `let` clause binds the result of an expression to the defined identifier
in a new scope.

A current iteration is said to complete if the innermost block of the clause
sequence is reached.

_List comprehensions_ specify a single expression that is evaluated and included
in the list for each completed iteration.

_Field comprehensions_ follow a `Field` with a clause sequence, where the
label and value of the field are evaluated for each iteration.
The label must be an identifier or interpreted_string_lit, where the
latter may be a string interpolation that refers to the identifiers defined
in the clauses.
Values of iterations that map to the same label unify into a single field.

```
ComprehensionDecl   = Field [ "<-" ] Clauses .
ListComprehension   = "[" Expression [ "<-" ] Clauses "]" .

Clauses             = Clause { Clause } .
Clause              = ForClause | GuardClause | LetClause .
ForClause           = "for" identifier [ "," identifier ] "in" Expression .
GuardClause         = "if" Expression .
LetClause           = "let" identifier "=" Expression .
```

```
a: [1, 2, 3, 4]
b: [ x+1 for x in a if x > 1]  // [3, 4, 5]

c: { "\(x)": x + y for x in a if x < 4 let y = 1 }
d: { "1": 2, "2": 3, "3": 4 }
```
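
The left-to-right nesting of clauses can be spelled out as ordinary nested
control flow. The Python sketch below mirrors the two comprehensions from the
example; the function names are hypothetical, and unification of repeated
labels is modeled only for atoms (equal values merge, different values raise
an error standing in for bottom).

```python
def list_comprehension(a):
    """Model: [ x+1 for x in a if x > 1 ]"""
    result = []
    for x in a:                 # for x in a
        if not (x > 1):         # guard: a false guard ends this iteration
            continue
        result.append(x + 1)    # innermost block reached: iteration completes
    return result


def field_comprehension(a):
    """Model: { "\\(x)": x + y for x in a if x < 4 let y = 1 }"""
    fields = {}
    for x in a:                 # for x in a
        if not (x < 4):         # if x < 4
            continue
        y = 1                   # let y = 1, visible to what follows
        label, value = f"{x}", x + y
        if label in fields and fields[label] != value:
            raise ValueError(f"conflicting values for {label}")
        fields[label] = value   # same label: values unify into one field
    return fields
```

Applied to `[1, 2, 3, 4]`, these return `[3, 4, 5]` and
`{"1": 2, "2": 3, "3": 4}`, the values shown for `b` and `d`.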


### String interpolation

String interpolation allows constructing strings by replacing placeholder
expressions with their string representation.
String interpolation may be used in single- and double-quoted strings, as well
as their multiline equivalent.

A placeholder consists of "\(" followed by an expression and a ")". The
expression is evaluated within the scope within which the string is defined.

```
a: "World"
b: "Hello \( a )!" // Hello World!
```


## Builtin Functions

Built-in functions are predeclared. They are called like any other function.


### `len`

The built-in function `len` takes arguments of various types and returns
a result of type int.

```
Argument type    Result

string           string length in bytes
bytes            length of byte sequence
list             list length, smallest length for an open list
struct           number of distinct fields
```

```
Expression           Result
len("Hellø")         6
len([1, 2, 3])       3
len([1, 2, ...])     2
len({a:1, b:2})      2
```
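
The byte-oriented string length is the row most easily confused with
character counts, so a small Python model may help. The name `cue_len` is
hypothetical, counting UTF-8 bytes is an assumption of the sketch, and an
open list is represented by its explicitly listed elements only.

```python
def cue_len(value):
    """Model the `len` builtin for strings, bytes, lists, and structs."""
    if isinstance(value, str):
        return len(value.encode("utf-8"))   # length in bytes, not characters
    if isinstance(value, (bytes, list, tuple, dict)):
        return len(value)                   # elements, or distinct fields
    raise TypeError(f"len is not defined for {type(value).__name__}")
```

Here `cue_len("Hellø")` is 6 because `ø` occupies two bytes in UTF-8,
whereas Python's own `len("Hellø")` counts 5 code points.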


## Cycles

Implementations are required to interpret or reject cycles encountered
during evaluation according to the rules in this section.


### Reference cycles

A _reference cycle_ occurs if a field references itself, either directly or
indirectly.

```
// x references itself
x: x

// indirect cycles
b: c
c: d
d: b
```

Implementations should report these as an error except in the following cases:


#### Expressions that unify an atom with an expression

An expression of the form `a & e`, where `a` is an atom
and `e` is an expression, always evaluates to `a` or bottom.
As it does not matter how we fail, we can assume the result to be `a`
and validate after the field in which the expression occurs has been evaluated
that `a == e`.

```
// Config            Evaluates to
x: {                 x: {
    a: b + 100           a: _|_ // cycle detected
    b: a - 100           b: _|_ // cycle detected
}                    }

y: x & {             y: {
    a: 200               a: 200 // asserted that 200 == b + 100
                         b: 100
}                    }
```


#### Field values

A field value of the form `r & v`,
where `r` evaluates to a reference cycle and `v` is a value,
evaluates to `v`.
Unification is idempotent and unifying a value with itself ad infinitum,
which is what the cycle represents, results in this value.
Implementations should detect cycles of this kind, ignore `r`,
and take `v` as the result of unification.
<!-- Tomabechi's graph unification algorithm
can detect such cycles at near-zero cost. -->

```
Configuration    Evaluated
//    c          Cycles in nodes of type struct evaluate
//  ↙︎   ↖        to the fixed point of unifying their
// a  →  b       values ad infinitum.

a: b & { x: 1 }  // a: { x: 1, y: 2, z: 3 }
b: c & { y: 2 }  // b: { x: 1, y: 2, z: 3 }
c: a & { z: 3 }  // c: { x: 1, y: 2, z: 3 }

// resolve a             b & {x:1}
// substitute b          c & {y:2} & {x:1}
// substitute c          a & {z:3} & {y:2} & {x:1}
// eliminate a (cycle)   {z:3} & {y:2} & {x:1}
// simplify              {x:1,y:2,z:3}
```
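
The substitute-and-eliminate steps in the trace above can be sketched in
Python. This is a toy model under stated assumptions, not the algorithm of
any CUE implementation: each field is a hypothetical pair
`(reference, struct)` meaning `reference & struct`, structs are flat
dictionaries of atoms, and `unify` and `resolve` are illustrative names.

```python
def unify(a, b):
    """Unify two flat structs of atoms; a conflict stands in for bottom."""
    out = dict(a)
    for key, value in b.items():
        if key in out and out[key] != value:
            raise ValueError(f"conflicting values for {key}")
        out[key] = value
    return out


def resolve(config, name, visiting=()):
    """Evaluate field `name`, ignoring a reference that closes a cycle."""
    ref, struct = config[name]
    path = visiting + (name,)
    if ref is None or ref in path:
        return dict(struct)     # eliminate the cyclic reference, keep v
    return unify(resolve(config, ref, path), struct)


# a: b & {x: 1}, b: c & {y: 2}, c: a & {z: 3}
CONFIG = {
    "a": ("b", {"x": 1}),
    "b": ("c", {"y": 2}),
    "c": ("a", {"z": 3}),
}
```

Calling `resolve(CONFIG, "a")`, `resolve(CONFIG, "b")`, or
`resolve(CONFIG, "c")` each gives `{"x": 1, "y": 2, "z": 3}`, in line with
the evaluated column.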

This rule also applies to field values that are disjunctions of unification
operations of the above form.

```
a: b&{x:1} | {y:1}  // {x:1,y:3,z:2} | {y:1}
b: {x:2} | c&{z:2}  // {x:2} | {x:1,y:3,z:2}
c: a&{y:3} | {z:3}  // {x:1,y:3,z:2} | {z:3}


// resolving a           b&{x:1} | {y:1}
// substitute b          ({x:2} | c&{z:2})&{x:1} | {y:1}
// simplify              c&{z:2}&{x:1} | {y:1}
// substitute c          (a&{y:3} | {z:3})&{z:2}&{x:1} | {y:1}
// simplify              a&{y:3}&{z:2}&{x:1} | {y:1}
// eliminate a (cycle)   {y:3}&{z:2}&{x:1} | {y:1}
// expand                {x:1,y:3,z:2} | {y:1}
```

Note that all nodes that are part of a reference cycle forming a struct
evaluate to the same value.
If a field value is a disjunction, any element that is part of a cycle will
evaluate to this value.


### Structural cycles

CUE disallows infinite structures.
Implementations must report an error when encountering such declarations.

<!-- for instance using an occurs check -->

```
// Disallowed: a list of infinite length with all elements being 1.
list: {
    head: 1
    tail: list
}

// Disallowed: another infinite structure (a:{b:{d:{b:{d:{...}}}}}, ...).
a: {
    b: c
}
c: {
    d: a
}
```

It is allowed for a value to define an infinite set of possibilities
without evaluating to an infinite structure itself.

```
// List defines a list of arbitrary length (default null).
List: *null | {
    head: _
    tail: List
}
```

<!--
Consider banning any construct that makes CUE not having a linear
running time expressed in the number of nodes in the output.

This would require restricting constructs like:

(fib&{n:2}).out

fib: {
    n: int

    out: (fib&{n:n-2}).out + (fib&{n:n-1}).out if n >= 2
    out: fib({n:n-2}).out + fib({n:n-1}).out if n >= 2
    out: n if n < 2
}

-->
<!--
### Unused fields

TODO: rules for detection of unused fields

1. Any alias value must be used
-->


## Modules, instances, and packages

CUE configurations are constructed by combining _instances_.
An instance, in turn, is constructed from one or more source files belonging
to the same _package_ that together declare the data representation.
Elements of this data representation may be exported and used
in other instances.

### Source file organization

Each source file consists of an optional package clause defining the collection
of files to which it belongs,
followed by a possibly empty set of import declarations that declare
packages whose contents it wishes to use, followed by a possibly empty set of
declarations.


```
SourceFile = [ PackageClause "," ] { ImportDecl "," } { TopLevelDecl "," } .
```

### Package clause

A package clause is an optional clause that defines the package to which
a source file belongs.

```
PackageClause  = "package" PackageName .
PackageName    = identifier .
```

The PackageName must not be the blank identifier.

```
package math
```

### Modules and instances

A _module_ defines a tree of directories, rooted at the _module root_.

All source files within a module with the same package name belong to the same
package.
<!-- jba: I can't make sense of the above sentence. -->
A module may define multiple packages.

An _instance_ of a package is any subset of files belonging
to the same package.
<!-- jba: Are you saying that -->
<!-- if I have a package with files a, b and c, then there are 8 instances of -->
<!-- that package, some of which are {a, b}, {c}, {b, c}, and so on? What's the -->
<!-- purpose of that definition? -->
It is interpreted as the concatenation of these files.

An implementation may impose conventions on the layout of package files
to determine which files of a package belong to an instance.
For example, an instance may be defined as the subset of package files
belonging to a directory and all its ancestors.
<!-- jba: OK, that helps a little, but I still don't see what the purpose is. -->
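
The example convention, a directory plus all its ancestors, can be sketched
as a pure path computation in Python. Everything here is hypothetical: the
function name `instance_files`, the `.cue` file names in the usage note, and
the choice to stop at the module root are assumptions for illustration, not
rules stated by this specification.

```python
from pathlib import PurePosixPath


def instance_files(package_files, directory, module_root):
    """Select the package files in `directory` and in each of its ancestors,
    up to and including `module_root`."""
    root = PurePosixPath(module_root)
    current = PurePosixPath(directory)
    dirs = set()
    while True:
        dirs.add(current)
        if current == root:
            break
        if current.parent == current:
            raise ValueError("directory is outside the module root")
        current = current.parent
    return sorted(f for f in package_files if PurePosixPath(f).parent in dirs)
```

Given the files `mod/a.cue`, `mod/x/b.cue`, `mod/x/y/c.cue`, and
`mod/z/d.cue`, the instance for directory `mod/x/y` under root `mod` consists
of the first three; `mod/z/d.cue` is in a sibling directory and is left out.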

### Import declarations

An import declaration states that the source file containing the declaration
depends on definitions of the _imported_ package (§Program initialization and
execution) and enables access to exported identifiers of that package.
The import names an identifier (PackageName) to be used for access and an
ImportPath that specifies the package to be imported.

```
ImportDecl       = "import" ( ImportSpec | "(" { ImportSpec ";" } ")" ) .
ImportSpec       = [ "." | PackageName ] ImportPath .
ImportPath       = `"` { unicode_value } `"` .
```

The PackageName is used in qualified identifiers to access exported identifiers
of the package within the importing source file.
It is declared in the file block.
If the PackageName is omitted, it defaults to the identifier specified in the
package clause of the imported instance.
If an explicit period (.) appears instead of a name, all the instance's
exported identifiers declared in that instance's package block will be declared
in the importing source file's file block
and must be accessed without a qualifier.
<!-- jba: Can you omit this feature? It's likely to only decrease readability,
as we know from Go. -->

The interpretation of the ImportPath is implementation-dependent but it is
typically either the path of a builtin package or a fully qualifying location
of an instance within a source code repository.

Implementation restriction: An interpreter may restrict ImportPaths to non-empty
strings using only characters belonging to Unicode's L, M, N, P, and S general
categories (the Graphic characters without spaces) and may also exclude the
characters !"#$%&'()*,:;<=>?[\]^`{|} and the Unicode replacement character
U+FFFD.
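
An interpreter that applies the strictest form of this restriction could
check a path roughly as follows. The Python sketch uses the standard
`unicodedata` module; the function name `is_valid_import_path` is
hypothetical, and since the restriction is optional, a conforming
implementation may accept more than this.

```python
import unicodedata

# Characters the restriction allows an interpreter to exclude.
EXCLUDED = set("!\"#$%&'()*,:;<=>?[\\]^`{|}") | {"\ufffd"}


def is_valid_import_path(path):
    """Return True if `path` passes the strictest permitted restriction."""
    if not path:
        return False                     # must be non-empty
    for ch in path:
        # The first letter of the general category: L, M, N, P, S, Z, or C.
        if unicodedata.category(ch)[0] not in "LMNPS":
            return False                 # e.g. spaces and control characters
        if ch in EXCLUDED:
            return False
    return True
```

For example, `is_valid_import_path("lib/math")` is true, while a path
containing a space or a `|` is rejected.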

Assume we have a package containing the package clause "package math",
which exports function Sin at the path identified by "lib/math".
This table illustrates how Sin is accessed in files
that import the package after the various types of import declaration.

```
Import declaration          Local name of Sin

import   "lib/math"         math.Sin
import m "lib/math"         m.Sin
import . "lib/math"         Sin
```

An import declaration declares a dependency relation between the importing and
imported package. It is illegal for a package to import itself, directly or
indirectly, or to directly import a package without referring to any of its
exported identifiers.


### An example package

TODO