blob: a61736aaa3e738478d9c0511a7462397db64ea02 [file] [log] [blame] [view]
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +01001<!--
2 Copyright 2018 The CUE Authors
3
4 Licensed under the Apache License, Version 2.0 (the "License");
5 you may not use this file except in compliance with the License.
6 You may obtain a copy of the License at
7
8 http://www.apache.org/licenses/LICENSE-2.0
9
10 Unless required by applicable law or agreed to in writing, software
11 distributed under the License is distributed on an "AS IS" BASIS,
12 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 See the License for the specific language governing permissions and
14 limitations under the License.
15-->
16
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +010017# The CUE Language Specification
18
19## Introduction
20
21This is a reference manual for the CUE configuration language.
22CUE, pronounced cue or Q, is a general-purpose and strongly typed
23configuration language.
24The CUE tooling, layered on top of CUE, converts this language to
25a general purpose scripting language for creating scripts as well as
26simple servers.
27
28CUE was designed with cloud configuration, and related systems, in mind,
29but is not limited to this domain.
30It derives its formalism from relational programming languages.
31This formalism allows for managing and reasoning over large amounts of
32configuration in a straightforward manner.
33
34The grammar is compact and regular, allowing for easy analysis by automatic
35tools such as integrated development environments.
36
37This document is maintained by mpvl@golang.org.
38CUE has a lot of similarities with the Go language. This document draws heavily
39from the Go specification as a result, Copyright 2009–2018, The Go Authors.
40
41CUE draws its influence from many languages.
42Its main influences were BCL/ GCL (internal to Google),
43LKB (LinGO), Go, and JSON.
44Others are Swift, Javascript, Prolog, NCL (internal to Google), Jsonnet, HCL,
45Flabbergast, JSONPath, Haskell, Objective-C, and Python.
46
47
48## Notation
49
50The syntax is specified using Extended Backus-Naur Form (EBNF):
51
52```
53Production = production_name "=" [ Expression ] "." .
54Expression = Alternative { "|" Alternative } .
55Alternative = Term { Term } .
56Term = production_name | token [ "…" token ] | Group | Option | Repetition .
57Group = "(" Expression ")" .
58Option = "[" Expression "]" .
59Repetition = "{" Expression "}" .
60```
61
62Productions are expressions constructed from terms and the following operators,
63in increasing precedence:
64
65```
66| alternation
67() grouping
68[] option (0 or 1 times)
69{} repetition (0 to n times)
70```
71
72Lower-case production names are used to identify lexical tokens. Non-terminals
73are in CamelCase. Lexical tokens are enclosed in double quotes "" or back quotes
74``.
75
76The form a … b represents the set of characters from a through b as
77alternatives. The horizontal ellipsis … is also used elsewhere in the spec to
78informally denote various enumerations or code snippets that are not further
79specified. The character … (as opposed to the three characters ...) is not a
80token of the Go language.
81
82
83## Source code representation
84
85Source code is Unicode text encoded in UTF-8.
86Unless otherwise noted, the text is not canonicalized, so a single
87accented code point is distinct from the same character constructed from
88combining an accent and a letter; those are treated as two code points.
89For simplicity, this document will use the unqualified term character to refer
90to a Unicode code point in the source text.
91
92Each code point is distinct; for instance, upper and lower case letters are
93different characters.
94
95Implementation restriction: For compatibility with other tools, a compiler may
96disallow the NUL character (U+0000) in the source text.
97
98Implementation restriction: For compatibility with other tools, a compiler may
99ignore a UTF-8-encoded byte order mark (U+FEFF) if it is the first Unicode code
100point in the source text. A byte order mark may be disallowed anywhere else in
101the source.
102
103
104### Characters
105
106The following terms are used to denote specific Unicode character classes:
107
108```
109newline = /* the Unicode code point U+000A */ .
110unicode_char = /* an arbitrary Unicode code point except newline */ .
111unicode_letter = /* a Unicode code point classified as "Letter" */ .
112unicode_digit = /* a Unicode code point classified as "Number, decimal digit" */ .
113```
114
115In The Unicode Standard 8.0, Section 4.5 "General Category" defines a set of
116character categories.
117CUE treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo
118as Unicode letters, and those in the Number category Nd as Unicode digits.
119
120
121### Letters and digits
122
123The underscore character _ (U+005F) is considered a letter.
124
125```
126letter = unicode_letter | "_" .
127decimal_digit = "0" … "9" .
128octal_digit = "0" … "7" .
129hex_digit = "0" … "9" | "A" … "F" | "a" … "f" .
130```
131
132
133## Lexical elements
134
135### Comments
136Comments serve as program documentation. There are two forms:
137
1381. Line comments start with the character sequence // and stop at the end of the line.
1392. General comments start with the character sequence /* and stop with the first subsequent character sequence */.
140
141A comment cannot start inside string literal or inside a comment.
142A general comment containing no newlines acts like a space.
143Any other comment acts like a newline.
144
145
146### Tokens
147
148Tokens form the vocabulary of the CUE language. There are four classes:
149identifiers, keywords, operators and punctuation, and literals. White space,
150formed from spaces (U+0020), horizontal tabs (U+0009), carriage returns
151(U+000D), and newlines (U+000A), is ignored except as it separates tokens that
152would otherwise combine into a single token. Also, a newline or end of file may
153trigger the insertion of a comma. While breaking the input into tokens, the
154next token is the longest sequence of characters that form a valid token.
155
156
157### Commas
158
159The formal grammar uses commas "," as terminators in a number of productions.
160CUE programs may omit most of these commas using the following two rule:
161
162When the input is broken into tokens, a comma is automatically inserted into
163the token stream immediately after a line's final token if that token is
164
165 - an identifier
166 - null, true, false, bottom, an integer, a floating-point, or string literal
167 - one of the punctuation ), ], or }
168
169
170Although commas are automatically inserted, the parser will require
171explicit commas between two list elements.
172
173To reflect idiomatic use, examples in this document elide commas using
174these rules.
175
176
177### Identifiers
178
179Identifiers name entities such as fields and aliases.
180An identifier is a sequence of one or more letters and digits.
181It may not be `_`.
182The first character in an identifier must be a letter.
183
184<!--
185TODO: allow identifiers as defined in Unicode UAX #31
186(https://unicode.org/reports/tr31/).
187
188Identifiers are normalized using the NFC normal form.
189-->
190
191```
192identifier = letter { letter | unicode_digit } .
193```
194
195```
196a
197_x9
198fieldName
199αβ
200```
201
202<!-- TODO: Allow Unicode identifiers TR 32 http://unicode.org/reports/tr31/ -->
203
204Some identifiers are [predeclared].
205
206
207### Keywords
208
209CUE has a limited set of keywords.
210All keywords may be used as labels (field names).
211They cannot, however, be used as identifiers to refer to the same name.
212
213
214#### Values
215
216The following keywords are values.
217
218```
219null true false
220```
221
222These can never be used to refer to a field of the same name.
223This restriction is to ensure compatibility with JSON configuration files.
224
225
226#### Preamble
227
228The following pseudo keywords are used at the preamble of a CUE file.
229After the preamble, they may be used as identifiers to refer to namesake fields.
230
231```
232package import
233```
234
235
236#### Comprehension clauses
237
238The following pseudo keywords are used in comprehensions.
239
240```
241for in if let
242```
243
244The pseudo keywords `for`, `if` and `let` cannot be used as identifiers to
245refer to fields. All others can.
246
247<!--
248TODO:
249 reduce [to]
250 order [by]
251-->
252
253
254#### Arithmetic
255
256The following pseudo keywords can be used as operators in expressions.
257
258```
259div mod quo rem
260```
261
262These may be used as identifiers to refer to fields in all other contexts.
263
264
265### Operators and punctuation
266
267The following character sequences represent operators and punctuation:
268
269```
270+ & && == != ( )
271- | || < <= [ ]
272* : ! > >= { }
273/ :: ; = ... .. .
274div mod quo rem _|_ <- ,
275```
276Optional:
277```
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +0100278-> _|_
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +0100279```
280
281
282### Integer literals
283
284An integer literal is a sequence of digits representing an integer value.
285An optional prefix sets a non-decimal base: 0 for octal,
2860x or 0X for hexadecimal, and 0b for binary.
287In hexadecimal literals, letters a-f and A-F represent values 10 through 15.
288All integers allow intersticial underscores "_";
289these have no meaning and are solely for readability.
290
291Decimal integers may have a SI or IEC multiplier.
292Multipliers can be used with fractional numbers.
293The result of multiplying the fraction with the multiplier is truncated
294towards zero if the result is not an integer.
295
296```
297int_lit = decimal_lit | octal_lit | binary_lit | hex_lit .
298decimals = ( "0" … "9" ) { [ "_" ] decimal_digit } .
299decimal_lit = ( "1" … "9" ) { [ "_" ] decimal_digit } [ [ "." decimals ] multiplier ] |
300 "." decimals multiplier.
301octal_lit = "0" octal_digit { [ "_" ] octal_digit } .
302binary_lit = "0b" binary_digit { binary_digit } .
303hex_lit = "0" ( "x" | "X" ) hex_digit { [ "_" ] hex_digit } .
304multiplier = "k" | "Ki" | ( "M" | "G" | "T" | "P" | "E" | "Y" | "Z" ) [ "i" ]
305```
306<!--
307TODO: consider 0o766 notation for octal.
308--->
309
310```
31142
3121.5Gi
3130600
3140xBad_Face
315170_141_183_460_469_231_731_687_303_715_884_105_727
316```
317
318### Decimal floating-point literals
319
320A decimal floating-point literal is a representation of
321a decimal floating-point value.
322We will refer to those as floats.
323It has an integer part, a decimal point, a fractional part, and an
324exponent part.
325The integer and fractional part comprise decimal digits; the
326exponent part is an `e` or `E` followed by an optionally signed decimal exponent.
327One of the integer part or the fractional part may be elided; one of the decimal
328point or the exponent may be elided.
329
330```
331decimal_lit = decimals "." [ decimals ] [ exponent ] |
332 decimals exponent |
333 "." decimals [ exponent ] .
334exponent = ( "e" | "E" ) [ "+" | "-" ] decimals .
335```
336
337```
3380.
33972.40
340072.40 // == 72.40
3412.71828
3421.e+0
3436.67428e-11
3441E6
345.25
346.12345E+5
347```
348
349
350### String literals
351A string literal represents a string constant obtained from concatenating a
352sequence of characters. There are three forms: raw string literals and
353interpreted strings and bytes sequences.
354
355Raw string literals are character sequences between back quotes, as in
356```
357`foo`
358```
359Within the quotes, any character may appear except back quote. The value of a
360raw string literal is the string composed of the uninterpreted (implicitly
361UTF-8-encoded) characters between the quotes; in particular, backslashes have no
362special meaning and the string may contain newlines. Carriage return characters
363(`\r`) inside raw string literals are discarded from the raw string value.
364
365Interpreted string and byte sequence literals are character sequences between,
366respectively, double and single quotes, as in `"bar"` and `'bar'`.
367Within the quotes, any character may appear except newline and,
368respectively, unescaped double or single quote.
369String literals may only be valid UTF-8.
370Byte sequences may contain any sequence of bytes.
371
372Several backslash escapes allow arbitrary values to be encoded as ASCII text
373in interpreted strings.
374There are four ways to represent the integer value as a numeric constant: `\x`
375followed by exactly two hexadecimal digits; \u followed by exactly four
376hexadecimal digits; `\U` followed by exactly eight hexadecimal digits, and a
377plain backslash `\` followed by exactly three octal digits.
378In each case the value of the literal is the value represented by the
379digits in the corresponding base.
380Hexadecimal and octal escapes are only allowed within byte sequences
381(single quotes).
382
383Although these representations all result in an integer, they have different
384valid ranges.
385Octal escapes must represent a value between 0 and 255 inclusive.
386Hexadecimal escapes satisfy this condition by construction.
387The escapes `\u` and `\U` represent Unicode code points so within them
388some values are illegal, in particular those above `0x10FFFF`.
389Surrogate halves are allowed to be compatible with JSON,
390but are translated into their non-surrogate equivalent internally.
391
392The three-digit octal (`\nnn`) and two-digit hexadecimal (`\xnn`) escapes
393represent individual bytes of the resulting string; all other escapes represent
394the (possibly multi-byte) UTF-8 encoding of individual characters.
395Thus inside a string literal `\377` and `\xFF` represent a single byte of
396value `0xFF=255`, while `ÿ`, `\u00FF`, `\U000000FF` and `\xc3\xbf` represent
397the two bytes `0xc3 0xbf` of the UTF-8
398encoding of character `U+00FF`.
399
400After a backslash, certain single-character escapes represent special values:
401
402```
403\a U+0007 alert or bell
404\b U+0008 backspace
405\f U+000C form feed
406\n U+000A line feed or newline
407\r U+000D carriage return
408\t U+0009 horizontal tab
409\v U+000b vertical tab
410\\ U+005c backslash
411\' U+0027 single quote (valid escape only within single quoted literals)
412\" U+0022 double quote (valid escape only within double quoted literals)
413```
414
415The escape `\(` is used as an escape for string interpolation.
416A `\(` must be followed by a valid CUE Expression, followed by a `)`.
417
418All other sequences starting with a backslash are illegal inside literals.
419
420```
421escaped_char = `\` ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | `\` | "'" | `"` ) .
422unicode_value = unicode_char | little_u_value | big_u_value | escaped_char .
423byte_value = octal_byte_value | hex_byte_value .
424octal_byte_value = `\` octal_digit octal_digit octal_digit .
425hex_byte_value = `\` "x" hex_digit hex_digit .
426little_u_value = `\` "u" hex_digit hex_digit hex_digit hex_digit .
427big_u_value = `\` "U" hex_digit hex_digit hex_digit hex_digit
428 hex_digit hex_digit hex_digit hex_digit .
429
430string_lit = raw_string_lit |
431 interpreted_string_lit |
432 interpreted_bytes_lit |
433 multiline_lit .
434interpolation = "\(" Expression ")" .
435raw_string_lit = "`" { unicode_char | newline } "`" .
436interpreted_string_lit = `"` { unicode_value | interpolation } `"` .
437interpreted_bytes_lit = `"` { unicode_value | interpolation | byte_value } `"` .
438```
439
440```
441`abc` // same as "abc"
442`\n
443\n` // same as "\\n\n\\n"
444'a\000\xab'
445'\007'
446'\377'
447'\xa' // illegal: too few hexadecimal digits
448"\n"
449"\"" // same as `"`
450'Hello, world!\n'
451"Hello, \( name )!"
452"日本語"
453"\u65e5本\U00008a9e"
454"\xff\u00FF"
455"\uD800" // illegal: surrogate half
456"\U00110000" // illegal: invalid Unicode code point
457```
458
459These examples all represent the same string:
460
461```
462"日本語" // UTF-8 input text
463'日本語' // UTF-8 input text as byte sequence
464`日本語` // UTF-8 input text as a raw literal
465"\u65e5\u672c\u8a9e" // the explicit Unicode code points
466"\U000065e5\U0000672c\U00008a9e" // the explicit Unicode code points
467"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e" // the explicit UTF-8 bytes
468```
469
470If the source code represents a character as two code points, such as a
471combining form involving an accent and a letter, the result will appear as two
472code points if placed in a string literal.
473
474Each of the interpreted string variants have a multiline equivalent.
475Multiline interpreted strings are like their single-line equivalent,
476but allow newline characters.
477Carriage return characters (`\r`) inside raw string literals are discarded from
478the raw string value.
479
480Multiline interpreted strings and byte sequences respectively start with
481a triple double quote (`"""`) or triple single quote (`'''`),
482immediately followed by a newline, which is discarded from the string contents.
483The string is closed by a matching triple quote, which must be by itself
484on a newline, preceded by optional whitespace.
485The whitespace before a closing triple quote must appear before any non-empty
486line after the opening quote and will be removed from each of these
487lines in the string literal.
488A closing triple quote may not appear in the string.
489To include it is suffices to escape one of the quotes.
490
491```
492multiline_lit = multiline_string_lit | multiline_bytes_lit .
493multiline_string_lit = `"""` newline
494 { unicode_char | interpolation | newline }
495 newline `"""` .
496multiline_bytes_lit = "'''" newline
497 { unicode_char | interpolation | newline | byte_value }
498 newline "'''" .
499```
500
501```
502"""
503 lily:
504 out of the water
505 out of itself
506
507 bass
508 picking bugs
509 off the moon
510 — Nick Virgilio, Selected Haiku, 1988
511 """
512```
513
514This represents the same string as:
515
516```
517"lily:\nout of the water\nout of itself\n\n" +
518"bass\npicking bugs\noff the moon\n" +
519" — Nick Virgilio, Selected Haiku, 1988"
520```
521
522<!-- TODO: other values
523
524Support for other values:
525- Duration literals
526- regualr expessions: `re("[a-z]")`
527-->
528
529## Prototypes
530
531CUE defines basic types and structs. The _basic types_ are null, bool,
532int, float, string, and bytes.
533A _struct_ is a map from a label to a value, where the value may be of any
534type.
535Lists, provided by CUE as a convenience, are special cases of structs and
536are not included in the definition of the type system.
537
538In CUE, all possible types and values are partially ordered in a lattice.
539CUE does not distinguish between types and values, only between
540concrete and partially defined instances of a certain type.
541
542For example `string` is the identifier used to set of all possible strings.
543The string `"hello"` is an instance of such a string and ordered below
544this string. The value `42` is not an instance of `string`.
545
546
547### Ordering and lattices
548
549All possible prototypes are ordered in a lattice,
550a partial order where every two elements have a single greatest lower bound.
551A value `a` is said to be _greater_ than `b` if `a` orders before `b` in this
552partial order.
553At the top of this lattice is the single ancestor of all values, called
554_top_, denoted `_` in CUE.
555At the bottom of this lattice is the value called _bottom_, denoted `_|_`.
556A bottom value usually indicates an error.
557
558We say that for any two prototypes `a` and `b` that `a` is an _instance_ of `b`,
559denoted `a ⊑ b`, if `b == a` or `b` is more general than `a`
560that is if `a` orders before `b` in the partial order
561(`⊑` is _not_ a CUE operator).
562We also say that `b` subsumes `a` in this case.
563
564
565An _atom_ is any value whose only instance is itself and bottom. Examples of
566atoms are `42`, `"hello"`, `true`, `null`.
567
568A _type_ is any value which is only an instance of itself or top.
569This includes `null`: the null value, `bool`: all possible boolean values,
570`int`: all integral numbers, `float`, `string`, `bytes`, and `{}`.
571
572A _concrete value_ is any atom or struct with fields of which the values
573are itself concrete values, recursively.
574A concrete value corresponds to a valid JSON value
575
576A _prototype_ is any concrete value, type, or any instance of a type
577that is not a concrete value.
578We will informally refer to a prototype as _value_.
579
580
581```
582false ⊑ bool
583true ⊑ bool
584true ⊑ true
5855 ⊑ int
586bool ⊑ _
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +0100587_|_ ⊑ _
588_|_ ⊑ _|_
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +0100589
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +0100590_ ⋢ _|_
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +0100591_ ⋢ bool
592int ⋢ bool
593bool ⋢ int
594false ⋢ true
595true ⋢ false
596int ⋢ 5
5975 ⋢ 6
598```
599
600
601### Unification
602
603The _unification_ of values `a` and `b`, denoted as `a & b` in CUE,
604is defined as the value `u` such that `u ⊑ a` and `u ⊑ b`,
605while for any other value `u'` for which `u' ⊑ a` and `u' ⊑ b`
606it holds that `u' ⊑ u`.
607The value `u` is also called the _greatest lower bound_ of `a` and `b`.
608The greatest lower bound is, given the nature of lattices, always unique.
609The unification of `a` with itself is always `a`.
610The unification of a value `a` and `b` where `a ⊑ b` is always `a`.
611
612Unification is commutative, transitive, and reflexive.
613As a consequence, order of evaluation is irrelevant, a property that is key
614to many of the constructs in the CUE language as well as the tooling layered
615on top of it.
616
617Syntactically, unification is a [binary expression].
618
619
620### Disjunction
621
622A _disjunction_ of two values `a` and `b`, denoted as `a | b` in CUE,
623is defined as the smallest value `d` such that `a ⊑ d` and `b ⊑ d`.
624The disjunction of `a` with itself is always `a`.
625The disjunction of a value `a` and `b` where `a ⊑ b` is always `b`.
626
627Syntactically, disjunction is a [binary expression].
628
629Implementations should report an error if for a disjunction `a | ... | b`,
630`b` is an instance of `a`, as `b` will be superfluous and can never
631be selected as a default.
632
633A value that evaluates to bottom is removed from the disjunction.
634A disjunction evaluates to bottom if all of its values evaluate to bottom.
635
636If a disjunction is used in any operation other than unification or another
637disjunction, the default value is chosen before operating on it.
638
639```
640Expression Result (without picking default)
641(int | string) & "foo" "foo"
642("a" | "b") & "c" _|_
643
644(3 | 5) + 2 5
645```
646
647If the values of a disjunction are unambiguous, its first value may be taken
648as a default value.
649The default value for a disjunction is selected when:
650
6511. passing it to an argument of a call or index value,
6521. using it in any unary or binary expression except for unification or disjunction,
6531. using it as the receiver of a call, index, slice, or selector expression, and
6541. a value is taken for a configuration.
655
656A value is unambiguous if a disjunction has never been unified with another
657disjunction, or if the first element is the result of unifying two first
658values of a disjunction.
659
660
661```
662Expression Default
663("tcp"|"udp") & ("tcp"|"udp") "tcp" // default chosen
664("tcp"|"udp") & ("udp"|"tcp") _|_ // no unique default
665
666("a"|"b") & ("b"|"a") & "a" "a" // single value after evaluation
667```
668
669
670### Bottom and errors
671Any evaluation error in CUE results in a bottom value, respresented by
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +0100672the token '_|_'.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +0100673Bottom is an instance of every other prototype.
674Any evaluation error is represented as bottom.
675
676Implementations may associate error strings with different instances of bottom;
677logically they remain all the same value.
678
679```
680Expr Result
681 1 & 2 _|_
682int & bool _|_
683_|_ | 1 1
684_|_ & 2 _|_
685_|_ & _|_ _|_
686```
687
688
689### Top
690Top is represented by the underscore character '_', lexically an identifier.
691Unifying any value `v` with top results `v` itself.
692
693```
694Expr Result
695_ & 5 5
696_ & _ _
697_ & _|_ _|_
698_ | _|_ _
699```
700
701
702### Null
703
704The _null value_ is represented with the pseudo keyword `null`.
705It has only one parent, top, and one child, bottom.
706It is unordered with respect to any other prototype.
707
708```
709null_lit = "null"
710```
711
712```
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +0100713null & 8 _|_
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +0100714```
715
716
717### Boolean values
718
719A _boolean type_ represents the set of Boolean truth values denoted by
720the pseudo keywords `true` and `false`.
721The predeclared boolean type is `bool`; it is a defined type and a separate
722element in the lattice.
723
724```
725boolean_lit = "true" | "false"
726```
727
728```
729bool & true true
730bool | true true
731true | false true | false
732true & true true
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +0100733true & false _|_
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +0100734```
735
736
737### Numeric values
738
739An _integer type_ represents the set of all integral numbers.
740A _decimal floating-point type_ represents of all decimal floating-point
741numbers.
742They are two distinct types.
743The predeclared integer and decimal floating-point type are `int` and `float`;
744they are a defined type.
745
746A decimal floating-point literal always has type `float`;
747it is not an instance of `int` even if it is an integral number.
748
749An integer literal has both type `int` and `float`, with the integer variant
750being the default if no other constraints are applied.
751Expressed in terms of disjunction and [type conversion],
752the literal `1`, for instance, is defined as `int(1) | float(1)`.
753
754Numeric values are exact values of arbitrary precision and do not overflow.
755Consequently, there are no constants denoting the IEEE-754 negative zero,
756infinity, and not-a-number values.
757
758Implementation restriction: Although numeric values have arbitrary precision
759in the language, implementations may implement them using an internal
760representation with limited precision. That said, every implementation must:
761
762Represent integer values with at least 256 bits.
763Represent floating-point values, with a mantissa of at least 256 bits and
764a signed binary exponent of at least 16 bits.
765Give an error if unable to represent an integer value precisely.
766Give an error if unable to represent a floating-point value due to overflow.
767Round to the nearest representable value if unable to represent
768a floating-point value due to limits on precision.
769These requirements apply to the result of any expression except builtin
770expressions where the loss of precision is remarked.
771
772
773### Strings
774
775The _string type_ represents the set of all possible UTF-8 strings,
776not allowing surrogates.
777The predeclared string type is `string`; it is a defined type.
778
779Strings are designed to be unicode-safe.
780Comparisson is done using canonical forms ("é" == "e\u0301").
781A string element is an extended grapheme cluster, which is an approximation
782of a human readable character.
783The length of a string is its number of extended grapheme clusters, and can
784be discovered using the built-in function `len`.
785
786The length of a string `s` (its size in bytes) can be discovered using
787the built-in function len.
788A string's extended grapheme cluster can be accessed by integer index
7890 through len(s)-1 for any byte that is part of that grapheme cluster.
790To access the individual bytes of a string one should convert it to
791a sequence of bytes first.
792
793
794### Ranges
795
796A _range type_, syntactically a [binary expression], defines
797a disjunction of concrete values that can be represented as a contiguous range.
798Ranges can be defined on numbers and strings.
799
800The type of range `a..b` is the unification of the type of `a` and `b`.
801Note that this may be more than one type.
802
803A range of numbers `a..b` defines an inclusive range for integers and
804floating-point numbers.
805
806Remember that an integer literal represents both an `int` and `float`:
807```
8082 & 1..5 // 2, where 2 is either an int or float.
8092.5 & 1..5 // 2.5
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +01008102.5 & int & 1..5 // _|_
8112.5 & (int & 1)..5 // _|_
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01008122.5 & float & 1..5 // 2.5
8130..7 & 3..10 // 3..7
814```
815
816A range of strings `a..b` defines a set of strings that includes any `s`
817for which `NFC(a) <= NFC(s)` and `NFC(s) <= NFC(b)` in a byte-wise comparison,
818where `NFC` is the respective Unicode normal form.
819
820
821### Structs
822
823A _struct_ is a set of named elements, called _fields_, each of
824which has a name, called a _label_, and value.
825Structs and fields respectively correspond to JSON objects and members.
826
827We say a label is defined for a struct if the struct has a field with the
828corresponding label.
829We denote the value for a label `f` defined for `a` as `δ(f, a)`.
830
831A struct `a` is an instance of `b`, or `a ⊑ b`, if for any label `f`
832defined for `b` label `f` is also defiend for `a` and `δ(f, a) ⊑ δ(f, b)`.
833Note that if `a` is an instance of `b` it may have fields with labels that
834are not defined for `b`.
835
836The unification of structs `a` and `b` is defined as a new struct `c` which
837has all fields of `a` and `b` where the value is either the unification of the
838respective values where a field is contained in both or the original value
839for the respective fields of `a` or `b` otherwise.
840Any [references] to `a` or `b` in their respective field values need to be
841replaced with references to `c`.
842
843Syntactically, a struct literal may contain multiple fields with
844the same label, the result of which is a single field with a value
845that is the result of unifying the values of those fields.
846
847```
848StructLit = "{" [ { Declaration "," } Declaration ] "}" .
849Declaration = FieldDecl | AliasDecl .
850FieldDecl = Label { Label } ":" Expression .
851
852AliasDecl = Label "=" Expression .
853Label = identifier | json_string | TemplateLabel | ExprLabel.
854TemplateLabel = "<" identifier ">" .
855ExprLabel = "[" Expression "]" .
856Tag = "#" identifier [ ":" json_string ] .
857json_string = `"` { unicode_value } `"`
858```
859
860<!--
861TODO: consider using string interpolations for ExprLabel, instead of []
862So "\(k)" for [k]
863--->
864
865
866```
867{a: 1} ⊑ {}
868{a: 1, b: 1} ⊑ {a: 1}
869{a: 1} ⊑ {a: int}
870{a: 1, b: 1} ⊑ {a: int, b: float}
871
872{} ⋢ {a: 1}
873{a: 2} ⋢ {a: 1}
874{a: 1} ⋢ {b: 1}
875```
876
877```
878Expression Result
879{a: int, a: 1} {a: int(1)}
880{a: int} & {a: 1} {a: 1}
881{a: 1..7} & {a: 5..9} {a: 5..7}
882{a: 1..7, a: 5..9} {a: 5..7}
883
884{a: 1} & {b: 2} {a: 1, b: 2}
885{a: 1, b: int} & {b: 2} {a: 1, b: 2}
886
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +0100887{a: 1} & {a: 2} _|_
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +0100888```
889
890A struct literal may, besides fields, also define aliases.
891Aliases declare values that can be referred to within the [scope] of their
892definition, but are not part of the struct: aliases are irrelevant to
893the partial ordering of values and are not emitted as part of any
894generated data.
895The name of an alias must be unique within the struct literal.
896
897```
898// An empty object.
899{}
900
901// An object with 3 fields and 1 alias.
902{
903 alias = 3
904
905 foo: 2
906 bar: "a string"
907
908 "not an ident": 4
909}
910```
911
912A field with as value a struct with a single field may be written as
913a sequence of the two field names,
914followed by a colon and the value of that single field.
915
916```
917job myTask: {
918 replicas: 2
919}
920```
921expands to the following JSON:
922```
923"job": {
924 "myTask": {
925 "replicas": 2
926 }
927}
928```
929
930
931A field declaration may be followed by an optional field tag,
932which becomes a key value pair for all equivalent fields in structs with which
933it is unified.
934If two structs are unified which both define a field for a label and both
935fields have a tag for that field with the same key,
936implementations may require that their value match.
937The tags are made visible through CUE's API and are
938not visible within the language itself.
939
940
941### Lists
942
943A list literal defines a new prototype of type list.
944A list may be open or closed.
945An open list is indicated with a `...` at the end of an element list,
946optionally followed by a prototype for the remaining elements.
947
948The length of a closed list is the number of elements it contains.
949The length of an open list is the its number of elements as a lower bound
950and an unlimited number of elements as its upper bound.
951
952```
953ListLit = "[" [ ElementList [ "," [ "..." [ Element ] ] ] "]" .
954ElementList = Element { "," Element } .
955Element = Expression | LiteralValue .
956```
957<!---
958KeyedElement = Element .
959--->
960
961A list can be represented as a struct:
962
963```
964List: null | {
965 Elem: _
966 Tail: List
967}
968```
969
970For closed lists, `Tail` is `null` for the last element, for open lists it is
971`null | List`.
972For instance, the closed list [ 1, 2, ... ] can be represented as:
973```
974open: List & { Elem: 1, Tail: { Elem: 2 } }
975```
976and the closed version of this list, [ 1, 2 ], as
977```
978closed: List & { Elem: 1, Tail: { Elem: 2, Tail: null } }
979```
980
981Using this definition, the subsumption and unification rules for lists can
982be derived from those of structs.
983Implementations are not required to implement lists as structs.
984
985
986## Declarations and Scopes
987
988
989### Blocks
990
991A _block_ is a possibly empty sequence of declarations.
992A block is mostly corresponds with the brace brackets of a struct literal
993`{ ... }`, but also includes the following,
994
995- The _universe block_ encompases all CUE source text.
996- Each package has a _package block_ containing all CUE source text in that package.
997- Each file has a _file block_ containing all CUE source text in that file.
998- Each `for` and `let` clause in a comprehension is considered to be
999 its own implicit block.
1000- Each function value is considered to be its own implicit block.
1001
1002Blocks nest and influence [scoping].
1003
1004
1005### Declarations and scope
1006
1007A _declaration_ binds an identifier to a field, alias, or package.
1008Every identifier in a program must be declared.
1009Other than for fields,
1010no identifier may be declared twice within the same block.
1011For fields an identifier may be declared more than once within the same block,
1012resulting in a field with a value that is the result of unifying the values
1013of all fields with the same identifier.
1014
1015```
1016TopLevelDecl = Declaration | Emit .
1017Emit = Operand .
1018```
1019
1020The _scope_ of a declared identifier is the extent of source text in which the
1021identifier denotes the specified field, alias, function, or package.
1022
1023CUE is lexically scoped using blocks:
1024
10251. The scope of a [predeclared identifier] is the universe block.
10261. The scope of an identifier denoting a field or alias
1027 declared at top level (outside any struct literal) is the file block.
10281. The scope of the package name of an imported package is the file block of the
1029 file containing the import declaration.
10301. The scope of a field or alias identifier declared inside a struct literal
1031 is the innermost containing block.
1032
1033An identifier declared in a block may be redeclared in an inner block.
1034While the identifier of the inner declaration is in scope, it denotes the entity
1035declared by the inner declaration.
1036
1037The package clause is not a declaration;
1038the package name do not appear in any scope.
1039Its purpose is to identify the files belonging to the same package
1040and tospecify the default name for import declarations.
1041
1042
1043### Predeclared identifiers
1044
1045```
1046Functions
1047len required close open
1048
1049Types
1050null The null type and value
1051bool All boolean values
1052int All integral numbers
1053float All decimal floating-point numbers
1054string Any valid UTF-8 sequence
1055bytes A blob of bytes representing arbitrary data
1056
1057Derived Value
1058number int | float
1059uint 0..int
1060uint8 0..255
1061byte uint8
1062int8 -128..127
1063uint16 0..65536
1064int16 -32_768...32_767
1065rune 0..0x10FFFF
1066uint32 0..4_294_967_296
1067int32 -2_147_483_648..2_147_483_647
1068uint64 0..18_446_744_073_709_551_615
1069int64 -9_223_372_036_854_775_808..9_223_372_036_854_775_807
1070uint128 340_282_366_920_938_463_463_374_607_431_768_211_455
1071
1072 "int128": mkIntRange(
1073 "-170141183460469231731687303715884105728",
1074 "170141183460469231731687303715884105727"),
1075
1076 // Do not include an alias for "byte", as it would be too easily confused
1077 // with the builtin "bytes".
1078 "uint8": mkIntRange("0", "255"),
1079 "uint16": mkIntRange("0", "65535"),
1080 "uint32": mkIntRange("0", "4294967295"),
1081 "uint64": mkIntRange("0", "18446744073709551615"),
1082}
1083```
1084
1085
1086### Exported and manifested identifiers
1087
1088An identifier of a package may be exported to permit access to it
1089from another package.
1090An identifier is exported if both:
1091the first character of the identifier's name is not a Unicode lower case letter
1092(Unicode class "Ll") or the underscore "_"; and
1093the identifier is declared in the file block.
1094All other identifiers are not exported.
1095
1096An identifier that starts with the underscore "_" is not
1097emitted in any data output.
1098Quoted labels that start with an underscore are emitted nonetheless.
1099
1100### Uniqueness of identifiers
1101
1102Given a set of identifiers, an identifier is called unique if it is different
1103from every other in the set, after applying normalization following
1104Unicode Annex #31.
1105Two identifiers are different if they are spelled differently.
1106<!--
1107or if they appear in different packages and are not exported.
1108--->
1109Otherwise, they are the same.
1110
1111
1112### Field declarations
1113
1114A field declaration binds a label (the names of the field) to an expression.
1115The name for a quoted string used as label is the string it represents.
1116Tne name for an identifier used as a label is the identifier itself.
1117Quoted strings and identifiers can be used used interchangeably, with the
1118exception of identifiers starting with an underscore '_'.
1119The latter represent hidden fields and are treated in a different namespace.
1120
1121
1122### Alias declarations
1123
1124An alias declaration binds an identifier to the given expression.
1125
1126Within the scope of the identifier, it serves as an _alias_ for that
1127expression.
1128The expression is evaluated in the scope as it was declared.
1129
1130
1131### Function declarations
1132
1133NOTE: this is an internal construction.
1134
1135A function declaration binds an identifier, the function name, to a function.
1136
1137```
1138FunctionDecl = FunctionName Parameters "->" FunctionValue .
1139FunctionName = identifier .
1140FunctionValue = Expression .
1141Result = Parameters .
1142Parameters = "(" [ ParameterList [ "," ] ] ")" .
1143ParameterList = ParameterDecl { "," ParameterDecl } .
1144ParameterDecl = identifier [ ":" Type ] .
1145```
1146
1147
1148## Expressions
1149
1150An expression specifies the computation of a value by applying operators and
1151functions to operands.
1152
1153
1154### Operands
1155
1156Operands denote the elementary values in an expression.
1157An operand may be a literal, a (possibly [qualified]) identifier denoting
1158field, alias, function, or a parenthesized expression.
1159
1160```
1161Operand = Literal | OperandName | ListComprehension | "(" Expression ")" .
1162Literal = BasicLit | ListLit | StructLit .
1163BasicLit = int_lit | float_lit | string_lit |
1164 null_lit | bool_lit | bottom_lit | top_lit .
1165OperandName = identifier | QualifiedIdent.
1166```
1167
1168### Qualified identifiers
1169
1170A qualified identifier is an identifier qualified with a package name prefix.
1171
1172```
1173QualifiedIdent = PackageName "." identifier .
1174```
1175
1176A qualified identifier accesses an identifier in a different package,
1177which must be [imported].
1178The identifier must be declared in the [package block] of that package.
1179
1180```
1181math.Sin // denotes the Sin function in package math
1182```
1183
1184
1185### Primary expressions
1186
1187Primary expressions are the operands for unary and binary expressions.
1188
1189```
1190PrimaryExpr =
1191 Operand |
1192 Conversion |
1193 PrimaryExpr Selector |
1194 PrimaryExpr Index |
1195 PrimaryExpr Slice |
1196 PrimaryExpr Arguments .
1197
1198Selector = "." identifier .
1199Index = "[" Expression "]" .
1200Slice = "[" [ Expression ] ":" [ Expression ] "]"
1201Argument = Expression .
1202Arguments = "(" [ ( Argument { "," Argument } ) [ "..." ] [ "," ] ] ")" .
1203```
1204<!---
1205Argument = Expression | ( identifer ":" Expression ).
1206--->
1207
1208```
1209x
12102
1211(s + ".txt")
1212f(3.1415, true)
1213m["foo"]
1214s[i : j + 1]
1215obj.color
1216f.p[i].x
1217```
1218
1219
1220### Selectors
1221
1222For a [primary expression] `x` that is not a [package name],
1223the selector expression
1224
1225```
1226x.f
1227```
1228
1229denotes the field `f` of the value `x`.
1230The identifier `f` is called the field selector.
1231The type of the selector expression is the type of `f`.
1232If `x` is a package name, see the section on [qualified identifiers].
1233
1234Otherwise, if `x` is not a struct, or if `f` does not exist in `x`,
1235the result of the expression is bottom (an error).
1236
1237```
1238T: {
1239 x: int
1240 y: 3
1241}
1242
1243a: T.x // int
1244b: T.y // 3
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +01001245c: T.z // _|_ // field 'z' not found in T
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001246```
1247
1248
1249### Index expressions
1250
1251A primary expression of the form
1252
1253```
1254a[x]
1255```
1256
1257denotes the element of the list, string, or struct `a` indexed by `x`.
1258The value `x` is called the index or field name, respectively.
1259The following rules apply:
1260
1261If `a` is not a struct:
1262
1263- the index `x` must be of integer type
1264- the index `x` is in range if `0 <= x < len(a)`, otherwise it is out of range
1265
1266The result of `a[x]` is
1267
1268for `a` of list type (including single quoted strings, which are lists of bytes):
1269
1270- the list element at index `x`, if `x` is within range
1271- bottom (an error), otherwise
1272
1273for `a` of string type:
1274
1275- the grapheme cluster at the `x`th byte (type string), if `x` is within range
1276- bottom (an error), otherwise
1277
1278for `a` of struct type:
1279
1280- the value of the field named `x` of struct `a`, if this field exists
1281- bottom (an error), otherwise
1282
1283```
1284[ 1, 2 ][1] // 2
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +01001285[ 1, 2 ][2] // _|_
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001286"He\u0300?"[0] // "H"
1287"He\u0300?"[1] // "e\u0300"
1288"He\u0300?"[2] // "e\u0300"
1289"He\u0300?"[3] // "e\u0300"
1290"He\u0300?"[4] // "?"
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +01001291"He\u0300?"[5] // _|_
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001292```
1293
1294
1295### Slice expressions
1296
1297Slice expressions construct a substring or slice from a string or list.
1298
1299For strings or lists, the primary expression
1300```
1301a[low : high]
1302```
1303constructs a substring or slice. The indices `low` and `high` select
1304which elements of operand a appear in the result. The result has indices
1305starting at 0 and length equal to `high` - `low`.
1306After slicing the list `a`
1307
1308```
1309a := [1, 2, 3, 4, 5]
1310s := a[1:4]
1311```
1312the list s has length 3 and elements
1313```
1314s[0] == 2
1315s[1] == 3
1316s[2] == 4
1317```
1318For convenience, any of the indices may be omitted.
1319A missing `low` index defaults to zero; a missing `high` index defaults
1320to the length of the sliced operand:
1321```
1322a[2:] // same as a[2 : len(a)]
1323a[:3] // same as a[0 : 3]
1324a[:] // same as a[0 : len(a)]
1325```
1326
1327Indices are in range if `0 <= low <= high <= len(a)`,
1328otherwise they are out of range.
1329For strings, the indices selects the start of the extended grapheme cluster
1330at byte position indicated by the index.
1331If any of the slice values is out of range or if `low > high`, the result of
1332a slice is bottom (error).
1333
1334```
1335"He\u0300?"[:2] // "He\u0300"
1336"He\u0300?"[1:2] // "e\u0300"
1337"He\u0300?"[4:5] // "e\u0300?"
1338```
1339
1340
1341The result of a successful slice operation is a value of the same type
1342as the operand.
1343
1344
1345### Operators
1346
1347Operators combine operands into expressions.
1348
1349```
1350Expression = UnaryExpr | Expression binary_op Expression .
1351UnaryExpr = PrimaryExpr | unary_op UnaryExpr .
1352
1353binary_op = "|" | "&" | "||" | "&&" | rel_op | add_op | mul_op | ".." .
1354rel_op = "==" | "!=" | "<" | "<=" | ">" | ">=" .
1355add_op = "+" | "-" .
1356mul_op = "*" | "/" | "div" | "mod" | "quo" | "rem" .
1357
1358unary_op = "+" | "-" | "!" .
1359```
1360
1361Comparisons are discussed [elsewhere]. For other binary operators, the operand
1362types must be [identical] unless the operation involves untyped [constants]
1363or durations. For operations involving constants only, see the section on
1364[constant expressions].
1365
1366Except for duration operations, if one operand is an untyped [literal] and the
1367other operand is not, the constant is [converted] to the type of the other
1368operand.
1369
1370
1371#### Operator precedence
1372
1373Unary operators have the highest precedence.
1374
1375There are eight precedence levels for binary operators.
1376The `..` operator (range) binds strongest, followed by
1377multiplication operators, addition operators, comparison operators,
1378`&&` (logical AND), `||` (logical OR), `&` (unification),
1379and finally `|` (disjunction):
1380
1381```
1382Precedence Operator
1383 8 ..
1384 7 * / div mod quo rem
1385 6 + -
1386 5 == != < <= > >=
1387 4 &&
1388 3 ||
1389 2 &
1390 1 |
1391```
1392
1393Binary operators of the same precedence associate from left to right.
1394For instance, `x / y * z` is the same as `(x / y) * z`.
1395
1396```
1397+x
139823 + 3*x[i]
1399x <= f()
1400f() || g()
1401x == y+1 && y == z-1
14022 | int
1403{ a: 1 } & { b: 2 }
1404```
1405
1406#### Arithmetic operators
1407
1408Arithmetic operators apply to numeric values and yield a result of the same type
1409as the first operand. The three of the four standard arithmetic operators
1410`(+, -, *)` apply to integer and decimal floating-point types;
1411`+` and `*` also applies to lists and strings.
1412`/` only applies to decimal floating-point types and
1413`div`, `mod`, `quo`, and `rem` only apply to integer types.
1414
1415```
1416+ sum integers, floats, lists, strings
1417- difference integers, floats
1418* product integers, floats, lists, strings
1419/ quotient floats
1420div division integers
1421mod modulo integers
1422quo quotient integers
1423rem remainder integers
1424```
1425
1426#### Integer operators
1427
1428For two integer values `x` and `y`,
1429the integer quotient `q = x div y` and remainder `r = x mod y `
1430satisfy the following relationships:
1431
1432```
1433r = x - y*q with 0 <= r < |y|
1434```
1435where `|y|` denotes the absolute value of `y`.
1436
1437```
1438 x y x div y x mod y
1439 5 3 1 2
1440-5 3 -2 1
1441 5 -3 -1 2
1442-5 -3 2 1
1443```
1444
1445For two integer values `x` and `y`,
1446the integer quotient `q = x quo y` and remainder `r = x rem y `
1447satisfy the following relationships:
1448
1449```
1450x = q*y + r and |r| < |y|
1451```
1452
1453with `x quo y` truncated towards zero.
1454
1455```
1456 x y x quo y x rem y
1457 5 3 1 2
1458-5 3 -1 -2
1459 5 -3 -1 2
1460-5 -3 1 -2
1461```
1462
1463A zero divisor in either case results in bottom (an error).
1464
1465For integer operands, the unary operators `+` and `-` are defined as follows:
1466
1467```
1468+x is 0 + x
1469-x negation is 0 - x
1470```
1471
1472
1473#### Decimal floating-point operators
1474
1475For decimal floating-point numbers, `+x` is the same as `x`,
1476while -x is the negation of x.
1477The result of a floating-point division by zero is bottom (an error).
1478
1479An implementation may combine multiple floating-point operations into a single
1480fused operation, possibly across statements, and produce a result that differs
1481from the value obtained by executing and rounding the instructions individually.
1482
1483
1484#### List operators
1485
1486Lists can be concatenated using the `+` operator.
1487For, list `a` and `b`
1488```
1489a + b
1490```
1491will produce an open list if `b` is open.
1492If list `a` is open, only the existing elements will be involved in the
1493concatenation.
1494
1495```
1496[ 1, 2 ] + [ 3, 4 ] // [ 1, 2, 3, 4 ]
1497[ 1, 2, ... ] + [ 3, 4 ] // [ 1, 2, 3, 4 ]
1498[ 1, 2 ] + [ 3, 4, ... ] // [ 1, 2, 3, 4, ... ]
1499```
1500
1501Lists can be multiplied using the `*` operator.
1502```
15033*[1,2] // [1, 2, 1, 2, 1, 2]
1504
1505[1, 2, ...int] // open list of two elements with element type int
15064*[byte] // [byte, byte, byte, byte]
1507[...byte] // byte list or arbitrary length
1508(0..5)*[byte] // byte list of size 0 through 5
1509
1510// list with alternating elements of type string and int
1511uint*[string, int]
1512```
1513
1514The following illustrate how typed lists can be encoded as structs:
1515```
1516ip: 4*[byte]
1517ipT: {
1518 Elem: byte
1519 Tail: {
1520 Elem: byte
1521 Tail: {
1522 Elem: byte
1523 Tail: {
1524 Elem: byte
1525 Tail: null
1526 }
1527 }
1528 }
1529}
1530
1531rangeList: (1..2)*[int]
1532rangeListT: null | {
1533 Elem: int
1534 Tail: {
1535 Elem: int
1536 Tail: null | {
1537 Elem: int
1538 Tail: null
1539 }
1540 }
1541}
1542
1543strIntList: uint*[string, int]
1544strIntListT: null | {
1545 Elem: string
1546 Tail: {
1547 Elem: int
1548 Tail: strIntListT
1549 }
1550}
1551```
1552
1553#### String operators
1554
1555Strings can be concatenated using the `+` operator:
1556```
1557s := "hi " + name + " and good bye"
1558```
1559String addition creates a new string by concatenating the operands.
1560
1561A string can be repeated by multiplying it:
1562
1563```
1564s: "etc. "*3 // "etc. etc. etc. "
1565```
1566
1567##### Comparison operators
1568
1569Comparison operators compare two operands and yield an untyped boolean value.
1570
1571```
1572== equal
1573!= not equal
1574< less
1575<= less or equal
1576> greater
1577>= greater or equal
1578```
1579
1580In any comparison, the types of the two operands must unify.
1581
1582The equality operators `==` and `!=` apply to operands that are comparable.
1583The ordering operators `<`, `<=`, `>`, and `>=` apply to operands that are ordered.
1584These terms and the result of the comparisons are defined as follows:
1585
1586- Boolean values are comparable.
1587 Two boolean values are equal if they are either both true or both false.
1588- Integer values are comparable and ordered, in the usual way.
1589- Floating-point values are comparable and ordered, as per the definitions
1590 for binary coded decimals in the IEEE-754-2008 standard.
1591- String values are comparable and ordered, lexically byte-wise.
1592- Struct are not comparable.
1593 Two struct values are equal if their corresponding non-blank fields are equal.
1594- List are not comparable.
1595 Two array values are equal if their corresponding elements are equal.
1596```
1597c: 3 < 4
1598
1599x: int
1600y: int
1601
1602b3: x == y // b3 has type bool
1603```
1604
1605#### Logical operators
1606
1607Logical operators apply to boolean values and yield a result of the same type
1608as the operands. The right operand is evaluated conditionally.
1609
1610```
1611&& conditional AND p && q is "if p then q else false"
1612|| conditional OR p || q is "if p then true else q"
1613! NOT !p is "not p"
1614```
1615
1616
1617<!--
1618### TODO TODO TODO
1619
16203.14 / 0.0 // illegal: division by zero
1621Illegal conversions always apply to CUE.
1622
1623Implementation restriction: A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa.
1624-->
1625
1626### Conversions
1627Conversions are expressions of the form `T(x)` where `T` and `x` are
1628expressions.
1629The result is always an instance of `T`.
1630
1631```
1632Conversion = Expression "(" Expression [ "," ] ")" .
1633```
1634
1635<!---
1636
1637A literal value `x` can be converted to type T if `x` is representable by a
1638value of `T`.
1639
1640As a special case, an integer literal `x` can be converted to a string type
1641using the same rule as for non-constant x.
1642
1643Converting a literal yields a typed value as result.
1644
1645```
1646uint(iota) // iota value of type uint
1647float32(2.718281828) // 2.718281828 of type float32
1648complex128(1) // 1.0 + 0.0i of type complex128
1649float32(0.49999999) // 0.5 of type float32
1650float64(-1e-1000) // 0.0 of type float64
1651string('x') // "x" of type string
1652string(0x266c) // "♬" of type string
1653MyString("foo" + "bar") // "foobar" of type MyString
1654string([]byte{'a'}) // not a constant: []byte{'a'} is not a constant
1655(*int)(nil) // not a constant: nil is not a constant, *int is not a boolean, numeric, or string type
1656int(1.2) // illegal: 1.2 cannot be represented as an int
1657string(65.0) // illegal: 65.0 is not an integer constant
1658```
1659--->
1660
1661A conversion is always allowed if `x` is of the same type as `T` and `x`
1662is an instance of `T`.
1663
1664If `T` and `x` of different underlying type, a conversion if
1665`x` can be converted to a value `x'` of `T`'s type, and
1666`x'` is an instance of `T`.
1667A value `x` can be converted to the type of `T` in any of these cases:
1668
1669- `x` is of type struct and is subsumed by `T` ignoring struct tags.
1670- `x` and `T` are both integer or floating point types.
1671- `x` is an integer or a list of bytes or runes and `T` is a string type.
1672- `x` is a string and `T` is a list of bytes or runes.
1673
1674
1675[Field tags] are ignored when comparing struct types for identity
1676for the purpose of conversion:
1677
1678```
1679person: {
1680 name: string #xml:"Name"
1681 address: null | {
1682 street: string #xml:"Street"
1683 city: string #xml:"City"
1684 } #xml:"Address"
1685}
1686
1687person2: {
1688 name: string
1689 address: null | {
1690 street: string
1691 city: string
1692 }
1693}
1694
1695p2 = person(person2) // ignoring tags, the underlying types are identical
1696```
1697
1698Specific rules apply to conversions between numeric types, structs,
1699or to and from a string type. These conversions may change the representation
1700of `x`.
1701All other conversions only change the type but not the representation of x.
1702
1703
1704#### Conversions between numeric ranges
1705For the conversion of numeric values, the following rules apply:
1706
17071. Any integer prototype can be converted into any other integer prototype
1708 provided that it is within range.
17092. When converting a decimal floating-point number to an integer, the fraction
1710 is discarded (truncation towards zero). TODO: or disallow truncating?
1711
1712```
1713a: uint16(int(1000)) // uint16(1000)
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +01001714b: uint8(1000) // _|_ // overflow
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001715c: int(2.5) // 2 TODO: TBD
1716```
1717
1718
1719
1720#### Conversions to and from a string type
1721
1722Converting a signed or unsigned integer value to a string type yields a string
1723containing the UTF-8 representation of the integer. Values outside the range of
1724valid Unicode code points are converted to `"\uFFFD"`.
1725
1726```
1727string('a') // "a"
1728string(-1) // "\ufffd" == "\xef\xbf\xbd"
1729string(0xf8) // "\u00f8" == "ø" == "\xc3\xb8"
1730
1731MyString(0x65e5) // "\u65e5" == "日" == "\xe6\x97\xa5"
1732```
1733
1734Converting a list of bytes to a string type yields a string whose successive
1735bytes are the elements of the slice.
1736Invalid UTF-8 is converted to `"\uFFFD"`.
1737
1738```
1739string('hell\xc3\xb8') // "hellø"
1740string(bytes([0x20])) // " "
1741```
1742
1743As string value is always convertible to a list of bytes.
1744
1745```
1746bytes("hellø") // 'hell\xc3\xb8'
1747bytes("") // ''
1748```
1749
1750<!---
1751#### Conversions between list types
1752
1753Conversions between list types are possible only if `T` strictly subsumes `x`
1754and the result will be the unification of `T` and `x`.
1755
1756<!---
1757If we introduce named types this would be different from IP & [10, ...]
1758
1759Consider removing this until it has a different meaning.
1760
1761```
1762IP: 4*[byte]
1763Private10: IP([10, ...]) // [10, byte, byte, byte]
1764```
1765--->
1766
1767#### Convesions between struct types
1768
1769A conversion from `x` to `T`
1770is applied using the following rules:
1771
17721. `x` must be an instance of `T`,
17732. all fields defined for `x` that are not defined for `T` are removed from
1774 the result of the conversion, recursively.
1775
1776```
1777T: {
1778 a: { b: 1..10 }
1779}
1780
1781x1: {
1782 a: { b: 8, c: 10 }
1783 d: 9
1784}
1785
1786c1: T(x1) // { a: { b: 8 } }
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +01001787c2: T({}) // _|_ // missing field 'a' in '{}'
1788c3: T({ a: {b: 0} }) // _|_ // field a.b does not unify (0 & 1..10)
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001789```
1790
1791
1792### Calls
1793
1794Given an expression `f` of function type F,
1795```
1796f(a1, a2, … an)
1797```
1798calls `f` with arguments a1, a2, … an. Arguments must be expressions
1799of which the values are an instance of the parameter types of `F`
1800and are evaluated before the function is called.
1801
1802```
1803a: math.Atan2(x, y)
1804```
1805<!---
1806
1807// FUNCTIONS ARE DISABLED:
1808Point: {
1809 Scale(a: float) -> Point({ Value: Point.Value * a })
1810 Value: float
1811}
1812pt: Point
1813pt: { Value: a }
1814
1815ptp: pt.Scale(3.5)
1816--->
1817
1818
1819In a function call, the function value and arguments are evaluated in the usual
1820order. After they are evaluated, the parameters of the call are passed by value
1821to the function and the called function begins execution. The return parameters
1822of the function are passed by value back to the calling function when the
1823function returns.
1824
1825<!---
1826TODO:
1827
1828Passing arguments to ... parameters
1829If f is variadic with a final parameter p of type ...T, then within f the type of p is equivalent to type []T. If f is invoked with no actual arguments for p, the value passed to p is nil. Otherwise, the value passed is a new slice of type []T with a new underlying array whose successive elements are the actual arguments, which all must be assignable to T. The length and capacity of the slice is therefore the number of arguments bound to p and may differ for each call site.
1830
1831Given the function and calls
1832
1833func Greeting(prefix string, who ...string)
1834Greeting("nobody")
1835Greeting("hello:", "Joe", "Anna", "Eileen")
1836within Greeting, who will have the value nil in the first call, and []string{"Joe", "Anna", "Eileen"} in the second.
1837
1838If the final argument is assignable to a slice type []T, it may be passed unchanged as the value for a ...T parameter if the argument is followed by .... In this case no new slice is created.
1839
1840Given the slice s and call
1841
1842s := []string{"James", "Jasmine"}
1843Greeting("goodbye:", s...)
1844within Greeting, who will have the same value as s with the same underlying array.
1845--->
1846
1847
1848### Comprehensions
1849
Marcel van Lohuizen66db9202018-12-17 19:02:08 +01001850Lists and fields can be constructed using comprehensions.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001851
1852Each define a clause sequence that consists of a sequence of `for`, `if`, and
1853`let` clauses, nesting from left to right.
1854The `for` and `let` clauses each define a new scope in which new values are
1855bound to be available for the next clause.
1856
1857The `for` clause binds the defined identifiers, on each iteration, to the next
1858value of some iterable value in a new scope.
1859A `for` clause may bind one or two identifiers.
1860If there is one identifier, it binds it to the value, for instance
1861a list element, a struct field value or a range element.
1862If there are more two identifies, the first value will be the key or index,
1863if available, and the second will be the value.
1864
1865An `if` clause, or guard, specifies an expression that terminates the current
1866iteration if it evaluates to false.
1867
1868The `let` clause binds the result of an expression to the defined identifier
1869in a new scope.
1870
1871A current iteration is said to complete if the inner-most block of the clause
1872sequence is reached.
1873
1874List comprehensions specify a single expression that is evaluated and included
1875in the list for each completed iteration.
1876
Marcel van Lohuizen66db9202018-12-17 19:02:08 +01001877Field comprehensions specify a field that is included in the struct for each
1878completed iteration.
1879If the same field is generated more than once, its values are unified.
1880The clauses of a field comprehension may not refer to fields generated by
1881field comprehensions defined in the same struct.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001882
1883```
1884ComprehensionDecl = Field [ "<-" ] Clauses .
1885ListComprehension = "[" Expression "<-" Clauses "]" .
1886
1887Clauses = Clause { Clause } .
1888Clause = ForClause | GuardClause | LetClause .
1889ForClause = "for" identifier [ ", " identifier] "in" Expression .
1890GuardClause = "if" Expression .
1891LetClause = "let" identifier "=" Expression .
1892```
1893
1894```
1895a: [1, 2, 3, 4]
1896b: [ x+1 for x in a if x > 1] // [3, 4, 5]
1897
Marcel van Lohuizen66db9202018-12-17 19:02:08 +01001898c: { "\(x)": x + y for x in a if x < 4 let y = 1 }
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001899d: { "1": 2, "2": 3, "3": 4 }
1900```
1901
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01001902
1903### String interpolation
1904
1905Strings interpolation allows constructing strings by replacing placeholder
1906expressions included in strings to be replaced with their string representation.
1907String interpolation may be used in single- and double-quoted strings, as well
1908as their multiline equivalent.
1909
1910A placeholder is demarked by a enclosing parentheses prefixed with
1911a backslash `\`.
1912Within the parentheses may be any expression to be
1913evaluated within the scope within which the string is defined.
1914
1915```
1916a: "World"
1917b: "Hello \( a )!" // Hello World!
1918```
1919
1920
1921## Builtin Functions
1922
1923Built-in functions are predeclared. They are called like any other function.
1924
1925The built-in functions cannot be used as function values.
1926
1927### `len`
1928
1929The built-in function `len` takes arguments of various types and return
1930a result of type int.
1931
1932```
1933Argument type Result
1934
1935string string length in bytes
1936list list length
1937struct number of distinct fields
1938```
1939
1940
1941### `required`
1942
1943The built-in function `required` discards the default mechanism of
1944a disjunction.
1945
1946```
1947"tcp" | "udp" // default is "tcp"
1948required("tcp" | "udp") // no default, user must specify either "tcp" or "udp"
1949```
1950
1951
1952## Modules, instances, and packages
1953
1954CUE configurations are constructed combining _instances_.
1955An instance, in turn, is constructed from one or more source files belonging
1956to the same _package_ that together declare the data representation.
1957Elements of this data representation may be exported and used
1958in other instances.
1959
1960### Source file organization
1961
1962Each source file consists of an optional package clause defining collection
1963of files to which it belongs,
1964followed by a possibly empty set of import declarations that declare
1965packages whose contents it wishes to use, followed by a possibly empty set of
1966declarations.
1967
1968
1969```
1970SourceFile = [ PackageClause "," ] { ImportDecl "," } { TopLevelDecl "," } .
1971```
1972
1973### Package clause
1974
1975A package clause is an optional clause that defines the package to which
1976a source file the file belongs.
1977
1978```
1979PackageClause = "package" PackageName .
1980PackageName = identifier .
1981```
1982
1983The PackageName must not be the blank identifier.
1984
1985```
1986package math
1987```
1988
1989### Modules and instances
1990A _module_ defines a tree directories, rooted at the _module root_.
1991
1992All source files within a module with the same package belong to the same
1993package.
1994A module may define multiple package.
1995
1996An _instance_ of a package is any subset of files within a module belonging
1997to the same package.
1998It is interpreted as the concatanation of these files.
1999
2000An implementation may impose conventions on the layout of package files
2001to determine which files of a package belongs to an instance.
2002For instance, an instance may be defined as the subset of package files
2003belonging to a directory and all its ancestors.
2004
2005
2006### Import declarations
2007
2008An import declaration states that the source file containing the declaration
2009depends on definitions of the _imported_ package (§Program initialization and
2010execution) and enables access to exported identifiers of that package.
2011The import names an identifier (PackageName) to be used for access and an
2012ImportPath that specifies the package to be imported.
2013
2014```
2015ImportDecl = "import" ( ImportSpec | "(" { ImportSpec ";" } ")" ) .
2016ImportSpec = [ "." | PackageName ] ImportPath .
2017ImportPath = `"` { unicode_value } `"` .
2018```
2019
2020The PackageName is used in qualified identifiers to access exported identifiers
2021of the package within the importing source file.
2022It is declared in the file block.
2023If the PackageName is omitted, it defaults to the identifier specified in the
2024package clause of the imported instance.
2025If an explicit period (.) appears instead of a name, all the instances's
2026exported identifiers declared in that instances's package block will be declared
2027in the importing source file's file block
2028and must be accessed without a qualifier.
2029
2030The interpretation of the ImportPath is implementation-dependent but it is
2031typically either the path of a builtin package or a fully qualifying location
2032of an instance within a source code repository.
2033
2034Implementation restriction: An interpreter may restrict ImportPaths to non-empty
2035strings using only characters belonging to Unicode's L, M, N, P, and S general
2036categories (the Graphic characters without spaces) and may also exclude the
2037characters !"#$%&'()*,:;<=>?[\]^`{|} and the Unicode replacement character
2038U+FFFD.
2039
2040Assume we have package containing the package clause package math,
2041which exports function Sin at the path identified by "lib/math".
2042This table illustrates how Sin is accessed in files
2043that import the package after the various types of import declaration.
2044
2045```
2046Import declaration Local name of Sin
2047
2048import "lib/math" math.Sin
2049import m "lib/math" m.Sin
2050import . "lib/math" Sin
2051```
2052
2053An import declaration declares a dependency relation between the importing and
2054imported package. It is illegal for a package to import itself, directly or
2055indirectly, or to directly import a package without referring to any of its
2056exported identifiers.
2057
2058
2059### An example package
2060
2061TODO
2062
2063## Interpretation
2064
2065CUE was inspired by a formalism known as
2066typed attribute structures [Carpenter 1992] or
2067typed feature structures [Copestake 2002],
2068which are used in linguistics to encode grammars and
2069lexicons. Being able to effectively encode large amounts of data in a rigorous
2070manner, this formalism seemed like a great fit for large-scale configuration.
2071
2072Although CUE configurations are specified as trees, not graphs, implementations
2073can benefit from considering them as graphs when dealing with cycles,
2074and effectively turning them into graphs when applying techniques like
2075structure sharing.
2076Dealing with cycles is well understood for typed attribute structures
2077and as CUE configurations are formally closely related to them,
2078we can benefit from this knowledge without reinventing the wheel.
2079
2080
2081### Formal definition
2082
2083<!--
2084The previous section is equivalent to the below text with the main difference
2085that it is only defined for trees. Technically, structs are more akin dags,
2086but that is hard to explain at this point and also unnecessarily pedantic.
2087 We keep the definition closer to trees and will layer treatment
2088of cycles on top of these definitions to achieve the same result (possibly
2089without the benefits of structure sharing of a dag).
2090
2091A _field_ is a field name, or _label_ and a protype.
2092A _struct_ is a set of _fields_ with unique labels for each field.
2093-->
2094
2095A CUE configuration can be defined in terms of constraints, which are
2096analogous to typed attribute structures referred to above.
2097
2098#### Definition of basic prototypes
2099
2100> A _basic prototype_ is any CUE prototype that is not a struct (or, by
2101> extension, a list).
2102> All basic prototypes are paritally ordered in a lattice, such that for any
2103> basic prototype `a` and `b` there is a unique greatest lower bound
2104> defined for the subsumption relation `a ⊑ b`.
2105
2106```
2107Basic prototypes
2108null
2109true
2110bool
21113.14
2112string
2113"Hello"
21140..10
2115<8
2116re("Hello .*!")
2117```
2118
2119The basic prototypes correspond to their respective types defined earlier.
2120
2121Struct (and by extension lists), are represented by the abstract notion of
2122a constraint structure.
2123Each node in a configuration, including the root node,
2124is associated with a constraint.
2125
2126
2127#### Definition of a typed feature structures and substructures
2128
2129> A typed feature structure_ defined for a finite set of labels `Label`
2130> is directed acyclic graph with labeled
2131> arcs and values, represented by a tuple `C = <Q, q0, υ, δ>`, where
2132>
2133> 1. `Q` is the finite set of nodes,
2134> 1. `q0 ∈ Q`, is the root node,
2135> 1. `υ: Q → T` is the total node typing function,
2136> for a finite set of possible terms `T`.
2137> 1. `δ: Label × Q → Q` is the partial feature function,
2138>
2139> subject to the following conditions:
2140>
2141> 1. there is no node `q` or label `l` such that `δ(q, l) = q0` (root)
2142> 2. for every node `q` in `Q` there is a path `π` (i.e. a sequence of
2143> members of Label) such that `δ(q0, π) = q` (unique root, correctness)
2144> 3. there is no node `q` or path `π` such that `δ(q, π) = q` (no cycles)
2145>
2146> where `δ` is extended to be defined on paths as follows:
2147>
2148> 1. `δ(q, ϵ) = q`, where `ϵ` is the empty path
2149> 2. `δ(q, l∙π) = δ(δ(l, q), π)`
2150>
2151> The _substructures_ of a typed feature structure are the
2152> typed feature structures rooted at each node in the structure.
2153>
2154> The set of all possible typed feature structures for a given label
2155> set is denoted as `𝒞`<sub>`Label`</sub>.
2156>
2157> The set of _terms_ for label set `Label` is recursively defined as
2158>
2159> 1. every basic prototype: `P ⊆ T`
2160> 2. every constraint in `𝒞`<sub>`Label`</sub> is a term: `𝒞`<sub>`Label`</sub>` ⊆ T`
2161> 3. for every `n` prototypes `t₁, ..., tₙ`, and every `n`-ary function symbol
2162> `f ∈ F_n`, the prototye `f(t₁,...,tₙ) ∈ T`.
2163>
2164
2165
2166This definition has been taken and modified from [Carpenter, 1992]
2167and [Copestake, 2002].
2168
2169Without loss of generality, we will henceforth assume that the given set
2170of labels is constant and denote `𝒞`<sub>`Label`</sub> as `𝒞`.
2171
2172In CUE configurations, the abstract constraints implicated by `υ`
2173are CUE exressions.
2174Literal structs can be treated as part of the original typed feature structure
2175and do not need evaluation.
2176Any other expression is evaluated and unified with existing values of that node.
2177
2178References in expressions refer to other nodes within the `C` and represent
2179a copy of such a `C`.
2180The functions defined by `F` correspond to the binary and unary operators
2181and interpolation construct of CUE, as well as builtin functions.
2182
2183CUE allows duplicate labels within a struct, while the definition of
2184typed feature structures does not.
2185A duplicate label `l` with respective values `a` and `b` is represented in
2186a constraint as a single label with term `&(a, b)`,
2187the unification of `a` and `b`.
2188Multiple labels may be recursively combined in any order.
2189
2190<!-- unnecessary, probably.
2191#### Definition of evaluated prototype
2192
2193> A fully evaluated prototype, `T_evaluated ⊆ T` is a subset of `T` consisting
2194> only of atoms, typed attribute structures and constraint functions.
2195>
2196> A prototype is called _ground_ if it is an atom or typed attribute structure.
2197
2198#### Unification of evaluated prototypes
2199
2200> A fully evaluated prototype, `T_evaluated ⊆ T` is a subset of `T` consisting
2201> only of atoms, typed attribute structures and constraint functions.
2202>
2203> A prototype is called _ground_ if it is an atom or typed attribute structure.
2204-->
2205
2206#### Definition of subsumption and unification on typed attribute structure
2207
2208> For a given collection of constraints `𝒞`,
2209> we define `π ≡`<sub>`C`</sub> `π'` to mean that constraint structure `C ∈ 𝒞`
2210> contains path equivalence between the paths `π` and `π'`
2211> (i.e. `δ(q0, π) = δ(q0, π')`, where `q0` is the root node of `C`);
2212> and `𝒫`<sub>`C`</sub>`(π) = c` to mean that
2213> the constraint structure at the path `π` in `C`
2214> is `c` (i.e. `𝒫`<sub>`C`</sub>`(π) = c` if and only if `υ(δ(q0, π)) == c`,
2215> where `q0` is the root node of `C`).
2216> Subsumption is then defined as follows:
2217> `C ∈ 𝒞` subsumes `C' ∈ 𝒞`, written `C' ⊑ C`, if and only if:
2218>
2219> - `π ≡`<sub>`C`</sub> `π'` implies `π ≡`<sub>`C'`</sub> `π'`
2220> - `𝒫`<sub>`C`</sub>`(π) = c` implies`𝒫`<sub>`C'`</sub>`(π) = c` and `c' ⊑ c`
2221>
2222> The unification of `C` and `C'`, denoted `C ⊓ C'`,
2223> is the greatest lower bound of `C` and `C'` in `𝒞` ordered by subsumption.
2224
2225Like with the subsumption relation for basic prototypes,
2226the subsumption relation for constraints determines the mutual placement
2227of constraints within the partial order of all values.
2228
2229
2230#### Evaluation function
2231
2232> The evaluation function is given by `E: T -> 𝒞`.
2233> The unification of two constraint structures is evaluated as defined above.
2234> All other functions are evaluated according to the definitions found earlier
2235> in this spec.
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +01002236> An error is indicated by `_|_`.
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002237
2238#### Definition of well-formedness
2239
2240> We say that a given constraint structure `C = <Q, q0, υ, δ> ∈ 𝒞` is
2241> a _well-formed_ constraint structure if and only if for all nodes `q ∈ Q`,
2242> the substructure `C'` rooted at `q`,
2243> is such that `E(υ(q)) ∈ 𝒞` and `C' = <Q', q, δ', υ'> ⊑ E(υ(q))`.
2244
2245<!-- Also, like Copestake, define appropriate features?
2246Appropriate features are useful for detecting unused variables.
2247
2248Appropriate features could be introduced by distinguishing between:
2249
2250a: MyStruct // appropriate features are MyStruct
2251a: {a : 1}
2252
2253and
2254
2255a: MyStruct & { a: 1 } // appropriate features are those of MyStruct + 'a'
2256
2257This is way to suttle, though.
2258
2259Alternatively: use Haskell's approach:
2260
2261a :: MyStruct // define a to be MyStruct any other features are allowed but
2262 // discarded from the model. Unused features are an error.
2263
2264Let's first try to see if we can get away with static usage analysis.
2265A variant would be to define appropriate features unconditionally, but enforce
2266them only for unused variables, with some looser definition of unused.
2267-->
2268
2269The _evaluation_ of a CUE configuration represented by `C`
2270is defined as the process of making `C` well-formed.
2271
2272<!--
2273ore abstractly, we can define this structure as the tuple
2274`<≡, 𝒫>`, where
2275
2276- `≡ ⊆ Path × Path` where `π ≡ π'` if and only if `Δ(π, q0) = Δ(π', q0)` (path equivalence)
2277- `P: Path → ℙ` is `υ(Δ(π, q))` (path value).
2278
2279A struct `a = <≡, 𝒫>` subsumes a struct `b = <≡', 𝒫'>`, or `a ⊑ b`,
2280if and only if
2281
2282- `π ≡ π'` implied `π ≡' π'`, and
2283- `𝒫(π) = v` implies `𝒫'(π) = v'` and `v' ⊑ v`
2284-->
2285
2286#### References
2287Theory:
2288- [1992] Bob Carpenter, "The logic of typed feature structures.";
2289 Cambridge University Press, ISBN:0-521-41932-8
2290- [2002] Ann Copestake, "Implementing Typed Feature Structure Grammars.";
2291 CSLI Publications, ISBN 1-57586-261-1
2292
2293Some graph unification algorithms:
2294
2295- [1985] Fernando C. N. Pereira, "A structure-sharing representation for
2296 unification-based grammar formalisms."; In Proc. of the 23rd Annual Meeting of
2297 the Association for Computational Linguistics. Chicago, IL
2298- [1991] H. Tomabechi, "Quasi-destructive graph unifications.."; In Proceedings
2299 of the 29th Annual Meeting of the ACL. Berkeley, CA
2300- [1992] Hideto Tomabechi, "Quasi-destructive graph ynifications with structure-
2301 sharing."; In Proceedings of the 15th International Conference on
2302 Computational Linguistics (COLING-92), Nantes, France.
2303- [2001] Marcel van Lohuizen, "Memory-efficient and thread-safe
2304 quasi-destructive graph unification."; In Proceedings of the 38th Meeting of
2305 the Association for Computational Linguistics. Hong Kong, China.
2306
2307
2308### Evaluation
2309
2310The _evaluation_ of a CUE configuration `C` is defined as the process of
2311making `C` well-formed.
2312
2313This document does not define any operational semantics.
2314As the unification operation is communitive, transitive, and reflexive,
2315implementations have a considerable amount of leeway in
2316chosing an evaluation strategy.
2317Although most algorithms for the unification of typed attribute structure
2318that have been proposed are `O(n)`, there can be considerable performance
2319benefits of chosing one of the many proposed evaluation strategies over the
2320other.
2321Implementations will need to be verified against the above formal definition.
2322
2323
2324
2325#### Constraint functions
2326
2327A _constraint function_ is a unary function `f` which for any input `a` only
2328returns values that are an instance of `a`. For instance, the constraint
2329function `f` for `string` returns `"foo"` for `f("foo")` and `_|_` for `f(1)`.
2330Constraint functions may take other constraint functions as arguments to
2331produce a more restricting constraint function.
2332For instance, the constraint function `f` for `0..8` returns `5` for `f(5)`,
2333`5..8` for `f(5..10)`, and `_|_` for `f("foo")`.
2334
2335
2336Constraint functions play a special role in unification.
2337The unification function `&(a, b)` is defined as
2338
2339- `a & b` if `a` and `b` are two atoms
2340- `a & b` if `a` and `b` are two nodes, respresenting struct
2341- `a(b)` or `b(a)` if either `a` or `b` is a constraint function, respectively.
2342
2343Implementations are free to pick which constraint function is applied if
2344both `a` and `b` are constraint functions, as the properties of unification
2345will ensure this produces identical results.
2346
2347#### Manifestation
2348
2349TODO: a prototype which is a function invocation that cannot be evaluated
2350or for which the result is not an atom or a struct is called _incomplete_.
2351
2352
2353
2354### Validation
2355
2356TODO: when to proactively do recursive validation
2357
2358#### References
2359
2360A distinguising feature of CUE's unification algorithm is the use of references.
2361In conventional graph unification for typed feature structures, the structures
2362that are unified into the existing graph are independent and pre-evaluated.
2363In CUE, the constraint structures indicated by references may still need to
2364be evaluated.
2365Some conventional evaluation strategy may not cope well with references that
2366refer to each other.
2367The simple solution is to deploy a bread-first evaluation strategy, rather than
2368the more traditional depth-first approach.
2369Other approaches are possible, however, and implementations are free to choose
2370which approach is deployed.
2371
2372### Cycles
2373
2374TODO: describe precisely which cycles must be resolved by implementations.
2375
2376Rules:
2377
2378- Unification of atom value `a` with non-concrete atom `b` for node `q`:
2379 - set `q` to `a` and schedule the evalution `a == b` at the end of
2380 evaluating `q`: `C` is only correct under the assumption that `q` is `a`
2381 so evaluate later.
2382
2383A direct cyclic reference between nodes defines a shared node for the paths
2384of the original nodes.
2385
2386- Unification of cycle of references of struct,
2387 for instance: `{ a: b, b: c, c: a }`
2388 - ignore the cycle and continue evaluating not including the last unification:
2389 a unification of a value with itself is itself. As `a` was already included,
2390 ignoring the cycle will produce the same result.
2391
2392```
2393Configuration Evaluated
2394// c Cycles in nodes of type struct evaluate
2395// ↙︎ ↖ to the fixed point of unifying their.
2396// a → b values
2397
2398a: b // a: { x: 1, y: 3 }
2399b: c // b: { x: 1, y: 3 }
2400c: a // c: { x: 1, y: 3 }
2401
2402a: { x: 1 }
2403b: { y: 3 }
2404```
2405
2406
24071. Cycle breaking
24081. Cycle detection
24091. Assertion checks
24101. Validation
2411
2412The preparation step loads all the relevant CUE sources and merges duplicate
2413by creating unification expressions until each field is unique within its scope.
2414
2415
2416For fields of type struct any cycle that does not result in an infinite
2417structure is allowed.
2418An expresion of type struct only allows unification and disjunction operations.
2419
2420Unification of structs is done by unifying a copy of each of the input structs.
2421A copy of a referenced input struct may itself contain references which are
2422handled with the following rules:
2423- a reference bound to a field that it is being copied is replaced
2424 with a new reference pointing to the respective copy,
2425- a reference bound to a field that is not being copied refers to the
2426 original field.
2427
2428
2429#### Self-referential cycles
2430
2431A graph unification algorithm like Tomabechi [] or Van Lohuizen [] can be used
2432to handle the reference replacement rules and minimize the cost of
2433copying and cycle detection.
2434
2435Unification of lists, which are expressible as structs, follow along the same
2436lines.
2437
2438For an expression `a & b` of any scalar type where exactly one of `a` or `b` is
2439a concrete value, the result may be replaced by this concrete value while
2440adding the expression `a == b` to the list of assertions.
2441
2442```
2443// Config Evaluates to
2444x: { x: {
Marcel van Lohuizen6f0faec2018-12-16 10:42:42 +01002445 a: b + 100 a: _|_ // cycle detected
2446 b: a - 100 b: _|_ // cycle detected
Marcel van Lohuizendd5e5892018-11-22 23:29:16 +01002447} }
2448
2449y: x & { y: {
2450 a: 200 a: 200 // asserted that 200 == b + 100
2451 b: 100
2452} }
2453```
2454
2455During the evaluation of a field which expression is being evaluated is marked as such.
2456
2457A field `f` with unification expression `e` where `e` contains reference that in turn
2458point to `a` can be handled as follows:
2459
2460#### Evaluation cycles
2461
2462For structs, cycles are disallowed
2463
2464Disallowed cycles:
2465
2466A field `a` is _reachable_ from field `b` if there is a selector sequence
2467from `a` to `b`.
2468
2469A reference used in field `a` may not refer to a value that recursively
2470refers to a value that is reachable from `a`.
2471
2472```
2473a: b & { c: 3 }
2474
2475b: a.c // illegal reference
2476
2477```
2478
2479#### Structural cycles
2480
2481A reference to `Δ(π, q0)` may not recursively refer to `Δ(π', q)`,
2482where `π` is a prefix to `π'`.
2483
2484
2485a: b & { b: _ }
2486
2487
2488### Validation
2489
2490Implementations are allowed to postpone recursive unification of structures
2491except for in the following cases:
2492
2493- Unification within disjunctions:
2494
2495
2496<!--
2497### Inference
2498
2499There is currently no logical inference for values of references prescribed.
2500It mostly relies on users defining the value of all variables.
2501The main reason for this is to keep control over run time complexity.
2502However, implementations may be free to do so.
2503Also, later versions of the language may strengthen requirements for resolution.
2504
2505TODO: examples of situations where variables could be resolved but are not.
2506-->
2507
2508### Unused values
2509
2510TODO: rules for detection of unused variables
2511
25121. Any alias value must be used
2513