doc/ref: updates to spec based on jba’s comments

Regarding the rewrite of disjunctions:
I also verified this definition works well with cycles.

Change-Id: I807ba0835e3868d643e4fc6665a44c0fa5b723fd
diff --git a/doc/ref/spec.md b/doc/ref/spec.md
index 4faa44c..8d4cecb 100644
--- a/doc/ref/spec.md
+++ b/doc/ref/spec.md
@@ -227,7 +227,7 @@
 
 #### Preamble
 
-The following pseudo-keywords are used at the preamble of a CUE file.
+The following keywords are used at the preamble of a CUE file.
 After the preamble, they may be used as identifiers to refer to namesake fields.
 
 ```
@@ -237,13 +237,13 @@
 
 #### Comprehension clauses
 
-The following pseudo-keywords are used in comprehensions.
+The following keywords are used in comprehensions.
 
 ```
 for          in           if           let
 ```
 
-The pseudo-keywords `for`, `if` and `let` cannot be used as identifiers to
+The keywords `for`, `if` and `let` cannot be used as identifiers to
 refer to fields. All others can.
 
 <!--
@@ -346,6 +346,7 @@
 
 
 ### String and byte sequence literals
+
 A string literal represents a string constant obtained from concatenating a
 sequence of characters.
 Byte sequences are a sequence of bytes.
@@ -581,7 +582,7 @@
 ### Unification
 
 The _unification_ of values `a` and `b`, denoted as `a & b` in CUE,
-is defined as the greatest lower bound of `a` and `b`. (That is, the 
+is defined as the greatest lower bound of `a` and `b`. (That is, the
 value `u` such that `u ⊑ a` and `u ⊑ b`,
 and for any other value `v` for which `v ⊑ a` and `v ⊑ b`
 it holds that `v ⊑ u`.)
@@ -595,130 +596,120 @@
 on top of it.
 
 Syntactically, unification is a [binary expression](#Operands).
-
+<!-- TODO: explicitly mention that disjunction is not a binary operation
+but a definition of a single value?-->
 
 ### Disjunction
 
 A _disjunction_ of two values `a` and `b`, denoted as `a | b` in CUE,
-is defined as the smallest value `d` such that `a ⊑ d` and `b ⊑ d`.
+defines the smallest value `d` such that `a ⊑ d` and `b ⊑ d`.
+This style of disjunctions is sometimes also referred to as sum types.
+
 These all follow from the definition of disjunction:
 - The disjunction of `a` with itself is always `a`.
 - The disjunction of a value `a` and `b` where `a ⊑ b` is always `b`.
 - The disjunction of a value `a` with bottom is always `a`.
 - The disjunction of two bottom values is bottom.
 
+Elements that evaluate to bottom are eliminated from a disjunction.
+A disjunction evaluates to bottom (`_|_`) if it has no elements
+after such elimination.
+
 Syntactically, disjunction is a [binary expression](#Operands).
 
-Implementations should report an error if for a disjunction `a | ... | b`
-where `b` is an instance of `a`, as `b` will be superfluous and can never
-be selected as a default. <!-- jba: either put this paragraph later, or link 
-to the discussion of defaults. -->
-
-If a disjunction is used in any operation other than unification or another
-disjunction, the default value is chosen before operating on it.
-<!-- jba: I wish that the default mechanism wasn't so intertwined with
-     disjunction. Like Prolog "cut," it breaks the beautiful functional
-     structure. 
-     
-     But reading the Kubernetes examples, I see how pervasive and useful it is.
--->
-
-   
+The unification of a disjunction with another value is equal to the disjunction
+composed of the unification of this value with all of the original elements
+of the disjunction.
+In other words, unification distributes over disjunction.
+So
 
 ```
-Expression                Result (without picking default)
+(a_0 | ... |a_n) & b ==> a_0&b | ... | a_n&b,
+```
+
+with values evaluating to bottom removed from the resulting disjunction.
+
+```
+Expression                Result
+({a:1} | {b:2}) & {c:3}   {a:1, c:3} | {b:2, c:3}
 (int | string) & "foo"    "foo"
 ("a" | "b") & "c"         _|_
-
-(3 | 5) + 2               5
 ```
 
-If the values of a disjunction are unambiguous, its first value may be taken
-as a default value. <!-- jba: move this sentence higher -->
+Unification of two disjunctions is defined to retain a strict ordering
+of the elements of the resulting disjunction:
 
-The default value for a disjunction is selected when:
-
-1. passing it to an argument of a call or index value,
-1. using it in any unary or binary expression except for unification or disjunction,
-1. using it as the receiver of a call, index, slice, or selector expression, and
-1. a value is taken for a configuration. <!-- jba: not clear what this phrase means -->
-<!-- jba: you said this above more concisely as "anything other than unification
-or disjunction". -->
-
-<!-- jba: This rule concerns me. It seems to imply that the default value can be
-used even if it is not ultimately selected as the concrete value for the
-disjunction. Example:
-
-    a: 1 | int
-    b: a+1
-    // later
-    a: 2
-
-You say "the default value for a disjunction is selected when...using it in any
-unary or binary expression except for unification or disjunction", so 1 is
-selected for `a` in the second line, and therefore `b` is 2. But then `a` is 2,
-so the constraint on the second line is violated at the end of the evaluation.
-
-Maybe the rule only applies to disjunction expressions that appear as part of
-other expressions, not when they are the value of a field? 
+```
+(a_0 | ... | a_n) & (b_0 | ... | b_m) =>
+a_0&b_0 | ... | a_n&b_0 | a_0&b_1 | ... | a_n&b_m
+```
+<!-- Note that there are several variants of ordering that would work here.
+It only matters that there exists a pre-determined order.
 -->
 
-<!-- jba
-Assuming I'm right about the above and the default is not always selected in expressions,
-then it can matter which order the defaults are selected. Consider:
+The _first value_ of a disjunction is the first element in a disjunction
+after eliminating values evaluating to bottom.
+The ordering is irrelevant for the placement of a disjunction within the
+lattice of values, but it is relevant for default values, which we will
+discuss next.
 
-    a: 1 | int
-    b: 2 | int
-    c: a + b
-    c: 4
+#### Default values
 
-After applying all constraints, I don't have concrete values for `a` or `b`, so
-I must select a default and then re-run my constraint engine. (At least that's
-how I imagine the implementation to work.) 
+A disjunction evaluates to its _default_ value if it is used as a value for any
+operation other than unification or disjunction.
+The default value of a disjunction is its first value or
+bottom in case of a manifest failure.
+A _manifest failure_ occurs if
+the disjunction resulted from the unification of two or more disjunctions and
+these disjunctions can be unified in different orders such that they
+result in different first values.
 
-If I select `a` first, then I end with a==1 and b=3. If I select `b` first, then
-I get b==2 and a==2.
+<!-- TODO(mpvl): consider it to be a manifest failure if if non-concrete
+values are used in expressions requiring concrete values.
 -->
 
-
-A value is unambiguous if a disjunction has never been unified with another
-disjunction, or if the first element is the result of unifying two first
-values of a disjunction.
-<!-- jba: Hard to understand this sentence, or why a disjunction must be
-unambiguous to use its first value as a default. -->
+A default value is chosen if the disjunction is not used
+in a unification or disjunction operation.
+This means that in practice, a default is chosen for almost any expression
+that does not involve `&` and `|`, including slices, indices, selectors,
+and all but a few explicitly marked builtin functions.
 
 ```
 Expression                       Default
-("tcp"|"udp") & ("tcp"|"udp")    "tcp"  // default chosen
-("tcp"|"udp") & ("udp"|"tcp")    _|_    // no unique default
+"tcp" | "udp"                    "tcp"
+1 | int                          1
+string | 1.0                     string
 
-("a"|"b") & ("b"|"a") & "a"      "a"    // single value after evaluation
+("tcp"|"udp") & ("tcp"|"udp")    "tcp"
+("tcp"|"udp") & ("udp"|"tcp")    _|_ // ambiguous: no unique first value
 ```
-<!-- jba: the first two examples here contradict your assertion that -->
-<!-- defaults are not chosen under unfication and disjunction. Both should have -->
-<!-- the result ("tcp"|"udp"). -->
 
+A disjunction always evaluates to the same default value, regardless of
+the context in which the value is used.
+For instance, `[1, 3]["a" | 1]` will result in an error, as `"a"` will be
+selected as the default value.
+
+
+```
+[1, 2]["a" | 1]          //  _|_ // "a" is not an integer value
+[1, 2][("a" | 1) & int]  //  2
+```
+
+Implementations should report an error for a disjunction `a | ... | b`,
+where `b` is an instance of `a`, as `b` will be superfluous and can never
+be selected as a default.
 
 
 ### Bottom and errors
 
 Any evaluation error in CUE results in a bottom value, respresented by
 the token '_|_'.
-Bottom is an instance of every other prototype.
+Bottom is an instance of every other value.
 Any evaluation error is represented as bottom.
 
 Implementations may associate error strings with different instances of bottom;
 logically they all remain the same value.
 
-```
-Expr         Result
- 1  &  2       _|_
-int & bool     _|_
-_|_ |  1        1
-_|_ &  2       _|_
-_|_ & _|_      _|_
-```
-
 
 ### Top
 
@@ -736,9 +727,9 @@
 
 ### Null
 
-The _null value_ is represented with the pseudo-keyword `null`.
+The _null value_ is represented with the keyword `null`.
 It has only one parent, top, and one child, bottom.
-It is unordered with respect to any other prototype.
+It is unordered with respect to any other value.
 
 ```
 null_lit   = "null"
@@ -754,7 +745,7 @@
 ### Boolean values
 
 A _boolean type_ represents the set of Boolean truth values denoted by
-the pseudo-keywords `true` and `false`.
+the keywords `true` and `false`.
 The predeclared boolean type is `bool`; it is a defined type and a separate
 element in the lattice.
 
@@ -862,6 +853,7 @@
 
 We say a label is defined for a struct if the struct has a field with the
 corresponding label.
+The value for a label `f` of struct `a` is denoted `f.a`.
 A struct `a` is an instance of `b`, or `a ⊑ b`, if for any label `f`
 defined for `b`, label `f` is also defined for `a` and `a.f ⊑ b.f`.
 Note that if `a` is an instance of `b` it may have fields with labels that
@@ -870,12 +862,14 @@
 The (unique) struct with no fields, written `{}`, has every struct as an
 instance. It can be considered the type of all structs.
 
-The unification of structs `a` and `b` is a new struct `c` which
+The successful unification of structs `a` and `b` is a new struct `c` which
 has all fields of both `a` and `b`, where
 the value of a field `f` in `c` is `a.f & b.f` if `f` is in both `a` and `b`,
 or just `a.f` or `b.f` if `f` is in just `a` or `b`, respectively.
 Any [references](#References) to `a` or `b`
 in their respective field values need to be replaced with references to `c`.
+The result of a unification is bottom (`_|_`) if any of its fields evaluates
+to bottom, recursively.
 
 A field name may also be an interpolated string.
 Identifiers used in such strings are evaluated within
@@ -966,10 +960,10 @@
 
 ### Lists
 
-A list literal defines a new prototype of type list.
+A list literal defines a new value of type list.
 A list may be open or closed.
 An open list is indicated with a `...` at the end of an element list,
-optionally followed by a prototype for the remaining elements.
+optionally followed by a value for the remaining elements.
 
 The length of a closed list is the number of elements it contains.
 The length of an open list is the its number of elements as a lower bound
@@ -1307,8 +1301,8 @@
 ```
 constructs a substring or slice. The indices `low` and `high` must be
 concrete integers and select
-which elements of operand a appear in the result. The result has indices
-starting at 0 and length equal to `high` - `low`.
+which elements of operand `a` appear in the result.
+The result has indices starting at 0 and length equal to `high` - `low`.
 After slicing the list `a`
 <!-- TODO(jba): how does slicing open lists work? -->
 
@@ -1366,14 +1360,16 @@
 ```
 <!-- TODO: consider adding unary_op: "<" | "<=" | ">" | ">=" -->
 
-Comparisons are discussed [elsewhere]. For other binary operators, the operand
-types must be [identical] unless the operation involves untyped [constants]
-or durations. For operations involving constants only, see the section on
-[constant expressions].
+Comparisons are discussed [elsewhere](#Comparison-operators).
+For other binary operators, the operand
+types must unify.
+<!-- TODO: durations
+ unless the operation involves durations.
 
 Except for duration operations, if one operand is an untyped [literal] and the
 other operand is not, the constant is [converted] to the type of the other
 operand.
+-->
 
 
 #### Operator precedence
@@ -1486,6 +1482,7 @@
 For decimal floating-point numbers, `+x` is the same as `x`,
 while -x is the negation of x.
 The result of a floating-point division by zero is bottom (an error).
+<!-- TODO: consider making it +/- Inf -->
 
 An implementation may combine multiple floating-point operations into a single
 fused operation, possibly across statements, and produce a result that differs
@@ -1495,7 +1492,7 @@
 #### List operators
 
 Lists can be concatenated using the `+` operator.
-For lists `a` and `b`, 
+For lists `a` and `b`,
 ```
 a + b
 ```
@@ -1541,7 +1538,7 @@
 }
 
 rangeList:  (1..2)*[int]
-rangeListT: null | {
+rangeListT: {
     Elem: int
     Tail: {
         Elem: int
@@ -1561,8 +1558,8 @@
     }
 }
 ```
-<!-- jba: I thought (1..2)*[int] is equivalent to [int] | [int, int]
-     but rangeListT is not that. -->
+<!-- TODO(mpvl): should we disallow multiplication with a range?
+If so, how does one specify a list with a range of possible lengths? -->
 
 <!-- jba: Clarify the allowed values for the non-list operand. And must that be 
 the left operand? -->
@@ -1694,40 +1691,10 @@
 `x'` is an instance of `T`.
 A value `x` can be converted to the type of `T` in any of these cases:
 
-- `x` is of type struct and is subsumed by `T` ignoring struct tags.
-- `x` and `T` are both integer or floating point types.
-- `x` is an integer or a list of bytes or runes and `T` is a string type.
-<!-- jba: first mention of runes -->
-- `x` is a string and `T` is a list of bytes or runes.
-
-
-<!--
-[Field tags] are ignored when comparing struct types for identity
-for the purpose of conversion:
-
-```
-person: {
-    name:    string #xml:"Name"
-    address: null | {
-        street: string #xml:"Street"
-        city:   string #xml:"City"
-    }  #xml:"Address"
-}
-
-person2: {
-    name:    string
-    address: null | {
-        street: string
-        city:   string
-    }
-}
-
-p2 = person(person2)  // ignoring tags, the underlying types are identical
-```
--->
-
-<!-- jba: I don't know what the type of a struct is, other than {}. -->
-
+- `x` is a struct and is subsumed by `T`.
+- `x` and `T` are both integer or floating points.
+- `x` is an integer or a byte sequence and `T` is a string.
+- `x` is a string and `T` is a byte sequence.
 
 Specific rules apply to conversions between numeric types, structs,
 or to and from a string type. These conversions may change the representation
@@ -1738,7 +1705,7 @@
 #### Conversions between numeric ranges
 For the conversion of numeric values, the following rules apply:
 
-1. Any integer prototype can be converted into any other integer prototype
+1. Any integer value can be converted into any other integer value
    provided that it is within range.
 2. When converting a decimal floating-point number to an integer, the fraction
    is discarded (truncation towards zero). TODO: or disallow truncating?
@@ -1774,7 +1741,6 @@
 Conversions between list types are possible only if `T` strictly subsumes `x`
 and the result will be the unification of `T` and `x`.
 
-<!---
 If we introduce named types this would be different from IP & [10, ...]
 
 Consider removing this until it has a different meaning.
@@ -1922,14 +1888,19 @@
 Argument type    Result
 
 string            string length in bytes
-list              list length
+bytes             length of byte sequence
+list              list length, smallest length for an open list
 struct            number of distinct fields
 ```
-<!-- jba: 
-   String length contradicts what you said in the Strings section near the beginning.
-   What does len([1, 2, ...]) return?
-   If [1] is actually { Elem: 1, Tail: null}, then wouldn't len([1]) == 2?
--->
+
+```
+Expression           Result
+len("Hellø")         6
+len([1, 2, 3])       3
+len([1, 2, ...])     2
+len({a:1, b:2})      2
+```
+
 
 ### `required`
 
@@ -2095,16 +2066,16 @@
 A CUE configuration can be defined in terms of constraints, which are
 analogous to typed attribute structures referred to above.
 
-#### Definition of basic prototypes
+#### Definition of basic values
 
-> A _basic prototype_ is any CUE prototype that is not a struct (or, by
+> A _basic value_ is any CUE value that is not a struct (or, by
 > extension, a list).
-> All basic prototypes are partially ordered in a lattice, such that for any
-> basic prototype `a` and `b` there is a unique greatest lower bound
+> All basic values are partially ordered in a lattice, such that for any
+> basic value `a` and `b` there is a unique greatest lower bound
 > defined for the subsumption relation `a ⊑ b`.
 
 ```
-Basic prototypes
+Basic values
 null
 true
 bool
@@ -2116,7 +2087,7 @@
 re("Hello .*!")
 ```
 
-The basic prototypes correspond to their respective types defined earlier.
+The basic values correspond to their respective types defined earlier.
 
 Struct (and by extension lists), are represented by the abstract notion of
 a typed feature structure.
@@ -2161,11 +2132,11 @@
 >
 > The set of _terms_ for label set `Label` is recursively defined as
 >
-> 1. every basic prototype: `P ⊆ T`
+> 1. every basic value: `P ⊆ T`
 > 1. every constraint in `𝒞`<sub>`Label`</sub> is a term: `𝒞`<sub>`Label`</sub>` ⊆ T`
 >    a _reference_ may refer to any substructure of `C`.
-> 1. for every `n` prototypes `t₁, ..., tₙ`, and every `n`-ary function symbol
->    `f ∈ F_n`, the prototype `f(t₁,...,tₙ) ∈ T`.
+> 1. for every `n` values `t₁, ..., tₙ`, and every `n`-ary function symbol
+>    `f ∈ F_n`, the value `f(t₁,...,tₙ) ∈ T`.
 >
 
 
@@ -2202,19 +2173,19 @@
 Multiple labels may be recursively combined in any order.
 
 <!-- unnecessary, probably.
-#### Definition of evaluated prototype
+#### Definition of evaluated value
 
-> A fully evaluated prototype, `T_evaluated ⊆ T` is a subset of `T` consisting
+> A fully evaluated value, `T_evaluated ⊆ T` is a subset of `T` consisting
 > only of atoms, typed attribute structures and constraint functions.
 >
-> A prototype is called _ground_ if it is an atom or typed attribute structure.
+> A value is called _ground_ if it is an atom or typed attribute structure.
 
-#### Unification of evaluated prototypes
+#### Unification of evaluated values
 
-> A fully evaluated prototype, `T_evaluated ⊆ T` is a subset of `T` consisting
+> A fully evaluated value, `T_evaluated ⊆ T` is a subset of `T` consisting
 > only of atoms, typed attribute structures and constraint functions.
 >
-> A prototype is called _ground_ if it is an atom or typed attribute structure.
+> A value is called _ground_ if it is an atom or typed attribute structure.
 -->
 
 #### Definition of subsumption and unification on typed attribute structure
@@ -2241,7 +2212,7 @@
 sufficiently precise. Although I admit that references and cycles
 are still unclear to me. -->
 
-Like with the subsumption relation for basic prototypes,
+Like with the subsumption relation for basic values,
 the subsumption relation for constraints determines the mutual placement
 of constraints within the partial order of all values.