doc/ref: define defintions, closed structs and embedding
- The definition of optional has now been integrated
in the original definition of unification to stress the
fact that a bottom optional field does not fail
unification, but merely makes that field no longer
an option.
- This will allow the currently supported (and somehwat
hideous) hidden values to be deprecated.
Issue #40
Change-Id: I86e6d336d9114efccc54282752f0e62b36be522d
Reviewed-on: https://cue-review.googlesource.com/c/cue/+/2280
Reviewed-by: Marcel van Lohuizen <mpvl@golang.org>
diff --git a/doc/ref/spec.md b/doc/ref/spec.md
index 3679a4c..eb53b34 100644
--- a/doc/ref/spec.md
+++ b/doc/ref/spec.md
@@ -44,7 +44,7 @@
Its main influences were BCL/ GCL (internal to Google),
LKB (LinGO), Go, and JSON.
Others are Swift, Typescript, Javascript, Prolog, NCL (internal to Google),
-Jsonnet, HCL, Flabbergast, JSONPath, Haskell, Objective-C, and Python.
+Jsonnet, HCL, Flabbergast, Nix, JSONPath, Haskell, Objective-C, and Python.
## Notation
@@ -956,7 +956,7 @@
We say a label is defined for a struct if the struct has a field with the
corresponding label.
-The value for a label `f` of struct `a` is denoted `f.a`.
+The value for a label `f` of struct `a` is denoted `a.f`.
A struct `a` is an instance of `b`, or `a ⊑ b`, if for any label `f`
defined for `b`, label `f` is also defined for `a` and `a.f ⊑ b.f`.
Note that if `a` is an instance of `b` it may have fields with labels that
@@ -965,46 +965,103 @@
The (unique) struct with no fields, written `{}`, has every struct as an
instance. It can be considered the type of all structs.
+A field may be required or optional.
The successful unification of structs `a` and `b` is a new struct `c` which
has all fields of both `a` and `b`, where
the value of a field `f` in `c` is `a.f & b.f` if `f` is in both `a` and `b`,
or just `a.f` or `b.f` if `f` is in just `a` or `b`, respectively.
+If a field `f` is in both `a` and `b`, `c.f` is optional only if both
+`a.f` and `b.f` are optional.
Any [references](#References) to `a` or `b`
in their respective field values need to be replaced with references to `c`.
-The result of a unification is bottom (`_|_`) if any of its fields evaluates
-to bottom, recursively.
+The result of a unification is bottom (`_|_`) if any of its required
+fields evaluates to bottom, recursively.
-A field name may also be an interpolated string.
-Identifiers used in such strings are evaluated within
-the scope of the struct in which the label is defined.
+Syntactically, the labels of optional fields are followed by a
+question mark `?`.
+The question mark is not part of the field name.
+Concrete field labels may be an identifier or string, the later of which may be
+interpolated.
+References within such interpolated strings are resolved within
+the scope of the struct in which the label sequence is
+defined and can reference concrete labels lexically preceding
+the label within a label sequence.
+<!-- We allow this so that rewriting a CUE file to collapse or expand
+field sequences has no impact on semantics.
+-->
+
+<!--TODO: first implementation round will not yet have expression labels
+
+An ExpressionLabel sets a collection of optional fields to a field value.
+By default it defines this value for all possible string labels.
+An optional expression limits this to the set of optional fields which
+labels match the expression.
+-->
+A Bind label binds an identifier to the label name scoped to the field value.
+The token `...` is a shorthand for `<_>: _`.
+<!-- NOTE: if we allow ...Expr, as in list, it would mean something different. -->
Syntactically, a struct literal may contain multiple fields with
-the same label, the result of which is a single field with a value
-that is the unification of the values of those fields.
+the same label, the result of which is a single field with the same properties
+as defined as the unification of two fields resulting from unifying two structs.
-A TemplateLabel indicates a template value that is to be unified with
-the values of all fields within a struct.
-The identifier of a template label binds to the field name of each
-field and is visible within the template value.
+
+<!-- NOTE:
+A DefinitionDecl does not allow repeated labels. This is to avoid
+any ambiguity or confusion about whether earlier path components
+are to be interpreted as declarations or normal fields (they should
+always be normal fields.)
+-->
+
+<!--NOTE:
+The syntax has been deliberately restricted to allow for the following
+future extensions and relaxations:
+ - Allow omitting a "?" in an expression label to indicate a concrete
+ string value (but maybe we want to use () for that).
+ - Make the "?" in expression label optional if expression labels
+ are always optional.
+ - Or allow eliding the "?" if the expression has no references and
+ is obviously not concrete (such as `[string]`).
+ - The expression of an expression label may also indicate a struct with
+ integer or even number labels
+ (beware of imprecise computation in the latter).
+ e.g. `{ [int]: string }` is a map of integers to strings.
+ - Allow for associative lists (`foo [@.field]: {field: string}`)
+ - The `...` notation can be extended analogously to that of a ListList,
+ by allowing it to follow with an expression for the remaining properties.
+ In that case it is no longer a shorthand for `[string]: _`, but rather
+ would define the value for any other value for which there is no field
+ defined.
+ Like the definition with List, this is somewhat odd, but it allows the
+ encoding of JSON schema's and (non-structural) OpenAPI's
+ additionalProperties and additionalItems.
+-->
+
+<!-- TODO: for next round of implementation, replace ExpressionLabel with:
+ExpressionLabel = BindLabel | [ BindLabel ] "[" [ Expression ] "]" .
+-->
```
-StructLit = "{" [ Declaration { "," Declaration } [ "," ] ] "}" .
-Declaration = FieldDecl | AliasDecl | ComprehensionDecl .
-FieldDecl = Label { Label } ":" Expression { attribute } .
+StructLit = "{" [ DeclarationList [ "," [ "..." ] ] "}" .
+DeclarationList = Declaration { "," Declaration }
+Declaration = FieldDecl | DefinitionDecl | AliasDecl | ComprehensionDecl | Embedding .
+FieldDecl = Label { Label } ":" Expression { attribute } .
+DefinitionDecl = Label "::" Expression { attribute } .
+Embedding = Operand .
-AliasDecl = Label "=" Expression .
-TemplateLabel = "<" identifier ">" .
-ConcreteLabel = identifier | simple_string_lit
-OptionalLabel = ConcreteLabel "?"
-Label = ConcreteLabel | OptionalLabel | TemplateLabel .
+AliasDecl = Label "=" Expression .
+BindLabel = "<" identifier ">" .
+ConcreteLabel = identifier | simple_string_lit .
+ExpressionLabel = BindLabel
+Label = ConcreteLabel [ "?" ] | ExpressionLabel "?".
-attribute = "@" identifier "(" attr_elems ")" .
-attr_elems = attr_elem { "," attr_elem }
-attr_elem = attr_string | attr_label | attr_nest .
-attr_label = identifier "=" attr_string .
-attr_nest = identifier "(" attr_elems ")" .
-attr_string = { attr_char } | string_lit .
-attr_char = /* an arbitrary Unicode code point except newline, ',', '"', `'`, '#', '=', '(', and ')' */ .
+attribute = "@" identifier "(" attr_elems ")" .
+attr_elems = attr_elem { "," attr_elem }
+attr_elem = attr_string | attr_label | attr_nest .
+attr_label = identifier "=" attr_string .
+attr_nest = identifier "(" attr_elems ")" .
+attr_string = { attr_char } | string_lit .
+attr_char = /* an arbitrary Unicode code point except newline, ',', '"', `'`, '#', '=', '(', and ')' */ .
```
```
@@ -1019,18 +1076,141 @@
```
```
-Expression Result
-{a: int, a: 1} {a: int(1)}
-{a: int} & {a: 1} {a: int(1)}
+Expression Result (without optional fields)
+{a: int, a: 1} {a: 1}
+{a: int} & {a: 1} {a: 1}
{a: >=1 & <=7} & {a: >=5 & <=9} {a: >=5 & <=7}
{a: >=1 & <=7, a: >=5 & <=9} {a: >=5 & <=7}
{a: 1} & {b: 2} {a: 1, b: 2}
-{a: 1, b: int} & {b: 2} {a: 1, b: int(2)}
+{a: 1, b: int} & {b: 2} {a: 1, b: 2}
{a: 1} & {a: 2} _|_
+
+a: { foo?: string } {}
+b: { foo: "bar" } { foo: "bar" }
+c: { foo?: *"bar" | string } {}
+d: { [string]?: string }
+
+d: a & b { foo: "bar" }
+e: b & c { foo: "bar" }
+f: a & c {}
+g: a & { foo?: number } {}
+h: b & { foo?: number } _|_
```
+
+#### Closed structs
+
+By default, structs are open to adding fields.
+One could say that an optional field `f` with value top (`_`) is defined for any
+unspecified field.
+A _closed struct_ `c` is a struct whose instances may not have fields
+not defined in `c`.
+Closing a struct is equivalent to adding an optional field with value `_|_`
+for any undefined field.
+
+Note that fields created with field comprehensions are not considered
+defined fields.
+Fields inserted by a field comprehension defined in a closed struct
+are only permitted when defined explicitly by a required or optional field.
+
+Syntactically, closed structs can be explicitly created with the `close` builtin
+or implicitly by [definitions](#Definitions).
+
+
+```
+A: close({
+ field1: string
+ field2: string
+})
+
+A1: A & {
+ feild1: string // _|_ feild1 not defined for A
+}
+
+A2: A & {
+ k: v for k,v in { feild1: string } // _|_ feild1 not defined for A
+}
+
+C: close({
+ <_>: _
+})
+
+C2: C & {
+ "\(k)": v for k,v in { thisIsFine: string }
+}
+
+D: close({
+ "\(k)": v for k,v in { x: string } // _|_ field "x" not defined
+})
+```
+
+
+#### Embedding
+
+A struct may contain an _embedded value_, an Operand used
+as a field declaration.
+An embedded value of type struct is unified with the struct in which it is
+embedded, but disregarding the restrictions imposed by closed structs.
+A struct resulting from such a unification is closed if either of the involved
+structs were closed.
+
+
+#### Definitions
+
+A fields of a struct may be declared as a regular field (using `:`)
+or as a _definition_ (using `::`).
+Definitions are not emitted as part of the model and are never required
+to be concrete when emitting data.
+It is illegal to have a normal field and a definition with the same name
+within the same struct.
+Literal structs that are part of a definition's value are implicitly closed.
+An ellipsis `...` in such literal structs keeps them open.
+
+
+```
+// MyStruct is closed and as there is no expression label or `...`, we know
+// this is the full definition.
+MyStruct :: {
+ field: string
+ enabled?: bool
+}
+
+// Without the `...`, this field would not unify with its previous declaration.
+MyStruct :: {
+ enabled: bool | *false
+ ...
+}
+
+myValue: MyStruct & {
+ feild: 2 // error, feild not defined in MyStruct
+ enabled: true // okay
+}
+
+D :: {
+ OneOf
+
+ c: int // adds this field.
+}
+
+OneOf :: { a: int } | { b: int }
+
+
+D1: D & { a: 12, c: 22 } // { a: 12, c: 22 }
+D2: D & { a: 12, b: 33 } // _|_ // cannot define both `a` and `b`
+```
+
+<!---
+JSON fields are usual camelCase. Clashes can be avoided by adopting the
+convention that definitions be TitleCase. Unexported definitions are still
+subject to clashes, but those are likely easier to resolve because they are
+package internal.
+--->
+
+
+#### Field attributes
+
Fields may be associated with attributes.
Attributes define additional information about a field,
such as a mapping to a protobuf tag or alternative
@@ -1054,17 +1234,17 @@
(an identifier) and key value pairs, separated by a `=`.
```
-MyStruct1: {
+myStruct1: {
field: string @go(Field)
attr: int @xml(,attr) @go(Attr)
}
-MyStruct2: {
+myStruct2: {
field: string @go(Field)
attr: int @xml(a1,attr) @go(Attr)
}
-Combined: MyStruct1 & MyStruct2
+Combined: myStruct1 & myStruct2
// field: string @go(Field)
// attr: int @xml(,attr) @xml(a1,attr) @go(Attr)
```
@@ -1108,21 +1288,8 @@
}
```
+<!-- OPTIONAL FIELDS:
-#### Optional fields
-
-An identifier or string label may be followed by a question mark `?`
-to indicate a field is optional.
-The question mark is not part of the field name.
-Constraints defined by an optional field should only be applied when
-a field is present.
-A field with such a marker may be omitted from output and should not cause
-an error when emitting a concrete configuration, even if its value is
-not concrete or bottom.
-The result of unifying two fields only has an optional marker
-if both fields have such a marker.
-
-<!--
The optional marker solves the issue of having to print large amounts of
boilerplate when dealing with large types with many optional or default
values (such as Kubernetes).
@@ -1165,19 +1332,6 @@
```
-->
-```
-Input Result
-a: { foo?: string } {}
-b: { foo: "bar" } { foo: "bar" }
-c: { foo?: *"bar" | string } {}
-
-d: a & b { foo: "bar" }
-e: b & c { foo: "bar" }
-f: a & c {}
-g: a & { foo?: number } _|_
-```
-
-
### Lists
A list literal defines a new value of type list.
@@ -1317,10 +1471,17 @@
```
-### Exported and manifested identifiers
+### Exported identifiers
An identifier of a package may be exported to permit access to it
from another package.
+<!-- TODO: remove hidden fields by replacing the follwing with this text.
+An identifier is exported if
+the first character of the identifier's name is a Unicode upper case letter
+(Unicode class "Lu"); and
+the identifier is declared in the file block.
+All other top-level identifiers used for fields not exported.
+-->
An identifier is exported if both:
the first character of the identifier's name is not a Unicode lower case letter
(Unicode class "Ll") or the underscore "_"; and
@@ -1330,16 +1491,40 @@
An identifier that starts with the underscore "_" is not
emitted in any data output.
Quoted labels that start with an underscore are emitted, however.
+<!-- END REPLACE -->
+
+In addition, any definition declared anywhere within a package of which
+the first character of the identifier's name is a Unicode upper case letter
+(Unicode class "Lu") is visible outside this package.
+Any other defintion is not visible outside the package and resides
+in a separate namespace than namesake identifiers of other packages.
+This is in contrast to ordinary field declarations that do not begin with
+an upper-case letter, which are visible outside the package.
+
+```
+package mypackage
+
+foo: string // not visible outside mypackage
+
+Foo :: { // visible outside mypackage
+ a: 1 // visible outside mypackage
+ B: 2 // visible outside mypackage
+
+ C :: { // visible outside mypackage
+ d: 4 // visible outside mypackage
+ }
+ e :: foo // not visible outside mypackage
+}
+```
+
### Uniqueness of identifiers
Given a set of identifiers, an identifier is called unique if it is different
from every other in the set, after applying normalization following
Unicode Annex #31.
-Two identifiers are different if they are spelled differently.
-<!--
+Two identifiers are different if they are spelled differently
or if they appear in different packages and are not exported.
---->
Otherwise, they are the same.
@@ -1348,6 +1533,9 @@
A field declaration binds a label (the name of the field) to an expression.
The name for a quoted string used as label is the string it represents.
Tne name for an identifier used as a label is the identifier itself.
+<!-- TODO: replace the remainder of this paragraph with the following
+Quoted strings and identifiers can be used used interchangeably.
+-->
Quoted strings and identifiers can be used used interchangeably, with the
exception of identifiers starting with an underscore '_'.
The latter represent hidden fields and are treated in a different namespace.
@@ -1356,6 +1544,7 @@
as described in [default values](#default-values), the field binds to this
value-default pair.
+
<!-- TODO: disallow creating identifiers starting with __
...and reserve them for builtin values.
@@ -1400,7 +1589,7 @@
Literal = BasicLit | ListLit | StructLit .
BasicLit = int_lit | float_lit | string_lit |
null_lit | bool_lit | bottom_lit | top_lit .
-OperandName = identifier | QualifiedIdent.
+OperandName = identifier | QualifiedIdent .
```
### Qualified identifiers
@@ -1487,7 +1676,7 @@
Index = "[" Expression "]" .
Slice = "[" [ Expression ] ":" [ Expression ] "]"
Argument = Expression .
-Arguments = "(" [ ( Argument { "," Argument } ) [ "..." ] [ "," ] ] ")" .
+Arguments = "(" [ ( Argument { "," Argument } ) [ "," ] ] ")" .
```
<!---
Argument = Expression | ( identifer ":" Expression ).
@@ -1951,6 +2140,10 @@
except for `\C`.
- `s =~ r` is true if `s` matches the regular expression `r`.
- `s !~ r` is true if `s` does not match regular expression `r`.
+<!--- TODO: consider the following
+- For regular expression, named capture groups are interpreted as CUE references
+ that must unify with the strings matching this capture group.
+--->
<!-- TODO: Implementations should adopt an algorithm that runs in linear time? -->
<!-- Consider implementing Level 2 of Unicode regular expression. -->
@@ -2266,6 +2459,13 @@
len([1, 2, ...]) >=2
```
+
+### `close`
+
+The builtin function `close` converts a partially defined, or open, struct
+to a fully defined, or closed, struct.
+
+
### `and`
The built-in function `and` takes a list and returns the result of applying