NOTE: this is a working document attempting to describe CUE in a way relatable to existing graph unification systems. It is mostly redundant to the spec. Unless one is interested in understanding how to implement CUE or how it relates to the existing body of research, read the spec instead.
CUE is modeled after typed feature structure and graph unification systems such as LKB. There is a wealth of research related to such systems and graph unification in general. This document describes the core semantics of CUE in a notation that allows relating it to this existing body of research.
CUE was inspired by a formalism known as typed attribute structures [Carpenter 1992] or typed feature structures [Copestake 2002], which are used in linguistics to encode grammars and lexicons. Being able to effectively encode large amounts of data in a rigorous manner, this formalism seemed like a great fit for large-scale configuration.
Although CUE configurations are specified as trees, not graphs, implementations can benefit from considering them as graphs when dealing with cycles, and effectively turning them into graphs when applying techniques like structure sharing. Dealing with cycles is well understood for typed attribute structures and as CUE configurations are formally closely related to them, we can benefit from this knowledge without reinventing the wheel.
A CUE configuration can be defined in terms of constraints, which are analogous to typed attribute structures referred to above.
A basic value is any CUE value that is not a struct (or, by extension, a list). All basic values are partially ordered in a lattice, such that for any basic value
a
andb
there is a unique greatest lower bound defined for the subsumption relationa ⊑ b
.
Basic values null true bool 3.14 string "Hello" >=0 <8 re("Hello .*!")
The basic values correspond to their respective types defined earlier.
Struct (and by extension lists), are represented by the abstract notion of a typed feature structure. Each node in a configuration, including the root node, is associated with a constraint.
A typed feature structure_ defined for a finite set of labels
Label
is directed acyclic graph with labeled arcs and values, represented by a tupleC = <Q, q0, υ, δ>
, where
Q
is the finite set of nodes,q0 ∈ Q
, is the root node,υ: Q → T
is the total node typing function, for a finite set of possible termsT
.δ: Label × Q → Q
is the partial feature function,subject to the following conditions:
- there is no node
q
or labell
such thatδ(q, l) = q0
(root)- for every node
q
inQ
there is a pathπ
(i.e. a sequence of members of Label) such thatδ(q0, π) = q
(unique root, correctness)- there is no node
q
or pathπ
such thatδ(q, π) = q
(no cycles)where
δ
is extended to be defined on paths as follows:
δ(q, ϵ) = q
, whereϵ
is the empty pathδ(q, l∙π) = δ(δ(l, q), π)
The substructures of a typed feature structure are the typed feature structures rooted at each node in the structure.
The set of all possible typed feature structures for a given label set is denoted as
𝒞
Label
.The set of terms for label set
Label
is recursively defined as
- every basic value:
P ⊆ T
- every constraint in
𝒞
Label
is a term:𝒞
Label
⊆ T
a reference may refer to any substructure ofC
.- for every
n
valuest₁, ..., tₙ
, and everyn
-ary function symbolf ∈ F_n
, the valuef(t₁,...,tₙ) ∈ T
.
This definition has been taken and modified from [Carpenter, 1992] and [Copestake, 2002].
Without loss of generality, we will henceforth assume that the given set of labels is constant and denote 𝒞
Label
as 𝒞
.
In CUE configurations, the abstract constraints implicated by υ
are CUE expressions. Literal structs can be treated as part of the original typed feature structure and do not need evaluation. Any other expression is evaluated and unified with existing values of that node.
References in expressions refer to other nodes within the C
and represent a copy of the substructure C'
of C
rooted at these nodes. Any references occurring in terms assigned to nodes of C'
are be updated to point to the equivalent node in a copy of C'
.
The functions defined by F
correspond to the binary and unary operators and interpolation construct of CUE, as well as builtin functions.
CUE allows duplicate labels within a struct, while the definition of typed feature structures does not. A duplicate label l
with respective values a
and b
is represented in a constraint as a single label with term &(a, b)
, the unification of a
and b
. Multiple labels may be recursively combined in any order.
For a given collection of constraints
𝒞
, we defineπ ≡
C
π'
to mean that typed feature structureC ∈ 𝒞
contains path equivalence between the pathsπ
andπ'
(i.e.δ(q0, π) = δ(q0, π')
, whereq0
is the root node ofC
); and𝒫
C
(π) = c
to mean that the typed feature structure at the pathπ
inC
isc
(i.e.𝒫
C
(π) = c
if and only ifυ(δ(q0, π)) == c
, whereq0
is the root node ofC
). Subsumption is then defined as follows:C ∈ 𝒞
subsumesC' ∈ 𝒞
, writtenC' ⊑ C
, if and only if:
π ≡
C
π'
impliesπ ≡
C'
π'
𝒫
C
(π) = c
implies𝒫
C'
(π) = c
andc' ⊑ c
The unification of
C
andC'
, denotedC ⊓ C'
, is the greatest lower bound ofC
andC'
in𝒞
ordered by subsumption.
Like with the subsumption relation for basic values, the subsumption relation for constraints determines the mutual placement of constraints within the partial order of all values.
The evaluation function is given by
E: T -> 𝒞
. The unification of two typed feature structures is evaluated as defined above. All other functions are evaluated according to the definitions found earlier in this spec. An error is indicated by_|_
.
We say that a given typed feature structure
C = <Q, q0, υ, δ> ∈ 𝒞
is a well-formed typed feature structure if and only if for all nodesq ∈ Q
, the substructureC'
rooted atq
, is such thatE(υ(q)) ∈ 𝒞
andC' = <Q', q, δ', υ'> ⊑ E(υ(q))
.
The evaluation of a CUE configuration represented by C
is defined as the process of making C
well-formed.
Theory:
Some graph unification algorithms:
The evaluation of a CUE configuration C
is defined as the process of making C
well-formed.
This section does not define any operational semantics. As the unification operation is communitive, transitive, and reflexive, implementations have a considerable amount of leeway in choosing an evaluation strategy. Although most algorithms for the unification of typed attribute structure that have been proposed are near O(n)
, there can be considerable performance benefits of choosing one of the many proposed evaluation strategies over the other. Implementations will need to be verified against the above formal definition.
A constraint function is a unary function f
which for any input a
only returns values that are an instance of a
. For instance, the constraint function f
for string
returns "foo"
for f("foo")
and _|_
for f(1)
. Constraint functions may take other constraint functions as arguments to produce a more restricting constraint function. For instance, the constraint function f
for <=8
returns 5
for f(5)
, >=5 & <=8
for f(>=5)
, and _|_
for f("foo")
.
Constraint functions play a special role in unification. The unification function &(a, b)
is defined as
a & b
if a
and b
are two atomsa & b
if a
and b
are two nodes, respresenting structa(b)
or b(a)
if either a
or b
is a constraint function, respectively.Implementations are free to pick which constraint function is applied if both a
and b
are constraint functions, as the properties of unification will ensure this produces identical results.
A distinguising feature of CUE's unification algorithm is the use of references. In conventional graph unification for typed feature structures, the structures that are unified into the existing graph are independent and pre-evaluated. In CUE, the typed feature structures indicated by references may still need to be evaluated. Some conventional evaluation strategies may not cope well with references that refer to each other. The simple solution is to deploy a breadth-first evaluation strategy, rather than the more traditional depth-first approach. Other approaches are possible, however, and implementations are free to choose which approach is deployed.