doc/ref: simplify string model

The difference in semantics between strings and bytes hampers their
interchangeability. This can be improved by moving the smarts of
interpreting strings to a bytes, strings (or text) package, which also
makes it very clear when smart interpretation is needed (or not).

This requires removing a bunch of the operations currently supported
for bytes and strings.

At a later stage, we can relax this further and make single- and
double-quoted strings interchangeable.

Change-Id: I1bb02855a3fb5a6c889d2c614fee1519b6c6c780
Reviewed-on: https://cue-review.googlesource.com/c/cue/+/2842
Reviewed-by: Jonathan Amsterdam <jba@google.com>

commit: 4108f8057e33a4e38647563065c45d8883deaab0
author: Marcel van Lohuizen <mpvl@golang.org> Tue Aug 13 18:30:25 2019 +0200
committer: Marcel van Lohuizen <mpvl@golang.org> Sun Aug 18 08:26:38 2019 +0000
tree: da6dbcddb36624bc2d9eb5601c3d90d62eac13dc
parent: 7414faea590ef09847b3ec3ac75af1fb1798b7cc
diff --git a/doc/ref/spec.md b/doc/ref/spec.md
index a71ae0e..3679a4c 100644
--- a/doc/ref/spec.md
+++ b/doc/ref/spec.md

@@ -908,23 +908,21 @@
 
 ### Strings
 
-The _string type_ represents the set of all possible UTF-8 strings,
+The _string type_ represents the set of UTF-8 strings,
 not allowing surrogates.
 The predeclared string type is `string`; it is a defined type.
 
-Strings are designed to be unicode-safe.
-Comparison is done using canonical forms ("é" == "e\u0301").
-A string element is an
-[extended grapheme cluster](https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries),
-which is an approximation of a human-readable character.
-
 The length of a string `s` (its size in bytes) can be discovered using
 the built-in function len.
-A string's extended grapheme cluster can be accessed by integer index
-0 through len(s)-1 for any byte that is part of that grapheme cluster.
 
-To access the individual bytes of a string one should convert it to
-a sequence of bytes first.
+
+### Bytes
+
+The _bytes type_ represents the set of byte sequences.
+A byte sequence value is a (possibly empty) sequence of bytes.
+The number of bytes is called the length of the byte sequence
+and is never negative.
+The predeclared byte sequence type is `bytes`; it is a defined type.
 
 
 ### Bounds
@@ -1295,7 +1293,7 @@
 int       All integral numbers
 float     All decimal floating-point numbers
 string    Any valid UTF-8 sequence
-bytes     Any vallid byte sequence
+bytes     Any valid byte sequence
 
 Derived   Value
 number    int | float
@@ -1576,13 +1574,13 @@
 a[x]
 ```
 
-denotes the element of the list, string, bytes, or struct `a` indexed by `x`.
+denotes the element of a list or struct `a` indexed by `x`.
 The value `x` is called the index or field name, respectively.
 The following rules apply:
 
 If `a` is not a struct:
 
-- `a` is a concrete string or bytes type or a list (which need not be complete)
+- `a` is a list (which need not be complete)
 - the index `x` unified with `int` must be concrete.
 - the index `x` is in range if `0 <= x < len(a)`, where only the
   explicitly defined values of an open-ended list are considered,
@@ -1590,16 +1588,11 @@
 
 The result of `a[x]` is
 
-for `a` of list or bytes type:
+for `a` of list type:
 
-- the list or byte element at index `x`, if `x` is within range
+- the list element at index `x`, if `x` is within range
 - bottom (an error), otherwise
 
-for `a` of string type:
-
-- the grapheme cluster at the `x`th byte (type string), if `x` is within range
-  where `x` may match any byte of the grapheme cluster
-- bottom (an error), otherwise
 
 for `a` of struct type:
 
@@ -1612,12 +1605,6 @@
 [ 1, 2 ][1]     // 2
 [ 1, 2 ][2]     // _|_
 [ 1, 2, ...][2] // _|_
-"He\u0300?"[0]  // "H"
-"He\u0300?"[1]  // "e\u0300"
-"He\u0300?"[2]  // "e\u0300"
-"He\u0300?"[3]  // "e\u0300"
-"He\u0300?"[4]  // "?"
-"He\u0300?"[5]  // _|_
 ```
 
 Both the operand and index value may be a value-default pair.
@@ -1636,17 +1623,29 @@
 v: x[i]                 (x[i], 4)
 ```
 
+
 ### Slice expressions
 
-Slice expressions construct a substring or slice from a string, bytes,
-or list value.
+<!-- TODO: consider removing slices altogether
+Slicing is of marginal utility in CUE. Also, we may end up using
+other notations to achieve the same.
 
-For strings, bytes or lists, the primary expression
+For now it seems safer to remove them and provide slicing as builtins instead:
+
+    list.Slice()
+    strings.Runes().Slice()      // slice by rune
+    strings.Characters().Slice() // slice by character
+    bytes.Slice()                // slice by bytes
+-->
+
+Slice expressions construct a slice from a list value.
+
+The primary expression
 ```
 a[low : high]
 ```
-constructs a substring or slice. The indices `low` and `high` must be
-concrete integers and select
+constructs a slice.
+The indices `low` and `high` must be concrete integers and select
 which elements of operand `a` appear in the result.
 The result has indices starting at 0 and length equal to `high` - `low`.
 After slicing the list `a`
@@ -1677,20 +1676,6 @@
 
 Indices are in range if `0 <= low <= high <= len(a)`,
 otherwise they are out of range.
-For strings, the indices selects the start of the extended grapheme cluster
-at byte position indicated by the index.
-If any of the slice values is out of range or if `low > high`, the result of
-a slice is bottom (error).
-
-```
-"He\u0300?"[:2]  // "He\u0300"
-"He\u0300?"[1:2] // "e\u0300"
-"He\u0300?"[4:5] // "e\u0300?"
-```
-
-
-The result of a successful slice operation is a value of the same type
-as the operand.
 
 Both the slice operand and the slice indices may be associated with a default.
 
@@ -1707,6 +1692,7 @@
 ```
 -->
 
+
 ### Operators
 
 Operators combine operands into expressions.
@@ -1957,8 +1943,7 @@
 - Floating-point values are comparable and ordered, as per the definitions
   for binary coded decimals in the IEEE-754-2008 standard.
 - Floating point numbers may be compared with integers.
-- String values are comparable and ordered, lexically byte-wise after
-  normalization to Unicode normal form NFC.
+- String and bytes values are comparable and ordered lexically byte-wise.
 - Struct are not comparable.
 - Lists are not comparable.
 - The regular expression syntax is the one accepted by RE2,