---
title: Errors (Oils Reference)
all_docs_url: ..
body_css_class: width40
default_highlighter: oils-sh
preserve_anchor_case: yes
---

<div class="doc-ref-header">

[Oils Reference](index.html) &mdash;
Chapter **Errors**

</div>

This chapter describes **errors** for data languages. An error checklist is
often a nice, concise way to describe a language.

Related: [Oils Error Catalog, With Hints](../error-catalog.html) describes
errors in code.

<span class="in-progress">(in progress)</span>

<div id="dense-toc">
</div>

## UTF8

J8 Notation is built on UTF-8, so let's summarize UTF-8 errors.

### err-utf8-encode

Oils stores strings as UTF-8 in memory, so it doesn't encode UTF-8 often.

But it may have a function to encode UTF-8 from a `List[Int]`. These errors
would be handled:

1. Integer greater than max code point
1. Code point in the surrogate range
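
The two checks above can be sketched in Python. This is an illustration only,
not Oils code: the name `encode_utf8` is hypothetical, and rejecting negative
integers is an added assumption.

```python
MAX_CODE_POINT = 0x10FFFF
SURROGATE_START, SURROGATE_END = 0xD800, 0xDFFF


def encode_utf8(code_points):
    """Encode a list of integers as UTF-8 bytes (hypothetical helper)."""
    out = bytearray()
    for cp in code_points:
        # Error 1: integer greater than the max code point.
        # (Negative integers are rejected too; that is an assumption.)
        if cp < 0 or cp > MAX_CODE_POINT:
            raise ValueError("integer %#x is not a code point" % cp)
        # Error 2: code point in the surrogate range.
        if SURROGATE_START <= cp <= SURROGATE_END:
            raise ValueError("code point %#x is a surrogate" % cp)

        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
    return bytes(out)
```

For example, `encode_utf8([0x3bc])` yields the two bytes of `μ`, while
`encode_utf8([0xD800])` and `encode_utf8([0x110000])` raise `ValueError`.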

### err-utf8-decode

A UTF-8 decoder should handle these errors:

1. Overlong encoding. In UTF-8, each code point should be represented with the
   fewest possible bytes.
   - Overlong encodings are the equivalent of writing the integer `42` as
     `042`, `0042`, `00042`, etc. This is not allowed.
1. Surrogate code point. The sequence decodes to a code point in the surrogate
   range, which is used only for the UTF-16 encoding, not for string data.
1. Exceeds max code point. The sequence decodes to an integer that's larger
   than the maximum code point.
1. Bad encoding. A byte is not encoded like a UTF-8 start byte or a
   continuation byte.
1. Incomplete sequence. Too few continuation bytes appeared after the start
   byte.
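
As a rough illustration, the five checks can be written as a small Python
decoder. The function name `decode_utf8` and the error strings are
hypothetical. Reporting a stray continuation byte as "bad encoding" is an
assumption of this sketch, not something this chapter specifies.

```python
def decode_utf8(data):
    """Decode bytes to a list of code points, or raise ValueError."""
    code_points = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        # Classify the start byte: sequence length, payload bits, and the
        # smallest code point that legitimately needs that many bytes.
        if b < 0x80:
            length, cp, min_cp = 1, b, 0
        elif 0xC0 <= b < 0xE0:
            length, cp, min_cp = 2, b & 0x1F, 0x80
        elif 0xE0 <= b < 0xF0:
            length, cp, min_cp = 3, b & 0x0F, 0x800
        elif 0xF0 <= b < 0xF8:
            length, cp, min_cp = 4, b & 0x07, 0x10000
        else:
            # 0xF8-0xFF never appears in UTF-8; 0x80-0xBF is a continuation
            # byte with no start byte (assumption: same error category).
            raise ValueError("bad encoding at byte %d" % i)

        for j in range(1, length):
            if i + j >= n or (data[i + j] & 0xC0) != 0x80:
                raise ValueError("incomplete sequence at byte %d" % i)
            cp = (cp << 6) | (data[i + j] & 0x3F)

        if cp < min_cp:
            raise ValueError("overlong encoding at byte %d" % i)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogate code point at byte %d" % i)
        if cp > 0x10FFFF:
            raise ValueError("exceeds max code point at byte %d" % i)

        code_points.append(cp)
        i += length
    return code_points
```

The bytes `b"\xc0\xaa"` are an overlong two-byte spelling of the integer `42`,
so `decode_utf8(b"\xc0\xaa")` fails with the overlong error even though
`decode_utf8(b"*")` returns `[42]`.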

## J8 String

J8 strings extend [JSON]($xref) strings, and are a primary building block of J8
Notation.

### err-j8-str-encode

J8 strings can represent any string &mdash; bytes or Unicode &mdash; so there
are **no encoding errors**.

### err-j8-str-decode

1. Escape sequence like `\u{dc00}` should not be in the surrogate range.
   - This means it doesn't represent a real character. Byte escapes like
     `\yff` should be used instead.
1. Escape sequence like `\u{110000}` is greater than the maximum Unicode code
   point.
1. Byte escapes like `\yff` should not be in a `u''` string.
   - By design, they're only valid in `b''` strings.

Implementation-defined limit:

4. Max string length (NYI)
   - e.g. more than 4 billion bytes could overflow a length field, in some
     implementations
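
A minimal sketch of the three escape checks, in Python. It only inspects
`\u{...}` and `\yXX` escapes in the body of a string whose prefix (`u` or `b`)
is already known. The helper name `check_j8_escapes` and its messages are
hypothetical, and the max-length limit is not modeled.

```python
import re

# Any backslash escape; groups 1 and 2 capture the two kinds checked here.
# Matching "\\." as a fallback keeps an escaped backslash from being
# misread as the start of \u{...} or \yXX.
_ESCAPE = re.compile(r"\\(?:u\{([0-9a-fA-F]+)\}|y([0-9a-fA-F]{2})|.)", re.DOTALL)


def check_j8_escapes(prefix, body):
    """Return a list of error strings for the escapes in a J8 string body."""
    errors = []
    for m in _ESCAPE.finditer(body):
        if m.group(1) is not None:
            cp = int(m.group(1), 16)
            if 0xD800 <= cp <= 0xDFFF:
                errors.append("%s is in the surrogate range" % m.group(0))
            elif cp > 0x10FFFF:
                errors.append("%s exceeds the max code point" % m.group(0))
        elif m.group(2) is not None and prefix == "u":
            errors.append("byte escape %s in a u'' string" % m.group(0))
    return errors
```

With this sketch, `check_j8_escapes("b", r"\yff")` returns an empty list,
while the same body with the `"u"` prefix reports one error.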

## J8 Lines

Roughly speaking, J8 Lines are an encoding for a stream of J8 strings. In
[YSH]($xref), it's used by `@(split command sub)`.

### err-j8-lines-encode

Like J8 strings, J8 Lines have no encoding errors by design.

### err-j8-lines-decode

1. Any error in a J8 quoted string.
   - e.g. no closing quote, invalid UTF-8, invalid backslash escape, ...
1. A line with a quoted string has extra text after it.
   - e.g. `"mystr" extra`.
1. An unquoted line is not valid UTF-8.
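
The following Python sketch shows the shape of these checks for a simplified
case. It is not the Oils decoder: it handles only JSON-style double-quoted
lines (via Python's `json` module) and unquoted lines, it ignores `u''` and
`b''` strings, and stripping surrounding spaces and tabs is an assumption. The
name `decode_j8_line` is hypothetical.

```python
import json

_DECODER = json.JSONDecoder()


def decode_j8_line(raw):
    """Decode one line (bytes, without the newline) of a simplified stream."""
    try:
        line = raw.decode("utf-8")
    except UnicodeDecodeError:
        raise ValueError("line is not valid UTF-8")

    line = line.strip(" \t")  # assumption: surrounding whitespace is ignored
    if not line.startswith('"'):
        return line  # unquoted line: taken literally

    try:
        # raw_decode() parses one JSON value and reports where it ended.
        value, end = _DECODER.raw_decode(line)
    except json.JSONDecodeError as e:
        raise ValueError("error in quoted string: %s" % e.msg)

    if line[end:]:
        raise ValueError("extra text after quoted string: %r" % line[end:])
    return value
```

Here `decode_j8_line(b'"mystr" extra')` raises because of the trailing text,
and `decode_j8_line(b'"no close')` raises because the quoted string itself is
malformed.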

## JSON

### err-json-encode

JSON encoding has these errors:

1. Object of this type can't be serialized.
   - For example, `Str List Dict` are Oils objects that can be serialized, but
     `Eggex Func Range` can't.
1. Circular reference.
   - e.g. a Dict that points to itself, a List that points to itself, and other
     permutations
1. Float values of NaN, Inf, and -Inf can't be encoded.
   - (These encode to `null` in Oils, following JavaScript.)

Note that invalid UTF-8 bytes like `0xfe` produce a Unicode replacement
character, not a hard error.
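
The same three conditions exist in other encoders. As an analogy only, this
sketch uses Python's standard `json` module, which is not the Oils encoder and
differs in its details: it raises on non-finite floats when `allow_nan=False`
rather than emitting `null`. The helper `encode_or_error` is hypothetical.

```python
import json


def encode_or_error(obj):
    """Return the JSON text for obj, or the name of the exception raised."""
    try:
        # allow_nan=False turns NaN, Inf and -Inf into an error instead of
        # emitting the non-standard tokens NaN and Infinity.
        return json.dumps(obj, allow_nan=False)
    except (TypeError, ValueError) as e:
        return type(e).__name__


cycle = {}
cycle["self"] = cycle  # a dict that points to itself

print(encode_or_error({"ok": [1, "two"]}))  # serializable
print(encode_or_error(range(3)))            # type can't be serialized
print(encode_or_error(cycle))               # circular reference
print(encode_or_error(float("nan")))        # non-finite float

# Invalid UTF-8 can be replaced with U+FFFD instead of being rejected.
replaced = b"\xfe".decode("utf-8", errors="replace")
print(replaced == "\ufffd")
```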

### err-json-decode

1. The encoded message itself is not valid UTF-8.
   - (Typically, you need to check the unescaped bytes in string literals
     `"abc\n"`).
1. Lexical error, like
   - the message `+`
   - an invalid escape `"\z"` or a truncated escape `"\u1"`
   - a single-quoted string like `u''`
1. Grammatical error
   - like the message `}{`
1. Unexpected trailing input
   - like the message `42]` or `{}]`
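
Each example message above can be tried against any JSON decoder. This sketch
uses Python's standard `json` module as a stand-in, so the error text is
Python's, not Oils'. The helper `decode_error` is hypothetical.

```python
import json


def decode_error(message):
    """Return None if the message decodes, else the error text."""
    try:
        json.loads(message)
    except ValueError as e:
        # json.JSONDecodeError and UnicodeDecodeError both derive from
        # ValueError, so this covers bad syntax and bad UTF-8 input.
        return str(e)
    return None


BAD_MESSAGES = [
    b'"abc\xfe"',  # not valid UTF-8
    "+",           # lexical error
    '"\\z"',       # invalid escape
    '"\\u1"',      # truncated escape
    "u''",         # single-quoted string
    "}{",          # grammatical error
    "42]",         # trailing input
    "{}]",         # trailing input
]

for msg in BAD_MESSAGES:
    print(repr(msg), "->", decode_error(msg))
```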

Implementation-defined limits, i.e. outside the grammar:

5. Integer too big
   - implementations may decode to a 64-bit integer
1. Floats that are too big
   - may decode to `Inf`
1. Max array length (NYI)
   - e.g. more than 4 billion objects in an array could overflow a length
     field, in some implementations
1. Max object length (NYI)
1. Max depth for arrays and objects (NYI)
   - to avoid a recursive parser blowing the stack
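
Two of these limits are easy to observe with Python's `json` module, used here
purely for illustration. Python integers are arbitrary precision, so the
64-bit check is imposed through the `parse_int` hook of `json.loads`. The hook
function `parse_int64` is a hypothetical name.

```python
import json
import math

INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1


def parse_int64(text):
    """Parse an integer literal, rejecting values outside signed 64 bits."""
    n = int(text)
    if not INT64_MIN <= n <= INT64_MAX:
        raise ValueError("integer too big: %s" % text)
    return n


def loads_int64(message):
    """Decode JSON, applying the 64-bit limit to every integer literal."""
    return json.loads(message, parse_int=parse_int64)


# A float literal beyond the double range decodes to infinity.
too_big_float = json.loads("1e999")
print(math.isinf(too_big_float))
```

`loads_int64("9223372036854775807")` succeeds, while adding one to that
literal raises `ValueError`.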

## JSON8

### err-json8-encode

JSON8 has the same encoding errors as JSON.

However, the encoding is lossless by design. Instead of invalid UTF-8 being
turned into a Unicode replacement character, it can use J8 strings with byte
escapes like `b'byte \yfe\yff'`.
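
To make the lossless idea concrete, here is a hedged Python sketch that always
emits a `b''` string and writes each byte that is not valid UTF-8 as a `\yXX`
escape. It is not the Oils encoder: the name `encode_j8_bytes` is
hypothetical, and the handling of quotes, backslashes, and control bytes is an
assumption.

```python
def encode_j8_bytes(data):
    """Encode bytes as a b'' string, escaping bytes that aren't valid UTF-8."""
    # errors="surrogateescape" maps each undecodable byte 0xNN to the lone
    # surrogate U+DCNN, so invalid bytes can be found after decoding.
    text = data.decode("utf-8", errors="surrogateescape")
    parts = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            parts.append("\\y%02x" % (cp - 0xDC00))  # original invalid byte
        elif ch in ("\\", "'"):
            parts.append("\\" + ch)                  # assumption: \\ and \'
        elif cp < 0x20:
            parts.append("\\y%02x" % cp)             # assumption: control bytes
        else:
            parts.append(ch)                         # valid UTF-8 passes through
    return "b'" + "".join(parts) + "'"
```

With this sketch, the bytes `b"byte \xfe\xff"` encode to the string
`b'byte \yfe\yff'`, so the original bytes can be recovered exactly. A lossy
encoder would replace both bytes with U+FFFD.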

### err-json8-decode

JSON8 has the same decoding errors as JSON, plus J8 string decoding errors.

See [err-j8-str-decode](#err-j8-str-decode).

<!--

## Packle

TODO: Not implemented!

### err-packle-encode

Packle has no encoding errors!

1. TODO: Unserializable `Eggex Func Range` can be turned into "wire Tuple"
   `(type_name: Str, heap_id: Int)`.
   - When you read a packle into Python, you'll get a tuple.
   - When you read a packle back into YSH, you'll get a `value.Tombstone`?
1. Circular references are allowed. Packle data expresses a **graph**, not a
   tree.
1. Float values NaN, Inf, and -Inf use their binary representations.
1. Both Unicode and binary data are allowed.

### err-packle-decode

TODO

-->