J8 Notation is built on UTF-8, so let's summarize UTF-8 errors.
err-utf8-encode
Oils stores strings as UTF-8 in memory, so it doesn't encode UTF-8 often.
But it may have a function to encode UTF-8 from a List[Int]. These errors
would be handled:
Integer greater than max code point
Code point in the surrogate range
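The two encoding errors above can be sketched as follows. This is a minimal illustration in Python, not the Oils implementation; the function name `encode_utf8` and the use of `ValueError` are assumptions made for the example.

```python
MAX_CODE_POINT = 0x10FFFF
SURROGATE_MIN, SURROGATE_MAX = 0xD800, 0xDFFF

def encode_utf8(code_points):
    # Hypothetical helper: encode a list of integers as UTF-8 bytes.
    out = bytearray()
    for cp in code_points:
        if cp < 0 or cp > MAX_CODE_POINT:
            raise ValueError('integer %d is outside the code point range' % cp)
        if SURROGATE_MIN <= cp <= SURROGATE_MAX:
            raise ValueError('code point 0x%x is in the surrogate range' % cp)
        if cp < 0x80:            # 1 byte
            out.append(cp)
        elif cp < 0x800:         # 2 bytes
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:       # 3 bytes
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:                    # 4 bytes
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)
```

For example, `encode_utf8([0x41, 0x20AC])` yields the same bytes as Python's built-in encoder, while `encode_utf8([0xD800])` and `encode_utf8([0x110000])` raise.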
err-utf8-decode
A UTF-8 decoder should handle these errors:
Overlong encoding. In UTF-8, each code point should be represented with the
fewest possible bytes.
Overlong encodings are the equivalent of writing the integer 42 as
042, 0042, 00042, etc. This is not allowed.
Surrogate code point. The sequence decodes to a code point in the surrogate
range, which is used only for the UTF-16 encoding, not for string data.
Exceeds max code point. The sequence decodes to an integer that's larger
than the maximum code point.
Bad encoding. A byte is not encoded like a UTF-8 start byte or a
continuation byte.
Incomplete sequence. Too few continuation bytes appeared after the start
byte.
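The five decoding errors can be told apart with a small strict decoder. The sketch below is illustrative only; `decode_utf8` and its error strings are assumptions, and it reports a non-continuation byte in the middle of a sequence as an incomplete sequence.

```python
def decode_utf8(data):
    # Hypothetical helper: decode bytes to a list of code points,
    # raising ValueError for each of the five errors listed above.
    out = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        # length of the sequence, payload bits of the start byte, and the
        # smallest code point that is allowed to use this length
        if b < 0x80:
            length, cp, min_cp = 1, b, 0
        elif 0xC0 <= b <= 0xDF:
            length, cp, min_cp = 2, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:
            length, cp, min_cp = 3, b & 0x0F, 0x800
        elif 0xF0 <= b <= 0xF7:
            length, cp, min_cp = 4, b & 0x07, 0x10000
        else:
            # 0x80-0xBF is a stray continuation byte; 0xF8-0xFF is never valid
            raise ValueError('bad encoding at byte %d' % i)
        for j in range(1, length):
            if i + j >= n or (data[i + j] & 0xC0) != 0x80:
                raise ValueError('incomplete sequence at byte %d' % i)
            cp = (cp << 6) | (data[i + j] & 0x3F)
        if cp < min_cp:
            raise ValueError('overlong encoding at byte %d' % i)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError('surrogate code point at byte %d' % i)
        if cp > 0x10FFFF:
            raise ValueError('exceeds max code point at byte %d' % i)
        out.append(cp)
        i += length
    return out
```

With this sketch, `b'\xc0\x80'` is overlong, `b'\xed\xa0\x80'` is a surrogate, `b'\xf4\x90\x80\x80'` exceeds the max code point, `b'\xff'` is a bad encoding, and `b'\xe2\x82'` is an incomplete sequence.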
J8 String
J8 strings extend JSON strings, and are a primary building block of J8
Notation.
err-j8-str-encode
J8 strings can represent any string — bytes or unicode — so there
are no encoding errors.
err-j8-str-decode
Escape sequence like \u{dc00} should not be in the surrogate range.
This means it doesn't represent a real character. Byte escapes like
\yff should be used instead.
Escape sequence like \u{110000} is greater than the maximum Unicode code
point.
Byte escapes like \yff should not be in u'' strings.
By design, they're only valid in b'' strings.
Implementation-defined limit:
Max string length (NYI)
e.g. more than 4 billion bytes could overflow a length field, in some
implementations
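The three escape-related checks above might look like the sketch below. It only examines `\u{...}` and `\yXX` escapes in the text between the quotes and is not a full J8 string decoder; the name `check_j8_escapes`, the error strings, and the exact hex-digit pattern are assumptions.

```python
import re

# Matches an escaped backslash (so it is skipped), a \u{...} escape,
# or a \yXX byte escape.
_ESCAPE = re.compile(r'\\(?:\\|u\{([0-9a-fA-F]+)\}|y([0-9a-fA-F]{2}))')

def check_j8_escapes(body, prefix):
    # Hypothetical helper. body is the text between the quotes;
    # prefix is 'u' or 'b'. Returns a list of error descriptions.
    errors = []
    for m in _ESCAPE.finditer(body):
        if m.group(1) is not None:
            cp = int(m.group(1), 16)
            if 0xD800 <= cp <= 0xDFFF:
                errors.append('surrogate range: ' + m.group(0))
            elif cp > 0x10FFFF:
                errors.append('exceeds max code point: ' + m.group(0))
        elif m.group(2) is not None and prefix == 'u':
            errors.append('byte escape in u string: ' + m.group(0))
    return errors
```

Under these assumptions, `check_j8_escapes(r'\yff', 'b')` returns an empty list, while the same body with prefix `'u'` reports an error.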
J8 Lines
Roughly speaking, J8 Lines are an encoding for a stream of J8 strings. In
YSH, it's used by @(split command sub).
err-j8-lines-encode
Like J8 strings, J8 Lines have no encoding errors by design.
err-j8-lines-decode
Any error in a J8 quoted string.
e.g. no closing quote, invalid UTF-8, invalid backslash escape, ...
A line with a quoted string has extra text after it.
e.g. "mystr" extra.
An unquoted line is not valid UTF-8.
JSON
err-json-encode
JSON encoding has these errors:
Object of this type can't be serialized.
For example, Str, List, and Dict are Oils objects that can be serialized, but
Eggex, Func, and Range can't.
Circular reference.
e.g. a Dict that points to itself, a List that points to itself, and other
permutations
Float values of NaN, Inf, and -Inf have no JSON representation.
(These encode to null in Oils, following JavaScript.)
Note that invalid UTF-8 bytes like 0xfe produce a Unicode replacement
character, not a hard error.
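For comparison, the same classes of error show up in other JSON encoders. The sketch below uses Python's standard `json` module, not Oils; the helper `try_encode` is hypothetical. Note that Python's defaults differ from the behavior described above: it emits the non-standard token `NaN` unless `allow_nan=False` is passed.

```python
import json

def try_encode(obj, **kwargs):
    # Hypothetical helper: return ('ok', text) or ('error', exception name).
    try:
        return ('ok', json.dumps(obj, **kwargs))
    except (TypeError, ValueError) as e:
        return ('error', type(e).__name__)

# Object of a type that can't be serialized
print(try_encode({1, 2}))                           # ('error', 'TypeError')

# Circular reference: a dict that points to itself
d = {}
d['self'] = d
print(try_encode(d))                                # ('error', 'ValueError')

# NaN is emitted as a non-standard token by default ...
print(try_encode(float('nan')))                     # ('ok', 'NaN')
# ... and is a hard error in strict mode
print(try_encode(float('nan'), allow_nan=False))    # ('error', 'ValueError')
```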
err-json-decode
The encoded message itself is not valid UTF-8.
(Typically, you need to check the unescaped bytes in string literals like
"abc\n".)
Lexical error, like
the message +
an invalid escape "\z" or a truncated escape "\u1"
A single quoted string like u''
Grammatical error
like the message }{
Unexpected trailing input
like the message 42] or {}]
Implementation-defined limits, i.e. outside the grammar:
Integer too big
implementations may decode to a 64-bit integer
Floats that are too big
may decode to Inf
Max array length (NYI)
e.g. more than 4 billion objects in an array could overflow a length
field, in some implementations
Max object length (NYI)
Max depth for arrays and objects (NYI)
to avoid a recursive parser blowing the stack
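The example messages above can be tried against any JSON decoder. The sketch below uses Python's standard `json` module as a stand-in, not Oils; `try_decode` is a hypothetical helper. It also shows how implementation-defined limits vary: Python decodes big integers exactly, and decodes a float that is too big to Inf.

```python
import json
import math

def try_decode(text):
    # Hypothetical helper: return ('ok', value) or ('error', message).
    try:
        return ('ok', json.loads(text))
    except json.JSONDecodeError as e:
        return ('error', e.msg)

# Lexical errors
for msg in ['+', '"\\z"', '"\\u1"', "u''"]:
    assert try_decode(msg)[0] == 'error'

# Grammatical error
assert try_decode('}{')[0] == 'error'

# Unexpected trailing input
assert try_decode('42]') == ('error', 'Extra data')
assert try_decode('{}]') == ('error', 'Extra data')

# Implementation-defined limits: no overflow for integers in Python,
# and a float that is too big decodes to Inf
assert try_decode('1' + '0' * 30) == ('ok', 10 ** 30)
assert math.isinf(try_decode('1e999')[1])
```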
JSON8
err-json8-encode
JSON8 has the same encoding errors as JSON.
However, the encoding is lossless by design. Instead of invalid UTF-8 being
turned into a Unicode replacement character, it can use J8 strings with byte
escapes like b'byte \yfe\yff'.
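The difference between the lossy and lossless encodings can be illustrated as follows. This is a simplified sketch, not the Oils encoder: `j8_bytes_literal` is a hypothetical function that escapes every byte outside printable ASCII as `\yXX`, whereas a real encoder would presumably keep valid UTF-8 text readable.

```python
def lossy(data):
    # JSON-style behavior: invalid bytes become U+FFFD, so the original
    # bytes can't be recovered.
    return data.decode('utf-8', errors='replace')

def j8_bytes_literal(data):
    # Hypothetical encoder: printable ASCII is emitted as-is, and every
    # other byte (plus the quote and backslash) as a \yXX byte escape.
    parts = []
    for b in data:
        if 0x20 <= b < 0x7F and b not in (0x27, 0x5C):
            parts.append(chr(b))
        else:
            parts.append('\\y%02x' % b)
    return "b'" + ''.join(parts) + "'"

data = b'byte \xfe\xff'
print(lossy(data) == 'byte \ufffd\ufffd')   # True
print(j8_bytes_literal(data))               # b'byte \yfe\yff'
```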
err-json8-decode
JSON8 has the same decoding errors as JSON, plus J8 string decoding errors.