Warning: Work in progress! Leave feedback on Zulip or Github if you'd like this doc to be updated.
HTM8 is a data language, which is part of J8 Notation:
<li><li>
example
sed-like transformation!
Currently, all of the Oils docs are parsed and processed with it.
We would like to "lift it up" into an API for YSH users.
<a>
</a>
<img/>
HTML5 doesn't have the notion of self-closing tags. Instead, it silently ignores the trailing /.
We are bringing it back for humans, because we think it's too hard for people to remember the 16 void elements.
And a lack of balanced tags causes visual bugs that are hard to debug. It would be better to get an error earlier.
5 closely related Syntaxes
<a missing>
<a empty=>
<a href=foo>
<a href="foo">
<a href='foo'>
Note: <a href=/> is disallowed because it's ambiguous. Use <a href="/"> or <a href=/ > or <a href= />.
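As a rough illustration of the five syntaxes, here is a minimal Python sketch that classifies the attributes found inside a start tag. The function name parse_attrs, the regex, and the kind labels are assumptions made for this example; this is not the Oils implementation.

```python
import re

# Hypothetical attribute pattern: a name, then optionally '=' and a value.
_ATTR = re.compile(r'''
    ([A-Za-z][A-Za-z0-9_-]*)        # attribute name
    (?: (=)                         # optional '='
        (?: "([^"]*)"               # double-quoted value
          | '([^']*)'               # single-quoted value
          | ([^\s"'=<>`]+)          # unquoted value
        )?
    )?
''', re.VERBOSE)

def parse_attrs(s):
    """Return a list of (name, kind, value) for the attribute part of a tag."""
    attrs = []
    pos = 0
    while True:
        # Skip whitespace between attributes.
        while pos < len(s) and s[pos].isspace():
            pos += 1
        if pos == len(s):
            return attrs
        m = _ATTR.match(s, pos)
        if m is None:
            raise ValueError('bad attribute at position %d' % pos)
        name, eq, dq, sq, unq = m.groups()
        if eq is None:
            attrs.append((name, 'missing', None))
        elif dq is not None:
            attrs.append((name, 'double-quoted', dq))
        elif sq is not None:
            attrs.append((name, 'single-quoted', sq))
        elif unq is not None:
            attrs.append((name, 'unquoted', unq))
        else:
            attrs.append((name, 'empty', ''))
        pos = m.end()
```

For example, parse_attrs('empty= missing') yields two attributes in this sketch: an empty one named empty and a missing one named missing.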
& < > " ' should be escaped as &amp; &lt; &gt; &quot; &apos;.
But we are lenient and allow raw > between tags:
<p> foo > bar </p>
and raw < inside tags:
<span foo="<" > foo </span>
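The strict escaping rule can be sketched in a few lines of Python. The function name escape_htm8 is hypothetical, and this version always escapes all five characters rather than applying the lenient exceptions. Note that Python's own html.escape() emits &#x27; for the single quote, so the sketch uses plain string replacement to get &apos; instead.

```python
def escape_htm8(s):
    """Escape the five special characters; '&' must be replaced first."""
    return (s.replace('&', '&amp;')
             .replace('<', '&lt;')
             .replace('>', '&gt;')
             .replace('"', '&quot;')
             .replace("'", '&apos;'))
```

Replacing & first matters: doing it later would re-escape the ampersands introduced by the other replacements.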
Like HTML5, we support explicit <![CDATA[, even though it's implicit in the tags.
&amp; - named
&#999; - decimal
&#xff; - hex
<!-- -->
<? ?> (XML processing instruction)
<!DOCTYPE html> from HTML5
<?xml version= ... ?> from XML - this is a comment / processing instruction
<script> and <style> are Leaf Tags with Special Lexing
<script> <style>
Note: we still have CDATA for compatibility.
<source> ...
Angle brackets:
<a foo="<"> is allowed, but <a foo=">"> is disallowed
<p> 4>3 </p> is allowed, but <p> 4<3 </p> is disallowed
This makes lexing the top-level structure easier.
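To see why these rules simplify lexing, consider the following Python sketch. Because a tag never contains a raw > and text never contains a raw <, a single regex with two alternatives can split the input into tags and text. The names lex_top_level and _TOP_LEVEL are made up for this example, and the sketch only handles start and end tags; it ignores comments, declarations, and the special lexing of <script> and <style>.

```python
import re

# A tag runs from '<' to the first '>'; text is any run without '<'.
_TOP_LEVEL = re.compile(r'(?P<tag></?[A-Za-z][^>]*>)|(?P<text>[^<]+)')

def lex_top_level(s):
    """Split s into ('tag', ...) and ('text', ...) tokens."""
    tokens = []
    pos = 0
    while pos < len(s):
        m = _TOP_LEVEL.match(s, pos)
        if m is None:
            raise ValueError('unexpected character at position %d' % pos)
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

With this sketch, '<p> 4>3 </p>' lexes as a tag, a text run, and a tag, while '<p> 4<3 </p>' raises ValueError at the raw <. A tag such as <a foo=">"> would be cut off at the first >, which shows why that form has to be disallowed for this approach to work.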
& is allowed, unlike in XML:
<a href="?foo=42&bar=99">
HTM8 tags must be balanced to convert them to XML
<script></SCRIPT> isn't matched
<SCRipt></SCRipt>
<style>
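The balance requirement can be illustrated with a small stack-based check in Python. This is only a sketch under assumptions: the name check_balanced and the tag regex are invented here, tag names are compared case-sensitively so that <script></SCRIPT> is rejected, and the contents of <script> and <style> get no special treatment.

```python
import re

# Hypothetical tag pattern: optional '/', a name, then everything up to '>'.
_TAG = re.compile(r'<(/?)([A-Za-z][A-Za-z0-9-]*)([^>]*)>')

def check_balanced(doc):
    """Raise ValueError if start and end tags in doc don't pair up exactly."""
    stack = []
    for m in _TAG.finditer(doc):
        is_end = m.group(1) == '/'
        name = m.group(2)
        rest = m.group(3)
        if is_end:
            # The end tag must match the most recent open tag exactly.
            if not stack or stack[-1] != name:
                raise ValueError('unbalanced end tag </%s>' % name)
            stack.pop()
        elif not rest.endswith('/'):
            # A start tag that is not self-closing must be closed later.
            stack.append(name)
    if stack:
        raise ValueError('unclosed tag <%s>' % stack[-1])
```

In this sketch, check_balanced('<p><img/></p>') returns without error, while check_balanced('<script></SCRIPT>') raises ValueError.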
NUL bytes aren't allowed - currently due to re2c sentinel. Two options:
Encodings other than UTF-8. HTM8 is always UTF-8.
Unicode Tag names and attribute names.
<a href=">"> - no literal > inside quotes
<a href="&">
HTML notes:
There are 5 kinds of tags:
<title> <textarea>
<style> <xmp> <iframe> ?
and we have
<script>
</script> in a string literal?
<math> <svg> - XML rules
That is, we use exhaustive reasoning.
It's meant to be easy to implement.
Using re2c as the "choice" primitive.
<a href="&">
NO_SPECIAL_TAGS - get rid of special cases for <script> and <style>
Conflicts between HTML5 and XML:
In XML, <source> is like any tag, and must be closed.
In HTML, <source> is a VOID tag, and must NOT be closed.
In XML, <script> and <style> don't have special treatment.
In HTML, they do.
The header is different: <!DOCTYPE html> vs. <?xml version= ... ?>
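The first conflict is easy to observe with a stock XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed_xml and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed_xml(doc):
    """Return True if an XML parser accepts doc, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# HTML style: <source> is a void tag and is left unclosed.
html_style = '<video><source src="a.mp4"></video>'
# XML style: <source> is self-closed, like any other empty element.
xml_style = '<video><source src="a.mp4"/></video>'
```

Here is_well_formed_xml(html_style) is False because the XML parser sees a mismatched end tag, while is_well_formed_xml(xml_style) is True.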
HTML: <a empty= missing> is two attributes.
Right now we don't handle <a empty = "missing"> as a single attribute.
TODO:
querySelectorAll()
Just emit it! This always works, by design.
&
<
BadGreaterThan -> &gt;
<script> and <style>
<![CDATA[
& <
<!DOCTYPE foo>
<?xml version=>, remove <!DOCTYPE html>
<svg> and <math>?
<svg> and <math> are foreign XML content.
We might want to support this.
This is one way:
<object data="math.xml" type="application/mathml+xml"></object>
<object data="drawing.xml" type="image/svg+xml"></object>
Then we don't need special parsing?