Warning: Work in progress! Leave feedback on Zulip or GitHub if you'd like this doc to be updated.

HTM8 - An Easy Subset of HTML5, With Errors

Table of Contents
Basic Structure
Text Content
3 Kinds of Character Code
3 Kinds of Tag
2 Kinds of Attribute
2 Kinds of Comment
Special Rules, From HTML
2 Tags Cause Special Lexing
16 VOID Tags Change Parsing
Bonus: XML Mode
Under the Hood
3 Layers of Lexing
What Do You Use This for?
Algorithms
Emitting HTM8 as HTML5
Parsing XML
Converting to XML?
Leniency
Related
FAQ
What Doesn't This Cover?
TODO

Basic Structure

Text Content

Text content may contain anything except & and <.

These characters must be escaped as &amp; and &lt;.

A bare > is allowed, or you can escape it as &gt;.
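
As a minimal sketch of these rules, here is how a tool might escape a string for use as text content, and check that existing text obeys them. The function names and the regex are hypothetical; the check only verifies that each & begins something shaped like a character code, and doesn't look names up in an entity table.

```python
import re

# Hypothetical check: a '<' anywhere, or an '&' that doesn't start something
# shaped like a character code (named, decimal, or hex).
BAD_TEXT = re.compile(r"<|&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)")


def escape_text(s: str) -> str:
    """Escape arbitrary text so it can appear as HTM8 text content."""
    # Escape '&' first, so the '&' in '&lt;' isn't escaped a second time.
    # '>' is allowed as-is, so it's left alone.
    return s.replace("&", "&amp;").replace("<", "&lt;")


def is_valid_text(s: str) -> bool:
    """Return True if s has no raw '<' and every '&' begins a character code."""
    return BAD_TEXT.search(s) is None


print(escape_text("a < b && c > d"))  # a &lt; b &amp;&amp; c > d
print(is_valid_text("AT&T"))          # False
```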

3 Kinds of Character Code

  1. &amp; - named
  2. &#999; - decimal
  3. &#xff; - hex
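
The three kinds of character code can be recognized with one regular expression. This sketch classifies a single reference and decodes it. The regex and function name are illustrative assumptions, and named references are looked up in Python's html.entities.html5 table, which may not be the table HTM8 itself uses.

```python
import re
from html.entities import html5  # maps names like 'amp;' to characters

# Hypothetical pattern with one alternative per kind of character code.
CHAR_REF = re.compile(
    r"&(?:(?P<named>[a-zA-Z][a-zA-Z0-9]*)"  # 1. named:   &amp;
    r"|#(?P<dec>[0-9]+)"                    # 2. decimal: &#999;
    r"|#[xX](?P<hex>[0-9a-fA-F]+));"        # 3. hex:     &#xff;
)


def decode_char_ref(ref: str) -> tuple[str, str]:
    """Return (kind, character) for one reference like '&amp;' or '&#xff;'."""
    m = CHAR_REF.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a character reference: {ref!r}")
    if m.group("named") is not None:
        # Raises KeyError for unknown names such as '&bogus;'.
        return "named", html5[m.group("named") + ";"]
    if m.group("dec") is not None:
        return "decimal", chr(int(m.group("dec")))
    return "hex", chr(int(m.group("hex"), 16))


print(decode_char_ref("&amp;"))  # ('named', '&')
```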

3 Kinds of Tag

  1. Start, like <div>
  2. End, like </div>
  3. StartEnd, like <br/>
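
A sketch of telling the three kinds apart: the position of the / is enough. The regex below is a hypothetical simplification; in particular, it doesn't handle > inside quoted attribute values.

```python
import re

# Hypothetical, simplified tag pattern: '<', optional '/', a name, anything
# up to '>', and an optional '/' just before '>'.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9:-]*)([^>]*?)(/?)>")


def tag_kind(tag: str) -> tuple[str, str]:
    """Classify a single tag as Start, End, or StartEnd, and return its name."""
    m = TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a tag: {tag!r}")
    slash_before_name, name, _attrs, slash_before_gt = m.groups()
    if slash_before_name:
        return "End", name        # </div>
    if slash_before_gt:
        return "StartEnd", name   # <br/>
    return "Start", name          # <div>


print(tag_kind('<img src="x.png" />'))  # ('StartEnd', 'img')
```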

2 Kinds of Attribute

  1. Unquoted, like <a href=foo>
  2. Quoted, like <a href="foo">
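
A sketch of lexing the attribute part of a start tag into the two kinds. The regex is a hypothetical approximation: it accepts both double and single quotes, as HTML5 does (an assumption for HTM8), and it requires every attribute to have a value, since only these two kinds are listed.

```python
import re

# Hypothetical pattern for one attribute: whitespace, a name, '=', then a value
# that is double-quoted, single-quoted, or unquoted.
ATTR = re.compile(
    r"""\s+([a-zA-Z_:][a-zA-Z0-9_.:-]*)="""
    r"""(?:"([^"]*)"|'([^']*)'|([^\s"'<>=`]+))"""
)


def lex_attrs(s: str) -> list[tuple[str, str, str]]:
    """Lex text like ' href=foo title="a b"' into (name, value, kind) triples."""
    result = []
    pos = 0
    while s[pos:].strip():  # stop when only whitespace is left
        m = ATTR.match(s, pos)
        if m is None:
            raise ValueError(f"bad attribute at position {pos}: {s[pos:]!r}")
        name, double_q, single_q, unquoted = m.groups()
        if unquoted is not None:
            result.append((name, unquoted, "Unquoted"))
        elif double_q is not None:
            result.append((name, double_q, "Quoted"))
        else:
            result.append((name, single_q, "Quoted"))
        pos = m.end()
    return result


print(lex_attrs(' href=foo title="a b"'))
# [('href', 'foo', 'Unquoted'), ('title', 'a b', 'Quoted')]
```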

2 Kinds of Comment

  1. <!-- -->
  2. <? ?> (XML processing instruction)
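
Both kinds of comment are delimited by fixed strings, so a non-greedy regex can find them. This hypothetical helper removes them from a document; note that it would give wrong results inside the tags whose contents are lexed specially.

```python
import re

# One alternative per kind of comment. '.*?' is non-greedy, so each match stops
# at the first terminator, and (?s) lets a comment span several lines.
COMMENT = re.compile(r"(?s)<!--.*?-->|<\?.*?\?>")


def strip_comments(doc: str) -> str:
    """Remove <!-- --> comments and <? ?> processing instructions."""
    return COMMENT.sub("", doc)


print(strip_comments('<?xml version="1.0"?><p>a<!-- note -->b</p>'))  # <p>ab</p>
```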

Special Rules, From HTML

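2 Tags Cause Special Lexing

The contents of <script> and <style> are lexed as raw text: everything up to the matching end tag is taken literally, so < and & aren't special inside them.

Note: we still lex CDATA sections, like <![CDATA[ ... ]]>, for compatibility.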
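
Special lexing means that once the lexer sees the start tag, it scans ahead for the matching end tag instead of looking for nested tags. This sketch assumes, as in HTML5, that the 2 tags are <script> and <style>, and matches the end tag case-insensitively. A CDATA section is handled the same way, by searching for the closing ]]>. The function names are hypothetical.

```python
import re

RAW_TEXT_TAGS = ("script", "style")  # assumption: the 2 special tags, as in HTML5


def read_raw_text(doc: str, pos: int, tag: str) -> tuple[str, int]:
    """Given pos just after a <script> or <style> start tag, return the raw
    contents and the position just after the matching end tag."""
    if tag not in RAW_TEXT_TAGS:
        raise ValueError(f"<{tag}> isn't lexed as raw text")
    # Nothing inside is interpreted: '<' and '&' are taken literally.
    end_tag = re.compile(r"</" + re.escape(tag) + r"\s*>", re.IGNORECASE)
    m = end_tag.search(doc, pos)
    if m is None:
        raise ValueError(f"unclosed <{tag}>")
    return doc[pos:m.start()], m.end()


def read_cdata(doc: str, pos: int) -> tuple[str, int]:
    """Given pos at '<![CDATA[', return the contents and the position after ']]>'."""
    start = "<![CDATA["
    if not doc.startswith(start, pos):
        raise ValueError("expected <![CDATA[")
    end = doc.find("]]>", pos + len(start))
    if end == -1:
        raise ValueError("unclosed CDATA section")
    return doc[pos + len(start):end], end + len("]]>")


doc = "<script>if (a < b && c) { f(); }</script><p>"
body, after = read_raw_text(doc, len("<script>"), "script")
print(body)         # if (a < b && c) { f(); }
print(doc[after:])  # <p>
```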

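16 VOID Tags Change Parsing

Void tags, like <br> and <img>, never have an end tag, so the parser must not expect a matching one.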
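
For example, a checker that matches Start and End tags has to skip void tags rather than push them on its stack. This sketch is hypothetical: the set of void tags below holds the 13 void elements in the WHATWG HTML standard, not necessarily the 16 that HTM8 recognizes, and its tag regex ignores comments, raw text, and > inside quoted values.

```python
import re

# Assumption: the void elements listed in the WHATWG HTML standard.
VOID_TAGS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})

# Simplified tag pattern: ignores comments, raw text, and '>' in quoted values.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9:-]*)[^>]*?(/?)>")


def check_balanced(doc: str) -> None:
    """Raise ValueError unless every Start tag has a matching End tag."""
    stack: list[str] = []
    for m in TAG.finditer(doc):
        is_end, name, self_closing = m.group(1), m.group(2).lower(), m.group(3)
        if is_end:
            if name in VOID_TAGS:
                raise ValueError(f"void tag with an end tag: </{name}>")
            if not stack or stack.pop() != name:
                raise ValueError(f"unexpected </{name}>")
        elif self_closing or name in VOID_TAGS:
            continue  # StartEnd tags and void tags don't open an element
        else:
            stack.append(name)
    if stack:
        raise ValueError(f"unclosed <{stack[-1]}>")


check_balanced("<p>one<br>two<img src=x.png></p>")  # OK: no end tags needed
```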

Bonus: XML Mode

Then you can query HTML with XML tools.
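
Assuming XML mode means a document is also well-formed XML (for example, void tags written as <br/>, quoted attribute values, and only XML's predefined named references such as &amp;), it can be handed to an ordinary XML parser and queried. A sketch with Python's xml.etree.ElementTree, which supports a limited subset of XPath:

```python
import xml.etree.ElementTree as ET

# An HTM8 document that is also well-formed XML: '<br/>' instead of '<br>',
# quoted attribute values, and only XML's predefined named references.
doc = """<html><body>
<p class="note">See <a href="/a.html">A</a> &amp; <a href="/b.html">B</a>.<br/></p>
<p><a href="/c.html">C</a></p>
</body></html>"""

root = ET.fromstring(doc)

# ElementTree's XPath subset: './/' searches all descendants, and
# [@class='note'] filters on an attribute value.
links = [a.get("href") for a in root.iterfind(".//p[@class='note']/a")]
print(links)  # ['/a.html', '/b.html']
```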

Under the Hood

3 Layers of Lexing

  1. Tag
  2. Attributes within a Tag
  3. Quoted Value for Attributes
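
The three layers can be sketched as three small lexers, each run on the output of the one before: the first splits a document into text and tags, the second splits a tag into its name and attributes, and the third splits a quoted attribute value into literal text and character codes. The regexes and function names are hypothetical simplifications; for instance, layer 1 doesn't handle comments, raw text, or > inside quoted values.

```python
import re

# Layer 1: text runs and whole tags. Simplified: no comments, CDATA,
# <script>/<style> raw text, or '>' inside quoted attribute values.
TOP = re.compile(r"<[^>]*>|[^<]+")

# Layer 2: the tag name, then attributes that are quoted or unquoted.
TAG_NAME = re.compile(r"</?([a-zA-Z][a-zA-Z0-9:-]*)")
ATTR = re.compile(r"""\s+([a-zA-Z_:][a-zA-Z0-9_.:-]*)=(?:"([^"]*)"|([^\s"'<>=`]+))""")

# Layer 3: character codes (named, decimal, hex) and the literal text between them.
VALUE_PART = re.compile(r"&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);|[^&]+")


def lex_top(doc: str) -> list[tuple[str, str]]:
    """Layer 1: split a document into ('Text', s) and ('Tag', s) tokens."""
    tokens, pos = [], 0
    while pos < len(doc):
        m = TOP.match(doc, pos)
        if m is None:  # a '<' with no closing '>'
            raise ValueError(f"unterminated tag at position {pos}")
        tok = m.group(0)
        tokens.append(("Tag" if tok.startswith("<") else "Text", tok))
        pos = m.end()
    return tokens


def lex_tag(tag: str) -> tuple[str, list[tuple[str, str, bool]]]:
    """Layer 2: return a tag's name and its (name, raw_value, is_quoted) attributes."""
    m = TAG_NAME.match(tag)
    if m is None:
        raise ValueError(f"bad tag: {tag!r}")
    inner = tag[m.end():-1].rstrip("/")  # text between the name and '>' or '/>'
    attrs, pos = [], 0
    while inner[pos:].strip():
        a = ATTR.match(inner, pos)
        if a is None:
            raise ValueError(f"bad attribute in {tag!r}")
        name, quoted, unquoted = a.groups()
        if quoted is not None:
            attrs.append((name, quoted, True))
        else:
            attrs.append((name, unquoted, False))
        pos = a.end()
    return m.group(1), attrs


def lex_value(raw: str) -> list[str]:
    """Layer 3: split a quoted value into literal runs and character codes."""
    parts, pos = [], 0
    while pos < len(raw):
        m = VALUE_PART.match(raw, pos)
        if m is None:  # an '&' that doesn't start a character code
            raise ValueError(f"bad '&' at position {pos} in {raw!r}")
        parts.append(m.group(0))
        pos = m.end()
    return parts


doc = '<a href="/x?a=1&amp;b=2" class=link>hi</a>'
for kind, tok in lex_top(doc):
    if kind == "Tag":
        _name, attrs = lex_tag(tok)
        for attr_name, value, is_quoted in attrs:
            if is_quoted:
                print(attr_name, lex_value(value))  # href ['/x?a=1', '&amp;', 'b=2']
```

Keeping the layers separate means each regex stays small, and a later layer only runs on the spans the earlier one found.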

What Do You Use This for?

TODO:

Algorithms

Emitting HTM8 as HTML5

Just emit it! HTM8 is a subset of HTML5, so this always works, by design.

Parsing XML

Conflicts between HTML5 and XML:
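
For example, three features allowed above (void tags without an end tag, named character codes, and unquoted attributes) are valid HTML5 but are rejected by an XML parser. This sketch checks them with Python's xml.etree.ElementTree; the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(doc: str) -> bool:
    """Return True if an XML parser accepts doc."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


html5_only = [
    "<p>one<br>two</p>",    # void tag without an end tag: mismatched tag in XML
    "<p>a&nbsp;b</p>",      # named character code that XML doesn't predefine
    "<p class=note>x</p>",  # unquoted attribute value
]
print([is_well_formed_xml(d) for d in html5_only])  # [False, False, False]

# The XML-compatible spellings parse fine.
print(is_well_formed_xml('<p class="note">one<br/>two &amp; three</p>'))  # True
```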

Converting to XML?

Leniency

Angle brackets:

This makes lexing the top-level structure easier.

Related

FAQ

What Doesn't This Cover?

There are 5 kinds of tags:

and we have

TODO

This is one way to embed MathML and SVG:

<object data="math.xml" type="application/mathml+xml"></object>
<object data="drawing.xml" type="image/svg+xml"></object>

Then we don't need special parsing for MathML and SVG?

Generated on Tue, 14 Jan 2025 21:09:12 +0000