---
in_progress: yes
default_highlighter: oils-sh
---

Doc Processing in YSH - Notation, Query, Templating
====================================================

This is a slogan for "maximalist YSH" design:

*Documents, Objects, and Tables - HTML, JSON, and CSV* †

This design doc is about the first part - **documents** and document processing.

† from a paper about the C# language

<div id="toc">
</div>

## Intro

Let's sketch a design for 3 aspects of doc processing:

1. HTM8 Notation - A **subset** of HTML5 meant to be easy to implement with
   regular languages.
   - It's part of J8 Notation (although it doesn't use J8 strings, as JSON8
     and TSV8 do).
   - It's very important to understand that this is HTM8, not HTML8!
1. A subset of CSS for querying
1. Templating in the Markaby style (a bit like Lisp, but unlike JSX templates)
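
Since HTM8 is meant to be recognizable with regular languages, a lexer can be little more than a few regexes tried in order, which is also the shape an re2c lexer would have. Here is a minimal Python sketch; the token kinds, the patterns, and the `lex` function are assumptions for illustration, not the actual HTM8 grammar. Tokens are `(kind, start, end)` offsets into the input, which avoids copying substrings.

```python
import re
from typing import Iterator, Tuple

# Simplified token patterns for a hypothetical HTM8-like subset.
# Each one is a plain regular expression; order matters (comments first).
TOKEN_PATTERNS = [
    ("Comment", r"<!--.*?-->"),
    ("EndTag", r"</[a-zA-Z][a-zA-Z0-9-]*\s*>"),
    ("StartEndTag", r"<[a-zA-Z][a-zA-Z0-9-]*(?:\s+[^<>]*?)?/>"),
    ("StartTag", r"<[a-zA-Z][a-zA-Z0-9-]*(?:\s+[^<>]*)?>"),
    ("Text", r"[^<]+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS), re.DOTALL
)

# (kind, start, end): offsets into the input, not copied substrings
Token = Tuple[str, int, int]


def lex(s: str) -> Iterator[Token]:
    """Yield one token per match; raise ValueError on input outside the subset."""
    pos = 0
    while pos < len(s):
        m = TOKEN_RE.match(s, pos)
        if m is None:
            raise ValueError(f"invalid input at offset {pos}: {s[pos:pos + 10]!r}")
        yield m.lastgroup, m.start(), m.end()
        pos = m.end()


doc = '<p class="x">Hi <b>there</b></p><br/>'
for kind, start, end in lex(doc):
    print(kind, repr(doc[start:end]))
```

A bare `<` that doesn't start a tag, as in `a < b`, makes this sketch raise `ValueError`, since no pattern matches it.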

The basic goal is to write ad hoc HTML processors.

YSH programs should loosely follow the style of the DOM API in web browsers,
e.g. `document.querySelectorAll('table#mytable')` and the doc fragments it
returns.
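
The `querySelectorAll('table#mytable')` example suggests how small the CSS subset could be. As a sketch, assuming the subset is just an optional tag name, an optional `#id`, and any number of `.class` parts, parsing and matching fit in a few lines of Python. `Selector`, `parse_selector`, and `matches` are hypothetical names, not a proposed YSH API.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed selector subset: [tag][#id][.class]*, e.g. "table#mytable", "p.note", "#a".
# No combinators, attribute selectors, or pseudo-classes.
SELECTOR_RE = re.compile(
    r"(?P<tag>[a-zA-Z][a-zA-Z0-9-]*)?(?P<id>#[\w-]+)?(?P<classes>(?:\.[\w-]+)*)"
)


@dataclass
class Selector:
    tag: Optional[str]
    id: Optional[str]
    classes: List[str] = field(default_factory=list)


def parse_selector(text: str) -> Selector:
    m = SELECTOR_RE.fullmatch(text)
    if m is None or text == "":
        raise ValueError(f"unsupported selector: {text!r}")
    tag, ident, classes = m.group("tag"), m.group("id"), m.group("classes")
    return Selector(
        tag=tag.lower() if tag else None,
        id=ident[1:] if ident else None,
        classes=[c for c in classes.split(".") if c],
    )


def matches(sel: Selector, tag: str, attrs: Dict[str, str]) -> bool:
    """Check one element (tag name plus attributes) against a parsed selector."""
    if sel.tag is not None and sel.tag != tag.lower():
        return False
    if sel.id is not None and attrs.get("id") != sel.id:
        return False
    element_classes = attrs.get("class", "").split()
    return all(c in element_classes for c in sel.classes)


sel = parse_selector("table#mytable")
print(sel, matches(sel, "table", {"id": "mytable"}))
```

A query would then run `matches` against each start tag while walking the document, collecting the fragments that match.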

Note that the DOM API is not available in Node.js or Deno by default, let alone
in lighter-weight alternative JavaScript runtimes.

I believe we can build something in YSH that's simpler, and just as powerful.

## Use Cases for HTML Processing

These should give people an idea of what we're aiming for:

1. Making Oils `cross-ref.html`
   - query and replacement
1. Table language - `md-ul-table`
   - query and replacement
   - many tables to make here
1. Safe HTML subset, e.g. for publishing user results on the continuous build
   - I think I want to encode the policy, like:
     - query

Design goals:

- Simple format that can be re-implemented anywhere
  - a few re2c expressions
- Fast
  - re2c generates C code
  - few allocations
  - much simpler than an entire browser engine

## Operations

- `doc('<p>')` - parses and validates the string, and creates a `value.Obj`
- `docQuery(mydoc, '#element')` - does a simple search
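
The two operations above can share one pass over the document: validation checks that tags nest properly, and a search by `#id` falls out of the same loop. This Python sketch assumes a much-simplified lexer (one regex for tags and text, attribute values in double quotes) and hypothetical function names. It doesn't special-case void elements, so `<br>` must be written `<br/>` here, and it returns offsets rather than building a tree.

```python
import re
from typing import List, Optional, Tuple

# Assumed simplified lexer: a tag (start, end, or self-closing) or a run of text.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9-]*)([^<>]*?)(/?)>|[^<]+")
ID_RE = re.compile(r'(?:^|\s)id="([^"]*)"')


def query_id(s: str, want: Optional[str]) -> Optional[Tuple[int, int]]:
    """Check that tags are balanced, and return the (start, end) span of the
    outer HTML of an element whose id is `want`, if any."""
    stack: List[Tuple[str, int, Optional[str]]] = []  # (tag, start offset, id)
    found: Optional[Tuple[int, int]] = None
    pos = 0
    while pos < len(s):
        m = TAG_RE.match(s, pos)
        if m is None:
            raise ValueError(f"unexpected '<' at offset {pos}")
        closing, name, attrs, self_closing = m.groups()
        if name is not None:  # a tag, not text
            m_id = ID_RE.search(attrs)
            ident = m_id.group(1) if m_id else None
            if closing:
                if not stack or stack[-1][0] != name:
                    raise ValueError(f"unbalanced </{name}> at offset {pos}")
                _, start, open_id = stack.pop()
                if found is None and want is not None and open_id == want:
                    found = (start, m.end())
            elif self_closing:
                if found is None and want is not None and ident == want:
                    found = (m.start(), m.end())
            else:
                stack.append((name, m.start(), ident))
        pos = m.end()
    if stack:
        raise ValueError(f"unclosed <{stack[-1][0]}>")
    return found


def validate(s: str) -> None:
    """Raise ValueError unless every start tag has a matching end tag."""
    query_id(s, None)


doc = '<div><p id="a">hi <b>x</b></p><br/></div>'
validate(doc)
span = query_id(doc, "a")
if span is not None:
    print(doc[span[0]:span[1]])
```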

Constructors:

    doc {  # prints valid HTM8
      p {
        echo 'hi'
      }
      p {
        'hi'  # I think I want to turn on this auto-quote feature
      }
      raw '<b>bold</b>'
    }
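
The constructor above maps nested blocks to nested elements, in the Markaby style. Python has no blocks, but `with` statements give a similar shape, so here is an analogous sketch; `DocBuilder` and its methods are hypothetical. The main point it illustrates is the split between ordinary text, which is escaped (like the auto-quoted `'hi'`), and `raw`, which is trusted as-is.

```python
import html
from contextlib import contextmanager
from typing import Iterator, List


class DocBuilder:
    """Hypothetical Markaby-style builder: nested `with` blocks become nested tags."""

    def __init__(self) -> None:
        self.parts: List[str] = []

    @contextmanager
    def tag(self, name: str) -> Iterator[None]:
        self.parts.append(f"<{name}>")
        yield
        self.parts.append(f"</{name}>")

    def text(self, s: str) -> None:
        # Like the auto-quoted 'hi': text is always escaped, so it can't inject tags.
        self.parts.append(html.escape(s, quote=False))

    def raw(self, s: str) -> None:
        # Like `raw '<b>bold</b>'`: the caller vouches that s is already valid markup.
        self.parts.append(s)

    def render(self) -> str:
        return "".join(self.parts)


d = DocBuilder()
with d.tag("p"):
    d.text("hi")
with d.tag("p"):
    d.text("fish & chips")
d.raw("<b>bold</b>")
print(d.render())  # <p>hi</p><p>fish &amp; chips</p><b>bold</b>
```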

And then, to capture the output in a variable instead of printing it:

    doc (&mydoc) {  # captures the output, and creates a value.Obj
      p {
        'hi'  # I think I want to turn on this auto-quote feature
        "hi $x"
      }
    }
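
The difference between `doc { ... }` and `doc (&mydoc) { ... }` is only where the output goes. A Python analogy, assuming the block writes to stdout, is to redirect stdout into a buffer while the block runs. `capture_doc` is a hypothetical name, and it returns a plain string, whereas the design above would also validate the text and wrap it in a `value.Obj`.

```python
import contextlib
import io
from typing import Callable


def capture_doc(block: Callable[[], None]) -> str:
    """Run a block that prints markup to stdout, and return the printed text."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        block()
    return buf.getvalue()


def body() -> None:
    x = "there"
    # The same code could print straight to stdout, like `doc { ... }`.
    print(f"<p>hi {x}</p>")


mydoc = capture_doc(body)  # like `doc (&mydoc) { ... }`
print(repr(mydoc))  # '<p>hi there</p>\n'
```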

This is the same as the table constructor.

Module:

    source $LIB_YSH/doc.ysh

    doc (&d) {     # block, output captured into d
    }
    doc {          # block, output printed
    }
    doc('<p>')     # string, parsed and validated

This can have both `__invoke__` and `__call__`:

    var results = d.query('#a')

    # Could the doc object have __invoke__?
    d query '#a' {
    }

    doc query (d, '#a') {
      for result in (results) {
        echo hi
      }
    }

    # Do we create (old, new) pairs?
    # This performs an operation like:
    #   d.outerHTML = outerHTML
    setvar d = d.replace(pairs)

Safe HTML subset:

    d query (tags= :|a p div h1 h2 h3|) {
      case (_frag.tag) {
        a {
          # get a list of all attributes
          var attrs = _frag.getAttributes()
        }
      }
    }
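
The query above only sketches the shape of a policy. To show the kind of check it might encode, here is a Python sketch with a made-up policy: allow only the tags listed in the query, and allow only `href` on `<a>`. It uses Python's standard `html.parser` for tokenizing, not an HTM8 lexer, and it reports violations rather than rewriting the input; it's illustrative, not a complete sanitizer.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical policy: the tags from the query above, and allowed attributes per tag.
ALLOWED_TAGS: Set[str] = {"a", "p", "div", "h1", "h2", "h3"}
ALLOWED_ATTRS: Dict[str, Set[str]] = {"a": {"href"}}


class PolicyChecker(HTMLParser):
    """Collect disallowed tags, and disallowed attributes on allowed tags."""

    def __init__(self) -> None:
        super().__init__()
        self.violations: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag not in ALLOWED_TAGS:
            self.violations.append(f"tag <{tag}>")
            return
        allowed = ALLOWED_ATTRS.get(tag, set())
        for name, _value in attrs:
            if name not in allowed:
                self.violations.append(f"attribute {name!r} on <{tag}>")


def check_policy(markup: str) -> List[str]:
    checker = PolicyChecker()
    checker.feed(markup)
    checker.close()
    return checker.violations


print(check_policy('<p onclick="x()">hi <a href="/x">link</a></p><script></script>'))
```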

If you want to accept user HTML, first convert it with an HTML5 -> HTM8 converter.
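
A converter like that could be built on any permissive HTML5 parser. Here is a Python sketch on top of the standard `html.parser`, assuming the stricter target form means: lowercase names, every attribute double-quoted with a value, text escaped, void elements written as `<br/>`, and every other element explicitly closed. Those rules are assumptions for illustration, not the HTM8 spec. Comments and doctypes are simply dropped, and it doesn't implement HTML5's implied end tags (e.g. a `<li>` closing the previous `<li>`), which a real converter would need a full HTML5 parser for.

```python
import html
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# HTML5 void elements: they never have end tags.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
Attrs = List[Tuple[str, Optional[str]]]


class Normalizer(HTMLParser):
    """Re-serialize permissive HTML5 in a stricter form (assumed target rules)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: List[str] = []
        self.open: List[str] = []  # stack of elements still open

    def _emit_start(self, tag: str, attrs: Attrs, self_close: bool) -> None:
        # html.parser already lowercases names; bare attributes get value "".
        parts = [tag] + [f'{k}="{html.escape(v or "")}"' for k, v in attrs]
        self.out.append("<" + " ".join(parts) + ("/>" if self_close else ">"))

    def handle_starttag(self, tag: str, attrs: Attrs) -> None:
        self._emit_start(tag, attrs, tag in VOID)
        if tag not in VOID:
            self.open.append(tag)

    def handle_startendtag(self, tag: str, attrs: Attrs) -> None:
        self._emit_start(tag, attrs, True)

    def handle_endtag(self, tag: str) -> None:
        if tag not in self.open:
            return  # drop stray end tags, including </br>
        # Close elements left open inside this one, e.g. <div><p>x</div>.
        while self.open:
            t = self.open.pop()
            self.out.append(f"</{t}>")
            if t == tag:
                break

    def handle_data(self, data: str) -> None:
        self.out.append(html.escape(data, quote=False))


def to_strict(markup: str) -> str:
    n = Normalizer()
    n.feed(markup)
    n.close()
    n.out.extend(f"</{t}>" for t in reversed(n.open))  # close anything left open
    return "".join(n.out)


print(to_strict('<DIV CLASS=x><p>fish &amp; chips<BR>next</DIV><input disabled>'))
```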