---
in_progress: yes
default_highlighter: oils-sh
---

Doc Processing in YSH - Notation, Query, Templating
====================================================

This is a slogan for "maximalist YSH" design:

*Documents, Objects, and Tables - HTML, JSON, and CSV* †

This design doc is about the first part - **documents** and document processing.

† from a paper about the C# language

<div id="toc">
</div>

## Intro

Let's sketch a design for 3 aspects of doc processing:

1. HTM8 Notation - A **subset** of HTML5 meant for easy implementation, with
   regular languages.
   - It's part of J8 Notation (although it does not use J8 strings, like JSON8
     and TSV8 do.)
   - It's very important to understand that this is HTM8, not HTML8!
1. A subset of CSS for querying
1. Templating in the Markaby style (a bit like Lisp, but unlike JSX templates)
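
To make "easy implementation, with regular languages" concrete, here is a
minimal sketch of a lexer driven by a single regular expression. It is written
in Python purely for illustration. The token names and the grammar are
assumptions made up for this example, and they are not the HTM8 definition.

```python
import re

# Hypothetical token grammar: a tiny HTML-like subset, NOT the HTM8 spec.
# Each alternative is a regular language, so no parser stack is needed to lex.
TOKEN_RE = re.compile(r'''
    (?P<comment>   <!--.*?-->                             )
  | (?P<end_tag>   </[a-zA-Z][a-zA-Z0-9]*\s*>             )
  | (?P<start_tag> <[a-zA-Z][a-zA-Z0-9]*(?:\s[^<>]*)?/?>  )
  | (?P<char_ref>  &(?:[a-zA-Z]+|\#[0-9]+);               )
  | (?P<text>      [^<&]+                                 )
''', re.VERBOSE | re.DOTALL)


def tokenize(html):
    """Yield (kind, text) pairs; raise ValueError where no rule matches."""
    pos = 0
    while pos < len(html):
        m = TOKEN_RE.match(html, pos)
        if m is None:
            raise ValueError('invalid markup at offset %d' % pos)
        # lastgroup is the name of the alternative that matched
        yield m.lastgroup, m.group()
        pos = m.end()


if __name__ == '__main__':
    for kind, text in tokenize('<p id="a">hi &amp; bye</p>'):
        print(kind, repr(text))
```

Because the lexer rejects any byte that no rule matches, a stray `<` or `&` is
an error rather than something to guess about. That strictness is one way a
subset can stay much smaller than a full HTML5 tokenizer.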

The basic goal is to write ad hoc HTML processors.

YSH programs should loosely follow the style of the DOM API in web browsers,
e.g. `document.querySelectorAll('table#mytable')` and the doc fragments it
returns.

Note that the DOM API is not available in node.js or Deno by default, much less
any alternative lightweight JavaScript runtimes.

I believe we can include something that's simpler, and just as powerful,
in YSH.

## Use Cases for HTML Processing

These will help people get an idea.

1. making Oils cross-ref.html
   - query and replacement
1. table language - md-ul-table
   - query and replacement
   - many tables to make here
1. safe HTML subset, e.g. for publishing user results on continuous build
   - well I think I want to encode the policy, like
   - query

Design goals:

- Simple format that can be re-implemented anywhere
  - a few re2c expressions
- Fast
  - re2c generates C code
  - Few allocations
- Much simpler than an entire browser engine

## Operations

- `doc('<p>')` - validates the string and creates a `value.Obj`
- `docQuery(mydoc, '#element')` - does a simple search
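
As a rough illustration of what "a simple search" could mean, the sketch below
matches start tags against a very small selector subset (`tag`, `#id`, or
`tag#id`). It uses Python's standard `html.parser` as a stand-in tokenizer. The
names `parse_selector` and `doc_query`, the selector grammar, and the
`(tag, attrs)` return shape are all assumptions for this example, not the
proposed YSH API.

```python
import re
from html.parser import HTMLParser

# Hypothetical selector subset: 'tag', '#id', or 'tag#id'.
SELECTOR_RE = re.compile(r'([a-zA-Z][a-zA-Z0-9]*)?(?:#([A-Za-z_][\w-]*))?')


def parse_selector(selector):
    """Return (tag or None, id or None); reject anything outside the subset."""
    m = SELECTOR_RE.fullmatch(selector)
    if m is None or not selector:
        raise ValueError('unsupported selector: %r' % selector)
    tag, elem_id = m.group(1), m.group(2)
    # html.parser reports tag names in lowercase, so normalize the selector too
    return (tag.lower() if tag else None), elem_id


class _Matcher(HTMLParser):
    """Collects every start tag that satisfies the selector."""

    def __init__(self, tag, elem_id):
        super().__init__()
        self.tag, self.elem_id = tag, elem_id
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.tag is not None and tag != self.tag:
            return
        if self.elem_id is not None and attrs.get('id') != self.elem_id:
            return
        self.matches.append((tag, attrs))


def doc_query(html, selector):
    """Return a list of (tag, attrs) pairs for matching start tags."""
    tag, elem_id = parse_selector(selector)
    matcher = _Matcher(tag, elem_id)
    matcher.feed(html)
    matcher.close()
    return matcher.matches


if __name__ == '__main__':
    page = '<div><table id="mytable"><tr></tr></table></div>'
    print(doc_query(page, 'table#mytable'))
```

This sketch only reports the matching start tag and its attributes. Returning a
whole doc fragment, as `querySelectorAll` does, would additionally require
pairing each start tag with its end tag.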

Constructors:

    doc {  # prints valid HTM8
      p {
        echo 'hi'
      }
      p {
        'hi'  # I think I want to turn on this auto-quote feature
      }
      raw '<b>bold</b>'
    }

And then

    doc (&mydoc) {  # captures the output, and creates a value.Obj
      p {
        'hi'  # I think I want to turn on this auto-quote feature
        "hi $x"
      }
    }

This is the same as the table constructor.

Module:

    source $LIB_YSH/doc.ysh

    doc (&d) {
    }
    doc {
    }
    doc('<p>')

    # This can have both __invoke__ and __call__

    var results = d.query('#a')

    # The doc could be __invoke__ ?
    d query '#a' {
    }

    doc query (d, '#a') {
      for result in (results) {
        echo hi
      }
    }

    # we create (old, new) pairs?
    # this performs an operation like:
    # d.outerHTML = outerHTML
    setvar d = d.replace(pairs)

Safe HTML subset:

    d query (tags= :|a p div h1 h2 h3|) {
      case (_frag.tag) {
        a {
          # get a list of all attributes
          var attrs = _frag.getAttributes()
        }
      }
    }

To accept user-supplied HTML, first run it through an HTML5 -> HTM8 converter.
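
The safe-subset idea can be sketched as an allowlist filter. The Python example
below, again only an illustration, keeps the tags from the query above, keeps
only allowlisted attributes, and escapes all text. The attribute policy
(`href` on `a`) and the function name `sanitize` are assumptions for this
example. It is not a vetted sanitizer: for instance, it does not inspect URL
schemes in `href`.

```python
from html import escape
from html.parser import HTMLParser

# Tag list taken from the query example; the attribute policy is hypothetical.
ALLOWED_TAGS = {'a', 'p', 'div', 'h1', 'h2', 'h3'}
ALLOWED_ATTRS = {'a': {'href'}}


class _Sanitizer(HTMLParser):
    """Re-emits only allowlisted tags and attributes; escapes all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text content is still emitted
        allowed = ALLOWED_ATTRS.get(tag, set())
        parts = []
        for name, value in attrs:
            if name in allowed:
                parts.append(' %s="%s"' % (name, escape(value or '', quote=True)))
        self.out.append('<%s%s>' % (tag, ''.join(parts)))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append('</%s>' % tag)

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape them here
        self.out.append(escape(data, quote=False))


def sanitize(html):
    """Return html with every tag and attribute outside the allowlist removed."""
    s = _Sanitizer()
    s.feed(html)
    s.close()
    return ''.join(s.out)


if __name__ == '__main__':
    print(sanitize('<p onclick="x()">hi <b>there</b> <a href="/x" style="c">link</a></p>'))
```

An allowlist fails closed: a tag or attribute the policy has never heard of is
dropped, which is the property you want when publishing user-provided results.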