---
default_highlighter: oils-sh
---

YSH Input/Output
================

This doc describes how YSH improves upon I/O in shell.

<!--
TODO:

- bar-g:
  - reading from io.stdin twice in a row produces unexpected results
- Inconsistent naming/usage of read -0 and read --raw-line
- Buffered version of read -0? Not orthogonal
  - io.stdin0? or io.stdinLines vs io.stdin0?

More:

- read --netstr - Length-prefixed reading mode
- Encoding and Decoding
  - JSON lines idiom? Note that @() is J8 Lines, which is different than JSON
    lines!
-->

<div id="toc">
</div>

## Summary

- The POSIX [read][] builtin is slow because it must read one byte at a time.
  So YSH adds faster ways to read data ([ysh-read][]):
  - Slurping whole files: `read --all`
  - Reading in chunks: `read --num-bytes`
  - Streaming of buffered lines: [io.stdin][]
- YSH adds [J8 Notation][] for encoding and decoding (based on JSON)
  - Writing isn't conflated with encoding (`echo -e`, [printf][])
  - Reading isn't conflated with decoding `\` escapes ([read][])
  - YSH adds `@(command splice)`, which improves on `$(command sub)` and word
    splitting
- YSH supports the NUL-terminated format: `find -print0 | xargs -0`
  - TODO: streaming of buffered chunks?

[printf]: ref/chap-builtin-cmd.html#printf

These YSH constructs make string processing more orthogonal to I/O:

- `${x %.2f}` as a static version of the [printf][] builtin (TODO)
- `${x|html}` and `html"<p>$x</p>"` for safe escaping (TODO)

[io.stdin]: ref/chap-type-method.html#stdin

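The `\` decoding mentioned in the summary can be reproduced in plain POSIX
shell, outside of YSH. This is a minimal sketch rather than one of the tested
examples in this doc; the variable names `cooked` and `raw` are made up for
illustration. Without `-r`, `read` treats the backslash as an escape character
and drops it:

```shell
# The input line is the four characters: a \ t b
# Without -r, read removes the backslash and keeps the following character.
cooked=$(printf '%s\n' 'a\tb' | { read line; printf '%s' "$line"; })

# With -r, the line is stored exactly as it was written.
raw=$(printf '%s\n' 'a\tb' | { read -r line; printf '%s' "$line"; })

printf 'without -r: %s\n' "$cooked"   # prints: without -r: atb
printf 'with -r:    %s\n' "$raw"      # prints: with -r:    a\tb
```
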
### Details on Problems with Shell

- `echo $x` is a bug, because `$x` could be `-n`.
  - The YSH [write][] builtin accepts `--`, and [echo][] doesn't accept any
    flags.
- In addition to [read][] being slow, the [mapfile][] builtin is also slow.
- The [read][] builtin is confusing because it respects `\` escapes, unless
  `-r` is passed.
  - These `\` escapes create a mini-language that isn't understood by other
    line-based tools like `grep` and `awk`. The set of escapes isn't
    consistent between shells.
- `$()` strips trailing newlines, so there's no way to tell whether the
  command's output ended with one.
  - YSH has `read --all`, which preserves the data exactly.
- `echo hi | read; echo $REPLY` doesn't work in bash because the last part of a
  pipeline (`read`) runs in a child process. That is, the data is indeed read,
  but it's **lost** to the rest of the program.
  - In OSH and YSH, `echo hi | read` works because the last part of a pipeline
    runs in the shell process. (This is what bash calls `shopt -s lastpipe`,
    mentioned in [Known Differences][lastpipe].)

[lastpipe]: known-differences.html#last-pipeline-part-may-run-in-shell-process-zsh-bash-shopt-s-lastpipe


Examples:

    hostname | read --all (&x)
    write -- $x
    echo $x

[json]: ref/chap-builtin-cmd.html#json
[write]: ref/chap-builtin-cmd.html#write
[ysh-read]: ref/chap-builtin-cmd.html#ysh-read

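Two of the pitfalls listed above can be reproduced in plain POSIX shell. The
sketch below is illustrative only and isn't run by the doc build; the variable
names and the sentinel trick are assumptions of this example, not YSH features.
It uses `printf '%s\n'` as the portable way to print a value that looks like a
flag, and shows that `$()` makes output with and without a trailing newline
indistinguishable:

```shell
# Pitfall 1: a value that looks like a flag. printf '%s\n' prints it literally.
x='-n'
printed=$(printf '%s\n' "$x")

# Pitfall 2: $() strips trailing newlines, so these two captures are equal.
with_newline=$(printf 'hi\n')
without_newline=$(printf 'hi')

# A common workaround: append a sentinel character, then remove it.
exact=$(printf 'hi\n'; printf x)
exact=${exact%x}

# Prints -n, then the lengths 2, 2 and 3, one per line.
printf '%s\n' "$printed" "${#with_newline}" "${#without_newline}" "${#exact}"
```

In YSH the corresponding tools are `write -- $x` and `read --all`, as described
in the list above.
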
### Shell Pitfall: the Exit Code of `read`

Suppose the last line of your input has no trailing `\n`:

<!-- Note: these code blocks aren't executed by build/doc.sh
     Because they have class="language-*"
-->

```oils-sh
$ printf 'a\nb'
a
b # no trailing newline
```

Then this loop doesn't print the last line, because `read` returns a nonzero
exit status when it reaches the end of input without seeing the newline
delimiter.

```oils-sh
$ printf 'a\nb' | while read -r; do echo $REPLY; done
a
```

In contrast, a loop with YSH `read --raw-line` prints all lines:

```oils-sh
$ printf 'a\nb' | while read --raw-line { echo $_reply }
a
b
```

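For comparison, POSIX shell scripts often work around this pitfall by also
testing whether `read` filled the variable before it failed. The following is
a minimal sketch of that idiom, not something this doc prescribes; the names
`plain` and `fixed` are made up for the example:

```shell
# The plain loop stops as soon as read fails, dropping the unterminated line.
plain=$(printf 'a\nb' | while read -r line; do
  printf '%s\n' "$line"
done)

# Workaround: run the body once more if read failed but still stored data.
fixed=$(printf 'a\nb' | while read -r line || [ -n "$line" ]; do
  printf '%s\n' "$line"
done)

printf 'plain loop saw: %s\n' "$plain"
printf 'fixed loop saw: %s\n' "$fixed"
```

The plain loop only sees `a`, while the fixed loop sees both `a` and `b`. The
extra test is easy to forget, which is the problem `read --raw-line` avoids.
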
## Tested Invariants

These examples show that YSH I/O is orthogonal and composable. You can **round
trip** data between YSH data structures and the OS.

### Set Up Test Data

First, let's create files with funny names:

    mkdir -p mydir
    touch 'mydir/file with spaces'
    touch b'mydir/newline \n file'

And let's list these files in 3 different formats:

    # Line-based: one file spans multiple lines
    find . > lines.txt

    # NUL-terminated
    find . -print0 > 0.bin

    # J8 lines
    redir >j8-lines.txt {
      for path in mydir/* {
        write -- $[toJson8(path)]
      }
    }

    head lines.txt j8-lines.txt

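To see why the line-based listing is ambiguous, the same kind of file names
can be counted in plain POSIX shell. This is a hedged sketch that isn't run by
the doc build: it works in a scratch directory from `mktemp -d`, the variable
names are invented for the example, and it assumes a `find` that supports
`-print0`.

```shell
# Create the same two awkward file names in a scratch directory.
tmp=$(mktemp -d)
nl='
'
mkdir -p "$tmp/mydir"
touch "$tmp/mydir/file with spaces" "$tmp/mydir/newline ${nl} file"

# Line-based listing: the name containing a newline spans two lines,
# so two files produce three lines.
line_count=$(find "$tmp/mydir" -type f | wc -l | tr -d ' ')

# NUL-terminated listing: keep only the NUL bytes and count them,
# which gives one per file.
nul_count=$(find "$tmp/mydir" -type f -print0 | tr -cd '\000' | wc -c | tr -d ' ')

echo "lines=$line_count nul_records=$nul_count"   # lines=3 nul_records=2
rm -rf "$tmp"
```
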
Now let's test the invariants.

### File -> String -> File

Start with a file, slurp it into a string, and write it back to an equivalent
file.

    cat lines.txt | read --all

    = _reply # (Str)

    # suppress trailing newline
    write --end '' -- $_reply > out.txt

    # files are equal
    diff lines.txt out.txt

### File -> Array of Lines -> File (fast)

Start with a file, read it into an array of lines, and write it back to an
equivalent file.

    # newlines removed on reading
    var lines = []
    cat lines.txt | for line in (io.stdin) {
      call lines->append(line)
    }

    = lines # (List)

    # newlines added
    write -- @lines > out.txt

    # files are equal, even though one path is split across lines
    diff lines.txt out.txt

### File -> Array of Lines -> File (slow)

This idiom can be slow, since `read --raw-line` reads one byte at a time:

    # newlines removed on reading
    var paths = []
    cat lines.txt | while read --raw-line (&path) {
      call paths->append(path)
    }

    = paths # (List)

    # newlines added
    write -- @paths > out.txt

    # files are equal, even though one path is split across lines
    diff lines.txt out.txt

### NUL File -> Array of Lines -> NUL File (fast)

Start with a file, slurp it into a string, split it into an array, and write it
back to an equivalent file.

    var paths = []
    read --all < 0.bin
    var paths = _reply.split( \y00 ) # split by NUL

    # last \y00 is terminator, not separator
    # TODO: could improve this
    call paths->pop()

    = paths

    # Use NUL separator and terminator
    write --sep b'\y00' --end b'\y00' -- @paths > out0.bin

    diff 0.bin out0.bin

### NUL File -> Array of Lines -> NUL File (slow)

This idiom can be slow, since `read -0` reads one byte at a time:

    var paths = []
    cat 0.bin | while read -0 path {
      call paths->append(path)
    }

    = paths

    # Use NUL separator and terminator
    write --sep b'\y00' --end b'\y00' -- @paths > out0.bin

    diff 0.bin out0.bin

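The same NUL-terminated round trip can be sketched with ordinary command-line
tools, which is the interoperability the format exists for. This example is an
illustration under stated assumptions rather than a tested part of this doc: it
assumes an `xargs` that supports `-0` (as GNU and BSD versions do), writes into
a scratch directory from `mktemp -d`, and the variable names are hypothetical.

```shell
tmp=$(mktemp -d)
nl='
'
# Encode: every record ends with NUL, including the last one.
# The second record contains a newline.
printf '%s\0' 'file with spaces' "newline${nl}file" > "$tmp/0.bin"

# Decode with xargs -0, then re-encode with the external printf command.
xargs -0 printf '%s\0' < "$tmp/0.bin" > "$tmp/out0.bin"

# cmp -s exits 0 only if the two files are byte-for-byte identical.
if cmp -s "$tmp/0.bin" "$tmp/out0.bin"; then same=yes; else same=no; fi

echo "round trip identical: $same"
rm -rf "$tmp"
```

Because the last NUL is a terminator, no record is lost or added when the
stream is decoded and encoded again.
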
### J8 File -> Array of Lines -> J8 File

Start with a file, slurp it into an array of lines, and write it back to an
equivalent file.

    var paths = @(cat j8-lines.txt)

    = paths

    redir >j8-out.txt {
      for path in (paths) {
        write -- $[toJson8(path)]
      }
    }

    diff j8-lines.txt j8-out.txt

### Array -> File of J8 Lines -> Array

Start with an array, write it to a file, and slurp it back into an array.

    var strs = :| 'with space' b'with \n newline' |
    redir >j8-tmp.txt {
      for s in (strs) {
        write -- $[toJson8(s)]
      }
    }

    cat j8-tmp.txt

    # round-tripped
    assert [strs === @(cat j8-tmp.txt)]

## Reference

### Three Types of I/O

This table characterizes the performance of different ways to read input:

<style>
table {
  margin-left: 2em;
  background-color: #eee;
}
table code {
  color: green;
}
thead {
  background-color: white;
}
td {
  vertical-align: top;
}
</style>

<table cellpadding="10" cellspacing="5">

- thead
  - Performance
  - Shell Constructs
- tr
  - Buffered, and therefore **fast**
  - <div>

    - [io.stdin][] - loop over lines

    </div>
- tr
  - Unbuffered and **fast** <br/>
    (large chunks)
  - <div>

    - [ysh-read][]: `read --all` and `--num-bytes`
    - Shell `$(command sub)`
    - YSH `@(command splice)`

    </div>
- tr
  - Unbuffered and **slow** <br/>
    (one byte at a time)
  - <div>

    - The POSIX shell [read][] builtin: either without flags, or with short
      flags like `-r -d`
    - The bash [mapfile][] builtin
    - [ysh-read][]:
      - YSH `read --raw-line` (replaces the idiom `IFS= read -r`)
      - YSH `read -0` (replaces the idiom `read -r -d ''`)

    </div>

</table>

[read]: ref/chap-builtin-cmd.html#read

<!--
That is, the POSIX flags to `read` issue many `read(0, 1)` calls. YSH provides
replacements.
-->

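The slow row in the table above has a reason behind it: a shell `read` must
not consume input past the line it returns, because the rest of the stream may
belong to a later command. The sketch below shows that contract in plain POSIX
shell. It is an illustration with made-up variable names, not a benchmark and
not part of the tested examples.

```shell
out=$(printf 'first\nsecond\nthird\n' | {
  read -r line            # consumes exactly one line from the pipe
  printf 'read got: %s\n' "$line"
  printf 'cat got:\n'
  cat                     # a separate process; sees everything after line 1
})

printf '%s\n' "$out"
```

Here `cat` still receives `second` and `third`. A reader that buffered a large
chunk would have taken those lines with it, which is why buffered reading is
offered as a separate construct.
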
## Related Docs

- [J8 Notation][]
- [JSON](json.html) in Oils
- [Strings](strings.html) &dagger;

[J8 Notation]: j8-notation.html

### Help Topics

- Builtin commands that are encouraged:
  - [write][]
  - [ysh-echo][]
  - [ysh-read][]
  - [json][]
- Builtin commands in shell:
  - [echo][]
  - [printf][]
  - [read][]
  - [mapfile][] - this is also slow in shell
- Types and Methods > [io.stdin][]
- Word Language
  - [command-sub][]
  - [command-splice][] (YSH)

[ysh-echo]: ref/chap-builtin-cmd.html#ysh-echo
[echo]: ref/chap-builtin-cmd.html#echo
[mapfile]: ref/chap-builtin-cmd.html#mapfile

[command-sub]: ref/chap-word-lang.html#command-sub
[command-splice]: ref/chap-word-lang.html#command-splice