Simple Word Evaluation in Unix Shell
====================================

This document describes the YSH word evaluation semantics
(`shopt -s simple_word_eval`) for experienced shell users. It may also be
useful to those who want to implement this behavior in another shell.

The main idea is that YSH behaves like a traditional programming language:

1. It's **parsed** from start to end [in a single pass][parsing-shell].
2. It's **evaluated** in a single step too.

That is, parsing and evaluation aren't interleaved, and code and data aren't
confused.

[parsing-shell]: https://www.oilshell.org/blog/2019/02/07.html

[posix-spec]: https://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_06


<div id="toc">
</div>

## An Analogy: Word Expressions Should Be Like Arithmetic Expressions

In YSH, "word expressions" like

    $x
    "hello $name"
    $(hostname)
    'abc'$x${y:-${z//pat/replace}}"$(echo hi)$((a[i] * 3))"

are parsed and evaluated in a straightforward way, like this expression when
`x == 2`:

```sh-prompt
1 + x / 2 + x * 3  →  8  # Python, JS, Ruby, etc. work this way
```

In contrast, in shell, words are "expanded" in multiple stages, like this:

```sh-prompt
1 + "x / 2 + \"x * 3\""  →  8  # Hypothetical, confusing language
```

That is, it would be odd if Python looked *inside a program's strings* for
expressions to evaluate, but that's exactly what shell does! There are
multiple places where there's a silent `eval`, and you need **quoting** to
inhibit it. Neglecting this can cause security problems due to confusing code
and data (links below).

In other words, the **defaults are wrong**. Programmers are surprised by shell's
behavior, and it leads to incorrect programs.

So in YSH, you can opt out of the multiple "word expansion" stages described in
the [POSIX shell spec][posix-spec]. Instead, there's only **one stage**:
evaluation.

## Design Goals

The new semantics should be easily adoptable by existing shell scripts.

- Importantly, `bin/osh` is POSIX-compatible and runs real [bash]($xref)
  scripts. You can gradually opt into **stricter and saner** behavior with
  `shopt` options (or by running `bin/ysh`). The most important one is
  [simple_word_eval]($help), and the others are listed below.
- Even after opting in, the new syntax shouldn't break many scripts. If it
  does break, the change to fix it should be small. For example, `echo @foo`
  is not too common, and it can be made bash-compatible by quoting it:
  `echo '@foo'`.

<!--
It's technically incompatible but I think it will break very few scripts.

-->

## Examples

In the following examples, the [argv][] command prints the `argv` array it
receives in a readable format:

```sh-prompt
$ argv one "two three"
['one', 'two three']
```

I also use the YSH [var]($help) keyword for assignments. *(TODO: This could be
rewritten with shell assignment for the benefit of shell implementers)*

[argv]: $oils-src:spec/bin/argv.py

### No Implicit Splitting, Dynamic Globbing, or Empty Elision

In YSH, the following constructs always evaluate to **one argument**:

- Variable / "parameter" substitution: `$x`, `${y}`
- Command sub: `$(echo hi)` or backticks
- Arithmetic sub: `$(( 1 + 2 ))`


<!--
Related help topics: [command-sub]($help), [var-sub]($help), [arith-sub]($help).
Not shown: [tilde-sub]($help).
-->

That is, quotes aren't necessary to avoid:

- **Word Splitting**, which uses `$IFS`.
- **Empty Elision**. For example, `x=''; ls $x` passes `ls` no arguments.
- **Dynamic Globbing**. Globs are *dynamic* when the pattern comes from
  program data rather than the source code.

<!-- - Tilde Sub: `~bob/src` -->

Here's an example showing that each construct evaluates to one arg in YSH:

```sh-prompt
ysh$ var pic = 'my pic.jpg'  # filename with spaces
ysh$ var empty = ''
ysh$ var pat = '*.py'        # pattern stored in a string

ysh$ argv ${pic} $empty $pat $(cat foo.txt) $((1 + 2))
['my pic.jpg', '', '*.py', 'contents of foo.txt', '3']
```

In contrast, shell applies splitting, globbing, and empty elision after the
substitutions. Each of these operations returns an indeterminate number of
strings:

```sh-prompt
sh$ pic='my pic.jpg'  # filename with spaces
sh$ empty=
sh$ pat='*.py'        # pattern stored in a string

sh$ argv ${pic} $empty $pat $(cat foo.txt) $((1 + 2))
['my', 'pic.jpg', 'a.py', 'b.py', 'contents', 'of', 'foo.txt', '3']
```

To get the desired behavior, you have to use double quotes:

```sh-prompt
sh$ argv "${pic}" "$empty" "$pat" "$(cat foo.txt)" "$((1 + 2))"
['my pic.jpg', '', '*.py', 'contents of foo.txt', '3']
```
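
The same contrast can be checked in any POSIX shell without the `argv` script. This is a minimal sketch, assuming a hypothetical `count_args` helper (not part of Oils) that prints how many arguments it received, and the default `$IFS`. Globbing is switched off with `set -f` so that only splitting and elision are shown:

```sh
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() {
  echo "$#"
}

pic='my pic.jpg'   # filename with spaces
empty=''

set -f                                # no globbing; isolate splitting and elision
split=$(count_args $pic)              # unquoted: split into 'my' and 'pic.jpg'
elided=$(count_args $empty)           # unquoted: the empty string disappears
quoted=$(count_args "$pic" "$empty")  # quoted: one argument per substitution
set +f

echo "split=$split elided=$elided quoted=$quoted"
```

With `simple_word_eval` enabled, the unquoted forms would behave like the quoted ones.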

### Splicing, Static Globbing, and Brace Expansion

The constructs in the last section evaluate to a **single argument**. In
contrast, these three constructs evaluate to **0 to N arguments**:

1. **Splicing** an array: `"$@"` and `"${myarray[@]}"`
2. **Static Globbing**: `echo *.py`. Globs are *static* when they occur in the
   program text.
3. **Brace expansion**: `{alice,bob}@example.com`

In YSH, `shopt -s parse_at` enables these shortcuts for splicing:

- `@myarray` for `"${myarray[@]}"`
- `@ARGV` for `"$@"`

Example:

```sh-prompt
ysh$ var myarray = :| 'a b' c |  # array with 2 elements
ysh$ set -- 'd e' f              # 2 arguments

ysh$ argv @myarray @ARGV *.py {ian,jack}@sh.com
['a b', 'c', 'd e', 'f', 'g.py', 'h.py', 'ian@sh.com', 'jack@sh.com']
```

is just like:


```sh-prompt
bash$ myarray=('a b' c)
bash$ set -- 'd e' f

bash$ argv "${myarray[@]}" "$@" *.py {ian,jack}@sh.com
['a b', 'c', 'd e', 'f', 'g.py', 'h.py', 'ian@sh.com', 'jack@sh.com']
```
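
In plain shell, splicing only preserves argument boundaries when it is quoted. As a rough POSIX sketch (using a hypothetical `count_args` helper and the default `$IFS`), only the quoted `"$@"` keeps the two arguments intact, which is the behavior that `@ARGV` stands for:

```sh
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() {
  echo "$#"
}

set -- 'd e' f               # 2 positional arguments

spliced=$(count_args "$@")   # each argument kept intact
resplit=$(count_args $@)     # 'd e' is split again
joined=$(count_args "$*")    # everything joined into one string

echo "spliced=$spliced resplit=$resplit joined=$joined"
```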

Unchanged: quotes disable globbing and brace expansion:

```sh-prompt
$ echo *.py
foo.py bar.py

$ echo "*.py"            # globbing disabled with quotes
*.py

$ echo {spam,eggs}.sh
spam.sh eggs.sh

$ echo "{spam,eggs}.sh"  # brace expansion disabled with quotes
{spam,eggs}.sh
```

<!--
help topics:

- braces
- glob
- splice

More:
- inline-call

-->

## Where These Rules Apply

These rules apply when a **sequence** of words is being evaluated, exactly as
in shell:

1. [Command]($help:simple-command): `echo $x foo`
2. [For loop]($help:for): `for i in $x foo; do ...`
3. [Array Literals]($help:array): `a=($x foo)` and `var a = :| $x foo |` ([ysh-array]($help))

Shell has other word evaluation contexts like:

```sh-prompt
sh$ x="${not_array[@]}"
sh$ echo hi > "${not_array[@]}"
```

which aren't affected by [simple_word_eval]($help).
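
For comparison, here is how one such single-word context behaves in a POSIX shell. In this sketch (the variable names are illustrative), the right-hand side of an assignment is not split or globbed even when unquoted, and `"$*"` joins the positional arguments into one string:

```sh
src='a  b *'     # two spaces and a glob character
copy=$src        # assignment: no splitting or globbing, even unquoted

set -- 'x y' z
joined="$*"      # joined with the first character of $IFS (a space by default)

echo "copy=[$copy] joined=[$joined]"
```

These contexts produce a single value rather than a sequence of words, so the sequence rules described in this section don't come into play.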

<!--
EvalWordSequence
-->

## Opt In to the Old Behavior With Explicit Expressions

YSH can express everything that shell can.

- Split with `@[split(mystr, IFS?)]`
- Glob with `@[glob(mypat)]`
- Elision with `@[maybe(s)]`
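
For readers who think in POSIX shell, the following sketch shows rough analogues of explicit elision and explicit splitting in a script that otherwise quotes everything. These are common shell idioms, not how YSH implements `maybe()` or `split()`, and `count_args` and the variable names are hypothetical:

```sh
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() {
  echo "$#"
}

# Explicit elision: ${s:+"$s"} yields "$s" as one argument, or nothing if s is empty.
s=''
when_empty=$(count_args ${s:+"$s"})
s='x y'
when_set=$(count_args ${s:+"$s"})

# Explicit splitting on commas: set IFS on purpose and disable globbing.
csv='a,b c,d'
old_ifs=$IFS
IFS=,
set -f
set -- $csv        # the one deliberately unquoted expansion
set +f
IFS=$old_ifs
fields=$#

echo "when_empty=$when_empty when_set=$when_set fields=$fields"
```

The point of the YSH expressions is the same: splitting and elision happen only where the program asks for them.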

## More Word Evaluation Issues

### More `shopt` Options

- [nullglob]($help) - Globs matching nothing evaluate to zero arguments, rather
  than to the literal pattern (i.e. to code).
- [dashglob]($help) is true by default, but **disabled** when YSH is enabled, so that
  files that begin with `-` aren't returned. This avoids [confusing flags and
  files](https://www.oilshell.org/blog/2020/02/dashglob.html).
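
The hazard that `dashglob` addresses can be seen in an ordinary POSIX shell. This sketch works in a scratch directory (created with `mktemp -d`, which is widely available though not specified by POSIX) and uses illustrative file names and a hypothetical `flag_like` helper. It also shows the usual `./*` workaround:

```sh
tmp=$(mktemp -d)
touch "$tmp/-rf" "$tmp/notes.txt"

# flag_like is a hypothetical helper: it prints the arguments that start with '-',
# which a command could mistake for flags.
flag_like() {
  for name in "$@"; do
    case $name in
      -*) printf '%s\n' "$name" ;;
    esac
  done
}

bare=$(cd "$tmp" && flag_like *)        # '*' returns '-rf' as-is
prefixed=$(cd "$tmp" && flag_like ./*)  # './-rf' can't be read as a flag

rm -r "$tmp"
echo "bare=[$bare] prefixed=[$prefixed]"
```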

Strict options cause fatal errors:

- [strict_tilde]($help) - Failed tilde expansions are errors, rather than
  evaluating to the literal text (i.e. to code).
- [strict_word_eval]($help) - Invalid slices and invalid UTF-8 aren't ignored.

### Arithmetic Is Statically Parsed

This is an intentional incompatibility described in the [Known
Differences](known-differences.html#static-parsing) doc.

<!--
TODO: also allow

var parts = @[split(x)]
var python = @[glob('*.py')]
-->

## Summary

YSH word evaluation is enabled with `shopt -s simple_word_eval`, and proceeds
in a single step.

Variable, command, and arithmetic substitutions predictably evaluate to a
**single argument**, regardless of whether they're empty or have spaces.
There's no implicit splitting, globbing, or elision of empty words.

You can opt into those behaviors with explicit expressions like
`@[split(mystr)]`, which evaluates to an array.

YSH also supports shell features that evaluate to **0 to N arguments**:
splicing, globbing, and brace expansion.

There are other options that "clean up" word evaluation. All options are
designed to be gradually adopted by other shells, shell scripts, and eventually
POSIX.

## Notes

### Related Documents

- [The Simplest Explanation of
  Oil](http://www.oilshell.org/blog/2020/01/simplest-explanation.html). Some
  color on the rest of the language.
- [Known Differences Between OSH and Other Shells](known-differences.html).
  Mentioned above: Arithmetic is statically parsed. Arrays and strings are
  kept separate.
- [OSH Word Evaluation Algorithm][wiki-word-eval] on the Wiki. Informally
  describes the data structures, and describes legacy constructs.
- [Security implications of forgetting to quote a variable in bash/POSIX
  shells](https://unix.stackexchange.com/questions/171346/security-implications-of-forgetting-to-quote-a-variable-in-bash-posix-shells)
  by Stéphane Chazelas. Describes the "implicit split+glob" operator, which
  YSH word evaluation removes.
  - This is essentially the same [security
    issue](http://www.oilshell.org/blog/2019/01/18.html#a-story-about-a-30-year-old-security-problem)
    I rediscovered in January 2019. It appears in all [ksh]($xref)-derived shells, and some shells
    recently patched it. I wasn't able to exploit it in a "real" context;
    otherwise I'd have made more noise about it.
  - Also described by the Fedora Security team: [Defensive Coding: Shell Double Expansion](https://docs.fedoraproject.org/en-US/Fedora_Security_Team/1/html/Defensive_Coding/sect-Defensive_Coding-Shell-Double_Expansion.html)

[wiki-word-eval]: https://github.com/oilshell/oil/wiki/OSH-Word-Evaluation-Algorithm

### Tip: View the Syntax Tree With `-n`

This gives insight into [how Oils parses shell][parsing-shell]:

```sh-prompt
$ osh -n -c 'echo ${x:-default}$(( 1 + 2 ))'
(C {<echo>}
  {
    (braced_var_sub
      token: <Id.VSub_Name x>
      suffix_op: (suffix_op.Unary op_id:Id.VTest_ColonHyphen arg_word:{<default>})
    )
    (word_part.ArithSub
      anode:
        (arith_expr.Binary
          op_id: Id.Arith_Plus
          left: (arith_expr.ArithWord w:{<Id.Lit_Digits 1>})
          right: (arith_expr.ArithWord w:{<Id.Lit_Digits 2>})
        )
    )
  }
)
```

You can pass `--ast-format text` for more details.

Evaluation of the syntax tree is a single step.


<!--

### Elision Without @[maybe()]

The `@[maybe(s)]` function is a shortcut for something like:

```
var x = ''         # empty in this case
var tmp = :| |
if (x) {           # test if string is non-empty
  append $x (tmp)  # appends 'x' to the array variable 'tmp'
}
```

This is how it's used:

-->