---
default_highlighter: oils-sh
---

YSH Fixes Shell's Error Handling (`errexit`)
============================================

<style>
  .faq {
    font-style: italic;
    color: purple;
  }

  /* copied from web/blog.css */
  .attention {
    text-align: center;
    background-color: #DEE;
    padding: 1px 0.5em;

    /* to match p tag etc. */
    margin-left: 2em;
  }
</style>

YSH is unlike other shells:

- It never silently ignores an error, and it never loses an exit code.
- There's no reason to write a YSH script without `errexit`, which is on by
  default.

This document explains how YSH makes these guarantees. We first review shell
error handling, and discuss its fundamental problems. Then we show idiomatic
YSH code, and look under the hood at the underlying mechanisms.

(If you just want to **use** YSH, see [YSH Error Handling: A Quick
Guide](ysh-error.html).)

[file a bug]: https://github.com/oilshell/oil/issues

<div id="toc">
</div>

## Review of Shell Error Handling Mechanisms

POSIX shell has fundamental problems with error handling. With `set -e` aka
`errexit`, you're [damned if you do and damned if you don't][bash-faq].

GNU [bash]($xref) fixes some of the problems, but **adds its own**, e.g. with
respect to process subs, command subs, and assignment builtins.

YSH fixes all the problems by adding new builtin commands, special variables,
and global options. But you see a simple interface with `try` and `_error`.

Let's review a few concepts before discussing YSH.

### POSIX Shell

- The special variable `$?` is the exit status of the "last command". It's a
  number between `0` and `255`.
- If `errexit` is enabled, the shell will abort if `$?` is nonzero.
  - This is subject to the *Disabled `errexit` Quirk*, which I describe below.

These mechanisms are fundamentally incomplete.

### Bash

Bash improves error handling for pipelines like `ls /bad | wc`.

- `${PIPESTATUS[@]}` stores the exit codes of all processes in a pipeline.
- When `set -o pipefail` is enabled, `$?` takes into account every process in a
  pipeline.
  - Without this setting, the failure of `ls` would be ignored.
- `shopt -s inherit_errexit` was introduced in bash 4.4 to re-introduce error
  handling in command sub child processes. This fixes a bash-specific bug.
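As a minimal sketch of the first two mechanisms (not taken from the Oils docs), the script below runs `false | true` three ways. The variable names are made up for illustration, and it assumes `bash` rather than a plain POSIX `sh`.

```shell
#!/usr/bin/env bash
# Illustrative only: variable names are made up for this example.

# Without pipefail, $? is the status of the last process only.
false | true
without_pipefail=$?              # the failure of 'false' is ignored

# PIPESTATUS holds every status, but the next command overwrites it,
# so copy it right away.
false | true
statuses="${PIPESTATUS[*]}"

# With pipefail, the failure of 'false' shows up in $?.
set -o pipefail
with_pipefail=0
false | true || with_pipefail=$?
set +o pipefail

echo "without_pipefail=$without_pipefail"   # => without_pipefail=0
echo "statuses=$statuses"                   # => statuses=1 0
echo "with_pipefail=$with_pipefail"         # => with_pipefail=1
```

The `|| with_pipefail=$?` form records the status without letting `errexit` abort the script, should it be enabled.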

But there are still places where bash will lose an exit code.

&nbsp;

## Fundamental Problems

Let's look at **four** fundamental issues with shell error handling. They
underlie the **nine** [shell pitfalls enumerated in the
appendix](#list-of-pitfalls).

### When Is `$?` Set?

Each external process and shell builtin has one exit status. But the
definition of `$?` is obscure: it's tied to the `pipeline` rule in the POSIX
shell grammar, which does **not** correspond to a single process or builtin.

We saw that `pipefail` fixes one case:

    ls /nonexistent | wc   # 2 processes, 2 exit codes, but just one $?

But there are others:

    local x=$(false)                 # 2 exit codes, but just one $?
    diff <(sort left) <(sort right)  # 3 exit codes, but just one $?

This issue means that shell scripts fundamentally **lose errors**. The
language is unreliable.

### What Does `$?` Mean?

Each process or builtin decides the meaning of its exit status independently.
Here are two common choices:

1. **The Failure Paradigm**
   - `0` for success, or non-zero for an error.
   - Examples: most shell builtins, `ls`, `cp`, ...
1. **The Boolean Paradigm**
   - `0` for true, `1` for false, or a different number like `2` for an error.
   - Examples: the `test` builtin, `grep`, `diff`, ...

New error handling constructs in YSH deal with this fundamental inconsistency.

### The Meaning of `if`

Shell's `if` statement tests whether a command exits zero or non-zero:

    if grep class *.py; then
      echo 'found class'
    else
      echo 'not found'  # is this true?
    fi

So while you'd expect `if` to work in the boolean paradigm, it's closer to
the failure paradigm. This means that using `if` with certain commands can
cause the *Error or False Pitfall*:

    if grep 'class\(' *.py; then  # grep syntax error, status 2
      echo 'found class('
    else
      echo 'not found is a lie'
    fi
    # => grep: Unmatched ( or \(
    # => not found is a lie

That is, the `else` clause conflates grep's **error** status 2 and **false**
status 1.
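In plain shell, one way around this is to capture the status and branch on it with `case`. The sketch below is only an illustration: `classify_grep` is a hypothetical helper, not part of any shell or of YSH, and it assumes a `grep` that follows the POSIX convention of 0 for a match, 1 for no match, and greater than 1 for an error.

```shell
# Hypothetical helper: report grep's three outcomes separately.
# Usage: classify_grep PATTERN FILE
classify_grep() {
  local status=0
  # '|| status=$?' records a non-zero status without tripping errexit.
  grep -q -- "$1" "$2" 2>/dev/null || status=$?
  case $status in
    0) echo 'found' ;;
    1) echo 'not found' ;;
    *) echo "error: grep exited with status $status" ;;
  esac
}

classify_grep root /etc/passwd          # typically 'found'
classify_grep root /nonexistent/file    # an error, not 'not found'
```

The price is boilerplate at every call site, which is the motivation for a dedicated construct.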

Strangely enough, I encountered this pitfall while trying to disallow shell's
error handling pitfalls in YSH! I describe this in another appendix as the
"[meta pitfall](#the-meta-pitfall)".

### Design Mistake: The Disabled `errexit` Quirk

There's more bad news about the design of shell's `if` statement. It's subject
to the *Disabled `errexit` Quirk*, which means when you use a **shell function**
in a conditional context, errors are unexpectedly **ignored**.

That is, while `if ls /tmp` is useful, `if my-ls-function /tmp` should be
avoided. It yields surprising results.

I call this the *`if myfunc` Pitfall*, and show an example in [the
appendix](#disabled-errexit-quirk-if-myfunc-pitfall).

We can't fix this decades-old bug in shell. Instead we disallow dangerous code
with `strict_errexit`, and add new error handling mechanisms.

&nbsp;

## YSH Error Handling: The Big Picture

We've reviewed how POSIX shell and bash work, and shown fundamental problems
with the shell language.

But when you're using YSH, **you don't have to worry about any of this**!

### YSH Fails On Every Error

This means you don't have to explicitly check for errors. Examples:

    shopt --set ysh:upgrade     # Enable good error handling in bin/osh
                                # It's the default in bin/ysh.
    shopt --set strict_errexit  # Disallow bad shell error handling.
                                # Also the default in bin/ysh.

    local date=$(date X)  # 'date' failure is fatal
    # => date: invalid date 'X'

    echo $(date X)  # ditto

    echo $(date X) $(ls > F)  # 'ls' isn't executed; 'date' fails first

    ls /bad | wc  # 'ls' failure is fatal

    diff <(sort A) <(sort B)  # 'sort' failure is fatal

On the other hand, you won't experience this problem caused by `pipefail`:

    yes | head  # doesn't fail due to SIGPIPE

The details are explained below.

### `try` Handles Command and Expression Errors

You may want to **handle failure** instead of aborting the shell. In this
case, use the `try` builtin and inspect the `_error` variable it sets.

    try {                     # try takes a block of commands
      ls /etc
      ls /BAD                 # it stops at the first failure
      ls /lib
    }                         # After try, $? is always 0
    if (_error.code !== 0) {  # Now check _error
      echo 'failed'
    }

Note that:

- The `_error.code` variable is different from `$?`.
  - The leading `_` is a PHP-like convention for special variables /
    "registers" in YSH.
- Idiomatic YSH programs don't look at `$?`.

You also have fine-grained control over every process in a pipeline:

    try {
      ls /bad | wc
    }
    write -- @_pipeline_status  # every exit status

And each process substitution:

    try {
      diff <(sort left.txt) <(sort right.txt)
    }
    write -- @_process_sub_status  # every exit status


&nbsp;

<div class="attention">

See [YSH vs. Shell Idioms > Error Handling](idioms.html#error-handling) for
more examples.

</div>

&nbsp;

Certain expressions produce fatal errors, like:

    var x = 42 / 0  # divide by zero will abort shell

The `try` builtin also handles them:

    try {
      var x = 42 / 0
    }
    if failed {
      echo 'divide by zero'
    }

More examples:

- Index out of bounds `a[i]`
- Nonexistent key `d->foo` or `d['foo']`.

Such expression evaluation errors result in status `3`, which is an arbitrary
non-zero status that's not used by other shells. Status `2` is generally for
syntax errors and status `1` is for most runtime failures.

### `boolstatus` Enforces 0 or 1 Status

The `boolstatus` builtin addresses the *Error or False Pitfall*:

    if boolstatus grep 'class' *.py {  # may abort the program
      echo 'found'      # status 0 means 'found'
    } else {
      echo 'not found'  # status 1 means 'not found'
    }

Rather than confusing **error** with **false**, `boolstatus` will abort the
program if `grep` doesn't return 0 or 1.

You can think of this as a shortcut for:

    try {
      grep 'class' *.py
    }
    case (_error.code) {
      (0)    { echo 'found' }
      (1)    { echo 'not found' }
      (else) { echo 'fatal'
               exit $[_error.code]
             }
    }

### FAQ on Language Design

<div class="faq">

Why is there `try` but no `catch`?

</div>

First, it offers more flexibility:

- The handler usually inspects `_error.code`, but it may also inspect
  `_pipeline_status` or `_process_sub_status`.
- The handler may use `case` instead of `if`, e.g. to distinguish true / false
  / error.

Second, it makes the language smaller:

- `try` / `catch` would require specially parsed keywords. But our `try` is a
  shell builtin that takes a block, like `cd` or `shopt`.
- The builtin also lets us write either `try ls` or `try { ls }`, which is hard
  with a keyword.

Another way to remember this is that there are **three parts** to handling an
error, each of which has independent choices:

1. Does `try` take a simple command or a block? For example, `try ls` versus
   `try { ls; var x = 42 / n }`
2. Which status do you want to inspect?
3. Inspect it with `if` or `case`? As mentioned, `boolstatus` is a special
   case of `try / case`.

<div class="faq">

Why is `_error.code` different from `$?`?

</div>

This avoids special cases in the interpreter for `try`, which is again a
builtin that takes a block.

The exit status of `try` is always `0`. If it returned a non-zero status, the
`errexit` rule would trigger, and you wouldn't be able to handle the error!

Generally, [errors occur *inside* blocks, not
outside](proc-block-func.html#errors).

Again, idiomatic YSH scripts never look at `$?`, which is only used to trigger
shell's `errexit` rule. Instead they invoke `try` and inspect `_error.code`
when they want to handle errors.

<div class="faq">

Why `boolstatus`? Can't you just change what `if` means in YSH?

</div>

I've learned the hard way that when there's a shell **semantics** change, there
must be a **syntax** change. In general, you should be able to read code on
its own, without context.

Readers shouldn't have to constantly look up whether `ysh:upgrade` is on. There
are some cases where this is necessary, but it should be minimized.

Also, both `if foo` and `if boolstatus foo` are useful in idiomatic YSH code.

&nbsp;

<div class="attention">

**Most users can skip to [the summary](#summary).** You don't need to know all
the details to use YSH.

</div>

&nbsp;

## Reference: Global Options


Under the hood, we implement the `errexit` option from POSIX, bash options like
`pipefail` and `inherit_errexit`, and add more options of our
own. They're all hidden behind [option groups](options.html) like `strict:all`
and `ysh:upgrade`.

The following sections explain new YSH options.

### `command_sub_errexit` Adds More Errors

In all Bourne shells, the status of command subs is lost, so errors are ignored
(details in the [appendix](#quirky-behavior-of)). For example:

    echo $(date X) $(date Y)  # 2 failures, both ignored
    echo                      # program continues

The `command_sub_errexit` option makes both `date` invocations an error.
The status `$?` of the parent `echo` command will be `1`, so if `errexit` is
on, the shell will abort.

(Other shells should implement `command_sub_errexit`!)

### `process_sub_fail` Is Analogous to `pipefail`

Similarly, in this example, `sort` will fail if the file doesn't exist.

    diff <(sort left.txt) <(sort right.txt)  # any failures are ignored

But there's no way to see this error in bash. YSH adds `process_sub_fail`,
which folds the failure into `$?` so `errexit` can do its job.

You can also inspect the special `_process_sub_status` array variable to
implement custom error logic.
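To see why this matters, here's a small bash experiment (a sketch, with placeholder file paths that are assumed not to exist). Both `sort` processes fail, so `diff` compares two empty streams and reports that they're identical.

```shell
# Run in a separate bash process, since process subs aren't POSIX.
# The paths are placeholders that are assumed not to exist.
status=$(bash -c '
  diff <(sort /nonexistent/left 2>/dev/null) \
       <(sort /nonexistent/right 2>/dev/null)
  echo $?
')
echo "diff status: $status"   # => diff status: 0
```

A script that treats status 0 as "the files match" would draw the wrong conclusion here.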

### `strict_errexit` Flags Two Problems

Like other `strict_*` options, YSH `strict_errexit` improves your shell
programs, even if you run them under another shell like [bash]($xref)! It's
like a linter *at runtime*, so it can catch things that [ShellCheck][] can't.

[ShellCheck]: https://www.shellcheck.net/

`strict_errexit` disallows code that exhibits these problems:

1. The `if myfunc` Pitfall
1. The `local x=$(false)` Pitfall

See the appendix for examples of each.

#### Rules to Prevent the `if myfunc` Pitfall

In any conditional context, `strict_errexit` disallows:

1. All commands except `((`, `[[`, and some simple commands (e.g. `echo foo`).
   - Detail: `! ls` is considered a pipeline in the shell grammar. We have to
     allow it, while disallowing `ls | grep foo`.
2. Function/proc invocations (which are a special case of simple
   commands).
3. Command sub and process sub (`shopt --unset allow_csub_psub`)

This means that you should check the exit status of functions and pipelines
differently. See [Does a Function
Succeed?](idioms.html#does-a-function-succeed), [Does a Pipeline
Succeed?](idioms.html#does-a-pipeline-succeed), and other [YSH vs. Shell
Idioms](idioms.html).

#### Rule to Prevent the `local x=$(false)` Pitfall

- Command subs and process subs are disallowed in assignment builtins: `local`,
  `declare` aka `typeset`, `readonly`, and `export`.

No:

    local x=$(false)

Yes:

    var x = $(false)   # YSH style

    local x            # Shell style
    x=$(false)

### `sigpipe_status_ok` Ignores an Issue With `pipefail`

When you turn on `pipefail`, you may inadvertently run into this behavior:

    yes | head
    # => y
    # ...

    echo ${PIPESTATUS[@]}
    # => 141 0

That is, `head` closes the pipe after 10 lines, causing the `yes` command to
**fail** with `SIGPIPE` status `141`, i.e. 128 plus the signal number 13.

This error shouldn't be fatal, so OSH has a `sigpipe_status_ok` option, which
is on by default in YSH.

### `verbose_errexit`

When `verbose_errexit` is on, the shell prints errors to `stderr` when the
`errexit` rule is triggered.

### FAQ on Options

<div class="faq">

Why is there no `_command_sub_status`? And why is `command_sub_errexit` named
differently than `process_sub_fail` and `pipefail`?

</div>

Command subs are executed **serially**, while process subs and pipeline parts
run **in parallel**.

So a command sub can "abort" its parent command, setting `$?` immediately.
The parallel constructs must wait until all parts are done and save statuses in
an array. Afterward, they determine `$?` based on the value of `pipefail` and
`process_sub_fail`.

<div class="faq">

Why are `strict_errexit` and `command_sub_errexit` different options?

</div>

Because `shopt --set strict:all` can be used to improve scripts that are run
under other shells like [bash]($xref). It's like a runtime linter that
disallows dangerous constructs.

On the other hand, if you write code with `command_sub_errexit` on, it's
impossible to get the same failures under bash. So `command_sub_errexit` is
not a `strict_*` option, and it's meant for code that runs only under YSH.

<div class="faq">

What's the difference between bash's `inherit_errexit` and YSH
`command_sub_errexit`? Don't they both relate to command subs?

</div>

- `inherit_errexit` enables failure in the **child** process running the
  command sub.
- `command_sub_errexit` enables failure in the **parent** process, after the
  command sub has finished.
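The bash half of this distinction can be checked directly. The following sketch assumes bash 4.4 or later (for `inherit_errexit`); the variable name `out` is arbitrary. The child stops at `false`, but the parent `echo` still succeeds, which is the gap that `command_sub_errexit` closes in YSH.

```shell
# Run in a fresh bash process so that errexit can't affect the caller.
out=$(bash -c '
  set -o errexit
  shopt -s inherit_errexit

  # The child aborts at "false", so "two" is never printed ...
  echo "$(echo one; false; echo two)"

  # ... but the parent never sees the failure.
  echo "parent status: $?"
')
echo "$out"
# => one
# => parent status: 0
```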

&nbsp;

## Summary

YSH uses three mechanisms to fix error handling once and for all.

It has two new **builtins** that relate to errors:

1. `try` lets you explicitly handle errors when `errexit` is on.
1. `boolstatus` enforces a true/false meaning. (This builtin is less common.)

It has three **special variables**:

1. The `_error` register, which is set by `try`.
   - Remember that `_error.code` is distinct from `$?`, and that idiomatic YSH
     programs don't use `$?`.
1. The `_pipeline_status` array (another name for bash's `PIPESTATUS`).
1. The `_process_sub_status` array for process substitutions.

Finally, it supports all of these **global options**:

- From POSIX shell:
  - `errexit`
- From [bash]($xref):
  - `pipefail`
  - `inherit_errexit` aborts the child process of a command sub.
- New:
  - `command_sub_errexit` aborts the parent process immediately after a failed
    command sub.
  - `process_sub_fail` is analogous to `pipefail`.
  - `strict_errexit` flags two common problems.
  - `sigpipe_status_ok` ignores a spurious "broken pipe" failure.
  - `verbose_errexit` controls whether error messages are printed.

When using `bin/osh`, set all options at once with `shopt --set ysh:upgrade
strict:all`. Or use `bin/ysh`, where they're set by default.

<!--
Related 2020 blog post [Reliable Error
Handling](https://www.oilshell.org/blog/2020/10/osh-features.html#reliable-error-handling).
-->


## Related Docs

- [YSH vs. Shell Idioms](idioms.html) shows more examples of `try` and `boolstatus`.
- [Shell Idioms](shell-idioms.html) has a section on fixing `strict_errexit`
  problems in Bourne shell.

Good articles on `errexit`:

- Bash FAQ: [Why doesn't `set -e` do what I expected?][bash-faq]
- [Bash: Error Handling](http://fvue.nl/wiki/Bash:_Error_handling) from
  `fvue.nl`

[bash-faq]: http://mywiki.wooledge.org/BashFAQ/105

Spec Test Suites:

- <https://www.oilshell.org/release/latest/test/spec.wwz/survey/errexit.html>
- <https://www.oilshell.org/release/latest/test/spec.wwz/survey/errexit-oil.html>

These docs aren't about error handling, but they're also painstaking
backward-compatible overhauls of shell!

- [Simple Word Evaluation in Unix Shell](simple-word-eval.html)
- [Egg Expressions (YSH Regexes)](eggex.html)

For reference, this work on error handling was described in [Four Features That
Justify a New Unix
Shell](https://www.oilshell.org/blog/2020/10/osh-features.html) (October 2020).
Since then, we changed `try` and `_error` to be more powerful and general.

&nbsp;

## Appendices

### List Of Pitfalls

We mentioned some of these pitfalls:

1. The `if myfunc` Pitfall, caused by the Disabled `errexit` Quirk (`strict_errexit`)
1. The `local x=$(false)` Pitfall (`strict_errexit`)
1. The Error or False Pitfall (`boolstatus`, `try` / `case`)
   - Special case: When the child process is another instance of the shell, the
     Meta Pitfall is possible.
1. The Process Sub Pitfall (`process_sub_fail` and `_process_sub_status`)
1. The `yes | head` Pitfall (`sigpipe_status_ok`)

There are two pitfalls related to command subs:

6. The `echo $(false)` Pitfall (`command_sub_errexit`)
6. Bash's `inherit_errexit` pitfall.
   - As mentioned, this bash 4.4 option fixed a bug in earlier versions of
     bash. YSH reimplements it and turns it on by default.

Here are two more pitfalls that don't require changes to YSH:

8. The Trailing `&&` Pitfall
   - When `test -d /bin && echo found` is at the end of a function, the exit
     code is surprising.
   - Solution: always use `if` rather than `&&`.
   - More reasons: the `if` is easier to read, and `&&` isn't useful when
     `errexit` is on.
8. The surprising return value of `(( i++ ))`, `let`, `expr`, etc.
   - Solution: Use `i=$((i + 1))`, which is valid POSIX shell.
   - In YSH, use `setvar i += 1`.
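The last two pitfalls are easy to reproduce. This is a sketch only: the function and variable names are hypothetical, the directory is assumed not to exist, and the `(( i++ ))` line runs under `bash -c` because that syntax isn't POSIX.

```shell
# Pitfall 8: with a trailing &&, the function returns the status of 'test'.
check_dir() {
  test -d "$1" && echo found
}

# Same logic with 'if': a false condition isn't an error.
check_dir_if() {
  if test -d "$1"; then
    echo found
  fi
}

and_status=0
check_dir /nonexistent/dir || and_status=$?

if_status=0
check_dir_if /nonexistent/dir || if_status=$?

# Pitfall 9 (bash): (( i++ )) evaluates to the old value 0, which counts as false.
incr_status=$(bash -c 'i=0; (( i++ )); echo $?')

# An assignment with arithmetic expansion succeeds.
j=0
j=$((j + 1))
assign_status=$?

echo "and=$and_status if=$if_status incr=$incr_status assign=$assign_status"
# => and=1 if=0 incr=1 assign=0
```

Under `errexit`, a bare call to `check_dir` or a bare `(( i++ ))` would abort the script even though nothing went wrong.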

#### Example of `inherit_errexit` Pitfall

In bash, `errexit` is disabled in command sub child processes:

    set -e
    shopt -s inherit_errexit  # needed to avoid 'touch two'
    echo $(touch one; false; touch two)

Without the option, it will touch both files, even though `false` fails right
after the first `touch`.

#### Bash has a grammatical quirk with `set -o failglob`

This isn't a pitfall, but a quirk that also relates to errors and shell's
**grammar**. Recall that the definition of `$?` is tied to the grammar.

Consider this program:

    set -o failglob
    echo *.ZZ        # no files match
    echo status=$?   # show failure
    # => status=1

This is the same program with a newline replaced by a semicolon:

    set -o failglob

    # Surprisingly, bash doesn't execute what's after ;
    echo *.ZZ; echo status=$?
    # => (no output)

But it behaves differently. This is because newlines and semicolons are handled
in different **productions of the grammar**, and produce distinct syntax trees.

(A related quirk is that this same difference can affect the number of
processes that shells start!)

### Disabled `errexit` Quirk / `if myfunc` Pitfall

This quirk is a bad interaction between the `if` statement, shell functions,
and `errexit`. It's a **mistake** in the design of the shell language.
Example:

    set -o errexit     # don't ignore errors

    myfunc() {
      ls /bad          # fails with a non-zero status
      echo 'should not get here'
    }

    myfunc  # Good: script aborts before echo
    # => ls: '/bad': no such file or directory

    if myfunc; then  # Surprise! It behaves differently in a condition.
      echo OK
    fi
    # => ls: '/bad': no such file or directory
    # => should not get here

We see "should not get here" because the shell **silently disables** `errexit`
while executing the condition of `if`. This relates to the fundamental
problems above:

1. Does the function use the failure paradigm or the boolean paradigm?
2. `if` tests a single exit status, but every command in a function has an exit
   status. Which one should we consider?

This quirk occurs in all **conditional contexts**:

1. The condition of the `if`, `while`, and `until` constructs
2. A command/pipeline prefixed by `!` (negation)
3. Every clause in `||` and `&&` except the last.
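A quick way to confirm this list in bash is to run the same function in each context. In this sketch, `prelude` and `run` are hypothetical helpers introduced only for the experiment; each snippet runs in its own `bash -c` process so that `errexit` can't abort the calling script.

```shell
# Shared setup for every experiment: errexit on, and a function that
# fails in the middle. (Names here are made up for illustration.)
prelude='
set -o errexit
myfunc() {
  false
  echo "should not get here"
}
'

# Run one snippet in a fresh bash process and ignore its exit status.
run() {
  bash -c "$prelude$1" || true
}

plain=$(run 'myfunc; echo after')               # aborts at false
in_or=$(run 'myfunc || echo handled')           # context 3
negated=$(run '! myfunc; echo after')           # context 2
in_while=$(run 'while myfunc; do break; done')  # context 1

printf 'plain:    [%s]\n' "$plain"       # => plain:    []
printf 'in_or:    [%s]\n' "$in_or"       # => in_or:    [should not get here]
printf 'in_while: [%s]\n' "$in_while"    # => in_while: [should not get here]
printf 'negated:\n%s\n' "$negated"
```

Note that `handled` is never printed: with `errexit` disabled, `myfunc` returns the status of its last command, `echo`, which is 0.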

### The Meta Pitfall

I encountered the *Error or False Pitfall* while trying to disallow other error
handling pitfalls! The *meta pitfall* arises from a combination of the issues
discussed:

1. The `if` statement tests for zero or non-zero status.
1. The condition of an `if` may start child processes. For example, in `if
   myfunc | grep foo`, the `myfunc` invocation must be run in a subshell.
1. You may want an external process to use the **boolean paradigm**, and
   that includes **the shell itself**. When any of the `strict_` options
   encounters bad code, it aborts the shell with **error** status `1`, not
   boolean **false** `1`.

The result of this fundamental issue is that `strict_errexit` is quite strict.
On the other hand, the resulting style is straightforward and explicit.
Earlier attempts allowed code that is too subtle.

### Quirky Behavior of `$?`

This is a different way of summarizing the information above.

Simple commands have an obvious behavior:

    echo hi           # $? is 0
    false             # $? is 1

But the parent process loses errors from failed command subs:

    echo $(false)     # $? is 0
                      # YSH makes it fail with command_sub_errexit

Surprisingly, bare assignments take on the status of the last command sub:

    x=$(false)        # $? is 1 -- we did NOT lose the exit code

But assignment builtins have the problem again:

    local x=$(false)  # $? is 0 -- exit code is clobbered
                      # disallowed by YSH strict_errexit

So shell is confusing and inconsistent, but YSH fixes all these problems. You
never lose the exit code of `false`.


&nbsp;

## Acknowledgments

- Thank you to `ca2013` for extensive review and proofreading of this doc.