---
default_highlighter: oils-sh
---

A Tour of YSH
=============

<!-- author's note about example names

- people: alice, bob
- nouns: ale, bean
  - peanut, coconut
- 42 for integers
-->

This doc describes the [YSH]($xref) language from a **clean slate**
perspective. We don't assume you know Unix shell, or the compatible
[OSH]($xref). But shell users will see the similarity, with simplifications
and upgrades.

Remember, YSH is for Python and JavaScript users who avoid shell! See the
[project FAQ][FAQ] for more color on that.

[FAQ]: https://www.oilshell.org/blog/2021/01/why-a-new-shell.html

This document is **long** because it demonstrates nearly every feature of the
language. You may want to read it in multiple sittings, or read [The Simplest
Explanation of Oil](https://www.oilshell.org/blog/2020/01/simplest-explanation.html)
first. (Until 2023, YSH was called the "Oil language".)


Here's a summary of what follows:

1. YSH has interleaved *word*, *command*, and *expression* languages.
   - The command language has Ruby-like *blocks*, and the expression language
     has Python-like *data types*.
2. YSH has both builtin *commands* like `cd /tmp`, and builtin *functions* like
   `join()`.
3. Languages for *data*, like [JSON][], are complementary to YSH code.
4. OSH and YSH share both an *interpreter data model* and a *process model*
   (provided by the Unix kernel). Understanding these common models will make
   you both a better shell user and YSH user.

Keep these points in mind as you read the details below.

[JSON]: https://json.org

<div id="toc">
</div>

## Preliminaries

Start YSH just like you start bash or Python:

<!-- oils-sh below skips code block extraction, since it doesn't run -->

```sh-prompt
bash$ ysh               # assuming it's installed

ysh$ echo 'hello world' # command typed into YSH
hello world
```

In the sections below, we'll save space by showing output **in comments**, with
`=>`:

    echo 'hello world'  # => hello world

Multi-line output is shown like this:

    echo one
    echo two
    # =>
    # one
    # two

## Examples

### Hello World Script

You can also type commands into a file like `hello.ysh`. This is a complete
YSH program, which is identical to a shell program:

    echo 'hello world'  # => hello world

### A Taste of YSH

Unlike shell, YSH has `var` and `const` keywords:

    const name = 'world'  # const is rarer, used at the top level
    echo "hello $name"    # => hello world

They take rich Python-like expressions on the right:

    var x = 0             # an integer, not a string
    setvar x = x * 2 + 1  # mutate with the 'setvar' keyword

    setvar x += 5         # Increment by 5
    echo $x               # => 6

    var mylist = [x, 7]   # two integers [6, 7]

Expressions are often surrounded by `()`:

    if (x > 0) {
      echo 'positive'
    }  # => positive

    for i, item in (mylist) {  # 'mylist' is a variable, not a string
      echo "[$i] item $item"
    }
    # =>
    # [0] item 6
    # [1] item 7

YSH has Ruby-like blocks:

    cd /tmp {
      echo hi > greeting.txt  # file created inside /tmp
      echo $PWD               # => /tmp
    }
    echo $PWD                 # prints the original directory

And utilities to read and write JSON:

    var person = {name: 'bob', age: 42}
    json write (person)
    # =>
    # {
    #   "name": "bob",
    #   "age": 42
    # }

    echo '["str", 42]' | json read  # sets '_reply' variable by default

The `=` keyword evaluates and prints an expression:

    = _reply
    # => (List) ["str", 42]

(Think of it like `var x = _reply`, without the `var`.)

## Word Language: Expressions for Strings (and Arrays)

Let's describe the word language first, and then talk about commands and
expressions. Words are a rich language because **strings** are a central
concept in shell.

### Unquoted Words

Words denote strings, but you often don't need to quote them:

    echo hi  # => hi

Quotes are useful when a string has spaces, or punctuation characters like
`( ) ;`.

### Three Kinds of String Literals

You can choose the style that's most convenient to write a given string.

#### Double-Quoted, Single-Quoted, and J8 strings (like JSON)

Double-quoted strings allow **interpolation**, with `$`:

    var person = 'alice'
    echo "hi $person, $(echo bye)"  # => hi alice, bye

Escape special characters like `$` and `"` with `\`:

    echo "\$ \" \\ "  # => $ " \

In single-quoted strings, all characters are **literal** (except `'`, which
can't be expressed):

    echo 'c:\Program Files\'  # => c:\Program Files\

If you want C-style backslash **character escapes**, use a J8 string, which is
like JSON, but with single quotes:

    echo u' A is \u{41} \n line two, with backslash \\'
    # =>
    #  A is A
    #  line two, with backslash \

The `u''` strings are guaranteed to be valid Unicode (unlike JSON). You can
also use `b''` strings:

    echo b'byte \yff'  # Byte that's not valid unicode, like \xff in C.
                       # Don't confuse it with \u{ff}.

#### Multi-line Strings

Multi-line strings are surrounded with triple quotes. They come in the same
three varieties, and leading whitespace is stripped in a convenient way.

    sort <<< """
      var sub: $x
      command sub: $(echo hi)
      expression sub: $[x + 3]
      """
    # =>
    # command sub: hi
    # expression sub: 9
    # var sub: 6

    sort <<< '''
      $2.00  # literal $, no interpolation
      $1.99
      '''
    # =>
    # $1.99
    # $2.00  # literal $, no interpolation

    sort <<< u'''
      C\tD
      A\tB
      '''  # b''' strings also supported
    # =>
    # A B
    # C D

(Use multiline strings instead of shell's [here docs]($xref:here-doc).)

### Three Kinds of Substitution

YSH has syntax for 3 types of substitution, all of which start with `$`. That
is, you can convert any of these things to a **string**:

1. Variables
2. The output of commands
3. The value of expressions

#### Variable Sub

The syntax `$a` or `${a}` converts a variable to a string:

    var a = 'ale'
    echo $a        # => ale
    echo _${a}_    # => _ale_
    echo "_ $a _"  # => _ ale _

The shell operator `:-` is occasionally useful in YSH:

    echo ${not_defined:-'default'}  # => default

#### Command Sub

The `$(echo hi)` syntax runs a command and captures its `stdout`:

    echo $(hostname)        # => example.com
    echo "_ $(hostname) _"  # => _ example.com _

#### Expression Sub

The `$[myexpr]` syntax evaluates an expression and converts it to a string:

    echo $[a]                # => ale
    echo $[1 + 2 * 3]        # => 7
    echo "_ $[1 + 2 * 3] _"  # => _ 7 _

<!-- TODO: safe substitution with "$[a]"html -->

### Arrays of Strings: Globs, Brace Expansion, Splicing, and Splitting

There are four constructs that evaluate to a **list of strings**, rather than a
single string.

#### Globs

Globs like `*.py` evaluate to a sorted list of files.

    touch foo.py bar.py  # create the files
    write *.py
    # =>
    # bar.py
    # foo.py

If no files match, it evaluates to an empty list (`[]`).

#### Brace Expansion

The brace expansion mini-language lets you write strings without duplication:

    write {alice,bob}@example.com
    # =>
    # alice@example.com
    # bob@example.com

#### Splicing

The `@` operator splices an array into a command:

    var myarray = :| ale bean |
    write S @myarray E
    # =>
    # S
    # ale
    # bean
    # E

You also have `@[]` to splice an expression that evaluates to a list:

    write -- @[split('ale bean')]
    # =>
    # ale
    # bean

Each item will be converted to a string.

#### Split Command Sub / Split Builtin Sub

There's also a variant of *command sub* that decodes J8 lines into a sequence
of strings:

    write @(seq 3)  # write is passed 3 args
    # =>
    # 1
    # 2
    # 3

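Because `@()` splits its input on lines, not on spaces, a line that contains a space stays a single argument. Here's a small sketch of that behavior; `printf` is only a convenient way to produce two lines of output:

```ysh
# printf prints two lines; @() turns each line into one argument to write
write @(printf 'ale bean\npea nut\n')
# =>
# ale bean
# pea nut
```
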
## Command Language: I/O, Control Flow, Abstraction

### Simple Commands

A simple command is a space-separated list of words. YSH looks up the first
word to determine if it's a builtin command, or a user-defined `proc`.

    echo 'hello world'   # The shell builtin 'echo'

    proc greet (name) {  # Define a unit of code
      echo "hello $name"
    }

    # The first word now resolves to the proc you defined
    greet alice          # => hello alice

If it's neither, then it's assumed to be an external command:

    ls -l /tmp           # The external 'ls' command

Commands accept traditional string arguments, as well as typed arguments in
parentheses:

    # 'write' is a string arg; 'x' is a typed expression arg
    json write (x)

<!--
Block args are a special kind of typed arg:

    cd /tmp {
      echo $PWD
    }
-->

### Redirects

You can **redirect** `stdin` and `stdout` of simple commands:

    echo hi > tmp.txt  # write to a file
    sort < tmp.txt

Here are the most common idioms for using `stderr` (identical to shell):

    ls /tmp 2>errors.txt
    echo 'fatal error' >&2

### ARGV and ENV

The `ARGV` list holds the arguments passed to the shell:

    var num_args = len(ARGV)
    ls /tmp @ARGV  # pass shell's arguments through

---

You can add to the environment of a new process with a *prefix binding*:

    PYTHONPATH=vendor ./demo.py

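The `ENV` object reflects the shell's current environment. A prefix binding
only affects the new process, so mutate `ENV` to change the shell's own
environment:

    setglobal ENV.PYTHONPATH = 'vendor'
    echo $[ENV.PYTHONPATH]  # => vendor
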
### Pipelines

Pipelines are a powerful method for manipulating data streams:

    ls | wc -l                       # count files in this directory
    find /bin -type f | xargs wc -l  # count lines in each file in a subtree

The stream may contain (lines of) text, binary data, JSON, TSV, and more.
Details below.

### Multi-line Commands

The `...` prefix lets you write long commands, pipelines, and `&&` chains
without `\` line continuations.

    ... find /bin               # traverse this directory and
        -type f -a -executable  # print executable files
      | sort -r                 # reverse sort
      | head -n 30              # limit to 30 files
      ;

When this mode is active:

- A single newline behaves like a space
- A blank line (two newlines in a row) is illegal, but a line that has only a
  comment is allowed. This prevents confusion if you forget the `;`
  terminator.

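For example, this sketch (based on the rules above) puts a comment-only line in the middle of a `...` command. An empty line in the same place would be a syntax error:

```ysh
... ls /tmp        # list a directory
  # a comment-only line doesn't end the command
  | sort -r        # reverse sort
  | head -n 5      # keep at most 5 entries
  ;
```
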
### `var`, `setvar`, `const` to Declare and Mutate

Constants can't be modified:

    const myconst = 'mystr'
    # setvar myconst = 'foo' would be an error

Modify variables with the `setvar` keyword:

    var num_beans = 12
    setvar num_beans = 13

A more complex example:

    var d = {name: 'bob', age: 42}  # dict literal
    setvar d.name = 'alice'         # d.name is a synonym for d['name']
    echo $[d.name]                  # => alice

That's most of what you need to know about assignments. Advanced users may
want to use `setglobal` or `call myplace->setValue(42)` in certain situations.

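Here's a minimal sketch of the `setglobal` case, assuming a proc that updates a variable defined at the top level. The names `counter` and `bump` are hypothetical:

```ysh
var counter = 0

proc bump {
  # setglobal mutates the top-level variable; setvar is for locals
  setglobal counter = counter + 1
}

bump
bump
echo $counter  # => 2
```
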
<!--
    var g = 1
    var h = 2
    proc demo(:out) {
      setglobal g = 42
      setref out = 43
    }
    demo :h  # pass a reference to h
    echo "$g $h"  # => 42 43
-->

More info: [Variable Declaration and Mutation](variables.html).

### `for` Loop

#### Words

Shell-style for loops iterate over **words**:

    for word in 'oils' $num_beans {pea,coco}nut {
      echo $word
    }
    # =>
    # oils
    # 13
    # peanut
    # coconut

You can also request the loop index:

    for i, word in README.md *.py {
      echo "$i - $word"
    }
    # =>
    # 0 - README.md
    # 1 - __init__.py

#### Typed Data

To iterate over typed data, use parentheses around an **expression**. The
expression should evaluate to an integer `Range`, `List`, `Dict`, or `Stdin`.

Range:

    for i in (3 ..< 5) {  # range operator ..<
      echo "i = $i"
    }
    # =>
    # i = 3
    # i = 4

List:

    var foods = ['ale', 'bean']
    for item in (foods) {
      echo $item
    }
    # =>
    # ale
    # bean

Again, you can request the index with `for i, item in ...`.

---

Here's the most general form of the loop over `Dict`:

    var mydict = {pea: 42, nut: 10}
    for i, k, v in (mydict) {
      echo "$i - $k - $v"
    }
    # =>
    # 0 - pea - 42
    # 1 - nut - 10

There are two simpler forms:

- One variable gives you the key: `for k in (mydict)`
- Two variables give you the key and value: `for k, v in (mydict)`

(One way to think of it: `for` loops in YSH have the functionality of Python's
`enumerate()`, `items()`, `keys()`, and `values()`.)

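For instance, here are the two simpler forms applied to a small dict. Like the general form, they visit keys in insertion order. The variable name `prices` is just for illustration:

```ysh
var prices = {pea: 42, nut: 10}

for k in (prices) {     # one variable: the key
  echo $k
}
# =>
# pea
# nut

for k, v in (prices) {  # two variables: key and value
  echo "$k = $v"
}
# =>
# pea = 42
# nut = 10
```
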
---

The `io.stdin` object iterates over lines:

    for line in (io.stdin) {
      echo $line
    }
    # lines are buffered, so it's much faster than `while read --raw-line`

<!--
TODO: Str loop should give you the (UTF-8 offset, rune)
Or maybe just UTF-8 offset? Decoding errors could be exceptions, or Unicode
replacement.
-->

### `while` Loop

While loops can use a **command** as the loop condition:

    while test --file lock {
      sleep 1
    }

Or an **expression**, which is surrounded by `()`:

    var i = 3
    while (i < 6) {
      echo "i = $i"
      setvar i += 1
    }
    # =>
    # i = 3
    # i = 4
    # i = 5

### Conditionals

#### `if elif`

If statements test the exit code of a command, and have optional `elif` and
`else` clauses:

    if test --file foo {
      echo 'foo is a file'
      rm --verbose foo  # delete it
    } elif test --dir foo {
      echo 'foo is a directory'
    } else {
      echo 'neither'
    }

Invert the exit code with `!`:

    if ! grep alice /etc/passwd {
      echo 'alice is not a user'
    }

As with `while` loops, the condition can also be an **expression** wrapped in
`()`:

    if (num_beans > 0) {
      echo 'so many beans'
    }

    var done = false
    if (not done) {  # negate with 'not' operator (contrast with !)
      echo "we aren't done"
    }

#### `case`

The case statement is a series of conditionals and executable blocks. The
condition can be either an unquoted glob pattern like `*.py`, an eggex pattern
like `/d+/`, or a typed expression like `(42)`:

    var s = 'README.md'
    case (s) {
      *.py           { echo 'Python' }
      *.cc | *.h     { echo 'C++' }
      *              { echo 'Other' }
    }
    # => Other

    case (s) {
      / dot* '.md' / { echo 'Markdown' }
      (30 + 12)      { echo 'the integer 42' }
      (else)         { echo 'neither' }
    }
    # => Markdown


<!--
(Shell style like `if foo; then ... fi` and `case $x in ... esac` is also
legal, but discouraged in YSH code.)
-->

### Error Handling

If statements are also used for **error handling**. Builtins and external
commands use this style:

    if ! test -d /bin {
      echo 'not a directory'
    }

    if ! cp foo /tmp {
      echo 'error copying'  # any non-zero status
    }

Procs use this style (because of shell's *disabled `errexit` quirk*):

    try {
      myproc
    }
    if failed {
      echo 'failed'
    }

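After `try`, the exit status of the block is also stored in the `_status` variable. A minimal sketch, using the `false` builtin as the failing command:

```ysh
try {
  false  # fails with status 1
}
echo "status = $_status"  # => status = 1

if failed {
  echo 'failed'           # => failed
}
```
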
For a complete list of examples, see [YSH Error
Handling](ysh-error.html). For design goals and a reference, see [YSH
Fixes Shell's Error Handling](error-handling.html).

#### exit, break, continue, return

The `exit` **keyword** exits a process. (It's not a shell builtin.)

The other 3 control flow keywords behave like they do in Python and JavaScript.

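Here's a short sketch of `continue` and `break` in a YSH `for` loop. `continue` skips the rest of the current iteration, and `break` leaves the loop:

```ysh
for i in (0 ..< 10) {
  if (i === 2) {
    continue  # skip 2
  }
  if (i === 4) {
    break     # stop the loop before printing 4
  }
  echo "i = $i"
}
# =>
# i = 0
# i = 1
# i = 3
```
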
### Shell-like `proc`

You can define units of code with the `proc` keyword. A `proc` is like a
*procedure* or *process*.

    proc mycopy (src, dest) {
      ### Copy verbosely

      mkdir -p $dest
      cp --verbose $src $dest
    }

The `###` line is a "doc comment". Simple procs like this are invoked like a
shell command:

    touch log.txt
    mycopy log.txt /tmp  # first word 'mycopy' is a proc

Procs have many features, including **four** kinds of arguments:

1. Word args (which are always strings)
1. Typed, positional args
1. Typed, named args
1. A final block argument, which may be written with `{ }`.

At the call site, they can look like any of these forms:

    ls /tmp                      # word arg

    json write (d)               # word arg, then positional arg

    try {
      error 'failed' (status=9)  # word arg, then named arg
    }

    cd /tmp { echo $PWD }        # word arg, then block arg

    pp value ([1, 2])            # positional, typed arg

<!-- TODO: lazy arg list: ls8 | where [age > 10] -->

At the definition site, the kinds of parameters are separated with `;`, similar
to the Julia language:

    proc p2 (word1, word2; pos1, pos2, ...rest_pos) {
      echo "$word1 $word2 $[pos1 + pos2]"
      json write (rest_pos)
    }

    proc p3 (w ; ; named1, named2, ...rest_named; block) {
      echo "$w $[named1 + named2]"
      call io->eval(block)
      json write (rest_named)
    }

    proc p4 (; ; ; block) {
      call io->eval(block)
    }

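To see all four kinds of arguments at once, here's a sketch of a hypothetical proc named `show-args`, with one parameter of each kind, and a call that passes all of them. The call uses `;` to separate the positional and named typed args:

```ysh
proc show-args (word; pos; named=0; block) {
  echo "word=$word pos=$pos named=$named"
  call io->eval(block)
}

# word arg, positional arg, named arg, then block arg
show-args ale (42; named=7) {
  echo 'inside the block'
}
# =>
# word=ale pos=42 named=7
# inside the block
```
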
YSH also has Python-like functions defined with `func`. These are part of the
expression language, which we'll see later.

For more info, see the [Guide to Procs and Funcs](proc-func.html).

### Ruby-like Block Arguments

A block is a value of type `Command`. For example, `shopt` is a builtin
command that takes a block argument:

    shopt --unset errexit {  # ignore errors
      cp ale /tmp
      cp bean /bin
    }

In this case, the block doesn't form a new scope.

#### Block Scope / Closures

However, by default, block arguments capture the frame they're defined in.
This means they obey *lexical scope*.

Consider this proc, which accepts a block and runs it:

    proc do-it (; ; ; block) {
      call io->eval(block)
    }

When the block arg is passed, the enclosing stack frame is captured. This
means that code inside the block can use variables in the captured frame:

    var x = 42
    do-it {
      echo "x = $x"  # outer x is visible LATER, when the block is run
    }

- [Feature Index: Closures](ref/feature-index.html#Closures)

### Builtin Commands

**Shell builtins** like `cd` and `read` are the "standard library" of the
command language. Each one takes various flags:

    cd -L .                # follow symlinks

    echo foo | read --all  # read all of stdin

Here are some categories of builtins:

- I/O: `echo write read`
- File system: `cd test`
- Processes: `fork wait forkwait exec`
- Interpreter settings: `shopt shvar`
- Meta: `command builtin runproc type eval`

<!-- TODO: Link to a comprehensive list of builtins -->

## Expression Language: Python-like Types

YSH expressions look and behave more like Python or JavaScript than shell. For
example, we write `if (x < y)` instead of `if [ $x -lt $y ]`. Expressions are
usually surrounded by `( )`.

At runtime, variables like `x` and `y` are bound to **typed data**, like
integers, floats, strings, lists, and dicts.

<!--
[Command vs. Expression Mode](command-vs-expression-mode.html) may help you
understand how YSH is parsed.
-->

### Python-like `func`

In the *Command Language* section, we saw that procs are shell-like units of
code. YSH also has Python-like **functions**, which are different from
`procs`:

- They're defined with the `func` keyword.
- They're called in expressions, not in commands.
- They're **pure**, and live in the **interior** of a process.
  - In contrast, procs usually perform I/O, and have **exterior** boundaries.

The simplest function is:

    func identity(x) {
      return (x)  # parens required for typed return
    }

A more complex pure function:

    func myRepeat(s, n; special=false) {  # positional; named params
      var parts = []
      for i in (0 ..< n) {
        append $s (parts)
      }
      var result = join(parts)

      if (special) {
        return ("$result !!")
      } else {
        return (result)
      }
    }

    echo $[myRepeat('z', 3)]  # => zzz

    echo $[myRepeat('z', 3, special=true)]  # => zzz !!

A function that mutates its argument:

    func popTwice(mylist) {
      call mylist->pop()
      call mylist->pop()
    }

    var mylist = [3, 4]

    # The call keyword is an "adapter" between commands and expressions,
    # like the = keyword.
    call popTwice(mylist)


Funcs are named using `camelCase`, while procs use `kebab-case`. See the
[Style Guide](style-guide.html) for more conventions.

#### Builtin Functions

In addition to builtin commands, YSH has Python-like builtin **functions**.
These are like the "standard library" for the expression language. Examples:

- Functions that take multiple types: `len() type()`
- Conversions: `bool() int() float() str() list() ...`
- Explicit word evaluation: `split() join() glob() maybe()`

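A few of these functions in action. The list name `pair` is just for illustration:

```ysh
var pair = ['ale', 'bean']

echo $[len(pair)]        # => 2
echo $[type(pair)]       # => List
echo $[int('42') + 1]    # => 43
echo $[join(pair, '/')]  # => ale/bean

write -- @[split('pea nut')]
# =>
# pea
# nut
```
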
<!-- TODO: Make a comprehensive list of func builtins. -->


### Data Types: `Int`, `Str`, `List`, `Dict`, `Obj`, ...

YSH has data types, each with an expression syntax and associated methods.

### Methods

Non-mutating methods are looked up with the `.` operator:

    var line = ' ale bean '
    var caps = line.trim().upper()  # 'ALE BEAN'

Mutating methods are looked up with a thin arrow `->`:

    var foods = ['ale', 'bean']
    var last = foods->pop()  # bean
    write @foods             # => ale

You can ignore the return value with the `call` keyword:

    call foods->pop()

That is, YSH adds mutable data structures to shell, so we have a special syntax
for mutation.

---

You can also chain functions with a fat arrow `=>`:

    var trimmed = line.trim() => upper()  # 'ALE BEAN'

The `=>` operator allows functions to appear in a natural left-to-right order,
like methods.

    # list() is a free function taking one arg
    # join() is a free function taking two args
    var x = {k1: 42, k2: 43} => list() => join('/')  # 'k1/k2'

---

Now let's go through the data types in YSH. We'll show the syntax for
literals, and what **methods** they have.

#### Null and Bool

YSH uses JavaScript-like spellings for these three "atoms":

    var x = null

    var b1, b2 = true, false

    if (b1) {
      echo 'yes'
    }  # => yes


#### Int

There are many ways to write integers:

    var small, big = 42, 65_536
    echo "$small $big"          # => 42 65536

    var hex, octal, binary = 0x0001_0000, 0o755, 0b0001_0101
    echo "$hex $octal $binary"  # => 65536 493 21

<!--
"Runes" are integers that represent Unicode code points. They're not common in
YSH code, but can make certain string algorithms more readable.

    # Pound rune literals are similar to ord('A')
    const a = #'A'

    # Backslash rune literals can appear outside of quotes
    const newline = \n    # Remember this is an integer
    const backslash = \\  # ditto

    # Unicode rune literal is syntactic sugar for 0x3bc
    const mu = \u{3bc}

    echo "chars $a $newline $backslash $mu"  # => chars 65 10 92 956
-->

#### Float

Floats are written with a decimal point:

    var big = 3.14

You can use scientific notation, as in Python:

    var small = 1.5e-10

#### Str

See the section above on *Three Kinds of String Literals*. It described
`'single quoted'`, `"double ${quoted}"`, and `u'J8-style\n'` strings, as well
as their multiline variants.

Strings are UTF-8 encoded in memory, like strings in the [Go
language](https://golang.org). There isn't a separate string and unicode type,
as in Python.

Strings are **immutable**, as in Python and JavaScript. This means they only
have **transforming** methods:

    var x = s.trim()

Other methods:

- `trimLeft() trimRight()`
- `trimPrefix() trimSuffix()`
- `upper() lower()`
- `search() leftMatch()` - pattern matching
- `replace() split()`

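A short sketch of some of these methods. The name `padded` is hypothetical, and the brackets in the first `echo` only make the remaining whitespace visible:

```ysh
var padded = '  ale bean  '

echo "[$[padded.trimLeft()]]"       # => [ale bean  ]
echo $['foo.py'.trimSuffix('.py')]  # => foo
echo $['ALE'.lower()]               # => ale
echo $['a-b-c'.replace('-', '/')]   # => a/b/c
```
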
#### List (and Arrays)

All lists can be expressed with Python-like literals:

    var foods = ['ale', 'bean', 'corn']
    var recursive = [1, [2, 3]]

As a special case, lists of strings are called **arrays**. It's often more
convenient to write them with shell-like literals:

    # No quotes or commas
    var foods = :| ale bean corn |

    # You can use the word language here
    var other = :| foo $s *.py {alice,bob}@example.com |

Lists are **mutable**, as in Python and JavaScript. So they mainly have
mutating methods:

    call foods->reverse()
    write -- @foods
    # =>
    # corn
    # bean
    # ale

#### Dict

Dicts use syntax that's like JavaScript. Here's a dict literal:

    var d = {
      name: 'bob',  # unquoted keys are allowed
      age: 42,
      'key with spaces': 'val'
    }

You can use either `[]` or `.` to retrieve a value, given a key:

    var v1 = d['name']
    var v2 = d.name                # shorthand for the above
    var v3 = d['key with spaces']  # no shorthand for this

(If the key doesn't exist, an error is raised.)

You can change Dict values with the same 2 syntaxes, using `setvar`:

    setvar d['name'] = 'other'
    setvar d.name = 'fun'

---

If you want to compute a key name, use an expression inside `[]`:

    var key = 'alice'
    var d2 = {[key ++ '_z']: 'ZZZ'}  # Computed key name
    echo $[d2.alice_z]               # => ZZZ

If you omit the value, it's taken from a variable of the same name:

    var d3 = {key}            # like {key: key}
    echo "name is $[d3.key]"  # => name is alice

More examples:

    var empty = {}
    echo $[len(empty)]  # => 0

The `keys()` and `values()` methods return new `List` objects:

    var keys = keys(d2)    # => alice_z
    var vals = values(d3)  # => alice

#### Obj

YSH has an `Obj` type that bundles **code** and **data**. (In contrast, JSON
messages are pure data, not objects.)

The main purpose of objects is **polymorphism**:

    var obj = makeMyObject(42)  # I don't know what it looks like inside

    echo $[obj.myMethod()]      # But I can perform abstract operations

    call obj->mutatingMethod()  # Mutation is considered special, with ->

YSH objects are similar to Lua and JavaScript objects. They can be thought of
as a linked list of `Dict` instances.

Or you can say they have a `Dict` of properties, and a recursive "prototype
chain" that is also an `Obj`.

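Here's a minimal sketch of a prototype chain, assuming the builtin `Object(prototype, properties)` constructor. The names `area`, `Rect`, and `r` are hypothetical. The method lives on the prototype, while the data lives on the object itself:

```ysh
func area(self) {
  return (self.w * self.h)
}

var Rect = Object(null, {area: area})  # prototype holds the method
var r = Object(Rect, {w: 3, h: 4})     # object holds the data

echo $[r.area()]  # => 12
```

Because `r.area()` is resolved through the prototype, another object with a different prototype could supply its own `area` method without the caller changing.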
- [Feature Index: Objects](ref/feature-index.html#Objects)

### `Place` type / "out params"

The `read` builtin can set an implicit variable `_reply`:

    whoami | read --all  # sets _reply

Or you can pass a `value.Place`, created with `&`:

    var x                     # implicitly initialized to null
    whoami | read --all (&x)  # mutate this "place"
    echo who=$x               # => who=andy

<!--
#### Quotation Types: value.Command (Block) and value.Expr

These types are for reflection on YSH code. Most YSH programs won't use them
directly.

- `Command`: an unevaluated code block.
  - rarely-used literal: `^(ls | wc -l)`
- `Expr`: an unevaluated expression.
  - rarely-used literal: `^[42 + a[i]]`
-->

### Operators

YSH operators are generally the same as in Python:

    if (10 <= num_beans and num_beans < 20) {
      echo 'enough'
    }  # => enough

YSH has a few operators that aren't in Python. Equality can be approximate or
exact:

    var n = ' 42 '
    if (n ~== 42) {
      echo 'equal after stripping whitespace and type conversion'
    }  # => equal after stripping whitespace and type conversion

    if (n === 42) {
      echo "not reached because strings and ints aren't equal"
    }

<!-- TODO: is n === 42 a type error? -->

Pattern matching can be done with globs (`~~` and `!~~`):

    const filename = 'foo.py'
    if (filename ~~ '*.py') {
      echo 'Python'
    }  # => Python

    if (filename !~~ '*.sh') {
      echo 'not shell'
    }  # => not shell

It can also be done with regular expressions (`~` and `!~`). See the Eggex
section below for an example.

Concatenation is `++` rather than `+` because it avoids confusion in the
presence of type conversion:

    var n = '42' + 1             # string plus int does implicit conversion
    echo $n                      # => 43

    var y = 'ale ' ++ "bean $n"  # concatenation
    echo $y                      # => ale bean 43

<!--

#### Summary of Operators

- Arithmetic: `+ - * / // %` and `**` for exponentiation
  - `/` always yields a float, and `//` is integer division
- Bitwise: `& | ^ ~`
- Logical: `and or not`
- Comparison: `== < > <= >= in 'not in'`
  - Approximate equality: `~==`
  - Eggex and glob match: `~ !~ ~~ !~~`
- Ternary: `1 if x else 0`
- Index and slice: `mylist[3]` and `mylist[1:3]`
  - `mydict->key` is a shortcut for `mydict['key']`
- Function calls
  - free: `f(x, y)`
  - transformations and chaining: `s => startWith('prefix')`
  - mutating methods: `mylist->pop()`
- String and List: `++` for concatenation
  - This is a separate operator because the addition operator `+` does
    string-to-int conversion

TODO: What about list comprehensions?
-->

### Egg Expressions (YSH Regexes)

An *Eggex* is a YSH expression that denotes a regular expression. Eggexes
translate to POSIX ERE syntax, for use with tools like `egrep`, `awk`, and
`sed --regexp-extended` (GNU only).

They're designed to be readable and composable. Example:

    var D = / digit{1,3} /
    var ip_pattern = / D '.' D '.' D '.' D /

    var z = '192.168.0.1'
    if (z ~ ip_pattern) {  # Use the ~ operator to match
      echo "$z looks like an IP address"
    }  # => 192.168.0.1 looks like an IP address

    if (z !~ / '.255' %end /) {
      echo "doesn't end with .255"
    }  # => doesn't end with .255

See the [Egg Expressions doc](eggex.html) for details.

1171## Interlude
1172
1173Before moving onto other YSH features, let's review what we've seen.
1174
1175### Three Interleaved Languages
1176
1177Here are the languages we saw in the last 3 sections:
1178
11791. **Words** evaluate to a string, or list of strings. This includes:
1180 - literals like `'mystr'`
1181 - substitutions like `${x}` and `$(hostname)`
1182 - globs like `*.sh`
11832. **Commands** are used for
1184 - I/O: pipelines, builtins like `read`
1185 - control flow: `if`, `for`
1186 - abstraction: `proc`
11873. **Expressions** on typed data are borrowed from Python, with influence from
1188 JavaScript:
1189 - Lists: `['ale', 'bean']` or `:| ale bean |`
1190 - Dicts: `{name: 'bob', age: 42}`
1191 - Functions: `split('ale bean')` and `join(['pea', 'nut'])`
1192
### How Do They Work Together?

Here are two examples:

(1) In this *command*, there are **four** *words*. The fourth word is an
*expression sub* `$[]`.

    write hello $name $[d['age'] + 1]
    # =>
    # hello
    # world
    # 43

(2) In this assignment, the *expression* on the right-hand side of `=`
concatenates two strings. The first string is a literal, and the second is a
*command sub*.

    var food = 'ale ' ++ $(echo bean | tr a-z A-Z)
    write $food  # => ale BEAN

So words, commands, and expressions are **mutually recursive**. If you're a
conceptual person, skimming [Syntactic Concepts](syntactic-concepts.html) may
help you understand this on a deeper level.

<!--
One way to think about these sublanguages is to note that the `|` character
means something different in each context:

- In the command language, it's the pipeline operator, as in `ls | wc -l`
- In the word language, it's only valid in a literal string like `'|'`, `"|"`,
  or `\|`. (It's also used in `${x|html}`, which formats a string.)
- In the expression language, it's the bitwise OR operator, as in Python and
  JavaScript.
-->

---

Let's move on from talking about **code**, and talk about **data**.

## Data Notation / Interchange Formats

In YSH, you can read and write data languages based on [JSON]($xref). This is
a primary way to exchange messages between Unix processes.

Instead of being **executed**, like our command/word/expression languages,
these languages are **parsed** into data structures.

<!-- TODO: Link to slogans, fallacies, and concepts -->

### UTF-8

UTF-8 is the foundation of our data notation. It's the most common Unicode
encoding, and the most consistent:

    var x = u'hello \u{1f642}'  # store a UTF-8 string in memory
    echo $x                     # send UTF-8 to stdout

hello &#x1f642;
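
Because `x` is stored as UTF-8, its length in bytes is larger than its number of
characters. Here's a small sketch, assuming `len()` on a `Str` counts bytes
(YSH strings are byte strings):

```ysh
var x = u'hello \u{1f642}'
# 'hello ' is 6 bytes, and U+1F642 takes 4 bytes in UTF-8
echo $[len(x)]  # => 10
```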

<!-- TODO: there's a runes() iterator which gives integer offsets, usable for
slicing -->

### JSON

JSON messages are UTF-8 text. You can encode and decode JSON with functions
(`func` style):

    var message = toJson({x: 42})       # => (Str)   '{"x": 42}'
    var mydict = fromJson('{"x": 42}')  # => (Dict)  {x: 42}

Or with commands (`proc` style):

    json write ({x: 42}) > foo.json     # writes '{"x": 42}'

    json read (&mydict) < foo.json      # create var
    = mydict                            # => (Dict)  {x: 42}
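
As a minimal sketch, the two functions above can round-trip a value through
JSON text. The variable names are only for illustration:

```ysh
var original = {name: 'bob', age: 42}
var text = toJson(original)  # a Str containing JSON
var copy = fromJson(text)    # decoded back into a Dict

echo $[copy['name']] $[copy['age'] + 1]  # => bob 43
```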

### J8 Notation

But JSON isn't quite enough for a principled shell.

- Traditional Unix tools like `grep` and `awk` operate on streams of **lines**.
  In YSH, to avoid data-dependent bugs, we want a reliable way of **quoting**
  lines.
- In YSH, we also want to represent **binary** data, not just text. When you
  read a Unix file, it may or may not be text.

So we borrow JSON-style strings, and create [J8 Notation][]. Slogans:

- *Deconstructing and Augmenting JSON*
- *Fixing the JSON-Unix Mismatch*

[J8 Notation]: $xref:j8-notation

#### J8 Lines

*J8 Lines* are a building block of J8 Notation. If you have a file
`lines.txt`:

<pre>
  doc/hello.md
 "doc/with spaces.md"
b'doc/with byte \yff.md'
</pre>

Then you can decode it with *split command sub* (mentioned above):

    var decoded = @(cat lines.txt)

This file has:

1. An unquoted string
1. A JSON string with `"double quotes"`
1. A J8-style string: `u'unicode'` or `b'bytes'`
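
Here's a sketch that writes a two-line file and decodes it, assuming `@()`
decodes J8 Lines as described above, and that `write` prints each argument on
its own line. The second line is JSON-quoted because it contains a space:

```ysh
write 'doc/hello.md' '"doc/with spaces.md"' > lines.txt

var decoded = @(cat lines.txt)  # expected: a List of 2 strings, quotes removed
for item in (decoded) {
  echo "[$item]"  # brackets make the decoded boundaries visible
}
```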

<!--
TODO: fromJ8Line() toJ8Line()
-->

#### JSON8 is Tree-Shaped

JSON8 is just like JSON, but it allows J8-style strings:

<pre>
{ "foo": "hi \uD83D\uDE42"}  # valid JSON, and valid JSON8
{u'foo': u'hi \u{1F642}'  }  # valid JSON8, with J8-style strings
</pre>

<!--
In addition to strings and lines, you can write and read **tree-shaped** data
as [JSON][]:

    var d = {key: 'value'}
    json write (d)                  # dump variable d as JSON
    # =>
    # {
    #   "key": "value"
    # }

    echo '["ale", 42]' > example.json

    json read (&d2) < example.json  # parse JSON into var d2
    pp (d2)                         # pretty print it
    # => (List)  ['ale', 42]

[JSON][] will lose information when strings have binary data, but the slight
[JSON8]($xref) upgrade won't:

    var b = {binary: $'\xff'}
    json8 write (b)
    # =>
    # {
    #   "binary": b'\yff'
    # }
-->

[JSON]: $xref

#### TSV8 is Table-Shaped

(TODO: not yet implemented.)

YSH will support data notation for tables:

1. Plain TSV files, which are untyped. Every column has string data.
   - Cells with tabs, newlines, and binary data are a problem.
2. Our extension [TSV8]($xref), which supports typed data.
   - It uses JSON notation for booleans, integers, and floats.
   - It uses J8 strings, which can represent any string.

<!-- Figure out the API. Does it work like JSON?

Or I think we just implement
- rows: 'where' or 'filter' (dplyr)
- cols: 'select' conflicts with shell builtin; call it 'cols'?
- sort: 'sort-by' or 'arrange' (dplyr)
- TSV8 <=> sqlite conversion. Are these drivers or what?
  - and then let you pipe output?

Do we also need TSV8 space2tab or something? For writing TSV8 inline.

More later:
- MessagePack (e.g. for shared library extension modules)
  - msgpack read, write? I think user-defined function could be like this?
- SASH: Simple and Strict HTML? For easy processing
-->

## YSH Modules are Files

A module is a **file** of source code, like `lib/myargs.ysh`. The `use`
builtin turns it into an `Obj` that can be invoked and inspected:

    use lib/myargs.ysh

    myargs proc1 --flag val   # module name becomes a prefix, via __invoke__
    var alias = myargs.proc1  # module has attributes

You can import specific names with the `--pick` flag:

    use lib/myargs.ysh --pick p2 p3

    p2
    p3

- [Feature Index: Modules](ref/feature-index.html#Modules)

## The Runtime Shared by OSH and YSH

Although we describe OSH and YSH as different languages, they use the **same**
interpreter under the hood.

This interpreter has many `shopt` booleans to control behavior, like `shopt
--set parse_paren`. The group `shopt --set ysh:all` flips all booleans to make
`bin/osh` behave like `bin/ysh`.

Understanding this common runtime, and its interface to the Unix kernel, will
help you understand **both** languages!

### Interpreter Data Model

The [Interpreter State](interpreter-state.html) doc is under construction. It
will cover:

- The **call stack** for OSH and YSH
  - Each *stack frame* is a `{name -> cell}` mapping.
- Each cell has a **value**, with boolean flags
  - OSH has types `Str BashArray BashAssoc`, and flags `readonly export
    nameref`.
  - YSH has types `Bool Int Float Str List Dict Obj ...`, and the `readonly`
    flag.
- YSH **namespaces**
  - Modules with `use`
  - Builtin functions and commands
  - ENV
- Shell **options**
  - Boolean options with `shopt`: `parse_paren`, `simple_word_eval`, etc.
  - String options with `shvar`: `IFS`, `PATH`
- **Registers** that store interpreter state
  - `$?` and `_error`
  - `$!` for the last PID
  - `_this_dir`
  - `_reply`
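
Registers are easiest to see with the `try` builtin, which runs a block and
records a failure in `_error` instead of stopping the script. A minimal sketch:

```ysh
try {
  false  # a command that always fails with status 1
}
echo "code: $[_error.code]"  # => code: 1
```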

### Process Model (the kernel)

The [Process Model](process-model.html) doc is **under construction**. It will cover:

- Simple Commands, `exec`
- Pipelines. #[shell-the-good-parts](#blog-tag)
- `fork`, `forkwait`
- Command and process substitution
- Related:
  - [Tracing execution in Oils](xtrace.html) (xtrace), which divides
    process-based concurrency into **synchronous** and **async** constructs.
  - [Three Comics For Understanding Unix
    Shell](http://www.oilshell.org/blog/2020/04/comics.html) (blog)

<!--
Process model additions: Capers, Headless shell

some optimizations: See YSH starts fewer processes than other shells.
-->

### Advanced: Reflecting on the Interpreter

You can reflect on the interpreter with APIs like `io->eval()` and
`vm.getFrame()`.

- [Feature Index: Reflection](ref/feature-index.html#Reflection)

This allows YSH to be a language for creating other languages. (Ruby, Tcl, and
Racket also have this flavor.)

<!--

TODO: Hay and Awk examples
-->

## Summary

What have we described in this tour?

YSH is a programming language that evolved from Unix shell. But you can
"forget" the bad parts of shell like `[ $x -lt $y ]`.

<!--
Instead, we've shown you shell-like commands, Python-like expressions on typed
data, and Ruby-like command blocks.
-->

Instead, focus on these central concepts:

1. Interleaved *word*, *command*, and *expression* languages.
2. A standard library of *builtin commands*, as well as *builtin functions*.
3. Languages for *data*: J8 Notation, including JSON8 and TSV8.
4. A *runtime* shared by OSH and YSH.

## Appendix

### Related Docs

- [YSH vs. Shell Idioms](idioms.html) - YSH side-by-side with shell.
- [YSH Language Influences](language-influences.html) - In addition to shell,
  Python, and JavaScript, YSH is influenced by Ruby, Perl, Awk, PHP, and more.
- [A Feel For YSH Syntax](syntax-feelings.html) - Some thoughts that may help
  you remember the syntax.
- [YSH Language Warts](warts.html) - Syntax that may be surprising.

### YSH Script Template

YSH can be used to write simple "shell scripts" or longer programs. It has
*procs* and *modules* to help with the latter.

A module is just a file, like this:

```
#!/usr/bin/env ysh
### Deploy script

use $_this_dir/lib/util.ysh --pick log

const DEST = '/tmp/ysh-tour'

proc my-sync(...files) {
  ### Sync files and show which ones

  cp --verbose @files $DEST
}

proc main {
  mkdir -p $DEST

  touch {foo,bar}.py {build,test}.sh

  log "Copying source files"
  my-sync *.py *.sh

  if test --dir /tmp/logs {
    cd /tmp/logs

    log "Copying logs"
    my-sync *.log
  }
}

if is-main {  # The only top-level statement
  main @ARGV
}
```

<!--
TODO:
- Also show flags parsing?
- Show longer examples where it isn't boilerplate
-->

You wouldn't bother with the boilerplate for something this small. But this
example illustrates the basic idea: the top level often contains these words:
`use`, `const`, `proc`, and `func`.

<!--
TODO: not mentioning __provide__, since it should be optional in the most basic usage?
-->

### YSH Features Not Shown

#### Advanced

These shell features are part of YSH, but aren't shown above:

- The `fork` and `forkwait` builtins, for concurrent execution and subshells.
- Process Substitution: `diff <(sort left.txt) <(sort right.txt)`
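
As a rough sketch of the first item, `forkwait` runs a block in a subshell and
waits for it, much like shell's `( ... )`, while `fork` runs a block in the
background, much like a trailing `&`. The variable `msg` is only for
illustration:

```ysh
var msg = 'original'

forkwait {
  setvar msg = 'changed'  # only the child process sees this change
}
echo $msg  # => original

fork {
  sleep 0.1
  echo 'background job done'
}
wait  # block until the background job finishes
```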

#### Deprecated Shell Constructs

The shared interpreter supports many shell constructs that are deprecated:

- YSH code uses shell's `||` and `&&` only in limited circumstances, since
  `errexit` is on by default.
- Assignment builtins like `local` and `declare`. Use YSH keywords.
- Boolean expressions like `[[ x =~ $pat ]]`. Use YSH expressions.
- Shell arithmetic like `$(( x + 1 ))` and `(( y = x ))`. Use YSH expressions.
- The `until` loop can always be replaced with a `while` loop.
- Most of what's in `${}` can be written in other ways. For example
  `${s#/tmp}` could be `s => removePrefix('/tmp')` (TODO).
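
For comparison, here's a minimal sketch of YSH code that replaces several of the
deprecated constructs above. The proc and variable names are made up for
illustration:

```ysh
proc greet (name) {
  var greeting = "hello $name"  # instead of: local greeting="hello $1"
  echo $greeting
}
greet bob

var x = 41
var y = x + 1  # instead of: y=$(( x + 1 ))
setvar y += 1  # instead of: (( y += 1 ))

if (y === 43) {  # instead of: [[ $y -eq 43 ]]
  echo "y is $y"
}

var pat = / digit+ /
if ('abc123' ~ pat) {  # instead of: [[ abc123 =~ $pat ]]
  echo 'has digits'
}

var i = 0
while (i < 3) {  # instead of: until (( i >= 3 )); do ...; done
  setvar i += 1
}
echo $i  # => 3
```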

#### Not Yet Implemented

This document mentions a few constructs that aren't yet implemented. Here's a
summary:

```none
# Unimplemented syntax:

echo ${x|html}               # formatters

echo ${x %.2f}               # statically-parsed printf

var x = "<p>$x</p>"html
echo "<p>$x</p>"html         # tagged string

var x = 15 Mi                # units suffix
```

<!--
- To implement: Capers: stateless coprocesses
-->
1602