doc/ysh-tour.md

1541 lines, 1049 significant

1	---
2	default_highlighter: oils-sh
3	---
4
5	A Tour of YSH
6	=============
7
8	<!-- author's note about example names
9
10	- people: alice, bob
11	- nouns: ale, bean
12	- peanut, coconut
13	- 42 for integers
14	-->
15
16	This doc describes the [YSH]($xref) language from a clean-slate
17	perspective. We don't assume you know Unix shell, or the compatible
18	[OSH]($xref). But shell users will see the similarity, with simplifications
19	and upgrades.
20
21	Remember, YSH is for Python and JavaScript users who avoid shell! See the
22	[project FAQ][FAQ] for more color on that.
23
24	[FAQ]: https://www.oilshell.org/blog/2021/01/why-a-new-shell.html
25
26	This document is long because it demonstrates nearly every feature of the
27	language. You may want to read it in multiple sittings, or read [The Simplest
28	Explanation of
29	Oil](https://www.oilshell.org/blog/2020/01/simplest-explanation.html) first.
30	(Until 2023, YSH was called the "Oil language".)
31
32
33	Here's a summary of what follows:
34
35	1. YSH has interleaved word, command, and expression languages.
36	- The command language has Ruby-like blocks, and the expression language
37	has Python-like data types.
38	2. YSH has both builtin commands like `cd /tmp`, and builtin functions like
39	`join()`.
40	3. Languages for data, like [JSON][], are complementary to YSH code.
41	4. OSH and YSH share both an interpreter data model and a process model
42	(provided by the Unix kernel). Understanding these common models will make
43	you both a better shell user and YSH user.
44
45	Keep these points in mind as you read the details below.
46
47	[JSON]: https://json.org
48
49	<div id="toc">
50	</div>
51
52	## Preliminaries
53
54	Start YSH just like you start bash or Python:
55
56	<!-- oils-sh below skips code block extraction, since it doesn't run -->
57
58	```sh-prompt
59	bash$ ysh # assuming it's installed
60
61	ysh$ echo 'hello world' # command typed into YSH
62	hello world
63	```
64
65	In the sections below, we'll save space by showing output in comments, with
66	`=>`:
67
68	echo 'hello world' # => hello world
69
70	Multi-line output is shown like this:
71
72	echo one
73	echo two
74	# =>
75	# one
76	# two
77
78	## Examples
79
80	### Hello World Script
81
82	You can also type commands into a file like `hello.ysh`. This is a complete
83	YSH program, which is identical to a shell program:
84
85	echo 'hello world' # => hello world
86
87	### A Taste of YSH
88
89	Unlike shell, YSH has `var` and `const` keywords:
90
91	const name = 'world' # const is rarer, used at the top level
92	echo "hello $name" # => hello world
93
94	They take rich Python-like expressions on the right:
95
96	var x = 42 # an integer, not a string
97	setvar x = min(x, 1) # mutate with the 'setvar' keyword
98
99	setvar x += 5 # Increment by 5
100	echo $x # => 6
101
102	var mylist = [x, 7] # two integers [6, 7]
103
104	Expressions are often surrounded by `()`:
105
106	if (x > 0) {
107	echo 'positive'
108	} # => positive
109
110	for i, item in (mylist) { # 'mylist' is a variable, not a string
111	echo "[$i] item $item"
112	}
113	# =>
114	# [0] item 6
115	# [1] item 7
116
117	YSH has Ruby-like blocks:
118
119	cd /tmp {
120	echo hi > greeting.txt # file created inside /tmp
121	echo $PWD # => /tmp
122	}
123	echo $PWD # prints the original directory
124
125	And utilities to read and write JSON:
126
127	var person = {name: 'bob', age: 42}
128	json write (person)
129	# =>
130	# {
131	# "name": "bob",
132	# "age": 42
133	# }
134
135	echo '["str", 42]' | json read # sets '_reply' variable by default
136
137	The `=` keyword evaluates and prints an expression:
138
139	= _reply
140	# => (List) ["str", 42]
141
142	(Think of it like `var x = _reply`, without the `var`.)
143
144	## Word Language: Expressions for Strings (and Arrays)
145
146	Let's describe the word language first, and then talk about commands and
147	expressions. Words are a rich language because strings are a central
148	concept in shell.
149
150	### Unquoted Words
151
152	Words denote strings, but you often don't need to quote them:
153
154	echo hi # => hi
155
156	Quotes are useful when a string has spaces, or punctuation characters like `( )
157	;`.
158
159	### Three Kinds of String Literals
160
161	You can choose the style that's most convenient to write a given string.
162
163	#### Double-Quoted, Single-Quoted, and J8 strings (like JSON)
164
165	Double-quoted strings allow interpolation, with `$`:
166
167	var person = 'alice'
168	echo "hi $person, $(echo bye)" # => hi alice, bye
169
170	Write operators by escaping them with `\`:
171
172	echo "\$ \" \\ " # => $ " \
173
174	In single-quoted strings, all characters are literal (except `'`, which
175	can't be expressed):
176
177	echo 'c:\Program Files\' # => c:\Program Files\
178
179	If you want C-style backslash character escapes, use a J8 string, which is
180	like JSON, but with single quotes:
181
182	echo u' A is \u{41} \n line two, with backslash \\'
183	# =>
184	# A is A
185	# line two, with backslash \
186
187	The `u''` strings are guaranteed to be valid Unicode (unlike JSON). You can
188	also use `b''` strings:
189
190	echo b'byte \yff' # Byte that's not valid unicode, like \xff in C.
191	# Don't confuse it with \u{ff}.
192
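If you know Python, the difference between `\yff` and `\u{ff}` can be sketched as follows. This is an analogy in Python, not YSH code, and it assumes that `\u{ff}` is stored as the UTF-8 encoding of the code point U+00FF:

```python
# Python analogy (not YSH): a raw byte versus an encoded code point.
raw_byte = b"\xff"                      # like b'\yff': the single byte 0xff
code_point = "\u00ff".encode("utf-8")   # like \u{ff}: U+00FF encoded as UTF-8

print(len(raw_byte), len(code_point))   # 1 2
print(code_point)                       # b'\xc3\xbf'

# The lone byte 0xff is not valid UTF-8, so decoding it fails.
try:
    raw_byte.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False
print(is_valid_utf8)                    # False
```
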
193	#### Multi-line Strings
194
195	Multi-line strings are surrounded with triple quotes. They come in the same
196	three varieties, and leading whitespace is stripped in a convenient way.
197
198	sort <<< """
199	var sub: $x
200	command sub: $(echo hi)
201	expression sub: $[x + 3]
202	"""
203	# =>
204	# command sub: hi
205	# expression sub: 9
206	# var sub: 6
207
208	sort <<< '''
209	$2.00 # literal $, no interpolation
210	$1.99
211	'''
212	# =>
213	# $1.99
214	# $2.00
215
216	sort <<< u'''
217	C\tD
218	A\tB
219	''' # b''' strings also supported
220	# =>
221	# A B
222	# C D
223
224	(Use multiline strings instead of shell's [here docs]($xref:here-doc).)
225
226	### Three Kinds of Substitution
227
228	YSH has syntax for 3 types of substitution, all of which start with `$`. That
229	is, you can convert any of these things to a string:
230
231	1. Variables
232	2. The output of commands
233	3. The value of expressions
234
235	#### Variable Sub
236
237	The syntax `$a` or `${a}` converts a variable to a string:
238
239	var a = 'ale'
240	echo $a # => ale
241	echo _${a}_ # => _ale_
242	echo "_ $a _" # => _ ale _
243
244	The shell operator `:-` is occasionally useful in YSH:
245
246	echo ${not_defined:-'default'} # => default
247
248	#### Command Sub
249
250	The `$(echo hi)` syntax runs a command and captures its `stdout`:
251
252	echo $(hostname) # => example.com
253	echo "_ $(hostname) _" # => _ example.com _
254
255	#### Expression Sub
256
257	The `$[myexpr]` syntax evaluates an expression and converts it to a string:
258
259	echo $[a] # => ale
260	echo $[1 + 2 * 3] # => 7
261	echo "_ $[1 + 2 * 3] _" # => _ 7 _
262
263	<!-- TODO: safe substitution with "$[a]"html -->
264
265	### Arrays of Strings: Globs, Brace Expansion, Splicing, and Splitting
266
267	There are four constructs that evaluate to a list of strings, rather than a
268	single string.
269
270	#### Globs
271
272	Globs like `*.py` evaluate to a list of files.
273
274	touch foo.py bar.py # create the files
275	write *.py
276	# =>
277	# bar.py
278	# foo.py
279
280	If no files match, it evaluates to an empty list (`[]`).
281
282	#### Brace Expansion
283
284	The brace expansion mini-language lets you write strings without duplication:
285
286	write {alice,bob}@example.com
287	# =>
288	# alice@example.com
289	# bob@example.com
290
291	#### Splicing
292
293	The `@` operator splices an array into a command:
294
295	var myarray = :| ale bean |
296	write S @myarray E
297	# =>
298	# S
299	# ale
300	# bean
301	# E
302
303	You also have `@[]` to splice an expression that evaluates to a list:
304
305	write -- @[split('ale bean')]
306	# =>
307	# ale
308	# bean
309
310	Each item will be converted to a string.
311
312	#### Split Command Sub / Split Builtin Sub
313
314	There's also a variant of command sub that decodes J8 lines into a sequence
315	of strings:
316
317	write @(seq 3) # write is passed 3 args
318	# =>
319	# 1
320	# 2
321	# 3
322
323	## Command Language: I/O, Control Flow, Abstraction
324
325	### Simple Commands
326
327	A simple command is a space-separated list of words. YSH looks up the first
328	word to determine if it's a builtin command, or a user-defined `proc`.
329
330	echo 'hello world' # The shell builtin 'echo'
331
332	proc greet (name) { # A proc is like a procedure or process
333	echo "hello $name"
334	}
335
336	# The first word now resolves to the proc you defined
337	greet alice # => hello alice
338
339	If it's neither, then it's assumed to be an external command:
340
341	ls -l /tmp # The external 'ls' command
342
343	Commands accept traditional string arguments, as well as typed arguments in
344	parentheses:
345
346	# 'write' is a string arg; 'x' is a typed expression arg
347	json write (x)
348
349	<!--
350	Block args are a special kind of typed arg:
351
352	cd /tmp {
353	echo $PWD
354	}
355	-->
356
357	### Redirects
358
359	You can redirect `stdin` and `stdout` of simple commands:
360
361	echo hi > tmp.txt # write to a file
362	sort < tmp.txt
363
364	Here are the most common idioms for using `stderr` (identical to shell):
365
366	ls /tmp 2>errors.txt
367	echo 'fatal error' >&2
368
369	### ARGV and ENV
370
371	The `ARGV` list holds the arguments passed to the shell:
372
373	var num_args = len(ARGV)
374	ls /tmp @ARGV # pass shell's arguments through
375
376	---
377
378	You can add to the environment of a new process with a prefix binding:
379
380	PYTHONPATH=vendor ./demo.py
381
382	The `ENV` object reflects the current environment:
383
384	echo $[ENV.PYTHONPATH] # => vendor
385
386	### Pipelines
387
388	Pipelines are a powerful method of manipulating data streams:
389
390	ls | wc -l # count files in this directory
391	find /bin -type f | xargs wc -l # count lines in each file of a subtree
392
393	The stream may contain (lines of) text, binary data, JSON, TSV, and more.
394	Details below.
395
396	### Multi-line Commands
397
398	The `...` prefix lets you write long commands, pipelines, and `&&` chains
399	without `\` line continuations.
400
401	... find /bin # traverse this directory and
402	-type f -a -executable # print executable files
403	| sort -r # reverse sort
404	| head -n 30 # limit to 30 files
405	;
406
407	When this mode is active:
408
409	- A single newline behaves like a space
410	- A blank line (two newlines in a row) is illegal, but a line that has only a
411	comment is allowed. This prevents confusion if you forget the `;`
412	terminator.
413
414	### `var`, `setvar`, `const` to Declare and Mutate
415
416	Constants can't be modified:
417
418	const myconst = 'mystr'
419	# setvar myconst = 'foo' would be an error
420
421	Modify variables with the `setvar` keyword:
422
423	var num_beans = 12
424	setvar num_beans = 13
425
426	A more complex example:
427
428	var d = {name: 'bob', age: 42} # dict literal
429	setvar d.name = 'alice' # d.name is a synonym for d['name']
430	echo $[d.name] # => alice
431
432	That's most of what you need to know about assignments. Advanced users may
433	want to use `setglobal` or `call myplace->setValue(42)` in certain situations.
434
435	<!--
436	var g = 1
437	var h = 2
438	proc demo(:out) {
439	setglobal g = 42
440	setref out = 43
441	}
442	demo :h # pass a reference to h
443	echo "$g $h" # => 42 43
444	-->
445
446	More info: [Variable Declaration and Mutation](variables.html).
447
448	### `for` Loop
449
450	#### Words
451
452	Shell-style for loops iterate over words:
453
454	for word in 'oils' $num_beans {pea,coco}nut {
455	echo $word
456	}
457	# =>
458	# oils
459	# 13
460	# peanut
461	# coconut
462
463	You can also request the loop index:
464
465	for i, word in README.md *.py {
466	echo "$i - $word"
467	}
468	# =>
469	# 0 - README.md
470	# 1 - __init__.py
471
472	#### Typed Data
473
474	To iterate over typed data, use parentheses around an expression. The
475	expression should evaluate to an integer `Range`, `List`, `Dict`, or `Stdin`.
476
477	Range:
478
479	for i in (3 ..< 5) { # range operator ..<
480	echo "i = $i"
481	}
482	# =>
483	# i = 3
484	# i = 4
485
486	List:
487
488	var foods = ['ale', 'bean']
489	for item in (foods) {
490	echo $item
491	}
492	# =>
493	# ale
494	# bean
495
496	Again, you can request the index with `for i, item in ...`.
497
498	---
499
500	Here's the most general form of the loop over `Dict`:
501
502	var mydict = {pea: 42, nut: 10}
503	for i, k, v in (mydict) {
504	echo "$i - $k - $v"
505	}
506	# =>
507	# 0 - pea - 42
508	# 1 - nut - 10
509
510	There are two simpler forms:
511
512	- One variable gives you the key: `for k in (mydict)`
513	- Two variables gives you the key and value: `for k, v in (mydict)`
514
515	(One way to think of it: `for` loops in YSH have the functionality of Python's
516	`enumerate()`, `items()`, `keys()`, and `values()`.)
517
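For readers coming from Python, the three `Dict` loop forms above correspond roughly to the following. This is a Python analogy, not YSH code; the variable names are only for illustration:

```python
mydict = {"pea": 42, "nut": 10}

# for k in (mydict)        ~ iterating over the keys
keys_only = [k for k in mydict]

# for k, v in (mydict)     ~ mydict.items()
pairs = [(k, v) for k, v in mydict.items()]

# for i, k, v in (mydict)  ~ enumerate(mydict.items())
indexed = [(i, k, v) for i, (k, v) in enumerate(mydict.items())]

print(keys_only)  # ['pea', 'nut']
print(pairs)      # [('pea', 42), ('nut', 10)]
print(indexed)    # [(0, 'pea', 42), (1, 'nut', 10)]
```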
518	---
519
520	The `io.stdin` object iterates over lines:
521
522	for line in (io.stdin) {
523	echo $line
524	}
525	# lines are buffered, so it's much faster than `while read --raw-line`
526
527	<!--
528	TODO: Str loop should give you the (UTF-8 offset, rune)
529	Or maybe just UTF-8 offset? Decoding errors could be exceptions, or Unicode
530	replacement.
531	-->
532
533	### `while` Loop
534
535	While loops can use a command as the termination condition:
536
537	while test --file lock {
538	sleep 1
539	}
540
541	Or an expression, which is surrounded in `()`:
542
543	var i = 3
544	while (i < 6) {
545	echo "i = $i"
546	setvar i += 1
547	}
548	# =>
549	# i = 3
550	# i = 4
551	# i = 5
552
553	### `if elif` Conditional
554
555	If statements test the exit code of a command, and have optional `elif` and
556	`else` clauses:
557
558	if test --file foo {
559	echo 'foo is a file'
560	rm --verbose foo # delete it
561	} elif test --dir foo {
562	echo 'foo is a directory'
563	} else {
564	echo 'neither'
565	}
566
567	Invert the exit code with `!`:
568
569	if ! grep alice /etc/passwd {
570	echo 'alice is not a user'
571	}
572
573	As with `while` loops, the condition can also be an expression wrapped in
574	`()`:
575
576	if (num_beans > 0) {
577	echo 'so many beans'
578	}
579
580	var done = false
581	if (not done) { # negate with 'not' operator (contrast with !)
582	echo "we aren't done"
583	}
584
585	### `case` Conditional
586
587	The case statement is a series of conditionals and executable blocks. The
588	condition can be either an unquoted glob pattern like `*.py`, an eggex pattern
589	like `/d+/`, or a typed expression like `(42)`:
590
591	var s = 'README.md'
592	case (s) {
593	*.py { echo 'Python' }
594	*.cc | *.h { echo 'C++' }
595	* { echo 'Other' }
596	}
597	# => Other
598
599	case (s) {
600	/ dot* '.md' / { echo 'Markdown' }
601	(30 + 12) { echo 'the integer 42' }
602	(else) { echo 'neither' }
603	}
604	# => Markdown
605
606
607	<!--
608	(Shell style like `if foo; then ... fi` and `case $x in ... esac` is also
609	legal, but discouraged in YSH code.)
610	-->
611
612	### Error Handling
613
614	If statements are also used for error handling. Builtins and external
615	commands use this style:
616
617	if ! test -d /bin {
618	echo 'not a directory'
619	}
620
621	if ! cp foo /tmp {
622	echo 'error copying' # any non-zero status
623	}
624
625	Procs use this style (because of shell's disabled `errexit` quirk):
626
627	try {
628	myproc
629	}
630	if failed {
631	echo 'failed'
632	}
633
634	For a complete list of examples, see [YSH Error
635	Handling](ysh-error.html). For design goals and a reference, see [YSH
636	Fixes Shell's Error Handling](error-handling.html).
637
638	#### exit, break, continue, return
639
640	The `exit` keyword exits a process. (It's not a shell builtin.)
641
642	The other 3 control flow keywords behave like they do in Python and JavaScript.
643
644	### Ruby-like Block Arguments
645
646	Here's a builtin command that takes a literal block argument:
647
648	shopt --unset errexit { # ignore errors
649	cp ale /tmp
650	cp bean /bin
651	}
652
653	A block is a value of type `Command`.
654
655	### Shell-like `proc`
656
657	You can define units of code with the `proc` keyword.
658
659	proc mycopy (src, dest) {
660	### Copy verbosely
661
662	mkdir -p $dest
663	cp --verbose $src $dest
664	}
665
666	The `###` line is a "doc comment". Simple procs like this are invoked like a
667	shell command:
668
669	touch log.txt
670	mycopy log.txt /tmp # first word 'mycopy' is a proc
671
672	Procs have many features, including four kinds of arguments:
673
674	1. Word args (which are always strings)
675	1. Typed, positional args (aka positional args)
676	1. Typed, named args (aka named args)
677	1. A final block argument, which may be written with `{ }`.
678
679	At the call site, they can look like any of these forms:
680
681	ls /tmp # word arg
682
683	json write (d) # word arg, then positional arg
684
685	try {
686	error 'failed' (status=9) # word arg, then named arg
687	}
688
689	cd /tmp { echo $PWD } # word arg, then block arg
690
691	pp value ([1, 2]) # positional, typed arg
692
693	<!-- TODO: lazy arg list: ls8 | where [age > 10] -->
694
695	At the definition site, the kinds of parameters are separated with `;`, similar
696	to the Julia language:
697
698	proc p2 (word1, word2; pos1, pos2, ...rest_pos) {
699	echo "$word1 $word2 $[pos1 + pos2]"
700	json write (rest_pos)
701	}
702
703	proc p3 (w ; ; named1, named2, ...rest_named; block) {
704	echo "$w $[named1 + named2]"
705	eval (block)
706	json write (rest_named)
707	}
708
709	proc p4 (; ; ; block) {
710	eval (block)
711	}
712
713	YSH also has Python-like functions defined with `func`. These are part of the
714	expression language, which we'll see later.
715
716	For more info, see the [Guide to Procs and Funcs](proc-func.html).
717
718	#### Builtin Commands
719
720	Shell builtins like `cd` and `read` are the "standard library" of the
721	command language. Each one takes various flags:
722
723	cd -L . # follow symlinks
724
725	echo foo | read --all # read all of stdin
726
727	Here are some categories of builtin:
728
729	- I/O: `echo write read`
730	- File system: `cd test`
731	- Processes: `fork wait forkwait exec`
732	- Interpreter settings: `shopt shvar`
733	- Meta: `command builtin runproc type eval`
734
735	<!-- TODO: Link to a comprehensive list of builtins -->
736
737	## Expression Language: Python-like Types
738
739	YSH expressions look and behave more like Python or JavaScript than shell. For
740	example, we write `if (x < y)` instead of `if [ $x -lt $y ]`. Expressions are
741	usually surrounded by `( )`.
742
743	At runtime, variables like `x` and `y` are bound to typed data, like
744	integers, floats, strings, lists, and dicts.
745
746	<!--
747	[Command vs. Expression Mode](command-vs-expression-mode.html) may help you
748	understand how YSH is parsed.
749	-->
750
751	### Python-like `func`
752
753	At the end of the Command Language, we saw that procs are shell-like units of
754	code. YSH also has Python-like functions, which are different than
755	`procs`:
756
757	- They're defined with the `func` keyword.
758	- They're called in expressions, not in commands.
759	- They're pure, and live in the interior of a process.
760	- In contrast, procs usually perform I/O, and have exterior boundaries.
761
762	The simplest function is:
763
764	func identity(x) {
765	return (x) # parens required for typed return
766	}
767
768	A more complex pure function:
769
770	func myRepeat(s, n; special=false) { # positional; named params
771	var parts = []
772	for i in (0 ..< n) {
773	append $s (parts)
774	}
775	var result = join(parts)
776
777	if (special) {
778	return ("$result !!")
779	} else {
780	return (result)
781	}
782	}
783
784	echo $[myRepeat('z', 3)] # => zzz
785
786	echo $[myRepeat('z', 3, special=true)] # => zzz !!
787
788	A function that mutates its argument:
789
790	func popTwice(mylist) {
791	call mylist->pop()
792	call mylist->pop()
793	}
794
795	var mylist = [3, 4]
796
797	# The call keyword is an "adapter" between commands and expressions,
798	# like the = keyword.
799	call popTwice(mylist)
800
801
802	Funcs are named using `camelCase`, while procs use `kebab-case`. See the
803	[Style Guide](style-guide.html) for more conventions.
804
805	#### Builtin Functions
806
807	In addition to builtin commands, YSH has Python-like builtin functions.
808	These are like the "standard library" for the expression language. Examples:
809
810	- Functions that take multiple types: `len() type()`
811	- Conversions: `bool() int() float() str() list() ...`
812	- Explicit word evaluation: `split() join() glob() maybe()`
813
814	<!-- TODO: Make a comprehensive list of func builtins. -->
815
816
817	### Data Types: `Int`, `Str`, `List`, `Dict`, `Obj`, ...
818
819	YSH has data types, each with an expression syntax and associated methods.
820
821	### Methods
822
823	YSH adds mutable data structures to shell, so we have a special syntax for
824	mutating methods. They are looked up with a thin arrow `->`:
825
826	var foods = ['ale', 'bean']
827	var last = foods->pop() # bean
828	write @foods # => ale
829
830	You can ignore the return value with the `call` keyword:
831
832	call foods->pop()
833
834	Regular methods are looked up with the `.` operator:
835
836	var line = ' ale bean '
837	var caps = line.trim().upper() # 'ALE BEAN'
838
839	---
840
841	You can also chain functions with a fat arrow `=>`:
842
843	var trimmed = line.trim() => upper() # 'ALE BEAN'
844
845	The `=>` operator allows functions to appear in a natural left-to-right order,
846	like methods.
847
848	# list() is a free function taking one arg
849	# join() is a free function taking two args
850	var x = {k1: 42, k2: 43} => list() => join('/') # 'k1/k2'
851
852	---
853
854	Now let's go through the data types in YSH. We'll show the syntax for
855	literals, and what methods they have.
856
857	#### Null and Bool
858
859	YSH uses JavaScript-like spellings for these three "atoms":
860
861	var x = null
862
863	var b1, b2 = true, false
864
865	if (b1) {
866	echo 'yes'
867	} # => yes
868
869
870	#### Int
871
872	There are many ways to write integers:
873
874	var small, big = 42, 65_536
875	echo "$small $big" # => 42 65536
876
877	var hex, octal, binary = 0x0001_0000, 0o755, 0b0001_0101
878	echo "$hex $octal $binary" # => 65536 493 21
879
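Python happens to accept the same integer spellings, so you can check the values above there too. This is a Python snippet for comparison, not YSH code (`hex_` is renamed only to avoid shadowing Python's `hex()`):

```python
small, big = 42, 65_536
hex_, octal, binary = 0x0001_0000, 0o755, 0b0001_0101

print(small, big)            # 42 65536
print(hex_, octal, binary)   # 65536 493 21
```
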
880	<!--
881	"Runes" are integers that represent Unicode code points. They're not common in
882	YSH code, but can make certain string algorithms more readable.
883
884	# Pound rune literals are similar to ord('A')
885	const a = #'A'
886
887	# Backslash rune literals can appear outside of quotes
888	const newline = \n # Remember this is an integer
889	const backslash = \\ # ditto
890
891	# Unicode rune literal is syntactic sugar for 0x3bc
892	const mu = \u{3bc}
893
894	echo "chars $a $newline $backslash $mu" # => chars 65 10 92 956
895	-->
896
897	#### Float
898
899	Floats are written with a decimal point:
900
901	var big = 3.14
902
903	You can use scientific notation, as in Python:
904
905	var small = 1.5e-10
906
907	#### Str
908
909	See the section above on Three Kinds of String Literals. It described
910	`'single quoted'`, `"double ${quoted}"`, and `u'J8-style\n'` strings; as well
911	as their multiline variants.
912
913	Strings are UTF-8 encoded in memory, like strings in the [Go
914	language](https://golang.org). There isn't a separate string and unicode type,
915	as in Python.
916
917	Strings are immutable, as in Python and JavaScript. This means they only
918	have transforming methods:
919
920	var x = s.trim()
921
922	Other methods:
923
924	- `trimLeft() trimRight()`
925	- `trimPrefix() trimSuffix()`
926	- `upper() lower()`
927	- `search() leftMatch()` - pattern matching
928	- `replace() split()`
929
930	#### List (and Arrays)
931
932	All lists can be expressed with Python-like literals:
933
934	var foods = ['ale', 'bean', 'corn']
935	var recursive = [1, [2, 3]]
936
937	As a special case, lists of strings are called arrays. It's often more
938	convenient to write them with shell-like literals:
939
940	# No quotes or commas
941	var foods = :| ale bean corn |
942
943	# You can use the word language here
944	var other = :| foo $s *.py {alice,bob}@example.com |
945
946	Lists are mutable, as in Python and JavaScript. So they mainly have
947	mutating methods:
948
949	call foods->reverse()
950	write -- @foods
951	# =>
952	# corn
953	# bean
954	# ale
955
956	#### Dict
957
958	Dicts use syntax that's like JavaScript. Here's a dict literal:
959
960	var d = {
961	name: 'bob', # unquoted keys are allowed
962	age: 42,
963	'key with spaces': 'val'
964	}
965
966	You can use either `[]` or `.` to retrieve a value, given a key:
967
968	var v1 = d['name']
969	var v2 = d.name # shorthand for the above
970	var v3 = d['key with spaces'] # no shorthand for this
971
972	(If the key doesn't exist, an error is raised.)
973
974	You can change Dict values with the same 2 syntaxes:
975
976	setvar d['name'] = 'other'
977	setvar d.name = 'fun'
978
979	---
980
981	If you want to compute a key name, use an expression inside `[]`:
982
983	var key = 'alice'
984	var d2 = {[key ++ '_z']: 'ZZZ'} # Computed key name
985	echo $[d2.alice_z] # => ZZZ
986
987	If you omit the value, it's taken from a variable of the same name:
988
989	var d3 = {key} # like {key: key}
990	echo "name is $[d3.key]" # => name is alice
991
992	More examples:
993
994	var empty = {}
995	echo $[len(empty)] # => 0
996
997	The `keys()` and `values()` functions return new `List` objects:
998
999	var keys = keys(d2) # => alice_z
1000	var vals = values(d3) # => alice
1001
1002	### `Place` type / "out params"
1003
1004	The `read` builtin can either set an implicit variable `_reply`:
1005
1006	whoami | read --all # sets _reply
1007
1008	Or you can pass a `value.Place`, created with `&`:
1009
1010	var x # implicitly initialized to null
1011	whoami | read --all (&x) # mutate this "place"
1012	echo who=$x # => who=andy
1013
1014	<!--
1015	#### Quotation Types: value.Command (Block) and value.Expr
1016
1017	These types are for reflection on YSH code. Most YSH programs won't use them
1018	directly.
1019
1020	- `Command`: an unevaluated code block.
1021	- rarely-used literal: `^(ls | wc -l)`
1022	- `Expr`: an unevaluated expression.
1023	- rarely-used literal: `^[42 + a[i]]`
1024	-->
1025
1026	### Operators
1027
1028	YSH operators are generally the same as in Python:
1029
1030	if (10 <= num_beans and num_beans < 20) {
1031	echo 'enough'
1032	} # => enough
1033
1034	YSH has a few operators that aren't in Python. Equality can be approximate or
1035	exact:
1036
1037	var n = ' 42 '
1038	if (n ~== 42) {
1039	echo 'equal after stripping whitespace and type conversion'
1040	} # => equal after stripping whitespace and type conversion
1041
1042	if (n === 42) {
1043	echo "not reached because strings and ints aren't equal"
1044	}
1045
1046	<!-- TODO: is n === 42 a type error? -->
1047
1048	Pattern matching can be done with globs (`~~` and `!~~`)
1049
1050	const filename = 'foo.py'
1051	if (filename ~~ '*.py') {
1052	echo 'Python'
1053	} # => Python
1054
1055	if (filename !~~ '*.sh') {
1056	echo 'not shell'
1057	} # => not shell
1058
1059	or regular expressions (`~` and `!~`). See the Eggex section below for an
1060	example of the latter.
1061
1062	Concatenation is `++` rather than `+` because it avoids confusion in the
1063	presence of type conversion:
1064
1065	var n = '42' + 1 # string plus int does implicit conversion
1066	echo $n # => 43
1067
1068	var y = 'ale ' ++ "bean $n" # concatenation
1069	echo $y # => ale bean 43
1070
1071	<!--
1072	TODO: change example above
1073	var n = '42' + 1 # string plus int does implicit conversion
1074	-->
1075
1076	<!--
1077
1078	#### Summary of Operators
1079
1080	- Arithmetic: `+ - * / // %` and `**` for exponentiation
1081	- `/` always yields a float, and `//` is integer division
1082	- Bitwise: `& | ^ ~`
1083	- Logical: `and or not`
1084	- Comparison: `== < > <= >= in 'not in'`
1085	- Approximate equality: `~==`
1086	- Eggex and glob match: `~ !~ ~~ !~~`
1087	- Ternary: `1 if x else 0`
1088	- Index and slice: `mylist[3]` and `mylist[1:3]`
1089	- `mydict.key` is a shortcut for `mydict['key']`
1090	- Function calls
1091	- free: `f(x, y)`
1092	- transformations and chaining: `s => startsWith('prefix')`
1093	- mutating methods: `mylist->pop()`
1094	- String and List: `++` for concatenation
1095	- This is a separate operator because the addition operator `+` does
1096	string-to-int conversion
1097
1098	TODO: What about list comprehensions?
1099	-->
1100
1101	### Egg Expressions (YSH Regexes)
1102
1103	An Eggex is a YSH expression that denotes a regular expression. Eggexes
1104	translate to POSIX ERE syntax, for use with tools like `egrep`, `awk`, and `sed
1105	--regexp-extended` (GNU only).
1106
1107	They're designed to be readable and composable. Example:
1108
1109	var D = / digit{1,3} /
1110	var ip_pattern = / D '.' D '.' D '.' D /
1111
1112	var z = '192.168.0.1'
1113	if (z ~ ip_pattern) { # Use the ~ operator to match
1114	echo "$z looks like an IP address"
1115	} # => 192.168.0.1 looks like an IP address
1116
1117	if (z !~ / '.255' %end /) {
1118	echo "doesn't end with .255"
1119	} # => doesn't end with .255
1120
1121	See the [Egg Expressions doc](eggex.html) for details.
1122
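To get a feel for what "translating to a regular expression" means, here is a hand-written equivalent of a dotted-quad pattern (four groups of one to three digits, separated by literal dots), checked with Python's `re` module. This is a Python sketch, not YSH code. The regex text is an assumption: the exact string YSH generates may differ, for example by using `[[:digit:]]` rather than `[0-9]`.

```python
import re

# Hand-written equivalent of four digit{1,3} groups separated by literal dots.
# Assumption: YSH's generated ERE may be spelled differently.
D = r"[0-9]{1,3}"
ip_pattern = re.compile(rf"{D}\.{D}\.{D}\.{D}")

z = "192.168.0.1"
looks_like_ip = bool(ip_pattern.search(z))
ends_with_255 = bool(re.search(r"\.255$", z))

print(looks_like_ip)   # True
print(ends_with_255)   # False
```
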
1123	## Interlude
1124
1125	Let's review what we've seen before moving onto other YSH features.
1126
1127	### Three Interleaved Languages
1128
1129	Here are the languages we saw in the last 3 sections:
1130
1131	1. Words evaluate to a string, or list of strings. This includes:
1132	- literals like `'mystr'`
1133	- substitutions like `${x}` and `$(hostname)`
1134	- globs like `*.sh`
1135	2. Commands are used for
1136	- I/O: pipelines, builtins like `read`
1137	- control flow: `if`, `for`
1138	- abstraction: `proc`
1139	3. Expressions on typed data are borrowed from Python, with influence from
1140	JavaScript:
1141	- Lists: `['ale', 'bean']` or `:| ale bean |`
1142	- Dicts: `{name: 'bob', age: 42}`
1143	- Functions: `split('ale bean')` and `join(['pea', 'nut'])`
1144
1145	### How Do They Work Together?
1146
1147	Here are two examples:
1148
1149	(1) In this command, there are four words. The fourth word is an
1150	expression sub `$[]`.
1151
1152	write hello $name $[d['age'] + 1]
1153	# =>
1154	# hello
1155	# world
1156	# 43
1157
1158	(2) In this assignment, the expression on the right hand side of `=`
1159	concatenates two strings. The first string is a literal, and the second is a
1160	command sub.
1161
1162	var food = 'ale ' ++ $(echo bean | tr a-z A-Z)
1163	write $food # => ale BEAN
1164
1165	So words, commands, and expressions are mutually recursive. If you're a
1166	conceptual person, skimming [Syntactic Concepts](syntactic-concepts.html) may
1167	help you understand this on a deeper level.
1168
1169	<!--
1170	One way to think about these sublanguages is to note that the `|` character
1171	means something different in each context:
1172
1173	- In the command language, it's the pipeline operator, as in `ls | wc -l`
1174	- In the word language, it's only valid in a literal string like `'|'`, `"|"`,
1175	or `\|`. (It's also used in `${x|html}`, which formats a string.)
1176	- In the expression language, it's the bitwise OR operator, as in Python and
1177	JavaScript.
1178	-->
1179
1180	## Advanced YSH Features
1181
1182	Unlike shell, YSH is powerful enough to write reusable libraries. It also
1183	has reflective features, to allow creating reusable languages!
1184
1185	The following sections give you a taste of some advanced features.
1186
1187	### Closures
1188
1189	Block arguments capture the frame they're defined in, which means they have
1190	lexical scope.
1191
1192	For example, this proc accepts a block, and runs it:
1193
1194	proc do-it (; ; ; block) {
1195	call io->eval(block)
1196	}
1197
1198	When you pass a block to it, the enclosing stack frame is captured:
1199
1200	    var x = 42
1201	    do-it {
1202	      echo "x = $x" # outer x is visible LATER, when the block is run
1203	    }
1204
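Python users may recognize this as ordinary closure behavior. As a loose analogy only (the function name `do_it` is made up, and a Python lambda is not a YSH block), code created in one scope can run later inside another function and still see the variables of the scope where it was written:

```python
def do_it(block):
    """Analogue of the do-it proc: accept a piece of code and run it."""
    return block()


x = 42
# The lambda is created here, but its body runs later, inside do_it().
# It still sees x from the scope where it was defined.
result = do_it(lambda: f"x = {x}")
print(result)  # x = 42
```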
1205	- [Feature Index: Closures](ref/feature-index.html#Closures)
1206
1207	### Objects
1208
1209	YSH has an `Obj` type that bundles code and data. (In contrast, JSON
1210	messages are pure data, not objects.)
1211
1212	The main purpose of objects is polymorphism:
1213
1214	    var obj = makeMyObject(42) # I don't know what it looks like inside
1215
1216	    echo $[obj.myMethod()] # But I can perform abstract operations
1217
1218	    call obj->mutatingMethod() # Mutation is considered special, with ->
1219
1220	YSH objects are similar to Lua and JavaScript objects: they have a `Dict` of
1221	properties, and a recursive "prototype chain" that is also an `Obj`.
1222
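To make the prototype chain concrete, here is a toy model in Python. It is a sketch of the idea described above, not of YSH's actual `Obj` implementation; the class layout and the `lookup` method name are assumptions made for illustration.

```python
class Obj:
    """Toy model: a dict of properties plus an optional prototype, itself an Obj."""

    def __init__(self, props, prototype=None):
        self.props = props
        self.prototype = prototype

    def lookup(self, name):
        # Walk the prototype chain until some object defines the name.
        obj = self
        while obj is not None:
            if name in obj.props:
                return obj.props[name]
            obj = obj.prototype
        raise KeyError(name)


# Methods live on a shared prototype; data lives on each instance.
methods = Obj({"describe": lambda self: "age is %d" % self.lookup("age")})
bob = Obj({"age": 42}, prototype=methods)

print(bob.lookup("describe")(bob))  # age is 42
```

Because the method is found on the prototype rather than on `bob` itself, many objects can share one set of methods while keeping their own data.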
1223	- [Feature Index: Objects](ref/feature-index.html#Objects)
1224
1225	### Modules
1226
1227	A module is a file of source code, like `lib/myargs.ysh`.
1228
1229	The `use` builtin turns it into an `Obj` that can be invoked and inspected:
1230
1231	    use myargs.ysh
1232	    myargs proc1 --flag val # module name becomes a prefix, via __invoke__
1233	    var alias = myargs.proc1 # module has attributes
1234
1235	You can import specific names with the `--pick` flag:
1236
1237	    use myargs.ysh --pick p2 p3
1238	    p2
1239	    p3
1240
1241	<!--
1242	TODO: not mentioning __provide__, since it should be optional in the most basic usage?
1243	-->
1244
1245	- [Feature Index: Modules](ref/feature-index.html#Modules)
1246
1247	### Reflecting on the Interpreter
1248
1249	YSH is a language for creating other languages. You can reflect on the
1250	interpreter with APIs like `io->eval()` and `vm.getFrame()`.
1251
1252	- [Feature Index: Reflection](ref/feature-index.html#Reflection)
1253
1254	(Ruby, Tcl, and Racket also have this flavor.)
1255
1256	---
1257
1258	These advanced features all live inside the Oils interpreter. But a shell
1259	naturally deals with textual data from the outside, so let's switch gears.
1260
1261	## Data Notation / Interchange Formats
1262
1263	YSH reads and writes data notation, like [JSON]($xref).
1264
1265	I think of these notations as languages for data, rather than code. Instead of being
1266	executed, they're parsed as data structures.
1267
1268	<!-- TODO: Link to slogans, fallacies, and concepts -->
1269
1270	### UTF-8
1271
1272	UTF-8 is the foundation of our textual data languages.
1273
1274	It's the most common Unicode encoding, and represents all code points
1275	consistently and efficiently.
1276
1277	<!-- TODO: there's a runes() iterator which gives integer offsets, usable for
1278	slicing -->
1279
1280	<!-- TODO: write about J8 notation -->
1281
1282	### Lines of Text (traditional), and JSON/J8 Strings
1283
1284	Traditional Unix tools like `grep` and `awk` operate on streams of lines. YSH
1285	supports this style, like any other shell.
1286
1287	But YSH also has [J8 Notation][], a data format based on [JSON][]. It's a 100%
1288	compatible upgrade that fixes some warts in JSON, and makes Unix text and JSON
1289	work together more smoothly.
1290
1291	---
1292
1293	[J8 Notation]: j8-notation.html
1294
1295	Let's talk about simple strings and lines first. Here is YSH code for making a
1296	string with 2 lines:
1297
1298	    var mystr = u'pea\n' ++ u'42\n'
1299
1300	Now we can encode it as a JSON message, which fits on a single line:
1301
1302	    json write (mystr) > message.txt
1303
1304	Now we can compress `message.txt`, encrypt it, and send it to another computer.
1305
1306	And then we can decode it, i.e. read it back into a variable:
1307
1308	    json read (&x) < message.txt
1309	    = x # => "pea\n42\n"
1310
1311	<!--
1312	This can also be done with functions like `toJson()` and `fromJson()`
1313
1314	    write $[toJson(mystr)] # => "pea\n42\n"
1315
1316	    # JSON8 is the same, but it's not lossy for binary data
1317	    write $[toJson8(mystr)] # => "pea\n42\n"
1318
1319	-->
1320
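The same round trip works in any language with a JSON library, which is the point of an interchange format. As a sketch using Python's standard `json` module (it mirrors the encoding idea only and does not call YSH), the two-line string becomes a one-line message and decodes back unchanged:

```python
import json

mystr = "pea\n" + "42\n"  # a string with 2 lines, as in the YSH example

message = json.dumps(mystr)  # encode: each newline becomes the two characters \n
print(message)  # "pea\n42\n"
print(message.count("\n"))  # 0, so the message fits on a single line

decoded = json.loads(message)
print(decoded == mystr)  # True
```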
1321	### Structured: JSON8, TSV8
1322
1323	In addition to strings and lines, you can write and read tree-shaped data
1324	as [JSON][]:
1325
1326	    var d = {key: 'value'}
1327	    json write (d) # dump variable d as JSON
1328	    # =>
1329	    # {
1330	    #   "key": "value"
1331	    # }
1332
1333	    echo '["ale", 42]' > example.json
1334
1335	    json read (&d2) < example.json # parse JSON into var d2
1336	    pp (d2) # pretty print it
1337	    # => (List) ['ale', 42]
1338
1339	[JSON][] will lose information when strings have binary data, but the slight
1340	[JSON8]($xref) upgrade won't:
1341
1342	    var b = {binary: $'\xff'}
1343	    json8 write (b)
1344	    # =>
1345	    # {
1346	    #   "binary": b'\yff'
1347	    # }
1348
1349	[JSON]: $xref
1350
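Why plain JSON loses information here can be seen from any JSON library. The sketch below uses Python's standard library to show the general problem: JSON strings are sequences of Unicode characters, so a byte like `0xff` that isn't valid UTF-8 has to be substituted or rejected. It is an illustration of that limitation, not a claim about exactly what YSH's `json write` emits.

```python
import json

raw = b"\xff"  # a single byte that is not valid UTF-8

try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8, so there is no exact JSON string for it")

# A common workaround substitutes U+FFFD, which discards the original byte.
lossy = raw.decode("utf-8", errors="replace")
message = json.dumps(lossy)
round_trip = json.loads(message).encode("utf-8")

print(round_trip == raw)  # False: the original byte is gone
```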
1351	Table-shaped data can be read and written as [TSV8]($xref). (TODO: not yet
1352	implemented.)
1353
1354	<!-- Figure out the API. Does it work like JSON?
1355
1356	Or I think we just implement
1357	- rows: 'where' or 'filter' (dplyr)
1358	- cols: 'select' conflicts with shell builtin; call it 'cols'?
1359	- sort: 'sort-by' or 'arrange' (dplyr)
1360	- TSV8 <=> sqlite conversion. Are these drivers or what?
1361	- and then let you pipe output?
1362
1363	Do we also need TSV8 space2tab or something? For writing TSV8 inline.
1364
1365	More later:
1366	- MessagePack (e.g. for shared library extension modules)
1367	- msgpack read, write? I think user-defined function could be like this?
1368	- SASH: Simple and Strict HTML? For easy processing
1369	-->
1370
1371	## The Runtime Shared by OSH and YSH
1372
1373	Although we describe OSH and YSH as different languages, they use the same
1374	interpreter under the hood. This interpreter has various `shopt` flags that
1375	are flipped for different behavior, e.g. with `shopt --set ysh:all`.
1376
1377	Understanding this interpreter and its interface to the Unix kernel will help
1378	you understand both languages!
1379
1380	### Interpreter Data Model
1381
1382	The [Interpreter State](interpreter-state.html) doc is under construction.
1383	It will cover:
1384
1385	- Two separate namespaces (like Lisp 1 vs. 2):
1386	  - proc namespace for procs as the first word
1387	  - variable namespace
1388	- The variable namespace has a call stack, for the local variables of a
1389	  proc.
1390	  - Each stack frame is a `{name -> cell}` mapping.
1391	  - A cell has one of the above data types: `Bool`, `Int`, `Str`, etc.
1392	  - A cell has `readonly`, `export`, and `nameref` flags.
1393	- Boolean shell options with `shopt`: `parse_paren`, `simple_word_eval`, etc.
1394	- String shell options with `shvar`: `IFS`, `PATH`
1395	- Registers that are silently modified by the interpreter
1396	  - `$?` and `_error`
1397	  - `$!` for the last PID
1398	  - `_this_dir`
1399	  - `_reply`
1400
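Until that doc is written, the shape of this data model can be sketched as a toy in Python. Everything here is an assumption made for illustration: the class and method names are invented, and the lookup rule (current frame, then globals) is a simplification, since the real scoping rules aren't described in this section.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """A value plus the flags listed above."""
    value: object
    readonly: bool = False
    export: bool = False
    nameref: bool = False


class VarNamespace:
    """Toy variable namespace: a call stack of {name -> cell} frames."""

    def __init__(self):
        self.stack = [{}]  # the bottom frame holds globals

    def push_frame(self):
        self.stack.append({})

    def pop_frame(self):
        self.stack.pop()

    def set_local(self, name, value, readonly=False):
        frame = self.stack[-1]
        cell = frame.get(name)
        if cell is not None and cell.readonly:
            raise RuntimeError(f"{name} is readonly")
        frame[name] = Cell(value, readonly=readonly)

    def get(self, name):
        # Simplifying assumption: look in the current frame, then in globals.
        for frame in (self.stack[-1], self.stack[0]):
            if name in frame:
                return frame[name].value
        raise KeyError(name)


procs = {}  # separate namespace: proc names don't collide with variables
variables = VarNamespace()

procs["log"] = "<proc log>"
variables.set_local("log", "/tmp/log.txt")  # same name, different namespace

variables.push_frame()  # entering a proc
variables.set_local("x", 42)
print(variables.get("x"), variables.get("log"))  # 42 /tmp/log.txt
variables.pop_frame()  # leaving the proc: its local x is gone
```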
1401	### Process Model (the kernel)
1402
1403	The [Process Model](process-model.html) doc is under construction. It will cover:
1404
1405	- Simple Commands, `exec`
1406	- Pipelines. #[shell-the-good-parts](#blog-tag)
1407	- `fork`, `forkwait`
1408	- Command and process substitution.
1409	- Related links:
1410	  - [Tracing execution in Oils](xtrace.html) (xtrace), which divides
1411	    process-based concurrency into synchronous and async constructs.
1412	  - [Three Comics For Understanding Unix
1413	    Shell](http://www.oilshell.org/blog/2020/04/comics.html) (blog)
1414
1415
1416	<!--
1417	Process model additions: Capers, Headless shell
1418
1419	some optimizations: See YSH starts fewer processes than other shells.
1420	-->
1421
1422	## Summary
1423
1424	YSH is a large language that evolved from Unix shell. It has shell-like
1425	commands, Python-like expressions on typed data, and Ruby-like command blocks.
1426
1427	Even though it's large, you can "forget" the bad parts of shell like `[ $x -lt
1428	$y ]`.
1429
1430	These concepts are central to YSH:
1431
1432	1. Interleaved word, command, and expression languages.
1433	2. A standard library of shell builtins, as well as builtin functions
1434	3. Languages for data: J8 Notation, including JSON8 and TSV8
1435	4. A runtime shared by OSH and YSH
1436
1437	## Related Docs
1438
1439	- [YSH vs. Shell Idioms](idioms.html) - YSH side-by-side with shell.
1440	- [YSH Language Influences](language-influences.html) - In addition to shell,
1441	Python, and JavaScript, YSH is influenced by Ruby, Perl, Awk, PHP, and more.
1442	- [A Feel For YSH Syntax](syntax-feelings.html) - Some thoughts that may help
1443	you remember the syntax.
1444	- [YSH Language Warts](warts.html) documents syntax that may be surprising.
1445
1446	## Appendix: Features Not Shown
1447
1448	### Advanced
1449
1450	These shell features are part of YSH, but aren't shown for brevity.
1451
1452	- The `fork` and `forkwait` builtins, for concurrent execution and subshells.
1453	- Process Substitution: `diff <(sort left.txt) <(sort right.txt)`
1454
1455	### Deprecated Shell Constructs
1456
1457	The shared interpreter supports many shell constructs that are deprecated:
1458
1459	- YSH code uses shell's `||` and `&&` in limited circumstances, since `errexit`
1460	is on by default.
1461	- Assignment builtins like `local` and `declare`. Use YSH keywords.
1462	- Boolean expressions like `[[ x =~ $pat ]]`. Use YSH expressions.
1463	- Shell arithmetic like `$(( x + 1 ))` and `(( y = x ))`. Use YSH expressions.
1464	- The `until` loop can always be replaced with a `while` loop.
1465	- Most of what's in `${}` can be written in other ways. For example
1466	`${s#/tmp}` could be `s => removePrefix('/tmp')` (TODO).
1467
1468	### Not Yet Implemented
1469
1470	This document mentions a few constructs that aren't yet implemented. Here's a
1471	summary:
1472
1473	```none
1474	# Unimplemented syntax:
1475
1476	echo ${x|html} # formatters
1477
1478	echo ${x %.2f} # statically-parsed printf
1479
1480	var x = "<p>$x</p>"html
1481	echo "<p>$x</p>"html # tagged string
1482
1483	var x = 15 Mi # units suffix
1484	```
1485
1486	<!--
1487	- To implement: Capers: stateless coprocesses
1488	-->
1489
1490	## Appendix: Example of a YSH Module
1491
1492	YSH can be used to write simple "shell scripts" or longer programs. It has
1493	procs and modules to help with the latter.
1494
1495	A module is just a file, like this:
1496
1497	```
1498	#!/usr/bin/env ysh
1499	### Deploy script
1500
1501	use $_this_dir/lib/util.ysh --pick log
1502
1503	const DEST = '/tmp/ysh-tour'
1504
1505	proc my-sync(...files) {
1506	  ### Sync files and show which ones
1507
1508	  cp --verbose @files $DEST
1509	}
1510
1511	proc main {
1512	  mkdir -p $DEST
1513
1514	  touch {foo,bar}.py {build,test}.sh
1515
1516	  log "Copying source files"
1517	  my-sync *.py *.sh
1518
1519	  if test --dir /tmp/logs {
1520	    cd /tmp/logs
1521
1522	    log "Copying logs"
1523	    my-sync *.log
1524	  }
1525	}
1526
1527	if is-main { # The only top-level statement
1528	  main @ARGV
1529	}
1530	```
1531
1532	<!--
1533	TODO:
1534	- Also show flags parsing?
1535	- Show longer examples where it isn't boilerplate
1536	-->
1537
1538	You wouldn't bother with the boilerplate for something this small. But this
1539	example illustrates the basic idea: the top level often contains these words:
1540	`use`, `const`, `proc`, and `func`.
1541