---
default_highlighter: oils-sh
---

A Tour of YSH
=============

<!-- author's note about example names

- people: alice, bob
- nouns: ale, bean
  - peanut, coconut
- 42 for integers
-->

This doc describes the [YSH]($xref) language from a clean-slate
perspective. We don't assume you know Unix shell, or the compatible
[OSH]($xref). But shell users will see the similarity, with simplifications
and upgrades.

Remember, YSH is for Python and JavaScript users who avoid shell! See the
[project FAQ][FAQ] for more color on that.

[FAQ]: https://www.oilshell.org/blog/2021/01/why-a-new-shell.html

This document is long because it demonstrates nearly every feature of the
language. You may want to read it in multiple sittings, or read [The Simplest
Explanation of
Oil](https://www.oilshell.org/blog/2020/01/simplest-explanation.html) first.
(Until 2023, YSH was called the "Oil language".)

Here's a summary of what follows:

1. YSH has interleaved word, command, and expression languages.
   - The command language has Ruby-like blocks, and the expression language
     has Python-like data types.
2. YSH has both builtin commands like `cd /tmp`, and builtin functions like
   `join()`.
3. Languages for data, like [JSON][], are complementary to YSH code.
4. OSH and YSH share both an interpreter data model and a process model
   (provided by the Unix kernel). Understanding these common models will make
   you both a better shell user and YSH user.

Keep these points in mind as you read the details below.

[JSON]: https://json.org

<div id="toc">
</div>

## Preliminaries

Start YSH just like you start bash or Python:

<!-- oils-sh below skips code block extraction, since it doesn't run -->

```sh-prompt
bash$ ysh   # assuming it's installed

ysh$ echo 'hello world'   # command typed into YSH
hello world
```

In the sections below, we'll save space by showing output in comments, with
`=>`:

    echo 'hello world'  # => hello world

Multi-line output is shown like this:

    echo one
    echo two
    # =>
    # one
    # two

## Examples

### Hello World Script

You can also type commands into a file like `hello.ysh`. This is a complete
YSH program, which is identical to a shell program:

    echo 'hello world'  # => hello world

### A Taste of YSH

Unlike shell, YSH has `var` and `const` keywords:

    const name = 'world'  # const is rarer, used at the top level
    echo "hello $name"    # => hello world

They take rich Python-like expressions on the right:

    var x = 0             # an integer, not a string
    setvar x = x * 2 + 1  # mutate with the 'setvar' keyword

    setvar x += 5         # Increment by 5
    echo $x               # => 6

    var mylist = [x, 7]   # two integers [6, 7]

Expressions are often surrounded by `()`:

    if (x > 0) {
      echo 'positive'
    }  # => positive

    for i, item in (mylist) {  # 'mylist' is a variable, not a string
      echo "[$i] item $item"
    }
    # =>
    # [0] item 6
    # [1] item 7

YSH has Ruby-like blocks:

    cd /tmp {
      echo hi > greeting.txt  # file created inside /tmp
      echo $PWD               # => /tmp
    }
    echo $PWD  # prints the original directory

And utilities to read and write JSON:

    var person = {name: 'bob', age: 42}
    json write (person)
    # =>
    # {
    #   "name": "bob",
    #   "age": 42
    # }

    echo '["str", 42]' | json read  # sets '_reply' variable by default

The `=` keyword evaluates and prints an expression:

    = _reply
    # => (List) ["str", 42]

(Think of it like `var x = _reply`, without the `var`.)

## Word Language: Expressions for Strings (and Arrays)

Let's describe the word language first, and then talk about commands and
expressions. Words are a rich language because strings are a central
concept in shell.

### Unquoted Words

Words denote strings, but you often don't need to quote them:

    echo hi  # => hi

Quotes are useful when a string has spaces, or punctuation characters like
`(`, `)`, and `;`.

### Three Kinds of String Literals

You can choose the style that's most convenient to write a given string.

#### Double-Quoted, Single-Quoted, and J8 strings (like JSON)

Double-quoted strings allow interpolation, with `$`:

    var person = 'alice'
    echo "hi $person, $(echo bye)"  # => hi alice, bye

Escape special characters with `\`:

    echo "\$ \" \\ "  # => $ " \

In single-quoted strings, all characters are literal (except `'`, which
can't be expressed):

    echo 'c:\Program Files\'  # => c:\Program Files\

If you want C-style backslash character escapes, use a J8 string, which is
like JSON, but with single quotes:

    echo u' A is \u{41} \n line two, with backslash \\'
    # =>
    #  A is A
    #  line two, with backslash \

The `u''` strings are guaranteed to be valid Unicode (unlike JSON). You can
also use `b''` strings:

    echo b'byte \yff'  # Byte that's not valid unicode, like \xff in C.
                       # Don't confuse it with \u{ff}.

#### Multi-line Strings

Multi-line strings are surrounded with triple quotes. They come in the same
three varieties, and leading whitespace is stripped in a convenient way.

    sort <<< """
    var sub: $x
    command sub: $(echo hi)
    expression sub: $[x + 3]
    """
    # =>
    # command sub: hi
    # expression sub: 9
    # var sub: 6

    # In single-quoted strings, $ is literal: there's no interpolation
    sort <<< '''
    $2.00
    $1.99
    '''
    # =>
    # $1.99
    # $2.00

    sort <<< u'''
    C\tD
    A\tB
    '''  # b''' strings also supported
    # =>
    # A B
    # C D

(Use multiline strings instead of shell's [here docs]($xref:here-doc).)

### Three Kinds of Substitution

YSH has syntax for 3 types of substitution, all of which start with `$`. That
is, you can convert any of these things to a string:

1. Variables
2. The output of commands
3. The value of expressions

#### Variable Sub

The syntax `$a` or `${a}` converts a variable to a string:

    var a = 'ale'
    echo $a        # => ale
    echo _${a}_    # => _ale_
    echo "_ $a _"  # => _ ale _

The shell operator `:-` is occasionally useful in YSH:

    echo ${not_defined:-'default'}  # => default

#### Command Sub

The `$(echo hi)` syntax runs a command and captures its `stdout`:

    echo $(hostname)        # => example.com
    echo "_ $(hostname) _"  # => _ example.com _

#### Expression Sub

The `$[myexpr]` syntax evaluates an expression and converts it to a string:

    echo $[a]                # => ale
    echo $[1 + 2 * 3]        # => 7
    echo "_ $[1 + 2 * 3] _"  # => _ 7 _

<!-- TODO: safe substitution with "$[a]"html -->

### Arrays of Strings: Globs, Brace Expansion, Splicing, and Splitting

There are four constructs that evaluate to a list of strings, rather than a
single string.

#### Globs

Globs like `*.py` evaluate to a list of matching file names, in sorted order.

    touch foo.py bar.py  # create the files
    write *.py
    # =>
    # bar.py
    # foo.py

If no files match, it evaluates to an empty list (`[]`).

#### Brace Expansion

The brace expansion mini-language lets you write strings without duplication:

    write {alice,bob}@example.com
    # =>
    # alice@example.com
    # bob@example.com

#### Splicing

The `@` operator splices an array into a command:

    var myarray = :| ale bean |
    write S @myarray E
    # =>
    # S
    # ale
    # bean
    # E

You also have `@[]` to splice an expression that evaluates to a list:

    write -- @[split('ale bean')]
    # =>
    # ale
    # bean

Each item will be converted to a string.

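For instance, splicing a `List` of integers should pass each `Int` as a separate string argument. A minimal sketch (the variable name `nums` is just for illustration):

```ysh
var nums = [1, 2, 3]  # a List of Int, not strings
write -- @[nums]      # each item is converted to a string argument
# =>
# 1
# 2
# 3
```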
#### Split Command Sub / Split Builtin Sub

There's also a variant of command sub that decodes J8 lines into a sequence
of strings:

    write @(seq 3)  # write is passed 3 args
    # =>
    # 1
    # 2
    # 3

## Command Language: I/O, Control Flow, Abstraction

### Simple Commands

A simple command is a space-separated list of words. YSH looks up the first
word to determine if it's a builtin command, or a user-defined `proc`.

    echo 'hello world'   # The shell builtin 'echo'

    proc greet (name) {  # Define a unit of code
      echo "hello $name"
    }

    # The first word now resolves to the proc you defined
    greet alice          # => hello alice

If it's neither, then it's assumed to be an external command:

    ls -l /tmp           # The external 'ls' command

Commands accept traditional string arguments, as well as typed arguments in
parentheses:

    # 'write' is a string arg; 'x' is a typed expression arg
    json write (x)

<!--
Block args are a special kind of typed arg:

    cd /tmp {
      echo $PWD
    }
-->

### Redirects

You can redirect `stdin` and `stdout` of simple commands:

    echo hi > tmp.txt  # write to a file
    sort < tmp.txt

Here are the most common idioms for using `stderr` (identical to shell):

    ls /tmp 2>errors.txt
    echo 'fatal error' >&2

### ARGV and ENV

The `ARGV` list holds the arguments passed to the shell:

    var num_args = len(ARGV)
    ls /tmp @ARGV  # pass shell's arguments through

---

You can add to the environment of a new process with a prefix binding:

    PYTHONPATH=vendor ./demo.py

The `ENV` object reflects the shell's own environment. A prefix binding only
affects the new process, so this example assumes `PYTHONPATH=vendor` was
already exported to the shell:

    echo $[ENV.PYTHONPATH]  # => vendor

### Pipelines

Pipelines are a powerful method for manipulating data streams:

    ls | wc -l                       # count files in this directory
    find /bin -type f | xargs wc -l  # count lines in each file of a subtree

The stream may contain (lines of) text, binary data, JSON, TSV, and more.
Details below.

### Multi-line Commands

The `...` prefix lets you write long commands, pipelines, and `&&` chains
without `\` line continuations.

    ... find /bin               # traverse this directory and
        -type f -a -executable  # print executable files
      | sort -r                 # reverse sort
      | head -n 30              # limit to 30 files
      ;

When this mode is active:

- A single newline behaves like a space
- A blank line (two newlines in a row) is illegal, but a line that has only a
  comment is allowed. This prevents confusion if you forget the `;`
  terminator.

### `var`, `setvar`, `const` to Declare and Mutate

Constants can't be modified:

    const myconst = 'mystr'
    # setvar myconst = 'foo' would be an error

Modify variables with the `setvar` keyword:

    var num_beans = 12
    setvar num_beans = 13

A more complex example:

    var d = {name: 'bob', age: 42}  # dict literal
    setvar d.name = 'alice'         # d.name is a synonym for d['name']
    echo $[d.name]                  # => alice

That's most of what you need to know about assignments. Advanced users may
want to use `setglobal` or `call myplace->setValue(42)` in certain situations.

<!--
    var g = 1
    var h = 2
    proc demo(:out) {
      setglobal g = 42
      setref out = 43
    }
    demo :h  # pass a reference to h
    echo "$g $h"  # => 42 43
-->

More info: [Variable Declaration and Mutation](variables.html).

### `for` Loop

#### Words

Shell-style for loops iterate over words:

    for word in 'oils' $num_beans {pea,coco}nut {
      echo $word
    }
    # =>
    # oils
    # 13
    # peanut
    # coconut

You can also request the loop index:

    for i, word in README.md *.py {
      echo "$i - $word"
    }
    # =>
    # 0 - README.md
    # 1 - __init__.py

#### Typed Data

To iterate over typed data, use parentheses around an expression. The
expression should evaluate to an integer `Range`, `List`, `Dict`, or `Stdin`.

Range:

    for i in (3 ..< 5) {  # range operator ..<
      echo "i = $i"
    }
    # =>
    # i = 3
    # i = 4

List:

    var foods = ['ale', 'bean']
    for item in (foods) {
      echo $item
    }
    # =>
    # ale
    # bean

Again, you can request the index with `for i, item in ...`.

---

Here's the most general form of the loop over `Dict`:

    var mydict = {pea: 42, nut: 10}
    for i, k, v in (mydict) {
      echo "$i - $k - $v"
    }
    # =>
    # 0 - pea - 42
    # 1 - nut - 10

There are two simpler forms:

- One variable gives you the key: `for k in (mydict)`
- Two variables give you the key and value: `for k, v in (mydict)`

(One way to think of it: `for` loops in YSH have the functionality of Python's
`enumerate()`, `items()`, `keys()`, and `values()`.)

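The two-variable form is handy for aggregating values. A minimal sketch (the names `prices` and `total` are illustrative, not from any library):

```ysh
var prices = {pea: 42, nut: 10}
var total = 0
for k, v in (prices) {  # k is the key, v is the value
  echo "$k = $v"
  setvar total += v
}
echo "total = $total"
# =>
# pea = 42
# nut = 10
# total = 52
```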
---

The `io.stdin` object iterates over lines:

    for line in (io.stdin) {
      echo $line
    }
    # lines are buffered, so it's much faster than `while read --raw-line`

<!--
TODO: Str loop should give you the (UTF-8 offset, rune)
Or maybe just UTF-8 offset? Decoding errors could be exceptions, or Unicode
replacement.
-->

### `while` Loop

While loops can use a command as the termination condition:

    while test --file lock {
      sleep 1
    }

Or an expression, which is surrounded in `()`:

    var i = 3
    while (i < 6) {
      echo "i = $i"
      setvar i += 1
    }
    # =>
    # i = 3
    # i = 4
    # i = 5

### Conditionals

#### `if elif`

If statements test the exit code of a command, and have optional `elif` and
`else` clauses:

    if test --file foo {
      echo 'foo is a file'
      rm --verbose foo  # delete it
    } elif test --dir foo {
      echo 'foo is a directory'
    } else {
      echo 'neither'
    }

Invert the exit code with `!`:

    if ! grep alice /etc/passwd {
      echo 'alice is not a user'
    }

As with `while` loops, the condition can also be an expression wrapped in
`()`:

    if (num_beans > 0) {
      echo 'so many beans'
    }

    var done = false
    if (not done) {  # negate with 'not' operator (contrast with !)
      echo "we aren't done"
    }

#### `case`

The case statement is a series of conditionals and executable blocks. The
condition can be either an unquoted glob pattern like `*.py`, an eggex pattern
like `/d+/`, or a typed expression like `(42)`:

    var s = 'README.md'
    case (s) {
      *.py       { echo 'Python' }
      *.cc | *.h { echo 'C++' }
      *          { echo 'Other' }
    }
    # => Other

    case (s) {
      / dot* '.md' / { echo 'Markdown' }
      (30 + 12)      { echo 'the integer 42' }
      (else)         { echo 'neither' }
    }
    # => Markdown

<!--
(Shell style like `if foo; then ... fi` and `case $x in ... esac` is also
legal, but discouraged in YSH code.)
-->

### Error Handling

If statements are also used for error handling. Builtins and external
commands use this style:

    if ! test -d /bin {
      echo 'not a directory'
    }

    if ! cp foo /tmp {
      echo 'error copying'  # any non-zero status
    }

Procs use this style (because of shell's disabled `errexit` quirk):

    try {
      myproc
    }
    if failed {
      echo 'failed'
    }

For a complete list of examples, see [YSH Error
Handling](ysh-error.html). For design goals and a reference, see [YSH
Fixes Shell's Error Handling](error-handling.html).

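You can also inspect the exit status directly. This minimal sketch assumes that `try` records it in the `_error` variable's `code` field, and uses the `false` command as a stand-in for a failing proc:

```ysh
try {
  false  # stands in for any command or proc that fails
}
echo "code = $[_error.code]"  # => code = 1

if (_error.code !== 0) {
  echo 'failed'  # => failed
}
```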
#### exit, break, continue, return

The `exit` keyword exits a process. (It's not a shell builtin.)

The other 3 control flow keywords behave like they do in Python and JavaScript.

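For example, `continue` skips to the next iteration and `break` leaves the loop. A minimal sketch (the variable `total` is illustrative):

```ysh
var total = 0
for i in (0 ..< 10) {
  if (i % 2 === 1) { continue }  # skip odd numbers
  if (i > 6) { break }           # stop once i passes 6
  setvar total += i
}
echo $total  # => 12, which is 0 + 2 + 4 + 6
```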
### Shell-like `proc`

You can define units of code with the `proc` keyword. A `proc` is like a
procedure or process.

    proc mycopy (src, dest) {
      ### Copy verbosely

      mkdir -p $dest
      cp --verbose $src $dest
    }

The `###` line is a "doc comment". Simple procs like this are invoked like a
shell command:

    touch log.txt
    mycopy log.txt /tmp  # first word 'mycopy' is a proc

Procs have many features, including four kinds of arguments:

1. Word args (which are always strings)
1. Typed, positional args
1. Typed, named args
1. A final block argument, which may be written with `{ }`.

At the call site, they can look like any of these forms:

    ls /tmp                      # word arg

    json write (d)               # word arg, then positional arg

    try {
      error 'failed' (status=9)  # word arg, then named arg
    }

    cd /tmp { echo $PWD }        # word arg, then block arg

    pp value ([1, 2])            # positional, typed arg

<!-- TODO: lazy arg list: ls8 | where [age > 10] -->

At the definition site, the kinds of parameters are separated with `;`, similar
to the Julia language:

    proc p2 (word1, word2; pos1, pos2, ...rest_pos) {
      echo "$word1 $word2 $[pos1 + pos2]"
      json write (rest_pos)
    }

    proc p3 (w ; ; named1, named2, ...rest_named; block) {
      echo "$w $[named1 + named2]"
      call io->eval(block)
      json write (rest_named)
    }

    proc p4 (; ; ; block) {
      call io->eval(block)
    }

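Here's a minimal sketch of calling a proc that mixes a word param, typed positional params, and a named param with a default. The proc name `show-sum` is hypothetical, and the named arg is written after the positional args with a comma, following the `error 'failed' (status=9)` style:

```ysh
# show-sum is a hypothetical proc, for illustration only
proc show-sum (label; a, b; scale=1) {
  # label is a word param; a and b are typed positional params;
  # scale is a typed named param with a default value
  echo "$label $[(a + b) * scale]"
}

show-sum plain (1, 2)             # => plain 3
show-sum scaled (1, 2, scale=10)  # => scaled 30
```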
YSH also has Python-like functions defined with `func`. These are part of the
expression language, which we'll see later.

For more info, see the [Guide to Procs and Funcs](proc-func.html).

### Ruby-like Block Arguments

A block is a value of type `Command`. For example, `shopt` is a builtin
command that takes a block argument:

    shopt --unset errexit {  # ignore errors
      cp ale /tmp
      cp bean /bin
    }

In this case, the block doesn't form a new scope.

#### Block Scope / Closures

However, by default, block arguments capture the frame they're defined in.
This means they obey lexical scope.

Consider this proc, which accepts a block, and runs it:

    proc do-it (; ; ; block) {
      call io->eval(block)
    }

When the block arg is passed, the enclosing stack frame is captured. This
means that code inside the block can use variables in the captured frame:

    var x = 42
    do-it {
      echo "x = $x"  # outer x is visible LATER, when the block is run
    }

- [Feature Index: Closures](ref/feature-index.html#Closures)

### Builtin Commands

Shell builtins like `cd` and `read` are the "standard library" of the
command language. Each one takes various flags:

    cd -L .                # follow symlinks

    echo foo | read --all  # read all of stdin

Here are some categories of builtin:

- I/O: `echo write read`
- File system: `cd test`
- Processes: `fork wait forkwait exec`
- Interpreter settings: `shopt shvar`
- Meta: `command builtin runproc type eval`

<!-- TODO: Link to a comprehensive list of builtins -->

## Expression Language: Python-like Types

YSH expressions look and behave more like Python or JavaScript than shell. For
example, we write `if (x < y)` instead of `if [ $x -lt $y ]`. Expressions are
usually surrounded by `( )`.

At runtime, variables like `x` and `y` are bound to typed data, like
integers, floats, strings, lists, and dicts.

<!--
[Command vs. Expression Mode](command-vs-expression-mode.html) may help you
understand how YSH is parsed.
-->

### Python-like `func`

At the end of the Command Language, we saw that procs are shell-like units of
code. YSH also has Python-like functions, which are different from
`procs`:

- They're defined with the `func` keyword.
- They're called in expressions, not in commands.
- They're pure, and live in the interior of a process.
  - In contrast, procs usually perform I/O, and have exterior boundaries.

The simplest function is:

    func identity(x) {
      return (x)  # parens required for typed return
    }

A more complex pure function:

    func myRepeat(s, n; special=false) {  # positional; named params
      var parts = []
      for i in (0 ..< n) {
        append $s (parts)
      }
      var result = join(parts)

      if (special) {
        return ("$result !!")
      } else {
        return (result)
      }
    }

    echo $[myRepeat('z', 3)]  # => zzz

    echo $[myRepeat('z', 3, special=true)]  # => zzz !!

A function that mutates its argument:

    func popTwice(mylist) {
      call mylist->pop()
      call mylist->pop()
    }

    var mylist = [3, 4]

    # The call keyword is an "adapter" between commands and expressions,
    # like the = keyword.
    call popTwice(mylist)

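Because the func receives the same `List` object rather than a copy, the caller sees the mutation afterward. A minimal sketch, with an illustrative variable `nums`:

```ysh
func popTwice(mylist) {
  call mylist->pop()  # removes the last item in place
  call mylist->pop()
}

var nums = [3, 4, 5]
call popTwice(nums)
echo $[len(nums)]  # => 1
echo $[nums[0]]    # => 3
```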
Funcs are named using `camelCase`, while procs use `kebab-case`. See the
[Style Guide](style-guide.html) for more conventions.

#### Builtin Functions

In addition to builtin commands, YSH has Python-like builtin functions.
These are like the "standard library" for the expression language. Examples:

- Functions that take multiple types: `len() type()`
- Conversions: `bool() int() float() str() list() ...`
- Explicit word evaluation: `split() join() glob() maybe()`

<!-- TODO: Make a comprehensive list of func builtins. -->

### Data Types: `Int`, `Str`, `List`, `Dict`, `Obj`, ...

YSH has data types, each with an expression syntax and associated methods.

### Methods

Non-mutating methods are looked up with the `.` operator:

    var line = ' ale bean '
    var caps = line.trim().upper()  # 'ALE BEAN'

Mutating methods are looked up with a thin arrow `->`:

    var foods = ['ale', 'bean']
    var last = foods->pop()  # bean
    write @foods             # => ale

You can ignore the return value with the `call` keyword:

    call foods->pop()

That is, YSH adds mutable data structures to shell, so we have a special syntax
for mutation.

---

You can also chain functions with a fat arrow `=>`:

    var trimmed = line.trim() => upper()  # 'ALE BEAN'

The `=>` operator allows functions to appear in a natural left-to-right order,
like methods.

    # list() is a free function taking one arg
    # join() is a free function taking two args
    var x = {k1: 42, k2: 43} => list() => join('/')  # 'k1/k2'

---

Now let's go through the data types in YSH. We'll show the syntax for
literals, and what methods they have.

#### Null and Bool

YSH uses JavaScript-like spellings for these three "atoms":

    var x = null

    var b1, b2 = true, false

    if (b1) {
      echo 'yes'
    }  # => yes

#### Int

There are many ways to write integers:

    var small, big = 42, 65_536
    echo "$small $big"  # => 42 65536

    var hex, octal, binary = 0x0001_0000, 0o755, 0b0001_0101
    echo "$hex $octal $binary"  # => 65536 493 21

<!--
"Runes" are integers that represent Unicode code points. They're not common in
YSH code, but can make certain string algorithms more readable.

    # Pound rune literals are similar to ord('A')
    const a = #'A'

    # Backslash rune literals can appear outside of quotes
    const newline = \n  # Remember this is an integer
    const backslash = \\  # ditto

    # Unicode rune literal is syntactic sugar for 0x3bc
    const mu = \u{3bc}

    echo "chars $a $newline $backslash $mu"  # => chars 65 10 92 956
-->

#### Float

Floats are written with a decimal point:

    var big = 3.14

You can use scientific notation, as in Python:

    var small = 1.5e-10

#### Str

See the section above on Three Kinds of String Literals. It described
`'single quoted'`, `"double ${quoted}"`, and `u'J8-style\n'` strings; as well
as their multiline variants.

Strings are UTF-8 encoded in memory, like strings in the [Go
language](https://golang.org). There isn't a separate string and unicode type,
as in Python.

Strings are immutable, as in Python and JavaScript. This means they only
have transforming methods:

    var x = s.trim()

Other methods:

- `trimLeft() trimRight()`
- `trimPrefix() trimSuffix()`
- `upper() lower()`
- `search() leftMatch()` - pattern matching
- `replace() split()`

#### List (and Arrays)

All lists can be expressed with Python-like literals:

    var foods = ['ale', 'bean', 'corn']
    var recursive = [1, [2, 3]]

As a special case, lists of strings are called arrays. It's often more
convenient to write them with shell-like literals:

    # No quotes or commas
    var foods = :| ale bean corn |

    # You can use the word language here
    var other = :| foo $s *.py {alice,bob}@example.com |

Lists are mutable, as in Python and JavaScript. So they mainly have
mutating methods:

    call foods->reverse()
    write -- @foods
    # =>
    # corn
    # bean
    # ale

#### Dict

Dicts use syntax that's like JavaScript. Here's a dict literal:

    var d = {
      name: 'bob',  # unquoted keys are allowed
      age: 42,
      'key with spaces': 'val'
    }

You can use either `[]` or `.` to retrieve a value, given a key:

    var v1 = d['name']
    var v2 = d.name                # shorthand for the above
    var v3 = d['key with spaces']  # no shorthand for this

(If the key doesn't exist, an error is raised.)

You can change Dict values with the same 2 syntaxes:

    setvar d['name'] = 'other'
    setvar d.name = 'fun'

---

If you want to compute a key name, use an expression inside `[]`:

    var key = 'alice'
    var d2 = {[key ++ '_z']: 'ZZZ'}  # Computed key name
    echo $[d2.alice_z]               # => ZZZ

If you omit the value, it's taken from a variable of the same name:

    var d3 = {key}            # like {key: key}
    echo "name is $[d3.key]"  # => name is alice

More examples:

    var empty = {}
    echo $[len(empty)]  # => 0

The `keys()` and `values()` methods return new `List` objects:

    var key_list = d2.keys()    # ['alice_z']
    var val_list = d3.values()  # ['alice']

#### Obj

YSH has an `Obj` type that bundles code and data. (In contrast, JSON
messages are pure data, not objects.)

The main purpose of objects is polymorphism:

    var obj = makeMyObject(42)  # I don't know what it looks like inside

    echo $[obj.myMethod()]      # But I can perform abstract operations

    call obj->mutatingMethod()  # Mutation is considered special, with ->

YSH objects are similar to Lua and JavaScript objects. They can be thought of
as a linked list of `Dict` instances.

Or you can say they have a `Dict` of properties, and a recursive "prototype
chain" that is also an `Obj`.

- [Feature Index: Objects](ref/feature-index.html#Objects)

### `Place` type / "out params"

The `read` builtin can set an implicit variable `_reply`:

    whoami | read --all  # sets _reply

Or you can pass a `value.Place`, created with `&`:

    var x                     # implicitly initialized to null
    whoami | read --all (&x)  # mutate this "place"
    echo who=$x               # => who=andy

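Your own procs can accept a `Place` too, and write to it with the `setValue()` method mentioned in the assignments section. A minimal sketch; the proc name `set-greeting` is hypothetical:

```ysh
# set-greeting is a hypothetical proc that fills in an "out param"
proc set-greeting (; out) {
  call out->setValue('hello')  # mutates the caller's variable
}

var greeting              # implicitly initialized to null
set-greeting (&greeting)
echo $greeting            # => hello
```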
<!--
#### Quotation Types: value.Command (Block) and value.Expr

These types are for reflection on YSH code. Most YSH programs won't use them
directly.

- `Command`: an unevaluated code block.
  - rarely-used literal: `^(ls | wc -l)`
- `Expr`: an unevaluated expression.
  - rarely-used literal: `^[42 + a[i]]`
-->

### Operators

YSH operators are generally the same as in Python:

    if (10 <= num_beans and num_beans < 20) {
      echo 'enough'
    }  # => enough

YSH has a few operators that aren't in Python. Equality can be approximate or
exact:

    var n = ' 42 '
    if (n ~== 42) {
      echo 'equal after stripping whitespace and type conversion'
    }  # => equal after stripping whitespace and type conversion

    if (n === 42) {
      echo "not reached because strings and ints aren't equal"
    }

<!-- TODO: is n === 42 a type error? -->

Pattern matching can be done with globs (`~~` and `!~~`):

    const filename = 'foo.py'
    if (filename ~~ '*.py') {
      echo 'Python'
    }  # => Python

    if (filename !~~ '*.sh') {
      echo 'not shell'
    }  # => not shell

or regular expressions (`~` and `!~`). See the Eggex section below for an
example of the latter.

Concatenation is `++` rather than `+` because it avoids confusion in the
presence of type conversion:

    var n = '42' + 1  # string plus int does implicit conversion
    echo $n           # => 43

    var y = 'ale ' ++ "bean $n"  # concatenation
    echo $y                      # => ale bean 43

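The `++` operator also concatenates lists, returning a new `List`. A minimal sketch, assuming `++` leaves its operands unmodified (the names `first` and `both` are illustrative):

```ysh
var first = [1, 2]
var both = first ++ [3]  # a new List; 'first' is unchanged
echo $[len(both)]        # => 3
echo $[len(first)]       # => 2
```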
<!--
TODO: change example above
    var n = '42' + 1  # string plus int does implicit conversion
-->

<!--

#### Summary of Operators

- Arithmetic: `+ - * / // %` and `**` for exponentiation
  - `/` always yields a float, and `//` is integer division
- Bitwise: `& | ^ ~`
- Logical: `and or not`
- Comparison: `== < > <= >= in 'not in'`
  - Approximate equality: `~==`
  - Eggex and glob match: `~ !~ ~~ !~~`
- Ternary: `1 if x else 0`
- Index and slice: `mylist[3]` and `mylist[1:3]`
  - `mydict.key` is a shortcut for `mydict['key']`
- Function calls
  - free: `f(x, y)`
  - transformations and chaining: `s => startsWith('prefix')`
  - mutating methods: `mylist->pop()`
- String and List: `++` for concatenation
  - This is a separate operator because the addition operator `+` does
    string-to-int conversion

TODO: What about list comprehensions?
-->

### Egg Expressions (YSH Regexes)

An Eggex is a YSH expression that denotes a regular expression. Eggexes
translate to POSIX ERE syntax, for use with tools like `egrep`, `awk`, and
`sed --regexp-extended` (GNU only).

They're designed to be readable and composable. Example:

    var D = / digit{1,3} /
    var ip_pattern = / D '.' D '.' D '.' D /

    var z = '192.168.0.1'
    if (z ~ ip_pattern) {  # Use the ~ operator to match
      echo "$z looks like an IP address"
    }  # => 192.168.0.1 looks like an IP address

    if (z !~ / '.255' %end /) {
      echo "doesn't end with .255"
    }  # => doesn't end with .255

See the [Egg Expressions doc](eggex.html) for details.

1171	## Interlude
1172
1173	Before moving onto other YSH features, let's review what we've seen.
1174
1175	### Three Interleaved Languages
1176
1177	Here are the languages we saw in the last 3 sections:
1178
1179	1. Words evaluate to a string, or list of strings. This includes:
1180	- literals like `'mystr'`
1181	- substitutions like `${x}` and `$(hostname)`
1182	- globs like `*.sh`
1183	2. Commands are used for
1184	- I/O: pipelines, builtins like `read`
1185	- control flow: `if`, `for`
1186	- abstraction: `proc`
1187	3. Expressions on typed data are borrowed from Python, with influence from
1188	JavaScript:
1189	- Lists: `['ale', 'bean']` or `:\| ale bean \|`
1190	- Dicts: `{name: 'bob', age: 42}`
1191	- Functions: `split('ale bean')` and `join(['pea', 'nut'])`
1192
1193	### How Do They Work Together?
1194
1195	Here are two examples:
1196
1197	(1) In this this command, there are four words. The fourth word is an
1198	expression sub `$[]`.
1199
1200	write hello $name $[d['age'] + 1]
1201	# =>
1202	# hello
1203	# world
1204	# 43
1205
1206	(2) In this assignment, the expression on the right hand side of `=`
1207	concatenates two strings. The first string is a literal, and the second is a
1208	command sub.
1209
    var food = 'ale ' ++ $(echo bean | tr a-z A-Z)
    write $food   # => ale BEAN
1212
1213	So words, commands, and expressions are mutually recursive. If you're a
1214	conceptual person, skimming [Syntactic Concepts](syntactic-concepts.html) may
1215	help you understand this on a deeper level.
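Here is a rough Python analogue of example (2). The command sub `$(...)` runs a command, captures its stdout, and removes trailing newlines, as in POSIX shells. The helper name `command_sub` is hypothetical. The sketch assumes a Unix system with `/bin/sh` and `tr`, and passes a shell string only so the pipeline stays identical.

```python
import subprocess


def command_sub(cmd: str) -> str:
    """Run cmd with /bin/sh and return its stdout, like $(...)."""
    result = subprocess.run(
        cmd, shell=True, check=True, capture_output=True, text=True
    )
    return result.stdout.rstrip("\n")  # command subs strip trailing newlines


# An expression (string concatenation) that contains a command (a pipeline)
food = "ale " + command_sub("echo bean | tr a-z A-Z")
print(food)
```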
1216
<!--
One way to think about these sublanguages is to note that the `|` character
means something different in each context:

- In the command language, it's the pipeline operator, as in `ls | wc -l`
- In the word language, it's only valid in a literal string like `'|'`, `"|"`,
  or `\|`.  (It's also used in `${x|html}`, which formats a string.)
- In the expression language, it's the bitwise OR operator, as in Python and
  JavaScript.
-->
1227
1228	---
1229
1230	Let's move on from talking about code, and talk about data.
1231
1232	## Data Notation / Interchange Formats
1233
1234	In YSH, you can read and write data languages based on [JSON]($xref). This is
1235	a primary way to exchange messages between Unix processes.
1236
Instead of being executed, like our command/word/expression languages,
these languages are parsed into data structures.
1239
1240	<!-- TODO: Link to slogans, fallacies, and concepts -->
1241
1242	### UTF-8
1243
1244	UTF-8 is the foundation of our data notation. It's the most common Unicode
1245	encoding, and the most consistent:
1246
    var x = u'hello \u{1f642}'  # store a UTF-8 string in memory
    echo $x                     # send UTF-8 to stdout
    # => hello 🙂
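To make "UTF-8 in memory" concrete: the emoji U+1F642 is one code point but four bytes in UTF-8. This Python snippet shows the bytes that correspond to the string above, which is what ends up on stdout.

```python
x = "hello \U0001F642"  # same text as u'hello \u{1f642}'

encoded = x.encode("utf-8")  # the bytes written to stdout
print(encoded)               # b'hello \xf0\x9f\x99\x82'
print(len(x), len(encoded))  # 7 code points, 10 bytes
```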
1251
1252	<!-- TODO: there's a runes() iterator which gives integer offsets, usable for
1253	slicing -->
1254
1255	### JSON
1256
1257	JSON messages are UTF-8 text. You can encode and decode JSON with functions
1258	(`func` style):
1259
    var message = toJson({x: 42})       # => (Str)   '{"x": 42}'
    var mydict = fromJson('{"x": 42}')  # => (Dict)  {x: 42}
1262
1263	Or with commands (`proc` style):
1264
    json write ({x: 42}) > foo.json     # writes '{"x": 42}'

    json read (&mydict) < foo.json      # create var
    = mydict                            # => (Dict)  {x: 42}
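Python users will recognize a similar split between string-level and stream-level APIs. `json.dumps()` and `json.loads()` work on strings, roughly like `toJson()` and `fromJson()`. `json.dump()` and `json.load()` work on files, roughly like `json write` and `json read`. This sketch uses a temporary directory instead of writing `foo.json` into the current directory.

```python
import json
import pathlib
import tempfile

# func style: values <-> strings
message = json.dumps({"x": 42})
mydict = json.loads('{"x": 42}')

# proc style: values <-> files (streams)
with tempfile.TemporaryDirectory() as d:
    path = pathlib.Path(d, "foo.json")
    with path.open("w", encoding="utf-8") as f:
        json.dump({"x": 42}, f)
    with path.open(encoding="utf-8") as f:
        from_file = json.load(f)

print(message, mydict, from_file)
```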
1269
1270	### J8 Notation
1271
1272	But JSON isn't quite enough for a principled shell.
1273
1274	- Traditional Unix tools like `grep` and `awk` operate on streams of lines.
1275	In YSH, to avoid data-dependent bugs, we want a reliable way of quoting
1276	lines.
1277	- In YSH, we also want to represent binary data, not just text. When you
1278	read a Unix file, it may or may not be text.
1279
1280	So we borrow JSON-style strings, and create [J8 Notation][]. Slogans:
1281
1282	- Deconstructing and Augmenting JSON
1283	- Fixing the JSON-Unix Mismatch
1284
1285	[J8 Notation]: $xref:j8-notation
1286
1287	#### J8 Lines
1288
1289	J8 Lines are a building block of J8 Notation. If you have a file
1290	`lines.txt`:
1291
1292	<pre>
1293	doc/hello.md
1294	"doc/with spaces.md"
1295	b'doc/with byte \yff.md'
1296	</pre>
1297
Then you can decode it with a split command sub (mentioned above):

    var decoded = @(cat lines.txt)
1301
1302	This file has:
1303
1304	1. An unquoted string
1305	1. A JSON string with `"double quotes"`
1306	1. A J8-style string: `u'unicode'` or `b'bytes'`
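To show how the three kinds of line differ, here is a minimal Python sketch of a J8 Lines decoder. It is not the Oils implementation, and the function names are hypothetical. It handles only a subset of J8 escapes: `\yHH` for a raw byte, `\u{...}` for a code point, and a few single-character escapes. It returns `bytes`, because a `b'...'` line may contain bytes that aren't valid UTF-8. Stripping surrounding whitespace is an assumption of this sketch.

```python
import json
import re

# Subset of J8 escapes handled here (an assumption, not the full spec)
_SIMPLE = {"\\": b"\\", "'": b"'", '"': b'"', "n": b"\n", "t": b"\t", "r": b"\r"}
_ESCAPE = re.compile(r"\\(?:y([0-9a-fA-F]{2})|u\{([0-9a-fA-F]{1,6})\}|(.))")


def decode_j8_quoted(body: str) -> bytes:
    """Decode the inside of u'...' or b'...' into bytes."""
    out = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(body):
        out += body[pos:m.start()].encode("utf-8")  # literal text before escape
        byte_hex, code_point, simple = m.groups()
        if byte_hex is not None:
            out.append(int(byte_hex, 16))  # \yff -> one raw byte
        elif code_point is not None:
            out += chr(int(code_point, 16)).encode("utf-8")  # \u{1F642}
        else:
            out += _SIMPLE[simple]  # KeyError for escapes this sketch omits
        pos = m.end()
    out += body[pos:].encode("utf-8")
    return bytes(out)


def decode_j8_line(line: str) -> bytes:
    """Decode one J8 Line: unquoted, JSON-quoted, or J8-quoted."""
    line = line.strip()  # assumption: surrounding whitespace is insignificant
    if line.startswith('"'):
        return json.loads(line).encode("utf-8")  # plain JSON string
    if line[:2] in ("u'", "b'") and line.endswith("'"):
        return decode_j8_quoted(line[2:-1])
    return line.encode("utf-8")  # unquoted: taken literally


lines = ["doc/hello.md", '"doc/with spaces.md"', r"b'doc/with byte \yff.md'"]
decoded = [decode_j8_line(line) for line in lines]
print(decoded)
```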
1307
1308	<!--
1309	TODO: fromJ8Line() toJ8Line()
1310	-->
1311
1312	#### JSON8 is Tree-Shaped
1313
1314	JSON8 is just like JSON, but it allows J8-style strings:
1315
1316	<pre>
1317	{ "foo": "hi \uD83D\uDE42"} # valid JSON, and valid JSON8
1318	{u'foo': u'hi \u{1F642}' } # valid JSON8, with J8-style strings
1319	</pre>
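The two lines above spell the same character differently. JSON's `\u` escape holds only four hex digits, so a code point above U+FFFF, like U+1F642, must be written as a UTF-16 surrogate pair. J8's `\u{1F642}` names the code point directly. This Python sketch computes the pair and checks that a JSON decoder joins it back into one character. The helper `utf16_surrogates` is illustrative.

```python
import json


def utf16_surrogates(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 (high, low) pair."""
    offset = code_point - 0x10000  # 20 bits remain
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


high, low = utf16_surrogates(0x1F642)
escape = f"\\u{high:04X}\\u{low:04X}"
print(escape)  # \uD83D\uDE42

# A JSON decoder combines the pair into the single character U+1F642.
decoded = json.loads(f'"hi {escape}"')
print(decoded == "hi \U0001F642")  # True
```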
1320
1321	<!--
1322	In addition to strings and lines, you can write and read tree-shaped data
1323	as [JSON][]:
1324
1325	var d = {key: 'value'}
1326	json write (d) # dump variable d as JSON
1327	# =>
1328	# {
1329	# "key": "value"
1330	# }
1331
1332	echo '["ale", 42]' > example.json
1333
1334	json read (&d2) < example.json # parse JSON into var d2
1335	pp (d2) # pretty print it
1336	# => (List) ['ale', 42]
1337
1338	[JSON][] will lose information when strings have binary data, but the slight
1339	[JSON8]($xref) upgrade won't:
1340
1341	var b = {binary: $'\xff'}
1342	json8 write (b)
1343	# =>
1344	# {
1345	# "binary": b'\yff'
1346	# }
1347	-->
1348
1349	[JSON]: $xref
1350
1351	#### TSV8 is Table-Shaped
1352
1353	(TODO: not yet implemented.)
1354
1355	YSH supports data notation for tables:
1356
1. Plain TSV files, which are untyped.  Every column has string data.
   - Cells with tabs, newlines, and binary data are a problem.
2. Our extension [TSV8]($xref), which supports typed data.
   - It uses JSON notation for booleans, integers, and floats.
   - It uses J8 strings, which can represent any string.
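A few lines of Python can show the problem with plain TSV, and how quoted, typed cells avoid it. This is not TSV8 syntax, which isn't implemented yet. The sketch only assumes, as the list above says, that each cell uses JSON-style notation. With that notation, a tab inside a string becomes the escape `\t`, and numbers stay unquoted.

```python
import json

row = ["alice", "likes\tale", 42]

# Plain TSV: every cell becomes a string, and the embedded tab adds a column.
plain = "\t".join(str(cell) for cell in row)
print(plain.split("\t"))  # ['alice', 'likes', 'ale', '42']

# JSON-encoded cells: the tab is escaped, and 42 keeps its integer type.
quoted = "\t".join(json.dumps(cell) for cell in row)
print([json.loads(cell) for cell in quoted.split("\t")])
```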
1362
1363	<!-- Figure out the API. Does it work like JSON?
1364
1365	Or I think we just implement
1366	- rows: 'where' or 'filter' (dplyr)
1367	- cols: 'select' conflicts with shell builtin; call it 'cols'?
1368	- sort: 'sort-by' or 'arrange' (dplyr)
1369	- TSV8 <=> sqlite conversion. Are these drivers or what?
1370	- and then let you pipe output?
1371
1372	Do we also need TSV8 space2tab or something? For writing TSV8 inline.
1373
1374	More later:
1375	- MessagePack (e.g. for shared library extension modules)
1376	- msgpack read, write? I think user-defined function could be like this?
1377	- SASH: Simple and Strict HTML? For easy processing
1378	-->
1379
1380	## YSH Modules are Files
1381
1382	A module is a file of source code, like `lib/myargs.ysh`. The `use`
1383	builtin turns it into an `Obj` that can be invoked and inspected:
1384
    use lib/myargs.ysh

    myargs proc1 --flag val   # module name becomes a prefix, via __invoke__
    var alias = myargs.proc1  # module has attributes

You can import specific names with the `--pick` flag:

    use lib/myargs.ysh --pick p2 p3

    p2
    p3
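Python modules work similarly: each one is a file loaded into an object. The sketch below loads a hypothetical `myargs.py` from a temporary directory. It then reads one attribute and picks two names, mirroring `var alias = myargs.proc1` and `--pick p2 p3`. Python has no equivalent of invoking the module object itself, as `myargs proc1 ...` does via `__invoke__`.

```python
import importlib.util
import pathlib
import tempfile

# Hypothetical module source, standing in for lib/myargs.ysh
SOURCE = '''
def proc1(flag):
    return f"proc1 --flag {flag}"

def p2():
    return "p2"

def p3():
    return "p3"
'''

with tempfile.TemporaryDirectory() as d:
    path = pathlib.Path(d, "myargs.py")
    path.write_text(SOURCE)
    spec = importlib.util.spec_from_file_location("myargs", path)
    myargs = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(myargs)  # run the file; top-level names become attributes

alias = myargs.proc1           # like: var alias = myargs.proc1
p2, p3 = myargs.p2, myargs.p3  # like: --pick p2 p3
print(alias("val"), p2(), p3())
```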
1396
1397	- [Feature Index: Modules](ref/feature-index.html#Modules)
1398
1399	## The Runtime Shared by OSH and YSH
1400
1401	Although we describe OSH and YSH as different languages, they use the same
1402	interpreter under the hood.
1403
1404	This interpreter has many `shopt` booleans to control behavior, like `shopt
1405	--set parse_paren`. The group `shopt --set ysh:all` flips all booleans to make
1406	`bin/osh` behave like `bin/ysh`.
1407
1408	Understanding this common runtime, and its interface to the Unix kernel, will
1409	help you understand both languages!
1410
1411	### Interpreter Data Model
1412
1413	The [Interpreter State](interpreter-state.html) doc is under construction. It
1414	will cover:
1415
- The call stack for OSH and YSH
  - Each stack frame is a `{name -> cell}` mapping.
  - Each cell has a value, with boolean flags
    - OSH has types `Str BashArray BashAssoc`, and flags `readonly export
      nameref`.
    - YSH has types `Bool Int Float Str List Dict Obj ...`, and the `readonly`
      flag.
- YSH namespaces
  - Modules with `use`
  - Builtin functions and commands
  - ENV
- Shell options
  - Boolean options with `shopt`: `parse_paren`, `simple_word_eval`, etc.
  - String options with `shvar`: `IFS`, `PATH`
- Registers that store interpreter state
  - `$?` and `_error`
  - `$!` for the last PID
  - `_this_dir`
  - `_reply`
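The shape of this state can be sketched as a Python data structure: a list of frames, each mapping names to cells that carry a value plus flags. The names (`Cell`, `CallStack`, `set_var`) are hypothetical. The sketch models only the structure and the `readonly` check. It says nothing about how Oils looks up names across frames.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Cell:
    """A value plus the flags the interpreter tracks for it."""
    value: Any
    readonly: bool = False
    exported: bool = False  # OSH `export` flag
    nameref: bool = False   # OSH `nameref` flag


class CallStack:
    def __init__(self) -> None:
        # frames[0] holds globals; each proc call would push a new frame
        self.frames: list[dict[str, Cell]] = [{}]

    def push(self) -> None:
        self.frames.append({})

    def pop(self) -> None:
        self.frames.pop()

    def set_var(self, name: str, value: Any, readonly: bool = False) -> None:
        """Bind name in the top frame, refusing to overwrite a readonly cell."""
        frame = self.frames[-1]
        cell = frame.get(name)
        if cell is not None and cell.readonly:
            raise RuntimeError(f"{name!r} is readonly")
        frame[name] = Cell(value, readonly=readonly)


stack = CallStack()
stack.set_var("DEST", "/tmp/ysh-tour", readonly=True)  # like `const`
stack.push()
stack.set_var("files", ["foo.py", "bar.py"])  # a local in a new frame
```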
1435
1436	### Process Model (the kernel)
1437
1438	The [Process Model](process-model.html) doc is under construction. It will cover:
1439
- Simple Commands, `exec`
- Pipelines.  #[shell-the-good-parts](#blog-tag)
- `fork`, `forkwait`
- Command and process substitution
- Related:
  - [Tracing execution in Oils](xtrace.html) (xtrace), which divides
    process-based concurrency into synchronous and async constructs.
  - [Three Comics For Understanding Unix
    Shell](http://www.oilshell.org/blog/2020/04/comics.html) (blog)
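Here is a concrete picture of a pipeline at the process level. This Python sketch does roughly what a shell does for `echo bean | tr a-z A-Z`. It starts two processes and connects the first one's stdout to the second one's stdin with a kernel pipe. It follows the pipeline pattern from Python's `subprocess` documentation, and assumes `echo` and `tr` are on `PATH`.

```python
import subprocess

# Start both processes; a kernel pipe connects p1's stdout to p2's stdin.
p1 = subprocess.Popen(["echo", "bean"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["tr", "a-z", "A-Z"], stdin=p1.stdout, stdout=subprocess.PIPE)

# Close our copy of the read end, so p1 gets SIGPIPE if p2 exits early.
p1.stdout.close()

out, _ = p2.communicate()  # read p2's output and wait for it to exit
p1.wait()                  # a shell waits for every process in the pipeline
print(out.decode())        # BEAN
```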
1449
1450	<!--
1451	Process model additions: Capers, Headless shell
1452
1453	some optimizations: See YSH starts fewer processes than other shells.
1454	-->
1455
1456	### Advanced: Reflecting on the Interpreter
1457
1458	You can reflect on the interpreter with APIs like `io->eval()` and
1459	`vm.getFrame()`.
1460
1461	- [Feature Index: Reflection](ref/feature-index.html#Reflection)
1462
1463	This allows YSH to be a language for creating other languages. (Ruby, Tcl, and
1464	Racket also have this flavor.)
1465
1466	<!--
1467
1468	TODO: Hay and Awk examples
1469	-->
1470
1471	## Summary
1472
1473	What have we described in this tour?
1474
1475	YSH is a programming language that evolved from Unix shell. But you can
1476	"forget" the bad parts of shell like `[ $x -lt $y ]`.
1477
1478	<!--
1479	Instead, we've shown you shell-like commands, Python-like expressions on typed
1480	data, and Ruby-like command blocks.
1481	-->
1482
1483	Instead, focus on these central concepts:
1484
1485	1. Interleaved word, command, and expression languages.
1486	2. A standard library of builtin commands, as well as builtin functions
1487	3. Languages for data: J8 Notation, including JSON8 and TSV8
1488	4. A runtime shared by OSH and YSH
1489
1490	## Appendix
1491
1492	### Related Docs
1493
1494	- [YSH vs. Shell Idioms](idioms.html) - YSH side-by-side with shell.
1495	- [YSH Language Influences](language-influences.html) - In addition to shell,
1496	Python, and JavaScript, YSH is influenced by Ruby, Perl, Awk, PHP, and more.
1497	- [A Feel For YSH Syntax](syntax-feelings.html) - Some thoughts that may help
1498	you remember the syntax.
1499	- [YSH Language Warts](warts.html) documents syntax that may be surprising.
1500
1501
1502	### YSH Script Template
1503
1504	YSH can be used to write simple "shell scripts" or longer programs. It has
1505	procs and modules to help with the latter.
1506
1507	A module is just a file, like this:
1508
1509	```
#!/usr/bin/env ysh
### Deploy script

use $_this_dir/lib/util.ysh --pick log

const DEST = '/tmp/ysh-tour'

proc my-sync(...files) {
  ### Sync files and show which ones

  cp --verbose @files $DEST
}

proc main {
  mkdir -p $DEST

  touch {foo,bar}.py {build,test}.sh

  log "Copying source files"
  my-sync *.py *.sh

  if test --dir /tmp/logs {
    cd /tmp/logs

    log "Copying logs"
    my-sync *.log
  }
}

if is-main {  # The only top-level statement
  main @ARGV
}
1542	```
1543
1544	<!--
1545	TODO:
1546	- Also show flags parsing?
1547	- Show longer examples where it isn't boilerplate
1548	-->
1549
1550	You wouldn't bother with the boilerplate for something this small. But this
1551	example illustrates the basic idea: the top level often contains these words:
1552	`use`, `const`, `proc`, and `func`.
1553
1554
1555	<!--
1556	TODO: not mentioning __provide__, since it should be optional in the most basic usage?
1557	-->
1558
1559	### YSH Features Not Shown
1560
1561	#### Advanced
1562
1563	These shell features are part of YSH, but aren't shown above:
1564
1565	- The `fork` and `forkwait` builtins, for concurrent execution and subshells.
1566	- Process Substitution: `diff <(sort left.txt) <(sort right.txt)`
1567
1568	#### Deprecated Shell Constructs
1569
1570	The shared interpreter supports many shell constructs that are deprecated:
1571
- YSH code uses shell's `||` and `&&` in limited circumstances, since `errexit`
  is on by default.
1574	- Assignment builtins like `local` and `declare`. Use YSH keywords.
1575	- Boolean expressions like `[[ x =~ $pat ]]`. Use YSH expressions.
1576	- Shell arithmetic like `$(( x + 1 ))` and `(( y = x ))`. Use YSH expressions.
1577	- The `until` loop can always be replaced with a `while` loop
1578	- Most of what's in `${}` can be written in other ways. For example
1579	`${s#/tmp}` could be `s => removePrefix('/tmp')` (TODO).
1580
1581	#### Not Yet Implemented
1582
1583	This document mentions a few constructs that aren't yet implemented. Here's a
1584	summary:
1585
1586	```none
1587	# Unimplemented syntax:
1588
echo ${x|html}      # formatters
1590
1591	echo ${x %.2f} # statically-parsed printf
1592
1593	var x = "<p>$x</p>"html
1594	echo "<p>$x</p>"html # tagged string
1595
1596	var x = 15 Mi # units suffix
1597	```
1598
1599	<!--
1600	- To implement: Capers: stateless coprocesses
1601	-->
1602