doc/proc-func.md

1	---
2	default_highlighter: oils-sh
3	---
4
5	Guide to Procs and Funcs
6	========================
7
8	YSH has two major units of code: shell-like `proc`, and Python-like `func`.
9
10	- Roughly speaking, procs are for commands and I/O, while funcs are for
11	pure computation.
12	- Procs are often big, and may call small funcs. On the other hand,
13	it's possible, but rarer, for funcs to call procs.
14	- You can write shell scripts mostly with procs, and perhaps a few funcs.
15
16	This doc compares the two mechanisms, and gives rough guidelines.
17
18	<!--
19	See the blog for more conceptual background: [Oils is
20	Exterior-First](https://www.oilshell.org/blog/2023/06/ysh-design.html).
21	-->
22
23	<div id="toc">
24	</div>
25
26	## Tip: Start Simple
27
28	Before going into detail, here's a quick reminder that you don't have to use
29	either procs or funcs. YSH is a language that scales both down and up.
30
31	You can start with just a list of plain commands:
32
33	mkdir -p /tmp/dest
34	cp --verbose *.txt /tmp/dest
35
36	Then copy those into procs as the script gets bigger:
37
38	proc build-app {
39	ninja --verbose
40	}
41
42	proc deploy {
43	mkdir -p /tmp/dest
44	cp --verbose *.txt /tmp/dest
45	}
46
47	build-app
48	deploy
49
50	Then add funcs if you need pure computation:
51
52	func isTestFile(name) {
53	return (name => endsWith('_test.py'))
54	}
55
56	if (isTestFile('my_test.py')) {
57	echo 'yes'
58	}
59
60	## At a Glance
61
62	### Procs vs. Funcs
63
64	This table summarizes the difference between procs and funcs. The rest of the
65	doc will elaborate on these issues.
66
67	<style>
68	thead {
69	background-color: #eee;
70	font-weight: bold;
71	}
72	table {
73	font-family: sans-serif;
74	border-collapse: collapse;
75	}
76
77	tr {
78	border-bottom: solid 1px;
79	border-color: #ddd;
80	}
81
82	td {
83	padding: 8px; /* override default of 5px */
84	}
85	</style>
86
87
88	<table>
89
90	- thead
91	- <!-- empty -->
92	- Proc
93	- Func
94	- tr
95	- Design Influence
96	- Shell-like.
97	- Python- and JavaScript-like, but pure.
98	- tr
99	- Shape
100	- Procs are shaped like Unix processes: with `argv`, an integer return code,
101	and `stdin` / `stdout` streams.
102
103	They're a generalization of Bourne shell "functions".
104	- Funcs are shaped like mathematical functions.
105	- tr
106	- Architectural Role ([Oils is Exterior First](https://www.oilshell.org/blog/2023/06/ysh-design.html))
107	- Exterior: processes and files.
108	- Interior: functions and garbage-collected data structures.
109	- tr
110	- I/O
111	- Procs may start external processes and pipelines. Can perform I/O
112	anywhere.
113	- Funcs need an explicit `io` param to perform I/O.
114	- tr
115	- Example Definition
116	- ```
117	proc print-max (; x, y) {
118	echo $[x if x > y else y]
119	}
120	```
121	- ```
122	func computeMax(x, y) {
123	return (x if x > y else y)
124	}
125	```
126	- tr
127	- Example Call
128	- ```
129	print-max (3, 4)
130	```
131
132	Procs can be put in pipelines:
133
134	```
135	print-max (3, 4) | tee out.txt
136	```
137	- ```
138	var m = computeMax(3, 4)
139	```
140
141	Or throw away the return value, which is useful for functions that mutate:
142
143	```
144	call computeMax(3, 4)
145	```
146	- tr
147	- Naming Convention
148	- `kebab-case`
149	- `camelCase`
150	- tr
151	- [Syntax Mode](command-vs-expression-mode.html) of call site
152	- Command Mode
153	- Expression Mode
154	- tr
155	- Kinds of Parameters / Arguments
156	- <!-- empty -->
157	1. Word aka string
158	1. Typed and Positional
159	1. Typed and Named
160	1. Block
161
162	Examples shown below.
163	- <!-- empty -->
164	1. Positional
165	1. Named
166
167	(both typed)
168	- tr
169	- Return Value
170	- Integer status 0-255
171	- Any type of value, e.g.
172
173	```
174	return ([42, {name: 'bob'}])
175	```
176	- tr
177	- Can it be a method on an object?
178	- No
179	- Yes, funcs may be bound to objects:
180
181	```
182	var x = obj.myMethod()
183	call obj->myMutatingMethod()
184	```
185	- tr
186	- Interface Evolution
187	- Slower: Procs exposed to the outside world may need to evolve in a compatible or "versionless" way.
188	- Faster: Funcs may be refactored internally.
189	- tr
190	- Parallelism?
191	- Procs can be parallel with:
192	- shell constructs: pipelines, `&` aka `fork`
193	- external tools and the [$0 Dispatch
194	Pattern](https://www.oilshell.org/blog/2021/08/xargs.html): xargs, make,
195	Ninja, etc.
196	- Funcs are inherently serial, unless wrapped in a proc.
197	- tr
198	- More `proc` Features ...
199	<cell-attrs colspan=3 style="text-align: center; padding: 3em" />
200	- tr
201	- Kinds of Signature
202	- Open `proc p {` or <br/>
203	Closed `proc p () {`
204	- <!-- dash --> -
205	- tr
206	- Lazy Args
207	- ```
208	assert [42 === x]
209	```
210	- <!-- dash --> -
211
212	</table>
213
214	### Func Calls and Defs
215
216	Now that we've compared procs and funcs, let's look more closely at funcs.
217	They're inherently simpler: they have 2 types of args and params, rather
218	than 4.
219
220	YSH argument binding is based on Julia, which has all the power of Python, but
221	without the "evolved warts" (e.g. `/` and `*`).
222
223	In general, with all the bells and whistles, func definitions look like:
224
225	# pos args and named args separated with ;
226	func f(p1, p2, ...rest_pos; n1=42, n2='foo', ...rest_named) {
227	return (len(rest_pos) + len(rest_named))
228	}
229
230	Func calls look like:
231
232	# spread operator ... at call site
233	var pos_args = [3, 4]
234	var named_args = {foo: 'bar'}
235	var x = f(1, 2, ...pos_args; n1=43, ...named_args)
236
237	Note that positional args/params and named args/params can be thought of as two
238	"separate worlds".
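
As a worked trace of the general call above, the sketch below repeats the definition of `f` and annotates how each argument binds. The binding comments are our reading of the signature, and the final `echo` is added purely for illustration.

```ysh
# Same definition as above: positional params before `;`, named params after.
func f(p1, p2, ...rest_pos; n1=42, n2='foo', ...rest_named) {
  return (len(rest_pos) + len(rest_named))
}

var pos_args = [3, 4]
var named_args = {foo: 'bar'}

# Positional world:  p1 = 1, p2 = 2, rest_pos = [3, 4]
# Named world:       n1 = 43, n2 = 'foo' (default), rest_named = {foo: 'bar'}
var x = f(1, 2, ...pos_args; n1=43, ...named_args)

echo $x  # len([3, 4]) + len({foo: 'bar'}) = 2 + 1 = 3
```

Nothing crosses the `;`: a spread list only fills positional params, and a spread dict only fills named params.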
239
240	This table shows simpler, more common cases.
241
242
243	<table>
244	<thead>
245	<tr>
246	<td>Args / Params</td>
247	<td>Call Site</td>
248	<td>Definition</td>
249	</tr>
250	</thead>
251
252	<tr>
253	<td>Positional Args</td>
254	<td>
255
256	var x = myMax(3, 4)
257
258	</td>
259	<td>
260
261	func myMax(x, y) {
262	return (x if x > y else y)
263	}
264
265	</td>
266	</tr>
267
268	<tr>
269	<td>Spread Pos Args</td>
270	<td>
271
272	var args = [3, 4]
273	var x = myMax(...args)
274
275	</td>
276	<td>
277
278	(as above)
279
280	</td>
281	</tr>
282
283	<tr>
284	<td>Rest Pos Params</td>
285	<td>
286
287	var x = myPrintf("%s is %d", 'bob', 30)
288
289	</td>
290	<td>
291
292	func myPrintf(fmt, ...args) {
293	# ...
294	}
295
296	</td>
297	</tr>
298
299	<tr>
300	<td colspan=3 style="text-align: center; padding: 3em">...</td>
301	</tr>
302
303	<tr>
304	<td>Named Args</td>
305	<td>
306
307	var x = mySum(3, 4, start=5)
308
309	</td>
310	<td>
311
312	func mySum(x, y; start=0) {
313	return (x + y + start)
314	}
315
316	</td>
317	</tr>
318
319	<tr>
320	<td>Spread Named Args</td>
321	<td>
322
323	var opts = {start: 5}
324	var x = mySum(3, 4; ...opts)
325
326	</td>
327	<td>
328
329	(as above)
330
331	</td>
332	</tr>
333
334	<tr>
335	<td>Rest Named Params</td>
336	<td>
337
338	var x = f(start=5, end=7)
339
340	</td>
341	<td>
342
343	func f(; ...opts) {
344	if ('start' not in opts) {
345	setvar opts.start = 0
346	}
347	# ...
348	}
349
350	</td>
351	</tr>
352
353	</table>
354
355	### Proc Calls and Defs
356
357	Like funcs, procs have 2 kinds of typed args/params: positional and named.
358
359	But they may also have string aka word args/params, and a block
360	arg/param.
361
362	In general, a proc signature has 4 sections, like this:
363
364	proc p (
365	w1, w2, ...rest_word; # word params
366	p1, p2, ...rest_pos; # pos params
367	n1, n2, ...rest_named; # named params
368	block # block param
369	) {
370	echo 'body'
371	}
372
373	In general, a proc call looks like this:
374
375	var pos_args = [3, 4]
376	var named_args = {foo: 'bar'}
377
378	p /bin /tmp (1, 2, ...pos_args; n1=43, ...named_args) {
379	echo 'block'
380	}
381
382	The block can also be passed as an expression after a second semicolon:
383
384	p /bin /tmp (1, 2, ...pos_args; n1=43, ...named_args; block)
385
386	<!--
387	- Block is really last positional arg: `cd /tmp { echo $PWD }`
388	-->
389
390	Some simpler examples:
391
392	<table>
393	<thead>
394	<tr>
395	<td>Args / Params</td>
396	<td>Call Site</td>
397	<td>Definition</td>
398	</tr>
399	</thead>
400
401	<tr>
402	<td>Word args</td>
403	<td>
404
405	my-cd /tmp
406
407	</td>
408	<td>
409
410	proc my-cd (dest) {
411	cd $dest
412	}
413
414	</td>
415	</tr>
416
417	<tr>
418	<td>Rest Word Params</td>
419	<td>
420
421	my-cd -L /tmp
422
423	</td>
424	<td>
425
426	proc my-cd (...flags) {
427	cd @flags
428	}
429
430	</td>
431	</tr>
432
433	<tr>
434	<td>Spread Word Args</td>
435	<td>
436
437	var flags = :| -L /tmp |
438	my-cd @flags
439
440	</td>
441	<td>
442
443	(as above)
444
445	</td>
446	</tr>
447
448	<tr>
449	<td colspan=3 style="text-align: center; padding: 3em">...</td>
450	</tr>
451
452	<tr>
453	<td>Typed Pos Arg</td>
454	<td>
455
456	print-max (3, 4)
457
458	</td>
459	<td>
460
461	proc print-max ( ; x, y) {
462	echo $[x if x > y else y]
463	}
464
465	</td>
466	</tr>
467
468	<tr>
469	<td>Typed Named Arg</td>
470	<td>
471
472	print-max (3, 4, start=5)
473
474	</td>
475	<td>
476
477	proc print-max ( ; x, y; start=0) {
478	# ...
479	}
480
481	</td>
482	</tr>
483
484	<tr>
485	<td colspan=3 style="text-align: center; padding: 3em">...</td>
486	</tr>
487
488
489
490	<tr>
491	<td>Block Argument</td>
492	<td>
493
494	my-cd /tmp {
495	echo $PWD
496	echo hi
497	}
498
499	</td>
500	<td>
501
502	proc my-cd (dest; ; ; block) {
503	cd $dest (; ; block)
504	}
505
506	</td>
507	</tr>
508
509	<tr>
510	<td>All Four Kinds</td>
511	<td>
512
513	p 'word' (42, verbose=true) {
514	echo $PWD
515	echo hi
516	}
517
518	</td>
519	<td>
520
521	proc p (w; myint; verbose=false; block) {
522	= w
523	= myint
524	= verbose
525	= block
526	}
527
528	</td>
529	</tr>
530
531	</table>
532
533	## Common Features
534
535	Let's recap the common features of procs and funcs.
536
537	### Spread Args, Rest Params
538
539	- Spread arg list `...` at call site
540	- Rest params `...` at definition
541
542	### The `error` builtin raises exceptions
543
544	The `error` builtin is idiomatic in both funcs and procs:
545
546	func f(x) {
547	if (x <= 0) {
548	error 'Should be positive' (status=99)
549	}
550	}
551
552	Tip: reserve such errors for exceptional situations. For example, an input
553	string being invalid may not be uncommon, while a disk full I/O error is more
554	exceptional.
555
556	(The `error` builtin is implemented with C++ exceptions, which are slow in the
557	error case.)
558
559	### Out Params: `&myvar` is of type `value.Place`
560
561	Out params are more common in procs, because they don't have a typed return
562	value.
563
564	proc p ( ; out) {
565	call out->setValue(42)
566	}
567	var x
568	p (&x)
569	echo "x set to $x" # => x set to 42
570
571	But they can also be used in funcs:
572
573	func f (out) {
574	call out->setValue(42)
575	}
576	var x
577	call f(&x)
578	echo "x set to $x" # => x set to 42
579
580	Observation: procs can do everything funcs can. But you may want the purity
581	and familiar syntax of a `func`.
582
583	---
584
585	Design note: out params are a nicer way of doing what bash does with `declare
586	-n` aka `nameref` variables. They don't rely on [dynamic
587	scope]($xref:dynamic-scope).
588
589	## Proc-Only Features
590
591	Procs have some features that funcs don't have.
592
593	### Lazy Arg Lists `where [x > 10]`
594
595	A lazy arg list is implemented with `shopt --set parse_bracket`, and is syntax
596	sugar for an unevaluated `value.Expr`.
597
598	Longhand:
599
600	var my_expr = ^[42 === x] # value of type Expr
601	assert (my_expr)
602
603	Shorthand:
604
605	assert [42 === x] # equivalent to the above
606
607	### Open Proc Signatures bind `argv`
608
609	TODO: Implement new `ARGV` semantics.
610
611	When a proc signature omits `()`, it's called "open" because the caller can
612	pass "extra" arguments:
613
614	proc my-open {
615	write 'args are' @ARGV
616	}
617	# All valid:
618	my-open
619	my-open 1
620	my-open 1 2
621
622	Stricter closed procs:
623
624	proc my-closed (x) {
625	write 'arg is' $x
626	}
627	my-closed # runtime error: missing argument
628	my-closed 1 # valid
629	my-closed 1 2 # runtime error: too many arguments
630
631
632	An "open" proc is nearly identical to a shell function:
633
634	shfunc() {
635	write 'args are' @ARGV
636	}
637
638	## Methods are Funcs Bound to Objects
639
640	Values of type `Obj` have an ordered set of name-value bindings, as well as a
641	prototype chain of more `Obj` instances ("parents"). They support these
642	operators:
643
644	- dot (`.`) looks for attributes or methods with a given name.
645	- Reference: [ysh-attr](ref/chap-expr-lang.html#ysh-attr)
646	- Attributes may be in the object, or up the chain. They are returned
647	literally.
648	- Methods live up the chain. They are returned as `BoundFunc`, so that the
649	first `self` argument of a method call is the object itself.
650	- Thin arrow (`->`) looks for mutating methods, which have an `M/` prefix.
651	- Reference: [thin-arrow](ref/chap-expr-lang.html#thin-arrow)
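
A minimal sketch of a method call, assuming the `Object(prototype, properties)` constructor that this doc uses in the next section. The names `Rect_area`, `rect_methods`, and `r` are hypothetical.

```ysh
# A plain func whose first param is `self`
func Rect_area(self) {
  return (self.w * self.h)
}

# The method lives in the prototype ("parent") ...
var rect_methods = Object(null, {area: Rect_area})

# ... and the attributes live in the object itself.
var r = Object(rect_methods, {w: 3, h: 4})

# Dot finds `area` up the chain and binds `r` as `self`
echo $[r.area()]
```

Keeping methods in a shared prototype means that many objects can reuse the same funcs while carrying only their own attributes.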
652
653	## The `__invoke__` method makes an Object "Proc-like"
654
655	First, define a proc, with the first typed arg named `self`:
656
657	proc myInvoke (word_param; self, int_param) {
658	echo "sum = $[self.x + self.y + int_param]"
659	}
660
661	Make it the `__invoke__` method of an `Obj`:
662
663	var methods = Object(null, {__invoke__: myInvoke})
664	var invokable_obj = Object(methods, {x: 1, y: 2})
665
666	Then invoke it like a proc:
667
668	invokable_obj myword (3)
669	# => sum = 6
670
671	## Usage Notes
672
673	### 3 Ways to Return a Value
674
675	Let's review the recommended ways to "return" a value:
676
677	1. `return (x)` in a `func`.
678	- The parentheses are required because expressions like `(x + 1)` should
679	look different than words.
680	1. Pass a `value.Place` instance to a proc or func.
681	- That is, out param `&out`.
682	1. Print to stdout in a `proc`
683	- Capture it with command sub: `$(myproc)`
684	- Or with `read`: `myproc | read --all; echo $_reply`
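
The three approaches can be sketched side by side. The names `double`, `double-out`, and `double-print` are hypothetical, and the snippet only uses constructs shown earlier in this doc.

```ysh
# 1. Typed return value from a func
func double(x) {
  return (x * 2)
}

# 2. Out param: the caller passes a value.Place with &
proc double-out (; x, out) {
  call out->setValue(x * 2)
}

# 3. Print to stdout, and let the caller capture it
proc double-print (; x) {
  echo $[x * 2]
}

var a = double(21)

var b
double-out (21, &b)

var c = $(double-print (21))  # command sub yields a string, not an Int

echo "$a $b $c"
```

The first two preserve the type of the result. The third serializes it to text, which is what lets the proc be used in pipelines.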
685
686	Obsolete ways of "returning":
687
688	1. Using `declare -n` aka `nameref` variables in bash.
689	1. Relying on [dynamic scope]($xref:dynamic-scope) in POSIX shell.
690
691	### Procs Compose in Pipelines / "Bernstein Chaining"
692
693	Some YSH users may tend toward funcs because they're more familiar. But shell
694	composition with procs is very powerful!
695
696	They have at least two kinds of composition that funcs don't have.
697
698	See #[shell-the-good-parts]($blog-tag):
699
700	1. [Shell Has a Forth-Like
701	Quality](https://www.oilshell.org/blog/2017/01/13.html) - Bernstein
702	chaining.
703	1. [Pipelines Support Vectorized, Point-Free, and Imperative
704	Style](https://www.oilshell.org/blog/2017/01/15.html) - the shell can
705	transparently run procs as elements of pipelines.
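
Both kinds of composition can be sketched with two small open procs. The names `shout` and `twice` are hypothetical, and the sketch assumes that `@ARGV` may be used in command position to run the remaining words as a command.

```ysh
# A proc is shaped like a process, so it can be a pipeline element
proc shout {
  tr a-z A-Z  # reads stdin, writes stdout
}

# Bernstein chaining: a proc that treats its arguments as another command
proc twice {
  @ARGV
  @ARGV
}

echo hello | shout    # prints HELLO
twice echo hi         # runs `echo hi` two times
twice twice echo hi   # chains compose: runs `echo hi` four times
```

Neither proc declares typed params; the composition relies only on `argv` and the standard streams, which is what funcs lack.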
706
707	<!--
708
709	In summary:
710
711	* func signatures look like JavaScript, Julia, and Go.
712	* named and positional are separated with `;` in the signature.
713	* The prefix `...` "spread" operator takes the place of Python's `*args` and `**kwargs`.
714	* There are optional type annotations
715	* procs are like shell functions
716	* but they also allow you to name parameters, and throw errors if the arity
717	is wrong.
718	* and they take blocks.
719
720	-->
721
722	## Summary
723
724	YSH is influenced by both shell and Python, so it has both procs and funcs.
725
726	Many programmers will gravitate towards funcs because they're familiar, but
727	procs are more powerful and shell-like.
728
729	Make your YSH programs more powerful by learning to use procs!
730
731	## Appendix
732
733	### Implementation Details
734
735	Procs and funcs both have these concerns:
736
737	1. Evaluation of default args at definition time.
738	1. Evaluation of actual args at the call site.
739	1. Arg-Param binding for builtin functions, e.g. with `typed_args.Reader`.
740	1. Arg-Param binding for user-defined functions.
741
742	So the implementation can be thought of as a 2 × 4 matrix, with some
743	code shared. This code is mostly in [ysh/func_proc.py]($oils-src).
744
745	### Related
746
747	- [Variable Declaration, Mutation, and Scope](variables.html) - in particular,
748	procs don't have [dynamic scope]($xref:dynamic-scope).
749	- [Block Literals](block-literals.html) (in progress)
750
751	<!--
752	TODO: any reference topics?
753	-->
754
755	<!--
756	OK we're getting close here -- #language-design>Unifying Proc and Func Params
757
758	I think we need to write a quick guide first, not a reference
759
760
761	It might have some tables
762
763	It might mention concrete use cases like the flag parser -- #oil-dev>Progress on argparse
764
765
766	### Diff-based explanation
767
768	- why not Python -- because of `/` and `*` special cases
769	- Julia influence
770	- lazy args for procs `where` filters and `awk`
771	- out Ref parameters are for "returning" without printing to stdout
772
773	#language-design>N ways to "return" a value
774
775
776	- What does shell have?
777	- it has blocks, e.g. with redirects
778	- it has functions without params -- only named params
779
780
781	- Ruby influence -- rich DSLs
782
783
784	So I think you can say we're a mix of
785
786	- shell
787	- Python
788	- Julia (mostly subsumes Python?)
789	- Ruby
790
791
792	### Implementation-based explanation
793
794	- ASDL schemas -- #oil-dev>Good Proc/Func refactoring
795
796
797	### Big Idea: procs are for I/O, funcs are for computation
798
799	We may want to go full in on this idea with #language-design>func evaluator without redirects and $?
800
801
802	### Very Basic Advice, Up Front
803
804
805	Done with #language-design>value.Place, & operator, read builtin
806
807	Place works with both func and proc
808
809
810	### Bump
811
812	I think this might go in the backlog - #blog-ideas
813
814
815	#language-design>Simplify proc param passing?
816
817	-->
818
819	<!-- vim sw=2 -->