---
default_highlighter: oils-sh
---

Guide to Procs and Funcs
========================

YSH has two major units of code: shell-like `proc`, and Python-like `func`.

- Roughly speaking, procs are for commands and I/O, while funcs are for
  pure computation.
- Procs are often big, and may call small funcs. On the other hand,
  it's possible, but rarer, for funcs to call procs.
- You can write shell scripts mostly with procs, and perhaps a few funcs.

This doc compares the two mechanisms, and gives rough guidelines.

<!--
See the blog for more conceptual background: [Oils is
Exterior-First](https://www.oilshell.org/blog/2023/06/ysh-design.html).
-->

<div id="toc">
</div>

## Tip: Start Simple

Before going into detail, here's a quick reminder that you don't have to use
either procs or funcs. YSH is a language that scales both down and up.

You can start with just a list of plain commands:

    mkdir -p /tmp/dest
    cp --verbose *.txt /tmp/dest

Then copy those into procs as the script gets bigger:

    proc build-app {
      ninja --verbose
    }

    proc deploy {
      mkdir -p /tmp/dest
      cp --verbose *.txt /tmp/dest
    }

    build-app
    deploy

Then add funcs if you need pure computation:

    func isTestFile(name) {
      return (name => endsWith('_test.py'))
    }

    if (isTestFile('my_test.py')) {
      echo 'yes'
    }
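
Funcs and procs mix freely. The sketch below uses a hypothetical `count-tests` proc. It takes file names as word args and calls a func like `isTestFile` on each one. The block repeats the func definition so that it runs on its own, and the file names are made up for illustration.

```ysh
func isTestFile(name) {
  return (name => endsWith('_test.py'))
}

# Hypothetical proc: counts the word args that look like test files
proc count-tests (...names) {
  var n = 0
  for name in (names) {
    if (isTestFile(name)) {   # a proc calling a small func
      setvar n += 1
    }
  }
  echo "$n test files"
}

count-tests my_test.py util.py other_test.py   # => 2 test files
```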

## At a Glance

### Procs vs. Funcs

This table summarizes the difference between procs and funcs. The rest of the
doc will elaborate on these issues.

<style>
  thead {
    background-color: #eee;
    font-weight: bold;
  }
  table {
    font-family: sans-serif;
    border-collapse: collapse;
  }

  tr {
    border-bottom: solid 1px;
    border-color: #ddd;
  }

  td {
    padding: 8px;  /* override default of 5px */
  }
</style>


<table>

- thead
  - <!-- empty -->
  - Proc
  - Func
- tr
  - Design Influence
  - Shell-like.
  - Python- and JavaScript-like, but pure.
- tr
  - Shape
  - Procs are shaped like Unix processes: with `argv`, an integer return code,
    and `stdin` / `stdout` streams.

    They're a generalization of Bourne shell "functions".
  - Funcs are shaped like mathematical functions.
- tr
  - Architectural Role ([Oils is Exterior First](https://www.oilshell.org/blog/2023/06/ysh-design.html))
  - Exterior: processes and files.
  - Interior: functions and garbage-collected data structures.
- tr
  - I/O
  - Procs may start external processes and pipelines. They can perform I/O
    anywhere.
  - Funcs need an explicit `io` param to perform I/O.
- tr
  - Example Definition
  - ```
    proc print-max (; x, y) {
      echo $[x if x > y else y]
    }
    ```
  - ```
    func computeMax(x, y) {
      return (x if x > y else y)
    }
    ```
- tr
  - Example Call
  - ```
    print-max (3, 4)
    ```

    Procs can be put in pipelines:

    ```
    print-max (3, 4) | tee out.txt
    ```
  - ```
    var m = computeMax(3, 4)
    ```

    Or throw away the return value, which is useful for functions that mutate:

    ```
    call computeMax(3, 4)
    ```
- tr
  - Naming Convention
  - `kebab-case`
  - `camelCase`
- tr
  - [Syntax Mode](command-vs-expression-mode.html) of call site
  - Command Mode
  - Expression Mode
- tr
  - Kinds of Parameters / Arguments
  - <!-- empty -->
    1. Word aka string
    1. Typed and Positional
    1. Typed and Named
    1. Block

    Examples shown below.
  - <!-- empty -->
    1. Positional
    1. Named

    (both typed)
- tr
  - Return Value
  - Integer status 0-255
  - Any type of value, e.g.

    ```
    return ([42, {name: 'bob'}])
    ```
- tr
  - Can it be a method on an object?
  - No
  - Yes, funcs may be bound to objects:

    ```
    var x = obj.myMethod()
    call obj->myMutatingMethod()
    ```
- tr
  - Interface Evolution
  - Slower: Procs exposed to the outside world may need to evolve in a compatible or "versionless" way.
  - Faster: Funcs may be refactored internally.
- tr
  - Parallelism?
  - Procs can be parallel with:
    - shell constructs: pipelines, `&` aka `fork`
    - external tools and the [$0 Dispatch
      Pattern](https://www.oilshell.org/blog/2021/08/xargs.html): xargs, make,
      Ninja, etc.
  - Funcs are inherently serial, unless wrapped in a proc.
- tr
  - More `proc` Features ...
    <cell-attrs colspan=3 style="text-align: center; padding: 3em" />
- tr
  - Kinds of Signature
  - Open `proc p {` or <br/>
    Closed `proc p () {`
  - <!-- dash --> -
- tr
  - Lazy Args
  - ```
    assert [42 === x]
    ```
  - <!-- dash --> -

</table>

### Func Calls and Defs

Now that we've compared procs and funcs, let's look more closely at funcs.
They're inherently simpler: they have 2 types of args and params, rather
than 4.

YSH argument binding is based on Julia, which has all the power of Python, but
without the "evolved warts" (e.g. `/` and `*`).

In general, with all the bells and whistles, func definitions look like:

    # pos args and named args separated with ;
    func f(p1, p2, ...rest_pos; n1=42, n2='foo', ...rest_named) {
      return (len(rest_pos) + len(rest_named))
    }

Func calls look like:

    # spread operator ... at call site
    var pos_args = [3, 4]
    var named_args = {foo: 'bar'}
    var x = f(1, 2, ...pos_args; n1=43, ...named_args)

Note that positional args/params and named args/params can be thought of as two
"separate worlds".
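
To make the two worlds concrete, here's a sketch that combines the definition and call above into one runnable program. The func `bindings` is hypothetical. It has the same signature as `f`, but returns a Dict that shows where each argument ended up, instead of a count.

```ysh
func bindings(p1, p2, ...rest_pos; n1=42, n2='foo', ...rest_named) {
  # Return every param so the caller can inspect the binding
  return ({
    p1: p1, p2: p2, rest_pos: rest_pos,
    n1: n1, n2: n2, rest_named: rest_named
  })
}

var pos_args = [3, 4]
var named_args = {foo: 'bar'}
var b = bindings(1, 2, ...pos_args; n1=43, ...named_args)

# Positional world: p1=1, p2=2, rest_pos=[3, 4]
# Named world: n1=43 (overrides the default), n2='foo' (the default),
#              rest_named={foo: 'bar'}
= b
```

For the original `f`, that binding gives `len(rest_pos) + len(rest_named)`, which is 2 + 1 = 3.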

This table shows simpler, more common cases.


<table>
<thead>
<tr>
<td>Args / Params</td>
<td>Call Site</td>
<td>Definition</td>
</tr>
</thead>

<tr>
<td>Positional Args</td>
<td>

    var x = myMax(3, 4)

</td>
<td>

    func myMax(x, y) {
      return (x if x > y else y)
    }

</td>
</tr>

<tr>
<td>Spread Pos Args</td>
<td>

    var args = [3, 4]
    var x = myMax(...args)

</td>
<td>

(as above)

</td>
</tr>

<tr>
<td>Rest Pos Params</td>
<td>

    var x = myPrintf("%s is %d", 'bob', 30)

</td>
<td>

    func myPrintf(fmt, ...args) {
      # ...
    }

</td>
</tr>

<tr>
<td colspan=3 style="text-align: center; padding: 3em">...</td>
</tr>

<tr>
<td>Named Args</td>
<td>

    var x = mySum(3, 4, start=5)

</td>
<td>

    func mySum(x, y; start=0) {
      return (x + y + start)
    }

</td>
</tr>

<tr>
<td>Spread Named Args</td>
<td>

    var opts = {start: 5}
    var x = mySum(3, 4; ...opts)

</td>
<td>

(as above)

</td>
</tr>

<tr>
<td>Rest Named Params</td>
<td>

    var x = f(start=5, end=7)

</td>
<td>

    func f(; ...opts) {
      if ('start' not in opts) {
        setvar opts.start = 0
      }
      # ...
    }

</td>
</tr>

</table>
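
Here's a runnable sketch built from the `mySum` rows above. It shows the default for a named param in use. It also shows that a spread of named args goes after the `;`, because a spread written before the `;` is a positional spread.

```ysh
func mySum(x, y; start=0) {
  return (x + y + start)
}

var opts = {start: 5}

var a = mySum(3, 4)            # start uses its default: 3 + 4 + 0
var b = mySum(3, 4; ...opts)   # spread named args: 3 + 4 + 5
echo "$a $b"                   # => 7 12
```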

### Proc Calls and Defs

Like funcs, procs have 2 kinds of typed args/params: positional and named.

But they may also have string aka word args/params, and a block
arg/param.

In general, a proc signature has 4 sections, like this:

    proc p (
        w1, w2, ...rest_word;     # word params
        p1, p2, ...rest_pos;      # pos params
        n1, n2, ...rest_named;    # named params
        block                     # block param
    ) {
      echo 'body'
    }

In general, a proc call looks like this:

    var pos_args = [3, 4]
    var named_args = {foo: 'bar'}

    p /bin /tmp (1, 2, ...pos_args; n1=43, ...named_args) {
      echo 'block'
    }

The block can also be passed as an expression after a second semicolon:

    p /bin /tmp (1, 2, ...pos_args; n1=43, ...named_args; block)
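
Here's a sketch of how the word, positional, and named args in that call bind to params. The proc `show-args` is hypothetical. It leaves out the block section to stay short, and it gives `n1` a default.

```ysh
# Hypothetical proc with word, positional, and named sections (no block)
proc show-args (w1, ...rest_word; p1, ...rest_pos; n1=0, ...rest_named) {
  echo "w1=$w1 words=$[len(rest_word)] p1=$p1 pos=$[len(rest_pos)] n1=$n1 named=$[len(rest_named)]"
}

var pos_args = [3, 4]
var named_args = {foo: 'bar'}

# Words:      /bin -> w1, [/tmp] -> rest_word
# Positional: 1 -> p1, [2, 3, 4] -> rest_pos
# Named:      43 -> n1, {foo: 'bar'} -> rest_named
show-args /bin /tmp (1, 2, ...pos_args; n1=43, ...named_args)
# => w1=/bin words=1 p1=1 pos=3 n1=43 named=1
```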

<!--
- Block is really last positional arg: `cd /tmp { echo $PWD }`
-->

Some simpler examples:

<table>
<thead>
<tr>
<td>Args / Params</td>
<td>Call Site</td>
<td>Definition</td>
</tr>
</thead>

<tr>
<td>Word args</td>
<td>

    my-cd /tmp

</td>
<td>

    proc my-cd (dest) {
      cd $dest
    }

</td>
</tr>

<tr>
<td>Rest Word Params</td>
<td>

    my-cd -L /tmp

</td>
<td>

    proc my-cd (...flags) {
      cd @flags
    }

</td>
</tr>

<tr>
<td>Spread Word Args</td>
<td>

    var flags = :| -L /tmp |
    my-cd @flags

</td>
<td>

(as above)

</td>
</tr>

<tr>
<td colspan=3 style="text-align: center; padding: 3em">...</td>
</tr>

<tr>
<td>Typed Pos Arg</td>
<td>

    print-max (3, 4)

</td>
<td>

    proc print-max ( ; x, y) {
      echo $[x if x > y else y]
    }

</td>
</tr>

<tr>
<td>Typed Named Arg</td>
<td>

    print-max (3, 4, start=5)

</td>
<td>

    proc print-max ( ; x, y; start=0) {
      # ...
    }

</td>
</tr>

<tr>
<td colspan=3 style="text-align: center; padding: 3em">...</td>
</tr>

<tr>
<td>Block Argument</td>
<td>

    my-cd /tmp {
      echo $PWD
      echo hi
    }

</td>
<td>

    proc my-cd (dest; ; ; block) {
      cd $dest (; ; block)
    }

</td>
</tr>

<tr>
<td>All Four Kinds</td>
<td>

    p 'word' (42, verbose=true) {
      echo $PWD
      echo hi
    }

</td>
<td>

    proc p (w; myint; verbose=false; block) {
      = w
      = myint
      = verbose
      = block
    }

</td>
</tr>

</table>

## Common Features

Let's recap the common features of procs and funcs.

### Spread Args, Rest Params

- Spread arg list `...` at call site
- Rest params `...` at definition
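
The same `...` syntax works in both procs and funcs. In this sketch, `sumAll` and `print-sum` are hypothetical names. The func collects its args with a rest param, and the proc spreads its own rest param into a call to that func.

```ysh
# Rest param: all positional args are collected into the List nums
func sumAll(...nums) {
  var total = 0
  for n in (nums) {
    setvar total += n
  }
  return (total)
}

# A proc with a typed rest param, which it spreads into the func call
proc print-sum (; ...nums) {
  echo $[sumAll(...nums)]
}

var nums = [1, 2, 3]
var s = sumAll(...nums)   # spread arg: same as sumAll(1, 2, 3)
print-sum (...nums)       # => 6
```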

### The `error` builtin raises exceptions

The `error` builtin is idiomatic in both funcs and procs:

    func f(x) {
      if (x <= 0) {
        error 'Should be positive' (status=99)
      }
    }

Tip: reserve such errors for exceptional situations. For example, an input
string being invalid may not be uncommon, while a disk full I/O error is more
exceptional.

(The `error` builtin is implemented with C++ exceptions, which are slow in the
error case.)
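
To handle such an error, the caller can wrap the call in `try`. Instead of aborting, `try` sets the special `_error` variable. This minimal sketch uses a variant of `f` that also returns its argument. It relies only on `_error.code` being nonzero after a failure. It also assumes `_error.message` holds the text passed to `error`.

```ysh
func f(x) {
  if (x <= 0) {
    error 'Should be positive'
  }
  return (x)
}

var y = f(5)   # no error, so y is 5

try {
  call f(-1)   # error is raised here, and try catches it
}
if (_error.code !== 0) {
  echo "caught: $[_error.message]"
}
```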

### Out Params: `&myvar` is of type `value.Place`

Out params are more common in procs, because they don't have a typed return
value.

    proc p ( ; out) {
      call out->setValue(42)
    }
    var x
    p (&x)
    echo "x set to $x"  # => x set to 42

But they can also be used in funcs:

    func f (out) {
      call out->setValue(42)
    }
    var x
    call f(&x)
    echo "x set to $x"  # => x set to 42

Observation: procs can do everything funcs can. But you may want the purity
and familiar syntax of a `func`.
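
A caller can pass more than one `value.Place`, so out params also let a proc "return" several values at once. Here's a sketch with a hypothetical `min-max` proc:

```ysh
# Hypothetical proc: writes the smaller and larger of a, b to two out params
proc min-max (; a, b, lo_out, hi_out) {
  call lo_out->setValue(a if a < b else b)
  call hi_out->setValue(a if a > b else b)
}

var lo
var hi
min-max (7, 3, &lo, &hi)
echo "lo=$lo hi=$hi"   # => lo=3 hi=7
```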

---

Design note: out params are a nicer way of doing what bash does with `declare
-n` aka `nameref` variables. They don't rely on [dynamic
scope]($xref:dynamic-scope).

## Proc-Only Features

Procs have some features that funcs don't have.

### Lazy Arg Lists `where [x > 10]`

A lazy arg list is implemented with `shopt --set parse_bracket`, and is syntax
sugar for an unevaluated `value.Expr`.

Longhand:

    var my_expr = ^[42 === x]  # value of type Expr
    assert (my_expr)

Shorthand:

    assert [42 === x]  # equivalent to the above
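
The brackets delay evaluation. In this sketch, the `Expr` is created while `x` is still 0, but it's only evaluated when `assert` runs:

```ysh
var x = 0
var my_expr = ^[42 === x]   # captured, not evaluated yet

setvar x = 42
assert (my_expr)            # evaluated now, when x is 42
assert [42 === x]           # shorthand for the same check
```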

### Open Proc Signatures bind `argv`

TODO: Implement new `ARGV` semantics.

When a proc signature omits `()`, it's called "open" because the caller can
pass "extra" arguments:

    proc my-open {
      write 'args are' @ARGV
    }
    # All valid:
    my-open
    my-open 1
    my-open 1 2

Stricter closed procs:

    proc my-closed (x) {
      write 'arg is' $x
    }
    my-closed      # runtime error: missing argument
    my-closed 1    # valid
    my-closed 1 2  # runtime error: too many arguments


An "open" proc is nearly identical to a shell function:

    shfunc() {
      write 'args are' @ARGV
    }

## Methods are Funcs Bound to Objects

Values of type `Obj` have an ordered set of name-value bindings, as well as a
prototype chain of more `Obj` instances ("parents"). They support these
operators:

- dot (`.`) looks for attributes or methods with a given name.
  - Reference: [ysh-attr](ref/chap-expr-lang.html#ysh-attr)
  - Attributes may be in the object, or up the chain. They are returned
    literally.
  - Methods live up the chain. They are returned as `BoundFunc`, so that the
    first `self` argument of a method call is the object itself.
- Thin arrow (`->`) looks for mutating methods, which have an `M/` prefix.
  - Reference: [thin-arrow](ref/chap-expr-lang.html#thin-arrow)
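
Here's a sketch of a method call through the prototype chain. The names `area`, `Rect`, and `r` are hypothetical. The method `area` lives on the prototype `Rect`, while the attributes `w` and `h` live on `r` itself:

```ysh
# A func whose first param is self can serve as a method
func area(self) {
  return (self.w * self.h)
}

var Rect = Object(null, {area: area})   # prototype holding the method
var r = Object(Rect, {w: 3, h: 4})      # instance holding the attributes

var a = r.area()   # found up the chain, and bound so that self is r
echo "area = $a"   # => area = 12
```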

## The `__invoke__` method makes an Object "Proc-like"

First, define a proc, with the first typed param named `self`:

    proc myInvoke (word_param; self, int_param) {
      echo "sum = $[self.x + self.y + int_param]"
    }

Make it the `__invoke__` method of an `Obj`:

    var methods = Object(null, {__invoke__: myInvoke})
    var invokable_obj = Object(methods, {x: 1, y: 2})

Then invoke it like a proc:

    invokable_obj myword (3)
    # => sum = 6

## Usage Notes

### 3 Ways to Return a Value

Let's review the recommended ways to "return" a value:

1. `return (x)` in a `func`.
   - The parentheses are required because expressions like `(x + 1)` should
     look different from words.
1. Pass a `value.Place` instance to a proc or func.
   - That is, out param `&out`.
1. Print to stdout in a `proc`
   - Capture it with command substitution: `$(myproc)`
   - Or with `read`: `myproc | read --all; echo $_reply`
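
The three recommended ways can be sketched side by side. The names `twice`, `twice-to`, and `print-twice` are hypothetical:

```ysh
# 1. return (x) in a func
func twice(x) {
  return (2 * x)
}

# 2. Out param: the caller passes a value.Place
proc twice-to (; x, out) {
  call out->setValue(2 * x)
}

# 3. Print to stdout in a proc, and capture it with $()
proc print-twice (; x) {
  echo $[2 * x]
}

var a = twice(21)

var b
twice-to (21, &b)

var c = $(print-twice (21))   # captured output is a Str, not an Int

echo "$a $b $c"   # => 42 42 42
```

The types differ: `a` and `b` are integers, while `c` is the string `'42'`.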

Obsolete ways of "returning":

1. Using `declare -n` aka `nameref` variables in bash.
1. Relying on [dynamic scope]($xref:dynamic-scope) in POSIX shell.

### Procs Compose in Pipelines / "Bernstein Chaining"

Some YSH users may tend toward funcs because they're more familiar. But shell
composition with procs is very powerful!

Procs have at least two kinds of composition that funcs don't have.

See #[shell-the-good-parts]($blog-tag):

1. [Shell Has a Forth-Like
   Quality](https://www.oilshell.org/blog/2017/01/13.html) - Bernstein
   chaining.
1. [Pipelines Support Vectorized, Point-Free, and Imperative
   Style](https://www.oilshell.org/blog/2017/01/15.html) - the shell can
   transparently run procs as elements of pipelines.
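
Here's a sketch of the second kind of composition. Two hypothetical procs, `fruits` and `shout`, are used as pipeline elements alongside the external `grep`:

```ysh
# Writes one word per line to stdout
proc fruits {
  write -- apple banana cherry
}

# Filters stdin to stdout by running an external command inside the proc
proc shout {
  tr a-z A-Z
}

fruits | shout | grep N   # => BANANA
```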

<!--

In summary:

* func signatures look like JavaScript, Julia, and Go.
* named and positional are separated with `;` in the signature.
* The prefix `...` "spread" operator takes the place of Python's `*args` and `**kwargs`.
* There are optional type annotations
* procs are like shell functions
  * but they also allow you to name parameters, and throw errors if the arity
    is wrong.
  * and they take blocks.

-->

## Summary

YSH is influenced by both shell and Python, so it has both procs and funcs.

Many programmers will gravitate towards funcs because they're familiar, but
procs are more powerful and shell-like.

Make your YSH programs by learning to use procs!

## Appendix

### Implementation Details

Procs and funcs both have these concerns:

1. Evaluation of default args at definition time.
1. Evaluation of actual args at the call site.
1. Arg-Param binding for builtin functions, e.g. with `typed_args.Reader`.
1. Arg-Param binding for user-defined functions.

So the implementation can be thought of as a 2 × 4 matrix, with some
code shared. This code is mostly in [ysh/func_proc.py]($oils-src).

### Related

- [Variable Declaration, Mutation, and Scope](variables.html) - in particular,
  procs don't have [dynamic scope]($xref:dynamic-scope).
- [Block Literals](block-literals.html) (in progress)

<!--
TODO: any reference topics?
-->

<!--
OK we're getting close here -- #language-design>Unifying Proc and Func Params

I think we need to write a quick guide first, not a reference


It might have some tables

It might mention concrete use cases like the flag parser -- #oil-dev>Progress on argparse


### Diff-based explanation

- why not Python -- because of `/` and `*` special cases
- Julia influence
- lazy args for procs `where` filters and `awk`
- out Ref parameters are for "returning" without printing to stdout

#language-design>N ways to "return" a value


- What does shell have?
  - it has blocks, e.g. with redirects
  - it has functions without params -- only named params


- Ruby influence -- rich DSLs


So I think you can say we're a mix of

- shell
- Python
- Julia (mostly subsumes Python?)
- Ruby


### Implementation-based explanation

- ASDL schemas -- #oil-dev>Good Proc/Func refactoring


### Big Idea: procs are for I/O, funcs are for computation

We may want to go full in on this idea with #language-design>func evaluator without redirects and $?


### Very Basic Advice, Up Front


Done with #language-design>value.Place, & operator, read builtin

Place works with both func and proc


### Bump

I think this might go in the backlog - #blog-ideas


#language-design>Simplify proc param passing?

-->

<!-- vim sw=2 -->