---
default_highlighter: oils-sh
---

Guide to Procs and Funcs
========================

YSH has two major units of code: shell-like `proc`, and Python-like `func`.

- Roughly speaking, procs are for commands and **I/O**, while funcs are for
  pure **computation**.
- Procs are often **big**, and may call **small** funcs. On the other hand,
  it's possible, but rarer, for funcs to call procs.
- You can write shell scripts **mostly** with procs, and perhaps a few funcs.

This doc compares the two mechanisms, and gives rough guidelines.

<!--
See the blog for more conceptual background: [Oils is
Exterior-First](https://www.oilshell.org/blog/2023/06/ysh-design.html).
-->

<div id="toc">
</div>

## Tip: Start Simple

Before going into detail, here's a quick reminder that you don't have to use
**either** procs or funcs. YSH is a language that scales both down and up.

You can start with just a list of plain commands:

    mkdir -p /tmp/dest
    cp --verbose *.txt /tmp/dest

Then copy those into procs as the script gets bigger:

    proc build-app {
      ninja --verbose
    }

    proc deploy {
      mkdir -p /tmp/dest
      cp --verbose *.txt /tmp/dest
    }

    build-app
    deploy

Then add funcs if you need pure computation:

    func isTestFile(name) {
      return (name => endsWith('_test.py'))
    }

    if (isTestFile('my_test.py')) {
      echo 'yes'
    }

## At a Glance

### Procs vs. Funcs

This table summarizes the difference between procs and funcs. The rest of the
doc will elaborate on these issues.

<style>
  thead {
    background-color: #eee;
    font-weight: bold;
  }
  table {
    font-family: sans-serif;
    border-collapse: collapse;
  }

  tr {
    border-bottom: solid 1px;
    border-color: #ddd;
  }

  td {
    padding: 8px;  /* override default of 5px */
  }
</style>


<table>
  <thead>
    <tr>
      <td></td>
      <td>Proc</td>
      <td>Func</td>
    </tr>
  </thead>

  <tr>
    <td>Design Influence</td>
<td>

Shell-like.

</td>
<td>

Python- and JavaScript-like, but **pure**.

</td>
  </tr>

  <tr>
    <td>Shape</td>

<td>

Procs are shaped like Unix processes: with `argv`, an integer return code, and
`stdin` / `stdout` streams.

They're a generalization of Bourne shell "functions".

</td>
<td>

Funcs are shaped like mathematical functions.

</td>
  </tr>

  <tr>
<td>

Architectural Role ([Oils is Exterior First](https://www.oilshell.org/blog/2023/06/ysh-design.html))

</td>
<td>

**Exterior**: processes and files.

</td>

<td>

**Interior**: functions and garbage-collected data structures.

</td>
  </tr>

  <tr>
    <td>I/O</td>
    <td>

Procs may start external processes and pipelines. Can perform I/O anywhere.

</td>
  <td>

Funcs need an explicit `io` param to perform I/O.

</td>
  </tr>

  <tr>
    <td>Example Definition</td>
<td>

    proc print-max (; x, y) {
      echo $[x if x > y else y]
    }

</td>
<td>

    func computeMax(x, y) {
      return (x if x > y else y)
    }

</td>
  </tr>

  <tr>
    <td>Example Call</td>
<td>

    print-max (3, 4)

Procs can be put in pipelines:

    print-max (3, 4) | tee out.txt

</td>
<td>

    var m = computeMax(3, 4)

Or throw away the return value, which is useful for functions that mutate:

    call computeMax(3, 4)

</td>
  </tr>

  <tr>
    <td>Naming Convention</td>
<td>

`kebab-case`

</td>
<td>

`camelCase`

</td>
  </tr>

  <tr>
<td>

[Syntax Mode](command-vs-expression-mode.html) of call site

</td>
    <td>Command Mode</td>
    <td>Expression Mode</td>
  </tr>

  <tr>
    <td>Kinds of Parameters / Arguments</td>
    <td>

1. Word aka string
1. Typed and Positional
1. Typed and Named
1. Block

Examples shown below.

</td>
  <td>

1. Positional
1. Named

(both typed)

</td>
  </tr>

  <tr>
    <td>Return Value</td>
    <td>Integer status 0-255</td>
    <td>

Any type of value, e.g.

    return ([42, {name: 'bob'}])

</td>
  </tr>
  <tr>
    <td>Relation to Objects</td>
    <td>none</td>
    <td>

May be bound to objects:

    var x = obj.myMethod()
    call obj->myMutatingMethod()

  </td>
  </tr>

  <tr>
    <td>Interface Evolution</td>
<td>

**Slower**: Procs exposed to the outside world may need to evolve in a compatible or "versionless" way.

</td>
<td>

**Faster**: Funcs may be refactored internally.

</td>
  </tr>

  <tr>
    <td>Parallelism?</td>
<td>

Procs can be parallel with:

- shell constructs: pipelines, `&` aka `fork`
- external tools and the [$0 Dispatch
  Pattern](https://www.oilshell.org/blog/2021/08/xargs.html): xargs, make,
  Ninja, etc.

</td>
<td>

Funcs are inherently **serial**, unless wrapped in a proc.

</td>
  </tr>

  <tr>
    <td colspan=3 style="text-align: center; padding: 3em">More <code>proc</code> features ...</td>
  </tr>

  <tr>
    <td>Kinds of Signature</td>
    <td>

Open `proc p {` or <br/>
Closed `proc p () {`

</td>
    <td>-</td>
  </tr>

  <tr>
    <td>Lazy Args</td>
<td>

    assert [42 === x]

</td>
    <td>-</td>
  </tr>

</table>

### Func Calls and Defs

Now that we've compared procs and funcs, let's look more closely at funcs.
They're inherently **simpler**: they have 2 types of args and params, rather
than 4.

YSH argument binding is based on Julia, which has all the power of Python, but
without the "evolved warts" (e.g. `/` and `*`).

In general, with all the bells and whistles, func definitions look like:

    # pos args and named args separated with ;
    func f(p1, p2, ...rest_pos; n1=42, n2='foo', ...rest_named) {
      return (len(rest_pos) + len(rest_named))
    }

Func calls look like:

    # spread operator ... at call site
    var pos_args = [3, 4]
    var named_args = {foo: 'bar'}
    var x = f(1, 2, ...pos_args; n1=43, ...named_args)

Note that positional args/params and named args/params can be thought of as two
"separate worlds".

This table shows simpler, more common cases.


<table>
  <thead>
    <tr>
      <td>Args / Params</td>
      <td>Call Site</td>
      <td>Definition</td>
    </tr>
  </thead>

  <tr>
    <td>Positional Args</td>
<td>

    var x = myMax(3, 4)

</td>
<td>

    func myMax(x, y) {
      return (x if x > y else y)
    }

</td>
  </tr>

  <tr>
    <td>Spread Pos Args</td>
<td>

    var args = [3, 4]
    var x = myMax(...args)

</td>
<td>

(as above)

</td>
  </tr>

  <tr>
    <td>Rest Pos Params</td>
<td>

    var x = myPrintf("%s is %d", 'bob', 30)

</td>
<td>

    func myPrintf(fmt, ...args) {
      # ...
    }

</td>
  </tr>

  <tr>
    <td colspan=3 style="text-align: center; padding: 3em">...</td>
  </tr>

  <tr>
    <td>Named Args</td>
<td>

    var x = mySum(3, 4, start=5)

</td>
<td>

    func mySum(x, y; start=0) {
      return (x + y + start)
    }

</td>
  </tr>

  <tr>
    <td>Spread Named Args</td>
<td>

    var opts = {start: 5}
    var x = mySum(3, 4; ...opts)

</td>
<td>

(as above)

</td>
  </tr>

  <tr>
    <td>Rest Named Params</td>
<td>

    var x = f(start=5, end=7)

</td>
<td>

    func f(; ...opts) {
      if ('start' not in opts) {
        setvar opts.start = 0
      }
      # ...
    }

</td>
  </tr>

</table>

### Proc Calls and Defs

Like funcs, procs have 2 kinds of typed args/params: positional and named.

But they may also have **string aka word** args/params, and a **block**
arg/param.

In general, a proc signature has 4 sections, like this:

    proc p (
        w1, w2, ...rest_word;     # word params
        p1, p2, ...rest_pos;      # pos params
        n1, n2, ...rest_named;    # named params
        block                     # block param
    ) {
      echo 'body'
    }

In general, a proc call looks like this:

    var pos_args = [3, 4]
    var named_args = {foo: 'bar'}

    p /bin /tmp (1, 2, ...pos_args; n1=43, ...named_args) {
      echo 'block'
    }

The block can also be passed as an expression after a second semicolon:

    p /bin /tmp (1, 2, ...pos_args; n1=43, ...named_args; block)

<!--
- Block is really last positional arg: `cd /tmp { echo $PWD }`
-->

Some simpler examples:

<table>
  <thead>
    <tr>
      <td>Args / Params</td>
      <td>Call Site</td>
      <td>Definition</td>
    </tr>
  </thead>

  <tr>
    <td>Word args</td>
<td>

    my-cd /tmp

</td>
<td>

    proc my-cd (dest) {
      cd $dest
    }

</td>
  </tr>

  <tr>
    <td>Rest Word Params</td>
<td>

    my-cd -L /tmp

</td>
<td>

    proc my-cd (...flags) {
      cd @flags
    }

</td>
  </tr>

  <tr>
    <td>Spread Word Args</td>
<td>

    var flags = :| -L /tmp |
    my-cd @flags

</td>
<td>

(as above)

</td>
  </tr>

  <tr>
    <td colspan=3 style="text-align: center; padding: 3em">...</td>
  </tr>

  <tr>
    <td>Typed Pos Arg</td>
<td>

    print-max (3, 4)

</td>
<td>

    proc print-max ( ; x, y) {
      echo $[x if x > y else y]
    }

</td>
  </tr>

  <tr>
    <td>Typed Named Arg</td>
<td>

    print-max (3, 4, start=5)

</td>
<td>

    proc print-max ( ; x, y; start=0) {
      # ...
    }

</td>
  </tr>

  <tr>
    <td colspan=3 style="text-align: center; padding: 3em">...</td>
  </tr>

  <tr>
    <td>Block Argument</td>
<td>

    my-cd /tmp {
      echo $PWD
      echo hi
    }

</td>
<td>

    proc my-cd (dest; ; ; block) {
      cd $dest (; ; block)
    }

</td>
  </tr>

  <tr>
    <td>All Four Kinds</td>
<td>

    p 'word' (42, verbose=true) {
      echo $PWD
      echo hi
    }

</td>
<td>

    proc p (w; myint; verbose=false; block) {
      = w
      = myint
      = verbose
      = block
    }

</td>
  </tr>

</table>

## Common Features

Let's recap the common features of procs and funcs.

### Spread Args, Rest Params

- Spread arg list `...` at call site
- Rest params `...` at definition
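
As a minimal sketch of both bullets, the example below uses a func with rest
positional params and a proc with rest word params. The names `sumAll`,
`show-words`, `more`, and `flags` are hypothetical and chosen only for
illustration.

```
# Rest params: extra positional args are collected into a List
func sumAll(...nums) {
  var total = 0
  for n in (nums) {
    setvar total += n
  }
  return (total)
}

# Spread: a List is expanded into positional args at the call site
var more = [3, 4]
var s = sumAll(1, 2, ...more)

# Rest word params: extra words are collected into a List
proc show-words (...words) {
  write -- @words
}

# Word args are spread with @ rather than ...
var flags = :| -a -b |
show-words first @flags
```

Note the asymmetry: typed args are spread with `...`, while word args are
spliced with `@`, as in the tables above.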

### The `error` builtin raises exceptions

The `error` builtin is idiomatic in both funcs and procs:

    func f(x) {
      if (x <= 0) {
        error 'Should be positive' (status=99)
      }
    }

Tip: reserve such errors for **exceptional** situations. For example, an input
string being invalid may not be uncommon, while a disk full I/O error is more
exceptional.

(The `error` builtin is implemented with C++ exceptions, which are slow in the
error case.)
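
A caller can handle such an error with a `try` block. The sketch below assumes
a recent YSH in which `try` records the failure in the `_error` register, with
`code` and `message` fields; `checkPositive` is a hypothetical name.

```
func checkPositive(x) {
  if (x <= 0) {
    error 'Should be positive'
  }
  return (x)
}

# try runs the block and records any failure instead of aborting the script
try {
  call checkPositive(-1)
}
if (_error.code !== 0) {
  echo "caught: $[_error.message]"
}
```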

### Out Params: `&myvar` is of type `value.Place`

Out params are more common in procs, because they don't have a typed return
value.

    proc p ( ; out) {
      call out->setValue(42)
    }
    var x
    p (&x)
    echo "x set to $x"  # => x set to 42

But they can also be used in funcs:

    func f (out) {
      call out->setValue(42)
    }
    var x
    call f(&x)
    echo "x set to $x"  # => x set to 42

Observation: procs can do everything funcs can. But you may want the purity
and familiar syntax of a `func`.

---

Design note: out params are a nicer way of doing what bash does with `declare
-n` aka `nameref` variables. They don't rely on [dynamic
scope]($xref:dynamic-scope).

## Proc-Only Features

Procs have some features that funcs don't have.

### Lazy Arg Lists `where [x > 10]`

A lazy arg list is implemented with `shopt --set parse_bracket`, and is syntax
sugar for an unevaluated `value.Expr`.

Longhand:

    var my_expr = ^[42 === x]  # value of type Expr
    assert (my_expr)

Shorthand:

    assert [42 === x]  # equivalent to the above

### Open Proc Signatures bind `argv`

TODO: Implement new `ARGV` semantics.

When a proc signature omits `()`, it's called **"open"** because the caller can
pass "extra" arguments:

    proc my-open {
      write 'args are' @ARGV
    }
    # All valid:
    my-open
    my-open 1
    my-open 1 2

Stricter closed procs:

    proc my-closed (x) {
      write 'arg is' $x
    }
    my-closed      # runtime error: missing argument
    my-closed 1    # valid
    my-closed 1 2  # runtime error: too many arguments


An "open" proc is nearly identical to a shell function:

    shfunc() {
      write 'args are' @ARGV
    }

## Methods are Funcs Bound to Objects

Values of type `Obj` have an ordered set of name-value bindings, as well as a
prototype chain of more `Obj` instances ("parents"). They support these
operators:

- dot (`.`) looks for attributes or methods with a given name.
  - Reference: [ysh-attr](ref/chap-expr-lang.html#ysh-attr)
  - Attributes may be in the object, or up the chain. They are returned
    literally.
  - Methods live up the chain. They are returned as `BoundFunc`, so that the
    first `self` argument of a method call is the object itself.
- Thin arrow (`->`) looks for mutating methods, which have an `M/` prefix.
  - Reference: [thin-arrow](ref/chap-expr-lang.html#thin-arrow)
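
The dot lookup can be sketched as follows. This is an illustration rather than
a reference: `rectArea`, `methods`, and `rect` are hypothetical names, and it
assumes the same `Object(parent, bindings)` constructor used in the
`__invoke__` example in this doc.

```
# A plain func whose first param is self
func rectArea(self) {
  return (self.w * self.h)
}

# The method lives in the parent object, i.e. "up the chain"
var methods = Object(null, {area: rectArea})

# Attributes live in the object itself
var rect = Object(methods, {w: 3, h: 4})

# rect.area is found up the chain and bound to rect as self
var a = rect.area()
echo "area = $a"
```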

## The `__invoke__` method makes an Object "Proc-like"

First, define a proc, with the first typed arg named `self`:

    proc myInvoke (word_param; self, int_param) {
      echo "sum = $[self.x + self.y + int_param]"
    }

Make it the `__invoke__` method of an `Obj`:

    var methods = Object(null, {__invoke__: myInvoke})
    var invokable_obj = Object(methods, {x: 1, y: 2})

Then invoke it like a proc:

    invokable_obj myword (3)
    # => sum = 6

## Usage Notes

### 3 Ways to Return a Value

Let's review the recommended ways to "return" a value:

1. `return (x)` in a `func`.
   - The parentheses are required because expressions like `(x + 1)` should
     look different than words.
1. Pass a `value.Place` instance to a proc or func.
   - That is, out param `&out`.
1. Print to stdout in a `proc`
   - Capture it with command sub: `$(myproc)`
   - Or with `read`: `myproc | read --all; echo $_reply`
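
Side by side, the three recommended ways might look like the sketch below. The
names `double`, `set-answer`, and `print-answer` are hypothetical.

```
# 1. return (x) in a func
func double(x) {
  return (x * 2)
}
var a = double(21)

# 2. Out param: pass a value.Place with &
proc set-answer (; out) {
  call out->setValue(42)
}
var b
set-answer (&b)

# 3. Print to stdout in a proc, then capture it with command sub
proc print-answer {
  echo 42
}
var c = $(print-answer)

echo "$a $b $c"
```

Note that the third way always yields a string, while the first two preserve
the type of the value.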

Obsolete ways of "returning":

1. Using `declare -n` aka `nameref` variables in bash.
1. Relying on [dynamic scope]($xref:dynamic-scope) in POSIX shell.

### Procs Compose in Pipelines / "Bernstein Chaining"

Some YSH users may tend toward funcs because they're more familiar. But shell
composition with procs is very powerful!

They have at least two kinds of composition that funcs don't have.

See #[shell-the-good-parts]($blog-tag):

1. [Shell Has a Forth-Like
   Quality](https://www.oilshell.org/blog/2017/01/13.html) - Bernstein
   chaining.
1. [Pipelines Support Vectorized, Point-Free, and Imperative
   Style](https://www.oilshell.org/blog/2017/01/15.html) - the shell can
   transparently run procs as elements of pipelines.
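
Both kinds of composition can be sketched in a few lines. The proc names
`upper` and `be-nice` are hypothetical, and the example assumes the external
tools `tr`, `nice`, and `ls` are installed.

```
# Pipeline element: a proc reads stdin and writes stdout like a process
proc upper {
  tr a-z A-Z
}
echo hello | upper

# Bernstein chaining: a proc takes a command as its trailing words,
# adjusts the environment it runs in, and then runs it
proc be-nice (...cmd) {
  nice -n 10 @cmd
}
be-nice ls /tmp
```

A func can't be used in either position, because it has no `argv`, `stdin`, or
`stdout`.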

<!--

In summary:

* func signatures look like JavaScript, Julia, and Go.
  * named and positional are separated with `;` in the signature.
  * The prefix `...` "spread" operator takes the place of Python's `*args` and `**kwargs`.
  * There are optional type annotations
* procs are like shell functions
  * but they also allow you to name parameters, and throw errors if the arity
is wrong.
  * and they take blocks.

-->

## Summary

YSH is influenced by both shell and Python, so it has both procs and funcs.

Many programmers will gravitate towards funcs because they're familiar, but
procs are more powerful and shell-like.

Improve your YSH programs by learning to use procs!

## Appendix

### Implementation Details

Procs and funcs both have these concerns:

1. Evaluation of default args at definition time.
1. Evaluation of actual args at the call site.
1. Arg-Param binding for builtin functions, e.g. with `typed_args.Reader`.
1. Arg-Param binding for user-defined functions.

So the implementation can be thought of as a **2 &times; 4 matrix**, with some
code shared. This code is mostly in [ysh/func_proc.py]($oils-src).

### Related

- [Variable Declaration, Mutation, and Scope](variables.html) - in particular,
  procs don't have [dynamic scope]($xref:dynamic-scope).
- [Block Literals](block-literals.html) (in progress)

<!--
TODO: any reference topics?
-->

<!--
OK we're getting close here -- #**language-design>Unifying Proc and Func Params**

I think we need to write a quick guide first, not a reference


It might have some **tables**

It might mention concrete use cases like the **flag parser** -- #**oil-dev>Progress on argparse**


### Diff-based explanation

- why not Python -- because of `/` and `*` special cases
- Julia influence
- lazy args for procs `where` filters and `awk`
- out Ref parameters are for "returning" without printing to stdout

#**language-design>N ways to "return" a value**


- What does shell have?
  - it has blocks, e.g. with redirects
  - it has functions without params -- only named params


- Ruby influence -- rich DSLs


So I think you can say we're a mix of

- shell
- Python
- Julia (mostly subsumes Python?)
- Ruby


### Implementation-based explanation

- ASDL schemas -- #**oil-dev>Good Proc/Func refactoring**


### Big Idea: procs are for I/O, funcs are for computation

We may want to go full in on this idea with #**language-design>func evaluator without redirects and $?**


### Very Basic Advice, Up Front


Done with #**language-design>value.Place, & operator, read builtin**

Place works with both func and proc


### Bump

I think this might go in the backlog - #**blog-ideas**


#**language-design>Simplify proc param passing?**

-->



<!-- vim sw=2 -->

## ul-table Draft

<table>

- thead
  - <!-- empty -->
  - Proc
  - Func
- tr
  - Design Influence
  - Shell-like.
  - Python- and JavaScript-like, but **pure**.
- tr
  - Shape
  - Procs are shaped like Unix processes: with `argv`, an integer return code,
    and `stdin` / `stdout` streams.

    They're a generalization of Bourne shell "functions".
  - Funcs are shaped like mathematical functions.
- tr
  - Architectural Role ([Oils is Exterior First](https://www.oilshell.org/blog/2023/06/ysh-design.html))
  - **Exterior**: processes and files.
  - **Interior**: functions and garbage-collected data structures.
- tr
  - I/O
  - Procs may start external processes and pipelines. Can perform I/O
    anywhere.
  - Funcs need an explicit `io` param to perform I/O.
- tr
  - Example Definition
  - ```
    proc print-max (; x, y) {
      echo $[x if x > y else y]
    }
    ```
  - ```
    func computeMax(x, y) {
      return (x if x > y else y)
    }
    ```
- tr
  - Example Call
  - ```
    print-max (3, 4)
    ```

    Procs can be put in pipelines:

    ```
    print-max (3, 4) | tee out.txt
    ```
  - ```
    var m = computeMax(3, 4)
    ```

    Or throw away the return value, which is useful for functions that mutate:

    ```
    call computeMax(3, 4)
    ```
- tr
  - Naming Convention
  - `kebab-case`
  - `camelCase`
- tr
  - [Syntax Mode](command-vs-expression-mode.html) of call site
  - Command Mode
  - Expression Mode
- tr
  - Kinds of Parameters / Arguments
  - <!-- empty -->
    1. Word aka string
    1. Typed and Positional
    1. Typed and Named
    1. Block

    Examples shown below.
  - <!-- empty -->
    1. Positional
    1. Named

    (both typed)
- tr
  - Return Value
  - Integer status 0-255
  - Any type of value, e.g.

    ```
    return ([42, {name: 'bob'}])
    ```
- tr
  - Can it be a method on an object?
  - No
  - Yes, funcs may be bound to objects:

    ```
    var x = obj.myMethod()
    call obj->myMutatingMethod()
    ```
- tr
  - Interface Evolution
  - **Slower**: Procs exposed to the outside world may need to evolve in a compatible or "versionless" way.
  - **Faster**: Funcs may be refactored internally.
- tr
  - Parallelism?
  - Procs can be parallel with:
    - shell constructs: pipelines, `&` aka `fork`
    - external tools and the [$0 Dispatch
      Pattern](https://www.oilshell.org/blog/2021/08/xargs.html): xargs, make,
      Ninja, etc.
  - Funcs are inherently **serial**, unless wrapped in a proc.
- tr
  - <td-attrs colspan=3 style="text-align: center; padding: 3em" /> &nbsp;
    More `proc` Features ...
- tr
  - Kinds of Signature
  - Open `proc p {` or <br/>
    Closed `proc p () {`
  - <!-- dash --> -
- tr
  - Lazy Args
  - ```
    assert [42 === x]
    ```
  - <!-- dash --> -

</table>

1071