---
default_highlighter: oils-sh
---

Guide to Procs and Funcs
========================

YSH has two major units of code: shell-like `proc`, and Python-like `func`.

- Roughly speaking, procs are for commands and **I/O**, while funcs are for
  pure **computation**.
- Procs are often **big**, and may call **small** funcs. On the other hand,
  it's possible, but rarer, for funcs to call procs.
- You can write shell scripts **mostly** with procs, and perhaps a few funcs.

This doc compares the two mechanisms, and gives rough guidelines.

<!--
See the blog for more conceptual background: [Oils is
Exterior-First](https://www.oilshell.org/blog/2023/06/ysh-design.html).
-->

<div id="toc">
</div>

## Tip: Start Simple

Before going into detail, here's a quick reminder that you don't have to use
**either** procs or funcs. YSH is a language that scales both down and up.

You can start with just a list of plain commands:

    mkdir -p /tmp/dest
    cp --verbose *.txt /tmp/dest

Then copy those into procs as the script gets bigger:

    proc build-app {
      ninja --verbose
    }

    proc deploy {
      mkdir -p /tmp/dest
      cp --verbose *.txt /tmp/dest
    }

    build-app
    deploy

Then add funcs if you need pure computation:

    func isTestFile(name) {
      return (name => endsWith('_test.py'))
    }

    if (isTestFile('my_test.py')) {
      echo 'yes'
    }

## At a Glance

### Procs vs. Funcs

This table summarizes the difference between procs and funcs. The rest of the
doc will elaborate on these issues.

<style>
  thead {
    background-color: #eee;
    font-weight: bold;
  }
  table {
    font-family: sans-serif;
    border-collapse: collapse;
  }

  tr {
    border-bottom: solid 1px;
    border-color: #ddd;
  }

  td {
    padding: 8px;  /* override default of 5px */
  }
</style>

<table>
  <thead>
    <tr>
      <td></td>
      <td>Proc</td>
      <td>Func</td>
    </tr>
  </thead>

  <tr>
    <td>Design Influence</td>
<td>

Shell-like.

</td>
<td>

Python- and JavaScript-like, but **pure**.

</td>
  </tr>

  <tr>
    <td>Shape</td>

<td>

Procs are shaped like Unix processes: with `argv`, an integer return code, and
`stdin` / `stdout` streams.

They're a generalization of Bourne shell "functions".

</td>
<td>

Funcs are shaped like mathematical functions.

</td>
  </tr>

  <tr>
<td>

Architectural Role ([Oils is Exterior First](https://www.oilshell.org/blog/2023/06/ysh-design.html))

</td>
<td>

**Exterior**: processes and files.

</td>

<td>

**Interior**: functions and garbage-collected data structures.

</td>
  </tr>

  <tr>
    <td>I/O</td>
    <td>

Procs may start external processes and pipelines. Can perform I/O anywhere.

</td>
    <td>

Funcs need an explicit `io` param to perform I/O.

</td>
  </tr>

  <tr>
    <td>Example Definition</td>
<td>

    proc print-max (; x, y) {
      echo $[x if x > y else y]
    }

</td>
<td>

    func computeMax(x, y) {
      return (x if x > y else y)
    }

</td>
  </tr>

  <tr>
    <td>Example Call</td>
<td>

    print-max (3, 4)

Procs can be put in pipelines:

    print-max (3, 4) | tee out.txt

</td>
<td>

    var m = computeMax(3, 4)

Or throw away the return value, which is useful for functions that mutate:

    call computeMax(3, 4)

</td>
  </tr>

  <tr>
    <td>Naming Convention</td>
<td>

`kebab-case`

</td>
<td>

`camelCase`

</td>
  </tr>

  <tr>
<td>

[Syntax Mode](command-vs-expression-mode.html) of call site

</td>
    <td>Command Mode</td>
    <td>Expression Mode</td>
  </tr>

  <tr>
    <td>Kinds of Parameters / Arguments</td>
    <td>

1. Word aka string
1. Typed and Positional
1. Typed and Named
1. Block

Examples shown below.

</td>
    <td>

1. Positional
1. Named

(both typed)

</td>
  </tr>

  <tr>
    <td>Return Value</td>
    <td>Integer status 0-255</td>
    <td>

Any type of value, e.g.

    return ([42, {name: 'bob'}])

</td>
  </tr>
  <tr>
    <td>Relation to Objects</td>
    <td>none</td>
    <td>

May be bound to objects:

    var x = obj.myMethod()
    call obj->myMutatingMethod()

  </td>
  </tr>

  <tr>
    <td>Interface Evolution</td>
<td>

**Slower**: Procs exposed to the outside world may need to evolve in a compatible or "versionless" way.

</td>
<td>

**Faster**: Funcs may be refactored internally.

</td>
  </tr>

  <tr>
    <td>Parallelism?</td>
<td>

Procs can be parallel with:

- shell constructs: pipelines, `&` aka `fork`
- external tools and the [$0 Dispatch
  Pattern](https://www.oilshell.org/blog/2021/08/xargs.html): xargs, make,
  Ninja, etc.

</td>
<td>

Funcs are inherently **serial**, unless wrapped in a proc.

</td>
  </tr>

  <tr>
    <td colspan=3 style="text-align: center; padding: 3em">More <code>proc</code> features ...</td>
  </tr>

  <tr>
    <td>Kinds of Signature</td>
    <td>

Open `proc p {` or <br/>
Closed `proc p () {`

</td>
    <td>-</td>
  </tr>

  <tr>
    <td>Lazy Args</td>
<td>

    assert [42 === x]

</td>
    <td>-</td>
  </tr>

</table>

### Func Calls and Defs

Now that we've compared procs and funcs, let's look more closely at funcs.
They're inherently **simpler**: they have 2 types of args and params, rather
than 4.

YSH argument binding is based on Julia, which has all the power of Python, but
without the "evolved warts" (e.g. `/` and `*`).

In general, with all the bells and whistles, func definitions look like:

    # pos args and named args separated with ;
    func f(p1, p2, ...rest_pos; n1=42, n2='foo', ...rest_named) {
      return (len(rest_pos) + len(rest_named))
    }

Func calls look like:

    # spread operator ... at call site
    var pos_args = [3, 4]
    var named_args = {foo: 'bar'}
    var x = f(1, 2, ...pos_args; n1=43, ...named_args)

Note that positional args/params and named args/params can be thought of as two
"separate worlds".

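To make the "separate worlds" concrete, here is a small sketch that reuses the
signature above and returns what each param was bound to. The func name
`describe` and the Dict keys are hypothetical, chosen only for illustration,
and the bindings listed in the comments are a reading of the rules in this
section, not captured output.

```ysh
# Hypothetical helper: same signature as f above, but it reports its bindings.
func describe(p1, p2, ...rest_pos; n1=42, n2='foo', ...rest_named) {
  return ({
    pos: [p1, p2],
    rest_pos: rest_pos,
    named: [n1, n2],
    rest_named: rest_named
  })
}

var pos_args = [3, 4]
var named_args = {foo: 'bar'}

var d = describe(1, 2, ...pos_args; n1=43, ...named_args)

# Positional world: p1 and p2 take 1 and 2, and rest_pos collects the
#   spread values 3 and 4.
# Named world: n1 is overridden, n2 keeps its default, and rest_named
#   collects the spread key 'foo'.
json write (d)
```
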
This table shows simpler, more common cases.


<table>
  <thead>
    <tr>
      <td>Args / Params</td>
      <td>Call Site</td>
      <td>Definition</td>
    </tr>
  </thead>

  <tr>
    <td>Positional Args</td>
<td>

    var x = myMax(3, 4)

</td>
<td>

    func myMax(x, y) {
      return (x if x > y else y)
    }

</td>
  </tr>

  <tr>
    <td>Spread Pos Args</td>
<td>

    var args = [3, 4]
    var x = myMax(...args)

</td>
<td>

(as above)

</td>
  </tr>

  <tr>
    <td>Rest Pos Params</td>
<td>

    var x = myPrintf("%s is %d", 'bob', 30)

</td>
<td>

    func myPrintf(fmt, ...args) {
      # ...
    }

</td>
  </tr>

  <tr>
    <td colspan=3 style="text-align: center; padding: 3em">...</td>
  </tr>

  <tr>
    <td>Named Args</td>
<td>

    var x = mySum(3, 4, start=5)

</td>
<td>

    func mySum(x, y; start=0) {
      return (x + y + start)
    }

</td>
  </tr>

  <tr>
    <td>Spread Named Args</td>
<td>

    var opts = {start: 5}
    var x = mySum(3, 4; ...opts)

</td>
<td>

(as above)

</td>
  </tr>

  <tr>
    <td>Rest Named Params</td>
<td>

    var x = f(start=5, end=7)

</td>
<td>

    func f(; ...opts) {
      if ('start' not in opts) {
        setvar opts.start = 0
      }
      # ...
    }

</td>
  </tr>

</table>

### Proc Calls and Defs

Like funcs, procs have 2 kinds of typed args/params: positional and named.

But they may also have **string aka word** args/params, and a **block**
arg/param.

In general, a proc signature has 4 sections, like this:

    proc p (
        w1, w2, ...rest_word;     # word params
        p1, p2, ...rest_pos;      # pos params
        n1, n2, ...rest_named;    # named params
        block                     # block param
    ) {
      echo 'body'
    }

In general, a proc call looks like this:

    var pos_args = [3, 4]
    var named_args = {foo: 'bar'}

    p /bin /tmp (1, 2, ...pos_args; n1=43, ...named_args) {
      echo 'block'
    }

The block can also be passed as an expression after a second semicolon:

    p /bin /tmp (1, 2, ...pos_args; n1=43, ...named_args; block)

<!--
- Block is really last positional arg: `cd /tmp { echo $PWD }`
-->

Some simpler examples:

<table>
  <thead>
    <tr>
      <td>Args / Params</td>
      <td>Call Site</td>
      <td>Definition</td>
    </tr>
  </thead>

  <tr>
    <td>Word Args</td>
<td>

    my-cd /tmp

</td>
<td>

    proc my-cd (dest) {
      cd $dest
    }

</td>
  </tr>

  <tr>
    <td>Rest Word Params</td>
<td>

    my-cd -L /tmp

</td>
<td>

    proc my-cd (...flags) {
      cd @flags
    }

</td>
  </tr>

  <tr>
    <td>Spread Word Args</td>
<td>

    var flags = :| -L /tmp |
    my-cd @flags

</td>
<td>

(as above)

</td>
  </tr>

  <tr>
    <td colspan=3 style="text-align: center; padding: 3em">...</td>
  </tr>

  <tr>
    <td>Typed Pos Arg</td>
<td>

    print-max (3, 4)

</td>
<td>

    proc print-max ( ; x, y) {
      echo $[x if x > y else y]
    }

</td>
  </tr>

  <tr>
    <td>Typed Named Arg</td>
<td>

    print-max (3, 4, start=5)

</td>
<td>

    proc print-max ( ; x, y; start=0) {
      # ...
    }

</td>
  </tr>

  <tr>
    <td colspan=3 style="text-align: center; padding: 3em">...</td>
  </tr>

  <tr>
    <td>Block Argument</td>
<td>

    my-cd /tmp {
      echo $PWD
      echo hi
    }

</td>
<td>

    proc my-cd (dest; ; ; block) {
      cd $dest (; ; block)
    }

</td>
  </tr>

  <tr>
    <td>All Four Kinds</td>
<td>

    p 'word' (42, verbose=true) {
      echo $PWD
      echo hi
    }

</td>
<td>

    proc p (w; myint; verbose=false; block) {
      = w
      = myint
      = verbose
      = block
    }

</td>
  </tr>

</table>

## Common Features

Let's recap the common features of procs and funcs.

### Spread Args, Rest Params

- Spread arg list `...` at call site
- Rest params `...` at definition

### The `error` builtin raises exceptions

The `error` builtin is idiomatic in both funcs and procs:

    func f(x) {
      if (x <= 0) {
        error 'Should be positive' (status=99)
      }
    }

Tip: reserve such errors for **exceptional** situations. For example, an input
string being invalid may not be uncommon, while a disk full I/O error is more
exceptional.

(The `error` builtin is implemented with C++ exceptions, which are slow in the
error case.)

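On the calling side, such an error can be caught with `try`. The sketch below
assumes a recent YSH in which `try` records the failure in the `_error`
register, with `code` and `message` fields. The func name `checkPositive` is
hypothetical, and the example leaves out the named arg so that it doesn't
depend on how the status is spelled.

```ysh
# Hypothetical validator: raises with the error builtin on bad input.
func checkPositive(x) {
  if (x <= 0) {
    error "Should be positive, got $x"
  }
  return (x)
}

# try runs the block and records any failure instead of aborting the script.
try {
  call checkPositive(-1)
}

# Assumption: _error.code is nonzero after a failure, and _error.message
# holds the string passed to the error builtin.
if (_error.code !== 0) {
  echo "caught: $[_error.message] (code $[_error.code])"
}
```
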
### Out Params: `&myvar` is of type `value.Place`

Out params are more common in procs, because they don't have a typed return
value.

    proc p ( ; out) {
      call out->setValue(42)
    }
    var x
    p (&x)
    echo "x set to $x"  # => x set to 42

But they can also be used in funcs:

    func f (out) {
      call out->setValue(42)
    }
    var x
    call f(&x)
    echo "x set to $x"  # => x set to 42

Observation: procs can do everything funcs can. But you may want the purity
and familiar syntax of a `func`.

---

Design note: out params are a nicer way of doing what bash does with
`declare -n` aka `nameref` variables. They don't rely on [dynamic
scope]($xref:dynamic-scope).

## Proc-Only Features

Procs have some features that funcs don't have.

### Lazy Arg Lists `where [x > 10]`

A lazy arg list is implemented with `shopt --set parse_bracket`, and is syntax
sugar for an unevaluated `value.Expr`.

Longhand:

    var my_expr = ^[42 === x]  # value of type Expr
    assert (my_expr)

Shorthand:

    assert [42 === x]  # equivalent to the above

### Open Proc Signatures bind `argv`

TODO: Implement new `ARGV` semantics.

When a proc signature omits `()`, it's called **"open"** because the caller can
pass "extra" arguments:

    proc my-open {
      write 'args are' @ARGV
    }
    # All valid:
    my-open
    my-open 1
    my-open 1 2

Stricter closed procs:

    proc my-closed (x) {
      write 'arg is' $x
    }
    my-closed      # runtime error: missing argument
    my-closed 1    # valid
    my-closed 1 2  # runtime error: too many arguments

An "open" proc is nearly identical to a shell function:

    shfunc() {
      write 'args are' @ARGV
    }

## Methods are Funcs Bound to Objects

Values of type `Obj` have an ordered set of name-value bindings, as well as a
prototype chain of more `Obj` instances ("parents"). They support these
operators:

- dot (`.`) looks for attributes or methods with a given name.
  - Reference: [ysh-attr](ref/chap-expr-lang.html#ysh-attr)
  - Attributes may be in the object, or up the chain. They are returned
    literally.
  - Methods live up the chain. They are returned as `BoundFunc`, so that the
    first `self` argument of a method call is the object itself.
- Thin arrow (`->`) looks for mutating methods, which have an `M/` prefix.
  - Reference: [thin-arrow](ref/chap-expr-lang.html#thin-arrow)

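As a minimal sketch of the dot lookup, the example below puts a func in a
parent object and calls it through a child. It assumes the same
`Object(parent, props)` constructor that this doc uses for `__invoke__`. The
names `rectArea`, `rect_methods`, and `r` are hypothetical.

```ysh
# A method is an ordinary func whose first positional param is self.
func rectArea(self) {
  return (self.w * self.h)
}

# The parent holds the methods, and the child holds the attributes.
var rect_methods = Object(null, {area: rectArea})
var r = Object(rect_methods, {w: 3, h: 4})

# Dot finds the attribute w in r itself and returns it literally.
echo $[r.w]

# Dot finds area up the chain and returns a BoundFunc, so the call
# passes r as self.
echo $[r.area()]
```
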
## The `__invoke__` method makes an Object "Proc-like"

First, define a proc, with the first typed arg named `self`:

    proc myInvoke (word_param; self, int_param) {
      echo "sum = $[self.x + self.y + int_param]"
    }

Make it the `__invoke__` method of an `Obj`:

    var methods = Object(null, {__invoke__: myInvoke})
    var invokable_obj = Object(methods, {x: 1, y: 2})

Then invoke it like a proc:

    invokable_obj myword (3)
    # => sum = 6

## Usage Notes

### 3 Ways to Return a Value

Let's review the recommended ways to "return" a value:

1. `return (x)` in a `func`.
   - The parentheses are required because expressions like `(x + 1)` should
     look different than words.
1. Pass a `value.Place` instance to a proc or func.
   - That is, out param `&out`.
1. Print to stdout in a `proc`
   - Capture it with command sub: `$(myproc)`
   - Or with `read`: `myproc | read --all; echo $_reply`

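Here are the three ways side by side, as a sketch. The names `double`,
`double-to`, and `double-print` are hypothetical. Note that the third way goes
through stdout, so the caller gets a string back, not an integer.

```ysh
# 1. A func with a typed return value
func double(x) {
  return (x * 2)
}

# 2. A proc with an out param, which is a value.Place
proc double-to (; x, out) {
  call out->setValue(x * 2)
}

# 3. A proc that prints to stdout. Its word arg arrives as a string,
#    so convert it with int() before doing arithmetic.
proc double-print (x) {
  echo $[int(x) * 2]
}

var a = double(21)         # Int

var b
double-to (21, &b)         # Int, set through the Place

var c = $(double-print 21) # Str captured by command sub
```
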
Obsolete ways of "returning":

1. Using `declare -n` aka `nameref` variables in bash.
1. Relying on [dynamic scope]($xref:dynamic-scope) in POSIX shell.

### Procs Compose in Pipelines / "Bernstein Chaining"

Some YSH users may tend toward funcs because they're more familiar. But shell
composition with procs is very powerful!

Procs have at least two kinds of composition that funcs don't have.

See #[shell-the-good-parts]($blog-tag):

1. [Shell Has a Forth-Like
   Quality](https://www.oilshell.org/blog/2017/01/13.html) - Bernstein
   chaining.
1. [Pipelines Support Vectorized, Point-Free, and Imperative
   Style](https://www.oilshell.org/blog/2017/01/15.html) - the shell can
   transparently run procs as elements of pipelines.

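A rough sketch of both kinds of composition follows. The proc names
`list-names`, `shout`, and `verbose` are hypothetical. The example assumes the
external tools `tr` and `sort` are on `PATH`, and that a spliced array like
`@cmd` may be used as a whole command.

```ysh
proc list-names {
  echo alice
  echo bob
}

# Reads stdin and writes stdout, like any Unix filter
proc shout {
  tr 'a-z' 'A-Z'
}

# Pipeline composition: procs and external tools are interchangeable elements
list-names | shout | sort -r

# Bernstein chaining: an open-ended proc that does a bit of work, then runs
# the rest of its argv as a command
proc verbose (...cmd) {
  echo "running: $[join(cmd, ' ')]" >&2
  @cmd
}

verbose list-names
```
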
<!--

In summary:

* func signatures look like JavaScript, Julia, and Go.
  * named and positional are separated with `;` in the signature.
  * The prefix `...` "spread" operator takes the place of Python's `*args` and `**kwargs`.
  * There are optional type annotations
* procs are like shell functions
  * but they also allow you to name parameters, and throw errors if the arity
    is wrong.
  * and they take blocks.

-->

## Summary

YSH is influenced by both shell and Python, so it has both procs and funcs.

Many programmers will gravitate towards funcs because they're familiar, but
procs are more powerful and shell-like.

Make your YSH programs more powerful by learning to use procs!

## Appendix

### Implementation Details

Procs and funcs both have these concerns:

1. Evaluation of default args at definition time.
1. Evaluation of actual args at the call site.
1. Arg-Param binding for builtin functions, e.g. with `typed_args.Reader`.
1. Arg-Param binding for user-defined functions.

So the implementation can be thought of as a **2 &times; 4 matrix**, with some
code shared. This code is mostly in [ysh/func_proc.py]($oils-src).

### Related

- [Variable Declaration, Mutation, and Scope](variables.html) - in particular,
  procs don't have [dynamic scope]($xref:dynamic-scope).
- [Block Literals](block-literals.html) (in progress)

<!--
TODO: any reference topics?
-->

<!--
OK we're getting close here -- #**language-design>Unifying Proc and Func Params**

I think we need to write a quick guide first, not a reference


It might have some **tables**

It might mention concrete use cases like the **flag parser** -- #**oil-dev>Progress on argparse**


### Diff-based explanation

- why not Python -- because of `/` and `*` special cases
- Julia influence
- lazy args for procs `where` filters and `awk`
- out Ref parameters are for "returning" without printing to stdout

#**language-design>N ways to "return" a value**


- What does shell have?
  - it has blocks, e.g. with redirects
  - it has functions without params -- only named params


- Ruby influence -- rich DSLs


So I think you can say we're a mix of

- shell
- Python
- Julia (mostly subsumes Python?)
- Ruby


### Implementation-based explanation

- ASDL schemas -- #**oil-dev>Good Proc/Func refactoring**


### Big Idea: procs are for I/O, funcs are for computation

We may want to go full in on this idea with #**language-design>func evaluator without redirects and $?**


### Very Basic Advice, Up Front


Done with #**language-design>value.Place, & operator, read builtin**

Place works with both func and proc


### Bump

I think this might go in the backlog - #**blog-ideas**


#**language-design>Simplify proc param passing?**

-->



<!-- vim sw=2 -->