1---
2title: YSH Expression Language (Oils Reference)
3all_docs_url: ..
4body_css_class: width40
5default_highlighter: oils-sh
6preserve_anchor_case: yes
7---
8
9<div class="doc-ref-header">
10
11[Oils Reference](index.html) &mdash;
12Chapter **YSH Expression Language**
13
14</div>
15
16This chapter describes the YSH expression language, which includes [Egg
17Expressions]($xref:eggex).
18
19<div id="dense-toc">
20</div>
21
22## Assignment
23
24### assign
25
26The `=` operator is used with assignment keywords:
27
28 var x = 42
29 setvar x = 43
30
31 const y = 'k'
32
33 setglobal z = 'g'
34
35### aug-assign
36
37The augmented assignment operators are:
38
39 += -= *= /= **= //= %=
40 &= |= ^= <<= >>=
41
42They are used with `setvar` and `setglobal`. For example:
43
44 setvar x += 2
45
46is the same as:
47
48 setvar x = x + 2
49
50Likewise, these are the same:
51
52 setglobal a[i] -= 1
53
54 setglobal a[i] = a[i] - 1
55
56## Literals
57
58### atom-literal
59
60YSH uses JavaScript-like spellings for these three "atoms":
61
62 null # type Null
63 true false # type Bool
64
65Note: to signify "no value", you may sometimes use an empty string `''`,
66instead of `null`.
67
68### int-literal
69
70Examples of integer literals:
71
72 var decimal = 42
73 var big = 42_000
74
75 var hex = 0x0010_ffff
76
77 var octal = 0o755
78
79 var binary = 0b0001_0000
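
For comparison, Python accepts the same spellings for these literals, including `_` as a digit separator. The sketch below is Python, not YSH, and assumes the YSH literals denote the same integers, which the shared syntax suggests.

```python
# Python comparison (not YSH): same literal spellings as the examples above.
decimal = 42
big = 42_000
hex_value = 0x0010_ffff
octal = 0o755
binary = 0b0001_0000

values = [decimal, big, hex_value, octal, binary]
print(values)  # [42, 42000, 1114111, 493, 16]
```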
80
81### float-lit
82
83Examples of float literals:
84
85 var myfloat = 3.14
86
87 var f2 = -1.5e-100
88
89### char-literal
90
91Three kinds of unquoted backslash escapes are allowed in expression mode. They
92match what's available in quoted J8-style strings:
93
94 var backslash = \\
95 var quotes = \' ++ \" # same as u'\'' ++ '"'
96
97 var mu = \u{3bc} # same as u'\u{3bc}'
98
99 var nul = \y00 # same as b'\y00'
100
101### ysh-string
102
103YSH has single and double-quoted strings borrowed from Bourne shell, and
104C-style strings borrowed from J8 Notation.
105
106Double quoted strings respect `$` interpolation:
107
108 var dq = "hello $world and $(hostname)"
109
110You can add a `$` before the left quote to be explicit: `$"x is $x"` rather
111than `"x is $x"`.
112
113Single quoted strings may be raw:
114
115 var s = r'line\n' # raw string means \n is literal, NOT a newline
116
117Or *J8 strings* with backslash escapes:
118
119 var s = u'line\n \u{3bc}' # unicode string means \n is a newline
120 var s = b'line\n \u{3bc} \yff' # same thing, but also allows bytes
121
122Both `u''` and `b''` strings evaluate to the single `Str` type. The difference
123is that `b''` strings allow the `\yff` byte escape.
124
125#### Notes
126
127There's no way to express a single quote in raw strings. Use one of the other
128forms instead:
129
130 var sq = "single quote: ' "
131 var sq = u'single quote: \' '
132
133Sometimes you can omit the `r`, e.g. where there are no backslashes and thus no
134ambiguity:
135
136 echo 'foo'
137 echo r'foo' # same thing
138
139The `u''` and `b''` strings are called *J8 strings* because the syntax in YSH
140**code** matches JSON-like **data**.
141
142 var strU = u'mu = \u{3bc}' # J8 string with escapes
143 var strB = b'bytes \yff' # J8 string that can express byte strings
144
145More examples:
146
147 var myRaw = r'[a-z]\n' # raw strings can be used for regexes (not
148 # eggexes)
149
150### triple-quoted
151
152Triple-quoted string literals have leading whitespace stripped on each line.
153They come in the same variants:
154
155 var dq = """
156 hello $world and $(hostname)
157 no leading whitespace
158 """
159
160 var myRaw = r'''
161 raw string
162 no leading whitespace
163 '''
164
165 var strU = u'''
166 string that happens to be unicode \u{3bc}
167 no leading whitespace
168 '''
169
170 var strB = b'''
171 string that happens to be bytes \u{3bc} \yff
172 no leading whitespace
173 '''
174
175Again, you can omit the `r` prefix if there's no backslash, because it's not
176ambiguous:
177
178 var myRaw = '''
179 raw string
180 no leading whitespace
181 '''
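
As a rough analogy only, the effect of stripping leading whitespace can be sketched in Python with `textwrap.dedent`. This is not how YSH implements it, and the exact YSH rule (for example, which line determines the indentation) is an assumption not covered here.

```python
import textwrap

# A Python triple-quoted string keeps the indentation, unlike the YSH form above.
raw = """
    raw string
    no leading whitespace
    """

# Drop the newline after the opening quotes, then remove the common indentation.
stripped = textwrap.dedent(raw[1:])
print(repr(stripped))
```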
182
183### str-template
184
185String templates use the same syntax as double-quoted strings:
186
187 var mytemplate = ^"name = $name, age = $age"
188
189Related topics:
190
191- [Str => replace](chap-type-method.html#replace)
192- [ysh-string](chap-expr-lang.html#ysh-string)
193
194### list-literal
195
196Lists have a Python-like syntax:
197
198 var mylist = ['one', 'two', [42, 43]]
199
200And a shell-like syntax:
201
202 var list2 = :| one two |
203
204The shell-like syntax accepts the same syntax as a simple command:
205
206 ls $mystr @ARGV *.py {foo,bar}@example.com
207
208 # Rather than executing ls, evaluate words into a List
209 var cmd = :| ls $mystr @ARGV *.py {foo,bar}@example.com |
210
211### dict-literal
212
213Dicts look like JavaScript.
214
215 var d = {
216 key1: 'value', # key can be unquoted if it looks like a var name
217 'key2': 42, # or quote it
218
219 ['key2' ++ suffix]: 43, # bracketed expression
220 }
221
222Omitting a value means that the corresponding key takes the value of a var of
223the same name:
224
225 ysh$ var x = 42
226 ysh$ var y = 43
227
228 ysh$ var d = {x, y} # values omitted
229 ysh$ = d
230 (Dict) {x: 42, y: 43}
231
232### range
233
234A Range is a sequence of numbers that can be iterated over. The `..<` operator
235constructs half-open ranges.
236
237 for i in (0 ..< 3) {
238 echo $i
239 }
240 => 0
241 => 1
242 => 2
243
244The `..=` operator constructs closed ranges:
245
246 for i in (0 ..= 3) {
247 echo $i
248 }
249 => 0
250 => 1
251 => 2
252 => 3
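
The difference between the two operators can be modeled with Python's `range`, which is always half-open. The helper names below are hypothetical and exist only for this sketch.

```python
def half_open(start, end):
    """Model of `start ..< end`: includes start, excludes end."""
    return list(range(start, end))

def closed(start, end):
    """Model of `start ..= end`: includes both endpoints."""
    return list(range(start, end + 1))

print(half_open(0, 3))  # [0, 1, 2]
print(closed(0, 3))     # [0, 1, 2, 3]
```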
253
254### block-expr
255
256In YSH expressions, we use `^()` to create a [Command][] object:
257
258 var myblock = ^(echo $PWD; ls *.txt)
259
260It's more common for [Command][] objects to be created with block arguments,
261which are not expressions:
262
263 cd /tmp {
264 echo $PWD
265 ls *.txt
266 }
267
268[Command]: chap-type-method.html#Command
269
270### expr-literal
271
272An expression literal creates an [Expr][] object, which holds an unevaluated expression:
273
274 var myexpr = ^[1 + 2*3]
275
276[Expr]: chap-type-method.html#Expr
277
278## Operators
279
280### op-precedence
281
282YSH operator precedence is identical to Python's operator precedence.
283
284New operators:
285
286- `++` has the same precedence as `+`
287- `->` and `=>` have the same precedence as `.`
288
289<!-- TODO: show grammar -->
290
291
292<h3 id="concat">concat <code>++</code></h3>
293
294The concatenation operator works on `Str` objects:
295
296 ysh$ var s = 'hello'
297 ysh$ var t = s ++ ' world'
298
299 ysh$ = t
300 (Str) "hello world"
301
302and `List` objects:
303
304 ysh$ var L = ['one', 'two']
305 ysh$ var M = L ++ ['three', '4']
306
307 ysh$ = M
308 (List) ["one", "two", "three", "4"]
309
310String interpolation can be nicer than `++`:
311
312 var t2 = "${s} world" # same as t
313
314Likewise, splicing lists can be nicer:
315
316 var M2 = :| @L three 4 | # same as M
317
318### ysh-equals
319
320YSH has strict equality:
321
322 a === b # Python-like, without type conversion
323 a !== b # negated
324
325And type converting equality:
326
327 '3' ~== 3 # True, type conversion
328
329The `~==` operator expects a string as the left operand.
330
331---
332
333Note that:
334
335- `3 === 3.0` is false because integers and floats are different types, and
336 there is no type conversion.
337- `3 ~== 3.0` is an error, because the left operand isn't a string.
338
339You may want to use explicit `int()` and `float()` to convert numbers, and then
340compare them.
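
The rules above can be modeled in Python, where `3 == 3.0` is true and so differs from `===`. This is a sketch with hypothetical helper names; it covers only the cases stated in this section (a `Str` compared with an `Int`) and makes no claim about other operand types.

```python
def strict_equals(a, b):
    """Model of `a === b`: values of different types are never equal."""
    return type(a) is type(b) and a == b

def approx_equals(left, right):
    """Model of `left ~== right`, limited to the Str ~== Int case."""
    if not isinstance(left, str):
        raise TypeError("~== expects a string as the left operand")
    if isinstance(right, int) and not isinstance(right, bool):
        return int(left) == right
    raise NotImplementedError("this sketch only models Str ~== Int")

print(strict_equals(3, 3.0))   # False
print(approx_equals('3', 3))   # True
```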
341
342---
343
344Compare objects for identity with `is`:
345
346 ysh$ var d = {}
347 ysh$ var e = d
348
349 ysh$ = e is d
350 (Bool) true
351
352 ysh$ = d is {other: 'dict'}
353 (Bool) false
354
355To negate `is`, use `is not` (like Python):
356
357 ysh$ = d is not {other: 'dict'}
358 (Bool) true
359
360### ysh-in
361
362The `in` operator tests if a key is in a dictionary:
363
364 var d = {k: 42}
365 if ('k' in d) {
366 echo yes
367 } # => yes
368
369Unlike Python, `in` doesn't work on `Str` and `List` instances. This is because
370those operations take linear time rather than constant time (O(n) rather than
371O(1)).
372
373TODO: Use `includes() / contains()` methods instead.
374
375### ysh-compare
376
377The comparison operators apply to integers or floats:
378
379 4 < 4 # => false
380 4 <= 4 # => true
381
382 5.0 > 5.0 # => false
383 5.0 >= 5.0 # => true
384
385Example in context:
386
387 if (x < 0) {
388 echo 'x is negative'
389 }
390
391### ysh-logical
392
393The logical operators take boolean operands, and are spelled like Python:
394
395 not
396 and or
397
398Note that they are distinct from `! && ||`, which are part of the [command
399language](chap-cmd-lang.html).
400
401### ysh-arith
402
403YSH supports most of the arithmetic operators from Python. Notably, `//` and `%`
404differ from Python because [they round toward zero, not negative
405infinity](https://www.oilshell.org/blog/2024/03/release-0.21.0.html#integers-dont-do-whatever-python-or-c-does).
406
407Use `+ - *` for `Int` or `Float` addition, subtraction and multiplication. If
408either operand is a `Float`, then the result is also a `Float`.
409
410Use `/` and `//` for `Float` division and `Int` division, respectively. `/`
411_always_ results in a `Float`, while `//` _always_ results in an
412`Int`.
413
414 = 1 / 2 # => (Float) 0.5
415 = 1 // 2 # => (Int) 0
416
417Use `%` to compute the _remainder_ of integer division. The left operand must
418be an `Int` and the right a _positive_ `Int`.
419
420 = 1 % 2 # => (Int) 1
421 = -4 % 2 # => (Int) 0
422
423Use `**` for exponentiation. The left operand must be an `Int` and the right a
424_positive_ `Int`.
425
426All arithmetic operators may coerce either of their operands from strings to a
427number, provided those strings are formatted as numbers.
428
429 = 10 + '1' # => (Int) 11
430
431Operators like `+ - * /` will coerce strings to _either_ an `Int` or `Float`.
432However, operators like `// ** %` and bit shifts will coerce strings _only_ to
433an `Int`.
434
435 = '1.14' + '2' # => (Float) 3.14
436 = '1.14' % '2' # Type Error: Left operand is a Str
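
To make "round toward zero" concrete, here is a Python sketch contrasting Python's own `//` and `%` (which round toward negative infinity) with a truncating model. The helper names are hypothetical, and the remainder taking the sign of the left operand is an assumption that follows from truncation, not something this section states.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division that rounds toward zero (Python's // rounds down)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def trunc_rem(a: int, b: int) -> int:
    """Remainder consistent with trunc_div: a == b * trunc_div(a, b) + trunc_rem(a, b)."""
    return a - b * trunc_div(a, b)

print(-7 // 2, trunc_div(-7, 2))  # -4 -3
print(-7 % 2, trunc_rem(-7, 2))   # 1 -1
```

For non-negative operands the two conventions agree, so the difference only shows up with negative numbers.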
437
438### ysh-bitwise
439
440Bitwise operators are like Python and C:
441
442 ~ # unary complement
443
444 & | ^ # binary and, or, xor
445
446 >> << # bit shift
447
448### ysh-ternary
449
450The ternary operator is borrowed from Python:
451
452 var display = 'yes' if len(s) else 'empty'
453
454### ysh-index
455
456`Str` objects can be indexed by byte:
457
458 ysh$ var mystr = 'cat'
459 ysh$ = mystr[1]
460 (Str) 'a'
461
462 ysh$ = mystr[-1] # index from the end
463 (Str) 't'
464
465`List` objects:
466
467 ysh$ var mylist = [1, 2, 3]
468 ysh$ = mylist[2]
469 (Int) 3
470
471`Dict` objects are indexed by string key:
472
473 ysh$ var mydict = {'key': 42}
474 ysh$ = mydict['key']
475 (Int) 42
476
477### ysh-attr
478
479The `.` operator looks up values on either `Dict` or `Obj` instances.
480
481On dicts, it looks for the value associated with a key. That is, the
482expression `mydict.key` is short for `mydict['key']` (like JavaScript, but
483unlike Python.)
484
485---
486
487On objects, the expression `obj.x` looks for attributes, with a special rule
488for bound methods. The rules are:
489
4901. Search the properties of `obj` for a field named `x`.
491 - If it exists, return the value literally. (It can be of any type: `Func`, `Int`,
492 `Str`, ...)
4932. Search up the prototype chain for a field named `x`.
494 - If it exists, and is **not** a `Func`, return the value literally.
495 - If it **is** a `Func`, return a **bound method**, which is an (object,
496 function) pair.
497
498Later, when the bound method is called, the object is passed as the first
499argument to the function (`self`), making it a method call. This is how a
500method has access to the object's properties.
501
502Example of first rule:
503
504 func Free(i) {
505 return (i + 1)
506 }
507 var module = Object(null, {Free})
508 echo $[module.Free(42)] # => 43
509
510Example of second rule:
511
512 func method(self, i) {
513 return (self.n + i)
514 }
515 var methods = Object(null, {method})
516 var obj = Object(methods, {n: 10})
517 echo $[obj.method(42)] # => 52
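
The two lookup rules can be sketched as a small Python model. This is not the Oils implementation; the `Obj` class and `get_attr` function are hypothetical stand-ins written only to mirror the rules and the two examples above.

```python
class Obj:
    """Minimal stand-in for a YSH Obj: a prototype link plus own properties."""
    def __init__(self, prototype, props):
        self.prototype = prototype
        self.props = props

def get_attr(obj, name):
    # Rule 1: own properties are returned literally, even if callable.
    if name in obj.props:
        return obj.props[name]
    # Rule 2: walk up the prototype chain.
    proto = obj.prototype
    while proto is not None:
        if name in proto.props:
            value = proto.props[name]
            if callable(value):
                # Bound method: pairs the original object with the function.
                return lambda *args: value(obj, *args)
            return value
        proto = proto.prototype
    raise AttributeError(name)

def free(i):
    return i + 1

def method(self, i):
    return get_attr(self, "n") + i

module = Obj(None, {"Free": free})
obj = Obj(Obj(None, {"method": method}), {"n": 10})

result_free = get_attr(module, "Free")(42)    # rule 1: no self is passed
result_bound = get_attr(obj, "method")(42)    # rule 2: obj is passed as self
print(result_free, result_bound)  # 43 52
```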
518
519### ysh-slice
520
521Slicing gives you a subsequence of a `Str` or `List`, as in Python.
522
523Negative indices are relative to the end.
524
525String example:
526
527 $ var s = 'spam eggs'
528 $ pp (s[1:-1])
529 (Str) "pam egg"
530
531 $ echo "x $[s[2:]]"
532 x am eggs
533
534List example:
535
536 $ var foods = ['ale', 'bean', 'corn']
537 $ pp (foods[-2:])
538 (List) ["bean","corn"]
539
540 $ write -- @[foods[:2]]
541 ale
542 bean
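
Since the slicing rules follow Python, the same expressions can be checked there. One caveat for this sketch: Python slices `str` by code point, so the comparison is only meant for ASCII strings like these.

```python
s = 'spam eggs'
inner = s[1:-1]     # drop the first and last characters
tail = s[2:]        # from index 2 to the end

foods = ['ale', 'bean', 'corn']
last_two = foods[-2:]
first_two = foods[:2]

print(inner)      # pam egg
print(tail)       # am eggs
print(last_two)   # ['bean', 'corn']
print(first_two)  # ['ale', 'bean']
```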
543
544### func-call
545
546A function call expression looks like Python:
547
548 ysh$ = f('s', 't', named=42)
549
550A semicolon `;` can be used after positional args and before named args, but
551isn't always required:
552
553 ysh$ = f('s', 't'; named=42)
554
555In these cases, the `;` is necessary:
556
557 ysh$ = f(...args; ...kwargs)
558
559 ysh$ = f(42, 43; ...kwargs)
560
561### thin-arrow
562
563The thin arrow is for mutating methods:
564
565 var mylist = ['bar']
566 call mylist->pop()
567
568 var mydict = {name: 'foo'}
569 call mydict->erase('name')
570
571On `Obj` instances, `obj->mymethod` looks up the prototype chain for a function
572named `M/mymethod`. The `M/` prefix signals mutation.
573
574Example:
575
576 func inc(self, n) {
577 setvar self.i += n
578 }
579 var Counter_methods = Object(null, {'M/inc': inc})
580 var c = Object(Counter_methods, {i: 0})
581
582 call c->inc(5)
583 echo $[c.i] # => 5
584
585The `->` operator does **not** look in the properties of an object.
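
A Python model of this lookup, under the assumption that the function found is bound to the object in the same way as for `.`. The `Obj` class and `thin_arrow` function are hypothetical names for this sketch, not part of Oils.

```python
class Obj:
    """Minimal stand-in for a YSH Obj: a prototype link plus own properties."""
    def __init__(self, prototype, props):
        self.prototype = prototype
        self.props = props

def thin_arrow(obj, name):
    """Model of `obj->name`: search only the prototype chain for 'M/' + name."""
    key = "M/" + name
    proto = obj.prototype  # own properties are skipped on purpose
    while proto is not None:
        if key in proto.props:
            func = proto.props[key]
            return lambda *args: func(obj, *args)  # bind obj as `self`
        proto = proto.prototype
    raise AttributeError(key)

def inc(self, n):
    self.props["i"] += n

counter_methods = Obj(None, {"M/inc": inc})
c = Obj(counter_methods, {"i": 0})

thin_arrow(c, "inc")(5)
print(c.props["i"])  # 5
```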
586
587### fat-arrow
588
589The fat arrow is for transforming methods:
590
591 if (s => startsWith('prefix')) {
592 echo 'yes'
593 }
594
595If the method lookup on `s` fails, it looks for free functions. This means it
596can be used for "chaining" transformations:
597
598 var x = myFunc() => list() => join()
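
The fallback from methods to free functions can be modeled in Python as follows. This is a sketch: `fat_arrow` and the `free_funcs` table are hypothetical, and passing the left-hand value as the first argument of the free function is an assumption of the model. Python's `startswith` stands in for `startsWith`.

```python
def fat_arrow(value, name, free_funcs, *args):
    """Model of `value => name(args)`: try a method first, then a free function."""
    method = getattr(value, name, None)
    if callable(method):
        return method(*args)
    # Assumption: the free function receives the left-hand value first.
    return free_funcs[name](value, *args)

free_funcs = {"list": list, "join": lambda items: "".join(items)}

has_prefix = fat_arrow("prefix-rest", "startswith", free_funcs, "prefix")

# Chaining: neither tuple nor list has these methods, so free functions are used.
as_list = fat_arrow(("a", "b"), "list", free_funcs)
chained = fat_arrow(as_list, "join", free_funcs)

print(has_prefix, chained)  # True ab
```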
599
600### match-ops
601
602YSH has four pattern matching operators: `~ !~ ~~ !~~`.
603
604Does a string match an **eggex**?
605
606 var filename = 'x42.py'
607 if (filename ~ / d+ /) {
608 echo 'number'
609 }
610
611Does a string match a POSIX regular expression (ERE syntax)?
612
613 if (filename ~ '[[:digit:]]+') {
614 echo 'number'
615 }
616
617Negate the result with the `!~` operator:
618
619 if (filename !~ /space/ ) {
620 echo 'no space'
621 }
622
623 if (filename !~ '[[:space:]]' ) {
624 echo 'no space'
625 }
626
627Does a string match a **glob**?
628
629 if (filename ~~ '*.py') {
630 echo 'Python'
631 }
632
633 if (filename !~~ '*.py') {
634 echo 'not Python'
635 }
636
637Take care not to confuse glob patterns and regular expressions.
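
To illustrate how the two pattern languages diverge, this Python sketch applies one pattern text both ways, using `fnmatch` for globs and `re` for regular expressions. Python's `re` syntax is not POSIX ERE, so this only shows the general point, not YSH's exact matching behavior.

```python
import fnmatch
import re

pattern = "x*.py"
name = "happy"

# As a glob: literal x, then anything, then the literal suffix ".py".
as_glob = fnmatch.fnmatchcase(name, pattern)

# As a regex: zero or more x, then any one character, then "py".
as_regex = re.search(pattern, name) is not None

print(as_glob, as_regex)  # False True
```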
638
639- Related doc: [YSH Regex API](../ysh-regex-api.html)
640
641## Eggex
642
643### re-literal
644
645An eggex literal looks like this:
646
647 / expression ; flags ; translation preference /
648
649The flags and translation preference are both optional.
650
651Examples:
652
653 var pat = / d+ / # => [[:digit:]]+
654
655You can specify flags passed to libc `regcomp()`:
656
657 var pat = / d+ ; reg_icase reg_newline /
658
659You can specify a translation preference after a second semi-colon:
660
661 var pat = / d+ ; ; ERE /
662
663Right now the translation preference does nothing. It could be used to
664translate eggex to PCRE or Python syntax.
665
666- Related doc: [Egg Expressions](../eggex.html)
667
668### re-primitive
669
670There are two kinds of eggex primitives.
671
672"Zero-width assertions" match a position rather than a character:
673
674 %start # translates to ^
675 %end # translates to $
676
677Literal characters appear within **single** quotes:
678
679 'oh *really*' # translates to regex-escaped string
680
681Double-quoted strings are **not** eggex primitives. Instead, you can use
682splicing of strings:
683
684 var dq = "hi $name"
685 var eggex = / @dq /
686
687### class-literal
688
689An eggex character class literal specifies a set. It can have individual
690characters and ranges:
691
692 [ 'x' 'y' 'z' a-f A-F 0-9 ] # 3 chars, 3 ranges
693
694Omit quotes on ASCII characters:
695
696 [ x y z ] # avoid typing 'x' 'y' 'z'
697
698Sets of characters can be written as strings:
699
700 [ 'xyz' ] # any of 3 chars, not a sequence of 3 chars
701
702Backslash escapes are respected:
703
704 [ \\ \' \" \0 ]
705 [ \xFF \u{3bc} ]
706
707(Note that we don't use `\yFF`, as in J8 strings.)
708
709Splicing:
710
711 [ @str_var ]
712
713Negation always uses `!`:
714
715 ![ a-f A-F 'xyz' @str_var ]
716
717### named-class
718
719Perl-like shortcuts for sets of characters:
720
721 [ dot ] # => .
722 [ digit ] # => [[:digit:]]
723 [ space ] # => [[:space:]]
724 [ word ] # => [[:alpha:]][[:digit:]]_
725
726Abbreviations:
727
728 [ d s w ] # Same as [ digit space word ]
729
730Valid POSIX classes:
731
732 alnum cntrl lower space
733 alpha digit print upper
734 blank graph punct xdigit
735
736Negated:
737
738 !digit !space !word
739 !d !s !w
740 !alnum # etc.
741
742### re-repeat
743
744Eggex repetition looks like POSIX syntax:
745
746 / 'a'? / # zero or one
747 / 'a'* / # zero or more
748 / 'a'+ / # one or more
749
750Counted repetitions:
751
752 / 'a'{3} / # exactly 3 repetitions
753 / 'a'{2,4} / # between 2 and 4 repetitions
754
755### re-compound
756
757Sequence expressions with a space:
758
759 / word digit digit / # Matches 3 characters in sequence
760 # Examples: a42, b51
761
762(Compare `/ [ word digit ] /`, which is a set matching 1 character.)
763
764Alternation with `|`:
765
766 / word | digit / # Matches 'a' OR '9', for example
767
768Grouping with parentheses:
769
770 / (word digit) | \\ / # Matches a9 or \
771
772### re-capture
773
774To retrieve a substring of a string that matches an Eggex, use a "capture
775group" like `<capture ...>`.
776
777Here's an eggex with a **positional** capture:
778
779 var pat = / 'hi ' <capture d+> / # access with _group(1)
780 # or Match => group(1)
781
782Captures can be **named**:
783
784 <capture d+ as month> # access with _group('month')
785 # or Match => group('month')
786
787Captures can also have a type **conversion func**:
788
789 <capture d+ : int> # _group(1) returns Int
790
791 <capture d+ as month: int> # _group('month') returns Int
792
793Related docs and help topics:
794
795- [YSH Regex API](../ysh-regex-api.html)
796- [`_group()`](chap-builtin-func.html#_group)
797- [`Match => group()`](chap-type-method.html#group)
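
For readers who know Python's `re` module, the three kinds of capture have close analogs there. The sketch below is Python, not eggex; the conversion is applied by hand after matching, which is only an approximation of a conversion func attached to the capture.

```python
import re

# Positional capture, analogous to / 'hi ' <capture d+> /
m = re.search(r"hi ([0-9]+)", "oh hi 42")
first = m.group(1)

# Named capture with a conversion applied afterwards,
# analogous to <capture d+ as month: int>
m2 = re.search(r"(?P<month>[0-9]+)", "month 03")
month = int(m2.group("month"))

print(first, month)  # 42 3
```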
798
799### re-splice
800
801To build an eggex out of smaller expressions, you can **splice** eggexes
802together:
803
804 var D = / [0-9][0-9] /
805 var time = / @D ':' @D / # [0-9][0-9]:[0-9][0-9]
806
807If the variable begins with a capital letter, you can omit `@`:
808
809 var ip = / D ':' D /
810
811You can also splice a string:
812
813 var greeting = 'hi'
814 var pat = / @greeting ' world' / # hi world
815
816Splicing is **not** string concatenation; it works on eggex subtrees.
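
The distinction matters most when the spliced part contains alternation. This Python sketch shows what goes wrong with plain string concatenation of regex text, and how treating the spliced part as one unit avoids it. How YSH actually emits the translated pattern is not shown here and is not assumed.

```python
import re

alternation = "ab|cd"  # stands in for an eggex like / 'ab' | 'cd' /

# String concatenation: the x attaches to the 'cd' branch only.
concatenated = alternation + "x"            # 'ab|cdx'

# Subtree-style composition: the alternation stays one unit.
as_subtree = "(" + alternation + ")" + "x"  # '(ab|cd)x'

naive_matches = re.fullmatch(concatenated, "abx") is not None
subtree_matches = re.fullmatch(as_subtree, "abx") is not None
print(naive_matches, subtree_matches)  # False True
```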
817
818### re-flags
819
820Valid ERE flags, which are passed to libc's `regcomp()`:
821
822- `reg_icase` aka `i` - ignore case
823- `reg_newline` - changes 4 matching rules for newlines (`.`, negated classes, `^`, `$`)
824
825See `man regcomp`.
826
827### re-multiline
828
829Multi-line eggexes aren't yet implemented. Splicing makes it less necessary:
830
831 var Name = / <capture [a-z]+ as name> /
832 var Num = / <capture d+ as num> /
833 var Space = / <capture s+ as space> /
834
835 # For variables named like CapWords, splicing @Name doesn't require @
836 var lexer = / Name | Num | Space /