---
title: YSH Expression Language (Oils Reference)
all_docs_url: ..
body_css_class: width40
default_highlighter: oils-sh
preserve_anchor_case: yes
---

<div class="doc-ref-header">

[Oils Reference](index.html) —
Chapter **YSH Expression Language**

</div>

This chapter describes the YSH expression language, which includes [Egg
Expressions]($xref:eggex).

<div id="dense-toc">
</div>

## Assignment

### assign

The `=` operator is used with assignment keywords:

    var x = 42
    setvar x = 43

    const y = 'k'

    setglobal z = 'g'

### aug-assign

The augmented assignment operators are:

    +=   -=   *=   /=   **=   //=   %=
    &=   |=   ^=   <<=   >>=

They are used with `setvar` and `setglobal`. For example:

    setvar x += 2

is the same as:

    setvar x = x + 2

Likewise, these are the same:

    setglobal a[i] -= 1

    setglobal a[i] = a[i] - 1

## Literals

### atom-literal

YSH uses JavaScript-like spellings for these three "atoms":

    null           # type Null
    true false     # type Bool

Note: to signify "no value", you may sometimes use an empty string `''`,
instead of `null`.

### int-literal

Examples of integer literals:

    var decimal = 42
    var big = 42_000

    var hex = 0x0010_ffff

    var octal = 0o755

    var binary = 0b0001_0000

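Python accepts the same spellings for these integer literals, including `_` digit separators and the `0x`, `0o`, and `0b` prefixes, so it can serve as a quick check of the value each one denotes. This is an illustrative sketch that assumes YSH reads the literals the way Python does. It is not YSH code.

```python
# Python accepts the same integer literal spellings as the YSH examples above.
decimal = 42
big = 42_000           # underscores are digit separators
hex_val = 0x0010_ffff  # 0x10ffff, the largest Unicode code point
octal = 0o755          # the familiar rwxr-xr-x file mode
binary = 0b0001_0000

print(decimal, big, hex_val, octal, binary)  # 42 42000 1114111 493 16
```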
### float-lit

Examples of float literals:

    var myfloat = 3.14

    var f2 = -1.5e-100

### char-literal

Three kinds of unquoted backslash escapes are allowed in expression mode. They
match what's available in quoted J8-style strings:

    var backslash = \\
    var quotes = \' ++ \"    # same as u'\'' ++ '"'

    var mu = \u{3bc}         # same as u'\u{3bc}'

    var nul = \y00           # same as b'\y00'

### ysh-string

YSH has single and double-quoted strings borrowed from Bourne shell, and
C-style strings borrowed from J8 Notation.

Double-quoted strings respect `$` interpolation:

    var dq = "hello $world and $(hostname)"

You can add a `$` before the left quote to be explicit: `$"x is $x"` rather
than `"x is $x"`.

Single-quoted strings may be raw:

    var s = r'line\n'    # raw string means \n is literal, NOT a newline

Or *J8 strings* with backslash escapes:

    var s = u'line\n \u{3bc}'        # unicode string means \n is a newline
    var s = b'line\n \u{3bc} \yff'   # same thing, but also allows bytes

Both `u''` and `b''` strings evaluate to the single `Str` type. The difference
is that `b''` strings allow the `\yff` byte escape.

#### Notes

There's no way to express a single quote in raw strings. Use one of the other
forms instead:

    var sq = "single quote: ' "
    var sq = u'single quote: \' '

Sometimes you can omit the `r`, e.g. where there are no backslashes and thus no
ambiguity:

    echo 'foo'
    echo r'foo'  # same thing

The `u''` and `b''` strings are called *J8 strings* because the syntax in YSH
**code** matches JSON-like **data**.

    var strU = u'mu = \u{3bc}'  # J8 string with escapes
    var strB = b'bytes \yff'    # J8 string that can express byte strings

More examples:

    var myRaw = r'[a-z]\n'      # raw strings can be used for regexes (not
                                # eggexes)

### triple-quoted

Triple-quoted string literals have leading whitespace stripped on each line.
They come in the same variants:

    var dq = """
        hello $world and $(hostname)
        no leading whitespace
        """

    var myRaw = r'''
        raw string
        no leading whitespace
        '''

    var strU = u'''
        string that happens to be unicode \u{3bc}
        no leading whitespace
        '''

    var strB = b'''
        string that happens to be bytes \u{3bc} \yff
        no leading whitespace
        '''

Again, you can omit the `r` prefix if there's no backslash, because it's not
ambiguous:

    var myRaw = '''
        raw string
        no leading whitespace
        '''

### str-template

String templates use the same syntax as double-quoted strings:

    var mytemplate = ^"name = $name, age = $age"

Related topics:

- [Str => replace](chap-type-method.html#replace)
- [ysh-string](chap-expr-lang.html#ysh-string)

### list-literal

Lists have a Python-like syntax:

    var mylist = ['one', 'two', [42, 43]]

And a shell-like syntax:

    var list2 = :| one two |

The shell-like syntax accepts the same syntax as a simple command:

    ls $mystr @ARGV *.py {foo,bar}@example.com

    # Rather than executing ls, evaluate words into a List
    var cmd = :| ls $mystr @ARGV *.py {foo,bar}@example.com |

### dict-literal

Dicts look like JavaScript.

    var d = {
      key1: 'value',  # key can be unquoted if it looks like a var name
      'key2': 42,     # or quote it

      ['key2' ++ suffix]: 43,  # bracketed expression
    }

Omitting a value means that the corresponding key takes the value of a var of
the same name:

    ysh$ var x = 42
    ysh$ var y = 43

    ysh$ var d = {x, y}  # values omitted
    ysh$ = d
    (Dict)  {x: 42, y: 43}

### range

A Range is a sequence of numbers that can be iterated over. The `..<` operator
constructs half-open ranges.

    for i in (0 ..< 3) {
      echo $i
    }
    => 0
    => 1
    => 2

The `..=` operator constructs closed ranges:

    for i in (0 ..= 3) {
      echo $i
    }
    => 0
    => 1
    => 2
    => 3

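As a rough model, not the Oils implementation, the two range operators correspond to Python's `range()` with an exclusive or an inclusive upper bound. The helper names `half_open` and `closed` are hypothetical.

```python
from __future__ import annotations


def half_open(lo: int, hi: int) -> list[int]:
    """Model of `lo ..< hi`: includes lo, excludes hi."""
    return list(range(lo, hi))


def closed(lo: int, hi: int) -> list[int]:
    """Model of `lo ..= hi`: includes both endpoints."""
    return list(range(lo, hi + 1))


print(half_open(0, 3))  # [0, 1, 2]
print(closed(0, 3))     # [0, 1, 2, 3]
```

The only difference is the `+ 1` on the upper bound, which is why the closed loop above prints one more line.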
### block-expr

In YSH expressions, we use `^()` to create a [Command][] object:

    var myblock = ^(echo $PWD; ls *.txt)

It's more common for [Command][] objects to be created with block arguments,
which are not expressions:

    cd /tmp {
      echo $PWD
      ls *.txt
    }

[Command]: chap-type-method.html#Command

### expr-literal

An expression literal creates an [Expr][] object, which holds an unevaluated
expression:

    var myexpr = ^[1 + 2*3]

[Expr]: chap-type-method.html#Expr

## Operators

### op-precedence

YSH operator precedence is identical to Python's operator precedence.

New operators:

- `++` has the same precedence as `+`
- `->` and `=>` have the same precedence as `.`

<!-- TODO: show grammar -->

<h3 id="concat">concat <code>++</code></h3>

The concatenation operator works on `Str` objects:

    ysh$ var s = 'hello'
    ysh$ var t = s ++ ' world'

    ysh$ = t
    (Str)   "hello world"

and `List` objects:

    ysh$ var L = ['one', 'two']
    ysh$ var M = L ++ ['three', '4']

    ysh$ = M
    (List)  ["one", "two", "three", "4"]

String interpolation can be nicer than `++`:

    var t2 = "${s} world"  # same as t

Likewise, splicing lists can be nicer:

    var M2 = :| @L three 4 |  # same as M

### ysh-equals

YSH has strict equality:

    a === b       # Python-like, without type conversion
    a !== b       # negated

And type-converting equality:

    '3' ~== 3     # => true, with type conversion

The `~==` operator expects a string as the left operand.

---

Note that:

- `3 === 3.0` is false because integers and floats are different types, and
  there is no type conversion.
- `3 ~== 3.0` is an error, because the left operand isn't a string.

You may want to use explicit `int()` and `float()` to convert numbers, and then
compare them.

---

Compare objects for identity with `is`:

    ysh$ var d = {}
    ysh$ var e = d

    ysh$ = d is e
    (Bool)   true

    ysh$ = d is {other: 'dict'}
    (Bool)   false

To negate `is`, use `is not` (like Python):

    ysh$ = d is not {other: 'dict'}
    (Bool)   true

### ysh-in

The `in` operator tests if a key is in a dictionary:

    var d = {k: 42}
    if ('k' in d) {
      echo yes
    }  # => yes

Unlike Python, `in` doesn't work on `Str` and `List` instances. This is because
those operations take linear time rather than constant time (O(n) rather than
O(1)).

TODO: Use `includes() / contains()` methods instead.

### ysh-compare

The comparison operators apply to integers or floats:

    4 < 4       # => false
    4 <= 4      # => true

    5.0 > 5.0   # => false
    5.0 >= 5.0  # => true

Example in context:

    if (x < 0) {
      echo 'x is negative'
    }

### ysh-logical

The logical operators take boolean operands, and are spelled like Python:

    not
    and  or

Note that they are distinct from `! && ||`, which are part of the [command
language](chap-cmd-lang.html).

### ysh-arith

YSH supports most of the arithmetic operators from Python. Notably, `//` and `%`
differ from Python as [they round toward zero, not negative
infinity](https://www.oilshell.org/blog/2024/03/release-0.21.0.html#integers-dont-do-whatever-python-or-c-does).

Use `+ - *` for `Int` or `Float` addition, subtraction and multiplication. If
any of the operands are `Float`s, then the output will also be a `Float`.

Use `/` and `//` for `Float` division and `Int` division, respectively. `/`
will _always_ result in a `Float`, while `//` will _always_ result in an
`Int`.

    = 1 / 2   # => (Float) 0.5
    = 1 // 2  # => (Int) 0

Use `%` to compute the _remainder_ of integer division. The left operand must
be an `Int` and the right a _positive_ `Int`.

    = 1 % 2   # => (Int) 1
    = -4 % 2  # => (Int) 0

Use `**` for exponentiation. The left operand must be an `Int` and the right a
_positive_ `Int`.

All arithmetic operators may coerce either of their operands from strings to a
number, provided those strings are formatted as numbers.

    = 10 + '1'  # => (Int) 11

Operators like `+ - * /` will coerce strings to _either_ an `Int` or `Float`.
However, operators like `// ** %` and bit shifts will coerce strings _only_ to
an `Int`.

    = '1.14' + '2'  # => (Float) 3.14
    = '1.14' % '2'  # Type Error: Left operand is a Str

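To make the rounding rule concrete, here is a Python sketch of integer division and remainder that round toward zero, as this section describes, shown next to Python's own floor-based operators. The names `div_trunc` and `mod_trunc` are hypothetical. The sketch assumes the remainder takes the sign of the left operand, so that `a == div_trunc(a, b) * b + mod_trunc(a, b)` always holds.

```python
def div_trunc(a: int, b: int) -> int:
    """Quotient rounded toward zero (C-style), using only integer math."""
    q = abs(a) // abs(b)  # raises ZeroDivisionError when b == 0
    return q if (a < 0) == (b < 0) else -q


def mod_trunc(a: int, b: int) -> int:
    """Remainder paired with div_trunc; its sign follows the left operand."""
    return a - div_trunc(a, b) * b


print(-7 // 2, -7 % 2)                     # Python floors: -4 1
print(div_trunc(-7, 2), mod_trunc(-7, 2))  # toward zero: -3 -1
```

The two conventions agree whenever both operands are non-negative, which is why `1 // 2` and `1 % 2` above look the same as in Python.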
### ysh-bitwise

Bitwise operators are like Python and C:

    ~        # unary complement

    &  |  ^  # binary and, or, xor

    >>  <<   # bit shift

### ysh-ternary

The ternary operator is borrowed from Python:

    var display = 'yes' if len(s) else 'empty'

### ysh-index

`Str` objects can be indexed by byte:

    ysh$ var s = 'cat'
    ysh$ = s[1]
    (Str)   "a"

    ysh$ = s[-1]  # index from the end
    (Str)   "t"

`List` objects are indexed by integer:

    ysh$ var mylist = [1, 2, 3]
    ysh$ = mylist[2]
    (Int)   3

`Dict` objects are indexed by string key:

    ysh$ var mydict = {'key': 42}
    ysh$ = mydict['key']
    (Int)   42

### ysh-attr

The `.` operator looks up values on either `Dict` or `Obj` instances.

On dicts, it looks for the value associated with a key. That is, the
expression `mydict.key` is short for `mydict['key']` (like JavaScript, but
unlike Python).

---

On objects, the expression `obj.x` looks for attributes, with a special rule
for bound methods. The rules are:

1. Search the properties of `obj` for a field named `x`.
   - If it exists, return the value literally. (It can be of any type: `Func`,
     `Int`, `Str`, ...)
2. Search up the prototype chain for a field named `x`.
   - If it exists, and is **not** a `Func`, return the value literally.
   - If it **is** a `Func`, return a **bound method**, which is an (object,
     function) pair.

Later, when the bound method is called, the object is passed as the first
argument to the function (`self`), making it a method call. This is how a
method has access to the object's properties.

Example of first rule:

    func Free(i) {
      return (i + 1)
    }
    var module = Object(null, {Free})
    echo $[module.Free(42)]  # => 43

Example of second rule:

    func method(self, i) {
      return (self.n + i)
    }
    var methods = Object(null, {method})
    var obj = Object(methods, {n: 10})
    echo $[obj.method(42)]  # => 52

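The two lookup rules can be modeled in a few lines of Python. This is a sketch for illustration only: `Obj`, `BoundMethod`, and `get_attr` are hypothetical names, and Python's `callable()` stands in for checking whether a value is a `Func`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Obj:
    prototype: Optional[Obj]
    props: dict[str, Any] = field(default_factory=dict)


@dataclass
class BoundMethod:
    obj: Obj
    func: Callable[..., Any]

    def __call__(self, *args: Any) -> Any:
        # The object is passed as the first argument (self).
        return self.func(self.obj, *args)


def get_attr(obj: Obj, name: str) -> Any:
    # Rule 1: own properties are returned as-is, even if they are functions.
    if name in obj.props:
        return obj.props[name]
    # Rule 2: walk up the prototype chain.
    proto = obj.prototype
    while proto is not None:
        if name in proto.props:
            value = proto.props[name]
            # A function found on a prototype becomes a bound method.
            return BoundMethod(obj, value) if callable(value) else value
        proto = proto.prototype
    raise AttributeError(name)


# These mirror the two YSH examples above.
module = Obj(None, {"Free": lambda i: i + 1})
print(get_attr(module, "Free")(42))  # 43

methods = Obj(None, {"method": lambda self, i: self.props["n"] + i})
obj = Obj(methods, {"n": 10})
print(get_attr(obj, "method")(42))  # 52
```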
### ysh-slice

Slicing gives you a subsequence of a `Str` or `List`, as in Python.

Negative indices are relative to the end.

String example:

    $ var s = 'spam eggs'
    $ pp (s[1:-1])
    (Str)   "pam egg"

    $ echo "x $[s[2:]]"
    x am eggs

List example:

    $ var foods = ['ale', 'bean', 'corn']
    $ pp (foods[-2:])
    (List)  ["bean","corn"]

    $ write -- @[foods[:2]]
    ale
    bean

### ysh-func-call

A function call expression looks like Python:

    ysh$ = f('s', 't', named=42)

A semicolon `;` can be used after positional args and before named args, but
isn't always required:

    ysh$ = f('s', 't'; named=42)

In these cases, the `;` is necessary:

    ysh$ = f(...args; ...kwargs)

    ysh$ = f(42, 43; ...kwargs)

### thin-arrow

The thin arrow is for mutating methods:

    var mylist = ['bar']
    call mylist->pop()

    var mydict = {name: 'foo'}
    call mydict->erase('name')

On `Obj` instances, `obj->mymethod` looks up the prototype chain for a function
named `M/mymethod`. The `M/` prefix signals mutation.

Example:

    func inc(self, n) {
      setvar self.i += n
    }
    var Counter_methods = Object(null, {'M/inc': inc})
    var c = Object(Counter_methods, {i: 0})

    call c->inc(5)
    echo $[c.i]  # => 5

It does **not** look in the properties of an object.

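A similar Python sketch models `obj->name(...)`: it skips the object's own properties, searches the prototype chain for the key `M/` plus the method name, and passes the object as `self`. `Obj` and `call_mutating` are hypothetical names, not part of Oils.

```python
from __future__ import annotations

from typing import Any, Optional


class Obj:
    def __init__(self, prototype: Optional[Obj], props: dict[str, Any]):
        self.prototype = prototype
        self.props = props


def call_mutating(obj: Obj, name: str, *args: Any) -> Any:
    key = "M/" + name
    proto = obj.prototype  # own properties are deliberately skipped
    while proto is not None:
        if key in proto.props:
            return proto.props[key](obj, *args)
        proto = proto.prototype
    raise AttributeError(f"no mutating method {key!r}")


def inc(self: Obj, n: int) -> None:
    self.props["i"] += n  # like `setvar self.i += n`


counter_methods = Obj(None, {"M/inc": inc})
c = Obj(counter_methods, {"i": 0})
call_mutating(c, "inc", 5)
print(c.props["i"])  # 5
```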
### fat-arrow

The fat arrow is for transforming methods:

    if (s => startsWith('prefix')) {
      echo 'yes'
    }

If the method lookup on `s` fails, it looks for free functions. This means it
can be used for "chaining" transformations:

    var x = myFunc() => list() => join()

### match-ops

YSH has four pattern matching operators: `~ !~ ~~ !~~`.

Does a string match an **eggex**?

    var filename = 'x42.py'
    if (filename ~ / d+ /) {
      echo 'number'
    }

Does a string match a POSIX regular expression (ERE syntax)?

    if (filename ~ '[[:digit:]]+') {
      echo 'number'
    }

Negate the result with the `!~` operator:

    if (filename !~ /space/ ) {
      echo 'no space'
    }

    if (filename !~ '[[:space:]]' ) {
      echo 'no space'
    }

Does a string match a **glob**?

    if (filename ~~ '*.py') {
      echo 'Python'
    }

    if (filename !~~ '*.py') {
      echo 'not Python'
    }

Take care not to confuse glob patterns and regular expressions.

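One way to see the difference is with Python's standard `fnmatch` and `re` modules. This sketch is an analogy, not YSH. Consistent with the `x42.py` examples above, it assumes that a glob must match the whole string, while a regex search can match anywhere inside it.

```python
import fnmatch
import re

filename = "x42.py"

# Glob: '*' means "any run of characters", and the whole name must match.
print(fnmatch.fnmatchcase(filename, "*.py"))  # True

# Read as a regex, the same text is invalid: a leading '*' has nothing
# to repeat, so Python rejects it.
try:
    re.compile("*.py")
except re.error as e:
    print("not a valid regex:", e)

# The regex counterpart of the glob '*.py' escapes the dot and must
# match the whole string.
print(re.fullmatch(r".*\.py", filename) is not None)  # True

# A regex search succeeds if the pattern occurs anywhere.
print(re.search(r"[0-9]+", filename) is not None)  # True
```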
- Related doc: [YSH Regex API](../ysh-regex-api.html)

## Eggex

### re-literal

An eggex literal looks like this:

    / expression ; flags ; translation preference /

The flags and translation preference are both optional.

Examples:

    var pat = / d+ /  # => [[:digit:]]+

You can specify flags passed to libc `regcomp()`:

    var pat = / d+ ; reg_icase reg_newline /

You can specify a translation preference after a second semicolon:

    var pat = / d+ ; ; ERE /

Right now the translation preference does nothing. It could be used to
translate eggex to PCRE or Python syntax.

- Related doc: [Egg Expressions](../eggex.html)

### re-primitive

There are two kinds of eggex primitives.

"Zero-width assertions" match a position rather than a character:

    %start           # translates to ^
    %end             # translates to $

Literal characters appear within **single** quotes:

    'oh *really*'    # translates to regex-escaped string

Double-quoted strings are **not** eggex primitives. Instead, you can use
splicing of strings:

    var dq = "hi $name"
    var eggex = / @dq /

### class-literal

An eggex character class literal specifies a set. It can have individual
characters and ranges:

    [ 'x' 'y' 'z' a-f A-F 0-9 ]  # 3 chars, 3 ranges

Omit quotes on ASCII characters:

    [ x y z ]  # avoid typing 'x' 'y' 'z'

Sets of characters can be written as strings:

    [ 'xyz' ]  # any of 3 chars, not a sequence of 3 chars

Backslash escapes are respected:

    [ \\ \' \" \0 ]
    [ \xFF \u{3bc} ]

(Note that we don't use `\yFF`, as in J8 strings.)

Splicing:

    [ @str_var ]

Negation always uses `!`:

    ![ a-f A-F 'xyz' @str_var ]

### named-class

Perl-like shortcuts for sets of characters:

    [ dot ]    # => .
    [ digit ]  # => [[:digit:]]
    [ space ]  # => [[:space:]]
    [ word ]   # => [[:alpha:][:digit:]_]

Abbreviations:

    [ d s w ]  # Same as [ digit space word ]

Valid POSIX classes:

    alnum   cntrl   lower   space
    alpha   digit   print   upper
    blank   graph   punct   xdigit

Negated:

    !digit   !space   !word
    !d   !s   !w
    !alnum  # etc.

### re-repeat

Eggex repetition looks like POSIX syntax:

    / 'a'? /      # zero or one
    / 'a'* /      # zero or more
    / 'a'+ /      # one or more

Counted repetitions:

    / 'a'{3} /    # exactly 3 repetitions
    / 'a'{2,4} /  # between 2 and 4 repetitions

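Python's `re` module uses the same repetition operators, so `re.fullmatch()` can model which strings each eggex above accepts when the whole string must match. This uses Python regex syntax rather than eggex, and `accepts` is a hypothetical helper.

```python
import re


def accepts(pattern: str, s: str) -> bool:
    """True if the whole string s matches the Python regex pattern."""
    return re.fullmatch(pattern, s) is not None


for s in ["", "a", "aa", "aaa", "aaaaa"]:
    print(
        repr(s),
        accepts("a?", s),      # zero or one
        accepts("a*", s),      # zero or more
        accepts("a+", s),      # one or more
        accepts("a{3}", s),    # exactly 3
        accepts("a{2,4}", s),  # between 2 and 4
    )
```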
### re-compound

Sequence expressions with a space:

    / word digit digit /   # Matches 3 characters in sequence
                           # Examples: a42, b51

(Compare `/ [ word digit ] /`, which is a set matching 1 character.)

Alternation with `|`:

    / word | digit /       # Matches 'a' OR '9', for example

Grouping with parentheses:

    / (word digit) | \\ /  # Matches a9 or \

### re-capture

To retrieve a substring of a string that matches an Eggex, use a "capture
group" like `<capture ...>`.

Here's an eggex with a **positional** capture:

    var pat = / 'hi ' <capture d+> /  # access with _group(1)
                                      # or Match => group(1)

Captures can be **named**:

    <capture d+ as month>       # access with _group('month')
                                # or Match => group('month')

Captures can also have a type **conversion func**:

    <capture d+ : int>          # _group(1) returns Int

    <capture d+ as month: int>  # _group('month') returns Int

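For comparison, Python's `re` module has named groups but no conversion functions, so a rough analogue of `<capture d+ as month: int>` is a named group followed by an explicit `int()` call. The pattern and the `group_as` helper are hypothetical illustrations, not the YSH Regex API.

```python
from __future__ import annotations

import re
from typing import Callable

pat = re.compile(r"month (?P<month>[0-9]+)")


def group_as(m: re.Match, name: str, convert: Callable[[str], object]) -> object:
    """Fetch a named group and apply a conversion function to it."""
    return convert(m.group(name))


m = pat.search("month 12, day 3")
if m is not None:
    print(repr(m.group("month")))         # '12' (a str)
    print(repr(m.group(1)))               # '12' (positional access)
    print(group_as(m, "month", int) + 1)  # 13 (an int)
```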
Related docs and help topics:

- [YSH Regex API](../ysh-regex-api.html)
- [`_group()`](chap-builtin-func.html#_group)
- [`Match => group()`](chap-type-method.html#group)

### re-splice

To build an eggex out of smaller expressions, you can **splice** eggexes
together:

    var D = / [0-9][0-9] /
    var time = / @D ':' @D /  # [0-9][0-9]:[0-9][0-9]

If the variable begins with a capital letter, you can omit `@`:

    var ip = / D ':' D /

You can also splice a string:

    var greeting = 'hi'
    var pat = / @greeting ' world' /  # hi world

Splicing is **not** string concatenation; it works on eggex subtrees.

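To see why working on subtrees matters, compare pasting the text of an alternation into a larger pattern with wrapping it in a group first. This Python sketch assumes that splicing keeps the spliced eggex together as one unit, as the sentence above implies. The exact ERE that Oils emits may differ.

```python
import re

ab = "a|b"  # stands in for a small alternation eggex such as / 'a' | 'b' /

concatenated = ab + "c"           # "a|bc": means "a" OR "bc"
grouped = "(?:" + ab + ")" + "c"  # "(?:a|b)c": means "ac" or "bc"

for s in ["ac", "bc", "a"]:
    print(s, re.fullmatch(concatenated, s) is not None,
          re.fullmatch(grouped, s) is not None)
# ac False True
# bc True True
# a True False
```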
### re-flags

Valid ERE flags, which are passed to libc's `regcomp()`:

- `reg_icase` aka `i` - ignore case
- `reg_newline` - 4 matching changes related to newlines

See `man regcomp`.

### re-multiline

Multi-line eggexes aren't yet implemented. Splicing makes it less necessary:

    var Name  = / <capture [a-z]+ as name> /
    var Num   = / <capture d+ as num> /
    var Space = / <capture s+ as space> /

    # For variables named like CapWords, splicing @Name doesn't require @
    var lexer = / Name | Num | Space /

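For comparison, here is the lexer pattern above translated by hand into Python regex syntax, with a small tokenizer that reports which named capture matched by using `Match.lastgroup`. This is an illustrative sketch. The translation (`[0-9]+` for `d+`, `\s+` for `s+`) and the `tokenize` helper are assumptions, not output produced by Oils.

```python
from __future__ import annotations

import re

Name = r"(?P<name>[a-z]+)"
Num = r"(?P<num>[0-9]+)"
Space = r"(?P<space>\s+)"
lexer = re.compile("|".join([Name, Num, Space]))


def tokenize(text: str) -> list[tuple[str, str]]:
    """Split text into (kind, lexeme) pairs, failing on unknown input."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = lexer.match(text, pos)  # anchored at pos
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()  # every alternative consumes at least one character
    return tokens


print(tokenize("abc 42"))  # [('name', 'abc'), ('space', ' '), ('num', '42')]
```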