doc_src/string: Add a small regex reference

This isn't nearly all of it (https://pcre.org/current/doc/html/pcre2syntax.html), but it should cover the most-used features. [ci skip]
2025-03-15 23:22:53 +08:00 · 2018-12-01 09:54:05 +01:00 · 2018-12-01 09:54:05 +01:00 · 1785af156b
commit 1785af156b
parent d20b3c688b
1 changed files with 44 additions and 0 deletions
--- a/doc_src/string.txt
+++ b/doc_src/string.txt
@@ -126,6 +126,50 @@ Both the `match` and `replace` subcommand support regular expressions when used

 In general, special characters are special by default, so `a+` matches one or more "a"s, while `a\+` matches an "a" and then a "+". `(a+)` matches one or more "a"s in a capturing group (`(?:XXXX)` denotes a non-capturing group). For the replacement parameter of `replace`, `$n` refers to the n-th group of the match. In the match parameter, `\n` (e.g. `\1`) refers back to groups.

+Repetitions:
+- `*` matches 0 or more repetitions of the previous expression
+- `+` matches 1 or more
+- `?` matches 0 or 1
+- `{n}` matches exactly n (where n is a number)
+- `{n,m}` matches at least n, but no more than m
+- `{n,}` matches n or more
+
+Some of the more important character classes:
+- `.` any character except newline
+- `\d` a decimal digit and `\D`, not a decimal digit
+- `\s` whitespace and `\S`, not whitespace
+- `\w` a "word" character and `\W`, a "non-word" character
+- `[...]` (where "..." is some characters) is a character set
+- `[^...]` is the inverse of the given character set
+- `[x-y]` is the range of characters from x to y
+- `[[:xxx:]]` is a named character set
+- `[[:^xxx:]]` is the inverse of a named character set
+- `[[:alnum:]]`  : "alphanumeric"
+- `[[:alpha:]]`  : "alphabetic"
+- `[[:ascii:]]`  : "0-127"
+- `[[:blank:]]`  : "space or tab"
+- `[[:cntrl:]]`  : "control character"
+- `[[:digit:]]`  : "decimal digit"
+- `[[:graph:]]`  : "printing, excluding space"
+- `[[:lower:]]`  : "lower case letter"
+- `[[:print:]]`  : "printing, including space"
+- `[[:punct:]]`  : "printing, excluding alphanumeric"
+- `[[:space:]]`  : "white space"
+- `[[:upper:]]`  : "upper case letter"
+- `[[:word:]]`   : "same as \w"
+- `[[:xdigit:]]` : "hexadecimal digit"
+
+Groups:
+- `(...)` is a capturing group
+- `(?:...)` is a non-capturing group
+- `\n` is a backreference (where n is the number of the group, starting with 1)
+- `$n` is a reference from the replacement expression to a group in the match expression.
+
+And some other things:
+- `\b` denotes a word boundary, `\B` is not a word boundary.
+- `^` is the start of the string or line, `$` the end.
+- `|` is "alternation", i.e. the "or".
+
 \subsection string-example Examples

 \fish{cli-dark}