string-split - split strings by delimiter
=========================================

Synopsis
--------

.. BEGIN SYNOPSIS

.. synopsis::

    string split [(-f | --fields) FIELDS] [(-m | --max) MAX] [-n | --no-empty]
                 [-q | --quiet] [-r | --right] SEP [STRING ...]
    string split0 [(-f | --fields) FIELDS] [(-m | --max) MAX] [-n | --no-empty]
                  [-q | --quiet] [-r | --right] [STRING ...]

.. END SYNOPSIS

Description
-----------

.. BEGIN DESCRIPTION

``string split`` splits each *STRING* on the separator *SEP*. *SEP* may be the empty string, in which case each character becomes a separate element. If **-m** or **--max** is specified, at most *MAX* splits are done on each *STRING*. If **-r** or **--right** is given, splitting is performed right-to-left; this is mainly useful in combination with **-m** or **--max**. With **-n** or **--no-empty**, empty results are excluded (for example, splitting ``hello\n\nworld`` on a newline yields two strings rather than three). With **-q** or **--quiet**, nothing is printed and only the exit status is set. Exit status: 0 if at least one split was performed, or 1 otherwise.
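
The effect of **-m**, **-r**, **-n** and the exit status can be seen in a short sketch. The inputs (``key=value=more``, ``a,,b``, ``abc``) are arbitrary values chosen for illustration, and the comments state what each command prints::

    # At most one split, counted from the left:
    string split -m1 = key=value=more
    # prints "key" and "value=more", one per line

    # The same, counted from the right:
    string split -r -m1 = key=value=more
    # prints "key=value" and "more"

    # The empty field between the two commas is dropped by -n:
    string split -n , a,,b
    # prints "a" and "b"

    # No separator found: the input is printed unchanged and the status is 1
    string split , abc
    echo $status
    # prints "abc", then "1"
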

Use **-f** or **--fields** to print only specific fields. *FIELDS* is a comma-separated list of field numbers and/or spans such as ``3-4``. Fields are numbered from 1, and each selected field is printed on its own line. If a requested field does not exist, the command prints nothing and exits with status 1, unless **--allow-empty** is given, in which case fields that do not exist are simply omitted.
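
A sketch of field selection on an arbitrary comma-separated input. Field numbers start at 1, and the span ``4-5`` selects two fields; the comments state what each command prints::

    string split -f2,4-5 , a,b,c,d,e
    # prints "b", "d" and "e", one per line

    # Field 3 does not exist, so nothing is printed and the status is 1
    string split -f3 , a,b
    echo $status
    # prints "1"

    # With --allow-empty, the missing field is skipped instead
    string split --allow-empty -f1,3 , a,b
    # prints "a"
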

See also the **--delimiter** option of the :doc:`read <read>` command.

``string split0`` splits each *STRING* on the zero byte (NUL). It accepts the same options as ``string split``, except that no *SEP* argument is given.
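
Because the options are shared, they apply to NUL-delimited input in the same way. In this sketch the input is produced with ``string join0``, which also terminates its output with a NUL byte; the values ``a``, ``''``, ``b`` and ``x y z`` are arbitrary::

    # The empty middle element is dropped by -n
    string join0 a '' b | string split0 -n
    # prints "a" and "b"

    # Select only the second NUL-delimited field
    string join0 x y z | string split0 -f2
    # prints "y"
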

``split0`` has the important property that its output is not split further when used in a command substitution, so the substitution can produce elements that contain newlines. This is most useful with Unix tools that produce NUL-delimited output, such as ``find -print0`` or ``sort -z``. See the split0 examples below.
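
The difference is visible with ``count``. In this sketch the list ``items`` (a hypothetical name) holds two elements, the first of which contains a newline::

    set -l items one\ntwo three

    # Newline-separated output is split on every newline: prints 3
    count (string join \n $items)

    # NUL-separated output passed through string split0 stays intact: prints 2
    count (string join0 $items | string split0)
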

.. END DESCRIPTION

Examples
--------

.. BEGIN EXAMPLES

::

    >_ string split . example.com
    example
    com

    >_ string split -r -m1 / /usr/local/bin/fish
    /usr/local/bin
    fish

    >_ string split '' abc
    a
    b
    c

    >_ string split --allow-empty -f1,3-4,5 '' abcd
    a
    c
    d

NUL Delimited Examples
^^^^^^^^^^^^^^^^^^^^^^

::

    >_ # Count files in a directory, without being confused by newlines.
    >_ count (find . -print0 | string split0)
    42

    >_ # Sort a list of elements which may contain newlines
    >_ set foo beta alpha\ngamma
    >_ set foo (string join0 $foo | sort -z | string split0)
    >_ string escape $foo[1]
    alpha\ngamma

.. END EXAMPLES