docs synopsis: add HTML highlighting and automate manpage markup
Recent synopsis changes move from literal code blocks to
[RST line blocks]. This does not translate well to HTML: it's not
rendered in monospace, so alignment is lost. Additionally, we don't
get syntax highlighting in HTML, which makes synopses look different
from our code samples, which are highlighted.
We hard-wrap synopsis lines (like code blocks). To align continuation
lines in manpages we need [backslashes in weird places]. Combined with
the **, *, and `` markup, it's a bit hard to get the alignment right.
Fix these by moving synopsis sources back to code blocks and computing
HTML syntax highlighting and manpage markup with a custom Sphinx
extension.
The new Pygments lexer can tokenize a synopsis and assign the various
highlighting roles, which closely match fish's syntax highlighting:
- command/keyword (dark blue)
- parameter (light blue)
- operator like and/or/not/&&/|| (cyan)
- grammar metacharacter (black)
For manpage output, we don't project the fish syntax highlighting
but follow the markup convention in GNU's man(1):
    bold text          type exactly as shown.
    italic text        replace with appropriate argument.
To make it easy to separate these two automatically, formalize that
(italic) placeholders must be uppercase, while all lowercase text is
interpreted literally (so rendered bold).
This makes manpages more consistent, see string-join(1) and and(1).
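
The uppercase/lowercase convention above can be sketched in a few lines of Python. This is only an illustration, not the extension's code: `man_markup` and `render` are hypothetical helpers, and the `**`/`*` markers stand in for the bold and italic nodes the directive emits. The one rule borrowed from the implementation is that anything starting with `-` is a literal option, even if it is uppercase.

```python
import re


def man_markup(word: str) -> str:
    """Return 'bold' for literal text and 'italic' for placeholders."""
    # Options are literal even when they are uppercase, e.g. "-L".
    # This check must come first because "-L".isupper() is True.
    if word.startswith("-"):
        return "bold"
    # Uppercase words are placeholders to be replaced by the user.
    if word.isupper():
        return "italic"
    # Everything else is typed exactly as shown.
    return "bold"


def render(synopsis: str) -> str:
    """Wrap each word in ** (bold) or * (italic); keep other text as-is."""
    out = []
    for piece in re.findall(r"[\w-]+|[^\w-]+", synopsis):
        if re.fullmatch(r"[\w-]+", piece):
            marker = "**" if man_markup(piece) == "bold" else "*"
            out.append(f"{marker}{piece}{marker}")
        else:
            # Whitespace and grammar metacharacters get no markup.
            out.append(piece)
    return "".join(out)
```

For example, `render("string join [-q] SEP [STRING ...]")` marks `string`, `join` and `-q` as bold and `SEP` and `STRING` as italic, leaving the brackets and the ellipsis untouched.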
Implementation notes:
Since we want manpage formatting but Sphinx's Pygments highlighting
plugin does not support manpage output, add our custom "synopsis"
directive. This directive parses differently when manpage output is
specified. This means that the HTML and manpage build processes must
not share a cache, because the parsed doctrees are cached. Work around
this by using separate cache locations for build targets "sphinx-docs"
(which creates HTML) and "sphinx-manpages". A better solution would
be to only override Sphinx's ManualPageBuilder but that would take a
bit more code (ideally we could override ManualPageWriter but Sphinx
4.3.2 doesn't really support that).
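
A rough sketch of the cache separation, assuming the builds are driven through `sphinx-build`: its `-b` option selects the builder and `-d` selects the directory for cached doctrees. The helper name and the directory layout below are made up for illustration and are not what the build system uses.

```python
def sphinx_build_argv(builder: str, src: str, out_root: str) -> list:
    """Build a sphinx-build command line with a per-builder doctree cache."""
    # Hypothetical layout: one cache directory per builder, so that doctrees
    # parsed for HTML are never reused for the manpage build.
    doctree_dir = f"{out_root}/.doctrees-{builder}"
    return [
        "sphinx-build",
        "-b", builder,      # builder name, e.g. "html" or "man"
        "-d", doctree_dir,  # where parsed doctrees are cached
        src,
        f"{out_root}/{builder}",
    ]


html_cmd = sphinx_build_argv("html", "doc_src", "build")
man_cmd = sphinx_build_argv("man", "doc_src", "build")
```

Either list could be passed to `subprocess.run`. The point is only that the two `-d` arguments differ.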
---
Alternative solution: stick with line blocks but use roles like
:command: or :option: (or custom ones). While this would make it
possible to produce HTML that is consistent with code blocks (by adding
a bit of CSS), the source would look uglier and is harder to maintain.
(Let's say we want to add custom formatting to the [|] metacharacters
in HTML. This is much easier with the proposed patch.)
---
[RST line blocks]: https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#line-blocks
[backslashes in weird places]: https://github.com/fish-shell/fish-shell/pull/8626#discussion_r782837750
2022-01-09 22:09:46 +08:00
# Pygments lexer for a fish command synopsis.
#
# Example usage:
# echo 'string match [OPTIONS] [STRING]' | pygmentize -f terminal256 -l doc_src/fish_synopsis.py:FishSynopsisLexer -x

from docutils import nodes
from pygments.lexer import Lexer
from pygments.token import (
    Generic,
    Name,
    Operator,
    Punctuation,
    Text,
)
import re
from sphinx.directives.code import CodeBlock


class FishSynopsisDirective(CodeBlock):
    """A custom directive that describes a command's grammar."""

    has_content = True
    required_arguments = 0

    def run(self):
        if self.env.app.builder.name != "man":
            self.arguments = ["fish-synopsis"]
            return CodeBlock.run(self)
        lexer = FishSynopsisLexer()
        result = nodes.line_block()
        for start, tok, text in lexer.get_tokens_unprocessed("\n".join(self.content)):
            if (  # Literal text.
                (tok in (Name.Function, Name.Constant) and not text.isupper())
                or text.startswith("-")  # Literal option, even if it's uppercase.
                or tok in (Operator, Punctuation)
                or text
                == " ]"  # Tiny hack: the closing bracket of the test(1) alias is a literal.
            ):
                node = nodes.strong(text=text)
            elif (
                tok in (Name.Constant, Name.Function) and text.isupper()
            ):  # Placeholder parameter.
                node = nodes.emphasis(text=text)
            else:  # Grammar metacharacter or whitespace.
                node = nodes.inline(text=text)
            result.append(node)
        return [result]


lexer_rules = [
    (re.compile(pattern), token)
    for pattern, token in (
        # Hack: treat the "[ expr ]" alias of builtin test as command token (not as grammar
        # metacharacter). This works because we write it without spaces in the grammar (like
        # "[OPTIONS]").
        (r"\[ | \]", Name.Constant),
        # Statement separators.
        (r"\n", Text.Whitespace),
        (r";", Punctuation),
        (r" +", Text.Whitespace),
        # Operators have different highlighting than commands or parameters.
        (r"\b(and|not|or|time)\b", Operator),
        # Keywords that are not in command position.
        (r"\b(if|in)\b", Name.Function),
        # Grammar metacharacters.
        (r"[()[\]|]", Generic.Other),
        (r"\.\.\.", Generic.Other),
        # Parameters.
        (r"[\w-]+", Name.Constant),
        (r"[=%]", Name.Constant),
        (
            r"[<>]",
            Name.Constant,
        ),  # Redirections are highlighted like parameters by default.
    )
]


class FishSynopsisLexer(Lexer):
    name = "FishSynopsisLexer"
    aliases = ["fish-synopsis"]

    is_before_command_token = None

    def next_token(self, rule: str, offset: int, has_continuation_line: bool):
        for pattern, token_kind in lexer_rules:
            m = pattern.match(rule, pos=offset)
            if m is None:
                continue
            if token_kind is Name.Constant and self.is_before_command_token:
                token_kind = Name.Function

            if has_continuation_line:
                # Traditional case: rules with continuation lines only have a single command.
                self.is_before_command_token = False
            else:
                if m.group() in ("\n", ";") or token_kind is Operator:
                    self.is_before_command_token = True
                elif token_kind in (Name.Constant, Name.Function):
                    self.is_before_command_token = False

            return m, token_kind, m.end()
        return None, None, offset

    def get_tokens_unprocessed(self, input_text):
        """Return a list of (start, tok, value) tuples.

        start is the index into the string
        tok is the token type (as above)
        value is the string contents of the token
        """
        """
        A synopsis consists of multiple rules. Each rule can have continuation lines, which
        are expected to be indented:

            cmd foo [--quux]
                    [ARGUMENT] ...
            cmd bar

        We'll split the input into rules. This is easy for a traditional synopsis because each
        non-indented line starts a new rule. However, we also want to support code blocks:

            switch VALUE
                [case [GLOB ...]
                    [COMMAND ...]]
            end

        which makes this format ambiguous. Hack around this by always adding "end" to the
        current rule, which is enough in practice.
        """
        rules = []
        rule = []
        for line in list(input_text.splitlines()) + [""]:
            if rule and not line.startswith(" "):
                rules.append(rule)
                rule = []
            if line == "end":
                rules[-1].append(line)
                continue
            rule.append(line)
        result = []
        for rule in rules:
            offset = 0
            self.is_before_command_token = True
            has_continuation_line = rule[-1].startswith(" ")
            rule = "\n".join(rule) + "\n"
            while True:
                match, token_kind, offset = self.next_token(
                    rule, offset, has_continuation_line
                )
                if match is None:
                    break
                text = match.group()
                result.append((match.start(), token_kind, text))
            assert offset == len(rule), "cannot tokenize leftover text: '{}'".format(
                rule[offset:]
            )
        return result
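
The rule-splitting step described in the docstring of `get_tokens_unprocessed` can be tried in isolation, without Pygments or Sphinx installed. The sketch below copies that loop into a standalone function. The name `split_rules` is hypothetical and not part of the module.

```python
def split_rules(synopsis: str) -> list:
    """Split a synopsis into rules, each a list of lines."""
    rules = []
    rule = []
    # The trailing "" acts as a sentinel that flushes the last rule.
    for line in synopsis.splitlines() + [""]:
        # A non-indented line ends the rule collected so far.
        if rule and not line.startswith(" "):
            rules.append(rule)
            rule = []
        # "end" is not indented but still belongs to the previous rule.
        if line == "end":
            rules[-1].append(line)
            continue
        rule.append(line)
    return rules


traditional = split_rules(
    "cmd foo [--quux]\n"
    "        [ARGUMENT] ...\n"
    "cmd bar"
)
block = split_rules(
    "switch VALUE\n"
    "    [case [GLOB ...]\n"
    "        [COMMAND ...]]\n"
    "end"
)
```

Here `traditional` holds two rules (the first with one continuation line) and `block` holds a single four-line rule that includes `end`. As in the original loop, a synopsis whose very first line is `end` raises an `IndexError`, because there is no previous rule to attach it to.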