The translation is fairly direct though it adds some duplication, for example
there are multiple "match" statements that mimic function overloading.
Rust has no overloading, and we cannot have generic methods in the Node trait
(due to a Rust limitation, the error is like "cannot be made into an object")
so we include the type name in method names.
Give clients like "indent_visitor_t" a Rust companion ("IndentVisitor")
that takes care of the AST traversal while the AST consumption remains
in C++ for now. In future, "IndentVisitor" should absorb the entirety of
"indent_visitor_t". This pattern requires that "fish_indent" be exposed
includable header to the CXX bridge.
Alternatively, we could define FFI wrappers for recursive AST traversal.
Rust requires we separate the AST visitors for "mut" and "const"
scenarios. Take this opportunity to concretize both visitors:
The only client that requires mutable access is the populator. To match the
structure of the C++ populator which makes heavy use of function overloading,
we need to add a bunch of functions to the trait. Since there is no other
mutable visit, this seems acceptable.
The "const" visitors never use "will_visit_fields_of()" or
"did_visit_fields_of()", so remove them (though this is debatable).
Like in the C++ implementation, the AST nodes themselves are largely defined
via macros. Union fields like "Statement" and "ArgumentOrRedirection"
do currently not use macros but may in future.
This commit also introduces a precedent for a type that is defined in one
CXX bridge and used in another one - "ParseErrorList". To make this work
we need to manually define "ExternType".
There is one annoyance with CXX: functions that take explicit lifetime
parameters require to be marked as unsafe. This makes little sense
because functions that return `&Foo` with implicit lifetime can be
misused the same way on the C++ side.
One notable change is that we cannot directly port "find_block_open_keyword()"
(which is used to compute an error) because it relies on the stack of visited
nodes. We cannot modify a stack of node references while we do the "mut"
walk. Happily, an idiomatic solution is easy: we can tell the AST visitor
to backtrack to the parent node and create the error there.
Since "node_t::accept_base" is no longer a template we don't need the
"node_visitation_t" trampoline anymore.
The added copying at the FFI boundary makes things slower (memcpy dominates
the profile) but it's not unusable, which is good news:
$ hyperfine ./fish.{old,new}" -c 'source ../share/completions/git.fish'"
Benchmark 1: ./fish.old -c 'source ../share/completions/git.fish'
Time (mean ± σ): 195.5 ms ± 2.9 ms [User: 190.1 ms, System: 4.4 ms]
Range (min … max): 193.2 ms … 205.1 ms 15 runs
Benchmark 2: ./fish.new -c 'source ../share/completions/git.fish'
Time (mean ± σ): 677.5 ms ± 62.0 ms [User: 665.4 ms, System: 10.0 ms]
Range (min … max): 611.7 ms … 805.5 ms 10 runs
Summary
'./fish.old -c 'source ../share/completions/git.fish'' ran
3.47 ± 0.32 times faster than './fish.new -c 'source ../share/completions/git.fish''
Leftovers:
- Enum variants are still snakecase; I didn't get around to changing this yet.
- "ast_type_to_string()" still returns a snakecase name. This could be
changed since it's not user visible.
This works around an autocxx limitations where different types cannot
have the same name even if they live in different namespace.
ast::job_t conflicts with job_t.
Let's hope this doesn't causes build failures for e.g. musl: I just
know it's good on macOS and our Linux CI.
It's been a long time.
One fix this brings, is I discovered we #include assert.h or cassert
in a lot of places. If those ever happen to be in a file that doesn't
include common.h, or we are before common.h gets included, we're
unawaringly working with the system 'assert' macro again, which
may get disabled for debug builds or at least has different
behavior on crash. We undef 'assert' and redefine it in common.h.
Those were all eliminated, except in one catch-22 spot for
maybe.h: it can't include common.h. A fix might be to
make a fish_assert.h that *usually* common.h exports.
Cancellation groups were meant to reflect the following idea: if you ran a
simple block:
begin
cmd1
cmd2
end
then under job control, cmd1 and cmd2 would get separate groups; however if
either exits due to SIGINT or SIGQUIT we also want to propagate that to the
outer block. So the outermost block and its interior jobs would share a
cancellation group. However this is more complex than necessary; it's
sufficient for the execution context to just store an int internally.
This ought not to affect anything user-visible.
This is a cleanup of job groups, rationalizing a bunch of stuff. Some
notable changes (none user-visible hopefully):
1. Previously, if a job group wanted a pgid, then we would assign it to the
first process to run in the job group. Now we deliberately mark which
process will own the pgroup, via a new `leads_pgrp` flag in process_t. This
eliminates a source of ambiguity.
2. Previously, if a job were run inside fish's pgroup, we would set fish's
pgroup as the group of the job. But this meant we had to check if the job
had fish's pgroup in lots of places, for example when calling tcsetpgrp.
Now a job group only has a pgrp if that pgrp is external (i.e. the job is
under job control).
The user may write for example:
echo foo >&5
and fish would try to output to file descriptor 5, within the fish process
itself. This has unpredictable effects and isn't useful. Make this an
error.
Note that the reverse is "allowed" but ignored:
echo foo 5>&1
this conceptually dup2s stdout to fd 5, but since no builtin writes to fd
5 we ignore it.
This concerns how "internal job groups" know to stop executing when an
external command receives a "cancel signal" (SIGINT or SIGQUIT). For
example:
while true
sleep 1
end
The intent is that if any 'sleep' exits from a cancel signal, then so would
the while loop. This is why you can hit control-C to end the loop even
if the SIGINT is delivered to sleep and not fish.
Here the 'while' loop is considered an "internal job group" (no separate
pgid, bash would not fork) while each 'sleep' is a separate external
command with its own job group, pgroup, etc. Prior to this change, after
running each 'sleep', parse_execution_context_t would check to see if its
exit status was a cancel signal, and if so, stash it into an int that the
cancel checker would check. But this became unwieldy: now there were three
sources of cancellation signals (that int, the job group, and fish itself).
Introduce the notion of a "cancellation group" which is a set of job
groups that should cancel together. Even though the while loop and sleep
are in different job groups, they are in the same cancellation group. When
any job gets a SIGINT or SIGQUIT, it marks that signal in its cancellation
group, which prevents running new jobs in that group.
This reduces the number of signals to check from 3 to 2; eventually we can
teach cancellation groups how to check fish's own signals and then it will
just be 1.
This concerns code like the following:
while true ; sleep 100; end
Here 'while' is a "simple block execution" and does not create a new job,
or get a pgid. Each 'sleep' however is an external command execution, and
is treated as a distinct job. (bash is the same way). So `while` and
`sleep` are always in different job groups.
The problem comes about if 'sleep' is cancelled through SIGINT or SIGQUIT.
Prior to 2a4c545b21, if *any* process got a SIGINT or SIGQUIT, then fish
would mark a global "stop executing" variable. This obviously prevents
background execution of fish functions.
In 2a4c545b21, this was changed so only the job's group gets marked as
cancelled. However in the case of one job group spawning another, we
weren't propagating the signal.
This adds a signal to parse_execution_context which the parser checks after
execution. It's not ideal since now we have three different places where
signals can be recorded. However it fixes this regression which is too
important to leave unfixed for long.
Fixes#7259
Initially I wanted to pick a different name to avoid confusion with
process groups, but really job trees *are* process groups. So name them
to reflect that fact.
Also rename "placeholder" to "internal" which is clearer.
job_lineage was used to track "where jobs came from" but the job tree idea is
a better abstraction. It groups jobs together similar to how a process group
would in other shells. Begin to remove the notion of lineage.
Prior to this fix, fish was rather inconsistent in when $status gets set
in response to an error. For example, a failed expansion like "$foo["
would not modify $status.
This makes the following inter-related changes:
1. String expansion now directly returns the value to set for $status on
error. The value is always used.
2. parser_t::eval() now directly returns the proc_status_t, which cleans
up a lot of call sites.
3. We expose a new function exec_subshell_for_expand() which ignores
$status but returns errors specifically related to subshell expansion.
4. We reify the notion of "expansion breaking" errors. These include
command-not-found, expand syntax errors, and others.
The upshot is we are more consistent about always setting $status on
errors.
This commit recognizes an existing pattern: many operations need some
combination of a set of variables, a way to detect cancellation, and
sometimes a parser. For example, tab completion needs a parser to execute
custom completions, the variable set, should cancel on SIGINT. Background
autosuggestions don't need a parser, but they do need the variables and
should cancel if the user types something new. Etc.
This introduces a new triple operation_context_t that wraps these concepts
up. This simplifies many method signatures and argument passing.
for-loops that were not inside a function could overwrite global
and universal variables with the loop variable. Avoid this by making
for-loop-variables local variables in their enclosing scope.
This means that if someone does:
set a global
for a in local; end
echo $a
The local $a will shadow the global one (but not be visible in child
scopes). Which is surprising, but less dangerous than the previous
behavior.
The detection whether the loop is running inside a function was failing
inside command substitutions. Remove this special handling of functions
alltogether, it's not needed anymore.
Fixes#6480
Store the entire function declaration, not just its job list.
This allows us to extract the body of the function complete with any
leading comments and indents.
Fixes#5285
Prior to this change, a process after it has been constructed by
parse_execution, but before it is executed, was given a list of
io_data_t redirections. The problem is that redirections have a
sensitive ownership policy because they hold onto fds. This made it
rather hard to reason about fd lifetime.
Change these to redirection_spec_t. This is a textual description
of a redirection after expansion. It does not represent an open file and
so its lifetime is no longer important.
This enables files to be held only on the stack, and are no longer owned
by a process of indeterminate lifetime.
Currently a job needs to know three things about its "parents:"
1. Any IO redirections for the block or function containing this job
2. The pgid for the parent job
3. Whether the parent job has been fully constructed (to defer self-disown)
These are all tracked in somewhat separate awkward ways. Collapse them
into a single new type job_lineage_t.
This adds initial support for statements with prefixed variable assignments.
Statments like this are supported:
a=1 b=$a echo $b # outputs 1
Just like in other shells, the left-hand side of each assignment must
be a valid variable identifier (no quoting/escaping). Array indexing
(PATH[1]=/bin ls $PATH) is *not* yet supported, but can be added fairly
easily.
The right hand side may be any valid string token, like a command
substitution, or a brace expansion.
Since `a=* foo` is equivalent to `begin set -lx a *; foo; end`,
the assignment, like `set`, uses nullglob behavior, e.g. below command
can safely be used to check if a directory is empty.
x=/nothing/{,.}* test (count $x) -eq 0
Generic file completion is done after the equal sign, so for example
pressing tab after something like `HOME=/` completes files in the
root directory
Subcommand completion works, so something like
`GIT_DIR=repo.git and command git ` correctly calls git completions
(but the git completion does not use the variable as of now).
The variable assignment is highlighted like an argument.
Closes#6048
This runs build_tools/style.fish, which runs clang-format on C++, fish_indent on fish and (new) black on python.
If anything is wrong with the formatting, we should fix the tools, but automated formatting is worth it.
The parent of a job is the parent pipeline that executed the function or
block corresponding to this job. This will help simplify
process_mark_finished_children().
Variables set in if and while conditions are in the enclosing block, not
the if/while statement block. For example:
if set -l var (somecommand) ; end
echo $var
will now work as expected.
Fixes#4820. Fixes#1212.
This promotes "and" and "or" from a type of statement to "job
decorators," as a possible prefix on a job. The point is to rationalize
how they interact with && and ||.
In the new world 'and' and 'or' apply to a entire job conjunction, i.e.
they have "lower precedence." Example:
if [ $age -ge 0 ] && [ $age -le 18 ]
or [ $age -ge 75 ] && [ $age -le 100 ]
echo "Child or senior"
end
This concerns block nodes with redirections, like
begin ... end | grep ...
Prior to this fix, we passed in a pointer to the node. Switch to passing
in the tnode and parsed source ref. This improves type safety and better
aligns with the function-node plans.
Prior to this fix, functions stored a string representation of their
contents. Switch them to storing a parsed source reference and the
tnode of the contents. This is part of an effort to avoid reparsing
a function's contents every time it executes.