The regex struct is pretty large at 560 bytes, with the entire
Abbreviation being 664 bytes.
If it's an "Option<Regex>", any abbr gets to pay the price. Boxing it
means abbrs without a regex are over 500 bytes smaller.
.unwrap() is in effect an assert(). If it is applied mistakenly, the
program crashes and there isn't a good error.
I would like it to be used as a last resort. In these cases there are
nicer ways to do it that handle a missing result properly.
Issue #10194 reports Cobra completions do
set -l args (commandline -opc)
eval $args[1] __complete $args[2..] (commandline -ct | string escape)
The intent behind "eval" is to expand variables and tildes in "$args".
Fair enough. Several of our own completions do the same, see the next commit.
The problem with "commandline -o" + "eval" is that the former already
removes quotes that are relevant for "eval". This becomes a problem if $args
contains quoted () or {}, for example this command will wrongly execute a
command substituion:
git --work-tree='(launch-missiles)' <TAB>
It is possible to escape the string the tokens before running eval, but
then there will be no expansion of variables etc. The problem is that
"commandline -o" only unescapes tokens so they end up in a weird state
somewhere in-between what the user typed and the expanded version.
Remove the need for "eval" by introducing "commandline -x" which expands
things like variables and braces. This enables custom completion scripts to
be aware of shell variables without eval, see the added test for completions
to "make -C $var/some/dir ".
This means that essentially all third party scripts should migrate from
"commandline -o" to "commandline -x". For example
set -l tokens
if commandline -x >/dev/null 2>&1
set tokens (commandline -xpc)
else
set tokens (commandline -opc)
end
Since this is mainly used for completions, the expansion skips command
substitutions. They are passed through as-is (instead of cancelling or
expanding to nothing) to make custom completion scripts work reasonably well
in the common case. Of course there are cases where we would want to expand
command substitutions here, so I'm not sure.
Move all qmark tests to `scoped_test()` sections with explicitly set feature
flags. We already test the default qmark behavior in the functionality tests.
It seems the logic for calculating the cursor position was not ported correctly,
because the correct place to insert it is at the cursor_pos regardless of
range.start, going by the parameters submitted to the function and the expected
result.
This implements input and input_common FFI pieces in input_ffi.rs, and
simultaneously ports bind.rs. This was done as a single commit because
builtin_bind would have required a substantial amount of work to use the input
ffi.
Drop support for history file version 1.
ParseExecutionContext no longer contains an OperationContext because in my
first implementation, ParseExecutionContext didn't have interior mutability.
We should probably try to add it back.
Add a few to-do style comments. Search for "todo!" and "PORTING".
Co-authored-by: Xiretza <xiretza@xiretza.xyz>
(complete, wildcard, expand, history, history/file)
Co-authored-by: Henrik Hørlück Berg <36937807+henrikhorluck@users.noreply.github.com>
(builtins/set)
This reduces noise in the upcoming "Port execution" commit.
I accidentally made IoStreams a "class" instead of a "struct". Would be
easy to correct that but this will be deleted soon, so I don't think we care.
- Add test to verify piped string replace exit code
Ensure fields parsing error messages are the same.
Note: C++ relied upon the value of the parsed value even when `errno` was set,
that is defined behaviour we should not rely on, and cannot easilt be replicated from Rust.
Therefore the Rust version will change the following error behaviour from:
```shell
> string split --fields=a "" abc
string split: Invalid fields value 'a'
> string split --fields=1a "" abc
string split: 1a: invalid integer
```
To:
```shell
> string split --fields=a "" abc
string split: a: invalid integer
> string split --fields=1a "" abc
string split: 1a: invalid integer
```
This adopts the new function store, replacing the C++ version.
It also reimplements builtin_function in Rust, as these was too coupled to
the function store to handle in a separate commit.
We could end up overflowing if we print out something that's a multiple of the
chunk size, which would then finish printing in the chunk-printing, but not
break out early.
Note this is slightly incomplete - the FD is not moved into the parser, and so
will be freed at the end of each directory change. The FD saved in the parser is
never actually used in existing code, so this doesn't break anything, but will
need to be corrected once the parser is ported.
The global variables are moved (not copied) from C++ to rust and exported as
extern C integers. On the rust side they are accessed only with atomic semantics
but regular int access is preserved from the C++ side (until that code is also
ported).
There are many places where we want to treat a missing variable the same as
a variable with an empty value.
In C++ we handle this by branching on maybe_t<env_var_t>::missing_or_empty().
If it returns false, we go on to access maybe_t<env_var_t>::value() aka
operator*.
In Rust, Environment::get() will return an Option<EnvVar>.
We could define a MissingOrEmpty trait and implement it for Option<EnvVar>.
However that will still leave us with ugly calls to Option::unwrap()
(by convention Rust does use shorthands like *).
Let's add a variable getter that returns none for empty variables.
The translation is fairly direct though it adds some duplication, for example
there are multiple "match" statements that mimic function overloading.
Rust has no overloading, and we cannot have generic methods in the Node trait
(due to a Rust limitation, the error is like "cannot be made into an object")
so we include the type name in method names.
Give clients like "indent_visitor_t" a Rust companion ("IndentVisitor")
that takes care of the AST traversal while the AST consumption remains
in C++ for now. In future, "IndentVisitor" should absorb the entirety of
"indent_visitor_t". This pattern requires that "fish_indent" be exposed
includable header to the CXX bridge.
Alternatively, we could define FFI wrappers for recursive AST traversal.
Rust requires we separate the AST visitors for "mut" and "const"
scenarios. Take this opportunity to concretize both visitors:
The only client that requires mutable access is the populator. To match the
structure of the C++ populator which makes heavy use of function overloading,
we need to add a bunch of functions to the trait. Since there is no other
mutable visit, this seems acceptable.
The "const" visitors never use "will_visit_fields_of()" or
"did_visit_fields_of()", so remove them (though this is debatable).
Like in the C++ implementation, the AST nodes themselves are largely defined
via macros. Union fields like "Statement" and "ArgumentOrRedirection"
do currently not use macros but may in future.
This commit also introduces a precedent for a type that is defined in one
CXX bridge and used in another one - "ParseErrorList". To make this work
we need to manually define "ExternType".
There is one annoyance with CXX: functions that take explicit lifetime
parameters require to be marked as unsafe. This makes little sense
because functions that return `&Foo` with implicit lifetime can be
misused the same way on the C++ side.
One notable change is that we cannot directly port "find_block_open_keyword()"
(which is used to compute an error) because it relies on the stack of visited
nodes. We cannot modify a stack of node references while we do the "mut"
walk. Happily, an idiomatic solution is easy: we can tell the AST visitor
to backtrack to the parent node and create the error there.
Since "node_t::accept_base" is no longer a template we don't need the
"node_visitation_t" trampoline anymore.
The added copying at the FFI boundary makes things slower (memcpy dominates
the profile) but it's not unusable, which is good news:
$ hyperfine ./fish.{old,new}" -c 'source ../share/completions/git.fish'"
Benchmark 1: ./fish.old -c 'source ../share/completions/git.fish'
Time (mean ± σ): 195.5 ms ± 2.9 ms [User: 190.1 ms, System: 4.4 ms]
Range (min … max): 193.2 ms … 205.1 ms 15 runs
Benchmark 2: ./fish.new -c 'source ../share/completions/git.fish'
Time (mean ± σ): 677.5 ms ± 62.0 ms [User: 665.4 ms, System: 10.0 ms]
Range (min … max): 611.7 ms … 805.5 ms 10 runs
Summary
'./fish.old -c 'source ../share/completions/git.fish'' ran
3.47 ± 0.32 times faster than './fish.new -c 'source ../share/completions/git.fish''
Leftovers:
- Enum variants are still snakecase; I didn't get around to changing this yet.
- "ast_type_to_string()" still returns a snakecase name. This could be
changed since it's not user visible.
This is basically a subset of type, so we might as well.
To be clear this is `command -s` and friends, if you do `command grep` that's
handled as a keyword.
One issue here is that we can't get "one path or not" because I don't
know how to translate a maybe_t? Do we need to make it a shared_ptr instead?
Most of it is duplicated, hence untested.
Functions like mbrtowc are not exposed by the libc crate, so declare them
ourselves.
Since we don't know the definition of C macros, add two big hacks to make
this work:
1. Replace MB_LEN_MAX and mbstate_t with values (resp types) that should
be large enough for any implementation.
2. Detect the definition of MB_CUR_MAX in the build script. This requires
more changes for each new libc. We could also use this approach for 1.
Additionally, this commit brings a small behavior change to
read_unquoted_escape(): we cannot decode surrogate code points like \UDE01
into a Rust char, so use � (\UFFFD, replacement character) instead.
Previously, we added such code points to a wcstring; looks like they were
ignored when printed.
More ugliness with types that cxx bridge can't recognize as being POD. Using
pointers to get/set `termios` values with an assert to make sure we're using
identical definitions on both sides (in cpp from the system headers and in rust
from the libc crate as exported).
I don't know why cxx bridge doesn't allow `SharedPtr<OpaqueRustType>` but we can
work around it in C++ by converting a `Box<T>` to a `shared_ptr<T>` then convert
it back when it needs to be destructed. I can't find a clean way of doing it
from the cxx bridge wrapper so for now it needs to be done manually in the C++
code.
Types/values that are drop-in ready over ffi are renamed to match the old cpp
names but for types that now differ due to ffi difficulties I've left the `_ffi`
in the function names to indicate that this isn't the "correct" way of using the
types/methods.
Keeps the location of original function definition, and also stores
where it was copied. `functions` and `type` show both locations,
instead of none. It also retains the line numbers in the stack trace.
This is early work but I guess there's no harm in pushing it?
Some thoughts on the conventions:
Types that live only inside Rust follow Rust naming convention
("FeatureMetadata").
Types that live on both sides of the language boundary follow the existing
naming ("feature_flag_t").
The alternative is to define a type alias ("using feature_flag_t =
rust::FeatureFlag") but that doesn't seem to be supported in "[cxx::bridge]"
blocks. We could put it in a header ("future_feature_flags.h").
"feature_metadata_t" is a variant of "FeatureMetadata" that can cross
the language boundary. This has the advantage that we can avoid tainting
"FeatureMetadata" with "CxxString" and such. This is an experimental approach,
probably not what we should do in general.
This means cleaning out old universal variables is now just:
```fish
abbr --erase (abbr --list)
```
which makes upgrading much easier.
Note that this erases the currently defined variable and/or any
universal. It doesn't stop at the former because that makes it *easy*
to remove the universals (no running `abbr --erase` twice), and it
doesn't care about globals because, well, they would be gone on
restart anyway.
Fixes#9468.
This now means `abbr --add` has two modes:
```fish
abbr --add name --function foo --regex regex
```
```fish
abbr --add name --regex regex replacement
```
This is because `--function` was seen to be confusing as a boolean flag.
Unfortunately print_hints was true *by default* - so for all builtins
that didn't pass it it would now be false instead.
This resulted in the trailer missing, which includes the line number
and context. So if you ran a script that includes `bind -M` the error
message would now just be "bind: -M: option requires an argument",
with no indication as to where.
This reverts commit 8a50d47a46.
The print_hints variable was always false, so just remove it.
This caused a cascade of other changes where the parser_t variable
becomes unused, so remove it from the call sites.
No functional change expected here.
This would print
```
abbr -a -- dotdot --regex ^\\.\\.+\$ --function multicd
```
which expands "dotdot" to "--regex ^\\.\\.+\$...".
Instead, we move the name to right before the replacement, and move
the `--` before that:
```
abbr -a --regex ^\\.\\.+\$ --function -- dotdot multicd
```
It might be possible to improve that, but this at least round-trips.
Historical behavior is to stop option parsing at the first non-option argument.
Since we have added more options, it seemed impractical to keep that behavior.
However people are using options in their abbr expansions ("abbr e emacs
-nw"). To support this, we ignore options. However, we only ignore them
if they are not valid "abbr" options. Let's ignore all options in the
expansion definition, which is a small price to pay to keep most existing
configurations working.
Fixes#9410
This does not fix other cases which used to work, like
abbr x -unknown
Those are hopefully not used by anyone, so I don't think we need to maintain
support for that.
Also default the marker to '%'. So you may write:
abbr -a L --position anywhere --set-cursor "% | less"
or set an explicit marker:
abbr -a L --position anywhere --set-cursor=! "! | less"
This renames abbreviation triggers from `--trigger-on entry` and
`--trigger-on exec` to `--on-space` and `--on-enter`. These names are less
precise, as abbreviations trigger on any character that terminates a word
or any key binding that triggers exec, but they're also more human friendly
and that's a better tradeoff.
set-cursor enables abbreviations to specify the cursor location after
expansion, by passing in a string which is expected to be found in the
expansion. For example you may create an abbreviation like `L!`:
abbr L! --position anywhere --set-cursor ! "! | less"
and the cursor will be positioned where the "!" is after expansion, with
the "| less" appearing to its right.
This adds support for the `--function` option of abbreviations, so that the
expansion of an abbreviation may be generated dynamically via a fish
function.
Prior to this change, abbreviations were stored as fish variables, often
universal. However we intend to add additional features to abbreviations
which would be very awkward to shoe-horn into variables.
Re-implement abbreviations using a builtin, managing them internally.
Existing abbreviations stored in universal variables are still imported,
for compatibility. However new abbreviations will need to be added to a
function. A follow-up commit will add it.
Now that abbr is a built-in, remove the abbr function; but leave the
abbr.fish file so that stale files from past installs do not override
the abbr builtin.
The "flag" field enables an option to discover which flag it was invoked
with. However in practice none of our options use multiple flags so this
parameter was always nullptr. Remove it and fix up all the builtins to
stop passing this.
No functional change here.
When unsetting, the scope indicates the scope that was *removed* not
set, so the warning is incorrectly triggered. If anything, the confusion
is now removed or we emit a warning that the variable is still present
in another scope (but don't do that!).
Closes#9338.
This fixes#9321
IEEE Std 1003.1-2017 Issue 6 added optional error condition
[EINVAL] for if no conversion could be performed.
Switch back to wcstoimax/wcstoumax: do not work around the old FreeBSD
8 issue.
Add a test for printf '%d %d' 1 2 3
This addresses a long-standing TODO where `complete -C` output isn't
deduplicated.
With this patch, the same deduplication and sort procedure that is run on actual
pager completions is also executed for `complete -C` completions (with a `-C`
payload specified).
This makes it possible to use `complete -C` to test what completions will
actually be generated by the completions pager instead of it displaying
something completely divorced from reality, improving the productivity of fish
completions developers.
Note that completions that wouldn't be shown in the pager are also omitted from
the results, e.g. `test/buildroot/` and `test/fish_expand_test/` are omitted
from the check matches in `checks/complete_directories.fish` because even if
they were generated, the pager wouldn't have shown them. This again makes
reasoning about and debugging completions much easier and more sane.
Fixes ommitted newline char shown after complete -n'(foo)'
Also axes the 'contains syntax errors' line before the error.
Update tests
before
> complete -n'(foo)'
complete: Condition '(foo)' contained a syntax error
complete: Command substitutions not allowed⏎
after
> complete -n'(foo)'
complete: -n '(foo)': command substitutions not allowed here
Makes it possible to retrieve the currently executing command line as
opposed to the currently executing command (`status current-command`).
Closes#8905.
There should be no functional changes in this commit.
The global variable `$_` set in the parser variables by `reader.cpp` and
read by the `status` builtin was deprecated in fish 2.0 but kept around
internally because there's no good way to store/share/forward parser
variables.
A new enum is added that identifies the status variable and they are
stored in a private array in the parser. There is no need for
synchronization because they are only set during job init and never
thereafter. This is currently asserted via ASSERT_IS_MAIN_THREAD() but
that assert can be dropped in the interest of making the parser possible
to clone and use from worker threads.
The old `$_` global variable is still kept for backwards compatibility,
though it will be dropped in a future release.
Closes#9240.
Squash of the following commits (in reverse-chronological order):
commit 03b5cab3dc40eca9d50a9df07a8a32524338a807
Author: Mahmoud Al-Qudsi <mqudsi@neosmart.net>
Date: Sun Sep 25 15:09:04 2022 -0500
Handle differently declared posix_spawnxxx_t on macOS
On macOS, posix_spawnattr_t and posix_spawn_file_actions_t are declared as void
pointers, so we can't use maybe_t's bool operator to test if it has a value.
commit aed83b8bb308120c0f287814d108b5914593630a
Author: Mahmoud Al-Qudsi <mqudsi@neosmart.net>
Date: Sun Sep 25 14:48:46 2022 -0500
Update maybe_t tests to reflect dynamic bool conversion
maybe_t<T> is now bool-convertible only if T _isn't_ already bool-convertible.
commit 2b5a12ca97b46f96b1c6b56a41aafcbdb0dfddd6
Author: Mahmoud Al-Qudsi <mqudsi@neosmart.net>
Date: Sun Sep 25 14:34:03 2022 -0500
Make maybe_t a little harder to misuse
We've had a few bugs over the years stemming from accidental misuse of maybe_t
with bool-convertible types. This patch disables maybe_t's bool operator if the
type T is already bool convertible, forcing the (barely worth mentioning) need
to use maybe_t::has_value() instead.
This patch both removes maybe_t's bool conversion for bool-convertible types and
updates the existing codebase to use the explicit `has_value()` method in place
of existing implicit bool conversions.
While we hardcode the return values for the rest of our builtins, the `return`
builtin bubbles up whatever the user returned in their fish script, allowing
invalid return values such as negative numbers to make it into our C++ side of
things.
In creating a `proc_status_t` from the return code of a builtin, we invoke
W_EXITCODE() which is a macro that shifts left the return code by some amount,
and left-shifting a negative integer is undefined behavior.
Aside from causing us to land in UB territory, it also can cause some negative
return values to map to a "successful" exit code of 0, which was probably not
the fish script author's intention.
This patch also adds error logging to help catch any inadvertent additions of
cases where a builtin returns a negative value (should one forget that unix
return codes are always positive) and an assertion protecting against UB.
The impact here depends on the command and how much output it
produces.
It's possible to get up to 1.5x - `string upper` being a good example,
or a no-op `string match '*'`.
But the more the command actually needs to do, the less of an effect
this has.
This basically immediately issues a "write()" if it's to a pipe or the
terminal.
That means we can reduce syscalls and improve performance, even by
doing something like
```c++
streams.out.append(somewcstring + L"\n");
```
instead of
```c++
streams.out.append(somewcstring);
streams.out.push_back(L'\n');
```
Some benchmarks of the
```fish
for i in (string repeat -n 2000 \n)
$thing
end
```
variety:
1. `set` (printing variables) sped up 1.75x
2. `builtin -n` 1.60x
3. `jobs` 1.25x (with 3 jobs)
4. `functions` 1.20x
5. `math 1 + 1` 1.1x
6. `pwd` 1.1x
Piping yields similar results, there is no real difference when
outputting to a command substitution.
This writes the output once per argument instead of once per format or
escaped char.
An egregious case:
```fish
printf (string repeat -n 200 \\x7f)%s\n (string repeat -n 2000 aaa\n)
```
Has been sped up by ~20x by reducing write() calls from 40000 to 200.
Even a simple
```fish
printf %s\n (string repeat -n 2000 aaa\n)
```
should now be ~1.2x faster by issuing 2000 instead of 4000 write
calls (the `\n` was written separately!).
This at least halves the number of "write()" calls we do if it goes to
a pipe or the terminal, or reduces them by 75% if there is a
description.
This makes
```fish
complete -c foo -xa "(seq 50000)"
complete -C"foo "
```
faster by 1.33x.
This reverts commit 3d8f98c395.
In addition to the issues mentioned on the GitHub page for this commit,
it also broke the CentOS 7 build.
Note one can locally test the CentOS 7 build via:
./docker/docker_run_tests.sh ./docker/centos7.Dockerfile
Be more careful with sign extension issues stemming from the differences in how
an untyped literal is promoted to an integer vs how a typed (and signed) `char`
is promoted to an integer.
Also convert some `const[expr] static xxx` to `const[expr] xxx` where it makes
sense to let the compiler deduce on its own whether or not to allocate storage
for a constant variable rather than imposing our view that it should have STATIC
storage set aside for it.
A few call sites were not making use of the `XXX_LEN` definitions and were
calling `strlen(XXX)` - these have been updated to use `const_strlen(XXX)`
instead.
I'm not sure if any toolchains will have raise any issues with these changes...
CI will tell!