Historical behavior is to stop option parsing at the first non-option argument.
Since we have added more options, it seemed impractical to keep that behavior.
However people are using options in their abbr expansions ("abbr e emacs
-nw"). To support this, we ignore options. However, we only ignore them
if they are not valid "abbr" options. Let's ignore all options in the
expansion definition, which is a small price to pay to keep most existing
configurations working.
Fixes#9410
This does not fix other cases which used to work, like
abbr x -unknown
Those are hopefully not used by anyone, so I don't think we need to maintain
support for that.
Also default the marker to '%'. So you may write:
abbr -a L --position anywhere --set-cursor "% | less"
or set an explicit marker:
abbr -a L --position anywhere --set-cursor=! "! | less"
This renames abbreviation triggers from `--trigger-on entry` and
`--trigger-on exec` to `--on-space` and `--on-enter`. These names are less
precise, as abbreviations trigger on any character that terminates a word
or any key binding that triggers exec, but they're also more human friendly
and that's a better tradeoff.
set-cursor enables abbreviations to specify the cursor location after
expansion, by passing in a string which is expected to be found in the
expansion. For example you may create an abbreviation like `L!`:
abbr L! --position anywhere --set-cursor ! "! | less"
and the cursor will be positioned where the "!" is after expansion, with
the "| less" appearing to its right.
This adds support for the `--function` option of abbreviations, so that the
expansion of an abbreviation may be generated dynamically via a fish
function.
Prior to this change, abbreviations were stored as fish variables, often
universal. However we intend to add additional features to abbreviations
which would be very awkward to shoe-horn into variables.
Re-implement abbreviations using a builtin, managing them internally.
Existing abbreviations stored in universal variables are still imported,
for compatibility. However new abbreviations will need to be added to a
function. A follow-up commit will add it.
Now that abbr is a built-in, remove the abbr function; but leave the
abbr.fish file so that stale files from past installs do not override
the abbr builtin.
The "flag" field enables an option to discover which flag it was invoked
with. However in practice none of our options use multiple flags so this
parameter was always nullptr. Remove it and fix up all the builtins to
stop passing this.
No functional change here.
When unsetting, the scope indicates the scope that was *removed* not
set, so the warning is incorrectly triggered. If anything, the confusion
is now removed or we emit a warning that the variable is still present
in another scope (but don't do that!).
Closes#9338.
This fixes#9321
IEEE Std 1003.1-2017 Issue 6 added optional error condition
[EINVAL] for if no conversion could be performed.
Switch back to wcstoimax/wcstoumax: do not work around the old FreeBSD
8 issue.
Add a test for printf '%d %d' 1 2 3
This addresses a long-standing TODO where `complete -C` output isn't
deduplicated.
With this patch, the same deduplication and sort procedure that is run on actual
pager completions is also executed for `complete -C` completions (with a `-C`
payload specified).
This makes it possible to use `complete -C` to test what completions will
actually be generated by the completions pager instead of it displaying
something completely divorced from reality, improving the productivity of fish
completions developers.
Note that completions that wouldn't be shown in the pager are also omitted from
the results, e.g. `test/buildroot/` and `test/fish_expand_test/` are omitted
from the check matches in `checks/complete_directories.fish` because even if
they were generated, the pager wouldn't have shown them. This again makes
reasoning about and debugging completions much easier and more sane.
Fixes ommitted newline char shown after complete -n'(foo)'
Also axes the 'contains syntax errors' line before the error.
Update tests
before
> complete -n'(foo)'
complete: Condition '(foo)' contained a syntax error
complete: Command substitutions not allowed⏎
after
> complete -n'(foo)'
complete: -n '(foo)': command substitutions not allowed here
Makes it possible to retrieve the currently executing command line as
opposed to the currently executing command (`status current-command`).
Closes#8905.
There should be no functional changes in this commit.
The global variable `$_` set in the parser variables by `reader.cpp` and
read by the `status` builtin was deprecated in fish 2.0 but kept around
internally because there's no good way to store/share/forward parser
variables.
A new enum is added that identifies the status variable and they are
stored in a private array in the parser. There is no need for
synchronization because they are only set during job init and never
thereafter. This is currently asserted via ASSERT_IS_MAIN_THREAD() but
that assert can be dropped in the interest of making the parser possible
to clone and use from worker threads.
The old `$_` global variable is still kept for backwards compatibility,
though it will be dropped in a future release.
Closes#9240.
Squash of the following commits (in reverse-chronological order):
commit 03b5cab3dc40eca9d50a9df07a8a32524338a807
Author: Mahmoud Al-Qudsi <mqudsi@neosmart.net>
Date: Sun Sep 25 15:09:04 2022 -0500
Handle differently declared posix_spawnxxx_t on macOS
On macOS, posix_spawnattr_t and posix_spawn_file_actions_t are declared as void
pointers, so we can't use maybe_t's bool operator to test if it has a value.
commit aed83b8bb308120c0f287814d108b5914593630a
Author: Mahmoud Al-Qudsi <mqudsi@neosmart.net>
Date: Sun Sep 25 14:48:46 2022 -0500
Update maybe_t tests to reflect dynamic bool conversion
maybe_t<T> is now bool-convertible only if T _isn't_ already bool-convertible.
commit 2b5a12ca97b46f96b1c6b56a41aafcbdb0dfddd6
Author: Mahmoud Al-Qudsi <mqudsi@neosmart.net>
Date: Sun Sep 25 14:34:03 2022 -0500
Make maybe_t a little harder to misuse
We've had a few bugs over the years stemming from accidental misuse of maybe_t
with bool-convertible types. This patch disables maybe_t's bool operator if the
type T is already bool convertible, forcing the (barely worth mentioning) need
to use maybe_t::has_value() instead.
This patch both removes maybe_t's bool conversion for bool-convertible types and
updates the existing codebase to use the explicit `has_value()` method in place
of existing implicit bool conversions.
While we hardcode the return values for the rest of our builtins, the `return`
builtin bubbles up whatever the user returned in their fish script, allowing
invalid return values such as negative numbers to make it into our C++ side of
things.
In creating a `proc_status_t` from the return code of a builtin, we invoke
W_EXITCODE() which is a macro that shifts left the return code by some amount,
and left-shifting a negative integer is undefined behavior.
Aside from causing us to land in UB territory, it also can cause some negative
return values to map to a "successful" exit code of 0, which was probably not
the fish script author's intention.
This patch also adds error logging to help catch any inadvertent additions of
cases where a builtin returns a negative value (should one forget that unix
return codes are always positive) and an assertion protecting against UB.
The impact here depends on the command and how much output it
produces.
It's possible to get up to 1.5x - `string upper` being a good example,
or a no-op `string match '*'`.
But the more the command actually needs to do, the less of an effect
this has.
This basically immediately issues a "write()" if it's to a pipe or the
terminal.
That means we can reduce syscalls and improve performance, even by
doing something like
```c++
streams.out.append(somewcstring + L"\n");
```
instead of
```c++
streams.out.append(somewcstring);
streams.out.push_back(L'\n');
```
Some benchmarks of the
```fish
for i in (string repeat -n 2000 \n)
$thing
end
```
variety:
1. `set` (printing variables) sped up 1.75x
2. `builtin -n` 1.60x
3. `jobs` 1.25x (with 3 jobs)
4. `functions` 1.20x
5. `math 1 + 1` 1.1x
6. `pwd` 1.1x
Piping yields similar results, there is no real difference when
outputting to a command substitution.
This writes the output once per argument instead of once per format or
escaped char.
An egregious case:
```fish
printf (string repeat -n 200 \\x7f)%s\n (string repeat -n 2000 aaa\n)
```
Has been sped up by ~20x by reducing write() calls from 40000 to 200.
Even a simple
```fish
printf %s\n (string repeat -n 2000 aaa\n)
```
should now be ~1.2x faster by issuing 2000 instead of 4000 write
calls (the `\n` was written separately!).
This at least halves the number of "write()" calls we do if it goes to
a pipe or the terminal, or reduces them by 75% if there is a
description.
This makes
```fish
complete -c foo -xa "(seq 50000)"
complete -C"foo "
```
faster by 1.33x.
This reverts commit 3d8f98c395.
In addition to the issues mentioned on the GitHub page for this commit,
it also broke the CentOS 7 build.
Note one can locally test the CentOS 7 build via:
./docker/docker_run_tests.sh ./docker/centos7.Dockerfile
Be more careful with sign extension issues stemming from the differences in how
an untyped literal is promoted to an integer vs how a typed (and signed) `char`
is promoted to an integer.
Also convert some `const[expr] static xxx` to `const[expr] xxx` where it makes
sense to let the compiler deduce on its own whether or not to allocate storage
for a constant variable rather than imposing our view that it should have STATIC
storage set aside for it.
A few call sites were not making use of the `XXX_LEN` definitions and were
calling `strlen(XXX)` - these have been updated to use `const_strlen(XXX)`
instead.
I'm not sure if any toolchains will have raise any issues with these changes...
CI will tell!
This errored out *later* because the result was infinite or NaN, but
it didn't actually stop evaluation.
I'm not sure if there is a way to get floating point math to turn an
infinity back into something that doesn't depend on a literal
infinity, but division by zero conceptually isn't a thing we can
support.
There's entire branches of maths dedicated to figuring out what
dividing by "basically zero" means and we don't have to get into it.
This is essentially the inverse of `string pad`.
Where that adds characters to get up to the specified width,
this adds an ellipsis to a string if it goes over a specific maximum width.
The char can be given, but defaults to our ellipsis string.
("…" if the locale can handle it and "..." otherwise)
If the ellipsis string is empty, it just truncates.
For arguments given via argv, it goes line-by-line,
because otherwise length makes no sense.
If "--no-newline" is given, it adds an ellipsis instead and removes all subsequent lines.
Like pad and `length --visible`, it goes by visible width,
skipping recognized escape sequences, as those have no influence on width.
The default target width is the shortest of the given widths that is non-zero.
If the ellipsis is already wider than the target width,
we truncate instead. This is safer overall, so we don't e.g. move into a new line.
This is especially important given our default ellipsis might be width 3.
Let's hope this doesn't causes build failures for e.g. musl: I just
know it's good on macOS and our Linux CI.
It's been a long time.
One fix this brings, is I discovered we #include assert.h or cassert
in a lot of places. If those ever happen to be in a file that doesn't
include common.h, or we are before common.h gets included, we're
unawaringly working with the system 'assert' macro again, which
may get disabled for debug builds or at least has different
behavior on crash. We undef 'assert' and redefine it in common.h.
Those were all eliminated, except in one catch-22 spot for
maybe.h: it can't include common.h. A fix might be to
make a fish_assert.h that *usually* common.h exports.
This is a *tiny* commit code-wise, but the explanation is a bit
longer.
When I made string read in chunks, I picked a chunk size from bash's
read, under the assumption that they had picked a good one.
It turns out, on the (linux) systems I've tested, that's simply not
true.
My tests show that a bigger chunk size of up to 4096 is better *across
the board*:
- It's better with very large inputs
- It's equal-to-slightly-better with small inputs
- It's equal-to-slightly-better even if we quit early
My test setup:
0. Create various fish builds with various sizes for
STRING_CHUNK_SIZE, name them "fish-$CHUNKSIZE".
1. Download the npm package names from
https://github.com/nice-registry/all-the-package-names/blob/master/names.json (I
used commit 87451ea77562a0b1b32550124e3ab4a657bf166c, so it's 46.8MB)
2. Extract the names so we get a line-based version:
```fish
jq '.[]' names.json | string trim -c '"' >/tmp/all
```
3. Create various sizes of random extracts:
```fish
for f in 10000 1000 500 50
shuf /tmp/all | head -n $f > /tmp/$f
end
```
(the idea here is to defeat any form of pattern in the input).
4. Run benchmarks:
hyperfine -w 3 ./fish-{128,512,1024,2048,4096}"
-c 'for i in (seq 1000)
string match -re foot < $f
end; true'"
(reduce the seq size for the larger files so you don't have to wait
for hours - the idea here is to have some time running string and not
just fish startup time)
This shows results pretty much like
```
Summary
'./fish-2048 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true'' ran
1.01 ± 0.02 times faster than './fish-4096 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
1.02 ± 0.03 times faster than './fish-1024 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
1.08 ± 0.03 times faster than './fish-512 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
1.47 ± 0.07 times faster than './fish-128 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
```
So we see that up to 1024 there's a difference, and after that the
returns are marginal. So we stick with 1024 because of the memory
trade-off.
----
Fun extra:
Comparisons with `grep` (GNU grep 3.7) are *weird*. Because you both
get
```
'./fish-4096 -c 'for i in (seq 100); string match -re foot < /tmp/500; end; true'' ran
11.65 ± 0.23 times faster than 'fish -c 'for i in (seq 100); command grep foot /tmp/500; end''
```
and
```
'fish -c 'for i in (seq 2); command grep foot /tmp/all; end'' ran
66.34 ± 3.00 times faster than './fish-4096 -c 'for i in (seq 2);
string match -re foot < /tmp/all; end; true''
100.05 ± 4.31 times faster than './fish-128 -c 'for i in (seq 2);
string match -re foot < /tmp/all; end; true''
```
Basically, if you *can* give grep a lot of work at once (~40MB in this
case), it'll churn through it like butter. But if you have to call it
a lot, string beats it by virtue of cheating.
Rephrase this to more explicitly indicate that the uvar actually
was successfully set. I believe the prior phrasing can leave some
ambiguity as far as wether set just failed with an error, whether it
has done anything or not.
Now uses the same macro other builtins use for a missing -e arg,
and the error message show the short or long option as it was used.
e.g. before
$ set -e
set: Erase needs a variable name
after
$ set --erase
set: --erase: option requires an argument
$ set -e
set: -e: option requires an argument
Intern'd strings were intended to be "shared" to reduce memory usage but
this optimization doesn't carry its weight. Remove it. No functional
change expected.
We store filenames in function definitions to indicate where the
function comes from. Previously these were intern'd strings. Switch them
to a shared_ptr<wcstring>, intending to remove intern'd strings.
This was misguidedly "fixed" in
9e08609f85, which made printf error out
with any "-"-prefixed words as the first argument.
Note: This means currently `printf --help` doesn't print the help.
This also matches `echo`, and we currently don't have anything to make
a literal `--help` execute a builtin help except for keywords. Oh well.
Fixes#9132
* string repeat: Don't allocate repeated string all at once
This used to allocate one string and fill it with the necessary
repetitions, which could be a very very large string.
Now, it instead uses one buffer and fills it to a chunk size,
and then writes that.
This fixes:
1. We no longer crash with too large max/count values. Before they
caused a bad_alloc because we tried to fill all RAM.
2. We no longer fill all RAM if given a big-but-not-too-big value. You
could've caused fish to eat *most* of your RAM here.
3. It can start writing almost immediately, instead of waiting
potentially minutes to start.
Performance is about the same to slightly faster overall.
ESCAPE_ALL is not really a helpful name. Also it's the most common flag.
Let's make it the default so we can remove this unhelpful name.
While at it, let's add a default value for the flags argument, which helps
most callers.
The absence of ESCAPE_ALL makes it only escape nonprintable characters
(with some exceptions). We use this for displaying strings in the completion
pager as well as for the human-readable output of "set", "set -S", "bind"
and "functions".
No functional change.
When listing variables, "set" tries to escape variable names.
Since variable names cannot have special characters, this doesn't do anything.
The escaping is one of the few places that does not use ESCAPE_ALL. This has
complex behavior; let's alleviate the problem by getting rid of this call.
No functional change.
Or should we stop using it?
I'm fine with either always or never using auto-formatting but our current
way of using it only sometimes is confusing.
No functional change.
Otherwise realpath would add the cwd, which would be broken if fish
ever cd'd.
We could add the original cwd, but even that isn't enough, because we
need *the parent's* idea of cwd and $PATH.
Or, alternatively, what we need is for the OS to give us the actual
path to ourselves.
get_executable_path says: "This needs to be realpath'd"
So how about we do that? The only other place we use it is fish.cpp,
and we realpath it there already.
See #9085
This was an inadvertent change from
cc632d6ae9.
Because we used wgetcwd directly before, we always got the "physical"
resolved $PWD.
There's an argument to be made to use the logical $PWD here as well
but I prefer not to make changes lik that in a random commit without
good reason.
This can be used to print the modification time, like `stat` with some
options.
The reason is that `stat` has caused us a number of portability
headaches:
1. It's not available everywhere by default
2. The versions are quite different
For instance, with GNU stat it's `stat -c '%Y'`, with macOS it's `stat
-f %m`.
So now checking a cache file can be done just with builtins.
These are non-POSIX extensions other test(1) utilities implement,
which compares the modification time of two files as proposed for
fish in #3589: testing if one file is newer than another file.
-ef is a common extension to test(1) which checks if two paths refer
to the same file, by comparing the dev and inode numbers.
Take advantage of additional cleanup unlocked by this refactoring,
including eliminating unneeded error returns and simplifying some
control flow.
No user-visible behavior change expected here.
This switches builtin_string from using PCRE2 directly, to using the new re
component. This simplifies some code and removes redundancy.
No user-visible behavior change expected here.
This switches the flag_to_function from a map to just an ordinary switch
statement. This saves some memory/startup time and removes some
relocations. No functional change here.
This adds a line to `set --show`s output like
```
$PATH: originally inherited as |/home/alfa/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/var/lib/flatpak/exports/bin|
```
to help with debugging.
Note that this means keeping an additional copy of the original
environment around. At most this would be one ARG_MAX's worth, which
is about 2M.
It's still useful without, for instance to implement a command that
takes no options, or to check min-args or max-args.
(technically no optspecs, no min/max args and --ignore-unknown does
nothing, but that's a very specific error that we don't need to forbid)
Fixes#9006
This makes it so `complete -c foo -n test1 -n test2` registers *both*
conditions, and when it comes time to check the candidate, tries both,
in that order. If any fails it stops, if all succeed the completion is offered.
The reason for this is that it helps with caching - we have a
condition cache, but conditions like
```fish
test (count (commandline -opc)) -ge 2; and contains -- (commandline -opc)[2] length
test (count (commandline -opc)) -ge 2; and contains -- (commandline -opc)[2] sub
```
defeats it pretty easily, because the cache only looks at the entire
script as a string - it can't tell that the first `test` is the same
in both.
So this means we separate it into
```fish
complete -f -c string -n "test (count (commandline -opc)) -ge 2; and contains -- (commandline -opc)[2] length" -s V -l visible -d "Use the visible width, excluding escape sequences"
+complete -f -c string -n "test (count (commandline -opc)) -ge 2" -n "contains -- (commandline -opc)[2] length" -s V -l visible -d "Use the visible width, excluding escape sequences"
```
which allows the `test` to be cached.
In tests, this improves performance for the string completions by 30%
by reducing all the redundant `test` calls.
The `git` completions can also greatly benefit from this.
This would still remove non-existent paths, which isn't a strict
inversion and contradicts the docs.
Currently, to only allow paths that exist but don't pass a type check,
you'd have to filter twice:
path filter -Z foo bar | path filter -vfz
If a shortcut for this becomes necessary we can add it later.