While we hardcode the return values for the rest of our builtins, the `return`
builtin bubbles up whatever the user returned in their fish script, allowing
invalid return values such as negative numbers to make it into our C++ side of
things.
In creating a `proc_status_t` from the return code of a builtin, we invoke
W_EXITCODE() which is a macro that shifts left the return code by some amount,
and left-shifting a negative integer is undefined behavior.
Aside from causing us to land in UB territory, it also can cause some negative
return values to map to a "successful" exit code of 0, which was probably not
the fish script author's intention.
This patch also adds error logging to help catch any inadvertent additions of
cases where a builtin returns a negative value (should one forget that unix
return codes are always positive) and an assertion protecting against UB.
The impact here depends on the command and how much output it
produces.
It's possible to get up to 1.5x - `string upper` being a good example,
or a no-op `string match '*'`.
But the more the command actually needs to do, the less of an effect
this has.
This basically immediately issues a "write()" if it's to a pipe or the
terminal.
That means we can reduce syscalls and improve performance, even by
doing something like
```c++
streams.out.append(somewcstring + L"\n");
```
instead of
```c++
streams.out.append(somewcstring);
streams.out.push_back(L'\n');
```
Some benchmarks of the
```fish
for i in (string repeat -n 2000 \n)
$thing
end
```
variety:
1. `set` (printing variables) sped up 1.75x
2. `builtin -n` 1.60x
3. `jobs` 1.25x (with 3 jobs)
4. `functions` 1.20x
5. `math 1 + 1` 1.1x
6. `pwd` 1.1x
Piping yields similar results, there is no real difference when
outputting to a command substitution.
This writes the output once per argument instead of once per format or
escaped char.
An egregious case:
```fish
printf (string repeat -n 200 \\x7f)%s\n (string repeat -n 2000 aaa\n)
```
Has been sped up by ~20x by reducing write() calls from 40000 to 200.
Even a simple
```fish
printf %s\n (string repeat -n 2000 aaa\n)
```
should now be ~1.2x faster by issuing 2000 instead of 4000 write
calls (the `\n` was written separately!).
This at least halves the number of "write()" calls we do if it goes to
a pipe or the terminal, or reduces them by 75% if there is a
description.
This makes
```fish
complete -c foo -xa "(seq 50000)"
complete -C"foo "
```
faster by 1.33x.
This reverts commit 3d8f98c395.
In addition to the issues mentioned on the GitHub page for this commit,
it also broke the CentOS 7 build.
Note one can locally test the CentOS 7 build via:
./docker/docker_run_tests.sh ./docker/centos7.Dockerfile
Be more careful with sign extension issues stemming from the differences in how
an untyped literal is promoted to an integer vs how a typed (and signed) `char`
is promoted to an integer.
Also convert some `const[expr] static xxx` to `const[expr] xxx` where it makes
sense to let the compiler deduce on its own whether or not to allocate storage
for a constant variable rather than imposing our view that it should have STATIC
storage set aside for it.
A few call sites were not making use of the `XXX_LEN` definitions and were
calling `strlen(XXX)` - these have been updated to use `const_strlen(XXX)`
instead.
I'm not sure if any toolchains will have raise any issues with these changes...
CI will tell!
This errored out *later* because the result was infinite or NaN, but
it didn't actually stop evaluation.
I'm not sure if there is a way to get floating point math to turn an
infinity back into something that doesn't depend on a literal
infinity, but division by zero conceptually isn't a thing we can
support.
There's entire branches of maths dedicated to figuring out what
dividing by "basically zero" means and we don't have to get into it.
This is essentially the inverse of `string pad`.
Where that adds characters to get up to the specified width,
this adds an ellipsis to a string if it goes over a specific maximum width.
The char can be given, but defaults to our ellipsis string.
("…" if the locale can handle it and "..." otherwise)
If the ellipsis string is empty, it just truncates.
For arguments given via argv, it goes line-by-line,
because otherwise length makes no sense.
If "--no-newline" is given, it adds an ellipsis instead and removes all subsequent lines.
Like pad and `length --visible`, it goes by visible width,
skipping recognized escape sequences, as those have no influence on width.
The default target width is the shortest of the given widths that is non-zero.
If the ellipsis is already wider than the target width,
we truncate instead. This is safer overall, so we don't e.g. move into a new line.
This is especially important given our default ellipsis might be width 3.
Let's hope this doesn't causes build failures for e.g. musl: I just
know it's good on macOS and our Linux CI.
It's been a long time.
One fix this brings, is I discovered we #include assert.h or cassert
in a lot of places. If those ever happen to be in a file that doesn't
include common.h, or we are before common.h gets included, we're
unawaringly working with the system 'assert' macro again, which
may get disabled for debug builds or at least has different
behavior on crash. We undef 'assert' and redefine it in common.h.
Those were all eliminated, except in one catch-22 spot for
maybe.h: it can't include common.h. A fix might be to
make a fish_assert.h that *usually* common.h exports.
This is a *tiny* commit code-wise, but the explanation is a bit
longer.
When I made string read in chunks, I picked a chunk size from bash's
read, under the assumption that they had picked a good one.
It turns out, on the (linux) systems I've tested, that's simply not
true.
My tests show that a bigger chunk size of up to 4096 is better *across
the board*:
- It's better with very large inputs
- It's equal-to-slightly-better with small inputs
- It's equal-to-slightly-better even if we quit early
My test setup:
0. Create various fish builds with various sizes for
STRING_CHUNK_SIZE, name them "fish-$CHUNKSIZE".
1. Download the npm package names from
https://github.com/nice-registry/all-the-package-names/blob/master/names.json (I
used commit 87451ea77562a0b1b32550124e3ab4a657bf166c, so it's 46.8MB)
2. Extract the names so we get a line-based version:
```fish
jq '.[]' names.json | string trim -c '"' >/tmp/all
```
3. Create various sizes of random extracts:
```fish
for f in 10000 1000 500 50
shuf /tmp/all | head -n $f > /tmp/$f
end
```
(the idea here is to defeat any form of pattern in the input).
4. Run benchmarks:
hyperfine -w 3 ./fish-{128,512,1024,2048,4096}"
-c 'for i in (seq 1000)
string match -re foot < $f
end; true'"
(reduce the seq size for the larger files so you don't have to wait
for hours - the idea here is to have some time running string and not
just fish startup time)
This shows results pretty much like
```
Summary
'./fish-2048 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true'' ran
1.01 ± 0.02 times faster than './fish-4096 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
1.02 ± 0.03 times faster than './fish-1024 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
1.08 ± 0.03 times faster than './fish-512 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
1.47 ± 0.07 times faster than './fish-128 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
```
So we see that up to 1024 there's a difference, and after that the
returns are marginal. So we stick with 1024 because of the memory
trade-off.
----
Fun extra:
Comparisons with `grep` (GNU grep 3.7) are *weird*. Because you both
get
```
'./fish-4096 -c 'for i in (seq 100); string match -re foot < /tmp/500; end; true'' ran
11.65 ± 0.23 times faster than 'fish -c 'for i in (seq 100); command grep foot /tmp/500; end''
```
and
```
'fish -c 'for i in (seq 2); command grep foot /tmp/all; end'' ran
66.34 ± 3.00 times faster than './fish-4096 -c 'for i in (seq 2);
string match -re foot < /tmp/all; end; true''
100.05 ± 4.31 times faster than './fish-128 -c 'for i in (seq 2);
string match -re foot < /tmp/all; end; true''
```
Basically, if you *can* give grep a lot of work at once (~40MB in this
case), it'll churn through it like butter. But if you have to call it
a lot, string beats it by virtue of cheating.
Rephrase this to more explicitly indicate that the uvar actually
was successfully set. I believe the prior phrasing can leave some
ambiguity as far as wether set just failed with an error, whether it
has done anything or not.
Now uses the same macro other builtins use for a missing -e arg,
and the error message show the short or long option as it was used.
e.g. before
$ set -e
set: Erase needs a variable name
after
$ set --erase
set: --erase: option requires an argument
$ set -e
set: -e: option requires an argument
Intern'd strings were intended to be "shared" to reduce memory usage but
this optimization doesn't carry its weight. Remove it. No functional
change expected.
We store filenames in function definitions to indicate where the
function comes from. Previously these were intern'd strings. Switch them
to a shared_ptr<wcstring>, intending to remove intern'd strings.
This was misguidedly "fixed" in
9e08609f85, which made printf error out
with any "-"-prefixed words as the first argument.
Note: This means currently `printf --help` doesn't print the help.
This also matches `echo`, and we currently don't have anything to make
a literal `--help` execute a builtin help except for keywords. Oh well.
Fixes#9132
* string repeat: Don't allocate repeated string all at once
This used to allocate one string and fill it with the necessary
repetitions, which could be a very very large string.
Now, it instead uses one buffer and fills it to a chunk size,
and then writes that.
This fixes:
1. We no longer crash with too large max/count values. Before they
caused a bad_alloc because we tried to fill all RAM.
2. We no longer fill all RAM if given a big-but-not-too-big value. You
could've caused fish to eat *most* of your RAM here.
3. It can start writing almost immediately, instead of waiting
potentially minutes to start.
Performance is about the same to slightly faster overall.
ESCAPE_ALL is not really a helpful name. Also it's the most common flag.
Let's make it the default so we can remove this unhelpful name.
While at it, let's add a default value for the flags argument, which helps
most callers.
The absence of ESCAPE_ALL makes it only escape nonprintable characters
(with some exceptions). We use this for displaying strings in the completion
pager as well as for the human-readable output of "set", "set -S", "bind"
and "functions".
No functional change.
When listing variables, "set" tries to escape variable names.
Since variable names cannot have special characters, this doesn't do anything.
The escaping is one of the few places that does not use ESCAPE_ALL. This has
complex behavior; let's alleviate the problem by getting rid of this call.
No functional change.
Or should we stop using it?
I'm fine with either always or never using auto-formatting but our current
way of using it only sometimes is confusing.
No functional change.
Otherwise realpath would add the cwd, which would be broken if fish
ever cd'd.
We could add the original cwd, but even that isn't enough, because we
need *the parent's* idea of cwd and $PATH.
Or, alternatively, what we need is for the OS to give us the actual
path to ourselves.
get_executable_path says: "This needs to be realpath'd"
So how about we do that? The only other place we use it is fish.cpp,
and we realpath it there already.
See #9085
This was an inadvertent change from
cc632d6ae9.
Because we used wgetcwd directly before, we always got the "physical"
resolved $PWD.
There's an argument to be made to use the logical $PWD here as well
but I prefer not to make changes lik that in a random commit without
good reason.
This can be used to print the modification time, like `stat` with some
options.
The reason is that `stat` has caused us a number of portability
headaches:
1. It's not available everywhere by default
2. The versions are quite different
For instance, with GNU stat it's `stat -c '%Y'`, with macOS it's `stat
-f %m`.
So now checking a cache file can be done just with builtins.
These are non-POSIX extensions other test(1) utilities implement,
which compares the modification time of two files as proposed for
fish in #3589: testing if one file is newer than another file.
-ef is a common extension to test(1) which checks if two paths refer
to the same file, by comparing the dev and inode numbers.
Take advantage of additional cleanup unlocked by this refactoring,
including eliminating unneeded error returns and simplifying some
control flow.
No user-visible behavior change expected here.
This switches builtin_string from using PCRE2 directly, to using the new re
component. This simplifies some code and removes redundancy.
No user-visible behavior change expected here.
This switches the flag_to_function from a map to just an ordinary switch
statement. This saves some memory/startup time and removes some
relocations. No functional change here.
This adds a line to `set --show`s output like
```
$PATH: originally inherited as |/home/alfa/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/var/lib/flatpak/exports/bin|
```
to help with debugging.
Note that this means keeping an additional copy of the original
environment around. At most this would be one ARG_MAX's worth, which
is about 2M.
It's still useful without, for instance to implement a command that
takes no options, or to check min-args or max-args.
(technically no optspecs, no min/max args and --ignore-unknown does
nothing, but that's a very specific error that we don't need to forbid)
Fixes#9006
This makes it so `complete -c foo -n test1 -n test2` registers *both*
conditions, and when it comes time to check the candidate, tries both,
in that order. If any fails it stops, if all succeed the completion is offered.
The reason for this is that it helps with caching - we have a
condition cache, but conditions like
```fish
test (count (commandline -opc)) -ge 2; and contains -- (commandline -opc)[2] length
test (count (commandline -opc)) -ge 2; and contains -- (commandline -opc)[2] sub
```
defeats it pretty easily, because the cache only looks at the entire
script as a string - it can't tell that the first `test` is the same
in both.
So this means we separate it into
```fish
complete -f -c string -n "test (count (commandline -opc)) -ge 2; and contains -- (commandline -opc)[2] length" -s V -l visible -d "Use the visible width, excluding escape sequences"
+complete -f -c string -n "test (count (commandline -opc)) -ge 2" -n "contains -- (commandline -opc)[2] length" -s V -l visible -d "Use the visible width, excluding escape sequences"
```
which allows the `test` to be cached.
In tests, this improves performance for the string completions by 30%
by reducing all the redundant `test` calls.
The `git` completions can also greatly benefit from this.
This would still remove non-existent paths, which isn't a strict
inversion and contradicts the docs.
Currently, to only allow paths that exist but don't pass a type check,
you'd have to filter twice:
path filter -Z foo bar | path filter -vfz
If a shortcut for this becomes necessary we can add it later.
This is now added to the two commands that definitely deal with
relative paths.
It doesn't work for e.g. `path basename`, because after removing the
dirname prepending a "./" doesn't refer to the same file, and the
basename is also expected to not contain any slashes.
Because we now count the extension including the ".", we print an
empty entry.
This makes e.g.
```fish
set -l base (path change-extension '' $somefile)
set -l ext (path extension $somefile)
echo $base$ext
```
reconstruct the filename, and makes it easier to deal with files with
no extension.
This means "../" components are cancelled out even after non-existent
paths or files.
(the alternative is to error out, but being able to say `path resolve
/path/to/file/../../` over `path resolve (path dirname
/path/to/file)/../../` seems worth it?)
This sorts paths by basename, dirname or full path - in future
possibly size or age.
It takes --invert to invert the sort and "--what=basename|dirname|..."
to specify what to sort
This can be used to implement better conf.d sorting, with something
like
```fish
set -l sourcelist
for file in (path sort --what=basename $__fish_config_dir/conf.d/*.fish $__fish_sysconf_dir/conf.d/*.fish $vendor_confdirs/*.fish)
```
which will iterate over the files by their basename. Then we keep a
list of their basenames to skip over anything that was already
sourced, like before.
This just goes back until it finds an existent path, resolves that,
and adds the normalized rest on top.
So if you try
/bin/foo/bar////../baz
and /bin exists as a symlink to /usr/bin, it would resolve that, and
normalize the rest, giving
/usr/bin/foo/baz
(note: We might want to add this to realpath as well?)
This includes the "." in what `path extension` prints.
This allows distinguishing between an empty extension (just `.`) and a
non-existent extension (no `.` at all).
This cleans up the path_get_path function which is used to resolve a
command name against $PATH, by removing the dependence on errno and
being explicit about which error is returned.
Should be no user-visible change here.
Apple's terminfo has missing support for enter_italics_mode,
exit_italics_mode, and enter_dim_mode. Previously we would hack in such
support in set_color; migrate that to init_curses so we do it up-front
instead of opportunistically.
This commit was problematic for a few reasons:
1. It silently changed the behavior of argparse, by switching which
characters were replaced with `_` from non-alphanumeric to punctuation.
This is a potentially breaking change and there doesn't appear to be any
justification for it.
2. It combines a one-line if with a multi-line else which we should try
to avoid.
This reverts commit 63bd4eda55.
This reverts commit 4f835a0f0f.
If we ever need any of these... they're in this commit:
fish_wcswidth_visible()
status_cmd_opts_t::feature_name
completion_t::is_naturally_less_than()
parser_t::set_empty_var_and_fire()
parser_t::get_block_desc()
parser_keywords_skip_arguments()
parser_keywords_is_block()
job_t::has_internal_proc()
fish_wcswidth_visible()
1. Bravely use a real enum for has_arg, despite the warnings.
2. Use some C++11 initializers so we don't have to pass an int for this
parameter.
No functional change expected here.
This is a big cleanup to how tty transfer works. Recall that when job
control is active, we transfer the tty to jobs via tcsetpgrp().
Previously, transferring was done "as needed" in continue_job. That is, if
we are running a job, and the job wants the terminal and does not have it,
we will transfer the tty at that point.
This got pretty weird when running mixed pipelines. For example:
cmd1 | func1 | cmd2
Here we would run `func1` before calling continue_job. Thus the tty
would be transferred by the nested function invocation, and also restored
by that invocation, potentially racing with tty manipulation from cmd1 or
cmd2.
In the new model, migrate the tty transfer responsibility outside of
continue_job. The caller of continue_job is then responsible for setting up
the tty. There's two places where this gets done:
1. In `exec_job`, where we run a job for the first time.
2. In `builtin_fg` where we continue a stopped job in the foreground.
Fixes#8699
This just defines a constant to whichever tparm implementation we're
using (either the actual, working one the system provides, or our
kludge to paper over Solaris' inadequacies).
This means that there won't be so much ping-ponging of what "tparm"
stands for. "tparm" is the system's function. Only we don't use it,
just like we don't use wcstod directly.
Fixes#8780
* New -n flag for string join command.
This is an argument that excludes empty result items. Fixes#8351
* New documentation for string-join.
The new argument --no-empty was added at string-join manpage.
* New completions for the new -n flag for string join.
* Remove the documentation of the new -n flag of string join0
The reason to remove this new argument in the join0 is that this flag basically doesn't make any difference in the join0.
* Refactor the validation for the string join.
The string join command was using the length of the argument, this commit changes the validation to use the empty function.
* Revert #4b56ab452
The reason for the revert is thath the build broke on the ubuntu in the Github actions.
* Revert #e72e239a1
The reason the compilation on GitHub broke is that the test was weird, it didn't even run it, Common CI systems are typically very very resource-constrained.
* Resolve conflicts in the string-join.rst.
* Resolve conflicts in the "string-join.rst".
commit #1242d0fd7 not fixed all conflicts.
We can't always read in chunks because we often can't bear to
overread:
```fish
echo foo\nbar | begin
read -l foo
read -l bar
end
```
needs to have the first read read `foo` and the second read `bar`. So
here we can only read one byte at a time.
However, when we are directly redirected:
```fish
echo foo | read foo
```
we can, because the data is only for us anyway. The stream will be
closed after, so anything not read just goes away. Nobody else is
there to read.
This dramatically speeds up `read` of long lines through a pipe. How
much depends on the length of the line.
With lines of 5000 characters it's about 15x, with lines of 50
characters about 2x, lines of 5 characters about 1.07x.
See #8542.
The new --escape option means that -C is not necessarily the last option;
We have this scenario where we produce a bogus error
$ fish -c 'complete -C --escape'
complete: --escape: option requires an argument
--escape doesn't take arguments, so let the error message say -C.
Previously, when we got an unknown option with --ignore-unknown, we
would increment woptind but still try to read the same contents.
This means in e.g.
```
argparse -i h -- -ooo -h
```
The `-h` would also be skipped as an option, because after the first
`-o` getopt reads the other two `-o` and skips that many options.
This could be handled more extensively in wgetopt, but the simpler fix
is to just skip to the next argv entry once we have an unknown option
- there's nothing more we can do with it anyway!
Additionally, document this and clearly explain that we currently
don't transform the option.
Fixes#8637
The builtin history delete call has some code that removes a leading and
trailing quote from its arguments. This code dates back to ec34f2527a,
when the builtin was introduced. It seems wrong and tests pass
without it. Let's bravely remove it.
Use the remaining_to_disclose count to determine if all completions
are shown (allows consistent behavior between short and long completion
lists).
Closes#8485
A command like "printf nonewline | sed s/x/y/" does not print a
concluding newline, whereas "printf nnl | string replace x y" does.
This is an edge case -- usually the user input does have a newline at
the end -- but it seems still better for this command to just forward
the user's data.
Teach most string subcommands to check if stdin is missing the trailing
newline, and stop adding one in that case.
This does not apply when input is read from commandline arguments.
* Most subcommands stop adding the final newline, because they don't
really care about newlines, so besides their normal processing,
they just want to preserve user input. They are:
* string collect
* string escape/unescape
* string join¹
* string lower/upper
* string pad
* string replace
* string repeat
* string sub
* string trim
* string match keeps adding the newline, following "grep". Additionally,
for string match --regex, it's important to output capture groups
separated by newlines, resulting in multiple output lines for an
input line. So it is not obvious where to leave out the newline.
* string split/split0 keep adding the newline for the same reason --
they are meant to output multiple elements for a single input line.
¹) string join0 is not changed because it already printed a trailing
zero byte instead of the trailing newline. This is consistent
with other tools like "find -print0".
Closes#3847
A completion entry like «complete -a '\\~'» results in completions
that insert \~ into the command line. However we usually want to
insert ~, but there is no way to do that.
There are a couple of longstanding issues about completion escaping
[1]. Until we fix those in a general way, fix the common case by
never escaping tildes when applying custom completions to the command
line. This is a hack but will probably work out fine because we don't
expect literal tildes in arguments.
The tilde is included in completions for cdh, or
__fish_complete_suffix, which simply forwards results from "complete
-C". Revert a workaround to cdh that expanded ~, because we can now
render that without escaping.
Closes#4570, #8441
[ja: tweak patch and commit message]
[1]: https://github.com/fish-shell/fish-shell/pull/8441#discussion_r748803338
check_global_scope_exists is meant to warn if the user creates a
universal variable shadowing a global. In practice it always returned
success (though it may print an error). Remove its return value and
clean up the call sites. Also rename it to
`warn_if_uvar_shadows_global`. No functional change in this commit.
+ No functional change here, just renames and #include changes.
+ CMake can't have slashes in the target names. I'm suspciious of
that weird machinery for test, but I made it work.
+ A couple of builtins did not include their own headers, that
is no longer the case.