wcs2string converts a wide string to a narrow one. The result is
null-terminated and may also contain interior null-characters.
std::string allows this.
Rust's null-terminated string, CString, does not like interior null-characters.
This means we will need to use Vec<u8> or OsString for the places where we
use interior null-characters.
On the other hand, we want to use CString for places that require a
null-terminator, because other Rust types don't guarantee the null-terminator.
Turns out there is basically no overlap between the two use cases, so make
it two functions. Their equivalents in Rust will have the same name, so
we'll only need to adjust the type when porting.
This can be triggered on linux with:
```js
import { spawn } from 'child_process';
const shell = spawn('/home/alfa/dev/fish-shell/build-c++/fish', []);
```
Under node 19.8.1.
*No clue* how that happens, but since this is a workaround we shall
skip it.
Another from the "why are we asserting instead of doing something
sensible" department.
The alternative is to make exit() and return() compute their own exit
code, but tbh I don't want any *other* builtin to hit this either?
Fixes#9659
This shows some of the ugliness of the rust borrow checker when it comes to
safely implementing any sort of recursive access and the need to be overly
explicit about which types are actually used across threads and which aren't.
We're forced to use an `Arc` for `ItemMaker` (née `item_maker_t`) because
there's no other way to make it clear that its lifetime will last longer than
the FdMonitor's. But once we've created an `Arc<T>` we can't call
`Arc::get_mut()` to get an `&mut T` once we've created even a single weak
reference to the Arc (because that weak ref could be upgraded to a strong ref at
any time). This means we need to finish configuring any non-atomic properties
(such as `ItemMaker::always_exit`) before we initialize the callback (which
needs an `Arc<ItemMaker>` to do its thing).
Because rust doesn't like self-referential types and because of the fact that we
now need to create both the `ItemMaker` and the `FdMonitorItem` separately
before we set the callback (at which point it becomes impossible to get a
mutable reference to the `ItemMaker`), `ItemMaker::item` is dropped from the
struct and we instead have the "constructor" for `ItemMaker` take a reference to
an `FdMonitor` instance and directly add itself to the monitor's set, meaning we
don't need to move the item out of the `ItemMaker` in order to add it to the
`FdMonitor` set later.
CXX does not allow generic types like maybe_t. When porting a C++ function
that returns maybe_t to Rust, we return std::unique_ptr instead. Let's make
the transition more seamless by allowing to convert back to maybe_t implicitly.
* wutil: Rewrite `wrealpath` in Rust
* Reduce use of FFI types in `wrealpath`
* Addressed PR comments regarding allocation
* Replace let binding assignment with regular comparison
More ugliness with types that cxx bridge can't recognize as being POD. Using
pointers to get/set `termios` values with an assert to make sure we're using
identical definitions on both sides (in cpp from the system headers and in rust
from the libc crate as exported).
I don't know why cxx bridge doesn't allow `SharedPtr<OpaqueRustType>` but we can
work around it in C++ by converting a `Box<T>` to a `shared_ptr<T>` then convert
it back when it needs to be destructed. I can't find a clean way of doing it
from the cxx bridge wrapper so for now it needs to be done manually in the C++
code.
Types/values that are drop-in ready over ffi are renamed to match the old cpp
names but for types that now differ due to ffi difficulties I've left the `_ffi`
in the function names to indicate that this isn't the "correct" way of using the
types/methods.
The way cxx bridge works, it doesn't recognize any types from another module as
being shared cxx bridge types with generations native to both C++ and Rust,
meaning every module that was going to use function pointers would have to
define its own `c_void` type (because cxx bridge doesn't recognize any of
libc::c_void, std::ffi::c_void, or autocxx::c_void).
FFI on other platforms has long used the equivalent of `uint8_t *` as an
alternative to `void *` for code where `void` was not available or was
undesirable for some reason. We can join the club - this way we can always use
`* {const|mut} u8` in our rust code and `uint8_t *` in our C++ code to pass
around parameters or values over the C abi.
I needed to rename some types already ported to rust so they don't clash with
their still-extant cpp counterparts. Helper ffi functions added to avoid needing
to dynamically allocate an FdMonitorItem for every fd (we use dozens per basic
prompt).
I ported some functions from cpp to rust that are used only in the backend but
without removing their existing cpp counterparts so cpp code can continue to use
their version of them (`wperror` and `make_detached_pthread`).
I ran into issues porting line-by-line logic because rust inverts the behavior
of `std::remove_if(..)` by making it (basically) `Vec::retain_if(..)` so I
replaced bools with an explict enum to make everything clearer.
I'll port the cpp tests for this separately, for now they're using ffi.
Porting closures was ugly. It's nothing hard, but it's very ugly as now each
capturing lambda has been changed into an explicit struct that contains its
parameters (that needs to be dynamically allocated), a standalone callback
(member) function to replace the lambda contents, and a separate trampoline
function to call it from rust over the shared C abi (not really relevant to
x86_64 w/ its single calling convention but probably needed on other platforms).
I don't like that `fd_monitor.rs` has its own `c_void`. I couldn't find a way to
move that to `ffi.rs` but still get cxx bridge to consider it a shared POD.
Every time I moved it to a different module, it would consider it to be an
opaque rust type instead. I worry this means we're going to have multiple
`c_void1`, `c_void2`, etc. types as we continue to port code to use function
pointers.
Also, rust treats raw pointers as foreign so you can't do `impl Send for * const
Foo` even if `Foo` is from the same module. That necessitated a wrapper type
(`void_ptr`) that implements `Send` and `Sync` so we can move stuff between
threads.
The code in fd_monitor_t has been split into two objects, one that is used by
the caller and a separate one associated with the background thread (this is
made nice and clean by rust's ownership model). Objects not needed under the
lock (i.e. accessed by the background thread exclusively) were moved to the
separate `BackgroundFdMonitor` type.
Keeps the location of original function definition, and also stores
where it was copied. `functions` and `type` show both locations,
instead of none. It also retains the line numbers in the stack trace.
By default, fish does not complete files that have leading dots, unless the
wildcard itself has a leading dot. However this also affected completions;
for example `git add` would not offer `.gitlab-ci.yml` because it has a
leading dot.
Relax this for custom completions. Default file expansion still
suppresses leading dots, but now custom completions can create
leading-dot completions and they will be offered.
Fixes#3707.
When we draw the prompt, we move the cursor to the actual
position *we* think it is by issuing a carriage return (via
`move(0,0)`), and then going forward until we hit the spot.
This helps when the terminal and fish disagree on the width of the
prompt, because we are now definitely in the correct place, so we can
only overwrite a bit of the prompt (if it renders longer than we
expected) or leave space after the prompt. Both of these are benign in
comparison to staircase effects we would otherwise get.
Unfortunately, midnight commander ("mc") tries to extract the last
line of the prompt, and does so in a way that is overly naive - it
resets everything to 0 when it sees a `\r`, and doesn't account for
cursor movement. In effect it's playing a terminal, but not committing
to the bit.
Since this has been an open request in mc for quite a while, we hack
around it, by checking the $MC_SID environment variable.
If we see it, we skip the clearing. We end up most likely doing
relative movement from where we think we are, and in most cases it
should be *fine*.
This is early work but I guess there's no harm in pushing it?
Some thoughts on the conventions:
Types that live only inside Rust follow Rust naming convention
("FeatureMetadata").
Types that live on both sides of the language boundary follow the existing
naming ("feature_flag_t").
The alternative is to define a type alias ("using feature_flag_t =
rust::FeatureFlag") but that doesn't seem to be supported in "[cxx::bridge]"
blocks. We could put it in a header ("future_feature_flags.h").
"feature_metadata_t" is a variant of "FeatureMetadata" that can cross
the language boundary. This has the advantage that we can avoid tainting
"FeatureMetadata" with "CxxString" and such. This is an experimental approach,
probably not what we should do in general.
The initial port of feature flags requires a global initialization. Since
fish_indent accesses feature flags, let's make sure to initialize them here.
In future, we can stop initializing things fish_indent doesn't need (like
the topic monitor) but that's no big deal. Global initialization should
always be a benign addition.
The original implementation without the test took me 3 hours (first time
seriously looking into this)
The functions take "wcharz_t" for smooth integration with existing C++ callers.
This is at the expense of Rust callers, which would prefer "&wstr". Would be
nice to declare a function parameter that accepts both but I don't think
that really works since "wcharz_t" drops the lifetime annotation.
This works around an autocxx limitations where different types cannot
have the same name even if they live in different namespace.
ast::job_t conflicts with job_t.
This translated ctrl-k to "\v", which is a "vertical tab", and ctrl-l
to "\f" and ctrl-g to "\a".
There is no "vertical tab" or "alarm" or "\f" *key*, so these
shouldn't be translated. Just drop these and call them `\ck` and such.
(vertical tab specifically is utterly useless and I would be okay with
dropping it entirely, I have never seen it used anywhere)
Commit 3b30d92b6 (Commit transient edit when closing pager, 2022-08-31)
inadvertently introduced two regressions to history search:
1. It made Escape keeps the selected history entry,
instead of restoring the commandline before history search.
2. It made history search commands add undo entries.
Fix both of this issues.
Inadvertently broken in a2d816710f,
this made `cd .` no longer offer `cd ../` (same for general file completions
like `ls .`, which only offers dotfiles)
This meant we didn't actually do our weird en/decoding scheme for e.g.
a C locale, which meant that, when you then switch to a proper locale
the previous variables were broken.
I don't know how to test this automatically - none of my attempts seem
to ever *fail* with the old code, here's what you'd do manually:
- Run fish with an actual C locale (LC_ALL=C
fish_allow_singlebyte_locale=1 fish)
- `set -gx foo 💩`
- `set -e LC_ALL`
- `echo $foo` outputs "💩" if it works and "ð⏎" if it's broken.
Fixes#2613
This means cleaning out old universal variables is now just:
```fish
abbr --erase (abbr --list)
```
which makes upgrading much easier.
Note that this erases the currently defined variable and/or any
universal. It doesn't stop at the former because that makes it *easy*
to remove the universals (no running `abbr --erase` twice), and it
doesn't care about globals because, well, they would be gone on
restart anyway.
Fixes#9468.
Like I mentioned in #9089, 12 entries is a bit few.
So, instead, we do like we do for completions before disclosing and
pick half the screen (but at least X, in this case 12).
This avoids filling the entire screen, and will avoid an unsightly "X
more entries" (which requires scrolling down to fully disclose)
because it matches what the pager does.
Note: For multiline commands we can be pushed further upwards, and in
case of a multi-column layout we could fit more lines. That would
require asking the pager to fit as many as possible and give us back
the index of the last matching entry and rewinding the history search.
That's gonna be left as an exercise for later if it turns out to be necessary.
This now means `abbr --add` has two modes:
```fish
abbr --add name --function foo --regex regex
```
```fish
abbr --add name --regex regex replacement
```
This is because `--function` was seen to be confusing as a boolean flag.
Unfortunately print_hints was true *by default* - so for all builtins
that didn't pass it it would now be false instead.
This resulted in the trailer missing, which includes the line number
and context. So if you ran a script that includes `bind -M` the error
message would now just be "bind: -M: option requires an argument",
with no indication as to where.
This reverts commit 8a50d47a46.
The print_hints variable was always false, so just remove it.
This caused a cascade of other changes where the parser_t variable
becomes unused, so remove it from the call sites.
No functional change expected here.
When we insert characters that don't yet have highlighting, we use the
highlighting to the left, unless there is nothing to our left. The logic to
check if we are the leftmost character uses an overly loose comparison. Let's
make it more specific.
No functional change.
When there are multiple event handlers for a single event, we would print
the same log statement twice. Let's add the function name to make this
less confusing.
This would print
```
abbr -a -- dotdot --regex ^\\.\\.+\$ --function multicd
```
which expands "dotdot" to "--regex ^\\.\\.+\$...".
Instead, we move the name to right before the replacement, and move
the `--` before that:
```
abbr -a --regex ^\\.\\.+\$ --function -- dotdot multicd
```
It might be possible to improve that, but this at least round-trips.
Historical behavior is to stop option parsing at the first non-option argument.
Since we have added more options, it seemed impractical to keep that behavior.
However people are using options in their abbr expansions ("abbr e emacs
-nw"). To support this, we ignore options. However, we only ignore them
if they are not valid "abbr" options. Let's ignore all options in the
expansion definition, which is a small price to pay to keep most existing
configurations working.
Fixes#9410
This does not fix other cases which used to work, like
abbr x -unknown
Those are hopefully not used by anyone, so I don't think we need to maintain
support for that.
Enhances abbreviations with extra features
- global abbreviations
- trigger on regex match as alternative to literal match
- the ability to expand abbreviations with a user-defined function
- the ability to set cursor position after expansion
Also default the marker to '%'. So you may write:
abbr -a L --position anywhere --set-cursor "% | less"
or set an explicit marker:
abbr -a L --position anywhere --set-cursor=! "! | less"
This renames abbreviation triggers from `--trigger-on entry` and
`--trigger-on exec` to `--on-space` and `--on-enter`. These names are less
precise, as abbreviations trigger on any character that terminates a word
or any key binding that triggers exec, but they're also more human friendly
and that's a better tradeoff.
set-cursor enables abbreviations to specify the cursor location after
expansion, by passing in a string which is expected to be found in the
expansion. For example you may create an abbreviation like `L!`:
abbr L! --position anywhere --set-cursor ! "! | less"
and the cursor will be positioned where the "!" is after expansion, with
the "| less" appearing to its right.
This adds support for the `--function` option of abbreviations, so that the
expansion of an abbreviation may be generated dynamically via a fish
function.
Prior to this change, abbreviations were stored as fish variables, often
universal. However we intend to add additional features to abbreviations
which would be very awkward to shoe-horn into variables.
Re-implement abbreviations using a builtin, managing them internally.
Existing abbreviations stored in universal variables are still imported,
for compatibility. However new abbreviations will need to be added to a
function. A follow-up commit will add it.
Now that abbr is a built-in, remove the abbr function; but leave the
abbr.fish file so that stale files from past installs do not override
the abbr builtin.
This allows adjusting a pattern string so that it matches an entire
string, by wrapping the regex in a group like ^(?:...)$
This is a workaround for the fact that PCRE2_ENDANCHORED is unavailable
on PCRE2 prior to 2017, so we have to adjust the pattern instead.
Also introduce an overload of match() which creates its own
match_data_t.
We have had multiple crashes for relative CDPATH entries. Commit 5e274066e
(Always return absolute path in path_get_cdpath, 2019-10-17) tried to fix
all of them but it failed to do justice to its title. Let's fix this to
actually return absolute paths, always. Take care to to normalize the path
because it is used for autosuggestions. The normalization is mostly relevant
for CDPATH=. (the default) but it doesn't hurt others.
Closes#9407
wopterr was a feature to allow wgetopt to emit error messages; but we do
not use this and never will. Remove its support. No functional change
expected here.
We wrongly highlight this as prefix when actually the trailing slash should
invalidate it. Turns out path normalization drops the slash, so let's
sidestep that.
Fixes#9394
The "flag" field enables an option to discover which flag it was invoked
with. However in practice none of our options use multiple flags so this
parameter was always nullptr. Remove it and fix up all the builtins to
stop passing this.
No functional change here.
I believe this should be identical to the previous code and handle the same
cases (I'm guessing going by the comment that this came from a C codebase
without `bool` types).
The problem with the previous code is that it tripped up the `clangd` analyzer
into thinking `assert()` expressions can/should be simplified via DeMorgan's to
improve readability (because it was seeing the fully expanded macro).
The tty_ownership test was sometimes failing. In this test,
`fish_test_helper` creates a child and transfers the tty to it,
"abandoning" the tty. In some cases, the child was running before the
parent; the child claims the tty. When the parent tries to transfer it to
the child, it get SIGTTIN and stops. Fix this by ignoring SIGTTIN and
SIGTTOU.
This only affects macOS and BSDs.
The stack overflow tests are too slow without this.
This is because the tests are essentially quadratic: with 500 jobs, and
each job attempts to reap all jobs.
Inside a comment we offer plain file completions (or command completions if
the comment is in command position). However these completions are broken
because they don't consider any of the surrounding characters. For example
with a command line
echo # comment
^ cursor
we suggest file completions and insert them as
echo # comsomefile ment
Providing completions inside comments does not seem useful and it can be
misleading. Let's remove the completions; this should communicate better that
we are in a free-form comment that's not subject to fish syntax.
Closes#9320
When unsetting, the scope indicates the scope that was *removed* not
set, so the warning is incorrectly triggered. If anything, the confusion
is now removed or we emit a warning that the variable is still present
in another scope (but don't do that!).
Closes#9338.
This fixes#9321
IEEE Std 1003.1-2017 Issue 6 added optional error condition
[EINVAL] for if no conversion could be performed.
Switch back to wcstoimax/wcstoumax: do not work around the old FreeBSD
8 issue.
Add a test for printf '%d %d' 1 2 3
This addresses a long-standing TODO where `complete -C` output isn't
deduplicated.
With this patch, the same deduplication and sort procedure that is run on actual
pager completions is also executed for `complete -C` completions (with a `-C`
payload specified).
This makes it possible to use `complete -C` to test what completions will
actually be generated by the completions pager instead of it displaying
something completely divorced from reality, improving the productivity of fish
completions developers.
Note that completions that wouldn't be shown in the pager are also omitted from
the results, e.g. `test/buildroot/` and `test/fish_expand_test/` are omitted
from the check matches in `checks/complete_directories.fish` because even if
they were generated, the pager wouldn't have shown them. This again makes
reasoning about and debugging completions much easier and more sane.
This reverts commit 1c92d4c5db and
reintroduces support for trivially copyable `maybe_t` impls but with a
GCC version check to disable the optimization for GNU GCC compiler
versions 9 and below.
GCC 8.3.0 armhf builds seem to have a problem with the trivially
copyable `maybe_t` impl that introduces odd heisenbugs that cause the
tests to fail. GDB reveals that `maybe_t` function parameters received
in the callee differ from what was passed-in by the caller.
This behavior appears to be (but has not been confirmed as) a
platform-specific compiler bug. Under the same system (32-bit Debian 10
armhf), compiling with clang 7.0.1 does not result in any bugs and
causes all the tests to pass while compiling with GCC 10.2 under 32-bit
Debian 11 armhf also doesn't run into any problems, so just expand the
existing GCC version check that gates support for trivially copyable
`maybe_t` impls to encompass both the troublesome GCC 8 version and the
untested GCC 9 version.
This reverts commit 9d303a74e3.
This reverts commit 0305c842e6.
9d303a7 broke 32-bit armhf builds for unknown reasons, specifically in
settings where a trivial copy of `maybe_t<int>` was performed. A caller
would pass a literal int in the place of a `maybe_t<int>` parameter and
the callee would see a populated `maybe_t` but with a value of `0`
rather than the actual value that was passed in. It was too painful to
debug to a resolution under qemu.
Fixes ommitted newline char shown after complete -n'(foo)'
Also axes the 'contains syntax errors' line before the error.
Update tests
before
> complete -n'(foo)'
complete: Condition '(foo)' contained a syntax error
complete: Command substitutions not allowed⏎
after
> complete -n'(foo)'
complete: -n '(foo)': command substitutions not allowed here
This is a salvage of the "no functional changes" part of #9221, and cherry-picks
storing completion entries in a vector instead of a linked list. The legacy
"reverse intuitive" group ordering is kept by iterating in reverse order.
Tests pass but don't actually cover group order, which needs another test.
Makes it possible to retrieve the currently executing command line as
opposed to the currently executing command (`status current-command`).
Closes#8905.
There should be no functional changes in this commit.
The global variable `$_` set in the parser variables by `reader.cpp` and
read by the `status` builtin was deprecated in fish 2.0 but kept around
internally because there's no good way to store/share/forward parser
variables.
A new enum is added that identifies the status variable and they are
stored in a private array in the parser. There is no need for
synchronization because they are only set during job init and never
thereafter. This is currently asserted via ASSERT_IS_MAIN_THREAD() but
that assert can be dropped in the interest of making the parser possible
to clone and use from worker threads.
The old `$_` global variable is still kept for backwards compatibility,
though it will be dropped in a future release.
As the user is typing an argument, fish continually checks if the input is
the prefix of a valid file path. If yes, the input is underlined.
The same prefix-logic is used for all tokens on the command line, even for
"finished" tokens. This means we highlight any token that happens to be
a prefix of a valid file path. We actually want this to only apply to the
token that the user is currently typing.
Let's use the prefix-logic only for tokens adjacent to the cursor. This should
better match user expectations (and reduce IO traffic). I don't think this is
the perfect criteria but I don't know how else we can determine if a token is
"unfinished".
When visiting the "cd" node, we mark invalid paths as error, but don't
underline valid paths. This works fine most of the time because we later
underline paths (for any command, not just "cd").
However the latter check fails to honor CDPATH. Let's correct that, which
also allows to simplify the logic.
The next commit wants to move the "Underline every valid path" logic into the
visit() methods. The logic currently polls the cancel checker before checking
each path. If that's valid, it should probably have the same behavior inside
visit(). Since we currently can't cancel an AST-visitation, the next best
thing seems to suspend all IO operations, the rest should be very fast anyway.
I'm not sure if the motivation is strong enough; a conceivable alternative
would be to stop using the cancel checker altogether for highlighting.
When passing a value of type maybe_t<size_t>, clangd complains:
Parameter 'cursor' is passed by value and only copied once; consider
moving it to avoid unnecessary copies (fix available)
We get this warning because maybe_t<size_t> is not trivially copyable
because it has a user-defined destructor and copy-constructor. Let's remove
them if the contained type is trivially copyable, to avoid such warnings.
No functional change.
The destructor is equivalent to the compiler-generated one. The user-defined
destructor prevents maybe_t<size_t> from bearing the predicate "trivially
copyable". Let's remove it. No functional change.
It seems to have originally been thought that the only possible way a stack
overflow could happen is via function calls, but there are other possibilities.
Issue #9302 reports how `eval` can be abused to recursively execute a string
substitution ad infinitum, triggering a stack overflow in fish.
This patch extends the stack overflow check to also check the current
`eval_level` against a new constant `FISH_MAX_EVAL_DEPTH`, currently set to a
conservative but hopefully still fair limit of 500. For future reference, with
the default stack size for the main/foreground thread of 8 MiB, we actually have
room for a stack depth around 2800, but that's only with extremely minimal state
stored in each stack frame.
I'm not entirely sure why we don't check `eval_depth` regardless of block type;
it can't be for performance reasons since it's just a simple integer comparison
- and a ridiculously easily one for the branch predictor handle, at that - but
maybe it's to try and support non-recursive nested execution blocks of greater
than `FISH_MAX_STACK_DEPTH`? But even without recursion, the stack can still
overflow so may be we should just bump the limit up some (to 500 like the new
`FISH_MAX_EVAL_DEPTH`?) and check it all the time?
Closes#9302.
A `block_t` instance is allocated for each live block type in memory when
executing a script or snippet of fish code. While many of the items in a
`block_t` class are specific to a particular type of block, the overhead of
`maybe_t<event_t>` that's unused except in the relatively extremely rare case of
an event block is more significant than the rest, given that 88 out of the 216
bytes of a `block_t` are set aside for this field that is rarely used.
This patch reorders the `block_t` members by order of decreasing alignment,
bringing down the size to 208 bytes, then changes `maybe_t<event_t>` to
`shared_ptr<event_t>` instead of allocating room for the event on the stack.
This brings down the runtime memory size of a `block_t` to 136 bytes for a 37%
reduction in size.
I would like to investigate using inheritance and virtual methods to have a
`block_t` only include the values that actually make sense for the block rather
than always allocating some sort of storage for them and then only sometimes
using it. In addition to further reducing the memory, I think this could also be
a safer and saner approach overall, as it would make it very clear when and
where we can expect each block_type_type_t-dependent member to be present and
hold a value.
This is a false positive as a result of disabling TLS support in LSAN due to an
incompatibility with newer versions of glibc.
Also remove the older workaround (because it didn't work).
When there are multiple screens worth of output and `history` is writing to the
pager, pressing Ctrl-C at the end of a screen doesn't exit the pager (`q` is
needed for that) but previously caused fish to emit an error ("write:
Interrupted system call) until we starting silently handling SIGINT in
`fd_output_stream_t::append()`.
This patch makes `history` detect when the `append()` call returns with an error
and causes it to end early rather than repeatedly trying (and failing) to write
to the output stream.
If EINTR caused by SIGINT is encountered while writing to the
`fd_output_stream_t` output fd, mark the output stream as errored and return
false to the caller but do not visibly complain.
Addressing the outstanding TODO notwithstanding, this is needed to avoid
littering the tty with spurious errors when the user hits Ctrl-C to abort a
long-running builtin's output (w/ the primary example being `history`).
Up to now, in normal locales \x was essentially the same as \X, except
that it errored if given a value > 0x7f.
That's kind of annoying and useless.
A subtle change is that `\xHH` now represents the character (if any)
encoded by the byte value "HH", so even for values <= 0x7f if that's
not the same as the ASCII value we would diverge.
I do not believe anyone has ever run fish on a system where that
distinction matters. It isn't a thing for UTF-8, it isn't a thing for
ASCII, it isn't a thing for UTF-16, it isn't a thing for any extended
ASCII scheme - ISO8859-X, it isn't a thing for SHIFT-JIS.
I am reasonably certain we are making that same assumption in other
places.
Fixes#1352
Closes#9240.
Squash of the following commits (in reverse-chronological order):
commit 03b5cab3dc40eca9d50a9df07a8a32524338a807
Author: Mahmoud Al-Qudsi <mqudsi@neosmart.net>
Date: Sun Sep 25 15:09:04 2022 -0500
Handle differently declared posix_spawnxxx_t on macOS
On macOS, posix_spawnattr_t and posix_spawn_file_actions_t are declared as void
pointers, so we can't use maybe_t's bool operator to test if it has a value.
commit aed83b8bb308120c0f287814d108b5914593630a
Author: Mahmoud Al-Qudsi <mqudsi@neosmart.net>
Date: Sun Sep 25 14:48:46 2022 -0500
Update maybe_t tests to reflect dynamic bool conversion
maybe_t<T> is now bool-convertible only if T _isn't_ already bool-convertible.
commit 2b5a12ca97b46f96b1c6b56a41aafcbdb0dfddd6
Author: Mahmoud Al-Qudsi <mqudsi@neosmart.net>
Date: Sun Sep 25 14:34:03 2022 -0500
Make maybe_t a little harder to misuse
We've had a few bugs over the years stemming from accidental misuse of maybe_t
with bool-convertible types. This patch disables maybe_t's bool operator if the
type T is already bool convertible, forcing the (barely worth mentioning) need
to use maybe_t::has_value() instead.
This patch both removes maybe_t's bool conversion for bool-convertible types and
updates the existing codebase to use the explicit `has_value()` method in place
of existing implicit bool conversions.
The parent commit made the destructor of the DIR* member close it if necessary
(i.e. only if it's not null). This means that we can use the same logic in
the move constructor (where the source DIR* is null) and for move assignment
(where it might not be).
No functional change.
dir_iter_t closes its DIR* member in two places: the move assignment and
the destructor. Simplify this by closing it in the destructor of the DIR*
member which is called in both places. Use std::unique_ptr, which is shorter
than a dedicated wrapper class. Conveniently, it calls the deleter only if
the pointer is not-null. Unfortunately, std::unique_ptr requires explicit
conversion to DIR* when interacting with C APIs but it's probably still
better than a wrapper class.
This means that the noncopyable_t annotation is now implied due to the
unique_ptr member.
Additionally, we could probably remove the user-declared move constructor
and move assignment (the compiler-generated ones should be good enough). To
be safe, keep them around since they also erase the fd (though I hope we
don't rely on that behavior anywhere).
We should perhaps remove the user-declared destructor entirely but
dir_iter_t::entry_t also has one, I'm not sure why. Maybe there's a good
reason, like code size.
No functional change.
This was recently converted to a while-loop. However, we only
loop in a specific case when (by hitting "continue") so a
loop condition is not necessary.
No functional change.
We forgot to decode (i.e. turn into nice wchar_t codepoints)
"byte_literal" escape sequences. This meant that e.g.
```fish
string match ö \Xc3\Xb6
math 5 \X2b 5
```
didn't work, but `math 5 \x2b 5` did, and would print the wonderful
error:
```
math: Error: Missing operator
'5 + 5'
^
```
So, instead, we decode eagerly.
descend_unique_hierarchy is used for the cd autosuggestion: if a directory
contains exactly one subdirectory and no other entries, then propose that
as part of the cd autosuggestion.
This had a bug: if the subdirectory is a symlink to the parent, we would
chase that, going around the loop suggesting a longer path until we hit
PATH_MAX.
Fix this by using the new API which provides the inode "for free," and
track whether we've seen this inode before. This is technically too
conservative since the inode may be for a directory on a different device,
but devices are not available for free so this would incur a cost. In
practice encountering the same inode twice with different devices in a
unique hierarchy is unlikely, and should it happen the consequences are
merely cosmetic: we fail to suggest a longer path.
This introduces dir_iter_t, a new class for iterating the contents of a
directory. dir_iter_t encapsulates the logic that tries to avoid using
stat() to determine the type of a file, when possible.
While we hardcode the return values for the rest of our builtins, the `return`
builtin bubbles up whatever the user returned in their fish script, allowing
invalid return values such as negative numbers to make it into our C++ side of
things.
In creating a `proc_status_t` from the return code of a builtin, we invoke
W_EXITCODE() which is a macro that shifts left the return code by some amount,
and left-shifting a negative integer is undefined behavior.
Aside from causing us to land in UB territory, it also can cause some negative
return values to map to a "successful" exit code of 0, which was probably not
the fish script author's intention.
This patch also adds error logging to help catch any inadvertent additions of
cases where a builtin returns a negative value (should one forget that unix
return codes are always positive) and an assertion protecting against UB.
This was always the case if HAVE_TEXT wasn't defined, but if it was then we were
coercing the result of `_C()` to a `const wchar_t *` pointer, because we were
returning the address of a constant zero-length wchar_t pointer. This reserves a
local static `wcstring` variable that we can return as the "no text" sentinel
and bubbles back the `wcstring` reference rather than decomposing it into a
pointer.
This is a prerequisite for a bigger change I'm working on.
It's gone from 136 bytes to a 128 bytes by rearranging the items in order of
decreasing alignment requirements. While this reduces the memory consumption
slightly (by around 6%) for each completion we have in-memory, that translates
to only around ~8KiB of savings for a command with 1000 possible completions,
which is nice but ultimately not that big of a deal.
The bigger benefit is that a single `complete_entry_t` might now fit in a cache
line, hopefully making the process of testing completions for matches more
cache friendly (and maybe even faster).
The impact here depends on the command and how much output it
produces.
It's possible to get up to 1.5x - `string upper` being a good example,
or a no-op `string match '*'`.
But the more the command actually needs to do, the less of an effect
this has.
This basically immediately issues a "write()" if it's to a pipe or the
terminal.
That means we can reduce syscalls and improve performance, even by
doing something like
```c++
streams.out.append(somewcstring + L"\n");
```
instead of
```c++
streams.out.append(somewcstring);
streams.out.push_back(L'\n');
```
Some benchmarks of the
```fish
for i in (string repeat -n 2000 \n)
$thing
end
```
variety:
1. `set` (printing variables) sped up 1.75x
2. `builtin -n` 1.60x
3. `jobs` 1.25x (with 3 jobs)
4. `functions` 1.20x
5. `math 1 + 1` 1.1x
6. `pwd` 1.1x
Piping yields similar results, there is no real difference when
outputting to a command substitution.
This writes the output once per argument instead of once per format or
escaped char.
An egregious case:
```fish
printf (string repeat -n 200 \\x7f)%s\n (string repeat -n 2000 aaa\n)
```
Has been sped up by ~20x by reducing write() calls from 40000 to 200.
Even a simple
```fish
printf %s\n (string repeat -n 2000 aaa\n)
```
should now be ~1.2x faster by issuing 2000 instead of 4000 write
calls (the `\n` was written separately!).
This at least halves the number of "write()" calls we do if it goes to
a pipe or the terminal, or reduces them by 75% if there is a
description.
This makes
```fish
complete -c foo -xa "(seq 50000)"
complete -C"foo "
```
faster by 1.33x.
This uses wreaddir_resolving, which tries to use the dirent d_type
field if it exists. In that way, it can skip the `stat` to determine
if the given file is a directory.
This allows `cd` completions to skip stat in most cases:
```fish
strace -Ce newfstatat fish --no-config -c 'complete -C"cd /tmp/completion_test/"' >/dev/null
```
prints before:
```
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100,00 0,002627 2 1033 4 newfstatat
```
after:
```
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100,00 0,000054 1 31 3 newfstatat
```
for a directory with 1000 subdirectories.
(just `fish --no-config -c exit` does 26 newfstatat)
This should improve the situation with slow filesystems like fuse or
network fsen.
In case we have no d_type, we use `stat`, which would yield about the
same results.
The worst case is that we need directories *and* descriptions or the
"executable" flag (which we don't currently check for cd, if I read
this right?).
This flag determines whether or not more shortopt switches will be offered up as
potential completions (vs only the payload for the last-parsed shortopt switch).
Previously, it was being stomped before it was determined whether or not two
`complete` rules with different `result_mode.requires_param` values were
actually resolved against the current command line or not, and the last
evaluated completion rule would win out.
There are two changes here:
* `last_option_requires_param` is only assigned if all associated conditions for
a potential completion are also met, and
* If already assigned by a conflicting rule (which can only be user/developer
error), `last_option_requires_param` is allowed to change from true to false
but not the other way around (i.e. in case of a conflict, generate both
payloads and other shortopt completions)
The first change is immediately noticeable and affects many of our own
completions, see the discussion in #9221 for an example regarding `git` where
`-c` has any of about a million different possible meanings depending on which
completion preconditions have been met. The second change should only happen if
a dev/user mistakenly enters a `complete -c ...` rule for the same shortopt more
than once, both with conditions matching, sometimes requiring an argument and
not sometimes not. It should be a rare occurence.
This reverts commit 3d8f98c395.
In addition to the issues mentioned on the GitHub page for this commit,
it also broke the CentOS 7 build.
Note one can locally test the CentOS 7 build via:
./docker/docker_run_tests.sh ./docker/centos7.Dockerfile
Be more careful with sign extension issues stemming from the differences in how
an untyped literal is promoted to an integer vs how a typed (and signed) `char`
is promoted to an integer.
Also convert some `const[expr] static xxx` to `const[expr] xxx` where it makes
sense to let the compiler deduce on its own whether or not to allocate storage
for a constant variable rather than imposing our view that it should have STATIC
storage set aside for it.
A few call sites were not making use of the `XXX_LEN` definitions and were
calling `strlen(XXX)` - these have been updated to use `const_strlen(XXX)`
instead.
I'm not sure if any toolchains will have raise any issues with these changes...
CI will tell!
The optimization takes references to strings which are stored in a vector,
and stores those references in a set; but the strings are simultaneously
being moved within the vector, which may invalidate those references.
It's probably safe if you work through which particular strings are being
moved, but as a matter of principle we shouldn't take references to elements
of a vector while the vector is being rearranged, absenet a clear improvement
on a benchmark.
This reverts commit d5561623aa.
Whenever the command line changes, we redraw it with the previously computed
syntax highlighting. At the same time we start recomputing highlighting in
a background thread.
On some systems, the highlighting computation is slow, so the stale syntax
highlighting is visible.
The stale highlighting was computed for an old commandline. When the user
had inserted or deleted some characters in the middle, then the highlighting
is wrong for the characters to the right. This is because the characters
to the right have shifted but the highlighting hasn't. Fix this by also
shifting highlighting.
This means that text that was alrady highlighted will use the same
highlighting until a new one is computed. Newly inserted text uses the color
left of the cursor.
This is implemented by giving editable_line_t ownership of the highlighting.
It is able to perfectly sync text and highlighting; they will invariably
have the same length.
Fixes#9180
While its true that we only ever call this with temporaries, there is no
fundamental reason for this restriction. Taking by value is simpler and
more flexible. I think it does not change the generated code.
No functional change.
The idea for this function was that it stands as the one place that modifies
the text without push_edit. In practice I don't think it helps.
No functional change.
In theory this does less work so we should generally use this style.
In practice it looks uglier so I'm not sure. Maybe wait for stdlib ranges...
No functional change.
It turns out there *is* an obviously portable way... except it's
not-so-obviously not portable after all.
POSIX specifies that sigqueue(2) can be used to validate pid and signo
separately, returning EINVAL in the specific case of an invalid or unsupported
signal number. This would be perfect... if only it were actually implemented.
It seems that the WSLv1 implementation of pselect(2) does not check for
undelivered signals after the temporary sigmask is un-applied from the thread in
question.
When fish runs with job control enabled, it transfers ownership of the
tty to a child process, and then reclaims the tty after the process
exits. If job control is disabled then fish does not transfer or reclaim
the tty.
It may happen that the child process creates a pgroup and then transfers
the tty to it. In that case fish will not attempt to reclaim the tty, as
fish did not transfer it. Then when fish reads from stdin it will
receive SIGTTIN instead of data.
Fix this by unconditionally claiming the tty in readline().
Fixes#9181
This errored out *later* because the result was infinite or NaN, but
it didn't actually stop evaluation.
I'm not sure if there is a way to get floating point math to turn an
infinity back into something that doesn't depend on a literal
infinity, but division by zero conceptually isn't a thing we can
support.
There's entire branches of maths dedicated to figuring out what
dividing by "basically zero" means and we don't have to get into it.
This is essentially the inverse of `string pad`.
Where that adds characters to get up to the specified width,
this adds an ellipsis to a string if it goes over a specific maximum width.
The char can be given, but defaults to our ellipsis string.
("…" if the locale can handle it and "..." otherwise)
If the ellipsis string is empty, it just truncates.
For arguments given via argv, it goes line-by-line,
because otherwise length makes no sense.
If "--no-newline" is given, it adds an ellipsis instead and removes all subsequent lines.
Like pad and `length --visible`, it goes by visible width,
skipping recognized escape sequences, as those have no influence on width.
The default target width is the shortest of the given widths that is non-zero.
If the ellipsis is already wider than the target width,
we truncate instead. This is safer overall, so we don't e.g. move into a new line.
This is especially important given our default ellipsis might be width 3.
When selecting items in the pager, only the latest of those items is kept
in the edit history, as so-called transient edit. Each new transient edit
evicts any old transient edit (via undo).
If the pager is closed by a command that performs another transient edit
(like history-token-search-backward) we thus inadvertently undo (= remove)
the token inserted by the pager. Fix this by closing a transient edit
session when closing the pager. Token search will start its own session.
Fixes#9160
strncpy will fill the entire buffer with NUL.
In this case we have a 128 byte buffer and write "empty" - 5 bytes -
into it.
So now instead of writing 6 bytes it'll write 128 bytes. Especially
wasteful because we already did memset before
This fixes a crash when you open the history pager and then do
history-token-search-backward (e.g. alt+. or alt-up).
It would sometimes crash because the `colors.at(i)` was an
out-of-bounds access.
Note: This might still leave the highlighting offset in some
cases (not quite sure why), but at least it doesn't *crash*, and the
search generally *works*.
This reverts commit 3e556b984c.
Revert "Further fix the issue and add the assert that'd have prevented it."
This reverts commit 056502001e.
Revert "Fix actual issue with allow_use_posix_spawn."
This reverts commit 85b9f3c71f.
Revert "Stop using posix_spawn when it is not allowed"
This reverts commit 9c896e1990.
Revert "don't even set up a fish_use_posix_spawn handler if unsupported"
This reverts commit 8b14ac4a9c.
Commit 8b14ac4a9c started using
posix_spawn even if allow_use_posix_spawn() returns false. Stop doing
that.
This may be reproduced with:
./docker/docker_run_tests.sh ./docker/centos7.Dockerfile
as centos7 has a too-old glibc.
Let's hope this doesn't causes build failures for e.g. musl: I just
know it's good on macOS and our Linux CI.
It's been a long time.
One fix this brings, is I discovered we #include assert.h or cassert
in a lot of places. If those ever happen to be in a file that doesn't
include common.h, or we are before common.h gets included, we're
unawaringly working with the system 'assert' macro again, which
may get disabled for debug builds or at least has different
behavior on crash. We undef 'assert' and redefine it in common.h.
Those were all eliminated, except in one catch-22 spot for
maybe.h: it can't include common.h. A fix might be to
make a fish_assert.h that *usually* common.h exports.
This is a *tiny* commit code-wise, but the explanation is a bit
longer.
When I made string read in chunks, I picked a chunk size from bash's
read, under the assumption that they had picked a good one.
It turns out, on the (linux) systems I've tested, that's simply not
true.
My tests show that a bigger chunk size of up to 4096 is better *across
the board*:
- It's better with very large inputs
- It's equal-to-slightly-better with small inputs
- It's equal-to-slightly-better even if we quit early
My test setup:
0. Create various fish builds with various sizes for
STRING_CHUNK_SIZE, name them "fish-$CHUNKSIZE".
1. Download the npm package names from
https://github.com/nice-registry/all-the-package-names/blob/master/names.json (I
used commit 87451ea77562a0b1b32550124e3ab4a657bf166c, so it's 46.8MB)
2. Extract the names so we get a line-based version:
```fish
jq '.[]' names.json | string trim -c '"' >/tmp/all
```
3. Create various sizes of random extracts:
```fish
for f in 10000 1000 500 50
shuf /tmp/all | head -n $f > /tmp/$f
end
```
(the idea here is to defeat any form of pattern in the input).
4. Run benchmarks:
hyperfine -w 3 ./fish-{128,512,1024,2048,4096}"
-c 'for i in (seq 1000)
string match -re foot < $f
end; true'"
(reduce the seq size for the larger files so you don't have to wait
for hours - the idea here is to have some time running string and not
just fish startup time)
This shows results pretty much like
```
Summary
'./fish-2048 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true'' ran
1.01 ± 0.02 times faster than './fish-4096 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
1.02 ± 0.03 times faster than './fish-1024 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
1.08 ± 0.03 times faster than './fish-512 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
1.47 ± 0.07 times faster than './fish-128 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
```
So we see that up to 1024 there's a difference, and after that the
returns are marginal. So we stick with 1024 because of the memory
trade-off.
----
Fun extra:
Comparisons with `grep` (GNU grep 3.7) are *weird*. Because you both
get
```
'./fish-4096 -c 'for i in (seq 100); string match -re foot < /tmp/500; end; true'' ran
11.65 ± 0.23 times faster than 'fish -c 'for i in (seq 100); command grep foot /tmp/500; end''
```
and
```
'fish -c 'for i in (seq 2); command grep foot /tmp/all; end'' ran
66.34 ± 3.00 times faster than './fish-4096 -c 'for i in (seq 2);
string match -re foot < /tmp/all; end; true''
100.05 ± 4.31 times faster than './fish-128 -c 'for i in (seq 2);
string match -re foot < /tmp/all; end; true''
```
Basically, if you *can* give grep a lot of work at once (~40MB in this
case), it'll churn through it like butter. But if you have to call it
a lot, string beats it by virtue of cheating.
Rephrase this to more explicitly indicate that the uvar actually
was successfully set. I believe the prior phrasing can leave some
ambiguity as far as wether set just failed with an error, whether it
has done anything or not.
Now uses the same macro other builtins use for a missing -e arg,
and the error message show the short or long option as it was used.
e.g. before
$ set -e
set: Erase needs a variable name
after
$ set --erase
set: --erase: option requires an argument
$ set -e
set: -e: option requires an argument
Intern'd strings were intended to be "shared" to reduce memory usage but
this optimization doesn't carry its weight. Remove it. No functional
change expected.
We store filenames in function definitions to indicate where the
function comes from. Previously these were intern'd strings. Switch them
to a shared_ptr<wcstring>, intending to remove intern'd strings.
The history pager will show multiline commands in single-line cells.
We escape newline characters as \\n but that looks awkward if the next line
starts with a letter. Let's render control characters using their corresponding
symbol from the Control Pictures Unicode block.
This means there is also no need to escape backslashes, which further improves
the history pager - now the rendering has exactly as many backslashes as
the eventual command.
This means that (multiline) commands in the history pager will be rendered
with the same amount of characters as are in the actual command (unless
they contain funny nonprintables). This makes it easy for the next commit
to highlight multiline commands correctly in the history pager.
The font size for these symbols (for example ␉) is quite small, but that's
okay since for the proposed uses it's not so important that they readable.
The important thing is that the stand out from surrounding text.
This checked specifically for "| and" and "a=b" and then just gave the
error without a caret at all.
E.g. for a /tmp/broken.fish that contains
```fish
echo foo
echo foo | and cat
```
This would print:
```
/tmp/broken.fish (line 3): The 'and' command can not be used in a pipeline
warning: Error while reading file /tmp/broken.fish
```
without any indication other than the line number as to the location
of the error.
Now we do
```
/tmp/broken.fish (line 3): The 'and' command can not be used in a pipeline
echo foo | and cat
^~^
warning: Error while reading file /tmp/broken.fish
```
Another nice one:
```
fish --no-config -c 'echo notprinted; echo foo; a=b'
```
failed to give the error message!
(Note: Is it really a "warning" if we failed to read the one file we
wer told to?)
We should check if we should either centralize these error messages
completely, or always pass them and remove this "code" system, because
it's only used in some cases.
This skipped printing a "^" line if the start or length of the error
was longer than the source.
That seems like the correc thing at first glance, however it means
that the caret line isn't skipped *if the file goes on*.
So, for example
```fish
echo "$abc["
```
by itself, in a file or via `fish -c`, would not print an error, but
```fish
echo "$abc["
true
```
would. That's not a great way to print errors.
So instead we just.. imagine the start was at most at the end.
The underlying issue why `echo "$abc["` causes this is that `wcstol`
didn't move the end pointer for the index value (because there is no
number there). I'd fix this, but apparently some of
our recursive variable calls absolutely rely on this position value.
This makes the awkward case
fish: Unexpected end of string, square brackets do not match
echo f[oo # not valid, no matching ]
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
(that `]` is simply the last character on the line, it's firmly in a comment)
less awkward by only marking the starting brace.
The implementation here is awkward mostly because the tok_t
communicates two things: The error location and how to carry on.
So we need to store the error length separately, and this is the first
time we've done so.
It's possible we can make this simpler.
This makes it so instead of marking the error location with a simple
`^`, we mark it with a caret, then a run of `~`, and then an ending `^`.
This makes it easier to see where exactly an error occured, e.g. which
command substitution was meant.
Note: Because this uses error locations that haven't been exposed like
that, it's likely to shake out weirdnesses and inaccuracies. For that
reason I've not adjusted the tests yet.