Padding with an unprintable character is now disallowed, like it was for other
zero-length characters.
`string shorten` now ignores escape sequences and non-printable characters
when calculating the visible width of the ellipsis used (except for `\b`,
which is treated as a width of -1).
Previously `fish_wcswidth` returned a length of -1 when the ellipsis-str
contained any non-printable character, causing the command to poentially
print a larger width than expected.
This also fixes an integer overflows in `string shorten`'s
`max` and `max2`, when the cumulative sum of character widths turned negative
(e.g. with any non-printable characters, or `\b` after the changes above).
The overflow potentially caused strings containing non-printable characters
to be truncated.
This adds test that verify the fixed behaviour.
- Add test to verify piped string replace exit code
Ensure fields parsing error messages are the same.
Note: C++ relied upon the value of the parsed value even when `errno` was set,
that is defined behaviour we should not rely on, and cannot easilt be replicated from Rust.
Therefore the Rust version will change the following error behaviour from:
```shell
> string split --fields=a "" abc
string split: Invalid fields value 'a'
> string split --fields=1a "" abc
string split: 1a: invalid integer
```
To:
```shell
> string split --fields=a "" abc
string split: a: invalid integer
> string split --fields=1a "" abc
string split: 1a: invalid integer
```
We could end up overflowing if we print out something that's a multiple of the
chunk size, which would then finish printing in the chunk-printing, but not
break out early.
* Make NULs work for builtins
This switches from passing a c-string to output_stream_t::append to
passing a proper string.
That means a builtin that prints a NUL no longer crashes with "thread '' panicked
at 'String contained intermediate NUL character: ".
Instead, it will actually handle the NUL, even as an argument.
That means something like
`echo foo\x00bar` will now actually print a NUL instead of truncating
after the `foo` because we passed c-strings around everywhere.
The former is *necessary* for e.g. `string`, the latter is a change
that on the whole makes dealing with NULs easier, but it is a
behavioral change.
To restore the c-string behavior we would have to truncate arguments
at NUL.
See #9739.
* Use AsRef instead of trait bound
This is essentially the inverse of `string pad`.
Where that adds characters to get up to the specified width,
this adds an ellipsis to a string if it goes over a specific maximum width.
The char can be given, but defaults to our ellipsis string.
("…" if the locale can handle it and "..." otherwise)
If the ellipsis string is empty, it just truncates.
For arguments given via argv, it goes line-by-line,
because otherwise length makes no sense.
If "--no-newline" is given, it adds an ellipsis instead and removes all subsequent lines.
Like pad and `length --visible`, it goes by visible width,
skipping recognized escape sequences, as those have no influence on width.
The default target width is the shortest of the given widths that is non-zero.
If the ellipsis is already wider than the target width,
we truncate instead. This is safer overall, so we don't e.g. move into a new line.
This is especially important given our default ellipsis might be width 3.
* string repeat: Don't allocate repeated string all at once
This used to allocate one string and fill it with the necessary
repetitions, which could be a very very large string.
Now, it instead uses one buffer and fills it to a chunk size,
and then writes that.
This fixes:
1. We no longer crash with too large max/count values. Before they
caused a bad_alloc because we tried to fill all RAM.
2. We no longer fill all RAM if given a big-but-not-too-big value. You
could've caused fish to eat *most* of your RAM here.
3. It can start writing almost immediately, instead of waiting
potentially minutes to start.
Performance is about the same to slightly faster overall.
A command like "printf nonewline | sed s/x/y/" does not print a
concluding newline, whereas "printf nnl | string replace x y" does.
This is an edge case -- usually the user input does have a newline at
the end -- but it seems still better for this command to just forward
the user's data.
Teach most string subcommands to check if stdin is missing the trailing
newline, and stop adding one in that case.
This does not apply when input is read from commandline arguments.
* Most subcommands stop adding the final newline, because they don't
really care about newlines, so besides their normal processing,
they just want to preserve user input. They are:
* string collect
* string escape/unescape
* string join¹
* string lower/upper
* string pad
* string replace
* string repeat
* string sub
* string trim
* string match keeps adding the newline, following "grep". Additionally,
for string match --regex, it's important to output capture groups
separated by newlines, resulting in multiple output lines for an
input line. So it is not obvious where to leave out the newline.
* string split/split0 keep adding the newline for the same reason --
they are meant to output multiple elements for a single input line.
¹) string join0 is not changed because it already printed a trailing
zero byte instead of the trailing newline. This is consistent
with other tools like "find -print0".
Closes#3847
widechar_width no longer classifies U+1F41F as widened-in-9, so the
width no longer changes.
Since we're interested in testing the change here, we need a different
emoji.
Just use 🥁, which was introduced in 9 as wide, and therefore widened
in 9.
This makes it so we treat backspaces as width -1, but never go below a
0 total width when talking about *lines*, like in screen or string
length --visible.
Fixes#8277.
Because we are, ultimately, interested in how many cells a string
occupies, we *have* to handle carriage return (`\r`) and line
feed (`\n`).
A carriage return sets the current tally to 0, and only the longest
tally is kept. The idea here is that the last position is the same as
the last position of the longest string. So:
abcdef\r123
ends up looking like
123def
which is the same width as abcdef, 6.
A line feed meanwhile means we flush the current tally and start a new
one. Every line is printed separately, even if it's given as one.
That's because, well, counting the width over multiple lines
doesn't *help*.
As a sidenote: This is necessarily imperfect, because, while we may
know the width of the terminal ($COLUMNS), we don't know the current
cursor position. So we can only give the width, and the user can then
figure something out on their own.
But for the common case of figuring out how wide the prompt is, this
should do.
* string: Allow `collect --no-empty` to avoid empty ellision
Currently we still have that issue where
test -n (thing | string collect)
can return true if `thing` doesn't print anything, because the
collected argument will still be removed.
So, what we do is allow `--no-empty` to be used, in which case we
print one empty argument.
This means
test -n (thing | string collect -n)
can now be safely used.
"no-empty" isn't the best name for this flag, but string's design
really incentivizes reusing names, and it's not *terrible*.
* Switch to `--allow-empty`
`--no-empty` does the exact opposite for `string split` and split0.
Since `-a`/`--allow-empty` already exists, use it.
This used to print a literal DEL character in the output for `bind`,
which wouldn't actually show up and made it hard to figure out what
the key was.
So we just escape it back to how we actually used it - `\x7f`.
Fixes#7631.
E.g. if we do `string match -q`, and we find a match, nothing about
the input can change anything, so we quit early.
This is mainly useful for performance, but it also allows `string`
with `-q` to be used with infinite input (e.g. `yes`).
Alternative to #7495.
If the padding is not divisible by the char's width without remainder,
we pad the remainder with spaces, so the total width of the output is correct.
Also add completions, changelog entry, adjust documentation, add examples
with emoji and some tests. Apply some minor style nitpicks and avoid extra
allocations of the input strings.
It's now good enough to do so.
We don't allow grid-alignment:
```fish
complete -c foo -s b -l barnanana -a '(something)'
complete -c foo -s z -a '(something)'
```
becomes
```fish
complete -c foo -s b -l barnanana -a '(something)'
complete -c foo -s z -a '(something)'
```
It's just more trouble than it is worth.
The one part I'd change:
We align and/or'd parts of an if-condition with the in-block code:
```fish
if true
and false
dosomething
end
```
becomes
```fish
if true
and false
dosomething
end
```
but it's not used terribly much and if we ever fix it we can just
reindent.
The `--entire` would enable output even though the `--quiet` should
have silenced it. These two don't make any sense together so print an
error, because the user could have just left off the `-q`.
Previously when propagating explicitly separated output, we would early-out
if the buffer was empty, where empty meant contains no characters. However
it may contain one or more empty strings, in which case we should propagate
those strings.
Remove this footgun "empty" function and handle this properly.
Fixes#5987