Work around broken rendering of pasted multibyte chars in non-UTF-8-ish locale

Run

    printf \Xf6 | wl-copy # ö in ISO-8859-1
    LANG=de_DE LC_ALL=$LANG gnome-terminal -- build/fish

and press ctrl-v. The pasted data looks like this:

    $ set data (wl-paste -n 2>/dev/null | string collect -N)
    $ set -S data
    $data: set in local scope, unexported, with 1 elements
    $data[1]: |\Xf6|

We pass $data directly to "commandline -i", which is supposed to insert it
into the commandline verbatim. What's actually inserted is "�".

This is caused by the combination of the following:
1. We never decode "\Xf6 -> ö" in this scenario. Decoding it -- like we do
   for non-pasted keyboard input -- would fix the issue.
2. We've switched to using Rust's char, which, for better or worse, disallows
   code points that are not valid in Unicode (see b77d1d0e2 (Stop crashing
   on invalid Unicode input, 2024-02-27)). This means that we don't simply
   store \Xf6 as '\u{00f6}'. Instead we use our PUA encoding trick, making it
   \u{f6f6} internally.
3. Finally, b77d1d0e2 renders reserved codepoints (which include PUA chars)
   using the replacement character � (sic). This was deemed more
   user-friendly than printing an invalid character (which is probably not
   mapped to a glyph). Yet it causes problems here: since we think that
   \u{f6f6} is garbage, we try to render the replacement character. Apparently
   that one is not defined(?) in ISO-8859-1; we get "�".
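
For illustration, here is a minimal sketch of the PUA trick from point 2.
It assumes the encoding base is U+F600, which is inferred only from the
"\Xf6 -> \u{f6f6}" example above. The constant and function names are
hypothetical and are not meant to match fish's actual identifiers.

```rust
// Hypothetical base, inferred from the \Xf6 -> \u{f6f6} example in this
// commit message; not copied from fish's source.
const ENCODE_DIRECT_BASE: u32 = 0xF600;

/// Map a byte that could not be decoded to a private-use code point.
fn encode_byte_to_char(byte: u8) -> char {
    // U+F600..=U+F6FF lies inside the BMP private use area (U+E000..=U+F8FF),
    // so every value is a valid Rust char and the unwrap cannot fail.
    char::from_u32(ENCODE_DIRECT_BASE + u32::from(byte)).unwrap()
}

/// Recover the original byte if the char lies in the encoded range.
fn decode_char_to_byte(c: char) -> Option<u8> {
    let v = u32::from(c);
    if (ENCODE_DIRECT_BASE..ENCODE_DIRECT_BASE + 256).contains(&v) {
        Some((v - ENCODE_DIRECT_BASE) as u8)
    } else {
        None
    }
}
```

Under these assumptions the byte survives a round trip, but any code that
treats the whole PUA range as garbage (point 3) will never show it as "ö".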

Fix this regression by removing the replacement character feature.

In the future, we should perhaps decode pasted input instead. We could do that
lazily in "commandline -i", or eagerly in "set data (wl-paste ...)".
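
As a rough sketch of what such decoding could look like, the following
hard-codes ISO-8859-1, where each byte equals the code point of the same
value. The function name is hypothetical; a real implementation would
presumably have to consult the current locale instead of assuming one
encoding.

```rust
/// Decode ISO-8859-1 (Latin-1) bytes into a String.
/// Assumption: the input really is Latin-1; no other encoding is handled.
fn decode_latin1(bytes: &[u8]) -> String {
    // char::from(u8) maps a byte to U+0000..=U+00FF, which is exactly the
    // Latin-1 to Unicode mapping.
    bytes.iter().map(|&b| char::from(b)).collect()
}
```

With this, the pasted byte \Xf6 would be stored as '\u{00f6}' rather than
as a PUA-encoded char.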
Johannes Altmanninger 2024-08-03 11:01:11 +02:00
parent 8b028c37e5
commit e25a1358e6

@@ -19,7 +19,7 @@ use std::time::SystemTime;
 use libc::{ONLCR, STDERR_FILENO, STDOUT_FILENO};
 use crate::common::{
-    fish_reserved_codepoint, get_ellipsis_char, get_omitted_newline_str, get_omitted_newline_width,
+    get_ellipsis_char, get_omitted_newline_str, get_omitted_newline_width,
     has_working_tty_timestamps, shell_modes, str2wcstring, wcs2string, write_loop, ScopeGuard,
     ScopeGuarding,
 };
@@ -1825,9 +1825,6 @@ fn compute_layout(
 // \n.
 // See https://unicode-table.com/en/blocks/control-pictures/
 fn rendered_character(c: char) -> char {
-    if fish_reserved_codepoint(c) {
-        return '�'; // replacement character
-    }
     if c <= '\x1F' {
         char::from_u32(u32::from(c) + 0x2400).unwrap()
     } else {