Commit Graph

8 Commits

Author SHA1 Message Date
Fabian Boehm
52dcfe11af Make \x the same as \X
Up to now, in normal locales \x was essentially the same as \X, except
that it errored if given a value > 0x7f.

That's kind of annoying and useless.

A subtle change is that `\xHH` now represents the character (if any)
encoded by the byte value "HH", so even for values <= 0x7f if that's
not the same as the ASCII value we would diverge.

I do not believe anyone has ever run fish on a system where that
distinction matters. It isn't a thing for UTF-8, it isn't a thing for
ASCII, it isn't a thing for UTF-16, it isn't a thing for any extended
ASCII scheme - ISO8859-X, it isn't a thing for SHIFT-JIS.

I am reasonably certain we are making that same assumption in other
places.

Fixes #1352
2022-10-09 15:24:01 +02:00
Fabian Boehm
396e276286 Decode multibyte escapes immediately
We forgot to decode (i.e. turn into nice wchar_t codepoints)
"byte_literal" escape sequences. This meant that e.g.

```fish
string match ö \Xc3\Xb6

math 5 \X2b 5
```

didn't work, but `math 5 \x2b 5` did, and would print the wonderful
error:

```
math: Error: Missing operator
'5 + 5'
   ^
```

So, instead, we decode eagerly.
2022-10-05 18:55:01 +02:00
Fabian Homborg
046db09f90
Try to set LC_CTYPE to something UTF-8 capable (#8031)
* Try to set LC_CTYPE to something UTF-8 capable

When fish is started with LC_CTYPE=C (even just effectively, often via
LC_ALL=C!), it's basically broken. There's no way to handle non-ASCII
characters with a C locale unless we want to write our
locale-independent replacements for all of the system functions.

Since we're not going to do that, let's try to find *some locale* for
LC_CTYPE.

We already do that in __fish_setlocale, but that's

- a bit of a weird thing that reads unstandardized system
  configuration files
- allows setting locale to C explicitly

So it's still easily possible to end up in a broken configuration.

Now, the issue with this is that there is (AFAICT) no portable way to
get a list of all allowed locales and C.UTF-8 is not standardized, so
we have no one locale to fall back on and are forced to try a few. The
list we have here is quite arbitrary, but it's a start.

Python does something similar and only tries C.UTF-8, C.utf8 and
"UTF-8".

Once C.UTF-8 is (hopefully) standardized, that will just start
working (tm).

Note that we do not *export* the fixed LC_CTYPE variable, so external
programs still have to deal with the C locale, but we have no real
business messing with the user's environment.

To turn it off: $fish_allow_singlebyte_locale, if set to something true (like "1"),
will re-run the locale initialization and skip the bit where we force
LC_CTYPE to be utf8-capable.

This is mainly used in our tests, but might also be useful if people
are trying to do something weird.
2021-06-06 09:28:32 +02:00
Fabian Homborg
3b87547411 Fix skipping locale tests on Github Actions 2021-04-16 09:01:41 +02:00
Fabian Homborg
fff158fd2b tests: Disable locale.fish on Github Actions
Sometimes hangs with tsan.

Works around #7934.
2021-04-15 17:26:08 +02:00
Fabian Homborg
9367d4ff71 Reindent functions to remove useless quotes
This does not include checks/function.fish because that currently
includes a "; end" in a message that indent would remove, breaking the test.
2020-03-09 19:46:43 +01:00
Fabian Homborg
e9b4f5f0ab Replace references to ".../test/root/bin/fish" in the checks 2020-02-08 11:06:36 +01:00
Fabian Homborg
0fab1ce8b4 Port locale test to littlecheck 2020-02-08 09:38:23 +01:00