The translation is fairly direct though it adds some duplication, for example
there are multiple "match" statements that mimic function overloading.
Rust has no overloading, and we cannot have generic methods in the Node trait
(due to a Rust limitation, the error is like "cannot be made into an object")
so we include the type name in method names.
Give clients like "indent_visitor_t" a Rust companion ("IndentVisitor")
that takes care of the AST traversal while the AST consumption remains
in C++ for now. In future, "IndentVisitor" should absorb the entirety of
"indent_visitor_t". This pattern requires that "fish_indent" be exposed
includable header to the CXX bridge.
Alternatively, we could define FFI wrappers for recursive AST traversal.
Rust requires we separate the AST visitors for "mut" and "const"
scenarios. Take this opportunity to concretize both visitors:
The only client that requires mutable access is the populator. To match the
structure of the C++ populator which makes heavy use of function overloading,
we need to add a bunch of functions to the trait. Since there is no other
mutable visit, this seems acceptable.
The "const" visitors never use "will_visit_fields_of()" or
"did_visit_fields_of()", so remove them (though this is debatable).
Like in the C++ implementation, the AST nodes themselves are largely defined
via macros. Union fields like "Statement" and "ArgumentOrRedirection"
do currently not use macros but may in future.
This commit also introduces a precedent for a type that is defined in one
CXX bridge and used in another one - "ParseErrorList". To make this work
we need to manually define "ExternType".
There is one annoyance with CXX: functions that take explicit lifetime
parameters require to be marked as unsafe. This makes little sense
because functions that return `&Foo` with implicit lifetime can be
misused the same way on the C++ side.
One notable change is that we cannot directly port "find_block_open_keyword()"
(which is used to compute an error) because it relies on the stack of visited
nodes. We cannot modify a stack of node references while we do the "mut"
walk. Happily, an idiomatic solution is easy: we can tell the AST visitor
to backtrack to the parent node and create the error there.
Since "node_t::accept_base" is no longer a template we don't need the
"node_visitation_t" trampoline anymore.
The added copying at the FFI boundary makes things slower (memcpy dominates
the profile) but it's not unusable, which is good news:
$ hyperfine ./fish.{old,new}" -c 'source ../share/completions/git.fish'"
Benchmark 1: ./fish.old -c 'source ../share/completions/git.fish'
Time (mean ± σ): 195.5 ms ± 2.9 ms [User: 190.1 ms, System: 4.4 ms]
Range (min … max): 193.2 ms … 205.1 ms 15 runs
Benchmark 2: ./fish.new -c 'source ../share/completions/git.fish'
Time (mean ± σ): 677.5 ms ± 62.0 ms [User: 665.4 ms, System: 10.0 ms]
Range (min … max): 611.7 ms … 805.5 ms 10 runs
Summary
'./fish.old -c 'source ../share/completions/git.fish'' ran
3.47 ± 0.32 times faster than './fish.new -c 'source ../share/completions/git.fish''
Leftovers:
- Enum variants are still snakecase; I didn't get around to changing this yet.
- "ast_type_to_string()" still returns a snakecase name. This could be
changed since it's not user visible.
A JobId is not supposed to convert to other types.
Since this type is defined as NonZeroU32 (which cannot be -1), we need to
add some conversion functions to match the C++ behavior.
Overall, it would have been better to keep using the C++ type.
The conversion to usize is used for array accesses, so negative values
would cause crashes either way. Let's do it earlier so we can get rid of
the suspect C-style cast.
This allows us to use the scoped push in more scenarios by appeasing the
borrow checker.
Use it in a couple of places instead of ScopeGuard. Hopefully this is makes
porting easier.
Even though we generally dont' want to use this type (because it's immutable),
it can be advantageous when working with the std::fs API. This is because
it implements "AsRef<Path>" which neither of CString and Vec<u8> do.
This is basically a subset of type, so we might as well.
To be clear this is `command -s` and friends, if you do `command grep` that's
handled as a keyword.
One issue here is that we can't get "one path or not" because I don't
know how to translate a maybe_t? Do we need to make it a shared_ptr instead?
Most of it is duplicated, hence untested.
Functions like mbrtowc are not exposed by the libc crate, so declare them
ourselves.
Since we don't know the definition of C macros, add two big hacks to make
this work:
1. Replace MB_LEN_MAX and mbstate_t with values (resp types) that should
be large enough for any implementation.
2. Detect the definition of MB_CUR_MAX in the build script. This requires
more changes for each new libc. We could also use this approach for 1.
Additionally, this commit brings a small behavior change to
read_unquoted_escape(): we cannot decode surrogate code points like \UDE01
into a Rust char, so use � (\UFFFD, replacement character) instead.
Previously, we added such code points to a wcstring; looks like they were
ignored when printed.
wcs2string converts a wide string to a narrow one. The result is
null-terminated and may also contain interior null-characters.
std::string allows this.
Rust's null-terminated string, CString, does not like interior null-characters.
This means we will need to use Vec<u8> or OsString for the places where we
use interior null-characters.
On the other hand, we want to use CString for places that require a
null-terminator, because other Rust types don't guarantee the null-terminator.
Turns out there is basically no overlap between the two use cases, so make
it two functions. Their equivalents in Rust will have the same name, so
we'll only need to adjust the type when porting.
Existing C++ code didn't use a function for this but simply added
ENCODE_DIRECT_BASE. In Rust that's more verbose because char won't do
arithmetics, hence the function.
We'll add a dual function for decoding, so let's rename this.
BTW we should get rid of the "wchar" naming, it's just "char" in Rust.
Prior to this change, wcstoi("0x") would fail with missing digits.
However strtoul will "backtrack" to return just the 0 and leave the x as
the remainder. Implement this behavior.
Prior to this change, wcstoi() would return an error if the requested
type were unsigned, and the input had a leading minus sign. However this
causes problems for printf, which expects strtoul behavior.
Add "modulo base" behavior which wraps the negative value to positive.
Factor this into an option; the default is False (but code which
previously used strtoull directly should set it to true).
fish_wcstoi_partial is like fish_wcstoi: it converts from a string to an
int optionally inferring the radix. fish_wcstoi_partial also returns the
number of characters consumed.
Unfortunately we cannot use wide string literals in match statements
(not sure if there's an easy fix).
Because of this, I converted the input to UTF-8 so we could use the match
statement. This conversion is confusing, let's skip it.
Everything but signal handlers has been changed to use `Signal` instead of
`c_int` or `i32` signal values.
Event handlers are using `usize` to match C++, at least for now.
Signal is a newtype around NonZeroI32. We could use NonZeroU8 since all signal
values comfortably fit, but using i32 lets us avoid a fallible attempt at
narrowing values returned from the system as integers to the narrower u8 type.
Known signals are explicitly defined as constants and can be matched against
with equality or with pattern matching in a `match` block. Unknown signal values
are passed-through without causing any issues.
We're using per-OS targeting to enable certain libc SIGXXX values - we could
change this to dynamically detecting what's available in build.rs but then it
might not match what libc exposes, still giving us build failures.
Just address two clippy lints that are fallout from changing the signal type.
There's no longer any need to convert these (which gets rid of an unwrap).
Due to limitations imposed by the borrow checker, there are very few places
where we will be able to use the `ScopedPush` class ported over from the C++
codebase (once you capture the value w/ a `ScopedPush` you can't access the
value - or the mutable reference you used to reach it! - until the `ScopedPush`
object goes out of scope).
This alternative requires binding the previous values to a variable and manually
restoring them in the callback passed to the `ScopeGuard` constructor, but will
work with rust's borrow and `&mut` paradigm.
This was added to support signals; however we are unlikely to use this
for anything else. Remove it; just use a u64 to report signals that have
been set.
This optimizes over both the rust rewrite and the original C++ code. The rust
rewrite saw `std::bitset` replaced with `[bool; 65]` which could result in a
lot of memory copy bandwidth each time we checked for and received no signals.
The original C++ code would iterate over all signal slots to see if any were
set. The code now returns a single u64 and only checks slots that are known to
have signals via an intelligent `Iterator` impl.
You can now use a reference to CxxWString or an allocated UniquePtr<CxxWString>
to get an &wstr temporary to use without having to allocate again (e.g. via
`from_ffi()`).
wchar.rs should not import let alone reexport FFI strings.
Stop re-exporting utf32str! because we use L! instead.
In wchar_ffi.rs, stop re-exporting cxx::CxxWString because that hasn't
seen adoption.
I think we should use re-exports only for aliases like "wstr" or for aliases
into internal modules.
So I'd probably remove `pub use wchar_ffi::wcharz_t = crate::ffi::wcharz_t`
as well.
bool_assert_comparison is stupid, the reason they give is "it's shorter". Well,
`assert!(!foo)` is nowhere near as readable as `assert_eq!(foo, false)` because
of the ! noise from the macro.
Uninlined format args is a stupid lint that Rust actually walked back when they
made it an official warning because you still have to use a mix of inlined and
un-inlined format args (the latter of which won't complain) since only idents
can be inlined.
This shows some of the ugliness of the rust borrow checker when it comes to
safely implementing any sort of recursive access and the need to be overly
explicit about which types are actually used across threads and which aren't.
We're forced to use an `Arc` for `ItemMaker` (née `item_maker_t`) because
there's no other way to make it clear that its lifetime will last longer than
the FdMonitor's. But once we've created an `Arc<T>` we can't call
`Arc::get_mut()` to get an `&mut T` once we've created even a single weak
reference to the Arc (because that weak ref could be upgraded to a strong ref at
any time). This means we need to finish configuring any non-atomic properties
(such as `ItemMaker::always_exit`) before we initialize the callback (which
needs an `Arc<ItemMaker>` to do its thing).
Because rust doesn't like self-referential types and because of the fact that we
now need to create both the `ItemMaker` and the `FdMonitorItem` separately
before we set the callback (at which point it becomes impossible to get a
mutable reference to the `ItemMaker`), `ItemMaker::item` is dropped from the
struct and we instead have the "constructor" for `ItemMaker` take a reference to
an `FdMonitor` instance and directly add itself to the monitor's set, meaning we
don't need to move the item out of the `ItemMaker` in order to add it to the
`FdMonitor` set later.
We were only using their ffi implementations which are automatically
exported/public, but the actual functions we would need if we were to use
FdMonitor and co. in native rust code were either private or missing convenient
wrappers.
The existing code is kept, but a rusty version of these functions is added for
code that needs them.
These should only be temporarily used when porting 1-to-1 from C++; we should
use the std library's `read()` and `write_all()` methods instead in the future.
By extracting the equivalent of i32::cmp() into its own const function,
it becomes a lot easier to see what is happening and the logic can be
more direct.
These will be used in the parser.
Maybe this type should be a struct with boolean fields. The current way has
the upside that the usage is exactly the same as in C++.
For some reason this error is triggered by tests after the Rust port of
ast.cpp. Might want to get to the bottom of this but moving it back
to match the original C++ logic fixes it.
Prior to this fix, the Rust FLOG output was regressed from C++, because
it put quotes around strings. However if we used Display, we would fail
to FLOG non-display types like ThreadIDs.
There is apparently no way in Rust to write a function which formats a
value preferentially using Display, falling back to Debug.
Fix this by introducing two new traits, FloggableDisplay and
FloggableDebug. FloggableDisplay is implemented for all Display types,
and FloggableDebug can be "opted into" for any Debug type:
impl FloggableDebug for MyType {}
Both traits have a 'to_flog_str' function. FLOG brings them both into
scope, and Rust figures out which 'to_flog_str' gets called.
* wutil: Rewrite `wrealpath` in Rust
* Reduce use of FFI types in `wrealpath`
* Addressed PR comments regarding allocation
* Replace let binding assignment with regular comparison
More ugliness with types that cxx bridge can't recognize as being POD. Using
pointers to get/set `termios` values with an assert to make sure we're using
identical definitions on both sides (in cpp from the system headers and in rust
from the libc crate as exported).
I don't know why cxx bridge doesn't allow `SharedPtr<OpaqueRustType>` but we can
work around it in C++ by converting a `Box<T>` to a `shared_ptr<T>` then convert
it back when it needs to be destructed. I can't find a clean way of doing it
from the cxx bridge wrapper so for now it needs to be done manually in the C++
code.
Types/values that are drop-in ready over ffi are renamed to match the old cpp
names but for types that now differ due to ffi difficulties I've left the `_ffi`
in the function names to indicate that this isn't the "correct" way of using the
types/methods.
We want to keep the cast because tv_sec is not always 64 bits, see b5ff175b4
(Fix timer.rs cross-platform compilation, 2023-02-14).
It would be nice to avoid the clippy exemption, perhaps using something like
#[cfg(target_pointer_width = "32")]
let seconds = val.tv_sec as i64;
#[cfg(not(target_pointer_width = "32"))]
let seconds = val.tv_sec;
but I'm not sure if "target_pointer_width" is the right criteria.
Upsizing to `usize` from `i32` doesn't work if `usize` is only 32-bits.
I changed the code to use the `FromStr` impl on `i32`, but we could have also
just used `u64` instead of `i32`.
Also, we should get in the habit of using the appropriate type aliases where
possible (`i32` should be `RawFd`).
The mutex was being locked from the very start, before it was needed and
possibly before it would be needed.
Also rename the static global to stick to rust naming conventions.
Note that `once_cell::sync::Lazy<T>` actually internally uses its own lock
around the value, but in this case it's insufficient because `SmallRng` doesn't
implement `SeedableRng` so we can't reseed it with only an `&mut` reference and
must instead replace its value.
We probably *could* still use `Lazy<SmallRng>` directly and then rely on
`std::mem::swap()` to replace the contents of the shared global static without
reassigning the variable directly with a new `SmallRng` instance, but I'm not
sure that's a great idea. This is just a built-in, there's no real harm in
locking twice (especially while fish remains essentially single-threaded).
The old comments about using i128 logic were still there even though we are no
longer using that approach and the output type was very much misleadingly a u64
printed to the console (but via `%d` so it was ultimately shown as an i64). Be
explicit about the resulting being a valid i64 value before passing it to the
sprintf!() macro.
Also add comments about the safety of the final `unwrap()` operation.
Rust doesn't have __FUNCTION__ or __func__ (though you can hack around it with a
proc macro, but that will require a separate crate and slowing down compilation
times with heavy proc macro dependencies), so these are just regular functions
(at least for now). Rust's default stack trace on panic (even in release mode)
should be enough (and the functions themselves are inlined so the calling
function should be the second frame from the top, after the #[cold] panic
functions).
This is to allow us to verify some implementation details that aren't explicitly
documented in the rust standard library's documentation.
std::thread uses `pthread_create()` underneath the hood on *nix platforms, so
this *should* merely be a formality.
The way cxx bridge works, it doesn't recognize any types from another module as
being shared cxx bridge types with generations native to both C++ and Rust,
meaning every module that was going to use function pointers would have to
define its own `c_void` type (because cxx bridge doesn't recognize any of
libc::c_void, std::ffi::c_void, or autocxx::c_void).
FFI on other platforms has long used the equivalent of `uint8_t *` as an
alternative to `void *` for code where `void` was not available or was
undesirable for some reason. We can join the club - this way we can always use
`* {const|mut} u8` in our rust code and `uint8_t *` in our C++ code to pass
around parameters or values over the C abi.
I needed to rename some types already ported to rust so they don't clash with
their still-extant cpp counterparts. Helper ffi functions added to avoid needing
to dynamically allocate an FdMonitorItem for every fd (we use dozens per basic
prompt).
I ported some functions from cpp to rust that are used only in the backend but
without removing their existing cpp counterparts so cpp code can continue to use
their version of them (`wperror` and `make_detached_pthread`).
I ran into issues porting line-by-line logic because rust inverts the behavior
of `std::remove_if(..)` by making it (basically) `Vec::retain_if(..)` so I
replaced bools with an explict enum to make everything clearer.
I'll port the cpp tests for this separately, for now they're using ffi.
Porting closures was ugly. It's nothing hard, but it's very ugly as now each
capturing lambda has been changed into an explicit struct that contains its
parameters (that needs to be dynamically allocated), a standalone callback
(member) function to replace the lambda contents, and a separate trampoline
function to call it from rust over the shared C abi (not really relevant to
x86_64 w/ its single calling convention but probably needed on other platforms).
I don't like that `fd_monitor.rs` has its own `c_void`. I couldn't find a way to
move that to `ffi.rs` but still get cxx bridge to consider it a shared POD.
Every time I moved it to a different module, it would consider it to be an
opaque rust type instead. I worry this means we're going to have multiple
`c_void1`, `c_void2`, etc. types as we continue to port code to use function
pointers.
Also, rust treats raw pointers as foreign so you can't do `impl Send for * const
Foo` even if `Foo` is from the same module. That necessitated a wrapper type
(`void_ptr`) that implements `Send` and `Sync` so we can move stuff between
threads.
The code in fd_monitor_t has been split into two objects, one that is used by
the caller and a separate one associated with the background thread (this is
made nice and clean by rust's ownership model). Objects not needed under the
lock (i.e. accessed by the background thread exclusively) were moved to the
separate `BackgroundFdMonitor` type.