The type system no longer guarantees that the input string is nul-terminated,
meaning accessing beyond the range-checked `i` a char-at-a-time is no longer
safe. (In C++, we would either be using a plain C string which is always
nul-terminated or we would be using (w)string::cstr() which similarly grants
access to its nul-terminated buffer.)
Aside from that, there's no need to explicitly check `if c2 == '\0'` because
'\0' is not a valid hex digit so the `?` tacked on to `convert_hex_digit(c2)?`
will abort and return `None` anyway.
convert_hex_digit() is not appreciably faster than char::to_digit(16) and makes
the code less maintainable since it encodes certain assumptions; since it's also
not used consistently just drop it in favor of the std fn.
Since the output string (per the decode logic) is always shorter than or equal
to the input string, just reserve the input string size upfront to prevent vec
reallocations.
Somewhat counter-intuitively, this code is active when compiling under *Linux*
and is always false when compiling under Windows. The logic was incorrectly
reversed before (it's easier to reason about when you realize that fish doesn't
even compile under Windows because it uses tons of libc functions).
As the code was actually never compiled, it wasn't actually tested for validity
either and there were some issues that prevented it from compiling that have
since been fixed. The logic has also been adjusted a bit to make it possible to
use the rust-native int parsing instead of `libc::strtod()`.
The code has been changed to use `once_cell::race::OnceBool` instead of
`once_cell::sync::Lazy<T>` which imposes a greater runtime burden with locking
and other overhead. We don't care if the code runs more than once on init (if
calls were to race, though they probably don't) - just that the code isn't
subsequently executed on each call. The `once_cell::race` module is a better fit
here, though it doesn't expose the ergonomic `Lazy<T>` façade around its types.
is_main_thread() and co were previously ported to threads.rs, so remove the
duplicate code and move everything else related to threads there as well. No
need for common.rs to be as long as our old common.cpp!
I left #[deprecated] stubs in common.rs to help redirect anyone porting code
over that we can remove after the port has finished.
Additionally, the fork guards had previously been left as a todo!() item but I
ported that over. They're all called from the now-central threads::init()
function so there isn't a need to call each individual thread-management-fn
manually.
The decision was made a while back to try and embrace/use the native rust thread
functionality and utilities so the manual thread management code has been ripped
out and was replaced with code that marshals the native rust values instead. The
values won't line up with what the C++ code sees, but it never lined up anyway
since each was using a separate counter to keep track of the values.
Except for the indent visitor bits.
Tests for parse_util_detect_errors* are not ported yet because they depend
on expand.h (and operation_context.h which depends on env.h).
This caused math to assert out because it never wrote into the buffer.
Now, presumably it wrote somewhere but I don't know where, so fixing
this seems like a good idea.
Fixes#9735.
The translation is fairly direct though it adds some duplication, for example
there are multiple "match" statements that mimic function overloading.
Rust has no overloading, and we cannot have generic methods in the Node trait
(due to a Rust limitation, the error is like "cannot be made into an object")
so we include the type name in method names.
Give clients like "indent_visitor_t" a Rust companion ("IndentVisitor")
that takes care of the AST traversal while the AST consumption remains
in C++ for now. In future, "IndentVisitor" should absorb the entirety of
"indent_visitor_t". This pattern requires that "fish_indent" be exposed
includable header to the CXX bridge.
Alternatively, we could define FFI wrappers for recursive AST traversal.
Rust requires we separate the AST visitors for "mut" and "const"
scenarios. Take this opportunity to concretize both visitors:
The only client that requires mutable access is the populator. To match the
structure of the C++ populator which makes heavy use of function overloading,
we need to add a bunch of functions to the trait. Since there is no other
mutable visit, this seems acceptable.
The "const" visitors never use "will_visit_fields_of()" or
"did_visit_fields_of()", so remove them (though this is debatable).
Like in the C++ implementation, the AST nodes themselves are largely defined
via macros. Union fields like "Statement" and "ArgumentOrRedirection"
do currently not use macros but may in future.
This commit also introduces a precedent for a type that is defined in one
CXX bridge and used in another one - "ParseErrorList". To make this work
we need to manually define "ExternType".
There is one annoyance with CXX: functions that take explicit lifetime
parameters require to be marked as unsafe. This makes little sense
because functions that return `&Foo` with implicit lifetime can be
misused the same way on the C++ side.
One notable change is that we cannot directly port "find_block_open_keyword()"
(which is used to compute an error) because it relies on the stack of visited
nodes. We cannot modify a stack of node references while we do the "mut"
walk. Happily, an idiomatic solution is easy: we can tell the AST visitor
to backtrack to the parent node and create the error there.
Since "node_t::accept_base" is no longer a template we don't need the
"node_visitation_t" trampoline anymore.
The added copying at the FFI boundary makes things slower (memcpy dominates
the profile) but it's not unusable, which is good news:
$ hyperfine ./fish.{old,new}" -c 'source ../share/completions/git.fish'"
Benchmark 1: ./fish.old -c 'source ../share/completions/git.fish'
Time (mean ± σ): 195.5 ms ± 2.9 ms [User: 190.1 ms, System: 4.4 ms]
Range (min … max): 193.2 ms … 205.1 ms 15 runs
Benchmark 2: ./fish.new -c 'source ../share/completions/git.fish'
Time (mean ± σ): 677.5 ms ± 62.0 ms [User: 665.4 ms, System: 10.0 ms]
Range (min … max): 611.7 ms … 805.5 ms 10 runs
Summary
'./fish.old -c 'source ../share/completions/git.fish'' ran
3.47 ± 0.32 times faster than './fish.new -c 'source ../share/completions/git.fish''
Leftovers:
- Enum variants are still snakecase; I didn't get around to changing this yet.
- "ast_type_to_string()" still returns a snakecase name. This could be
changed since it's not user visible.
A JobId is not supposed to convert to other types.
Since this type is defined as NonZeroU32 (which cannot be -1), we need to
add some conversion functions to match the C++ behavior.
Overall, it would have been better to keep using the C++ type.