fish-shell/doc_internal/rust-devel.md
Fabian Boehm 16fcc5de7c Increase MSRV to 1.70
It appears we can't find a system that ships rustc >= 1.67 and < 1.70,
so keeping it at 1.67 gains nothing.

1.70 is used in Debian 13, so that will be able to build fish out of
the box (12 was on 1.63 which was already too low).
2024-04-29 22:00:59 +02:00

193 lines
9.7 KiB
Markdown

# fish-shell Rust Development Guide
This describes how to get started building fish-shell in its partial Rust state, and how to contribute to the port.
## Overview
fish is in the process of transitioning from C++ to Rust. The fish project has a Rust crate embedded at path `fish-rust`. This crate builds a Rust library `libfish_rust.a` which is linked with the C++ `libfish.a`. Existing C++ code will be incrementally migrated to this crate; then CMake will be replaced with cargo and other Rust-native tooling.
Important tools used during this transition:
1. [Corrosion](https://github.com/corrosion-rs/corrosion) to invoke cargo from CMake.
2. [cxx](http://cxx.rs) for basic C++ <-> Rust interop.
3. [autocxx](https://google.github.io/autocxx/) for using C++ types in Rust.
We use forks of the last two - see the [FFI section](#ffi) below. No special action is required to obtain these packages. They're downloaded by cargo.
## Building
### Build Dependencies
fish-shell currently depends on Rust 1.70 or later. To install Rust, follow https://rustup.rs.
### Build via CMake
It is recommended to build inside `fish-shell/build`. This will make it easier for Rust to find the `config.h` file.
Build via CMake as normal (use any generator, here we use Ninja):
```shell
$ cd fish-shell
$ mkdir build && cd build
$ cmake -G Ninja ..
$ ninja
```
This will create the usual fish executables.
### Build just libfish_rust.a with Cargo
The directory `fish-rust` contains the Rust sources. These require that CMake has been run to produce `config.h` which is necessary for autocxx to succeed.
Follow the "Build from CMake" steps above, and then:
```shell
$ cd fish-shell/fish-rust
$ cargo build
```
This will build only the library, not a full working fish, but it allows faster iteration for Rust development. That is, after running `cmake` you can open the `fish-rust` as the root of a Rust crate, and tools like rust-analyzer will work.
## Development
The basic development loop for this port:
1. Pick a .cpp (or in some cases .h) file to port, say `util.cpp`.
2. Add the corresponding `util.rs` file to `fish-rust/`.
3. Reimplement it in Rust, along with its dependencies as needed. Match the existing C++ code where practical, including propagating any relevant comments.
- Do this even if it results in less idiomatic Rust, but avoid being super-dogmatic either way.
- One technique is to paste the C++ into the Rust code, commented out, and go line by line.
4. Decide whether any existing C++ callers should invoke the Rust implementation, or whether we should keep the C++ one.
- Utility functions may have both a Rust and C++ implementation. An example is `FLOG` where interop is too hard.
- Major components (e.g. builtin implementations) should _not_ be duplicated; instead the Rust should call C++ or vice-versa.
5. Remember to run `cargo fmt` and `cargo clippy` to keep the codebase somewhat clean (otherwise CI will fail). If you use rust-analyzer, you can run clippy automatically by setting `rust-analyzer.checkOnSave.command = "clippy"`.
You will likely run into limitations of [`autocxx`](https://google.github.io/autocxx/) and to a lesser extent [`cxx`](https://cxx.rs/). See the [FFI sections](#ffi) below.
## Type Mapping
### Constants & Type Aliases
The FFI does not support constants (`#define` or `static const`) or type aliases (`typedef`, `using`). Duplicate them using their Rust equivalent (`pub const` and `type`/`struct`/`enum`).
### Non-POD types
Many types cannot currently be passed across the language boundary by value or occur in shared structs. As a workaround, use references, raw pointers or smart pointers (`cxx` provides `SharedPtr` and `UniquePtr`). Try to keep workarounds on the C++ side and the FFI layer of the Rust code. This ensures we will get rid of the workarounds as we peel off the FFI layer.
### Strings
Fish will mostly _not_ use Rust's `String/&str` types as these cannot represent non-UTF8 data using the default encoding.
fish's primary string types will come from the [`widestring` crate](https://docs.rs/widestring). The two main string types are `WString` and `&wstr`, which are renamed [Utf32String](https://docs.rs/widestring/latest/widestring/utfstring/struct.Utf32String.html) and [Utf32Str](https://docs.rs/widestring/latest/widestring/utfstr/struct.Utf32Str.html). `WString` is an owned, heap-allocated UTF32 string, `&wstr` a borrowed UTF32 slice.
In general, follow this mapping when porting from C++:
- `wcstring` -> `WString`
- `const wcstring &` -> `&wstr`
- `const wchar_t *` -> `&wstr`
None of the Rust string types are nul-terminated. We're taking this opportunity to drop the nul-terminated aspect of wide string handling.
#### Creating strings
One may create a `&wstr` from a string literal using the `wchar::L!` macro:
```rust
use crate::wchar::prelude::*;
// This imports wstr, the L! macro, WString, a ToWString trait that supplies .to_wstring() along with other things
fn get_shell_name() -> &'static wstr {
L!("fish")
}
```
There is also a `widestrs` proc-macro which enables L as a _suffix_, to reduce the noise. This can be applied to any block, including modules and individual functions:
```rust
use crate::wchar::{wstr, widestrs}
// also imported by the prelude
#[widestrs]
fn get_shell_name() -> &'static wstr {
"fish"L // equivalent to L!("fish")
}
```
#### The wchar prelude
We have a prelude to make working with these string types a whole lot more ergonomic. In particular `WExt` supplies the null-terminated-compatible `.char_at(usize)`,
and a whole lot more methods that makes porting C++ code easier. It is also preferred to use char-based-methods like `.char_count()` and `.slice_{from,to}()`
of the `WExt` trait over directly calling `.len()` and `[usize..]/[..usize]`, as that makes the code compatible with a potential future change to UTF8-strings.
```rust
pub(crate) mod prelude {
pub(crate) use crate::{
wchar::{wstr, IntoCharIter, WString, L},
wchar_ext::{ToWString, WExt},
wutil::{sprintf, wgettext, wgettext_fmt, wgettext_str},
};
pub(crate) use widestring_suffix::widestrs;
}
```
### Strings for FFI
`WString` and `&wstr` are the common strings used by Rust components. At the FII boundary there are some additional strings for interop. _All of these are temporary for the duration of the port._
- `CxxWString` is the Rust binding of `std::wstring`. It is the wide-string analog to [`CxxString`](https://cxx.rs/binding/cxxstring.html) and is [added in our fork of cxx](https://github.com/ridiculousfish/cxx/blob/fish/src/cxx_wstring.rs). This is useful for functions which return e.g. `const wcstring &`.
- `W0String` is renamed [U32CString](https://docs.rs/widestring/latest/widestring/ucstring/struct.U32CString.html). This is basically `WString` except it _is_ nul-terminated. This is useful for getting a nul-terminated `const wchar_t *` to pass to C++ implementations.
- `wcharz_t` is an annoying C++ struct which merely wraps a `const wchar_t *`, used for passing these pointers from C++ to Rust. We would prefer to use `const wchar_t *` directly but `autocxx` refuses to generate bindings for types such as `std::vector<const wchar_t *>` so we wrap it in this silly struct.
Note C++ `wchar_t`, Rust `char`, and `u32` are effectively interchangeable: you can cast pointers to them back and forth (except we check upon u32->char conversion). However be aware of which types are nul-terminated.
These types should be confined to the FFI modules, in particular `wchar_ffi`. They should not "leak" into other modules. See the `wchar_ffi` module.
### Format strings
Rust's builtin `std::fmt` modules do not accept runtime-provided format strings, so we mostly won't use them, except perhaps for FLOG / other non-translated text.
Instead we'll continue to use printf-style strings, with a Rust printf implementation.
### Vectors
See [`Vec`](https://cxx.rs/binding/vec.html) and [`CxxVector`](https://cxx.rs/binding/cxxvector.html).
In many cases, `autocxx` refuses to allow vectors of certain types. For example, autocxx supports `std::vector` and `std::shared_ptr` but NOT `std::vector<std::shared_ptr<...>>`. To work around this one can create a helper (pointer, length) struct. Example:
```cpp
struct RustFFIJobList {
std::shared_ptr<job_t> *jobs;
size_t count;
};
```
This is just a POD (plain old data) so autocxx can generate bindings for it. Then it is trivial to convert it to a Rust slice:
```
pub fn get_jobs(ffi_jobs: &ffi::RustFFIJobList) -> &[SharedPtr<job_t>] {
unsafe { slice::from_raw_parts(ffi_jobs.jobs, ffi_jobs.count) }
}
```
Another workaround is to define a struct that contains the shared pointer, and create a vector of that struct.
## Development Tooling
The [autocxx guidance](https://google.github.io/autocxx/workflow.html#how-can-i-see-what-bindings-autocxx-has-generated) is helpful:
1. Install cargo expand (`cargo install cargo-expand`). Then you can use `cargo expand` to see the generated Rust bindings for C++. In particular this is useful for seeing failed expansions for C++ types that autocxx cannot handle.
2. In rust-analyzer, enable Proc Macro and Proc Macro Attributes.
## FFI
The boundary between Rust and C++ is referred to as the Foreign Function Interface, or FFI.
`autocxx` and `cxx` both are designed for long-term interop: C++ and Rust coexisting for years. To this end, both emphasize safety: requiring lots of `unsafe`, `Pin`, etc.
fish plans to use them only temporarily, with a focus on getting things working. To this end, both cxx and autocxx have been forked to support fish:
1. Relax the requirement that all functions taking pointers are `unsafe` (this just added noise).
2. Add support for `wchar_t` as a recognized type, and `CxxWString` analogous to `CxxString`.
See the `Cargo.toml` file for the locations of the forks.