fish-shell/doc_internal/rust-devel.md
Xiretza dff7db2f16
Run rustfmt and clippy in CI (#9616)
* Add machine-readable MSRV to Cargo.toml
* Fix clippy warnings
* CI: add rustfmt and clippy checks
2023-02-26 13:20:20 -06:00

8.8 KiB

fish-shell Rust Development Guide

This describes how to get started building fish-shell in its partial Rust state, and how to contribute to the port.

Overview

fish is in the process of transitioning from C++ to Rust. The fish project has a Rust crate embedded at path fish-rust. This crate builds a Rust library libfish_rust.a which is linked with the C++ libfish.a. Existing C++ code will be incrementally migrated to this crate; then CMake will be replaced with cargo and other Rust-native tooling.

Important tools used during this transition:

  1. Corrosion to invoke cargo from CMake.
  2. cxx for basic C++ <-> Rust interop.
  3. autocxx for using C++ types in Rust.

We use forks of the last two - see the FFI section below. No special action is required to obtain these packages. They're downloaded by cargo.

Building

Build Dependencies

fish-shell currently depends on Rust 1.67 or later. To install Rust, follow https://rustup.rs.

Build via CMake

It is recommended to build inside fish-shell/build. This will make it easier for Rust to find the config.h file.

Build via CMake as normal (use any generator, here we use Ninja):

$ cd fish-shell
$ mkdir build && cd build
$ cmake -G Ninja ..
$ ninja

This will create the usual fish executables.

Build just libfish_rust.a with Cargo

The directory fish-rust contains the Rust sources. These require that CMake has been run to produce config.h which is necessary for autocxx to succeed.

Follow the "Build from CMake" steps above, and then:

$ cd fish-shell/fish-rust
$ cargo build

This will build only the library, not a full working fish, but it allows faster iteration for Rust development. That is, after running cmake you can open the fish-rust as the root of a Rust crate, and tools like rust-analyzer will work.

Development

The basic development loop for this port:

  1. Pick a .cpp (or in some cases .h) file to port, say util.cpp.
  2. Add the corresponding util.rs file to fish-rust/.
  3. Reimplement it in Rust, along with its dependencies as needed. Match the existing C++ code where practical, including propagating any relevant comments.
    • Do this even if it results in less idiomatic Rust, but avoid being super-dogmatic either way.
    • One technique is to paste the C++ into the Rust code, commented out, and go line by line.
  4. Decide whether any existing C++ callers should invoke the Rust implementation, or whether we should keep the C++ one.
    • Utility functions may have both a Rust and C++ implementation. An example is FLOG where interop is too hard.
    • Major components (e.g. builtin implementations) should not be duplicated; instead the Rust should call C++ or vice-versa.
  5. Remember to run cargo fmt and cargo clippy to keep the codebase somewhat clean (otherwise CI will fail). If you use rust-analyzer, you can run clippy automatically by setting rust-analyzer.checkOnSave.command = "clippy".

You will likely run into limitations of autocxx and to a lesser extent cxx. See the FFI sections below.

Type Mapping

Constants & Type Aliases

The FFI does not support constants (#define or static const) or type aliases (typedef, using). Duplicate them using their Rust equivalent (pub const and type/struct/enum).

Non-POD types

Many types cannot currently be passed across the language boundary by value or occur in shared structs. As a workaround, use references, raw pointers or smart pointers (cxx provides SharedPtr and UniquePtr). Try to keep workarounds on the C++ side and the FFI layer of the Rust code. This ensures we will get rid of the workarounds as we peel off the FFI layer.

Strings

Fish will mostly not use Rust's String/&str types as these cannot represent non-UTF8 data using the default encoding.

fish's primary string types will come from the widestring crate. The two main string types are WString and &wstr, which are renamed Utf32String and Utf32Str. WString is an owned, heap-allocated UTF32 string, &wstr a borrowed UTF32 slice.

In general, follow this mapping when porting from C++:

  • wcstring -> WString
  • const wcstring & -> &wstr
  • const wchar_t * -> &wstr

None of the Rust string types are nul-terminated. We're taking this opportunity to drop the nul-terminated aspect of wide string handling.

Creating strings

One may create a &wstr from a string literal using the wchar::L! macro:

use crate::wchar::{wstr, L!}

fn get_shell_name() -> &'static wstr {
    L!("fish")
}

There is also a widestrs proc-macro which enables L as a suffix, to reduce the noise. This can be applied to any block, including modules and individual functions:

use crate::wchar::{wstr, widestrs}

#[widestrs]
fn get_shell_name() -> &'static wstr {
    "fish"L // equivalent to L!("fish")
}

Strings for FFI

WString and &wstr are the common strings used by Rust components. At the FII boundary there are some additional strings for interop. All of these are temporary for the duration of the port.

  • CxxWString is the Rust binding of std::wstring. It is the wide-string analog to CxxString and is added in our fork of cxx. This is useful for functions which return e.g. const wcstring &.
  • W0String is renamed U32CString. This is basically WString except it is nul-terminated. This is useful for getting a nul-terminated const wchar_t * to pass to C++ implementations.
  • wcharz_t is an annoying C++ struct which merely wraps a const wchar_t *, used for passing these pointers from C++ to Rust. We would prefer to use const wchar_t * directly but autocxx refuses to generate bindings for types such as std::vector<const wchar_t *> so we wrap it in this silly struct.

Note C++ wchar_t, Rust char, and u32 are effectively interchangeable: you can cast pointers to them back and forth (except we check upon u32->char conversion). However be aware of which types are nul-terminated.

These types should be confined to the FFI modules, in particular wchar_ffi. They should not "leak" into other modules. See the wchar_ffi module.

Format strings

Rust's builtin std::fmt modules do not accept runtime-provided format strings, so we mostly won't use them, except perhaps for FLOG / other non-translated text.

Instead we'll continue to use printf-style strings, with a Rust printf implementation.

Vectors

See Vec and CxxVector.

In many cases, autocxx refuses to allow vectors of certain types. For example, autocxx supports std::vector and std::shared_ptr but NOT std::vector<std::shared_ptr<...>>. To work around this one can create a helper (pointer, length) struct. Example:

struct RustFFIJobList {
    std::shared_ptr<job_t> *jobs;
    size_t count;
};

This is just a POD (plain old data) so autocxx can generate bindings for it. Then it is trivial to convert it to a Rust slice:

pub fn get_jobs(ffi_jobs: &ffi::RustFFIJobList) -> &[SharedPtr<job_t>] {
    unsafe { slice::from_raw_parts(ffi_jobs.jobs, ffi_jobs.count) }
}

Another workaround is to define a struct that contains the shared pointer, and create a vector of that struct.

Development Tooling

The autocxx guidance is helpful:

  1. Install cargo expand (cargo install cargo-expand). Then you can use cargo expand to see the generated Rust bindings for C++. In particular this is useful for seeing failed expansions for C++ types that autocxx cannot handle.
  2. In rust-analyzer, enable Proc Macro and Proc Macro Attributes.

FFI

The boundary between Rust and C++ is referred to as the Foreign Function Interface, or FFI.

autocxx and cxx both are designed for long-term interop: C++ and Rust coexisting for years. To this end, both emphasize safety: requiring lots of unsafe, Pin, etc.

fish plans to use them only temporarily, with a focus on getting things working. To this end, both cxx and autocxx have been forked to support fish:

  1. Relax the requirement that all functions taking pointers are unsafe (this just added noise).
  2. Add support for wchar_t as a recognized type, and CxxWString analogous to CxxString.

See the Cargo.toml file for the locations of the forks.