Commit Graph

14 Commits

Author SHA1 Message Date
Johannes Altmanninger
408161f4d6 Port test_tokenizer 2023-10-07 19:30:46 +02:00
Henrik Hørlück Berg
fae090ea67 Adopt the wchar prelude 2023-08-09 15:00:58 +02:00
Johannes Altmanninger
fc5e97e55e Expose u32 source offsets as usize
Computations should use usize, so this makes things more convenient.
Post-FFI we can make SourceRange fields private, to enforce this even easier.
2023-04-19 01:03:16 +02:00
Johannes Altmanninger
2ca27d2c5b Implement Iterator for Tokenizer 2023-04-19 01:03:16 +02:00
ridiculousfish
ead329db60 Replace a bunch of from_ffi with as_wstr calls
from_ffi copies a CxxWString into a new Rust WString, but as_wstr simply
gets the slice of chars directly.

Too many string types!
2023-04-16 12:50:53 -07:00
Johannes Altmanninger
915db44fbd Implement printf formatting for some parser types 2023-04-16 17:46:56 +02:00
Johannes Altmanninger
05bad5eda1 Port common.{h,cpp} to Rust
Most of it is duplicated, hence untested.

Functions like mbrtowc are not exposed by the libc crate, so declare them
ourselves.
Since we don't know the definition of C macros, add two big hacks to make
this work:
1. Replace MB_LEN_MAX and mbstate_t with values (resp types) that should
   be large enough for any implementation.
2. Detect the definition of MB_CUR_MAX in the build script. This requires
   more changes for each new libc. We could also use this approach for 1.

Additionally, this commit brings a small behavior change to
read_unquoted_escape(): we cannot decode surrogate code points like \UDE01
into a Rust char, so use � (\UFFFD, replacement character) instead.
Previously, we added such code points to a wcstring; looks like they were
ignored when printed.
2023-04-02 15:17:06 +02:00
Johannes Altmanninger
c6756e9324 Canonicalize some wide string imports
wchar.rs should not import let alone reexport FFI strings.
Stop re-exporting utf32str! because we use L! instead.

In wchar_ffi.rs, stop re-exporting cxx::CxxWString because that hasn't
seen adoption.

I think we should use re-exports only for aliases like "wstr" or for aliases
into internal modules.
So I'd probably remove `pub use wchar_ffi::wcharz_t = crate::ffi::wcharz_t`
as well.
2023-03-05 10:32:20 +01:00
Johannes Altmanninger
2c331e9c69 Implement more bitwise operation for parser bitfields
These will be used in the parser.

Maybe this type should be a struct with boolean fields. The current way has
the upside that the usage is exactly the same as in C++.
2023-03-04 22:24:22 +01:00
Johannes Altmanninger
bb1c64b202 Make some parser types public 2023-03-04 22:24:22 +01:00
Johannes Altmanninger
5394ca1f96 Address clippy lints 2023-02-25 12:24:25 +01:00
Mahmoud Al-Qudsi
aca7dedf33 Fix Tokenizer::parse_fd() on x86
Upsizing to `usize` from `i32` doesn't work if `usize` is only 32-bits.
I changed the code to use the `FromStr` impl on `i32`, but we could have also
just used `u64` instead of `i32`.

Also, we should get in the habit of using the appropriate type aliases where
possible (`i32` should be `RawFd`).
2023-02-20 13:41:11 -06:00
Johannes Altmanninger
39f3c894d7 Port tokenizer.cpp to Rust
In hindsight, I should probably have split this into three different commits.
2023-02-09 00:37:22 +01:00
Johannes Altmanninger
7f8d247211 Port parse_constants.h to Rust 2023-02-09 00:37:22 +01:00