fish-shell

mirror of https://github.com/fish-shell/fish-shell.git synced 2025-02-01 13:44:16 +08:00

Fabian Boehm 7988cff6bd Increase the string chunk size to increase performance	2022-08-15 20:16:12 +02:00

This is a tiny commit code-wise, but the explanation is a bit longer.

When I made string read in chunks, I picked a chunk size from bash's read, under the assumption that they had picked a good one.

It turns out, on the (linux) systems I've tested, that's simply not true.

My tests show that a bigger chunk size of up to 4096 is better across the board:

- It's better with very large inputs
- It's equal-to-slightly-better with small inputs
- It's equal-to-slightly-better even if we quit early

My test setup:

0. Create various fish builds with various sizes for STRING_CHUNK_SIZE, name them "fish-$CHUNKSIZE".
1. Download the npm package names from https://github.com/nice-registry/all-the-package-names/blob/master/names.json (I used commit 87451ea77562a0b1b32550124e3ab4a657bf166c, so it's 46.8MB)
2. Extract the names so we get a line-based version:

```fish
jq '.[]' names.json | string trim -c '"' >/tmp/all
```

3. Create various sizes of random extracts:

```fish
for f in 10000 1000 500 50
    shuf /tmp/all | head -n $f > /tmp/$f
end
```

(the idea here is to defeat any form of pattern in the input).

4. Run benchmarks:

    hyperfine -w 3 ./fish-{128,512,1024,2048,4096}" -c 'for i in (seq 1000)
        string match -re foot < $f
    end; true'"

(reduce the seq size for the larger files so you don't have to wait for hours - the idea here is to have some time running string and not just fish startup time)

This shows results pretty much like

```
Summary
  './fish-2048 -c 'for i in (seq 1000) string match -re foot < /tmp/500 end; true'' ran
    1.01 ± 0.02 times faster than './fish-4096 -c 'for i in (seq 1000) string match -re foot < /tmp/500 end; true''
    1.02 ± 0.03 times faster than './fish-1024 -c 'for i in (seq 1000) string match -re foot < /tmp/500 end; true''
    1.08 ± 0.03 times faster than './fish-512 -c 'for i in (seq 1000) string match -re foot < /tmp/500 end; true''
    1.47 ± 0.07 times faster than './fish-128 -c 'for i in (seq 1000) string match -re foot < /tmp/500 end; true''
```

So we see that up to 1024 there's a difference, and after that the returns are marginal. So we stick with 1024 because of the memory trade-off.

----

Fun extra: Comparisons with `grep` (GNU grep 3.7) are weird. Because you both get

```
'./fish-4096 -c 'for i in (seq 100); string match -re foot < /tmp/500; end; true'' ran
  11.65 ± 0.23 times faster than 'fish -c 'for i in (seq 100); command grep foot /tmp/500; end''
```

and

```
'fish -c 'for i in (seq 2); command grep foot /tmp/all; end'' ran
  66.34 ± 3.00 times faster than './fish-4096 -c 'for i in (seq 2); string match -re foot < /tmp/all; end; true''
  100.05 ± 4.31 times faster than './fish-128 -c 'for i in (seq 2); string match -re foot < /tmp/all; end; true''
```

Basically, if you can give grep a lot of work at once (~40MB in this case), it'll churn through it like butter. But if you have to call it a lot, string beats it by virtue of cheating.
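
To illustrate the trade-off the commit message describes, here is a minimal sketch of chunked line reading. It is not fish's actual code: `kChunkSize`, `read_stats_t`, and `read_lines_chunked` are hypothetical names, and `kChunkSize` only stands in for the `STRING_CHUNK_SIZE` constant mentioned above. The sketch assumes that one reason larger chunks help is that they need fewer read calls for the same input, at the cost of a larger buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the STRING_CHUNK_SIZE constant named in the commit message.
constexpr std::size_t kChunkSize = 1024;

// Lines collected so far, plus how many non-empty reads were needed to get them.
struct read_stats_t {
    std::vector<std::string> lines;
    std::size_t read_calls = 0;
};

// Read `in` in chunks of `chunk_size` bytes and split the data on '\n'.
// Stops early once `max_lines` lines are collected (0 means "no limit").
read_stats_t read_lines_chunked(std::istream &in, std::size_t chunk_size = kChunkSize,
                                std::size_t max_lines = 0) {
    assert(chunk_size > 0);
    read_stats_t result;
    std::string pending;  // bytes read so far that do not yet form a complete line
    std::vector<char> buf(chunk_size);
    while (true) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::streamsize got = in.gcount();
        if (got <= 0) break;  // end of input
        ++result.read_calls;
        pending.append(buf.data(), static_cast<std::size_t>(got));

        // Emit every complete line currently in the pending buffer.
        std::size_t start = 0;
        std::size_t nl;
        while ((nl = pending.find('\n', start)) != std::string::npos) {
            result.lines.push_back(pending.substr(start, nl - start));
            start = nl + 1;
            if (max_lines != 0 && result.lines.size() >= max_lines) return result;
        }
        pending.erase(0, start);
    }
    // A final line without a trailing newline still counts as a line.
    if (!pending.empty()) result.lines.push_back(pending);
    return result;
}
```

Feeding the same `std::istringstream` contents through `read_lines_chunked` with a small and a large `chunk_size` yields identical lines but a different `read_calls` count, which is the kind of difference the hyperfine runs above measure end to end.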
..
argparse.cpp	argparse: Stop reconverting to null_terminated_array_t	2022-06-27 17:45:08 +02:00
argparse.h
bg.cpp	Refactor tty transfer to be more deliberate	2022-03-19 14:48:36 -07:00
bg.h
bind.cpp	Make ESCAPE_ALL the default and call its inverse ESCAPE_NO_PRINTABLES	2022-07-27 11:24:35 +02:00
bind.h
block.cpp	Clean up woption	2022-04-02 11:28:30 -07:00
block.h
builtin.cpp	Clean up woption	2022-04-02 11:28:30 -07:00
builtin.h
cd.cpp	Add an error message when cd fails with ELOOP	2022-05-15 11:58:40 -07:00
cd.h
command.cpp	Rationalize path-getting	2022-04-23 15:24:27 -07:00
command.h
commandline.cpp	Clean up woption	2022-04-02 11:28:30 -07:00
commandline.h
complete.cpp	Switch completion_request_options_t from a list of flags to a struct	2022-06-19 11:23:10 -07:00
complete.h
contains.cpp	Clean up woption	2022-04-02 11:28:30 -07:00
contains.h
disown.cpp
disown.h
echo.cpp	echo: Use convert_digit	2022-06-16 15:43:46 +02:00
echo.h
emit.cpp	event_fire_generic to take its arguments directly	2022-05-14 10:33:47 -07:00
emit.h
eval.cpp
eval.h
exit.cpp	Clean up woption	2022-04-02 11:28:30 -07:00
exit.h
fg.cpp	Refactor tty transfer to be more deliberate	2022-03-19 14:48:36 -07:00
fg.h
function.cpp	Clean up woption	2022-04-02 11:28:30 -07:00
function.h
functions.cpp	Switch filenames from intern'd strings to shared_ptr	2022-08-13 12:51:36 -07:00
functions.h
history.cpp	clang-format C++ files	2022-07-27 10:05:41 +02:00
history.h
jobs.cpp	Fix CPU usage percentage calculation as reported by jobs	2022-05-07 15:29:56 -07:00
jobs.h
math.cpp	Clean up woption	2022-04-02 11:28:30 -07:00
math.h
path.cpp	clang-format C++ files	2022-07-27 10:05:41 +02:00
path.h	clang-format C++ files	2022-07-27 10:05:41 +02:00
printf.cpp	builtin printf: suppress warnings about unused variables	2022-08-13 21:11:54 +02:00
printf.h	clang-format C++ files	2022-07-27 10:05:41 +02:00
pwd.cpp	Clean up woption	2022-04-02 11:28:30 -07:00
pwd.h
random.cpp
random.h
read.cpp	Make ESCAPE_ALL the default and call its inverse ESCAPE_NO_PRINTABLES	2022-07-27 11:24:35 +02:00
read.h
realpath.cpp	clang-format C++ files	2022-07-27 10:05:41 +02:00
realpath.h
return.cpp	Clean up woption	2022-04-02 11:28:30 -07:00
return.h
set_color.cpp	clang-format C++ files	2022-07-27 10:05:41 +02:00
set_color.h
set.cpp	clarify "…variable is shadowed by the global variable of the same name"	2022-08-14 16:16:38 -07:00
set.h
source.cpp	Remove the intern'd strings component	2022-08-13 12:51:36 -07:00
source.h
status.cpp	Switch filenames from intern'd strings to shared_ptr	2022-08-13 12:51:36 -07:00
status.h
string.cpp	Increase the string chunk size to increase performance	2022-08-15 20:16:12 +02:00
string.h
test.cpp	clang-format C++ files	2022-07-27 10:05:41 +02:00
test.h
type.cpp	Switch filenames from intern'd strings to shared_ptr	2022-08-13 12:51:36 -07:00
type.h
ulimit.cpp	Clean up woption	2022-04-02 11:28:30 -07:00
ulimit.h
wait.cpp	Clean up woption	2022-04-02 11:28:30 -07:00
wait.h