Mirror of https://github.com/fish-shell/fish-shell.git, synced 2025-02-01 13:44:16 +08:00
Commit 7988cff6bd
This is a *tiny* commit code-wise, but the explanation is a bit longer.

When I made `string` read in chunks, I picked a chunk size from bash's `read`, assuming they had picked a good one. It turns out that, on the (Linux) systems I've tested, that's simply not true.

My tests show that a bigger chunk size, up to 4096, is better *across the board*:

- It's better with very large inputs
- It's equal-to-slightly-better with small inputs
- It's equal-to-slightly-better even if we quit early

My test setup:

0. Create several fish builds with different values for `STRING_CHUNK_SIZE`, and name them "fish-$CHUNKSIZE".
1. Download the npm package names from https://github.com/nice-registry/all-the-package-names/blob/master/names.json (I used commit 87451ea77562a0b1b32550124e3ab4a657bf166c, so it's 46.8MB).
2. Extract the names so we get a line-based version:

   ```fish
   jq '.[]' names.json | string trim -c '"' >/tmp/all
   ```

3. Create random extracts of various sizes:

   ```fish
   for f in 10000 1000 500 50
       shuf /tmp/all | head -n $f > /tmp/$f
   end
   ```

   (The idea here is to defeat any form of pattern in the input.)

4. Run benchmarks:

   ```fish
   hyperfine -w 3 ./fish-{128,512,1024,2048,4096}" -c 'for i in (seq 1000)
       string match -re foot < $f
   end; true'"
   ```

   (Reduce the `seq` size for the larger files so you don't have to wait for hours. The idea is to spend measurable time running `string`, not just fish startup.)

This shows results pretty much like:

```
Summary
  './fish-2048 -c 'for i in (seq 1000) string match -re foot < /tmp/500 end; true'' ran
    1.01 ± 0.02 times faster than './fish-4096 -c 'for i in (seq 1000) string match -re foot < /tmp/500 end; true''
    1.02 ± 0.03 times faster than './fish-1024 -c 'for i in (seq 1000) string match -re foot < /tmp/500 end; true''
    1.08 ± 0.03 times faster than './fish-512 -c 'for i in (seq 1000) string match -re foot < /tmp/500 end; true''
    1.47 ± 0.07 times faster than './fish-128 -c 'for i in (seq 1000) string match -re foot < /tmp/500 end; true''
```

So up to 1024 there's a measurable difference, and beyond that the returns are marginal. We stick with 1024 because of the memory trade-off.

----

Fun extra: comparisons with `grep` (GNU grep 3.7) are *weird*, because you get both

```
'./fish-4096 -c 'for i in (seq 100); string match -re foot < /tmp/500; end; true'' ran
   11.65 ± 0.23 times faster than 'fish -c 'for i in (seq 100); command grep foot /tmp/500; end''
```

and

```
'fish -c 'for i in (seq 2); command grep foot /tmp/all; end'' ran
   66.34 ± 3.00 times faster than './fish-4096 -c 'for i in (seq 2); string match -re foot < /tmp/all; end; true''
  100.05 ± 4.31 times faster than './fish-128 -c 'for i in (seq 2); string match -re foot < /tmp/all; end; true''
```

Basically, if you *can* give grep a lot of work at once (~40MB in this case), it'll churn through it like butter. But if you have to call it a lot, `string` beats it by virtue of cheating: it's a builtin, so there is no new process to start for each call.
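To make concrete what the chunk size controls, here is a minimal C++ sketch of reading input in fixed-size chunks and splitting it into lines. It is not fish's actual `string.cpp` code: `ChunkedLineReader` and `kStringChunkSize` are hypothetical names, and the sketch assumes input is pulled from a file descriptor with POSIX `read(2)`. The default chunk size is set to the 1024 this commit settles on.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <unistd.h>

// Hypothetical stand-in for fish's STRING_CHUNK_SIZE (this commit picks 1024).
constexpr std::size_t kStringChunkSize = 1024;

// Hypothetical line reader: pulls at most ChunkSize bytes per read(2) call and
// hands out newline-separated lines. A bigger chunk means fewer system calls
// on large inputs; the price is a bigger buffer.
template <std::size_t ChunkSize = kStringChunkSize>
class ChunkedLineReader {
    static_assert(ChunkSize > 0, "chunk size must be positive");

public:
    explicit ChunkedLineReader(int fd) : fd_(fd) { assert(fd >= 0); }

    // Returns the next line without its '\n', or std::nullopt once input is
    // exhausted. A final line lacking a trailing '\n' is still returned.
    // Read errors are treated like EOF to keep the sketch short.
    std::optional<std::string> next_line() {
        for (;;) {
            std::size_t nl = pending_.find('\n', scan_from_);
            if (nl != std::string::npos) {
                std::string line = pending_.substr(0, nl);
                pending_.erase(0, nl + 1);
                scan_from_ = 0;
                return line;
            }
            // The buffered bytes contain no newline; don't rescan them.
            scan_from_ = pending_.size();
            if (eof_) {
                if (pending_.empty()) return std::nullopt;
                std::string line = std::move(pending_);
                pending_.clear();
                scan_from_ = 0;
                return line;
            }
            char buf[ChunkSize];
            ssize_t n = ::read(fd_, buf, ChunkSize);
            if (n < 0 && errno == EINTR) continue;  // interrupted by a signal: retry
            if (n <= 0) {
                eof_ = true;  // EOF (or error): flush whatever is buffered
                continue;
            }
            pending_.append(buf, static_cast<std::size_t>(n));
            ++reads_;
        }
    }

    // Number of read(2) calls that returned data so far.
    std::size_t reads() const { return reads_; }

private:
    int fd_;
    std::string pending_;        // bytes read but not yet returned as lines
    std::size_t scan_from_ = 0;  // where to resume searching for '\n'
    bool eof_ = false;
    std::size_t reads_ = 0;
};
```

Going from `ChunkedLineReader<128>` to the default of 1024 cuts the minimum number of `read(2)` calls on a large input by a factor of eight, while the buffer only grows to 1 KiB. A caller that stops early (for example after a first match) has buffered less than one chunk beyond the line it stopped on, so in this sketch a larger chunk costs little even in the quit-early case.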
- argparse.cpp
- argparse.h
- bg.cpp
- bg.h
- bind.cpp
- bind.h
- block.cpp
- block.h
- builtin.cpp
- builtin.h
- cd.cpp
- cd.h
- command.cpp
- command.h
- commandline.cpp
- commandline.h
- complete.cpp
- complete.h
- contains.cpp
- contains.h
- disown.cpp
- disown.h
- echo.cpp
- echo.h
- emit.cpp
- emit.h
- eval.cpp
- eval.h
- exit.cpp
- exit.h
- fg.cpp
- fg.h
- function.cpp
- function.h
- functions.cpp
- functions.h
- history.cpp
- history.h
- jobs.cpp
- jobs.h
- math.cpp
- math.h
- path.cpp
- path.h
- printf.cpp
- printf.h
- pwd.cpp
- pwd.h
- random.cpp
- random.h
- read.cpp
- read.h
- realpath.cpp
- realpath.h
- return.cpp
- return.h
- set_color.cpp
- set_color.h
- set.cpp
- set.h
- source.cpp
- source.h
- status.cpp
- status.h
- string.cpp
- string.h
- test.cpp
- test.h
- type.cpp
- type.h
- ulimit.cpp
- ulimit.h
- wait.cpp
- wait.h