FIX: Further reduce the input of to_tsvector (#15716)

Random strings can result in much longer tsvectors. For example,
parsing a Base64 string of ~600 KB can result in a tsvector of over 1 MB,
which is the maximum size of a tsvector.

Follow-up-to: 823c3f09d4
Dan Ungureanu 2022-02-07 23:03:01 +02:00 committed by GitHub
parent e92f57255d
commit 820fea835c

@@ -120,11 +120,11 @@ class SearchIndexer
a_weight: topic_title,
b_weight: category_name,
c_weight: topic_tags,
-# Length of a tsvector must be less than 1_048_576 bytes.
-# The difference between the max ouptut limit and imposed input limit
-# accounts for the fact that sometimes the output tsvector may be
-# slighlty longer than the input.
-d_weight: scrub_html_for_search(cooked)[0..1_000_000]
+# The tsvector resulted from parsing a string can be double the size of
+# the original string. Since there is no way to estimate the length of
+# the expected tsvector, we limit the input to ~50% of the maximum
+# length of a tsvector (1_048_576 bytes).
+d_weight: scrub_html_for_search(cooked)[0..600_000]
) do |params|
params["private_message"] = private_message
end
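
The new cap slices by character index, while the limit quoted in the comment is in bytes. The sketch below is not part of this commit. It only illustrates how the two can differ for multibyte text, and what a byte-based cap might look like. `cap_bytes`, `INPUT_LIMIT`, and `MAX_TSVECTOR_BYTES` are hypothetical names introduced for the example. As the commit message notes, the input size does not bound the tsvector size, so a byte cap would still be a heuristic.

```ruby
# Hypothetical constants and helper for illustration; not part of SearchIndexer.
MAX_TSVECTOR_BYTES = 1_048_576 # limit quoted in the diff's comment
INPUT_LIMIT = 600_000          # cap introduced by this commit

# Cap a string by byte size rather than by character count.
def cap_bytes(text, limit = INPUT_LIMIT)
  return text if text.bytesize <= limit
  # byteslice may cut a multibyte character in half; scrub("") drops the broken tail.
  text.byteslice(0, limit).scrub("")
end

sample = "é" * 400_000             # 400_000 characters, 800_000 bytes in UTF-8
by_chars = sample[0..INPUT_LIMIT]  # inclusive range: keeps up to INPUT_LIMIT + 1 characters
by_bytes = cap_bytes(sample)

ratio = (INPUT_LIMIT.to_f / MAX_TSVECTOR_BYTES).round(2)

puts by_chars.bytesize # character slicing leaves this sample untouched
puts by_bytes.bytesize # byte slicing enforces the cap
puts ratio             # share of the maximum tsvector size allowed as input
```

For mostly ASCII content the two approaches behave almost identically, which may be why character slicing is sufficient in practice here.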