FIX: Further reduce the input of to_tsvector (#15716)

Random strings can result in much longer tsvectors. For example, parsing a Base64 string of ~600 KB can produce a tsvector of over 1 MB, which is the maximum size of a tsvector.

Follow-up-to: 823c3f09d44ab89e88c4910abe36899bb23d601d
2025-03-19 18:35:31 +08:00 · 2022-02-07 23:03:01 +02:00 · 2022-02-07 23:03:01 +02:00 · 820fea835c
commit 820fea835c
parent e92f57255d
1 changed file with 5 additions and 5 deletions
--- a/app/services/search_indexer.rb
+++ b/app/services/search_indexer.rb
@@ -120,11 +120,11 @@ class SearchIndexer
      a_weight: topic_title,
      b_weight: category_name,
      c_weight: topic_tags,
-      # Length of a tsvector must be less than 1_048_576 bytes.
-      # The difference between the max ouptut limit and imposed input limit
-      # accounts for the fact that sometimes the output tsvector may be
-      # slighlty longer than the input.
-      d_weight: scrub_html_for_search(cooked)[0..1_000_000]
+      # The tsvector resulting from parsing a string can be double the size of
+      # the original string. Since there is no way to estimate the length of
+      # the expected tsvector, we limit the input to ~50% of the maximum
+      # length of a tsvector (1_048_576 bytes).
+      d_weight: scrub_html_for_search(cooked)[0..600_000]
    ) do |params|
      params["private_message"] = private_message
    end
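The new comment states the tsvector limit in bytes, but `[0..600_000]` counts characters, and because the range is inclusive it keeps 600,001 of them. For multibyte UTF-8 text, the byte size of the slice can therefore exceed 600,000. Below is a minimal sketch, assuming one wanted to cap the input by bytes instead. `truncate_to_bytes` is a hypothetical helper for illustration, not part of `SearchIndexer` or this commit.

```ruby
# Hypothetical helper (not part of Discourse's SearchIndexer): caps a string
# at `max_bytes` bytes instead of a number of characters.
def truncate_to_bytes(text, max_bytes)
  return text if text.bytesize <= max_bytes

  # byteslice may cut a multibyte UTF-8 character in half; scrub("") drops
  # the invalid trailing bytes so the result is still valid UTF-8.
  text.byteslice(0, max_bytes).scrub("")
end

ascii = "a" * 700_000
puts ascii[0..600_000].length                   # => 600001 (inclusive range)
puts truncate_to_bytes(ascii, 600_000).bytesize # => 600000

multi = "\u00e9" * 400_000                      # "é" is 2 bytes in UTF-8
puts multi[0..600_000].bytesize                 # => 800000 (all 400_000 chars fit in the slice)
capped = truncate_to_bytes(multi, 600_001)
puts capped.bytesize                            # => 600000 (half-character dropped)
puts capped.valid_encoding?                     # => true
```

A byte cap bounds the input in the same unit as the tsvector limit, at the cost of an extra `scrub` pass. It does not remove the underlying issue described in the commit message: the tsvector can still grow larger than its input.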