FEATURE: allow disabling of extra term injection in search

There is a feature in search where we take over from the tokenizer
in PostgreSQL and attempt to inject more words into the search index.

So for example: sam.i.am will inject the words i and am.

This is not ideal because there are many edge cases, and it can
cause extreme index bloat.

This is an opening-move commit to make it configurable. Over the
next few weeks we will evaluate and decide whether to disable this
by default or simply remove it.
Sam Saffron 2020-06-25 13:36:52 +10:00 committed by Robin Ward
parent 5f5dd9ea67
commit ae520b62e4
2 changed files with 5 additions and 0 deletions

@@ -17,6 +17,8 @@ class SearchIndexer
   end
   def self.inject_extra_terms(raw)
+    return raw if !SiteSetting.search_inject_extra_terms
     # insert some extra words for I.am.a.word so "word" is tokenized
     # I.am.a.word becomes I.am.a.word am a word
     raw.gsub(/[^[:space:]]*[\.]+[^[:space:]]*/) do |with_dot|

@@ -1730,6 +1730,9 @@ backups:
     hidden: true
 search:
+  search_inject_extra_terms:
+    default: true
+    hidden: true
   min_search_term_length:
     client: true
     default: 3