Commit Graph

57 Commits

Author SHA1 Message Date
Sam
cd247d5322
FEATURE: Roll out new search optimisations (#20364)
- Reduce duplication of terms in post index from unlimited to 6. This will
result in reduced index size and reduced weighting for posts containing
a huge amount of duplicate terms. (Eg: a post containing "sam sam sam sam
sam sam sam sam", will index as "sam sam sam sam sam sam", only including
the word up to 6 times.) This corrects a flaw where title weighting could
be ignored.

- Prioritize exact matches of words in titles. Our search always performs
a prefix match. However we want to give special weight to exact title matches
meaning that a search for "sum" will find topics such as "the sum of us" vs
"summer in spring".

- Pick up fixes to our search algorithm which are missing from old indexes.
Specifically pick up the fix that indexes URLs properly. (`https://happy.com`
was stemmed to `happi` in keywords and then was not searchable)

see also:

https://meta.discourse.org/t/refinements-to-search-being-tested-on-meta/254158

Indexing will take a while and work in batches, in the background.
2023-02-20 11:53:35 +11:00
David Taylor
6417173082
DEV: Apply syntax_tree formatting to lib/* 2023-01-09 12:10:19 +00:00
Penar Musaraj
8222810099
FIX: Limits for PM and group header search (#16887)
When searching for PMs or PMs in a group inbox, results in the header search were not being limited to 5 with a "More" link to the full page search. This PR fixes that.

It also simplifies the logic and updates the search API docs to include recently added `in:messages` and `group_messages:groupname` options.
2022-05-24 11:31:24 -04:00
Alan Guo Xiang Tan
4e5f5b67b0
DEV: Remove monkey patch that is no longer required (#16648) 2022-05-06 10:33:42 +08:00
Bianca Nenciu
34b4b53bac
FEATURE: Use Postgres unaccent to ignore accents (#16100)
The search_ignore_accents site setting can be used to make the search
indexer remove the accents before indexing the content. The unaccent
function from PostgreSQL is better than Ruby's unicode_normalize(:nfkd).
2022-03-07 23:03:10 +02:00
Alan Guo Xiang Tan
930f51e175 FEATURE: Split up text segmentation for Chinese and Japanese.
* Chinese segmenetation will continue to rely on cppjieba
* Japanese segmentation will use our port of TinySegmenter
* Korean currently does not rely on segmentation which was dropped in c677877e4f
* SiteSetting.search_tokenize_chinese_japanese_korean has been split
into SiteSetting.search_tokenize_chinese and
SiteSetting.search_tokenize_japanese respectively
2022-02-07 09:21:14 +08:00
Sam
5b342ae505
FIX: remove superfluous spaces from CJK blurbs (#12629)
Previously we used the raw data indexed to generate blurbs even for cases
when Chinese/Korean/Japanese text was used.

This caused superfluous spaces to show up in excerpts.
2021-04-12 12:46:42 +10:00
Guo Xiang Tan
93f8396b4b
FIX: Limit PG headline based search blurb generation to 200 characters.
* Recovers omission characters '...' in blurb as well.
2020-08-12 15:34:27 +08:00
Guo Xiang Tan
2193d02433
PERF: Use PG headlines for blurb generation and highlighting for search. 2020-08-06 14:56:29 +08:00
Guo Xiang Tan
255b0e9f14
PERF: Replace video and audio links in search blurb while indexing.
In the near future, we will be swtiching to PG headlines to generate the
search blurb. As such, we need to replace audio and video links in the
raw data used for headline generation. This also means that we avoid
replacing links each time we need to generate the blurb.
2020-08-06 12:25:03 +08:00
Guo Xiang Tan
06ef87da51
DEV: Make rubocop happy. 2020-08-06 10:11:07 +08:00
Guo Xiang Tan
ee5d8fba0c
PERF: Optimize ActionView::Helpers::TextHelper#excerpt. 2020-08-06 09:59:20 +08:00
Guo Xiang Tan
4c03a944f6
PERF: Move URI regexp in GroupSearchResults.blurb_for into constant
No need to generate the huge regexp over and over again.
2020-08-04 14:38:43 +08:00
Guo Xiang Tan
181c4eb760 PERF: Avoid parsing Post#cooked with Nokogiri for every search. 2020-07-24 10:43:09 +08:00
Guo Xiang Tan
ce39733b1a
FIX: Incorrect search blurb when advanced search filters are used take2
Also remove include_blurbs attribute which isn't used.
2020-07-14 11:50:40 +08:00
David Taylor
cb1f891392
Revert "FIX: Incorrect search blurb when advanced search filters are used."
This change was causing advanced search filters to disappear from the search input

This reverts commit 2e1eafae06.
2020-07-09 16:19:18 +01:00
Guo Xiang Tan
2e1eafae06
FIX: Incorrect search blurb when advanced search filters are used. 2020-07-08 11:59:49 +08:00
Penar Musaraj
0dfc594784 FIX: skip invalid URLs when checking for audio/video in search blurbs
Fixes 500 errors on search queries introduced in 580a4a8
2019-11-06 10:32:15 -05:00
Penar Musaraj
15b25547bb DEV: Cleanup misspelled TextHelper param 2019-10-31 09:32:42 -04:00
Penar Musaraj
f8b72d9835 DEV: Refactor excluding audio/video URLs from search result blurbs
Followup to 580a4a82
2019-10-31 09:13:24 -04:00
Penar Musaraj
580a4a827b Exclude audio/video URLs from search result blurbs
Displays translatable "[audio]" or "[video]" placeholders instead of ugly (and often long) URLs.
2019-10-30 13:07:16 -04:00
Sam Saffron
4dcc5f16f1 FEATURE: when under extreme load disable search
The global setting disable_search_queue_threshold
(DISCOURSE_DISABLE_SEARCH_QUEUE_THRESHOLD) which default to 1 second was
added.

This protection ensures that when the application is unable to keep up with
requests it will simply turn off search till it is not backed up.

To disable this protection set this to 0.
2019-07-02 11:22:01 +10:00
Sam Saffron
30990006a9 DEV: enable frozen string literal on all files
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.

Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
2019-05-13 09:31:32 +08:00
Sam Saffron
e2bcf55077 DEV: move send => public_send in lib folder
This handles most of the cases in `lib` where we were using send instead
of public_send
2019-05-07 12:25:44 +10:00
Guo Xiang Tan
451f7842ff DEV: More send -> public_send. 2019-05-07 10:05:58 +08:00
Guo Xiang Tan
dae0bb4c67 FIX: Post blurb incorrect when search contains a phrase match.
If the blurb generated is not around the search term, we will not be
able to highlight it on the client side.
2019-03-26 17:01:52 +08:00
Joffrey JAFFEUX
dc4001370c
FEATURE: displays groups in menu search (#7090) 2019-03-04 10:30:09 +01:00
Régis Hanol
4481836de2 FEATURE: new 'search_ignore_accents' site setting 2018-09-17 10:42:30 +02:00
Neil Lalonde
2c56f8df7c FEATURE: show tags in search results 2017-08-25 11:52:59 -04:00
Neil Lalonde
7c1d7fb423 Merge branch 'master' into fix_limited_search_results 2017-07-31 15:55:31 -04:00
Guo Xiang Tan
5012d46cbd Add rubocop to our build. (#5004) 2017-07-28 10:20:09 +09:00
Jakub Macina
7b40de5ac4 Add attribute to grouped search results for more available posts. 2017-07-20 18:07:13 +02:00
Robin Ward
21e02d6969 Include the search_log_id in search results 2017-07-17 12:10:32 -04:00
Sam
0a78ae739d Remove SearchObserver, aim is to remove all observers
rails-observers gem is mostly unmaintained and is a pain to carry forward
new implementation contains significantly less magic as a bonus
2016-12-22 13:13:14 +11:00
Sam
50f7616d04 FIX: include pinned status in search results 2016-03-18 16:26:20 +11:00
Sam
2876725e1b REFACTOR: remove hacky search from discovery 2015-07-27 16:47:06 +10:00
Sam
41ceff8430 UX: move search to its own route
previously search was bundled with discovery, something that makes stuff confusing internally
2015-07-27 16:47:06 +10:00
Robin Ward
6422d5efbd Use the same component for similar topics as search results. 2015-06-24 15:08:22 -04:00
Sam
0ade9bafff FIX: highlight in yellow, not blue
FEATURE: highlight in title
2014-09-04 15:01:13 +10:00
Sam
9c29c1c072 FEATURE: highlight search results 2014-09-03 17:09:01 +10:00
Sam
f06ad7ed8e remove old unused files 2014-09-03 12:15:48 +10:00
Sam
4f09d552ed FEATURE: increase search expansion to 50 results
refactor search code to deal with proper objects
use proper serializers, test the controllers
2014-09-03 12:13:25 +10:00
Sam
69e418facf FEATURE: wider search with more context 2014-09-01 17:04:57 +10:00
Robin Ward
e8cade40c7 Improve search results by introducing an aggregate post search data
filter. It seems performant despite the extra content being searched.
2014-08-22 16:56:26 -04:00
Akshay
7ef61144e7 Avoid using to_s when performing String Interpolation 2014-08-14 23:55:27 +05:30
Sam
c5a3bfdfa9 BUGFIX: missing avatars in search 2014-05-29 14:38:52 +10:00
paully21
84d100be85 Add blurb of post to search results via API 2014-04-17 07:58:51 -05:00
Sam
abb2de22ab BUGFIX: search could break when expanding 2014-02-17 14:34:14 +11:00
Sam
2b10fdc97f FEATURE: search auto scopes on topic first 2014-02-17 13:54:51 +11:00
verg
f723f11443 Fix subcategories links from search 2014-02-16 12:49:20 -05:00