Commit Graph

347 Commits

Author SHA1 Message Date
Sérgio Saquetim
f8e3a90ab9
DEV: Forces custom search filter matchers to be case insensitive (#28785) 2024-09-06 12:30:51 -03:00
Loïc Guitaut
9b4b5b5028 FIX: Return proper results when searching for a topic in Japanese
Currently, when the default locale is Japanese, the search for a topic
using its URL, path or ID doesn’t work as expected. It will either
return wrong results or no result at all.

The problem lies with how we process the provided terms in Japanese
mode. For example, if `http://localhost/t/-/55` is provided, currently
this will result in `http localhost t 5 5` to be searched for.

This patch addresses the issue by checking whether the provided term
needs segmenting. If the provided term is a number, or a path or a full
URL, then it doesn’t need segmenting. When that happens we skip the
processing we normally apply for Japanese, making the search return the
expected results.
2024-09-03 09:48:58 +02:00
Daniel Waterworth
1a95543e93
PERF: Don't use unaccent on string literals (#28120)
unaccent isn't marked as a pure function, so it gets evaluated per row
instead of once.
2024-07-29 15:37:25 -05:00
Sérgio Saquetim
4b20021033
DEV: Restrict include:unlisted search option to users that can view unlisted topics (#27977) 2024-07-18 16:33:14 -03:00
Sérgio Saquetim
6a3e12a39c
FEATURE: Include advanced search option to include unlisted topics in the results (#27958)
---------

Co-authored-by: Régis Hanol <regis@hanol.fr>
2024-07-18 13:43:53 -03:00
Isaac Janzen
005f623c42
DEV: Add user_agent column to search_logs (#27742)
Add a new column - `user_agent` - to the `SearchLog` table. 

This column can be null as we are only allowing a the user-agent string to have a max length of 2000 characters. In the case the user-agent string surpasses the max characters allowed, we simply nullify the value, and save/write the log as normal.
2024-07-05 14:05:00 -05:00
Régis Hanol
a56321efb5 FIX: topic search order
When using the full page search and filtering down to a specific topic, the sort order was overwritten to by by "post_number".

This was confusing because we allow different type of sort order in the full search page.

This fixes it by only sorting by post_number when there's no "global" sort order defined.

Since the "new topic map" uses the search endpoint behind the scene, this also fixes the "most likes" popup.

Context - https://meta.discourse.org/t/searching-order-seems-to-be-broken-when-searching-in-topic/312303
2024-06-27 18:13:26 +02:00
Sam
dc8249c08a
FEATURE: align with /filter and allow multiple category search (#27440)
This introduces the syntax of

`category:a,b,c` which will search across multiple categories.

Previously there was no way to allow search across a wide selection of
categories.
2024-06-12 16:06:04 +10:00
Loïc Guitaut
2a28cda15c DEV: Update to lastest rubocop-discourse 2024-05-27 18:06:14 +02:00
Jan Cernik
9fb888923d
FIX: Do not show hidden posts in search results (#26800) 2024-04-29 12:32:02 -03:00
Régis Hanol
f7a1272fa4 DEV: cleanup custom filters to prevent leaks
Ensures we clean up any custom filters added in the specs to prevent any leaks when running the specs.

Follow up to https://github.com/discourse/discourse/pull/26770#discussion_r1582464760
2024-04-29 16:11:12 +02:00
Bianca Nenciu
9199c52e5e
FIX: Load categories with search topic results (#25700)
Add categories to the serialized search results together with the topics
when lazy load categories is enabled. This is necessary in order for the
results to be rendered correctly and display the category information.
2024-02-21 17:29:47 +02:00
Alan Guo Xiang Tan
e61608d080
FIX: Remap postgres text search proximity operator (#25497)
Why this change?

Since 1dba1aca27, we have been remapping
the `<->` proximity operator in a tsquery to `&`. However, there is
another variant of it which follows the `<N>` pattern. For example, the
following text "end-to-end" will eventually result in the following
tsquery `end-to-end:* <-> end:* <2> end:*` being generated by Postgres.
Before this fix, the tsquery is remapped to `end-to-end:* & end:* <2>
end:*` by us. This is requires the search data which we store to contain
`end` at exactly 2 position apart. Due to the way we limit the
number of duplicates in our search data, the search term may end up not
matching anything. In bd32912c5e, we made
it such that we do not allow any duplicates when indexing a topic's
title. Therefore, search for `end-to-end` against a topic title with
`end-to-end` will never match because our index will only contain one
`end` term.

What does this change do?

We will remap the `<N>` variant of the proximity operator.
2024-02-01 07:20:46 +08:00
Martin Brennan
146da75fd7
FEATURE: Add setting & preference for search sort default order (#24428)
This commit adds a new `search_default_sort_order` site setting,
set to "relevance" by default, that controls the default sort order
for the full page /search route.

If the user changes the order in the dropdown on that page, we remember
their preference automatically, and it takes precedence over the site
setting as a default from then on. This way people who prefer e.g.
Latest Post as their default can make it so.
2023-11-20 10:43:58 +10:00
Sam
f25849501d
FEATURE: allow consumers to parse a search string (#23528)
This extends search so it can have consumers that:

1. Can split off "term" from various advanced filters and orders
2. Can build a relation of either order or filter

It also moves a lot of stuff around in the search class for clarity.

Two new APIs are exposed:

`.apply_filter` to apply all the special filters to a posts/topics relation
`.apply_order` to force a particular order (eg: order:latest)

This can then be used by semantic search in Discourse AI
2023-09-12 16:21:01 +10:00
Mark VanLandingham
730f652255
DEV: Add plugin modifier locations for user search locations (#23169) 2023-08-21 12:23:42 -05:00
Canapin
b3c722f2f7
FIX: created:@ search keyword for uppercase usernames (#22878)
The filter wasn't working if the username had uppercase letters.
2023-08-02 15:28:17 -04:00
Sérgio Saquetim
908117e270
DEV: Added modifier hooks to allow plugins to tweak how categories and groups are fetched (#21837)
This commit adds modifiers that allow plugins to change how categories and groups are prefetched into the application and listed in the respective controllers.

Possible use cases:

- prevent some categories/groups from being prefetched when the application loads for performance reasons.
- prevent some categories/groups from being listed in their respective index pages.
2023-05-30 18:41:50 -03:00
Sam
b2e3084205
FEATURE: allow searching for oldest topics (#21715)
In some cases reverse chronological can be very important.

- Oldest post by sam
- Oldest topic by sam

Prior to these new filters we had no way of searching for them.

Now the 2 new orders `order:oldest` and `order:oldest_topic` can be used
to find oldest topics and posts

* Update spec/lib/search_spec.rb

Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>

* Update spec/lib/search_spec.rb

Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>

---------

Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
2023-05-24 18:26:36 +10:00
Mark VanLandingham
96e3c5e102
DEV: Add hidden site setting to control search page size (#21640) 2023-05-18 15:30:08 -05:00
Bianca Nenciu
d6534bdb11
DEV: Fix test (#21283)
Apostrophe-like characters (for example, ’ and ') are transformed to the
ASCII apostrophe (') regardless of search_ignore_accents.
2023-05-04 17:04:26 +03:00
Sam
c63551d227
FEATURE: search_rank_sort_priorities modifier (#21329)
This new modifier can be used by plugins to modify search ordering.

Specifically plugins such as discourse_solved can amend search ordering
so solved topics bump to the top.

Also correct edge case where low and high sort priority categories did not
order correctly when it came to closed/archived
2023-05-02 16:36:36 +10:00
Ted Johansson
25a226279a
DEV: Replace #pluck_first freedom patch with AR #pick in core (#19893)
The #pluck_first freedom patch, first introduced by @danielwaterworth has served us well, and is used widely throughout both core and plugins. It seems to have been a common enough use case that Rails 6 introduced it's own method #pick with the exact same implementation. This allows us to retire the freedom patch and switch over to the built-in ActiveRecord method.

There is no replacement for #pluck_first!, but a quick search shows we are using this in a very limited capacity, and in some cases incorrectly (by assuming a nil return rather than an exception), which can quite easily be replaced with #pick plus some extra handling.
2023-02-13 12:39:45 +08:00
Sam
5d28cb709a
FIX: de-prioritize archived topics (#20161)
Previously due to an error archived topics were more prominent in search
than closed topics.

This amends our internal logic to ensure archived topics are bumped down
the list.
2023-02-03 13:23:27 +11:00
Sam
1dba1aca27
FIX: add support for PG 14 and up (#20137)
Previously to_tsquery would split terms and join with &

In PG 14 terms are split and use <-> which means followed directly by.

In PG 13:

discourse_test=# SELECT to_tsquery('english', '''hello world''');
     to_tsquery
---------------------
 'hello' & 'world'
(1 row)

In PG 14:

discourse_test=# SELECT to_tsquery('english', '''hello world''');
     to_tsquery
---------------------
 'hello' <-> 'world'
(1 row)


Change is very unobtrosive, we simply amend our to_tsquery to behave like
it used to behave and make no use of the `<->` operator


More detail at: https://akorotkov.github.io/blog/2021/05/22/pg-14-query-parsing/

Note that plainto_tsquery used elsewhere in Discourse keeps the exact
same function.

This also corrects a faulty test that was passing by a fluke on older
version of PG
2023-02-03 08:11:25 +11:00
Sam
c5345d0e54
FEATURE: prioritize_exact_search_title_match hidden setting (#20089)
The new `prioritize_exact_search_match` can be used to force the search
algorithm to prioritize exact term matches in title when ranking results.

This is scoped narrowly to titles for cases such as a topic titled:

"organisation chart" and a search of "org chart".

If we scoped this wider, all discussion about "org chart" would float to
the top and leave a very common title de-prioritized.

This is a hidden site setting and it has some performance impact due
to double ranking.

That said, performance impact is somewhat mitigated cause ranking on
title alone is a very cheap operation.
2023-01-31 16:34:01 +11:00
Alan Guo Xiang Tan
6934edd97c
DEV: Add hidden site setting to configure search ranking weights (#20086)
This site setting is mostly experimental at this point.
2023-01-31 08:57:13 +08:00
Sam
5d669d8aa2
Revert "FEATURE: hidden site setting to disable search prefix matching (#20058)" (#20073)
This reverts commit 64f7b97d08.

Too many side effects for this setting, we have decided to remove it
2023-01-31 07:39:23 +08:00
Sam
64f7b97d08
FEATURE: hidden site setting to disable search prefix matching (#20058)
Many users seems surprised by prefix matching in search leading to
unexpected results.

Over the years we always would return results starting with a search term
and not expect exact matches.

Meaning a search for `abra` would find `abracadabra`

This introduces the Site Setting `enable_search_prefix_matching` which
defaults to true. (behavior unchanged)

We plan to experiment on select sites with exact matches to see if the
results are less surprising
2023-01-30 12:44:40 +08:00
Daniel Waterworth
666536cbd1
DEV: Prefer \A and \z over ^ and $ in regexes (#19936) 2023-01-20 12:52:49 -06:00
Sérgio Saquetim
0feb9ad341
DEV: Added callback to change the query used to filter groups in search (#19884)
Added plugin registry that will allow adding callbacks that can change the query that is used
to filter groups while running a search.
2023-01-16 15:48:00 -03:00
Bianca Nenciu
fb780c50fd
FIX: Replace all quote-like unicodes with quotes (#19714)
If unaccent is called with quote-like Unicode characters then it can
generate invalid queries because some of the transformed quotes by
unaccent are not escaped and to_tsquery fails because of bad input.

This commits replaces more quote-like Unicode characters before
unaccent is called.
2023-01-09 19:19:51 +02:00
David Taylor
6417173082
DEV: Apply syntax_tree formatting to lib/* 2023-01-09 12:10:19 +00:00
Bianca Nenciu
17b7ab0d7b
FIX: Make sure generated tsqueries are valid (#19368)
The tsquery used for searching is generated using both functions from
Ruby and Postgresql (for example, unaccent function). Depending on the
term used, it generated an invalid tsquery. For example "can’t"
generated "''can''t''" instead of "''can''''t''".
2022-12-12 17:57:20 +02:00
Du Jiajun
41e6b516e5
FIX: Support unicode in search filter @username (#18804) 2022-11-16 10:42:37 +01:00
Daniel Waterworth
167181f4b7
DEV: Quote values when constructing SQL (#18827)
All of these cases should already be safe, but still good to quote for
"defense in depth".
2022-11-01 14:05:13 -05:00
Bianca Nenciu
9db8f00b3d
FEATURE: Create upload_references table (#16146)
This table holds associations between uploads and other models. This can be used to prevent removing uploads that are still in use.

* DEV: Create upload_references
* DEV: Use UploadReference instead of PostUpload
* DEV: Use UploadReference for SiteSetting
* DEV: Use UploadReference for Badge
* DEV: Use UploadReference for Category
* DEV: Use UploadReference for CustomEmoji
* DEV: Use UploadReference for Group
* DEV: Use UploadReference for ThemeField
* DEV: Use UploadReference for ThemeSetting
* DEV: Use UploadReference for User
* DEV: Use UploadReference for UserAvatar
* DEV: Use UploadReference for UserExport
* DEV: Use UploadReference for UserProfile
* DEV: Add method to extract uploads from raw text
* DEV: Use UploadReference for Draft
* DEV: Use UploadReference for ReviewableQueuedPost
* DEV: Use UploadReference for UserProfile's bio_raw
* DEV: Do not copy user uploads to upload references
* DEV: Copy post uploads again after deploy
* DEV: Use created_at and updated_at from uploads table
* FIX: Check if upload site setting is empty
* DEV: Copy user uploads to upload references
* DEV: Make upload extraction less strict
2022-06-09 09:24:30 +10:00
Penar Musaraj
8222810099
FIX: Limits for PM and group header search (#16887)
When searching for PMs or PMs in a group inbox, results in the header search were not being limited to 5 with a "More" link to the full page search. This PR fixes that.

It also simplifies the logic and updates the search API docs to include recently added `in:messages` and `group_messages:groupname` options.
2022-05-24 11:31:24 -04:00
Martin Brennan
fcc2e7ebbf
FEATURE: Promote polymorphic bookmarks to default and migrate (#16729)
This commit migrates all bookmarks to be polymorphic (using the
bookmarkable_id and bookmarkable_type) columns. It also deletes
all the old code guarded behind the use_polymorphic_bookmarks setting
and changes that setting to true for all sites and by default for
the sake of plugins.

No data is deleted in the migrations, the old post_id and for_topic
columns for bookmarks will be dropped later on.
2022-05-23 10:07:15 +10:00
Martin Brennan
955d47bbd0
FIX: Use polymorphic bookmarks for in:bookmarks search (#16684)
This commit makes sure the in:bookmarks post advanced
search filter works with polymorphic bookmarks.
2022-05-10 09:08:01 +10:00
Martin Brennan
222c8d9b6a
FEATURE: Polymorphic bookmarks pt. 3 (reminders, imports, exports, refactors) (#16591)
A bit of a mixed bag, this addresses several edge areas of bookmarks and makes them compatible with polymorphic bookmarks (hidden behind the `use_polymorphic_bookmarks` site setting). The main ones are:

* ExportUserArchive compatibility
* SyncTopicUserBookmarked job compatibility
* Sending different notifications for the bookmark reminders based on the bookmarkable type
* Import scripts compatibility
* BookmarkReminderNotificationHandler compatibility

This PR also refactors the `register_bookmarkable` API so it accepts a class descended from a `BaseBookmarkable` class instead. This was done because we kept having to add more and more lambdas/properties inline and it was very messy, so a factory pattern is cleaner. The classes can be tested independently as well.

Some later PRs will address some other areas like the discourse narrative bot, advanced search, reports, and the .ics endpoint for bookmarks.
2022-05-09 09:37:23 +10:00
Penar Musaraj
b266a36967
FEATURE: Add group_messages: keyword to advanced search (#16584) 2022-04-28 10:47:40 -04:00
Penar Musaraj
eebce8f80a
FEATURE: Add in:messages search modifier (#16567)
This adds `in:messages` as a synonym for `in:personal` and sets it up as our default nomenclature (`in:personal` will still work).
2022-04-26 16:47:01 -04:00
Bianca Nenciu
6eb3d658ca
FIX: Do not wrap unaccent around tsqueries (#16284)
tsqueries use quotes and having other characters that when unaccented
become quotes results in invalid tsqueries.
2022-03-25 19:10:05 +02:00
Bianca Nenciu
34b4b53bac
FEATURE: Use Postgres unaccent to ignore accents (#16100)
The search_ignore_accents site setting can be used to make the search
indexer remove the accents before indexing the content. The unaccent
function from PostgreSQL is better than Ruby's unicode_normalize(:nfkd).
2022-03-07 23:03:10 +02:00
Alan Guo Xiang Tan
930f51e175 FEATURE: Split up text segmentation for Chinese and Japanese.
* Chinese segmenetation will continue to rely on cppjieba
* Japanese segmentation will use our port of TinySegmenter
* Korean currently does not rely on segmentation which was dropped in c677877e4f
* SiteSetting.search_tokenize_chinese_japanese_korean has been split
into SiteSetting.search_tokenize_chinese and
SiteSetting.search_tokenize_japanese respectively
2022-02-07 09:21:14 +08:00
Alan Guo Xiang Tan
fff8b98485 SECURITY: Advanced group search did not respect visiblity of groups. 2022-01-10 13:49:26 +08:00
Penar Musaraj
d99deaf1ab
FEATURE: show recent searches in quick search panel (#15024) 2021-11-25 15:44:15 -05:00
Penar Musaraj
20f5474be9
FEATURE: Log only topic/post search queries in search log (#14994) 2021-11-18 09:21:12 +08:00
Alan Guo Xiang Tan
a03c48b720
FIX: Use the same mode for chinese search when indexing and querying. (#14780)
The `白名单` term becomes `名单 白名单` after it is processed by
cppjieba in :query mode. However, `白名单` is not tokenized as such by cppjieba when it
appears in a string of text. Therefore, this may lead to failed matches as
the search data generated while indexing may not contain all of the
terms generated by :query mode. We've decided to maintain parity for now
such that both indexing and querying uses the same :mix mode. This may
lead to less accurate search but our plan is to properly support CJK
search in the future.
2021-11-01 10:14:47 +08:00