discourse

mirror of https://github.com/discourse/discourse.git synced 2024-11-27 21:53:48 +08:00

Author	SHA1	Message	Date
Guo Xiang Tan	2607bb602e	Fix broken spec. Follow-up to `3c678df942`	2020-10-08 10:52:46 +08:00
Sam	3c678df942	PERF: avoid lookbehinds when indexing search (#10862). Previously we used `EmailCook.url_regexp`, a regex that relies on lookbehinds. Unfortunately, certain strings could lead to pathological behavior, causing CPU usage to skyrocket and the regex replace to take a very long time. EmailCook still needs a fix, but it is less urgent because it already splits input into single lines. That said, we will correct that as well in a separate PR. The new implementation is far more naive and relies on the extra spaces the search indexer inserts.	2020-10-08 11:40:13 +11:00
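A rough illustration of the lookbehind-free idea in `3c678df942`, assuming (as that message says) that URLs are already separated from the surrounding text by whitespace. The helper name `strip_url_queries` is hypothetical and this is not Discourse's actual indexer code; it only shows how a whitespace split plus a prefix check can replace a lookbehind-based URL regex when stripping query strings, as `2196d0b9ae` describes.

```ruby
# Hypothetical helper, NOT Discourse's implementation.
# Assumption: every URL is delimited by whitespace in the indexed text.
def strip_url_queries(text)
  text.split(/\s+/).map do |token|
    if token.start_with?("http://", "https://")
      # Keep only the part before the first "?", dropping the query string.
      token.split("?", 2).first
    else
      token
    end
  end.join(" ")
end

puts strip_url_queries("see https://www.discourse.org/latest?test=2&test2=3 for details")
```

Because each token is inspected once with a plain prefix test, the work grows linearly with input length and there is no backtracking to trigger.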
Arpit Jalan	f7940b1d20	FEATURE: advanced search option for max posts count (#10761 ) This commit adds an option to search for max posts count and updates the UI for posts count search to show a min/max range in single line.	2020-09-28 21:34:16 +05:30
Arpit Jalan	4498c59085	FEATURE: add alias for min_post_count search filter	2020-09-28 16:07:44 +05:30
Arpit Jalan	cdf45f4fe6	Update regex for views search filter.	2020-09-24 17:05:55 +05:30
Arpit Jalan	0c5cd0d1ef	FEATURE: advanced search filters for view count	2020-09-24 15:22:18 +05:30
Bianca Nenciu	4abbe3d361	FEATURE: Make search filters case insensitive (#10715 )	2020-09-23 11:59:42 +03:00
Krzysztof Kotlarek	cb58cbbc2c	FEATURE: allow extending topic_eager_loads in Search (#10625). This additional interface is required by the encrypt plugin.	2020-09-14 11:58:28 +10:00
Guo Xiang Tan	e6ca1b4326	FIX: Admin search for PMs should only search own PMs. In `c6ceda8c`, a bug was introduced where an admin searching for his own private messages will actually end up searching through all private messages on the site. Follow-up to `c6ceda8c4e`	2020-09-10 11:37:18 +08:00
Dan Ungureanu	38c9c87128	FIX: Add to tags result set only visible tags (#10580 )	2020-09-02 13:24:40 +03:00
Guo Xiang Tan	40c6d90df3	PERF: Create a partial regular post_search_data index on large sites. With the addition of `PostSearchData#private_message`, a partial index consisting of only search data from regular posts can be created. The partial index helps to speed up searches on large sites since PG will not have to do an index scan on the entire search data index, which has been shown to be a bottleneck.	2020-08-27 13:42:00 +08:00
siriwatknp	1a2800ad07	fix: 🐛 category & tag search regex to support thai character	2020-08-25 16:12:26 +08:00
Guo Xiang Tan	05174df5c0	FIX: Restrict `personal_messages:` advanced search filter to admin. The filter noops if an incorrect username is passed. This filter is not exposed as part of the UI but is only used when an admin transitions from a search within a user's personal messages to the full page search. Follow-up to `4b30799054`.	2020-08-24 13:53:48 +08:00
Guo Xiang Tan	c6ceda8c4e	PERF: Avoid extra subquery when searching within PMs for normal user. Note the following query being generated where the filter for a user's private messages is executed twice. ```sql SELECT "posts"."id", "posts"."user_id", "posts"."topic_id", "posts"."post_number", "posts"."raw", "posts"."cooked", "posts"."created_at", "posts"."updated_at", "posts"."reply_to_post_number", "posts"."reply_count", "posts"."quote_count", "posts"."deleted_at", "posts"."off_topic_count", "posts"."like_count", "posts"."incoming_link_count", "posts"."bookmark_count", "posts"."score", "posts"."reads", "posts"."post_type", "posts"."sort_order", "posts"."last_editor_id", "posts"."hidden", "posts"."hidden_reason_id", "posts"."notify_moderators_count", "posts"."spam_count", "posts"."illegal_count", "posts"."inappropriate_count", "posts"."last_version_at", "posts"."user_deleted", "posts"."reply_to_user_id", "posts"."percent_rank", "posts"."notify_user_count", "posts"."like_score", "posts"."deleted_by_id", "posts"."edit_reason", "posts"."word_count", "posts"."version", "posts"."cook_method", "posts"."wiki", "posts"."baked_at", "posts"."baked_version", "posts"."hidden_at", "posts"."self_edits", "posts"."reply_quoted", "posts"."via_email", "posts"."raw_email", "posts"."public_version", "posts"."action_code", "posts"."locked_by_id", "posts"."image_upload_id", (TS_RANK_CD( post_search_data.search_data, TO_TSQUERY('english', '''test'':*ABCD'), 0\|32 ) * ( CASE categories.search_priority WHEN 2 THEN 0.6 WHEN 3 THEN 0.8 WHEN 4 THEN 1.2 WHEN 5 THEN 1.4 ELSE CASE WHEN topics.closed THEN 0.9 ELSE 1 END END ) ) rank, topics.bumped_at topic_bumped_at FROM "posts" INNER JOIN "post_search_data" ON "post_search_data"."post_id" = "posts"."id" INNER JOIN "topics" ON "topics"."id" = "posts"."topic_id" AND ("topics"."deleted_at" IS NULL) LEFT JOIN categories ON categories.id = topics.category_id WHERE ("posts"."deleted_at" IS NULL) AND "posts"."post_type" IN (1, 2, 3) AND (topics.visible) AND 
(topics.archetype = 'private_message' AND post_search_data.private_message) AND (posts.topic_id IN (SELECT topic_id FROM topic_allowed_users WHERE user_id = 99999 UNION ALL SELECT tg.topic_id FROM topic_allowed_groups tg JOIN group_users gu ON gu.user_id = 99999 AND gu.group_id = tg.group_id )) AND (post_search_data.search_data @@ TO_TSQUERY('english', '''test'':*ABCD')) AND (posts.topic_id IN (SELECT topic_id FROM topic_allowed_users WHERE user_id = 99999 UNION ALL SELECT tg.topic_id FROM topic_allowed_groups tg JOIN group_users gu ON gu.user_id = 99999 AND gu.group_id = tg.group_id )) AND ((categories.id IS NULL) OR (NOT categories.read_restricted) OR (categories.id IN (999999))) ORDER BY rank DESC, topic_bumped_at DESC ```	2020-08-24 13:49:43 +08:00
Guo Xiang Tan	2f043dc89a	Fix lint.	2020-08-24 12:38:46 +08:00
Guo Xiang Tan	4b30799054	FIX: Correct `personal_messages:<username>` advanced search filter. Renamed from `private_messages` to `personal_messages` without deprecation because the `private_messages` advanced search filter never worked in the first place when it was implemented.	2020-08-24 11:54:30 +08:00
Guo Xiang Tan	106a2f58a2	DEV: Drop support for deprecated `in:private` search filter.	2020-08-21 17:18:39 +08:00
Guo Xiang Tan	0684118008	DEV: Remove array_agg from search orders that does not need it.	2020-08-21 14:39:07 +08:00
Guo Xiang Tan	92b7fe4c62	PERF: Add partial index for non-pm search.	2020-08-18 15:55:08 +08:00
Guo Xiang Tan	248bebb8cd	PERF: Remove extra subquery in search. I also noticed that removing the subquery helps the planner to plan better.	2020-08-17 13:52:12 +08:00
Guo Xiang Tan	93f8396b4b	FIX: Limit PG headline based search blurb generation to 200 characters. * Recovers omission characters '...' in blurb as well.	2020-08-12 15:34:27 +08:00
Guo Xiang Tan	053cbe3112	PERF: Limit characters used to generate headline for search blurb. We determined using the following benchmark script that limiting to 2500 chars would mean a maximum of 25ms spent generating headlines. ``` require 'benchmark/ips' string = <<~STRING Far far away, behind the word mountains... STRING def sql_excerpt(string, l = 1000000) DB.query_single(<<~SQL) SELECT TS_HEADLINE('english', left('#{string}', #{l}), PLAINTO_TSQUERY('mountains')) SQL end def ruby_excerpt(string) output = DB.query_single("SELECT '#{string}'")[0] Search::GroupedSearchResults::TextHelper.excerpt(output, 'mountains', radius: 100) end puts "Ruby Excerpt: #{ruby_excerpt(string)}" puts "SQL Excerpt: #{sql_excerpt(string)}" puts Benchmark.ips do \|x\| x.time = 10 [1000, 2500, 5000, 10000, 20000, 50000].each do \|l\| short_string = string[0..l] x.report("ts_headline excerpt #{l}") do sql_excerpt(short_string, l) end x.report("actionview excerpt #{l}") do ruby_excerpt(short_string) end end x.compare! end ``` ``` actionview excerpt 1000: 20570.7 i/s actionview excerpt 2500: 17863.1 i/s - 1.15x (± 0.00) slower actionview excerpt 5000: 14228.9 i/s - 1.45x (± 0.00) slower actionview excerpt 10000: 10906.2 i/s - 1.89x (± 0.00) slower actionview excerpt 20000: 6255.0 i/s - 3.29x (± 0.00) slower ts_headline excerpt 1000: 4337.5 i/s - 4.74x (± 0.00) slower actionview excerpt 50000: 3222.7 i/s - 6.38x (± 0.00) slower ts_headline excerpt 2500: 2240.4 i/s - 9.18x (± 0.00) slower ts_headline excerpt 5000: 1258.7 i/s - 16.34x (± 0.00) slower ts_headline excerpt 10000: 667.2 i/s - 30.83x (± 0.00) slower ts_headline excerpt 20000: 348.7 i/s - 58.98x (± 0.00) slower ts_headline excerpt 50000: 131.9 i/s - 155.91x (± 0.00) slower ```	2020-08-07 14:36:52 +08:00
Guo Xiang Tan	e60c74d3c1	FEATURE: Use PG `ts_headline` for highlighting topic title in search.	2020-08-07 12:43:09 +08:00
Krzysztof Kotlarek	12a00d6dc5	FEATURE: add advanced order to search (#10385 ) Similar to `advanced_filter` I introduced `advanced_order`. I needed a new option because default orders are evaluated after advanced_filter so I couldn't use it. Also, that part is a little bit more generic ``` elsif word =~ /order:\w+/ @order = word.gsub('order:', '').to_sym nil ``` After those changes, I can use them in plugins in this way: ``` Search.advanced_order(:votes) do \|posts\| posts.reorder("COALESCE((SELECT dvvc.counter FROM discourse_voting_vote_counters dvvc WHERE dvvc.topic_id = subquery.topic_id), 0) DESC") end ```	2020-08-07 12:47:00 +10:00
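The registration pattern described in `12a00d6dc5` can be sketched in plain Ruby. `MiniSearch` below is a hypothetical stand-in, not the real `Search` class, and the in-memory sort replaces the SQL `reorder` a plugin would actually use; only the `advanced_order(trigger) { ... }` shape and the `order:\w+` matching come from the commit message.

```ruby
# Hypothetical stand-in for Discourse's Search class (illustration only).
class MiniSearch
  @advanced_orders = {}

  class << self
    attr_reader :advanced_orders

    # Register a block that reorders results when `order:<trigger>` is used.
    def advanced_order(trigger, &block)
      @advanced_orders[trigger] = block
    end
  end

  attr_reader :order, :terms

  def initialize(query)
    @order = nil
    # Pull `order:xyz` words out of the query and remember the last one seen.
    @terms = query.split.reject do |word|
      next false unless word =~ /\Aorder:\w+\z/
      @order = word.sub("order:", "").to_sym
      true
    end
  end

  # Apply the registered handler, or leave results untouched if none matches.
  def apply_order(results)
    handler = self.class.advanced_orders[@order]
    handler ? handler.call(results) : results
  end
end

# A plugin-style registration; sorting an array stands in for posts.reorder(...).
MiniSearch.advanced_order(:votes) do |results|
  results.sort_by { |r| -r[:votes] }
end

search = MiniSearch.new("ruby order:votes")
p search.terms
p search.apply_order([{ id: 1, votes: 2 }, { id: 2, votes: 9 }]).map { |r| r[:id] }
```

Keeping the handlers in a hash keyed by symbol means an unknown `order:` value simply falls through to the default ordering.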
Guo Xiang Tan	ab2b6f8dea	FIX: Specify config when generating tsquery using `ts_headline`.	2020-08-07 10:21:14 +08:00
Guo Xiang Tan	2193d02433	PERF: Use PG headlines for blurb generation and highlighting for search.	2020-08-06 14:56:29 +08:00
Guo Xiang Tan	3b08b15855	PERF: Remove one extra call to Redis when searching.	2020-08-04 14:02:02 +08:00
Guo Xiang Tan	597d542c33	FIX: Improve `Topic.similar_to` with better `Topic#title` matches. This changes PG text search to only match the given title against lexemes that are formed from the title. Likewise, the given raw will only be matched against lexemes that are formed from the post's raw.	2020-07-28 12:00:27 +08:00
Guo Xiang Tan	181c4eb760	PERF: Avoid parsing `Post#cooked` with Nokogiri for every search.	2020-07-24 10:43:09 +08:00
Guo Xiang Tan	af87911178	FIX: `in:title` search should only search through topic first posts.	2020-07-16 12:21:19 +08:00
Guo Xiang Tan	5bf0a0893b	FIX: Search by relevance may return incorrect post number. Follow up to `d8c796bc4`. Note that this change increases query time by around 40% in the following benchmark against `dev.discourse.org` but this is a tradeoff that has to be taken so that relevance search is accurate. ``` require 'benchmark/ips' Benchmark.ips do \|x\| x.config(time: 10, warmup: 2) x.report("current aggregate search query") do DB.exec <<~SQL SELECT "posts"."id", "posts"."user_id", "posts"."topic_id", "posts"."post_number", "posts"."raw", "posts"."cooked", "posts"."created_at", "posts"."updated_at", "posts"."reply_to_post_number", "posts"."reply_count", "posts"."quote_count", "posts"."deleted_at", "posts"."off_topic_count", "posts"."like_count", "posts"."incoming_link_count", "posts"."bookmark_count", "posts"."score", "posts"."reads", "posts"."post_type", "posts"."sort_order", "posts"."last_editor_id", "posts"."hidden", "posts"."hidden_reason_id", "posts"."notify_moderators_count", "posts"."spam_count", "posts"."illegal_count", "posts"."inappropriate_count", "posts"."last_version_at", "posts"."user_deleted", "posts"."reply_to_user_id", "posts"."percent_rank", "posts"."notify_user_count", "posts"."like_score", "posts"."deleted_by_id", "posts"."edit_reason", "posts"."word_count", "posts"."version", "posts"."cook_method", "posts"."wiki", "posts"."baked_at", "posts"."baked_version", "posts"."hidden_at", "posts"."self_edits", "posts"."reply_quoted", "posts"."via_email", "posts"."raw_email", "posts"."public_version", "posts"."action_code", "posts"."locked_by_id", "posts"."image_upload_id" FROM "posts" JOIN (SELECT *, row_number() over() row_number FROM (SELECT topics.id, min(posts.post_number) post_number FROM "posts" INNER JOIN "post_search_data" ON "post_search_data"."post_id" = "posts"."id" INNER JOIN "topics" ON "topics"."id" = "posts"."topic_id" AND ("topics"."deleted_at" IS NULL) LEFT JOIN categories ON categories.id = topics.category_id WHERE ("posts"."deleted_at" IS 
NULL) AND "posts"."post_type" IN (1, 2, 3, 4) AND (topics.visible) AND (topics.archetype <> 'private_message') AND (post_search_data.search_data @@ TO_TSQUERY('english', '''postgres'':*ABCD')) AND (categories.id NOT IN ( SELECT categories.id WHERE categories.search_priority = 1 ) ) AND ((categories.id IS NULL) OR (NOT categories.read_restricted)) GROUP BY topics.id ORDER BY MAX(( TS_RANK_CD( post_search_data.search_data, TO_TSQUERY('english', '''postgres'':*ABCD'), 1\|32 ) * ( CASE categories.search_priority WHEN 2 THEN 0.6 WHEN 3 THEN 0.8 WHEN 4 THEN 1.2 WHEN 5 THEN 1.4 ELSE CASE WHEN topics.closed THEN 0.9 ELSE 1 END END ) ) ) DESC, topics.bumped_at DESC LIMIT 51 OFFSET 0) xxx) x ON x.id = posts.topic_id AND x.post_number = posts.post_number WHERE ("posts"."deleted_at" IS NULL) ORDER BY row_number; SQL end x.report("current aggregate search query with proper ranking") do DB.exec <<~SQL SELECT "posts"."id", "posts"."user_id", "posts"."topic_id", "posts"."post_number", "posts"."raw", "posts"."cooked", "posts"."created_at", "posts"."updated_at", "posts"."reply_to_post_number", "posts"."reply_count", "posts"."quote_count", "posts"."deleted_at", "posts"."off_topic_count", "posts"."like_count", "posts"."incoming_link_count", "posts"."bookmark_count", "posts"."score", "posts"."reads", "posts"."post_type", "posts"."sort_order", "posts"."last_editor_id", "posts"."hidden", "posts"."hidden_reason_id", "posts"."notify_moderators_count", "posts"."spam_count", "posts"."illegal_count", "posts"."inappropriate_count", "posts"."last_version_at", "posts"."user_deleted", "posts"."reply_to_user_id", "posts"."percent_rank", "posts"."notify_user_count", "posts"."like_score", "posts"."deleted_by_id", "posts"."edit_reason", "posts"."word_count", "posts"."version", "posts"."cook_method", "posts"."wiki", "posts"."baked_at", "posts"."baked_version", "posts"."hidden_at", "posts"."self_edits", "posts"."reply_quoted", "posts"."via_email", "posts"."raw_email", "posts"."public_version", 
"posts"."action_code", "posts"."locked_by_id", "posts"."image_upload_id" FROM "posts" JOIN (SELECT *, row_number() over() row_number FROM (SELECT subquery.topic_id id, (ARRAY_AGG(subquery.post_number ORDER BY rank DESC, bumped_at DESC))[1] post_number, MAX(subquery.rank) rank, MAX(subquery.bumped_at) bumped_at FROM (SELECT "posts"."id", "posts"."user_id", "posts"."topic_id", "posts"."post_number", "posts"."raw", "posts"."cooked", "posts"."created_at", "posts"."updated_at", "posts"."reply_to_post_number", "posts"."reply_count", "posts"."quote_count", "posts"."deleted_at", "posts"."off_topic_count", "posts"."like_count", "posts"."incoming_link_count", "posts"."bookmark_count", "posts"."score", "posts"."reads", "posts"."post_type", "posts"."sort_order", "posts"."last_editor_id", "posts"."hidden", "posts"."hidden_reason_id", "posts"."notify_moderators_count", "posts"."spam_count", "posts"."illegal_count", "posts"."inappropriate_count", "posts"."last_version_at", "posts"."user_deleted", "posts"."reply_to_user_id", "posts"."percent_rank", "posts"."notify_user_count", "posts"."like_score", "posts"."deleted_by_id", "posts"."edit_reason", "posts"."word_count", "posts"."version", "posts"."cook_method", "posts"."wiki", "posts"."baked_at", "posts"."baked_version", "posts"."hidden_at", "posts"."self_edits", "posts"."reply_quoted", "posts"."via_email", "posts"."raw_email", "posts"."public_version", "posts"."action_code", "posts"."locked_by_id", "posts"."image_upload_id", ( TS_RANK_CD( post_search_data.search_data, TO_TSQUERY('english', '''postgres'':*ABCD'), 1\|32 ) * ( CASE categories.search_priority WHEN 2 THEN 0.6 WHEN 3 THEN 0.8 WHEN 4 THEN 1.2 WHEN 5 THEN 1.4 ELSE CASE WHEN topics.closed THEN 0.9 ELSE 1 END END ) ) rank, topics.bumped_at bumped_at FROM "posts" INNER JOIN "post_search_data" ON "post_search_data"."post_id" = "posts"."id" INNER JOIN "topics" ON "topics"."id" = "posts"."topic_id" AND ("topics"."deleted_at" IS NULL) LEFT JOIN categories ON categories.id = 
topics.category_id WHERE ("posts"."deleted_at" IS NULL) AND "posts"."post_type" IN (1, 2, 3, 4) AND (topics.visible) AND (topics.archetype <> 'private_message') AND (post_search_data.search_data @@ TO_TSQUERY('english', '''postgres'':*ABCD')) AND (categories.id NOT IN ( SELECT categories.id WHERE categories.search_priority = 1 ) ) AND ((categories.id IS NULL) OR (NOT categories.read_restricted))) subquery GROUP BY subquery.topic_id ORDER BY rank DESC, bumped_at DESC LIMIT 51 OFFSET 0) xxx) x ON x.id = posts.topic_id AND x.post_number = posts.post_number WHERE ("posts"."deleted_at" IS NULL) ORDER BY row_number; SQL end x.compare! end ``` ``` Warming up -------------------------------------- current aggregate search query 1.000 i/100ms current aggregate search query with proper ranking 1.000 i/100ms Calculating ------------------------------------- current aggregate search query 18.040 (± 0.0%) i/s - 181.000 in 10.035241s current aggregate search query with proper ranking 12.992 (± 0.0%) i/s - 130.000 in 10.007214s Comparison: current aggregate search query: 18.0 i/s current aggregate search query with proper ranking: 13.0 i/s - 1.39x (± 0.00) slower ```	2020-07-15 11:45:56 +08:00
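The aggregation that `5bf0a0893b` fixes can be mirrored in a few lines of Ruby to show why ranking has to drive the choice of post. This is only an in-memory analogy under assumed data: the `Hit` struct and both helper names are hypothetical, and the real work happens in the SQL shown in the commit message.

```ruby
# Hypothetical in-memory analogy of the per-topic aggregation (illustration only).
Hit = Struct.new(:topic_id, :post_number, :rank, :bumped_at)

# Old behaviour: MIN(post_number) per topic, which ignores rank entirely.
def first_post_per_topic(hits)
  hits.group_by(&:topic_id).map do |topic_id, group|
    { topic_id: topic_id, post_number: group.map(&:post_number).min }
  end
end

# Fixed behaviour, in the spirit of
# (ARRAY_AGG(post_number ORDER BY rank DESC, bumped_at DESC))[1].
def best_post_per_topic(hits)
  rows = hits.group_by(&:topic_id).map do |topic_id, group|
    best = group.max_by { |h| [h.rank, h.bumped_at] }
    { topic_id: topic_id, post_number: best.post_number, rank: best.rank }
  end
  rows.sort_by { |row| -row[:rank] }
end

hits = [
  Hit.new(1, 1, 0.2, 100), # first post of topic 1, weak match
  Hit.new(1, 5, 0.9, 100), # later reply in topic 1, strong match
  Hit.new(2, 3, 0.5, 50)
]

p first_post_per_topic(hits)
p best_post_per_topic(hits)
```

With the sample data, the rank-aware version surfaces post 5 of topic 1, whereas taking the minimum post number would link to post 1 even though it matched poorly.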
Guo Xiang Tan	2196d0b9ae	FIX: Strip query from URLs when indexing for search. Indexing query strings in URLS produces inconsistent results in PG and pollutes the search data for really little gain. The following seems to work as expected... ``` discourse_development=# SELECT TO_TSVECTOR('https://www.discourse.org?test=2&test2=3'); to_tsvector ------------------------------------------------------ '2':3 '3':5 'test':2 'test2':4 'www.discourse.org':1 ``` However, once a path is present ``` discourse_development=# SELECT TO_TSVECTOR('https://www.discourse.org/latest?test=2&test2=3'); to_tsvector ---------------------------------------------------------------------------------------------- '/latest?test=2&test2=3':3 'www.discourse.org':2 'www.discourse.org/latest?test=2&test2=3':1 ``` The lexeme contains both the path and the query string.	2020-07-14 15:32:40 +08:00
Guo Xiang Tan	5c31216aea	FIX: Search for whole URLs wasn't working.	2020-07-14 15:31:48 +08:00
Guo Xiang Tan	d8c796bc44	FIX: Ensure that aggregating search shows the post with the highest rank. Previously, we would only take either the `MIN` or `MAX` for `post_number` during aggregation meaning that the ranking is not considered. ``` require 'benchmark/ips' Benchmark.ips do \|x\| x.config(time: 10, warmup: 2) x.report("current aggregate search query") do DB.exec <<~SQL SELECT "posts"."id", "posts"."user_id", "posts"."topic_id", "posts"."post_number", "posts"."raw", "posts"."cooked", "posts"."created_at", "posts"."updated_at", "posts"."reply_to_post_number", "posts"."reply_count", "posts"."quote_count", "posts"."deleted_at", "posts"."off_topic_count", "posts"."like_count", "posts"."incoming_link_count", "posts"."bookmark_count", "posts"."score", "posts"."reads", "posts"."post_type", "posts"."sort_order", "posts"."last_editor_id", "posts"."hidden", "posts"."hidden_reason_id", "posts"."notify_moderators_count", "posts"."spam_count", "posts"."illegal_count", "posts"."inappropriate_count", "posts"."last_version_at", "posts"."user_deleted", "posts"."reply_to_user_id", "posts"."percent_rank", "posts"."notify_user_count", "posts"."like_score", "posts"."deleted_by_id", "posts"."edit_reason", "posts"."word_count", "posts"."version", "posts"."cook_method", "posts"."wiki", "posts"."baked_at", "posts"."baked_version", "posts"."hidden_at", "posts"."self_edits", "posts"."reply_quoted", "posts"."via_email", "posts"."raw_email", "posts"."public_version", "posts"."action_code", "posts"."locked_by_id", "posts"."image_upload_id" FROM "posts" JOIN (SELECT *, row_number() over() row_number FROM (SELECT topics.id, min(posts.post_number) post_number FROM "posts" INNER JOIN "post_search_data" ON "post_search_data"."post_id" = "posts"."id" INNER JOIN "topics" ON "topics"."id" = "posts"."topic_id" AND ("topics"."deleted_at" IS NULL) LEFT JOIN categories ON categories.id = topics.category_id WHERE ("posts"."deleted_at" IS NULL) AND "posts"."post_type" IN (1, 2, 3, 4) AND (topics.visible) 
AND (topics.archetype <> 'private_message') AND (post_search_data.search_data @@ TO_TSQUERY('english', '''postgres'':*ABCD')) AND (categories.id NOT IN ( SELECT categories.id WHERE categories.search_priority = 1 ) ) AND ((categories.id IS NULL) OR (NOT categories.read_restricted)) GROUP BY topics.id ORDER BY MAX(( TS_RANK_CD( post_search_data.search_data, TO_TSQUERY('english', '''postgres'':*ABCD'), 1\|32 ) * ( CASE categories.search_priority WHEN 2 THEN 0.6 WHEN 3 THEN 0.8 WHEN 4 THEN 1.2 WHEN 5 THEN 1.4 ELSE CASE WHEN topics.closed THEN 0.9 ELSE 1 END END ) ) ) DESC, topics.bumped_at DESC LIMIT 51 OFFSET 0) xxx) x ON x.id = posts.topic_id AND x.post_number = posts.post_number WHERE ("posts"."deleted_at" IS NULL) ORDER BY row_number; SQL end x.report("current aggregate search query with proper ranking") do DB.exec <<~SQL SELECT "posts"."id", "posts"."user_id", "posts"."topic_id", "posts"."post_number", "posts"."raw", "posts"."cooked", "posts"."created_at", "posts"."updated_at", "posts"."reply_to_post_number", "posts"."reply_count", "posts"."quote_count", "posts"."deleted_at", "posts"."off_topic_count", "posts"."like_count", "posts"."incoming_link_count", "posts"."bookmark_count", "posts"."score", "posts"."reads", "posts"."post_type", "posts"."sort_order", "posts"."last_editor_id", "posts"."hidden", "posts"."hidden_reason_id", "posts"."notify_moderators_count", "posts"."spam_count", "posts"."illegal_count", "posts"."inappropriate_count", "posts"."last_version_at", "posts"."user_deleted", "posts"."reply_to_user_id", "posts"."percent_rank", "posts"."notify_user_count", "posts"."like_score", "posts"."deleted_by_id", "posts"."edit_reason", "posts"."word_count", "posts"."version", "posts"."cook_method", "posts"."wiki", "posts"."baked_at", "posts"."baked_version", "posts"."hidden_at", "posts"."self_edits", "posts"."reply_quoted", "posts"."via_email", "posts"."raw_email", "posts"."public_version", "posts"."action_code", "posts"."locked_by_id", "posts"."image_upload_id" FROM 
"posts" JOIN (SELECT *, row_number() over() row_number FROM (SELECT subquery.topic_id id, (ARRAY_AGG(subquery.post_number))[1] post_number, MAX(subquery.rank) rank, MAX(subquery.bumped_at) bumped_at FROM (SELECT "posts"."id", "posts"."user_id", "posts"."topic_id", "posts"."post_number", "posts"."raw", "posts"."cooked", "posts"."created_at", "posts"."updated_at", "posts"."reply_to_post_number", "posts"."reply_count", "posts"."quote_count", "posts"."deleted_at", "posts"."off_topic_count", "posts"."like_count", "posts"."incoming_link_count", "posts"."bookmark_count", "posts"."score", "posts"."reads", "posts"."post_type", "posts"."sort_order", "posts"."last_editor_id", "posts"."hidden", "posts"."hidden_reason_id", "posts"."notify_moderators_count", "posts"."spam_count", "posts"."illegal_count", "posts"."inappropriate_count", "posts"."last_version_at", "posts"."user_deleted", "posts"."reply_to_user_id", "posts"."percent_rank", "posts"."notify_user_count", "posts"."like_score", "posts"."deleted_by_id", "posts"."edit_reason", "posts"."word_count", "posts"."version", "posts"."cook_method", "posts"."wiki", "posts"."baked_at", "posts"."baked_version", "posts"."hidden_at", "posts"."self_edits", "posts"."reply_quoted", "posts"."via_email", "posts"."raw_email", "posts"."public_version", "posts"."action_code", "posts"."locked_by_id", "posts"."image_upload_id", ( TS_RANK_CD( post_search_data.search_data, TO_TSQUERY('english', '''postgres'':*ABCD'), 1\|32 ) * ( CASE categories.search_priority WHEN 2 THEN 0.6 WHEN 3 THEN 0.8 WHEN 4 THEN 1.2 WHEN 5 THEN 1.4 ELSE CASE WHEN topics.closed THEN 0.9 ELSE 1 END END ) ) rank, topics.bumped_at bumped_at FROM "posts" INNER JOIN "post_search_data" ON "post_search_data"."post_id" = "posts"."id" INNER JOIN "topics" ON "topics"."id" = "posts"."topic_id" AND ("topics"."deleted_at" IS NULL) LEFT JOIN categories ON categories.id = topics.category_id WHERE ("posts"."deleted_at" IS NULL) AND "posts"."post_type" IN (1, 2, 3, 4) AND (topics.visible) AND 
(topics.archetype <> 'private_message') AND (post_search_data.search_data @@ TO_TSQUERY('english', '''postgres'':*ABCD')) AND (categories.id NOT IN ( SELECT categories.id WHERE categories.search_priority = 1 ) ) AND ((categories.id IS NULL) OR (NOT categories.read_restricted))) subquery GROUP BY subquery.topic_id ORDER BY rank DESC, bumped_at DESC LIMIT 51 OFFSET 0) xxx) x ON x.id = posts.topic_id AND x.post_number = posts.post_number WHERE ("posts"."deleted_at" IS NULL) ORDER BY row_number; SQL end x.compare! end ``` ``` Warming up -------------------------------------- current aggregate search query 1.000 i/100ms current aggregate search query with proper ranking 1.000 i/100ms Calculating ------------------------------------- current aggregate search query 17.726 (± 0.0%) i/s - 178.000 in 10.045107s current aggregate search query with proper ranking 17.802 (± 0.0%) i/s - 178.000 in 10.002230s Comparison: current aggregate search query with proper ranking: 17.8 i/s current aggregate search query: 17.7 i/s - 1.00x (± 0.00) slower ```	2020-07-14 13:39:13 +08:00
Guo Xiang Tan	ce39733b1a	FIX: Incorrect search blurb when advanced search filters are used, take 2. Also remove the include_blurbs attribute, which isn't used.	2020-07-14 11:50:40 +08:00
David Taylor	cb1f891392	Revert "FIX: Incorrect search blurb when advanced search filters are used." This change was causing advanced search filters to disappear from the search input. This reverts commit `2e1eafae06`.	2020-07-09 16:19:18 +01:00
Guo Xiang Tan	2e1eafae06	FIX: Incorrect search blurb when advanced search filters are used.	2020-07-08 11:59:49 +08:00
Guo Xiang Tan	6bab2acc9f	Fix typo. Follow up to `af52df2d`	2020-07-02 14:23:10 +08:00
Guo Xiang Tan	af52df2d96	DEV: Add hidden site setting for PG search ranking normalization.	2020-07-02 14:11:18 +08:00
Régis Hanol	860deeb072	FIX: identify slug-less topic urls everywhere. In `91c89df6`, I fixed the onebox to support local topics with a slug-less URL. This commit fixes all the other spots (search, topic links and user badges) where we look up a local topic. Follow-up-to: `91c89df6`	2020-06-29 12:31:20 +02:00
Sam Saffron	3cb41d5429	PERF: stop adding more topics to search when not needed. The logic of adding additional search results does not seem to be needed anymore. It appears to be a relic of an old implementation. This saves an entire search query for every search made.	2020-06-25 12:31:12 +10:00
Vinoth Kannan	ce1491e830	UX: remove `in:unpinned` filter from advanced search page. (#9911 )	2020-05-29 00:47:28 +05:30
Sam Saffron	862773ec83	FIX: do not remove stop words when using English locale. PG already handles English stop words. The list in cppjieba is bigger than the list PG uses, which in turn causes confusion because words such as "volume" are stripped when using the cppjieba stop word list. We will follow up with another commit to apply the Chinese stop words, but for now, to eliminate the confusion, we skip applying the stop word list when the PG dictionary is English.	2020-05-18 10:54:56 +10:00
David Taylor	03818e642a	FEATURE: Include optimized thumbnails for topics (#9215 ) This introduces new APIs for obtaining optimized thumbnails for topics. There are a few building blocks required for this: - Introduces new `image_upload_id` columns on the `posts` and `topics` table. This replaces the old `image_url` column, which means that thumbnails are now restricted to uploads. Hotlinked thumbnails are no longer possible. In normal use (with pull_hotlinked_images enabled), this has no noticeable impact - A migration attempts to match existing urls to upload records. If a match cannot be found then the posts will be queued for rebake - Optimized thumbnails are generated during post_process_cooked. If thumbnails are missing when serializing a topic list, then a sidekiq job is queued - Topic lists and topics now include a `thumbnails` key, which includes all the available images: ``` "thumbnails": [ { "max_width": null, "max_height": null, "url": "//example.com/original-image.png", "width": 1380, "height": 1840 }, { "max_width": 1024, "max_height": 1024, "url": "//example.com/optimized-image.png", "width": 768, "height": 1024 } ] ``` - Themes can request additional thumbnail sizes by using a modifier in their `about.json` file: ``` "modifiers": { "topic_thumbnail_sizes": [ [200, 200], [800, 800] ], ... ``` Remember that these are generated asynchronously, so your theme should include logic to fallback to other available thumbnails if your requested size has not yet been generated - Two new raw plugin outlets are introduced, to improve the customisability of the topic list. `topic-list-before-columns` and `topic-list-before-link`	2020-05-05 09:07:50 +01:00
Benno	6e01acb3cb	FIX: Apply category priority for empty query (#9516 )	2020-04-27 10:35:27 -04:00
Martin Brennan	628ba9d1e2	FEATURE: Promote bookmarks with reminders to core functionality (#9369 ) The main thrust of this PR is to take all the conditional checks based on the `enable_bookmarks_with_reminders` away and only keep the code from the `true` path, making bookmarks with reminders the core bookmarks feature. There is also a migration to create `Bookmark` records out of `PostAction` bookmarks for a site. ### Summary * Remove logic based on whether enable_bookmarks_with_reminders is true. This site setting is now obsolete, the old bookmark functionality is being removed. Retain the setting and set the value to `true` in a migration. * Use the code from the rake task to create a database migration that creates bookmarks from post actions. * Change the bookmark report to read from the new table. * Get rid of old endpoints for bookmarks * Link to the new bookmarks list from the user summary page	2020-04-22 13:44:19 +10:00
Martin Brennan	51672b9121	FIX: Minor bookmark with reminder issue cleanup (#9436 ) * Count user summary bookmarks from new Bookmark table if bookmarks with reminders enabled * Update topic user bookmarked column when new topic bookmark changed * Make in:bookmarks search work with new bookmarks * Fix batch inserts for bookmark rake task (and thus migration). We were only inserting one bookmark at a time, completely defeating the purpose of batching!	2020-04-16 11:32:21 +10:00
Sam Saffron	10b37e1e36	FIX: add support for sub-sub category slugs in search. Previously, slugs for leaves in 3-level nestings would not work. Our UX picks only the last two levels. This also makes the results consistent for slugs, as it enforces order.	2020-03-20 15:36:50 +11:00
David Taylor	5b3630dba3	FIX: Do not raise an error when in:all search is performed by anon (#9113). Also improve in:all specs to catch similar failures.	2020-03-05 17:50:29 +00:00
David Taylor	c344f43211	UX: Admins should only see their own PMs when searching in:all. Admins are technically allowed to access all PMs, but it can be confusing to include them all in search. Follow-up to `e0605029dc`	2020-01-28 11:26:42 +00:00
adam j hartz	e0605029dc	FEATURE: allow searching public topics and personal messages simultaneously (#8784 ) The new search modifier `in:all` can be used to include both public and personal messages in the same search. Co-authored-by: adam j hartz <hz@mit.edu>	2020-01-28 10:11:33 +00:00
Mark VanLandingham	c5eec19368	FIX: Featuring topic on other users profile shows their topics (#8769 )	2020-01-22 14:16:17 -06:00
Mark VanLandingham	8c4ffaea1b	FEATURE: Modal for profile featured topic & admin wrench refactor (#8545 )	2019-12-16 08:41:34 -08:00
Sam Saffron	0fb497eb23	DEV: use Discourse.cache over Rails.cache. Discourse.cache is a more consistent method to use and offers a clean fallback if you are skipping redis. This is part of a larger change that both optimizes Discourse.cache and omits use of setex on $redis in favor of consistently using Discourse.cache. Bench does reveal that use of Rails.cache and Discourse.cache is 1.25x slower than redis.setex / get, so a re-implementation will follow prior to porting.	2019-11-27 12:36:19 +11:00
Martin Brennan	e7226a8c84	FEATURE: Allow scoping search to tag (#8345 ) * When viewing a tag, the search widget will now show a checkbox to scope the search by tag, which will limit search results to that tag on desktop and mobile	2019-11-14 10:40:26 +10:00
Daniel Waterworth	55a1394342	DEV: pluck_first Doing .pluck(:column).first is a very common pattern in Discourse and in most cases, a limit clause isn't being added. Instead of adding a limit clause to all these callsites, this commit adds two new methods to ActiveRecord::Relation: pluck_first, equivalent to limit(1).pluck(*columns).first and pluck_first! which, like other finder methods, raises an exception when no record is found	2019-10-21 12:08:20 +01:00
Krzysztof Kotlarek	427d54b2b0	DEV: Upgrading Discourse to Zeitwerk (#8098 ) Zeitwerk simplifies working with dependencies in dev and makes it easier reloading class chains. We no longer need to use Rails "require_dependency" anywhere and instead can just use standard Ruby patterns to require files. This is a far reaching change and we expect some followups here.	2019-10-02 14:01:53 +10:00
OsamaSayegh	f364317625	PERF: Improve query speed when looking up direct PMs Follow up to `5fc5a7f5ae`	2019-07-23 03:52:52 +00:00
Osama Sayegh	5fc5a7f5ae	FEATURE: Add search operator to see all direct messages from a user (#7913 ) * FEATURE: Add search operator to see all direct messages from a user * Only show message if related messages >= 5 * Make "all messages" the hyperlink * Review	2019-07-22 10:55:49 -04:00
Blake Erickson	c76732722a	FIX: Turn off search logging when read-only (#7877 ) If `SiteSetting.log_search_queries` is enabled 500 errors will occur when searching if the master db is down. This fix allows searching to still work under these conditions.	2019-07-10 17:05:31 -07:00
Josh Moore	6c5689984f	FEATURE: in:tagged search (srv side) (#7822 ) * FEATURE: in:tagged and in:untagged advanced search filters Similar to in:solved or in:unsolved, the filters check for an existence of the topic_id in the topic_tags table. see: https://meta.discourse.org/t/how-to-search-filter-untagged-topics/119641/2	2019-06-28 18:19:57 +10:00
Sam Saffron	8f7a387aa7	FEATURE: add support for tag group search The behaviour of #TERM in search has been amended 1. We try category or subcategory slugs 2. We try tags 3. We try tag-groups The term `hello #my-group` will search for all posts tagged with any of the tags in the tag group `My Group` Future work may be introducing a slug cache here or caching it in the table but the assumption is that the number of tag groups will not be huge	2019-06-27 17:53:26 +10:00
Penar Musaraj	f51f37eddf	FEATURE: apply a small penalty to closed topics when searching (#7782 )	2019-06-21 12:03:45 +10:00
Dan Ungureanu	6bd082feab	FIX: Update mapping between locales and Postgres dictionaries. (#7606 )	2019-05-27 16:52:09 +03:00
Sam Saffron	30990006a9	DEV: enable frozen string literal on all files This reduces chances of errors where consumers of strings mutate inputs and reduces memory usage of the app. Test suite passes now, but there may be some stuff left, so we will run a few sites on a branch prior to merging	2019-05-13 09:31:32 +08:00
Sam Saffron	e2bcf55077	DEV: move send => public_send in lib folder This handles most of the cases in `lib` where we were using send instead of public_send	2019-05-07 12:25:44 +10:00
Vinoth Kannan	7869a10d18	Revert "FEATURE: Added unlisted topics option to advanced search (#7447 )" This reverts commit `539723f8ff` since it is failing the build.	2019-05-01 21:06:20 +05:30
Tim Lange	539723f8ff	FEATURE: Added unlisted topics option to advanced search (#7447 )	2019-05-01 12:31:13 +10:00
Guo Xiang Tan	e8a4d72281	FIX: Avoid penalizing long documents too much in search. This is a follow up to `e87ca59401`.	2019-04-03 14:09:57 +08:00
Guo Xiang Tan	4b0ac91bfb	DEV: Remove duplicated scope. `.joins(:topic)` will automatically add the `deleted_at IS NULL` scope.	2019-04-02 10:48:17 +08:00
Guo Xiang Tan	d8704c11ca	PERF: Better use of index when queueing a topic for search reindex. Also move `Search::INDEX_VERSION` to `SearchIndexer` which is where the version is actually being used.	2019-04-02 09:53:37 +08:00
Guo Xiang Tan	e87ca59401	FIX: Relevance search will now consider document length in ranking. The default ranking options ranks by the number of matches which is highly problematic when posts are stuffed with a keyword. The ranking will now be divided by the document length which is a much fairer way to rank.	2019-04-01 14:37:45 +08:00
Guo Xiang Tan	dae0bb4c67	FIX: Post blurb incorrect when search contains a phrase match. If the blurb generated is not around the search term, we will not be able to highlight it on the client side.	2019-03-26 17:01:52 +08:00
Guo Xiang Tan	ac661e856a	FEATURE: Allow categories to be prioritized/deprioritized in search. (#7209 )	2019-03-25 10:59:55 +08:00
Guo Xiang Tan	54d3648c55	FIX: Use same weights for calculating rank and searching for posts. * Reduce an extra db query as well when searching for posts ordered by relevance.	2019-03-20 15:36:31 +08:00
Guo Xiang Tan	64f20e7e7a	FIX: Don't ignore category in search when using category filters.	2019-03-19 11:23:14 +08:00
Guo Xiang Tan	5e410dc5e0	FEATURE: Ability to exclude category from search results. (#7194 ) This commit also adds `Category#search_priority` which sets the ground work to enable prioritizing of posts for certain categories when searching.	2019-03-18 15:25:45 +08:00
Vinoth Kannan	daf5a268a7	DEV: Option to preload topic custom fields in Search class	2019-03-17 23:16:09 +05:30
Guo Xiang Tan	684eef71c7	REFACTOR: Better variable name.	2019-03-13 15:23:01 +08:00
Guo Xiang Tan	da941840d4	FIX: Advanced search category term should be case insensitive.	2019-03-12 14:11:21 +08:00
Joffrey JAFFEUX	3acf8a95f3	UX: various tweaks to search-menu (#7114 )	2019-03-08 09:23:44 +01:00
Joffrey JAFFEUX	dc4001370c	FEATURE: displays groups in menu search (#7090 )	2019-03-04 10:30:09 +01:00
Sam	0a357299b7	FEATURE: add `f` and `t` search shortcuts for first post / title Previously we had `in:title` and `in:first` search shortcuts for searching in first post or title only. They are a bit of a handful to type. This adds 2 shortcuts (t and f) for searching titles or first posts. This commit also cleans up all advanced filters, they were not properly regex terminated allowing for weird clauses like `in:firstinator` acting the same as `in:first`	2019-02-25 10:55:24 +11:00
Bianca Nenciu	4f3ee86bbd	FIX: in:title should work irrespective of the order. (#6968 )	2019-02-05 10:54:52 +01:00
Arpit Jalan	a121d40771	FIX: do not show PM topics when moving posts to an existing public topic (#6876 )	2019-01-14 15:00:45 +05:30
Gerhard Schlager	bf27aecce2	REFACTOR: compact! works since the array can't contain empty strings	2018-11-22 13:27:34 +01:00
Gerhard Schlager	c376670bd2	FIX: a search term containing '& could lead to errors This also makes sure that the search term in front or after special characters isn't ignored.	2018-11-21 22:07:56 +01:00
Sam	06b9d8223a	FIX: search within topic not working correctly in CJK We were splitting the term prior to search causing everything to miss	2018-11-07 09:41:55 +11:00
Daniel Hollas	cee51672c9	FIX: Strip accents from search query `4481836` introduced accent stripping in search_indexer, but we need to strip it from the query itself as well TODO in search with diacritics: - Still need to fix excerpts on search page - need to support accent stripping in in_topic search - need to make sure that in:title works correctly - need to fix "word boldening" in titles	2018-10-23 12:10:33 +11:00
David Taylor	9bf522f227	FEATURE: Mixed case tagging (#6454 ) - By default, behaviour is not changed: tags are made lowercase upon creation and edit. - If force_lowercase_tags is disabled, then mixed case tags are allowed. - Tags must remain case-insensitively unique. This is enforced by ActiveRecord and Postgres. - A migration is added to provide a `UNIQUE` index on `lower(name)`. Migration includes a safety to correct any current tags that do not meet the criteria. - A `where_name` scope is added to `models/tag.rb`, to allow easy case-insensitive lookups. This is used instead of `Tag.where(name: "blah")`. - URLs remain lowercase. Mixed case URLs are functional, but have the lowercase equivalent as the canonical.	2018-10-05 10:23:52 +01:00
Penar Musaraj	70d74f8fc1	FIX: advanced search ordering broken when using tags	2018-09-28 17:27:08 +08:00
Sam	9b7cab589a	FIX: revert diacritic stripping See more details in test case and at: https://meta.discourse.org/t/discourse-should-ignore-if-a-character-is-accented-when-doing-a-search/90198/16?u=sam	2018-08-31 11:46:55 +10:00
Maja Komel	020eba4623	FIX: find tags with non-latin names (#6312 )	2018-08-27 11:05:28 +10:00
Sam	ac11f8df52	correct regression searching with diacritics	2018-08-24 10:00:51 +10:00
Régis Hanol	0cd9e2acb9	fix build	2018-08-04 01:56:26 +02:00
Kasia Bułat	b71cf6d422	FEATURE: Add search not operator for tags.	2018-07-03 15:57:34 +08:00
Sam	5f64fd0a21	DEV: remove exec_sql and replace with mini_sql Introduce new patterns for direct sql that are safe and fast. MiniSql is not prone to memory bloat that can happen with direct PG usage. It also has an extremely fast materializer and a very convenient API - DB.exec(sql, params) => runs sql returns row count - DB.query(sql, params) => runs sql returns usable objects (not a hash) - DB.query_hash(sql, params) => runs sql returns an array of hashes - DB.query_single(sql, params) => runs sql and returns a flat one dimensional array - DB.build(sql) => returns a sql builder See more at: https://github.com/discourse/mini_sql	2018-06-19 16:13:36 +10:00
Guo Xiang Tan	ad5082d969	Make rubocop happy again.	2018-06-07 13:28:18 +08:00
Sam	e501936405	FIX: search server side error in rare condition	2018-05-28 15:28:18 +10:00
Sam	c677877e4f	FIX: Korean needs no word segmentation	2018-05-28 09:37:57 +10:00

330 Commits