mirror of https://github.com/discourse/discourse.git (synced 2025-01-18 15:43:16 +08:00)
PERF: Optimise query for excluding topics in certain categories in TopicsFilter (#27027)
This commit optimises the database query generated by `TopicsFilter#filter_categories` when the `-category:*` filter is used.

Previously, the method added a `topics.category_id NOT IN (<category ids to be excluded>)` filter to the resulting query. However, we noticed that the performance of that query degrades as the number of rows in the `topics` table grows and as the number of category ids to be excluded gets large.

Below is a sample query we ran on a large database in production to demonstrate the improvement.

Before:

```
SELECT topics.id
FROM topics
WHERE topics.category_id NOT IN (83, 136, 149, 143, 153, 165, 161, 123, 155, 163, 144, 134, 69, 135, 158, 141, 151, 160, 131, 133, 89, 104, 150, 147, 132, 145, 108, 146, 122, 100, 128, 154, 95, 102, 140, 139, 88, 91, 87)
ORDER BY topics.id DESC
LIMIT 5;

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=27795.34..27795.34 rows=1 width=4) (actual time=29.317..30.165 rows=5 loops=1)
   ->  Sort  (cost=27795.34..27795.34 rows=1 width=4) (actual time=29.316..30.163 rows=5 loops=1)
         Sort Key: id DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Gather  (cost=1000.10..27795.33 rows=1 width=4) (actual time=0.187..26.132 rows=73478 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Parallel Seq Scan on topics  (cost=0.10..26795.23 rows=1 width=4) (actual time=0.013..22.252 rows=24493 loops=3)
                     Filter: (category_id <> ALL ('{83,136,149,143,153,165,161,123,155,163,144,134,69,135,158,141,151,160,131,133,89,104,150,147,132,145,108,146,122,100,128,154,95,102,140,139,88,91,87}'::integer[]))
                     Rows Removed by Filter: 77276
 Planning Time: 0.140 ms
 Execution Time: 30.181 ms
```

After:

```
SELECT topics.id
FROM topics
WHERE NOT EXISTS (
  SELECT 1
  FROM unnest(array[83, 136, 149, 143, 153, 165, 161, 123, 155, 163, 144, 134, 69, 135, 158, 141, 151, 160, 131, 133, 89, 104, 150, 147, 132, 145, 108, 146, 122, 100, 128, 154, 95, 102, 140, 139, 88, 91, 87]) AS excluded_categories(category_id)
  WHERE topics.category_id IS NULL OR excluded_categories.category_id = topics.category_id
)
ORDER BY topics.id DESC
LIMIT 5;

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=0.42..13.52 rows=5 width=4) (actual time=0.028..0.110 rows=5 loops=1)
   ->  Nested Loop Anti Join  (cost=0.42..179929.62 rows=68715 width=4) (actual time=0.027..0.109 rows=5 loops=1)
         Join Filter: ((topics.category_id IS NULL) OR (excluded_categories.category_id = topics.category_id))
         Rows Removed by Join Filter: 239
         ->  Index Scan Backward using forum_threads_pkey on topics  (cost=0.42..108925.71 rows=305301 width=8) (actual time=0.012..0.062 rows=44 loops=1)
         ->  Function Scan on unnest excluded_categories  (cost=0.00..0.39 rows=39 width=4) (actual time=0.000..0.001 rows=6 loops=44)
 Planning Time: 0.126 ms
 Execution Time: 0.124 ms
(8 rows)
```
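The `topics.category_id IS NULL OR ...` branch in the new query appears to be there to keep the result set identical to the old one: in SQL, `NULL NOT IN (...)` evaluates to NULL, so the original `WHERE` clause already dropped topics without a category. The following is a minimal pure-Ruby model of that row-level logic, not Discourse code. The method names `kept_by_not_in?` and `kept_by_not_exists?` are hypothetical, and the model assumes the excluded list is non-empty and contains no NULLs.

```ruby
# Models `WHERE category_id NOT IN (excluded)` for a single row.
# A NULL category_id makes the predicate NULL, and WHERE drops the row.
def kept_by_not_in?(category_id, excluded)
  return false if category_id.nil?

  !excluded.include?(category_id)
end

# Models `WHERE NOT EXISTS (SELECT 1 FROM unnest(excluded) AS e
#                           WHERE category_id IS NULL OR e = category_id)`.
# The subquery yields a row if any excluded id matches, or if category_id is NULL.
def kept_by_not_exists?(category_id, excluded)
  !excluded.any? { |e| category_id.nil? || e == category_id }
end

excluded = [2, 3]
[nil, 1, 2, 3].each do |category_id|
  puts [
    category_id.inspect,
    kept_by_not_in?(category_id, excluded),
    kept_by_not_exists?(category_id, excluded),
  ].join("\t")
end
```

Under those assumptions both predicates agree on every row, so the rewrite changes the plan PostgreSQL can choose without changing which topics are returned.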
parent 454a343188
commit 2134ca9031
@@ -270,9 +270,14 @@ class TopicsFilter
       )
     end
 
-    if exclude_category_ids.present?
-      @scope = @scope.where("topics.category_id NOT IN (?)", exclude_category_ids)
-    end
+    # Use `NOT EXISTS` instead of `NOT IN` to avoid performance issues with large arrays.
+    @scope = @scope.where(<<~SQL) if exclude_category_ids.present?
+      NOT EXISTS (
+        SELECT 1
+        FROM unnest(array[#{exclude_category_ids.join(",")}]) AS excluded_categories(category_id)
+        WHERE topics.category_id IS NULL OR excluded_categories.category_id = topics.category_id
+      )
+    SQL
   end
 
   def filter_created_by_user(usernames:)
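The new code interpolates `exclude_category_ids` directly into the SQL string, whereas the old code passed the ids through a `?` bind. This hunk does not show where the ids come from, so they may well already be integers. As a hedged sketch only, a caller that cannot guarantee this could coerce each id before building the fragment. The helper name `exclude_categories_sql` is hypothetical and not part of Discourse.

```ruby
# Hypothetical helper: builds the NOT EXISTS fragment from the diff above,
# coercing every id with Integer() so only integers reach the SQL string.
def exclude_categories_sql(category_ids)
  # Integer() raises ArgumentError for strings like "1; DROP TABLE topics"
  # and TypeError for nil.
  ids = category_ids.map { |id| Integer(id) }
  raise ArgumentError, "category_ids must not be empty" if ids.empty?

  <<~SQL
    NOT EXISTS (
      SELECT 1
      FROM unnest(array[#{ids.join(",")}]) AS excluded_categories(category_id)
      WHERE topics.category_id IS NULL OR excluded_categories.category_id = topics.category_id
    )
  SQL
end

puts exclude_categories_sql([83, 136, "149"])
```

The empty-list check mirrors the `present?` guard in the diff: with no ids the clause would not be valid SQL, so the filter should be skipped entirely.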