PERF: Optimise query for excluding topics in certain categories in TopicsFilter (#27027)

This commit optimises the database query generated by
`TopicsFilter#filter_categories` when the `-category:*` filter is used.
Previously, the method will add the `topics.category_id NOT IN
(<category ids to be excluded>)` filter to the resulting query. However,
we noticed that the performance of the query degrades as the number of
rows in the `topics` table grow and when the number of category ids to be
excluded is large.

Sample of query we ran on a large database in production to demonstrate
the improvement:

Before:

```
SELECT topics.id FROM topics WHERE topics.category_id NOT IN (83, 136, 149, 143, 153, 165, 161, 123, 155, 163, 144, 134, 69, 135, 158, 141, 151, 160, 131, 133, 89, 104, 150, 147, 132, 145, 108, 146, 122, 100, 128, 154, 95, 102, 140, 139, 88, 91, 87) ORDER BY topics.id DESC LIMIT 5;

                                                                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=27795.34..27795.34 rows=1 width=4) (actual time=29.317..30.165 rows=5 loops=1)
   ->  Sort  (cost=27795.34..27795.34 rows=1 width=4) (actual time=29.316..30.163 rows=5 loops=1)
         Sort Key: id DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Gather  (cost=1000.10..27795.33 rows=1 width=4) (actual time=0.187..26.132 rows=73478 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Parallel Seq Scan on topics  (cost=0.10..26795.23 rows=1 width=4) (actual time=0.013..22.252 rows=24493 loops=3)
                     Filter: (category_id <> ALL ('{83,136,149,143,153,165,161,123,155,163,144,134,69,135,158,141,151,160,131,133,89,104,150,147,132,145,108,146,122,100,128,154,95,102,140,139,88,91,87}'::integer[]))
                     Rows Removed by Filter: 77276
 Planning Time: 0.140 ms
 Execution Time: 30.181 ms
```

After:

```
SELECT topics.id FROM topics WHERE NOT EXISTS (
  SELECT 1
  FROM unnest(array[83, 136, 149, 143, 153, 165, 161, 123, 155, 163, 144, 134, 69, 135, 158, 141, 151, 160, 131, 133, 89, 104, 150, 147, 132, 145, 108, 146, 122, 100, 128, 154, 95, 102, 140, 139, 88, 91, 87]) AS excluded_categories(category_id)
  WHERE topics.category_id IS NULL OR excluded_categories.category_id = topics.category_id
) ORDER BY topics.id DESC LIMIT 5 ;

                                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..13.52 rows=5 width=4) (actual time=0.028..0.110 rows=5 loops=1)
   ->  Nested Loop Anti Join  (cost=0.42..179929.62 rows=68715 width=4) (actual time=0.027..0.109 rows=5 loops=1)
         Join Filter: ((topics.category_id IS NULL) OR (excluded_categories.category_id = topics.category_id))
         Rows Removed by Join Filter: 239
         ->  Index Scan Backward using forum_threads_pkey on topics  (cost=0.42..108925.71 rows=305301 width=8) (actual time=0.012..0.062 rows=44 loops=1)
         ->  Function Scan on unnest excluded_categories  (cost=0.00..0.39 rows=39 width=4) (actual time=0.000..0.001 rows=6 loops=44)
 Planning Time: 0.126 ms
 Execution Time: 0.124 ms
(8 rows)
```
This commit is contained in:
Alan Guo Xiang Tan 2024-05-15 11:03:07 +08:00 committed by GitHub
parent 454a343188
commit 2134ca9031
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

View File

@ -270,9 +270,14 @@ class TopicsFilter
)
end
if exclude_category_ids.present?
@scope = @scope.where("topics.category_id NOT IN (?)", exclude_category_ids)
end
# Use `NOT EXISTS` instead of `NOT IN` to avoid performance issues with large arrays.
@scope = @scope.where(<<~SQL) if exclude_category_ids.present?
NOT EXISTS (
SELECT 1
FROM unnest(array[#{exclude_category_ids.join(",")}]) AS excluded_categories(category_id)
WHERE topics.category_id IS NULL OR excluded_categories.category_id = topics.category_id
)
SQL
end
def filter_created_by_user(usernames:)