2f5e66b6f8
In the query generated by `TopicTrackingState.report`, there are two subqueies being executed. The first subquery fetches all the topics that are new for a given user while the second subquery fetches all the topics with unread posts for a given user. For the second subquery, there is a filter `topics.updated_at >= user_stats.first_unread_at` which is used as a performance optimisation to reduce the number of rows that PG has to scan through the `topics` table. However, we started to notice in production that the PG planner doesn't always execute the filter first to reduce the number of rows that it has to scan through. Running the following query in one of our production instance, ``` EXPLAIN ANALYZE SELECT DISTINCT topics.id as topic_id, u.id as user_id, topics.created_at, topics.updated_at, topics.highest_staff_post_number AS highest_post_number, last_read_post_number, c.id as category_id, c.topic_id AS category_topic_id, tu.notification_level, us.first_unread_at, GREATEST( CASE WHEN COALESCE(uo.new_topic_duration_minutes, 2880) = -1 THEN u.created_at WHEN COALESCE(uo.new_topic_duration_minutes, 2880) = -2 THEN COALESCE( u.previous_visit_at,u.created_at ) ELSE ('2023-07-31 03:29:45.737630'::timestamp - INTERVAL '1 MINUTE' * COALESCE(uo.new_topic_duration_minutes, 2880)) END, u.created_at, '2023-07-25 15:06:44' ) AS treat_as_new_topic_start_date FROM topics JOIN users u on u.id = 13455 JOIN user_stats AS us ON us.user_id = u.id JOIN user_options AS uo ON uo.user_id = u.id JOIN categories c ON c.id = topics.category_id LEFT JOIN topic_users tu ON tu.topic_id = topics.id AND tu.user_id = u.id WHERE u.id = 13455 AND topics.updated_at >= us.first_unread_at AND topics.archetype <> 'private_message' AND (("topics"."deleted_at" IS NULL AND (tu.last_read_post_number < topics.highest_staff_post_number) AND (COALESCE(tu.notification_level, 1) >= 2)) OR (1=0)) AND NOT ( COALESCE((select array_agg(tag_id) from topic_tags where topic_tags.topic_id = topics.id), ARRAY[]::int[]) && ARRAY[451,452,453] ) AND topics.deleted_at IS NULL AND NOT ( last_read_post_number IS NULL AND ( topics.category_id IN (SELECT "categories"."id" FROM "categories" LEFT JOIN categories categories2 ON categories2.id = categories.parent_category_id LEFT JOIN category_users ON category_users.category_id = categories.id AND category_users.user_id = 13455 LEFT JOIN category_users category_users2 ON category_users2.category_id = categories2.id AND category_users2.user_id = 13455 WHERE ((category_users.id IS NULL AND COALESCE(category_users2.notification_level, 1) = 0) OR COALESCE(category_users.notification_level, 1) = 0)) AND tu.notification_level <= 1 ) ) ``` we get the following ``` QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Unique (cost=201606.06..201608.15 rows=76 width=60) (actual time=91.279..91.294 rows=14 loops=1) -> Sort (cost=201606.06..201606.25 rows=76 width=60) (actual time=91.278..91.284 rows=14 loops=1) Sort Key: topics.id, topics.created_at, topics.updated_at, topics.highest_staff_post_number, tu.last_read_post_number, c.id, c.topic_id, tu.notification_level, us.first_unread_at, (GREATEST(CASE WHEN (COALESCE(uo.new_topic_duration_minutes, 2880) = '-1'::integer) THEN u.created_at WHEN (COALESCE(uo.new_topic_duration_minutes, 2880) = '-2'::integer) THEN COALESCE(u.previous_visit_at, u.created_at) ELSE ('2023-07-31 03:29:45.73763'::timestamp without time zone - ('00:01:00'::interval * (COALESCE(uo.new_topic_duration_minutes, 2880))::double precision)) END, u.created_at, '2023-07-25 15:06:44'::timestamp without time zone)) Sort Method: quicksort Memory: 26kB -> Hash Join (cost=97519.51..201603.69 rows=76 width=60) (actual time=87.662..91.268 rows=14 loops=1) Hash Cond: (topics.id = tu.topic_id) Join Filter: ((tu.last_read_post_number < topics.highest_staff_post_number) AND ((tu.last_read_post_number IS NOT NULL) OR (NOT (hashed SubPlan 2)) OR (tu.notification_level > 1))) Rows Removed by Join Filter: 10 -> Nested Loop (cost=1.54..104075.36 rows=3511 width=68) (actual time=0.055..3.609 rows=548 loops=1) -> Nested Loop (cost=1.13..25.20 rows=1 width=32) (actual time=0.027..0.033 rows=1 loops=1) -> Nested Loop (cost=0.71..16.76 rows=1 width=28) (actual time=0.020..0.023 rows=1 loops=1) -> Index Scan using users_pkey on users u (cost=0.42..8.44 rows=1 width=20) (actual time=0.010..0.012 rows=1 loops=1) Index Cond: (id = 13455) -> Index Scan using user_stats_pkey on user_stats us (cost=0.29..8.31 rows=1 width=12) (actual time=0.008..0.010 rows=1 loops=1) Index Cond: (user_id = 13455) -> Index Scan using index_user_options_on_user_id_and_default_calendar on user_options uo (cost=0.42..8.44 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1) Index Cond: (user_id = 13455) -> Nested Loop (cost=0.41..104015.12 rows=3504 width=36) (actual time=0.026..3.503 rows=548 loops=1) -> Seq Scan on categories c (cost=0.00..13.73 rows=73 width=8) (actual time=0.003..0.039 rows=73 loops=1) -> Index Only Scan using index_topics_on_updated_at_public on topics (cost=0.41..1424.20 rows=48 width=28) (actual time=0.012..0.046 rows=8 loops=73) Index Cond: ((updated_at >= us.first_unread_at) AND (category_id = c.id)) Filter: (NOT (COALESCE((SubPlan 1), '{}'::integer[]) && '{451,452,453}'::integer[])) Heap Fetches: 553 SubPlan 1 -> Aggregate (cost=4.31..4.32 rows=1 width=32) (actual time=0.002..0.002 rows=1 loops=548) -> Index Only Scan using index_topic_tags_on_topic_id_and_tag_id on topic_tags (cost=0.29..4.31 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=548) Index Cond: (topic_id = topics.id) Heap Fetches: 178 -> Hash (cost=97222.14..97222.14 rows=19914 width=16) (actual time=87.545..87.546 rows=42884 loops=1) Buckets: 65536 (originally 32768) Batches: 1 (originally 1) Memory Usage: 2387kB -> Bitmap Heap Scan on topic_users tu (cost=1217.47..97222.14 rows=19914 width=16) (actual time=14.419..78.286 rows=42884 loops=1) Recheck Cond: (user_id = 13455) Filter: (COALESCE(notification_level, 1) >= 2) Rows Removed by Filter: 15839 Heap Blocks: exact=45285 -> Bitmap Index Scan on index_topic_users_on_user_id_and_topic_id (cost=0.00..1212.49 rows=59741 width=0) (actual time=6.448..6.448 rows=58723 loops=1) Index Cond: (user_id = 13455) SubPlan 2 -> Nested Loop Left Join (cost=0.74..46.90 rows=1 width=4) (never executed) Join Filter: (category_users2.category_id = categories2.id) Filter: (((category_users.id IS NULL) AND (COALESCE(category_users2.notification_level, 1) = 0)) OR (COALESCE(category_users.notification_level, 1) = 0)) -> Nested Loop Left Join (cost=0.45..32.31 rows=73 width=16) (never executed) Join Filter: (category_users.category_id = categories.id) -> Nested Loop Left Join (cost=0.15..18.45 rows=73 width=8) (never executed) -> Seq Scan on categories (cost=0.00..13.73 rows=73 width=8) (never executed) -> Memoize (cost=0.15..0.28 rows=1 width=4) (never executed) Cache Key: categories.parent_category_id Cache Mode: logical -> Index Only Scan using categories_pkey on categories categories2 (cost=0.14..0.27 rows=1 width=4) (never executed) Index Cond: (id = categories.parent_category_id) Heap Fetches: 0 -> Materialize (cost=0.29..11.69 rows=2 width=12) (never executed) -> Index Scan using idx_category_users_user_id_category_id on category_users (cost=0.29..11.68 rows=2 width=12) (never executed) Index Cond: (user_id = 13455) -> Materialize (cost=0.29..11.69 rows=2 width=8) (never executed) -> Index Scan using idx_category_users_user_id_category_id on category_users category_users2 (cost=0.29..11.68 rows=2 width=8) (never executed) Index Cond: (user_id = 13455) Planning Time: 1.740 ms Execution Time: 91.414 ms (59 rows) ``` From the execution plan, we can see the most of the time is spent joining about 42888 rows in the `topics` table to the `topic_users` table. However, we know that we only have to scan through a subset of the `topics` table because the user's last unread at is '2023-07-20 11:33:05'. If we filter the `topics` table with `topics.updated_at >= '2023-07-20 11:33:05'`, this would only return about 1500 rows. From our testing in production, the PG planner is able to execute a better query plan when we avoid the unnecessary joins on `user_stats` just to be able to get the user's `UserStat#first_unread_at`. Instead, we can just pass the value of `UserStat#first_unread_at` directly as a query parameter. ``` EXPLAIN ANALYZE SELECT DISTINCT topics.id as topic_id, u.id as user_id, topics.created_at, topics.updated_at, topics.highest_staff_post_number AS highest_post_number, last_read_post_number, c.id as category_id, c.topic_id AS category_topic_id, tu.notification_level, GREATEST( CASE WHEN COALESCE(uo.new_topic_duration_minutes, 2880) = -1 THEN u.created_at WHEN COALESCE(uo.new_topic_duration_minutes, 2880) = -2 THEN COALESCE( u.previous_visit_at,u.created_at ) ELSE ('2023-07-31 03:29:45.737630'::timestamp - INTERVAL '1 MINUTE' * COALESCE(uo.new_topic_duration_minutes, 2880)) END, u.created_at, '2023-07-25 15:06:44' ) AS treat_as_new_topic_start_date FROM topics JOIN users u on u.id = 13455 JOIN user_options AS uo ON uo.user_id = u.id JOIN categories c ON c.id = topics.category_id LEFT JOIN topic_users tu ON tu.topic_id = topics.id AND tu.user_id = u.id WHERE u.id = 13455 AND topics.updated_at >= '2023-07-20 11:33:05' AND topics.archetype <> 'private_message' AND (("topics"."deleted_at" IS NULL AND (tu.last_read_post_number < topics.highest_staff_post_number) AND (COALESCE(tu.notification_level, 1) >= 2)) OR (1=0)) AND NOT ( COALESCE((select array_agg(tag_id) from topic_tags where topic_tags.topic_id = topics.id), ARRAY[]::int[]) && ARRAY[451,452,453] ) AND topics.deleted_at IS NULL AND NOT ( last_read_post_number IS NULL AND ( topics.category_id IN (SELECT "categories"."id" FROM "categories" LEFT JOIN categories categories2 ON categories2.id = categories.parent_category_id LEFT JOIN category_users ON category_users.category_id = categories.id AND category_users.user_id = 13455 LEFT JOIN category_users category_users2 ON category_users2.category_id = categories2.id AND category_users2.user_id = 13455 WHERE ((category_users.id IS NULL AND COALESCE(category_users2.notification_level, 1) = 0) OR COALESCE(category_users.notification_level, 1) = 0)) AND tu.notification_level <= 1 ) ); ``` Note how the filter is now `topics.updated_at >= '2023-07-20 11:33:05'` instead of `topics.updated_at >= us.first_unread_at`. The modified query above generates the following execution plan. ``` QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Unique (cost=5189.86..5189.88 rows=1 width=52) (actual time=4.991..5.002 rows=14 loops=1) -> Sort (cost=5189.86..5189.86 rows=1 width=52) (actual time=4.990..4.994 rows=14 loops=1) Sort Key: topics.id, topics.created_at, topics.updated_at, topics.highest_staff_post_number, tu.last_read_post_number, c.id, c.topic_id, tu.notification_level, (GREATEST(CASE WHEN (COALESCE(uo.new_topic_duration_minutes, 2880) = '-1'::integer) THEN u.created_at WHEN (COALESCE(uo.new_topic_duration_minutes, 2880) = '-2'::integer) THEN COALESCE(u.previous_visit_at, u.created_at) ELSE ('2023-07-31 03:29:45.73763'::timestamp without time zone - ('00:01:00'::interval * (COALESCE(uo.new_topic_duration_minutes, 2880))::double precision)) END, u.created_at, '2023-07-25 15:06:44'::timestamp without time zone)) Sort Method: quicksort Memory: 26kB -> Nested Loop (cost=52.11..5189.85 rows=1 width=52) (actual time=0.093..4.974 rows=14 loops=1) -> Nested Loop (cost=51.70..5181.39 rows=1 width=60) (actual time=0.084..4.931 rows=14 loops=1) -> Nested Loop (cost=51.28..5172.94 rows=1 width=44) (actual time=0.076..4.887 rows=14 loops=1) -> Nested Loop (cost=0.41..1698.46 rows=59 width=36) (actual time=0.029..3.537 rows=548 loops=1) -> Seq Scan on categories c (cost=0.00..13.73 rows=73 width=8) (actual time=0.005..0.039 rows=73 loops=1) -> Index Only Scan using index_topics_on_updated_at_public on topics (cost=0.41..23.07 rows=1 width=28) (actual time=0.012..0.047 rows=8 loops=73) Index Cond: ((updated_at >= '2023-07-20 11:33:05'::timestamp without time zone) AND (category_id = c.id)) Filter: (NOT (COALESCE((SubPlan 1), '{}'::integer[]) && '{451,452,453}'::integer[])) Heap Fetches: 552 SubPlan 1 -> Aggregate (cost=4.31..4.32 rows=1 width=32) (actual time=0.002..0.002 rows=1 loops=548) -> Index Only Scan using index_topic_tags_on_topic_id_and_tag_id on topic_tags (cost=0.29..4.31 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=548) Index Cond: (topic_id = topics.id) Heap Fetches: 178 -> Index Scan using index_topic_users_on_user_id_and_topic_id on topic_users tu (cost=50.86..58.88 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=548) Index Cond: ((user_id = 13455) AND (topic_id = topics.id)) Filter: ((COALESCE(notification_level, 1) >= 2) AND (last_read_post_number < topics.highest_staff_post_number) AND ((last_read_post_number IS NOT NULL) OR (NOT (hashed SubPlan 2)) OR (notification_level > 1))) Rows Removed by Filter: 0 SubPlan 2 -> Nested Loop Left Join (cost=0.74..50.43 rows=1 width=4) (never executed) Join Filter: (category_users2.category_id = categories2.id) Filter: (((category_users.id IS NULL) AND (COALESCE(category_users2.notification_level, 1) = 0)) OR (COALESCE(category_users.notification_level, 1) = 0)) -> Nested Loop Left Join (cost=0.45..35.84 rows=73 width=16) (never executed) Join Filter: (category_users.category_id = categories.id) -> Nested Loop Left Join (cost=0.15..21.97 rows=73 width=8) (never executed) -> Seq Scan on categories (cost=0.00..13.73 rows=73 width=8) (never executed) -> Memoize (cost=0.15..0.61 rows=1 width=4) (never executed) Cache Key: categories.parent_category_id Cache Mode: logical -> Index Only Scan using categories_pkey on categories categories2 (cost=0.14..0.60 rows=1 width=4) (never executed) Index Cond: (id = categories.parent_category_id) Heap Fetches: 0 -> Materialize (cost=0.29..11.69 rows=2 width=12) (never executed) -> Index Scan using idx_category_users_user_id_category_id on category_users (cost=0.29..11.68 rows=2 width=12) (never executed) Index Cond: (user_id = 13455) -> Materialize (cost=0.29..11.69 rows=2 width=8) (never executed) -> Index Scan using idx_category_users_user_id_category_id on category_users category_users2 (cost=0.29..11.68 rows=2 width=8) (never executed) Index Cond: (user_id = 13455) -> Index Scan using users_pkey on users u (cost=0.42..8.44 rows=1 width=20) (actual time=0.003..0.003 rows=1 loops=14) Index Cond: (id = 13455) -> Index Scan using index_user_options_on_user_id_and_default_calendar on user_options uo (cost=0.42..8.44 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=14) Index Cond: (user_id = 13455) Planning Time: 1.281 ms Execution Time: 5.092 ms (48 rows) ``` With the new query, PG first does an index scan using the `index_topics_on_updated_at_public` index to filter away most of the topics making the subsequent joins much cheaper. Total query time has been reduced from ~90ms to ~5ms. This optimisation will mostly affect users with very few/recent unread topics since a large `UserStat#firsts_unread_at` value will still mean scanning through a large portion of the `topics` table. |
||
---|---|---|
.devcontainer | ||
.github | ||
.vscode-sample | ||
app | ||
bin | ||
config | ||
db | ||
docs | ||
documentation | ||
images | ||
lib | ||
log | ||
plugins | ||
public | ||
script | ||
spec | ||
test | ||
vendor | ||
.editorconfig | ||
.eslintignore | ||
.eslintrc | ||
.git-blame-ignore-revs | ||
.gitattributes | ||
.gitignore | ||
.jsdoc | ||
.licensed.yml | ||
.licensee.json | ||
.npmrc | ||
.prettierignore | ||
.prettierrc | ||
.rspec | ||
.rspec_parallel | ||
.rubocop.yml | ||
.ruby-gemset.sample | ||
.ruby-version.sample | ||
.streerc | ||
.template-lintrc.js | ||
Brewfile | ||
config.ru | ||
CONTRIBUTING.md | ||
COPYRIGHT.md | ||
d | ||
discourse.sublime-project | ||
Gemfile | ||
Gemfile.lock | ||
jsconfig.base.json | ||
lefthook.yml | ||
LICENSE.txt | ||
package.json | ||
Rakefile | ||
README.md | ||
translator.yml | ||
yarn.lock |
Discourse is the 100% open source discussion platform built for the next decade of the Internet. Use it as a:
- mailing list
- discussion forum
- long-form chat room
To learn more about the philosophy and goals of the project, visit discourse.org.
Screenshots
Browse lots more notable Discourse instances.
Development
To get your environment setup, follow the community setup guide for your operating system.
- If you're on macOS, try the macOS development guide.
- If you're on Ubuntu, try the Ubuntu development guide.
- If you're on Windows, try the Windows 10 development guide.
If you're familiar with how Rails works and are comfortable setting up your own environment, you can also try out the Discourse Advanced Developer Guide, which is aimed primarily at Ubuntu and macOS environments.
Before you get started, ensure you have the following minimum versions: Ruby 3.2+, PostgreSQL 13, Redis 7. If you're having trouble, please see our TROUBLESHOOTING GUIDE first!
Setting up Discourse
If you want to set up a Discourse forum for production use, see our Discourse Install Guide.
If you're looking for business class hosting, see discourse.org/buy.
Requirements
Discourse is built for the next 10 years of the Internet, so our requirements are high.
Discourse supports the latest, stable releases of all major browsers and platforms:
Browsers | Tablets | Phones |
---|---|---|
Apple Safari | iPadOS | iOS |
Google Chrome | Android | Android |
Microsoft Edge | ||
Mozilla Firefox |
Additionally, we aim to support Safari on iOS 15.7+.
Built With
- Ruby on Rails — Our back end API is a Rails app. It responds to requests RESTfully in JSON.
- Ember.js — Our front end is an Ember.js app that communicates with the Rails API.
- PostgreSQL — Our main data store is in Postgres.
- Redis — We use Redis as a cache and for transient data.
- BrowserStack — We use BrowserStack to test on real devices and browsers.
Plus lots of Ruby Gems, a complete list of which is at /main/Gemfile.
Contributing
Discourse is 100% free and open source. We encourage and support an active, healthy community that
accepts contributions from the public – including you!
Before contributing to Discourse:
- Please read the complete mission statements on discourse.org. Yes we actually believe this stuff; you should too.
- Read and sign the Electronic Discourse Forums Contribution License Agreement.
- Dig into CONTRIBUTING.MD, which covers submitting bugs, requesting new features, preparing your code for a pull request, etc.
- Always strive to collaborate with mutual respect.
- Not sure what to work on? We've got some ideas.
We look forward to seeing your pull requests!
Security
We take security very seriously at Discourse; all our code is 100% open source and peer reviewed. Please read our security guide for an overview of security measures in Discourse, or if you wish to report a security issue.
The Discourse Team
The original Discourse code contributors can be found in AUTHORS.MD. For a complete list of the many individuals that contributed to the design and implementation of Discourse, please refer to the official Discourse blog and GitHub's list of contributors.
Copyright / License
Copyright 2014 - 2023 Civilized Discourse Construction Kit, Inc.
Licensed under the GNU General Public License Version 2.0 (or later);
you may not use this work except in compliance with the License.
You may obtain a copy of the License in the LICENSE file, or at:
https://www.gnu.org/licenses/old-licenses/gpl-2.0.txt
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Discourse logo and “Discourse Forum” ®, Civilized Discourse Construction Kit, Inc.
Accessibility
To guide our ongoing effort to build accessible software we follow the W3C’s Web Content Accessibility Guidelines (WCAG). If you'd like to report an accessibility issue that makes it difficult for you to use Discourse, email accessibility@discourse.org. For more information visit discourse.org/accessibility.
Dedication
Discourse is built with love, Internet style.