We were seeing lots of deadlocks deploying this migration. This improves
the situation in 2 ways.
1. ddl transaction is avoided, so we hold locks for far shorter times
2. we operate in chunks of a maximum of 100_000 posts (though it is heavily filtered down)
* improve code so it is clearer
Previously, Discourse's password hashing was hard-coded to a specific algorithm and parameters. Any changes to the algorithm or parameters would essentially invalidate all existing user passwords.
This commit introduces a new `password_algorithm` column on the `users` table. This persists the algorithm/parameters which were use to generate the hash for a given user. All existing rows in the users table are assumed to be using Discourse's current algorithm/parameters. With this data stored per-user in the database, we'll be able to keep existing passwords working while adjusting the algorithm/parameters for newly hashed passwords.
Passwords which were hashed with an old algorithm will be automatically re-hashed with the new algorithm when the user next logs in.
Values in the `password_algorithm` column are based on the PHC string format (https://github.com/P-H-C/phc-string-format/blob/master/phc-sf-spec.md). Discourse's existing algorithm is described by the string `$pbkdf2-sha256$i=64000,l=32$`
To introduce a new algorithm and start using it, make sure it's implemented in the `PasswordHasher` library, then update `User::TARGET_PASSWORD_ALGORITHM`.
That column is obsolete since we added the `granted_title_badge_id` column in 2019 (56d3e29a69). Having both columns can lead to inconsistencies (mostly due to old data from before 2019).
For example, `BadgeGranter.revoke_ungranted_titles!` doesn't work correctly if `badge_granted_title` is `false` while `granted_title_badge_id` points to the badge that is used as title.
Currently we don’t have an association between reviewables and posts.
This sometimes leads to inconsistencies in the DB as a post can have
been deleted but an associated reviewable is still present.
This patch addresses this issue simply by adding a new association to
the `Post` model and by using the `dependent: :destroy` option.
Currently, `Tag#topic_count` is a count of all regular topics regardless of whether the topic is in a read restricted category or not. As a result, any users can technically poll a sensitive tag to determine if a new topic is created in a category which the user has not excess to. We classify this as a minor leak in sensitive information.
The following changes are introduced in this commit:
1. Introduce `Tag#public_topic_count` which only count topics which have been tagged with a given tag in public categories.
2. Rename `Tag#topic_count` to `Tag#staff_topic_count` which counts the same way as `Tag#topic_count`. In other words, it counts all topics tagged with a given tag regardless of the category the topic is in. The rename is also done so that we indicate that this column contains sensitive information.
3. Change all previous spots which relied on `Topic#topic_count` to rely on `Tag.topic_column_count(guardian)` which will return the right "topic count" column to use based on the current scope.
4. Introduce `SiteSetting.include_secure_categories_in_tag_counts` site setting to allow site administrators to always display the tag topics count using `Tag#staff_topic_count` instead.
In Discourse, there are many migration files where we CREATE INDEX CONCURRENTLY which requires us to set disable_ddl_transaction!. Setting disable_ddl_transaction! in a migration file runs the SQL statements outside of a transaction. The implication of this is that there is no ROLLBACK should any of the SQL statements fail.
We have seen lock timeouts occuring when running CREATE INDEX CONCURRENTLY. When that happens, the index would still have been created but marked as invalid by Postgres.
Per the postgres documentation:
> If a problem arises while scanning the table, such as a deadlock or a uniqueness violation in a unique index, the CREATE INDEX command will fail but leave behind an “invalid” index. This index will be ignored for querying purposes because it might be incomplete; however it will still consume update overhead.
> The recommended recovery method in such cases is to drop the index and try again to perform CREATE INDEX CONCURRENTLY . (Another possibility is to rebuild the index with REINDEX INDEX CONCURRENTLY ).
When such scenarios happen, we are supposed to either drop and create the index again or run a REINDEX operation. However, I noticed today that we have not been doing so in Discourse. Instead, we’ve been incorrectly working around the problem by checking for the index existence before creating the index in order to make the migration idempotent. What this potentially mean is that we might have invalid indexes which are lying around in the database which PG will ignore for querying purposes.
This commits adds a migration which queries for all the
invalid indexes in the `public` namespace and reindexes them.
This commit promotes all post_deploy migrations which existed in
Discourse v2.8.0 (timestamp <= 20220107014925).
This commit includes a fix to the promote_migrations script to promote
all migrations of the first version of the previous stable version. For
example, if the current stable version is v2.8.13, the version used as
a cutoff for promoting migrations is v2.8.0.
* Remove old bookmark column ignores to follow up b22450c7a8
* Change some group site setting checks to use the _map helper
* Remove old secure_media helper stub for chat
* Change attr_accessor to attr_reader for preloaded_custom_fields to follow up 70af45055a
See https://meta.discourse.org/t/discourse-email-messages-are-incorrectly-threaded/233499
for thorough reasoning.
This commit changes how we generate Message-IDs and do email
threading for emails sent from Discourse. The main changes are
as follows:
* Introduce an outbound_message_id column on Post that
is either a) filled with a Discourse-generated Message-ID
the first time that post is used for an outbound email
or b) filled with an original Message-ID from an external
mail client or service if the post was created from an
incoming email.
* Change Discourse-generated Message-IDs to be more consistent
and static, in the format `discourse/post/:post_id@:host`
* Do not send References or In-Reply-To headers for emails sent
for the OP of topics.
* Make sure that In-Reply-To is filled with either a) the OP's
Message-ID if the post is not a direct reply or b) the parent
post's Message-ID
* Make sure that In-Reply-To has all referenced post's Message-IDs
* Make sure that References is filled with a chain of Message-IDs
from the OP down to the parent post of the new post.
We also are keeping X-Discourse-Post-Id and X-Discourse-Topic-Id,
headers that we previously removed, for easier visual debugging
of outbound emails.
Finally, we backfill the `outbound_message_id` for posts that have
a linked `IncomingEmail` record, using the `message_id` of that record.
We do not need to do that for posts that don't have an incoming email
since they are backfilled at runtime if `outbound_message_id` is missing.
It's already included in the `ignored_columns` list in the group model. 03ffb0bf27/app/models/group.rb (L9)
Also, removed the `MigrateGroupFlairImages` onceoff job and spec.
At some point in the past we decided to rename the 'regular' notification state of topics/categories to 'normal'. However, some UI copy was missed when the initial renaming was done so this commit changes the spots that were missed to the new name.
This table holds associations between uploads and other models. This can be used to prevent removing uploads that are still in use.
* DEV: Create upload_references
* DEV: Use UploadReference instead of PostUpload
* DEV: Use UploadReference for SiteSetting
* DEV: Use UploadReference for Badge
* DEV: Use UploadReference for Category
* DEV: Use UploadReference for CustomEmoji
* DEV: Use UploadReference for Group
* DEV: Use UploadReference for ThemeField
* DEV: Use UploadReference for ThemeSetting
* DEV: Use UploadReference for User
* DEV: Use UploadReference for UserAvatar
* DEV: Use UploadReference for UserExport
* DEV: Use UploadReference for UserProfile
* DEV: Add method to extract uploads from raw text
* DEV: Use UploadReference for Draft
* DEV: Use UploadReference for ReviewableQueuedPost
* DEV: Use UploadReference for UserProfile's bio_raw
* DEV: Do not copy user uploads to upload references
* DEV: Copy post uploads again after deploy
* DEV: Use created_at and updated_at from uploads table
* FIX: Check if upload site setting is empty
* DEV: Copy user uploads to upload references
* DEV: Make upload extraction less strict
This commit migrates all bookmarks to be polymorphic (using the
bookmarkable_id and bookmarkable_type) columns. It also deletes
all the old code guarded behind the use_polymorphic_bookmarks setting
and changes that setting to true for all sites and by default for
the sake of plugins.
No data is deleted in the migrations, the old post_id and for_topic
columns for bookmarks will be dropped later on.
There is no need to wait until after the deploy for this cleanup. In fact, running it later will mean there could be a window of a few minutes during which the site is broken.
The only requirement is that it runs after the broken `20220401130745_create_category_required_tag_groups` migration.
Followup to 39a6de3d73
The category table's required_tag_group_id contained references to deleted tag groups, which we copied to the new table. The new serializer tries to get the associated tag group name but fails because the tag group is nil.
This PR adds an inner join in the original migration to make sure tag groups still exist and adds a new post-migration to fix already migrated sites.
Previously we only supported a single 'required tag group' for a category. This commit allows admins to specify multiple required tag groups, each with their own minimum tag count.
A new category_required_tag_groups database table replaces the existing columns on the categories table. Data is automatically migrated.
As we are gradually moving to having a polymorphic
bookmarkable relationship on the Bookmark table,
we need to make the post_id column nullable to be
able to develop and test the new columns, and
for cutover/migration purposes later as well.
This commit promotes all post_deploy migrations which existed in Discourse v2.7.13 (timestamp <= 20210328233843)
This reduces the likelihood of issues relating to migration run order
Also fixes a couple of typos in `script/promote_migrations`
This commit is a redo of2f1ddadff7dd47f824070c8a3f633f00a27aacde
which we reverted because it blew up an internal CI check. I looked
into it, and it happened because the old migration to add the bookmark
columns still existed, and those columns were dropped in a post migrate,
so the two migrations to add the columns were conflicting before
the post migrate was run.
------
This commit only includes the creation of the new columns and index,
and does not add any triggers, backfilling, or new data.
A backfill will be done in the final PR when we switch this over.
Intermediate PRs will look something like this:
Add an experimental site setting for using polymorphic bookmarks,
and make sure in the places where bookmarks are created or updated
we fill in the columns. This setting will be used in subsequent
PRs as well.
Listing and searching bookmarks based on polymorphic associations
Creating post and topic bookmarks using polymorphic associations,
and changing special for_topic logic to just rely on the Topic
bookmarkable_type
Querying bookmark reminders based on polymorphic associations
Make sure various other areas like importers, bookmark guardian,
and others all rely on the associations
Prepare plugins that rely on the Bookmark model to use polymorphic
associations
The final core PR will remove all the setting gates and switch over
to using the polymorphic associations, backfill the bookmarks
table columns, and ignore the old post_id and for_topic colummns.
Then it will just be a matter of dropping the old columns down the
line.
This URL was originally updated in 89cb537fae. However, some sites are not using the proxy, and have configured their forum to hotlink images directly to avatars.discourse.org.
We intend to shut down this domain in favor of `avatars.discourse-cdn.com`, so this migration will re-write any matching site setting values and queue affected posts for rebaking.
* FEATURE: upload an avatar option for uploading avatars with selectable avatars
Allow staff or users at or above a trust level to upload avatars even when the site
has selectable avatars enabled.
Everyone can still pick from the list of avatars. The option to upload is shown
below the selectable avatar list.
refactored boolean site setting into an enum with the following values:
disabled: No selectable avatars enabled (default)
everyone: Show selectable avatars, and allow everyone to upload custom avatars
tl1: Show selectable avatars, but require tl1+ and staff to upload custom avatars
tl2: Show selectable avatars, but require tl2+ and staff to upload custom avatars
tl3: Show selectable avatars, but require tl3+ and staff to upload custom avatars
tl4: Show selectable avatars, but require tl4 and staff to upload custom avatars
staff: Show selectable avatars, but only allow staff to upload custom avatars
no_one: Show selectable avatars. No users can upload custom avatars
Co-authored-by: Régis Hanol <regis@hanol.fr>
This commit makes sure that the email log's bounce_error_code
conforms to the SMTP error code RFC on save, so that
it is always in the format X.X.X or XXX without any
additional string details. Also included is a migration
to fix this issue for past records.
We added this constraint in 5bd55acf83
but it is causing problems in hosted sites and is catching the
issue too far down the line. This commit removes the constraint
for now, and also fixes an issue found with PostDestroyer
which wasn't using the UserStatCountUpdater when updating post_count
and thus was causing negative numbers to occur.
Follow up to 88a8584348. Sets
the baked version of all posts with custom emoji and a secure
media URL in the cooked content to 0. Then our periodic rebake
posts job will rebake them to apply the fix in the linked
commit. This only matters on sites with secure media enabled.
It is too close to release of 2.8 for incomplete
feature shenanigans. Ignores and drops the columns and drops
the trigger/function introduced in
e21c640a3c.
Will pick this feature back up post-release.
Old OnceOff job could perform pretty slowly on sites with millions of emails
New implementation operates in batches in a migration, minimizing locking.
Currently when a user creates posts that are moderated (for whatever
reason), a popup is displayed saying the post needs approval and the
total number of the user’s pending posts. But then this piece of
information is kind of lost and there is nowhere for the user to know
what are their pending posts or how many there are.
This patch solves this issue by adding a new “Pending” section to the
user’s activity page when there are some pending posts to display. When
there are none, then the “Pending” section isn’t displayed at all.
* PERF: Improve database query perf when loading topics for a category.
Instead of left joining the `topics` table against `categories` by filtering with `categories.id`,
we can improve the query plan by filtering against `topics.category_id`
first before joining which helps to reduce the number of rows in the
topics table that has to be joined against the other tables and also
make better use of our existing index.
The following is a before and after of the query plan for a category
with many subcategories.
Before:
```
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
Limit (cost=1.28..747.09 rows=30 width=12) (actual time=85.502..2453.727 rows=30 loops=1)
-> Nested Loop Left Join (cost=1.28..566518.36 rows=22788 width=12) (actual time=85.501..2453.722 rows=30 loops=1)
Join Filter: (category_users.category_id = topics.category_id)
Filter: ((topics.category_id = 11) OR (COALESCE(category_users.notification_level, 1) <> 0) OR (tu.notification_level > 1))
-> Nested Loop Left Join (cost=1.00..566001.58 rows=22866 width=20) (actual time=85.494..2453.702 rows=30 loops=1)
Filter: ((COALESCE(tu.notification_level, 1) > 0) AND ((topics.category_id <> 11) OR (topics.pinned_at IS NULL) OR ((t
opics.pinned_at <= tu.cleared_pinned_at) AND (tu.cleared_pinned_at IS NOT NULL))))
Rows Removed by Filter: 1
-> Nested Loop (cost=0.57..528561.75 rows=68606 width=24) (actual time=85.472..2453.562 rows=31 loops=1)
Join Filter: ((topics.category_id = categories.id) AND ((categories.topic_id <> topics.id) OR (categories.id = 1
1)))
Rows Removed by Join Filter: 13938306
-> Index Scan using index_topics_on_bumped_at on topics (cost=0.42..100480.05 rows=715549 width=24) (actual ti
me=0.010..633.015 rows=464623 loops=1)
Filter: ((deleted_at IS NULL) AND ((archetype)::text <> 'private_message'::text))
Rows Removed by Filter: 105321
-> Materialize (cost=0.14..36.04 rows=30 width=8) (actual time=0.000..0.002 rows=30 loops=464623)
-> Index Scan using categories_pkey on categories (cost=0.14..35.89 rows=30 width=8) (actual time=0.006.
.0.040 rows=30 loops=1)
Index Cond: (id = ANY ('{11,53,57,55,54,56,112,94,107,115,116,117,97,95,102,103,101,105,99,114,106,1
13,104,98,100,96,108,109,110,111}'::integer[]))
-> Index Scan using index_topic_users_on_topic_id_and_user_id on topic_users tu (cost=0.43..0.53 rows=1 width=16) (a
ctual time=0.004..0.004 rows=0 loops=31)
Index Cond: ((topic_id = topics.id) AND (user_id = 1103877))
-> Materialize (cost=0.28..2.30 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=30)
-> Index Scan using index_category_users_on_user_id_and_last_seen_at on category_users (cost=0.28..2.29 rows=1 width
=8) (actual time=0.004..0.004 rows=0 loops=1)
Index Cond: (user_id = 1103877)
Planning Time: 1.359 ms
Execution Time: 2453.765 ms
(23 rows)
```
After:
```
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1.28..438.55 rows=30 width=12) (actual time=38.297..657.215 rows=30 loops=1)
-> Nested Loop Left Join (cost=1.28..195944.68 rows=13443 width=12) (actual time=38.296..657.211 rows=30 loops=1)
Filter: ((categories.topic_id <> topics.id) OR (topics.category_id = 11))
Rows Removed by Filter: 29
-> Nested Loop Left Join (cost=1.13..193462.59 rows=13443 width=16) (actual time=38.289..657.092 rows=59 loops=1)
Join Filter: (category_users.category_id = topics.category_id)
Filter: ((topics.category_id = 11) OR (COALESCE(category_users.notification_level, 1) <> 0) OR (tu.notification_level > 1))
-> Nested Loop Left Join (cost=0.85..193156.79 rows=13489 width=20) (actual time=38.282..657.059 rows=59 loops=1)
Filter: ((COALESCE(tu.notification_level, 1) > 0) AND ((topics.category_id <> 11) OR (topics.pinned_at IS NULL) OR ((topics.pinned_at <= tu.cleared_pinned_at) AND (tu.cleared_pinned_at IS NOT NULL))))
Rows Removed by Filter: 1
-> Index Scan using index_topics_on_bumped_at on topics (cost=0.42..134521.06 rows=40470 width=24) (actual time=38.267..656.850 rows=60 loops=1)
Filter: ((deleted_at IS NULL) AND ((archetype)::text <> 'private_message'::text) AND (category_id = ANY ('{11,53,57,55,54,56,112,94,107,115,116,117,97,95,102,103,101,105,99,114,106,113,104,98,100,96,108,109,110,111}'::integer[])))
Rows Removed by Filter: 569895
-> Index Scan using index_topic_users_on_topic_id_and_user_id on topic_users tu (cost=0.43..1.43 rows=1 width=16) (actual time=0.003..0.003 rows=0 loops=60)
Index Cond: ((topic_id = topics.id) AND (user_id = 1103877))
-> Materialize (cost=0.28..2.30 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=59)
-> Index Scan using index_category_users_on_user_id_and_last_seen_at on category_users (cost=0.28..2.29 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=1)
Index Cond: (user_id = 1103877)
-> Index Scan using categories_pkey on categories (cost=0.14..0.17 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=59)
Index Cond: (id = topics.category_id)
Planning Time: 1.633 ms
Execution Time: 657.255 ms
(22 rows)
```
* PERF: Optimize index on topics bumped_at.
Replace `index_topics_on_bumped_at` index with a partial index on `Topic#bumped_at` filtered by archetype since there is already another index that covers private topics.
We don't need no stinkin' denormalization! This commit ignores
the topic_id column on bookmarks, to be deleted at a later date.
We don't really need this column and it's better to rely on the
post.topic_id as the canonical topic_id for bookmarks, then we
don't need to remember to update both columns if the bookmarked
post moves to another topic.