We just completed the 3.2 release, which marks a good time to drop some previously deprecated columns.
The column has been listed in `ignored_columns` for some time, so it has been inaccessible to application code since then. There's a tiny risk that this might break a Data Explorer query, but given the nature of the column, the years of disuse, and the fact that such a breakage wouldn't be critical, we accept it.
When exporting a CSV file, if the size of the file exceeded
`max_export_file_size_kb`, we would still send the PM saying that the export
succeeded, with a broken link to a missing export file. This change
ensures that a failure message is sent instead.
Currently, when there is an error while exporting a list of users, we just
log that there was an error, but we don't show what the issue is in the
logs, which makes it really hard to debug in production. This change will
output any errors to the logs.
We want to exclude the system user from group user counts, since intuitively admins wouldn't include them.
Originally this was accomplished by booting said system user from the groups, but this is causing problems, because the system user needs TL group membership to perform certain tasks.
After this PR, system user is still in the TL groups, but excluded when refreshing the user count.
This introduces a new experimental hot sort ordering.
It attempts to float top conversations by first prioritizing topics with lots of recent activity (likes and users responding).
The schedule that updates hot topics is disabled unless the hidden site setting: `experimental_hot_topics` is enabled.
You can control "decay" with `hot_topic_gravity` and `recency` with `hot_topics_recent_days`
Data is stored in the new `topic_hot_scores` table and you can check it out on the `/hot` route once
enabled.
---------
Co-authored-by: Penar Musaraj <pmusaraj@gmail.com>
Why this change?
On CI, we have been seeing the "handles job concurrency" test timing out
after 45 seconds. Upon closer inspection of `Jobs::Base#perform`
when cluster concurrency has been set, we see that a thread is spun up
to extend the expiry of a Redis key to 120 seconds every 60 seconds
while the job is still being executed. The thread looks like this before
the fix:
```
keepalive_thread =
  Thread.new do
    while parent_thread.alive? && !finished
      Discourse.redis.without_namespace.expire(cluster_concurrency_redis_key, 120)
      sleep 60
    end
  end
```
In an ensure block of `Jobs::Base#perform`, the thread is stopped by doing
something like this:
```
finished = true
keepalive_thread.wakeup
keepalive_thread.join
```
If the thread is sleeping, `keepalive_thread.wakeup` will stop the
`sleep` method and run the next iteration causing the thread to
complete. However, there is a timing issue at play here. If
`keepalive_thread.wakeup` is called at a time when the thread is not
sleeping, it will have no effect and the thread may end up sleeping for
60 seconds which is longer than our timeout on CI of 45 seconds.
What does this change do?
1. Change `sleep 60` to sleep in intervals of 1 second checking if the
job has been finished each time.
2. Add `use_redis_snapshotting` to `Jobs::Base` spec since Redis is
involved in scheduling and we want to ensure we don't leak Redis
keys.
3. Add `ConcurrentJob.stop!` and `thread.join` to `ensure` block in "handles job concurrency"
test since a failing expectation will cause us to not clean up the
thread we created in the test.
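The first point can be sketched roughly as follows. This is a minimal illustration rather than the actual `Jobs::Base` code: `start_keepalive`, `tick`, and the lambdas are hypothetical names, and the Redis call is replaced by a plain block.

```ruby
# Sketch only: `start_keepalive`, `tick`, and the lambdas are illustrative names.
def start_keepalive(finished:, interval: 60, tick: 1, &extend_expiry)
  Thread.new do
    until finished.call
      extend_expiry.call
      slept = 0
      # Sleep in short slices so a finished job is noticed within about one tick.
      while slept < interval && !finished.call
        sleep tick
        slept += tick
      end
    end
  end
end

done = false
calls = 0
thread = start_keepalive(finished: -> { done }, interval: 60, tick: 0.01) { calls += 1 }
sleep 0.01 until calls >= 1 # wait for the first keepalive call
done = true
started = Time.now
thread.join
elapsed = Time.now - started
```

With this shape, `join` returns shortly after the flag is set, without relying on `wakeup` arriving while the thread happens to be sleeping.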
This commit fixes an issue where when some actions were done
(deleting/recovering post, moving posts) we updated the
topic_users.bookmarked column to the wrong value. This was happening
because the SyncTopicUserBookmarked job was not taking into account
Topic level bookmarks, so if there was a Topic bookmark and no
Post bookmarks for a user in the topic, they would have
topic_users.bookmarked set to false, which meant the bookmark would
no longer show in the /bookmarks list.
To reproduce before the fix:
* Bookmark a topic and don’t bookmark any posts within
* Delete or recover any post in the topic
c.f. https://meta.discourse.org/t/disappearing-bookmarks-and-expected-behavior-of-bookmarks/264670/36
We updated scheduled admin checks to run concurrently in their own jobs. The main reason for this was so that we can implement re-check functionality for especially flaky checks (e.g. group e-mail credentials check.)
This works in the following way:
1. The check declares its retry policy using class methods.
2. A block can be yielded to if there are problems, but before they are committed to Redis.
3. The job uses this block to either a) schedule a retry if there are any remaining or b) do nothing and let the check commit.
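As a rough sketch of the three steps above (all class, method, and key names here are hypothetical, and an in-memory array stands in for Redis):

```ruby
# Hypothetical names throughout; this only illustrates the three steps above.
class FlakyCheck
  # 1. The check declares its retry policy with class methods.
  def self.max_retries
    2
  end

  def self.retry_after
    30 # seconds
  end

  # 2. Yields the problems before they are committed. In this sketch the
  #    block's return value decides whether the commit goes ahead.
  def self.run(problems, store)
    return if problems.empty?
    should_commit = block_given? ? yield(problems) : true
    store.concat(problems) if should_commit
  end
end

# 3. The job either schedules a retry (if any remain) or lets the check commit.
def run_check_job(check, problems, store, retry_count:, scheduled:)
  check.run(problems, store) do |_found|
    if retry_count < check.max_retries
      scheduled << { check: check.name, retry_count: retry_count + 1, delay: check.retry_after }
      false # a retry was scheduled, so do not commit yet
    else
      true # no retries left, commit the problems
    end
  end
end

store = []
scheduled = []
run_check_job(FlakyCheck, ["smtp failed"], store, retry_count: 0, scheduled: scheduled)
run_check_job(FlakyCheck, ["smtp failed"], store, retry_count: 2, scheduled: scheduled)
```

Using the block's return value to signal "commit or not" is an assumption made for the sketch; the real checks may signal this differently.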
This PR does some preparatory refactoring of scheduled admin checks in order for us to be able to do custom retry strategies for some of them.
Instead of running all checks in sequence inside a single, scheduled job, the scheduled job spawns one new job per check.
In order to be concurrency-safe, we need to change the existing Redis data structure from a string (of serialized JSON) to a list of strings (of serialized JSON).
After fbe0e4c we always pass a block into these methods.
So yield inside the export methods works and there is no need
anymore to wrap them into enumerators.
So we have to order by calling `find_each(order: :desc)`.
Note that that will order rows by Id, not by `last_match_at`
as we tried before (though that didn't work).
When we receive the stream parameter, we'll queue a job that periodically publishes partial updates, and after the summarization finishes, a final one with the completed version, plus metadata.
`summary-box` listens to these updates via MessageBus, and updates state accordingly.
A previous change updated `ReviewableQueuedPost`'s `created_by`
to be consistent with other reviewable types. It assigns
the creator of the post being queued to `target_created_by` and sets
the `created_by` to the creator of the reviewable itself.
This fix updates some of the `created_by` references missed during the
initial fix.
We have a number of raw comments indicating that certain methods and classes are deprecated and marked for removal. This change turns those comments into deprecation warnings so that we can 1) see them in the logs of our own hosting and 2) give some warning to self-hosters.
We recently introduced advice that is shown to admins when some translation overrides are outdated or using unknown interpolation keys.
However we missed the case where the original translation key has been renamed or altogether removed. When this happens they are no longer visible in the admin interface, leading to the confusing situation where we say there are outdated translations, but none are shown.
Because we don't explicitly handle this case, some deleted translations were incorrectly marked as having unknown interpolation keys. (This is because I18n.t will return a string like "Translation missing: foo", which obviously has no interpolation keys inside.)
This change adds an additional status, `deprecated`, for `TranslationOverride`, and the job that checks them will check for this status first, taking precedence over `invalid_interpolation_keys`. Since the advice only checks for the `outdated` and `invalid_interpolation_keys` statuses, this fixes the problem.
This commit makes sure we don't load all data into memory when doing CSV exports.
The most important change here is made to the recently introduced export of chat
messages (3ea31f4). We were loading all data into memory in the first version; with
this commit that is no longer the case.
Speaking of old exports: some of them already use find_each, which worked as
expected, without loading all data into memory, and it will continue working as
expected after this commit.
In general, I made sure this change didn't break other CSV exports, first manually, and
then by writing system specs for them. Sadly, I haven't managed yet to make those
specs stable. They work fine locally but are flaky in GitHub Actions, so I've disabled them
for now.
I'll be making more changes to the CSV exports code soon, those system specs will be
very helpful. I'll be running them locally, and I hope I'll manage to make them stable
while doing that work.
This PR adds a feature to help admins stay up-to-date with their translations. We already have protections preventing admins from problems when they update their overrides. This change adds some protection in the other direction (where translations change in core due to an upgrade) by creating a notice for admins when defaults have changed.
Terms:
- In the case where Discourse core changes the default translation, the translation override is considered "outdated".
- In the case above where interpolation keys were changed from the ones the override is using, it is considered "invalid".
- If none of the above applies, the override is considered "up to date".
How does it work?
There are a few pieces that make this work:
- When an admin creates or updates a translation override, we store the original translation at the time of write. (This is used to detect changes later on.)
- There is a background job that runs once every day and checks for outdated and invalid overrides, and marks them as such.
- When there are any outdated or invalid overrides, a notice is shown in admin dashboard with a link to the text customization page.
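The terms above can be sketched as a small classification function. This is an illustration under assumptions, not the job's implementation: the helper names are hypothetical, and interpolation keys are assumed to use the `%{key}` form.

```ruby
# Hypothetical helpers, not Discourse's implementation.
def interpolation_keys(text)
  text.scan(/%\{(\w+)\}/).flatten.uniq
end

# `original` is the default stored when the override was written;
# `current_default` is the default shipped by core right now.
def override_status(override:, original:, current_default:)
  unknown = interpolation_keys(override) - interpolation_keys(current_default)
  return :invalid unless unknown.empty?           # override uses keys core no longer provides
  return :outdated if original != current_default # core changed the default text
  :up_to_date
end

same     = override_status(override: "Hi %{name}", original: "Hello %{name}", current_default: "Hello %{name}")
outdated = override_status(override: "Hi %{name}", original: "Hello %{name}", current_default: "Welcome %{name}")
invalid  = override_status(override: "Hi %{name}", original: "Hello %{name}", current_default: "Hello %{username}")
```

Checking for invalid keys before checking for a changed default means an override that is both outdated and invalid is reported as invalid.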
Known limitations
The link from the dashboard links to the default locale text customization page. Given there might be invalid overrides in multiple languages, I'm not sure what we could do here. Consideration for future improvement.
What is the problem?
When an admin changes the default_sidebar_categories or default_sidebar_tags site settings and opts to backfill the setting,
we currently enqueue a sidekiq job to run the backfilling operation. When an admin changes those settings multiple times
within a short time frame, multiple sidekiq jobs with different backfilling parameters will be enqueued.
This is problematic if multiple jobs are executed concurrently as it may lead to situations where a job
with “outdated” site setting values is completed after a job with the “latest” site setting values.
What is the fix?
By setting `cluster_concurrency` to `1`, we ensure that only one such
backfilling job will execute at any point in time across all the Sidekiq
processes that are deployed. Since Sidekiq pops off jobs in the order
in which they are pushed, limiting the cluster concurrency here will
allow us to execute the enqueued `Jobs::BackfillSidebarSiteSettings`
jobs serially.
While we are unable to support OAuth2 with POP3 (due to upstream dependency ruby/net-pop#16), we are adding support for mail poller plugins. With this, it becomes possible to write a plugin that uses other ways (the Microsoft Graph SDK, for example) to poll emails from a mailbox.
The idea is that a plugin defines a class which inherits from `Email::Poller` and defines a `poll_mailbox` static method which returns an array of strings. The plugin can then call `register_mail_poller(<class_name>)` to have it registered. All the configuration (OAuth2 tokens, email, etc.) can be managed by site settings defined in the plugin.
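A plugin-side poller might look roughly like the sketch below. To keep it runnable on its own, `Email::Poller` and `register_mail_poller` are replaced by local stand-ins; `GraphMailPoller` and `fetch_raw_messages` are hypothetical names, and no real mail API is called.

```ruby
# Stand-ins so the sketch runs on its own. In a real plugin, Email::Poller and
# register_mail_poller come from Discourse and are not defined like this.
module Email
  class Poller
  end
end

REGISTERED_POLLERS = []

def register_mail_poller(klass)
  REGISTERED_POLLERS << klass
end

# Hypothetical poller that would fetch mail through some non-POP3 API.
class GraphMailPoller < Email::Poller
  # Returns an array of strings, as described above.
  def self.poll_mailbox
    fetch_raw_messages
  end

  # Placeholder for a real API call (assumption: it yields raw email strings).
  def self.fetch_raw_messages
    ["From: a@example.com\r\nSubject: Hi\r\n\r\nHello"]
  end
end

register_mail_poller(GraphMailPoller)
emails = REGISTERED_POLLERS.flat_map(&:poll_mailbox)
```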
This change adds support for retroactively updating display names in the new quote format when the user's name is changed. It happens through a background job that is triggered by a callback when a user is saved with a new name.
https://meta.discourse.org/t/markdown-preview-and-result-differ/263878
The result of this markdown had different results in the composer preview and the post. This is solved by updating Loofah to the latest version and using html5 fragments like our user had reported. While the change was only needed in cooked_post_processor.rb for this fix, other areas also had to be updated due to various side effects.
When we introduced the new quote format with full-name display name:
```
[quote="Ted Johansson, post:1, topic:2, username:ted"]
```
we overlooked the code responsible for rewriting quotes when a user's name is changed.
The functional part of this change adds support for the new quote format in the code that updates quotes when a user's username changes. See the test case in `spec/services/username_changer_spec.rb` for the details.
In addition, this change adds a regression test for PrettyText to cover the new quote format, and extracts the code responsible for rewriting raw and cooked quotes into its own `QuoteRewriter` class. The functionality of the latter is tested through the tests in `spec/services/username_changer_spec.rb`.
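To illustrate what rewriting the new format involves, here is a minimal regex-based sketch. It is not the `QuoteRewriter` implementation; `rewrite_quote_username` is a hypothetical helper that only touches the `username:` key of raw quotes.

```ruby
# Hypothetical helper; not the QuoteRewriter implementation.
def rewrite_quote_username(raw, old_username:, new_username:)
  # Group 1: everything up to and including "username:"; group 2: the closing `"]`.
  pattern = /(\[quote="[^"\]]*\busername:)#{Regexp.escape(old_username)}(\s*"\])/
  raw.gsub(pattern) { "#{$1}#{new_username}#{$2}" }
end

raw = %([quote="Ted Johansson, post:1, topic:2, username:ted"]\nhello\n[/quote])
updated = rewrite_quote_username(raw, old_username: "ted", new_username: "theodore")

# A quote from a user whose username merely starts with "ted" is left alone.
other = %([quote="Teddy Bear, post:3, topic:2, username:teddy"])
untouched = rewrite_quote_username(other, old_username: "ted", new_username: "theodore")
```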
Anonymization is among the most expensive operations we can perform with
extreme potential to impact the database. To mitigate risk we only allow a
single anonymization across the entire cluster concurrently.
This commit introduces support for `cluster_concurrency 1`. When you set that on a Job it will only allow 1 concurrent execution per cluster.
On the client-side, message-bus subscriptions and the reviewable count UI are based on the 'redesigned_user_menu_enabled' boolean. We need to use the same logic on the server-side to ensure things work correctly when legacy navigation is used alongside the new user menu.
2e78045a fixed the anonymization job so that it correctly updated self-mentions, which are not logged in the post_actions table. The solution was to scan the entire `posts` table with a `raw ILIKE` query. On sites with many posts, this can take a very long time.
This commit updates the job to take a two-pass approach:
First, we update posts based on the post_actions table. This is much more efficient than a full table scan, and takes care of all 'non-self' mentions.
Then, we make a second pass using the `raw ILIKE` approach. Since we already took care of most posts, we can scope this down to self-mentions only. By filtering the query to a specific posts.user_id, it is significantly more performant than a full table scan.
Currently, we’re performing a check when a user is suspended in the
`UserEmail` job and we’re assuming a `post` is always available, which
is not the case. The code indeed breaks when the job is called with the
`account_suspended` type option.
This patch fixes this issue by making the check use the safe navigation
operator, so that it works when `post` is not provided.
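For readers unfamiliar with the operator, a small sketch of the pattern follows. The `Post` and `Topic` structs and the `group_pm?` helper are hypothetical stand-ins and do not mirror the actual check in the `UserEmail` job.

```ruby
# Hypothetical stand-ins for the real models.
Topic = Struct.new(:group_pm)
Post = Struct.new(:topic)

# `post` may be nil, e.g. when the job runs for the `account_suspended` type.
def group_pm?(post)
  # `&.` short-circuits to nil instead of raising NoMethodError on a nil receiver.
  post&.topic&.group_pm ? true : false
end

without_post = group_pm?(nil)
with_post = group_pm?(Post.new(Topic.new(true)))
```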
Currently, when a suspended user belongs to a group PM (private message
with more than two people in it) and a staff member sends a message to
this group PM, then the suspended user will receive an email.
This happens because a suspended user can only receive emails from staff
members. But in this case, this can be seen as a bug as the expected
behavior would be instead to not send any email to the suspended user. A
staff member can participate in active discussions like any other
member and so their messages in this context shouldn’t be treated
differently than the ones from regular users.
This patch addresses this issue by checking if a suspended user receives
a message from a group PM or not. If that’s the case then an email won’t
be sent no matter if the post originated from a staff member or not.
* UX: add type tag and design update
* UX: clarify status copy in reviewQ
* DEV: switch to selectKit
* UX: color approve/reject buttons in RQ
* DEV: regroup actions
* Join questions for flagged post with "or" with new I18n function
* Move ReviewableScores component out of context
* Add CSS classes to reviewable-item based on human type
* UX: add table header for scoring
* UX: don't display % score
* UX: prefix modifier class with dash
* UX: reviewQ flag table styling
* UX: consistent use of ignore icon
* DEV: only show context question on pending status
* UX: only show table headers on pending status
* DEV: reviewQ regroup actions for hidden posts
* UX: reviewQ > approve/reject buttons
* UX: reviewQ add fadeout
* UX: reviewQ styling
* DEV: move scores back into component
* UX: reviewQ mobile styling
* UX: score table on mobile
* UX: reviewQ > move meta info outside table
* UX: reviewQ > score layout fixes
* DEV: readd `agree_and_keep` and fix the spec tests.
* Fix the spec tests
* fix the qunit test
* DEV: readd deleting replies
* UX: reviewQ copy tweaks
* DEV: readd test for ignore + delete replies
* Remove old
* FIX: Add perform_ignore back in for backwards compat
* DEV: add an action alias `ignore` for `ignore_and_do_nothing`.
---------
Co-authored-by: Martin Brennan <martin@discourse.org>
Co-authored-by: Vinoth Kannan <svkn.87@gmail.com>
Forcing the distributed mutex to raise when a notify reviewable job is running
leads to excessive errors in the logs under many conditions.
The new pattern
1. Optimises the counting of reviewables so it is a lot faster
2. Holds the distributed lock for 2 minutes (max)
The downside is the job queue can get blocked up when tons of notify
reviewables are running at the same time. However this should be very
rare in the real world, as we only notify when stuff is flagged which
is fairly infrequent.
This also gives a fair bit more time for the notifications, which may be
a little slow on large sites with tons of mods.
The #pluck_first freedom patch, first introduced by @danielwaterworth, has served us well, and is used widely throughout both core and plugins. It seems to have been a common enough use case that Rails 6 introduced its own method #pick with the exact same implementation. This allows us to retire the freedom patch and switch over to the built-in ActiveRecord method.
There is no replacement for #pluck_first!, but a quick search shows we are using this in a very limited capacity, and in some cases incorrectly (by assuming a nil return rather than an exception), which can quite easily be replaced with #pick plus some extra handling.
Posts with self-mentions aren't updated with username updates. This happens
because mention `UserAction` entries aren't logged for self-mentions.
This change updates the lookup of `Post` and `PostRevision` with mentions to bypass
`UserAction` entries.
Under scenarios of extremely high load where large numbers of `Reviewable*` items are being created, it has been observed that multiple instances of the `NotifyReviewable` job may run simultaneously.
These jobs will work satisfactorily if the concurrency is limited to 1, and the different types of jobs (items reviewable by admins, vs moderators, vs particular groups, etc.) are run eventually.
This change introduces a new option to `DistributedMutex` which allows the `max_get_lock_attempts` to be specified. If the number is exceeded an error will be raised, which will cause Sidekiq to requeue the job. Sidekiq has existing logic to back-off on retry times for jobs that have failed multiple times.
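The bounded-attempts idea can be sketched with an in-process `Mutex` standing in for the distributed lock. `with_lock` and `LockAttemptsExceeded` are hypothetical names, and this is not the `DistributedMutex` implementation; only the `max_get_lock_attempts` option name comes from the change itself.

```ruby
class LockAttemptsExceeded < StandardError; end

# Stand-in for a distributed lock, using an in-process Mutex.
def with_lock(mutex, max_get_lock_attempts:, wait: 0.01)
  attempts = 0
  until mutex.try_lock
    attempts += 1
    # Give up with an error so the caller (e.g. a job runner) can retry later.
    raise LockAttemptsExceeded, "gave up after #{attempts} attempts" if attempts >= max_get_lock_attempts
    sleep wait
  end
  begin
    yield
  ensure
    mutex.unlock
  end
end

free = Mutex.new
result = with_lock(free, max_get_lock_attempts: 3) { :done }

held = Mutex.new
held.lock # simulate another job holding the lock
gave_up = false
begin
  with_lock(held, max_get_lock_attempts: 2, wait: 0) { :never }
rescue LockAttemptsExceeded
  gave_up = true
end
```

Raising instead of waiting indefinitely hands the retry timing over to the job system, which matches the reasoning above about Sidekiq's back-off.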
So it can easily be overwritten in a plugin for example.
### Added more tests to provide better coverage
We previously only had `u.silenced_till IS NULL` but I made it consistent with pretty much every other place where we check for "active" users.
These two new lines do change the query a tiny bit though.
**Before**
- You could not get the badge if you were currently silenced (no matter what period is being checked)
- You could get the badge if you were suspended 😬
**After**
- You can't get the badge if you were silenced during the past year
- You can't get the badge if you were suspended during the past year
### Improved the performance of the query by using `NOT EXISTS` instead of `LEFT JOIN / COUNT() = 0`
There is no difference in behaviour between
```sql
LEFT JOIN user_badges AS ub ON ub.user_id = u.id AND ...
[...]
HAVING COUNT(ub.*) = 0
```
and
```sql
NOT EXISTS (SELECT 1 FROM user_badges AS ub WHERE ub.user_id = u.id AND ...)
```
The only difference is performance-wise. The `NOT EXISTS` is 10-30% faster on very large databases (aka. posts and users in X millions). I checked on 3 of the largest datasets I could find.
```
class Jobs::DummyDelayedJob < Jobs::Base
  def execute(args = {})
  end
end

RSpec.describe "Jobs.run_immediately!" do
  before { Jobs.run_immediately! }

  it "explodes" do
    current_user = Fabricate(:user)
    Jobs.enqueue_in(1.seconds, :dummy_delayed_job)
    sign_in(current_user)
  end
end
```
The test above will fail with the following error if `ActiveRecord::Base.connection_handler.clear_active_connections!` is called before the configured Capybara server checks out a connection from the connection pool.
```
ActiveRecord::ActiveRecordError:
Cannot expire connection, it is owned by a different thread: #<Thread:0x00007f437391df58@puma srv tp 001 /home/tgxworld/.asdf/installs/ruby/3.1.3/lib/ruby/gems/3.1.0/gems/puma-6.0.2/lib/puma/thread_pool.rb:106 sleep_forever>. Current thread: #<Thread:0x00007f437d6cfc60 run>.
```
We're not exactly sure if this is an ActiveRecord bug or not but we've
invested too much time into investigating this problem. Fundamentally,
we also no longer understand why `ActiveRecord::Base.connection_handler.clear_active_connections!` is being called in an ensure block
within `Jobs::Base#perform` which was added in
ceddb6e0da 10 years ago. This
commit moves the logic for running jobs immediately out of the
`Jobs::Base#perform` method into another `Jobs::Base#perform_immediately` method such that
`ActiveRecord::Base.connection_handler.clear_active_connections!` is not
called. This change will only impact the test environment.
When sending emails out via group SMTP, if we
are sending them to non-staged users we want
to mask those emails with BCC, just so we don't
expose them to anyone we shouldn't. Staged users
are ones that have likely only interacted with
support via email, and will likely include other
people who were CC'd on the original email to the
group.
Co-authored-by: Martin Brennan <martin@discourse.org>
* DEV: Skip push notifications for active online users
Currently, users with active push subscriptions get push notifications
regardless of their "presence" on the site.
This change introduces a `push_notification_time_window_mins`
site setting which is used in conjunction with a user's `last_seen_at` to
determine if push notifications should be sent. A user is considered to
be actively online if their `last_seen_at` is within `push_notification_time_window_mins`
minutes. `push_notification_time_window_mins` is set to 10 by default.
* DEV: Remove client param for push_notification_time_window_mins site setting
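The presence check described above can be sketched as follows. `send_push_notification?` is a hypothetical helper, and treating a user with no `last_seen_at` as offline is an assumption of the sketch.

```ruby
# Hypothetical helper; `window_mins` mirrors push_notification_time_window_mins.
def send_push_notification?(last_seen_at, window_mins: 10, now: Time.now)
  return true if last_seen_at.nil? # assumption: never-seen users count as offline
  # Only push when the user was last seen before the start of the window.
  last_seen_at < now - window_mins * 60
end

now = Time.at(1_000_000)
active_user = send_push_notification?(now - 5 * 60, now: now)
idle_user = send_push_notification?(now - 15 * 60, now: now)
```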
Co-authored-by: Bianca Nenciu <nbianca@users.noreply.github.com>
This commit adds a new notification that gets sent to admins when the site gets new features after an upgrade/deploy. Clicking on the notification takes the admin to the admin dashboard at `/admin` where they can see the new features under the "New Features" section.
Internal topic: t/87166.
This new site setting replaces the
`enable_experimental_sidebar_hamburger` and `enable_sidebar` site
settings as the sidebar feature exits the experimental phase.
Note that we're replacing this without deprecation since the previous
site setting was considered experimental.
Internal Ref: /t/86563
Users who can access the review queue can claim pending reviewables, which means that a claimed reviewable can only be handled by the user who claimed it. Currently, we show claimed reviewables in the user menu, but this can be annoying for other reviewers because they can't do anything about a reviewable claimed by someone else. So this PR makes sure that the user menu only shows reviewables that are claimed by nobody or claimed by the current user.
Internal topic: t/77235.
While load testing our user creation code path in production, we
identified that executing the DB statement to update the `Group#user_count` column within a
transaction is creating a bottleneck for us. This is because the
creation of a user and addition of the user to the relevant groups are
done in a transaction. When we execute the DB statement to update
`Group#user_count` for the relevant group, a row level lock is held
until the transaction completes. This row level lock acts like a global
lock when the server is creating users that will be added to the same
group in quick succession.
Instead of updating the counter cache within a transaction which the
default ActiveRecord `counter_cache` option does, we simply update the
counter cache outside of the committing transaction.
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
The previous sidebar default tags and categories implementation did not
allow for a user to configure their sidebar to have no categories or
tags. This commit changes how the defaults are applied. When a user is being created,
we create the SidebarSectionLink records based on the `default_sidebar_categories` and
`default_sidebar_tags` site settings. SidebarSectionLink records are
only created for categories and tags which the user has visibility on at
the point of user creation.
With this change, we're also adding the ability for admins to apply
changes to the `default_sidebar_categories` and `default_sidebar_tags`
site settings historically when changing their site setting. When a new
category/tag has been added to the default, the new category/tag will be
added to the sidebar for all users if the admin elects to apply the changes historically.
Likewise, when a tag/category is removed, the tag/category will be
removed from the sidebar for all users if the admin elects to apply the
changes historically.
Internal Ref: /t/73500
Previously, when categories were not muted by default, we were sending a message about unmuted topics (topics for which the user explicitly set the notification level to watching).
The same mechanism can be used to fix a bug: when the user was explicitly watching a topic but the category was muted, the user was not informed about new replies.
Skipped invites were not counted at all and some invites could generate
more than one error and resulted in a grand total that was not equal to
the count of bulk invites.
This commit renames all secure_media related settings to secure_uploads_* along with the associated functionality.
This is being done because "media" does not really cover it, we aren't just doing this for images and videos etc. but for all uploads in the site.
Additionally, in future we want to secure more types of uploads, and enable a kind of "mixed mode" where some uploads are secure and some are not, so keeping media in the name is just confusing.
This also keeps compatibility with the `secure-media-uploads` path, and changes new
secure URLs to be `secure-uploads`.
Deprecated settings:
* secure_media -> secure_uploads
* secure_media_allow_embed_images_in_emails -> secure_uploads_allow_embed_images_in_emails
* secure_media_max_email_embed_image_size_kb -> secure_uploads_max_email_embed_image_size_kb
We previously had a system which would generate a 10x10px preview of images and add their URLs in a data-small-upload attribute. The client would then use that as the background-image of the `<img>` element. This works reasonably well on fast connections, but on slower connections it can take a few seconds for the placeholders to appear. The act of loading the placeholders can also break or delay the loading of the 'real' images.
This commit replaces the placeholder logic with a new approach. Instead of a 10x10px preview, we use imagemagick to calculate the average color of an image and store it in the database. The hex color value is then added as a `data-dominant-color` attribute on the `<img>` element, and the client can use this as a `background-color` on the element while the real image is loading. That means no extra HTTP request is required, and so the placeholder color can appear instantly.
Dominant color will be calculated:
1. When a new upload is created
2. During a post rebake, if the dominant color is missing from an upload, it will be calculated and stored
3. Every 15 minutes, 25 old upload records are fetched and their dominant color calculated and stored. (part of the existing PeriodicalUpdates job)
Existing posts will continue to use the old 10x10px placeholder system until they are next rebaked
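To make the "average color as a hex value" idea concrete, here is a small sketch on a plain array of RGB pixels. The commit itself uses imagemagick, so this is only an illustration; `average_color_hex` is a hypothetical helper and the exact output format (no leading `#`, uppercase) is an assumption.

```ruby
# Illustrative only: averages each channel over a list of [r, g, b] pixels.
def average_color_hex(pixels)
  raise ArgumentError, "no pixels" if pixels.empty?
  sums = pixels.reduce([0, 0, 0]) { |acc, (r, g, b)| [acc[0] + r, acc[1] + g, acc[2] + b] }
  # Round each channel's mean and format it as two hex digits.
  sums.map { |sum| format("%02X", (sum.to_f / pixels.size).round) }.join
end

single = average_color_hex([[18, 52, 86]])
mixed = average_color_hex([[0, 0, 0], [255, 255, 255]])
```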
We observed that some sites seemingly put us in a tarpit when we attempt to pull hotlinked images. Increasing the timeout will help in these situations.
When a topic was published from a shared draft and it had tags, the
users watching the tags were not notified. The problem was that the
topic is usually created in a secret category and publishing it just
moves the existing topic to the target category, without making any
changes to the tags.
It's already included in the `ignored_columns` list in the group model. 03ffb0bf27/app/models/group.rb (L9)
Also, removed the `MigrateGroupFlairImages` onceoff job and spec.
This table holds associations between uploads and other models. This can be used to prevent removing uploads that are still in use.
* DEV: Create upload_references
* DEV: Use UploadReference instead of PostUpload
* DEV: Use UploadReference for SiteSetting
* DEV: Use UploadReference for Badge
* DEV: Use UploadReference for Category
* DEV: Use UploadReference for CustomEmoji
* DEV: Use UploadReference for Group
* DEV: Use UploadReference for ThemeField
* DEV: Use UploadReference for ThemeSetting
* DEV: Use UploadReference for User
* DEV: Use UploadReference for UserAvatar
* DEV: Use UploadReference for UserExport
* DEV: Use UploadReference for UserProfile
* DEV: Add method to extract uploads from raw text
* DEV: Use UploadReference for Draft
* DEV: Use UploadReference for ReviewableQueuedPost
* DEV: Use UploadReference for UserProfile's bio_raw
* DEV: Do not copy user uploads to upload references
* DEV: Copy post uploads again after deploy
* DEV: Use created_at and updated_at from uploads table
* FIX: Check if upload site setting is empty
* DEV: Copy user uploads to upload references
* DEV: Make upload extraction less strict
This commit introduces a new site setting: `block_hotlinked_media`. When enabled, all attempts to hotlink media (images, videos, and audio) will fail, and be replaced with a linked placeholder. Exceptions to the rule can be added via `block_hotlinked_media_exceptions`.
`download_remote_image_to_local` can be used alongside this feature. In that case, hotlinked images will be blocked immediately when the post is created, but will then be replaced with the downloaded version a few seconds later.
This implementation is purely server-side, and does not impact the composer preview.
Technically, there are two stages to this feature:
1. `PrettyText.sanitize_hotlinked_media` is called during `PrettyText.cook`, and whenever new images are introduced by Onebox. It will iterate over all src/srcset attributes in the post HTML and check if they're allowed. If not, the attributes will be removed and replaced with a `data-blocked-hotlinked-src(set)` attribute
2. In the `CookedPostProcessor`, we iterate over all `data-blocked-hotlinked-src(set)` attributes and check whether we have a downloaded version of the media. If yes, we update the src to use the downloaded version. If not, the entire media element is replaced with a placeholder. The placeholder is labelled 'external media', and is a link to the offsite media.
Previously, with the default `editing_grace_period`, hotlinked images were pulled 5 minutes after a post is created. This delay was added to reduce the chance of automated edits clashing with user edits.
This commit refactors things so that we can pull hotlinked images immediately. URLs are immediately updated in the post's `cooked` HTML. The post's raw markdown is updated later, after the `editing_grace_period`.
This involves a number of behind-the-scenes changes including:
- Schedule Jobs::PullHotlinkedImages immediately after Jobs::ProcessPost. Move scheduling to after the `update_column` call to avoid race conditions
- Move raw changes into a separate job, which is delayed until after the ninja-edit window
- Move disable_if_low_on_disk_space logic into the `pull_hotlinked_images` job
- Move raw-parsing/replacing logic into `InlineUpload` so it can easily be shared between `UpdateHotlinkedRaw` and `PullUserProfileHotlinkedImages`
Previously this mapping of **cooked** images was only being run for oneboxes. Now it runs for all images, so we can transform hotlinked images without needing to immediately update `raw`
Incorporates learnings from /t/64227:
* Changes the code to set access control posts in the rake
task to be an efficient UPDATE SQL query.
The original version was timing out with 312017 post uploads,
the new query took ~3s to run.
* Changes the code to mark uploads as secure/not secure in
the rake task to be an efficient UPDATE SQL query rather than
using UploadSecurity. This took a very long time previously,
and now takes only a few seconds.
* Spread out ACL syncing for uploads into jobs with batches of
100 uploads at a time, so they can be parallelized instead
of having to wait ~1.25 seconds for each ACL to be changed
in S3 serially.
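The batching in the last point can be sketched like this. The job name and the array-based queue are hypothetical stand-ins for enqueuing real Sidekiq jobs.

```ruby
BATCH_SIZE = 100

# Hypothetical enqueue helper: one job payload per batch of upload ids.
def enqueue_acl_sync(upload_ids, queue)
  upload_ids.each_slice(BATCH_SIZE) do |batch|
    queue << { job: :sync_acls_for_uploads, upload_ids: batch }
  end
  queue
end

queue = enqueue_acl_sync((1..250).to_a, [])
batch_sizes = queue.map { |payload| payload[:upload_ids].size }
```

Each payload can then be processed by a separate worker, so the per-ACL latency is paid in parallel rather than serially.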
One issue that still remains is post rebaking. Doing this serially
is painfully slow. We have a way to do this in sidekiq via PeriodicalUpdates
but this is limited by max_old_rebakes_per_15_minutes. It would
be better to fan this rebaking out into jobs like we did for the
ACL sync, but that should be done in another PR.
This commit migrates all bookmarks to be polymorphic (using the
bookmarkable_id and bookmarkable_type columns). It also deletes
all the old code guarded behind the use_polymorphic_bookmarks setting
and changes that setting to true for all sites and by default for
the sake of plugins.
No data is deleted in the migrations, the old post_id and for_topic
columns for bookmarks will be dropped later on.
We have not used anything related to bookmarks for PostAction
or UserAction records since 2020, bookmarks are their own thing
now. Deleting all this is just cleaning up old cruft.
A bit of a mixed bag, this addresses several edge areas of bookmarks and makes them compatible with polymorphic bookmarks (hidden behind the `use_polymorphic_bookmarks` site setting). The main ones are:
* ExportUserArchive compatibility
* SyncTopicUserBookmarked job compatibility
* Sending different notifications for the bookmark reminders based on the bookmarkable type
* Import scripts compatibility
* BookmarkReminderNotificationHandler compatibility
This PR also refactors the `register_bookmarkable` API so it accepts a class descended from a `BaseBookmarkable` class instead. This was done because we kept having to add more and more lambdas/properties inline and it was very messy, so a factory pattern is cleaner. The classes can be tested independently as well.
Some later PRs will address some other areas like the discourse narrative bot, advanced search, reports, and the .ics endpoint for bookmarks.
This will make future changes to the 'pull hotlinked images' system easier. This commit should not introduce any functional change.
For now, the old post_custom_field data is kept in the database. This will be dropped in a future commit.