Anonymization is among the most expensive operations we can perform with
extreme potential to impact the database. To mitigate risk we only allow a
single anonymization across the entire cluster concurrently.
This commit introduces support for `cluster_concurrency 1`. When you set that on a Job it will only allow 1 concurrent execution per cluster.
On the client-side, message-bus subscriptions and reviewable count UI is based on the 'redesigned_user_menu_enabled' boolean. We need to use the same logic on the server-side to ensure things work correctly when legacy navigation is used alongside the new user menu.
2e78045a fixed the anonymization job so that it correctly updated self-mentions, which are not logged in the post_actions table. The solution was to scan the entire `posts` table with an `raw ILIKE` query. On sites with many posts, this can take a very long time.
This commit updates the job to take a two-pass approach:
First, we update posts based on the post_actions table. This is much more efficient than a full table scan, and takes care of all 'non-self' mentions.
Then, we make a second pass using the `raw ILIKE` approach. Since we already took care of most posts, we can scope this down to self-mentions only. By filtering the query to a specific posts.user_id, it is significantly more performant than a full table scan.
Currently, we’re performing a check when a user is suspended in the
`UserEmail` job and we’re assuming a `post` is always available, which
is not the case. The code indeed breaks when the job is called with the
`account_suspended` type option.
This patch fixes this issue by making the check use the safe navigation
operator, thus making it working when `post` is not provided.
Currently, when a suspended user belongs to a group PM (private message
with more than two people in it) and a staff member sends a message to
this group PM, then the suspended user will receive an email.
This happens because a suspended user can only receive emails from staff
members. But in this case, this can be seen as a bug as the expected
behavior would be instead to not send any email to the suspended user. A
staff member can participate in active discussions like any other
member and so their messages in this context shouldn’t be treated
differently than the ones from regular users.
This patch addresses this issue by checking if a suspended user receives
a message from a group PM or not. If that’s the case then an email won’t
be sent no matter if the post originated from a staff member or not.
Forcing distributed muted to raise when a notify reviewable job is running
leads to excessive errors in the logs under many conditions.
The new pattern
1. Optimises the counting of reviewables so it is a lot faster
2. Holds the distributed lock for 2 minutes (max)
The downside is the job queue can get blocked up when tons of notify
reviewables are running at the same time. However this should be very
rare in the real world, as we only notify when stuff is flagged which
is fairly infrequent.
This also give a fair bit more time for the notifications which may be
a little slow on large sites with tons of mods.
The #pluck_first freedom patch, first introduced by @danielwaterworth has served us well, and is used widely throughout both core and plugins. It seems to have been a common enough use case that Rails 6 introduced it's own method #pick with the exact same implementation. This allows us to retire the freedom patch and switch over to the built-in ActiveRecord method.
There is no replacement for #pluck_first!, but a quick search shows we are using this in a very limited capacity, and in some cases incorrectly (by assuming a nil return rather than an exception), which can quite easily be replaced with #pick plus some extra handling.
Posts with self-mentions aren't updated with username updates. This happens
because mention `UserAction` entries aren't logged for self-mentions.
This change updates the lookup of `Post` and `PostRevision` with mentions to bypass
`UserAction` entries.
Under scenarios of extremely high load where large numbers of `Reviewable*` items are being created, it has been observed that multiple instances of the `NotifyReviewable` job may run simultaneously.
These jobs will work satisfactorily if the concurrency is limited to 1, and the different types of jobs (items reviewable by admins, vs moderators, vs particular groups, etc.) are run eventually.
This change introduces a new option to `DistributedMutex` which allows the `max_get_lock_attempts` to be specified. If the number is exceeded an error will be raised, which will cause Sidekiq to requeue the job. Sidekiq has existing logic to back-off on retry times for jobs that have failed multiple times.
When sending emails out via group SMTP, if we
are sending them to non-staged users we want
to mask those emails with BCC, just so we don't
expose them to anyone we shouldn't. Staged users
are ones that have likely only interacted with
support via email, and will likely include other
people who were CC'd on the original email to the
group.
Co-authored-by: Martin Brennan <martin@discourse.org>
* DEV: Skip push notifications for active online users
Currently, users with active push subscriptions get push notifications
regardless of their "presence" on the site.
This change introduces a `push_notification_time_window_mins`
site setting which is used in conjunction with a user's `last_seen_at` to
determine if push notifications should be sent. A user is considered to
be actively online if their `last_seen_at` is within `push_notification_time_window_mins`
minutes. `push_notification_time_window_mins` is set to 10 by default.
* DEV: Remove client param for push_notification_time_window_mins site setting
Co-authored-by: Bianca Nenciu <nbianca@users.noreply.github.com>
Co-authored-by: Bianca Nenciu <nbianca@users.noreply.github.com>
This new site setting replaces the
`enable_experimental_sidebar_hamburger` and `enable_sidebar` site
settings as the sidebar feature exits the experimental phase.
Note that we're replacing this without depreciation since the previous
site setting was considered experimental.
Internal Ref: /t/86563
Users who can access the review queue can claim a pending reviewable(s) which means that the claimed reviewable(s) can only be handled by the user who claimed it. Currently, we show claimed reviewables in the user menu, but this can be annoying for other reviewers because they can't do anything about a reviewable claimed by someone. So this PR makes sure that we only show in the user menu reviewables that are claimed by nobody or claimed by the current user.
Internal topic: t/77235.
While load testing our user creation code path in production, we
identified that executing the DB statement to update the `Group#user_count` column within a
transaction is creating a bottleneck for us. This is because the
creation of a user and addition of the user to the relevant groups are
done in a transaction. When we execute the DB statement to update
`Group#user_count` for the relevant group, a row level lock is held
until the transaction completes. This row level lock acts like a global
lock when the server is creating users that will be added to the same
group in quick succession.
Instead of updating the counter cache within a transaction which the
default ActiveRecord `counter_cache` option does, we simply update the
counter cache outside of the committing transaction.
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
The previous sidebar default tags and categories implementation did not
allow for a user to configure their sidebar to have no categories or
tags. This commit changes how the defaults are applied. When a user is being created,
we create the SidebarSectionLink records based on the `default_sidebar_categories` and
`default_sidebar_tags` site settings. SidebarSectionLink records are
only created for categories and tags which the user has visibility on at
the point of user creation.
With this change, we're also adding the ability for admins to apply
changes to the `default_sidebar_categories` and `default_sidebar_tags`
site settings historically when changing their site setting. When a new
category/tag has been added to the default, the new category/tag will be
added to the sidebar for all users if the admin elects to apply the changes historically.
Like wise when a tag/category is removed, the tag/category will be
removed from the sidebar for all users if the admin elects to apply the
changes historically.
Internal Ref: /t/73500
Previously, when categories were not muted by default, we were sending message about unmuted topics (topics which user explicitly set notification level to watching)
The same mechanism can be used to fix a bug. When the user was explicitly watching topic, but category was muted, then the user was not informed about new reply.
Skipped invites were not counted at all and some invites could generate
more than one error and resulted in a grand total that was not equal to
the count of bulk invites.
This commit renames all secure_media related settings to secure_uploads_* along with the associated functionality.
This is being done because "media" does not really cover it, we aren't just doing this for images and videos etc. but for all uploads in the site.
Additionally, in future we want to secure more types of uploads, and enable a kind of "mixed mode" where some uploads are secure and some are not, so keeping media in the name is just confusing.
This also keeps compatibility with the `secure-media-uploads` path, and changes new
secure URLs to be `secure-uploads`.
Deprecated settings:
* secure_media -> secure_uploads
* secure_media_allow_embed_images_in_emails -> secure_uploads_allow_embed_images_in_emails
* secure_media_max_email_embed_image_size_kb -> secure_uploads_max_email_embed_image_size_kb
We observed that some sites seemingly put us in a tarpit when we attempt to pull hotlinked images. Increasing the timeout will help in these situations.
When a topic was published from a shared draft and it had tags, the
users watching the tags were not notified. The problem was that the
topics are usually created in a secret category and publishing it just
moves an existent topic to the target category, without making any
changes to the tags.
This commit introduces a new site setting: `block_hotlinked_media`. When enabled, all attempts to hotlink media (images, videos, and audio) will fail, and be replaced with a linked placeholder. Exceptions to the rule can be added via `block_hotlinked_media_exceptions`.
`download_remote_image_to_local` can be used alongside this feature. In that case, hotlinked images will be blocked immediately when the post is created, but will then be replaced with the downloaded version a few seconds later.
This implementation is purely server-side, and does not impact the composer preview.
Technically, there are two stages to this feature:
1. `PrettyText.sanitize_hotlinked_media` is called during `PrettyText.cook`, and whenever new images are introduced by Onebox. It will iterate over all src/srcset attributes in the post HTML and check if they're allowed. If not, the attributes will be removed and replaced with a `data-blocked-hotlinked-src(set)` attribute
2. In the `CookedPostProcessor`, we iterate over all `data-blocked-hotlinked-src(set)` attributes and check whether we have a downloaded version of the media. If yes, we update the src to use the downloaded version. If not, the entire media element is replaced with a placeholder. The placeholder is labelled 'external media', and is a link to the offsite media.
Previously, with the default `editing_grace_period`, hotlinked images were pulled 5 minutes after a post is created. This delay was added to reduce the chance of automated edits clashing with user edits.
This commit refactors things so that we can pull hotlinked images immediately. URLs are immediately updated in the post's `cooked` HTML. The post's raw markdown is updated later, after the `editing_grace_period`.
This involves a number of behind-the-scenes changes including:
- Schedule Jobs::PullHotlinkedImages immediately after Jobs::ProcessPost. Move scheduling to after the `update_column` call to avoid race conditions
- Move raw changes into a separate job, which is delayed until after the ninja-edit window
- Move disable_if_low_on_disk_space logic into the `pull_hotlinked_images` job
- Move raw-parsing/replacing logic into `InlineUpload` so it can be easily be shared between `UpdateHotlinkedRaw` and `PullUserProfileHotlinkedImages`
Previously this mapping of **cooked** images was only being run for oneboxes. Now it runs for all images, so we can transform hotlinked images without needing to immediately update `raw`
Incorporates learnings from /t/64227:
* Changes the code to set access control posts in the rake
task to be an efficient UPDATE SQL query.
The original version was timing out with 312017 post uploads,
the new query took ~3s to run.
* Changes the code to mark uploads as secure/not secure in
the rake task to be an efficient UPDATE SQL query rather than
using UploadSecurity. This took a very long time previously,
and now takes only a few seconds.
* Spread out ACL syncing for uploads into jobs with batches of
100 uploads at a time, so they can be parallelized instead
of having to wait ~1.25 seconds for each ACL to be changed
in S3 serially.
One issue that still remains is post rebaking. Doing this serially
is painfully slow. We have a way to do this in sidekiq via PeriodicalUpdates
but this is limited by max_old_rebakes_per_15_minutes. It would
be better to fan this rebaking out into jobs like we did for the
ACL sync, but that should be done in another PR.
This commit migrates all bookmarks to be polymorphic (using the
bookmarkable_id and bookmarkable_type) columns. It also deletes
all the old code guarded behind the use_polymorphic_bookmarks setting
and changes that setting to true for all sites and by default for
the sake of plugins.
No data is deleted in the migrations, the old post_id and for_topic
columns for bookmarks will be dropped later on.
We have not used anything related to bookmarks for PostAction
or UserAction records since 2020, bookmarks are their own thing
now. Deleting all this is just cleaning up old cruft.
A bit of a mixed bag, this addresses several edge areas of bookmarks and makes them compatible with polymorphic bookmarks (hidden behind the `use_polymorphic_bookmarks` site setting). The main ones are:
* ExportUserArchive compatibility
* SyncTopicUserBookmarked job compatibility
* Sending different notifications for the bookmark reminders based on the bookmarkable type
* Import scripts compatibility
* BookmarkReminderNotificationHandler compatibility
This PR also refactors the `register_bookmarkable` API so it accepts a class descended from a `BaseBookmarkable` class instead. This was done because we kept having to add more and more lambdas/properties inline and it was very messy, so a factory pattern is cleaner. The classes can be tested independently as well.
Some later PRs will address some other areas like the discourse narrative bot, advanced search, reports, and the .ics endpoint for bookmarks.
This will make future changes to the 'pull hotlinked images' system easier. This commit should not introduce any functional change.
For now, the old post_custom_field data is kept in the database. This will be dropped in a future commit.
Sometimes we need to update a _lot_ of ACLs on S3 (such as when secure media
is enabled), and since it takes ~1s per upload to update the ACL, this is
best spread out over many jobs instead of having to do the whole thing serially.
In future, it will be better to have a job that can be run based on
a column on uploads (e.g. acl_stale) so we can track progress, similar
to how we can set the baked_version to nil to rebake posts.
raw_html posts (i.e. those which are pulled as part of our comments integration) don't go through our markdown pipeline, so `upload://` URLs are not supported. Running pull_hotlinked_images will break any images in the post.
In future we may add support for pulling hotlinked images in these posts. But for now, disabling it will stop it breaking images.
This commit introduces a new use_polymorphic_bookmarks site setting
that is default false and hidden, that will be used to help continuous
development of polymorphic bookmarks. This setting **should not** be
enabled anywhere in production yet, it is purely for local development.
This commit uses the setting to enable create/update/delete actions
for polymorphic bookmarks on the server and client side. The bookmark
interactions on topics/posts are all usable. Listing, searching,
sending bookmark reminders, and other edge cases will be handled
in subsequent PRs.
Comprehensive UI tests will be added in the final PR -- we already
have them for regular bookmarks, so it will just be a matter of
changing them to be for polymorphic bookmarks.
We validate the *format* of email addresses in many places with a match against
a regex, often with very slightly different syntax.
Adding a separate EmailAddressValidator simplifies the code in a few spots and
feels cleaner.
Deprecated the old location in case someone is using it in a plugin.
No functionality change is in this commit.
Note: the regex used at the moment does not support using address literals, e.g.:
* localpart@[192.168.0.1]
* localpart@[2001:db8::1]
Whenever we got a bounced email in the Email::Receiver we
previously would just set bounced: true on the EmailLog and
discard the status/diagnostic code. This commit changes this
flow to store the bounce error code (defined in the RFC at
https://www.iana.org/assignments/smtp-enhanced-status-codes/smtp-enhanced-status-codes.xhtml)
not just in the Email::Receiver, but also via webhook events
from other mail services and from SNS.
This commit does not surface the bounce error in the UI,
we can do that later if necessary.
Job arguments go via JSON, and so symbols will appear as strings in the Job's `#execute` method. The latest version of Sidekiq has started warning about this to reduce developer confusion.
* File.exists? is deprecated and removed in Ruby 3.2 in favor of
File.exist?
* Dir.exists? is deprecated and removed in Ruby 3.2 in favor of
Dir.exist?
Sometimes, a user may have a malformed email such as
`test@test.com<mailto:test@test.com` their email address,
and as a topic participant will be included as a CC email
when sending a GroupSmtpEmail. This causes the CC parsing to
fail and further down the line in Email::Sender the code
to check the CC addresses expects an array but gets a string
instead because of the parse failure.
Instead, we can just check if the CC addresses are valid
and drop them if they are not in the GroupSmtpEmail job.
An upstream validation bug in the aws-sdk-sns library could enable RCE under certain circumstances. This commit updates the upstream gem, and adds additional validation to provide defense-in-depth.
We don't actually use the reminder_type for bookmarks anywhere;
we are just storing it. It has no bearing on the UI. It used
to be relevant with the at_desktop bookmark reminders (see
fa572d3a7a)
This commit marks the column as readonly, ignores it, and removes
the index, and it will be dropped in a later PR. Some plugins
are relying on reminder_type partially so some stubs have been
left in place to avoid errors.
We don't need no stinkin' denormalization! This commit ignores
the topic_id column on bookmarks, to be deleted at a later date.
We don't really need this column and it's better to rely on the
post.topic_id as the canonical topic_id for bookmarks, then we
don't need to remember to update both columns if the bookmarked
post moves to another topic.
This bug was introduced by f66007ec83.
In PostJobsEnqueuer we previously did not fire the after_post_create
event and after_topic_create event for private message topics. This was
changed in the above commit in order to publish message bus messages
for topic tracking state updates. Unfortunately this caused the
NotifyMailingListSubscribers job to be enqueued for all posts including
private messages, and admins and the users involved in the PMs got
emailed the contents of the PMs if they had mailing list mode enabled.
Luckily the impact of this was mitigated by a Guardian#can_see? check
for each mailing list mode user in the NotifyMailingListSubscribers job.
We never want to notify mailing list mode subscribers for private messages
so an early return has been added there, plus the logic in PostJobsEnqueuer
has been fixed, and tests have been added to that class where there were
none before.
There are certain design decisions that were made in this commit.
Private messages implements its own version of topic tracking state because there are significant differences between regular and private_message topics. Regular topics have to track categories and tags while private messages do not. It is much easier to design the new topic tracking state if we maintain two different classes, instead of trying to mash this two worlds together.
One MessageBus channel per user and one MessageBus channel per group. This allows each user and each group to have their own channel backlog instead of having one global channel which requires the client to filter away unrelated messages.
* FIX: Clear stale status of reloaded reviewables
Navigating away from and back to the reviewables reloaded Reviewable
records, but did not clear the "stale" attribute.
* FEATURE: Show user who last acted on reviewable
When a user acts on a reviewable, all other clients are notified and a
generic "reviewable was resolved by someone" notice was shown instead of
the buttons. There is no need to keep secret the username of the acting
user.
Currently when bulk-awarding a badge that can be granted multiple times, users in the CSV file are granted the badge once no matter how many times they're listed in the file and only if they don't have the badge already.
This PR adds a new option to the Badge Bulk Award feature so that it's possible to grant users a badge even if they already have the badge and as many times as they appear in the CSV file.
This PR fixes a couple of issues related to group SMTP:
1. When running the group SMTP job, we were exiting early if the email was for the OP because of an IMAP race condition. However this causes issues when replying as a new topic for an existing SMTP topic, as the recipient does not get the OP email which can cause threading problems.
2. When sending emails for a new topic spun out like the issue in 1., we are not maintaining the original subject/topic title because that is based on the incoming email record, which we were not doing because the group SMTP email was never sent because of issue 1.
Use the `sidekiq_retry_in` code from Jobs::UserEmail in group SMTP. Also we don't need to keep `seconds_to_delay` -- sidekiq uses the default delay calculation if you return 0 or nil from the block. See 3330df0ee3/lib/sidekiq/job_retry.rb (L216-L234) for sidekiq default retry delay logic.
I experimented with extracting this into a concern or a module, but `sidekiq_retry_in` is quite magic and it would not allow me to abstract away into a module that calls some method specificall in the child job class.
I would love to write tests for this, but it does not seem possible (not sure if its because of our test
setup) to write tests that test sidekiq's retry capability, and I am not sure if we should be anyway. Initial addition
to UserEmail did not test this functionality
d224966a0e
Skip group SMTP email (and add log) if:
* topic is deleted
* post is deleted
* smtp has been disabled for the group
Skip without log if:
* enable_smtp site setting is false
* disable_emails site setting is yes
Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
This PR backtracks a fair bit on this one https://github.com/discourse/discourse/pull/13220/files.
Instead of sending the group SMTP email for each user via `UserNotifications`, we are changing to send only one email with the existing `Jobs::GroupSmtpEmail` job and `GroupSmtpMailer`. We are changing this job and mailer along with `PostAlerter` to make the first topic allowed user the `to_address` for the email and any other `topic_allowed_users` to be the CC address on the email. This is to cut down on emails sent via SMTP, which is subject to daily limits from providers such as Gmail. We log these details in the `EmailLog` table now.
In addition to this, we have changed `PostAlerter` to no longer rely on incoming email email addresses for sending the `GroupSmtpEmail` job. This was unreliable as a user's email could have changed in the meantime. Also it was a little overcomplicated to use the incoming email records -- it is far simpler to reason about to just use topic allowed users.
This also adds a fix to include cc_addresses in the EmailLog.addressed_to_user scope.
ATM it only implements server side of it, as my need is for automation purposes. However it should probably be added in the UI too as it's unexpected to have pinned_until and no bannered_until.
Anonymizing a user changed their email address, destroyed all
associated InvitedUser records, but did not destroy the invites
associated to user's email.
When we call Bookmark.cleanup! we want to make sure that
topic_user.bookmarked is updated for topics linked to the
bookmarks that were deleted. Also when PostDestroyer calls
destroy and recover. We have a job for this already --
SyncTopicUserBookmarked -- so we just utilize that.
Notifying about a tag change sometimes resulted in loading a large
number of users in memory just to perform an exclusion. This commit
prefers to do inclusion (i.e. instead of exclude users X, do include
users in groups Y) and does it in SQL to avoid fetching unnecessary
data that is later discarded.
The previous commits removed reviewables leading to a bad user
experience. This commit updates the status, replaces actions with a
message and greys out the reviewable.
Over the years we accrued many spelling mistakes in the code base.
This PR attempts to fix spelling mistakes and typos in all areas of the code that are extremely safe to change
- comments
- test descriptions
- other low risk areas
* FEATURE: add support for like webhooks
Add support for like webhooks. Webhook events only send on user membership
in the defined webhook group filters.
This also fixes group webhook events, as before this was never used, and
the logic was not correct.
It used to check if an upload record exists, which is wrong because an
invalid upload record exists even if the upload was not created.
The other improvement is a better log message.
Previously watched words ignored topic titles when applying auto tagging rules.
Also copy has been improved to reflect how the system behaves.
The text hints that we are only watching first post now
scopes are incredibly annoying to preload, simply adding :user_emails is not
enough.
Instead of relying on scopes simply iterate through user_emails which is
properly preloaded.
This removes 2 * N+1 when generating user reports.
When posts are moved from one topic to another, the `topic_user.bookmarked` column for all users in the new and the old topic needs to be resynced, for example because a user bookmarks post 12 in topic 1, then it is moved to topic 2, the topic_user record for topic 1 should no longer be bookmarked. A background job has been added to sync the column for a specified topic, or for no topic at all, which does it for all topics like the migration.
Also includes a migration that we have run in the past to fix bad data.
----
This has been addressed in other places in the past:
https://github.com/discourse/discourse/pull/10211https://github.com/discourse/discourse/pull/10188
When bulk inviting, the uploaded CSV file may contain wrong values for
the user fields. This tries to automatically correct them by finding
the most similar option (by ignoring the case).
Admins can use bulk invites to pre-populate user fields. The imported
CSV file must have a header with "email" column (first position) and
names of the user fields (exact match).
Under the hood, the bulk invite will create staged users and populate
the user fields of those.
The user mailing list mode continued to be silently enabled and
UserEmail job checked just that ignoring site setting
disable_mailing_list_mode.
An additional migrate was added to set disable_mailing_list_mode
to false if any users enabled the mailing list mode already.
It was used both when inviting from a topic page and when creating
invites with "Send to topic on first login", while it should be used
only in the former case.
Onebox content may only be resolved during the process_post job. Onebox content could change the content of the excerpt, so we need to make sure the excerpt is updated accordingly.