In a large forum with millions of users and millions of user_fields
updating the list of dropdown user field options will result in a
502 now due to the large number of fields.
This commit moves the indexing into a job.
This ensures we only ever store correct post and topic timing when the client
notifies.
Previous to this change we would blindly trust the client.
Additionally this has error correction code that will correct the last seen
post number when you visit a topic with incorrect timings.
This commit addresses an issue for sites where secure_uploads
is turned on after the site has been operating without it for
some time.
When uploads are linked when they are used inside a post,
we were setting the access_control_post_id unconditionally
if it was NULL to that post ID and secure_uploads was true.
However this causes issues if an upload has been used in a
few different places, especially if a post was previously
used in a PM and marked secure, so we end up with a case of
the upload using a public post for its access control, which
causes URLs to not use the /secure-uploads/ path in the post,
breaking things like image uploads.
We should only set the access_control_post_id if the post is the first time the
upload is referenced so it cannot hijack uploads from other places.
When lazy load categories is enabled, categories should be loaded with
user activity items and drafts because the categories may not be
preloaded on the client side.
This method name is a bit confusing; with_secure_uploads implies
it may return a block or something with the uploads of the post,
and has_secure_uploads implies that it's checking whether the post
is linked to any secure uploads.
should_secure_uploads? communicates the true intent of this method --
which is to say whether uploads attached to this post should be
secure or not.
* DEV: Add `topic_embed_import_create_args` plugin modifier
This modifier allows a plugin to change the arguments used when creating
a new topic for an imported article.
For example: let's say you want to prepend "Imported: " to the title of
every imported topic. You could use this modifier like so:
```ruby
# In your plugin's code
plugin.register_modifier(:topic_embed_import_create_args) do |args|
args[:title] = "Imported: #{args[:title]}"
args
end
```
In this example, the modifier is prepending "Imported: " to the `title` in the `create_args` hash. This modified title would then be used when the new topic is created.
This PR improves the performance of the `most_replied_to_users` method on the `UserSummary` model.
### Old Query
```ruby
post_query
.joins(
"JOIN posts replies ON posts.topic_id = replies.topic_id AND posts.reply_to_post_number = replies.post_number",
)
# We are removing replies by @user, but we can simplify this by getting the using the user_id on the posts.
.where("replies.user_id <> ?", @user.id)
.group("replies.user_id")
.order("COUNT(*) DESC")
.limit(MAX_SUMMARY_RESULTS)
.pluck("replies.user_id, COUNT(*)")
.each { |r| replied_users[r[0]] = r[1] }
```
### Old Query with corrections
```ruby
post_query
.joins(
"JOIN posts replies ON posts.topic_id = replies.topic_id AND replies.reply_to_post_number = posts.post_number",
)
# Remove replies by @user but instead look on loaded posts (we do this so we don't count self replies)
.where("replies.user_id <> posts.user_id")
.group("replies.user_id")
.order("COUNT(*) DESC")
.limit(MAX_SUMMARY_RESULTS)
.pluck("replies.user_id, COUNT(*)")
.each { |r| replied_users[r[0]] = r[1] }
```
### New Query
```ruby
post_query
.joins(
"JOIN posts replies ON posts.topic_id = replies.topic_id AND posts.reply_to_post_number = replies.post_number",
)
# Only include regular posts in our joins, this makes sure we don't have the bloat of loading private messages
.joins(
"JOIN topics ON replies.topic_id = topics.id AND topics.archetype <> 'private_message'",
)
# Only include visible post types, so exclude posts like whispers, etc
.joins(
"AND replies.post_type IN (#{Topic.visible_post_types(@user, include_moderator_actions: false).join(",")})",
)
.where("replies.user_id <> posts.user_id")
.group("replies.user_id")
.order("COUNT(*) DESC")
.limit(MAX_SUMMARY_RESULTS)
.pluck("replies.user_id, COUNT(*)")
.each { |r| replied_users[r[0]] = r[1] }
```
# Conclusion
`most_replied_to_users` was untested, so I introduced a test for the logic, and have confirmed that it passes on both the new query **AND** the old query.
Thank you @danielwaterworth for the debugging assistance.
When a user is manually deactivated, they should not be deleted by our background job that purges inactive users.
In addition, site settings keywords should accept an array of keywords.
Previously the problem check registry simply looked at the subclasses of ProblemCheck. This was causing some confusion in environments where eager loading is not enabled, as the registry would appear empty as a result of the classes never being referenced (and thus never loaded.)
This PR changes the approach to a more explicit one. I followed other implementations (bookmarkable and hashtag autocomplete.) As a bonus, this now has a neat plugin entry point as well.
This enables the following in Discourse AI
```
plugin.register_modifier(:chat_allowed_bot_user_ids) do |user_ids, guardian|
if guardian.user
mentionables = AiPersona.mentionables(user: guardian.user)
allowed_bot_ids = mentionables.map { |mentionable| mentionable[:user_id] }
user_ids.concat(allowed_bot_ids)
end
user_ids
end
```
some bots that are id < 0 need to be discoverable in search otherwise people can not talk to them.
---------
Co-authored-by: Joffrey JAFFEUX <j.jaffeux@gmail.com>
When "lazy load categories" is enabled and parent_category_id was set,
the query fetching categories contained a contradiction filtering both
by parent_category_id and parent_category_id = NULL.
In #26122 we promoted all problem checks defined as class methods on AdminDashboardData to their own first-class ProblemCheck instances.
This PR continues that by promoting problem checks that are implemented as blocks as well. This includes updating a couple plugins that have problem checks.
We never use that information and this also fixes an issue with the BCC plugin which ends up triggering a rate-limit because we were publishing a "NEW_PRIVATE_MESSAGE" to the user sending the BCC for every recipients 💥
Internal - t/118283
This commit makes it so the site settings filter controls and
the list of settings input editors themselves can be used elsewhere
in the admin UI outside of /admin/site_settings
This allows us to provide more targeted groups of settings in different
UI areas where it makes sense to provide them, such as on plugin pages.
You could open a single page for a plugin where you can see information
about that plugin, change settings, and configure it with custom UIs
in the one place.
In future we will do this in "config areas" for other parts of the
admin UI.
In AdminDashboardData we have a bunch of problem checks implemented as methods on that class. This PR absolves it of the responsibility by promoting each of those checks to a first class ProblemCheck. This way each of them can have their own priority and arbitrary functionality can be isolated in its own class.
Think "extract class" refactoring over and over. Since they were all moved we can also get rid of the @@problem_syms class variable which was basically the old version of the registry now replaced by ProblemCheck.realtime.
In addition AdminDashboardData::Problem value object has been entirely replaced with the new ProblemCheck::Problem (with compatible API).
Lastly, I added some RSpec matchers to simplify testing of problem checks and provide helpful error messages when assertions fail.
There are a couple of reasons for this.
The first one is practical, and related to eager loading. Since /lib is not eager loaded, when the application boots, ProblemCheck["identifier"] will be nil because the child classes aren't loaded.
The second one is more conceptual. There turns out to be a lot of inter-dependencies between the part of the problem check system that live in /app and the parts that live in /lib, which probably suggests it should all go in /app.
Why this change?
There are two problematic queries in question here when loading
notifications in various tabs in the user menu:
```
SELECT "notifications".*
FROM "notifications"
LEFT JOIN topics ON notifications.topic_id = topics.id
WHERE "notifications"."user_id" = 1338 AND (topics.id IS NULL OR topics.deleted_at IS NULL)
ORDER BY notifications.high_priority AND NOT notifications.read DESC,
NOT notifications.read AND notifications.notification_type NOT IN (5,19,25) DESC,
notifications.created_at DESC
LIMIT 30;
```
and
```
EXPLAIN ANALYZE SELECT "notifications".*
FROM "notifications"
LEFT JOIN topics ON notifications.topic_id = topics.id
WHERE "notifications"."user_id" = 1338
AND (topics.id IS NULL OR topics.deleted_at IS NULL)
AND "notifications"."notification_type" IN (5, 19, 25)
ORDER BY notifications.high_priority AND NOT notifications.read DESC, NOT notifications.read DESC, notifications.created_at DESC LIMIT 30;
```
For a particular user, the queries takes about 40ms and 26ms
respectively on one of our production instance where the user has 10K notifications while the site has 600K notifications in total.
What does this change do?
1. Adds the `index_notifications_user_menu_ordering` index to the `notifications` table which is
indexed on `(user_id, (high_priority AND NOT read) DESC, (NOT read)
DESC, created_at DESC)`.
1. Adds a second index `index_notifications_user_menu_ordering_deprioritized_likes` to the `notifications`
table which is indexed on `(user_id, (high_priority AND NOT read) DESC, (NOT read AND notification_type NOT IN (5,19,25)) DESC, created_at DESC)`. Note that we have to hardcode the like typed notifications type here as it is being used in an ordering clause.
With the two indexes above, both queries complete in roughly 0.2ms. While I acknowledge that there will be some overhead in insert,update or delete operations. I believe this trade-off is worth it since viewing notifications in the user menu is something that is at the core of using a Discourse forum so we should optimise this experience as much as possible.
Currently, the trust level method is calculating trust level based on maximum value from:
- locked trust level
- group automatic trust level
- previously granted trust level by admin
https://github.com/discourse/discourse/blob/main/lib/trust_level.rb#L33
Let's say the user belongs to groups with automatic trust level 1 and in the meantime meets all criteria to get trust level 2.
Each time, a user is removed from a group with automatic trust_level 1, they will be downgraded to trust_level 1 and promoted to trust_level 2
120a2f70a9/lib/promotion.rb (L142)
This will cause duplicated promotion messages.
Therefore, we have to check if the user meets the criteria, before downgrading.
Why this change?
Prior to this change, the `CategoryList#find_relevant_topics` method was
loading and allocating all `CategoryFeaturedTopic` records in the
database to eventually only just use its `category_id` and `topic_id`
column. On a site with many `CategoryFeaturedTopic` records, the loading
of the ActiveRecord objects is a source of bottleneck.
The other problem with the `CategoryList#find_relevant_topics` method is
that it is unconditionally loading all records from the database even if
the user does not have access to the category. This again is wasteful.
What does this change do?
This commit makes it such that `CategoryList#find_relevant_topics` is
called only after `CategoryList#find_categories` in the `CategoryList#initialize`
method so that we can filter featured topics against categories that the
user has access to.
The second change is that Instead of loading `CategoryFeaturedTopic` records, we make an
inner join agains the `topics` table instead and skip any allocation of
`CatgoryFeaturedTopic` ActiveRecord objects.
It's February 29th, you know what that means...date-based flaky specs! If today is
February 29th 2024:
```
freeze_time(1.year.ago) -> Tue, 28 Feb 2023 01:38:42.732875000 UTC +00:00
```
Then
```
freeze_time(1.year.from_now) -> Wed, 28 Feb 2024 01:38:42.732875000 UTC +00:00
```
So then our "now" for the insert query ends up being "yesterday"
```
WHERE topic_hot_scores.topic_id IS NULL
AND topics.deleted_at IS NULL
AND topics.archetype <> :private_message
AND topics.created_at <= :now
```
As part of problem checks refactoring, we're moving some data to be DB backed. In this PR it's the tracking of problem check execution. When was it last run, when was the last problem, when should it run next, how many consecutive checks had problems, etc.
This allows us to implement the perform_every feature in scheduled problem checks for checks that don't need to be run every 10 minutes.
Why this change?
This change adds validation for the default value for `type: objects` theme
settings when a setting theme field is uploaded. This helps the theme
author to ensure that the objects which they specifc in the default
value adhere to the schema which they have declared.
When an error is encountered in one of the objects, the error
message will look something like:
`"The property at JSON Pointer '/0/title' must be at least 5 characters
long."`
We use a JSON Pointer to reference the property in the object which is
something most json-schema validator uses as well.
What does this change do?
1. This commit once again changes the shape of hash returned by
`ThemeSettingsObjectValidator.validate`. Instead of using the
property name as the key previously, we have decided to avoid
multiple levels of nesting and instead use a JSON Pointer as the key
which helps to simplify the implementation.
2 Introduces `ThemeSettingsObjectValidator.validate_objects` which
returns an array of validation error messages for all the objects
passed to the method.
Previously, problem checks were all added as either class methods or blocks in AdminDashboardData. Another set of class methods were used to add and run problem checks.
As of this PR, problem checks are promoted to first-class citizens. Each problem check receives their own class. This class of course contains the implementation for running the check, but also configuration items like retry strategies (for scheduled checks.)
In addition, the parent class ProblemCheck also serves as a registry for checks. For example we can get a list of all existing check classes through ProblemCheck.checks, or just the ones running on a schedule through ProblemCheck.scheduled.
After this refactor, the task of adding a new check is significantly simplified. You add a class that inherits ProblemCheck, you implement it, add a test, and you're good to go.
Why this change?
Firstly, note that this is not a security commit because this feature is
still in development and should not be used anywhere.
The reason we want to set a limit here is to greatly reduce the
possibility of a DoS attack in the future via `ThemeSetting` where
someone would set an arbituary large json string in
`ThemeSetting#json_value` and causing the server to run out of resources
trying to serialize/deserialize the value.
What does this change do?
Adds an ActiveRecord validation to ensure that the bytesize of the json
string being stored is smaller than or equal to 0.5mb. We believe 0.5mb
is a decent limit for now but we can review the limit in the future if
we believe it is too small.
Why this change?
The logic for validating a theme setting's value and default value was
not consistent as each part of the code would implement its own logic.
This is not ideal as the default value may be validated differently than
when we are setting a new value. Therefore, this commit seeks to
refactor all the validation logic for a theme setting's value into a
single service class.
What does this change do?
Introduce the `ThemeSettingsValidator` service class which holds all the
necessary helper methods required to validate a theme setting's value
When "lazy load categories" is enabled, only the categories present in
the sidebar are preloaded. This is insufficient because the parent
categories are necessary too for the sidebar to be rendered properly.
The strict-dynamic CSP directive is supported in all our target browsers, and makes for a much simpler configuration. Instead of allowlisting paths, we use a per-request nonce to authorize `<script>` tags, and then those scripts are allowed to load additional scripts (or add additional inline scripts) without restriction.
This becomes especially useful when admins want to add external scripts like Google Tag Manager, or advertising scripts, which then go on to load a ton of other scripts.
All script tags introduced via themes will automatically have the nonce attribute applied, so it should be zero-effort for theme developers. Plugins *may* need some changes if they are inserting their own script tags.
This commit introduces a strict-dynamic-based CSP behind an experimental `content_security_policy_strict_dynamic` site setting.
Reactions needs this to be able to filter out likes received
actions, where there is also an associated reaction, since
now most reactions also count as a like.
Fixes an issue where private topics that are quoted have an incorrectly formatted url when using a subfolder install.
This update returns a relative url that includes the base_path rather than a combination of base_url + base_path.
This commit changes `max_image_megapixels` to be used
as is without multiplying by 2 to give extra leway.
We found in reality this was just causing confusion
for admins, especially with the already permissive
40MP default.