This commit fleshes out and adds functionality for the new `#hashtag` search and
lookup system, still hidden behind the `enable_experimental_hashtag_autocomplete`
feature flag.
**Serverside**
We have two plugin API registration methods that are used to define data sources
(`register_hashtag_data_source`) and hashtag result type priorities depending on
the context (`register_hashtag_type_in_context`). Reading the comments in plugin.rb
should make it clear what these are doing. Reading the `HashtagAutocompleteService`
in full will likely help a lot as well.
Each data source is responsible for providing its own **lookup** and **search**
method that returns hashtag results based on the arguments provided. For example,
the category hashtag data source has to take into account parent categories and
how they relate, and each data source has to define their own icon to use for the
hashtag, and so on.
The `Site` serializer has two new attributes that source data from `HashtagAutocompleteService`.
There is `hashtag_icons` that is just a simple array of all the different icons that
can be used for allowlisting in our markdown pipeline, and there is `hashtag_context_configurations`
that is used to store the type priority orders for each registered context.
When sending emails, we cannot render the SVG icons for hashtags, so
we need to change the HTML hashtags to the normal `#hashtag` text.
**Markdown**
The `hashtag-autocomplete.js` file is where I have added the new `hashtag-autocomplete`
markdown rule, and like all of our rules this is used to cook the raw text on both the clientside
and on the serverside using MiniRacer. Only on the server side do we actually reach out to
the database with the `hashtagLookup` function, on the clientside we just render a plainer
version of the hashtag HTML. Only in the composer preview do we do further lookups based
on this.
This rule is the first one (that I can find) that uses the `currentUser` based on a passed
in `user_id` for guardian checks in markdown rendering code. This is the `last_editor_id`
for both the post and chat message. In some cases we need to cook without a user present,
so the `Discourse.system_user` is used in this case.
**Chat Channels**
This also contains the changes required for chat so that chat channels can be used
as a data source for hashtag searches and lookups. This data source will only be
used when `enable_experimental_hashtag_autocomplete` is `true`, so we don't have
to worry about channel results suddenly turning up.
------
**Known Rough Edges**
- Onebox excerpts will not render the icon svg/use tags, I plan to address that in a follow up PR
- Selecting a hashtag + pressing the Quote button will result in weird behaviour, I plan to address that in a follow up PR
- Mixed hashtag contexts for hashtags without a type suffix will not work correctly, e.g. #ux which is both a category and a channel slug will resolve to a category when used inside a post or within a [chat] transcript in that post. Users can get around this manually by adding the correct suffix, for example ::channel. We may get to this at some point in future
- Icons will not show for the hashtags in emails since SVG support is so terrible in email (this is not likely to be resolved, but still noting for posterity)
- Additional refinements and review fixes wil
This commit renames all secure_media related settings to secure_uploads_* along with the associated functionality.
This is being done because "media" does not really cover it, we aren't just doing this for images and videos etc. but for all uploads in the site.
Additionally, in future we want to secure more types of uploads, and enable a kind of "mixed mode" where some uploads are secure and some are not, so keeping media in the name is just confusing.
This also keeps compatibility with the `secure-media-uploads` path, and changes new
secure URLs to be `secure-uploads`.
Deprecated settings:
* secure_media -> secure_uploads
* secure_media_allow_embed_images_in_emails -> secure_uploads_allow_embed_images_in_emails
* secure_media_max_email_embed_image_size_kb -> secure_uploads_max_email_embed_image_size_kb
See https://meta.discourse.org/t/discourse-email-messages-are-incorrectly-threaded/233499
for thorough reasoning.
This commit changes how we generate Message-IDs and do email
threading for emails sent from Discourse. The main changes are
as follows:
* Introduce an outbound_message_id column on Post that
is either a) filled with a Discourse-generated Message-ID
the first time that post is used for an outbound email
or b) filled with an original Message-ID from an external
mail client or service if the post was created from an
incoming email.
* Change Discourse-generated Message-IDs to be more consistent
and static, in the format `discourse/post/:post_id@:host`
* Do not send References or In-Reply-To headers for emails sent
for the OP of topics.
* Make sure that In-Reply-To is filled with either a) the OP's
Message-ID if the post is not a direct reply or b) the parent
post's Message-ID
* Make sure that In-Reply-To has all referenced post's Message-IDs
* Make sure that References is filled with a chain of Message-IDs
from the OP down to the parent post of the new post.
We also are keeping X-Discourse-Post-Id and X-Discourse-Topic-Id,
headers that we previously removed, for easier visual debugging
of outbound emails.
Finally, we backfill the `outbound_message_id` for posts that have
a linked `IncomingEmail` record, using the `message_id` of that record.
We do not need to do that for posts that don't have an incoming email
since they are backfilled at runtime if `outbound_message_id` is missing.
Updates automatically data on the stats section of the topic.
It will update automatically the following information: likes, replies and last reply (timestamp and user)
This table holds associations between uploads and other models. This can be used to prevent removing uploads that are still in use.
* DEV: Create upload_references
* DEV: Use UploadReference instead of PostUpload
* DEV: Use UploadReference for SiteSetting
* DEV: Use UploadReference for Badge
* DEV: Use UploadReference for Category
* DEV: Use UploadReference for CustomEmoji
* DEV: Use UploadReference for Group
* DEV: Use UploadReference for ThemeField
* DEV: Use UploadReference for ThemeSetting
* DEV: Use UploadReference for User
* DEV: Use UploadReference for UserAvatar
* DEV: Use UploadReference for UserExport
* DEV: Use UploadReference for UserProfile
* DEV: Add method to extract uploads from raw text
* DEV: Use UploadReference for Draft
* DEV: Use UploadReference for ReviewableQueuedPost
* DEV: Use UploadReference for UserProfile's bio_raw
* DEV: Do not copy user uploads to upload references
* DEV: Copy post uploads again after deploy
* DEV: Use created_at and updated_at from uploads table
* FIX: Check if upload site setting is empty
* DEV: Copy user uploads to upload references
* DEV: Make upload extraction less strict
This commit migrates all bookmarks to be polymorphic (using the
bookmarkable_id and bookmarkable_type) columns. It also deletes
all the old code guarded behind the use_polymorphic_bookmarks setting
and changes that setting to true for all sites and by default for
the sake of plugins.
No data is deleted in the migrations, the old post_id and for_topic
columns for bookmarks will be dropped later on.
A bit of a mixed bag, this addresses several edge areas of bookmarks and makes them compatible with polymorphic bookmarks (hidden behind the `use_polymorphic_bookmarks` site setting). The main ones are:
* ExportUserArchive compatibility
* SyncTopicUserBookmarked job compatibility
* Sending different notifications for the bookmark reminders based on the bookmarkable type
* Import scripts compatibility
* BookmarkReminderNotificationHandler compatibility
This PR also refactors the `register_bookmarkable` API so it accepts a class descended from a `BaseBookmarkable` class instead. This was done because we kept having to add more and more lambdas/properties inline and it was very messy, so a factory pattern is cleaner. The classes can be tested independently as well.
Some later PRs will address some other areas like the discourse narrative bot, advanced search, reports, and the .ics endpoint for bookmarks.
This will make future changes to the 'pull hotlinked images' system easier. This commit should not introduce any functional change.
For now, the old post_custom_field data is kept in the database. This will be dropped in a future commit.
* FEATURE: use canonical links in posts.rss feed
Previously we used non canonical links in posts.rss
These links get crawled frequently by crawlers when discovering new
content forcing crawlers to hop to non canonical pages just to end up
visiting canonical pages
This uses up expensive crawl time and adds load on Discourse sites
Old links were of the form:
`https://DOMAIN/t/SLUG/43/21`
New links are of the form
`https://DOMAIN/t/SLUG/43?page=2#post_21`
This also adds a post_id identified element to crawler view that was
missing.
Note, to avoid very expensive N+1 queries required to figure out the
page a post is on during rss generation, we cache that information.
There is a smart "cache breaker" which ensures worst case scenario is
a "page drift" - meaning we would publicize a post is on page 11 when
it is actually on page 10 due to post deletions. Cache holds for up to
12 hours.
Change only impacts public post RSS feeds (`/posts.rss`)
Breakdown of fixes in this commit:
* `UserStat#topic_count` was not updated when visibility of
the topic changed.
* `UserStat#post_count` was not updated when post was hidden or
unhidden.
* `TopicConverter` was only incrementing or decrementing the counts by 1
even if a user has multiple posts in the topic.
* The commit turns off the verbose logging by default as it is just
noise to normal users who are not debugging this problem.
We are planning on attaching bookmarks to more and
more other models, so it makes sense to make a polymorphic
relationship to handle this. This commit adds the new
columns and backfills them in the bookmark table, and
makes sure that any new bookmark changes fill in the columns
via DB triggers.
This way we can gradually change the frontend and backend
to use these new columns, and eventually delete the
old post_id and for_topic columns in `bookmarks`.
* PERF: Remove JOIN on categories for PM search
JOIN on categories is not needed when searchin in private messages as
PMs are not categorized.
* DEV: Use == for string comparison
* PERF: Optimize query for allowed topic groups
There was a query that checked for all topics a user or their groups
were allowed to see. This used UNION between topic_allowed_users and
topic_allowed_groups which was very inefficient. That was replaced with
a OR condition that checks in either tables more efficiently.
Sometimes administrators want to permanently delete posts and topics
from the database. To make sure that this is done for a good reasons,
administrators can do this only after one minute has passed since the
post was deleted or immediately if another administrator does it.
Previosuly, quotes from original topics are rendered incorrectly since the moved posts are not rebaked.
Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
Staff can send a post to the review queue by clicking the "Flag Post" button next to "Take Action...". Clicking it flags the post using the "Notify moderators" score type and hides it. A custom message will be sent to the user.
This PR adds security_last_changed_at and security_last_changed_reason to uploads. This has been done to make it easier to track down why an upload's secure column has changed and when. This necessitated a refactor of the UploadSecurity class to provide reasons why the upload security would have changed.
As well as this, a source is now provided from the location which called for the upload's security status to be updated as they are several (e.g. post creator, topic security updater, rake tasks, manual change).
Improvements to make console access to IncomingEmail more pleasant, and stopping certain IMAP logs from landing in the DB because they just create too much noise,
These 2 indexes optimise performance on profile pages.
The summary page displays:
1. A list of "Top Link" - links sorted by number of clicks posted by user
2. A list of "Top Replies" - replies made by a user that go the most hearts
These two areas could devolve into full index or table scans, new indexes are there to avoid this cost on large dbs
One minor downside is that storage requirements go a tiny bit up to maintain the new indexes
Normally, secure media urls are linked like `/secure-media-uploads/...`. In this case, uploads were already being linked correctly.
But sometimes (e.g. when pulling hotlinked onebox images) secure media is referenced with a full domain name (`//example.com/secure-media-uploads`). This commit ensures that those uploads are also linked correctly.
Previously we would unconditionally keep all images downloaded via pull_hotlinked_images, even if they are later removed from the post. This commit removes that logic, and relies on the existing link_post_uploads process to pick up the downloaded images in `cooked`. Specs are added to ensure this is working correctly for regular hotlinked images, and for oneboxes.
This commit should cause no functional change
- Split into functions to avoid deep nesting
- Register custom field type, and remove manual json parse/serialize
- Recover from deleted upload records
Also adds a test to ensure pull_hotlinked_images redownloads secure images only once
When rebaking a post we were invalidating _regular_ oneboxes but not inline oneboxes.
DEV: also renamed 'InlineOneboxer.purge' to 'InlineOneboxer.invalidate' to keep
the API consistent with 'Oneboxer.invalidate'
Previously the pull hotlinked images job was skipped after system edits. This ensured that we never had an infinite loop of system-edit/pull-hotlinked/system-edit/pull-hotlinked etc.
A side effect was that edits made by system for any other reason (e.g. API, removing full quotes) would prevent pulling hotlinked images. This commit removes the system edit check, and replaces it with another method to avoid an infinite job scheduling loop.
This reverts commit 20780a1eee.
* SECURITY: re-adds accidentally reverted commit:
03d26cd6: ensure embed_url contains valid http(s) uri
* when the merge commit e62a85cf was reverted, git chose the 2660c2e2 parent to land on
instead of the 03d26cd6 parent (which contains security fixes)
* We now have a site setting "topic_excerpt_maxlength" that is used when the OP is created or revised to generate a topic excerpt.
* However, posts created before this setting was introduced cannot benefit from this change unless they are revised, and if the topic excerpt length setting is changed that situation is also not covererd.
* This PR makes a change to rebake! to update the topic excerpt IF the post is the OP.
Adds a new topic_excerpt_maxlength site setting.
* When topic excerpt is requested for a post, use the new topic_excerpt_maxlength site setting to limit the size of the excerpt
* Remove code for getting/setting Post.excerpt_size as it is not used anywhere
This introduces new APIs for obtaining optimized thumbnails for topics. There are a few building blocks required for this:
- Introduces new `image_upload_id` columns on the `posts` and `topics` table. This replaces the old `image_url` column, which means that thumbnails are now restricted to uploads. Hotlinked thumbnails are no longer possible. In normal use (with pull_hotlinked_images enabled), this has no noticeable impact
- A migration attempts to match existing urls to upload records. If a match cannot be found then the posts will be queued for rebake
- Optimized thumbnails are generated during post_process_cooked. If thumbnails are missing when serializing a topic list, then a sidekiq job is queued
- Topic lists and topics now include a `thumbnails` key, which includes all the available images:
```
"thumbnails": [
{
"max_width": null,
"max_height": null,
"url": "//example.com/original-image.png",
"width": 1380,
"height": 1840
},
{
"max_width": 1024,
"max_height": 1024,
"url": "//example.com/optimized-image.png",
"width": 768,
"height": 1024
}
]
```
- Themes can request additional thumbnail sizes by using a modifier in their `about.json` file:
```
"modifiers": {
"topic_thumbnail_sizes": [
[200, 200],
[800, 800]
],
...
```
Remember that these are generated asynchronously, so your theme should include logic to fallback to other available thumbnails if your requested size has not yet been generated
- Two new raw plugin outlets are introduced, to improve the customisability of the topic list. `topic-list-before-columns` and `topic-list-before-link`
avg_time on posts and topics have not been used in a year.
This uses a re-runnable ddl transaction diasabled migration to
drop the column, cause it touchs very high traffic table and may
deadlock
* Because custom emoji count as post "uploads" we were
marking them as secure when updating the secure status for post uploads.
* We were also giving them an access control post id, which meant
broken image previews from 403 errors in the admin custom emoji list.
* We now check if an upload is used as a custom emoji and do not
assign the access control post + never mark as secure.
### UI Changes
If `SiteSetting.enable_bookmarks_with_reminders` is enabled:
* Clicking "Bookmark" on a topic will create a new Bookmark record instead of a post + user action
* Clicking "Clear Bookmarks" on a topic will delete all the new Bookmark records on a topic
* The topic bookmark buttons control the post bookmark flags correctly and vice-versa
Disabled selecting the "reminder type" for bookmarks in the UI because the backend functionality is not done yet (of sending users notifications etc.)
### Other Changes
* Added delete bookmark route (but no UI yet)
* Added a rake task to sync the old PostAction bookmarks to the new Bookmark table, which can be run as many times as we want for a site (it will not create duplicates).
If our reply tree somehow ends up with cycles or other odd
structures, we only want to consider a reply once, at the first
level in the tree that it appears.
It seems in some situations replies have been moved to other topics but
the `PostReply` table has not been updated. I will try and fix this in a
follow up PR, but for now this fix ensures that every time we ask a post
for its replies that we restrict it to the same topic.
When pull_hotlinked_images tried to run on posts with secure media (which had already been downloaded from external sources) we were getting a 404 when trying to download the image because the secure endpoint doesn't allow anon downloads.
Also, we were getting into an infinite loop of pull_hotlinked_images because the job didn't consider the secure media URLs as "downloaded" already so it kept trying to download them over and over.
In this PR I have also refactored secure-media-upload URL checks and mutations into single source of truth in Upload, adding a SECURE_MEDIA_ROUTE constant to check URLs against too.
Add TopicUploadSecurityManager to handle post moves. When a post moves around or a topic changes between categories and public/private message status the uploads connected to posts in the topic need to have their secure status updated, depending on the security context the topic now lives in.
### General Changes and Duplication
* We now consider a post `with_secure_media?` if it is in a read-restricted category.
* When uploading we now set an upload's secure status straight away.
* When uploading if `SiteSetting.secure_media` is enabled, we do not check to see if the upload already exists using the `sha1` digest of the upload. The `sha1` column of the upload is filled with a `SecureRandom.hex(20)` value which is the same length as `Upload::SHA1_LENGTH`. The `original_sha1` column is filled with the _real_ sha1 digest of the file.
* Whether an upload `should_be_secure?` is now determined by whether the `access_control_post` is `with_secure_media?` (if there is no access control post then we leave the secure status as is).
* When serializing the upload, we now cook the URL if the upload is secure. This is so it shows up correctly in the composer preview, because we set secure status on upload.
### Viewing Secure Media
* The secure-media-upload URL will take the post that the upload is attached to into account via `Guardian.can_see?` for access permissions
* If there is no `access_control_post` then we just deliver the media. This should be a rare occurrance and shouldn't cause issues as the `access_control_post` is set when `link_post_uploads` is called via `CookedPostProcessor`
### Removed
We no longer do any of these because we do not reuse uploads by sha1 if secure media is enabled.
* We no longer have a way to prevent cross-posting of a secure upload from a private context to a public context.
* We no longer have to set `secure: false` for uploads when uploading for a theme component.
The following methods have long been deprecated in ruby due to flaws in their implementation per http://blade.nagaokaut.ac.jp/cgi-bin/vframe.rb/ruby/ruby-core/29293?29179-31097:
URI.escape
URI.unescape
URI.encode
URI.unencode
escape/encode are just aliases for one another. This PR uses the Addressable gem to replace these methods with its own encode, unencode, and encode_component methods where appropriate.
I have put all references to Addressable::URI here into the UrlHelper to keep them corralled in one place to make changes to this implementation easier.
Addressable is now also an explicit gem dependency.
In non-login-required sites, we prevent secure uploads already used in PMs from being used in public topics.
In login_required sites, secure uploads should be reusable in any topic, PM or not.
This PR introduces a new secure media setting. When enabled, it prevent unathorized access to media uploads (files of type image, video and audio). When the `login_required` setting is enabled, then all media uploads will be protected from unauthorized (anonymous) access. When `login_required`is disabled, only media in private messages will be protected from unauthorized access.
A few notes:
- the `prevent_anons_from_downloading_files` setting no longer applies to audio and video uploads
- the `secure_media` setting can only be enabled if S3 uploads are already enabled and configured
- upload records have a new column, `secure`, which is a boolean `true/false` of the upload's secure status
- when creating a public post with an upload that has already been uploaded and is marked as secure, the post creator will raise an error
- when enabling or disabling the setting on a site with existing uploads, the rake task `uploads:ensure_correct_acl` should be used to update all uploads' secure status and their ACL on S3
Doing .pluck(:column).first is a very common pattern in Discourse and in
most cases, a limit cause isn't being added. Instead of adding a limit
clause to all these callsites, this commit adds two new methods to
ActiveRecord::Relation:
pluck_first, equivalent to limit(1).pluck(*columns).first
and pluck_first! which, like other finder methods, raises an exception
when no record is found
Zeitwerk simplifies working with dependencies in dev and makes it easier reloading class chains.
We no longer need to use Rails "require_dependency" anywhere and instead can just use standard
Ruby patterns to require files.
This is a far reaching change and we expect some followups here.
The URL '/images/transparent.png' will be used in the cooked content if upload record not found. In that case we have to use 'short_url' as image src in 'post.each_upload_url' method.