With Embroider, we can rely on async `import()` to do the splitting
for us.
This commit extracts from `pretty-text` all the parts that are
meant to be loaded async into a new `discourse-markdown-it` package
that is also a V2 addon (meaning that all files are presumed unused
until they are imported, aka "static").
Mostly I tried to keep the very discourse specific stuff (accessing
site settings and loading plugin features) inside discourse proper,
while the new package aims to have some resembalance of a general
purpose library, a MarkdownIt++ if you will. It is far from perfect
because of how all the "options" stuff work but I think it's a good
start for more refactorings (clearing up the interfaces) to happen
later.
With this, pretty-text and app/lib/text are mostly a kitchen sink
of loosely related text processing utilities.
After the refactor, a lot more code related to setting up the
engine are now loaded lazily, which should be a pretty nice win. I
also noticed that we are currently pulling in the `xss` library at
initial load to power the "sanitize" stuff, but I suspect with a
similar refactoring effort those usages can be removed too. (See
also #23790).
This PR does not attempt to fix the sanitize issue, but I think it
sets things up on the right trajectory for that to happen later.
Co-authored-by: David Taylor <david@taylorhq.com>
Preloading just metadata is not always respected by browsers, and
sometimes the whole video will be downloaded. This switches to using a
placeholder image for the video and only loads the video when the play
button is clicked.
This commit removes any logic in the app and in specs around
enable_experimental_hashtag_autocomplete and deletes some
old category hashtag code that is no longer necessary.
It also adds a `slug_ref` category instance method, which
will generate a reference like `parent:child` for a category,
with an optional depth, which hashtags use. Also refactors
PostRevisor which was using CategoryHashtagDataSource directly
which is a no-no.
Deletes the old hashtag markdown rule as well.
These avatar-related helper functions are used in pretty-text, which currently means we load the entire `discourse/lib/utilities` module into the mini-racer when running pretty-text on the server side. This stops us adding any logic or imports to discourse/lib/utilities which may depend on other `discourse/` namespace features.
This commit moves the avatar-related utils into a dedicated module in the `discourse-common` namespace, adds backwards-compatibility shims, and updates the pretty-text config accordingly.
https://meta.discourse.org/t/markdown-preview-and-result-differ/263878
The result of this markdown had different results in the composer preview and the post. This is solved by updating Loofah to the latest version and using html5 fragments like our user had reported. While the change was only needed in cooked_post_processor.rb for this fix, other areas also had to be updated due to various side effects.
Added a new modifier hook to allow plugins to modify the @mentions
extracted from a cooked text.
Use case: Some plugins may change how the mentions are cooked to prevent
them from being confused with user or group mentions and display the user
card.
This modifier hook allows the plugin to filter the mentions detected or add new ways
to add mentions into cooked text.
Test were sometimes failing with similar error to the following:
```
1) UsernameChanger#override when unicode_usernames is off overrides the username if a new name has different case
Failure/Error:
protect { v8.eval(<<~JS) }
__paths = #{paths_json};
__utils.avatarImg({size: #{size.inspect}, avatarTemplate: #{avatar_template.inspect}}, __getURL);
JS
MiniRacer::RuntimeError:
ReferenceError: __optInput is not defined
# JavaScript at exports.helperContext (<anonymous>:21:17)
# JavaScript at getRawAvatarSize (<anonymous>:108:49)
# JavaScript at avatarUrl (<anonymous>:102:21)
# JavaScript at Object.avatarImg (<anonymous>:129:15)
# JavaScript at <anonymous>:2:9
# ./lib/pretty_text.rb:259:in `block in avatar_img'
# ./lib/pretty_text.rb:661:in `block in protect'
# ./lib/pretty_text.rb:661:in `synchronize'
# ./lib/pretty_text.rb:661:in `protect'
# ./lib/pretty_text.rb:259:in `avatar_img'
# ./app/jobs/regular/update_username.rb:14:in `execute'
```
This should not be needed as it should already have been initialised but that should stop the flakey-ness for now while being a safe change.
* FEATURE: reduce avatar sizes to 6 from 20
This PR introduces 3 changes:
1. SiteSetting.avatar_sizes, now does what is says on the tin.
previously it would introduce a large number of extra sizes, to allow for
various DPIs. Instead we now trust the admin with the size list.
2. When `avatar_sizes` changes, we ensure consistency and remove resized
avatars that are not longer allowed per site setting. This happens on the
12 hourly job and limited out of the box to 20k cleanups per cycle, given
this may reach out to AWS 20k times to remove things.
3.Our default avatar sizes are now "24|48|72|96|144|288" these sizes were
very specifically picked to limit amount of bluriness introduced by webkit.
Our avatars are already blurry due to 1px border, so this corrects old blur.
This change heavily reduces storage required by forums which simplifies
site moves and more.
Co-authored-by: David Taylor <david@taylorhq.com>
This commit makes some fundamental changes to how hashtag cooking and
icon generation works in the new experimental hashtag autocomplete mode.
Previously we cooked the appropriate SVG icon with the cooked hashtag,
though this has proved inflexible especially for theming purposes.
Instead, we now cook a data-ID attribute with the hashtag and add a new
span as an icon placeholder. This is replaced on the client side with an
icon (or a square span in the case of categories) on the client side via
the decorateCooked API for posts and chat messages.
This client side logic uses the generated hashtag, category, and channel
CSS classes added in a previous commit.
This is missing changes to the sidebar to use the new generated CSS
classes and also colors and the split square for categories in the
hashtag autocomplete menu -- I will tackle this in a separate PR so it
is clearer.
Watched words were converted to regular expressions containing \W, which
handled only ASCII characters. Using [^[:word]] instead ensures that
UTF-8 characters are also handled correctly.
This feature will allow sites to define which emoji are not allowed. Emoji in this list should be excluded from the set we show in the core emoji picker used in the composer for posts when emoji are enabled. And they should not be allowed to be chosen to be added to messages or as reactions in chat.
This feature prevents denied emoji from appearing in the following scenarios:
- topic title and page title
- private messages (topic title and body)
- inserting emojis into a chat
- reacting to chat messages
- using the emoji picker (composer, user status etc)
- using search within emoji picker
It also takes into account the various ways that emojis can be accessed, such as:
- emoji autocomplete suggestions
- emoji favourites (auto populates when adding to emoji deny list for example)
- emoji inline translations
- emoji skintones (ie. for certain hand gestures)
This was added in d3f02a1270
for hashtags but later removed usage in
b2acc416e7. It was removed because
serializing the user does not include things like their
secure_categories.
It is not used by any other plugins or themes, and can cause
issues where it will error when operating on a null user. Better
to just pass in the user_id and use it to look up a user
directly in a PrettyText::Helper
There was an issue where if hashtag-cooked HTML was sent
to the ExcerptParser without the keep_svg option, we would
end up with empty </use> and </svg> tags on the parts of the
excerpt where the hashtag was, in this case when a post
push notification was sent.
Fixed this, and also added a way to only display a plaintext
version of the hashtag for cases like this via PrettyText#excerpt.
* FIX: Use Category.secured(guardian) for hashtag datasource
Follow up to comments in #19219, changing the category
hashtag datasource to use Category.secured(guardian) instead
of Site.new(guardian).categories here since the latter does
more work for not much benefit, and the query time is the
same. Also eliminates some Hash -> Model back and forth
busywork. Add some more specs too.
* FIX: Server-side hashtag lookup cooking user loading
When we were using the PrettyText.options.currentUser
and parsing back and forth with JSON for the hashtag
lookups server-side, we had a bug where the user's
secure categories were not loaded since we never actually
loaded a User model from the database, only parsed it
from JSON.
This commit fixes the issue by instead using the
PretyText.options.userId and looking up the user directly
from the database when calling hashtag_lookup via the
PrettyText::Helpers code when cooking server-side. Added
the missing spec to check for this as well.
Note that we don't have a database table and a model for post mentions yet, and I decided to implement it without adding one to avoid heavy data migrations. Still, we may want to add such a model later, that would be convenient, we have such a model for mentions in chat.
Note that status appears on all mentions on all posts in a topic except of the case when you just posted a new post, and it appeared on the bottom of the topic. On such posts, status won't be shown immediately for now (you'll need to reload the page to see the status). I'll take care of it in one of the following PRs.
This commit fleshes out and adds functionality for the new `#hashtag` search and
lookup system, still hidden behind the `enable_experimental_hashtag_autocomplete`
feature flag.
**Serverside**
We have two plugin API registration methods that are used to define data sources
(`register_hashtag_data_source`) and hashtag result type priorities depending on
the context (`register_hashtag_type_in_context`). Reading the comments in plugin.rb
should make it clear what these are doing. Reading the `HashtagAutocompleteService`
in full will likely help a lot as well.
Each data source is responsible for providing its own **lookup** and **search**
method that returns hashtag results based on the arguments provided. For example,
the category hashtag data source has to take into account parent categories and
how they relate, and each data source has to define their own icon to use for the
hashtag, and so on.
The `Site` serializer has two new attributes that source data from `HashtagAutocompleteService`.
There is `hashtag_icons` that is just a simple array of all the different icons that
can be used for allowlisting in our markdown pipeline, and there is `hashtag_context_configurations`
that is used to store the type priority orders for each registered context.
When sending emails, we cannot render the SVG icons for hashtags, so
we need to change the HTML hashtags to the normal `#hashtag` text.
**Markdown**
The `hashtag-autocomplete.js` file is where I have added the new `hashtag-autocomplete`
markdown rule, and like all of our rules this is used to cook the raw text on both the clientside
and on the serverside using MiniRacer. Only on the server side do we actually reach out to
the database with the `hashtagLookup` function, on the clientside we just render a plainer
version of the hashtag HTML. Only in the composer preview do we do further lookups based
on this.
This rule is the first one (that I can find) that uses the `currentUser` based on a passed
in `user_id` for guardian checks in markdown rendering code. This is the `last_editor_id`
for both the post and chat message. In some cases we need to cook without a user present,
so the `Discourse.system_user` is used in this case.
**Chat Channels**
This also contains the changes required for chat so that chat channels can be used
as a data source for hashtag searches and lookups. This data source will only be
used when `enable_experimental_hashtag_autocomplete` is `true`, so we don't have
to worry about channel results suddenly turning up.
------
**Known Rough Edges**
- Onebox excerpts will not render the icon svg/use tags, I plan to address that in a follow up PR
- Selecting a hashtag + pressing the Quote button will result in weird behaviour, I plan to address that in a follow up PR
- Mixed hashtag contexts for hashtags without a type suffix will not work correctly, e.g. #ux which is both a category and a channel slug will resolve to a category when used inside a post or within a [chat] transcript in that post. Users can get around this manually by adding the correct suffix, for example ::channel. We may get to this at some point in future
- Icons will not show for the hashtags in emails since SVG support is so terrible in email (this is not likely to be resolved, but still noting for posterity)
- Additional refinements and review fixes wil
This commit renames all secure_media related settings to secure_uploads_* along with the associated functionality.
This is being done because "media" does not really cover it, we aren't just doing this for images and videos etc. but for all uploads in the site.
Additionally, in future we want to secure more types of uploads, and enable a kind of "mixed mode" where some uploads are secure and some are not, so keeping media in the name is just confusing.
This also keeps compatibility with the `secure-media-uploads` path, and changes new
secure URLs to be `secure-uploads`.
Deprecated settings:
* secure_media -> secure_uploads
* secure_media_allow_embed_images_in_emails -> secure_uploads_allow_embed_images_in_emails
* secure_media_max_email_embed_image_size_kb -> secure_uploads_max_email_embed_image_size_kb
We were already compiling the markdown bundle via ember-cli, but that version was only being used in the test environment. This commit improves the implementation, and updates the filename so it's also used in production.
This commit also
- Removes the vendored copy of `markdown-it.js` and fetches from node_modules instead
- Updates `pretty_text.rb` to remove the custom sprockets-manifest-parsing
- Removes `pretty-text-bundle.js`, which was only being used by `pretty_text.rb`
This is a much better description of its function. It performs idempotent normalization of a URL. If consumers truly need to `encode` a URL (including double-encoding of existing encoded entities), they can use the existing `.encode` method.
* FEATURE: Add case-sensitivity flag to watched_words
Currently, all watched words are matched case-insensitively. This flag
allows a watched word to be flagged for case-sensitive matching.
To allow allow for backwards compatibility the flag is set to false by
default.
* FEATURE: Support case-sensitive creation of Watched Words via API
Extend admin creation and upload of Watched Words to support case
sensitive flag. This lays the ground work for supporting
case-insensitive matching of Watched Words.
Support for an extra column has also been introduced for the Watched
Words upload CSV file. The new column structure is as follows:
word,replacement,case_sentive
* FEATURE: Enable case-sensitive matching of Watched Words
WordWatcher's word_matcher_regexp now returns a list of regular
expressions instead of one case-insensitive regular expression.
With the ability to flag a Watched Word as case-sensitive, an action
can have words of both sensitivities.This makes the use of the global
Regexp::IGNORECASE flag added to all words problematic.
To get around platform limitations around the use of subexpression level
switches/flags, a list of regular expressions is returned instead, one for each
case sensitivity.
Word matching has also been updated to use this list of regular expressions
instead of one.
* FEATURE: Use case-sensitive regular expressions for Watched Words
Update Watched Words regular expressions matching and processing to handle
the extra metadata which comes along with the introduction of
case-sensitive Watched Words.
This allows case-sensitive Watched Words to matched as such.
* DEV: Simplify type casting of case-sensitive flag from uploads
Use builtin semantics instead of a custom method for converting
string case flags in uploaded Watched Words to boolean.
* UX: Add case-sensitivity details to Admin Watched Words UI
Update Watched Word form to include a toggle for case-sensitivity.
This also adds support for, case-sensitive testing and matching of Watched Word
in the admin UI.
* DEV: Code improvements from review feedback
- Extract watched word regex creation out to a utility function
- Make JS array presence check more explicit and readable
* DEV: Extract Watched Word regex creation to utility function
Clean-up work from review feedback. Reduce code duplication.
* DEV: Rename word_matcher_regexp to word_matcher_regexp_list
Since a list is returned now instead of a single regular expression,
change `word_matcher_regexp` to `word_matcher_regexp_list` to better communicate
this change.
* DEV: Incorporate WordWatcher updates from upstream
Resolve conflicts and ensure apply_to_text does not remove non-word characters in matches
that aren't at the beginning of the line.
This commit introduces a new site setting: `block_hotlinked_media`. When enabled, all attempts to hotlink media (images, videos, and audio) will fail, and be replaced with a linked placeholder. Exceptions to the rule can be added via `block_hotlinked_media_exceptions`.
`download_remote_image_to_local` can be used alongside this feature. In that case, hotlinked images will be blocked immediately when the post is created, but will then be replaced with the downloaded version a few seconds later.
This implementation is purely server-side, and does not impact the composer preview.
Technically, there are two stages to this feature:
1. `PrettyText.sanitize_hotlinked_media` is called during `PrettyText.cook`, and whenever new images are introduced by Onebox. It will iterate over all src/srcset attributes in the post HTML and check if they're allowed. If not, the attributes will be removed and replaced with a `data-blocked-hotlinked-src(set)` attribute
2. In the `CookedPostProcessor`, we iterate over all `data-blocked-hotlinked-src(set)` attributes and check whether we have a downloaded version of the media. If yes, we update the src to use the downloaded version. If not, the entire media element is replaced with a placeholder. The placeholder is labelled 'external media', and is a link to the offsite media.
This option will make it so the [quote] bbcode will always
include the HTML link to the quoted post, even if a topic_id
is not provided in the PrettyText#cook options. This is so
[quote] bbcode can be used in other places, like chat messages,
that always need the link and do not have an "off-topic" ID
to use.
This will allow pretty text deprecations / errors / warnings to appear in the Rails logs, rather than disappearing silently.
(implementation adapted from `discourse_js_processor.rb`)
Sometimes plugins need to have additional data or options available
when rendering custom markdown features/rules that are not available
on the default opts.discourse object. These additional options should
be namespaced to the plugin adding them.
```
Site.markdown_additional_options["chat"] = { limited_pretty_text_markdown_rules: [] }
```
These are passed down to markdown rules on opts.discourse.additionalOptions.
The main motivation for adding this is the chat plugin, which currently stores
chat_pretty_text_features and chat_pretty_text_markdown_rules on
the Site object via additions to the serializer, and the Site object is
not accessible to import via markdown rules (either through
Site.current() or through container.lookup). So, to have this working
for both front + backend code, we need to attach these additional options
from the Site object onto the markdown options object.
This commit extends the options which can be passed to
`PrettyText.markdown` so that which Markdown-it rules and Discourse
Markdown plugins to be used when rendering a text can be customizable.
Currently, this extension is mainly used by plugins.
When rendering the markdown code blocks we replace the
offending characters in the output string with spans highlighting a textual
representation of the character, along with a title attribute with
information about why the character was highlighted.
The list of characters stripped by this fix, which are the bidirectional
characters considered relevant, are:
U+202A
U+202B
U+202C
U+202D
U+202E
U+2066
U+2067
U+2068
U+2069
If a user posted a URL that appeared inside a Onebox, then the user
got a duplicate link notice. This was fixed by skipping those links in
Ruby.
If a user posted a URL that was Oneboxes and contained other links that
appeared in previous posts, then the user got a duplicate link notice.
This was fixed by skipping those links in JavaScript.
It was not clear that replace watched words can be used to replace text
with URLs. This introduces a new watched word type that makes it easier
to understand.
Watched words are always regular expressions, despite watched_words_
_regular_expressions being enabled or not. Internally, wildcard
characters are replaced with a regular expression that matches any non
whitespace character.
This is not a security issue because regular users are not allowed to insert FA icons anywhere in the app. Admins can insert icons via custom badges, but they do have the ability to create themes with JS.
This commit includes other various improvements to watched words.
auto_silence_first_post_regex site setting was removed because it overlapped
with 'require approval' watched words.