This fix changes the hashtag-raw hashtags, which are
the ones that do not actually match anything, back
to the old style which does not look like mentions.
Follow up from d3f02a1270
This commit fixes post quoting so that if the new
hashtag-cooked HTML is selected, we convert back to
a regular plain text #hashtag with the correct type and ref.
This commit fleshes out and adds functionality for the new `#hashtag` search and
lookup system, still hidden behind the `enable_experimental_hashtag_autocomplete`
feature flag.
**Serverside**
We have two plugin API registration methods that are used to define data sources
(`register_hashtag_data_source`) and hashtag result type priorities depending on
the context (`register_hashtag_type_in_context`). Reading the comments in plugin.rb
should make it clear what these are doing. Reading the `HashtagAutocompleteService`
in full will likely help a lot as well.
Each data source is responsible for providing its own **lookup** and **search**
method that returns hashtag results based on the arguments provided. For example,
the category hashtag data source has to take into account parent categories and
how they relate, and each data source has to define their own icon to use for the
hashtag, and so on.
The `Site` serializer has two new attributes that source data from `HashtagAutocompleteService`.
There is `hashtag_icons` that is just a simple array of all the different icons that
can be used for allowlisting in our markdown pipeline, and there is `hashtag_context_configurations`
that is used to store the type priority orders for each registered context.
When sending emails, we cannot render the SVG icons for hashtags, so
we need to change the HTML hashtags to the normal `#hashtag` text.
**Markdown**
The `hashtag-autocomplete.js` file is where I have added the new `hashtag-autocomplete`
markdown rule, and like all of our rules this is used to cook the raw text on both the clientside
and on the serverside using MiniRacer. Only on the server side do we actually reach out to
the database with the `hashtagLookup` function, on the clientside we just render a plainer
version of the hashtag HTML. Only in the composer preview do we do further lookups based
on this.
This rule is the first one (that I can find) that uses the `currentUser` based on a passed
in `user_id` for guardian checks in markdown rendering code. This is the `last_editor_id`
for both the post and chat message. In some cases we need to cook without a user present,
so the `Discourse.system_user` is used in this case.
**Chat Channels**
This also contains the changes required for chat so that chat channels can be used
as a data source for hashtag searches and lookups. This data source will only be
used when `enable_experimental_hashtag_autocomplete` is `true`, so we don't have
to worry about channel results suddenly turning up.
------
**Known Rough Edges**
- Onebox excerpts will not render the icon svg/use tags, I plan to address that in a follow up PR
- Selecting a hashtag + pressing the Quote button will result in weird behaviour, I plan to address that in a follow up PR
- Mixed hashtag contexts for hashtags without a type suffix will not work correctly, e.g. #ux which is both a category and a channel slug will resolve to a category when used inside a post or within a [chat] transcript in that post. Users can get around this manually by adding the correct suffix, for example ::channel. We may get to this at some point in future
- Icons will not show for the hashtags in emails since SVG support is so terrible in email (this is not likely to be resolved, but still noting for posterity)
- Additional refinements and review fixes wil
This commit renames all secure_media related settings to secure_uploads_* along with the associated functionality.
This is being done because "media" does not really cover it, we aren't just doing this for images and videos etc. but for all uploads in the site.
Additionally, in future we want to secure more types of uploads, and enable a kind of "mixed mode" where some uploads are secure and some are not, so keeping media in the name is just confusing.
This also keeps compatibility with the `secure-media-uploads` path, and changes new
secure URLs to be `secure-uploads`.
Deprecated settings:
* secure_media -> secure_uploads
* secure_media_allow_embed_images_in_emails -> secure_uploads_allow_embed_images_in_emails
* secure_media_max_email_embed_image_size_kb -> secure_uploads_max_email_embed_image_size_kb
Upgrading to Markdown.it v13 broke empty inline BBCodes. This works around the problem by adding an empty token before a closing token if the previous token was a BBCode token.
It also removes the unused `jump` attribute which was removed in Markdown.it v12.3
* FEATURE: Add case-sensitivity flag to watched_words
Currently, all watched words are matched case-insensitively. This flag
allows a watched word to be flagged for case-sensitive matching.
To allow allow for backwards compatibility the flag is set to false by
default.
* FEATURE: Support case-sensitive creation of Watched Words via API
Extend admin creation and upload of Watched Words to support case
sensitive flag. This lays the ground work for supporting
case-insensitive matching of Watched Words.
Support for an extra column has also been introduced for the Watched
Words upload CSV file. The new column structure is as follows:
word,replacement,case_sentive
* FEATURE: Enable case-sensitive matching of Watched Words
WordWatcher's word_matcher_regexp now returns a list of regular
expressions instead of one case-insensitive regular expression.
With the ability to flag a Watched Word as case-sensitive, an action
can have words of both sensitivities.This makes the use of the global
Regexp::IGNORECASE flag added to all words problematic.
To get around platform limitations around the use of subexpression level
switches/flags, a list of regular expressions is returned instead, one for each
case sensitivity.
Word matching has also been updated to use this list of regular expressions
instead of one.
* FEATURE: Use case-sensitive regular expressions for Watched Words
Update Watched Words regular expressions matching and processing to handle
the extra metadata which comes along with the introduction of
case-sensitive Watched Words.
This allows case-sensitive Watched Words to matched as such.
* DEV: Simplify type casting of case-sensitive flag from uploads
Use builtin semantics instead of a custom method for converting
string case flags in uploaded Watched Words to boolean.
* UX: Add case-sensitivity details to Admin Watched Words UI
Update Watched Word form to include a toggle for case-sensitivity.
This also adds support for, case-sensitive testing and matching of Watched Word
in the admin UI.
* DEV: Code improvements from review feedback
- Extract watched word regex creation out to a utility function
- Make JS array presence check more explicit and readable
* DEV: Extract Watched Word regex creation to utility function
Clean-up work from review feedback. Reduce code duplication.
* DEV: Rename word_matcher_regexp to word_matcher_regexp_list
Since a list is returned now instead of a single regular expression,
change `word_matcher_regexp` to `word_matcher_regexp_list` to better communicate
this change.
* DEV: Incorporate WordWatcher updates from upstream
Resolve conflicts and ensure apply_to_text does not remove non-word characters in matches
that aren't at the beginning of the line.
Also, the change in insert-hyperlink (from `this.linkUrl.indexOf("http") === -1` to `!this.linkUrl.startsWith("http")`) was intentional fix: we don't want to prevent users from looking up topics with http in their titles.
Seems to only be a problem when a markdown.it rule inserts links without a attribute value. There's no test, because it's not reproducible with the markdown rules in core.
Updates markdown-it to v13.0.1
Noteworthy changes:
* `markdownit()` is now available on `globalThis` instead of `window`.
* The `text_collapse` rule was renamed to `fragments_join` which affected the `bbcode-inline` implementation.
* The `linkify` rule was added to the `inline` chain which affected the handling of the `[url]` BBCode. If available, our implementation reuses `link_open` and `link_close` tokens created by linkify in order to prevent duplicate links.
* The rendered HTML for code changed slightly. There's now a linebreak before the `</code>` tag. The tests were adjusted accordingly.
String.prototype.substr() is deprecated so we replace it with String.prototype.slice() which works similarily but isn't deprecated.
Signed-off-by: Tobias Speicher <rootcommander@gmail.com>
Co-authored-by: Jarek Radosz <jradosz@gmail.com>
Previously, our `upload://` protocol urls were only supported in markdown image tags. This meant that our PullHotlinkedImages job was forced to convert `<img` tags to markdown. Depending on the exact syntax, this can actually cause the image to break.
This commit adds support for `upload://` inside regular HTML `<img` tags. In a future commit, we'll be able to use this to make our PullHotlinkedImages job much more robust.
Context at https://meta.discourse.org/t/152801
This option will make it so the [quote] bbcode will always
include the HTML link to the quoted post, even if a topic_id
is not provided in the PrettyText#cook options. This is so
[quote] bbcode can be used in other places, like chat messages,
that always need the link and do not have an "off-topic" ID
to use.
Allows to write custom code blocks:
```
```mermaid height=200,foo=bar
test
```
```
Which will then get converted to:
```
<pre data-code-wrap="mermaid" data-code-height="200" data-code-foo="bar">
<code class="lang-nohighlight">
test
</code>
</pre>
```
Watched words of type 'replace' or 'link' replaced the text inside
mentions or hashtags too, which broke these. These types of watched
words must skip any match that has an @ or # before it.
The generated regular expressions did not contain \b which matched
every text that contained the word, even if it was only a substring of
a word.
For example, if "art" was a watched word a post containing word
"artist" matched.
It was not clear that replace watched words can be used to replace text
with URLs. This introduces a new watched word type that makes it easier
to understand.
Over the years we accrued many spelling mistakes in the code base.
This PR attempts to fix spelling mistakes and typos in all areas of the code that are extremely safe to change
- comments
- test descriptions
- other low risk areas
Watched words are always regular expressions, despite watched_words_
_regular_expressions being enabled or not. Internally, wildcard
characters are replaced with a regular expression that matches any non
whitespace character.
Headings with the exact same name generated exactly the same heading
names, which was invalid. This replaces the old code for generating
names for non-English headings which were using URI encode and resulted
in unreadable headings.
* FIX: Use theme color for anchor icon
* FIX: Do not count anchor links
* FIX: Do not count hashtags links either
* DEV: Add tests for link_count
* FIX: Disable anchors in quotes and preview
* FIX: Try building some anchor slugs for unicode
* DEV: Fix tests
Applying oneboxes and replacing censored watched words does not happen
in a strict order which often lead to inconsistencies. This commit
fixes the behavior and will never censor oneboxes.
To make it always censor oneboxes implies significant changes to the
PrettyText pipeline.
We override the default replacements rule to no longer replace "(c)", "(p)", and "(p)". Additionally, we merged the custom arrows rule into the replacement function.
This commit includes other various improvements to watched words.
auto_silence_first_post_regex site setting was removed because it overlapped
with 'require approval' watched words.