* SECURITY: Update default allowed iframes list
Change the default iframe url list to all include 3 slashes.
* SECURITY: limit group tag's name length
Limit the size of a group tag's name to 100 characters.
Internal ref - t/130059
* SECURITY: Improve sanitization of SVGs in Onebox
---------
Co-authored-by: Blake Erickson <o.blakeerickson@gmail.com>
Co-authored-by: Régis Hanol <regis@hanol.fr>
Co-authored-by: David Taylor <david@taylorhq.com>
* FIX: Add post id to the anchor to prevent two identical anchors
We generate anchors for headings in posts. This works fine if there is
only one post in a topic with anchors. The problem comes when you have
two or more posts with the same heading. PrettyText generates anchors
based on the heading text using the raw context of each post, so it is
entirely possible to generate the same anchor for two posts in the same
topic, especially for topics with template replies
Post1:
# heading
context
Post2:
# heading
context
When both posts are on the page at the same time, the anchor will only
work for the first post, according to the [HTML specification](https://html.spec.whatwg.org/multipage/browsing-the-web.html#scroll-to-the-fragment-identifier).
> If there is an a element in the document tree whose root is document
> that has a name attribute whose value is equal to fragment, then
> return the *first* such element in tree order.
This bug is particularly serious in forums with non-Latin languages,
such as Chinese. We do not generate slugs for Chinese, which results in
the heading anchors being completely dependent on their order.
```ruby
[2] pry(main)> PrettyText.cook("# 中文")
=> "<h1><a name=\"h-1\" class=\"anchor\" href=\"#h-1\"></a>中文</h1>"
```
Therefore, the anchors in the two posts must be in exactly the same by
order, causing almost all of the anchors in the second post to be
invalid.
This commit solves this problem by adding the `post_id` to the anchor.
The new anchor generation method will add `p-{post_id}` as a prefix when
post_id is available:
```ruby
[3] pry(main)> PrettyText.cook("# 中文", post_id: 1234)
=> "<h1><a name=\"p-1234-h-1\" class=\"anchor\" href=\"#p-1234-h-1\"></a>中文</h1>"
```
This way we can ensure that each anchor name only appears once on the
same topic. Using post id also prevents the potential possibility of the
same anchor name when splitting/merging topics.
The AdminPlugin JS model uses a similar pattern to chat models,
where it is a plain JS class manually converting provided
snake_case attributes from the serializer to JS camelCase.
However this doesn't work when it comes to using `add_to_serializer`
in plugins since core does not know about these new attributes.
Instead, we can use a JS function to convert snake_case to camelCase
and use that when initializing AdminPlugin. This commit also moves
similar functions to a new case-converter.js file in
discourse-common/lib.
In 53b3d2f0dc we introduced a stricter BBCode Tag parser. It prevents having "values" with spaces when they're not surrounded by a valid pair of quotes.
The `[details=` BBCode Tag is popular enough that it's worth adding a special case for it (especially since it doesn't support other parameters).
This also adds the Finnish pair of quotes.
Context - https://meta.discourse.org/t/details-accepts-only-one-word-as-summary/313019
Wasn't quite handling the cases where a closing bracket `]` was used in the value of one of the attributes.
```markdown
[chat quote=user channel="[broken]"]
```
Would not be correctly parsed because we would _greedily_ use the first `]` as the end of the tag even though it might be a valid character when inside proper quotes.
c39a4de139/app/assets/javascripts/discourse-markdown-it/src/features/bbcode-block.js (L62)
Re-wrote the `parseBBCodeTag` to properly handle the following cases
- A closing tag (aka `[/name]`) which are easy since they don't have any attributes
- An old `[quote=...]` format we used that doesn't uses quotes but still has various attributes of the form `key:value`
- All three valid BBCode opening tag formats we support
- `[name]` without any attributes
- `[name=foo]` with a default value
- `[name foo=bar]` with some attributes
Ended up having to fix/rewrite the few bbcode rules that were using the `parseBBCodeTag` function, namely `d-wrap` and `discourse-local-dates`.
While working on this, I think I also found a way to get rid the of shims we had in place so that plugins could use the `parseBBCodeTag` function.
Reference - https://meta.discourse.org/t/having-a-right-bracket-in-a-channel-name-breaks-all-quotes-from-that-channel/308439
Inline the helper functions, avoid creating and then immediately destructuring arrays, use complete strings instead of string interpolation, Map instead of a pojo.
> [code]
> line1
> line2
> [/code]
would render as
| line1
| > line2
instead of the correct
| line1
| line2
That was due to the `bbcode-block` code using a `slice` to get the content of a block and not taking into account it being nested in a quote block for example.
The fix was to get the content using the `getLines` utils method.
Context: https://meta.discourse.org/t/markdown-bbcode-code-quote-bug/299047
- Remove vendored copy
- Update Rails implementation to look for language definitions in node_modules
- Use webpack-based dynamic import for hljs core
- Use browser-native dynamic import for site-specific language bundle (and fallback to webpack-based dynamic import in tests)
- Simplify markdown implementation to allow all languages into the `lang-{blah}` className
- Now that all languages are passed through, resolve aliases at runtime to avoid the need for the pre-built `highlightjs-aliases` index
With Embroider, we can rely on async `import()` to do the splitting
for us.
This commit extracts from `pretty-text` all the parts that are
meant to be loaded async into a new `discourse-markdown-it` package
that is also a V2 addon (meaning that all files are presumed unused
until they are imported, aka "static").
Mostly I tried to keep the very discourse specific stuff (accessing
site settings and loading plugin features) inside discourse proper,
while the new package aims to have some resembalance of a general
purpose library, a MarkdownIt++ if you will. It is far from perfect
because of how all the "options" stuff work but I think it's a good
start for more refactorings (clearing up the interfaces) to happen
later.
With this, pretty-text and app/lib/text are mostly a kitchen sink
of loosely related text processing utilities.
After the refactor, a lot more code related to setting up the
engine are now loaded lazily, which should be a pretty nice win. I
also noticed that we are currently pulling in the `xss` library at
initial load to power the "sanitize" stuff, but I suspect with a
similar refactoring effort those usages can be removed too. (See
also #23790).
This PR does not attempt to fix the sanitize issue, but I think it
sets things up on the right trajectory for that to happen later.
Co-authored-by: David Taylor <david@taylorhq.com>