Commit Graph

139 Commits

Author SHA1 Message Date
Gary Pendergast
f53c734ba6
FEATURE: Add a onebox_locale site setting. (#30655)
Following on from f369db5ae9, this change adds the ability to choose a custom locale to send to onebox providers.

If this setting is left blank, it will fall back to using default_locale.
2025-01-09 14:11:37 +11:00
Gary Pendergast
f369db5ae9
FIX: Ensure Onebox requests ask for the correct language. (#30637)
Onebox embeds currently default to accepting any language response from the destination, which can have some surprising behaviour. For example the `curl` equivalent of what Onebox does:

```
% curl -si -H "Accept-Language: *" 'https://developer.android.com/studio' | grep location:
location: /studio?hl=hi
```

This PR uses the value of `SiteSetting.default_locale` to populate the `Accept-Language` header, falling back to English if that isn't available, then finally accepting whatever language the destination makes available.
2025-01-09 09:08:27 +11:00
Alan Guo Xiang Tan
57f4176b57
DEV: Bump rubocop_discourse (#29608) 2024-11-06 06:27:49 +08:00
Martin Brennan
97e2b353f6
FEATURE: Allow for multiple GitHub onebox tokens (#27887)
Followup 560e8aff75

GitHub auth tokens cannot be made with permissions to
access multiple organisations. This is quite limiting.
This commit changes the site setting to be a "secret list"
type, which allows for a key/value mapping where the value
is treated like a password in the UI.

Now when a GitHub URL is requested for oneboxing, the
org name from the URL is used to determine which token
to use for the request.

Just in case anyone used the old site setting already,
there is a migration to create a `default` entry
with that token in the new list setting, and for
a period of time we will consider that token valid to
use for all GitHub oneboxes as well.
2024-07-15 13:07:36 +10:00
Martin Brennan
560e8aff75
FEATURE: Allow oneboxing private GitHub URLs (#27705)
This commit adds the ability to onebox private GitHub
commits, pull requests, issues, blobs, and actions using
a new `github_onebox_access_token` site setting. The token
must be set up in correctly to have access to the repos needed.

To do this successfully with the Oneboxer, we need to skip
redirects on the github.com host, otherwise we get a 404
on the URL before it is translated into a GitHub API URL
and has the appropriate headers added.
2024-07-10 09:39:31 +10:00
Roman Rizzi
a709b7e861
FIX: Allow sanitized-HTML in GH issues and categories oneboxes. (#25374)
Follow-up to d78357917c

Related meta topic: https://meta.discourse.org/t/html-is-not-render-on-category-onebox-description/289424:
2024-01-22 15:25:29 -03:00
Martin Brennan
30d5e752d7
DEV: Revert guardian changes (#24742)
I took the wrong approach here, need to rethink.

* Revert "FIX: Use Guardian.basic_user instead of new (anon) (#24705)"

This reverts commit 9057272ee2.

* Revert "DEV: Remove unnecessary method_missing from GuardianUser (#24735)"

This reverts commit a5d4bf6dd2.

* Revert "DEV: Improve Guardian devex (#24706)"

This reverts commit 77b6a038ba.

* Revert "FIX: Introduce Guardian::BasicUser for oneboxing checks (#24681)"

This reverts commit de983796e1.
2023-12-06 16:37:32 +10:00
Martin Brennan
9057272ee2
FIX: Use Guardian.basic_user instead of new (anon) (#24705)
c.f. de983796e1

There will soon be additional login_required checks
for Guardian, and the intent of many checks by automated
systems is better fulfilled by using BasicUser, which
simulates a logged in TL0 forum user, rather than an
anon user.

In some cases the use of anon still makes sense (e.g.
anonymous_cache), and in that case the more explicit
`Guardian.anon_user` is used
2023-12-06 11:56:21 +10:00
Martin Brennan
de983796e1
FIX: Introduce Guardian::BasicUser for oneboxing checks (#24681)
Through internal discussion, it has become clear that
we need a conceptual Guardian user that bridges the
gap between anon users and a logged in forum user with
an absolute baseline level of access to public topics,
which can be used in cases where:

1. Automated systems are running which shouldn't see any
   private data
1. A baseline level of user access is needed

In this case we are fixing the latter; when oneboxing a local
topic, and we are linking to a topic in another category from
the current one, we need to operate off a baseline level of
access, since not all users have access to the same categories,
and we don't want e.g. editing a post with an internal link to
expose sensitive internal information.
2023-12-05 09:25:23 +10:00
David Taylor
e9387e238c
FIX: Do not follow redirects for twitter oneboxes (#22362)
Twitter is now redirecting anonymous users (with a browser-like user agent, which FinalDestination uses) to the login page. Skipping redirect-following for twitter.com will allow us to continue oneboxing tweets via the OpenGraph data and the API (when credentials are present).

https://meta.discourse.org/t/269371/17
2023-06-30 11:30:03 +01:00
Sam
9e241e82e9
DEV: use HTML5 version of loofah (#21522)
https://meta.discourse.org/t/markdown-preview-and-result-differ/263878

The result of this markdown had different results in the composer preview and the post. This is solved by updating Loofah to the latest version and using html5 fragments like our user had reported. While the change was only needed in cooked_post_processor.rb for this fix, other areas also had to be updated due to various side effects.
2023-06-20 09:49:22 +08:00
Penar Musaraj
a67c96438c
UX: Fix user onebox layout (#21284) 2023-04-28 09:50:49 -04:00
Gerhard Schlager
7fd63b34b1
DEV: Make it obvious that joined translation is used by onebox (#20158)
This also moves the date as interpolation key into the string which makes translation easier.
2023-02-03 10:02:14 +08:00
Bianca Nenciu
c186a46910
SECURITY: Prevent XSS in local oneboxes (#20008)
Co-authored-by: OsamaSayegh <asooomaasoooma90@gmail.com>
2023-01-25 19:17:21 +02:00
Daniel Waterworth
666536cbd1
DEV: Prefer \A and \z over ^ and $ in regexes (#19936) 2023-01-20 12:52:49 -06:00
David Taylor
6417173082
DEV: Apply syntax_tree formatting to lib/* 2023-01-09 12:10:19 +00:00
Martin Brennan
9d50790530
FIX: Allow svg in oneboxer in certain cases (#19253)
When doing local oneboxes we sometimes want to allow
SVGs in the final preview HTML. The main case currently
is for the new cooked hashtags, which include an SVG
icon.

SVGs will be included in local oneboxes via `ExcerptParser` _only_
if they have the d-icon class, and if the caller for `post.excerpt`
specifies the `keep_svg: true` option.
2022-11-30 12:42:15 +10:00
Jarek Radosz
49fa2e93c2
DEV: Clean up twitter onebox code (#18012) 2022-08-21 19:26:24 +02:00
David Taylor
3c81683955 DEV: Rename UriHelper.escape_uri to .normalized_encode
This is a much better description of its function. It performs idempotent normalization of a URL. If consumers truly need to `encode` a URL (including double-encoding of existing encoded entities), they can use the existing `.encode` method.
2022-08-09 11:55:25 +01:00
Martin Brennan
61b9e3ee30
FIX: InlineOneboxer watched word censor error (#16921)
In 7328a2bfb0 we changed the
InlineOneboxer#onebox_for method to run the title of the
onebox through WatchedWord#censor_text. However, it is
allowable for the title to be nil, which was causing this
error in production:

> NoMethodError : undefined method gsub for nil:NilClass

We just need to check whether the title is nil before trying
to censor it.
2022-05-26 14:01:44 +10:00
Bianca Nenciu
6c8f491dc3
DEV: Allow plugins to register Onebox handlers (#16870)
This targets only the local Oneboxes and allows plugins to customize
regular or inline Oneboxes for routes inside the site.
2022-05-23 20:02:02 +03:00
Osama Sayegh
d15867463f
FEATURE: Site setting for blocking onebox of URLs that redirect (#16881)
Meta topic: https://meta.discourse.org/t/prevent-to-linkify-when-there-is-a-redirect/226964/2?u=osama.

This commit adds a new site setting `block_onebox_on_redirect` (default off) for blocking oneboxes (full and inline) of URLs that redirect. Note that an initial http → https redirect is still allowed if the redirect location is identical to the source (minus the scheme of course). For example, if a user includes a link to `http://example.com/page` and the link resolves to `https://example.com/page`, then the link will onebox (assuming it can be oneboxed) even if the setting is enabled. The reason for this is a user may type out a URL (i.e. the URL is short and memorizable) with http and since a lot of sites support TLS with http traffic automatically redirected to https, so we should still allow the URL to onebox.
2022-05-23 13:52:06 +03:00
Loïc Guitaut
46176b7dd7 DEV: Don’t patch Sanitize::Config
Currently we’re reopening the `Sanitize::Config` class (which is part of
the `sanitize` gem) to put our custom config for Onebox in it. This is
unnecessary as we can simply create a dedicated module to hold our
custom configuration.
2022-04-06 17:10:51 +02:00
Osama Sayegh
b0656f3ed0
FIX: Apply onebox blocked domain checks on every redirect (#16150)
The `blocked onebox domains` setting lets site owners change what sites
are allowed to be oneboxed. When a link is entered into a post,
Discourse checks the domain of the link against that setting and blocks
the onebox if the domain is blocked. But if there's a chain of
redirects, then only the final destination website is checked against
the site setting.

This commit amends that behavior so that every website in the redirect
chain is checked against the site setting, and if anything is blocked
the original link doesn't onebox at all in the post. The
`Discourse-No-Onebox` header is also checked in every response and the
onebox is blocked if the header is set to "1".

Additionally, Discourse will now include the `Discourse-No-Onebox`
header with every response if the site requires login to access content.
This is done to signal to a Discourse instance that it shouldn't attempt
to onebox other Discourse instances if they're login-only. Non-Discourse
websites can also use include that header if they don't wish to have
Discourse onebox their content.

Internal ticket: t59305.
2022-03-11 09:18:12 +03:00
Natalie Tay
aac9f43038
Only block domains at the final destination (#15689)
In an earlier PR, we decided that we only want to block a domain if 
the blocked domain in the SiteSetting is the final destination (/t/59305). That 
PR used `FinalDestination#get`. `resolve` however is used several places
 but blocks domains along the redirect chain when certain options are provided.

This commit changes the default options for `resolve` to not do that. Existing
users of `FinalDestination#resolve` are
- `Oneboxer#external_onebox`
- our onebox helper `fetch_html_doc`, which is used in amazon, standard embed 
and youtube
  - these folks already go through `Oneboxer#external_onebox` which already
  blocks correctly
2022-01-31 15:35:12 +08:00
David Taylor
03998e0a29
FIX: Use CDN URL for internal onebox avatars (#15077)
This commit will also trigger a background rebake for all existing posts with internal oneboxes
2021-11-25 12:07:34 +00:00
jbrw
2f28ba318c
FEATURE: Onebox can match engines based on the content_type (#13876)
* FEATURE: Onebox can match engines based on the content_type

`FinalDestination` now returns the `content_type` of a resolved URL.

`Oneboxer` passes this value to `Onebox` itself. Onebox engines can now specify a `matches_content_type` regex of content_types that the engine can handle, regardless of the URL.

`ImageOnebox` will match URLs with a content type of `image/png`, `jpg`, `gif`, `bmp`, `tif`, etc.

This will allow images that exist at a URL without a file type extension to be correctly rendered, assuming a valid `content_type` is returned.
2021-07-30 13:36:30 -04:00
Arpit Jalan
05bdbd9f97
SECURITY: Onebox canonical links bypassing FinalDestination checks (#13605) 2021-07-01 20:09:29 +05:30
Joffrey JAFFEUX
e50b7e9111
SECURITY: ensures timeouts are correctly used on connect (#13455) 2021-06-21 17:34:01 +02:00
Bianca Nenciu
d184fe59ca
FEATURE: Censor Oneboxes (#12902)
Previously onebox content was not passed by the censor regex, meaning you could sneak in censored words via onebox.
2021-06-03 11:39:12 +10:00
jbrw
461a2c334b
FIX: return an empty result if response from Amazon is missing expected attributes (#13173)
* FIX: return an empty result if response from Amazon is missing attributes

Check we have the basic attributes requires to construct a Onebox for Amazon.

This is an attempt to handle scenarios where we receive a valid 200-status response from an Amazon request that does not include the data we’re expecting.

* Update lib/onebox/engine/amazon_onebox.rb

Co-authored-by: Régis Hanol <regis@hanol.fr>

Co-authored-by: Régis Hanol <regis@hanol.fr>
2021-06-01 16:23:18 -04:00
jbrw
a24b6daa87
FIX: An unresolved blank uri should attempt an alternate Oneboxing strategy, if available (#13070) 2021-05-14 15:23:20 -04:00
jbrw
19182b1386
DEV: Oneboxer wildcard subdomains (#13015)
* DEV: Allow wildcards in Oneboxer optional domain Site Settings

Allows a wildcard to be used as a subdomain on Oneboxer-related SiteSettings, e.g.:

- `force_get_hosts`
- `cache_onebox_response_body_domains`
- `force_custom_user_agent_hosts`

* DEV: fix typos

* FIX: Try doing a GET after receiving a 500 error from a HEAD

By default we try to do a `HEAD` requests. If this results in a 500 error response, we should try to do a `GET`

* DEV: `force_get_hosts` should be a hidden setting

* DEV: Oneboxer Strategies

Have an alternative oneboxing ‘strategy’ (i.e., set of options) to use when an attempt to generate a Onebox fails. Keep track of any non-default strategies that were used on a particular host, and use that strategy for that host in the future.

Initially, the alternate strategy (`force_get_and_ua`) forces the FinalDestination step of Oneboxing to do a `GET` rather than `HEAD`, and forces a custom user agent.

* DEV: change stubbed return code

The stubbed status code needs to be a value not recognized by FinalDestination
2021-05-13 15:48:35 -04:00
Bianca Nenciu
21d1ee1065
FIX: Use Nokogiri and Loofah consistently (#12693)
CookedPostProcessor used Loofah to parse the cooked content of a post
and Nokogiri to parse cooked Oneboxes. Even though Loofah is built on
top of Nokogiri, replacing an element from the cooked post (a Nokogiri
node) with a parsed onebox (a Loofah node) produced a strange result
which included XML namespaces. Removing the mix and using Loofah
to parse Oneboxes fixed the problem.
2021-04-14 18:09:55 +03:00
jbrw
50252d803e
DEV: stub youtube embed requests (#12637)
* DEV: stub youtube embed requests

* DEV: Ignore redirects on youtube.com when oneboxing
2021-04-07 13:32:27 -04:00
jbrw
68d0916eb5
FEATURE: Oneboxer cache response body (#12562)
* FEATURE: Cache successful HTTP GET requests during Oneboxing

Some oneboxes may fail if when making excessive and/or odd requests against the target domains. This change provides a simple mechanism to cache the results of succesful GET requests as part of the oneboxing process, with the goal of reducing repeated requests and ultimately improving the rate of successful oneboxing.

To enable:

Set `SiteSetting.cache_onebox_response_body` to `true`

Add the domains you’re interesting in caching to `SiteSetting. cache_onebox_response_body_domains` e.g. `example.com|example.org|example.net`

Optionally set `SiteSetting.cache_onebox_user_agent` to a user agent string of your choice to use when making requests against domains in the above list.

* FIX: Swap order of duration and value in redis call

The correct order for `setex` arguments is `key`, `duration`, and `value`.

Duration and value had been flipped, however the code would not have thrown an error because we were caching the value of `1.day.to_i` for a period of 1 seconds… The intention appears to be to set a value of 1 (purely as a flag) for a period of 1 day.
2021-03-31 13:19:34 -04:00
jbrw
aed97c7bab
FIX: Add amazon sites to force_get_hosts (#12341)
It has been observed that doing a HEAD against an Amazon store URL may result in a 405 error being returned.

Skipping the HEAD request may result in an improved oneboxing experience when requesting these URLs.
2021-03-10 14:42:17 -05:00
jbrw
fff8a24f2b
FIX: Don’t display error if only error is a missing image (#12216)
`Onebox.preview` can return 0-to-n errors, where the errors are missing OpenGraph attributes (e.g. title, description, image, etc.). If any of these attributes are missing, we construct an error message and attach it to the Oneboxer preview HTML. The error message is something like:

 “Sorry, we were unable to generate a preview for this web page, because the following oEmbed / OpenGraph tags could not be found: description, image”

However, if the only missing tag is `image` we don’t need to display the error, as we have enough other data (title, description, etc.) to construct a useful/complete Onebox.
2021-02-25 14:30:40 -05:00
Martin Brennan
13c2a4886f
FEATURE: Add disable_onebox_media_download_controls hidden site setting (#12208)
Uses discourse/onebox@ff9ec90

Adds a hidden site setting called disable_onebox_media_download_controls which will add controlslist="nodownload" to video and audio oneboxes, and also to the local video and audio oneboxes within Discourse.
2021-02-25 12:39:15 +10:00
Dan Ungureanu
2d51833ca9
FIX: Make Oneboxer#apply insert block Oneboxes correctly (#11449)
It used to insert block Oneboxes inside paragraphs which resulted in
invalid HTML. This needed an additional parsing for removal of empty
paragraphs and the resulting HTML could still be invalid.

This commit ensure that block Oneboxes are inserted correctly, by
splitting the paragraph containing the link and putting the block
between the two. Paragraphs left with nothing but whitespaces will
be removed.

Follow up to 7f3a30d79f.
2020-12-14 17:49:37 +02:00
jbrw
da9b837da0
DEV: More robust processing of URLs (#11361)
* DEV: More robust processing of URLs

The previous `UrlHelper.encode_component(CGI.unescapeHTML(UrlHelper.unencode(uri))` method would naively process URLs, which could result in a badly formed response.

`Addressable::URI.normalized_encode(uri)` appears to deal with these edge-cases in a more robust way.

* DEV: onebox should use UrlHelper

* DEV: fix spec

* DEV: Escape output when rendering local links
2020-12-03 17:16:01 -05:00
jbrw
51f9a56137
FEATURE: Onebox local categories (#11311)
* FEATURE: onebox for local categories

This commit adjusts the category onebox to look more like the category boxes do on the category page.

Co-authored-by: Jordan Vidrine <jordan@jordanvidrine.com>
2020-11-25 10:53:05 +11:00
jbrw
331236d6d7
Onebox improved error handling and support for Instagram Access Tokens (#11253)
* FEATURE: display error if Oneboxing fails due to HTTP error

- display warning if onebox URL is unresolvable
- display warning if attributes are missing

* FEATURE: Use new Instagram oEmbed endpoint if access token is configured

Instagram requires an Access Token to access their oEmbed endpoint. The requirements (from https://developers.facebook.com/docs/instagram/oembed/) are as follows:

- a Facebook Developer account, which you can create at developers.facebook.com
- a registered Facebook app
- the oEmbed Product added to the app
- an Access Token
- The Facebook app must be in Live Mode

The generated Access Token, once added to SiteSetting.facebook_app_access_token, will be passed to onebox. Onebox can then use this token to access the oEmbed endpoint to generate a onebox for Instagram.

* DEV: update user agent string

* DEV: don’t do HEAD requests against news.yahoo.com

* DEV: Bump onebox version from 2.1.5 to 2.1.6

* DEV: Avoid re-reading templates

* DEV: Tweaks to onebox mustache templates

* DEV: simplified error message for missing onebox data

* Apply suggestions from code review
Co-authored-by: Gerhard Schlager <mail@gerhard-schlager.at>
2020-11-18 12:55:16 -05:00
David Taylor
a3577435f7
FEATURE: Additional control of iframes in oneboxes (#10523)
This commit adds a new site setting "allowed_onebox_iframes". By default, all onebox iframes are allowed. When the list of domains is restricted, Onebox will automatically skip engines which require those domains, and use a fallback engine.
2020-08-27 20:12:13 +01:00
Krzysztof Kotlarek
e0d9232259
FIX: use allowlist and blocklist terminology (#10209)
This is a PR of the renaming whitelist to allowlist and blacklist to the blocklist.
2020-07-27 10:23:54 +10:00
Dan Ungureanu
fe284ffd06
Revert "DEV: Remove useless code (#10130)"
Some oneboxes still generate empty P tags (video oneboxes).

This reverts commit c299d02287.
2020-06-29 13:56:28 +03:00
Dan Ungureanu
c299d02287
DEV: Remove useless code (#10130)
protection is not needed and can easily be bypassed with empty divs anyway.
2020-06-29 17:49:30 +10:00
Guo Xiang Tan
b28d97b64a
FIX: Bump onebox for twitch video and clips embedding fix. 2020-06-24 11:00:30 +08:00
Régis Hanol
91c89df68a FIX: onebox local topic when using slug-less URL
When linking to a topic in the same Discourse, we try to onebox the link to show the title
and other various information depending on whether it's a "standard" or "inline" onebox.

However, we were not properly detecting links to topics that had no slugs (eg. https://meta.discourse.org/t/1234).
2020-06-23 17:18:38 +02:00
Krzysztof Kotlarek
9bff0882c3
FEATURE: Nokogumbo (#9577)
* FEATURE: Nokogumbo

Use Nokogumbo HTML parser.
2020-05-05 13:46:57 +10:00