Commit Graph

88 Commits

Author SHA1 Message Date
Vinoth Kannan
d681decf01
FEATURE: use new site setting for onebox custom user agent. (#28045)
Previously, we couldn't change the user agent name dynamically for onebox requests. In this commit, a new hidden site setting `onebox_user_agent` is created to override the default user agent value specified in the [initializer](c333e9d6e6/config/initializers/100-onebox_options.rb (L15)).

Co-authored-by: Régis Hanol <regis@hanol.fr>
2024-07-24 04:45:30 +05:30
Loïc Guitaut
2a28cda15c DEV: Update to lastest rubocop-discourse 2024-05-27 18:06:14 +02:00
Alan Guo Xiang Tan
e31cf66f11
FIX: FinalDestination#get forwarding Authorization header on redirects (#27043)
This commits updates `FinalDestination#get` to not forward
`Authorization` header on redirects since most HTTP clients I tested like
curl and wget does not it.

This also fixes a recent problem in `DiscourseIpInfo.mmdb_download`
where we will fail to download the databases when both `GlobalSetting.maxmind_account_id` and
`GlobalSetting.maxmind_license_key` has been set. The failure is due to
the bug above where the redirected URL given by MaxMind does not accept
an `Authorization` header.
2024-05-16 08:37:34 +08:00
Kelv
2477bcc32e
DEV: lint against Layout/EmptyLineBetweenDefs (#24914) 2023-12-15 23:46:04 +08:00
Jarek Radosz
694b5f108b
DEV: Fix various rubocop lints (#24749)
These (21 + 3 from previous PRs) are soon to be enabled in rubocop-discourse:

Capybara/VisibilityMatcher
Lint/DeprecatedOpenSSLConstant
Lint/DisjunctiveAssignmentInConstructor
Lint/EmptyConditionalBody
Lint/EmptyEnsure
Lint/LiteralInInterpolation
Lint/NonLocalExitFromIterator
Lint/ParenthesesAsGroupedExpression
Lint/RedundantCopDisableDirective
Lint/RedundantRequireStatement
Lint/RedundantSafeNavigation
Lint/RedundantStringCoercion
Lint/RedundantWithIndex
Lint/RedundantWithObject
Lint/SafeNavigationChain
Lint/SafeNavigationConsistency
Lint/SelfAssignment
Lint/UnreachableCode
Lint/UselessMethodDefinition
Lint/Void

Previous PRs:
Lint/ShadowedArgument
Lint/DuplicateMethods
Lint/BooleanSymbol
RSpec/SpecFilePathSuffix
2023-12-06 23:25:00 +01:00
Jarek Radosz
74011232e9
FIX: Request html when fetching inline onebox data (#24674)
We do expect to receive html
2023-12-04 11:36:42 +10:00
Ted Johansson
54e813e964
FIX: Don't error out when trying to retrieve title and URL won't encode (#24660) 2023-12-01 15:03:06 +08:00
Martin Brennan
cf42466dea
DEV: Add S3 upload system specs using minio (#22975)
This commit adds some system specs to test uploads with
direct to S3 single and multipart uploads via uppy. This
is done with minio as a local S3 replacement. We are doing
this to catch regressions when uppy dependencies need to
be upgraded or we change uppy upload code, since before
this there was no way to know outside manual testing whether
these changes would cause regressions.

Minio's server lifecycle and the installed binaries are managed
by the https://github.com/discourse/minio_runner gem, though the
binaries are already installed on the discourse_test image we run
GitHub CI from.

These tests will only run in CI unless you specifically use the
CI=1 or RUN_S3_SYSTEM_SPECS=1 env vars.

For a history of experimentation here see https://github.com/discourse/discourse/pull/22381

Related PRs:

* https://github.com/discourse/minio_runner/pull/1
* https://github.com/discourse/minio_runner/pull/2
* https://github.com/discourse/minio_runner/pull/3
2023-08-23 11:18:33 +10:00
Ted Johansson
59867cc091
DEV: Gracefully handle user avatar download SSRF errors (#21523)
### Background

When SSRF detection fails, the exception bubbles all the way up, causing a log alert. This isn't actionable, and should instead be ignored. The existing `rescue` does already ignore network errors, but fails to account for SSRF exceptions coming from `FinalDestination`.

### What is this change?

This PR does two things.

---

Firstly, it introduces a common root exception class, `FinalDestination::SSRFError` for SSRF errors. This serves two functions: 1) it makes it easier to rescue both errors at once, which is generally what one wants to do and 2) prevents having to dig deep into the class hierarchy for the constant.

This change is fully backwards compatible thanks to how inheritance and exception handling works.

---

Secondly, it rescues this new exception in `UserAvatar.import_url_for_user`, which is causing sporadic errors to be logged in production. After this SSRF errors are handled the same as network errors.
2023-05-12 15:32:02 +08:00
David Taylor
6417173082
DEV: Apply syntax_tree formatting to lib/* 2023-01-09 12:10:19 +00:00
Daniel Waterworth
d9364a272e
FIX: When following redirects before cloning, use the first git request (#19269)
This is closer to git's redirect following behaviour. We prevented git
following redirects when we clone in order to prevent SSRF attacks.

Follow-up-to: 291bbc4fb9
2022-11-30 14:21:09 -06:00
David Taylor
68b4fe4cf8
SECURITY: Expand and improve SSRF Protections (#18815)
See https://github.com/discourse/discourse/security/advisories/GHSA-rcc5-28r3-23rr

Co-authored-by: OsamaSayegh <asooomaasoooma90@gmail.com>
Co-authored-by: Daniel Waterworth <me@danielwaterworth.com>
2022-11-01 16:33:17 +00:00
Loïc Guitaut
afe7785141 FIX: Swallow SSL errors when generating oneboxes 2022-08-09 16:54:45 +02:00
David Taylor
3c81683955 DEV: Rename UriHelper.escape_uri to .normalized_encode
This is a much better description of its function. It performs idempotent normalization of a URL. If consumers truly need to `encode` a URL (including double-encoding of existing encoded entities), they can use the existing `.encode` method.
2022-08-09 11:55:25 +01:00
Loïc Guitaut
1f5682b7d7 FIX: Don’t raise an error on onebox timeouts
Currently when generating oneboxes if the connection timeouts and we’re
using the `FinalDestination#get` method, then it raises an exception.

We already catch this exception when using the
`FinalDestination#resolve` method so this patch just applies the same
logic to `FinalDestination#get`.
2022-07-25 10:41:46 +02:00
Osama Sayegh
d15867463f
FEATURE: Site setting for blocking onebox of URLs that redirect (#16881)
Meta topic: https://meta.discourse.org/t/prevent-to-linkify-when-there-is-a-redirect/226964/2?u=osama.

This commit adds a new site setting `block_onebox_on_redirect` (default off) for blocking oneboxes (full and inline) of URLs that redirect. Note that an initial http → https redirect is still allowed if the redirect location is identical to the source (minus the scheme of course). For example, if a user includes a link to `http://example.com/page` and the link resolves to `https://example.com/page`, then the link will onebox (assuming it can be oneboxed) even if the setting is enabled. The reason for this is a user may type out a URL (i.e. the URL is short and memorizable) with http and since a lot of sites support TLS with http traffic automatically redirected to https, so we should still allow the URL to onebox.
2022-05-23 13:52:06 +03:00
Dan Ungureanu
8e9cbe9db4
FIX: Do not raise if title cannot be crawled (#16247)
If the crawled page returned an error, `FinalDestination#safe_get`
yielded `nil` for `uri` and `chunk` arguments. Another problem is that
`get` did not handle the case when `safe_get` failed and did not return
the `location` and `set_cookie` headers.
2022-03-22 20:13:27 +02:00
Bianca Nenciu
b0f414f7f5
DEV: Remove unused uri parameter (#16179)
The parameter is not used and it did not work properly anyway
because sometimes `@uri` is used instead of `uri`, which can
be different.
2022-03-16 16:42:25 +02:00
Osama Sayegh
b0656f3ed0
FIX: Apply onebox blocked domain checks on every redirect (#16150)
The `blocked onebox domains` setting lets site owners change what sites
are allowed to be oneboxed. When a link is entered into a post,
Discourse checks the domain of the link against that setting and blocks
the onebox if the domain is blocked. But if there's a chain of
redirects, then only the final destination website is checked against
the site setting.

This commit amends that behavior so that every website in the redirect
chain is checked against the site setting, and if anything is blocked
the original link doesn't onebox at all in the post. The
`Discourse-No-Onebox` header is also checked in every response and the
onebox is blocked if the header is set to "1".

Additionally, Discourse will now include the `Discourse-No-Onebox`
header with every response if the site requires login to access content.
This is done to signal to a Discourse instance that it shouldn't attempt
to onebox other Discourse instances if they're login-only. Non-Discourse
websites can also use include that header if they don't wish to have
Discourse onebox their content.

Internal ticket: t59305.
2022-03-11 09:18:12 +03:00
Osama Sayegh
9b5cc1424f
DEV: Don't mutate Excon.defaults[:middlewares] (#16151)
`Excon.defaults` and its middlewares array are constants that we
shouldn't mutate everytime `FinalDestination#resolve` is called.
2022-03-10 14:21:45 +03:00
jbrw
cf545be338
FIX: Increase FinalDestination MAX_REQUEST_SIZE_BYTES (#15998)
The default of 1Mb was preventing some valid Onebox requests from successfully completing.

Increasing this to 5Mb should reduce the number of unexpected failures.
2022-02-18 13:37:31 -05:00
Krzysztof Kotlarek
a34075d205
SECURITY: Onebox response timeout and size limit (#15927)
Validation to ensure that Onebox request is no longer than 10 seconds and response size is not bigger than 1 MB
2022-02-14 12:11:09 +11:00
Natalie Tay
aac9f43038
Only block domains at the final destination (#15689)
In an earlier PR, we decided that we only want to block a domain if 
the blocked domain in the SiteSetting is the final destination (/t/59305). That 
PR used `FinalDestination#get`. `resolve` however is used several places
 but blocks domains along the redirect chain when certain options are provided.

This commit changes the default options for `resolve` to not do that. Existing
users of `FinalDestination#resolve` are
- `Oneboxer#external_onebox`
- our onebox helper `fetch_html_doc`, which is used in amazon, standard embed 
and youtube
  - these folks already go through `Oneboxer#external_onebox` which already
  blocks correctly
2022-01-31 15:35:12 +08:00
Roman Rizzi
53abcd825d
FIX: Canonical URLs may be relative (#14825)
FinalDestination's follow_canonical mode used for embedded topics should work when canonical URLs are relative, as specified in [RFC 6596](https://datatracker.ietf.org/doc/html/rfc6596)
2021-11-05 14:20:14 -03:00
jbrw
978a005a42
FIX: resolve responses of 103 should be retried using small_get (#14773)
If the initial `get`/`head` response within `resolve` returns a status code of `103`, attempt to fetch the same URL with the alternative `small_get` method.
2021-10-29 14:51:56 -04:00
Roman Rizzi
4c2d5158c5
FIX: Follow the canonical URL when importing a remote topic. (#14489)
FinalDestination now supports the `follow_canonical` option, which will perform an initial GET request, parse the canonical link if present, and perform a HEAD request to it.

We use this mode during embeds to avoid treating URLs with different query parameters as different topics.
2021-10-01 12:48:21 -03:00
jbrw
2f28ba318c
FEATURE: Onebox can match engines based on the content_type (#13876)
* FEATURE: Onebox can match engines based on the content_type

`FinalDestination` now returns the `content_type` of a resolved URL.

`Oneboxer` passes this value to `Onebox` itself. Onebox engines can now specify a `matches_content_type` regex of content_types that the engine can handle, regardless of the URL.

`ImageOnebox` will match URLs with a content type of `image/png`, `jpg`, `gif`, `bmp`, `tif`, etc.

This will allow images that exist at a URL without a file type extension to be correctly rendered, assuming a valid `content_type` is returned.
2021-07-30 13:36:30 -04:00
jbrw
09d23a37a5
DEV: Add default Accept-Language to FinalDestination requests (#13817)
Not specifying an `Accept-Language` should be equivalent to specifying an `Accept-Language` of `*`, however some webservers seem to prefer it if we are explicit about being able to handle a response of content in any language.
2021-07-22 15:49:59 +10:00
Joffrey JAFFEUX
e50b7e9111
SECURITY: ensures timeouts are correctly used on connect (#13455) 2021-06-21 17:34:01 +02:00
jbrw
19182b1386
DEV: Oneboxer wildcard subdomains (#13015)
* DEV: Allow wildcards in Oneboxer optional domain Site Settings

Allows a wildcard to be used as a subdomain on Oneboxer-related SiteSettings, e.g.:

- `force_get_hosts`
- `cache_onebox_response_body_domains`
- `force_custom_user_agent_hosts`

* DEV: fix typos

* FIX: Try doing a GET after receiving a 500 error from a HEAD

By default we try to do a `HEAD` requests. If this results in a 500 error response, we should try to do a `GET`

* DEV: `force_get_hosts` should be a hidden setting

* DEV: Oneboxer Strategies

Have an alternative oneboxing ‘strategy’ (i.e., set of options) to use when an attempt to generate a Onebox fails. Keep track of any non-default strategies that were used on a particular host, and use that strategy for that host in the future.

Initially, the alternate strategy (`force_get_and_ua`) forces the FinalDestination step of Oneboxing to do a `GET` rather than `HEAD`, and forces a custom user agent.

* DEV: change stubbed return code

The stubbed status code needs to be a value not recognized by FinalDestination
2021-05-13 15:48:35 -04:00
Joffrey JAFFEUX
64dda7112d
FIX: correctly use timeouts in FileHelper and FinalDestination (#12921)
Previous refactors have lost usage of read_timeout in `FileHelper.download` and `FinalDestination` was incorrectly using `Net::HTTP.start` by setting `open_timeout` in the block instead of directly during the invocation.

Couldn't figure how to write a good test for this without slowing the spec.
2021-05-03 09:21:11 +02:00
jbrw
68d0916eb5
FEATURE: Oneboxer cache response body (#12562)
* FEATURE: Cache successful HTTP GET requests during Oneboxing

Some oneboxes may fail if when making excessive and/or odd requests against the target domains. This change provides a simple mechanism to cache the results of succesful GET requests as part of the oneboxing process, with the goal of reducing repeated requests and ultimately improving the rate of successful oneboxing.

To enable:

Set `SiteSetting.cache_onebox_response_body` to `true`

Add the domains you’re interesting in caching to `SiteSetting. cache_onebox_response_body_domains` e.g. `example.com|example.org|example.net`

Optionally set `SiteSetting.cache_onebox_user_agent` to a user agent string of your choice to use when making requests against domains in the above list.

* FIX: Swap order of duration and value in redis call

The correct order for `setex` arguments is `key`, `duration`, and `value`.

Duration and value had been flipped, however the code would not have thrown an error because we were caching the value of `1.day.to_i` for a period of 1 seconds… The intention appears to be to set a value of 1 (purely as a flag) for a period of 1 day.
2021-03-31 13:19:34 -04:00
jbrw
331236d6d7
Onebox improved error handling and support for Instagram Access Tokens (#11253)
* FEATURE: display error if Oneboxing fails due to HTTP error

- display warning if onebox URL is unresolvable
- display warning if attributes are missing

* FEATURE: Use new Instagram oEmbed endpoint if access token is configured

Instagram requires an Access Token to access their oEmbed endpoint. The requirements (from https://developers.facebook.com/docs/instagram/oembed/) are as follows:

- a Facebook Developer account, which you can create at developers.facebook.com
- a registered Facebook app
- the oEmbed Product added to the app
- an Access Token
- The Facebook app must be in Live Mode

The generated Access Token, once added to SiteSetting.facebook_app_access_token, will be passed to onebox. Onebox can then use this token to access the oEmbed endpoint to generate a onebox for Instagram.

* DEV: update user agent string

* DEV: don’t do HEAD requests against news.yahoo.com

* DEV: Bump onebox version from 2.1.5 to 2.1.6

* DEV: Avoid re-reading templates

* DEV: Tweaks to onebox mustache templates

* DEV: simplified error message for missing onebox data

* Apply suggestions from code review
Co-authored-by: Gerhard Schlager <mail@gerhard-schlager.at>
2020-11-18 12:55:16 -05:00
Krzysztof Kotlarek
e0d9232259
FIX: use allowlist and blocklist terminology (#10209)
This is a PR of the renaming whitelist to allowlist and blacklist to the blocklist.
2020-07-27 10:23:54 +10:00
Martin Brennan
edbc356593
FIX: Replace deprecated URI.encode, URI.escape, URI.unescape and URI.unencode (#8528)
The following methods have long been deprecated in ruby due to flaws in their implementation per http://blade.nagaokaut.ac.jp/cgi-bin/vframe.rb/ruby/ruby-core/29293?29179-31097:

URI.escape
URI.unescape
URI.encode
URI.unencode
escape/encode are just aliases for one another. This PR uses the Addressable gem to replace these methods with its own encode, unencode, and encode_component methods where appropriate.

I have put all references to Addressable::URI here into the UrlHelper to keep them corralled in one place to make changes to this implementation easier.

Addressable is now also an explicit gem dependency.
2019-12-12 12:49:21 +10:00
Joffrey JAFFEUX
0d3d2c43a0
DEV: s/\$redis/Discourse\.redis (#8431)
This commit also adds a rubocop rule to prevent global variables.
2019-12-03 10:05:53 +01:00
Arpit Jalan
00c406520e FEATURE: allow FinalDestination to use custom user agent for specific hosts 2019-11-07 14:47:51 +05:30
Arpit Jalan
e90aac11cb fix the build 2019-08-07 16:39:58 +05:30
Arpit Jalan
b0e781e2d4 FIX: do not follow redirect on same host with path /login or /session 2019-08-07 16:26:55 +05:30
Sam Saffron
7429700389 FIX: ensure we can download maxmind without redis or db config
This also corrects FileHelper.download so it supports "follow_redirect"
correctly (it used to always follow 1 redirect) and adds a `validate_url`
param that will bypass all uri validation if set to false (default is true)
2019-05-28 10:28:57 +10:00
Sam Saffron
30990006a9 DEV: enable frozen string literal on all files
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.

Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
2019-05-13 09:31:32 +08:00
Gerhard Schlager
92df6890df FIX: GET request didn't use headers 2019-03-08 21:36:49 +01:00
Sam
cfddfa6de2 SECURITY: bypass long GET requests
In some rare cases we would check URLs with very large payloads
this ensures we always bypass and do not read entire payloads
2019-02-27 14:51:28 +11:00
Arpit Jalan
1ab91f0474 FIX: preserve github fragment URL 2018-12-19 12:34:47 +05:30
Guo Xiang Tan
8dc1463ab3 Enable Lint/ShadowingOuterLocalVariable for Rubocop. 2018-09-04 10:16:42 +08:00
Bianca Nenciu
b6963b8ffb FIX: Ignore OneBox blacklisted domains. 2018-08-27 20:40:55 +02:00
Régis Hanol
de92913bf4 FIX: store the topic links using the cooked upload url 2018-08-14 12:23:32 +02:00
Robin Ward
7058205f70 FIX: Broken specs 2018-07-24 12:00:34 -04:00
Robin Ward
236243f38a SECURITY: Consider 0.0.0.0 a private IP 2018-07-24 11:16:27 -04:00
Guo Xiang Tan
d43895e2a0 Don't log 404s for FinalDestination.
* We can't do anything about 404s
2018-05-25 10:11:16 +08:00