FinalDestination's follow_canonical mode used for embedded topics should work when canonical URLs are relative, as specified in [RFC 6596](https://datatracker.ietf.org/doc/html/rfc6596)
FinalDestination now supports the `follow_canonical` option, which will perform an initial GET request, parse the canonical link if present, and perform a HEAD request to it.
We use this mode during embeds to avoid treating URLs with different query parameters as different topics.
* FEATURE: Onebox can match engines based on the content_type
`FinalDestination` now returns the `content_type` of a resolved URL.
`Oneboxer` passes this value to `Onebox` itself. Onebox engines can now specify a `matches_content_type` regex of content_types that the engine can handle, regardless of the URL.
`ImageOnebox` will match URLs with a content type of `image/png`, `jpg`, `gif`, `bmp`, `tif`, etc.
This will allow images that exist at a URL without a file type extension to be correctly rendered, assuming a valid `content_type` is returned.
* DEV: Allow wildcards in Oneboxer optional domain Site Settings
Allows a wildcard to be used as a subdomain on Oneboxer-related SiteSettings, e.g.:
- `force_get_hosts`
- `cache_onebox_response_body_domains`
- `force_custom_user_agent_hosts`
* DEV: fix typos
* FIX: Try doing a GET after receiving a 500 error from a HEAD
By default we try to do a `HEAD` requests. If this results in a 500 error response, we should try to do a `GET`
* DEV: `force_get_hosts` should be a hidden setting
* DEV: Oneboxer Strategies
Have an alternative oneboxing ‘strategy’ (i.e., set of options) to use when an attempt to generate a Onebox fails. Keep track of any non-default strategies that were used on a particular host, and use that strategy for that host in the future.
Initially, the alternate strategy (`force_get_and_ua`) forces the FinalDestination step of Oneboxing to do a `GET` rather than `HEAD`, and forces a custom user agent.
* DEV: change stubbed return code
The stubbed status code needs to be a value not recognized by FinalDestination
* DEV: More robust processing of URLs
The previous `UrlHelper.encode_component(CGI.unescapeHTML(UrlHelper.unencode(uri))` method would naively process URLs, which could result in a badly formed response.
`Addressable::URI.normalized_encode(uri)` appears to deal with these edge-cases in a more robust way.
* DEV: onebox should use UrlHelper
* DEV: fix spec
* DEV: Escape output when rendering local links
This change both speeds up specs (less strings to allocate) and helps catch
cases where methods in Discourse are mutating inputs.
Overall we will be migrating everything to use #frozen_string_literal: true
it will take a while, but this is the first and safest move in this direction
* `rescue nil` is a really bad pattern to use in our code base.
We should rescue errors that we expect the code to throw and
not rescue everything because we're unsure of what errors the
code would throw. This would reduce the amount of pain we face
when debugging why something isn't working as expexted. I've
been bitten countless of times by errors being swallowed as a
result during debugging sessions.
If a hostname does an https redirect we cache that so next
lookup does not incur it.
Also, only rate limit per ip once per final destination
Raise final destination protection to 1000 ip lookups an hour
This refactors handling of s3 so it can be specified via GlobalSetting
This means that in a multisite environment you can configure s3 uploads
without actual sites knowing credentials in s3
It is a critical setting for situations where assets are mirrored to s3.