* DEV: Allow wildcards in Oneboxer optional domain Site Settings
Allows a wildcard to be used as a subdomain on Oneboxer-related SiteSettings, e.g.:
- `force_get_hosts`
- `cache_onebox_response_body_domains`
- `force_custom_user_agent_hosts`
* DEV: fix typos
* FIX: Try doing a GET after receiving a 500 error from a HEAD
By default we try to do a `HEAD` requests. If this results in a 500 error response, we should try to do a `GET`
* DEV: `force_get_hosts` should be a hidden setting
* DEV: Oneboxer Strategies
Have an alternative oneboxing ‘strategy’ (i.e., set of options) to use when an attempt to generate a Onebox fails. Keep track of any non-default strategies that were used on a particular host, and use that strategy for that host in the future.
Initially, the alternate strategy (`force_get_and_ua`) forces the FinalDestination step of Oneboxing to do a `GET` rather than `HEAD`, and forces a custom user agent.
* DEV: change stubbed return code
The stubbed status code needs to be a value not recognized by FinalDestination
CookedPostProcessor used Loofah to parse the cooked content of a post
and Nokogiri to parse cooked Oneboxes. Even though Loofah is built on
top of Nokogiri, replacing an element from the cooked post (a Nokogiri
node) with a parsed onebox (a Loofah node) produced a strange result
which included XML namespaces. Removing the mix and using Loofah
to parse Oneboxes fixed the problem.
* FEATURE: Cache successful HTTP GET requests during Oneboxing
Some oneboxes may fail if when making excessive and/or odd requests against the target domains. This change provides a simple mechanism to cache the results of succesful GET requests as part of the oneboxing process, with the goal of reducing repeated requests and ultimately improving the rate of successful oneboxing.
To enable:
Set `SiteSetting.cache_onebox_response_body` to `true`
Add the domains you’re interesting in caching to `SiteSetting. cache_onebox_response_body_domains` e.g. `example.com|example.org|example.net`
Optionally set `SiteSetting.cache_onebox_user_agent` to a user agent string of your choice to use when making requests against domains in the above list.
* FIX: Swap order of duration and value in redis call
The correct order for `setex` arguments is `key`, `duration`, and `value`.
Duration and value had been flipped, however the code would not have thrown an error because we were caching the value of `1.day.to_i` for a period of 1 seconds… The intention appears to be to set a value of 1 (purely as a flag) for a period of 1 day.
It has been observed that doing a HEAD against an Amazon store URL may result in a 405 error being returned.
Skipping the HEAD request may result in an improved oneboxing experience when requesting these URLs.
`Onebox.preview` can return 0-to-n errors, where the errors are missing OpenGraph attributes (e.g. title, description, image, etc.). If any of these attributes are missing, we construct an error message and attach it to the Oneboxer preview HTML. The error message is something like:
“Sorry, we were unable to generate a preview for this web page, because the following oEmbed / OpenGraph tags could not be found: description, image”
However, if the only missing tag is `image` we don’t need to display the error, as we have enough other data (title, description, etc.) to construct a useful/complete Onebox.
It used to insert block Oneboxes inside paragraphs which resulted in
invalid HTML. This needed an additional parsing for removal of empty
paragraphs and the resulting HTML could still be invalid.
This commit ensure that block Oneboxes are inserted correctly, by
splitting the paragraph containing the link and putting the block
between the two. Paragraphs left with nothing but whitespaces will
be removed.
Follow up to 7f3a30d79f.
* FEATURE: display error if Oneboxing fails due to HTTP error
- display warning if onebox URL is unresolvable
- display warning if attributes are missing
* FEATURE: Use new Instagram oEmbed endpoint if access token is configured
Instagram requires an Access Token to access their oEmbed endpoint. The requirements (from https://developers.facebook.com/docs/instagram/oembed/) are as follows:
- a Facebook Developer account, which you can create at developers.facebook.com
- a registered Facebook app
- the oEmbed Product added to the app
- an Access Token
- The Facebook app must be in Live Mode
The generated Access Token, once added to SiteSetting.facebook_app_access_token, will be passed to onebox. Onebox can then use this token to access the oEmbed endpoint to generate a onebox for Instagram.
* DEV: update user agent string
* DEV: don’t do HEAD requests against news.yahoo.com
* DEV: Bump onebox version from 2.1.5 to 2.1.6
* DEV: Avoid re-reading templates
* DEV: Tweaks to onebox mustache templates
* DEV: simplified error message for missing onebox data
* Apply suggestions from code review
Co-authored-by: Gerhard Schlager <mail@gerhard-schlager.at>
DEV: Replace instances of Discourse.base_uri with Discourse.base_path
This is clearer because the base_uri is actually just a path prefix. This continues the work started in 555f467.
This commit adds a new site setting "allowed_onebox_iframes". By default, all onebox iframes are allowed. When the list of domains is restricted, Onebox will automatically skip engines which require those domains, and use a fallback engine.
When linking to a topic in the same Discourse, we try to onebox the link to show the title
and other various information depending on whether it's a "standard" or "inline" onebox.
However, we were not properly detecting links to topics that had no slugs (eg. https://meta.discourse.org/t/1234).
We already cache failed onebox URL requests client-side, we now want to cache this on the server-side for extra protection. failed onebox previews will be cached for 1 hour, and any more requests for that URL will fail with a 404 status. Forcing a rebake via the Rebake HTML action will delete the failed URL cache (like how the oneboxer preview cache is deleted).
Zeitwerk simplifies working with dependencies in dev and makes it easier reloading class chains.
We no longer need to use Rails "require_dependency" anywhere and instead can just use standard
Ruby patterns to require files.
This is a far reaching change and we expect some followups here.
This change both speeds up specs (less strings to allocate) and helps catch
cases where methods in Discourse are mutating inputs.
Overall we will be migrating everything to use #frozen_string_literal: true
it will take a while, but this is the first and safest move in this direction
FEATURE: new 'max_oneboxes_per_post' site setting
FEATURE: change onebox whitelist to a blacklist
PERF: debounce the loading of oneboxes
PERF: improve perf of mention links in preview
FIX: sort loading of custom oneboxer
Since rspec-rails 3, the default installation creates two helper files:
* `spec_helper.rb`
* `rails_helper.rb`
`spec_helper.rb` is intended as a way of running specs that do not
require Rails, whereas `rails_helper.rb` loads Rails (as Discourse's
current `spec_helper.rb` does).
For more information:
https://www.relishapp.com/rspec/rspec-rails/docs/upgrade#default-helper-files
In this commit, I've simply replaced all instances of `spec_helper` with
`rails_helper`, and renamed the original `spec_helper.rb`.
This brings the Discourse project closer to the standard usage of RSpec
in a Rails app.
At present, every spec relies on loading Rails, but there are likely
many that don't need to. In a future pull request, I hope to introduce a
separate, minimal `spec_helper.rb` which can be used in tests which
don't rely on Rails.