Twitter removed OpenGraph tags from their pages. We can no longer
extract all the information (for example, the quoted tweet) we need
to render Oneboxes without using their API.
* FIX: Fix a bug that is accessing the values in a hash wrongly and write tests
I decided to write tests in order to be confident in my refactor that's in the next commit.
Meanwhile I have discovered a potential bug. The `title_attr` key was accessed as a string,
but all the keys are actually symbols so it was never evaluated to be true.
irb(main):025:0> d = {key: 'value'}
=> {:key=>"value"}
irb(main):026:0> d['key']
=> nil
irb(main):027:0> d[:key]
=> "value"
* DEV: Extract methods for readability
I will be adding a new method following the conventions in place for adding a new normalizer. And this will make the readability of the `raw` block even more difficult; so I am extracting self contained private methods beforehand.
* FEATURE: Parse JSON-LD and introduce Movie object
JSON LD data is very easily transferable to Ruby objects because they contain types. If these types are mapped to Ruby objects, it is also better to make all the parsed data very explicit and easily extendable.
JSON-LD has many more standardized item types, with a full list here: https://schema.org/docs/full.html
However in order to decrease the scope, I only adapted the movie type.
* DEV: Change inheritance between normalizers
Normalizers are not supposed to have an inheritance relationships amongst each other. They are all normalizers, but all normalizing separate protocols. This is why I chose to extract a parent class and relieve Open Graph off that responsibility. Removing the parent class altogether could also a possibility, but I am keeping the scope limited to having a more accurate representation of the normalizers while making it easier to add a new one.
* Lint changes
* Bring back the Oembed OpenGraph inheritance
There is one test that caught that this inheritance was necessary. I still think modelling wise this inheritance shouldn't exist, but this can be tackled separately.
* Return empty hash if the json received is invalid
Before this change if there was a parsing error with JSON it would throw an exception. The goal of this commit is to rescue that exception and then log a warning. I chose to use Discourse's logger wrapper `warn_exception` to have the backtrace and not just used Rails logger. I considered raising an `InvalidParameters` error however if the JSON here is invalid it should not block showing of the Onebox, so logging is enough.
* Prep to support more JSONLD schema types with case
* Extract mustache template object created from JSONLD
Some product pages on Amazon are using a new HTML structure, meaning the previous Onebox engine was unable to gather the price and/or description. This change should allow these pages to be Oneboxed.
It's very easy to forget to add `require 'rails_helper'` at the top of every core/plugin spec file, and omissions can cause some very confusing/sporadic errors.
By setting this flag in `.rspec`, we can remove the need for `require 'rails_helper'` entirely.
In an earlier PR, we decided that we only want to block a domain if
the blocked domain in the SiteSetting is the final destination (/t/59305). That
PR used `FinalDestination#get`. `resolve` however is used several places
but blocks domains along the redirect chain when certain options are provided.
This commit changes the default options for `resolve` to not do that. Existing
users of `FinalDestination#resolve` are
- `Oneboxer#external_onebox`
- our onebox helper `fetch_html_doc`, which is used in amazon, standard embed
and youtube
- these folks already go through `Oneboxer#external_onebox` which already
blocks correctly
If the SiteSetting `allowed_onebox_iframes` contains a value of `*`, it will use the values of `all_iframe_origins` during the Oneboxing process. If `all_iframe_origins` itself contains a value of `*`, `origins_to_regexes` will try to return a "catch-all" regex.
Other code assumes `origins_to_regexes`will return an array, so this change ensures the `*` case will return an array containing only the catch-all regex.
1. `html_doc.css('.Box.md')` always returns a truthy value (e.g. `[]`) so the second branch of the if-elsif never ran
2. `node&.css('text()')` was invalid code that would raise an error
3. Matching on h3 elements is no longer correct with the current html structure returned by GitHub
When attempting to Onebox a page if there is no `meta property="og:description"` tag but there is a `meta name="description"` tag, Onebox should try to use that value.
We are no longer able to display the image returned by Instagram directly within a Discourse site (either in the composer, or within a cooked post within a topic), so:
- Display an image placeholder in the composer preview
- A cooked post should use an iframe to display the Instagram 'embed' content
* FEATURE: Onebox can match engines based on the content_type
`FinalDestination` now returns the `content_type` of a resolved URL.
`Oneboxer` passes this value to `Onebox` itself. Onebox engines can now specify a `matches_content_type` regex of content_types that the engine can handle, regardless of the URL.
`ImageOnebox` will match URLs with a content type of `image/png`, `jpg`, `gif`, `bmp`, `tif`, etc.
This will allow images that exist at a URL without a file type extension to be correctly rendered, assuming a valid `content_type` is returned.
By default, Twitter will return the URL for the avatar image of the tweet poster as the `og:image` value.
However, if the `user_generated` attribute is true, we should not use this as the avatar URL as this will be an URL of an image in the tweet itself (e.g., an image belonging to a tweeted news story).
* FIX: Quoting Oneboxed content should exclude formatting
When a post is quoted that includes Oneboxed content, we should not include the formatting generated by the Onebox. Rather, we should attempt to collapse the link referenced by the Onebox to a single line text link.
* DEV: fix tests
IMDb movie links were being rendered as posters. This was because
IMDb was sending `og:type` as `image` randomly in some cases. To
fix this we'll now default all IMDb links as article type. This will
ensure that the IMDb onebox link includes all the information instead
of showing just a poster without any context.
* FIX: return an empty result if response from Amazon is missing attributes
Check we have the basic attributes requires to construct a Onebox for Amazon.
This is an attempt to handle scenarios where we receive a valid 200-status response from an Amazon request that does not include the data we’re expecting.
* Update lib/onebox/engine/amazon_onebox.rb
Co-authored-by: Régis Hanol <regis@hanol.fr>
Co-authored-by: Régis Hanol <regis@hanol.fr>
Some specs failed when `LOAD_PLUGINS=1` was set while migrating the test DB and the narrative-bot plugin disabled the `send_welcome_message` site setting.
* FIX: Improve GitHub folder regexp in Onebox
It used to match any GitHub URL that was not matched by the other GitHub
Oneboxes and it did not do a good job at handling those. With this
change, the generic Onebox will handle the remaining URLs.
* FEATURE: Add Onebox for GitHub Actions
* FEATURE: Add Onebox for PR check runs
* FIX: Remove image from GitHub folder Oneboxes
It is a generic, auto-generated image which does not provide any value.
* DEV: Add tests
* FIX: Strip HTML comments from PR body
* Move onebox gem in core library
* Update template file path
* Remove warning for onebox gem caching
* Remove onebox version file
* Remove onebox gem
* Add sanitize gem
* Require onebox library in lazy-yt plugin
* Remove onebox web specific code
This code was used in standalone onebox Sinatra application
* Merge Discourse specific AllowlistedGenericOnebox engine in core
* Fix onebox engine filenames to match class name casing
* Move onebox specs from gem into core
* DEV: Rename `response` helper to `onebox_response`
Fixes a naming collision.
* Require rails_helper
* Don't use `before/after(:all)`
* Whitespace
* Remove fakeweb
* Remove poor unit tests
* DEV: Re-add fakeweb, plugins are using it
* Move onebox helpers
* Stub Instagram API
* FIX: Follow additional redirect status codes (#476)
Don’t throw errors if we encounter 303, 307 or 308 HTTP status codes in responses
* Remove an empty file
* DEV: Update the license file
Using the copy from https://choosealicense.com/licenses/gpl-2.0/#
Hopefully this will enable GitHub to show the license UI?
* DEV: Update embedded copyrights
* DEV: Add Onebox copyright notice
* DEV: Add MIT license, convert COPYRIGHT.txt to md
* DEV: Remove an incorrect copyright claim
Co-authored-by: Jarek Radosz <jradosz@gmail.com>
Co-authored-by: jbrw <jamie@goatforce5.org>