Commit Graph

74 Commits

Author SHA1 Message Date
Rafael dos Santos Silva
5b5cbbfe5c
FEATURE: Onebox for news.ycombinator.com (#15781) 2022-02-03 13:39:21 -03:00
Natalie Tay
aac9f43038
Only block domains at the final destination (#15689)
In an earlier PR, we decided that we only want to block a domain if 
the blocked domain in the SiteSetting is the final destination (/t/59305). That 
PR used `FinalDestination#get`. `resolve` however is used several places
 but blocks domains along the redirect chain when certain options are provided.

This commit changes the default options for `resolve` to not do that. Existing
users of `FinalDestination#resolve` are
- `Oneboxer#external_onebox`
- our onebox helper `fetch_html_doc`, which is used in amazon, standard embed 
and youtube
  - these folks already go through `Oneboxer#external_onebox` which already
  blocks correctly
2022-01-31 15:35:12 +08:00
Bianca Nenciu
376799b1a4
FIX: Hide excerpt of binary files in GitHub onebox (#15639)
Oneboxer did not know if a file is binary or not and always tried to
show an excerpt of the file.
2022-01-19 14:45:36 +02:00
jbrw
2909b8b820
FIX: origins_to_regexes should always return an array (#15589)
If the SiteSetting `allowed_onebox_iframes` contains a value of `*`, it will use the values of `all_iframe_origins` during the Oneboxing process. If `all_iframe_origins` itself contains a value of `*`, `origins_to_regexes` will try to return a "catch-all" regex.

Other code assumes `origins_to_regexes`will return an array, so this change ensures the `*` case will return an array containing only the catch-all regex.
2022-01-17 12:48:41 -05:00
Jarek Radosz
31b27b3712
FIX: Broken GitHub folder onebox logic (#15612)
1. `html_doc.css('.Box.md')` always returns a truthy value (e.g. `[]`) so the second branch of the if-elsif never ran
2. `node&.css('text()')` was invalid code that would raise an error
3. Matching on h3 elements is no longer correct with the current html structure returned by GitHub
2022-01-17 18:32:07 +01:00
jbrw
6e925fee6f
FIX: Use basic meta description if other description tags are missing (#15356)
When attempting to Onebox a page if there is no `meta property="og:description"` tag but there is a  `meta name="description"` tag, Onebox should try to use that value.
2021-12-17 19:36:54 -05:00
jbrw
aec125b617
FIX: Display Instagram Oneboxes in an iframe (#14789)
We are no longer able to display the image returned by Instagram directly within a Discourse site (either in the composer, or within a cooked post within a topic), so:

- Display an image placeholder in the composer preview
- A cooked post should use an iframe to display the Instagram 'embed' content
2021-11-02 14:34:51 -04:00
Chema Balsas
745b99edbf TEST: Adds test for urls with url-encoded section hash 2021-08-12 10:43:50 -04:00
Chema Balsas
6b8ee4d5ef TEST: Adds test for urls with section hash 2021-08-12 10:43:50 -04:00
jbrw
2f28ba318c
FEATURE: Onebox can match engines based on the content_type (#13876)
* FEATURE: Onebox can match engines based on the content_type

`FinalDestination` now returns the `content_type` of a resolved URL.

`Oneboxer` passes this value to `Onebox` itself. Onebox engines can now specify a `matches_content_type` regex of content_types that the engine can handle, regardless of the URL.

`ImageOnebox` will match URLs with a content type of `image/png`, `jpg`, `gif`, `bmp`, `tif`, etc.

This will allow images that exist at a URL without a file type extension to be correctly rendered, assuming a valid `content_type` is returned.
2021-07-30 13:36:30 -04:00
Michael Brown
76a11e6dc9 DEV: fix test (missed a reference to master) 2021-07-19 12:47:45 -04:00
Michael Brown
aa12d12c0b discourse/discourse change from 'master' to 'main': update fixture data 2021-07-19 11:46:15 -04:00
David Taylor
8b89787426
SECURITY: Sanitize YouTube Onebox data (#13748)
CVE-2021-32764
2021-07-15 19:31:50 +01:00
jbrw
a64aea38b7
FIX: Don’t use user_generated images as avatar images in Oneboxed Twitter content (#13712)
By default, Twitter will return the URL for the avatar image of the tweet poster as the `og:image` value.

However, if the `user_generated` attribute is true, we should not use this as the avatar URL as this will be an URL of an image in the tweet itself (e.g., an image belonging to a tweeted news story).
2021-07-13 14:54:28 -04:00
Arpit Jalan
05bdbd9f97
SECURITY: Onebox canonical links bypassing FinalDestination checks (#13605) 2021-07-01 20:09:29 +05:30
Arpit Jalan
b63c9febe8
FIX: ignore canonical link to localhost (#13577) 2021-06-30 13:55:17 +05:30
jbrw
09bc95d46b
FIX: Quoting Oneboxed content should exclude formatting (#13296)
* FIX: Quoting Oneboxed content should exclude formatting

When a post is quoted that includes Oneboxed content, we should not include the formatting generated by the Onebox. Rather, we should attempt to collapse the link referenced by the Onebox to a single line text link.

* DEV: fix tests
2021-06-07 13:03:53 -04:00
Arpit Jalan
2e4f07678e
FIX: IMDb links were being oneboxed as posters (#13310)
IMDb movie links were being rendered as posters. This was because
IMDb was sending `og:type` as `image` randomly in some cases. To
fix this we'll now default all IMDb links as article type. This will
ensure that the IMDb onebox link includes all the information instead
of showing just a poster without any context.
2021-06-07 18:45:59 +05:30
jbrw
461a2c334b
FIX: return an empty result if response from Amazon is missing expected attributes (#13173)
* FIX: return an empty result if response from Amazon is missing attributes

Check we have the basic attributes requires to construct a Onebox for Amazon.

This is an attempt to handle scenarios where we receive a valid 200-status response from an Amazon request that does not include the data we’re expecting.

* Update lib/onebox/engine/amazon_onebox.rb

Co-authored-by: Régis Hanol <regis@hanol.fr>

Co-authored-by: Régis Hanol <regis@hanol.fr>
2021-06-01 16:23:18 -04:00
Gerhard Schlager
3df928d609
DEV: Fix flaky specs (#13226)
Some specs failed when `LOAD_PLUGINS=1` was set while migrating the test DB and the narrative-bot plugin disabled the `send_welcome_message` site setting.
2021-06-01 14:38:55 +02:00
Penar Musaraj
06e1af2b1d
FIX: Giphy oneboxing when the response is an image (#13199) 2021-05-28 15:10:32 -04:00
Penar Musaraj
47e09700fe
FIX: Support pausing GIFs for giphy/tenor oneboxes (#13194) 2021-05-28 08:40:30 -04:00
Dan Ungureanu
723d7de18c
Various GitHub Onebox improvements (#13163)
* FIX: Improve GitHub folder regexp in Onebox

It used to match any GitHub URL that was not matched by the other GitHub
Oneboxes and it did not do a good job at handling those. With this
change, the generic Onebox will handle the remaining URLs.

* FEATURE: Add Onebox for GitHub Actions

* FEATURE: Add Onebox for PR check runs

* FIX: Remove image from GitHub folder Oneboxes

It is a generic, auto-generated image which does not provide any value.

* DEV: Add tests

* FIX: Strip HTML comments from PR body
2021-05-27 12:38:42 +03:00
Arpit Jalan
283b08d45f
DEV: Absorb onebox gem into core (#12979)
* Move onebox gem in core library

* Update template file path

* Remove warning for onebox gem caching

* Remove onebox version file

* Remove onebox gem

* Add sanitize gem

* Require onebox library in lazy-yt plugin

* Remove onebox web specific code

This code was used in standalone onebox Sinatra application

* Merge Discourse specific AllowlistedGenericOnebox engine in core

* Fix onebox engine filenames to match class name casing

* Move onebox specs from gem into core

* DEV: Rename `response` helper to `onebox_response`

Fixes a naming collision.

* Require rails_helper

* Don't use `before/after(:all)`

* Whitespace

* Remove fakeweb

* Remove poor unit tests

* DEV: Re-add fakeweb, plugins are using it

* Move onebox helpers

* Stub Instagram API

* FIX: Follow additional redirect status codes (#476)

Don’t throw errors if we encounter 303, 307 or 308 HTTP status codes in responses

* Remove an empty file

* DEV: Update the license file

Using the copy from https://choosealicense.com/licenses/gpl-2.0/#

Hopefully this will enable GitHub to show the license UI?

* DEV: Update embedded copyrights

* DEV: Add Onebox copyright notice

* DEV: Add MIT license, convert COPYRIGHT.txt to md

* DEV: Remove an incorrect copyright claim

Co-authored-by: Jarek Radosz <jradosz@gmail.com>
Co-authored-by: jbrw <jamie@goatforce5.org>
2021-05-26 15:11:35 +05:30