Commit Graph

69 Commits

Author SHA1 Message Date
Martin Brennan
560e8aff75
FEATURE: Allow oneboxing private GitHub URLs (#27705)
This commit adds the ability to onebox private GitHub
commits, pull requests, issues, blobs, and actions using
a new `github_onebox_access_token` site setting. The token
must be set up in correctly to have access to the repos needed.

To do this successfully with the Oneboxer, we need to skip
redirects on the github.com host, otherwise we get a 404
on the URL before it is translated into a GitHub API URL
and has the appropriate headers added.
2024-07-10 09:39:31 +10:00
Jan Cernik
311b737c91
SECURITY: Fix Stored-dom XSS via Facebook Oneboxes 2024-07-03 20:49:22 +08:00
Rafael dos Santos Silva
b2a9676f0b
FEATURE: Support Spotify Onebox (#27540) 2024-06-19 13:27:27 -03:00
Jan Cernik
30e963be03
DEV: Add spec for x.com onebox url matcher (#27214) 2024-05-28 09:04:20 -03:00
dsims
e6e3eaf472
FIX: avoid error from missing meta tags (#26927) 2024-05-14 11:41:53 -04:00
Alan Guo Xiang Tan
c8da2a33e8
FIX: Attempt to onebox even if response body exceeds max_download_kb (#26929)
In 95a82d608d, we lowered the default for
`Onebox.options.max_download_kb` from 10mb to 2mb for security hardening
purposes. However, this resulted in multiple bug reports where seemingly
nomral URLs stopped being oneboxed. It turns out that lowering
`Onebox.options.max_download_kb` resulted in `Onebox::Helpers::DownloadTooLarge` being raised
more often for more URLs  in `Onebox::Helpers.fetch_response` which
`Onebox::Helpers.fetch_html_doc` relies on. When
`Onebox::Helpers::DownloadTooLarge` is raised in
`Onebox::Helpers.fetch_response`, we throw away whatever response body
which we have already downloaded at that point. This is not ideal
because Nokogiri can parse incomplete HTML documents and there is a
really high chance that the incomplete HTML document still contains the
information which we need for oneboxing.

Therefore, this commit updates `Onebox::Helpers.fetch_html_doc` to not
throw away the response body when the size of the response body exceeds
`Onebox.options.max_download_size`. Instead, we just take whatever
response which we have and get Nokogiri to parse it.
2024-05-09 07:00:34 +08:00
Blake Erickson
40b707a690
FEATURE: Add onebox for loom (#26016)
Loom share links will now onebox and use the embedded loom player.
2024-03-04 15:12:08 -07:00
Roman Rizzi
a709b7e861
FIX: Allow sanitized-HTML in GH issues and categories oneboxes. (#25374)
Follow-up to d78357917c

Related meta topic: https://meta.discourse.org/t/html-is-not-render-on-category-onebox-description/289424:
2024-01-22 15:25:29 -03:00
Jarek Radosz
694b5f108b
DEV: Fix various rubocop lints (#24749)
These (21 + 3 from previous PRs) are soon to be enabled in rubocop-discourse:

Capybara/VisibilityMatcher
Lint/DeprecatedOpenSSLConstant
Lint/DisjunctiveAssignmentInConstructor
Lint/EmptyConditionalBody
Lint/EmptyEnsure
Lint/LiteralInInterpolation
Lint/NonLocalExitFromIterator
Lint/ParenthesesAsGroupedExpression
Lint/RedundantCopDisableDirective
Lint/RedundantRequireStatement
Lint/RedundantSafeNavigation
Lint/RedundantStringCoercion
Lint/RedundantWithIndex
Lint/RedundantWithObject
Lint/SafeNavigationChain
Lint/SafeNavigationConsistency
Lint/SelfAssignment
Lint/UnreachableCode
Lint/UselessMethodDefinition
Lint/Void

Previous PRs:
Lint/ShadowedArgument
Lint/DuplicateMethods
Lint/BooleanSymbol
RSpec/SpecFilePathSuffix
2023-12-06 23:25:00 +01:00
Jarek Radosz
7196613e2e
DEV: Fix various spec linting issues (#24672)
Duplicated specs, incorrect descriptions, incorrect assertions, incorrect filenames, old todo
2023-12-04 13:45:19 +01:00
Ted Johansson
95a82d608d SECURITY: Prevent Onebox cache overflow by limiting downloads and URL lengths 2023-11-09 13:39:18 +11:00
Roman Rizzi
d78357917c SECURITY: Onebox templates' HTML injections.
The use of triple-curlies on Mustache templates opens the possibility for HTML injections.
2023-11-09 13:39:11 +11:00
Ted Johansson
b2a5f5802a
DEV: Replace custom Onebox symbolize_keys implementation with ActiveSupport (#23828)
We have a custom implementation of #symbolize_keys in our Onebox helpers. This is likely a legacy from when Onebox was a standalone gem. This change replaces all usages with either #deep_symbolize_keys from ActiveSupport, or appropriate option to the JSON parser gem used.
2023-10-09 09:32:09 +02:00
Ted Johansson
60e624e768
DEV: Replace custom Onebox blank implementation with ActiveSupport (#23827)
We have a custom implementation of #blank? in our Onebox helpers. This is likely a legacy from when Onebox was a standalone gem. This change replaces all usages with respective incarnations of #blank?, #present?, and #presence from ActiveSupport. It changes a bunch of "unless blank" to "if present" as well.
2023-10-07 19:54:26 +02:00
Rafael dos Santos Silva
d10e9a6c1d
FEATURE: Onebox and Download for WEBP and AVIF (#23235)
This adds support for oneboxing WEBP and AVIF images in posts and fixing
oneboxing fixes download remote images for those formats too.

Reported in https://meta.discourse.org/t/-/276433?u=falco
2023-08-24 16:44:06 -03:00
Joffrey JAFFEUX
df7dab9dce
FIX: ensures generic onebox has width/height for thumbnail (#23040)
Prior to this fix we would output an image with no width/height which would then bypass a large part of `CookedProcessorMixin` and have no aspect ratio. As a result, an image with no size would cause layout shift.

It also removes a fix for oneboxes in chat messages due to this case.
2023-08-09 20:31:11 +02:00
Roman Agilov
3eac47443f
FEATURE: Add audio.com onebox provider (#22936)
* Audio.com provider added to onebox
* added specs for audio.com onebox provider
2023-08-08 16:55:04 +10:00
Ryan Vandersmith
44a104dff8
FIX: Update "Embed Motoko" Onebox URLs (#22198)
Embed Motoko service's primary URL is transiting from embed.smartcontracts.org to embed.motoko.org, this PR updates the Onebox logic to work for either domain.
2023-07-26 09:41:01 +08:00
Blake Erickson
9e8010df8b
DEV: Use thumbnail url for wikimedia onebox image (#22620)
Wikimedia provides a thumbnail url for its images, so we should use that
for oneboxes instead of the full-size image. Because the size of the
  onebox image we display is quite small anyways the thumbnail wikimedia
  provides should suffice and will save bandwidth.

See: https://meta.discourse.org/t/264039
2023-07-14 12:20:18 -06:00
Rafael dos Santos Silva
3fd327c458
FEATURE: Basic support for threads.net onebox (#22471) 2023-07-06 16:02:49 -03:00
Loïc Guitaut
0f4beab0fb DEV: Update the rubocop-discourse gem
This enables cops related to RSpec `subject`.

See https://github.com/discourse/rubocop-discourse/pull/32
2023-06-26 11:41:52 +02:00
Jan Cernik
24c90534fb
FIX: Use Twitter API v2 for oneboxes and restore OpenGraph fallback (#22187) 2023-06-22 14:39:02 -03:00
Richard
2b7c677a8c Fix tests 2023-05-15 16:45:33 +02:00
Loïc Guitaut
8b67a534a0 FIX: Allow floats for zoom level in Google Maps onebox
Sometimes we get Maps URL containing a zoom level as a float (17.5z and
not 17z) but this doesn’t work with our current onebox implementation.

While Google accepts those float zoom levels, it removes automatically
the floating part in the URL (thus when visiting a Maps URL containing
17.5z, the URL will be rewritten shortly after as 17z). When putting a
float zoom level in an embedded URL, this actually breaks (Maps API
returns a 400 error).

This patch addresses the issue by allowing the onebox engine to match on
a zoom level expressed as a float but we only keep the integer part thus
rendering properly maps.
2023-03-01 12:45:33 +01:00
Loïc Guitaut
14d97f9cf1 FEATURE: Show more context in Discourse topic oneboxes
Currently when generating a onebox for Discourse topics, some important
context is missing such as categories and tags.

This patch addresses this issue by introducing a new onebox engine
dedicated to display this information when available. Indeed to get this
new information, categories and tags are exposed in the topic metadata
as opengraph tags.
2023-01-11 14:22:53 +01:00
David Taylor
cb932d6ee1
DEV: Apply syntax_tree formatting to spec/* 2023-01-09 11:49:28 +00:00
Ryan Vandersmith
e6439e89cf
FEATURE: Onebox for Embed Motoko (#19293) 2022-12-16 09:59:40 -05:00
Rafael dos Santos Silva
d247e5d37c
FEATURE: Youtube Short onebox support (#19335)
* FEATURE: Youtube Shorts onebox support

Co-authored-by: Canapin <canapin@gmail.com>
2022-12-06 11:56:48 -03:00
Jarek Radosz
c32fe340f0
DEV: Fix mocha deprecations (#18828)
It now supports strict keyword argument matching by default.
2022-11-02 10:47:59 +01:00
David Taylor
68b4fe4cf8
SECURITY: Expand and improve SSRF Protections (#18815)
See https://github.com/discourse/discourse/security/advisories/GHSA-rcc5-28r3-23rr

Co-authored-by: OsamaSayegh <asooomaasoooma90@gmail.com>
Co-authored-by: Daniel Waterworth <me@danielwaterworth.com>
2022-11-01 16:33:17 +00:00
Bianca Nenciu
266e165885
FIX: Use only first line from commit message (#18724)
Linking a commit from a GitHub pull request included the complete commit
message, instead of just the first line. The rest of the commit message
will be added to the body of the Onebox.
2022-10-24 22:26:48 +03:00
Bianca Nenciu
73e9875a1d
FEATURE: Handle oneboxes for complex GitHub URLs (#18474)
GitHub PR URLs can link to a commit of the PR, a comment or a review
discussion.
2022-10-06 20:26:04 +03:00
Jarek Radosz
08e63ddab2
DEV: Fix spec file name (#18227)
Match the impl file name
2022-09-12 14:03:23 +02:00
Bianca Nenciu
626d50c15c
FIX: Disable Twitter onebox without API support (#17519)
Twitter removed OpenGraph tags from their pages. We can no longer
extract all the information (for example, the quoted tweet) we need
to render Oneboxes without using their API.
2022-08-17 18:32:48 +03:00
Loïc Guitaut
00b3f0e2c4 DEV: Make the first argument to the top-level describe a constant in specs 2022-08-08 18:07:49 +02:00
Loïc Guitaut
3eaac56797 DEV: Use proper wording for contexts in specs 2022-08-04 11:05:02 +02:00
Phil Pirozhkov
493d437e79
Add RSpec 4 compatibility (#17652)
* Remove outdated option

04078317ba

* Use the non-globally exposed RSpec syntax

https://github.com/rspec/rspec-core/pull/2803

* Use the non-globally exposed RSpec syntax, cont

https://github.com/rspec/rspec-core/pull/2803

* Comply to strict predicate matchers

See:
 - https://github.com/rspec/rspec-expectations/pull/1195
 - https://github.com/rspec/rspec-expectations/pull/1196
 - https://github.com/rspec/rspec-expectations/pull/1277
2022-07-28 10:27:38 +08:00
Bianca Nenciu
e7f04a8674
FIX: Use URI#merge to merge base and relative URLs (#17454)
The old implementation did not handle all cases, such as the case when
`src` is a relative URL that starts with `..`.
2022-07-18 14:17:54 +03:00
Penar Musaraj
3baefa25b5
FIX: Use first supported type item when JSON-LD returns array (#17217) 2022-06-23 13:02:01 -04:00
Jarek Radosz
f723b4c322
FIX: Handle sites with more than 1 JSON-LD element (#17095)
A followup to #17007
2022-06-15 02:55:55 +02:00
sansnumero
f0c6dd5682
Add support for JSON LD in Onebox (#17007)
* FIX: Fix a bug that is accessing the values in a hash wrongly and write tests

I decided to write tests in order to be confident in my refactor that's in the next commit.
Meanwhile I have discovered a potential bug. The `title_attr` key was accessed as a string,
but all the keys are actually symbols so it was never evaluated to be true.

irb(main):025:0> d = {key: 'value'}
=> {:key=>"value"}
irb(main):026:0> d['key']
=> nil
irb(main):027:0> d[:key]
=> "value"

* DEV: Extract methods for readability

I will be adding a new method following the conventions in place for adding a new normalizer. And this will make the readability of the `raw` block even more difficult; so I am extracting self contained private methods beforehand.

* FEATURE: Parse JSON-LD and introduce Movie object

JSON LD data is very easily transferable to Ruby objects because they contain types. If these types are mapped to Ruby objects, it is also better to make all the parsed data very explicit and easily extendable.

JSON-LD has many more standardized item types, with a full list here: https://schema.org/docs/full.html
However in order to decrease the scope, I only adapted the movie type.

* DEV: Change inheritance between normalizers

Normalizers are not supposed to have an inheritance relationships amongst each other. They are all normalizers, but all normalizing separate protocols. This is why I chose to extract a parent class and relieve Open Graph off that responsibility. Removing the parent class altogether could also a possibility, but I am keeping the scope limited to having a more accurate representation of the normalizers while making it easier to add a new one.

* Lint changes

* Bring back the Oembed OpenGraph inheritance

There is one test that caught that this inheritance was necessary. I still think modelling wise this inheritance shouldn't exist, but this can be tackled separately.

* Return empty hash if the json received is invalid

Before this change if there was a parsing error with JSON it would throw an exception. The goal of this commit is to rescue that exception and then log a warning. I chose to use Discourse's logger wrapper `warn_exception` to have the backtrace and not just used Rails logger. I considered raising an `InvalidParameters` error however if the JSON here is invalid it should not block showing of the Onebox, so logging is enough.

* Prep to support more JSONLD schema types with case

* Extract mustache template object created from JSONLD
2022-06-13 17:32:34 +02:00
David Taylor
ff93833fdf
UX: Use committed date for GitHub oneboxes (#16318)
Our copy says 'committed {date}`, but we were previously using the commit's authored date
2022-03-30 09:16:28 +08:00
jbrw
fc30669db2
FIX: Support new layout on Amazon product pages (#16091)
Some product pages on Amazon are using a new HTML structure, meaning the previous Onebox engine was unable to gather the price and/or description. This change should allow these pages to be Oneboxed.
2022-03-04 18:31:53 -05:00
David Taylor
c9dab6fd08
DEV: Automatically require 'rails_helper' in all specs (#16077)
It's very easy to forget to add `require 'rails_helper'` at the top of every core/plugin spec file, and omissions can cause some very confusing/sporadic errors.

By setting this flag in `.rspec`, we can remove the need for `require 'rails_helper'` entirely.
2022-03-01 17:50:50 +00:00
Alan Guo Xiang Tan
7afe768d60
DEV: Add tests for wistia onebox. (#15860)
Follow-up to 4ef56b0ca4
2022-02-08 13:04:32 +08:00
Rafael dos Santos Silva
5b5cbbfe5c
FEATURE: Onebox for news.ycombinator.com (#15781) 2022-02-03 13:39:21 -03:00
Natalie Tay
aac9f43038
Only block domains at the final destination (#15689)
In an earlier PR, we decided that we only want to block a domain if 
the blocked domain in the SiteSetting is the final destination (/t/59305). That 
PR used `FinalDestination#get`. `resolve` however is used several places
 but blocks domains along the redirect chain when certain options are provided.

This commit changes the default options for `resolve` to not do that. Existing
users of `FinalDestination#resolve` are
- `Oneboxer#external_onebox`
- our onebox helper `fetch_html_doc`, which is used in amazon, standard embed 
and youtube
  - these folks already go through `Oneboxer#external_onebox` which already
  blocks correctly
2022-01-31 15:35:12 +08:00
Bianca Nenciu
376799b1a4
FIX: Hide excerpt of binary files in GitHub onebox (#15639)
Oneboxer did not know if a file is binary or not and always tried to
show an excerpt of the file.
2022-01-19 14:45:36 +02:00
jbrw
2909b8b820
FIX: origins_to_regexes should always return an array (#15589)
If the SiteSetting `allowed_onebox_iframes` contains a value of `*`, it will use the values of `all_iframe_origins` during the Oneboxing process. If `all_iframe_origins` itself contains a value of `*`, `origins_to_regexes` will try to return a "catch-all" regex.

Other code assumes `origins_to_regexes`will return an array, so this change ensures the `*` case will return an array containing only the catch-all regex.
2022-01-17 12:48:41 -05:00
Jarek Radosz
31b27b3712
FIX: Broken GitHub folder onebox logic (#15612)
1. `html_doc.css('.Box.md')` always returns a truthy value (e.g. `[]`) so the second branch of the if-elsif never ran
2. `node&.css('text()')` was invalid code that would raise an error
3. Matching on h3 elements is no longer correct with the current html structure returned by GitHub
2022-01-17 18:32:07 +01:00