discourse/lib/onebox/engine/standard_embed.rb
sansnumero f0c6dd5682
Add support for JSON LD in Onebox (#17007)
* FIX: Fix a bug that accessed hash values with the wrong key type, and write tests

I wrote tests first so that I could be confident in the refactor in the next commit.
In the process I discovered a potential bug: the `title_attr` key was accessed as a string,
but all the keys are symbols, so the lookup always returned nil and the condition was never true.

irb(main):025:0> d = {key: 'value'}
=> {:key=>"value"}
irb(main):026:0> d['key']
=> nil
irb(main):027:0> d[:key]
=> "value"

* DEV: Extract methods for readability

I will be adding a new method that follows the existing conventions for adding a normalizer. That would make the `raw` block even harder to read, so I am extracting self-contained private methods beforehand.

* FEATURE: Parse JSON-LD and introduce Movie object

JSON-LD data maps easily to Ruby objects because each item declares its type. Mapping these types to Ruby objects also makes the parsed data explicit and easy to extend.

JSON-LD has many more standardized item types, with a full list here: https://schema.org/docs/full.html
However, to limit the scope, I only added support for the Movie type.
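
As a rough illustration of the type-to-object mapping described above, here is a minimal standalone sketch. The `Movie` struct, its fields, and the `build_from_json_ld` helper are hypothetical names chosen for this example; the actual classes in this commit may look different.

```ruby
require 'json'

# Hypothetical value object; the real Onebox class and its fields may differ.
Movie = Struct.new(:name, :description, :image, keyword_init: true)

# Map a parsed JSON-LD hash to a Ruby object based on its "@type".
# Unsupported types fall through the case and return nil.
def build_from_json_ld(data)
  case data["@type"]
  when "Movie"
    Movie.new(
      name: data["name"],
      description: data["description"],
      image: data["image"]
    )
  end
end

# Sample payload invented for this example.
json = <<~JSON
  {
    "@context": "https://schema.org",
    "@type": "Movie",
    "name": "Example Film",
    "description": "A sample description.",
    "image": "https://example.com/poster.jpg"
  }
JSON

movie = build_from_json_ld(JSON.parse(json))
puts movie.name
```

Supporting another schema.org type would then mean adding one more `when` branch and one more value object.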

* DEV: Change inheritance between normalizers

Normalizers are not supposed to have inheritance relationships with each other. They are all normalizers, but each one normalizes a separate protocol. This is why I chose to extract a parent class and relieve Open Graph of that responsibility. Removing the parent class altogether could also be a possibility, but I am keeping the scope limited to representing the normalizers more accurately while making it easier to add a new one.

* Lint changes

* Bring back the Oembed OpenGraph inheritance

One test showed that this inheritance is necessary. From a modelling perspective I still think it shouldn't exist, but that can be tackled separately.

* Return an empty hash if the JSON received is invalid

Before this change, a JSON parsing error would raise an exception. This commit rescues that exception and logs a warning instead. I chose Discourse's logger wrapper `warn_exception` rather than the plain Rails logger so that the backtrace is included. I considered raising an `InvalidParameters` error, but invalid JSON here should not block the Onebox from being shown, so logging is enough.
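
The rescue-and-log behaviour described above can be sketched in isolation as follows. This is only an approximation under stated assumptions: the method name `parse_json_ld` is hypothetical, and Ruby's standard `Logger` stands in for Discourse's `warn_exception` wrapper so that the snippet runs outside of Discourse.

```ruby
require 'json'
require 'logger'

# Stand-in for Discourse's logging wrapper (assumption for this sketch).
LOGGER = Logger.new($stderr)

# Parse a JSON string, returning an empty hash instead of raising
# when the payload is not valid JSON.
def parse_json_ld(raw_json)
  JSON.parse(raw_json)
rescue JSON::ParserError => e
  LOGGER.warn("Invalid JSON-LD: #{e.message}")
  {}
end

valid   = parse_json_ld('{"@type": "Movie"}')
invalid = parse_json_ld('{not json')
```

Returning `{}` keeps callers on a single code path: a page with broken JSON-LD simply contributes no data.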

* Prepare to support more JSON-LD schema types with a `case` statement

* Extract mustache template object created from JSON-LD
2022-06-13 17:32:34 +02:00

# frozen_string_literal: true

require 'cgi'
require 'onebox/normalizer'
require 'onebox/open_graph'
require 'onebox/oembed'
require 'onebox/json_ld'

module Onebox
  module Engine
    module StandardEmbed
      def self.oembed_providers
        @@oembed_providers ||= {}
      end

      def self.add_oembed_provider(regexp, endpoint)
        oembed_providers[regexp] = endpoint
      end

      def self.opengraph_providers
        @@opengraph_providers ||= []
      end

      def self.add_opengraph_provider(regexp)
        opengraph_providers << regexp
      end

      # Some oembed providers (like meetup.com) don't provide links to themselves
      add_oembed_provider(/www\.meetup\.com\//, 'http://api.meetup.com/oembed')
      add_oembed_provider(/www\.mixcloud\.com\//, 'https://www.mixcloud.com/oembed/')
      # In order to support Private Videos
      add_oembed_provider(/vimeo\.com\//, 'https://vimeo.com/api/oembed.json')
      # NYT requires login so use oembed only
      add_oembed_provider(/nytimes\.com\//, 'https://www.nytimes.com/svc/oembed/json/')

      def always_https?
        AllowlistedGenericOnebox.host_matches(uri, AllowlistedGenericOnebox.https_hosts) || super
      end

      def raw
        return @raw if defined?(@raw)

        @raw = {}

        set_opengraph_data_on_raw
        set_twitter_data_on_raw
        set_oembed_data_on_raw
        set_json_ld_data_on_raw
        set_favicon_data_on_raw
        set_description_on_raw

        @raw
      end

      protected

      def html_doc
        return @html_doc if defined?(@html_doc)

        headers = nil
        headers = { 'Cookie' => options[:cookie] } if options[:cookie]

        @html_doc = Onebox::Helpers.fetch_html_doc(url, headers)
      end

      def get_oembed
        @oembed ||= Onebox::Oembed.new(get_json_response)
      end

      def get_opengraph
        @opengraph ||= ::Onebox::OpenGraph.new(html_doc)
      end

      def get_twitter
        return {} unless html_doc

        twitter = {}

        html_doc.css('meta').each do |m|
          if (m["property"] && m["property"][/^twitter:(.+)$/i]) || (m["name"] && m["name"][/^twitter:(.+)$/i])
            value = (m["content"] || m["value"]).to_s
            twitter[$1.tr('-:', '_').to_sym] ||= value unless (Onebox::Helpers::blank?(value) || value == "0 minutes")
          end
        end

        twitter
      end

      def get_favicon
        return nil unless html_doc

        favicon = html_doc.css('link[rel="shortcut icon"], link[rel="icon shortcut"], link[rel="shortcut"], link[rel="icon"]').first
        favicon = favicon.nil? ? nil : (favicon['href'].nil? ? nil : favicon['href'].strip)

        Onebox::Helpers::get_absolute_image_url(favicon, url)
      end

      def get_description
        return nil unless html_doc

        description = html_doc.at("meta[name='description']").to_h['content']
        description ||= html_doc.at("meta[name='Description']").to_h['content']

        description
      end

      def get_json_response
        oembed_url = get_oembed_url

        return "{}" if Onebox::Helpers.blank?(oembed_url)

        Onebox::Helpers.fetch_response(oembed_url) rescue "{}"
      rescue Errno::ECONNREFUSED, Net::HTTPError, Net::HTTPFatalError, MultiJson::LoadError
        "{}"
      end

      def get_oembed_url
        oembed_url = nil

        StandardEmbed.oembed_providers.each do |regexp, endpoint|
          if url =~ regexp
            oembed_url = "#{endpoint}?url=#{url}"
            break
          end
        end

        if html_doc
          if Onebox::Helpers.blank?(oembed_url)
            application_json = html_doc.at("//link[@type='application/json+oembed']/@href")
            oembed_url = application_json.value if application_json
          end

          if Onebox::Helpers.blank?(oembed_url)
            text_json = html_doc.at("//link[@type='text/json+oembed']/@href")
            oembed_url ||= text_json.value if text_json
          end
        end

        oembed_url
      end

      def get_json_ld
        @json_ld ||= Onebox::JsonLd.new(html_doc)
      end

      def set_from_normalizer_data(normalizer)
        normalizer.data.each do |k, v|
          v = normalizer.send(k)
          @raw[k] ||= v unless v.nil?
        end
      end

      def set_opengraph_data_on_raw
        og = get_opengraph
        set_from_normalizer_data(og)
        @raw.except!(:title_attr)
      end

      def set_twitter_data_on_raw
        twitter = get_twitter
        twitter.each { |k, v| @raw[k] ||= v unless Onebox::Helpers::blank?(v) }
      end

      def set_oembed_data_on_raw
        oembed = get_oembed
        set_from_normalizer_data(oembed)
      end

      def set_json_ld_data_on_raw
        json_ld = get_json_ld
        set_from_normalizer_data(json_ld)
      end

      def set_favicon_data_on_raw
        favicon = get_favicon
        @raw[:favicon] = favicon unless Onebox::Helpers::blank?(favicon)
      end

      def set_description_on_raw
        unless @raw[:description]
          description = get_description
          @raw[:description] = description unless Onebox::Helpers::blank?(description)
        end
      end
    end
  end
end
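
The `raw` method above fills `@raw` from several sources in a fixed order, and because the normalizer setters assign with `||=`, the first source that supplies a non-nil value for a key appears to win. The following standalone sketch illustrates that precedence rule; the three hashes are hypothetical sample data standing in for normalizer output, not values produced by Onebox.

```ruby
# Hypothetical sample data standing in for normalizer output.
opengraph = { title: "OG title", description: nil }
twitter   = { title: "Twitter title", description: "Twitter description" }
oembed    = { description: "oEmbed description", author_name: "Someone" }

raw = {}

# Earlier sources take priority: `||=` only assigns when the key
# is still unset, and nil values are skipped entirely.
[opengraph, twitter, oembed].each do |source|
  source.each { |key, value| raw[key] ||= value unless value.nil? }
end

p raw
```

Here `title` comes from the first hash, `description` from the second (the first one had nil), and `author_name` from the third.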