Mirror of https://github.com/discourse/discourse.git, synced 2024-11-25 06:30:15 +08:00
de9a031073
* FEATURE: use canonical links in posts.rss feed

Previously we used non-canonical links in posts.rss. Crawlers fetch these links frequently when discovering new content, which forces them to hop to non-canonical pages only to end up visiting the canonical pages. This wastes expensive crawl time and adds load on Discourse sites.

Old links were of the form: `https://DOMAIN/t/SLUG/43/21`
New links are of the form: `https://DOMAIN/t/SLUG/43?page=2#post_21`

This also adds an element identified by post_id to the crawler view, which was previously missing.

Note: to avoid the very expensive N+1 queries needed to figure out which page a post is on during RSS generation, we cache that information. A smart "cache breaker" ensures that the worst-case scenario is "page drift", meaning we might publicize that a post is on page 11 when it is actually on page 10 due to post deletions. The cache holds for up to 12 hours.

This change only impacts public post RSS feeds (`/posts.rss`).
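The new link format encodes the page a post falls on, which is what makes it expensive to compute and prone to drift. A minimal sketch of that arithmetic follows, assuming a fixed page size of 20 posts (a hypothetical value that happens to reproduce the example above) and assuming the page is derived from the post's zero-based position among the topic's visible posts. The names page_for, canonical_post_url and POSTS_PER_PAGE are illustrative, not Discourse APIs.

```ruby
# Hypothetical sketch, not Discourse's implementation.
# Assumption: 20 posts per page, chosen for illustration only.
POSTS_PER_PAGE = 20

# position is the zero-based index of the post among the topic's currently
# visible posts; deleting earlier posts shifts it, which is why a cached
# page number can "drift".
def page_for(position, per_page: POSTS_PER_PAGE)
  (position / per_page) + 1
end

# Builds a link of the form https://DOMAIN/t/SLUG/TOPIC_ID?page=N#post_POST_NUMBER.
# Simplification: the page parameter is always emitted, even for page 1.
def canonical_post_url(base_url:, slug:, topic_id:, post_number:, position:)
  "#{base_url}/t/#{slug}/#{topic_id}?page=#{page_for(position)}#post_#{post_number}"
end

puts canonical_post_url(base_url: "https://DOMAIN", slug: "SLUG",
                        topic_id: 43, post_number: 21, position: 20)
# prints https://DOMAIN/t/SLUG/43?page=2#post_21

# Page drift: post 201 sits at position 200 (page 11); after one earlier
# post is deleted it moves to position 199 (page 10).
puts page_for(200) # prints 11
puts page_for(199) # prints 10
```

With 20 posts per page, post 21 is the first post on page 2, matching the example link; a single deletion before post 201 is enough to move it from page 11 to page 10, which is the drift the cache tolerates.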
40 lines, 846 B, Ruby
# frozen_string_literal: true

module PostsHelper
  include ApplicationHelper

  CACHE_URL_DURATION = 12.hours.to_i

  def self.clear_canonical_cache!(post)
    key = canonical_redis_key(post)
    Discourse.redis.del(key)
  end

  def self.canonical_redis_key(post)
    "post_canonical_url_#{post.id}"
  end

  def cached_post_url(post, use_canonical:)
    if use_canonical
      # calculating the page a post is on is very expensive, so we cache the resulting URL for 12 hours
      key = PostsHelper.canonical_redis_key(post)

      url = Discourse.redis.get(key)

      # break the cache if either the slug or the topic_id has changed
      if url && !url.start_with?(post.topic.url)
        url = nil
      end

      if !url
        url = post.canonical_url
        Discourse.redis.setex(key, CACHE_URL_DURATION, url)
      end

      url
    else
      post.full_url
    end
  end
end
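The canonical branch of cached_post_url combines a 12-hour TTL with a prefix check against the topic URL. The sketch below reproduces that control flow outside Rails so it can be run standalone. FakeRedis is a hypothetical in-memory stand-in for Discourse.redis that implements only get, setex and del, with an injectable clock; cached_canonical_url and clear_canonical_cache are illustrative names, not Discourse APIs.

```ruby
# Hypothetical in-memory stand-in for Discourse.redis (get/setex/del with TTL).
class FakeRedis
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @store = {}
  end

  def get(key)
    value, expires_at = @store[key]
    return nil if value.nil?
    if @clock.call >= expires_at
      @store.delete(key) # expired entries behave as missing, like Redis
      return nil
    end
    value
  end

  def setex(key, ttl, value)
    @store[key] = [value, @clock.call + ttl]
    "OK"
  end

  def del(key)
    @store.delete(key) ? 1 : 0
  end
end

CACHE_URL_DURATION = 12 * 60 * 60 # 12 hours, in seconds

# Mirrors the canonical branch of cached_post_url: topic_url is the topic's
# current URL, and the block performs the expensive canonical URL lookup.
def cached_canonical_url(redis, post_id, topic_url, &compute_url)
  key = "post_canonical_url_#{post_id}"
  url = redis.get(key)
  # cache breaker: drop entries that no longer start with the topic's URL
  url = nil if url && !url.start_with?(topic_url)
  unless url
    url = compute_url.call
    redis.setex(key, CACHE_URL_DURATION, url)
  end
  url
end

# Mirrors PostsHelper.clear_canonical_cache!.
def clear_canonical_cache(redis, post_id)
  redis.del("post_canonical_url_#{post_id}")
end
```

Because the breaker only compares URL prefixes, a cached entry survives any change that leaves the topic URL intact, such as deleting earlier posts. That is the bounded "page drift" described in the commit message, and it lasts at most CACHE_URL_DURATION before the entry expires and is recomputed.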