111 Commits

Author SHA1 Message Date
David Taylor
3329484e2d
FEATURE: Simplify crawler content for non-canonical post URLs (#26324)
When crawlers visit a post-specific URL like `/t/-/{topic-id}/{post-number}`, we use the canonical to direct them to the appropriate crawler-optimised paginated view (e.g. `?page=3`).

However, analysis of Google results shows that the post-specific URLs are still being included in the index. Google doesn't tell us exactly why this is happening. However, as a general rule, 'A large portion of the duplicate page's content should be present on the canonical version'.

In our previous implementation, this wasn't 100% true all the time. That's because a request for a post-specific URL would include posts 'surrounding' that post, which don't exactly conform to the page boundaries used in the canonical version of the page. Essentially: in some cases, the content of the post-specific pages would include many posts which were not present on the canonical paginated version.

This commit aims to resolve that problem by simplifying the implementation. Instead of rendering posts surrounding the target post_number, we will only render the target post, and include a link to 'show post in topic'. With this new implementation, 100% of the post-specific page content will be present on the canonical paginated version, which will hopefully mean Google reduces its indexing of the non-canonical post-specific pages.
2024-03-26 15:18:46 +00:00
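The containment argument in 3329484e2d can be checked with a small sketch. It assumes 20 posts per crawler page and, for the old behaviour, a window of 5 posts either side of the target; neither number is stated in this log, and all function names are illustrative rather than taken from the Discourse code.

```python
PAGE_SIZE = 20  # assumption: posts per canonical crawler page


def page_for(post_number: int, page_size: int = PAGE_SIZE) -> int:
    """1-based page of the canonical paginated view that holds a post."""
    return (post_number - 1) // page_size + 1


def canonical_page_posts(page: int, page_size: int = PAGE_SIZE) -> set:
    """Post numbers rendered on a given canonical page."""
    first = (page - 1) * page_size + 1
    return set(range(first, first + page_size))


def old_view(post_number: int, radius: int = 5) -> set:
    """Hypothetical old behaviour: the target post plus surrounding posts."""
    return set(range(max(1, post_number - radius), post_number + radius + 1))


def new_view(post_number: int) -> set:
    """Behaviour described in the commit: only the target post."""
    return {post_number}


target = 21
canonical = canonical_page_posts(page_for(target))
# Posts the old view showed that the canonical page does not contain.
print(sorted(old_view(target) - canonical))  # [16, 17, 18, 19, 20]
# The new view is always a subset of the canonical page.
print(new_view(target) <= canonical)  # True
```

Under these assumptions, a target post near a page boundary is exactly the case where the old view drifted away from the canonical page.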
David Taylor
4a7e69d8ee
UX: Include message when crawler content is omitted (#26325)
To improve performance, we omit the basic-HTML version of pages when users are logged in, or when they are using a modern mobile device. This can be confusing when analysing the SEO of sites, so this commit adds a short static message when content is omitted.
2024-03-22 17:24:57 +00:00
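The rule described in 4a7e69d8ee could be sketched as below. The function name, its parameters, and the wording of the message are hypothetical; the log only says that a short static message replaces the basic-HTML content for logged-in users and modern mobile devices.

```python
# Hypothetical wording; the actual message is not quoted in this log.
OMITTED_MESSAGE = "Crawler content omitted for this request."


def crawler_body(basic_html: str, logged_in: bool, modern_mobile: bool) -> str:
    """Return the basic-HTML page body, or a static notice when it is omitted."""
    if logged_in or modern_mobile:
        return OMITTED_MESSAGE
    return basic_html


print(crawler_body("<div>post content</div>", logged_in=False, modern_mobile=False))
print(crawler_body("<div>post content</div>", logged_in=True, modern_mobile=False))
```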
Ayke Halder
1a782acd9c
FIX: set microdata schema for topic on missing first post (#25195)
Some attributes of the microdata schema `DiscussionForumPosting` are rendered in the context of the first post.
Ensure these attributes are also set if the first post is not part of the current view.
2024-01-12 16:29:03 +05:30
Ayke Halder
9261500ea9
FIX: exclude empty posts from microdata schema for topic (#25198) 2024-01-12 12:47:56 +05:30
Ayke Halder
16b8476cb4
FIX: Ensure consistent datePublished on follow-up pages in topic microdata schema (#25130)
* Ensure consistent `datePublished` and remove `text` on second page in topic microdata schema

Always use `datePublished` from topic and never from `first_post`. This ensures `datePublished` is consistent on `first page` and `page=2+`.

No need to repeat `text` on `page=2+`. Especially do not set `text` on `page=2+` if it is only an abstract and thereby not 100% consistent with `text` on `first page`.

* Keep `text` attribute on follow-up pages
2024-01-12 12:11:08 +05:30
Arpit Jalan
4bf60b3e5b
FEATURE: include username link in the microdata schema (#25112) 2024-01-03 20:11:41 +05:30
Arpit Jalan
69383e9afd
FIX: include only author username in the schema (#25106) 2024-01-03 12:16:29 +05:30
Arpit Jalan
fb056ae719
FIX: add required metadata schema for subsequent pages (#25102)
This commit adds the missing metadata schema for subsequent pages (?page=2)

https://meta.discourse.org/t/discussion-forum-schema-improvements/287347/21
2024-01-03 09:37:04 +05:30
Arpit Jalan
878d973d90
FIX: fixes for microdata schema rendering (#25082)
d9ca6c3bb9 (r135940042)
2024-01-02 19:23:57 +05:30
Rafael dos Santos Silva
a6cf2d20e6
FEATURE: Topic crawler view bottom plugin outlet (#25060) 2023-12-28 15:16:30 -03:00
Arpit Jalan
d9ca6c3bb9
FIX: improve structured data based on recent changes (#25043)
This commit makes some improvements to a topic's structured data based
on the recommendation on meta topic: https://meta.discourse.org/t/google-structured-data-for-forums-and-profile-pages/286762/9
2023-12-27 11:13:16 +05:30
Natalie Tay
22ce638ec3
FIX: Use subfolder-safe url for category in html view (#24595)
Use subfolder-safe url for category in html view
2023-11-28 19:08:14 +08:00
Vinoth Kannan
7cedb911a7
FEATURE: add category name in articleSection meta tag for schema. (#21004)
https://schema.org/DiscussionForumPosting
2023-04-06 23:30:19 +05:30
Vinoth Kannan
8405ec2831
FEATURE: use "Comment" schema type for post replies. (#20932)
Previously, we used the schema type "DiscussionForumPosting" for all the posts including replies. This is not recommended as per Google search experts. This commit changes the schema type to "Comment" for replies.
2023-04-03 14:36:47 +05:30
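The type selection introduced in 8405ec2831 amounts to a one-line rule. The sketch below assumes the opening post is the one with post number 1; the function name is illustrative.

```python
def schema_item_type(post_number: int) -> str:
    """Schema.org type for a post: the opening post vs. a reply."""
    # Assumption: the first post of a topic has post_number == 1.
    name = "DiscussionForumPosting" if post_number == 1 else "Comment"
    return f"https://schema.org/{name}"


for number in (1, 2, 3):
    print(number, schema_item_type(number))
```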
Loïc Guitaut
a9f2c6db64 SECURITY: Show only visible tags in metadata
Currently, the topic metadata shows both public and private
tags, whereas only visible ones should be exposed.
2023-02-23 17:22:20 +01:00
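A minimal sketch of the filtering described in a9f2c6db64. The `Tag` class and its `visible` flag are hypothetical stand-ins; how visibility is actually decided is not described in this log.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    name: str
    visible: bool  # hypothetical flag: True if the current viewer may see the tag


def metadata_tags(tags):
    """Names of the tags that may be exposed in topic metadata."""
    return [tag.name for tag in tags if tag.visible]


tags = [Tag("howto", True), Tag("staff-only", False), Tag("seo", True)]
print(metadata_tags(tags))  # ['howto', 'seo']
```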
Ayke Halder
9f14d643a5
DEV: use structured data in crawler-linkback-list for referencing only (#16237)
This simplifies the crawler-linkback-list to only be a point of reference to the actual DiscussionForumPosting objects.

See "Summary page": https://developers.google.com/search/docs/advanced/structured-data/carousel?hl=en#summary-page
> [It] defines an ItemList, where each ListItem has only three properties: @type (set to ListItem), position (the position in the list), and url (the URL of a page with full details about that item).
2023-01-30 08:26:55 +01:00
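The "summary page" shape quoted in 9f14d643a5 can be illustrated as follows. This sketch emits JSON-LD for brevity, which is an assumption made here for readability; the commit itself concerns structured data in the crawler-linkback-list markup, and the function name and URLs are illustrative.

```python
import json


def linkback_item_list(urls):
    """Build an ItemList whose ListItems carry only @type, position, and url."""
    return {
        "@context": "https://schema.org",
        "@type": "ItemList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "url": url}
            for position, url in enumerate(urls, start=1)
        ],
    }


example = linkback_item_list([
    "https://forum.example.com/t/first-topic/10",
    "https://forum.example.com/t/second-topic/11",
])
print(json.dumps(example, indent=2))
```

Each list item is only a pointer; the full details live on the page that the `url` refers to.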
Ayke Halder
137dbaf0dc
DEV: declare post position as simple number in structured data (#16231)
This replaces the position declared as `#123` with the simpler version `123`.

The property position may be of type Integer or Text. A value of type Integer or, more precisely, of type Text that casts cleanly to an integer is sufficient here.
See: https://schema.org/position

In category-view the topic-list already uses this notation for the position of topics:
`<meta itemprop="position" content="123">`
2023-01-30 08:07:04 +01:00
Loïc Guitaut
14d97f9cf1 FEATURE: Show more context in Discourse topic oneboxes
Currently when generating a onebox for Discourse topics, some important
context is missing such as categories and tags.

This patch addresses this issue by introducing a new onebox engine
dedicated to displaying this information when available. To make this
new information available, categories and tags are exposed in the topic
metadata as opengraph tags.
2023-01-11 14:22:53 +01:00
Arpit Jalan
c39cebc161
PERF: remove server plugin outlet for post (#17105) 2022-06-16 17:21:24 +10:00
Arpit Jalan
defa5a4e94
FEATURE: allow locals to be passed in server_plugin_outlet (#16850) 2022-05-20 10:00:24 +05:30
Sam
de9a031073
FEATURE: use canonical links in posts.rss feed (#16190)
* FEATURE: use canonical links in posts.rss feed

Previously we used non canonical links in posts.rss

These links get crawled frequently by crawlers when discovering new
content forcing crawlers to hop to non canonical pages just to end up
visiting canonical pages

This uses up expensive crawl time and adds load on Discourse sites

Old links were of the form:

`https://DOMAIN/t/SLUG/43/21`

New links are of the form

`https://DOMAIN/t/SLUG/43?page=2#post_21`

This also adds a post_id identified element to crawler view that was
missing.

Note, to avoid very expensive N+1 queries required to figure out the
page a post is on during rss generation, we cache that information.

There is a smart "cache breaker" which ensures worst case scenario is
a "page drift" - meaning we would publicize a post is on page 11 when
it is actually on page 10 due to post deletions. Cache holds for up to
12 hours.

Change only impacts public post RSS feeds (`/posts.rss`)
2022-03-15 20:17:06 +11:00
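The link rewrite in de9a031073 can be sketched as a pure function. The page size of 20 is an assumption that happens to be consistent with the example above (post 21 lands on page 2), and omitting `?page=1` for first-page posts is also an assumption; the function name is illustrative, and the caching mentioned in the commit is not modelled.

```python
def canonical_post_url(domain, slug, topic_id, post_number, page_size=20):
    """Canonical paginated URL with a post anchor, as in the 'new links' form."""
    page = (post_number - 1) // page_size + 1
    base = f"https://{domain}/t/{slug}/{topic_id}"
    # Assumption: first-page posts need no explicit page parameter.
    query = f"?page={page}" if page > 1 else ""
    return f"{base}{query}#post_{post_number}"


print(canonical_post_url("DOMAIN", "SLUG", 43, 21))
# https://DOMAIN/t/SLUG/43?page=2#post_21
```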
Ayke Halder
28bb9e11f4
FEATURE: add nofollow to RSS alternate link in topics and categories (#16013)
* FEATURE: add nofollow to RSS alternate link in topics and categories

* Rspec tests for category and topic view: add nofollow to RSS alternate link
2022-03-09 16:34:02 +11:00
Krzysztof Kotlarek
8b93da9fe0
FIX: rename action_code_href to action_code_path (#14834)
Small actions should use a path instead of an absolute URL. The getURL function is necessary to insert a potential subfolder prefix.
2021-11-08 14:32:17 +11:00
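Why a path plus a prefixing helper is subfolder-safe can be shown with a Python analogue of the idea behind getURL. This is not the Discourse implementation; `get_url` and `base_path` are hypothetical names used only for illustration.

```python
def get_url(path: str, base_path: str = "") -> str:
    """Prefix a site-relative path with the subfolder the site is served from."""
    if not path.startswith("/"):
        return path  # not site-relative (e.g. an absolute URL): leave untouched
    folder = base_path.strip("/")
    prefix = f"/{folder}" if folder else ""
    if prefix and (path == prefix or path.startswith(prefix + "/")):
        return path  # already carries the subfolder prefix
    return prefix + path


print(get_url("/t/some-topic/12/3"))            # /t/some-topic/12/3
print(get_url("/t/some-topic/12/3", "/forum"))  # /forum/t/some-topic/12/3
```

An absolute URL baked into the small action would bypass this step, which is why the attribute carries a path instead.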
Krzysztof Kotlarek
fe8087e523
FEATURE: small action post accepts href (#14816)
Optionally add href to small action.
It can be used by discourse-assign to link to the correct post from the translation.
2021-11-08 08:24:44 +11:00
Gerhard Schlager
8e60bce903
FIX: Always show the creation date of posts in crawler view (#14269)
The modification date should always be a meta tag to make this less confusing. Especially for imported posts.
That's more in line with how the rest of Discourse presents post dates.
2021-09-08 11:03:55 +02:00
Vinoth Kannan
5a93893b08
FIX: use correct URL in schema markup for post images. (#13847)
Currently, it wrongly prepends the Discourse base URL even to CDN URLs.
2021-07-26 21:39:51 +05:30
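The fix in 5a93893b08 comes down to prefixing the base URL only when the image source is a relative path. A minimal sketch, with a hypothetical function name and example hosts, and with `https` assumed for protocol-relative sources:

```python
from urllib.parse import urlsplit


def schema_image_url(src: str, base_url: str) -> str:
    """Absolute image URL for schema markup; only relative paths get the base URL."""
    parts = urlsplit(src)
    if parts.scheme:
        return src  # already absolute, e.g. a CDN URL
    if parts.netloc:
        return "https:" + src  # protocol-relative; https is an assumption
    return base_url.rstrip("/") + "/" + src.lstrip("/")


base = "https://forum.example.com"
print(schema_image_url("/uploads/a.png", base))
print(schema_image_url("https://cdn.example.com/uploads/a.png", base))
```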
Vinoth Kannan
ea423b471a FIX: make crawler linkback list compatible with google schema guidelines. 2020-09-04 04:35:32 +05:30
Dan Ungureanu
b80128a973
FEATURE: Add structured data to follow Google's guidelines (#9764)
All Schema.org properties are optional, but Google has a set of
properties which are required.
2020-05-14 10:42:01 +03:00
Dan Ungureanu
141f16eb6b
FIX: Multiple schema.org improvements
* Do not show "Uncategorized" category in topics list.
* Use "BreadcrumbList" only if topic is in a category.
* Add tags list as keywords to the first post.
* Add "dateModified" even if it is the same as "datePublished".
* Show "crawler-linkback-list" only if there are links to be shown.
2020-05-11 20:38:49 +03:00
Joffrey JAFFEUX
addf9d62f8
FIX: prevents rendering topic-category if empty (#9720) 2020-05-11 17:45:28 +03:00
Dan Ungureanu
fe51f7a863
FEATURE: More improvements to crawler and old browsers view
Related to c85018cdfd.
2020-04-30 12:07:51 +03:00
Dan Ungureanu
c85018cdfd
Improve support for old browsers (#9515)
* FEATURE: Improve crawler view

* FIX: Make lazyYT crawler-friendly

* DEV: Rename discourse-internet-explorer to discourse-unsupported-browser

* DEV: Detect more unsupported browsers

Follow-up to 4eebbd2212.

* FIX: Hide browser update notice in print view
2020-04-29 21:40:21 +03:00
Régis Hanol
96b64df4d4 FIX: use schema.org's BreadcrumbList
The data-vocabulary.org schema is being deprecated.
We're now using the BreadcrumbList data from the latest and greatest schema.org.

FIX: categories_breadcrumb helper to support more than 2 levels of categories.
2020-01-21 22:27:21 +01:00
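Supporting more than two category levels, as in 96b64df4d4, means walking the whole parent chain rather than looking at a single parent. The sketch below is a Python illustration only: the lookup dictionaries and the output shape are assumptions and do not mirror the actual `categories_breadcrumb` helper.

```python
def categories_breadcrumb(category_id, parent_of, name_of):
    """BreadcrumbList-style items from the root category down to category_id."""
    chain, seen = [], set()
    while category_id is not None and category_id not in seen:
        seen.add(category_id)  # guard against accidental parent cycles
        chain.append(category_id)
        category_id = parent_of.get(category_id)
    chain.reverse()  # root first, target category last
    return [
        {"@type": "ListItem", "position": position, "name": name_of[cid]}
        for position, cid in enumerate(chain, start=1)
    ]


parent_of = {3: 2, 2: 1, 1: None}  # hypothetical three-level hierarchy
name_of = {1: "Support", 2: "Plugins", 3: "Chat"}
items = categories_breadcrumb(3, parent_of, name_of)
print([item["name"] for item in items])  # ['Support', 'Plugins', 'Chat']
```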
Dan Ungureanu
89bd7ba45f
FIX: Use new tag routes (#8683)
Commit 1fb7a62 added unambiguous routes for tags. This commit ensures
that the new routes are used.
2020-01-21 19:23:08 +02:00
Joffrey JAFFEUX
71bf9ec1b2
FEATURE: opt-in guidance on topics for users without access (#7852)
Co-Authored-By: majakomel <maja.komel@gmail.com>
Co-Authored-By: Robin Ward <robin.ward@gmail.com>
2019-07-04 10:12:39 +02:00
Vinoth Kannan
f9f12ed221 PERF: fix N+1 queries for non-JS topic view. 2019-06-03 21:47:33 +05:30
Kris
98336de266 UX: Cleanup crawler styles, improve schema.org markup (#7668)
* Cleaning up crawler styles, improving some schema.org markup

* additional styling

* add space for pagination
2019-06-03 12:03:16 +10:00
Saurabh Patel
e20f13ebb7 fix css of prev and next page links, move them to bottom of page (#7465)
Thanks 👍
2019-05-07 17:04:27 +02:00
Saurabh Patel
3658be42f5 FIX: remove like_count and <hr> tag from post crawler layout (#7413)
* show likes value in crawler view if count is > 0

* remove <hr> since horizontal line is already provided by css - this removes one of 2 horizontal lines in post crawler view
2019-04-23 15:35:57 +10:00
Saurabh Patel
da2f659635 UX: Improve posts layout for crawler (#7286) 2019-04-03 11:58:00 +02:00
Vinoth Kannan
e0c16d3a8a minor refactoring to improve code readability 2019-02-11 17:24:02 +05:30
Vinoth Kannan
2c12336c6b FIX: Display post updated date in non-JS view for crawler 2019-02-11 16:48:22 +05:30
Vinoth Kannan
3d52f690b3 UX: Add post action text in non-JS topic view 2019-01-28 22:51:06 +05:30
Joffrey JAFFEUX
096a81158a
FIX: siteNavigationElement was reversed (#6934) 2019-01-23 15:47:39 +01:00
Joe
7707e42441 DEV: moves print-specific styles from internal style tag to external print sheet (#6581)
* DEV: removes internal styles from print view

* DEV: adds styles to print sheet
2018-11-13 14:45:55 +11:00
Joe
4234058358 UX: don't show crawler navigation in print view (#6551)
* UX: adds CSS classes to crawler navigation links

* UX: hide crawler navigation in print view
2018-11-02 09:18:07 +11:00
Kyle Zhao
a6eca28ec6
CSP - extract all other inline JavaScripts (#6528)
* wizard page inline js

* print topic inline js

* drop JS for preventing double submission

this is the default behavior with Rails' UJS `disable_with` helper

* omniauth complete redirect JS

* account activate inline js
2018-10-25 09:52:01 -04:00
Arpit Jalan
dfcb2a0d42 FEATURE: include published_time in metadata 2018-07-30 17:09:56 +05:30
Gerhard Schlager
44ee388070 FEATURE: omit images from og and twitter description tags 2017-11-28 21:34:02 +01:00
Arpit Jalan
b354099252 FEATURE: add custom open graph tag for ignoring canonical url 2017-08-15 19:24:20 +05:30