Commit Graph

115 Commits

Author SHA1 Message Date
Natalie Tay
e8aa2b8d9a
DEV: Add a nofollow to /u so user profiles don't get added as a target for crawling (#30693)
In some sites, we are noticing that /u routes are getting indexed.

This commit adds a
["nofollow"](https://developers.google.com/search/docs/crawling-indexing/qualify-outbound-links)
on user profiles so that Google will not index these user profile URLs.
2025-01-13 13:50:00 +08:00
Natalie Tay
75236b30d8
FIX: Exclude reply count on posts due to required Comment nesting (#27892)
"Replies" in non-crawler view makes a request when clicked to get all replies, however this does not make sense in the crawler view where we load everything per post number.

So the solution here is to exclude the reply number so we can avoid having to nest all replies in a post.
2024-07-15 09:40:47 +08:00
Régis Hanol
0846862cb5
FIX: deleted topic author in crawler view (#27788)
When a crawler visits a topic that has a deleted author, it would error because the `show.html.erb` view was expecting a user to be always present.

This ensure we don't render the "author" meta data when the author of the topic has been deleted.

Internal ref t/132508
2024-07-09 10:44:03 +02:00
David Taylor
50caef6783
FIX: Restore author on non-first-post crawler views (#26459)
Followup to 3329484e2d
2024-04-02 12:08:26 +01:00
David Taylor
3329484e2d
FEATURE: Simplify crawler content for non-canonical post URLs (#26324)
When crawlers visit a post-specific URL like `/t/-/{topic-id}/{post-number}`, we use the canonical to direct them to the appropriate crawler-optimised paginated view (e.g. `?page=3`).

However, analysis of google results shows that the post-specific URLs are still being included in the index. Google doesn't tell us exactly why this is happening. However, as a general rule, 'A large portion of the duplicate page's content should be present on the canonical version'.

In our previous implementation, this wasn't 100% true all the time. That's because a request for a post-specific URL would include posts 'surrounding' that post, and won't exactly conform to the page boundaries which are used in the canonical version of the page. Essentially: in some cases, the content of the post-specific pages would include many posts which were not present on the canonical paginated version.

This commit aims to resolve that problem by simplifying the implementation. Instead of rendering posts surrounding the target post_number, we will only render the target post, and include a link to 'show post in topic'. With this new implementation, 100% of the post-specific page content will be present on the canonical paginated version, which will hopefully mean google reduces their  indexing of the non-canonical post-specific pages.
2024-03-26 15:18:46 +00:00
David Taylor
4a7e69d8ee
UX: Include message when crawler content is omitted (#26325)
To improve performance, we omit the basic-HTML version of pages when users are logged in, or when they are using a modern mobile device. This can be confusing when analysing the SEO of sites, so this commit adds a short static message when content is omitted.
2024-03-22 17:24:57 +00:00
Ayke Halder
1a782acd9c
FIX: set microdata schema for topic on missing first post (#25195)
Some attributes of the microdata schema `DiscussionForumPosting` are rendered in the context of the first post.
Ensure these attributes are also set if the first post is not part of the current view.
2024-01-12 16:29:03 +05:30
Ayke Halder
9261500ea9
FIX: exclude empty posts from microdata schema for topic (#25198) 2024-01-12 12:47:56 +05:30
Ayke Halder
16b8476cb4
FIX: Ensure consistent datePublished on follow-up pages in topic microdata schema (#25130)
* Ensure consistent `datePublished` and remove `text` on second page in topic microdata schema

Always use `datePublished` from topic and never from `first_psot`. This ensures `datePublished` to be consistent on `first page` and `page=2+`.

No need to repeat `text` on `page=2+`. Especially do not set `text` on `page=2+` if it is only an abstract and thereby not 100% consistent with `text` on `first page`.

* Keep `text`attribute on follow-up pages
2024-01-12 12:11:08 +05:30
Arpit Jalan
4bf60b3e5b
FEATURE: include username link in the microdata schema (#25112) 2024-01-03 20:11:41 +05:30
Arpit Jalan
69383e9afd
FIX: include only author username in the schema (#25106) 2024-01-03 12:16:29 +05:30
Arpit Jalan
fb056ae719
FIX: add required metadata schema for subsequent pages (#25102)
This commits adds missing metadata schema for subsequent pages (?page=2)

https://meta.discourse.org/t/discussion-forum-schema-improvements/287347/21
2024-01-03 09:37:04 +05:30
Arpit Jalan
878d973d90
FIX: fixes for microdata schema rendering (#25082)
d9ca6c3bb9 (r135940042)
2024-01-02 19:23:57 +05:30
Rafael dos Santos Silva
a6cf2d20e6
FEATURE: Topic crawler view bottom plugin outlet (#25060) 2023-12-28 15:16:30 -03:00
Arpit Jalan
d9ca6c3bb9
FIX: improve structured data based on recent changes (#25043)
This commit makes some improvements to a topic's structured data based
on the recommendation on meta topic: https://meta.discourse.org/t/google-structured-data-for-forums-and-profile-pages/286762/9
2023-12-27 11:13:16 +05:30
Natalie Tay
22ce638ec3
FIX: Use subfolder-safe url for category in html view (#24595)
Use subfolder-safe url for category in html view
2023-11-28 19:08:14 +08:00
Vinoth Kannan
7cedb911a7
FEATURE: add category name in articleSection meta tag for schema. (#21004)
https://schema.org/DiscussionForumPosting
2023-04-06 23:30:19 +05:30
Vinoth Kannan
8405ec2831
FEATURE: use "Comment" schema type for post replies. (#20932)
Previously, we used the schema type "DiscussionForumPosting" for all the posts including replies. This is not recommended as per Google search experts. This commit changes the schema type to "Comment" for replies.
2023-04-03 14:36:47 +05:30
Loïc Guitaut
a9f2c6db64 SECURITY: Show only visible tags in metadata
Currently, the topic metadata show both public and private
tags whereas only visible ones should be exposed.
2023-02-23 17:22:20 +01:00
Ayke Halder
9f14d643a5
DEV: use structured data in crawler-linkback-list for referencing only (#16237)
This simplifies the crawler-linkback-list to only be a point of reference to the actual DiscussionForumPosting objects.

See "Summary page": https://developers.google.com/search/docs/advanced/structured-data/carousel?hl=en#summary-page
> [It] defines an ItemList, where each ListItem has only three properties: @type (set to ListItem), position (the position in the list), and url (the URL of a page with full details about that item).
2023-01-30 08:26:55 +01:00
Ayke Halder
137dbaf0dc
DEV: declare post position as simple number in structured data (#16231)
This replaces the position declared as `#123` with the more simple version `123`.

The property position may be of type Integer or Text. A value of type Integer, or more precise of type Text which simply casts to integer, is sufficient here.
See: https://schema.org/position

In category-view the topic-list already uses this notation for the position of topics:
`<meta itemprop="position" content="123">`
2023-01-30 08:07:04 +01:00
Loïc Guitaut
14d97f9cf1 FEATURE: Show more context in Discourse topic oneboxes
Currently when generating a onebox for Discourse topics, some important
context is missing such as categories and tags.

This patch addresses this issue by introducing a new onebox engine
dedicated to display this information when available. Indeed to get this
new information, categories and tags are exposed in the topic metadata
as opengraph tags.
2023-01-11 14:22:53 +01:00
Arpit Jalan
c39cebc161
PERF: remove server plugin outlet for post (#17105) 2022-06-16 17:21:24 +10:00
Arpit Jalan
defa5a4e94
FEATURE: allow locals to be passed in server_plugin_outlet (#16850) 2022-05-20 10:00:24 +05:30
Sam
de9a031073
FEATURE: use canonical links in posts.rss feed (#16190)
* FEATURE: use canonical links in posts.rss feed

Previously we used non canonical links in posts.rss

These links get crawled frequently by crawlers when discovering new
content forcing crawlers to hop to non canonical pages just to end up
visiting canonical pages

This uses up expensive crawl time and adds load on Discourse sites

Old links were of the form:

`https://DOMAIN/t/SLUG/43/21`

New links are of the form

`https://DOMAIN/t/SLUG/43?page=2#post_21`

This also adds a post_id identified element to crawler view that was
missing.

Note, to avoid very expensive N+1 queries required to figure out the
page a post is on during rss generation, we cache that information.

There is a smart "cache breaker" which ensures worst case scenario is
a "page drift" - meaning we would publicize a post is on page 11 when
it is actually on page 10 due to post deletions. Cache holds for up to
12 hours.

Change only impacts public post RSS feeds (`/posts.rss`)
2022-03-15 20:17:06 +11:00
Ayke Halder
28bb9e11f4
FEATURE: add nofollow to RSS alternate link in topics and categories (#16013)
* FEATURE: add nofollow to RSS alternate link in topics and categories

* Rspec tests for category and topic view: add nofollow to RSS alternate link
2022-03-09 16:34:02 +11:00
Krzysztof Kotlarek
8b93da9fe0
FIX: rename action_code_href to action_code_path (#14834)
Small actions should use path instead of absolute url. getURL function is necessary to insert a potential subfolder prefix.
2021-11-08 14:32:17 +11:00
Krzysztof Kotlarek
fe8087e523
FEATURE: small action post accepts href (#14816)
Optionally add href to small action.
It can be used by discourse-assign to link to correct post from translation
2021-11-08 08:24:44 +11:00
Gerhard Schlager
8e60bce903
FIX: Always show the creation date of posts in crawler view (#14269)
The modification date should always be a meta tag to make this less confusing. Especially for imported posts.
That's more in line with how the rest of Discourse presents post dates.
2021-09-08 11:03:55 +02:00
Vinoth Kannan
5a93893b08
FIX: use correct URL in schema markup for post images. (#13847)
Currently, it wrongly adds Discourse base URL in prefix even for CDN URLs.
2021-07-26 21:39:51 +05:30
Vinoth Kannan
ea423b471a FIX: make crawler linkback list compatible with google schema guidelines. 2020-09-04 04:35:32 +05:30
Dan Ungureanu
b80128a973
FEATURE: Add structured data to follow Google's guidelines (#9764)
All Schema.org properties are optional, but Google has a set of
properties which are required.
2020-05-14 10:42:01 +03:00
Dan Ungureanu
141f16eb6b
FIX: Multiple schema.org improvements
* Do not show "Uncategorized" category in topics list.
* Use "BreadcrumbList" only if topic is in a category.
* Add tags list as keywords to the first post.
* Add "dateModified" even if it is the same with "datePublished".
* Show "crawler-linkback-list" only if there are links to be shown.
2020-05-11 20:38:49 +03:00
Joffrey JAFFEUX
addf9d62f8
FIX: prevents rendering topic-category if empty (#9720) 2020-05-11 17:45:28 +03:00
Dan Ungureanu
fe51f7a863
FEATURE: More improvements to crawler and old browsers view
Related to c85018cdfd.
2020-04-30 12:07:51 +03:00
Dan Ungureanu
c85018cdfd
Improve support for old browsers (#9515)
* FEATURE: Improve crawler view

* FIX: Make lazyYT crawler-friendly

* DEV: Rename discourse-internet-explorer to discourse-unsupported-browser

* DEV: Detect more unsupported browsers

Follow-up to 4eebbd2212.

* FIX: Hide browser update notice in print view
2020-04-29 21:40:21 +03:00
Régis Hanol
96b64df4d4 FIX: use schema.org's BreadcrumList
The data-vocabulary.org schema is being deprecated.
We're now using the BreadcrumList data from the latest and greatest schema.org.

FIX: categories_breadcrumb helper to support more than 2 levels of categories.
2020-01-21 22:27:21 +01:00
Dan Ungureanu
89bd7ba45f
FIX: Use new tag routes (#8683)
Commit 1fb7a62 added unambiguous routes for tags. This commit ensures
that the new routes are used.
2020-01-21 19:23:08 +02:00
Joffrey JAFFEUX
71bf9ec1b2
FEATURE: opt-in guidance on topics for users without access (#7852)
Co-Authored-By: majakomel <maja.komel@gmail.com>
Co-Authored-By: Robin Ward <robin.ward@gmail.com>
2019-07-04 10:12:39 +02:00
Vinoth Kannan
f9f12ed221 PERF: fix N+1 queries for non-JS topic view. 2019-06-03 21:47:33 +05:30
Kris
98336de266 UX: Cleanup crawler styles, improve schema.org markup (#7668)
* Cleaning up crawler styles, improving some schema.org markup

* Cleaning up crawler styles, improving some schema.org markup

* additional styling

* add space for pagination
2019-06-03 12:03:16 +10:00
Saurabh Patel
e20f13ebb7 fix css of prev and next page links, move them to bottom of page (#7465)
Thanks 👍
2019-05-07 17:04:27 +02:00
Saurabh Patel
3658be42f5 FIX: remove like_count and <hr> tag from post crawler layout (#7413)
* show likes value in crawler view if count is > 0

* remove <hr> since horizontal line is already provided by css - this removes one of 2 horizontal lines in post crawler view
2019-04-23 15:35:57 +10:00
Saurabh Patel
da2f659635 UX: Improve posts layout for crawler (#7286) 2019-04-03 11:58:00 +02:00
Vinoth Kannan
e0c16d3a8a minor refactoring to improve code readability 2019-02-11 17:24:02 +05:30
Vinoth Kannan
2c12336c6b FIX: Display post updated date in non-JS view for crawler 2019-02-11 16:48:22 +05:30
Vinoth Kannan
3d52f690b3 UX: Add post action text in non-JS topic view 2019-01-28 22:51:06 +05:30
Joffrey JAFFEUX
096a81158a
FIX: siteNavigationElement was reversed (#6934) 2019-01-23 15:47:39 +01:00
Joe
7707e42441 DEV: moves print-specific styles from internal style tag to external print sheet (#6581)
* DEV: removes internal styles from print view

* DEV: adds styles to print sheet
2018-11-13 14:45:55 +11:00
Joe
4234058358 UX: don't show crawler navigation in print view (#6551)
* UX: adds CSS classes to crawler navigation links

* UX: hide crawler navigation in print view
2018-11-02 09:18:07 +11:00