DEV: Replace instances of Discourse.base_uri with Discourse.base_path
This is clearer because the base_uri is actually just a path prefix. This continues the work started in 555f467.
Google insists on indexing pages so it can figure out if they
can be removed from the index.
see: https://support.google.com/webmasters/answer/6332384?hl=en
This change ensures the we have special behavior for Googlebot
where we allow indexing, but block the actual indexing via
X-Robots-Tag
This stops crawlers from hitting tags and category rss feeds to discover
new content, instead they should focus on latest/posts if they need to
consume something regular
Crawlers love hitting the rss feeds (confirmed that both Google and Bing do)
Experimenting with the impact of blocking these feeds and forcing Crawlers to hit
the content direct. It is better if they hit the actual page to start with as opposed to
1. Hit RSS feed
2. Find new content
3. Hit post link
4. Get canonical
5. Hit canonical
Lots of pointless work.
We do not know for sure what impact this will have on newsreader apps,
we will listen for feedback.
Also moved to default crawl delay bing so no more than a req every 5 seconds is allowed
New site settings:
"slow_down_crawler_user_agents" - list of crawlers that will be slowed down
"slow_down_crawler_rate" - how many seconds to wait between requests
Not enforced server side yet
Although most user uploads are probably harmless, it's possible someone
has (either maliciously or not) uploaded sensitive information. Prevent
robots from indexing the uploads route.