Joshua Rosenfeld
1e6d125db6
FIX: Remove additional paths from robots.txt
...
admin path is protected by guardian and thus inaccessible to crawlers
Follow up to b52143f
2020-08-26 16:52:22 -04:00
Krzysztof Kotlarek
e0d9232259
FIX: use allowlist and blocklist terminology ( #10209 )
...
This is a PR of the renaming whitelist to allowlist and blacklist to the blocklist.
2020-07-27 10:23:54 +10:00
Joshua Rosenfeld
b52143feff
FIX: Remove paths from robots.txt in favor of noindex header
...
Google no longer supports the use of robots.txt to block indexing.
See https://support.google.com/webmasters/answer/6062608 and
https://support.google.com/webmasters/answer/93710
Previous commits have added the `noindex` header to appropriate pages,
now we need to remove the paths from robots.txt so the pages can be
crawled.
Follow up to:
13f229808a
b6765aac4b
676be3a853
07b728c5e5
c94e6a9a66
2020-06-25 13:55:06 -04:00
Osama Sayegh
6515ff19e5
FEATURE: Allow customization of robots.txt ( #7884 )
...
* FEATURE: Allow customization of robots.txt
This allows admins to customize/override the content of the robots.txt
file at /admin/customize/robots. That page is not linked to anywhere in
the UI -- admins have to manually type the URL to access that page.
* use Ember.computed.not
* Jeff feedback
* Feedback
* Remove unused import
2019-07-15 20:47:44 +03:00
Sam Saffron
30990006a9
DEV: enable frozen string literal on all files
...
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.
Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
2019-05-13 09:31:32 +08:00
Sam
baa72d18f8
FIX: simplify so we ban all auth paths
...
previously plugins that have auth paths were not disallowed and robots
tend to call them
2018-08-16 19:16:47 +10:00
Sam Saffron
030e322a39
FEATURE: block top level /my/ routes
2018-06-12 19:47:45 +10:00
Robin Ward
3d7dbdedc0
FEATURE: An API to help sites build robots.txt files programatically
...
This is mainly useful for subfolder sites, who need to expose their
robots.txt contents to a parent site.
2018-04-16 15:43:20 -04:00
Régis Hanol
df7970a6f6
prefix the robots.txt rules with the directory when using subfolder
2018-04-11 22:05:02 +02:00
Sam
3a7b696703
FEATURE: allow for setting crawl delay per user agent
...
Also moved to default crawl delay bing so no more than a req every 5 seconds is allowed
New site settings:
"slow_down_crawler_user_agents" - list of crawlers that will be slowed down
"slow_down_crawler_rate" - how many seconds to wait between requests
Not enforced server side yet
2018-04-06 10:15:23 +10:00
Neil Lalonde
ced7e9a691
FEATURE: control which web crawlers can access using a whitelist or blacklist
2018-03-22 15:41:02 -04:00
Guo Xiang Tan
77d4c4d8dc
Fix all the errors to get our tests green on Rails 5.1.
2017-09-25 13:48:58 +08:00
Régis Hanol
d75cc67d86
FIX: robots.txt should be accessible even when login is required
2015-10-15 11:42:41 +02:00
Sam
e5888cf090
PERF: avoid preloading json in cases where it is not needed
...
(uploads / avatars / non GET requests)
2015-05-20 17:12:16 +10:00
Neil Lalonde
a86b35c873
Remove the access_password site setting
2013-06-25 15:05:25 -04:00
Robin Ward
d2596c3c4c
Remove unusued site_settings, show checkbox in UI for boolean values, remove restrict_access
...
boolean to avoid locking yourself out by setting access_password to empty string. Minor
UI tweaks.
2013-03-01 14:27:41 -05:00
Gosha Arinich
cafc75b238
remove trailing whitespaces ❤️
2013-02-26 07:31:35 +03:00
Sam Saffron
1c12c91d0c
forgot to skip a filter
2013-02-11 17:14:36 +11:00
Sam Saffron
c50a9e4d01
added support for disabling indexing by google using SiteSetting.allow_index_in_robots_txt = false
2013-02-11 11:02:57 +11:00