discourse/spec/requests
Osama Sayegh 7bd3986b21
FEATURE: Replace Crawl-delay directive with proper rate limiting (#15131)
We have a couple of site setting, `slow_down_crawler_user_agents` and `slow_down_crawler_rate`, that are meant to allow site owners to signal to specific crawlers that they're crawling the site too aggressively and that they should slow down.

When a crawler is added to the `slow_down_crawler_user_agents` setting, Discourse currently adds a `Crawl-delay` directive for that crawler in `/robots.txt`. Unfortunately, many crawlers don't support the `Crawl-delay` directive in `/robots.txt` which leaves the site owners no options if a crawler is crawling the site too aggressively.

This PR replaces the `Crawl-delay` directive with proper rate limiting for crawlers added to the `slow_down_crawler_user_agents` list. On every request made by a non-logged in user, Discourse will check the User Agent string and if it contains one of the values of the `slow_down_crawler_user_agents` list, Discourse will only allow 1 request every N seconds for that User Agent (N is the value of the `slow_down_crawler_rate` setting) and the rest of requests made within the same interval will get a 429 response. 

The `slow_down_crawler_user_agents` setting becomes quite dangerous with this PR since it could rate limit lots if not all of anonymous traffic if the setting is not used appropriately. So to protect against this scenario, we've added a couple of new validations to the setting when it's changed:

1) each value added to setting must 3 characters or longer
2) each value cannot be a substring of tokens found in popular browser User Agent. The current list of prohibited values is: apple, windows, linux, ubuntu, gecko, firefox, chrome, safari, applewebkit, webkit, mozilla, macintosh, khtml, intel, osx, os x, iphone, ipad and mac.
2021-11-30 12:55:25 +03:00
..
admin FEATURE: adds uploads scope for API keys (#14941) 2021-11-22 10:49:08 -07:00
api FEATURE: Display pending posts on user’s page 2021-11-29 10:26:33 +01:00
about_controller_spec.rb FIX: Correct user profile URLs in /about crawler view 2020-07-14 16:09:27 +01:00
application_controller_spec.rb FEATURE: Replace Crawl-delay directive with proper rate limiting (#15131) 2021-11-30 12:55:25 +03:00
associate_accounts_controller_spec.rb DEV: Improve robustness of associate_accounts_controller 2021-08-10 15:07:40 +01:00
badges_controller_spec.rb FEATURE: add noindex header to badges, groups, and /my pages (#9736) 2020-05-11 15:05:42 +10:00
bookmarks_controller_spec.rb FEATURE: Topic-level bookmarks (#14353) 2021-09-21 08:45:47 +10:00
bootstrap_controller_spec.rb FIX: allows authentication data to be present in bootstrap (#13885) 2021-07-29 15:01:11 +02:00
categories_controller_spec.rb FIX: Display top posts from private categories if the user has access. (#14878) 2021-11-11 13:35:03 -03:00
clicks_controller_spec.rb DEV: Fix failling test. 2019-05-07 11:19:13 +03:00
composer_messages_controller_spec.rb DEV: Use response.parsed_body in specs (#9615) 2020-05-07 17:04:12 +02:00
csp_reports_controller_spec.rb DEV: Only include "report-sample" CSP directive when reporting is enabled (#9337) 2020-04-02 11:16:38 -04:00
directory_columns_controller_spec.rb DEV: Plugin API to add directory columns (#13440) 2021-06-22 13:00:04 -05:00
directory_items_controller_spec.rb FIX: Include user_field_ids in pagination URL for directory items (#13569) 2021-06-29 14:43:38 -05:00
do_not_disturb_controller_spec.rb DEV: Replace 'processed' column on notifications with new table (#11864) 2021-01-27 10:29:24 -06:00
drafts_controller_spec.rb FEATURE: Cook drafts excerpt in user activity (#14315) 2021-09-14 15:18:01 +03:00
email_controller_spec.rb FIX: Show Uncategorized when unsubscribing (#13832) 2021-07-26 12:19:30 +10:00
embed_controller_spec.rb UX: display correct replies count in embedded comments view. (#14175) 2021-08-30 10:37:53 +05:30
exceptions_controller_spec.rb DEV: use #frozen_string_literal: true on all spec 2019-04-30 10:27:42 +10:00
export_csv_controller_spec.rb DEV: Switch to new ExportUserArchive job 2020-08-28 11:46:53 -07:00
extra_locales_controller_spec.rb DEV: Correct typos and spelling mistakes (#12812) 2021-05-21 11:43:47 +10:00
finish_installation_controller_spec.rb DEV: use #frozen_string_literal: true on all spec 2019-04-30 10:27:42 +10:00
forums_controller_spec.rb FEATURE: Allow a cluster_name to be configured and used for /srv/status (#12365) 2021-03-15 15:41:59 +11:00
groups_controller_spec.rb FEATURE: Send a 'noindex' header in non-canonical responses (#15026) 2021-11-25 16:58:39 -03:00
hashtags_controller_spec.rb DEV: Correct typos and spelling mistakes (#12812) 2021-05-21 11:43:47 +10:00
inline_onebox_controller_spec.rb DEV: Use response.parsed_body in specs (#9615) 2020-05-07 17:04:12 +02:00
invites_controller_spec.rb FIX: Allow bulk invites to be used with DiscourseConnect (#14862) 2021-11-09 17:43:23 +00:00
list_controller_spec.rb Support parsing array in #param_to_integer_list 2021-11-16 10:27:00 -05:00
metadata_controller_spec.rb PERF: cache all metadata for 60 seconds 2020-07-01 12:58:02 +10:00
notifications_controller_spec.rb FIX: Typo in NotificationsController#index not caught by tests. 2020-07-22 09:22:26 +08:00
offline_controller_spec.rb DEV: use #frozen_string_literal: true on all spec 2019-04-30 10:27:42 +10:00
omniauth_callbacks_controller_spec.rb FEATURE: Allow linking an existing account during external-auth signup 2021-08-10 15:07:40 +01:00
onebox_controller_spec.rb FEATURE: Onebox local categories (#11311) 2020-11-25 10:53:05 +11:00
permalinks_controller_spec.rb DEV: Correct typos and spelling mistakes (#12812) 2021-05-21 11:43:47 +10:00
post_action_users_controller_spec.rb DEV: Cleanup ignored user logic (#11107) 2020-11-03 12:38:54 +00:00
post_actions_controller_spec.rb DEV: Use response.parsed_body in specs (#9615) 2020-05-07 17:04:12 +02:00
post_readers_controller_spec.rb DEV: Correct typos and spelling mistakes (#12812) 2021-05-21 11:43:47 +10:00
posts_controller_spec.rb FEATURE: Display pending posts on user’s page 2021-11-29 10:26:33 +01:00
presence_controller_spec.rb DEV: Various behind-the-scenes improvements to PresenceChannel (#14518) 2021-10-07 15:50:14 +01:00
published_pages_controller_spec.rb FIX: Do not enable published page if secure media enabled (#11131) 2020-11-06 10:33:19 +10:00
push_notification_controller_spec.rb DEV: Prefabrication (test optimization) (#7414) 2019-05-07 13:12:20 +10:00
qunit_controller_spec.rb DEV: Don't try to load admin locales in tests (#14917) 2021-11-13 15:31:55 +01:00
reviewable_claimed_topics_controller_spec.rb FIX: Make reviewable claiming work with deleted topics (#9040) 2020-02-25 15:49:23 +02:00
reviewables_controller_spec.rb FEATURE: Blocking is optional when deleting a user from the review queue. (#13375) 2021-06-15 12:35:45 -03:00
robots_txt_controller_spec.rb FEATURE: Replace Crawl-delay directive with proper rate limiting (#15131) 2021-11-30 12:55:25 +03:00
safe_mode_controller_spec.rb Code review comments. 2021-06-21 11:06:58 +08:00
search_controller_spec.rb FEATURE: Log only topic/post search queries in search log (#14994) 2021-11-18 09:21:12 +08:00
session_controller_spec.rb DEV: Hash tokens stored from email_tokens (#14493) 2021-11-25 09:34:39 +02:00
similar_topics_controller_spec.rb FIX: reindex_search job should work on model with no search data (#11819) 2021-01-25 11:23:36 +01:00
site_controller_spec.rb DEV: Include login_required attribute in basic info endpoint (#14064) 2021-08-17 14:05:51 -04:00
static_controller_spec.rb DEV: apply allow origin response header for CDN requests. (#11893) 2021-01-29 07:44:49 +05:30
steps_controller_spec.rb DEV: use #frozen_string_literal: true on all spec 2019-04-30 10:27:42 +10:00
stylesheets_controller_spec.rb PERF: Eager load Theme associations in Stylesheet Manager. 2021-06-21 11:06:58 +08:00
svg_sprite_controller_spec.rb FIX: Use absolute URL when redirecting SVG sprite path. 2021-06-30 11:25:05 +08:00
tag_groups_controller_spec.rb DEV: Improve tag groups test (#12848) 2021-04-27 14:05:45 +03:00
tags_controller_spec.rb FEATURE: New and Unread messages for user personal messages. (#13603) 2021-08-02 12:41:41 +08:00
theme_javascripts_controller_spec.rb DEV: Correct typos and spelling mistakes (#12812) 2021-05-21 11:43:47 +10:00
topics_controller_spec.rb FEATURE: Add setting to disable notifications for topic tags edits (#14794) 2021-11-02 13:53:21 -04:00
uploads_controller_multisite_spec.rb DEV: Isolate multisite specs (#13634) 2021-07-07 18:57:42 +02:00
uploads_controller_spec.rb DEV: Extract shared external upload routes into controller helper (#14984) 2021-11-18 09:17:23 +10:00
user_actions_controller_spec.rb FIX: restrict other user's notification routes (#14442) 2021-09-29 16:24:28 +04:00
user_api_keys_controller_spec.rb DEV: Move UserApiKey scopes to dedicated table (#10704) 2020-09-29 10:57:48 +01:00
user_avatars_controller_spec.rb DEV: Remove the remaining Travis code (#13255) 2021-06-02 20:29:47 +02:00
user_badges_controller_spec.rb FIX: simplify and improve choosing favorite badges (#13743) 2021-07-16 11:13:00 +08:00
users_controller_spec.rb FEATURE: show recent searches in quick search panel (#15024) 2021-11-25 15:44:15 -05:00
users_email_controller_spec.rb DEV: Hash tokens stored from email_tokens (#14493) 2021-11-25 09:34:39 +02:00
webhooks_controller_spec.rb Revert "Revert "Merge branch 'master' of https://github.com/discourse/discourse"" 2020-05-23 00:56:13 -04:00
wizard_controller_spec.rb DEV: Use response.parsed_body in specs (#9615) 2020-05-07 17:04:12 +02:00