DEV: add the notion of a 'crawler identifier' in anonymous_cache

We identify and deny blocked crawlers here in anonymous_cache.

Separating out the crawler identifier here lets plugins override it if they
perform more advanced detection.
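
A minimal sketch of how a plugin might use this hook, assuming it overrides the method with `Module#prepend`. `HelperSketch`, `CrawlerDetectionSketch`, `PluginCrawlerIdentifier`, and the `sketch.detected_crawler` env key are hypothetical stand-ins for illustration; they are not part of this commit or of any real plugin.

```ruby
# Stand-in for CrawlerDetection; the real matching rules are not reproduced here.
module CrawlerDetectionSketch
  BLOCKED = ["badbot"].freeze

  def self.is_blocked_crawler?(identifier)
    BLOCKED.any? { |name| identifier.to_s.downcase.include?(name) }
  end
end

# Stand-in for the helper touched by this commit, reduced to the relevant parts.
class HelperSketch
  def initialize(env)
    @env = env
    @user_agent = env["HTTP_USER_AGENT"]
  end

  # Default identifier: the raw User-Agent string, as in the commit.
  def crawler_identifier
    @user_agent
  end

  def blocked_crawler?
    CrawlerDetectionSketch.is_blocked_crawler?(crawler_identifier)
  end
end

# Hypothetical plugin override: prefer a value that an earlier detection layer
# stored in the Rack env, and fall back to the default identifier otherwise.
module PluginCrawlerIdentifier
  def crawler_identifier
    @env["sketch.detected_crawler"] || super
  end
end

HelperSketch.prepend(PluginCrawlerIdentifier)
```

Because `blocked_crawler?` now goes through `crawler_identifier`, an override like this changes what is checked without touching the rest of the condition.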
Michael Brown 2024-12-05 16:24:21 -05:00 committed by Michael Brown
parent 6e54696003
commit c546111703

@@ -78,13 +78,17 @@ module Middleware
         @request = request || Rack::Request.new(@env)
       end
 
+      def crawler_identifier
+        @user_agent
+      end
+
       def blocked_crawler?
         @request.get? && !@request.xhr? && !@request.path.ends_with?("robots.txt") &&
           !@request.path.ends_with?("srv/status") &&
           @request[Auth::DefaultCurrentUserProvider::API_KEY].nil? &&
           @env[Auth::DefaultCurrentUserProvider::USER_API_KEY].nil? &&
           @env[Auth::DefaultCurrentUserProvider::HEADER_API_KEY].nil? &&
-          CrawlerDetection.is_blocked_crawler?(@user_agent)
+          CrawlerDetection.is_blocked_crawler?(crawler_identifier)
       end
 
       # rubocop:disable Lint/BooleanSymbol