discourse

mirror of https://github.com/discourse/discourse.git synced 2024-12-23 03:43:47 +08:00

Author	SHA1	Message	Date
Alan Guo Xiang Tan	864b7b6bc8	DEV: Fix flaky test (#30215) The test was flaky and failing with the following errors: ``` Failure/Error: klass .connection .select_raw(relation.arel) do |result, _| result.type_map = DB.type_map result.nfields == 1 ? result.column_values(0) : result.values end NoMethodError: undefined method `select_raw' for nil ./lib/freedom_patches/fast_pluck.rb:60:in `pluck' ./vendor/bundle/ruby/3.3.0/gems/activerecord-7.2.2.1/lib/active_record/relation/calculations.rb:354:in `pick' ./app/models/web_crawler_request.rb:27:in `request_id' ./app/models/web_crawler_request.rb:31:in `rescue in request_id' ./app/models/web_crawler_request.rb:26:in `request_id' ./app/models/web_crawler_request.rb:19:in `write_cache!' ./app/models/concerns/cached_counting.rb:135:in `block (3 levels) in flush_to_db' ./vendor/bundle/ruby/3.3.0/gems/rails_multisite-6.1.0/lib/rails_multisite/connection_management/null_instance.rb:49:in `with_connection' ./vendor/bundle/ruby/3.3.0/gems/rails_multisite-6.1.0/lib/rails_multisite/connection_management.rb:21:in `with_connection' ./app/models/concerns/cached_counting.rb:134:in `block (2 levels) in flush_to_db' ./app/models/concerns/cached_counting.rb:124:in `each' ./app/models/concerns/cached_counting.rb:124:in `block in flush_to_db' ./lib/distributed_mutex.rb:53:in `block in synchronize' ./lib/distributed_mutex.rb:49:in `synchronize' ./lib/distributed_mutex.rb:49:in `synchronize' ./lib/distributed_mutex.rb:34:in `synchronize' ./app/models/concerns/cached_counting.rb:120:in `flush_to_db' ./app/models/concerns/cached_counting.rb:187:in `perform_increment!' ./app/models/web_crawler_request.rb:15:in `increment!' ./lib/middleware/request_tracker.rb:74:in `log_request' ./lib/middleware/request_tracker.rb:409:in `block in log_later' ./lib/scheduler/defer.rb:125:in `block in do_work' ./vendor/bundle/ruby/3.3.0/gems/rails_multisite-6.1.0/lib/rails_multisite/connection_management/null_instance.rb:49:in `with_connection' ./vendor/bundle/ruby/3.3.0/gems/rails_multisite-6.1.0/lib/rails_multisite/connection_management.rb:21:in `with_connection' ./lib/scheduler/defer.rb:119:in `do_work' ./lib/scheduler/defer.rb:105:in `block (2 levels) in start_thread' ``` This was due to running the defer thread in an async manner, which is not representative of the production environment. It also revealed a spot in our code base where writes happen during a GET request, which can cause requests to fail if ActiveRecord is in readonly mode.	2024-12-11 10:12:58 +08:00
Martin Brennan	527f02e99f	FEATURE: Only count topic views for explicit/deferred tracked views (#27533) Followup `2f2da72747` This commit moves topic view tracking from happening every time a Topic is requested, which is susceptible to inflating numbers of views from web crawlers, to our request tracker middleware. In this new location, topic views are only tracked when the following headers are sent: * HTTP_DISCOURSE_TRACK_VIEW - This is sent on every page navigation when clicking around the ember app. We count these as browser page views because we know it comes from the AJAX call in our app. The topic ID is extracted from HTTP_DISCOURSE_TRACK_VIEW_TOPIC_ID * HTTP_DISCOURSE_DEFERRED_TRACK_VIEW - Sent when MessageBus initializes after first loading the page to count the initial page load view. The topic ID is extracted from HTTP_DISCOURSE_DEFERRED_TRACK_VIEW. This will bring topic views more in line with the change we made to page views in the referenced commit and result in more realistic topic view counts.	2024-07-03 10:38:49 +10:00
Arkshine	29460e1422	DEV: Provide a safe agent in check_crawler_limits()	2024-06-11 14:02:46 +02:00
Arkshine	1fffb236b2	FIX: crawler requests exceptions for non UTF-8 user agents with invalid bytes	2024-06-11 14:02:46 +02:00
Osama Sayegh	361992bb74	FIX: Apply crawler rate limits to cached requests (#27174) This commit moves the logic for crawler rate limits out of the application controller and into the request tracker middleware. The reason for this move is to apply rate limits to all crawler requests instead of just the requests that make it to the application controller. Some requests are served early from the middleware stack without reaching the Rails app for performance reasons (e.g. `AnonymousCache`) which results in crawlers getting 200 responses even though they've reached their limits and should be getting 429 responses. Internal topic: t/128810.	2024-05-27 16:26:35 +03:00
David Taylor	ece0150cb7	FIX: Ensure RequestTracker handles bubbled exceptions correctly (#26940) This can happen for various reasons including rate limiting and middleware bugs. This should resolve the warning we're seeing in the logs ``` RequestTracker.get_data failed : NoMethodError : undefined method `[]' for nil:NilClass ```	2024-05-08 16:08:39 +01:00
David Taylor	620f76cec1	DEV: Log original exception/backtrace for RequestTracker errors (#26802)	2024-04-29 09:05:32 +01:00
David Taylor	2f2da72747	FEATURE: Add experimental tracking of 'real browser' pageviews (#26647) Our 'page_view_crawler' / 'page_view_anon' metrics are based purely on the User Agent sent by clients. This means that 'badly behaved' bots which are imitating real user agents are counted towards 'anon' page views. This commit introduces a new method of tracking visitors. When an initial HTML request is made, we assume it is a 'non-browser' request (i.e. a bot). Then, once the JS application has booted, we notify the server to count it as a 'browser' request. This reliance on a JavaScript-capable browser matches up more closely to dedicated analytics systems like Google Analytics. Existing data collection and graphs are unchanged. Data collected via the new technique is available in a new 'experimental' report.	2024-04-25 11:00:01 +01:00
Michael Brown	680f1ff19c	FIX: Add content-type header to rate limiter error It's best to always set a content-type header and one was missing here.	2024-03-26 12:39:42 -04:00
David Taylor	a562214f56	FIX: Update global rate limiter keys/messages to clarify user vs ip (#25264)	2024-01-15 19:54:50 +00:00
David Taylor	59c2407e18	FEATURE: add username header to global-rate-limited responses (#25265) This will make it easier to analyze rate limiting in reverse-proxy logs. To make this possible without a database lookup, we add the username to the encrypted `_t` cookie data.	2024-01-15 19:50:37 +00:00
Alan Guo Xiang Tan	773b22e8d0	DEV: Separate concerns of tracking GC stat from `MethodProfiler` (#22921) Why this change? This is a follow up to `e8f7b62752`. Tracking of GC stats didn't really belong in the `MethodProfiler` class so we want to extract that concern into its own class. As part of this PR, the `track_gc_stat_per_request` site setting has also been renamed to `instrument_gc_stat_per_request`.	2023-08-02 10:46:37 +08:00
David Taylor	6417173082	DEV: Apply syntax_tree formatting to `lib/*`	2023-01-09 12:10:19 +00:00
David Taylor	66e8a35b4d	DEV: Include message-bus request type in HTTP request data (#19762)	2023-01-06 11:26:18 +00:00
Bianca Nenciu	3048d3d07d	FEATURE: Track API and user API requests (#19186) Adds stats for API and user API requests similar to regular page views. This comes with a new report to visualize API requests per day like the consolidated page views one.	2022-11-29 13:07:42 +02:00
David Taylor	cd6b7459a7	DEV: Improve background-request information in request_tracker (#16037) This will allow consumers (e.g. the discourse-prometheus plugin) to separate topic-timings and message-bus requests. It also fixes the is_background boolean for subfolder sites.	2022-02-23 12:45:42 +00:00
Osama Sayegh	b86127ad12	FEATURE: Apply rate limits per user instead of IP for trusted users (#14706) Currently, Discourse rate limits all incoming requests by the IP address they originate from regardless of the user making the request. This can be frustrating if there are multiple users using Discourse simultaneously while sharing the same IP address (e.g. employees in an office). This commit implements a new feature to make Discourse apply rate limits by user id rather than IP address for users at or higher than the configured trust level (1 is the default). For example, let's say a Discourse instance is configured to allow 200 requests per minute per IP address, and we have 10 users at trust level 4 using Discourse simultaneously from the same IP address. Before this feature, the 10 users could only make a total of 200 requests per minute before they got rate limited. But with the new feature, each user is allowed to make 200 requests per minute because the rate limits are applied on user id rather than the IP address. The minimum trust level for applying user-id-based rate limits can be configured by the `skip_per_ip_rate_limit_trust_level` global setting. The default is 1, but it can be changed by either adding the `DISCOURSE_SKIP_PER_IP_RATE_LIMIT_TRUST_LEVEL` environment variable with the desired value to your `app.yml`, or changing the setting's value in the `discourse.conf` file. Requests made with API keys are still rate limited by IP address and the relevant global settings that control API keys rate limits. Before this commit, Discourse's auth cookie (`_t`) was simply a 32-character string that Discourse used to look up the current user from the database and the cookie contained no additional information about the user. However, we had to change the cookie content in this commit so we could identify the user from the cookie without making a database query before the rate limits logic and avoid introducing a bottleneck on busy sites. Besides the 32-character auth token, the cookie now includes the user id, trust level and the cookie's generation date, and we encrypt/sign the cookie to prevent tampering. Internal ticket number: t54739.	2021-11-17 23:27:30 +03:00
Rafael dos Santos Silva	b136375582	FEATURE: Rate limit exceptions via ENV (#14033) Allow admins to configure exceptions to our Rails rate limiter. Configuration happens in the environment variables, and work with both IPs and CIDR blocks. Example: ``` env: DISCOURSE_MAX_REQS_PER_IP_EXCEPTIONS: >- 14.15.16.32/27 216.148.1.2 ```	2021-08-13 12:00:23 -03:00
Bianca Nenciu	765ba1ab2d	FEATURE: Ignore anonymous page views on private sites (#12800) For sites with login_required set to true, counting anonymous pageviews is confusing. Requests to /login and other pages would make it look like anonymous users have access to site's content.	2021-04-26 14:19:47 +03:00
Jarek Radosz	6ff888bd2c	DEV: Retry-after header values should be strings (#12475) Fixes `Rack::Lint::LintError: a header value must be a String, but the value of 'Retry-After' is a Integer`. (see: `14a236b4f0/lib/rack/lint.rb (L676)`) I found it when I got flooded by those warnings a while back in a test-related accident 😉 (ember CLI tests were hitting a local rails server at a fast rate)	2021-03-23 20:32:36 +01:00
Martin Brennan	6eb0d0c38d	SECURITY: Fix is_private_ip for RateLimiter to cover all cases (#12464) The regular expression to detect private IP addresses did not always detect them successfully. Changed to use ruby's in-built IPAddr.new(ip_address).private? method instead which does the same thing but covers all cases.	2021-03-22 13:56:32 +10:00
Dan Ungureanu	1f2f84a6df	FIX: Add Retry-Header to rate limited responses (#11736) It returned a 429 error code with a 'Retry-After' header if a RateLimiter::LimitExceeded was raised and unhandled, but the header was missing if the request was limited in the 'RequestTracker' middleware.	2021-01-19 11:35:46 +02:00
Tobias Eigen	0a0fd6eace	DEV: fixed capitalization in rate limit message (#11193)	2020-11-11 12:35:03 +11:00
Sam	2686d14b9a	PERF: introduce aggressive rate limiting for anonymous (#11129) Previous to this change our anonymous rate limits acted as a throttle. New implementation means we now also consider rate limited requests towards the limit. This means that if an anonymous user is hammering the server it will not be able to get any requests through until it subsides with traffic.	2020-11-05 16:36:17 +11:00
Aman Gupta Karmani	8a86705e51	FIX: handle heroku style HTTP_X_REQUEST_START (#10087)	2020-06-19 10:17:24 -04:00
Daniel Waterworth	bca126f3f5	REFACTOR: Move the multisite middleware to the front Both request tracking and message bus rely on multisite before the middleware has run which is not ideal. Follow-up-to: `ca1208a636`	2020-04-02 16:44:44 +01:00
Daniel Waterworth	ca1208a636	Revert "REFACTOR: Move the multisite middleware to the front" Looks like this is causing problems. Follow-up-to: `a91843f0dc`	2020-04-02 15:20:28 +01:00
Daniel Waterworth	a91843f0dc	REFACTOR: Move the multisite middleware to the front Both request tracking and message bus rely on multisite before the middleware has run which is not ideal.	2020-04-02 10:15:38 +01:00
Sam Saffron	494fe335d3	DEV: allow handling crawler reqs with no user agent Followup to `e440ec25`: we treat requests with no user agent as crawler reqs.	2019-12-09 18:40:10 +11:00
Sam Saffron	e440ec2519	FIX: crawler requests not tracked for non UTF-8 user agents Non UTF-8 user_agent requests were bypassing logging due to PG always wanting UTF-8 strings. This adds some conversion to ensure we are always dealing with UTF-8	2019-12-09 17:43:51 +11:00
Daniel Waterworth	563253e9ed	FIX: Fix options given to per-minute rate limiter Previously the options for the per-minute and per-10-second rate limiters were the same.	2019-09-20 10:48:59 +01:00
Sam Saffron	08743e8ac0	FEATURE: anon cache reports data to loggers This allows custom plugins such as prometheus exporter to log how many requests are stored in the anon cache vs used by the anon cache. This metric allows us to fine tune cache behaviors	2019-09-02 18:45:35 +10:00
Sam Saffron	1f47ed1ea3	PERF: message_bus will be deferred by server when flooded The message_bus performs a fair amount of work prior to hijacking requests. This change ensures that if there is a situation where the server is flooded, message_bus will inform the client to back off for 30 seconds + random(120 secs). This back-off is ultra cheap and happens very early in the middleware. It corrects a situation where a flood to message bus could cause the app to become unresponsive. The MessageBus update is here to ensure the message_bus gem properly respects the Retry-After header and status 429. Under normal conditions this code should never trigger. To disable it, raise the value of DISCOURSE_REJECT_MESSAGE_BUS_QUEUE_SECONDS; the default is to tell message bus to go away if we are queueing for 100ms or longer	2019-08-09 17:48:01 +10:00
Sam Saffron	62141b6316	FEATURE: enable_performance_http_headers for performance diagnostics This adds support for DISCOURSE_ENABLE_PERFORMANCE_HTTP_HEADERS. When set to `true`, this will turn on performance related headers ```text X-Redis-Calls: 10 # number of redis calls X-Redis-Time: 1.02 # redis time in seconds X-Sql-Commands: 102 # number of SQL commands X-Sql-Time: 1.02 # duration in SQL in seconds X-Queue-Time: 1.01 # time the request sat in queue (depends on NGINX) ``` To get queue time NGINX must provide: HTTP_X_REQUEST_START We do not recommend you enable this without thinking: it exposes information about what your page is doing. Usually you would only enable this if you intend to strip off the headers further down the stream in a proxy	2019-06-05 16:08:11 +10:00
David Taylor	8963f1af30	FEATURE: Optional detailed performance logging for Sidekiq jobs (#7091) By default, this does nothing. Two environment variables are available: - `DISCOURSE_LOG_SIDEKIQ` Set to `"1"` to enable logging. This will log all completed jobs to `log/rails/sidekiq.log`, along with various db/redis/network statistics. This is useful to track down poorly performing jobs. - `DISCOURSE_LOG_SIDEKIQ_INTERVAL` (seconds) Check running jobs periodically, and log their current duration. They will appear in the logs with `status:pending`. This is useful to track down jobs which take a long time, then crash sidekiq before completing.	2019-03-05 11:19:11 +00:00
Guo Xiang Tan	c732ae9ca9	FIX: Don't update `User#last_seen_at` when PG is in readonly.	2019-01-21 13:29:29 +08:00
Guo Xiang Tan	e2a20d90fe	FIX: Don't log request when Discourse is in readonly due to PG.	2019-01-21 11:04:32 +08:00
Sam	a19170a4c2	DEV: avoid require_dependency for some libs This avoids require dependency on method_profiler and anon cache. It means that if there is any change to these files the reloader will not pick it up. Previously the reloader was picking up the anon cache twice causing it to double load on boot. This caused warnings. Long term my plan is to give up on require dependency and instead use: https://github.com/Shopify/autoload_reloader	2018-12-31 10:53:30 +11:00
Sam	955cdad649	FIX: exec_params needs instrumentation the method no longer routes to "exec" or "async_exec" in latest PG so we need to explicitly intercept	2018-12-10 14:28:10 +11:00
Sam	168ffd8384	FEATURE: group warnings about IP level rate limiting	2018-08-13 14:38:20 +10:00
Sam	7f98ed69cd	FIX: move crawler blocking to app controller We need access to site settings in multisite; we do not have access yet if we attempt to get them in request tracker middleware	2018-07-04 10:30:50 +10:00
Neil Lalonde	e8a6323bea	remove crawler blocking until multisite support	2018-07-03 17:54:45 -04:00
Sam	4810ce3607	correct regression	2018-04-18 21:04:08 +10:00
Neil Lalonde	b87fa6d749	FIX: blacklisted crawlers could get through by omitting the accept header	2018-04-17 12:39:30 -04:00
Sam	9980f18d86	FEATURE: track request queueing as early as possible	2018-04-17 18:06:17 +10:00
Neil Lalonde	4d12ff2e8a	when writing cache, remove elements from the user agents list. also return a message and content type when blocking a crawler.	2018-03-27 13:44:14 -04:00
Neil Lalonde	a84bb81ab5	only applies to get html requests	2018-03-22 17:57:44 -04:00
Neil Lalonde	ced7e9a691	FEATURE: control which web crawlers can access using a whitelist or blacklist	2018-03-22 15:41:02 -04:00
Sam	0134e41286	FEATURE: detect when client thinks user is logged on but is not This cleans up an error condition where UI thinks a user is logged on but the user is not. If this happens user will be prompted to refresh.	2018-03-06 16:49:31 +11:00
Sam	f0d5f83424	FEATURE: limit assets less than non-asset paths By default assets can be requested up to 200 times per 10 seconds from the app; this includes CSS and avatars	2018-03-06 15:20:39 +11:00


79 Commits
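Commit b86127ad12 moves trusted users from per-IP to per-user rate limits. The sketch below illustrates only the choice of rate-limit bucket, assuming the user id and trust level have already been decoded from the signed `_t` cookie so no database query is needed. `rate_limit_key` and its parameters are hypothetical names, not Discourse's API; `min_trust_level` stands in for the `skip_per_ip_rate_limit_trust_level` global setting (default 1).

```ruby
# Hypothetical helper: picks the bucket a request is rate limited under.
# Assumes user_id/trust_level were already read from the signed auth cookie.
def rate_limit_key(ip:, user_id: nil, trust_level: nil, api_request: false, min_trust_level: 1)
  # Requests made with API keys stay limited per IP address.
  return "ip:#{ip}" if api_request

  # Users at or above the configured trust level get their own bucket.
  if user_id && trust_level && trust_level >= min_trust_level
    "user:#{user_id}"
  else
    "ip:#{ip}"
  end
end

# Ten trust level 4 users behind one office IP now get ten separate buckets.
office_keys = (1..10).map { |id| rate_limit_key(ip: "203.0.113.7", user_id: id, trust_level: 4) }
puts office_keys.uniq.size # prints 10
```

With per-IP keys those ten users would share one budget; with per-user keys each gets the full allowance, which is the behaviour the commit message describes.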
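Commit b136375582 lets admins exempt single IPs and CIDR blocks from the per-IP limiter through `DISCOURSE_MAX_REQS_PER_IP_EXCEPTIONS`. A minimal sketch of the matching, using only Ruby's standard `ipaddr` library, follows. It assumes the value arrives as a whitespace-separated list, which is what the YAML `>-` folded scalar in the commit's example produces; `parse_ip_exceptions` and `rate_limit_exempt?` are hypothetical helpers, not Discourse's code.

```ruby
require "ipaddr"

# Hypothetical parser: "14.15.16.32/27 216.148.1.2" -> [IPAddr, IPAddr].
# A bare address becomes a single-host range (/32 for IPv4).
def parse_ip_exceptions(value)
  value.to_s.split.map { |entry| IPAddr.new(entry) }
end

# True if the client IP falls inside any configured exception.
def rate_limit_exempt?(ip, exceptions)
  addr = IPAddr.new(ip)
  exceptions.any? { |range| range.include?(addr) }
rescue IPAddr::InvalidAddressError
  # Unparseable client addresses are never exempt.
  false
end

# In a real deployment the string would come from ENV; a literal keeps this deterministic.
exceptions = parse_ip_exceptions("14.15.16.32/27 216.148.1.2")
p rate_limit_exempt?("14.15.16.40", exceptions) # prints true
p rate_limit_exempt?("14.15.16.64", exceptions) # prints false
```

A /27 spans 32 addresses, so `14.15.16.32/27` covers `14.15.16.32` through `14.15.16.63`.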
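Several commits above shape 429 responses: 1f2f84a6df adds `Retry-After` when the middleware itself rate limits, 6ff888bd2c makes its value a String to satisfy `Rack::Lint`, 680f1ff19c always sets a content type, and 1f47ed1ea3 tells a flooded message_bus client to back off for 30 seconds plus up to 120 random seconds once requests have queued for 100ms or longer. The helpers below are a hypothetical sketch of those rules as plain Rack response triplets; the names, body text and `threshold` parameter are assumptions. Header names are lowercase because Rack 3's Lint rejects uppercase header names, while HTTP itself treats them case-insensitively.

```ruby
# Hypothetical 429 builder returning a Rack [status, headers, body] triplet.
def too_many_requests(retry_after_seconds, body = "Too many requests. Please retry later.")
  headers = {
    # Rack::Lint requires header values to be Strings, not Integers.
    "retry-after" => retry_after_seconds.to_s,
    # Always declare a content type for the error body.
    "content-type" => "text/plain; charset=utf-8",
  }
  [429, headers, [body]]
end

# Hypothetical message_bus deferral: nil means "serve normally".
# threshold mirrors the 100ms default described for
# DISCOURSE_REJECT_MESSAGE_BUS_QUEUE_SECONDS.
def message_bus_deferral(queue_seconds, threshold: 0.1, rng: Random.new)
  return nil if queue_seconds < threshold

  # 30 s plus a random 0..119 s, so flooded clients do not all return at once.
  too_many_requests(30 + rng.rand(120))
end

status, headers, _body = too_many_requests(10)
p [status, headers["retry-after"]] # prints [429, "10"]
```

Randomising the back-off spreads reconnects out over time instead of letting every deferred client retry in the same second.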
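Commit 6eb0d0c38d replaces a hand-written regular expression with `IPAddr#private?` from Ruby's standard library (available since Ruby 2.5). A minimal sketch follows; `private_ip?` is a hypothetical wrapper that additionally treats unparseable input as non-private. `private?` covers the IPv4 ranges 10/8, 172.16/12 and 192.168/16 plus IPv6 unique local addresses (fc00::/7), but not loopback, which `IPAddr#loopback?` handles separately.

```ruby
require "ipaddr"

# Hypothetical wrapper around the stdlib check named in 6eb0d0c38d.
def private_ip?(ip)
  IPAddr.new(ip).private?
rescue IPAddr::InvalidAddressError
  # Anything that is not a valid IP address is not treated as private.
  false
end

p private_ip?("172.31.255.255") # prints true  (top of 172.16.0.0/12)
p private_ip?("172.32.0.1")     # prints false (just outside it)
```

Range edges such as `172.31.255.255` are exactly where a regex over dotted-decimal text is easy to get wrong, while the bitmask comparison inside `IPAddr` is not.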
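Commit 2686d14b9a changes the anonymous limiter from a throttle into a stricter scheme in which rejected requests also count toward the limit, so a client that keeps hammering never gets through until it slows down. The in-memory sliding window below is a hypothetical illustration of that difference only (`SlidingWindowLimiter` and its `mode` option are invented names); a production limiter would keep this state in a shared store such as Redis rather than in process memory.

```ruby
# Hypothetical sliding-window limiter contrasting the two behaviours:
#   :throttle   - only accepted requests count toward the limit
#   :aggressive - every attempt counts, including rejected ones
class SlidingWindowLimiter
  def initialize(limit:, window:, mode: :throttle)
    @limit = limit
    @window = window
    @mode = mode
    @hits = []
  end

  # Returns true if a request arriving at time `now` (seconds) is allowed.
  def allow?(now)
    # Forget attempts that have slid out of the window.
    @hits.reject! { |t| t <= now - @window }
    allowed = @hits.size < @limit
    # Aggressive mode records the attempt even when it is rejected.
    @hits << now if allowed || @mode == :aggressive
    allowed
  end
end

times = [0, 1, 5, 10.5]
throttle = SlidingWindowLimiter.new(limit: 2, window: 10)
aggressive = SlidingWindowLimiter.new(limit: 2, window: 10, mode: :aggressive)
p times.map { |t| throttle.allow?(t) }   # prints [true, true, false, true]
p times.map { |t| aggressive.allow?(t) } # prints [true, true, false, false]
```

At t=10.5 the throttle has forgotten its first hit and lets the request in, whereas the aggressive limiter still remembers the rejected attempt at t=5 and keeps refusing.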
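Queue time (9980f18d86, 62141b6316) is derived from the `HTTP_X_REQUEST_START` header set by the front-end proxy, and 8a86705e51 adds support for the Heroku style of that header. The sketch below assumes the two common formats: NGINX configured to send `t=${msec}` (seconds since the epoch with a fractional part) and Heroku's bare integer of milliseconds since the epoch. `request_start_seconds` and `queue_time` are hypothetical names, not Discourse's methods.

```ruby
# Hypothetical parser for X-Request-Start, returning seconds since the epoch.
#   NGINX style:  "t=1700000000.250"  (seconds, fractional)
#   Heroku style: "1700000000250"     (integer milliseconds)
def request_start_seconds(header)
  return nil if header.nil? || header.empty?

  if header.start_with?("t=")
    header.delete_prefix("t=").to_f
  else
    header.to_f / 1000.0
  end
end

# Seconds the request waited before reaching the app, or nil if unknown.
def queue_time(env, now: Time.now.to_f)
  start = request_start_seconds(env["HTTP_X_REQUEST_START"])
  return nil if start.nil?

  # Clamp at zero: proxy and app clocks can disagree slightly.
  [now - start, 0.0].max
end

p queue_time({ "HTTP_X_REQUEST_START" => "t=1700000000.25" }, now: 1700000000.5) # prints 0.25
```

Passing `now:` explicitly keeps the function deterministic for tests; in a middleware it would default to the current time.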
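Commits e440ec2519 and 1fffb236b2 deal with user agents that are not valid UTF-8, which PG rejects. One way to normalize such a header uses only core `String` methods (`force_encoding`, `valid_encoding?`, `scrub`). `normalize_user_agent` is a hypothetical helper, and replacing bad bytes with `?` is an assumption rather than necessarily what Discourse does.

```ruby
# Hypothetical normalizer: always returns a valid UTF-8 string.
def normalize_user_agent(raw)
  # Missing user agents become "" here; per 494fe335d3 the caller may
  # then classify such requests as crawler requests.
  return "" if raw.nil?

  # Rack typically hands over header bytes as binary; reinterpret them as UTF-8.
  ua = raw.dup.force_encoding(Encoding::UTF_8)
  # Replace each invalid byte sequence so the value is safe to store.
  ua.valid_encoding? ? ua : ua.scrub("?")
end

p normalize_user_agent("Bot\xFF/1.0".b) # prints "Bot?/1.0"
```

`dup` keeps the caller's string untouched (and works on frozen input), since `force_encoding` mutates its receiver.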