We have been testing block rate limiting, including for admin accounts, for
close to 12 months now on meta.discourse.org.
This has resulted in no degradation of service, even for admin accounts that
request a lot of info from the site.
The default of 200 requests a minute and 50 per 10 seconds is very generous.
It simply protects against very aggressive clients.
This setting can be disabled or tweaked using DISCOURSE_MAX_REQS_PER_IP_MODE
and family.
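For example, in a container's env section (a sketch; the two companion names
are assumptions inferred from the defaults above, so verify them against
discourse_defaults.conf):

```text
DISCOURSE_MAX_REQS_PER_IP_MODE: block          # block | warn | warn+block | none
DISCOURSE_MAX_REQS_PER_IP_PER_MINUTE: 200      # default
DISCOURSE_MAX_REQS_PER_IP_PER_10_SECONDS: 50   # default
```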
The one big downside is when a very large number of users all come from a
single IP. This can happen on sites accessing Discourse from an internal
network where everyone shares the same IP via NAT, or on a misconfigured
Discourse that is unable to resolve users' IP addresses due to a proxy
misconfiguration.
This commit introduces 2 features:
1. DISCOURSE_COMPRESS_ANON_CACHE (true|false, default false): optionally
compress the anon cache body entries in Redis. This can be useful for
high-load sites where Redis lives on a separate server from the web
front ends.
2. DISCOURSE_ANON_CACHE_STORE_THRESHOLD (default 2): only store entries in
Redis once we observe them more than N times. This avoids situations where
a crawler can walk a big pile of topics and store them all in Redis, never
to be used. Our default anon cache time for topics is only 60 seconds; the
anon cache is in place to avoid the "slashdot" effect where a single topic is
hit by hundreds of people in one minute.
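Both settings above can be set from the container env; a minimal sketch
(values are illustrative, not recommendations):

```text
DISCOURSE_COMPRESS_ANON_CACHE: true       # compress anon cache bodies stored in Redis
DISCOURSE_ANON_CACHE_STORE_THRESHOLD: 3   # only store after the 3rd sighting
```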
Under extreme load on large databases, certain regular jobs can take quite
a while to run. We need to ensure we never starve Sidekiq from running
mini scheduler, because without it we are unable to queue jobs such as
heartbeats.
This adds a 1 minute per-IP rate limit to all JS error reporting. Previously
we would only use the global rate limit.
This also introduces DISCOURSE_ENABLE_JS_ERROR_REPORTING; if it is set to
false then no JS error reporting will be allowed on the site.
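For example:

```text
DISCOURSE_ENABLE_JS_ERROR_REPORTING: false   # JS errors will no longer be accepted
```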
message_bus performs a fair amount of work prior to hijacking requests.
This change ensures that if the server is flooded, message_bus will inform
clients to back off for 30 seconds + random(120 seconds).
This back-off is ultra cheap and happens very early in the middleware.
It corrects a situation where a flood to message_bus could cause the app
to become unresponsive.
The message_bus update is here to ensure the message_bus gem properly
respects the Retry-After header and status 429.
Under normal conditions this code should never trigger; to effectively
disable it, raise the value of DISCOURSE_REJECT_MESSAGE_BUS_QUEUE_SECONDS.
The default is to tell message bus to go away if we have been queueing for
100ms or longer.
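Roughly, the early rejection is a plain 429 with a randomized Retry-After
(values below are illustrative; the setting is in seconds, so 100ms is 0.1):

```text
HTTP/1.1 429 Too Many Requests
Retry-After: 87   # 30 + rand(120) seconds

DISCOURSE_REJECT_MESSAGE_BUS_QUEUE_SECONDS: 0.5   # reject only after 500ms of queueing
```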
Added the global setting disable_search_queue_threshold
(DISCOURSE_DISABLE_SEARCH_QUEUE_THRESHOLD), which defaults to 1 second.
This protection ensures that when the application is unable to keep up with
requests it will simply turn off search until it is no longer backed up.
To disable this protection, set the value to 0.
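For example:

```text
DISCOURSE_DISABLE_SEARCH_QUEUE_THRESHOLD: 0   # 0 turns the protection off; default is 1 (second)
```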
This adds support for DISCOURSE_ENABLE_PERFORMANCE_HTTP_HEADERS.
When set to `true` this will turn on performance-related headers:
```text
X-Redis-Calls: 10 # number of redis calls
X-Redis-Time: 1.02 # redis time in seconds
X-Sql-Commands: 102 # number of SQL commands
X-Sql-Time: 1.02 # time spent in SQL in seconds
X-Queue-Time: 1.01 # time the request sat in queue (depends on NGINX)
```
To get queue time, NGINX must provide HTTP_X_REQUEST_START.
We do not recommend enabling this without thinking: it exposes information
about what your page is doing. Usually you would only enable it if you
intend to strip the headers off further down the stream in a proxy.
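A minimal sketch of the NGINX side (the `t=<seconds.milliseconds>` format is
the common convention for this header; verify it matches what the middleware
parses):

```text
# inside the location block that proxies to the Discourse app
proxy_set_header X-Request-Start "t=${msec}";
```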
Adds the `DISCOURSE_MESSAGE_BUS_REDIS_ENABLED` env var; when set to true,
it allows Discourse to connect to a different Redis instance for MessageBus
needs.
When enabled you can configure the same env vars used for Redis, but
prefixed with `MESSAGE_BUS`, eg:
`DISCOURSE_MESSAGE_BUS_REDIS_HOST`
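For example (the host is hypothetical; other vars such as the port follow
the same prefixing pattern):

```text
DISCOURSE_MESSAGE_BUS_REDIS_ENABLED: true
DISCOURSE_MESSAGE_BUS_REDIS_HOST: mb-redis.internal
DISCOURSE_MESSAGE_BUS_REDIS_PORT: 6379
```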
The new `DISCOURSE_MAXMIND_BACKUP_PATH` can be used as a secondary location
for the MaxMind DB. That way a build machine, for example, can cache it on
the host and reuse it between builds.
Also, per 5bfeef77, added proper error raising for download failures from
the dedicated rake task.
This also moves "refresh_maxmind_db_during_precompile_days" to a global
setting; it did not make sense as a site setting.
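For example (the path is hypothetical):

```text
DISCOURSE_MAXMIND_BACKUP_PATH: /shared/maxmind   # host-mounted cache reused between builds
```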
This cleans up the logster configuration a bit, because we no longer have to
check whether we respond_to anything, and it keeps the logster limit properly
documented.
Follow-up to da578e92
- s3_force_path_style was added as a Minio-specific URL scheme, but it has never been well supported in our code base.
- Our new migrate_to_s3 rake task also does not work reliably with path-style URLs.
- Minio has added support for virtual-hosted-style requests, i.e. the same scheme as AWS S3/DO Spaces, so we can rely on that instead of using path-style requests.
- Add a migration to drop s3_force_path_style from the site_settings table.
Some cloud providers (e.g. Google Memorystore) do not support any CLIENT
commands. By setting :id to nil in the Redis config hash we can avoid these
commands.
This adds a special global setting GCE users can enable:
`DISCOURSE_REDIS_SKIP_CLIENT_COMMANDS = true`
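In a container's env section that looks like:

```text
DISCOURSE_REDIS_SKIP_CLIENT_COMMANDS: true   # sets :id to nil so no CLIENT commands are sent
```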
We have a periodic job that regularly rebakes old posts. This is used to
trickle in updates to cooked markdown. The problem is that each rebake can
issue multiple background jobs (post process and pull hotlinked images).
Previously we had no per-cluster limit, so a cluster running hundreds of
sites could flood the Sidekiq queue with rebake-related jobs.
The new system introduces a hard limit of 300 rebakes per 15 minutes across
a cluster to ensure Sidekiq is not dominated by this.
We also reduced `rebake_old_posts_count` to 80, which is a safer default.
If "logged in" is being forced anonymous on certain routes, trigger
the protection for any requests that spend 50ms queueing
This means that ...
1. You need to trip it by having 3 requests take longer than 1 second in 10 second interval
2. Once tripped, if your route is still spending 50m queueuing it will continue to be protected
This means that site will continue to function with almost no delays while it is scaling up to handle the new load
If a particular path is being hit extremely hard by logged-on users,
revert to the anonymous cached view.
This will only come into effect if 3 requests queue for longer than 2 seconds
on a *single* path.
This can happen if a URL is shared with the forum's entire user base and
everyone is logged on.
* `pg_dump` 10.3+ and 9.5.12+ run a
`SELECT pg_catalog.set_config('search_path', '', false)`
which changes the state of the current connection. This is known
to be problematic with PgBouncer, which reuses connections. As such,
we'll always connect directly to PG during the backup/restore process.
This refactors the handling of S3 so it can be specified via GlobalSetting.
This means that in a multisite environment you can configure S3 uploads
without the individual sites knowing the S3 credentials.
It is a critical setting for situations where assets are mirrored to S3.
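A sketch of the resulting global configuration (bucket and credentials are
placeholders; verify the exact names against discourse_defaults.conf):

```text
DISCOURSE_USE_S3: true
DISCOURSE_S3_REGION: us-east-1
DISCOURSE_S3_ACCESS_KEY_ID: <key>
DISCOURSE_S3_SECRET_ACCESS_KEY: <secret>
DISCOURSE_S3_UPLOAD_BUCKET: my-uploads-bucket
```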
Revamped system for managing authentication tokens.
- Every user has 1 token per client (web browser)
- Tokens are rotated every 10 minutes
The new system migrates the old tokens to "legacy" tokens,
so users remain logged on.
It also introduces a weekly job to expire old auth tokens.
Hardcoding the Redis DB and Redis Caching DB to 0 and 2 in
`config/database.yml` makes an unsafe assumption that Discourse is the
only application using that install of redis-server. Instead of forcing
users to undergo yet another form of configuration, allow Discourse
admins a nicer way to configure the Redis databases used.
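A sketch of what this enables, with hypothetical variable names following the
other Redis settings (check your install's defaults for the real ones):

```text
DISCOURSE_REDIS_DB: 1         # main Redis DB index (was hardcoded to 0)
DISCOURSE_REDIS_CACHE_DB: 3   # caching DB index (was hardcoded to 2)
```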
Signed-off-by: David Celis <me@davidcel.is>