When the server gets overloaded and lots of requests start queuing server
will attempt to shed load by returning 429 errors on background requests.
The client can flag a request as background by setting the header:
`Discourse-Background` to `true`
Out-of-the-box we shed load when the queue time goes above 0.5 seconds.
The only request we shed at the moment is the request to load up a new post
when someone posts to a topic.
We can extend this as we go with a more general pattern on the client.
Previous to this change, rate limiting would "break" the post stream which
would make suggested topics vanish and users would have to scroll the page
to see more posts in the topic.
Server needs this protection for cases where tons of clients are navigated
to a topic and a new post is made. This can lead to a self inflicted denial
of service if enough clients are viewing the topic.
Due to the internal security design of Discourse it is hard for a large
number of clients to share a channel where we would pass the full post body
via the message bus.
It also renames (and deprecates) triggerNewPostInStream to triggerNewPostsInStream
This allows us to load a batch of new posts cleanly, so the controller can
keep track of a backlog
Co-authored-by: Joffrey JAFFEUX <j.jaffeux@gmail.com>
To avoid blocking the sidekiq queue a limit of 10,000 digests per 30 minutes
is introduced.
This acts as a safety measure that makes sure we don't keep pouring oil on
a fire.
On multisites it is recommended to set the number way lower so sites do not
dominate the backlog. A reasonable default for multisites may be 100-500.
This can be controlled with the environment var
DISCOURSE_MAX_DIGESTS_ENQUEUED_PER_30_MINS_PER_SITE
Demon::EmailSync is used in conjunction with the SiteSetting.enable_imap to sync N IMAP mailboxes with specific groups. It is a process started in unicorn.conf, and it spawns N threads (one for each multisite connection) and for each database spans another N threads (one for each configured group).
We want this off by default so the process is not started when it does not need to be (e.g. development, test, certain hosting tiers)
In some restricted setups all JS payloads need tight control.
This setting bans admins from making changes to JS on the site and
requires all themes be whitelisted to be used.
There are edge cases we still need to work through in this mode
hence this is still not supported in production and experimental.
Use an example like this to enable:
`DISCOURSE_WHITELISTED_THEME_REPOS="https://repo.com/repo.git,https://repo.com/repo2.git"`
By default this feature is not enabled and no changes are made.
One exception is that default theme id was missing a security check
this was added for correctness.
This is so that, on a multisite cluster, when we handle a CDN request,
the hostname that is requested corresponds to one of the sites -
specifically the default site.
Some providers don't implement the Expect: 100-continue support,
which results in a mismatch in the object signature.
With this settings, users can disable the header and use such providers.
MaxMind now requires an account with a license key to download files.
Discourse admins can register for such an account at:
https://www.maxmind.com/en/geolite2/signup
License key generation is available in the profile section.
Once registered you can set the license key using `DISCOURSE_MAXMIND_LICENSE_KEY`
This amends it so we unconditionally skip MaxMind DB downloads if no license key exists.
We have tested rate limiting with admin accounts with block rate limiting for
close to 12 months now on meta.discourse.org.
This has resulted in no degradation of services even to admin accounts that
request a lot of info from the site.
The default of 200 requests a minute and 50 per 10 seconds is very generous.
It simply protects against very aggressive clients.
This setting can be disabled or tweaked using:
DISCOURSE_MAX_REQS_PER_IP_MODE and family.
The only big downside here is in cases when a very large number of users tend
to all come from a single IP.
This can be the case on sites accessing Discourse from an internal network
all sharing the same IP via NAT. Or a misconfigured Discourse that is unable
to resolve IP addresses of users due to proxy mis-configuration.
This commit introduces 2 features:
1. DISCOURSE_COMPRESS_ANON_CACHE (true|false, default false): this allows
you to optionally compress the anon cache body entries in Redis, can be
useful for high load sites with Redis that lives on a separate server to
to webs
2. DISCOURSE_ANON_CACHE_STORE_THRESHOLD (default 2), only pop entries into
redis if we observe them more than N times. This avoids situations where
a crawler can walk a big pile of topics and store them all in Redis never
to be used. Our default anon cache time for topics is only 60 seconds. Anon
cache is in place to avoid the "slashdot" effect where a single topic is
hit by 100s of people in one minute.
Under extreme load on large databases certain regular jobs can take quite
a while to run. We need to ensure we never starve a sidekiq from running
mini scheduler, cause without it we are unable to queue stuff such as
heartbeat jobs.
This adds a 1 minute rate limit to all JS error reporting per IP. Previously
we would only use the global rate limit.
This also introduces DISCOURSE_ENABLE_JS_ERROR_REPORTING, if it is set to
false then no JS error reporting will be allowed on the site.
The message_bus performs a fair amount of work prior to hijacking requests
this change ensures that if there is a situation where the server is flooded
message_bus will inform client to back off for 30 seconds + random(120 secs)
This back-off is ultra cheap and happens very early in the middleware.
It corrects a situation where a flood to message bus could cause the app
to become unresponsive
MessageBus update is here to ensure message_bus gem properly respects
Retry-After header and status 429.
Under normal state this code should never trigger, to disable raise the
value of DISCOURSE_REJECT_MESSAGE_BUS_QUEUE_SECONDS, default is to tell
message bus to go away if we are queueing for 100ms or longer
The global setting disable_search_queue_threshold
(DISCOURSE_DISABLE_SEARCH_QUEUE_THRESHOLD) which default to 1 second was
added.
This protection ensures that when the application is unable to keep up with
requests it will simply turn off search till it is not backed up.
To disable this protection set this to 0.
This adds support for DISCOURSE_ENABLE_PERFORMANCE_HTTP_HEADERS
when set to `true` this will turn on performance related headers
```text
X-Redis-Calls: 10 # number of redis calls
X-Redis-Time: 1.02 # redis time in seconds
X-Sql-Commands: 102 # number of SQL commands
X-Sql-Time: 1.02 # duration in SQL in seconds
X-Queue-Time: 1.01 # time the request sat in queue (depends on NGINX)
```
To get queue time NGINX must provide: HTTP_X_REQUEST_START
We do not recommend you enable this without thinking, it exposes information
about what your page is doing, usually you would only enable this if you
intend to strip off the headers further down the stream in a proxy
Adds `DISCOURSE_MESSAGE_BUS_REDIS_ENABLED` env var, that when set
to true, will allow Discourse to connect to a different redis
instance for MessageBus needs.
When enabled you can configure the same env vars user for redis,
but prefixed by `MESSAGE_BUS`, eg:
`DISCOURSE_MESSAGE_BUS_REDIS_HOST`
This new `DISCOURSE_MAXMIND_BACKUP_PATH` can be used a secondary location
for maxmind db. That way a build machine, for example can cache it on the
host and reuse between builds.
Also per 5bfeef77 added proper error raising for download fails from
dedicated rake task
This also moves "refresh_maxmind_db_during_precompile_days" to a global
setting, it did not make sense in a site setting
This cleans up logster configuration a bit cause we no longer have to
check if we respond_to anything and keeps the logster limit properly
documented
Followup on da578e92
- s3_force_path_style was added as a Minio specific url scheme but it has never been well supported in our code base.
- Our new migrate_to_s3 rake task does not work reliably with path style urls too
- Minio has also added support for virtual style requests i.e the same scheme as AWS S3/DO Spaces so we can rely on that instead of using path style requests.
- Add migration to drop s3_force_path_style from the site_settings table
Some cloud providers (Google Memorystore) do not support any CLIENT commands
By setting :id to nil in the redis config hash we can avoid these commands.
This adds a special global setting GCE users can enable:
`DISCOURSE_REDIS_SKIP_CLIENT_COMMANDS = true`
We have the periodical job that regularly will rebake old posts. This is
used to trickle in update to cooked markdown. The problem is that each rebake
can issue multiple background jobs (post process and pull hotlinked images)
Previously we had no per-cluster limit so cluster running 100s of sites could
flood the sidekiq queue with rebake related jobs.
New system introduces a hard limit of 300 rebakes per 15 minutes across a
cluster to ensure the sidekiq job is not dominated by this.
We also reduced `rebake_old_posts_count` to 80, which is a safer default.
If "logged in" is being forced anonymous on certain routes, trigger
the protection for any requests that spend 50ms queueing
This means that ...
1. You need to trip it by having 3 requests take longer than 1 second in 10 second interval
2. Once tripped, if your route is still spending 50m queueuing it will continue to be protected
This means that site will continue to function with almost no delays while it is scaling up to handle the new load
If a particular path is being hit extremely hard by logged on users,
revert to anonymous cached view.
This will only come into effect if 3 requests queue for longer than 2 seconds
on a *single* path.
This can happen if a URL is shared with the entire forum base and everyone
is logged on
* In `pg_dump` 10.3+ and 9.5.12+, in
it does a `SELECT pg_catalog.set_config('search_path', '', false)`
which changes the state of the current connection. This is known
to be problematic with Pgbouncer which reuses connections. As such,
we'll always try to connect directly to PG directly during
the backup/restore process.
This refactors handling of s3 so it can be specified via GlobalSetting
This means that in a multisite environment you can configure s3 uploads
without actual sites knowing credentials in s3
It is a critical setting for situations where assets are mirrored to s3.
Revamped system for managing authentication tokens.
- Every user has 1 token per client (web browser)
- Tokens are rotated every 10 minutes
New system migrates the old tokens to "legacy" tokens,
so users still remain logged on.
Also introduces weekly job to expire old auth tokens.