This commit moves the logic for crawler rate limits out of the application controller and into the request tracker middleware. The reason for this move is to apply rate limits to all crawler requests instead of just the requests that make it to the application controller. Some requests are served early from the middleware stack without reaching the Rails app for performance reasons (e.g. `AnonymousCache`) which results in crawlers getting 200 responses even though they've reached their limits and should be getting 429 responses.
Internal topic: t/128810.
This commit removes the 'experimental_preconnect_link_header' site setting, and the 'preload_link_header' site setting, and introduces two new global settings: early_hint_header_mode and early_hint_header_name.
We don't actually send 103 Early Hint responses from Discourse. However, upstream proxies can be configured to cache a response header from the app and use that to send an Early Hint response to future clients.
- `early_hint_header_mode` specifies the mode for the early hint header. Can be nil (disabled), "preconnect" (lists just CDN domains) or "preload" (lists all assets).
- `early_hint_header_name` specifies which header name to use for the early hint. Defaults to "Link", but can be changed to support different proxy mechanisms.
Why this change?
In https://web.dev/articles/preconnect-and-dns-prefetch, it describes
how hinting to the browser to preconnect to domains which we will
eventually use the connection for can help improve the time it takes to
load a page.
We are putting this behind an experimental flag so that we can test and
profile this in a production environment.
What does this change introduce?
Introduce a hidden experimental `experimental_preconnect_link_header`
site setting which when enabled will add the `preconnect` and
`dns-prefetch` resource hints to the response headers for full page load
requests.
With the new admin sidebar restructure, we have a link to "Installed plugins". We would like to ensure that when the admin is searching for a plugin name like "akismet" or "automation" this link will be visible. Also when entering the plugins page, related plugins should be highlighted.
The strict-dynamic CSP directive is supported in all our target browsers, and makes for a much simpler configuration. Instead of allowlisting paths, we use a per-request nonce to authorize `<script>` tags, and then those scripts are allowed to load additional scripts (or add additional inline scripts) without restriction.
This becomes especially useful when admins want to add external scripts like Google Tag Manager, or advertising scripts, which then go on to load a ton of other scripts.
All script tags introduced via themes will automatically have the nonce attribute applied, so it should be zero-effort for theme developers. Plugins *may* need some changes if they are inserting their own script tags.
This commit introduces a strict-dynamic-based CSP behind an experimental `content_security_policy_strict_dynamic` site setting.
When we show the links to installed plugins in the admin
sidebar (for plugins that have custom admin routes) we were
previously only doing this if you opened /admin, not if you
navigated there from the main forum. We should just always
preload this data if the user is admin.
This commit also changes `admin_sidebar_enabled_groups` to
not be sent to the client as part of ongoing efforts to
not check groups on the client, since not all a user's groups
may be serialized.
We have all these calls to Group.refresh_automatic_groups! littered throughout the tests. Including tests that are seemingly unrelated to groups. This is because automatic group memberships aren't fabricated when making a vanilla user. There are two places where you'd want to use this:
You have fabricated a user that needs a certain trust level (which is now based on group membership.)
You need the system user to have a certain trust level.
In the first case, we can pass refresh_auto_groups: true to the fabricator instead. This is a more lightweight operation that only considers a single user, instead of all users in all groups.
The second case is no longer a thing after #25400.
Why this change?
This is part of our efforts to harden the security of the Discourse
application. Setting the `CROSS_ORIGIN_OPENER_POLICY` header to `same-origin-allow-popups`
by default makes the application safer. We have opted to make this a
hidden site setting because most admins will never have to care about
this setting so we're are opting not to show it. If they do have to
change it, they can still do so by setting the
`DISCOURSE_CROSS_ORIGIN_OPENER_POLICY` env.
Previously, the app HTML served by the Ember-CLI proxy was generated based on a 'bootstrap json' payload generated by Rails. This inevitably leads to differences between the Rails HTML and the Ember-CLI HTML.
This commit overhauls our proxying strategy. Now, we totally ignore the ember-cli `index.html` file. Instead, we take the full HTML from Rails and surgically replace script URLs based on a `data-discourse-entrypoint` attribute. This should be faster (only one request to Rails), more robust, and less confusing for developers.
The most common thing that we do with fab! is:
fab!(:thing) { Fabricate(:thing) }
This commit adds a shorthand for this which is just simply:
fab!(:thing)
i.e. If you omit the block, then, by default, you'll get a `Fabricate`d object using the fabricator of the same name.
No plugins or themes rely on anonymous_posting_min_trust_level so we
can just switch straight over to anonymous_posting_allowed_groups
This also adds an AUTO_GROUPS const which can be imported in JS
tests which is analogous to the one defined in group.rb. This can be used
to set the current user's groups where JS tests call for checking these groups
against site settings.
Finally a AtLeastOneGroupValidator validator is added for group_list site
settings which ensures that at least one group is always selected, since if
you want to allow all users to use a feature in this way you can just use
the everyone group.
Why this change?
As part of our ongoing efforts to security harden the Discourse
application, we are adding the `cross_origin_opener_policy_header` site setting
which allows the `Cross-Origin-Opener-Policy` response header to be set on requests
that preloads the Discourse application. In more technical terms, only
GET requests that are not json or xhr will have the response header set.
The `cross_origin_opener_policy_header` site setting is hidden for now
for testing purposes and will either be released as a public site
setting or be remove if we decide to be opinionated and ship a default
for the `Cross-Origin-Opener-Policy` response header.
]When changing fonts in the `/wizard/steps/styling` step of
the wizard, users would not see the font loaded straight away,
having to switch to another one then back to the original to
see the result. This is because we are using canvas to render
the style preview and this fails with a Chrome-based intervention
when font loading is taking too long:
> [Intervention] Slow network is detected. See
https://www.chromestatus.com/feature/5636954674692096 for more details.
Fallback font will be used while loading:
https://sea2.discourse-cdn.com/business7/fonts/Roboto-Bold.ttf?v=0.0.9
We can get around this by manually loading the fonts selected using
the FontFace JS API when the user selects them and before rerendering
the canvas. This just requires preloading more information about the
fonts if the user is admin so the wizard can query this data.
This method is a huge footgun in production, since it calls
the Redis KEYS command. From the Redis documentation at
https://redis.io/commands/keys/:
> Warning: consider KEYS as a command that should only be used in
production environments with extreme care. It may ruin performance when
it is executed against large databases. This command is intended for
debugging and special operations, such as changing your keyspace layout.
Don't use KEYS in your regular application code.
Since we were only using `delete_prefixed` in specs (now that we
removed the usage in production in 24ec06ff85)
we can remove this and instead rely on `use_redis_snapshotting` on the
particular tests that need this kind of clearing functionality.
* Revert "Revert "FEATURE: Preload resources via link header (#18475)" (#18511)"
This reverts commit 95a57f7e0c.
* put behind feature flag
* env -> global setting
* declare global setting
* forgot one spot
Experiment moving from preload tags in the document head to preload information the the response headers.
While this is a minor improvement in most browsers (headers are parsed before the response body), this allows smart proxies like Cloudflare to "learn" from those headers and build HTTP 103 Early Hints for subsequent requests to the same URI, which will allow the user agent to download and parse our JS/CSS while we are waiting for the server to generate and stream the HTML response.
Co-authored-by: Penar Musaraj <pmusaraj@gmail.com>
This lets us use all our normal JS tooling like prettier, esline and babel on the splash screen JS. At runtime the JS file is read and inlined into the HTML. This commit also switches us to use a CSP hash rather than a nonce for the splash screen.
We previously relied on CSS animation-delay for the splash. This means that we can get inconsistent results based on device/network conditions.
This PR moves us to a more consistent timing based on {request time + 2 seconds}
Internal topic: /t/65378/65
This PR introduces a new hidden site setting that allows admins to display a splash screen while site assets load.
The splash screen can be enabled via the `splash_screen` hidden site setting.
This is what the splash screen currently looks like
5ceb72f085.mp4
Once site assets load, the splash screen is automatically removed.
To control the loading text that shows in the splash screen, you can change the preloader_text translation string in admin > customize > text
`TestLogger` was responsible for some flaky specs runs:
```
Error during failsafe response: undefined method `debug' for #<TestLogger:0x0000556c4b942cf0 @warnings=1>
Did you mean? debugger
```
This commit also cleans up other uses of `FakeLogger`
It's very easy to forget to add `require 'rails_helper'` at the top of every core/plugin spec file, and omissions can cause some very confusing/sporadic errors.
By setting this flag in `.rspec`, we can remove the need for `require 'rails_helper'` entirely.
When redirecting to login, we store a destination_url cookie, which the user is then redirected to after login. We never want the user to be redirected to a JSON URL. Instead, we should return a 403 in these situations.
This should also be much less confusing for API consumers - a 403 is a better representation than a 302.
We have a couple of site setting, `slow_down_crawler_user_agents` and `slow_down_crawler_rate`, that are meant to allow site owners to signal to specific crawlers that they're crawling the site too aggressively and that they should slow down.
When a crawler is added to the `slow_down_crawler_user_agents` setting, Discourse currently adds a `Crawl-delay` directive for that crawler in `/robots.txt`. Unfortunately, many crawlers don't support the `Crawl-delay` directive in `/robots.txt` which leaves the site owners no options if a crawler is crawling the site too aggressively.
This PR replaces the `Crawl-delay` directive with proper rate limiting for crawlers added to the `slow_down_crawler_user_agents` list. On every request made by a non-logged in user, Discourse will check the User Agent string and if it contains one of the values of the `slow_down_crawler_user_agents` list, Discourse will only allow 1 request every N seconds for that User Agent (N is the value of the `slow_down_crawler_rate` setting) and the rest of requests made within the same interval will get a 429 response.
The `slow_down_crawler_user_agents` setting becomes quite dangerous with this PR since it could rate limit lots if not all of anonymous traffic if the setting is not used appropriately. So to protect against this scenario, we've added a couple of new validations to the setting when it's changed:
1) each value added to setting must 3 characters or longer
2) each value cannot be a substring of tokens found in popular browser User Agent. The current list of prohibited values is: apple, windows, linux, ubuntu, gecko, firefox, chrome, safari, applewebkit, webkit, mozilla, macintosh, khtml, intel, osx, os x, iphone, ipad and mac.
* FEATURE: Optionally send a 'noindex' header in non-canonical responses
This will be used in a SEO experiment.
Co-authored-by: David Taylor <david@taylorhq.com>
Currently, Discourse rate limits all incoming requests by the IP address they
originate from regardless of the user making the request. This can be
frustrating if there are multiple users using Discourse simultaneously while
sharing the same IP address (e.g. employees in an office).
This commit implements a new feature to make Discourse apply rate limits by
user id rather than IP address for users at or higher than the configured trust
level (1 is the default).
For example, let's say a Discourse instance is configured to allow 200 requests
per minute per IP address, and we have 10 users at trust level 4 using
Discourse simultaneously from the same IP address. Before this feature, the 10
users could only make a total of 200 requests per minute before they got rate
limited. But with the new feature, each user is allowed to make 200 requests
per minute because the rate limits are applied on user id rather than the IP
address.
The minimum trust level for applying user-id-based rate limits can be
configured by the `skip_per_ip_rate_limit_trust_level` global setting. The
default is 1, but it can be changed by either adding the
`DISCOURSE_SKIP_PER_IP_RATE_LIMIT_TRUST_LEVEL` environment variable with the
desired value to your `app.yml`, or changing the setting's value in the
`discourse.conf` file.
Requests made with API keys are still rate limited by IP address and the
relevant global settings that control API keys rate limits.
Before this commit, Discourse's auth cookie (`_t`) was simply a 32 characters
string that Discourse used to lookup the current user from the database and the
cookie contained no additional information about the user. However, we had to
change the cookie content in this commit so we could identify the user from the
cookie without making a database query before the rate limits logic and avoid
introducing a bottleneck on busy sites.
Besides the 32 characters auth token, the cookie now includes the user id,
trust level and the cookie's generation date, and we encrypt/sign the cookie to
prevent tampering.
Internal ticket number: t54739.
By default, Rails only includes the Vary:Accept header in responses when the Accept: header is included in the request. This means that proxies/browsers may cache a response to a request with a missing Accept header, and then later serve that cached version for a request which **does** supply the Accept header. This can lead to some very unexpected behavior in browsers.
This commit adds the Vary:Accept header for all requests, even if the Accept header is not present in the request. If a format parameter (e.g. `.json` suffix) is included in the path, then the Accept header is still omitted. (The format parameter takes precedence over any Accept: header, so the response is no longer varies based on the Accept header)
Before this change, calling `StyleSheet::Manager.stylesheet_details`
for the first time resulted in multiple queries to the database. This is
because the code was modelled in a way where each `Theme` was loaded
from the database one at a time.
This PR restructures the code such that it allows us to load all the
theme records in a single query. It also allows us to eager load the
required associations upfront. In order to achieve this, I removed the
support of loading multiple themes per request. It was initially added
to support user selectable theme components but the feature was never
completed and abandoned because it wasn't a feature that we thought was
worth building.
Over the years we accrued many spelling mistakes in the code base.
This PR attempts to fix spelling mistakes and typos in all areas of the code that are extremely safe to change
- comments
- test descriptions
- other low risk areas
Rails 6.1.3.1 deprecates a few API and has some internal changes that break our tests suite, so this commit fixes all the deprecations and errors and now Discourse should be fully compatible with Rails 6.1.3.1. We also have a new release of the rails_failover gem that's compatible with Rails 6.1.3.1.
The 'Discourse SSO' protocol is being rebranded to DiscourseConnect. This should help to reduce confusion when 'SSO' is used in the generic sense.
This commit aims to:
- Rename `sso_` site settings. DiscourseConnect specific ones are prefixed `discourse_connect_`. Generic settings are prefixed `auth_`
- Add (server-side-only) backwards compatibility for the old setting names, with deprecation notices
- Copy `site_settings` database records to the new names
- Rename relevant translation keys
- Update relevant translations
This commit does **not** aim to:
- Rename any Ruby classes or methods. This might be done in a future commit
- Change any URLs. This would break existing integrations
- Make any changes to the protocol. This would break existing integrations
- Change any functionality. Further normalization across DiscourseConnect and other auth methods will be done separately
The risks are:
- There is no backwards compatibility for site settings on the client-side. Accessing auth-related site settings in Javascript is fairly rare, and an error on the client side would not be security-critical.
- If a plugin is monkey-patching parts of the auth process, changes to locale keys could cause broken error messages. This should also be unlikely. The old site setting names remain functional, so security-related overrides will remain working.
A follow-up commit will be made with a post-deploy migration to delete the old `site_settings` rows.
This cookie is only used during login. Having it persist after that can
cause some unusual behavior, especially for sites with short session
lengths.
We were already deleting the cookie following a new signup, but not for
existing users.
This commit moves the cookie deletion logic out of the erb template, and
adds logic and tests to ensure it is always deleted consistently.
Co-authored-by: Jarek Radosz <jradosz@gmail.com>
This ensures that users are only served cached content in their own language. This commit also refactors to make use of the `Discourse.cache` framework rather than direct redis access
This reverts commit e3de45359f.
We need to improve out strategy by adding a cache breaker with this change ... some assets on CDNs and clients may have incorrect CORS headers which can cause stuff to break.