When redirecting to login, we store a destination_url cookie, which the user is then redirected to after login. We never want the user to be redirected to a JSON URL. Instead, we should return a 403 in these situations.
This should also be much less confusing for API consumers - a 403 is a better representation than a 302.
We have a couple of site setting, `slow_down_crawler_user_agents` and `slow_down_crawler_rate`, that are meant to allow site owners to signal to specific crawlers that they're crawling the site too aggressively and that they should slow down.
When a crawler is added to the `slow_down_crawler_user_agents` setting, Discourse currently adds a `Crawl-delay` directive for that crawler in `/robots.txt`. Unfortunately, many crawlers don't support the `Crawl-delay` directive in `/robots.txt` which leaves the site owners no options if a crawler is crawling the site too aggressively.
This PR replaces the `Crawl-delay` directive with proper rate limiting for crawlers added to the `slow_down_crawler_user_agents` list. On every request made by a non-logged in user, Discourse will check the User Agent string and if it contains one of the values of the `slow_down_crawler_user_agents` list, Discourse will only allow 1 request every N seconds for that User Agent (N is the value of the `slow_down_crawler_rate` setting) and the rest of requests made within the same interval will get a 429 response.
The `slow_down_crawler_user_agents` setting becomes quite dangerous with this PR since it could rate limit lots if not all of anonymous traffic if the setting is not used appropriately. So to protect against this scenario, we've added a couple of new validations to the setting when it's changed:
1) each value added to setting must 3 characters or longer
2) each value cannot be a substring of tokens found in popular browser User Agent. The current list of prohibited values is: apple, windows, linux, ubuntu, gecko, firefox, chrome, safari, applewebkit, webkit, mozilla, macintosh, khtml, intel, osx, os x, iphone, ipad and mac.
* FEATURE: Optionally send a 'noindex' header in non-canonical responses
This will be used in a SEO experiment.
Co-authored-by: David Taylor <david@taylorhq.com>
Currently, Discourse rate limits all incoming requests by the IP address they
originate from regardless of the user making the request. This can be
frustrating if there are multiple users using Discourse simultaneously while
sharing the same IP address (e.g. employees in an office).
This commit implements a new feature to make Discourse apply rate limits by
user id rather than IP address for users at or higher than the configured trust
level (1 is the default).
For example, let's say a Discourse instance is configured to allow 200 requests
per minute per IP address, and we have 10 users at trust level 4 using
Discourse simultaneously from the same IP address. Before this feature, the 10
users could only make a total of 200 requests per minute before they got rate
limited. But with the new feature, each user is allowed to make 200 requests
per minute because the rate limits are applied on user id rather than the IP
address.
The minimum trust level for applying user-id-based rate limits can be
configured by the `skip_per_ip_rate_limit_trust_level` global setting. The
default is 1, but it can be changed by either adding the
`DISCOURSE_SKIP_PER_IP_RATE_LIMIT_TRUST_LEVEL` environment variable with the
desired value to your `app.yml`, or changing the setting's value in the
`discourse.conf` file.
Requests made with API keys are still rate limited by IP address and the
relevant global settings that control API keys rate limits.
Before this commit, Discourse's auth cookie (`_t`) was simply a 32 characters
string that Discourse used to lookup the current user from the database and the
cookie contained no additional information about the user. However, we had to
change the cookie content in this commit so we could identify the user from the
cookie without making a database query before the rate limits logic and avoid
introducing a bottleneck on busy sites.
Besides the 32 characters auth token, the cookie now includes the user id,
trust level and the cookie's generation date, and we encrypt/sign the cookie to
prevent tampering.
Internal ticket number: t54739.
By default, Rails only includes the Vary:Accept header in responses when the Accept: header is included in the request. This means that proxies/browsers may cache a response to a request with a missing Accept header, and then later serve that cached version for a request which **does** supply the Accept header. This can lead to some very unexpected behavior in browsers.
This commit adds the Vary:Accept header for all requests, even if the Accept header is not present in the request. If a format parameter (e.g. `.json` suffix) is included in the path, then the Accept header is still omitted. (The format parameter takes precedence over any Accept: header, so the response is no longer varies based on the Accept header)
Before this change, calling `StyleSheet::Manager.stylesheet_details`
for the first time resulted in multiple queries to the database. This is
because the code was modelled in a way where each `Theme` was loaded
from the database one at a time.
This PR restructures the code such that it allows us to load all the
theme records in a single query. It also allows us to eager load the
required associations upfront. In order to achieve this, I removed the
support of loading multiple themes per request. It was initially added
to support user selectable theme components but the feature was never
completed and abandoned because it wasn't a feature that we thought was
worth building.
Over the years we accrued many spelling mistakes in the code base.
This PR attempts to fix spelling mistakes and typos in all areas of the code that are extremely safe to change
- comments
- test descriptions
- other low risk areas
Rails 6.1.3.1 deprecates a few API and has some internal changes that break our tests suite, so this commit fixes all the deprecations and errors and now Discourse should be fully compatible with Rails 6.1.3.1. We also have a new release of the rails_failover gem that's compatible with Rails 6.1.3.1.
The 'Discourse SSO' protocol is being rebranded to DiscourseConnect. This should help to reduce confusion when 'SSO' is used in the generic sense.
This commit aims to:
- Rename `sso_` site settings. DiscourseConnect specific ones are prefixed `discourse_connect_`. Generic settings are prefixed `auth_`
- Add (server-side-only) backwards compatibility for the old setting names, with deprecation notices
- Copy `site_settings` database records to the new names
- Rename relevant translation keys
- Update relevant translations
This commit does **not** aim to:
- Rename any Ruby classes or methods. This might be done in a future commit
- Change any URLs. This would break existing integrations
- Make any changes to the protocol. This would break existing integrations
- Change any functionality. Further normalization across DiscourseConnect and other auth methods will be done separately
The risks are:
- There is no backwards compatibility for site settings on the client-side. Accessing auth-related site settings in Javascript is fairly rare, and an error on the client side would not be security-critical.
- If a plugin is monkey-patching parts of the auth process, changes to locale keys could cause broken error messages. This should also be unlikely. The old site setting names remain functional, so security-related overrides will remain working.
A follow-up commit will be made with a post-deploy migration to delete the old `site_settings` rows.
This cookie is only used during login. Having it persist after that can
cause some unusual behavior, especially for sites with short session
lengths.
We were already deleting the cookie following a new signup, but not for
existing users.
This commit moves the cookie deletion logic out of the erb template, and
adds logic and tests to ensure it is always deleted consistently.
Co-authored-by: Jarek Radosz <jradosz@gmail.com>
This ensures that users are only served cached content in their own language. This commit also refactors to make use of the `Discourse.cache` framework rather than direct redis access
This reverts commit e3de45359f.
We need to improve out strategy by adding a cache breaker with this change ... some assets on CDNs and clients may have incorrect CORS headers which can cause stuff to break.
Error messages for exceeded rate limits and invalid parameters always used the English locale instead of the default locale or the current user's locale.
This allows administrators to stop automatic redirect to an external authenticator. It only takes effect when there is a single authentication method, and the site is login_required
For pages that do not specify canonical URL we will default to `https://SITENAME/PATH`.
This ensures that if a URL is crawled on the CDN the search ranking will transfer to the main site.
Additionally we whitelist the `?page` param
Additionally correctly handle cookie path for authentication_data
There were two bugs that exposed an interesting case where two discourse
instances hosted across two subfolder installs in the same domain
with oauth may clash and cause strange redirection on first login:
Log in to example.com/forum1. authentication_data cookie is set with path /
On the first redirection, the current authentication_data cookie is not unset.
Log in to example.com/forum2. In this case, the authentication_data cookie
is already set from forum1 - the initial page load will incorrectly redirect
the user to the redirect URL from the already-stored cookie, to /forum1.
This removes this issue by:
* Setting the cookie for the correct path, and not having it on root
* Correctly removing the cookie on first login
Attempt 2, with more test.
Additionally correctly handle cookie path for authentication_data
There were two bugs that exposed an interesting case where two discourse
instances hosted across two subfolder installs in the same domain
with oauth may clash and cause strange redirection on first login:
Log in to example.com/forum1. authentication_data cookie is set with path /
On the first redirection, the current authentication_data cookie is not unset.
Log in to example.com/forum2. In this case, the authentication_data cookie
is already set from forum1 - the initial page load will incorrectly redirect
the user to the redirect URL from the already-stored cookie, to /forum1.
This removes this issue by:
Setting the cookie for the correct path, and not having it on root
Correctly removing the cookie on first login
* FEATURE: Ability to add components to all themes
This is the first and functional step from that topic https://dev.discourse.org/t/adding-a-theme-component-is-too-much-work/15398/16
The idea here is that when a new component is added, the user can easily assign it to all themes (parents).
To achieve that, I needed to change a site-setting component to accept `setDefaultValues` action and `setDefaultValuesLabel` translated label.
Also, I needed to add `allowAny` option to disable that for theme selector.
I also refactored backend to accept both parent and child ids with one method to avoid duplication (Renamed `add_child_theme!` to more general `add_relative_theme!`)
* FIX: Improvement after code review
* FIX: Improvement after code review2
* FIX: use mapBy and filterBy directly
Previously people were not consistent about mocking which left internals in
a fragile state when running subfolder specs.
This introduces a simple helper `set_subfolder` which you can use to set
the subfolder for the spec. It takes care of proper configuration of subfolder
and teardown.
```
# usage
set_subfolder "/my_amazing_subfolder"
```
You should no longer stub base_uri or global_settings
This brings the behavior in line with native Discourse SSO. If login is required, and a user tries to visit the forum, they will be directed straight to the external login page without requiring any clicks.
If the setting is turned on, then the user will receive information
about the subject: if it was deleted or requires some special access to
a group (only if the group is public). Otherwise, the user will receive
a generic #404 error message. For now, this change affects only the
topics and categories controller.
This commit also tries to refactor some of the code related to error
handling. To make error pages more consistent (design-wise), the actual
error page will be rendered server-side.
Using popups is becoming increasingly rare. Full page redirects are already used on mobile, and for some providers. This commit removes all logic related to popup authentication, leaving only the full page redirect method.
For more info, see https://meta.discourse.org/t/do-we-need-popups-for-login/127988
* Introduced fab!, a helper that creates database state for a group
It's almost identical to let_it_be, except:
1. It creates a new object for each test by default,
2. You can disable it using PREFABRICATION=0
This change both speeds up specs (less strings to allocate) and helps catch
cases where methods in Discourse are mutating inputs.
Overall we will be migrating everything to use #frozen_string_literal: true
it will take a while, but this is the first and safest move in this direction
We had a missing formats: string on our render partial that caused logs to
spam when CSS files got 404s.
Due to magic discourse_public_exceptions.rb was actually returning the
correct 404 cause it switched format when rendering the error.
After you visit a page in Rails an INFO is logged, this depending on
timing could land in the string or not
This changes the level to WARN which avoids the issue
If a required param is missing return a 400 and show a message
displaying which param was missing. Added this to the application
controller so that we don't have to add this logic to every controller
action.
previously admin got a free pass and could set theme via cookie to anything
including themes that are not selectable
this refactor ensures that only "preview" gets a free pass, all the rest
goes through the same pipeline
- allow to specify 410 vs 404 in Discourse::NotFound exception
- remove unused `permalink_redirect_or_not_found` which
- handle JS side links to topics via Discourse-Xhr-Redirect mechanism
This refinement of previous fix moves the crawler blocking into
anonymous cache
This ensures we never poison the cache incorrectly when blocking crawlers