* DEV: Remove HTML setting type and sanitization logic.
We concluded that we don't want settings to contain HTML, so I'm removing the setting type and sanitization logic. Additionally, we no longer allow the global-notice text to contain HTML.
I searched for usages of this setting type in the `all-the-plugins` repo and found none, so I haven't added a migration for existing settings.
* Mark Global notices containing links as HTML Safe.
The server will automatically serve the crawler view to game console
browsers. Neither PlayStation or Xbox can render Discourse because not
all required browser APIs are present.
See the previous commit d66b258b0e as
well.
If enable_upload_debug_mode is true, we do not want to abort the
direct S3 upload, because that will delete the file on S3 and prevent
further inspection of any errors that have come up.
Adds uppy upload functionality behind a
enable_experimental_composer_uploader site setting (default false,
and hidden).
When enabled this site setting will make the composer-editor-uppy
component be used within composer.hbs, which in turn points to
a ComposerUploadUppy mixin which overrides the relevant
functions from ComposerUpload. This uppy uploader has parity
with all the features of jQuery file uploader in the original
composer-editor, including:
progress tracking
error handling
number of files validation
pasting files
dragging and dropping files
updating upload placeholders
upload markdown resolvers
processing actions (the only one we have so far is the media optimization
worker by falco, this works)
cancelling uploads
For now all uploads still go via the /uploads.json endpoint, direct
S3 support will be added later.
Also included in this PR are some changes to the media optimization
service, to support uppy's different file data structures, and also
to make the promise tracking and resolving more robust. Currently
it uses the file name to track promises, we can switch to something
more unique later if needed.
Does not include custom upload handlers, that will come
in a later PR, it is a tricky problem to handle.
Also, this new functionality will not be used in encrypted PMs because
encrypted PM uploads rely on custom upload handlers.
This adds a few different things to allow for direct S3 uploads using uppy. **These changes are still not the default.** There are hidden `enable_experimental_image_uploader` and `enable_direct_s3_uploads` settings that must be turned on for any of this code to be used, and even if they are turned on only the User Card Background for the user profile actually uses uppy-image-uploader.
A new `ExternalUploadStub` model and database table is introduced in this pull request. This is used to keep track of uploads that are uploaded to a temporary location in S3 with the direct to S3 code, and they are eventually deleted a) when the direct upload is completed and b) after a certain time period of not being used.
### Starting a direct S3 upload
When an S3 direct upload is initiated with uppy, we first request a presigned PUT URL from the new `generate-presigned-put` endpoint in `UploadsController`. This generates an S3 key in the `temp` folder inside the correct bucket path, along with any metadata from the clientside (e.g. the SHA1 checksum described below). This will also create an `ExternalUploadStub` and store the details of the temp object key and the file being uploaded.
Once the clientside has this URL, uppy will upload the file direct to S3 using the presigned URL. Once the upload is complete we go to the next stage.
### Completing a direct S3 upload
Once the upload to S3 is done we call the new `complete-external-upload` route with the unique identifier of the `ExternalUploadStub` created earlier. Only the user who made the stub can complete the external upload. One of two paths is followed via the `ExternalUploadManager`.
1. If the object in S3 is too large (currently 100mb defined by `ExternalUploadManager::DOWNLOAD_LIMIT`) we do not download and generate the SHA1 for that file. Instead we create the `Upload` record via `UploadCreator` and simply copy it to its final destination on S3 then delete the initial temp file. Several modifications to `UploadCreator` have been made to accommodate this.
2. If the object in S3 is small enough, we download it. When the temporary S3 file is downloaded, we compare the SHA1 checksum generated by the browser with the actual SHA1 checksum of the file generated by ruby. The browser SHA1 checksum is stored on the object in S3 with metadata, and is generated via the `UppyChecksum` plugin. Keep in mind that some browsers will not generate this due to compatibility or other issues.
We then follow the normal `UploadCreator` path with one exception. To cut down on having to re-upload the file again, if there are no changes (such as resizing etc) to the file in `UploadCreator` we follow the same copy + delete temp path that we do for files that are too large.
3. Finally we return the serialized upload record back to the client
There are several errors that could happen that are handled by `UploadsController` as well.
Also in this PR is some refactoring of `displayErrorForUpload` to handle both uppy and jquery file uploader errors.
Using an invalid value was allowed. This commit tries to automatically
fix the color by adding missing # symbol or will show an error to the
user if it is not possible and it is not a CSS color either.
Configuring staged users to watch categories and tags is a way to sign
them up to get many emails. These emails may be unwanted and get marked
as spam, hurting the site's email deliverability.
Users can opt-in to email notifications by logging on to their
account and configuring their own preferences.
If staff need to be able to configure these preferences on behalf of
staged users, the "allow changing staged user tracking" site setting
can be enabled. Default is to not allow it.
Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
Flips content_security_policy_frame_ancestors default to enabled, and
removes HTTP_REFERER checks on embed requests, as the new referer
privacy options made the check fragile.
This PR adds the first use of Uppy in our codebase, hidden behind a enable_experimental_image_uploader site setting. When the setting is enabled only the user card background uploader will use the new uppy-image-uploader component added in this PR.
I've introduced an UppyUpload mixin that has feature parity with the existing Upload mixin, and improves it slightly to deal with multiple/single file distinctions and validations better. For now, this just supports the XHRUpload plugin for uppy, which keeps our existing POST to /uploads.json.
* FEATURE: add penalty history when silencing a user
Display penalty history (last 6 months) when silencing/suspending a user
* FEATURE: allow default penalty values to be chosen
Adds a site setting that designates default penalty values in hours.
Silence/suspend modals will auto-fill in the default values, but otherwise
will still allow moderators to pick and overwrite values as normal.
First silence/suspend: first value
Second silence/suspend: second value
etc.
Penalty counts are forgiven at the same rate as tl3 promotion requirements do.
Co-authored-by: jjaffeux <j.jaffeux@gmail.com>
* FEATURE: Staff can receive pending user reminders more frequently.
We now express the "pending_users_reminder_delay" in minutes instead of hours so staff can have finer control over the delay.
We need to keep in mind that the reminders could still take up to 20 minutes, even when using a lower value. We send them from a scheduled job.
* Migrate to a new site setting for the reminders delay
Limit was 5 with the assumption that trust level badge
will be the 6th badge. With trust level badges disabled,
it should be possible to increase this to 6, or even more imo.
Over the years we have found that a few communities never discovered tags.
Instead of having them default off we now have them default on, ensuring
that everyone finds out about them.
Co-authored-by: Dan Ungureanu <dan@ungureanu.me>
When a topic is fully merged into another topic we close it. Now we want also to set a timer for deleting this topic. By default, stub topics will be deleted in 7 days. Users can change this period or disable auto-deleting by setting the period to 0.
Using 1 as the default value is confusing for some people as low-score flags are hidden unless staff uses the "(any)" priority filter. Let's change it to 0 and let every site adjust the setting to match their needs.
Code fences:
1. allow to have syntax highlighting by declaring a language
2. are easier to edit (the composer doesn't preserve indentation when inserting a new line)
3. are more popular
Too long excerpts don't make sense. They would make UI wonky. We already have a constraint for the `topic_excerpt_maxlength` setting. This adds the same constraint to `post_excerpt_maxlength`.
It also changes the max value of `topic_excerpt_maxlength` from 999 to 1000.
* DEV: Allow wildcards in Oneboxer optional domain Site Settings
Allows a wildcard to be used as a subdomain on Oneboxer-related SiteSettings, e.g.:
- `force_get_hosts`
- `cache_onebox_response_body_domains`
- `force_custom_user_agent_hosts`
* DEV: fix typos
* FIX: Try doing a GET after receiving a 500 error from a HEAD
By default we try to do a `HEAD` requests. If this results in a 500 error response, we should try to do a `GET`
* DEV: `force_get_hosts` should be a hidden setting
* DEV: Oneboxer Strategies
Have an alternative oneboxing ‘strategy’ (i.e., set of options) to use when an attempt to generate a Onebox fails. Keep track of any non-default strategies that were used on a particular host, and use that strategy for that host in the future.
Initially, the alternate strategy (`force_get_and_ua`) forces the FinalDestination step of Oneboxing to do a `GET` rather than `HEAD`, and forces a custom user agent.
* DEV: change stubbed return code
The stubbed status code needs to be a value not recognized by FinalDestination
This setting allows admin to de/activate automatic trimming of incoming email.
There are instances where it does wonders in trimming all the garbage content and other
instances where it's so bad that it trims the most important part of the email.
FIX: don't remove hidden content using the style attribute when converting HTML to Markdown.
The regexp used was doing more harm than good. It was way too broad.
FIX: properly elide signatures from emails sent with Front App.
This is fairly safe as Front App nicely identifies signatures in the HTML part.
This filter hides reviewables with a score lower than the "reviewable_low_priority_threshold" setting. We only use reviewables that already met this threshold to calculate the Medium and High priority filters.
* FEATURE: Review every post using the review queue.
If the `review_every_post` setting is enabled, posts created and edited by regular uses are sent to the review queue so staff can review them. We'll skip PMs and posts created or edited by TL4 or staff users.
Staff can choose to:
- Approve the post (nothing happens)
- Approve and restore the post (if deleted)
- Approve and unhide the post (if hidden)
- Reject and delete it
- Reject and keep deleted (if deleted)
- Reject and suspend the user
- Reject and silence the user
* Update config/locales/server.en.yml
Co-authored-by: Robin Ward <robin.ward@gmail.com>
Co-authored-by: Robin Ward <robin.ward@gmail.com>
We introduced a cap on the number of bookmarks the user can add in be145ccf2f. However this has caused unintended side effects; when the `jobs/scheduled/bookmark_reminder_notifications.rb` runs we get this error for users who already had more bookmarks than the limit:
> Job exception: Validation failed: Sorry, you have too many bookmarks, visit #{url}/my/activity/bookmarks to remove some.
This is because the `clear_reminder!` call was triggering a bookmark validation, which raised an error because the user already had to many, holding up other reminders.
This PR also adds `max_bookmarks_per_user` hidden site setting (default 2000). This replaces the BOOKMARK_LIMIT const so we can raise it for certain sites.
To add an extra layer of security, we sanitize settings before shipping them to the client. We don't sanitize those that have the "html" type.
The CookedPostProcessor already uses Loofah for sanitization, so I chose to also use it for this. I added it to our gemfile since we installed it as a transitive dependency.
This feature used to be controlled by two site settings
enable_personal_email_messages and min_trust_to_send_email_messages.
I removed enable_personal_email_messages and unhide
min_trust_to_send_email_messages to simplify the process of
enabling / disabling this feature.
* FEATURE: Cache successful HTTP GET requests during Oneboxing
Some oneboxes may fail if when making excessive and/or odd requests against the target domains. This change provides a simple mechanism to cache the results of succesful GET requests as part of the oneboxing process, with the goal of reducing repeated requests and ultimately improving the rate of successful oneboxing.
To enable:
Set `SiteSetting.cache_onebox_response_body` to `true`
Add the domains you’re interesting in caching to `SiteSetting. cache_onebox_response_body_domains` e.g. `example.com|example.org|example.net`
Optionally set `SiteSetting.cache_onebox_user_agent` to a user agent string of your choice to use when making requests against domains in the above list.
* FIX: Swap order of duration and value in redis call
The correct order for `setex` arguments is `key`, `duration`, and `value`.
Duration and value had been flipped, however the code would not have thrown an error because we were caching the value of `1.day.to_i` for a period of 1 seconds… The intention appears to be to set a value of 1 (purely as a flag) for a period of 1 day.
browser-update script does not work correctly in some very old browsers
because the contents of <noscript> is not accessible in JavaScript.
For these browsers, the server can display the crawler page and add the
browser update notice.
Simply loading the browser-update script in the crawler view is not a
solution because that means all crawlers will also see it.
This is not recommended. But if you have other protections in place for CSRF mitigation, you may wish to disable Discourse's implementation. This site setting is not visible in the UI, and must be changed via the console.
Prior to this change, we had weights for very_high, high, low and
very_low. This means there were 4 weights to tweak and what weights to
use for `very_high/high` and `very_low/low` pair was hard to explain.
This change makes it such that `very_high` search priority will always
ensure that the posts are ranked at the top while `very_low` search
priority will ensure that the posts are ranked at the very bottom.
* FIX: Do not show expired invites under Pending tab
* DEV: Controller action was renamed in previous commit
* FEATURE: Add 'Expired' tab to invites
* FEATURE: Refresh model after removing expired invites
* FEATURE: Do not immediately add invite to the list
Opening the 'create-invite' modal used to automatically generate an
invite to reserve an invite link. If the user did not save it and
closed the modal, the invite would be destroyed. This operations caused
the invite list to change in the background and confuse users.
* FEATURE: Sort redeemed users by creation time
* UX: Improve show / hide advanced options link
* FIX: Show redeemed users even if invites were trashed
* UX: Change modal title when editing invite
* UX: Remove Get Link button
Users can get it from the edit modal
* FEATURE: Add limit for invite links generated by regular users
* FEATURE: Add option to skip email
* UX: Show better error messages
* FIX: Show "Invited by" even if invite was trashed
Follow up to 1fdfa13a099d8e46edd0c481b3aaaafe40455ced.
* FEATURE: Add button to save without sending email
Follow up to c86379a465f28a3cc64a4a8c939cf32cf2931659.
* DEV: Use a buffer to hold all changed data
* FEATURE: Close modal after save
* FEATURE: Rate limit resend invite email
* FEATURE: Make the save buttons smarter
* FEATURE: Do not always send email even for new invites
Mailing list mode can generate significant email volume, especially on sites with a large user base. Disable mailing list mode via site settings by default so sites don't experience an unexpectedly large cost from outgoing email.
This commit includes other various improvements to watched words.
auto_silence_first_post_regex site setting was removed because it overlapped
with 'require approval' watched words.
Uses discourse/onebox@ff9ec90
Adds a hidden site setting called disable_onebox_media_download_controls which will add controlslist="nodownload" to video and audio oneboxes, and also to the local video and audio oneboxes within Discourse.
This setting is confusing admins, it is now hidden by default
It only applies to major updates of the rendering engine or imports and
very infrequently needs tweaking
The 'Discourse SSO' protocol is being rebranded to DiscourseConnect. This should help to reduce confusion when 'SSO' is used in the generic sense.
This commit aims to:
- Rename `sso_` site settings. DiscourseConnect specific ones are prefixed `discourse_connect_`. Generic settings are prefixed `auth_`
- Add (server-side-only) backwards compatibility for the old setting names, with deprecation notices
- Copy `site_settings` database records to the new names
- Rename relevant translation keys
- Update relevant translations
This commit does **not** aim to:
- Rename any Ruby classes or methods. This might be done in a future commit
- Change any URLs. This would break existing integrations
- Make any changes to the protocol. This would break existing integrations
- Change any functionality. Further normalization across DiscourseConnect and other auth methods will be done separately
The risks are:
- There is no backwards compatibility for site settings on the client-side. Accessing auth-related site settings in Javascript is fairly rare, and an error on the client side would not be security-critical.
- If a plugin is monkey-patching parts of the auth process, changes to locale keys could cause broken error messages. This should also be unlikely. The old site setting names remain functional, so security-related overrides will remain working.
A follow-up commit will be made with a post-deploy migration to delete the old `site_settings` rows.
- Improve warning message.
- Only display the warning if the language has a fallback and either "allow_user_locale", or "set_locale_from_accept_language_header" are enabled.
osts from topics with 'auto delete replies timer' with more than
skip_auto_delete_reply_likes likes will no longer be deleted. If 0,
all posts will be deleted.
Include the enable_filtered_replies_view site setting in the admin UI
Adds title label to in-reply-to widget
Invokes the filtered UI when using replies_to_post_number as a query
parameter
Replaces the "Show All" button icon
Fixes grammar for "Viewing 1 reply to..." label
You can let non-staff users use shared drafts by modifying the `shared_drafts_min_trust_level` site setting. These users must have access to the shared draft category.
* FIX: show/hide ignored users preferences
based on the current user trust level and the appropriate site setting.
* Allow us to await the `updateCurrentUser` call
Co-authored-by: Robin Ward <robin.ward@gmail.com>
This adds a new min_trust_level_to_allow_ignore site setting that enables admins to control the point at which a user is allowed to ignore other users.
* FEATURE: display error if Oneboxing fails due to HTTP error
- display warning if onebox URL is unresolvable
- display warning if attributes are missing
* FEATURE: Use new Instagram oEmbed endpoint if access token is configured
Instagram requires an Access Token to access their oEmbed endpoint. The requirements (from https://developers.facebook.com/docs/instagram/oembed/) are as follows:
- a Facebook Developer account, which you can create at developers.facebook.com
- a registered Facebook app
- the oEmbed Product added to the app
- an Access Token
- The Facebook app must be in Live Mode
The generated Access Token, once added to SiteSetting.facebook_app_access_token, will be passed to onebox. Onebox can then use this token to access the oEmbed endpoint to generate a onebox for Instagram.
* DEV: update user agent string
* DEV: don’t do HEAD requests against news.yahoo.com
* DEV: Bump onebox version from 2.1.5 to 2.1.6
* DEV: Avoid re-reading templates
* DEV: Tweaks to onebox mustache templates
* DEV: simplified error message for missing onebox data
* Apply suggestions from code review
Co-authored-by: Gerhard Schlager <mail@gerhard-schlager.at>
Per Google, sites are encouraged to upgrade from Universal Analytics v3 `analytics.js` to v4 `gtag.js` for Google Analytics tracking. We're giving admins the option to stay on the v3 API or migrate to v4. Admins can change the implementation they're using via the `ga_version` site setting. Eventually Google will deprecate v3, but our implementation gives admins the choice on what to use for now.
We chose this implementation to make the change less error prone, as many site admins are using custom events via the v3 UA API. With the site stetting defaulted to `v3_analytics`, site analytics won't break until the admin is ready to make the migration.
Additionally, in the v4 implementation, we do not enable automatic pageview tracking (on by default in the v4 API). Instead we rely on Discourse's page change API to report pageviews on transition to avoid double-tracking.
Discourse used to break from convention by logging out all sessions on any
specific session logout.
This would leave users confused about why mobile is logged out when the user
logged out of desktop.
log_out_strict is too conservative for most and not the pattern the industry
has adopted (google/twitter/facebook all perform no strict logouts)
This commit adds a site setting `auto_close_topics_create_linked_topic`
which when enabled works in conjunction with `auto_close_topics_post_count`
setting and creates a new linked topic for the topic just closed.
The auto-created new topic contains a link for all the previous topics
and the topic titles are appended with `(Part {n})`.
The setting is enabled by default.
Most proxies out there will work with chunked encoding transfer. However
some proxies buffer, causing large delays which in turn force the message
bus client to disable chunked encoding. This wastes a request to the message
bus causing superfluous load on the server.
Also
- enableLongPolling is already default true in the client, no need to set it
- remove confusing comment about zepto
Before deleting a topic that has a high number of views (default of 5000), the user will be prompted with a confirmation popup. This works for all delete buttons on the topic located in: topic-timeline, topic-admin-menu, topic-footer-buttons, and post-menu if the post's ID is 1.
The delete button will be disabled while deletion is in progress, to prevent any unwanted behavior.
A site setting is also available to change the minimum amount of views required to display the confirmation popup.
All kudos are going to @RickyC0626. I only rebased with master and added few qunit tests to ensure that this feature works as expected.
Original PR: #10459
New version of Thunderbird email client reimplemented PGP support. Now the following attachments are added by default, if email signatures are enabled:
* OpenPGP_0x(pgp key id).asc
* OpenPGP_signature(.asc)
The last one has `name="OpenPGP_signature.asc"` in `Content-Type` but `filename="OpenPGP_signature"` (without extension) in `Content-Disposition: attachment`.
Since both the key and the signature have proper MIME types, filter them by default.
* FEATURE - Add SiteSettings to control JPEG image quality
`recompress_original_jpg_quality` - the maximum quality of a newly
uploaded file.
`image_preview_jpg_quality` - the maximum quality of OptimizedImages
This was made adjustable to allow rolling back quickly if problems came
up. The new behaviour was made default in 93137066 and no problems with
this have been reported.
Dependency on gifsicle, allow_animated_avatars and allow_animated_thumbnails
site settings were all removed. Animated GIF images are still allowed, but
the generated optimized images are no longer animated for those (which were
used for avatars and thumbnails).
The added 'animated' is populated by extracting information using FastImage.
This field was used to selectively reoptimize old animations. This process
happens in the background.
Allows site administrators to pick different fonts for headings in the wizard and in their site settings. Also correctly displays the header logos in wizard previews.
Previously this site setting `embed unlisted` defaulted to false and
empty topics would be generated for embed, but those topics tend to take
up a lot of room on the topic lists.
This new default creates invisible topics by default until they receive
their first reply.
This allows administrators to stop automatic redirect to an external authenticator. It only takes effect when there is a single authentication method, and the site is login_required
It's a more user-friendly alternative to the default select-kit list,
for settings that are simple lists of items (the regular list widget is
better for settings with choice options).
The site setting `search_recent_posts_size` controls the window of posts
that we will search through before trying to search through the full index
If this number is too low then the search quality can suffer a lot as recent
posts may dominate search. If the number is too high then performance will
suffer.
This attempts to find a happy medium, 1 million posts will cover the majority
of forums out there and should perform adequately.
This is a little bit of refactoring. Core Discourse should have default promotion message for TL2.
In addition, when the Discobot plugin is enabled, the user is invited to advanced training
To check if a post contains any embedded media, we look if the "image_sizes" attribute is present in the new post manager arguments.
We want to see one boxed links, but we only store the raw content of the post. To work around this, I extracted the onebox logic from the composer editor into a module.
We are making the changes from the PR #10563 the default behaviour. Now, if secure media is enabled, secure images will be embedded in emails by default instead of redacting them and displaying a message. This will be a nicer overall experience by default, and for forums that want to be super strict with redaction this setting can always be disabled.
In some cases Discourse admins may opt for sessions not to persist when a
browser is closed.
This is particularly useful in healthcare and education settings where
computers are shared among multiple workers.
By default `persistent_sessions` site setting is enabled, to opt out you
must disable the site setting.
This PR introduces a few important changes to secure media redaction in emails. First of all, two new site settings have been introduced:
* `secure_media_allow_embed_images_in_emails`: If enabled we will embed secure images in emails instead of redacting them.
* `secure_media_max_email_embed_image_size_kb`: The cap to the size of the secure image we will embed, defaulting to 1mb, so the email does not become too big. Max is 10mb. Works in tandem with `email_total_attachment_size_limit_kb`.
`Email::Sender` will now attach images to the email based on these settings. The sender will also call `inline_secure_images` in `Email::Styles` after secure media is redacted and attachments are added to replace redaction messages with attached images. I went with attachment and `cid` URLs because base64 image support is _still_ flaky in email clients.
All redaction of secure media is now handled in `Email::Styles` and calls out to `PrettyText.strip_secure_media` to do the actual stripping and replacing with placeholders. `app/mailers/group_smtp_mailer.rb` and `app/mailers/user_notifications.rb` no longer do any stripping because they are earlier in the pipeline than `Email::Styles`.
Finally the redaction notice has been restyled and includes a link to the media that the user can click, which will show it to them if they have the necessary permissions.
![image](https://user-images.githubusercontent.com/920448/92341012-b9a2c380-f0ff-11ea-860e-b376b4528357.png)
This commit adds a new site setting "allowed_onebox_iframes". By default, all onebox iframes are allowed. When the list of domains is restricted, Onebox will automatically skip engines which require those domains, and use a fallback engine.
With the addition of `PostSearchData#private_message`, a partial
index consisting of only search data from regular posts can be created.
The partial index helps to speed up searches on large sites since PG
will not have to do an index scan on the entire search data index which
has shown to be a bottle neck.
Like "default watching" and "default tracking" categories option now the "regular" categories support is added. It will be useful for sites that are muted by default. The user option will be displayed only if `mute_all_categories_by_default` site setting is enabled.
Enabling the moderators_manage_categories_and_groups site setting will allow moderator users to create/manage groups.
* show New Group form to moderators
* Allow moderators to update groups and read logs, where appropriate
* Rename site setting from create -> manage
* improved tests
* Migration should rename old log entries
* Log group changes, even if those changes mean you can no longer see the group
* Slight reshuffle
* RouteTo /g if they no longer have permissions to view group
A first step to adding automatic dark mode color scheme switching. Adds a new SCSS file at `color_definitions.scss` that serves to output all SCSS color variables as CSS custom properties. And replaces all SCSS color variables with the new CSS custom properties throughout the stylesheets.
This is an alpha feature at this point, can only be enabled via console using the `default_dark_mode_color_scheme_id` site setting.
In 1bd8a075, a hidden site setting was added that causes Email::Styles
to treat its input as a complete document in all cases.
This commit enables that setting by default.
Some tests were removed that were broken by this change. They tested the
behaviour of applying email styles to empty strings. They weren't useful
because:
* Sending empty email is not something we ever intend to do,
* They were testing incidental behaviour - there are lots of
valid ways to process the empty string,
* Their intent wasn't clear from their descriptions,
Considering document length in search introduced too much variance in
our search results such that it makes certain searches better but at the
same time made certain searches worst. Instead, we want to have a more
determistic way of ranking search so that it is easier to reason about
why a post is rank higher in search than another.
The long term plan to tackle repeated terms is to restrict the number of
positions for a given lexeme in our search index.
```
discourse_development=# SELECT alias, lexemes FROM TS_DEBUG('www.discourse.org');
alias | lexemes
-------+---------------------
host | {www.discourse.org}
discourse_development=# SELECT TO_TSVECTOR('www.discourse.org');
to_tsvector
-----------------------
'www.discourse.org':1
```
Given the above lexeme, we will inject additional lexeme by splitting
the host on `.`. The actual tsvector stored will look something like
```
tsvector
---------------------------------------
'discourse':1 'discourse.org':1 'org':1 'www':1 'www.discourse.org':1
```
`Nokogiri::HTML.fragment` is a huge hack (a comment in the source code
admits this). The current behavior of `Email::Styles` is to try to
emulate `fragment` using nokogumbo, but it misses some edge cases. In
particular, meta tags in a email template don't make it through to the
final email.
Instead of treating the provided HTML as an indeterminate fragment, this
commit makes `Email::Styles` treat the HTML as a complete document. This
means that the generated HTML for an email will now always contain top
level structure (a doctype, html, head and body tags).
This new behavior is behind a hidden site setting for now and defaults
off.
Previously we would include this section, unfortunately
1. It is usually elided in gmail
2. It can make the emails longer and more confusing
3. Omission is a feature, it means people need to visit site to get context
There is a feature in search where we take over from the tokenizer
in postgres and attempt to inject more words into search.
So for example: sam.i.am will inject the words i and am.
This is not ideal cause there are many edge cases and this can
cause extreme index bloat.
This is an opening move commit to make it configurable, over the
next few weeks we will evaluate and decide if we disable this by
default or simply remove.