This commit fixes the following quality issues with `PostSearchData#raw_data`:
1. URLs are being tokenized, and links whose href and link text are similar
end up duplicated in the raw data.
`Post#cooked`:
```
<p><a href=\"https://meta.discourse.org/some.png\" class=\"onebox\" target=\"_blank\" rel=\"nofollow noopener\">https://meta.discourse.org/some.png</a></p>
```
`PostSearchData#raw_data` Before:
```
This is a test topic 0 Uncategorized https://meta.discourse.org/some.png discourse org/some png https://meta.discourse.org/some.png discourse org/some png
```
`PostSearchData#raw_data` After:
```
This is a test topic 0 Uncategorized https://meta.discourse.org/some.png meta discourse org
```
2. Lightbox markup being included in search pollutes
`PostSearchData#raw_data` unnecessarily.
From 28 March 2018 to 28 March 2019, searches for the term `image` on
`meta.discourse.org` had a click-through rate of 2.1%. Non-lightboxed images are not included in indexing for search, yet we were indexing content within a lightbox. Also, searches for terms like `image` were affected because we were using `Pasted image` as the filename for
uploads that were pasted.
`Post#cooked`
```
<p>Let me see how I can fix this image<br>\n<div class=\"lightbox-wrapper\"><a class=\"lightbox\" href=\"https://meta.discourse.org/some.png\" title=\"some.png\" rel=\"nofollow noopener\"><img src=\"https://meta.discourse.org/some.png\" width=\"275\" height=\"299\"><div class=\"meta\">\n<svg class=\"fa d-icon d-icon-far-image svg-icon\" aria-hidden=\"true\"><use xlink:href=\"#far-image\"></use></svg><span class=\"filename\">some.png</span><span class=\"informations\">1750×2000</span><svg class=\"fa d-icon d-icon-discourse-expand svg-icon\" aria-hidden=\"true\"><use xlink:href=\"#discourse-expand\"></use></svg>\n</div></a></div></p>
```
`PostSearchData#raw_data` Before:
```
This is a test topic 0 Uncategorized Let me see how I can fix this image some.png png https://meta.discourse.org/some.png discourse org/some png some.png png 1750×2000
```
`PostSearchData#raw_data` After:
```
This is a test topic 0 Uncategorized Let me see how I can fix this image
```
In terms of indexing performance, we now have to parse the given HTML
through Nokogiri twice. However, performance is not a huge worry here since scrubbing a string of length 194170 takes only 30ms,
plus the indexing takes place in a background job.
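For reference, the scrubbing boils down to something like the following sketch (method name and selector are illustrative, not the actual implementation):
```ruby
require "nokogiri"

# Illustrative only: drop lightbox markup before extracting indexable text
def scrub_for_search(cooked_html)
  fragment = Nokogiri::HTML.fragment(cooked_html)
  fragment.css("div.lightbox-wrapper").remove # filename/dimensions metadata goes away
  fragment.text
end
```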
Includes support for flags, reviewable users and queued posts, with REST API
backwards compatibility.
Co-Authored-By: romanrizzi <romanalejandro@gmail.com>
Co-Authored-By: jjaffeux <j.jaffeux@gmail.com>
Previously we would bypass touching `Topic.updated_at` for whispers and post
recovery / deletions.
This meant that certain types of caching could not be done where we rely on
this information for cache accuracy.
For example, if we know we have zero unread topics as of yesterday and a whisper
is made, we need to bump this date so the cache remains accurate.
This is only half of a larger change but provides the groundwork.
Confirmed none of our serializers leak out `Topic.updated_at`, so this is a safe
spot for this info.
At the moment edits still do not change this but it is not relevant for the
unread cache.
This commit also cleans up some specs to use the new `eq_time` matcher for
millisecond-fidelity comparison of times.
Previously `freeze_time` would fudge this, which is not that clean.
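For example, a comparison along these lines now uses the matcher (fabricator and helper names are assumed, not taken from this commit):
```ruby
it "bumps Topic.updated_at when a whisper is created" do
  topic = Fabricate(:topic)
  whisper = create_post(topic: topic, post_type: Post.types[:whisper])

  # eq_time compares with millisecond fidelity instead of exact object equality
  expect(topic.reload.updated_at).to eq_time(whisper.created_at)
end
```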
- The test_email job is removed, because it was always being run synchronously (not in sidekiq)
- 34b29f62 added a bypass for critical emails, to match the spec. This removes the bypass, and removes the spec.
- This adapts the specs for 72ffabf6, so that they check for emails being sent
- This reimplements c2797921, allowing test emails to be sent even when emails are disabled
Fixes two issues:
1. Redirecting to an external origin's path after login did not work
2. User would be erroneously redirected to the external origin after logout
https://meta.discourse.org/t/109755
* This is causing certain posts to appear in searches incorrectly as `PostSearchData#raw_data` contains the outdated title, category name and tag names.
* Remove use of 0 in favor of `TrustLevel.levels[:newuser]`.
* Consolidate two tests into a single one.
* Test that disabling the feature works.
* Avoid loading full ActiveRecord object in test when we only need to
know the existence of the record.
* improved emoji support
- always optimize images as part of the task
- use the unicode standard ordering/naming for sections
* UX: more height for when there are recently used emojis
Migrates email user options to a new data structure, where `email_always`, `email_direct` and `email_private_messages` are replaced by
* `email_messages_level`, with options: `always`, `only_when_away` and `never` (defaults to `always`)
* `email_level`, with options: `always`, `only_when_away` and `never` (defaults to `only_when_away`)
* First take
* Add support for sprites in themes
Automatically register any custom icons added via themes or plugins
* Fix theme sprite caching
* Simplify test
* Update lib/svg_sprite/svg_sprite.rb
Co-Authored-By: pmusaraj <pmusaraj@gmail.com>
* Fix /svg-sprite/search request
* FEATURE: Add `IgnoredUsersSummary` daily job
## Why?
This is part of the [Ability to ignore a user feature](https://meta.discourse.org/t/ability-to-ignore-a-user/110254/8).
We want to:
1. Send an automatic group PM to moderators when {x} users have ignored the same user, with the threshold defined by a site setting (default of 5)
2. Only send this message every X days, which is defined by another site setting
It is not a setting, and only relevant in specs. The new API is:
```
Jobs.run_later! # jobs will be thrown on the queue
Jobs.run_immediately! # jobs will run right away, avoid the queue
```
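A hedged example of how a spec might opt in to inline execution (job name, fabricator and assertion are illustrative):
```ruby
RSpec.describe "post processing" do
  before { Jobs.run_immediately! } # jobs execute inline instead of being queued

  it "processes the post right away" do
    post = Fabricate(:post)
    Jobs.enqueue(:process_post, post_id: post.id)
    expect(post.reload.cooked).to be_present
  end
end
```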
If the existing email address for a user ends in `.invalid`, we should take the email address from an authentication payload, and replace the invalid address. This typically happens when we import users from a system without email addresses.
This commit also adds some extensibility so that plugin authenticators can define `always_update_user_email?`
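A rough sketch of what a plugin authenticator might do with this hook (class and parent names are assumptions; only `always_update_user_email?` comes from this commit):
```ruby
class MySsoAuthenticator < Auth::ManagedAuthenticator
  # Opt in to replacing a stale/placeholder email (e.g. one ending in `.invalid`)
  # with the address supplied by the authentication payload.
  def always_update_user_email?
    true
  end
end
```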
Since uploads site settings are now backed by an actual upload, we don't
have to reach over the network just to fetch the favicon. Instead, we
can just read the upload directly from disk.
When using the API, if you provide an HTTP-header-based API key, any other
auth-based information (username, external_id, or user_id) passed in as
query params will not be used, and vice versa.
Followup to f03b293e6a
* FEATURE: Add `Top Ignored Users` report
## Why?
This is part of the [Ability to ignore a user feature](https://meta.discourse.org/t/ability-to-ignore-a-user/110254/8), and also part of [this PR](https://github.com/discourse/discourse/pull/7144).
We want to send a System Message daily when a specific count threshold for an ignored user is reached. To make this system message informative, we want to link to a report for the Top Ignored Users too.
We can only be sure that an email is sent when we get a mailer in
`ActionMailer::Base.deliveries`. A couple of tests were actually incorrect
because the email didn't flow through our email sender, where there are more
conditions determining whether an email is sent or not.
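A hedged sketch of the safer assertion (job name and arguments are assumptions):
```ruby
it "actually delivers the email" do
  expect { Jobs::UserEmail.new.execute(type: :digest, user_id: user.id) }
    .to change { ActionMailer::Base.deliveries.count }.by(1)
end
```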
Previously if you wanted to have jobs execute in test mode, you'd have
to do `SiteSetting.queue_jobs = false`, because the opposite of queue
is to execute.
I found this very confusing, so I created a test helper called
`run_jobs_synchronously!` which is much more clear about what it does.
* FEATURE: Account for `ignored_users` when merging two users
## Why?
This is part of the [Ability to ignore a user feature](https://meta.discourse.org/t/ability-to-ignore-a-user/110254/8).
When we merge two users, we need to account for merging their list of `ignored_users` too.
- Notices are visible only by poster and trust level 2+ users.
- Notices are not generated for non-human or staged users.
- Notices are deleted when post is deleted.
It seems that, due to jobs being asynchronous and wrapping code in a
DistributedMutex, by the time we run the
`UserAvatar#update_gravatar!` job the user or the user's email might be
destroyed.
This patch checks before a call to `user.email_hash` to make sure
the user and primary email exist to prevent the exception. If not
present, the job exits as there's nothing to do because we are
probably running after the user was destroyed for some reason.
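A minimal sketch of the guard, assuming the job looks the user up by id (the real job's structure may differ):
```ruby
def execute(args)
  user = User.find_by(id: args[:user_id])
  # The user (or their primary email) may have been destroyed since the job was queued
  return if user.nil? || user.primary_email.nil?

  user.user_avatar&.update_gravatar!
end
```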
Now you can also make authenticated API requests by passing the
`api_key` and `api_username` in the HTTP header instead of query params.
The new header values are: `Api-Key` and `Api-Username`.
Here is an example in cURL:
``` text
curl -i -sS -X POST "http://127.0.0.1:3000/categories" \
-H "Content-Type: multipart/form-data;" \
-H "Api-Key: 7aa202bec1ff70563bc0a3d102feac0a7dd2af96b5b772a9feaf27485f9d31a2" \
-H "Api-Username: system" \
-F "name=7c1c0ed93583cba7124b745d1bd56b32" \
-F "color=49d9e9" \
-F "text_color=f0fcfd"
```
There is also support for `Api-User-Id` and `Api-User-External-Id`
instead of specifying the username along with the key.
If you reply to an email with the word "mute" a topic will be muted
If you reply to an email with the word "track" a topic will be tracked
If you reply to an email with the word "watch" a topic will be watched
These ninja commands can help advanced mailing list ex-users and save a trip
to the website.
Mods require visibility of the everyone group because category dialogs need to
know about it.
The site setting `allow moderators to create categories` will not function
without this.
Note there is no security expansion of rights here; the group is technically
empty anyway and it always looks exactly the same on all Discourse instances.
This is a common pattern we see in tests. The `id` of the upload
is used to create the URL and we assume the `id` will always be
in a certain range which depends on the database.
Uses github.com/discourse/moment-timezone-names-translations to translate timezone names.
Plugins can also provide their own timezone name translations.
Previously we had the `in:title` and `in:first` search shortcuts for
searching in the title or first post only. They are a bit of a handful to type.
This adds 2 shortcuts (`t` and `f`) for searching titles or first posts.
This commit also cleans up all advanced filters; they were not properly
regex terminated, allowing weird clauses like `in:firstinator` to act
the same as `in:first`.
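Illustrative only, showing why an unterminated filter regex mis-matches:
```ruby
unterminated = /in:first/i
terminated   = /in:first\b/i # the added terminator

"in:firstinator".match?(unterminated) # => true  (wrongly treated as in:first)
"in:firstinator".match?(terminated)   # => false
"in:first".match?(terminated)         # => true
```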
Attempt to force NGINX to include content length when doing X-SendFile
This does not seem to be required when bypassing NGINX.
Without this header some CDNs may have issues caching
When a new post is triggered via the message bus, the post stream will attempt to load
it. Previously the `/topic/TOPIC_ID/posts.json` route would unconditionally include
suggested topics, which caused excessive load on the server.
The new pattern defaults to excluding suggested and related topics from this API
unless people explicitly ask for suggested.
When the S3 store was enabled, we were only applying the S3 CDN.
So all images stored locally, like the emojis, were never put on the local CDN.
Fixed a bunch of CookedPostProcessor tests by adding a call to `optimize_urls`
in order to get final URLs.
I also removed the unnecessary PrettyText.add_s3_cdn method since this is already
handled in the CookedPostProcessor.
Do not allow `/u/search/users.json` to list any group matches unless a
specific `term` is specified in the API call.
Adding groups should always be done when an actual search term exists;
blank search is only supported for users within a topic.
Following this change, when a user hits `@` and is replying to a topic, they
will see usernames of people who were recently seen and who participated in the topic.
This is somewhat experimental; we may tweak this, or make it optional.
Also, a regression in a423a938 where hitting TAB would eat a post you were writing:
Eg this would eat a post:
``` text
@hello, testing 123 <tab>
```
It was getting caught in a `DistributedMutex` deadlock (twice!), which
meant this test was taking 120s to run.
I'm not sure why queue jobs was turned off here, because when I turn it
on the test passes and takes <2s instead.
If a theme setting contained invalid SCSS, it would cause an error 500 on the site, with no way to recover. This commit stops loading theme settings in the core stylesheets, and instead only loads the color scheme variables. This change also makes `common/foundation/variables.scss` available to themes without an explicit import.
Co-authored-by: Sam Saffron <sam.saffron@gmail.com>
Co-authored-by: David Taylor <david@taylorhq.com>
This gives more control over the request. In particular we can easily
look up DNS dynamically, instead of only upon NGINX startup.
Previously, NGINX was looking up the IP for the letter avatar service and
caching the CDN IP address; this caused issues if the CDN changed IP, in
which case letter avatars would be broken until the container was restarted.
NGINX config has been updated to add caching. This change will require
a container rebuild.
The proxy will now function in development environments, so the patch
for `letter_avatar_proxy` has been removed.
We had a missing `formats:` option on our render partial call that caused logs to
spam when CSS files got 404s.
Due to magic, discourse_public_exceptions.rb was actually returning the
correct 404 because it switched formats when rendering the error.
This adjusts 53d592ad by @tgxworld
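For reference, a hedged sketch of the kind of fix (the partial name is illustrative): pin the formats on the render call so a 404 for a `.css` path does not try to render the partial in a non-HTML format.
```ruby
# Before: the format follows the request, so a missing .css asset spams the logs
render partial: "exceptions/not_found"
# After: pin the format explicitly
render partial: "exceptions/not_found", formats: [:html]
```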
- Adds Sidekiq.unpause_all! to unpause all sites
- Adds Sidekiq.paused_dbs to list dbs that are currently paused
- Handles some edge cases where the unpause thread could extend expiry on
sites that were unpaused from a different process
- Ensures tests always terminate the background thread used for the pause
keepalive
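A hedged usage sketch of the API described above (`pause!` is assumed to already exist from earlier work):
```ruby
Sidekiq.pause!        # pause job processing for the current site
Sidekiq.paused_dbs    # => list of db names that are currently paused
Sidekiq.unpause_all!  # unpause every site in the cluster
```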
Treating TIFF and BMP as images caused us to add them to IMG tags, which is very inconsistent across browsers.
You can still upload these files; they will simply not be displayed in IMG tags.
Previously it would unhide their post but leave them silenced.
This fix also cleans up some of the helper classes to make it easier
to pass extra data to the silencing code (for example, a link to the
post that caused the user to be silenced.)
This patch also refactors the auto_silence specs to avoid using
stubs.
If for some reason we created an upload with id 1 in the test then the
test would fail. This can happen if this is the absolute first test to
run on the db.
The fix sets the upload to a legitimate id, which in turn means the last upload
will not be upload id 1, and stops hard-coding the id in the test.
Currently the theme is matched by name, which can be fragile when there are many themes with the same name. This functionality will be used by the next version of theme CLI.
If for some reason `Discourse.store.path_for` returns `nil`, the
forum would throw an error rather than returning 404.
Why would it be `nil`? One cause could be changing the type of
file store and having the `url` field no longer be relative.
This does not serve any technical purpose. It is there to provide a signpost for any user/developer that wants to know what to do with a theme archive.
There was a situation where if:
* There were new flags to review that met the visibility threshold
AND
* There were old flags that *didn't* meet the threshold
THEN
a pending flags notification would be sent out. This fixes that case.
Staff should not be notified of flags if they do not meet the threshold
and are old.
New `about.json` fields (all optional):
- `authors`: An arbitrary string describing the theme authors
- `theme_version`: An arbitrary string describing the theme version
- `minimum_discourse_version`: Theme will be auto-disabled for lower versions. Must be a valid version descriptor.
- `maximum_discourse_version`: Theme will be auto-disabled for higher versions. Must be a valid version descriptor.
A localized description for a theme can be provided in the language files under the `theme_metadata.description` key
The admin UI has been re-arranged to display this new information, and give more prominence to the remote theme options.
Under some conditions it was possible to pass in a user_id as an
integer, but we would try and parse it as a comma delimited string
resulting in an error. This has been fixed so that we are no longer
mapping the user_id param to user_ids.
* FIX: allow sending PMs to staff via flag even when PMs are disabled
FIX: allow sending PMs to staff via flag even if the user trust level is insufficient
* Update lib/topic_creator.rb
Co-Authored-By: techAPJ <arpit@techapj.com>
The `posts` relation on `Topic` is not ordered. Using `topic.posts.first`
is basically the same as asking for a random post; it will depend on DB
order. This breaks on topic merge and split, for example.
Additionally, a huge problem with that is that it forces ActiveRecord down
a slow path. `topic.posts.first` is extremely slow on giant topics: since
it has no default ordering, it appears AR materializes the entire set prior
to doing `first`.
This commit also illustrates the importance of testing, initially I only
fixed the second instance of the problem in `post_validator.rb` but testing
revealed that the problem was repeated at the top of the file.
Longer term we should consider a larger change of default ordering the posts
relations so people do not fall down this trap anymore.
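Illustrative only (the actual call sites differ): the fix is to ask for a deterministically ordered post instead of relying on DB order.
```ruby
topic.posts.order(:post_number).first # deterministic, and the DB can pick the row cheaply
# instead of
topic.posts.first                     # order (and cost) depends on the relation / DB
```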
- Themes can supply translation files in a format like `/locales/{locale}.yml`. These files should be valid YAML, with a single top level key equal to the locale being defined. For now these can only be defined using the `discourse_theme` CLI, importing a `.tar.gz`, or from a GIT repository.
- Fallback is handled on a global level (if the locale is not defined in the theme), as well as on individual keys (if some keys are missing from the selected interface language).
- Administrators can override individual keys on a per-theme basis in the /admin/customize/themes user interface.
- Theme developers should access defined translations using the new theme prefix variables:
JavaScript: `I18n.t(themePrefix("my_translation_key"))`
Handlebars: `{{theme-i18n "my_translation_key"}}` or `{{i18n (theme-prefix "my_translation_key")}}`
- To design for backwards compatibility, theme developers can check for the presence of the `themePrefix` variable in JavaScript
- As part of this, the old `{{themeSetting.setting_name}}` syntax is deprecated in favour of `{{theme-setting "setting_name"}}`
This commit introduces an ultra low priority queue for post rebakes. This
way rebakes can never interfere with regular sidekiq processing for cases
where we perform a large scale rebake.
Additionally it allows `Post.rebake_old` to be run with `rate_limiter: false`
to avoid triggering the limiter when rebaking. This is handy for cases
where you want to just force the full rebake and not wait for it to trickle through.
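A hedged console sketch (the limit argument and the exact `Post.rebake_old` signature are assumptions; check the actual method before use):
```ruby
# Force a large rebake run without tripping the rebake rate limiter
Post.rebake_old(SiteSetting.rebake_old_posts_count, rate_limiter: false)
```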
This corrects 2 issues:
First is a regression with d7c08e21: for some reason `dependent: :delete_all`
respects default scopes whereas `dependent: :destroy` bypasses them.
Secondly, we were keeping orphan user actions around on user destroy; this
ensures we remove all the user actions, not only the ones that originated from
the user.
So for example: if I like a post of user A, we create a user action saying I
did that, but once user A was deleted we were not removing that action, leading
to an orphan action in the database.
We use the `id` of the upload to calculate a `depth` partition in the
filename. This test would fail if your database had a higher seed
because the depth it was looking for was hard coded to 1.
The solution was to not save the records (which is faster anyway) and
specify the `id` of the upload to make the hash deterministic.
SiteSettingExtension triggers message bus which re-establishes a
DB connection in `SiteSettingExtension#process_message`. That happens
concurrently and a test that requires a connection to the db will
fail when the reconnection is happening.
This makes more sense than having the guardian take an accessor.
The logic belongs in the Serializer, where the JSON is calculated.
Also removed some of the DRYness in the spec. It's fewer lines
and made it easier to test the option on the serializer.
This commit introduces a new helper to enable transactional fixtures
when testing multisite. Without it, the problem would show up as tests that passed the
first time then failed the second time due to stale data being left over.
Previously we depended on non-Sidekiq-specific mocking, which is not the
official way of testing Sidekiq; this made these tests very fragile.
The new testing is more robust and complete.
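For comparison, the official Sidekiq testing helpers look roughly like this (the worker class here is a made-up example, not a Discourse job):
```ruby
require "sidekiq/testing"

class HardWorker
  include Sidekiq::Worker
  def perform(arg); end
end

Sidekiq::Testing.fake! do
  HardWorker.perform_async("some_arg")
  HardWorker.jobs.size # => 1, the job sits on the fake queue
  HardWorker.drain     # run every queued job inline
end
```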
This allows us to run regular rebakes without starving the normal queue.
It additionally adds the ability to specify a queue with `Jobs.enqueue` so
we can specifically queue a job with lower priority using the `queue` arg.
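A hedged example of the new argument (job name, arguments and queue name are illustrative):
```ruby
# Enqueue on a lower priority queue instead of the default one
Jobs.enqueue(:process_post, post_id: post.id, queue: "low")
```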
Previously we killed caching on old avatars because we kept serving blanks;
this meant we would front many more avatar requests after a version change.
This change ensures old avatars do not cause a flood of requests on the
server.
Before this patch, a high trust level user could flag something
and have an action be taken, as well as skipping the flag queue.
Now, if a TL3/TL4 flag causes an action, the flag will skip the minimum
visibility check and allow staff to review it.
This is a possible solution for https://meta.discourse.org/t/user-api-keys-specification/48536/19
This allows user-api-key requests to not require a redirect url.
Instead, the encrypted payload will just be displayed after creation (which can be copied
and pasted into an env for a CLI, for example).
Also: Show instructions when creating a user-api-key without a redirect.
This adds a view to show instructions when requesting a user-api-key
without a redirect. It adds an erb template and a json format.
Also adds an i18n key `user_api_key.instructions` to server.en.yml.
We have a periodic job that regularly rebakes old posts. This is
used to trickle in updates to cooked markdown. The problem is that each rebake
can issue multiple background jobs (post process and pull hotlinked images).
Previously we had no per-cluster limit, so a cluster running 100s of sites could
flood the sidekiq queue with rebake related jobs.
The new system introduces a hard limit of 300 rebakes per 15 minutes across a
cluster to ensure the sidekiq queue is not dominated by this.
We also reduced `rebake_old_posts_count` to 80, which is a safer default.
This reverts commit 993f847a2c.
There is an edge case where the link click redirect fails when the URL has a trailing slash. We need to figure out a better fix for this.
Previously we had no idea what algorithm generated thumbnails; this starts tracking the version.
We also bumped up the version to force all optimized images to be regenerated. This is important because we recently introduced pngquant, which results in much smaller images.
This feature ensures optimized images run through pngquant, which results in significant savings for resized images. Effectively the only impact is that the color palette on small resized images is reduced to 256.
To ensure safety we only apply this optimisation to images smaller than 500k.
This commit also makes a bunch of image specs less fragile.
Previously, if an upload had missing width and height we would calculate them
on first use BUT we (me) forgot to save this to the database.
This was particularly bad on the home page because category images (when old)
were missing dimensions.
This allows fine-grained control over the excerpt (the text that shows up when you pin a topic or link to it externally):
```
I am some text
[excerpt]
This is some **custom** markdown that should be the excerpt
[/excerpt]
More text
```
The previous solution relied on DIVs; unfortunately, by design, DIVs do not play well
with mixed markdown unless you have a preceding newline, eg:
```
<div class='hello'>
this will be treated properly as markdown
</div>
```
This extra newline is not desirable.
I am also considering adding
```
[div class=excerpt]
[/div]
```
This would offer lots of flexibility to themes and plugins that do not want the extra annoying newline.
As per the documentation for KEYS
```
Warning: consider KEYS as a command that should only be used in production environments with extreme care. It may ruin performance when it is executed against large databases. This command is intended for debugging and special operations, such as changing your keyspace layout.
```
Instead, per the documentation for SCAN:
```
Since these commands allow for incremental iteration, returning only a small number of elements per call, they can be used in production without the downside of commands like KEYS or SMEMBERS that may block the server for a long time (even several seconds) when called against big collections of keys or elements.
```
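In Ruby, with the redis gem, the SCAN-based approach looks roughly like this (the key pattern is illustrative):
```ruby
require "redis"

redis = Redis.new
# scan_each iterates incrementally via SCAN, so large keyspaces never block the server
redis.scan_each(match: "report_cache_*") do |key|
  redis.del(key)
end
```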
This generates a 10x10 PNG thumbnail for each lightboxed image.
If Image Lazy Loading is enabled (IntersectionObserver API) then
we'll load the low res version when offscreen. As the image scrolls
in we'll swap it for the high res version.
We use a WeakMap to track the old image attributes. It's much less
memory than storing them as `data-*` attributes and swapping them
back and forth all the time.
This validation makes sure that the s3_upload_bucket and the
s3_backup_bucket have different values. The backup bucket is
allowed to be a subfolder of the upload bucket. The other way
around is forbidden because the backup system searches by
prefix and would return all files stored within the backup
bucket and its subfolders.
* Dashboard doesn't timeout anymore when Amazon S3 is used for backups
* Storage stats are now a proper report with the same caching rules
* Changing the backup_location, s3_backup_bucket or creating and deleting backups removes the report from the cache
* It shows the number of backups and the backup location
* It shows the used space for the correct backup location instead of always showing used space on local storage
* It shows the date of the last backup as relative date
`SiteSerializer#is_readonly` is cached for an anonymous user so we have
to clear the cache when disabling readonly mode. Otherwise, the site may
appear to be in readonly mode for an extended period of time.