restructure query so it avoids ORs
It appears postgres is picking suboptimal indexes if too many ORs exist
despite how trivial the condition is.
This bypasses conditional in the query and evals them upfront.
On meta for my user this made a 10x perf difference.
This boils down to either having `OR u.admin` or not having `OR u.admin` in
the query.
Note, to avoid race conditions we are setting last_unread to 10 minutes ago
if there is nothing unread.
This is safer in case of in progress transactions
we don't want to lose unread for any window of time.
This optimisation avoids large scans joining the topics table with the
topic_users table.
Previously when a user carried a lot of read state we would have to join
the entire read state with the topics table. This operation would slow down
home page and every topic page. The more read state you accumulated the
larger the impact.
The optimisation helps people who clean up unread, however if you carry
unread from years ago it will only have minimal impact.
Sometimes sidekiq is so fast that it starts jobs before transactions
have comitted. This patch moves the message bus stuff until after things
have comitted.
"Rejecting" a user in the queue is equivalent to deleting them, which
would then making it impossible to review rejected users. Now we store
information about the user in the payload so if they are deleted things
still display in the Rejected view.
Secondly, if a user is destroyed outside of the review queue, it will
now automatically "Reject" that queue item.
Conversely, if a user is deactivated the reviewable should automatically
be rejected.
Before this fix, if a user was not active they'd still show in the
review queue but without an "Approve" button which was confusing.
Previously every rebake would remove and recreate records in this table
This caused created_at and updated_at to keep changing
Yes, I know the SQL is somewhat complex, but this makes quote extraction
more efficient cause we do everything in 2 round trips.
This also removes some concurrency protection we should no longer need
Some sites have external URLs that don't even match `%/uploads/%' and
some sites surprise me with URLs that contains the default path when it
is a site in a multisite cluster. We can't do anything about those.
Adds the parallel_tests gem, and redis/postgres configuration for running rspec tests in parallel. To use:
```
rake parallel:rake[db:create]
rake parallel:rake[db:migrate]
rake parallel:spec
```
This brings the test suite from 12m20s to 3m11s on my macOS machine
This commit fixes the follow quality issue with `PostSearchData#raw_data`:
1. URLs are being tokenized and links with similar href and characters
are being duplicated in the raw data.
`Post#cooked`:
```
<p><a href=\"https://meta.discourse.org/some.png\" class=\"onebox\" target=\"_blank\" rel=\"nofollow noopener\">https://meta.discourse.org/some.png</a></p>
```
`PostSearchData#raw_data` Before:
```
This is a test topic 0 Uncategorized https://meta.discourse.org/some.png discourse org/some png https://meta.discourse.org/some.png discourse org/some png
```
`PostSearchData#raw_data` After:
```
This is a test topic 0 Uncategorized https://meta.discourse.org/some.png meta discourse org
```
2. Ligthbox being included in search pollutes the
`PostSearchData#raw_data` unncessarily.
From 28 March 2018 to 28 March 2019, searches for the term `image` on
`meta.discourse.org` had a click through rate of 2.1%. Non-lightboxed images are not included in indexing for search yet we were indexing content within a lightbox. Also, search for terms like `image` was affected we were using `Pasted image` as the filename for
uploads that were pasted.
`Post#cooked`
```
<p>Let me see how I can fix this image<br>\n<div class=\"lightbox-wrapper\"><a class=\"lightbox\" href=\"https://meta.discourse.org/some.png\" title=\"some.png\" rel=\"nofollow noopener\"><img src=\"https://meta.discourse.org/some.png\" width=\"275\" height=\"299\"><div class=\"meta\">\n<svg class=\"fa d-icon d-icon-far-image svg-icon\" aria-hidden=\"true\"><use xlink:href=\"#far-image\"></use></svg><span class=\"filename\">some.png</span><span class=\"informations\">1750×2000</span><svg class=\"fa d-icon d-icon-discourse-expand svg-icon\" aria-hidden=\"true\"><use xlink:href=\"#discourse-expand\"></use></svg>\n</div></a></div></p>
```
`PostSearchData#raw_data` Before:
```
This is a test topic 0 Uncategorized Let me see how I can fix this image some.png png https://meta.discourse.org/some.png discourse org/some png some.png png 1750×2000
```
`PostSearchData#raw_data` After:
```
This is a test topic 0 Uncategorized Let me see how I can fix this image
```
In terms of indexing performance, we now have to parse the given HTML
through nokogiri twice. However performance is not a huge worry here since a string length of 194170 takes only 30ms
to scrub plus the indexing takes place in a background job.
Includes support for flags, reviewable users and queued posts, with REST API
backwards compatibility.
Co-Authored-By: romanrizzi <romanalejandro@gmail.com>
Co-Authored-By: jjaffeux <j.jaffeux@gmail.com>
- s3_force_path_style was added as a Minio specific url scheme but it has never been well supported in our code base.
- Our new migrate_to_s3 rake task does not work reliably with path style urls too
- Minio has also added support for virtual style requests i.e the same scheme as AWS S3/DO Spaces so we can rely on that instead of using path style requests.
- Add migration to drop s3_force_path_style from the site_settings table
* improved emoji support
- always optimize images as part of the task
- use the unicode standard ordering/naming for sections
* UX: more height for when there are recently used
Migrates email user options to a new data structure, where `email_always`, `email_direct` and `email_private_messages` are replace by
* `email_messages_level`, with options: `always`, `only_when_away` and `never` (defaults to `always`)
* `email_level`, with options: `always`, `only_when_away` and `never` (defaults to `only_when_away`)
* FEATURE: Exposing a way to add a generic report filter
## Why do we need this change?
Part of the work discussed [here](https://meta.discourse.org/t/gain-understanding-of-file-uploads-usage/104994), and implemented a first spike [here](https://github.com/discourse/discourse/pull/6809), I am trying to expose a single generic filter selector per report.
## How does this work?
We basically expose a simple, single generic filter that is computed and displayed based on backend values passed into the report.
This would be a simple contract between the frontend and the backend.
**Backend changes:** we simply need to return a list of dropdown / select options, and enable the report's newly introduced `custom_filtering` property.
For example, for our [Top Uploads](https://github.com/discourse/discourse/pull/6809/files#diff-3f97cbb8726f3310e0b0c386dbe89e22R1423) report, it can look like this on the backend:
```ruby
report.custom_filtering = true
report.custom_filter_options = [{ id: "any", name: "Any" }, { id: "jpg", name: "JPEG" } ]
```
In our javascript report HTTP call, it will look like:
```js
{
"custom_filtering": true,
"custom_filter_options": [
{
"id": "any",
"name": "Any"
},
{
"id": "jpg",
"name": "JPG"
}
]
}
```
**Frontend changes:** We introduced a generic `filter` param and a `combo-box` which hooks up into the existing framework for fetching a report.
This works alright, with the limitation of being a single custom filter per report. If we wanted to add, for an instance a `filesize filter`, this will not work for us. _I went through with this approach because it is hard to predict and build abstractions for requirements or problems we don't have yet, or might not have._
## How does it look like?
![a1ktg1odde](https://user-images.githubusercontent.com/45508821/50485875-f17edb80-09ee-11e9-92dd-1454ab041fbb.gif)
## More on the bigger picture
The major concern here I have is the solution I introduced might serve the `think small` version of the reporting work, but I don't think it serves the `think big`, I will try to shed some light into why.
Within the current design, It is hard to maintain QueryParams for dynamically generated params (based on the idea of introducing more than one custom filter per report).
To allow ourselves to have more than one generic filter, we will need to:
a. Use the Route's model to retrieve the report's payload (we are now dependent on changes of the QueryParams via computed properties)
b. After retrieving the payload, we can use the `setupController` to define our dynamic QueryParams based on the custom filters definitions we received from the backend
c. Load a custom filter specific Ember component based on the definitions we received from the backend
* First take
* Add support for sprites in themes
Automatically register any custom icons added via themes or plugins
* Fix theme sprite caching
* Simplify test
* Update lib/svg_sprite/svg_sprite.rb
Co-Authored-By: pmusaraj <pmusaraj@gmail.com>
* Fix /svg-sprite/search request
* FEATURE: Add `Top Ignored Users` report
## Why?
This is part of the [Ability to ignore a user feature](https://meta.discourse.org/t/ability-to-ignore-a-user/110254/8), and also part of [this PR](https://github.com/discourse/discourse/pull/7144).
We want to send a System Message daily when a specific count threshold for an ignored is reached. To make this system message informative, we want to link to a report for the Top Ignored Users too.
- Notices are visible only by poster and trust level 2+ users.
- Notices are not generated for non-human or staged users.
- Notices are deleted when post is deleted.
It seems that due to jobs being asynchronous and wrapping code in a
DistributedMutex that by the time we run the
`UserAvatar#update_gravatar!` job that the user/user email might be
destroyed.
This patch checks before a call to `user.email_hash` to make sure
the user and primary email exist to prevent the exception. If not
present, the job exits as there's nothing to do because we are
probably running after the user was destroyed for some reason.
Mods require visibility to everyone group cause category dialogs need to
know about this.
If the site setting `allow moderators to create categories` will not function
without this
Note there is no security expansion of rights here, the group is technically
empty anyway and it always looks exactly the same on all discourse instances
Following this change when a user hits `@` and is replying to a topic they
will see usernames of people who were last seen and participated in the topic
This is somewhat experimental, we may tweak this, or make it optional.
Also, a regression in a423a938 where hitting TAB would eat a post you were writing:
Eg this would eat a post:
``` text
@hello, testing 123 <tab>
```
If a theme setting contained invalid SCSS, it would cause an error 500 on the site, with no way to recover. This commit stops loading theme settings in the core stylesheets, and instead only loads the color scheme variables. This change also makes `common/foundation/variables.scss` available to themes without an explicit import.
Treating TIFF and BMP as images cause us to add them to IMG tags, this is very inconsistent across browsers.
You can still upload these files they will simply not be displayed in IMG tags.
Previously it would unhide their post but leave them silenced.
This fix also cleans up some of the helper classes to make it easier
to pass extra data to the silencing code (for example, a link to the
post that caused the user to be silenced.)
This patch also refactors the auto_silence specs to avoid using
stubs.
Currently the theme is matched by name, which can be fragile when there are many themes with the same name. This functionality will be used by the next version of theme CLI.
This does not serve any technical purpose. It is there to provide a signpost for any user/developer that wants to know what to do with a theme archive.
New `about.json` fields (all optional):
- `authors`: An arbitrary string describing the theme authors
- `theme_version`: An arbitrary string describing the theme version
- `minimum_discourse_version`: Theme will be auto-disabled for lower versions. Must be a valid version descriptor.
- `maximum_discourse_version`: Theme will be auto-disabled for lower versions. Must be a valid version descriptor.
A localized description for a theme can be provided in the language files under the `theme_metadata.description` key
The admin UI has been re-arranged to display this new information, and give more prominence to the remote theme options.
Some badges always appeared in the "Other" group (the default group) and some badges were always moved back into the original group during seeding.
Now badges are either in the correct, seeded group or stay in a custom group if the admin moved the badge into a custom group.
This issue is caused by the filename restrictions in `OptimizedImage#ensure_safe_paths!`. Difficult to add a test for this because image optimization is bypassed in test mode.
- Themes can supply translation files in a format like `/locales/{locale}.yml`. These files should be valid YAML, with a single top level key equal to the locale being defined. For now these can only be defined using the `discourse_theme` CLI, importing a `.tar.gz`, or from a GIT repository.
- Fallback is handled on a global level (if the locale is not defined in the theme), as well as on individual keys (if some keys are missing from the selected interface language).
- Administrators can override individual keys on a per-theme basis in the /admin/customize/themes user interface.
- Theme developers should access defined translations using the new theme prefix variables:
JavaScript: `I18n.t(themePrefix("my_translation_key"))`
Handlebars: `{{theme-i18n "my_translation_key"}}` or `{{i18n (theme-prefix "my_translation_key")}}`
- To design for backwards compatibility, theme developers can check for the presence of the `themePrefix` variable in JavaScript
- As part of this, the old `{{themeSetting.setting_name}}` syntax is deprecated in favour of `{{theme-setting "setting_name"}}`
This commit introduces an ultra low priority queue for post rebakes. This
way rebakes can never interfere with regular sidekiq processing for cases
where we perform a large scale rebake.
Additionally it allows Post.rebake_old to be run with rate_limiter: false
to avoid triggering the limiter when rebaking. This is handy for cases
where you want to just force the full rebake and not wait for it to trickle
This corrects 2 issues:
First is a regression with d7c08e21 for some reason dependent :delete_all
respects default scopes where-as dependent :destroy bypasses it.
Secondly, we were keeping orphan user actions around on user destroy, this
ensures we remove all the user actions not only ones that originated by
the user.
So for example: if I like a post of user A we create a user action saying I
did that, but once user A is deleted we were not removing the action leading
to an orphan action in the database.
Users can have 100s of thousands of post and user actions, we do not want
to destroy each individually cause the tracking is enormous and the amount
of queries we would need is enormous.
This gives up on the `after_commit` hook on `post_actions` which ships a message
to clients to synchronize a post, so some phantom post_actions may remain
in the UX in the rare occasion we delete a user. The phantoms will be gone
on reload.
This should only change the order on freshly imported instances with no likes.
This makes the user summary show the latest topics/posts/links instead of the firsts until the users get some likes.
This allows us to run regular rebakes without starving the normal queue.
It additionally adds the ability to specify queue with `Jobs.enqueue` so
we can specifically queue a job with lower priority using the `queue` arg.
Before this patch, a high trust level user could flag something
and have an action be taken, as well as skipping the flag queue.
Now, if a TL3/TL4 cause an action, the flag will skip the minimum
visibility check and allow staff to review it.
FIX: buildTranslationTree was erroring when translations overlapped (ie. ":-)" and ":-))")
FIX: emoji translations wasn't working properly when translations overlapped
Previously we only allowed one image optimization per machine, this meant there
was cross talk between avatar resizing and Sidekiq. This could lead to large
amounts of starvation when optimized image version changed which in turn could
block the Sidekiq queue.
This increases amount of allowed load on machines but this is preferable to
having crosstalk between avatar resizing and Sidekiq.
Some cloud providers (Google Memorystore) do not support any CLIENT commands
By setting :id to nil in the redis config hash we can avoid these commands.
This adds a special global setting GCE users can enable:
`DISCOURSE_REDIS_SKIP_CLIENT_COMMANDS = true`
We have the periodical job that regularly will rebake old posts. This is
used to trickle in update to cooked markdown. The problem is that each rebake
can issue multiple background jobs (post process and pull hotlinked images)
Previously we had no per-cluster limit so cluster running 100s of sites could
flood the sidekiq queue with rebake related jobs.
New system introduces a hard limit of 300 rebakes per 15 minutes across a
cluster to ensure the sidekiq job is not dominated by this.
We also reduced `rebake_old_posts_count` to 80, which is a safer default.
This reverts commit 993f847a2c.
There is an edge case where the link click redirect fails when the URL has trailing slash. Need to figure out a better fix for this.
Previously we had no idea what algorithm generated thumbnails, this starts tracking the version.
We also bumped up the version to force all optimized images to be generated. This is important cause we recently introduced pngquant which results in much smaller images.
This feature ensures optimized images run via pngquant, this results extreme amounts of savings for resized images. Effectively the only impact is that the color palette on small resized images is reduced to 256.
To ensure safety we only apply this optimisation to images smaller than 500k.
This commit also makes a bunch of image specs less fragile.
Previously if upload had missing width and height we would calculate
on first use BUT we (me) forgot to save this to the database
This was particularly bad on home page cause category images (when old)
miss dimensions.
This generates a 10x10 PNG thumbnail for each lightboxed image.
If Image Lazy Loading is enabled (IntersectionObserver API) then
we'll load the low res version when offscreen. As the image scrolls
in we'll swap it for the high res version.
We use a WeakMap to track the old image attributes. It's much less
memory than storing them as `data-*` attributes and swapping them
back and forth all the time.
* Dashboard doesn't timeout anymore when Amazon S3 is used for backups
* Storage stats are now a proper report with the same caching rules
* Changing the backup_location, s3_backup_bucket or creating and deleting backups removes the report from the cache
* It shows the number of backups and the backup location
* It shows the used space for the correct backup location instead of always showing used space on local storage
* It shows the date of the last backup as relative date
`SiteSerializer#is_readonly` is cached for an anonymous user so we have
to clear the cache when disabling readonly mode. Otherwise, the site may
appear to be in readonly mode for an extended period of time.