This commit fixes the follow quality issue with `PostSearchData#raw_data`:
1. URLs are being tokenized and links with similar href and characters
are being duplicated in the raw data.
`Post#cooked`:
```
<p><a href=\"https://meta.discourse.org/some.png\" class=\"onebox\" target=\"_blank\" rel=\"nofollow noopener\">https://meta.discourse.org/some.png</a></p>
```
`PostSearchData#raw_data` Before:
```
This is a test topic 0 Uncategorized https://meta.discourse.org/some.png discourse org/some png https://meta.discourse.org/some.png discourse org/some png
```
`PostSearchData#raw_data` After:
```
This is a test topic 0 Uncategorized https://meta.discourse.org/some.png meta discourse org
```
2. Ligthbox being included in search pollutes the
`PostSearchData#raw_data` unncessarily.
From 28 March 2018 to 28 March 2019, searches for the term `image` on
`meta.discourse.org` had a click through rate of 2.1%. Non-lightboxed images are not included in indexing for search yet we were indexing content within a lightbox. Also, search for terms like `image` was affected we were using `Pasted image` as the filename for
uploads that were pasted.
`Post#cooked`
```
<p>Let me see how I can fix this image<br>\n<div class=\"lightbox-wrapper\"><a class=\"lightbox\" href=\"https://meta.discourse.org/some.png\" title=\"some.png\" rel=\"nofollow noopener\"><img src=\"https://meta.discourse.org/some.png\" width=\"275\" height=\"299\"><div class=\"meta\">\n<svg class=\"fa d-icon d-icon-far-image svg-icon\" aria-hidden=\"true\"><use xlink:href=\"#far-image\"></use></svg><span class=\"filename\">some.png</span><span class=\"informations\">1750×2000</span><svg class=\"fa d-icon d-icon-discourse-expand svg-icon\" aria-hidden=\"true\"><use xlink:href=\"#discourse-expand\"></use></svg>\n</div></a></div></p>
```
`PostSearchData#raw_data` Before:
```
This is a test topic 0 Uncategorized Let me see how I can fix this image some.png png https://meta.discourse.org/some.png discourse org/some png some.png png 1750×2000
```
`PostSearchData#raw_data` After:
```
This is a test topic 0 Uncategorized Let me see how I can fix this image
```
In terms of indexing performance, we now have to parse the given HTML
through nokogiri twice. However performance is not a huge worry here since a string length of 194170 takes only 30ms
to scrub plus the indexing takes place in a background job.
Includes support for flags, reviewable users and queued posts, with REST API
backwards compatibility.
Co-Authored-By: romanrizzi <romanalejandro@gmail.com>
Co-Authored-By: jjaffeux <j.jaffeux@gmail.com>
- The test_email job is removed, because it was always being run synchronously (not in sidekiq)
- 34b29f62 added a bypass for critical emails, to match the spec. This removes the bypass, and removes the spec.
- This adapts the specs for 72ffabf6, so that they check for emails being sent
- This reimplements c2797921, allowing test emails to be sent even when emails are disabled
* Remove use of 0 in favor of `TrustLevel.levels[:newuser]`.
* Consolidate two tests into a single one.
* Test that disabling the feature works.
* Avoid loading full ActiveRecord object in test when we only need to
know the existence of the record.
Migrates email user options to a new data structure, where `email_always`, `email_direct` and `email_private_messages` are replace by
* `email_messages_level`, with options: `always`, `only_when_away` and `never` (defaults to `always`)
* `email_level`, with options: `always`, `only_when_away` and `never` (defaults to `only_when_away`)
* FEATURE: Add `IgnoredUsersSummary` daily job
## Why?
This is part of the [Ability to ignore a user feature](https://meta.discourse.org/t/ability-to-ignore-a-user/110254/8).
We want to:
1. Send an automatic group PM that goes out to moderators
2. When {x} users have Ignored the same user, threshold defined by a site setting, default of 5
3. Only send this message every X days which is defined by another site setting
It is not a setting, and only relevant in specs. The new API is:
```
Jobs.run_later! # jobs will be thrown on the queue
Jobs.run_immediately! # jobs will run right away, avoid the queue
```
We can only be sure that an email is sent when we get a mailer in
`ActionMailer::Deliveries`. A couple of tests were actually incorrect
because it didn't flow through our email sender where there are more
conditions in determining whether an email is sent or not.
- Open the log file in "append" mode. This avoids issues if the file does not exist (and matches standard rails log behavior)
- Correctly parse the interval logging environment variable
By default, this does nothing. Two environment variables are available:
- `DISCOURSE_LOG_SIDEKIQ`
Set to `"1"` to enable logging. This will log all completed jobs to `log/rails/sidekiq.log`, along with various db/redis/network statistics. This is useful to track down poorly performing jobs.
- `DISCOURSE_LOG_SIDEKIQ_INTERVAL`
(seconds) Check running jobs periodically, and log their current duration. They will appear in the logs with `status:pending`. This is useful to track down jobs which take a long time, then crash sidekiq before completing.
There was a situation where if:
* There were new flags to review that met the visibility threshold
AND
* There were old flags that *didn't* meet the threshold
THEN
a pending flags notification would be sent out. This fixes that case.
Staff should not be notified of flags if they do not meet the threshold
and are old.
This commit introduces an ultra low priority queue for post rebakes. This
way rebakes can never interfere with regular sidekiq processing for cases
where we perform a large scale rebake.
Additionally it allows Post.rebake_old to be run with rate_limiter: false
to avoid triggering the limiter when rebaking. This is handy for cases
where you want to just force the full rebake and not wait for it to trickle
This allows us to run regular rebakes without starving the normal queue.
It additionally adds the ability to specify queue with `Jobs.enqueue` so
we can specifically queue a job with lower priority using the `queue` arg.
* Dashboard doesn't timeout anymore when Amazon S3 is used for backups
* Storage stats are now a proper report with the same caching rules
* Changing the backup_location, s3_backup_bucket or creating and deleting backups removes the report from the cache
* It shows the number of backups and the backup location
* It shows the used space for the correct backup location instead of always showing used space on local storage
* It shows the date of the last backup as relative date
* Doing it in a post migration was a bad idea
because the migration will fail if the site
is down while trying to download uploads
which points to the instance. This mainly
affects self-hosters using `discourse_docker`
where `./launcher rebuild` will take the
existing container down.
In the past onceoff was forcing inline download of gravatars,
this can be so expensive that it will never finish
This fix ensures it only marks avatars stale which will be picked
up by regular schedules
Last week we added support for dual stack urls but did not remap the
the old records in the uploads and optimized images table
This caused a few minor edge cases worst was that if you rebaked old
images S3 CDN was not repopulated.
If sidekiq is paused or Discourse is in readonly continue to queue
heartbeats
If we do not do that then a master process can end up reaping sidekiq
workers and causing various badness
This also impacts restore which can do weird stuff TM in cases like this
Introduces a hidden setting (default is 0.1) that erodes bounce score
every time we send an email. This means that erratic failures are less
painful cause system auto corrects
Previously we used width and height for thumbnails, new code ensures
1. We auto correct width and height
2. We added extra columns for thumbnail_width and height, this is determined
by actual upload and no longer passed in as a side effect
3. Optimized Image now stores filesize which can be used for analysis, decisions
Also
- fixes Android image manifest as a side effect
- fixes issue where a thumbnail generated that is smaller than the upload is no longer used
- moderation tab
- sorting/pagination
- improved third party reports support
- trending charts
- better perf
- many fixes
- refactoring
- new reports
Co-Authored-By: Simon Cossar <scossar@users.noreply.github.com>
### navigate_to_first_post_after_read setting for categories
When enabled on categories logged on users will return to OP after
reading the entire category. (useful for documentation categories)
### num_auto_bump_daily
Set a number of topics that will automatically bump daily on a category.
- Every 15 minutes we will check if any category has this setting
- Categories with the setting are shuffled
- We exclude pinned, closed, category description and archived topics
- Maximum of 1 topic for the list of categories is bumped till limit reached per category
- We always try to bump oldest first
- Limit is elastic using a RateLimiter that ensures that we only bump N per day
Also some minor organisation on category settings
Froze strings on category.rb
Introduce new patterns for direct sql that are safe and fast.
MiniSql is not prone to memory bloat that can happen with direct PG usage.
It also has an extremely fast materializer and very a convenient API
- DB.exec(sql, *params) => runs sql returns row count
- DB.query(sql, *params) => runs sql returns usable objects (not a hash)
- DB.query_hash(sql, *params) => runs sql returns an array of hashes
- DB.query_single(sql, *params) => runs sql and returns a flat one dimensional array
- DB.build(sql) => returns a sql builder
See more at: https://github.com/discourse/mini_sql
* don't update search index if post belongs to deleted topic
* log errors instead of crashing when updating post or revision fails
* update mentions even when the href attribute is missing
* run the background job with low priority
* replace username in all notifications
* update `action_code_who` used by small action posts
- remove inactive user report and replace with posts
- clean up internals so grouping by week happens on client
- when switching periods old report was not destroyed leading to bugs
- calculate trend based on previous interval ... not previous 30 days
- show percentages for mau/dau
- be more careful about utc date usage
- show uniqu and click through rate on search panel
- publish key of report with report so we only load the correct one
- subscribe earlier in channel in case of concurrency issues
* FEATURE: Update avatars in posts and revisions when user gets renamed
* FIX: Replace username in deleted posts when user gets renamed
* FEATURE: Replace username in notifications when user gets renamed
FEATURE: Update mentions and quotes when user gets merged
* Feature: Push notifications for Android
Notification config for desktop and mobile are merged.
Desktop notifications stay as they are for desktop views.
If mobile mode, push notifications are enabled.
Added push notification subscriptions in their own table, rather than through
custom fields.
Notification banner prompts appear for both mobile and desktop when enabled.
* `rescue nil` is a really bad pattern to use in our code base.
We should rescue errors that we expect the code to throw and
not rescue everything because we're unsure of what errors the
code would throw. This would reduce the amount of pain we face
when debugging why something isn't working as expexted. I've
been bitten countless of times by errors being swallowed as a
result during debugging sessions.
This feature can be enabled by choosing a destination for the
`shared drafts category` site setting.
* Staff members can create shared drafts, choosing a destination
category for the topic when it is published.
* Shared Drafts can be viewed in their category, or above the
topic list for the destination category where it will end up.
* When the shared draft is ready, it can be published to the
appropriate category by clicking a button on the topic view.
* When published, Drafts change their timestamps to the current
time, and any edits to the original post are removed.
Also
- Significantly improved search ranking, title is treated most strongly
- Adds tag names to the index
- Run search re-indexer more aggressively
- Re-index topic and all posts on category change
* This exposes the token in the Sidekiq dashboard which can be
viewed by an admin and defeats the purpose of using a token
in the download backup email ink.
* SPEC: PollFeedJob parsing atom feed
* add FeedItemAccessor
It is to provide a consistent interface to access a feed item's tag
content.
* add FeedElementInstaller
to install non-standard and non-namespaced feed elements
* FEATURE: replace SimpleRSS with Ruby RSS module
* get FinalDestination and download with Excon
* support namespaced element with FeedElementInstaller
This refactors handling of s3 so it can be specified via GlobalSetting
This means that in a multisite environment you can configure s3 uploads
without actual sites knowing credentials in s3
It is a critical setting for situations where assets are mirrored to s3.
Setting a user's default groups based on their email address should only be done once, ie. when they confirm their email address.
Previously we were doing this everytime we'd save a user record 🤷
Figuring out what unread topics a user has is a very expensive
operation over time.
Users can easily accumulate 10s of thousands of tracking state rows
(1 for every topic they ever visit)
When figuring out what a user has that is unread we need to join
the tracking state records to the topic table. This can very quickly
lead to cases where you need to scan through the entire topic table.
This commit optimises it so we always keep track of the "first" date
a user has unread topics. Then we can easily filter out all earlier
topics from the join.
We use pg functions, instead of nested queries here to assist the
planner.