Commit Graph

330 Commits

Author SHA1 Message Date
Alan Guo Xiang Tan
930f51e175 FEATURE: Split up text segmentation for Chinese and Japanese.
* Chinese segmenetation will continue to rely on cppjieba
* Japanese segmentation will use our port of TinySegmenter
* Korean currently does not rely on segmentation which was dropped in c677877e4f
* SiteSetting.search_tokenize_chinese_japanese_korean has been split
into SiteSetting.search_tokenize_chinese and
SiteSetting.search_tokenize_japanese respectively
2022-02-07 09:21:14 +08:00
Rafael dos Santos Silva
5b5cbbfe5c
FEATURE: Onebox for news.ycombinator.com (#15781) 2022-02-03 13:39:21 -03:00
Martin Brennan
82cb67e67b
FIX: Canonical Message-ID was incorrect for some cases (#15701)
When creating a direct message to a group with group SMTP
set up, and adding another person to that message in the OP,
we send an email to the second person in the OP via the group_smtp
job. This in turn creates an IncomingEmail record to guard against
IMAP double sync.

The issue with this was that this IncomingEmail (which is essentialy
a placeholder/dummy one) was having its Message-ID used as the canonical
References Message-ID for subsequent emails sent out to user_private_message
recipients (such as members of the group), causing threading issues in
the mail client. The canonical <topic/ID@HOST> format should be used
instead for these cases.

This commit fixes the issue by only using the IncomingEmail for the
OP's Message-ID if the OP was created via our handle_mail email receiver
pipeline. It does not make sense to use it in other cases.
2022-02-03 10:36:32 +10:00
Natalie Tay
aac9f43038
Only block domains at the final destination (#15689)
In an earlier PR, we decided that we only want to block a domain if 
the blocked domain in the SiteSetting is the final destination (/t/59305). That 
PR used `FinalDestination#get`. `resolve` however is used several places
 but blocks domains along the redirect chain when certain options are provided.

This commit changes the default options for `resolve` to not do that. Existing
users of `FinalDestination#resolve` are
- `Oneboxer#external_onebox`
- our onebox helper `fetch_html_doc`, which is used in amazon, standard embed 
and youtube
  - these folks already go through `Oneboxer#external_onebox` which already
  blocks correctly
2022-01-31 15:35:12 +08:00
Bianca Nenciu
376799b1a4
FIX: Hide excerpt of binary files in GitHub onebox (#15639)
Oneboxer did not know if a file is binary or not and always tried to
show an excerpt of the file.
2022-01-19 14:45:36 +02:00
jbrw
2909b8b820
FIX: origins_to_regexes should always return an array (#15589)
If the SiteSetting `allowed_onebox_iframes` contains a value of `*`, it will use the values of `all_iframe_origins` during the Oneboxing process. If `all_iframe_origins` itself contains a value of `*`, `origins_to_regexes` will try to return a "catch-all" regex.

Other code assumes `origins_to_regexes`will return an array, so this change ensures the `*` case will return an array containing only the catch-all regex.
2022-01-17 12:48:41 -05:00
Jarek Radosz
31b27b3712
FIX: Broken GitHub folder onebox logic (#15612)
1. `html_doc.css('.Box.md')` always returns a truthy value (e.g. `[]`) so the second branch of the if-elsif never ran
2. `node&.css('text()')` was invalid code that would raise an error
3. Matching on h3 elements is no longer correct with the current html structure returned by GitHub
2022-01-17 18:32:07 +01:00
Jarek Radosz
19fcb0b5ea
DEV: Prevent extraneous log message in specs (#15504)
Yo dawg, I put `silence_stdout` in your `silence_stdout` so you can still write to stdout? 🤔
2022-01-09 20:26:52 +01:00
Martin Brennan
04c7776650
DEV: Rolling back bookmarkable column changes (#15482)
It is too close to release of 2.8 for incomplete
feature shenanigans. Ignores and drops the columns and drops
the trigger/function introduced in
e21c640a3c.
Will pick this feature back up post-release.
2022-01-07 12:16:43 +10:00
Martin Brennan
e21c640a3c
DEV: Add polymorphic bookmarkable columns (#15454)
We are planning on attaching bookmarks to more and
more other models, so it makes sense to make a polymorphic
relationship to handle this. This commit adds the new
columns and backfills them in the bookmark table, and
makes sure that any new bookmark changes fill in the columns
via DB triggers.

This way we can gradually change the frontend and backend
to use these new columns, and eventually delete the
old post_id and for_topic columns in `bookmarks`.
2022-01-06 08:56:05 +10:00
Peter Zhu
c5fd8c42db
DEV: Fix methods removed in Ruby 3.2 (#15459)
* File.exists? is deprecated and removed in Ruby 3.2 in favor of
File.exist?
* Dir.exists? is deprecated and removed in Ruby 3.2 in favor of
Dir.exist?
2022-01-05 18:45:08 +01:00
Martin Brennan
20fe5eceb8
FEATURE: Scheduled group email credential problem check (#15396)
This commit adds a check that runs regularly as per
2d68e5d942 which tests the
credentials of groups with SMTP or IMAP enabled. If any issues
are found with those credentials a high priority problem is added to the
admin dashboard.

This commit also formats the admin dashboard differently if
there are high priority problems, bringing them to the top of
the list and highlighting them.

The problem will be cleared if the issue is fixed before the next
problem check, or if the group's settings are updated with a valid
credential.
2022-01-04 10:14:33 +10:00
David Taylor
cdf4d7156e
DEV: Introduce Auth::Result API for overrides_* (#15378)
This allows authenticators to instruct the Auth::Result to override attributes without using the general site settings. This provides an easy migration path for auth plugins which offer their own "overrides email", "overrides username" or "overrides name" settings. With this new api, they can set `overrides_*` on the result object, and the attribute will be overriden regardless of the general site setting.

ManagedAuthenticator is updated to use this new API. Plugins which consume ManagedAuthenticator will instantly take advantage of this change.
2021-12-23 10:53:17 +00:00
jbrw
6e925fee6f
FIX: Use basic meta description if other description tags are missing (#15356)
When attempting to Onebox a page if there is no `meta property="og:description"` tag but there is a  `meta name="description"` tag, Onebox should try to use that value.
2021-12-17 19:36:54 -05:00
Gerhard Schlager
e19a7a7c8d FIX: translation precedence was different on client and server
As an example, the lookup order for German was:

1. override for de
2. override for en
3. value from de
4. value from en

After this change the lookup order is the same as on the client:
1. override for de
2. value from de
3. override for en
4. value from en

see /t/16381
2021-12-17 14:03:35 +01:00
Blake Erickson
b93b6c4299
FIX: Blurry onebox favicon images (#15258)
This is a fix to address blurry onebox favicon images if the site you
are linking to happens to have a favicon.ico file that contains multiple
images.

This fix detects of we are trying to create an upload for a favicon.ico
file. We then convert it to a png and not a jpeg like we were doing. We
want a png because it will preserve transparency, otherwise if we
convert it to a jpeg we lose that and it looks bad on dark themed sites.

This fix also addresses the fact that .ico files can include multiple
images. The blurry images we were producing was caused by the
ImageMagick `-flatten` option when the .ico file had multiple images
which then squishes them all together. So for .ico files we are no
longer flattening them and instead we are grabbing the last image in the
.ico bundle and converting that single image to a png.
2021-12-10 12:25:50 -07:00
David Taylor
f799b8bfb1
FIX: Ensure MessageIdService can handle hostname changes and multisite (#15231) 2021-12-08 11:17:20 +00:00
Martin Brennan
f26b8b448d
FIX: References header leading to broken email threading (#15206)
Since 3b13f1146b the email threading
in mail clients has been broken, because the random suffix meant
that the References header would always be different for non-group
SMTP email notifications sent out.

This commit fixes the issue by always using the "canonical" topic
reference ID inside the References header in the format:

topic/TOPIC_ID@HOST

Which was the old format. We also add the References header to
notifications sent for the first post arriving, so the threading
works for subsequent emails. The Message-ID header is still random
as per the previous change.
2021-12-08 08:14:48 +10:00
Martin Brennan
3b13f1146b
FIX: Add random suffix to outbound Message-ID for email (#15179)
Currently the Message-IDs we send out for outbound email
are not unique; for a post they look like:

topic/TOPIC_ID/POST_ID@HOST

And for a topic they look like:

topic/TOPIC_ID@HOST

This commit changes the outbound Message-IDs to also have
a random suffix before the host, so the new format is
like this:

topic/TOPIC_ID/POST_ID.RANDOM_SUFFIX@HOST

Or:

topic/TOPIC_ID.RANDOM_SUFFIX@HOST

This should help with email deliverability. This change
is backwards-compatible, the old Message-ID format will
still be recognized in the mail receiver flow, so people
will still be able to reply using Message-IDs, In-Reply-To,
and References headers that have already been sent.

This commit also refactors Message-ID related logic
to a central location, and adds judicious amounts of
tests and documentation.
2021-12-06 10:34:39 +10:00
Osama Sayegh
7bd3986b21
FEATURE: Replace Crawl-delay directive with proper rate limiting (#15131)
We have a couple of site setting, `slow_down_crawler_user_agents` and `slow_down_crawler_rate`, that are meant to allow site owners to signal to specific crawlers that they're crawling the site too aggressively and that they should slow down.

When a crawler is added to the `slow_down_crawler_user_agents` setting, Discourse currently adds a `Crawl-delay` directive for that crawler in `/robots.txt`. Unfortunately, many crawlers don't support the `Crawl-delay` directive in `/robots.txt` which leaves the site owners no options if a crawler is crawling the site too aggressively.

This PR replaces the `Crawl-delay` directive with proper rate limiting for crawlers added to the `slow_down_crawler_user_agents` list. On every request made by a non-logged in user, Discourse will check the User Agent string and if it contains one of the values of the `slow_down_crawler_user_agents` list, Discourse will only allow 1 request every N seconds for that User Agent (N is the value of the `slow_down_crawler_rate` setting) and the rest of requests made within the same interval will get a 429 response. 

The `slow_down_crawler_user_agents` setting becomes quite dangerous with this PR since it could rate limit lots if not all of anonymous traffic if the setting is not used appropriately. So to protect against this scenario, we've added a couple of new validations to the setting when it's changed:

1) each value added to setting must 3 characters or longer
2) each value cannot be a substring of tokens found in popular browser User Agent. The current list of prohibited values is: apple, windows, linux, ubuntu, gecko, firefox, chrome, safari, applewebkit, webkit, mozilla, macintosh, khtml, intel, osx, os x, iphone, ipad and mac.
2021-11-30 12:55:25 +03:00
Natalie Tay
4c46c7e334
DEV: Remove xlink hrefs (#15059) 2021-11-25 15:22:43 +11:00
David Taylor
13fdc979a8
DEV: Improve multisite testing (#14884)
This commit adds the RailsMultisite middleware in test mode when Rails.configuration.multisite is true. This allows for much more realistic integration testing. The `multisite_spec.rb` file is rewritten to avoid needing to simulate a middleware stack.
2021-11-11 16:44:58 +00:00
Martin Brennan
fc98d1edfa
DEV: Improve s3:ensure_cors_rules logging (#14832) 2021-11-08 11:44:12 +10:00
Martin Brennan
9a72a0945f
FIX: Ensure CORS rules exist for S3 using rake task (#14802)
This commit introduces a new s3:ensure_cors_rules rake task
that is run as a prerequisite to s3:upload_assets. This rake
task calls out to the S3CorsRulesets class to ensure that
the 3 relevant sets of CORS rules are applied, depending on
site settings:

* assets
* direct S3 backups
* direct S3 uploads

This works for both Global S3 settings and Database S3 settings
(the latter set directly via SiteSetting).

As it is, only one rule can be applied, which is generally
the assets rule as it is called first. This commit changes
the ensure_cors! method to be able to apply new rules as
well as the existing ones.

This commit also slightly changes the existing rules to cover
direct S3 uploads via uppy, especially multipart, which requires
some more headers.
2021-11-08 09:16:38 +10:00
jbrw
aec125b617
FIX: Display Instagram Oneboxes in an iframe (#14789)
We are no longer able to display the image returned by Instagram directly within a Discourse site (either in the composer, or within a cooked post within a topic), so:

- Display an image placeholder in the composer preview
- A cooked post should use an iframe to display the Instagram 'embed' content
2021-11-02 14:34:51 -04:00
Joffrey JAFFEUX
b18c01e3c6
DEV: prevents flakky spec when deleting plugin (#14701)
Not reseting the registry could lead to assets still being registered for example.

This flakky spec was reprdocible with this call: `bundle exec rspec --seed 9472 spec/components/discourse_plugin_registry_spec.rb spec/components/svg_sprite/svg_sprite_spec.rb`

Which would trigger the following error:

```
Failures:

  1) DiscoursePluginRegistry#register_asset registers vendored_core_pretty_text properly
     Failure/Error: expect(registry.javascripts.count).to eq(0)

       expected: 0
            got: 1

       (compared using ==)
     # ./spec/components/discourse_plugin_registry_spec.rb:248:in `block (3 levels) in <top (required)>'
     # ./spec/rails_helper.rb:280:in `block (2 levels) in <top (required)>'
     # /Users/joffreyjaffeux/.gem/ruby/2.7.3/gems/webmock-3.14.0/lib/webmock/rspec.rb:37:in `block (2 levels) in <top (required)>'
```
2021-10-25 10:24:21 +02:00
Yasuo Honda
9a083a550c FIX: BackupRestore::DatabaseRestorer failures with Ruby 3
Implemented workaround suggested at
https://github.com/freerange/mocha/issues/445#issuecomment-644944003
2021-10-12 17:25:51 -04:00
Alan Guo Xiang Tan
34cebfd867
FIX: Exclude PMs that user sent to themselves. (#14496)
Regression from 016efeadf6

Follow-up to 016efeadf6
2021-10-04 11:55:35 +08:00
Alan Guo Xiang Tan
9d5da2b383
PERF: Revert all inboxes from messages route. (#14445)
The all inboxes was introduced in
016efeadf6 but we decided to roll it back
for performance reasons. The main performance challenge here is that PG
has to basically loop through all the PMs that a user is allowed to view
before being able to order by `Topic#bumped_at`. The all inboxes was not
planned as part of the new/unread filter so we've decided not to tackle
the performance issue for the upcoming release.

Follow-up to 016efeadf6
2021-09-28 11:58:04 +08:00
Martin Brennan
dba6a5eabf
FEATURE: Humanize file size error messages (#14398)
The file size error messages for max_image_size_kb and
max_attachment_size_kb are shown to the user in the KB
format, regardless of how large the limit is. Since we
are going to support uploading much larger files soon,
this KB-based limit soon becomes unfriendly to the end
user.

For example, if the max attachment size is set to 512000
KB, this is what the user sees:

> Sorry, the file you are trying to upload is too big (maximum
size is 512000KB)

This makes the user do math. In almost all file explorers that
a regular user would be familiar width, the file size is shown
in a format based on the maximum increment (e.g. KB, MB, GB).

This commit changes the behaviour to output a humanized file size
instead of the raw KB. For the above example, it would now say:

> Sorry, the file you are trying to upload is too big (maximum
size is 512 MB)

This humanization also handles decimals, e.g. 1536KB = 1.5 MB
2021-09-22 07:59:45 +10:00
Martin Brennan
27699648ef
FEATURE: Go to last unread for topic-level bookmark links (#14396)
Instead of going to the OP of the topic for topic-level bookmarks
(which are bookmarks where for_topic is true) when clicking on the
bookmark in the quick access menu or on the user bookmark list,
this commit takes the user to the last unread post in
the topic instead. This should be generally more useful than landing
on the unchanging OP.

To make this work nicely, I needed to add the last_read_post_number to
the BookmarkQuery based on the TopicUser association. It should not add
too much extra weight to the query, because it is limited to the user
that we are fetching bookmarks for.

Also fixed an issue where the bookmark serializer highest_post_number was
not taking into account whether the user was staff, which is when we
should use highest_staff_post_number instead.
2021-09-21 13:49:56 +10:00
Alan Guo Xiang Tan
7a8b5cdd5c
DEV: Improve tests coverage when listing private messages. (#14385)
This is in response to the security incident published in
https://github.com/discourse/discourse/security/advisories/GHSA-vm3x-w6jm-j9vv.

The security incident highlighted a gap in our test suite so we're
adding more test cases to ensure that personal and group messages do not
leak between users in the future.
2021-09-21 10:39:59 +08:00
Martin Brennan
0c42a1e5f3
FEATURE: Topic-level bookmarks (#14353)
Allows creating a bookmark with the `for_topic` flag introduced in d1d2298a4c set to true. This happens when clicking on the Bookmark button in the topic footer when no other posts are bookmarked. In a later PR, when clicking on these topic-level bookmarks the user will be taken to the last unread post in the topic, not the OP. Only the OP can have a topic level bookmark, and users can also make a post-level bookmark on the OP of the topic.

I had to do some pretty heavy refactors because most of the bookmark code in the JS topics controller was centred around instances of Post JS models, but the topic level bookmark is not centred around a post. Some refactors were just for readability as well.

Also removes some missed reminderType code from the purge in 41e19adb0d
2021-09-21 08:45:47 +10:00
Martin Brennan
41e19adb0d
DEV: Ignore reminder_type for bookmarks (#14349)
We don't actually use the reminder_type for bookmarks anywhere;
we are just storing it. It has no bearing on the UI. It used
to be relevant with the at_desktop bookmark reminders (see
fa572d3a7a)

This commit marks the column as readonly, ignores it, and removes
the index, and it will be dropped in a later PR. Some plugins
are relying on reminder_type partially so some stubs have been
left in place to avoid errors.
2021-09-16 09:56:54 +10:00
Alan Guo Xiang Tan
ddb458343d
PERF: Improve query performance all inbox private messages. (#14304)
First reported in https://meta.discourse.org/t/-/202482/19

There are two optimizations being applied here:

1. Fetch a user's group ids in a seperate query instead of including it
   as a sub-query. When I tried a subquery, the query plan becomes very
inefficient.

1. Join against the `topic_allowed_users` and `topic_allowed_groups`
   table instead of doing an IN against a subquery where we UNION the
`topic_id`s from the two tables. From my profiling, this enables PG to
do a backwards index scan on the `index_topics_on_timestamps_private`
index.

This commit fixes a bug where listing all messages was incorrectly
excluding topics if a topic has been archived by a group even if the
user did not belong to the group.

This commit also fixes another bug where dismissing private messages
selectively was subjected to the default limit of 30.
2021-09-15 10:29:42 +08:00
Martin Brennan
22208836c5
DEV: Ignore bookmarks.topic_id column and remove references to it in code (#14289)
We don't need no stinkin' denormalization! This commit ignores
the topic_id column on bookmarks, to be deleted at a later date.
We don't really need this column and it's better to rely on the
post.topic_id as the canonical topic_id for bookmarks, then we
don't need to remember to update both columns if the bookmarked
post moves to another topic.
2021-09-15 10:16:54 +10:00
Jean
34ff7bfeeb
FEATURE: Hide suspended users from site-wide search to regular users (#14245) 2021-09-06 09:59:35 -04:00
Martin Brennan
e43a8af3bd
FIX: Do not send emails to mailing_list_mode subscribers for PMs (#14159)
This bug was introduced by f66007ec83.

In PostJobsEnqueuer we previously did not fire the after_post_create
event and after_topic_create event for private message topics. This was
changed in the above commit in order to publish message bus messages
for topic tracking state updates. Unfortunately this caused the
NotifyMailingListSubscribers job to be enqueued for all posts including
private messages, and admins and the users involved in the PMs got
emailed the contents of the PMs if they had mailing list mode enabled.

Luckily the impact of this was mitigated by a Guardian#can_see? check
for each mailing list mode user in the NotifyMailingListSubscribers job.
We never want to notify mailing list mode subscribers for private messages
so an early return has been added there, plus the logic in PostJobsEnqueuer
has been fixed, and tests have been added to that class where there were
none before.
2021-08-26 15:16:35 +10:00
Alan Guo Xiang Tan
d13716286c
FIX: Unread group PMs should use GroupUser#first_unread_pm_at. (#14075)
This bug was causing unread PMs for groups to appear inaccurate.
2021-08-18 11:23:28 +08:00
Chema Balsas
745b99edbf TEST: Adds test for urls with url-encoded section hash 2021-08-12 10:43:50 -04:00
Chema Balsas
6b8ee4d5ef TEST: Adds test for urls with section hash 2021-08-12 10:43:50 -04:00
Alan Guo Xiang Tan
0bf27242ec FIX: Group inbox new filter not accounting for dismissed topics.
Follow-up to 2c046cc670
2021-08-05 16:53:12 +08:00
Alan Guo Xiang Tan
2c046cc670 FEATURE: Dismiss new and unread for PM inboxes. 2021-08-05 12:56:15 +08:00
Bianca Nenciu
e2c415457c
FEATURE: Attach backup log as upload (#13849)
Discourse automatically sends a private message after backup or
restore finished. The private message used to contain the log inline
even when it was very long. A very long log can create issues because
the length of the post will be over the maximum allowed length of a
post. When that happens, Discourse will try to create an upload with
the logs. If that fails, it will trim the log and inline it.
2021-08-03 20:06:50 +03:00
Bianca Nenciu
fbf7627c8e
FIX: Make search work with sub-sub-categories (#13901)
Searching in a category looked only one level down, ignoring the site
setting max_category_nesting. The user interface did not support the
third level of categories and did not display them in the "Categorized"
input of the advanced search options.
2021-08-02 14:04:13 +03:00
Alan Guo Xiang Tan
016efeadf6
FEATURE: New and Unread messages for user personal messages. (#13603)
* FEATURE: New and Unread messages for user personal messages.

Co-authored-by: awesomerobot <kris.aubuchon@discourse.org>
2021-08-02 12:41:41 +08:00
jbrw
2f28ba318c
FEATURE: Onebox can match engines based on the content_type (#13876)
* FEATURE: Onebox can match engines based on the content_type

`FinalDestination` now returns the `content_type` of a resolved URL.

`Oneboxer` passes this value to `Onebox` itself. Onebox engines can now specify a `matches_content_type` regex of content_types that the engine can handle, regardless of the URL.

`ImageOnebox` will match URLs with a content type of `image/png`, `jpg`, `gif`, `bmp`, `tif`, etc.

This will allow images that exist at a URL without a file type extension to be correctly rendered, assuming a valid `content_type` is returned.
2021-07-30 13:36:30 -04:00
Joffrey JAFFEUX
5eb6e9281a
FIX: manually adds frowning_face_with_open_mouth for apple (#13528) 2021-07-21 23:27:20 +02:00
Michael Brown
76a11e6dc9 DEV: fix test (missed a reference to master) 2021-07-19 12:47:45 -04:00
Michael Brown
aa12d12c0b discourse/discourse change from 'master' to 'main': update fixture data 2021-07-19 11:46:15 -04:00