Commit Graph

181 Commits

Author SHA1 Message Date
Jarek Radosz
3d55f2e3b7
FIX: Improvements and fixes to the image downsizing script (#9950)
Fixed bugs, added specs, extracted the upload downsizing code to a class, added support for non-S3 setups, changed it so that images aren't downloaded twice.

This code has been tested on production and successfully resized ~180k uploads.

Includes:

* DEV: Extract upload downsizing logic
* DEV: Add support for non-S3 uploads
* DEV: Process only images uploaded by users
* FIX: Incorrect usage of `count` and `exist?` typo
* DEV: Spec S3 image downsizing
* DEV: Avoid downloading images twice
* DEV: Update filesizes earlier in the process
* DEV: Return false on invalid upload
* FIX: Download images that currently above the limit (If the image size limit is decreased, then there was no way to resize those images that now fall outside the allowed size range)
* Update script/downsize_uploads.rb (Co-authored-by: Régis Hanol <regis@hanol.fr>)
2020-06-11 14:47:59 +02:00
Jarek Radosz
27ad562ff5 DEV: Rubocop fix 2020-06-01 06:07:07 +02:00
Jarek Radosz
7df688d108 FIX: Handle files removed between glob and mtime 2020-06-01 05:50:50 +02:00
Roman Rizzi
b61a291cf3
FIX: returns false if the upload url is an invalid mailto link (#9877) 2020-05-26 10:32:48 -03:00
Michael Brown
d9a02d1336
Revert "Revert "Merge branch 'master' of https://github.com/discourse/discourse""
This reverts commit 20780a1eee.

* SECURITY: re-adds accidentally reverted commit:
  03d26cd6: ensure embed_url contains valid http(s) uri
* when the merge commit e62a85cf was reverted, git chose the 2660c2e2 parent to land on
  instead of the 03d26cd6 parent (which contains security fixes)
2020-05-23 00:56:13 -04:00
Jeff Atwood
20780a1eee Revert "Merge branch 'master' of https://github.com/discourse/discourse"
This reverts commit e62a85cf6f, reversing
changes made to 2660c2e21d.
2020-05-22 20:25:56 -07:00
Osama Sayegh
02f44def56
FIX: Don't blow up when trying to parse invalid or non-ASCII URLs (#9838)
* FIX: Don't blow up when trying to parseinvalid or non-ASCII URLs

Follow-up to 72f139191e
2020-05-20 12:46:27 +03:00
Martin Brennan
72f139191e
FIX: S3 store has_been_uploaded? was not taking into account s3 bucket path (#9810)
In some cases, between Discourse forums the hostname of a URL could match if they are hosting S3 files on the same bucket but the S3 bucket path might not. So e.g. https://testbucket.somesite.com/testpath/some/file/url.png vs https://testbucket.somesite.com/prodpath/some/file/url.png. So has_been_uploaded? was returning true for the second URL, even though it may have been uploaded on a different Discourse forum.

This is a very rare case but must be accounted for, because this impacts UrlHelper.is_local which mistakenly thinks the file has already been downloaded and thus allows the URL to be cooked, where we want to return the full URL to be downloaded using PullHotlinkedImages.
2020-05-20 10:40:38 +10:00
Sam Saffron
0cbaa8d813
FEATURE: extend duration allowed for download
Previously we would raise a warning in the logs if downloading
a file (from s3) takes longer than 60 seconds.

At scale this happens reasonably frequently.

1. Raised the duration to 3 minutes

2. Pulled the resizing mutex out of the downloading mutex
so we have less and clearer error logs
2020-05-15 12:45:47 +10:00
Sam Saffron
d0d5a138c3
DEV: stop freezing frozen strings
We have the `# frozen_string_literal: true` comment on all our
files. This means all string literals are frozen. There is no need
to call #freeze on any literals.

For files with `# frozen_string_literal: true`

```
puts %w{a b}[0].frozen?
=> true

puts "hi".frozen?
=> true

puts "a #{1} b".frozen?
=> true

puts ("a " + "b").frozen?
=> false

puts (-("a " + "b")).frozen?
=> true
```

For more details see: https://samsaffron.com/archive/2018/02/16/reducing-string-duplication-in-ruby
2020-04-30 16:48:53 +10:00
Jarek Radosz
c1c211365a
FIX: Improve clearing store cache (#9568)
1. Shorter
2. Simpler
3. Doesn't depend on external binaries
4. Doesn't fail on large amounts of files
5. Hopefully eliminates flaky spec errors
2020-04-28 17:24:04 +02:00
David Taylor
ba616ffb50
DEV: Use a tmp directory for storing uploads in tests (#9554)
This avoids development-mode upload files from polluting the test environment
2020-04-28 14:03:04 +01:00
Gerhard Schlager
c6b411f6c1 FIX: Restore to S3 didn't work without env variables
The `uplaods:migrate_to_s3` rake task should always use the environment variables, because you usually don't want to break your site's uploads during the migration. But restoring a backup should work with site settings as well as environment variables, otherwise you can't restore uploads to S3 from the web interface.
2020-04-19 20:24:40 +02:00
Gerhard Schlager
baae0e7446 FIX: Infinite loop in migrate_to_s3 rake task 2020-04-19 20:24:40 +02:00
Gerhard Schlager
5bffb033df FIX: The migrate_to_s3 rake task couldn't find the AWS SDK 2020-03-26 16:41:10 +01:00
Gerhard Schlager
93b8b04b06 FIX: Migrating uploads to S3 could miss files
The rake task aborted the migration with "Already migrated" when all upload URLs linked to the correct S3 bucket even though the files didn't exist on S3. By removing the first check we force the rake task to check for the existance of uploads on S3.
2020-03-04 12:50:48 +01:00
Gerhard Schlager
0adab26e45 FIX: Don't count ignored, missing uploads in migration to S3 2020-02-12 16:18:52 +01:00
Jarek Radosz
63a4aa65ff
DEV: Ignore ls errors when clearing FileStore cache (#8780)
A race condition issue is possible when multiple thread/processes are calling this method.
`ls` prints out to stderr "cannot access '...': No such file or directory" if any of the files it's currently trying to list are being removed by the `xargs rm -rf` in an another process. That doesn't affect the result, but it did raise an error before this change.

Tested on a production instance where the original issue was observed.

Co-Authored-By: Régis Hanol <regis@hanol.fr>
2020-01-27 02:59:54 +01:00
Martin Brennan
7c32411881
FEATURE: Secure media allowing duplicated uploads with category-level privacy and post-based access rules (#8664)
### General Changes and Duplication

* We now consider a post `with_secure_media?` if it is in a read-restricted category.
* When uploading we now set an upload's secure status straight away.
* When uploading if `SiteSetting.secure_media` is enabled, we do not check to see if the upload already exists using the `sha1` digest of the upload. The `sha1` column of the upload is filled with a `SecureRandom.hex(20)` value which is the same length as `Upload::SHA1_LENGTH`. The `original_sha1` column is filled with the _real_ sha1 digest of the file. 
* Whether an upload `should_be_secure?` is now determined by whether the `access_control_post` is `with_secure_media?` (if there is no access control post then we leave the secure status as is).
* When serializing the upload, we now cook the URL if the upload is secure. This is so it shows up correctly in the composer preview, because we set secure status on upload.

### Viewing Secure Media

* The secure-media-upload URL will take the post that the upload is attached to into account via `Guardian.can_see?` for access permissions
* If there is no `access_control_post` then we just deliver the media. This should be a rare occurrance and shouldn't cause issues as the `access_control_post` is set when `link_post_uploads` is called via `CookedPostProcessor`

### Removed

We no longer do any of these because we do not reuse uploads by sha1 if secure media is enabled.

* We no longer have a way to prevent cross-posting of a secure upload from a private context to a public context.
* We no longer have to set `secure: false` for uploads when uploading for a theme component.
2020-01-16 13:50:27 +10:00
Gerhard Schlager
e474cda321 REFACTOR: Restoring of backups and migration of uploads to S3 2020-01-14 11:41:35 +01:00
Vinoth Kannan
3b7f5db5ba
FIX: parallel spec system needs a dedicated upload folder for each worker. (#8547) 2019-12-18 11:21:57 +05:30
Vinoth Kannan
d3e7768ea8 Revert "FIX: parallel spec system needs needs a dedicated upload folder for each worker. (#8372)"
This reverts commit 42e5176bc3.
2019-11-19 15:02:18 +05:30
Vinoth Kannan
42e5176bc3
FIX: parallel spec system needs needs a dedicated upload folder for each worker. (#8372) 2019-11-19 13:16:20 +05:30
Penar Musaraj
102909edb3 FEATURE: Add support for secure media (#7888)
This PR introduces a new secure media setting. When enabled, it prevent unathorized access to media uploads (files of type image, video and audio). When the `login_required` setting is enabled, then all media uploads will be protected from unauthorized (anonymous) access. When `login_required`is disabled, only media in private messages will be protected from unauthorized access. 

A few notes: 

- the `prevent_anons_from_downloading_files` setting no longer applies to audio and video uploads
- the `secure_media` setting can only be enabled if S3 uploads are already enabled and configured
- upload records have a new column, `secure`, which is a boolean `true/false` of the upload's secure status
- when creating a public post with an upload that has already been uploaded and is marked as secure, the post creator will raise an error
- when enabling or disabling the setting on a site with existing uploads, the rake task `uploads:ensure_correct_acl` should be used to update all uploads' secure status and their ACL on S3
2019-11-18 11:25:42 +10:00
Penar Musaraj
067696df8f DEV: Apply Rubocop redundant return style 2019-11-14 15:10:51 -05:00
David Taylor
1998be3b27
DEV: Raise errors when cleaning the download cache, and fix for macOS (#8319)
POSIX's `head` specification states: "The application shall ensure that the number option-argument is a positive decimal integer"

Negative values are supported on GNU `head`, so this works in the discourse docker image. However, in some environments (e.g. macOS), the system `head` version fails with a negative `n` parameter.

This commit does two things:

Checks the status at each stage of the pipe, so it cannot fail silently
Flip the `ls` command to list in descending time order, and use `tail -n +501` instead of `head -n -500`.

The visible result is that macOS users no longer see head: illegal line count -- -500 printed throughout the test suite.
2019-11-08 15:34:03 +00:00
Daniel Waterworth
55a1394342 DEV: pluck_first
Doing .pluck(:column).first is a very common pattern in Discourse and in
most cases, a limit cause isn't being added. Instead of adding a limit
clause to all these callsites, this commit adds two new methods to
ActiveRecord::Relation:

pluck_first, equivalent to limit(1).pluck(*columns).first

and pluck_first! which, like other finder methods, raises an exception
when no record is found
2019-10-21 12:08:20 +01:00
Gerhard Schlager
24877a7b8c FIX: Correctly encode non-ASCII filenames in HTTP header
Backport of fix from Rails 6: 890485cfce
2019-08-07 19:10:50 +02:00
Rafael dos Santos Silva
606c0ed14d
FIX: S3 uploads were missing a cache-control header (#7902)
Admins still need to run the rake task to fix the files who where uploaded previously.
2019-08-06 14:55:17 -03:00
Gerhard Schlager
f2dc59d61f FEATURE: Add hidden setting to include S3 uploads in backups 2019-07-09 14:04:16 +02:00
Penar Musaraj
03805e5a76
FIX: Ensure lightbox image download has correct content disposition in S3 (#7845) 2019-07-04 11:32:51 -04:00
Vinoth Kannan
b7830680b6 DEV: use cdn url to download the external uploads to local. 2019-06-06 19:17:19 +05:30
Penar Musaraj
f00275ded3 FEATURE: Support private attachments when using S3 storage (#7677)
* Support private uploads in S3
* Use localStore for local avatars
* Add job to update private upload ACL on S3
* Test multisite paths
* update ACL for private uploads in migrate_to_s3 task
2019-06-06 13:27:24 +10:00
Guo Xiang Tan
a3938f98f8 Revert changes to FileStore::S3Store#path_for in f0620e7118.
There are some places in the code base that assumes the method should
return nil.
2019-05-29 18:39:07 +08:00
Guo Xiang Tan
f0620e7118 FEATURE: Support [description|attachment](upload://<short-sha>) in MD take 2.
Previous attempt was missing `post_uploads` records.
2019-05-29 09:26:32 +08:00
Penar Musaraj
7c9fb95c15 Temporarily revert "FEATURE: Support [description|attachment](upload://<short-sha>) in MD. (#7603)"
This reverts commit b1d3c678ca.

We need to make sure post_upload records are correctly stored.
2019-05-28 16:37:01 -04:00
Guo Xiang Tan
b1d3c678ca FEATURE: Support [description|attachment](upload://<short-sha>) in MD. (#7603) 2019-05-28 11:18:21 -04:00
David Taylor
ef660d5a3e FIX: Return consistent character encodings when downloading S3 uploads
Net::HTTP always returns ASCII-8BIT encoding. File.read auto-detects the encoding. This leads to an encoding inconsistency between a fresh download, and a cached download. This commit ensures all downloaded files are treated equally, by always returning the cached version from the filesystem, even during initial download.

One symptom of this problem is during theme exports: https://meta.discourse.org/t/116907

Related ruby ticket: https://bugs.ruby-lang.org/issues/2567
2019-05-17 11:27:00 +01:00
Sam Saffron
30990006a9 DEV: enable frozen string literal on all files
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.

Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
2019-05-13 09:31:32 +08:00
Guo Xiang Tan
5b934cb33d FIX: Error when trying to move the same file to tombstone.
If an optimized image is destroyed when a previous similar optimized
image is already placed in the tombstone, `FileUtils.move` will blow up.
2019-04-24 16:47:36 +08:00
Guo Xiang Tan
243fb8d9ad Fix the build. 2019-03-13 17:39:07 +08:00
Guo Xiang Tan
b0c8fdd7da FIX: Properly support defaults for upload site settings. 2019-03-13 16:36:57 +08:00
Régis Hanol
664e90bd17 FIX: ensure local images use local CDN when uploads are stored on S3
When the S3 store was enabled, we were only applying the S3 CDN.
So all images stored locally, like the emojis, were never put on the local CDN.

Fixed a bunch of CookedPostProcessor test by adding a call to 'optimize_urls'
in order to get final URLs.

I also removed the unnecessary PrettyText.add_s3_cdn method since this is already
handled in the CookedPostProcessor.
2019-02-20 19:24:38 +01:00
Vinoth Kannan
563b953224 DEV: Add 'backfill_etags_' to the method name since it also backfilling the etags 2019-02-19 21:54:35 +05:30
Vinoth Kannan
0472bd4adc FIX: Remove 'backfill_etags' keyword argument from 'uploads:missing' rake task
And etags backfilling code is optimized
2019-02-15 00:34:35 +05:30
Vinoth Kannan
7b5931013a Update rake task to backfill etags from s3 inventory 2019-02-14 05:18:06 +05:30
Vinoth Kannan
b4f713ca52
FEATURE: Use amazon s3 inventory to manage upload stats (#6867) 2019-02-01 10:10:48 +05:30
Vinoth Kannan
f94c0283b2
FIX: Use correct version when generating file path for optimized image (#6871) 2019-01-11 18:35:38 +05:30
Vinoth Kannan
75dbb98cca FEATURE: Add S3 etag value to uploads table (#6795) 2019-01-04 14:16:22 +08:00
Régis Hanol
5381096bfd PERF: new 'migrate_to_s3' rake task 2018-12-26 17:34:49 +01:00
Rishabh
cae5ba7356 FIX: Ensure that multisite s3 uploads are tombstoned correctly (#6769)
* FIX: Ensure that multisite uploads are tombstoned into the correct paths

* Move multisite specs to spec/multisite/s3_store_spec.rb
2018-12-19 13:32:32 +08:00
Rishabh
503ae1829f FIX: All multisite upload paths should start with /uploads/default/.. (#6707) 2018-12-03 12:04:14 +08:00
Rishabh
871d4543cc FIX: Use File.join for relative_base_url, fix spec 2018-11-29 09:49:56 +05:30
Rishabh
05a4f3fb51 FEATURE: Multisite support for S3 image stores (#6689)
* FEATURE: Multisite support for S3 image stores

* Use File.join to concatenate all paths & fix linting on multisite/s3_store_spec.rb
2018-11-29 12:11:48 +08:00
Vinoth Kannan
bcdf5b2f47 DEV: improve missing uploads query and skip checking file size 2018-11-27 02:21:33 +05:30
Vinoth Kannan
4ccf9d28eb Remove trailing whitespaces 2018-11-27 01:15:29 +05:30
Vinoth Kannan
fd272eee44 FEATURE: Make uploads:missing task compatible with s3 uploads 2018-11-27 00:54:51 +05:30
Guo Xiang Tan
ce6a0a5e9e FIX: Moving upload to tombstone should update modification time.
A upload created a long time ago will be nuked from the tombstone
immediately if it gets deleted.
2018-09-18 10:48:29 +08:00
Guo Xiang Tan
e1b16e445e Rename FileHelper.is_image? -> FileHelper.is_supported_image?. 2018-09-12 09:22:28 +08:00
Guo Xiang Tan
8496537590 Add RECOVER_FROM_S3 to uploads:list_posts_with_broken_images rake task. 2018-09-10 15:14:30 +08:00
Sam
5d96809abd FIX: improve support for subfolder S3 CDN 2018-08-22 12:31:13 +10:00
Sam
f5142861e5 Revert "Revert "FIX: upload URLs from S3 on subfolder installs""
This reverts commit 26c96e97e5.

We have no choice but to run this code
2018-08-22 11:31:33 +10:00
Sam
26c96e97e5 Revert "FIX: upload URLs from S3 on subfolder installs"
This reverts commit 357df2ff4f.
2018-08-22 10:51:40 +10:00
Neil Lalonde
357df2ff4f FIX: upload URLs from S3 on subfolder installs 2018-08-21 14:58:55 -04:00
Guo Xiang Tan
aafff740d2 Add FileStore::S3Store#copy_file. 2018-08-08 11:30:34 +08:00
Andrew Schleifer
dba22bbde2 rollback changes
This reverts:
* 1baba84c438e "fix s3 subfolders harder"
* ea5e57938edf "fix test for absolute_base_url change"
2018-07-06 17:16:40 -05:00
Andrew Schleifer
52e9f49ec1 fix s3 subfolders harder
specifically, include the folder in absolute_base_url
2018-07-06 16:28:40 -05:00
Régis Hanol
448e2fe1a2 FIX: properly delete files in the download cache 2018-07-04 18:18:39 +02:00
Andrew Schleifer
4be0e31459 fix s3_cdn_url when the s3 bucket contains a folder 2018-05-23 15:51:02 -05:00
Régis Hanol
5f4f617689 FIX: cache_file storage cleanup logic was wrong
https://meta.discourse.org/t/68296
2018-01-18 17:00:04 +01:00
Sam
70bb2aa426 FEATURE: allow specifying s3 config via globals
This refactors handling of s3 so it can be specified via GlobalSetting

This means that in a multisite environment you can configure s3 uploads
without actual sites knowing credentials in s3

It is a critical setting for situations where assets are mirrored to s3.
2017-10-06 16:20:01 +11:00
Guo Xiang Tan
8cc8010564 Maintain backwards compatibility before Jobs::MigrateUploadExtensions runs. 2017-08-03 11:56:55 +09:00
Neil Lalonde
83011045c8 fix rubocop offenses 2017-07-31 11:59:16 -04:00
Neil Lalonde
5d528f0d15 Merge pull request #4958 from dmacjam/search_posts_by_filetype
FEATURE: Search posts by filetype
2017-07-31 11:55:34 -04:00
Guo Xiang Tan
5012d46cbd Add rubocop to our build. (#5004) 2017-07-28 10:20:09 +09:00
Neil Lalonde
d8c27e3871 Merge branch 'master' into search_posts_by_filetype 2017-07-25 14:41:20 -04:00
Jakub Macina
677267ae78 Add onceoff job for uploads migration of column extension. Simplify filetype search and related rspec tests. 2017-07-12 17:19:27 +02:00
Jakub Macina
8c445e9f17 Fix backend code for searching by a filetype as a combination of uploads and topic links. Add rspec test for extracting file extension in upload. 2017-07-06 19:19:31 +02:00
Régis Hanol
5d63a7f4a6 FIX: pull hotlinked images even when they have no extension 2017-06-13 13:27:05 +02:00
Robin Ward
cdbe027c1c Refactor FileHelper to use keyword arguments. 2017-05-24 13:54:26 -04:00
Guo Xiang Tan
f534f041a0 FIX: Ensure directory exists. 2017-04-07 15:50:17 +08:00
Matt Palmer
da7a44064b Fix purge_tombstone for the brave new world of secure command execution 2017-03-22 10:27:07 +11:00
Guo Xiang Tan
e7c972ac89 FIX: Don't use backticks that take in inputs. 2017-03-17 15:33:51 +08:00
Régis Hanol
93dfc87b99 FIX: always set the 'content_type' when storing a file on S3 2016-10-17 19:16:29 +02:00
Guo Xiang Tan
3141c179f7 REFACTOR: Get bucket name from S3Helper. 2016-08-19 14:08:37 +08:00
Guo Xiang Tan
7ff1f6cb9d Allow custom bucket name for FileStore::S3Store. 2016-08-16 15:25:42 +08:00
Guo Xiang Tan
205be0d044 Remove unused require. 2016-08-15 21:58:55 +08:00
Guo Xiang Tan
0433163866 FEATURE: Support subfolders in SiteSetting.s3_backup_bucket. 2016-08-15 16:14:51 +08:00
Guo Xiang Tan
aa5de3c40a FEATURE: Support subfolders in S3 bucket name.
This commit also fixes a bug where s3 uploads are not
moved to a tombstone folder when removed.
2016-08-15 13:07:41 +08:00
Guo Xiang Tan
3378ee223f FIX: Incorrect path being passed to S3Store#remove_file. 2016-08-15 11:35:30 +08:00
Guo Xiang Tan
1779a9634a FIX: Make sure we raise an error when method is not implemented. 2016-08-12 11:43:57 +08:00
Hu Ming
f8a12d4940 Add support for AWS cn (#4327) 2016-07-14 16:56:09 +02:00
Régis Hanol
5169bcdb6e FIX: httpshttps ultra secure URLs 2016-06-30 16:55:01 +02:00
Régis Hanol
1caaf5208f move tombstone under 'uploads/' for easier deployment 2016-05-30 09:46:27 +02:00
Guo Xiang Tan
26084397c1
FIX: Check if file exists upfront. 2016-05-23 10:25:53 +08:00
Rafael dos Santos Silva
c2e0eeb089 Separate relative base_url and upload_path
This makes easier to reason about paths
2016-03-10 00:47:18 -03:00
Rafael dos Santos Silva
4d01328bc2 FIX: Image Lightbox on Subfolder Install
Since LocalSotre.is_relative wasn't subfolder aware, images on subfolder install weren't getting lightboxed.
2016-03-02 18:44:15 -03:00
Robin Ward
f155ff8270 FIX: Tests would fail if your test db's optimized image ids were high 2015-10-16 17:08:41 -04:00
Luke GB
acc05dd3a5 Fix LocalStore.remove_file to not raise if source doesn't exist
FileUtils.move actually ends up raising an "unknown file type" error if the file doesn't exist instead of Errno::ENOENT.

It's possible to rescue this, but in the end it's easier to just ask move not to throw an error, since we're going to throw it away anyway.
2015-07-12 14:14:12 +01:00
Régis Hanol
81a699e2b0 better support for mixed content 2015-06-01 17:49:58 +02:00