Commit Graph

211 Commits

Author SHA1 Message Date
David Taylor
451b9b245f
DEV: Rename new upload rake tasks
These tasks are s3-specific, so update the names to make that very clear
2020-08-12 23:26:13 +01:00
Sam Saffron
ec173a72d9
DEV: add more counts to analyze missing 2020-08-12 17:28:58 +10:00
Sam Saffron
97fe68e980
DEV: look for avatars when analyzing missing uploads 2020-08-12 14:04:21 +10:00
Sam Saffron
990cd8c2e4
FEATURE: introduce tasks for dealing with legacy broken uploads
`rake uploads:analyze_missing` can be used get rich information regarding
uploads missing from s3 (where verified is false)

`rake uploads:fix_missing` is a work in progress task for automatically
correcting certain historic issues. At the moment it simply rebakes all
posts with missing uploads, but it will improve over time
2020-08-12 13:33:04 +10:00
Michael K Johnson
81e6bc7a0f
FEATURE: Add uploads:batch_migrate_from_s3 task to limit total posts migrated at once (#9933)
Allow limiting the number of migrations to do at once, both to do migrations that
have impact limited to multiple off-peak usage hours to reduce user impact from
a migration, and to allow tests that do only a very small number for test
purposes. ("Give me a ping, Vasili. One ping only, please.")
2020-06-04 09:48:11 +10:00
Martin Brennan
6f978bc95c
FIX: First pass to improve efficiency of secure uploads rake task (#9284)
Get rid of harmful each loop over uploads to update. Instead we put all the unique access control posts for the uploads into a map for fast access (vs using the slow .find through array) and look up the post when it is needed when looping through the uploads in batches.

On a Discourse instance with ~93k uploads, a simplified version of the old method takes > 1 minute, and a simplified version of the new method takes ~18s and uses a lot less memory.
2020-03-26 15:59:57 +10:00
Martin Brennan
efd5fb665b
DEV: Fix flaky time sensitive uploads.rake specs (#9283)
Also fix issues in spec where certain uploads were not considered secure
2020-03-26 13:31:39 +10:00
Martin Brennan
0388653a4d
DEV: Upload and secure media retroactive rake task improvements (#9027)
* Add uploads:sync_s3_acls rake task to ensure the ACLs in S3 are the correct (public-read or private) setting based on upload security

* Improved uploads:disable_secure_media to be more efficient and provide better messages to the user.

* Rename uploads:ensure_correct_acl task to uploads:secure_upload_analyse_and_update as it does more than check the ACL

* Many improvements to uploads:secure_upload_analyse_and_update

* Make sure that upload.access_control_post is unscoped so deleted posts are still fetched, because they still affect the security of the upload.

* Add escape hatch for capture_stdout in the form of RAILS_ENABLE_TEST_STDOUT. If provided the capture_stdout code will be ignored, so you can see the output if you need.
2020-03-03 10:03:58 +11:00
Martin Brennan
cfd56e9159 Include access control post when loading uploads in rake task
* to avoid N+1 query
2020-02-18 10:35:15 +10:00
Martin Brennan
9dcc454a07
FIX: Improvements and fixes for update_upload_acl rake task (#8980)
The rake task was broken, because the addition of the
UploadSecurity check returned true/false instead of the
upload ID to determine which uploads to set secure.
Also it was rebaking the posts in the wrong place and
pretty inefficiently at that. Also it was rebaking before
the upload was being changed to secure in the DB.
This also updates the task to set the access_control_post_id
for all uploads. the first post the upload is linked to is used
for the access control. if the upload doesn't get changed to
secure this doesn't affect anything.
Added a spec for the rake task to cover common cases.
2020-02-17 14:21:43 +10:00
Gerhard Schlager
4e8be6f18b FIX: uploads:s3_migration_status rake task was broken 2020-01-28 22:10:25 +01:00
Martin Brennan
7c32411881
FEATURE: Secure media allowing duplicated uploads with category-level privacy and post-based access rules (#8664)
### General Changes and Duplication

* We now consider a post `with_secure_media?` if it is in a read-restricted category.
* When uploading we now set an upload's secure status straight away.
* When uploading if `SiteSetting.secure_media` is enabled, we do not check to see if the upload already exists using the `sha1` digest of the upload. The `sha1` column of the upload is filled with a `SecureRandom.hex(20)` value which is the same length as `Upload::SHA1_LENGTH`. The `original_sha1` column is filled with the _real_ sha1 digest of the file. 
* Whether an upload `should_be_secure?` is now determined by whether the `access_control_post` is `with_secure_media?` (if there is no access control post then we leave the secure status as is).
* When serializing the upload, we now cook the URL if the upload is secure. This is so it shows up correctly in the composer preview, because we set secure status on upload.

### Viewing Secure Media

* The secure-media-upload URL will take the post that the upload is attached to into account via `Guardian.can_see?` for access permissions
* If there is no `access_control_post` then we just deliver the media. This should be a rare occurrance and shouldn't cause issues as the `access_control_post` is set when `link_post_uploads` is called via `CookedPostProcessor`

### Removed

We no longer do any of these because we do not reuse uploads by sha1 if secure media is enabled.

* We no longer have a way to prevent cross-posting of a secure upload from a private context to a public context.
* We no longer have to set `secure: false` for uploads when uploading for a theme component.
2020-01-16 13:50:27 +10:00
Gerhard Schlager
e474cda321 REFACTOR: Restoring of backups and migration of uploads to S3 2020-01-14 11:41:35 +01:00
Martin Brennan
abca91cc4d
FEATURE: Add rake task to disable secure media (#8669)
* Add a rake task to disable secure media. This sets all uploads to `secure: false`, changes the upload ACL to public, and rebakes all the posts using the uploads to make sure they point to the correct URLs. This is in a transaction for each upload with the upload being updated the last step, so if the task fails it can be resumed.
* Also allow viewing media via the secure url if secure media is disabled, redirecting to the normal CDN url, because otherwise media links will be broken while we go and rebake all the posts + update ACLs
2020-01-07 12:27:24 +10:00
David Taylor
46d8fd3831 FIX: Allow for nil upload record when migrating to S3 2019-12-04 15:13:39 +00:00
Penar Musaraj
102909edb3 FEATURE: Add support for secure media (#7888)
This PR introduces a new secure media setting. When enabled, it prevent unathorized access to media uploads (files of type image, video and audio). When the `login_required` setting is enabled, then all media uploads will be protected from unauthorized (anonymous) access. When `login_required`is disabled, only media in private messages will be protected from unauthorized access. 

A few notes: 

- the `prevent_anons_from_downloading_files` setting no longer applies to audio and video uploads
- the `secure_media` setting can only be enabled if S3 uploads are already enabled and configured
- upload records have a new column, `secure`, which is a boolean `true/false` of the upload's secure status
- when creating a public post with an upload that has already been uploaded and is marked as secure, the post creator will raise an error
- when enabling or disabling the setting on a site with existing uploads, the rake task `uploads:ensure_correct_acl` should be used to update all uploads' secure status and their ACL on S3
2019-11-18 11:25:42 +10:00
Krzysztof Kotlarek
427d54b2b0 DEV: Upgrading Discourse to Zeitwerk (#8098)
Zeitwerk simplifies working with dependencies in dev and makes it easier reloading class chains. 

We no longer need to use Rails "require_dependency" anywhere and instead can just use standard 
Ruby patterns to require files.

This is a far reaching change and we expect some followups here.
2019-10-02 14:01:53 +10:00
Michael Brown
503a11cc88 FIX: inline_uploads and subfolder (#8076)
* FIX: inline_uploads and subfolder

* if subfolder, also look for images with a path containing
  cdn_url + relative_url_root

* FIX: migrate_to_s3 task and subfolder
2019-09-11 11:50:48 +10:00
Vinoth Kannan
e44d56e4d2 DEV: raise error only when 'STOP_ON_ERROR' env variable is available. 2019-08-01 23:54:06 +05:30
Guo Xiang Tan
d21594f4f7 Revert changes added by mistake in 2b19e2acc8. 2019-06-25 15:25:12 +08:00
Guo Xiang Tan
2b19e2acc8 Fix typo in a0aeabbb94. 2019-06-25 15:18:57 +08:00
Penar Musaraj
f00275ded3 FEATURE: Support private attachments when using S3 storage (#7677)
* Support private uploads in S3
* Use localStore for local avatars
* Add job to update private upload ACL on S3
* Test multisite paths
* update ACL for private uploads in migrate_to_s3 task
2019-06-06 13:27:24 +10:00
Gerhard Schlager
f7a2648694 FEATURE: Migrate uploads to S3 during restore 2019-06-04 15:47:36 +02:00
Rafael dos Santos Silva
725588f835 FIX: migrate_to_s3 wasn't IAM profile aware 2019-06-01 12:09:46 -03:00
Vinoth Kannan
e12ae453e9 FIX: verify the exitence of s3_object properly without db name 2019-05-29 15:10:36 +05:30
Vinoth Kannan
9a9a06e34b DEV: add option to skip etag verification on 'migrate_to_s3' rake task 2019-05-29 14:16:36 +05:30
Vinoth Kannan
b3779dc377 DEV: rename 'uploads:missing' rake task into 'uploads:missing_files'.
To improve the readability.
2019-05-28 23:30:43 +05:30
Sam Saffron
7ce58df7bf lint the file
somehow I did not notice this on save
2019-05-23 15:28:41 +10:00
Sam Saffron
a5ce9cb470 FEATURE: fix_relative_upload_links now multisite safety
This also finds `<img src="/uploads/xyz` HTML images in raw and corrects
them. Also handles some cross multisite recovery and provides better output
2019-05-23 15:09:16 +10:00
Sam Saffron
e8799f0ba4 DEV: improve uploads:recover job so it stores a map of old to new sha
Previous attempt created broken images
2019-05-22 15:51:09 +10:00
Sam Saffron
ebcb571de7 FIX: allow upload recovery to recover uploads with sha mismatch
Filename on disk may mismatch sha of file in some old 1X setups. This will
attempt to recover file even if sha1 mismatches. We had an old bug that
caused this.

This also adds `uploads:fix_relative_upload_links` which attempts to replace
urls of the format `/upload/default/...` with `upload://`
2019-05-22 15:24:36 +10:00
Sam Saffron
f772ecc597 DEV: Correct missing output detailing failure
uploads:s3_migration_status was failing but not returning proper output
2019-05-22 12:58:54 +10:00
Guo Xiang Tan
5429c9b5e9 DEV: Fix incorrectly hardcoded value in rake task. 2019-05-22 09:01:25 +08:00
Sam Saffron
a2428bd862 FEATURE: warn about sidekiq overload prior to migrating
Also makes pre-flight check ensure there is no giant backlog of posts that
need to be cooked
2019-05-22 10:04:33 +10:00
Sam Saffron
5fdc7b7ca2 Correct 59012fc0
Logic was flipped here by mistake, oops
2019-05-22 09:48:03 +10:00
Sam Saffron
73f178a634 FEATURE: posts:rebake_uncooked_posts to look at mismatching baked_version
also amends flagging onebox updates to set baked_version to nil
2019-05-22 09:43:31 +10:00
Sam Saffron
4f296608da FEATURE: add uploads:s3_migration_status for looking at current status
Also a few minor cleanups and better progress reporting
2019-05-22 09:00:32 +10:00
Sam Saffron
59012fc0f7 PERF: mark posts for rebake vs forcing a rebake inline when migrating to s3
Rebaking posts can be expensive instead of blocking here simply mark posts
for rebake.

We can then work through them faster in other jobs, plus this should not
hold of a datacenter migration.
2019-05-22 08:39:25 +10:00
Vinoth Kannan
7b82850f66 FIX: migrate_to_s3 task should remap attachment links too. 2019-05-21 21:58:11 +05:30
Sam Saffron
8360415453 FEATURE: big warning for uploads missing which can be very very slow on S3 2019-05-21 16:11:56 +10:00
Sam Saffron
cb86d8279a FEATURE: add toggle for uploads:missing so it can skip external
Validating s3 uploads in uploads:missing can be very expensive, allow to
bypass.
2019-05-21 16:11:56 +10:00
Sam Saffron
f4d4f7871e FEATURE: make posts:missing_uploads multisite friendly
Previously this rake job would only run on a single site which is a bit
misleading

This also adds `VERBOSE=1 rake posts:missing_uploads` that will provide a
full report of missing uploads
2019-05-21 12:45:51 +10:00
David Taylor
0fbff66d97 DEV: Correct rake task syntax error 2019-05-20 17:43:30 +01:00
David Taylor
31902159af DEV: Allow skipping failed migrations when running S3 migration
Use the SKIP_FAILED environment variable to skip failed sites. Use with caution - make sure you go back and re-run the failed migrations afterwards.
2019-05-20 17:25:56 +01:00
David Taylor
a15cca9a0f DEV: Improve error message for posts:missing_uploads during S3 migration 2019-05-20 16:09:22 +01:00
Vinoth Kannan
2bfc0cf145 FIX: skip old scheme upload URLs while validating s3 uploads remap 2019-05-20 19:13:41 +05:30
David Taylor
41bc90dd3e DEV: Add progress indicator for post rebake during s3 migration
Now that we run sidekiq jobs synchronously, this is important
2019-05-20 14:19:58 +01:00
David Taylor
77a06209cb DEV: Skip S3 migration if all uploads are already migrated
This makes the task resumable in a multisite context
2019-05-20 14:17:37 +01:00
Vinoth Kannan
bc0c4b7253 FIX: should not migrate the system uploads to s3 2019-05-20 14:27:34 +05:30
Vinoth Kannan
be3fb85a04 DEV: add post migration checks and raise error if failed. 2019-05-20 14:18:28 +05:30
Sam Saffron
08371db0cc FIX: ensure we don't queue any jobs during s3 migration
Previously we could flood sidekiq with jobs which is not ideal.

This ensures we are 100% done when the job is done.
2019-05-20 16:28:50 +10:00
Sam Saffron
30990006a9 DEV: enable frozen string literal on all files
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.

Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
2019-05-13 09:31:32 +08:00
David Taylor
20daa76829 DEV: Change priority to ultra_low for post-s3-migration rebake 2019-05-10 18:37:45 +08:00
Guo Xiang Tan
2adbec1b3c PERF: Speed up migrate_to_s3 rake task by deleting optimized image record.
No point moving all optimized image files to tombstone when the store is
changing. Also, `destroy_all` can easily blow memory since we are no
loading in batches.
2019-05-07 16:10:32 +08:00
Guo Xiang Tan
71e431de9e DEV: Fix hardcoded value introduced in cc2bac86e9. 2019-03-26 07:45:21 +08:00
Gerhard Schlager
cd4fd447ca Make Rubocop happy 2019-03-25 17:04:45 +01:00
Guo Xiang Tan
cc2bac86e9 FIX: Dry run broken for uploads:migrate_to_s3. 2019-03-25 22:38:24 +08:00
Guo Xiang Tan
19c3c25db1 FIX: Handle BBCode in migrate_to_s3 task as well. 2019-03-22 16:47:06 +08:00
Guo Xiang Tan
2d34be24be FIX: Rebake lightbox and use short upload urls in migrate_to_s3. 2019-03-22 13:09:59 +08:00
Guo Xiang Tan
4e594f2b2b FIX: Destroy optimized images in upload:migrate_to_s3 rake task.
`OptimizedImage` are currently not renegerated when the image changes
store.
2019-03-21 16:50:15 +08:00
Régis Hanol
9dbca41152 FIX: don't check system uploads in migrate_to_s3 rake task 2019-03-18 11:01:18 +01:00
Guo Xiang Tan
d82876896e FIX: uploads:migrate_to_s3 broken for GlobalSetting using file provider. 2019-03-11 14:21:35 +08:00
Régis Hanol
ad87b0d662
Make "uploads:recover_from_tombstone" call the "uploads:recover" rake task 2019-03-07 14:15:30 +01:00
Guo Xiang Tan
433f07fcb3 Fix confusing ENV in migrate_to_s3 rake task. 2019-02-22 14:44:46 +08:00
Rishabh
e69634ec3a FIX: Use s3_endpoint in migrate_to_s3 when not using S3
- overrides :region and uses :endpoint when SiteSetting.s3_endpoint is provided
- Now, we can use the new rake task with DigitalOcean Spaces
- I've tested that it's compatible with/without bucket folder path
- I've tested that it's compatible with S3 and it doesn't break S3 for non-default regions
- follow-up on 97e17fe0
2019-02-21 15:42:48 +05:30
Vinoth Kannan
0472bd4adc FIX: Remove 'backfill_etags' keyword argument from 'uploads:missing' rake task
And etags backfilling code is optimized
2019-02-15 00:34:35 +05:30
Vinoth Kannan
7b5931013a Update rake task to backfill etags from s3 inventory 2019-02-14 05:18:06 +05:30
Guo Xiang Tan
0cf2df3028 Fix remap in migrate_to_s3 rake task.
The current way of doing the remap only allows to run the rake task
once. Running the rake task more than once will end up badly.
2019-01-23 15:50:44 +08:00
Guo Xiang Tan
07850994d3 Add ENV to skip multisite prefix when migrating to s3. 2019-01-23 15:19:50 +08:00
Guo Xiang Tan
979d03aa68 Remove s3 bucket check in migrate_to_s3 task.
Bucket creation is expected to be handled by the user. If the bucket
does not exist, the script will fail anyway.
2019-01-23 15:04:51 +08:00
Guo Xiang Tan
99cd3ff6ee FIX: migrate_to_s3 task not setting the right content_disposition. 2019-01-23 15:04:47 +08:00
Rishabh
97e17fe084 FIX: Use ENV values instead of 'S3Helper.s3_options' in migrate_to_s3
This commit makes the rake task operational for all regions for s3. If we declare s3_endpoint as https://s3.amazonaws.com while
creating an instance of Aws::S3::Client, head_bucket fails for all s3 regions apart from us-east-1. The commit manually defines all
parameters for Aws::S3::Client apart from s3_endpoint to bypass this problem make this task usable for AWS S3.

Removing s3_endpoint from the payload means that custom endpoints like Minio/DO Spaces for will not work in the meantime and we'll
have to add support for a custom `s3_endpoint` in the future.

This commit follows up on 60790eb0.
2019-01-20 20:55:27 +05:30
Rishabh
60790eb006 FIX: Use GlobalSetting values instead of ENV variables in migrate_to_s3
TIL how GlobalSetting works in sync with environment variables
Also fixes a small bug where bucket value was being used when it could have been nil
2019-01-16 14:40:38 +05:30
Rishabh
ff8f9dc1c9 FIX: prefix should precede folder path (follow-up on 10fbb07e) 2019-01-15 15:58:19 +05:30
Rishabh
10fbb07e1a FIX: include folder name in prefix for listing files on S3 (follow-up on 3ec38f5a)
Fix the destination url in remap since it's already a part of s3_base_url
2019-01-15 15:23:55 +05:30
Régis Hanol
3ec38f5a3b Revert "FIX: migrate_to_s3 rake task with folder path"
This reverts commit 97fd12e8af.
2019-01-08 19:44:31 +01:00
Régis Hanol
97fd12e8af
FIX: migrate_to_s3 rake task with folder path 2019-01-08 18:56:18 +01:00
Rishabh
f181e9cc08
FIX: Add compatibility for bucket folder paths in migrate_to_s3 task (#6855)
* FIX: Add compatibility for bucket folder paths in migrate_to_s3 task
* Refactor bucket_name split logic into S3Helper
2019-01-08 20:04:48 +05:30
Rishabh
efc481d9c0 DEV: Use puts instead of printing newline (follow up on c5b7bda1) 2019-01-05 01:20:00 +05:30
Rishabh
c5b7bda198 DEV: Show migrate_to_s3 output on a new line 2019-01-04 18:09:54 +05:30
Guo Xiang Tan
9e50813252 FIX: Pass all necessary options in migrate_to_s3 rake task. 2019-01-02 09:11:23 +08:00
Régis Hanol
5381096bfd PERF: new 'migrate_to_s3' rake task 2018-12-26 17:34:49 +01:00
Vinoth Kannan
fd272eee44 FEATURE: Make uploads:missing task compatible with s3 uploads 2018-11-27 00:54:51 +05:30
Vinoth Kannan
bc41057949 minor copy edit 2018-11-20 12:07:56 +05:30
Vinoth Kannan
1a9a2bd5c1 DEV: Report the missing uploads count 2018-11-19 12:06:46 +05:30
Rishabh
4a12cfaecb Remove trailing whitespace for Rubocop 2018-11-13 17:19:26 +05:30
Brian Helba
ea94323766 FIX: 'migrate_from_s3' rake task should respect max sizes (#6598)
Rather than hardcode a maximum size of 20MB for uploads migrated from S3, the task should use site settings for this value.
2018-11-13 12:27:38 +01:00
Guo Xiang Tan
14ff47f6f1 Fix typo. 2018-11-08 16:42:12 +08:00
Guo Xiang Tan
7290145641 PERF: Speed up migrate_to_s3 rake task.
* Prioritizes non-image uploads
* Does one remap per upload instead of 3 remaps previously
* Every 100 uploads migrated, do 2 remaps which fixes broken
  URLs
* Exclude email_logs table from remap
2018-11-08 16:39:56 +08:00
Guo Xiang Tan
0232a3b5e5 PERF: Exclude tables when remapping in migrate_to_s3 rake task. 2018-11-08 12:37:36 +08:00
Guo Xiang Tan
918633aa12 FIX: upload:migrate_to_s3 rake task not remapping properly. 2018-10-10 15:09:21 +08:00
Guo Xiang Tan
24c55bd613 Add dry run option to UploadRecovery. 2018-09-12 21:53:01 +08:00
Guo Xiang Tan
c053f8ccf6 New rake task uploads:recover. 2018-09-12 01:52:30 -07:00
Guo Xiang Tan
6d01e0aa04 DEV: Print the error class in uploads:list_posts_with_broken_images. 2018-09-12 01:06:51 -07:00
Guo Xiang Tan
94ff428571 Pass the right value to rake task. 2018-09-10 20:07:28 +08:00
Guo Xiang Tan
4a966c639d DEV: Update uploads:list_posts_with_broken_images to recover from tombstone. 2018-09-10 17:01:41 +08:00
Guo Xiang Tan
68572b8afc Print error messages on why upload fails to save. 2018-09-10 16:02:13 +08:00
Guo Xiang Tan
0aca80e92a Fixes to uploads:list_posts_with_broken_images. 2018-09-10 15:16:29 +08:00
Guo Xiang Tan
8496537590 Add RECOVER_FROM_S3 to uploads:list_posts_with_broken_images rake task. 2018-09-10 15:14:30 +08:00
Guo Xiang Tan
72834f19ff DEV: Add rake tasks to list posts with broken images. 2018-09-05 16:54:15 +08:00