`rake uploads:analyze_missing` can be used get rich information regarding
uploads missing from s3 (where verified is false)
`rake uploads:fix_missing` is a work in progress task for automatically
correcting certain historic issues. At the moment it simply rebakes all
posts with missing uploads, but it will improve over time
Allow limiting the number of migrations to do at once, both to do migrations that
have impact limited to multiple off-peak usage hours to reduce user impact from
a migration, and to allow tests that do only a very small number for test
purposes. ("Give me a ping, Vasili. One ping only, please.")
Get rid of harmful each loop over uploads to update. Instead we put all the unique access control posts for the uploads into a map for fast access (vs using the slow .find through array) and look up the post when it is needed when looping through the uploads in batches.
On a Discourse instance with ~93k uploads, a simplified version of the old method takes > 1 minute, and a simplified version of the new method takes ~18s and uses a lot less memory.
* Add uploads:sync_s3_acls rake task to ensure the ACLs in S3 are the correct (public-read or private) setting based on upload security
* Improved uploads:disable_secure_media to be more efficient and provide better messages to the user.
* Rename uploads:ensure_correct_acl task to uploads:secure_upload_analyse_and_update as it does more than check the ACL
* Many improvements to uploads:secure_upload_analyse_and_update
* Make sure that upload.access_control_post is unscoped so deleted posts are still fetched, because they still affect the security of the upload.
* Add escape hatch for capture_stdout in the form of RAILS_ENABLE_TEST_STDOUT. If provided the capture_stdout code will be ignored, so you can see the output if you need.
The rake task was broken, because the addition of the
UploadSecurity check returned true/false instead of the
upload ID to determine which uploads to set secure.
Also it was rebaking the posts in the wrong place and
pretty inefficiently at that. Also it was rebaking before
the upload was being changed to secure in the DB.
This also updates the task to set the access_control_post_id
for all uploads. the first post the upload is linked to is used
for the access control. if the upload doesn't get changed to
secure this doesn't affect anything.
Added a spec for the rake task to cover common cases.
### General Changes and Duplication
* We now consider a post `with_secure_media?` if it is in a read-restricted category.
* When uploading we now set an upload's secure status straight away.
* When uploading if `SiteSetting.secure_media` is enabled, we do not check to see if the upload already exists using the `sha1` digest of the upload. The `sha1` column of the upload is filled with a `SecureRandom.hex(20)` value which is the same length as `Upload::SHA1_LENGTH`. The `original_sha1` column is filled with the _real_ sha1 digest of the file.
* Whether an upload `should_be_secure?` is now determined by whether the `access_control_post` is `with_secure_media?` (if there is no access control post then we leave the secure status as is).
* When serializing the upload, we now cook the URL if the upload is secure. This is so it shows up correctly in the composer preview, because we set secure status on upload.
### Viewing Secure Media
* The secure-media-upload URL will take the post that the upload is attached to into account via `Guardian.can_see?` for access permissions
* If there is no `access_control_post` then we just deliver the media. This should be a rare occurrance and shouldn't cause issues as the `access_control_post` is set when `link_post_uploads` is called via `CookedPostProcessor`
### Removed
We no longer do any of these because we do not reuse uploads by sha1 if secure media is enabled.
* We no longer have a way to prevent cross-posting of a secure upload from a private context to a public context.
* We no longer have to set `secure: false` for uploads when uploading for a theme component.
* Add a rake task to disable secure media. This sets all uploads to `secure: false`, changes the upload ACL to public, and rebakes all the posts using the uploads to make sure they point to the correct URLs. This is in a transaction for each upload with the upload being updated the last step, so if the task fails it can be resumed.
* Also allow viewing media via the secure url if secure media is disabled, redirecting to the normal CDN url, because otherwise media links will be broken while we go and rebake all the posts + update ACLs
This PR introduces a new secure media setting. When enabled, it prevent unathorized access to media uploads (files of type image, video and audio). When the `login_required` setting is enabled, then all media uploads will be protected from unauthorized (anonymous) access. When `login_required`is disabled, only media in private messages will be protected from unauthorized access.
A few notes:
- the `prevent_anons_from_downloading_files` setting no longer applies to audio and video uploads
- the `secure_media` setting can only be enabled if S3 uploads are already enabled and configured
- upload records have a new column, `secure`, which is a boolean `true/false` of the upload's secure status
- when creating a public post with an upload that has already been uploaded and is marked as secure, the post creator will raise an error
- when enabling or disabling the setting on a site with existing uploads, the rake task `uploads:ensure_correct_acl` should be used to update all uploads' secure status and their ACL on S3
Zeitwerk simplifies working with dependencies in dev and makes it easier reloading class chains.
We no longer need to use Rails "require_dependency" anywhere and instead can just use standard
Ruby patterns to require files.
This is a far reaching change and we expect some followups here.
* FIX: inline_uploads and subfolder
* if subfolder, also look for images with a path containing
cdn_url + relative_url_root
* FIX: migrate_to_s3 task and subfolder
* Support private uploads in S3
* Use localStore for local avatars
* Add job to update private upload ACL on S3
* Test multisite paths
* update ACL for private uploads in migrate_to_s3 task
Filename on disk may mismatch sha of file in some old 1X setups. This will
attempt to recover file even if sha1 mismatches. We had an old bug that
caused this.
This also adds `uploads:fix_relative_upload_links` which attempts to replace
urls of the format `/upload/default/...` with `upload://`
Rebaking posts can be expensive instead of blocking here simply mark posts
for rebake.
We can then work through them faster in other jobs, plus this should not
hold of a datacenter migration.
Previously this rake job would only run on a single site which is a bit
misleading
This also adds `VERBOSE=1 rake posts:missing_uploads` that will provide a
full report of missing uploads
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.
Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
No point moving all optimized image files to tombstone when the store is
changing. Also, `destroy_all` can easily blow memory since we are no
loading in batches.
- overrides :region and uses :endpoint when SiteSetting.s3_endpoint is provided
- Now, we can use the new rake task with DigitalOcean Spaces
- I've tested that it's compatible with/without bucket folder path
- I've tested that it's compatible with S3 and it doesn't break S3 for non-default regions
- follow-up on 97e17fe0
This commit makes the rake task operational for all regions for s3. If we declare s3_endpoint as https://s3.amazonaws.com while
creating an instance of Aws::S3::Client, head_bucket fails for all s3 regions apart from us-east-1. The commit manually defines all
parameters for Aws::S3::Client apart from s3_endpoint to bypass this problem make this task usable for AWS S3.
Removing s3_endpoint from the payload means that custom endpoints like Minio/DO Spaces for will not work in the meantime and we'll
have to add support for a custom `s3_endpoint` in the future.
This commit follows up on 60790eb0.
* Prioritizes non-image uploads
* Does one remap per upload instead of 3 remaps previously
* Every 100 uploads migrated, do 2 remaps which fixes broken
URLs
* Exclude email_logs table from remap