This commit adds some system specs to test uploads with
direct to S3 single and multipart uploads via uppy. This
is done with minio as a local S3 replacement. We are doing
this to catch regressions when uppy dependencies need to
be upgraded or we change uppy upload code, since before
this there was no way to know outside manual testing whether
these changes would cause regressions.
Minio's server lifecycle and the installed binaries are managed
by the https://github.com/discourse/minio_runner gem, though the
binaries are already installed on the discourse_test image we run
GitHub CI from.
These tests will only run in CI unless you specifically use the
CI=1 or RUN_S3_SYSTEM_SPECS=1 env vars.
For a history of experimentation here see https://github.com/discourse/discourse/pull/22381
Related PRs:
* https://github.com/discourse/minio_runner/pull/1
* https://github.com/discourse/minio_runner/pull/2
* https://github.com/discourse/minio_runner/pull/3
This method is a huge footgun in production, since it calls
the Redis KEYS command. From the Redis documentation at
https://redis.io/commands/keys/:
> Warning: consider KEYS as a command that should only be used in
production environments with extreme care. It may ruin performance when
it is executed against large databases. This command is intended for
debugging and special operations, such as changing your keyspace layout.
Don't use KEYS in your regular application code.
Since we were only using `delete_prefixed` in specs (now that we
removed the usage in production in 24ec06ff85)
we can remove this and instead rely on `use_redis_snapshotting` on the
particular tests that need this kind of clearing functionality.
When uploading images via direct to S3 upload, we were
assuming that we could not pre-emptively check the file
size because the client may do preprocessing to reduce
the size, and UploadCreator could also further reduce the
size.
This, however, is not true of gifs, so we would have an
issue where you upload a gif > the max_image_size_kb
setting and had to wait until the upload completed for
this error to show.
Now, instead, when we direct upload gifs to S3, we check
the size straight away and present a file size error to
the user rather than making them wait. This will increase
meme efficiency by approximately 1000%.
Way back when this was introduced way back in b96c10a903
I didn't have any frame of reference for what these max rate
limit numbers should be, so 10 seemed like a reasonable limit
until a real world case where this did not make sense came
along.
The time has come.
Moving these into site settings, which are hidden since in most
cases there is no need to change these.
If a secure upload's access_control_post was trashed, and an anon user
tried to look at that upload, they would get a 500 error rather than
the correct 403 because of an error inside the PostGuardian logic.
This commit renames all secure_media related settings to secure_uploads_* along with the associated functionality.
This is being done because "media" does not really cover it, we aren't just doing this for images and videos etc. but for all uploads in the site.
Additionally, in future we want to secure more types of uploads, and enable a kind of "mixed mode" where some uploads are secure and some are not, so keeping media in the name is just confusing.
This also keeps compatibility with the `secure-media-uploads` path, and changes new
secure URLs to be `secure-uploads`.
Deprecated settings:
* secure_media -> secure_uploads
* secure_media_allow_embed_images_in_emails -> secure_uploads_allow_embed_images_in_emails
* secure_media_max_email_embed_image_size_kb -> secure_uploads_max_email_embed_image_size_kb
Previously we hardcoded the DOWNLOAD_URL_EXPIRES_AFTER_SECONDS const
inside S3Helper to be 5 minutes (300 seconds). For various reasons,
some hosted sites may need this to be longer for other integrations.
The maximum expiry time for presigned URLs is 1 week (which is
604800 seconds), so that has been added as a validation on the
setting as well. The setting is hidden because 99% of the time
it should not be changed.
This was causing issues on some sites, having the const, because this really is heavily
dependent on upload speed. We request 5-10 URLs at a time with this endpoint; for
a 1.5GB upload with 5mb parts this could mean 60 requests to the server to get all
the part URLs. If the user's upload speed is super fast they may request all 60
batches in a minute, if it is slow they may request 5 batches in a minute.
The other external upload endpoints are not hit as often, so they can stay as constant
values for now. This commit also increases the default to 20 requests/minute.
When creating files with create-multipart, if the file
size was somehow zero we were showing a very unhelpful
error message to the user. Now we show a nicer message,
and proactively don't call the API if we know the file
size is 0 bytes in JS, along with extra console logging
to help with debugging.
It's very easy to forget to add `require 'rails_helper'` at the top of every core/plugin spec file, and omissions can cause some very confusing/sporadic errors.
By setting this flag in `.rspec`, we can remove the need for `require 'rails_helper'` entirely.
This allows text editors to use correct syntax coloring for the heredoc sections.
Heredoc tag names we use:
languages: SQL, JS, RUBY, LUA, HTML, CSS, SCSS, SH, HBS, XML, YAML/YML, MF, ICS
other: MD, TEXT/TXT, RAW, EMAIL
This commit refactors the direct external upload routes (get presigned
put, complete external, create/abort/complete multipart) into a
helper which is then included in both BackupController and the
UploadController. This is done so UploadController doesn't need
strange backup logic added to it, and so each controller implementing
this helper can do their own validation/error handling nicely.
This is a follow up to e4350bb966
This PR introduces a new `enable_experimental_backup_uploads` site setting (default false and hidden), which when enabled alongside `enable_direct_s3_uploads` will allow for direct S3 multipart uploads of backup .tar.gz files.
To make multipart external uploads work with both the S3BackupStore and the S3Store, I've had to move several methods out of S3Store and into S3Helper, including:
* presigned_url
* create_multipart
* abort_multipart
* complete_multipart
* presign_multipart_part
* list_multipart_parts
Then, S3Store and S3BackupStore either delegate directly to S3Helper or have their own special methods to call S3Helper for these methods. FileStore.temporary_upload_path has also removed its dependence on upload_path, and can now be used interchangeably between the stores. A similar change was made in the frontend as well, moving the multipart related JS code out of ComposerUppyUpload and into a mixin of its own, so it can also be used by UppyUploadMixin.
Some changes to ExternalUploadManager had to be made here as well. The backup direct uploads do not need an Upload record made for them in the database, so they can be moved to their final S3 resting place when completing the multipart upload.
This changeset is not perfect; it introduces some special cases in UploadController to handle backups that was previously in BackupController, because UploadController is where the multipart routes are located. A subsequent pull request will pull these routes into a module or some other sharing pattern, along with hooks, so the backup controller and the upload controller (and any future controllers that may need them) can include these routes in a nicer way.
We are only using list_multipart_parts right now in the
uploads controller for multipart uploads to check if the
upload exists; thus we don't need up to 1000 parts.
Also adding a note for future explorers that list_multipart_parts
only gets 1000 parts max, and adding params for max parts
and starting parts.
The file size error messages for max_image_size_kb and
max_attachment_size_kb are shown to the user in the KB
format, regardless of how large the limit is. Since we
are going to support uploading much larger files soon,
this KB-based limit soon becomes unfriendly to the end
user.
For example, if the max attachment size is set to 512000
KB, this is what the user sees:
> Sorry, the file you are trying to upload is too big (maximum
size is 512000KB)
This makes the user do math. In almost all file explorers that
a regular user would be familiar width, the file size is shown
in a format based on the maximum increment (e.g. KB, MB, GB).
This commit changes the behaviour to output a humanized file size
instead of the raw KB. For the above example, it would now say:
> Sorry, the file you are trying to upload is too big (maximum
size is 512 MB)
This humanization also handles decimals, e.g. 1536KB = 1.5 MB
The generate_presigned_put endpoint for direct external uploads
(such as the one for the uppy-image-uploader) records allowed
S3 metadata values on the uploaded object. We use this to store
the sha1-checksum generated by the UppyChecksum plugin, for later
comparison in ExternalUploadManager.
However, we were not doing this for the create_multipart endpoint,
so the checksum was never captured and compared correctly.
Also includes a fix to make sure UppyChecksum is the last preprocessor to run.
It is important that the UppyChecksum preprocessor is the last one to
be added; the preprocessors are run in order and since other preprocessors
may modify the file (e.g. the UppyMediaOptimization one), we need to
checksum once we are sure the file data has "settled".
This pull request introduces the endpoints required, and the JavaScript functionality in the `ComposerUppyUpload` mixin, for direct S3 multipart uploads. There are four new endpoints in the uploads controller:
* `create-multipart.json` - Creates the multipart upload in S3 along with an `ExternalUploadStub` record, storing information about the file in the same way as `generate-presigned-put.json` does for regular direct S3 uploads
* `batch-presign-multipart-parts.json` - Takes a list of part numbers and the unique identifier for an `ExternalUploadStub` record, and generates the presigned URLs for those parts if the multipart upload still exists and if the user has permission to access that upload
* `complete-multipart.json` - Completes the multipart upload in S3. Needs the full list of part numbers and their associated ETags which are returned when the part is uploaded to the presigned URL above. Only works if the user has permission to access the associated `ExternalUploadStub` record and the multipart upload still exists.
After we confirm the upload is complete in S3, we go through the regular `UploadCreator` flow, the same as `complete-external-upload.json`, and promote the temporary upload S3 into a full `Upload` record, moving it to its final destination.
* `abort-multipart.json` - Aborts the multipart upload on S3 and destroys the `ExternalUploadStub` record if the user has permission to access that upload.
Also added are a few new columns to `ExternalUploadStub`:
* multipart - Whether or not this is a multipart upload
* external_upload_identifier - The "upload ID" for an S3 multipart upload
* filesize - The size of the file when the `create-multipart.json` or `generate-presigned-put.json` is called. This is used for validation.
When the user completes a direct S3 upload, either regular or multipart, we take the `filesize` that was captured when the `ExternalUploadStub` was first created and compare it with the final `Content-Length` size of the file where it is stored in S3. Then, if the two do not match, we throw an error, delete the file on S3, and ban the user from uploading files for N (default 5) minutes. This would only happen if the user uploads a different file than what they first specified, or in the case of multipart uploads uploaded larger chunks than needed. This is done to prevent abuse of S3 storage by bad actors.
Also included in this PR is an update to vendor/uppy.js. This has been built locally from the latest uppy source at d613b849a6. This must be done so that I can get my multipart upload changes into Discourse. When the Uppy team cuts a proper release, we can bump the package.json versions instead.
This adds a few different things to allow for direct S3 uploads using uppy. **These changes are still not the default.** There are hidden `enable_experimental_image_uploader` and `enable_direct_s3_uploads` settings that must be turned on for any of this code to be used, and even if they are turned on only the User Card Background for the user profile actually uses uppy-image-uploader.
A new `ExternalUploadStub` model and database table is introduced in this pull request. This is used to keep track of uploads that are uploaded to a temporary location in S3 with the direct to S3 code, and they are eventually deleted a) when the direct upload is completed and b) after a certain time period of not being used.
### Starting a direct S3 upload
When an S3 direct upload is initiated with uppy, we first request a presigned PUT URL from the new `generate-presigned-put` endpoint in `UploadsController`. This generates an S3 key in the `temp` folder inside the correct bucket path, along with any metadata from the clientside (e.g. the SHA1 checksum described below). This will also create an `ExternalUploadStub` and store the details of the temp object key and the file being uploaded.
Once the clientside has this URL, uppy will upload the file direct to S3 using the presigned URL. Once the upload is complete we go to the next stage.
### Completing a direct S3 upload
Once the upload to S3 is done we call the new `complete-external-upload` route with the unique identifier of the `ExternalUploadStub` created earlier. Only the user who made the stub can complete the external upload. One of two paths is followed via the `ExternalUploadManager`.
1. If the object in S3 is too large (currently 100mb defined by `ExternalUploadManager::DOWNLOAD_LIMIT`) we do not download and generate the SHA1 for that file. Instead we create the `Upload` record via `UploadCreator` and simply copy it to its final destination on S3 then delete the initial temp file. Several modifications to `UploadCreator` have been made to accommodate this.
2. If the object in S3 is small enough, we download it. When the temporary S3 file is downloaded, we compare the SHA1 checksum generated by the browser with the actual SHA1 checksum of the file generated by ruby. The browser SHA1 checksum is stored on the object in S3 with metadata, and is generated via the `UppyChecksum` plugin. Keep in mind that some browsers will not generate this due to compatibility or other issues.
We then follow the normal `UploadCreator` path with one exception. To cut down on having to re-upload the file again, if there are no changes (such as resizing etc) to the file in `UploadCreator` we follow the same copy + delete temp path that we do for files that are too large.
3. Finally we return the serialized upload record back to the client
There are several errors that could happen that are handled by `UploadsController` as well.
Also in this PR is some refactoring of `displayErrorForUpload` to handle both uppy and jquery file uploader errors.
This PR adds the first use of Uppy in our codebase, hidden behind a enable_experimental_image_uploader site setting. When the setting is enabled only the user card background uploader will use the new uppy-image-uploader component added in this PR.
I've introduced an UppyUpload mixin that has feature parity with the existing Upload mixin, and improves it slightly to deal with multiple/single file distinctions and validations better. For now, this just supports the XHRUpload plugin for uppy, which keeps our existing POST to /uploads.json.
* FIX: Be able to handle long file extensions
Some applications have really long file extensions, but if we truncate
them weird behavior ensues.
This commit changes the file extension size from 10 characters to 255
characters instead.
See:
https://meta.discourse.org/t/182824
* Keep truncation at 10, but allow uppercase and dashes
The 'Discourse SSO' protocol is being rebranded to DiscourseConnect. This should help to reduce confusion when 'SSO' is used in the generic sense.
This commit aims to:
- Rename `sso_` site settings. DiscourseConnect specific ones are prefixed `discourse_connect_`. Generic settings are prefixed `auth_`
- Add (server-side-only) backwards compatibility for the old setting names, with deprecation notices
- Copy `site_settings` database records to the new names
- Rename relevant translation keys
- Update relevant translations
This commit does **not** aim to:
- Rename any Ruby classes or methods. This might be done in a future commit
- Change any URLs. This would break existing integrations
- Make any changes to the protocol. This would break existing integrations
- Change any functionality. Further normalization across DiscourseConnect and other auth methods will be done separately
The risks are:
- There is no backwards compatibility for site settings on the client-side. Accessing auth-related site settings in Javascript is fairly rare, and an error on the client side would not be security-critical.
- If a plugin is monkey-patching parts of the auth process, changes to locale keys could cause broken error messages. This should also be unlikely. The old site setting names remain functional, so security-related overrides will remain working.
A follow-up commit will be made with a post-deploy migration to delete the old `site_settings` rows.
The download link on the lightbox for images was not downloading the image if the upload was marked secure, because the code in the upload controller route was not respecting the dl=1 param for force download.
This PR fixes this so the download link works for secure images as well as regular ligthboxed images.
Extracted commonly used spec helpers into spec/support/uploads_helpers.rb, removed unused stubs and let definitions. Makes it easier to write new S3-related specs without copy and pasting setup steps from other specs.
The previous fix (f43c0a5d85) wasn't working for images that were already uploaded.
The "metadata" (eg. 'for_*' and 'secure' attributes) were not added to existing uploads.
Also used 'Upload.get_from_url' is the admin/site_setting controller to properly retrieve
an upload from its URL.
Fixed the Upload::URL_REGEX to use the \h (hexadecimal) for the SHA
Follow-up-to: f43c0a5d85
When uploading an image as a site setting, we need to return the "raw" URL, otherwise
when saving the site setting, the upload won't be looked up properly.
Follow-up-to: f11363d446
* FIX: randomize file name when created from fixtures
When a temporary file is created from fixtures it should have a unique name.
It is to prevent a collision in parallel specs evaluation
* FIX: use /tmp/pid folder to keep fixture files
Signed S3 URLs are valid for 15 seconds, so we can safely allow the browser to cache them for 10 seconds. This should help with large numbers of requests when composing a post with many images.
* DEPRECATION: Remove support for api creds in query params
This commit removes support for api credentials in query params except
for a few whitelisted routes like rss/json feeds and the handle_mail
route.
Several tests were written to valid these changes, but the bulk of the
spec changes are just switching them over to use header based auth so
that they will pass without changing what they were actually testing.
Original commit that notified admins this change was coming was created
over 3 months ago: 2db2003187
* fix tests
* Also allow iCalendar feeds
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
If the “secure media” site setting is enabled then ALL files uploaded to Discourse (images, video, audio, pdf, txt, zip etc. etc.) will follow the secure media rules. The “prevent anons from downloading files” setting will no longer have any bearing on upload security. Basically, the feature will more appropriately be called “secure uploads” instead of “secure media”.
This is being done because there are communities out there that would like all attachments and media to be secure based on category rules but still allow anonymous users to download attachments in public places, which is not possible in the current arrangement.
Meta report: https://meta.discourse.org/t/short-url-secure-uploads-s3/144224
* if the show_short route is hit for an upload that is
secure, we redirect to the secure presigned URL. however
this was not taking into account multisite so the db name
was left off the path which broke the presigned URL
* we now use the correct url_for method if we know the
upload (like in the show_short case) which takes into
account multisite
When secure media is enabled and an attachment is marked as secure we want to use the full url instead of the short-url so we get the same access control post protections as secure media uploads.
### General Changes and Duplication
* We now consider a post `with_secure_media?` if it is in a read-restricted category.
* When uploading we now set an upload's secure status straight away.
* When uploading if `SiteSetting.secure_media` is enabled, we do not check to see if the upload already exists using the `sha1` digest of the upload. The `sha1` column of the upload is filled with a `SecureRandom.hex(20)` value which is the same length as `Upload::SHA1_LENGTH`. The `original_sha1` column is filled with the _real_ sha1 digest of the file.
* Whether an upload `should_be_secure?` is now determined by whether the `access_control_post` is `with_secure_media?` (if there is no access control post then we leave the secure status as is).
* When serializing the upload, we now cook the URL if the upload is secure. This is so it shows up correctly in the composer preview, because we set secure status on upload.
### Viewing Secure Media
* The secure-media-upload URL will take the post that the upload is attached to into account via `Guardian.can_see?` for access permissions
* If there is no `access_control_post` then we just deliver the media. This should be a rare occurrance and shouldn't cause issues as the `access_control_post` is set when `link_post_uploads` is called via `CookedPostProcessor`
### Removed
We no longer do any of these because we do not reuse uploads by sha1 if secure media is enabled.
* We no longer have a way to prevent cross-posting of a secure upload from a private context to a public context.
* We no longer have to set `secure: false` for uploads when uploading for a theme component.
if SiteSetting.secure_media is disabled we still want to
redirect to the signed url for uploads that are marked as
secure because their ACLs are probably still private
* Add a rake task to disable secure media. This sets all uploads to `secure: false`, changes the upload ACL to public, and rebakes all the posts using the uploads to make sure they point to the correct URLs. This is in a transaction for each upload with the upload being updated the last step, so if the task fails it can be resumed.
* Also allow viewing media via the secure url if secure media is disabled, redirecting to the normal CDN url, because otherwise media links will be broken while we go and rebake all the posts + update ACLs
This PR introduces a new secure media setting. When enabled, it prevent unathorized access to media uploads (files of type image, video and audio). When the `login_required` setting is enabled, then all media uploads will be protected from unauthorized (anonymous) access. When `login_required`is disabled, only media in private messages will be protected from unauthorized access.
A few notes:
- the `prevent_anons_from_downloading_files` setting no longer applies to audio and video uploads
- the `secure_media` setting can only be enabled if S3 uploads are already enabled and configured
- upload records have a new column, `secure`, which is a boolean `true/false` of the upload's secure status
- when creating a public post with an upload that has already been uploaded and is marked as secure, the post creator will raise an error
- when enabling or disabling the setting on a site with existing uploads, the rake task `uploads:ensure_correct_acl` should be used to update all uploads' secure status and their ACL on S3