Commit Graph

65 Commits

Author SHA1 Message Date
David Taylor
f30f9ec5d9
PERF: Update s3:expire_missing_assets to delete in batches (#18908)
Some sites may have thousands of stale assets - deleting them one-by-one is very slow.

Followup to e8570b5cc9
2022-11-07 12:53:14 +00:00
Vinoth Kannan
169f2ad443
FIX: don't raise an error if file not found in S3. (#17841)
While deleting the object in S3, don't raise an error if the file is not available in S3.

Co-authored-by: Régis Hanol <regis@hanol.fr>
2022-08-09 15:16:35 +05:30
Gerhard Schlager
9d870f151c
FIX: Uploading large files (> 5GB) failed when enable_direct_s3_uploads is enabled (#16724)
Larger files require a multipart copy.
2022-06-28 21:30:00 +02:00
Martin Brennan
641c4e0b7a
FEATURE: Make S3 presigned GET URL expiry configurable (#16912)
Previously we hardcoded the DOWNLOAD_URL_EXPIRES_AFTER_SECONDS const
inside S3Helper to be 5 minutes (300 seconds). For various reasons,
some hosted sites may need this to be longer for other integrations.

The maximum expiry time for presigned URLs is 1 week (which is
604800 seconds), so that has been added as a validation on the
setting as well. The setting is hidden because 99% of the time
it should not be changed.
2022-05-26 09:53:01 +10:00
Martin Brennan
e4350bb966
FEATURE: Direct S3 multipart uploads for backups (#14736)
This PR introduces a new `enable_experimental_backup_uploads` site setting (default false and hidden), which when enabled alongside `enable_direct_s3_uploads` will allow for direct S3 multipart uploads of backup .tar.gz files.

To make multipart external uploads work with both the S3BackupStore and the S3Store, I've had to move several methods out of S3Store and into S3Helper, including:

* presigned_url
* create_multipart
* abort_multipart
* complete_multipart
* presign_multipart_part
* list_multipart_parts

Then, S3Store and S3BackupStore either delegate directly to S3Helper or have their own special methods to call S3Helper for these methods. FileStore.temporary_upload_path has also removed its dependence on upload_path, and can now be used interchangeably between the stores. A similar change was made in the frontend as well, moving the multipart related JS code out of ComposerUppyUpload and into a mixin of its own, so it can also be used by UppyUploadMixin.

Some changes to ExternalUploadManager had to be made here as well. The backup direct uploads do not need an Upload record made for them in the database, so they can be moved to their final S3 resting place when completing the multipart upload.

This changeset is not perfect; it introduces some special cases in UploadController to handle backups that was previously in BackupController, because UploadController is where the multipart routes are located. A subsequent pull request will pull these routes into a module or some other sharing pattern, along with hooks, so the backup controller and the upload controller (and any future controllers that may need them) can include these routes in a nicer way.
2021-11-11 08:25:31 +10:00
Martin Brennan
9a72a0945f
FIX: Ensure CORS rules exist for S3 using rake task (#14802)
This commit introduces a new s3:ensure_cors_rules rake task
that is run as a prerequisite to s3:upload_assets. This rake
task calls out to the S3CorsRulesets class to ensure that
the 3 relevant sets of CORS rules are applied, depending on
site settings:

* assets
* direct S3 backups
* direct S3 uploads

This works for both Global S3 settings and Database S3 settings
(the latter set directly via SiteSetting).

As it is, only one rule can be applied, which is generally
the assets rule as it is called first. This commit changes
the ensure_cors! method to be able to apply new rules as
well as the existing ones.

This commit also slightly changes the existing rules to cover
direct S3 uploads via uppy, especially multipart, which requires
some more headers.
2021-11-08 09:16:38 +10:00
Martin Brennan
a059c7251f
DEV: Add tests to S3Helper.ensure_cors and move rules to class (#14767)
In preparation for adding automatic CORS rules creation
for direct S3 uploads, I am adding tests here and moving the
CORS rule definitions into a dedicated class so they are all
in the one place.

There is a problem with ensure_cors! as well -- if there is
already a CORS rule defined (presumably the asset one) then
we do nothing and do not apply the new rule. This means that
the S3BackupStore.ensure_cors method does nothing right now
if the assets rule is already defined, and it will mean the
same for any direct S3 upload rules I add for uppy. We need
to be able to add more rules, not just one.

This is not a problem on our hosting because we define the
rules at an infra level.
2021-11-01 08:23:13 +10:00
Martin Brennan
0d809197aa
FIX: Make sure S3 object headers are preserved on copy (#14302)
When copying an existing upload stub temporary object
on S3 to its final destination we were not copying across
its additional headers such as content-disposition and
cache-control, which led to issues like attachments not
downloading with their original filename when clicking
the download links in posts.

This is because the metadata_directive = REPLACE option
was not being passed to object.copy_from(), so only the
source object's headers were being used. Added an option
for apply_metadata_to_destination to apply this option
conditionally, because we may not always want to replace
this metadata, but we definitely do when copying a temporary
upload.
2021-09-10 12:59:51 +10:00
Martin Brennan
841e054907
FIX: Do not prefix temp/ S3 keys with s3_bucket_folder_path in S3Helper (#14145)
This is unnecessary, as when the temporary key is created
in S3Store we already include the s3_bucket_folder_path, and
the key will always start with temp/ to assist with lifecycle
rules for multipart uploads.

This was affecting Discourse.store.object_from_path,
Discourse.store.signed_url_for_path, and possibly others.

See also: e0102a5
2021-08-26 08:50:49 +10:00
Martin Brennan
b500949ef6
FEATURE: Initial implementation of direct S3 uploads with uppy and stubs (#13787)
This adds a few different things to allow for direct S3 uploads using uppy. **These changes are still not the default.** There are hidden `enable_experimental_image_uploader` and `enable_direct_s3_uploads`  settings that must be turned on for any of this code to be used, and even if they are turned on only the User Card Background for the user profile actually uses uppy-image-uploader.

A new `ExternalUploadStub` model and database table is introduced in this pull request. This is used to keep track of uploads that are uploaded to a temporary location in S3 with the direct to S3 code, and they are eventually deleted a) when the direct upload is completed and b) after a certain time period of not being used. 

### Starting a direct S3 upload

When an S3 direct upload is initiated with uppy, we first request a presigned PUT URL from the new `generate-presigned-put` endpoint in `UploadsController`. This generates an S3 key in the `temp` folder inside the correct bucket path, along with any metadata from the clientside (e.g. the SHA1 checksum described below). This will also create an `ExternalUploadStub` and store the details of the temp object key and the file being uploaded.

Once the clientside has this URL, uppy will upload the file direct to S3 using the presigned URL. Once the upload is complete we go to the next stage.

### Completing a direct S3 upload

Once the upload to S3 is done we call the new `complete-external-upload` route with the unique identifier of the `ExternalUploadStub` created earlier. Only the user who made the stub can complete the external upload. One of two paths is followed via the `ExternalUploadManager`.

1. If the object in S3 is too large (currently 100mb defined by `ExternalUploadManager::DOWNLOAD_LIMIT`) we do not download and generate the SHA1 for that file. Instead we create the `Upload` record via `UploadCreator` and simply copy it to its final destination on S3 then delete the initial temp file. Several modifications to `UploadCreator` have been made to accommodate this.

2. If the object in S3 is small enough, we download it. When the temporary S3 file is downloaded, we compare the SHA1 checksum generated by the browser with the actual SHA1 checksum of the file generated by ruby. The browser SHA1 checksum is stored on the object in S3 with metadata, and is generated via the `UppyChecksum` plugin. Keep in mind that some browsers will not generate this due to compatibility or other issues.

    We then follow the normal `UploadCreator` path with one exception. To cut down on having to re-upload the file again, if there are no changes (such as resizing etc) to the file in `UploadCreator` we follow the same copy + delete temp path that we do for files that are too large.

3. Finally we return the serialized upload record back to the client

There are several errors that could happen that are handled by `UploadsController` as well.

Also in this PR is some refactoring of `displayErrorForUpload` to handle both uppy and jquery file uploader errors.
2021-07-28 08:42:25 +10:00
Martin Brennan
c3394ed9bb
DEV: Update aws-sdk-s3 gem for S3 multipart uploads (#13523)
We are a few versions behind on this gem. We need to update it
for S3 multipart uploads. In the current version we are using, we
cannot do this:

```ruby
Discourse.store.s3_helper.object(key).presigned_url(:upload_part, part_number: 1, upload_id: multipart_upload_id)
```

The S3 client raises an error, saying the operation is undefined. Once
I updated the gem this operation works as expected and returns a
presigned URL for the upload_part operation.

Also remove use of Aws::S3::FileUploader::FIFTEEN_MEGABYTES.
This was part of a private API and should not have been used.
2021-06-25 14:22:31 +10:00
Michael Brown
c25dc43f54 FIX: AWS S3 errors don't necessarily include a message
* If the error doesn't have a message, the class name will help
* example:
  before: "Failed to download #{filename} because "
  after: "Failed to download #{filename} because Aws::S3::Errors::NotFound"
2020-08-12 17:00:09 -04:00
Andrew Schleifer
aee3c2c34d FIX: attempt to output a useful error message
currently AWS problems will just dump a stack trace
2020-08-05 17:49:42 +08:00
Martin Brennan
8ef782bdbd
FIX: Increase time of DOWNLOAD_URL_EXPIRES_AFTER_SECONDS to 5 minutes (#10160)
* Change S3Helper::DOWNLOAD_URL_EXPIRES_AFTER_SECONDS to 5 minutes, which controls presigned URL expiry and secure-media route cache time.
* This is done because of the composer preview refreshing while typing causes a lot of requests sent to our server because of the short URL expiry. If this ends up being not enough we can always increase the time or explore other avenues (e.g. GitHub has a 7 day validity for secure URLs)
2020-07-03 13:42:36 +10:00
Andrew Schleifer
74d28a43d1
new S3 backup layout (#9830)
* DEV: new S3 backup layout

Currently, with $S3_BACKUP_BUCKET of "bucket/backups", multisite backups
end up in "bucket/backups/backups/dbname/" and single-site will be in
"bucket/backups/".

Both _should_ be in "bucket/backups/dbname/"

- remove MULTISITE_PREFIX,
- always include dbname,
- method to move to the new prefix
- job to call the method

* SPEC: add tests for `VacateLegacyPrefixBackups` onceoff job.

Co-authored-by: Vinoth Kannan <vinothkannan@vinkas.com>
2020-05-29 00:28:23 +05:30
Rafael dos Santos Silva
b48299f81c
FEATURE: Add setting to disable automatic CORS rule install in S3 buckets (#9872) 2020-05-25 17:09:34 -03:00
Rafael dos Santos Silva
08e4af6636 FEATURE: Add setting to controle the Expect header on S3 calls
Some providers don't implement the Expect: 100-continue support,
which results in a mismatch in the object signature.

With this settings, users can disable the header and use such providers.
2020-04-30 12:12:00 -03:00
Sam Saffron
d0d5a138c3
DEV: stop freezing frozen strings
We have the `# frozen_string_literal: true` comment on all our
files. This means all string literals are frozen. There is no need
to call #freeze on any literals.

For files with `# frozen_string_literal: true`

```
puts %w{a b}[0].frozen?
=> true

puts "hi".frozen?
=> true

puts "a #{1} b".frozen?
=> true

puts ("a " + "b").frozen?
=> false

puts (-("a " + "b")).frozen?
=> true
```

For more details see: https://samsaffron.com/archive/2018/02/16/reducing-string-duplication-in-ruby
2020-04-30 16:48:53 +10:00
David Taylor
ba616ffb50
DEV: Use a tmp directory for storing uploads in tests (#9554)
This avoids development-mode upload files from polluting the test environment
2020-04-28 14:03:04 +01:00
Robin Ward
80a572d3b7 FIX: Multisite spec was failing in parallel environment
We were not adding the test number to the path in all places.
2020-04-22 14:05:39 -04:00
Penar Musaraj
067696df8f DEV: Apply Rubocop redundant return style 2019-11-14 15:10:51 -05:00
Vinoth Kannan
3e456d5c0b FIX: don't include multisite upload path to source URL if already exist. 2019-08-02 07:57:27 +05:30
Penar Musaraj
f00275ded3 FEATURE: Support private attachments when using S3 storage (#7677)
* Support private uploads in S3
* Use localStore for local avatars
* Add job to update private upload ACL on S3
* Test multisite paths
* update ACL for private uploads in migrate_to_s3 task
2019-06-06 13:27:24 +10:00
Vinoth Kannan
be0555cc17 FIX: Add bucket folder path only if not exists 2019-05-15 15:37:40 +05:30
Sam Saffron
30990006a9 DEV: enable frozen string literal on all files
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.

Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
2019-05-13 09:31:32 +08:00
Rishabh
ad6ad3f679 DEV: Remove SiteSetting.s3_force_path_style (#7210)
- s3_force_path_style was added as a Minio specific url scheme but it has never been well supported in our code base.
- Our new migrate_to_s3 rake task does not work reliably with path style urls too
- Minio has also added support for virtual style requests i.e the same scheme as AWS S3/DO Spaces so we can rely on that instead of using path style requests.
- Add migration to drop s3_force_path_style from the site_settings table
2019-03-20 14:58:20 +01:00
Vinoth Kannan
cc496de10e FIX: Remove double quotes from etag value in API response
https://github.com/aws/aws-sdk-ruby/issues/1134
2019-02-08 14:31:19 +05:30
Gerhard Schlager
ba724d7f25 FIX: S3 endpoint broke bucket creation in non-default region 2019-02-05 18:17:02 +01:00
Vinoth Kannan
b4f713ca52
FEATURE: Use amazon s3 inventory to manage upload stats (#6867) 2019-02-01 10:10:48 +05:30
Rishabh
f181e9cc08
FIX: Add compatibility for bucket folder paths in migrate_to_s3 task (#6855)
* FIX: Add compatibility for bucket folder paths in migrate_to_s3 task
* Refactor bucket_name split logic into S3Helper
2019-01-08 20:04:48 +05:30
Vinoth Kannan
82d7f9ce5e fix the build
Checking size for a file object directly will cause issue if it is a closed stream
2019-01-04 13:25:11 +05:30
Vinoth Kannan
940a61037c DEV: Add option to pass s3 client in param 2019-01-04 12:16:09 +05:30
Vinoth Kannan
75dbb98cca FEATURE: Add S3 etag value to uploads table (#6795) 2019-01-04 14:16:22 +08:00
Régis Hanol
5381096bfd PERF: new 'migrate_to_s3' rake task 2018-12-26 17:34:49 +01:00
Rishabh
cae5ba7356 FIX: Ensure that multisite s3 uploads are tombstoned correctly (#6769)
* FIX: Ensure that multisite uploads are tombstoned into the correct paths

* Move multisite specs to spec/multisite/s3_store_spec.rb
2018-12-19 13:32:32 +08:00
Vinoth Kannan
fd272eee44 FEATURE: Make uploads:missing task compatible with s3 uploads 2018-11-27 00:54:51 +05:30
Guo Xiang Tan
84d4c81a26 FEATURE: Support backup uploads/downloads directly to/from S3.
This reverts commit 3c59106bac.
2018-10-15 09:43:31 +08:00
Guo Xiang Tan
3c59106bac Revert "FEATURE: Support backup uploads/downloads directly to/from S3."
This reverts commit c29a4dddc1.

We're doing a beta bump soon so un-revert this after that is done.
2018-10-11 11:08:23 +08:00
Gerhard Schlager
c29a4dddc1 FEATURE: Support backup uploads/downloads directly to/from S3. 2018-10-11 10:38:43 +08:00
Rishabh
4f46aa1ba3 FEATURE: Add SiteSetting for s3_configure_tombstone_policy
Add SiteSetting for s3_configure_tombstone_policy, skip policy generation if turned off (default on)
2018-09-17 10:57:50 +10:00
Guo Xiang Tan
4a966c639d DEV: Update uploads:list_posts_with_broken_images to recover from tombstone. 2018-09-10 17:01:41 +08:00
Guo Xiang Tan
df04e69cde FIX: S3Helper#list creates incorrect prefix. 2018-09-10 16:34:40 +08:00
Neil Lalonde
ac40fede66 Revert "FIX: subfolder support for S3 CDN" 2018-08-20 10:30:44 -04:00
Neil Lalonde
0b8f6d3637 FIX: subfolder support for S3 CDN 2018-08-20 10:22:03 -04:00
Guo Xiang Tan
1ea23b1eae FIX: Wrong order for S3Helper#copy_file. 2018-08-08 15:58:54 +08:00
Guo Xiang Tan
aafff740d2 Add FileStore::S3Store#copy_file. 2018-08-08 11:30:34 +08:00
Rishabh
a6c589d882 FEATURE: Add custom S3 Endpoint and DigitalOcean Spaces/Minio support for Backups (#6045)
- Add custom S3 Endpoints and DigitalOcean Spaces support
- Add Minio support using 'force_path_style' option and fix uploads to custom endpoint
2018-07-16 14:44:55 +10:00
Andrew Schleifer
4be0e31459 fix s3_cdn_url when the s3 bucket contains a folder 2018-05-23 15:51:02 -05:00
Andrew Schleifer
ff15d95983 FIX s3_helper.list for buckets with folders
s3_bucket_folder_path does not contain a trailing slash so it was
smashingstufftogether
2018-05-22 20:09:08 -05:00
Sam
4f28c71b50 FIX: error setting tombstone bucket when set to old version 2017-11-13 15:36:45 +11:00