discourse

mirror of https://github.com/discourse/discourse.git synced 2024-11-22 04:11:33 +08:00

Author	SHA1	Message	Date
Martin Brennan	0568d36133	FIX: Use dualstack S3 endpoint for direct uploads (#29611 ) When we added direct S3 uploads to Discourse, which use presigned URLs, we never took into account the dualstack endpoints for IPv6 on S3. This commit fixes the issue by using the dualstack endpoints for presigned URLs and requests, which are used in the get-presigned-put and batch-presign-urls endpoints used when directly uploading to S3. It also makes regular S3 requests for `put` and so on use dualstack URLs. It doesn't seem like there is a downside to doing this, but a bunch of specs needed to be updated to reflect this.	2024-11-07 11:06:39 +10:00
Alan Guo Xiang Tan	57f4176b57	DEV: Bump rubocop_discourse (#29608 )	2024-11-06 06:27:49 +08:00
Alan Guo Xiang Tan	8cf4ed5f88	DEV: Introduce hidden `s3_inventory_bucket` site setting (#27304 ) This commit introduces a hidden `s3_inventory_bucket` site setting which replaces the `enable_s3_inventory` and `s3_configure_inventory_policy` site settings. The `enable_s3_inventory` and `s3_configure_inventory_policy` site settings are removed because this feature has technically been broken since it was introduced. When the `enable_s3_inventory` feature is turned on, the app will configure a daily inventory policy for the `s3_upload_bucket` bucket and store the inventories under a prefix in the bucket. The problem here is that once the inventories are created, nothing cleans them up, so whoever enabled this feature would have been paying the cost of storing a whole bunch of inventory files which are never used. Given that we have not received any complaints about inventory files inflating S3 storage costs, we think that it is very likely that this feature is no longer being used and we are looking to drop support for this feature in the not-too-distant future. For now, we will still support a hidden `s3_inventory_bucket` site setting which site administrators can configure via the `DISCOURSE_S3_INVENTORY_BUCKET` env.	2024-06-10 13:16:00 +08:00
Alan Guo Xiang Tan	df16ab0758	FIX: `S3Inventory` to ignore files older than last backup restore date (#27166 ) This commit updates `S3Inventory#files` to ignore S3 inventory files which have a `last_modified` timestamp which are not at least 2 days older than `BackupMetadata.last_restore_date` timestamp. This check was previously only in `Jobs::EnsureS3UploadsExistence` but `S3Inventory` can also be used via Rake tasks so this protection needs to be in `S3Inventory` and not in the scheduled job.	2024-05-24 10:54:06 +08:00
Martin Brennan	731dffdf92	DEV: Align S3 transfer acceleration global settings (#24302 ) Followup to `fe05fdae24` For consistency with other S3 settings, make the global setting the same name as the site setting and use SiteSetting.Upload too so it reads from the correct place.	2023-11-10 09:50:23 +10:00
Martin Brennan	fe05fdae24	DEV: Introduce S3 transfer acceleration for uploads behind hidden setting (#24238 ) This commit adds an `enable_s3_transfer_acceleration` site setting, which is hidden to begin with. We are adding this because in certain regions, using https://aws.amazon.com/s3/transfer-acceleration/ can drastically speed up uploads, sometimes as much as 70% in certain regions depending on the target bucket region. This is important for us because we have direct S3 multipart uploads enabled everywhere on our hosting. To start, we only want this on the uploads bucket, not the backup one. Also, this will accelerate both uploads and downloads, depending on whether a presigned URL is used for downloading. This is the case when secure uploads is enabled, not anywhere else at this time. To enable the S3 acceleration on downloads more generally would be a more in-depth change, since we currently store S3 Upload record URLs like this: ``` url: "//test.s3.dualstack.us-east-2.amazonaws.com/original/2X/6/123456.png" ``` For acceleration, `s3.dualstack` would need to be changed to `s3-accelerate.dualstack` here. Note that for this to have any effect, Transfer Acceleration must be enabled on the S3 bucket used for uploads per https://docs.aws.amazon.com/AmazonS3/latest/userguide/transfer-acceleration-examples.html.	2023-11-07 11:50:40 +10:00
Martin Brennan	cf42466dea	DEV: Add S3 upload system specs using minio (#22975 ) This commit adds some system specs to test uploads with direct to S3 single and multipart uploads via uppy. This is done with minio as a local S3 replacement. We are doing this to catch regressions when uppy dependencies need to be upgraded or we change uppy upload code, since before this there was no way to know outside manual testing whether these changes would cause regressions. Minio's server lifecycle and the installed binaries are managed by the https://github.com/discourse/minio_runner gem, though the binaries are already installed on the discourse_test image we run GitHub CI from. These tests will only run in CI unless you specifically use the CI=1 or RUN_S3_SYSTEM_SPECS=1 env vars. For a history of experimentation here see https://github.com/discourse/discourse/pull/22381 Related PRs: * https://github.com/discourse/minio_runner/pull/1 * https://github.com/discourse/minio_runner/pull/2 * https://github.com/discourse/minio_runner/pull/3	2023-08-23 11:18:33 +10:00
Matt Palmer	a98d2a8086	FEATURE: allow S3 ACLs to be disabled (#21769 ) AWS recommends running buckets without ACLs, and to use resource policies to manage access control instead. This is not a bad idea, because S3 ACLs are whack, and while resource policies are also whack, they're a more constrained form of whack. Further, some compliance regimes get antsy if you don't go with the vendor's recommended settings, and arguing that you need to enable ACLs on a bucket just to store images in there is more hassle than it's worth. The new site setting (s3_use_acls) cannot be disabled when secure uploads is enabled -- the latter relies on private ACLs for security at this point in time. We may want to reexamine this in future.	2023-06-06 15:47:40 +10:00
Martin Brennan	21a95b000e	DEV: Remove defunct TODOs (#19825 ) * Firefox now finally returns PerformanceMeasure from performance.measure * Some TODOs were really more NOTE or FIXME material or no longer relevant * retain_hours is not needed in ExternalUploadsManager, it doesn't seem like anywhere in the UI sends this as a param for uploads * https://github.com/discourse/discourse/pull/18413 was merged so we can remove JS test workaround for settings	2023-01-12 09:41:39 +10:00
David Taylor	6417173082	DEV: Apply syntax_tree formatting to `lib/*`	2023-01-09 12:10:19 +00:00
David Taylor	f30f9ec5d9	PERF: Update `s3:expire_missing_assets` to delete in batches (#18908 ) Some sites may have thousands of stale assets - deleting them one-by-one is very slow. Followup to `e8570b5cc9`	2022-11-07 12:53:14 +00:00
Vinoth Kannan	169f2ad443	FIX: don't raise an error if file not found in S3. (#17841 ) While deleting the object in S3, don't raise an error if the file is not available in S3. Co-authored-by: Régis Hanol <regis@hanol.fr>	2022-08-09 15:16:35 +05:30
Gerhard Schlager	9d870f151c	FIX: Uploading large files (> 5GB) failed when `enable_direct_s3_uploads` is enabled (#16724 ) Larger files require a multipart copy.	2022-06-28 21:30:00 +02:00
Martin Brennan	641c4e0b7a	FEATURE: Make S3 presigned GET URL expiry configurable (#16912 ) Previously we hardcoded the DOWNLOAD_URL_EXPIRES_AFTER_SECONDS const inside S3Helper to be 5 minutes (300 seconds). For various reasons, some hosted sites may need this to be longer for other integrations. The maximum expiry time for presigned URLs is 1 week (which is 604800 seconds), so that has been added as a validation on the setting as well. The setting is hidden because 99% of the time it should not be changed.	2022-05-26 09:53:01 +10:00
Martin Brennan	e4350bb966	FEATURE: Direct S3 multipart uploads for backups (#14736 ) This PR introduces a new `enable_experimental_backup_uploads` site setting (default false and hidden), which when enabled alongside `enable_direct_s3_uploads` will allow for direct S3 multipart uploads of backup .tar.gz files. To make multipart external uploads work with both the S3BackupStore and the S3Store, I've had to move several methods out of S3Store and into S3Helper, including: * presigned_url * create_multipart * abort_multipart * complete_multipart * presign_multipart_part * list_multipart_parts Then, S3Store and S3BackupStore either delegate directly to S3Helper or have their own special methods to call S3Helper for these methods. FileStore.temporary_upload_path has also removed its dependence on upload_path, and can now be used interchangeably between the stores. A similar change was made in the frontend as well, moving the multipart related JS code out of ComposerUppyUpload and into a mixin of its own, so it can also be used by UppyUploadMixin. Some changes to ExternalUploadManager had to be made here as well. The backup direct uploads do not need an Upload record made for them in the database, so they can be moved to their final S3 resting place when completing the multipart upload. This changeset is not perfect; it introduces some special cases in UploadController to handle backups that was previously in BackupController, because UploadController is where the multipart routes are located. A subsequent pull request will pull these routes into a module or some other sharing pattern, along with hooks, so the backup controller and the upload controller (and any future controllers that may need them) can include these routes in a nicer way.	2021-11-11 08:25:31 +10:00
Martin Brennan	9a72a0945f	FIX: Ensure CORS rules exist for S3 using rake task (#14802 ) This commit introduces a new s3:ensure_cors_rules rake task that is run as a prerequisite to s3:upload_assets. This rake task calls out to the S3CorsRulesets class to ensure that the 3 relevant sets of CORS rules are applied, depending on site settings: * assets * direct S3 backups * direct S3 uploads This works for both Global S3 settings and Database S3 settings (the latter set directly via SiteSetting). As it is, only one rule can be applied, which is generally the assets rule as it is called first. This commit changes the ensure_cors! method to be able to apply new rules as well as the existing ones. This commit also slightly changes the existing rules to cover direct S3 uploads via uppy, especially multipart, which requires some more headers.	2021-11-08 09:16:38 +10:00
Martin Brennan	a059c7251f	DEV: Add tests to S3Helper.ensure_cors and move rules to class (#14767 ) In preparation for adding automatic CORS rules creation for direct S3 uploads, I am adding tests here and moving the CORS rule definitions into a dedicated class so they are all in the one place. There is a problem with ensure_cors! as well -- if there is already a CORS rule defined (presumably the asset one) then we do nothing and do not apply the new rule. This means that the S3BackupStore.ensure_cors method does nothing right now if the assets rule is already defined, and it will mean the same for any direct S3 upload rules I add for uppy. We need to be able to add more rules, not just one. This is not a problem on our hosting because we define the rules at an infra level.	2021-11-01 08:23:13 +10:00
Martin Brennan	0d809197aa	FIX: Make sure S3 object headers are preserved on copy (#14302 ) When copying an existing upload stub temporary object on S3 to its final destination we were not copying across its additional headers such as content-disposition and cache-control, which led to issues like attachments not downloading with their original filename when clicking the download links in posts. This is because the metadata_directive = REPLACE option was not being passed to object.copy_from(), so only the source object's headers were being used. Added an option for apply_metadata_to_destination to apply this option conditionally, because we may not always want to replace this metadata, but we definitely do when copying a temporary upload.	2021-09-10 12:59:51 +10:00
Martin Brennan	841e054907	FIX: Do not prefix temp/ S3 keys with s3_bucket_folder_path in S3Helper (#14145 ) This is unnecessary, as when the temporary key is created in S3Store we already include the s3_bucket_folder_path, and the key will always start with temp/ to assist with lifecycle rules for multipart uploads. This was affecting Discourse.store.object_from_path, Discourse.store.signed_url_for_path, and possibly others. See also: `e0102a5`	2021-08-26 08:50:49 +10:00
Martin Brennan	b500949ef6	FEATURE: Initial implementation of direct S3 uploads with uppy and stubs (#13787 ) This adds a few different things to allow for direct S3 uploads using uppy. These changes are still not the default. There are hidden `enable_experimental_image_uploader` and `enable_direct_s3_uploads` settings that must be turned on for any of this code to be used, and even if they are turned on only the User Card Background for the user profile actually uses uppy-image-uploader. A new `ExternalUploadStub` model and database table is introduced in this pull request. This is used to keep track of uploads that are uploaded to a temporary location in S3 with the direct to S3 code, and they are eventually deleted a) when the direct upload is completed and b) after a certain time period of not being used. ### Starting a direct S3 upload When an S3 direct upload is initiated with uppy, we first request a presigned PUT URL from the new `generate-presigned-put` endpoint in `UploadsController`. This generates an S3 key in the `temp` folder inside the correct bucket path, along with any metadata from the clientside (e.g. the SHA1 checksum described below). This will also create an `ExternalUploadStub` and store the details of the temp object key and the file being uploaded. Once the clientside has this URL, uppy will upload the file direct to S3 using the presigned URL. Once the upload is complete we go to the next stage. ### Completing a direct S3 upload Once the upload to S3 is done we call the new `complete-external-upload` route with the unique identifier of the `ExternalUploadStub` created earlier. Only the user who made the stub can complete the external upload. One of two paths is followed via the `ExternalUploadManager`. 1. If the object in S3 is too large (currently 100mb defined by `ExternalUploadManager::DOWNLOAD_LIMIT`) we do not download and generate the SHA1 for that file. 
Instead we create the `Upload` record via `UploadCreator` and simply copy it to its final destination on S3 then delete the initial temp file. Several modifications to `UploadCreator` have been made to accommodate this. 2. If the object in S3 is small enough, we download it. When the temporary S3 file is downloaded, we compare the SHA1 checksum generated by the browser with the actual SHA1 checksum of the file generated by ruby. The browser SHA1 checksum is stored on the object in S3 with metadata, and is generated via the `UppyChecksum` plugin. Keep in mind that some browsers will not generate this due to compatibility or other issues. We then follow the normal `UploadCreator` path with one exception. To cut down on having to re-upload the file again, if there are no changes (such as resizing etc) to the file in `UploadCreator` we follow the same copy + delete temp path that we do for files that are too large. 3. Finally we return the serialized upload record back to the client There are several errors that could happen that are handled by `UploadsController` as well. Also in this PR is some refactoring of `displayErrorForUpload` to handle both uppy and jquery file uploader errors.	2021-07-28 08:42:25 +10:00
Martin Brennan	c3394ed9bb	DEV: Update aws-sdk-s3 gem for S3 multipart uploads (#13523 ) We are a few versions behind on this gem. We need to update it for S3 multipart uploads. In the current version we are using, we cannot do this: ```ruby Discourse.store.s3_helper.object(key).presigned_url(:upload_part, part_number: 1, upload_id: multipart_upload_id) ``` The S3 client raises an error, saying the operation is undefined. Once I updated the gem this operation works as expected and returns a presigned URL for the upload_part operation. Also remove use of Aws::S3::FileUploader::FIFTEEN_MEGABYTES. This was part of a private API and should not have been used.	2021-06-25 14:22:31 +10:00
Michael Brown	c25dc43f54	FIX: AWS S3 errors don't necessarily include a message * If the error doesn't have a message, the class name will help * example: before: "Failed to download #{filename} because " after: "Failed to download #{filename} because Aws::S3::Errors::NotFound"	2020-08-12 17:00:09 -04:00
Andrew Schleifer	aee3c2c34d	FIX: attempt to output a useful error message currently AWS problems will just dump a stack trace	2020-08-05 17:49:42 +08:00
Martin Brennan	8ef782bdbd	FIX: Increase time of DOWNLOAD_URL_EXPIRES_AFTER_SECONDS to 5 minutes (#10160 ) * Change S3Helper::DOWNLOAD_URL_EXPIRES_AFTER_SECONDS to 5 minutes, which controls presigned URL expiry and secure-media route cache time. * This is done because of the composer preview refreshing while typing causes a lot of requests sent to our server because of the short URL expiry. If this ends up being not enough we can always increase the time or explore other avenues (e.g. GitHub has a 7 day validity for secure URLs)	2020-07-03 13:42:36 +10:00
Andrew Schleifer	74d28a43d1	new S3 backup layout (#9830 ) * DEV: new S3 backup layout Currently, with $S3_BACKUP_BUCKET of "bucket/backups", multisite backups end up in "bucket/backups/backups/dbname/" and single-site will be in "bucket/backups/". Both _should_ be in "bucket/backups/dbname/" - remove MULTISITE_PREFIX, - always include dbname, - method to move to the new prefix - job to call the method * SPEC: add tests for `VacateLegacyPrefixBackups` onceoff job. Co-authored-by: Vinoth Kannan <vinothkannan@vinkas.com>	2020-05-29 00:28:23 +05:30
Rafael dos Santos Silva	b48299f81c	FEATURE: Add setting to disable automatic CORS rule install in S3 buckets (#9872 )	2020-05-25 17:09:34 -03:00
Rafael dos Santos Silva	08e4af6636	FEATURE: Add setting to control the Expect header on S3 calls Some providers don't implement the Expect: 100-continue support, which results in a mismatch in the object signature. With this setting, users can disable the header and use such providers.	2020-04-30 12:12:00 -03:00
Sam Saffron	d0d5a138c3	DEV: stop freezing frozen strings We have the `# frozen_string_literal: true` comment on all our files. This means all string literals are frozen. There is no need to call #freeze on any literals. For files with `# frozen_string_literal: true` ``` puts %w{a b}[0].frozen? => true puts "hi".frozen? => true puts "a #{1} b".frozen? => true puts ("a " + "b").frozen? => false puts (-("a " + "b")).frozen? => true ``` For more details see: https://samsaffron.com/archive/2018/02/16/reducing-string-duplication-in-ruby	2020-04-30 16:48:53 +10:00
David Taylor	ba616ffb50	DEV: Use a tmp directory for storing uploads in tests (#9554 ) This avoids development-mode upload files from polluting the test environment	2020-04-28 14:03:04 +01:00
Robin Ward	80a572d3b7	FIX: Multisite spec was failing in parallel environment We were not adding the test number to the path in all places.	2020-04-22 14:05:39 -04:00
Penar Musaraj	067696df8f	DEV: Apply Rubocop redundant return style	2019-11-14 15:10:51 -05:00
Vinoth Kannan	3e456d5c0b	FIX: don't include multisite upload path to source URL if already exist.	2019-08-02 07:57:27 +05:30
Penar Musaraj	f00275ded3	FEATURE: Support private attachments when using S3 storage (#7677 ) * Support private uploads in S3 * Use localStore for local avatars * Add job to update private upload ACL on S3 * Test multisite paths * update ACL for private uploads in migrate_to_s3 task	2019-06-06 13:27:24 +10:00
Vinoth Kannan	be0555cc17	FIX: Add bucket folder path only if not exists	2019-05-15 15:37:40 +05:30
Sam Saffron	30990006a9	DEV: enable frozen string literal on all files This reduces chances of errors where consumers of strings mutate inputs and reduces memory usage of the app. Test suite passes now, but there may be some stuff left, so we will run a few sites on a branch prior to merging	2019-05-13 09:31:32 +08:00
Rishabh	ad6ad3f679	DEV: Remove SiteSetting.s3_force_path_style (#7210 ) - s3_force_path_style was added as a Minio specific url scheme but it has never been well supported in our code base. - Our new migrate_to_s3 rake task does not work reliably with path style urls too - Minio has also added support for virtual style requests i.e the same scheme as AWS S3/DO Spaces so we can rely on that instead of using path style requests. - Add migration to drop s3_force_path_style from the site_settings table	2019-03-20 14:58:20 +01:00
Vinoth Kannan	cc496de10e	FIX: Remove double quotes from etag value in API response https://github.com/aws/aws-sdk-ruby/issues/1134	2019-02-08 14:31:19 +05:30
Gerhard Schlager	ba724d7f25	FIX: S3 endpoint broke bucket creation in non-default region	2019-02-05 18:17:02 +01:00
Vinoth Kannan	b4f713ca52	FEATURE: Use amazon s3 inventory to manage upload stats (#6867 )	2019-02-01 10:10:48 +05:30
Rishabh	f181e9cc08	FIX: Add compatibility for bucket folder paths in migrate_to_s3 task (#6855 ) * FIX: Add compatibility for bucket folder paths in migrate_to_s3 task * Refactor bucket_name split logic into S3Helper	2019-01-08 20:04:48 +05:30
Vinoth Kannan	82d7f9ce5e	fix the build Checking size for a file object directly will cause issue if it is a closed stream	2019-01-04 13:25:11 +05:30
Vinoth Kannan	940a61037c	DEV: Add option to pass s3 client in param	2019-01-04 12:16:09 +05:30
Vinoth Kannan	75dbb98cca	FEATURE: Add S3 etag value to uploads table (#6795 )	2019-01-04 14:16:22 +08:00
Régis Hanol	5381096bfd	PERF: new 'migrate_to_s3' rake task	2018-12-26 17:34:49 +01:00
Rishabh	cae5ba7356	FIX: Ensure that multisite s3 uploads are tombstoned correctly (#6769 ) * FIX: Ensure that multisite uploads are tombstoned into the correct paths * Move multisite specs to spec/multisite/s3_store_spec.rb	2018-12-19 13:32:32 +08:00
Vinoth Kannan	fd272eee44	FEATURE: Make uploads:missing task compatible with s3 uploads	2018-11-27 00:54:51 +05:30
Guo Xiang Tan	84d4c81a26	FEATURE: Support backup uploads/downloads directly to/from S3. This reverts commit `3c59106bac`.	2018-10-15 09:43:31 +08:00
Guo Xiang Tan	3c59106bac	Revert "FEATURE: Support backup uploads/downloads directly to/from S3." This reverts commit `c29a4dddc1`. We're doing a beta bump soon so un-revert this after that is done.	2018-10-11 11:08:23 +08:00
Gerhard Schlager	c29a4dddc1	FEATURE: Support backup uploads/downloads directly to/from S3.	2018-10-11 10:38:43 +08:00
Rishabh	4f46aa1ba3	FEATURE: Add SiteSetting for s3_configure_tombstone_policy Add SiteSetting for s3_configure_tombstone_policy, skip policy generation if turned off (default on)	2018-09-17 10:57:50 +10:00
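
Note on `b500949ef6`: the completion step described in that commit message (skip the checksum for objects over the download limit, otherwise compare the browser-supplied SHA1 with one computed in Ruby) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the commit: `check_external_upload`, its keyword arguments, and the returned symbols are hypothetical names, the `DOWNLOAD_LIMIT` constant merely mirrors the 100mb figure quoted in the message, and treating a missing browser checksum as "skipped" is an assumption.

```ruby
require "digest"

# Hypothetical stand-in for the 100mb limit that the commit message
# attributes to ExternalUploadManager::DOWNLOAD_LIMIT.
DOWNLOAD_LIMIT = 100 * 1024 * 1024

# Decide what to do with a temporary S3 object once the direct upload is done.
#
# size        - object size in bytes
# client_sha1 - SHA1 hex digest sent by the browser, or nil if it sent none
# download    - callable returning the object's bytes (only called when needed)
#
# Returns :skipped, :verified, or :mismatch (illustrative symbols).
def check_external_upload(size:, client_sha1:, download:)
  # Objects over the limit are not downloaded, so no checksum is computed.
  return :skipped if size > DOWNLOAD_LIMIT

  # Assumption: with no browser checksum there is nothing to compare against.
  return :skipped if client_sha1.nil?

  actual_sha1 = Digest::SHA1.hexdigest(download.call)
  actual_sha1 == client_sha1 ? :verified : :mismatch
end
```

Passing the download as a callable keeps the sketch from fetching the object when the size check already rules out verification.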


75 Commits
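
Note on `c25dc43f54`: the before/after strings quoted in that commit message suggest falling back to the error's class name when its message is empty. A minimal sketch of that idea, assuming a hypothetical helper `download_failure_message` and a stand-in error class (neither comes from the Discourse code base):

```ruby
# Hypothetical stand-in for an AWS error class; defined here so the
# sketch runs without the aws-sdk-s3 gem.
class FakeNotFound < StandardError; end

# Build a failure message that is still informative when the error
# carries an empty message: fall back to the error's class name.
def download_failure_message(filename, error)
  detail = error.message.to_s.empty? ? error.class.name : error.message
  "Failed to download #{filename} because #{detail}"
end

puts download_failure_message("backup.tar.gz", FakeNotFound.new(""))
puts download_failure_message("backup.tar.gz", FakeNotFound.new("access denied"))
```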