Commit Graph

15 Commits

Author SHA1 Message Date
David Taylor
9a36286920
FIX: Open file handles 'just in time' during s3 migration (#28806)
Previously we were opening the file handles, then putting them in a queue for upload. If that queue grows too large, we can hit a maximum open files limit.

This commit opens the file handle 'just in time', so the maximum number of open handles is equal to the upload concurrency (20).
2024-09-09 18:39:26 +01:00
Matt Palmer
bd9c919e06
FIX: don't use etags for post-upload verification (#21923)
They don't work for server-side encryption with customer keys, and so instead we just use Content-MD5 to ensure there was no corruption in transit, which is the best we can do.

See also: https://meta.discourse.org/t/s3-uploads-incompatible-with-server-side-encryption/266853
2023-07-07 09:53:49 +02:00
Matt Palmer
a98d2a8086
FEATURE: allow S3 ACLs to be disabled (#21769)
AWS recommends running buckets without ACLs, and to use resource policies to manage access control instead.
This is not a bad idea, because S3 ACLs are whack, and while resource policies are also whack, they're a more constrained form of whack.
Further, some compliance regimes get antsy if you don't go with the vendor's recommended settings, and arguing that you need to enable ACLs on a bucket just to store images in there is more hassle than it's worth.
The new site setting (s3_use_acls) cannot be disabled when secure
uploads is enabled -- the latter relies on private ACLs for security
at this point in time. We may want to reexamine this in future.
2023-06-06 15:47:40 +10:00
Loïc Guitaut
f7c57fbc19 DEV: Enable unless cops
We discussed the use of `unless` internally and decided to enforce
available rules from rubocop to restrict its most problematic uses.
2023-02-21 10:30:48 +01:00
David Taylor
6417173082
DEV: Apply syntax_tree formatting to lib/* 2023-01-09 12:10:19 +00:00
David Taylor
b0416cb1c1
FEATURE: Upload to s3 in parallel to speed up backup restores (#13391)
Uploading lots of small files can be made significantly faster by parallelizing the `s3.put_object` calls. In testing, an UPLOAD_CONCURRENCY of 10 made a large restore 10x faster. An UPLOAD_CONCURRENCY of 20 made the same restore 18x faster.

This commit is careful to parallelize as little as possible, to reduce the chance of concurrency issues. In the worker threads, no database transactions are performed. All modification of shared objects is controlled with a mutex.

Unfortunately we do not have any existing tests for the `ToS3Migration` class. This change has been tested with a large site backup (120k uploads totalling 45GB)
2021-06-16 10:34:39 +01:00
David Taylor
35e1e009fa
FIX: Allow restoring non-subfolder backup to subfolder site (#12537)
`GlobalSetting.relative_url_root` comes from the destination site. We
can't be sure whether it was the same on the original site. It's safer
to use a wildcard here, so we can backup/restore sites with different
relative_url_root values.
2021-04-12 14:00:52 +10:00
Martin Brennan
31e31ef449
SECURITY: Add content-disposition: attachment for SVG uploads
* strip out the href and xlink:href attributes from use element that
  are _not_ anchors in svgs which can be used for XSS
* adding the content-disposition: attachment ensures that
  uploaded SVGs cannot be opened and executed using the XSS exploit.
  svgs embedded using an img tag do not suffer from the same exploit
2020-07-09 13:31:48 +10:00
Martin Brennan
e92909aa77
FIX: Use ActionDispatch::Http::ContentDisposition for uploads content-disposition (#10108)
See https://meta.discourse.org/t/broken-pipe-error-when-uploading-to-a-s3-clone-a-pdf-with-a-name-containing-e-i-etc/155414

When setting content-disposition for attachment, use the ContentDisposition class to format it. This handles filenames with weird characters and localization (accented characters) correctly.
2020-06-23 17:10:56 +10:00
Gerhard Schlager
c6b411f6c1 FIX: Restore to S3 didn't work without env variables
The `uplaods:migrate_to_s3` rake task should always use the environment variables, because you usually don't want to break your site's uploads during the migration. But restoring a backup should work with site settings as well as environment variables, otherwise you can't restore uploads to S3 from the web interface.
2020-04-19 20:24:40 +02:00
Gerhard Schlager
baae0e7446 FIX: Infinite loop in migrate_to_s3 rake task 2020-04-19 20:24:40 +02:00
Gerhard Schlager
5bffb033df FIX: The migrate_to_s3 rake task couldn't find the AWS SDK 2020-03-26 16:41:10 +01:00
Gerhard Schlager
93b8b04b06 FIX: Migrating uploads to S3 could miss files
The rake task aborted the migration with "Already migrated" when all upload URLs linked to the correct S3 bucket even though the files didn't exist on S3. By removing the first check we force the rake task to check for the existance of uploads on S3.
2020-03-04 12:50:48 +01:00
Gerhard Schlager
0adab26e45 FIX: Don't count ignored, missing uploads in migration to S3 2020-02-12 16:18:52 +01:00
Gerhard Schlager
e474cda321 REFACTOR: Restoring of backups and migration of uploads to S3 2020-01-14 11:41:35 +01:00