discourse/lib/s3_helper.rb



# frozen_string_literal: true

require "aws-sdk-s3"

class S3Helper
  FIFTEEN_MEGABYTES = 15 * 1024 * 1024

  class SettingMissing < StandardError; end

  attr_reader :s3_bucket_name, :s3_bucket_folder_path

  ##
  # Controls the following:
  #
  # * cache time for secure-media URLs
  # * expiry time for S3 presigned URLs, which include backup downloads and
  #   any upload that has a private ACL (e.g. secure uploads)
  DOWNLOAD_URL_EXPIRES_AFTER_SECONDS ||= 5.minutes.to_i

  ##
  # Controls the following:
  #
  # * presigned put_object URLs for direct S3 uploads
  UPLOAD_URL_EXPIRES_AFTER_SECONDS ||= 10.minutes.to_i

  def initialize(s3_bucket_name, tombstone_prefix = '', options = {})
    @s3_client = options.delete(:client)
    @s3_options = default_s3_options.merge(options)

    @s3_bucket_name, @s3_bucket_folder_path = begin
      raise Discourse::InvalidParameters.new("s3_bucket_name") if s3_bucket_name.blank?
      self.class.get_bucket_and_folder_path(s3_bucket_name)
    end

    @tombstone_prefix =
      if @s3_bucket_folder_path
        File.join(@s3_bucket_folder_path, tombstone_prefix)
      else
        tombstone_prefix
      end
  end

  def self.build_from_config(use_db_s3_config: false, for_backup: false, s3_client: nil)
    setting_klass = use_db_s3_config ? SiteSetting : GlobalSetting
    options = S3Helper.s3_options(setting_klass)
    options[:client] = s3_client if s3_client.present?

    bucket =
      if for_backup
        setting_klass.s3_backup_bucket
      else
        use_db_s3_config ? SiteSetting.s3_upload_bucket : GlobalSetting.s3_bucket
      end

    S3Helper.new(bucket.downcase, '', options)
  end

  def self.get_bucket_and_folder_path(s3_bucket_name)
    s3_bucket_name.downcase.split("/", 2)
  end

  def upload(file, path, options = {})
    path = get_path_for_s3_upload(path)
    obj = s3_bucket.object(path)

    etag = begin
      if File.size(file.path) >= FIFTEEN_MEGABYTES
        options[:multipart_threshold] = FIFTEEN_MEGABYTES
        obj.upload_file(file, options)
        obj.load
        obj.etag
      else
        options[:body] = file
        obj.put(options).etag
      end
    end

    [path, etag.gsub('"', '')]
  end

  def remove(s3_filename, copy_to_tombstone = false)
    s3_filename = s3_filename.dup

    # copy the file to the tombstone
    if copy_to_tombstone && @tombstone_prefix.present?
      self.copy(
        get_path_for_s3_upload(s3_filename),
        File.join(@tombstone_prefix, s3_filename)
      )
    end

    # delete the file
    s3_filename.prepend(multisite_upload_path) if Rails.configuration.multisite
    delete_object(get_path_for_s3_upload(s3_filename))
  rescue Aws::S3::Errors::NoSuchKey
    # the object is already gone, nothing to do
  end

  def delete_object(key)
    s3_bucket.object(key).delete
  rescue Aws::S3::Errors::NoSuchKey
    # the object is already gone, nothing to do
  end

  def copy(source, destination, options: {})
    if options[:apply_metadata_to_destination]
      options = options.except(:apply_metadata_to_destination).merge(metadata_directive: "REPLACE")
    end

    destination = get_path_for_s3_upload(destination)

    if !Rails.configuration.multisite
      options[:copy_source] = File.join(@s3_bucket_name, source)
    else
      if source.include?(multisite_upload_path) || source.include?(@tombstone_prefix)
        options[:copy_source] = File.join(@s3_bucket_name, source)
      elsif @s3_bucket_folder_path
        folder, filename = begin
          source.split("/", 2)
        end
        options[:copy_source] = File.join(@s3_bucket_name, folder, multisite_upload_path, filename)
      else
        options[:copy_source] = File.join(@s3_bucket_name, multisite_upload_path, source)
      end
    end

    destination_object = s3_bucket.object(destination)

    # TODO: copy_source is a legacy option here and may become unsupported
    # in later versions; we should change to use Aws::S3::Client#copy_object
    # at some point.
    #
    # See https://github.com/aws/aws-sdk-ruby/blob/version-3/gems/aws-sdk-s3/lib/aws-sdk-s3/customizations/object.rb#L67-L74
    #
    # ----
    #
    # Also note, any options for metadata (e.g. content_disposition, content_type)
    # will not be applied unless the metadata_directive = "REPLACE" option is passed
    # in. If this is not passed in, the source object's metadata will be used.
    response = destination_object.copy_from(options)

    [destination, response.copy_object_result.etag.gsub('"', '')]
  end

  # Several places in the application need certain CORS rules to exist
  # inside an S3 bucket so requests to the bucket can be made
  # directly from the browser. The s3:ensure_cors_rules rake task
  # is used to ensure these rules exist for assets, S3 backups, and
  # direct S3 uploads, depending on configuration.
  def ensure_cors!(rules = nil)
    return unless SiteSetting.s3_install_cors_rule
    rules = [rules] if !rules.is_a?(Array)
    existing_rules = fetch_bucket_cors_rules

    new_rules = rules - existing_rules
    return false if new_rules.empty?

    final_rules = existing_rules + new_rules

    begin
      s3_resource.client.put_bucket_cors(
        bucket: @s3_bucket_name,
        cors_configuration: {
          cors_rules: final_rules
        }
      )
    rescue Aws::S3::Errors::AccessDenied => err
      # TODO (martin) Remove this warning log level once we are sure this new
      # ensure_cors! rule is functioning correctly.
      Discourse.warn_exception(err, message: "Could not PutBucketCors rules for #{@s3_bucket_name}, rules: #{final_rules}")
      return false
    end

    true
  end

  def update_lifecycle(id, days, prefix: nil, tag: nil)
    filter = {}

    if prefix
      filter[:prefix] = prefix
    elsif tag
      filter[:tag] = tag
    end

    # cf. http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
    rule = {
      id: id,
      status: "Enabled",
      expiration: { days: days },
      filter: filter
    }

    rules = []

    begin
      rules = s3_resource.client.get_bucket_lifecycle_configuration(bucket: @s3_bucket_name).rules
    rescue Aws::S3::Errors::NoSuchLifecycleConfiguration
      # skip trying to merge
    end

    # in the past we had a rule that was called purge-tombstone vs purge_tombstone
    # just go ahead and normalize for our bucket
    rules.delete_if do |r|
      r.id.gsub('_', '-') == id.gsub('_', '-')
    end

    rules << rule

    # normalize filter in rules, due to AWS library bug
    rules = rules.map do |r|
      r = r.to_h
      prefix = r.delete(:prefix)
      if prefix
        r[:filter] = { prefix: prefix }
      end
      r
    end

    s3_resource.client.put_bucket_lifecycle_configuration(
      bucket: @s3_bucket_name,
      lifecycle_configuration: {
        rules: rules
    })
  end

  def update_tombstone_lifecycle(grace_period)
    return if !SiteSetting.s3_configure_tombstone_policy
    return if @tombstone_prefix.blank?
    update_lifecycle("purge_tombstone", grace_period, prefix: @tombstone_prefix)
  end

  def list(prefix = "", marker = nil)
    options = { prefix: get_path_for_s3_upload(prefix) }
    options[:marker] = marker if marker.present?
    s3_bucket.objects(options)
  end

  def tag_file(key, tags)
    tag_array = []
    tags.each do |k, v|
      tag_array << { key: k.to_s, value: v.to_s }
    end

    s3_resource.client.put_object_tagging(
      bucket: @s3_bucket_name,
      key: key,
      tagging: {
        tag_set: tag_array
      }
    )
  end

  def object(path)
    s3_bucket.object(get_path_for_s3_upload(path))
  end

  def self.s3_options(obj)
    opts = {
      region: obj.s3_region
    }

    opts[:endpoint] = SiteSetting.s3_endpoint if SiteSetting.s3_endpoint.present?
    opts[:http_continue_timeout] = SiteSetting.s3_http_continue_timeout

    unless obj.s3_use_iam_profile
      opts[:access_key_id] = obj.s3_access_key_id
      opts[:secret_access_key] = obj.s3_secret_access_key
    end

    opts
  end

  def download_file(filename, destination_path, failure_message = nil)
    object(filename).download_file(destination_path)
  rescue => err
    raise failure_message&.to_s || "Failed to download #{filename} because #{err.message.length > 0 ? err.message : err.class.to_s}"
  end

  def s3_client
    @s3_client ||= Aws::S3::Client.new(@s3_options)
  end

  def s3_inventory_path(path = 'inventory')
    get_path_for_s3_upload(path)
  end

  private

  def fetch_bucket_cors_rules
    begin
      s3_resource.client.get_bucket_cors(
        bucket: @s3_bucket_name
      ).cors_rules&.map(&:to_h) || []
    rescue Aws::S3::Errors::NoSuchCORSConfiguration
      # no rule
      []
    end
  end

  def default_s3_options
    if SiteSetting.enable_s3_uploads?
      options = self.class.s3_options(SiteSetting)
      check_missing_site_options
      options
    elsif GlobalSetting.use_s3?
      self.class.s3_options(GlobalSetting)
    else
      {}
    end
  end

  def get_path_for_s3_upload(path)
    if @s3_bucket_folder_path &&
       !path.starts_with?(@s3_bucket_folder_path) &&
       !path.starts_with?(File.join(FileStore::BaseStore::TEMPORARY_UPLOAD_PREFIX, @s3_bucket_folder_path))
      return File.join(@s3_bucket_folder_path, path)
    end

    path
  end

  def multisite_upload_path
    path = File.join("uploads", RailsMultisite::ConnectionManagement.current_db, "/")
    return path if !Rails.env.test?
    File.join(path, "test_#{ENV['TEST_ENV_NUMBER'].presence || '0'}", "/")
  end

  def s3_resource
    Aws::S3::Resource.new(client: s3_client)
  end

  def s3_bucket
    @s3_bucket ||= begin
      bucket = s3_resource.bucket(@s3_bucket_name)
      bucket.create unless bucket.exists?
      bucket
    end
  end

  def check_missing_site_options
    unless SiteSetting.s3_use_iam_profile
      raise SettingMissing.new("access_key_id") if SiteSetting.s3_access_key_id.blank?
      raise SettingMissing.new("secret_access_key") if SiteSetting.s3_secret_access_key.blank?
    end
  end
end