Commit Graph

63 Commits

Author SHA1 Message Date
Alan Guo Xiang Tan
af642d0d69
Revert "FEATURE: Mark bad uploads with :invalid_url ()" ()
This reverts commit 5a00a041f1.

Implementation is currently not correct. Multiple uploads can share the
same etag but have different paths in the S3 bucket.
2024-11-08 13:04:52 +08:00
Bianca Nenciu
5a00a041f1
FEATURE: Mark bad uploads with :invalid_url ()
A "bad upload" in this context is a upload with a mismatched URL. This can happen when changing the S3 bucket used for uploads and the upload records in the database have not been remapped correctly.
2024-11-08 08:05:14 +08:00
Alan Guo Xiang Tan
5038cad68e
DEV: Restore missing_s3_uploads stats count if site was restored ()
This commit ensures that we reset the `missing_s3_uploads` status count
if there are no inventory files which are at least 2 days older than the
site's restored date.

Otherwise, a site with missing uploads but was subsequntly restored will
be continue to report missing uploads for 2 days.
2024-07-19 14:22:58 +08:00
Alan Guo Xiang Tan
86e5f46175
DEV: Add hidden s3_inventory_bucket_region site setting ()
This commit adds a hidden `s3_inventory_bucket_region` site setting to
specify the region of the `s3_inventory_bucket` when the `S3Inventory`
class initializes an instance of the `S3Helper`. By default, the
`S3Helper` class uses the value of the `s3_region` site setting but the
region of the `s3_inventory_bucket` is not always the same as the
`s3_region` configured.
2024-07-09 12:03:43 +08:00
Alan Guo Xiang Tan
8cf4ed5f88
DEV: Introduce hidden s3_inventory_bucket site setting ()
This commit introduces a hidden `s3_inventory_bucket` site setting which
replaces the `enable_s3_inventory` and `s3_configure_inventory_policy`
site setting.

The reason `enable_s3_inventory` and `s3_configure_inventory_policy`
site settings are removed is because this feature has technically been
broken since it was introduced. When the `enable_s3_inventory` feature
is turned on, the app will because configure a daily inventory policy for the
`s3_upload_bucket` bucket and store the inventories under a prefix in
the bucket. The problem here is that once the inventories are created,
there is nothing cleaning up all these inventories so whoever that has
enabled this feature would have been paying the cost of storing a whole
bunch of inventory files which are never used. Given that we have not
received any complains about inventory files inflating S3 storage costs,
we think that it is very likely that this feature is no longer being
used and we are looking to drop support for this feature in the not too
distance future.

For now, we will still support a hidden `s3_inventory_bucket` site
setting which site administrators can configure via the
`DISCOURSE_S3_INVENTORY_BUCKET` env.
2024-06-10 13:16:00 +08:00
Alan Guo Xiang Tan
dc55b645b2
DEV: Allow site administrators to mark S3 uploads with a missing status ()
This commit introduces the following changes which allows a site
administrator to mark `Upload` records with the `s3_file_missing`
verification status which will result in the `Upload` record being ignored when
`Discourse.store.list_missing_uploads` is ran on a site where S3 uploads
are enabled and `SiteSetting.enable_s3_inventory` is set to `true`.

1. Introduce `s3_file_missing` to `Upload.verification_statuses`
2. Introduce `Upload.mark_invalid_s3_uploads_as_missing` which updates
   `Upload#verification_status` of all `Upload` records from `invalid_etag` to `s3_file_missing`.
3. Introduce `rake uploads:mark_invalid_s3_uploads_as_missing` Rake task
   which allows a site administrator to change `Upload` records with
`invalid_etag` verification status to the `s3_file_missing`
verificaton_status.
4. Update `S3Inventory` to ignore `Upload` records with the
   `s3_file_missing` verification status.
2024-05-30 08:37:38 +08:00
Alan Guo Xiang Tan
df16ab0758
FIX: S3Inventory to ignore files older than last backup restore date ()
This commit updates `S3Inventory#files` to ignore S3 inventory files
which have a `last_modified` timestamp which are not at least 2 days
older than `BackupMetadata.last_restore_date` timestamp.

This check was previously only in `Jobs::EnsureS3UploadsExistence` but
`S3Inventory` can also be used via Rake tasks so this protection needs
to be in `S3Inventory` and not in the scheduled job.
2024-05-24 10:54:06 +08:00
Daniel Waterworth
666536cbd1
DEV: Prefer \A and \z over ^ and $ in regexes () 2023-01-20 12:52:49 -06:00
David Taylor
6417173082
DEV: Apply syntax_tree formatting to lib/* 2023-01-09 12:10:19 +00:00
Gerhard Schlager
62be87d5f3
FIX: Filtering rows of S3 inventory files was too strict ()
In some setups the keys start with "original/" and "optimized/" and in some setups the key is something like "foo/original/", so lets make the filter less strict.
2022-11-22 21:41:22 +01:00
Gerhard Schlager
a597ef7131
PERF: Speed up S3 inventory updates ()
The UPDATE statement could lock the `uploads` table for a very long time
when the `verification_status` of lots of uploads changed. Splitting up
and simplifying the UPDATE solves that problem.

Also, this change ensures that only the needed data from the inventory
gets inserted into the `TEMP TABLE`. For example, there's no need to
have records for optimized images in that table when the `uploads` table
gets updated.
2022-11-20 21:52:30 +01:00
Peter Zhu
c5fd8c42db
DEV: Fix methods removed in Ruby 3.2 ()
* File.exists? is deprecated and removed in Ruby 3.2 in favor of
File.exist?
* Dir.exists? is deprecated and removed in Ruby 3.2 in favor of
Dir.exist?
2022-01-05 18:45:08 +01:00
Sam
e0c952290b
FIX: increase inventory lag for s3 to 2 days ()
Inventory on S3 always lagged, over the past few weeks we are noticing that
1 day of lag is not enough.

We are increasing this to 2, to ensure that we do not get false positive
reports.
2020-12-30 16:05:42 +11:00
Penar Musaraj
9f6c4ad71a
FIX: inconsistency in S3 inventory config ()
Ensures it matches S3 inventory config generation in our hosting.
2020-11-05 08:39:40 -05:00
Martin Brennan
80268357e7
DEV: Change upload verified column to be integer ()
Per review https://review.discourse.org/t/dev-add-verified-to-uploads-and-fill-in-s3-inventory-10406/14180

Change the verified column for Upload to a verified_status integer column, to avoid having NULL as a weird implicit status.
2020-09-17 13:35:29 +10:00
Penar Musaraj
06b4ca5dc7
FIX: Mark only uploads as verified/unverified in S3 inventory 2020-09-14 10:21:34 -04:00
David Taylor
bd0a7553c4
DEV: Detect when s3 inventory failure is caused by etag difference () 2020-08-13 09:30:28 +10:00
Martin Brennan
b950b3fb3f
DEV: Add verified to uploads and fill in S3 inventory ()
When we run the S3 inventory, mark uploads that exist as verified true, those that don't as verified false, and uploads not included in the check / not yet checked as verified nil.
2020-08-11 14:43:51 +10:00
David Taylor
16c65a94f7
PERF: Preload S3 inventory data for multisite clusters 2020-07-29 10:31:55 +01:00
David Taylor
ec4024fe6d
FIX: Keep by_users check in S3 inventory
Partial revert of 8515d8fa - the by_users check is ensuring we don't raise errors for fixtures
2020-07-21 17:19:56 +01:00
David Taylor
8515d8fae5
FIX: Improve S3 inventory logic
Previously we considered 'upload rows without etags' to be exempt from the check. This is bad, because older/migrated sites might not have etags on all their uploads. We should consider rows without etags to be broken, since we can't check them against the inventory.

This also removes the `by_users` scope. We need all uploads to be working, even ones created by the system user.
2020-07-21 15:55:53 +01:00
David Taylor
3d65678a13
DEV: Add timestamp columns to optimized_images table ()
This allows us to filter by created/updated date when comparing to an S3 inventory.
2020-07-14 11:50:33 +01:00
David Taylor
7f2b5a446a
PERF: Remove post_upload recovery in daily EnsureS3UploadsExistence job ()
This is a very expensive process, and it should only be required in exceptional circumstances. It is possible to run a similar recovery using `rake uploads:recover` (5284d41a8e/lib/upload_recovery.rb (L135-L184))
2020-07-06 16:26:40 +01:00
Sam Saffron
38a30a6e96
DEV: correct regression and correct tests
etag change in 31976ecf was incorrect, revert it

Also correct regression in test suite.
2020-07-06 10:56:19 +10:00
Sam Saffron
31976ecfeb
PERF: only update etag when it changes
Previously when synchronizing upload etags we would update every single one
regardless of change.
2020-07-06 10:40:04 +10:00
Jarek Radosz
73b04976e5
FIX: Use updated_at in the S3 inventory job ()
When we change upload's sha1 (e.g. when resizing images) it won't match the data in the most recent S3 inventory index. With this change the uploads that have been updated since the inventory has been generated are ignored.
2020-01-31 11:02:44 +01:00
Vinoth Kannan
3b7f5db5ba
FIX: parallel spec system needs a dedicated upload folder for each worker. () 2019-12-18 11:21:57 +05:30
Osama Sayegh
68708db721 DEV: S3Inventory#unsorted_files should always return an array () 2019-08-23 17:59:31 +10:00
Sam Saffron
e53a171916 FIX: hold s3 related distributed locks longer
These operations are pretty expensive and can take multiple minutes due to
networking.

Hold distributed mutex for much longer.
2019-08-15 11:48:44 +10:00
Vinoth Kannan
9919ee1900 FIX: remove the tmp inventory files after the s3 uploads check. 2019-08-13 11:52:57 +05:30
Guo Xiang Tan
8a64b0c8e8 Revert "DEV: Remove unused kwarg and properly check for local missing uploads."
This reverts commit 97769f3d02.

The code is confusing but this change is quite risky. Defer for now
until we can look at it properly.
2019-07-29 14:35:34 +08:00
Guo Xiang Tan
97769f3d02 DEV: Remove unused kwarg and properly check for local missing uploads. 2019-07-29 14:21:06 +08:00
Vinoth Kannan
47deb8b3da FIX: use same id for both original & optimized inventories in multisite setup. 2019-07-25 14:16:47 +05:30
Vinoth Kannan
ad04ce9f43 FIX: remove post upload record creation inside 'find_missing_uploads' method. 2019-07-19 01:44:08 +05:30
Vinoth Kannan
35d6fff69e PERF: use url instead of file key in temporary inventory table. 2019-06-13 22:03:58 +05:30
David Taylor
ed21128ee6 FIX: Do not change directory when decompressing S3 inventory
In sidekiq, jobs are run in multiple threads within the same process. `cd` affects the entire process, so can cause unexpected issues in other running jobs.
2019-06-13 17:13:50 +01:00
Vinoth Kannan
d74ee9dbce DEV: skip S3 inventory records without correct multisite prefix. 2019-06-08 18:36:06 +05:30
Vinoth Kannan
2941c77abc FIX: skip upload recovery if file not found in s3 2019-05-21 00:06:36 +05:30
Vinoth Kannan
2a7065c505 FIX: skip uploads without etag in s3 inventory check. 2019-05-20 00:09:52 +05:30
Vinoth Kannan
3172172b52 remove unused local variable
ec84c87ddb
2019-05-16 15:39:13 +05:30
Vinoth Kannan
ec84c87ddb FIX: skip validation while recovering uploads from s3
TODO: add tests
2019-05-16 15:37:11 +05:30
Vinoth Kannan
40328f055e FIX: retrieve original filename from s3 object's content disposition header 2019-05-16 09:47:22 +05:30
Guo Xiang Tan
dd49be27d3 DEV: Fix undefined variable.
Follow up to e8fafbc123.
2019-05-16 11:28:48 +08:00
Vinoth Kannan
f5a217be92 Fix typo in condition value. 2019-05-07 17:09:08 +05:30
Vinoth Kannan
e8fafbc123 List and restore missing post uploads from S3 inventory. 2019-05-04 01:16:20 +05:30
Vinoth Kannan
73418aaf73 DEV: Add bucket folder path to inventory id 2019-05-02 04:35:35 +05:30
Vinoth Kannan
a8f410a9c5
FEATURE: Create new helper method 'Discourse.stats' () 2019-04-17 12:45:04 +05:30
Vinoth Kannan
35431a8ddb FIX: set missing count in redis even if zero 2019-04-04 20:05:57 +05:30
Vinoth Kannan
df6ef856e6 DEV: save missing s3 uploads count in redis 2019-04-04 19:05:57 +05:30
Guo Xiang Tan
243fb8d9ad Fix the build. 2019-03-13 17:39:07 +08:00