discourse

mirror of https://github.com/discourse/discourse.git synced 2024-11-22 05:47:45 +08:00

Author	SHA1	Message	Date
Bianca Nenciu	5a00a041f1	FEATURE: Mark bad uploads with :invalid_url (#29640 ) A "bad upload" in this context is a upload with a mismatched URL. This can happen when changing the S3 bucket used for uploads and the upload records in the database have not been remapped correctly.	2024-11-08 08:05:14 +08:00
Alan Guo Xiang Tan	5038cad68e	DEV: Restore `missing_s3_uploads` stats count if site was restored (#27984 ) This commit ensures that we reset the `missing_s3_uploads` status count if there are no inventory files which are at least 2 days older than the site's restored date. Otherwise, a site with missing uploads but was subsequntly restored will be continue to report missing uploads for 2 days.	2024-07-19 14:22:58 +08:00
Alan Guo Xiang Tan	86e5f46175	DEV: Add hidden `s3_inventory_bucket_region` site setting (#27786 ) This commit adds a hidden `s3_inventory_bucket_region` site setting to specify the region of the `s3_inventory_bucket` when the `S3Inventory` class initializes an instance of the `S3Helper`. By default, the `S3Helper` class uses the value of the `s3_region` site setting but the region of the `s3_inventory_bucket` is not always the same as the `s3_region` configured.	2024-07-09 12:03:43 +08:00
Alan Guo Xiang Tan	8cf4ed5f88	DEV: Introduce hidden `s3_inventory_bucket` site setting (#27304 ) This commit introduces a hidden `s3_inventory_bucket` site setting which replaces the `enable_s3_inventory` and `s3_configure_inventory_policy` site setting. The reason `enable_s3_inventory` and `s3_configure_inventory_policy` site settings are removed is because this feature has technically been broken since it was introduced. When the `enable_s3_inventory` feature is turned on, the app will because configure a daily inventory policy for the `s3_upload_bucket` bucket and store the inventories under a prefix in the bucket. The problem here is that once the inventories are created, there is nothing cleaning up all these inventories so whoever that has enabled this feature would have been paying the cost of storing a whole bunch of inventory files which are never used. Given that we have not received any complains about inventory files inflating S3 storage costs, we think that it is very likely that this feature is no longer being used and we are looking to drop support for this feature in the not too distance future. For now, we will still support a hidden `s3_inventory_bucket` site setting which site administrators can configure via the `DISCOURSE_S3_INVENTORY_BUCKET` env.	2024-06-10 13:16:00 +08:00
Alan Guo Xiang Tan	dc55b645b2	DEV: Allow site administrators to mark S3 uploads with a missing status (#27222 ) This commit introduces the following changes which allows a site administrator to mark `Upload` records with the `s3_file_missing` verification status which will result in the `Upload` record being ignored when `Discourse.store.list_missing_uploads` is ran on a site where S3 uploads are enabled and `SiteSetting.enable_s3_inventory` is set to `true`. 1. Introduce `s3_file_missing` to `Upload.verification_statuses` 2. Introduce `Upload.mark_invalid_s3_uploads_as_missing` which updates `Upload#verification_status` of all `Upload` records from `invalid_etag` to `s3_file_missing`. 3. Introduce `rake uploads:mark_invalid_s3_uploads_as_missing` Rake task which allows a site administrator to change `Upload` records with `invalid_etag` verification status to the `s3_file_missing` verificaton_status. 4. Update `S3Inventory` to ignore `Upload` records with the `s3_file_missing` verification status.	2024-05-30 08:37:38 +08:00
Alan Guo Xiang Tan	df16ab0758	FIX: `S3Inventory` to ignore files older than last backup restore date (#27166 ) This commit updates `S3Inventory#files` to ignore S3 inventory files which have a `last_modified` timestamp which are not at least 2 days older than `BackupMetadata.last_restore_date` timestamp. This check was previously only in `Jobs::EnsureS3UploadsExistence` but `S3Inventory` can also be used via Rake tasks so this protection needs to be in `S3Inventory` and not in the scheduled job.	2024-05-24 10:54:06 +08:00
Daniel Waterworth	666536cbd1	DEV: Prefer \A and \z over ^ and $ in regexes (#19936 )	2023-01-20 12:52:49 -06:00
David Taylor	6417173082	DEV: Apply syntax_tree formatting to `lib/*`	2023-01-09 12:10:19 +00:00
Gerhard Schlager	62be87d5f3	FIX: Filtering rows of S3 inventory files was too strict (#19153 ) In some setups the keys start with "original/" and "optimized/" and in some setups the key is something like "foo/original/", so lets make the filter less strict.	2022-11-22 21:41:22 +01:00
Gerhard Schlager	a597ef7131	PERF: Speed up S3 inventory updates (#19110 ) The UPDATE statement could lock the `uploads` table for a very long time when the `verification_status` of lots of uploads changed. Splitting up and simplifying the UPDATE solves that problem. Also, this change ensures that only the needed data from the inventory gets inserted into the `TEMP TABLE`. For example, there's no need to have records for optimized images in that table when the `uploads` table gets updated.	2022-11-20 21:52:30 +01:00
Peter Zhu	c5fd8c42db	DEV: Fix methods removed in Ruby 3.2 (#15459 ) * File.exists? is deprecated and removed in Ruby 3.2 in favor of File.exist? * Dir.exists? is deprecated and removed in Ruby 3.2 in favor of Dir.exist?	2022-01-05 18:45:08 +01:00
Sam	e0c952290b	FIX: increase inventory lag for s3 to 2 days (#11606 ) Inventory on S3 always lagged, over the past few weeks we are noticing that 1 day of lag is not enough. We are increasing this to 2, to ensure that we do not get false positive reports.	2020-12-30 16:05:42 +11:00
Penar Musaraj	9f6c4ad71a	FIX: inconsistency in S3 inventory config (#11112 ) Ensures it matches S3 inventory config generation in our hosting.	2020-11-05 08:39:40 -05:00
Martin Brennan	80268357e7	DEV: Change upload verified column to be integer (#10643 ) Per review https://review.discourse.org/t/dev-add-verified-to-uploads-and-fill-in-s3-inventory-10406/14180 Change the verified column for Upload to a verified_status integer column, to avoid having NULL as a weird implicit status.	2020-09-17 13:35:29 +10:00
Penar Musaraj	06b4ca5dc7	FIX: Mark only uploads as verified/unverified in S3 inventory	2020-09-14 10:21:34 -04:00
David Taylor	bd0a7553c4	DEV: Detect when s3 inventory failure is caused by etag difference (#10427 )	2020-08-13 09:30:28 +10:00
Martin Brennan	b950b3fb3f	DEV: Add verified to uploads and fill in S3 inventory (#10406 ) When we run the S3 inventory, mark uploads that exist as verified true, those that don't as verified false, and uploads not included in the check / not yet checked as verified nil.	2020-08-11 14:43:51 +10:00
David Taylor	16c65a94f7	PERF: Preload S3 inventory data for multisite clusters	2020-07-29 10:31:55 +01:00
David Taylor	ec4024fe6d	FIX: Keep by_users check in S3 inventory Partial revert of `8515d8fa` - the by_users check is ensuring we don't raise errors for fixtures	2020-07-21 17:19:56 +01:00
David Taylor	8515d8fae5	FIX: Improve S3 inventory logic Previously we considered 'upload rows without etags' to be exempt from the check. This is bad, because older/migrated sites might not have etags on all their uploads. We should consider rows without etags to be broken, since we can't check them against the inventory. This also removes the `by_users` scope. We need all uploads to be working, even ones created by the system user.	2020-07-21 15:55:53 +01:00
David Taylor	3d65678a13	DEV: Add timestamp columns to optimized_images table (#10199 ) This allows us to filter by created/updated date when comparing to an S3 inventory.	2020-07-14 11:50:33 +01:00
David Taylor	7f2b5a446a	PERF: Remove post_upload recovery in daily EnsureS3UploadsExistence job (#10173 ) This is a very expensive process, and it should only be required in exceptional circumstances. It is possible to run a similar recovery using `rake uploads:recover` (`5284d41a8e/lib/upload_recovery.rb (L135-L184)`)	2020-07-06 16:26:40 +01:00
Sam Saffron	38a30a6e96	DEV: correct regression and correct tests etag change in `31976ecf` was incorrect, revert it Also correct regression in test suite.	2020-07-06 10:56:19 +10:00
Sam Saffron	31976ecfeb	PERF: only update etag when it changes Previously when synchronizing upload etags we would update every single one regardless of change.	2020-07-06 10:40:04 +10:00
Jarek Radosz	73b04976e5	FIX: Use updated_at in the S3 inventory job (#8823 ) When we change upload's sha1 (e.g. when resizing images) it won't match the data in the most recent S3 inventory index. With this change the uploads that have been updated since the inventory has been generated are ignored.	2020-01-31 11:02:44 +01:00
Vinoth Kannan	3b7f5db5ba	FIX: parallel spec system needs a dedicated upload folder for each worker. (#8547 )	2019-12-18 11:21:57 +05:30
Osama Sayegh	68708db721	DEV: `S3Inventory#unsorted_files` should always return an array (#8034 )	2019-08-23 17:59:31 +10:00
Sam Saffron	e53a171916	FIX: hold s3 related distributed locks longer These operations are pretty expensive and can take multiple minutes due to networking. Hold distributed mutex for much longer.	2019-08-15 11:48:44 +10:00
Vinoth Kannan	9919ee1900	FIX: remove the tmp inventory files after the s3 uploads check.	2019-08-13 11:52:57 +05:30
Guo Xiang Tan	8a64b0c8e8	Revert "DEV: Remove unused kwarg and properly check for local missing uploads." This reverts commit `97769f3d02`. The code is confusing but this change is quite risky. Defer for now until we can look at it properly.	2019-07-29 14:35:34 +08:00
Guo Xiang Tan	97769f3d02	DEV: Remove unused kwarg and properly check for local missing uploads.	2019-07-29 14:21:06 +08:00
Vinoth Kannan	47deb8b3da	FIX: use same id for both original & optimized inventories in multisite setup.	2019-07-25 14:16:47 +05:30
Vinoth Kannan	ad04ce9f43	FIX: remove post upload record creation inside 'find_missing_uploads' method.	2019-07-19 01:44:08 +05:30
Vinoth Kannan	35d6fff69e	PERF: use url instead of file key in temporary inventory table.	2019-06-13 22:03:58 +05:30
David Taylor	ed21128ee6	FIX: Do not change directory when decompressing S3 inventory In sidekiq, jobs are run in multiple threads within the same process. `cd` affects the entire process, so can cause unexpected issues in other running jobs.	2019-06-13 17:13:50 +01:00
Vinoth Kannan	d74ee9dbce	DEV: skip S3 inventory records without correct multisite prefix.	2019-06-08 18:36:06 +05:30
Vinoth Kannan	2941c77abc	FIX: skip upload recovery if file not found in s3	2019-05-21 00:06:36 +05:30
Vinoth Kannan	2a7065c505	FIX: skip uploads without etag in s3 inventory check.	2019-05-20 00:09:52 +05:30
Vinoth Kannan	3172172b52	remove unused local variable `ec84c87ddb`	2019-05-16 15:39:13 +05:30
Vinoth Kannan	ec84c87ddb	FIX: skip validation while recovering uploads from s3 TODO: add tests	2019-05-16 15:37:11 +05:30
Vinoth Kannan	40328f055e	FIX: retrieve original filename from s3 object's content disposition header	2019-05-16 09:47:22 +05:30
Guo Xiang Tan	dd49be27d3	DEV: Fix undefined variable. Follow up to `e8fafbc123`.	2019-05-16 11:28:48 +08:00
Vinoth Kannan	f5a217be92	Fix typo in condition value.	2019-05-07 17:09:08 +05:30
Vinoth Kannan	e8fafbc123	List and restore missing post uploads from S3 inventory.	2019-05-04 01:16:20 +05:30
Vinoth Kannan	73418aaf73	DEV: Add bucket folder path to inventory id	2019-05-02 04:35:35 +05:30
Vinoth Kannan	a8f410a9c5	FEATURE: Create new helper method 'Discourse.stats' (#7388 )	2019-04-17 12:45:04 +05:30
Vinoth Kannan	35431a8ddb	FIX: set missing count in redis even if zero	2019-04-04 20:05:57 +05:30
Vinoth Kannan	df6ef856e6	DEV: save missing s3 uploads count in redis	2019-04-04 19:05:57 +05:30
Guo Xiang Tan	243fb8d9ad	Fix the build.	2019-03-13 17:39:07 +08:00
Vinoth Kannan	da1ff2da2c	FIX: Create and consume temp table inside a transaction (#7030 ) To prevent access issue in pgbouncer which runs in transaction pooling	2019-02-20 13:52:40 +11:00

1 2

62 Commits