mirror of
https://github.com/rclone/rclone.git
synced 2025-02-02 12:40:46 +08:00
s3: add docs on data integrity
Some checks are pending
Docker beta build / Build image job (push) Waiting to run
See: https://forum.rclone.org/t/help-me-figure-out-how-to-verify-backup-accuracy-and-completeness-on-s3/37632/5
This commit is contained in:
parent
965bf19065
commit
91c8f92ccb
@ -435,6 +435,83 @@ If you are doing a server-side copy, you can also increase the number of transfe
You will need to experiment with these values to find the optimal settings for your setup.

### Data integrity

Rclone does its best to verify every part of an upload or download to
the s3 provider using various hashes.

Every HTTP transaction to/from the provider has a
`X-Amz-Content-Sha256` or a `Content-Md5` header to guard against
corruption of the HTTP body. The HTTP headers are protected by the
signature passed in the `Authorization` header.

All communication with the provider is done over HTTPS for encryption
and additional error protection.
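
As a rough illustration of the two body hashes mentioned above, the sketch below computes the values that would go into the `Content-Md5` and `X-Amz-Content-Sha256` headers for a given request body. The function name `body_integrity_headers` is hypothetical and this is not rclone's own code; it only shows the standard encodings (base64 of the raw MD5 digest, and lowercase hex of the SHA-256 digest).

```python
import base64
import hashlib


def body_integrity_headers(body: bytes) -> dict[str, str]:
    """Return illustrative body-integrity header values for a request body."""
    return {
        # Content-Md5 carries the base64 encoding of the raw 16-byte MD5 digest.
        "Content-Md5": base64.b64encode(hashlib.md5(body).digest()).decode("ascii"),
        # X-Amz-Content-Sha256 carries the lowercase hex SHA-256 of the body.
        "X-Amz-Content-Sha256": hashlib.sha256(body).hexdigest(),
    }
```

If either value does not match the body the provider actually receives, the provider can reject the request instead of storing corrupted data.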

#### Single part uploads

- Rclone uploads single part uploads with a `Content-Md5` using the
  MD5 hash read from the source. The provider checks this is correct
  on receipt of the data.

- Rclone then does a HEAD request (disable with `--s3-no-head`) to
  read the `ETag` back, which is the MD5 of the file, and checks it
  against what it sent.

Note that if the source does not have an MD5 then the single part
uploads will not have hash protection. In this case it is recommended
to use `--s3-upload-cutoff 0` so all files are uploaded as multipart
uploads.
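
The ETag comparison described for single part uploads can be sketched as follows. This is a minimal example, not rclone's implementation: `etag_matches_md5` is a hypothetical helper, and it assumes the provider returns the plain hex MD5 of the object as the `ETag`, wrapped in double quotes as HTTP ETags usually are.

```python
import hashlib


def etag_matches_md5(etag: str, data: bytes) -> bool:
    """Check a single part upload's ETag against the MD5 of the data sent."""
    # The ETag header value is normally wrapped in double quotes.
    return etag.strip('"').lower() == hashlib.md5(data).hexdigest()
```

A mismatch here would mean the object stored by the provider is not the data that was sent.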

#### Multipart uploads

For files above `--s3-upload-cutoff` rclone splits the file into
multiple parts for upload.

- Each part is protected with both an `X-Amz-Content-Sha256` and a
  `Content-Md5`.

When rclone has finished the upload of all the parts it then completes
the upload by sending:

- The MD5 hash of each part
- The number of parts
- This info is all protected with a `X-Amz-Content-Sha256`

The provider checks the MD5 for all the parts it has received against
what rclone sends and if it is good it returns OK.

Rclone then does a HEAD request (disable with `--s3-no-head`) and
checks the ETag is what it expects (in this case it should be the MD5
sum of all the MD5 sums of all the parts with the number of parts on
the end).

If the source has an MD5 sum then rclone will attach it to the object
as `X-Amz-Meta-Md5chksum` metadata, because the `ETag` for a multipart
upload can't easily be checked against the file: the chunk size must
be known in order to calculate it.
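
To make the multipart `ETag` concrete, here is a small sketch of how the expected value can be calculated from the data and a chunk size. The helper name `multipart_etag` is hypothetical and the code is only an illustration of the scheme described above (MD5 of the concatenated binary part MD5s, followed by `-` and the number of parts); it assumes non-empty data and fixed-size parts with a possibly shorter last part.

```python
import hashlib


def multipart_etag(data: bytes, chunk_size: int) -> str:
    """Compute the expected ETag for data uploaded in chunk_size parts."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Raw (binary) MD5 digest of each part, in upload order.
    part_digests = [
        hashlib.md5(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]
    # MD5 of the concatenated part digests, then "-<number of parts>".
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

Calling this on the same bytes with two different chunk sizes gives two different results, which is why the `ETag` alone is not enough to verify a file unless the chunk size used for the upload is known.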

#### Downloads

Rclone checks the MD5 hash of the data downloaded against either the
ETag or the `X-Amz-Meta-Md5chksum` metadata (if present) which rclone
uploads with multipart uploads.
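
The choice between the two hash sources on download could look roughly like the sketch below. Everything here is an assumption made for illustration: `expected_md5` is a hypothetical helper, the metadata key is taken to be `md5chksum` holding the base64 of the raw MD5 digest, and the order in which the two sources are consulted is a guess rather than rclone's documented behaviour.

```python
import base64
import binascii
import re
from typing import Optional

_PLAIN_MD5 = re.compile(r"[0-9a-f]{32}")


def expected_md5(etag: str, metadata: dict[str, str]) -> Optional[str]:
    """Return the hex MD5 a download should match, or None if unknown."""
    stored = metadata.get("md5chksum")
    if stored is not None:
        # Assumption: the metadata value is the base64 of the raw digest.
        try:
            raw = base64.b64decode(stored, validate=True)
        except (binascii.Error, ValueError):
            raw = b""
        if len(raw) == 16:
            return raw.hex()
    candidate = etag.strip('"').lower()
    # A multipart ETag has a "-N" suffix, so it is not a plain MD5.
    if _PLAIN_MD5.fullmatch(candidate):
        return candidate
    return None
```

When the function returns `None` there is no MD5 to compare the downloaded data against, which is the case for a multipart object uploaded without the extra metadata.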

#### Further checking

At each stage rclone and the provider are sending and checking hashes of
**everything**. Rclone deliberately HEADs each object after upload to
check it arrived safely for extra security. (You can disable this with
`--s3-no-head`).

If you require further assurance that your data is intact you can use
`rclone check` to check the hashes locally vs the remote.

And if you are feeling ultimately paranoid use `rclone check --download`
which will download the files and check them against the local copies.
(Note that this doesn't use disk to do this - it streams them in
memory).

### Versions

When bucket versioning is enabled (this can be done with rclone with