mirror of
https://github.com/rclone/rclone.git
synced 2025-02-02 12:59:45 +08:00
s3: add docs on data integrity
Some checks are pending
Docker beta build / Build image job (push) Waiting to run
See: https://forum.rclone.org/t/help-me-figure-out-how-to-verify-backup-accuracy-and-completeness-on-s3/37632/5
This commit is contained in:
parent
965bf19065
commit
91c8f92ccb
@ -435,6 +435,83 @@ If you are doing a server-side copy, you can also increase the number of transfe

You will need to experiment with these values to find the optimal settings for your setup.

### Data integrity

Rclone does its best to verify every part of an upload or download to
the S3 provider using various hashes.

Every HTTP transaction to/from the provider carries an
`X-Amz-Content-Sha256` or a `Content-Md5` header to guard against
corruption of the HTTP body. The HTTP headers are protected by the
signature passed in the `Authorization` header.

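As a rough illustration of the two body hashes mentioned above, the
sketch below computes them in Python. It is not rclone's code (rclone
is written in Go); the function name `body_integrity_headers` is made
up for this example, and the encodings (base64 for `Content-Md5`,
lowercase hex for `X-Amz-Content-Sha256`) are assumed to follow the
usual S3 conventions.

```python
import base64
import hashlib


def body_integrity_headers(body: bytes) -> dict:
    """Return the body-integrity headers for one HTTP request body.

    Hypothetical helper for illustration only, not part of rclone.
    """
    return {
        # Assumed convention: base64 of the raw 16-byte MD5 digest.
        "Content-Md5": base64.b64encode(hashlib.md5(body).digest()).decode("ascii"),
        # Assumed convention: lowercase hex SHA-256 of the body.
        "X-Amz-Content-Sha256": hashlib.sha256(body).hexdigest(),
    }
```

If a single byte of the body changes in transit, both values change, so
the provider can reject the request instead of storing corrupt data.
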
All communication with the provider is done over HTTPS for encryption
and additional error protection.

#### Single part uploads

- Rclone uploads single part uploads with a `Content-Md5` header using the
  MD5 hash read from the source. The provider checks this is correct
  on receipt of the data.

- Rclone then does a HEAD request (disable with `--s3-no-head`) to
  read the `ETag` back, which is the MD5 of the file, and checks it
  against what it sent.

Note that if the source does not have an MD5 then single part
uploads will not have hash protection. In this case it is recommended
to use `--s3-upload-cutoff 0` so all files are uploaded as multipart
uploads.

#### Multipart uploads

For files above `--s3-upload-cutoff` rclone splits the file into
multiple parts for upload.

- Each part is protected with both an `X-Amz-Content-Sha256` and a
  `Content-Md5`

When rclone has finished the upload of all the parts it then completes
the upload by sending:

- The MD5 hash of each part
- The number of parts
- This info is all protected with a `X-Amz-Content-Sha256`

The provider checks the MD5 for all the parts it has received against
what rclone sends and if it is good it returns OK.

Rclone then does a HEAD request (disable with `--s3-no-head`) and
checks the ETag is what it expects (in this case it should be the MD5
sum of all the MD5 sums of the parts, with the number of parts on
the end).

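The expected multipart ETag described above can be sketched in Python
as follows. This is an illustration, not rclone's implementation: the
function name `multipart_etag` is hypothetical, and it assumes the
common S3 convention that the raw (binary) MD5 digests of the parts are
concatenated, hashed again, and suffixed with `-` and the part count.

```python
import hashlib


def multipart_etag(data: bytes, chunk_size: int) -> str:
    """Compute the ETag expected for a multipart upload of `data`.

    Hypothetical helper: assumes parts of `chunk_size` bytes (the last
    part may be shorter) and the "MD5 of MD5s, dash, part count" form.
    """
    # MD5 digest (raw bytes) of each part, in upload order.
    part_digests = [
        hashlib.md5(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]
    # MD5 of the concatenated part digests, then the number of parts.
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

Calling `multipart_etag(data, 4)` and `multipart_etag(data, 5)` on the
same data gives different results, which shows why this ETag cannot be
checked against a file unless the chunk size used for the upload is
known.
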
If the source has an MD5 sum then rclone will attach it as
`X-Amz-Meta-Md5chksum` metadata, because the `ETag` for a multipart upload
can't easily be checked against the file: the chunk size must be
known in order to calculate it.

#### Downloads

Rclone checks the MD5 hash of the data downloaded against either the
ETag or the `X-Amz-Meta-Md5chksum` metadata (if present) which rclone
uploads with multipart uploads.

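A minimal sketch of that download check, in Python, might look like the
code below. It is not taken from rclone: the function names are
hypothetical, and it assumes the metadata is exposed under the key
`md5chksum` as a base64-encoded MD5, and that an ETag is only usable as
an MD5 when it is exactly 32 hex digits (so a multipart ETag with a
`-N` suffix is not).

```python
import base64
import hashlib


def expected_md5_hex(etag: str, metadata: dict):
    """Pick the MD5 (hex) to verify a download against, or None."""
    # Assumption: the stored checksum is base64 of the raw MD5 digest.
    stored = metadata.get("md5chksum")
    if stored:
        return base64.b64decode(stored).hex()
    # ETags are usually returned wrapped in double quotes.
    etag = etag.strip('"').lower()
    if len(etag) == 32 and all(c in "0123456789abcdef" for c in etag):
        return etag
    return None  # e.g. a multipart ETag with no stored MD5


def verify_download(data: bytes, etag: str, metadata: dict):
    """Return True/False for match/mismatch, None if nothing to check."""
    expected = expected_md5_hex(etag, metadata)
    if expected is None:
        return None
    return hashlib.md5(data).hexdigest() == expected
```

Returning `None` rather than `True` when no usable hash exists keeps
"verified" distinct from "could not be verified".
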
#### Further checking

At each stage rclone and the provider are sending and checking hashes of
**everything**. Rclone deliberately HEADs each object after upload to
check it arrived safely for extra security. (You can disable this with
`--s3-no-head`).

If you require further assurance that your data is intact you can use
`rclone check` to check the hashes locally vs the remote.

And if you are feeling ultimately paranoid use `rclone check --download`
which will download the files and check them against the local copies.
(Note that this doesn't use disk to do this - it streams them in
memory).

### Versions

When bucket versioning is enabled (this can be done with rclone with