rclone

mirror of https://github.com/rclone/rclone.git synced 2024-11-23 21:30:07 +08:00

Author	SHA1	Message	Date
nielash	b4216648e4	bisync: full support for comparing checksum, size, modtime - fixes #5679 fixes #5683 fixes #5684 fixes #5675 Before this change, bisync could only detect changes based on modtime, and would refuse to run if either path lacked modtime support. This made bisync unavailable for many of rclone's backends. Additionally, bisync did not account for the Fs's precision when comparing modtimes, meaning that they could only be reliably compared within the same side -- not against the opposite side. Size and checksum (even when available) were ignored completely for deltas. After this change, bisync now fully supports comparing based on any combination of size, modtime, and checksum, lifting the prior restriction on backends without modtime support. The comparison logic considers the backend's precision, hash types, and other features as appropriate. The comparison features optionally use a new --compare flag (which takes any combination of size,modtime,checksum) and even supports some combinations not otherwise supported in `sync` (like comparing all three at the same time.) By default (without the --compare flag), bisync inherits the same comparison options as `sync` (that is: size and modtime by default, unless modified with flags such as --checksum or --size-only.) If the --compare flag is set, it will override these defaults. If --compare includes checksum and both remotes support checksums but have no hash types in common with each other, checksums will be considered only for comparisons within the same side (to determine what has changed since the prior sync), but not for comparisons against the opposite side. If one side supports checksums and the other does not, checksums will only be considered on the side that supports them. When comparing with checksum and/or size without modtime, bisync cannot determine whether a file is newer or older -- only whether it is changed or unchanged. (If it is changed on both sides, bisync still does the standard equality-check to avoid declaring a sync conflict unless it absolutely has to.) Also included are some new flags to customize the checksum comparison behavior on backends where hashes are slow or unavailable. --no-slow-hash and --slow-hash-sync-only allow selectively ignoring checksums on backends such as local where they are slow. --download-hash allows computing them by downloading when (and only when) they're otherwise not available. Of course, this option probably won't be practical with large files, but may be a good option for syncing small-but-important files with maximum accuracy (for example, a source code repo on a crypt remote.) An additional advantage over methods like cryptcheck is that the original file is not required for comparison (for example, --download-hash can be used to bisync two different crypt remotes with different passwords.) Additionally, all of the above are now considered during the final --check-sync for much-improved accuracy (before this change, it only compared filenames!) Many other details are explained in the included docs.	2024-01-20 16:08:06 -05:00
nielash	9cf783677e	bisync: support files with unknown length, including Google Docs - fixes #5696 Before this change, bisync intentionally ignored Google Docs (albeit in a buggy way that caused problems during --resync.) After this change, Google Docs (including Google Sheets, Slides, etc.) are now supported in bisync, subject to the same options, defaults, and limitations as in `rclone sync`. When bisyncing drive with non-drive backends, the drive -> non-drive direction is controlled by `--drive-export-formats` (default `"docx,xlsx,pptx,svg"`) and the non-drive -> drive direction is controlled by `--drive-import-formats` (default none.) For example, with the default export/import formats, a Google Sheet on the drive side will be synced to an `.xlsx` file on the non-drive side. In the reverse direction, `.xlsx` files with filenames that match an existing Google Sheet will be synced to that Google Sheet, while `.xlsx` files that do NOT match an existing Google Sheet will be copied to drive as normal `.xlsx` files (without conversion to Sheets, although the Google Drive web browser UI may still give you the option to open it as one.) If `--drive-import-formats` is set (it's not, by default), then all of the specified formats will be converted to Google Docs, if there is no existing Google Doc with a matching name. Caution: such conversion can be quite lossy, and in most cases it's probably not what you want! To bisync Google Docs as URL shortcut links (in a manner similar to "Drive for Desktop"), use: `--drive-export-formats url` (or alternatives.) Note that these link files cannot be edited on the non-drive side -- you will get errors if you try to sync an edited link file back to drive. They CAN be deleted (it will result in deleting the corresponding Google Doc.) If you create a `.url` file on the non-drive side that does not match an existing Google Doc, bisyncing it will just result in copying the literal `.url` file over to drive (no Google Doc will be created.) So, as a general rule of thumb, think of them as read-only placeholders on the non-drive side, and make all your changes on the drive side. Likewise, even with other export-formats, it is best to only move/rename Google Docs on the drive side. This is because otherwise, bisync will interpret this as a file deleted and another created, and accordingly, it will delete the Google Doc and create a new file at the new path. (Whether or not that new file is a Google Doc depends on `--drive-import-formats`.) Lastly, take note that all Google Docs on the drive side have a size of `-1` and no checksum. Therefore, they cannot be reliably synced with the `--checksum` or `--size-only` flags. (To be exact: they will still get created/deleted, and bisync's delta engine will notice changes and queue them for syncing, but the underlying sync function will consider them identical and skip them.) To work around this, use the default (modtime and size) instead of `--checksum` or `--size-only`. To ignore Google Docs entirely, use `--drive-skip-gdocs`. Nearly all of the Google Docs logic is outsourced to the Drive backend, so future changes should also be supported by bisync.	2024-01-20 14:50:08 -05:00
nielash	98f539de8f	bisync: refactor normalization code, fix deltas - fixes #7270 Refactored the case / unicode normalization logic to be much more efficient, and fix the last outstanding issue from #7270. Before this change, we were doing lots of for loops and re-normalizing strings we had already normalized earlier. Now, we leave the normalizing entirely to March and avoid re-transforming later, which seems to make a large difference in terms of performance.	2024-01-20 14:50:08 -05:00
nielash	9c96c13a35	bisync: optimize --resync performance -- partially addresses #5681 Before this change, --resync was handled in three steps, and needed to do a lot of unnecessary work to implement its own --ignore-existing logic, which also caused problems with unicode normalization, in addition to being pretty slow. After this change, it is refactored to produce the same result much more efficiently, by reducing the three steps to two and letting ci.IgnoreExisting do the work instead of reinventing the wheel. The behavior and sync order remain unchanged for now -- just faster (but see the ongoing lively discussions about potential future changes in #5681!)	2024-01-20 14:50:08 -05:00
nielash	f7f4651828	bisync: handle unicode and case normalization consistently - mostly-fixes #7270 Before this change, Bisync sometimes normalized NFD to NFC and sometimes did not, causing errors in some scenarios (particularly for users of macOS). It was similarly inconsistent in its handling of case-insensitivity. There were three main places where Bisync should have normalized, but didn't: 1. When building the list of files that need to be transferred during --resync 2. When building the list of deltas during a non-resync 3. When comparing Path1 to Path2 during --check-sync After this change, 1 and 3 are resolved, and bisync supports --no-unicode-normalization and --ignore-case-sync in the same way as sync. 2 will be addressed in a future update.	2024-01-20 14:50:08 -05:00
nielash	fd95511091	bisync: generate listings concurrently with march -- fixes #7332 Before this change, bisync needed to build a full listing for Path1, then a full listing for Path2, then compare them -- and each of those tasks needed to finish before the next one could start. In addition to being slow and inefficient, it also caused real problems if a file changed between the time bisync checked it on Path1 and the time it checked the corresponding file on Path2. This change solves these problems by listing both paths concurrently, using the same March infrastructure that check and sync use to traverse two directories in lock-step, optimized by Go's robust concurrency support. Listings should now be much faster, and any given path is now checked nearly-instantaneously on both sides, minimizing room for error. Further discussion: https://forum.rclone.org/t/bisync-bugs-and-feature-requests/37636#:~:text=4.%20Listings%20should%20alternate%20between%20paths%20to%20minimize%20errors	2024-01-20 14:50:08 -05:00

6 Commits