Before this change but after:
aea8776a43 vfs: fix modtimes not updating when writing via cache #4763
When a file was opened read-only the modtime was read from the cached
file. However this modtime wasn't correct leading to an incorrect
result.
This change fixes the definition of `item.IsDirty` to be true only
when the data is dirty. This fixes the problem as a read only file
isn't considered dirty.
Includes adding support for additional size input suffix Mi and MiB, treated equivalent to M.
Extends binary suffix output with letter i, e.g. Ki and Mi.
Centralizes creation of bit/byte unit strings.
The vfs-cache-max-size parameter is probably confusing to many users.
The cache cleaner checks cache size periodically at the --vfs-cache-poll-interval
(default 60 seconds) interval and remove cache items in the following order.
(1) cache items that are not in use and with age > vfs-cache-max-age
(2) if the cache space used at this time still is larger than
vfs-cache-max-size, the cleaner continues to remove cache items that are
not in use.
The cache cleaning process does not remove cache items that are currently in use.
If the total space consumed by in-use cache items exceeds vfs-cache-max-size, the
periodical cache cleaner thread does not do anything further and leaves the in-use
cache items alone with a total space larger than vfs-cache-max-size.
A cache reset feature was introduced in 1.53 which resets in-use (but not dirty,
i.e., not being updated) cache items when additional cache data incurs an ENOSPC
error. But this code was not activated in the periodical cache cleaning thread.
This patch adds the cache reset step in the cache cleaner thread during cache
poll to reset cache items until the total size of the remaining cache items is
below vfs-cache-max-size.
Before this fix, doing CTRL-C and CTRL-V on a file in Windows explorer
caused the **source** and the the destination to be truncated to 0.
This is because Windows opens the source file with Create with flags
`O_RDWR|O_CREATE|O_EXCL` but doesn't write to it - it only reads from
it. Rclone was taking the call to Create as a signal to always make a
new file, but this is incorrect.
This fix reads an existing file from the directory if it exists when
Create is called rather than always creating a new one. This fixes the
problem.
Fixes#5181
Before this change, if a directory was renamed and it or any children
had virtual entries in it they weren't flushed.
The consequence of this was that the directory path got out sync with
the actual position of the directory in the tree, leading to listings
of the old directory rather than the new one.
The fix renames any directories remaining after the ForgetAll to have
the correct path which fixes the problem.
See: https://forum.rclone.org/t/after-a-directory-renmane-using-mv-files-are-not-visible-any-longer/22797
When using --vfs-cache-mode writes or full if a file was opened for
write intent, the modtime was set and the file was closed without
being modified the modtime would never be written back to storage.
The sequence of events
- app opens file with write intent
- app does set modtime
- rclone sets the modtime on the cache file, but not the remote file
because it is open for write and can't be set yet
- app closes the file without changing it
- rclone doesn't upload the file because the file wasn't changed so
the modtime doesn't get updated
This fixes the problem by making sure any unapplied modtime changes
are applied even if the file is not modified when being closed.
Fixes#4795
Before this change, rclone would return the modification times of the
cache file or the pending modtime which would be more accurate than
the modtime that the backend was capable of.
This meant that the modtime would be change slightly when the item was
actually uploaded.
For example modification times on Google Drive would be rounded to the
nearest millisecond.
This fixes the VFS layer to always return modtimes directly from an
object stored on the remote, or rounded to the precision that the
remote is capable of.
Before this change, rclone would set the modification time of an
object after it had been uploaded. However with --vfs-cache-mode
writes and above, the modification time of the object is already
correct as the cache backing file gets set with the correct
modification time before upload.
Setting the modification time causes another version to be created on
backends such as S3 so it should be avoided if possible.
This change checks to see if the modification time needs changing and
only sets it if necessary.
See: https://forum.rclone.org/t/produce-2-versions-when-overwrite-an-object-in-min-io/19634
Some backends, most notably S3, do not report the amount of bytes used.
This patch introduces a new flag that allows instead of relying on the
backend, use recursive scan similar to `rclone size` to compute the total
used space. However, this is ineffective and should be used as a last resort.
Co-authored-by: Yves G <theYinYeti@yalis.fr>
Before this change options were read and set in native format. This
means for example nanoseconds for durations or an integer for
enumerated types, which isn't very convenient for humans.
This change enables these types to be set with a string with the
syntax as used in the command line instead, so `"10s"` rather than
`10000000000` or `"DEBUG"` rather than `8` for log level.
The initial ':' is included in the ad-hoc remote name, but is illegal character
in Windows path. Replacing it with '^', which is legal in filesystems but illegal
in regular remote names, so name conflict is avoided.
Fixes#4544
Before this change using --vfs-cache-mode full and --buffer-size 0
together caused the vfs downloader to open more and more downloaders.
This is fixed by introducing a minimum size of 1M for the window to
look for an existing downloader.
Fixes#4892
Add --network-mode option to activate mounting as network drive without having to set volume prefix.
Add support for automatic drive letter assignment (not specific to network drive mounting).
Allow full network share unc path in --volname, which will also implicitely activate network drive mounting.
Allow full network share unc path as mountpoint, which will also implicitely activate network drive mounting, and the specified path will be used as volume prefix and the remote will be mounted on an automatically assigned drive letter instead.
If --cache-dir is passed in as a relative path, then rclone will not
be able to turn it into a UNC path under Windows, which means that
file names longer than 260 chars will fail when stored in the cache.
This patch makes the --cache-dir path absolute before using it.
See: https://forum.rclone.org/t/handling-of-long-paths-on-windows-260-characters/20913
This is done by making fs.Config private and attaching it to the
context instead.
The Config should be obtained with fs.GetConfig and fs.AddConfig
should be used to get a new mutable config that can be changed.
Before this change if a file was uploaded through a mount, then
deleted externally, trying to upload that file again could give EEXIST
"file already exists".
This was because the file already existing in the cache was confusing
rclone into thinking it already had the file.
The fix is to check that if rclone has a stale cache file then to
ignore it in this situation.
See: https://forum.rclone.org/t/rclone-cant-reuse-filenames/20400
Before this change, if a file was created on a remote but deleted
externally from that remote then there was potential for the delete to
never be noticed.
The sequence of events was:
- Create file on VFS - creates virtual directory entry
- File deleted externally to remote before the directory refreshed
- Now the file has a virtual add but is not in the listings so will never disappear
This patch fixes it by removing all virtual directory entries except
the following when the directory is re-read.
- On remotes which can't have empty directories: virtual directory
adds are not flushed. These will remain virtual as long as the
directory is empty.
- For virtual file add: files that are in the process of being
uploaded are not flushed
This patch also adds the distinction between virtually added files and
directories.
It also refactors the virtual directory logic to make it easier to follow.
Fixes#4446
This adds a context.Context parameter to NewFs and related calls.
This is necessary as part of reading config from the context -
backends need to be able to read the global config.
The missed update can cause incorrect before-cleaning cache stats
and a pre-mature condition broadcast in purgeOld before the cache
space use is reduced below the quota.
Add an exponentially increasing delay during retries up ENOSPC error
to avoid exhausting the 10 retries too soon when the cache space
recovery from item resets is not available from the file system yet
or consumed by other large cache writes.
Item reset is invoked by cache cleaner for synchronous recovery
from ENOSPC errors. The reset operation removes the cache file and
closes/reopens the downloaders. Although most parts of reset and
other item operations are done with the item mutex held, the mutex
is released during fd.WriteAt and downloaders calls. We used preAccess
and postAccess calls to serialize Reset, ReadAt, and Open, but missed
some other item operations. The patch adds preAccess/postAccess
calls in Sync, Truncate, Close, WriteAt, and rename.
A failed item reset is saved in the errItems for retryFailedResets
to process. If the item gets closed before the retry, the item may
have been removed from the c.item array. Previous code did not
account for this condition. This patch adds the check for the
exitence of the retry items in retryFailedResets.
The downloaders.Close() call acquires the downloaders' mutex before
calling the wait group wait and the main downloaders thread has a
periodical (5 seconds interval) call to kick its waiters and the
waiter dispatch function tries to get the mutex. So a deadlock can
occur if the Close() call starts, gets the mutex, while the main
downloader thread already got the timer's tick and proceeded to
call kickWaiters. The deadlock happens when the Close call gets
the mutex between the timer's kick and the main downloader thread
gets the mutex first. So it's a pretty short period of time and
it probably explains why the problem has not surfaced, maybe
something like tens of nanoseconds out of 5 seconds (~10^^-8).
It took 5 days of continued stressing the Close calls for the
deadlock to appear.
Before this change if a file was removed from the cache while rclone
is running then rclone would not notice and proceed to re-create it
full of zeros.
This change notices files that we expect to have data in going missing
and if they do logs an ERROR recovers.
It isn't recommended deleting files from the cache manually with
rclone running!
See: https://forum.rclone.org/t/corrupted-data-streaming-after-vfs-meta-files-removed/18997Fixes#4602
Before this change the error message was produced for every file which
was confusing users.
After this change we check for EOF and return from ReadAt at that
point.
See: https://forum.rclone.org/t/rclone-1-53-release/18880/10
This patch provides the support of synchronous cache space recovery
to allow read threads to recover from ENOSPC errors when cache space
can be recovered from cache items that are not in use or safe to be
reset/emptied .
The patch complements the existing cache cleaning process in two ways.
Firstly, the existing cache cleaning process is time-driven that runs
periodically. The cache space can run out while the cache cleaner
thread is still waiting for its next scheduled run. The io threads
encountering ENOSPC return an internal error to the applications
in this case even when cache space can be recovered to avoid this
error. This patch addresses this problem by having the read threads
kick the cache cleaner thread in this condition to recover cache
space preventing unnecessary ENOSPC errors from being seen by the
applications.
Secondly, this patch enhances the cache cleaner to support cache
item reset. Currently the cache purge process removes cache
items that are not in use. This may not be sufficient when the
total size of the working set exceeds the cache directory's
capacity. Like in the current code, this patch starts the purge
process by removing cache files that are not in use. Cache items
whose access times are older than vfs-cache-max-age are removed first.
After that, other not-in-use items are removed in LRU order until
vfs-cache-max-size is reached. If the vfs-cache-max-size (the quota)
is still not reached at this time, this patch adds a cache reset
step to reset/empty cache files that are still in use but not
dirtied. This enables application processes to continue without
seeing an error even when the working set depletes the cache space
as long as there is not a large write working set hoarding the
entire cache space.
By design this patch does not add ENOSPC error recovery for write
IOs. Rclone does not empty a write cache item until the file data
is written back to the backend upon close. Allowing more cache
space to be consumed by dirty cache items when the cache space is
already running low would increase the risk of exhausting the cache
space in a way that the vfs mount becomes unreadable.
Before this change we set the modtime of the cache file when all
writers had finished.
This has the unfortunate effect that the file is uploaded with the
wrong modtime which means on backends which can't set modtimes except
when uploading files it is wrong.
This change sets the modtime of the cache file immediately in the
cache and in turn sets the modtime in the file info.
Before this change the background writing of the file was racing with
the test of the object on the remote.
This meant that the tests passed locally but failed on a lot of the
remotes.
Before this change we didn't check the file exists before renaming it,
setting its modification time or deleting it. If the file isn't in the
cache we don't need to do the action since it has been done on the
actual object, so these errors were producing unecessary log messages.
This change checks to see if the file exists first before doing those
actions.
Before this fix, download threads would fill up the buffer and then
timeout even though data was still being read from them. If the client
was streaming slower than network speed this caused the downloader to
stop and be restarted continuously. This caused more potential for
skips in the download and unecessary network transactions.
This patch fixes that behaviour - as long as a downloader is being
read from more often than once every 5 seconds, it won't timeout.
This was done by:
- kicking the downloader whenever ensureDownloader is called
- making the downloader loop if it has already downloaded past the maxOffset
- making setRange() always kick the downloader
Due to Chrome's rather complicated use of file handles when saving
files from the download windows, rclone was attempting to truncate a
closed file.
The file appeared closed due to the handling of 0 length files.
This patch removes the check for the file being closed in the
WriteFileHandle.Truncate call. This is safe because the only action
this method takes is to emit an error message if the file is the wrong
size.
See: https://forum.rclone.org/t/google-drive-cannot-save-files-directly-from-browser-to-gdrive-mounted-path/17992/
This is preparation for getting the Accounting to check the context,
buf first we need to get it in place. Since this is one of those
changes that makes lots of noise, this is in a seperate commit.
Before this fix we took the directory lock to read the ModTime of the
directory. This was causing locking on directories which were being
re-read from the backend.
This commit gives the modtime its own lock so it can be read even when
the directory is being updated.
See: https://forum.rclone.org/t/high-cpu-load-with-rclone-mount/17604
Before this change files that were in the cache and renamed with
--vfs-cache-mode minimal weren't renamed at all.
This fixes the problem and adds tests for all the different
combinations of cache modes and in and out of the cache.
In this commit (released in v1.52.0)
6ca7198f mount: fix disappearing cwd problem
SetSys was introduced to cache node lookups.
Unfortunately taking the vfs.(*Dir) lock in SetSys causes any FUSE
operations on a directory to pile up behind slow directory listings.
In some situations this leads to very high load.
This commit fixes it by using atomic operations to read and write the
Sys value make it independent of the lock.
See: https://forum.rclone.org/t/high-cpu-load-with-rclone-mount/17604
See: #4104
Previous to the fix, if an item was being uploaded and it was renamed,
the upload would fail with missing checksum errors.
This change cancels any uploads in progress if the file is renamed.
This was caused by the signal to stop buffering being ignored when
there was no buffer!
This is fixed by explicitly checking for no buffering and stopping.
Before this change, if we restarted an upload after a restart then the
file would get uploaded but never added to the directory listings.
This change makes sure we add virtual items to the directory cache
when reloading the cache so that they show up properly.
Rclone adds virtual directory entries to the directory cache when it
creates a file or directory.
Before this change these dropped out of the directory cache when the
directory cache was reloaded. This meant that when the directory cache
expired:
- On bucket based backends, empty directories would disappear
- When using VFS writeback, files in the process of uploading would disappear
This is fixed by keeping track of the virtual entries in each
directory. The virtual entries are removed when they become real - ie
the object is read back from the listing.
This also keeps tracks of deletes in the same way so if a file is
deleted, it will not re-appear when the directory cache is reloaded if
the deletion hasn't finished yet.
Before this change we initialized the rc for a single VFS. However
rclone can have multiple VFSes in use now so this is no longer
adequate.
This change adds an optional fs parameter to all the VFS methods to
disambiguate VFSes when there is more than one in use.
It also adds a method vfs/list to show all the active VFSes.
This adds outline tests for the rc commands which didn't have tests
before.
- fix deadlock when cancelling upload
- fix double upload and panic after cancelled upload
- fix cancelation strategy of uploading files
- don't cancel uploads if we don't modify the file
- cancel uploads if we do modify the file
- fix deadlock between Item and writeback
- fix confusion about whether writeback item was being uploaded
- fix cornercases in cancelling uploads and removing files
Item
- Remove unused method getName
- Fix Truncate on unopened file
- Fix bug when downloading segments to fill out file on close
- Fix bug when WriteAt extends the file and we don't mark space as used
downloader
- Retry failed waiters every 5 seconds
- Download to multiple places at once in the stream
- Restart as necessary
- Timeout unused downloaders
- Close reader if too much data skipped
- Only use one file handle as use item for writing
- Implement --vfs-read-chunk-size and --vfs-read-chunk-limit
- fix deadlock between asyncbuffer and vfs cache downloader
- fix display of stream abandoned error which should be ignored
On file Remove
- cancel any writebacks in progress
- ignore error message deleting non existent file if file was in the
process of being uploaded
Writeback
- Don't transfer the file if it has disappeared in the meantime
- Take our own copy of the file name to avoid deadlocks
- Fix delayed retry logic
- Wait for upload to finish when cancelling upload
Fix race condition in item saving
Fix race condition in vfscache test
Make sure we delete the file on the error path - this makes cascading
failures much less likely
This allows reads to only read part of the file and it keeps on disk a
cache of what parts of each file have been loaded.
File data itself is kept in sparse files.
Before this fix, rclone would sometimes hang in vfs.readAt().
This was due to a race condition causing rclone to miss the timeout
signal.
This was fixed by a small amount of extra locking.
This very likely also fixes a number of "failed to wait for
in-sequence read" errors.
The tests are now run for the mount commands and for the plain VFS.
This makes the tests much easier to debug when running with a VFS than
through a mount.
In
54deb01f00 vfs: Make OpenFile and friends return EINVAL if O_RDONLY and O_TRUNC
The generated file open_test.go was edited directly without editing
the generator.
This commit brings the generator make_open_tests.go back into line
with that edit. It also makes it so `go generate` can be used to
regenerate the tests.
Previously we were using f which calls f.String() which calls f.Path()
which can cause a deadlock if uses carelessly.
This patch explicitly calls f.Path() or f._path() to bring attention
to the fact that there is a call to a method.
As part of this we take a copy of the directory path as calling
d.Path() violates the total locking order.
See the comment at the top of file.go for details
--vfs-read-wait duration Time to wait for in-sequence read before seeking. (default 5ms)
--vfs-write-wait duration Time to wait for in-sequence write before giving error. (default 1s)
See: https://forum.rclone.org/t/constantly-high-iowait-add-log/14156
When a file has its modtime set while it is open we delay setting the
modtime until the file is closed.
The file is then uploaded in Flush. In Release we check the cached
file has been uploaded by comparing modtimes and or hashes and upload
it again if it has changed.
Before this change we forgot to change the time on the cached file
when we updated the time file on the object, so this mean that Release
reset the time to the wrong time and uploaded the file again on
remotes which don't support hashes (eg crypt).
The fix was to set the modtime of the cached file at the same time we
set the modtime of the remote object. This means that the files check
as identical in Release so it doesn't try to upload the file.
This means that we avoid a double upload and the modtime is correct.
See: https://forum.rclone.org/t/modification-time-with-vfs-cache/13906/8
Before this change, when uploading files from the VFS cache which were
pending a rename, rclone would use the new path of the object when
specifiying the destination remote. This didn't cause a problem with
most backends as the subsequent rename did nothing, however with the
drive backend, since it updates objects, the incorrect Remote was
embedded in the object. This caused the rename to apparently succeed
but the object be at the wrong location.
The fix for this was to make sure we upload to the path stored in the
object if available.
This problem was spotted by the new rename tests for the VFS layer.
Before this change, renaming an open file when using the VFS cache was
delayed until the file was closed. This meant that the file was not
readable after a rename even though it is was in the cache.
After this change we rename the local cache file and the in memory
cache, delaying only the rename of the file in object storage.
See: https://forum.rclone.org/t/xen-orchestra-ebadf-bad-file-descriptor-write/13104
This makes ReadAt for non cached files wait a short time (up to 5mS)
if it gets an out of order read (which would normally cause a seek and
which take a long time) to see if the gap will be filled with an in
order read.
This makes mount2 based on go-fuse work more efficiently and enables
async reading in normal mount.
A similar change was done for WriteAt in af030f74f5
Before this change, change notify polls would clear the directory
cache recursively. So uploading a file to the root would clear the
entire directory cache.
After this change we just invalidate the directory cache of the parent
directory of the item and if the item was a directory we invalidate it
too.
Before this change when we renamed a directory this cleared the
directory cache for the parent directory too.
If the directory was remaining in the same parent this wasn't
necessary and caused the empty directory to fall out of the cache.
Fixes#3597