Merge pull request #376 from trapexit/docs

update docs to include dropcacheonclose and warn about directory mtime
This commit is contained in:
Antonio SJ Musumeci 2017-02-18 20:06:54 -05:00 committed by GitHub
commit 09dabc7b4c
2 changed files with 88 additions and 42 deletions

View File

@ -1,6 +1,6 @@
% mergerfs(1) mergerfs user manual
% Antonio SJ Musumeci <trapexit@spawn.link>
% 2016-12-14
% 2017-02-18
# NAME
@ -35,10 +35,11 @@ mergerfs -o&lt;options&gt; &lt;srcmounts&gt; &lt;mountpoint&gt;
* **direct_io**: causes FUSE to bypass caching which can increase write speeds at the detriment of reads. Note that not enabling `direct_io` will cause double caching of files and therefore less memory for caching generally. However, `mmap` does not work when `direct_io` is enabled.
* **minfreespace**: the minimum space value used for creation policies. Understands 'K', 'M', and 'G' to represent kilobyte, megabyte, and gigabyte respectively. (default: 4G)
* **moveonenospc**: when enabled (set to **true**) if a **write** fails with **ENOSPC** or **EDQUOT** a scan of all drives will be done looking for the drive with most free space which is at least the size of the file plus the amount which failed to write. An attempt to move the file to that drive will occur (keeping all metadata possible) and if successful the original is unlinked and the write retried. (default: false)
* **use_ino**: causes mergerfs to supply file/directory inodes rather than libfuse. While not a default it is generally recommended it be enabled so that hard linked files share the same inode value.
* **dropcacheonclose**: when a file is requested to be closed call `posix_fadvise` on it first to instruct the kernel that we no longer need the data and it can drop its cache. Recommended when **direct_io** is not enabled to limit double caching. (default: false)
* **fsname**: sets the name of the filesystem as seen in **mount**, **df**, etc. Defaults to a list of the source paths concatenated together with the longest common prefix removed.
* **func.&lt;func&gt;=&lt;policy&gt;**: sets the specific FUSE function's policy. See below for the list of value types. Example: **func.getattr=newest**
* **category.&lt;category&gt;=&lt;policy&gt;**: Sets policy of all FUSE functions in the provided category. Example: **category.create=mfs**
* **fsname**: sets the name of the filesystem as seen in **mount**, **df**, etc. Defaults to a list of the source paths concatenated together with the longest common prefix removed.
* **use_ino**: causes mergerfs to supply file/directory inodes rather than libfuse. While not a default it is generally recommended it be enabled so that hard linked files share the same inode value.
**NOTE:** Options are evaluated in the order listed so if the options are **func.rmdir=rand,category.action=ff** the **action** category setting will override the **rmdir** setting.
@ -313,17 +314,27 @@ A B C
* If you don't see some directories / files you expect in a merged point be sure the user has permission to all the underlying directories. Use `mergerfs.fsck` to audit the drive for out of sync permissions.
* Do *not* use `direct_io` if you expect applications (such as rtorrent) to [mmap](http://linux.die.net/man/2/mmap) files. It is not currently supported in FUSE w/ `direct_io` enabled.
* Since POSIX gives you only error or success on calls its difficult to determine the proper behavior when applying the behavior to multiple targets. **mergerfs** will return an error only if all attempts of an action fail. Any success will lead to a success returned. This means however that some odd situations may arise.
* Remember that some policies mixed with some functions may result in strange behaviors. Not that some of these behaviors and race conditions couldn't happen outside **mergerfs** but that they are far more likely to occur on account of attempt to merge together multiple sources of data which could be out of sync due to the different policies.
* An example: [Kodi](http://kodi.tv) and [Plex](http://plex.tv) can use directory [mtime](http://linux.die.net/man/2/stat) to more efficiently determine whether to scan for new content rather than simply performing a full scan. If using the current default **getattr** policy of **ff** its possible **Kodi** will miss an update on account of it returning the first directory found's **stat** info and its a later directory on another mount which had the **mtime** recently updated. To fix this you will want to set **func.getattr=newest**. Remember though that this is just **stat**. If the file is later **open**'ed or **unlink**'ed and the policy is different for those then a completely different file or directory could be acted on.
* Due to previously mentioned issues its generally best to set **category** wide policies rather than individual **func**'s. This will help limit the confusion of tools such as [rsync](http://linux.die.net/man/1/rsync). However, the flexibility is there if needed.
* [Kodi](http://kodi.tv), [Plex](http://plex.tv), [Subsonic](http://subsonic.org), etc. can use directory [mtime](http://linux.die.net/man/2/stat) to more efficiently determine whether to scan for new content rather than simply performing a full scan. If using the default **getattr** policy of **ff** its possible **Kodi** will miss an update on account of it returning the first directory found's **stat** info and its a later directory on another mount which had the **mtime** recently updated. To fix this you will want to set **func.getattr=newest**. Remember though that this is just **stat**. If the file is later **open**'ed or **unlink**'ed and the policy is different for those then a completely different file or directory could be acted on.
* Some policies mixed with some functions may result in strange behaviors. Not that some of these behaviors and race conditions couldn't happen outside **mergerfs** but that they are far more likely to occur on account of attempt to merge together multiple sources of data which could be out of sync due to the different policies.
* For consistency its generally best to set **category** wide policies rather than individual **func**'s. This will help limit the confusion of tools such as [rsync](http://linux.die.net/man/1/rsync). However, the flexibility is there if needed.
# KNOWN ISSUES / BUGS
#### directory mtime is not being updated
Remember that the default policy for `getattr` is `ff`. The information for the first directory found will be returned. If it wasn't the directory which had been updated then it will appear outdated.
The reason this is the default is because any other policy would be far more expensive and for many applications it is unnecessary. To always return the directory with the most recent mtime or a faked value based on all found would require a scan of all drives. That alone is far more expensive than `ff` but would also possibly spin up sleeping drives.
If you always want the directory information from the one with the most recent mtime then use the `newest` policy for `getattr`.
#### cached memory appears greater than it should be
Use the `direct_io` option as described above. Due to what mergerfs is doing there ends up being two caches of a file under normal usage. One from the underlying filesystem and one from mergerfs. Enabling `direct_io` removes the mergerfs cache. This saves on memory but means the kernel needs to communicate with mergerfs more often and can therefore result in slower read speeds.
Use the `direct_io` option as described above. Due to what mergerfs is doing there ends up being two caches of a file under normal usage. One from the underlying filesystem and one from mergerfs. Enabling `direct_io` removes the mergerfs cache. This saves on memory but means the kernel needs to communicate with mergerfs more often and can therefore result in slower speeds.
Since enabling `direct_io` disables `mmap` this is not an ideal situation however write speeds should be increased and there are some tweaks being developed which may help in minimizing the extra caching.
Since enabling `direct_io` disables `mmap` this is not an ideal situation however write speeds should be increased.
If `direct_io` is disabled it is probably a good idea to enable `dropcacheonclose` to minimize double caching.
#### NFS clients don't work
@ -367,6 +378,10 @@ If suddenly the mergerfs mount point disappears and `Transport endpoint is not c
In order to fix this please install newer versions of libfuse. If using a Debian based distro (Debian,Ubuntu,Mint) you can likely just install newer versions of [libfuse](https://packages.debian.org/unstable/libfuse2) and [fuse](https://packages.debian.org/unstable/fuse) from the repo of a newer release.
#### mergerfs appears to be crashing or exiting
There seems to be an issue with Linux version `4.9.0` and above in which an invalid message appears to be transmitted to libfuse (used by mergerfs) causing it to exit. No messages will be printed in any logs as its not a proper crash. Debugging of the issue is still ongoing and can be followed via the [fuse-devel thread](https://sourceforge.net/p/fuse/mailman/message/35662577).
#### mergerfs under heavy load and memory preasure leads to kernel panic
https://lkml.org/lkml/2016/9/14/527

View File

@ -1,5 +1,5 @@
.\"t
.TH "mergerfs" "1" "2016\-12\-14" "mergerfs user manual" ""
.TH "mergerfs" "1" "2017\-02\-18" "mergerfs user manual" ""
.SH NAME
.PP
mergerfs \- another (FUSE based) union filesystem
@ -63,6 +63,23 @@ metadata possible) and if successful the original is unlinked and the
write retried.
(default: false)
.IP \[bu] 2
\f[B]use_ino\f[]: causes mergerfs to supply file/directory inodes rather
than libfuse.
While not a default it is generally recommended it be enabled so that
hard linked files share the same inode value.
.IP \[bu] 2
\f[B]dropcacheonclose\f[]: when a file is requested to be closed call
\f[C]posix_fadvise\f[] on it first to instruct the kernel that we no
longer need the data and it can drop its cache.
Recommended when \f[B]direct_io\f[] is not enabled to limit double
caching.
(default: false)
.IP \[bu] 2
\f[B]fsname\f[]: sets the name of the filesystem as seen in
\f[B]mount\f[], \f[B]df\f[], etc.
Defaults to a list of the source paths concatenated together with the
longest common prefix removed.
.IP \[bu] 2
\f[B]func.<func>=<policy>\f[]: sets the specific FUSE function\[aq]s
policy.
See below for the list of value types.
@ -71,16 +88,6 @@ Example: \f[B]func.getattr=newest\f[]
\f[B]category.<category>=<policy>\f[]: Sets policy of all FUSE functions
in the provided category.
Example: \f[B]category.create=mfs\f[]
.IP \[bu] 2
\f[B]fsname\f[]: sets the name of the filesystem as seen in
\f[B]mount\f[], \f[B]df\f[], etc.
Defaults to a list of the source paths concatenated together with the
longest common prefix removed.
.IP \[bu] 2
\f[B]use_ino\f[]: causes mergerfs to supply file/directory inodes rather
than libfuse.
While not a default it is generally recommended it be enabled so that
hard linked files share the same inode value.
.PP
\f[B]NOTE:\f[] Options are evaluated in the order listed so if the
options are \f[B]func.rmdir=rand,category.action=ff\f[] the
@ -719,35 +726,49 @@ fail.
Any success will lead to a success returned.
This means however that some odd situations may arise.
.IP \[bu] 2
Remember that some policies mixed with some functions may result in
strange behaviors.
Not that some of these behaviors and race conditions couldn\[aq]t happen
outside \f[B]mergerfs\f[] but that they are far more likely to occur on
account of attempt to merge together multiple sources of data which
could be out of sync due to the different policies.
.IP \[bu] 2
An example: Kodi (http://kodi.tv) and Plex (http://plex.tv) can use
directory mtime (http://linux.die.net/man/2/stat) to more efficiently
determine whether to scan for new content rather than simply performing
a full scan.
If using the current default \f[B]getattr\f[] policy of \f[B]ff\f[] its
possible \f[B]Kodi\f[] will miss an update on account of it returning
the first directory found\[aq]s \f[B]stat\f[] info and its a later
directory on another mount which had the \f[B]mtime\f[] recently
updated.
Kodi (http://kodi.tv), Plex (http://plex.tv),
Subsonic (http://subsonic.org), etc.
can use directory mtime (http://linux.die.net/man/2/stat) to more
efficiently determine whether to scan for new content rather than simply
performing a full scan.
If using the default \f[B]getattr\f[] policy of \f[B]ff\f[] its possible
\f[B]Kodi\f[] will miss an update on account of it returning the first
directory found\[aq]s \f[B]stat\f[] info and its a later directory on
another mount which had the \f[B]mtime\f[] recently updated.
To fix this you will want to set \f[B]func.getattr=newest\f[].
Remember though that this is just \f[B]stat\f[].
If the file is later \f[B]open\f[]\[aq]ed or \f[B]unlink\f[]\[aq]ed and
the policy is different for those then a completely different file or
directory could be acted on.
.IP \[bu] 2
Due to previously mentioned issues its generally best to set
\f[B]category\f[] wide policies rather than individual
\f[B]func\f[]\[aq]s.
Some policies mixed with some functions may result in strange behaviors.
Not that some of these behaviors and race conditions couldn\[aq]t happen
outside \f[B]mergerfs\f[] but that they are far more likely to occur on
account of attempt to merge together multiple sources of data which
could be out of sync due to the different policies.
.IP \[bu] 2
For consistency its generally best to set \f[B]category\f[] wide
policies rather than individual \f[B]func\f[]\[aq]s.
This will help limit the confusion of tools such as
rsync (http://linux.die.net/man/1/rsync).
However, the flexibility is there if needed.
.SH KNOWN ISSUES / BUGS
.SS directory mtime is not being updated
.PP
Remember that the default policy for \f[C]getattr\f[] is \f[C]ff\f[].
The information for the first directory found will be returned.
If it wasn\[aq]t the directory which had been updated then it will
appear outdated.
.PP
The reason this is the default is because any other policy would be far
more expensive and for many applications it is unnecessary.
To always return the directory with the most recent mtime or a faked
value based on all found would require a scan of all drives.
That alone is far more expensive than \f[C]ff\f[] but would also
possibly spin up sleeping drives.
.PP
If you always want the directory information from the one with the most
recent mtime then use the \f[C]newest\f[] policy for \f[C]getattr\f[].
.SS cached memory appears greater than it should be
.PP
Use the \f[C]direct_io\f[] option as described above.
@ -756,12 +777,13 @@ under normal usage.
One from the underlying filesystem and one from mergerfs.
Enabling \f[C]direct_io\f[] removes the mergerfs cache.
This saves on memory but means the kernel needs to communicate with
mergerfs more often and can therefore result in slower read speeds.
mergerfs more often and can therefore result in slower speeds.
.PP
Since enabling \f[C]direct_io\f[] disables \f[C]mmap\f[] this is not an
ideal situation however write speeds should be increased and there are
some tweaks being developed which may help in minimizing the extra
caching.
ideal situation however write speeds should be increased.
.PP
If \f[C]direct_io\f[] is disabled it is probably a good idea to enable
\f[C]dropcacheonclose\f[] to minimize double caching.
.SS NFS clients don\[aq]t work
.PP
Some NFS clients appear to fail when a mergerfs mount is exported.
@ -887,6 +909,15 @@ install newer versions of
libfuse (https://packages.debian.org/unstable/libfuse2) and
fuse (https://packages.debian.org/unstable/fuse) from the repo of a
newer release.
.SS mergerfs appears to be crashing or exiting
.PP
There seems to be an issue with Linux version \f[C]4.9.0\f[] and above
in which an invalid message appears to be transmitted to libfuse (used
by mergerfs) causing it to exit.
No messages will be printed in any logs as its not a proper crash.
Debugging of the issue is still ongoing and can be followed via the
fuse\-devel
thread (https://sourceforge.net/p/fuse/mailman/message/35662577).
.SS mergerfs under heavy load and memory preasure leads to kernel panic
.PP
https://lkml.org/lkml/2016/9/14/527