mirror of https://github.com/trapexit/mergerfs.git (synced 2024-11-22 10:22:02 +08:00)
Misc README updates
This commit is contained in:
parent bd02bfd54c
commit 5152c63480
README.md (273 lines changed)
@@ -65,9 +65,10 @@ A + B = C
 mergerfs does **not** support the copy-on-write (CoW) or whiteout
 behaviors found in **aufs** and **overlayfs**. You can **not** mount a
 read-only filesystem and write to it. However, mergerfs will ignore
-read-only drives when creating new files so you can mix read-write and
-read-only drives. It also does **not** split data across drives. It is
-not RAID0 / striping. It is simply a union of other filesystems.
+read-only filesystems when creating new files so you can mix
+read-write and read-only filesystems. It also does **not** split data
+across filesystems. It is not RAID0 / striping. It is simply a union of
+other filesystems.
 
 
 # TERMINOLOGY
@@ -178,7 +179,7 @@ These options are the same regardless of whether you use them with the
   policy of `create` (read below). Enabling this will cause rename and
   link to always use the non-path preserving behavior. This means
   files, when renamed or linked, will stay on the same
-  drive. (default: false)
+  filesystem. (default: false)
 * **security_capability=BOOL**: If false return ENOATTR when xattr
   security.capability is queried. (default: true)
 * **xattr=passthrough|noattr|nosys**: Runtime control of
@@ -191,7 +192,7 @@ These options are the same regardless of whether you use them with the
   copy-on-write function similar to cow-shell. (default: false)
 * **statfs=base|full**: Controls how statfs works. 'base' means it
   will always use all branches in statfs calculations. 'full' is in
-  effect path preserving and only includes drives where the path
+  effect path preserving and only includes branches where the path
   exists. (default: base)
 * **statfs_ignore=none|ro|nc**: 'ro' will cause statfs calculations to
   ignore available space for branches mounted or tagged as 'read-only'
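For orientation (not part of the diff), the two `statfs` options above are ordinary mount options; a sketch with hypothetical branch paths:

```
# 'full' counts only branches where the path exists; 'ro' additionally
# drops read-only branches from free-space reporting.
mergerfs -o statfs=full,statfs_ignore=ro /mnt/disk1:/mnt/disk2 /mnt/pool
```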
@@ -324,9 +325,9 @@ you're using. Not all features are available in older releases. Use
 
 The 'branches' argument is a colon (':') delimited list of paths to be
 pooled together. It does not matter if the paths are on the same or
-different drives nor does it matter the filesystem (within
+different filesystems nor does it matter the filesystem type (within
 reason). Used and available space will not be duplicated for paths on
-the same device and any features which aren't supported by the
+the same filesystem and any features which aren't supported by the
 underlying filesystem (such as file attributes or extended attributes)
 will return the appropriate errors.
 
@@ -334,7 +335,7 @@ Branches currently have two options which can be set. A type which
 impacts whether or not the branch is included in a policy calculation
 and an individual minfreespace value. The values are set by appending
 an `=` to the end of a branch designation and using commas as
-delimiters. Example: /mnt/drive=RW,1234
+delimiters. Example: `/mnt/drive=RW,1234`
 
 
 #### branch mode
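A sketch of that syntax in a full invocation (paths and the size value are made up; `RW`, `RO`, and `NC` are the branch modes described below):

```
# /mnt/ssd: writable, with a per-branch minfreespace override;
# /mnt/archive: read-only; /mnt/scratch: writable but no new files.
mergerfs /mnt/ssd=RW,10G:/mnt/archive=RO:/mnt/scratch=NC /mnt/pool
```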
@@ -590,10 +591,10 @@ something to keep in mind.
 
 **WARNING:** Some backup solutions, such as CrashPlan, do not backup
 the target of a symlink. If using this feature it will be necessary to
-point any backup software to the original drives or configure the
-software to follow symlinks if such an option is
-available. Alternatively create two mounts. One for backup and one for
-general consumption.
+point any backup software to the original filesystems or configure the
+software to follow symlinks if such an option is available.
+Alternatively create two mounts. One for backup and one for general
+consumption.
 
 
 ### nullrw
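The "two mounts" idea can be sketched in fstab form (branches, mount points, and option values here are illustrative assumptions, not a drop-in config): one plain pool for backup software, one with `symlinkify` for general consumption.

```
# /etc/fstab: same branches pooled twice, differing only in options
/mnt/disk1:/mnt/disk2  /mnt/backup  fuse.mergerfs  defaults,fsname=pool-backup  0 0
/mnt/disk1:/mnt/disk2  /mnt/media   fuse.mergerfs  defaults,symlinkify=true,symlinkify_timeout=3600,fsname=pool-media  0 0
```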
@@ -750,11 +751,11 @@ All policies which start with `ep` (**epff**, **eplfs**, **eplus**,
 **epmfs**, **eprand**) are `path preserving`. `ep` stands for
 `existing path`.
 
-A path preserving policy will only consider drives where the relative
+A path preserving policy will only consider branches where the relative
 path being accessed already exists.
 
 When using non-path preserving policies paths will be cloned to target
-drives as necessary.
+branches as necessary.
 
 With the `msp` or `most shared path` policies they are defined as
 `path preserving` for the purpose of controlling `link` and `rename`'s
@@ -775,15 +776,15 @@ but it makes things a bit more uniform.
 | all | Search: For **mkdir**, **mknod**, and **symlink** it will apply to all branches. **create** works like **ff**. |
 | epall (existing path, all) | For **mkdir**, **mknod**, and **symlink** it will apply to all found. **create** works like **epff** (but more expensive because it doesn't stop after finding a valid branch). |
 | epff (existing path, first found) | Given the order of the branches, as defined at mount time or configured at runtime, act on the first one found where the relative path exists. |
-| eplfs (existing path, least free space) | Of all the branches on which the relative path exists choose the drive with the least free space. |
-| eplus (existing path, least used space) | Of all the branches on which the relative path exists choose the drive with the least used space. |
-| epmfs (existing path, most free space) | Of all the branches on which the relative path exists choose the drive with the most free space. |
+| eplfs (existing path, least free space) | Of all the branches on which the relative path exists choose the branch with the least free space. |
+| eplus (existing path, least used space) | Of all the branches on which the relative path exists choose the branch with the least used space. |
+| epmfs (existing path, most free space) | Of all the branches on which the relative path exists choose the branch with the most free space. |
 | eppfrd (existing path, percentage free random distribution) | Like **pfrd** but limited to existing paths. |
 | eprand (existing path, random) | Calls **epall** and then randomizes. Returns 1. |
-| ff (first found) | Given the order of the drives, as defined at mount time or configured at runtime, act on the first one found. |
-| lfs (least free space) | Pick the drive with the least available free space. |
-| lus (least used space) | Pick the drive with the least used space. |
-| mfs (most free space) | Pick the drive with the most available free space. |
+| ff (first found) | Given the order of the branches, as defined at mount time or configured at runtime, act on the first one found. |
+| lfs (least free space) | Pick the branch with the least available free space. |
+| lus (least used space) | Pick the branch with the least used space. |
+| mfs (most free space) | Pick the branch with the most available free space. |
 | msplfs (most shared path, least free space) | Like **eplfs** but if it fails to find a branch it will try again with the parent directory. Continues this pattern till finding one. |
 | msplus (most shared path, least used space) | Like **eplus** but if it fails to find a branch it will try again with the parent directory. Continues this pattern till finding one. |
 | mspmfs (most shared path, most free space) | Like **epmfs** but if it fails to find a branch it will try again with the parent directory. Continues this pattern till finding one. |
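For reference, policies are selected per function (`func.<name>`) or per category (`category.<name>`) as mount options; a hypothetical combination:

```
# All create-category functions use 'mfs'; mkdir alone is overridden
# to 'epall' (branch paths are examples).
mergerfs -o category.create=mfs,func.mkdir=epall /mnt/disk1:/mnt/disk2 /mnt/pool
```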
|
@ -832,7 +833,7 @@ filesystem. `rename` only works within a single filesystem or
|
|||
device. If a rename can't be done atomically due to the source and
|
||||
destination paths existing on different mount points it will return
|
||||
**-1** with **errno = EXDEV** (cross device / improper link). So if a
|
||||
`rename`'s source and target are on different drives within the pool
|
||||
`rename`'s source and target are on different filesystems within the pool
|
||||
it creates an issue.
|
||||
|
||||
Originally mergerfs would return EXDEV whenever a rename was requested
|
||||
|
@@ -850,25 +851,25 @@ work while still obeying mergerfs' policies. Below is the basic logic.
 
   * Using the **rename** policy get the list of files to rename
   * For each file attempt rename:
     * If failure with ENOENT (no such file or directory) run **create** policy
-    * If create policy returns the same drive as currently evaluating then clone the path
+    * If create policy returns the same branch as currently evaluating then clone the path
     * Re-attempt rename
   * If **any** of the renames succeed the higher level rename is considered a success
   * If **no** renames succeed the first error encountered will be returned
   * On success:
-    * Remove the target from all drives with no source file
-    * Remove the source from all drives which failed to rename
+    * Remove the target from all branches with no source file
+    * Remove the source from all branches which failed to rename
 * If using a **create** policy which does **not** try to preserve directory paths
   * Using the **rename** policy get the list of files to rename
   * Using the **getattr** policy get the target path
   * For each file attempt rename:
-    * If the source drive != target drive:
-      * Clone target path from target drive to source drive
+    * If the source branch != target branch:
+      * Clone target path from target branch to source branch
     * Rename
   * If **any** of the renames succeed the higher level rename is considered a success
   * If **no** renames succeed the first error encountered will be returned
   * On success:
-    * Remove the target from all drives with no source file
-    * Remove the source from all drives which failed to rename
+    * Remove the target from all branches with no source file
+    * Remove the source from all branches which failed to rename
 
 The removals are subject to normal entitlement checks.
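For intuition (outside the diff): `EXDEV` is the same error a hard link across two real filesystems produces. A quick demonstration with hypothetical mounts:

```
# /mnt/disk1 and /mnt/disk2 are separate filesystems. link(2) cannot
# cross filesystems, so ln fails with EXDEV ("Invalid cross-device
# link"), which is what mergerfs propagates when a rename/link cannot
# be satisfied on a single branch under a path preserving policy.
ln /mnt/disk1/file /mnt/disk2/file
```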
@@ -894,11 +895,11 @@ the source of the metadata you see in an **ls**.
 #### statfs / statvfs ####
 
 [statvfs](http://linux.die.net/man/2/statvfs) normalizes the source
-drives based on the fragment size and sums the number of adjusted
+filesystems based on the fragment size and sums the number of adjusted
 blocks and inodes. This means you will see the combined space of all
 sources. Total, used, and free. The sources however are dedupped based
-on the drive so multiple sources on the same drive will not result in
-double counting its space. Filesystems mounted further down the tree
+on the filesystem so multiple sources on the same drive will not result in
+double counting its space. Other filesystems mounted further down the tree
 of the branch will not be included when checking the mount's stats.
 
 The options `statfs` and `statfs_ignore` can be used to modify
@@ -1211,8 +1212,8 @@ following:
   * mergerfs.fsck: Provides permissions and ownership auditing and the ability to fix them
   * mergerfs.dedup: Will help identify and optionally remove duplicate files
   * mergerfs.dup: Ensure there are at least N copies of a file across the pool
-  * mergerfs.balance: Rebalance files across drives by moving them from the most filled to the least filled
-  * mergerfs.consolidate: move files within a single mergerfs directory to the drive with most free space
+  * mergerfs.balance: Rebalance files across filesystems by moving them from the most filled to the least filled
+  * mergerfs.consolidate: move files within a single mergerfs directory to the filesystem with most free space
 * https://github.com/trapexit/scorch
   * scorch: A tool to help discover silent corruption of files and keep track of files
 * https://github.com/trapexit/bbf
@@ -1324,37 +1325,18 @@ of sizes below the FUSE message size (128K on older kernels, 1M on
 newer).
 
 
-#### policy caching
-
-Policies are run every time a function (with a policy as mentioned
-above) is called. These policies can be expensive depending on
-mergerfs' setup and client usage patterns. Generally we wouldn't want
-to cache policy results because it may result in stale responses if
-the underlying drives are used directly.
-
-The `open` policy cache will cache the result of an `open` policy for
-a particular input for `cache.open` seconds or until the file is
-unlinked. Each file close (release) will randomly chose to clean up
-the cache of expired entries.
-
-This cache is really only useful in cases where you have a large
-number of branches and `open` is called on the same files repeatedly
-(like **Transmission** which opens and closes a file on every
-read/write presumably to keep file handle usage low).
-
-
 #### statfs caching
 
 Of the syscalls used by mergerfs in policies the `statfs` / `statvfs`
 call is perhaps the most expensive. It's used to find out the
-available space of a drive and whether it is mounted
+available space of a filesystem and whether it is mounted
 read-only. Depending on the setup and usage pattern these queries can
 be relatively costly. When `cache.statfs` is enabled all calls to
 `statfs` by a policy will be cached for the number of seconds it's set
 to.
 
 Example: If the create policy is `mfs` and the timeout is 60 then for
-that 60 seconds the same drive will be returned as the target for
+that 60 seconds the same filesystem will be returned as the target for
 creates because the available space won't be updated for that time.
 
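A concrete illustration of enabling the cache (the 60 second value mirrors the example above; branch paths are hypothetical):

```
# Cache statfs/statvfs results for 60s: faster policy evaluation in
# exchange for up to 60s of stale free-space data.
mergerfs -o category.create=mfs,cache.statfs=60 /mnt/disk1:/mnt/disk2 /mnt/pool
```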
@@ -1392,42 +1374,42 @@ for instance.
 MergerFS does not natively support any sort of tiered caching. Most
 users have no use for such a feature and its inclusion would
 complicate the code. However, there are a few situations where a cache
-drive could help with a typical mergerfs setup.
+filesystem could help with a typical mergerfs setup.
 
-1. Fast network, slow drives, many readers: You've a 10+Gbps network
-   with many readers and your regular drives can't keep up.
-2. Fast network, slow drives, small'ish bursty writes: You have a
+1. Fast network, slow filesystems, many readers: You've a 10+Gbps network
+   with many readers and your regular filesystems can't keep up.
+2. Fast network, slow filesystems, small'ish bursty writes: You have a
    10+Gbps network and wish to transfer amounts of data less than your
-   cache drive but wish to do so quickly.
+   cache filesystem but wish to do so quickly.
 
 With #1 it's arguable if you should be using mergerfs at all. RAID
 would probably be the better solution. If you're going to use mergerfs
 there are other tactics that may help: spreading the data across
-drives (see the mergerfs.dup tool) and setting `func.open=rand`, using
-`symlinkify`, or using dm-cache or a similar technology to add tiered
-cache to the underlying device.
+filesystems (see the mergerfs.dup tool) and setting `func.open=rand`,
+using `symlinkify`, or using dm-cache or a similar technology to add
+tiered cache to the underlying device.
 
 With #2 one could use dm-cache as well but there is another solution
 which requires only mergerfs and a cronjob.
 
-1. Create 2 mergerfs pools. One which includes just the slow drives
-   and one which has both the fast drives (SSD,NVME,etc.) and slow
-   drives.
-2. The 'cache' pool should have the cache drives listed first.
+1. Create 2 mergerfs pools. One which includes just the slow devices
+   and one which has both the fast devices (SSD,NVME,etc.) and slow
+   devices.
+2. The 'cache' pool should have the cache filesystems listed first.
 3. The best `create` policies to use for the 'cache' pool would
    probably be `ff`, `epff`, `lfs`, or `eplfs`. The latter two under
-   the assumption that the cache drive(s) are far smaller than the
-   backing drives. If using path preserving policies remember that
+   the assumption that the cache filesystem(s) are far smaller than the
+   backing filesystems. If using path preserving policies remember that
    you'll need to manually create the core directories of those paths
    you wish to be cached. Be sure the permissions are in sync. Use
-   `mergerfs.fsck` to check / correct them. You could also tag the
-   slow drives as `=NC` though that'd mean if the cache drives fill
-   you'd get "out of space" errors.
+   `mergerfs.fsck` to check / correct them. You could also set the
+   slow filesystems mode to `NC` though that'd mean if the cache
+   filesystems fill you'd get "out of space" errors.
 4. Enable `moveonenospc` and set `minfreespace` appropriately. To make
    sure there is enough room on the "slow" pool you might want to set
    `minfreespace` to at least as large as the size of the largest
-   cache drive if not larger. This way in the worst case the whole of
-   the cache drive(s) can be moved to the other drives.
+   cache filesystem if not larger. This way in the worst case the
+   whole of the cache filesystem(s) can be moved to the other drives.
 5. Set your programs to use the cache pool.
 6. Save one of the below scripts or create your own.
 7. Use `cron` (as root) to schedule the command at whatever frequency
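Steps 1 through 4 might look like the following in fstab form. This is a sketch under assumed hardware (one roughly 1TB SSD cache and two HDDs), not a drop-in configuration:

```
# 'slow' pool: HDDs only. 'cache' pool: SSD listed first, then HDDs.
# Per the guidance above, minfreespace on the slow pool is set larger
# than the cache filesystem so the cache can always drain into it.
/mnt/hdd1:/mnt/hdd2           /mnt/slow   fuse.mergerfs  category.create=mfs,moveonenospc=true,minfreespace=1100G  0 0
/mnt/ssd:/mnt/hdd1:/mnt/hdd2  /mnt/cache  fuse.mergerfs  category.create=ff,moveonenospc=true,minfreespace=10G  0 0
```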
@@ -1442,15 +1424,15 @@ rather than days. May want to use the `fadvise` / `--drop-cache`
 version of rsync or run rsync with the tool "nocache".
 
 *NOTE:* The arguments to these scripts include the cache
-**drive**. Not the pool with the cache drive. You could have data loss
-if the source is the cache pool.
+**filesystem** itself. Not the pool with the cache filesystem. You
+could have data loss if the source is the cache pool.
 
 
 ```
 #!/bin/bash
 
 if [ $# != 3 ]; then
-  echo "usage: $0 <cache-drive> <backing-pool> <days-old>"
+  echo "usage: $0 <cache-fs> <backing-pool> <days-old>"
   exit 1
 fi
 
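The hunk only shows the script's argument check; the rest lies outside the diff context. A rough sketch of how such a time-based mover could continue (the rsync flags and layout are assumptions, not the project's verbatim script):

```
CACHE="${1}"
BACKING="${2}"
N="${3}"

# Move files not accessed in N days from the cache fs to the backing
# pool. -axqHAXWES preserves attributes and stays on one filesystem;
# --remove-source-files turns the copy into a move.
find "${CACHE}" -type f -atime "+${N}" -printf '%P\n' \
  | rsync --files-from=- -axqHAXWES --preallocate --remove-source-files \
          "${CACHE}/" "${BACKING}/"
```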
@@ -1469,15 +1451,15 @@ Move the oldest file from the cache to the backing pool. Continue till
 below percentage threshold.
 
 *NOTE:* The arguments to these scripts include the cache
-**drive**. Not the pool with the cache drive. You could have data loss
-if the source is the cache pool.
+**filesystem** itself. Not the pool with the cache filesystem. You
+could have data loss if the source is the cache pool.
 
 
 ```
 #!/bin/bash
 
 if [ $# != 3 ]; then
-  echo "usage: $0 <cache-drive> <backing-pool> <percentage>"
+  echo "usage: $0 <cache-fs> <backing-pool> <percentage>"
   exit 1
 fi
 
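Likewise, a sketch of the percentage-based loop described above (an assumption about the elided body, not the verbatim script):

```
CACHE="${1}"
BACKING="${2}"
PERCENTAGE="${3}"

# While the cache fs is fuller than PERCENTAGE, move its oldest file
# (by access time) to the backing pool, preserving its relative path.
while [ "$(df --output=pcent "${CACHE}" | tail -n1 | tr -d ' %')" -gt "${PERCENTAGE}" ]; do
  FILE="$(find "${CACHE}" -type f -printf '%A@ %P\n' | sort | head -n1 | cut -d' ' -f2-)"
  [ -n "${FILE}" ] || break
  rsync -axqHAXWESR --preallocate --remove-source-files \
        "${CACHE}/./${FILE}" "${BACKING}/"
done
```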
@@ -1506,7 +1488,7 @@ FUSE filesystem working from userspace there is an increase in
 overhead relative to kernel based solutions. That said the performance
 can match the theoretical max but it depends greatly on the system's
 configuration. Especially when adding network filesystems into the mix
-there are many variables which can impact performance. Drive speeds
+there are many variables which can impact performance. Device speeds
 and latency, network speeds and latency, general concurrency,
 read/write sizes, etc. Unfortunately, given the number of variables it
 has been difficult to find a single set of settings which provide
@@ -1528,7 +1510,7 @@ understand what behaviors it may impact
 * disable `async_read`
 * test theoretical performance using `nullrw` or mounting a ram disk
 * use `symlinkify` if your data is largely static and read-only
-* use tiered cache drives
+* use tiered cache devices
 * use LVM and LVM cache to place a SSD in front of your HDDs
 * increase readahead: `readahead=1024`
 
@@ -1567,9 +1549,9 @@ the order listed (but not combined).
 2. Mount mergerfs over `tmpfs`. `tmpfs` is a RAM disk. Extremely high
    speed and very low latency. This is a more realistic best case
    scenario. Example: `mount -t tmpfs -o size=2G tmpfs /tmp/tmpfs`
-3. Mount mergerfs over a local drive. NVMe, SSD, HDD, etc. If you have
-   more than one I'd suggest testing each of them as drives and/or
-   controllers (their drivers) could impact performance.
+3. Mount mergerfs over a local device. NVMe, SSD, HDD, etc. If you
+   have more than one I'd suggest testing each of them as drives
+   and/or controllers (their drivers) could impact performance.
 4. Finally, if you intend to use mergerfs with a network filesystem,
    either as the source of data or to combine with another through
    mergerfs, test each of those alone as above.
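A quick way to compare those baselines (mount point hypothetical):

```
# Sequential write throughput; conv=fdatasync includes flush time so
# the page cache doesn't inflate the number.
dd if=/dev/zero of=/mnt/test/bench.bin bs=1M count=1024 conv=fdatasync
rm /mnt/test/bench.bin
```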
@@ -1579,7 +1561,7 @@ further testing with different options to see if they impact
 performance. For reads and writes the most relevant would be:
 `cache.files`, `async_read`. Less likely but relevant when using NFS
 or with certain filesystems would be `security_capability`, `xattr`,
-and `posix_acl`. If you find a specific system, drive, filesystem,
+and `posix_acl`. If you find a specific system, device, filesystem,
 controller, etc. that performs poorly contact trapexit so he may
 investigate further.
 
@@ -1632,7 +1614,7 @@ echo 3 | sudo tee /proc/sys/vm/drop_caches
 * If you don't see some directories and files you expect, policies
   seem to skip branches, you get strange permission errors, etc. be
   sure the underlying filesystems' permissions are all the same. Use
-  `mergerfs.fsck` to audit the drive for out of sync permissions.
+  `mergerfs.fsck` to audit the filesystem for out of sync permissions.
 * If you still have permission issues be sure you are using POSIX ACL
   compliant filesystems. mergerfs doesn't generally make exceptions
   for FAT, NTFS, or other non-POSIX filesystem.
@@ -1684,7 +1666,7 @@ outdated.
 The reason this is the default is because any other policy would be
 more expensive and for many applications it is unnecessary. To always
 return the directory with the most recent mtime or a faked value based
-on all found would require a scan of all drives.
+on all found would require a scan of all filesystems.
 
 If you always want the directory information from the one with the
 most recent mtime then use the `newest` policy for `getattr`.
@@ -1709,9 +1691,9 @@ then removing the source. Since the source **is** the target in this
 case, depending on the unlink policy, it will remove the just copied
 file and other files across the branches.
 
-If you want to move files to one drive just copy them there and use
-mergerfs.dedup to clean up the old paths or manually remove them from
-the branches directly.
+If you want to move files to one filesystem just copy them there and
+use mergerfs.dedup to clean up the old paths or manually remove them
+from the branches directly.
 
 
 #### cached memory appears greater than it should be
@@ -1772,15 +1754,14 @@ Please read the section above regarding [rename & link](#rename--link).
 
 The problem is that many applications do not properly handle `EXDEV`
 errors which `rename` and `link` may return even though they are
-perfectly valid situations which do not indicate actual drive or OS
-errors. The error will only be returned by mergerfs if using a path
-preserving policy as described in the policy section above. If you do
-not care about path preservation simply change the mergerfs policy to
-the non-path preserving version. For example: `-o category.create=mfs`
-
-Ideally the offending software would be fixed and it is recommended
-that if you run into this problem you contact the software's author
-and request proper handling of `EXDEV` errors.
+perfectly valid situations which do not indicate actual device,
+filesystem, or OS errors. The error will only be returned by mergerfs
+if using a path preserving policy as described in the policy section
+above. If you do not care about path preservation simply change the
+mergerfs policy to the non-path preserving version. For example: `-o
+category.create=mfs`. Ideally the offending software would be fixed and
+it is recommended that if you run into this problem you contact the
+software's author and request proper handling of `EXDEV` errors.
 
 
 #### my 32bit software has problems
@@ -1887,9 +1868,10 @@ Users have reported running mergerfs on everything from a Raspberry Pi
 to dual socket Xeon systems with >20 cores. I'm aware of at least a
 few companies which use mergerfs in production. [Open Media
 Vault](https://www.openmediavault.org) includes mergerfs as its sole
-solution for pooling drives. The author of mergerfs had it running for
-over 300 days managing 16+ drives with reasonably heavy 24/7 read and
-write usage. Stopping only after the machine's power supply died.
+solution for pooling filesystems. The author of mergerfs had it
+running for over 300 days managing 16+ devices with reasonably heavy
+24/7 read and write usage. Stopping only after the machine's power
+supply died.
 
 Most serious issues (crashes or data corruption) have been due to
 [kernel
@@ -1897,14 +1879,14 @@ bugs](https://github.com/trapexit/mergerfs/wiki/Kernel-Issues-&-Bugs). All
 of which are fixed in stable releases.
 
 
-#### Can mergerfs be used with drives which already have data / are in use?
+#### Can mergerfs be used with filesystems which already have data / are in use?
 
 Yes. MergerFS is a proxy and does **NOT** interfere with the normal
-form or function of the drives / mounts / paths it manages.
+form or function of the filesystems / mounts / paths it manages.
 
 MergerFS is **not** a traditional filesystem. MergerFS is **not**
 RAID. It does **not** manipulate the data that passes through it. It
-does **not** shard data across drives. It merely shards some
+does **not** shard data across filesystems. It merely shards some
 **behavior** and aggregates others.
 
 
@@ -1920,8 +1902,8 @@ best off using `mfs` for `category.create`. It will spread files out
 across your branches based on available space. Use `mspmfs` if you
 want to try to colocate the data a bit more. You may want to use `lus`
 if you prefer a slightly different distribution of data if you have a
-mix of smaller and larger drives. Generally though `mfs`, `lus`, or
-even `rand` are good for the general use case. If you are starting
+mix of smaller and larger filesystems. Generally though `mfs`, `lus`,
+or even `rand` are good for the general use case. If you are starting
 with an imbalanced pool you can use the tool **mergerfs.balance** to
 redistribute files across the pool.
@@ -1929,8 +1911,8 @@ If you really wish to try to colocate files based on directory you can
 set `func.create` to `epmfs` or similar and `func.mkdir` to `rand` or
 `eprand` depending on if you just want to colocate generally or on
 specific branches. Either way the *need* to colocate is rare. For
-instance: if you wish to remove the drive regularly and want the data
-to predictably be on that drive or if you don't use backup at all and
+instance: if you wish to remove the device regularly and want the data
+to predictably be on that device or if you don't use backup at all and
 don't wish to replace that data piecemeal. In which case using path
 preservation can help but will require some manual
 attention. Colocating after the fact can be accomplished using the
@@ -1965,29 +1947,29 @@ That said, for the average person, the following should be fine:
 
 `cache.files=off,dropcacheonclose=true,category.create=mfs`
 
 
-#### Why are all my files ending up on 1 drive?!
+#### Why are all my files ending up on 1 filesystem?!
 
-Did you start with empty drives? Did you explicitly configure a
+Did you start with empty filesystems? Did you explicitly configure a
 `category.create` policy? Are you using an `existing path` / `path
 preserving` policy?
 
 The default create policy is `epmfs`. That is a path preserving
 algorithm. With such a policy for `mkdir` and `create` with a set of
-empty drives it will select only 1 drive when the first directory is
-created. Anything, files or directories, created in that first
-directory will be placed on the same branch because it is preserving
-paths.
+empty filesystems it will select only 1 filesystem when the first
+directory is created. Anything, files or directories, created in that
+first directory will be placed on the same branch because it is
+preserving paths.
 
 This catches a lot of new users off guard but changing the default
 would break the setup for many existing users. If you do not care
 about path preservation and wish your files to be spread across all
-your drives change to `mfs` or similar policy as described above. If
-you do want path preservation you'll need to perform the manual act of
-creating paths on the drives you want the data to land on before
-transferring your data. Setting `func.mkdir=epall` can simplify
-managing path preservation for `create`. Or use `func.mkdir=rand` if
-you're interested in just grouping together directory content by
-drive.
+your filesystems change to `mfs` or similar policy as described
+above. If you do want path preservation you'll need to perform the
+manual act of creating paths on the filesystems you want the data to
+land on before transferring your data. Setting `func.mkdir=epall` can
+simplify managing path preservation for `create`. Or use
+`func.mkdir=rand` if you're interested in just grouping together
+directory content by filesystem.
 
 
 #### Do hardlinks work?
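Changing the policy does not require a remount; mergerfs exposes runtime configuration through its control file. A hypothetical example (pool path assumed):

```
# Read and then change category.create on a live mount via the
# .mergerfs pseudo-file's extended attributes.
getfattr -n user.mergerfs.category.create /mnt/pool/.mergerfs
setfattr -n user.mergerfs.category.create -v mfs /mnt/pool/.mergerfs
```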
@@ -2058,8 +2040,8 @@ such, mergerfs always changes its credentials to that of the
 caller. This means that if the user does not have access to a file or
 directory then neither will mergerfs. However, because mergerfs is
 creating a union of paths it may be able to read some files and
-directories on one drive but not another resulting in an incomplete
-set.
+directories on one filesystem but not another resulting in an
+incomplete set.
 
 Whenever you run into a split permission issue (seeing some but not
 all files) try using
@@ -2153,9 +2135,10 @@ overlayfs have.
 #### Why use mergerfs over unionfs?
 
 UnionFS is more like aufs than mergerfs in that it offers overlay /
-CoW features. If you're just looking to create a union of drives and
-want flexibility in file/directory placement then mergerfs offers that
-whereas unionfs is more for overlaying RW filesystems over RO ones.
+CoW features. If you're just looking to create a union of filesystems
+and want flexibility in file/directory placement then mergerfs offers
+that whereas unionfs is more for overlaying RW filesystems over RO
+ones.
 
 
 #### Why use mergerfs over overlayfs?
@@ -2179,8 +2162,8 @@ without the single point of failure.
 #### Why use mergerfs over ZFS?
 
 MergerFS is not intended to be a replacement for ZFS. MergerFS is
-intended to provide flexible pooling of arbitrary drives (local or
-remote), of arbitrary sizes, and arbitrary filesystems. For `write
+intended to provide flexible pooling of arbitrary filesystems (local
+or remote), of arbitrary sizes, and arbitrary filesystems. For `write
 once, read many` usecases such as bulk media storage. Where data
 integrity and backup is managed in other ways. In that situation ZFS
 can introduce a number of costs and limitations as described
@@ -2200,6 +2183,29 @@ There are a number of UnRAID users who use mergerfs as well though I'm
 not entirely familiar with the use case.
 
 
+#### Why use mergerfs over StableBit's DrivePool?
+
+DrivePool works only on Windows so not as common an alternative as
+other Linux solutions. If you want to use Windows then DrivePool is a
+good option. Functionally the two projects work a bit
+differently. DrivePool always writes to the filesystem with the most
+free space and later rebalances. mergerfs does not offer rebalance but
+chooses a branch at file/directory create time. DrivePool's
+rebalancing can be done differently in any directory and has file
+pattern matching to further customize the behavior. mergerfs, not
+having rebalancing, does not have these features, but similar features
+are planned for mergerfs v3. DrivePool has builtin file duplication
+which mergerfs does not natively support (but can be done via an
+external script.)
+
+There are a lot of misc differences between the two projects but most
+features in DrivePool can be replicated with external tools in
+combination with mergerfs.
+
+Additionally DrivePool is a closed source commercial product vs
+mergerfs, an ISC licensed OSS project.
+
+
 #### What should mergerfs NOT be used for?
 
 * databases: Even if the database stored data in separate files
|
@ -2214,7 +2220,7 @@ not entirely familiar with the use case.
|
|||
availability you should stick with RAID.
|
||||
|
||||
|
||||
#### Can drives be written to directly? Outside of mergerfs while pooled?
|
||||
#### Can filesystems be written to directly? Outside of mergerfs while pooled?
|
||||
|
||||
Yes, however it's not recommended to use the same file from within the
|
||||
pool and from without at the same time (particularly
|
||||
|
@@ -2244,7 +2250,7 @@ was asked of it: filtering possible branches due to those
 settings. Only one error can be returned and if one of the reasons for
 filtering a branch was **minfreespace** then it will be returned as
 such. **moveonenospc** is only relevant to writing a file which is too
-large for the drive its currently on.
+large for the filesystem it's currently on.
 
 It is also possible that the filesystem selected has run out of
 inodes. Use `df -i` to list the total and available inodes per
@@ -2336,7 +2342,8 @@ away by using realtime signals to inform all threads to change
 credentials. Taking after **Samba**, mergerfs uses
 **syscall(SYS_setreuid,...)** to set the caller's credentials for that
 thread only. Jumping back to **root** as necessary should escalated
-privileges be needed (for instance: to clone paths between drives).
+privileges be needed (for instance: to clone paths between
+filesystems).
 
 For non-Linux systems mergerfs uses a read-write lock and changes
 credentials only when necessary. If multiple threads are to be user X
man/mergerfs.1 (263 lines changed)
@@ -77,9 +77,9 @@ A + B = C
 mergerfs does \f[B]not\f[R] support the copy-on-write (CoW) or whiteout
 behaviors found in \f[B]aufs\f[R] and \f[B]overlayfs\f[R].
 You can \f[B]not\f[R] mount a read-only filesystem and write to it.
-However, mergerfs will ignore read-only drives when creating new files
-so you can mix read-write and read-only drives.
-It also does \f[B]not\f[R] split data across drives.
+However, mergerfs will ignore read-only filesystems when creating new
+files so you can mix read-write and read-only filesystems.
+It also does \f[B]not\f[R] split data across filesystems.
 It is not RAID0 / striping.
 It is simply a union of other filesystems.
 .SH TERMINOLOGY
@@ -210,7 +210,8 @@ Typically rename and link act differently depending on the policy of
 \f[C]create\f[R] (read below).
 Enabling this will cause rename and link to always use the non-path
 preserving behavior.
-This means files, when renamed or linked, will stay on the same drive.
+This means files, when renamed or linked, will stay on the same
+filesystem.
 (default: false)
 .IP \[bu] 2
 \f[B]security_capability=BOOL\f[R]: If false return ENOATTR when xattr
@@ -233,7 +234,7 @@ to cow-shell.
 .IP \[bu] 2
 \f[B]statfs=base|full\f[R]: Controls how statfs works.
 `base' means it will always use all branches in statfs calculations.
-`full' is in effect path preserving and only includes drives where the
+`full' is in effect path preserving and only includes branches where the
 path exists.
 (default: base)
 .IP \[bu] 2
@@ -442,10 +443,10 @@ POLICY = mergerfs function policy
 .PP
 The `branches' argument is a colon (`:') delimited list of paths to be
 pooled together.
-It does not matter if the paths are on the same or different drives nor
-does it matter the filesystem (within reason).
+It does not matter if the paths are on the same or different filesystems
+nor does it matter the filesystem type (within reason).
 Used and available space will not be duplicated for paths on the same
-device and any features which aren\[cq]t supported by the underlying
+filesystem and any features which aren\[cq]t supported by the underlying
 filesystem (such as file attributes or extended attributes) will return
 the appropriate errors.
 .PP
@@ -454,7 +455,7 @@ A type which impacts whether or not the branch is included in a policy
 calculation and an individual minfreespace value.
 The values are set by appending an \f[C]=\f[R] to the end of a branch
 designation and using commas as delimiters.
-Example: /mnt/drive=RW,1234
+Example: \f[C]/mnt/drive=RW,1234\f[R]
 .SS branch mode
 .IP \[bu] 2
 RW: (read/write) - Default behavior.
@@ -748,8 +749,8 @@ This is unlikely to occur in practice but is something to keep in mind.
 \f[B]WARNING:\f[R] Some backup solutions, such as CrashPlan, do not
 backup the target of a symlink.
 If using this feature it will be necessary to point any backup software
-to the original drives or configure the software to follow symlinks if
-such an option is available.
+to the original filesystems or configure the software to follow symlinks
+if such an option is available.
 Alternatively create two mounts.
 One for backup and one for general consumption.
 .SS nullrw
@@ -939,11 +940,11 @@ All policies which start with \f[C]ep\f[R] (\f[B]epff\f[R],
 \f[C]path preserving\f[R].
 \f[C]ep\f[R] stands for \f[C]existing path\f[R].
 .PP
-A path preserving policy will only consider drives where the relative
+A path preserving policy will only consider branches where the relative
 path being accessed already exists.
 .PP
 When using non-path preserving policies paths will be cloned to target
-drives as necessary.
+branches as necessary.
 .PP
 With the \f[C]msp\f[R] or \f[C]most shared path\f[R] policies they are
 defined as \f[C]path preserving\f[R] for the purpose of controlling
@@ -990,19 +991,19 @@ T}
 T{
 eplfs (existing path, least free space)
 T}@T{
-Of all the branches on which the relative path exists choose the drive
+Of all the branches on which the relative path exists choose the branch
 with the least free space.
 T}
 T{
 eplus (existing path, least used space)
 T}@T{
-Of all the branches on which the relative path exists choose the drive
+Of all the branches on which the relative path exists choose the branch
 with the least used space.
 T}
 T{
 epmfs (existing path, most free space)
 T}@T{
-Of all the branches on which the relative path exists choose the drive
+Of all the branches on which the relative path exists choose the branch
 with the most free space.
 T}
 T{
@@ -1019,23 +1020,23 @@ T}
 T{
 ff (first found)
 T}@T{
-Given the order of the drives, as defined at mount time or configured at
-runtime, act on the first one found.
+Given the order of the branches, as defined at mount time or configured
+at runtime, act on the first one found.
 T}
 T{
 lfs (least free space)
 T}@T{
-Pick the drive with the least available free space.
+Pick the branch with the least available free space.
 T}
 T{
 lus (least used space)
 T}@T{
-Pick the drive with the least used space.
+Pick the branch with the least used space.
 T}
 T{
 mfs (most free space)
 T}@T{
-Pick the drive with the most available free space.
+Pick the branch with the most available free space.
 T}
 T{
 msplfs (most shared path, least free space)
@@ -1141,8 +1142,8 @@ If a rename can\[cq]t be done atomically due to the source and
 destination paths existing on different mount points it will return
 \f[B]-1\f[R] with \f[B]errno = EXDEV\f[R] (cross device / improper
 link).
-So if a \f[C]rename\f[R]\[cq]s source and target are on different drives
-within the pool it creates an issue.
+So if a \f[C]rename\f[R]\[cq]s source and target are on different
+filesystems within the pool it creates an issue.
 .PP
 Originally mergerfs would return EXDEV whenever a rename was requested
 which was cross directory in any way.
@@ -1169,7 +1170,7 @@ For each file attempt rename:
 If failure with ENOENT (no such file or directory) run \f[B]create\f[R]
 policy
 .IP \[bu] 2
-If create policy returns the same drive as currently evaluating then
+If create policy returns the same branch as currently evaluating then
 clone the path
 .IP \[bu] 2
 Re-attempt rename
@@ -1184,9 +1185,9 @@ returned
 On success:
 .RS 2
 .IP \[bu] 2
-Remove the target from all drives with no source file
+Remove the target from all branches with no source file
 .IP \[bu] 2
-Remove the source from all drives which failed to rename
+Remove the source from all branches which failed to rename
 .RE
 .RE
 .IP \[bu] 2
@@ -1201,10 +1202,10 @@ Using the \f[B]getattr\f[R] policy get the target path
 For each file attempt rename:
 .RS 2
 .IP \[bu] 2
-If the source drive != target drive:
+If the source branch != target branch:
 .RS 2
 .IP \[bu] 2
-Clone target path from target drive to source drive
+Clone target path from target branch to source branch
 .RE
 .IP \[bu] 2
 Rename
@@ -1219,9 +1220,9 @@ returned
 On success:
 .RS 2
 .IP \[bu] 2
-Remove the target from all drives with no source file
+Remove the target from all branches with no source file
 .IP \[bu] 2
-Remove the source from all drives which failed to rename
+Remove the source from all branches which failed to rename
 .RE
 .RE
 .PP
@@ -1247,14 +1248,14 @@ file/directory which is the source of the metadata you see in an
 .SS statfs / statvfs
 .PP
 statvfs (http://linux.die.net/man/2/statvfs) normalizes the source
-drives based on the fragment size and sums the number of adjusted blocks
-and inodes.
+filesystems based on the fragment size and sums the number of adjusted
+blocks and inodes.
 This means you will see the combined space of all sources.
 Total, used, and free.
-The sources however are dedupped based on the drive so multiple sources
-on the same drive will not result in double counting its space.
-Filesystems mounted further down the tree of the branch will not be
-included when checking the mount\[cq]s stats.
+The sources however are dedupped based on the filesystem so multiple
+sources on the same drive will not result in double counting its space.
+Other filesystems mounted further down the tree of the branch will not
+be included when checking the mount\[cq]s stats.
 .PP
 The options \f[C]statfs\f[R] and \f[C]statfs_ignore\f[R] can be used to
 modify \f[C]statfs\f[R] behavior.
@@ -1611,11 +1612,11 @@ mergerfs.dedup: Will help identify and optionally remove duplicate files
 mergerfs.dup: Ensure there are at least N copies of a file across the
 pool
 .IP \[bu] 2
-mergerfs.balance: Rebalance files across drives by moving them from the
-most filled to the least filled
+mergerfs.balance: Rebalance files across filesystems by moving them from
+the most filled to the least filled
 .IP \[bu] 2
 mergerfs.consolidate: move files within a single mergerfs directory to
-the drive with most free space
+the filesystem with most free space
 .RE
 .IP \[bu] 2
 https://github.com/trapexit/scorch
@@ -1746,40 +1747,21 @@ Note that if an application is properly sizing writes then writeback
 caching will have little or no effect.
 It will only help with writes of sizes below the FUSE message size (128K
 on older kernels, 1M on newer).
-.SS policy caching
-.PP
-Policies are run every time a function (with a policy as mentioned
-above) is called.
-These policies can be expensive depending on mergerfs\[cq] setup and
-client usage patterns.
-Generally we wouldn\[cq]t want to cache policy results because it may
-result in stale responses if the underlying drives are used directly.
-.PP
-The \f[C]open\f[R] policy cache will cache the result of an
-\f[C]open\f[R] policy for a particular input for \f[C]cache.open\f[R]
-seconds or until the file is unlinked.
-Each file close (release) will randomly chose to clean up the cache of
-expired entries.
-.PP
-This cache is really only useful in cases where you have a large number
-of branches and \f[C]open\f[R] is called on the same files repeatedly
-(like \f[B]Transmission\f[R] which opens and closes a file on every
-read/write presumably to keep file handle usage low).
 .SS statfs caching
 .PP
 Of the syscalls used by mergerfs in policies the \f[C]statfs\f[R] /
 \f[C]statvfs\f[R] call is perhaps the most expensive.
-It\[cq]s used to find out the available space of a drive and whether it
-is mounted read-only.
+It\[cq]s used to find out the available space of a filesystem and
+whether it is mounted read-only.
 Depending on the setup and usage pattern these queries can be relatively
 costly.
 When \f[C]cache.statfs\f[R] is enabled all calls to \f[C]statfs\f[R] by
 a policy will be cached for the number of seconds it\[cq]s set to.
 .PP
 Example: If the create policy is \f[C]mfs\f[R] and the timeout is 60
-then for that 60 seconds the same drive will be returned as the target
-for creates because the available space won\[cq]t be updated for that
-time.
+then for that 60 seconds the same filesystem will be returned as the
+target for creates because the available space won\[cq]t be updated for
+that time.
 .SS symlink caching
 .PP
 As of version 4.20 Linux supports symlink caching.
@@ -1815,54 +1797,55 @@ NVMe, SSD, Optane in front of traditional HDDs for instance.
 MergerFS does not natively support any sort of tiered caching.
 Most users have no use for such a feature and its inclusion would
 complicate the code.
-However, there are a few situations where a cache drive could help with
-a typical mergerfs setup.
+However, there are a few situations where a cache filesystem could help
+with a typical mergerfs setup.
 .IP "1." 3
-Fast network, slow drives, many readers: You\[cq]ve a 10+Gbps network
-with many readers and your regular drives can\[cq]t keep up.
+Fast network, slow filesystems, many readers: You\[cq]ve a 10+Gbps
+network with many readers and your regular filesystems can\[cq]t keep
+up.
 .IP "2." 3
-Fast network, slow drives, small\[cq]ish bursty writes: You have a
+Fast network, slow filesystems, small\[cq]ish bursty writes: You have a
 10+Gbps network and wish to transfer amounts of data less than your
-cache drive but wish to do so quickly.
+cache filesystem but wish to do so quickly.
 .PP
 With #1 it\[cq]s arguable if you should be using mergerfs at all.
 RAID would probably be the better solution.
 If you\[cq]re going to use mergerfs there are other tactics that may
-help: spreading the data across drives (see the mergerfs.dup tool) and
-setting \f[C]func.open=rand\f[R], using \f[C]symlinkify\f[R], or using
-dm-cache or a similar technology to add tiered cache to the underlying
-device.
+help: spreading the data across filesystems (see the mergerfs.dup tool)
+and setting \f[C]func.open=rand\f[R], using \f[C]symlinkify\f[R], or
+using dm-cache or a similar technology to add tiered cache to the
+underlying device.
 .PP
 With #2 one could use dm-cache as well but there is another solution
 which requires only mergerfs and a cronjob.
 .IP "1." 3
 Create 2 mergerfs pools.
-One which includes just the slow drives and one which has both the fast
-drives (SSD,NVME,etc.) and slow drives.
+One which includes just the slow devices and one which has both the fast
+devices (SSD,NVME,etc.) and slow devices.
 .IP "2." 3
-The `cache' pool should have the cache drives listed first.
+The `cache' pool should have the cache filesystems listed first.
 .IP "3." 3
 The best \f[C]create\f[R] policies to use for the `cache' pool would
 probably be \f[C]ff\f[R], \f[C]epff\f[R], \f[C]lfs\f[R], or
 \f[C]eplfs\f[R].
-The latter two under the assumption that the cache drive(s) are far
-smaller than the backing drives.
+The latter two under the assumption that the cache filesystem(s) are far
+smaller than the backing filesystems.
 If using path preserving policies remember that you\[cq]ll need to
 manually create the core directories of those paths you wish to be
 cached.
 Be sure the permissions are in sync.
 Use \f[C]mergerfs.fsck\f[R] to check / correct them.
-You could also tag the slow drives as \f[C]=NC\f[R] though that\[cq]d
-mean if the cache drives fill you\[cq]d get \[lq]out of space\[rq]
-errors.
+You could also set the slow filesystems mode to \f[C]NC\f[R] though
+that\[cq]d mean if the cache filesystems fill you\[cq]d get \[lq]out of
+space\[rq] errors.
 .IP "4." 3
 Enable \f[C]moveonenospc\f[R] and set \f[C]minfreespace\f[R]
 appropriately.
 To make sure there is enough room on the \[lq]slow\[rq] pool you might
 want to set \f[C]minfreespace\f[R] to at least as large as the size of
-the largest cache drive if not larger.
-This way in the worst case the whole of the cache drive(s) can be moved
-to the other drives.
+the largest cache filesystem if not larger.
+This way in the worst case the whole of the cache filesystem(s) can be
+moved to the other drives.
 .IP "5." 3
 Set your programs to use the cache pool.
 .IP "6." 3
@@ -1880,8 +1863,8 @@ May want to use the \f[C]fadvise\f[R] / \f[C]--drop-cache\f[R] version
 of rsync or run rsync with the tool \[lq]nocache\[rq].
 .PP
 \f[I]NOTE:\f[R] The arguments to these scripts include the cache
-\f[B]drive\f[R].
-Not the pool with the cache drive.
+\f[B]filesystem\f[R] itself.
+Not the pool with the cache filesystem.
 You could have data loss if the source is the cache pool.
 .IP
 .nf
@@ -1889,7 +1872,7 @@ You could have data loss if the source is the cache pool.
 #!/bin/bash
 
 if [ $# != 3 ]; then
-  echo \[dq]usage: $0 <cache-drive> <backing-pool> <days-old>\[dq]
+  echo \[dq]usage: $0 <cache-fs> <backing-pool> <days-old>\[dq]
   exit 1
 fi
 
@@ -1907,8 +1890,8 @@ Move the oldest file from the cache to the backing pool.
 Continue till below percentage threshold.
 .PP
 \f[I]NOTE:\f[R] The arguments to these scripts include the cache
-\f[B]drive\f[R].
-Not the pool with the cache drive.
+\f[B]filesystem\f[R] itself.
+Not the pool with the cache filesystem.
 You could have data loss if the source is the cache pool.
 .IP
 .nf
@@ -1916,7 +1899,7 @@ You could have data loss if the source is the cache pool.
 #!/bin/bash
 
 if [ $# != 3 ]; then
-  echo \[dq]usage: $0 <cache-drive> <backing-pool> <percentage>\[dq]
+  echo \[dq]usage: $0 <cache-fs> <backing-pool> <percentage>\[dq]
   exit 1
 fi
 
@@ -1946,7 +1929,7 @@ That said the performance can match the theoretical max but it depends
 greatly on the system\[cq]s configuration.
 Especially when adding network filesystems into the mix there are many
 variables which can impact performance.
-Drive speeds and latency, network speeds and latency, general
+Device speeds and latency, network speeds and latency, general
 concurrency, read/write sizes, etc.
 Unfortunately, given the number of variables it has been difficult to
 find a single set of settings which provide optimal performance.
@@ -1982,7 +1965,7 @@ disk
 .IP \[bu] 2
 use \f[C]symlinkify\f[R] if your data is largely static and read-only
 .IP \[bu] 2
-use tiered cache drives
+use tiered cache devices
 .IP \[bu] 2
 use LVM and LVM cache to place a SSD in front of your HDDs
 .IP \[bu] 2
@@ -2029,7 +2012,7 @@ Extremely high speed and very low latency.
 This is a more realistic best case scenario.
 Example: \f[C]mount -t tmpfs -o size=2G tmpfs /tmp/tmpfs\f[R]
 .IP "3." 3
-Mount mergerfs over a local drive.
+Mount mergerfs over a local device.
 NVMe, SSD, HDD, etc.
 If you have more than one I\[cq]d suggest testing each of them as drives
 and/or controllers (their drivers) could impact performance.
@@ -2046,7 +2029,7 @@ For reads and writes the most relevant would be: \f[C]cache.files\f[R],
 Less likely but relevant when using NFS or with certain filesystems
 would be \f[C]security_capability\f[R], \f[C]xattr\f[R], and
 \f[C]posix_acl\f[R].
-If you find a specific system, drive, filesystem, controller, etc.
+If you find a specific system, device, filesystem, controller, etc.
 that performs poorly contact trapexit so he may investigate further.
 .PP
 Sometimes the problem is really the application accessing or writing
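Several of the options named above can be toggled at runtime through mergerfs's hidden control file, so testing them does not require remounting. A hedged example (pool path assumed):

```bash
# Query then disable security.capability xattr handling for the
# mount; other runtime-settable options work the same way.
getfattr -n user.mergerfs.security_capability /mnt/pool/.mergerfs
setfattr -n user.mergerfs.security_capability -v false /mnt/pool/.mergerfs
```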
@@ -2109,7 +2092,7 @@ exibit incorrect behavior if run otherwise..
 If you don\[cq]t see some directories and files you expect, policies
 seem to skip branches, you get strange permission errors, etc.
 be sure the underlying filesystems\[cq] permissions are all the same.
-Use \f[C]mergerfs.fsck\f[R] to audit the drive for out of sync
+Use \f[C]mergerfs.fsck\f[R] to audit the filesystem for out of sync
 permissions.
 .IP \[bu] 2
 If you still have permission issues be sure you are using POSIX ACL
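A hedged invocation of the audit tool referenced above; the flags here are assumptions from the mergerfs-tools project, so verify against `mergerfs.fsck --help` on your system:

```bash
# Verbosely scan the mount and fix mismatched permissions by hand.
mergerfs.fsck -v -f manual /mnt/pool
```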
@@ -2165,7 +2148,7 @@ appear outdated.
 The reason this is the default is because any other policy would be more
 expensive and for many applications it is unnecessary.
 To always return the directory with the most recent mtime or a faked
-value based on all found would require a scan of all drives.
+value based on all found would require a scan of all filesystems.
 .PP
 If you always want the directory information from the one with the most
 recent mtime then use the \f[C]newest\f[R] policy for \f[C]getattr\f[R].
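For example, a hedged mount line applying the `newest` policy for `getattr` (branch paths are assumptions):

```bash
mergerfs -o func.getattr=newest /mnt/disk1:/mnt/disk2 /mnt/pool
```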
@@ -2191,7 +2174,7 @@ Since the source \f[B]is\f[R] the target in this case, depending on the
 unlink policy, it will remove the just copied file and other files
 across the branches.
 .PP
-If you want to move files to one drive just copy them there and use
+If you want to move files to one filesystem just copy them there and use
 mergerfs.dedup to clean up the old paths or manually remove them from
 the branches directly.
 .SS cached memory appears greater than it should be
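A hedged example of the cleanup step after manually copying files to one branch; the flags are assumptions, so check `mergerfs.dedup --help` for your version:

```bash
# Keep the newest copy of each duplicated file, remove the rest.
mergerfs.dedup -v -d newest /mnt/pool/path/to/dir
```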
@@ -2253,16 +2236,15 @@ Please read the section above regarding rename & link.
 The problem is that many applications do not properly handle
 \f[C]EXDEV\f[R] errors which \f[C]rename\f[R] and \f[C]link\f[R] may
 return even though they are perfectly valid situations which do not
-indicate actual drive or OS errors.
+indicate actual device, filesystem, or OS errors.
 The error will only be returned by mergerfs if using a path preserving
 policy as described in the policy section above.
 If you do not care about path preservation simply change the mergerfs
 policy to the non-path preserving version.
-For example: \f[C]-o category.create=mfs\f[R]
-.PP
-Ideally the offending software would be fixed and it is recommended that
-if you run into this problem you contact the software\[cq]s author and
-request proper handling of \f[C]EXDEV\f[R] errors.
+For example: \f[C]-o category.create=mfs\f[R] Ideally the offending
+software would be fixed and it is recommended that if you run into this
+problem you contact the software\[cq]s author and request proper
+handling of \f[C]EXDEV\f[R] errors.
 .SS my 32bit software has problems
 .PP
 Some software have problems with 64bit inode values.
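The policy change suggested above can be applied at mount time or, through the control file, at runtime (pool path assumed):

```bash
# At mount time:
mergerfs -o category.create=mfs /mnt/disk1:/mnt/disk2 /mnt/pool
# Or at runtime, without remounting:
setfattr -n user.mergerfs.category.create -v mfs /mnt/pool/.mergerfs
```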
@@ -2373,24 +2355,24 @@ to dual socket Xeon systems with >20 cores.
 I\[cq]m aware of at least a few companies which use mergerfs in
 production.
 Open Media Vault (https://www.openmediavault.org) includes mergerfs as
-its sole solution for pooling drives.
+its sole solution for pooling filesystems.
 The author of mergerfs had it running for over 300 days managing 16+
-drives with reasonably heavy 24/7 read and write usage.
+devices with reasonably heavy 24/7 read and write usage.
 Stopping only after the machine\[cq]s power supply died.
 .PP
 Most serious issues (crashes or data corruption) have been due to kernel
 bugs (https://github.com/trapexit/mergerfs/wiki/Kernel-Issues-&-Bugs).
 All of which are fixed in stable releases.
-.SS Can mergerfs be used with drives which already have data / are in use?
+.SS Can mergerfs be used with filesystems which already have data / are in use?
 .PP
 Yes.
 MergerFS is a proxy and does \f[B]NOT\f[R] interfere with the normal
-form or function of the drives / mounts / paths it manages.
+form or function of the filesystems / mounts / paths it manages.
 .PP
 MergerFS is \f[B]not\f[R] a traditional filesystem.
 MergerFS is \f[B]not\f[R] RAID.
 It does \f[B]not\f[R] manipulate the data that passes through it.
-It does \f[B]not\f[R] shard data across drives.
+It does \f[B]not\f[R] shard data across filesystems.
 It merely shards some \f[B]behavior\f[R] and aggregates others.
 .SS Can mergerfs be removed without affecting the data?
 .PP
@@ -2402,7 +2384,8 @@ probably best off using \f[C]mfs\f[R] for \f[C]category.create\f[R].
 It will spread files out across your branches based on available space.
 Use \f[C]mspmfs\f[R] if you want to try to colocate the data a bit more.
 You may want to use \f[C]lus\f[R] if you prefer a slightly different
-distribution of data if you have a mix of smaller and larger drives.
+distribution of data if you have a mix of smaller and larger
+filesystems.
 Generally though \f[C]mfs\f[R], \f[C]lus\f[R], or even \f[C]rand\f[R]
 are good for the general use case.
 If you are starting with an imbalanced pool you can use the tool
@@ -2413,8 +2396,8 @@ set \f[C]func.create\f[R] to \f[C]epmfs\f[R] or similar and
 \f[C]func.mkdir\f[R] to \f[C]rand\f[R] or \f[C]eprand\f[R] depending on
 if you just want to colocate generally or on specific branches.
 Either way the \f[I]need\f[R] to colocate is rare.
-For instance: if you wish to remove the drive regularly and want the
-data to predictably be on that drive or if you don\[cq]t use backup at
+For instance: if you wish to remove the device regularly and want the
+data to predictably be on that device or if you don\[cq]t use backup at
 all and don\[cq]t wish to replace that data piecemeal.
 In which case using path preservation can help but will require some
 manual attention.
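A hedged mount line for the colocation combination described above (branch paths assumed):

```bash
# Path-preserving create keeps new files with their directory's
# branch; rand mkdir picks the branch for each new directory.
mergerfs -o func.create=epmfs,func.mkdir=rand /mnt/disk1:/mnt/disk2 /mnt/pool
```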
@@ -2451,9 +2434,9 @@ the documentation will be improved.
 That said, for the average person, the following should be fine:
 .PP
 \f[C]cache.files=off,dropcacheonclose=true,category.create=mfs\f[R]
-.SS Why are all my files ending up on 1 drive?!
+.SS Why are all my files ending up on 1 filesystem?!
 .PP
-Did you start with empty drives?
+Did you start with empty filesystems?
 Did you explicitly configure a \f[C]category.create\f[R] policy?
 Are you using an \f[C]existing path\f[R] / \f[C]path preserving\f[R]
 policy?
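The recommended option string in the hunk above would typically be applied via fstab. An assumed example entry (adjust branch paths and mountpoint for your system):

```
/mnt/disk1:/mnt/disk2  /mnt/pool  fuse.mergerfs  cache.files=off,dropcacheonclose=true,category.create=mfs  0 0
```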
@@ -2461,23 +2444,23 @@ policy?
 The default create policy is \f[C]epmfs\f[R].
 That is a path preserving algorithm.
 With such a policy for \f[C]mkdir\f[R] and \f[C]create\f[R] with a set
-of empty drives it will select only 1 drive when the first directory is
-created.
+of empty filesystems it will select only 1 filesystem when the first
+directory is created.
 Anything, files or directories, created in that first directory will be
 placed on the same branch because it is preserving paths.
 .PP
 This catches a lot of new users off guard but changing the default would
 break the setup for many existing users.
 If you do not care about path preservation and wish your files to be
-spread across all your drives change to \f[C]mfs\f[R] or similar policy
-as described above.
+spread across all your filesystems change to \f[C]mfs\f[R] or similar
+policy as described above.
 If you do want path preservation you\[cq]ll need to perform the manual
-act of creating paths on the drives you want the data to land on before
-transferring your data.
+act of creating paths on the filesystems you want the data to land on
+before transferring your data.
 Setting \f[C]func.mkdir=epall\f[R] can simplify managing path
 preservation for \f[C]create\f[R].
 Or use \f[C]func.mkdir=rand\f[R] if you\[cq]re interested in just
-grouping together directory content by drive.
+grouping together directory content by filesystem.
 .SS Do hardlinks work?
 .PP
 Yes.
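A hedged example of the `epall` suggestion above (branch paths assumed):

```bash
# Preserve paths for create, but have mkdir clone new directories
# onto all branches so epmfs always has a path to work with.
mergerfs -o category.create=epmfs,func.mkdir=epall /mnt/disk1:/mnt/disk2 /mnt/pool
```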
@@ -2546,8 +2529,8 @@ of the caller.
 This means that if the user does not have access to a file or directory
 than neither will mergerfs.
 However, because mergerfs is creating a union of paths it may be able to
-read some files and directories on one drive but not another resulting
-in an incomplete set.
+read some files and directories on one filesystem but not another
+resulting in an incomplete set.
 .PP
 Whenever you run into a split permission issue (seeing some but not all
 files) try using
@@ -2644,7 +2627,7 @@ features which aufs and overlayfs have.
 .PP
 UnionFS is more like aufs than mergerfs in that it offers overlay / CoW
 features.
-If you\[cq]re just looking to create a union of drives and want
+If you\[cq]re just looking to create a union of filesystems and want
 flexibility in file/directory placement then mergerfs offers that
 whereas unionfs is more for overlaying RW filesystems over RO ones.
 .SS Why use mergerfs over overlayfs?
@@ -2664,8 +2647,9 @@ without the single point of failure.
 .SS Why use mergerfs over ZFS?
 .PP
 MergerFS is not intended to be a replacement for ZFS.
-MergerFS is intended to provide flexible pooling of arbitrary drives
-(local or remote), of arbitrary sizes, and arbitrary filesystems.
+MergerFS is intended to provide flexible pooling of arbitrary
+filesystems (local or remote), of arbitrary sizes, and arbitrary
+filesystems.
 For \f[C]write once, read many\f[R] usecases such as bulk media storage.
 Where data integrity and backup is managed in other ways.
 In that situation ZFS can introduce a number of costs and limitations as
@@ -2683,6 +2667,29 @@ open source is important.
 .PP
 There are a number of UnRAID users who use mergerfs as well though
 I\[cq]m not entirely familiar with the use case.
+.SS Why use mergerfs over StableBit\[cq]s DrivePool?
+.PP
+DrivePool works only on Windows so not as common an alternative as other
+Linux solutions.
+If you want to use Windows then DrivePool is a good option.
+Functionally the two projects work a bit differently.
+DrivePool always writes to the filesystem with the most free space and
+later rebalances.
+mergerfs does not offer rebalance but chooses a branch at file/directory
+create time.
+DrivePool\[cq]s rebalancing can be done differently in any directory and
+has file pattern matching to further customize the behavior.
+mergerfs, not having rebalancing does not have these features, but
+similar features are planned for mergerfs v3.
+DrivePool has builtin file duplication which mergerfs does not natively
+support (but can be done via an external script.)
+.PP
+There are a lot of misc differences between the two projects but most
+features in DrivePool can be replicated with external tools in
+combination with mergerfs.
+.PP
+Additionally DrivePool is a closed source commercial product vs mergerfs
+a ISC licensed OSS project.
 .SS What should mergerfs NOT be used for?
 .IP \[bu] 2
 databases: Even if the database stored data in separate files (mergerfs
@@ -2698,7 +2705,7 @@ much latency (if it works at all).
 As replacement for RAID: mergerfs is just for pooling branches.
 If you need that kind of device performance aggregation or high
 availability you should stick with RAID.
-.SS Can drives be written to directly? Outside of mergerfs while pooled?
+.SS Can filesystems be written to directly? Outside of mergerfs while pooled?
 .PP
 Yes, however it\[cq]s not recommended to use the same file from within
 the pool and from without at the same time (particularly writing).
@@ -2729,7 +2736,7 @@ those settings.
 Only one error can be returned and if one of the reasons for filtering a
 branch was \f[B]minfreespace\f[R] then it will be returned as such.
 \f[B]moveonenospc\f[R] is only relevant to writing a file which is too
-large for the drive its currently on.
+large for the filesystem it\[cq]s currently on.
 .PP
 It is also possible that the filesystem selected has run out of inodes.
 Use \f[C]df -i\f[R] to list the total and available inodes per
@@ -2824,7 +2831,7 @@ Taking after \f[B]Samba\f[R], mergerfs uses
 \f[B]syscall(SYS_setreuid,\&...)\f[R] to set the callers credentials for
 that thread only.
 Jumping back to \f[B]root\f[R] as necessary should escalated privileges
-be needed (for instance: to clone paths between drives).
+be needed (for instance: to clone paths between filesystems).
 .PP
 For non-Linux systems mergerfs uses a read-write lock and changes
 credentials only when necessary.