This includes an HDFS docker image to use with the integration tests. Co-authored-by: Ivan Andreev <ivandeex@gmail.com> Co-authored-by: Nick Craig-Wood <nick@craig-wood.com>
4.8 KiB
title | description |
---|---|
HDFS Remote | Remote for Hadoop Distributed Filesystem |
{{< icon "fa fa-globe" >}} HDFS
HDFS is a
distributed file-system, part of the Apache Hadoop framework.
Paths are specified as remote:
or remote:path/to/dir
.
Here is an example of how to make a remote called remote
. First run:
rclone config
This will guide you through an interactive setup process:
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> remote
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
[skip]
XX / Hadoop distributed file system
\ "hdfs"
[skip]
Storage> hdfs
** See help for hdfs backend at: https://rclone.org/hdfs/ **
hadoop name node and port
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
1 / Connect to host namenode at port 8020
\ "namenode:8020"
namenode> namenode.hadoop:8020
hadoop user name
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
1 / Connect to hdfs as root
\ "root"
username> root
Edit advanced config? (y/n)
y) Yes
n) No (default)
y/n> n
Remote config
--------------------
[remote]
type = hdfs
namenode = namenode.hadoop:8020
username = root
--------------------
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:
Name Type
==== ====
hadoop hdfs
e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q
This remote is called remote
and can now be used like this
See all the top level directories
rclone lsd remote:
List the contents of a directory
rclone ls remote:directory
Sync the remote directory
to /home/local/directory
, deleting any excess files.
rclone sync -i remote:directory /home/local/directory
Setting up your own HDFS instance for testing
You may start with a manual setup
or use the docker image from the tests:
If you want to build the docker image
git clone https://github.com/rclone/rclone.git
cd rclone/fstest/testserver/images/test-hdfs
docker build --rm -t rclone/test-hdfs .
Or you can just use the latest one pushed
docker run --rm --name "rclone-hdfs" -p 127.0.0.1:9866:9866 -p 127.0.0.1:8020:8020 --hostname "rclone-hdfs" rclone/test-hdfs
NB it need few seconds to startup.
For this docker image the remote needs to be configured like this:
[remote]
type = hdfs
namenode = 127.0.0.1:8020
username = root
You can stop this image with docker kill rclone-hdfs
(NB it does not use volumes, so all data
uploaded will be lost.)
Modified time
Time accurate to 1 second is stored.
Checksum
No checksums are implemented.
Usage information
You can use the rclone about remote:
command which will display filesystem size and current usage.
Restricted filename characters
In addition to the default restricted characters set
the following characters are also replaced:
Character | Value | Replacement |
---|---|---|
: | 0x3A | : |
Invalid UTF-8 bytes will also be replaced.
Limitations
- No server-side
Move
orDirMove
. - Checksums not implemented.
{{< rem autogenerated options start" - DO NOT EDIT - instead edit fs.RegInfo in backend/hdfs/hdfs.go then run make backenddocs" >}}
Standard Options
Here are the standard options specific to hdfs (Hadoop distributed file system).
--hdfs-namenode
hadoop name node and port
- Config: namenode
- Env Var: RCLONE_HDFS_NAMENODE
- Type: string
- Default: ""
- Examples:
- "namenode:8020"
- Connect to host namenode at port 8020
- "namenode:8020"
--hdfs-username
hadoop user name
- Config: username
- Env Var: RCLONE_HDFS_USERNAME
- Type: string
- Default: ""
- Examples:
- "root"
- Connect to hdfs as root
- "root"
Advanced Options
Here are the advanced options specific to hdfs (Hadoop distributed file system).
--hdfs-encoding
This sets the encoding for the backend.
See: the encoding section in the overview for more info.
- Config: encoding
- Env Var: RCLONE_HDFS_ENCODING
- Type: MultiEncoder
- Default: Slash,Colon,Del,Ctl,InvalidUtf8,Dot
{{< rem autogenerated options stop >}}