Do you know what HDFS DFS commands are? If not, this article is for readers like you. In this concise guide, we will briefly sum up the definitions, concepts, and purposes of typical HDFS DFS commands, along with several examples!
What Are HDFS DFS Commands?
HDFS (the Hadoop Distributed File System) is a major, primary component of the Hadoop ecosystem. It takes charge of storing large data sets, whether structured or unstructured, across numerous nodes, and it maintains its metadata in log files. HDFS DFS commands are the shell commands you use to work with the files stored there.
So how can you use HDFS commands?
The first step is to kick-start the Hadoop services via this command: sbin/start-all.sh. Afterward, the command "jps" helps you inspect whether the Hadoop daemons are running.
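For a rough illustration, a successful start on a single-node setup looks something like this when checked with jps (the process IDs are illustrative, and the exact daemon list varies by Hadoop version and configuration):
$ sbin/start-all.sh
$ jps
4821 NameNode
4963 DataNode
5210 SecondaryNameNode
5402 ResourceManager
5544 NodeManager
5630 Jps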
Popular HDFS DFS Commands
1. ls
The ls command is no different from the UNIX ls command: it helps programmers list the files and directories available under a specific directory in an HDFS system. The related "-lsr" command lists files and directories recursively under a specific folder.
Syntax:
$ hadoop fs -ls [-d] [-h] [-R] <path>
Here are the options for this syntax:
- -d: Lists directories as plain files.
- -h: Formats file sizes in a human-readable form instead of raw bytes.
- -R: Lists the contents of directories recursively.
Example:
$ hadoop fs -ls /
$ hadoop fs -lsr /
The command matches the specified file patterns, and directory entries are displayed in the format shown below:
permissions - userId groupId sizeOfDirectory(in bytes) modificationDate(yyyy-MM-dd HH:mm) directoryName
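For instance, a listing might look roughly like this (the paths, owners, and sizes are purely illustrative; the second column shows the replication factor for files and a dash for directories):
$ hadoop fs -ls /user/data
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2023-04-01 10:15 /user/data/logs
-rw-r--r--   3 hadoop supergroup       1024 2023-04-01 10:20 /user/data/sample.txt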
2. setrep
The setrep command helps programmers change a file's replication factor to a specific count (rather than the default replication factor that applies to the rest of the HDFS system). Given a directory, the command recursively changes the replication factor of all files residing in that directory tree.
Syntax:
$ hadoop fs -setrep [-R] [-w] <numReplicas> <path>
Here are the syntax options:
- -w: Requests that the command wait until the replication has completed, which can potentially take a long time.
- -R: Accepted for backward compatibility; it has no effect.
Example:
$ hadoop fs -setrep -R 3 /user/hadoop/
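To confirm the change afterward, one option is the stat command with its %r format specifier, which prints a file's replication factor (the file path here is just a placeholder):
$ hadoop fs -stat %r /user/hadoop/sample.txt
3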
3. test
As the name suggests, the test command helps users check an HDFS path: whether it exists, whether it is a file or a directory, and whether it has zero length.
Some options available in the syntax include:
- -d: Checks whether the path is a directory. If yes, it sends back '0'.
- -e: Checks whether the path exists. If yes, it sends back '0'.
- -f: Checks whether the path is a file. If yes, it sends back '0'.
- -s: Checks whether the size of the file is bigger than zero bytes. If yes, it sends back '0'.
- -z: Checks whether the size of the file is 0 bytes. If yes, it sends back '0'. Any other result sends back '1'.
Example:
$ hadoop fs -test -[defsz] /user/test/test.txt
Example 2:
hdfs dfs -test -e sample
hdfs dfs -test -z sample
hdfs dfs -test -d sample
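Because test reports its result through the exit code rather than printed output, it is handy in shell scripts. A minimal sketch, assuming /user/test/test.txt stands in for your own path:
$ hdfs dfs -test -e /user/test/test.txt
$ echo $?
0
Or, equivalently, inside a script:
if hdfs dfs -test -e /user/test/test.txt; then
  echo "file exists"
fi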
4. put
This command helps users copy one or more files from the local filesystem to an HDFS system, quite similar to the "copyFromLocal" command.
Keep in mind that the command will not work if the destination file already exists. The only exception is when the -f flag is given, which overwrites the destination even if the file existed before your copy.
Syntax:
$ hadoop fs -put [-f] [-p] <localsrc> ... <dst>
in which -p is the flag that preserves access and modification times, ownership, and permissions.
Examples:
$ hadoop fs -put sample.txt /user/data/
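If sample.txt already exists at the destination, the copy above fails; adding the -f flag overwrites it instead:
$ hadoop fs -put -f sample.txt /user/data/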
5. appendToFile
This command appends your content to a file present on the HDFS system. It may append multiple sources (or only one source) from the local system to a destination file. In other words, it concatenates the contents of all the given source files onto the designated destination file in your HDFS file system.
Syntax:
$ hadoop fs -appendToFile <localsrc> ... /hdfs-file-path
or
$ hdfs dfs -appendToFile <localsrc> ... /hdfs-file-path
Example:
hdfs dfs -appendToFile localfile /user/hadoop/hadoopfile
hdfs dfs -appendToFile localfile1 localfile2 /user/hadoop/hadoopfile
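Per the official documentation, the command can also read from standard input when you pass - as the source:
hdfs dfs -appendToFile - /user/hadoop/hadoopfile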
The Difference Between HDFS DFS and Hadoop FS
The two might seem similar at first. At a closer look, however, you can identify certain differences, as stated in the official Apache documentation. We will summarize some key points here.
In essence, an "HDFS DFS" command targets data operations on HDFS (the Hadoop distributed filesystem) specifically. Meanwhile, a "Hadoop FS" command is more general: it can address data on any filesystem Hadoop supports, which covers the local filesystem as well.
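A small sketch of the distinction, using placeholder paths (the file:/// URI scheme addresses the local filesystem):
$ hdfs dfs -ls /user/data        # operates on HDFS
$ hadoop fs -ls file:///tmp      # the same tool can also address the local filesystem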
For further understanding, do not hesitate to look through some official Hadoop tutorials to grasp these fundamental entry-level concepts. Once you no longer have doubts about them, insights into data analytics in real-world scenarios will be clearer to you.
What Are Their Parameters?
1. Configured Capacity
What is Configured Capacity? It is the total capacity available to HDFS for storing data. It is calculated with the formula:
Configured Capacity = Total Disk Space – Reserved Space
Reserved space refers to space set aside for specific OS-level operations. You can configure it through the dfs.datanode.du.reserved property, which you may add or update in hdfs-site.xml. Configured Capacity has no relevance to the replication factor.
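As a sketch, reserving 10 GB per volume might look like this in hdfs-site.xml (the property takes a value in bytes; 10 GB here is only an example):
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>
</property>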
2. Present Capacity
Present Capacity refers to the total amount of storage space actually available for storing files, once the space consumed by open blocks and metadata (non-DFS used space) has been set aside.
Thus, the difference between Configured Capacity and Present Capacity is the space used to store your system files, metadata, and other non-DFS data.
When the DataNodes transmit their heartbeat reports to your NameNode, they include a Present Capacity parameter; the NameNode aggregates and tracks this value across the DataNodes. It is then displayed when you run the hdfs dfsadmin -report command.
Thus, your Present Capacity can differ from one moment to the next, depending on how the non-HDFS portions of your disks are used. Configured Capacity, on the other hand, stays unchanged until you add or remove disks/volumes from HDFS.
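Running the report looks like this; the figures below are purely illustrative, and the exact set of fields printed varies by Hadoop version:
$ hdfs dfsadmin -report
Configured Capacity: 107374182400 (100 GB)
Present Capacity: 96636764160 (90 GB)
DFS Remaining: 85899345920 (80 GB)
DFS Used: 10737418240 (10 GB)
DFS Used%: 11.11%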
Conclusion
Our article has delivered a brief yet info-rich guide on everything you need to know about HDFS DFS commands. We hope that what you have learned here proves useful in your future programs, especially those written in Python. For more tips on fixing common Python command errors (such as "can't find Python executable "python", you can set the PYTHON env variable"), feel free to browse through our website.