Cloudera Hadoop Interview Questions

Here is a list of Hadoop interview questions that were recently asked at Wipro. The questions suit both freshers and experienced professionals, and our Hadoop training team has answered all of them below.

1. What is rack awareness? And why is it necessary?

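Rack awareness in Hadoop is the concept of using rack information to choose DataNodes. The NameNode knows which rack each DataNode belongs to and prefers closer DataNodes, which reduces network traffic when reading and writing HDFS files in large Hadoop clusters. The main purpose of rack awareness is to prevent data loss if an entire rack fails, because replicas of each block are stored on more than one rack. It also makes better use of network bandwidth, since traffic between nodes in the same rack does not have to cross rack switches.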
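The placement idea can be sketched as a simplified model of HDFS's default policy for three replicas: the first replica goes on the writer's node, the second on a node in a different rack, and the third on a different node in the same rack as the second. The topology dictionary and the function name place_replicas are hypothetical, and the real NameNode also weighs node load and free space, which this sketch ignores.

```python
import random

# Hypothetical cluster topology: DataNode hostname -> rack ID.
TOPOLOGY = {
    "dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack1",
    "dn4": "/rack2", "dn5": "/rack2", "dn6": "/rack2",
}

def place_replicas(writer, topology, rng=random):
    """Pick 3 DataNodes, roughly following the default HDFS placement:
    1st on the writer's node, 2nd on a different rack,
    3rd on the same rack as the 2nd but on a different node."""
    first = writer
    # Candidates for the 2nd replica: any node outside the writer's rack.
    remote_nodes = [n for n, r in topology.items() if r != topology[first]]
    second = rng.choice(remote_nodes)
    # Candidates for the 3rd replica: other nodes in the 2nd replica's rack.
    same_rack_as_second = [
        n for n, r in topology.items()
        if r == topology[second] and n != second
    ]
    third = rng.choice(same_rack_as_second)
    return [first, second, third]

if __name__ == "__main__":
    replicas = place_replicas("dn1", TOPOLOGY)
    racks = {TOPOLOGY[n] for n in replicas}
    print(replicas, "spans", len(racks), "racks")
```

Because the replicas always span two racks, losing either rack in this model still leaves at least one copy of the block, while two of the three copies share a rack to limit cross-rack write traffic.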

2. What is the default block size and how is it defined?

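In HDFS, data is stored in units called blocks: when a file is stored, it is divided into blocks of a fixed size, which are distributed across DataNodes. In Hadoop 2.x and later the default block size is 128 MB (it was 64 MB in Hadoop 1.x). It is defined by the dfs.blocksize property in hdfs-site.xml and can also be overridden for individual files.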
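The arithmetic of splitting a file into blocks can be shown with a short sketch. The helper name split_into_blocks and the 500 MB example file are hypothetical; the 128 MB value is the default block size stated above.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default block size in Hadoop 2.x and later: 128 MB

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the HDFS blocks a file occupies."""
    if file_size == 0:
        return []  # an empty file has no blocks
    full, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes

MB = 1024 * 1024
blocks = split_into_blocks(500 * MB)
print(len(blocks), [b // MB for b in blocks])  # 4 [128, 128, 128, 116]
```

Note that the last block is only as large as the data it holds (116 MB here); it does not take up a full 128 MB on disk.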

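3. How do you get a report of the HDFS file system, including disk availability and the number of active nodes?

Run the report command as the HDFS user: su - hdfs -c "hdfs dfsadmin -report". It shows the configured capacity, DFS used and remaining space, and the number of live and dead DataNodes.
Run the fsck command on the NameNode as $HDFS_USER to check file system health and block locations: su - hdfs -c "hdfs fsck / -files -blocks -locations > dfs-new-fsck-1.log".
Compare the namespace and report output taken before and after a change such as an upgrade.
Verify that reads and writes to HDFS work successfully.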
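The output of hdfs dfsadmin -report can be summarized programmatically. This is a minimal sketch that assumes the report contains lines of the form "Configured Capacity: <bytes> (...)", "DFS Remaining: <bytes> (...)", "Live datanodes (N):" and "Dead datanodes (N):"; the exact layout varies between Hadoop versions, and SAMPLE_REPORT below is an abbreviated, hypothetical sample rather than real cluster output.

```python
import re

# Abbreviated, hypothetical sample of "hdfs dfsadmin -report" output.
# Real output differs between Hadoop versions; adjust the patterns if needed.
SAMPLE_REPORT = """\
Configured Capacity: 3221225472000 (2.93 TB)
DFS Remaining: 2147483648000 (1.95 TB)
-------------------------------------------------
Live datanodes (3):

Dead datanodes (1):
"""

def summarize_report(text):
    """Extract cluster-wide capacity figures and DataNode counts."""
    def first_int(pattern):
        # The cluster summary precedes the per-DataNode sections,
        # so the first match is taken as the cluster-wide value.
        match = re.search(pattern, text, re.MULTILINE)
        return int(match.group(1)) if match else None

    return {
        "configured_bytes": first_int(r"^Configured Capacity: (\d+)"),
        "remaining_bytes": first_int(r"^DFS Remaining: (\d+)"),
        "live_datanodes": first_int(r"^Live datanodes \((\d+)\)"),
        "dead_datanodes": first_int(r"^Dead datanodes \((\d+)\)"),
    }

print(summarize_report(SAMPLE_REPORT))
```

On a real cluster the text could come from subprocess.run(["hdfs", "dfsadmin", "-report"], capture_output=True, text=True).stdout, run as a user allowed to query the NameNode.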

4. What is Hadoop balancer and why is it necessary?

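The HDFS balancer redistributes data across the DataNodes by moving blocks from over-utilized to under-utilized nodes. It is necessary because data becomes unevenly distributed over time, for example after new DataNodes are added to the cluster, since existing blocks are not moved to the new nodes automatically. As the system administrator, you can run the balancer from the command line whenever needed.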
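The balancer treats the cluster as balanced when every DataNode's utilization is within a threshold (10 percentage points by default, set with hdfs balancer -threshold) of the cluster-wide average. The sketch below models only that classification step, not the block moves themselves; the function name classify_datanodes and the node sizes are hypothetical.

```python
def classify_datanodes(used, capacity, threshold=10.0):
    """Classify DataNodes relative to the cluster-wide average utilization.

    used, capacity: dicts mapping DataNode name -> bytes.
    threshold: allowed deviation from the average, in percentage points.
    """
    avg = 100.0 * sum(used.values()) / sum(capacity.values())
    over, under, balanced = [], [], []
    for node in capacity:
        util = 100.0 * used[node] / capacity[node]
        if util > avg + threshold:
            over.append(node)       # candidate source for block moves
        elif util < avg - threshold:
            under.append(node)      # candidate target for block moves
        else:
            balanced.append(node)   # already within the threshold
    return avg, over, under, balanced

# Hypothetical cluster in which dn4 was just added and is still empty.
capacity = {"dn1": 1000, "dn2": 1000, "dn3": 1000, "dn4": 1000}
used = {"dn1": 800, "dn2": 750, "dn3": 700, "dn4": 0}
avg, over, under, balanced = classify_datanodes(used, capacity)
print(f"average {avg:.2f}%", over, under, balanced)
```

Here the average is 56.25%, so dn1 to dn3 are over-utilized and the new dn4 is under-utilized, which is exactly the situation in which running the balancer makes sense.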

5. What is the difference between Cloudera and Ambari?

Cloudera Manager is a more mature management suite than Ambari. It offers advanced cluster management features and, although the underlying Hadoop distribution (CDH) is open source, Cloudera Manager itself is a vendor-specific management suite that speeds up installation and deployment. Apache Ambari, by contrast, is open source and lets enterprises plan, install, and securely configure HDP, making ongoing cluster maintenance and management easier regardless of cluster size.


6. What are the main actions performed by the Hadoop admin?

The typical responsibilities of a Hadoop admin include deploying and maintaining a Hadoop cluster, adding and removing nodes, monitoring the cluster with tools such as Ganglia, Nagios, or Cloudera Manager, configuring NameNode high availability, and keeping track of all running Hadoop jobs.

7. What is Kerberos?

Kerberos was designed to provide secure authentication to services over an insecure network. It uses tickets to authenticate users and never sends passwords across the network. In a Kerberized Hadoop cluster, users typically obtain a ticket with kinit before running HDFS or YARN commands.
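The ticket idea can be illustrated with a deliberately simplified toy model. It is not the Kerberos protocol: real Kerberos uses a Key Distribution Center, encrypted tickets, and session keys, whereas this sketch only signs a ticket with an HMAC. SERVICE_KEY, issue_ticket, and verify_ticket are hypothetical names. What it shows is the core property: the service trusts a ticket sealed with a key it shares with the ticket issuer, so it never needs to see the user's password.

```python
import hashlib
import hmac
import time

# Toy model only: real Kerberos encrypts tickets and uses a KDC and session keys.
SERVICE_KEY = b"hypothetical-namenode-secret"  # known only to the issuer and service

def issue_ticket(principal, lifetime_s, key=SERVICE_KEY, now=None):
    """Issuer side: seal the user's identity and expiry time with the service key."""
    expires = int((now if now is not None else time.time()) + lifetime_s)
    payload = f"{principal}|{expires}".encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag

def verify_ticket(ticket, key=SERVICE_KEY, now=None):
    """Service side: return the principal if the seal is intact and unexpired."""
    payload, tag = ticket
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return None  # tampered ticket or wrong key
    principal, expires = payload.decode().rsplit("|", 1)
    current = now if now is not None else time.time()
    return principal if current < int(expires) else None

ticket = issue_ticket("alice@EXAMPLE.COM", 3600)
print(verify_ticket(ticket))
```

A ticket that has been altered, sealed with another key, or used after its expiry is rejected, which mirrors why stolen or forged Kerberos tickets are of limited use.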

8. What are the most important HDFS commands?

The most important HDFS commands (each run as hdfs dfs -<command>) are:

ls: Lists files and directories.
mkdir: Creates a directory.
touchz: Creates an empty file.
copyFromLocal (or put): Copies files or folders from the local file system to HDFS.
cat: Prints the contents of a file.
copyToLocal (or get): Copies files or folders from HDFS to the local file system.
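These commands can also be scripted. Below is a minimal sketch of a Python wrapper that builds the argument list for each operation in the list above and runs it with subprocess; it assumes the hdfs client is on the PATH, and the helper names hdfs_dfs_command and run_hdfs as well as the example paths are hypothetical.

```python
import subprocess

# The operations from the list above; each becomes a flag of "hdfs dfs".
SUPPORTED = {"ls", "mkdir", "touchz", "copyFromLocal", "put",
             "cat", "copyToLocal", "get"}

def hdfs_dfs_command(operation, *args):
    """Build the argv list for an 'hdfs dfs -<operation> ...' call."""
    if operation not in SUPPORTED:
        raise ValueError(f"unsupported operation: {operation}")
    return ["hdfs", "dfs", f"-{operation}", *args]

def run_hdfs(operation, *args):
    """Run the command and return its stdout (needs a Hadoop client on PATH)."""
    result = subprocess.run(hdfs_dfs_command(operation, *args),
                            capture_output=True, text=True, check=True)
    return result.stdout

print(hdfs_dfs_command("put", "report.csv", "/user/alice/"))
```

For example, hdfs_dfs_command("mkdir", "-p", "/user/alice/data") builds the equivalent of hdfs dfs -mkdir -p /user/alice/data; passing the arguments as a list avoids shell quoting problems with unusual file names.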

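9. How do you check the logs of a Hadoop job submitted to the cluster, and how do you terminate a process that is already running?

To find the log directories in the cluster management UI, click HDFS > Configs and type log in the filter box. From the command line, you can locate the log directory of a service such as Apache Oozie with the Unix grep command. Log directories contain three types of files, and the logs of running daemons are available there. For an application running on YARN, yarn logs -applicationId <application_id> prints its aggregated logs, and yarn application -kill <application_id> terminates it. You can also configure the maximum number of times a particular map or reduce task can fail before the entire job fails, through the following properties:

mapred.map.max.attempts (mapreduce.map.maxattempts in Hadoop 2 and later): The maximum number of attempts per map task (default 4).
mapred.reduce.max.attempts (mapreduce.reduce.maxattempts in Hadoop 2 and later): Same as above, but for reduce tasks.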
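The effect of the max-attempts setting can be modeled with a small simulation: a task is retried until it succeeds or uses up its attempts, and the job fails as soon as one task exhausts them. This is a simplified sketch; run_task and job_succeeds are hypothetical names, the per-attempt outcomes are supplied by hand, and it ignores features such as speculative execution.

```python
def run_task(attempt_outcomes, max_attempts=4):
    """Return (succeeded, attempts_used) for one map or reduce task.

    attempt_outcomes: sequence of booleans, True if that attempt succeeds.
    max_attempts: plays the role of mapreduce.map.maxattempts or
    mapreduce.reduce.maxattempts (both default to 4).
    """
    attempts = 0
    for ok in attempt_outcomes:
        attempts += 1
        if ok:
            return True, attempts
        if attempts >= max_attempts:
            break  # no retries left for this task
    return False, attempts

def job_succeeds(tasks, max_attempts=4):
    """The job fails if any single task exhausts its attempts."""
    return all(run_task(outcomes, max_attempts)[0] for outcomes in tasks)

# Hypothetical job: the second task fails four times, then would succeed.
tasks = [[True], [False, False, False, False, True]]
print(job_succeeds(tasks), job_succeeds(tasks, max_attempts=5))
```

With the default of 4 attempts the job above fails, while raising the limit to 5 lets the flaky task succeed on its fifth try; a higher limit tolerates transient failures but delays the detection of tasks that will never succeed.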

