Standard Chartered – Hadoop Interview Questions
Here is a list of Hadoop interview questions that have recently been asked at Standard Chartered. These questions are suitable for both freshers and experienced professionals.
1. Explain Hadoop streaming?
Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run the Map/Reduce jobs with any executable or script as the mapper and/or the reducer.
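As a minimal sketch of how a streaming job is launched, assuming hypothetical `mapper.py` and `reducer.py` scripts and illustrative input/output paths (the exact streaming jar location varies by distribution):

```shell
# Run a streaming job with Python scripts as mapper and reducer.
# Jar path, script names, and HDFS paths are illustrative assumptions.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -input /user/demo/input \
  -output /user/demo/output \
  -mapper mapper.py \
  -reducer reducer.py \
  -file mapper.py \
  -file reducer.py
```

The `-file` options ship the local scripts to the cluster so each task can execute them.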
2. What is HDFS- Hadoop Distributed File System?
HDFS is a distributed file system that handles large data sets running on commodity hardware. It is used to scale a single Apache Hadoop cluster to hundreds and even thousands of nodes. HDFS is one of the major components of Apache Hadoop, the others being MapReduce and YARN.
3. What does hadoop-metrics.properties file do?
hadoop-metrics.properties is used for performance reporting purposes. It controls metrics reporting for Hadoop. The API is abstract so that it can be implemented on top of a variety of metrics client libraries.
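As a hedged sketch, entries in this file bind a metrics context to an output plugin; the class and period below follow the classic metrics v1 style, while the log path is an illustrative assumption:

```
# Illustrative hadoop-metrics.properties entries: write DFS metrics
# to a local file every 10 seconds. The file path is an assumption.
dfs.class=org.apache.hadoop.metrics.file.FileContext
dfs.period=10
dfs.fileName=/tmp/dfsmetrics.log
```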
4. How does Hadoop’s CLASSPATH play a vital role in starting or stopping Hadoop daemons?
The CLASSPATH contains the list of directories holding the JAR files required to start or stop the daemons.
5. What are the different commands used to startup and shutdown Hadoop daemons?
start-dfs.sh - Starts the Hadoop DFS daemons, the namenode and datanodes.
stop-dfs.sh - Stops the Hadoop DFS daemons.
start-mapred.sh - Starts the Hadoop Map/Reduce daemons, the jobtracker and tasktrackers.
stop-mapred.sh - Stops the Hadoop Map/Reduce daemons.
start-all.sh - Starts all the Hadoop daemons: the namenode, datanodes, jobtracker and tasktrackers.
stop-all.sh - Stops all the Hadoop daemons.
6. What is configured in /etc/hosts and what is its role in setting up a Hadoop cluster?
In a Hadoop cluster, we store all the hostnames along with their IP addresses in /etc/hosts so that we can easily use hostnames instead of IP addresses.
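As an illustration, a small cluster's /etc/hosts might look like the following; the IP addresses and hostnames here are assumptions, not from the source:

```
# Illustrative /etc/hosts entries for a three-node cluster
192.168.1.10   master
192.168.1.11   slave1
192.168.1.12   slave2
```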
7. How is the splitting of file invoked in Hadoop framework?
An input file for processing is stored on HDFS. The InputFormat component of the MapReduce framework divides this file into splits, and by default one map task is launched per split.
8. Is it possible to provide multiple inputs to Hadoop? If yes, then how?
Yes. If multiple input files such as data1, data2, etc. are present in the same directory, say /folder1, the directory itself can be passed as the input path and every file in it will be processed. By default, Hadoop does not read a directory recursively; the property mapreduce.input.fileinputformat.input.dir.recursive must be set to true for files in subdirectories to be read. Multiple separate paths can also be supplied to FileInputFormat as a comma-separated list.
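A hedged sketch of both approaches, assuming a hypothetical `wordcount.jar` whose driver passes its first argument to FileInputFormat (which accepts comma-separated paths):

```shell
# Pass several input paths as a comma-separated list
# (jar name, class, and paths are illustrative assumptions).
hadoop jar wordcount.jar WordCount /folder1/data1,/folder1/data2 /out

# Or pass the directory itself so every file in it becomes input.
hadoop jar wordcount.jar WordCount /folder1 /out
```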
9. Is it possible to have hadoop job output in multiple directories? If yes, how?
Yes, it is possible to have the output of a Hadoop MapReduce job written to multiple directories. In Hadoop MapReduce, the output of the Reducer is the final output of a job and is written to HDFS; the MultipleOutputs class allows a reducer to write records to several named outputs, and hence to multiple directories.
10. Explain NameNode and DataNode in HDFS?
The NameNode keeps an image of the entire file system namespace and file Blockmap in memory. The DataNode stores HDFS data in files in its local file system. The DataNode has no knowledge of HDFS files; it stores each block of HDFS data in a separate file in its local file system.
11. Why is block size set to 128 MB in Hadoop HDFS?
The default size of a block in HDFS is 128 MB, which is much larger than on a Linux file system, where the block size is typically 4 KB. The reason for having such a large block size is to minimize the cost of seeks and to reduce the metadata generated per block, since the NameNode keeps metadata for every block in memory.
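A quick worked example of the metadata saving, using a hypothetical 1 GiB file (the file size here is an assumption for illustration):

```shell
# Block counts for a 1 GiB file under two block sizes.
file_kb=$((1024 * 1024))           # 1 GiB expressed in KB
echo $((file_kb / (128 * 1024)))   # 128 MB HDFS blocks -> 8 block entries
echo $((file_kb / 4))              # 4 KB blocks -> 262144 block entries
```

So the NameNode tracks 8 block entries instead of 262,144 for the same file.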
12. How data or file is written into HDFS?
To write a file to HDFS, a client first interacts with the master, i.e. the NameNode. The NameNode provides the addresses of the DataNodes on which the client will write the data. The client then writes data directly to the DataNodes, and the DataNodes form a write pipeline to replicate each block.
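From the command line, this write path is exercised with the standard `hdfs dfs` commands; the file and directory names below are illustrative assumptions:

```shell
# Copy a local file into HDFS (client contacts the NameNode,
# then streams blocks to the DataNode pipeline).
hdfs dfs -mkdir -p /user/demo
hdfs dfs -put localfile.txt /user/demo/
```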
13. How data or file is read in HDFS?
To read a file from HDFS, a client first interacts with the NameNode, since the NameNode is the centerpiece of the Hadoop cluster and stores all the metadata, i.e. data about the data. The client then interacts directly with the respective DataNodes to read the data blocks.
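The read path can likewise be exercised from the shell; the paths here are illustrative assumptions:

```shell
# Stream an HDFS file to stdout, or copy it back to local disk.
hdfs dfs -cat /user/demo/localfile.txt
hdfs dfs -get /user/demo/localfile.txt copy.txt
```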
14. How is indexing done in HDFS?
In a distributed file system like HDFS, indexing is different from that of a local file system. Here, indexing and searching of data are done using the memory of the HDFS node where the data resides. The generated index files are stored in a folder in the directory where the actual data resides.
15. What is a Heartbeat in HDFS?
A Heartbeat is a signal sent from a DataNode to the NameNode to indicate that it is alive. In HDFS, the absence of a heartbeat indicates a problem: if the NameNode stops receiving heartbeats from a DataNode, it marks that DataNode as dead and schedules its blocks for re-replication on other nodes.
16. Explain Hadoop Archives?
Hadoop Archives, or HAR, is an archiving facility that packs files into HDFS blocks efficiently, and hence HAR can be used to tackle the small-files problem in Hadoop.
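A hedged sketch of creating and inspecting an archive with the `hadoop archive` tool; the archive name and paths are illustrative assumptions:

```shell
# Pack the files under /user/demo/small-files into one archive.
hadoop archive -archiveName files.har -p /user/demo/small-files /user/demo/archives

# List the archive contents through the har:// filesystem scheme.
hdfs dfs -ls har:///user/demo/archives/files.har
```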