FIS – Hadoop Interview Questions
Here is a list of Hadoop interview questions recently asked at FIS. These questions are suitable for both freshers and experienced professionals.
1. What is Apache Hadoop?
Apache Hadoop is an open-source framework used to efficiently store and process large datasets ranging in size from gigabytes to petabytes. Instead of using one large computer to store and process the data, Hadoop allows clustering multiple computers to analyze massive datasets in parallel, more quickly.
2. Why do we need Hadoop?
The primary function of Hadoop is to facilitate fast analytics on huge sets of unstructured data. You can add new storage capacity simply by adding server nodes to your Hadoop cluster. In theory, a Hadoop cluster can be expanded almost indefinitely as needed, using low-cost commodity server and storage hardware.
3. What are the core components of Hadoop?
The four major elements of Hadoop are HDFS, MapReduce, YARN, and Hadoop Common. Most other tools and solutions supplement or support these major elements.
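The MapReduce model at the heart of Hadoop can be illustrated with a toy word count in plain Python. This is a local simulation of the map/shuffle/reduce idea, not the actual Hadoop Java API:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by key and sum the counts per word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["Hadoop stores data", "Hadoop processes data"]
result = reduce_phase(map_phase(lines))
print(result)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In real Hadoop, the map and reduce functions run as tasks distributed across the cluster, and the framework handles the shuffle between them.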
4. What are the Features of Hadoop?
- Open Source
- Highly Scalable Cluster
- Fault Tolerance
- High Availability
- Flexibility
- Ease of Use
- Data Locality
5. Compare Hadoop and RDBMS?
- An RDBMS stores structured data, while Hadoop stores structured, semi-structured, and unstructured data.
- An RDBMS is a database management system based on the relational model. Hadoop is software for storing data and running applications on clusters of commodity hardware.
6. What are the modes in which Hadoop can run?
Hadoop mainly works in three different modes:
- Standalone Mode.
- Pseudo-distributed Mode.
- Fully-Distributed Mode.
7. What are the features of Standalone (local) mode?
Standalone mode is the default mode in which Hadoop runs. It is mainly used for debugging, where you don't really use HDFS. In standalone mode, both input and output use the local file system, and you don't need to do any custom configuration in files such as mapred-site.xml.
8. What are the features of Pseudo mode?
Pseudo-distributed mode runs all the Hadoop daemons on a single machine, each as a separate process. Only one node is used, and this mode is mainly used for debugging and testing.
9. What are the features of Fully-Distributed mode?
This is the most important mode, in which multiple nodes are used: a few of them run the master daemons (NameNode and ResourceManager), and the rest run the slave daemons (DataNode and NodeManager). Here Hadoop runs on a cluster of machines or nodes.
10. What are configuration files in Hadoop?
Configuration files are the files located in the etc/hadoop/ directory of the extracted tar.gz file. For example, hadoop-env.sh specifies the environment variables that affect the JDK used by the Hadoop daemons (bin/hadoop).
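The other main configuration files are core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml. For illustration, a minimal core-site.xml entry for a pseudo-distributed setup might look like the following (the host and port values are placeholders):

```xml
<configuration>
  <property>
    <!-- The default file system URI used by Hadoop clients -->
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```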
11. What are the limitations of Hadoop?
Limitations of Hadoop are:
- Issues with small files: Hadoop is not well suited to large numbers of small files.
- Slow processing speed.
- Support for batch processing only.
- No real-time processing.
- Inefficient iterative processing.
- High latency.
- Limited ease of use.
- Security issues.
12. Compare Hadoop 2 and Hadoop 3?
Hadoop 3 can work up to 30% faster than Hadoop 2 due to the addition of a native Java implementation of the map output collector to MapReduce. Hadoop 3 also introduces erasure coding in HDFS, which significantly reduces storage overhead compared with 3x replication, supports more than two NameNodes, and requires Java 8 as a minimum runtime, whereas Hadoop 2 runs on Java 7. (By way of comparison, Spark can process data in memory up to 100 times faster than Hadoop MapReduce, and about 10 times faster when working from disk.)
13. Explain Data Locality in Hadoop?
Data locality in Hadoop is the process of moving the computation close to where the actual data resides, instead of moving large amounts of data to the computation. This minimizes overall network congestion and increases the overall throughput of the system.
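The scheduling idea behind data locality can be sketched in plain Python: given which nodes hold a replica of a block, prefer to run the task on one of those nodes. The names and structure here are purely illustrative, not the actual YARN scheduler:

```python
def pick_node(block_replicas, free_nodes):
    """Prefer a free node that already stores the block (node-local);
    otherwise fall back to any free node, which requires a network transfer."""
    for node in free_nodes:
        if node in block_replicas:
            return node, "node-local"
    return free_nodes[0], "remote"

replicas = {"node2", "node5"}  # nodes holding a replica of the block
node, locality = pick_node(replicas, ["node1", "node2", "node3"])
print(node, locality)  # node2 node-local
```

Running the computation on `node2` avoids copying the block over the network, which is exactly the congestion-reducing effect described above.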
14. What is Safemode in Hadoop?
Safe mode in Hadoop is a maintenance state of the NameNode during which the NameNode doesn't allow any changes to the file system. During safe mode, the HDFS cluster is read-only and blocks are neither replicated nor deleted. You can check or leave safe mode with the `hdfs dfsadmin -safemode get` and `hdfs dfsadmin -safemode leave` commands.
15. How to restart NameNode or all the daemons in Hadoop?
You can stop the NameNode individually using the /sbin/hadoop-daemon.sh stop namenode command, then start it again using /sbin/hadoop-daemon.sh start namenode. To restart all the daemons, use /sbin/stop-all.sh followed by /sbin/start-all.sh, which stops all the daemons first and then starts them again.
16. What is a “Distributed Cache” in Apache Hadoop?
Distributed Cache in Hadoop is a facility provided by the MapReduce framework. It caches files when they are needed by applications, and can cache read-only text files, archives, JAR files, etc.
17. How is security achieved in Hadoop?
The first step in securing an Apache Hadoop cluster is to enable encryption in transit and at rest. Authentication and Kerberos rely on secure communications, so before you even go down the road of enabling authentication and Kerberos, you must enable encryption of data in transit.
18. Why does one remove or add nodes in a Hadoop cluster frequently?
Basically, in a Hadoop cluster the master node is deployed on reliable hardware with a high-end configuration, while the slave nodes are deployed on commodity hardware. So the chance of a DataNode crashing is higher, and as a result you will frequently see admins remove and add DataNodes in a cluster.
19. What is throughput in Hadoop?
Throughput is the amount of work done in a unit of time. HDFS provides good throughput for the following reason: in Hadoop, a task is divided among different blocks, and the blocks are processed in parallel and independently of each other. Because of this parallel processing, HDFS has high throughput.
TOP MNCs' HADOOP INTERVIEW QUESTIONS & ANSWERS
Here we have listed Hadoop interview questions and answers asked in top MNCs. We periodically update this page with recently asked questions, so please visit often to stay up to date on Hadoop.