PayPal Hadoop Interview Questions

Here is a list of Hadoop interview questions recently asked at PayPal. The questions cover both freshers and experienced professionals. Our Hadoop training answers all of the questions below.

1. Configure slots in Hadoop 2.0 and Hadoop 1.0.

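Each MapReduce job is split into tasks, and the TaskTracker runs each task in one of a fixed number of map and reduce slots on a DataNode, based on a static configuration.
In Hadoop 1.0, set the following parameters in mapred-site.xml to configure the number of map slots and reduce slots: mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum.
Hadoop 2.0 has no fixed slots. YARN allocates containers sized by memory and CPU, with node capacity set in yarn-site.xml (for example yarn.nodemanager.resource.memory-mb) and per-task sizes set in mapred-site.xml (for example mapreduce.map.memory.mb and mapreduce.reduce.memory.mb). Hadoop 2 also supports automatic failover of the YARN ResourceManager.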
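
As a rough illustration of the Hadoop 1.0 case, the sketch below renders a mapred-site.xml fragment with the two TaskTracker slot properties, mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum. The helper name slot_config and the slot counts 8 and 4 are made up for the example; suitable values depend on the cores and memory of each node.

```python
import xml.etree.ElementTree as ET


def slot_config(map_slots, reduce_slots):
    """Build a <configuration> element holding the Hadoop 1.x slot settings."""
    config = ET.Element("configuration")
    settings = {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    }
    for name, value in settings.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return config


# Illustrative values only: 8 map slots and 4 reduce slots per TaskTracker.
xml_text = ET.tostring(slot_config(8, 4), encoding="unicode")
print(xml_text)
```

The printed fragment would be placed in mapred-site.xml on each TaskTracker node, and the TaskTracker restarted so that it picks up the new slot counts.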

2. In case of high availability, if the connectivity between the Standby and Active NameNode is lost, how will this impact the Hadoop cluster?

The Active NameNode and the Standby NameNode are not directly connected. They communicate through a shared medium, the journal (for example a quorum of JournalNodes): the Active NameNode writes its edits to the journal and the Standby NameNode reads them from it. If the network link between the two NameNodes goes down, only the connectivity between the Active and Standby NameNode is lost. There is no impact on the Hadoop cluster as long as the Active NameNode is up and running.

3. What is the minimum number of ZooKeeper services required in Hadoop 2.0 and Hadoop 1.0?

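A group of ZooKeeper servers is called an ensemble. The minimum number of nodes required to form an ensemble that can survive a server failure is 3, because ZooKeeper needs a majority of its servers to be up. Hadoop 2.0 uses ZooKeeper for automatic NameNode failover in a high-availability setup. Hadoop 1.0 has no NameNode high availability, so HDFS and MapReduce themselves do not require ZooKeeper.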
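
The reason 3 is the usual minimum comes from majority arithmetic, which the short sketch below works through. The function names are chosen for this example only.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


# Print ensemble size, quorum size, and tolerated failures side by side.
for n in (1, 2, 3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

An ensemble of 1 or 2 servers tolerates no failures, 3 servers tolerate one, and 4 servers still tolerate only one, which is why odd ensemble sizes are normally chosen.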

4. If the hardware quality of a few machines in a Hadoop cluster is very low, how will it affect the performance of the job and the overall performance of the cluster?

Installing a Hadoop cluster in production is only half the battle. A job finishes only when its slowest task finishes, so tasks scheduled on a few low-quality machines become stragglers that delay the whole job and hold resources that other jobs are waiting for. It is therefore extremely important for a Hadoop admin to tune the cluster setup for maximum performance. During installation, the cluster is configured with default settings that are on par with a minimal hardware configuration. Hadoop admins need to be familiar with the hardware specifications of each node, such as the number of disks mounted on the DataNodes, RAM capacity, the number of physical and virtual CPU cores, and the NIC cards.

5. How does a NameNode confirm that a particular node is dead?

Each DataNode periodically sends a Heartbeat and a Blockreport to the NameNode. Receipt of a Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode. When the NameNode notices that it has not received a Heartbeat message from a DataNode after a certain amount of time, the DataNode is marked as dead.
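
The "certain amount of time" can be estimated from two HDFS settings. The sketch below assumes the commonly documented rule that the timeout is twice dfs.namenode.heartbeat.recheck-interval plus ten times dfs.heartbeat.interval, with stock defaults of 300000 ms and 3 seconds. The function names are illustrative, and the values should be checked against the hdfs-site.xml of the cluster in question.

```python
def dead_node_timeout_seconds(heartbeat_interval_s=3, recheck_interval_ms=300_000):
    """Seconds without a heartbeat before a DataNode is treated as dead.

    Assumes timeout = 2 * recheck interval + 10 * heartbeat interval.
    """
    return 2 * recheck_interval_ms / 1000 + 10 * heartbeat_interval_s


def is_dead(last_heartbeat_s, now_s, timeout_s=None):
    """True when the gap since the last heartbeat exceeds the timeout."""
    if timeout_s is None:
        timeout_s = dead_node_timeout_seconds()
    return (now_s - last_heartbeat_s) > timeout_s


print(dead_node_timeout_seconds())  # timeout in seconds with the assumed defaults
print(is_dead(last_heartbeat_s=0, now_s=700))
```

With the assumed defaults the timeout works out to 630 seconds, that is 10 minutes 30 seconds.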

6. Explain the difference between blacklist node and dead node.

Blacklisted node: when the tasks that the JobTracker assigns to a TaskTracker fail too many times on that node, the JobTracker blacklists the TaskTracker. The node is still alive, but it stops receiving new tasks.
Dead node: a node that is configured for the cluster but is not showing up in it, for example because it has stopped sending heartbeats.

7. How can you increase the NameNode heap memory?

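NameNode heap is increased by raising the -Xmx value in HADOOP_NAMENODE_OPTS in hadoop-env.sh and restarting the NameNode. How much heap is needed depends on the number of blocks the cluster can hold, for example:
Blocksize=128 MB, Replication=1.
Cluster capacity in MB: 200 * 24,000,000 MB = 4,800,000,000 MB (4800 TB).
Disk space needed per block: 128 MB per block * 1 = 128 MB storage per block.
Cluster capacity in blocks: 4,800,000,000 MB / 128 MB = 37,500,000 blocks.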
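
The same sizing arithmetic can be written as a small calculation so that other block sizes or replication factors are easy to try. This is a sketch only: the function names are made up, and the figure of 1 GB of heap per million blocks is a commonly quoted rule of thumb used here as an assumption, not a value taken from the text above.

```python
def cluster_blocks(nodes, mb_per_node, block_size_mb, replication):
    """Number of blocks the cluster can hold at full capacity."""
    capacity_mb = nodes * mb_per_node
    storage_per_block_mb = block_size_mb * replication
    return capacity_mb // storage_per_block_mb


def heap_gb_estimate(blocks, gb_per_million_blocks=1.0):
    """Rough NameNode heap estimate (assumed rule of thumb, adjust as needed)."""
    return blocks / 1_000_000 * gb_per_million_blocks


# Figures from the example: 200 nodes of 24,000,000 MB, 128 MB blocks, replication 1.
blocks = cluster_blocks(200, 24_000_000, 128, 1)
print(blocks, heap_gb_estimate(blocks))
```

Raising the replication factor to 3 divides the block count, and therefore the estimated heap, by three for the same raw capacity.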


8. Configure capacity scheduler in Hadoop.

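Purpose: the CapacityScheduler lets several tenants share one cluster, with each queue guaranteed a share of the capacity.
Features: hierarchical queues, capacity guarantees, elasticity, multi-tenancy, and queue ACLs.
Configuration: to make the ResourceManager use the CapacityScheduler, set yarn.resourcemanager.scheduler.class to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler in yarn-site.xml.
Setting up queues: define the queues in capacity-scheduler.xml with yarn.scheduler.capacity.root.queues, and give each queue its share with yarn.scheduler.capacity.<queue-path>.capacity.
Changing queue configuration via file: edit capacity-scheduler.xml and run yarn rmadmin -refreshQueues. To delete a queue via file, stop the queue first, remove it from the file, and then refresh.
Other topics in the scheduler documentation include updating a container and scheduler activities.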
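
When setting up queues with percentage capacities, the capacities of the queues directly under one parent are expected to add up to 100. The sketch below checks that before a configuration is applied. The queue names and percentages are hypothetical, and the helper is not part of Hadoop.

```python
def capacities_sum_to_100(capacities, tolerance=1e-6):
    """Check that sibling queue capacities (in percent) add up to 100."""
    total = sum(capacities.values())
    return abs(total - 100.0) <= tolerance


# Hypothetical queues under root, as they might appear in capacity-scheduler.xml.
root_queues = {"default": 50, "analytics": 30, "etl": 20}
print(capacities_sum_to_100(root_queues))

# A layout that leaves 10 percent unassigned would be rejected by this check.
broken_queues = {"default": 50, "analytics": 40}
print(capacities_sum_to_100(broken_queues))
```

The same check applies at every level of a queue hierarchy, so nested queues can be validated parent by parent.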

9. After restarting the cluster, if the MapReduce jobs that were working earlier are failing now, what could have gone wrong while restarting?

The cluster could be in safe mode after the NameNode restart. The administrator needs to wait for the NameNode to exit safe mode before restarting the jobs. The current state can be checked with hdfs dfsadmin -safemode get.

10. Explain the steps to add and remove a DataNode from the Hadoop cluster?

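To add a DataNode manually, install Hadoop with the cluster's configuration on the new host, list the host in the include file named by dfs.hosts if one is used, start the DataNode daemon, and run hdfs dfsadmin -refreshNodes on the NameNode.
To remove a DataNode manually, list the host in the exclude file named by dfs.hosts.exclude and run hdfs dfsadmin -refreshNodes to start decommissioning.
To remove a DataNode through a cluster manager that uses roles, such as Cloudera Manager:
Decommission the DataNode role. When asked to select the role instance to decommission, select the DataNode role instance.
Stop the DataNode role.
Verify the integrity of the HDFS service.
After all errors are resolved, continue with the remaining removal steps.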

11. In a large, busy Hadoop cluster, how can you identify a long-running job?

Login to Ambari.
Click on YARN.
Click on Quick Links.
Click on ResourceManager UI.
By default you will see a list of all submitted jobs.
Click on “Jobs -> Running” in the left-hand menu.
Then sort by StartTime. The running jobs with the earliest start time have been running the longest.
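
The same sort can be scripted once the list of applications has been exported. The sketch below works on an in-memory list whose field names (id, state, startedTime in milliseconds) are modelled on the application records of the YARN ResourceManager REST API. The sample records, the helper names, and the fixed clock value are made up for illustration, and no network call is made.

```python
def longest_running(apps, top_n=3):
    """Return RUNNING applications, earliest start time (longest running) first."""
    running = [app for app in apps if app.get("state") == "RUNNING"]
    return sorted(running, key=lambda app: app["startedTime"])[:top_n]


def elapsed_minutes(app, now_ms):
    """Minutes since the application started, given a clock value in ms."""
    return (now_ms - app["startedTime"]) / 60_000


# Hypothetical sample data; startedTime is milliseconds since the epoch.
sample_apps = [
    {"id": "application_1_0001", "state": "RUNNING", "startedTime": 1_700_000_600_000},
    {"id": "application_1_0002", "state": "FINISHED", "startedTime": 1_700_000_000_000},
    {"id": "application_1_0003", "state": "RUNNING", "startedTime": 1_700_000_100_000},
]
now_ms = 1_700_003_700_000

for app in longest_running(sample_apps):
    print(app["id"], round(elapsed_minutes(app, now_ms), 1))
```

Finished applications are filtered out first, so an old job that has already completed does not crowd out the ones that are still holding resources.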

12. When NameNode is down, what does the JobTracker do?

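The client submits a job to the JobTracker, not to the NameNode. The JobTracker asks the NameNode where the requested data is stored, and the NameNode returns the block information. The JobTracker is responsible for completing the job and for allocating resources to it. When the NameNode is down, HDFS is unavailable, so the JobTracker cannot obtain block locations, and jobs that read or write HDFS data cannot run until the NameNode is back.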

13. When configuring Hadoop manually, which property file should be modified to configure slots?

Slots are configured in mapred-site.xml. There can also be a separate configuration file for related properties such as queues, and job ACLs are checked to authorize viewing and modifying jobs.

14. How will you add a new user to the cluster?

Create an account for the user on the edge node, then create the user's HDFS home directory (i.e. under /user/) and make the user its owner. You can still run jobs as the new user on the cluster even if you have not created a Linux home directory for that user.

15. What is the advantage of speculative execution? Under what situations might speculative execution not be beneficial?

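The main purpose of speculative execution is to reduce job execution time: when a task runs slower than expected, Hadoop launches a duplicate of it on another node and uses whichever copy finishes first. The cost is cluster efficiency, because the redundant tasks consume resources that other work could use, which can reduce overall throughput. Speculative execution is therefore not beneficial on a heavily loaded cluster, or when a task is slow because of skewed input or a bug rather than a slow node, since the duplicate will be just as slow.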

