Accenture – Hadoop Interview Questions
Below is a list of Hadoop interview questions recently asked at Accenture. The questions are suitable for both freshers and experienced professionals. Our Hadoop Training covers the answers to all of the questions below.
1. How will you decide whether you need to use the Capacity Scheduler or the Fair Scheduler?
Use the Fair Scheduler if you want jobs to make equal progress instead of running in first-in, first-out order.
The Fair Scheduler is also a good choice when connectivity is slow and data locality makes a significant difference to job runtime. Choose the Capacity Scheduler when several teams share one cluster and each queue needs a guaranteed share of its capacity.
2. What are the daemons required to run a Hadoop cluster?
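A classic (Hadoop 1) cluster has five daemons: NameNode, DataNode, Secondary NameNode, JobTracker and TaskTracker. In Hadoop 2 and later, YARN's ResourceManager and NodeManager replace the JobTracker and TaskTracker.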
3. How will you restart a NameNode?
The NameNode can be restarted in either of the following ways:
- Stop the NameNode individually with the /sbin/hadoop-daemon.sh stop namenode command, then start it again with /sbin/hadoop-daemon.sh start namenode.
- Run /sbin/stop-all.sh followed by /sbin/start-all.sh, which stops all the daemons first and then starts them all again.
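The first method can be sketched as a small shell function. This is a minimal sketch, not a definitive procedure: it assumes HADOOP_HOME points at the Hadoop installation (the /opt/hadoop default is a placeholder), and the run helper is a hypothetical convenience, not part of Hadoop, that only prints each command unless RUN=1 is set.

```shell
# Assumption: HADOOP_HOME is the Hadoop install directory; /opt/hadoop is a placeholder.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"

# Hypothetical helper (not part of Hadoop): print the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Stop and then start only the NameNode daemon on this host.
restart_namenode() {
  run "$HADOOP_HOME/sbin/hadoop-daemon.sh" stop namenode &&
  run "$HADOOP_HOME/sbin/hadoop-daemon.sh" start namenode
}

restart_namenode
```

Running the script as-is prints the two commands for review; running it with RUN=1 executes them. On Hadoop 3, where hadoop-daemon.sh is deprecated, the equivalent commands are hdfs --daemon stop namenode and hdfs --daemon start namenode.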
4. Explain the different schedulers available in Hadoop.
The different schedulers in Hadoop are:
- First In First Out Scheduler.
- Capacity Scheduler.
- Fair Scheduler.
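On a YARN-based cluster (Hadoop 2 and later), the scheduler is selected through the yarn.resourcemanager.scheduler.class property in yarn-site.xml. The sketch below maps each scheduler listed above to the class name that property expects; the function name scheduler_class is hypothetical and exists only for illustration.

```shell
# Map a short scheduler name to the class that YARN expects in
# yarn.resourcemanager.scheduler.class (set in yarn-site.xml).
# scheduler_class is a hypothetical helper, not a Hadoop command.
scheduler_class() {
  base="org.apache.hadoop.yarn.server.resourcemanager.scheduler"
  case "$1" in
    fifo)     echo "$base.fifo.FifoScheduler" ;;
    capacity) echo "$base.capacity.CapacityScheduler" ;;
    fair)     echo "$base.fair.FairScheduler" ;;
    *)        echo "unknown scheduler: $1" >&2; return 1 ;;
  esac
}

scheduler_class fair
```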
5. List a few Hadoop shell commands that are used to perform a copy operation.
- ls: Lists the files in a directory.
- mkdir: Creates a directory.
- touchz: Creates an empty (zero-length) file.
- copyFromLocal (or) put: Copies files/folders from the local file system to HDFS.
- cat: Prints file contents.
- copyToLocal (or) get: Copies files/folders from HDFS to the local file system.
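The commands above can be combined into a simple round trip: copy a local file into HDFS, list it, and copy it back. This is a minimal sketch; the file name and HDFS directory are hypothetical, and the run helper (not part of Hadoop) only prints each command unless RUN=1 is set.

```shell
# Hypothetical helper (not part of Hadoop): print the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Hypothetical paths, chosen only for illustration.
LOCAL_FILE="./sales.csv"
HDFS_DIR="/user/demo/input"

copy_round_trip() {
  run hdfs dfs -mkdir -p "$HDFS_DIR" &&                      # create the target directory
  run hdfs dfs -put "$LOCAL_FILE" "$HDFS_DIR/" &&            # local file system -> HDFS
  run hdfs dfs -ls "$HDFS_DIR" &&                            # confirm the file arrived
  run hdfs dfs -get "$HDFS_DIR/sales.csv" ./sales_copy.csv   # HDFS -> local file system
}

copy_round_trip
```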
6. What is the jps command used for?
The jps command is used to check whether a specific daemon is running. It lists the Java processes owned by the current user, so it should be run as root (or as the user that owns the Hadoop daemons) to see every daemon running on the host.
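As an illustration, the check can be scripted by matching the class-name column of the jps output. The sketch below is one possible approach, not a standard tool: daemon_listed is a hypothetical function name, and the sample input uses made-up PIDs.

```shell
# Succeeds if the named daemon appears in jps-style output read from stdin.
# jps prints one "<pid> <main class>" pair per line; the match on the second
# column is exact, so NameNode does not match SecondaryNameNode.
daemon_listed() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Typical use on a cluster node (requires a JDK on the PATH):
#   jps | daemon_listed NameNode && echo "NameNode is up"

# Sample input, for illustration only; the PIDs are made up.
printf '4821 NameNode\n5012 DataNode\n5377 Jps\n' | daemon_listed NameNode && echo "NameNode is up"
```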
7. What are the important hardware considerations when deploying Hadoop in a production environment?
A Hadoop platform should be designed to move computation to the data, which is how it achieves scalability and high performance. Capacity: large form factor disks cost less and allow for more storage. Network: two top-of-rack (TOR) switches per rack are ideal, because they provide redundancy if one switch fails.
8. How many NameNodes can you run on a single Hadoop cluster?
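In a high-availability setup, a Hadoop cluster runs two NameNodes: an Active NameNode and a Passive (Standby) NameNode. Without high availability, the cluster has a single NameNode.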
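In a high-availability configuration, the state of each NameNode can be queried with hdfs haadmin. The sketch below assumes the NameNode IDs are nn1 and nn2, which are hypothetical; the real IDs come from the dfs.ha.namenodes.<nameservice> property in hdfs-site.xml. The run helper is not part of Hadoop and only prints each command unless RUN=1 is set.

```shell
# Hypothetical helper (not part of Hadoop): print the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Ask each NameNode ID passed as an argument for its HA state.
show_namenode_states() {
  for nn in "$@"; do
    run hdfs haadmin -getServiceState "$nn"
  done
}

# nn1 and nn2 are placeholder NameNode IDs.
show_namenode_states nn1 nn2
```

When executed against a real HA cluster, each call reports whether that NameNode is currently active or standby.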
9. What happens when the NameNode on the Hadoop cluster goes down?
When the NameNode goes down, the file system goes offline. An optional SecondaryNameNode can be hosted on a separate machine, but it only creates checkpoints of the namespace by merging the edits file into the fsimage file; it does not provide any real redundancy.
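10. What is the conf/hadoop-env.sh file and which variable in the file should be set for Hadoop to work?
conf/hadoop-env.sh is the shell script that sets the environment for the Hadoop daemons. The JAVA_HOME variable in it must be set so that Hadoop can find the Java installation.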
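A minimal sketch of the relevant line in hadoop-env.sh follows. The JDK path is a placeholder, not a recommendation; the actual location depends on the operating system and the JDK installed on each host.

```shell
# In hadoop-env.sh: tell Hadoop which JDK to run on.
# The path below is a placeholder; use the JDK location on your own hosts.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```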
11. Apart from using the jps command, is there any other way to check whether the NameNode is working?
Running jps in the shell is the quickest way to check whether the Hadoop daemons are running. You can also check the daemons through their web UIs; the NameNode web UI, for example, shows the NameNode's status.
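Two further checks can be scripted from the command line: a cluster report from hdfs dfsadmin, and a status query against the NameNode's web endpoint. This is a sketch under stated assumptions: the host name is hypothetical, the web port is assumed to be the default (9870 on Hadoop 3, 50070 on Hadoop 2), and the run helper, which is not part of Hadoop, only prints each command unless RUN=1 is set.

```shell
# Hypothetical helper (not part of Hadoop): print the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Assumptions: the host is a placeholder; 9870 is the default NameNode
# web port on Hadoop 3 (Hadoop 2 uses 50070 by default).
NN_HOST="${NN_HOST:-namenode.example.com}"
NN_HTTP_PORT="${NN_HTTP_PORT:-9870}"

check_namenode() {
  # Fails if the NameNode cannot be reached over RPC.
  run hdfs dfsadmin -report &&
  # Queries the NameNodeStatus bean exposed by the NameNode web server.
  run curl -s "http://$NN_HOST:$NN_HTTP_PORT/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"
}

check_namenode
```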
12. Which command is used to verify if the HDFS is corrupt or not?
To identify corrupt or missing blocks, use the command 'hdfs fsck /path/to/file'. Other tools also exist. HDFS attempts to recover from the situation automatically, since by default there are three replicas of every block in the cluster.
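The fsck report ends with a summary line stating whether the file system under the checked path is HEALTHY or CORRUPT, so the check is easy to script. The sketch below shows one possible approach; fsck_healthy is a hypothetical function name, and the sample summary line is included only for illustration.

```shell
# Succeeds only when fsck output read from stdin contains a HEALTHY summary line.
# fsck_healthy is a hypothetical helper, not a Hadoop command.
fsck_healthy() {
  grep -q "is HEALTHY"
}

# Typical use on a cluster node; if the check fails, list the affected files:
#   hdfs fsck / | fsck_healthy || hdfs fsck / -list-corruptfileblocks

# Sample summary line, for illustration only.
echo "The filesystem under path '/' is HEALTHY" | fsck_healthy && echo "no corrupt blocks reported"
```

For more detail on a single path, hdfs fsck /path/to/file -files -blocks -locations prints each block and the DataNodes that hold its replicas.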
13. List some use cases of the Hadoop Ecosystem
- Call Data Records Management.
- Servicing of Telecom Data Equipment.
- Advanced Telecom infrastructure planning.
- Creating new products and services.
- Network traffic analytics.
14. I want to see all the jobs running in a Hadoop cluster. How can you do this?
- Log in to Ambari.
- Click on YARN.
- Click on Quick Links.
- Click on ResourceManager UI.
- By default you will see a list of all submitted jobs.
- Click on "Jobs -> Running" in the left-hand menu.
- Then sort by StartTime.
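The same list is also available from the command line, which is useful on clusters without Ambari. This is a minimal sketch; the run helper is not part of Hadoop and only prints the command unless RUN=1 is set.

```shell
# Hypothetical helper (not part of Hadoop): print the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# List the YARN applications that are currently in the RUNNING state.
list_running_jobs() {
  run yarn application -list -appStates RUNNING
}

list_running_jobs
```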
15. Is it possible to copy files across multiple clusters? If yes, how can you accomplish this?
Yes, files can be copied across multiple Hadoop clusters using distributed copy. The DistCp command is used for both intra-cluster and inter-cluster copying.
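An inter-cluster copy with DistCp might look like the sketch below. The NameNode addresses and paths are hypothetical, and the run helper, which is not part of Hadoop, only prints the command unless RUN=1 is set.

```shell
# Hypothetical helper (not part of Hadoop): print the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Hypothetical source and destination clusters.
SRC="hdfs://nn1.example.com:8020/data/logs"
DST="hdfs://nn2.example.com:8020/backup/logs"

copy_between_clusters() {
  # -update copies only files that are missing or differ at the destination.
  run hadoop distcp -update "$SRC" "$DST"
}

copy_between_clusters
```

DistCp runs as a MapReduce job, so the copy is spread across the cluster rather than funnelled through a single machine.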
16. Which is the best operating system to run Hadoop?
Linux is the only supported production platform, but other flavors of Unix can be used to run Hadoop for development. Windows is supported only as a development platform, and additionally requires Cygwin to run.