Amazon - Hadoop Interview Questions
- What is InputSplit in Hadoop?
- What is the advantage of having a Distributed Cache in Hadoop?
- Explain the process for accessing subdirectories recursively in Hive queries.
- You have a file that contains 200 billion URLs. How will you find the first unique URL using Hadoop Hive?
- Assume that a web server creates a log file containing a timestamp and a query for each request. How will you design the Hadoop architecture (explaining how you will store the data) so that it can return the top 15 queries made in the last 12 hours?
- How will you scale a system to handle huge amounts of unstructured data?
- What is the difference between TextInputFormat and KeyValueTextInputFormat in Hadoop?
- You have a huge text file (several GB) that contains data in multiple languages. How will you find the n most frequently occurring patterns in it using Hadoop MapReduce?
- How do you get a report on the HDFS file system, including disk availability and the number of active nodes?
- Can we change settings within a Hive session? If yes, how?
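
For the first-unique-URL question above, the core logic can be sketched on a single machine before thinking about how to distribute it. The sketch below is only an illustration under assumptions: the function name `first_unique_url` is hypothetical, the input fits in memory, and each URL's position in the file is known. At 200 billion URLs the same idea would typically be expressed as a group-by on the URL that keeps an occurrence count and the minimum position, followed by picking the smallest position among URLs with a count of one.

```python
from typing import Dict, Iterable, Optional


def first_unique_url(urls: Iterable[str]) -> Optional[str]:
    """Return the earliest URL that occurs exactly once, or None if there is none."""
    first_seen: Dict[str, int] = {}  # URL -> position of its first occurrence
    counts: Dict[str, int] = {}      # URL -> number of occurrences

    # "Map + reduce by key": gather a count and the first position per URL.
    for position, url in enumerate(urls):
        if url not in counts:
            first_seen[url] = position
            counts[url] = 0
        counts[url] += 1

    # Final step: among URLs seen exactly once, take the smallest position.
    unique = [(first_seen[url], url) for url, count in counts.items() if count == 1]
    return min(unique)[1] if unique else None


if __name__ == "__main__":
    sample = ["a.com", "b.com", "a.com", "c.com"]
    print(first_unique_url(sample))
```

Keeping the minimum position alongside the count is the important design choice: once the data is partitioned by URL, the original file order is lost unless it travels with each record.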