Wipro - Hadoop Interview Questions
Here is a list of Hadoop interview questions recently asked at Wipro. The questions are suitable for both freshers and experienced professionals.
1. How does garbage collection work in Java?
In Java, garbage collection is the process of managing memory automatically. The garbage collector finds objects that are no longer reachable and removes them to free memory. The JVM offers several GC algorithms; the best-known underlying technique is mark and sweep.
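The two phases of mark and sweep can be sketched with a toy object graph. This is only an illustration, not how the JVM is implemented; the `Node` class, the `heap` list and the `roots` list are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy mark-and-sweep collector over a hypothetical object graph.
class MarkSweepDemo {

    // A heap object that may reference other heap objects.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;

        Node(String name) {
            this.name = name;
        }
    }

    // Mark phase: flag every object reachable from the given node.
    static void mark(Node node) {
        if (node.marked) {
            return; // already visited; this also guards against cycles
        }
        node.marked = true;
        for (Node ref : node.refs) {
            mark(ref);
        }
    }

    // Sweep phase: remove unmarked objects from the heap and reset the
    // marks on survivors. Returns the names of the reclaimed objects.
    static List<String> collect(List<Node> heap, List<Node> roots) {
        for (Node root : roots) {
            mark(root);
        }
        List<String> reclaimed = new ArrayList<>();
        Iterator<Node> it = heap.iterator();
        while (it.hasNext()) {
            Node node = it.next();
            if (node.marked) {
                node.marked = false;
            } else {
                reclaimed.add(node.name);
                it.remove();
            }
        }
        return reclaimed;
    }

    // Heap layout: a -> b is reachable from the root a;
    // c -> d -> c is a cycle that nothing reachable points to.
    static List<String> demo() {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        Node d = new Node("d");
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);

        List<Node> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        return collect(heap, Arrays.asList(a));
    }

    public static void main(String[] args) {
        System.out.println("Reclaimed: " + demo());
    }
}
```

The unreachable cycle `c` and `d` is reclaimed even though the two objects still reference each other, which is the main advantage of tracing collectors over simple reference counting.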
2. What are the different types of compression in Hive?
- GZIP compression: gzip is the GNU zip compression utility, based on the DEFLATE algorithm.
- BZIP2 compression: bzip2 achieves a higher compression ratio than gzip, at the cost of slower compression and decompression.
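As a rough illustration of what gzip does to repetitive row data, the sketch below compresses and restores a byte array with the JDK's own `java.util.zip` classes. It does not touch Hive or Hadoop codecs; the sample rows are made up, and bzip2 is left out because the JDK standard library has no bzip2 implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTrip {

    // Compress a byte array into the gzip format (DEFLATE plus a small header and trailer).
    static byte[] compress(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Restore the original bytes from gzip-compressed data.
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Made-up, highly repetitive rows, similar in spirit to delimited table data.
    static byte[] sampleRows() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("hadoop,hive,oozie\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleRows();
        byte[] packed = compress(raw);
        System.out.println(raw.length + " bytes -> " + packed.length + " bytes");
        System.out.println("round trip ok: " + Arrays.equals(raw, decompress(packed)));
    }
}
```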
3. Job Properties in Oozie
Oozie workflows can be parameterized. The parameters come from a configuration file called a properties file. We can run multiple jobs from the same workflow by supplying different .properties files.
4. How do you ensure 3rd-party JAR files are available on the data nodes?
You could upload the external JAR files to every TaskTracker node and update HADOOP_CLASSPATH accordingly, but do you really want to bother the Ops team each time you need to add a new JAR? That works on a single-node setup, but copying the JAR to all 10, 100 or more Hadoop nodes does not scale. A more scalable option is to ship the JARs with the job itself, for example with the -libjars option or the distributed cache.
5. How do you define and use UDFs in Hive?
The steps for creating a custom UDF in Hive are:
- Add the dependency JAR files to your Eclipse build path.
- Create a Java class extending Hive's "UDF" class.
- Export a JAR file from the Eclipse project.
- Add the JAR to Hive.
- Create the function in Hive.
- Optionally, create a permanent function so the JAR does not have to be added again in each session.
6. If we have a 10 GB file and a 10 MB file, how do you load and process the 10 MB file in MapReduce?
In MapReduce, each map task processes one block of data at a time. A single 10 MB file fits within one block at a typical 128 MB block size, so it is processed by a single map task. Many small files, however, mean many blocks, which means many tasks and a lot of bookkeeping for the Application Master. This slows overall cluster performance compared with processing the same data as large files.
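The effect of file size on the number of map tasks can be estimated with simple arithmetic. The sketch below assumes a 128 MB block size and one map task per block; real input formats apply additional rules when computing splits, which are ignored here. The class and method names are hypothetical.

```java
class SplitEstimate {

    static final long MB = 1024L * 1024L;

    // Rough number of blocks (and so map tasks) for one file: ceil(fileSize / blockSize).
    static long splitsFor(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes <= 0) {
            return 0;
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long block = 128 * MB; // assumed block size

        long oneLargeFile = splitsFor(10L * 1024 * MB, block); // one 10 GB file
        long oneSmallFile = splitsFor(10 * MB, block);         // one 10 MB file
        long manySmallFiles = 1024 * oneSmallFile;             // 1024 files of 10 MB, also 10 GB in total

        System.out.println("10 GB file:         " + oneLargeFile + " map tasks");
        System.out.println("10 MB file:         " + oneSmallFile + " map task");
        System.out.println("1024 x 10 MB files: " + manySmallFiles + " map tasks");
    }
}
```

Under these assumptions the same 10 GB of data needs 80 map tasks as one file but 1024 map tasks as small files, which is where the extra bookkeeping comes from.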
7. What are joins in Hive in the MapReduce paradigm?
Hive joins are executed as jobs on the configured execution engine, for example Tez, Spark or MapReduce. A join of multiple tables can often be performed by a single job. Since its first release, many optimizations have been added to Hive, giving users various options for improving join queries.
8. Apart from map-side and reduce-side joins, are there any other joins in MapReduce?
A map-side join is usually used when one data set is large and the other is small enough to fit in memory. A reduce-side join can join two large data sets. The map-side join is faster because it avoids the shuffle and reduce phase, whereas a reduce-side join must wait for all the mappers to complete before the reducers can join the data. Hence the reduce-side join is slower.
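The idea behind a map-side join can be sketched in plain Java, without the Hadoop API: the small data set is loaded into an in-memory hash map, and each record of the large data set is joined as it is read, so no shuffle is needed. The department and employee data below are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MapSideJoinSketch {

    // Joins each large-side record {deptId, employee} against the in-memory small table.
    // In a real job this loop body would run inside each mapper.
    static List<String> join(Map<Integer, String> smallTable, List<String[]> largeRecords) {
        List<String> out = new ArrayList<>();
        for (String[] rec : largeRecords) {
            int deptId = Integer.parseInt(rec[0]);
            String deptName = smallTable.get(deptId);
            if (deptName != null) { // inner join: rows without a match are dropped
                out.add(rec[1] + "," + deptName);
            }
        }
        return out;
    }

    static List<String> demo() {
        // Small side: fits in memory, so every mapper can hold a full copy.
        Map<Integer, String> departments = new HashMap<>();
        departments.put(1, "Sales");
        departments.put(2, "Engineering");

        // Large side: streamed record by record.
        List<String[]> employees = Arrays.asList(
                new String[] {"1", "Asha"},
                new String[] {"2", "Ravi"},
                new String[] {"3", "Lee"}); // department 3 has no match

        return join(departments, employees);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```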
9. What is sort-merge bucketing?
Sort Merge Bucket (SMB) is a technique for writing data to the file system in deterministic file locations, sorted by a predetermined key, so that it can later be read in as key groups with no shuffle required.
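Why pre-sorted data needs no shuffle can be seen from a plain merge join: when both inputs are already sorted by the join key, one pass with two cursors finds all matches. The sketch below is a simplified illustration that assumes each key appears at most once per side; the arrays stand in for two sorted bucket files.

```java
import java.util.ArrayList;
import java.util.List;

class SortMergeSketch {

    // Merge join over two inputs sorted by key (keys assumed unique on each side).
    static List<String> mergeJoin(int[] leftKeys, String[] leftVals,
                                  int[] rightKeys, String[] rightVals) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < leftKeys.length && j < rightKeys.length) {
            if (leftKeys[i] < rightKeys[j]) {
                i++; // left key has no partner yet, advance left
            } else if (leftKeys[i] > rightKeys[j]) {
                j++; // right key has no partner yet, advance right
            } else {
                out.add(leftKeys[i] + ":" + leftVals[i] + "," + rightVals[j]);
                i++;
                j++;
            }
        }
        return out;
    }

    static List<String> demo() {
        int[] leftKeys = {1, 3, 5};
        String[] leftVals = {"a", "b", "c"};
        int[] rightKeys = {3, 4, 5};
        String[] rightVals = {"x", "y", "z"};
        return mergeJoin(leftKeys, leftVals, rightKeys, rightVals);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```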
10. How do we test Hive in production?
- Configure Hive execution environment.
- Set up test input data.
- Execute SQL script under test.
- Extract data written by the executed script.
- Make assertions on the data extracted.
11. What is the difference between HashMap and Hashtable?
Both Hashtable and HashMap are hash-based data structures that implement the Map interface. The main difference is that HashMap is not thread-safe, whereas Hashtable is synchronized and therefore thread-safe. Another difference is that HashMap allows one null key and any number of null values, whereas Hashtable allows neither null keys nor null values.
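The null-handling difference is easy to check directly. The helper methods below are hypothetical names for this example; they simply try a `put` and report whether a `NullPointerException` was thrown.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {

    // Returns true if the map accepts a null key, false if it throws NullPointerException.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    // Returns true if the map accepts a null value, false if it throws NullPointerException.
    static boolean acceptsNullValue(Map<String, String> map) {
        try {
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap null key:     " + acceptsNullKey(new HashMap<>()));
        System.out.println("HashMap null value:   " + acceptsNullValue(new HashMap<>()));
        System.out.println("Hashtable null key:   " + acceptsNullKey(new Hashtable<>()));
        System.out.println("Hashtable null value: " + acceptsNullValue(new Hashtable<>()));
    }
}
```

When thread safety is needed in new code, `ConcurrentHashMap` is generally preferred over Hashtable; like Hashtable, it rejects null keys and values.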
12. What is bucketing?
Bucketing in Hive is a data-organizing technique. It is similar to partitioning, but it divides large datasets into a fixed number of more manageable parts, known as buckets, based on the hash of a column value. Partitions can also be divided further into buckets.
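The assignment of a row to a bucket can be sketched as a hash of the bucketing column taken modulo the number of buckets. The code below uses Java's `hashCode()` as a stand-in; the hash function Hive actually applies depends on the column type and Hive version, so treat this as an assumption made for illustration.

```java
class BucketSketch {

    // Maps a key to a bucket index in [0, numBuckets).
    // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
    static int bucketFor(Object key, int numBuckets) {
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 4; // hypothetical bucket count for the table
        int[] userIds = {8, 17, 42, 1001};
        for (int id : userIds) {
            System.out.println("user_id " + id + " -> bucket " + bucketFor(id, numBuckets));
        }
    }
}
```

Because the same key always lands in the same bucket, two tables bucketed on the same column with the same bucket count can be joined bucket by bucket.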