Robert Bosch – Hadoop Interview Questions
Here is a list of Hadoop interview questions that were recently asked at Robert Bosch. The questions cover both freshers and experienced professionals.
1. Which language do you use in Flume configuration?
Flume agent configuration is stored in a local configuration file. This is a text file that follows the Java properties file format. Configurations for one or more agents can be specified in the same configuration file.
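A minimal agent definition in that Java-properties format might look like this (the agent and component names here are illustrative):

```properties
# Agent "a1" with one source, one channel, and one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Netcat source listening on localhost:44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# In-memory channel buffering events between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Logger sink writes events to the Flume log
a1.sinks.k1.type = logger

# Wire source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```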
2. Write a command to import a customer table into Hadoop
The 'import' tool imports individual tables from an RDBMS to HDFS. Each row in a table is treated as a record in HDFS. Records are stored either as text data in text files or as binary data in Avro or SequenceFiles.
The following syntax is used to import data into HDFS:
$ sqoop import (generic-args) (import-args)
$ sqoop-import (generic-args) (import-args)
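For example, a customer table could be imported like this (the JDBC URL, credentials, table name, and target directory are placeholders for your environment):

```shell
# Import the "customer" table into HDFS.
# -P prompts for the database password instead of putting it on the command line.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username dbuser -P \
  --table customer \
  --target-dir /user/hadoop/customer
```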
3. What is a mapper in Sqoop, and how do you decide the number of mappers?
The number of mappers controls how parallel your Sqoop job runs. Each mapper opens its own connection to the database, so the practical upper bound is the number of concurrent connections the database can support. By default, Sqoop uses four mappers; you can override this with the -m (--num-mappers) option.
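As a sketch, the mapper count is set with --num-mappers, and --split-by names the column Sqoop uses to divide the work among them (connection details and column names below are illustrative):

```shell
# Run the import with 8 parallel mappers, splitting the rows on customer_id.
# Each mapper imports one range of customer_id values over its own DB connection.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username dbuser -P \
  --table customer \
  --split-by customer_id \
  --num-mappers 8
```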
4. What is the difference between an external table and internal table?
An internal (managed) table's data is stored in the Hive warehouse folder, whereas an external table's data is stored at the location you specify when creating the table. Dropping an internal table deletes both the metadata and the data; dropping an external table removes only the metadata and leaves the data in place.
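In Hive DDL, the distinction shows up directly in the CREATE TABLE statement (table names and the path are examples):

```sql
-- Managed (internal) table: data lives under the Hive warehouse directory;
-- DROP TABLE deletes both metadata and data.
CREATE TABLE customer_internal (id INT, name STRING);

-- External table: data stays at the given LOCATION;
-- DROP TABLE removes only the metadata.
CREATE EXTERNAL TABLE customer_external (id INT, name STRING)
LOCATION '/data/customer';
```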
5. Where can you specify the input and output locations in a MapReduce program?
The input and output locations are set in the driver class, via FileInputFormat.addInputPath() and FileOutputFormat.setOutputPath(). It is not mandatory to set the input and output type/format, however; by default the framework treats both as text (TextInputFormat and TextOutputFormat).
6. What is serialization?
Serialization is the process of converting an object into a stream of bytes so that the object can be stored or transmitted to memory, a database, or a file. Its main purpose is to save the state of an object so it can be recreated when needed. The reverse process is called deserialization.
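A minimal sketch of the round trip using plain Java serialization (in Hadoop itself, Writable and DataOutput play the same role, but the idea is identical):

```java
import java.io.*;

public class SerializationDemo {
    // Serialization: object -> stream of bytes.
    static byte[] serialize(Serializable obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(obj);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserialization: the reverse process, bytes -> object.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize("customer-42");        // save the object's state
        String restored = (String) deserialize(bytes);  // recreate it later
        System.out.println(restored);
    }
}
```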
7. There is a table having 500 Million records. Now you want to copy the data of that table in some other table, what best approach you will choose.
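No answer is given in the source. One widely used approach, assuming the table lives in Hive, is a CREATE TABLE AS SELECT (or an INSERT OVERWRITE into a pre-created table), so the copy runs as a single distributed job across the cluster rather than row by row (table names and format are placeholders):

```sql
-- Copy ~500 million rows as one parallel job.
-- A columnar format such as ORC keeps the copy compact and fast to scan.
CREATE TABLE customer_copy
STORED AS ORC
AS SELECT * FROM customer;
```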
8. What type of data should we store in a fact table and a dimension table?
A fact table is defined by its grain, its most atomic level, and stores the numeric measurements of a business process. A dimension table should be wordy, descriptive, complete, and quality assured: it holds the textual context and report labels used to slice and filter the facts.
9. How is bucketing different from partitioning, and why do we use it?
Bucketing decomposes data into more manageable, roughly equal parts. With partitioning, there is a possibility of creating many small partitions, one per distinct column value. With bucketing, you restrict the data to a fixed number of buckets, defined in the table creation script.
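A Hive table can combine both, partitioned on a low-cardinality column and bucketed into a fixed number of files (names and counts below are illustrative):

```sql
-- Partitions: one directory per country value (can grow without bound).
-- Buckets: exactly 32 files per partition, assigned by hash(customer_id) % 32.
CREATE TABLE customer_bucketed (customer_id INT, name STRING)
PARTITIONED BY (country STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS;
```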
10. What is Fact Table and Dimension Table?
A dimension table contains the attributes of the measurements stored in the fact tables. A fact table contains the measurements of business processes, along with foreign keys to the dimension tables.
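A minimal star-schema sketch (these tables are hypothetical):

```sql
-- Dimension table: descriptive attributes of a customer.
CREATE TABLE dim_customer (customer_id INT, name STRING, city STRING);

-- Fact table: one row per sale at the chosen grain,
-- with a measurement (amount) and a foreign key to the dimension.
CREATE TABLE fact_sales (sale_id INT, customer_id INT, amount DECIMAL(10,2));
```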