Robert Bosch – Hadoop Interview Questions
Here is a list of Hadoop interview questions that were recently asked at Robert Bosch. The questions cover both freshers and experienced professionals.
1. Which language do you use in Flume configuration?
Flume agent configuration is stored in a local configuration file. This is a text file that follows the Java properties file format. Configurations for one or more agents can be specified in the same configuration file.
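A minimal agent definition in that Java-properties format might look like this (the agent and component names here are illustrative):

```properties
# Agent "a1" with one source, one channel, and one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Netcat source listening on localhost:44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# In-memory channel buffering events between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Logger sink writes events to the Flume log
a1.sinks.k1.type = logger

# Wire source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```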
2. Write a command to import a customer table into Hadoop
The 'import' tool imports individual tables from an RDBMS to HDFS. Each row in a table is treated as a record in HDFS. Records are stored either as text data in text files or as binary data in Avro or SequenceFiles.
The following syntax is used to import data into HDFS:
$ sqoop import (generic-args) (import-args)
$ sqoop-import (generic-args) (import-args)
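For example, a customer table could be imported like this (the JDBC URL, credentials, table name, and target directory are placeholders for your environment):

```shell
# Import the "customer" table into HDFS.
# -P prompts for the database password instead of putting it on the command line.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username dbuser -P \
  --table customer \
  --target-dir /user/hadoop/customer
```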
3. What is a mapper in Sqoop, and how do you decide the number of mappers?
The number of mappers controls how parallel your Sqoop job runs. Each mapper opens its own connection to the database, so the practical upper bound is the number of concurrent connections the database can support. By default, Sqoop uses four mappers; you can override this with the -m (--num-mappers) option.
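As a sketch, the mapper count is set with --num-mappers, and --split-by names the column Sqoop uses to divide the work among them (connection details and column names below are illustrative):

```shell
# Run the import with 8 parallel mappers, splitting the rows on customer_id.
# Each mapper imports one range of customer_id values over its own DB connection.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username dbuser -P \
  --table customer \
  --split-by customer_id \
  --num-mappers 8
```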
4. What is the difference between an external table and internal table?
An internal (managed) table's data is stored in the Hive warehouse folder, whereas an external table's data is stored at the location you specify when creating the table. Dropping an internal table deletes both the metadata and the data; dropping an external table removes only the metadata and leaves the data in place.
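In Hive DDL, the distinction shows up directly in the CREATE TABLE statement (table names and the path are examples):

```sql
-- Managed (internal) table: data lives under the Hive warehouse directory;
-- DROP TABLE deletes both metadata and data.
CREATE TABLE customer_internal (id INT, name STRING);

-- External table: data stays at the given LOCATION;
-- DROP TABLE removes only the metadata.
CREATE EXTERNAL TABLE customer_external (id INT, name STRING)
LOCATION '/data/customer';
```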
5. Where can you specify the input and output locations in a MapReduce program?
The input and output locations are set in the driver class, via FileInputFormat.addInputPath() and FileOutputFormat.setOutputPath(). It is not mandatory to set the input and output type/format, however; by default the framework treats both as text (TextInputFormat and TextOutputFormat).
6. What is serialization?
Serialization is the process of converting an object into a stream of bytes so that the object can be stored or transmitted to memory, a database, or a file. Its main purpose is to save the state of an object so it can be recreated when needed. The reverse process is called deserialization.
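A minimal sketch of the round trip using plain Java serialization (in Hadoop itself, Writable and DataOutput play the same role, but the idea is identical):

```java
import java.io.*;

public class SerializationDemo {
    // Serialization: object -> stream of bytes.
    static byte[] serialize(Serializable obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(obj);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserialization: the reverse process, bytes -> object.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize("customer-42");        // save the object's state
        String restored = (String) deserialize(bytes);  // recreate it later
        System.out.println(restored);
    }
}
```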
7. There is a table having 500 Million records. Now you want to copy the data of that table in some other table, what best approach you will choose.
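No answer is given in the source. One widely used approach, assuming the table lives in Hive, is a CREATE TABLE AS SELECT (or an INSERT OVERWRITE into a pre-created table), so the copy runs as a single distributed job across the cluster rather than row by row (table names and format are placeholders):

```sql
-- Copy ~500 million rows as one parallel job.
-- A columnar format such as ORC keeps the copy compact and fast to scan.
CREATE TABLE customer_copy
STORED AS ORC
AS SELECT * FROM customer;
```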
8. What type of data should we store in a fact table and a dimension table?
A fact table is defined by its grain, its most atomic level, and stores the numeric measurements of a business process. A dimension table should be wordy, descriptive, complete, and quality assured: it holds the textual context and report labels used to slice and filter the facts.
9. How is bucketing different from partitioning, and why do we use it?
Bucketing decomposes data into more manageable, roughly equal parts. With partitioning, there is a possibility of creating many small partitions, one per distinct column value. With bucketing, you restrict the data to a fixed number of buckets, defined in the table creation script.
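A Hive table can combine both, partitioned on a low-cardinality column and bucketed into a fixed number of files (names and counts below are illustrative):

```sql
-- Partitions: one directory per country value (can grow without bound).
-- Buckets: exactly 32 files per partition, assigned by hash(customer_id) % 32.
CREATE TABLE customer_bucketed (customer_id INT, name STRING)
PARTITIONED BY (country STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS;
```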
10. What is Fact Table and Dimension Table?
A dimension table contains the attributes of the measurements stored in the fact tables. A fact table contains the measurements of business processes, along with foreign keys to the dimension tables.
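A minimal star-schema sketch (these tables are hypothetical):

```sql
-- Dimension table: descriptive attributes of a customer.
CREATE TABLE dim_customer (customer_id INT, name STRING, city STRING);

-- Fact table: one row per sale at the chosen grain,
-- with a measurement (amount) and a foreign key to the dimension.
CREATE TABLE fact_sales (sale_id INT, customer_id INT, amount DECIMAL(10,2));
```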