Here is a list of Data Science interview questions recently asked at Larsen and Toubro. The questions cover both freshers and experienced professionals. Our **Data Science Training** answers all of the questions below.

### 1. Can you enumerate the various differences between Supervised and Unsupervised Learning?

The main difference between supervised and unsupervised learning is the use of labeled data. Supervised learning trains on inputs paired with known outputs (labels) and learns to predict those outputs, as in classification and regression. Unsupervised learning works on unlabeled data and looks for structure in it, as in clustering and dimensionality reduction.

### 2. What do you understand by the Selection Bias? What are its various types?

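Selection bias is an error that arises when the participants or data points in a study are not chosen at random, so the sample does not represent the population the researcher wants to draw conclusions about. Common types include sampling bias (a non-random sample), time-interval bias (ending a trial early at a convenient point), data bias (cherry-picking subsets that support a conclusion), and attrition bias (losing participants who drop out before the study ends).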

### 3. Please explain the goal of A/B Testing.

A/B testing allows individuals, teams, and companies to make careful changes to their user experiences while collecting data on the results. This lets them construct hypotheses and learn why certain elements of those experiences affect user behavior.

### 4. How will you calculate the Sensitivity of machine learning models?

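Sensitivity = TP / (TP + FN): the proportion of actual positives that the model predicted to be positive, where TP is the number of true positives and FN the number of false negatives. With c denoting the false negatives and d the true positives in a confusion matrix, this is d / (c + d). Sensitivity is also called recall or the true positive rate.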
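
The formula can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function name `sensitivity` and the sample labels are made up for the example, and 1 is assumed to mark the positive class.

```python
def sensitivity(y_true, y_pred, positive=1):
    """Return TP / (TP + FN), also known as recall or true positive rate."""
    # True positives: actual positives that were predicted positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    # False negatives: actual positives that were predicted negative.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive examples")
    return tp / (tp + fn)


# Hypothetical labels: 5 actual positives, 3 of them predicted correctly.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]
print(sensitivity(y_true, y_pred))
```

Note that the false positive at index 5 does not affect sensitivity, since only the actual positives enter the denominator.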

### 5. Could you draw a comparison between overfitting and underfitting?

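Overfitting is a modeling error that occurs when a function is fit too closely to a limited set of data points, so it captures noise and performs well on the training data but poorly on new data. Underfitting refers to a model that is too simple to capture the underlying pattern, so it can neither model the training data nor generalize to new data.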
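
The contrast can be illustrated with a small sketch using only the standard library. Everything here is an assumption made for the example: the synthetic data (y = 2x plus noise), a constant predictor standing in for an underfit model, and a 1-nearest-neighbour lookup that memorizes the training set standing in for an overfit model.

```python
import random

random.seed(0)


def make_data(n):
    # Hypothetical ground truth: y = 2x plus Gaussian noise.
    xs = [random.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + random.gauss(0, 1)) for x in xs]


train, test = make_data(30), make_data(30)


def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


mean_x = sum(x for x, _ in train) / len(train)
mean_y = sum(y for _, y in train) / len(train)


def underfit(x):
    # Too simple: ignores x and always predicts the training mean.
    return mean_y


def overfit(x):
    # Memorizes the training set: returns the label of the nearest training point.
    return min(train, key=lambda p: abs(p[0] - x))[1]


# Reasonable fit: ordinary least-squares line.
slope = sum((x - mean_x) * (y - mean_y) for x, y in train) / sum(
    (x - mean_x) ** 2 for x, _ in train
)
intercept = mean_y - slope * mean_x


def linear(x):
    return slope * x + intercept


for name, model in [("underfit", underfit), ("overfit", overfit), ("linear", linear)]:
    print(f"{name:8s} train MSE={mse(model, train):.2f}  test MSE={mse(model, test):.2f}")
```

The memorizing model reaches zero training error by construction, while the constant model has a large error on both sets; comparing the train and test columns is the usual way to tell the two failure modes apart.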

### 6. Between Python and R, which one would you pick for text analytics and why?

Python is generally the stronger choice for text analytics. Its pandas library provides easy-to-use data structures and high-performance data analysis tools, and it has a mature ecosystem of text-processing libraries such as NLTK and spaCy. R is well suited to statistical modeling, but Python is usually more convenient, and often faster, for text-processing workloads.

### 7. Please explain the role of data cleaning in data analysis.

Data cleaning (also called data cleansing) is the process of going through all of the data within a database and either removing or updating information that is incomplete, incorrect, improperly formatted, duplicated, or irrelevant. Clean data makes the analysis that follows more reliable.
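
As a rough sketch of what such a pass can look like, the snippet below normalizes names, drops rows with a missing or badly formatted age, and removes duplicates. The records, the field names, and the `clean` helper are all hypothetical; real pipelines would usually do this with a library such as pandas.

```python
raw_records = [
    {"name": " Alice ", "age": "34"},
    {"name": "Bob", "age": ""},       # incomplete: missing age
    {"name": "alice", "age": "34"},   # duplicate once the name is normalized
    {"name": "Carol", "age": "abc"},  # improperly formatted age
    {"name": "Dave", "age": "41"},
]


def clean(records):
    seen, cleaned = set(), []
    for rec in records:
        # Fix formatting: trim whitespace and normalize capitalization.
        name = rec["name"].strip().title()
        age_text = rec["age"].strip()
        # Drop rows whose age is missing or not a whole number.
        if not age_text.isdigit():
            continue
        row = (name, int(age_text))
        # Drop exact duplicates of rows already kept.
        if row in seen:
            continue
        seen.add(row)
        cleaned.append({"name": row[0], "age": row[1]})
    return cleaned


print(clean(raw_records))
```

Whether to drop or repair an invalid row is a design choice; this sketch drops them for brevity.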

### 8. What do you mean by cluster sampling and systematic sampling?

Cluster sampling divides the population into naturally occurring groups (clusters), randomly selects some of those clusters, and then studies every member (or a random sample of members) of the selected clusters. Systematic sampling selects a random starting point in the population and then takes every k-th element at a fixed interval, where the interval is the population size divided by the desired sample size.
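
Both schemes can be sketched with the standard `random` module. The function names, the numbered population, and the `school_*` clusters are illustrative assumptions; the cluster variant shown here keeps every member of each chosen cluster (one-stage cluster sampling).

```python
import random


def systematic_sample(population, n, rng):
    # Fixed interval between picks, with a random start inside the first interval.
    step = len(population) // n
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(n)]


def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters, then keep every member of the chosen ones.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for key in chosen for member in clusters[key]]


rng = random.Random(42)

population = list(range(100))
print(systematic_sample(population, 10, rng))

# Hypothetical clusters: 6 schools with 5 students each.
clusters = {f"school_{i}": [f"s{i}_{j}" for j in range(5)] for i in range(6)}
print(cluster_sample(clusters, 2, rng))
```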

### 9. What is A/B testing in Data Science?

A/B testing is a basic randomized controlled experiment. It compares two versions of a variable (A and B) in a controlled environment to find out which one performs better.
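
One common way to decide which version performs better is a two-proportion z-test on the conversion rates. The sketch below assumes that test is appropriate (large samples, independent users); the function name and the conversion counts are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 200/2000 conversions for A, 250/2000 for B.
z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

A small p-value (conventionally below 0.05) suggests the difference between the two versions is unlikely to be due to chance alone.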

### 10. What are the differences between overfitting and underfitting?

An overfit model has low error on the training set but high error on unseen data (high variance), typically because it is too complex for the amount of data available. An underfit model has high error on both the training set and unseen data (high bias), typically because it is too simple to capture the pattern.

### 11. What is underfitting?

Underfitting is a scenario in data science where a data model is unable to capture the relationship between the input and output variables accurately, generating a high error rate on both the training set and unseen data.

### 12. Why does data cleaning play a vital role in analysis?

Data cleaning helps analysis in two ways. Cleaning data from multiple sources transforms it into a format that data analysts and data scientists can work with. It also helps to increase the accuracy of machine learning models.

### 13. What are the types of machine learning?

- Supervised learning
- Unsupervised learning
- Reinforcement learning

### 14. Please explain Eigenvectors and Eigenvalues.

For a square matrix A, an eigenvector is a nonzero vector v that A only scales and does not rotate: Av = λv. The eigenvalue λ is the factor by which the eigenvector is stretched or shrunk. Eigenvectors are often referred to as right vectors, which simply means they are column vectors. Numerical libraries typically return the eigenvectors as a matrix with the same dimensions as the parent matrix, where each column is an eigenvector.
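
To make the relation Av = λv concrete, here is a small sketch that finds the dominant eigenvalue and eigenvector of a 2×2 matrix with power iteration. Power iteration is a technique chosen for this example, not something the answer above prescribes, and the matrix `A` and helper names are illustrative.

```python
def mat_vec(a, v):
    # Multiply matrix a (list of rows) by column vector v.
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def power_iteration(a, iterations=100):
    """Approximate the dominant eigenvalue and unit eigenvector of a."""
    v = [1.0] + [0.0] * (len(a) - 1)
    for _ in range(iterations):
        w = mat_vec(a, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient; v has unit length, so no division is needed.
    eigenvalue = sum(x * y for x, y in zip(mat_vec(a, v), v))
    return eigenvalue, v


A = [[2.0, 1.0], [1.0, 2.0]]
lam, v = power_iteration(A)
print(lam, v)

# Check the defining property: A v should equal lam * v.
print(mat_vec(A, v), [lam * x for x in v])
```

For this symmetric matrix the eigenvalues are 3 and 1, so the iteration converges to the eigenvector along the direction (1, 1).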

### 15. Can you compare the validation set with the test set?

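The term “validation set” is sometimes used interchangeably with the term “test set”, since both refer to a sample of the dataset held back from training the model, but strictly they serve different purposes. The model is fit on the training set, and the fitted model is used to predict the responses for the observations in the validation set, which guides model selection and hyperparameter tuning. The test set is used only once, at the end, to give an unbiased estimate of how the final model performs on unseen data.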
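
A three-way split can be sketched as follows. The 60/20/20 proportions, the function name, and the use of plain integers as rows are assumptions made for the example.

```python
import random


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    rows = list(rows)
    # Shuffle first so each subset is a random sample of the data.
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]                   # touched only for the final evaluation
    val = rows[n_test:n_test + n_val]      # used for tuning and model selection
    train = rows[n_test + n_val:]          # used to fit the model
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```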

### 16. What do you understand by linear regression and logistic regression?

Linear regression finds the best-fitting line to predict a continuous value. Logistic regression goes one step further and passes the linear output through the sigmoid function, producing a probability between 0 and 1 that is used for classification. Linear regression is typically fit by minimizing the mean squared error, whereas logistic regression is fit by maximum likelihood estimation, which is equivalent to minimizing the log loss (cross-entropy).
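
The difference in prediction and loss can be sketched for a single feature. The function names and the parameters `w` and `b` are illustrative; no fitting is performed here, only the forward computation and the two loss functions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear_predict(w, b, x):
    # Linear regression: the output is an unbounded real value.
    return w * x + b


def logistic_predict(w, b, x):
    # Logistic regression: the same linear score squashed into (0, 1).
    return sigmoid(w * x + b)


def mse(y_true, y_pred):
    # Mean squared error, the usual loss for linear regression.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, p_pred):
    # Negative mean log-likelihood of binary labels under predicted probabilities.
    return -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p) for t, p in zip(y_true, p_pred)
    ) / len(y_true)


print(linear_predict(2.0, 1.0, 3.0))    # a raw value
print(logistic_predict(2.0, 1.0, 3.0))  # a probability
```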

### 17. Please explain Recommender Systems along with an application.

In a recommendation-system application there are two classes of entities, referred to as users and items. The data is represented as a utility matrix that gives, for each user-item pair, a value representing what is known about that user's degree of preference for that item. A typical application is an online store or streaming service that suggests products or films a user has not yet rated.
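
A utility matrix is usually sparse, so a dictionary of dictionaries is a convenient toy representation. The sketch below recommends items rated by the most similar user, using cosine similarity; the users, films, ratings, and the nearest-user strategy are all assumptions for illustration, not a production approach.

```python
# Hypothetical utility matrix: user -> {item: rating}; missing pairs are unknown.
utility = {
    "ana": {"Inception": 5, "Titanic": 1, "Up": 4},
    "ben": {"Inception": 4, "Up": 5, "Coco": 5},
    "cara": {"Titanic": 5, "Coco": 2},
}


def cosine(u, v):
    # Dot product over shared items divided by the product of vector lengths.
    dot = sum(u[item] * v[item] for item in u if item in v)
    norm_u = sum(r * r for r in u.values()) ** 0.5
    norm_v = sum(r * r for r in v.values()) ** 0.5
    return dot / (norm_u * norm_v)


def recommend(user, utility):
    # Find the most similar other user, then suggest what they rated
    # that the target user has not rated yet, best rating first.
    others = [u for u in utility if u != user]
    nearest = max(others, key=lambda u: cosine(utility[user], utility[u]))
    unseen = {i: r for i, r in utility[nearest].items() if i not in utility[user]}
    return sorted(unseen, key=unseen.get, reverse=True)


print(recommend("ana", utility))
```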

### 18. Could you explain how to define the number of clusters in a clustering algorithm?

The optimal number of clusters can be estimated with the elbow method. Run the clustering algorithm (e.g., k-means) for different values of k. For each k, calculate the total within-cluster sum of squares (WSS). Plot the curve of WSS against the number of clusters k. The location of the bend (elbow) in the plot is generally taken as an indicator of the appropriate number of clusters.
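
The procedure can be sketched with a tiny one-dimensional k-means written from scratch. The data, the deterministic initialization, and the function name `kmeans_1d` are assumptions for the example; in practice a library implementation would be used and the WSS values plotted.

```python
def kmeans_1d(points, k, iterations=50):
    """Very small 1-D k-means; returns (centers, within-cluster sum of squares)."""
    pts = sorted(points)
    # Deterministic initialization: spread the starting centers across the sorted data.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    wss = sum(min((p - c) ** 2 for c in centers) for p in pts)
    return centers, wss


# Hypothetical data with three visible groups around 1, 5, and 9.
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]

for k in range(1, 6):
    _, wss = kmeans_1d(data, k)
    print(k, round(wss, 3))
```

On this data the WSS drops sharply up to k = 3 and only marginally afterwards, which is the elbow the method looks for.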

### 19. What do you understand by Deep Learning?

Deep learning is a branch of machine learning that uses multi-layer artificial neural networks, loosely inspired by the workings of the human brain, for tasks such as detecting objects, recognizing speech, translating languages, and making decisions. Deep learning models can learn useful features directly from unstructured data, and some approaches can also learn from unlabeled data.

### 20. How does Backpropagation work? Also, state its various variants.

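Backpropagation is a way of propagating the total loss back through the neural network to determine how much of the loss each node is responsible for. After a forward pass, the chain rule is applied layer by layer to obtain the gradient of the loss with respect to every weight, and the weights are then updated using those gradients. There are three main variations, distinguished by how much data is used for each weight update: stochastic (also called online), batch, and mini-batch.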
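
The three variants differ only in the batch size used per update, which the following sketch makes explicit. It is deliberately reduced to a single linear neuron, so the backward pass is one application of the chain rule rather than a full multi-layer backpropagation; the function name, learning rate, and data are assumptions for illustration.

```python
import random


def train_linear_neuron(data, batch_size, epochs=2000, lr=0.02, seed=0):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # Backward pass: for L = mean((w*x + b - y)^2), the chain rule gives
            # dL/dw = mean(2 * error * x) and dL/db = mean(2 * error).
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Hypothetical noiseless data generated from y = 3x + 1.
data = [(x, 3 * x + 1) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

print("stochastic:", train_linear_neuron(data, batch_size=1))
print("mini-batch:", train_linear_neuron(data, batch_size=2))
print("batch:     ", train_linear_neuron(data, batch_size=len(data)))
```

Smaller batches give more frequent but noisier updates, while the full batch gives one smooth update per pass over the data.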

### 21. What do you know about Autoencoders?

Autoencoders are self-supervised machine learning models that are trained to recreate their own input. An encoder compresses the input into a smaller representation and a decoder rebuilds the input from it, which makes them useful for reducing the size (dimensionality) of data. They are called self-supervised because they are trained like supervised models, but the target is the input itself, so no human-provided labels are needed.

### 22. What are the types of machine learning?

The main types are supervised, unsupervised, and reinforcement learning. There are also hybrid types, such as semi-supervised and self-supervised learning, and broader techniques such as active, online, and transfer learning.

### 23. What do you mean by Deep Learning and Why has it become popular now?

Deep Learning is a subfield of machine learning concerned with artificial neural networks, algorithms inspired by the structure and function of the brain. It has gained popularity because its accuracy tends to keep improving when it is trained on very large amounts of data, and both that data and the computing power needed to process it are now widely available.

### 24. How regularly must an algorithm be updated?

How often an algorithm is updated depends on how it is used and how quickly its data or requirements change. For example, Google is reported to change its search algorithm around 500 to 600 times each year, although most of these updates are small and often go unnoticed by users.

### 25. What Is Power Analysis?

Power analysis is the process of estimating one of four related quantities given values for the other three: effect size, sample size, significance level (alpha), and statistical power. It is commonly used to estimate the minimum sample size needed to carry out an experiment.
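
As a sketch of the most common use, the snippet below estimates the sample size per group for comparing two means. It assumes a two-sided test, equal group sizes, and the normal approximation n = 2 · ((z₁₋α/₂ + z_power) / d)², where d is the standardized effect size (Cohen's d); the function name is made up for the example, and exact t-based calculations give slightly larger values.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Medium effect (d = 0.5), 5% significance level, 80% power.
print(sample_size_per_group(0.5))
```

Halving the effect size roughly quadruples the required sample, which is why small effects are expensive to detect.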