Here is a list of Data Science interview questions recently asked at TCS. These questions are suitable for both freshers and experienced professionals. Our **Data Science Training** answers all of the questions below.

### 1. What Are the Feature Vectors?

A feature vector is just a vector containing multiple elements (features). The features may represent a pixel or a whole object in an image. Examples of features are color components, length, area, circularity, gradient magnitude, gradient direction, or simply the gray-level intensity value.

### 2. Explain the Steps in Making a Decision Tree.

- Take the entire dataset as input.
- Compute the entropy of the target variable.
- For each attribute, calculate the information gain obtained by splitting on it.
- Choose the attribute with the highest information gain as the root node.
- Split the data on that attribute and repeat the procedure on each branch until the leaf nodes are finalized.
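For a decision tree in the machine-learning sense, the split at each node is commonly chosen by information gain. The sketch below is a minimal illustration of that one step, assuming categorical features; the toy data and the names `entropy` and `information_gain` are made up for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    """Entropy reduction obtained by splitting on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Hypothetical toy data: (outlook, windy) -> play?
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"), ("rainy", "yes")]
labels = ["yes", "yes", "no", "no"]

# Information gain of each feature; the best one would become the root split.
gains = [information_gain(rows, labels, i) for i in range(2)]
best_feature = max(range(2), key=lambda i: gains[i])
print(gains, best_feature)
```

In this toy data the first feature separates the classes perfectly, so it would be chosen as the root; a full tree builder would then recurse on each resulting subset.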

### 3. What Is Root Cause Analysis?

Root cause analysis (RCA) is the process of discovering the root causes of problems in order to identify appropriate solutions. RCA assumes that it is much more effective to systematically prevent and solve for underlying issues rather than just treating ad hoc symptoms and putting out fires.

### 4. What Is Logistic Regression?

A logistic regression model predicts a categorical (typically binary) dependent variable by analyzing its relationship with one or more independent variables. Logistic regression is a linear algorithm with a non-linear (sigmoid) transform on the output: it assumes a linear relationship between the input variables and the log-odds of the output.
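As a rough illustration of the "linear model plus non-linear transform" idea, the following sketch computes a predicted probability from a weighted sum of inputs. The coefficients are hypothetical values chosen for the example, not fitted to any data.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Linear combination of the inputs, passed through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients, assumed for illustration only.
weights = [0.8, -0.5]
bias = 0.1

p = predict_proba(weights, bias, [2.0, 1.0])
label = 1 if p >= 0.5 else 0  # 0.5 is a conventional, adjustable cutoff
print(p, label)
```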

### 5. What Are Recommender Systems?

Data science helps companies make better decisions, and recommender systems are one of the tools data scientists use to that end. A recommender system predicts how much a user will prefer each item in a set of items and recommends the top items.

### 6. Explain Cross-Validation.

Cross-validation, also referred to as out-of-sample testing, is an essential element of a data science project. It is a resampling procedure used to evaluate machine learning models and assess how a model will perform on an independent test dataset.
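One common variant is k-fold cross-validation, where the data is split into k parts and each part serves once as the test set. The sketch below only produces the index splits; the helper name `k_fold_indices` is an assumption for this example, and in practice the indices would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

A model would be fitted on each `train` split and scored on the matching `test` split, and the scores averaged.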

### 7. What Is Collaborative Filtering?

Collaborative filtering (CF) is a technique used by recommender systems. There are multiple ways to find similar users or items and multiple ways to calculate a predicted rating from the ratings of those similar users. The prediction is based only on the ratings (explicit or implicit) that users give to items.
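A minimal sketch of user-based collaborative filtering follows, assuming cosine similarity over co-rated items and a similarity-weighted average as the prediction. Both are just one choice among several, and the ratings are invented for illustration.

```python
import math

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        numerator += sim * other_ratings[item]
        denominator += abs(sim)
    return numerator / denominator if denominator else None

predicted = predict_rating("alice", "D")
print(predicted)
```

Because alice's ratings resemble bob's more than carol's, the prediction for item D lands closer to bob's rating.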

### 8. Do Gradient Descent Methods Always Converge to the Same Point?

Not always. Gradient descent is designed to find optimal points, but these are not necessarily global optima. The result depends on where you start (the initialization), and on a non-convex function it is easy to get stuck in a local minimum. If every run starts from the same point with the same settings, it will converge to the same minimum.
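The dependence on the starting point can be seen on a small non-convex function. The sketch below assumes f(x) = x^4 - 3x^2 + x, which has two minima; the function, learning rate, and step count are arbitrary choices for illustration.

```python
def gradient(x):
    """Derivative of f(x) = x**4 - 3*x**2 + x, a function with two minima."""
    return 4 * x**3 - 6 * x + 1

def gradient_descent(start, learning_rate=0.01, steps=1000):
    """Plain gradient descent with a fixed step size."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x

left = gradient_descent(-2.0)   # settles in the minimum near x = -1.30
right = gradient_descent(2.0)   # settles in the other minimum near x = 1.13
print(left, right)
```

Two different initializations end in two different minima, while repeating a run from the same start reproduces the same result.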

### 9. What Is the Goal of A/B Testing?

A/B testing is a basic randomized controlled experiment: a way to compare two versions of a variable to find out which performs better in a controlled environment. It allows individuals, teams, and companies to make careful changes to their user experiences while collecting data on the results.
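One common way to judge whether version B really outperforms version A is a two-proportion z-test on conversion rates. The sketch below assumes that test and uses invented visitor and conversion counts; other tests may suit other metrics better.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return the z statistic and two-sided p-value for rate B vs rate A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 200 of 2000 visitors convert on A, 260 of 2000 on B.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(z, p_value)
```

A small p-value suggests the difference in conversion rates is unlikely to be due to chance alone.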

### 10. What Are the Drawbacks of the Linear Model?

Linear regression assumes a linear relationship between the input and output variables, so it fails to fit complex datasets properly. In most real-life scenarios the relationship between the variables of the dataset isn’t linear, and hence a straight line doesn’t fit the data well.

### 11. What Is the Difference Between Machine Learning and Data Mining?

Data mining is designed to extract the rules from large quantities of data, while machine learning teaches a computer how to learn and comprehend the given parameters. Data mining is used on an existing dataset (like a data warehouse) to find patterns. Machine learning, on the other hand, is trained on a ‘training’ data set.

### 12. Explain Star Schema?

Star schema is a database organizational structure optimized for use in a data warehouse or business intelligence that uses a single large fact table to store transactional or measured data and one or more smaller dimensional tables that store attributes about the data.

### 13. What is meant by supervised and unsupervised learning in data?

Supervised: All data is labeled, and the algorithms learn to predict the output from the input data. Unsupervised: All data is unlabeled, and the algorithms learn the inherent structure of the input data.

### 14. Describe the structure of Artificial Neural Networks?

An ANN is made of three types of layers: an input layer, one or more hidden layers, and an output layer. The nodes of the input layer are connected to the nodes of the hidden layer, and each hidden-layer node is connected to the nodes of the output layer. The input layer receives the data from outside the network.

### 15. How can you assess a good logistic model?

**Measuring the performance of Logistic Regression**

- One can evaluate it by looking at the confusion matrix and counting the misclassifications (using some probability value as the cutoff).
- One can evaluate it by looking at statistical tests such as the deviance or individual z-scores.
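The confusion-matrix approach can be sketched as follows, assuming binary labels and a list of predicted probabilities. The example values and the helper name `confusion_matrix` are made up for illustration.

```python
def confusion_matrix(y_true, probabilities, cutoff=0.5):
    """Count true/false positives and negatives at a given probability cutoff."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, prob in zip(y_true, probabilities):
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["tn" if actual == 0 else "fn"] += 1
    return counts

# Hypothetical labels and model outputs.
y_true = [1, 0, 1, 1, 0, 0]
probs = [0.9, 0.4, 0.35, 0.8, 0.6, 0.1]

cm = confusion_matrix(y_true, probs)
misclassified = cm["fp"] + cm["fn"]
accuracy = (cm["tp"] + cm["tn"]) / len(y_true)
print(cm, misclassified, accuracy)
```

Changing `cutoff` trades false positives against false negatives, which is why the cutoff should be stated alongside the counts.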

### 16. What Is the Law of Large Numbers?

The law of large numbers is a theorem from probability and statistics stating that as an experiment is repeated more and more times, the average of the results approaches the true or expected value. It assumes that all sample observations are drawn independently from the same idealized population of observations.
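A quick simulation makes the idea concrete. The sketch below assumes a fair six-sided die, whose expected value is 3.5, and uses a fixed seed so runs are repeatable.

```python
import random

def mean_of_die_rolls(n_rolls, seed=0):
    """Average of n_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)
    total = sum(rng.randint(1, 6) for _ in range(n_rolls))
    return total / n_rolls

# The sample mean tends to move closer to 3.5 as the number of rolls grows.
for n in (10, 1000, 100000):
    print(n, mean_of_die_rolls(n))
```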

### 17. What Are Confounding Variables?

Two variables are confounded when their effects cannot be separated from each other. When approaching a data science project, this problem is encountered when there is a variable other than the predictor variable that may have caused the effect being studied.

### 19. How Regularly Must an Algorithm be Updated?

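An algorithm is a set of rules or instructions that a computer program follows to perform calculations or other problem-solving functions. Important data science algorithms include regression, classification and clustering techniques, decision trees, and random forests. There is no fixed update schedule: an algorithm should be updated when the underlying data source changes, when the data it sees drifts away from the data it was trained on, or when its measured performance drops below an acceptable level.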

### 20. What Are Eigenvalue and Eigenvector?

An eigenvector of a square matrix A is a non-zero vector v whose direction is unchanged when A is applied to it: Av = λv. The scalar λ is the corresponding eigenvalue and gives the factor by which the eigenvector is stretched or shrunk. Eigenvectors are often normalized to unit length (magnitude 1) and written as column vectors, which is why they are sometimes referred to as right vectors.
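The defining relation Av = λv can be checked by hand on a small matrix. The sketch below assumes a 2x2 matrix with real eigenvalues and solves its characteristic polynomial directly; in practice a numerical library would normally be used instead.

```python
import math

def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix, assuming they are real."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    disc = math.sqrt(trace**2 - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2

def mat_vec(m, v):
    """Multiply a 2x2 matrix by a 2-element vector."""
    return [sum(m[i][j] * v[j] for j in range(2)) for i in range(2)]

A = [[2, 1], [1, 2]]
lam1, lam2 = eigenvalues_2x2(A)

# For this symmetric matrix, [1, 1] and [1, -1] are eigenvectors.
v1, v2 = [1, 1], [1, -1]
print(mat_vec(A, v1), [lam1 * x for x in v1])
print(mat_vec(A, v2), [lam2 * x for x in v2])
```

Multiplying each eigenvector by A only rescales it by its eigenvalue, which is exactly what the two printed pairs compare.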

### 21. Why Is Resampling Done?

One of the reasons we use resampling is that we can analyze the results of a model fitted over many samples of the same dataset. This allows us to obtain some additional knowledge that would not have been obtained by fitting the model just once.
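The bootstrap is one widely used resampling method. The sketch below assumes a small made-up sample and estimates a rough 95% percentile interval for its mean by resampling with replacement; the number of resamples is an arbitrary choice.

```python
import random
import statistics

def bootstrap_means(data, n_resamples=1000, seed=0):
    """Means of n_resamples samples drawn with replacement from data."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    ]

# Hypothetical measurements.
data = [2.1, 2.5, 3.0, 3.2, 3.8, 4.1, 4.4, 5.0]

means = sorted(bootstrap_means(data))
# Approximate 95% percentile interval: drop 2.5% from each tail.
lower, upper = means[25], means[974]
print(statistics.mean(data), lower, upper)
```

Fitting once gives a single estimate of the mean; the spread of the resampled means indicates how uncertain that estimate is.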

### 22. Explain Selection Bias.

Selection bias is the bias introduced when individuals, groups, or data are selected for analysis in such a way that proper randomization is not achieved, so the sample obtained is not representative of the population intended to be analyzed. It is sometimes referred to as the selection effect.

### 23. What Are the Types of Biases That Can Occur During Sampling?

- Observer Bias
- Self-Selection/Voluntary Response Bias
- Survivorship Bias
- Recall Bias
- Exclusion Bias

### 24. Explain Survivorship Bias.

Survivorship bias or survivor bias is the tendency to view the performance of existing stocks or funds in the market as a representative comprehensive sample without regarding those that have gone bust.

### 25. How Do You Work Towards a Random Forest?

The random forest is an ensemble algorithm, used for classification and regression, consisting of many decision trees. It uses bagging and feature randomness when building each individual tree to try to create an uncorrelated forest of trees whose prediction by committee is more accurate than that of any individual tree.

### 26. Can you name the type of biases that occur in machine learning?

- Sample Bias
- Prejudice Bias
- Confirmation Bias
- Group attribution Bias

### 27. Do you think 50 small decision trees are better than a large one? Why?

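Yes, in most cases. A shorter tree usually generalizes better and trains faster, and combining 50 of them creates a more robust model that is less subject to overfitting than one large tree. Each small tree is also easier to interpret on its own. Tree size is controlled by hyperparameters such as the minimum number of samples required to split a node.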

### 28. Can you write the formula to calculate R-square?

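- Calculate the predicted values, subtract them from the actual values, square the results, and sum them. This is the residual sum of squares, SS_res (unexplained variance).
- Subtract the mean of the actual values from each actual value, square the results, and sum them. This is the total sum of squares, SS_tot (total variance).
- Divide the first sum by the second and subtract the result from one: R-squared = 1 - SS_res / SS_tot.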
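The calculation can be sketched in a few lines of Python, using the standard definition R-squared = 1 - SS_res / SS_tot. The actual and predicted values below are invented for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: variation the model fails to explain.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: variation of the actual values around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]

score = r_squared(y_true, y_pred)
print(score)
```

Here SS_res is 0.75 and SS_tot is 20, so the score is 1 - 0.0375 = 0.9625.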

### 29. What does NLP stand for?

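In data science, NLP stands for Natural Language Processing, the branch of artificial intelligence concerned with enabling computers to process, analyze, and generate human language, for example in sentiment analysis, machine translation, and text classification. It should not be confused with neuro-linguistic programming, an unrelated approach to changing someone’s thoughts and behaviors that shares the acronym.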

### 30. Do gradient descent methods always converge to same point?

Not always. In gradient descent, the result depends on where you start (the initialization), and it is very easy to get stuck in a local minimum. If every run starts from the same point, it will converge to the same minimum.