Here is a list of Data Science interview questions recently asked at Wipro. The questions are suitable for both freshers and experienced professionals. Our **Data Science Training** covers answers to all of the questions below.

### 1. Python or R – Which one would you prefer for text analytics?

Python is usually the better option for text analytics. Its Pandas library provides easy-to-use data structures and high-performance data analysis tools, and Python has a mature ecosystem of text-processing libraries. R is well suited to statistical modelling and machine learning, but it is less commonly used for text work, and Python tends to perform faster on typical text analytics tasks.

### 2. Differentiate between univariate, bivariate and multivariate analysis.

Univariate analysis summarizes one variable at a time, for example the readings from a sensor that measures the temperature of a room every second. Bivariate analysis examines the relationship between two variables. Multivariate analysis examines more than two variables at once.

### 3. Define some key performance indicators for the product

- Key performance indicators (KPIs) measure a company’s success versus a set of targets, objectives, or industry peers.
- KPIs can be financial, including net profit (the bottom line), gross profit margin (revenues minus certain expenses), or the current ratio (liquidity and cash availability).

### 4. Which technique is used to predict categorical responses?

Classification techniques are used to predict categorical responses, and logistic regression is a common choice when the response is binary. ANOVA (analysis of variance), by contrast, is used when the target variable is continuous and the independent variables are categorical. For example, a classification model can be developed to predict a binary feature such as Survived. Categorical predictors are represented using 0 and 1 for dichotomous variables, or using indicator (dummy) variables for variables with more than two categories.

### 5. What is logistic regression? Or state an example of when you have used logistic regression recently.

Logistic regression is a statistical analysis method used to predict a categorical outcome based on prior observations of a data set. It is used when the dependent variable (target) is categorical. For example, it can predict whether an email is spam (1) or not (0), or whether a tumor is malignant (1) or not (0).
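As a rough illustration of how a fitted logistic regression turns inputs into a 0/1 prediction, the sketch below applies the sigmoid function to a weighted sum of features. The weights, bias, and feature values are hypothetical and hand-picked for the example; in practice they would be learned from labeled data.

```python
import math

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of class 1 for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict_label(features, weights, bias, threshold=0.5):
    """Class 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

# Hypothetical coefficients (assumed, not fitted to any real data).
weights = [1.5, 2.0]
bias = -3.0

print(predict_label([0, 0], weights, bias))  # weighted sum is -3, so class 0
print(predict_label([2, 1], weights, bias))  # weighted sum is 2, so class 1
```

The 0.5 threshold is a common default, not a requirement; it can be moved when false positives and false negatives have different costs.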

### 6. What are Recommender Systems?

Recommender systems are tools for navigating large and complex information spaces: they prioritize the items in these spaces that are likely to be of interest to the user. A recommender system copes with the large volume of available information by filtering out the most relevant items based on the data it has about users and items.

### 7. Why does data cleaning play a vital role in analysis?

Data cleaning, or data cleansing, is an important part of preparing data for analysis. It helps because cleaning data from multiple sources transforms it into a consistent format that data analysts or data scientists can work with.

### 8. What does NLP stand for?

NLP stands for Natural Language Processing, which refers to computer programs that understand or generate speech or text. It is a sub-field of artificial intelligence and regularly makes use of machine learning techniques.

### 9. What do you understand by the term Normal Distribution?

A normal distribution is a common probability distribution with a symmetric shape often referred to as a “bell curve.” Many everyday data sets approximately follow a normal distribution, for example the heights of adult humans, the scores on a test given to a large class, and errors in measurements.

### 10. Explain Cross-validation?

Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample. The procedure has a single parameter called k that refers to the number of groups that a given data sample is to be split into. As such, the procedure is often called k-fold cross-validation.
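The splitting step of k-fold cross-validation can be sketched as follows. This is a minimal illustration using only the standard library; the helper name `k_fold_indices` and the `seed` parameter are assumptions made for the example, and model training and scoring on each split are left out.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Return k (train_indices, test_indices) pairs covering all samples."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly

    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size

    splits = []
    for i in range(k):
        test_idx = folds[i]  # fold i is held out for evaluation
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits

for train_idx, test_idx in k_fold_indices(n_samples=10, k=5):
    print(len(train_idx), len(test_idx))  # 8 training and 2 test samples per split
```

Each sample appears in exactly one test fold, so every observation is used for evaluation once and for training k - 1 times.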

### 11. How do you define data science?

Data science is a combination of algorithms, tools, and machine learning techniques that helps you find hidden patterns in raw data. It encompasses preparing data for analysis, including cleansing, aggregating, and manipulating the data, in order to perform advanced data analysis.

### 12. What is meant by supervised and unsupervised learning in data?

Supervised learning algorithms are trained using labeled data: the input data is provided to the model along with the expected output. Unsupervised learning algorithms are trained using unlabeled data, and the model finds hidden patterns in the data on its own.

### 13. What are the variants of Back Propagation?

The most common technique used to train a neural network is the back-propagation algorithm. There are three main variations of back-propagation: stochastic (also called online), batch and mini-batch.

### 14. What is a Random Forest?

Random forest is a supervised learning algorithm. The “forest” it builds is an ensemble of decision trees, usually trained with the “bagging” method. The general idea of bagging is that combining several learning models improves the overall result.

### 15. What is Collaborative filtering?

Collaborative filtering (CF) is a technique used by recommender systems. It is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating).

### 16. What is Linear Regression?

Linear regression is a linear model that assumes a linear relationship between the input variables (independent variables, ‘x’) and the output variable (dependent variable, ‘y’), such that ‘y’ can be calculated from a linear combination of the input variables.
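For a single input variable, the linear relationship can be estimated with ordinary least squares. The sketch below is one simple way to do it in plain Python; the function name and the toy data (generated from y = 2x + 1) are assumptions for illustration only.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)  # 2.0 1.0
```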

### 17. What is Interpolation and Extrapolation?

When we predict values that fall within the range of data points taken it is called interpolation. When we predict values for points outside the range of data taken it is called extrapolation.
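The distinction can be made concrete with a small sketch. It assumes a hypothetical line y = 2x + 1 that was fitted on x values observed between 1 and 4, and labels each prediction by whether the query point lies inside that observed range.

```python
def predict(x, slope=2.0, intercept=1.0):
    """Prediction from a hypothetical fitted line (assumed coefficients)."""
    return slope * x + intercept

def prediction_kind(x, observed_xs):
    """'interpolation' inside the observed range, otherwise 'extrapolation'."""
    low, high = min(observed_xs), max(observed_xs)
    return "interpolation" if low <= x <= high else "extrapolation"

observed_xs = [1, 2, 3, 4]
for x in (2.5, 10):
    print(x, predict(x), prediction_kind(x, observed_xs))
```

Extrapolated predictions are generally less reliable, because nothing in the observed data confirms that the same relationship holds outside the sampled range.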

### 18. What is power analysis?

Statistical power is the probability that a hypothesis test finds an effect when there is an effect to be found. A power analysis can be used to estimate the minimum sample size required for an experiment, given a desired significance level, effect size, and statistical power.
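As a rough sketch of a power analysis, the function below estimates the sample size per group for a two-sided comparison of two group means. It assumes the normal approximation, a standardized effect size (Cohen's d), and equal group sizes; the function name and defaults are illustrative, and dedicated statistics packages use more exact calculations.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

print(sample_size_per_group(0.5))  # 63 per group under these assumptions
```

Smaller effects require larger samples: halving the effect size roughly quadruples the required sample size.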

### 19. What is the difference between Cluster and Systematic Sampling?

Systematic sampling selects a random starting point in the population and then takes a sample at regular, fixed intervals, with the interval determined by the population size and the desired sample size. Cluster sampling divides the population into clusters, takes a simple random sample of those clusters, and then surveys the members of the selected clusters.
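The two schemes can be sketched with the standard library as follows. The helper names, the `seed` parameter, and the toy population are assumptions for illustration. The cluster example assumes the common one-stage form, in which whole clusters are drawn at random and every member of a chosen cluster is kept.

```python
import random

def systematic_sample(population, n, seed=0):
    """Pick a random start, then take every step-th element."""
    step = len(population) // n
    start = random.Random(seed).randrange(step)
    return population[start::step][:n]

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly choose whole clusters and keep all of their members."""
    chosen = random.Random(seed).sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

population = list(range(100))
print(systematic_sample(population, 10))  # 10 values spaced 10 apart

# Hypothetical clusters, e.g. customers grouped by city.
clusters = {"A": [1, 2, 3], "B": [4, 5], "C": [6, 7, 8, 9], "D": [10]}
print(cluster_sample(clusters, 2))  # all members of 2 randomly chosen clusters
```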

### 20. Are expected value and mean value different?

The expected value and the mean value are the same. To find the expected value, E(X), or mean μ, of a discrete random variable X, multiply each value of the random variable by its probability and add the products. The formula is:

$$E(X) = \mu = \sum_{x} x \, P(x)$$

### 21. What does P-value signify about the statistical data?

In statistics, the p-value is the probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is correct. A smaller p-value means that there is stronger evidence against the null hypothesis and in favor of the alternative hypothesis.

### 22. How can you make data normal using the Box-Cox transformation?

The statisticians George Box and David Cox developed a procedure to identify an appropriate exponent (lambda, λ) to use to transform data into a “normal shape.” The lambda value indicates the power to which all data should be raised.
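A minimal sketch of the transformation itself is shown below, assuming the standard one-parameter form: (x^λ - 1) / λ for λ ≠ 0 and log(x) for λ = 0, defined only for positive data. The function name is an assumption, and the search for the best λ (usually done by maximum likelihood) is not shown.

```python
import math

def box_cox(values, lam):
    """Apply the one-parameter Box-Cox transform to positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]  # limiting case as lambda -> 0
    return [(v ** lam - 1) / lam for v in values]

print(box_cox([1, 4, 9], 0.5))  # square-root-like transform: [0.0, 2.0, 4.0]
print(box_cox([1, 4, 9], 1))    # lambda = 1 only shifts the data: [0.0, 3.0, 8.0]
```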

### 23. What is the goal of A/B Testing?

A/B testing is a basic randomized controlled experiment. Its goal is to compare two versions of a variable and find out which one performs better in a controlled environment.
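One common way to decide whether version B really outperforms version A is a two-proportion z-test on conversion counts. The sketch below is an illustration under that assumption, not the only valid analysis; the function name and the visitor and conversion numbers are made up for the example.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under "no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical experiment: 1000 visitors per version.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(round(z, 2), p_value < 0.05)  # z is about 3.38, significant at the 5% level
```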

### 24. What is an Eigenvalue and Eigenvector?

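An eigenvector of a square matrix A is a non-zero vector v that is only scaled when A is applied to it, so that Av = λv. The scalar λ is the corresponding eigenvalue: the factor by which the eigenvector is stretched or shrunk. Eigenvectors are usually normalized to unit vectors, meaning their length or magnitude is equal to 1, and they are often referred to as right vectors, which simply means column vectors.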
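The defining relation Av = λv (matrix times eigenvector equals eigenvalue times eigenvector) can be checked on a small example. The sketch below handles only a symmetric 2x2 matrix with a non-zero off-diagonal entry, using the characteristic polynomial; the helper names are assumptions, and real projects would normally call a linear algebra library instead.

```python
import math

def eig_2x2_symmetric(a, b, d):
    """Eigenpairs of [[a, b], [b, d]] (b != 0), eigenvectors scaled to length 1."""
    trace, det = a + d, a * d - b * b
    disc = math.sqrt(trace * trace - 4 * det)
    pairs = []
    for lam in ((trace + disc) / 2, (trace - disc) / 2):
        v = [b, lam - a]  # solves (A - lam * I) v = 0
        norm = math.hypot(v[0], v[1])
        pairs.append((lam, [c / norm for c in v]))
    return pairs

def matvec(matrix, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

A = [[2.0, 1.0], [1.0, 2.0]]
for lam, v in eig_2x2_symmetric(2.0, 1.0, 2.0):
    # A @ v and lam * v should match up to rounding error.
    print(lam, matvec(A, v), [lam * c for c in v])
```

For this matrix the eigenvalues are 3 and 1, with unit eigenvectors along (1, 1) and (1, -1).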

### 25. What is Gradient Descent?

Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of the gradient (or approximate gradient) of the function at the current point, because this is the direction of steepest descent.
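The update rule can be sketched in a few lines. This example minimizes the hypothetical function f(x) = (x - 3)^2, whose gradient is 2(x - 3); the learning rate and step count are arbitrary choices for illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)  # move in the direction of steepest descent
    return x

def grad_f(x):
    """Gradient of f(x) = (x - 3) ** 2."""
    return 2 * (x - 3)

x_min = gradient_descent(grad_f, x0=0.0)
print(round(x_min, 6))  # 3.0, the minimizer of f
```

If the learning rate is too large the iterates can overshoot and diverge, and if it is too small convergence becomes slow.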

### 26. Do gradient descent methods always converge to same point?

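Not always. Where gradient descent ends up depends on where you start (the initialization). On a non-convex function it is easy to get stuck in a local minimum, so different starting points can converge to different minima. If you start from the same point with the same settings each time, deterministic gradient descent will converge to the same minimum. For a convex function with a suitable step size, it converges to the global minimum regardless of the starting point.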
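The dependence on the starting point can be demonstrated on a function with two minima. The sketch below uses the hypothetical function f(x) = (x^2 - 1)^2, which has minima at x = 1 and x = -1; the learning rate and step count are arbitrary choices.

```python
def gradient_descent(grad, x0, learning_rate=0.05, steps=200):
    """Plain gradient descent from the starting point x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

def grad_double_well(x):
    """Gradient of f(x) = (x ** 2 - 1) ** 2."""
    return 4 * x * (x ** 2 - 1)

right = gradient_descent(grad_double_well, x0=0.5)
left = gradient_descent(grad_double_well, x0=-0.5)
print(round(right, 6), round(left, 6))  # 1.0 -1.0: two starts, two different minima
```

Running the function twice from the same start returns the same result, because nothing in this version is random.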

### 27. What are various steps involved in an analytics project?

- Data Preparation.
- Data Modelling.
- Validation.
- Implementation of the Model and Tracking.

### 28. What are the basic assumptions to be made for linear regression?

Four assumptions are associated with a linear regression model:

- Linearity: The relationship between X and the mean of Y is linear.
- Homoscedasticity: The variance of the residuals is the same for any value of X.
- Independence: Observations are independent of each other.
- Normality: For any fixed value of X, Y is normally distributed.

### 29. How will you assess the statistical significance of an insight whether it is a real insight or just by chance?

Statistical significance is usually assessed with statistical hypothesis testing, which estimates the probability that your results happened by chance. The threshold at which a result is accepted as statistically significant is known as the significance level.