Here is a list of Data Science interview questions recently asked at Walmart. The questions are suitable for both freshers and experienced professionals. Our **Data Science Training** answers all of the questions below.

### 1. Write the code to reverse a Linked list.

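Iterative method: initialize three pointers, prev as NULL, curr as head, and next as NULL. Then iterate through the linked list, and in each iteration do the following:

- Store the next node before changing the link: next = curr->next.
- Point the current node backwards: curr->next = prev. This is where the actual reversing happens.
- Move both pointers one step forward: prev = curr, then curr = next.

When curr becomes NULL, prev points to the new head of the reversed list.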
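The iterative method can be sketched in Python as follows. The `Node` class and the `to_pylist` helper are hypothetical names introduced only for this illustration, and `nxt` stands in for the `next` pointer to avoid shadowing Python's built-in `next`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A singly linked list node (hypothetical helper for this sketch)."""
    value: int
    next: Optional["Node"] = None


def reverse_list(head: Optional[Node]) -> Optional[Node]:
    """Reverse a singly linked list in place and return the new head."""
    prev = None
    curr = head
    while curr is not None:
        nxt = curr.next   # store the next node before overwriting the link
        curr.next = prev  # the actual reversal: point this node backwards
        prev = curr       # move prev one step forward
        curr = nxt        # move curr one step forward
    return prev


def to_pylist(head: Optional[Node]) -> list:
    """Collect node values into a Python list for easy inspection."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


head = Node(1, Node(2, Node(3)))
print(to_pylist(reverse_list(head)))  # prints [3, 2, 1]
```

The loop visits each node once, so the reversal takes O(n) time and O(1) extra space.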

### 2. What assumptions does linear regression machine learning algorithm make?

Linear regression assumes that the dependent and independent variables are linearly related. It is also necessary to check for outliers, because linear regression is sensitive to them.

### 3. A stranger uses a search engine to find something and you do not know anything about the person. How will you design an algorithm to determine what the stranger is looking for just after he/she types few characters in the search box?

With no information about the individual user, the algorithm can rely on the search behavior of the overall population. Store past queries in a database together with how often they are searched. As soon as the typed characters match the beginning of stored queries, offer the most popular matching queries as suggestions. Big data and data analytics matter here because aggregate search trends allow accurate predictions even for an unknown user.
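A minimal sketch of this idea, assuming past queries and their counts fit in a dictionary. The `suggest` function and the query counts are made up for illustration; a production system would more likely use a trie or a dedicated search index.

```python
def suggest(prefix, query_counts, k=3):
    """Return up to k of the most popular stored queries starting with prefix."""
    prefix = prefix.lower()
    matches = [q for q in query_counts if q.startswith(prefix)]
    # Most searched first; ties are broken alphabetically for stable output.
    return sorted(matches, key=lambda q: (-query_counts[q], q))[:k]


# Hypothetical population-level search counts.
QUERY_COUNTS = {
    "walmart near me": 900,
    "walmart careers": 400,
    "wall clock": 150,
    "weather today": 1200,
}

print(suggest("wal", QUERY_COUNTS, k=2))
```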

### 4. How will you fix multicollinearity in a regression model?

One way to eliminate multicollinearity is to identify the collinear independent variables and remove all but one of them. Another is to combine two or more collinear variables into a single variable.

### 5. What data structures are available in the Pandas package in Python programming language?

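Pandas provides two main data structures: Series and DataFrame. A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). A DataFrame is a two-dimensional labeled table whose columns can hold different types.

A Series is created with the following parameters:

- **data:** array-like, iterable, dict, or scalar value. Contains the data stored in the Series.
- **index:** array-like or Index (1d).
- **dtype:** str, numpy.dtype, or ExtensionDtype, optional.
- **name:** str, optional.
- **copy:** bool, default False.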

### 6. State some use cases where Hadoop MapReduce works well and where it does not.

**Use cases where Hadoop MapReduce works well:**

- Large data size and high data diversity

**Use cases where Hadoop MapReduce does not work well:**

- Real-time analytics
- Multiple smaller datasets

### 7. What is the difference between an iterator, generator and list comprehension in Python?

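An iterator is any object that implements `__iter__` and `__next__` and returns its items one at a time. A generator is a convenient way to create an iterator: it yields one item at a time and produces an item only on demand. A list comprehension, in contrast, builds the whole list at once, so Python reserves memory for every element. Generator expressions are therefore more memory efficient than lists.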
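The three constructs can be compared side by side. In this sketch, `Countdown` and `countdown` are hypothetical names chosen for illustration; both produce the same sequence, one as a hand-written iterator and one as a generator function.

```python
import sys

# List comprehension: every value is computed and stored immediately.
squares_list = [n * n for n in range(1000)]

# Generator expression: values are computed lazily, one per request.
squares_gen = (n * n for n in range(1000))


class Countdown:
    """Hand-written iterator: implements __iter__ and __next__."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """Generator function: Python builds the iterator for us."""
    while start > 0:
        yield start
        start -= 1


print(list(Countdown(3)), list(countdown(3)))
# The generator object is much smaller than the fully built list.
print(sys.getsizeof(squares_gen) < sys.getsizeof(squares_list))
```

Note that a generator can be consumed only once, while a list can be traversed repeatedly.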

### 8. What is the Law of Large Numbers?

The law of large numbers is a theorem from probability and statistics which states that, as an experiment is repeated more and more times, the average of the results gets closer to the true expected value. It assumes that all sample observations are drawn independently from the same idealized population of observations.
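A small simulation makes the theorem concrete. The sketch below rolls a fair six-sided die, whose expected value is 3.5, and reports the sample mean for growing numbers of rolls. The function name and the fixed seed are choices made for this illustration.

```python
import random


def mean_of_die_rolls(n_rolls, seed=0):
    """Average of n_rolls simulated fair die rolls (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls


for n in (10, 1_000, 100_000):
    # The sample mean tends to move closer to 3.5 as n grows.
    print(n, mean_of_die_rolls(n))
```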

### 9. What is A/B testing in Data Science?

A/B testing is a basic randomized controlled experiment: a way to compare two versions of a variable to find out which performs better in a controlled environment. It also lets us quantify the effect size and errors accurately, and so calculate the probability that we have made a type I or type II error.
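One common way to analyze an A/B test on conversion rates is a two-proportion z-test. The sketch below is one possible analysis, not the only one, and the visitor and conversion counts are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B perform equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 100 of 1000 visitors convert on A, 150 of 1000 on B.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the observed difference is unlikely under the null hypothesis. The chosen significance level caps the type I error rate.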

### 10. What are the differences between overfitting and underfitting?

Overfitting is a modeling error that occurs when a function is fit too closely to a limited set of data points, so the model performs well on the training data but generalizes poorly to new data. Underfitting refers to a model that can neither model the training data nor generalize to new data.

### 11. What is collaborative filtering?

Collaborative filtering is a type of recommendation engine that uses both user and item data, for example ratings from individual users on individual items. Items are recommended to a user based on the ratings of other users, hence the name collaborative.

### 12. Explain Star Schema?

A star schema is a database organizational structure optimized for use in a data warehouse or business intelligence that uses a single large fact table to store transactional or measured data and one or more smaller dimensional tables that store attributes about the data.

### 13. What is the difference between a bagged model and a boosted model?

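Bagging trains the same type of model on different bootstrap samples of the data and merges their predictions, typically by averaging or voting. It primarily decreases variance, not bias, and helps with over-fitting. Boosting is an iterative technique in which each new model concentrates on the errors of the previous ones before the predictions are combined. It primarily decreases bias.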

### 14. What do you understand by parametric and non-parametric methods? Explain with examples.

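Parametric methods use a fixed number of parameters to build the model and assume a particular form for the underlying distribution or function. Examples are linear regression and logistic regression. In hypothesis testing, parametric tests such as the t-test compare group means. Non-parametric methods use a flexible number of parameters that can grow with the data. Examples are k-nearest neighbors and decision trees. Non-parametric tests, such as the Mann-Whitney U test, are often used to compare medians.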

### 15. Have you used sampling? What are the various types of sampling you have worked with?

Sampling is used to indicate how much data to collect and how often it should be collected. It defines the samples to take in order to quantify a system, process, issue, or problem.

**Types of sampling:**

- Simple Random Sampling.
- Stratified Random Sampling.
- Cluster Random Sampling.
- Systematic Random Sampling
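The four sampling types listed above can be sketched with the standard library. All function names here are hypothetical, and the population is a toy list of integers in which parity plays the role of a stratum and blocks of ten play the role of clusters.

```python
import random


def group_by(population, key):
    """Group items into a dict of lists keyed by key(item)."""
    groups = {}
    for item in population:
        groups.setdefault(key(item), []).append(item)
    return groups


def simple_random(population, n, rng):
    """Every item has the same chance of being selected."""
    return rng.sample(population, n)


def stratified(population, key, fraction, rng):
    """Draw the same fraction from every stratum."""
    sample = []
    for members in group_by(population, key).values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def cluster(population, key, n_clusters, rng):
    """Pick whole clusters at random and keep all of their members."""
    clusters = group_by(population, key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for c in chosen for item in clusters[c]]


def systematic(population, n, rng):
    """Take every step-th item after a random starting offset."""
    step = len(population) // n
    start = rng.randrange(step)
    return population[start::step][:n]


rng = random.Random(42)
population = list(range(100))
print(simple_random(population, 5, rng))
print(stratified(population, lambda x: x % 2, 0.05, rng))
print(cluster(population, lambda x: x // 10, 1, rng))
print(systematic(population, 5, rng))
```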

### 16. Explain cross-entropy.

Cross-entropy is a measure of the difference between two probability distributions for a given random variable or set of events.
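For discrete distributions p (true) and q (predicted), the cross-entropy is H(p, q) = -Σ p(x) log q(x). A minimal sketch using natural logarithms follows. The `eps` floor is an assumption added here to avoid taking log(0).

```python
from math import log


def cross_entropy(p, q, eps=1e-12):
    """Cross-entropy H(p, q) in nats for two discrete distributions."""
    return -sum(pi * log(max(qi, eps)) for pi, qi in zip(p, q))


true_dist = [1.0, 0.0, 0.0]      # one-hot label: the first class is correct
good_pred = [0.9, 0.05, 0.05]    # confident and correct prediction
poor_pred = [0.2, 0.4, 0.4]      # most probability mass on wrong classes

# A better prediction gives a lower cross-entropy.
print(cross_entropy(true_dist, good_pred), cross_entropy(true_dist, poor_pred))
```

This is why cross-entropy is widely used as a loss function for classification models.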

### 17. What are the assumptions you make for linear regression?

There are four assumptions associated with a linear regression model:

- Linearity: the relationship between X and the mean of Y is linear.
- Homoscedasticity: the variance of the residuals is the same for any value of X.
- Independence: observations are independent of each other.
- Normality: for any fixed value of X, Y is normally distributed.

### 18. Differentiate between gradient boosting and random forest.

**The two main differences are:**

**How trees are built:** random forests build each tree independently, while gradient boosting builds one tree at a time, each new tree correcting the errors of the previous ones.

**Combining results:** random forests combine results at the end of the process, while gradient boosting combines results along the way.

### 19. What is the significance of log odds?

Log odds play a central role in logistic regression, which models the log odds of the outcome as a linear function of the predictors. Every probability p can be converted to log odds by computing the odds p / (1 - p) and taking the logarithm.
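The conversion between probability and log odds can be sketched in a few lines. The function names `log_odds` and `sigmoid` are chosen for this illustration. The sigmoid is the inverse mapping, from log odds back to a probability.

```python
from math import exp, log


def log_odds(p):
    """Convert a probability (0 < p < 1) to log odds, also called the logit."""
    return log(p / (1 - p))


def sigmoid(z):
    """Convert log odds back to a probability."""
    return 1 / (1 + exp(-z))


for p in (0.2, 0.5, 0.8):
    z = log_odds(p)
    print(f"p = {p}, log odds = {z:.3f}, back to p = {sigmoid(z):.3f}")
```

A probability of 0.5 corresponds to log odds of 0. Probabilities above 0.5 give positive log odds and those below give negative log odds.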

### 20. What is a Linear Regression?

Linear regression is one of the statistical methods of predictive analytics to predict the target variable. When we have one independent variable, we call it Simple Linear Regression. If the number of independent variables is more than one, we call it Multiple Linear Regression.

### 21. How do you define data science?

Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.

### 22. What are Eigenvalue and Eigenvector?

An eigenvector of a square matrix A is a non-zero vector v whose direction is unchanged when A is applied to it: Av = λv. The scalar λ is the corresponding eigenvalue, the factor by which the eigenvector is stretched or shrunk. Eigenvectors are often normalized to unit vectors, meaning their length or magnitude is equal to 1, and are often referred to as right vectors, which simply means column vectors.
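As an illustration, power iteration is one simple technique that approximates the eigenvalue of largest magnitude and a matching unit eigenvector. This sketch assumes a small symmetric matrix with a single dominant eigenvalue. In practice a linear algebra library would normally be used, since it returns all eigenpairs.

```python
from math import sqrt


def mat_vec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def power_iteration(A, iterations=100):
    """Approximate the dominant eigenvalue and unit eigenvector of A."""
    v = [1.0] + [0.0] * (len(A) - 1)  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = mat_vec(A, v)
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]     # keep the vector at unit length
    # Rayleigh quotient: because |v| = 1, the eigenvalue estimate is v . (A v).
    eigenvalue = sum(a * b for a, b in zip(mat_vec(A, v), v))
    return eigenvalue, v


A = [[2.0, 1.0],
     [1.0, 2.0]]  # eigenvalues are 3 and 1
eigenvalue, eigenvector = power_iteration(A)
print(eigenvalue, eigenvector)
```

Here the result approaches the eigenvalue 3 with an eigenvector along the direction (1, 1), scaled to length 1.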

### 23. What is a Random Forest?

Random forest is a prime example of an ensemble machine learning method. As one could intuitively guess, it ensembles various decision trees to produce a more generalized model, reducing the notorious over-fitting tendency of decision trees.

### 24. What is power analysis?

Power analysis is the process of estimating one of four variables (effect size, sample size, significance level, and statistical power) given values for the other three. It can be used to estimate the minimum sample size required for an experiment, given a desired significance level, effect size, and statistical power.
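As a rough sketch, the sample size per group for a two-sided comparison of two means can be estimated with the normal approximation n = 2 * ((z_(1 - α/2) + z_power) / d)^2, where d is the standardized effect size (Cohen's d). The function name is hypothetical, and an exact t-test calculation usually gives a slightly larger n.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Medium effect (d = 0.5), 5% significance level, 80% power.
print(sample_size_per_group(0.5))  # prints 63
```

Smaller effect sizes, stricter significance levels, or higher power all increase the required sample size.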

### 25. Do gradient descent methods always converge to same point?

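Not always. Where gradient descent ends up depends on where you start (the initialization). On a non-convex function it is very easy to get stuck in a local minimum, so different starting points can lead to different minima. If every run starts from the same point with the same settings, it will converge to the same minimum.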
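The effect of initialization can be seen on a simple non-convex function. The function f(x) = x^4 - 3x^2 + x, the learning rate, and the step count are all choices made for this sketch. The function has two minima, one on each side of zero.

```python
def f(x):
    """A non-convex function with two minima."""
    return x**4 - 3 * x**2 + x


def grad(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(start, learning_rate=0.01, steps=2000):
    """Plain gradient descent from a given starting point."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


left = gradient_descent(-2.0)   # starts left of zero
right = gradient_descent(2.0)   # starts right of zero
print(left, f(left))
print(right, f(right))
```

The two runs end in different minima, and only the left one is the global minimum. Repeating a run from the same start reproduces the same result.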