Here is a list of Data Science interview questions recently asked at HP. The questions cover both freshers and experienced professionals. Our **Data Science Training** answers all of the questions below.

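### 1. What are skewed and uniform distributions?

A skewed distribution is asymmetric. One side of the graph (right or left) has a longer tail than the other, so the observations are not spread evenly around the center. In a right-skewed (positively skewed) distribution the long tail points to the right, and in a left-skewed (negatively skewed) distribution it points to the left. A uniform distribution is one in which all values in the range are equally likely, so the observations are spread evenly across the range of the distribution.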
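One common way to quantify skew is the sample skewness `g1 = m3 / m2**1.5`, where `mk` is the k-th central moment. It is zero for perfectly symmetric data (such as evenly spread, uniform-like values) and positive when the long tail is on the right. The sketch below uses only Python's standard library. The function name `skewness` and both data sets are made-up illustrations.

```python
from statistics import fmean


def skewness(values: list[float]) -> float:
    """Return the sample skewness g1 = m3 / m2**1.5 of `values`."""
    mean = fmean(values)
    m2 = fmean((v - mean) ** 2 for v in values)  # second central moment
    m3 = fmean((v - mean) ** 3 for v in values)  # third central moment
    return m3 / m2 ** 1.5


uniform_like = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # evenly spread, symmetric
right_skewed = [1, 1, 1, 2, 2, 3, 4, 6, 10, 20]  # long tail to the right

print(skewness(uniform_like))      # 0.0
print(skewness(right_skewed) > 0)  # True
```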

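### 2. When does underfitting occur in a statistical model?

Underfitting hurts the accuracy of a machine learning model. It means that the model or algorithm does not fit the data well enough: it fails to capture the underlying pattern, so it performs poorly on both the training data and new data. It usually happens when the model is too simple for the data (for example, a straight line fitted to a curved relationship), when it is too heavily regularized, or when the available features do not carry enough information to build an accurate model.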
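A small illustration of underfitting, assuming NumPy is available: the data follow an exact quadratic curve, and a straight line (degree 1) cannot capture it, so its error is high even on the training data. A degree-2 polynomial fits the same points almost perfectly. The data and the helper name `training_mse` are hypothetical.

```python
import numpy as np

# Data with a curved (quadratic) relationship: y = x**2, no noise.
x = np.arange(-3, 4, dtype=float)  # -3, -2, ..., 3
y = x ** 2


def training_mse(degree: int) -> float:
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x, y, degree)
    predictions = np.polyval(coeffs, x)
    return float(np.mean((y - predictions) ** 2))


print(round(training_mse(1), 6))  # 12.0: the straight line underfits
print(round(training_mse(2), 6))  # 0.0: the quadratic captures the pattern
```

Because the x values are symmetric around zero, the best straight line is flat at the mean of y, which is why its training error stays large.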

### 3. What is reinforcement learning?

Reinforcement learning is the training of machine learning models to make a sequence of decisions. An agent learns to achieve a goal in an uncertain, potentially complex environment: it takes actions, receives rewards or penalties, and gradually favors the actions that lead to a higher total reward. In this sense, the agent faces a game-like situation.
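A minimal tabular Q-learning sketch, assuming a made-up five-cell corridor environment (the `step` function is hypothetical). The agent starts in cell 0, earns a reward of 1 only when it reaches cell 4, and learns from rewards alone that moving right is the best action. The learning rate `alpha`, discount `gamma`, and exploration rate `epsilon` are illustrative choices, not recommended settings.

```python
import random

# Hypothetical toy environment: cells 0..4, start in cell 0, reward +1 for
# reaching cell 4 (which ends the episode), 0 otherwise.
# Actions: 0 = move left, 1 = move right.
N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)


def step(state: int, action: int) -> tuple[int, float, bool]:
    """Apply an action and return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or when values tie), else exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Move Q(s, a) toward r + gamma * max over a' of Q(s', a').
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(GOAL)]
print(policy)  # ['right', 'right', 'right', 'right']
```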

### 4. Name some commonly used machine learning algorithms.

- Linear Regression.
- Logistic Regression.
- Decision Tree.
- SVM.
- Naive Bayes.

### 5. What is precision?

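The precision of a measurement system refers to how closely repeated measurements, taken under the same conditions, agree with each other. For example, if the same sheet of paper is measured several times, precision describes the spread of the measured values: the smaller the spread, the more precise the measurements. In classification, precision has a related meaning: the number of true positives divided by the number of true positives plus the number of false positives, that is, the fraction of predicted positives that are actually positive.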
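Both senses of precision can be computed in a few lines. This is a sketch with made-up numbers: measurement precision is expressed as the sample standard deviation of repeated readings, and classification precision as TP / (TP + FP) with label 1 treated as the positive class. The function name `precision` and all data are hypothetical.

```python
from statistics import stdev

# Measurement precision: spread of repeated measurements of the same object
# under the same conditions (hypothetical lengths in cm).
measurements = [21.0, 21.1, 20.9, 21.0]
print(round(stdev(measurements), 3))  # 0.082: a smaller spread means more precise


def precision(y_true: list[int], y_pred: list[int]) -> float:
    """Classification precision: TP / (TP + FP), with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]
print(precision(y_true, y_pred))  # 0.6: 3 of the 5 predicted positives are correct
```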

### 6. What is a univariate analysis?

Univariate analysis explores each variable in a data set separately. It looks at the range of values as well as their central tendency, and it describes the pattern of responses for each variable on its own. Univariate descriptive statistics describe individual variables.
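A quick univariate summary of a single variable using Python's standard `statistics` module. The variable `ages` and its values are hypothetical. The summary covers the count, range, and central tendency described above, plus the spread.

```python
import statistics

ages = [23, 25, 25, 29, 31, 35, 41, 52]  # one hypothetical variable

summary = {
    "count": len(ages),
    "min": min(ages),
    "max": max(ages),
    "range": max(ages) - min(ages),
    "mean": statistics.fmean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    "stdev": statistics.stdev(ages),
}
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```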

### 7. How do you overcome challenges to your findings?

- Step 1: Define the problem. First, it’s necessary to accurately define the data problem that is to be solved.
- Step 2: Decide on an approach.
- Step 3: Collect data.
- Step 4: Analyze data.
- Step 5: Interpret results.

### 8. Explain the cluster sampling technique in data science

Cluster sampling is a probability sampling technique in which all population elements are categorized into mutually exclusive and exhaustive groups called clusters. Clusters are selected for sampling, and all or some elements from selected clusters comprise the sample.
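A minimal sketch of cluster sampling, assuming a hypothetical population of students grouped into schools (the clusters). Whole clusters are selected at random. The sketch then keeps either every element of each chosen cluster (one-stage) or a random subset of `per_cluster` elements from each (two-stage). The function name and parameters are illustrative.

```python
import random

# Hypothetical population split into mutually exclusive, exhaustive clusters.
clusters = {
    "school_A": ["a1", "a2", "a3"],
    "school_B": ["b1", "b2"],
    "school_C": ["c1", "c2", "c3", "c4"],
    "school_D": ["d1", "d2", "d3"],
}


def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Randomly pick whole clusters, then keep all of their elements
    (one-stage) or `per_cluster` random elements of each (two-stage)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        if per_cluster is None or per_cluster >= len(members):
            sample.extend(members)                           # one-stage
        else:
            sample.extend(rng.sample(members, per_cluster))  # two-stage
    return chosen, sample


chosen, sample = cluster_sample(clusters, n_clusters=2, seed=42)
print(chosen, sample)
```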

### 9. State the difference between a Validation Set and a Test Set

- Validation set: A set of examples used to tune the hyperparameters of a classifier, for example to choose the number of hidden units in a neural network.
- Test set: A set of examples used only to assess the performance of a fully-specified classifier.

These are the recommended definitions and usages of the terms.
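A sketch of producing the three disjoint sets with plain Python. The data are shuffled once and then cut into test, validation, and training portions. The 70/15/15 proportions and the function name `train_val_test_split` are illustrative assumptions, not a fixed rule.

```python
import random


def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then carve out disjoint test, validation and training sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

The validation set is evaluated repeatedly while tuning. The test set is kept aside and evaluated once, after tuning is finished, so that its score is not biased by the tuning choices.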

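### 10. Explain the binomial probability formula.

The binomial probability formula gives the probability of exactly $x$ successes in $n$ independent trials: $P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}$. Here $X$ is the number of successful trials, $p$ is the probability of success in a single trial, and $\binom{n}{x}$ (also written nCx) is the number of combinations of $n$ items taken $x$ at a time. The expected value, or mean, of a binomial distribution is calculated by multiplying the number of trials ($n$) by the probability of success ($p$), that is, $E[X] = np$.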
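The binomial probability formula P(X = x) = C(n, x) · p^x · (1 − p)^(n − x) translates directly into Python using `math.comb` (available since Python 3.8). The function name `binomial_pmf` and the example values are illustrative. The last two lines check that the probabilities sum to 1 and that the mean equals n × p.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Probability of exactly 2 heads in 4 fair coin flips.
print(binomial_pmf(2, 4, 0.5))  # 0.375

n, p = 10, 0.3
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
print(round(total, 10), round(mean, 10))  # 1.0 3.0 (mean = n * p)
```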

### 11. What are the variants of Back Propagation?

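The most common technique used to train a neural network is the back-propagation algorithm. There are three main variations, which differ in how many training examples are used for each weight update: stochastic (also called online), which updates the weights after every single example; batch, which updates them once per pass over the whole training set; and mini-batch, which updates them after each small subset of examples.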
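A minimal sketch showing that the three variants differ only in the batch size per update. To keep it short, it assumes a single linear neuron (y ≈ w·x + b) trained with squared error in place of a full network, so the back-propagated gradient reduces to one chain-rule step. The function name `train_linear_neuron`, the learning rate, and the data are hypothetical.

```python
import random


def train_linear_neuron(data, batch_size, epochs=500, lr=0.05, seed=0):
    """Fit y ≈ w*x + b by gradient descent on mean squared error.

    batch_size == 1          -> stochastic (online) updates
    batch_size == len(data)  -> batch updates (one per epoch)
    anything in between      -> mini-batch updates
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradients of the mean squared error over this batch.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Noise-free data from y = 2x + 1 for x in [-1, 1].
points = [(k / 10, 2 * (k / 10) + 1) for k in range(-10, 11)]
for size in (1, 5, len(points)):
    w, b = train_linear_neuron(points, batch_size=size)
    print(size, round(w, 3), round(b, 3))
```

Smaller batches make more (noisier) updates per pass over the data. The full batch makes one exact-gradient update per pass, and mini-batches sit in between.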

### 12. What is meant by supervised and unsupervised learning?

Supervised learning algorithms are trained using labeled data: the input data is provided to the model along with the expected output. Unsupervised learning algorithms are trained using unlabeled data, and the model finds hidden patterns in the data on its own.

### 13. What is an Auto-Encoder?

An autoencoder is a neural network that is trained to copy its input to its output. It has a hidden layer h that describes a code used to represent the input, and it is composed of two sub-models: an encoder and a decoder. The encoder compresses the input into this code, and the decoder attempts to recreate the input from the compressed version provided by the encoder.

### 14. Can you explain the difference between a Test Set and a Validation Set?

This is question 9 asked the other way around. The validation set is used to tune the model, for example to choose the number of hidden units in a neural network, while the test set is held back and used only to assess the performance of the final, fully-specified classifier.

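### 15. What do you understand by the term hash table collision?

A collision occurs when two different keys are hashed to the same index in a hash table. Collisions are a problem because, in the simplest design, each slot in a hash table stores a single element, so the table needs a resolution strategy such as separate chaining (storing a list of entries in each slot) or open addressing (probing for another free slot).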
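A minimal hash table that resolves collisions by separate chaining, where each slot holds a list of (key, value) pairs. It relies on Python's built-in `hash()`. The forced collision in the example depends on a CPython detail: small integers hash to themselves, so with 8 slots the keys 3 and 11 both map to slot 3. The class name `ChainedHashTable` is hypothetical.

```python
class ChainedHashTable:
    """A minimal hash table that resolves collisions by separate chaining."""

    def __init__(self, n_slots: int = 8):
        self.slots = [[] for _ in range(n_slots)]

    def _index(self, key) -> int:
        return hash(key) % len(self.slots)

    def put(self, key, value) -> None:
        bucket = self.slots[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite it
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key, possibly sharing the slot

    def get(self, key):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)


table = ChainedHashTable(n_slots=8)
table.put(3, "three")
table.put(11, "eleven")              # collides with key 3 in slot 3
print(table.slots[3])                # [(3, 'three'), (11, 'eleven')]
print(table.get(3), table.get(11))   # three eleven
```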


### 16. What is recall?

The precise definition of recall is the number of true positives divided by the number of true positives plus the number of false negatives, TP / (TP + FN). Recall can be thought of as a model's ability to find all the data points of interest in a dataset.
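A minimal sketch of the recall formula TP / (TP + FN) in plain Python, treating label 1 as the positive class. The function name `recall` and the labels are hypothetical.

```python
def recall(y_true: list[int], y_pred: list[int]) -> float:
    """Recall: TP / (TP + FN), with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]
print(recall(y_true, y_pred))  # 0.75: 3 of the 4 actual positives were found
```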

### 17. Discuss the normal distribution

In normally distributed data, there is a constant proportion of data points lying under the curve between the mean and a specific number of standard deviations from the mean: about 68% within 1 standard deviation, about 95% within 2, and about 99.7% within 3. Thus, for a normal distribution, almost all values lie within 3 standard deviations of the mean.
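The standard library's `statistics.NormalDist` (Python 3.8+) can confirm these proportions and show that they do not depend on the mean or standard deviation. The helper name `share_within` and the second distribution (mean 100, standard deviation 15) are illustrative.

```python
from statistics import NormalDist


def share_within(k: float, dist: NormalDist = NormalDist(mu=0, sigma=1)) -> float:
    """Proportion of `dist` lying within k standard deviations of its mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)


for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973

# The same proportions hold for any mean and standard deviation.
print(f"{share_within(1, NormalDist(mu=100, sigma=15)):.4f}")  # 0.6827
```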

### 18. While working on a data set, how can you select important variables? Explain

- Remove highly correlated variables before selecting the important ones, since they carry redundant information.
- Fit a linear regression and select variables based on their p-values.
- Use forward selection, backward selection, or stepwise selection.
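A sketch of greedy forward selection, assuming NumPy is available. At each step it adds the variable that most reduces the residual sum of squares (RSS) of an ordinary least-squares fit. This uses an RSS criterion in place of the p-value criterion mentioned above, and the synthetic data are constructed so that only columns 0 and 2 matter. The function name `forward_selection` is hypothetical.

```python
import numpy as np


def forward_selection(X: np.ndarray, y: np.ndarray, max_features: int) -> list[int]:
    """Greedy forward selection: repeatedly add the column that most reduces
    the residual sum of squares (RSS) of a least-squares fit with intercept."""
    n = X.shape[0]
    selected: list[int] = []
    remaining = list(range(X.shape[1]))

    def rss(columns: list[int]) -> float:
        design = np.column_stack([np.ones(n)] + [X[:, c] for c in columns])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        return float(np.sum((y - design @ coef) ** 2))

    while remaining and len(selected) < max_features:
        best = min(remaining, key=lambda c: rss(selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))   # four candidate variables
y = 3 * X[:, 0] + 2 * X[:, 2]   # only columns 0 and 2 matter
print(forward_selection(X, y, max_features=2))  # [0, 2]
```

Backward selection works the same way in reverse: start with all variables and repeatedly drop the one whose removal increases the RSS the least.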

### 19. Is it possible to capture the correlation between a continuous and a categorical variable?

Yes. The point biserial correlation is the most intuitive of the various options for measuring the association between a continuous variable and a binary (two-category) variable. … Additionally, it can also help us model and detect non-linear relationships between the categorical and continuous variables.
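A minimal sketch of the point biserial correlation, r_pb = (M1 − M0) / s · √(p·q). Here M1 and M0 are the means of the continuous variable in groups 1 and 0, s is its population standard deviation, and p and q are the proportions of the two groups. This gives the same value as the Pearson correlation with the binary variable coded as 0/1. The function name and the data are hypothetical.

```python
from statistics import fmean, pstdev


def point_biserial(continuous: list[float], binary: list[int]) -> float:
    """r_pb = (M1 - M0) / s * sqrt(p * q) for a 0/1-coded binary variable."""
    group1 = [x for x, g in zip(continuous, binary) if g == 1]
    group0 = [x for x, g in zip(continuous, binary) if g == 0]
    p = len(group1) / len(continuous)  # proportion of group 1
    q = 1 - p                          # proportion of group 0
    s = pstdev(continuous)             # population standard deviation
    return (fmean(group1) - fmean(group0)) / s * (p * q) ** 0.5


hours_studied = [1, 2, 3, 4]  # continuous variable (hypothetical)
passed = [0, 0, 1, 1]         # binary categorical variable (hypothetical)
print(round(point_biserial(hours_studied, passed), 4))  # 0.8944
```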

### 20. Discuss Artificial Neural Networks

An artificial neural network (ANN) is a computing system designed to simulate the way the human brain analyzes and processes information. It is built from layers of interconnected nodes (neurons) whose connection weights are learned from data. ANNs are a foundation of modern artificial intelligence (AI) and can solve problems that would prove impossible or difficult by human or statistical standards.

### 21. What is Back Propagation?

Back propagation is the essence of neural network training. It computes how much each weight contributed to the error (the gradient of the loss with respect to each weight) by applying the chain rule backward from the output layer toward the input layer. The weights are then adjusted in the direction that reduces the error. Repeating this over many iterations (epochs) lowers the error rate, and properly tuned weights make the model more reliable.
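A sketch of back-propagation for the smallest possible case: a single sigmoid neuron with squared-error loss. The chain rule is applied backward (loss → activation → pre-activation → weight and bias), and the result is checked against a finite-difference estimate, a common way to verify a back-propagation implementation. All names and values are illustrative.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def loss(w: float, b: float, x: float, y: float) -> float:
    """Squared error of one sigmoid neuron: 0.5 * (sigmoid(w*x + b) - y)**2."""
    return 0.5 * (sigmoid(w * x + b) - y) ** 2


def backprop(w: float, b: float, x: float, y: float) -> tuple[float, float]:
    """Return (dL/dw, dL/db) via the chain rule, working backward."""
    a = sigmoid(w * x + b)         # forward pass
    dL_da = a - y                  # loss -> activation
    da_dz = a * (1 - a)            # activation -> pre-activation z = w*x + b
    dL_dz = dL_da * da_dz
    return dL_dz * x, dL_dz * 1.0  # z -> weight (dz/dw = x), bias (dz/db = 1)


w, b, x, y = 0.5, -0.2, 1.5, 1.0
gw, gb = backprop(w, b, x, y)

# Central finite-difference estimates of the same gradients.
eps = 1e-6
num_gw = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
num_gb = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
print(abs(gw - num_gw) < 1e-8, abs(gb - num_gb) < 1e-8)  # True True

# One gradient-descent step with these gradients lowers the error.
lr = 0.5
print(loss(w - lr * gw, b - lr * gb, x, y) < loss(w, b, x, y))  # True
```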

### 22. What is a Random Forest?

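Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. In practice, each tree is typically trained on a bootstrap sample of the data and considers a random subset of the features at each split, and the forest combines the trees by majority vote (classification) or averaging (regression). The generalization error of a forest converges almost surely to a limit as the number of trees in the forest becomes large.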
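A short usage sketch, assuming scikit-learn is installed. It trains scikit-learn's `RandomForestClassifier` on the bundled Iris dataset. The `bootstrap` and `max_features` arguments correspond to the two sources of randomness (bootstrap samples and random feature subsets at each split). The parameter values are illustrative, not tuned.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

forest = RandomForestClassifier(
    n_estimators=200,     # number of trees in the forest
    max_features="sqrt",  # random subset of features considered at each split
    bootstrap=True,       # each tree is trained on a bootstrap sample
    random_state=0,
)
forest.fit(X_train, y_train)
accuracy = forest.score(X_test, y_test)  # majority-vote accuracy on held-out data
print(f"test accuracy: {accuracy:.3f}")
```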

### 23. Explain recommender systems.

A recommender system, or a recommendation system (sometimes replacing "system" with a synonym such as platform or engine), is a subclass of information filtering systems that seeks to predict the "rating" or "preference" a user would give to an item.

### 24. Explain collaborative filtering

Collaborative filtering (CF) is a technique used by recommender systems. In its narrower sense, collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating).
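A minimal user-based collaborative filtering sketch. It predicts how a user would rate an unseen item as a similarity-weighted average of other users' ratings, using cosine similarity over the items both users have rated. The ratings, user and item names, and function names are all hypothetical. Production systems typically also mean-center ratings and handle sparsity.

```python
from math import sqrt

# Hypothetical user -> {item: rating} data (1-5 stars).
ratings = {
    "alice": {"matrix": 5, "inception": 4, "titanic": 1},
    "bob":   {"matrix": 4, "inception": 5, "titanic": 2, "avatar": 4},
    "carol": {"matrix": 1, "inception": 2, "titanic": 5, "avatar": 1},
}


def cosine_similarity(u: dict, v: dict) -> float:
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict_rating(user: str, item: str) -> float:
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else 0.0


print(round(predict_rating("alice", "avatar"), 2))  # 2.97
```

Alice's tastes are closer to Bob's than to Carol's, so the prediction leans toward Bob's rating of 4 rather than Carol's rating of 1.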

### 25. Do you prefer Python or R for text analytics?

Python is usually the better option because it has the Pandas library, which provides easy-to-use data structures and high-performance data analysis tools, along with a large ecosystem of text-processing libraries. R is stronger for statistical modeling and machine learning than for text analysis alone. For most kinds of text analytics, Python also tends to be faster.

### 26. What is the difference between machine learning and data mining?

Data mining is designed to extract rules and patterns from large quantities of data, while machine learning teaches a computer how to learn from data and improve its performance on a task. To put it another way, data mining is a method of research for determining a particular outcome from the whole of the gathered data, whereas machine learning builds models that can then be applied to new data.

### 27. What is collaborative filtering?

Collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. Applications of collaborative filtering typically involve very large data sets.

### 28. What are the differences between overfitting and underfitting?

Overfitting is a modeling error that occurs when a function fits a limited set of data points too closely, so it captures noise along with the real pattern and performs poorly on new data. Underfitting refers to a model that can neither model the training data nor generalize to new data. Intuitively, underfitting occurs when the model or the algorithm does not fit the data well enough.

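### 29. What is power analysis?

Power analysis estimates how likely a statistical test is to detect an effect of a given size. Power analyses are normally run before a study is conducted. A prospective, or a priori, power analysis can be used to estimate any one of the four power parameters (sample size, effect size, significance level, and statistical power) from the other three, but it is most often used to estimate the required sample size.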
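A sketch of an a priori sample-size calculation for comparing two group means with a two-sided test. It uses the normal approximation n = 2 · ((z₁₋α/₂ + z_power) / d)² per group, where d is the standardized effect size (Cohen's d). This is an approximation: exact t-test calculations give slightly larger numbers. The function name `sample_size_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group for a two-sided, two-sample comparison of means
    (normal approximation), given Cohen's d, significance level and power."""
    z = NormalDist().inv_cdf  # quantile function of the standard normal
    n = 2 * ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2
    return ceil(n)


print(sample_size_per_group(0.5))  # 63 for a medium effect at alpha = 0.05, power = 0.80
print(sample_size_per_group(0.8))  # 25 for a large effect
```

Smaller effects need much larger samples, because n grows with 1/d².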

### 30. Why is data cleaning essential in data science?

Efficiency: cleaning data helps you perform your analysis faster. Clean data means you avoid many errors and your results are more accurate, so you do not have to redo the whole task because of false results.
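A short pandas sketch of common cleaning steps, assuming pandas is installed. It normalizes inconsistent text, converts bad numeric entries to missing values, drops duplicate rows, imputes a missing value with the median, and removes rows that remain unusable. The table and the choice of steps are hypothetical. Which steps are appropriate depends on the data set.

```python
import pandas as pd

# Hypothetical raw data: a duplicate row (after normalizing case and
# whitespace), a non-numeric age and a missing spend value.
raw = pd.DataFrame({
    "customer": ["Ann", "ann ", "Bob", "Cara", "Dev"],
    "age": ["34", "34", "n/a", "29", "41"],
    "spend": [120.0, 120.0, 80.5, None, 200.0],
})

clean = raw.copy()
clean["customer"] = clean["customer"].str.strip().str.title()  # "ann " -> "Ann"
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")     # "n/a" -> NaN
clean = clean.drop_duplicates()                                  # drop identical rows
clean = clean.assign(spend=clean["spend"].fillna(clean["spend"].median()))  # impute
clean = clean.dropna(subset=["age"]).reset_index(drop=True)      # drop rows without age

print(clean)
```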
