Here is a list of Data Science interview questions recently asked at Blackboard. The questions are suitable for both freshers and experienced professionals. Our **Data Science Training** covers answers to all of the questions below.

### 1. How do data scientists use statistics?

Data scientists use a combination of statistical formulas and computer algorithms to notice patterns and trends within data. Then, they use their knowledge of social sciences and a particular industry or sector to interpret the meaning of those patterns and how they apply to real-world situations.

### 2. What’s the difference between SAS, R, and Python programming?

SAS is probably the easiest of the three to learn, and its good GUI makes it even easier to use. Python is a high-level, object-oriented language and is easier to learn than R, which puts R last of the three in ease of learning.

### 3. What are interpolation and extrapolation?

Interpolation estimates unknown values that fall within the range of a data set's known values, while extrapolation estimates values that fall outside that range. Both use known values to predict unknown ones.
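As a rough illustration, the sketch below estimates values on the straight line through two known points. The data points and the helper name `linear_estimate` are assumptions made up for this example, and a straight line is only one of many possible estimation methods.

```python
def linear_estimate(x0, y0, x1, y1, x):
    """Estimate y at x on the straight line through (x0, y0) and (x1, y1)."""
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)

# Hypothetical known data points, with x ranging from 1 to 3
known = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]

# Interpolation: x = 2.5 lies between the known x values 2 and 3
interpolated = linear_estimate(*known[1], *known[2], 2.5)

# Extrapolation: x = 4.0 lies beyond the largest known x value
extrapolated = linear_estimate(*known[1], *known[2], 4.0)

print(interpolated, extrapolated)
```

Extrapolated estimates are usually less reliable, because nothing in the data confirms that the trend continues outside the known range.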

### 4. What is the difference between population and sample in data?

A population is the entire group that you want to draw conclusions about. A sample is the specific group that you will collect data from. The size of the sample is always less than the total size of the population. In research, a population doesn’t always refer to people.
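A minimal sketch of the idea, assuming a simulated population of 10,000 heights (the numbers are invented for illustration): a random sample of 200 values is drawn and its mean is compared with the population mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated heights in centimetres
population = [random.gauss(170, 10) for _ in range(10_000)]

# Sample: a smaller group drawn at random, without replacement
sample = random.sample(population, 200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The sample mean is used as an estimate of the population mean, which in practice is usually unknown.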

### 5. What are the steps in making a decision tree?

- Identify the decision.
- Gather relevant information.
- Identify the alternatives.
- Weigh the evidence.
- Choose among alternatives.
- Take action.
- Review your decision.

### 6. How is machine learning deployed in real-world scenarios?

Once the model performs well, run a front-end web app that gives users access to the machine learning model's predictions, typically through an API. Once your web app and API containers are deployed to a hosting platform, you will have a public URL that you or anyone else can use to interact with your machine learning model.
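The sketch below shows what the API side of such a deployment might look like, using only Python's standard library. The `predict` function is a hypothetical stand-in for a trained model, and the request format (`{"features": [...]}`) and port are assumptions made for this example; real deployments usually rely on a web framework and a container platform.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Hypothetical stand-in for a trained model: a fixed linear scorer."""
    weights = [0.5, -0.25]
    return sum(w * x for w, x in zip(weights, features))

def handle_payload(body: bytes) -> bytes:
    """Turn a JSON request body into a JSON response body."""
    features = json.loads(body)["features"]
    return json.dumps({"prediction": predict(features)}).encode("utf-8")

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body, run the model, and send the result back
        length = int(self.headers.get("Content-Length", 0))
        response = handle_payload(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(response)

def serve(port: int = 8000) -> None:
    """Start the API; a front-end app would send POST requests to this port."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the server, after which a front-end app can send a POST request with a JSON body and display the returned prediction.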

### 7. What is Collaborative filtering?

Collaborative filtering (CF) is the process of filtering or evaluating items through the opinions of other people. CF technology brings together the opinions of large interconnected communities on the web, supporting filtering of substantial quantities of data.
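A minimal sketch of user-based collaborative filtering, assuming a tiny invented ratings table: the rating a user would give an unseen item is estimated as a similarity-weighted average of other users' ratings. Cosine similarity is one common choice here, not the only one.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 5, "B": 3, "C": 4, "D": 5},
    "carol": {"A": 1, "B": 5, "D": 1},
}

def cosine_similarity(u, v):
    """Cosine similarity over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += sim
    return num / den if den else None

# Alice has not rated item D; estimate it from Bob's and Carol's opinions
predicted = predict_rating("alice", "D")
print(round(predicted, 2))
```

Because Alice's ratings match Bob's more closely than Carol's, Bob's opinion of item D carries more weight in the estimate.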

### 8. What do you mean by the term linear regression?

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable, Y is the dependent variable, a is the intercept, and b is the slope.
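As a sketch, the line Y = a + bX can be fitted by ordinary least squares in a few lines of plain Python. The data below is invented so that it lies exactly on Y = 1 + 2X, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of Y = a + bX; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical observations lying exactly on Y = 1 + 2X
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = fit_line(xs, ys)
print(a, b)
```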

### 9. Where to seek help in case of discrepancies in Tableau?

**If the data presented in a Salesforce report does not match the data imported into Tableau:**
- Set up the data source by joining relevant tables, instead of using a standard connection.
- Account for timezone difference.
- Check the filters applied to the Salesforce report.

### 10. Why is resampling done?

Resampling methods are used to ensure that the model is good enough and can handle variations in data. They do this by training the model on different subsets of the dataset, so that it sees the variety of patterns found in the data.
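The bootstrap is one common resampling method. The sketch below applies it to a simple statistic (the mean) rather than a full model, to show how repeatedly drawing from the same data reveals how much an estimate varies. The data values are invented for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical measurements
data = [2.1, 2.5, 2.8, 3.0, 3.2, 3.6, 4.1, 4.4, 5.0, 5.3]

def bootstrap_means(values, n_resamples=1000):
    """Mean of each resample drawn with replacement from the original values."""
    means = []
    for _ in range(n_resamples):
        resample = random.choices(values, k=len(values))
        means.append(statistics.mean(resample))
    return means

means = bootstrap_means(data)
spread = statistics.stdev(means)  # variability of the estimate across resamples

print(f"bootstrap estimate of the mean: {statistics.mean(means):.2f}")
print(f"spread across resamples:        {spread:.2f}")
```

The same pattern applies to models: refit on each resample and look at how much the predictions or scores change.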

### 11. What is underfitting?

A statistical model or a machine learning algorithm is said to have underfitting when it cannot capture the underlying trend of the data. Underfitting destroys the accuracy of our machine learning model. Its occurrence simply means that our model or the algorithm does not fit the data well enough.
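To make this concrete, the sketch below fits a straight line to data that actually follows a curve, then compares its error with a model that includes a squared term. The data (y = x²) and the helper names are assumptions chosen for illustration.

```python
def mse(ys, preds):
    """Mean squared error between observed and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

def fit_line(xs, ys):
    """Least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b

# Hypothetical data that follows a curve: y = x^2
xs = [0, 1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

# Underfit model: a straight line cannot follow the curvature
a, b = fit_line(xs, ys)
linear_mse = mse(ys, [a + b * x for x in xs])

# Better-matched model: y = c * x^2, with c fitted by least squares
c = sum(x ** 2 * y for x, y in zip(xs, ys)) / sum(x ** 4 for x in xs)
quadratic_mse = mse(ys, [c * x ** 2 for x in xs])

print(f"linear MSE:    {linear_mse:.2f}")
print(f"quadratic MSE: {quadratic_mse:.2f}")
```

The straight line has a clearly higher error even on the data it was fitted to, which is the typical sign of underfitting.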

### 12. What is the difference between machine learning and data mining?

Data mining is used on an existing dataset (like a data warehouse) to find patterns. Machine learning, on the other hand, is trained on a ‘training’ data set, which teaches the computer how to make sense of data, and then to make predictions about new data sets.

### 13. Do you prefer Python or R for text analytics?

Python is usually the better option because its pandas library provides easy-to-use data structures and high-performance data analysis tools. R is better suited to machine learning than to text analysis alone, and Python generally performs faster on text analytics tasks.

### 14. What is the difference between extrapolation and interpolation?

Extrapolation is an estimation of a value based on extending a known sequence of values or facts beyond the area that is certainly known. Interpolation is an estimation of a value between two known values in a sequence of values.

### 15. What is the purpose of A/B testing?

A/B testing, in the context of email, is the process of sending one variation of your campaign to a subset of your subscribers and a different variation to another subset of subscribers, with the ultimate goal of working out which variation of the campaign garners the best results.
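One way to decide which variation "won" is a two-proportion z-test on the results. The sketch below assumes invented open counts for two email variations; the function name and numbers are hypothetical, and other tests may suit a given campaign better.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical campaign: variation A got 200 opens from 1000 sends,
# variation B got 260 opens from 1000 sends
z, p_value = two_proportion_z_test(200, 1000, 260, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the two variations is unlikely to be due to chance alone.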

### 16. How is a mean value different from an expected value?

While the mean is the simple average of all observed values, the expected value (or expectation) is the probability-weighted average value of a random variable. The concept is easy to see with a fair coin tossed 10 times: the expected number of heads is 10 × 0.5 = 5, even though any single run of 10 tosses may produce a different count.
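The coin example can be sketched in code, assuming a fair coin: the expected number of heads in 10 tosses is computed from the probabilities, and the mean is computed from simulated runs.

```python
import math
import random

n, p = 10, 0.5  # 10 tosses of a fair coin (assumed)

# Expected value: each possible head count k weighted by its probability
expected_heads = sum(
    k * math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)
)

# Mean: simple average of the head counts observed in simulated runs
random.seed(1)
runs = [sum(random.random() < p for _ in range(n)) for _ in range(5000)]
observed_mean = sum(runs) / len(runs)

print(f"expected value: {expected_heads}")
print(f"observed mean:  {observed_mean:.3f}")
```

The observed mean gets closer to the expected value as the number of runs grows.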

### 17. Why is it mandatory to clean a data set?

Data cleansing, also called scrubbing or appending, is the procedure of correcting or removing inaccurate and corrupt data. This process is crucial because wrong data can drive a business to wrong decisions, wrong conclusions, and poor analysis, especially when huge quantities of big data are involved.

### 18. What are the steps involved in analytics projects?

- Understand the Business.
- Get Your Data.
- Explore and Clean Your Data.
- Enrich Your Dataset.
- Build Helpful Visualizations.
- Get Predictive.
- Iterate.

### 19. What do you understand by the term recommender systems?

A recommender system, or a recommendation system is a subclass of information filtering system that seeks to predict the “rating” or “preference” a user would give to an item.

### 20. If you had to choose between the programming languages R and Python, Which one would you use for text analytics?

R programming is better suited for statistical learning, with unmatched libraries for data exploration and experimentation. Python is a better choice for machine learning and large-scale applications, especially for data analysis within web applications.

### 21. For linear regression, what are some of the assumptions a data scientist is most likely to make?

- Linear relationship.
- Multivariate normality.
- No or little multicollinearity.
- No auto-correlation.
- Homoscedasticity.

### 22. How do you find the correlation between a categorical variable and a continuous variable?

There are three big-picture methods for checking whether a continuous variable and a categorical variable are significantly correlated: point-biserial correlation, logistic regression, and the Kruskal-Wallis H test. The point-biserial correlation coefficient is a special case of Pearson’s correlation coefficient, used when the categorical variable has exactly two levels.
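Because the point-biserial coefficient is a special case of Pearson's coefficient, it can be sketched by coding a two-level categorical variable as 0 and 1 and computing Pearson's r directly. The group labels and scores below are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: group membership coded 0/1, and a continuous score
group = [0, 0, 0, 0, 1, 1, 1, 1]
score = [52, 55, 49, 58, 71, 68, 75, 66]

# Point-biserial correlation is Pearson's r with a 0/1 variable
r_pb = pearson(group, score)
print(round(r_pb, 3))
```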

### 23. Can you explain the difference between a Test Set and a Validation Set?

- Validation set: A set of examples used to tune the parameters of a classifier, for example to choose the number of hidden units in a neural network.
- Test set: A set of examples used only to assess the performance of a fully specified classifier.

These are the recommended definitions and usages of the terms.
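A minimal sketch of how the two sets might be carved out of a dataset, assuming a 60/20/20 split and a hypothetical helper named `train_val_test_split`. The proportions are a common convention, not a rule.

```python
import random

def train_val_test_split(items, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle a copy of the items and split it into train, validation, test."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]                # used once, for the final assessment
    val = shuffled[n_test:n_test + n_val]   # used repeatedly, for tuning
    train = shuffled[n_test + n_val:]       # used to fit the model
    return train, val, test

# Hypothetical dataset of 100 example IDs
train, val, test = train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))
```

Keeping the test set untouched until the end is what makes its score an honest estimate of performance on new data.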

### 24. What is an Auto-Encoder?

An autoencoder is an unsupervised artificial neural network used for representation learning. It learns how to compress data and then reconstruct it into a representation close to the original form.

### 25. How can the outlier values be treated?

- Set up a filter in your testing tool. Even though this has a little cost, filtering out outliers is worth it.
- Remove or change outliers during post-test analysis.
- Change the value of outliers.
- Consider the underlying distribution.
- Consider the value of mild outliers.
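As one possible way to implement the filtering and value-changing options above, the sketch below uses the 1.5 × IQR rule. That rule is an assumption chosen for illustration (the list does not prescribe a method), and the data values are invented.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Lower and upper fences based on the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical measurements with one extreme value
values = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]
low, high = iqr_bounds(values)

# Option 1: filter the outliers out
filtered = [v for v in values if low <= v <= high]

# Option 2: change their value by capping at the fences
capped = [min(max(v, low), high) for v in values]

print(filtered)
print(capped)
```

Whether to remove or cap depends on the underlying distribution and on whether the extreme values carry real information.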

### 26. Describe the structure of Artificial Neural Networks.

An ANN is made up of three types of layers: an input layer, one or more hidden layers, and an output layer. Nodes in the input layer must be connected to nodes in the hidden layer, and each hidden-layer node must be connected to the nodes of the output layer. The input layer receives the data and passes it into the network.
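The layered structure can be sketched as a forward pass through a tiny network with 2 input nodes, 3 hidden nodes, and 1 output node. The weights, biases, and sigmoid activation are arbitrary assumptions for illustration; a real network would learn its weights during training.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """Fully connected layer: weighted sum plus bias per node, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

# Hypothetical parameters: one row of weights per node in the layer
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]  # 3 hidden nodes, 2 inputs each
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.7, -0.5, 0.2]]                      # 1 output node, 3 inputs
output_b = [0.05]

def forward(inputs):
    """Input layer -> hidden layer -> output layer."""
    hidden = dense(inputs, hidden_w, hidden_b)
    return dense(hidden, output_w, output_b)

output = forward([1.0, 2.0])
print(output)
```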