Here is a list of Data Science interview questions that were recently asked at Mindtree. The questions are suitable for both freshers and experienced professionals. Our **Data Science Training** covers the answers to all of the questions below.

### 1. Explain Eigenvalue and Eigenvector

Geometrically, an eigenvector corresponding to a real nonzero eigenvalue points in a direction in which it is stretched by the transformation, and the eigenvalue is the factor by which it is stretched. If the eigenvalue is negative, the direction is reversed. Formally, a nonzero vector v is an eigenvector of a matrix A with eigenvalue λ when Av = λv.
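The stretching property can be checked numerically. The following is a minimal sketch, assuming a symmetric 2x2 matrix with a nonzero off-diagonal entry; the function names `eigen_2x2_symmetric` and `matvec` are hypothetical helpers written for this illustration, not part of any library.

```python
import math

def eigen_2x2_symmetric(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], assuming b != 0."""
    if b == 0:
        raise ValueError("this sketch assumes a nonzero off-diagonal entry")
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        # v = (b, lam - a) satisfies (A - lam * I) v = 0.
        pairs.append((lam, (b, lam - a)))
    return pairs

def matvec(a, b, d, v):
    """Multiply [[a, b], [b, d]] by the vector v."""
    return (a * v[0] + b * v[1], b * v[0] + d * v[1])

# Matrix [[2, 1], [1, 2]]: applying it to an eigenvector only rescales the vector.
for lam, v in eigen_2x2_symmetric(2.0, 1.0, 2.0):
    print("eigenvalue:", lam, "eigenvector:", v, "A @ v:", matvec(2.0, 1.0, 2.0, v))
```

For larger matrices, a numerical library routine would normally be used instead of a closed-form solution like this one.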

### 2. Define the term cross-validation

Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model. … The basic form of cross-validation is k-fold cross-validation.

### 3. Explain the steps for a Data analytics project

- Step 1: Understand the Business.
- Step 2: Get Your Data.
- Step 3: Explore and Clean Your Data.
- Step 4: Enrich Your Dataset.
- Step 5: Build Helpful Visualizations.
- Step 6: Get Predictive.
- Step 7: Iterate, Iterate, Iterate.

### 4. Discuss Artificial Neural Networks

An artificial neural network (ANN) is a component of a computing system designed to simulate the way the human brain analyzes and processes information. ANNs are a foundation of modern artificial intelligence (AI) and solve problems that would prove impossible or difficult by human or statistical standards.

### 5. What is Back Propagation?

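Back propagation is the essence of neural network training. It is the method of fine-tuning the weights of a neural network based on the error obtained in the previous epoch (i.e., iteration): the error is propagated backward through the network to work out how much each weight contributed to it, and each weight is then adjusted to reduce that error.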
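As an illustration, here is a minimal sketch of back propagation on a toy network with one hidden unit, `y_hat = w2 * tanh(w1 * x)`, and a squared-error loss. The network shape, starting weights, learning rate, and function names are all assumptions chosen for this example.

```python
import math

def forward(w1, w2, x):
    """Forward pass: hidden activation h and prediction y_hat."""
    h = math.tanh(w1 * x)
    return h, w2 * h

def loss(w1, w2, x, y):
    """Squared-error loss for a single training example."""
    _, y_hat = forward(w1, w2, x)
    return 0.5 * (y_hat - y) ** 2

def backward(w1, w2, x, y):
    """Backward pass: apply the chain rule from the loss back to each weight."""
    h, y_hat = forward(w1, w2, x)
    d_y_hat = y_hat - y              # dL/dy_hat
    d_w2 = d_y_hat * h               # dL/dw2
    d_h = d_y_hat * w2               # dL/dh
    d_w1 = d_h * (1 - h * h) * x     # dL/dw1, using tanh'(z) = 1 - tanh(z)^2
    return d_w1, d_w2

# Illustrative values, not taken from any real dataset.
w1, w2, x, y, learning_rate = 0.5, -0.3, 1.0, 0.8, 0.1
initial_loss = loss(w1, w2, x, y)
for _ in range(200):
    g1, g2 = backward(w1, w2, x, y)
    w1 -= learning_rate * g1
    w2 -= learning_rate * g2
final_loss = loss(w1, w2, x, y)
print("loss before:", initial_loss, "loss after:", final_loss)
```

Real frameworks compute these gradients automatically; writing them by hand here only makes the chain-rule steps visible.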

### 6. What is a Random Forest?

Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the random forest is the class selected by most trees.
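The final voting step for classification can be sketched as follows. This shows only how per-tree predictions are combined, not how the trees are trained; `majority_vote` and the example labels are hypothetical.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the most trees for a single sample."""
    counts = Counter(tree_predictions)
    # most_common(1) returns the (label, count) pair with the highest count.
    return counts.most_common(1)[0][0]

# Hypothetical predictions from five trees for one sample.
votes = ["spam", "ham", "spam", "spam", "ham"]
print(majority_vote(votes))
```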

### 7. What is selection bias, and why is it important?

Selection bias is the bias introduced by the selection of individuals, groups, or data for analysis in such a way that proper randomization is not achieved, so the sample obtained is not guaranteed to be representative of the population intended to be analyzed. It matters because conclusions drawn from such a sample may not hold for that population.

### 8. What is the K-means clustering method?

The K-means algorithm is an iterative algorithm that tries to partition the dataset into K pre-defined, distinct, non-overlapping subgroups (clusters), where each data point belongs to only one group. The less variation there is within clusters, the more homogeneous (similar) the data points in the same cluster are.
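The iteration can be sketched in plain Python as below. This is a minimal illustration that assumes the initial centroids are supplied by the caller; the function names and sample points are made up for the example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]

def k_means(points, centroids, max_iterations=100):
    """Alternate between assigning points and recomputing centroids."""
    labels = assign(points, centroids)
    for _ in range(max_iterations):
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # The new centroid is the coordinate-wise mean of the members.
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        centroids = new_centroids
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    return centroids, labels

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

In practice the result depends on the initial centroids, so implementations often run several random initializations and keep the best one.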

### 9. Explain the difference between Data Science and Data Analytics

Data science is an umbrella term for a group of fields that are used to mine large datasets. Data analytics is a more focused version of this and can be considered part of the larger process; it is mainly concerned with statistics, mathematics, and statistical analysis.

### 10. Explain p-value?

A p-value is the probability of observing a difference at least as extreme as the one actually observed, assuming the null hypothesis is true (that is, assuming the difference is due to random chance alone). The lower the p-value, the greater the statistical significance of the observed difference. The p-value can be used as an alternative to, or in addition to, pre-selected significance levels for hypothesis testing.
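A small worked example may help. The sketch below computes a one-sided p-value for a hypothetical coin-flip experiment (9 heads in 10 flips, with a fair coin as the null hypothesis); the scenario and the function name are assumptions made for illustration.

```python
from math import comb

def binomial_p_value_at_least(successes, trials, p_null=0.5):
    """One-sided p-value: P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Probability of seeing 9 or more heads in 10 flips of a fair coin.
p_value = binomial_p_value_at_least(9, 10)
print(round(p_value, 4))  # 0.0107
```

With a significance level of 0.05, this result would lead to rejecting the hypothesis that the coin is fair.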

### 11. What are the important libraries of Python that are used in Data Science?

- TensorFlow.
- NumPy.
- SciPy.
- Pandas.
- Matplotlib.

### 12. What does NLP stand for?

NLP stands for Natural Language Processing. It is a subfield or branch of Artificial intelligence (AI) that enables computers to understand human languages and process them in a manner that is valuable.

### 13. What is Interpolation and Extrapolation?

Interpolation is the process of estimating an unknown value that lies between known data points, whereas extrapolation is the process of estimating unknown values that lie beyond the range of the known data points.
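The difference can be shown with a straight line through two known points. This is a minimal sketch assuming a linear relationship; `linear_estimate` and the sample points are hypothetical.

```python
def linear_estimate(x0, y0, x1, y1, x):
    """Estimate y at x using the straight line through (x0, y0) and (x1, y1)."""
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)

# Known data points: (1, 10) and (3, 30).
print(linear_estimate(1, 10, 3, 30, 2))  # interpolation: x = 2 lies between 1 and 3
print(linear_estimate(1, 10, 3, 30, 5))  # extrapolation: x = 5 lies beyond 3
```

Extrapolated values are generally less reliable, because nothing in the data confirms that the same relationship holds outside the observed range.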

### 14. How can the outlier values be treated?

- Set up a filter in your testing tool. Even though this has a small cost, filtering out outliers is worth it.
- Remove outliers during post-test analysis.
- Change the value of outliers, for example by capping them at a threshold.
- Consider the underlying distribution.
- Consider the value of mild outliers.
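Two of the treatments above, removing outliers and changing their values, can be sketched as follows. The 1.5 × IQR rule used to flag outliers is one common convention and is an assumption here, as are the function names and the sample data.

```python
import statistics

def iqr_bounds(values, factor=1.5):
    """Lower and upper fences based on the interquartile range (IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr

def remove_outliers(values, factor=1.5):
    """Drop values that fall outside the fences."""
    low, high = iqr_bounds(values, factor)
    return [v for v in values if low <= v <= high]

def clip_outliers(values, factor=1.5):
    """Replace values outside the fences with the nearest fence."""
    low, high = iqr_bounds(values, factor)
    return [min(max(v, low), high) for v in values]

data = [12, 13, 12, 14, 13, 15, 14, 13, 120]
print(remove_outliers(data))
print(clip_outliers(data))
```

Which treatment is appropriate depends on the underlying distribution and on whether the extreme values are errors or genuine observations.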

### 15. How often should an algorithm be updated?

An algorithm can be updated based on many criteria, such as changes in the data, changes in domain expertise, data size, performance, and more.


### 16. Define the term deep learning

Deep learning is a type of machine learning and artificial intelligence (AI) that imitates the way humans gain certain types of knowledge. Deep learning is an important element of data science, which includes statistics and predictive modeling.

### 17. Explain the method to collect and analyze data to use social media to predict the weather condition.

Observational data collected by Doppler radar, radiosondes, weather satellites, buoys, and other instruments are fed into computerized National Weather Service (NWS) numerical forecast models. The models use equations, along with new and past weather data, to provide forecast guidance to meteorologists.

### 18. When do you need to update the algorithm in Data science?

Knowledge of algorithms and data structures is useful for data scientists because our solutions are inevitably written in code. As such, it is important to understand the structure of our data and how to think in terms of algorithms. An algorithm can be updated whenever there is a need for change in the data, domain, logic, data set, data size, etc.

### 19. What is Normal Distribution?

The normal distribution is a probability distribution that describes how the values of a variable are distributed. It is a symmetric distribution where most of the observations cluster around the central peak (the mean), and the probabilities for values further away from the mean taper off equally in both directions.
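The clustering around the mean can be quantified with Python's standard library. The sketch below uses `statistics.NormalDist`; the helper `probability_within` is a hypothetical name introduced for this example.

```python
from statistics import NormalDist

def probability_within(dist, k):
    """Probability that a value falls within k standard deviations of the mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)

standard = NormalDist(mu=0.0, sigma=1.0)
for k in (1, 2, 3):
    print(k, round(probability_within(standard, k), 4))

# Symmetry: the density is the same at equal distances on either side of the mean.
print(standard.pdf(-1.0), standard.pdf(1.0))
```

The three probabilities correspond to the familiar 68-95-99.7 rule.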

### 20. Which language is best for text analytics? R or Python?

Python is generally the better option because it has the Pandas library, which provides easy-to-use data structures and high-performance data analysis tools. R is better suited to machine learning than to text analysis alone. Python is also often faster for text analytics, although this depends on the task and the libraries used.

### 21. Explain the benefits of using statistics by Data Scientists

Statistics is the application of mathematics to the technical analysis of data. It is used to tackle complex real-world problems so that data scientists and analysts can look for meaningful trends and changes in data.

### 22. Name various types of Deep Learning Frameworks

- TensorFlow.
- Keras.
- PyTorch.
- MXNet.
- Chainer.

### 23. What is skewed Distribution & uniform distribution?

A skewed distribution is one in which the observations are concentrated on one side of the graph, with a longer tail extending toward the other side (either right or left). A uniform distribution is one in which all the observations in a dataset are spread equally across the range of the distribution.
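Skew can be measured numerically. The sketch below uses the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second), which is one of several conventions; the function name and sample data are assumptions for illustration.

```python
def sample_skewness(values):
    """Moment-based skewness: positive for a long right tail, negative for a long left tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 10]   # one large value stretches the right tail
evenly_spread = [1, 2, 3, 4, 5]         # evenly spread values, symmetric about the mean

print(sample_skewness(right_skewed))
print(sample_skewness(evenly_spread))
```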

### 24. What is reinforcement learning?

Reinforcement learning is the training of machine learning models to make a sequence of decisions. The agent learns to achieve a goal in an uncertain, potentially complex environment.

### 25. What is precision?

Precision is the fraction of positive class predictions that actually belong to the positive class. Recall is the fraction of all positive examples in the dataset that were correctly predicted as positive. The F-measure provides a single score that balances both precision and recall in one number.
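These three metrics can be computed directly from prediction counts. The following is a minimal sketch; the counts (8 true positives, 2 false positives, 4 false negatives) are invented for the example.

```python
def precision(tp, fp):
    """Fraction of positive predictions that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives that were predicted as positive."""
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
print(precision(8, 2))                 # 0.8
print(round(recall(8, 4), 4))          # 0.6667
print(round(f1_score(8, 2, 4), 4))     # 0.7273
```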

### 26. Explain Cross-validation?

Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample. The procedure has a single parameter called k that refers to the number of groups that a given data sample is to be split into. As such, the procedure is often called k-fold cross-validation.
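The splitting step of k-fold cross-validation can be sketched as below. It only produces the index splits; training and scoring a model on each split is left out. The function name is hypothetical, and the sketch assumes the data has already been shuffled if order matters.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        validation = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, validation
        start = stop

folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each sample appears in exactly one validation fold, so every observation is used for both training and validation across the k rounds.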

### 27. Do you prefer Python or R for text analytics?

Python is generally the better option because it has the Pandas library, which provides easy-to-use data structures and high-performance data analysis tools. R is better suited to machine learning than to text analysis alone. Python is also often faster for text analytics, although this depends on the task and the libraries used.

### 28. What is Cluster Sampling?

Cluster sampling divides the population into groups or clusters. A number of clusters are selected randomly to represent the total population, and then all units within the selected clusters are included in the sample.
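The procedure can be sketched as follows, assuming the population is already grouped into named clusters. The function name, the `seed` parameter, and the school data are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and include every unit in each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical population: students grouped by school.
population = {
    "school_a": ["a1", "a2", "a3"],
    "school_b": ["b1", "b2"],
    "school_c": ["c1", "c2", "c3", "c4"],
    "school_d": ["d1"],
}
sample = cluster_sample(population, n_clusters=2, seed=42)
print(sample)
```

The randomness is applied to the choice of clusters, not to individual units, which is what distinguishes this from simple random sampling.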

### 29. What tools or devices help you succeed in your role as a data scientist?

- SAS.
- Apache Spark.
- BigML.
- D3.
- MATLAB.

### 30. Where to seek help in case of discrepancies in Tableau?

For help in determining who your site administrator is, contact the Tableau Customer Support team at customerservice@tableau.com. Include your Tableau Online site name (if you have it), sales order number, and invoice with your email so that access can be arranged as soon as possible.
