Below is a list of Data Science interview questions recently asked at Cisco Systems. The questions are suitable for both freshers and experienced professionals. Our **Data Science Training** covers the answers to all of the questions below.

### 1. What is Data Science?

Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data. Analysts and business users can then translate those insights into tangible business value.

### 2. What is logistic regression in Data Science?

Logistic regression is a statistical method for predicting binary classes, where the outcome or target variable is dichotomous in nature. It assumes a linear relationship between the input variables and the log-odds of the outcome, not the outcome itself.
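
As a minimal sketch of how a fitted logistic regression turns inputs into a binary prediction, the example below passes a linear combination of the features through the sigmoid function. The weights, bias, and 0.5 decision threshold are hypothetical values chosen only for illustration, not the result of any real training run.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class: sigmoid of a linear combination."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical coefficients (assumed for illustration, not learned from data).
weights = [0.8, -0.4]
bias = -0.2

p = predict_proba([2.0, 1.0], weights, bias)
label = 1 if p >= 0.5 else 0  # assumed decision threshold of 0.5
print(round(p, 3), label)
```

The linear part `z` is the log-odds of the positive class, which is where the model's linearity lives.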

### 3. Name three types of biases that can occur during sampling.

Three types of bias can be distinguished: information bias, selection bias, and confounding.

### 4. Discuss Decision Tree algorithm.

The goal of this algorithm is to create a model that predicts the value of a target variable. To do so, the decision tree uses a tree representation in which each internal node tests an attribute and each leaf node corresponds to a class label.

### 5. What is Prior probability and likelihood?

Prior probability is the probability of an outcome before any evidence is taken into account. For example, in the mortgage case, P(Y) is the default rate on a home mortgage, which is 2%. The likelihood, P(X|Y), is the probability of observing the evidence X given that the outcome Y occurred. P(Y|X) is called the posterior probability, and calculating it with Bayes’ theorem is the objective.
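
The relationship between these quantities can be sketched with Bayes' theorem. In the example below, only the 2% prior comes from the mortgage case above. The two likelihood values are hypothetical numbers assumed for illustration, where X might stand for some observed warning sign.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Bayes' theorem: P(Y|X) = P(X|Y) * P(Y) / P(X)."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

prior_default = 0.02       # P(Y): default rate from the mortgage example
p_x_given_default = 0.60   # P(X|Y): hypothetical likelihood
p_x_given_no_default = 0.10  # P(X|not Y): hypothetical likelihood

p_default_given_x = posterior(prior_default, p_x_given_default, p_x_given_no_default)
print(round(p_default_given_x, 4))
```

Under these assumed likelihoods, observing X raises the probability of default well above the 2% prior, although it remains far from certain.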

### 6. Explain Recommender Systems?

A recommender system, or a recommendation system (sometimes replacing ‘system’ with a synonym such as platform or engine), is a subclass of information filtering system that seeks to predict the “rating” or “preference” a user would give to an item.

### 7. Name three disadvantages of using a linear model

- Linear Regression Only Looks at the Mean of the Dependent Variable. Linear regression looks at a relationship between the mean of the dependent variable and the independent variables.
- Linear Regression Is Sensitive to Outliers.
- Data Must Be Independent.

### 8. Why do you need to perform resampling?

Resampling is performed to balance the class distribution of an imbalanced dataset. One approach, undersampling, randomly eliminates majority-class examples. When instances of two different classes are very close to each other, removing the majority-class instances increases the space between the two classes, which helps the classification process.
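
A minimal sketch of random undersampling, assuming a toy dataset and a hypothetical helper named `random_undersample`. It shows only the random elimination of majority-class examples, not the removal of examples near the class boundary.

```python
import random
from collections import Counter

def random_undersample(rows, labels, seed=0):
    """Randomly drop examples so every class matches the smallest class size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        out_rows.extend(rng.sample(group, target))  # keep `target` random rows
        out_labels.extend([label] * target)
    return out_rows, out_labels

# Toy imbalanced data (assumed): 8 majority-class rows, 3 minority-class rows.
rows = list(range(11))
labels = [0] * 8 + [1] * 3

balanced_rows, balanced_labels = random_undersample(rows, labels)
print(Counter(balanced_labels))
```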

### 9. List out the libraries in Python used for Data Analysis and Scientific Computations.

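SciPy (Scientific Python) is the go-to library for scientific computing and is used heavily in the fields of mathematics, science, and engineering. Other widely used libraries include NumPy for array computation, pandas for data analysis, Matplotlib for plotting, and scikit-learn for machine learning.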

### 10. What are the differences between overfitting and underfitting?

Overfitting is a modeling error that occurs when a function is fit too closely to a limited set of data points, so it captures noise and generalizes poorly to new data. Underfitting refers to a model that can neither model the training data nor generalize to new data.

### 11. What Is K-means? How Can You Select K For K-means?

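The K-means algorithm identifies k centroids and then allocates every data point to the cluster of the nearest centroid, while keeping the distances between points and their centroids as small as possible. The ‘means’ in K-means refers to averaging the data, that is, finding the centroid. A common way to select k is the elbow method: run K-means for several values of k and choose the point where the within-cluster sum of squares stops decreasing sharply.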
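
The assign-then-average loop can be sketched in a few lines. This is a simplified one-dimensional version with assumed toy data and hand-picked starting centroids, not a production implementation. The within-cluster sum of squares it reports is the quantity that the elbow heuristic compares across candidate values of k.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroid means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # The new centroid is the mean of its cluster (kept as-is if empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

def within_cluster_ss(centroids, clusters):
    """Sum of squared distances from each point to its centroid."""
    return sum((p - c) ** 2 for c, cl in zip(centroids, clusters) for p in cl)

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]  # toy data (assumed)
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids, within_cluster_ss(centroids, clusters))
```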

### 12. Why is data cleaning essential in Data Science?

Data cleaning is considered a foundational element of data science. Real-world data frequently contains incomplete, inconsistent, or missing values. If the data is corrupted, it may hinder the process or produce inaccurate results.

### 13. What do you mean by the term Data Science?

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains.

### 14. Do you think 50 small decision trees are better than a large one? Why?

Yes. Fifty small trees combined create a more robust model (less subject to overfitting), and each small tree is easier to interpret than one large tree.

### 15. What is Power Analysis?

Power analysis is the process of estimating one of four variables (effect size, sample size, significance level, and statistical power) given values for the other three. It is commonly used to estimate the minimum sample size needed to carry out an experiment.
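
As a rough sketch of the sample-size use case, the function below solves for the per-group sample size of a two-sided, two-sample comparison using the normal approximation. The four quantities are taken to be effect size, significance level, power, and sample size. The function name and the default values of 0.05 and 0.80 are assumptions for illustration.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size for comparing two group means.

    effect_size is the standardized difference between means (Cohen's d).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)

n = sample_size_per_group(effect_size=0.5)
print(n)
```

Smaller effects need larger samples: calling the same function with `effect_size=0.2` returns a much larger number. An exact calculation based on the t distribution gives slightly higher values than this approximation.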

### 16. Explain Collaborative filtering

Collaborative filtering filters information by using the interactions and data collected by the system from other users. It’s based on the idea that people who agreed in their evaluation of certain items are likely to agree again in the future.
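
The idea that users who agreed before are likely to agree again can be sketched as user-based filtering: predict a missing rating as a similarity-weighted average of other users' ratings. The user names, items, and ratings below are made up for illustration, and cosine similarity is one common choice among several.

```python
import math

# Hypothetical user -> {item: rating} data.
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob":   {"A": 5, "B": 5, "C": 1, "D": 5},
    "carol": {"A": 1, "B": 2, "C": 5, "D": 1},
}

def cosine_similarity(u, v):
    """Cosine similarity over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    numerator = denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        numerator += sim * their_ratings[item]
        denominator += sim
    return numerator / denominator if denominator else None

print(predict("alice", "D"))
```

Because alice's past ratings resemble bob's far more than carol's, the prediction for item D is pulled toward bob's rating.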

### 17. What is bias?

Bias is a systematic deviation from expectation in the data. In a general sense, bias in data science refers to an error in the data, but the error is often subtle or goes overlooked.

### 18. Discuss ‘Naive’ in a Naive Bayes algorithm?

It is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

### 19. What is a Linear Regression?

Linear regression is a linear approach to modelling the relationship between a dependent variable and one or more independent variables. In the case of one independent variable it is called simple linear regression. For more than one independent variable, the process is called multiple linear regression.
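
A minimal sketch of simple linear regression (one independent variable) using the closed-form least-squares estimates. The data points are assumed toy values that lie exactly on a line, so the fit is easy to check by hand.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) that minimize the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]   # independent variable (toy data)
ys = [3.0, 5.0, 7.0, 9.0]   # dependent variable, generated as y = 2x + 1

slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)
```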

### 20. State the difference between the expected value and mean value.

Mean (or “average”) and expected value differ only in their applications; conceptually they are the same. Expected value is used for random variables (in other words, probability distributions), where each possible value is weighted by its probability. The mean is used for observed data and is defined as the sum of all the elements, each weighted by its frequency, divided by the sum of their frequencies.
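
The distinction can be illustrated with a fair six-sided die: the expected value is computed from the probability distribution, while the mean is computed from observed rolls. The simulation below is a sketch, with the seed and the number of rolls chosen arbitrarily.

```python
import random

faces = [1, 2, 3, 4, 5, 6]
probabilities = [1 / 6] * 6

# Expected value: each possible value weighted by its probability.
expected_value = sum(f * p for f, p in zip(faces, probabilities))

# Mean: the average of actual observations (simulated rolls here).
rng = random.Random(42)  # arbitrary seed for repeatability
rolls = [rng.choice(faces) for _ in range(10_000)]
sample_mean = sum(rolls) / len(rolls)

print(expected_value, sample_mean)
```

As the number of rolls grows, the sample mean tends toward the expected value.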

### 21. What is the aim of conducting A/B Testing?

A/B testing is a method of comparing two versions of a webpage or app against each other to determine which one performs better.
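
One common way to decide which version performs better is a two-proportion z-test on conversion rates. The sketch below uses hypothetical visitor and conversion counts, and other tests may suit other metrics.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical results: version A converts 200 of 2000, version B 260 of 2000.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(round(z, 2), round(p_value, 4))
```

A small p-value suggests the difference between the two versions is unlikely to be due to chance alone.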

### 22. What is Ensemble Learning?

Ensemble learning is a general meta approach to machine learning that seeks better predictive performance by combining the predictions from multiple models.
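
A minimal sketch of one way to combine predictions, majority voting. The three "models" here are hypothetical threshold rules standing in for trained classifiers.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common prediction among the models."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical weak classifiers over a two-feature input.
models = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.0),
]

def ensemble_predict(x):
    """Combine the individual model predictions by majority vote."""
    return majority_vote([model(x) for model in models])

print(ensemble_predict([0.9, 0.2]), ensemble_predict([0.1, 0.6]))
```

Each input above is misjudged by one of the rules, yet the vote follows the other two.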

### 23. Explain Eigenvalue and Eigenvector?

An eigenvector of a square matrix A is a nonzero vector v whose direction is unchanged when A is applied to it: Av = λv. Eigenvectors are often normalized to unit vectors, which means that their length or magnitude is equal to 1, and they are often referred to as right vectors, which simply means column vectors. The eigenvalue λ is the coefficient by which the matrix scales the eigenvector.
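
The defining relationship, that multiplying a matrix by one of its eigenvectors only rescales that vector by the eigenvalue, can be checked directly. The 2x2 matrix below is an assumed example whose eigenpairs are easy to verify by hand.

```python
import math

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def is_eigenpair(matrix, vector, value, tol=1e-9):
    """Check whether matrix @ vector equals value * vector."""
    product = mat_vec(matrix, vector)
    return all(abs(p - value * x) <= tol for p, x in zip(product, vector))

def normalize(vector):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]

A = [[2, 1],
     [1, 2]]

print(is_eigenpair(A, [1, 1], 3))              # eigenvalue 3
print(is_eigenpair(A, [1, -1], 1))             # eigenvalue 1
print(is_eigenpair(A, normalize([1, 1]), 3))   # unit-length version also works
```

Rescaling an eigenvector leaves it an eigenvector with the same eigenvalue, which is why libraries are free to return unit-length versions.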

### 24. How will you assess the statistical significance of an insight whether it is a real insight or just by chance?

Statistical significance is often calculated with statistical hypothesis testing, which tests the validity of a hypothesis by estimating the probability that your results happened by chance. The result of a hypothesis test shows whether the assumption that the insight is due to chance holds up under scrutiny.

### 25. What are the basic assumptions to be made for linear regression?

- Linearity: The relationship between X and the mean of Y is linear.
- Homoscedasticity: The variance of the residuals is the same for any value of X.
- Independence: Observations are independent of each other.

### 26. How can you iterate over a list and also retrieve element indices at the same time?

In Python, the built-in `enumerate` function yields (index, value) pairs while iterating over a list. It takes an optional `start` argument, which is helpful when you need to count from 1 or any other number instead of 0.

    numbers = [10, 20, 30]  # example list
    for index, value in enumerate(numbers, start=1):
        print('The value at position', index, 'is', value)

### 27. Do gradient descent methods always converge to same point?

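Not always. In gradient descent, the result depends on where you start (the initialization). On a non-convex function it is very easy to get stuck in a local minimum, so different starting points can converge to different minima. If you start from the same point each time, with the same settings, it will converge to the same minimum.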
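
The dependence on initialization can be sketched with a simple non-convex function. The function f(x) = (x² − 1)², the learning rate, and the step count are all assumptions chosen for illustration. This function has two minima, at x = −1 and x = 1.

```python
def gradient(x):
    """Derivative of f(x) = (x**2 - 1)**2."""
    return 4 * x * (x ** 2 - 1)

def gradient_descent(start, learning_rate=0.01, steps=1000):
    """Plain gradient descent from a given starting point."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x

right = gradient_descent(2.0)    # starts on the right, ends near x = 1
left = gradient_descent(-2.0)    # starts on the left, ends near x = -1
repeat = gradient_descent(2.0)   # same start, same result

print(round(right, 4), round(left, 4), repeat == right)
```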

### 28. What is Collaborative filtering?

Collaborative filtering (CF) is the process of filtering or evaluating items through the opinions of other people. CF technology brings together the opinions of large interconnected communities on the web, supporting filtering of substantial quantities of data.