Here is a list of Data Science interview questions recently asked at Cognizant. The questions are suitable for both freshers and experienced professionals. Our **Data Science Training** covers answers to all of the questions below.

### 1. Explain what regularization is and why it is useful?

Regularization is a technique for tuning a model by adding an additional penalty term to the error function. The penalty term controls an excessively fluctuating function so that the coefficients don’t take extreme values.
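As a minimal sketch of this idea, consider ridge (L2) regularization for a one-variable model with no intercept: adding a penalty `lam * w**2` to the squared error changes the closed-form coefficient to `sum(x*y) / (sum(x*x) + lam)`, so larger penalties shrink the coefficient toward zero. The data below is made up purely for illustration.

```python
# Ridge (L2) regularization in one dimension: the penalty lambda * w^2
# added to the squared error gives the closed-form coefficient
#   w = sum(x*y) / (sum(x*x) + lambda),
# so a larger penalty shrinks w toward zero.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]  # roughly y = 2x, with small noise (illustrative)

def ridge_coef(xs, ys, lam):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

for lam in (0.0, 1.0, 10.0):
    print("lambda =", lam, "-> coefficient", round(ridge_coef(xs, ys, lam), 3))
```

Note how the coefficient decreases monotonically as the penalty grows — that is the “controls extreme values” effect described above.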

### 2. Which data scientists do you admire most? which startups?

Dean Abbott, co-founder and chief data scientist at SmarterHQ, is one of the most admired, because he is masterful at blending data science with a deep practical understanding of the field.

### 3. How would you validate a model you created to generate a predictive model of a quantitative outcome variable using multiple regression?

Validating a predictive model requires (i) dividing the initial sample into training and validation datasets, (ii) fitting the model on the training dataset, and (iii) evaluating the quality of the model on the validation dataset by computing appropriate metrics (for a quantitative outcome, e.g. RMSE or R-squared).
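The three steps can be sketched with plain Python on a synthetic dataset (the data and split point below are invented for illustration): split, fit a simple linear regression on the training part, then score the held-out part with RMSE.

```python
import math

# Synthetic data: y = 3x + 1 with small alternating noise (illustrative).
data = [(x, 3.0 * x + 1.0 + ((-1) ** x) * 0.5) for x in range(20)]
train, valid = data[:15], data[15:]           # (i) split into train/validation

def fit_ols(points):                          # (ii) fit on the training data
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, my - slope * mx

slope, intercept = fit_ols(train)

def rmse(points, slope, intercept):           # (iii) evaluate on validation
    errs = [(y - (slope * x + intercept)) ** 2 for x, y in points]
    return math.sqrt(sum(errs) / len(errs))

print("validation RMSE:", round(rmse(valid, slope, intercept), 3))
```

In practice you would use a library routine (e.g. a train/test split helper) rather than hand-rolling the fit, but the validation logic is the same.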

### 4. Explain what precision and recall are. How do they relate to the ROC curve?

**Recall**

Recall is the number of relevant documents retrieved by a search divided by the total number of existing relevant documents.

**Precision**

Precision is the number of relevant documents retrieved by a search divided by the total number of documents retrieved by that search.

The ROC curve represents the relation between sensitivity (recall) and specificity (not precision) and is commonly used to measure the performance of binary classifiers. However, when dealing with highly skewed datasets, precision-recall curves give a more representative picture of performance.
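The two definitions above reduce to simple ratios over true positives, false positives, and false negatives. A minimal check, with invented labels (1 = relevant/positive):

```python
# Precision and recall from predicted vs. actual binary labels.
actual    = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

precision = tp / (tp + fp)   # relevant retrieved / all retrieved
recall    = tp / (tp + fn)   # relevant retrieved / all relevant
print("precision:", precision, "recall:", recall)
```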

### 5. How can you prove that one improvement you’ve brought to an algorithm is really an improvement over not doing anything?

There are several ways to show that a change to an algorithm is a genuine improvement. One is A/B testing: both versions of the algorithm are kept running in similar environments for a considerably long time, and real-life input data is randomly split between the two. The observed difference between the versions can then be tested for statistical significance.
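One common way to judge whether the observed difference in an A/B test is statistically significant is a two-proportion z-test on the success counts of the two versions. The conversion numbers below are hypothetical, chosen only to illustrate the calculation:

```python
from math import sqrt, erf

def two_prop_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in two proportions."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)     # pooled proportion
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical results: version A converts 200/2000, version B 260/2000.
z, p = two_prop_ztest(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print("z =", round(z, 2), " p =", round(p, 4))
```

A p-value below the chosen significance level (commonly 0.05) would support the claim that version B really is an improvement.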

### 6. What is root cause analysis?

Root cause analysis (RCA) is the process of discovering the root causes of problems in order to identify appropriate solutions. It assumes that it is much more effective to systematically prevent and solve for underlying issues rather than just treating ad hoc symptoms and putting out fires.

### 7. Are you familiar with price optimization, price elasticity, inventory management, competitive intelligence? Give examples.

**Price optimization:**
The best pricing optimization software utilizes artificial intelligence to measure price elasticity and predict the outcomes of various pricing strategies to generate revenue- or profit-maximizing prices.

**Price elasticity:**
Price elasticity of demand is a measure used in economics to show the responsiveness, or elasticity, of the quantity demanded of a good or service to a change in its price when nothing but the price changes.

**Inventory management:**
Inventory is the accounting of items, component parts, and raw materials a company uses in production or sells. The goal of inventory management is to ensure that you have enough stock on hand and to identify when there is a shortage.

**Competitive intelligence:**
Competitive intelligence is the gathering and analysis of information about a company’s industry, business environment, competitors, and competitive products and services. This gathering and analysis supports the company’s strategy and helps identify competitive gaps.

### 8. What is statistical power?

Statistical power is the probability that a test will correctly reject a false null hypothesis. Statistical power has relevance only when the null is false.

### 9. Explain what resampling methods are and why they are useful. Also explain their limitations?

Resampling techniques are a set of methods to either repeat sampling from a given sample or population, or a way to estimate the precision of a statistic. For example, if you’re conducting a Sequential Probability Ratio Test and don’t come to a conclusion, then you resample and rerun the test.
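A widely used resampling method is the bootstrap: resample the observed sample with replacement many times to estimate the precision of a statistic, such as a confidence interval for the mean. The sample values below are illustrative. A known limitation is that the result can only reflect the sample you started from — if that sample is unrepresentative, the bootstrap will be too.

```python
import random

random.seed(42)                       # fixed seed for reproducibility
sample = [12, 15, 9, 14, 11, 18, 10, 13, 16, 12]

# Draw 5000 bootstrap resamples (same size, with replacement) and record
# each resample's mean.
means = []
for _ in range(5000):
    resample = [random.choice(sample) for _ in sample]
    means.append(sum(resample) / len(resample))

# Percentile 95% confidence interval for the mean.
means.sort()
low = means[int(0.025 * len(means))]
high = means[int(0.975 * len(means))]
print("95% bootstrap CI for the mean:", (round(low, 2), round(high, 2)))
```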

### 10. What is selection bias, why is it important and how can you avoid it?

Selection bias is the term used to describe the situation where an analysis has been conducted among a subset of the data (a sample) with the goal of drawing conclusions about the population, but the resulting conclusions will likely be wrong (biased) because the subgroup differs from the population in some important way.

### 11. Are expected value and mean value different?

Expected value is the average value of a random variable over a large number of experiments. A random variable maps each possible outcome of an experiment to a numeric value.

The Mean value of a dataset is the average value i.e. a number around which a whole data is spread out. All values used in calculating the average are weighted equally when defining the Mean.
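The distinction can be made concrete with a fair six-sided die: its expected value is a fixed property of the distribution (3.5), while the mean of any finite sample of rolls merely approximates it.

```python
import random

random.seed(0)                        # fixed seed for reproducibility

# Expected value: probability-weighted average over all outcomes.
expected = sum(face * (1 / 6) for face in range(1, 7))   # 3.5

# Sample mean: the plain average of observed rolls.
rolls = [random.randint(1, 6) for _ in range(100_000)]
sample_mean = sum(rolls) / len(rolls)

print("expected value:", expected, " sample mean:", round(sample_mean, 3))
```

The sample mean converges to the expected value as the number of rolls grows (see the Law of Large Numbers, question 27).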

### 12. What Is Power Analysis?

Power analysis is the process of estimating one of four quantities (effect size, sample size, significance level, and statistical power) given values for the other three. It is most commonly used to estimate the minimum sample size needed to carry out an experiment.
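As a sketch of the "solve for the fourth quantity" idea, the per-group sample size of a two-sample test can be estimated from the effect size `d`, significance level `alpha`, and desired power using a normal approximation (`n ≈ 2·((z_{1−α/2} + z_power) / d)²`); a t-based calculation would give a slightly larger answer.

```python
from math import ceil
from statistics import NormalDist

def sample_size(d, alpha=0.05, power=0.8):
    """Approximate per-group n for a two-sample test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value
    z_power = NormalDist().inv_cdf(power)           # power quantile
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

# Medium effect size (d = 0.5), alpha = 0.05, power = 0.8:
print(sample_size(d=0.5))
```

Smaller effect sizes require much larger samples, which is exactly why the analysis is done before running the experiment.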

### 13. How can you iterate over a list and also retrieve element indices at the same time?

Use the built-in `enumerate` function, which yields `(index, value)` pairs. It takes an optional `start` argument, which is very helpful when you need to count from 1 (or any other number) instead of 0:

```python
numbers = [10, 20, 30]
for index, value in enumerate(numbers, start=1):
    print('The value at position', index, 'is', value)
```

### 14. Can you write the formula to calculate R-square?

Subtract the predicted values from the actual values and square the results; summing them gives the residual sum of squares (the unexplained variance). Divide that by the total sum of squares (the squared deviations of the actual values from their mean) and subtract the result from one: R² = 1 − SS_res / SS_tot.
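The formula translates directly into a few lines of Python; the actual/predicted vectors below are invented for illustration:

```python
# R-squared from the formula R^2 = 1 - SS_res / SS_tot.
actual    = [3.0, 5.0, 7.0, 9.0]
predicted = [2.8, 5.1, 7.2, 8.9]

mean_actual = sum(actual) / len(actual)
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))   # residual SS
ss_tot = sum((a - mean_actual) ** 2 for a in actual)            # total SS
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 4))
```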

### 15. What does NLP stand for?

Natural language processing (NLP) is the ability of a computer program to understand human language as it is spoken and written, referred to as natural language. It is a component of artificial intelligence (AI).


### 16. What do you mean by word Data Science?

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains.

### 17. Explain the term botnet?

A botnet is a collection of internet-connected devices infected by malware that allow hackers to control them. Cyber criminals use botnets to instigate botnet attacks, which include malicious activities such as credentials leaks, unauthorized access, data theft and DDoS attacks.

### 18. What is Data Visualization?

Data visualization, or ‘data viz’ as it’s commonly known, is the graphic presentation of data. Good visualizations are aesthetically pleasing and provide layers of detail that generate deeper insight and new levels of understanding.

### 19. Why data cleaning plays a vital role in analysis?

Data cleaning can help in analysis because: Cleaning data from multiple sources helps to transform it into a format that data analysts or data scientists can work with. Data Cleaning helps to increase the accuracy of the model in machine learning.

### 20. What is Linear Regression?

In statistics, linear regression is a linear approach to modelling the relationship between a dependent variable and one or more independent variables. In the case of one independent variable it is called simple linear regression.

### 21. What do you understand by term hash table collisions?

The situation where a newly inserted key maps to an already occupied slot in the hash table is called a collision, and it must be handled using some collision-handling technique. Collisions are very likely even if we have a big table to store keys.
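One standard collision-handling technique is separate chaining: each table slot holds a list of key-value pairs, so keys that hash to the same slot coexist instead of overwriting each other. A minimal sketch (table size and keys chosen to force a collision):

```python
TABLE_SIZE = 8
table = [[] for _ in range(TABLE_SIZE)]   # each slot is a chain (list)

def put(key, value):
    slot = hash(key) % TABLE_SIZE
    for i, (k, _) in enumerate(table[slot]):
        if k == key:                      # key already present: update it
            table[slot][i] = (key, value)
            return
    table[slot].append((key, value))      # empty slot or collision: chain

def get(key):
    slot = hash(key) % TABLE_SIZE
    for k, v in table[slot]:
        if k == key:
            return v
    raise KeyError(key)

put(0, "a")
put(8, "b")   # 8 % 8 == 0: collides with key 0, but both survive via chaining
print(get(0), get(8))
```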

### 22. Compare and contrast R and SAS?

SAS is a commercial software suite and programming language designed primarily for statistical analysis of data from spreadsheets or databases. R is a free, open-source programming language widely used among statisticians and data miners to develop statistical software and perform data analysis.

### 23. What do you understand by letter ‘R’?

R (the R programming language) is free, open-source software used for all kinds of data science, statistics, and visualization projects. R allows building and running statistical models on your data and updating them automatically as new information flows into the model.

### 24. What is the goal of A/B Testing?

The ultimate goal of an A/B test is to build on the learnings from previous experiments and use those insights to improve the pages being tested.

### 25. What are Eigenvalue and Eigenvector?

Eigenvectors are vectors which, when multiplied by a matrix (a linear transformation), result in a vector with the same (or exactly opposite) direction, scaled by a scalar multiple. That scalar multiple is the eigenvalue.
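The definition A·v = λ·v can be checked directly for a small hand-picked matrix and eigenvector (in practice you would use a library routine such as NumPy's `numpy.linalg.eig` to find them):

```python
# Verify A @ v == eigenvalue * v for a hand-chosen 2x2 example.
A = [[2.0, 1.0],
     [1.0, 2.0]]
v = [1.0, 1.0]          # an eigenvector of A

# Matrix-vector product A @ v, written out by hand.
Av = [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]

# A stretches v without rotating it; the stretch factor is the eigenvalue.
eigenvalue = Av[0] / v[0]
print("A @ v =", Av, " eigenvalue =", eigenvalue)
```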

### 26. What are the important libraries of Python that are used in Data Science?

- NumPy.
- SciPy.
- Pandas.
- Keras.
- SciKit-Learn.
- PyTorch.

### 27. What is the Law of Large Numbers?

The Law of Large Numbers is a theorem within probability theory that suggests that as a trial is repeated, and more data is gathered, the average of the results will get closer to the expected value.
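The theorem is easy to see in simulation: the running average of fair coin flips (1 = heads, expected value 0.5) gets closer to 0.5 as the number of trials grows. The trial counts below are arbitrary.

```python
import random

random.seed(1)                        # fixed seed for reproducibility
for n in (10, 1_000, 100_000):
    flips = [random.randint(0, 1) for _ in range(n)]
    print(n, "flips -> average", sum(flips) / n)
```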

### 28. How Machine Learning Is Deployed In Real World Scenarios?

A machine learning model can be deployed by writing REST APIs around it. A few real-world examples are image recognition, speech recognition, medical diagnosis, statistical arbitrage, and predictive analytics.
