AMAZON DATA SCIENCE INTERVIEW QUESTIONS

Here is a list of data science interview questions recently asked at Amazon. The questions cover both freshers and experienced professionals. Our Data Science Training answers all of the questions below.

1. Estimate the probability of a disease in a particular city given that the probability of the disease on a national level is low.

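Bayes' theorem lets you update the low national probability with city-specific evidence: P(disease | city) = P(city | disease) × P(disease) / P(city). The national rate serves as the prior, and information about how cases and population are distributed across cities adjusts it for the particular city.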
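As a minimal sketch of that calculation, the snippet below applies Bayes' theorem to made-up numbers. The function name `bayes` and all three input probabilities are assumptions chosen for illustration, not figures from any real dataset.

```python
def bayes(likelihood, prior, evidence):
    """Return P(A | B) = P(B | A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Hypothetical inputs:
p_disease = 0.001            # national prevalence (the prior)
p_city_given_disease = 0.05  # share of all cases that occur in this city
p_city = 0.01                # share of the national population living in this city

# Probability that a resident of this city has the disease.
p_disease_given_city = bayes(p_city_given_disease, p_disease, p_city)
print(round(p_disease_given_city, 4))  # 0.005
```

With these assumed inputs the city holds 1% of the population but 5% of the cases, so its rate comes out five times the national rate.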

2. How will you inspect missing data, and when is it important for your analysis?

Start by counting how many values are missing in each variable and checking whether the missingness follows a pattern. The most common way to handle missing data is to simply omit the cases that contain missing values and analyze the remaining data. This approach is known as complete-case analysis or listwise deletion. Missing values matter for the analysis because, if the missingness is related to the variables being studied, dropping those rows can bias the results.
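A small sketch of both steps, inspecting and then listwise deletion, using only the standard library. The records and the helper names `missing_counts` and `listwise_delete` are hypothetical.

```python
# Made-up records; None marks a missing value.
rows = [
    {"income": 52000, "city": "Pune", "age": 31},
    {"income": None, "city": "Delhi", "age": 45},
    {"income": 61000, "city": None, "age": 28},
    {"income": 48000, "city": "Chennai", "age": 39},
]

def missing_counts(records):
    """Count missing values per column."""
    counts = {}
    for record in records:
        for column, value in record.items():
            counts[column] = counts.get(column, 0) + (value is None)
    return counts

def listwise_delete(records):
    """Keep only records with no missing values (complete-case analysis)."""
    return [r for r in records if all(v is not None for v in r.values())]

counts = missing_counts(rows)
complete = listwise_delete(rows)
print(counts)         # {'income': 1, 'city': 1, 'age': 0}
print(len(complete))  # 2
```

Here two of the four rows are lost to deletion, which shows how quickly listwise deletion can shrink a dataset.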

3. How will you decide whether a customer will buy a product today or not given the income of the customer, location where the customer lives, profession and gender? Define a machine learning algorithm for this.

For any given situation, the following four core steps are needed to define a machine learning algorithm:

Get Data. The first step in the machine learning process is getting data for the given situation: here, each customer's income, location, profession, and gender, together with past purchase outcomes.
Clean, Prepare & Manipulate Data. Real-world data often has unorganized, missing, or noisy elements.
Train Model. Fit a model to the prepared data. Because the outcome is buy or not buy, this is a binary classification problem, so a classifier such as logistic regression is a natural choice.
Test Model. Validate the trained model on data it has not seen, then use the results to improve it.
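The steps above can be sketched end to end on a toy problem. This is a deliberately simplified illustration: it uses income as the only feature, made-up records, and a logistic regression fitted by plain gradient descent. The learning rate and epoch count are arbitrary assumptions.

```python
import math

# 1. Get data: (income in thousands, bought today?). Made-up records.
raw = [(20, 0), (25, 0), (30, 0), (35, 0), (None, 1),
       (60, 1), (65, 1), (70, 1), (80, 1)]

# 2. Clean and prepare: drop rows with missing income, then standardize.
clean = [(x, y) for x, y in raw if x is not None]
mean = sum(x for x, _ in clean) / len(clean)
std = (sum((x - mean) ** 2 for x, _ in clean) / len(clean)) ** 0.5

def scale(income):
    return (income - mean) / std

def predict_proba(income, w, b):
    """Logistic model: probability that the customer buys."""
    return 1.0 / (1.0 + math.exp(-(w * scale(income) + b)))

# 3. Train: gradient descent on the average log-loss.
w, b, lr = 0.0, 0.0, 0.5
for _ in range(500):
    grad_w = grad_b = 0.0
    for x, y in clean:
        error = predict_proba(x, w, b) - y
        grad_w += error * scale(x)
        grad_b += error
    w -= lr * grad_w / len(clean)
    b -= lr * grad_b / len(clean)

# 4. Test: predict for held-out customers not used in training.
held_out = [(28, 0), (75, 1)]
predictions = [int(predict_proba(x, w, b) >= 0.5) for x, _ in held_out]
print(predictions)
```

A fuller version would encode location, profession, and gender as additional features (for example with one-hot encoding) and evaluate on a much larger held-out set.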

4. Given a long sorted list and a short sorted list of 4 elements, which algorithm will you use to search the long list for the 4 elements?

Binary search is an efficient algorithm for finding an item in a sorted list. It works by repeatedly halving the portion of the list that could contain the item until the possible locations are narrowed down to just one. Running it once for each of the 4 elements costs O(log n) comparisons per lookup on a list of n items.
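A short sketch using the standard library's `bisect_left`, which performs the binary search. The helper name `find_all` and the sample data are illustrative assumptions.

```python
from bisect import bisect_left

def find_all(long_sorted, targets):
    """Binary-search each target; map it to its index, or -1 if absent."""
    positions = {}
    for target in targets:
        i = bisect_left(long_sorted, target)
        found = i < len(long_sorted) and long_sorted[i] == target
        positions[target] = i if found else -1
    return positions

long_list = list(range(0, 1_000_000, 3))  # sorted multiples of 3
result = find_all(long_list, [0, 7, 300, 999_999])
print(result)  # {0: 0, 7: -1, 300: 100, 999999: 333333}
```

Because the short list is also sorted, each search could start from the previous match position instead of the beginning, although with only 4 targets the saving is negligible.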

5. Why do you want to work as a data scientist?

Data scientist is a sought-after role that brings together skill sets and knowledge from various backgrounds such as mathematics, statistics, analytics, modeling, and business acumen. These skills help data scientists identify patterns that can help an organization recognize new market opportunities.

6. What is K-means? How can you select K for K-means?

K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms. It partitions the data into K clusters by assigning each point to its nearest cluster center. A popular way to choose K is the elbow method, which plots the cost (the within-cluster sum of squared distances) against K. As K increases, each cluster contains fewer elements and the cost falls; the optimal K lies at the "elbow" of the curve, where adding another cluster stops reducing the cost by much.
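To make the elbow method concrete, here is a minimal one-dimensional K-means sketch. The data, the deterministic initialization (evenly spaced picks from the sorted points), and the function name `kmeans_1d` are assumptions made so the example is reproducible; real implementations usually use random or k-means++ initialization.

```python
def kmeans_1d(points, k, iterations=20):
    """Lloyd's algorithm on 1-D data; returns (centers, cost)."""
    pts = sorted(points)
    # Deterministic init: evenly spaced picks from the sorted data.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    cost = sum(min((p - c) ** 2 for c in centers) for p in pts)
    return centers, cost

data = [1, 2, 3, 10, 11, 12, 20, 21, 22]  # three obvious groups
costs = [kmeans_1d(data, k)[1] for k in range(1, 5)]
for k, cost in zip(range(1, 5), costs):
    print(k, round(cost, 2))
```

On this data the cost drops steeply up to K = 3 and barely changes at K = 4, so the elbow points to three clusters.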

7. What are the differences between overfitting and underfitting?

Overfitting is a modeling error which occurs when a function is too closely fit to a limited set of data points. Underfitting refers to a model that can neither model the training data nor generalize to new data.

8. What is collaborative filtering?

Collaborative filtering (CF) is a technique used by recommender systems to make automatic predictions about the interests of a user by collecting preferences or taste information from many users.
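One common variant is user-based collaborative filtering, sketched below with cosine similarity over co-rated items. The users, items, ratings, and helper names are all hypothetical, and production systems typically add mean-centering and neighborhood limits that are omitted here.

```python
import math

# Made-up ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "ann": {"book": 5, "lamp": 3, "pen": 4},
    "bob": {"book": 5, "lamp": 3, "pen": 4, "desk": 5},
    "cat": {"book": 1, "lamp": 5, "desk": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted = total = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        weighted += sim * their[item]
        total += sim
    return weighted / total if total else None

estimate = predict("ann", "desk")
print(round(estimate, 2))
```

Because ann's tastes match bob's more closely than cat's, the estimate for the desk is pulled toward bob's rating of 5 rather than cat's rating of 2.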

9. What are the types of biases that can occur during sampling?

Sampling bias occurs when some members of a population are systematically more likely to be selected in a sample than others. Common types of sampling bias include self-selection, non-response, undercoverage, survivorship, pre-screening or advertising, and healthy user bias.

10. Explain star schema.

The star schema is the simplest and most fundamental of the data mart schemas. It is widely used to build data warehouses and dimensional data marts. It consists of one or more fact tables referencing any number of dimension tables, and it is an important special case of the snowflake schema.

11. How can you compare a neural network that has one layer, one input and output to a logistic regression model?

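A neural network with a single layer, one input, one output, and a sigmoid activation computes sigmoid(w·x + b), which is the same function as a logistic regression model; trained with the same log-loss, the two are equivalent. More generally, the difference between classification and regression is that a classification model outputs a predicted probability for a class or classes, while a regression model outputs a value. A neural network can be made to output a value simply by changing the activation function in the final layer.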

12. How do you treat collinearity?

Collinearity is treated by removing some of the highly correlated independent variables, or by linearly combining the independent variables, such as adding them together.
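Before removing or combining variables you need to find the correlated ones. A minimal sketch: compute the Pearson correlation for every pair of features and flag pairs above a threshold. The feature values and the 0.9 cutoff are assumptions for illustration.

```python
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

heights_cm = [150, 160, 170, 180, 190]
features = {
    "height_cm": heights_cm,
    "height_in": [h / 2.54 for h in heights_cm],  # same information, new unit
    "weight_kg": [70, 55, 80, 60, 75],
}

# Flag feature pairs whose absolute correlation exceeds the cutoff.
flagged = [(a, b) for a, b in combinations(features, 2)
           if abs(pearson(features[a], features[b])) > 0.9]
print(flagged)  # [('height_cm', 'height_in')]
```

One variable from each flagged pair can then be dropped, here either of the two height columns.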

13. How will you deal with unbalanced data where the ratio of negative and positive is huge?

Use the right evaluation metrics.
Resample the training set.
Use K-fold Cross-Validation in the right way.
Ensemble different resampled datasets.
Resample with different ratios.
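Two of the points above can be illustrated briefly: why accuracy is the wrong metric on skewed data, and how random oversampling rebalances a training set. The 10-to-990 class split and the helper names are made up for this sketch.

```python
import random

labels = [1] * 10 + [0] * 990        # 1% positives, hypothetical
always_negative = [0] * len(labels)  # a useless "classifier"

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """Share of actual positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0

def oversample(y, seed=0):
    """Return row indices with positives (assumed minority) resampled to parity."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra

print(accuracy(labels, always_negative))  # 0.99
print(recall(labels, always_negative))    # 0.0
balanced = oversample(labels)
print(sum(labels[i] for i in balanced), len(balanced))  # 990 1980
```

Predicting "negative" for everyone scores 99% accuracy while finding no positives at all, which is why recall, precision, or similar metrics are preferred. Resampling should be applied to the training folds only, so that validation data keeps the real class ratio.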

14. What is the difference between

i) Stack and queue

ii) Linked list and array

Stack and Queue

A stack is used to solve problems that rely on recursion. A queue is used to solve problems that require sequential processing. The difference between stacks and queues is in how items are removed: in a stack we remove the most recently added item (last in, first out); in a queue we remove the least recently added item (first in, first out).
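A quick illustration of the two removal orders, using a Python list as the stack and `collections.deque` as the queue:

```python
from collections import deque

# Stack: last in, first out.
stack = []
for item in ("a", "b", "c"):
    stack.append(item)      # push
popped = stack.pop()        # removes the most recently added item
print(popped)               # c

# Queue: first in, first out.
queue = deque()
for item in ("a", "b", "c"):
    queue.append(item)      # enqueue
dequeued = queue.popleft()  # removes the least recently added item
print(dequeued)             # a
```

A `deque` is used for the queue because removing from the front of a plain list requires shifting every remaining element.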

Linked List and Array

An array is a collection of elements of the same data type stored in contiguous memory and accessed by index, whereas a linked list is an ordered collection of elements of the same type that are connected to each other using pointers.
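The contrast can be sketched with a minimal singly linked list next to a Python list standing in for an array. The `Node` class and helper names are illustrative assumptions.

```python
class Node:
    """One element of a singly linked list."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node  # pointer to the following node

def push_front(head, value):
    """Insert at the head in O(1): no elements are shifted."""
    return Node(value, head)

def to_list(head):
    """Walk the pointers from head to tail."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values

head = None
for v in (3, 2, 1):
    head = push_front(head, v)
print(to_list(head))  # [1, 2, 3]

array = [1, 2, 3]
print(array[2])       # 3, direct index access in O(1)
```

Reaching the third element of the linked list means following two pointers, while the array jumps straight to it; in exchange, the linked list can insert at the front without moving anything.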

15. What is a Linear Regression?

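Linear regression models the relationship between a dependent variable and one or more independent variables as a linear function. In linear regression, we predict the mean of the dependent variable for given independent variables. Since the mean does not describe the whole distribution, modeling the mean is not a full description of the relationship between the dependent and independent variables.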
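For a single independent variable, the least-squares line has a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) − slope × mean(x). A minimal sketch on made-up data (the function name `fit_line` is an assumption):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # exactly y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

With noisy data the fitted line would no longer pass through every point; its value at a given x is then the estimated mean of y at that x.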

16. What are the types of machine learning?

Machine learning (ML) is a type of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. There are three types of machine learning: supervised learning, unsupervised learning, and reinforcement learning.

17. What is meant by supervised and unsupervised learning in data?

To put it simply, supervised learning uses labeled input and output data, while an unsupervised learning algorithm does not. In supervised learning, the algorithm “learns” from the training dataset by iteratively making predictions on the data and adjusting for the correct answer.

18. Compare SAS, R, and Python programming.

SAS is probably the easiest of the three to learn; it has a good GUI that makes it even easier to learn and use. Python is a high-level, object-oriented language and is easier to learn than R. In order of ease of learning: SAS first, then Python, then R.

19. What are the time series algorithms?

Time-series data is simply a set of data points ordered with respect to time. The Time Series mining function provides the following algorithms to predict future trends: Autoregressive Integrated Moving Average (ARIMA), Exponential Smoothing, and Seasonal Trend Decomposition.
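Of the algorithms listed, simple exponential smoothing is the easiest to sketch. Each smoothed value is s_t = alpha × x_t + (1 − alpha) × s_(t−1), starting from s_0 = x_0. The series and the choice alpha = 0.5 below are assumptions for illustration.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; the last value is the one-step forecast."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

series = [10, 12, 14, 13]
smoothed = exponential_smoothing(series, alpha=0.5)
print(smoothed)      # [10, 11.0, 12.5, 12.75]
forecast = smoothed[-1]
```

A larger alpha weights recent observations more heavily; this simple form does not model trend or seasonality.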

20. What is Interpolation and Extrapolation?

When we predict values that fall within the range of data points taken it is called interpolation. When we predict values for points outside the range of data taken it is called extrapolation.
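A small sketch of the distinction using the straight line through two known points. The points and function names are hypothetical.

```python
def linear_predict(x, x0, y0, x1, y1):
    """Value at x on the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def kind(x, known_xs):
    """Label a prediction by whether x lies inside the observed range."""
    inside = min(known_xs) <= x <= max(known_xs)
    return "interpolation" if inside else "extrapolation"

known = [(1, 10), (3, 30)]
xs = [x for x, _ in known]

inside_value = linear_predict(2, 1, 10, 3, 30)
outside_value = linear_predict(5, 1, 10, 3, 30)
print(inside_value, kind(2, xs))    # 20.0 interpolation
print(outside_value, kind(5, xs))   # 50.0 extrapolation
```

The same formula produces both numbers, but the extrapolated one relies on the trend continuing beyond the observed range, so it is generally less reliable.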
