Amazon Machine Learning Interview Questions

Here is a list of Machine Learning interview questions recently asked at Amazon. The questions are suitable for both freshers and experienced professionals.

1. How would you explain Machine Learning to a school-going kid?

Machine learning is an application of Artificial Intelligence in which we give machines access to data and let them use that data to learn for themselves. You can then give the machine new inputs and it will predict the outcome. It is basically getting a computer to perform a task without being explicitly programmed to do so.

2. How does Deep Learning differ from Machine Learning?

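Machine Learning (ML) refers to AI systems that learn from data with the help of an algorithm and get better over time without a human writing explicit rules. Deep Learning is a subset of ML that uses neural networks with many layers to learn features directly from raw data. It usually needs larger data sets and more compute than classical ML, which often relies on hand-engineered features. Most AI work involves ML because intelligent behaviour requires considerable knowledge.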

3. Explain Classification and Regression?

Classification is the process of assigning each item in a data set to one of a fixed set of classes. It can be performed on both structured and unstructured data. Regression consists of mathematical methods that allow data scientists to predict a continuous outcome (y) based on the value of one or more predictor variables (x). Linear regression is probably the most popular form of regression analysis because of its ease of use in predicting and forecasting.
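
To make the contrast concrete, here is a minimal sketch in plain Python. The function names (fit_line, classify_by_threshold), the sample data, and the threshold are illustrative assumptions, not part of any library: regression returns a number, while classification returns a label.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def classify_by_threshold(value, threshold):
    """Toy classifier (hypothetical rule): maps a number to one of two classes."""
    return "high" if value >= threshold else "low"


# Regression: predict a continuous y for a new x.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted_y = slope * 5 + intercept  # 11.0 for this perfectly linear data

# Classification: predict a discrete label.
label = classify_by_threshold(predicted_y, threshold=8.0)  # "high"
```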

4. What do you understand by selection bias?

Selection bias is an error that occurs when the sample being studied is not representative of the population, typically because the researcher decides who is going to be studied. It is usually associated with research in which the selection of participants is not random.

5. What do you understand by Precision and Recall?

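Recall is the number of relevant documents retrieved by a search divided by the total number of existing relevant documents, while precision is the number of relevant documents retrieved by a search divided by the total number of documents retrieved by that search. In classification terms, precision = TP / (TP + FP) and recall = TP / (TP + FN), where TP, FP and FN are the counts of true positives, false positives and false negatives.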
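
The two ratios can be sketched in a few lines of Python. The function name precision_recall and the document IDs are made up for illustration, and returning 0.0 for an empty set is an assumed convention.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a search result.

    retrieved: documents the search returned
    relevant:  documents that are actually relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents retrieved, two of them relevant; three relevant documents exist.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d5"],
)
# precision is 2/4, recall is 2/3
```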

6. What is a Confusion Matrix?

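A confusion matrix is a table that summarizes the performance of a classification model by counting, for every actual class, how often the model predicted each class. For a binary classifier it is a 2 by 2 table of true positives, false positives, false negatives and true negatives. The name comes from earlier perception studies, where a matrix (for example, 26 by 26 for the letters of the alphabet) recorded the probability of each response to each stimulus, that is, how often one item was confused with another. This explains the name and matches the use in machine learning today.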
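
As a minimal sketch of how a confusion matrix is tallied for a classifier, the snippet below counts (actual, predicted) pairs. The function name, the spam/ham labels, and the rows-are-actual layout are assumptions chosen for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes, in `labels` order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


actual = ["spam", "spam", "ham", "ham", "ham"]
predicted = ["spam", "ham", "ham", "ham", "spam"]

matrix = confusion_matrix(actual, predicted, labels=["spam", "ham"])
# [[1, 1],   actual spam: 1 predicted spam, 1 predicted ham
#  [1, 2]]   actual ham:  1 predicted spam, 2 predicted ham
```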

7. What is the difference between inductive and deductive learning?

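The main difference between inductive and deductive reasoning is that inductive reasoning aims at developing a theory while deductive reasoning aims at testing an existing theory. Inductive reasoning moves from specific observations to broad generalizations, and deductive reasoning works the other way around. In machine learning terms, inductive learning infers a general rule from labelled examples, while deductive learning starts from known rules and applies them to specific cases.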

8. How is KNN different from K-means clustering?

K-means is an unsupervised learning algorithm used for clustering problems, whereas KNN is a supervised learning algorithm used for classification and regression problems. This is the basic difference between the two. K-means groups unlabelled points around k centroids, while KNN predicts the label of a new point from the labels of its k nearest neighbours in the past available data.
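
The difference shows up clearly in code. The following is a one-dimensional sketch under simplifying assumptions (absolute distance, fixed iteration count, hand-picked starting centroids); the function names are illustrative. Note that KNN needs labels and K-means does not.

```python
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Supervised: majority label among the k training points nearest to query."""
    order = sorted(range(len(train_points)),
                   key=lambda i: abs(train_points[i] - query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def kmeans_1d(points, centroids, iterations=10):
    """Unsupervised: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids


points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["small", "small", "small", "large", "large", "large"]

predicted = knn_predict(points, labels, query=1.2)      # uses the labels
centres = kmeans_1d(points, centroids=[0.0, 10.0])      # ignores the labels
```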

9. What is an ROC curve and what does it represent?

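An ROC (receiver operating characteristic) curve is a graph showing the performance of a classification model at all classification thresholds. The curve plots two parameters against each other: the True Positive Rate on the y-axis and the False Positive Rate on the x-axis. The area under the curve (AUC) summarizes the model in a single number, where 1.0 is a perfect classifier and 0.5 is no better than random guessing.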
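
A rough sketch of how the points on an ROC curve can be computed by sweeping the threshold over the model's scores. The function names and the sample scores are illustrative assumptions; the code also assumes labels are 1 for positive and 0 for negative, and that both classes are present.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
auc = area_under_curve(curve)  # 8/9 here: 8 of 9 positive/negative pairs are ranked correctly
```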

10. What’s the difference between Type I and Type II error?

Type I error, in statistical hypothesis testing, is the error caused by rejecting a null hypothesis when it is true. Type II error is the error that occurs when the null hypothesis is not rejected even though it is false. Type I error is equivalent to a false positive. Type II error is equivalent to a false negative.

11. Is it better to have too many false positives or too many false negatives? Explain.

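It depends on the cost of each kind of error. In medical testing, false negatives may provide a falsely reassuring message to patients and physicians that the disease is absent when it is actually present. This sometimes leads to inappropriate or inadequate treatment of both the patient and their disease. So in this setting it is better to tolerate more false positives than false negatives. In other applications, such as spam filtering, the reverse can hold, because a false positive (a legitimate email marked as spam) is the more costly mistake.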

12. Which is more important to you – model accuracy or model performance?

It depends on the application. In some applications accuracy is extremely critical, even if the model takes minutes or hours to make a prediction. Other applications require real-time performance, even if this comes at the cost of some accuracy.

13. What is the difference between Entropy and Information Gain?

Entropy is the average rate at which information is produced by a stochastic source of data, or, equivalently, a measure of the uncertainty associated with a random variable. Information gain is the amount of information gained about a random variable or signal from observing another random variable. In a decision tree, it is the reduction in entropy of the class labels after the data is split on a feature.
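
A small sketch of both quantities as they are typically used when choosing a decision tree split. The function names and the yes/no labels are assumptions for illustration; entropy is measured in bits (log base 2).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())


def information_gain(labels, groups):
    """Entropy of `labels` minus the size-weighted entropy of the split `groups`."""
    total = len(labels)
    remainder = sum(len(group) / total * entropy(group) for group in groups)
    return entropy(labels) - remainder


labels = ["yes", "yes", "no", "no"]

# A split that separates the classes perfectly removes all uncertainty.
perfect = information_gain(labels, [["yes", "yes"], ["no", "no"]])

# A split that leaves each group as mixed as before gains nothing.
useless = information_gain(labels, [["yes", "no"], ["yes", "no"]])
```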

14. Explain Ensemble learning technique in Machine Learning.

Ensemble methods are meta-algorithms that combine several machine learning models into one predictive model in order to decrease variance (bagging), decrease bias (boosting), or improve predictions (stacking).

15. What is bagging and boosting in Machine Learning?

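Bagging (bootstrap aggregating) trains several models of the same type independently, each on a random sample of the training data drawn with replacement, and merges their predictions by voting or averaging. Boosting trains models sequentially, with each new model concentrating on the examples the previous ones got wrong, and merges them as a weighted combination. Bagging mainly decreases variance, not bias, and helps reduce over-fitting in a model. Boosting mainly decreases bias, not variance.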
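
A minimal sketch of bagging under stated assumptions: the base learner is a toy nearest-class-mean classifier on one-dimensional data, and all function names, the data, and the seed are made up for illustration. Real projects would normally use a library implementation instead.

```python
import random
from collections import Counter


def fit_centroids(xs, ys):
    """Toy base learner: remember the mean x of each class in the sample."""
    sums, counts = Counter(), Counter()
    for x, y in zip(xs, ys):
        sums[y] += x
        counts[y] += 1
    return {label: sums[label] / counts[label] for label in counts}


def predict_centroids(model, x):
    """Predict the class whose mean is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Train each model on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_centroids([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Merge the individual predictions by majority vote."""
    votes = Counter(predict_centroids(model, x) for model in models)
    return votes.most_common(1)[0][0]


xs = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

models = bagging_fit(xs, ys)
low = bagging_predict(models, 2.0)
high = bagging_predict(models, 14.0)
```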

16. What are collinearity and multicollinearity?

Collinearity is a linear association between two predictors. Multicollinearity is a situation where two or more predictors are highly linearly related, so that one can be predicted from the others. As a rule of thumb, an absolute correlation coefficient above 0.7 between predictors indicates the presence of multicollinearity.
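
One simple way to apply the 0.7 rule of thumb is to compute the Pearson correlation for every pair of predictors. This is a sketch only: the function names and feature values are invented for illustration, and pairwise correlation will not catch a predictor that depends on a combination of several others.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def collinear_pairs(features, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    names = list(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) > threshold:
                pairs.append((a, b, r))
    return pairs


features = {
    "height_cm": [150, 160, 170, 180],
    "height_m": [1.5, 1.6, 1.7, 1.8],   # same information in different units
    "noise": [1, -1, -1, 1],            # unrelated to height
}

flagged = collinear_pairs(features)  # only the two height columns are flagged
```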
