Shell – Machine Learning Interview Questions
Here is a list of Machine Learning interview questions that were recently asked at Shell. The questions are suitable for both freshers and experienced professionals.
1. What is A/B Testing?
A/B testing is a common and powerful marketing technique based on a controlled experiment. Before sending a marketing message to the whole audience, a marketer sends two "test" versions (A and B) to randomly chosen portions of the audience and compares how they perform on a chosen metric, such as click-through or conversion rate. The result shows which version resonates best with your customers.
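One common way to decide which version "performs better" is a two-proportion z-test on conversion counts. The sketch below is a minimal illustration using only the standard library; the function name `ab_test_z` and the conversion counts are invented for the example.

```python
from math import sqrt
from statistics import NormalDist


def ab_test_z(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: version A converted 200 of 5000, version B 260 of 5000.
z, p_value = ab_test_z(200, 5000, 260, 5000)
```

A small p-value suggests the difference between the two versions is unlikely to be due to chance alone.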
2. What is Cluster Sampling?
Cluster sampling is a type of probability sampling in which the population is divided into subsets (clusters), and whole clusters, rather than individual elements, are used as the sampling units. In its simplest form, every cluster has an equal chance of being selected.
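As a rough sketch, one-stage cluster sampling can be written with the standard `random` module. The helper `cluster_sample` and the city-based population are hypothetical and only serve the illustration.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sampling: pick whole clusters at random and
    keep every element inside the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical population grouped by city (the clusters).
population = {
    "Houston": ["h1", "h2", "h3"],
    "London": ["l1", "l2"],
    "Mumbai": ["m1", "m2", "m3", "m4"],
    "Lagos": ["g1", "g2"],
}
sample = cluster_sample(population, n_clusters=2, seed=42)
```

Note that the random draw happens at the cluster level; individual elements are never sampled directly.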
3. Name a few libraries in Python used for Data Analysis and Scientific Computations.
- Pandas: an open-source Python package that provides high-performance, easy-to-use data structures and data analysis tools for labeled data.
- Scikit-learn.
4. How are NumPy and SciPy related?
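NumPy stands for Numerical Python, while SciPy stands for Scientific Python. SciPy is built on top of NumPy: NumPy provides the homogeneous n-dimensional array and the basic operations for manipulating numerical array data, and SciPy adds higher-level scientific routines (such as optimization, integration, and statistics) that operate on NumPy arrays. Both expose Python APIs, although much of their core is implemented in compiled languages such as C and Fortran.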
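A short sketch of how the two libraries are typically used together, assuming both `numpy` and `scipy` are installed: NumPy holds and manipulates the array data, and a SciPy routine consumes it. The matrix values are arbitrary.

```python
import numpy as np
from scipy import linalg

# NumPy: a homogeneous array and element-wise manipulation.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
doubled = A * 2

# SciPy: a higher-level routine that operates on NumPy arrays.
x = linalg.solve(A, b)  # solves A @ x = b
```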
5. What is the main difference between a Pandas series and a single-column DataFrame in Python?
A Pandas Series is one-dimensional, whereas a DataFrame is two-dimensional. A single-column DataFrame therefore has both a row index and a column label for its one column, while a Series has only an index and an optional name attribute, not a column label. A Series behaves like a labeled list and can hold integers, strings, floats, and other types.
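The difference is easy to check interactively. This is a minimal sketch assuming `pandas` is installed; the column name "price" is made up.

```python
import pandas as pd

s = pd.Series([10, 20, 30], name="price")
df = s.to_frame()  # single-column DataFrame whose column label is "price"

series_dims = (s.ndim, s.shape)    # one dimension
frame_dims = (df.ndim, df.shape)   # two dimensions

as_series = df["price"]     # single brackets return a Series
as_frame = df[["price"]]    # a list of labels returns a DataFrame
```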
6. How can you handle duplicate values in a dataset for a variable in Python?
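The choice of container matters. A dictionary is a mutable collection of key-value pairs with unique keys (insertion-ordered since Python 3.7). A list is a mutable ordered collection that allows duplicate elements. A set is a mutable unordered collection with no duplicate elements, so converting a list to a set removes duplicates. For a variable stored in a pandas DataFrame, duplicated() flags repeated values and drop_duplicates() removes them.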
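A minimal sketch of detecting and removing duplicate values for one variable, assuming `pandas` is installed. The "city" and "sales" columns are invented sample data.

```python
import pandas as pd

# Hypothetical data: the "city" variable contains repeated values.
df = pd.DataFrame({"city": ["Pune", "Oslo", "Pune", "Lima", "Oslo"],
                   "sales": [5, 7, 5, 3, 9]})

dup_mask = df["city"].duplicated()   # True for repeats after the first occurrence
n_unique = df["city"].nunique()      # number of distinct values
deduped = df.drop_duplicates(subset="city", keep="first")

# Plain Python: a set drops duplicates but loses order;
# dict.fromkeys keeps the first occurrence of each item in order.
unique_ordered = list(dict.fromkeys(df["city"]))
```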
7. How do you map nicknames (Pete, Andy, Nick, Rob, etc) to real names?
- This problem can be solved in several ways. Let's assume that you're given a data set containing thousands of Twitter interactions. You would begin by studying the relationship between two people by carefully analyzing the words used in the tweets.
- This kind of problem can be solved by implementing Text Mining using Natural Language Processing techniques, wherein each sentence is broken down into words and correlations between the words are found.
- NLP is actively used in understanding customer feedback and in performing sentiment analysis on Twitter and Facebook. Thus, one of the ways to solve this problem is through Text Mining and Natural Language Processing techniques.
8. Is rotation necessary in PCA? If yes, Why? What will happen if you don’t rotate the components?
Yes, rotation is necessary to account for the maximum variance of the training set. If we don't rotate the components, the effect of PCA diminishes and we have to select more components to explain the same amount of variance in the training set.
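The effect can be seen on a small synthetic example. The sketch below assumes `numpy` is installed and uses made-up correlated data; it compares the share of variance along the original axes with the share along the axes found by PCA (computed here via SVD, which is one of several equivalent ways to obtain them).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + 0.1 * rng.normal(size=500)   # strongly correlated with x
X = np.column_stack([x, y])
Xc = X - X.mean(axis=0)              # centre the data

# Share of total variance along the original (unrotated) axes.
share_original = Xc.var(axis=0) / Xc.var(axis=0).sum()

# Rows of Vt are the principal directions; projecting onto them is an
# orthogonal change of axes, so the total variance is preserved.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
rotated = Xc @ Vt.T
share_rotated = rotated.var(axis=0) / rotated.var(axis=0).sum()
```

Along the original axes each variable carries roughly half of the variance, so both are needed; after the change of axes, the first component alone explains almost all of it.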
9. Why is naive Bayes so ‘naive’?
Naive Bayes is called naive because it assumes that every input variable is independent of the others, given the class. This is a strong assumption and unrealistic for real data; however, the technique is very effective on a large range of complex problems.
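To make the independence assumption concrete, here is a minimal categorical Naive Bayes sketch in plain Python with add-one (Laplace) smoothing. The function names `train_nb` and `predict_nb` and the toy weather data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def train_nb(rows, labels):
    """Count class frequencies and per-feature value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature_index, label) -> value counts
    vocab = defaultdict(set)             # distinct values seen per feature
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict_nb(model, row):
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = log(count / total)  # log prior
        for i, value in enumerate(row):
            # The "naive" step: each feature contributes an independent
            # factor P(value | label), with add-one smoothing.
            num = value_counts[(i, label)][value] + 1
            den = count + len(vocab[i])
            score += log(num / den)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: (outlook, windy) -> play?
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
        ("rainy", "no"), ("sunny", "no"), ("rainy", "yes")]
labels = ["yes", "yes", "no", "yes", "yes", "no"]
model = train_nb(rows, labels)
```

Calling `predict_nb(model, ("rainy", "yes"))` scores each class by multiplying (in log space) the per-feature probabilities, without modelling any interaction between outlook and wind.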
10. How is kNN different from kmeans clustering?
kNN is a supervised classification algorithm that labels a new data point according to the labels of its k closest training points, while k-means is an unsupervised clustering algorithm that groups unlabeled data into k clusters.
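The contrast shows up in the function signatures: kNN needs labels, k-means does not. The following is a simplified sketch in plain Python (requires Python 3.8+ for `math.dist`); the helper names, the starting centroids, and the data are assumptions made for the example.

```python
from collections import Counter
from math import dist


def knn_predict(points, labels, query, k=3):
    """Supervised: vote among the labels of the k nearest training points."""
    nearest = sorted(range(len(points)), key=lambda i: dist(points[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


def kmeans(points, centroids, iterations=10):
    """Unsupervised: no labels; alternate assignment and centroid update."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids, groups


# Hypothetical 2-D data: two well-separated blobs.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["a", "a", "a", "b", "b", "b"]

predicted = knn_predict(points, labels, query=(2, 2), k=3)
centroids, groups = kmeans(points, centroids=[(0, 0), (10, 10)])
```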
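11. How are True Positive Rate and Recall related? Write the equation.
They are the same quantity. In machine learning, the true positive rate, also referred to as sensitivity or recall, measures the percentage of actual positives that are correctly identified: True Positive Rate = Recall = TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives.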
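As a small worked example, recall can be computed from true and predicted labels as TP / (TP + FN). The function name `recall` and the label vectors below are illustrative only.

```python
def recall(y_true, y_pred, positive=1):
    """True positive rate = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical labels: 4 actual positives, 3 of them predicted correctly.
y_true = [1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0]
tpr = recall(y_true, y_pred)  # 3 / (3 + 1)
```

The false positive at the fifth position does not affect recall, since only actual positives appear in the denominator.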
12. When is Ridge regression favorable over Lasso regression?
You can cite the ISLR authors (Hastie, Tibshirani, and colleagues): when a few variables have medium or large effects, use lasso regression; when many variables have small or medium effects, use ridge regression.
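The intuition can be illustrated in the special case of an orthonormal design matrix, where both estimators have closed forms in terms of the least-squares coefficients. The sketch below assumes the objectives ½‖y − Xβ‖² + (λ/2)‖β‖² for ridge and ½‖y − Xβ‖² + λ‖β‖₁ for lasso; the helper names and coefficient values are made up.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge on an orthonormal design: every coefficient is scaled down."""
    return [b / (1 + lam) for b in beta_ols]


def lasso_shrink(beta_ols, lam):
    """Lasso on an orthonormal design: soft-thresholding at lam."""
    return [(abs(b) - lam) * (1 if b > 0 else -1) if abs(b) > lam else 0.0
            for b in beta_ols]


# Hypothetical least-squares coefficients: one large effect, three small ones.
beta_ols = [3.0, 0.4, -0.3, 0.2]
ridge = ridge_shrink(beta_ols, lam=0.5)
lasso = lasso_shrink(beta_ols, lam=0.5)
```

Ridge keeps all four coefficients (shrunk proportionally), which suits many small effects; lasso sets the three small ones exactly to zero, which suits a few large effects.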
13. While working on a data set, how do you select important variables? Explain your methods.
Following are the methods of variable selection you can use:
- Remove the correlated variables prior to selecting important variables
- Use linear regression and select variables based on p values
- Use Forward Selection, Backward Selection, Stepwise Selection
- Use Random Forest or XGBoost and plot a variable importance chart
- Use Lasso Regression
- Measure information gain for the available set of features and select top n features accordingly.
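The last method in the list above can be sketched in plain Python for categorical features. The helper names and the toy table are hypothetical; a real project would more likely rely on a library implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def top_n_features(table, labels, n):
    """Rank the columns of `table` (dict of name -> list of values) by information gain."""
    gains = {name: information_gain(col, labels) for name, col in table.items()}
    return sorted(gains, key=gains.get, reverse=True)[:n]


# Hypothetical categorical data set.
table = {
    "outlook": ["sunny", "sunny", "rainy", "rainy", "sunny", "rainy"],
    "windy":   ["no", "yes", "no", "yes", "no", "yes"],
    "weekday": ["mon", "tue", "mon", "tue", "tue", "mon"],
}
labels = ["yes", "yes", "no", "no", "yes", "no"]
best = top_n_features(table, labels, n=1)
```

Here "outlook" separates the two classes perfectly, so it has the highest information gain and is selected first.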
14. What is the difference between covariance and correlation?
- “Covariance” indicates the direction of the linear relationship between variables.
- “Correlation” on the other hand measures both the strength and direction of the linear relationship between two variables.
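A quick way to see the difference is to change the unit of one variable: the covariance changes with the scale, while the correlation does not. The sketch below uses only the standard library and made-up height and weight values.

```python
from math import sqrt


def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical data: heights in metres and weights in kilograms.
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weight_kg = [55, 65, 70, 72, 85]
height_cm = [h * 100 for h in height_m]  # same data, different unit

cov_m, cov_cm = covariance(height_m, weight_kg), covariance(height_cm, weight_kg)
corr_m, corr_cm = correlation(height_m, weight_kg), correlation(height_cm, weight_kg)
```

Both covariances are positive, which gives the direction of the relationship, but only the correlation stays the same across units and can be read as a strength between -1 and 1.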