Capgemini – Data Science Interview Questions
Here is a list of Data Science interview questions recently asked at Capgemini. The questions are suitable for both freshers and experienced professionals. Our Data Science Training answers all of the questions below.
1. What do you know about Autoencoders?
Autoencoders are artificial neural networks that learn from an unlabeled training set by compressing the input into a lower-dimensional representation and then reconstructing it. This is a form of unsupervised deep learning. They can be used for dimensionality reduction or, in some variants, as generative models that produce new data resembling the input data.
2. Please explain the concept of a Boltzmann Machine.
A Boltzmann Machine is a network of symmetrically connected, neuron-like units that make stochastic decisions about whether to be on or off. Boltzmann machines have a simple learning algorithm that allows them to discover interesting features in datasets composed of binary vectors.
3. What do you understand by linear regression?
Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable.
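As a minimal sketch of fitting a linear equation to observed data, the following uses the closed-form ordinary least squares solution for one explanatory variable. The function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data that lies exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```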
4. What do you understand by logistic regression?
Logistic regression is a statistical method used to predict a categorical outcome, most often a binary one such as yes/no, based on prior observations in a data set. The model estimates the probability of the dependent variable by passing a linear combination of one or more independent variables through the logistic (sigmoid) function.
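A rough sketch of logistic regression with a single independent variable, trained by plain gradient descent on the log loss. The data (hours studied vs. pass/fail), the learning rate and the number of epochs are assumptions chosen for illustration only.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log loss with respect to w and b
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: hours studied and whether the exam was passed
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = train_logistic(hours, passed)
p_low = sigmoid(w * 1.0 + b)   # predicted pass probability after 1 hour
p_high = sigmoid(w * 4.0 + b)  # predicted pass probability after 4 hours
print(p_low, p_high)
```

The output is a probability between 0 and 1; a threshold (commonly 0.5) turns it into a class prediction.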
5. What is a confusion matrix?
A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known.
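For a binary classifier, the table can be sketched by counting the four possible outcomes. The labels below are invented test data, and `confusion_matrix` is an illustrative helper rather than a library function.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
print(cm)        # {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
print(accuracy)  # 0.75
```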
6. What is the difference between supervised and unsupervised machine learning?
In supervised learning, the model is given input data together with the corresponding output (labels). In unsupervised learning, only input data is provided. The goal of supervised learning is to train the model so that it can predict the output for new data, whereas unsupervised learning aims to find structure or patterns in the data. Because labels are available, the accuracy of a supervised model can be measured directly.
7. What is the bias-variance trade-off?
Bias is the error introduced by the simplifying assumptions a model makes to make the target function easier to approximate. Variance is the amount by which the estimate of the target function would change given different training data. The trade-off is the tension between the error introduced by the bias and the error introduced by the variance.
8. What are exploding gradients?
Exploding gradients are a problem where large error gradients accumulate and result in very large updates to neural network model weights during training. This has the effect of your model being unstable and unable to learn from your training data.
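A toy illustration of how gradients can accumulate: in a deep chain, backpropagation multiplies the gradient by a factor at every layer. The sketch below assumes a simplified chain of identical scalar weights and ignores activation functions, which is not how a real network looks but shows the growth.

```python
def gradient_scale(weight, depth):
    """Gradient magnitude after backpropagating through `depth` layers
    that each multiply the gradient by the same scalar `weight`."""
    grad = 1.0
    for _ in range(depth):
        grad *= weight
    return grad


stable = gradient_scale(1.0, 50)     # stays at 1.0
exploding = gradient_scale(1.5, 50)  # grows exponentially with depth
print(stable, exploding)
```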
9. Can you name the type of biases that occur in machine learning?
Types of bias in machine learning:
- Sample Bias.
- Prejudice Bias.
- Confirmation Bias.
- Group attribution Bias.
10. Why is mean square error a bad measure of model performance? What would you suggest instead?
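A disadvantage of the mean squared error is that it is not very interpretable: it is expressed in squared units of the target, so MSE values vary with the prediction task and cannot be compared across different tasks. Squaring also makes it sensitive to outliers. Alternatives include the root mean squared error (RMSE) and the mean absolute error (MAE), which are in the same units as the target, or R-square, which is scale-free.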
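For comparison, here is a small sketch that computes MSE next to two commonly reported alternatives, RMSE and MAE, on invented values. Which metric to prefer depends on the task; this only shows how they differ in scale.

```python
import math


def mse(actual, predicted):
    """Mean squared error (in squared units of the target)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error (same units as the target)."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error (same units, less sensitive to outliers)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 7.0]

print(mse(actual, predicted))  # 1.25
print(rmse(actual, predicted))
print(mae(actual, predicted))  # 0.75
```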
11. What is collaborative filtering?
Collaborative filtering is a recommendation technique that predicts what a user will like based on that user's previous behavior and the behavior of similar users. Recommendation systems built on it have become a routine part of any online user's journey.
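12. Can you write the formula to calculate R-square?
R-square = 1 - (SS_res / SS_tot). To calculate the residual sum of squares (SS_res, the unexplained variation), subtract each predicted value from the corresponding actual value, square the results and sum them. To calculate the total sum of squares (SS_tot, the total variation), subtract the average actual value from each of the actual values, square the results and sum them. From there, divide the first sum by the second, subtract the result from one, and you have the R-square.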
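The calculation can be sketched as follows, using the standard definition R-square = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations from the mean. The values are invented for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Sum of squared errors between actual and predicted values
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Sum of squared deviations of actual values from their mean
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(actual, predicted)
print(round(r2, 2))  # 0.98
```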
13. What is the difference between Bayesian Estimate and Maximum Likelihood Estimation (MLE)?
In MLE, we solve for the parameters by taking the argmax of the log-likelihood function, which yields a single point estimate (for a Gaussian, for example, values for (μ, σ²)). In Bayesian estimation, we instead compute a distribution over the parameter space, called the posterior pdf and denoted p(θ|D), which combines the likelihood with a prior.
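A small worked example, assuming a coin-flip (Bernoulli) model rather than the Gaussian case above: MLE returns one number, while the Bayesian approach returns a posterior distribution. The Beta(1, 1) prior and the counts are assumptions made for illustration.

```python
def mle_estimate(heads, flips):
    """MLE of the heads probability: the argmax of the likelihood."""
    return heads / flips


def beta_posterior(heads, flips, prior_a=1.0, prior_b=1.0):
    """Posterior Beta(a, b) parameters after observing the flips,
    starting from a Beta(prior_a, prior_b) prior."""
    return prior_a + heads, prior_b + (flips - heads)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


heads, flips = 7, 10
theta_mle = mle_estimate(heads, flips)  # single point estimate
a, b = beta_posterior(heads, flips)     # full posterior: Beta(8, 4)
theta_post_mean = beta_mean(a, b)       # one summary of the posterior
print(theta_mle, (a, b), theta_post_mean)
```

With more data, the posterior mean moves toward the MLE as the prior's influence shrinks.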
14. What is a confusion matrix?
A confusion matrix is a summary of prediction results on a classification problem. The numbers of correct and incorrect predictions are summarized with count values and broken down by each class. This is the key to the confusion matrix. It shows the ways in which your classification model is confused when it makes predictions.
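15. Explain how a ROC curve works.
A ROC curve plots the true positive rate (Y-axis) against the false positive rate (X-axis) as the classification threshold is varied. A higher X-axis value indicates that a larger share of actual negatives is classified as positive (more false positives relative to true negatives), while a higher Y-axis value indicates that a larger share of actual positives is classified correctly (more true positives relative to false negatives). So, the choice of the threshold depends on how you want to balance false positives against false negatives.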
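To make the axes concrete, the sketch below computes one ROC point per threshold, using the standard definitions false positive rate = FP / (FP + TN) and true positive rate = TP / (TP + FN). The scores, labels and thresholds are invented.

```python
def roc_points(scores, labels, thresholds):
    """Return (threshold, false_positive_rate, true_positive_rate) tuples."""
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fpr = fp / (fp + tn)  # X-axis of the ROC curve
        tpr = tp / (tp + fn)  # Y-axis of the ROC curve
        points.append((t, fpr, tpr))
    return points


scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]  # model scores for the positive class
labels = [0, 0, 1, 1, 0, 1]                # true classes
points = roc_points(scores, labels, thresholds=[0.0, 0.5, 1.0])
for t, fpr, tpr in points:
    print(t, fpr, tpr)
```

A threshold of 0.0 labels everything positive (top-right corner of the plot), and a threshold above every score labels everything negative (bottom-left corner).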
16. What is selection Bias?
Selection bias is a kind of error that occurs when the researcher decides who is going to be studied. It is usually associated with research where the selection of participants isn't random.
17. Explain Decision Tree algorithm in detail.
The decision tree algorithm falls under the category of supervised learning. It uses a tree representation to solve the problem: each internal node tests an attribute, each branch corresponds to an outcome of that test, and each leaf node corresponds to a class label.
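One way to see how an internal node is chosen is to search for the split that leaves the purest child nodes. The sketch below assumes Gini impurity as the criterion (one common choice, not stated in the answer above) and a single numeric attribute. The data is made up.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Find the split `value <= t` with the lowest weighted Gini impurity."""
    best_t, best_impurity = None, float("inf")
    for t in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a split must send samples to both children
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best_impurity:
            best_t, best_impurity = t, weighted
    return best_t, best_impurity


ages = [22, 25, 30, 35, 40, 50]
buys = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_threshold(ages, buys)
print(threshold, impurity)  # 30 0.0
```

A full tree repeats this search recursively on each child until the leaves are pure or a stopping rule is reached.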
18. What is Ensemble Learning?
Ensemble learning is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. Ensemble learning is primarily used to improve the performance of a model (classification, prediction, function approximation, etc.).
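The simplest way to combine classifiers is majority voting, sketched below. The three sets of predictions stand in for three hypothetical models; how the individual models are trained is outside the scope of this sketch.

```python
from collections import Counter


def majority_vote(votes):
    """Return the class label predicted by most models."""
    return Counter(votes).most_common(1)[0][0]


# Predictions of three hypothetical classifiers for four samples
model_a = ["spam", "ham", "spam", "ham"]
model_b = ["spam", "spam", "ham", "ham"]
model_c = ["ham", "spam", "spam", "ham"]

ensemble = [majority_vote(votes) for votes in zip(model_a, model_b, model_c)]
print(ensemble)  # ['spam', 'spam', 'spam', 'ham']
```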
19. What is a Box Cox Transformation?
A Box-Cox power transformation is a way of transforming the response variable so that it satisfies the usual regression assumptions of normality and homogeneity of variance. The regression model is then fitted to the transformed response.
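The transformation itself is short enough to write out. The sketch below implements the standard one-parameter form, (y^λ - 1) / λ for λ ≠ 0 and log(y) for λ = 0, which is defined only for positive values. Choosing λ (usually by maximum likelihood) is not shown.

```python
import math


def box_cox(y, lam):
    """One-parameter Box-Cox transform of a single positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


print(box_cox(4.0, 0.5))  # 2.0
print(box_cox(4.0, 0))    # natural log of 4
```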
20. What is deep learning?
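Deep learning is a branch of machine learning that uses neural networks with many layers to learn patterns directly from large amounts of data. It is the force that is bringing autonomous driving to life: millions of data samples are fed to a system to build a model, train the machines to learn, and then test the results in a safe environment.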
21. What are Recommender Systems?
Recommender systems are systems designed to recommend things to the user based on many different factors. They find matches between users and items by inferring similarities between users and between items.
22. What is the difference between Regression and classification ML techniques?
The main difference between Regression and Classification algorithms is that Regression algorithms are used to predict continuous values such as price, salary, age, etc., while Classification algorithms are used to predict or classify discrete values such as Male or Female, True or False, Spam or Not Spam, etc.
23. What are various steps involved in an analytics project?
- Step 1: Understand the Business.
- Step 2: Get Your Data.
- Step 3: Explore and Clean Your Data.
- Step 4: Enrich Your Dataset.
- Step 5: Build Helpful Visualizations.
- Step 6: Get Predictive.
- Step 7: Iterate, Iterate, Iterate.
24. Do gradient descent methods always converge to same point?
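Not always. Where gradient descent ends up depends on where you start (the initialization). On a non-convex function it is very easy to get stuck in a local minimum, so different starting points can converge to different minima. If we start from the same point each time, with the same settings and no randomness, it will converge to the same minimum.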
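A quick way to see the dependence on initialization is to run gradient descent on a function with two minima. The function f(x) = (x² - 1)², the learning rate and the step count below are assumptions chosen for illustration.

```python
def gradient(x):
    """Derivative of f(x) = (x**2 - 1)**2, which has minima at x = -1 and x = 1."""
    return 4.0 * x * (x ** 2 - 1.0)


def gradient_descent(start, lr=0.05, steps=200):
    """Plain gradient descent from a given starting point."""
    x = start
    for _ in range(steps):
        x -= lr * gradient(x)
    return x


from_right = gradient_descent(0.5)   # ends near x = 1
from_left = gradient_descent(-0.5)   # ends near x = -1
print(from_right, from_left)
```

The same algorithm with the same settings reaches a different minimum purely because of the starting point, and repeating a run from the same start gives the same result.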
25. What is Collaborative filtering?
Collaborative Filtering is a technique widely used by recommender systems when you have a decent amount of user-item data. It makes recommendations based on the preferences of similar users.
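A minimal sketch of user-based collaborative filtering: the rating a user would give an unseen item is estimated as a similarity-weighted average of other users' ratings. Cosine similarity over co-rated items is one possible similarity measure among several, and the users, items and ratings are invented.

```python
import math

# Hypothetical user-item ratings (1 to 5); missing keys mean "not rated"
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "D": 4},
    "carol": {"A": 1, "C": 5, "D": 2},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = total_sim = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine(ratings[user], their_ratings)
        weighted_sum += sim * their_ratings[item]
        total_sim += sim
    return weighted_sum / total_sim if total_sim else None


prediction = predict("alice", "D")
print(prediction)
```

Alice's tastes are closer to Bob's than to Carol's, so the prediction lands nearer to Bob's rating of 4 than to Carol's rating of 2.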
26. What are the time series algorithms?
The Time Series mining function provides the following algorithms to predict future trends:
- Autoregressive Integrated Moving Average (ARIMA).
- Exponential Smoothing.
- Seasonal Trend Decomposition.
Time series data is simply a set of data points ordered with respect to time.
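Of the listed algorithms, exponential smoothing is the easiest to sketch by hand. The version below is simple (single) exponential smoothing, where each smoothed value is a weighted average of the new observation and the previous smoothed value. The series and the smoothing factor `alpha` are made up.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; `alpha` in (0, 1] weights recent data."""
    smoothed = [series[0]]  # initialize with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


series = [10.0, 12.0, 14.0]
smoothed = exponential_smoothing(series, alpha=0.5)
print(smoothed)  # [10.0, 11.0, 12.5]
```

In this simple form, the last smoothed value serves as the forecast for the next time step.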
27. Describe the structure of Artificial Neural Networks.
Artificial neural networks are made of three types of layers: an input layer, one or more hidden layers, and an output layer. There must be connections from the nodes in the input layer to the nodes in the hidden layer, and from each hidden layer node to the nodes of the output layer. The input layer takes data into the network.
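The layer structure can be sketched as a forward pass through a tiny fully connected network with 2 inputs, 3 hidden nodes and 1 output. The weights, biases and the sigmoid activation are arbitrary choices for illustration; a real network would learn its weights during training.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one node."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Arbitrary illustrative parameters (not trained)
W_HIDDEN = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]  # 3 hidden nodes x 2 inputs
B_HIDDEN = [0.0, 0.1, -0.1]
W_OUTPUT = [[0.7, -0.5, 0.2]]                      # 1 output node x 3 hidden
B_OUTPUT = [0.05]


def forward(inputs):
    """Input layer -> hidden layer -> output layer."""
    hidden = dense(inputs, W_HIDDEN, B_HIDDEN)
    return dense(hidden, W_OUTPUT, B_OUTPUT)


output = forward([1.0, 2.0])
print(output)
```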