Here is a list of Data Science interview questions recently asked at Google. The questions are suitable for both freshers and experienced professionals. Our **Data Science Training** answers all of the questions below.

### 1. Derive the equations for GMM.

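In data science, GMM stands for Gaussian mixture model. A GMM with $K$ components models the density as $p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$, where the mixing weights satisfy $\pi_k \ge 0$ and $\sum_k \pi_k = 1$. Maximizing the log-likelihood $\sum_{n=1}^{N} \log p(x_n)$ with the expectation-maximization (EM) algorithm gives these updates:
- E-step: compute the responsibilities $\gamma_{nk} = \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) / \sum_{j} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)$.
- M-step: with $N_k = \sum_n \gamma_{nk}$, set $\mu_k = \frac{1}{N_k} \sum_n \gamma_{nk} x_n$, $\Sigma_k = \frac{1}{N_k} \sum_n \gamma_{nk} (x_n - \mu_k)(x_n - \mu_k)^\top$, and $\pi_k = N_k / N$.
- Repeat both steps until the log-likelihood stops improving.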
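
Reading GMM as Gaussian mixture model, the expectation-maximization (EM) updates can be sketched in plain Python for the one-dimensional case. This is a minimal illustration, not a production implementation: the function names, the fixed iteration count, and the small variance floor are all assumptions made for the example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM (hypothetical helper for illustration)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / n_j
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / n_j
            variances[j] = max(var_j, 1e-6)  # assumed floor to avoid a collapsed component
            weights[j] = n_j / len(data)
    return means, variances, weights


sample = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
fitted_means, fitted_vars, fitted_weights = em_gmm_1d(
    sample, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
```

On this toy sample the two fitted means settle near the two visible clusters, and the mixing weights stay normalized because each one is a share of the total responsibility.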

### 2. What do you mean by the term Data Science?

Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data. It encompasses preparing data for analysis, including cleansing, aggregating, and manipulating the data, and then performing advanced data analysis.

### 3. Explain the term botnet?

A botnet is a collection of internet-connected devices infected by malware that allows hackers to control them. Cyber criminals use botnets to launch attacks that include malicious activities such as credential leaks, unauthorized access, data theft, and DDoS attacks.

### 4. What is Data Visualization?

Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization techniques provide an accessible way to see and understand trends, outliers, and patterns in data.

### 5. How can you define Data cleaning as a critical part of the process?

Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. When combining multiple data sources, there are many opportunities for data to be duplicated or mislabeled.
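
As a small illustration of the steps described above, the sketch below normalizes formatting, drops incomplete rows, and removes duplicates from records merged from more than one source. The record fields (`name`, `email`) and the function name are hypothetical, and the validity check is deliberately crude.

```python
def clean_records(records):
    """Normalize formatting, drop incomplete rows, and remove duplicates."""
    seen_emails = set()
    cleaned = []
    for rec in records:
        # Fix inconsistent formatting (stray whitespace, mixed case).
        name = (rec.get("name") or "").strip().title()
        email = (rec.get("email") or "").strip().lower()
        # Drop incomplete or obviously malformed rows.
        if not name or "@" not in email:
            continue
        # Drop duplicates, using the normalized email as the key.
        if email in seen_emails:
            continue
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


raw = [
    {"name": " alice smith ", "email": "Alice@Example.com"},
    {"name": "Alice Smith", "email": "alice@example.com "},  # duplicate from a second source
    {"name": "Bob", "email": None},                          # incomplete
    {"name": "carol", "email": "carol@example.com"},
]
cleaned = clean_records(raw)
```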

### 6. Point out 7 ways Data Scientists use Statistics?

- Design and interpret experiments to inform product decisions.
- Build models that predict signal, not noise.
- Turn big data into the big picture.
- Understand user engagement, retention, conversion, and leads.
- Give your users what they want.
- Estimate intelligently.
- Tell the story with the data.

### 7. Differentiate between Data modeling and Database design?

A data model is a set of constructs used to describe data, to create a database, and to produce designs for databases. Database design is the concrete result of that modeling: it is captured in the database schema, which is in turn stored in the data dictionary.

### 8. Describe in brief the Data Science process flowchart?

- Step 1: Frame the problem
- Step 2: Collect the raw data needed for your problem
- Step 3: Process the data for analysis
- Step 4: Explore the data
- Step 5: Perform in-depth analysis
- Step 6: Communicate results of the analysis

### 9. What are Recommender Systems?

Recommender systems predict a user's preference for items, which could take the form of a rating or a response. As more data becomes available for a customer profile, the recommendations become more accurate.

### 10. Why does data cleaning play a vital role in analysis?

Data cleaning helps analysis in two ways. Cleaning data from multiple sources transforms it into a format that data analysts or data scientists can work with. It also helps to increase the accuracy of machine learning models.

### 11. What do you mean by Deep Learning and Why has it become popular now?

Deep learning is a type of machine learning and artificial intelligence (AI) that imitates the way humans gain certain types of knowledge. It has become popular because of its high accuracy when trained on huge amounts of data.

### 12. What is meant by supervised and unsupervised learning in data?

Supervised learning algorithms are trained using labeled data: the input data is provided to the model along with the expected output. Unsupervised learning algorithms are trained using unlabeled data, and the model finds hidden patterns in the data on its own.
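
The contrast can be sketched with two toy routines on one-dimensional data. Both are stand-ins chosen for brevity (a nearest-neighbor classifier for the supervised case, two-cluster k-means for the unsupervised case), and the function names are hypothetical.

```python
def nearest_neighbor_predict(train_x, train_y, x):
    """Supervised: the labels in train_y decide the prediction for x."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]


def two_means(xs, n_iter=10):
    """Unsupervised: split unlabeled 1-D points into two clusters (k-means, k=2)."""
    if min(xs) == max(xs):
        raise ValueError("need at least two distinct values")
    c0, c1 = min(xs), max(xs)  # start the centers at the extremes
    for _ in range(n_iter):
        group0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        group1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0 = sum(group0) / len(group0)
        c1 = sum(group1) / len(group1)
    return c0, c1


points = [1.0, 2.0, 8.0, 9.0]
labels = ["low", "low", "high", "high"]

predicted = nearest_neighbor_predict(points, labels, 7.5)  # uses the labels
centers = two_means(points)                                # never sees the labels
```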

### 13. How often should an algorithm be updated?

There is no fixed schedule. An algorithm is updated when the data set changes, when newly collected data is available for retraining, or when its logic is enhanced, so that the model trained on the acquired data keeps making accurate predictions.

### 14. What is a Linear Regression?

Linear regression establishes a relationship between a dependent variable (Y) and one or more independent variables (X) using a best-fit straight line (also known as the regression line). With a single independent variable, it is represented by the equation `Y = a + b*X + e`, where a is the intercept, b is the slope of the line, and e is the error term.
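
For the single-variable case, the intercept a and slope b can be estimated by ordinary least squares. The sketch below uses the standard closed-form solution; the function name and sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b in Y = a + b*X + e."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # points lie exactly on Y = 1 + 2*X
```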

### 15. Explain Star Schema?

A star schema is a database organizational structure optimized for use in a data warehouse or business intelligence system. It uses a single large fact table to store transactional or measured data, and one or more smaller dimension tables that store attributes about the data.
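
A minimal sketch of this layout, using Python's built-in `sqlite3` module with an in-memory database. The table and column names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are hypothetical; a real warehouse would have many more dimensions and attributes.

```python
import sqlite3


def build_star_schema():
    """Create one fact table surrounded by two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE fact_sales (
            sale_id INTEGER PRIMARY KEY,
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            quantity INTEGER,
            amount REAL
        );
        INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
        INSERT INTO dim_date VALUES (1, 2024, 1), (2, 2024, 2);
        INSERT INTO fact_sales VALUES
            (1, 1, 1, 2, 2000.0), (2, 2, 1, 1, 300.0), (3, 1, 2, 1, 1000.0);
        """
    )
    return conn


def revenue_by_category(conn):
    """Join the fact table to a dimension table and aggregate the measure."""
    query = """
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON f.product_id = p.product_id
        GROUP BY p.category
        ORDER BY p.category
    """
    return conn.execute(query).fetchall()


conn = build_star_schema()
revenue = revenue_by_category(conn)
```

Queries in this style always start from the fact table and join outward to whichever dimensions they need, which is what gives the schema its star shape.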


### 16. How Do Data Scientists Use Statistics?

Data scientists use statistics when working with probability distributions. The probability of an event is expressed on a scale from 0 to 1, with 0 meaning there is no chance of the event occurring and 1 meaning it will certainly happen.

### 17. What do you understand by term hash table collisions?

A collision occurs when two keys are hashed to the same index in a hash table. Collisions are a problem because every slot in a hash table is supposed to store a single element.
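
One common way to handle collisions is separate chaining, where each slot holds a small list of key-value pairs instead of a single element. The sketch below assumes string keys and uses a deliberately weak hash function (the sum of character codes) so that a collision is easy to trigger; the class name is hypothetical.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by separate chaining."""

    def __init__(self, size=4):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Weak on purpose: anagrams such as "ab" and "ba" hash to the same slot.
        return sum(ord(ch) for ch in key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))       # new key: extend the chain

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = ChainedHashTable()
table.put("ab", 1)
table.put("ba", 2)  # collides with "ab" but is stored in the same chain
```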

### 18. Compare and contrast R and SAS?

SAS is commercial software, so it requires a financial investment. R is open-source software, so anyone can use it. SAS offers a powerful package that covers all types of statistical analysis and techniques. R, being open source, allows users to submit their own packages and libraries.

### 19. What do you understand by the letter ‘R’?

R (the R programming language) is free, open-source software used for all kinds of data science, statistics, and visualization projects. R also allows you to build and run statistical models using Sisense data, automatically updating these as new information flows into the model.

### 20. What is Interpolation and Extrapolation?

Interpolation is the process of estimating an unknown value that lies between known data points, whereas extrapolation is the process of estimating unknown values beyond the range of the given data points.
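
With a straight line through two known points, the same formula covers both cases; the only difference is whether the query value falls inside or outside the known range. The function name and sample points below are illustrative.

```python
def linear_estimate(x0, y0, x1, y1, x):
    """Estimate y at x from the line through (x0, y0) and (x1, y1)."""
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)


# Known points: (1, 10) and (3, 30).
inside = linear_estimate(1, 10, 3, 30, 2)   # x = 2 is between 1 and 3: interpolation
outside = linear_estimate(1, 10, 3, 30, 5)  # x = 5 is beyond 3: extrapolation
```

Extrapolated values are generally less trustworthy, because nothing in the data confirms that the trend continues outside the observed range.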

### 21. What is Collaborative filtering?

Collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). It filters information by using the interactions and data collected by the system from other users.
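
A minimal user-based sketch of the idea: predict a user's rating for an unseen item as the similarity-weighted average of other users' ratings for it. Cosine similarity over co-rated items is one common choice, assumed here; the user names, items, and ratings are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted_sum = 0.0
    similarity_sum = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[item]
        similarity_sum += sim
    return weighted_sum / similarity_sum if similarity_sum else None


ratings = {
    "ann": {"A": 5, "B": 1},
    "bob": {"A": 5, "B": 1, "C": 4},  # same tastes as ann
    "cat": {"A": 1, "B": 5, "C": 2},  # opposite tastes
}
prediction = predict_rating(ratings, "ann", "C")
```

The prediction lands closer to bob's rating than to cat's, because bob's past ratings match ann's more closely.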

### 22. How to design a customer satisfaction survey?

Start by choosing the metric the survey will measure: Customer Satisfaction Index (CSI), Customer Satisfaction Score (CSAT), Customer Effort Score (CES), or Net Promoter Score (NPS®).

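### 23. Explain a probability distribution that is not normal and how to apply that?

Examples of non-normal distributions include the Poisson (counts of events in a fixed interval), exponential (waiting times between events), and binomial (successes in a fixed number of trials) distributions. To select the correct probability distribution: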
- Look at the variable in question.
- Review the descriptions of the probability distributions.
- Select the distribution that characterizes this variable.
- If historical data are available, use distribution fitting to select the distribution that best describes your data.
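
As one example of a non-normal distribution, the Poisson distribution is often applied to counts of events in a fixed interval. The sketch below evaluates its probability mass function, P(K = k) = e^(-λ) λ^k / k!; the rate of 3 support tickets per hour is a made-up figure for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


rate = 3.0  # assumed average of 3 tickets per hour
p_none = poisson_pmf(0, rate)                                 # no tickets in an hour
p_more_than_5 = 1 - sum(poisson_pmf(k, rate) for k in range(6))  # more than 5 tickets
```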

### 24. Describe the process of data analysis?

- Ask The Right Questions.
- Data Collection.
- Data Cleaning.
- Analyzing the Data.
- Interpreting the Results.

### 25. What is data cleansing?

Data cleansing (also called data cleaning) is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. When combining multiple data sources, there are many opportunities for data to be duplicated or mislabeled.