DATA SCIENCE INTERVIEW QUESTIONS
[ PART - 13]
𝐐1. 𝐇𝐨𝐰 𝐭𝐨 𝐢𝐝𝐞𝐧𝐭𝐢𝐟𝐲 𝐚 𝐜𝐚𝐮𝐬𝐞 𝐯𝐬. 𝐚 𝐜𝐨𝐫𝐫𝐞𝐥𝐚𝐭𝐢𝐨𝐧? 𝐆𝐢𝐯𝐞 𝐞𝐱𝐚𝐦𝐩𝐥𝐞𝐬.
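Ans. Correlation and causation can occur together, but correlation does not imply causation. Correlation is simply a statistical relationship: two variables tend to move together. Causation goes a step further and applies only when action A actually produces outcome B. Example of correlation: ice cream sales and sunglasses sales rise together, but neither causes the other; both are driven by hot weather. Example of causation: hot weather causes ice cream sales to rise. To establish causation rather than correlation, run a controlled experiment (for example, a randomised A/B test) in which only A is changed and the effect on B is measured.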
𝐐2. 𝐩𝐫𝐞𝐜𝐢𝐬𝐢𝐨𝐧, 𝐚𝐜𝐜𝐮𝐫𝐚𝐜𝐲 𝐚𝐧𝐝 𝐫𝐞𝐜𝐚𝐥𝐥?
Ans. Recall is the ratio of the relevant results returned by a search engine to the total number of relevant results that could have been returned: TP / (TP + FN). Precision is the proportion of relevant results among all returned results: TP / (TP + FP). Accuracy is the proportion of all predictions, positive and negative, that are correct: (TP + TN) / (TP + TN + FP + FN). Here TP, FP, TN and FN are the counts of true positives, false positives, true negatives and false negatives.
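A small worked example may help here. The sketch below computes the three metrics from made-up counts for a search engine judged on 100 documents; the counts are assumptions for illustration only, and accuracy is taken as the share of all decisions that were correct.

```python
# Hypothetical counts for a search engine judged on 100 documents.
tp = 30  # relevant documents that were returned
fp = 10  # irrelevant documents that were returned
fn = 20  # relevant documents that were missed
tn = 40  # irrelevant documents that were correctly left out

precision = tp / (tp + fp)                  # share of returned results that are relevant
recall = tp / (tp + fn)                     # share of relevant results that were returned
accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all decisions that were correct

print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

With these counts precision is higher than recall: most of what is returned is relevant, but a fair number of relevant documents are missed.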
𝐐3. 𝐜𝐡𝐨𝐨𝐬𝐞 𝐤 𝐢𝐧 𝐤-𝐦𝐞𝐚𝐧𝐬?
Ans. A popular approach is the elbow method. Run K-Means for a range of values of k and plot the cost (the within-cluster sum of squared distances) against k. The cost keeps falling as k increases, because each cluster contains fewer points and they sit closer to their centroid. The optimal k is at the "elbow" of the plot, the point after which adding more clusters reduces the cost only slightly.
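The elbow method can be sketched in plain Python. This is a minimal 1-D K-Means written only for illustration, not a library implementation; the function name kmeans_cost, the toy data and the random initialisation are all assumptions. In practice one would usually use a library and plot the costs.

```python
import random

def kmeans_cost(points, k, n_iter=20, seed=0):
    """Run a basic 1-D k-means and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k data points as initial centers
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sum(min((p - c) ** 2 for c in centers) for p in points)

# Toy data with three visible groups (made up for illustration).
points = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
costs = {k: kmeans_cost(points, k) for k in range(1, 6)}
for k, cost in costs.items():
    print(k, round(cost, 3))
```

Reading the printed costs from k=1 upward, the elbow is the value of k after which the cost stops dropping sharply. With random initialisation a single run can land in a poor local optimum, so several restarts per k are a common safeguard.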
𝐐4. 𝐰𝐨𝐫𝐝2𝐯𝐞𝐜 𝐦𝐞𝐭𝐡𝐨𝐝𝐬?
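Ans. Word2vec is a technique for natural language processing published in 2013. It uses a shallow neural network to learn word associations from a large corpus of text and represents each word as a vector. It has two model architectures: continuous bag-of-words (CBOW), which predicts a word from its surrounding context, and skip-gram, which predicts the surrounding context from a word. Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence.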
𝐐5. P𝐫𝐮𝐧𝐢𝐧𝐠 𝐢𝐧 𝐜𝐚𝐬𝐞 𝐨𝐟 𝐝𝐞𝐜𝐢𝐬𝐢𝐨𝐧 𝐭𝐫𝐞𝐞𝐬?
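Ans. Pruning is a technique that reduces the size of a decision tree by removing sections of the tree that are non-critical or redundant for classifying instances. It lowers the complexity of the final model and so helps reduce overfitting. Pruning can be applied while the tree is being grown (pre-pruning, for example by limiting the depth) or after it has been fully grown (post-pruning).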
ENJOY LEARNING 👍👍
Which of the following is used to read a CSV file in Python using pandas?
import pandas as pd
Anonymous Quiz
10%
pd.readcsv(file.csv)
80%
pd.read_csv("file.csv")
6%
pd.read(file)
4%
pd(read_csv.file)
𝐓𝐨𝐝𝐚𝐲'𝐬 𝐢𝐧𝐭𝐞𝐫𝐯𝐢𝐞𝐰 𝐐𝐮𝐞𝐬𝐭 𝐍 𝐀𝐧𝐬
DATA SCIENCE INTERVIEW QUESTIONS
[PART - 14]
𝐐1. 𝐅𝐞𝐚𝐭𝐮𝐫𝐞 𝐬𝐞𝐥𝐞𝐜𝐭𝐢𝐨𝐧 𝐦𝐞𝐭𝐡𝐨𝐝𝐬 𝐟𝐨𝐫 𝐬𝐞𝐥𝐞𝐜𝐭𝐢𝐧𝐠 𝐭𝐡𝐞 𝐫𝐢𝐠𝐡𝐭 𝐯𝐚𝐫𝐢𝐚𝐛𝐥𝐞𝐬 𝐟𝐨𝐫 𝐛𝐮𝐢𝐥𝐝𝐢𝐧𝐠 𝐞𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐭 𝐩𝐫𝐞𝐝𝐢𝐜𝐭𝐢𝐯𝐞 𝐦𝐨𝐝𝐞𝐥𝐬?
Ans. Some feature selection techniques, grouped by type, are: filter methods (Information Gain, Chi-square test, Correlation Coefficient, Mean Absolute Difference (MAD)), wrapper methods (Exhaustive selection, Forward selection) and embedded methods (Regularization).
𝐐2. 𝐓𝐫𝐞𝐚𝐭 𝐦𝐢𝐬𝐬𝐢𝐧𝐠 𝐯𝐚𝐥𝐮𝐞𝐬?
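Ans. Common ways to treat missing values are:
1. Listwise (case) deletion: drop every record that has a missing value
2. Pairwise deletion: for each analysis, drop only the records missing the variables it uses
3. Mean substitution: replace a missing value with the mean of the observed values
4. Regression imputation: predict the missing value from the other variables
5. Maximum likelihood estimation.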
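Two of the simpler options from the list, listwise deletion and mean substitution, can be sketched as follows. The ages column is made-up data, and None is assumed to mark a missing value.

```python
from statistics import mean

# Made-up column of ages where None marks a missing value.
ages = [25, None, 31, 40, None, 28]

# Listwise deletion: drop every record with a missing value.
observed = [a for a in ages if a is not None]

# Mean substitution: replace each missing value with the mean of the observed ones.
fill = mean(observed)
imputed = [fill if a is None else a for a in ages]

print(observed)
print(imputed)
```

Deletion shrinks the dataset, while mean substitution keeps every record but reduces the variance of the column, so the choice depends on how much data is missing and why.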
𝐐3. 𝐚𝐬𝐬𝐮𝐦𝐩𝐭𝐢𝐨𝐧𝐬 𝐮𝐬𝐞𝐝 𝐢𝐧 𝐥𝐢𝐧𝐞𝐚𝐫 𝐫𝐞𝐠𝐫𝐞𝐬𝐬𝐢𝐨𝐧? 𝐖𝐡𝐚𝐭 𝐰𝐨𝐮𝐥𝐝 𝐡𝐚𝐩𝐩𝐞𝐧 𝐢𝐟 𝐭𝐡𝐞𝐲 𝐚𝐫𝐞 𝐯𝐢𝐨𝐥𝐚𝐭𝐞𝐝?
Ans. 1. Linear relationship between the predictors and the target.
2. Multivariate normality (the residuals are normally distributed).
3. No or little multicollinearity.
4. No auto-correlation.
5. Homoscedasticity (the residuals have constant variance).
If the data analyzed by linear regression violate one or more of these assumptions, the results of the analysis may be incorrect or misleading.
𝐐4. 𝐇𝐨𝐰 𝐢𝐬 𝐭𝐡𝐞 𝐠𝐫𝐢𝐝 𝐬𝐞𝐚𝐫𝐜𝐡 𝐩𝐚𝐫𝐚𝐦𝐞𝐭𝐞𝐫 𝐝𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐭 𝐟𝐫𝐨𝐦 𝐭𝐡𝐞 𝐫𝐚𝐧𝐝𝐨𝐦 𝐬𝐞𝐚𝐫𝐜𝐡 𝐭𝐮𝐧𝐢𝐧𝐠 𝐬𝐭𝐫𝐚𝐭𝐞𝐠𝐲?
Ans. Grid search takes an explicit set of possible values for each hyperparameter and evaluates every combination. Random search differs in that we no longer provide an explicit set of values; instead, we provide a statistical distribution for each hyperparameter, and a fixed number of combinations is sampled from those distributions.
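The difference can be sketched without any ML library. In the example below, score is a hypothetical stand-in for training a model and measuring its validation score, and the hyperparameter names, value ranges and number of trials are assumptions chosen for illustration.

```python
import itertools
import random

def score(learning_rate, depth):
    """Hypothetical validation score; a stand-in for training and evaluating a model."""
    return -(learning_rate - 0.1) ** 2 - 0.01 * (depth - 4) ** 2

# Grid search: every combination of explicitly listed values is tried.
grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 6]}
grid_trials = [dict(zip(grid, combo)) for combo in itertools.product(*grid.values())]
best_grid = max(grid_trials, key=lambda params: score(**params))

# Random search: each trial samples every hyperparameter from a distribution.
rng = random.Random(0)
random_trials = [
    {"learning_rate": 10 ** rng.uniform(-3, 0), "depth": rng.randint(2, 8)}
    for _ in range(9)
]
best_random = max(random_trials, key=lambda params: score(**params))

print("grid:", best_grid)
print("random:", best_random)
```

Both strategies use nine trials here, but the grid only ever tests three distinct learning rates, whereas random search tests nine different ones.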
𝐐5. 𝐈𝐬 𝐢𝐭 𝐠𝐨𝐨𝐝 𝐭𝐨 𝐝𝐨 𝐝𝐢𝐦𝐞𝐧𝐬𝐢𝐨𝐧𝐚𝐥𝐢𝐭𝐲 𝐫𝐞𝐝𝐮𝐜𝐭𝐢𝐨𝐧 𝐛𝐞𝐟𝐨𝐫𝐞 𝐟𝐢𝐭𝐭𝐢𝐧𝐠 𝐚 𝐒𝐮𝐩𝐩𝐨𝐫𝐭 𝐕𝐞𝐜𝐭𝐨𝐫 𝐌𝐨𝐝𝐞𝐥?
𝐀ns. It can be. A Support Vector Machine often performs better in a reduced space, so dimensionality reduction before fitting an SVM is beneficial when the number of features is large compared to the number of observations.
𝐐6. 𝐑𝐎𝐂 𝐂𝐮𝐫𝐯𝐞?
Ans. A ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. It plots the true positive rate against the false positive rate as the threshold is varied.
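To make "at all classification thresholds" concrete, here is a minimal sketch that computes the points of a ROC curve by hand, using the true positive rate and false positive rate at each threshold. The function name roc_points, the labels and the scores are made up for illustration.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per threshold, from the highest threshold to the lowest."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, y_true) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, y_true) if p and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Made-up labels (1 = positive) and model scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

for fpr, tpr in roc_points(y_true, scores):
    print(f"FPR={fpr:.2f} TPR={tpr:.2f}")
```

Plotting these pairs with FPR on the x-axis and TPR on the y-axis gives the curve; lowering the threshold moves along it from (0, 0) to (1, 1).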
ENJOY LEARNING 👍👍
DATA SCIENCE INTERVIEW QUESTIONS
[PART -15]
𝐐1. 𝐃𝐞𝐚𝐥 𝐰𝐢𝐭𝐡 𝐮𝐧𝐛𝐚𝐥𝐚𝐧𝐜𝐞𝐝 𝐛𝐢𝐧𝐚𝐫𝐲 𝐜𝐥𝐚𝐬𝐬𝐢𝐟𝐢𝐜𝐚𝐭𝐢𝐨𝐧?
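𝐀ns. Techniques to handle unbalanced data:
1. Use the right evaluation metrics (precision, recall, F1 or AUC rather than accuracy alone)
2. Use K-fold cross-validation in the right way (resample inside each fold, not before splitting)
3. Ensemble different resampled datasets
4. Resample with different ratios
5. Design your own models (for example, with a cost function that penalises errors on the rare class more heavily)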
𝐐2. 𝐀𝐜𝐭𝐢𝐯𝐚𝐭𝐢𝐨𝐧 𝐟𝐮𝐧𝐜𝐭𝐢𝐨𝐧?
𝐀ns. An activation function is a mathematical function that determines the output of a neuron in a neural network. It is a (usually non-linear) transformation applied to the weighted sum of the inputs before the result is sent to the next layer of neurons or returned as the final output. Common examples are sigmoid, tanh and ReLU.
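As an illustration, the sketch below applies two widely used activation functions, sigmoid and ReLU, to the weighted sum computed by a single neuron. The inputs, weights and bias are arbitrary illustrative values, not learned ones.

```python
import math

def sigmoid(x):
    """Squash any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Pass positive inputs through unchanged and clip negatives to zero."""
    return max(0.0, x)

# A single neuron: weighted sum of inputs plus bias, then the activation.
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]  # illustrative values, not learned
bias = 0.1
z = sum(w * x for w, x in zip(weights, inputs)) + bias

print(z, relu(z), sigmoid(z))
```

The weighted sum here is negative, so ReLU outputs zero while sigmoid outputs a value a little below 0.5.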
𝐐3. 𝐃𝐢𝐦𝐞𝐧𝐬𝐢𝐨𝐧 𝐫𝐞𝐝𝐮𝐜𝐭𝐢𝐨𝐧?
𝐀ns. Dimensionality reduction is the process of reducing the number of features under consideration by obtaining a smaller set of principal features, for example with Principal Component Analysis (PCA).
𝐐4. 𝐖𝐡𝐲 𝐢𝐬 𝐦𝐞𝐚𝐧 𝐬𝐪𝐮𝐚𝐫𝐞 𝐞𝐫𝐫𝐨𝐫 𝐚 𝐛𝐚𝐝 𝐦𝐞𝐚𝐬𝐮𝐫𝐞 𝐨𝐟 𝐦𝐨𝐝𝐞𝐥 𝐩𝐞𝐫𝐟𝐨𝐫𝐦𝐚𝐧𝐜𝐞?
𝐀ns. Mean Squared Error (MSE) squares each error, so it gives a relatively high weight to large errors. As a result, MSE can put too much emphasis on large deviations, and a few outliers can dominate the score.
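A quick comparison with Mean Absolute Error (MAE) shows the effect. MAE is brought in here only as a contrast, and the target values and predictions below are made up.

```python
def mse(y_true, y_pred):
    """Mean of the squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean of the absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [10, 12, 11, 13, 12]
small_errors = [11, 11, 12, 12, 13]   # every prediction is off by 1
one_big_error = [10, 12, 11, 13, 22]  # four exact predictions, one off by 10

print(mse(y_true, small_errors), mae(y_true, small_errors))
print(mse(y_true, one_big_error), mae(y_true, one_big_error))
```

The single large error doubles the MAE (from 1 to 2) but multiplies the MSE by twenty (from 1 to 20), even though four of the five predictions are exact.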
𝐐5. 𝐑𝐞𝐦𝐨𝐯𝐞 𝐦𝐮𝐥𝐭𝐢𝐜𝐨𝐥𝐥𝐢𝐧𝐞𝐚𝐫𝐢𝐭𝐲?
𝐀ns. To remove multicollinearity, we can do two things.
1. Create new features by combining the highly correlated ones.
2. Remove one or more of the highly correlated features from the data.
𝐐6. 𝐥𝐨𝐧𝐠-𝐭𝐚𝐢𝐥𝐞𝐝 𝐝𝐢𝐬𝐭𝐫𝐢𝐛𝐮𝐭𝐢𝐨𝐧 ?
𝐀ns. A long-tailed distribution is one in which many occurrences lie far from the "head", or central part, of the distribution. Most occurrences are concentrated at the low end of the x-axis, while the rest are spread over a long tail that tapers off slowly.
𝐐7. 𝐎𝐮𝐭𝐥𝐢𝐞𝐫? 𝐃𝐞𝐚𝐥 𝐰𝐢𝐭𝐡 𝐢𝐭?
𝐀ns. An outlier is an observation that deviates significantly from the rest of the observations. Outliers can be caused by measurement or execution errors.
Removing outliers is legitimate only for specific reasons, because they can be very informative about the subject area and the data collection process. If an outlier does not change the results but does affect the assumptions, you may drop it. Alternatively, instead of truncating outliers completely, replace them with the nearest “good” value.
𝐐8. 𝐄𝐱𝐚𝐦𝐩𝐥𝐞 𝐰𝐡𝐞𝐫𝐞 𝐭𝐡𝐞 𝐦𝐞𝐝𝐢𝐚𝐧 𝐢𝐬 𝐚 𝐛𝐞𝐭𝐭𝐞𝐫 𝐦𝐞𝐚𝐬𝐮𝐫𝐞 𝐭𝐡𝐚𝐧 𝐭𝐡𝐞 𝐦𝐞𝐚𝐧 ?
𝐀ns. If your data contain outliers, the median is typically the better measure, because the mean is dominated by the outliers rather than by the typical values. For example, a few very high incomes pull the mean income well above what most people earn, while the median stays close to the typical value. So before relying on the mean, check the data for outliers; if there are any, prefer the median.
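A tiny example with made-up salaries (in thousands) shows how a single outlier moves the mean but barely affects the median.

```python
from statistics import mean, median

# Made-up annual salaries in thousands; the last one is an outlier.
salaries = [30, 32, 35, 38, 40, 1000]

print(mean(salaries))    # pulled far above the typical salary by the outlier
print(median(salaries))  # stays among the typical values
```

Five of the six salaries are at most 40, yet the mean is close to 196, while the median of 36.5 still describes a typical salary.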
ENJOY LEARNING 👍👍
Which of the following method/s can be used to handle missing values?
Anonymous Quiz
16%
Mean Substitution
6%
Pairwise deletion
11%
Regression imputation
66%
All of the above
Which of the following is not a feature selection technique?
Anonymous Quiz
21%
Information Gain
13%
Forward Selection
23%
Regularisation
44%
K-means clustering
Data Science Interview Questions
[PART-16]
Q. How can outlier values be treated?
A. An outlier is an observation in a dataset that differs significantly from the rest of the data, meaning it is much larger or much smaller than the other values.
Some methods of treating outliers are: trimming (removing the outlier), quantile-based flooring and capping, and mean/median imputation.
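Quantile-based flooring and capping can be sketched with the standard library. The 10th and 90th percentiles used as bounds are an assumption (other cut-offs are equally valid), and the measurements are made up.

```python
from statistics import quantiles

# Made-up measurements with one extreme value at the end.
values = list(range(10, 29)) + [200]

# Deciles: the first cut point is the 10th percentile, the last is the 90th.
cuts = quantiles(values, n=10)
low, high = cuts[0], cuts[-1]

# Flooring and capping: pull values outside [low, high] back to the nearest bound.
capped = [min(max(v, low), high) for v in values]

print(low, high)
print(capped)
```

Unlike trimming, this keeps every record, so the dataset size is unchanged; only the extreme values are moved to the bounds.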
Q. What is root cause analysis?
A. A root cause is a factor that contributed to a nonconformance and should be permanently eliminated through process improvement. It is the most fundamental reason for a problem: the one that sets in motion the entire cause-and-effect chain leading to the problem(s). Root cause analysis (RCA) is a term for the range of approaches, tools, and procedures used to identify the root causes of problems. Some RCA approaches are geared toward uncovering the actual root causes, others are more general problem-solving procedures, and others simply support the core root cause analysis activity.
Q. What is bias and variance in Data Science?
A. Bias is the error introduced by the simplifying assumptions a model makes so that the target function is easier to estimate. In its most basic form, it is the difference between the model's average prediction and the true value. Variance is how much the estimate of the target function changes when different training data are used. In contrast to bias, high variance arises when the model fits the fluctuations, or noise, in the training data. High bias leads to underfitting, and high variance leads to overfitting.
Q. What is a confusion matrix?
A. A confusion matrix is a method of summarising a classification algorithm's performance. Calculating a confusion matrix can help you understand what your classification model is getting right and where it is going wrong. It gives us four counts: "true positive" for events correctly predicted as events, "false positive" for no-events incorrectly predicted as events, "true negative" for no-events correctly predicted as no-events, and "false negative" for events incorrectly predicted as no-events.
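The four cells can be counted directly from paired labels. In this sketch the labels are made up, with 1 standing for an event and 0 for a no-event.

```python
from collections import Counter

# Made-up labels: 1 = event, 0 = no-event.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = Counter(zip(actual, predicted))
tp = counts[(1, 1)]  # events predicted as events
fn = counts[(1, 0)]  # events predicted as no-events
fp = counts[(0, 1)]  # no-events predicted as events
tn = counts[(0, 0)]  # no-events predicted as no-events

print("              pred=1  pred=0")
print(f"actual=1      {tp:>6}  {fn:>6}")
print(f"actual=0      {fp:>6}  {tn:>6}")
```

The diagonal (true positives and true negatives) holds the correct predictions; the off-diagonal cells show which kind of mistake the model makes.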
ENJOY LEARNING 👍👍
Which of the following is not a python library?
Anonymous Quiz
3%
Pandas
2%
Numpy
3%
Matplotlib
10%
Scikit-learn
82%
Array
Which of the following is not a machine learning algorithm?
Anonymous Quiz
5%
Linear Regression
9%
Random Forest
77%
Standard scaler
6%
Decision Tree
4%
Logistic Regression
Which of the following is not a supervised algorithm?
Anonymous Quiz
11%
Linear Regression
9%
Logistic Regression
64%
Clustering
16%
Decision Tree
Which of the following tools can be used for Data Visualization?
Anonymous Quiz
9%
Tableau
11%
Matplotlib
7%
Power BI
74%
All of the above
Data Science & Machine Learning
Do you want a daily quiz to enhance your knowledge?
That's an amazing response from you guys ❤️👍
Which of the following cannot give 10 as an answer?
Anonymous Quiz
8%
5*2
7%
2+5*2-2
69%
2+5*(2-2)
16%
3*2+9//2
Data Science & Machine Learning
Which of the following cannot give 10 as an answer?
Well done guys!!
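Explanation for those who marked the wrong answer:
Read the question again: it asks which expression cannot give 10.
9//2 is floor division, so its result is 4 and not 4.5. That makes 3*2+9//2 equal to 6+4 = 10, and the only option that cannot give 10 is 2+5*(2-2), which is 2.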
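The four quiz options can be checked directly in Python. The dictionary below is just a convenient way to print each expression next to its value.

```python
results = {
    "5*2": 5 * 2,
    "2+5*2-2": 2 + 5 * 2 - 2,
    "2+5*(2-2)": 2 + 5 * (2 - 2),
    "3*2+9//2": 3 * 2 + 9 // 2,
}
for expression, value in results.items():
    print(expression, "=", value)

print(9 // 2, 9 / 2)  # floor division vs. true division
```

Multiplication and floor division bind tighter than addition and subtraction, and parentheses are evaluated first, which is why only the third expression differs.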
Mathematics for Machine Learning
Published by Cambridge University Press (published April 2020)
https://mml-book.com
PDF: https://mml-book.github.io/book/mml-book.pdf