DATA SCIENCE INTERVIEW QUESTIONS [PART-18]
Q. Is skewness in data bad for the model? Why?
A. In a statistical distribution, skewed data means the curve is asymmetric, stretched out to the left or right rather than balanced around the mean. Many statistical models struggle when the data is heavily skewed: the long tail behaves like a set of outliers, and outliers hurt model performance, particularly for regression-based models.
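As a quick illustration, here is a minimal sketch (using numpy and scipy; the synthetic income data is purely for demonstration) of measuring skewness and reducing it with a log transform:

# Minimal sketch: measuring skewness and reducing it with a log transform.
# The data is synthetic and strongly right-skewed by construction.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
income = rng.lognormal(mean=10, sigma=1, size=1000)  # right-skewed sample

print("skewness before:", skew(income))            # large positive value
print("skewness after: ", skew(np.log1p(income)))  # much closer to 0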
Q. How do you train a model that is robust to outliers?
A. Use an outlier-resistant model: outliers have little effect on tree-based models, but they strongly affect regression-based models. If you are running a statistical test, prefer a non-parametric test over a parametric one. Use a robust error metric: switching from mean squared error to mean absolute error reduces the influence of outliers. Cap (winsorize) extreme values so they cannot dominate the fit. If your data has a strong right tail, try a log transformation.
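A minimal sketch of the error-metric point (the numbers are synthetic and purely illustrative): a single outlying prediction inflates MSE far more than MAE.

# Minimal sketch: MAE grows far less than MSE when one prediction is way off.
import numpy as np

y_true = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
y_pred = np.array([10.5, 11.5, 11.0, 12.5, 50.0])  # last prediction is an outlier

mse = np.mean((y_true - y_pred) ** 2)   # dominated by the outlier (~289)
mae = np.mean(np.abs(y_true - y_pred))  # stays moderate (~7.9)
print(mse, mae)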
Q. Show how lambda and map() work together in Python.
A. In Python, map() takes a function and an iterable (such as a list), applies the function to every item, and returns an iterator over the results. When the function is a lambda, the transformation can be written inline, and wrapping the result in list() gives back a new list of transformed items.
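A minimal example (the list of numbers is just for illustration):

# map() applies the lambda to every item of the list; list() materialises the result.
numbers = [1, 2, 3, 4, 5]
squares = list(map(lambda x: x ** 2, numbers))
print(squares)  # [1, 4, 9, 16, 25]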
Q. How do you combat overfitting?
A. A model is overfitted when it performs well on training data but poorly on new data. To avoid overfitting, gather more (or augmented) training data and simplify the model. Stop training early: keep an eye on the validation loss during training and stop as soon as it begins to increase. Apply regularization such as Ridge (L2) or Lasso (L1). In neural networks, use dropout.
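As a minimal sketch of the dropout idea (inverted dropout applied to a single activation vector with numpy; the keep probability and values are illustrative, not a full training loop):

# Minimal sketch: inverted dropout. Randomly zero out units, scale the survivors
# by 1/keep_prob so the expected activation magnitude is unchanged.
import numpy as np

rng = np.random.default_rng(0)
activations = rng.normal(size=8)
keep_prob = 0.8

mask = rng.random(activations.shape) < keep_prob
dropped = np.where(mask, activations / keep_prob, 0.0)
print(dropped)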
ENJOY LEARNING 👍👍
Machine Learning with Python for Everyone
👇👇
https://mariapilot.noblogs.org/files/2020/10/Machine-Learning-With-Python-For-Everyone-Pearson-2020.pdf
Hands-on-Machine-Learning.pdf (7.8 MB)
DATA SCIENCE INTERVIEW QUESTIONS [PART-19]
Q. What is a bias-variance trade-off?
A. If the algorithm is too simple (e.g. a linear hypothesis), it tends to have high bias and low variance, so it underfits. If the algorithm is too complex (e.g. a high-degree polynomial hypothesis), it has low bias but high variance, so it fits noise and the new entries will not fare well. The bias-variance trade-off is the balance between these two situations: as model complexity grows, bias falls and variance rises, and since an algorithm cannot be both simpler and more complex at the same time, we look for the level of complexity that minimizes the total error.
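A minimal sketch of the trade-off (scikit-learn, synthetic data; the chosen polynomial degrees are illustrative): the degree-1 model underfits, the degree-15 model overfits, and an intermediate degree does best on held-out data.

# Minimal sketch: test error vs. model complexity on a noisy sine curve.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 80)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 80)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(degree, mean_squared_error(y_te, model.predict(X_te)))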
Q. What are the support vectors in SVM?
A. Support vectors are the data points closest to the hyperplane; they determine the position and orientation of the hyperplane. Using these support vectors, we maximize the margin of the classifier, and deleting them would change where the hyperplane sits. They are the points that actually build the SVM.
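A minimal sketch with scikit-learn (toy data; the kernel and C value are illustrative) showing how the fitted support vectors can be inspected:

# Minimal sketch: fit a linear SVM and look at its support vectors.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

print("support vectors per class:", clf.n_support_)
print("support vectors:\n", clf.support_vectors_)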
Q. Describe different regularization methods, such as L1 and L2 regularization.
A. L1 regularization, also known as the L1 norm or Lasso (in regression problems), combats overfitting by shrinking the parameters towards 0; it can drive some coefficients exactly to zero, effectively removing those features. L2 regularization, the L2 norm or Ridge (in regression problems), combats overfitting by forcing weights to be small but not exactly 0.
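As a minimal sketch with scikit-learn (synthetic data; the alpha values are illustrative), Lasso zeroes out some coefficients while Ridge only shrinks them:

# Minimal sketch: compare L1 (Lasso) and L2 (Ridge) shrinkage.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("coefficients set exactly to 0 by Lasso:", (lasso.coef_ == 0).sum())
print("coefficients set exactly to 0 by Ridge:", (ridge.coef_ == 0).sum())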
Q. What are correlation and covariance in statistics?
A. Both covariance and correlation measure the relationship and dependency between two variables. Covariance indicates only the direction of the linear relationship, and its value depends on the units of the variables, so it is not standardized. Correlation measures both the strength and the direction of the linear relationship and is standardized to the range -1 to 1.
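A minimal numpy sketch (illustrative numbers) showing that covariance changes with the scale of the variables while correlation does not:

# Minimal sketch: covariance is scale-dependent, correlation is standardized.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + np.array([0.1, -0.2, 0.0, 0.3, -0.1])

print("cov(x, y):     ", np.cov(x, y)[0, 1])
print("cov(100x, y):  ", np.cov(100 * x, y)[0, 1])       # scales with x
print("corr(x, y):    ", np.corrcoef(x, y)[0, 1])
print("corr(100x, y): ", np.corrcoef(100 * x, y)[0, 1])  # unchanged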
ENJOY LEARNING 👍👍
How Uber works
Uber maintains a huge database of drivers; as soon as a user requests a car, its algorithms match the user with the most suitable nearby driver within roughly a 15-second window. Uber stores and analyses data on every single trip users take, and this data is leveraged to predict demand for cars, set fares and allocate sufficient resources. Uber's data science team also performs in-depth analysis of public transport networks across different cities, so the company can focus on cities with poor transportation and make the best use of the data to enhance the customer service experience.
Some interview questions related to Data science
1- What is the difference between structured data and unstructured data?
2- What is multicollinearity, and how do you remove it?
3- Which algorithms do you use to find the most correlated features in a dataset?
4- Define entropy.
5- What is the workflow of principal component analysis?
6- What are the applications of principal component analysis other than dimensionality reduction?
7- What is a convolutional neural network? Explain how it works.
Decision trees and Random forests?
A decision tree is a type of supervised learning algorithm (with a pre-defined target variable) that is mostly used for classification problems. It works for both categorical and continuous input and output variables. The technique splits the population or sample into two or more homogeneous sets (sub-populations) based on the most significant splitter/differentiator among the input variables.
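A minimal sketch with scikit-learn (the iris dataset and max_depth are illustrative choices):

# Minimal sketch: fit a shallow decision tree classifier and check test accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", tree.score(X_te, y_te))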
Random forest is a versatile machine learning method capable of performing both regression and classification tasks. It also copes reasonably well with missing values and outliers, its feature importances can support dimensionality reduction, and it handles other essential steps of data exploration fairly well. It is an ensemble learning method in which a group of weak models combine to form a powerful model.
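And a minimal random forest sketch on the same data (scikit-learn; n_estimators is an illustrative choice):

# Minimal sketch: a random forest is an ensemble of decision trees.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", forest.score(X_te, y_te))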