Data Science & Machine Learning – Telegram
Join this channel to learn data science, artificial intelligence and machine learning with funny quizzes, interesting projects and amazing resources for free

For collaborations: @love_data
DATA SCIENCE INTERVIEW QUESTIONS [PART-17]

Q. How can outlier values be treated?

A. An outlier is an observation that differs significantly from the rest of the data in a dataset: it is much larger or much smaller than most of the other values.
Common methods of treating outliers include trimming (removing) the outlier, quantile-based flooring and capping, and mean/median imputation.
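Here is a minimal plain-Python sketch of the three treatments. It assumes outliers are flagged with Tukey's IQR fences (Q1 - 1.5·IQR and Q3 + 1.5·IQR), which is one common quantile-based rule. The helper names are hypothetical. Mean imputation works the same way if you use statistics.fmean in place of statistics.median.

```python
import math
import statistics

def percentile(values, q):
    """Linear-interpolated percentile (q in 0-100), the same convention as NumPy's default."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def iqr_bounds(values, k=1.5):
    """Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] are treated as outliers."""
    q1, q3 = percentile(values, 25), percentile(values, 75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def trim(values, lower, upper):
    # Trimming: drop every value outside the bounds.
    return [v for v in values if lower <= v <= upper]

def floor_and_cap(values, lower, upper):
    # Flooring and capping: clip values to the bounds instead of dropping them.
    return [min(max(v, lower), upper) for v in values]

def median_impute(values, lower, upper):
    # Imputation: replace outliers with the median of the remaining values.
    med = statistics.median(trim(values, lower, upper))
    return [v if lower <= v <= upper else med for v in values]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]
lower, upper = iqr_bounds(data)           # (-4.0, 16.0)
print(trim(data, lower, upper))           # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(floor_and_cap(data, lower, upper))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 16.0]
print(median_impute(data, lower, upper))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 5.5]
```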



Q. What is root cause analysis?

A. A root cause is a factor that contributed to a nonconformance and should be eliminated permanently through process improvement. The root cause is the most fundamental problem, the most basic reason, that sets in motion the entire cause-and-effect chain leading to the problem(s). Root cause analysis (RCA) is a term for a variety of approaches, tools, and procedures used to identify the root causes of problems. Some RCA approaches aim directly at uncovering actual root causes. Others are more general problem-solving procedures, and still others simply support the core activity of root cause analysis.



Q. What are bias and variance in data science?

A. Bias is the error introduced by a model's simplifying assumptions, which make the target function easier to estimate. In its most basic form, bias is the difference between the model's expected (average) prediction and the true value it is trying to predict. Variance refers to how much the estimate of the target function would change if the model were trained on different training data. Unlike bias, variance arises when the model fits the fluctuations, or noise, in the training data.
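To see both quantities as numbers, this sketch repeatedly draws fresh training sets from a hypothetical setup. The true function is f(x) = x², the Gaussian noise has standard deviation 0.1, and each set has 20 points. It fits two deliberately extreme models and measures the bias and variance of their predictions at x0 = 0.9. The setup and all function names are illustrative assumptions, not a standard API.

```python
import random
import statistics

def true_f(x):
    return x * x  # hypothetical "true" target function

def make_training_set(rng, n=20, noise=0.1):
    xs = [rng.random() for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0, noise)) for x in xs]

def constant_model(train, x0):
    # Very simple model: ignores x and predicts the mean target (high bias).
    return statistics.fmean(y for _, y in train)

def nearest_neighbour_model(train, x0):
    # Very flexible model: copies the target of the closest training point (high variance).
    return min(train, key=lambda p: abs(p[0] - x0))[1]

def bias_and_variance(model, x0=0.9, trials=2000, seed=0):
    rng = random.Random(seed)
    preds = [model(make_training_set(rng), x0) for _ in range(trials)]
    bias = statistics.fmean(preds) - true_f(x0)  # expected prediction minus true value
    variance = statistics.pvariance(preds)       # spread of predictions across training sets
    return bias, variance

for name, model in [("constant", constant_model), ("1-NN", nearest_neighbour_model)]:
    b, v = bias_and_variance(model)
    print(f"{name:8s} bias={b:+.3f} variance={v:.4f}")
```

The constant model should report a large negative bias and a smaller variance than 1-NN. Its average prediction lands near the mean of x² over [0, 1], about 1/3, which is far below f(0.9) = 0.81. The 1-nearest-neighbour model should show the reverse pattern.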



Q. What is a confusion matrix?

A. A confusion matrix is a table that summarises a classification algorithm's performance by comparing its predictions with the actual classes. It helps you understand what your classification model is getting right and where it is going wrong. For a binary problem it contains four counts. "True positive" counts event values correctly predicted as events. "False positive" counts no-event values incorrectly predicted as events. "True negative" counts no-event values correctly predicted as no-events. "False negative" counts event values incorrectly predicted as no-events.
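This minimal sketch counts the four cells for a binary problem. It assumes the labels are 1 (event) and 0 (no event). The function name and the sample labels are hypothetical.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN and FN for a binary classifier."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted an event: correct only if it really was one.
            counts["TP" if actual == positive else "FP"] += 1
        else:
            # Predicted no event: wrong if it really was one.
            counts["FN" if actual == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "TN", "FN")}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
```

In practice, scikit-learn's sklearn.metrics.confusion_matrix does the same job. For labels [0, 1] it returns the counts as a 2x2 array laid out as [[TN, FP], [FN, TP]].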

ENJOY LEARNING 👍👍
DATA SCIENCE INTERVIEW QUESTIONS [PART-18]


Q. Is skewness in data bad for the model? Why?

A. In a statistical distribution, skewed data has an asymmetric curve that appears pulled to the left or right, with one tail longer than the other. Many statistical models perform poorly when the data are highly skewed. The tail of skewed data can act like outliers for the model, and outliers hurt model performance, especially for regression-based models.
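This minimal plain-Python sketch measures skewness with the moment coefficient: the third central moment divided by the second central moment raised to the power 1.5. The sample values are illustrative. The sketch also previews a common remedy, a log transform, which here turns a long right tail into evenly spaced values.

```python
import math

def skewness(xs):
    """Population (Fisher-Pearson) moment coefficient of skewness: m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5

data = [1, 2, 4, 8, 16, 32, 64]        # long right tail
logged = [math.log(x) for x in data]   # evenly spaced after the log

print(skewness(data))    # clearly positive: right-skewed
print(skewness(logged))  # approximately 0: symmetric
```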



Q. How do you train a model that is robust to outliers?

A. Use an outlier-resistant model: tree-based models are much less affected by outliers than regression-based models. If you're doing a statistical test, use a non-parametric test instead of a parametric one. Use a robust error metric: switching from mean squared error to mean absolute error reduces the influence of outliers. Cap (winsorize) the data by setting a limit on how extreme a value can be. Try a log transformation if your data has a long right tail.
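One way to see why mean absolute error is more robust: the constant prediction that minimizes MSE is the mean, while the one that minimizes MAE is the median. This sketch searches a grid of constants for a small hypothetical target vector that contains one outlier (200).

```python
import statistics

def mse(targets, prediction):
    return statistics.fmean((t - prediction) ** 2 for t in targets)

def mae(targets, prediction):
    return statistics.fmean(abs(t - prediction) for t in targets)

targets = [10, 11, 12, 13, 200]                # one large outlier
candidates = [c / 10 for c in range(0, 2001)]  # constant predictions 0.0 ... 200.0

best_mse = min(candidates, key=lambda c: mse(targets, c))
best_mae = min(candidates, key=lambda c: mae(targets, c))
print(best_mse, best_mae)  # 49.2 12.0
```

The single outlier drags the MSE-optimal prediction (the mean, 49.2) far from the bulk of the data. The MAE-optimal prediction stays at the median, 12.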



Q. Show me how the lambda and map() functions work together in Python.

A. In Python, map() takes a function and one or more iterables (such as a list) and calls the function on each item. A lambda is a small anonymous function, which makes it a convenient way to pass the transformation inline. In Python 3, map() returns a lazy iterator rather than a list. Wrap the result in list() to get a new list containing every item transformed by the lambda.
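Here is a short example. The variable names are illustrative.

```python
numbers = [1, 2, 3, 4]

# map() applies the lambda to every item and returns a lazy iterator.
squares = list(map(lambda x: x ** 2, numbers))
print(squares)  # [1, 4, 9, 16]

# With two iterables, the lambda receives one item from each, pairwise.
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))
print(sums)  # [11, 22, 33]
```

A list comprehension such as [x ** 2 for x in numbers] gives the same result. It is often considered more readable when the function is a simple expression.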



Q. How can you combat overfitting?

A. A model is overfitted when it performs well on training data but poorly on new data. To reduce overfitting, add more training data and simplify the model. Stop training early: monitor the loss on a held-out validation set during training, and stop as soon as it begins to increase. Apply regularization, such as Ridge (L2) or Lasso (L1) regularization. In neural networks, use dropout.
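This minimal sketch implements the early-stopping rule against a precomputed list of per-epoch validation losses (hypothetical numbers) rather than a real training loop. The patience parameter is an assumption. With patience=1, training stops at the first increase. A slightly larger value tolerates small, noisy bumps.

```python
def early_stopping(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) for a patience-based early-stopping rule."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch  # stop here; keep the weights from best_epoch
    return best_epoch, len(val_losses) - 1

val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58, 0.65]
print(early_stopping(val_losses))              # (3, 5)
print(early_stopping(val_losses, patience=1))  # (3, 4)
```

In Keras, the tf.keras.callbacks.EarlyStopping callback implements the same idea through its monitor="val_loss", patience and restore_best_weights arguments.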

ENJOY LEARNING 👍👍
DATA SCIENCE INTERVIEW QUESTIONS [PART-19]


Q. What is a bias-variance trade-off?

A. If the algorithm is too simple (e.g., a hypothesis that is a linear equation), it tends to have high bias and low variance, so it underfits. If the algorithm is too complex (e.g., a hypothesis that is a high-degree polynomial), it tends to have high variance and low bias. It fits the training data closely but performs poorly on new entries. The bias-variance trade-off is the balance between these two situations. A model cannot be both more complex and less complex at the same time, so changing complexity to reduce one source of error tends to increase the other. The goal is the level of complexity that minimizes the total error on unseen data.
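The trade-off can be made precise with the standard decomposition of expected squared error at a point x. Assume y = f(x) + ε, where the noise ε has mean 0 and variance σ² and is independent of the training set, and \hat f is the model learned from a random training set. Then:

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Making the model more complex usually shrinks the first term and grows the second. No choice of model can reduce the third term.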


Q. What are the support vectors in SVM?

A. Support vectors are the data points that lie closest to the hyperplane and determine its position and orientation. Using these support vectors, we maximize the margin of the classifier. Deleting a support vector changes the position of the hyperplane, whereas deleting any other training point does not. These are the points that help us build our SVM.
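This minimal sketch applies the definition, assuming the separating hyperplane w·x + b = 0 is already known. In the hypothetical toy set below, every positive point has x1 ≥ 1 and every negative point has x1 ≤ -1. The maximum-margin hyperplane is therefore x1 = 0, written in canonical form with w = (1, 0) and b = 0. It was derived by hand, not trained. The support vectors are the points whose functional margin y·(w·x + b) equals 1.

```python
def functional_margins(points, labels, w, b):
    """y_i * (w . x_i + b) for each point; 1 means the point lies exactly on the margin."""
    return [y * (sum(wj * xj for wj, xj in zip(w, x)) + b) for x, y in zip(points, labels)]

def support_vectors(points, labels, w, b, tol=1e-9):
    margins = functional_margins(points, labels, w, b)
    return [x for x, m in zip(points, margins) if abs(m - 1) <= tol]

X = [(1, 0), (2, 1), (3, -1), (-1, 0), (-2, 1), (-3, -1)]
y = [1, 1, 1, -1, -1, -1]
w, b = (1, 0), 0  # hand-derived max-margin hyperplane x1 = 0 for this toy set

print(functional_margins(X, y, w, b))  # [1, 2, 3, 1, 2, 3]
print(support_vectors(X, y, w, b))     # [(1, 0), (-1, 0)]
```

Removing (3, -1) would leave this hyperplane unchanged. Removing (1, 0) would let the margin widen. In practice, a fitted scikit-learn SVC exposes the support vectors through its support_vectors_ attribute.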



Q. Describe different regularization methods, such as L1 and L2 regularization.

A. L1 regularization penalizes the L1 norm of the weights and is known as Lasso in regression problems. It combats overfitting by shrinking the parameters towards 0. It can set some weights exactly to 0, which effectively removes the corresponding features. L2 regularization penalizes the (squared) L2 norm and is known as Ridge in regression problems. It combats overfitting by forcing weights to be small without making them exactly 0.
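This minimal sketch shows why L1 produces exact zeros and L2 does not. It covers the simplest case: a single weight whose unregularized least-squares estimate is z (equivalently, orthonormal features treated one weight at a time). Minimizing ½(w - z)² + λ|w| gives the soft-thresholding rule. Minimizing ½(w - z)² + ½λw² gives w = z / (1 + λ). The weights and λ = 1 are hypothetical.

```python
def l1_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam*abs(w): soft-thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # weights with |z| <= lam are set exactly to zero

def l2_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + 0.5*lam*w**2: proportional shrinkage."""
    return z / (1 + lam)

unregularized = [3.0, 0.5, -0.2, -2.0]
print([l1_shrink(z, 1.0) for z in unregularized])  # [2.0, 0.0, 0.0, -1.0]
print([l2_shrink(z, 1.0) for z in unregularized])  # [1.5, 0.25, -0.1, -1.0]
```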


Q. What are correlation and covariance in statistics?

A. Both covariance and correlation measure the linear relationship and dependency between two variables. Covariance indicates the direction of the linear relationship between the variables. Correlation measures both the strength and the direction of the linear relationship. Correlation values are standardized: they always lie between -1 and +1. Covariance values are not standardized: their magnitude depends on the units of the variables.
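This minimal plain-Python sketch shows the standardization difference. Rescaling one variable multiplies the covariance by the same factor but leaves the correlation unchanged. The data and the function names are illustrative.

```python
import math

def covariance(xs, ys):
    """Sample covariance (n - 1 in the denominator)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_scaled = [100 * v for v in x]  # same data in different units, e.g. metres -> centimetres

print(covariance(x, y), covariance(x_scaled, y))                        # 1.5 150.0
print(round(correlation(x, y), 4), round(correlation(x_scaled, y), 4))  # 0.7746 0.7746
```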

ENJOY LEARNING 👍👍
Unsupervised Learning using Python
👇👇
How Uber works

Uber has a huge database of drivers. As soon as a user requests a car, Uber's algorithms match the user with the most suitable nearby driver within a 15-second window. Uber stores and analyses data on every trip users take and uses it to predict demand for cars, set fares, and allocate sufficient resources. Uber's data science team also performs in-depth analysis of public transport networks across different cities. This lets it focus on cities with poor transportation and make the best use of the data to improve the customer service experience.
Some interview questions related to data science:

1- What is the difference between structured data and unstructured data?

2- What is multicollinearity, and how do you remove it?

3- Which algorithms do you use to find the most correlated features in a dataset?

4- Define entropy.

5- What is the workflow of principal component analysis (PCA)?

6- What are the applications of principal component analysis other than dimensionality reduction?

7- What is a convolutional neural network? Explain how it works.