DATA SCIENCE INTERVIEW QUESTIONS [PART-18]
Q. Is skewness in data bad for the model? Why?
A. In a statistical distribution, skewed data means the curve is asymmetric, stretched out to the left or right rather than balanced around the mean. Many statistical models struggle when the data is heavily skewed: the long tail behaves like a set of outliers, and outliers hurt model performance, particularly for regression-based models.
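As a quick illustration, here is a minimal sketch (using numpy and scipy; the synthetic income data is purely for demonstration) of measuring skewness and reducing it with a log transform:

# Minimal sketch: measuring skewness and reducing it with a log transform.
# The data is synthetic and strongly right-skewed by construction.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
income = rng.lognormal(mean=10, sigma=1, size=1000)  # right-skewed sample

print("skewness before:", skew(income))            # large positive value
print("skewness after: ", skew(np.log1p(income)))  # much closer to 0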
Q. How do you train a model that is robust to outliers?
A. Use an outlier-resistant model: outliers have little effect on tree-based models, but they strongly affect regression-based models. If you are running a statistical test, prefer a non-parametric test over a parametric one. Use a robust error metric: switching from mean squared error to mean absolute error reduces the influence of outliers. Cap (winsorize) extreme values so they cannot dominate the fit. If your data has a strong right tail, try a log transformation.
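A minimal sketch of the error-metric point (the numbers are synthetic and purely illustrative): a single outlying prediction inflates MSE far more than MAE.

# Minimal sketch: MAE grows far less than MSE when one prediction is way off.
import numpy as np

y_true = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
y_pred = np.array([10.5, 11.5, 11.0, 12.5, 50.0])  # last prediction is an outlier

mse = np.mean((y_true - y_pred) ** 2)   # dominated by the outlier (~289)
mae = np.mean(np.abs(y_true - y_pred))  # stays moderate (~7.9)
print(mse, mae)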
Q. Show how lambda and map() work together in Python.
A. In Python, map() takes a function and an iterable (such as a list), applies the function to every item, and returns an iterator over the results. When the function is a lambda, the transformation can be written inline, and wrapping the result in list() gives back a new list of transformed items.
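A minimal example (the list of numbers is just for illustration):

# map() applies the lambda to every item of the list; list() materialises the result.
numbers = [1, 2, 3, 4, 5]
squares = list(map(lambda x: x ** 2, numbers))
print(squares)  # [1, 4, 9, 16, 25]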
Q. How do you combat overfitting?
A. A model is overfitted when it performs well on training data but poorly on new data. To avoid overfitting, gather more (or augmented) training data and simplify the model. Stop training early: keep an eye on the validation loss during training and stop as soon as it begins to increase. Apply regularization such as Ridge (L2) or Lasso (L1). In neural networks, use dropout.
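As a minimal sketch of the dropout idea (inverted dropout applied to a single activation vector with numpy; the keep probability and values are illustrative, not a full training loop):

# Minimal sketch: inverted dropout. Randomly zero out units, scale the survivors
# by 1/keep_prob so the expected activation magnitude is unchanged.
import numpy as np

rng = np.random.default_rng(0)
activations = rng.normal(size=8)
keep_prob = 0.8

mask = rng.random(activations.shape) < keep_prob
dropped = np.where(mask, activations / keep_prob, 0.0)
print(dropped)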
ENJOY LEARNING 👍👍
Machine Learning with Python for Everyone
👇👇
https://mariapilot.noblogs.org/files/2020/10/Machine-Learning-With-Python-For-Everyone-Pearson-2020.pdf
Hands-on-Machine-Learning.pdf (7.8 MB)
DATA SCIENCE INTERVIEW QUESTIONS [PART-19]
Q. What is a bias-variance trade-off?
A. If the algorithm is too simple (e.g. a linear hypothesis), it tends to have high bias and low variance, so it underfits. If the algorithm is too complex (e.g. a high-degree polynomial hypothesis), it has low bias but high variance, so it fits noise and the new entries will not fare well. The bias-variance trade-off is the balance between these two situations: as model complexity grows, bias falls and variance rises, and since an algorithm cannot be both simpler and more complex at the same time, we look for the level of complexity that minimizes the total error.
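A minimal sketch of the trade-off (scikit-learn, synthetic data; the chosen polynomial degrees are illustrative): the degree-1 model underfits, the degree-15 model overfits, and an intermediate degree does best on held-out data.

# Minimal sketch: test error vs. model complexity on a noisy sine curve.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 80)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 80)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(degree, mean_squared_error(y_te, model.predict(X_te)))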
Q. What are the support vectors in SVM?
A. Support vectors are the data points closest to the hyperplane; they determine the position and orientation of the hyperplane. Using these support vectors, we maximize the margin of the classifier, and deleting them would change where the hyperplane sits. They are the points that actually build the SVM.
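A minimal sketch with scikit-learn (toy data; the kernel and C value are illustrative) showing how the fitted support vectors can be inspected:

# Minimal sketch: fit a linear SVM and look at its support vectors.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

print("support vectors per class:", clf.n_support_)
print("support vectors:\n", clf.support_vectors_)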
Q. Describe different regularization methods, such as L1 and L2 regularization.
A. L1 regularization, also known as the L1 norm or Lasso (in regression problems), combats overfitting by shrinking the parameters towards 0; it can drive some coefficients exactly to zero, effectively removing those features. L2 regularization, the L2 norm or Ridge (in regression problems), combats overfitting by forcing weights to be small but not exactly 0.
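As a minimal sketch with scikit-learn (synthetic data; the alpha values are illustrative), Lasso zeroes out some coefficients while Ridge only shrinks them:

# Minimal sketch: compare L1 (Lasso) and L2 (Ridge) shrinkage.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("coefficients set exactly to 0 by Lasso:", (lasso.coef_ == 0).sum())
print("coefficients set exactly to 0 by Ridge:", (ridge.coef_ == 0).sum())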
Q. What are correlation and covariance in statistics?
A. Both covariance and correlation measure the relationship and dependency between two variables. Covariance indicates only the direction of the linear relationship, and its value depends on the units of the variables, so it is not standardized. Correlation measures both the strength and the direction of the linear relationship and is standardized to the range -1 to 1.
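A minimal numpy sketch (illustrative numbers) showing that covariance changes with the scale of the variables while correlation does not:

# Minimal sketch: covariance is scale-dependent, correlation is standardized.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + np.array([0.1, -0.2, 0.0, 0.3, -0.1])

print("cov(x, y):     ", np.cov(x, y)[0, 1])
print("cov(100x, y):  ", np.cov(100 * x, y)[0, 1])       # scales with x
print("corr(x, y):    ", np.corrcoef(x, y)[0, 1])
print("corr(100x, y): ", np.corrcoef(100 * x, y)[0, 1])  # unchanged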
ENJOY LEARNING 👍👍
How Uber works
Uber maintains a huge database of drivers; as soon as a user requests a car, its algorithms match the user with the most suitable nearby driver within roughly a 15-second window. Uber stores and analyses data on every single trip users take, and this data is leveraged to predict demand for cars, set fares and allocate sufficient resources. Uber's data science team also performs in-depth analysis of public transport networks across different cities, so the company can focus on cities with poor transportation and make the best use of the data to enhance the customer service experience.
Some interview questions related to Data science
1- What is the difference between structured data and unstructured data?
2- What is multicollinearity, and how do you remove it?
3- Which algorithms do you use to find the most correlated features in a dataset?
4- Define entropy.
5- What is the workflow of principal component analysis?
6- What are the applications of principal component analysis other than dimensionality reduction?
7- What is a convolutional neural network? Explain how it works.
Decision trees and Random forests?
A decision tree is a type of supervised learning algorithm (with a pre-defined target variable) that is mostly used for classification problems. It works for both categorical and continuous input and output variables. The technique splits the population or sample into two or more homogeneous sets (sub-populations) based on the most significant splitter/differentiator among the input variables.
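A minimal sketch with scikit-learn (the iris dataset and max_depth are illustrative choices):

# Minimal sketch: fit a shallow decision tree classifier and check test accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", tree.score(X_te, y_te))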
Random forest is a versatile machine learning method capable of performing both regression and classification tasks. It also copes reasonably well with missing values and outliers, its feature importances can support dimensionality reduction, and it handles other essential steps of data exploration fairly well. It is an ensemble learning method in which a group of weak models combine to form a powerful model.
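And a minimal random forest sketch on the same data (scikit-learn; n_estimators is an illustrative choice):

# Minimal sketch: a random forest is an ensemble of decision trees.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", forest.score(X_te, y_te))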