Data Science & Machine Learning – Telegram
Join this channel to learn data science, artificial intelligence and machine learning with funny quizzes, interesting projects and amazing resources for free

For collaborations: @love_data
Which of the following cannot give 10 as an answer?
Anonymous Quiz
5*2 (8%)
2+5*2-2 (7%)
2+5*(2-2) (69%)
3*2+9//2 (16%)
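Well done guys!!

Explanation for those who marked the wrong answer:
Read the question again: it asks which expression cannot give 10.
9//2 evaluates to 4, not 4.5, because // is floor division, so 3*2+9//2 = 6+4 = 10. The only option that does not give 10 is 2+5*(2-2), which evaluates to 2.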
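
The arithmetic can be checked directly in Python. This is just a quick sketch that evaluates the four quiz options; the names `options` and `not_ten` are only for illustration.

```python
# Evaluate each quiz option; // is floor division, so 9 // 2 == 4.
options = {
    "5*2": 5 * 2,
    "2+5*2-2": 2 + 5 * 2 - 2,
    "2+5*(2-2)": 2 + 5 * (2 - 2),
    "3*2+9//2": 3 * 2 + 9 // 2,
}

for expression, value in options.items():
    print(f"{expression} = {value}")

# Collect the options that do not evaluate to 10.
not_ten = [expression for expression, value in options.items() if value != 10]
print(not_ten)  # ['2+5*(2-2)']
```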
Mathematics for Machine Learning

Published by Cambridge University Press (April 2020)

https://mml-book.com

PDF: https://mml-book.github.io/book/mml-book.pdf
Neural Networks and Learning Machines Third Edition
👇👇
https://cours.etsmtl.ca/sys843/REFS/Books/ebook_Haykin09.pdf
Which of the following is not an Unsupervised algorithm?
Anonymous Quiz
K-means clustering (13%)
Hierarchical Clustering (14%)
Anomaly detection (21%)
Logistic Regression (52%)
How can a fresher get a job as a data scientist?

India's job market is highly resistant to hiring freshers as data scientists. Everyone out there asks for at least 2 years of experience, but then the question is: where will we get those two years of experience from?

The important thing here is to build a portfolio. As a fresher, you have probably learnt data science through online courses. These only teach you the basics; the analytical skills required to clean data and apply machine learning algorithms to it come only from practice.

Do some real-world data science projects and participate in Kaggle competitions. Kaggle provides datasets for practice as well. Whatever projects you do, create a GitHub repository for them. Place all your projects there so that when recruiters look at your profile, they know you have hands-on practice and know the basics. This will take you a long way.

All the major data science jobs for freshers will only be available through off-campus interviews.

Some companies that hire data scientists are:

Siemens

Accenture

IBM

Cerner

Creating a technical portfolio will showcase the knowledge you have already gained, and that is essential when you go out there as a fresher and try to find a data scientist job.
7 Steps of the Machine Learning Process

Data Collection:
The process of extracting raw datasets for the machine learning task. This data can come from a variety of places, ranging from open-source online resources to paid crowdsourcing. The first step of the machine learning process is arguably the most important. If the data you collect is poor quality or irrelevant, then the model you train will be poor quality as well.

Data Processing and Preparation:
Once you’ve gathered the relevant data, you need to process it and make sure that it is in a usable format for training a machine learning model. This includes handling missing data, dealing with outliers, etc.

Feature Engineering:
Once you’ve collected and processed your dataset, you will likely need to transform some of the features (and sometimes even drop some features) in order to optimize how well a model can be trained on the data.

Model Selection:
Based on the dataset, you will choose which model architecture to use. This is one of the main tasks of industry engineers. Rather than attempting to come up with a completely novel model architecture, most tasks can be handled well with an existing architecture (or a combination of existing architectures).

Model Training and Data Pipeline:
After selecting the model architecture, you will create a data pipeline for training the model. This means creating a continuous stream of batched data observations to efficiently train the model. Since training can take a long time, you want your data pipeline to be as efficient as possible.
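
A stream of batched observations can be sketched with a plain Python generator. This is only a minimal illustration, assuming the dataset fits in memory; `iter_batches` and its parameters are hypothetical names, and real pipelines usually rely on the data-loading tools of the training framework.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(observations: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches lazily; the last batch may be smaller."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(observations), batch_size):
        yield list(observations[start:start + batch_size])


# Seven toy observations split into batches of three.
batches = list(iter_batches(list(range(7)), batch_size=3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because the generator yields one batch at a time, the training loop never has to hold more than a single batch of prepared data.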

Model Validation:
After training the model for a sufficient amount of time, you will need to validate the model’s performance on a held-out portion of the overall dataset. This data needs to come from the same underlying distribution as the training dataset, but needs to be different data that the model has not seen before.
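
A held-out split can be sketched as follows. This is a simplified illustration, assuming a random shuffle is appropriate for the data (time series, for example, usually need a different strategy); the function name, the 20% fraction and the seed are all assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_validation_split(
    data: Sequence[T], validation_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the data and hold out a fraction for validation."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


train, validation = train_validation_split(list(range(10)), validation_fraction=0.2)
print(len(train), len(validation))  # 8 2
```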

Model Persistence:
Finally, after training and validating the model’s performance, you need to be able to properly save the model weights and possibly push the model to production. This means setting up a process with which new users can easily use your pre-trained model to make predictions.
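
Saving and reloading weights can be sketched with Python's built-in pickle module. This is only one possible approach, shown on a made-up linear model; the `weights` dictionary and `predict` function are hypothetical, and most frameworks ship their own save formats. Only unpickle files you trust.

```python
import os
import pickle
import tempfile

# Hypothetical "trained" weights of a linear model y = w * x + b.
weights = {"w": 2.0, "b": 0.5}


def predict(model_weights, x):
    """Make a prediction with the linear model described by model_weights."""
    return model_weights["w"] * x + model_weights["b"]


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    # Save the weights to disk.
    with open(path, "wb") as f:
        pickle.dump(weights, f)
    # Load them back, as a new user of the pre-trained model would.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(predict(restored, 3.0))  # 6.5
```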
5_6339144778529113396.pdf
11.1 MB
Machine learning notes in 15 pages
DATA SCIENCE INTERVIEW QUESTIONS [PART-17]

Q. How can outlier values be treated?

A. An outlier is an observation in a dataset that differs significantly from the rest of the data, meaning it is much larger or much smaller than the other values.
Some methods of treating outliers are: trimming (removing the outlier), quantile-based flooring and capping, and mean/median imputation.
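
The three treatments can be sketched with the standard library. The sample data is made up, and using the 10th and 90th percentiles as floor and cap is an assumption; pick thresholds that suit your own data.

```python
from statistics import median, quantiles

data = [10, 12, 11, 13, 12, 11, 100]  # made-up sample; 100 is the obvious outlier

# 10th and 90th percentiles as floor and cap (the percentile choice is an assumption).
deciles = quantiles(data, n=10, method="inclusive")
lower, upper = deciles[0], deciles[-1]


def is_outlier(value):
    """Flag values outside the [lower, upper] range."""
    return value < lower or value > upper


# Trimming: drop flagged values.
trimmed = [v for v in data if not is_outlier(v)]

# Flooring and capping: pull flagged values back to the nearest bound.
capped = [min(max(v, lower), upper) for v in data]

# Median imputation: replace flagged values with the median.
typical = median(data)
imputed = [typical if is_outlier(v) else v for v in data]

print(trimmed)
print(capped)
print(imputed)
```

Note that a percentile floor also flags the smallest value in the sample, so quantile-based rules work best on larger datasets.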



Q. What is root cause analysis?

A. A root cause is a factor that contributed to a nonconformance and should be permanently eliminated through process improvement. It is the most fundamental reason that sets in motion the entire cause-and-effect chain leading to the problem(s). Root cause analysis (RCA) is a term that covers a variety of approaches, tools, and procedures used to identify the root causes of problems. Some RCA approaches are geared toward uncovering the true root causes, others are more general problem-solving procedures, and others simply support the core activity of root cause analysis.



Q. What is bias and variance in Data Science?

A. Bias is the error introduced by the simplifying assumptions a model makes so that the target function is easier to estimate. In its most basic form, bias is the difference between the model's average (expected) prediction and the true value. Variance refers to how much the estimate of the target function will fluctuate when different training data is used. In contrast to bias, high variance occurs when the model fits the fluctuations, or noise, in the training data.
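
A small numeric sketch may help. It assumes we already have predictions for one input from models trained on three different training sets; the numbers and the helper name `bias_and_variance` are made up for illustration.

```python
from statistics import fmean, pvariance


def bias_and_variance(predictions, true_value):
    """Return (bias, variance) of predictions from models trained on different data."""
    bias = fmean(predictions) - true_value  # average prediction minus the truth
    variance = pvariance(predictions)       # spread of predictions around their mean
    return bias, variance


predictions = [2.0, 4.0, 6.0]  # hypothetical predictions for the same input
true_value = 3.0

bias, variance = bias_and_variance(predictions, true_value)
mse = fmean([(p - true_value) ** 2 for p in predictions])

print(bias)  # 1.0
# For a fixed true value, the mean squared error equals bias squared plus variance.
print(abs(mse - (bias ** 2 + variance)) < 1e-9)  # True
```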



Q. What is a confusion matrix?

A. A confusion matrix is a way of summarising a classification algorithm's performance. Calculating a confusion matrix can help you understand what your classification model is getting right and where it is going wrong. It gives us four counts: "True positive" for event values that were correctly predicted as events. "False positive" for no-event values that were mistakenly predicted as events. "True negative" for no-event values that were correctly predicted as no-events. "False negative" for event values that were mistakenly predicted as no-events.
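
The four counts can be computed by hand in a few lines. This is a minimal sketch for a binary problem where 1 means "event" and 0 means "no event"; the labels and the helper name `confusion_counts` are made up for illustration.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "TN", "FN")}


actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

matrix = confusion_counts(actual, predicted)
print(matrix)  # {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
```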

ENJOY LEARNING 👍👍
DATA SCIENCE INTERVIEW QUESTIONS [PART-18]


Q. Is skewness in data bad for the model? Why?

A. Skewed data has a distribution that is asymmetric, with a longer tail on the left or the right. Many statistical models perform poorly when the data is heavily skewed. The tail of skewed data may act like outliers for the statistical model, and outliers have a negative impact on model performance, particularly for regression-based models.



Q. How to train a model robust to outliers?

A. You can use a model that is resistant to outliers: tree-based models are barely affected by outliers, whereas regression-based models are. If you're doing a statistical test, use a non-parametric test instead of a parametric one. Use a robust error metric: switching from mean squared error to mean absolute error reduces the influence of outliers. Cap the data at a chosen threshold (winsorizing). Try a log transformation if your data has a strong right tail.
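
Two of these ideas can be sketched with the standard library: comparing mean squared error with mean absolute error on data containing one outlier, and compressing a right tail with a log transformation. The numbers are made up, and using `math.log1p` for the transformation is one common choice, not the only one.

```python
import math
from statistics import fmean


def mean_squared_error(actual, predicted):
    return fmean([(a - p) ** 2 for a, p in zip(actual, predicted)])


def mean_absolute_error(actual, predicted):
    return fmean([abs(a - p) for a, p in zip(actual, predicted)])


predicted = [4.0, 6.0, 8.0, 10.0]
actual_clean = [3.0, 5.0, 7.0, 9.0]     # every error is 1
actual_outlier = [3.0, 5.0, 7.0, 20.0]  # last error is 10

print(mean_squared_error(actual_clean, predicted))     # 1.0
print(mean_absolute_error(actual_clean, predicted))    # 1.0
print(mean_squared_error(actual_outlier, predicted))   # 25.75
print(mean_absolute_error(actual_outlier, predicted))  # 3.25

# A log transformation shrinks a strong right tail while keeping the order.
right_tailed = [1, 10, 100, 1000]
log_transformed = [math.log1p(v) for v in right_tailed]
print(log_transformed)
```

The single outlier multiplies the squared error by more than 25 but the absolute error by only about 3, which is why the absolute metric is considered more robust.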



Q. Show how the lambda and map functions work together in Python.

A. In Python, the map() function accepts a function and one or more iterables, such as a list. When you pass a lambda function and a list, map() applies the lambda to each item and returns an iterator over the results (a map object in Python 3); wrap it in list() to get a new list.
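
A short example; the variable names are only for illustration.

```python
numbers = [1, 2, 3, 4]

# map() applies the lambda to every item; in Python 3 it returns a lazy iterator,
# so list() is used here to collect the results.
squared = list(map(lambda x: x ** 2, numbers))
print(squared)  # [1, 4, 9, 16]

# With two iterables, the lambda receives one item from each.
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))
print(sums)  # [11, 22, 33]
```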



Q. How do you combat overfitting?

A. A model is said to be overfitted when it performs well on training data but not on new data. To avoid overfitting, get more training data and simplify the model. Use early stopping: keep an eye on the validation loss during training and stop as soon as it begins to increase. Apply regularisation, such as Ridge (L2) or Lasso (L1). To combat overfitting in neural networks, use dropout.
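
Stopping once the loss starts rising can be sketched as below. This is a simplified illustration that assumes the loss is measured on held-out validation data after every epoch; the `patience` parameter (how many non-improving epochs to tolerate) and the loss values are assumptions.

```python
def early_stopping_epoch(validation_losses, patience=2):
    """Return the epoch index at which training would stop."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch  # loss has not improved for `patience` epochs
    return len(validation_losses) - 1  # never triggered: train to the end


# Made-up validation losses: they improve until epoch 3, then start rising.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
stop_epoch = early_stopping_epoch(losses, patience=2)
print(stop_epoch)  # 5
```

In practice you would also keep the weights from the best epoch (epoch 3 here) rather than the ones from the epoch where training stopped.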

ENJOY LEARNING 👍👍