Data Science & Machine Learning – Telegram
72.1K subscribers
Join this channel to learn data science, artificial intelligence and machine learning with funny quizzes, interesting projects and amazing resources for free

For collaborations: @love_data
Useful Pandas🐼 methods you should definitely know

head()
info()
fillna()
melt()
pivot()
query()
merge()
assign()
groupby()
describe()
sample()
replace()
rename()
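Here is a minimal sketch of a few of these methods in action. The DataFrame, its column names, and its values are all made up for illustration, and the snippet assumes pandas is installed.

```python
import pandas as pd

# Hypothetical sales data, invented purely for illustration.
df = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "year": [2022, 2023, 2022, 2023],
    "sales": [100.0, None, 80.0, 120.0],
})

# fillna(): replace the missing sales value with 0.
df = df.fillna({"sales": 0.0})

# assign(): add a derived column and get a new DataFrame back.
df = df.assign(sales_k=lambda d: d["sales"] / 1000)

# rename(): give the derived column a clearer name.
df = df.rename(columns={"sales_k": "sales_thousands"})

# query(): filter rows with a string expression.
recent = df.query("year == 2023")

# groupby(): total sales per city.
totals = df.groupby("city")["sales"].sum()

# pivot(): one row per city, one column per year.
wide = df.pivot(index="city", columns="year", values="sales")

# melt(): go back from wide to long format.
long_df = wide.reset_index().melt(id_vars="city", var_name="year", value_name="sales")

print(df.head())
print(totals)
```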
👍15😁1
Data Analyst Interview Questions
[Python, SQL, PowerBI]

1. Is indentation required in Python?
Ans:
Indentation is required in Python. It defines a block of code: everything inside a loop, class, function, or conditional is written as an indented block. The convention is four spaces per level. If your code is not indented correctly, Python raises an IndentationError and the program will not run.
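A small sketch of both sides of this: a correctly indented function, and a badly indented snippet that Python rejects. The function name and the sample source string are made up for illustration.

```python
def classify(n):
    # The indented lines below form the body of the function.
    if n % 2 == 0:
        # This block runs only when the condition is true.
        return "even"
    return "odd"

# Compiling badly indented source raises IndentationError
# (a subclass of SyntaxError) before any code runs.
bad_source = "def f():\nreturn 1\n"
try:
    compile(bad_source, "<example>", "exec")
    error_name = None
except IndentationError as exc:
    error_name = type(exc).__name__

print(classify(4), classify(7), error_name)
```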

2. What are Entities and Relationships?
Ans:
Entity:
An entity is a real-world object that can be clearly identified. For example, in a college database, students, professors, workers, departments, and projects can all be entities.

Relationships:
Relations or links between entities that have something to do with each other. For example, the employee table in a company's database can be associated with the salary table in the same database.

3. What are Aggregate and Scalar functions?
Ans:
An aggregate function performs a calculation on a collection of values and returns a single scalar value. Aggregate functions are often used with the GROUP BY and HAVING clauses of the SELECT statement. A scalar function returns a single value for each input value it is given.
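To see the difference, here is a rough sketch using Python's built-in sqlite3 module. The table name, columns, and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (dept TEXT, name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?, ?)",
    [("eng", "asha", 90.0), ("eng", "ravi", 110.0), ("ops", "meera", 70.0)],
)

# Scalar function: UPPER() returns one value per input row.
scalar_rows = conn.execute(
    "SELECT UPPER(name) FROM salaries ORDER BY name"
).fetchall()

# Aggregate function: AVG() collapses each group to one value.
# HAVING filters on the aggregated result.
aggregate_rows = conn.execute(
    "SELECT dept, AVG(salary) FROM salaries "
    "GROUP BY dept HAVING AVG(salary) > 80 ORDER BY dept"
).fetchall()

print(scalar_rows)
print(aggregate_rows)
conn.close()
```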

4. What are Custom Visuals in Power BI?
Ans:
Custom visuals behave like any other visualization in Power BI. The only difference is that they are built with the custom visuals SDK instead of shipping with Power BI. They are written in JavaScript, often with libraries such as jQuery.

ENJOY LEARNING 👍👍
👍18
Harvard CS109A #DataScience course materials — huge collection free & open!

1. Lecture notes
2. R code, #Python notebooks
3. Lab material
4. Advanced sections
and more ...

https://harvard-iacs.github.io/2019-CS109A/pages/materials.html
👍9😁1
Which of the following commands isn't used in pandas?
Anonymous Quiz
6%
head()
4%
replace()
8%
groupby()
4%
rename()
78%
datasciencefun()
😁13🤩5👏3👍2🔥2
American Express is hiring
Position: Data Science Analyst
👉 Apply: https://aexp.eightfold.ai/careers/job/13347327
👍 All the best.
👍5
Amazon is hiring Data Scientist Intern!
Qualifications: Bachelor's/ Master's Degree
Salary: 5.4 LPA (Expected)
Batch: 2019/2020/2021/2022/2023
Experience: Freshers
Location: Bangalore, India

📌Apply Link: https://www.amazon.jobs/en/jobs/2213292/data-scientist-intern
👍3
Every ML project should keep the following documentation:

• Change log
• Tech debt log
• Potential risks
• Experiment logs
• Future work ideas
• List of assumptions
• ETL pipeline description
👍101
Advanced Data Analytics Using Python.pdf
2.2 MB
Advanced Data Analytics Using Python
With Machine Learning, Deep Learning and NLP Examples
#book #Ml
👍4
Do you want a roadmap for becoming a data scientist in this channel?
Anonymous Poll
96%
Yes
4%
No
🤩64👍1👏1🎉1
Important Topics to become a data scientist
[Advanced Level]
👇👇

1. Mathematics

Linear Algebra
Analytic Geometry
Matrix
Vector Calculus
Optimization
Regression
Dimensionality Reduction
Density Estimation
Classification

2. Probability

Introduction to Probability
1D Random Variable
Function of One Random Variable
Joint Probability Distribution
Discrete Distribution
Normal Distribution

3. Statistics

Introduction to Statistics
Data Description
Random Samples
Sampling Distribution
Parameter Estimation
Hypothesis Testing
Regression

4. Programming

Python:

Python Basics
List
Set
Tuples
Dictionary
Function
NumPy
Pandas
Matplotlib/Seaborn

R Programming:

R Basics
Vector
List
Data Frame
Matrix
Array
Function
dplyr
ggplot2
Tidyr
Shiny

DataBase:
SQL
MongoDB

Data Structures

Web scraping

Linux

Git

5. Machine Learning

How Model Works
Basic Data Exploration
First ML Model
Model Validation
Underfitting & Overfitting
Random Forest
Handling Missing Values
Handling Categorical Variables
Pipelines
Cross-Validation(R)
XGBoost(Python|R)
Data Leakage

6. Deep Learning

Artificial Neural Network
Convolutional Neural Network
Recurrent Neural Network
TensorFlow
Keras
PyTorch
A Single Neuron
Deep Neural Network
Stochastic Gradient Descent
Overfitting and Underfitting
Dropout and Batch Normalization
Binary Classification

7. Feature Engineering

Baseline Model
Categorical Encodings
Feature Generation
Feature Selection

8. Natural Language Processing

Text Classification
Word Vectors

9. Data Visualization Tools

BI (Business Intelligence):
Tableau
Power BI
QlikView
Qlik Sense

10. Deployment

Microsoft Azure
Heroku
Google Cloud Platform
Flask
Django

Join @datasciencefun to learn important data science and machine learning concepts

ENJOY LEARNING 👍👍
👍307
Some of the essential libraries of Python that are used in Data Science

Numpy

SciPy

Pandas

Matplotlib

Keras

TensorFlow

Scikit-learn
👍14
Python Machine Learning Projects
👇👇
https://news.1rj.ru/str/Programming_experts/151
👍1🥰1
You don't need to buy a GPU for machine learning work!

There are other alternatives. Here are some:

1. Google Colab
2. Kaggle
3. Deepnote
4. AWS SageMaker
5. GCP Notebooks
6. Azure Notebooks
7. CoCalc
8. Binder
9. Saturn Cloud
10. Datalore
11. IBM Notebooks

Spend your time focusing on your problem.💪💪
👍13
1. What is Dimensionality Reduction?

In practice, machine learning models are built on features, and a dataset can have a very large number of them. Some features are irrelevant or redundant, and high-dimensional data is hard to visualize. Dimensionality reduction cuts the number of features down to a smaller set of principal variables that preserve most of the information in the original data. These variables are either a subset of the original features (feature selection) or new combinations of them (feature extraction).
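One common technique (picked here as an example, the post does not name one) is principal component analysis. Below is a rough sketch in plain Python that reduces made-up 2-D points to 1-D; it uses power iteration to find the first principal component, and every name and value in it is an assumption for illustration.

```python
import math

# Made-up 2-D points that lie close to the line y = 2x.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

n = len(data)
mean = [sum(p[j] for p in data) / n for j in range(2)]
centered = [[p[j] - mean[j] for j in range(2)] for p in data]

# 2x2 sample covariance matrix of the centered data.
cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(2)]
       for i in range(2)]

# Power iteration: repeatedly multiply by cov and normalize to
# converge on the leading eigenvector (first principal component).
v = [1.0, 0.0]
for _ in range(100):
    w = [cov[i][0] * v[0] + cov[i][1] * v[1] for i in range(2)]
    norm = math.hypot(w[0], w[1])
    v = [w[0] / norm, w[1] / norm]

# Project each 2-D point onto the component: 2 features become 1.
reduced = [r[0] * v[0] + r[1] * v[1] for r in centered]

print(v)
print(reduced)
```

The component ends up pointing roughly along y = 2x, so the single projected value keeps almost all of the spread in the data.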


2. What is a bin in Tableau?

Bins in Tableau are equal-sized containers that group the values of a field into ranges. For example, with a bin size of 10, ages 0 to 9 fall in one bin and ages 10 to 19 in the next. Grouping data this way makes it easier to view systematically, as in a histogram. Any discrete field in Tableau can also be considered a set of bins.
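The idea of equal-sized bins can be sketched in a few lines of Python. This only mimics the concept and is not how Tableau implements it; the helper name, the ages, and the bin size of 10 are made up for illustration.

```python
from collections import Counter

def bin_start(value, bin_size):
    # Lower edge of the equal-width bin that contains value.
    return (value // bin_size) * bin_size

ages = [3, 7, 12, 18, 21, 25, 29, 34]  # made-up values
counts = Counter(bin_start(a, 10) for a in ages)

for start in sorted(counts):
    print(f"[{start}, {start + 10}): {counts[start]}")
```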


3. What’s a Fourier transform?

A Fourier transform is a generic method to decompose a function into a superposition of sinusoidal functions. A more intuitive way to put it: given a smoothie, it’s how we find the recipe. The Fourier transform finds the set of cycle speeds, amplitudes, and phases that match any time signal. It converts a signal from the time domain to the frequency domain, which makes it a very common way to extract features from audio signals or other time series such as sensor data.
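As a rough illustration, here is a naive discrete Fourier transform in plain Python that recovers the frequency of a made-up sine wave. The signal, its length, and the function name are assumptions for the example; real projects would normally use an FFT library instead of this slow version.

```python
import cmath
import math

def dft(signal):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]

# Made-up signal: a sinusoid completing 3 cycles over 32 samples.
n = 32
signal = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]

spectrum = dft(signal)

# Only the first half of the spectrum is needed for a real-valued signal.
magnitudes = [abs(c) for c in spectrum[: n // 2]]
dominant = max(range(n // 2), key=lambda k: magnitudes[k])

print(dominant)
```

The strongest frequency bin matches the number of cycles in the signal, which is the "recipe" the transform recovers.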


4. What are a super key and a candidate key in SQL?

A super key is a single attribute or a combination of attributes that uniquely identifies a record in a table. A super key may contain extra attributes that are not needed to identify the records.

A candidate key is a minimal super key: it uniquely identifies records in a table, and removing any of its attributes would break that uniqueness. Unlike a super key, every attribute of a candidate key is necessary to identify the records.
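A small sketch of the distinction, checked against made-up rows. Real keys are constraints on the schema, so testing uniqueness on sample data only shows which column sets are consistent with that data; the table, column names, and helper are hypothetical.

```python
from itertools import combinations

# Made-up table: each dict is one row.
rows = [
    {"id": 1, "email": "a@x.org", "city": "Pune"},
    {"id": 2, "email": "b@x.org", "city": "Pune"},
    {"id": 3, "email": "c@x.org", "city": "Delhi"},
]
columns = ["id", "email", "city"]

def is_superkey(cols):
    # A super key's combined values are unique across all rows.
    seen = {tuple(row[c] for c in cols) for row in rows}
    return len(seen) == len(rows)

superkeys = [
    combo
    for size in range(1, len(columns) + 1)
    for combo in combinations(columns, size)
    if is_superkey(combo)
]

# A candidate key is a super key with no smaller super key inside it.
candidate_keys = [
    key for key in superkeys
    if not any(set(other) < set(key) for other in superkeys)
]

print(superkeys)
print(candidate_keys)
```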
👍74
You don't need to spend several $1000s to learn Data Science.

Stanford University, Harvard University & Massachusetts Institute of Technology are providing free courses.💥

Here are 8 free courses that'll teach you better than the paid ones:


1. CS50’s Introduction to Artificial Intelligence with Python (Harvard)

https://pll.harvard.edu/course/cs50s-introduction-artificial-intelligence-python

2. Data Science: Machine Learning (Harvard)

https://pll.harvard.edu/course/data-science-machine-learning

3. Artificial Intelligence (MIT)

https://lnkd.in/dG5BCPen

4. Introduction to Computational Thinking and Data Science (MIT)

https://lnkd.in/ddm5Ckk9

5. Machine Learning (MIT)

https://lnkd.in/dJEjStCw

6. Matrix Methods in Data Analysis, Signal Processing, and Machine Learning (MIT)

https://lnkd.in/dkpyt6qr

7. Statistical Learning (Stanford)

https://online.stanford.edu/courses/sohs-ystatslearning-statistical-learning

8. Mining Massive Data Sets (Stanford)

📍https://online.stanford.edu/courses/soe-ycs0007-mining-massive-data-sets

ENJOY LEARNING
👏8👍54🥰1
1. What does weight initialization mean in neural networks?

Weight initialization is one of the essential factors in training a neural network. A bad weight initialization can prevent a network from learning. A good weight initialization, on the other hand, gives quicker convergence and a lower overall error. Biases can usually be initialized to zero. The standard rule is to set the weights to small random values: close to zero, but not too small.
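One widely used scheme that follows this rule is Xavier (Glorot) uniform initialization. The post does not name a scheme, so treat this as one possible choice; the layer sizes and function name below are made up for illustration.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    # Glorot/Xavier uniform: sample from [-limit, limit],
    # where limit = sqrt(6 / (fan_in + fan_out)).
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights = xavier_uniform(fan_in=4, fan_out=3, rng=rng)
biases = [0.0] * 3      # biases can start at zero
limit = math.sqrt(6.0 / (4 + 3))

print(weights)
```

The limit shrinks as the layer gets wider, which keeps the weights small without letting them collapse to zero.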

2. What is Cross-validation in Machine Learning?

Cross-validation is a technique for estimating how well a machine learning model will perform on unseen data. The dataset is split into smaller parts with roughly the same number of rows; one part is held out as the test set and the rest are used as the training set. This is usually repeated so that each part serves as the test set once. Cross-validation includes the following techniques:
• Holdout method
• K-fold cross-validation
• Stratified k-fold cross-validation
• Leave p-out cross-validation
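As an illustration of the k-fold variant, here is a minimal index splitter in plain Python. The function name, sample count, and seed are assumptions for the example; libraries such as scikit-learn ship ready-made versions.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Split as evenly as possible; the first n_samples % k folds get one extra item.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))

for train_idx, test_idx in folds:
    print(sorted(test_idx), len(train_idx))
```

Each sample lands in a test fold exactly once, so averaging the scores over all folds uses every row for evaluation.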

3. What is a Self-Join?

A self-join is a join in which a table is joined with itself. As a result, it represents a unary relationship. The table is referenced twice under different aliases, and each row is matched against the rows of the same table that satisfy the join condition. A self-join is mostly used to combine and compare rows from the same database table.
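A quick sketch with Python's built-in sqlite3 module, pairing each employee with their manager from the same table. The table, columns, and names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Asha", None), (2, "Ravi", 1), (3, "Meera", 1)],
)

# The same table appears twice under two aliases: e (employee) and m (manager).
pairs = conn.execute(
    "SELECT e.name, m.name FROM employees AS e "
    "JOIN employees AS m ON e.manager_id = m.id ORDER BY e.name"
).fetchall()

print(pairs)
conn.close()
```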

4. What are the types of views in SQL?

In SQL, views are commonly classified into four types:
Simple View: A view that is based on a single table and does not have a GROUP BY clause or other features.
Complex View: A view that is built from several tables and includes a GROUP BY clause as well as functions.
Inline View: A view that is built on a subquery in the FROM clause, which provides a temporary table and simplifies a complicated query.
Materialized View: A view that stores both the query definition and the query results. The data is physically saved, so it has to be refreshed when the underlying tables change.
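The first three types can be tried out with Python's built-in sqlite3 module, as sketched below. SQLite has no materialized views, so that type is left out. All table, view, and column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER, name TEXT);
CREATE TABLE employees (name TEXT, dept_id INTEGER, salary REAL);
INSERT INTO departments VALUES (1, 'eng'), (2, 'ops');
INSERT INTO employees VALUES ('asha', 1, 90), ('ravi', 1, 110), ('meera', 2, 70);

-- Simple view: one table, no grouping.
CREATE VIEW high_paid AS
    SELECT name, salary FROM employees WHERE salary >= 90;

-- Complex view: joins two tables and aggregates with GROUP BY.
CREATE VIEW dept_totals AS
    SELECT d.name AS dept, SUM(e.salary) AS total
    FROM employees AS e JOIN departments AS d ON e.dept_id = d.id
    GROUP BY d.name;
""")

simple = conn.execute("SELECT name FROM high_paid ORDER BY name").fetchall()
complex_rows = conn.execute(
    "SELECT dept, total FROM dept_totals ORDER BY dept"
).fetchall()

# Inline view: a subquery in the FROM clause, used like a temporary table.
inline = conn.execute(
    "SELECT MAX(total) FROM (SELECT dept_id, SUM(salary) AS total "
    "FROM employees GROUP BY dept_id)"
).fetchone()[0]

print(simple, complex_rows, inline)
conn.close()
```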
👍7
1. What do you understand by the term silhouette coefficient?

The silhouette coefficient measures how well a data point fits its assigned cluster: how similar it is to the points in its own cluster, and how dissimilar it is to the points in other clusters. For a point with mean distance a to the other points in its cluster and mean distance b to the points of the nearest other cluster, the coefficient is (b - a) / max(a, b). It ranges from -1 to 1, with 1 being the best possible score and -1 being the worst possible score.
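A minimal sketch of the calculation in plain Python, using the usual definition (b - a) / max(a, b), where a is the mean distance to the point's own cluster and b is the mean distance to the nearest other cluster. The points and labels are made up, and the helper assumes every cluster has at least two points.

```python
import math

def silhouette(point_index, points, labels):
    p, own = points[point_index], labels[point_index]
    # a: mean distance to the other points in the same cluster.
    same = [math.dist(p, q) for i, q in enumerate(points)
            if labels[i] == own and i != point_index]
    a = sum(same) / len(same)
    # b: smallest mean distance to the points of any other cluster.
    b = min(
        sum(math.dist(p, q) for i, q in enumerate(points) if labels[i] == other)
        / labels.count(other)
        for other in set(labels) - {own}
    )
    return (b - a) / max(a, b)

# Two tight, well-separated made-up clusters.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
labels = [0, 0, 1, 1]
scores = [silhouette(i, points, labels) for i in range(len(points))]

print(scores)
```

Because the two clusters are far apart and compact, every score comes out close to 1.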


2. What is the difference between trend and seasonality in time series?

Trends and seasonality are two characteristics of time series metrics that break many models. Trends are continuous increases or decreases in a metric’s value. Seasonality, on the other hand, reflects periodic (cyclical) patterns that occur in a system, usually rising above a baseline and then decreasing again.
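To make the two ideas concrete, the sketch below builds a made-up series from a linear trend plus a repeating pattern, then separates them with a moving average. The slope, the seasonal values, and the period of 4 are all assumptions for illustration.

```python
period = 4
seasonal_pattern = [2.0, -1.0, -2.0, 1.0]  # sums to zero over one period

# Series = upward trend (0.5 per step) + repeating seasonal pattern.
series = [0.5 * t + seasonal_pattern[t % period] for t in range(24)]

# Averaging over one full period cancels the seasonal component,
# leaving only the trend.
moving_avg = [sum(series[i:i + period]) / period
              for i in range(len(series) - period + 1)]

# Consecutive differences of the smoothed series recover the trend slope.
slopes = [b - a for a, b in zip(moving_avg, moving_avg[1:])]

print(series[:8])
print(moving_avg[:4])
```

The raw series zigzags because of seasonality, while the smoothed version rises steadily, which is the trend.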


3. What is Bag of Words in NLP?

Bag of Words is a commonly used model that relies on word frequencies or occurrences to train a classifier. It creates an occurrence matrix for documents or sentences, ignoring their grammatical structure and word order.
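Here is a tiny sketch of such an occurrence matrix in plain Python. The two documents are made up, and the whitespace tokenizer is a simplifying assumption.

```python
from collections import Counter

docs = ["the cat sat", "the dog sat on the cat"]  # made-up documents

tokenized = [doc.lower().split() for doc in docs]
vocabulary = sorted({word for tokens in tokenized for word in tokens})

# One row per document, one column per vocabulary word; word order is discarded.
matrix = [[Counter(tokens)[word] for word in vocabulary] for tokens in tokenized]

print(vocabulary)
for row in matrix:
    print(row)
```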


4. What is the difference between bagging and boosting?

Bagging is an ensemble of homogeneous weak learners that are trained independently of each other in parallel, with their outputs combined into an average. Boosting also uses homogeneous weak learners but works differently from bagging: the learners are trained sequentially and adaptively, each one improving on the predictions of the previous ones.

ENJOY LEARNING 👍👍
👍4