Useful Pandas🐼 methods you should definitely know
✅ head()
✅ info()
✅ fillna()
✅ melt()
✅ pivot()
✅ query()
✅ merge()
✅ assign()
✅ groupby()
✅ describe()
✅ sample()
✅ replace()
✅ rename()
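A minimal sketch of several of the methods above on a small DataFrame. The column names and values are invented for illustration and are not from any real dataset.

```python
import pandas as pd

# Hypothetical sales data, invented for this example.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi"],
    "year": [2022, 2022, 2023, 2023],
    "sales": [100.0, None, 150.0, 200.0],
})

# fillna(): replace the missing sales value with 0.
df = df.fillna({"sales": 0.0})

# assign(): add a derived column and return a new DataFrame.
df = df.assign(sales_k=lambda d: d["sales"] / 1000)

# rename(): give the new column a clearer name.
df = df.rename(columns={"sales_k": "sales_thousands"})

# query(): filter rows with a string expression.
recent = df.query("year == 2023")

# groupby(): total sales per city.
totals = df.groupby("city")["sales"].sum()

# pivot(): one row per city, one column per year.
wide = df.pivot(index="city", columns="year", values="sales")

# melt(): go back from wide to long format.
long = wide.reset_index().melt(id_vars="city", var_name="year", value_name="sales")

print(df.head())
print(totals)
```

head(), info(), describe() and sample() are best tried interactively, since their output is meant to be read rather than processed.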
Data Analyst Interview Questions
[Python, SQL, PowerBI]
1. Is indentation required in Python?
Ans: Yes, indentation is required in Python. It defines a block of code: the body of a loop, class, function, or conditional is the set of lines indented beneath it. The convention is four spaces per level. If the code is not indented correctly, Python raises an IndentationError and the program does not run.
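A small sketch of both cases: a correctly indented function, and a snippet with a missing indent that Python rejects. The function name and the snippet are made up for illustration.

```python
def classify(n):
    # The indented lines below form the body of the function.
    if n % 2 == 0:
        return "even"
    return "odd"

# Source text with a missing indent after "if"; compile() rejects it.
bad_source = "if True:\nprint('hello')\n"
try:
    compile(bad_source, "<example>", "exec")
    error_name = None
except IndentationError as exc:
    error_name = type(exc).__name__

print(classify(4), error_name)
```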
2. What are Entities and Relationships?
Ans:
Entity: An entity can be a real-world object that can be easily identifiable. For example, in a college database, students, professors, workers, departments, and projects can be referred to as entities.
Relationships: Associations or links between entities that have something to do with each other. For example, the employee table in a company's database can be associated with the salary table in the same database.
3. What are Aggregate and Scalar functions?
Ans: An aggregate function performs an operation on a collection of values and returns a single value, for example SUM(), AVG() or COUNT(). Aggregate functions are often used with the GROUP BY and HAVING clauses of the SELECT statement. A scalar function returns a single value for each input value, for example UPPER() or ROUND() applied to one row at a time.
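A rough sketch of the difference, using Python's built-in sqlite3 module with an in-memory table. The table name, columns and rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("asha", "sales", 50.0), ("ravi", "sales", 70.0), ("meera", "hr", 60.0)],
)

# Aggregate: AVG() collapses each department's rows into one value.
avg_by_dept = conn.execute(
    "SELECT dept, AVG(salary) FROM employees "
    "GROUP BY dept HAVING AVG(salary) >= 60 ORDER BY dept"
).fetchall()

# Scalar: UPPER() returns one value for each input row.
upper_names = conn.execute(
    "SELECT UPPER(name) FROM employees ORDER BY name"
).fetchall()
conn.close()

print(avg_by_dept)
print(upper_names)
```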
4. What are Custom Visuals in Power BI?
Ans: Custom Visuals are visualizations that work like the built-in ones in Power BI. The difference is that developers build them with the Power BI custom visuals SDK, using languages such as JavaScript and TypeScript, often together with libraries like jQuery.
ENJOY LEARNING 👍👍
Harvard CS109A #DataScience course materials — huge collection free & open!
1. Lecture notes
2. R code, #Python notebooks
3. Lab material
4. Advanced sections
and more ...
https://harvard-iacs.github.io/2019-CS109A/pages/materials.html
Which of the following commands isn't used in pandas?
Anonymous Quiz
head(): 6%
replace(): 4%
groupby(): 8%
rename(): 4%
datasciencefun(): 78%
Forwarded from Jobs | Internships | Placement | Interviews
American Express is hiring
Position: Data Science Analyst
👉 Apply: https://aexp.eightfold.ai/careers/job/13347327
👍 All the best.
Forwarded from Jobs | Internships | Placement | Interviews
Amazon is hiring Data Scientist Intern!
Qualifications: Bachelor's/ Master's Degree
Salary: 5.4 LPA (Expected)
Batch: 2019/2020/2021/2022/2023
Experience: Freshers
Location: Bangalore, India
📌Apply Link: https://www.amazon.jobs/en/jobs/2213292/data-scientist-intern
Important Topics to become a data scientist
[Advanced Level]
👇👇
1. Mathematics
Linear Algebra
Analytic Geometry
Matrix
Vector Calculus
Optimization
Regression
Dimensionality Reduction
Density Estimation
Classification
2. Probability
Introduction to Probability
1D Random Variable
The function of One Random Variable
Joint Probability Distribution
Discrete Distribution
Normal Distribution
3. Statistics
Introduction to Statistics
Data Description
Random Samples
Sampling Distribution
Parameter Estimation
Hypotheses Testing
Regression
4. Programming
Python:
Python Basics
List
Set
Tuples
Dictionary
Function
NumPy
Pandas
Matplotlib/Seaborn
R Programming:
R Basics
Vector
List
Data Frame
Matrix
Array
Function
dplyr
ggplot2
Tidyr
Shiny
DataBase:
SQL
MongoDB
Data Structures
Web scraping
Linux
Git
5. Machine Learning
How Model Works
Basic Data Exploration
First ML Model
Model Validation
Underfitting & Overfitting
Random Forest
Handling Missing Values
Handling Categorical Variables
Pipelines
Cross-Validation(R)
XGBoost(Python|R)
Data Leakage
6. Deep Learning
Artificial Neural Network
Convolutional Neural Network
Recurrent Neural Network
TensorFlow
Keras
PyTorch
A Single Neuron
Deep Neural Network
Stochastic Gradient Descent
Overfitting and Underfitting
Dropout and Batch Normalization
Binary Classification
7. Feature Engineering
Baseline Model
Categorical Encodings
Feature Generation
Feature Selection
8. Natural Language Processing
Text Classification
Word Vectors
9. Data Visualization Tools
BI (Business Intelligence):
Tableau
Power BI
Qlik View
Qlik Sense
10. Deployment
Microsoft Azure
Heroku
Google Cloud Platform
Flask
Django
Join @datasciencefun to learn important data science and machine learning concepts
ENJOY LEARNING 👍👍
Some of the essential libraries of Python that are used in Data Science
Numpy
SciPy
Pandas
Matplotlib
Keras
TensorFlow
Scikit-learn
You don't need to buy a GPU for machine learning work!
There are other alternatives. Here are some:
1. Google Colab
2. Kaggle
3. Deepnote
4. AWS SageMaker
5. GCP Notebooks
6. Azure Notebooks
7. CoCalc
8. Binder
9. Saturn Cloud
10. Datalore
11. IBM Notebooks
Spend your time focusing on your problem.💪💪
1. What is Dimensionality Reduction?
In the real world, Machine Learning models are built on top of features and parameters. These features can be large in number, and some may be irrelevant or redundant, which also makes the data hard to visualize. Dimensionality reduction cuts down the number of features by working with a smaller set of principal variables, either selected from the original features or derived from them, that retain most of the information in the original data.
2. What is a bin in Tableau?
Bins in Tableau are equal-sized containers that group the values of a field into ranges of the chosen bin size. In other words, bins divide the data into equal-width intervals, which makes it easier to view the distribution of the data systematically. Any discrete field in Tableau can also be thought of as a set of bins.
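The idea of equal-width bins can be sketched in plain Python. This is not Tableau's implementation, only an illustration of the grouping; the function name and the sample ages are made up.

```python
def bin_values(values, bin_size):
    """Group values into equal-width bins keyed by each bin's lower edge."""
    bins = {}
    for v in values:
        lower = (v // bin_size) * bin_size
        bins.setdefault(lower, []).append(v)
    return dict(sorted(bins.items()))

ages = [3, 12, 17, 25, 29, 41]
binned = bin_values(ages, 10)
print(binned)
```

With a bin size of 10, each key is the lower edge of a range such as 10 to 19, and empty ranges simply do not appear.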
3. What's a Fourier transform?
A Fourier transform is a general method for decomposing a function into a superposition of sinusoids. As one intuitive explanation puts it: given a smoothie, it's how we find the recipe. The Fourier transform finds the set of cycle speeds, amplitudes, and phases that together match a time signal. It converts a signal from the time domain to the frequency domain, which makes it a very common way to extract features from audio signals or other time series such as sensor data.
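As a rough illustration, the sketch below implements a naive discrete Fourier transform with the standard library and recovers the frequency of a sampled sine wave. The sampling rate and the 2 Hz test signal are assumptions chosen to keep the numbers small; real code would normally use an FFT library instead.

```python
import cmath
import math

def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]

# One second sampled at 8 Hz: a pure 2 Hz sine wave.
n = 8
signal = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
magnitudes = [abs(c) for c in dft(signal)]

# Only the first half is inspected; for real signals the second half mirrors it.
dominant = max(range(n // 2), key=lambda k: magnitudes[k])
print(dominant)
```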
4. What are super key and candidate key in SQL?
A super key is a single attribute or a combination of attributes that uniquely identifies a record in a table. A super key may include attributes that are not strictly needed to identify the record.
A candidate key is a minimal super key: every attribute in it is necessary, so removing any one of them means the remaining attributes no longer identify records uniquely. A table can have several candidate keys, and one of them is chosen as the primary key.
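The distinction can be sketched by brute force over a tiny table. The helper names and the sample rows are hypothetical, and the check only reflects this sample data; in a real schema, keys are defined by the design, not inferred from the current rows.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if the attribute combination is unique across all rows."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows, columns):
    """Superkeys that contain no smaller superkey (minimal superkeys)."""
    keys = []
    for size in range(1, len(columns) + 1):
        for attrs in combinations(columns, size):
            contains_smaller_key = any(set(k) <= set(attrs) for k in keys)
            if not contains_smaller_key and is_superkey(rows, attrs):
                keys.append(attrs)
    return keys

students = [
    {"id": 1, "email": "a@x.org", "name": "Sam"},
    {"id": 2, "email": "b@x.org", "name": "Sam"},
    {"id": 3, "email": "c@x.org", "name": "Lee"},
]
keys = candidate_keys(students, ["id", "email", "name"])
print(keys)
```

Here ("id", "name") is a super key but not a candidate key, because "id" alone already identifies each row.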
You don't need to spend several $1000s to learn Data Science.❌
Stanford University, Harvard University & Massachusetts Institute of Technology are providing free courses.💥
Here are 8 free courses that'll teach you better than the paid ones:
1. CS50’s Introduction to Artificial Intelligence with Python (Harvard)
https://pll.harvard.edu/course/cs50s-introduction-artificial-intelligence-python
2. Data Science: Machine Learning (Harvard)
https://pll.harvard.edu/course/data-science-machine-learning
3. Artificial Intelligence (MIT)
https://lnkd.in/dG5BCPen
4. Introduction to Computational Thinking and Data Science (MIT)
https://lnkd.in/ddm5Ckk9
5. Machine Learning (MIT)
https://lnkd.in/dJEjStCw
6. Matrix Methods in Data Analysis, Signal Processing, and Machine Learning (MIT)
https://lnkd.in/dkpyt6qr
7. Statistical Learning (Stanford)
https://online.stanford.edu/courses/sohs-ystatslearning-statistical-learning
8. Mining Massive Data Sets (Stanford)
📍https://online.stanford.edu/courses/soe-ycs0007-mining-massive-data-sets
ENJOY LEARNING
1. What is the meaning of the term weight initialization in neural networks?
In neural networks, weight initialization is one of the essential factors. A bad weight initialization can prevent a network from learning, while a good one gives quicker convergence and a lower overall error. Biases can be initialized to zero. The standard rule is to set the weights to small random values close to zero, without making them so small that the signal vanishes.
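One common scheme that follows this rule is Xavier (Glorot) uniform initialization. The text above does not name a specific scheme, so treat this as one illustrative choice; the layer sizes and the function name are assumptions.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, seed=0):
    """Sample a fan_in x fan_out weight matrix from U(-limit, limit)."""
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

weights = xavier_uniform(fan_in=64, fan_out=32)
biases = [0.0] * 32  # biases can start at zero
limit = math.sqrt(6.0 / (64 + 32))  # 0.25 for this layer size
print(limit)
```

The limit shrinks as the layer gets wider, which keeps the weights close to zero without letting them collapse to zero.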
2. What is Cross-validation in Machine Learning?
Cross-validation is a resampling technique for estimating how well a Machine Learning model will perform on unseen data. The dataset is split into smaller parts of roughly equal size; each part in turn is held out as the test set while the remaining parts are used as the training set. Cross-validation includes the following techniques:
• Holdout method
• K-fold cross-validation
• Stratified k-fold cross-validation
• Leave p-out cross-validation
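A minimal sketch of how k-fold cross-validation splits a dataset, written with the standard library only. The function name is made up; libraries such as scikit-learn provide ready-made splitters for real work.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Slicing with a stride of k gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(n_samples=10, k=5))
for train, test in splits:
    print(sorted(test))
```

Every sample lands in exactly one test fold, so each row is used for testing once and for training k - 1 times.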
3. What is a Self-Join?
A self-join is a join in which a table is joined with itself, which makes it a unary relationship. The table is referenced twice under different aliases, and each row is matched against the other rows of the same table according to the join condition. A self-join is mostly used to combine and compare rows from the same database table.
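A small sketch of a self-join using Python's built-in sqlite3 module. The staff table, its columns and the rows are assumptions invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (id INTEGER, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [(1, "Anita", None), (2, "Bilal", 1), (3, "Chen", 1)],
)

# The same table appears twice under two aliases: e (employee) and m (manager).
pairs = conn.execute(
    "SELECT e.name, m.name FROM staff AS e "
    "JOIN staff AS m ON e.manager_id = m.id ORDER BY e.name"
).fetchall()
conn.close()

print(pairs)
```

Anita has no manager, so the inner join leaves her out of the employee column.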
4. What are the types of views in SQL?
In SQL, views are commonly classified into four types:
Simple View: A view that is based on a single table and does not use a GROUP BY clause, joins, or aggregate functions.
Complex View: A view that is built from several tables and can include a GROUP BY clause as well as functions.
Inline View: A subquery in the FROM clause that acts as a temporary table and simplifies a complicated query.
Materialized View: A view that stores both the view definition and the query results. It keeps a physical copy of the data, which has to be refreshed when the underlying tables change.
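A rough sketch of a simple view and an inline view, using Python's built-in sqlite3 module. SQLite has no materialized views, so that type is not shown; the orders table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "asha", 20.0), (2, "asha", 30.0), (3, "ravi", 10.0)],
)

# Simple view: one table, no grouping.
conn.execute(
    "CREATE VIEW big_orders AS SELECT id, amount FROM orders WHERE amount >= 20"
)
simple = conn.execute("SELECT id FROM big_orders ORDER BY id").fetchall()

# Inline view: a subquery in the FROM clause, here with GROUP BY inside it.
inline = conn.execute(
    "SELECT customer FROM "
    "(SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer) "
    "WHERE total > 25"
).fetchall()
conn.close()

print(simple, inline)
```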
1. What do you understand by the term silhouette coefficient?
The silhouette coefficient is a measure of how well clustered together a data point is with respect to the other points in its cluster. It is a measure of how similar a point is to the points in its own cluster, and how dissimilar it is to the points in other clusters. The silhouette coefficient ranges from -1 to 1, with 1 being the best possible score and -1 being the worst possible score.
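For a single point the coefficient is (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the mean distance to the nearest other cluster. The sketch below computes it for one-dimensional points; the function name and sample clusters are made up, and it assumes the point's cluster has at least one other member.

```python
def silhouette(point, own_cluster_others, other_clusters):
    """Silhouette coefficient of one 1-D point: (b - a) / max(a, b)."""
    def mean_distance(p, cluster):
        return sum(abs(p - q) for q in cluster) / len(cluster)

    # a: mean distance to the other members of the point's own cluster.
    a = mean_distance(point, own_cluster_others)
    # b: mean distance to the nearest other cluster.
    b = min(mean_distance(point, c) for c in other_clusters)
    return (b - a) / max(a, b)

# The point 2.0 belongs to the cluster {1.0, 2.0, 3.0}.
score = silhouette(2.0, [1.0, 3.0], [[10.0, 11.0, 12.0]])
print(round(score, 3))
```

Here a is 1 and b is 9, so the score is 8/9, close to the best possible value of 1.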
2. What is the difference between trend and seasonality in time series?
Trends and seasonality are two characteristics of time series metrics that break many models. Trends are continuous increases or decreases in a metric’s value. Seasonality, on the other hand, reflects periodic (cyclical) patterns that occur in a system, usually rising above a baseline and then decreasing again.
3. What is Bag of Words in NLP?
Bag of Words is a commonly used model that relies on word frequencies or occurrences to train a classifier. It builds an occurrence matrix for documents or sentences, irrespective of their grammatical structure or word order.
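A minimal sketch of the occurrence matrix, using only the standard library. The tokenizer is a plain lowercase split on whitespace, which is an assumption made for brevity, and the two sample sentences are invented.

```python
from collections import Counter

def bag_of_words(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts word j in document i."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix

docs = ["the cat sat", "the dog sat on the cat"]
vocab, matrix = bag_of_words(docs)
print(vocab)
print(matrix)
```

Each row of the matrix can then be fed to a classifier as a feature vector; the word order of the original sentence is gone.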
4. What is the difference between bagging and boosting?
Bagging trains homogeneous weak learners independently and in parallel, each on a bootstrap sample of the data, and combines their outputs by averaging or voting. Boosting also uses homogeneous weak learners but works differently: the learners are trained sequentially and adaptively, with each one focusing on the errors made by the previous ones to improve the model's predictions.
ENJOY LEARNING 👍👍