Commonly used Python libraries are:
👉🏻NumPy: This library is used for scientific computing and working with arrays of data. It provides fast array operations, including element-wise mathematics, linear algebra, and random number generation.
👉🏻Pandas: This library is used for data manipulation and analysis. It provides tools for importing, cleaning, and transforming data, as well as tools for working with time series data and performing statistical analysis.
👉🏻Matplotlib: This library is used for data visualization. It provides functions for creating a wide range of plots, including scatter plots, line plots, bar plots, and histograms.
👉🏻Scikit-learn: This library is used for machine learning. It provides a range of algorithms for classification, regression, clustering, and dimensionality reduction, as well as tools for model evaluation and selection.
👉🏻TensorFlow: This library is used for deep learning. It provides a range of tools and libraries for building and training neural networks, including support for distributed training and hardware acceleration.
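A minimal sketch of how the first four of these libraries typically fit together; the data below is synthetic, purely for illustration:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)                      # NumPy: random number generation
x = rng.uniform(0, 10, size=100)
y = 3 * x + rng.normal(scale=2, size=100)

df = pd.DataFrame({"x": x, "y": y})                 # Pandas: tabular data handling
model = LinearRegression().fit(df[["x"]], df["y"])  # scikit-learn: model fitting

df.plot.scatter(x="x", y="y")                       # Matplotlib (via pandas): scatter plot
plt.plot(df["x"], model.predict(df[["x"]]), color="red")
plt.show()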
1. Compare SVM and Logistic Regression in handling outliers
For Logistic Regression, outliers can have an unusually large effect on the estimated coefficients, pulling the fitted linear boundary towards them. In practice their influence is usually reduced by detecting and handling the outliers or by regularizing the coefficients; the sigmoid function is an inherent part of the model rather than an outlier remedy.
For SVM, outliers can make the decision boundary deviate severely from the optimal hyperplane. One way for SVM to get around the problem is to introduce slack variables. There is a penalty for using slack variables, and how the SVM handles outliers depends on how heavily that penalty (the C parameter) is weighted.
2. Can you explain how to implement a simple kNN algorithm in code?
The kNN algorithm can be used for a variety of tasks (classification & regression). To implement it, you will need to first calculate the distance between the new data point and all of the training data points. Once you have the distances, you will then need to find the k nearest neighbors and take the majority vote of those neighbors to determine the class of the new data point.
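A minimal from-scratch sketch of that procedure for classification, using Euclidean distance and a majority vote (the toy data is made up for illustration):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # 1) distance from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # 2) indices of the k nearest neighbours
    nearest = np.argsort(distances)[:k]
    # 3) majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1, 1], [1, 2], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5]), k=3))   # -> 0

For regression, the majority vote would simply be replaced by the mean of the k neighbours' target values.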
3. What type of node is considered Pure in the decision tree?
If the Gini Index of the data is 0 then it means that all the elements belong to a specific class. When this happens it is said to be pure.
When all of the data belongs to a single class (pure) then the leaf node is reached in the tree.
The leaf node represents the class label in the tree (which means that it gives the final output).
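A small sketch of the Gini impurity calculation behind this idea; a node containing a single class gives 0 (pure):

import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini([1, 1, 1, 1]))   # 0.0 -> pure node
print(gini([0, 0, 1, 1]))   # 0.5 -> maximally impure for two classes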
4. What are the Space and Time Complexities of the Hierarchical Clustering Algorithm?
Space complexity: Hierarchical clustering requires a lot of memory when the number of observations is large, because the similarity (proximity) matrix has to be held in RAM. The space complexity is therefore on the order of the square of n. Space complexity = O(n²), where n is the number of observations.
Time complexity: We have to perform n iterations, and in each iteration we need to search and update the proximity matrix, so the time complexity is also very high, on the order of the cube of n. Time complexity = O(n³), where n is the number of observations.
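A hedged SciPy sketch of agglomerative (hierarchical) clustering on random data; the pairwise-distance information it maintains internally is what drives the O(n²) memory footprint:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(50, 2)                        # 50 observations, 2 features
Z = linkage(X, method="ward")                    # builds the dendrogram bottom-up
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the dendrogram into 3 clusters
print(labels[:10])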
ENJOY LEARNING 👍👍
How to start learning Data Science?
There are many resources available to help you start learning data science, depending on your background and goals.
Here are a few steps you can take:
Develop a strong understanding of the basics of statistics and programming.
Learn Python or R; both languages are popular among data scientists.
Learn the basics of data manipulation and visualization with tools such as pandas and Matplotlib (a short sketch follows this list).
Learn the basics of machine learning, such as linear regression and k-nearest neighbors, and practice applying them to real-world datasets.
Take online courses and tutorials, such as those offered by Coursera, edX, and DataCamp.
Practice by working on projects and participating in online data science competitions.
Get familiar with popular data science libraries such as NumPy, scikit-learn, TensorFlow, Keras, and PyTorch.
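As a starting point, here is a minimal sketch of the pandas/Matplotlib workflow mentioned above; the file name sales.csv and the month/revenue columns are hypothetical placeholders:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv")                        # import
df = df.dropna(subset=["revenue"])                   # clean: drop rows missing the target column
monthly = df.groupby("month")["revenue"].sum()       # transform/aggregate

monthly.plot(kind="bar", title="Revenue by month")   # visualize
plt.show()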
It's a good idea to start with a solid foundation in statistics and programming, and then build on that foundation by learning the specific tools and techniques used in data science. As you gain experience, you can start working on more complex projects and exploring specialized areas of the field.
1. What is DBSCAN Clustering?
DBSCAN groups ‘densely grouped’ data points into a single cluster. It can identify clusters in large spatial datasets by looking at the local density of the data points. The most exciting feature of DBSCAN clustering is that it is robust to outliers. It also does not require the number of clusters to be told beforehand, unlike K-Means, where we have to specify the number of centroids.
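A short scikit-learn sketch: DBSCAN needs a neighbourhood radius (eps) and a minimum point count, but not the number of clusters, and points it cannot assign are labelled -1 (noise/outliers):

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print(set(labels))   # e.g. {0, 1}, plus -1 if any noise points are found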
2. What are the different forms of joins in a table?
SQL supports several kinds of joins, including INNER JOIN, SELF JOIN, CROSS JOIN, and OUTER JOIN. Each join type defines the way two tables are related in a query. OUTER JOINs can further be divided into LEFT OUTER JOINs, RIGHT OUTER JOINs, and FULL OUTER JOINs.
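Since the examples in this channel are in Python, here is a hedged illustration of the same join types using pandas.merge on two tiny made-up tables:

import pandas as pd

customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cat"]})
orders = pd.DataFrame({"customer_id": [1, 1, 4], "amount": [50, 30, 20]})

inner = customers.merge(orders, left_on="id", right_on="customer_id", how="inner")
left  = customers.merge(orders, left_on="id", right_on="customer_id", how="left")
outer = customers.merge(orders, left_on="id", right_on="customer_id", how="outer")
cross = customers.merge(orders, how="cross")     # every customer paired with every order
print(len(inner), len(left), len(outer), len(cross))   # 2 4 5 9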
3. How is grid search different from random search?
Hyperparameter tuning is very useful for improving the performance of a machine learning model. The difference between the two approaches is that in grid search we define a grid of hyperparameter combinations and train the model on every one of them, whereas in randomized search (RandomizedSearchCV) a fixed number of combinations is sampled at random from the specified ranges. Both are effective ways of tuning the hyperparameters that improve model generalizability.
Random search tries random combinations of the hyperparameters to find the best solution for the model being built. Its drawback is higher variance from run to run: because the combinations are sampled completely at random, with no intelligence guiding the sampling, luck plays a role in whether the best region of the search space is explored.
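A hedged scikit-learn sketch contrasting the two approaches on a small, made-up parameter grid:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}

grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)        # trains all 3 x 3 = 9 combinations

rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_grid,
                          n_iter=4, cv=3, random_state=0)
rand.fit(X, y)        # samples only 4 combinations at random
print(grid.best_params_, rand.best_params_)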
4. How should you maintain a deployed model?
A deployed model needs to be retrained periodically to keep its performance up. After deployment, keep track of the predictions the model makes alongside the ground-truth values as they become available; this data can later be used to retrain the model on fresh data. Root cause analysis should also be carried out for wrong predictions.
1. How does a Decision Tree handle continuous (numerical) features?
Ans.
Decision Trees handle continuous features by converting these continuous features to a threshold-based boolean feature.
To decide the threshold value, we use the concept of information gain, choosing the threshold that maximizes the information gain.
2. What are Loss Function and Cost Functions?
Ans. The loss function captures the difference between the actual and predicted values for a single record, whereas the cost function aggregates that difference over the entire training dataset.
The most commonly used loss functions are mean squared error and hinge loss.
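A tiny sketch of the distinction using squared error, where the loss is computed per record and the cost averages those losses over the whole (made-up) training set:

import numpy as np

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.5, 1.0])

losses = (y_true - y_pred) ** 2    # one squared-error loss per record
cost = losses.mean()               # mean squared error over the dataset
print(losses, cost)                # [0.25 0.25 1.  ] 0.5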
3. What is the difference between Python Arrays and lists?
Ans. Arrays in Python (the built-in array module) can only contain elements of the same data type, i.e., the array must be homogeneous. An array is a thin wrapper around C arrays and consumes far less memory than a list.
Lists in Python can contain elements of different data types, i.e., a list can be heterogeneous. The trade-off is that lists consume more memory.
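A quick sketch with the built-in array module: the array enforces a single element type, while the list accepts anything:

import array

nums = array.array("i", [1, 2, 3])    # "i" = signed int; elements must be homogeneous
try:
    nums.append("four")               # not an int -> TypeError
except TypeError as err:
    print("array rejected it:", err)

mixed = [1, "two", 3.0]               # a list happily mixes types
print(mixed)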
4. What is root cause analysis? What is a causation vs. a correlation?
Ans. Root cause analysis is a problem-solving method used to identify the root cause(s) of a problem or failure.
Correlation measures the strength of the relationship between two variables and ranges from -1 to 1. Causation means that one event directly brings about another. Causation essentially describes a direct relationship, while correlation can reflect both direct and indirect relationships.
ENJOY LEARNING 👍👍
1. Explain the difference between L1 and L2 regularization.
Answer: L2 regularization tends to spread the penalty across all the weights, shrinking them smoothly towards zero, while L1 is more binary/sparse, driving many weights to exactly zero and keeping only the most informative ones. L1 corresponds to placing a Laplace prior on the coefficients, while L2 corresponds to a Gaussian prior.
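A hedged scikit-learn sketch on synthetic data: Lasso (L1) zeroes out the uninformative coefficients, while Ridge (L2) shrinks them all but keeps them non-zero:

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)   # only 2 useful features

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print("L1 coefficients:", np.round(lasso.coef_, 2))   # mostly exact zeros
print("L2 coefficients:", np.round(ridge.coef_, 2))   # small but non-zero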
2. What is deep learning, and how does it contrast with other machine learning algorithms?
Answer: Deep learning is a subset of machine learning concerned with neural networks: it uses backpropagation and certain principles inspired by neuroscience to model large sets of labelled, unlabelled, or semi-structured data more accurately. In contrast to classical algorithms that rely on hand-crafted features, deep learning learns layered representations of the data directly through neural nets, and it can be applied in supervised, unsupervised, and reinforcement learning settings.
3. Name an example where ensemble techniques might be useful.
Answer: Ensemble techniques combine multiple learning algorithms to achieve better predictive performance. They typically reduce overfitting and make the model more robust (unlikely to be influenced by small changes in the training data).
You could list some examples of ensemble methods (bagging, boosting, the “bucket of models” method) and demonstrate how they could increase predictive power.
4. What’s the “kernel trick” and how is it useful?
Answer: The kernel trick uses kernel functions that enable learning in high-dimensional feature spaces without explicitly calculating the coordinates of the points in that space: instead, a kernel function computes the inner products between the images of all pairs of data points in the feature space. This gives the very useful ability to work with higher-dimensional representations while being computationally much cheaper than the explicit calculation of those coordinates.
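A hedged scikit-learn sketch: an RBF-kernel SVM separates two concentric classes that no straight line in the original 2-D space could, without ever computing the high-dimensional coordinates explicitly:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)
print("linear kernel accuracy:", linear_svm.score(X, y))   # roughly 0.5
print("rbf kernel accuracy:", rbf_svm.score(X, y))         # close to 1.0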
Three different learning styles in machine learning algorithms:
1. Supervised Learning
Input data is called training data and has a known label or result such as spam/not-spam or a stock price at a time.
A model is prepared through a training process in which it is required to make predictions and is corrected when those predictions are wrong. The training process continues until the model achieves a desired level of accuracy on the training data.
Example problems are classification and regression.
Example algorithms include: Logistic Regression and the Back Propagation Neural Network.
2. Unsupervised Learning
Input data is not labeled and does not have a known result.
A model is prepared by deducing structures present in the input data. This may be to extract general rules. It may be through a mathematical process to systematically reduce redundancy, or it may be to organize data by similarity.
Example problems are clustering, dimensionality reduction and association rule learning.
Example algorithms include: the Apriori algorithm and K-Means.
3. Semi-Supervised Learning
Input data is a mixture of labeled and unlabelled examples.
There is a desired prediction problem but the model must learn the structures to organize the data as well as make predictions.
Example problems are classification and regression.
Example algorithms are extensions to other flexible methods that make assumptions about how to model the unlabeled data.
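A compact scikit-learn sketch contrasting the first two styles on the Iris data: the supervised model uses the labels y, the unsupervised one ignores them:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

clf = LogisticRegression(max_iter=1000).fit(X, y)            # supervised: labels are used
print("classification accuracy:", clf.score(X, y))

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)  # unsupervised: labels ignored
print("cluster sizes:", [int((km.labels_ == k).sum()) for k in range(3)])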
1. What is the Impact of Outliers on Logistic Regression?
The estimates of Logistic Regression are sensitive to unusual observations such as outliers, high-leverage points, and influential observations. Their impact is usually addressed by identifying and handling the outliers (or by regularization) before fitting; the sigmoid function is an inherent part of the model rather than a fix for outliers.
2. What is the difference between vanilla RNNs and LSTMs?
The main difference is that LSTMs are able to remember long-term dependencies much better, while vanilla RNNs tend to forget them (the vanishing gradient problem). LSTMs achieve this through a gated memory cell, with input, forget, and output gates, that can retain information over long periods, whereas a vanilla RNN has only a single recurrent hidden state that is overwritten at every time step.
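A hedged Keras sketch: the only difference between the two toy models below is swapping the gated LSTM layer for a plain SimpleRNN layer (the vocabulary size and dimensions are arbitrary):

import tensorflow as tf

def make_model(recurrent_layer):
    inputs = tf.keras.Input(shape=(None,), dtype="int32")              # variable-length token ids
    x = tf.keras.layers.Embedding(input_dim=10000, output_dim=64)(inputs)
    x = recurrent_layer(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)

vanilla = make_model(tf.keras.layers.SimpleRNN(64))   # single recurrent hidden state
lstm = make_model(tf.keras.layers.LSTM(64))           # gated memory cell
print(vanilla.count_params(), lstm.count_params())    # the LSTM has ~4x the recurrent weights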
3. What is Masked Language Model in NLP?
A masked language model is trained to predict words that have been masked (hidden) in the input sentence from the surrounding context. Learning to reconstruct this corrupted input forces the model to build deep bidirectional representations of language, which transfer well to downstream NLP tasks.
4. Why is the KNN Algorithm known as Lazy Learner?
When the KNN algorithm receives the training data, it does not learn a model; it simply stores the data. Instead of deriving a discriminative function from the training data, it follows instance-based learning and consults the stored training data only when it actually needs to make a prediction on unseen data. Because KNN defers all of the work until prediction time, it is referred to as a lazy learner.
1. What are the disadvantages of the linear regression model?
One of the most significant drawbacks of the linear regression model is that it is sensitive to outliers, which can distort the fit and affect the overall result. It also assumes a linear relationship between the inputs and the output, so it tends to underfit complex data, while with many (or highly correlated) features it can overfit as well.
2. Why is Naive Bayes called Naive?
We call it naive because its assumptions (it assumes that all of the features in the dataset are equally important and independent) are really optimistic and rarely true in most real-world applications:
we consider that these predictors are independent
we consider that all the predictors have an equal effect on the outcome (like the day being windy does not have more importance in deciding to play golf or not)
3. How does Random Forest handle missing values?
The Random Forest methods encourage two ways of handling missing values:
Drop data points with missing values. This is not recommended because it discards available information.
Fill in the missing values with the median (for numerical values) or mode (for categorical values). This method will brush too broad a stroke for datasets with many gaps and significant structure.
There are other methods of filling in missing values as well, such as proximity-based imputation, where missing values are estimated using the similarities between observations as weights.
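A small pandas sketch of the median/mode strategy (the column names here are made up):

import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40, 35],
                   "city": ["Delhi", "Mumbai", None, "Delhi"]})

df["age"] = df["age"].fillna(df["age"].median())      # numeric column -> median
df["city"] = df["city"].fillna(df["city"].mode()[0])  # categorical column -> mode
print(df)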
4. Why does XGBoost perform better than SVM?
XGBoost is designed to handle missing values internally: if there is any pattern in the missingness, the model captures it. When XGBoost encounters a missing value at a node during training, it tries sending it down each branch and learns which default direction works best, then uses that path for missing values in the future (a sentinel value other than NaN can also be supplied via the missing parameter). Support Vector Machines, on the other hand, do not handle missing data, so it is always better to impute the missing values before running an SVM.
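A hedged sketch, assuming the third-party xgboost package is installed: rows containing NaN are passed to XGBoost directly, whereas an SVM pipeline would normally need an imputation step first:

import numpy as np
from xgboost import XGBClassifier

X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, np.nan], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])

model = XGBClassifier(n_estimators=10)
model.fit(X, y)          # NaNs are routed to a learned default branch at each split
print(model.predict(X))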
ENJOY LEARNING 👍👍
7 Baby steps to start with Machine Learning:
1. Start with Python
2. Learn to use Google Colab
3. Take a Pandas tutorial
4. Then a Seaborn tutorial
5. Decision Trees are a good first algorithm
6. Finish Kaggle's "Intro to Machine Learning"
7. Solve the Titanic challenge
1. What do you understand by a random forest model?
It combines multiple models together to get the final output or, to be more precise, it combines multiple decision trees together to get the final output. So, decision trees are the building blocks of the random forest model.
2. How are Data Science and Machine Learning related to each other?
Data Science and Machine Learning are two terms that are closely related but often misunderstood. Both of them deal with data. Data Science is a broad field that deals with large volumes of data and allows us to draw insights out of that data. Machine Learning, on the other hand, can be thought of as a sub-field of Data Science. It also deals with data, but here we focus on learning how to turn the processed data into a functional model that maps inputs to outputs, e.g., a model that takes an image as input and tells us whether the image contains a flower.
3. What is a kernel function in SVM?
In the SVM algorithm, a kernel function is a special mathematical function. In simple terms, a kernel function takes data as input and converts it into a required form. This transformation of the data is based on something called a kernel trick, which is what gives the kernel function its name. Using the kernel function, we can transform the data that is not linearly separable (cannot be separated using a straight line) into one that is linearly separable.
4. Explain TF/IDF vectorization.
The expression ‘TF/IDF’ stands for Term Frequency–Inverse Document Frequency. It is a numerical measure that allows us to determine how important a word is to a document in a collection of documents called a corpus. TF/IDF is used often in text mining and information retrieval.
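A short scikit-learn sketch turning a tiny made-up corpus into TF/IDF scores:

from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["the cat sat on the mat",
          "the dog sat on the log",
          "cats and dogs are pets"]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)        # sparse matrix: documents x terms
print(vectorizer.get_feature_names_out())
print(tfidf.toarray().round(2))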
ENJOY LEARNING 👍👍
1. Explain One-hot encoding and Label Encoding. How do they affect the dimensionality of the given dataset?
One-hot encoding is the representation of categorical variables as binary vectors, while label encoding converts labels/words into numeric form. One-hot encoding increases the dimensionality of the dataset because it creates a new 0/1 variable for each level of the categorical variable, whereas label encoding does not affect dimensionality because each level is simply mapped to a single integer (0, 1, 2, ...).
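A quick pandas/scikit-learn sketch of the two encodings on a made-up column:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

one_hot = pd.get_dummies(df["color"])                 # 3 new binary columns (one per level)
labels = LabelEncoder().fit_transform(df["color"])    # a single integer column: [2 1 0 1]
print(one_hot.shape, labels)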
2. When does regularization come into play in Machine Learning?
Regularization comes into play when the model begins to overfit. It adds a penalty that shrinks (regularizes) the coefficient estimates towards zero, which reduces the model's flexibility and discourages it from fitting noise in the training data. The model's complexity is reduced and it becomes better at predicting unseen data; applying too much regularization, however, leads to underfitting.
3. How can we relate standard deviation and variance?
Standard deviation measures the spread of your data around the mean. Variance is the average squared deviation of each point from the mean. The two are directly related: the standard deviation is the square root of the variance.
4. What is the exploding gradient problem while using the back propagation technique?
When large error gradients accumulate and result in very large updates to the neural network weights during training, it is called the exploding gradient problem. The weights can grow so large that they overflow and produce NaN values, which makes the model unstable and can cause learning to stall, much like the vanishing gradient problem. Gradient clipping is a common remedy.
1. Mention the different types of data structures in pandas.
The pandas library provides two primary data structures, Series and DataFrame, both built on top of NumPy. A Series is a one-dimensional data structure and a DataFrame is a two-dimensional data structure. Older versions of pandas also offered a three-dimensional structure called Panel (with items, major_axis, and minor_axis axes), but it has been deprecated and removed in modern releases.
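A minimal sketch of the two structures that remain in modern pandas:

import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])     # one-dimensional, labelled
df = pd.DataFrame({"price": [10, 20], "qty": [3, 1]})  # two-dimensional, labelled rows/columns
print(s["b"], df.loc[0, "price"])                      # 20 10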
2. Why is KNN a non-parametric Algorithm?
The term “non-parametric” refers to not making any assumptions on the underlying data distribution. These methods do not have any fixed numbers of parameters in the model.
Similarly in KNN, the model parameters grow with the training data by considering each training case as a parameter of the model. So, KNN is a non-parametric algorithm.
3. Explain the CART Algorithm for Decision Trees.
CART (Classification and Regression Trees) is a variation of the decision tree algorithm that can handle both classification and regression tasks. It is a greedy algorithm: it searches for an optimum split at the top level and then repeats the same process at each of the subsequent levels. It does not verify whether a split will lead to the lowest possible impurity several levels down, so the solution it produces is not guaranteed to be optimal; it is usually reasonably good, though, since finding the optimal tree is an NP-complete problem requiring exponential time.
4. Explain leave-p-out cross validation.
When using this exhaustive method, we take p points out of the total number of data points in the dataset (say n). We train the model on the remaining (n - p) data points and test it on the p held-out points, and we repeat this process for all possible combinations of p points from the original dataset. To get the final accuracy, we average the accuracies from all these iterations.
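A hedged scikit-learn sketch of leave-p-out on a tiny made-up dataset (the method is exhaustive, so it is only practical for small n):

import numpy as np
from sklearn.model_selection import LeavePOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1], [2], [3], [8], [9], [10]])
y = np.array([0, 0, 0, 1, 1, 1])

lpo = LeavePOut(p=2)       # C(6, 2) = 15 train/test splits
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=lpo)
print(len(scores), scores.mean())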
ENJOY LEARNING 👍👍